Re: Spark 2.0 regression when querying very wide data frames

2016-08-22 Thread mhornbech

Re: Spark 2.0 regression when querying very wide data frames

2016-08-20 Thread ponkin
I generated a CSV file with 300 columns, and it seems to work fine with Spark DataFrames (Spark 2.0). I think you need to post your issue to the spark-cassandra-connector community (https://groups.google.com/a/lists.datastax.com/forum/#!forum/spark-connector-user), if you are using it.
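
For anyone who wants to repeat this kind of check, a wide CSV file can be produced with the Python standard library alone. This is only a minimal sketch, not the script used for the test above: the function name `write_wide_csv`, the `columnN` header naming, and the integer cell values are all assumptions made for illustration.

```python
import csv


def write_wide_csv(path, n_cols=300, n_rows=1):
    """Write a CSV file with one header row and n_rows data rows.

    Column names (column1 .. columnN) and integer values are
    illustrative assumptions, not taken from the thread.
    """
    header = ["column{}".format(i) for i in range(1, n_cols + 1)]
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for r in range(n_rows):
            # Each cell holds a distinct integer so rows are easy to tell apart.
            writer.writerow([r * n_cols + c for c in range(n_cols)])
    return header
```

The resulting file could then be loaded in PySpark 2.0 with something like `spark.read.csv(path, header=True)` and queried as usual.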

Re: Spark 2.0 regression when querying very wide data frames

2016-08-20 Thread ponkin
Did you try loading a wide file, for example CSV or Parquet? Maybe the problem is in spark-cassandra-connector, not Spark itself? Are you using spark-cassandra-connector (https://github.com/datastax/spark-cassandra-connector)?

Re: Spark 2.0 regression when querying very wide data frames

2016-08-20 Thread mhornbech

Re: Spark 2.0 regression when querying very wide data frames

2016-08-20 Thread ponkin
Hi, What kind of datasource do you have? CSV, Avro, Parquet?

Re: Spark 2.0 regression when querying very wide data frames

2016-08-20 Thread Sean Owen
Yes, have a look through JIRA in cases like this. https://issues.apache.org/jira/browse/SPARK-16664 On Sat, Aug 20, 2016 at 1:57 AM, mhornbech wrote: > I did some extra digging. Running the query "select column1 from myTable" I > can reproduce the problem on a frame with a

Re: Spark 2.0 regression when querying very wide data frames

2016-08-19 Thread mhornbech
I did some extra digging. Running the query "select column1 from myTable" I can reproduce the problem on a frame with a single row - it occurs exactly when the frame has more than 200 columns, which smells a bit like a hardcoded limit. Interestingly the problem disappears when replacing the query
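
The single-row check described above could be sketched in PySpark roughly as follows. This is a hedged illustration, not the poster's code: the helper names (`column_names`, `single_row`, `run_query`), the integer cell values, and the local master setting are assumptions. Only the query text and the table name `myTable` come from the message. The sketch makes no claim about what the output looks like on either side of the 200-column boundary; it just runs the same query so the two results can be compared.

```python
def column_names(n_cols):
    """Return ['column1', ..., 'columnN'] (naming is an assumption)."""
    return ["column{}".format(i) for i in range(1, n_cols + 1)]


def single_row(n_cols):
    """Return a one-row dataset with n_cols integer values."""
    return [tuple(range(n_cols))]


def run_query(n_cols, query="select column1 from myTable"):
    """Build a one-row frame with n_cols columns and run the query on it."""
    # Imported lazily so the helpers above work without Spark installed.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .master("local[1]")
             .appName("wide-frame-check")
             .getOrCreate())
    df = spark.createDataFrame(single_row(n_cols), column_names(n_cols))
    df.createOrReplaceTempView("myTable")
    return spark.sql(query).collect()

# Example comparison around the reported boundary:
#   for n in (200, 201):
#       print(n, run_query(n))
```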

Spark 2.0 regression when querying very wide data frames

2016-08-19 Thread mhornbech
Hi, We currently have some workloads in Spark 1.6.2 with queries operating on a data frame with 1500+ columns (17000 rows). This has never been quite stable, and some queries, such as "select *", would yield empty result sets, but queries restricting to specific columns have mostly worked. Needless