Re: [Community] Python support added to Spark Job Server

2016-08-20 Thread Taotao.Li
Awesome! One question about the job server: when will it support Spark 2.x? Many thanks~ On Thu, Aug 18, 2016 at 1:04 AM, Evan Chan wrote: > Hi folks, > > Just a friendly message that we have added Python support to the REST > Spark Job Server project. If you are a

mutable.LinkedHashMap kryo serialization issues

2016-08-20 Thread Rahul Palamuttam
Hi, I recently switched to using kryo serialization and I've been running into errors with the mutable.LinkedHashMap class. If I don't register the mutable.LinkedHashMap class then I get an ArrayStoreException seen below. If I do register the class, then when the LinkedHashMap is collected on

Re: Spark 2.0 regression when querying very wide data frames

2016-08-20 Thread ponkin
I generated a CSV file with 300 columns, and it seems to work fine with Spark DataFrames (Spark 2.0). I think you need to post your issue to the spark-cassandra-connector community (https://groups.google.com/a/lists.datastax.com/forum/#!forum/spark-connector-user) if you are using it.

Re: Spark 2.0 regression when querying very wide data frames

2016-08-20 Thread ponkin
Did you try loading a wide CSV or Parquet file, for example? Maybe the problem is in spark-cassandra-connector, not Spark itself? Are you using spark-cassandra-connector (https://github.com/datastax/spark-cassandra-connector)?

Re: Spark 2.0 regression when querying very wide data frames

2016-08-20 Thread mhornbech
Cassandra. Morten > On 20 Aug 2016, at 13:53, ponkin [via Apache Spark User List] > wrote: > > Hi, > What kind of datasource do you have? CSV, Avro, Parquet?

Re: Spark 2.0 regression when querying very wide data frames

2016-08-20 Thread ponkin
Hi, What kind of datasource do you have? CSV, Avro, Parquet? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-2-0-regression-when-querying-very-wide-data-frames-tp27567p27569.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Spark 2.0 regression when querying very wide data frames

2016-08-20 Thread Sean Owen
Yes, have a look through JIRA in cases like this. https://issues.apache.org/jira/browse/SPARK-16664 On Sat, Aug 20, 2016 at 1:57 AM, mhornbech wrote: > I did some extra digging. Running the query "select column1 from myTable" I > can reproduce the problem on a frame with a

Re: Best way to read XML data from RDD

2016-08-20 Thread Jörn Franke
I fear the issue is that this will create and destroy an XML parser object 2 million times, which is very inefficient - it does not really look like a parser performance issue. Can't you do something about the format choice? Ask your supplier to deliver another format (ideally Avro or something like
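
To illustrate the point about creating a parser per record, here is a minimal Python sketch that builds one SAX parser per partition and reuses it for every record in that partition. It is not the code from the original thread: the names parse_partition and RootNameHandler are hypothetical, each record is assumed to be one complete XML document held in a string, and extracting only the root element name stands in for whatever processing the real job needs.

```python
import io
import xml.sax


class RootNameHandler(xml.sax.ContentHandler):
    """Hypothetical handler: remembers the root element name of each document."""

    def __init__(self):
        super().__init__()
        self.root = None

    def startDocument(self):
        # Called at the start of every document, so state never leaks
        # from one record into the next.
        self.root = None

    def startElement(self, name, attrs):
        # The first element seen in a document is its root.
        if self.root is None:
            self.root = name


def parse_partition(records):
    """Parse an iterable of XML strings with a single shared SAX reader.

    The reader and handler objects are created once per call (that is,
    once per partition) rather than once per record.
    """
    parser = xml.sax.make_parser()
    handler = RootNameHandler()
    parser.setContentHandler(handler)
    for record in records:
        # The same reader object is reused; parse() resets it per document.
        parser.parse(io.StringIO(record))
        yield handler.root


if __name__ == "__main__":
    sample = ["<a><b/></a>", "<c/>"]
    print(list(parse_partition(sample)))
```

In PySpark, a function of this shape would typically be passed to rdd.mapPartitions(parse_partition) rather than calling a parsing function through rdd.map, so the setup cost is paid once per partition instead of once per record. Whether this removes the bottleneck in the original job is an assumption; it only shows the per-partition pattern.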