Re: Pyspark 2.1.0 weird behavior with repartition

2017-03-11 Thread Olivier Girardot
I kinda reproduced that with pyspark 2.1, also for hadoop 2.6 and with python 3.x. I'll look into it a bit more after I've fixed a few other issues regarding the salting of strings on the cluster. 2017-01-30 20:19 GMT+01:00 Blaž Šnuderl : > I am loading a simple text file using

Re: Which streaming platform is best? Kafka or Spark Streaming?

2017-03-11 Thread Gaurav Pandya
Thank you very much guys. My question may sound a little bit off, but I was somewhat confused so I wanted to get some expert advice on this. I will take a look at the links mentioned in the replies. I really appreciate your suggestions. These are the kind of answers I needed to clear my doubts. Have a

Re: How to improve performance of saveAsTextFile()

2017-03-11 Thread Yan Facai
How about increasing the RDD's partitions / rebalancing data? On Sat, Mar 11, 2017 at 2:33 PM, Parsian, Mahmoud wrote: > How to improve performance of JavaRDD.saveAsTextFile("hdfs://…"). > This is taking over 30 minutes on a cluster of 10 nodes. > Running Spark on YARN. > >

Re: org.apache.spark.SparkException: Task not serializable

2017-03-11 Thread Yan Facai
For Scala, make your class Serializable, like this: ``` class YourClass extends Serializable {} ``` On Sat, Mar 11, 2017 at 3:51 PM, 萝卜丝炒饭 <1427357...@qq.com> wrote: > hi mina, > > can you paste your new code here please? > I hit this issue too but do not get Ankur's idea. > > thanks >

Re: java.io.NotSerializableException: org.apache.spark.streaming.StreamingContext

2017-03-11 Thread ??????????
I think the vals you defined are only valid in the driver; you can try a broadcast variable. ---Original--- From: "lk_spark" Date: 2017/2/27 11:14:23 To: "user.spark"; Subject: java.io.NotSerializableException: org.apache.spark.streaming.StreamingContext

Re: Issues: Generate JSON with null values in Spark 2.0.x

2017-03-11 Thread Dongjin Lee
Hello Chetan, Could you post some code? If I understood correctly, you are trying to save JSON like: { "first_name": "Dongjin", "last_name": null } not in omitted form, like: { "first_name": "Dongjin" } right? - Dongjin On Wed, Mar 8, 2017 at 5:58 AM, Chetan Khatri
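
To make the two output forms in the question concrete, here is a minimal sketch in plain Python using only the standard `json` module. It is not Spark's JSON writer and says nothing about how Spark 2.0.x behaves; the `record` dict is a made-up example that mirrors the fields quoted above.

```python
import json

# Hypothetical record mirroring the example in the message above.
record = {"first_name": "Dongjin", "last_name": None}

# Form 1: keep the key and emit an explicit null.
with_nulls = json.dumps(record)

# Form 2: the "omitted form", dropping every key whose value is None.
omitted = json.dumps({k: v for k, v in record.items() if v is not None})

print(with_nulls)  # {"first_name": "Dongjin", "last_name": null}
print(omitted)     # {"first_name": "Dongjin"}
```

The distinction matters to downstream readers: with an explicit null the field is still visible to a consumer that infers its schema from the data, while in the omitted form the key is absent from that record.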