I more or less reproduced that with PySpark 2.1, also built for Hadoop 2.6,
and with Python 3.x.
I'll look into it a bit more once I've fixed a few other issues with
the salting of strings on the cluster.
2017-01-30 20:19 GMT+01:00 Blaž Šnuderl:
> I am loading a simple text file using
Thank you very much, guys. My question may sound a little bit off, but I was
somewhat confused and wanted to get some expert advice on this. I will take
a look at the links mentioned in the replies. I really appreciate your
suggestions. These are the kind of answers I needed to clear up my doubts.
Have a
How about increasing the number of RDD partitions or rebalancing the data?
On Sat, Mar 11, 2017 at 2:33 PM, Parsian, Mahmoud
wrote:
> How to improve performance of JavaRDD.saveAsTextFile("hdfs://…")?
> This is taking over 30 minutes on a cluster of 10 nodes.
> Running Spark on YARN.
>
>
For Scala, make your class Serializable, like this:
```
class YourClass extends Serializable {}
```
On Sat, Mar 11, 2017 at 3:51 PM, 萝卜丝炒饭 <1427357...@qq.com> wrote:
> hi mina,
>
> can you paste your new code here, please?
> I ran into this issue too, but I don't get Ankur's idea.
>
> thanks
>
I think the vals you defined are only valid on the driver; you can try
a broadcast variable.
---Original---
From: "lk_spark"
Date: 2017/2/27 11:14:23
To: "user.spark";
Subject: java.io.NotSerializableException:
org.apache.spark.streaming.StreamingContext
Hello Chetan,
Could you post some code? If I understood correctly, you are trying to save
JSON like:
{
"first_name": "Dongjin",
"last_name": null
}
not in omitted form, like:
{
"first_name": "Dongjin"
}
right?
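
To make the two forms concrete, here is a minimal sketch in plain Python using only the standard `json` module. It is not Spark code, and the `record` dictionary is just an illustrative stand-in for one row:

```python
import json

# Hypothetical record with a missing last name.
record = {"first_name": "Dongjin", "last_name": None}

# Form 1: keep the field and write an explicit null.
with_nulls = json.dumps(record)

# Form 2: the "omitted form", dropping every field whose value is None.
without_nulls = json.dumps({k: v for k, v in record.items() if v is not None})

print(with_nulls)
print(without_nulls)
```

The first string keeps `"last_name": null`, while the second contains only `first_name`.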
- Dongjin
On Wed, Mar 8, 2017 at 5:58 AM, Chetan Khatri