Hi Abhishek,
Thanks for your suggestion. I did consider it, but I'm not sure: to
achieve that, wouldn't I need to collect() the data first? I don't think
it would fit into the driver's memory.
Since I'm trying all of this inside the pyspark shell I'm using a small
dataset; however, the main dataset is
Hi Daniel,
Yes, it will work without the collect() method. You just do a map
operation on every item of the RDD.
Thanks
Abhishek S
> On 16 Dec 2015, at 18:10, Daniel Valdivia wrote:
Hello Daniel,
I was thinking you could write:
catGroupArr.map(lambda line: create_and_write_file(line))
def create_and_write_file(line):
1. look at the key of the line: line[0]
2. open a file with the required file name based on the key
3. iterate through the values of this (key, value) pair
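The three steps above can be sketched as a plain-Python helper. This is only a sketch under assumptions: the `out_dir` parameter and the `<key>.json` file naming are mine, and I am assuming each value is an already-parsed JSON object, as Daniel described. One caveat worth noting: since `map` is a lazy transformation whose results would be discarded here, an action such as `foreach` is the idiomatic way to trigger the writes on the executors.

```python
import json
import os

def create_and_write_file(pair, out_dir="."):
    """Write one output file per key.

    `pair` is one (key, values) record from the PairRDD; each element of
    `values` is an already-parsed JSON object. `out_dir` and the file
    naming scheme are assumptions for this sketch.
    """
    key, values = pair                                 # step 1: the key is pair[0]
    path = os.path.join(out_dir, "%s.json" % key)      # step 2: file named after the key
    with open(path, "w") as f:
        for v in values:                               # step 3: iterate over the values
            f.write(json.dumps(v) + "\n")              # serialize each JSON object to a string
    return path

# On the cluster this would be triggered per record with an action, e.g.:
#   catGroupArr.foreach(create_and_write_file)
# (foreach rather than map: map is lazy and its results would be unused;
#  each file is written locally on whichever executor holds the record.)
```

Note that with `foreach` each file lands on the local filesystem of the executor that processes the record, so in practice the paths would need to point at shared storage.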
Hello everyone,
I have a PairRDD with a set of keys, each mapped to a list of values; each
value in the list is a JSON object that I already loaded at the beginning of
my Spark app. How can I iterate over each value in the list in my PairRDD,
transform it to a string, and then save the whole content of the key to a
file?