(SOLVED) Ahhhh... Spark creates >30000 partitions... What can I do?

2015-10-22 Thread t3l
I was able to solve this myself: I changed the way Spark computes the partitioning for binary files. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/A-Spark-creates-3-partitions-What-can-I-do-tp25140p25170.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
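t3l does not spell out the exact change in this snippet, so the following is only a sketch of the usual remedies for this symptom: pass a low `minPartitions` hint to `binaryFiles` (which influences the combined-split size Spark computes for binary inputs), and fall back to a shuffle-free `coalesce` if the hint is not honored. The path and partition counts are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only -- path and numbers are made up for illustration.
val sc = new SparkContext(new SparkConf().setAppName("binary-files-partitioning"))

// binaryFiles takes an optional minPartitions hint; a small value lets
// Spark pack many small files into each combined split instead of
// creating (at least) one partition per file.
val rdd = sc.binaryFiles("hdfs:///data/small-files", minPartitions = 64)

// If the hint is not honored, merge the existing partitions without a
// shuffle; unlike repartition, this does not move data across the cluster.
val compact = rdd.coalesce(64)
println(compact.partitions.length)
```

This requires a running Spark cluster (or local master), so it is a sketch rather than a drop-in fix.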

Re: Ahhhh... Spark creates >30000 partitions... What can I do?

2015-10-21 Thread Ranadip Chatterjee
> ... Spark spawns an RDD with more than 30000 partitions. And this takes ages to process these partitions even if you simply run "count". Performing a "repartition" directly after loading does ...

Re: Ahhhh... Spark creates >30000 partitions... What can I do?

2015-10-20 Thread Lan Jiang
> ... this takes ages to process these partitions even if you simply run "count". Performing a "repartition" directly after loading does not help, because Spark seems to insist on materializing the RDD created by binaryFiles first. How can I get around this?
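The snippets in this thread are truncated, but the behaviour they describe (repartition not helping) follows from how the two operators are scheduled. A sketch, with a hypothetical path, assuming `sc` is an existing SparkContext:

```scala
// repartition(n) is coalesce(n, shuffle = true): the first stage still
// runs one task per original partition (~30000 reads here), and the
// merge only happens in the shuffle that follows.
val slow = sc.binaryFiles("hdfs:///data/small-files").repartition(64)

// coalesce without a shuffle instead wraps many input splits into each
// task (a narrow dependency), so only ~64 read tasks ever run.
val fast = sc.binaryFiles("hdfs:///data/small-files").coalesce(64)
```

The shuffle-free version mainly saves per-task scheduling overhead; each of the 64 tasks still reads its share of the underlying files.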

Re: Ahhhh... Spark creates >30000 partitions... What can I do?

2015-10-20 Thread Sean Owen

Ahhhh... Spark creates >30000 partitions... What can I do?

2015-10-20 Thread t3l
... Spark seems to insist on materializing the RDD created by binaryFiles first. How can I get around this?

Re: Ahhhh... Spark creates >30000 partitions... What can I do?

2015-10-20 Thread Deenar Toraskar
> ... Performing a "repartition" directly after loading does not help, because Spark seems to insist on materializing the RDD created by binaryFiles first. How can I get around this?

Re: Ahhhh... Spark creates >30000 partitions... What can I do?

2015-10-20 Thread Lan Jiang
> ... that involves this RDD, Spark spawns an RDD with more than 30000 partitions. And this takes ages to process these partitions even if you simply run "count". Performing a "repartition" directly after loading does not help ...

Re: Ahhhh... Spark creates >30000 partitions... What can I do?

2015-10-20 Thread François Pelletier
> ... materializing the RDD created by binaryFiles first. How can I get around this?