Greetings!
SPARK-1476 says that there is a 2GB limit for "blocks". Is this effectively the same as a 2GB limit for partitions (or at least approximately so)?

What I have been attempting to do is the following:
1) Start with a moderately large data set (currently about 100GB, but growing).
2) Create about 1,000 files (yes, files), each representing a subset of the data.
My current attempt works like this:
1) Do a "map" whose output key indicates which of the 1,000 files the record belongs in and whose value is what I want to write into that file.
2) Partition the data by that key and use the body of mapPartitions to open a file and save the data.
My apologies, this is actually embedded in a bigger mess, so I won't post it.
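Here is a stripped-down sketch of the idea, though. The input path, output directory, and key function are made-up stand-ins (not my real code), and I've written the save step with foreachPartition; mapPartitions would look much the same:

import java.io.{File, PrintWriter}
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}
import org.apache.spark.SparkContext._

object SplitIntoFilesSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("split-into-files"))

    // Stand-in input; in reality this is my ~100GB (and growing) data set.
    val records = sc.textFile("hdfs:///path/to/input")

    // 1) Key each record by which of the 1,000 files it should end up in.
    //    The real key function is domain-specific; this modulo is just a placeholder.
    val keyed = records.map(line => (((line.hashCode % 1000) + 1000) % 1000, line))

    // 2) Partition by that key so each file's records land in one partition,
    //    then write each partition to its own file (assumed shared/mounted directory).
    keyed
      .partitionBy(new HashPartitioner(1000))
      .foreachPartition { iter =>
        if (iter.hasNext) {
          val buffered = iter.buffered
          val fileIndex = buffered.head._1
          val out = new PrintWriter(new File(s"/some/shared/dir/part-$fileIndex"))
          try buffered.foreach { case (_, value) => out.println(value) }
          finally out.close()
        }
      }

    sc.stop()
  }
}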
However, I get errors telling me "IllegalArgumentException: Size exceeds Integer.MAX_VALUE", with sun.nio.ch.FileChannelImpl.map at the top of the stack. This leads me to think that I have hit the limit on partition and/or block size.
Perhaps this is not a good way to do it?
I suppose I could run 1,000 passes over the data, each time collecting the 
output for one of my 1,000 final files, but that seems likely to be painfully 
slow to run.
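In other words, something like this (reusing keyed from the sketch above; saveAsTextFile actually writes a directory per call rather than a single file, but it shows the shape of the 1,000-pass approach):

// Brute force: one full pass over the data per output file.
(0 until 1000).foreach { i =>
  keyed.filter { case (k, _) => k == i }
       .values
       .saveAsTextFile(s"/some/shared/dir/slow-part-$i")
}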
Am I missing something?
Admittedly, this is an odd use case....
Thanks!
Sincerely, Mike Albert
