Dear Spark developers,

I have 100 binary files in the local file system that I want to load into a 
Spark RDD. I need the data from each file to end up in a separate partition. 
However, I cannot make it happen:

scala> sc.binaryFiles("/data/subset").partitions.size
res5: Int = 66

The "minPartitions" parameter does not seems to help:
scala> sc.binaryFiles("/data/subset", minPartitions = 100).partitions.size
res8: Int = 66
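
A plain repartition does give 100 partitions:
scala> sc.binaryFiles("/data/subset").repartition(100).partitions.size
but as far as I understand it shuffles the data and does not guarantee that 
each file ends up in its own partition, so it is not really what I am after.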

At the same time, Spark produces the required number of partitions with 
sc.textFile (though I cannot use it because my files are binary):
scala> sc.textFile("/data/subset").partitions.size
res9: Int = 100

Could you suggest how to force Spark to load each binary file into a separate 
partition?
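
In case it clarifies what I am after, here is a sketch of the only workaround 
I see: building one RDD per file and taking a union (this assumes the files 
sit directly under /data/subset):

scala> val paths = new java.io.File("/data/subset").listFiles.map(_.getPath)  // local file paths
scala> val rdd = sc.union(paths.toSeq.map(p => sc.binaryFiles(p)))            // one RDD per file
scala> rdd.partitions.size                                                    // hopefully 100

Creating 100 RDDs just to union them seems clumsy, though, so I hope there is 
a cleaner way.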

Best regards, Alexander
