Dear Spark developers,

I have 100 binary files in the local file system that I want to load into a Spark RDD. I need the data from each file to be in a separate partition. However, I cannot make that happen:
scala> sc.binaryFiles("/data/subset").partitions.size
res5: Int = 66

The "minPartitions" parameter does not seem to help:

scala> sc.binaryFiles("/data/subset", minPartitions = 100).partitions.size
res8: Int = 66

At the same time, Spark produces the required number of partitions with sc.textFile (though I cannot use it here because my files are binary):

scala> sc.textFile("/data/subset").partitions.size
res9: Int = 100

Could you suggest how to force Spark to load each binary file into a separate partition?

Best regards,
Alexander
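
P.S. In the meantime, the workaround I am considering is to list the files on the driver and parallelize the paths myself, reading each file's bytes inside a map. A rough sketch for local mode (it yields an RDD[(String, Array[Byte])] rather than binaryFiles' RDD[(String, PortableDataStream)], and it assumes the paths under /data/subset are readable from every executor):

import java.nio.file.{Files, Paths}
import org.apache.spark.{SparkConf, SparkContext}

object OneFilePerPartition {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("one-file-per-partition").setMaster("local[*]"))

    // List the 100 files on the driver (/data/subset is the directory from my setup).
    val paths = new java.io.File("/data/subset")
      .listFiles.filter(_.isFile).map(_.getAbsolutePath).toSeq

    // Ask for exactly one slice per path, so each file's bytes
    // land in their own partition.
    val rdd = sc.parallelize(paths, numSlices = paths.length)
      .map(p => (p, Files.readAllBytes(Paths.get(p))))

    println(rdd.partitions.size) // expected: 100
    sc.stop()
  }
}

One slice per path does force one file per partition, but it bypasses binaryFiles entirely, so I would still prefer a proper solution if one exists.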