I loaded a very tiny file into Spark -- 23 lines of text, 2.6 KB. Given the size, and that it is a single file, I assumed it would end up in a single partition. But when I cache it, I can see in the Spark App UI that it is actually split into two partitions:
[screenshot: Spark App UI showing the cached RDD stored as two partitions]

Is this correct behavior? How does Spark decide how big a partition should be, or how many partitions to create for an RDD? If it matters, I have only a single worker in my "cluster", so both partitions are stored on the same worker. The file is on HDFS and occupies only a single block. Thanks for any insight.

Diana
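P.S. In case it's useful, here is roughly what I'm doing in spark-shell; the HDFS path is a placeholder, not my real file:

    // spark-shell (Scala); the path below is hypothetical
    val rdd = sc.textFile("hdfs:///user/diana/testfile.txt")
    rdd.cache()
    rdd.count()                   // materialize it so it appears in the UI's Storage tab
    println(rdd.partitions.size)  // prints 2, even though the file is one HDFS block

    // textFile takes an optional minimum-partitions argument (called minSplits
    // in older releases, minPartitions later); passing 1 seems to keep a
    // single-block file in one partition:
    val one = sc.textFile("hdfs:///user/diana/testfile.txt", 1)
    println(one.partitions.size)  // prints 1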