Hello, I am using the Spark shell in Scala on localhost. I am using sc.textFile to read a directory. The directory looks like this (generated by another Spark script):
part-00000  part-00001  _SUCCESS

part-00000 has four short lines of text, part-00001 has two short lines, and the _SUCCESS file is empty. When I check the number of partitions on the RDD I get:

scala> foo.partitions.length
15/03/27 14:57:31 INFO FileInputFormat: Total input paths to process : 2
res68: Int = 3

I wonder why the two input files generate three partitions. Does Spark check the number of lines in each file and try to generate three balanced partitions?

Thanks!
Rares
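In case it is useful, here is how I can inspect the actual line distribution in the same shell session (a sketch using RDD.glom, which groups each partition's elements into an array; foo is the RDD from sc.textFile above, and the actual counts will depend on the file contents):

scala> // number of lines that landed in each partition
scala> foo.glom().collect().map(_.length)

This returns an Array[Int] with one entry per partition, so it should show whether the four-line file was split across two partitions while the two-line file stayed in one.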