Hello,

I am using the Spark shell (Scala) on localhost and reading a directory
with sc.textFile. The directory, which was generated by another Spark
script, contains:

part-00000
part-00001
_SUCCESS


part-00000 has four short lines of text, while part-00001 has two short
lines of text. The _SUCCESS file is empty. When I check the number of
partitions on the RDD, I get:

scala> foo.partitions.length
15/03/27 14:57:31 INFO FileInputFormat: Total input paths to process : 2
res68: Int = 3
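
A minimal sketch of how this could be inspected further in the shell. The
path is a placeholder (not the real directory), and glom is just one way to
see how many lines land in each partition; no output is shown because it
depends on the actual files.

```scala
// Run inside spark-shell, where `sc` (the SparkContext) is already defined.
// The path below is a placeholder, not the actual directory.
val inputDir = "/path/to/output-dir"

val foo = sc.textFile(inputDir)

// Minimum number of partitions textFile requests when none is passed.
println(s"defaultMinPartitions = ${sc.defaultMinPartitions}")

// Number of partitions actually created for the RDD.
println(s"partitions = ${foo.partitions.length}")

// Number of lines that ended up in each partition.
foo.glom().map(_.length).collect().zipWithIndex.foreach {
  case (n, i) => println(s"partition $i: $n lines")
}
```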


Why do the two input files generate three partitions? Does Spark check the
number of lines in each file and try to generate three balanced
partitions?

Thanks!
Rares
