Hello, I am using the Spark shell in Scala on localhost. I am using sc.textFile to read a directory. The directory looks like this (generated by another Spark script):
part-00000  part-00001  _SUCCESS

part-00000 has four short lines of text, part-00001 has two short lines, and the _SUCCESS file is empty. When I check the number of partitions on the RDD I get:

scala> foo.partitions.length
15/03/27 14:57:31 INFO FileInputFormat: Total input paths to process : 2
res68: Int = 3

I wonder why the two input files generate three partitions. Does Spark check the number of lines in each file and try to generate three balanced partitions?

Thanks!
Rares
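In case it is useful, here is how I can inspect the actual line distribution in the same shell session (a sketch using RDD.glom, which groups each partition's elements into an array; foo is the RDD from sc.textFile above, and the actual counts will depend on the file contents):

scala> // number of lines that landed in each partition
scala> foo.glom().collect().map(_.length)

This returns an Array[Int] with one entry per partition, so it should show whether the four-line file was split across two partitions while the two-line file stayed in one.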