Hi Rares,

The number of partitions is controlled by the HDFS input format, and one file 
may have multiple partitions if it consists of multiple blocks. In your case, I 
think there is one file with 2 splits.
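
For illustration, here is a rough sketch of how one small file can end up with 2 splits. It is a simplified model of the split arithmetic in Hadoop's old-API FileInputFormat (the one sc.textFile goes through), not the actual implementation. It assumes sc.textFile was called without a partition count, so Spark asks for a minimum of 2 partitions, and it assumes the files are non-empty. The byte sizes (40 and 20) are made up, since the thread only gives line counts; SplitSketch, splitsForFile and totalSplits are hypothetical names.

```scala
// Simplified model of old-API FileInputFormat split sizing.
// All names and file sizes here are illustrative assumptions.
object SplitSketch {
  // A trailing chunk up to 10% larger than splitSize is kept as one split.
  val SplitSlop = 1.1

  // Number of splits produced for one non-empty file of `length` bytes.
  def splitsForFile(length: Long, splitSize: Long): Int = {
    var remaining = length
    var count = 0
    while (remaining.toDouble / splitSize > SplitSlop) {
      count += 1
      remaining -= splitSize
    }
    if (remaining != 0) count += 1 // leftover bytes form the last split
    count
  }

  // Total splits across all files for a requested minimum partition count.
  def totalSplits(fileSizes: Seq[Long],
                  minPartitions: Int,
                  blockSize: Long = 128L * 1024 * 1024,
                  minSize: Long = 1L): Int = {
    // Target bytes per split: total input size divided by requested splits.
    val goalSize = fileSizes.sum / math.max(minPartitions, 1)
    val splitSize = math.max(minSize, math.min(goalSize, blockSize))
    fileSizes.map(splitsForFile(_, splitSize)).sum
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical sizes: part-00000 = 40 bytes, part-00001 = 20 bytes.
    println(totalSplits(Seq(40L, 20L), minPartitions = 2)) // prints 3
  }
}
```

With these sizes the target split size is 30 bytes, so the 40-byte file is cut into 30 + 10 and the 20-byte file stays whole, giving 3 partitions. Under the same model, asking for a minimum of 1 partition (for example sc.textFile(path, 1)) would give one split per file. In the shell, foo.glom().map(_.length).collect() shows how many lines landed in each partition.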

Thanks.

Zhan Zhang
On Mar 27, 2015, at 3:12 PM, Rares Vernica <rvern...@gmail.com> wrote:

Hello,

I am using the Spark shell in Scala on localhost. I am using sc.textFile to 
read a directory. The directory looks like this (generated by another Spark 
script):

part-00000
part-00001
_SUCCESS

The part-00000 file has four short lines of text, while part-00001 has two 
short lines of text. The _SUCCESS file is empty. When I check the number of 
partitions on the RDD I get:

scala> foo.partitions.length
15/03/27 14:57:31 INFO FileInputFormat: Total input paths to process : 2
res68: Int = 3

I wonder why the two input files generate three partitions. Does Spark check 
the number of lines in each file and try to generate three balanced partitions?

Thanks!
Rares
