Hi, I am not using HDFS; I am using the local file system. I also did not modify defaultParallelism. The Spark instance is the default one started by the Spark shell.
Thanks!
Rares

On Fri, Mar 27, 2015 at 4:48 PM, java8964 <java8...@hotmail.com> wrote:

> The files sound too small to be 2 blocks in HDFS.
>
> Did you set the defaultParallelism to be 3 in your Spark?
>
> Yong
>
> ------------------------------
> Subject: Re: 2 input paths generate 3 partitions
> From: zzh...@hortonworks.com
> To: rvern...@gmail.com
> CC: user@spark.apache.org
> Date: Fri, 27 Mar 2015 23:15:38 +0000
>
> Hi Rares,
>
> The number of partitions is controlled by the HDFS input format, and one
> file may have multiple partitions if it consists of multiple blocks. In
> your case, I think there is one file with 2 splits.
>
> Thanks.
>
> Zhan Zhang
>
> On Mar 27, 2015, at 3:12 PM, Rares Vernica <rvern...@gmail.com> wrote:
>
> Hello,
>
> I am using the Spark shell in Scala on localhost. I am using sc.textFile
> to read a directory. The directory looks like this (generated by another
> Spark script):
>
> part-00000
> part-00001
> _SUCCESS
>
> part-00000 has four short lines of text while part-00001 has two short
> lines of text. The _SUCCESS file is empty. When I check the number of
> partitions on the RDD I get:
>
> scala> foo.partitions.length
> 15/03/27 14:57:31 INFO FileInputFormat: Total input paths to process : 2
> res68: Int = 3
>
> I wonder why the two input files generate three partitions. Does Spark
> check the number of lines in each file and try to generate three balanced
> partitions?
>
> Thanks!
> Rares
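For what it's worth, the count of 3 is consistent with how sc.textFile computes splits rather than with balancing by line count: minPartitions defaults to min(defaultParallelism, 2), and Hadoop's FileInputFormat divides the total input size by that number to get a goal split size, cutting any file that exceeds the goal by more than a 10% slop factor. If part-00000 is roughly twice the size of part-00001, the larger file is cut into 2 splits and the smaller one stays whole, giving 3 partitions. Below is a minimal sketch of that arithmetic; the 40- and 20-byte sizes and the SplitDemo object are hypothetical illustrations, not Spark code:

```scala
// Sketch of FileInputFormat-style split computation (illustrative only).
object SplitDemo {
  // Hadoop allows a split to exceed the goal size by 10% before cutting.
  val SPLIT_SLOP = 1.1

  // Number of splits one file produces for a given goal size.
  def numSplits(fileSize: Long, goalSize: Long): Int = {
    var remaining = fileSize
    var splits = 0
    while (remaining.toDouble / goalSize > SPLIT_SLOP) {
      remaining -= goalSize
      splits += 1
    }
    if (remaining > 0) splits += 1
    splits
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical byte sizes for part-00000 (4 lines) and part-00001 (2 lines).
    val sizes = Seq(40L, 20L)
    // minPartitions defaults to 2 for sc.textFile in the shell.
    val goalSize = sizes.sum / 2 // 30 bytes
    val total = sizes.map(numSplits(_, goalSize)).sum
    println(s"total splits = $total") // 40 -> 2 splits, 20 -> 1 split, so 3
  }
}
```

So no content of the files is inspected; the extra partition comes purely from the size-based split of the larger file.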