Hi, I am not using HDFS; I am using the local file system. I also did not modify defaultParallelism. The Spark instance is the default one started by the Spark shell.
Thanks!
Rares

On Fri, Mar 27, 2015 at 4:48 PM, java8964 <java8...@hotmail.com> wrote:

> The files sound too small to be 2 blocks in HDFS.
>
> Did you set the defaultParallelism to be 3 in your Spark?
>
> Yong
>
> ------------------------------
> Subject: Re: 2 input paths generate 3 partitions
> From: zzh...@hortonworks.com
> To: rvern...@gmail.com
> CC: user@spark.apache.org
> Date: Fri, 27 Mar 2015 23:15:38 +0000
>
> Hi Rares,
>
> The number of partitions is controlled by the HDFS input format, and one
> file may have multiple partitions if it consists of multiple blocks. In
> your case, I think there is one file with 2 splits.
>
> Thanks.
>
> Zhan Zhang
>
> On Mar 27, 2015, at 3:12 PM, Rares Vernica <rvern...@gmail.com> wrote:
>
> Hello,
>
> I am using the Spark shell in Scala on localhost. I am using sc.textFile
> to read a directory. The directory looks like this (generated by another
> Spark script):
>
> part-00000
> part-00001
> _SUCCESS
>
> part-00000 has four short lines of text while part-00001 has two short
> lines of text. The _SUCCESS file is empty. When I check the number of
> partitions on the RDD I get:
>
> scala> foo.partitions.length
> 15/03/27 14:57:31 INFO FileInputFormat: Total input paths to process : 2
> res68: Int = 3
>
> I wonder why the two input files generate three partitions. Does Spark
> check the number of lines in each file and try to generate three balanced
> partitions?
>
> Thanks!
> Rares
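For what it's worth, the count of 3 is consistent with how sc.textFile computes splits rather than with balancing by line count: minPartitions defaults to min(defaultParallelism, 2), and Hadoop's FileInputFormat divides the total input size by that number to get a goal split size, cutting any file that exceeds the goal by more than a 10% slop factor. If part-00000 is roughly twice the size of part-00001, the larger file is cut into 2 splits and the smaller one stays whole, giving 3 partitions. Below is a minimal sketch of that arithmetic; the 40- and 20-byte sizes and the SplitDemo object are hypothetical illustrations, not Spark code:

```scala
// Sketch of FileInputFormat-style split computation (illustrative only).
object SplitDemo {
  // Hadoop allows a split to exceed the goal size by 10% before cutting.
  val SPLIT_SLOP = 1.1

  // Number of splits one file produces for a given goal size.
  def numSplits(fileSize: Long, goalSize: Long): Int = {
    var remaining = fileSize
    var splits = 0
    while (remaining.toDouble / goalSize > SPLIT_SLOP) {
      remaining -= goalSize
      splits += 1
    }
    if (remaining > 0) splits += 1
    splits
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical byte sizes for part-00000 (4 lines) and part-00001 (2 lines).
    val sizes = Seq(40L, 20L)
    // minPartitions defaults to 2 for sc.textFile in the shell.
    val goalSize = sizes.sum / 2 // 30 bytes
    val total = sizes.map(numSplits(_, goalSize)).sum
    println(s"total splits = $total") // 40 -> 2 splits, 20 -> 1 split, so 3
  }
}
```

So no content of the files is inspected; the extra partition comes purely from the size-based split of the larger file.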