Hi All,
Thanks for your answers.
I have one more detail to point out.
It is clear now how the partition number is defined for an HDFS file.
However, suppose my dataset is replicated on all the machines at the same
absolute path, and each machine has, for instance, an ext3 filesystem.
If I load
What file system are you using ?
If you use HDFS, the documentation you cited is pretty clear on how
partitions are determined.
bq. file X replicated on 4 machines
I don't think replication factor plays a role w.r.t. partitions.
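To illustrate that point: Hadoop's input-split logic derives splits from the file length and block size alone, so the replication factor never enters the count. A minimal sketch below (a simplified model for illustration, not Hadoop's actual code):

```python
# Simplified model of input-split computation: only file length and
# block size matter; the replication factor appears nowhere.

def compute_splits(file_size, block_size):
    """Return (offset, length) pairs, one split per block."""
    splits = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        splits.append((offset, length))
        offset += length
    return splits

# A 200 MB file with 64 MB blocks yields 4 splits, no matter how many
# replicas of each block exist in the cluster.
mb = 1024 * 1024
print(len(compute_splits(200 * mb, 64 * mb)))  # 4
```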
On Thu, Feb 19, 2015 at 8:05 AM, Alessandro Lulli
By default you will have (file size in MB / 64) partitions. You can also set
the number of partitions when you read in a file with sc.textFile via its
optional second parameter.
On Thu, Feb 19, 2015 at 8:07 AM Alessandro Lulli lu...@di.unipi.it wrote:
Hi All,
Could you please help me
bq. *blocks being 64MB by default in HDFS*
*In Hadoop 2.1+, the default block size has been increased to 128 MB.*
See https://issues.apache.org/jira/browse/HDFS-4053
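To see why that default matters for the partition count: the same file produces half as many default partitions under the newer 128 MB block size. A quick illustration with a hypothetical 1 GB file:

```python
import math

file_size_mb = 1024  # a hypothetical 1 GB file
for block_size_mb in (64, 128):
    parts = math.ceil(file_size_mb / block_size_mb)
    print(f"{block_size_mb} MB blocks -> {parts} partitions")
```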
Cheers
On Thu, Feb 19, 2015 at 8:32 AM, Ted Yu yuzhih...@gmail.com wrote:
Cc: Massimiliano Bertolucci
Subject: Re: RDD Partition number