Re: Number Of Partitions in RDD

2017-06-23 Thread Vikash Pareek
Local mode



--Vikash Pareek



Re: Number Of Partitions in RDD

2017-06-02 Thread neil90
Cluster mode with HDFS, or local mode?






Re: Number Of Partitions in RDD

2017-06-02 Thread Vikash Pareek
Spark 1.6.1






Re: Number Of Partitions in RDD

2017-06-01 Thread neil90
What version of Spark are you using?






Re: Number Of Partitions in RDD

2017-06-01 Thread Michael Mior
While I'm not sure why you're seeing an increase in partitions with such a
small data file, it's worth noting that the second parameter to textFile is
the *minimum* number of partitions, so there's no guarantee you'll get
exactly that number.
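
For what it's worth, the counts in the transcript below line up with the
split arithmetic in Hadoop's FileInputFormat.getSplits, which textFile
delegates to: the goal size is totalSize divided by the requested number of
partitions (integer division), the file is carved into splits of roughly
that size, and the leftover tail becomes its own split. Below is a minimal
sketch of that arithmetic in Scala; the 52-byte file size is an assumption
based on the quoted file contents, but it reproduces every count in the
transcript.

// A sketch of the split computation in
// org.apache.hadoop.mapred.FileInputFormat.getSplits, which
// sc.textFile uses to decide how many partitions to create.
object SplitMath {
  val totalSize = 52L                // assumed byte size of test.txt
  val blockSize = 32L * 1024 * 1024  // assumed local-FS block size; irrelevant for a tiny file
  val minSize   = 1L                 // default minimum split size

  def numSplits(requested: Int): Int = {
    val goalSize  = totalSize / math.max(requested, 1)   // integer division
    val splitSize = math.max(minSize, math.min(goalSize, blockSize))
    // Carve off full splits while the remainder exceeds SPLIT_SLOP (1.1)
    // times the split size, then emit whatever is left as a final split.
    var remaining = totalSize
    var splits    = 0
    while (remaining.toDouble / splitSize > 1.1) {
      splits += 1
      remaining -= splitSize
    }
    if (remaining != 0) splits += 1
    splits
  }
}

// SplitMath.numSplits(5) == 6, numSplits(9) == 11, numSplits(11) == 13,
// matching res40, res44, and res46 in the transcript below.

Because the tail split may hold only a byte or two, some of the resulting
partitions come out effectively empty. If an exact partition count matters,
repartition after loading, e.g. sc.textFile(path).repartition(4).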

--
Michael Mior
mm...@apache.org

2017-06-01 6:28 GMT-04:00 Vikash Pareek:

> Hi,
>
> I am creating an RDD from a text file by specifying the number of
> partitions, but it gives me a different number of partitions than the one
> I specified.
>
> scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 0)
> people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[72] at textFile at <console>:27
>
> scala> people.getNumPartitions
> res47: Int = 1
>
> scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 1)
> people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[50] at textFile at <console>:27
>
> scala> people.getNumPartitions
> res36: Int = 1
>
> scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 2)
> people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[52] at textFile at <console>:27
>
> scala> people.getNumPartitions
> res37: Int = 2
>
> scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 3)
> people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[54] at textFile at <console>:27
>
> scala> people.getNumPartitions
> res38: Int = 3
>
> scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 4)
> people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[56] at textFile at <console>:27
>
> scala> people.getNumPartitions
> res39: Int = 4
>
> scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 5)
> people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[58] at textFile at <console>:27
>
> scala> people.getNumPartitions
> res40: Int = 6
>
> scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 6)
> people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[60] at textFile at <console>:27
>
> scala> people.getNumPartitions
> res41: Int = 7
>
> scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 7)
> people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[62] at textFile at <console>:27
>
> scala> people.getNumPartitions
> res42: Int = 8
>
> scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 8)
> people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[64] at textFile at <console>:27
>
> scala> people.getNumPartitions
> res43: Int = 9
>
> scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 9)
> people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[66] at textFile at <console>:27
>
> scala> people.getNumPartitions
> res44: Int = 11
>
> scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 10)
> people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[68] at textFile at <console>:27
>
> scala> people.getNumPartitions
> res45: Int = 11
>
> scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 11)
> people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[70] at textFile at <console>:27
>
> scala> people.getNumPartitions
> res46: Int = 13
>
> The contents of the file /home/pvikash/data/test.txt are:
> "
> This is a test file.
> Will be used for rdd partition
> "
>
> I am trying to understand why the number of partitions changes here, and
> why Spark creates empty partitions when the data is small enough to fit
> into a single partition.
>
> Any explanation would be appreciated.
>
> --Vikash