the meaning of "samplePointsPerPartitionHint" in RangePartitioner

2018-03-20 Thread 1427357...@qq.com
Hi all, the following is the code of RangePartitioner: class RangePartitioner[K : Ordering : ClassTag, V]( partitions: Int, rdd: RDD[_ <: Product2[K, V]], private var ascending: Boolean = true, val samplePointsPerPartitionHint: Int = 20) I feel puzzled ab
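For context, samplePointsPerPartitionHint controls how many keys RangePartitioner samples per output partition when it computes its range boundaries: a larger hint gives boundaries that track the true key distribution more closely, at the cost of a larger sampling job. A minimal sketch of constructing the partitioner directly (the data, hint value, and partition count are illustrative assumptions, not from the thread):

import org.apache.spark.{RangePartitioner, SparkConf, SparkContext}

object SamplePointsHintSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hint-sketch").setMaster("local[*]"))

    // Key-value pairs to be range partitioned; keys 1..100000.
    val pairs = sc.parallelize(1 to 100000).map(k => (k, k.toString))

    // Raising the hint from its default of 20 to 100 samples per output
    // partition makes the computed boundaries more accurate.
    val partitioner = new RangePartitioner(
      partitions = 8,
      rdd = pairs,
      ascending = true,
      samplePointsPerPartitionHint = 100)

    // Per-partition sizes should come out roughly equal.
    val ranged = pairs.partitionBy(partitioner)
    println(ranged.mapPartitions(it => Iterator(it.size)).collect().mkString(", "))

    sc.stop()
  }
}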

Re: how to investigate skew and DataFrames and RangePartitioner

2016-06-14 Thread Takeshi Yamamuro
> I’m trying to figure out the size of the file it’s trying to write out. > > Second, we used to use RDDs and RangePartitioner for task partitioning. > However, I don’t see this available in DataFrames. How does one

how to investigate skew and DataFrames and RangePartitioner

2016-06-13 Thread Peter Halliday
density, we certainly could be hitting the fact that we are getting tiles that are too dense. I’m trying to figure out the size of the file it’s trying to write out. Second, we used to use RDDs and RangePartitioner for task partitioning. However, I don’t see this available in DataFrames. How
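The thread predates it, but later Spark releases (2.3+) expose range partitioning directly on DataFrames via Dataset.repartitionByRange, which uses the same sampling-based approach as the RDD RangePartitioner. A minimal sketch (the column names and data are illustrative):

import org.apache.spark.sql.SparkSession

object DataFrameRangeSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("df-range").master("local[*]").getOrCreate()
    import spark.implicits._

    val df = (1 to 100000).map(i => (i % 1000, s"row-$i")).toDF("key", "value")

    // Range partition by key into 8 partitions (Spark 2.3+).
    val ranged = df.repartitionByRange(8, $"key")

    // A quick skew check: row counts per partition.
    ranged.rdd.mapPartitions(it => Iterator(it.size)).collect().foreach(println)

    spark.stop()
  }
}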

RangePartitioner in Spark 1.2.1

2015-02-17 Thread java8964
Hi, Sparkers: I just happened to search in google for something related to the RangePartitioner of spark, and found an old thread in this email list as here: http://apache-spark-user-list.1001560.n3.nabble.com/RDD-and-Partition-td991.html I followed the code example mentioned in that email thread

Re: RangePartitioner in Spark 1.2.1

2015-02-17 Thread Aaron Davidson
RangePartitioner does not actually provide a guarantee that all partitions will be equal-sized (that is hard), and instead uses sampling to approximate equal buckets. Thus, it is possible that a bucket is left empty. If you want the specified behavior, you should define your own partitioner
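A minimal sketch of that suggestion: a custom Partitioner with fixed, explicit boundaries, so bucket membership is deterministic rather than sampled (the Int key type and the boundary values are illustrative assumptions):

import org.apache.spark.Partitioner

class FixedRangePartitioner(boundaries: Array[Int]) extends Partitioner {

  // One partition per interval below each boundary, plus a final partition
  // for keys at or above the last boundary.
  override def numPartitions: Int = boundaries.length + 1

  override def getPartition(key: Any): Int = {
    val k = key.asInstanceOf[Int]
    // Index of the first boundary strictly greater than the key; a linear
    // scan is fine for a handful of boundaries (use binary search for many).
    val idx = boundaries.indexWhere(k < _)
    if (idx < 0) boundaries.length else idx
  }
}

For example, pairs.partitionBy(new FixedRangePartitioner(Array(100, 200, 300))) sends keys below 100 to partition 0, keys 100-199 to partition 1, 200-299 to partition 2, and everything else to partition 3; no bucket depends on sampling, so none is left empty unless the data itself contains no keys in that range.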

Re: Spark job stuck at RangePartitioner at Exchange.scala:79

2015-01-21 Thread Sunita Arvind
spark.SparkContext: Starting job: RangePartitioner at Exchange.scala:79 A bit of background which may or may not be relevant: the program was working fine in Eclipse; however, it was hanging upon submission to the cluster. In an attempt to debug, I changed the version in build.sbt to match the one
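A hypothetical build.sbt fragment illustrating that debugging step: pin the Spark dependency to the exact version running on the cluster (the 1.2.1 and Scala version numbers here are assumptions) and mark it "provided" so the cluster's own jars are used at runtime:

// Match the Scala and Spark versions deployed on the cluster.
scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.2.1" % "provided",
  "org.apache.spark" %% "spark-sql"  % "1.2.1" % "provided"
)

A driver built against a different Spark version than the cluster runs can misbehave only after submission, which is why matching the versions is a standard first check when a job works locally but not on the cluster.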

Re: RangePartitioner

2015-01-21 Thread Sandy Ryza
proceeds. What might be the issue and a possible solution? INFO SparkContext: Starting job: RangePartitioner at Exchange.scala:79 Table 1 has 450 columns, Table 2 has 100 columns, and both tables have a few million rows. val table1 = myTable1.as('table1) val table2 = myTable2

RangePartitioner

2015-01-20 Thread Rishi Yadav
I am joining two tables as below; the program stalls at the log line below and never proceeds. What might be the issue and a possible solution? INFO SparkContext: Starting job: RangePartitioner at Exchange.scala:79 Table 1 has 450 columns, Table 2 has 100 columns, and both tables have a few million rows. val table1 = myTable1.as('table1) val table2 = myTable2
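A hedged reconstruction of the scenario (the original code is truncated above and used the Spark 1.2 SchemaRDD DSL; the data sources, paths, and the id join key here are assumptions, written against the modern DataFrame API for clarity):

import org.apache.spark.sql.SparkSession

object JoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("join-sketch").master("local[*]").getOrCreate()

    val table1 = spark.read.parquet("/path/to/table1").as("table1")
    val table2 = spark.read.parquet("/path/to/table2").as("table2")

    // An equi-join followed by a global sort; the sort is one common way a
    // range-partitioned Exchange ("RangePartitioner at Exchange.scala" in
    // the log) shows up in the physical plan.
    val joined = table1
      .join(table2, table1("id") === table2("id"))
      .orderBy(table1("id"))

    joined.write.parquet("/path/to/output")
    spark.stop()
  }
}

If a job stalls at that Exchange, the usual suspects are key skew (a few very hot join or sort keys) and too few shuffle partitions.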

Spark job stuck at RangePartitioner at Exchange.scala:79

2015-01-17 Thread Sunita Arvind
MB) 15/01/17 11:44:16 INFO spark.SparkContext: Starting job: RangePartitioner at Exchange.scala:79 A bit of background which may or may not be relevant: the program was working fine in Eclipse; however, it was hanging upon submission to the cluster. In an attempt to debug, I changed the version in build.sbt to match the one