Re: Equally split a RDD partition into two partition at the same node

2017-01-15 Thread Fei Hu
Hi Jasbir, Yes, you are right. Do you have any idea about my question? Thanks, Fei On Mon, Jan 16, 2017 at 12:37 AM, wrote: > Hi, > > > > Coalesce is used to decrease the number of partitions. If you give the > value of numPartitions greater than the current

RE: Equally split a RDD partition into two partition at the same node

2017-01-15 Thread jasbir.sing
Hi, Coalesce is used to decrease the number of partitions. If you pass a numPartitions value greater than the current number of partitions, I don’t think the RDD's number of partitions will be increased. Thanks, Jasbir From: Fei Hu [mailto:hufe...@gmail.com] Sent: Sunday, January 15, 2017 10:10 PM To:

Re: Error at starting Phoenix shell with HBase

2017-01-15 Thread Chetan Khatri
Any updates on the above error, guys? On Fri, Jan 13, 2017 at 9:35 PM, Josh Elser wrote: > (-cc dev@phoenix) > > phoenix-4.8.2-HBase-1.2-server.jar in the top-level binary tarball of > Apache Phoenix 4.8.0 is the jar which is meant to be deployed to all > HBase's classpath.

Re: Equally split a RDD partition into two partition at the same node

2017-01-15 Thread Fei Hu
Hi Liang-Chi, Yes, you are right. I implemented the following solution for this problem, and it works, but I am not sure whether it is efficient: I double the partitions of the parent RDD, and then use the new partitions and the parent RDD to construct the target RDD. In the compute() function of the
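The message above is cut off, so the actual compute() logic is not known. As a rough illustration of the idea it describes (two child partitions per parent partition, both kept on the parent's node), here is a plain-Python model. It is not Spark code: the `Partition` class, the `split_in_two` function, and the assumption that each child takes one half of the parent's records are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Partition:
    """Hypothetical stand-in for an RDD partition (not a Spark class)."""
    index: int
    host: str                 # preferred location of the data
    records: Tuple[int, ...]  # data held by this partition


def split_in_two(parents: List[Partition]) -> List[Partition]:
    """Return two children per parent; each child inherits the parent's host.

    Assumption: the first child takes the first half of the records
    (the larger half when the count is odd), the second child the rest.
    """
    children: List[Partition] = []
    for parent in parents:
        mid = (len(parent.records) + 1) // 2
        for half in (parent.records[:mid], parent.records[mid:]):
            # Children are numbered consecutively in creation order.
            children.append(Partition(len(children), parent.host, half))
    return children


if __name__ == "__main__":
    parents = [
        Partition(0, "node-a", (1, 2, 3)),
        Partition(1, "node-b", (4, 5)),
    ]
    for child in split_in_two(parents):
        print(child)
```

In this model the number of partitions doubles while every record stays on the host of its original partition, which is the locality property asked for in the thread.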

Re: Equally split a RDD partition into two partition at the same node

2017-01-15 Thread Liang-Chi Hsieh
Hi, When calling `coalesce` with `shuffle = false`, it produces at most min(numPartitions, previous RDD's number of partitions) partitions. So I think it can't be used to double the number of partitions. Anastasios Zouzias wrote > Hi Fei, > > Have you tried coalesce(numPartitions: Int,
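The partition-count rule stated in this reply can be written down as a small plain-Python model. This is only a sketch of the upper bound, not Spark code; the function name `max_partitions_after_coalesce` is made up for illustration, and rejecting non-positive requests is an assumption of the model.

```python
def max_partitions_after_coalesce(requested: int, current: int,
                                  shuffle: bool = False) -> int:
    """Upper bound on the partition count after coalesce(requested, shuffle)
    on an RDD that currently has `current` partitions.
    """
    if requested <= 0 or current <= 0:
        # Assumption of this model: partition counts must be positive.
        raise ValueError("partition counts must be positive")
    if shuffle:
        # With a shuffle, data is redistributed into `requested` partitions.
        return requested
    # Without a shuffle, partitions can only be merged, never split.
    return min(requested, current)


if __name__ == "__main__":
    print(max_partitions_after_coalesce(48, 24))                # capped at 24
    print(max_partitions_after_coalesce(12, 24))                # reduced to 12
    print(max_partitions_after_coalesce(48, 24, shuffle=True))  # 48, via shuffle
```

For the 24-partition RDD discussed in this thread, asking for 48 partitions without a shuffle leaves the count at 24. Passing `shuffle = true` would reach 48, but only by moving data between nodes, which gives up the locality the original question wants to keep.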

Re: Limit Query Performance Suggestion

2017-01-15 Thread Liang-Chi Hsieh
Hi Sujith, Thanks for the suggestion. The code you quoted is from `CollectLimitExec`, which will be in the plan if a logical `Limit` is the final operator in a logical plan. But in the physical plan you showed, there are `GlobalLimit` and `LocalLimit` for the logical `Limit` operation, so the

unsubscribe

2017-01-15 Thread Hosun Lee
*unsubscribe*

Re: Equally split a RDD partition into two partition at the same node

2017-01-15 Thread Fei Hu
Hi Anastasios, Thanks for your information. I will look into the CoalescedRDD code. Thanks, Fei On Sun, Jan 15, 2017 at 12:21 PM, Anastasios Zouzias wrote: > Hi Fei, > > I looked at the code of CoalescedRDD and probably what I suggested will > not work. > > Speaking of

Re: Equally split a RDD partition into two partition at the same node

2017-01-15 Thread Anastasios Zouzias
Hi Fei, I looked at the code of CoalescedRDD, and what I suggested probably will not work. Speaking of which, CoalescedRDD is private[spark]. If this were not the case, you could set balanceSlack to 1, and get what you requested, see

Re: Equally split a RDD partition into two partition at the same node

2017-01-15 Thread Fei Hu
Hi Anastasios, Thanks for your reply. If I just double numPartitions, how does coalesce(numPartitions: Int, shuffle: Boolean = false) keep the data locality? Do I need to define my own Partitioner? Thanks, Fei On Sun, Jan 15, 2017 at 3:58 AM, Anastasios Zouzias

Re: Equally split a RDD partition into two partition at the same node

2017-01-15 Thread Fei Hu
Hi Rishi, Thanks for your reply! The RDD has 24 partitions, and the cluster has a master node + 24 computing nodes (12 cores per node). Each node will have one partition, and I want to split each partition into two sub-partitions on the same node to improve the parallelism and achieve high data

unsubscribe

2017-01-15 Thread Boris Lenzinger

Re: Both Spark AM and Client are trying to delete Staging Directory

2017-01-15 Thread Liang-Chi Hsieh
Hi, Will it be a problem if the staging directory is already deleted? Even if the directory doesn't exist, fs.delete(stagingDirPath, true) won't fail; it will just return false. Rostyslav Sotnychenko wrote > Hi all! > > I am a bit confused why Spark AM and Client are both trying to

Re: What about removing TaskContext#getPartitionId?

2017-01-15 Thread Sean Owen
As you mentioned, it's called in ForeachSink. I don't know that the scaladoc is wrong. You're saying something else, that there's no such thing as local execution. I confess I don't know if that's true, but the doc isn't wrong in that case, really. More broadly, I just don't think this type of

Re: Equally split a RDD partition into two partition at the same node

2017-01-15 Thread Anastasios Zouzias
Hi Fei, Have you tried coalesce(numPartitions: Int, shuffle: Boolean = false)? https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L395 coalesce is mostly used for reducing the number of partitions before writing to HDFS, but it might still be a