Hi Jasbir,
Yes, you are right. Do you have any thoughts on my question?
Thanks,
Fei
On Mon, Jan 16, 2017 at 12:37 AM, wrote:
> Hi,
>
>
> Coalesce is used to decrease the number of partitions. If you give a
> numPartitions value greater than the current number of partitions, I don't
> think the RDD's partition count will increase.
Hi,
Coalesce is used to decrease the number of partitions. If you give a
numPartitions value greater than the current number of partitions, I don't
think the RDD's partition count will increase.
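For example (a quick sketch in spark-shell, assuming a SparkContext sc):

val rdd = sc.parallelize(1 to 100, 4)
rdd.coalesce(8).getNumPartitions  // still 4: shuffle defaults to false, so no increase
rdd.coalesce(2).getNumPartitions  // 2: decreasing the partition count works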
Thanks,
Jasbir
From: Fei Hu [mailto:hufe...@gmail.com]
Sent: Sunday, January 15, 2017 10:10 PM
To:
Any updates on the above error, guys?
On Fri, Jan 13, 2017 at 9:35 PM, Josh Elser wrote:
> (-cc dev@phoenix)
>
> phoenix-4.8.2-HBase-1.2-server.jar in the top-level binary tarball of
> Apache Phoenix 4.8.0 is the jar that is meant to be deployed onto the
> classpath of all HBase nodes.
Hi Liang-Chi,
Yes, you are right. I implemented the following solution for this problem,
and it works, but I am not sure whether it is efficient:
I doubled the number of partitions of the parent RDD, and then used the new
partitions and the parent RDD to construct the target RDD. In the compute()
function of the
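Roughly, the idea looks like this (a simplified sketch, not my exact code;
all class names are made up, and note that each sub-partition task still
scans its whole parent partition):

import scala.reflect.ClassTag
import org.apache.spark.{NarrowDependency, Partition, TaskContext}
import org.apache.spark.rdd.RDD

class SplitPartition(val index: Int, val parent: Partition) extends Partition

class SplitRDD[T: ClassTag](parent: RDD[T])
  extends RDD[T](parent.context, Seq(new NarrowDependency[T](parent) {
    // child partition i depends only on parent partition i / 2
    override def getParents(partitionId: Int): Seq[Int] = Seq(partitionId / 2)
  })) {

  // two child partitions per parent partition
  override def getPartitions: Array[Partition] =
    parent.partitions.flatMap { p =>
      Seq(new SplitPartition(2 * p.index, p), new SplitPartition(2 * p.index + 1, p))
    }

  override def compute(split: Partition, context: TaskContext): Iterator[T] = {
    val sp = split.asInstanceOf[SplitPartition]
    // each of the two children keeps every other element of the parent
    // partition, so together they cover it exactly once
    parent.iterator(sp.parent, context).zipWithIndex
      .collect { case (v, i) if i % 2 == sp.index % 2 => v }
  }

  // keep the parent's preferred locations so both halves stay on the same node
  override protected def getPreferredLocations(split: Partition): Seq[String] =
    parent.preferredLocations(split.asInstanceOf[SplitPartition].parent)
}

It is used as new SplitRDD(parentRdd); whether it beats a plain repartition
depends on how expensive the extra scan of each parent partition is.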
Hi,
When calling `coalesce` with `shuffle = false`, it produces at most
min(numPartitions, the previous RDD's number of partitions) partitions. So I
think it can't be used to double the number of partitions.
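For example (a quick spark-shell sketch, assuming a SparkContext sc):

val rdd = sc.parallelize(1 to 100, 4)
rdd.coalesce(8, shuffle = false).getNumPartitions  // 4 = min(8, 4): no increase
rdd.coalesce(8, shuffle = true).getNumPartitions   // 8, at the cost of a full shuffle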
Anastasios Zouzias wrote
> Hi Fei,
>
> Have you tried coalesce(numPartitions: Int, shuffle: Boolean = false)?
Hi Sujith,
Thanks for the suggestion.
The code you quoted is from `CollectLimitExec`, which will be in the plan
if a logical `Limit` is the final operator in a logical plan. But in the
physical plan you showed, there are `GlobalLimit` and `LocalLimit` for the
logical `Limit` operation, so the
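For example, in a Spark 2.x shell (a sketch; the plan output is paraphrased
from memory, so treat it as approximate):

spark.range(100).limit(10).explain()
// Limit is the final operator here, so the physical plan uses CollectLimit
spark.range(100).limit(10).distinct().explain()
// Limit is no longer final, so the plan contains a GlobalLimit / LocalLimit pair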
Hi Anastasios,
Thanks for the information. I will look into the CoalescedRDD code.
Thanks,
Fei
On Sun, Jan 15, 2017 at 12:21 PM, Anastasios Zouzias wrote:
> Hi Fei,
>
> I looked at the code of CoalescedRDD and probably what I suggested will
> not work.
>
> Speaking of which, CoalescedRDD is private[spark].
Hi Fei,
I looked at the code of CoalescedRDD and probably what I suggested will not
work.
Speaking of which, CoalescedRDD is private[spark]. If this were not the
case, you could set balanceSlack to 1 and get what you requested; see
Hi Anastasios,
Thanks for your reply. If I just increase numPartitions to twice the current
number, how does coalesce(numPartitions: Int, shuffle: Boolean = false) keep
data locality? Do I need to define my own Partitioner?
Thanks,
Fei
On Sun, Jan 15, 2017 at 3:58 AM, Anastasios Zouzias
Hi Rishi,
Thanks for your reply! The RDD has 24 partitions, and the cluster has a
master node plus 24 compute nodes (12 cores per node). Each node will hold
one partition, and I want to split each partition into two sub-partitions on
the same node to improve parallelism and achieve high data locality.
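For reference (a sketch, where rdd stands for the RDD in question):

rdd.getNumPartitions   // 24: at most 24 of the 24 nodes x 12 cores = 288 cores run tasks at once
sc.defaultParallelism  // usually the total number of executor cores available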
Hi,
Will it be a problem if the staging directory has already been deleted? Even
if the directory doesn't exist, fs.delete(stagingDirPath, true) won't fail;
it just returns false.
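A minimal sketch of what I mean (the path is hypothetical):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(new Configuration())
val stagingDirPath = new Path("/tmp/spark-staging-example")
// recursive delete; returns false (rather than throwing) when the path is absent
val deleted = fs.delete(stagingDirPath, true)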
Rostyslav Sotnychenko wrote
> Hi all!
>
> I am a bit confused why Spark AM and Client are both trying to
As you mentioned, it's called in ForeachSink. I don't know that the
scaladoc is wrong. You're saying something else: that there's no such thing
as local execution. I confess I don't know whether that's true, but the doc
isn't wrong in that case, really.
More broadly, I just don't think this type of
Hi Fei,
Have you tried coalesce(numPartitions: Int, shuffle: Boolean = false)?
https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L395
coalesce is mostly used for reducing the number of partitions before
writing to HDFS, but it might still be a
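e.g., the usual pattern (the output path is just a placeholder):

rdd.coalesce(1).saveAsTextFile("hdfs:///tmp/output")  // one partition => one output file
// unlike repartition, coalesce with the default shuffle = false avoids a full shuffle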