, 2017 at 2:07 PM, Pradeep Gollakota <pradeep...@gmail.com>
wrote:
> Usually this kind of thing can be done at a lower level in the InputFormat,
> typically by specifying the max split size. Have you looked into that
> possibility with your InputFormat?
>
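A minimal sketch of what that could look like, assuming the data is read
through a Hadoop FileInputFormat and that sc is the SparkContext; the 64 MB
cap and the input path are placeholders:

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

// Cap the Hadoop input split size so the InputFormat produces more, smaller
// splits, and therefore more RDD partitions.
sc.hadoopConfiguration.setLong(
  "mapreduce.input.fileinputformat.split.maxsize", 64L * 1024 * 1024)

val rdd = sc.newAPIHadoopFile(
  "hdfs:///path/to/input",   // placeholder path
  classOf[TextInputFormat],
  classOf[LongWritable],
  classOf[Text])

println(rdd.partitions.length)   // grows as the max split size shrinks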
> On Sun, Jan 15, 2017 at
greater than the current partition, I don’t think
> the RDD’s number of partitions will be increased.
>
>
>
> Thanks,
>
> Jasbir
>
>
>
> *From:* Fei Hu [mailto:hufe...@gmail.com]
> *Sent:* Sunday, January 15, 2017 10:10 PM
> *To:* zouz...@cs.toronto.edu
> *Cc:* user @sp
you could try to use the CoalescedRDD code to implement your
> requirement.
>
> Good luck!
> Cheers,
> Anastasios
>
>
> On Sun, Jan 15, 2017 at 5:39 PM, Fei Hu <hufe...@gmail.com> wrote:
>
>> Hi Anastasios,
>>
>> Thanks for your reply. If I ju
of partitions before
> writing to HDFS, but it might still be a narrow dependency (satisfying your
> requirements) if you increase the # of partitions.
>
> Best,
> Anastasios
>
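For reference, a small sketch of the built-in options, assuming rdd is an
existing RDD: repartition always shuffles (a wide dependency), while coalesce
without shuffle keeps a narrow dependency but can only merge partitions, not
split them, which is why splitting one partition into two needs custom,
CoalescedRDD-style code.

// Narrow dependency, but it can only reduce the number of partitions; asking
// for more partitions than the parent has (without shuffle) leaves the count
// unchanged.
val merged = rdd.coalesce(rdd.partitions.length / 2, shuffle = false)

// Increases the number of partitions, but through a full shuffle (wide dependency).
val doubled = rdd.repartition(rdd.partitions.length * 2)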
> On Sun, Jan 15, 2017 at 12:58 AM, Fei Hu <hufe...@gmail.com> wrote:
>
>> Dear all,
locality.
Thanks,
Fei
On Sun, Jan 15, 2017 at 2:33 AM, Rishi Yadav <ri...@infoobjects.com> wrote:
> Can you provide some more details:
> 1. How many partitions does the RDD have?
> 2. How big is the cluster?
> On Sat, Jan 14, 2017 at 3:59 PM Fei Hu <hufe...@gmail.com> wrote:
>
Dear all,
I want to equally divide an RDD partition into two partitions. That means
the first half of the elements in the partition will create a new partition,
and the second half of the elements will generate another new partition. But
the two new partitions are required to be on the same node as the original
partition, to preserve data locality.
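For illustration only, here is a rough sketch of the kind of custom RDD
discussed in this thread, loosely modelled on the CoalescedRDD idea rather
than on any actual Spark code; all class names are invented. Each child
partition reads one half of a parent partition through a NarrowDependency and
reuses the parent's preferred locations. Note that it buffers the parent
partition in memory and recomputes it once per child unless the parent is
cached.

import scala.reflect.ClassTag
import org.apache.spark.{NarrowDependency, Partition, TaskContext}
import org.apache.spark.rdd.RDD

// One child partition = one half of one parent partition.
case class HalfPartition(index: Int, parentIndex: Int, firstHalf: Boolean)
  extends Partition

class SplitEachPartitionRDD[T: ClassTag](parent: RDD[T])
  extends RDD[T](parent.context, Seq(new NarrowDependency[T](parent) {
    // Child partition i depends only on parent partition i / 2.
    override def getParents(partitionId: Int): Seq[Int] = Seq(partitionId / 2)
  })) {

  override protected def getPartitions: Array[Partition] =
    Array.tabulate[Partition](parent.partitions.length * 2) { i =>
      HalfPartition(i, i / 2, firstHalf = i % 2 == 0)
    }

  override def compute(split: Partition, context: TaskContext): Iterator[T] = {
    val p = split.asInstanceOf[HalfPartition]
    // Buffer the parent partition so it can be cut exactly in half.
    val elems = parent.iterator(parent.partitions(p.parentIndex), context).toArray
    val half = elems.length / 2
    (if (p.firstHalf) elems.take(half) else elems.drop(half)).iterator
  }

  // Keep each half on the same node(s) as its parent partition.
  override protected def getPreferredLocations(split: Partition): Seq[String] =
    parent.preferredLocations(
      parent.partitions(split.asInstanceOf[HalfPartition].parentIndex))
}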
> You can’t call runJob inside getPreferredLocations().
> You can take a look at the source code of HadoopRDD to help you implement
> getPreferredLocations()
> appropriately.
>
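A rough sketch of that suggestion, with invented names: HadoopRDD derives its
locations from the InputSplit metadata stored in each partition, so the
analogous approach for a custom RDD is to capture the locations on the driver
while the partitions are being built, and have getPreferredLocations() simply
read that stored metadata instead of calling runJob.

import scala.reflect.ClassTag
import org.apache.spark.{Partition, TaskContext}
import org.apache.spark.rdd.RDD

// Partition that carries its own preferred hosts, similar to how
// HadoopRDD's partitions carry their InputSplit.
case class LocatedPartition(index: Int, hosts: Seq[String]) extends Partition

class LocalityAwareRDD[T: ClassTag](parent: RDD[T]) extends RDD[T](parent) {

  override protected def getPartitions: Array[Partition] =
    parent.partitions.map { p =>
      // preferredLocations() is driver-side metadata; no job is launched here.
      LocatedPartition(p.index, parent.preferredLocations(p)): Partition
    }

  override def compute(split: Partition, context: TaskContext): Iterator[T] =
    parent.iterator(parent.partitions(split.index), context)

  override protected def getPreferredLocations(split: Partition): Seq[String] =
    split.asInstanceOf[LocatedPartition].hosts
}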
> On Dec 31, 2016, at 09:48, Fei Hu <hufe...@gmail.com> wrote:
>
> That is a good idea.
>
Dear all,
I tried to customize my own RDD. In the getPreferredLocations() function, I
used the following code to query another RDD, which was used as an input to
initialize this customized RDD:
val results: Array[Array[DataChunkPartition]] =
  context.runJob(partitionsRDD,
class of RDD and override the
> getPreferredLocations() method to implement the logic for dynamically
> changing the locations.
> > On Dec 30, 2016, at 12:06, Fei Hu <hufe...@gmail.com> wrote:
> >
> > Dear all,
> >
> > Is there any way to change the host location fo
Dear all,
Is there any way to change the host location for a certain partition of an RDD?
"protected def getPreferredLocations(split: Partition)" can be used to
initialize the locations, but how can they be changed after the initialization?
Thanks,
Fei
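A very rough sketch of that idea, with invented names; this is not an
official API pattern. getPreferredLocations() is consulted on the driver when
tasks are scheduled, so one way to "change" the locations after construction
is to have it read from driver-side state that can be updated between actions.

import scala.collection.mutable
import scala.reflect.ClassTag
import org.apache.spark.{Partition, TaskContext}
import org.apache.spark.rdd.RDD

class RelocatableRDD[T: ClassTag](parent: RDD[T]) extends RDD[T](parent) {

  // Driver-side, mutable: partition index -> preferred hosts.
  val locationOverrides = mutable.Map.empty[Int, Seq[String]]

  override protected def getPartitions: Array[Partition] = parent.partitions

  override def compute(split: Partition, context: TaskContext): Iterator[T] =
    parent.iterator(split, context)

  override protected def getPreferredLocations(split: Partition): Seq[String] =
    locationOverrides.getOrElse(split.index, parent.preferredLocations(split))
}

// Usage (placeholder host name): update the map before triggering the next action.
// val rdd = new RelocatableRDD(someRdd)
// rdd.locationOverrides(0) = Seq("worker-node-07")
// rdd.count()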
Hi All,
I am running some Spark Scala code on Zeppelin on CDH 5.5.1 (Spark version
1.5.0). I customized the Spark interpreter to use
org.apache.spark.serializer.KryoSerializer as spark.serializer, and in the
dependencies I added Kryo 3.0.3 as follows:
com.esotericsoftware:kryo:3.0.3
When I
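For comparison, here is the equivalent configuration done in application code
rather than through the interpreter settings; a sketch only, with placeholder
class names. Note that Spark 1.5 already bundles its own (older) Kryo, so
adding Kryo 3.0.3 as a separate dependency can lead to version conflicts.

import org.apache.spark.{SparkConf, SparkContext}

// Placeholder classes standing in for whatever actually gets serialized.
case class MyKey(id: Long)
case class MyValue(payload: String)

val conf = new SparkConf()
  .setAppName("kryo-example")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // Registering classes avoids writing full class names into every record.
  .registerKryoClasses(Array(classOf[MyKey], classOf[MyValue]))

val sc = new SparkContext(conf)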
Dear all,
I have a question about how to measure the runtime of a Spark application.
Here is an example:
- On the Spark UI: the total duration time is 2.0 minutes = 120 seconds,
as shown below:
[image: Screen Shot 2016-07-09 at 11.45.44 PM.png]
- However, when I check the jobs launched
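As a general note (not an explanation of this particular screenshot): the
UI's total duration covers the whole application, including driver-side time
between jobs, while summing the individual job durations only counts time
spent inside jobs, so the two numbers rarely match. One simple way to measure
end-to-end wall-clock time for a piece of work is to time it on the driver;
a sketch below, with rdd as a placeholder:

// Wall-clock timing of a single action, measured on the driver.
val start = System.nanoTime()
val n = rdd.count()   // placeholder action
val elapsedSeconds = (System.nanoTime() - start) / 1e9
println(f"count() returned $n in $elapsedSeconds%.1f s")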
deploy/SparkSubmit.scala
>
> <https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala>
>
> What's your purpose for using "java -cp"? For local development, the IDE
> should be sufficient.
>
>
>
>
Hi,
I am wondering how to run a Spark job with the plain java command, such as:
java -cp spark.jar mainclass. When running/debugging the Spark program in
IntelliJ IDEA, it uses the java command to run the Spark main class, so I think
it should be possible to run the Spark job with the java command besides the
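If the goal really is plain "java -cp", the main practical difference from
spark-submit is that nothing injects spark.master and the other properties for
you, so they have to come from the code (or -D system properties), and the
Spark jars must be on the classpath. A sketch under those assumptions, with
placeholder names and paths:

import org.apache.spark.{SparkConf, SparkContext}

object MainClass {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("java-cp-example")   // placeholder name
      .setMaster("local[*]")           // spark-submit would normally supply this
    val sc = new SparkContext(conf)
    println(sc.parallelize(1 to 100).sum())
    sc.stop()
  }
}

// Launched roughly like (paths are placeholders):
//   java -cp spark-assembly.jar:myapp.jar MainClass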
Hi,
I am trying to migrate a program from Hadoop to Spark, but I ran into a problem
with serialization. In the Hadoop program, the key and value classes implement
org.apache.hadoop.io.WritableComparable, which handles the serialization. Now
in the Spark program, I used newAPIHadoopRDD to
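A common gotcha in exactly this situation, sketched below with standard
Hadoop Writable types and a placeholder path: Writable objects are not
java.io.Serializable and Hadoop's record readers reuse them, so pairs coming
out of newAPIHadoopFile/newAPIHadoopRDD are usually copied into plain
serializable types right away (or a Kryo serializer is registered for the
custom Writable classes).

import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat

val raw = sc.newAPIHadoopFile(
  "hdfs:///path/to/input",   // placeholder path
  classOf[SequenceFileInputFormat[Text, IntWritable]],
  classOf[Text],
  classOf[IntWritable])

// Copy out of the reused Writable objects into plain, serializable values
// before caching, shuffling, or collecting.
val usable = raw.map { case (k, v) => (k.toString, v.get) }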