There is a new API called repartitionAndSortWithinPartitions() in
master; it may help in this case,
but then you need to do the `groupBy()` yourself.
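The suggestion above can be sketched as follows. This is a minimal, hedged example: the key type, data, and partition count are hypothetical, and it assumes an RDD of key/value pairs with an implicit ordering on the key.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

// Minimal sketch of repartitionAndSortWithinPartitions(): it shuffles
// records by the given partitioner and sorts each partition by key, so
// equal keys arrive contiguously and in order -- the "groupBy yourself"
// part is then a streaming pass over each partition.
val conf = new SparkConf().setAppName("sort-within-partitions")
val sc   = new SparkContext(conf)

// Hypothetical sample data: (key, value) pairs.
val records = sc.parallelize(Seq(("b", 2), ("a", 1), ("a", 3), ("b", 4)))

val sorted = records.repartitionAndSortWithinPartitions(new HashPartitioner(4))

// Consume each partition as a sorted stream instead of calling groupBy().
sorted.mapPartitions { iter =>
  iter.map { case (k, v) => s"$k -> $v" } // per-key logic goes here
}
```

Because the sort happens during the shuffle, no single group has to be materialized in memory, which is the advantage over `groupBy()` for large groups.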
On Wed, Oct 8, 2014 at 4:03 PM, chinchu <chinchu@gmail.com> wrote:
Sean,
I am having a similar issue, but I have a lot of data for a group and I
cannot materialize the iterable into a List or Seq in memory (I tried; it
runs into OOM). Is there any other way to do this?
I also tried a secondary sort, with the key holding the group::time, but
the problem with that
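The secondary-sort setup described above (a composite group::time key) can be sketched with a custom partitioner that partitions on the group part only, so one group stays in one partition while the tuple ordering sorts by time within it. The partitioner class, key types, and partition count here are hypothetical.

```scala
import org.apache.spark.Partitioner

// Hypothetical partitioner for a (group, time) composite key: route by
// group only, so all rows of a group land in the same partition.
class GroupPartitioner(partitions: Int) extends Partitioner {
  def numPartitions: Int = partitions
  def getPartition(key: Any): Int = key match {
    case (group: String, _) => math.abs(group.hashCode % partitions)
  }
}

// rdd: RDD[((String, Long), V)] keyed by (group, time)
// val sorted = rdd.repartitionAndSortWithinPartitions(new GroupPartitioner(8))
// sorted.mapPartitions { iter =>
//   // iter streams each group's rows contiguously, in time order, so a
//   // group can be processed element by element without building a Seq.
//   iter
// }
```

The shuffle does the grouping and the within-partition sort does the time ordering, so no group is ever collected into memory at once.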
Hi,
I am using fold(zeroValue)(t1, t2) on the RDD. I noticed that it runs
in parallel on all the partitions and then aggregates the results from the
partitions. My data object is not aggregate-able, and I was wondering if
there's any way to run the fold sequentially. [I am looking to do a foldLeft
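One way to get a strictly sequential foldLeft, sketched below under the assumption that the accumulated result fits on the driver: `toLocalIterator` pulls one partition at a time to the driver, so at most one partition is held in memory. The `zeroValue` and `combine` names are placeholders for the poster's own zero element and non-associative operation.

```scala
// Sequential foldLeft over an RDD by streaming partitions to the driver.
// Elements arrive in partition order; sort the RDD first if a specific
// global order matters.
val result = rdd.toLocalIterator.foldLeft(zeroValue) { (acc, elem) =>
  combine(acc, elem) // placeholder for a non-associative combining step
}
```

The trade-off is that the fold no longer runs in parallel: every element passes through the driver, which is exactly what a true foldLeft requires.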
Thanks Andrew, that helps.
On Fri, Sep 19, 2014 at 5:47 PM, Andrew Or-2 [via Apache Spark User List]
ml-node+s1001560n14708...@n3.nabble.com wrote:
Hey, just a minor clarification: you _can_ use SparkFiles.get in your
application, but only if it runs on the executors, e.g. in the following way:
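The original code example appears to have been cut from the quoted mail, so here is a hedged sketch of what "runs on the executors" means: SparkFiles.get is called inside a closure that Spark ships to the executors, where the `--files` copy has been downloaded. The file name "init.bin" and the `loadModel` helper are hypothetical.

```scala
import org.apache.spark.SparkFiles

// SparkFiles.get resolves the local path of a --files file *on the
// executor* running this closure -- calling it in driver-only code is
// what produces the NPE discussed in this thread.
val initialized = rdd.mapPartitions { iter =>
  val path  = SparkFiles.get("init.bin") // hypothetical file name
  val model = loadModel(path)            // hypothetical deserializer
  iter.map(model.apply)
}
```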
Thanks Andrew.
I understand the problem a little better now. There was a typo in my earlier
mail and a bug in the code (causing the NPE in SparkFiles). I am using
--master yarn-cluster (not local), and in this mode
com.test.batch.modeltrainer.ModelTrainerMain - my main class - will run on the
Thanks Marcelo. The code trying to read the file always runs in the driver. I
understand the problem with other master deployments, but will it work in
local, yarn-client, and yarn-cluster deployments? That's all I care about for
now :-)
Also, what is the suggested way to do something like this? Put the
Hi,
I am running spark-1.1.0 and I want to pass a file (that contains Java-serialized
objects used to initialize my program) to the app's main program. I
am using the --files option, but I am not able to retrieve the file in the
main class; it reports a NullPointerException. [I tried both local
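For the driver-side case raised in this thread, a hedged sketch: in yarn-cluster mode YARN localizes `--files` into the application container's working directory, so the driver may be able to open the file by its bare name instead of going through SparkFiles.get. This is an assumption about YARN file localization, and "init.bin" is a hypothetical name matching whatever was passed to --files.

```scala
import java.io.{FileInputStream, ObjectInputStream}

// Assumed behavior: in yarn-cluster mode, a file passed via
// "--files init.bin" appears under its bare name in the driver
// container's working directory, so plain java.io can read it.
val in    = new ObjectInputStream(new FileInputStream("init.bin"))
val state = in.readObject() // the Java-serialized init objects
in.close()
```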