Hi Yasemin,
We had the same question and found this:
https://issues.apache.org/jira/browse/SPARK-6884
Thanks,
Maximo
On Sep 10, 2015, at 9:09 AM, Yasemin Kaya wrote:
Hi,
I am using the Random Forest algorithm for a recommendation system. I get users
g' DataFrame?
Basically, the equivalent of writing by partition and creating a DataFrame for
each result, but skipping the HDFS step.
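A plain-Python sketch of that idea: split one big collection by a key and build one small in-memory "frame" per key, with no intermediate HDFS write. Lists of dicts stand in for Spark DataFrames here; the function and variable names are hypothetical, not a real Spark API.

```python
# Split one big collection into per-key in-memory "frames", skipping any
# write to disk. Lists of dicts play the role of DataFrames (hypothetical
# stand-in, not Spark itself).
from collections import defaultdict

def split_into_frames(rows, key):
    """Return {key_value: list-of-rows}, one in-memory frame per key value."""
    frames = defaultdict(list)
    for row in rows:
        frames[row[key]].append(row)
    return dict(frames)

rows = [
    {"client": "a", "x": 1.0},
    {"client": "b", "x": 2.0},
    {"client": "a", "x": 3.0},
]
frames = split_into_frames(rows, "client")
# frames["a"] now holds both of client a's rows, with no HDFS round trip
```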
On Tue, Sep 8, 2015 at 10:47 AM, Maximo Gurmendez
<mgurmen...@dataxu.com> wrote:
Hi,
I have an RDD that needs to
ly in
bigRdd)
2) The caching happens in a way that preserves the partitioning by client Id
(and the locality)
Thanks,
Maximo
PS: I am aware that this might cause data imbalance, but I can probably
mitigate that with a smarter partitioner.
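One possible shape for that "smarter partitioner", sketched in plain Python: route each client id to a partition with a stable hash so a client's rows stay co-located, with optional manual placement for known-heavy clients. The class and parameter names are hypothetical, not Spark's `Partitioner` API.

```python
# Hypothetical sketch of a client-id partitioner: stable hashing keeps each
# client's rows together, and an overrides map lets heavy clients be placed
# by hand to fight imbalance.
import zlib

class ClientPartitioner:
    def __init__(self, num_partitions, overrides=None):
        self.num_partitions = num_partitions
        # optional manual placement for known-heavy clients
        self.overrides = overrides or {}

    def partition_for(self, client_id):
        if client_id in self.overrides:
            return self.overrides[client_id]
        # zlib.crc32 is stable across runs, unlike Python's built-in hash()
        return zlib.crc32(client_id.encode("utf-8")) % self.num_partitions

p = ClientPartitioner(4, overrides={"big_client": 0})
```

Because the hash is deterministic, the same client id always lands on the same partition, which is what preserves locality across jobs.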
On Sep 9, 2015, at 9:30 AM, Maximo Gurmendez
<mg
Hi,
I have an RDD that needs to be split (say, by client) in order to train n
models (i.e. one for each client). Since most of the classifiers that come with
MLlib can only accept a single RDD as input (and cannot build multiple models in one
pass, as I understand it), the only way to train n
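The n-passes approach this implies can be sketched in plain Python: filter the big dataset once per client and fit one model per filtered subset. Lists stand in for the RDD, and `fit` is a placeholder for an MLlib classifier; all names here are hypothetical.

```python
# One "filter" pass per client, one model per client (plain-Python sketch of
# the pattern; `fit` is a placeholder, not a real MLlib call).
def fit(rows):
    # placeholder model: just records how many samples it saw
    return {"n": len(rows)}

big_rdd = [("a", [1.0]), ("b", [2.0]), ("a", [3.0])]
clients = {cid for cid, _ in big_rdd}
models = {
    cid: fit([row for c, row in big_rdd if c == cid])  # one pass per client
    for cid in clients
}
```

Each filter scans the full dataset, which is why caching the big RDD first (as discussed above) matters: without it, every per-client pass would recompute the source.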
Hi,
As part of SparkContext.newAPIHadoopRDD(), would Spark support an InputFormat
that uses Hadoop's distributed cache?
Thanks,
Máximo