e.org/mod_mbox/spark-user/201501.mbox/%3ccaae1cqr8rd8ypebcmbjwfhm+lxh6nw4+r+uharx00psk_sh...@mail.gmail.com%3E
>>>
>>> http://apache-spark-user-list.1001560.n3.nabble.com/Partition-sorting-by-Spark-framework-td18213.html
>>>
>>> http://apache-spark-user-list.1001560.n3.nabble.com/Alternatives-to-groupByKey-td20293.html
>>>
>>> And this Jira seems relevant too:
>>> https://issues.apache.org/jira/browse/SPARK-3655
>>>
>>> The amount of memory that I'm using is 2g per executor, and I can't go
>>> higher than that because each executor gets a YARN container from nodes
>>> with 16 GB of RAM and 5 YARN containers allowed per node.
>>>
>>> So I'd like to know if there's an easy solution to executing my logic on
>>> my full dataset in Spark.
>>>
>>> Thanks!
>>>
>>> -- Elango
>>>
>>
>>
>
--
Alexis GILLAIN
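A back-of-the-envelope sketch of the sizing constraint described above: 16 GB nodes with 5 YARN containers each leave roughly 2 GB per executor, so the dataset has to be split into enough partitions that each one fits in an executor's usable heap. The 50 GB dataset size and the 0.6 usable-heap fraction below are assumed values for illustration only, not figures from the thread.

```python
import math

def min_partitions(dataset_gb, executor_mem_gb, usable_fraction=0.6):
    """Smallest partition count such that one partition fits in the
    usable part of a single executor's heap (assuming one task
    processes one partition at a time)."""
    usable_gb = executor_mem_gb * usable_fraction
    return max(1, math.ceil(dataset_gb / usable_gb))

# A hypothetical 50 GB dataset on 2 GB executors (~1.2 GB usable each):
print(min_partitions(50, 2))   # 42 partitions at minimum
```

With numbers like these, raising the partition count (so each task handles a smaller slice) is usually the practical alternative to raising executor memory, which the YARN container limit rules out here.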
ache.spark.SparkContext.clean(SparkContext.scala:1893)
>> org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:311)
>> org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:310)
>>
>> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
>>
>> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
>> org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
>> org.apache.spark.rdd.RDD.filter(RDD.scala:310)
>> cmd6$$user$$anonfun$3.apply(Main.scala:134)
>> cmd6$$user$$anonfun$3.apply(Main.scala:133)
>>
>> Thanks,
>> Balaji
>>
>
--
Alexis GILLAIN
e
> (not memory space), GC does not run; therefore the finalize() methods for
> the intermediate RDDs are not triggered.
>
>
> 2. System.gc() is only executed on the driver, not on the workers (is that
> how it works?!)
>
> Any suggestions?
>
> Kind regards
> Ali H
ng the
> intermediate data from the previous iteration. Anyway, why does it keep
> the intermediate data for ALL previous iterations?
> How can we enforce Spark to clear this intermediate data *during* the
> execution of the job?
>
> Kind regards,
> Ali hadian
>
>
--
Alexis GILLAIN
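Since a driver-side System.gc() does not reliably free cached blocks on the executors, the usual hand-rolled fix for iterative jobs is to unpersist the previous iteration's RDD explicitly as soon as the next one is materialized. A plain-Python sketch of that bookkeeping (FakeRDD and its `cached` set are illustrative stand-ins, not Spark's RDD class):

```python
# Keep only the latest iteration cached: unpersist the previous RDD
# right after deriving the new one from it.
class FakeRDD:
    cached = set()                      # simulates executor block storage
    def __init__(self, data):
        self.data = data
    def cache(self):
        FakeRDD.cached.add(id(self))
        return self
    def unpersist(self):
        FakeRDD.cached.discard(id(self))
    def map_values(self, f):
        return FakeRDD([f(x) for x in self.data])

rdd = FakeRDD([1, 2, 3]).cache()
for _ in range(10):
    new_rdd = rdd.map_values(lambda x: x + 1).cache()
    rdd.unpersist()                     # drop the previous iteration NOW
    rdd = new_rdd

print(len(FakeRDD.cached))              # 1: only the latest iteration stays
print(rdd.data)                         # [11, 12, 13]
```

Without the `unpersist()` call inside the loop, the set would hold all 11 entries, which is the "intermediate data for ALL previous iterations" behavior complained about above.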
e question:
> How to set a decent number of partitions, if it need not be equal to
> the number of keys?
>
> On Sep 15, 2015, at 3:41 PM, Alexis Gillain <alexis.gill...@googlemail.com> wrote:
>
> Sorry, I made a typo in my previous message; you can't
> sortByKey(youkey,
on? If it is the former, the combOp function
> does nothing!
>
> I tried passing the number of keys as the second "numPartitions" parameter,
> but the number of keys is so large that all the tasks get killed.
>
>
> What should I do with this case ?
>
> I'm asking for advice online...
>
> Thank you.
>
--
Alexis GILLAIN
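On the question above: the partition count does not need to match the key count, because a hash partitioner folds an arbitrarily large key space into a small, fixed number of partitions while still sending every occurrence of a given key to exactly one partition. A minimal pure-Python sketch of that mapping (the key names and partition count are made up for illustration):

```python
# Many keys, few partitions: hash(key) % n assigns each key wholly to
# one partition, so numPartitions can stay small and fixed.
NUM_PARTITIONS = 8                      # independent of the key count

def partition_of(key, n=NUM_PARTITIONS):
    return hash(key) % n

keys = [f"key-{i}" for i in range(10_000)]
buckets = {}
for k in keys:
    buckets.setdefault(partition_of(k), []).append(k)

print(len(buckets) <= NUM_PARTITIONS)               # True
print(sum(len(v) for v in buckets.values()))        # 10000, nothing lost
```

This is why passing the key count as `numPartitions` launches one task per key and overwhelms the scheduler: a few hundred partitions, each holding many keys, is the intended usage.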
t I find
> http://spark.apache.org/docs/latest/mllib-classification-regression.html,
> it is not what I mean. Is there a way to use multilabel classification?
> Thanks a lot.
>
> Best,
> yasemin
>
> --
> hiç ender hiç
>
--
Alexis GILLAIN
anbo Liang <yblia...@gmail.com>:
> LogisticRegression in the MLlib (not ML) package supports both multiclass and
> multilabel classification.
>
>
> 2015-09-11 16:21 GMT+08:00 Alexis Gillain <alexis.gill...@googlemail.com>:
>
>> You can try these packages for
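For context on the multilabel question: one common way to get multilabel predictions out of binary classifiers is one-vs-rest with a per-label threshold, where a sample receives every label whose score clears the cutoff. The sketch below uses toy keyword scorers as stand-ins for trained models; the label names, scorers, and threshold are all assumptions for illustration, not MLlib's API.

```python
# Hypothetical one-vs-rest multilabel scheme: one independent binary
# scorer per label; a sample gets every label scoring >= threshold.
def make_scorer(keyword):
    return lambda text: 1.0 if keyword in text else 0.0

label_scorers = {
    "sports": make_scorer("match"),
    "finance": make_scorer("stock"),
    "tech": make_scorer("spark"),
}

def predict_labels(text, threshold=0.5):
    return sorted(lab for lab, score in label_scorers.items()
                  if score(text) >= threshold)

print(predict_labels("the stock rallied after the spark release"))
# ['finance', 'tech'] -- zero, one, or many labels per sample
```

The key difference from multiclass: the per-label decisions are independent, so the result is a set of labels rather than a single class.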
; line browser to look at the webui (I cannot access the server in graphical
> display mode), this should help me understand what's going on. I will also
> try the workarounds mentioned in the link. Keep you posted.
>
> Again, thanks a lot!
>
> Best,
>
> Aurelien
>
>
t; Cloudera Manager) *besides* the checkpoint files (which are regular HDFS
> files), and the application eventually runs out of disk space. The same is
> true even if I checkpoint at every iteration.
>
> What am I doing wrong? Maybe some garbage collector setting?
>
> Thanks a lot for the help,
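When old checkpoint data is not cleaned up automatically, one manual workaround is to prune all but the most recent checkpoint directory between iterations. The sketch below simulates that with a temp directory; the `checkpoint-NNNN` naming scheme is an assumption for illustration (real checkpoint directories live under whatever path was given to setCheckpointDir).

```python
import os
import shutil
import tempfile

# Simulate 5 checkpointed iterations, then keep only the newest one.
root = tempfile.mkdtemp()
for i in range(5):
    os.mkdir(os.path.join(root, f"checkpoint-{i:04d}"))

dirs = sorted(os.listdir(root))
for stale in dirs[:-1]:                 # everything but the newest
    shutil.rmtree(os.path.join(root, stale))

print(os.listdir(root))                 # ['checkpoint-0004']
```

Pruning only helps with checkpoint files, of course; shuffle and cache spill files filling local disks are a separate problem.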
Feynman Liang fli...@databricks.com:
CCing the mailing list again.
It's currently not on the radar. Do you have a use case for it? I can
bring it up during 1.6 roadmap planning tomorrow.
On Mon, Aug 24, 2015 at 8:28 PM, alexis GILLAIN ila...@hotmail.com
wrote:
Hi,
I just realized
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
--
Alexis GILLAIN
Hi Aurelien,
The first code should create a new RDD in memory at each iteration (check
the webui).
The second code will unpersist the RDD but that's not the main problem.
I think you have trouble due to long lineage, as .cache() keeps track of
the lineage for recovery.
You should have a look at
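The lineage problem mentioned above can be pictured with a toy model: each transformation records its parent, so the recovery chain grows by one link per iteration, while a checkpoint (simulated here by dropping the parent pointer) truncates it. This is an illustration of the idea, not Spark's actual RDD class.

```python
# Toy lineage model: depth() is the length of the recovery chain.
class Node:
    def __init__(self, parent=None):
        self.parent = parent
    def depth(self):
        return 0 if self.parent is None else 1 + self.parent.depth()
    def transform(self):
        return Node(parent=self)
    def checkpoint(self):
        self.parent = None              # lineage truncated here

deep = Node()
for _ in range(100):
    deep = deep.transform()             # never checkpointed

rdd = Node()
for i in range(100):
    rdd = rdd.transform()
    if i % 10 == 9:
        rdd.checkpoint()                # truncate every 10 iterations

print(deep.depth())                     # 100: recovery replays everything
print(rdd.depth())                      # 0: chain cut at the last checkpoint
```

An ever-growing chain like `deep` is what makes long iterative jobs slow down or blow the stack during recovery or serialization, even when every iteration is cached.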
I want to use prefixspan so I had a look at the code and the cited paper :
Distributed PrefixSpan Algorithm Based on MapReduce.
There is a result in the paper I didn't really understand and I couldn't
find where it is used in the code.
Suppose a sequence database S = {1,2...n}, a sequence
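For readers following the PrefixSpan discussion: the core step of the algorithm is projecting the sequence database on a prefix, keeping for each sequence the suffix after the first occurrence of the prefix item. A minimal pure-Python sketch of single-item projection (not the MLlib implementation, and not the distributed MapReduce variant from the cited paper):

```python
# Project a sequence database on a single-item prefix p: each sequence
# containing p contributes the suffix after p's first occurrence.
def project(db, p):
    out = []
    for seq in db:
        if p in seq:
            out.append(seq[seq.index(p) + 1:])
    return out

db = [[1, 2, 3], [2, 1, 3], [1, 3, 2]]
print(project(db, 1))                   # [[2, 3], [3], [3, 2]]
```

PrefixSpan then recurses: frequent items in the projected database extend the prefix, and each extension is mined on its own (smaller) projection, which is what makes the algorithm easy to distribute by prefix.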
I haven't registered my class in Kryo but I don't think it would have such an
impact on the stack size.
I'm thinking of using GraphX and I'm wondering how it serializes the graph
object, as it can use Kryo as the serializer.
2015-03-14 6:22 GMT+01:00 Ted Yu yuzhih...@gmail.com:
Have you registered