In my experiment, if I do not call gc() explicitly, the shuffle files are not 
cleaned until the whole job finishes… I don’t know why; maybe the RDDs cannot 
be collected implicitly.
In my situation, a full GC in the driver takes about 10 seconds, so I start a 
thread in the driver that runs GC every 120 seconds, like this:

while (true) {
    System.gc();
    Thread.sleep(120 * 1000);
}
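This loop can be run in a background daemon thread so that it neither blocks the driver's main thread nor keeps the JVM alive after the job ends; a minimal sketch (the class and method names here are mine, not from Spark):

```java
// A background daemon thread that periodically forces a full GC, so the
// driver's ContextCleaner can remove shuffle files for unreachable RDDs.
// Class and method names are illustrative, not part of any Spark API.
public class PeriodicGc {
    public static Thread start(long intervalMs) {
        Thread t = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                System.gc();
                try {
                    Thread.sleep(intervalMs);
                } catch (InterruptedException e) {
                    return; // allow a clean shutdown via interrupt()
                }
            }
        });
        t.setDaemon(true); // do not keep the driver JVM alive
        t.start();
        return t;
    }
}
```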


It works well now.
Do you have a more elegant way to clean the shuffle files?

Best Regards,
Sendong Li



> On Apr 1, 2015, at 5:09 AM, Xiangrui Meng <men...@gmail.com> wrote:
> 
> Hey Guoqiang and Sendong,
> 
> Could you comment on the overhead of calling gc() explicitly? The shuffle 
> files should get cleaned a few seconds after checkpointing, but it is 
> certainly possible to accumulate TBs of files in a few seconds. In that 
> case, calling gc() may work the same as waiting a few seconds after each 
> checkpoint. Is that correct?
> 
> Best,
> Xiangrui
> 
> On Tue, Mar 31, 2015 at 8:58 AM, lisendong <lisend...@163.com 
> <mailto:lisend...@163.com>> wrote:
> Guoqiang’s method works very well …
> 
> it only takes 1TB of disk now.
> 
> thank you very much!
> 
> 
> 
>> On Mar 31, 2015, at 4:47 PM, GuoQiang Li <wi...@qq.com <mailto:wi...@qq.com>> wrote:
>> 
>> You can try to enforce garbage collection:
>> 
>> import java.lang.ref.WeakReference
>> 
>> /** Run GC and make sure it actually has run */
>> def runGC() {
>>   val weakRef = new WeakReference(new Object())
>>   val startTime = System.currentTimeMillis
>>   System.gc() // Make a best effort to run the garbage collection. It *usually* runs GC.
>>   System.runFinalization()
>>   // Wait until the weak reference's referent has been collected
>>   while (weakRef.get != null) {
>>     System.gc()
>>     System.runFinalization()
>>     Thread.sleep(200)
>>     if (System.currentTimeMillis - startTime > 10000) {
>>       throw new Exception("automatic cleanup failed: GC did not run within 10 seconds")
>>     }
>>   }
>> }
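>> The same wait-until-collected idiom, sketched in Java for drivers written 
>> in Java (the class name and timeout parameter are mine):

```java
import java.lang.ref.WeakReference;

public class ForcedGc {
    /**
     * Trigger GC and block until a sentinel WeakReference is cleared,
     * returning false if the collector has not run within timeoutMs.
     */
    public static boolean runGc(long timeoutMs) throws InterruptedException {
        WeakReference<Object> sentinel = new WeakReference<>(new Object());
        long start = System.currentTimeMillis();
        System.gc(); // best effort; this usually triggers a collection
        System.runFinalization();
        while (sentinel.get() != null) {
            System.gc();
            System.runFinalization();
            Thread.sleep(200);
            if (System.currentTimeMillis() - start > timeoutMs) {
                return false; // sentinel was not collected in time
            }
        }
        return true;
    }
}
```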
>> 
>> 
>> ------------------ Original Message ------------------
>> From: "lisendong"<lisend...@163.com <mailto:lisend...@163.com>>
>> Date: Tuesday, March 31, 2015, 3:47 PM
>> To: "Xiangrui Meng"<men...@gmail.com <mailto:men...@gmail.com>>
>> Cc: "Xiangrui Meng"<m...@databricks.com <mailto:m...@databricks.com>>; 
>> "user"<user@spark.apache.org <mailto:user@spark.apache.org>>; "Sean 
>> Owen"<so...@cloudera.com <mailto:so...@cloudera.com>>; "GuoQiang 
>> Li"<wi...@qq.com <mailto:wi...@qq.com>>
>> Subject: Re: different result from implicit ALS with explicit ALS
>> 
>> I have updated my Spark source code to 1.3.1.
>> 
>> The checkpoint works well. 
>> 
>> BUT the shuffle data still cannot be deleted automatically… the disk usage 
>> is still 30TB…
>> 
>> I have set spark.cleaner.referenceTracking.blocking.shuffle to true.
>> 
>> Do you know how to solve my problem?
>> 
>> Sendong Li
>> 
>> 
>> 
>>> On Mar 31, 2015, at 12:11 AM, Xiangrui Meng <men...@gmail.com 
>>> <mailto:men...@gmail.com>> wrote:
>>> 
>>> setCheckpointInterval was added in the current master and branch-1.3. 
>>> Please help check whether it works. It will be included in the 1.3.1 and 
>>> 1.4.0 releases. -Xiangrui
>>> 
>>> On Mon, Mar 30, 2015 at 7:27 AM, lisendong <lisend...@163.com 
>>> <mailto:lisend...@163.com>> wrote:
>>> hi, xiangrui:
>>> I found that the ALS in Spark 1.3.0 forgets to call checkpoint() in explicit ALS:
>>> the code is:
>>> https://github.com/apache/spark/blob/db34690466d67f9c8ac6a145fddb5f7ea30a8d8d/mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala
>>>  
>>> <https://github.com/apache/spark/blob/db34690466d67f9c8ac6a145fddb5f7ea30a8d8d/mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala>
>>> <PastedGraphic-2.tiff>
>>> 
>>> the checkpoint is very important in my situation, because my task produces 
>>> 1TB of shuffle data in each iteration; if the shuffle data is not deleted 
>>> in each iteration (using checkpoint()), the task will produce 30TB of 
>>> data…
>>> 
>>> 
>>> So I changed the ALS code and recompiled it myself, but it seems the 
>>> checkpoint does not take effect, and the task still occupies 30TB of disk… 
>>> (I only added two lines to ALS.scala):
>>> 
>>> <PastedGraphic-3.tiff>
>>> 
>>> 
>>> 
>>> and the driver’s log seems strange; why are the log lines printed all together...
>>> <PastedGraphic-1.tiff>
>>> 
>>> thank you very much!
>>> 
>>> 
>>>> On Feb 26, 2015, at 11:33 PM, 163 <lisend...@163.com <mailto:lisend...@163.com>> wrote:
>>>> 
>>>> Thank you very much for your opinion:)
>>>> 
>>>> In our case, it may be dangerous to treat un-observed items as negative 
>>>> interactions (although we could give them a small confidence, I think they 
>>>> are still not credible...)
>>>> 
>>>> I will do more experiments and give you feedback:)
>>>> 
>>>> Thank you;)
>>>> 
>>>> 
>>>>> On Feb 26, 2015, at 11:16 PM, Sean Owen <so...@cloudera.com 
>>>>> <mailto:so...@cloudera.com>> wrote:
>>>>> 
>>>>> I believe that's right, and is what I was getting at. Yes, the implicit
>>>>> formulation ends up implicitly including every possible interaction in
>>>>> its loss function, even unobserved ones. That could be the difference.
>>>>> 
>>>>> This is mostly an academic question though. In practice, you have
>>>>> click-like data and should be using the implicit version for sure.
>>>>> 
>>>>> However you can give negative implicit feedback to the model. You
>>>>> could consider no-click as a mild, observed, negative interaction.
>>>>> That is: supply a small negative value for these cases. Unobserved
>>>>> pairs are not part of the data set. I'd be careful about assuming the
>>>>> lack of an action carries signal.
>>>>> 
>>>>>> On Thu, Feb 26, 2015 at 3:07 PM, 163 <lisend...@163.com 
>>>>>> <mailto:lisend...@163.com>> wrote:
>>>>>> oh my god, I think I understand now...
>>>>>> In my case, there are three kinds of user-item pairs:
>>>>>> 
>>>>>> Display-and-click pairs (positive pairs)
>>>>>> Display-but-no-click pairs (negative pairs)
>>>>>> No-display pairs (unobserved pairs)
>>>>>> 
>>>>>> Explicit ALS only considers the first and second kinds,
>>>>>> but implicit ALS considers all three kinds of pairs (and treats the third
>>>>>> kind like the second, because their preference values are all zero and
>>>>>> their confidences are all 1).
>>>>>> 
>>>>>> So the results are different, right?
>>>>>> 
>>>>>> Could you please give me some advice on which ALS I should use?
>>>>>> If I use the implicit ALS, how do I distinguish the second and the third 
>>>>>> kinds of pairs? :)
>>>>>> 
>>>>>> My opinion is that in my case I should use explicit ALS ...
>>>>>> 
>>>>>> Thank you so much
>>>>>> 
>>>>>> On Feb 26, 2015, at 10:41 PM, Xiangrui Meng <m...@databricks.com 
>>>>>> <mailto:m...@databricks.com>> wrote:
>>>>>> 
>>>>>> Lisen, did you use all m-by-n pairs during training? The implicit model
>>>>>> penalizes unobserved ratings, while the explicit model doesn't. -Xiangrui
>>>>>> 
>>>>>>> On Feb 26, 2015 6:26 AM, "Sean Owen" <so...@cloudera.com 
>>>>>>> <mailto:so...@cloudera.com>> wrote:
>>>>>>> 
>>>>>>> +user
>>>>>>> 
>>>>>>>> On Thu, Feb 26, 2015 at 2:26 PM, Sean Owen <so...@cloudera.com 
>>>>>>>> <mailto:so...@cloudera.com>> wrote:
>>>>>>>> 
>>>>>>>> I think I may have it backwards, and that you are correct to keep the 0
>>>>>>>> elements in train() in order to try to reproduce the same result.
>>>>>>>> 
>>>>>>>> The second formulation is called 'weighted regularization' and is used
>>>>>>>> for both implicit and explicit feedback, as far as I can see in the 
>>>>>>>> code.
>>>>>>>> 
>>>>>>>> Hm, I'm actually not clear on why these would produce different results.
>>>>>>>> Different code paths are used, to be sure, but I'm not yet sure why they
>>>>>>>> would give different results.
>>>>>>>> 
>>>>>>>> In general you wouldn't use train() for data like this, though, and 
>>>>>>>> would never set alpha=0.
>>>>>>>> 
>>>>>>>>> On Thu, Feb 26, 2015 at 2:15 PM, lisendong <lisend...@163.com 
>>>>>>>>> <mailto:lisend...@163.com>> wrote:
>>>>>>>>> 
>>>>>>>>> I want to confirm the loss function you use (sorry, I’m not so familiar
>>>>>>>>> with Scala code, so I did not understand the MLlib source code).
>>>>>>>>> 
>>>>>>>>> According to the papers:
>>>>>>>>> 
>>>>>>>>> in your implicit-feedback ALS, the loss function is (ICDM 2008):
>>>>>>>>> 
>>>>>>>>> in the explicit-feedback ALS, the loss function is (Netflix 2008):
>>>>>>>>> 
>>>>>>>>> note that besides the difference in the confidence parameter Cui, the
>>>>>>>>> regularization is also different. Does your code also have this 
>>>>>>>>> difference?
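>>>>>>>>> For reference, a reconstruction of the two objectives from the cited 
>>>>>>>>> papers (the original message embedded them as images, which were 
>>>>>>>>> stripped; this transcription is mine, not the original attachment):

```latex
% Implicit feedback (Hu, Koren & Volinsky, ICDM 2008), summing over ALL
% (u, i) pairs, with confidence c_ui = 1 + alpha * r_ui:
\min_{x,y}\; \sum_{u,i} c_{ui}\,\bigl(p_{ui} - x_u^\top y_i\bigr)^2
  + \lambda \Bigl( \sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2 \Bigr)

% Explicit feedback with weighted-lambda regularization (Zhou et al., 2008),
% summing only over the observed pairs K; n_u and n_i count each user's and
% each item's observed ratings:
\min_{x,y}\; \sum_{(u,i) \in K} \bigl(r_{ui} - x_u^\top y_i\bigr)^2
  + \lambda \Bigl( \sum_u n_u \lVert x_u \rVert^2 + \sum_i n_i \lVert y_i \rVert^2 \Bigr)
```

>>>>>>>>> The regularization difference is the n_u, n_i weighting in the 
>>>>>>>>> second objective, on top of the c_ui confidence in the first.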
>>>>>>>>> 
>>>>>>>>> Best Regards,
>>>>>>>>> Sendong Li
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On Feb 26, 2015, at 9:42 PM, lisendong <lisend...@163.com 
>>>>>>>>>> <mailto:lisend...@163.com>> wrote:
>>>>>>>>>> 
>>>>>>>>>> Hi meng, fotero, sowen:
>>>>>>>>>> 
>>>>>>>>>> I’m using ALS with Spark 1.0.0; the code should be:
>>>>>>>>>> 
>>>>>>>>>> https://github.com/apache/spark/blob/branch-1.0/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala
>>>>>>>>>>  
>>>>>>>>>> <https://github.com/apache/spark/blob/branch-1.0/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala>
>>>>>>>>>> 
>>>>>>>>>> I think the following two methods should produce the same (or nearly
>>>>>>>>>> the same) result:
>>>>>>>>>> 
>>>>>>>>>> MatrixFactorizationModel model =
>>>>>>>>>>     ALS.train(ratings.rdd(), 30, 30, 0.01, -1, 1);
>>>>>>>>>> 
>>>>>>>>>> MatrixFactorizationModel model =
>>>>>>>>>>     ALS.trainImplicit(ratings.rdd(), 30, 30, 0.01, -1, 0, 1);
>>>>>>>>>> 
>>>>>>>>>> the data I use is a display log, in the following format:
>>>>>>>>>> 
>>>>>>>>>> user  item  if-click
>>>>>>>>>> 
>>>>>>>>>> I use 1.0 as the score for click pairs and 0 as the score for 
>>>>>>>>>> non-click pairs.
>>>>>>>>>> 
>>>>>>>>>> in the second method, alpha is set to zero, so the confidence for 
>>>>>>>>>> positive and negative pairs is 1.0 in both cases (right?)
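>>>>>>>>>> In the implicit formulation the confidence is c_ui = 1 + alpha * r_ui, 
>>>>>>>>>> so with alpha = 0 every pair gets confidence 1; a tiny check (the 
>>>>>>>>>> class name is mine):

```java
// Confidence in implicit ALS: c_ui = 1 + alpha * r_ui.
// With alpha = 0, click (r = 1) and non-click (r = 0) pairs both get c = 1,
// so the confidence weighting degenerates to uniform.
public class Confidence {
    public static double of(double alpha, double rating) {
        return 1.0 + alpha * rating;
    }
}
```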
>>>>>>>>>> 
>>>>>>>>>> I think the two methods should produce similar results, but in fact 
>>>>>>>>>> the second method’s result is very bad (the AUC of the first result 
>>>>>>>>>> is 0.7, but the AUC of the second result is only 0.61).
>>>>>>>>>> 
>>>>>>>>>> I could not understand why; could you help me?
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Thank you very much!
>>>>>>>>>> 
>>>>>>>>>> Best Regards,
>>>>>>>>>> Sendong Li
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>> 
>>> 
> 
