I've run into something related to what we discussed: NaiveBayes crashes 
when native BLAS/LAPACK libraries are used for breeze/netlib on Windows: 
https://issues.apache.org/jira/browse/SPARK-3403
I've also attached another example to the issue, this one with a gradient, 
that crashes in runMiniBatchSGD, probably while trying to do grad1 += grad2.
Could you take a close look at this issue? It has paralyzed my development 
for mllib...

Best regards, Alexander

-----Original Message-----
From: Xiangrui Meng [mailto:men...@gmail.com] 
Sent: Wednesday, September 03, 2014 11:18 PM
To: RJ Nowling
Cc: David Hall; Ulanov, Alexander; <dev@spark.apache.org>
Subject: Re: Is breeze thread safe in Spark?

RJ, could you provide a code example that reproduces the bug you observed 
in local testing? Breeze's += is not thread-safe, but in a Spark job, calls 
to a resultHandler are synchronized:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/JobWaiter.scala#L52
Let's move our discussion to the JIRA page. -Xiangrui
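
To make the distinction between + and += concrete, here is a minimal sketch in plain Scala. It does not use Breeze or Spark: Vec, MergeSketch, and mergeSynchronized are hypothetical names invented for this example, and the synchronized block only stands in for the idea that partial results are merged one at a time. It is an illustration under those assumptions, not the actual JobWaiter or MLlib code.

```scala
import java.util.concurrent.{Executors, TimeUnit}

// Hypothetical stand-in for a dense vector; this is NOT Breeze's DenseVector.
final class Vec(val data: Array[Double]) {
  // Allocating add: returns a new Vec and leaves both operands untouched.
  def +(other: Vec): Vec =
    new Vec(data.zip(other.data).map { case (a, b) => a + b })

  // In-place add: mutates this Vec, so concurrent callers must coordinate.
  def +=(other: Vec): Unit = {
    var i = 0
    while (i < data.length) {
      data(i) += other.data(i)
      i += 1
    }
  }
}

object MergeSketch {
  // Sums the partial vectors into one shared accumulator from several threads.
  // The in-place += runs inside a synchronized block, so only one partial
  // result is merged at a time.
  def mergeSynchronized(partials: Seq[Vec], dim: Int, threads: Int): Vec = {
    val acc = new Vec(new Array[Double](dim))
    val pool = Executors.newFixedThreadPool(threads)
    partials.foreach { p =>
      pool.execute(new Runnable {
        def run(): Unit = acc.synchronized { acc += p }
      })
    }
    pool.shutdown()
    // Wait for all merges to finish before reading the accumulator.
    pool.awaitTermination(1, TimeUnit.MINUTES)
    acc
  }

  def main(args: Array[String]): Unit = {
    val partials = Seq.fill(1000)(new Vec(Array(1.0, 2.0, 3.0)))
    val total = mergeSynchronized(partials, dim = 3, threads = 8)
    println(total.data.mkString(", ")) // prints "1000.0, 2000.0, 3000.0"
  }
}
```

Without the synchronized block, threads could interleave their reads and writes to the accumulator's array and updates could be lost. Using + instead avoids mutating shared operands, at the cost of allocating a new array for every addition.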

On Wed, Sep 3, 2014 at 12:07 PM, RJ Nowling <rnowl...@gmail.com> wrote:
> Here's the JIRA:
>
> https://issues.apache.org/jira/browse/SPARK-3384
>
> Even if the current implementation uses += in a thread-safe manner, it 
> is easy to accidentally use += in a parallelized context. I suggest 
> changing all instances of += to +.
>
> I would encourage others to reproduce and validate this issue, though.
>
>
> On Wed, Sep 3, 2014 at 3:02 PM, David Hall <d...@cs.berkeley.edu> wrote:
>
>> Mutating operations are not thread safe. Operations that don't mutate 
>> should be thread safe. I can't speak to what Evan said, but I would 
>> guess that the way they're using += should be safe.
>>
>>
>> On Wed, Sep 3, 2014 at 11:58 AM, RJ Nowling <rnowl...@gmail.com> wrote:
>>
>>> David,
>>>
>>> Can you confirm that += is not thread safe but + is?  I'm assuming + 
>>> allocates a new object for the write, while += doesn't.
>>>
>>> Thanks!
>>> RJ
>>>
>>>
>>> On Wed, Sep 3, 2014 at 2:50 PM, David Hall <d...@cs.berkeley.edu> wrote:
>>>
>>>> In general, in Breeze we allocate separate work arrays for each 
>>>> call to lapack, so it should be fine. In general concurrent 
>>>> modification isn't thread safe of course, but things that "ought" 
>>>> to be thread safe really should be.
>>>>
>>>>
>>>> On Wed, Sep 3, 2014 at 10:41 AM, RJ Nowling <rnowl...@gmail.com> wrote:
>>>>
>>>>> No, it's not in all cases. Since Breeze uses lapack under the hood,
>>>>> changes to memory shared between different threads are bad.
>>>>>
>>>>> There's actually a potential bug in the KMeans code where it uses 
>>>>> += instead of +.
>>>>>
>>>>>
>>>>> On Wed, Sep 3, 2014 at 1:26 PM, Ulanov, Alexander
>>>>> <alexander.ula...@hp.com> wrote:
>>>>>
>>>>> > Hi,
>>>>> >
>>>>> > Is the breeze library called in a thread-safe way from Spark mllib
>>>>> > code when native libs for blas and lapack are used? Might it be an
>>>>> > issue when running Spark locally?
>>>>> >
>>>>> > Best regards, Alexander
>>>>> > ---------------------------------------------------------------------
>>>>> > To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>>>> > For additional commands, e-mail: dev-h...@spark.apache.org
>>>>> >
>>>>> >
>>>>>
>>>>>
>>>>> --
>>>>> em rnowl...@gmail.com
>>>>> c 954.496.2314
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>
>

