RE: Is breeze thread safe in Spark?

2014-09-04 Thread Ulanov, Alexander
I've run into something related to what we discussed. NaiveBayes crashes 
when breeze/netlib uses the native BLAS/LAPACK libraries on Windows: 
https://issues.apache.org/jira/browse/SPARK-3403
I've also attached another example to the issue, one with a gradient that 
crashes in runMiniBatchSGD, probably while trying to do grad1 += grad2.
Could you take a close look at this issue? It has paralyzed my MLlib 
development.

Best regards, Alexander

-----Original Message-----
From: Xiangrui Meng [mailto:men...@gmail.com] 
Sent: Wednesday, September 03, 2014 11:18 PM
To: RJ Nowling
Cc: David Hall; Ulanov, Alexander; dev@spark.apache.org
Subject: Re: Is breeze thread safe in Spark?

RJ, could you provide a code example that reproduces the bug you observed 
in local testing? Breeze's += is not thread-safe, but in a Spark job, calls to 
a resultHandler are synchronized:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/JobWaiter.scala#L52
Let's move our discussion to the JIRA page. -Xiangrui

On Wed, Sep 3, 2014 at 12:07 PM, RJ Nowling rnowl...@gmail.com wrote:
 Here's the JIRA:

 https://issues.apache.org/jira/browse/SPARK-3384

 Even if the current implementation uses += in a thread-safe manner, it 
 is easy to accidentally use += in a parallelized context.  I suggest 
 changing all instances of += to +.

 I would encourage others to reproduce and validate this issue, though.


 On Wed, Sep 3, 2014 at 3:02 PM, David Hall d...@cs.berkeley.edu wrote:

 Mutating operations are not thread safe. Operations that don't mutate 
 should be thread safe. I can't speak to what Evan said, but I would 
 guess that the way they're using += should be safe.


 On Wed, Sep 3, 2014 at 11:58 AM, RJ Nowling rnowl...@gmail.com wrote:

 David,

 Can you confirm that += is not thread safe but + is?  I'm assuming + 
 allocates a new object for the write, while += doesn't.

 Thanks!
 RJ


 On Wed, Sep 3, 2014 at 2:50 PM, David Hall d...@cs.berkeley.edu wrote:

 In general, in Breeze we allocate separate work arrays for each 
 call to lapack, so it should be fine. In general concurrent 
 modification isn't thread safe of course, but things that ought 
 to be thread safe really should be.


 On Wed, Sep 3, 2014 at 10:41 AM, RJ Nowling rnowl...@gmail.com wrote:

 No, it's not in all cases.   Since Breeze uses lapack under the hood,
 changes to memory between different threads are bad.

 There's actually a potential bug in the KMeans code where it uses 
 += instead of +.


 On Wed, Sep 3, 2014 at 1:26 PM, Ulanov, Alexander 
 alexander.ula...@hp.com wrote:

  Hi,
 
  Is the Breeze library called in a thread-safe way from Spark MLlib code 
  when native libs for BLAS and LAPACK are used? Might it be an issue 
  when running Spark locally?
 
  Best regards, Alexander
  
  - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org 
  For additional commands, e-mail: dev-h...@spark.apache.org
 
 


 --
 em rnowl...@gmail.com
 c 954.496.2314








Re: Is breeze thread safe in Spark?

2014-09-03 Thread RJ Nowling
David,

Can you confirm that += is not thread safe but + is?  I'm assuming +
allocates a new object for the write, while += doesn't.

Thanks!
RJ
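
To make the distinction in the question concrete, here is a minimal sketch that uses plain Scala arrays as a stand-in for Breeze vectors. The names plus and plusEquals are made up for illustration and are not Breeze APIs; the sketch only shows the allocation behavior the question assumes, not how Breeze implements either operator.

```scala
object PlusVersusPlusEquals {
  // Non-mutating add: allocates a fresh array and leaves both inputs untouched.
  def plus(a: Array[Double], b: Array[Double]): Array[Double] = {
    require(a.length == b.length, "vectors must have the same length")
    Array.tabulate(a.length)(i => a(i) + b(i))
  }

  // Mutating add: writes the sum back into `a`, so every holder of `a` sees the change.
  def plusEquals(a: Array[Double], b: Array[Double]): Unit = {
    require(a.length == b.length, "vectors must have the same length")
    var i = 0
    while (i < a.length) { a(i) += b(i); i += 1 }
  }

  def main(args: Array[String]): Unit = {
    val a = Array(1.0, 2.0)
    val b = Array(10.0, 20.0)
    val c = plus(a, b)          // c is a new array; a is unchanged
    println(s"after plus:       a = ${a.mkString(",")}, c = ${c.mkString(",")}")
    plusEquals(a, b)            // a itself is overwritten
    println(s"after plusEquals: a = ${a.mkString(",")}")
  }
}
```

If two threads called plusEquals on the same `a` at once, their read-modify-write steps could interleave; with plus, each caller gets its own result array.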




Re: Is breeze thread safe in Spark?

2014-09-03 Thread Evan R. Sparks
Additionally, at the higher level, MLlib allocates separate Breeze
Vectors/Matrices on a per-executor basis. The only place I can think of
where data structures might be overwritten concurrently is in a
.aggregate() call, and these calls happen sequentially.

RJ - Do you have a JIRA reference for that bug?

Thanks!
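
As a rough illustration of the aggregate pattern described above, the sketch below simulates partitions with plain Scala collections. It is not Spark or MLlib code: Partitions, addInPlace and aggregateSum are hypothetical names, the partitions are processed one after another rather than in parallel tasks, and it assumes each partition starts from its own freshly allocated accumulator.

```scala
object AggregateSketch {
  // Hypothetical stand-in for an RDD: each inner Seq plays the role of one partition.
  type Partitions = Seq[Seq[Array[Double]]]

  // In-place add, playing the role of += on a dense vector. Returns the accumulator.
  private def addInPlace(acc: Array[Double], x: Array[Double]): Array[Double] = {
    var i = 0
    while (i < acc.length) { acc(i) += x(i); i += 1 }
    acc
  }

  def aggregateSum(parts: Partitions, dim: Int): Array[Double] = {
    // Each partition folds into its own new accumulator, so the in-place
    // update never touches memory that another partition writes to.
    val partials = parts.map { part =>
      part.foldLeft(new Array[Double](dim))(addInPlace)
    }
    // The per-partition results are then merged one at a time.
    partials.foldLeft(new Array[Double](dim))(addInPlace)
  }
}
```

The in-place update is only safe here because every accumulator has a single writer at any moment; the input vectors are read but never modified.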





Re: Is breeze thread safe in Spark?

2014-09-03 Thread RJ Nowling
Never filed a JIRA -- I actually forgot about it.  Let me file one now.





Re: Is breeze thread safe in Spark?

2014-09-03 Thread David Hall
Mutating operations are not thread safe. Operations that don't mutate
should be thread safe. I can't speak to what Evan said, but I would guess
that the way they're using += should be safe.
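
A small sketch of the "operations that don't mutate" case, again with plain Scala arrays standing in for Breeze vectors. ReadOnlySharing, scaled and scaleConcurrently are hypothetical names for illustration only; the point is that several tasks can read one shared vector at the same time as long as each writes only to its own newly allocated result.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ReadOnlySharing {
  // Non-mutating scale: reads `v` and returns a new array.
  def scaled(v: Array[Double], k: Double): Array[Double] = v.map(_ * k)

  // Many tasks read the same shared vector; none of them writes to it.
  def scaleConcurrently(shared: Array[Double], factors: Seq[Double]): Seq[Array[Double]] = {
    val tasks = factors.map(k => Future(scaled(shared, k)))
    // Results come back in the same order as `factors`.
    Await.result(Future.sequence(tasks), 10.seconds)
  }
}
```

Replacing scaled with a version that multiplied `shared` in place would make the outcome depend on how the tasks interleave.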





Re: Is breeze thread safe in Spark?

2014-09-03 Thread RJ Nowling
Here's the JIRA:

https://issues.apache.org/jira/browse/SPARK-3384

Even if the current implementation uses += in a thread-safe manner, it is
easy to accidentally use += in a parallelized context.  I suggest changing
all instances of += to +.

I would encourage others to reproduce and validate this issue, though.




Re: Is breeze thread safe in Spark?

2014-09-03 Thread Xiangrui Meng
RJ, could you provide a code example that reproduces the bug you
observed in local testing? Breeze's += is not thread-safe, but in a
Spark job, calls to a resultHandler are synchronized:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/JobWaiter.scala#L52
Let's move our discussion to the JIRA page. -Xiangrui
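
The idea of a synchronized handler guarding an in-place merge can be sketched as follows. This is not the JobWaiter code linked above; SynchronizedMerge, Collector, handle and run are hypothetical names, and plain arrays stand in for Breeze vectors. It only shows why an unsafe += becomes safe once every call goes through one lock.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object SynchronizedMerge {
  // Hypothetical collector: merges per-task results into one shared accumulator.
  final class Collector(dim: Int) {
    private val total = new Array[Double](dim)

    // The lock makes each in-place merge atomic with respect to other merges.
    def handle(partial: Array[Double]): Unit = synchronized {
      var i = 0
      while (i < dim) { total(i) += partial(i); i += 1 }
    }

    // Returns a copy so callers cannot mutate the accumulator outside the lock.
    def result: Array[Double] = synchronized { total.clone() }
  }

  def run(numTasks: Int, dim: Int): Array[Double] = {
    val collector = new Collector(dim)
    // Each task produces a vector of ones and hands it to the shared collector.
    val tasks = (1 to numTasks).map { _ =>
      Future(collector.handle(Array.fill(dim)(1.0)))
    }
    Await.result(Future.sequence(tasks), 10.seconds)
    collector.result
  }
}
```

Without the synchronized block in handle, concurrent tasks could lose updates and the totals would no longer be guaranteed to equal numTasks.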




Re: Is breeze thread safe in Spark?

2014-09-03 Thread Ulanov, Alexander
What about the allocation of a new Breeze vector? Can that be unsafe when it 
happens within Spark (in several threads)?

Best regards, Alexander
