RE: Is breeze thread safe in Spark?
I've experienced something related to what we discussed. NaiveBayes crashes with native BLAS/LAPACK libraries for breeze/netlib on Windows: https://issues.apache.org/jira/browse/SPARK-3403

I've also attached to the issue another example with gradient that crashes in runMiniBatchSGD, probably trying to do grad1 += grad2. Could you take a close look at this issue? It has paralyzed my development for MLlib...

Best regards,
Alexander

-----Original Message-----
From: Xiangrui Meng [mailto:men...@gmail.com]
Sent: Wednesday, September 03, 2014 11:18 PM
To: RJ Nowling
Cc: David Hall; Ulanov, Alexander; dev@spark.apache.org
Subject: Re: Is breeze thread safe in Spark?

RJ, could you provide a code example that can reproduce the bug you observed in local testing? Breeze's += is not thread-safe. But in a Spark job, calls to the resultHandler are synchronized: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/JobWaiter.scala#L52 . Let's move our discussion to the JIRA page.

-Xiangrui

On Wed, Sep 3, 2014 at 12:07 PM, RJ Nowling rnowl...@gmail.com wrote:

Here's the JIRA: https://issues.apache.org/jira/browse/SPARK-3384

Even if the current implementation uses += in a thread-safe manner, it can be easy to make the mistake of accidentally using += in a parallelized context. I suggest changing all instances of += to +.

I would encourage others to reproduce and validate this issue, though.

On Wed, Sep 3, 2014 at 3:02 PM, David Hall d...@cs.berkeley.edu wrote:

Mutating operations are not thread safe. Operations that don't mutate should be thread safe. I can't speak to what Evan said, but I would guess that the way they're using += should be safe.

On Wed, Sep 3, 2014 at 11:58 AM, RJ Nowling rnowl...@gmail.com wrote:

David,

Can you confirm that += is not thread safe but + is? I'm assuming + allocates a new object for the write, while += doesn't.

Thanks!
RJ

On Wed, Sep 3, 2014 at 2:50 PM, David Hall d...@cs.berkeley.edu wrote:

In general, in Breeze we allocate separate work arrays for each call to lapack, so it should be fine. In general concurrent modification isn't thread safe of course, but things that ought to be thread safe really should be.

On Wed, Sep 3, 2014 at 10:41 AM, RJ Nowling rnowl...@gmail.com wrote:

No, it's not in all cases. Since Breeze uses lapack under the hood, changes to memory between different threads are bad.

There's actually a potential bug in the KMeans code where it uses += instead of +.

On Wed, Sep 3, 2014 at 1:26 PM, Ulanov, Alexander alexander.ula...@hp.com wrote:

Hi,

Is the Breeze library called in a thread-safe way from Spark MLlib code when native libs for BLAS and LAPACK are used? Might it be an issue when running Spark locally?

Best regards,
Alexander

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org

--
em rnowl...@gmail.com
c 954.496.2314
Re: Is breeze thread safe in Spark?
David,

Can you confirm that += is not thread safe but + is? I'm assuming + allocates a new object for the write, while += doesn't.

Thanks!
RJ

On Wed, Sep 3, 2014 at 2:50 PM, David Hall d...@cs.berkeley.edu wrote:

In general, in Breeze we allocate separate work arrays for each call to lapack, so it should be fine. In general concurrent modification isn't thread safe of course, but things that ought to be thread safe really should be.

--
em rnowl...@gmail.com
c 954.496.2314
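For readers following the thread, here is a minimal sketch of the distinction RJ asks about. It assumes plain double[] arrays as a stand-in for Breeze vectors; the class and method names are hypothetical and are not Breeze's API. It only illustrates why an in-place update touches shared state while an allocating add does not.

```java
import java.util.Arrays;

// Hypothetical stand-in: plain double[] is used instead of a Breeze vector.
final class InPlaceVsAllocating {

    // Analogue of `a += b`: writes into the caller's array, so every holder
    // of `acc` observes the change. Not safe if two threads share `acc`
    // without synchronization.
    static void addInPlace(double[] acc, double[] x) {
        for (int i = 0; i < acc.length; i++) {
            acc[i] += x[i];
        }
    }

    // Analogue of `a + b`: only reads the inputs and writes to a fresh array.
    static double[] add(double[] a, double[] b) {
        double[] out = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = a[i] + b[i];
        }
        return out;
    }

    public static void main(String[] args) {
        double[] a = {1.0, 2.0};
        double[] b = {10.0, 20.0};

        double[] sum = add(a, b);   // a and b are left untouched
        System.out.println(Arrays.toString(a) + " " + Arrays.toString(sum));

        addInPlace(a, b);           // a itself now holds the sum
        System.out.println(Arrays.toString(a));
    }
}
```

The allocating version costs an extra array per call, which is presumably the trade-off behind using the in-place form where the accumulator is known not to be shared.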
Re: Is breeze thread safe in Spark?
Additionally, at the higher level, MLlib allocates separate Breeze Vectors/Matrices on a per-executor basis. The only place I can think of where data structures might be overwritten concurrently is in a .aggregate() call, and these calls happen sequentially.

RJ - Do you have a JIRA reference for that bug?

Thanks!

On Wed, Sep 3, 2014 at 11:50 AM, David Hall d...@cs.berkeley.edu wrote:

In general, in Breeze we allocate separate work arrays for each call to lapack, so it should be fine. In general concurrent modification isn't thread safe of course, but things that ought to be thread safe really should be.
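The pattern described in this message can be sketched roughly as follows. This is an illustration under stated assumptions, not Spark's implementation: partitions are modeled as in-memory lists, a thread pool stands in for executors, and the names PerPartitionAggregate, sumPartition and aggregate are hypothetical. Each task mutates only an accumulator it allocated itself, and the partial results are merged one at a time on the calling thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of an aggregate-style sum; not Spark's code.
final class PerPartitionAggregate {

    // seqOp analogue: folds one partition into an accumulator that only the
    // running task can see, so the in-place update is never shared.
    static double[] sumPartition(List<double[]> partition, int dim) {
        double[] acc = new double[dim];
        for (double[] row : partition) {
            for (int i = 0; i < dim; i++) {
                acc[i] += row[i];
            }
        }
        return acc;
    }

    static double[] aggregate(List<List<double[]>> partitions, int dim) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<double[]>> partials = new ArrayList<>();
            for (List<double[]> p : partitions) {
                partials.add(pool.submit(() -> sumPartition(p, dim)));
            }
            // combOp analogue: partial results are merged sequentially on
            // this thread, so `total` is never written concurrently.
            double[] total = new double[dim];
            for (Future<double[]> f : partials) {
                double[] part = f.get();
                for (int i = 0; i < dim; i++) {
                    total[i] += part[i];
                }
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<double[]>> partitions = List.of(
            List.of(new double[]{1, 2}, new double[]{3, 4}),
            List.of(new double[]{5, 6}));
        double[] total = aggregate(partitions, 2);
        System.out.println(total[0] + " " + total[1]);
    }
}
```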
Re: Is breeze thread safe in Spark?
Never filed a JIRA -- I actually forgot about it. Let me file one now.

On Wed, Sep 3, 2014 at 2:58 PM, Evan R. Sparks evan.spa...@gmail.com wrote:

Additionally, at the higher level, MLlib allocates separate Breeze Vectors/Matrices on a per-executor basis. The only place I can think of where data structures might be overwritten concurrently is in a .aggregate() call, and these calls happen sequentially.

RJ - Do you have a JIRA reference for that bug?

Thanks!

--
em rnowl...@gmail.com
c 954.496.2314
Re: Is breeze thread safe in Spark?
Mutating operations are not thread safe. Operations that don't mutate should be thread safe. I can't speak to what Evan said, but I would guess that the way they're using += should be safe.

On Wed, Sep 3, 2014 at 11:58 AM, RJ Nowling rnowl...@gmail.com wrote:

David,

Can you confirm that += is not thread safe but + is? I'm assuming + allocates a new object for the write, while += doesn't.

Thanks!
RJ
Re: Is breeze thread safe in Spark?
Here's the JIRA: https://issues.apache.org/jira/browse/SPARK-3384

Even if the current implementation uses += in a thread-safe manner, it can be easy to make the mistake of accidentally using += in a parallelized context. I suggest changing all instances of += to +.

I would encourage others to reproduce and validate this issue, though.

On Wed, Sep 3, 2014 at 3:02 PM, David Hall d...@cs.berkeley.edu wrote:

Mutating operations are not thread safe. Operations that don't mutate should be thread safe. I can't speak to what Evan said, but I would guess that the way they're using += should be safe.

--
em rnowl...@gmail.com
c 954.496.2314
Re: Is breeze thread safe in Spark?
RJ, could you provide a code example that can reproduce the bug you observed in local testing? Breeze's += is not thread-safe. But in a Spark job, calls to the resultHandler are synchronized: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/JobWaiter.scala#L52 . Let's move our discussion to the JIRA page.

-Xiangrui

On Wed, Sep 3, 2014 at 12:07 PM, RJ Nowling rnowl...@gmail.com wrote:

Here's the JIRA: https://issues.apache.org/jira/browse/SPARK-3384

Even if the current implementation uses += in a thread-safe manner, it can be easy to make the mistake of accidentally using += in a parallelized context. I suggest changing all instances of += to +.

I would encourage others to reproduce and validate this issue, though.
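As an illustration of the point about the synchronized resultHandler, here is a minimal sketch in which several threads deliver partial results to a handler that updates a shared accumulator in place. It assumes the handler simply adds each partial result into a total; the class SynchronizedMerge and its methods are hypothetical names and do not reproduce JobWaiter. Because the handler is synchronized, the in-place updates are serialized even though the callers run concurrently.

```java
import java.util.Arrays;

// Hypothetical sketch; names are illustrative and are not Spark's JobWaiter API.
final class SynchronizedMerge {
    private final double[] total;

    SynchronizedMerge(int dim) {
        total = new double[dim];
    }

    // Only one thread at a time can run this method on a given instance,
    // so the in-place update of `total` is serialized.
    synchronized void handleResult(double[] partial) {
        for (int i = 0; i < total.length; i++) {
            total[i] += partial[i];
        }
    }

    // Returns a copy so callers cannot mutate the shared accumulator.
    synchronized double[] snapshot() {
        return total.clone();
    }

    // Starts `workers` threads; each builds its own partial result
    // (all ones here) and hands it to the synchronized handler.
    static double[] run(int workers, int dim) {
        SynchronizedMerge merger = new SynchronizedMerge(dim);
        Thread[] threads = new Thread[workers];
        for (int w = 0; w < workers; w++) {
            threads[w] = new Thread(() -> {
                double[] partial = new double[dim];
                Arrays.fill(partial, 1.0);
                merger.handleResult(partial);
            });
            threads[w].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return merger.snapshot();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run(8, 3)));
    }
}
```

Removing `synchronized` from handleResult would leave the same += on shared memory unprotected, which is the situation the thread warns about.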
Re: Is breeze thread safe in Spark?
What about the allocation of a new Breeze vector? Can it happen unsafely within Spark (in several threads)?

Best regards,
Alexander

On 03.09.2014, at 23:17, Xiangrui Meng men...@gmail.com wrote:

RJ, could you provide a code example that can reproduce the bug you observed in local testing? Breeze's += is not thread-safe. But in a Spark job, calls to the resultHandler are synchronized: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/JobWaiter.scala#L52 . Let's move our discussion to the JIRA page.

-Xiangrui