I've experienced something related to what we discussed. NaiveBayes crashes when native BLAS/LAPACK libraries are used for breeze/netlib on Windows:
https://issues.apache.org/jira/browse/SPARK-3403

I've also attached another example to the issue, one with a gradient that crashes in runMiniBatchSGD, probably while trying to do grad1 += grad2. Could you take a close look at this issue? It has paralyzed my MLlib development...

Best regards, Alexander

-----Original Message-----
From: Xiangrui Meng [mailto:men...@gmail.com]
Sent: Wednesday, September 03, 2014 11:18 PM
To: RJ Nowling
Cc: David Hall; Ulanov, Alexander; <dev@spark.apache.org>
Subject: Re: Is breeze thread safe in Spark?

RJ, could you provide a code example that can reproduce the bug you observed in local testing? Breeze's += is not thread-safe. But in a Spark job, calls to a resultHandler are synchronized:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/JobWaiter.scala#L52

Let's move our discussion to the JIRA page.

-Xiangrui

On Wed, Sep 3, 2014 at 12:07 PM, RJ Nowling <rnowl...@gmail.com> wrote:
> Here's the JIRA:
>
> https://issues.apache.org/jira/browse/SPARK-3384
>
> Even if the current implementation uses += in a thread safe manner, it
> can be easy to make the mistake of accidentally using += in a
> parallelized context. I suggest changing all instances of += to +.
>
> I would encourage others to reproduce and validate this issue, though.
>
>
> On Wed, Sep 3, 2014 at 3:02 PM, David Hall <d...@cs.berkeley.edu> wrote:
>
>> mutating operations are not thread safe. Operations that don't mutate
>> should be thread safe. I can't speak to what Evan said, but I would
>> guess that the way they're using += should be safe.
>>
>>
>> On Wed, Sep 3, 2014 at 11:58 AM, RJ Nowling <rnowl...@gmail.com> wrote:
>>
>>> David,
>>>
>>> Can you confirm that += is not thread safe but + is? I'm assuming +
>>> allocates a new object for the write, while += doesn't.
>>>
>>> Thanks!
>>> RJ
>>>
>>>
>>> On Wed, Sep 3, 2014 at 2:50 PM, David Hall <d...@cs.berkeley.edu> wrote:
>>>
>>>> In general, in Breeze we allocate separate work arrays for each
>>>> call to lapack, so it should be fine. In general concurrent
>>>> modification isn't thread safe of course, but things that "ought"
>>>> to be thread safe really should be.
>>>>
>>>>
>>>> On Wed, Sep 3, 2014 at 10:41 AM, RJ Nowling <rnowl...@gmail.com> wrote:
>>>>
>>>>> No, it's not in all cases. Since Breeze uses lapack under the hood,
>>>>> changes to memory between different threads is bad.
>>>>>
>>>>> There's actually a potential bug in the KMeans code where it uses
>>>>> += instead of +.
>>>>>
>>>>>
>>>>> On Wed, Sep 3, 2014 at 1:26 PM, Ulanov, Alexander <alexander.ula...@hp.com> wrote:
>>>>>
>>>>> > Hi,
>>>>> >
>>>>> > Is breeze library called thread safe from Spark mllib code in case when
>>>>> > native libs for blas and lapack are used? Might it be an issue when running
>>>>> > Spark locally?
>>>>> >
>>>>> > Best regards, Alexander
>>>>> > ---------------------------------------------------------------------
>>>>> > To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>>>> > For additional commands, e-mail: dev-h...@spark.apache.org
>>>>> >
>>>>>
>>>>> --
>>>>> em rnowl...@gmail.com
>>>>> c 954.496.2314
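
For reference, here is a minimal sketch of the + versus += distinction discussed in this thread. It assumes plain Array[Double] values as stand-ins for Breeze vectors, so it needs neither Breeze nor Spark. The names add, addInPlace and parallelSum are hypothetical and are not part of either library. It only illustrates the general pattern (thread-private partial results, with the in-place merge serialized by a lock) and is not a description of how Spark's JobWaiter or MLlib is actually implemented.

```scala
import java.util.concurrent.Executors

object AccumulateSketch {
  // Analogue of `a + b`: allocates a fresh array; the inputs are never written to.
  def add(a: Array[Double], b: Array[Double]): Array[Double] = {
    require(a.length == b.length, "length mismatch")
    Array.tabulate(a.length)(i => a(i) + b(i))
  }

  // Analogue of `a += b`: writes into `a`, so unsynchronized callers
  // that share `a` across threads would race on its elements.
  def addInPlace(a: Array[Double], b: Array[Double]): Unit = {
    require(a.length == b.length, "length mismatch")
    var i = 0
    while (i < a.length) { a(i) += b(i); i += 1 }
  }

  // Each task builds its own partial result; only the merge touches the
  // shared accumulator, and that merge is serialized by a lock.
  def parallelSum(parts: Seq[Array[Double]], dim: Int, threads: Int): Array[Double] = {
    val total = new Array[Double](dim)
    val lock = new Object
    val pool = Executors.newFixedThreadPool(threads)
    try {
      val futures = parts.map { p =>
        pool.submit(new Runnable {
          def run(): Unit = {
            val local = add(new Array[Double](dim), p) // thread-private copy
            lock.synchronized { addInPlace(total, local) }
          }
        })
      }
      futures.foreach(_.get()) // wait for every task and surface any failure
    } finally {
      pool.shutdown()
    }
    total
  }

  def main(args: Array[String]): Unit = {
    val parts = Seq.fill(1000)(Array(1.0, 2.0, 3.0))
    // prints 1000.0, 2000.0, 3000.0
    println(parallelSum(parts, dim = 3, threads = 8).mkString(", "))
  }
}
```

If the lock.synchronized wrapper were removed, the in-place merge could lose updates when two tasks write to total at the same time. Replacing += with + everywhere, as suggested above, avoids the shared write at the cost of one allocation per addition.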