Spark 1.0.0 rc3

2014-04-29 Thread Patrick Wendell
Hey All, This is not an official vote, but I wanted to cut an RC so that people can test against the Maven artifacts, test building with their configuration, etc. We are still chasing down a few issues and updating docs, etc. If you have issues or bug reports for this release, please send an

Code Review for SPARK-1516: Throw exception in yarn client instead of System.exit

2014-04-29 Thread DB Tsai
Hi All, Since we're launching Spark YARN jobs in our Tomcat application, the default behavior of calling System.exit when a job finishes or runs into an error isn't desirable. We created this PR https://github.com/apache/spark/pull/490 to address the issue. Since the logic is fairly
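
A minimal sketch of the problem being described, assuming Spark is embedded in another JVM such as a Tomcat web app. The submitToYarn entry point below is a hypothetical stand-in for the client code touched by the PR, not the actual Spark API; the point is only that an exception can be handled by the host application, whereas System.exit terminates the whole JVM.

    import org.apache.spark.SparkException

    object EmbeddedSubmitter {

      // Hypothetical stand-in for the YARN client entry point changed by the PR;
      // it now throws on failure instead of calling System.exit.
      def submitToYarn(args: Array[String]): Unit = {
        if (args.isEmpty) throw new SparkException("no job arguments given")
      }

      def runJobFromWebApp(args: Array[String]): Unit = {
        try {
          submitToYarn(args)
        } catch {
          case e: SparkException =>
            // The servlet container keeps running and can log or retry;
            // a System.exit inside the client would kill the entire JVM.
            println(s"Spark YARN job failed: ${e.getMessage}")
        }
      }
    }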

Re: Spark 1.0.0 rc3

2014-04-29 Thread Marcelo Vanzin
Hi Patrick, What are the expectations / guarantees on binary compatibility between 0.9 and 1.0? You mention some API changes, which kinda hint that binary compatibility has already been broken, but just wanted to point out there are other cases. e.g.: Exception in thread main

Re: Spark 1.0.0 rc3

2014-04-29 Thread Patrick Wendell
What are the expectations / guarantees on binary compatibility between 0.9 and 1.0? There are no guarantees.

Re: Spark 1.0.0 rc3

2014-04-29 Thread Patrick Wendell
Hi Dean, We always used the Hadoop libraries here to read and write local files. In Spark 1.0 we started enforcing the rule that you can't overwrite an existing directory because it can cause confusing/undefined behavior if multiple jobs output to the directory (they partially clobber each
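
A minimal sketch of the behavior described above, written against the 1.0 RC API. The exact exception type raised by the check isn't shown in the thread, so it is caught generically here; the point is that the second save fails instead of partially clobbering the existing output.

    import org.apache.spark.{SparkConf, SparkContext}

    object OverwriteDemo {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("overwrite-demo").setMaster("local"))
        val data = sc.parallelize(1 to 10)

        data.saveAsTextFile("/tmp/overwrite-demo")    // first write creates the directory
        try {
          data.saveAsTextFile("/tmp/overwrite-demo")  // second write is now rejected
        } catch {
          case e: Exception =>
            println(s"Refused to overwrite existing output: ${e.getMessage}")
        }
        sc.stop()
      }
    }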

Re: Spark 1.0.0 rc3

2014-04-29 Thread Dean Wampler
Thanks. I'm fine with the logic change, although I was a bit surprised to see Hadoop used for file I/O. Anyway, the JIRA issue and pull request discussions mention a flag to enable overwrites. That would be very convenient for a tutorial I'm writing, although I wouldn't recommend it for normal
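
Since the overwrite flag discussed here did not yet exist in this RC, one possible workaround (a sketch only, not the flag itself) is to delete the old output with the Hadoop FileSystem API before saving:

    import org.apache.hadoop.fs.Path
    import org.apache.spark.rdd.RDD

    object OverwriteHelper {
      // Delete any existing output directory, then save; the delete is recursive.
      def saveOverwriting(rdd: RDD[String], output: String): Unit = {
        val path = new Path(output)
        val fs = path.getFileSystem(rdd.context.hadoopConfiguration)
        if (fs.exists(path)) {
          fs.delete(path, true)
        }
        rdd.saveAsTextFile(output)
      }
    }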

Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-29 Thread David Hall
Yeah, that's probably the easiest, though obviously pretty hacky. I'm surprised that the Hessian approximation isn't worse than it is. (As in, I'd expect error messages.) It's obviously line searching much more, so the approximation must be worse. You might be interested in this online BFGS:

Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-29 Thread DB Tsai
Yeah, the approximation of the Hessian in LBFGS isn't stateless, and it does depend on previous LBFGS steps, as Xiangrui also pointed out. It's surprising that it works without error messages. I also saw the loss fluctuating like SGD during training. We will remove the miniBatch mode in LBFGS in
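
For context, L-BFGS builds its inverse-Hessian approximation from the history of previous steps and gradients, so the objective has to stay fixed across iterations; a mini-batch objective changes every iteration and corrupts that history. Below is a minimal full-batch logistic regression sketch using Breeze's LBFGS and DiffFunction (the data is made up; assumes the Breeze API of that era):

    import breeze.linalg.{DenseMatrix, DenseVector}
    import breeze.optimize.{DiffFunction, LBFGS}

    object FullBatchLogReg {
      def main(args: Array[String]): Unit = {
        val features = DenseMatrix((1.0, 2.0), (2.0, 0.5), (-1.0, -1.5), (-2.0, -0.5))
        val labels = DenseVector(1.0, 1.0, -1.0, -1.0)   // +1 / -1 labels

        // Full-batch objective: the same loss and gradient on every call,
        // which is what the L-BFGS curvature history assumes.
        val loss = new DiffFunction[DenseVector[Double]] {
          def calculate(w: DenseVector[Double]): (Double, DenseVector[Double]) = {
            var value = 0.0
            val grad = DenseVector.zeros[Double](w.length)
            for (i <- 0 until features.rows) {
              val xi = features(i, ::).t
              val margin = labels(i) * (xi dot w)
              value += math.log1p(math.exp(-margin))
              grad += xi * (-labels(i) / (1.0 + math.exp(margin)))
            }
            (value, grad)
          }
        }

        val lbfgs = new LBFGS[DenseVector[Double]](maxIter = 50, m = 7)
        val weights = lbfgs.minimize(loss, DenseVector.zeros[Double](2))
        println(s"weights: $weights")
      }
    }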

Re: Spark 1.0.0 rc3

2014-04-29 Thread Patrick Wendell
That suggestion got lost along the way and IIRC the patch didn't have that. It's a good idea though, if nothing else to provide a simple means for backwards compatibility. I created a JIRA for this. It's very straightforward so maybe someone can pick it up quickly: