Re: yarn-cluster mode throwing NullPointerException

2015-10-12 Thread Venkatakrishnan Sowrirajan
Hi Rachana, are you by any chance setting something like this in your code: "sparkConf.setMaster("yarn-cluster");"? Setting "yarn-cluster" directly on the SparkContext is not supported. I think you are hitting this bug --> https://issues.apache.org/jira/browse/SPARK-7504. This got fixed in Spark 1.4.0,
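In yarn-cluster mode the master is normally supplied on the spark-submit command line rather than hard-coded with setMaster. A minimal sketch, using the Spark 1.x syntax and placeholder class/jar names:

```shell
# Submit in yarn-cluster mode; the master comes from the command line,
# not from sparkConf.setMaster(...) inside the application code.
# com.example.MyApp and my-app.jar are placeholders.
spark-submit \
  --master yarn-cluster \
  --class com.example.MyApp \
  my-app.jar
```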

Re: sbt test error -- "Could not reserve enough space"

2015-10-12 Thread Xiao Li
Hi, Robert, Please check the following link. It might help you. http://stackoverflow.com/questions/18155325/scala-error-occurred-during-initialization-of-vm-on-ubuntu-12-04 Good luck, Xiao Li 2015-10-09 9:41 GMT-07:00 Robert Dodier : > Hi, > > I am trying to build
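The "Could not reserve enough space" error means the JVM could not allocate the heap sbt asked for, which is common on 32-bit JVMs or memory-constrained machines (as the linked answer explains). A hedged sketch of tuning the sbt JVM options; the exact sizes depend on your hardware:

```shell
# Give sbt's JVM a heap the machine can actually reserve;
# the values below are illustrative, not recommendations.
export SBT_OPTS="-Xms512m -Xmx2g -XX:ReservedCodeCacheSize=256m"
build/sbt test
```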

Re: Build spark 1.5.1 branch fails

2015-10-12 Thread Xiao Li
Hi, Chester, Please check your pom.xml. Your java.version and maven.version might not match your build environment. Or using -Denforcer.skip=true from the command line to skip it. Good luck, Xiao Li 2015-10-08 10:35 GMT-07:00 Chester Chen : > Question regarding
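Skipping the enforcer plugin, as suggested, looks roughly like this (assuming Spark's bundled Maven wrapper; a plain `mvn` works the same way):

```shell
# Bypass the maven-enforcer-plugin checks on java.version / maven.version
# when the local toolchain differs from what pom.xml requires.
build/mvn -Denforcer.skip=true -DskipTests clean package
```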

Re: taking the heap dump when an executor goes OOM

2015-10-12 Thread Ted Yu
http://stackoverflow.com/questions/542979/using-heapdumponoutofmemoryerror-parameter-for-heap-dump-for-jboss > On Oct 11, 2015, at 10:45 PM, Niranda Perera wrote: > > Hi all, > > is there a way for me to get the heap-dump hprof of an executor jvm, when it > goes out
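Following the linked answer, the JVM flag can be passed to executors through their extra Java options. A sketch with a placeholder dump path, which must exist on every worker node:

```shell
# Ask each executor JVM to write an .hprof heap dump when it hits OOM.
# /tmp/executor-dumps is a placeholder path.
spark-submit \
  --conf "spark.executor.extraJavaOptions=-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/executor-dumps" \
  my-app.jar   # placeholder application jar
```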

Re: What is the difference between ml.classification.LogisticRegression and mllib.classification.LogisticRegressionWithLBFGS

2015-10-12 Thread YiZhi Liu
Hi Joseph, Thank you for clarifying the motivation that you setup a different API for ml pipelines, it sounds great. But I still think we could extract some common parts of the training & inference procedures for ml and mllib. In ml.classification.LogisticRegression, you simply transform the

Regarding SPARK JIRA ID-10286

2015-10-12 Thread Jagadeesan A.S.
Hi, I'm a newbie to the Spark community. For the last three months I have been working on Spark and its various modules. I tried to test Spark with the spark-perf workbench. Now, one step forward, I have started to contribute against a JIRA ID. I took SPARK-10286 and sent a pull request. Add @since annotation to

Re: Scala 2.11 builds broken/ Can the PR build run also 2.11?

2015-10-12 Thread Sean Owen
Yeah, was the issue that it had to be built with Maven to show the error while this uses SBT -- or vice versa? That's why the existing test didn't detect it. I was just thinking of adding one more of these non-PR builds, but I forget if there was a reason this is hard. Certainly not worth building for

Re: Scala 2.11 builds broken/ Can the PR build run also 2.11?

2015-10-12 Thread Sean Owen
There are many Jenkins jobs besides the pull request builder that build against various Hadoop combinations, for example, in the background. Is there an obstacle to building vs 2.11 on both Maven and SBT this way? On Mon, Oct 12, 2015 at 2:55 PM, Iulian Dragoș wrote:

Re: [ANNOUNCE] Announcing Spark 1.5.1

2015-10-12 Thread Krishna Sankar
I think the key is to vote on a specific set of source tarballs without any binary artifacts. The specific binaries are useful but shouldn't be part of the voting process. Makes sense: we really cannot prove (and have no need to) that the binaries do not contain malware, but the source can be proven to
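If the vote is on the source tarball, anyone can check that the artifact they vote on is what the release manager signed. A sketch with placeholder file names; the actual RC artifact names and published checksum format may differ:

```shell
# Verify the release manager's signature on the source tarball
# (requires the Spark release KEYS file to be imported into gpg first).
gpg --verify spark-1.5.1.tgz.asc spark-1.5.1.tgz
# Compute a local checksum to compare against the published one.
shasum -a 512 spark-1.5.1.tgz
```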

Re: Scala 2.11 builds broken/ Can the PR build run also 2.11?

2015-10-12 Thread Patrick Wendell
It's really easy to create and modify those builds. If the issue is that we need to add SBT or Maven to the existing one, it's a short change. We can just have it build both of them. I wasn't aware of things breaking before in one build but not another. - Patrick On Mon, Oct 12, 2015 at 9:21 AM,

Re: Scala 2.11 builds broken/ Can the PR build run also 2.11?

2015-10-12 Thread Iulian Dragoș
On Fri, Oct 9, 2015 at 10:34 PM, Patrick Wendell wrote: > I would push back slightly. The reason we have the PR builds taking so > long is death by a million small things that we add. Doing a full 2.11 > compile is order minutes... it's a nontrivial increase to the build

SparkSQL can not extract values from UDT (like VectorUDT)

2015-10-12 Thread Hao Ren
Hi, consider the following code using spark.ml to get the probability column on a data set: model.transform(dataSet).selectExpr("probability.values").printSchema() Note that "probability" is of `vector` type, which is a UDT with the following implementation. class VectorUDT extends

Re: [ANNOUNCE] Announcing Spark 1.5.1

2015-10-12 Thread Tom Graves
I know there are multiple things being talked about here, but I agree with Patrick here: we vote on the source distribution, i.e. the src tarball (and of course the tag should match). Perhaps in principle we vote on all the other specific binary distributions since they are generated from source

Re: Regarding SPARK JIRA ID-10286

2015-10-12 Thread Sean Owen
I don't see that you ever opened a pull request. You just linked to commits in your branch. Please have a look at https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark On Mon, Oct 12, 2015 at 4:56 PM, Jagadeesan A.S. wrote: > Hi, > > I'm newbie to SPARK

RE: No speedup in MultiLayerPerceptronClassifier with increase in number of cores

2015-10-12 Thread Ulanov, Alexander
Hi Disha, the problem might be as follows. The data that you have might physically reside on only two nodes, and Spark launches data-local tasks. As a result, only two workers are used. You might want to force Spark to distribute the data across all nodes; however, it does not seem to be

RE: Operations with cached RDD

2015-10-12 Thread Ulanov, Alexander
Thank you, Nitin. This does explain the problem. It seems that the UI should make this clearer to the user; otherwise it is simply misleading if you read it as is. From: Nitin Goyal [mailto:nitin2go...@gmail.com] Sent: Sunday, October 11, 2015 5:57 AM To: Ulanov, Alexander Cc:

Flaky Jenkins tests?

2015-10-12 Thread Meihua Wu
Hi Spark Devs, I recently encountered several cases where Jenkins failed tests that are supposed to be unrelated to my patch. For example, I made a patch to the Spark ML Scala API, but some Scala RDD tests failed due to timeout, or the java_gateway in PySpark failed. Just wondering if these are

Re: What is the difference between ml.classification.LogisticRegression and mllib.classification.LogisticRegressionWithLBFGS

2015-10-12 Thread DB Tsai
Hi Liu, in ML, even after extracting the data into an RDD, the versions in MLlib and ML are quite different. Due to legacy design, in MLlib we use Updater for handling regularization, and this layer of abstraction also does adaptive step sizing, which is only for SGD. In order to get it working

Re: Flaky Jenkins tests?

2015-10-12 Thread Ted Yu
You can go to: https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN and see if the test failure(s) you encountered appeared there. FYI On Mon, Oct 12, 2015 at 1:24 PM, Meihua Wu wrote: > Hi Spark Devs, > > I recently encountered several cases

Re: Adding Spark Testing functionality

2015-10-12 Thread Holden Karau
So here is a quick description of the current testing bits (I can expand on it if people are interested) http://bit.ly/pandaPandaPanda . On Tue, Oct 6, 2015 at 3:49 PM, Holden Karau wrote: > I'll put together a google doc and send that out (in the meantime a quick > guide

Re: Flaky Jenkins tests?

2015-10-12 Thread Meihua Wu
Hi Ted, Thanks for the info. I have checked but I did not find the failures though. In my cases, I have seen 1) spilling in ExternalAppendOnlyMapSuite failed due to timeout. [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43531/console] 2) pySpark failure

Live UI

2015-10-12 Thread Jakob Odersky
Hi everyone, I am just getting started working on Spark and was thinking of a first way to contribute whilst still trying to wrap my head around the codebase. Exploring the web UI, I noticed it is a classic request-response website, requiring manual refresh to get the latest data. I think it

Re: Live UI

2015-10-12 Thread Ryan Williams
Yea, definitely check out Spree! It functions as a "live" UI, history server, and archival storage of event log data. There are pros and cons to building something like it into Spark trunk (and running it in the Spark driver, presumably) that I've spent a lot of

Re: Flaky Jenkins tests?

2015-10-12 Thread Ted Yu
Can you re-submit your PR to trigger a new build, assuming the tests are flaky? If any test fails again, consider contacting the owner of the module for an expert opinion. Cheers On Mon, Oct 12, 2015 at 2:07 PM, Meihua Wu wrote: > Hi Ted, > > Thanks for the info.

Re: Flaky Jenkins tests?

2015-10-12 Thread Ted Yu
Josh: We're on the same page. I used the term 're-submit your PR', which is different from opening a new PR. On Mon, Oct 12, 2015 at 2:47 PM, Personal wrote: > Just ask Jenkins to retest; no need to open a new PR just to re-trigger > the build. > > > On October 12, 2015 at
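For context, the Spark pull request builder can usually be re-triggered without pushing new commits by leaving a comment on the PR; the exact trigger phrase in use is worth confirming against the current contributing guide, but circa this thread it was:

```
Jenkins, retest this please
```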

Re: Live UI

2015-10-12 Thread Holden Karau
I don't think there has been much work done with Scala.js and Spark (outside of the April Fools press release), but there is a live web UI project out of hammerlab with Ryan Williams, https://github.com/hammerlab/spree, which you may want to take a look at. On Mon, Oct 12, 2015 at 2:36 PM, Jakob

a few major changes / improvements for Spark 1.6

2015-10-12 Thread Reynold Xin
Hi Spark devs, It is hard to track everything going on in Spark with so many pull requests and JIRA tickets. Below are 4 major improvements that will likely be in Spark 1.6. We have already done prototyping for all of them, and want feedback on their design. 1. SPARK-9850 Adaptive query