Re: Temporary jenkins issue

2015-02-08 Thread Josh Rosen
It looks like this may be fixed soon in Jenkins: https://issues.jenkins-ci.org/browse/JENKINS-25446 https://github.com/jenkinsci/flaky-test-handler-plugin/pull/1 On February 2, 2015 at 7:38:19 PM, Patrick Wendell (pwend...@gmail.com) wrote: Hey All, I made a change to the Jenkins

Re: Data source API | sizeInBytes should be to *Scan

2015-02-08 Thread Aniket Bhatnagar
Thanks for looking into this. If this is true, isn't this an issue today? The default implementation of sizeInBytes is 1 + broadcast threshold. So, if Catalyst's cardinality estimation estimates even a small filter selectivity, it will result in broadcasting the relation. Therefore, shouldn't the
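
The concern in this message can be sketched with a little arithmetic. The snippet below is a hypothetical illustration in plain Python, not Spark's planner code. The function names, the 10 MB threshold value, and the `<=` comparison are assumptions made for the example. Only the "1 + broadcast threshold" default comes from the message itself.

```python
# Hypothetical sketch of the reasoning above; this is NOT Spark's planner code.
# Assumed for illustration: a 10 MB broadcast threshold and a simple
# "broadcast when estimated size <= threshold" rule.
BROADCAST_THRESHOLD = 10 * 1024 * 1024  # bytes (assumed value)


def default_size_in_bytes(threshold: int) -> int:
    """Default estimate described in the message: 1 + broadcast threshold."""
    return threshold + 1


def estimated_size_after_filter(size_in_bytes: int, selectivity: float) -> int:
    """Scale a size estimate by an assumed filter selectivity in [0, 1]."""
    return int(size_in_bytes * selectivity)


def would_broadcast(estimated_size: int, threshold: int) -> bool:
    """Assumed decision rule: broadcast only when the estimate fits the threshold."""
    return estimated_size <= threshold


unfiltered = default_size_in_bytes(BROADCAST_THRESHOLD)
filtered = estimated_size_after_filter(unfiltered, 0.5)

# The unfiltered default sits just above the threshold, so it is not broadcast.
print(would_broadcast(unfiltered, BROADCAST_THRESHOLD))
# Any selectivity noticeably below 1.0 pulls the estimate under the threshold.
print(would_broadcast(filtered, BROADCAST_THRESHOLD))
```

Under these assumptions, a relation of unknown (possibly very large) real size would be treated as broadcastable as soon as any filter is applied, which is the issue the message raises.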

Re: Improving metadata in Spark JIRA

2015-02-08 Thread Patrick Wendell
I think we already have a YARN component. https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20component%20%3D%20YARN I don't think JIRA allows it to be mandatory, but if it does, that would be useful. On Sat, Feb 7, 2015 at 5:08 PM, Nicholas Chammas

Re: Improving metadata in Spark JIRA

2015-02-08 Thread Nicholas Chammas
Oh derp, missed the YARN component. JIRA does allow admins to make fields mandatory: https://confluence.atlassian.com/display/JIRA/Specifying+Field+Behavior#SpecifyingFieldBehavior-Makingafieldrequiredoroptional Nick On Sat Feb 07 2015 at 5:23:10 PM Patrick Wendell pwend...@gmail.com wrote:

Re: Using CUDA within Spark / boosting linear algebra

2015-02-08 Thread Evan R. Sparks
I would build OpenBLAS yourself, since good BLAS performance comes from getting cache sizes, etc. set up correctly for your particular hardware - this is often a very tricky process (see, e.g. ATLAS), but we found that on relatively modern Xeon chips, OpenBLAS builds quickly and yields performance

[RESULT] [VOTE] Release Apache Spark 1.2.1 (RC3)

2015-02-08 Thread Patrick Wendell
This vote passes with 5 +1 votes (3 binding) and no 0 or -1 votes. +1 Votes: Krishna Sankar Sean Owen* Chip Senkbeil Matei Zaharia* Patrick Wendell* 0 Votes: (none) -1 Votes: (none) On Fri, Feb 6, 2015 at 5:12 PM, Patrick Wendell pwend...@gmail.com wrote: I'll add a +1 as well. On Fri, Feb

Re: Data source API | sizeInBytes should be to *Scan

2015-02-08 Thread Reynold Xin
We thought about this today after seeing this email. I actually built a patch for this (adding filter/column to data source stat estimation), but ultimately dropped it due to the potential problems the change would cause. The main problem I see is that column pruning/predicate pushdowns are

Re: Improving metadata in Spark JIRA

2015-02-08 Thread Nicholas Chammas
By the way, isn't it possible to make the Component field mandatory when people open new issues? Shouldn't we do that? Btw Patrick, don't we need a YARN component? I think our JIRA components should roughly match the components on the PR dashboard https://spark-prs.appspot.com/. Nick On Fri Feb

Re: [VOTE] Release Apache Spark 1.2.1 (RC3)

2015-02-08 Thread Patrick Wendell
I'll add a +1 as well. On Fri, Feb 6, 2015 at 2:38 PM, Matei Zaharia matei.zaha...@gmail.com wrote: +1 Tested on Mac OS X. Matei On Feb 2, 2015, at 8:57 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.2.1! The

Re: Using CUDA within Spark / boosting linear algebra

2015-02-08 Thread Evan R. Sparks
Getting breeze to pick up the right blas library is critical for performance. I recommend using OpenBLAS (or MKL, if you already have it). It might make sense to force BIDMat to use the same underlying BLAS library as well. On Fri, Feb 6, 2015 at 4:42 PM, Ulanov, Alexander alexander.ula...@hp.com

Spark SQL Window Functions

2015-02-08 Thread Evan R. Sparks
Currently there's no standard way of handling time series data in Spark. We were kicking around some ideas in the lab today and one thing that came up was SQL Window Functions as a way to support them and query over time series (do things like moving average, etc.) These don't seem to be
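
As an illustration of the kind of query mentioned here, the sketch below computes a trailing moving average in plain Python. It is only meant to show what a standard SQL window such as `AVG(x) OVER (ORDER BY t ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)` would return. The function name `moving_average` is hypothetical, and nothing here is Spark API.

```python
from collections import deque
from typing import Iterable, List


def moving_average(values: Iterable[float], window: int) -> List[float]:
    """Trailing moving average over rows that are already ordered by time.

    Each output element averages the current value and up to window - 1
    preceding values, like a ROWS BETWEEN (window - 1) PRECEDING AND
    CURRENT ROW frame.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    frame = deque(maxlen=window)  # values currently inside the frame
    total = 0.0                   # running sum of the frame
    result = []
    for value in values:
        if len(frame) == window:
            total -= frame[0]     # oldest value is about to be evicted
        frame.append(value)
        total += value
        result.append(total / len(frame))
    return result


print(moving_average([1, 2, 3, 4, 5], window=3))
# [1.0, 1.5, 2.0, 3.0, 4.0]
```

The first rows average over a shorter frame because fewer preceding rows exist, which matches how a ROWS frame behaves at the start of a partition.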

Pull Requests on github

2015-02-08 Thread fommil
Hi all, I'm the author of netlib-java and I noticed that the documentation in MLlib was out of date and misleading, so I submitted a pull request on github which will hopefully make things easier for everybody to understand the benefits of system optimised natives and how to use them :-)

RE: Using CUDA within Spark / boosting linear algebra

2015-02-08 Thread Ulanov, Alexander
Evan, could you elaborate on how to force BIDMat and netlib-java to load the right BLAS? For netlib, there are a few JVM flags, such as -Dcom.github.fommil.netlib.BLAS=com.github.fommil.netlib.F2jBLAS, so I can force it to use the Java implementation. Not sure I understand how to force use
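
A minimal sketch of how the JVM flag quoted above could be assembled and passed to Spark follows. The helper names are hypothetical. The property name and the F2jBLAS class come from the message, and `spark.driver.extraJavaOptions` / `spark.executor.extraJavaOptions` are standard Spark configuration keys. Whether setting both is sufficient for a given deployment is an assumption, not something the thread confirms.

```python
def netlib_blas_flag(impl_class: str) -> str:
    """Build the JVM system-property flag that selects a netlib-java BLAS class."""
    return f"-Dcom.github.fommil.netlib.BLAS={impl_class}"


def spark_java_options(impl_class: str) -> dict:
    """Hypothetical helper: apply the same flag to driver and executor JVMs."""
    flag = netlib_blas_flag(impl_class)
    return {
        "spark.driver.extraJavaOptions": flag,
        "spark.executor.extraJavaOptions": flag,
    }


# Force the pure-Java implementation named in the message.
options = spark_java_options("com.github.fommil.netlib.F2jBLAS")
for key, value in options.items():
    print(f"--conf {key}={value}")
```

The printed lines are in the `--conf key=value` form accepted by spark-submit. Any other implementation class would be substituted in the same way.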

Re: [VOTE] Release Apache Spark 1.2.1 (RC3)

2015-02-08 Thread Matei Zaharia
+1 Tested on Mac OS X. Matei On Feb 2, 2015, at 8:57 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.2.1! The tag to be voted on is v1.2.1-rc3 (commit b6eaf77):

Re: [VOTE] Release Apache Spark 1.2.1 (RC3)

2015-02-08 Thread WangTaoTheTonic
Should we merge this commit into branch1.2 too? https://github.com/apache/spark/commit/2483c1efb6429a7d8a20c96d18ce2fec93a1aff9 -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-2-1-RC3-tp10405p10503.html Sent from the

Unit tests

2015-02-08 Thread Patrick Wendell
Hey All, The tests are in a not-amazing state right now due to a few compounding factors: 1. We've merged a large volume of patches recently. 2. The load on jenkins has been relatively high, exposing races and other behavior not seen at lower load. For those not familiar, the main issue is

RE: Using CUDA within Spark / boosting linear algebra

2015-02-08 Thread Ulanov, Alexander
Hi Evan, Joseph, I ran a few matrix multiplication tests and BIDMat seems to be ~10x faster than netlib-java+breeze (sorry for weird table formatting): |A*B size | BIDMat MKL | Breeze+Netlib-java native_system_linux_x86-64| Breeze+Netlib-java f2jblas |

Re: Using CUDA within Spark / boosting linear algebra

2015-02-08 Thread Nicholas Chammas
Lemme butt in randomly here and say there is an interesting discussion on this Spark PR https://github.com/apache/spark/pull/4448 about netlib-java, JBLAS, Breeze, and other things I know nothing of, that y'all may find interesting. Among the participants is the author of netlib-java. On Sun Feb

Re: Pull Requests on github

2015-02-08 Thread Akhil Das
You can open a Jira issue pointing to this PR to get it processed faster. :) Thanks Best Regards On Sat, Feb 7, 2015 at 7:07 AM, fommil sam.halli...@gmail.com wrote: Hi all, I'm the author of netlib-java and I noticed that the documentation in MLlib was out of date and misleading, so I

Re: Spark SQL Window Functions

2015-02-08 Thread Reynold Xin
This is the original ticket: https://issues.apache.org/jira/browse/SPARK-1442 I believe it will happen, one way or another :) On Fri, Feb 6, 2015 at 5:29 PM, Evan R. Sparks evan.spa...@gmail.com wrote: Currently there's no standard way of handling time series data in Spark. We were kicking

Re: Welcoming three new committers

2015-02-08 Thread Likun (Jacky)
Congratulations guys! Keep helping this awesome community. BR, Jacky Li - Sent from Smartisan T1 - On February 4, 2015, at 6:36 AM, Matei Zaharia matei.zaha...@gmail.com wrote: Hi all, The PMC recently voted to add three new committers: Cheng Lian, Joseph Bradley and Sean Owen. All three have been major contributors