Re: Open Issues for Contributors

2015-09-22 Thread Luciano Resende
You can use JIRA filters to narrow down the scope of issues you want to possibly address. For instance, I use this filter to look into open issues that are unassigned: https://issues.apache.org/jira/issues/?filter=12333428 For a specific release, you can also filter by the release, and I Reynold

Re: Why there is no snapshots for 1.5 branch?

2015-09-22 Thread Patrick Wendell
I just added snapshot builds for 1.5. They will take a few hours to build, but once we get them working they should publish every few hours. https://amplab.cs.berkeley.edu/jenkins/view/Spark-Packaging - Patrick On Mon, Sep 21, 2015 at 10:36 PM, Bin Wang wrote: > However I find

Re: JENKINS: downtime next week, wed and thurs mornings (9-23 and 9-24)

2015-09-22 Thread shane knapp
ok, here's the updated downtime schedule for this week: wednesday, sept 23rd: firewall maintenance cancelled, as jon took care of the update saturday morning while we were bringing jenkins back up after the colo fire. thursday, sept 24th: jenkins maintenance is still scheduled, but abbreviated

Re: Open Issues for Contributors

2015-09-22 Thread Pedro Rodriguez
Thanks for the links (the first one is broken or private). I think the main mistake I was making was looking at fix version instead of target version (the JIRA homepage listing of versions links to fix versions). For anyone else interested in MLlib things, I am looking at this to see what goals

column identifiers in Spark SQL

2015-09-22 Thread Richard Hillegas
I am puzzled by the behavior of column identifiers in Spark SQL. I don't find any guidance in the "Spark SQL and DataFrame Guide" at http://spark.apache.org/docs/latest/sql-programming-guide.html. I am seeing odd behavior related to case-sensitivity and to delimited (quoted) identifiers.

Re: column identifiers in Spark SQL

2015-09-22 Thread Michael Armbrust
Are you using a SQLContext or a HiveContext? The programming guide suggests the latter, as the former is really only there because some applications may have conflicts with Hive dependencies. SQLContext is case sensitive by default, whereas the HiveContext is not. The parser in HiveContext is
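For readers following along, here is a minimal sketch of the two contexts and the case-sensitivity setting mentioned above, written against the Spark 1.5-era API (the app name and variable names are illustrative):

```scala
// Minimal sketch (Spark 1.5-era API) of the two contexts discussed above.
// HiveContext is only available when the assembly is built with -Phive.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("identifier-demo"))

// Plain SQLContext: identifier resolution is case sensitive by default.
val sqlCtx = new SQLContext(sc)

// HiveContext: case insensitive by default, and uses the HiveQL parser.
val hiveCtx = new HiveContext(sc)

// Either behavior can be set explicitly through the SQL conf key.
sqlCtx.setConf("spark.sql.caseSensitive", "false")
```

In the spark-shell, when the assembly is built with -Phive, the pre-built sqlContext is already a HiveContext, which matches what Richard reports later in this thread.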

Re: column identifiers in Spark SQL

2015-09-22 Thread Michael Armbrust
HiveQL uses `backticks` for quoted identifiers. On Tue, Sep 22, 2015 at 1:06 PM, Richard Hillegas wrote: > Thanks for that tip, Michael. I think that my sqlContext was a raw > SQLContext originally. I have rebuilt Spark like so... > > sbt/sbt -Phive assembly/assembly > >
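To make the distinction concrete, a hedged example assuming a hypothetical table test_data with a column named b, and reusing the hiveCtx from the sketch above: in the HiveQL parser, double quotes produce a string literal while backticks quote an identifier.

```scala
// Hypothetical table test_data with a column named b (reusing hiveCtx from above).
hiveCtx.sql("""select "b" from test_data""").show()  // "b" is a string literal: every row shows the text b
hiveCtx.sql("""select `b` from test_data""").show()  // `b` is a quoted identifier: returns the column's values
```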

Re: SparkR package path

2015-09-22 Thread Shivaram Venkataraman
As Rui says it would be good to understand the use case we want to support (supporting CRAN installs could be one for example). I don't think it should be very hard to do as the RBackend itself doesn't use the R source files. The RRDD does use it and the value comes from

Re: Derby version in Spark

2015-09-22 Thread Ted Yu
Which Spark release are you building ? For master branch, I get the following: lib_managed/jars/datanucleus-api-jdo-3.2.6.jar lib_managed/jars/datanucleus-core-3.2.10.jar lib_managed/jars/datanucleus-rdbms-3.2.9.jar FYI On Tue, Sep 22, 2015 at 1:28 PM, Richard Hillegas

Derby version in Spark

2015-09-22 Thread Richard Hillegas
I see that lib_managed/jars holds these old Derby versions: lib_managed/jars/derby-10.10.1.1.jar lib_managed/jars/derby-10.10.2.0.jar The Derby 10.10 release family supports some ancient JVMs: Java SE 5 and Java ME CDC/Foundation Profile 1.1. It's hard to imagine anyone running Spark on

Re: column identifiers in Spark SQL

2015-09-22 Thread Richard Hillegas
Thanks for that tip, Michael. I think that my sqlContext was a raw SQLContext originally. I have rebuilt Spark like so... sbt/sbt -Phive assembly/assembly Now I see that my sqlContext is a HiveContext. That fixes one of the queries. Now unnormalized column names work: // ...unnormalized

Fwd: Parallel collection in driver programs

2015-09-22 Thread Andy Huang
Hi Devs, Hopefully one of you knows more about this? Thanks Andy -- Forwarded message -- From: Andy Huang Date: Wed, Sep 23, 2015 at 12:39 PM Subject: Parallel collection in driver programs To: u...@spark.apache.org Hi All, Would like to know if anyone
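The forwarded question is truncated in this digest. For context, a minimal, illustrative sketch of the distinction usually at issue in such threads: a Scala parallel collection runs on threads inside the driver JVM, while sc.parallelize distributes a driver-side collection to the executors as an RDD.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("parallelize-demo"))

// Parallel collection: work runs on the driver's local thread pool.
val localSum = (1 to 1000000).par.map(_ * 2L).sum

// RDD: the range is split into 8 partitions and mapped on the cluster.
val distributedSum = sc.parallelize(1 to 1000000, numSlices = 8).map(_ * 2L).sum()
```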

Re: Derby version in Spark

2015-09-22 Thread Ted Yu
I see. I use maven to build so I observe different contents under lib_managed directory. Here is snippet of dependency tree: [INFO] | +- org.spark-project.hive:hive-metastore:jar:1.2.1.spark:compile [INFO] | | +- com.jolbox:bonecp:jar:0.8.0.RELEASE:compile [INFO] | | +-

Re: Derby version in Spark

2015-09-22 Thread Richard Hillegas
Thanks, Ted. I'll follow up with the Hive folks. Cheers, -Rick Ted Yu wrote on 09/22/2015 03:41:12 PM: > From: Ted Yu > To: Richard Hillegas/San Francisco/IBM@IBMUS > Cc: Dev > Date: 09/22/2015 03:41 PM > Subject: Re: Derby

Re: column identifiers in Spark SQL

2015-09-22 Thread Richard Hillegas
Thanks for that additional tip, Michael. Backticks fix the problem query in which an identifier was transformed into a string literal. So this works now... // now correctly resolves the unnormalized column id sqlContext.sql("""select `b` from test_data""").show Any suggestion about how to

Re: Derby version in Spark

2015-09-22 Thread Richard Hillegas
Thanks, Ted. I'm working on my master branch. The lib_managed/jars directory has a lot of jarballs, including hadoop and hive. Maybe these were faulted in when I built with the following command? sbt/sbt -Phive assembly/assembly The Derby jars seem to be used in order to manage the

Re: Derby version in Spark

2015-09-22 Thread Ted Yu
I cloned the Hive 1.2 code base and saw Derby 10.10.2.0 declared there, so the version used by Spark is quite close to what Hive uses. On Tue, Sep 22, 2015 at 3:29 PM, Ted Yu wrote: > I see. > I use maven to build so I observe different contents under lib_managed > directory. > > Here is

Why Filter return a DataFrame object in DataFrame.scala?

2015-09-22 Thread qiuhai
Hi, Recently I have been reading the Spark SQL source code (1.5 version). In DataFrame.scala, there is a function named filter at line 737: *def filter(condition: Column): DataFrame = Filter(condition.expr, logicalPlan)* The function returns a Filter object, but it requires a DataFrame
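For readers puzzled by the same line: the declared return type works because an implicit conversion from the logical plan type to DataFrame is in scope inside DataFrame.scala (as far as I recall, a private helper that wraps a LogicalPlan in a new DataFrame with the same SQLContext). Below is a minimal, self-contained Scala sketch of that mechanism using made-up types; it is not Spark's actual code.

```scala
import scala.language.implicitConversions

// Illustration only: an implicit conversion in scope lets a method whose body
// builds a Plan still declare DataFrameLike as its return type.
object FilterExample {
  case class Plan(description: String)
  class DataFrameLike(val plan: Plan)

  // The compiler applies this conversion to the method body's result.
  private implicit def planToDataFrame(plan: Plan): DataFrameLike = new DataFrameLike(plan)

  // Looks like it returns a Plan, but the declared type is DataFrameLike.
  def filter(predicate: String): DataFrameLike = Plan(s"Filter($predicate)")

  def main(args: Array[String]): Unit = {
    println(filter("a > 1").plan.description) // prints: Filter(a > 1)
  }
}
```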

Re: Why there is no snapshots for 1.5 branch?

2015-09-22 Thread Bin Wang
Thanks. I've solved it. I modified pom.xml, added my own repo to it, and then used "mvn deploy". Fengdong Yu wrote on Tue, Sep 22, 2015 at 2:08 PM: > basically, you can build snapshot by yourself. > > just clone the source code, and then 'mvn package/deploy/install…..' > > > Azuryy Yu >

RowMatrix tallSkinnyQR - ERROR: Second call to constructor of static parser

2015-09-22 Thread Saif.A.Ellafi
Hi all, wondering if anyone could make the new 1.5.0 tallSkinnyQR work. My output follows; it is a big loop of the same errors until the shell dies. I am curious since I'm failing to load any implementations from BLAS, LAPACK, etc. scala> mat.tallSkinnyQR(false) 15/09/22 10:18:11 WARN
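For anyone trying to reproduce this, a minimal spark-shell sketch of the 1.5 API (the sample values are illustrative). As a side note, "failed to load implementation" warnings for BLAS/LAPACK typically mean netlib-java is falling back to its pure-Java F2J routines because native libraries are not installed, which is a performance concern rather than a correctness one.

```scala
// Minimal sketch of tallSkinnyQR in the spark-shell (sc is the shell's SparkContext).
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix

val rows = sc.parallelize(Seq(
  Vectors.dense(1.0, 2.0),
  Vectors.dense(3.0, 4.0),
  Vectors.dense(5.0, 6.0)))
val mat = new RowMatrix(rows)

// computeQ = false returns only R; computeQ = true also materializes the distributed Q.
val qr = mat.tallSkinnyQR(computeQ = true)
println(qr.R)                         // small local upper-triangular matrix
qr.Q.rows.collect().foreach(println)  // rows of Q, only populated when computeQ = true
```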

Open Issues for Contributors

2015-09-22 Thread Pedro Rodriguez
Where is the best place to look at open issues that haven't been assigned/started for the next release? I am interested in working on something, but I don't know what issues are higher priority for the next release. On a similar note, is there somewhere which outlines the overall goals for the