Re: HA support for Spark

2014-12-10 Thread Reynold Xin
This would be plausible for specific purposes such as Spark Streaming or Spark SQL, but I don't think it is doable for a general Spark driver, since it is just a normal JVM process with arbitrary program state. On Wed, Dec 10, 2014 at 12:25 AM, Jun Feng Liu liuj...@cn.ibm.com wrote: Do we have any

Maven profile in MLLib netlib-lgpl not working (1.1.1)

2014-12-10 Thread Guillaume Pitel
Hi, Issue created: https://issues.apache.org/jira/browse/SPARK-4816 This is probably a Maven-related question about profiles in child modules. I couldn't find a clean solution, just a workaround: modify pom.xml in the mllib module to force activation of the netlib-lgpl module. Hope a Maven expert will help.

Re: HA support for Spark

2014-12-10 Thread Jun Feng Liu
Well, it should not be mission impossible, considering how many HA solutions exist today. I would be interested to know if there is any specific difficulty. Best Regards Jun Feng Liu IBM China Systems Technology Laboratory in Beijing Phone: 86-10-82452683 E-mail: liuj...@cn.ibm.com

Tachyon in Spark

2014-12-10 Thread Jun Feng Liu
Does Spark today really leverage Tachyon lineage to process data? It seems like the application should call the createDependency function in TachyonFS to create a new lineage node, but I did not find any place that calls it in the Spark code. Did I miss anything? Best Regards Jun Feng Liu IBM China

Re: HA support for Spark

2014-12-10 Thread Sandy Ryza
I think that if we were able to maintain the full set of created RDDs as well as some scheduler and block manager state, it would be enough for most apps to recover. On Wed, Dec 10, 2014 at 5:30 AM, Jun Feng Liu liuj...@cn.ibm.com wrote: Well, it should not be mission impossible thinking there

Build Spark 1.2.0-rc1 encounter exceptions when running HiveContext - Caused by: java.lang.ClassNotFoundException: com.esotericsoftware.shaded.org.objenesis.strategy.InstantiatorStrategy

2014-12-10 Thread Andrew Lee
Hi All, I tried to include the necessary libraries in SPARK_CLASSPATH in spark-env.sh, to include auxiliary JARs and datanucleus*.jars from Hive; however, when I run HiveContext, it gives me the following error: Caused by: java.lang.ClassNotFoundException:

RE: Build Spark 1.2.0-rc1 encounter exceptions when running HiveContext - Caused by: java.lang.ClassNotFoundException: com.esotericsoftware.shaded.org.objenesis.strategy.InstantiatorStrategy

2014-12-10 Thread Andrew Lee
Apologies for the formatting; somehow it got messed up and the linefeeds were removed. Here's a reformatted version. Hi All, I tried to include the necessary libraries in SPARK_CLASSPATH in spark-env.sh, to include auxiliary JARs and datanucleus*.jars from Hive; however, when I run HiveContext, it gives me

Re: jenkins downtime: 730-930am, 12/12/14

2014-12-10 Thread shane knapp
reminder -- this is happening friday morning @ 730am! On Mon, Dec 1, 2014 at 5:10 PM, shane knapp skn...@berkeley.edu wrote: i'll send out a reminder next week, but i wanted to give a heads up: i'll be bringing down the entire jenkins infrastructure for reboots and system updates. please

Row Similarity

2014-12-10 Thread Debasish Das
Hi, It seems there are multiple places where we would like to compute row similarity (accurate or approximate similarities). Basically, through RowMatrix columnSimilarities we can compute column similarities of a tall skinny matrix. Similarly, we should have an API in RowMatrix called

Re: Build Spark 1.2.0-rc1 encounter exceptions when running HiveContext - Caused by: java.lang.ClassNotFoundException: com.esotericsoftware.shaded.org.objenesis.strategy.InstantiatorStrategy

2014-12-10 Thread Patrick Wendell
Hi Andrew, It looks like somehow you are including jars from the upstream Apache Hive 0.13 project on your classpath. For Spark 1.2 Hive 0.13 support, we had to modify Hive to use a different version of Kryo that was compatible with Spark's Kryo version.

[RESULT] [VOTE] Release Apache Spark 1.2.0 (RC1)

2014-12-10 Thread Patrick Wendell
This vote is closed in favor of RC2. On Fri, Dec 5, 2014 at 2:02 PM, Patrick Wendell pwend...@gmail.com wrote: Hey All, Thanks all for the continued testing! The issue I mentioned earlier, SPARK-4498, was fixed earlier this week (hat tip to Mark Hamstra, who contributed to the fix). In the

[VOTE] Release Apache Spark 1.2.0 (RC2)

2014-12-10 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.2.0! The tag to be voted on is v1.2.0-rc2 (commit a428c446e2): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=a428c446e23e628b746e0626cc02b7b3cadf588e The release files, including signatures, digests, etc.

Re: Row Similarity

2014-12-10 Thread Reza Zadeh
It's not so cheap to compute row similarities when there are many rows, as it amounts to computing the outer product of a matrix A (i.e. computing AA^T, which is expensive). There is a JIRA to track handling (1) and (2) more efficiently than computing all pairs:
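To make the cost argument concrete, here is a minimal brute-force sketch in plain Python, not Spark code and not the approach tracked in the JIRA. The function name `row_cosine_similarities` and the sample matrix `A` are made up for illustration. Each dot product below is one entry of AA^T, so a matrix with m rows needs m(m-1)/2 of them, which is what becomes expensive when there are many rows.

```python
import math
from itertools import combinations


def row_cosine_similarities(rows):
    """Return {(i, j): cosine similarity} for every pair of rows with i < j.

    Hypothetical helper for illustration only; it compares all pairs.
    """
    norms = [math.sqrt(sum(x * x for x in row)) for row in rows]
    sims = {}
    for i, j in combinations(range(len(rows)), 2):
        # This dot product is entry (i, j) of A A^T.
        dot = sum(a * b for a, b in zip(rows[i], rows[j]))
        denom = norms[i] * norms[j]
        # Rows of all zeros have no direction; report similarity 0.0.
        sims[(i, j)] = dot / denom if denom else 0.0
    return sims


# Small example matrix (assumed data): rows 0 and 1 are parallel,
# row 2 is orthogonal to both.
A = [
    [1.0, 0.0, 2.0],
    [2.0, 0.0, 4.0],
    [0.0, 3.0, 0.0],
]
sims = row_cosine_similarities(A)
```

With 3 rows this computes 3 pairs; with a million rows the same loop would need roughly 5 * 10^11 dot products, which is why an all-pairs computation is worth avoiding.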

Re: [VOTE] Release Apache Spark 1.2.0 (RC2)

2014-12-10 Thread Matei Zaharia
+1 Tested on Mac OS X. Matei On Dec 10, 2014, at 1:08 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.2.0! The tag to be voted on is v1.2.0-rc2 (commit a428c446e2):

Re: Is Apache JIRA down?

2014-12-10 Thread Nicholas Chammas
Never mind, seems to be back up now. On Wed Dec 10 2014 at 7:46:30 PM Nicholas Chammas nicholas.cham...@gmail.com wrote: For example: https://issues.apache.org/jira/browse/SPARK-3431 Where do we report/track issues with JIRA itself being down? Nick

Re: Is Apache JIRA down?

2014-12-10 Thread Patrick Wendell
I believe many Apache services are/were down due to an outage. On Wed, Dec 10, 2014 at 5:24 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Never mind, seems to be back up now. On Wed Dec 10 2014 at 7:46:30 PM Nicholas Chammas nicholas.cham...@gmail.com wrote: For example:

Re: Row Similarity

2014-12-10 Thread Debasish Das
I added code to compute topK products for each user and topK users for each product in SPARK-3066. That is different from the row similarity calculation, as we need both user and product factors to calculate the topK recommendations. For (1) and (2) we are trying to answer similarUsers to given a

Re: Row Similarity

2014-12-10 Thread Reza Zadeh
Here we go: https://issues.apache.org/jira/browse/SPARK-4823 On Wed, Dec 10, 2014 at 9:01 PM, Debasish Das debasish.da...@gmail.com wrote: I added code to compute topK products for each user and topK users for each product in SPARK-3066. That is different from the row similarity calculation, as

SparkSQL not honoring schema

2014-12-10 Thread Alessandro Baretta
Hello, I defined a SchemaRDD by applying a hand-crafted StructType to an RDD. Some of the Rows in the RDD are malformed--that is, they do not conform to the schema defined by the StructType. When running a select statement on this SchemaRDD I would expect SparkSQL to either reject the malformed

Re: SparkSQL not honoring schema

2014-12-10 Thread Michael Armbrust
As the Scaladoc for applySchema says, "It is important to make sure that the structure of every [[Row]] of the provided RDD matches the provided schema. Otherwise, there will be runtime exceptions." We don't check, as doing runtime reflection on all of the data would be very expensive. You will

Re: SparkSQL not honoring schema

2014-12-10 Thread Alessandro Baretta
Hey Michael, Thanks for the clarification. I was actually assuming the query would fail. Ok, so this means I will have to do the validation in an RDD transformation feeding into the SchemaRDD. On Wed, Dec 10, 2014 at 6:27 PM, Michael Armbrust mich...@databricks.com wrote: As the scala doc for
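The validation step described here could look roughly like the following sketch. It is plain Python rather than Spark code, and everything in it is an assumption made for illustration: the `conforms` helper, the `(name, type, nullable)` schema tuples, and the sample rows are hypothetical and are not the StructType API. In a Spark job the same kind of predicate would presumably be applied in a filter on the RDD before the schema is attached.

```python
def conforms(row, schema):
    """Return True if `row` matches `schema`.

    `schema` is a list of (name, expected_type, nullable) tuples; this is a
    hypothetical stand-in for a real schema object, used for illustration.
    """
    if len(row) != len(schema):
        return False  # wrong number of fields
    for value, (_name, expected_type, nullable) in zip(row, schema):
        if value is None:
            if not nullable:
                return False  # missing value in a non-nullable field
        elif not isinstance(value, expected_type):
            return False  # value has the wrong type
    return True


# Assumed example schema and data.
schema = [("id", int, False), ("name", str, True)]
rows = [(1, "a"), ("2", "b"), (3, None), (4,), (None, "e")]

# Split the input so malformed rows can be logged or dropped
# before any query ever sees them.
valid = [r for r in rows if conforms(r, schema)]
rejected = [r for r in rows if not conforms(r, schema)]
```

Keeping the rejected rows rather than silently discarding them makes it possible to report how much of the input was malformed. Note that `isinstance` treats `bool` as a subtype of `int`, so a stricter check may be needed for boolean fields.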

Re: HA support for Spark

2014-12-10 Thread Jun Feng Liu
Right, perhaps we also need to preserve some DAG information? I am wondering if there is any work around this. Sandy Ryza sandy.ryza@cloud