This would be plausible for specific purposes such as Spark Streaming or
Spark SQL, but I don't think it is doable for a general Spark driver, since it
is just a normal JVM process with arbitrary program state.
On Wed, Dec 10, 2014 at 12:25 AM, Jun Feng Liu liuj...@cn.ibm.com wrote:
Do we have any
Hi
Issue created https://issues.apache.org/jira/browse/SPARK-4816
Probably a maven-related question for profiles in child modules
I couldn't find a clean solution, just a workaround: modify the pom.xml in the
mllib module to force activation of the netlib-lgpl profile.
Hope a maven expert will help.
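For reference, the workaround described above amounts to something like the following fragment in the mllib pom.xml. This is a sketch, not the exact patch: the dependency coordinates and version shown are what the netlib-lgpl profile pulled in around that time and may differ in your checkout.

```xml
<!-- mllib/pom.xml: force the netlib-lgpl profile on by default.
     Normally this profile is enabled from the command line with
     -Pnetlib-lgpl; activeByDefault makes it apply without the flag. -->
<profile>
  <id>netlib-lgpl</id>
  <activation>
    <activeByDefault>true</activeByDefault>
  </activation>
  <dependencies>
    <dependency>
      <groupId>com.github.fommil.netlib</groupId>
      <artifactId>all</artifactId>
      <version>1.1.2</version>
      <type>pom</type>
    </dependency>
  </dependencies>
</profile>
```

Profile inheritance across child modules is exactly the kind of thing `-P` flags don't always propagate the way one expects, hence the in-place activation hack.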
Well, it should not be mission impossible, considering there are so many HA
solutions existing today. I would be interested to know if there are any
specific difficulties.
Best Regards
Jun Feng Liu
IBM China Systems Technology Laboratory in Beijing
Phone: 86-10-82452683
E-mail: liuj...@cn.ibm.com
Does Spark today really leverage Tachyon lineage to process data? It seems
like the application should call the createDependency function in TachyonFS to
create a new lineage node, but I did not find any place that calls it in the
Spark code. Did I miss anything?
Best Regards
Jun Feng Liu
IBM China
I think that if we were able to maintain the full set of created RDDs as
well as some scheduler and block manager state, it would be enough for most
apps to recover.
On Wed, Dec 10, 2014 at 5:30 AM, Jun Feng Liu liuj...@cn.ibm.com wrote:
Well, it should not be mission impossible thinking there
Hi All,
I tried to include the necessary libraries in SPARK_CLASSPATH in spark-env.sh
to include auxiliary JARs and datanucleus*.jars from Hive; however, when I run
HiveContext, it gives me the following error:
Caused by: java.lang.ClassNotFoundException:
Apologies for the format; somehow it got messed up and the linefeeds were
removed. Here's a reformatted version.
Hi All,
I tried to include the necessary libraries in SPARK_CLASSPATH in spark-env.sh
to include auxiliary JARs and datanucleus*.jars from Hive; however, when I run
HiveContext, it gives me
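The setup being described is roughly the following spark-env.sh fragment. The paths here are hypothetical (they depend on where Hive is installed); the point is appending the Hive datanucleus jars to SPARK_CLASSPATH.

```shell
# spark-env.sh (sketch -- HIVE_LIB is a placeholder; point it at your
# actual Hive installation's lib directory).
HIVE_LIB=/opt/hive/lib

# Append every datanucleus jar shipped with Hive to SPARK_CLASSPATH.
for jar in "$HIVE_LIB"/datanucleus-*.jar; do
  SPARK_CLASSPATH="$SPARK_CLASSPATH:$jar"
done
export SPARK_CLASSPATH
```

Note that, as the reply below explains, a ClassNotFoundException here can also mean the wrong Hive jars are on the classpath, not just missing ones.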
reminder -- this is happening friday morning @ 730am!
On Mon, Dec 1, 2014 at 5:10 PM, shane knapp skn...@berkeley.edu wrote:
i'll send out a reminder next week, but i wanted to give a heads up: i'll
be bringing down the entire jenkins infrastructure for reboots and system
updates.
please
Hi,
It seems there are multiple places where we would like to compute row
similarities (accurate or approximate).
Basically, through RowMatrix.columnSimilarities we can compute the column
similarities of a tall, skinny matrix.
Similarly we should have an API in RowMatrix called
Hi Andrew,
It looks like somehow you are including jars from the upstream Apache
Hive 0.13 project on your classpath. For Spark 1.2 Hive 0.13 support,
we had to modify Hive to use a different version of Kryo that was
compatible with Spark's Kryo version.
This vote is closed in favor of RC2.
On Fri, Dec 5, 2014 at 2:02 PM, Patrick Wendell pwend...@gmail.com wrote:
Hey All,
Thanks all for the continued testing!
The issue I mentioned earlier, SPARK-4498, was fixed earlier this week
(hat tip to Mark Hamstra, who contributed the fix).
In the
Please vote on releasing the following candidate as Apache Spark version 1.2.0!
The tag to be voted on is v1.2.0-rc2 (commit a428c446e2):
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=a428c446e23e628b746e0626cc02b7b3cadf588e
The release files, including signatures, digests, etc.
It's not so cheap to compute row similarities when there are many rows, as
it amounts to computing the outer product of a matrix A (i.e. computing
AA^T, which is expensive).
There is a JIRA to track handling (1) and (2) more efficiently than
computing all pairs:
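The cost asymmetry is easy to see on a toy tall-skinny matrix (plain numpy here, not the Spark code): column similarities only need the n-by-n Gram matrix A^T A, while row similarities need the m-by-m matrix A A^T, and m is huge.

```python
import numpy as np

m, n = 10000, 5          # tall, skinny: many rows, few columns
A = np.random.rand(m, n)

# Column similarities: normalize each column, then form the n x n
# Gram matrix of the normalized columns -- cheap when n is small.
An = A / np.linalg.norm(A, axis=0)
col_sims = An.T @ An                 # shape (n, n)
print(col_sims.shape)                # (5, 5)

# Row similarities would instead require A @ A.T: an m x m matrix,
# 100 million entries here. That is the expensive outer-product
# computation the thread is talking about.
print((m * m) // (n * n))            # how many times more entries
```

The diagonal of `col_sims` is all ones, since each normalized column has unit norm.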
+1
Tested on Mac OS X.
Matei
On Dec 10, 2014, at 1:08 PM, Patrick Wendell pwend...@gmail.com wrote:
Please vote on releasing the following candidate as Apache Spark version
1.2.0!
The tag to be voted on is v1.2.0-rc2 (commit a428c446e2):
Nevermind, seems to be back up now.
On Wed Dec 10 2014 at 7:46:30 PM Nicholas Chammas
nicholas.cham...@gmail.com wrote:
For example: https://issues.apache.org/jira/browse/SPARK-3431
Where do we report/track issues with JIRA itself being down?
Nick
I believe many apache services are/were down due to an outage.
On Wed, Dec 10, 2014 at 5:24 PM, Nicholas Chammas
nicholas.cham...@gmail.com wrote:
Nevermind, seems to be back up now.
On Wed Dec 10 2014 at 7:46:30 PM Nicholas Chammas
nicholas.cham...@gmail.com wrote:
For example:
I added code to compute the topK products for each user and the topK users for
each product in SPARK-3066.
That is different from the row similarity calculation, as we need both the
user and product factors to calculate the topK recommendations.
For (1) and (2) we are trying to answer similarUsers to given a
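The distinction above can be sketched in plain numpy (illustrative only, not the SPARK-3066 code): topK recommendation scores every product for a user via the dot product of user and product factors, whereas row similarity would compare factor rows to each other.

```python
import numpy as np

num_users, num_products, rank, k = 4, 6, 3, 2
rng = np.random.default_rng(0)
U = rng.random((num_users, rank))      # user factor matrix
P = rng.random((num_products, rank))   # product factor matrix

# Recommendation scores need BOTH factor matrices:
scores = U @ P.T                       # shape (num_users, num_products)

# topK product indices per user, best score first.
topk = np.argsort(-scores, axis=1)[:, :k]
print(topk.shape)                      # (4, 2)
```

Row similarity, by contrast, would operate on a single matrix (e.g. U against U), which is why the two features end up with different APIs.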
Here we go: https://issues.apache.org/jira/browse/SPARK-4823
On Wed, Dec 10, 2014 at 9:01 PM, Debasish Das debasish.da...@gmail.com
wrote:
I added code to compute topK products for each user and topK user for each
product in SPARK-3066..
That is different than row similarity calculation as
Hello,
I defined a SchemaRDD by applying a hand-crafted StructType to an RDD. Some
of the Rows in the RDD are malformed--that is, they do not conform to the
schema defined by the StructType. When running a select statement on this
SchemaRDD I would expect SparkSQL to either reject the malformed
As the Scala doc for applySchema says, "It is important to make sure that
the structure of every [[Row]] of the provided RDD matches the provided
schema. Otherwise, there will be runtime exceptions." We don't check, as
doing runtime reflection on all of the data would be very expensive. You
will
Hey Michael,
Thanks for the clarification. I was actually assuming the query would fail.
Ok, so this means I will have to do the validation in an RDD transformation
feeding into the SchemaRDD.
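That validation step can be an ordinary filter before the schema is applied. A minimal sketch in plain Python, with a hypothetical `schema` given as (name, type) pairs rather than the real StructType API:

```python
# Hypothetical schema: field name and expected Python type per column.
schema = [("id", int), ("name", str)]

def conforms(row, schema):
    """Check arity and per-field types against the schema."""
    if len(row) != len(schema):
        return False
    return all(isinstance(v, t) for v, (_, t) in zip(row, schema))

rows = [(1, "alice"), (2, "bob"), ("oops", "carol"), (3,)]

# In Spark this would be rdd.filter(lambda r: conforms(r, schema))
# before calling applySchema; here we just filter a plain list.
valid = [r for r in rows if conforms(r, schema)]
print(valid)   # [(1, 'alice'), (2, 'bob')]
```

The cost is one pass over the data, which you pay explicitly instead of applySchema paying it implicitly on every query.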
On Wed, Dec 10, 2014 at 6:27 PM, Michael Armbrust mich...@databricks.com
wrote:
As the scala doc for
Right, perhaps we also need to preserve some DAG information? I am wondering
if there is any ongoing work around this.
Sandy Ryza
sandy.ryza@cloud