Re: Graphx: GraphLoader.edgeListFile with edge weight

2014-05-22 Thread Reynold Xin
You can submit a pull request on the GitHub mirror: https://github.com/apache/spark Thanks. On Wed, May 21, 2014 at 10:59 PM, npanj nitinp...@gmail.com wrote: Hi, For my project I needed to load a graph with edge weights; for this I have updated GraphLoader.edgeListFile to consider third
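The preview above is truncated, but the idea of extending GraphX's GraphLoader.edgeListFile to read a third column as an edge weight can be sketched without Spark itself. The helper below is a hypothetical, minimal per-line parser; the object and method names and the choice of Double for the weight are assumptions, not the actual patch from the thread. (The stock loader parses only the first two columns and hard-codes the edge attribute to 1.)

```scala
// Hypothetical sketch of the per-line parsing a weighted variant of
// GraphLoader.edgeListFile would need, for lines of the form
// "srcId dstId weight". Blank lines and '#' comments are skipped,
// mirroring the stock loader's behavior.
object WeightedEdges {
  def parse(line: String): Option[(Long, Long, Double)] = {
    val trimmed = line.trim
    if (trimmed.isEmpty || trimmed.startsWith("#")) None // skip blanks/comments
    else {
      val parts = trimmed.split("\\s+")
      if (parts.length < 3) None // no weight column present
      else Some((parts(0).toLong, parts(1).toLong, parts(2).toDouble))
    }
  }
}
```

In a real edgeListFile variant, these (src, dst, weight) triples would feed edge construction in place of the hard-coded attribute of 1.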

Re: [VOTE] Release Apache Spark 1.0.0 (RC10)

2014-05-22 Thread Kevin Markey
I've discovered that one of the anomalies I encountered was due to an (embarrassing? humorous?) user error. See the user list thread Failed RC-10 yarn-cluster job for FS closed error when cleaning up staging directory for my discussion. With the user error corrected, the FS closed exception

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-22 Thread Xiangrui Meng
Hi DB, I found it is a little hard to implement the solution I mentioned: Do not send the primary jar and secondary jars to the executors' distributed cache. Instead, add them to spark.jars in SparkSubmit and serve them via HTTP by calling sc.addJar in SparkContext. If you look at

Contributions to MLlib

2014-05-22 Thread MEETHU MATHEW
Hi, I would like to make some contributions to MLlib. I have a few concerns regarding the same. 1. Is there any reason for implementing the algorithms supported by MLlib in Scala? 2. Will you accept contributions written in Python or Java? Thanks, Meethu M

Re: Should SPARK_HOME be needed with Mesos?

2014-05-22 Thread Gerard Maas
Sure. Should I create a Jira as well? I saw there's already a broader ticket regarding the ambiguous use of SPARK_HOME [1] (cc: Patrick as owner of that ticket). I don't know if it would be more relevant to remove the use of SPARK_HOME when using Mesos and have the assembly as the only way

Re: [VOTE] Release Apache Spark 1.0.0 (RC10)

2014-05-22 Thread Kevin Markey
I retested several different cases... 1. FS closed exception shows up ONLY in RC-10, not in Spark 0.9.1, with both Hadoop 2.2 and 2.3. 2. SPARK-1898 has no effect for my use cases. 3. The failure to report that the underlying application is RUNNING and that it has succeeded is due ONLY to my

Re: [VOTE] Release Apache Spark 1.0.0 (RC10)

2014-05-22 Thread Marcelo Vanzin
Hi Kevin, On Thu, May 22, 2014 at 9:49 AM, Kevin Markey kevin.mar...@oracle.com wrote: The FS closed exception only affects the cleanup of the staging directory, not the final success or failure. I've not yet tested the effect of changing my application's initialization, use, or closing of

Re: Contributions to MLlib

2014-05-22 Thread Xiangrui Meng
Hi Meethu, Thanks for asking! Scala is the native language in Spark. Implementing algorithms in Scala can utilize the full power of Spark Core. Also, Scala's syntax is very concise. Implementing ML algorithms using different languages would increase the maintenance cost. However, there are still

Re: [VOTE] Release Apache Spark 1.0.0 (RC10)

2014-05-22 Thread Colin McCabe
The FileSystem cache is something that has caused a lot of pain over the years. Unfortunately we (in Hadoop core) can't change the way it works now because there are too many users depending on the current behavior. Basically, the idea is that when you request a FileSystem with certain options
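The cache semantics Colin describes can be illustrated with a toy model. This is not the Hadoop code, and the class and key names below are invented for the sketch; the real cache key also involves the UGI rather than a plain user string. The point it demonstrates is real, though: FileSystem.get hands every caller with the same (scheme, authority, user) key the same shared instance, so close() on any one reference closes it for every other holder of that key.

```scala
import scala.collection.mutable

// Toy model of the Hadoop FileSystem cache hazard: get() returns a shared,
// cached instance per (scheme, authority, user) key, so one caller's close()
// also closes the instance still held by everyone else with the same key.
class ModelFs {
  var closed = false
  def close(): Unit = closed = true
}

object ModelFsCache {
  private val cache = mutable.Map.empty[(String, String, String), ModelFs]
  def get(scheme: String, authority: String, user: String): ModelFs =
    cache.getOrElseUpdate((scheme, authority, user), new ModelFs)
}
```

This is the shape of the "FS closed" failures in this thread: one component closes the shared HDFS instance while another (here, the staging-directory cleanup) still holds a reference to it. Hadoop's escape hatches are FileSystem.newInstance, which bypasses the cache, and the fs.&lt;scheme&gt;.impl.disable.cache configuration key.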

Re: Should SPARK_HOME be needed with Mesos?

2014-05-22 Thread Andrew Ash
Fixing the immediate issue of requiring SPARK_HOME to be set when it's not actually used is a separate ticket in my mind from a larger cleanup of what SPARK_HOME means across the cluster. I think you should file a new ticket for just this particular issue. On Thu, May 22, 2014 at 11:03 AM,

Re: [VOTE] Release Apache Spark 1.0.0 (RC10)

2014-05-22 Thread Aaron Davidson
In Spark 0.9.0 and 0.9.1, we stopped using the FileSystem cache correctly, and we just recently resumed using it in 1.0 (and in 0.9.2) when this issue was fixed: https://issues.apache.org/jira/browse/SPARK-1676 Prior to this fix, each Spark task created and cached its own FileSystem instances due to a bug

Re: [VOTE] Release Apache Spark 1.0.0 (RC10)

2014-05-22 Thread Kevin Markey
Thank you, all! This is quite helpful. We have been arguing about how to handle this issue across a growing application. Unfortunately, the Hadoop FileSystem Javadoc should say all this but doesn't! Kevin On 05/22/2014 01:48 PM, Aaron Davidson wrote: In Spark 0.9.0 and 0.9.1, we stopped using

Re: [VOTE] Release Apache Spark 1.0.0 (RC10)

2014-05-22 Thread Tathagata Das
Hey all, On further testing, I came across a bug that breaks execution of pyspark scripts on YARN. https://issues.apache.org/jira/browse/SPARK-1900 This is a blocker and worth cutting a new RC. We also found a fix for a known issue that prevents additional jar files from being specified through

Re: [VOTE] Release Apache Spark 1.0.0 (RC10)

2014-05-22 Thread Colin McCabe
On Thu, May 22, 2014 at 12:48 PM, Aaron Davidson ilike...@gmail.com wrote: In Spark 0.9.0 and 0.9.1, we stopped using the FileSystem cache correctly, and we just recently resumed using it in 1.0 (and in 0.9.2) when this issue was fixed: https://issues.apache.org/jira/browse/SPARK-1676

Re: [VOTE] Release Apache Spark 1.0.0 (RC10)

2014-05-22 Thread Henry Saputra
Looks like SPARK-1900 is a blocker for YARN, and we might as well add SPARK-1870 while at it. TD or Patrick, could you kindly send out an email with [CANCEL] prefixed in the subject for the RC10 vote, to help people follow the active VOTE threads? The VOTE emails are getting a bit hard to follow. - Henry

Re: [VOTE] Release Apache Spark 1.0.0 (RC10)

2014-05-22 Thread Tathagata Das
Right! Doing that. TD On Thu, May 22, 2014 at 3:07 PM, Henry Saputra henry.sapu...@gmail.com wrote: Looks like SPARK-1900 is a blocker for YARN and might as well add SPARK-1870 while at it. TD or Patrick, could you kindly send [CANCEL] prefixed in the subject email out for the RC10 Vote to

Re: Should SPARK_HOME be needed with Mesos?

2014-05-22 Thread Gerard Maas
ack On Thu, May 22, 2014 at 9:26 PM, Andrew Ash and...@andrewash.com wrote: Fixing the immediate issue of requiring SPARK_HOME to be set when it's not actually used is a separate ticket in my mind from a larger cleanup of what SPARK_HOME means across the cluster. I think you should file a