Re: spark_classpath in core/pom.xml and yarn/pom.xml

2014-09-25 Thread Ye Xianjin
Hi Sandy Ryza, I believe it was you who originally added SPARK_CLASSPATH in core/pom.xml in the org.scalatest section. Is this still needed in 1.1? I noticed this setting because when I looked into unit-tests.log, it showed the following: 14/09/24 23:57:19.246 WARN SparkConf:

MLlib enable extension of the LabeledPoint class

2014-09-25 Thread Niklas Wilcke
Hi Spark developers, I am trying to implement a framework with Spark and MLlib to do duplicate detection. I'm not familiar with Spark and Scala, so please be patient with me. In order to enrich the LabeledPoint class with some information, I tried to extend it and add some properties. But the ML

Re: Spark SQL use of alias in where clause

2014-09-25 Thread Nicholas Chammas
That is correct. Aliases in the SELECT clause can only be referenced in the ORDER BY and HAVING clauses. Otherwise, you'll have to just repeat the statement, like concat() in this case. A more elegant alternative, which is probably not available in Spark SQL yet, is to use Common Table
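The two workarounds for filtering on a SELECT alias can be sketched as follows. This is a minimal illustration only: it uses Python's built-in sqlite3 module as a stand-in rather than Spark SQL, the `people` table and its rows are hypothetical, and SQLite's `||` operator replaces concat(). SQLite itself is lenient about aliases in WHERE, so the point here is the portable query shapes, not a reproduction of the error.

```python
import sqlite3

# Hypothetical in-memory table used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (first TEXT, last TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Ada", "Lovelace"), ("Alan", "Turing"), ("Grace", "Hopper")],
)

# Workaround 1: repeat the expression in WHERE instead of using the alias.
repeated = conn.execute(
    "SELECT first || ' ' || last AS full_name "
    "FROM people "
    "WHERE first || ' ' || last LIKE 'A%' "
    "ORDER BY full_name"
).fetchall()

# Workaround 2: compute the alias in a derived table, then filter on it
# in the outer query, where the alias is an ordinary column name.
derived = conn.execute(
    "SELECT full_name "
    "FROM (SELECT first || ' ' || last AS full_name FROM people) AS t "
    "WHERE full_name LIKE 'A%' "
    "ORDER BY full_name"
).fetchall()

print(repeated)
print(derived)
```

Both queries return the same rows; the derived-table form avoids writing the expression twice, at the cost of an extra nesting level.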

Re: MLlib enable extension of the LabeledPoint class

2014-09-25 Thread Yu Ishikawa
Hi Niklas Wilcke, As you said, it is difficult to extend the LabeledPoint class in mllib.regression. Do you want to extend the LabeledPoint class in order to use a label type other than Double? If your code is on GitHub, could you show it to us? I want to know what you want to do. Community By

Re: MLlib enable extension of the LabeledPoint class

2014-09-25 Thread Egor Pahomov
@Yu Ishikawa, I think the right place for such a discussion is https://issues.apache.org/jira/browse/SPARK-3573 2014-09-25 18:02 GMT+04:00 Yu Ishikawa yuu.ishikawa+sp...@gmail.com: Hi Niklas Wilcke, As you said, it is difficult to extend

Re: MLlib enable extension of the LabeledPoint class

2014-09-25 Thread Yu Ishikawa
Hi Egor Pahomov, Thank you for your comment! -- Yu Ishikawa

Re: MLlib enable extension of the LabeledPoint class

2014-09-25 Thread Niklas Wilcke
Hi Yu Ishikawa, I'm sorry, but I can't share my code via GitHub at the moment. Hopefully I can in a few months. I don't want to change the type of the label, but that would also be a very nice improvement. Making LabeledPoint abstract is exactly what I need. That enables me to create a class like

Re: MLlib enable extension of the LabeledPoint class

2014-09-25 Thread Niklas Wilcke
Hi Egor Pahomov, thanks for your suggestions. I think I will go with the dirty workaround because I don't want to maintain my own version of Spark for now. Maybe I will later, when I feel ready to contribute to the project. Kind Regards, Niklas Wilcke On 25.09.2014 16:27, Egor Pahomov wrote: I

Re: Spark SQL use of alias in where clause

2014-09-25 Thread Du Li
Thanks, Yanbo and Nicholas. Now it makes more sense: query optimization is the answer. /Du From: Nicholas Chammas nicholas.cham...@gmail.com Date: Thursday, September 25, 2014 at 6:43 AM To: Yanbo Liang yanboha...@gmail.com Cc: Du Li

VertexRDD partition imbalance

2014-09-25 Thread Larry Xiao
Hi all, VertexRDD is partitioned with HashPartitioner, and it exhibits some imbalance of tasks. For example, Connected Components with partition strategy Edge2D: Aggregated Metrics by Executor: Executor ID | Task Time | Total Tasks | Failed Tasks | Succeeded Tasks | Input | Shuffle Read
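One way hash partitioning can leave partitions uneven is when the keys share a common factor with the partition count. The sketch below is a plain-Python simulation, not GraphX code: `hash_partition` and `partition_sizes` are hypothetical helpers, the modulo rule is a simplification of a hash partitioner, and the even-only vertex IDs are an assumed input. The snippet above does not show the actual cause of the reported imbalance, so this is only one possible explanation.

```python
from collections import Counter


def hash_partition(vertex_id: int, num_partitions: int) -> int:
    """Simplified stand-in for a hash partitioner: key modulo partition count."""
    return vertex_id % num_partitions


def partition_sizes(vertex_ids, num_partitions: int) -> list:
    """Count how many vertex IDs land in each partition."""
    counts = Counter(hash_partition(v, num_partitions) for v in vertex_ids)
    return [counts.get(p, 0) for p in range(num_partitions)]


# Assumed input: 1000 vertex IDs that are all even numbers.
even_ids = range(0, 2000, 2)

# With 4 partitions, even IDs can only map to partitions 0 and 2.
sizes_4 = partition_sizes(even_ids, 4)

# With 5 partitions (no common factor with 2), the same IDs spread evenly.
sizes_5 = partition_sizes(even_ids, 5)

print(sizes_4)
print(sizes_5)
```

Counting keys per partition like this is a cheap first check before looking at heavier causes such as skewed vertex degrees.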

Re: do MIMA checking before all test cases start?

2014-09-25 Thread Patrick Wendell
Yeah we can also move it first. Wouldn't hurt. On Thu, Sep 25, 2014 at 6:39 AM, Nicholas Chammas nicholas.cham...@gmail.com wrote: It might still make sense to make this change if MIMA checks are always relatively quick, for the same reason we do style checks first. On Thu, Sep 25, 2014 at

Re: spark_classpath in core/pom.xml and yarn/pom.xml

2014-09-25 Thread Ye Xianjin
Hi Sandy, Sorry to bother you. The tests run OK even with the SPARK_CLASSPATH setting in place, but it gives a config warning and could potentially interfere with other settings, as Marcelo said. The warning goes away if I remove it. And Marcelo, I believe the setting in core/pom should not be