Re: spark_classpath in core/pom.xml and yarn/pom.xml

2014-09-25 Thread Ye Xianjin
Hi Sandy, 

Sorry for bothering you. 

The tests run OK even with the SPARK_CLASSPATH setting there now, but it gives a 
config warning and will potentially interfere with other settings, as Marcelo said. 
The warning goes away if I remove it.

And Marcelo, I believe the setting in core/pom.xml should not be used any more. But 
I don't think it's worth filing a JIRA for such a small change; maybe it can be folded 
into another related JIRA. It's a pity that your PR has already been merged.
 

-- 
Ye Xianjin
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Friday, September 26, 2014 at 6:29 AM, Sandy Ryza wrote:

> Hi Ye,
> 
> I think git blame shows me because I fixed the formatting in core/pom.xml, 
> but I don't actually know the original reason for setting SPARK_CLASSPATH 
> there.
> 
> Do the tests run OK if you take it out?
> 
> -Sandy
> 
> 
> > On Thu, Sep 25, 2014 at 1:59 AM, Ye Xianjin <advance...@gmail.com> wrote:
> > hi, Sandy Ryza:
> >  I believe It's you originally added the SPARK_CLASSPATH in 
> > core/pom.xml in the org.scalatest section. Does this still needed in 1.1?
> >  I noticed this setting because when I looked into the unit-tests.log, 
> > It shows something below:
> > > 14/09/24 23:57:19.246 WARN SparkConf:
> > > SPARK_CLASSPATH was detected (set to 'null').
> > > This is deprecated in Spark 1.0+.
> > >
> > > Please instead use:
> > >  - ./spark-submit with --driver-class-path to augment the driver classpath
> > >  - spark.executor.extraClassPath to augment the executor classpath
> > >
> > > 14/09/24 23:57:19.246 WARN SparkConf: Setting 
> > > 'spark.executor.extraClassPath' to 'null' as a work-around.
> > > 14/09/24 23:57:19.247 WARN SparkConf: Setting 
> > > 'spark.driver.extraClassPath' to 'null' as a work-around.
> > 
> > However I didn't set SPARK_CLASSPATH env variable. And looked into the 
> > SparkConf.scala, If user actually set extraClassPath,  the SparkConf will 
> > throw SparkException.
> > --
> > Ye Xianjin
> > Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
> > 
> > 
> > On Tuesday, September 23, 2014 at 12:56 AM, Ye Xianjin wrote:
> > 
> > > Hi:
> > > I notice the scalatest-maven-plugin set SPARK_CLASSPATH environment 
> > > variable for testing. But in the SparkConf.scala, this is deprecated in 
> > > Spark 1.0+.
> > > So what this variable for? should we just remove this variable?
> > >
> > >
> > > --
> > > Ye Xianjin
> > > Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
> > >
> > 
> 



Re: spark_classpath in core/pom.xml and yarn/pom.xml

2014-09-25 Thread Marcelo Vanzin
BTW I removed it from the yarn pom since it was not used (and actually
interfered with a test I was writing).

I did not touch the core pom, but I wouldn't be surprised if it's not
needed there either.

On Thu, Sep 25, 2014 at 3:29 PM, Sandy Ryza  wrote:
> Hi Ye,
>
> I think git blame shows me because I fixed the formatting in core/pom.xml,
> but I don't actually know the original reason for setting SPARK_CLASSPATH
> there.
>
> Do the tests run OK if you take it out?
>
> -Sandy
>
>
> On Thu, Sep 25, 2014 at 1:59 AM, Ye Xianjin  wrote:
>
>> hi, Sandy Ryza:
>>  I believe It's you originally added the SPARK_CLASSPATH in
>> core/pom.xml in the org.scalatest section. Does this still needed in 1.1?
>>  I noticed this setting because when I looked into the unit-tests.log,
>> It shows something below:
>> > 14/09/24 23:57:19.246 WARN SparkConf:
>> > SPARK_CLASSPATH was detected (set to 'null').
>> > This is deprecated in Spark 1.0+.
>> >
>> > Please instead use:
>> >  - ./spark-submit with --driver-class-path to augment the driver
>> classpath
>> >  - spark.executor.extraClassPath to augment the executor classpath
>> >
>> > 14/09/24 23:57:19.246 WARN SparkConf: Setting
>> 'spark.executor.extraClassPath' to 'null' as a work-around.
>> > 14/09/24 23:57:19.247 WARN SparkConf: Setting
>> 'spark.driver.extraClassPath' to 'null' as a work-around.
>>
>> However I didn't set SPARK_CLASSPATH env variable. And looked into the
>> SparkConf.scala, If user actually set extraClassPath,  the SparkConf will
>> throw SparkException.
>> --
>> Ye Xianjin
>> Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
>>
>>
>> On Tuesday, September 23, 2014 at 12:56 AM, Ye Xianjin wrote:
>>
>> > Hi:
>> > I notice the scalatest-maven-plugin set SPARK_CLASSPATH environment
>> variable for testing. But in the SparkConf.scala, this is deprecated in
>> Spark 1.0+.
>> > So what this variable for? should we just remove this variable?
>> >
>> >
>> > --
>> > Ye Xianjin
>> > Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
>> >
>>
>>



-- 
Marcelo

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: spark_classpath in core/pom.xml and yarn/pom.xml

2014-09-25 Thread Sandy Ryza
Hi Ye,

I think git blame shows me because I fixed the formatting in core/pom.xml,
but I don't actually know the original reason for setting SPARK_CLASSPATH
there.

Do the tests run OK if you take it out?

-Sandy


On Thu, Sep 25, 2014 at 1:59 AM, Ye Xianjin  wrote:

> hi, Sandy Ryza:
>  I believe It's you originally added the SPARK_CLASSPATH in
> core/pom.xml in the org.scalatest section. Does this still needed in 1.1?
>  I noticed this setting because when I looked into the unit-tests.log,
> It shows something below:
> > 14/09/24 23:57:19.246 WARN SparkConf:
> > SPARK_CLASSPATH was detected (set to 'null').
> > This is deprecated in Spark 1.0+.
> >
> > Please instead use:
> >  - ./spark-submit with --driver-class-path to augment the driver
> classpath
> >  - spark.executor.extraClassPath to augment the executor classpath
> >
> > 14/09/24 23:57:19.246 WARN SparkConf: Setting
> 'spark.executor.extraClassPath' to 'null' as a work-around.
> > 14/09/24 23:57:19.247 WARN SparkConf: Setting
> 'spark.driver.extraClassPath' to 'null' as a work-around.
>
> However I didn't set SPARK_CLASSPATH env variable. And looked into the
> SparkConf.scala, If user actually set extraClassPath,  the SparkConf will
> throw SparkException.
> --
> Ye Xianjin
> Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
>
>
> On Tuesday, September 23, 2014 at 12:56 AM, Ye Xianjin wrote:
>
> > Hi:
> > I notice the scalatest-maven-plugin set SPARK_CLASSPATH environment
> variable for testing. But in the SparkConf.scala, this is deprecated in
> Spark 1.0+.
> > So what this variable for? should we just remove this variable?
> >
> >
> > --
> > Ye Xianjin
> > Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
> >
>
>


Re: do MIMA checking before all test cases start?

2014-09-25 Thread Patrick Wendell
Yeah we can also move it first. Wouldn't hurt.

On Thu, Sep 25, 2014 at 6:39 AM, Nicholas Chammas
 wrote:
> It might still make sense to make this change if MIMA checks are always
> relatively quick, for the same reason we do style checks first.
>
> On Thu, Sep 25, 2014 at 12:25 AM, Nan Zhu  wrote:
>>
>> yeah, I tried that, but there is always an issue when I ran dev/mima,
>>
>> it always gives me some binary compatibility error on Java API part
>>
>> so I have to wait for Jenkins' result when fixing MIMA issues
>>
>> --
>> Nan Zhu
>>
>>
>> On Thursday, September 25, 2014 at 12:04 AM, Patrick Wendell wrote:
>>
>> > Have you considered running the mima checks locally? We prefer people
>> > not use Jenkins for very frequent checks since it takes resources away
>> > from other people trying to run tests.
>> >
>> > > On Wed, Sep 24, 2014 at 6:44 PM, Nan Zhu <zhunanmcg...@gmail.com> wrote:
>> > > Hi, all
>> > >
>> > > It seems that, currently, Jenkins makes MIMA checking after all test
>> > > cases have finished, IIRC, during the first months we introduced MIMA, 
>> > > we do
>> > > the MIMA checking before running test cases
>> > >
>> > > What's the motivation to adjust this behaviour?
>> > >
>> > > In my opinion, if you have some binary compatibility issues, you just
>> > > need to do some minor changes, but in the current environment, you can 
>> > > only
>> > > get if your change works after all test cases finished (1 hour later...)
>> > >
>> > > Best,
>> > >
>> > > --
>> > > Nan Zhu
>> > >
>> >
>> >
>> >
>>
>>
>

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Code reading tips Spark source

2014-09-25 Thread Mozumder, Monir
Folks,

I am starting to explore the Spark framework and hope to contribute to it in the 
future. I was wondering if you have any documentation or tips for quickly 
understanding the inner workings of the code.

I am new to both Spark and Scala and am taking a look at the *Rdd*.scala files 
in the source tree.

My ultimate goal is to offload some of the compute done on a partition to the 
GPU cores available on the node. Has there been any prior attempt or design 
discussion on that aspect?
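
For context on what I have in mind, here is a minimal sketch of handing each 
partition to a (hypothetical) GPU routine via mapPartitions. The gpuSum call 
stands in for a JNI/OpenCL binding I would provide; everything else is the 
standard RDD API:

```
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._   // implicits for sum() on RDD[Double]

object GpuOffloadSketch {
  // Hypothetical native binding; in practice this would call into CUDA/OpenCL
  // via JNI. The CPU fallback here just keeps the sketch runnable.
  def gpuSum(values: Array[Double]): Double = values.sum

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("gpu-offload-sketch"))
    val data = sc.parallelize(1 to 1000000).map(_.toDouble)

    // Batch each partition into one array so the (hypothetical) GPU kernel
    // receives a large contiguous chunk instead of one element at a time.
    val partialSums = data.mapPartitions(iter => Iterator.single(gpuSum(iter.toArray)))
    println(partialSums.sum())
    sc.stop()
  }
}
```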

Bests,
-Monir




VertexRDD partition imbalance

2014-09-25 Thread Larry Xiao

Hi all

VertexRDD is partitioned with a HashPartitioner, and it exhibits some 
imbalance across tasks.

For example, Connected Components with partition strategy Edge2D:


   Aggregated Metrics by Executor

Executor ID | Task Time | Total Tasks | Failed Tasks | Succeeded Tasks | Input    | Shuffle Read | Shuffle Write | Shuffle Spill (Memory) | Shuffle Spill (Disk)
1           | 10 s      | 10          | 0            | 10              | 234.6 MB | 0.0 B        | 43.2 MB       | 0.0 B                  | 0.0 B
2           | 3 s       | 3           | 0            | 3               | 70.4 MB  | 0.0 B        | 13.0 MB       | 0.0 B                  | 0.0 B
3           | 6 s       | 6           | 0            | 6               | 140.7 MB | 0.0 B        | 25.9 MB       | 0.0 B                  | 0.0 B
4           | 9 s       | 8           | 0            | 8               | 187.9 MB | 0.0 B        | 34.6 MB       | 0.0 B                  | 0.0 B
5           | 10 s      | 9           | 0            | 9               | 211.4 MB | 0.0 B        | 38.9 MB       | 0.0 B                  | 0.0 B

For a stage on mapPartitions at VertexRDD.scala:347:

343
344   /** Generates an RDD of vertex attributes suitable for shipping to the edge partitions. */
345   private[graphx] def shipVertexAttributes(
346       shipSrc: Boolean, shipDst: Boolean): RDD[(PartitionID, VertexAttributeBlock[VD])] = {
347     partitionsRDD.mapPartitions(_.flatMap(_.shipVertexAttributes(shipSrc, shipDst)))
348   }
349

This is executed for every iteration in Pregel, so the imbalance is bad 
for performance.


However, when running PageRank with Edge2D, the tasks are even across 
executors (all executors finish 6 tasks).

Our configuration is 6 nodes, 36 partitions.
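
For reference, a minimal sketch of the setup described above (the edge-list 
path is hypothetical, and Edge2D refers to PartitionStrategy.EdgePartition2D):

```
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{GraphLoader, PartitionStrategy}

object ConnectedComponentsSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("cc-imbalance-sketch"))

    // Load edges into 36 partitions (canonicalOrientation = false) and
    // repartition them with the 2D strategy, as in the setup above.
    val graph = GraphLoader
      .edgeListFile(sc, "hdfs:///path/to/edges.txt", false, 36)
      .partitionBy(PartitionStrategy.EdgePartition2D)

    // Every Pregel iteration of connectedComponents runs the
    // shipVertexAttributes stage whose task distribution is shown above.
    val cc = graph.connectedComponents()
    println(cc.vertices.count())
    sc.stop()
  }
}
```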

My question is:

   What decides the number of tasks for different executors, and how can we
   make it balanced?

Thanks!
Larry



Re: Spark SQL use of alias in where clause

2014-09-25 Thread Du Li
Thanks, Yanbo and Nicholas. Now it makes more sense — query optimization is the 
answer. /Du

From: Nicholas Chammas <nicholas.cham...@gmail.com>
Date: Thursday, September 25, 2014 at 6:43 AM
To: Yanbo Liang <yanboha...@gmail.com>
Cc: Du Li <l...@yahoo-inc.com.invalid>, 
"dev@spark.apache.org" <dev@spark.apache.org>, 
"u...@spark.apache.org" <u...@spark.apache.org>
Subject: Re: Spark SQL use of alias in where clause

That is correct. Aliases in the SELECT clause can only be referenced in the 
ORDER BY and HAVING clauses. Otherwise, you'll have to just repeat the 
statement, like concat() in this case.

A more elegant alternative, which is probably not available in Spark SQL yet, 
is to use Common Table 
Expressions.

On Wed, Sep 24, 2014 at 11:32 PM, Yanbo Liang 
mailto:yanboha...@gmail.com>> wrote:
Maybe it's the way SQL works.
The select part is executed after the where filter is applied, so you cannot 
use an alias declared in the select part in the where clause.
Hive and Oracle behave the same as Spark SQL.

2014-09-25 8:58 GMT+08:00 Du Li 
mailto:l...@yahoo-inc.com.invalid>>:
Hi,

The following query does not work in Shark nor in the new Spark SQLContext or 
HiveContext.
SELECT key, value, concat(key, value) as combined from src where combined like 
’11%’;

The following tweak of syntax works fine although a bit ugly.
SELECT key, value, concat(key, value) as combined from src where 
concat(key,value) like ’11%’ order by combined;

Are you going to support alias in where clause soon?

Thanks,
Du




Re: MLlib enable extension of the LabeledPoint class

2014-09-25 Thread Niklas Wilcke
Hi Egor Pahomov,

thanks for your suggestions. I think I will do the dirty workaround
because I don't want to maintain my own version of Spark for now. Maybe
I will do that later when I feel ready to contribute to the project.

Kind Regards,
Niklas Wilcke

On 25.09.2014 16:27, Egor Pahomov wrote:
> I agree with Yu, that you should tell more about your intentions, but
> possible dirty workaround is create wrapper class for LabeledPoint with all
> additional information you need and unwrap values before train, and wrap
> them again after. (look at zipWithIndex - it helps match back additional
> information after unwrapping)
>
> But I would rather patch my spark with method signature chagnes you
> suggested.
>
> 2014-09-25 18:22 GMT+04:00 Egor Pahomov :
>
>> @Yu Ishikawa,
>>
>> *I think the right place for such discussion -
>>  https://issues.apache.org/jira/browse/SPARK-3573
>> *
>>
>>
>> 2014-09-25 18:02 GMT+04:00 Yu Ishikawa :
>>
>>> Hi Niklas Wilcke,
>>>
>>> As you said, it is difficult to extend LabeledPoint class in
>>> mllib.regression.
>>> Do you want to extend LabeledPoint class in order to use any other type
>>> exclude Double type?
>>> If you have your code on Github, could you show us it? I want to know what
>>> you want to do.
>>>
 Community
>>> By the way, I think LabeledPoint class is very useful exclude
>>> mllib.regression package.
>>> Especially, some estimation algorithms should use a type for the labels
>>> exclude Double type,
>>> such as String type. The common generics labeled-point class would be
>>> useful
>>> in MLlib.
>>> I'd like to get your thoughts on it.
>>>
>>> For example,
>>> ```
>>> abstract class LabeledPoint[T](label: T, features: Vector)
>>> ```
>>>
>>> thanks
>>>
>>>
>>>
>>>
>>>
>>>
>>> -
>>> -- Yu Ishikawa
>>> --
>>> View this message in context:
>>> http://apache-spark-developers-list.1001551.n3.nabble.com/MLlib-enable-extension-of-the-LabeledPoint-class-tp8546p8549.html
>>> Sent from the Apache Spark Developers List mailing list archive at
>>> Nabble.com.
>>>
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: dev-h...@spark.apache.org
>>>
>>>
>>
>> --
>>
>>
>>
>> *Sincerely yoursEgor PakhomovScala Developer, Yandex*
>>
>
>


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: MLlib enable extension of the LabeledPoint class

2014-09-25 Thread Niklas Wilcke
Hi Yu Ishikawa,

I'm sorry, but I can't share my code via GitHub at the moment. Hopefully
I can in a few months.
I don't want to change the type of the label, but that would also be a
very nice improvement.

Making LabeledPoint abstract is exactly what I need. That enables me to
create a class like

LabeledPointTuplePair(label: Double, features: Vector, tuplePair: (Tuple, 
Tuple)) extends LabeledPoint

and use it in combination with the existing ML algorithms.
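
Spelled out, this is roughly what I mean (a hypothetical sketch, not existing 
MLlib code; Tuple stands in for my application's record type):

```
import org.apache.spark.mllib.linalg.Vector

// Sketch of the abstract base: algorithms would only depend on label and features.
// (This deliberately reuses the LabeledPoint name, so it is not today's case class.)
abstract class LabeledPoint extends Serializable {
  def label: Double
  def features: Vector
}

// A concrete point carrying extra payload through the pipeline.
case class LabeledPointTuplePair[Tuple](
    label: Double,
    features: Vector,
    tuplePair: (Tuple, Tuple)) extends LabeledPoint
```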

In my limited understanding of MLlib, I agree with your proposal for
the LabeledPoint interface

abstract class LabeledPoint[T](label: T, features: Vector)

In my opinion, making LabeledPoint abstract is necessary, and introducing
a generic label would be nice to have.
Just to clarify my priorities.

Kind Regards,
Niklas Wilcke


On 25.09.2014 16:02, Yu Ishikawa wrote:
> Hi Niklas Wilcke,
>
> As you said, it is difficult to extend LabeledPoint class in
> mllib.regression.
> Do you want to extend LabeledPoint class in order to use any other type
> exclude Double type?
> If you have your code on Github, could you show us it? I want to know what
> you want to do.
>
>> Community
> By the way, I think LabeledPoint class is very useful exclude
> mllib.regression package.
> Especially, some estimation algorithms should use a type for the labels
> exclude Double type, 
> such as String type. The common generics labeled-point class would be useful
> in MLlib.
> I'd like to get your thoughts on it.
>
> For example,
> ```
> abstract class LabeledPoint[T](label: T, features: Vector)
> ```
>
> thanks
>
>
>
>
>
>
> -
> -- Yu Ishikawa
> --
> View this message in context: 
> http://apache-spark-developers-list.1001551.n3.nabble.com/MLlib-enable-extension-of-the-LabeledPoint-class-tp8546p8549.html
> Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: MLlib enable extension of the LabeledPoint class

2014-09-25 Thread Egor Pahomov
I agree with Yu that you should tell us more about your intentions, but a
possible dirty workaround is to create a wrapper class for LabeledPoint with all
the additional information you need, unwrap the values before training, and wrap
them again afterwards (look at zipWithIndex - it helps match back the additional
information after unwrapping).

But I would rather patch my Spark with the method signature changes you
suggested.
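
A rough sketch of that workaround against the existing DecisionTree API (the 
RichPoint wrapper and the hyperparameter values are just placeholders):

```
import org.apache.spark.SparkContext._   // pair-RDD implicits on Spark 1.x
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.DecisionTree
import org.apache.spark.rdd.RDD

// Hypothetical wrapper that keeps the extra payload next to each point.
case class RichPoint[E](point: LabeledPoint, extra: E)

object WrapperWorkaroundSketch {
  def trainAndRejoin[E](data: RDD[RichPoint[E]]) = {
    val indexed = data.zipWithIndex().map(_.swap)   // (index, RichPoint)
    val plain   = indexed.mapValues(_.point)        // (index, LabeledPoint)

    // Train on the unwrapped points only.
    val model = DecisionTree.trainClassifier(
      plain.values, numClasses = 2, categoricalFeaturesInfo = Map[Int, Int](),
      impurity = "gini", maxDepth = 5, maxBins = 32)

    // Predict per point and match the results back to the wrapped records by index.
    val preds = plain.mapValues(p => model.predict(p.features))
    val rejoined = indexed.join(preds).values       // (RichPoint[E], prediction)
    (model, rejoined)
  }
}
```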

2014-09-25 18:22 GMT+04:00 Egor Pahomov :

> @Yu Ishikawa,
>
> *I think the right place for such discussion -
>  https://issues.apache.org/jira/browse/SPARK-3573
> *
>
>
> 2014-09-25 18:02 GMT+04:00 Yu Ishikawa :
>
>> Hi Niklas Wilcke,
>>
>> As you said, it is difficult to extend LabeledPoint class in
>> mllib.regression.
>> Do you want to extend LabeledPoint class in order to use any other type
>> exclude Double type?
>> If you have your code on Github, could you show us it? I want to know what
>> you want to do.
>>
>> > Community
>> By the way, I think LabeledPoint class is very useful exclude
>> mllib.regression package.
>> Especially, some estimation algorithms should use a type for the labels
>> exclude Double type,
>> such as String type. The common generics labeled-point class would be
>> useful
>> in MLlib.
>> I'd like to get your thoughts on it.
>>
>> For example,
>> ```
>> abstract class LabeledPoint[T](label: T, features: Vector)
>> ```
>>
>> thanks
>>
>>
>>
>>
>>
>>
>> -
>> -- Yu Ishikawa
>> --
>> View this message in context:
>> http://apache-spark-developers-list.1001551.n3.nabble.com/MLlib-enable-extension-of-the-LabeledPoint-class-tp8546p8549.html
>> Sent from the Apache Spark Developers List mailing list archive at
>> Nabble.com.
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>> For additional commands, e-mail: dev-h...@spark.apache.org
>>
>>
>
>
> --
>
>
>
> *Sincerely yoursEgor PakhomovScala Developer, Yandex*
>



-- 



Sincerely yours,
Egor Pakhomov
Scala Developer, Yandex


Re: MLlib enable extension of the LabeledPoint class

2014-09-25 Thread Yu Ishikawa
Hi Egor Pahomov, 

Thank you for your comment!



-
-- Yu Ishikawa
--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/MLlib-enable-extension-of-the-LabeledPoint-class-tp8546p8551.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: MLlib enable extension of the LabeledPoint class

2014-09-25 Thread Egor Pahomov
@Yu Ishikawa,

I think the right place for such a discussion is
https://issues.apache.org/jira/browse/SPARK-3573


2014-09-25 18:02 GMT+04:00 Yu Ishikawa :

> Hi Niklas Wilcke,
>
> As you said, it is difficult to extend LabeledPoint class in
> mllib.regression.
> Do you want to extend LabeledPoint class in order to use any other type
> exclude Double type?
> If you have your code on Github, could you show us it? I want to know what
> you want to do.
>
> > Community
> By the way, I think LabeledPoint class is very useful exclude
> mllib.regression package.
> Especially, some estimation algorithms should use a type for the labels
> exclude Double type,
> such as String type. The common generics labeled-point class would be
> useful
> in MLlib.
> I'd like to get your thoughts on it.
>
> For example,
> ```
> abstract class LabeledPoint[T](label: T, features: Vector)
> ```
>
> thanks
>
>
>
>
>
>
> -
> -- Yu Ishikawa
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/MLlib-enable-extension-of-the-LabeledPoint-class-tp8546p8549.html
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>


-- 



Sincerely yours,
Egor Pakhomov
Scala Developer, Yandex


Re: MLlib enable extension of the LabeledPoint class

2014-09-25 Thread Yu Ishikawa
Hi Niklas Wilcke,

As you said, it is difficult to extend the LabeledPoint class in
mllib.regression.
Do you want to extend the LabeledPoint class in order to use a type other than
Double? If you have your code on GitHub, could you show it to us? I want to know
what you want to do.

> Community
By the way, I think a LabeledPoint class would be very useful outside the
mllib.regression package as well.
In particular, some estimation algorithms should use a label type other than
Double, such as String. A common generic labeled-point class would be useful
in MLlib.
I'd like to get your thoughts on it.

For example,
```
abstract class LabeledPoint[T](label: T, features: Vector)
```
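
And a usage sketch for such a class (hypothetical, not existing MLlib code):

```
import org.apache.spark.mllib.linalg.{Vector, Vectors}

// Hypothetical generic labeled point as proposed above.
case class GenericLabeledPoint[T](label: T, features: Vector)

object StringLabelExample {
  // A String-labeled point, which today would have to be encoded as a Double.
  val p = GenericLabeledPoint("spam", Vectors.dense(0.2, 1.0, 3.5))
}
```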

thanks






-
-- Yu Ishikawa
--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/MLlib-enable-extension-of-the-LabeledPoint-class-tp8546p8549.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Spark SQL use of alias in where clause

2014-09-25 Thread Nicholas Chammas
That is correct. Aliases in the SELECT clause can only be referenced in the
ORDER BY and HAVING clauses. Otherwise, you'll have to just repeat the
statement, like concat() in this case.

A more elegant alternative, which is probably not available in Spark SQL
yet, is to use Common Table Expressions.
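
To make it concrete, here is a sketch against a Hive-backed context (assuming 
the standard HiveContext API and the src table from the earlier messages; the 
CTE string is shown only for comparison, since Spark SQL likely cannot run it yet):

```
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)   // sc: an existing SparkContext

// Works today: repeat the expression in the WHERE clause.
val working = hiveContext.sql(
  """SELECT key, value, concat(key, value) AS combined
    |FROM src
    |WHERE concat(key, value) LIKE '11%'
    |ORDER BY combined""".stripMargin)

// The CTE form, which avoids repeating the expression where it is supported.
val cteForm =
  """WITH tmp AS (SELECT key, value, concat(key, value) AS combined FROM src)
    |SELECT key, value, combined FROM tmp WHERE combined LIKE '11%'""".stripMargin
```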

On Wed, Sep 24, 2014 at 11:32 PM, Yanbo Liang  wrote:

> Maybe it's the way SQL works.
> The select part is executed after the where filter is applied, so you
> cannot use alias declared in select part in where clause.
> Hive and Oracle behavior the same as Spark SQL.
>
> 2014-09-25 8:58 GMT+08:00 Du Li :
>
>>   Hi,
>>
>>  The following query does not work in Shark nor in the new Spark
>> SQLContext or HiveContext.
>> SELECT key, value, concat(key, value) as combined from src where combined
>> like ’11%’;
>>
>>  The following tweak of syntax works fine although a bit ugly.
>> SELECT key, value, concat(key, value) as combined from src where
>> concat(key,value) like ’11%’ order by combined;
>>
>>  Are you going to support alias in where clause soon?
>>
>>  Thanks,
>> Du
>>
>
>


Re: do MIMA checking before all test cases start?

2014-09-25 Thread Nicholas Chammas
It might still make sense to make this change if MIMA checks are always
relatively quick, for the same reason we do style checks first.

On Thu, Sep 25, 2014 at 12:25 AM, Nan Zhu  wrote:

> yeah, I tried that, but there is always an issue when I ran dev/mima,
>
> it always gives me some binary compatibility error on Java API part….
>
> so I have to wait for Jenkins’ result when fixing MIMA issues
>
> --
> Nan Zhu
>
>
> On Thursday, September 25, 2014 at 12:04 AM, Patrick Wendell wrote:
>
> > Have you considered running the mima checks locally? We prefer people
> > not use Jenkins for very frequent checks since it takes resources away
> > from other people trying to run tests.
> >
> > > On Wed, Sep 24, 2014 at 6:44 PM, Nan Zhu <zhunanmcg...@gmail.com> wrote:
> > > Hi, all
> > >
> > > It seems that, currently, Jenkins makes MIMA checking after all test
> cases have finished, IIRC, during the first months we introduced MIMA, we
> do the MIMA checking before running test cases
> > >
> > > What's the motivation to adjust this behaviour?
> > >
> > > In my opinion, if you have some binary compatibility issues, you just
> need to do some minor changes, but in the current environment, you can only
> get if your change works after all test cases finished (1 hour later...)
> > >
> > > Best,
> > >
> > > --
> > > Nan Zhu
> > >
> >
> >
> >
>
>
>


MLlib enable extension of the LabeledPoint class

2014-09-25 Thread Niklas Wilcke
Hi Spark developers,

I am trying to implement a framework with Spark and MLlib to do duplicate
detection. I'm not familiar with Spark and Scala, so please be patient
with me. In order to enrich the LabeledPoint class with some information,
I tried to extend it and add some properties.
But the ML algorithms (in my case DecisionTree) don't accept my new
ExtendedLabeledPoint class. They just accept the type RDD[LabeledPoint].
Would it be possible to extract a LabeledPoint interface / trait, or to
change the type to something covariant like
def train[A <: LabeledPoint](RDD[A], ...)?
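
To make the idea concrete, something along these lines (a hypothetical sketch, 
not the current MLlib API):

```
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD

// Hypothetical trait extracted from LabeledPoint; algorithms could accept any subtype.
trait LabeledPointLike extends Serializable {
  def label: Double
  def features: Vector
}

// Hypothetical covariant-style train method bounded by the trait.
object CovariantTrainSketch {
  def train[A <: LabeledPointLike](input: RDD[A], maxDepth: Int): Unit = {
    // An algorithm would only ever read the label/features view of each point,
    // so enriched subclasses pass through untouched.
    val labeledFeatures = input.map(p => (p.label, p.features))
    // ... learning logic would go here ...
  }
}
```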

I hope it's a useful idea and that it's possible.

Thanks in advance,
Niklas

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: spark_classpath in core/pom.xml and yarn/pom.xml

2014-09-25 Thread Ye Xianjin
Hi Sandy Ryza,
 I believe it was you who originally added SPARK_CLASSPATH in core/pom.xml in 
the org.scalatest section. Is this still needed in 1.1?
 I noticed this setting because when I looked into unit-tests.log, it 
shows the following:
> 14/09/24 23:57:19.246 WARN SparkConf:
> SPARK_CLASSPATH was detected (set to 'null').
> This is deprecated in Spark 1.0+.
> 
> Please instead use:
>  - ./spark-submit with --driver-class-path to augment the driver classpath
>  - spark.executor.extraClassPath to augment the executor classpath
> 
> 14/09/24 23:57:19.246 WARN SparkConf: Setting 'spark.executor.extraClassPath' 
> to 'null' as a work-around.
> 14/09/24 23:57:19.247 WARN SparkConf: Setting 'spark.driver.extraClassPath' 
> to 'null' as a work-around.

However, I didn't set the SPARK_CLASSPATH env variable. And looking into 
SparkConf.scala, if the user has actually set extraClassPath, SparkConf will throw a 
SparkException.
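
For reference, the behaviour I see reads roughly like this (a paraphrased sketch, 
not the actual SparkConf source; names are illustrative):

```
import org.apache.spark.{SparkConf, SparkException}

object SparkClasspathCheckSketch {
  def handleDeprecatedClasspath(conf: SparkConf): Unit = {
    sys.env.get("SPARK_CLASSPATH").foreach { classpath =>
      println(s"WARN SparkConf: SPARK_CLASSPATH was detected (set to '$classpath'). " +
        "This is deprecated in Spark 1.0+.")
      for (key <- Seq("spark.executor.extraClassPath", "spark.driver.extraClassPath")) {
        if (conf.contains(key)) {
          // If the user has already set extraClassPath explicitly, fail loudly.
          throw new SparkException(s"Found both SPARK_CLASSPATH and $key. Use only the latter.")
        } else {
          println(s"WARN SparkConf: Setting '$key' to '$classpath' as a work-around.")
          conf.set(key, classpath)
        }
      }
    }
  }
}
```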
-- 
Ye Xianjin
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Tuesday, September 23, 2014 at 12:56 AM, Ye Xianjin wrote:

> Hi:
> I notice the scalatest-maven-plugin set SPARK_CLASSPATH environment 
> variable for testing. But in the SparkConf.scala, this is deprecated in Spark 
> 1.0+.
> So what this variable for? should we just remove this variable?
> 
> 
> -- 
> Ye Xianjin
> Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
>