RE: Does yarn-stable still accept pull request?

2014-02-11 Thread Liu, Raymond
It should be fixed in both the alpha and stable code bases, since we aim to support 
both versions.

Best Regards,
Raymond Liu

-Original Message-
From: Nan Zhu [mailto:zhunanmcg...@gmail.com] 
Sent: Wednesday, February 12, 2014 10:29 AM
To: dev@spark.incubator.apache.org
Subject: Does yarn-stable still accept pull request?

Hi, all

I’m a new user of spark-yarn  

I would like to create a pull request for an issue I found in my usage. Where 
should I modify the code, stable or alpha (the problem exists in both)?

Best,  

--  
Nan Zhu




RE: yarn, fat-jars and lib_managed

2014-01-09 Thread Liu, Raymond
I think you could put the Spark jar and the other jars your app depends on (the 
ones that don't change often) on HDFS, and use --files or --addJars (depending on 
the mode you run, YarnClient/YarnStandalone) to refer to them.
Then you only need to redeploy your thin app jar on each invocation.
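
A sketch of that workflow, assuming the 0.8-era YARN client; the flag names and paths here are illustrative assumptions, so check the docs for your version:

```shell
# Upload the rarely-changing jars to HDFS once:
hadoop fs -put spark-assembly.jar /user/me/jars/
hadoop fs -put app-deps.jar /user/me/jars/

# On each run, ship only the thin app jar and reference the rest from HDFS:
SPARK_JAR=hdfs:///user/me/jars/spark-assembly.jar \
  ./spark-class org.apache.spark.deploy.yarn.Client \
    --jar my-thin-app.jar \
    --class com.example.MyJob \
    --addJars hdfs:///user/me/jars/app-deps.jar
```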

Best Regards,
Raymond Liu


-Original Message-
From: Alex Cozzi [mailto:alexco...@gmail.com] 
Sent: Friday, January 10, 2014 5:32 AM
To: dev@spark.incubator.apache.org
Subject: yarn, fat-jars and lib_managed

I am just starting out playing with spark on our hadoop 2.2 cluster and I have 
a question.

The current way to submit jobs to the cluster is to create fat-jars with sbt 
assembly. This approach works, but I think it is less than optimal in many large 
Hadoop installations:

The way we interact with the cluster is to log into a CLI machine, which is the 
only one authorized to submit jobs. Now, I cannot use the CLI machine as a dev 
environment since, for security reasons, the CLI machine and Hadoop cluster are 
firewalled and cannot reach out to the internet, so sbt and Maven dependency 
resolution does not work.

So the procedure now is:
- hack code
- sbt assembly
- rsync my spark directory to the CLI machine
- run my job.

The issue is that every time I need to shuttle large binary files (all the 
fat-jars) back and forth; they are about 120 MB now, which is slow, particularly 
when I am working remotely from home.

I was wondering whether a better solution would be to create normal thin-jars of 
my code, which are very small (less than a MB) and no problem to copy to the 
cluster every time, but to take advantage of the sbt-created directory 
lib_managed to handle dependencies. We already have this directory, which sbt 
maintains with all the dependencies needed for the job to run. Wouldn't it be 
possible to have the Spark YARN Client take care of adding all the jars in 
lib_managed to the classpath and distributing them to the workers automatically? 
(They could also be cached across invocations of Spark; after all, those jars are 
versioned and immutable, with the possible exception of -SNAPSHOT releases.) I 
think this would greatly simplify the development procedure and remove the need 
to mess with ADD_JAR and SPARK_CLASSPATH.
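
For reference, the lib_managed directory mentioned above is populated by sbt's retrieveManaged setting; a minimal build.sbt sketch (setting names are from sbt itself, the comments describe the intended workflow):

```scala
// build.sbt: copy all managed dependencies into lib_managed/ so they can be
// shipped (or cached on the cluster) separately from the application code.
retrieveManaged := true

// The application itself is then built with the plain `package` task,
// producing a thin jar that contains only the project's own classes.
```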

What do you think?

Alex 


RE: compiling against hadoop 2.2

2014-01-02 Thread Liu, Raymond
I think you also need to set yarn.version

Say something like 

mvn -Pyarn -Dhadoop.version=2.2.0 -Dyarn.version=2.2.0 -DskipTests clean package


hadoop.version defaults to 2.2.0 but yarn.version does not when you choose the 
new-yarn profile. We probably need to fix that later for easier usage.



Best Regards,
Raymond Liu

-Original Message-
From: Ted Yu [mailto:yuzhih...@gmail.com] 
Sent: Friday, January 03, 2014 1:07 PM
To: dev@spark.incubator.apache.org
Subject: compiling against hadoop 2.2

Hi,
I used the following command to compile against hadoop 2.2:
mvn clean package -DskipTests -Pnew-yarn

But I got a lot of compilation errors.

Did I use the wrong command ?

Cheers


RE: compiling against hadoop 2.2

2014-01-02 Thread Liu, Raymond
Sorry, it should be: mvn -Pnew-yarn -Dhadoop.version=2.2.0 -Dyarn.version=2.2.0 -DskipTests 
clean package

The profile in the previous mail is not yet available.


Best Regards,
Raymond Liu


-Original Message-
From: Liu, Raymond 
Sent: Friday, January 03, 2014 2:09 PM
To: dev@spark.incubator.apache.org
Subject: RE: compiling against hadoop 2.2

I think you also need to set yarn.version

Say something like 

mvn -Pyarn -Dhadoop.version=2.2.0 -Dyarn.version=2.2.0 -DskipTests clean package


hadoop.version is default to 2.2.0 while yarn.version not when you chose the 
new-yarn profile. We probably need to fix it later for easy usage.



Best Regards,
Raymond Liu

-Original Message-
From: Ted Yu [mailto:yuzhih...@gmail.com] 
Sent: Friday, January 03, 2014 1:07 PM
To: dev@spark.incubator.apache.org
Subject: compiling against hadoop 2.2

Hi,
I used the following command to compile against hadoop 2.2:
mvn clean package -DskipTests -Pnew-yarn

But I got a lot of compilation errors.

Did I use the wrong command ?

Cheers


RE: compiling against hadoop 2.2

2014-01-02 Thread Liu, Raymond
Yep, you are right. Though we will merge in new code on this part pretty soon 
(maybe today? I hope so), which might shift a few lines.

Best Regards,
Raymond Liu

-Original Message-
From: Ted Yu [mailto:yuzhih...@gmail.com] 
Sent: Friday, January 03, 2014 2:21 PM
To: dev@spark.incubator.apache.org
Subject: Re: compiling against hadoop 2.2

Specification of yarn.version can be inserted following this line (#762 in 
pom.xml), right ?
 <hadoop.version>2.2.0</hadoop.version>
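
A yarn.version property inserted next to the hadoop.version default would look like this (a sketch; the exact surrounding layout of pom.xml may differ):

```xml
<properties>
  <hadoop.version>2.2.0</hadoop.version>
  <!-- added so the new-yarn profile picks up a matching YARN version -->
  <yarn.version>2.2.0</yarn.version>
</properties>
```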


On Thu, Jan 2, 2014 at 10:10 PM, Liu, Raymond raymond@intel.com wrote:

 Sorry , mvn -Pnew-yarn -Dhadoop.version=2.2.0 -Dyarn.version=2.2.0 
 -DskipTests clean package

 The one in previous mail not yet available.


 Best Regards,
 Raymond Liu


 -Original Message-
 From: Liu, Raymond
 Sent: Friday, January 03, 2014 2:09 PM
 To: dev@spark.incubator.apache.org
 Subject: RE: compiling against hadoop 2.2

 I think you also need to set yarn.version

 Say something like

 mvn -Pyarn -Dhadoop.version=2.2.0 -Dyarn.version=2.2.0 -DskipTests 
 clean package


 hadoop.version is default to 2.2.0 while yarn.version not when you 
 chose the new-yarn profile. We probably need to fix it later for easy usage.



 Best Regards,
 Raymond Liu

 -Original Message-
 From: Ted Yu [mailto:yuzhih...@gmail.com]
 Sent: Friday, January 03, 2014 1:07 PM
 To: dev@spark.incubator.apache.org
 Subject: compiling against hadoop 2.2

 Hi,
 I used the following command to compile against hadoop 2.2:
 mvn clean package -DskipTests -Pnew-yarn

 But I got a lot of compilation errors.

 Did I use the wrong command ?

 Cheers



RE: compiling against hadoop 2.2

2014-01-02 Thread Liu, Raymond
And I am not sure whether it is valuable to provide different settings for the 
hadoop/hdfs and yarn versions. When building with SBT, they will always be the 
same. Maybe in mvn we should do so too.

Best Regards,
Raymond Liu


-Original Message-
From: Liu, Raymond 
Sent: Friday, January 03, 2014 2:29 PM
To: dev@spark.incubator.apache.org
Subject: RE: compiling against hadoop 2.2

Yep, you are right. While we will merge in new code pretty soon ( maybe today? 
I hope so) on this part. Might shift a few lines

Best Regards,
Raymond Liu

-Original Message-
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Friday, January 03, 2014 2:21 PM
To: dev@spark.incubator.apache.org
Subject: Re: compiling against hadoop 2.2

Specification of yarn.version can be inserted following this line (#762 in 
pom.xml), right ?
 <hadoop.version>2.2.0</hadoop.version>


On Thu, Jan 2, 2014 at 10:10 PM, Liu, Raymond raymond@intel.com wrote:

 Sorry , mvn -Pnew-yarn -Dhadoop.version=2.2.0 -Dyarn.version=2.2.0 
 -DskipTests clean package

 The one in previous mail not yet available.


 Best Regards,
 Raymond Liu


 -Original Message-
 From: Liu, Raymond
 Sent: Friday, January 03, 2014 2:09 PM
 To: dev@spark.incubator.apache.org
 Subject: RE: compiling against hadoop 2.2

 I think you also need to set yarn.version

 Say something like

 mvn -Pyarn -Dhadoop.version=2.2.0 -Dyarn.version=2.2.0 -DskipTests 
 clean package


 hadoop.version is default to 2.2.0 while yarn.version not when you 
 chose the new-yarn profile. We probably need to fix it later for easy usage.



 Best Regards,
 Raymond Liu

 -Original Message-
 From: Ted Yu [mailto:yuzhih...@gmail.com]
 Sent: Friday, January 03, 2014 1:07 PM
 To: dev@spark.incubator.apache.org
 Subject: compiling against hadoop 2.2

 Hi,
 I used the following command to compile against hadoop 2.2:
 mvn clean package -DskipTests -Pnew-yarn

 But I got a lot of compilation errors.

 Did I use the wrong command ?

 Cheers



RE: [VOTE] Release Apache Spark 0.8.1-incubating (rc4)

2013-12-15 Thread Liu, Raymond
Hi Azuryy

Please Check https://spark-project.atlassian.net/browse/SPARK-995 for this 
protobuf version issue

Best Regards,
Raymond Liu

-Original Message-
From: Azuryy Yu [mailto:azury...@gmail.com] 
Sent: Monday, December 16, 2013 10:30 AM
To: dev@spark.incubator.apache.org
Subject: Re: [VOTE] Release Apache Spark 0.8.1-incubating (rc4)

Hi here,
Do we have a plan to upgrade protobuf from 2.4.1 to 2.5.0? PB has some 
incompatible APIs between these two versions.
Hadoop-2.x uses protobuf-2.5.0.


But if some people want to run Spark on Mesos, Mesos is currently using
protobuf-2.4.1, so we may want to discuss a better solution here.



On Mon, Dec 16, 2013 at 7:42 AM, Azuryy Yu azury...@gmail.com wrote:

 Thanks Patrick.
 On 16 Dec 2013 02:43, Patrick Wendell pwend...@gmail.com wrote:

 You can checkout the docs mentioned in the vote thread. There is also 
 a pre-build binary for hadoop2 that is compiled for YARN 2.2

 - Patrick

 On Sun, Dec 15, 2013 at 4:31 AM, Azuryy Yu azury...@gmail.com wrote:
  yarn 2.2, not yarn 0.22, I am so sorry.
 
 
  On Sun, Dec 15, 2013 at 8:31 PM, Azuryy Yu azury...@gmail.com wrote:
 
  Hi,
  Spark-0.8.1 supports yarn 0.22 right? where to find the release note?
  Thanks.
 
 
  On Sun, Dec 15, 2013 at 3:20 AM, Henry Saputra 
 henry.sapu...@gmail.comwrote:
 
  Yeah seems like it. He was ok with our prev release.
  Let's wait for his reply
 
  On Saturday, December 14, 2013, Patrick Wendell wrote:
 
   Henry - from that thread it looks like sebb's concern was 
   something different than this.
  
   On Sat, Dec 14, 2013 at 11:08 AM, Henry Saputra 
  henry.sapu...@gmail.com
   wrote:
Hi Patrick,
   
Yeap I agree, but technically ASF VOTE release on source 
only,
 there
even debate about it =), so putting it in the vote staging
 artifact
could confuse people because in our case we do package 3rd 
party libraries in the binary jars.
   
I have sent email to sebb asking clarification about his 
concern
 in
general@ list.
   
- Henry
   
On Sat, Dec 14, 2013 at 10:56 AM, Patrick Wendell 
 pwend...@gmail.com
  
   wrote:
Hey Henry,
   
One thing a lot of people do during the vote is test the
 binaries and
make sure they work. This is really valuable. If you'd like 
I
 could
add a caveat to the vote thread explaining that we are only
 voting on
the source.
   
- Patrick
   
On Sat, Dec 14, 2013 at 10:40 AM, Henry Saputra 
   henry.sapu...@gmail.com wrote:
Actually we should be fine putting the binaries there as 
long
 as the
VOTE is for the source.
   
Let's verify with sebb in the general@ list about his concern.
   
- Henry
   
On Sat, Dec 14, 2013 at 10:31 AM, Henry Saputra 
   henry.sapu...@gmail.com wrote:
Hi Patrick, as sebb has mentioned let's move the binaries 
from
 the
voting directory in your people.apache.org directory.
ASF release voting is for source code and not binaries, 
and technically we provide binaries for convenience.
   
And add link to the KEYS location in the dist[1] to let 
verify
   signatures.
   
Sorry for the late response to the VOTE thread, guys.
   
- Henry
   
[1]
  https://dist.apache.org/repos/dist/release/incubator/spark/KEYS
   
On Fri, Dec 13, 2013 at 6:37 PM, Patrick Wendell 
  pwend...@gmail.com
   wrote:
The vote is now closed. This vote passes with 5 PPMC +1's 
and
 no 0
   or -1
votes.
   
+1 (5 Total)
Matei Zaharia*
Nick Pentreath*
Patrick Wendell*
Prashant Sharma*
Tom Graves*
   
0 (0 Total)
   
-1 (0 Total)
   
* = Binding Vote
   
As per the incubator release guide [1] I'll be sending 
this
 to the
general incubator list for a final vote from IPMC members.
   
[1]
   
  
 
 http://incubator.apache.org/guides/releasemanagement.html#best-practi
 ce-incubator-release-
vote
   
   
On Thu, Dec 12, 2013 at 8:59 AM, Evan Chan 
e...@ooyala.com
 wrote:
   
I'd be personally fine with a standard workflow of
 assemble-deps
  +
packaging just the Spark files as separate packages, if 
it
  speeds up
everyone's development time.
   
   
On Wed, Dec 11, 2013 at 1:10 PM, Mark Hamstra 
   m...@clearstorydata.com
wrote:
   
 I don't know how to make sense of the numbers, but 
 here's
 what
   I've got
 from a very small sample size.
 
 
 




RE: [VOTE] Release Apache Spark 0.8.1-incubating (rc4)

2013-12-15 Thread Liu, Raymond
That issue is about the solution for 0.9.

And if you mean 0.8.1: when you build against Hadoop 2.2 YARN, protobuf is 
already 2.5.0 instead of 2.4.1, so it will work fine with Hadoop 2.2.
As for building 0.8.1 against Hadoop 2.2 YARN while running on Mesos... a strange 
combination; I am not sure, it might have problems. If it does, you might need to 
build Mesos against protobuf 2.5.0. I haven't tested that; if you have time, 
would you mind running a test?

Best Regards,
Raymond Liu


-Original Message-
From: Liu, Raymond [mailto:raymond@intel.com] 
Sent: Monday, December 16, 2013 10:48 AM
To: dev@spark.incubator.apache.org
Subject: RE: [VOTE] Release Apache Spark 0.8.1-incubating (rc4)

Hi Azuryy

Please Check https://spark-project.atlassian.net/browse/SPARK-995 for this 
protobuf version issue

Best Regards,
Raymond Liu

-Original Message-
From: Azuryy Yu [mailto:azury...@gmail.com]
Sent: Monday, December 16, 2013 10:30 AM
To: dev@spark.incubator.apache.org
Subject: Re: [VOTE] Release Apache Spark 0.8.1-incubating (rc4)

Hi here,
Do we have plan to upgrade protobuf from 2.4.1 to 2.5.0? PB has some 
uncompatable API between these two versions.
Hadoop-2.x using protobuf-2.5.0


but if some guys want to run Spark on mesos, then mesos using
protobuf-2.4.1 currently. so we may discuss here for a better solution.



On Mon, Dec 16, 2013 at 7:42 AM, Azuryy Yu azury...@gmail.com wrote:

 Thanks Patrick.
 On 16 Dec 2013 02:43, Patrick Wendell pwend...@gmail.com wrote:

 You can checkout the docs mentioned in the vote thread. There is also 
 a pre-build binary for hadoop2 that is compiled for YARN 2.2

 - Patrick

 On Sun, Dec 15, 2013 at 4:31 AM, Azuryy Yu azury...@gmail.com wrote:
  yarn 2.2, not yarn 0.22, I am so sorry.
 
 
  On Sun, Dec 15, 2013 at 8:31 PM, Azuryy Yu azury...@gmail.com wrote:
 
  Hi,
  Spark-0.8.1 supports yarn 0.22 right? where to find the release note?
  Thanks.
 
 
  On Sun, Dec 15, 2013 at 3:20 AM, Henry Saputra 
 henry.sapu...@gmail.comwrote:
 
  Yeah seems like it. He was ok with our prev release.
  Let's wait for his reply
 
  On Saturday, December 14, 2013, Patrick Wendell wrote:
 
   Henry - from that thread it looks like sebb's concern was 
   something different than this.
  
   On Sat, Dec 14, 2013 at 11:08 AM, Henry Saputra 
  henry.sapu...@gmail.com
   wrote:
Hi Patrick,
   
Yeap I agree, but technically ASF VOTE release on source 
only,
 there
even debate about it =), so putting it in the vote staging
 artifact
could confuse people because in our case we do package 3rd 
party libraries in the binary jars.
   
I have sent email to sebb asking clarification about his 
concern
 in
general@ list.
   
- Henry
   
On Sat, Dec 14, 2013 at 10:56 AM, Patrick Wendell 
 pwend...@gmail.com
  
   wrote:
Hey Henry,
   
One thing a lot of people do during the vote is test the
 binaries and
make sure they work. This is really valuable. If you'd like 
I
 could
add a caveat to the vote thread explaining that we are only
 voting on
the source.
   
- Patrick
   
On Sat, Dec 14, 2013 at 10:40 AM, Henry Saputra 
   henry.sapu...@gmail.com wrote:
Actually we should be fine putting the binaries there as 
long
 as the
VOTE is for the source.
   
Let's verify with sebb in the general@ list about his concern.
   
- Henry
   
On Sat, Dec 14, 2013 at 10:31 AM, Henry Saputra 
   henry.sapu...@gmail.com wrote:
Hi Patrick, as sebb has mentioned let's move the binaries 
from
 the
voting directory in your people.apache.org directory.
ASF release voting is for source code and not binaries, 
and technically we provide binaries for convenience.
   
And add link to the KEYS location in the dist[1] to let 
verify
   signatures.
   
Sorry for the late response to the VOTE thread, guys.
   
- Henry
   
[1]
  https://dist.apache.org/repos/dist/release/incubator/spark/KEYS
   
On Fri, Dec 13, 2013 at 6:37 PM, Patrick Wendell 
  pwend...@gmail.com
   wrote:
The vote is now closed. This vote passes with 5 PPMC +1's 
and
 no 0
   or -1
votes.
   
+1 (5 Total)
Matei Zaharia*
Nick Pentreath*
Patrick Wendell*
Prashant Sharma*
Tom Graves*
   
0 (0 Total)
   
-1 (0 Total)
   
* = Binding Vote
   
As per the incubator release guide [1] I'll be sending 
this
 to the
general incubator list for a final vote from IPMC members.
   
[1]
   
  
 
 http://incubator.apache.org/guides/releasemanagement.html#best-practi
 ce-incubator-release-
vote
   
   
On Thu, Dec 12, 2013 at 8:59 AM, Evan Chan 
e...@ooyala.com
 wrote:
   
I'd be personally fine with a standard workflow of
 assemble-deps
  +
packaging just the Spark files as separate packages, if 
it
  speeds up
everyone's development time.
   
   
On Wed, Dec 11, 2013 at 1:10 PM, Mark Hamstra 
   m...@clearstorydata.com
wrote:
   
 I

RE: Scala 2.10 Merge

2013-12-12 Thread Liu, Raymond
Hi Patrick

What does dropping YARN 2.2 mean? The code seems to still be there. 
You mean if we build upon 2.2 it will break and won't work, right? Since the 
home-made akka build on Scala 2.10 is not there. In that case, can we just use 
akka 2.3-M1, which runs on protobuf 2.5, as a replacement?

Best Regards,
Raymond Liu


-Original Message-
From: Patrick Wendell [mailto:pwend...@gmail.com] 
Sent: Thursday, December 12, 2013 4:21 PM
To: dev@spark.incubator.apache.org
Subject: Scala 2.10 Merge

Hi Developers,

In the next few days we are planning to merge Scala 2.10 support into Spark. 
For those that haven't been following this, Prashant Sharma has been 
maintaining the scala-2.10 branch of Spark for several months. This branch is 
current with master and has been reviewed for merging:

https://github.com/apache/incubator-spark/tree/scala-2.10

Scala 2.10 support is one of the most requested features for Spark - it will be 
great to get this into Spark 0.9! Please note that *Scala 2.10 is not binary 
compatible with Scala 2.9*. With that in mind, I wanted to give a few 
heads-up/requests to developers:

If you are developing applications on top of Spark's master branch, those will 
need to migrate to Scala 2.10. You may want to download and test the current 
scala-2.10 branch in order to make sure you will be okay as Spark developments 
move forward. Of course, you can always stick with the current master commit 
and be fine (I'll cut a tag when we do the merge in order to delineate where 
the version changes). Please open new threads on the dev list to report and 
discuss any issues.

This merge will temporarily drop support for YARN 2.2 on the master branch.
This is because the workaround we used was only compiled for Scala 2.9. We are 
going to come up with a more robust solution to YARN 2.2 support before 
releasing 0.9.

Going forward, we will continue to make maintenance releases on branch-0.8 
which will remain compatible with Scala 2.9.

For those interested, the primary code changes in this merge are upgrading the 
akka version, changing the use of Scala 2.9's ClassManifest construct to Scala 
2.10's ClassTag, and updating the spark shell to work with Scala 2.10's repl.
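
As an aside, the ClassManifest-to-ClassTag change can be illustrated with a minimal, non-Spark sketch (the method name here is just an example):

```scala
import scala.reflect.ClassTag

// Scala 2.10 deprecates ClassManifest in favor of ClassTag for carrying
// runtime class information, e.g. when constructing arrays generically.
def fill[T: ClassTag](n: Int, value: T): Array[T] = Array.fill(n)(value)

// Under Scala 2.9 the same method would have used a ClassManifest bound:
// def fill[T: ClassManifest](n: Int, value: T): Array[T] = Array.fill(n)(value)
```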

- Patrick


RE: Scala 2.10 Merge

2013-12-12 Thread Liu, Raymond
Hi Patrick

So what's the plan for supporting YARN 2.2 in 0.9? As far as I can see, if 
you want to support both 2.2 and 2.0, then due to the protobuf version 
incompatibility you need two versions of akka anyway.

Akka 2.3-M1 looks like it has a few API changes; we could probably isolate 
the code the way we did for the YARN API. I remember it was mentioned that using 
reflection for the different APIs is preferred. So the purpose of using 
reflection is to let one release binary jar support both versions of Hadoop/YARN 
at runtime, instead of building different binary jars at compile time?

Then all code related to Hadoop would also be built in separate modules 
for loading on demand? That sounds like a lot of work to me. And you still need a 
shim layer and separate code for the different API versions, which depend on 
different akka versions, etc. That sounds like an even stricter requirement than 
our current approach on master, with a dynamic class loader in addition, and the 
problems we are facing now would still be there?

Best Regards,
Raymond Liu

-Original Message-
From: Patrick Wendell [mailto:pwend...@gmail.com] 
Sent: Thursday, December 12, 2013 5:13 PM
To: dev@spark.incubator.apache.org
Subject: Re: Scala 2.10 Merge

Also - the code is still there because of a recent merge that took in some 
newer changes... we'll be removing it for the final merge.


On Thu, Dec 12, 2013 at 1:12 AM, Patrick Wendell pwend...@gmail.com wrote:

 Hey Raymond,

 This won't work because AFAIK akka 2.3-M1 is not binary compatible 
 with akka 2.2.3 (right?). For all of the non-yarn 2.2 versions we need 
 to still use the older protobuf library, so we'd need to support both.

 I'd also be concerned about having a reference to a non-released 
 version of akka. Akka is the source of our hardest-to-find bugs and 
 simultaneously trying to support 2.2.3 and 2.3-M1 is a bit daunting. 
 Of course, if you are building off of master you can maintain a fork that 
 uses this.

 - Patrick


 On Thu, Dec 12, 2013 at 12:42 AM, Liu, Raymond raymond@intel.comwrote:

 Hi Patrick

 What does that means for drop YARN 2.2? seems codes are still 
 there. You mean if build upon 2.2 it will break, and won't and work right?
 Since the home made akka build on scala 2.10 are not there. While, if 
 for this case, can we just use akka 2.3-M1 which run on protobuf 2.5 
 for replacement?

 Best Regards,
 Raymond Liu


 -Original Message-
 From: Patrick Wendell [mailto:pwend...@gmail.com]
 Sent: Thursday, December 12, 2013 4:21 PM
 To: dev@spark.incubator.apache.org
 Subject: Scala 2.10 Merge

 Hi Developers,

 In the next few days we are planning to merge Scala 2.10 support into 
 Spark. For those that haven't been following this, Prashant Sharma 
 has been maintaining the scala-2.10 branch of Spark for several 
 months. This branch is current with master and has been reviewed for merging:

 https://github.com/apache/incubator-spark/tree/scala-2.10

 Scala 2.10 support is one of the most requested features for Spark - 
 it will be great to get this into Spark 0.9! Please note that *Scala 
 2.10 is not binary compatible with Scala 2.9*. With that in mind, I 
 wanted to give a few heads-up/requests to developers:

 If you are developing applications on top of Spark's master branch, 
 those will need to migrate to Scala 2.10. You may want to download 
 and test the current scala-2.10 branch in order to make sure you will 
 be okay as Spark developments move forward. Of course, you can always 
 stick with the current master commit and be fine (I'll cut a tag when 
 we do the merge in order to delineate where the version changes). 
 Please open new threads on the dev list to report and discuss any issues.

 This merge will temporarily drop support for YARN 2.2 on the master 
 branch.
 This is because the workaround we used was only compiled for Scala 2.9.
 We are going to come up with a more robust solution to YARN 2.2 
 support before releasing 0.9.

 Going forward, we will continue to make maintenance releases on
 branch-0.8 which will remain compatible with Scala 2.9.

 For those interested, the primary code changes in this merge are 
 upgrading the akka version, changing the use of Scala 2.9's 
 ClassManifest construct to Scala 2.10's ClassTag, and updating the 
 spark shell to work with Scala 2.10's repl.

 - Patrick





what's the strategy for code sync between branches e.g. scala-2.10 v.s. master?

2013-11-04 Thread Liu, Raymond
Hi
It seems to me that dev branches are kept in sync with master by repeatedly 
merging master; e.g. the scala-2.10 branch continuously merges the latest master 
code into itself for updates.

While I am wondering: what's the general guideline for doing this? It 
seems to me that not all code in master is merged into the scala-2.10 branch. 
Say, on Oct 10 there was a merge from master into the scala-2.10 branch, while 
some commits from Oct 4 were not included, e.g. the StandaloneX to CoarseGrainedX 
rename. So I am puzzled: how do we track which commits have already been merged 
into the scala-2.10 branch and which have not? And how do we plan to merge the 
scala-2.10 branch back to master? And is there any good way to find out which 
changes were made on the 2.10 branch and which came from master through merge 
operations? It seems to me pretty hard to identify them and sync the code.

It seems to me that a rebase on master wouldn't lead to the above issues, 
since all branch changes would stay on top. So is there any reason that merging 
was chosen instead of rebase, other than not wanting a force update on 
checked-out source?


Best Regards,
Raymond Liu




RE: issue regarding akka, protobuf and Hadoop version

2013-11-04 Thread Liu, Raymond
I plan to do the work on the scala-2.10 branch, which has already moved to akka 
2.2.3. I hope that moving to akka 2.3-M1 (which supports protobuf 2.5.x) will not 
cause many problems; I will treat it as a test to see whether there are further 
issues, and then wait for the formal release of akka 2.3.x.

The issue is that I can see many commits on the master branch that are not merged 
into the scala-2.10 branch yet. The latest merge seems to have happened on Oct 
11, and as I mentioned in the dev-branch merge/sync thread, many earlier commits 
seem not to be included, which will surely bring extra work for future code 
merging/rebasing. So again, what's the code sync strategy, and what's the plan 
for merging back into master?

Best Regards,
Raymond Liu


-Original Message-
From: Reynold Xin [mailto:r...@apache.org] 
Sent: Tuesday, November 05, 2013 8:34 AM
To: dev@spark.incubator.apache.org
Subject: Re: issue regarding akka, protobuf and Hadoop version

I chatted with Matt Massie about this, and here are some options:

1. Use dependency injection in google-guice to make Akka use one version of 
protobuf, and YARN use the other version.

2. Look into OSGi to accomplish the same goal.

3. Rewrite the messaging part of Spark to use a simple, custom RPC library 
instead of Akka. We are really only using a very simple subset of Akka 
features, and we can probably implement a simple RPC library tailored for Spark 
quickly. We should only do this as the last resort.

4. Talk to Akka guys and hope they can make a maintenance release of Akka that 
supports protobuf 2.5.


None of these are ideal, but we'd have to pick one. It would be great if you 
have other suggestions.


On Sun, Nov 3, 2013 at 11:46 PM, Liu, Raymond raymond@intel.com wrote:

 Hi

 I am working on porting Spark onto Hadoop 2.2.0. With some 
 renaming and calls into the new YARN API done, I can bring up the Spark 
 master. But I ran into an issue where the Executor actor could not 
 connect to the Driver actor.

 After some investigation, I found the root cause is that 
 akka-remote does not support protobuf 2.5.0 before akka 2.3, and Hadoop 
 moved to protobuf 2.5.0 from 2.1-beta.

 The issue is that if I exclude the akka dependency from Hadoop 
 and force the protobuf dependency to 2.4.1, compiling/packaging will fail 
 since the hadoop-common jar requires a new interface from protobuf 2.5.0.

  So, any suggestions on this?

 Best Regards,
 Raymond Liu