Re: [VOTE] Release Apache Spark 1.4.0 (RC2)

2015-05-25 Thread jameszhouyi
Compiled:
git clone https://github.com/apache/spark.git
git checkout tags/v1.4.0-rc2
./make-distribution.sh --tgz --skip-java-test -Pyarn -Phadoop-2.4
-Dhadoop.version=2.5.0 -Phive -Phive-0.13.1 -Phive-thriftserver -DskipTests

Blocker issue in RC1/RC2:
https://issues.apache.org/jira/browse/SPARK-7119





--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-4-0-RC2-tp12420p12444.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



SparkR and RDDs

2015-05-25 Thread Andrew Psaltis
Hi,
I understand from SPARK-6799 [1] and the respective merge commit [2] that
the RDD class is private in Spark 1.4. If I wanted to modify the old
KMeans and/or LR examples so that the computation happened in Spark, what is
the best direction to go? Sorry if I am missing something obvious, but
based on the NAMESPACE file [3] in the SparkR codebase I am having trouble
seeing an obvious direction to take.

Thanks in advance,
Andrew

[1] https://issues.apache.org/jira/browse/SPARK-6799
[2]
https://github.com/apache/spark/commit/4b91e18d9b7803dbfe1e1cf20b46163d8cb8716c
[3] https://github.com/apache/spark/blob/branch-1.4/R/pkg/NAMESPACE


Re: Change for submitting to yarn in 1.3.1

2015-05-25 Thread Chester Chen
I put the design requirements and description in the commit comment, so I
will close the PR. Please refer to the following commit:

https://github.com/AlpineNow/spark/commit/5b336bbfe92eabca7f4c20e5d49e51bb3721da4d



On Mon, May 25, 2015 at 3:21 PM, Chester Chen  wrote:

> All,
>  I have created a PR just for the purpose of helping document the use
> case, requirements, and design. As it is unlikely to be merged, it is only
> used to illustrate the problems we are trying to solve and the approaches
> we took.
>
>https://github.com/apache/spark/pull/6398
>
>
> Hope this helps the discussion
>
> Chester
>
>
>
>
>
>
> On Fri, May 22, 2015 at 10:55 AM, Kevin Markey 
> wrote:
>
>>  Thanks.  We'll look at it.
>> I've sent another reply addressing some of your other comments.
>> Kevin
>>
>>
>> On 05/22/2015 10:27 AM, Marcelo Vanzin wrote:
>>
>>  Hi Kevin,
>>
>>  One thing that might help you in the meantime, while we work on a better
>> interface for all this...
>>
>> On Thu, May 21, 2015 at 5:21 PM, Kevin Markey 
>> wrote:
>>
>>> Making *yarn.Client* private has prevented us from moving from Spark
>>> 1.0.x to Spark 1.2 or 1.3 despite many alluring new features.
>>>
>>
>>  Since you're not afraid to use private APIs, and to avoid using ugly
>> reflection hacks, you could abuse the fact that private things in Scala are
>> not really private most of the time. For example (trimmed to show just
>> stuff that might be interesting to you):
>>
>> # javap -classpath
>> /opt/cloudera/parcels/CDH/jars/spark-assembly-1.3.0-cdh5.4.0-hadoop2.6.0-cdh5.4.0.jar
>> org.apache.spark.deploy.yarn.Client
>> Compiled from "Client.scala"
>> public class org.apache.spark.deploy.yarn.Client implements
>> org.apache.spark.Logging {
>>   ...
>>   public org.apache.hadoop.yarn.client.api.YarnClient
>> org$apache$spark$deploy$yarn$Client$$yarnClient();
>>   public void run();
>>   public
>> org.apache.spark.deploy.yarn.Client(org.apache.spark.deploy.yarn.ClientArguments,
>> org.apache.hadoop.conf.Configuration, org.apache.spark.SparkConf);
>>   public
>> org.apache.spark.deploy.yarn.Client(org.apache.spark.deploy.yarn.ClientArguments,
>> org.apache.spark.SparkConf);
>>   public
>> org.apache.spark.deploy.yarn.Client(org.apache.spark.deploy.yarn.ClientArguments);
>> }
>>
>>  So it should be easy to write a small Java wrapper around this. No less
>> hacky than relying on the "private-but-public" code of before.
>>
>> --
>> Marcelo
>>
>>
>>
>
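
Marcelo's suggestion above can be sketched as a small Java wrapper. This is a
hedged illustration only: it relies on the constructors that the javap output
shows are public at the bytecode level (Scala's `private[spark]` is not
enforced by the JVM), assumes a Spark 1.3 assembly and the Hadoop/YARN jars
on the classpath, and is liable to break whenever this private API changes.

```java
// HYPOTHETICAL wrapper around the Scala-private org.apache.spark.deploy.yarn.Client.
// Java sees only the bytecode-level visibility, so the constructors listed by
// javap above are callable here even though Scala code cannot call them.
import org.apache.hadoop.conf.Configuration;
import org.apache.spark.SparkConf;
import org.apache.spark.deploy.yarn.Client;
import org.apache.spark.deploy.yarn.ClientArguments;

public class YarnClientWrapper {
    public static void submit(String[] args) {
        SparkConf sparkConf = new SparkConf();
        Configuration hadoopConf = new Configuration();
        // ClientArguments parses the usual --jar / --class / --arg options.
        ClientArguments clientArgs = new ClientArguments(args, sparkConf);
        Client client = new Client(clientArgs, hadoopConf, sparkConf);
        client.run(); // blocks until the YARN application finishes
    }
}
```

No less hacky than the old approach, as Marcelo notes, but it avoids
reflection and keeps the version-fragile code confined to one small class.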


Re: Change for submitting to yarn in 1.3.1

2015-05-25 Thread Chester Chen
All,
 I have created a PR just for the purpose of helping document the use
case, requirements, and design. As it is unlikely to be merged, it is only
used to illustrate the problems we are trying to solve and the approaches
we took.

   https://github.com/apache/spark/pull/6398


Hope this helps the discussion

Chester






On Fri, May 22, 2015 at 10:55 AM, Kevin Markey 
wrote:

>  Thanks.  We'll look at it.
> I've sent another reply addressing some of your other comments.
> Kevin
>
>
> On 05/22/2015 10:27 AM, Marcelo Vanzin wrote:
>
>  Hi Kevin,
>
>  One thing that might help you in the meantime, while we work on a better
> interface for all this...
>
> On Thu, May 21, 2015 at 5:21 PM, Kevin Markey 
> wrote:
>
>> Making *yarn.Client* private has prevented us from moving from Spark
>> 1.0.x to Spark 1.2 or 1.3 despite many alluring new features.
>>
>
>  Since you're not afraid to use private APIs, and to avoid using ugly
> reflection hacks, you could abuse the fact that private things in Scala are
> not really private most of the time. For example (trimmed to show just
> stuff that might be interesting to you):
>
> # javap -classpath
> /opt/cloudera/parcels/CDH/jars/spark-assembly-1.3.0-cdh5.4.0-hadoop2.6.0-cdh5.4.0.jar
> org.apache.spark.deploy.yarn.Client
> Compiled from "Client.scala"
> public class org.apache.spark.deploy.yarn.Client implements
> org.apache.spark.Logging {
>   ...
>   public org.apache.hadoop.yarn.client.api.YarnClient
> org$apache$spark$deploy$yarn$Client$$yarnClient();
>   public void run();
>   public
> org.apache.spark.deploy.yarn.Client(org.apache.spark.deploy.yarn.ClientArguments,
> org.apache.hadoop.conf.Configuration, org.apache.spark.SparkConf);
>   public
> org.apache.spark.deploy.yarn.Client(org.apache.spark.deploy.yarn.ClientArguments,
> org.apache.spark.SparkConf);
>   public
> org.apache.spark.deploy.yarn.Client(org.apache.spark.deploy.yarn.ClientArguments);
> }
>
>  So it should be easy to write a small Java wrapper around this. No less
> hacky than relying on the "private-but-public" code of before.
>
> --
> Marcelo
>
>
>


Re: [VOTE] Release Apache Spark 1.4.0 (RC2)

2015-05-25 Thread Olivier Girardot
I've just tested the new window functions using PySpark in the Spark 1.4.0
rc2 distribution for hadoop 2.4 with and without hive support.
It works well with the distribution that has Hive support enabled, and fails
as expected on the other one (with an explicit error: "Could not resolve
window function 'lead'. Note that, using window functions currently
requires a HiveContext").

Thank you for your work.

Regards,

Olivier.
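
For reference, the behaviour described above can be reproduced with a sketch
along these lines (shown with the 1.4 Java API rather than PySpark; the table
name is hypothetical, and this assumes a build with `-Phive`):

```java
// Sketch: window functions in Spark 1.4 resolve under a HiveContext; with a
// plain SQLContext the RC fails with "Could not resolve window function
// 'lead'", as reported above. Untested illustration only.
import org.apache.spark.SparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.expressions.Window;
import org.apache.spark.sql.hive.HiveContext;
import static org.apache.spark.sql.functions.*;

public class WindowCheck {
    public static void main(String[] args) {
        SparkContext sc = new SparkContext("local", "window-check");
        HiveContext hive = new HiveContext(sc);
        DataFrame df = hive.table("some_table"); // hypothetical table
        // lead(value, 1) within each key, ordered by value
        df.select(lead(col("value"), 1)
                .over(Window.partitionBy(col("key")).orderBy(col("value"))))
          .show();
    }
}
```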

On Mon, May 25, 2015 at 11:25, Wang, Daoyuan  wrote:

> Good catch! BTW, SPARK-6784 is a duplicate of SPARK-7790; I didn't notice we
> changed the title of SPARK-7853.
>
>
> -Original Message-
> From: Cheng, Hao [mailto:hao.ch...@intel.com]
> Sent: Monday, May 25, 2015 4:47 PM
> To: Sean Owen; Patrick Wendell
> Cc: dev@spark.apache.org
> Subject: RE: [VOTE] Release Apache Spark 1.4.0 (RC2)
>
> Added another blocker issue, just created! It seems to be a regression.
>
> https://issues.apache.org/jira/browse/SPARK-7853
>
>
> -Original Message-
> From: Sean Owen [mailto:so...@cloudera.com]
> Sent: Monday, May 25, 2015 3:37 PM
> To: Patrick Wendell
> Cc: dev@spark.apache.org
> Subject: Re: [VOTE] Release Apache Spark 1.4.0 (RC2)
>
> We still have 1 blocker for 1.4:
>
> SPARK-6784 Make sure values of partitioning columns are correctly
> converted based on their data types
>
> CC Davies Liu / Adrian Wang to check on the status of this.
>
> There are still 50 Critical issues tagged for 1.4, and 183 issues targeted
> for 1.4 in general. Obviously almost all of those won't be in 1.4. How do
> people want to deal with those? The field can be cleared, but do people
> want to take a pass at bumping a few to 1.4.1 that really truly are
> supposed to go into 1.4.1?
>
>
> On Sun, May 24, 2015 at 8:22 AM, Patrick Wendell 
> wrote:
> > Please vote on releasing the following candidate as Apache Spark version
> 1.4.0!
> >
> > The tag to be voted on is v1.4.0-rc2 (commit 03fb26a3):
> > https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=03fb26a
> > 3e50e00739cc815ba4e2e82d71d003168
> >
> > The release files, including signatures, digests, etc. can be found at:
> > http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc2-bin/
> >
> > Release artifacts are signed with the following key:
> > https://people.apache.org/keys/committer/pwendell.asc
> >
> > The staging repository for this release can be found at:
> > [published as version: 1.4.0]
> > https://repository.apache.org/content/repositories/orgapachespark-1103
> > /
> > [published as version: 1.4.0-rc2]
> > https://repository.apache.org/content/repositories/orgapachespark-1104
> > /
> >
> > The documentation corresponding to this release can be found at:
> > http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc2-docs
> > /
> >
> > Please vote on releasing this package as Apache Spark 1.4.0!
> >
> > The vote is open until Wednesday, May 27, at 08:12 UTC and passes if a
> > majority of at least 3 +1 PMC votes are cast.
> >
> > [ ] +1 Release this package as Apache Spark 1.4.0 [ ] -1 Do not
> > release this package because ...
> >
> > To learn more about Apache Spark, please see http://spark.apache.org/
> >
> > == What has changed since RC1 ==
> > Below is a list of bug fixes that went into this RC:
> > http://s.apache.org/U1M
> >
> > == How can I help test this release? == If you are a Spark user, you
> > can help us test this release by taking a Spark 1.3 workload and
> > running on this release candidate, then reporting any regressions.
> >
> > == What justifies a -1 vote for this release? == This vote is
> > happening towards the end of the 1.4 QA period, so -1 votes should
> > only occur for significant regressions from 1.3.1.
> > Bugs already present in 1.3.X, minor regressions, or bugs related to
> > new features will not block this release.
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For
> > additional commands, e-mail: dev-h...@spark.apache.org
> >
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional
> commands, e-mail: dev-h...@spark.apache.org
>
>


Hive metadata operations support

2015-05-25 Thread Igor Mazur
Hi!

I've found that Spark only supports ExecuteStatementOperation in
SparkSQLOperationManager.

Are there any plans to support other metadata operations?

Why this question arises: I'm trying to connect PrestoDB through the standard
Hive JDBC driver, and it doesn't see any tables registered via
registerTempTable.
Also, through the beeline tool I can execute "SHOW TABLES IN default" and get
those tables, but !tables doesn't show the same.
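
The mismatch can be sketched as follows (1.4 Java API; the JSON file name is
made up for illustration). Statements go through ExecuteStatementOperation,
which resolves against the SQL context's catalog, while a JDBC client's
getTables()/!tables call presumably goes through the Hive metadata operations
that SparkSQLOperationManager does not override, and so only sees the
metastore:

```java
// Sketch of the visibility gap between SQL statements and JDBC metadata
// operations for temp tables. Untested; assumes a Spark 1.4 build with Hive.
import org.apache.spark.SparkContext;
import org.apache.spark.sql.hive.HiveContext;

public class TempTableVisibility {
    public static void main(String[] args) {
        SparkContext sc = new SparkContext("local", "temp-table-visibility");
        HiveContext hive = new HiveContext(sc);
        // Register a temp table in this context's catalog (not the metastore).
        hive.read().json("people.json").registerTempTable("people"); // hypothetical file
        // Runs as a SQL statement, so it lists "people":
        hive.sql("SHOW TABLES").show();
        // A JDBC client's getTables() / beeline !tables bypasses the statement
        // path, hits the default Hive metadata operations, and misses "people".
    }
}
```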

So, if this seems correct from your point of view, I would try to implement
the other operations.

Igor Mazur



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/Hive-metadata-operations-support-tp12439.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.




RE: [VOTE] Release Apache Spark 1.4.0 (RC2)

2015-05-25 Thread Wang, Daoyuan
Good catch! BTW, SPARK-6784 is a duplicate of SPARK-7790; I didn't notice we
changed the title of SPARK-7853.


-Original Message-
From: Cheng, Hao [mailto:hao.ch...@intel.com] 
Sent: Monday, May 25, 2015 4:47 PM
To: Sean Owen; Patrick Wendell
Cc: dev@spark.apache.org
Subject: RE: [VOTE] Release Apache Spark 1.4.0 (RC2)

Added another blocker issue, just created! It seems to be a regression.

https://issues.apache.org/jira/browse/SPARK-7853


-Original Message-
From: Sean Owen [mailto:so...@cloudera.com]
Sent: Monday, May 25, 2015 3:37 PM
To: Patrick Wendell
Cc: dev@spark.apache.org
Subject: Re: [VOTE] Release Apache Spark 1.4.0 (RC2)

We still have 1 blocker for 1.4:

SPARK-6784 Make sure values of partitioning columns are correctly converted 
based on their data types

CC Davies Liu / Adrian Wang to check on the status of this.

There are still 50 Critical issues tagged for 1.4, and 183 issues targeted for 
1.4 in general. Obviously almost all of those won't be in 1.4. How do people 
want to deal with those? The field can be cleared, but do people want to take a 
pass at bumping a few to 1.4.1 that really truly are supposed to go into 1.4.1?


On Sun, May 24, 2015 at 8:22 AM, Patrick Wendell  wrote:
> Please vote on releasing the following candidate as Apache Spark version 
> 1.4.0!
>
> The tag to be voted on is v1.4.0-rc2 (commit 03fb26a3):
> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=03fb26a
> 3e50e00739cc815ba4e2e82d71d003168
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc2-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> [published as version: 1.4.0]
> https://repository.apache.org/content/repositories/orgapachespark-1103
> /
> [published as version: 1.4.0-rc2]
> https://repository.apache.org/content/repositories/orgapachespark-1104
> /
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc2-docs
> /
>
> Please vote on releasing this package as Apache Spark 1.4.0!
>
> The vote is open until Wednesday, May 27, at 08:12 UTC and passes if a 
> majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 1.4.0 [ ] -1 Do not 
> release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> == What has changed since RC1 ==
> Below is a list of bug fixes that went into this RC:
> http://s.apache.org/U1M
>
> == How can I help test this release? == If you are a Spark user, you 
> can help us test this release by taking a Spark 1.3 workload and 
> running on this release candidate, then reporting any regressions.
>
> == What justifies a -1 vote for this release? == This vote is 
> happening towards the end of the 1.4 QA period, so -1 votes should 
> only occur for significant regressions from 1.3.1.
> Bugs already present in 1.3.X, minor regressions, or bugs related to 
> new features will not block this release.
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For 
> additional commands, e-mail: dev-h...@spark.apache.org
>




RE: [VOTE] Release Apache Spark 1.4.0 (RC2)

2015-05-25 Thread Cheng, Hao
Added another blocker issue, just created! It seems to be a regression.

https://issues.apache.org/jira/browse/SPARK-7853


-Original Message-
From: Sean Owen [mailto:so...@cloudera.com] 
Sent: Monday, May 25, 2015 3:37 PM
To: Patrick Wendell
Cc: dev@spark.apache.org
Subject: Re: [VOTE] Release Apache Spark 1.4.0 (RC2)

We still have 1 blocker for 1.4:

SPARK-6784 Make sure values of partitioning columns are correctly converted 
based on their data types

CC Davies Liu / Adrian Wang to check on the status of this.

There are still 50 Critical issues tagged for 1.4, and 183 issues targeted for 
1.4 in general. Obviously almost all of those won't be in 1.4. How do people 
want to deal with those? The field can be cleared, but do people want to take a 
pass at bumping a few to 1.4.1 that really truly are supposed to go into 1.4.1?


On Sun, May 24, 2015 at 8:22 AM, Patrick Wendell  wrote:
> Please vote on releasing the following candidate as Apache Spark version 
> 1.4.0!
>
> The tag to be voted on is v1.4.0-rc2 (commit 03fb26a3):
> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=03fb26a
> 3e50e00739cc815ba4e2e82d71d003168
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc2-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> [published as version: 1.4.0]
> https://repository.apache.org/content/repositories/orgapachespark-1103
> /
> [published as version: 1.4.0-rc2]
> https://repository.apache.org/content/repositories/orgapachespark-1104
> /
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc2-docs
> /
>
> Please vote on releasing this package as Apache Spark 1.4.0!
>
> The vote is open until Wednesday, May 27, at 08:12 UTC and passes if a 
> majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 1.4.0 [ ] -1 Do not 
> release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> == What has changed since RC1 ==
> Below is a list of bug fixes that went into this RC:
> http://s.apache.org/U1M
>
> == How can I help test this release? == If you are a Spark user, you 
> can help us test this release by taking a Spark 1.3 workload and 
> running on this release candidate, then reporting any regressions.
>
> == What justifies a -1 vote for this release? == This vote is 
> happening towards the end of the 1.4 QA period, so -1 votes should 
> only occur for significant regressions from 1.3.1.
> Bugs already present in 1.3.X, minor regressions, or bugs related to 
> new features will not block this release.
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For 
> additional commands, e-mail: dev-h...@spark.apache.org
>




Re: [VOTE] Release Apache Spark 1.4.0 (RC2)

2015-05-25 Thread Sean Owen
We still have 1 blocker for 1.4:

SPARK-6784 Make sure values of partitioning columns are correctly
converted based on their data types

CC Davies Liu / Adrian Wang to check on the status of this.

There are still 50 Critical issues tagged for 1.4, and 183 issues
targeted for 1.4 in general. Obviously almost all of those won't be in
1.4. How do people want to deal with those? The field can be cleared,
but do people want to take a pass at bumping a few to 1.4.1 that
really truly are supposed to go into 1.4.1?


On Sun, May 24, 2015 at 8:22 AM, Patrick Wendell  wrote:
> Please vote on releasing the following candidate as Apache Spark version 
> 1.4.0!
>
> The tag to be voted on is v1.4.0-rc2 (commit 03fb26a3):
> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=03fb26a3e50e00739cc815ba4e2e82d71d003168
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc2-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> [published as version: 1.4.0]
> https://repository.apache.org/content/repositories/orgapachespark-1103/
> [published as version: 1.4.0-rc2]
> https://repository.apache.org/content/repositories/orgapachespark-1104/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc2-docs/
>
> Please vote on releasing this package as Apache Spark 1.4.0!
>
> The vote is open until Wednesday, May 27, at 08:12 UTC and passes
> if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 1.4.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see
> http://spark.apache.org/
>
> == What has changed since RC1 ==
> Below is a list of bug fixes that went into this RC:
> http://s.apache.org/U1M
>
> == How can I help test this release? ==
> If you are a Spark user, you can help us test this release by
> taking a Spark 1.3 workload and running on this release candidate,
> then reporting any regressions.
>
> == What justifies a -1 vote for this release? ==
> This vote is happening towards the end of the 1.4 QA period,
> so -1 votes should only occur for significant regressions from 1.3.1.
> Bugs already present in 1.3.X, minor regressions, or bugs related
> to new features will not block this release.
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>




Re: Tungsten's Vectorized Execution

2015-05-25 Thread Reynold Xin
Yes that's exactly the reason.


On Sat, May 23, 2015 at 12:37 AM, Yijie Shen 
wrote:

> Davies and Reynold,
>
> Glad to hear about the status.
>
> I’ve seen [SPARK-7813](https://issues.apache.org/jira/browse/SPARK-7813)
> and watching it now.
>
> If I understand correctly, it's aimed at moving the CodeGenerator's
> expression-evaluator code-gen logic into each expression's eval() and
> eliminating the need to choose between row evaluation methods in physical
> operators? What is the reason motivating this refactoring? To use the
> code-gen version aggressively in evaluation?
>
>
> On May 22, 2015 at 3:05:24 PM, Xin Reynold (r...@databricks.com) wrote:
>
> Yijie,
>
> As Davies said, it will take us a while to get to vectorized execution.
> However, before that, we are going to refactor code generation to push it
> into each expression: https://issues.apache.org/jira/browse/SPARK-7813
>
> Once this one is in (probably in the next 2 or 3 weeks), there will be
> lots of expressions to create code-gen versions, and it'd be great to get
> as much help as possible from the community.
>
>
>
>
> On Thu, May 21, 2015 at 1:59 PM, Davies Liu  wrote:
>
>> We have not started to prototype the vectorized one yet; it will be
>> evaluated in 1.5 and may be targeted for 1.6.
>>
>> We're glad to hear some feedback/suggestions/comments from your side!
>>
>> On Thu, May 21, 2015 at 9:37 AM, Yijie Shen 
>> wrote:
>> > Hi all,
>> >
>> > I've seen the blog post on Project Tungsten; it sounds awesome to me!
>> >
>> > I've also noticed there is a plan to change the code generation from
>> > record-at-a-time evaluation to a vectorized one, which interests me
>> > most.
>> >
>> > What's the status of vectorized evaluation? Is this an internal effort of
>> > Databricks, or is outside involvement welcome?
>> >
>> > Since I've done similar work on Spark SQL, I would like to get involved
>> > if that's possible.
>> >
>> >
>> > Yours,
>> >
>> > Yijie
>>
>>  -
>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>> For additional commands, e-mail: dev-h...@spark.apache.org
>>
>>
>