Re: [VOTE] Release Apache Spark 1.2.0 (RC1)

2014-12-02 Thread Denny Lee
+1 (non-binding)

Verified on OSX 10.10.2, built from source,
spark-shell / spark-submit jobs
ran various simple Spark / Scala queries
ran various SparkSQL queries (including HiveContext)
ran ThriftServer service and connected via beeline
ran SparkSVD


On Mon Dec 01 2014 at 11:09:26 PM Patrick Wendell 
wrote:

> Hey All,
>
> Just an update. Josh, Andrew, and others are working to reproduce
> SPARK-4498 and fix it. Other than that issue no serious regressions
> have been reported so far. If we are able to get a fix in for that
> soon, we'll likely cut another RC with the patch.
>
> Continued testing of RC1 is definitely appreciated!
>
> I'll leave this vote open to allow folks to continue posting comments.
> It's fine to still give "+1" from your own testing... i.e. you can
> assume at this point SPARK-4498 will be fixed before releasing.
>
> - Patrick
>
> On Mon, Dec 1, 2014 at 3:30 PM, Matei Zaharia 
> wrote:
> > +0.9 from me. Tested it on Mac and Windows (someone has to do it) and
> while things work, I noticed a few recent scripts don't have Windows
> equivalents, namely https://issues.apache.org/jira/browse/SPARK-4683 and
> https://issues.apache.org/jira/browse/SPARK-4684. The first one at least
> would be good to fix if we do another RC. Not blocking the release but
> useful to fix in docs is https://issues.apache.org/jira/browse/SPARK-4685.
> >
> > Matei
> >
> >
> >> On Dec 1, 2014, at 11:18 AM, Josh Rosen  wrote:
> >>
> >> Hi everyone,
> >>
> >> There's an open bug report related to Spark standalone which could be a
> potential release-blocker (pending investigation / a bug fix):
> https://issues.apache.org/jira/browse/SPARK-4498.  This issue seems
> non-deterministic and only affects long-running Spark standalone
> deployments, so it may be hard to reproduce.  I'm going to work on a patch
> to add additional logging in order to help with debugging.
> >>
> >> I just wanted to give an early heads-up about this issue and to get
> more eyes on it in case anyone else has run into it or wants to help with
> debugging.
> >>
> >> - Josh
> >>
> >> On November 28, 2014 at 9:18:09 PM, Patrick Wendell (pwend...@gmail.com)
> wrote:
> >>
> >> Please vote on releasing the following candidate as Apache Spark
> version 1.2.0!
> >>
> >> The tag to be voted on is v1.2.0-rc1 (commit 1056e9ec1):
> >> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
> 1056e9ec13203d0c51564265e94d77a054498fdb
> >>
> >> The release files, including signatures, digests, etc. can be found at:
> >> http://people.apache.org/~pwendell/spark-1.2.0-rc1/
> >>
> >> Release artifacts are signed with the following key:
> >> https://people.apache.org/keys/committer/pwendell.asc
> >>
> >> The staging repository for this release can be found at:
> >> https://repository.apache.org/content/repositories/orgapachespark-1048/
> >>
> >> The documentation corresponding to this release can be found at:
> >> http://people.apache.org/~pwendell/spark-1.2.0-rc1-docs/
> >>
> >> Please vote on releasing this package as Apache Spark 1.2.0!
> >>
> >> The vote is open until Tuesday, December 02, at 05:15 UTC and passes
> >> if a majority of at least 3 +1 PMC votes are cast.
> >>
> >> [ ] +1 Release this package as Apache Spark 1.2.0
> >> [ ] -1 Do not release this package because ...
> >>
> >> To learn more about Apache Spark, please see
> >> http://spark.apache.org/
> >>
> >> == What justifies a -1 vote for this release? ==
> >> This vote is happening very late into the QA period compared with
> >> previous votes, so -1 votes should only occur for significant
> >> regressions from 1.0.2. Bugs already present in 1.1.X, minor
> >> regressions, or bugs related to new features will not block this
> >> release.
> >>
> >> == What default changes should I be aware of? ==
> >> 1. The default value of "spark.shuffle.blockTransferService" has been
> >> changed to "netty"
> >> --> Old behavior can be restored by switching to "nio"
> >>
> >> 2. The default value of "spark.shuffle.manager" has been changed to
> "sort".
> >> --> Old behavior can be restored by setting "spark.shuffle.manager" to
> "hash".
> >>
> >> == Other notes ==
> >> Because this vote is occurring over a weekend, I will likely extend
> >> the vote if this RC survives until the end of the vote period.
> >>
> >> - Patrick
> >>
> >> -
> >> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> >> For additional commands, e-mail: dev-h...@spark.apache.org
> >>
> >
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> > For additional commands, e-mail: dev-h...@spark.apache.org
> >
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>


Re: Can the Scala classes in the spark source code, be inherited in Java classes?

2014-12-02 Thread Niranda Perera
Thanks.

And @Reynold, sorry, my bad. Guess I should have used something like
Stack Overflow!

On Tue, Dec 2, 2014 at 12:18 PM, Reynold Xin  wrote:

> Oops my previous response wasn't sent properly to the dev list. Here you
> go for archiving.
>
>
> Yes you can. Scala classes are compiled down to classes in bytecode. Take
> a look at this: https://twitter.github.io/scala_school/java.html
>
> Note that questions like this are not exactly what this dev list is meant
> for  ...
>
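A minimal sketch of the point above, for the archive: a Java class extending
one of Spark's Scala abstract classes (org.apache.spark.Partitioner here; the
class name EvenOddPartitioner is purely illustrative). Since Scala classes
compile to ordinary JVM classes, the usual Java subclassing syntax applies:

    import org.apache.spark.Partitioner;

    // Illustrative example: a Java subclass of the Scala abstract class
    // org.apache.spark.Partitioner. Scala's parameterless defs (such as
    // numPartitions) appear to Java as ordinary zero-argument methods.
    public class EvenOddPartitioner extends Partitioner {
      @Override
      public int numPartitions() {
        return 2;
      }

      @Override
      public int getPartition(Object key) {
        // Mask the sign bit so the index is always in [0, numPartitions).
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions();
      }
    }

The same pattern works for other Scala classes in the code base, with the
usual caveat that Scala-specific features (traits with implemented members,
implicits, default arguments) are less convenient to consume from Java.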
> On Mon, Dec 1, 2014 at 9:22 PM, Niranda Perera  wrote:
>
>> Hi,
>>
>> Can the Scala classes in the spark source code, be inherited (and other
>> OOP
>> concepts) in Java classes?
>>
>> I want to customize some part of the code, but I would like to do it in a
>> Java environment.
>>
>> Rgds
>>
>> --
>> *Niranda Perera*
>> Software Engineer, WSO2 Inc.
>> Mobile: +94-71-554-8430
>> Twitter: @n1r44 
>>
>
>


-- 
*Niranda Perera*
Software Engineer, WSO2 Inc.
Mobile: +94-71-554-8430
Twitter: @n1r44 


Re: Required file not found in building

2014-12-02 Thread Stephen Boesch
Thanks Sean, I followed suit (brew install zinc) and that is working.

2014-12-01 22:39 GMT-08:00 Sean Owen :

> I'm having no problems with the build or zinc on my Mac. I use zinc
> from "brew install zinc".
>
> On Tue, Dec 2, 2014 at 3:02 AM, Stephen Boesch  wrote:
> > Mac as well.  Just found the problem:  I had created an alias to zinc a
> > couple of months back. Apparently that is not happy with the build
> anymore.
> > No problem now that the issue has been isolated - just need to fix my
> zinc
> > alias.
> >
> > 2014-12-01 18:55 GMT-08:00 Ted Yu :
> >
> >> I tried the same command on MacBook and didn't experience the same
> error.
> >>
> >> Which OS are you using ?
> >>
> >> Cheers
> >>
> >> On Mon, Dec 1, 2014 at 6:42 PM, Stephen Boesch 
> wrote:
> >>
> >>> It seems there were some additional settings required to build spark
> now .
> >>> This should be a snap for most of you out there to spot what I am missing.
> >>> Here is the command line I have traditionally used:
> >>>
> >>>mvn -Pyarn -Phadoop-2.3 -Phive install compile package -DskipTests
> >>>
> >>> That command line is, however, failing with the latest from HEAD:
> >>>
> >>> INFO] --- scala-maven-plugin:3.2.0:compile (scala-compile-first) @
> >>> spark-network-common_2.10 ---
> >>> [INFO] Using zinc server for incremental compilation
> >>> [INFO] compiler plugin:
> >>> BasicArtifact(org.scalamacros,paradise_2.10.4,2.0.1,null)
> >>>
> >>> *[error] Required file not found: scala-compiler-2.10.4.jar*
> >>>
> >>> *[error] See zinc -help for information about locating necessary files*
> >>>
> >>> [INFO]
> >>>
> 
> >>> [INFO] Reactor Summary:
> >>> [INFO]
> >>> [INFO] Spark Project Parent POM .. SUCCESS
> >>> [4.077s]
> >>> [INFO] Spark Project Networking .. FAILURE
> >>> [0.445s]
> >>>
> >>>
> >>> OK let's try "zinc -help":
> >>>
> >>> 18:38:00/spark2 $*zinc -help*
> >>> Nailgun server running with 1 cached compiler
> >>>
> >>> Version = 0.3.5.1
> >>>
> >>> Zinc compiler cache limit = 5
> >>> Resident scalac cache limit = 0
> >>> Analysis cache limit = 5
> >>>
> >>> Compiler(Scala 2.10.4) [74ff364f]
> >>> Setup = {
> >>> *   scala compiler =
> >>>
> >>>
> /Users/steve/.m2/repository/org/scala-lang/scala-compiler/2.10.4/scala-compiler-2.10.4.jar*
> >>>scala library =
> >>>
> >>>
> /Users/steve/.m2/repository/org/scala-lang/scala-library/2.10.4/scala-library-2.10.4.jar
> >>>scala extra = {
> >>>
> >>>
> >>>
> /Users/steve/.m2/repository/org/scala-lang/scala-reflect/2.10.4/scala-reflect-2.10.4.jar
> >>>   /shared/zinc-0.3.5.1/lib/scala-reflect.jar
> >>>}
> >>>sbt interface = /shared/zinc-0.3.5.1/lib/sbt-interface.jar
> >>>compiler interface sources =
> >>> /shared/zinc-0.3.5.1/lib/compiler-interface-sources.jar
> >>>java home =
> >>>fork java = false
> >>>cache directory = /Users/steve/.zinc/0.3.5.1
> >>> }
> >>>
> >>> Does that compiler jar exist?  Yes!
> >>>
> >>> 18:39:34/spark2 $ll
> >>>
> >>>
> /Users/steve/.m2/repository/org/scala-lang/scala-compiler/2.10.4/scala-compiler-2.10.4.jar
> >>> -rw-r--r--  1 steve  staff  14445780 Apr  9  2014
> >>>
> >>>
> /Users/steve/.m2/repository/org/scala-lang/scala-compiler/2.10.4/scala-compiler-2.10.4.jar
> >>>
> >>
> >>
>


Fwd: [Thrift,1.2 RC] what happened to parquet.hive.serde.ParquetHiveSerDe

2014-12-02 Thread Yana Kadiyska
Apologies if people get this more than once -- I sent mail to dev@spark
last night and don't see it in the archives. Trying the incubator list
now...wanted to make sure it doesn't get lost in case it's a bug...

-- Forwarded message --
From: Yana Kadiyska 
Date: Mon, Dec 1, 2014 at 8:10 PM
Subject: [Thrift,1.2 RC] what happened to
parquet.hive.serde.ParquetHiveSerDe
To: dev@spark.apache.org


Hi all, apologies if this is not a question for the dev list -- figured
User list might not be appropriate since I'm having trouble with the RC tag.

I just tried deploying the RC and running ThriftServer. I see the following
error:

14/12/01 21:31:42 ERROR UserGroupInformation: PriviledgedActionException
as:anonymous (auth:SIMPLE)
cause:org.apache.hive.service.cli.HiveSQLException:
java.lang.RuntimeException:
MetaException(message:java.lang.ClassNotFoundException Class
parquet.hive.serde.ParquetHiveSerDe not found)
14/12/01 21:31:42 WARN ThriftCLIService: Error executing statement:
org.apache.hive.service.cli.HiveSQLException: java.lang.RuntimeException:
MetaException(message:java.lang.ClassNotFoundException Class
parquet.hive.serde.ParquetHiveSerDe not found)
at
org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.run(Shim13.scala:192)
at
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:231)
at
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatement(HiveSessionImpl.java:212)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at
org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:79)
at
org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:37)
at
org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:64)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
​


I looked at a working installation that I have(build master a few weeks
ago) and this class used to be included in spark-assembly:

ls *.jar|xargs grep parquet.hive.serde.ParquetHiveSerDe
Binary file spark-assembly-1.2.0-SNAPSHOT-hadoop2.0.0-mr1-cdh4.2.0.jar
matches

but with the RC build it's not there?

I tried both the prebuilt CDH drop and later manually built the tag with
the following command:

 ./make-distribution.sh --tgz -Phive -Dhadoop.version=2.0.0-mr1-cdh4.2.0
-Phive-thriftserver
$JAVA_HOME/bin/jar -tvf spark-assembly-1.2.0-hadoop2.0.0-mr1-cdh4.2.0.jar
|grep parquet.hive.serde.ParquetHiveSerDe

comes back empty...


Re: [VOTE] Release Apache Spark 1.2.0 (RC1)

2014-12-02 Thread Jeremy Freeman
+1 (non-binding)

Installed version pre-built for Hadoop on a private HPC
ran PySpark shell w/ iPython
loaded data using custom Hadoop input formats
ran MLlib routines in PySpark
ran custom workflows in PySpark
browsed the web UI

Noticeable improvements in stability and performance during large shuffles (as 
well as the elimination of frequent but unpredictable “FileNotFound / too many 
open files” errors).

We initially hit errors during large collects that ran fine in 1.1, but setting 
the new spark.driver.maxResultSize to 0 preserved the old behavior. Definitely 
worth highlighting this setting in the release notes, as the new default may be 
too small for some users and workloads.

— Jeremy

-
jeremyfreeman.net
@thefreemanlab
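For readers hitting the same limit, a minimal sketch of the
spark.driver.maxResultSize workaround Jeremy describes above (the class name
BigCollect and the toy data are illustrative only):

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class BigCollect {
      public static void main(String[] args) {
        // "0" removes the new cap on serialized results returned to the
        // driver, restoring the pre-1.2 behavior for large collect() calls.
        SparkConf conf = new SparkConf()
            .setAppName("big-collect")
            .set("spark.driver.maxResultSize", "0");
        JavaSparkContext sc = new JavaSparkContext(conf);

        List<Integer> nums = new ArrayList<Integer>();
        for (int i = 0; i < 1000000; i++) {
          nums.add(i);
        }
        JavaRDD<Integer> rdd = sc.parallelize(nums, 8);

        // With a small maxResultSize, a large collect() is aborted once the
        // serialized results exceed the limit; "0" disables the check.
        List<Integer> collected = rdd.collect();
        System.out.println("collected " + collected.size() + " elements");
        sc.stop();
      }
    }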

On Dec 2, 2014, at 3:22 AM, Denny Lee  wrote:

> +1 (non-binding)
> 
> Verified on OSX 10.10.2, built from source,
> spark-shell / spark-submit jobs
> ran various simple Spark / Scala queries
> ran various SparkSQL queries (including HiveContext)
> ran ThriftServer service and connected via beeline
> ran SparkSVD
> 
> 
> On Mon Dec 01 2014 at 11:09:26 PM Patrick Wendell 
> wrote:
> 
>> Hey All,
>> 
>> Just an update. Josh, Andrew, and others are working to reproduce
>> SPARK-4498 and fix it. Other than that issue no serious regressions
>> have been reported so far. If we are able to get a fix in for that
>> soon, we'll likely cut another RC with the patch.
>> 
>> Continued testing of RC1 is definitely appreciated!
>> 
>> I'll leave this vote open to allow folks to continue posting comments.
>> It's fine to still give "+1" from your own testing... i.e. you can
>> assume at this point SPARK-4498 will be fixed before releasing.
>> 
>> - Patrick
>> 
>> On Mon, Dec 1, 2014 at 3:30 PM, Matei Zaharia 
>> wrote:
>>> +0.9 from me. Tested it on Mac and Windows (someone has to do it) and
>> while things work, I noticed a few recent scripts don't have Windows
>> equivalents, namely https://issues.apache.org/jira/browse/SPARK-4683 and
>> https://issues.apache.org/jira/browse/SPARK-4684. The first one at least
>> would be good to fix if we do another RC. Not blocking the release but
>> useful to fix in docs is https://issues.apache.org/jira/browse/SPARK-4685.
>>> 
>>> Matei
>>> 
>>> 
 On Dec 1, 2014, at 11:18 AM, Josh Rosen  wrote:
 
 Hi everyone,
 
 There's an open bug report related to Spark standalone which could be a
>> potential release-blocker (pending investigation / a bug fix):
>> https://issues.apache.org/jira/browse/SPARK-4498.  This issue seems
>> non-deterministic and only affects long-running Spark standalone
>> deployments, so it may be hard to reproduce.  I'm going to work on a patch
>> to add additional logging in order to help with debugging.
 
 I just wanted to give an early heads-up about this issue and to get
>> more eyes on it in case anyone else has run into it or wants to help with
>> debugging.
 
 - Josh
 
 On November 28, 2014 at 9:18:09 PM, Patrick Wendell (pwend...@gmail.com)
>> wrote:
 
 Please vote on releasing the following candidate as Apache Spark
>> version 1.2.0!
 
 The tag to be voted on is v1.2.0-rc1 (commit 1056e9ec1):
 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
>> 1056e9ec13203d0c51564265e94d77a054498fdb
 
 The release files, including signatures, digests, etc. can be found at:
 http://people.apache.org/~pwendell/spark-1.2.0-rc1/
 
 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc
 
 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-1048/
 
 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-1.2.0-rc1-docs/
 
 Please vote on releasing this package as Apache Spark 1.2.0!
 
 The vote is open until Tuesday, December 02, at 05:15 UTC and passes
 if a majority of at least 3 +1 PMC votes are cast.
 
 [ ] +1 Release this package as Apache Spark 1.2.0
 [ ] -1 Do not release this package because ...
 
 To learn more about Apache Spark, please see
 http://spark.apache.org/
 
 == What justifies a -1 vote for this release? ==
 This vote is happening very late into the QA period compared with
 previous votes, so -1 votes should only occur for significant
 regressions from 1.0.2. Bugs already present in 1.1.X, minor
 regressions, or bugs related to new features will not block this
 release.
 
 == What default changes should I be aware of? ==
 1. The default value of "spark.shuffle.blockTransferService" has been
 changed to "netty"
 --> Old behavior can be restored by switching to "nio"
 
 2. The default value of "spark.shuffle.manager" has been changed to
>> "sort".
 --> Old behavior can be restored by setting "spark.shuffle.manager" to "hash".

keeping PR titles / descriptions up to date

2014-12-02 Thread Kay Ousterhout
Hi all,

I've noticed a bunch of times lately where a pull request changes to be
pretty different from the original pull request, and the title /
description never get updated.  Because the pull request title and
description are used as the commit message, the incorrect description lives
on forever, making it harder to understand the reason behind a particular
commit without going back and reading the entire conversation on the pull
request.  If folks could try to keep these up to date (and committers, try
to remember to verify that the title and description are correct before
merging pull requests), that would be awesome.

-Kay


Re: keeping PR titles / descriptions up to date

2014-12-02 Thread Mridul Muralidharan
I second that !
Would also be great if the JIRA was updated accordingly too.

Regards,
Mridul


On Wed, Dec 3, 2014 at 1:53 AM, Kay Ousterhout  wrote:
> Hi all,
>
> I've noticed a bunch of times lately where a pull request changes to be
> pretty different from the original pull request, and the title /
> description never get updated.  Because the pull request title and
> description are used as the commit message, the incorrect description lives
> on forever, making it harder to understand the reason behind a particular
> commit without going back and reading the entire conversation on the pull
> request.  If folks could try to keep these up to date (and committers, try
> to remember to verify that the title and description are correct before
> merging pull requests), that would be awesome.
>
> -Kay

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 1.2.0 (RC1)

2014-12-02 Thread Andrew Or
+1. I also tested on Windows just in case, with jars referring to other jars
and python files referring to other python files. Path resolution still works.

2014-12-02 10:16 GMT-08:00 Jeremy Freeman :

> +1 (non-binding)
>
> Installed version pre-built for Hadoop on a private HPC
> ran PySpark shell w/ iPython
> loaded data using custom Hadoop input formats
> ran MLlib routines in PySpark
> ran custom workflows in PySpark
> browsed the web UI
>
> Noticeable improvements in stability and performance during large shuffles
> (as well as the elimination of frequent but unpredictable “FileNotFound /
> too many open files” errors).
>
> We initially hit errors during large collects that ran fine in 1.1, but
> setting the new spark.driver.maxResultSize to 0 preserved the old behavior.
> Definitely worth highlighting this setting in the release notes, as the new
> default may be too small for some users and workloads.
>
> — Jeremy
>
> -
> jeremyfreeman.net
> @thefreemanlab
>
> On Dec 2, 2014, at 3:22 AM, Denny Lee  wrote:
>
> > +1 (non-binding)
> >
> > Verified on OSX 10.10.2, built from source,
> > spark-shell / spark-submit jobs
> > ran various simple Spark / Scala queries
> > ran various SparkSQL queries (including HiveContext)
> > ran ThriftServer service and connected via beeline
> > ran SparkSVD
> >
> >
> > On Mon Dec 01 2014 at 11:09:26 PM Patrick Wendell 
> > wrote:
> >
> >> Hey All,
> >>
> >> Just an update. Josh, Andrew, and others are working to reproduce
> >> SPARK-4498 and fix it. Other than that issue no serious regressions
> >> have been reported so far. If we are able to get a fix in for that
> >> soon, we'll likely cut another RC with the patch.
> >>
> >> Continued testing of RC1 is definitely appreciated!
> >>
> >> I'll leave this vote open to allow folks to continue posting comments.
> >> It's fine to still give "+1" from your own testing... i.e. you can
> >> assume at this point SPARK-4498 will be fixed before releasing.
> >>
> >> - Patrick
> >>
> >> On Mon, Dec 1, 2014 at 3:30 PM, Matei Zaharia 
> >> wrote:
> >>> +0.9 from me. Tested it on Mac and Windows (someone has to do it) and
> >> while things work, I noticed a few recent scripts don't have Windows
> >> equivalents, namely https://issues.apache.org/jira/browse/SPARK-4683
> and
> >> https://issues.apache.org/jira/browse/SPARK-4684. The first one at
> least
> >> would be good to fix if we do another RC. Not blocking the release but
> >> useful to fix in docs is
> https://issues.apache.org/jira/browse/SPARK-4685.
> >>>
> >>> Matei
> >>>
> >>>
>  On Dec 1, 2014, at 11:18 AM, Josh Rosen  wrote:
> 
>  Hi everyone,
> 
>  There's an open bug report related to Spark standalone which could be
> a
> >> potential release-blocker (pending investigation / a bug fix):
> >> https://issues.apache.org/jira/browse/SPARK-4498.  This issue seems
>> non-deterministic and only affects long-running Spark standalone
> >> deployments, so it may be hard to reproduce.  I'm going to work on a
> patch
> >> to add additional logging in order to help with debugging.
> 
 I just wanted to give an early heads-up about this issue and to get
> >> more eyes on it in case anyone else has run into it or wants to help
> with
> >> debugging.
> 
>  - Josh
> 
>  On November 28, 2014 at 9:18:09 PM, Patrick Wendell (
> pwend...@gmail.com)
> >> wrote:
> 
>  Please vote on releasing the following candidate as Apache Spark
> >> version 1.2.0!
> 
>  The tag to be voted on is v1.2.0-rc1 (commit 1056e9ec1):
>  https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
> >> 1056e9ec13203d0c51564265e94d77a054498fdb
> 
>  The release files, including signatures, digests, etc. can be found
> at:
>  http://people.apache.org/~pwendell/spark-1.2.0-rc1/
> 
>  Release artifacts are signed with the following key:
>  https://people.apache.org/keys/committer/pwendell.asc
> 
>  The staging repository for this release can be found at:
> 
> https://repository.apache.org/content/repositories/orgapachespark-1048/
> 
>  The documentation corresponding to this release can be found at:
>  http://people.apache.org/~pwendell/spark-1.2.0-rc1-docs/
> 
>  Please vote on releasing this package as Apache Spark 1.2.0!
> 
>  The vote is open until Tuesday, December 02, at 05:15 UTC and passes
>  if a majority of at least 3 +1 PMC votes are cast.
> 
 [ ] +1 Release this package as Apache Spark 1.2.0
>  [ ] -1 Do not release this package because ...
> 
>  To learn more about Apache Spark, please see
>  http://spark.apache.org/
> 
>  == What justifies a -1 vote for this release? ==
>  This vote is happening very late into the QA period compared with
>  previous votes, so -1 votes should only occur for significant
>  regressions from 1.0.2. Bugs already present in 1.1.X, minor
 regressions, or bugs related to new features will not block this release.

Re: keeping PR titles / descriptions up to date

2014-12-02 Thread Patrick Wendell
Also a note on this for committers - it's possible to re-word the
title during merging, by just running "git commit -a --amend" before
you push the PR.

- Patrick

On Tue, Dec 2, 2014 at 12:50 PM, Mridul Muralidharan  wrote:
> I second that !
> Would also be great if the JIRA was updated accordingly too.
>
> Regards,
> Mridul
>
>
> On Wed, Dec 3, 2014 at 1:53 AM, Kay Ousterhout  
> wrote:
>> Hi all,
>>
>> I've noticed a bunch of times lately where a pull request changes to be
>> pretty different from the original pull request, and the title /
>> description never get updated.  Because the pull request title and
>> description are used as the commit message, the incorrect description lives
>> on forever, making it harder to understand the reason behind a particular
>> commit without going back and reading the entire conversation on the pull
>> request.  If folks could try to keep these up to date (and committers, try
>> to remember to verify that the title and description are correct before
>> merging pull requests), that would be awesome.
>>
>> -Kay
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Announcing Spark 1.1.1!

2014-12-02 Thread Andrew Or
I am happy to announce the availability of Spark 1.1.1! This is a
maintenance release with many bug fixes, most of which are concentrated in
the core. These include various fixes for sort-based shuffle, memory
leaks, and spilling issues. Contributions to this release came from 55
developers.

Visit the release notes [1] to read about the new features, or
download [2] the release today.

[1] http://spark.apache.org/releases/spark-release-1-1-1.html
[2] http://spark.apache.org/downloads.html

Please e-mail me directly for any typos in the release notes or name
listing.

Thanks to everyone who contributed, and congratulations!
-Andrew


Re: [VOTE] Release Apache Spark 1.2.0 (RC1)

2014-12-02 Thread Tom Graves
+1 tested on yarn.
Tom 

On Friday, November 28, 2014 11:18 PM, Patrick Wendell wrote:

Please vote on releasing the following candidate as Apache Spark version 1.2.0!

The tag to be voted on is v1.2.0-rc1 (commit 1056e9ec1):
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=1056e9ec13203d0c51564265e94d77a054498fdb

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-1.2.0-rc1/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1048/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-1.2.0-rc1-docs/

Please vote on releasing this package as Apache Spark 1.2.0!

The vote is open until Tuesday, December 02, at 05:15 UTC and passes
if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 1.2.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see
http://spark.apache.org/

== What justifies a -1 vote for this release? ==
This vote is happening very late into the QA period compared with
previous votes, so -1 votes should only occur for significant
regressions from 1.0.2. Bugs already present in 1.1.X, minor
regressions, or bugs related to new features will not block this
release.

== What default changes should I be aware of? ==
1. The default value of "spark.shuffle.blockTransferService" has been
changed to "netty"
--> Old behavior can be restored by switching to "nio"

2. The default value of "spark.shuffle.manager" has been changed to "sort".
--> Old behavior can be restored by setting "spark.shuffle.manager" to "hash".
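For reference, a minimal sketch of reverting both defaults programmatically
(the class name LegacyShuffleDefaults is illustrative; the same two properties
can equally be set in spark-defaults.conf or via --conf on spark-submit):

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class LegacyShuffleDefaults {
      public static void main(String[] args) {
        // Revert the two 1.2.0 default changes listed above to their 1.1.x values.
        SparkConf conf = new SparkConf()
            .setAppName("legacy-shuffle-defaults")
            .set("spark.shuffle.blockTransferService", "nio")  // 1.2.0 default: "netty"
            .set("spark.shuffle.manager", "hash");             // 1.2.0 default: "sort"
        JavaSparkContext sc = new JavaSparkContext(conf);
        // ... run jobs with the pre-1.2 shuffle code paths ...
        sc.stop();
      }
    }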

== Other notes ==
Because this vote is occurring over a weekend, I will likely extend
the vote if this RC survives until the end of the vote period.

- Patrick

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org





Re: Spurious test failures, testing best practices

2014-12-02 Thread Ryan Williams
Following on Mark's Maven examples, here is another related issue I'm
having:

I'd like to compile just the `core` module after a `mvn clean`, without
building an assembly JAR first. Is this possible?

Attempting to do it myself, the steps I performed were:

- `mvn compile -pl core`: fails because `core` depends on `network/common`
and `network/shuffle`, neither of which is installed in my local maven
cache (and which don't exist in central Maven repositories, I guess? I
thought Spark is publishing snapshot releases?)

- `network/shuffle` also depends on `network/common`, so I'll `mvn install`
the latter first: `mvn install -DskipTests -pl network/common`. That
succeeds, and I see a newly built 1.3.0-SNAPSHOT jar in my local maven
repository.

- However, `mvn install -DskipTests -pl network/shuffle` subsequently
fails, seemingly due to not finding network/core. Here's
 a sample full
output from running `mvn install -X -U -DskipTests -pl network/shuffle`
from such a state (the -U was to get around a previous failure based on
having cached a failed lookup of network-common-1.3.0-SNAPSHOT).

- Thinking maven might be special-casing "-SNAPSHOT" versions, I tried
replacing "1.3.0-SNAPSHOT" with "1.3.0.1" globally and repeating these
steps, but the error seems to be the same.

Any ideas?

Thanks,

-Ryan

On Sun Nov 30 2014 at 6:37:28 PM Mark Hamstra 
wrote:

> >
> > - Start the SBT interactive console with sbt/sbt
> > - Build your assembly by running the "assembly" target in the assembly
> > project: assembly/assembly
> > - Run all the tests in one module: core/test
> > - Run a specific suite: core/test-only org.apache.spark.rdd.RDDSuite
> (this
> > also supports tab completion)
>
>
> The equivalent using Maven:
>
> - Start zinc
> - Build your assembly using the mvn "package" or "install" target
> ("install" is actually the equivalent of SBT's "publishLocal") -- this step
> is the first step in
> http://spark.apache.org/docs/latest/building-with-maven.html#spark-tests-in-maven
> - Run all the tests in one module: mvn -pl core test
> - Run a specific suite: mvn -pl core
> -DwildcardSuites=org.apache.spark.rdd.RDDSuite test (the -pl option isn't
> strictly necessary if you don't mind waiting for Maven to scan through all
> the other sub-projects only to do nothing; and, of course, it needs to be
> something other than "core" if the test you want to run is in another
> sub-project.)
>
> You also typically want to carry along in each subsequent step any relevant
> command line options you added in the "package"/"install" step.
>
> On Sun, Nov 30, 2014 at 3:06 PM, Matei Zaharia 
> wrote:
>
> > Hi Ryan,
> >
> > As a tip (and maybe this isn't documented well), I normally use SBT for
> > development to avoid the slow build process, and use its interactive
> > console to run only specific tests. The nice advantage is that SBT can
> keep
> > the Scala compiler loaded and JITed across builds, making it faster to
> > iterate. To use it, you can do the following:
> >
> > - Start the SBT interactive console with sbt/sbt
> > - Build your assembly by running the "assembly" target in the assembly
> > project: assembly/assembly
> > - Run all the tests in one module: core/test
> > - Run a specific suite: core/test-only org.apache.spark.rdd.RDDSuite
> (this
> > also supports tab completion)
> >
> > Running all the tests does take a while, and I usually just rely on
> > Jenkins for that once I've run the tests for the things I believed my
> patch
> > could break. But this is because some of them are integration tests (e.g.
> > DistributedSuite, which creates multi-process mini-clusters). Many of the
> > individual suites run fast without requiring this, however, so you can
> pick
> > the ones you want. Perhaps we should find a way to tag them so people
> can
> > do a "quick-test" that skips the integration ones.
> >
> > The assembly builds are annoying but they only take about a minute for me
> > on a MacBook Pro with SBT warmed up. The assembly is actually only
> required
> > for some of the "integration" tests (which launch new processes), but I'd
> > recommend doing it all the time anyway since it would be very confusing
> to
> > run those with an old assembly. The Scala compiler crash issue can also
> be
> > a problem, but I don't see it very often with SBT. If it happens, I exit
> > SBT and do sbt clean.
> >
> > Anyway, this is useful feedback and I think we should try to improve some
> > of these suites, but hopefully you can also try the faster SBT process.
> At
> > the end of the day, if we want integration tests, the whole test process
> > will take an hour, but most of the developers I know leave that to
> Jenkins
> > and only run individual tests locally before submitting a patch.
> >
> > Matei
> >
> >
> > > On Nov 30, 2014, at 2:39 PM, Ryan Williams <
> > ryan.blake.willi...@gmail.com> wrote:
> > >

Re: Spurious test failures, testing best practices

2014-12-02 Thread Marcelo Vanzin
On Tue, Dec 2, 2014 at 2:40 PM, Ryan Williams
 wrote:
> Following on Mark's Maven examples, here is another related issue I'm
> having:
>
> I'd like to compile just the `core` module after a `mvn clean`, without
> building an assembly JAR first. Is this possible?

Out of curiosity, may I ask why? What's the problem with running "mvn
install -DskipTests" first (or "package" instead of "install",
although I generally do the latter)?

You can probably do what you want if you manually build / install all
the needed dependencies first; you found two, but it seems you're also
missing the "spark-parent" project (which is the top-level pom). That
sounds like a lot of trouble though, for not any gains that I can
see... after the first build you should be able to do what you want
easily.

-- 
Marcelo

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Spurious test failures, testing best practices

2014-12-02 Thread Patrick Wendell
Hey Ryan,

What if you run a single "mvn install" to install all libraries
locally - then can you "mvn compile -pl core"? I think this may be the
only way to make it work.

- Patrick

On Tue, Dec 2, 2014 at 2:40 PM, Ryan Williams
 wrote:
> Following on Mark's Maven examples, here is another related issue I'm
> having:
>
> I'd like to compile just the `core` module after a `mvn clean`, without
> building an assembly JAR first. Is this possible?
>
> Attempting to do it myself, the steps I performed were:
>
> - `mvn compile -pl core`: fails because `core` depends on `network/common`
> and `network/shuffle`, neither of which is installed in my local maven
> cache (and which don't exist in central Maven repositories, I guess? I
> thought Spark is publishing snapshot releases?)
>
> - `network/shuffle` also depends on `network/common`, so I'll `mvn install`
> the latter first: `mvn install -DskipTests -pl network/common`. That
> succeeds, and I see a newly built 1.3.0-SNAPSHOT jar in my local maven
> repository.
>
> - However, `mvn install -DskipTests -pl network/shuffle` subsequently
> fails, seemingly due to not finding network/core. Here's
>  a sample full
> output from running `mvn install -X -U -DskipTests -pl network/shuffle`
> from such a state (the -U was to get around a previous failure based on
> having cached a failed lookup of network-common-1.3.0-SNAPSHOT).
>
> - Thinking maven might be special-casing "-SNAPSHOT" versions, I tried
> replacing "1.3.0-SNAPSHOT" with "1.3.0.1" globally and repeating these
> steps, but the error seems to be the same.
>
> Any ideas?
>
> Thanks,
>
> -Ryan
>
> On Sun Nov 30 2014 at 6:37:28 PM Mark Hamstra 
> wrote:
>
>> >
>> > - Start the SBT interactive console with sbt/sbt
>> > - Build your assembly by running the "assembly" target in the assembly
>> > project: assembly/assembly
>> > - Run all the tests in one module: core/test
>> > - Run a specific suite: core/test-only org.apache.spark.rdd.RDDSuite
>> (this
>> > also supports tab completion)
>>
>>
>> The equivalent using Maven:
>>
>> - Start zinc
>> - Build your assembly using the mvn "package" or "install" target
>> ("install" is actually the equivalent of SBT's "publishLocal") -- this step
>> is the first step in
>> http://spark.apache.org/docs/latest/building-with-maven.html#spark-tests-in-maven
>> - Run all the tests in one module: mvn -pl core test
>> - Run a specific suite: mvn -pl core
>> -DwildcardSuites=org.apache.spark.rdd.RDDSuite test (the -pl option isn't
>> strictly necessary if you don't mind waiting for Maven to scan through all
>> the other sub-projects only to do nothing; and, of course, it needs to be
>> something other than "core" if the test you want to run is in another
>> sub-project.)
>>
>> You also typically want to carry along in each subsequent step any relevant
>> command line options you added in the "package"/"install" step.
>>
>> On Sun, Nov 30, 2014 at 3:06 PM, Matei Zaharia 
>> wrote:
>>
>> > Hi Ryan,
>> >
>> > As a tip (and maybe this isn't documented well), I normally use SBT for
>> > development to avoid the slow build process, and use its interactive
>> > console to run only specific tests. The nice advantage is that SBT can
>> keep
>> > the Scala compiler loaded and JITed across builds, making it faster to
>> > iterate. To use it, you can do the following:
>> >
>> > - Start the SBT interactive console with sbt/sbt
>> > - Build your assembly by running the "assembly" target in the assembly
>> > project: assembly/assembly
>> > - Run all the tests in one module: core/test
>> > - Run a specific suite: core/test-only org.apache.spark.rdd.RDDSuite
>> (this
>> > also supports tab completion)
>> >
>> > Running all the tests does take a while, and I usually just rely on
>> > Jenkins for that once I've run the tests for the things I believed my
>> patch
>> > could break. But this is because some of them are integration tests (e.g.
>> > DistributedSuite, which creates multi-process mini-clusters). Many of the
>> > individual suites run fast without requiring this, however, so you can
>> pick
>> > the ones you want. Perhaps we should find a way to tag them so people
>> can
>> > do a "quick-test" that skips the integration ones.
>> >
>> > The assembly builds are annoying but they only take about a minute for me
>> > on a MacBook Pro with SBT warmed up. The assembly is actually only
>> required
>> > for some of the "integration" tests (which launch new processes), but I'd
>> > recommend doing it all the time anyway since it would be very confusing
>> to
>> > run those with an old assembly. The Scala compiler crash issue can also
>> be
>> > a problem, but I don't see it very often with SBT. If it happens, I exit
>> > SBT and do sbt clean.
>> >
>> > Anyway, this is useful feedback and I think we should try to improve some
>> > of these suites, but hopefully you can also try the faster SBT process.

Re: Spurious test failures, testing best practices

2014-12-02 Thread Ryan Williams
Marcelo: by my count, there are 19 maven modules in the codebase. I am
typically only concerned with "core" (and therefore its two dependencies as
well, `network/{shuffle,common}`).

The `mvn package` workflow (and its sbt equivalent) that most people
apparently use involves (for me) compiling+packaging 16 other modules that
I don't care about; I pay this cost whenever I rebase off of master or
encounter the sbt-compiler-crash bug, among other possible scenarios.

Compiling one module (after building/installing its dependencies) seems
like the sort of thing that should be possible, and I don't see why my
previously-documented attempt is failing.

re: Marcelo's comment about "missing the 'spark-parent' project", I saw
that error message too and tried to ascertain what it could mean. Why would
`network/shuffle` need something from the parent project? AFAICT
`network/common` has the same references to the parent project as
`network/shuffle` (namely just a <parent> block in its POM), and yet I can
`mvn install -pl` the former but not the latter. Why would this be? One
difference is that `network/shuffle` has a dependency on another module,
while `network/common` does not.

Does Maven not let you build modules that depend on *any* other modules
without building *all* modules, or is there a way to do this that we've not
found yet?

Patrick: per my response to Marcelo above, I am trying to avoid having to
compile and package a bunch of stuff I am not using, which both `mvn
package` and `mvn install` on the parent project do.





On Tue Dec 02 2014 at 3:45:48 PM Marcelo Vanzin  wrote:

> On Tue, Dec 2, 2014 at 2:40 PM, Ryan Williams
>  wrote:
> > Following on Mark's Maven examples, here is another related issue I'm
> > having:
> >
> > I'd like to compile just the `core` module after a `mvn clean`, without
> > building an assembly JAR first. Is this possible?
>
> Out of curiosity, may I ask why? What's the problem with running "mvn
> install -DskipTests" first (or "package" instead of "install",
> although I generally do the latter)?
>
> You can probably do what you want if you manually build / install all
> the needed dependencies first; you found two, but it seems you're also
> missing the "spark-parent" project (which is the top-level pom). That
> sounds like a lot of trouble though, for not any gains that I can
> see... after the first build you should be able to do what you want
> easily.
>
> --
> Marcelo
>


Re: Spurious test failures, testing best practices

2014-12-02 Thread Marcelo Vanzin
On Tue, Dec 2, 2014 at 3:39 PM, Ryan Williams
 wrote:
> Marcelo: by my count, there are 19 maven modules in the codebase. I am
> typically only concerned with "core" (and therefore its two dependencies as
> well, `network/{shuffle,common}`).

But you only need to compile the others once. Once you've established
the baseline, you can just compile / test "core" to your heart's
desire. Core tests won't even run until you build the assembly anyway,
since some of them require the assembly to be present.

Also, even if you work in core - I'd say especially if you work in
core - you should still, at some point, compile and test everything
else that depends on it.

So, do this ONCE:

  mvn install -DskipTests

Then do this as many times as you want:

  mvn -pl spark-core_2.10 something

That doesn't seem too bad to me. (Be aware of the "assembly" comment
above, since testing spark-core means you may have to rebuild the
assembly from time to time, if your changes affect those tests.)

> re: Marcelo's comment about "missing the 'spark-parent' project", I saw that
> error message too and tried to ascertain what it could mean. Why would
> `network/shuffle` need something from the parent project?

The "spark-parent" project is the main pom that defines dependencies
and their version, along with lots of build plugins and
configurations. It's needed by all modules to compile correctly.

-- 
Marcelo

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Spurious test failures, testing best practices

2014-12-02 Thread Ryan Williams
On Tue Dec 02 2014 at 4:46:20 PM Marcelo Vanzin  wrote:

> On Tue, Dec 2, 2014 at 3:39 PM, Ryan Williams
>  wrote:
> > Marcelo: by my count, there are 19 maven modules in the codebase. I am
> > typically only concerned with "core" (and therefore its two dependencies
> as
> > well, `network/{shuffle,common}`).
>
> But you only need to compile the others once.


once... every time I rebase off master, or am obliged to `mvn clean` by
some other build-correctness bug, as I said before. In my experience this
works out to a few times per week.


> Once you've established
> the baseline, you can just compile / test "core" to your heart's
> desire.


I understand that this is a workflow that does what I want as a side effect
of doing 3-5x more work (depending on whether you count [number of modules
built] or [lines of scala/java compiled]), none of the extra work being
useful to me (more on that below).


> Core tests won't even run until you build the assembly anyway,
> since some of them require the assembly to be present.


The tests you refer to are exactly the ones that I'd like to let Jenkins
run from here on out, per advice I was given elsewhere in this thread and
due to the myriad unpleasantries I've encountered in trying to run them
myself.


>
> Also, even if you work in core - I'd say especially if you work in
> core - you should still, at some point, compile and test everything
> else that depends on it.
>

Last response applies.


>
> So, do this ONCE:
>

again, s/ONCE/several times a week/, in my experience.


>
>   mvn install -DskipTests
>
> Then do this as many times as you want:
>
>   mvn -pl spark-core_2.10 something
>
> That doesn't seem too bad to me.

> (Be aware of the "assembly" comment
> above, since testing spark-core means you may have to rebuild the
> assembly from time to time, if your changes affect those tests.)
>
> > re: Marcelo's comment about "missing the 'spark-parent' project", I saw
> that
> > error message too and tried to ascertain what it could mean. Why would
> > `network/shuffle` need something from the parent project?
>
> The "spark-parent" project is the main pom that defines dependencies
> and their version, along with lots of build plugins and
> configurations. It's needed by all modules to compile correctly.
>

- I understand the parent POM has that information.

- I don't understand why Maven would feel that it is unable to compile the
`network/shuffle` module without having first compiled, packaged, and
installed 17 modules (19 minus `network/shuffle` and its dependency
`network/common`) that are not transitive dependencies of `network/shuffle`.

- I am trying to understand whether my failure to get Maven to compile
`network/shuffle` stems from my not knowing the correct incantation to feed
to Maven or from Maven's having a different (and seemingly worse) model for
how it handles module dependencies than I expected.



>
> --
> Marcelo
>


Re: Spurious test failures, testing best practices

2014-12-02 Thread Marcelo Vanzin
On Tue, Dec 2, 2014 at 4:40 PM, Ryan Williams
 wrote:
>> But you only need to compile the others once.
>
> once... every time I rebase off master, or am obliged to `mvn clean` by some
> other build-correctness bug, as I said before. In my experience this works
> out to a few times per week.

No, you only need to do it if something upstream from core changed (i.e.,
spark-parent, network/common or network/shuffle) in an incompatible
way. Otherwise, you can rebase and just recompile / retest core,
without having to install everything else. I do this kind of thing all
the time. If you have to do "mvn clean" often you're probably doing
something wrong somewhere else.

I understand where you're coming from, but the way you're thinking is
just not how maven works. I too find it annoying that maven requires lots
of things to be "installed" before you can use them, when they're all
part of the same project. But well, that's the way things are.

-- 
Marcelo

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [Thrift,1.2 RC] what happened to parquet.hive.serde.ParquetHiveSerDe

2014-12-02 Thread Michael Armbrust
In Hive 0.13 (which is the default for Spark 1.2), Parquet support is built in
and we no longer include the Hive Parquet bundle. You can now use the included
SerDe: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe

If you want to compile Spark 1.2 with Hive 0.12 instead, you can pass
-Phive-0.12.0 and parquet.hive.serde.ParquetHiveSerDe will be included as
before.

Michael
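For the archive, a rough sketch of exercising the bundled SerDe from a Spark
1.2 / Hive 0.13 build (the class name ParquetSerdeCheck and the table
definition are illustrative; with Hive 0.13, STORED AS PARQUET resolves to the
built-in org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe mentioned
above):

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.hive.HiveContext;

    public class ParquetSerdeCheck {
      public static void main(String[] args) {
        JavaSparkContext jsc = new JavaSparkContext(
            new SparkConf().setAppName("parquet-serde-check"));
        // HiveContext is a Scala class, but it can be constructed and used
        // directly from Java via the underlying SparkContext.
        HiveContext hive = new HiveContext(jsc.sc());

        // With the Hive 0.13 bindings, Parquet support is built in, so this
        // table picks up the bundled ParquetHiveSerDe without extra jars.
        hive.sql("CREATE TABLE IF NOT EXISTS events (id INT, name STRING) " +
                 "STORED AS PARQUET");
        hive.sql("SHOW TABLES").collect();

        jsc.stop();
      }
    }

Tables whose metadata still names the old parquet.hive.serde.ParquetHiveSerDe
class would presumably still need the -Phive-0.12.0 build mentioned above (or
an updated SerDe) before they can be read.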

On Tue, Dec 2, 2014 at 9:31 AM, Yana Kadiyska 
wrote:

> Apologies if people get this more than once -- I sent mail to dev@spark
> last night and don't see it in the archives. Trying the incubator list
> now...wanted to make sure it doesn't get lost in case it's a bug...
>
> -- Forwarded message --
> From: Yana Kadiyska 
> Date: Mon, Dec 1, 2014 at 8:10 PM
> Subject: [Thrift,1.2 RC] what happened to
> parquet.hive.serde.ParquetHiveSerDe
> To: dev@spark.apache.org
>
>
> Hi all, apologies if this is not a question for the dev list -- figured
> User list might not be appropriate since I'm having trouble with the RC
> tag.
>
> I just tried deploying the RC and running ThriftServer. I see the following
> error:
>
> 14/12/01 21:31:42 ERROR UserGroupInformation: PriviledgedActionException
> as:anonymous (auth:SIMPLE)
> cause:org.apache.hive.service.cli.HiveSQLException:
> java.lang.RuntimeException:
> MetaException(message:java.lang.ClassNotFoundException Class
> parquet.hive.serde.ParquetHiveSerDe not found)
> 14/12/01 21:31:42 WARN ThriftCLIService: Error executing statement:
> org.apache.hive.service.cli.HiveSQLException: java.lang.RuntimeException:
> MetaException(message:java.lang.ClassNotFoundException Class
> parquet.hive.serde.ParquetHiveSerDe not found)
> at
>
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.run(Shim13.scala:192)
> at
>
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:231)
> at
>
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatement(HiveSessionImpl.java:212)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
>
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at
>
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:79)
> at
>
> org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:37)
> at
>
> org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:64)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> ​
>
>
> I looked at a working installation that I have(build master a few weeks
> ago) and this class used to be included in spark-assembly:
>
> ls *.jar|xargs grep parquet.hive.serde.ParquetHiveSerDe
> Binary file spark-assembly-1.2.0-SNAPSHOT-hadoop2.0.0-mr1-cdh4.2.0.jar
> matches
>
> but with the RC build it's not there?
>
> I tried both the prebuilt CDH drop and later manually built the tag with
> the following command:
>
>  ./make-distribution.sh --tgz -Phive -Dhadoop.version=2.0.0-mr1-cdh4.2.0
> -Phive-thriftserver
> $JAVA_HOME/bin/jar -tvf spark-assembly-1.2.0-hadoop2.0.0-mr1-cdh4.2.0.jar
> |grep parquet.hive.serde.ParquetHiveSerDe
>
> comes back empty...
>


object xxx is not a member of package com

2014-12-02 Thread flyson
Hello everyone,

Could anybody tell me how to import and call third-party Java classes from
inside Spark?
Here's my case:
I have a jar file (the directory layout is com.xxx.yyy.zzz) which contains
some java classes, and I need to call some of them in spark code.
I used the statement "import com.xxx.yyy.zzz._" at the top of the affected Spark
file, set the location of the jar file in the CLASSPATH environment variable, and
used ".sbt/sbt assembly" to build the project. As a result, I got an error
saying "object xxx is not a member of package com".

I thought that could be related to the library dependencies, but couldn't
figure it out. Any suggestion/solution from you would be appreciated!

By the way, in the Scala console, if :cp is used to point to the jar
file, I can import the classes from it.

Thanks! 



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/object-xxx-is-not-a-member-of-package-com-tp9619.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



RE: object xxx is not a member of package com

2014-12-02 Thread Wang, Daoyuan
I think you can place the jar in lib/ in SPARK_HOME, and then compile without 
any change to your classpath. This could be a temporary way to include your
jar. You can also declare it as a dependency in your pom.xml.

Thanks,
Daoyuan

-Original Message-
From: flyson [mailto:m_...@msn.com] 
Sent: Wednesday, December 03, 2014 11:23 AM
To: d...@spark.incubator.apache.org
Subject: object xxx is not a member of package com

Hello everyone,

Could anybody tell me how to import and call the 3rd party java classes from 
inside spark?
Here's my case:
I have a jar file (the directory layout is com.xxx.yyy.zzz) which contains some 
java classes, and I need to call some of them in spark code.
I used the statement "import com.xxx.yyy.zzz._" on top of the impacted spark 
file and set the location of the jar file in the CLASSPATH environment, and use 
".sbt/sbt assembly" to build the project. As a result, I got an error saying 
"object xxx is not a member of package com".

I thought that could be related to the library dependencies, but couldn't 
figure it out. Any suggestion/solution from you would be appreciated!

By the way in the scala console, if the :cp is used to point to the jar file, I 
can import the classes from the jar file.

Thanks! 



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/object-xxx-is-not-a-member-of-package-com-tp9619.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional 
commands, e-mail: dev-h...@spark.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org