Re: [VOTE] Release Apache Spark 1.2.0 (RC1)

2014-12-02 Thread Denny Lee
+1 (non-binding)

Verified on OSX 10.10.2, built from source,
spark-shell / spark-submit jobs
ran various simple Spark / Scala queries
ran various SparkSQL queries (including HiveContext)
ran ThriftServer service and connected via beeline
ran SparkSVD


On Mon Dec 01 2014 at 11:09:26 PM Patrick Wendell pwend...@gmail.com
wrote:

 Hey All,

 Just an update. Josh, Andrew, and others are working to reproduce
 SPARK-4498 and fix it. Other than that issue no serious regressions
 have been reported so far. If we are able to get a fix in for that
 soon, we'll likely cut another RC with the patch.

 Continued testing of RC1 is definitely appreciated!

 I'll leave this vote open to allow folks to continue posting comments.
 It's fine to still give +1 from your own testing... i.e. you can
 assume at this point SPARK-4498 will be fixed before releasing.

 - Patrick

 On Mon, Dec 1, 2014 at 3:30 PM, Matei Zaharia matei.zaha...@gmail.com
 wrote:
  +0.9 from me. Tested it on Mac and Windows (someone has to do it) and
 while things work, I noticed a few recent scripts don't have Windows
 equivalents, namely https://issues.apache.org/jira/browse/SPARK-4683 and
 https://issues.apache.org/jira/browse/SPARK-4684. The first one at least
 would be good to fix if we do another RC. Not blocking the release but
 useful to fix in docs is https://issues.apache.org/jira/browse/SPARK-4685.
 
  Matei
 
 
  On Dec 1, 2014, at 11:18 AM, Josh Rosen rosenvi...@gmail.com wrote:
 
  Hi everyone,
 
  There's an open bug report related to Spark standalone which could be a
 potential release-blocker (pending investigation / a bug fix):
 https://issues.apache.org/jira/browse/SPARK-4498.  This issue seems
 non-deterministic and only affects long-running Spark standalone
 deployments, so it may be hard to reproduce.  I'm going to work on a patch
 to add additional logging in order to help with debugging.
 
  I just wanted to give an early heads-up about this issue and to get
 more eyes on it in case anyone else has run into it or wants to help with
 debugging.
 
  - Josh
 
  On November 28, 2014 at 9:18:09 PM, Patrick Wendell (pwend...@gmail.com)
 wrote:
 
  Please vote on releasing the following candidate as Apache Spark
 version 1.2.0!
 
  The tag to be voted on is v1.2.0-rc1 (commit 1056e9ec1):
  https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
 1056e9ec13203d0c51564265e94d77a054498fdb
 
  The release files, including signatures, digests, etc. can be found at:
  http://people.apache.org/~pwendell/spark-1.2.0-rc1/
 
  Release artifacts are signed with the following key:
  https://people.apache.org/keys/committer/pwendell.asc
 
  The staging repository for this release can be found at:
  https://repository.apache.org/content/repositories/orgapachespark-1048/
 
  The documentation corresponding to this release can be found at:
  http://people.apache.org/~pwendell/spark-1.2.0-rc1-docs/
 
  Please vote on releasing this package as Apache Spark 1.2.0!
 
  The vote is open until Tuesday, December 02, at 05:15 UTC and passes
  if a majority of at least 3 +1 PMC votes are cast.
 
  [ ] +1 Release this package as Apache Spark 1.2.0
  [ ] -1 Do not release this package because ...
 
  To learn more about Apache Spark, please see
  http://spark.apache.org/
 
  == What justifies a -1 vote for this release? ==
  This vote is happening very late into the QA period compared with
  previous votes, so -1 votes should only occur for significant
  regressions from 1.0.2. Bugs already present in 1.1.X, minor
  regressions, or bugs related to new features will not block this
  release.
 
  == What default changes should I be aware of? ==
  1. The default value of spark.shuffle.blockTransferService has been
  changed to netty
  -- Old behavior can be restored by switching to nio
 
  2. The default value of spark.shuffle.manager has been changed to
 sort.
  -- Old behavior can be restored by setting spark.shuffle.manager to
 hash.
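
  For reference, a minimal sketch of restoring both old defaults in
  conf/spark-defaults.conf (or equivalently via --conf flags to spark-submit);
  the property names and values are exactly the ones listed above:

    spark.shuffle.blockTransferService  nio
    spark.shuffle.manager               hash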
 
  == Other notes ==
  Because this vote is occurring over a weekend, I will likely extend
  the vote if this RC survives until the end of the vote period.
 
  - Patrick
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
  For additional commands, e-mail: dev-h...@spark.apache.org
 
 
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
  For additional commands, e-mail: dev-h...@spark.apache.org
 

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org




Re: Can the Scala classes in the spark source code, be inherited in Java classes?

2014-12-02 Thread Niranda Perera
Thanks.

And @Reynold, sorry, my bad. Guess I should have used something like
Stack Overflow!

On Tue, Dec 2, 2014 at 12:18 PM, Reynold Xin r...@databricks.com wrote:

 Oops my previous response wasn't sent properly to the dev list. Here you
 go for archiving.


 Yes you can. Scala classes are compiled down to classes in bytecode. Take
 a look at this: https://twitter.github.io/scala_school/java.html
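
 For instance, here is a minimal, hypothetical sketch of a Java class
 extending a Scala class from Spark core (org.apache.spark.Partitioner is a
 real Scala abstract class; the subclass and its logic are illustrative only):

   import org.apache.spark.Partitioner;

   // A Java class extending Spark's Scala abstract class Partitioner.
   // Scala's parameterless def numPartitions and def getPartition(key: Any)
   // appear to Java as plain methods, so they can be overridden directly.
   public class ModuloPartitioner extends Partitioner {
     private final int partitions;

     public ModuloPartitioner(int partitions) {
       this.partitions = partitions;
     }

     @Override
     public int numPartitions() {
       return partitions;
     }

     @Override
     public int getPartition(Object key) {
       // Mod first, then abs, so Integer.MIN_VALUE hash codes stay in range.
       return key == null ? 0 : Math.abs(key.hashCode() % partitions);
     }
   }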

 Note that questions like this are not exactly what this dev list is meant
 for  ...

 On Mon, Dec 1, 2014 at 9:22 PM, Niranda Perera nira...@wso2.com wrote:

 Hi,

 Can the Scala classes in the Spark source code be inherited (and other OOP
 concepts applied) in Java classes?

 I want to customize some part of the code, but I would like to do it in a
 Java environment.

 Rgds

 --
 *Niranda Perera*
 Software Engineer, WSO2 Inc.
 Mobile: +94-71-554-8430
 Twitter: @n1r44 https://twitter.com/N1R44





-- 
*Niranda Perera*
Software Engineer, WSO2 Inc.
Mobile: +94-71-554-8430
Twitter: @n1r44 https://twitter.com/N1R44


Re: Required file not found in building

2014-12-02 Thread Stephen Boesch
Thanks Sean, I followed suit (brew install zinc) and that is working.

2014-12-01 22:39 GMT-08:00 Sean Owen so...@cloudera.com:

 I'm having no problems with the build or zinc on my Mac. I use zinc
 from brew install zinc.

 On Tue, Dec 2, 2014 at 3:02 AM, Stephen Boesch java...@gmail.com wrote:
  Mac as well.  Just found the problem:  I had created an alias to zinc a
  couple of months back. Apparently that is not happy with the build
 anymore.
  No problem now that the issue has been isolated - just need to fix my
 zinc
  alias.
 
  2014-12-01 18:55 GMT-08:00 Ted Yu yuzhih...@gmail.com:
 
  I tried the same command on MacBook and didn't experience the same
 error.
 
  Which OS are you using?
 
  Cheers
 
  On Mon, Dec 1, 2014 at 6:42 PM, Stephen Boesch java...@gmail.com
 wrote:
 
  It seems there are some additional settings required to build Spark now.
  This should be a snap for most of you out there to spot what I am missing.
  Here is the command line I have traditionally used:
 
 mvn -Pyarn -Phadoop-2.3 -Phive install compile package -DskipTests
 
  That command line is, however, failing with the latest from HEAD:
 
  [INFO] --- scala-maven-plugin:3.2.0:compile (scala-compile-first) @
  spark-network-common_2.10 ---
  [INFO] Using zinc server for incremental compilation
  [INFO] compiler plugin:
  BasicArtifact(org.scalamacros,paradise_2.10.4,2.0.1,null)
 
  [error] Required file not found: scala-compiler-2.10.4.jar

  [error] See zinc -help for information about locating necessary files
 
  [INFO]
 
 
  [INFO] Reactor Summary:
  [INFO]
  [INFO] Spark Project Parent POM .. SUCCESS [4.077s]
  [INFO] Spark Project Networking .. FAILURE [0.445s]
 
 
  OK let's try zinc -help:
 
  18:38:00/spark2 $ zinc -help
  Nailgun server running with 1 cached compiler
 
  Version = 0.3.5.1
 
  Zinc compiler cache limit = 5
  Resident scalac cache limit = 0
  Analysis cache limit = 5
 
  Compiler(Scala 2.10.4) [74ff364f]
  Setup = {
     scala compiler = /Users/steve/.m2/repository/org/scala-lang/scala-compiler/2.10.4/scala-compiler-2.10.4.jar
     scala library = /Users/steve/.m2/repository/org/scala-lang/scala-library/2.10.4/scala-library-2.10.4.jar
     scala extra = {
        /Users/steve/.m2/repository/org/scala-lang/scala-reflect/2.10.4/scala-reflect-2.10.4.jar
        /shared/zinc-0.3.5.1/lib/scala-reflect.jar
     }
     sbt interface = /shared/zinc-0.3.5.1/lib/sbt-interface.jar
     compiler interface sources = /shared/zinc-0.3.5.1/lib/compiler-interface-sources.jar
     java home =
     fork java = false
     cache directory = /Users/steve/.zinc/0.3.5.1
  }
 
  Does that compiler jar exist?  Yes!
 
  18:39:34/spark2 $ ll /Users/steve/.m2/repository/org/scala-lang/scala-compiler/2.10.4/scala-compiler-2.10.4.jar
  -rw-r--r--  1 steve  staff  14445780 Apr  9  2014 /Users/steve/.m2/repository/org/scala-lang/scala-compiler/2.10.4/scala-compiler-2.10.4.jar
 
 
 



Fwd: [Thrift,1.2 RC] what happened to parquet.hive.serde.ParquetHiveSerDe

2014-12-02 Thread Yana Kadiyska
Apologies if people get this more than once -- I sent mail to dev@spark
last night and don't see it in the archives. Trying the incubator list
now...wanted to make sure it doesn't get lost in case it's a bug...

-- Forwarded message --
From: Yana Kadiyska yana.kadiy...@gmail.com
Date: Mon, Dec 1, 2014 at 8:10 PM
Subject: [Thrift,1.2 RC] what happened to
parquet.hive.serde.ParquetHiveSerDe
To: dev@spark.apache.org


Hi all, apologies if this is not a question for the dev list -- figured
User list might not be appropriate since I'm having trouble with the RC tag.

I just tried deploying the RC and running ThriftServer. I see the following
error:

14/12/01 21:31:42 ERROR UserGroupInformation: PriviledgedActionException
as:anonymous (auth:SIMPLE)
cause:org.apache.hive.service.cli.HiveSQLException:
java.lang.RuntimeException:
MetaException(message:java.lang.ClassNotFoundException Class
parquet.hive.serde.ParquetHiveSerDe not found)
14/12/01 21:31:42 WARN ThriftCLIService: Error executing statement:
org.apache.hive.service.cli.HiveSQLException: java.lang.RuntimeException:
MetaException(message:java.lang.ClassNotFoundException Class
parquet.hive.serde.ParquetHiveSerDe not found)
at
org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.run(Shim13.scala:192)
at
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:231)
at
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatement(HiveSessionImpl.java:212)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at
org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:79)
at
org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:37)
at
org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:64)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
​


I looked at a working installation that I have (built from master a few weeks
ago), and this class used to be included in spark-assembly:

ls *.jar|xargs grep parquet.hive.serde.ParquetHiveSerDe
Binary file spark-assembly-1.2.0-SNAPSHOT-hadoop2.0.0-mr1-cdh4.2.0.jar
matches

but with the RC build it's not there?

I tried both the prebuilt CDH drop and later manually built the tag with
the following command:

 ./make-distribution.sh --tgz -Phive -Dhadoop.version=2.0.0-mr1-cdh4.2.0
-Phive-thriftserver
$JAVA_HOME/bin/jar -tvf spark-assembly-1.2.0-hadoop2.0.0-mr1-cdh4.2.0.jar
|grep parquet.hive.serde.ParquetHiveSerDe

comes back empty...


Re: [VOTE] Release Apache Spark 1.2.0 (RC1)

2014-12-02 Thread Jeremy Freeman
+1 (non-binding)

Installed version pre-built for Hadoop on a private HPC
ran PySpark shell w/ iPython
loaded data using custom Hadoop input formats
ran MLlib routines in PySpark
ran custom workflows in PySpark
browsed the web UI

Noticeable improvements in stability and performance during large shuffles (as 
well as the elimination of frequent but unpredictable “FileNotFound / too many 
open files” errors).

We initially hit errors during large collects that ran fine in 1.1, but setting 
the new spark.driver.maxResultSize to 0 preserved the old behavior. Definitely 
worth highlighting this setting in the release notes, as the new default may be 
too small for some users and workloads.
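
For reference, a minimal sketch of supplying that setting (property name as
above; a value of 0 removes the limit, and the trailing dots stand in for the
usual application arguments):

  spark-submit --conf spark.driver.maxResultSize=0 ...

  # or persistently, in conf/spark-defaults.conf:
  spark.driver.maxResultSize  0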

— Jeremy

-
jeremyfreeman.net
@thefreemanlab

On Dec 2, 2014, at 3:22 AM, Denny Lee denny.g@gmail.com wrote:

 +1 (non-binding)
 
 Verified on OSX 10.10.2, built from source,
 spark-shell / spark-submit jobs
 ran various simple Spark / Scala queries
 ran various SparkSQL queries (including HiveContext)
 ran ThriftServer service and connected via beeline
 ran SparkSVD
 
 
 On Mon Dec 01 2014 at 11:09:26 PM Patrick Wendell pwend...@gmail.com
 wrote:
 
 Hey All,
 
 Just an update. Josh, Andrew, and others are working to reproduce
 SPARK-4498 and fix it. Other than that issue no serious regressions
 have been reported so far. If we are able to get a fix in for that
 soon, we'll likely cut another RC with the patch.
 
 Continued testing of RC1 is definitely appreciated!
 
 I'll leave this vote open to allow folks to continue posting comments.
 It's fine to still give +1 from your own testing... i.e. you can
 assume at this point SPARK-4498 will be fixed before releasing.
 
 - Patrick
 
 On Mon, Dec 1, 2014 at 3:30 PM, Matei Zaharia matei.zaha...@gmail.com
 wrote:
 +0.9 from me. Tested it on Mac and Windows (someone has to do it) and
 while things work, I noticed a few recent scripts don't have Windows
 equivalents, namely https://issues.apache.org/jira/browse/SPARK-4683 and
 https://issues.apache.org/jira/browse/SPARK-4684. The first one at least
 would be good to fix if we do another RC. Not blocking the release but
 useful to fix in docs is https://issues.apache.org/jira/browse/SPARK-4685.
 
 Matei
 
 
 On Dec 1, 2014, at 11:18 AM, Josh Rosen rosenvi...@gmail.com wrote:
 
 Hi everyone,
 
 There's an open bug report related to Spark standalone which could be a
 potential release-blocker (pending investigation / a bug fix):
 https://issues.apache.org/jira/browse/SPARK-4498.  This issue seems
 non-deterministic and only affects long-running Spark standalone
 deployments, so it may be hard to reproduce.  I'm going to work on a patch
 to add additional logging in order to help with debugging.
 
 I just wanted to give an early heads-up about this issue and to get
 more eyes on it in case anyone else has run into it or wants to help with
 debugging.
 
 - Josh
 
 On November 28, 2014 at 9:18:09 PM, Patrick Wendell (pwend...@gmail.com)
 wrote:
 
 Please vote on releasing the following candidate as Apache Spark
 version 1.2.0!
 
 The tag to be voted on is v1.2.0-rc1 (commit 1056e9ec1):
 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
 1056e9ec13203d0c51564265e94d77a054498fdb
 
 The release files, including signatures, digests, etc. can be found at:
 http://people.apache.org/~pwendell/spark-1.2.0-rc1/
 
 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc
 
 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-1048/
 
 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-1.2.0-rc1-docs/
 
 Please vote on releasing this package as Apache Spark 1.2.0!
 
 The vote is open until Tuesday, December 02, at 05:15 UTC and passes
 if a majority of at least 3 +1 PMC votes are cast.
 
 [ ] +1 Release this package as Apache Spark 1.2.0
 [ ] -1 Do not release this package because ...
 
 To learn more about Apache Spark, please see
 http://spark.apache.org/
 
 == What justifies a -1 vote for this release? ==
 This vote is happening very late into the QA period compared with
 previous votes, so -1 votes should only occur for significant
 regressions from 1.0.2. Bugs already present in 1.1.X, minor
 regressions, or bugs related to new features will not block this
 release.
 
 == What default changes should I be aware of? ==
 1. The default value of spark.shuffle.blockTransferService has been
 changed to netty
 -- Old behavior can be restored by switching to nio
 
 2. The default value of spark.shuffle.manager has been changed to
 sort.
 -- Old behavior can be restored by setting spark.shuffle.manager to
 hash.
 
 == Other notes ==
 Because this vote is occurring over a weekend, I will likely extend
 the vote if this RC survives until the end of the vote period.
 
 - Patrick
 
 

keeping PR titles / descriptions up to date

2014-12-02 Thread Kay Ousterhout
Hi all,

I've noticed a bunch of times lately where a pull request changes to be
pretty different from the original pull request, and the title /
description never get updated.  Because the pull request title and
description are used as the commit message, the incorrect description lives
on forever, making it harder to understand the reason behind a particular
commit without going back and reading the entire conversation on the pull
request.  If folks could try to keep these up to date (and committers, try
to remember to verify that the title and description are correct before
merging pull requests), that would be awesome.

-Kay


Re: keeping PR titles / descriptions up to date

2014-12-02 Thread Mridul Muralidharan
I second that!
Would also be great if the JIRA were updated accordingly.

Regards,
Mridul


On Wed, Dec 3, 2014 at 1:53 AM, Kay Ousterhout kayousterh...@gmail.com wrote:
 Hi all,

 I've noticed a bunch of times lately where a pull request changes to be
 pretty different from the original pull request, and the title /
 description never get updated.  Because the pull request title and
 description are used as the commit message, the incorrect description lives
 on forever, making it harder to understand the reason behind a particular
 commit without going back and reading the entire conversation on the pull
 request.  If folks could try to keep these up to date (and committers, try
 to remember to verify that the title and description are correct before
 merging pull requests), that would be awesome.

 -Kay

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 1.2.0 (RC1)

2014-12-02 Thread Andrew Or
+1. I also tested on Windows just in case, with jars referring to other jars
and Python files referring to other Python files. Path resolution still works.

2014-12-02 10:16 GMT-08:00 Jeremy Freeman freeman.jer...@gmail.com:

 +1 (non-binding)

 Installed version pre-built for Hadoop on a private HPC
 ran PySpark shell w/ iPython
 loaded data using custom Hadoop input formats
 ran MLlib routines in PySpark
 ran custom workflows in PySpark
 browsed the web UI

 Noticeable improvements in stability and performance during large shuffles
 (as well as the elimination of frequent but unpredictable “FileNotFound /
 too many open files” errors).

 We initially hit errors during large collects that ran fine in 1.1, but
 setting the new spark.driver.maxResultSize to 0 preserved the old behavior.
 Definitely worth highlighting this setting in the release notes, as the new
 default may be too small for some users and workloads.

 — Jeremy

 -
 jeremyfreeman.net
 @thefreemanlab

 On Dec 2, 2014, at 3:22 AM, Denny Lee denny.g@gmail.com wrote:

  +1 (non-binding)
 
  Verified on OSX 10.10.2, built from source,
  spark-shell / spark-submit jobs
  ran various simple Spark / Scala queries
  ran various SparkSQL queries (including HiveContext)
  ran ThriftServer service and connected via beeline
  ran SparkSVD
 
 
  On Mon Dec 01 2014 at 11:09:26 PM Patrick Wendell pwend...@gmail.com
  wrote:
 
  Hey All,
 
  Just an update. Josh, Andrew, and others are working to reproduce
  SPARK-4498 and fix it. Other than that issue no serious regressions
  have been reported so far. If we are able to get a fix in for that
  soon, we'll likely cut another RC with the patch.
 
  Continued testing of RC1 is definitely appreciated!
 
  I'll leave this vote open to allow folks to continue posting comments.
  It's fine to still give +1 from your own testing... i.e. you can
  assume at this point SPARK-4498 will be fixed before releasing.
 
  - Patrick
 
  On Mon, Dec 1, 2014 at 3:30 PM, Matei Zaharia matei.zaha...@gmail.com
  wrote:
  +0.9 from me. Tested it on Mac and Windows (someone has to do it) and
  while things work, I noticed a few recent scripts don't have Windows
  equivalents, namely https://issues.apache.org/jira/browse/SPARK-4683
 and
  https://issues.apache.org/jira/browse/SPARK-4684. The first one at
 least
  would be good to fix if we do another RC. Not blocking the release but
  useful to fix in docs is
 https://issues.apache.org/jira/browse/SPARK-4685.
 
  Matei
 
 
  On Dec 1, 2014, at 11:18 AM, Josh Rosen rosenvi...@gmail.com wrote:
 
  Hi everyone,
 
  There's an open bug report related to Spark standalone which could be
 a
  potential release-blocker (pending investigation / a bug fix):
  https://issues.apache.org/jira/browse/SPARK-4498.  This issue seems
  non-deterministic and only affects long-running Spark standalone
  deployments, so it may be hard to reproduce.  I'm going to work on a
 patch
  to add additional logging in order to help with debugging.
 
  I just wanted to give an early heads-up about this issue and to get
  more eyes on it in case anyone else has run into it or wants to help
 with
  debugging.
 
  - Josh
 
  On November 28, 2014 at 9:18:09 PM, Patrick Wendell (
 pwend...@gmail.com)
  wrote:
 
  Please vote on releasing the following candidate as Apache Spark
  version 1.2.0!
 
  The tag to be voted on is v1.2.0-rc1 (commit 1056e9ec1):
  https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
  1056e9ec13203d0c51564265e94d77a054498fdb
 
  The release files, including signatures, digests, etc. can be found
 at:
  http://people.apache.org/~pwendell/spark-1.2.0-rc1/
 
  Release artifacts are signed with the following key:
  https://people.apache.org/keys/committer/pwendell.asc
 
  The staging repository for this release can be found at:
 
 https://repository.apache.org/content/repositories/orgapachespark-1048/
 
  The documentation corresponding to this release can be found at:
  http://people.apache.org/~pwendell/spark-1.2.0-rc1-docs/
 
  Please vote on releasing this package as Apache Spark 1.2.0!
 
  The vote is open until Tuesday, December 02, at 05:15 UTC and passes
  if a majority of at least 3 +1 PMC votes are cast.
 
  [ ] +1 Release this package as Apache Spark 1.2.0
  [ ] -1 Do not release this package because ...
 
  To learn more about Apache Spark, please see
  http://spark.apache.org/
 
  == What justifies a -1 vote for this release? ==
  This vote is happening very late into the QA period compared with
  previous votes, so -1 votes should only occur for significant
  regressions from 1.0.2. Bugs already present in 1.1.X, minor
  regressions, or bugs related to new features will not block this
  release.
 
  == What default changes should I be aware of? ==
  1. The default value of spark.shuffle.blockTransferService has been
  changed to netty
  -- Old behavior can be restored by switching to nio
 
  2. The default value of spark.shuffle.manager has been 

Re: keeping PR titles / descriptions up to date

2014-12-02 Thread Patrick Wendell
Also a note on this for committers - it's possible to reword the
title during merging by just running git commit -a --amend before
you push the PR.
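
For example, a rough sketch of that flow (the remote and branch names below
are placeholders):

  git commit -a --amend      # opens the editor so the commit message can be reworded
  git push <remote> <branch>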

- Patrick

On Tue, Dec 2, 2014 at 12:50 PM, Mridul Muralidharan mri...@gmail.com wrote:
 I second that!
 Would also be great if the JIRA were updated accordingly.

 Regards,
 Mridul


 On Wed, Dec 3, 2014 at 1:53 AM, Kay Ousterhout kayousterh...@gmail.com 
 wrote:
 Hi all,

 I've noticed a bunch of times lately where a pull request changes to be
 pretty different from the original pull request, and the title /
 description never get updated.  Because the pull request title and
 description are used as the commit message, the incorrect description lives
 on forever, making it harder to understand the reason behind a particular
 commit without going back and reading the entire conversation on the pull
 request.  If folks could try to keep these up to date (and committers, try
 to remember to verify that the title and description are correct before
 merging pull requests), that would be awesome.

 -Kay

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Announcing Spark 1.1.1!

2014-12-02 Thread Andrew Or
I am happy to announce the availability of Spark 1.1.1! This is a
maintenance release with many bug fixes, most of which are concentrated in
the core. The list includes various fixes for sort-based shuffle, memory
leaks, and spilling issues. This release includes contributions from 55
developers.

Visit the release notes [1] to read about the new features, or
download [2] the release today.

[1] http://spark.apache.org/releases/spark-release-1-1-1.html
[2] http://spark.apache.org/downloads.html

Please e-mail me directly about any typos in the release notes or name
listing.

Thanks to everyone who contributed, and congratulations!
-Andrew


Re: [VOTE] Release Apache Spark 1.2.0 (RC1)

2014-12-02 Thread Tom Graves
+1. Tested on YARN.
Tom

 On Friday, November 28, 2014 11:18 PM, Patrick Wendell 
pwend...@gmail.com wrote:
   

 Please vote on releasing the following candidate as Apache Spark version 1.2.0!

The tag to be voted on is v1.2.0-rc1 (commit 1056e9ec1):
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=1056e9ec13203d0c51564265e94d77a054498fdb

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-1.2.0-rc1/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1048/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-1.2.0-rc1-docs/

Please vote on releasing this package as Apache Spark 1.2.0!

The vote is open until Tuesday, December 02, at 05:15 UTC and passes
if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 1.2.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see
http://spark.apache.org/

== What justifies a -1 vote for this release? ==
This vote is happening very late into the QA period compared with
previous votes, so -1 votes should only occur for significant
regressions from 1.0.2. Bugs already present in 1.1.X, minor
regressions, or bugs related to new features will not block this
release.

== What default changes should I be aware of? ==
1. The default value of spark.shuffle.blockTransferService has been
changed to netty
-- Old behavior can be restored by switching to nio

2. The default value of spark.shuffle.manager has been changed to sort.
-- Old behavior can be restored by setting spark.shuffle.manager to hash.

== Other notes ==
Because this vote is occurring over a weekend, I will likely extend
the vote if this RC survives until the end of the vote period.

- Patrick

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org





Re: Spurious test failures, testing best practices

2014-12-02 Thread Ryan Williams
Following on Mark's Maven examples, here is another related issue I'm
having:

I'd like to compile just the `core` module after a `mvn clean`, without
building an assembly JAR first. Is this possible?

Attempting to do it myself, the steps I performed were:

- `mvn compile -pl core`: fails because `core` depends on `network/common`
and `network/shuffle`, neither of which is installed in my local maven
cache (and which don't exist in central Maven repositories, I guess? I
thought Spark is publishing snapshot releases?)

- `network/shuffle` also depends on `network/common`, so I'll `mvn install`
the latter first: `mvn install -DskipTests -pl network/common`. That
succeeds, and I see a newly built 1.3.0-SNAPSHOT jar in my local maven
repository.

- However, `mvn install -DskipTests -pl network/shuffle` subsequently
fails, seemingly due to not finding network/core. Here's
https://gist.github.com/ryan-williams/1711189e7d0af558738d a sample full
output from running `mvn install -X -U -DskipTests -pl network/shuffle`
from such a state (the -U was to get around a previous failure based on
having cached a failed lookup of network-common-1.3.0-SNAPSHOT).

- Thinking maven might be special-casing -SNAPSHOT versions, I tried
replacing 1.3.0-SNAPSHOT with 1.3.0.1 globally and repeating these
steps, but the error seems to be the same
https://gist.github.com/ryan-williams/37fcdd14dd92fa562dbe.

Any ideas?

Thanks,

-Ryan

On Sun Nov 30 2014 at 6:37:28 PM Mark Hamstra m...@clearstorydata.com
wrote:

 
  - Start the SBT interactive console with sbt/sbt
  - Build your assembly by running the assembly target in the assembly
  project: assembly/assembly
  - Run all the tests in one module: core/test
  - Run a specific suite: core/test-only org.apache.spark.rdd.RDDSuite
 (this
  also supports tab completion)


 The equivalent using Maven:

 - Start zinc
 - Build your assembly using the mvn package or install target
 (install is actually the equivalent of SBT's publishLocal) -- this step
 is the first step in
 http://spark.apache.org/docs/latest/building-with-maven.html#spark-tests-in-maven
 - Run all the tests in one module: mvn -pl core test
 - Run a specific suite: mvn -pl core
 -DwildcardSuites=org.apache.spark.rdd.RDDSuite test (the -pl option isn't
 strictly necessary if you don't mind waiting for Maven to scan through all
 the other sub-projects only to do nothing; and, of course, it needs to be
 something other than core if the test you want to run is in another
 sub-project.)

 You also typically want to carry along in each subsequent step any relevant
 command line options you added in the package/install step.
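
 For instance (a sketch only; these profile flags are simply the ones used
 earlier in this thread, not a recommendation):

   mvn -Pyarn -Phadoop-2.3 -Phive -DskipTests install
   mvn -Pyarn -Phadoop-2.3 -Phive -pl core test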

 On Sun, Nov 30, 2014 at 3:06 PM, Matei Zaharia matei.zaha...@gmail.com
 wrote:

  Hi Ryan,
 
  As a tip (and maybe this isn't documented well), I normally use SBT for
  development to avoid the slow build process, and use its interactive
  console to run only specific tests. The nice advantage is that SBT can
 keep
  the Scala compiler loaded and JITed across builds, making it faster to
  iterate. To use it, you can do the following:
 
  - Start the SBT interactive console with sbt/sbt
  - Build your assembly by running the assembly target in the assembly
  project: assembly/assembly
  - Run all the tests in one module: core/test
  - Run a specific suite: core/test-only org.apache.spark.rdd.RDDSuite
 (this
  also supports tab completion)
 
  Running all the tests does take a while, and I usually just rely on
  Jenkins for that once I've run the tests for the things I believed my
 patch
  could break. But this is because some of them are integration tests (e.g.
  DistributedSuite, which creates multi-process mini-clusters). Many of the
  individual suites run fast without requiring this, however, so you can
 pick
  the ones you want. Perhaps we should find a way to tag them so people
 can
  do a quick-test that skips the integration ones.
 
  The assembly builds are annoying but they only take about a minute for me
  on a MacBook Pro with SBT warmed up. The assembly is actually only
 required
  for some of the integration tests (which launch new processes), but I'd
  recommend doing it all the time anyway since it would be very confusing
 to
  run those with an old assembly. The Scala compiler crash issue can also
 be
  a problem, but I don't see it very often with SBT. If it happens, I exit
  SBT and do sbt clean.
 
  Anyway, this is useful feedback and I think we should try to improve some
  of these suites, but hopefully you can also try the faster SBT process.
 At
  the end of the day, if we want integration tests, the whole test process
  will take an hour, but most of the developers I know leave that to
 Jenkins
  and only run individual tests locally before submitting a patch.
 
  Matei
 
 
   On Nov 30, 2014, at 2:39 PM, Ryan Williams 
  ryan.blake.willi...@gmail.com wrote:
  
   In the course of trying to make contributions to Spark, I have had a
 lot
  of
   trouble running Spark's tests 

Re: Spurious test failures, testing best practices

2014-12-02 Thread Marcelo Vanzin
On Tue, Dec 2, 2014 at 2:40 PM, Ryan Williams
ryan.blake.willi...@gmail.com wrote:
 Following on Mark's Maven examples, here is another related issue I'm
 having:

 I'd like to compile just the `core` module after a `mvn clean`, without
 building an assembly JAR first. Is this possible?

Out of curiosity, may I ask why? What's the problem with running mvn
install -DskipTests first (or package instead of install,
although I generally do the latter)?

You can probably do what you want if you manually build / install all
the needed dependencies first; you found two, but it seems you're also
missing the spark-parent project (which is the top-level pom). That
sounds like a lot of trouble though, for no gain that I can
see... after the first build you should be able to do what you want
easily.

-- 
Marcelo

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Spurious test failures, testing best practices

2014-12-02 Thread Patrick Wendell
Hey Ryan,

What if you run a single mvn install to install all libraries
locally - then can you mvn compile -pl core? I think this may be the
only way to make it work.

- Patrick

On Tue, Dec 2, 2014 at 2:40 PM, Ryan Williams
ryan.blake.willi...@gmail.com wrote:
 Following on Mark's Maven examples, here is another related issue I'm
 having:

 I'd like to compile just the `core` module after a `mvn clean`, without
 building an assembly JAR first. Is this possible?

 Attempting to do it myself, the steps I performed were:

 - `mvn compile -pl core`: fails because `core` depends on `network/common`
 and `network/shuffle`, neither of which is installed in my local maven
 cache (and which don't exist in central Maven repositories, I guess? I
 thought Spark is publishing snapshot releases?)

 - `network/shuffle` also depends on `network/common`, so I'll `mvn install`
 the latter first: `mvn install -DskipTests -pl network/common`. That
 succeeds, and I see a newly built 1.3.0-SNAPSHOT jar in my local maven
 repository.

 - However, `mvn install -DskipTests -pl network/shuffle` subsequently
 fails, seemingly due to not finding network/core. Here's
 https://gist.github.com/ryan-williams/1711189e7d0af558738d a sample full
 output from running `mvn install -X -U -DskipTests -pl network/shuffle`
 from such a state (the -U was to get around a previous failure based on
 having cached a failed lookup of network-common-1.3.0-SNAPSHOT).

 - Thinking maven might be special-casing -SNAPSHOT versions, I tried
 replacing 1.3.0-SNAPSHOT with 1.3.0.1 globally and repeating these
 steps, but the error seems to be the same
 https://gist.github.com/ryan-williams/37fcdd14dd92fa562dbe.

 Any ideas?

 Thanks,

 -Ryan

 On Sun Nov 30 2014 at 6:37:28 PM Mark Hamstra m...@clearstorydata.com
 wrote:

 
  - Start the SBT interactive console with sbt/sbt
  - Build your assembly by running the assembly target in the assembly
  project: assembly/assembly
  - Run all the tests in one module: core/test
  - Run a specific suite: core/test-only org.apache.spark.rdd.RDDSuite
 (this
  also supports tab completion)


 The equivalent using Maven:

 - Start zinc
 - Build your assembly using the mvn package or install target
 (install is actually the equivalent of SBT's publishLocal) -- this step
 is the first step in
  http://spark.apache.org/docs/latest/building-with-maven.html#spark-tests-in-maven
 - Run all the tests in one module: mvn -pl core test
 - Run a specific suite: mvn -pl core
 -DwildcardSuites=org.apache.spark.rdd.RDDSuite test (the -pl option isn't
 strictly necessary if you don't mind waiting for Maven to scan through all
 the other sub-projects only to do nothing; and, of course, it needs to be
 something other than core if the test you want to run is in another
 sub-project.)

 You also typically want to carry along in each subsequent step any relevant
 command line options you added in the package/install step.

 On Sun, Nov 30, 2014 at 3:06 PM, Matei Zaharia matei.zaha...@gmail.com
 wrote:

  Hi Ryan,
 
  As a tip (and maybe this isn't documented well), I normally use SBT for
  development to avoid the slow build process, and use its interactive
  console to run only specific tests. The nice advantage is that SBT can
 keep
  the Scala compiler loaded and JITed across builds, making it faster to
  iterate. To use it, you can do the following:
 
  - Start the SBT interactive console with sbt/sbt
  - Build your assembly by running the assembly target in the assembly
  project: assembly/assembly
  - Run all the tests in one module: core/test
  - Run a specific suite: core/test-only org.apache.spark.rdd.RDDSuite
 (this
  also supports tab completion)
 
  Running all the tests does take a while, and I usually just rely on
  Jenkins for that once I've run the tests for the things I believed my
 patch
  could break. But this is because some of them are integration tests (e.g.
  DistributedSuite, which creates multi-process mini-clusters). Many of the
  individual suites run fast without requiring this, however, so you can
 pick
  the ones you want. Perhaps we should find a way to tag them so people
 can
  do a quick-test that skips the integration ones.
 
  The assembly builds are annoying but they only take about a minute for me
  on a MacBook Pro with SBT warmed up. The assembly is actually only
 required
  for some of the integration tests (which launch new processes), but I'd
  recommend doing it all the time anyway since it would be very confusing
 to
  run those with an old assembly. The Scala compiler crash issue can also
 be
  a problem, but I don't see it very often with SBT. If it happens, I exit
  SBT and do sbt clean.
 
  Anyway, this is useful feedback and I think we should try to improve some
  of these suites, but hopefully you can also try the faster SBT process.
 At
  the end of the day, if we want integration tests, the whole test process
  will take an hour, but most of the developers I know leave that to
 Jenkins
  

Re: Spurious test failures, testing best practices

2014-12-02 Thread Ryan Williams
Marcelo: by my count, there are 19 maven modules in the codebase. I am
typically only concerned with core (and therefore its two dependencies as
well, `network/{shuffle,common}`).

The `mvn package` workflow (and its sbt equivalent) that most people
apparently use involves (for me) compiling+packaging 16 other modules that
I don't care about; I pay this cost whenever I rebase off of master or
encounter the sbt-compiler-crash bug, among other possible scenarios.

Compiling one module (after building/installing its dependencies) seems
like the sort of thing that should be possible, and I don't see why my
previously-documented attempt is failing.

re: Marcelo's comment about missing the 'spark-parent' project, I saw
that error message too and tried to ascertain what it could mean. Why would
`network/shuffle` need something from the parent project? AFAICT
`network/common` has the same references to the parent project as
`network/shuffle` (namely just a parent block in its POM), and yet I can
`mvn install -pl` the former but not the latter. Why would this be? One
difference is that `network/shuffle` has a dependency on another module,
while `network/common` does not.

Does Maven not let you build modules that depend on *any* other modules
without building *all* modules, or is there a way to do this that we've not
found yet?

Patrick: per my response to Marcelo above, I am trying to avoid having to
compile and package a bunch of stuff I am not using, which both `mvn
package` and `mvn install` on the parent project do.





On Tue Dec 02 2014 at 3:45:48 PM Marcelo Vanzin van...@cloudera.com wrote:

 On Tue, Dec 2, 2014 at 2:40 PM, Ryan Williams
 ryan.blake.willi...@gmail.com wrote:
  Following on Mark's Maven examples, here is another related issue I'm
  having:
 
  I'd like to compile just the `core` module after a `mvn clean`, without
  building an assembly JAR first. Is this possible?

 Out of curiosity, may I ask why? What's the problem with running mvn
 install -DskipTests first (or package instead of install,
 although I generally do the latter)?

 You can probably do what you want if you manually build / install all
 the needed dependencies first; you found two, but it seems you're also
 missing the spark-parent project (which is the top-level pom). That
 sounds like a lot of trouble though, for not any gains that I can
 see... after the first build you should be able to do what you want
 easily.

 --
 Marcelo



Re: Spurious test failures, testing best practices

2014-12-02 Thread Marcelo Vanzin
On Tue, Dec 2, 2014 at 3:39 PM, Ryan Williams
ryan.blake.willi...@gmail.com wrote:
 Marcelo: by my count, there are 19 maven modules in the codebase. I am
 typically only concerned with core (and therefore its two dependencies as
 well, `network/{shuffle,common}`).

But you only need to compile the others once. Once you've established
the baseline, you can just compile / test core to your heart's
desire. Core tests won't even run until you build the assembly anyway,
since some of them require the assembly to be present.

Also, even if you work in core - I'd say especially if you work in
core - you should still, at some point, compile and test everything
else that depends on it.

So, do this ONCE:

  mvn install -DskipTests

Then do this as many times as you want:

  mvn -pl spark-core_2.10 something

That doesn't seem too bad to me. (Be aware of the assembly comment
above, since testing spark-core means you may have to rebuild the
assembly from time to time, if your changes affect those tests.)

 re: Marcelo's comment about missing the 'spark-parent' project, I saw that
 error message too and tried to ascertain what it could mean. Why would
 `network/shuffle` need something from the parent project?

The spark-parent project is the main pom that defines dependencies
and their version, along with lots of build plugins and
configurations. It's needed by all modules to compile correctly.

-- 
Marcelo

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Spurious test failures, testing best practices

2014-12-02 Thread Ryan Williams
On Tue Dec 02 2014 at 4:46:20 PM Marcelo Vanzin van...@cloudera.com wrote:

 On Tue, Dec 2, 2014 at 3:39 PM, Ryan Williams
 ryan.blake.willi...@gmail.com wrote:
  Marcelo: by my count, there are 19 maven modules in the codebase. I am
  typically only concerned with core (and therefore its two dependencies
 as
  well, `network/{shuffle,common}`).

 But you only need to compile the others once.


once... every time I rebase off master, or am obliged to `mvn clean` by
some other build-correctness bug, as I said before. In my experience this
works out to a few times per week.


 Once you've established
 the baseline, you can just compile / test core to your heart's
 desire.


I understand that this is a workflow that does what I want as a side effect
of doing 3-5x more work (depending whether you count [number of modules
built] or [lines of scala/java compiled]), none of the extra work being
useful to me (more on that below).


 Core tests won't even run until you build the assembly anyway,
 since some of them require the assembly to be present.


The tests you refer to are exactly the ones that I'd like to let Jenkins
run from here on out, per advice I was given elsewhere in this thread and
due to the myriad unpleasantries I've encountered in trying to run them
myself.



 Also, even if you work in core - I'd say especially if you work in
 core - you should still, at some point, compile and test everything
 else that depends on it.


Last response applies.



 So, do this ONCE:


again, s/ONCE/several times a week/, in my experience.



   mvn install -DskipTests

 Then do this as many times as you want:

   mvn -pl spark-core_2.10 something

 That doesn't seem too bad to me.

(Be aware of the assembly comment
 above, since testing spark-core means you may have to rebuild the
 assembly from time to time, if your changes affect those tests.)

  re: Marcelo's comment about missing the 'spark-parent' project, I saw
 that
  error message too and tried to ascertain what it could mean. Why would
  `network/shuffle` need something from the parent project?

 The spark-parent project is the main pom that defines dependencies
 and their version, along with lots of build plugins and
 configurations. It's needed by all modules to compile correctly.


- I understand the parent POM has that information.

- I don't understand why Maven would feel that it is unable to compile the
`network/shuffle` module without having first compiled, packaged, and
installed 17 modules (19 minus `network/shuffle` and its dependency
`network/common`) that are not transitive dependencies of `network/shuffle`.

- I am trying to understand whether my failure to get Maven to compile
`network/shuffle` stems from my not knowing the correct incantation to feed
to Maven or from Maven's having a different (and seemingly worse) model for
how it handles module dependencies than I expected.




 --
 Marcelo



Re: Spurious test failures, testing best practices

2014-12-02 Thread Marcelo Vanzin
On Tue, Dec 2, 2014 at 4:40 PM, Ryan Williams
ryan.blake.willi...@gmail.com wrote:
 But you only need to compile the others once.

 once... every time I rebase off master, or am obliged to `mvn clean` by some
 other build-correctness bug, as I said before. In my experience this works
 out to a few times per week.

No, you only need to do it when something upstream from core changed (i.e.,
spark-parent, network/common or network/shuffle) in an incompatible
way. Otherwise, you can rebase and just recompile / retest core,
without having to install everything else. I do this kind of thing all
the time. If you have to do mvn clean often you're probably doing
something wrong somewhere else.

I understand where you're coming from, but the way you're thinking is
just not how Maven works. I too find it annoying that Maven requires lots
of things to be installed before you can use them, when they're all
part of the same project. But well, that's the way things are.

-- 
Marcelo

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [Thrift,1.2 RC] what happened to parquet.hive.serde.ParquetHiveSerDe

2014-12-02 Thread Michael Armbrust
 In Hive 13 (which is the default for Spark 1.2), parquet is included and
thus we no longer include the Hive parquet bundle. You can now use the
included
ParquetSerDe: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe

If you want to compile Spark 1.2 with Hive 12 instead, you can pass
-Phive-0.12.0, and parquet.hive.serde.ParquetHiveSerDe will be included as
before.
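
For example, a rough sketch of a Hive 12 build, adapting the
make-distribution.sh invocation quoted below by adding that profile (all other
flags unchanged):

  ./make-distribution.sh --tgz -Phive -Phive-0.12.0 -Phive-thriftserver \
    -Dhadoop.version=2.0.0-mr1-cdh4.2.0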

Michael

On Tue, Dec 2, 2014 at 9:31 AM, Yana Kadiyska yana.kadiy...@gmail.com
wrote:

 Apologies if people get this more than once -- I sent mail to dev@spark
 last night and don't see it in the archives. Trying the incubator list
 now...wanted to make sure it doesn't get lost in case it's a bug...

 -- Forwarded message --
 From: Yana Kadiyska yana.kadiy...@gmail.com
 Date: Mon, Dec 1, 2014 at 8:10 PM
 Subject: [Thrift,1.2 RC] what happened to
 parquet.hive.serde.ParquetHiveSerDe
 To: dev@spark.apache.org


 Hi all, apologies if this is not a question for the dev list -- figured
 User list might not be appropriate since I'm having trouble with the RC
 tag.

 I just tried deploying the RC and running ThriftServer. I see the following
 error:

 14/12/01 21:31:42 ERROR UserGroupInformation: PriviledgedActionException
 as:anonymous (auth:SIMPLE)
 cause:org.apache.hive.service.cli.HiveSQLException:
 java.lang.RuntimeException:
 MetaException(message:java.lang.ClassNotFoundException Class
 parquet.hive.serde.ParquetHiveSerDe not found)
 14/12/01 21:31:42 WARN ThriftCLIService: Error executing statement:
 org.apache.hive.service.cli.HiveSQLException: java.lang.RuntimeException:
 MetaException(message:java.lang.ClassNotFoundException Class
 parquet.hive.serde.ParquetHiveSerDe not found)
 at

 org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.run(Shim13.scala:192)
 at

 org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:231)
 at

 org.apache.hive.service.cli.session.HiveSessionImpl.executeStatement(HiveSessionImpl.java:212)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at

 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at

 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at

 org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:79)
 at

 org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:37)
 at

 org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:64)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 ​


 I looked at a working installation that I have (built from master a few weeks
 ago), and this class used to be included in spark-assembly:

 ls *.jar|xargs grep parquet.hive.serde.ParquetHiveSerDe
 Binary file spark-assembly-1.2.0-SNAPSHOT-hadoop2.0.0-mr1-cdh4.2.0.jar
 matches

 but with the RC build it's not there?

 I tried both the prebuilt CDH drop and later manually built the tag with
 the following command:

  ./make-distribution.sh --tgz -Phive -Dhadoop.version=2.0.0-mr1-cdh4.2.0
 -Phive-thriftserver
 $JAVA_HOME/bin/jar -tvf spark-assembly-1.2.0-hadoop2.0.0-mr1-cdh4.2.0.jar
 |grep parquet.hive.serde.ParquetHiveSerDe

 comes back empty...



object xxx is not a member of package com

2014-12-02 Thread flyson
Hello everyone,

Could anybody tell me how to import and call third-party Java classes from
inside Spark?
Here's my case:
I have a jar file (the package layout is com.xxx.yyy.zzz) which contains
some Java classes, and I need to call some of them in Spark code.
I used the statement import com.xxx.yyy.zzz._ at the top of the affected Spark
file, set the location of the jar file in the CLASSPATH environment variable, and
used sbt/sbt assembly to build the project. As a result, I got an error
saying object xxx is not a member of package com.

I thought that could be related to the library dependencies, but couldn't
figure it out. Any suggestion/solution from you would be appreciated!

By the way, in the Scala console, if :cp is used to point to the jar
file, I can import the classes from it.

Thanks! 



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/object-xxx-is-not-a-member-of-package-com-tp9619.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



RE: object xxx is not a member of package com

2014-12-02 Thread Wang, Daoyuan
I think you can place the jar in lib/ in SPARK_HOME and then compile without
any change to your classpath. This could be a temporary way to include your
jar. You can also declare it in your pom.xml.
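
For the pom.xml route, a rough sketch of the kind of entry to add (the
coordinates below are placeholders for your jar; if the jar isn't published in
any repository, it can first be installed locally with Maven's
install:install-file goal):

  <dependency>
    <groupId>com.xxx</groupId>
    <artifactId>your-library</artifactId>
    <version>1.0</version>
  </dependency>

  mvn install:install-file -Dfile=/path/to/your-library.jar \
    -DgroupId=com.xxx -DartifactId=your-library -Dversion=1.0 -Dpackaging=jar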

Thanks,
Daoyuan

-Original Message-
From: flyson [mailto:m_...@msn.com] 
Sent: Wednesday, December 03, 2014 11:23 AM
To: d...@spark.incubator.apache.org
Subject: object xxx is not a member of package com

Hello everyone,

Could anybody tell me how to import and call third-party Java classes from 
inside Spark?
Here's my case:
I have a jar file (the package layout is com.xxx.yyy.zzz) which contains some 
Java classes, and I need to call some of them in Spark code.
I used the statement import com.xxx.yyy.zzz._ at the top of the affected Spark 
file, set the location of the jar file in the CLASSPATH environment variable, and 
used sbt/sbt assembly to build the project. As a result, I got an error saying 
object xxx is not a member of package com.

I thought that could be related to the library dependencies, but couldn't 
figure it out. Any suggestion/solution from you would be appreciated!

By the way, in the Scala console, if :cp is used to point to the jar file, I 
can import the classes from it.

Thanks! 



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/object-xxx-is-not-a-member-of-package-com-tp9619.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional 
commands, e-mail: dev-h...@spark.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org