Re: [VOTE] Release Apache Spark 1.2.0 (RC1)
+1 (non-binding)
Verified on OSX 10.10.2, built from source, spark-shell / spark-submit jobs:
- ran various simple Spark / Scala queries
- ran various SparkSQL queries (including HiveContext)
- ran ThriftServer service and connected via beeline
- ran SparkSVD

On Mon Dec 01 2014 at 11:09:26 PM Patrick Wendell pwend...@gmail.com wrote:
Hey All, Just an update. Josh, Andrew, and others are working to reproduce SPARK-4498 and fix it. Other than that issue, no serious regressions have been reported so far. If we are able to get a fix in for that soon, we'll likely cut another RC with the patch. Continued testing of RC1 is definitely appreciated! I'll leave this vote open to allow folks to continue posting comments. It's fine to still give +1 from your own testing... i.e. you can assume at this point SPARK-4498 will be fixed before releasing.
- Patrick

On Mon, Dec 1, 2014 at 3:30 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
+0.9 from me. Tested it on Mac and Windows (someone has to do it) and while things work, I noticed a few recent scripts don't have Windows equivalents, namely https://issues.apache.org/jira/browse/SPARK-4683 and https://issues.apache.org/jira/browse/SPARK-4684. The first one at least would be good to fix if we do another RC. Not blocking the release, but useful to fix in the docs, is https://issues.apache.org/jira/browse/SPARK-4685.
Matei

On Dec 1, 2014, at 11:18 AM, Josh Rosen rosenvi...@gmail.com wrote:
Hi everyone, There's an open bug report related to Spark standalone which could be a potential release-blocker (pending investigation / a bug fix): https://issues.apache.org/jira/browse/SPARK-4498. This issue seems non-deterministic and only affects long-running Spark standalone deployments, so it may be hard to reproduce. I'm going to work on a patch to add additional logging in order to help with debugging. I just wanted to give an early heads-up about this issue and to get more eyes on it in case anyone else has run into it or wants to help with debugging.
- Josh

On November 28, 2014 at 9:18:09 PM, Patrick Wendell (pwend...@gmail.com) wrote:
Please vote on releasing the following candidate as Apache Spark version 1.2.0!
The tag to be voted on is v1.2.0-rc1 (commit 1056e9ec1): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=1056e9ec13203d0c51564265e94d77a054498fdb
The release files, including signatures, digests, etc. can be found at: http://people.apache.org/~pwendell/spark-1.2.0-rc1/
Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc
The staging repository for this release can be found at: https://repository.apache.org/content/repositories/orgapachespark-1048/
The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-1.2.0-rc1-docs/
Please vote on releasing this package as Apache Spark 1.2.0! The vote is open until Tuesday, December 02, at 05:15 UTC and passes if a majority of at least 3 +1 PMC votes are cast.
[ ] +1 Release this package as Apache Spark 1.2.0
[ ] -1 Do not release this package because ...
To learn more about Apache Spark, please see http://spark.apache.org/
== What justifies a -1 vote for this release? ==
This vote is happening very late into the QA period compared with previous votes, so -1 votes should only occur for significant regressions from 1.0.2. Bugs already present in 1.1.X, minor regressions, or bugs related to new features will not block this release.
== What default changes should I be aware of? ==
1. The default value of spark.shuffle.blockTransferService has been changed to netty -- old behavior can be restored by switching to nio.
2. The default value of spark.shuffle.manager has been changed to sort -- old behavior can be restored by setting spark.shuffle.manager to hash.
== Other notes ==
Because this vote is occurring over a weekend, I will likely extend the vote if this RC survives until the end of the vote period.
- Patrick
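For readers tracking the two default changes above: both can be reverted without code changes. A minimal sketch, assuming the standard conf/spark-defaults.conf mechanism:

  # conf/spark-defaults.conf -- restore the pre-1.2 shuffle behavior
  spark.shuffle.blockTransferService   nio
  spark.shuffle.manager                hash

The same pair can also be set per-job, e.g. spark-submit --conf spark.shuffle.blockTransferService=nio --conf spark.shuffle.manager=hash.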
Re: Can the Scala classes in the spark source code, be inherited in Java classes?
Thanks. And @Reynold, sorry, my bad. Guess I should have used something like Stack Overflow!

On Tue, Dec 2, 2014 at 12:18 PM, Reynold Xin r...@databricks.com wrote:
Oops, my previous response wasn't sent properly to the dev list. Here you go, for archiving. Yes you can. Scala classes are compiled down to classes in bytecode. Take a look at this: https://twitter.github.io/scala_school/java.html
Note that questions like this are not exactly what this dev list is meant for...

On Mon, Dec 1, 2014 at 9:22 PM, Niranda Perera nira...@wso2.com wrote:
Hi, Can the Scala classes in the Spark source code be inherited (and other OOP concepts applied) in Java classes? I want to customize some part of the code, but I would like to do it in a Java environment.
Rgds
-- Niranda Perera, Software Engineer, WSO2 Inc. Mobile: +94-71-554-8430 Twitter: @n1r44 https://twitter.com/N1R44
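A minimal sketch of the interop Reynold describes; the Greeter class here is hypothetical (not from the Spark codebase). A Scala class compiles to an ordinary JVM class, so plain Java inheritance applies:

  // Scala side (hypothetical), Greeter.scala:
  //   class Greeter {
  //     def greet(name: String): String = "Hello, " + name
  //   }

  // Java side: extend and override it like any other Java class.
  public class LoudGreeter extends Greeter {
      @Override
      public String greet(String name) {
          // call the Scala superclass implementation, then shout
          return super.greet(name).toUpperCase() + "!";
      }
  }

The caveat is that Scala-only features (traits with method bodies, default arguments, implicits, companion objects) compile to less obvious bytecode, so the scala_school page linked above is worth reading before subclassing anything non-trivial.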
Re: Required file not found in building
Thanks Sean, I followed suit (brew install zinc) and that is working.

2014-12-01 22:39 GMT-08:00 Sean Owen so...@cloudera.com:
I'm having no problems with the build or zinc on my Mac. I use zinc from brew install zinc.

On Tue, Dec 2, 2014 at 3:02 AM, Stephen Boesch java...@gmail.com wrote:
Mac as well. Just found the problem: I had created an alias to zinc a couple of months back. Apparently that is not happy with the build anymore. No problem now that the issue has been isolated - just need to fix my zinc alias.

2014-12-01 18:55 GMT-08:00 Ted Yu yuzhih...@gmail.com:
I tried the same command on a MacBook and didn't experience the same error. Which OS are you using? Cheers

On Mon, Dec 1, 2014 at 6:42 PM, Stephen Boesch java...@gmail.com wrote:
It seems there are some additional settings required to build Spark now. This should be a snap for most of you out there to spot what I am missing. Here is the command line I have traditionally used:

  mvn -Pyarn -Phadoop-2.3 -Phive install compile package -DskipTests

That command line is however failing with the latest from HEAD:

  [INFO] --- scala-maven-plugin:3.2.0:compile (scala-compile-first) @ spark-network-common_2.10 ---
  [INFO] Using zinc server for incremental compilation
  [INFO] compiler plugin: BasicArtifact(org.scalamacros,paradise_2.10.4,2.0.1,null)
  [error] Required file not found: scala-compiler-2.10.4.jar
  [error] See zinc -help for information about locating necessary files
  [INFO] Reactor Summary:
  [INFO] Spark Project Parent POM .. SUCCESS [4.077s]
  [INFO] Spark Project Networking .. FAILURE [0.445s]

OK, let's try zinc -help:

  18:38:00/spark2 $ zinc -help
  Nailgun server running with 1 cached compiler
  Version = 0.3.5.1
  Zinc compiler cache limit = 5
  Resident scalac cache limit = 0
  Analysis cache limit = 5
  Compiler(Scala 2.10.4) [74ff364f]
  Setup = {
    scala compiler = /Users/steve/.m2/repository/org/scala-lang/scala-compiler/2.10.4/scala-compiler-2.10.4.jar
    scala library = /Users/steve/.m2/repository/org/scala-lang/scala-library/2.10.4/scala-library-2.10.4.jar
    scala extra = {
      /Users/steve/.m2/repository/org/scala-lang/scala-reflect/2.10.4/scala-reflect-2.10.4.jar
      /shared/zinc-0.3.5.1/lib/scala-reflect.jar
    }
    sbt interface = /shared/zinc-0.3.5.1/lib/sbt-interface.jar
    compiler interface sources = /shared/zinc-0.3.5.1/lib/compiler-interface-sources.jar
    java home =
    fork java = false
    cache directory = /Users/steve/.zinc/0.3.5.1
  }

Does that compiler jar exist? Yes!

  18:39:34/spark2 $ ll /Users/steve/.m2/repository/org/scala-lang/scala-compiler/2.10.4/scala-compiler-2.10.4.jar
  -rw-r--r-- 1 steve staff 14445780 Apr 9 2014 /Users/steve/.m2/repository/org/scala-lang/scala-compiler/2.10.4/scala-compiler-2.10.4.jar
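For anyone who hits the same "Required file not found" error without a stray alias to blame: a stale zinc nailgun server started with the wrong paths is a common cause. A hedged sketch, assuming zinc 0.3.x as installed by Homebrew:

  zinc -shutdown    # stop the running nailgun server
  zinc -start       # start a fresh one
  mvn -Pyarn -Phadoop-2.3 -Phive -DskipTests clean package

The compiler paths printed by zinc -help (as shown above) are a quick way to confirm which jars the server is actually using.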
Fwd: [Thrift,1.2 RC] what happened to parquet.hive.serde.ParquetHiveSerDe
Apologies if people get this more than once -- I sent mail to dev@spark last night and don't see it in the archives. Trying the incubator list now... wanted to make sure it doesn't get lost in case it's a bug...

---------- Forwarded message ----------
From: Yana Kadiyska yana.kadiy...@gmail.com
Date: Mon, Dec 1, 2014 at 8:10 PM
Subject: [Thrift,1.2 RC] what happened to parquet.hive.serde.ParquetHiveSerDe
To: dev@spark.apache.org

Hi all, apologies if this is not a question for the dev list -- figured the user list might not be appropriate since I'm having trouble with the RC tag. I just tried deploying the RC and running ThriftServer. I see the following error:

  14/12/01 21:31:42 ERROR UserGroupInformation: PriviledgedActionException as:anonymous (auth:SIMPLE) cause:org.apache.hive.service.cli.HiveSQLException: java.lang.RuntimeException: MetaException(message:java.lang.ClassNotFoundException Class parquet.hive.serde.ParquetHiveSerDe not found)
  14/12/01 21:31:42 WARN ThriftCLIService: Error executing statement:
  org.apache.hive.service.cli.HiveSQLException: java.lang.RuntimeException: MetaException(message:java.lang.ClassNotFoundException Class parquet.hive.serde.ParquetHiveSerDe not found)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.run(Shim13.scala:192)
    at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:231)
    at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatement(HiveSessionImpl.java:212)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:79)
    at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:37)
    at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:64)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)

I looked at a working installation that I have (built from master a few weeks ago) and this class used to be included in spark-assembly:

  ls *.jar | xargs grep parquet.hive.serde.ParquetHiveSerDe
  Binary file spark-assembly-1.2.0-SNAPSHOT-hadoop2.0.0-mr1-cdh4.2.0.jar matches

but with the RC build it's not there? I tried both the prebuilt CDH drop and later manually built the tag with the following command:

  ./make-distribution.sh --tgz -Phive -Dhadoop.version=2.0.0-mr1-cdh4.2.0 -Phive-thriftserver

  $JAVA_HOME/bin/jar -tvf spark-assembly-1.2.0-hadoop2.0.0-mr1-cdh4.2.0.jar | grep parquet.hive.serde.ParquetHiveSerDe

comes back empty...
Re: [VOTE] Release Apache Spark 1.2.0 (RC1)
+1 (non-binding)
Installed version pre-built for Hadoop on a private HPC:
- ran PySpark shell w/ iPython
- loaded data using custom Hadoop input formats
- ran MLlib routines in PySpark
- ran custom workflows in PySpark
- browsed the web UI

Noticeable improvements in stability and performance during large shuffles (as well as the elimination of frequent but unpredictable “FileNotFound / too many open files” errors). We initially hit errors during large collects that ran fine in 1.1, but setting the new spark.driver.maxResultSize to 0 preserved the old behavior. Definitely worth highlighting this setting in the release notes, as the new default may be too small for some users and workloads.
— Jeremy
jeremyfreeman.net
@thefreemanlab

On Dec 2, 2014, at 3:22 AM, Denny Lee denny.g@gmail.com wrote:
> ...
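For anyone hitting the same large-collect failures Jeremy describes: spark.driver.maxResultSize caps the total size of serialized results fetched back to the driver, and in 1.2 it defaults to 1g. A one-line sketch using spark-defaults.conf:

  # 0 disables the cap, restoring the 1.1 behavior (at the cost of no driver-side safety limit)
  spark.driver.maxResultSize   0

It can also be set per-job with spark-submit --conf spark.driver.maxResultSize=0.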
keeping PR titles / descriptions up to date
Hi all, I've noticed a bunch of times lately where a pull request changes to be pretty different from the original pull request, and the title / description never get updated. Because the pull request title and description are used as the commit message, the incorrect description lives on forever, making it harder to understand the reason behind a particular commit without going back and reading the entire conversation on the pull request. If folks could try to keep these up to date (and committers, try to remember to verify that the title and description are correct before merging pull requests), that would be awesome.
-Kay
Re: keeping PR titles / descriptions up to date
I second that! Would also be great if the JIRA was updated accordingly too.
Regards, Mridul

On Wed, Dec 3, 2014 at 1:53 AM, Kay Ousterhout kayousterh...@gmail.com wrote:
> ...
Re: [VOTE] Release Apache Spark 1.2.0 (RC1)
+1. I also tested on Windows just in case, with jars referring to other jars and Python files referring to other Python files. Path resolution still works.

2014-12-02 10:16 GMT-08:00 Jeremy Freeman freeman.jer...@gmail.com:
> ...
Re: keeping PR titles / descriptions up to date
Also a note on this for committers - it's possible to re-word the title during merging, by just running git commit -a --amend before you push the PR.
- Patrick

On Tue, Dec 2, 2014 at 12:50 PM, Mridul Muralidharan mri...@gmail.com wrote:
> ...
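A sketch of that committer flow; the remote name 'apache' below is a placeholder for whatever the committer's push remote is called:

  # after merging the PR into local master, before pushing:
  git commit -a --amend   # opens the editor; fix the commit title / description
  git log -1              # sanity-check the reworded message
  git push apache master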
Announcing Spark 1.1.1!
I am happy to announce the availability of Spark 1.1.1! This is a maintenance release with many bug fixes, most of which are concentrated in the core. The list includes various fixes to sort-based shuffle, memory leaks, and spilling issues. Contributions to this release came from 55 developers.

Visit the release notes [1] to read about the new features, or download [2] the release today.
[1] http://spark.apache.org/releases/spark-release-1-1-1.html
[2] http://spark.apache.org/downloads.html

Please e-mail me directly about any typos in the release notes or name listing. Thanks to everyone who contributed, and congratulations!
-Andrew
Re: [VOTE] Release Apache Spark 1.2.0 (RC1)
+1 tested on yarn.
Tom

On Friday, November 28, 2014 11:18 PM, Patrick Wendell pwend...@gmail.com wrote:
> ...
Re: Spurious test failures, testing best practices
Following on Mark's Maven examples, here is another related issue I'm having: I'd like to compile just the `core` module after a `mvn clean`, without building an assembly JAR first. Is this possible? Attempting to do it myself, the steps I performed were:

- `mvn compile -pl core`: fails because `core` depends on `network/common` and `network/shuffle`, neither of which is installed in my local maven cache (and which don't exist in central Maven repositories, I guess? I thought Spark is publishing snapshot releases?)
- `network/shuffle` also depends on `network/common`, so I'll `mvn install` the latter first: `mvn install -DskipTests -pl network/common`. That succeeds, and I see a newly built 1.3.0-SNAPSHOT jar in my local maven repository.
- However, `mvn install -DskipTests -pl network/shuffle` subsequently fails, seemingly due to not finding network/core. Here's a sample full output (https://gist.github.com/ryan-williams/1711189e7d0af558738d) from running `mvn install -X -U -DskipTests -pl network/shuffle` from such a state (the -U was to get around a previous failure based on having cached a failed lookup of network-common-1.3.0-SNAPSHOT).
- Thinking maven might be special-casing -SNAPSHOT versions, I tried replacing 1.3.0-SNAPSHOT with 1.3.0.1 globally and repeating these steps, but the error seems to be the same (https://gist.github.com/ryan-williams/37fcdd14dd92fa562dbe).

Any ideas? Thanks, -Ryan

On Sun Nov 30 2014 at 6:37:28 PM Mark Hamstra m...@clearstorydata.com wrote:
> - Start the SBT interactive console with sbt/sbt
> - Build your assembly by running the assembly target in the assembly project: assembly/assembly
> - Run all the tests in one module: core/test
> - Run a specific suite: core/test-only org.apache.spark.rdd.RDDSuite (this also supports tab completion)

The equivalent using Maven:
- Start zinc
- Build your assembly using the mvn package or install target (install is actually the equivalent of SBT's publishLocal) -- this step is the first step in http://spark.apache.org/docs/latest/building-with-maven.html#spark-tests-in-maven
- Run all the tests in one module: mvn -pl core test
- Run a specific suite: mvn -pl core -DwildcardSuites=org.apache.spark.rdd.RDDSuite test (the -pl option isn't strictly necessary if you don't mind waiting for Maven to scan through all the other sub-projects only to do nothing; and, of course, it needs to be something other than core if the test you want to run is in another sub-project.)

You also typically want to carry along in each subsequent step any relevant command line options you added in the package/install step.

On Sun, Nov 30, 2014 at 3:06 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
Hi Ryan, As a tip (and maybe this isn't documented well), I normally use SBT for development to avoid the slow build process, and use its interactive console to run only specific tests. The nice advantage is that SBT can keep the Scala compiler loaded and JITed across builds, making it faster to iterate. To use it, you can do the following:
- Start the SBT interactive console with sbt/sbt
- Build your assembly by running the assembly target in the assembly project: assembly/assembly
- Run all the tests in one module: core/test
- Run a specific suite: core/test-only org.apache.spark.rdd.RDDSuite (this also supports tab completion)

Running all the tests does take a while, and I usually just rely on Jenkins for that once I've run the tests for the things I believed my patch could break. But this is because some of them are integration tests (e.g. DistributedSuite, which creates multi-process mini-clusters). Many of the individual suites run fast without requiring this, however, so you can pick the ones you want. Perhaps we should find a way to tag them so people can do a quick-test that skips the integration ones.

The assembly builds are annoying but they only take about a minute for me on a MacBook Pro with SBT warmed up. The assembly is actually only required for some of the integration tests (which launch new processes), but I'd recommend doing it all the time anyway since it would be very confusing to run those with an old assembly. The Scala compiler crash issue can also be a problem, but I don't see it very often with SBT. If it happens, I exit SBT and do sbt clean.

Anyway, this is useful feedback and I think we should try to improve some of these suites, but hopefully you can also try the faster SBT process. At the end of the day, if we want integration tests, the whole test process will take an hour, but most of the developers I know leave that to Jenkins and only run individual tests locally before submitting a patch.
Matei

On Nov 30, 2014, at 2:39 PM, Ryan Williams ryan.blake.willi...@gmail.com wrote:
In the course of trying to make contributions to Spark, I have had a lot of trouble running Spark's tests
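Matei's SBT workflow, condensed into one interactive session (commands exactly as given in the thread):

  $ sbt/sbt
  > assembly/assembly
  > core/test
  > core/test-only org.apache.spark.rdd.RDDSuite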
Re: Spurious test failures, testing best practices
On Tue, Dec 2, 2014 at 2:40 PM, Ryan Williams ryan.blake.willi...@gmail.com wrote:
> Following on Mark's Maven examples, here is another related issue I'm having: I'd like to compile just the `core` module after a `mvn clean`, without building an assembly JAR first. Is this possible?

Out of curiosity, may I ask why? What's the problem with running mvn install -DskipTests first (or package instead of install, although I generally do the latter)? You can probably do what you want if you manually build / install all the needed dependencies first; you found two, but it seems you're also missing the spark-parent project (which is the top-level pom). That sounds like a lot of trouble though, for no gains that I can see... after the first build you should be able to do what you want easily.
-- Marcelo
Re: Spurious test failures, testing best practices
Hey Ryan, What if you run a single mvn install to install all libraries locally - then can you mvn compile -pl core? I think this may be the only way to make it work.
- Patrick

On Tue, Dec 2, 2014 at 2:40 PM, Ryan Williams ryan.blake.willi...@gmail.com wrote:
> ...
Re: Spurious test failures, testing best practices
Marcelo: by my count, there are 19 maven modules in the codebase. I am typically only concerned with core (and therefore its two dependencies as well, `network/{shuffle,common}`). The `mvn package` workflow (and its sbt equivalent) that most people apparently use involves (for me) compiling+packaging 16 other modules that I don't care about; I pay this cost whenever I rebase off of master or encounter the sbt-compiler-crash bug, among other possible scenarios. Compiling one module (after building/installing its dependencies) seems like the sort of thing that should be possible, and I don't see why my previously-documented attempt is failing.

re: Marcelo's comment about missing the 'spark-parent' project, I saw that error message too and tried to ascertain what it could mean. Why would `network/shuffle` need something from the parent project? AFAICT `network/common` has the same references to the parent project as `network/shuffle` (namely just a parent block in its POM), and yet I can `mvn install -pl` the former but not the latter. Why would this be? One difference is that `network/shuffle` has a dependency on another module, while `network/common` does not. Does Maven not let you build modules that depend on *any* other modules without building *all* modules, or is there a way to do this that we've not found yet?

Patrick: per my response to Marcelo above, I am trying to avoid having to compile and package a bunch of stuff I am not using, which both `mvn package` and `mvn install` on the parent project do.

On Tue Dec 02 2014 at 3:45:48 PM Marcelo Vanzin van...@cloudera.com wrote:
> ...
Re: Spurious test failures, testing best practices
On Tue, Dec 2, 2014 at 3:39 PM, Ryan Williams ryan.blake.willi...@gmail.com wrote:
> Marcelo: by my count, there are 19 maven modules in the codebase. I am typically only concerned with core (and therefore its two dependencies as well, `network/{shuffle,common}`).

But you only need to compile the others once. Once you've established the baseline, you can just compile / test core to your heart's desire. Core tests won't even run until you build the assembly anyway, since some of them require the assembly to be present. Also, even if you work in core - I'd say especially if you work in core - you should still, at some point, compile and test everything else that depends on it. So, do this ONCE:

  mvn install -DskipTests

Then do this as many times as you want:

  mvn -pl spark-core_2.10 something

That doesn't seem too bad to me. (Be aware of the assembly comment above, since testing spark-core means you may have to rebuild the assembly from time to time, if your changes affect those tests.)

> re: Marcelo's comment about missing the 'spark-parent' project, I saw that error message too and tried to ascertain what it could mean. Why would `network/shuffle` need something from the parent project?

The spark-parent project is the main pom that defines dependencies and their versions, along with lots of build plugins and configurations. It's needed by all modules to compile correctly.
-- Marcelo
Re: Spurious test failures, testing best practices
On Tue Dec 02 2014 at 4:46:20 PM Marcelo Vanzin van...@cloudera.com wrote:
> But you only need to compile the others once.

once... every time I rebase off master, or am obliged to `mvn clean` by some other build-correctness bug, as I said before. In my experience this works out to a few times per week.

> Once you've established the baseline, you can just compile / test core to your heart's desire.

I understand that this is a workflow that does what I want as a side effect of doing 3-5x more work (depending whether you count [number of modules built] or [lines of scala/java compiled]), none of the extra work being useful to me (more on that below).

> Core tests won't even run until you build the assembly anyway, since some of them require the assembly to be present.

The tests you refer to are exactly the ones that I'd like to let Jenkins run from here on out, per advice I was given elsewhere in this thread and due to the myriad unpleasantries I've encountered in trying to run them myself.

> Also, even if you work in core - I'd say especially if you work in core - you should still, at some point, compile and test everything else that depends on it.

Last response applies.

> So, do this ONCE: mvn install -DskipTests

again, s/ONCE/several times a week/, in my experience.

> The spark-parent project is the main pom that defines dependencies and their versions, along with lots of build plugins and configurations. It's needed by all modules to compile correctly.

- I understand the parent POM has that information.
- I don't understand why Maven would feel that it is unable to compile the `network/shuffle` module without having first compiled, packaged, and installed 17 modules (19 minus `network/shuffle` and its dependency `network/common`) that are not transitive dependencies of `network/shuffle`.
- I am trying to understand whether my failure to get Maven to compile `network/shuffle` stems from my not knowing the correct incantation to feed to Maven, or from Maven's having a different (and seemingly worse) model for how it handles module dependencies than I expected.
Re: Spurious test failures, testing best practices
On Tue, Dec 2, 2014 at 4:40 PM, Ryan Williams ryan.blake.willi...@gmail.com wrote:
> once... every time I rebase off master, or am obliged to `mvn clean` by some other build-correctness bug, as I said before. In my experience this works out to a few times per week.

No, you only need to do it when something upstream from core has changed (i.e., spark-parent, network/common or network/shuffle) in an incompatible way. Otherwise, you can rebase and just recompile / retest core, without having to install everything else. I do this kind of thing all the time. If you have to do mvn clean often you're probably doing something wrong somewhere else.

I understand where you're coming from, but the way you're thinking is just not how maven works. I too find it annoying that maven requires lots of things to be installed before you can use them, when they're all part of the same project. But well, that's the way things are.
-- Marcelo
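One Maven feature that goes unmentioned in this thread, and that addresses Ryan's exact complaint, is the reactor's --also-make flag (-am): it builds the requested module plus only its in-project upstream dependencies, in a single session, with nothing installed first. A sketch (not verified against this particular tree):

  # build core plus only what it needs: spark-parent, network/common, network/shuffle
  mvn -pl core -am -DskipTests package

  # likewise for network/shuffle alone
  mvn -pl network/shuffle -am -DskipTests package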
Re: [Thrift,1.2 RC] what happened to parquet.hive.serde.ParquetHiveSerDe
In Hive 13 (which is the default for Spark 1.2), parquet is included and thus we no longer include the Hive parquet bundle. You can now use the included ParquetSerDe: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
If you want to compile Spark 1.2 with Hive 12 instead, you can pass -Phive-0.12.0 and parquet.hive.serde.ParquetHiveSerDe will be included as before.
Michael

On Tue, Dec 2, 2014 at 9:31 AM, Yana Kadiyska yana.kadiy...@gmail.com wrote:
> ...
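Concretely, an existing Hive table that names the old serde can be re-pointed at the builtin one; a sketch via beeline, with a hypothetical table name:

  ALTER TABLE my_parquet_table SET SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe';
  -- new tables on Hive 0.13 can simply use:
  CREATE TABLE my_new_table (id INT, msg STRING) STORED AS PARQUET;

Alternatively, to keep the old class on the classpath, rebuild against Hive 12 by combining Yana's command with the profile Michael mentions:

  ./make-distribution.sh --tgz -Phive -Phive-0.12.0 -Phive-thriftserver -Dhadoop.version=2.0.0-mr1-cdh4.2.0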
object xxx is not a member of package com
Hello everyone, Could anybody tell me how to import and call 3rd-party Java classes from inside Spark? Here's my case: I have a jar file (the package layout is com.xxx.yyy.zzz) which contains some Java classes, and I need to call some of them in Spark code. I used the statement import com.xxx.yyy.zzz._ on top of the affected Spark file, set the location of the jar file in the CLASSPATH environment variable, and used sbt/sbt assembly to build the project. As a result, I got an error saying object xxx is not a member of package com. I thought that could be related to the library dependencies, but couldn't figure it out. Any suggestion/solution from you would be appreciated! By the way, in the Scala console, if :cp is used to point to the jar file, I can import the classes from the jar file. Thanks!
RE: object xxx is not a member of package com
I think you can place the jar in lib/ under SPARK_HOME and then compile without any change to your classpath. This could be a temporary way to include your jar. You can also add it to your pom.xml.
Thanks, Daoyuan

-----Original Message-----
From: flyson [mailto:m_...@msn.com]
Sent: Wednesday, December 03, 2014 11:23 AM
To: d...@spark.incubator.apache.org
Subject: object xxx is not a member of package com
> ...
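A more durable route than lib/ is to install the jar into the local Maven repository and declare it as a normal dependency. A sketch; the coordinates below are hypothetical placeholders for the real com.xxx.yyy.zzz jar:

  mvn install:install-file -Dfile=/path/to/your.jar \
      -DgroupId=com.xxx -DartifactId=yyy-zzz -Dversion=1.0 -Dpackaging=jar

  <!-- then, in pom.xml -->
  <dependency>
    <groupId>com.xxx</groupId>
    <artifactId>yyy-zzz</artifactId>
    <version>1.0</version>
  </dependency>

For the sbt assembly build described in the question, jars dropped into the project's lib/ directory are picked up automatically as unmanaged dependencies.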