Re: Compile failure with SBT on master
I used the same command on Linux and it passed: Linux k.net 2.6.32-220.23.1.el6.YAHOO.20120713.x86_64 #1 SMP Fri Jul 13 11:40:51 CDT 2012 x86_64 x86_64 x86_64 GNU/Linux Cheers On Mon, Jun 16, 2014 at 9:29 PM, Andrew Ash and...@andrewash.com wrote: I can't run sbt/sbt gen-idea on a clean checkout of Spark master. I get resolution errors on junit#junit;4.10!junit.zip(source) As shown below: aash@aash-mbp /tmp/git/spark$ sbt/sbt gen-idea Using /Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home as default JAVA_HOME. Note, this will be overridden by -java-home if it is set. [info] Loading project definition from /private/tmp/git/spark/project/project [info] Loading project definition from /private/tmp/git/spark/project [info] Set current project to root (in build file:/private/tmp/git/spark/) [info] Creating IDEA module for project 'assembly' ... [info] Updating {file:/private/tmp/git/spark/}core... [info] Resolving org.fusesource.jansi#jansi;1.4 ... [warn] [FAILED ] junit#junit;4.10!junit.zip(source): (0ms) [warn] local: tried [warn] /Users/aash/.ivy2/local/junit/junit/4.10/sources/junit.zip [warn] public: tried [warn] http://repo1.maven.org/maven2/junit/junit/4.10/junit-4.10.zip [warn] Maven Repository: tried [warn] http://repo.maven.apache.org/maven2/junit/junit/4.10/junit-4.10.zip [warn] Apache Repository: tried [warn] https://repository.apache.org/content/repositories/releases/junit/junit/4.10/junit-4.10.zip [warn] JBoss Repository: tried [warn] https://repository.jboss.org/nexus/content/repositories/releases/junit/junit/4.10/junit-4.10.zip [warn] MQTT Repository: tried [warn] https://repo.eclipse.org/content/repositories/paho-releases/junit/junit/4.10/junit-4.10.zip [warn] Cloudera Repository: tried [warn] http://repository.cloudera.com/artifactory/cloudera-repos/junit/junit/4.10/junit-4.10.zip [warn] Pivotal Repository: tried [warn] http://repo.spring.io/libs-release/junit/junit/4.10/junit-4.10.zip [warn] Maven2 Local: tried [warn] file:/Users/aash/.m2/repository/junit/junit/4.10/junit-4.10.zip [warn] :: [warn] :: FAILED DOWNLOADS:: [warn] :: ^ see resolution messages for details ^ :: [warn] :: [warn] :: junit#junit;4.10!junit.zip(source) [warn] :: sbt.ResolveException: download failed: junit#junit;4.10!junit.zip(source) By bumping the junit dependency to 4.11 I'm able to generate the IDE files. Are other people having this problem or does everyone use the maven configuration? Andrew
Re: Compile failure with SBT on master
I didn't get that error on Mac either: java version 1.7.0_55 Java(TM) SE Runtime Environment (build 1.7.0_55-b13) Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode) Darwin TYus-MacBook-Pro.local 12.5.0 Darwin Kernel Version 12.5.0: Sun Sep 29 13:33:47 PDT 2013; root:xnu-2050.48.12~1/RELEASE_X86_64 x86_64 On Mon, Jun 16, 2014 at 10:04 PM, Andrew Ash and...@andrewash.com wrote: Maybe it's a Mac OS X thing? On Mon, Jun 16, 2014 at 9:57 PM, Ted Yu yuzhih...@gmail.com wrote: I used the same command on Linux and it passed: Linux k.net 2.6.32-220.23.1.el6.YAHOO.20120713.x86_64 #1 SMP Fri Jul 13 11:40:51 CDT 2012 x86_64 x86_64 x86_64 GNU/Linux Cheers On Mon, Jun 16, 2014 at 9:29 PM, Andrew Ash and...@andrewash.com wrote: I can't run sbt/sbt gen-idea on a clean checkout of Spark master. I get resolution errors on junit#junit;4.10!junit.zip(source) As shown below: aash@aash-mbp /tmp/git/spark$ sbt/sbt gen-idea Using /Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home as default JAVA_HOME. Note, this will be overridden by -java-home if it is set. [info] Loading project definition from /private/tmp/git/spark/project/project [info] Loading project definition from /private/tmp/git/spark/project [info] Set current project to root (in build file:/private/tmp/git/spark/) [info] Creating IDEA module for project 'assembly' ... [info] Updating {file:/private/tmp/git/spark/}core... [info] Resolving org.fusesource.jansi#jansi;1.4 ... [warn] [FAILED ] junit#junit;4.10!junit.zip(source): (0ms) [warn] local: tried [warn] /Users/aash/.ivy2/local/junit/junit/4.10/sources/junit.zip [warn] public: tried [warn] http://repo1.maven.org/maven2/junit/junit/4.10/junit-4.10.zip [warn] Maven Repository: tried [warn] http://repo.maven.apache.org/maven2/junit/junit/4.10/junit-4.10.zip [warn] Apache Repository: tried [warn] https://repository.apache.org/content/repositories/releases/junit/junit/4.10/junit-4.10.zip [warn] JBoss Repository: tried [warn] https://repository.jboss.org/nexus/content/repositories/releases/junit/junit/4.10/junit-4.10.zip [warn] MQTT Repository: tried [warn] https://repo.eclipse.org/content/repositories/paho-releases/junit/junit/4.10/junit-4.10.zip [warn] Cloudera Repository: tried [warn] http://repository.cloudera.com/artifactory/cloudera-repos/junit/junit/4.10/junit-4.10.zip [warn] Pivotal Repository: tried [warn] http://repo.spring.io/libs-release/junit/junit/4.10/junit-4.10.zip [warn] Maven2 Local: tried [warn] file:/Users/aash/.m2/repository/junit/junit/4.10/junit-4.10.zip [warn] :: [warn] :: FAILED DOWNLOADS:: [warn] :: ^ see resolution messages for details ^ :: [warn] :: [warn] :: junit#junit;4.10!junit.zip(source) [warn] :: sbt.ResolveException: download failed: junit#junit;4.10!junit.zip(source) By bumping the junit dependency to 4.11 I'm able to generate the IDE files. Are other people having this problem or does everyone use the maven configuration? Andrew
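The failure above is specific to the (source) classifier artifact for junit 4.10: the repositories serve the POM and jar but not the sources zip, so gen-idea aborts. A minimal sketch of the workaround Andrew describes (bumping to 4.11), assuming an sbt 0.13-style build; where exactly the junit dependency is declared in Spark's build may differ:

    // Bump junit so gen-idea fetches the 4.11 sources instead of the broken 4.10 zip.
    libraryDependencies += "junit" % "junit" % "4.11" % "test"

    // If 4.10 still arrives transitively, pin the version with an override.
    dependencyOverrides += "junit" % "junit" % "4.11"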
Re: (send this email to subscribe)
See http://spark.apache.org/news/spark-mailing-lists-moving-to-apache.html Cheers On Jul 8, 2014, at 4:17 AM, Leon Zhang leonca...@gmail.com wrote:
Re: (send this email to subscribe)
This is the correct page: http://spark.apache.org/community.html Cheers On Jul 8, 2014, at 4:43 AM, Ted Yu yuzhih...@gmail.com wrote: See http://spark.apache.org/news/spark-mailing-lists-moving-to-apache.html Cheers On Jul 8, 2014, at 4:17 AM, Leon Zhang leonca...@gmail.com wrote:
Re: [VOTE] Release Apache Spark 1.0.2 (RC1)
HADOOP-10456 is fixed in hadoop 2.4.1 Does this mean that synchronization on HadoopRDD.CONFIGURATION_INSTANTIATION_LOCK can be bypassed for hadoop 2.4.1 ? Cheers On Fri, Jul 25, 2014 at 6:00 PM, Patrick Wendell pwend...@gmail.com wrote: The most important issue in this release is actually an ammendment to an earlier fix. The original fix caused a deadlock which was a regression from 1.0.0-1.0.1: Issue: https://issues.apache.org/jira/browse/SPARK-1097 1.0.1 Fix: https://github.com/apache/spark/pull/1273/files (had a deadlock) 1.0.2 Fix: https://github.com/apache/spark/pull/1409/files I failed to correctly label this on JIRA, but I've updated it! On Fri, Jul 25, 2014 at 5:35 PM, Michael Armbrust mich...@databricks.com wrote: That query is looking at Fix Version not Target Version. The fact that the first one is still open is only because the bug is not resolved in master. It is fixed in 1.0.2. The second one is partially fixed in 1.0.2, but is not worth blocking the release for. On Fri, Jul 25, 2014 at 4:23 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: TD, there are a couple of unresolved issues slated for 1.0.2 https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%201.0.2%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC . Should they be edited somehow? On Fri, Jul 25, 2014 at 7:08 PM, Tathagata Das tathagata.das1...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.0.2. This release fixes a number of bugs in Spark 1.0.1. Some of the notable ones are - SPARK-2452: Known issue is Spark 1.0.1 caused by attempted fix for SPARK-1199. The fix was reverted for 1.0.2. - SPARK-2576: NoClassDefFoundError when executing Spark QL query on HDFS CSV file. The full list is at http://s.apache.org/9NJ The tag to be voted on is v1.0.2-rc1 (commit 8fb6f00e): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=8fb6f00e195fb258f3f70f04756e07c259a2351f The release files, including signatures, digests, etc can be found at: http://people.apache.org/~tdas/spark-1.0.2-rc1/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/tdas.asc The staging repository for this release can be found at: https://repository.apache.org/content/repositories/orgapachespark-1024/ The documentation corresponding to this release can be found at: http://people.apache.org/~tdas/spark-1.0.2-rc1-docs/ Please vote on releasing this package as Apache Spark 1.0.2! The vote is open until Tuesday, July 29, at 23:00 UTC and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 1.0.2 [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.apache.org/
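For context on the question above: the lock in HadoopRDD exists because Hadoop's Configuration constructor touches shared static state and, on versions without the HADOOP-10456 fix, concurrent construction can throw ConcurrentModificationException. A rough sketch of the guarded-construction pattern follows; it is illustrative only, not the actual HadoopRDD code, and whether the lock can be skipped when building against Hadoop 2.4.1+ is exactly the open question:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.mapred.JobConf

    object HadoopConfUtil {
      // Global lock guarding Configuration/JobConf construction, since the
      // constructor is not thread-safe before HADOOP-10456 (Hadoop 2.4.1).
      private val configurationLock = new Object

      def newJobConf(base: Configuration): JobConf =
        configurationLock.synchronized {
          new JobConf(base)
        }
    }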
Re: Working Formula for Hive 0.13?
I found 0.13.1 artifacts in maven: http://search.maven.org/#artifactdetails%7Corg.apache.hive%7Chive-metastore%7C0.13.1%7Cjar However, Spark uses a groupId of org.spark-project.hive, not org.apache.hive. Can someone tell me how it is supposed to work? Cheers On Mon, Jul 28, 2014 at 7:44 AM, Steve Nunez snu...@hortonworks.com wrote: I saw a note earlier, perhaps on the user list, that at least one person is using Hive 0.13. Anyone got a working build configuration for this version of Hive? Regards, - Steve
Re: Working Formula for Hive 0.13?
Talked with Owen offline. He confirmed that as of 0.13, hive-exec is still uber jar. Right now I am facing the following error building against Hive 0.13.1 : [ERROR] Failed to execute goal on project spark-hive_2.10: Could not resolve dependencies for project org.apache.spark:spark-hive_2.10:jar:1.1.0-SNAPSHOT: The following artifacts could not be resolved: org.spark-project.hive:hive-metastore:jar:0.13.1, org.spark-project.hive:hive-exec:jar:0.13.1, org.spark-project.hive:hive-serde:jar:0.13.1: Failure to find org.spark-project.hive:hive-metastore:jar:0.13.1 in http://repo.maven.apache.org/maven2 was cached in the local repository, resolution will not be reattempted until the update interval of maven-repo has elapsed or updates are forced - [Help 1] Some hint would be appreciated. Cheers On Mon, Jul 28, 2014 at 9:15 AM, Sean Owen so...@cloudera.com wrote: Yes, it is published. As of previous versions, at least, hive-exec included all of its dependencies *in its artifact*, making it unusable as-is because it contained copies of dependencies that clash with versions present in other artifacts, and can't be managed with Maven mechanisms. I am not sure why hive-exec was not published normally, with just its own classes. That's why it was copied, into an artifact with just hive-exec code. You could do the same thing for hive-exec 0.13.1. Or maybe someone knows that it's published more 'normally' now. I don't think hive-metastore is related to this question? I am no expert on the Hive artifacts, just remembering what the issue was initially in case it helps you get to a similar solution. On Mon, Jul 28, 2014 at 4:47 PM, Ted Yu yuzhih...@gmail.com wrote: hive-exec (as of 0.13.1) is published here: http://search.maven.org/#artifactdetails%7Corg.apache.hive%7Chive-exec%7C0.13.1%7Cjar Should a JIRA be opened so that dependency on hive-metastore can be replaced by dependency on hive-exec ? Cheers On Mon, Jul 28, 2014 at 8:26 AM, Sean Owen so...@cloudera.com wrote: The reason for org.spark-project.hive is that Spark relies on hive-exec, but the Hive project does not publish this artifact by itself, only with all its dependencies as an uber jar. Maybe that's been improved. If so, you need to point at the new hive-exec and perhaps sort out its dependencies manually in your build. On Mon, Jul 28, 2014 at 4:01 PM, Ted Yu yuzhih...@gmail.com wrote: I found 0.13.1 artifacts in maven: http://search.maven.org/#artifactdetails%7Corg.apache.hive%7Chive-metastore%7C0.13.1%7Cjar However, Spark uses groupId of org.spark-project.hive, not org.apache.hive Can someone tell me how it is supposed to work ? Cheers On Mon, Jul 28, 2014 at 7:44 AM, Steve Nunez snu...@hortonworks.com wrote: I saw a note earlier, perhaps on the user list, that at least one person is using Hive 0.13. Anyone got a working build configuration for this version of Hive? Regards, - Steve -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
Re: Working Formula for Hive 0.13?
Owen helped me find this: https://issues.apache.org/jira/browse/HIVE-7423 I guess this means that for Hive 0.14, Spark should be able to directly pull in hive-exec-core.jar Cheers On Mon, Jul 28, 2014 at 9:55 AM, Patrick Wendell pwend...@gmail.com wrote: It would be great if the hive team can fix that issue. If not, we'll have to continue forking our own version of Hive to change the way it publishes artifacts. - Patrick On Mon, Jul 28, 2014 at 9:34 AM, Ted Yu yuzhih...@gmail.com wrote: Talked with Owen offline. He confirmed that as of 0.13, hive-exec is still uber jar. Right now I am facing the following error building against Hive 0.13.1 : [ERROR] Failed to execute goal on project spark-hive_2.10: Could not resolve dependencies for project org.apache.spark:spark-hive_2.10:jar:1.1.0-SNAPSHOT: The following artifacts could not be resolved: org.spark-project.hive:hive-metastore:jar:0.13.1, org.spark-project.hive:hive-exec:jar:0.13.1, org.spark-project.hive:hive-serde:jar:0.13.1: Failure to find org.spark-project.hive:hive-metastore:jar:0.13.1 in http://repo.maven.apache.org/maven2 was cached in the local repository, resolution will not be reattempted until the update interval of maven-repo has elapsed or updates are forced - [Help 1] Some hint would be appreciated. Cheers On Mon, Jul 28, 2014 at 9:15 AM, Sean Owen so...@cloudera.com wrote: Yes, it is published. As of previous versions, at least, hive-exec included all of its dependencies *in its artifact*, making it unusable as-is because it contained copies of dependencies that clash with versions present in other artifacts, and can't be managed with Maven mechanisms. I am not sure why hive-exec was not published normally, with just its own classes. That's why it was copied, into an artifact with just hive-exec code. You could do the same thing for hive-exec 0.13.1. Or maybe someone knows that it's published more 'normally' now. I don't think hive-metastore is related to this question? I am no expert on the Hive artifacts, just remembering what the issue was initially in case it helps you get to a similar solution. On Mon, Jul 28, 2014 at 4:47 PM, Ted Yu yuzhih...@gmail.com wrote: hive-exec (as of 0.13.1) is published here: http://search.maven.org/#artifactdetails%7Corg.apache.hive%7Chive-exec%7C0.13.1%7Cjar Should a JIRA be opened so that dependency on hive-metastore can be replaced by dependency on hive-exec ? Cheers On Mon, Jul 28, 2014 at 8:26 AM, Sean Owen so...@cloudera.com wrote: The reason for org.spark-project.hive is that Spark relies on hive-exec, but the Hive project does not publish this artifact by itself, only with all its dependencies as an uber jar. Maybe that's been improved. If so, you need to point at the new hive-exec and perhaps sort out its dependencies manually in your build. On Mon, Jul 28, 2014 at 4:01 PM, Ted Yu yuzhih...@gmail.com wrote: I found 0.13.1 artifacts in maven: http://search.maven.org/#artifactdetails%7Corg.apache.hive%7Chive-metastore%7C0.13.1%7Cjar However, Spark uses groupId of org.spark-project.hive, not org.apache.hive Can someone tell me how it is supposed to work ? Cheers On Mon, Jul 28, 2014 at 7:44 AM, Steve Nunez snu...@hortonworks.com wrote: I saw a note earlier, perhaps on the user list, that at least one person is using Hive 0.13. Anyone got a working build configuration for this version of Hive? 
Regards, - Steve
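If HIVE-7423 lands as Ted describes, consuming the slimmed-down jar might look roughly like the fragment below in an sbt-style build. This is a hypothetical sketch: the 0.14.0 version and the "core" classifier are assumptions about how Hive will publish the artifact, and transitive dependencies would still need to be sorted out manually, as Sean notes.

    // Hypothetical: assumes Hive 0.14 publishes hive-exec with a "core" classifier
    // containing only Hive's own classes rather than an uber jar.
    libraryDependencies +=
      "org.apache.hive" % "hive-exec" % "0.14.0" classifier "core"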
Re: Working Formula for Hive 0.13?
After manually copying hive 0.13.1 jars to local maven repo, I got the following errors when building spark-hive_2.10 module : [ERROR] /homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala:182: type mismatch; found : String required: Array[String] [ERROR] val proc: CommandProcessor = CommandProcessorFactory.get(tokens(0), hiveconf) [ERROR] ^ [ERROR] /homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala:60: value getAllPartitionsForPruner is not a member of org.apache. hadoop.hive.ql.metadata.Hive [ERROR] client.getAllPartitionsForPruner(table).toSeq [ERROR]^ [ERROR] /homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala:267: overloaded method constructor TableDesc with alternatives: (x$1: Class[_ : org.apache.hadoop.mapred.InputFormat[_, _]],x$2: Class[_],x$3: java.util.Properties)org.apache.hadoop.hive.ql.plan.TableDesc and ()org.apache.hadoop.hive.ql.plan.TableDesc cannot be applied to (Class[org.apache.hadoop.hive.serde2.Deserializer], Class[(some other)?0(in value tableDesc)(in value tableDesc)], Class[?0(in value tableDesc)(in value tableDesc)], java.util.Properties) [ERROR] val tableDesc = new TableDesc( [ERROR] ^ [WARNING] Class org.antlr.runtime.tree.CommonTree not found - continuing with a stub. [WARNING] Class org.antlr.runtime.Token not found - continuing with a stub. [WARNING] Class org.antlr.runtime.tree.Tree not found - continuing with a stub. [ERROR] while compiling: /homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala during phase: typer library version: version 2.10.4 compiler version: version 2.10.4 The above shows incompatible changes between 0.12 and 0.13.1 e.g. the first error corresponds to the following method in CommandProcessorFactory : public static CommandProcessor get(String[] cmd, HiveConf conf) Cheers On Mon, Jul 28, 2014 at 1:32 PM, Steve Nunez snu...@hortonworks.com wrote: So, do we have a short-term fix until Hive 0.14 comes out? Perhaps adding the hive-exec jar to the spark-project repo? It doesn¹t look like there¹s a release date schedule for 0.14. On 7/28/14, 10:50, Cheng Lian lian.cs@gmail.com wrote: Exactly, forgot to mention Hulu team also made changes to cope with those incompatibility issues, but they said that¹s relatively easy once the re-packaging work is done. On Tue, Jul 29, 2014 at 1:20 AM, Patrick Wendell pwend...@gmail.com wrote: I've heard from Cloudera that there were hive internal changes between 0.12 and 0.13 that required code re-writing. Over time it might be possible for us to integrate with hive using API's that are more stable (this is the domain of Michael/Cheng/Yin more than me!). It would be interesting to see what the Hulu folks did. - Patrick On Mon, Jul 28, 2014 at 10:16 AM, Cheng Lian lian.cs@gmail.com wrote: AFAIK, according a recent talk, Hulu team in China has built Spark SQL against Hive 0.13 (or 0.13.1?) successfully. Basically they also re-packaged Hive 0.13 as what the Spark team did. The slides of the talk hasn't been released yet though. On Tue, Jul 29, 2014 at 1:01 AM, Ted Yu yuzhih...@gmail.com wrote: Owen helped me find this: https://issues.apache.org/jira/browse/HIVE-7423 I guess this means that for Hive 0.14, Spark should be able to directly pull in hive-exec-core.jar Cheers On Mon, Jul 28, 2014 at 9:55 AM, Patrick Wendell pwend...@gmail.com wrote: It would be great if the hive team can fix that issue. 
If not, we'll have to continue forking our own version of Hive to change the way it publishes artifacts. - Patrick On Mon, Jul 28, 2014 at 9:34 AM, Ted Yu yuzhih...@gmail.com wrote: Talked with Owen offline. He confirmed that as of 0.13, hive-exec is still uber jar. Right now I am facing the following error building against Hive 0.13.1 : [ERROR] Failed to execute goal on project spark-hive_2.10: Could not resolve dependencies for project org.apache.spark:spark-hive_2.10:jar:1.1.0-SNAPSHOT: The following artifacts could not be resolved: org.spark-project.hive:hive-metastore:jar:0.13.1, org.spark-project.hive:hive-exec:jar:0.13.1, org.spark-project.hive:hive-serde:jar:0.13.1: Failure to find org.spark-project.hive:hive-metastore:jar:0.13.1 in http://repo.maven.apache.org/maven2 was cached in the local repository, resolution will not be reattempted until the update interval of maven-repo has elapsed or updates are forced - [Help 1] Some hint would be appreciated. Cheers On Mon, Jul 28, 2014 at 9:15 AM, Sean Owen so...@cloudera.com wrote: Yes, it is published. As of previous versions, at least, hive-exec included all of its
Re: Working Formula for Hive 0.13?
I was looking for a class where reflection-related code should reside. I found this but don't think it is the proper class for bridging differences between hive 0.12 and 0.13.1: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala Cheers On Mon, Jul 28, 2014 at 3:41 PM, Ted Yu yuzhih...@gmail.com wrote: After manually copying hive 0.13.1 jars to local maven repo, I got the following errors when building spark-hive_2.10 module : [ERROR] /homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala:182: type mismatch; found : String required: Array[String] [ERROR] val proc: CommandProcessor = CommandProcessorFactory.get(tokens(0), hiveconf) [ERROR] ^ [ERROR] /homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala:60: value getAllPartitionsForPruner is not a member of org.apache. hadoop.hive.ql.metadata.Hive [ERROR] client.getAllPartitionsForPruner(table).toSeq [ERROR]^ [ERROR] /homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala:267: overloaded method constructor TableDesc with alternatives: (x$1: Class[_ : org.apache.hadoop.mapred.InputFormat[_, _]],x$2: Class[_],x$3: java.util.Properties)org.apache.hadoop.hive.ql.plan.TableDesc and ()org.apache.hadoop.hive.ql.plan.TableDesc cannot be applied to (Class[org.apache.hadoop.hive.serde2.Deserializer], Class[(some other)?0(in value tableDesc)(in value tableDesc)], Class[?0(in value tableDesc)(in value tableDesc)], java.util.Properties) [ERROR] val tableDesc = new TableDesc( [ERROR] ^ [WARNING] Class org.antlr.runtime.tree.CommonTree not found - continuing with a stub. [WARNING] Class org.antlr.runtime.Token not found - continuing with a stub. [WARNING] Class org.antlr.runtime.tree.Tree not found - continuing with a stub. [ERROR] while compiling: /homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala during phase: typer library version: version 2.10.4 compiler version: version 2.10.4 The above shows incompatible changes between 0.12 and 0.13.1 e.g. the first error corresponds to the following method in CommandProcessorFactory : public static CommandProcessor get(String[] cmd, HiveConf conf) Cheers On Mon, Jul 28, 2014 at 1:32 PM, Steve Nunez snu...@hortonworks.com wrote: So, do we have a short-term fix until Hive 0.14 comes out? Perhaps adding the hive-exec jar to the spark-project repo? It doesn¹t look like there¹s a release date schedule for 0.14. On 7/28/14, 10:50, Cheng Lian lian.cs@gmail.com wrote: Exactly, forgot to mention Hulu team also made changes to cope with those incompatibility issues, but they said that¹s relatively easy once the re-packaging work is done. On Tue, Jul 29, 2014 at 1:20 AM, Patrick Wendell pwend...@gmail.com wrote: I've heard from Cloudera that there were hive internal changes between 0.12 and 0.13 that required code re-writing. Over time it might be possible for us to integrate with hive using API's that are more stable (this is the domain of Michael/Cheng/Yin more than me!). It would be interesting to see what the Hulu folks did. - Patrick On Mon, Jul 28, 2014 at 10:16 AM, Cheng Lian lian.cs@gmail.com wrote: AFAIK, according a recent talk, Hulu team in China has built Spark SQL against Hive 0.13 (or 0.13.1?) successfully. Basically they also re-packaged Hive 0.13 as what the Spark team did. The slides of the talk hasn't been released yet though. 
On Tue, Jul 29, 2014 at 1:01 AM, Ted Yu yuzhih...@gmail.com wrote: Owen helped me find this: https://issues.apache.org/jira/browse/HIVE-7423 I guess this means that for Hive 0.14, Spark should be able to directly pull in hive-exec-core.jar Cheers On Mon, Jul 28, 2014 at 9:55 AM, Patrick Wendell pwend...@gmail.com wrote: It would be great if the hive team can fix that issue. If not, we'll have to continue forking our own version of Hive to change the way it publishes artifacts. - Patrick On Mon, Jul 28, 2014 at 9:34 AM, Ted Yu yuzhih...@gmail.com wrote: Talked with Owen offline. He confirmed that as of 0.13, hive-exec is still uber jar. Right now I am facing the following error building against Hive 0.13.1 : [ERROR] Failed to execute goal on project spark-hive_2.10: Could not resolve dependencies for project org.apache.spark:spark-hive_2.10:jar:1.1.0-SNAPSHOT: The following artifacts could not be resolved: org.spark-project.hive:hive-metastore:jar:0.13.1, org.spark-project.hive:hive-exec:jar:0.13.1, org.spark-project.hive:hive-serde:jar:0.13.1: Failure to find org.spark-project.hive:hive-metastore:jar:0.13.1 in http://repo.maven.apache.org/maven2 was cached in the local
Re: Working Formula for Hive 0.13?
bq. Either way its unclear to if there is any reason to use reflection to support multiple versions, instead of just upgrading to Hive 0.13.0 Which Spark release would this Hive upgrade take place ? I agree it is cleaner to upgrade Hive dependency vs. introducing reflection. Cheers On Mon, Jul 28, 2014 at 5:22 PM, Michael Armbrust mich...@databricks.com wrote: A few things: - When we upgrade to Hive 0.13.0, Patrick will likely republish the hive-exec jar just as we did for 0.12.0 - Since we have to tie into some pretty low level APIs it is unsurprising that the code doesn't just compile out of the box against 0.13.0 - ScalaReflection is for determining Schema from Scala classes, not reflection based bridge code. Either way its unclear to if there is any reason to use reflection to support multiple versions, instead of just upgrading to Hive 0.13.0 One question I have is, What is the goal of upgrading to hive 0.13.0? Is it purely because you are having problems connecting to newer metastores? Are there some features you are hoping for? This will help me prioritize this effort. Michael On Mon, Jul 28, 2014 at 4:05 PM, Ted Yu yuzhih...@gmail.com wrote: I was looking for a class where reflection-related code should reside. I found this but don't think it is the proper class for bridging differences between hive 0.12 and 0.13.1: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala Cheers On Mon, Jul 28, 2014 at 3:41 PM, Ted Yu yuzhih...@gmail.com wrote: After manually copying hive 0.13.1 jars to local maven repo, I got the following errors when building spark-hive_2.10 module : [ERROR] /homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala:182: type mismatch; found : String required: Array[String] [ERROR] val proc: CommandProcessor = CommandProcessorFactory.get(tokens(0), hiveconf) [ERROR] ^ [ERROR] /homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala:60: value getAllPartitionsForPruner is not a member of org.apache. hadoop.hive.ql.metadata.Hive [ERROR] client.getAllPartitionsForPruner(table).toSeq [ERROR]^ [ERROR] /homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala:267: overloaded method constructor TableDesc with alternatives: (x$1: Class[_ : org.apache.hadoop.mapred.InputFormat[_, _]],x$2: Class[_],x$3: java.util.Properties)org.apache.hadoop.hive.ql.plan.TableDesc and ()org.apache.hadoop.hive.ql.plan.TableDesc cannot be applied to (Class[org.apache.hadoop.hive.serde2.Deserializer], Class[(some other)?0(in value tableDesc)(in value tableDesc)], Class[?0(in value tableDesc)(in value tableDesc)], java.util.Properties) [ERROR] val tableDesc = new TableDesc( [ERROR] ^ [WARNING] Class org.antlr.runtime.tree.CommonTree not found - continuing with a stub. [WARNING] Class org.antlr.runtime.Token not found - continuing with a stub. [WARNING] Class org.antlr.runtime.tree.Tree not found - continuing with a stub. [ERROR] while compiling: /homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala during phase: typer library version: version 2.10.4 compiler version: version 2.10.4 The above shows incompatible changes between 0.12 and 0.13.1 e.g. the first error corresponds to the following method in CommandProcessorFactory : public static CommandProcessor get(String[] cmd, HiveConf conf) Cheers On Mon, Jul 28, 2014 at 1:32 PM, Steve Nunez snu...@hortonworks.com wrote: So, do we have a short-term fix until Hive 0.14 comes out? 
Perhaps adding the hive-exec jar to the spark-project repo? It doesn't look like there's a release date schedule for 0.14. On 7/28/14, 10:50, Cheng Lian lian.cs@gmail.com wrote: Exactly, forgot to mention the Hulu team also made changes to cope with those incompatibility issues, but they said that's relatively easy once the re-packaging work is done. On Tue, Jul 29, 2014 at 1:20 AM, Patrick Wendell pwend...@gmail.com wrote: I've heard from Cloudera that there were hive internal changes between 0.12 and 0.13 that required code re-writing. Over time it might be possible for us to integrate with hive using API's that are more stable (this is the domain of Michael/Cheng/Yin more than me!). It would be interesting to see what the Hulu folks did. - Patrick On Mon, Jul 28, 2014 at 10:16 AM, Cheng Lian lian.cs@gmail.com wrote: AFAIK, according to a recent talk, the Hulu team in China has built Spark SQL against Hive 0.13 (or 0.13.1?) successfully. Basically they also re-packaged Hive 0.13 just as the Spark team did
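For reference, the reflection-based bridging Ted was looking for a home for could look roughly like the sketch below, dispatching on the CommandProcessorFactory.get signature that changed between 0.12 and 0.13. This is only an illustration of the idea, not code that exists in Spark, and as Michael points out it may be simpler to upgrade outright than to support both versions.

    import org.apache.hadoop.hive.conf.HiveConf
    import org.apache.hadoop.hive.ql.processors.{CommandProcessor, CommandProcessorFactory}

    // Hypothetical bridge: Hive 0.12 exposes get(String, HiveConf) while
    // Hive 0.13 exposes get(String[], HiveConf); try the newer overload first.
    object CommandProcessorBridge {
      def get(token: String, conf: HiveConf): CommandProcessor = {
        val factory = classOf[CommandProcessorFactory]
        try {
          val m = factory.getMethod("get", classOf[Array[String]], classOf[HiveConf])
          m.invoke(null, Array(token), conf).asInstanceOf[CommandProcessor]
        } catch {
          case _: NoSuchMethodException =>
            val m = factory.getMethod("get", classOf[String], classOf[HiveConf])
            m.invoke(null, token, conf).asInstanceOf[CommandProcessor]
        }
      }
    }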
Re: subscribe dev list for spark
See Mailing list section of: https://spark.apache.org/community.html On Wed, Jul 30, 2014 at 6:53 PM, Grace syso...@gmail.com wrote:
Re: failed to build spark with maven for both 1.0.1 and latest master branch
The following command succeeded (on Linux) on Spark master checked out this morning: mvn -Pyarn -Phive -Phadoop-2.4 -DskipTests install FYI On Thu, Jul 31, 2014 at 1:36 PM, yao yaosheng...@gmail.com wrote: Hi TD, I've asked my colleagues to do the same thing but compile still fails. However, maven build succeeded once I built it on my personal macbook (with the latest MacOS Yosemite). So I guess there might be something wrong in my build environment. Wonder if anyone tried to compile spark using maven under Mavericks, please let me know your result. Thanks Shengzhe On Thu, Jul 31, 2014 at 1:25 AM, Tathagata Das tathagata.das1...@gmail.com wrote: Does a mvn clean or sbt/sbt clean help? TD On Wed, Jul 30, 2014 at 9:25 PM, yao yaosheng...@gmail.com wrote: Hi Folks, Today I am trying to build spark using maven; however, the following command failed consistently for both 1.0.1 and the latest master. (BTW, it seems sbt works fine: *sbt/sbt -Dhadoop.version=2.4.0 -Pyarn clean assembly)* Environment: Mac OS Mavericks Maven: 3.2.2 (installed by homebrew) *export M2_HOME=/usr/local/Cellar/maven/3.2.2/libexec/export PATH=$M2_HOME/bin:$PATHexport MAVEN_OPTS=-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512mmvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package* Build outputs: [INFO] Scanning for projects... [INFO] [INFO] Reactor Build Order: [INFO] [INFO] Spark Project Parent POM [INFO] Spark Project Core [INFO] Spark Project Bagel [INFO] Spark Project GraphX [INFO] Spark Project ML Library [INFO] Spark Project Streaming [INFO] Spark Project Tools [INFO] Spark Project Catalyst [INFO] Spark Project SQL [INFO] Spark Project Hive [INFO] Spark Project REPL [INFO] Spark Project YARN Parent POM [INFO] Spark Project YARN Stable API [INFO] Spark Project Assembly [INFO] Spark Project External Twitter [INFO] Spark Project External Kafka [INFO] Spark Project External Flume [INFO] Spark Project External ZeroMQ [INFO] Spark Project External MQTT [INFO] Spark Project Examples [INFO] [INFO] [INFO] Building Spark Project Parent POM 1.0.1 [INFO] [INFO] [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ spark-parent --- [INFO] [INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-versions) @ spark-parent --- [INFO] [INFO] --- build-helper-maven-plugin:1.8:add-source (add-scala-sources) @ spark-parent --- [INFO] Source directory: /Users/syao/git/grid/thirdparty/spark/src/main/scala added. [INFO] [INFO] --- maven-remote-resources-plugin:1.5:process (default) @ spark-parent --- [INFO] [INFO] --- scala-maven-plugin:3.1.6:add-source (scala-compile-first) @ spark-parent --- [INFO] Add Test Source directory: /Users/syao/git/grid/thirdparty/spark/src/test/scala [INFO] [INFO] --- scala-maven-plugin:3.1.6:compile (scala-compile-first) @ spark-parent --- [INFO] No sources to compile [INFO] [INFO] --- build-helper-maven-plugin:1.8:add-test-source (add-scala-test-sources) @ spark-parent --- [INFO] Test Source directory: /Users/syao/git/grid/thirdparty/spark/src/test/scala added. 
[INFO] [INFO] --- scala-maven-plugin:3.1.6:testCompile (scala-test-compile-first) @ spark-parent --- [INFO] No sources to compile [INFO] [INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ spark-parent --- [INFO] [INFO] --- maven-source-plugin:2.2.1:jar-no-fork (create-source-jar) @ spark-parent --- [INFO] [INFO] [INFO] Building Spark Project Core 1.0.1 [INFO] [INFO] [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ spark-core_2.10 --- [INFO] [INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-versions) @ spark-core_2.10 --- [INFO] [INFO] --- build-helper-maven-plugin:1.8:add-source (add-scala-sources) @ spark-core_2.10 --- [INFO] Source directory: /Users/syao/git/grid/thirdparty/spark/core/src/main/scala added. [INFO] [INFO] --- maven-remote-resources-plugin:1.5:process (default) @ spark-core_2.10 --- [INFO] [INFO] --- exec-maven-plugin:1.2.1:exec (default) @ spark-core_2.10 --- Archive: lib/py4j-0.8.1-src.zip inflating: build/py4j/tests/java_map_test.py extracting: build/py4j/tests/__init__.py inflating: build/py4j/tests/java_gateway_test.py inflating: build/py4j/tests/java_callback_test.py inflating: build/py4j/tests/java_list_test.py
compilation error in Catalyst module
I refreshed my workspace. I got the following error with this command: mvn -Pyarn -Phive -Phadoop-2.4 -DskipTests install [ERROR] bad symbolic reference. A signature in package.class refers to term scalalogging in package com.typesafe which is not available. It may be completely missing from the current classpath, or the version on the classpath might be incompatible with the version used when compiling package.class. [ERROR] /homes/hortonzy/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/package.scala:36: bad symbolic reference. A signature in package.class refers to term slf4j in value com.typesafe.scalalogging which is not available. It may be completely missing from the current classpath, or the version on the classpath might be incompatible with the version used when compiling package.class. [ERROR] package object trees extends Logging { [ERROR] ^ [ERROR] two errors found Has anyone else seen the above ? Thanks
Re: compilation error in Catalyst module
Forgot to do that step. Now compilation passes. On Wed, Aug 6, 2014 at 1:36 PM, Zongheng Yang zonghen...@gmail.com wrote: Hi Ted, By refreshing do you mean you have done 'mvn clean'? On Wed, Aug 6, 2014 at 1:17 PM, Ted Yu yuzhih...@gmail.com wrote: I refreshed my workspace. I got the following error with this command: mvn -Pyarn -Phive -Phadoop-2.4 -DskipTests install [ERROR] bad symbolic reference. A signature in package.class refers to term scalalogging in package com.typesafe which is not available. It may be completely missing from the current classpath, or the version on the classpath might be incompatible with the version used when compiling package.class. [ERROR] /homes/hortonzy/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/package.scala:36: bad symbolic reference. A signature in package.class refers to term slf4j in value com.typesafe.scalalogging which is not available. It may be completely missing from the current classpath, or the version on the classpath might be incompatible with the version used when compiling package.class. [ERROR] package object trees extends Logging { [ERROR] ^ [ERROR] two errors found Has anyone else seen the above ? Thanks
Re: Unit tests in 5 minutes
How about using parallel execution feature of maven-surefire-plugin (assuming all the tests were made parallel friendly) ? http://maven.apache.org/surefire/maven-surefire-plugin/examples/fork-options-and-parallel-execution.html Cheers On Fri, Aug 8, 2014 at 9:14 AM, Sean Owen so...@cloudera.com wrote: A common approach is to separate unit tests from integration tests. Maven has support for this distinction. I'm not sure it helps a lot though, since it only helps you to not run integration tests all the time. But lots of Spark tests are integration-test-like and are important to run to know a change works. I haven't heard of a plugin to run different test suites remotely on many machines, but I would not be surprised if it exists. The Jenkins servers aren't CPU-bound as far as I can tell. It's that the tests spend a lot of time waiting for bits to start up or complete. That implies the existing tests could be sped up by just running in parallel locally. I recall someone recently proposed this? And I think the problem with that is simply that some of the tests collide with each other, by opening up the same port at the same time for example. I know that kind of problem is being attacked even right now. But if all the tests were made parallel friendly, I imagine parallelism could be enabled and speed up builds greatly without any remote machines. On Fri, Aug 8, 2014 at 5:01 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Howdy, Do we think it's both feasible and worthwhile to invest in getting our unit tests to finish in under 5 minutes (or something similarly brief) when run by Jenkins? Unit tests currently seem to take anywhere from 30 min to 2 hours. As people add more tests, I imagine this time will only grow. I think it would be better for both contributors and reviewers if they didn't have to wait so long for test results; PR reviews would be shorter, if nothing else. I don't know how how this is normally done, but maybe it wouldn't be too much work to get a test cycle to feel lighter. Most unit tests are independent and can be run concurrently, right? Would it make sense to build a given patch on many servers at once and send disjoint sets of unit tests to each? I'd be interested in working on something like that if possible (and sensible). Nick - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
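The surefire options linked above cover the Maven side; on the sbt side a comparable experiment might look like the build.sbt fragment below. This is only a sketch, assuming the suites have first been made parallel-friendly (no shared ports or directories, per Sean's caveat); the limit of 4 concurrent test tasks is an arbitrary example, and Spark's own build may deliberately pin these settings differently.

    // Run test suites in parallel, in a forked JVM, capped at 4 concurrent test tasks.
    parallelExecution in Test := true
    fork in Test := true
    concurrentRestrictions in Global := Seq(Tags.limit(Tags.Test, 4))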
reference to dstream in package org.apache.spark.streaming which is not available
Hi, Using the following command on (refreshed) master branch: mvn clean package -DskipTests I got: constituent[36]: file:/homes/hortonzy/apache-maven-3.1.1/conf/logging/ --- java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289) at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229) at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415) at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356) Caused by: scala.reflect.internal.Types$TypeError: bad symbolic reference. A signature in TestSuiteBase.class refers to term dstream in package org.apache.spark.streaming which is not available. It may be completely missing from the current classpath, or the version on the classpath might be incompatible with the version used when compiling TestSuiteBase.class. at scala.reflect.internal.pickling.UnPickler$Scan.toTypeError(UnPickler.scala:847) at scala.reflect.internal.pickling.UnPickler$Scan$LazyTypeRef.complete(UnPickler.scala:854) at scala.reflect.internal.Symbols$Symbol.info(Symbols.scala:1231) at scala.reflect.internal.Types$TypeMap$$anonfun$noChangeToSymbols$1.apply(Types.scala:4280) at scala.reflect.internal.Types$TypeMap$$anonfun$noChangeToSymbols$1.apply(Types.scala:4280) at scala.collection.LinearSeqOptimized$class.forall(LinearSeqOptimized.scala:70) at scala.collection.immutable.List.forall(List.scala:84) at scala.reflect.internal.Types$TypeMap.noChangeToSymbols(Types.scala:4280) at scala.reflect.internal.Types$TypeMap.mapOver(Types.scala:4293) at scala.reflect.internal.Types$TypeMap.mapOver(Types.scala:4196) at scala.reflect.internal.Types$AsSeenFromMap.apply(Types.scala:4638) at scala.reflect.internal.Types$TypeMap.mapOver(Types.scala:4202) at scala.reflect.internal.Types$AsSeenFromMap.apply(Types.scala:4638) at scala.reflect.internal.Types$Type.asSeenFrom(Types.scala:754) at scala.reflect.internal.Types$Type.memberInfo(Types.scala:773) at xsbt.ExtractAPI.defDef(ExtractAPI.scala:224) at xsbt.ExtractAPI.xsbt$ExtractAPI$$definition(ExtractAPI.scala:315) at xsbt.ExtractAPI$$anonfun$xsbt$ExtractAPI$$processDefinitions$1.apply(ExtractAPI.scala:296) at xsbt.ExtractAPI$$anonfun$xsbt$ExtractAPI$$processDefinitions$1.apply(ExtractAPI.scala:296) at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251) at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251) at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251) at scala.collection.mutable.ArrayOps$ofRef.flatMap(ArrayOps.scala:108) at xsbt.ExtractAPI.xsbt$ExtractAPI$$processDefinitions(ExtractAPI.scala:296) at xsbt.ExtractAPI$$anonfun$mkStructure$4.apply(ExtractAPI.scala:293) at xsbt.ExtractAPI$$anonfun$mkStructure$4.apply(ExtractAPI.scala:293) at xsbt.Message$$anon$1.apply(Message.scala:8) at xsbti.SafeLazy$$anonfun$apply$1.apply(SafeLazy.scala:8) at xsbti.SafeLazy$Impl._t$lzycompute(SafeLazy.scala:20) at xsbti.SafeLazy$Impl._t(SafeLazy.scala:18) at 
xsbti.SafeLazy$Impl.get(SafeLazy.scala:24) at xsbt.ExtractAPI$$anonfun$forceStructures$1.apply(ExtractAPI.scala:138) at xsbt.ExtractAPI$$anonfun$forceStructures$1.apply(ExtractAPI.scala:138) at scala.collection.immutable.List.foreach(List.scala:318) at xsbt.ExtractAPI.forceStructures(ExtractAPI.scala:138) at xsbt.ExtractAPI.forceStructures(ExtractAPI.scala:139) at xsbt.API$ApiPhase.processScalaUnit(API.scala:54) at xsbt.API$ApiPhase.processUnit(API.scala:38) at xsbt.API$ApiPhase$$anonfun$run$1.apply(API.scala:34) at xsbt.API$ApiPhase$$anonfun$run$1.apply(API.scala:34) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at xsbt.API$ApiPhase.run(API.scala:34) at scala.tools.nsc.Global$Run.compileUnitsInternal(Global.scala:1583) at scala.tools.nsc.Global$Run.compileUnits(Global.scala:1557) at scala.tools.nsc.Global$Run.compileSources(Global.scala:1553) at scala.tools.nsc.Global$Run.compile(Global.scala:1662) at xsbt.CachedCompiler0.run(CompilerInterface.scala:123) at xsbt.CachedCompiler0.run(CompilerInterface.scala:99) at xsbt.CompilerInterface.run(CompilerInterface.scala:27) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at
Re: Dependency hell in Spark applications
From output of dependency:tree: [INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ spark-streaming_2.10 --- [INFO] org.apache.spark:spark-streaming_2.10:jar:1.1.0-SNAPSHOT INFO] +- org.apache.spark:spark-core_2.10:jar:1.1.0-SNAPSHOT:compile [INFO] | +- org.apache.hadoop:hadoop-client:jar:2.4.0:compile ... [INFO] | +- net.java.dev.jets3t:jets3t:jar:0.9.0:compile [INFO] | | +- commons-codec:commons-codec:jar:1.5:compile [INFO] | | +- org.apache.httpcomponents:httpclient:jar:4.1.2:compile [INFO] | | +- org.apache.httpcomponents:httpcore:jar:4.1.2:compile bq. excluding httpclient from spark-streaming dependency in your sbt/maven project This should work. On Fri, Sep 5, 2014 at 3:14 PM, Tathagata Das tathagata.das1...@gmail.com wrote: If httpClient dependency is coming from Hive, you could build Spark without Hive. Alternatively, have you tried excluding httpclient from spark-streaming dependency in your sbt/maven project? TD On Thu, Sep 4, 2014 at 6:42 AM, Koert Kuipers ko...@tresata.com wrote: custom spark builds should not be the answer. at least not if spark ever wants to have a vibrant community for spark apps. spark does support a user-classpath-first option, which would deal with some of these issues, but I don't think it works. On Sep 4, 2014 9:01 AM, Felix Garcia Borrego fborr...@gilt.com wrote: Hi, I run into the same issue and apart from the ideas Aniket said, I only could find a nasty workaround. Add my custom PoolingClientConnectionManager to my classpath. http://stackoverflow.com/questions/24788949/nosuchmethoderror-while-running-aws-s3-client-on-spark-while-javap-shows-otherwi/25488955#25488955 On Thu, Sep 4, 2014 at 11:43 AM, Sean Owen so...@cloudera.com wrote: Dumb question -- are you using a Spark build that includes the Kinesis dependency? that build would have resolved conflicts like this for you. Your app would need to use the same version of the Kinesis client SDK, ideally. All of these ideas are well-known, yes. In cases of super-common dependencies like Guava, they are already shaded. This is a less-common source of conflicts so I don't think http-client is shaded, especially since it is not used directly by Spark. I think this is a case of your app conflicting with a third-party dependency? I think OSGi is deemed too over the top for things like this. On Thu, Sep 4, 2014 at 11:35 AM, Aniket Bhatnagar aniket.bhatna...@gmail.com wrote: I am trying to use Kinesis as source to Spark Streaming and have run into a dependency issue that can't be resolved without making my own custom Spark build. The issue is that Spark is transitively dependent on org.apache.httpcomponents:httpclient:jar:4.1.2 (I think because of libfb303 coming from hbase and hive-serde) whereas AWS SDK is dependent on org.apache.httpcomponents:httpclient:jar:4.2. 
When I package and run Spark Streaming application, I get the following: Caused by: java.lang.NoSuchMethodError: org.apache.http.impl.conn.DefaultClientConnectionOperator.init(Lorg/apache/http/conn/scheme/SchemeRegistry;Lorg/apache/http/conn/DnsResolver;)V at org.apache.http.impl.conn.PoolingClientConnectionManager.createConnectionOperator(PoolingClientConnectionManager.java:140) at org.apache.http.impl.conn.PoolingClientConnectionManager.init(PoolingClientConnectionManager.java:114) at org.apache.http.impl.conn.PoolingClientConnectionManager.init(PoolingClientConnectionManager.java:99) at com.amazonaws.http.ConnectionManagerFactory.createPoolingClientConnManager(ConnectionManagerFactory.java:29) at com.amazonaws.http.HttpClientFactory.createHttpClient(HttpClientFactory.java:97) at com.amazonaws.http.AmazonHttpClient.init(AmazonHttpClient.java:181) at com.amazonaws.AmazonWebServiceClient.init(AmazonWebServiceClient.java:119) at com.amazonaws.AmazonWebServiceClient.init(AmazonWebServiceClient.java:103) at com.amazonaws.services.kinesis.AmazonKinesisClient.init(AmazonKinesisClient.java:136) at com.amazonaws.services.kinesis.AmazonKinesisClient.init(AmazonKinesisClient.java:117) at com.amazonaws.services.kinesis.AmazonKinesisAsyncClient.init(AmazonKinesisAsyncClient.java:132) I can create a custom Spark build with org.apache.httpcomponents:httpclient:jar:4.2 included in the assembly but I was wondering if this is something Spark devs have noticed and are looking to resolve in near releases. Here are my thoughts on this issue: Containers that allow running custom user code have to often resolve dependency
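A sketch of the exclusion TD suggests, written for an sbt build; the Spark and AWS SDK versions are illustrative, and the pinned httpclient version is an assumption about what the Kinesis client actually needs:

    libraryDependencies ++= Seq(
      // Drop the old httpclient 4.1.2 that spark-streaming pulls in via spark-core/jets3t.
      ("org.apache.spark" %% "spark-streaming" % "1.0.2")
        .exclude("org.apache.httpcomponents", "httpclient"),
      // Pin the newer client the AWS SDK expects.
      "org.apache.httpcomponents" % "httpclient" % "4.2.5",
      "com.amazonaws" % "aws-java-sdk" % "1.8.3"
    )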
BasicOperationsSuite failing?
Hi, Running test suite in trunk, I got: ^[[32mBasicOperationsSuite:^[[0m ^[[32m- map^[[0m ^[[32m- flatMap^[[0m ^[[32m- filter^[[0m ^[[32m- glom^[[0m ^[[32m- mapPartitions^[[0m ^[[32m- repartition (more partitions)^[[0m ^[[32m- repartition (fewer partitions)^[[0m ^[[32m- groupByKey^[[0m ^[[32m- reduceByKey^[[0m ^[[32m- reduce^[[0m ^[[32m- count^[[0m ^[[32m- countByValue^[[0m ^[[32m- mapValues^[[0m ^[[32m- flatMapValues^[[0m ^[[32m- union^[[0m ^[[32m- StreamingContext.union^[[0m ^[[32m- transform^[[0m ^[[32m- transformWith^[[0m ^[[32m- StreamingContext.transform^[[0m ^[[32m- cogroup^[[0m ^[[32m- join^[[0m ^[[32m- leftOuterJoin^[[0m ^[[32m- rightOuterJoin^[[0m ^[[32m- fullOuterJoin^[[0m ^[[32m- updateStateByKey^[[0m ^[[32m- updateStateByKey - object lifecycle^[[0m ^[[32m- slice^[[0m ^[[32m- slice - has not been initialized^[[0m ^[[32m- rdd cleanup - map and window^[[0m ^[[32m- rdd cleanup - updateStateByKey^[[0m ^[[31m- rdd cleanup - input blocks and persisted RDDs *** FAILED ***^[[0m ^[[31m org.scalatest.exceptions.TestFailedException was thrown. (BasicOperationsSuite.scala:528)^[[0m However, using sbt for this testsuite, it seemed to pass: [info] - slice - has not been initialized [info] - rdd cleanup - map and window [info] - rdd cleanup - updateStateByKey Exception in thread Thread-561 org.apache.spark.SparkException: Job cancelled because SparkContext was shut down at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:701) at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:700) at scala.collection.mutable.HashSet.foreach(HashSet.scala:79) at org.apache.spark.scheduler.DAGScheduler.cleanUpAfterSchedulerStop(DAGScheduler.scala:700) at org.apache.spark.scheduler.DAGSchedulerEventProcessActor.postStop(DAGScheduler.scala:1406) at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:201) at akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:163) at akka.actor.ActorCell.terminate(ActorCell.scala:338) at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:431) at akka.actor.ActorCell.systemInvoke(ActorCell.scala:447) at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:262) at akka.dispatch.Mailbox.run(Mailbox.scala:218) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [info] - rdd cleanup - input blocks and persisted RDDs [info] ScalaTest [info] Run completed in 1 minute, 1 second. [info] Total number of tests run: 31 [info] Suites: completed 1, aborted 0 [info] Tests: succeeded 31, failed 0, canceled 0, ignored 0, pending 0 [info] All tests passed. 
[info] Passed: Total 31, Failed 0, Errors 0, Passed 31 java.lang.AssertionError: assertion failed: List(object package$DebugNode, object package$DebugNode) at scala.reflect.internal.Symbols$Symbol.suchThat(Symbols.scala:1678) at scala.reflect.internal.Symbols$ClassSymbol.companionModule0(Symbols.scala:2988) at scala.reflect.internal.Symbols$ClassSymbol.companionModule(Symbols.scala:2991) at scala.tools.nsc.backend.jvm.GenASM$JPlainBuilder.genClass(GenASM.scala:1371) at scala.tools.nsc.backend.jvm.GenASM$AsmPhase.run(GenASM.scala:120) at scala.tools.nsc.Global$Run.compileUnitsInternal(Global.scala:1583) at scala.tools.nsc.Global$Run.compileUnits(Global.scala:1557) at scala.tools.nsc.Global$Run.compileSources(Global.scala:1553) at scala.tools.nsc.Global$Run.compile(Global.scala:1662) at xsbt.CachedCompiler0.run(CompilerInterface.scala:123) at xsbt.CachedCompiler0.run(CompilerInterface.scala:99) at xsbt.CompilerInterface.run(CompilerInterface.scala:27) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sbt.compiler.AnalyzingCompiler.call(AnalyzingCompiler.scala:102) at sbt.compiler.AnalyzingCompiler.compile(AnalyzingCompiler.scala:48) at sbt.compiler.AnalyzingCompiler.compile(AnalyzingCompiler.scala:41) at sbt.compiler.AggressiveCompile$$anonfun$3$$anonfun$compileScala$1$1.apply$mcV$sp(AggressiveCompile.scala:99) at sbt.compiler.AggressiveCompile$$anonfun$3$$anonfun$compileScala$1$1.apply(AggressiveCompile.scala:99) at sbt.compiler.AggressiveCompile$$anonfun$3$$anonfun$compileScala$1$1.apply(AggressiveCompile.scala:99) at sbt.compiler.AggressiveCompile.sbt$compiler$AggressiveCompile$$timed(AggressiveCompile.scala:166) at
Re: Extending Scala style checks
Please take a look at WhitespaceEndOfLineChecker under: http://www.scalastyle.org/rules-0.1.0.html Cheers On Wed, Oct 1, 2014 at 2:01 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: As discussed here https://github.com/apache/spark/pull/2619, it would be good to extend our Scala style checks to programmatically enforce as many of our style rules as possible. Does anyone know if it's relatively straightforward to enforce additional rules like the no trailing spaces rule mentioned in the linked PR? Nick
Re: something wrong with Jenkins or something untested merged?
I performed a build on the latest master branch but didn't get a compilation error. FYI On Mon, Oct 20, 2014 at 3:51 PM, Nan Zhu zhunanmcg...@gmail.com wrote: Hi, I just submitted a patch https://github.com/apache/spark/pull/2864/files with a one-line change, but Jenkins told me it failed to compile in unrelated files? https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21935/console Best, Nan
Re: scalastyle annoys me a little bit
Koert: Have you tried adding the following on your commandline ? -Dscalastyle.failOnViolation=false Cheers On Thu, Oct 23, 2014 at 11:07 AM, Patrick Wendell pwend...@gmail.com wrote: Hey Koert, I think disabling the style checks in maven package could be a good idea for the reason you point out. I was sort of mixed on that when it was proposed for this exact reason. It's just annoying to developers. In terms of changing the global limit, this is more religion than anything else, but there are other cases where the current limit is useful (e.g. if you have many windows open in a large screen). - Patrick On Thu, Oct 23, 2014 at 11:03 AM, Koert Kuipers ko...@tresata.com wrote: 100 max width seems very restrictive to me. even the most restrictive environment i have for development (ssh with emacs) i get a lot more characters to work with than that. personally i find the code harder to read, not easier. like i kept wondering why there are weird newlines in the middle of constructors and such, only to realise later it was because of the 100 character limit. also, i find mvn package erroring out because of style errors somewhat excessive. i understand that a pull request needs to conform to the style before being accepted, but this means i cant even run tests on code that does not conform to the style guide, which is a bit silly. i keep going out for coffee while package and tests run, only to come back for an annoying error that my line is 101 characters and therefore nothing ran. is there some maven switch to disable the style checks? best! koert - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: scalastyle annoys me a little bit
Koert: If you have time, you can try this diff - with which you would be able to specify the following on the command line: -Dscalastyle.failonviolation=false

diff --git a/pom.xml b/pom.xml
index 687cc63..108585e 100644
--- a/pom.xml
+++ b/pom.xml
@@ -123,6 +123,7 @@
     <log4j.version>1.2.17</log4j.version>
     <hadoop.version>1.0.4</hadoop.version>
     <protobuf.version>2.4.1</protobuf.version>
+    <scalastyle.failonviolation>true</scalastyle.failonviolation>
     <yarn.version>${hadoop.version}</yarn.version>
     <hbase.version>0.94.6</hbase.version>
     <flume.version>1.4.0</flume.version>
@@ -1071,7 +1072,7 @@
         <version>0.4.0</version>
         <configuration>
           <verbose>false</verbose>
-          <failOnViolation>true</failOnViolation>
+          <failOnViolation>${scalastyle.failonviolation}</failOnViolation>
           <includeTestSourceDirectory>false</includeTestSourceDirectory>
           <failOnWarning>false</failOnWarning>
           <sourceDirectory>${basedir}/src/main/scala</sourceDirectory>

On Thu, Oct 23, 2014 at 12:07 PM, Koert Kuipers ko...@tresata.com wrote: Hey Ted, i tried: mvn clean package -DskipTests -Dscalastyle.failOnViolation=false no luck, still get [ERROR] Failed to execute goal org.scalastyle:scalastyle-maven-plugin:0.4.0:check (default) on project spark-core_2.10: Failed during scalastyle execution: You have 3 Scalastyle violation(s). - [Help 1] On Thu, Oct 23, 2014 at 2:14 PM, Ted Yu yuzhih...@gmail.com wrote: Koert: Have you tried adding the following on your commandline ? -Dscalastyle.failOnViolation=false Cheers On Thu, Oct 23, 2014 at 11:07 AM, Patrick Wendell pwend...@gmail.com wrote: Hey Koert, I think disabling the style checks in maven package could be a good idea for the reason you point out. I was sort of mixed on that when it was proposed for this exact reason. It's just annoying to developers. In terms of changing the global limit, this is more religion than anything else, but there are other cases where the current limit is useful (e.g. if you have many windows open in a large screen). - Patrick On Thu, Oct 23, 2014 at 11:03 AM, Koert Kuipers ko...@tresata.com wrote: 100 max width seems very restrictive to me. even the most restrictive environment i have for development (ssh with emacs) i get a lot more characters to work with than that. personally i find the code harder to read, not easier. like i kept wondering why there are weird newlines in the middle of constructors and such, only to realise later it was because of the 100 character limit. also, i find mvn package erroring out because of style errors somewhat excessive. i understand that a pull request needs to conform to the style before being accepted, but this means i cant even run tests on code that does not conform to the style guide, which is a bit silly. i keep going out for coffee while package and tests run, only to come back for an annoying error that my line is 101 characters and therefore nothing ran. is there some maven switch to disable the style checks? best! koert - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: scalastyle annoys me a little bit
Created SPARK-4066 and attached patch there. On Thu, Oct 23, 2014 at 1:07 PM, Koert Kuipers ko...@tresata.com wrote: great thanks i will do that On Thu, Oct 23, 2014 at 3:55 PM, Ted Yu yuzhih...@gmail.com wrote: Koert: If you have time, you can try this diff - with which you would be able to specify the following on the command line: -Dscalastyle.failonviolation=false diff --git a/pom.xml b/pom.xml index 687cc63..108585e 100644 --- a/pom.xml +++ b/pom.xml @@ -123,6 +123,7 @@ log4j.version1.2.17/log4j.version hadoop.version1.0.4/hadoop.version protobuf.version2.4.1/protobuf.version +scalastyle.failonviolationtrue/scalastyle.failonviolation yarn.version${hadoop.version}/yarn.version hbase.version0.94.6/hbase.version flume.version1.4.0/flume.version @@ -1071,7 +1072,7 @@ version0.4.0/version configuration verbosefalse/verbose - failOnViolationtrue/failOnViolation + failOnViolation${scalastyle.failonviolation}/failOnViolation includeTestSourceDirectoryfalse/includeTestSourceDirectory failOnWarningfalse/failOnWarning sourceDirectory${basedir}/src/main/scala/sourceDirectory On Thu, Oct 23, 2014 at 12:07 PM, Koert Kuipers ko...@tresata.com wrote: Hey Ted, i tried: mvn clean package -DskipTests -Dscalastyle.failOnViolation=false no luck, still get [ERROR] Failed to execute goal org.scalastyle:scalastyle-maven-plugin:0.4.0:check (default) on project spark-core_2.10: Failed during scalastyle execution: You have 3 Scalastyle violation(s). - [Help 1] On Thu, Oct 23, 2014 at 2:14 PM, Ted Yu yuzhih...@gmail.com wrote: Koert: Have you tried adding the following on your commandline ? -Dscalastyle.failOnViolation=false Cheers On Thu, Oct 23, 2014 at 11:07 AM, Patrick Wendell pwend...@gmail.com wrote: Hey Koert, I think disabling the style checks in maven package could be a good idea for the reason you point out. I was sort of mixed on that when it was proposed for this exact reason. It's just annoying to developers. In terms of changing the global limit, this is more religion than anything else, but there are other cases where the current limit is useful (e.g. if you have many windows open in a large screen). - Patrick On Thu, Oct 23, 2014 at 11:03 AM, Koert Kuipers ko...@tresata.com wrote: 100 max width seems very restrictive to me. even the most restrictive environment i have for development (ssh with emacs) i get a lot more characters to work with than that. personally i find the code harder to read, not easier. like i kept wondering why there are weird newlines in the middle of constructors and such, only to realise later it was because of the 100 character limit. also, i find mvn package erroring out because of style errors somewhat excessive. i understand that a pull request needs to conform to the style before being accepted, but this means i cant even run tests on code that does not conform to the style guide, which is a bit silly. i keep going out for coffee while package and tests run, only to come back for an annoying error that my line is 101 characters and therefore nothing ran. is there some maven switch to disable the style checks? best! koert - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: create_image.sh contains broken hadoop web link
Have you seen this thread ? http://search-hadoop.com/m/LgpTk2Pnw6O/andrew+apache+mirrorsubj=Re+All+mirrored+download+links+from+the+Apache+Hadoop+site+are+broken Cheers On Wed, Nov 5, 2014 at 7:36 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: As part of my work for SPARK-3821 https://issues.apache.org/jira/browse/SPARK-3821, I tried building an AMI today using create_image.sh. This line https://github.com/mesos/spark-ec2/blob/f6773584dd71afc49f1225be48439653313c0341/create_image.sh#L68 appears to be broken now (it wasn’t a week or so ago). This link appears to be broken: http://apache.mirrors.tds.net/hadoop/common/hadoop-2.4.1/hadoop-2.4.1-src.tar.gz Is this temporary? Should we update this to something else? Nick
Re: create_image.sh contains broken hadoop web link
The artifacts are in archive: http://archive.apache.org/dist/hadoop/common/hadoop-2.4.1/ Cheers On Nov 5, 2014, at 8:07 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Nope, thanks for pointing me to it. Doesn't look like there is a resolution to the issue. Also, the like you pointed to also appears to be broken now: http://apache.mesi.com.ar/hadoop/common/ Nick On Wed, Nov 5, 2014 at 10:43 PM, Ted Yu yuzhih...@gmail.com wrote: Have you seen this thread ? http://search-hadoop.com/m/LgpTk2Pnw6O/andrew+apache+mirrorsubj=Re+All+mirrored+download+links+from+the+Apache+Hadoop+site+are+broken Cheers On Wed, Nov 5, 2014 at 7:36 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: As part of my work for SPARK-3821 https://issues.apache.org/jira/browse/SPARK-3821, I tried building an AMI today using create_image.sh. This line https://github.com/mesos/spark-ec2/blob/f6773584dd71afc49f1225be48439653313c0341/create_image.sh#L68 appears to be broken now (it wasn’t a week or so ago). This link appears to be broken: http://apache.mirrors.tds.net/hadoop/common/hadoop-2.4.1/hadoop-2.4.1-src.tar.gz Is this temporary? Should we update this to something else? Nick
Re: Has anyone else observed this build break?
Sorry for the late reply. I tested my patch on Mac with the following JDK: java version 1.7.0_60 Java(TM) SE Runtime Environment (build 1.7.0_60-b19) Java HotSpot(TM) 64-Bit Server VM (build 24.60-b09, mixed mode) Let me see if the problem can be solved upstream in HBase hbase-annotations module. Cheers On Fri, Nov 14, 2014 at 12:32 PM, Patrick Wendell pwend...@gmail.com wrote: I think in this case we can probably just drop that dependency, so there is a simpler fix. But mostly I'm curious whether anyone else has observed this. On Fri, Nov 14, 2014 at 12:24 PM, Hari Shreedharan hshreedha...@cloudera.com wrote: Seems like a comment on that page mentions a fix, which would add yet another profile though -- specifically telling mvn that if it is an apple jdk, use the classes.jar as the tools.jar as well, since Apple-packaged JDK 6 bundled them together. Link: http://permalink.gmane.org/gmane.comp.java.maven-plugins.mojo.user/4320 I didn't test it, but maybe this can fix it? Thanks, Hari On Fri, Nov 14, 2014 at 12:21 PM, Patrick Wendell pwend...@gmail.com wrote: A work around for this fix is identified here: http://dbknickerbocker.blogspot.com/2013/04/simple-fix-to-missing-toolsjar-in-jdk.html However, if this affects more users I'd prefer to just fix it properly in our build. On Fri, Nov 14, 2014 at 12:17 PM, Patrick Wendell pwend...@gmail.com wrote: A recent patch broke clean builds for me, I am trying to see how widespread this issue is and whether we need to revert the patch. The error I've seen is this when building the examples project: spark-examples_2.10: Could not resolve dependencies for project org.apache.spark:spark-examples_2.10:jar:1.2.0-SNAPSHOT: Could not find artifact jdk.tools:jdk.tools:jar:1.7 at specified path /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/../lib/tools.jar The reason for this error is that hbase-annotations is using a system scoped dependency in their hbase-annotations pom, and this doesn't work with certain JDK layouts such as that provided on Mac OS: http://central.maven.org/maven2/org/apache/hbase/hbase-annotations/0.98.7-hadoop2/hbase-annotations-0.98.7-hadoop2.pom Has anyone else seen this or is it just me? - Patrick - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: Has anyone else observed this build break?
I couldn't reproduce the problem using: java version 1.6.0_65 Java(TM) SE Runtime Environment (build 1.6.0_65-b14-462-11M4609) Java HotSpot(TM) 64-Bit Server VM (build 20.65-b04-462, mixed mode) Since hbase-annotations is a transitive dependency, I created the following pull request to exclude it from various hbase modules: https://github.com/apache/spark/pull/3286 Cheers https://github.com/apache/spark/pull/3286 On Sat, Nov 15, 2014 at 6:56 AM, Ted Yu yuzhih...@gmail.com wrote: Sorry for the late reply. I tested my patch on Mac with the following JDK: java version 1.7.0_60 Java(TM) SE Runtime Environment (build 1.7.0_60-b19) Java HotSpot(TM) 64-Bit Server VM (build 24.60-b09, mixed mode) Let me see if the problem can be solved upstream in HBase hbase-annotations module. Cheers On Fri, Nov 14, 2014 at 12:32 PM, Patrick Wendell pwend...@gmail.com wrote: I think in this case we can probably just drop that dependency, so there is a simpler fix. But mostly I'm curious whether anyone else has observed this. On Fri, Nov 14, 2014 at 12:24 PM, Hari Shreedharan hshreedha...@cloudera.com wrote: Seems like a comment on that page mentions a fix, which would add yet another profile though -- specifically telling mvn that if it is an apple jdk, use the classes.jar as the tools.jar as well, since Apple-packaged JDK 6 bundled them together. Link: http://permalink.gmane.org/gmane.comp.java.maven-plugins.mojo.user/4320 I didn't test it, but maybe this can fix it? Thanks, Hari On Fri, Nov 14, 2014 at 12:21 PM, Patrick Wendell pwend...@gmail.com wrote: A work around for this fix is identified here: http://dbknickerbocker.blogspot.com/2013/04/simple-fix-to-missing-toolsjar-in-jdk.html However, if this affects more users I'd prefer to just fix it properly in our build. On Fri, Nov 14, 2014 at 12:17 PM, Patrick Wendell pwend...@gmail.com wrote: A recent patch broke clean builds for me, I am trying to see how widespread this issue is and whether we need to revert the patch. The error I've seen is this when building the examples project: spark-examples_2.10: Could not resolve dependencies for project org.apache.spark:spark-examples_2.10:jar:1.2.0-SNAPSHOT: Could not find artifact jdk.tools:jdk.tools:jar:1.7 at specified path /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/../lib/tools.jar The reason for this error is that hbase-annotations is using a system scoped dependency in their hbase-annotations pom, and this doesn't work with certain JDK layouts such as that provided on Mac OS: http://central.maven.org/maven2/org/apache/hbase/hbase-annotations/0.98.7-hadoop2/hbase-annotations-0.98.7-hadoop2.pom Has anyone else seen this or is it just me? - Patrick - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: How spark and hive integrate in long term?
bq. spark-0.12 also has some nice feature added Minor correction: you meant Spark 1.2.0 I guess Cheers On Fri, Nov 21, 2014 at 3:45 PM, Zhan Zhang zzh...@hortonworks.com wrote: Thanks Dean, for the information. Hive-on-spark is nice. Spark sql has the advantage to take the full advantage of spark and allows user to manipulate the table as RDD through native spark support. When I tried to upgrade the current hive-0.13.1 support to hive-0.14.0. I found the hive parser is not compatible any more. In the meantime, those new feature introduced in hive-0.14.1, e.g, ACID, etc, is not there yet. In the meantime, spark-0.12 also has some nice feature added which is supported by thrift-server too, e.g., hive-0.13, table cache, etc. Given that both have more and more features added, it would be great if user can take advantage of both. Current, spark sql give us such benefits partially, but I am wondering how to keep such integration in long term. Thanks. Zhan Zhang On Nov 21, 2014, at 3:12 PM, Dean Wampler deanwamp...@gmail.com wrote: I can't comment on plans for Spark SQL's support for Hive, but several companies are porting Hive itself onto Spark: http://blog.cloudera.com/blog/2014/11/apache-hive-on-apache-spark-the-first-demo/ I'm not sure if they are leveraging the old Shark code base or not, but it appears to be a fresh effort. dean Dean Wampler, Ph.D. Author: Programming Scala, 2nd Edition http://shop.oreilly.com/product/0636920033073.do (O'Reilly) Typesafe http://typesafe.com @deanwampler http://twitter.com/deanwampler http://polyglotprogramming.com On Fri, Nov 21, 2014 at 2:51 PM, Zhan Zhang zhaz...@gmail.com wrote: Now Spark and hive integration is a very nice feature. But I am wondering what the long term roadmap is for spark integration with hive. Both of these two projects are undergoing fast improvement and changes. Currently, my understanding is that spark hive sql part relies on hive meta store and basic parser to operate, and the thrift-server intercept hive query and replace it with its own engine. With every release of hive, there need a significant effort on spark part to support it. For the metastore part, we may possibly replace it with hcatalog. But given the dependency of other parts on hive, e.g., metastore, thriftserver, hcatlog may not be able to help much. Does anyone have any insight or idea in mind? Thanks. Zhan Zhang -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/How-spark-and-hive-integrate-in-long-term-tp9482.html Sent from the Apache Spark Developers List mailing list archive at Nabble.com. - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You. - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: Required file not found in building
I tried the same command on MacBook and didn't experience the same error. Which OS are you using ? Cheers On Mon, Dec 1, 2014 at 6:42 PM, Stephen Boesch java...@gmail.com wrote: It seems there were some additional settings required to build spark now . This should be a snap for most of you ot there about what I am missing. Here is the command line I have traditionally used: mvn -Pyarn -Phadoop-2.3 -Phive install compile package -DskipTests That command line is however failing with the lastest from HEAD: INFO] --- scala-maven-plugin:3.2.0:compile (scala-compile-first) @ spark-network-common_2.10 --- [INFO] Using zinc server for incremental compilation [INFO] compiler plugin: BasicArtifact(org.scalamacros,paradise_2.10.4,2.0.1,null) *[error] Required file not found: scala-compiler-2.10.4.jar* *[error] See zinc -help for information about locating necessary files* [INFO] [INFO] Reactor Summary: [INFO] [INFO] Spark Project Parent POM .. SUCCESS [4.077s] [INFO] Spark Project Networking .. FAILURE [0.445s] OK let's try zinc -help: 18:38:00/spark2 $*zinc -help* Nailgun server running with 1 cached compiler Version = 0.3.5.1 Zinc compiler cache limit = 5 Resident scalac cache limit = 0 Analysis cache limit = 5 Compiler(Scala 2.10.4) [74ff364f] Setup = { * scala compiler = /Users/steve/.m2/repository/org/scala-lang/scala-compiler/2.10.4/scala-compiler-2.10.4.jar* scala library = /Users/steve/.m2/repository/org/scala-lang/scala-library/2.10.4/scala-library-2.10.4.jar scala extra = { /Users/steve/.m2/repository/org/scala-lang/scala-reflect/2.10.4/scala-reflect-2.10.4.jar /shared/zinc-0.3.5.1/lib/scala-reflect.jar } sbt interface = /shared/zinc-0.3.5.1/lib/sbt-interface.jar compiler interface sources = /shared/zinc-0.3.5.1/lib/compiler-interface-sources.jar java home = fork java = false cache directory = /Users/steve/.zinc/0.3.5.1 } Does that compiler jar exist? Yes! 18:39:34/spark2 $ll /Users/steve/.m2/repository/org/scala-lang/scala-compiler/2.10.4/scala-compiler-2.10.4.jar -rw-r--r-- 1 steve staff 14445780 Apr 9 2014 /Users/steve/.m2/repository/org/scala-lang/scala-compiler/2.10.4/scala-compiler-2.10.4.jar
Re: Required file not found in building
I used the following for brew: http://repo.typesafe.com/typesafe/zinc/com/typesafe/zinc/dist/0.3.0/zinc-0.3.0.tgz After starting zinc, I issued the same mvn command but didn't encounter the error you saw. FYI On Mon, Dec 1, 2014 at 8:18 PM, Stephen Boesch java...@gmail.com wrote: The zinc src zip for 0.3.5.3 was downloaded and exploded. Then I ran sbt dist/create . zinc is being launched from dist/target/zinc-0.3.5.3/bin/zinc 2014-12-01 20:12 GMT-08:00 Ted Yu yuzhih...@gmail.com: I use zinc 0.2.0 and started zinc with the same command shown below. I don't observe such error. How did you install zinc-0.3.5.3 ? Cheers On Mon, Dec 1, 2014 at 8:00 PM, Stephen Boesch java...@gmail.com wrote: Anyone maybe can assist on how to run zinc with the latest maven build? I am starting zinc as follows: /shared/zinc-0.3.5.3/dist/target/zinc-0.3.5.3/bin/zinc -scala-home $SCALA_HOME -nailed -start The pertinent env vars are: 19:58:11/lib $echo $SCALA_HOME /shared/scala 19:58:14/lib $which scala /shared/scala/bin/scala 19:58:16/lib $scala -version Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL When I do *not *start zinc then the maven build works .. but v slowly since no incremental compiler available. When zinc is started as shown above then the error occurs on all of the modules except parent: [INFO] Using zinc server for incremental compilation [INFO] compiler plugin: BasicArtifact(org.scalamacros,paradise_2.10.4,2.0.1,null) [error] Required file not found: scala-compiler-2.10.4.jar [error] See zinc -help for information about locating necessary files 2014-12-01 19:02 GMT-08:00 Stephen Boesch java...@gmail.com: Mac as well. Just found the problem: I had created an alias to zinc a couple of months back. Apparently that is not happy with the build anymore. No problem now that the issue has been isolated - just need to fix my zinc alias. 2014-12-01 18:55 GMT-08:00 Ted Yu yuzhih...@gmail.com: I tried the same command on MacBook and didn't experience the same error. Which OS are you using ? Cheers On Mon, Dec 1, 2014 at 6:42 PM, Stephen Boesch java...@gmail.com wrote: It seems there were some additional settings required to build spark now . This should be a snap for most of you ot there about what I am missing. Here is the command line I have traditionally used: mvn -Pyarn -Phadoop-2.3 -Phive install compile package -DskipTests That command line is however failing with the lastest from HEAD: INFO] --- scala-maven-plugin:3.2.0:compile (scala-compile-first) @ spark-network-common_2.10 --- [INFO] Using zinc server for incremental compilation [INFO] compiler plugin: BasicArtifact(org.scalamacros,paradise_2.10.4,2.0.1,null) *[error] Required file not found: scala-compiler-2.10.4.jar* *[error] See zinc -help for information about locating necessary files* [INFO] [INFO] Reactor Summary: [INFO] [INFO] Spark Project Parent POM .. SUCCESS [4.077s] [INFO] Spark Project Networking .. 
FAILURE [0.445s] OK let's try zinc -help: 18:38:00/spark2 $*zinc -help* Nailgun server running with 1 cached compiler Version = 0.3.5.1 Zinc compiler cache limit = 5 Resident scalac cache limit = 0 Analysis cache limit = 5 Compiler(Scala 2.10.4) [74ff364f] Setup = { * scala compiler = /Users/steve/.m2/repository/org/scala-lang/scala-compiler/2.10.4/scala-compiler-2.10.4.jar* scala library = /Users/steve/.m2/repository/org/scala-lang/scala-library/2.10.4/scala-library-2.10.4.jar scala extra = { /Users/steve/.m2/repository/org/scala-lang/scala-reflect/2.10.4/scala-reflect-2.10.4.jar /shared/zinc-0.3.5.1/lib/scala-reflect.jar } sbt interface = /shared/zinc-0.3.5.1/lib/sbt-interface.jar compiler interface sources = /shared/zinc-0.3.5.1/lib/compiler-interface-sources.jar java home = fork java = false cache directory = /Users/steve/.zinc/0.3.5.1 } Does that compiler jar exist? Yes! 18:39:34/spark2 $ll /Users/steve/.m2/repository/org/scala-lang/scala-compiler/2.10.4/scala-compiler-2.10.4.jar -rw-r--r-- 1 steve staff 14445780 Apr 9 2014 /Users/steve/.m2/repository/org/scala-lang/scala-compiler/2.10.4/scala-compiler-2.10.4.jar
Re: Unit tests in 5 minutes
Have you seen this thread http://search-hadoop.com/m/JW1q5xxSAa2 ? Test categorization in HBase is done through maven-surefire-plugin Cheers On Thu, Dec 4, 2014 at 4:05 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: fwiw, when we did this work in HBase, we categorized the tests. Then some tests can share a single jvm, while some others need to be isolated in their own jvm. Nevertheless surefire can still run them in parallel by starting/stopping several jvm. I think we need to do this as well. Perhaps the test naming hierarchy can be used to group non-parallelizable tests in the same JVM. For example, here are some Hive tests from our project: org.apache.spark.sql.hive.StatisticsSuite org.apache.spark.sql.hive.execution.HiveQuerySuite org.apache.spark.sql.QueryTest org.apache.spark.sql.parquet.HiveParquetSuite If we group tests by the first 5 parts of their name (e.g. org.apache.spark.sql.hive), then we’d have the first 2 tests run in the same JVM, and the next 2 tests each run in their own JVM. I’m new to this stuff so I’m not sure if I’m going about this in the right way, but you can see my attempt with this approach on GitHub https://github.com/nchammas/spark/blob/ab127b798dbfa9399833d546e627f9651b060918/project/SparkBuild.scala#L388-L397, as well as the related discussion on JIRA https://issues.apache.org/jira/browse/SPARK-3431. If anyone has more feedback on this, I’d love to hear it (either on this thread or in the JIRA issue). Nick On Sun Sep 07 2014 at 8:28:51 PM Nicholas Chammas nicholas.cham...@gmail.com wrote: On Fri, Aug 8, 2014 at 1:12 PM, Reynold Xin r...@databricks.com wrote: Nick, Would you like to file a ticket to track this? SPARK-3431 https://issues.apache.org/jira/browse/SPARK-3431: Parallelize execution of tests Sub-task: SPARK-3432 https://issues.apache.org/jira/browse/SPARK-3432: Fix logging of unit test execution time Nick
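As a rough illustration of the grouping Nick describes above, here is a small self-contained Scala sketch (the five-segment cutoff and the object name are assumptions for illustration, not Spark's actual build code):

object TestGroupingSketch {
  // Group fully-qualified suite names by their first five package segments,
  // so suites sharing a prefix would run in the same JVM.
  def groupKey(suiteName: String): String =
    suiteName.split('.').take(5).mkString(".")

  def main(args: Array[String]): Unit = {
    val suites = Seq(
      "org.apache.spark.sql.hive.StatisticsSuite",
      "org.apache.spark.sql.hive.execution.HiveQuerySuite",
      "org.apache.spark.sql.QueryTest",
      "org.apache.spark.sql.parquet.HiveParquetSuite")
    val groups = suites.groupBy(groupKey)
    // org.apache.spark.sql.hive -> the first two suites (one shared JVM);
    // the other two suites each end up in their own group.
    groups.foreach { case (key, members) => println(s"$key -> $members") }
  }
}

Running this reproduces the split described in the email: the two hive suites share a group, and the remaining two each get their own.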
Re: Unit tests in 5 minutes
bq. I may move on to trying Maven. Maven is my favorite :-) On Sat, Dec 6, 2014 at 10:54 AM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Ted, I posted some updates https://issues.apache.org/jira/browse/SPARK-3431?focusedCommentId=14236540page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14236540 on JIRA on my progress (or lack thereof) getting SBT to parallelize test suites properly. I'm currently stuck with SBT / ScalaTest, so I may move on to trying Maven. Andrew, Once we have a basic grasp of how to parallelize some of the tests, the next step will probably be to use containers (i.e. Docker) to allow more parallelization, especially for those tests that, for example, contend for ports. Nick On Fri Dec 05 2014 at 2:05:29 PM Andrew Or and...@databricks.com wrote: @Patrick and Josh actually we went even further than that. We simply disable the UI for most tests and these used to be the single largest source of port conflict.
Re: Nabble mailing list mirror errors: This post has NOT been accepted by the mailing list yet
Andy: I saw two emails from you from yesterday. See this thread: http://search-hadoop.com/m/JW1q5opRsY1 Cheers On Fri, Dec 19, 2014 at 12:51 PM, Andy Konwinski andykonwin...@gmail.com wrote: Yesterday, I changed the domain name in the mailing list archive settings to remove .incubator so maybe it'll work now. However, I also sent two emails about this through the nabble interface (in this same thread) yesterday and they don't appear to have made it through so not sure if it actually worked after all. Andy On Wed, Dec 17, 2014 at 1:09 PM, Josh Rosen rosenvi...@gmail.com wrote: Yeah, it looks like messages that are successfully posted via Nabble end up on the Apache mailing list, but messages posted directly to Apache aren't mirrored to Nabble anymore because it's based off the incubator mailing list. We should fix this so that Nabble posts to / archives the non-incubator list. On Sat, Dec 13, 2014 at 6:27 PM, Yana Kadiyska yana.kadiy...@gmail.com wrote: Since you mentioned this, I had a related quandry recently -- it also says that the forum archives *u...@spark.incubator.apache.org u...@spark.incubator.apache.org/* *d...@spark.incubator.apache.org d...@spark.incubator.apache.org *respectively, yet the Community page clearly says to email the @spark.apache.org list (but the nabble archive is linked right there too). IMO even putting a clear explanation at the top Posting here requires that you create an account via the UI. Your message will be sent to both spark.incubator.apache.org and spark.apache.org (if that is the case, i'm not sure which alias nabble posts get sent to) would make things a lot more clear. On Sat, Dec 13, 2014 at 5:05 PM, Josh Rosen rosenvi...@gmail.com wrote: I've noticed that several users are attempting to post messages to Spark's user / dev mailing lists using the Nabble web UI ( http://apache-spark-user-list.1001560.n3.nabble.com/). However, there are many posts in Nabble that are not posted to the Apache lists and are flagged with This post has NOT been accepted by the mailing list yet. errors. I suspect that the issue is that users are not completing the sign-up confirmation process ( http://apache-spark-user-list.1001560.n3.nabble.com/mailing_list/MailingListOptions.jtp?forum=1), which is preventing their emails from being accepted by the mailing list. I wanted to mention this issue to the Spark community to see whether there are any good solutions to address this. I have spoken to users who think that our mailing list is unresponsive / inactive because their un-posted messages haven't received any replies. - Josh
Re: Assembly jar file name does not match profile selection
Can you try this command ? sbt/sbt -Pyarn -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive assembly On Fri, Dec 26, 2014 at 6:15 PM, Alessandro Baretta alexbare...@gmail.com wrote: I am building spark with sbt off of branch 1.2. I'm using the following command: sbt/sbt -Pyarn -Phadoop-2.3 assembly (http://spark.apache.org/docs/latest/building-spark.html#building-with-sbt ) Although the jar file I obtain does contain the proper version of the hadoop libraries (v. 2.4), the assembly jar file name refers to hadoop v.1.0.4: ./assembly/target/scala-2.10/spark-assembly-1.3.0-SNAPSHOT-hadoop1.0.4.jar Any idea why? Alex
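One hedged reading of the mismatch, based only on this thread and not on the actual build code: the assembly file name follows a hadoop.version property, and when only profiles are passed the 1.0.4 default wins, which is why the explicit -Dhadoop.version matters. A trivial Scala illustration of that pattern (the default and naming scheme here are assumptions for illustration):

object AssemblyNameSketch extends App {
  // Without an explicit hadoop.version system property, a 1.0.4-style
  // default would show up in the assembly file name regardless of -P flags.
  val hadoopVersion = sys.props.getOrElse("hadoop.version", "1.0.4")
  println(s"spark-assembly-1.3.0-SNAPSHOT-hadoop$hadoopVersion.jar")
}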
Re: Why the major.minor version of the new hive-exec is 51.0?
I extracted org/apache/hadoop/hive/common/CompressionUtils.class from the jar and used hexdump to view the class file. Bytes 6 and 7 are 00 and 33, respectively. According to http://en.wikipedia.org/wiki/Java_class_file, the jar was produced using Java 7. FYI On Tue, Dec 30, 2014 at 8:09 PM, Shixiong Zhu zsxw...@gmail.com wrote: The major.minor version of the new org.spark-project.hive.hive-exec is 51.0, so it will require people use JDK7. Is it intentional?

<dependency>
  <groupId>org.spark-project.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <version>0.12.0-protobuf-2.5</version>
</dependency>

You can use the following steps to reproduce it (Need to use JDK6):

1. Create a Test.java file with the following content:

public class Test {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.hadoop.hive.conf.HiveConf");
  }
}

2. javac Test.java

3. java -classpath ~/.m2/repository/org/spark-project/hive/hive-exec/0.12.0-protobuf-2.5/hive-exec-0.12.0-protobuf-2.5.jar:. Test

Exception in thread "main" java.lang.UnsupportedClassVersionError: org/apache/hadoop/hive/conf/HiveConf : Unsupported major.minor version 51.0
  at java.lang.ClassLoader.defineClass1(Native Method)
  at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
  at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
  at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
  at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
  at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
  at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
  at java.lang.Class.forName0(Native Method)
  at java.lang.Class.forName(Class.java:169)
  at Test.main(Test.java:5)

Best Regards, Shixiong Zhu
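For readers who want to do the same check without hexdump: bytes 4-5 of a class file hold the minor version and bytes 6-7 the major version, so 0x0033 = 51 means Java 7. A minimal, self-contained Scala sketch (the jar path in main is just the jar from this thread, used as an example):

import java.io.DataInputStream
import java.util.zip.ZipFile

object ClassVersionSketch {
  // Returns (major, minor) of a .class entry inside a jar.
  def classVersion(jarPath: String, entryName: String): (Int, Int) = {
    val zip = new ZipFile(jarPath)
    try {
      val entry = zip.getEntry(entryName)
      require(entry != null, s"$entryName not found in $jarPath")
      val in = new DataInputStream(zip.getInputStream(entry))
      try {
        require(in.readInt() == 0xCAFEBABE, s"$entryName is not a class file")
        val minor = in.readUnsignedShort()
        val major = in.readUnsignedShort() // 50 = Java 6, 51 = Java 7
        (major, minor)
      } finally in.close()
    } finally zip.close()
  }

  def main(args: Array[String]): Unit = {
    println(classVersion(
      "hive-exec-0.12.0-protobuf-2.5.jar",
      "org/apache/hadoop/hive/common/CompressionUtils.class"))
  }
}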
Re: Welcoming three new committers
Congratulations, Cheng, Joseph and Sean. On Tue, Feb 3, 2015 at 2:53 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Congratulations guys! On Tue Feb 03 2015 at 2:36:12 PM Matei Zaharia matei.zaha...@gmail.com wrote: Hi all, The PMC recently voted to add three new committers: Cheng Lian, Joseph Bradley and Sean Owen. All three have been major contributors to Spark in the past year: Cheng on Spark SQL, Joseph on MLlib, and Sean on ML and many pieces throughout Spark Core. Join me in welcoming them as committers! Matei - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: Standardized Spark dev environment
How many profiles (hadoop / hive /scala) would this development environment support ? Cheers On Tue, Jan 20, 2015 at 4:13 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: What do y'all think of creating a standardized Spark development environment, perhaps encoded as a Vagrantfile, and publishing it under `dev/`? The goal would be to make it easier for new developers to get started with all the right configs and tools pre-installed. If we use something like Vagrant, we may even be able to make it so that a single Vagrantfile creates equivalent development environments across OS X, Linux, and Windows, without having to do much (or any) OS-specific work. I imagine for committers and regular contributors, this exercise may seem pointless, since y'all are probably already very comfortable with your workflow. I wonder, though, if any of you think this would be worthwhile as a improvement to the new Spark developer experience. Nick
Re: run time exceptions in Spark 1.2.0 manual build together with OpenStack hadoop driver
Please take a look at SPARK-4048 and SPARK-5108 Cheers On Sat, Jan 17, 2015 at 10:26 PM, Gil Vernik g...@il.ibm.com wrote: Hi, I took the source code of Spark 1.2.0 and tried to build it together with hadoop-openstack.jar (to allow Spark access to OpenStack Swift). I used Hadoop 2.6.0. The build was fine without problems, however at run time, while trying to access the swift:// name space I got an exception:

java.lang.NoClassDefFoundError: org/codehaus/jackson/annotate/JsonClass
  at org.codehaus.jackson.map.introspect.JacksonAnnotationIntrospector.findDeserializationType(JacksonAnnotationIntrospector.java:524)
  at org.codehaus.jackson.map.deser.BasicDeserializerFactory.modifyTypeByAnnotation(BasicDeserializerFactory.java:732)
  ...and the long stack trace goes here

Digging into the problem I saw the following: Jackson versions 1.9.X are not backward compatible; in particular they removed the JsonClass annotation. Hadoop 2.6.0 uses jackson-asl version 1.9.13, while Spark references an older version of jackson. This is the main pom.xml of Spark 1.2.0:

<dependency>
  <!-- Matches the version of jackson-core-asl pulled in by avro -->
  <groupId>org.codehaus.jackson</groupId>
  <artifactId>jackson-mapper-asl</artifactId>
  <version>1.8.8</version>
</dependency>

Referencing version 1.8.8, which is not compatible with Hadoop 2.6.0. If we change the version to 1.9.13, then all will work fine and there will be no run time exceptions while accessing Swift. The following change will solve the problem:

<dependency>
  <!-- Matches the version of jackson-core-asl pulled in by avro -->
  <groupId>org.codehaus.jackson</groupId>
  <artifactId>jackson-mapper-asl</artifactId>
  <version>1.9.13</version>
</dependency>

I am trying to resolve this somehow so people will not get into this issue. Is there any particular need in Spark for jackson 1.8.8 and not 1.9.13? Can we remove 1.8.8 and put 1.9.13 for Avro? It looks to me that all works fine when Spark is built with jackson 1.9.13, but I am not an expert and not sure what should be tested. Thanks, Gil Vernik.
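A quick way to check which jackson line actually won on your runtime classpath is to probe for the annotation class the report above says 1.9.X dropped. A hedged Scala sketch, purely a diagnostic and not part of any fix:

object JacksonProbe extends App {
  // Per the report above, org.codehaus.jackson.annotate.JsonClass exists in
  // the 1.8.x line but was removed in 1.9.x, so its presence tells you which
  // jackson jars are effectively on the classpath.
  try {
    val cls = Class.forName("org.codehaus.jackson.annotate.JsonClass")
    println("1.8.x-style jackson found, loaded from: " +
      cls.getProtectionDomain.getCodeSource.getLocation)
  } catch {
    case _: ClassNotFoundException =>
      println("JsonClass not found: a 1.9.x (or newer) jackson is on the classpath")
  }
}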
Re: 1.2.1 start-all.sh broken?
After some googling / trial and error, I got the following working (against a directory with space in its name):

#!/usr/bin/env bash
OLDIFS=$IFS  # save it
IFS=         # don't split on any white space
dir=$1/*
for f in $dir; do
  cat $f
done
IFS=$OLDIFS  # restore IFS

Cheers On Wed, Feb 11, 2015 at 2:47 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: The tragic thing here is that I was asked to review the patch that introduced this https://github.com/apache/spark/pull/3377#issuecomment-68077315, and totally missed it... :( On Wed Feb 11 2015 at 2:46:35 PM Nicholas Chammas nicholas.cham...@gmail.com wrote: lol yeah, I changed the path for the email... turned out to be the issue itself. On Wed Feb 11 2015 at 2:43:09 PM Ted Yu yuzhih...@gmail.com wrote: I see. '/path/to/spark-1.2.1-bin-hadoop2.4' didn't contain space :-) On Wed, Feb 11, 2015 at 2:41 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Found it: https://github.com/apache/spark/compare/v1.2.0...v1.2.1#diff-73058f8e51951ec0b4cb3d48ade91a1fR73 GRRR BASH WORD SPLITTING My path has a space in it... Nick On Wed Feb 11 2015 at 2:37:39 PM Nicholas Chammas nicholas.cham...@gmail.com wrote: This is what I get: spark-1.2.1-bin-hadoop2.4$ ls -1 lib/ datanucleus-api-jdo-3.2.6.jar datanucleus-core-3.2.10.jar datanucleus-rdbms-3.2.9.jar spark-1.2.1-yarn-shuffle.jar spark-assembly-1.2.1-hadoop2.4.0.jar spark-examples-1.2.1-hadoop2.4.0.jar So that looks correct… Hmm. Nick On Wed Feb 11 2015 at 2:34:51 PM Ted Yu yuzhih...@gmail.com wrote: I downloaded the 1.2.1 tar ball for hadoop 2.4 I got: ls lib/ datanucleus-api-jdo-3.2.6.jar datanucleus-rdbms-3.2.9.jar spark-assembly-1.2.1-hadoop2.4.0.jar datanucleus-core-3.2.10.jar spark-1.2.1-yarn-shuffle.jar spark-examples-1.2.1-hadoop2.4.0.jar FYI On Wed, Feb 11, 2015 at 2:27 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: I just downloaded 1.2.1 pre-built for Hadoop 2.4+ and ran sbin/start-all.sh on my OS X. Failed to find Spark assembly in /path/to/spark-1.2.1-bin-hadoop2.4/lib You need to build Spark before running this program. Did the same for 1.2.0 and it worked fine. Nick
Re: 1.2.1 start-all.sh broken?
I downloaded 1.2.1 tar ball for hadoop 2.4 I got: ls lib/ datanucleus-api-jdo-3.2.6.jar datanucleus-rdbms-3.2.9.jar spark-assembly-1.2.1-hadoop2.4.0.jar datanucleus-core-3.2.10.jarspark-1.2.1-yarn-shuffle.jar spark-examples-1.2.1-hadoop2.4.0.jar FYI On Wed, Feb 11, 2015 at 2:27 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: I just downloaded 1.2.1 pre-built for Hadoop 2.4+ and ran sbin/start-all.sh on my OS X. Failed to find Spark assembly in /path/to/spark-1.2.1-bin-hadoop2.4/lib You need to build Spark before running this program. Did the same for 1.2.0 and it worked fine. Nick
Re: 1.2.1 start-all.sh broken?
I see. '/path/to/spark-1.2.1-bin-hadoop2.4' didn't contain space :-) On Wed, Feb 11, 2015 at 2:41 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Found it: https://github.com/apache/spark/compare/v1.2.0...v1.2.1#diff-73058f8e51951ec0b4cb3d48ade91a1fR73 GRRR BASH WORD SPLITTING My path has a space in it... Nick On Wed Feb 11 2015 at 2:37:39 PM Nicholas Chammas nicholas.cham...@gmail.com wrote: This is what get: spark-1.2.1-bin-hadoop2.4$ ls -1 lib/ datanucleus-api-jdo-3.2.6.jar datanucleus-core-3.2.10.jar datanucleus-rdbms-3.2.9.jar spark-1.2.1-yarn-shuffle.jar spark-assembly-1.2.1-hadoop2.4.0.jar spark-examples-1.2.1-hadoop2.4.0.jar So that looks correct… Hmm. Nick On Wed Feb 11 2015 at 2:34:51 PM Ted Yu yuzhih...@gmail.com wrote: I downloaded 1.2.1 tar ball for hadoop 2.4 I got: ls lib/ datanucleus-api-jdo-3.2.6.jar datanucleus-rdbms-3.2.9.jar spark-assembly-1.2.1-hadoop2.4.0.jar datanucleus-core-3.2.10.jarspark-1.2.1-yarn-shuffle.jar spark-examples-1.2.1-hadoop2.4.0.jar FYI On Wed, Feb 11, 2015 at 2:27 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: I just downloaded 1.2.1 pre-built for Hadoop 2.4+ and ran sbin/start-all.sh on my OS X. Failed to find Spark assembly in /path/to/spark-1.2.1-bin-hadoop2.4/lib You need to build Spark before running this program. Did the same for 1.2.0 and it worked fine. Nick
Re: Intellij IDEA 14 env setup; NoClassDefFoundError when run examples
Have you read / followed this ? https://cwiki.apache.org/confluence/display/SPARK /Useful+Developer+Tools#UsefulDeveloperTools-BuildingSparkinIntelliJIDEA Cheers On Sat, Jan 31, 2015 at 8:01 PM, Yafeng Guo daniel.yafeng@gmail.com wrote: Hi, I'm setting up a dev environment with Intellij IDEA 14. I selected profile scala-2.10, maven-3, hadoop 2.4, hive, hive 0.13.1. The compilation passed. But when I try to run LogQuery in examples, I met below issue: Connected to the target VM, address: '127.0.0.1:37182', transport: 'socket' Exception in thread main java.lang.NoClassDefFoundError: org/apache/spark/SparkConf at org.apache.spark.examples.LogQuery$.main(LogQuery.scala:46) at org.apache.spark.examples.LogQuery.main(LogQuery.scala) Caused by: java.lang.ClassNotFoundException: org.apache.spark.SparkConf at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) ... 2 more Disconnected from the target VM, address: '127.0.0.1:37182', transport: 'socket' anyone met similar issue before? Thanks a lot Regards, Ya-Feng
Re: python converter in HBaseConverter.scala(spark/examples)
HBaseConverter is in Spark source tree. Therefore I think it makes sense for this improvement to be accepted so that the example is more useful. Cheers On Mon, Jan 5, 2015 at 7:54 AM, Nick Pentreath nick.pentre...@gmail.com wrote: Hey These converters are actually just intended to be examples of how to set up a custom converter for a specific input format. The converter interface is there to provide flexibility where needed, although with the new SparkSQL data store interface the intention is that most common use cases can be handled using that approach rather than custom converters. The intention is not to have specific converters living in Spark core, which is why these are in the examples project. Having said that, if you wish to expand the example converter for others reference do feel free to submit a PR. Ideally though, I would think that various custom converters would be part of external projects that can be listed with http://spark-packages.org/ I see your project is already listed there. — Sent from Mailbox https://www.dropbox.com/mailbox On Mon, Jan 5, 2015 at 5:37 PM, Ted Yu yuzhih...@gmail.com wrote: In my opinion this would be useful - there was another thread where returning only the value of first column in the result was mentioned. Please create a SPARK JIRA and a pull request. Cheers On Mon, Jan 5, 2015 at 6:42 AM, tgbaggio gen.tan...@gmail.com wrote: Hi, In HBaseConverter.scala https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/pythonconverters/HBaseConverters.scala , the python converter HBaseResultToStringConverter return only the value of first column in the result. In my opinion, it limits the utility of this converter, because it returns only one value per row and moreover it loses the other information of record, such as column:cell, timestamp. Therefore, I would like to propose some modifications about HBaseResultToStringConverter which will be able to return all records in the hbase with more complete information: I have already written some code in pythonConverters.scala https://github.com/GenTang/spark_hbase/blob/master/src/main/scala/examples/pythonConverters.scala and it works Is it OK to modify the code in HBaseConverters.scala, please? Thanks a lot in advance. Cheers Gen -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/python-converter-in-HBaseConverter-scala-spark-examples-tp10001.html Sent from the Apache Spark Developers List mailing list archive at Nabble.com. - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
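For reference, a hedged Scala sketch of what a fuller example converter could look like (the class name and output format here are made up for illustration; the actual proposal is in the linked pythonConverters.scala). It keeps family:qualifier, timestamp and value for every cell of the row instead of only the first column's value:

import org.apache.hadoop.hbase.CellUtil
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.api.python.Converter

// Hypothetical example converter, not the code in the PR.
class HBaseResultToDetailedStringConverter extends Converter[Any, Any] {
  override def convert(obj: Any): Any = {
    val result = obj.asInstanceOf[Result]
    result.rawCells().map { cell =>
      Bytes.toStringBinary(CellUtil.cloneFamily(cell)) + ":" +
        Bytes.toStringBinary(CellUtil.cloneQualifier(cell)) + "@" +
        cell.getTimestamp + "=" +
        Bytes.toStringBinary(CellUtil.cloneValue(cell))
    }.mkString("\n") // one line per cell, all cells of the row are kept
  }
}

Returning a single string keeps the converter drop-in compatible with the existing example while preserving the per-cell detail the thread asks for.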
Re: Results of tests
For a build which uses JUnit, we would see a summary such as the following ( https://builds.apache.org/job/HBase-TRUNK/6007/console): Tests run: 2199, Failures: 0, Errors: 0, Skipped: 25 In https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-1.2-Maven-with-YARN/lastSuccessfulBuild/HADOOP_PROFILE=hadoop-2.4,label=centos/consoleFull , I don't see such statistics. Looks like scalatest-maven-plugin can be enhanced :-) On Fri, Jan 9, 2015 at 3:52 AM, Sean Owen so...@cloudera.com wrote: Hey Tony, the number of tests run could vary depending on how the build is configured. For example, YARN-related tests would only run when the yarn profile is turned on. Java 8 tests would only run under Java 8. Although I don't know that there's any reason to believe the IBM JVM has a problem with Spark, I see this issue that is potentially related to endian-ness : https://issues.apache.org/jira/browse/SPARK-2018 I don't know if that was a Spark issue. Certainly, would be good for you to investigate if you are interested in resolving it. The Jenkins output shows you exactly what tests were run and how -- have a look at the logs. https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-1.2-Maven-with-YARN/lastSuccessfulBuild/HADOOP_PROFILE=hadoop-2.4,label=centos/consoleFull On Fri, Jan 9, 2015 at 9:15 AM, Tony Reix tony.r...@bull.net wrote: Hi Ted Thanks for the info. However, I'm still unable to understand how the page: https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-1.2-Maven-with-YARN/lastSuccessfulBuild/HADOOP_PROFILE=hadoop-2.4,label=centos/testReport/ has been built. This page contains details I do not find in the page you indicated to me: https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-1.2-Maven-with-YARN/lastSuccessfulBuild/HADOOP_PROFILE=hadoop-2.4,label=centos/consoleFull As an example, I'm still unable to find these details: org.apache.spark https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-1.2-Maven-with-YARN/lastSuccessfulBuild/HADOOP_PROFILE=hadoop-2.4,label=centos/testReport/org.apache.spark/ 12 mn 0 1 247 248 org.apache.spark.api.python https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-1.2-Maven-with-YARN/lastSuccessfulBuild/HADOOP_PROFILE=hadoop-2.4,label=centos/testReport/org.apache.spark.api.python/ 20 ms 0 0 2 2 org.apache.spark.bagel https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-1.2-Maven-with-YARN/lastSuccessfulBuild/HADOOP_PROFILE=hadoop-2.4,label=centos/testReport/org.apache.spark.bagel/ 7.7 s 0 0 4 4 org.apache.spark.broadcast https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-1.2-Maven-with-YARN/lastSuccessfulBuild/HADOOP_PROFILE=hadoop-2.4,label=centos/testReport/org.apache.spark.broadcast/ 43 s0 0 17 17 org.apache.spark.deploy https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-1.2-Maven-with-YARN/lastSuccessfulBuild/HADOOP_PROFILE=hadoop-2.4,label=centos/testReport/org.apache.spark.deploy/ 16 s0 0 29 29 org.apache.spark.deploy.worker https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-1.2-Maven-with-YARN/lastSuccessfulBuild/HADOOP_PROFILE=hadoop-2.4,label=centos/testReport/org.apache.spark.deploy.worker/ 0.55 s 0 0 12 12 Moreover, in my Ubuntu/x86_64 environment, I do not find 3745 tests and 0 failures, but 3485 tests and 4 failures (when using Oracle JVM 1.7 ). When using IBM JVM, there are only 2566 tests and 5 failures (in same component: Streaming). On my PPC64BE (BE = Big-Endian)environment, the tests block after 2 hundreds of tests. Is Spark independent of Little/Big-Endian stuff ? 
On my PPC64LE (LE = Little-Endian) environment, I have 3485 tests only (like on Ubuntu/x86_64 with IBM JVM), with 6 or 285 failures... So, I need to learn more about how your Jenkins environment extracts details about the results. Moreover, which JVM is used ? Do you plan to use IBM JVM in order to check that Spark and IBM JVM are compatible ? (they already do not look to be compatible 100% ...). Thanks Tony IBM Coop Architect Technical Leader Office : +33 (0) 4 76 29 72 67 1 rue de Provence - 38432 Échirolles - France www.atos.nethttp://www.atos.net/ De : Ted Yu [yuzhih...@gmail.com] Envoyé : jeudi 8 janvier 2015 17:43 À : Tony Reix Cc : dev@spark.apache.org Objet : Re: Results of tests Here it is: [centos] $ /home/jenkins/tools/hudson.tasks.Maven_MavenInstallation/Maven_3.0.5/bin/mvn -DHADOOP_PROFILE=hadoop-2.4 -Dlabel=centos -DskipTests -Phadoop-2.4 -Pyarn -Phive clean package You can find the above in https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-1.2-Maven-with-YARN/lastSuccessfulBuild/HADOOP_PROFILE
Re: Results of tests
I noticed that org.apache.spark.sql.hive.execution has a lot of tests skipped. Is there plan to enable these tests on Jenkins (so that there is no regression across releases) ? Cheers On Fri, Jan 9, 2015 at 11:46 AM, Josh Rosen rosenvi...@gmail.com wrote: The Test Result pages for Jenkins builds shows some nice statistics for the test run, including individual test times: https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-1.2-Maven-with-YARN/lastSuccessfulBuild/HADOOP_PROFILE=hadoop-2.4,label=centos/testReport/ Currently this only covers the Java / Scala tests, but we might be able to integrate the PySpark tests here, too (I think it's just a matter of getting the Python test runner to generate the correct test result XML output). On Fri, Jan 9, 2015 at 10:47 AM, Ted Yu yuzhih...@gmail.com wrote: For a build which uses JUnit, we would see a summary such as the following ( https://builds.apache.org/job/HBase-TRUNK/6007/console): Tests run: 2199, Failures: 0, Errors: 0, Skipped: 25 In https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-1.2-Maven-with-YARN/lastSuccessfulBuild/HADOOP_PROFILE=hadoop-2.4,label=centos/consoleFull , I don't see such statistics. Looks like scalatest-maven-plugin can be enhanced :-) On Fri, Jan 9, 2015 at 3:52 AM, Sean Owen so...@cloudera.com wrote: Hey Tony, the number of tests run could vary depending on how the build is configured. For example, YARN-related tests would only run when the yarn profile is turned on. Java 8 tests would only run under Java 8. Although I don't know that there's any reason to believe the IBM JVM has a problem with Spark, I see this issue that is potentially related to endian-ness : https://issues.apache.org/jira/browse/SPARK-2018 I don't know if that was a Spark issue. Certainly, would be good for you to investigate if you are interested in resolving it. The Jenkins output shows you exactly what tests were run and how -- have a look at the logs. https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-1.2-Maven-with-YARN/lastSuccessfulBuild/HADOOP_PROFILE=hadoop-2.4,label=centos/consoleFull On Fri, Jan 9, 2015 at 9:15 AM, Tony Reix tony.r...@bull.net wrote: Hi Ted Thanks for the info. However, I'm still unable to understand how the page: https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-1.2-Maven-with-YARN/lastSuccessfulBuild/HADOOP_PROFILE=hadoop-2.4,label=centos/testReport/ has been built. 
This page contains details I do not find in the page you indicated to me: https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-1.2-Maven-with-YARN/lastSuccessfulBuild/HADOOP_PROFILE=hadoop-2.4,label=centos/consoleFull As an example, I'm still unable to find these details: org.apache.spark https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-1.2-Maven-with-YARN/lastSuccessfulBuild/HADOOP_PROFILE=hadoop-2.4,label=centos/testReport/org.apache.spark/ 12 mn 0 1 247 248 org.apache.spark.api.python https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-1.2-Maven-with-YARN/lastSuccessfulBuild/HADOOP_PROFILE=hadoop-2.4,label=centos/testReport/org.apache.spark.api.python/ 20 ms 0 0 2 2 org.apache.spark.bagel https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-1.2-Maven-with-YARN/lastSuccessfulBuild/HADOOP_PROFILE=hadoop-2.4,label=centos/testReport/org.apache.spark.bagel/ 7.7 s 0 0 4 4 org.apache.spark.broadcast https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-1.2-Maven-with-YARN/lastSuccessfulBuild/HADOOP_PROFILE=hadoop-2.4,label=centos/testReport/org.apache.spark.broadcast/ 43 s0 0 17 17 org.apache.spark.deploy https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-1.2-Maven-with-YARN/lastSuccessfulBuild/HADOOP_PROFILE=hadoop-2.4,label=centos/testReport/org.apache.spark.deploy/ 16 s0 0 29 29 org.apache.spark.deploy.worker https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-1.2-Maven-with-YARN/lastSuccessfulBuild/HADOOP_PROFILE=hadoop-2.4,label=centos/testReport/org.apache.spark.deploy.worker/ 0.55 s 0 0 12 12 Moreover, in my Ubuntu/x86_64 environment, I do not find 3745 tests and 0 failures, but 3485 tests and 4 failures (when using Oracle JVM 1.7 ). When using IBM JVM, there are only 2566 tests and 5 failures (in same component: Streaming). On my PPC64BE (BE = Big-Endian)environment, the tests block after 2 hundreds of tests. Is Spark independent of Little/Big-Endian stuff ? On my PPC64LE (LE = Little-Endian) environment, I have 3485 tests only (like on Ubuntu/x86_64 with IBM JVM), with 6 or 285 failures... So, I need to learn more about how
Re: python converter in HBaseConverter.scala(spark/examples)
In my opinion this would be useful - there was another thread where returning only the value of first column in the result was mentioned. Please create a SPARK JIRA and a pull request. Cheers On Mon, Jan 5, 2015 at 6:42 AM, tgbaggio gen.tan...@gmail.com wrote: Hi, In HBaseConverter.scala https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/pythonconverters/HBaseConverters.scala , the python converter HBaseResultToStringConverter return only the value of first column in the result. In my opinion, it limits the utility of this converter, because it returns only one value per row and moreover it loses the other information of record, such as column:cell, timestamp. Therefore, I would like to propose some modifications about HBaseResultToStringConverter which will be able to return all records in the hbase with more complete information: I have already written some code in pythonConverters.scala https://github.com/GenTang/spark_hbase/blob/master/src/main/scala/examples/pythonConverters.scala and it works Is it OK to modify the code in HBaseConverters.scala, please? Thanks a lot in advance. Cheers Gen -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/python-converter-in-HBaseConverter-scala-spark-examples-tp10001.html Sent from the Apache Spark Developers List mailing list archive at Nabble.com. - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: Results of tests
Here it is: [centos] $ /home/jenkins/tools/hudson.tasks.Maven_MavenInstallation/Maven_3.0.5/bin/mvn -DHADOOP_PROFILE=hadoop-2.4 -Dlabel=centos -DskipTests -Phadoop-2.4 -Pyarn -Phive clean package You can find the above in https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-1.2-Maven-with-YARN/lastSuccessfulBuild/HADOOP_PROFILE=hadoop-2.4,label=centos/consoleFull Cheers On Thu, Jan 8, 2015 at 8:05 AM, Tony Reix tony.r...@bull.net wrote: Thanks ! I've been able to see that there are 3745 tests for version 1.2.0 with profile Hadoop 2.4 . However, on my side, the maximum tests I've seen are 3485... About 300 tests are missing on my side. Which Maven option has been used for producing the report file used for building the page: https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-1.2-Maven-with-YARN/lastSuccessfulBuild/HADOOP_PROFILE=hadoop-2.4,label=centos/testReport/ ? (I'm not authorized to look at the configuration part) Thx ! Tony -- *De :* Ted Yu [yuzhih...@gmail.com] *Envoyé :* jeudi 8 janvier 2015 16:11 *À :* Tony Reix *Cc :* dev@spark.apache.org *Objet :* Re: Results of tests Please take a look at https://amplab.cs.berkeley.edu/jenkins/view/Spark/ On Thu, Jan 8, 2015 at 5:40 AM, Tony Reix tony.r...@bull.net wrote: Hi, I'm checking that Spark works fine on a new environment (PPC64 hardware). I've found some issues, with versions 1.1.0, 1.1.1, and 1.2.0, even when running on Ubuntu on x86_64 with Oracle JVM. I'd like to know where I can find the results of the tests of Spark, for each version and for the different versions, in order to have a reference to compare my results with. I cannot find them on Spark web-site. Thx Tony
Re: Wrong version on the Spark documentation page
When I enter http://spark.apache.org/docs/latest/ into Chrome address bar, I saw 1.3.0 Cheers On Sun, Mar 15, 2015 at 11:12 AM, Patrick Wendell pwend...@gmail.com wrote: Cheng - what if you hold shift+refresh? For me the /latest link correctly points to 1.3.0 On Sun, Mar 15, 2015 at 10:40 AM, Cheng Lian lian.cs@gmail.com wrote: It's still marked as 1.2.1 here http://spark.apache.org/docs/latest/ But this page is updated (1.3.0) http://spark.apache.org/docs/latest/index.html Cheng - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: Error: 'SparkContext' object has no attribute 'getActiveStageIds'
Please take a look at core/src/main/scala/org/apache/spark/SparkStatusTracker.scala, around line 58: def getActiveStageIds(): Array[Int] = { Cheers On Fri, Mar 20, 2015 at 3:59 PM, xing ehomec...@gmail.com wrote: getStageInfo in self._jtracker.getStageInfo below seems not implemented/included in the current python library. def getStageInfo(self, stageId): Returns a :class:`SparkStageInfo` object, or None if the stage info could not be found or was garbage collected. stage = self._jtracker.getStageInfo(stageId) if stage is not None: # TODO: fetch them in batch for better performance attrs = [getattr(stage, f)() for f in SparkStageInfo._fields[1:]] return SparkStageInfo(stageId, *attrs) -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Error-SparkContext-object-has-no-attribute-getActiveStageIds-tp11136p11140.html Sent from the Apache Spark Developers List mailing list archive at Nabble.com. - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
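For reference, the corresponding calls on the Scala side (a small sketch against SparkStatusTracker, assuming an active SparkContext sc; the progress printout is just illustrative):

// Poll the JVM-side status tracker for active stages and their progress.
val tracker = sc.statusTracker
val activeStageIds: Array[Int] = tracker.getActiveStageIds()
activeStageIds.flatMap(id => tracker.getStageInfo(id)).foreach { info =>
  println(s"stage ${info.stageId()}: ${info.numCompletedTasks()}/${info.numTasks()} tasks completed")
}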
Re: GitHub Syncing Down
Looks like github is functioning again (I no longer encounter this problem when pushing to hbase repo). Do you want to give it a try ? Cheers On Tue, Mar 10, 2015 at 6:54 PM, Michael Armbrust mich...@databricks.com wrote: FYI: https://issues.apache.org/jira/browse/INFRA-9259
Re: Jira Issues
Issues are tracked on Apache JIRA: https://issues.apache.org/jira/browse/SPARK/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel Cheers On Wed, Mar 25, 2015 at 1:51 PM, Igor Costa igorco...@apache.org wrote: Hi there Guys. I want to be more collaborative to Spark, but I have two questions. Issues are used in Github or jira Issues? If so on Jira, Is there a way I can get in to see the issues? I've tried to login but no success. I'm PMC from another Apache project, flex.apache.org Best Regards Igor
Re: should we add a start-masters.sh script in sbin?
Sounds good to me. On Tue, Mar 31, 2015 at 6:12 PM, sequoiadb mailing-list-r...@sequoiadb.com wrote: Hey, start-slaves.sh script is able to read from slaves file and start slaves node in multiple boxes. However in standalone mode if I want to use multiple masters, I’ll have to start masters in each individual box, and also need to provide the list of masters’ hostname+port to each worker. ( start-slaves.sh only take 1 master ip+port for now) I wonder should we create a new script called start-masters.sh to read conf/masters file? Also start-slaves.sh script may need to change a little bit so that master list can be passed to worker nodes. Thanks - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: One corrupt gzip in a directory of 100s
bq. writing the output (to Amazon S3) failed What's the value of fs.s3.maxRetries ? Increasing the value should help. Cheers On Wed, Apr 1, 2015 at 8:34 AM, Romi Kuntsman r...@totango.com wrote: What about communication errors and not corrupted files? Both when reading input and when writing output. We currently experience a failure of the entire process, if the last stage of writing the output (to Amazon S3) failed because of a very temporary DNS resolution issue (easily resolved by retrying). *Romi Kuntsman*, *Big Data Engineer* http://www.totango.com On Wed, Apr 1, 2015 at 12:58 PM, Gil Vernik g...@il.ibm.com wrote: I actually saw the same issue, where we analyzed some container with few hundreds of GBs zip files - one was corrupted and Spark exit with Exception on the entire job. I like SPARK-6593, since it can cover also additional cases, not just in case of corrupted zip files. From: Dale Richardson dale...@hotmail.com To: dev@spark.apache.org dev@spark.apache.org Date: 29/03/2015 11:48 PM Subject:One corrupt gzip in a directory of 100s Recently had an incident reported to me where somebody was analysing a directory of gzipped log files, and was struggling to load them into spark because one of the files was corrupted - calling sc.textFiles('hdfs:///logs/*.gz') caused an IOException on the particular executor that was reading that file, which caused the entire job to be cancelled after the retry count was exceeded, without any way of catching and recovering from the error. While normally I think it is entirely appropriate to stop execution if something is wrong with your input, sometimes it is useful to analyse what you can get (as long as you are aware that input has been skipped), and treat corrupt files as acceptable losses. To cater for this particular case I've added SPARK-6593 (PR at https://github.com/apache/spark/pull/5250). Which adds an option (spark.hadoop.ignoreInputErrors) to log exceptions raised by the hadoop Input format, but to continue on with the next task. Ideally in this case you would want to report the corrupt file paths back to the master so they could be dealt with in a particular way (eg moved to a separate directory), but that would require a public API change/addition. I was pondering on an addition to Spark's hadoop API that could report processing status back to the master via an optional accumulator that collects filepath/Option(exception message) tuples so the user has some idea of what files are being processed, and what files are being skipped. Regards,Dale.
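A side note on the "report skipped inputs via an accumulator" idea: a Try around your own parsing logic will not catch the decompression IOException thrown inside the Hadoop input format (that is exactly what SPARK-6593 addresses), but the reporting pattern itself can be sketched at the record level like this (the field layout and paths are made up for illustration):

import scala.collection.mutable.ArrayBuffer
import scala.util.{Failure, Success, Try}

// Collect (record, error) pairs on the driver while keeping the good records.
// Accumulator updates made inside transformations can be re-applied on task
// retries, so treat the contents as indicative rather than exact.
val badRecords = sc.accumulableCollection(ArrayBuffer[(String, String)]())

val messages = sc.textFile("hdfs:///logs/*.gz").flatMap { line =>
  Try(line.split(",", 3)(2)) match {   // pretend format: "timestamp,level,message"
    case Success(msg) => Some(msg)
    case Failure(e)   => badRecords += ((line, e.toString)); None
  }
}
messages.count()
println(s"skipped ${badRecords.value.size} malformed records")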
Re: trouble with sbt building network-* projects?
bq. to be able to run my tests in sbt, though, it makes the development iterations much faster. Was the preference for sbt due to long maven build time ? Have you started Zinc on your machine ? Cheers On Fri, Feb 27, 2015 at 11:10 AM, Imran Rashid iras...@cloudera.com wrote: Has anyone else noticed very strange build behavior in the network-* projects? maven seems to the doing the right, but sbt is very inconsistent. Sometimes when it builds network-shuffle it doesn't know about any of the code in network-common. Sometimes it will completely skip the java unit tests. And then some time later, it'll suddenly decide it knows about some more of the java unit tests. Its not from a simple change, like touching a test file, or a file the test depends on -- nor a restart of sbt. I am pretty confused. maven had issues when I tried to add scala code to network-common, it would compile the scala code but not make it available to java. I'm working around that by just coding in java anyhow. I'd really like to be able to run my tests in sbt, though, it makes the development iterations much faster. thanks, Imran
Re: trouble with sbt building network-* projects?
bq. I have to keep cd'ing into network/common, run mvn install, then go back to network/shuffle and run some other mvn command over there. Yeah - been through this. Having continuous testing for maven would be nice. On Fri, Feb 27, 2015 at 11:31 AM, Imran Rashid iras...@cloudera.com wrote: well, perhaps I just need to learn to use maven better, but currently I find sbt much more convenient for continuously running my tests. I do use zinc, but I'm looking for continuous testing. This makes me think I need sbt for that: http://stackoverflow.com/questions/11347633/is-there-a-java-continuous-testing-plugin-for-maven 1) I really like that in sbt I can run ~test-only com.foo.bar.SomeTestSuite (or whatever other pattern) and just leave that running as I code, without having to go and explicitly trigger mvn test and wait for the result. 2) I find sbt's handling of sub-projects much simpler (when it works). I'm trying to make changes to network/common network/shuffle, which means I have to keep cd'ing into network/common, run mvn install, then go back to network/shuffle and run some other mvn command over there. I don't want to run mvn at the root project level, b/c I don't want to wait for it to compile all the other projects when I just want to run tests in network/common. Even with incremental compiling, in my day-to-day coding I want to entirely skip compiling sql, graphx, mllib etc. -- I have to switch branches often enough that i end up triggering a full rebuild of those projects even when I haven't touched them. On Fri, Feb 27, 2015 at 1:14 PM, Ted Yu yuzhih...@gmail.com wrote: bq. to be able to run my tests in sbt, though, it makes the development iterations much faster. Was the preference for sbt due to long maven build time ? Have you started Zinc on your machine ? Cheers On Fri, Feb 27, 2015 at 11:10 AM, Imran Rashid iras...@cloudera.com wrote: Has anyone else noticed very strange build behavior in the network-* projects? maven seems to the doing the right, but sbt is very inconsistent. Sometimes when it builds network-shuffle it doesn't know about any of the code in network-common. Sometimes it will completely skip the java unit tests. And then some time later, it'll suddenly decide it knows about some more of the java unit tests. Its not from a simple change, like touching a test file, or a file the test depends on -- nor a restart of sbt. I am pretty confused. maven had issues when I tried to add scala code to network-common, it would compile the scala code but not make it available to java. I'm working around that by just coding in java anyhow. I'd really like to be able to run my tests in sbt, though, it makes the development iterations much faster. thanks, Imran
Re: org.spark-project.jetty and guava repo locations
Take a look at the maven-shade-plugin in pom.xml. Here is the snippet for org.spark-project.jetty:

<relocation>
  <pattern>org.eclipse.jetty</pattern>
  <shadedPattern>org.spark-project.jetty</shadedPattern>
  <includes>
    <include>org.eclipse.jetty.**</include>
  </includes>
</relocation>

On Thu, Apr 2, 2015 at 3:59 AM, Niranda Perera niranda.per...@gmail.com wrote: Hi, I am looking for the org.spark-project.jetty and org.spark-project.guava repo locations but I'm unable to find it in the maven repository. are these publicly available? rgds -- Niranda
Re: [sql] Dataframe how to check null values
I found: https://issues.apache.org/jira/browse/SPARK-6573 On Apr 20, 2015, at 4:29 AM, Peter Rudenko petro.rude...@gmail.com wrote: Sounds very good. Is there a jira for this? Would be cool to have in 1.4, because currently I cannot use the dataframe.describe function with NaN values and need to filter all the columns manually. Thanks, Peter Rudenko On 2015-04-02 21:18, Reynold Xin wrote: Incidentally, we were discussing this yesterday. Here are some thoughts on null handling in SQL/DataFrames. Would be great to get some feedback. 1. Treat floating point NaN and null as the same null value. This would be consistent with most SQL databases, and Pandas. This would also require some inbound conversion. 2. Internally, when we see a NaN value, we should mark the null bit as true, and keep the NaN value. When we see a null value for a floating point field, we should mark the null bit as true, and update the field to store NaN. 3. Externally, for floating point values, return NaN when the value is null. 4. For all other types, return null for null values. 5. For UDFs, if the argument is primitive type only (i.e. does not handle null) and not a floating point field, simply evaluate the expression to null. This is consistent with most SQL UDFs and most programming languages' treatment of NaN. Any thoughts on this semantics? On Thu, Apr 2, 2015 at 5:51 AM, Dean Wampler deanwamp...@gmail.com wrote: I'm afraid you're a little stuck. In Scala, the types Int, Long, Float, Double, Byte, and Boolean look like reference types in source code, but they are compiled to the corresponding JVM primitive types, which can't be null. That's why you get the warning about ==. Your best choice might be to use NaN as the placeholder for null, then create one DF using a filter that removes those values. Use that DF to compute the mean. Then apply a map step to the original DF to translate the NaN's to the mean. dean Dean Wampler, Ph.D. Author: Programming Scala, 2nd Edition http://shop.oreilly.com/product/0636920033073.do (O'Reilly) Typesafe http://typesafe.com @deanwampler http://twitter.com/deanwampler http://polyglotprogramming.com On Thu, Apr 2, 2015 at 7:54 AM, Peter Rudenko petro.rude...@gmail.com wrote: Hi, I need to implement a MeanImputor - impute missing values with the mean. If I set missing values to null, then DataFrame aggregation works properly, but in a UDF it treats null values as 0.0. Here's an example:

val df = sc.parallelize(Array(1.0, 2.0, null, 3.0, 5.0, null)).toDF
df.agg(avg("_1")).first
// res45: org.apache.spark.sql.Row = [2.75]
df.withColumn("d2", callUDF({ (value: Double) => value }, DoubleType, df("d"))).show()

d    d2
1.0  1.0
2.0  2.0
null 0.0
3.0  3.0
5.0  5.0
null 0.0

val df = sc.parallelize(Array(1.0, 2.0, Double.NaN, 3.0, 5.0, Double.NaN)).toDF
df.agg(avg("_1")).first
// res46: org.apache.spark.sql.Row = [Double.NaN]

In a UDF I cannot compare Scala's Double to null:

comparing values of types Double and Null using `==' will always yield false
[warn] if (value == null) meanValue else value

With Double.NaN instead of null I can compare in the UDF, but the aggregation doesn't work properly. Maybe it's related to: https://issues.apache.org/jira/browse/SPARK-6573 Thanks, Peter Rudenko - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
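For what it's worth, a small sketch of the workaround Dean describes, done at the RDD level to sidestep the UDF/null issue (the column name "d" and the use of NaN as the missing-value marker are assumptions for illustration):

// Compute the mean over the non-NaN values, then map NaN to that mean.
val values = df.select("d").map(_.getDouble(0))   // DataFrame.map returns an RDD in 1.3
val present = values.filter(!_.isNaN)
val mean = present.sum() / present.count()
val imputed = values.map(v => if (v.isNaN) mean else v)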
Re: [discuss] DataFrame function namespacing
IMHO I would go with choice #1 Cheers On Wed, Apr 29, 2015 at 10:03 PM, Reynold Xin r...@databricks.com wrote: We definitely still have the name collision problem in SQL. On Wed, Apr 29, 2015 at 10:01 PM, Punyashloka Biswal punya.bis...@gmail.com wrote: Do we still have to keep the names of the functions distinct to avoid collisions in SQL? Or is there a plan to allow importing a namespace into SQL somehow? I ask because if we have to keep worrying about name collisions then I'm not sure what the added complexity of #2 and #3 buys us. Punya On Wed, Apr 29, 2015 at 3:52 PM Reynold Xin r...@databricks.com wrote: Scaladoc isn't much of a problem because scaladocs are grouped. Java/Python is the main problem ... See https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.functions$ On Wed, Apr 29, 2015 at 3:38 PM, Shivaram Venkataraman shiva...@eecs.berkeley.edu wrote: My feeling is that we should have a handful of namespaces (say 4 or 5). It becomes too cumbersome to import / remember more package names and having everything in one package makes it hard to read scaladoc etc. Thanks Shivaram On Wed, Apr 29, 2015 at 3:30 PM, Reynold Xin r...@databricks.com wrote: To add a little bit more context, some pros/cons I can think of are: Option 1: Very easy for users to find the function, since they are all in org.apache.spark.sql.functions. However, there will be quite a large number of them. Option 2: I can't tell why we would want this one over Option 3, since it has all the problems of Option 3, and not as nice of a hierarchy. Option 3: Opposite of Option 1. Each package or static class has a small number of functions that are relevant to each other, but for some functions it is unclear where they should go (e.g. should min go into basic or math?) On Wed, Apr 29, 2015 at 3:21 PM, Reynold Xin r...@databricks.com wrote: Before we make DataFrame non-alpha, it would be great to decide how we want to namespace all the functions. There are 3 alternatives: 1. Put all in org.apache.spark.sql.functions. This is how SQL does it, since SQL doesn't have namespaces. I estimate eventually we will have ~ 200 functions. 2. Have explicit namespaces, which is what master branch currently looks like: - org.apache.spark.sql.functions - org.apache.spark.sql.mathfunctions - ... 3. Have explicit namespaces, but restructure them slightly so everything is under functions. package object functions { // all the old functions here -- but deprecated so we keep source compatibility def ... } package org.apache.spark.sql.functions object mathFunc { ... } object basicFuncs { ... }
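For context, what Option 1 means for user code is a single import regardless of how many functions end up in the object; a small illustrative sketch (column names are made up):

import org.apache.spark.sql.functions._

// All built-in functions come from one object, whatever their category.
df.groupBy("department").agg(avg("salary"), max("age"))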
Re: [discuss] ending support for Java 6?
+1 on ending support for Java 6. BTW from https://www.java.com/en/download/faq/java_7.xml : After April 2015, Oracle will no longer post updates of Java SE 7 to its public download sites. On Thu, Apr 30, 2015 at 1:34 PM, Punyashloka Biswal punya.bis...@gmail.com wrote: I'm in favor of ending support for Java 6. We should also articulate a policy on how long we want to support current and future versions of Java after Oracle declares them EOL (Java 7 will be in that bucket in a matter of days). Punya On Thu, Apr 30, 2015 at 1:18 PM shane knapp skn...@berkeley.edu wrote: something to keep in mind: we can easily support java 6 for the build environment, particularly if there's a definite EOL. i'd like to fix our java versioning 'problem', and this could be a big instigator... right now we're hackily setting java_home in test invocation on jenkins, which really isn't the best. if i decide, within jenkins, to reconfigure every build to 'do the right thing' WRT java version, then i will clean up the old mess and pay down on some technical debt. or i can just install java 6 and we use that as JAVA_HOME on a build-by-build basis. this will be a few days of prep and another morning-long downtime if i do the right thing (within jenkins), and only a couple of hours the hacky way (system level). either way, we can test on java 6. :) On Thu, Apr 30, 2015 at 1:00 PM, Koert Kuipers ko...@tresata.com wrote: nicholas started it! :) for java 6 i would have said the same thing about 1 year ago: it is foolish to drop it. but i think the time is right about now. about half our clients are on java 7 and the other half have active plans to migrate to it within 6 months. On Thu, Apr 30, 2015 at 3:57 PM, Reynold Xin r...@databricks.com wrote: Guys thanks for chiming in, but please focus on Java here. Python is an entirely separate issue. On Thu, Apr 30, 2015 at 12:53 PM, Koert Kuipers ko...@tresata.com wrote: i am not sure eol means much if it is still actively used. we have a lot of clients with centos 5 (for which we still support python 2.4 in some form or another, fun!). most of them are on centos 6, which means python 2.6. by cutting out python 2.6 you would cut out the majority of the actual clusters i am aware of. unless you intention is to truly make something academic i dont think that is wise. On Thu, Apr 30, 2015 at 3:48 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: (On that note, I think Python 2.6 should be next on the chopping block sometime later this year, but that’s for another thread.) (To continue the parenthetical, Python 2.6 was in fact EOL-ed in October of 2013. https://www.python.org/download/releases/2.6.9/) On Thu, Apr 30, 2015 at 3:18 PM Nicholas Chammas nicholas.cham...@gmail.com wrote: I understand the concern about cutting out users who still use Java 6, and I don't have numbers about how many people are still using Java 6. But I want to say at a high level that I support deprecating older versions of stuff to reduce our maintenance burden and let us use more modern patterns in our code. Maintenance always costs way more than initial development over the lifetime of a project, and for that reason anti-support is just as important as support. (On that note, I think Python 2.6 should be next on the chopping block sometime later this year, but that's for another thread.) 
Nick On Thu, Apr 30, 2015 at 3:03 PM Reynold Xin r...@databricks.com wrote: This has been discussed a few times in the past, but now Oracle has ended support for Java 6 for over a year, I wonder if we should just drop Java 6 support. There is one outstanding issue Tom has brought to my attention: PySpark on YARN doesn't work well with Java 7/8, but we have an outstanding pull request to fix that. https://issues.apache.org/jira/browse/SPARK-6869 https://issues.apache.org/jira/browse/SPARK-1920
Re: [discuss] ending support for Java 6?
+1 On Sat, May 2, 2015 at 1:09 PM, Mridul Muralidharan mri...@gmail.com wrote: We could build on minimum jdk we support for testing pr's - which will automatically cause build failures in case code uses newer api ? Regards, Mridul On Fri, May 1, 2015 at 2:46 PM, Reynold Xin r...@databricks.com wrote: It's really hard to inspect API calls since none of us have the Java standard library in our brain. The only way we can enforce this is to have it in Jenkins, and Tom you are currently our mini-Jenkins server :) Joking aside, looks like we should support Java 6 in 1.4, and in the release notes include a message saying starting in 1.5 we will drop Java 6 support. On Fri, May 1, 2015 at 2:00 PM, Thomas Graves tgra...@yahoo-inc.com wrote: Hey folks, 2 more things that broke jdk6 got committed last night/today. Please watch the java api's being used until we choose to deprecate jdk6. Tom On Thursday, April 30, 2015 2:04 PM, Reynold Xin r...@databricks.com wrote: This has been discussed a few times in the past, but now Oracle has ended support for Java 6 for over a year, I wonder if we should just drop Java 6 support. There is one outstanding issue Tom has brought to my attention: PySpark on YARN doesn't work well with Java 7/8, but we have an outstanding pull request to fix that. https://issues.apache.org/jira/browse/SPARK-6869 https://issues.apache.org/jira/browse/SPARK-1920 - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: Speeding up Spark build during development
Pramod: Please remember to run Zinc so that the build is faster. Cheers On Fri, May 1, 2015 at 9:36 AM, Ulanov, Alexander alexander.ula...@hp.com wrote: Hi Pramod, For cluster-like tests you might want to use the same code as in mllib's LocalClusterSparkContext. You can rebuild only the package that you change and then run this main class. Best regards, Alexander -Original Message- From: Pramod Biligiri [mailto:pramodbilig...@gmail.com] Sent: Friday, May 01, 2015 1:46 AM To: dev@spark.apache.org Subject: Speeding up Spark build during development Hi, I'm making some small changes to the Spark codebase and trying it out on a cluster. I was wondering if there's a faster way to build than running the package target each time. Currently I'm using: mvn -DskipTests package All the nodes have the same filesystem mounted at the same mount point. Pramod
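As a side note on build iteration, with stock Maven the usual incremental pattern is mvn -pl mllib -am -DskipTests package for the first build (the -am flag also builds the in-tree modules mllib depends on), then plain mvn -pl mllib -DskipTests package for later iterations; the module name here is just an example, and Zinc still helps on top of this.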
Re: Mima test failure in the master branch?
Looks like this has been taken care of: commit beeafcfd6ee1e460c4d564cd1515d8781989b422 Author: Patrick Wendell patr...@databricks.com Date: Thu Apr 30 20:33:36 2015 -0700 Revert [SPARK-5213] [SQL] Pluggable SQL Parser Support On Thu, Apr 30, 2015 at 7:58 PM, zhazhan zzh...@hortonworks.com wrote: [info] spark-sql: found 1 potential binary incompatibilities (filtered 129) [error] * method sqlParser()org.apache.spark.sql.SparkSQLParser in class org.apache.spark.sql.SQLContext does not have a correspondent in new version [error] filter with: ProblemFilters.excludeMissingMethodProblem -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Mima-test-failure-in-the-master-branch-tp11949.html Sent from the Apache Spark Developers List mailing list archive at Nabble.com. - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
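For reference (the actual resolution here was the revert above), this is the shape of the entry the MiMa error message asks for, as it would appear in project/MimaExcludes.scala:

import com.typesafe.tools.mima.core._

// Acknowledge an intentional binary-compatibility change reported by MiMa.
ProblemFilters.exclude[MissingMethodProblem](
  "org.apache.spark.sql.SQLContext.sqlParser")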
Re: [discuss] ending support for Java 6?
But it is hard to know how long customers stay with their most recent download. Cheers On Thu, Apr 30, 2015 at 2:26 PM, Sree V sree_at_ch...@yahoo.com.invalid wrote: If there is any possibility of getting the download counts,then we can use it as EOS criteria as well.Say, if download counts are lower than 30% (or another number) of Life time highest,then it qualifies for EOS. Thanking you. With Regards Sree On Thursday, April 30, 2015 2:22 PM, Sree V sree_at_ch...@yahoo.com.INVALID wrote: Hi Team, Should we take this opportunity to layout and evangelize a pattern for EOL of dependencies.I propose, we follow the official EOL of java, python, scala, .And add say 6-12-24 months depending on the popularity. Java 6 official EOL Feb 2013Add 6-12 monthsAug 2013 - Feb 2014 official End of Support for Java 6 in SparkAnnounce 3-6 months prior to EOS. Thanking you. With Regards Sree On Thursday, April 30, 2015 1:41 PM, Marcelo Vanzin van...@cloudera.com wrote: As for the idea, I'm +1. Spark is the only reason I still have jdk6 around - exactly because I don't want to cause the issue that started this discussion (inadvertently using JDK7 APIs). And as has been pointed out, even J7 is about to go EOL real soon. Even Hadoop is moving away (I think 2.7 will be j7-only). Hive 1.1 is already j7-only. And when Hadoop moves away from something, it's an event worthy of headlines. They're still on Jetty 6! As for pyspark, https://github.com/apache/spark/pull/5580 should get rid of the last incompatibility with large assemblies, by keeping the python files in separate archives. If we remove support for Java 6, then we don't need to worry about the size of the assembly anymore. On Thu, Apr 30, 2015 at 1:32 PM, Sean Owen so...@cloudera.com wrote: I'm firmly in favor of this. It would also fix https://issues.apache.org/jira/browse/SPARK-7009 and avoid any more of the long-standing 64K file limit thing that's still a problem for PySpark. -- Marcelo - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: unable to extract tgz files downloaded from spark
From which site did you download the tarball? Which package type did you choose (pre-built for which distro)? Thanks On Wed, May 6, 2015 at 7:16 PM, Praveen Kumar Muthuswamy muthusamy...@gmail.com wrote: Hi I have been trying to install the latest Spark version and downloaded the .tgz files (e.g. spark-1.3.1.tgz). But I could not extract them. It complains of an invalid tar format. Has anyone seen this issue? Thanks Praveen
Re: Recent Spark test failures
Makes sense. Having high determinism in these tests would make Jenkins build stable. On Mon, May 11, 2015 at 1:08 PM, Andrew Or and...@databricks.com wrote: Hi Ted, Yes, those two options can be useful, but in general I think the standard to set is that tests should never fail. It's actually the worst if tests fail sometimes but not others, because we can't reproduce them deterministically. Using -M and -A actually tolerates flaky tests to a certain extent, and I would prefer to instead increase the determinism in these tests. -Andrew 2015-05-08 17:56 GMT-07:00 Ted Yu yuzhih...@gmail.com: Andrew: Do you think the -M and -A options described here can be used in test runs ? http://scalatest.org/user_guide/using_the_runner Cheers On Wed, May 6, 2015 at 5:41 PM, Andrew Or and...@databricks.com wrote: Dear all, I'm sure you have all noticed that the Spark tests have been fairly unstable recently. I wanted to share a tool that I use to track which tests have been failing most often in order to prioritize fixing these flaky tests. Here is an output of the tool. This spreadsheet reports the top 10 failed tests this week (ending yesterday 5/5): https://docs.google.com/spreadsheets/d/1Iv_UDaTFGTMad1sOQ_s4ddWr6KD3PuFIHmTSzL7LSb4 It is produced by a small project: https://github.com/andrewor14/spark-test-failures I have been filing JIRAs on flaky tests based on this tool. Hopefully we can collectively stabilize the build a little more as we near the release for Spark 1.4. -Andrew
Re: [PySpark DataFrame] When a Row is not a Row
In Row#equals():

while (i < len) {
  if (apply(i) != that.apply(i)) {

'!=' should be !apply(i).equals(that.apply(i)) ? Cheers On Mon, May 11, 2015 at 1:49 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: This is really strange.

# Spark 1.3.1
print type(results)
<class 'pyspark.sql.dataframe.DataFrame'>
a = results.take(1)[0]
print type(a)
<class 'pyspark.sql.types.Row'>
print pyspark.sql.types.Row
<class 'pyspark.sql.types.Row'>
print type(a) == pyspark.sql.types.Row
False
print isinstance(a, pyspark.sql.types.Row)
False

If I set a as follows, then the type checks pass fine. a = pyspark.sql.types.Row('name')('Nick') Is this a bug? What can I do to narrow down the source? results is a massive DataFrame of spark-perf results. Nick
Re: Build fail...
Looks like you're right: https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-1.3-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=centos/427/console [error] /home/jenkins/workspace/Spark-1.3-Maven-with-YARN/HADOOP_PROFILE/hadoop-2.4/label/centos/core/src/main/scala/org/apache/spark/MapOutputTracker.scala:370: value tryWithSafeFinally is not a member of object org.apache.spark.util.Utils [error] Utils.tryWithSafeFinally { [error] ^ FYI On Fri, May 8, 2015 at 6:53 PM, rtimp dolethebobdol...@gmail.com wrote: Hi, From what I myself noticed a few minutes ago, I think branch-1.3 might be failing to compile due to the most recent commit. I tried reverting to commit 7fd212b575b6227df5068844416e51f11740e771 (the commit prior to the head) on that branch and recompiling, and was successful. As Ferris would say, it is so choice. -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Build-fail-tp12170p12171.html Sent from the Apache Spark Developers List mailing list archive at Nabble.com. - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: jackson.databind exception in RDDOperationScope.jsonMapper.writeValueAsString(this)
Looks like mismatch of jackson version. Spark uses: fasterxml.jackson.version2.4.4/fasterxml.jackson.version FYI On Wed, May 6, 2015 at 8:00 AM, A.M.Chan kaka_1...@163.com wrote: Hey, guys. I meet this exception while testing SQL/Columns. I didn't change the pom or the core project. In the morning, it's fine to test my PR. I don't know what happed. An exception or error caused a run to abort: com.fasterxml.jackson.databind.introspect.POJOPropertyBuilder.addField(Lcom/fasterxml/jackson/databind/introspect/AnnotatedField;Lcom/fasterxml/jackson/databind/PropertyName;ZZZ)V java.lang.NoSuchMethodError: com.fasterxml.jackson.databind.introspect.POJOPropertyBuilder.addField(Lcom/fasterxml/jackson/databind/introspect/AnnotatedField;Lcom/fasterxml/jackson/databind/PropertyName;ZZZ)V at com.fasterxml.jackson.module.scala.introspect.ScalaPropertiesCollector.com $fasterxml$jackson$module$scala$introspect$ScalaPropertiesCollector$$_addField(ScalaPropertiesCollector.scala:109) at com.fasterxml.jackson.module.scala.introspect.ScalaPropertiesCollector$$anonfun$_addFields$2$$anonfun$apply$11.apply(ScalaPropertiesCollector.scala:100) at com.fasterxml.jackson.module.scala.introspect.ScalaPropertiesCollector$$anonfun$_addFields$2$$anonfun$apply$11.apply(ScalaPropertiesCollector.scala:99) at scala.Option.foreach(Option.scala:236) at com.fasterxml.jackson.module.scala.introspect.ScalaPropertiesCollector$$anonfun$_addFields$2.apply(ScalaPropertiesCollector.scala:99) at com.fasterxml.jackson.module.scala.introspect.ScalaPropertiesCollector$$anonfun$_addFields$2.apply(ScalaPropertiesCollector.scala:93) at scala.collection.GenTraversableViewLike$Filtered$$anonfun$foreach$4.apply(GenTraversableViewLike.scala:109) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at scala.collection.SeqLike$$anon$2.foreach(SeqLike.scala:635) at scala.collection.GenTraversableViewLike$Filtered$class.foreach(GenTraversableViewLike.scala:108) at scala.collection.SeqViewLike$$anon$5.foreach(SeqViewLike.scala:80) at com.fasterxml.jackson.module.scala.introspect.ScalaPropertiesCollector._addFields(ScalaPropertiesCollector.scala:93) at com.fasterxml.jackson.databind.introspect.POJOPropertiesCollector.collect(POJOPropertiesCollector.java:233) at com.fasterxml.jackson.databind.introspect.BasicClassIntrospector.collectProperties(BasicClassIntrospector.java:142) at com.fasterxml.jackson.databind.introspect.BasicClassIntrospector.forSerialization(BasicClassIntrospector.java:68) at com.fasterxml.jackson.databind.introspect.BasicClassIntrospector.forSerialization(BasicClassIntrospector.java:11) at com.fasterxml.jackson.databind.SerializationConfig.introspect(SerializationConfig.java:530) at com.fasterxml.jackson.databind.ser.BeanSerializerFactory.createSerializer(BeanSerializerFactory.java:133) at com.fasterxml.jackson.databind.SerializerProvider._createUntypedSerializer(SerializerProvider.java:1077) at com.fasterxml.jackson.databind.SerializerProvider._createAndCacheUntypedSerializer(SerializerProvider.java:1037) at com.fasterxml.jackson.databind.SerializerProvider.findValueSerializer(SerializerProvider.java:445) at com.fasterxml.jackson.databind.SerializerProvider.findTypedValueSerializer(SerializerProvider.java:599) at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:93) at 
com.fasterxml.jackson.databind.ObjectMapper._configAndWriteValue(ObjectMapper.java:2811) at com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:2268) at org.apache.spark.rdd.RDDOperationScope.toJson(RDDOperationScope.scala:51) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:124) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:99) at org.apache.spark.SparkContext.withScope(SparkContext.scala:671) at org.apache.spark.SparkContext.parallelize(SparkContext.scala:685) -- A.M.Chan
Re: Recent Spark test failures
Andrew: Do you think the -M and -A options described here can be used in test runs ? http://scalatest.org/user_guide/using_the_runner Cheers On Wed, May 6, 2015 at 5:41 PM, Andrew Or and...@databricks.com wrote: Dear all, I'm sure you have all noticed that the Spark tests have been fairly unstable recently. I wanted to share a tool that I use to track which tests have been failing most often in order to prioritize fixing these flaky tests. Here is an output of the tool. This spreadsheet reports the top 10 failed tests this week (ending yesterday 5/5): https://docs.google.com/spreadsheets/d/1Iv_UDaTFGTMad1sOQ_s4ddWr6KD3PuFIHmTSzL7LSb4 It is produced by a small project: https://github.com/andrewor14/spark-test-failures I have been filing JIRAs on flaky tests based on this tool. Hopefully we can collectively stabilize the build a little more as we near the release for Spark 1.4. -Andrew
Re: How to link code pull request with JIRA ID?
Subproject tag should follow SPARK JIRA number. e.g. [SPARK-5277][SQL] ... Cheers On Wed, May 13, 2015 at 11:50 AM, Stephen Boesch java...@gmail.com wrote: following up from Nicholas, it is [SPARK-12345] Your PR description where 12345 is the jira number. One thing I tend to forget is when/where to include the subproject tag e.g. [MLLIB] 2015-05-13 11:11 GMT-07:00 Nicholas Chammas nicholas.cham...@gmail.com: That happens automatically when you open a PR with the JIRA key in the PR title. On Wed, May 13, 2015 at 2:10 PM Chandrashekhar Kotekar shekhar.kote...@gmail.com wrote: Hi, I am new to open source contribution and trying to understand the process starting from pulling code to uploading patch. I have managed to pull code from GitHub. In JIRA I saw that each JIRA issue is connected with pull request. I would like to know how do people attach pull request details to JIRA issue? Thanks, Chandrash3khar Kotekar Mobile - +91 8600011455
Re: Recent Spark test failures
Jenkins build against hadoop 2.4 has been unstable recently: https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=centos/ I haven't found the test which hung / failed in recent Jenkins builds. But PR builder has several green builds lately: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/ Maybe PR builder doesn't build against hadoop 2.4 ? Cheers On Mon, May 11, 2015 at 1:11 PM, Ted Yu yuzhih...@gmail.com wrote: Makes sense. Having high determinism in these tests would make Jenkins build stable. On Mon, May 11, 2015 at 1:08 PM, Andrew Or and...@databricks.com wrote: Hi Ted, Yes, those two options can be useful, but in general I think the standard to set is that tests should never fail. It's actually the worst if tests fail sometimes but not others, because we can't reproduce them deterministically. Using -M and -A actually tolerates flaky tests to a certain extent, and I would prefer to instead increase the determinism in these tests. -Andrew 2015-05-08 17:56 GMT-07:00 Ted Yu yuzhih...@gmail.com: Andrew: Do you think the -M and -A options described here can be used in test runs ? http://scalatest.org/user_guide/using_the_runner Cheers On Wed, May 6, 2015 at 5:41 PM, Andrew Or and...@databricks.com wrote: Dear all, I'm sure you have all noticed that the Spark tests have been fairly unstable recently. I wanted to share a tool that I use to track which tests have been failing most often in order to prioritize fixing these flaky tests. Here is an output of the tool. This spreadsheet reports the top 10 failed tests this week (ending yesterday 5/5): https://docs.google.com/spreadsheets/d/1Iv_UDaTFGTMad1sOQ_s4ddWr6KD3PuFIHmTSzL7LSb4 It is produced by a small project: https://github.com/andrewor14/spark-test-failures I have been filing JIRAs on flaky tests based on this tool. Hopefully we can collectively stabilize the build a little more as we near the release for Spark 1.4. -Andrew
Re: Recent Spark test failures
From https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32831/consoleFull : [info] Building Spark with these arguments: -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 -Pkinesis-asl -Phive -Phive-thriftserver Should PR builder cover hadoop 2.4 as well ? Thanks On Fri, May 15, 2015 at 9:23 AM, Ted Yu yuzhih...@gmail.com wrote: Jenkins build against hadoop 2.4 has been unstable recently: https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=centos/ I haven't found the test which hung / failed in recent Jenkins builds. But PR builder has several green builds lately: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/ Maybe PR builder doesn't build against hadoop 2.4 ? Cheers On Mon, May 11, 2015 at 1:11 PM, Ted Yu yuzhih...@gmail.com wrote: Makes sense. Having high determinism in these tests would make Jenkins build stable. On Mon, May 11, 2015 at 1:08 PM, Andrew Or and...@databricks.com wrote: Hi Ted, Yes, those two options can be useful, but in general I think the standard to set is that tests should never fail. It's actually the worst if tests fail sometimes but not others, because we can't reproduce them deterministically. Using -M and -A actually tolerates flaky tests to a certain extent, and I would prefer to instead increase the determinism in these tests. -Andrew 2015-05-08 17:56 GMT-07:00 Ted Yu yuzhih...@gmail.com: Andrew: Do you think the -M and -A options described here can be used in test runs ? http://scalatest.org/user_guide/using_the_runner Cheers On Wed, May 6, 2015 at 5:41 PM, Andrew Or and...@databricks.com wrote: Dear all, I'm sure you have all noticed that the Spark tests have been fairly unstable recently. I wanted to share a tool that I use to track which tests have been failing most often in order to prioritize fixing these flaky tests. Here is an output of the tool. This spreadsheet reports the top 10 failed tests this week (ending yesterday 5/5): https://docs.google.com/spreadsheets/d/1Iv_UDaTFGTMad1sOQ_s4ddWr6KD3PuFIHmTSzL7LSb4 It is produced by a small project: https://github.com/andrewor14/spark-test-failures I have been filing JIRAs on flaky tests based on this tool. Hopefully we can collectively stabilize the build a little more as we near the release for Spark 1.4. -Andrew
Re: Recent Spark test failures
bq. would be prohibitive to build all configurations for every push Agreed. Can PR builder rotate testing against hadoop 2.3, 2.4, 2.6 and 2.7 (each test run still uses one hadoop profile) ? This way we would have some coverage for each of the major hadoop releases. Cheers On Fri, May 15, 2015 at 10:30 AM, Sean Owen so...@cloudera.com wrote: You all are looking only at the pull request builder. It just does one build to sanity-check a pull request, since that already takes 2 hours and would be prohibitive to build all configurations for every push. There is a different set of Jenkins jobs that periodically tests master against a lot more configurations, including Hadoop 2.4. On Fri, May 15, 2015 at 6:02 PM, Frederick R Reiss frre...@us.ibm.com wrote: The PR builder seems to be building against Hadoop 2.3. In the log for the most recent successful build ( https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32805/consoleFull ) I see: = Building Spark = [info] Compile with Hive 0.13.1 [info] Building Spark with these arguments: -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 -Pkinesis-asl -Phive -Phive-thriftserver ... = Running Spark unit tests = [info] Running Spark tests with these arguments: -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 -Pkinesis-asl test Is anyone testing individual pull requests against Hadoop 2.4 or 2.6 before the code is declared clean? Fred [image: Inactive hide details for Ted Yu ---05/15/2015 09:29:09 AM---Jenkins build against hadoop 2.4 has been unstable recently: https]Ted Yu ---05/15/2015 09:29:09 AM---Jenkins build against hadoop 2.4 has been unstable recently: https://amplab.cs.berkeley.edu/jenkins/ From: Ted Yu yuzhih...@gmail.com To: Andrew Or and...@databricks.com Cc: dev@spark.apache.org dev@spark.apache.org Date: 05/15/2015 09:29 AM Subject: Re: Recent Spark test failures -- Jenkins build against hadoop 2.4 has been unstable recently: *https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=centos/* https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=centos/ I haven't found the test which hung / failed in recent Jenkins builds. But PR builder has several green builds lately: *https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/* https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/ Maybe PR builder doesn't build against hadoop 2.4 ? Cheers On Mon, May 11, 2015 at 1:11 PM, Ted Yu *yuzhih...@gmail.com* yuzhih...@gmail.com wrote: Makes sense. Having high determinism in these tests would make Jenkins build stable. On Mon, May 11, 2015 at 1:08 PM, Andrew Or *and...@databricks.com* and...@databricks.com wrote: Hi Ted, Yes, those two options can be useful, but in general I think the standard to set is that tests should never fail. It's actually the worst if tests fail sometimes but not others, because we can't reproduce them deterministically. Using -M and -A actually tolerates flaky tests to a certain extent, and I would prefer to instead increase the determinism in these tests. -Andrew 2015-05-08 17:56 GMT-07:00 Ted Yu *yuzhih...@gmail.com* yuzhih...@gmail.com: Andrew: Do you think the -M and -A options described here can be used in test runs ? 
*http://scalatest.org/user_guide/using_the_runner* http://scalatest.org/user_guide/using_the_runner Cheers On Wed, May 6, 2015 at 5:41 PM, Andrew Or *and...@databricks.com* and...@databricks.com wrote: Dear all, I'm sure you have all noticed that the Spark tests have been fairly unstable recently. I wanted to share a tool that I use to track which tests have been failing most often in order to prioritize fixing these flaky tests. Here is an output of the tool. This spreadsheet reports the top 10 failed tests this week (ending yesterday 5/5): *https://docs.google.com/spreadsheets/d/1Iv_UDaTFGTMad1sOQ_s4ddWr6KD3PuFIHmTSzL7LSb4* https://docs.google.com/spreadsheets/d/1Iv_UDaTFGTMad1sOQ_s4ddWr6KD3PuFIHmTSzL7LSb4 It is produced by a small project: *https://github.com/andrewor14/spark-test-failures* https://github.com/andrewor14/spark-test-failures I have been filing JIRAs on flaky tests based on this tool
Re: how long does it takes for full build ?
You can find the command at the beginning of the console output: [centos] $ /home/jenkins/tools/hudson.tasks.Maven_MavenInstallation/Maven_3.0.5/bin/mvn -DHADOOP_PROFILE=hadoop-2.4 -Dlabel=centos -DskipTests -Phadoop-2.4 -Pyarn -Phive clean package On Thu, Apr 16, 2015 at 12:42 PM, Sree V sree_at_ch...@yahoo.com wrote: 1. 40 min+ to 1hr+, from jenkins. I didn't find the commands of the job. Does it require a login ? Part of the console output: git checkout -f 3ae37b93a7c299bd8b22a36248035bca5de3422f git rev-list de4fa6b6d12e2bee0307ffba2abfca0c33f15e45 # timeout=10 Triggering Spark-Master-Maven-pre-YARN ? 2.0.0-mr1-cdh4.1.2,centos https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-pre-YARN/hadoop.version=2.0.0-mr1-cdh4.1.2,label=centos/ Triggering Spark-Master-Maven-pre-YARN ? 1.0.4,centos https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-pre-YARN/hadoop.version=1.0.4,label=centos/ How to find the commands of these 'triggers' ? I am interested, whether these named triggers use -DskipTests or not. 2. This page, gives examples all with -DskipTests only. http://spark.apache.org/docs/1.2.0/building-spark.html 3. For casting VOTE to release 1.2.2-rc1, I am running 'mvn clean package' on spark 1.2.2-rc1 with oralce jdk8_40 on centos7. This is stuck at, from last night. i.e. almost 12 hours. ... ExternalSorterSuite: - empty data stream - few elements per partition - empty partitions with spilling - empty partitions with spilling, bypass merge-sort Any pointers ? Thanking you. With Regards Sree On Thursday, April 16, 2015 12:01 PM, Ted Yu yuzhih...@gmail.com wrote: You can get some idea by looking at the builds here: https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-1.2-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=centos/ Cheers On Thu, Apr 16, 2015 at 11:56 AM, Sree V sree_at_ch...@yahoo.com.invalid wrote: Hi Team, How long does it takes for a full build 'mvn clean package' on spark 1.2.2-rc1 ? Thanking you. With Regards Sree
Re: [Spark SQL] Java map/flatMap api broken with DataFrame in 1.3.{0,1}
The image didn't go through. I think you were referring to: override def map[R: ClassTag](f: Row = R): RDD[R] = rdd.map(f) Cheers On Fri, Apr 17, 2015 at 6:07 AM, Olivier Girardot o.girar...@lateral-thoughts.com wrote: Hi everyone, I had an issue trying to use Spark SQL from Java (8 or 7), I tried to reproduce it in a small test case close to the actual documentation https://spark.apache.org/docs/latest/sql-programming-guide.html#inferring-the-schema-using-reflection, so sorry for the long mail, but this is Java : import org.apache.spark.api.java.JavaRDD; import org.apache.spark.api.java.JavaSparkContext; import org.apache.spark.sql.DataFrame; import org.apache.spark.sql.SQLContext; import java.io.Serializable; import java.util.ArrayList; import java.util.Arrays; import java.util.List; class Movie implements Serializable { private int id; private String name; public Movie(int id, String name) { this.id = id; this.name = name; } public int getId() { return id; } public void setId(int id) { this.id = id; } public String getName() { return name; } public void setName(String name) { this.name = name; } } public class SparkSQLTest { public static void main(String[] args) { SparkConf conf = new SparkConf(); conf.setAppName(My Application); conf.setMaster(local); JavaSparkContext sc = new JavaSparkContext(conf); ArrayListMovie movieArrayList = new ArrayListMovie(); movieArrayList.add(new Movie(1, Indiana Jones)); JavaRDDMovie movies = sc.parallelize(movieArrayList); SQLContext sqlContext = new SQLContext(sc); DataFrame frame = sqlContext.applySchema(movies, Movie.class); frame.registerTempTable(movies); sqlContext.sql(select name from movies) *.map(row - row.getString(0)) // this is what i would expect to work *.collect(); } } But this does not compile, here's the compilation error : [ERROR] /Users/ogirardot/Documents/spark/java-project/src/main/java/org/apache/spark/MainSQL.java:[37,47] method map in class org.apache.spark.sql.DataFrame cannot be applied to given types; [ERROR] *required: scala.Function1org.apache.spark.sql.Row,R,scala.reflect.ClassTagR * [ERROR]* found: (row)-Na[...]ng(0) * [ERROR] *reason: cannot infer type-variable(s) R * [ERROR] *(actual and formal argument lists differ in length) * [ERROR] /Users/ogirardot/Documents/spark/java-project/src/main/java/org/apache/spark/SampleSHit.java:[56,17] method map in class org.apache.spark.sql.DataFrame cannot be applied to given types; [ERROR] required: scala.Function1org.apache.spark.sql.Row,R,scala.reflect.ClassTagR [ERROR] found: (row)-row[...]ng(0) [ERROR] reason: cannot infer type-variable(s) R [ERROR] (actual and formal argument lists differ in length) [ERROR] - [Help 1] Because in the DataFrame the *map *method is defined as : [image: Images intégrées 1] And once this is translated to bytecode the actual Java signature uses a Function1 and adds a ClassTag parameter. I can try to go around this and use the scala.reflect.ClassTag$ like that : ClassTag$.MODULE$.apply(String.class) To get the second ClassTag parameter right, but then instantiating a java.util.Function or using the Java 8 lambdas fail to work, and if I try to instantiate a proper scala Function1... well this is a world of pain. This is a regression introduced by the 1.3.x DataFrame because JavaSchemaRDD used to be JavaRDDLike but DataFrame's are not (and are not callable with JFunctions), I can open a Jira if you want ? Regards, -- *Olivier Girardot* | Associé o.girar...@lateral-thoughts.com +33 6 24 09 17 94
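For comparison, the same call compiles fine from Scala because the compiler supplies the ClassTag implicitly; the sketch below assumes the movies table registered in Olivier's example:

// DataFrame.map(f: Row => R)(implicit ct: ClassTag[R]) in Spark 1.3 returns an RDD[R].
val names: Array[String] = sqlContext.sql("select name from movies")
  .map(_.getString(0))
  .collect()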
Re: wait time between start master and start slaves
From SparkUI.scala : def getUIPort(conf: SparkConf): Int = { conf.getInt(spark.ui.port, SparkUI.DEFAULT_PORT) } Better retrieve effective UI port before probing. Cheers On Sat, Apr 11, 2015 at 2:38 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: So basically, to tell if the master is ready to accept slaves, just poll http://master-node:4040 for an HTTP 200 response? On Sat, Apr 11, 2015 at 2:42 PM Shivaram Venkataraman shiva...@eecs.berkeley.edu wrote: Yeah from what I remember it was set defensively. I don't know of a good way to check if the master is up though. I guess we could poll the Master Web UI and see if we get a 200/ok response Shivaram On Fri, Apr 10, 2015 at 8:24 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Check this out https://github.com/mesos/spark-ec2/blob/f0a48be1bb5aaeef508619a46065648beb8f1d92/spark-standalone/setup.sh#L26-L33 (from spark-ec2): # Start Master$BIN_FOLDER/start-master.sh # Pause sleep 20 # Start Workers$BIN_FOLDER/start-slaves.sh I know this was probably done defensively, but is there a more direct way to know when the master is ready? Nick
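A rough sketch of that polling check (note that the standalone Master's web UI defaults to port 8080, configurable via spark.master.ui.port; 4040 is the per-application UI that SparkUI.getUIPort above refers to):

import java.io.IOException
import java.net.{HttpURLConnection, URL}

// Return true once the master's web UI answers with HTTP 200.
def masterUiUp(host: String, port: Int = 8080): Boolean = {
  try {
    val conn = new URL(s"http://$host:$port/").openConnection().asInstanceOf[HttpURLConnection]
    conn.setConnectTimeout(2000)
    conn.setReadTimeout(2000)
    conn.getResponseCode == HttpURLConnection.HTTP_OK
  } catch {
    case _: IOException => false
  }
}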
Re: Anyone facing problem in incremental building of individual project
Andrew Or put in this workaround:

diff --git a/pom.xml b/pom.xml
index 0b1aaad..d03d33b 100644
--- a/pom.xml
+++ b/pom.xml
@@ -1438,6 +1438,8 @@
         <version>2.3</version>
         <configuration>
           <shadedArtifactAttached>false</shadedArtifactAttached>
+          <!-- Work around MSHADE-148 -->
+          <createDependencyReducedPom>false</createDependencyReducedPom>
           <artifactSet>
             <includes>
               <!-- At a minimum we must include this to force effective pom generation -->

FYI On Thu, Jun 4, 2015 at 6:25 AM, Steve Loughran ste...@hortonworks.com wrote: On 4 Jun 2015, at 11:16, Meethu Mathew meethu.mat...@flytxt.com wrote: Hi all, I added some new code to MLlib. When I am trying to build only the mllib project using mvn --projects mllib/ -DskipTests clean install after setting export SPARK_PREPEND_CLASSES=true, the build is getting stuck with the following message: Excluding org.jpmml:pmml-schema:jar:1.1.15 from the shaded jar. [INFO] Excluding com.sun.xml.bind:jaxb-impl:jar:2.2.7 from the shaded jar. [INFO] Excluding com.sun.xml.bind:jaxb-core:jar:2.2.7 from the shaded jar. [INFO] Excluding javax.xml.bind:jaxb-api:jar:2.2.7 from the shaded jar. [INFO] Including org.spark-project.spark:unused:jar:1.0.0 in the shaded jar. [INFO] Excluding org.scala-lang:scala-reflect:jar:2.10.4 from the shaded jar. [INFO] Replacing original artifact with shaded artifact. [INFO] Replacing /home/meethu/git/FlytxtRnD/spark/mllib/target/spark-mllib_2.10-1.4.0-SNAPSHOT.jar with /home/meethu/git/FlytxtRnD/spark/mllib/target/spark-mllib_2.10-1.4.0-SNAPSHOT-shaded.jar [INFO] Dependency-reduced POM written at: /home/meethu/git/FlytxtRnD/spark/mllib/dependency-reduced-pom.xml [INFO] Dependency-reduced POM written at: /home/meethu/git/FlytxtRnD/spark/mllib/dependency-reduced-pom.xml [INFO] Dependency-reduced POM written at: /home/meethu/git/FlytxtRnD/spark/mllib/dependency-reduced-pom.xml [INFO] Dependency-reduced POM written at: /home/meethu/git/FlytxtRnD/spark/mllib/dependency-reduced-pom.xml . I've seen something similar in a different build, It looks like MSHADE-148: https://issues.apache.org/jira/browse/MSHADE-148 if you apply Tom White's patch, does your problem go away?
Re: [VOTE] Release Apache Spark 1.4.1
I got the following when running the test suite:

[INFO] compiler plugin: BasicArtifact(org.scalamacros,paradise_2.10.4,2.0.1,null)
[info] Compiling 2 Scala sources and 1 Java source to /home/hbase/spark-1.4.1/streaming/target/scala-2.10/test-classes...
[error] /home/hbase/spark-1.4.1/streaming/src/test/scala/org/apache/spark/streaming/DStreamClosureSuite.scala:82: not found: type TestException
[error] throw new TestException(
[error] ^
[error] /home/hbase/spark-1.4.1/streaming/src/test/scala/org/apache/spark/streaming/scheduler/JobGeneratorSuite.scala:73: not found: type TestReceiver
[error] val inputStream = ssc.receiverStream(new TestReceiver)
[error] ^
[error] two errors found
[error] Compile failed at Jun 25, 2015 5:12:24 PM [1.492s]

Has anyone else seen similar error ? Thanks On Tue, Jun 23, 2015 at 10:37 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.4.1! This release fixes a handful of known issues in Spark 1.4.0, listed here: http://s.apache.org/spark-1.4.1 The tag to be voted on is v1.4.1-rc1 (commit 60e08e5): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=60e08e50751fe3929156de956d62faea79f5b801 The release files, including signatures, digests, etc. can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-bin/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: [published as version: 1.4.1] https://repository.apache.org/content/repositories/orgapachespark-1118/ [published as version: 1.4.1-rc1] https://repository.apache.org/content/repositories/orgapachespark-1119/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-docs/ Please vote on releasing this package as Apache Spark 1.4.1! The vote is open until Saturday, June 27, at 06:32 UTC and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 1.4.1 [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.apache.org/ - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: [VOTE] Release Apache Spark 1.4.1
Pardon. During earlier test run, I got: ^[[32mStreamingContextSuite:^[[0m ^[[32m- from no conf constructor^[[0m ^[[32m- from no conf + spark home^[[0m ^[[32m- from no conf + spark home + env^[[0m ^[[32m- from conf with settings^[[0m ^[[32m- from existing SparkContext^[[0m ^[[32m- from existing SparkContext with settings^[[0m ^[[31m*** RUN ABORTED ***^[[0m ^[[31m java.lang.NoSuchMethodError: org.apache.spark.ui.JettyUtils$.createStaticHandler(Ljava/lang/String;Ljava/lang/String;)Lorg/eclipse/jetty/servlet/ServletContextHandler;^[[0m ^[[31m at org.apache.spark.streaming.ui.StreamingTab.attach(StreamingTab.scala:49)^[[0m ^[[31m at org.apache.spark.streaming.StreamingContext$$anonfun$start$2.apply(StreamingContext.scala:601)^[[0m ^[[31m at org.apache.spark.streaming.StreamingContext$$anonfun$start$2.apply(StreamingContext.scala:601)^[[0m ^[[31m at scala.Option.foreach(Option.scala:236)^[[0m ^[[31m at org.apache.spark.streaming.StreamingContext.start(StreamingContext.scala:601)^[[0m ^[[31m at org.apache.spark.streaming.StreamingContextSuite$$anonfun$8.apply$mcV$sp(StreamingContextSuite.scala:101)^[[0m ^[[31m at org.apache.spark.streaming.StreamingContextSuite$$anonfun$8.apply(StreamingContextSuite.scala:96)^[[0m ^[[31m at org.apache.spark.streaming.StreamingContextSuite$$anonfun$8.apply(StreamingContextSuite.scala:96)^[[0m ^[[31m at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)^[[0m ^[[31m at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)^[[0m The error from previous email was due to absence of StreamingContextSuite.scala On Fri, Jun 26, 2015 at 1:27 PM, Ted Yu yuzhih...@gmail.com wrote: I got the following when running test suite: [INFO] compiler plugin: BasicArtifact(org.scalamacros,paradise_2.10.4,2.0.1,null) ^[[0m[^[[0minfo^[[0m] ^[[0mCompiling 2 Scala sources and 1 Java source to /home/hbase/spark-1.4.1/streaming/target/scala-2.10/test-classes...^[[0m ^[[0m[^[[31merror^[[0m] ^[[0m/home/hbase/spark-1.4.1/streaming/src/test/scala/org/apache/spark/streaming/DStreamClosureSuite.scala:82: not found: type TestException^[[0m ^[[0m[^[[31merror^[[0m] ^[[0mthrow new TestException(^[[0m ^[[0m[^[[31merror^[[0m] ^[[0m ^^[[0m ^[[0m[^[[31merror^[[0m] ^[[0m/home/hbase/spark-1.4.1/streaming/src/test/scala/org/apache/spark/streaming/scheduler/JobGeneratorSuite.scala:73: not found: type TestReceiver^[[0m ^[[0m[^[[31merror^[[0m] ^[[0m val inputStream = ssc.receiverStream(new TestReceiver)^[[0m ^[[0m[^[[31merror^[[0m] ^[[0m ^^[[0m ^[[0m[^[[31merror^[[0m] ^[[0mtwo errors found^[[0m ^[[0m[^[[31merror^[[0m] ^[[0mCompile failed at Jun 25, 2015 5:12:24 PM [1.492s]^[[0m Has anyone else seen similar error ? Thanks On Tue, Jun 23, 2015 at 10:37 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.4.1! This release fixes a handful of known issues in Spark 1.4.0, listed here: http://s.apache.org/spark-1.4.1 The tag to be voted on is v1.4.1-rc1 (commit 60e08e5): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h= 60e08e50751fe3929156de956d62faea79f5b801 The release files, including signatures, digests, etc. 
can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-bin/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: [published as version: 1.4.1] https://repository.apache.org/content/repositories/orgapachespark-1118/ [published as version: 1.4.1-rc1] https://repository.apache.org/content/repositories/orgapachespark-1119/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-docs/ Please vote on releasing this package as Apache Spark 1.4.1! The vote is open until Saturday, June 27, at 06:32 UTC and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 1.4.1 [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.apache.org/ - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: problem with using mapPartitions
bq. val result = fDB.mappartitions(testMP).collect Not sure if you pasted the above code correctly - there was a typo: the method name should be mapPartitions Cheers On Sat, May 30, 2015 at 9:44 AM, unioah uni...@gmail.com wrote: Hi, I am trying to aggregate the values within each partition locally. For example, Before - worker 1: 1, 2, 1; worker 2: 2, 1, 2. After - worker 1: (1 -> 2), (2 -> 1); worker 2: (1 -> 1), (2 -> 2). I tried to use mappartitions: object MyTest { def main(args: Array[String]) { val conf = new SparkConf().setAppName("This is a test") val sc = new SparkContext(conf) val fDB = sc.parallelize(List(1, 2, 1, 2, 1, 2, 5, 5, 2), 3) val result = fDB.mappartitions(testMP).collect println(result.mkString) sc.stop } def testMP(iter: Iterator[Int]): Iterator[(Long, Int)] = { var result = new LongMap[Int]() var cur = 0l while (iter.hasNext) { cur = iter.next.toLong if (result.contains(cur)) { result(cur) += 1 } else { result += (cur, 1) } } result.toList.iterator } } But I got the error message no matter how I tried. Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1204) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1193) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1192) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693) at scala.Option.foreach(Option.scala:236) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) 15/05/30 10:41:21 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 1 Can anybody help me? Thx -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/problem-with-using-mapPartitions-tp12514.html Sent from the Apache Spark Developers List mailing list archive at Nabble.com. - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
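For reference, here is a minimal sketch of what the job above could look like once the method name is corrected to mapPartitions. It is only an illustration: the helper name countPerPartition is made up, and a plain mutable.HashMap is used instead of the original LongMap so the snippet compiles on any recent Scala version.

import org.apache.spark.{SparkConf, SparkContext}
import scala.collection.mutable

object MyTest {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("This is a test")
    val sc = new SparkContext(conf)
    val fDB = sc.parallelize(List(1, 2, 1, 2, 1, 2, 5, 5, 2), 3)
    // Note the capital P: the RDD method is mapPartitions, not mappartitions.
    val result = fDB.mapPartitions(countPerPartition).collect()
    println(result.mkString(", "))
    sc.stop()
  }

  // Count how many times each value occurs within a single partition.
  def countPerPartition(iter: Iterator[Int]): Iterator[(Long, Int)] = {
    val counts = mutable.HashMap.empty[Long, Int]
    while (iter.hasNext) {
      val cur = iter.next().toLong
      counts(cur) = counts.getOrElse(cur, 0) + 1
    }
    counts.iterator
  }
}

Because the function runs once per partition, the collected result can contain the same key several times (once per partition that saw it), which matches the per-worker aggregation described above.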
StreamingContextSuite fails with NoSuchMethodError
Hi, I ran the following command on 1.4.0 RC3: mvn -Phadoop-2.4 -Dhadoop.version=2.7.0 -Pyarn -Phive package I saw the following failure: StreamingContextSuite: - from no conf constructor - from no conf + spark home - from no conf + spark home + env - from conf with settings - from existing SparkContext - from existing SparkContext with settings *** RUN ABORTED *** java.lang.NoSuchMethodError: org.apache.spark.ui.JettyUtils$.createStaticHandler(Ljava/lang/String;Ljava/lang/String;)Lorg/eclipse/jetty/servlet/ServletContextHandler; at org.apache.spark.streaming.ui.StreamingTab.attach(StreamingTab.scala:49) at org.apache.spark.streaming.StreamingContext$$anonfun$start$2.apply(StreamingContext.scala:585) at org.apache.spark.streaming.StreamingContext$$anonfun$start$2.apply(StreamingContext.scala:585) at scala.Option.foreach(Option.scala:236) at org.apache.spark.streaming.StreamingContext.start(StreamingContext.scala:585) at org.apache.spark.streaming.StreamingContextSuite$$anonfun$8.apply$mcV$sp(StreamingContextSuite.scala:101) at org.apache.spark.streaming.StreamingContextSuite$$anonfun$8.apply(StreamingContextSuite.scala:96) at org.apache.spark.streaming.StreamingContextSuite$$anonfun$8.apply(StreamingContextSuite.scala:96) at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) Did anyone else encounter a similar error? Cheers
Re: StreamingContextSuite fails with NoSuchMethodError
I downloaded the source tarball and ran a command similar to the following, with: clean package -DskipTests Then I ran the following command. FYI On May 30, 2015, at 12:42 AM, Tathagata Das t...@databricks.com wrote: Was it a clean compilation? TD On Fri, May 29, 2015 at 10:48 PM, Ted Yu yuzhih...@gmail.com wrote: Hi, I ran the following command on 1.4.0 RC3: mvn -Phadoop-2.4 -Dhadoop.version=2.7.0 -Pyarn -Phive package I saw the following failure: StreamingContextSuite: - from no conf constructor - from no conf + spark home - from no conf + spark home + env - from conf with settings - from existing SparkContext - from existing SparkContext with settings *** RUN ABORTED *** java.lang.NoSuchMethodError: org.apache.spark.ui.JettyUtils$.createStaticHandler(Ljava/lang/String;Ljava/lang/String;)Lorg/eclipse/jetty/servlet/ServletContextHandler; at org.apache.spark.streaming.ui.StreamingTab.attach(StreamingTab.scala:49) at org.apache.spark.streaming.StreamingContext$$anonfun$start$2.apply(StreamingContext.scala:585) at org.apache.spark.streaming.StreamingContext$$anonfun$start$2.apply(StreamingContext.scala:585) at scala.Option.foreach(Option.scala:236) at org.apache.spark.streaming.StreamingContext.start(StreamingContext.scala:585) at org.apache.spark.streaming.StreamingContextSuite$$anonfun$8.apply$mcV$sp(StreamingContextSuite.scala:101) at org.apache.spark.streaming.StreamingContextSuite$$anonfun$8.apply(StreamingContextSuite.scala:96) at org.apache.spark.streaming.StreamingContextSuite$$anonfun$8.apply(StreamingContextSuite.scala:96) at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) Did anyone else encounter a similar error? Cheers
Re: Can not build master
Here is mine: Apache Maven 3.3.1 (cab6659f9874fa96462afef40fcf6bc033d58c1c; 2015-03-13T13:10:27-07:00) Maven home: /home/hbase/apache-maven-3.3.1 Java version: 1.8.0_45, vendor: Oracle Corporation Java home: /home/hbase/jdk1.8.0_45/jre Default locale: en_US, platform encoding: UTF-8 OS name: linux, version: 2.6.32-504.el6.x86_64, arch: amd64, family: unix On Fri, Jul 3, 2015 at 6:05 PM, Andrew Or and...@databricks.com wrote: @Tarek and Ted, what maven versions are you using? 2015-07-03 17:35 GMT-07:00 Krishna Sankar ksanka...@gmail.com: Patrick, I assume an RC3 will be out for folks like me to test the distribution. As usual, I will run the tests when you have a new distribution. Cheers k/ On Fri, Jul 3, 2015 at 4:38 PM, Patrick Wendell pwend...@gmail.com wrote: Patch that added test-jar dependencies: https://github.com/apache/spark/commit/bfe74b34 Patch that originally disabled dependency reduced poms: https://github.com/apache/spark/commit/984ad60147c933f2d5a2040c87ae687c14eb1724 Patch that reverted the disabling of dependency reduced poms: https://github.com/apache/spark/commit/bc51bcaea734fe64a90d007559e76f5ceebfea9e On Fri, Jul 3, 2015 at 4:36 PM, Patrick Wendell pwend...@gmail.com wrote: Okay I did some forensics with Sean Owen. Some things about this bug: 1. The underlying cause is that we added some code to make the tests of sub modules depend on the core tests. For unknown reasons this causes Spark to hit MSHADE-148 for *some* combinations of build profiles. 2. MSHADE-148 can be worked around by disabling building of dependency reduced poms because then the buggy code path is circumvented. Andrew Or did this in a patch on the 1.4 branch. However, that is not a tenable option for us because our *published* pom files require dependency reduction to substitute in the scala version correctly for the poms published to maven central. 3. As a result, Andrew Or reverted his patch recently, causing some package builds to start failing again (but publishing works now). 4. The reason this is not detected in our test harness or release build is that it is sensitive to the profiles enabled. The combination of profiles we enable in the test harness and release builds do not trigger this bug. The best path I see forward right now is to do the following: 1. Disable creation of dependency reduced poms by default (this doesn't matter for people doing a package build) so typical users won't have this bug. 2. Add a profile that re-enables that setting. 3. Use the above profile when publishing release artifacts to maven central. 4. Hope that we don't hit this bug for publishing. - Patrick On Fri, Jul 3, 2015 at 3:51 PM, Tarek Auel tarek.a...@gmail.com wrote: Doesn't change anything for me. On Fri, Jul 3, 2015 at 3:45 PM Patrick Wendell pwend...@gmail.com wrote: Can you try using the built in maven build/mvn...? All of our builds are passing on Jenkins so I wonder if it's a maven version issue: https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Compile/ - Patrick On Fri, Jul 3, 2015 at 3:14 PM, Ted Yu yuzhih...@gmail.com wrote: Please take a look at SPARK-8781 (https://github.com/apache/spark/pull/7193) Cheers On Fri, Jul 3, 2015 at 3:05 PM, Tarek Auel tarek.a...@gmail.com wrote: I found a solution, there might be a better one. 
https://github.com/apache/spark/pull/7217 On Fri, Jul 3, 2015 at 2:28 PM Robin East robin.e...@xense.co.uk wrote: Yes me too On 3 Jul 2015, at 22:21, Ted Yu yuzhih...@gmail.com wrote: This is what I got (the last line was repeated non-stop): [INFO] Replacing original artifact with shaded artifact. [INFO] Replacing /home/hbase/spark/bagel/target/spark-bagel_2.10-1.5.0-SNAPSHOT.jar with /home/hbase/spark/bagel/target/spark-bagel_2.10-1.5.0-SNAPSHOT-shaded.jar [INFO] Dependency-reduced POM written at: /home/hbase/spark/bagel/dependency-reduced-pom.xml [INFO] Dependency-reduced POM written at: /home/hbase/spark/bagel/dependency-reduced-pom.xml On Fri, Jul 3, 2015 at 1:13 PM, Tarek Auel tarek.a...@gmail.com wrote: Hi all, I am trying to build master, but it gets stuck and prints [INFO] Dependency-reduced POM written at: /Users/tarek/test/spark/bagel/dependency-reduced-pom.xml build command: mvn -DskipTests clean package Do others have the same issue? Regards, Tarek - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: Can not build master
This is what I got (the last line was repeated non-stop): [INFO] Replacing original artifact with shaded artifact. [INFO] Replacing /home/hbase/spark/bagel/target/spark-bagel_2.10-1.5.0-SNAPSHOT.jar with /home/hbase/spark/bagel/target/spark-bagel_2.10-1.5.0-SNAPSHOT-shaded.jar [INFO] Dependency-reduced POM written at: /home/hbase/spark/bagel/dependency-reduced-pom.xml [INFO] Dependency-reduced POM written at: /home/hbase/spark/bagel/dependency-reduced-pom.xml On Fri, Jul 3, 2015 at 1:13 PM, Tarek Auel tarek.a...@gmail.com wrote: Hi all, I am trying to build master, but it gets stuck and prints [INFO] Dependency-reduced POM written at: /Users/tarek/test/spark/bagel/dependency-reduced-pom.xml build command: mvn -DskipTests clean package Do others have the same issue? Regards, Tarek
Re: [VOTE] Release Apache Spark 1.4.1 (RC2)
Patrick: I used the following command: ~/apache-maven-3.3.1/bin/mvn -DskipTests -Phadoop-2.4 -Pyarn -Phive clean package The build doesn't seem to stop. Here is tail of build output: [INFO] Dependency-reduced POM written at: /home/hbase/spark-1.4.1/bagel/dependency-reduced-pom.xml [INFO] Dependency-reduced POM written at: /home/hbase/spark-1.4.1/bagel/dependency-reduced-pom.xml Here is part of the stack trace for the build process: http://pastebin.com/xL2Y0QMU FYI On Fri, Jul 3, 2015 at 1:15 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.4.1! This release fixes a handful of known issues in Spark 1.4.0, listed here: http://s.apache.org/spark-1.4.1 The tag to be voted on is v1.4.1-rc2 (commit 07b95c7): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h= 07b95c7adf88f0662b7ab1c47e302ff5e6859606 The release files, including signatures, digests, etc. can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc2-bin/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: [published as version: 1.4.1] https://repository.apache.org/content/repositories/orgapachespark-1120/ [published as version: 1.4.1-rc2] https://repository.apache.org/content/repositories/orgapachespark-1121/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc2-docs/ Please vote on releasing this package as Apache Spark 1.4.1! The vote is open until Monday, July 06, at 22:00 UTC and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 1.4.1 [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.apache.org/ - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: [VOTE] Release Apache Spark 1.4.1
Here is the command I used: mvn -Phadoop-2.4 -Dhadoop.version=2.7.0 -Pyarn -Phive package Java: 1.8.0_45 OS: Linux x.com 2.6.32-504.el6.x86_64 #1 SMP Wed Oct 15 04:27:16 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Cheers On Mon, Jun 29, 2015 at 12:04 AM, Tathagata Das tathagata.das1...@gmail.com wrote: @Ted, could you elaborate more on what was the test command that you ran? What profiles, using SBT or Maven? TD On Sun, Jun 28, 2015 at 12:21 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Krishna - this is still the current release candidate. - Patrick On Sun, Jun 28, 2015 at 12:14 PM, Krishna Sankar ksanka...@gmail.com wrote: Patrick, Haven't seen any replies on test results. I will byte ;o) - Should I test this version or is another one in the wings ? Cheers k/ On Tue, Jun 23, 2015 at 10:37 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.4.1! This release fixes a handful of known issues in Spark 1.4.0, listed here: http://s.apache.org/spark-1.4.1 The tag to be voted on is v1.4.1-rc1 (commit 60e08e5): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h= 60e08e50751fe3929156de956d62faea79f5b801 The release files, including signatures, digests, etc. can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-bin/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: [published as version: 1.4.1] https://repository.apache.org/content/repositories/orgapachespark-1118/ [published as version: 1.4.1-rc1] https://repository.apache.org/content/repositories/orgapachespark-1119/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-docs/ Please vote on releasing this package as Apache Spark 1.4.1! The vote is open until Saturday, June 27, at 06:32 UTC and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 1.4.1 [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.apache.org/ - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: Spark 1.5.0-SNAPSHOT broken with Scala 2.11
Spark-Master-Scala211-Compile build is green. However it is not clear what the actual command is: [EnvInject] - Variables injected successfully. [Spark-Master-Scala211-Compile] $ /bin/bash /tmp/hudson8945334776362889961.sh FYI On Sun, Jun 28, 2015 at 6:02 PM, Alessandro Baretta alexbare...@gmail.com wrote: I am building the current master branch with Scala 2.11 following these instructions: Building for Scala 2.11 To produce a Spark package compiled with Scala 2.11, use the -Dscala-2.11 property: dev/change-version-to-2.11.sh mvn -Pyarn -Phadoop-2.4 -Dscala-2.11 -DskipTests clean package Here's what I'm seeing: log4j:WARN No appenders could be found for logger (org.apache.hadoop.security.Groups). log4j:WARN Please initialize the log4j system properly. log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info. Using Spark's repl log4j profile: org/apache/spark/log4j-defaults-repl.properties To adjust logging level use sc.setLogLevel(INFO) Welcome to __ / __/__ ___ _/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 1.5.0-SNAPSHOT /_/ Using Scala version 2.10.4 (OpenJDK 64-Bit Server VM, Java 1.7.0_79) Type in expressions to have them evaluated. Type :help for more information. 15/06/29 00:42:20 ERROR ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.remote.default-remote-dispatcher-6] shutting down ActorSystem [sparkDriver] java.lang.VerifyError: class akka.remote.WireFormats$AkkaControlMessage overrides final method getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet; at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:800) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) at java.net.URLClassLoader.defineClass(URLClassLoader.java:449) at java.net.URLClassLoader.access$100(URLClassLoader.java:71) at java.net.URLClassLoader$1.run(URLClassLoader.java:361) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) at akka.remote.transport.AkkaPduProtobufCodec$.constructControlMessagePdu(AkkaPduCodec.scala:231) at akka.remote.transport.AkkaPduProtobufCodec$.init(AkkaPduCodec.scala:153) at akka.remote.transport.AkkaPduProtobufCodec$.clinit(AkkaPduCodec.scala) at akka.remote.EndpointManager$$anonfun$9.apply(Remoting.scala:733) at akka.remote.EndpointManager$$anonfun$9.apply(Remoting.scala:703) What am I doing wrong?
Re: Kryo option changed
Please update to the following: commit c2f0821aad3b82dcd327e914c9b297e92526649d Author: Zhang, Liye liye.zh...@intel.com Date: Fri May 8 09:10:58 2015 +0100 [SPARK-7392] [CORE] bugfix: Kryo buffer size cannot be larger than 2M On Sun, May 24, 2015 at 8:04 AM, Debasish Das debasish.da...@gmail.com wrote: I am May 3rd commit: commit 49549d5a1a867c3ba25f5e4aec351d4102444bc0 Author: WangTaoTheTonic wangtao...@huawei.com Date: Sun May 3 00:47:47 2015 +0100 [SPARK-7031] [THRIFTSERVER] let thrift server take SPARK_DAEMON_MEMORY and SPARK_DAEMON_JAVA_OPTS On Sat, May 23, 2015 at 7:54 PM, Josh Rosen rosenvi...@gmail.com wrote: Which commit of master are you building off? It looks like there was a bugfix for an issue related to KryoSerializer buffer configuration: https://github.com/apache/spark/pull/5934 That patch was committed two weeks ago, but you mentioned that you're building off a newer version of master. Could you confirm the commit that you're running? If this used to work but now throws an error, then this is a regression that should be fixed; we shouldn't require you to perform a mb - kb conversion to work around this. On Sat, May 23, 2015 at 6:37 PM, Ted Yu yuzhih...@gmail.com wrote: Pardon me. Please use '8192k' Cheers On Sat, May 23, 2015 at 6:24 PM, Debasish Das debasish.da...@gmail.com wrote: Tried 8mb...still I am failing on the same error... On Sat, May 23, 2015 at 6:10 PM, Ted Yu yuzhih...@gmail.com wrote: bq. it shuld be 8mb Please use the above syntax. Cheers On Sat, May 23, 2015 at 6:04 PM, Debasish Das debasish.da...@gmail.com wrote: Hi, I am on last week's master but all the examples that set up the following .set(spark.kryoserializer.buffer, 8m) are failing with the following error: Exception in thread main java.lang.IllegalArgumentException: spark.kryoserializer.buffer must be less than 2048 mb, got: + 8192 mb. looks like buffer.mb is deprecated...Is 8m is not the right syntax to get 8mb kryo buffer or it shuld be 8mb Thanks. Deb
Re: Kryo option changed
bq. it shuld be 8mb Please use the above syntax. Cheers On Sat, May 23, 2015 at 6:04 PM, Debasish Das debasish.da...@gmail.com wrote: Hi, I am on last week's master but all the examples that set up the following .set(spark.kryoserializer.buffer, 8m) are failing with the following error: Exception in thread main java.lang.IllegalArgumentException: spark.kryoserializer.buffer must be less than 2048 mb, got: + 8192 mb. looks like buffer.mb is deprecated...Is 8m is not the right syntax to get 8mb kryo buffer or it shuld be 8mb Thanks. Deb
Re: Kryo option changed
Pardon me. Please use '8192k' Cheers On Sat, May 23, 2015 at 6:24 PM, Debasish Das debasish.da...@gmail.com wrote: Tried 8mb...still I am failing on the same error... On Sat, May 23, 2015 at 6:10 PM, Ted Yu yuzhih...@gmail.com wrote: bq. it shuld be 8mb Please use the above syntax. Cheers On Sat, May 23, 2015 at 6:04 PM, Debasish Das debasish.da...@gmail.com wrote: Hi, I am on last week's master but all the examples that set up the following .set(spark.kryoserializer.buffer, 8m) are failing with the following error: Exception in thread main java.lang.IllegalArgumentException: spark.kryoserializer.buffer must be less than 2048 mb, got: + 8192 mb. looks like buffer.mb is deprecated...Is 8m is not the right syntax to get 8mb kryo buffer or it shuld be 8mb Thanks. Deb
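For anyone hitting the same error before picking up the SPARK-7392 fix mentioned above, a workaround consistent with this advice is to express the 8 MB buffer in kilobytes. A minimal sketch follows; the application name is only illustrative, while the property names are the standard Kryo settings.

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("KryoBufferExample") // illustrative name
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // Work around the parsing issue by giving the 8 MB buffer in kilobytes.
  .set("spark.kryoserializer.buffer", "8192k")

Once a build containing the SPARK-7392 fix is used, the shorter "8m" form should be accepted again.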
Re: [IMPORTANT] Committers please update merge script
INFRA-9646 has been resolved. FYI On Wed, May 13, 2015 at 6:00 PM, Patrick Wendell pwend...@gmail.com wrote: Hi All - unfortunately the fix introduced another bug, which is that fixVersion was not updated properly. I've updated the script and had one other person test it. So committers please pull from master again thanks! - Patrick On Tue, May 12, 2015 at 6:25 PM, Patrick Wendell pwend...@gmail.com wrote: Due to an ASF infrastructure change (bug?) [1] the default JIRA resolution status has switched to Pending Closed. I've made a change to our merge script to coerce the correct status of Fixed when resolving [2]. Please upgrade the merge script to master. I've manually corrected JIRA's that were closed with the incorrect status. Let me know if you have any issues. [1] https://issues.apache.org/jira/browse/INFRA-9646 [2] https://github.com/apache/spark/commit/1b9e434b6c19f23a01e9875a3c1966cd03ce8e2d - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: Unable to build from assembly
What version of Java do you use? Can you run this command first? build/sbt clean BTW, please see [SPARK-7498] [MLLIB] add varargs back to setDefault Cheers On Fri, May 22, 2015 at 7:34 AM, Manoj Kumar manojkumarsivaraj...@gmail.com wrote: Hello, I updated my master from upstream recently, and on running build/sbt assembly it gives me this error: [error] /home/manoj/spark/examples/src/main/java/org/apache/spark/examples/ml/JavaDeveloperApiExample.java:106: error: MyJavaLogisticRegression is not abstract and does not override abstract method setDefault(ParamPair<?>...) in Params [error] class MyJavaLogisticRegression [error] ^ [error] /home/manoj/spark/examples/src/main/java/org/apache/spark/examples/ml/JavaDeveloperApiExample.java:168: error: MyJavaLogisticRegressionModel is not abstract and does not override abstract method setDefault(ParamPair<?>...) in Params [error] class MyJavaLogisticRegressionModel [error] ^ [error] 2 errors [error] (examples/compile:compile) javac returned nonzero exit code It was working fine before this. Could someone please guide me on what could be wrong? -- Godspeed, Manoj Kumar, http://manojbits.wordpress.com http://github.com/MechCoder
Re: Re: Re: Package Release Announcement: Spark SQL on HBase Astro
Yan: Where can I find performance numbers for Astro (it's close to the middle of August)? Cheers On Tue, Aug 11, 2015 at 3:58 PM, Yan Zhou.sc yan.zhou...@huawei.com wrote: Finally I can take a look at HBASE-14181 now. Unfortunately there is no design doc mentioned. Superficially it is very similar to Astro, with the difference that it is part of the HBase client library, while Astro works as a Spark package and so will evolve and function more closely with Spark SQL/DataFrame instead of HBase. In terms of architecture, my take is loosely-coupled query engines on top of a KV store vs. an array of query engines supported by, and packaged as part of, a KV store. Functionality-wise the two could be close, but Astro also supports Python as a result of tight integration with Spark. It will be interesting to see performance comparisons when HBASE-14181 is ready. Thanks, From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: Tuesday, August 11, 2015 3:28 PM To: Yan Zhou.sc Cc: Bing Xiao (Bing); dev@spark.apache.org; u...@spark.apache.org Subject: Re: Re: Package Release Announcement: Spark SQL on HBase Astro HBase will not have a query engine. It will provide better support to query engines. Cheers On Aug 10, 2015, at 11:11 PM, Yan Zhou.sc yan.zhou...@huawei.com wrote: Ted, I'm in China now and seem to be having difficulty accessing Apache JIRA. Anyway, it appears to me that HBASE-14181 https://issues.apache.org/jira/browse/HBASE-14181 attempts to support Spark DataFrames inside HBase. If true, one question to me is whether HBase is intended to have a built-in query engine or not, or whether it will stick with the current approach of a k-v store with some built-in processing capabilities in the form of coprocessors, custom filters, etc., which allows loosely-coupled query engines to be built on top of it. Thanks, From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: August 11, 2015 8:54 To: Bing Xiao (Bing) Cc: dev@spark.apache.org; u...@spark.apache.org; Yan Zhou.sc Subject: Re: Package Release Announcement: Spark SQL on HBase Astro Yan / Bing: Mind taking a look at HBASE-14181 https://issues.apache.org/jira/browse/HBASE-14181 'Add Spark DataFrame DataSource to HBase-Spark Module'? Thanks On Wed, Jul 22, 2015 at 4:53 PM, Bing Xiao (Bing) bing.x...@huawei.com wrote: We are happy to announce the availability of the Spark SQL on HBase 1.0.0 release. http://spark-packages.org/package/Huawei-Spark/Spark-SQL-on-HBase The main features in this package, dubbed “Astro”, include: · Systematic and powerful handling of data pruning and intelligent scan, based on a partial evaluation technique · HBase pushdown capabilities like custom filters and coprocessors to support ultra-low-latency processing · SQL, DataFrame support · More SQL capabilities made possible (secondary index, bloom filter, primary key, bulk load, update) · Joins with data from other sources · Python/Java/Scala support · Support for the latest Spark 1.4.0 release The tests by the Huawei team and community contributors covered these areas: bulk load; projection pruning; partition pruning; partial evaluation; code generation; coprocessor; custom filtering; DML; complex filtering on keys and non-keys; join/union with non-HBase data; DataFrame; multi-column family tests. We will post the test results, including performance tests, in the middle of August.
You are very welcome to try out or deploy the package, and to help improve the integration tests with various combinations of the settings, extensive DataFrame tests, complex join/union tests, and extensive performance tests. Please use the “Issues” and “Pull Requests” links at the package homepage if you want to report bugs, improvements, or feature requests. Special thanks to project owner and technical leader Yan Zhou, the Huawei global team, community contributors, and Databricks. Databricks has been providing great assistance from the design to the release. “Astro”, the Spark SQL on HBase package, will be useful for ultra-low-latency query and analytics of large-scale data sets in vertical enterprises. We will continue to work with the community to develop new features and improve the code base. Your comments and suggestions are greatly appreciated. Yan Zhou / Bing Xiao Huawei Big Data team
Re: [VOTE] Release Apache Spark 1.5.0 (RC1)
I pointed hbase-spark module (in HBase project) to 1.5.0-rc1 and was able to build the module (with proper maven repo). FYI On Fri, Aug 21, 2015 at 2:17 PM, mkhaitman mark.khait...@chango.com wrote: Just a heads up that this RC1 release is still appearing as 1.5.0-SNAPSHOT (Not just me right..?) -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-5-0-RC1-tp13780p13792.html Sent from the Apache Spark Developers List mailing list archive at Nabble.com. - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: What's the best practice for developing new features for spark ?
See this thread: http://search-hadoop.com/m/q3RTtdZv0d1btRHl/Spark+build+modulesubj=Building+Spark+Building+just+one+module+ On Aug 19, 2015, at 1:44 AM, canan chen ccn...@gmail.com wrote: I want to work on one JIRA, but it is not easy to unit test because it involves different components, especially the UI. Building Spark is pretty slow, and I don't want to rebuild it each time to test my code change. I am wondering how other people do this. Is there any experience you can share? Thanks - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: [VOTE] Release Apache Spark 1.4.1
The test passes when run alone on my machine as well. Please run test suite. Thanks On Mon, Jun 29, 2015 at 2:01 PM, Tathagata Das tathagata.das1...@gmail.com wrote: @Ted, I ran the following two commands. mvn -Phadoop-2.4 -Dhadoop.version=2.7.0 -Pyarn -Phive -DskipTests clean package mvn -Phadoop-2.4 -Dhadoop.version=2.7.0 -Pyarn -Phive -DwildcardSuites=org.apache.spark.streaming.StreamingContextSuite test Using Java version 1.7.0_51, the tests passed normally. On Mon, Jun 29, 2015 at 1:05 PM, Krishna Sankar ksanka...@gmail.com wrote: +1 (non-binding, of course) 1. Compiled OSX 10.10 (Yosemite) OK Total time: 13:26 min mvn clean package -Pyarn -Phadoop-2.6 -DskipTests 2. Tested pyspark, mllib 2.1. statistics (min,max,mean,Pearson,Spearman) OK 2.2. Linear/Ridge/Laso Regression OK 2.3. Decision Tree, Naive Bayes OK 2.4. KMeans OK Center And Scale OK 2.5. RDD operations OK State of the Union Texts - MapReduce, Filter,sortByKey (word count) 2.6. Recommendation (Movielens medium dataset ~1 M ratings) OK Model evaluation/optimization (rank, numIter, lambda) with itertools OK 3. Scala - MLlib 3.1. statistics (min,max,mean,Pearson,Spearman) OK 3.2. LinearRegressionWithSGD OK 3.3. Decision Tree OK 3.4. KMeans OK 3.5. Recommendation (Movielens medium dataset ~1 M ratings) OK 3.6. saveAsParquetFile OK 3.7. Read and verify the 4.3 save(above) - sqlContext.parquetFile, registerTempTable, sql OK 3.8. result = sqlContext.sql(SELECT OrderDetails.OrderID,ShipCountry,UnitPrice,Qty,Discount FROM Orders INNER JOIN OrderDetails ON Orders.OrderID = OrderDetails.OrderID) OK 4.0. Spark SQL from Python OK 4.1. result = sqlContext.sql(SELECT * from people WHERE State = 'WA') OK 5.0. Packages 5.1. com.databricks.spark.csv - read/write OK Cheers k/ On Tue, Jun 23, 2015 at 10:37 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.4.1! This release fixes a handful of known issues in Spark 1.4.0, listed here: http://s.apache.org/spark-1.4.1 The tag to be voted on is v1.4.1-rc1 (commit 60e08e5): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h= 60e08e50751fe3929156de956d62faea79f5b801 The release files, including signatures, digests, etc. can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-bin/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: [published as version: 1.4.1] https://repository.apache.org/content/repositories/orgapachespark-1118/ [published as version: 1.4.1-rc1] https://repository.apache.org/content/repositories/orgapachespark-1119/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-docs/ Please vote on releasing this package as Apache Spark 1.4.1! The vote is open until Saturday, June 27, at 06:32 UTC and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 1.4.1 [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.apache.org/ - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: [VOTE] Release Apache Spark 1.4.1
Andrew: I agree with your assessment. Cheers On Mon, Jun 29, 2015 at 3:33 PM, Andrew Or and...@databricks.com wrote: Hi Ted, We haven't observed a StreamingContextSuite failure on our test infrastructure recently. Given that we cannot reproduce it even locally it is unlikely that this uncovers a real bug. Even if it does I would not block the release on it because many in the community are waiting for a few important fixes. In general, there will always be outstanding issues in Spark that we cannot address in every release. -Andrew 2015-06-29 14:29 GMT-07:00 Ted Yu yuzhih...@gmail.com: The test passes when run alone on my machine as well. Please run test suite. Thanks On Mon, Jun 29, 2015 at 2:01 PM, Tathagata Das tathagata.das1...@gmail.com wrote: @Ted, I ran the following two commands. mvn -Phadoop-2.4 -Dhadoop.version=2.7.0 -Pyarn -Phive -DskipTests clean package mvn -Phadoop-2.4 -Dhadoop.version=2.7.0 -Pyarn -Phive -DwildcardSuites=org.apache.spark.streaming.StreamingContextSuite test Using Java version 1.7.0_51, the tests passed normally. On Mon, Jun 29, 2015 at 1:05 PM, Krishna Sankar ksanka...@gmail.com wrote: +1 (non-binding, of course) 1. Compiled OSX 10.10 (Yosemite) OK Total time: 13:26 min mvn clean package -Pyarn -Phadoop-2.6 -DskipTests 2. Tested pyspark, mllib 2.1. statistics (min,max,mean,Pearson,Spearman) OK 2.2. Linear/Ridge/Laso Regression OK 2.3. Decision Tree, Naive Bayes OK 2.4. KMeans OK Center And Scale OK 2.5. RDD operations OK State of the Union Texts - MapReduce, Filter,sortByKey (word count) 2.6. Recommendation (Movielens medium dataset ~1 M ratings) OK Model evaluation/optimization (rank, numIter, lambda) with itertools OK 3. Scala - MLlib 3.1. statistics (min,max,mean,Pearson,Spearman) OK 3.2. LinearRegressionWithSGD OK 3.3. Decision Tree OK 3.4. KMeans OK 3.5. Recommendation (Movielens medium dataset ~1 M ratings) OK 3.6. saveAsParquetFile OK 3.7. Read and verify the 4.3 save(above) - sqlContext.parquetFile, registerTempTable, sql OK 3.8. result = sqlContext.sql(SELECT OrderDetails.OrderID,ShipCountry,UnitPrice,Qty,Discount FROM Orders INNER JOIN OrderDetails ON Orders.OrderID = OrderDetails.OrderID) OK 4.0. Spark SQL from Python OK 4.1. result = sqlContext.sql(SELECT * from people WHERE State = 'WA') OK 5.0. Packages 5.1. com.databricks.spark.csv - read/write OK Cheers k/ On Tue, Jun 23, 2015 at 10:37 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.4.1! This release fixes a handful of known issues in Spark 1.4.0, listed here: http://s.apache.org/spark-1.4.1 The tag to be voted on is v1.4.1-rc1 (commit 60e08e5): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h= 60e08e50751fe3929156de956d62faea79f5b801 The release files, including signatures, digests, etc. can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-bin/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: [published as version: 1.4.1] https://repository.apache.org/content/repositories/orgapachespark-1118/ [published as version: 1.4.1-rc1] https://repository.apache.org/content/repositories/orgapachespark-1119/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-docs/ Please vote on releasing this package as Apache Spark 1.4.1! 
The vote is open until Saturday, June 27, at 06:32 UTC and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 1.4.1 [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.apache.org/ - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: add to user list
Please take a look at the first section of: https://spark.apache.org/community On Thu, Jul 30, 2015 at 9:23 PM, Sachin Aggarwal different.sac...@gmail.com wrote: -- Thanks Regards Sachin Aggarwal 7760502772
Re: High availability with zookeeper: worker discovery
ZooKeeper is not a direct dependency of Spark. Can you give a bit more detail on how the election / discovery of the master works? Cheers On Thu, Jul 30, 2015 at 7:41 PM, Christophe Schmitz cofcof...@gmail.com wrote: Hi there, I am trying to run a 3-node Spark cluster where each node contains a Spark worker and a Spark master. Election of the master happens via ZooKeeper. The way I am configuring it is by (on each node) giving the IP:PORT of the local master to the local worker, and I wish the worker could discover the elected master automatically. But unfortunately, only the local worker of the elected master registers with the elected master. Why aren't the other workers connecting to the elected master? The interesting thing is that if I kill the elected master and wait a bit, the newly elected master sees all the workers! I am wondering if I am missing something to make this happen without having to kill the elected master. Thanks! PS: I am on Spark 1.2.2
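For readers who run into the same behavior: with ZooKeeper-based standby masters in standalone mode, the approach described in Spark's high-availability documentation is to give workers and applications the full comma-separated list of masters rather than only the local one, so they can register with whichever master is currently elected. A minimal sketch for the application side, with placeholder host names:

import org.apache.spark.{SparkConf, SparkContext}

// node1/node2/node3 are placeholders; list every master so the driver can
// register with whichever one ZooKeeper has elected as the leader.
val conf = new SparkConf()
  .setAppName("StandaloneHAExample")
  .setMaster("spark://node1:7077,node2:7077,node3:7077")
val sc = new SparkContext(conf)

The same comma-separated master URL can be passed to each worker when it is started, so workers do not have to wait for a failover before they find the elected master.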