Re: Meets "java.lang.IllegalArgumentException" when test spark ml pipe with DecisionTreeClassifier
Sean Do you know how to tell decision tree that the "label" is a binary or set some attributes to dataframe to carry number of classes? Thanks! - Terry On Sun, Sep 6, 2015 at 5:23 PM, Sean Owen <so...@cloudera.com> wrote: > (Sean) > The error suggests that the type is not a binary or nominal attribute > though. I think that's the missing step. A double-valued column need > not be one of these attribute types. > > On Sun, Sep 6, 2015 at 10:14 AM, Terry Hole <hujie.ea...@gmail.com> wrote: > > Hi, Owen, > > > > The dataframe "training" is from a RDD of case class: > RDD[LabeledDocument], > > while the case class is defined as this: > > case class LabeledDocument(id: Long, text: String, label: Double) > > > > So there is already has the default "label" column with "double" type. > > > > I already tried to set the label column for decision tree as this: > > val lr = new > > > DecisionTreeClassifier().setMaxDepth(5).setMaxBins(32).setImpurity("gini").setLabelCol("label") > > It raised the same error. > > > > I also tried to change the "label" to "int" type, it also reported error > > like following stack, I have no idea how to make this work. > > > > java.lang.IllegalArgumentException: requirement failed: Column label > must be > > of type DoubleType but was actually IntegerType. > > at scala.Predef$.require(Predef.scala:233) > > at > > > org.apache.spark.ml.util.SchemaUtils$.checkColumnType(SchemaUtils.scala:37) > > at > > > org.apache.spark.ml.PredictorParams$class.validateAndTransformSchema(Predictor.scala:53) > > at > > > org.apache.spark.ml.Predictor.validateAndTransformSchema(Predictor.scala:71) > > at > > org.apache.spark.ml.Predictor.transformSchema(Predictor.scala:116) > > at > > > org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:162) > > at > > > org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:162) > > at > > > scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51) > > at > > > scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60) > > at > > scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:108) > > at > org.apache.spark.ml.Pipeline.transformSchema(Pipeline.scala:162) > > at > > org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:59) > > at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:116) > > at > > > $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:51) > > at > > > $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:56) > > at > > $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:58) > > at > $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:60) > > at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:62) > > at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:64) > > at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:66) > > at $iwC$$iwC$$iwC$$iwC$$iwC.(:68) > > at $iwC$$iwC$$iwC$$iwC.(:70) > > at $iwC$$iwC$$iwC.(:72) > > at $iwC$$iwC.(:74) > > at $iwC.(:76) > > at (:78) > > at .(:82) > > at .() > > at .(:7) > > at .() > > at $print() > > > > Thanks! > > - Terry > > > > On Sun, Sep 6, 2015 at 4:53 PM, Sean Owen <so...@cloudera.com> wrote: > >> > >> I think somewhere alone the line you've not specified your label > >> column -- it's defaulting to "label" and it does not recognize it, or > >> at least not as a binary or nominal attribute. 
> >> > >> On Sun, Sep 6, 2015 at 5:47 AM, Terry Hole <hujie.ea...@gmail.com> > wrote: > >> > Hi, Experts, > >> > > >> > I followed the guide of spark ml pipe to test DecisionTreeClassifier > on > >> > spark shell with spark 1.4.1, but always meets error like following, > do > >> > you > >> > have any idea how to fix this? > >> > > >> > The error stack: > >> > java.lang.IllegalArgumentException: DecisionTreeClassifier was given > >> > input > >> > with invalid label column label, without the number of classes > >> > specified. > >> > See StringIndexer. > >> > at > >> > > >&g
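For what it's worth, a minimal sketch of the fix Sean and the original error message ("See StringIndexer") are pointing at: run the label column through StringIndexer inside the pipeline, so the indexed column carries nominal-attribute metadata, including the number of classes, that DecisionTreeClassifier can read. This assumes `training` is the DataFrame built from the LabeledDocument RDD, and that your Spark version lets StringIndexer index a numeric column (otherwise cast the label to string first); treat it as a sketch, not the thread's confirmed solution.

    import org.apache.spark.ml.Pipeline
    import org.apache.spark.ml.classification.DecisionTreeClassifier
    import org.apache.spark.ml.feature.{HashingTF, StringIndexer, Tokenizer}

    val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
    val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")
    // the indexed label column carries the class count as ML attribute metadata
    val labelIndexer = new StringIndexer().setInputCol("label").setOutputCol("indexedLabel")
    val dt = new DecisionTreeClassifier()
      .setLabelCol("indexedLabel")
      .setFeaturesCol("features")
      .setMaxDepth(5).setMaxBins(32).setImpurity("gini")
    val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, labelIndexer, dt))
    val model = pipeline.fit(training)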
`sbt core/test` hangs on LogUrlsStandaloneSuite?
Hi, Am I doing something off base to execute tests for the core module using sbt as follows? [spark]> core/test ... [info] KryoSerializerAutoResetDisabledSuite: [info] - sort-shuffle with bypassMergeSort (SPARK-7873) (53 milliseconds) [info] - calling deserialize() after deserializeStream() (2 milliseconds) [info] LogUrlsStandaloneSuite: ...AND HANGS HERE :( The reason I'm asking is that the command hangs after printing the above [info]. I'm on Mac OS and Java 8 with the latest sources - fc48307797912dc1d53893dce741ddda8630957b. While taking a thread dump I can see quite a few WAITINGs and TIMED_WAITINGs "at sun.misc.Unsafe.park(Native Method)" Regards, Jacek -- Jacek Laskowski | http://blog.japila.pl | http://blog.jaceklaskowski.pl Follow me at https://twitter.com/jaceklaskowski Upvote at http://stackoverflow.com/users/1305344/jacek-laskowski - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
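One way to narrow this down: run only the suspect suite, and take a proper thread dump of the JVM running the tests while it hangs. The fully-qualified suite name below is an assumption based on where LogUrlsStandaloneSuite lived in the source tree at the time.

    build/sbt "core/test-only org.apache.spark.deploy.LogUrlsStandaloneSuite"
    jps -l        # find the pid of the JVM running the tests
    jstack <pid>  # full thread dump, shows what the parked threads are waiting on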
Re: Writing test case for spark streaming checkpointing
Kill the job in the middle of a batch, look at the worker logs to see which offsets were being processed, and verify the messages for those offsets are read when you start the job back up. On Thu, Aug 27, 2015 at 10:14 AM, Hafiz Mujadid hafizmujadi...@gmail.com wrote: Hi! I have enabled checkpointing in spark streaming with kafka. I can see that spark streaming is checkpointing to the mentioned directory on HDFS. How can I test that it works fine and recovers with no data loss? Thanks -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Writing-test-case-for-spark-streaming-checkpointing-tp24475.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
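A minimal sketch of the recovery path such a test exercises, assuming the standard StreamingContext.getOrCreate pattern from the streaming guide; the checkpoint directory and the file-based input are placeholders for the real Kafka stream.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val checkpointDir = "hdfs:///tmp/streaming-checkpoint"   // hypothetical path
    val sparkConf = new SparkConf().setAppName("CheckpointRecoveryTest")

    def createContext(): StreamingContext = {
      val ssc = new StreamingContext(sparkConf, Seconds(5))
      ssc.checkpoint(checkpointDir)
      // stand-in for the real Kafka input and output operations
      val stream = ssc.textFileStream("/tmp/input")
      stream.count().print()
      ssc
    }

    // the first run builds a fresh context; after killing the job mid-batch,
    // restarting goes through this same call and rebuilds from the checkpoint
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()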
Writing test case for spark streaming checkpointing
Hi! I have enabled checkpointing in spark streaming with kafka. I can see that spark streaming is checkpointing to the mentioned directory on HDFS. How can I test that it works fine and recovers with no data loss? Thanks -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Writing-test-case-for-spark-streaming-checkpointing-tp24475.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: How to unit test HiveContext without OutOfMemoryError (using sbt)
Thanks for your response Yana, I can increase the MaxPermSize parameter and it will allow me to run the unit test a few more times before I run out of memory. However, the primary issue is that running the same unit test in the same JVM (multiple times) results in increased memory (each run of the unit test) and I believe it has something to do with HiveContext not reclaiming memory after it is finished (or I'm not shutting it down properly). It could very well be related to sbt, however, it's not clear to me. On Tue, Aug 25, 2015 at 1:12 PM, Yana Kadiyska yana.kadiy...@gmail.com wrote: The PermGen space error is controlled with MaxPermSize parameter. I run with this in my pom, I think copied pretty literally from Spark's own tests... I don't know what the sbt equivalent is but you should be able to pass it...possibly via SBT_OPTS? plugin groupIdorg.scalatest/groupId artifactIdscalatest-maven-plugin/artifactId version1.0/version configuration reportsDirectory${project.build.directory}/surefire-reports/reportsDirectory parallelfalse/parallel junitxml./junitxml filereportsSparkTestSuite.txt/filereports argLine-Xmx3g -XX:MaxPermSize=256m -XX:ReservedCodeCacheSize=512m/argLine stderr/ systemProperties java.awt.headlesstrue/java.awt.headless spark.testing1/spark.testing spark.ui.enabledfalse/spark.ui.enabled spark.driver.allowMultipleContextstrue/spark.driver.allowMultipleContexts /systemProperties /configuration executions execution idtest/id goals goaltest/goal /goals /execution /executions /plugin /plugins On Tue, Aug 25, 2015 at 2:10 PM, Mike Trienis mike.trie...@orcsol.com wrote: Hello, I am using sbt and created a unit test where I create a `HiveContext` and execute some query and then return. Each time I run the unit test the JVM will increase it's memory usage until I get the error: Internal error when running tests: java.lang.OutOfMemoryError: PermGen space Exception in thread Thread-2 java.io.EOFException As a work-around, I can fork a new JVM each time I run the unit test, however, it seems like a bad solution as takes a while to run the unit test. By the way, I tried to importing the TestHiveContext: - import org.apache.spark.sql.hive.test.TestHiveContext However, it suffers from the same memory issue. Has anyone else suffered from the same problem? Note that I am running these unit tests on my mac. Cheers, Mike.
Re: How to unit test HiveContext without OutOfMemoryError (using sbt)
I'd suggest setting sbt to fork when running tests. On Wed, Aug 26, 2015 at 10:51 AM, Mike Trienis mike.trie...@orcsol.com wrote: Thanks for your response Yana, I can increase the MaxPermSize parameter and it will allow me to run the unit test a few more times before I run out of memory. However, the primary issue is that running the same unit test in the same JVM (multiple times) results in increased memory (each run of the unit test) and I believe it has something to do with HiveContext not reclaiming memory after it is finished (or I'm not shutting it down properly). It could very well be related to sbt, however, it's not clear to me. On Tue, Aug 25, 2015 at 1:12 PM, Yana Kadiyska yana.kadiy...@gmail.com wrote: The PermGen space error is controlled with MaxPermSize parameter. I run with this in my pom, I think copied pretty literally from Spark's own tests... I don't know what the sbt equivalent is but you should be able to pass it...possibly via SBT_OPTS? plugin groupIdorg.scalatest/groupId artifactIdscalatest-maven-plugin/artifactId version1.0/version configuration reportsDirectory${project.build.directory}/surefire-reports/reportsDirectory parallelfalse/parallel junitxml./junitxml filereportsSparkTestSuite.txt/filereports argLine-Xmx3g -XX:MaxPermSize=256m -XX:ReservedCodeCacheSize=512m/argLine stderr/ systemProperties java.awt.headlesstrue/java.awt.headless spark.testing1/spark.testing spark.ui.enabledfalse/spark.ui.enabled spark.driver.allowMultipleContextstrue/spark.driver.allowMultipleContexts /systemProperties /configuration executions execution idtest/id goals goaltest/goal /goals /execution /executions /plugin /plugins On Tue, Aug 25, 2015 at 2:10 PM, Mike Trienis mike.trie...@orcsol.com wrote: Hello, I am using sbt and created a unit test where I create a `HiveContext` and execute some query and then return. Each time I run the unit test the JVM will increase it's memory usage until I get the error: Internal error when running tests: java.lang.OutOfMemoryError: PermGen space Exception in thread Thread-2 java.io.EOFException As a work-around, I can fork a new JVM each time I run the unit test, however, it seems like a bad solution as takes a while to run the unit test. By the way, I tried to importing the TestHiveContext: - import org.apache.spark.sql.hive.test.TestHiveContext However, it suffers from the same memory issue. Has anyone else suffered from the same problem? Note that I am running these unit tests on my mac. Cheers, Mike.
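In sbt (0.13 syntax) that suggestion is roughly the equivalent of the Maven argLine quoted in this thread; the exact memory numbers are only an example.

    fork in Test := true
    javaOptions in Test ++= Seq("-Xmx2g", "-XX:MaxPermSize=256m", "-XX:ReservedCodeCacheSize=512m")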
How to unit test HiveContext without OutOfMemoryError (using sbt)
Hello, I am using sbt and created a unit test where I create a `HiveContext` and execute some query and then return. Each time I run the unit test the JVM will increase its memory usage until I get the error: Internal error when running tests: java.lang.OutOfMemoryError: PermGen space Exception in thread Thread-2 java.io.EOFException As a work-around, I can fork a new JVM each time I run the unit test, however, it seems like a bad solution as it takes a while to run the unit test. By the way, I tried importing the TestHiveContext: - import org.apache.spark.sql.hive.test.TestHiveContext However, it suffers from the same memory issue. Has anyone else suffered from the same problem? Note that I am running these unit tests on my mac. Cheers, Mike.
Re:RE: Test case for the spark sql catalyst
Thanks Chenghao! At 2015-08-25 13:06:40, Cheng, Hao hao.ch...@intel.com wrote: Yes, check the source code under:https://github.com/apache/spark/tree/master/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst From: Todd [mailto:bit1...@163.com] Sent: Tuesday, August 25, 2015 1:01 PM To:user@spark.apache.org Subject: Test case for the spark sql catalyst Hi, Are there test cases for the spark sql catalyst, such as testing the rules of transforming unsolved query plan? Thanks!
Re: How to unit test HiveContext without OutOfMemoryError (using sbt)
The PermGen space error is controlled with MaxPermSize parameter. I run with this in my pom, I think copied pretty literally from Spark's own tests... I don't know what the sbt equivalent is but you should be able to pass it...possibly via SBT_OPTS? <plugin> <groupId>org.scalatest</groupId> <artifactId>scalatest-maven-plugin</artifactId> <version>1.0</version> <configuration> <reportsDirectory>${project.build.directory}/surefire-reports</reportsDirectory> <parallel>false</parallel> <junitxml>.</junitxml> <filereports>SparkTestSuite.txt</filereports> <argLine>-Xmx3g -XX:MaxPermSize=256m -XX:ReservedCodeCacheSize=512m</argLine> <stderr/> <systemProperties> <java.awt.headless>true</java.awt.headless> <spark.testing>1</spark.testing> <spark.ui.enabled>false</spark.ui.enabled> <spark.driver.allowMultipleContexts>true</spark.driver.allowMultipleContexts> </systemProperties> </configuration> <executions> <execution> <id>test</id> <goals> <goal>test</goal> </goals> </execution> </executions> </plugin> On Tue, Aug 25, 2015 at 2:10 PM, Mike Trienis mike.trie...@orcsol.com wrote: Hello, I am using sbt and created a unit test where I create a `HiveContext` and execute some query and then return. Each time I run the unit test the JVM will increase it's memory usage until I get the error: Internal error when running tests: java.lang.OutOfMemoryError: PermGen space Exception in thread Thread-2 java.io.EOFException As a work-around, I can fork a new JVM each time I run the unit test, however, it seems like a bad solution as takes a while to run the unit test. By the way, I tried to importing the TestHiveContext: - import org.apache.spark.sql.hive.test.TestHiveContext However, it suffers from the same memory issue. Has anyone else suffered from the same problem? Note that I am running these unit tests on my mac. Cheers, Mike.
RE: Test case for the spark sql catalyst
Yes, check the source code under: https://github.com/apache/spark/tree/master/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst From: Todd [mailto:bit1...@163.com] Sent: Tuesday, August 25, 2015 1:01 PM To: user@spark.apache.org Subject: Test case for the spark sql catalyst Hi, Are there test cases for the spark sql catalyst, such as testing the rules of transforming unsolved query plan? Thanks!
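If it helps, those suites can also be run on their own once a build is in place; the sbt project name and the example suite below are assumptions based on the linked source tree.

    build/sbt "catalyst/test"
    build/sbt "catalyst/test-only org.apache.spark.sql.catalyst.analysis.AnalysisSuite"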
Test case for the spark sql catalyst
Hi, Are there test cases for the spark sql catalyst, such as testing the rules of transforming an unresolved query plan? Thanks!
Re: HiBench test for hadoop/hive/spark cluster
From log file: 15/07/16 11:16:56 INFO mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-root/mapred/local/1437016615898/user_agents - /opt/HiBench-master/user_agents 15/07/16 11:16:56 INFO mapred.LocalDistributedCacheManager: Localized hdfs://spark-study:9000/HiBench/Aggregation/temp/user_agents as file:/tmp/hadoop-root/mapred/local/ 1437016615898/user_agents ... java.io.FileNotFoundException: file:/tmp/hadoop-root/mapred/local/1437016615898/user_agents (No such file or directory) at java.io.FileInputStream.open(Native Method) However, FileNotFoundException didn't happen to other localized files, such as country_codes FYI On Wed, Jul 15, 2015 at 8:53 PM, luohui20...@sina.com wrote: Hi all when I am running my HiBench in my spark/Hadoop/Hive cluster. I found there is always a failure in my aggregation test. I doubt this problem maybe some issue relative with my hive settings? attaches are my config file and log file . Any idea to solve this issue? my Spark cluster is a one node standalone cluster : hadoop:2.7 spark:1.3 hive:1.2.1 here is the log: Prepare aggregation ... Exec script: /opt/HiBench-master/workloads/aggregation/prepare/prepare.sh Parsing conf: /opt/HiBench-master/conf/00-default-properties.conf Parsing conf: /opt/HiBench-master/conf/10-data-scale-profile.conf Parsing conf: /opt/HiBench-master/conf/99-user_defined_properties.conf Parsing conf: /opt/HiBench-master/workloads/aggregation/conf/00-aggregation-default.conf Parsing conf: /opt/HiBench-master/workloads/aggregation/conf/10-aggregation-userdefine.conf start HadoopPrepareAggregation bench hdfs rm -r: /usr/lib/hadoop/bin/hadoop --config /usr/lib/hadoop/etc/hadoop fs -rm -r -skipTrash hdfs://spark-study:9000/HiBench/Aggregation/Input 15/07/16 11:16:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Deleted hdfs://spark-study:9000/HiBench/Aggregation/Input Pages:12, USERVISITS:100 Submit MapReduce Job: /usr/lib/hadoop/bin/hadoop --config /usr/lib/hadoop/etc/hadoop jar /opt/HiBench-master/src/autogen/target/autogen-4.0-SNAPSHOT-jar-with-dependencies.jar HiBench.DataGen -t hive -b hdfs://spark-study:9000/HiBench/Aggregation -n Input -m 12 -r 6 -p 12 -v 100 -o sequence 15/07/16 11:16:34 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable sfubmzperrrbupqoq 15/07/16 11:16:38 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this. 15/07/16 11:16:40 INFO HiBench.HtmlCore: WARNING: dict empty!!! 15/07/16 11:16:42 INFO HiBench.HtmlCore: WARNING: dict empty!!! 15/07/16 11:16:43 INFO HiBench.HtmlCore: WARNING: dict empty!!! 15/07/16 11:16:43 INFO HiBench.HtmlCore: WARNING: dict empty!!! 15/07/16 11:16:44 INFO HiBench.HtmlCore: WARNING: dict empty!!! 15/07/16 11:16:45 INFO HiBench.HtmlCore: WARNING: dict empty!!! 15/07/16 11:16:45 INFO HiBench.HtmlCore: WARNING: dict empty!!! 15/07/16 11:16:46 INFO HiBench.HtmlCore: WARNING: dict empty!!! 15/07/16 11:16:46 INFO HiBench.HtmlCore: WARNING: dict empty!!! 15/07/16 11:16:47 INFO HiBench.HtmlCore: WARNING: dict empty!!! 15/07/16 11:16:47 INFO HiBench.HtmlCore: WARNING: dict empty!!! 15/07/16 11:16:47 INFO HiBench.HtmlCore: WARNING: dict empty!!! 15/07/16 11:16:48 INFO reduce.EventFetcher: EventFetcher is interrupted.. 
Returning 15/07/16 11:16:50 INFO HiBench.HiveData$GenerateRankingsReducer: pid: 0, 0 erros, 0 missed 15/07/16 11:16:50 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning 15/07/16 11:16:51 INFO HiBench.HiveData$GenerateRankingsReducer: pid: 1, 0 erros, 0 missed 15/07/16 11:16:51 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning 15/07/16 11:16:51 INFO HiBench.HiveData$GenerateRankingsReducer: pid: 2, 0 erros, 0 missed 15/07/16 11:16:52 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning 15/07/16 11:16:52 INFO HiBench.HiveData$GenerateRankingsReducer: pid: 3, 0 erros, 0 missed 15/07/16 11:16:52 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning 15/07/16 11:16:53 INFO HiBench.HiveData$GenerateRankingsReducer: pid: 4, 0 erros, 0 missed 15/07/16 11:16:53 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning 15/07/16 11:16:54 INFO HiBench.HiveData$GenerateRankingsReducer: pid: 5, 0 erros, 0 missed 15/07/16 11:16:55 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this. java.io.FileNotFoundException: file:/tmp/hadoop-root/mapred/local/1437016615898/user_agents (No such file or directory) java.io.FileNotFoundException: file:/tmp/hadoop-root/mapred/local/1437016615898/user_agents
Re: HiBench test for hadoop/hive/spark cluster
Hi Ted Thanks for your advice, i found that there is something wrong with hadoop fs -get command, 'cause I believe the localization of hdfs://spark-study:9000/HiBench/Aggregation/temp/user_agents to /tmp/hadoop-root/mapred/local/1437016615898/user_agents is a behaviour like hadoop fs -get I am forwarding to solve below issue now,:[root@spark-study HiBench-master]# hadoop fs -get /HiBench/Aggregation/temp/user_agents 15/07/16 12:31:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 15/07/16 12:31:58 WARN hdfs.DFSClient: DFSInputStream has been closed already Hi Mike: I am new to Hibench, so I just setup a test enviroment of 1 node spark/hadoop cluster to test, no data actually. Because Hibench will autogenerate test data itself. Thanksamp;Best regards! San.Luo - 原始邮件 - 发件人:Ted Yu yuzhih...@gmail.com 收件人:罗辉 luohui20...@sina.com 抄送人:jie.huang jie.hu...@intel.com, user user@spark.apache.org 主题:Re: HiBench test for hadoop/hive/spark cluster 日期:2015年07月16日 12点17分 From log file: 15/07/16 11:16:56 INFO mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-root/mapred/local/1437016615898/user_agents - /opt/HiBench-master/user_agents15/07/16 11:16:56 INFO mapred.LocalDistributedCacheManager: Localized hdfs://spark-study:9000/HiBench/Aggregation/temp/user_agents as file:/tmp/hadoop-root/mapred/local/ 1437016615898/user_agents...java.io.FileNotFoundException: file:/tmp/hadoop-root/mapred/local/1437016615898/user_agents (No such file or directory) at java.io.FileInputStream.open(Native Method) However, FileNotFoundException didn't happen to other localized files, such as country_codes FYI On Wed, Jul 15, 2015 at 8:53 PM, luohui20...@sina.com wrote: Hi all when I am running my HiBench in my spark/Hadoop/Hive cluster. I found there is always a failure in my aggregation test. I doubt this problem maybe some issue relative with my hive settings? attaches are my config file and log file . Any idea to solve this issue? my Spark cluster is a one node standalone cluster :hadoop:2.7spark:1.3hive:1.2.1 here is the log:Prepare aggregation ... Exec script: /opt/HiBench-master/workloads/aggregation/prepare/prepare.sh Parsing conf: /opt/HiBench-master/conf/00-default-properties.conf Parsing conf: /opt/HiBench-master/conf/10-data-scale-profile.conf Parsing conf: /opt/HiBench-master/conf/99-user_defined_properties.conf Parsing conf: /opt/HiBench-master/workloads/aggregation/conf/00-aggregation-default.conf Parsing conf: /opt/HiBench-master/workloads/aggregation/conf/10-aggregation-userdefine.conf start HadoopPrepareAggregation bench hdfs rm -r: /usr/lib/hadoop/bin/hadoop --config /usr/lib/hadoop/etc/hadoop fs -rm -r -skipTrash hdfs://spark-study:9000/HiBench/Aggregation/Input 15/07/16 11:16:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Deleted hdfs://spark-study:9000/HiBench/Aggregation/Input Pages:12, USERVISITS:100 Submit MapReduce Job: /usr/lib/hadoop/bin/hadoop --config /usr/lib/hadoop/etc/hadoop jar /opt/HiBench-master/src/autogen/target/autogen-4.0-SNAPSHOT-jar-with-dependencies.jar HiBench.DataGen -t hive -b hdfs://spark-study:9000/HiBench/Aggregation -n Input -m 12 -r 6 -p 12 -v 100 -o sequence 15/07/16 11:16:34 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable sfubmzperrrbupqoq 15/07/16 11:16:38 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this. 15/07/16 11:16:40 INFO HiBench.HtmlCore: WARNING: dict empty!!! 15/07/16 11:16:42 INFO HiBench.HtmlCore: WARNING: dict empty!!! 15/07/16 11:16:43 INFO HiBench.HtmlCore: WARNING: dict empty!!! 15/07/16 11:16:43 INFO HiBench.HtmlCore: WARNING: dict empty!!! 15/07/16 11:16:44 INFO HiBench.HtmlCore: WARNING: dict empty!!! 15/07/16 11:16:45 INFO HiBench.HtmlCore: WARNING: dict empty!!! 15/07/16 11:16:45 INFO HiBench.HtmlCore: WARNING: dict empty!!! 15/07/16 11:16:46 INFO HiBench.HtmlCore: WARNING: dict empty!!! 15/07/16 11:16:46 INFO HiBench.HtmlCore: WARNING: dict empty!!! 15/07/16 11:16:47 INFO HiBench.HtmlCore: WARNING: dict empty!!! 15/07/16 11:16:47 INFO HiBench.HtmlCore: WARNING: dict empty!!! 15/07/16 11:16:47 INFO HiBench.HtmlCore: WARNING: dict empty!!! 15/07/16 11:16:48 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning 15/07/16 11:16:50 INFO HiBench.HiveData$GenerateRankingsReducer: pid: 0, 0 erros, 0 missed 15/07/16 11:16:50 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning 15/07/16 11:16:51 INFO HiBench.HiveData$GenerateRankingsReducer: pid: 1, 0 erros, 0 missed 15/07/16 11:16:51 INFO reduce.EventFetcher
Spark stream test throw org.apache.spark.SparkException: Task not serializable when execute in spark shell
Hi all, there are two examples: one throws Task not serializable when executed in the spark shell, the other one is ok. I am very puzzled, can anyone explain what's different about these two pieces of code and why the other one is ok? 1.The one which throws Task not serializable : import org.apache.spark._ import SparkContext._ import org.apache.spark.SparkConf import org.apache.spark.streaming.{Seconds, StreamingContext} import org.apache.spark.broadcast._ @transient val ssc = new StreamingContext(sc, Seconds(5)) val lines = ssc.textFileStream(/a.log) val testFun = (line:String) = { if ((line.contains( ERROR)) || (line.startsWith(Spark))){ true } else{ false } } val p_date_bc = sc.broadcast(^\\w+ \\w+ \\d+ \\d{2}:\\d{2}:\\d{2} \\d{4}.r) val p_ORA_bc = sc.broadcast(^ORA-\\d+.+.r) val A = (iter: Iterator[String],data_bc:Broadcast[scala.util.matching.Regex],ORA_bc:Broadcast[scala.util.matching.Regex]) = { val p_date = data_bc.value val p_ORA = ORA_bc.value var res = List[String]() var lasttime = while (iter.hasNext) { val line = iter.next.toString val currentcode = p_ORA findFirstIn line getOrElse null if (currentcode != null){ res ::= lasttime + | + currentcode }else{ val currentdate = p_date findFirstIn line getOrElse null if (currentdate != null){ lasttime = currentdate } } } res.iterator } val cdd = lines.filter(testFun).mapPartitions(x = A(x,p_date_bc,p_ORA_bc)) //org.apache.spark.SparkException: Task not serializable 2.The other one is ok: import org.apache.spark._ import SparkContext._ import org.apache.spark.SparkConf import org.apache.spark.streaming.{Seconds, StreamingContext} import org.apache.spark.broadcast._ val ssc = new StreamingContext(sc, Seconds(5)) val lines = ssc.textFileStream(/a.log) val testFun = (line:String) = { if ((line.contains( ERROR)) || (line.startsWith(Spark))){ true } else{ false } } val A = (iter: Iterator[String]) = { var res = List[String]() var lasttime = while (iter.hasNext) { val line = iter.next.toString val currentcode = ^\\w+ \\w+ \\d+ \\d{2}:\\d{2}:\\d{2} \\d{4}.r.findFirstIn(line).getOrElse(null) if (currentcode != null){ res ::= lasttime + | + currentcode }else{ val currentdate = ^ORA-\\d+.+.r.findFirstIn(line).getOrElse(null) if (currentdate != null){ lasttime = currentdate } } } res.iterator } val cdd= lines.filter(testFun).mapPartitions(A)
Re: Spark stream test throw org.apache.spark.SparkException: Task not serializable when execute in spark shell
I can't tell immediately, but you might be able to get more info with the hint provided here: http://stackoverflow.com/questions/27980781/spark-task-not-serializable-with-simple-accumulator (short version, set -Dsun.io.serialization.extendedDebugInfo=true) Also, unless you're simplifying your example a lot, you only have 2 regexes, so I'm not quite sure why you want to broadcast them, as opposed to just having an object that holds them on each executor, or just create them at the start of mapPartitions (outside of iter.hasNext as shown in your second snippet). Broadcasting seems overcomplicated, but maybe you just showed a simplified example... On Wed, Jun 24, 2015 at 8:41 AM, yuemeng (A) yueme...@huawei.com wrote: hi ,all there two examples one is throw Task not serializable when execute in spark shell,the other one is ok,i am very puzzled,can anyone give what's different about this two code and why the other is ok 1.The one which throw Task not serializable : import org.apache.spark._ import SparkContext._ import org.apache.spark.SparkConf import org.apache.spark.streaming.{Seconds, StreamingContext} import org.apache.spark.broadcast._ @transient val ssc = new StreamingContext(sc, Seconds(5)) val lines = ssc.textFileStream(/a.log) val testFun = (line:String) = { if ((line.contains( ERROR)) || (line.startsWith(Spark))){ true } else{ false } } val p_date_bc = sc.broadcast(^\\w+ \\w+ \\d+ \\d{2}:\\d{2}:\\d{2} \\d{4}.r) val p_ORA_bc = sc.broadcast(^ORA-\\d+.+.r) val A = (iter: Iterator[String],data_bc:Broadcast[scala.util.matching.Regex],ORA_bc:Broadcast[scala.util.matching.Regex]) = { val p_date = data_bc.value val p_ORA = ORA_bc.value var res = List[String]() var lasttime = while (iter.hasNext) { val line = iter.next.toString val currentcode = p_ORA findFirstIn line getOrElse null if (currentcode != null){ res ::= lasttime + | + currentcode }else{ val currentdate = p_date findFirstIn line getOrElse null if (currentdate != null){ lasttime = currentdate } } } res.iterator } val cdd = lines.filter(testFun).mapPartitions(x = A(x,p_date_bc,p_ORA_bc)) //org.apache.spark.SparkException: Task not serializable 2.The other one is ok: import org.apache.spark._ import SparkContext._ import org.apache.spark.SparkConf import org.apache.spark.streaming.{Seconds, StreamingContext} import org.apache.spark.broadcast._ val ssc = new StreamingContext(sc, Seconds(5)) val lines = ssc.textFileStream(/a.log) val testFun = (line:String) = { if ((line.contains( ERROR)) || (line.startsWith(Spark))){ true } else{ false } } val A = (iter: Iterator[String]) = { var res = List[String]() var lasttime = while (iter.hasNext) { val line = iter.next.toString val currentcode = ^\\w+ \\w+ \\d+ \\d{2}:\\d{2}:\\d{2} \\d{4}.r.findFirstIn(line).getOrElse(null) if (currentcode != null){ res ::= lasttime + | + currentcode }else{ val currentdate = ^ORA-\\d+.+.r.findFirstIn(line).getOrElse(null) if (currentdate != null){ lasttime = currentdate } } } res.iterator } val cdd= lines.filter(testFun).mapPartitions(A)
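A minimal sketch of the last suggestion (create the patterns at the start of mapPartitions), which keeps the closure free of the broadcasts and of anything the shell wraps around them; the regexes are the ones from the first snippet.

    val A = (iter: Iterator[String]) => {
      // compiled once per partition on the executor, so nothing from the REPL is captured
      val p_date = """^\w+ \w+ \d+ \d{2}:\d{2}:\d{2} \d{4}""".r
      val p_ORA  = """^ORA-\d+.+""".r
      var res = List[String]()
      var lasttime = ""
      while (iter.hasNext) {
        val line = iter.next()
        p_ORA.findFirstIn(line) match {
          case Some(code) => res ::= lasttime + "|" + code
          case None       => p_date.findFirstIn(line).foreach(d => lasttime = d)
        }
      }
      res.iterator
    }
    val cdd = lines.filter(testFun).mapPartitions(A)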
Re: Spark Maven Test error
Dear List, I'm trying to reference a lonely message to this list from March 25th ( http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Maven-Test-error-td22216.html ), but I'm unsure this will thread properly. Sorry if it didn't work out. Anyway, using Spark 1.4.0-RC4 I run into the same issue when performing tests, using build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phadoop-provided -Phive -Phive-thriftserver test after a successful clean-package build. The error is: java.lang.IllegalStateException: failed to create a child event loop Could this be due to another instance of Spark blocking ports? In that case maybe the test case should be able to adapt to this particular issue. Thanks for any help, Rick
Re: HiveContext test, Spark Context did not initialize after waiting 10000ms
I got a similar problem.I'm not sure if your problem is already resolved. For the record, I solved this type of error by calling sc..setMaster(yarn-cluster); If you find the solution, please let us know. Regards,Mohammad On Friday, March 6, 2015 2:47 PM, nitinkak001 nitinkak...@gmail.com wrote: I am trying to run a Hive query from Spark using HiveContext. Here is the code / val conf = new SparkConf().setAppName(HiveSparkIntegrationTest) conf.set(spark.executor.extraClassPath, /opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hive/lib); conf.set(spark.driver.extraClassPath, /opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hive/lib); conf.set(spark.yarn.am.waitTime, 30L) val sc = new SparkContext(conf) val sqlContext = new HiveContext(sc) def inputRDD = sqlContext.sql(describe spark_poc.src_digital_profile_user); inputRDD.collect().foreach { println } println(inputRDD.schema.getClass.getName) / Getting this exception. Any clues? The weird part is if I try to do the same thing but in Java instead of Scala, it runs fine. /Exception in thread Driver java.lang.NullPointerException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:162) 15/03/06 17:39:32 ERROR yarn.ApplicationMaster: SparkContext did not initialize after waiting for 1 ms. Please check earlier log output for errors. Failing the application. Exception in thread main java.lang.NullPointerException at org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkContextInitialized(ApplicationMaster.scala:218) at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:110) at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:434) at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:53) at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:52) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:52) at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:433) at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala) 15/03/06 17:39:32 INFO yarn.ApplicationMaster: AppMaster received a signal./ -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/HiveContext-test-Spark-Context-did-not-initialize-after-waiting-1ms-tp21953.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
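Presumably that call belongs on the SparkConf rather than the SparkContext; a hedged sketch of what that change to the original snippet looks like. Whether yarn-cluster can be selected from code instead of through spark-submit depends on the Spark version, so treat this as the shape of the change only.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    val conf = new SparkConf()
      .setAppName("HiveSparkIntegrationTest")
      .setMaster("yarn-cluster")
    val sc = new SparkContext(conf)
    val sqlContext = new HiveContext(sc)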
Re: HiveContext test, Spark Context did not initialize after waiting 10000ms
That is a much better solution than how I resolved it. I got around it by placing comma separated jar paths for all the hive related jars in --jars clause. I will try your solution. Thanks for sharing it. On Tue, May 26, 2015 at 4:14 AM, Mohammad Islam misla...@yahoo.com wrote: I got a similar problem. I'm not sure if your problem is already resolved. For the record, I solved this type of error by calling sc..setMaster( yarn-cluster); If you find the solution, please let us know. Regards, Mohammad On Friday, March 6, 2015 2:47 PM, nitinkak001 nitinkak...@gmail.com wrote: I am trying to run a Hive query from Spark using HiveContext. Here is the code / val conf = new SparkConf().setAppName(HiveSparkIntegrationTest) conf.set(spark.executor.extraClassPath, /opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hive/lib); conf.set(spark.driver.extraClassPath, /opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hive/lib); conf.set(spark.yarn.am.waitTime, 30L) val sc = new SparkContext(conf) val sqlContext = new HiveContext(sc) def inputRDD = sqlContext.sql(describe spark_poc.src_digital_profile_user); inputRDD.collect().foreach { println } println(inputRDD.schema.getClass.getName) / Getting this exception. Any clues? The weird part is if I try to do the same thing but in Java instead of Scala, it runs fine. /Exception in thread Driver java.lang.NullPointerException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:162) 15/03/06 17:39:32 ERROR yarn.ApplicationMaster: SparkContext did not initialize after waiting for 1 ms. Please check earlier log output for errors. Failing the application. Exception in thread main java.lang.NullPointerException at org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkContextInitialized(ApplicationMaster.scala:218) at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:110) at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:434) at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:53) at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:52) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:52) at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:433) at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala) 15/03/06 17:39:32 INFO yarn.ApplicationMaster: AppMaster received a signal./ -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/HiveContext-test-Spark-Context-did-not-initialize-after-waiting-1ms-tp21953.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: [Unit Test Failure] Test org.apache.spark.streaming.JavaAPISuite.testCount failed
Has this been fixed for you now? There has been a number of patches since then and it may have been fixed. On Thu, May 14, 2015 at 7:20 AM, Wangfei (X) wangf...@huawei.com wrote: Yes it is repeatedly on my locally Jenkins. 发自我的 iPhone 在 2015年5月14日,18:30,Tathagata Das t...@databricks.com 写道: Do you get this failure repeatedly? On Thu, May 14, 2015 at 12:55 AM, kf wangf...@huawei.com wrote: Hi, all, i got following error when i run unit test of spark by dev/run-tests on the latest branch-1.4 branch. the latest commit id: commit d518c0369fa412567855980c3f0f426cde5c190d Author: zsxwing zsxw...@gmail.com Date: Wed May 13 17:58:29 2015 -0700 error [info] Test org.apache.spark.streaming.JavaAPISuite.testCount started [error] Test org.apache.spark.streaming.JavaAPISuite.testCount failed: org.apache.spark.SparkException: Error communicating with MapOutputTracker [error] at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:113) [error] at org.apache.spark.MapOutputTracker.sendTracker(MapOutputTracker.scala:119) [error] at org.apache.spark.MapOutputTrackerMaster.stop(MapOutputTracker.scala:324) [error] at org.apache.spark.SparkEnv.stop(SparkEnv.scala:93) [error] at org.apache.spark.SparkContext.stop(SparkContext.scala:1577) [error] at org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:626) [error] at org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:597) [error] at org.apache.spark.streaming.TestSuiteBase$class.runStreamsWithPartitions(TestSuiteBase.scala:403) [error] at org.apache.spark.streaming.JavaTestUtils$.runStreamsWithPartitions(JavaTestUtils.scala:102) [error] at org.apache.spark.streaming.TestSuiteBase$class.runStreams(TestSuiteBase.scala:344) [error] at org.apache.spark.streaming.JavaTestUtils$.runStreams(JavaTestUtils.scala:102) [error] at org.apache.spark.streaming.JavaTestBase$class.runStreams(JavaTestUtils.scala:74) [error] at org.apache.spark.streaming.JavaTestUtils$.runStreams(JavaTestUtils.scala:102) [error] at org.apache.spark.streaming.JavaTestUtils.runStreams(JavaTestUtils.scala) [error] at org.apache.spark.streaming.JavaAPISuite.testCount(JavaAPISuite.java:103) [error] ... [error] Caused by: org.apache.spark.SparkException: Error sending message [message = StopMapOutputTracker] [error] at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:116) [error] at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78) [error] at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:109) [error] ... 52 more [error] Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds] [error] at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) [error] at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) [error] at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) [error] at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) [error] at scala.concurrent.Await$.result(package.scala:107) [error] at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102) [error] ... 54 more -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Unit-Test-Failure-Test-org-apache-spark-streaming-JavaAPISuite-testCount-failed-tp22879.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: [Unit Test Failure] Test org.apache.spark.streaming.JavaAPISuite.testCount failed
Yes it is repeatedly on my locally Jenkins. 发自我的 iPhone 在 2015年5月14日,18:30,Tathagata Das t...@databricks.commailto:t...@databricks.com 写道: Do you get this failure repeatedly? On Thu, May 14, 2015 at 12:55 AM, kf wangf...@huawei.commailto:wangf...@huawei.com wrote: Hi, all, i got following error when i run unit test of spark by dev/run-tests on the latest branch-1.4 branch. the latest commit id: commit d518c0369fa412567855980c3f0f426cde5c190d Author: zsxwing zsxw...@gmail.commailto:zsxw...@gmail.com Date: Wed May 13 17:58:29 2015 -0700 error [info] Test org.apache.spark.streaming.JavaAPISuite.testCount started [error] Test org.apache.spark.streaming.JavaAPISuite.testCount failed: org.apache.spark.SparkException: Error communicating with MapOutputTracker [error] at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:113) [error] at org.apache.spark.MapOutputTracker.sendTracker(MapOutputTracker.scala:119) [error] at org.apache.spark.MapOutputTrackerMaster.stop(MapOutputTracker.scala:324) [error] at org.apache.spark.SparkEnv.stop(SparkEnv.scala:93) [error] at org.apache.spark.SparkContext.stop(SparkContext.scala:1577) [error] at org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:626) [error] at org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:597) [error] at org.apache.spark.streaming.TestSuiteBase$class.runStreamsWithPartitions(TestSuiteBase.scala:403) [error] at org.apache.spark.streaming.JavaTestUtils$.runStreamsWithPartitions(JavaTestUtils.scala:102) [error] at org.apache.spark.streaming.TestSuiteBase$class.runStreams(TestSuiteBase.scala:344) [error] at org.apache.spark.streaming.JavaTestUtils$.runStreams(JavaTestUtils.scala:102) [error] at org.apache.spark.streaming.JavaTestBase$class.runStreams(JavaTestUtils.scala:74) [error] at org.apache.spark.streaming.JavaTestUtils$.runStreams(JavaTestUtils.scala:102) [error] at org.apache.spark.streaming.JavaTestUtils.runStreams(JavaTestUtils.scala) [error] at org.apache.spark.streaming.JavaAPISuite.testCount(JavaAPISuite.java:103) [error] ... [error] Caused by: org.apache.spark.SparkException: Error sending message [message = StopMapOutputTracker] [error] at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:116) [error] at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78) [error] at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:109) [error] ... 52 more [error] Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds] [error] at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) [error] at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) [error] at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) [error] at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) [error] at scala.concurrent.Await$.result(package.scala:107) [error] at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102) [error] ... 54 more -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Unit-Test-Failure-Test-org-apache-spark-streaming-JavaAPISuite-testCount-failed-tp22879.html Sent from the Apache Spark User List mailing list archive at Nabble.comhttp://Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.orgmailto:user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.orgmailto:user-h...@spark.apache.org
[Unit Test Failure] Test org.apache.spark.streaming.JavaAPISuite.testCount failed
Hi, all, i got following error when i run unit test of spark by dev/run-tests on the latest branch-1.4 branch. the latest commit id: commit d518c0369fa412567855980c3f0f426cde5c190d Author: zsxwing zsxw...@gmail.com Date: Wed May 13 17:58:29 2015 -0700 error [info] Test org.apache.spark.streaming.JavaAPISuite.testCount started [error] Test org.apache.spark.streaming.JavaAPISuite.testCount failed: org.apache.spark.SparkException: Error communicating with MapOutputTracker [error] at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:113) [error] at org.apache.spark.MapOutputTracker.sendTracker(MapOutputTracker.scala:119) [error] at org.apache.spark.MapOutputTrackerMaster.stop(MapOutputTracker.scala:324) [error] at org.apache.spark.SparkEnv.stop(SparkEnv.scala:93) [error] at org.apache.spark.SparkContext.stop(SparkContext.scala:1577) [error] at org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:626) [error] at org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:597) [error] at org.apache.spark.streaming.TestSuiteBase$class.runStreamsWithPartitions(TestSuiteBase.scala:403) [error] at org.apache.spark.streaming.JavaTestUtils$.runStreamsWithPartitions(JavaTestUtils.scala:102) [error] at org.apache.spark.streaming.TestSuiteBase$class.runStreams(TestSuiteBase.scala:344) [error] at org.apache.spark.streaming.JavaTestUtils$.runStreams(JavaTestUtils.scala:102) [error] at org.apache.spark.streaming.JavaTestBase$class.runStreams(JavaTestUtils.scala:74) [error] at org.apache.spark.streaming.JavaTestUtils$.runStreams(JavaTestUtils.scala:102) [error] at org.apache.spark.streaming.JavaTestUtils.runStreams(JavaTestUtils.scala) [error] at org.apache.spark.streaming.JavaAPISuite.testCount(JavaAPISuite.java:103) [error] ... [error] Caused by: org.apache.spark.SparkException: Error sending message [message = StopMapOutputTracker] [error] at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:116) [error] at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78) [error] at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:109) [error] ... 52 more [error] Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds] [error] at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) [error] at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) [error] at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) [error] at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) [error] at scala.concurrent.Await$.result(package.scala:107) [error] at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102) [error] ... 54 more -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Unit-Test-Failure-Test-org-apache-spark-streaming-JavaAPISuite-testCount-failed-tp22879.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: [Unit Test Failure] Test org.apache.spark.streaming.JavaAPISuite.testCount failed
Do you get this failure repeatedly? On Thu, May 14, 2015 at 12:55 AM, kf wangf...@huawei.com wrote: Hi, all, i got following error when i run unit test of spark by dev/run-tests on the latest branch-1.4 branch. the latest commit id: commit d518c0369fa412567855980c3f0f426cde5c190d Author: zsxwing zsxw...@gmail.com Date: Wed May 13 17:58:29 2015 -0700 error [info] Test org.apache.spark.streaming.JavaAPISuite.testCount started [error] Test org.apache.spark.streaming.JavaAPISuite.testCount failed: org.apache.spark.SparkException: Error communicating with MapOutputTracker [error] at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:113) [error] at org.apache.spark.MapOutputTracker.sendTracker(MapOutputTracker.scala:119) [error] at org.apache.spark.MapOutputTrackerMaster.stop(MapOutputTracker.scala:324) [error] at org.apache.spark.SparkEnv.stop(SparkEnv.scala:93) [error] at org.apache.spark.SparkContext.stop(SparkContext.scala:1577) [error] at org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:626) [error] at org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:597) [error] at org.apache.spark.streaming.TestSuiteBase$class.runStreamsWithPartitions(TestSuiteBase.scala:403) [error] at org.apache.spark.streaming.JavaTestUtils$.runStreamsWithPartitions(JavaTestUtils.scala:102) [error] at org.apache.spark.streaming.TestSuiteBase$class.runStreams(TestSuiteBase.scala:344) [error] at org.apache.spark.streaming.JavaTestUtils$.runStreams(JavaTestUtils.scala:102) [error] at org.apache.spark.streaming.JavaTestBase$class.runStreams(JavaTestUtils.scala:74) [error] at org.apache.spark.streaming.JavaTestUtils$.runStreams(JavaTestUtils.scala:102) [error] at org.apache.spark.streaming.JavaTestUtils.runStreams(JavaTestUtils.scala) [error] at org.apache.spark.streaming.JavaAPISuite.testCount(JavaAPISuite.java:103) [error] ... [error] Caused by: org.apache.spark.SparkException: Error sending message [message = StopMapOutputTracker] [error] at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:116) [error] at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78) [error] at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:109) [error] ... 52 more [error] Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds] [error] at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) [error] at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) [error] at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) [error] at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) [error] at scala.concurrent.Await$.result(package.scala:107) [error] at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102) [error] ... 54 more -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Unit-Test-Failure-Test-org-apache-spark-streaming-JavaAPISuite-testCount-failed-tp22879.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: Spark unit test fails
I'm also getting the same error. Any ideas? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-unit-test-fails-tp22368p22798.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
contributing code - how to test
Hi, I selected a starter task in JIRA, and made changes to my github fork of the current code. I assumed I would be able to build and test. % mvn clean compile was fine but % mvn package failed [ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.18:test (default-test) on project spark-launcher_2.10: There are test failures. I then reverted my changes, but same story. Any advice is appreciated! Deb
Re: contributing code - how to test
The standard incantation -- which is a little different from standard Maven practice -- is: mvn -DskipTests [your options] clean package mvn [your options] test Some tests require the assembly, so you have to do it this way. I don't know what the test failures were, you didn't post them, but I'm guessing this is the cause since it failed very early on the launcher module and not on some module that you changed. Sean On Fri, Apr 24, 2015 at 7:35 PM, Deborah Siegel deborah.sie...@gmail.com wrote: Hi, I selected a starter task in JIRA, and made changes to my github fork of the current code. I assumed I would be able to build and test. % mvn clean compile was fine but %mvn package failed [ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.18:test (default-test) on project spark-launcher_2.10: There are test failures. I then reverted my changes, but same story. Any advice is appreciated! Deb - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
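Concretely, for the launcher failure above that might look like the following; the profile flags are only an example, use whatever options you normally build with.

    mvn -Pyarn -Phadoop-2.4 -DskipTests clean package
    mvn -Pyarn -Phadoop-2.4 test
    mvn -Pyarn -Phadoop-2.4 -pl launcher test   # re-run a single module once the assembly exists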
ChiSquared Test from user response flat files to RDD[Vector]?
Hi Spark community, I'm very new to the Apache Spark community; but if this (very active) group is anything like the other Apache project user groups I've worked with, I'm going to enjoy discussions here very much. Thanks in advance! Use Case: I am trying to go from flat files of user response data, to contingency tables of frequency counts, to Pearson Chi-Square correlation statistics and perform a Chi-Squared hypothesis test. The user response data represents a multiple choice question-answer (MCQ) format. The goal is to compute all choose-two combinations of question answers (precondition, question X question) contingency tables. Each cell of the contingency table is the intersection of the users whom responded per each option per each question of the table. An overview of the problem: // data ingestion and typing schema: Observation (u: String, d: java.util.Date, t: String, q: String, v: String, a: Int) // a question (q) has a finite set of response options (v) per which a user (u) responds // additional response fields are not required per this test for (precondition a) { for (q_i in lex ordered questions) { for (q_j in lex ordered question, q_j q_i) { forall v_k \in q_i get set of distinct users {u}_ik forall v_l \in q_j get set of distinct users {u}_jl forall cells per table (a,q_i,q_j) defn C_ijkl = |intersect({u}_ik, {u}_jl)| // contingency table construct compute chisq test per this contingency table and persist } } } The scala main I'm testing is provided below, and I was planning to use the provided example https://spark.apache.org/docs/1.3.1/mllib-statistics.html however I am not sure how to go from my RDD[Observation] to the necessary precondition of RDD[Vector] for ingestion def main(args: Array[String]): Unit = { // setup main space for test val conf = new SparkConf().setAppName(TestMain) val sc = new SparkContext(conf) // data ETL and typing schema case class Observation (u: String, d: java.util.Date, t: String, q: String, v: String, a: Int) val date_format = new java.text.SimpleDateFormat(MMdd) val data_project_abs_dir = /my/path/to/data/files val data_files = data_project_abs_dir + /part-*.gz val data = sc.textFile(data_files) val observations = data.map(line = line.split(,).map(_.trim)).map(r = Observation(r(0).toString, date_format.parse(r(1).toString), r(2).toString, r(3).toString, r(4).toString, r(5).toInt)) observations.cache // ToDo: the basic keying of the space, possibly... val qvu = observations.map(o = ((o.a, o.q, o.v), o.u)).distinct // ToDo: ok, so now how to get this into the precondition RDD[Vector] from the Spark example to make a contingency table?... // ToDo: perform then persist the resulting chisq and p-value on these contingency tables... } Any help is appreciated. Thanks! -Dan
Re: ChiSquared Test from user response flat files to RDD[Vector]?
You can find the user guide for vector creation here: http://spark.apache.org/docs/latest/mllib-data-types.html#local-vector. -Xiangrui On Mon, Apr 20, 2015 at 2:32 PM, Dan DeCapria, CivicScience dan.decap...@civicscience.com wrote: Hi Spark community, I'm very new to the Apache Spark community; but if this (very active) group is anything like the other Apache project user groups I've worked with, I'm going to enjoy discussions here very much. Thanks in advance! Use Case: I am trying to go from flat files of user response data, to contingency tables of frequency counts, to Pearson Chi-Square correlation statistics and perform a Chi-Squared hypothesis test. The user response data represents a multiple choice question-answer (MCQ) format. The goal is to compute all choose-two combinations of question answers (precondition, question X question) contingency tables. Each cell of the contingency table is the intersection of the users whom responded per each option per each question of the table. An overview of the problem: // data ingestion and typing schema: Observation (u: String, d: java.util.Date, t: String, q: String, v: String, a: Int) // a question (q) has a finite set of response options (v) per which a user (u) responds // additional response fields are not required per this test for (precondition a) { for (q_i in lex ordered questions) { for (q_j in lex ordered question, q_j q_i) { forall v_k \in q_i get set of distinct users {u}_ik forall v_l \in q_j get set of distinct users {u}_jl forall cells per table (a,q_i,q_j) defn C_ijkl = |intersect({u}_ik, {u}_jl)| // contingency table construct compute chisq test per this contingency table and persist } } } The scala main I'm testing is provided below, and I was planning to use the provided example https://spark.apache.org/docs/1.3.1/mllib-statistics.html however I am not sure how to go from my RDD[Observation] to the necessary precondition of RDD[Vector] for ingestion def main(args: Array[String]): Unit = { // setup main space for test val conf = new SparkConf().setAppName(TestMain) val sc = new SparkContext(conf) // data ETL and typing schema case class Observation (u: String, d: java.util.Date, t: String, q: String, v: String, a: Int) val date_format = new java.text.SimpleDateFormat(MMdd) val data_project_abs_dir = /my/path/to/data/files val data_files = data_project_abs_dir + /part-*.gz val data = sc.textFile(data_files) val observations = data.map(line = line.split(,).map(_.trim)).map(r = Observation(r(0).toString, date_format.parse(r(1).toString), r(2).toString, r(3).toString, r(4).toString, r(5).toInt)) observations.cache // ToDo: the basic keying of the space, possibly... val qvu = observations.map(o = ((o.a, o.q, o.v), o.u)).distinct // ToDo: ok, so now how to get this into the precondition RDD[Vector] from the Spark example to make a contingency table?... // ToDo: perform then persist the resulting chisq and p-value on these contingency tables... } Any help is appreciated. Thanks! -Dan - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
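Since the end goal is a per-question-pair contingency table rather than per-record vectors, it may be simpler to skip RDD[Vector] and hand MLlib the counts as a local matrix; a minimal sketch, where the 3x2 shape and the counts are made up.

    import org.apache.spark.mllib.linalg.Matrices
    import org.apache.spark.mllib.stat.Statistics

    // one (a, q_i, q_j) pair: rows are options of q_i, columns are options of q_j,
    // values are the |intersect({u}_ik, {u}_jl)| counts, laid out column-major
    val table = Matrices.dense(3, 2, Array(12.0, 5.0, 7.0, 9.0, 4.0, 11.0))
    val result = Statistics.chiSqTest(table)
    println(s"chi-squared = ${result.statistic}, df = ${result.degreesOfFreedom}, p-value = ${result.pValue}")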
Re: Cannot run unit test.
It's because your tests are running in parallel and you can only have one context running at a time.
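If the tests are driven by sbt, one common way to keep two SparkContexts from running at once is to disable parallel test execution. A hedged sketch for the build definition; the settings are standard sbt, the rest depends on your project:

// build.sbt: run test suites sequentially so only one SparkContext exists at a time
parallelExecution in Test := false

// optionally run tests in a separate JVM from sbt itself
fork in Test := true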
Re: Spark unit test fails
Trying to bump up the rank of the question. Can someone point to an example on GitHub? ..Manas
Spark unit test fails
Hi experts, I am trying to write unit tests for my Spark application, which fail with a javax.servlet.FilterRegistration error. I am using CDH 5.3.2 Spark and below is my dependencies list:

val spark = "1.2.0-cdh5.3.2"
val esriGeometryAPI = "1.2"
val csvWriter = "1.0.0"
val hadoopClient = "2.3.0"
val scalaTest = "2.2.1"
val jodaTime = "1.6.0"
val scalajHTTP = "1.0.1"
val avro = "1.7.7"
val scopt = "3.2.0"
val config = "1.2.1"
val jobserver = "0.4.1"

val excludeJBossNetty = ExclusionRule(organization = "org.jboss.netty")
val excludeIONetty = ExclusionRule(organization = "io.netty")
val excludeEclipseJetty = ExclusionRule(organization = "org.eclipse.jetty")
val excludeMortbayJetty = ExclusionRule(organization = "org.mortbay.jetty")
val excludeAsm = ExclusionRule(organization = "org.ow2.asm")
val excludeOldAsm = ExclusionRule(organization = "asm")
val excludeCommonsLogging = ExclusionRule(organization = "commons-logging")
val excludeSLF4J = ExclusionRule(organization = "org.slf4j")
val excludeScalap = ExclusionRule(organization = "org.scala-lang", artifact = "scalap")
val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop")
val excludeCurator = ExclusionRule(organization = "org.apache.curator")
val excludePowermock = ExclusionRule(organization = "org.powermock")
val excludeFastutil = ExclusionRule(organization = "it.unimi.dsi")
val excludeJruby = ExclusionRule(organization = "org.jruby")
val excludeThrift = ExclusionRule(organization = "org.apache.thrift")
val excludeServletApi = ExclusionRule(organization = "javax.servlet", artifact = "servlet-api")
val excludeJUnit = ExclusionRule(organization = "junit")

I found the link (http://apache-spark-user-list.1001560.n3.nabble.com/Fwd-SecurityException-when-running-tests-with-Spark-1-0-0-td6747.html#a6749) discussing the issue and a workaround for it, but that workaround does not get rid of the problem for me. I am using an SBT build which can't be changed to Maven. What am I missing? A sketch of how these exclusion rules can be applied is shown below.

Stack trace:
[info] FiltersRDDSpec:
[info] - Spark Filter *** FAILED ***
[info] java.lang.SecurityException: class javax.servlet.FilterRegistration's signer information does not match signer information of other classes in the same package
[info] at java.lang.ClassLoader.checkCerts(Unknown Source)
[info] at java.lang.ClassLoader.preDefineClass(Unknown Source)
[info] at java.lang.ClassLoader.defineClass(Unknown Source)
[info] at java.security.SecureClassLoader.defineClass(Unknown Source)
[info] at java.net.URLClassLoader.defineClass(Unknown Source)
[info] at java.net.URLClassLoader.access$100(Unknown Source)
[info] at java.net.URLClassLoader$1.run(Unknown Source)
[info] at java.net.URLClassLoader$1.run(Unknown Source)
[info] at java.security.AccessController.doPrivileged(Native Method)
[info] at java.net.URLClassLoader.findClass(Unknown Source)

Thanks
Manas
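In case it helps, the usual way exclusion rules like these take effect is by attaching them to every dependency that transitively drags in a signed servlet-api jar. A hedged sbt sketch; the module names and the scopes below are illustrative and not taken from the original build, and excludeServletApi / excludeMortbayJetty are the ExclusionRules defined above:

// Hypothetical excerpt of libraryDependencies; excludeAll only helps if every
// dependency that pulls in a conflicting servlet-api jar is covered.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % spark % "provided"
    excludeAll (excludeServletApi, excludeMortbayJetty),
  "org.apache.hadoop" % "hadoop-client" % hadoopClient
    excludeAll (excludeServletApi, excludeMortbayJetty),
  "org.scalatest" %% "scalatest" % scalaTest % "test"
)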
Re: rdd.toDF().saveAsParquetFile(tachyon://host:19998/test)
You are hitting https://issues.apache.org/jira/browse/SPARK-6330. It has been fixed in 1.3.1, which will be released soon.
rdd.toDF().saveAsParquetFile(tachyon://host:19998/test)
Spark version is 1.3.0 with tachyon-0.6.1.

QUESTION DESCRIPTION: rdd.saveAsObjectFile("tachyon://host:19998/test") and rdd.saveAsTextFile("tachyon://host:19998/test") succeed, but rdd.toDF().saveAsParquetFile("tachyon://host:19998/test") fails.

ERROR MESSAGE: java.lang.IllegalArgumentException: Wrong FS: tachyon://host:19998/test, expected: hdfs://host:8020
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:465)
at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$6.apply(newParquet.scala:252)
at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$6.apply(newParquet.scala:251)
Spark Maven Test error
I use command to run Unit test, as follow: ./make-distribution.sh --tgz --skip-java-test -Pscala-2.10 -Phadoop-2.3 -Phive -Phive-thriftserver -Pyarn -Dyarn.version=2.3.0-cdh5.1.2 -Dhadoop.version=2.3.0-cdh5.1.2 mvn -Pscala-2.10 -Phadoop-2.3 -Phive -Phive-thriftserver -Pyarn -Dyarn.version=2.3.0-cdh5.1.2 -Dhadoop.version=2.3.0-cdh5.1.2 -pl core,launcher,network/common,network/shuffle,sql/core,sql/catalyst,sql/hive -DwildcardSuites=org.apache.spark.sql.JoinSuite test Error occur: --- T E S T S --- Running org.apache.spark.network.ProtocolSuite log4j:WARN No appenders could be found for logger (io.netty.util.internal.logging.InternalLoggerFactory). log4j:WARN Please initialize the log4j system properly. log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info. Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.251 sec - in org.apache.spark.network.ProtocolSuite Running org.apache.spark.network.RpcIntegrationSuite Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.54 sec - in org.apache.spark.network.RpcIntegrationSuite Running org.apache.spark.network.ChunkFetchIntegrationSuite Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.34 sec - in org.apache.spark.network.ChunkFetchIntegrationSuite Running org.apache.spark.network.sasl.SparkSaslSuite Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.103 sec - in org.apache.spark.network.sasl.SparkSaslSuite Running org.apache.spark.network.TransportClientFactorySuite Tests run: 5, Failures: 0, Errors: 5, Skipped: 0, Time elapsed: 11.42 sec FAILURE! - in org.apache.spark.network.TransportClientFactorySuite reuseClientsUpToConfigVariable(org.apache.spark.network.TransportClientFactorySuite) Time elapsed: 2.332 sec ERROR! java.lang.IllegalStateException: failed to create a child event loop at sun.nio.ch.IOUtil.makePipe(Native Method) at sun.nio.ch.EPollSelectorImpl.init(EPollSelectorImpl.java:65) at sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:36) at io.netty.channel.nio.NioEventLoop.openSelector(NioEventLoop.java:126) at io.netty.channel.nio.NioEventLoop.init(NioEventLoop.java:120) at io.netty.channel.nio.NioEventLoopGroup.newChild(NioEventLoopGroup.java:87) at io.netty.util.concurrent.MultithreadEventExecutorGroup.init(MultithreadEventExecutorGroup.java:64) at io.netty.channel.MultithreadEventLoopGroup.init(MultithreadEventLoopGroup.java:49) at io.netty.channel.nio.NioEventLoopGroup.init(NioEventLoopGroup.java:61) at io.netty.channel.nio.NioEventLoopGroup.init(NioEventLoopGroup.java:52) at org.apache.spark.network.util.NettyUtils.createEventLoop(NettyUtils.java:56) at org.apache.spark.network.client.TransportClientFactory.init(TransportClientFactory.java:104) at org.apache.spark.network.TransportContext.createClientFactory(TransportContext.java:76) at org.apache.spark.network.TransportContext.createClientFactory(TransportContext.java:80) at org.apache.spark.network.TransportClientFactorySuite.testClientReuse(TransportClientFactorySuite.java:86) at org.apache.spark.network.TransportClientFactorySuite.reuseClientsUpToConfigVariable(TransportClientFactorySuite.java:131) reuseClientsUpToConfigVariableConcurrent(org.apache.spark.network.TransportClientFactorySuite) Time elapsed: 2.279 sec ERROR! 
java.lang.IllegalStateException: failed to create a child event loop at sun.nio.ch.IOUtil.makePipe(Native Method) at sun.nio.ch.EPollSelectorImpl.init(EPollSelectorImpl.java:65) at sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:36) at io.netty.channel.nio.NioEventLoop.openSelector(NioEventLoop.java:126) at io.netty.channel.nio.NioEventLoop.init(NioEventLoop.java:120) at io.netty.channel.nio.NioEventLoopGroup.newChild(NioEventLoopGroup.java:87) at io.netty.util.concurrent.MultithreadEventExecutorGroup.init(MultithreadEventExecutorGroup.java:64) at io.netty.channel.MultithreadEventLoopGroup.init(MultithreadEventLoopGroup.java:49) at io.netty.channel.nio.NioEventLoopGroup.init(NioEventLoopGroup.java:61) at io.netty.channel.nio.NioEventLoopGroup.init(NioEventLoopGroup.java:52) at org.apache.spark.network.util.NettyUtils.createEventLoop(NettyUtils.java:56) at org.apache.spark.network.client.TransportClientFactory.init(TransportClientFactory.java:104) at org.apache.spark.network.TransportContext.createClientFactory(TransportContext.java:76) at org.apache.spark.network.TransportContext.createClientFactory(TransportContext.java:80
HiveContext test, Spark Context did not initialize after waiting 10000ms
I am trying to run a Hive query from Spark using HiveContext. Here is the code:

val conf = new SparkConf().setAppName("HiveSparkIntegrationTest")
conf.set("spark.executor.extraClassPath", "/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hive/lib")
conf.set("spark.driver.extraClassPath", "/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hive/lib")
conf.set("spark.yarn.am.waitTime", "30L")
val sc = new SparkContext(conf)
val sqlContext = new HiveContext(sc)
def inputRDD = sqlContext.sql("describe spark_poc.src_digital_profile_user")
inputRDD.collect().foreach { println }
println(inputRDD.schema.getClass.getName)

Getting this exception. Any clues? The weird part is that if I try to do the same thing in Java instead of Scala, it runs fine.

Exception in thread "Driver" java.lang.NullPointerException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:162)
15/03/06 17:39:32 ERROR yarn.ApplicationMaster: SparkContext did not initialize after waiting for 10000 ms. Please check earlier log output for errors. Failing the application.
Exception in thread "main" java.lang.NullPointerException
at org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkContextInitialized(ApplicationMaster.scala:218)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:110)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:434)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:53)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:52)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:52)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:433)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
15/03/06 17:39:32 INFO yarn.ApplicationMaster: AppMaster received a signal.
Re: HiveContext test, Spark Context did not initialize after waiting 10000ms
On Fri, Mar 6, 2015 at 2:47 PM, nitinkak001 nitinkak...@gmail.com wrote: I am trying to run a Hive query from Spark using HiveContext. Here is the code / val conf = new SparkConf().setAppName(HiveSparkIntegrationTest) conf.set(spark.executor.extraClassPath, /opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hive/lib); conf.set(spark.driver.extraClassPath, /opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hive/lib); conf.set(spark.yarn.am.waitTime, 30L) You're missing /* at the end of your classpath entries. Also, since you're on CDH 5.2, you'll probably need to filter out the guava jar from Hive's lib directory, otherwise things might break. So things will get a little more complicated. With CDH 5.3 you shouldn't need to filter out the guava jar. -- Marcelo - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
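Concretely, Marcelo's suggestion amounts to pointing the extra classpath at the jars rather than the directory, something like this sketch (the parcel path is the one from the original post; whether guava also needs to be filtered out depends on the CDH version, as noted above):

// append /* so the JVM picks up the jars inside the Hive lib directory
conf.set("spark.executor.extraClassPath",
  "/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hive/lib/*")
conf.set("spark.driver.extraClassPath",
  "/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hive/lib/*")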
Spark Standard Application to Test
Hello, I am preparing some tests to execute in Spark in order to manipulate properties and check the variations in results. For this, I need to use a standard application in my environment, like the well-known Hadoop apps Terasort (https://hadoop.apache.org/docs/current/api/org/apache/hadoop/examples/terasort/package-summary.html) and especially Terrier (http://terrier.org/), or something similar. I do not need wordcount and grep, because I have already used them. Can anyone suggest something for this? Thanks!
Test
Sent from my iPhone
java.lang.NoClassDefFoundError: io/netty/util/TimerTask Error when running sbt test
I am using Spark 1.1.1. When I ran sbt test, I ran into the following exceptions. Any idea how to solve it? Thanks! I think somebody posted this question before, but no one seemed to have answered it. Could it be the version of io.netty I put in my build.sbt? I included the dependency libraryDependencies += "io.netty" % "netty" % "3.6.6.Final" in my build.sbt file.

java.lang.NoClassDefFoundError: io/netty/util/TimerTask
at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:72)
at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:168)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:230)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:204)
at spark.jobserver.util.DefaultSparkContextFactory.makeContext(SparkContextFactory.scala:34)
at spark.jobserver.JobManagerActor.createContextFromConfig(JobManagerActor.scala:255)
at spark.jobserver.JobManagerActor$$anonfun$wrappedReceive$1.applyOrElse(JobManagerActor.scala:104)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
...
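One thing worth checking: the "io.netty" % "netty" 3.x artifact does not contain io.netty.util.TimerTask (Netty 3 classes live under org.jboss.netty), so this error usually means the Netty 4 netty-all jar that Spark itself needs is missing from, or evicted on, the test classpath. A hedged sbt sketch; the netty-all version shown is an assumption and should be aligned with what your Spark version's pom declares:

// build.sbt sketch: remove the explicit "io.netty" % "netty" % "3.6.6.Final" line
// if nothing else needs it, and make sure Netty 4 (netty-all) is on the test classpath.
// The version below is a guess; match it to the netty-all version in your Spark version's pom.
libraryDependencies += "io.netty" % "netty-all" % "4.0.23.Final" % "test"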
TestSuiteBase based unit test using a sliding window join timesout
Hi, I extended org.apache.spark.streaming.TestSuiteBase for some testing, and I was able to run this test fine:

test("Sliding window join with 3 second window duration") {
  val input1 = Seq(
    Seq("req1"),
    Seq("req2", "req3"),
    Seq(),
    Seq("req4", "req5", "req6"),
    Seq("req7"),
    Seq(),
    Seq()
  )
  val input2 = Seq(
    Seq(("tx1", "req1")),
    Seq(),
    Seq(("tx2", "req3")),
    Seq(("tx3", "req2")),
    Seq(),
    Seq(("tx4", "req7")),
    Seq(("tx5", "req5"), ("tx6", "req4"))
  )
  val expectedOutput = Seq(
    Seq(("req1", (1, "tx1"))),
    Seq(),
    Seq(("req3", (1, "tx2"))),
    Seq(("req2", (1, "tx3"))),
    Seq(),
    Seq(("req7", (1, "tx4"))),
    Seq()
  )
  val operation = (rq: DStream[String], tx: DStream[(String, String)]) => {
    rq.window(Seconds(3), Seconds(1)).map(x => (x, 1)).join(tx.map { case (k, v) => (v, k) })
  }
  testOperation(input1, input2, operation, expectedOutput, useSet = true)
}

However, this seemingly OK looking test fails with an operation timeout:

test("Sliding window join with 3 second window duration + a tumbling window") {
  val input1 = Seq(
    Seq("req1"),
    Seq("req2", "req3"),
    Seq(),
    Seq("req4", "req5", "req6"),
    Seq("req7"),
    Seq()
  )
  val input2 = Seq(
    Seq(("tx1", "req1")),
    Seq(),
    Seq(("tx2", "req3")),
    Seq(("tx3", "req2")),
    Seq(),
    Seq(("tx4", "req7"))
  )
  val expectedOutput = Seq(
    Seq(("req1", (1, "tx1"))),
    Seq(("req2", (1, "tx3")), ("req3", (1, "tx3"))),
    Seq(("req7", (1, "tx4")))
  )
  val operation = (rq: DStream[String], tx: DStream[(String, String)]) => {
    rq.window(Seconds(3), Seconds(2)).map(x => (x, 1)).join(tx.window(Seconds(2), Seconds(2)).map { case (k, v) => (v, k) })
  }
  testOperation(input1, input2, operation, expectedOutput, useSet = true)
}

Stacktrace:

10033 was not less than 10000. Operation timed out after 10033 ms
org.scalatest.exceptions.TestFailedException: 10033 was not less than 10000. Operation timed out after 10033 ms
at org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500)
at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555)
at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466)
at org.apache.spark.streaming.TestSuiteBase$class.runStreamsWithPartitions(TestSuiteBase.scala:338)

Does anybody know why this could be?
Why does consuming a RESTful web service (using javax.ws.rs.* and Jersey) work in unit test but not when submitted to Spark?
Hello, I have a piece of code that runs inside Spark Streaming and tries to get some data from a RESTful web service (that runs locally on my machine). The code snippet in question is:

Client client = ClientBuilder.newClient();
WebTarget target = client.target("http://localhost:/rest");
target = target.path("annotate")
  .queryParam("text", UrlEscapers.urlFragmentEscaper().escape(spotlightSubmission))
  .queryParam("confidence", 0.3);
logger.warn("!!! DEBUG !!! target: {}", target.getUri().toString());
String response = target.request().accept(MediaType.APPLICATION_JSON_TYPE).get(String.class);
logger.warn("!!! DEBUG !!! Spotlight response: {}", response);

When run inside a unit test as follows:

mvn clean test -Dtest=SpotlightTest#testCountWords

it contacts the RESTful web service and retrieves some data as expected. But when the same code is run as part of the application that is submitted to Spark, using the spark-submit script, I receive the following error:

java.lang.NoSuchMethodError: javax.ws.rs.core.MultivaluedMap.addAll(Ljava/lang/Object;[Ljava/lang/Object;)V

I'm using Spark 1.1.0, and for consuming the web service I'm using Jersey in my project's pom.xml:

<dependency>
  <groupId>org.glassfish.jersey.containers</groupId>
  <artifactId>jersey-container-servlet-core</artifactId>
  <version>2.14</version>
</dependency>

So I suspect that when the application is submitted to Spark, somehow there's a different JAR in the environment that uses a different version of Jersey / javax.ws.rs.*. Does anybody know which version of Jersey / javax.ws.rs.* is used in the Spark environment, or how to solve this conflict?

-- Emre Sevinç https://be.linkedin.com/in/emresevinc/
Re: Why does consuming a RESTful web service (using javax.ws.rs.* and Jersey) work in unit test but not when submitted to Spark?
Your guess is right, that there are two incompatible versions of Jersey (or really, JAX-RS) in your runtime. Spark doesn't use Jersey, but its transitive dependencies may, or your transitive dependencies may. I don't see Jersey in Spark's dependency tree except from HBase tests, which in turn only appear in examples, so that's unlikely to be it. I'd take a look with 'mvn dependency:tree' on your own code first. Maybe you are including JavaEE 6 for example?
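A small sketch of how that inspection might look on the command line (plain Maven; the grep filter is just a convenience and assumes a Unix shell):

# Print the full dependency tree and look for competing JAX-RS / Jersey artifacts
mvn dependency:tree -Dverbose | grep -iE 'jersey|javax.ws.rs'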
Re: Why does consuming a RESTful web service (using javax.ws.rs.* and Jersey) work in unit test but not when submitted to Spark?
[INFO] | | \- javax.validation:validation-api:jar:1.1.0.Final:compile [INFO] | \- javax.ws.rs:javax.ws.rs-api:jar:2.0.1:compile [INFO] +- org.apache.hadoop:hadoop-client:jar:2.4.0:compile [INFO] | +- org.apache.hadoop:hadoop-hdfs:jar:2.4.0:compile [INFO] | +- org.apache.hadoop:hadoop-mapreduce-client-app:jar:2.4.0:compile [INFO] | | +- org.apache.hadoop:hadoop-mapreduce-client-common:jar:2.4.0:compile [INFO] | | | +- org.apache.hadoop:hadoop-yarn-client:jar:2.4.0:compile [INFO] | | | | \- com.sun.jersey:jersey-client:jar:1.9:compile [INFO] | | | \- org.apache.hadoop:hadoop-yarn-server-common:jar:2.4.0:compile [INFO] | | \- org.apache.hadoop:hadoop-mapreduce-client-shuffle:jar:2.4.0:compile [INFO] | +- org.apache.hadoop:hadoop-yarn-api:jar:2.4.0:compile [INFO] | +- org.apache.hadoop:hadoop-mapreduce-client-jobclient:jar:2.4.0:compile [INFO] | \- org.apache.hadoop:hadoop-annotations:jar:2.4.0:compile [INFO] +- com.google.guava:guava:jar:16.0:compile [INFO] +- org.apache.hadoop:hadoop-mapreduce-client-core:jar:2.4.0:compile [INFO] | +- org.apache.hadoop:hadoop-yarn-common:jar:2.4.0:compile [INFO] | | +- javax.xml.bind:jaxb-api:jar:2.2.2:compile [INFO] | | | +- javax.xml.stream:stax-api:jar:1.0-2:compile [INFO] | | | \- javax.activation:activation:jar:1.1:compile [INFO] | | +- javax.servlet:servlet-api:jar:2.5:compile [INFO] | | +- com.google.inject:guice:jar:3.0:compile [INFO] | | | +- javax.inject:javax.inject:jar:1:compile [INFO] | | | \- aopalliance:aopalliance:jar:1.0:compile [INFO] | | \- com.sun.jersey.contribs:jersey-guice:jar:1.9:compile [INFO] | +- com.google.protobuf:protobuf-java:jar:2.5.0:compile [INFO] | +- org.slf4j:slf4j-api:jar:1.7.5:compile [INFO] | +- com.google.inject.extensions:guice-servlet:jar:3.0:compile [INFO] | \- io.netty:netty:jar:3.6.2.Final:compile [INFO] +- json-mapreduce:json-mapreduce:jar:1.0-SNAPSHOT:compile [INFO] +- org.apache.avro:avro-mapred:jar:1.7.7:compile [INFO] | +- org.apache.avro:avro-ipc:jar:1.7.7:compile [INFO] | | +- org.apache.velocity:velocity:jar:1.7:compile [INFO] | | \- org.mortbay.jetty:servlet-api:jar:2.5-20081211:compile [INFO] | +- org.apache.avro:avro-ipc:jar:tests:1.7.7:compile [INFO] | +- org.codehaus.jackson:jackson-core-asl:jar:1.9.13:compile [INFO] | \- org.codehaus.jackson:jackson-mapper-asl:jar:1.9.13:compile [INFO] +- junit:junit:jar:4.11:test [INFO] | \- org.hamcrest:hamcrest-core:jar:1.3:test [INFO] +- org.apache.avro:avro:jar:1.7.7:compile [INFO] | +- com.thoughtworks.paranamer:paranamer:jar:2.3:compile [INFO] | +- org.xerial.snappy:snappy-java:jar:1.0.5:compile [INFO] | \- org.apache.commons:commons-compress:jar:1.4.1:compile [INFO] | \- org.tukaani:xz:jar:1.0:compile [INFO] +- org.apache.hadoop:hadoop-common:jar:2.4.0:provided [INFO] | +- commons-cli:commons-cli:jar:1.2:compile [INFO] | +- org.apache.commons:commons-math3:jar:3.1.1:provided [INFO] | +- xmlenc:xmlenc:jar:0.52:compile [INFO] | +- commons-httpclient:commons-httpclient:jar:3.1:provided [INFO] | +- commons-codec:commons-codec:jar:1.4:compile [INFO] | +- commons-io:commons-io:jar:2.4:compile [INFO] | +- commons-net:commons-net:jar:3.1:provided [INFO] | +- commons-collections:commons-collections:jar:3.2.1:compile [INFO] | +- org.mortbay.jetty:jetty:jar:6.1.26:compile [INFO] | +- org.mortbay.jetty:jetty-util:jar:6.1.26:compile [INFO] | +- com.sun.jersey:jersey-core:jar:1.9:compile [INFO] | +- com.sun.jersey:jersey-json:jar:1.9:compile [INFO] | | +- org.codehaus.jettison:jettison:jar:1.1:compile [INFO] | | +- com.sun.xml.bind:jaxb-impl:jar:2.2.3-1:compile [INFO] | | 
+- org.codehaus.jackson:jackson-jaxrs:jar:1.8.3:compile [INFO] | | \- org.codehaus.jackson:jackson-xc:jar:1.8.3:compile [INFO] | +- com.sun.jersey:jersey-server:jar:1.9:compile [INFO] | | \- asm:asm:jar:3.1:compile [INFO] | +- tomcat:jasper-compiler:jar:5.5.23:provided [INFO] | +- tomcat:jasper-runtime:jar:5.5.23:provided [INFO] | +- javax.servlet.jsp:jsp-api:jar:2.1:provided [INFO] | +- commons-el:commons-el:jar:1.0:provided [INFO] | +- commons-logging:commons-logging:jar:1.1.3:compile [INFO] | +- log4j:log4j:jar:1.2.17:compile [INFO] | +- net.java.dev.jets3t:jets3t:jar:0.9.0:provided [INFO] | | +- org.apache.httpcomponents:httpclient:jar:4.1.2:provided [INFO] | | +- org.apache.httpcomponents:httpcore:jar:4.1.2:provided [INFO] | | \- com.jamesmurty.utils:java-xmlbuilder:jar:0.4:provided [INFO] | +- commons-lang:commons-lang:jar:2.6:compile [INFO] | +- commons-configuration:commons-configuration:jar:1.6:provided [INFO] | | +- commons-digester:commons-digester:jar:1.8:provided [INFO] | | | \- commons-beanutils:commons-beanutils:jar:1.7.0:provided [INFO] | | \- commons-beanutils:commons-beanutils-core:jar:1.8.0:provided [INFO] | +- org.apache.hadoop:hadoop-auth:jar:2.4.0:provided [INFO] | +- com.jcraft:jsch:jar:0.1.42:provided [INFO] | +- com.google.code.findbugs:jsr305:jar:1.3.9:provided [INFO
Re: Why does consuming a RESTful web service (using javax.ws.rs.* and Jersey) work in unit test but not when submitted to Spark?
It seems like YARN depends on an older version of Jersey, that is 1.9: https://github.com/apache/spark/blob/master/yarn/pom.xml

When I've modified my dependencies to have only:

<dependency>
  <groupId>com.sun.jersey</groupId>
  <artifactId>jersey-core</artifactId>
  <version>1.9.1</version>
</dependency>

and then modified the code to use the old Jersey API:

Client c = Client.create();
WebResource r = c.resource("http://localhost:/rest")
  .path("annotate")
  .queryParam("text", UrlEscapers.urlFragmentEscaper().escape(spotlightSubmission))
  .queryParam("confidence", "0.3");
logger.warn("!!! DEBUG !!! target: {}", r.getURI());
String response = r.accept(MediaType.APPLICATION_JSON_TYPE)
  //.header()
  .get(String.class);
logger.warn("!!! DEBUG !!! Spotlight response: {}", response);

it seems to work when I use spark-submit to submit the application that includes this code. Funny thing is, now my relevant unit test does not run, complaining about not having enough memory:

Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0xc490, 25165824, 0) failed; error='Cannot allocate memory' (errno=12)
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 25165824 bytes for committing reserved memory.

-- Emre
Re: Why does consuming a RESTful web service (using javax.ws.rs.* and Jersey) work in unit test but not when submitted to Spark?
That could well be it -- oops, I forgot to run with the YARN profile and so didn't see the YARN dependencies. Try the userClassPathFirst option to try to make your app's copy take precedence.

The second error is really a JVM bug, but, is from having too little memory available for the unit tests. http://spark.apache.org/docs/latest/building-spark.html#setting-up-mavens-memory-usage
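For concreteness, a hedged sketch of both suggestions. The exact property name for user-classpath-first behavior varies with the Spark version (spark.files.userClassPathFirst in the 1.1/1.2 era, spark.executor.userClassPathFirst / spark.driver.userClassPathFirst in later releases), so check the configuration docs for the release you run; the jar and class names below are hypothetical, and the memory settings are the ones from the building-spark page linked above:

# Hypothetical application jar and main class, shown only for illustration.
# Ask Spark to prefer the application's classes over the cluster's copies;
# the property name depends on the Spark version (see above).
spark-submit \
  --conf spark.files.userClassPathFirst=true \
  --class com.example.MyStreamingApp \
  my-app-assembly.jar

# Give Maven enough memory for the unit tests (values from the Spark build docs)
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"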
Re: Why does consuming a RESTful web service (using javax.ws.rs.* and Jersey) work in unit test but not when submitted to Spark?
Sean, Thanks a lot for the important information, especially userClassPathFirst. Cheers, Emre
Re: Intermittent test failures
Using TestSQLContext from multiple tests leads to:

SparkException: Task not serializable

ERROR ContextCleaner: Error cleaning broadcast 10
java.lang.NullPointerException
at org.apache.spark.broadcast.TorrentBroadcast$.unpersist(TorrentBroadcast.scala:246)
at org.apache.spark.broadcast.TorrentBroadcastFactory.unbroadcast(TorrentBroadcastFactory.scala:46)
at org.apache.spark.broadcast.BroadcastManager.unbroadcast(BroadcastManager.scala:66)
at org.apache.spark.ContextCleaner.doCleanupBroadcast(ContextCleaner.scala:185)
at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:147)
at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:138)
at scala.Option.foreach(Option.scala:236)
Intermittent test failures
Hi, I’m seeing strange, random errors when running unit tests for my Spark jobs. In this particular case I’m using Spark SQL to read and write Parquet files, and one error that I keep running into is this one: org.apache.spark.SparkException: Job aborted due to stage failure: Task 19 in stage 6.0 failed 1 times, most recent failure: Lost task 19.0 in stage 6.0 (TID 223, localhost): java.io.IOException: PARSING_ERROR(2) org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:78) org.xerial.snappy.SnappyNative.uncompressedLength(Native Method) org.xerial.snappy.Snappy.uncompressedLength(Snappy.java:545) I can only prevent this from happening by using isolated Specs tests thats always create a new SparkContext that is not shared between tests (but there can also be only a single SparkContext per test), and also by using standard SQLContext instead of HiveContext. It does not seem to have anything to do with the actual files that I also create during the test run with SQLContext.saveAsParquetFile. Cheers - Marius PS The full trace: org.apache.spark.SparkException: Job aborted due to stage failure: Task 19 in stage 6.0 failed 1 times, most recent failure: Lost task 19.0 in stage 6.0 (TID 223, localhost): java.io.IOException: PARSING_ERROR(2) org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:78) org.xerial.snappy.SnappyNative.uncompressedLength(Native Method) org.xerial.snappy.Snappy.uncompressedLength(Snappy.java:545) org.xerial.snappy.SnappyInputStream.readFully(SnappyInputStream.java:125) org.xerial.snappy.SnappyInputStream.readHeader(SnappyInputStream.java:88) org.xerial.snappy.SnappyInputStream.init(SnappyInputStream.java:58) org.apache.spark.io.SnappyCompressionCodec.compressedInputStream(CompressionCodec.scala:128) org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:232) org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readObject$1.apply$mcV$sp(TorrentBroadcast.scala:169) org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:927) org.apache.spark.broadcast.TorrentBroadcast.readObject(TorrentBroadcast.scala:155) sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source) sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) java.lang.reflect.Method.invoke(Method.java:606) java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017) java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893) java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62) org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87) org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:160) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) java.lang.Thread.run(Thread.java:745) Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185) 
~[spark-core_2.10-1.1.1.jar:1.1.1] at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174) ~[spark-core_2.10-1.1.1.jar:1.1.1] at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173) ~[spark-core_2.10-1.1.1.jar:1.1.1] at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) ~[scala-library.jar:na] at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) ~[scala-library.jar:na] at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173) ~[spark-core_2.10-1.1.1.jar:1.1.1]
Re: Intermittent test failures
Is it possible that you are starting more than one SparkContext in a single JVM without stopping previous ones? I'd try testing with Spark 1.2, which will throw an exception in this case.
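To make the "one SparkContext per JVM, shared across tests" pattern concrete, here is a minimal ScalaTest sketch; the trait and names are hypothetical, not from this thread, and simply mirror the singleton approach described in this exchange:

import org.apache.spark.{SparkConf, SparkContext}
import org.scalatest.{BeforeAndAfterAll, Suite}

// Hypothetical helper: every suite mixing this in shares one local SparkContext
// for the whole suite and stops it once all tests have run.
trait SharedSparkContext extends BeforeAndAfterAll { self: Suite =>
  @transient private var _sc: SparkContext = _
  def sc: SparkContext = _sc

  override def beforeAll(): Unit = {
    super.beforeAll()
    _sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("test"))
  }

  override def afterAll(): Unit = {
    try {
      if (_sc != null) _sc.stop()
      _sc = null
    } finally {
      super.afterAll()
    }
  }
}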
Re: Intermittent test failures
Ok, maybe these test versions will help me then. I’ll check it out. On 15.12.2014, at 22:33, Michael Armbrust mich...@databricks.com wrote: Using a single SparkContext should not cause this problem. In the SQL tests we use TestSQLContext and TestHive which are global singletons for all of our unit testing. On Mon, Dec 15, 2014 at 1:27 PM, Marius Soutier mps@gmail.com wrote: Possible, yes, although I’m trying everything I can to prevent it, i.e. fork in Test := true and isolated. Can you confirm that reusing a single SparkContext for multiple tests poses a problem as well? Other than that, just switching from SQLContext to HiveContext also provoked the error. On 15.12.2014, at 20:22, Michael Armbrust mich...@databricks.com wrote: Is it possible that you are starting more than one SparkContext in a single JVM with out stopping previous ones? I'd try testing with Spark 1.2, which will throw an exception in this case. On Mon, Dec 15, 2014 at 8:48 AM, Marius Soutier mps@gmail.com wrote: Hi, I’m seeing strange, random errors when running unit tests for my Spark jobs. In this particular case I’m using Spark SQL to read and write Parquet files, and one error that I keep running into is this one: org.apache.spark.SparkException: Job aborted due to stage failure: Task 19 in stage 6.0 failed 1 times, most recent failure: Lost task 19.0 in stage 6.0 (TID 223, localhost): java.io.IOException: PARSING_ERROR(2) org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:78) org.xerial.snappy.SnappyNative.uncompressedLength(Native Method) org.xerial.snappy.Snappy.uncompressedLength(Snappy.java:545) I can only prevent this from happening by using isolated Specs tests thats always create a new SparkContext that is not shared between tests (but there can also be only a single SparkContext per test), and also by using standard SQLContext instead of HiveContext. It does not seem to have anything to do with the actual files that I also create during the test run with SQLContext.saveAsParquetFile. 
Cheers - Marius PS The full trace: org.apache.spark.SparkException: Job aborted due to stage failure: Task 19 in stage 6.0 failed 1 times, most recent failure: Lost task 19.0 in stage 6.0 (TID 223, localhost): java.io.IOException: PARSING_ERROR(2) org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:78) org.xerial.snappy.SnappyNative.uncompressedLength(Native Method) org.xerial.snappy.Snappy.uncompressedLength(Snappy.java:545) org.xerial.snappy.SnappyInputStream.readFully(SnappyInputStream.java:125) org.xerial.snappy.SnappyInputStream.readHeader(SnappyInputStream.java:88) org.xerial.snappy.SnappyInputStream.init(SnappyInputStream.java:58) org.apache.spark.io.SnappyCompressionCodec.compressedInputStream(CompressionCodec.scala:128) org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:232) org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readObject$1.apply$mcV$sp(TorrentBroadcast.scala:169) org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:927) org.apache.spark.broadcast.TorrentBroadcast.readObject(TorrentBroadcast.scala:155) sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source) sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) java.lang.reflect.Method.invoke(Method.java:606) java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017) java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893) java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62) org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87) org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:160) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) java.lang.Thread.run(Thread.java:745) Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185) ~[spark-core_2.10-1.1.1.jar:1.1.1
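On the sbt side, a build.sbt fragment along the lines Marius describes is a reasonable sketch (the heap size is only an example): forking keeps the tests in their own JVM, and turning off parallel execution keeps two suites from creating SparkContexts at the same time.

  // build.sbt (fragment)
  fork in Test := true                  // run tests in a separate, disposable JVM
  parallelExecution in Test := false    // one suite at a time, so only one SparkContext is live
  javaOptions in Test += "-Xmx2g"       // example heap size for the forked test JVM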
How can I make Spark Streaming count the words in a file in a unit test?
Hello, I've successfully built a very simple Spark Streaming application in Java that is based on the HdfsCount example in Scala at https://github.com/apache/spark/blob/branch-1.1/examples/src/main/scala/org/apache/spark/examples/streaming/HdfsWordCount.scala . When I submit this application to my local Spark, it waits for a file to be written to a given directory, and when I create that file it successfully prints the number of words. I terminate the application by pressing Ctrl+C. Now I've tried to create a very basic unit test for this functionality, but in the test I was not able to print the same information, that is, the number of words. What am I missing? Below is the unit test file, and after that I've also included the code snippet that shows the countWords method:

===== StarterAppTest.java =====

import com.google.common.io.Files;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.junit.*;

import java.io.*;

public class StarterAppTest {

  JavaStreamingContext ssc;
  File tempDir;

  @Before
  public void setUp() {
    ssc = new JavaStreamingContext("local", "test", new Duration(3000));
    tempDir = Files.createTempDir();
    tempDir.deleteOnExit();
  }

  @After
  public void tearDown() {
    ssc.stop();
    ssc = null;
  }

  @Test
  public void testInitialization() {
    Assert.assertNotNull(ssc.sc());
  }

  @Test
  public void testCountWords() {
    StarterApp starterApp = new StarterApp();

    try {
      JavaDStream<String> lines = ssc.textFileStream(tempDir.getAbsolutePath());
      JavaPairDStream<String, Integer> wordCounts = starterApp.countWords(lines);

      System.err.println("=== Word Counts ===");
      wordCounts.print();
      System.err.println("=== Word Counts ===");

      ssc.start();

      File tmpFile = new File(tempDir.getAbsolutePath(), "tmp.txt");
      PrintWriter writer = new PrintWriter(tmpFile, "UTF-8");
      writer.println("8-Dec-2014: Emre Emre Emre Ergin Ergin Ergin");
      writer.close();

      System.err.println("=== Word Counts ===");
      wordCounts.print();
      System.err.println("=== Word Counts ===");
    } catch (FileNotFoundException e) {
      e.printStackTrace();
    } catch (UnsupportedEncodingException e) {
      e.printStackTrace();
    }

    Assert.assertTrue(true);
  }
}

=====

This test compiles and starts to run; Spark Streaming prints a lot of diagnostic messages on the console, but the calls to wordCounts.print() do not print anything, whereas in StarterApp.java itself they do. I've also added ssc.awaitTermination(); after ssc.start(), but nothing changed in that respect. After that I've also tried to create a new file in the directory that this Spark Streaming application was checking, but this time it gave an error. For completeness, below is the countWords method:

public JavaPairDStream<String, Integer> countWords(JavaDStream<String> lines) {
  JavaDStream<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
    @Override
    public Iterable<String> call(String x) {
      return Lists.newArrayList(SPACE.split(x));
    }
  });

  JavaPairDStream<String, Integer> wordCounts = words.mapToPair(
    new PairFunction<String, String, Integer>() {
      @Override
      public Tuple2<String, Integer> call(String s) {
        return new Tuple2<>(s, 1);
      }
    }).reduceByKey((i1, i2) -> i1 + i2);

  return wordCounts;
}

Kind regards Emre Sevinç
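For reference, a rough Scala sketch of one way to make such a test observable without relying on print(): every batch's result is collected into a driver-side map the test can assert on, and the input file is written only after the context has started, since textFileStream only picks up files that appear after that point. Names, the batch interval and the sleep are illustrative, and the timing makes this approach inherently fragile.

  import java.io.{File, PrintWriter}
  import scala.collection.mutable
  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}
  import org.apache.spark.streaming.StreamingContext._

  object StreamingWordCountCheck {
    def main(args: Array[String]): Unit = {
      val conf = new SparkConf().setMaster("local[2]").setAppName("streaming-test")
      val ssc = new StreamingContext(conf, Seconds(1))
      val inputDir = java.nio.file.Files.createTempDirectory("stream-in").toFile
      val counts = mutable.Map.empty[String, Int]

      val lines = ssc.textFileStream(inputDir.getAbsolutePath)
      val wordCounts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
      // rdd.collect() brings each batch's counts to the driver, so the map is updated locally
      wordCounts.foreachRDD { rdd =>
        rdd.collect().foreach { case (w, c) => counts(w) = counts.getOrElse(w, 0) + c }
      }

      ssc.start()
      // Write the input only after the stream has started; writing to a temp file and
      // renaming it into the watched directory would be more robust.
      val writer = new PrintWriter(new File(inputDir, "words.txt"), "UTF-8")
      writer.println("Emre Emre Emre Ergin Ergin Ergin")
      writer.close()

      Thread.sleep(5000)   // crude: wait a few batch intervals
      ssc.stop()
      assert(counts.get("Emre") == Some(3))
    }
  }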
Re: How can I make Spark Streaming count the words in a file in a unit test?
Hi, https://github.com/databricks/spark-perf/tree/master/streaming-tests/src/main/scala/streaming/perf contains some performance tests for streaming. There are examples of how to generate synthetic files during the test in that repo, maybe you can find some code snippets that you can use there. Best, Burak - Original Message - From: Emre Sevinc emre.sev...@gmail.com To: user@spark.apache.org Sent: Monday, December 8, 2014 2:36:41 AM Subject: How can I make Spark Streaming count the words in a file in a unit test? Hello, I've successfully built a very simple Spark Streaming application in Java that is based on the HdfsCount example in Scala at https://github.com/apache/spark/blob/branch-1.1/examples/src/main/scala/org/apache/spark/examples/streaming/HdfsWordCount.scala . When I submit this application to my local Spark, it waits for a file to be written to a given directory, and when I create that file it successfully prints the number of words. I terminate the application by pressing Ctrl+C. Now I've tried to create a very basic unit test for this functionality, but in the test I was not able to print the same information, that is the number of words. What am I missing? Below is the unit test file, and after that I've also included the code snippet that shows the countWords method: = StarterAppTest.java = import com.google.common.io.Files; import org.apache.spark.streaming.Duration; import org.apache.spark.streaming.api.java.JavaDStream; import org.apache.spark.streaming.api.java.JavaPairDStream; import org.apache.spark.streaming.api.java.JavaStreamingContext; import org.junit.*; import java.io.*; public class StarterAppTest { JavaStreamingContext ssc; File tempDir; @Before public void setUp() { ssc = new JavaStreamingContext(local, test, new Duration(3000)); tempDir = Files.createTempDir(); tempDir.deleteOnExit(); } @After public void tearDown() { ssc.stop(); ssc = null; } @Test public void testInitialization() { Assert.assertNotNull(ssc.sc()); } @Test public void testCountWords() { StarterApp starterApp = new StarterApp(); try { JavaDStreamString lines = ssc.textFileStream(tempDir.getAbsolutePath()); JavaPairDStreamString, Integer wordCounts = starterApp.countWords(lines); System.err.println(= Word Counts ===); wordCounts.print(); System.err.println(= Word Counts ===); ssc.start(); File tmpFile = new File(tempDir.getAbsolutePath(), tmp.txt); PrintWriter writer = new PrintWriter(tmpFile, UTF-8); writer.println(8-Dec-2014: Emre Emre Emre Ergin Ergin Ergin); writer.close(); System.err.println(= Word Counts ===); wordCounts.print(); System.err.println(= Word Counts ===); } catch (FileNotFoundException e) { e.printStackTrace(); } catch (UnsupportedEncodingException e) { e.printStackTrace(); } Assert.assertTrue(true); } } = This test compiles and starts to run, Spark Streaming prints a lot of diagnostic messages on the console but the calls to wordCounts.print(); does not print anything, whereas in StarterApp.java itself, they do. I've also added ssc.awaitTermination(); after ssc.start() but nothing changed in that respect. After that I've also tried to create a new file in the directory that this Spark Streaming application was checking but this time it gave an error. 
For completeness, below is the wordCounts method: public JavaPairDStreamString, Integer countWords(JavaDStreamString lines) { JavaDStreamString words = lines.flatMap(new FlatMapFunctionString, String() { @Override public IterableString call(String x) { return Lists.newArrayList(SPACE.split(x)); } }); JavaPairDStreamString, Integer wordCounts = words.mapToPair( new PairFunctionString, String, Integer() { @Override public Tuple2String, Integer call(String s) { return new Tuple2(s, 1); } }).reduceByKey((i1, i2) - i1 + i2); return wordCounts; } Kind regards Emre Sevinç - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
How can I compile only the core and streaming (so that I can get test utilities of streaming)?
Hello, I'm currently developing a Spark Streaming application and trying to write my first unit test. I've used Java for this application, and I also need to use Java (and JUnit) for writing unit tests. I could not find any documentation that focuses on Spark Streaming unit testing; all I could find was the Java based unit tests in the Spark Streaming source code: https://github.com/apache/spark/blob/branch-1.1/streaming/src/test/java/org/apache/spark/streaming/JavaAPISuite.java which depends on a Scala file: https://github.com/apache/spark/blob/branch-1.1/streaming/src/test/java/org/apache/spark/streaming/JavaTestUtils.scala which, in turn, depends on the Scala test files in https://github.com/apache/spark/tree/branch-1.1/streaming/src/test/scala/org/apache/spark/streaming So I thought that I could grab the Spark source code, switch to the branch-1.1 branch and then compile only the 'core' and 'streaming' modules, hopefully ending up with the compiled classes (or jar files) of the Streaming test utilities, so that I can import them in my Java based Spark Streaming application. However, trying to build it via the following command line failed: mvn -pl core,streaming package You can see the full output at the end of this message. Any ideas how to progress? Full output of the build: emre@emre-ubuntu:~/code/spark$ mvn -pl core,streaming package [INFO] Scanning for projects...
[INFO] [INFO] Reactor Build Order: [INFO] [INFO] Spark Project Core [INFO] Spark Project Streaming [INFO] [INFO] [INFO] Building Spark Project Core 1.1.2-SNAPSHOT [INFO] Downloading: https://repo1.maven.org/maven2/org/apache/maven/plugins/maven-antrun-plugin/1.7/maven-antrun-plugin-1.7.pom Downloaded: https://repo1.maven.org/maven2/org/apache/maven/plugins/maven-antrun-plugin/1.7/maven-antrun-plugin-1.7.pom (5 KB at 5.8 KB/sec) Downloading: https://repo1.maven.org/maven2/org/apache/maven/plugins/maven-antrun-plugin/1.7/maven-antrun-plugin-1.7.jar Downloaded: https://repo1.maven.org/maven2/org/apache/maven/plugins/maven-antrun-plugin/1.7/maven-antrun-plugin-1.7.jar (31 KB at 200.4 KB/sec) Downloading: https://repo1.maven.org/maven2/org/apache/commons/commons-math3/3.3/commons-math3-3.3.pom Downloaded: https://repo1.maven.org/maven2/org/apache/commons/commons-math3/3.3/commons-math3-3.3.pom (24 KB at 178.9 KB/sec) Downloading: https://repo1.maven.org/maven2/org/spark-project/akka/akka-testkit_2.10/2.2.3-shaded-protobuf/akka-testkit_2.10-2.2.3-shaded-protobuf.pom Downloaded: https://repo1.maven.org/maven2/org/spark-project/akka/akka-testkit_2.10/2.2.3-shaded-protobuf/akka-testkit_2.10-2.2.3-shaded-protobuf.pom (3 KB at 22.5 KB/sec) Downloading: https://repo1.maven.org/maven2/org/scala-lang/scalap/2.10.4/scalap-2.10.4.pom Downloaded: https://repo1.maven.org/maven2/org/scala-lang/scalap/2.10.4/scalap-2.10.4.pom (2 KB at 19.2 KB/sec) Downloading: https://repo1.maven.org/maven2/org/apache/derby/derby/10.4.2.0/derby-10.4.2.0.pom Downloaded: https://repo1.maven.org/maven2/org/apache/derby/derby/10.4.2.0/derby-10.4.2.0.pom (2 KB at 14.9 KB/sec) Downloading: https://repo1.maven.org/maven2/org/mockito/mockito-all/1.9.0/mockito-all-1.9.0.pom Downloaded: https://repo1.maven.org/maven2/org/mockito/mockito-all/1.9.0/mockito-all-1.9.0.pom (1010 B at 4.1 KB/sec) Downloading: https://repo1.maven.org/maven2/org/easymock/easymockclassextension/3.1/easymockclassextension-3.1.pom Downloaded: https://repo1.maven.org/maven2/org/easymock/easymockclassextension/3.1/easymockclassextension-3.1.pom (5 KB at 42.9 KB/sec) Downloading: https://repo1.maven.org/maven2/org/easymock/easymock-parent/3.1/easymock-parent-3.1.pom Downloaded: https://repo1.maven.org/maven2/org/easymock/easymock-parent/3.1/easymock-parent-3.1.pom (13 KB at 133.8 KB/sec) Downloading: https://repo1.maven.org/maven2/org/easymock/easymock/3.1/easymock-3.1.pom Downloaded: https://repo1.maven.org/maven2/org/easymock/easymock/3.1/easymock-3.1.pom (6 KB at 38.8 KB/sec) Downloading: https://repo1
Re: How can I compile only the core and streaming (so that I can get test utilities of streaming)?
Please specify '-DskipTests' on commandline. Cheers On Dec 5, 2014, at 3:52 AM, Emre Sevinc emre.sev...@gmail.com wrote: Hello, I'm currently developing a Spark Streaming application and trying to write my first unit test. I've used Java for this application, and I also need use Java (and JUnit) for writing unit tests. I could not find any documentation that focuses on Spark Streaming unit testing, all I could find was the Java based unit tests in Spark Streaming source code: https://github.com/apache/spark/blob/branch-1.1/streaming/src/test/java/org/apache/spark/streaming/JavaAPISuite.java that depends on a Scala file: https://github.com/apache/spark/blob/branch-1.1/streaming/src/test/java/org/apache/spark/streaming/JavaTestUtils.scala which, in turn, depends on the Scala test files in https://github.com/apache/spark/tree/branch-1.1/streaming/src/test/scala/org/apache/spark/streaming So I thought that I could grab the Spark source code, switch to branch-1.1 branch and then only compile 'core' and 'streaming' modules, hopefully ending up with the compiled classes (or jar files) of the Streaming test utilities, so that I can import them in my Java based Spark Streaming application. However, trying to build it via the following command line failed: mvn -pl core,streaming package You can see the full output at the end of this message. Any ideas how to progress? Full output of the build: emre@emre-ubuntu:~/code/spark$ mvn -pl core,streaming package [INFO] Scanning for projects... [INFO] [INFO] Reactor Build Order: [INFO] [INFO] Spark Project Core [INFO] Spark Project Streaming [INFO] [INFO] [INFO] Building Spark Project Core 1.1.2-SNAPSHOT [INFO] Downloading: https://repo1.maven.org/maven2/org/apache/maven/plugins/maven-antrun-plugin/1.7/maven-antrun-plugin-1.7.pom Downloaded: https://repo1.maven.org/maven2/org/apache/maven/plugins/maven-antrun-plugin/1.7/maven-antrun-plugin-1.7.pom (5 KB at 5.8 KB/sec) Downloading: https://repo1.maven.org/maven2/org/apache/maven/plugins/maven-antrun-plugin/1.7/maven-antrun-plugin-1.7.jar Downloaded: https://repo1.maven.org/maven2/org/apache/maven/plugins/maven-antrun-plugin/1.7/maven-antrun-plugin-1.7.jar (31 KB at 200.4 KB/sec) Downloading: https://repo1.maven.org/maven2/org/apache/commons/commons-math3/3.3/commons-math3-3.3.pom Downloaded: https://repo1.maven.org/maven2/org/apache/commons/commons-math3/3.3/commons-math3-3.3.pom (24 KB at 178.9 KB/sec) Downloading: https://repo1.maven.org/maven2/org/spark-project/akka/akka-testkit_2.10/2.2.3-shaded-protobuf/akka-testkit_2.10-2.2.3-shaded-protobuf.pom Downloaded: https://repo1.maven.org/maven2/org/spark-project/akka/akka-testkit_2.10/2.2.3-shaded-protobuf/akka-testkit_2.10-2.2.3-shaded-protobuf.pom (3 KB at 22.5 KB/sec) Downloading: https://repo1.maven.org/maven2/org/scala-lang/scalap/2.10.4/scalap-2.10.4.pom Downloaded: https://repo1.maven.org/maven2/org/scala-lang/scalap/2.10.4/scalap-2.10.4.pom (2 KB at 19.2 KB/sec) Downloading: https://repo1.maven.org/maven2/org/apache/derby/derby/10.4.2.0/derby-10.4.2.0.pom Downloaded: https://repo1.maven.org/maven2/org/apache/derby/derby/10.4.2.0/derby-10.4.2.0.pom (2 KB at 14.9 KB/sec) Downloading: https://repo1.maven.org/maven2/org/mockito/mockito-all/1.9.0/mockito-all-1.9.0.pom Downloaded: https://repo1.maven.org/maven2/org/mockito/mockito-all/1.9.0/mockito-all-1.9.0.pom (1010 B at 4.1 KB/sec) Downloading: https://repo1.maven.org/maven2/org/easymock/easymockclassextension/3.1/easymockclassextension-3.1.pom Downloaded: 
https://repo1.maven.org/maven2/org/easymock/easymockclassextension/3.1/easymockclassextension-3.1.pom (5 KB at 42.9 KB/sec) Downloading: https://repo1.maven.org/maven2/org/easymock/easymock-parent/3.1/easymock-parent-3.1.pom Downloaded: https://repo1.maven.org/maven2/org/easymock/easymock-parent/3.1/easymock-parent-3.1.pom (13 KB at 133.8 KB/sec) Downloading: https://repo1.maven.org/maven2/org/easymock/easymock/3.1/easymock-3.1.pom Downloaded: https://repo1.maven.org/maven2/org/easymock/easymock/3.1/easymock-3.1.pom (6 KB at 38.8 KB/sec) Downloading: https://repo1.maven.org/maven2/cglib/cglib-nodep/2.2.2/cglib-nodep-2.2.2.pom Downloaded: https://repo1.maven.org/maven2/cglib/cglib-nodep/2.2.2/cglib-nodep-2.2.2.pom (2 KB at 9.9 KB/sec) Downloading: https://repo1.maven.org/maven2/org/apache/commons/commons-math3/3.3/commons-math3-3.3.jar Downloading: https://repo1.maven.org/maven2/org/spark-project/akka/akka-testkit_2.10/2.2.3-shaded-protobuf/akka-testkit_2.10-2.2.3-shaded-protobuf.jar
Re: How can I compile only the core and streaming (so that I can get test utilities of streaming)?
Hello, Specifying '-DskipTests' on the command line worked, though I can't be sure whether first running 'sbt assembly' also contributed to the solution. (I've tried 'sbt assembly' because branch-1.1's README says to use sbt.) Thanks for the answer. Kind regards, Emre Sevinç
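Putting the two messages together, the invocation that ends up working is roughly mvn -pl core,streaming -DskipTests package; on a clean checkout it may also be necessary to add Maven's -am ("also make") flag so that the modules core and streaming depend on are built as well.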
Problems creating and reading a large test file
I am trying to look at problems reading a data file over 4G. In my testing I am trying to create such a file. My plan is to create a fasta file (a simple format used in biology) looking like:

1 TCCTTACGGAGTTCGGGTGTTTATCTTACTTATCGCGGTTCGCTGCCGCTCCGGGAGCCCGGATAGGCTGCGTTAATACCTAAGGAGCGCGTATTG
2 GTCTGATCTAAATGCGACGACGTCTTTAGTGCTAAGTGGAACCCAATCTTAAGACCCAGGCTCTTAAGCAGAAACAGACCGTCCCTGCCTCCTGGAGTAT
3 ...

I create a list with 5000 structures, use flatMap to add 5000 per entry, and then either call saveAsText or dnaFragmentIterator = mySet.toLocalIterator(); and write to HDFS. Then I try to call JavaRDD<String> lines = ctx.textFile(hdfsFileName); What I get on a 16 node cluster:

14/12/06 01:49:21 ERROR SendingConnection: Exception while reading SendingConnection to ConnectionManagerId(pltrd007.labs.uninett.no,50119) java.nio.channels.ClosedChannelException 2
14/12/06 01:49:35 ERROR BlockManagerMasterActor: Got two different block manager registrations on 20140711-081617-711206558-5050-2543-13

The code is at the link below - I did not want to spam the group although it is only a couple of pages. I am baffled - there are no issues when I create a few thousand records, but things blow up when I try 25 million records or a file of 6B or so. Can someone take a look - it is not a lot of code: https://drive.google.com/file/d/0B4cgoSGuA4KWUmo3UzBZRmU5M3M/view?usp=sharing
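A rough Scala sketch of generating data at this scale without funnelling it through the driver: toLocalIterator pulls every record back through a single JVM, whereas saveAsTextFile writes each partition to HDFS in parallel. Record counts, the partition count and the output path are illustrative only.

  import scala.util.Random
  import org.apache.spark.{SparkConf, SparkContext}

  object GenerateFasta {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(new SparkConf().setAppName("GenerateFasta"))
      val bases = "ACGT"
      val groups = 5000           // illustrative
      val recordsPerGroup = 5000  // 5000 x 5000 = 25 million records

      val fasta = sc.parallelize(1 to groups, 200).flatMap { g =>
        val rnd = new Random(g)
        (1 to recordsPerGroup).map { i =>
          val id = g.toLong * recordsPerGroup + i
          val seq = Array.fill(100)(bases(rnd.nextInt(bases.length))).mkString
          s">$id\n$seq"   // header line plus sequence line
        }
      }
      fasta.saveAsTextFile("hdfs:///tmp/synthetic-fasta")  // illustrative output path
      sc.stop()
    }
  }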
This is just a test
Test message -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/This-is-just-a-test-tp19895.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Spark: Simple local test failed depending on memory settings
Dear all, We encountered problems with failed jobs on large amounts of data. A simple local test was prepared for this question at https://gist.github.com/copy-of-rezo/6a137e13a1e4f841e7eb It generates 2 sets of key-value pairs, joins them, selects distinct values and finally counts the data.

object Spill {
  def generate = {
    for {
      j <- 1 to 10
      i <- 1 to 200
    } yield (j, i)
  }

  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName(getClass.getSimpleName)
    conf.set("spark.shuffle.spill", "true")
    conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    val sc = new SparkContext(conf)
    println(generate)
    val dataA = sc.parallelize(generate)
    val dataB = sc.parallelize(generate)
    val dst = dataA.join(dataB).distinct().count()
    println(dst)
  }
}

We compiled it locally and ran it 3 times with different memory settings:

1) --executor-memory 10M --driver-memory 10M --num-executors 1 --executor-cores 1
It fails with java.lang.OutOfMemoryError: GC overhead limit exceeded at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:137)

2) --executor-memory 20M --driver-memory 20M --num-executors 1 --executor-cores 1
It works OK.

3) --executor-memory 10M --driver-memory 10M --num-executors 1 --executor-cores 1
But let's make less data, changing i from 200 to 100. This reduces the input data by a factor of 2 and the joined data by a factor of 4:

def generate = {
  for {
    j <- 1 to 10
    i <- 1 to 100 // previous value was 200
  } yield (j, i)
}

This code works OK. We don't understand why 10M is not enough for such a simple operation on approximately 32000 bytes of ints (2 * 10 * 200 * 2 * 4). 10M of RAM works if we cut the data volume in half (2000 records of (int, int)). Why doesn't spilling to disk cover this case? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Simple-local-test-failed-depending-on-memory-settings-tp19473.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
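A hedged back-of-the-envelope for why 10M can be too small here: the 32000-byte figure is the size of the input, but the shuffle has to hold the joined data, and on the heap each record is a tuple of boxed integers rather than raw ints.

  // Each of the 10 keys pairs all 200 left values with all 200 right values:
  val joinedRecords = 10 * 200 * 200          // = 400,000 records of (Int, (Int, Int))
  // Even as raw ints that is roughly 4.8 MB; as boxed tuples with object headers
  // it is plausibly on the order of 100 bytes per record:
  val roughHeapBytes = joinedRecords * 100L   // ~40 MB, a rough estimate only

With i limited to 100 the join produces only 10 * 100 * 100 = 100,000 records, which is consistent with the smaller run fitting. Spilling only starts once the in-memory map has reached its share of spark.shuffle.memoryFraction, and with a 10M heap there is very little room left after Spark's own overhead for that map to operate in at all.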
Re: SPARK BENCHMARK TEST
Hi, Please can someone advise on this? On Wed, Sep 17, 2014 at 6:59 PM, VJ Shalish vjshal...@gmail.com wrote: I am trying to benchmark spark in a hadoop cluster. I need to design a sample spark job to test the CPU utilization, RAM usage, Input throughput, Output throughput and Duration of execution in the cluster. I need to test the state of the cluster for:
A spark job which uses high CPU
A spark job which uses high RAM
A spark job which uses high Input throughput
A spark job which uses high Output throughput
A spark job which takes a long time
These have to be tested individually, and a combination of these scenarios would also be used. Please help me in understanding the factors of a Spark job which would contribute to CPU utilization, RAM usage, Input throughput, Output throughput and Duration of execution in the cluster, so that I can design spark jobs that could be used for testing. Thanks Shalish.
SPARK BENCHMARK TEST
I am trying to benchmark spark in a hadoop cluster. I need to design a sample spark job to test the CPU utilization, RAM usage, Input throughput, Output throughput and Duration of execution in the cluster. I need to test the state of the cluster for:
A spark job which uses high CPU
A spark job which uses high RAM
A spark job which uses high Input throughput
A spark job which uses high Output throughput
A spark job which takes a long time
These have to be tested individually, and a combination of these scenarios would also be used. Please help me in understanding the factors of a Spark job which would contribute to CPU utilization, RAM usage, Input throughput, Output throughput and Duration of execution in the cluster, so that I can design spark jobs that could be used for testing. Thanks Shalish.
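As a starting point, two rough sketches of the extremes, written as you would type them in spark-shell where sc already exists (sizes and iteration counts are illustrative and need tuning per cluster): CPU load mostly comes from expensive per-record computation in narrow map-side code, while RAM and shuffle load come from wide operations such as groupByKey over many distinct keys.

  // CPU-heavy: cheap input, expensive per-record math, tiny output
  val cpuHeavy = sc.parallelize(1L to 10000000L, 400)
    .map { x =>
      var acc = x.toDouble
      var i = 0
      while (i < 10000) { acc = math.sqrt(acc * acc + i); i += 1 }
      acc
    }
    .count()

  // Shuffle/RAM-heavy: many keys and a wide dependency, so large intermediate state
  val shuffleHeavy = sc.parallelize(1L to 10000000L, 400)
    .map(x => (x % 1000000, x))
    .groupByKey()
    .mapValues(_.size)
    .count()

Input and output throughput are easiest to stress separately with plain textFile reads and saveAsTextFile writes over large datasets, and duration by scaling any of the above.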
Re: Cannot run unit test.
When I run `sbt test-only SparkTest` or `sbt test-only SparkTest1` on its own, the tests pass. But when I run `sbt test` so that both SparkTest and SparkTest1 run, they fail. If I merge all the cases into one file, the tests pass. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Cannot-run-unit-test-tp14459p14506.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
[spark upgrade] Error communicating with MapOutputTracker when running test cases in latest spark
I use https://github.com/apache/spark/blob/master/streaming/src/test/scala/org/apache/spark/streaming/TestSuiteBase.scala to help me with testing. In Spark 0.9.1 my tests depending on TestSuiteBase worked fine. As soon as I switched to the latest version (1.0.1), all tests fail. My sbt import is: "org.apache.spark" %% "spark-core" % "1.1.0-SNAPSHOT" % "provided" One exception I get is: Error communicating with MapOutputTracker org.apache.spark.SparkException: Error communicating with MapOutputTracker How can I fix this? I found a thread on this error but it was not very helpful: http://mail-archives.apache.org/mod_mbox/spark-dev/201405.mbox/%3ctencent_6b37d69c54f76819509a5...@qq.com%3E -Adrian
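One common cause of this kind of failure is mixing Spark versions, for example a 0.9-era copy of TestSuiteBase against a 1.x SNAPSHOT core. A hedged build.sbt sketch that pins everything to one released version, assuming the streaming tests artifact is published for that version (otherwise TestSuiteBase has to be copied into your own test sources):

  // build.sbt (fragment); the version number is illustrative
  libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-core"      % "1.0.1" % "provided",
    "org.apache.spark" %% "spark-streaming" % "1.0.1" % "provided",
    // TestSuiteBase lives in the spark-streaming test-jar
    "org.apache.spark" %% "spark-streaming" % "1.0.1" % "test" classifier "tests"
  )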
Re: What about implementing various hypothesis test for LogisticRegression in MLlib
Thanks for the reference! Many tests are not designed for big data: http://magazine.amstat.org/blog/2010/09/01/statrevolution/ . So we need to understand which tests are proper. Feel free to create a JIRA and let's move our discussion there. -Xiangrui On Fri, Aug 22, 2014 at 8:44 PM, guxiaobo1982 guxiaobo1...@qq.com wrote: Hi Xiangrui, You can refer to An Introduction to Statistical Learning with Applications in R, there are many stander hypothesis test to do regarding to linear regression and logistic regression, they should be implement in the fist order, then we will list some other testes, which are also important when using logistic regression to build score cards. Xiaobo Gu -- Original -- From: Xiangrui Meng;men...@gmail.com; Send time: Wednesday, Aug 20, 2014 2:18 PM To: guxiaobo1...@qq.com; Cc: user@spark.apache.orguser@spark.apache.org; Subject: Re: What about implementing various hypothesis test for LogisticRegression in MLlib We implemented chi-squared tests in v1.1: https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/stat/Statistics.scala#L166 and we will add more after v1.1. Feedback on which tests should come first would be greatly appreciated. -Xiangrui On Tue, Aug 19, 2014 at 9:50 PM, guxiaobo1982 guxiaobo1...@qq.com wrote: Hi, From the documentation I think only the model fitting part is implement, what about the various hypothesis test and performance indexes used to evaluate the model fit? Regards, Xiaobo Gu - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: What about implementing various hypothesis test for LogisticRegression in MLlib
Hi Xiangrui, You can refer to "An Introduction to Statistical Learning with Applications in R"; there are many standard hypothesis tests to perform for linear regression and logistic regression, and they should be implemented first. After that we can list some other tests which are also important when using logistic regression to build score cards. Xiaobo Gu -- Original -- From: Xiangrui Meng; men...@gmail.com; Send time: Wednesday, Aug 20, 2014 2:18 PM To: guxiaobo1...@qq.com; Cc: user@spark.apache.org; Subject: Re: What about implementing various hypothesis test for LogisticRegression in MLlib We implemented chi-squared tests in v1.1: https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/stat/Statistics.scala#L166 and we will add more after v1.1. Feedback on which tests should come first would be greatly appreciated. -Xiangrui On Tue, Aug 19, 2014 at 9:50 PM, guxiaobo1982 guxiaobo1...@qq.com wrote: Hi, From the documentation I think only the model fitting part is implemented; what about the various hypothesis tests and performance indexes used to evaluate the model fit? Regards, Xiaobo Gu - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: What about implementing various hypothesis test for Logistic Regression in MLlib
We implemented chi-squared tests in v1.1: https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/stat/Statistics.scala#L166 and we will add more after v1.1. Feedback on which tests should come first would be greatly appreciated. -Xiangrui On Tue, Aug 19, 2014 at 9:50 PM, guxiaobo1982 guxiaobo1...@qq.com wrote: Hi, From the documentation I think only the model fitting part is implement, what about the various hypothesis test and performance indexes used to evaluate the model fit? Regards, Xiaobo Gu - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
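For reference, a small sketch of calling the chi-squared tests that the linked Statistics.scala exposes, assuming a SparkContext sc is available (the data is made up):

  import org.apache.spark.mllib.linalg.Vectors
  import org.apache.spark.mllib.regression.LabeledPoint
  import org.apache.spark.mllib.stat.Statistics

  // Goodness-of-fit test on a single vector of observed counts
  val observed = Vectors.dense(10.0, 20.0, 30.0, 40.0)
  println(Statistics.chiSqTest(observed))   // statistic, degrees of freedom, p-value

  // Independence test of each (categorical) feature against the label
  val labeled = sc.parallelize(Seq(
    LabeledPoint(1.0, Vectors.dense(1.0, 0.0)),
    LabeledPoint(0.0, Vectors.dense(0.0, 1.0)),
    LabeledPoint(1.0, Vectors.dense(1.0, 1.0))
  ))
  Statistics.chiSqTest(labeled).foreach(println)   // one result per feature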
What about implementing various hypothesis test for Logistic Regression in MLlib
Hi, From the documentation I think only the model fitting part is implemented; what about the various hypothesis tests and performance indexes used to evaluate the model fit? Regards, Xiaobo Gu
Re: Unit Test for Spark Streaming
Hi TD, I tried some different setup on maven these days, and now I can at least get something when running mvn test. However, it seems like scalatest cannot find the test cases specified in the test suite. Here is the output I get: http://apache-spark-user-list.1001560.n3.nabble.com/file/n11825/Screen_Shot_2014-08-08_at_5.png Could you please give me some details on how you setup the ScalaTest on your machine? I believe there must be some other setup issue on my machine but I cannot figure out why... And here are the plugins and dependencies related to scalatest in my pom.xml:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <version>2.7</version>
  <configuration>
    <skipTests>true</skipTests>
  </configuration>
</plugin>
<plugin>
  <groupId>org.scalatest</groupId>
  <artifactId>scalatest-maven-plugin</artifactId>
  <version>1.0</version>
  <configuration>
    <reportsDirectory>${project.build.directory}/surefire-reports</reportsDirectory>
    <junitxml>.</junitxml>
    <filereports>${project.build.directory}/SparkTestSuite.txt</filereports>
    <tagsToInclude>ATag</tagsToInclude>
    <systemProperties>
      <java.awt.headless>true</java.awt.headless>
      <spark.test.home>${session.executionRootDirectory}</spark.test.home>
      <spark.testing>1</spark.testing>
    </systemProperties>
  </configuration>
  <executions>
    <execution>
      <id>test</id>
      <goals>
        <goal>test</goal>
      </goals>
    </execution>
  </executions>
</plugin>

<dependency>
  <groupId>junit</groupId>
  <artifactId>junit</artifactId>
  <version>4.8.1</version>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>org.scalatest</groupId>
  <artifactId>scalatest_2.10</artifactId>
  <version>2.2.1</version>
  <scope>test</scope>
</dependency>

Thank you very much! Best Regards, Jiajia -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Unit-Test-for-Spark-Streaming-tp11394p11825.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: Unit Test for Spark Streaming
Thank you TD, I have worked around that problem and now the test compiles. However, I don't actually see that test running: when I do mvn test, it just says BUILD SUCCESS, without any TEST section on stdout. Are we supposed to use mvn test to run the test? Are there any other methods that can be used to run this test? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Unit-Test-for-Spark-Streaming-tp11394p11570.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: Unit Test for Spark Streaming
Does it not show the name of the testsuite on stdout, showing that it has passed? Can you try writing a small test unit-test, in the same way as your kafka unit test, and with print statements on stdout ... to see whether it works? I believe it is some configuration issue in maven, which is hard for me to guess. TD On Wed, Aug 6, 2014 at 12:53 PM, JiajiaJing jj.jing0...@gmail.com wrote: Thank you TD, I have worked around that problem and now the test compiles. However, I don't actually see that test running. As when I do mvn test, it just says BUILD SUCCESS, without any TEST section on stdout. Are we suppose to use mvn test to run the test? Are there any other methods can be used to run this test? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Unit-Test-for-Spark-Streaming-tp11394p11570.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: Unit Test for Spark Streaming
That function simply deletes a directory recursively; you can use alternative libraries. See this discussion: http://stackoverflow.com/questions/779519/delete-files-recursively-in-java On Tue, Aug 5, 2014 at 5:02 PM, JiajiaJing jj.jing0...@gmail.com wrote: Hi TD, I encountered a problem when trying to run the KafkaStreamSuite.scala unit test. I added scalatest-maven-plugin to my pom.xml, then ran mvn test, and got the following error message: error: object Utils in package util cannot be accessed in package org.apache.spark.util [INFO] brokerConf.logDirs.foreach { f => Utils.deleteRecursively(new File(f)) } [INFO] ^ I checked that Utils.scala does exist under spark/core/src/main/scala/org/apache/spark/util/, so I have no idea why this access error occurs. Could you please help me with this? Thank you very much! Best Regards, Jiajia -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Unit-Test-for-Spark-Streaming-tp11394p11505.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
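For completeness, a tiny self-contained Scala replacement, one of the alternatives the linked discussion covers:

  import java.io.File

  // Deletes a file, or a directory and everything under it.
  def deleteRecursively(f: File): Unit = {
    if (f.isDirectory) {
      Option(f.listFiles).getOrElse(Array.empty[File]).foreach(deleteRecursively)
    }
    f.delete()
  }

It can then be called the same way as in the snippet quoted above, e.g. brokerConf.logDirs.foreach(d => deleteRecursively(new File(d))).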
Unit Test for Spark Streaming
Hello Spark Users, I have a Spark Streaming program that streams data from Kafka topics and writes the output as Parquet files on HDFS. Now I want to write a unit test for this program to make sure the output data is correct (i.e. not missing any data from Kafka). However, I have no idea how to do this, especially how to mock a Kafka topic. Can someone help me with this? Thank you very much! Best Regards, Jiajia -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Unit-Test-for-Spark-Streaming-tp11394.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: Unit Test for Spark Streaming
Appropriately timed question! Here is the PR that adds a real unit test for Kafka stream in Spark Streaming. Maybe this will help! https://github.com/apache/spark/pull/1751/files On Mon, Aug 4, 2014 at 6:30 PM, JiajiaJing jj.jing0...@gmail.com wrote: Hello Spark Users, I have a spark streaming program that stream data from kafka topics and output as parquet file on HDFS. Now I want to write a unit test for this program to make sure the output data is correct (i.e not missing any data from kafka). However, I have no idea about how to do this, especially how to mock a kafka topic. Can someone help me with this? Thank you very much! Best Regards, Jiajia -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Unit-Test-for-Spark-Streaming-tp11394.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: Unit Test for Spark Streaming
This helps a lot!! Thank you very much! Jiajia -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Unit-Test-for-Spark-Streaming-tp11394p11396.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
sbt + idea + test
Hi guys, I want to use Elasticsearch and HBase in my Spark project, and I want to create a test. I brought up ES and ZooKeeper, but as soon as I add val htest = new HBaseTestingUtility() to my app I get a strange exception (at compile time, not runtime). https://gist.github.com/b0c1/4a4b3f6350816090c3b5 Any idea? -- Skype: boci13, Hangout: boci.b...@gmail.com
Re: Run spark unit test on Windows 7
It sounds really strange... I guess it is a bug, critical bug and must be fixed... at least some flag must be add (unable.hadoop) I found the next workaround : 1) download compiled winutils.exe from http://social.msdn.microsoft.com/Forums/windowsazure/en-US/28a57efb-082b-424b-8d9e-731b1fe135de/please-read-if-experiencing-job-failures?forum=hdinsight 2) put this file into d:\winutil\bin 3) add in my test: System.setProperty(hadoop.home.dir, d:\\winutil\\) after that test runs Thank you, Konstantin Kudryavtsev On Wed, Jul 2, 2014 at 10:24 PM, Denny Lee denny.g@gmail.com wrote: You don't actually need it per se - its just that some of the Spark libraries are referencing Hadoop libraries even if they ultimately do not call them. When I was doing some early builds of Spark on Windows, I admittedly had Hadoop on Windows running as well and had not run into this particular issue. On Wed, Jul 2, 2014 at 12:04 PM, Kostiantyn Kudriavtsev kudryavtsev.konstan...@gmail.com wrote: No, I don't why do I need to have HDP installed? I don't use Hadoop at all and I'd like to read data from local filesystem On Jul 2, 2014, at 9:10 PM, Denny Lee denny.g@gmail.com wrote: By any chance do you have HDP 2.1 installed? you may need to install the utils and update the env variables per http://stackoverflow.com/questions/18630019/running-apache-hadoop-2-1-0-on-windows On Jul 2, 2014, at 10:20 AM, Konstantin Kudryavtsev kudryavtsev.konstan...@gmail.com wrote: Hi Andrew, it's windows 7 and I doesn't set up any env variables here The full stack trace: 14/07/02 19:59:31 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 14/07/02 19:59:31 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries. 
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318) at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333) at org.apache.hadoop.util.Shell.clinit(Shell.java:326) at org.apache.hadoop.util.StringUtils.clinit(StringUtils.java:76) at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93) at org.apache.hadoop.security.Groups.init(Groups.java:77) at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:240) at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:255) at org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:283) at org.apache.spark.deploy.SparkHadoopUtil.init(SparkHadoopUtil.scala:36) at org.apache.spark.deploy.SparkHadoopUtil$.init(SparkHadoopUtil.scala:109) at org.apache.spark.deploy.SparkHadoopUtil$.clinit(SparkHadoopUtil.scala) at org.apache.spark.SparkContext.init(SparkContext.scala:228) at org.apache.spark.SparkContext.init(SparkContext.scala:97) at my.example.EtlTest.testETL(IxtoolsDailyAggTest.scala:13) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at junit.framework.TestCase.runTest(TestCase.java:168) at junit.framework.TestCase.runBare(TestCase.java:134) at junit.framework.TestResult$1.protect(TestResult.java:110) at junit.framework.TestResult.runProtected(TestResult.java:128) at junit.framework.TestResult.run(TestResult.java:113) at junit.framework.TestCase.run(TestCase.java:124) at junit.framework.TestSuite.runTest(TestSuite.java:232) at junit.framework.TestSuite.run(TestSuite.java:227) at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:81) at org.junit.runner.JUnitCore.run(JUnitCore.java:130) at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74) at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:211) at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:67) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120) Thank you, Konstantin Kudryavtsev On Wed, Jul 2, 2014 at 8:15 PM, Andrew Or and...@databricks.com wrote: Hi Konstatin, We use hadoop as a library in a few places in Spark. I wonder why the path includes null though. Could you provide the full stack trace? Andrew 2014-07-02 9:38 GMT-07:00 Konstantin Kudryavtsev kudryavtsev.konstan...@gmail.com: Hi all, I'm trying to run some transformation on *Spark*, it works
Re: Run spark unit test on Windows 7
Hi Konstantin, Could you please create a jira item at: https://issues.apache.org/jira/browse/SPARK/ so this issue can be tracked? Thanks, Denny On July 2, 2014 at 11:45:24 PM, Konstantin Kudryavtsev (kudryavtsev.konstan...@gmail.com) wrote: It sounds really strange... I guess it is a bug, critical bug and must be fixed... at least some flag must be add (unable.hadoop) I found the next workaround : 1) download compiled winutils.exe from http://social.msdn.microsoft.com/Forums/windowsazure/en-US/28a57efb-082b-424b-8d9e-731b1fe135de/please-read-if-experiencing-job-failures?forum=hdinsight 2) put this file into d:\winutil\bin 3) add in my test: System.setProperty(hadoop.home.dir, d:\\winutil\\) after that test runs Thank you, Konstantin Kudryavtsev On Wed, Jul 2, 2014 at 10:24 PM, Denny Lee denny.g@gmail.com wrote: You don't actually need it per se - its just that some of the Spark libraries are referencing Hadoop libraries even if they ultimately do not call them. When I was doing some early builds of Spark on Windows, I admittedly had Hadoop on Windows running as well and had not run into this particular issue. On Wed, Jul 2, 2014 at 12:04 PM, Kostiantyn Kudriavtsev kudryavtsev.konstan...@gmail.com wrote: No, I don’t why do I need to have HDP installed? I don’t use Hadoop at all and I’d like to read data from local filesystem On Jul 2, 2014, at 9:10 PM, Denny Lee denny.g@gmail.com wrote: By any chance do you have HDP 2.1 installed? you may need to install the utils and update the env variables per http://stackoverflow.com/questions/18630019/running-apache-hadoop-2-1-0-on-windows On Jul 2, 2014, at 10:20 AM, Konstantin Kudryavtsev kudryavtsev.konstan...@gmail.com wrote: Hi Andrew, it's windows 7 and I doesn't set up any env variables here The full stack trace: 14/07/02 19:59:31 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 14/07/02 19:59:31 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries. 
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318) at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333) at org.apache.hadoop.util.Shell.clinit(Shell.java:326) at org.apache.hadoop.util.StringUtils.clinit(StringUtils.java:76) at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93) at org.apache.hadoop.security.Groups.init(Groups.java:77) at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:240) at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:255) at org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:283) at org.apache.spark.deploy.SparkHadoopUtil.init(SparkHadoopUtil.scala:36) at org.apache.spark.deploy.SparkHadoopUtil$.init(SparkHadoopUtil.scala:109) at org.apache.spark.deploy.SparkHadoopUtil$.clinit(SparkHadoopUtil.scala) at org.apache.spark.SparkContext.init(SparkContext.scala:228) at org.apache.spark.SparkContext.init(SparkContext.scala:97) at my.example.EtlTest.testETL(IxtoolsDailyAggTest.scala:13) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at junit.framework.TestCase.runTest(TestCase.java:168) at junit.framework.TestCase.runBare(TestCase.java:134) at junit.framework.TestResult$1.protect(TestResult.java:110) at junit.framework.TestResult.runProtected(TestResult.java:128) at junit.framework.TestResult.run(TestResult.java:113) at junit.framework.TestCase.run(TestCase.java:124) at junit.framework.TestSuite.runTest(TestSuite.java:232) at junit.framework.TestSuite.run(TestSuite.java:227) at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:81) at org.junit.runner.JUnitCore.run(JUnitCore.java:130) at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74) at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:211) at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:67) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120) Thank you, Konstantin Kudryavtsev On Wed, Jul 2, 2014 at 8:15 PM, Andrew Or and...@databricks.com wrote: Hi Konstatin, We use hadoop as a library in a few places in Spark. I wonder why the path includes null though. Could you provide the full stack trace
Re: Run spark unit test on Windows 7
Hi Denny, just created https://issues.apache.org/jira/browse/SPARK-2356 On Jul 3, 2014, at 7:06 PM, Denny Lee denny.g@gmail.com wrote: Hi Konstantin, Could you please create a jira item at: https://issues.apache.org/jira/browse/SPARK/ so this issue can be tracked? Thanks, Denny On July 2, 2014 at 11:45:24 PM, Konstantin Kudryavtsev (kudryavtsev.konstan...@gmail.com) wrote: It sounds really strange... I guess it is a bug, critical bug and must be fixed... at least some flag must be add (unable.hadoop) I found the next workaround : 1) download compiled winutils.exe from http://social.msdn.microsoft.com/Forums/windowsazure/en-US/28a57efb-082b-424b-8d9e-731b1fe135de/please-read-if-experiencing-job-failures?forum=hdinsight 2) put this file into d:\winutil\bin 3) add in my test: System.setProperty(hadoop.home.dir, d:\\winutil\\) after that test runs Thank you, Konstantin Kudryavtsev On Wed, Jul 2, 2014 at 10:24 PM, Denny Lee denny.g@gmail.com wrote: You don't actually need it per se - its just that some of the Spark libraries are referencing Hadoop libraries even if they ultimately do not call them. When I was doing some early builds of Spark on Windows, I admittedly had Hadoop on Windows running as well and had not run into this particular issue. On Wed, Jul 2, 2014 at 12:04 PM, Kostiantyn Kudriavtsev kudryavtsev.konstan...@gmail.com wrote: No, I don’t why do I need to have HDP installed? I don’t use Hadoop at all and I’d like to read data from local filesystem On Jul 2, 2014, at 9:10 PM, Denny Lee denny.g@gmail.com wrote: By any chance do you have HDP 2.1 installed? you may need to install the utils and update the env variables per http://stackoverflow.com/questions/18630019/running-apache-hadoop-2-1-0-on-windows On Jul 2, 2014, at 10:20 AM, Konstantin Kudryavtsev kudryavtsev.konstan...@gmail.com wrote: Hi Andrew, it's windows 7 and I doesn't set up any env variables here The full stack trace: 14/07/02 19:59:31 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 14/07/02 19:59:31 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries. 
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318) at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333) at org.apache.hadoop.util.Shell.clinit(Shell.java:326) at org.apache.hadoop.util.StringUtils.clinit(StringUtils.java:76) at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93) at org.apache.hadoop.security.Groups.init(Groups.java:77) at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:240) at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:255) at org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:283) at org.apache.spark.deploy.SparkHadoopUtil.init(SparkHadoopUtil.scala:36) at org.apache.spark.deploy.SparkHadoopUtil$.init(SparkHadoopUtil.scala:109) at org.apache.spark.deploy.SparkHadoopUtil$.clinit(SparkHadoopUtil.scala) at org.apache.spark.SparkContext.init(SparkContext.scala:228) at org.apache.spark.SparkContext.init(SparkContext.scala:97) at my.example.EtlTest.testETL(IxtoolsDailyAggTest.scala:13) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at junit.framework.TestCase.runTest(TestCase.java:168) at junit.framework.TestCase.runBare(TestCase.java:134) at junit.framework.TestResult$1.protect(TestResult.java:110) at junit.framework.TestResult.runProtected(TestResult.java:128) at junit.framework.TestResult.run(TestResult.java:113) at junit.framework.TestCase.run(TestCase.java:124) at junit.framework.TestSuite.runTest(TestSuite.java:232) at junit.framework.TestSuite.run(TestSuite.java:227) at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:81) at org.junit.runner.JUnitCore.run(JUnitCore.java:130) at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74) at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:211) at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:67) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120
Re: Run spark unit test on Windows 7
Thanks! will take a look at this later today. HTH! On Jul 3, 2014, at 11:09 AM, Kostiantyn Kudriavtsev kudryavtsev.konstan...@gmail.com wrote: Hi Denny, just created https://issues.apache.org/jira/browse/SPARK-2356 On Jul 3, 2014, at 7:06 PM, Denny Lee denny.g@gmail.com wrote: Hi Konstantin, Could you please create a jira item at: https://issues.apache.org/jira/browse/SPARK/ so this issue can be tracked? Thanks, Denny On July 2, 2014 at 11:45:24 PM, Konstantin Kudryavtsev (kudryavtsev.konstan...@gmail.com) wrote: It sounds really strange... I guess it is a bug, critical bug and must be fixed... at least some flag must be add (unable.hadoop) I found the next workaround : 1) download compiled winutils.exe from http://social.msdn.microsoft.com/Forums/windowsazure/en-US/28a57efb-082b-424b-8d9e-731b1fe135de/please-read-if-experiencing-job-failures?forum=hdinsight 2) put this file into d:\winutil\bin 3) add in my test: System.setProperty(hadoop.home.dir, d:\\winutil\\) after that test runs Thank you, Konstantin Kudryavtsev On Wed, Jul 2, 2014 at 10:24 PM, Denny Lee denny.g@gmail.com wrote: You don't actually need it per se - its just that some of the Spark libraries are referencing Hadoop libraries even if they ultimately do not call them. When I was doing some early builds of Spark on Windows, I admittedly had Hadoop on Windows running as well and had not run into this particular issue. On Wed, Jul 2, 2014 at 12:04 PM, Kostiantyn Kudriavtsev kudryavtsev.konstan...@gmail.com wrote: No, I don’t why do I need to have HDP installed? I don’t use Hadoop at all and I’d like to read data from local filesystem On Jul 2, 2014, at 9:10 PM, Denny Lee denny.g@gmail.com wrote: By any chance do you have HDP 2.1 installed? you may need to install the utils and update the env variables per http://stackoverflow.com/questions/18630019/running-apache-hadoop-2-1-0-on-windows On Jul 2, 2014, at 10:20 AM, Konstantin Kudryavtsev kudryavtsev.konstan...@gmail.com wrote: Hi Andrew, it's windows 7 and I doesn't set up any env variables here The full stack trace: 14/07/02 19:59:31 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 14/07/02 19:59:31 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries. 
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318) at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333) at org.apache.hadoop.util.Shell.clinit(Shell.java:326) at org.apache.hadoop.util.StringUtils.clinit(StringUtils.java:76) at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93) at org.apache.hadoop.security.Groups.init(Groups.java:77) at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:240) at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:255) at org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:283) at org.apache.spark.deploy.SparkHadoopUtil.init(SparkHadoopUtil.scala:36) at org.apache.spark.deploy.SparkHadoopUtil$.init(SparkHadoopUtil.scala:109) at org.apache.spark.deploy.SparkHadoopUtil$.clinit(SparkHadoopUtil.scala) at org.apache.spark.SparkContext.init(SparkContext.scala:228) at org.apache.spark.SparkContext.init(SparkContext.scala:97) at my.example.EtlTest.testETL(IxtoolsDailyAggTest.scala:13) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at junit.framework.TestCase.runTest(TestCase.java:168) at junit.framework.TestCase.runBare(TestCase.java:134) at junit.framework.TestResult$1.protect(TestResult.java:110) at junit.framework.TestResult.runProtected(TestResult.java:128) at junit.framework.TestResult.run(TestResult.java:113) at junit.framework.TestCase.run(TestCase.java:124) at junit.framework.TestSuite.runTest(TestSuite.java:232) at junit.framework.TestSuite.run(TestSuite.java:227) at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:81) at org.junit.runner.JUnitCore.run(JUnitCore.java:130) at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74) at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:211) at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:67) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke
Run spark unit test on Windows 7
Hi all, I'm trying to run some transformations on *Spark*; it works fine on a cluster (YARN, Linux machines). However, when I try to run it on a local machine (*Windows 7*) under a unit test, I get errors: java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries. at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318) at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333) at org.apache.hadoop.util.Shell.<clinit>(Shell.java:326) at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:76) at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93) My code is the following:
@Test def testETL() = {
  val conf = new SparkConf()
  val sc = new SparkContext("local", "test", conf)
  try {
    val etl = new IxtoolsDailyAgg() // empty constructor
    val data = sc.parallelize(List(in1, in2, in3))
    etl.etl(data) // rdd transformation, no access to SparkContext or Hadoop
    Assert.assertTrue(true)
  } finally {
    if (sc != null) sc.stop()
  }
}
Why is it trying to access Hadoop at all, and how can I fix it? Thank you in advance. Thank you, Konstantin Kudryavtsev
Re: Run spark unit test on Windows 7
Hi Konstatin, We use hadoop as a library in a few places in Spark. I wonder why the path includes null though. Could you provide the full stack trace? Andrew 2014-07-02 9:38 GMT-07:00 Konstantin Kudryavtsev kudryavtsev.konstan...@gmail.com: Hi all, I'm trying to run some transformation on *Spark*, it works fine on cluster (YARN, linux machines). However, when I'm trying to run it on local machine (*Windows 7*) under unit test, I got errors: java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries. at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318) at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333) at org.apache.hadoop.util.Shell.clinit(Shell.java:326) at org.apache.hadoop.util.StringUtils.clinit(StringUtils.java:76) at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93) My code is following: @Test def testETL() = { val conf = new SparkConf() val sc = new SparkContext(local, test, conf) try { val etl = new IxtoolsDailyAgg() // empty constructor val data = sc.parallelize(List(in1, in2, in3)) etl.etl(data) // rdd transformation, no access to SparkContext or Hadoop Assert.assertTrue(true) } finally { if(sc != null) sc.stop() } } Why is it trying to access hadoop at all? and how can I fix it? Thank you in advance Thank you, Konstantin Kudryavtsev
Re: Run spark unit test on Windows 7
Hi Andrew, it's windows 7 and I doesn't set up any env variables here The full stack trace: 14/07/02 19:59:31 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 14/07/02 19:59:31 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries. at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318) at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333) at org.apache.hadoop.util.Shell.clinit(Shell.java:326) at org.apache.hadoop.util.StringUtils.clinit(StringUtils.java:76) at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93) at org.apache.hadoop.security.Groups.init(Groups.java:77) at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:240) at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:255) at org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:283) at org.apache.spark.deploy.SparkHadoopUtil.init(SparkHadoopUtil.scala:36) at org.apache.spark.deploy.SparkHadoopUtil$.init(SparkHadoopUtil.scala:109) at org.apache.spark.deploy.SparkHadoopUtil$.clinit(SparkHadoopUtil.scala) at org.apache.spark.SparkContext.init(SparkContext.scala:228) at org.apache.spark.SparkContext.init(SparkContext.scala:97) at my.example.EtlTest.testETL(IxtoolsDailyAggTest.scala:13) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at junit.framework.TestCase.runTest(TestCase.java:168) at junit.framework.TestCase.runBare(TestCase.java:134) at junit.framework.TestResult$1.protect(TestResult.java:110) at junit.framework.TestResult.runProtected(TestResult.java:128) at junit.framework.TestResult.run(TestResult.java:113) at junit.framework.TestCase.run(TestCase.java:124) at junit.framework.TestSuite.runTest(TestSuite.java:232) at junit.framework.TestSuite.run(TestSuite.java:227) at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:81) at org.junit.runner.JUnitCore.run(JUnitCore.java:130) at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74) at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:211) at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:67) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120) Thank you, Konstantin Kudryavtsev On Wed, Jul 2, 2014 at 8:15 PM, Andrew Or and...@databricks.com wrote: Hi Konstatin, We use hadoop as a library in a few places in Spark. I wonder why the path includes null though. Could you provide the full stack trace? Andrew 2014-07-02 9:38 GMT-07:00 Konstantin Kudryavtsev kudryavtsev.konstan...@gmail.com: Hi all, I'm trying to run some transformation on *Spark*, it works fine on cluster (YARN, linux machines). 
However, when I'm trying to run it on local machine (*Windows 7*) under unit test, I got errors: java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries. at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318) at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333) at org.apache.hadoop.util.Shell.clinit(Shell.java:326) at org.apache.hadoop.util.StringUtils.clinit(StringUtils.java:76) at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93) My code is following: @Test def testETL() = { val conf = new SparkConf() val sc = new SparkContext(local, test, conf) try { val etl = new IxtoolsDailyAgg() // empty constructor val data = sc.parallelize(List(in1, in2, in3)) etl.etl(data) // rdd transformation, no access to SparkContext or Hadoop Assert.assertTrue(true) } finally { if(sc != null) sc.stop() } } Why is it trying to access hadoop at all? and how can I fix it? Thank you in advance Thank you, Konstantin Kudryavtsev
Re: Run spark unit test on Windows 7
By any chance do you have HDP 2.1 installed? you may need to install the utils and update the env variables per http://stackoverflow.com/questions/18630019/running-apache-hadoop-2-1-0-on-windows On Jul 2, 2014, at 10:20 AM, Konstantin Kudryavtsev kudryavtsev.konstan...@gmail.com wrote: Hi Andrew, it's windows 7 and I doesn't set up any env variables here The full stack trace: 14/07/02 19:59:31 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 14/07/02 19:59:31 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries. at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318) at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333) at org.apache.hadoop.util.Shell.clinit(Shell.java:326) at org.apache.hadoop.util.StringUtils.clinit(StringUtils.java:76) at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93) at org.apache.hadoop.security.Groups.init(Groups.java:77) at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:240) at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:255) at org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:283) at org.apache.spark.deploy.SparkHadoopUtil.init(SparkHadoopUtil.scala:36) at org.apache.spark.deploy.SparkHadoopUtil$.init(SparkHadoopUtil.scala:109) at org.apache.spark.deploy.SparkHadoopUtil$.clinit(SparkHadoopUtil.scala) at org.apache.spark.SparkContext.init(SparkContext.scala:228) at org.apache.spark.SparkContext.init(SparkContext.scala:97) at my.example.EtlTest.testETL(IxtoolsDailyAggTest.scala:13) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at junit.framework.TestCase.runTest(TestCase.java:168) at junit.framework.TestCase.runBare(TestCase.java:134) at junit.framework.TestResult$1.protect(TestResult.java:110) at junit.framework.TestResult.runProtected(TestResult.java:128) at junit.framework.TestResult.run(TestResult.java:113) at junit.framework.TestCase.run(TestCase.java:124) at junit.framework.TestSuite.runTest(TestSuite.java:232) at junit.framework.TestSuite.run(TestSuite.java:227) at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:81) at org.junit.runner.JUnitCore.run(JUnitCore.java:130) at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74) at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:211) at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:67) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120) Thank you, Konstantin Kudryavtsev On Wed, Jul 2, 2014 at 8:15 PM, Andrew Or and...@databricks.com wrote: Hi Konstatin, We use hadoop as a library in a few places in Spark. I wonder why the path includes null though. Could you provide the full stack trace? 
Andrew 2014-07-02 9:38 GMT-07:00 Konstantin Kudryavtsev kudryavtsev.konstan...@gmail.com: Hi all, I'm trying to run some transformation on Spark, it works fine on cluster (YARN, linux machines). However, when I'm trying to run it on local machine (Windows 7) under unit test, I got errors: java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries. at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318) at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333) at org.apache.hadoop.util.Shell.clinit(Shell.java:326) at org.apache.hadoop.util.StringUtils.clinit(StringUtils.java:76) at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93) My code is following: @Test def testETL() = { val conf = new SparkConf() val sc = new SparkContext(local, test, conf) try { val etl = new IxtoolsDailyAgg() // empty constructor val data = sc.parallelize(List(in1, in2, in3)) etl.etl(data) // rdd transformation, no access to SparkContext or Hadoop Assert.assertTrue(true
Re: Run spark unit test on Windows 7
No, I don’t why do I need to have HDP installed? I don’t use Hadoop at all and I’d like to read data from local filesystem On Jul 2, 2014, at 9:10 PM, Denny Lee denny.g@gmail.com wrote: By any chance do you have HDP 2.1 installed? you may need to install the utils and update the env variables per http://stackoverflow.com/questions/18630019/running-apache-hadoop-2-1-0-on-windows On Jul 2, 2014, at 10:20 AM, Konstantin Kudryavtsev kudryavtsev.konstan...@gmail.com wrote: Hi Andrew, it's windows 7 and I doesn't set up any env variables here The full stack trace: 14/07/02 19:59:31 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 14/07/02 19:59:31 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries. at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318) at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333) at org.apache.hadoop.util.Shell.clinit(Shell.java:326) at org.apache.hadoop.util.StringUtils.clinit(StringUtils.java:76) at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93) at org.apache.hadoop.security.Groups.init(Groups.java:77) at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:240) at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:255) at org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:283) at org.apache.spark.deploy.SparkHadoopUtil.init(SparkHadoopUtil.scala:36) at org.apache.spark.deploy.SparkHadoopUtil$.init(SparkHadoopUtil.scala:109) at org.apache.spark.deploy.SparkHadoopUtil$.clinit(SparkHadoopUtil.scala) at org.apache.spark.SparkContext.init(SparkContext.scala:228) at org.apache.spark.SparkContext.init(SparkContext.scala:97) at my.example.EtlTest.testETL(IxtoolsDailyAggTest.scala:13) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at junit.framework.TestCase.runTest(TestCase.java:168) at junit.framework.TestCase.runBare(TestCase.java:134) at junit.framework.TestResult$1.protect(TestResult.java:110) at junit.framework.TestResult.runProtected(TestResult.java:128) at junit.framework.TestResult.run(TestResult.java:113) at junit.framework.TestCase.run(TestCase.java:124) at junit.framework.TestSuite.runTest(TestSuite.java:232) at junit.framework.TestSuite.run(TestSuite.java:227) at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:81) at org.junit.runner.JUnitCore.run(JUnitCore.java:130) at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74) at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:211) at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:67) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120) Thank you, Konstantin Kudryavtsev On Wed, Jul 2, 2014 at 8:15 PM, Andrew Or and...@databricks.com 
wrote: Hi Konstatin, We use hadoop as a library in a few places in Spark. I wonder why the path includes null though. Could you provide the full stack trace? Andrew 2014-07-02 9:38 GMT-07:00 Konstantin Kudryavtsev kudryavtsev.konstan...@gmail.com: Hi all, I'm trying to run some transformation on Spark, it works fine on cluster (YARN, linux machines). However, when I'm trying to run it on local machine (Windows 7) under unit test, I got errors: java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries. at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318) at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333) at org.apache.hadoop.util.Shell.clinit(Shell.java:326) at org.apache.hadoop.util.StringUtils.clinit(StringUtils.java:76) at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93) My code is following: @Test def testETL() = { val conf = new SparkConf() val sc = new SparkContext(local, test, conf) try { val etl = new IxtoolsDailyAgg() // empty constructor val data
Re: Run spark unit test on Windows 7
You don't actually need it per se - its just that some of the Spark libraries are referencing Hadoop libraries even if they ultimately do not call them. When I was doing some early builds of Spark on Windows, I admittedly had Hadoop on Windows running as well and had not run into this particular issue. On Wed, Jul 2, 2014 at 12:04 PM, Kostiantyn Kudriavtsev kudryavtsev.konstan...@gmail.com wrote: No, I don’t why do I need to have HDP installed? I don’t use Hadoop at all and I’d like to read data from local filesystem On Jul 2, 2014, at 9:10 PM, Denny Lee denny.g@gmail.com wrote: By any chance do you have HDP 2.1 installed? you may need to install the utils and update the env variables per http://stackoverflow.com/questions/18630019/running-apache-hadoop-2-1-0-on-windows On Jul 2, 2014, at 10:20 AM, Konstantin Kudryavtsev kudryavtsev.konstan...@gmail.com wrote: Hi Andrew, it's windows 7 and I doesn't set up any env variables here The full stack trace: 14/07/02 19:59:31 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 14/07/02 19:59:31 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries. at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318) at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333) at org.apache.hadoop.util.Shell.clinit(Shell.java:326) at org.apache.hadoop.util.StringUtils.clinit(StringUtils.java:76) at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93) at org.apache.hadoop.security.Groups.init(Groups.java:77) at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:240) at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:255) at org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:283) at org.apache.spark.deploy.SparkHadoopUtil.init(SparkHadoopUtil.scala:36) at org.apache.spark.deploy.SparkHadoopUtil$.init(SparkHadoopUtil.scala:109) at org.apache.spark.deploy.SparkHadoopUtil$.clinit(SparkHadoopUtil.scala) at org.apache.spark.SparkContext.init(SparkContext.scala:228) at org.apache.spark.SparkContext.init(SparkContext.scala:97) at my.example.EtlTest.testETL(IxtoolsDailyAggTest.scala:13) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at junit.framework.TestCase.runTest(TestCase.java:168) at junit.framework.TestCase.runBare(TestCase.java:134) at junit.framework.TestResult$1.protect(TestResult.java:110) at junit.framework.TestResult.runProtected(TestResult.java:128) at junit.framework.TestResult.run(TestResult.java:113) at junit.framework.TestCase.run(TestCase.java:124) at junit.framework.TestSuite.runTest(TestSuite.java:232) at junit.framework.TestSuite.run(TestSuite.java:227) at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:81) at org.junit.runner.JUnitCore.run(JUnitCore.java:130) at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74) at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:211) at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:67) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120) Thank you, Konstantin Kudryavtsev On Wed, Jul 2, 2014 at 8:15 PM, Andrew Or and...@databricks.com wrote: Hi Konstatin, We use hadoop as a library in a few places in Spark. I wonder why the path includes null though. Could you provide the full stack trace? Andrew 2014-07-02 9:38 GMT-07:00 Konstantin Kudryavtsev kudryavtsev.konstan...@gmail.com: Hi all, I'm trying to run some transformation on *Spark*, it works fine on cluster (YARN, linux machines). However, when I'm trying to run it on local machine (*Windows 7*) under unit test, I got errors: java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries. at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318) at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333) at org.apache.hadoop.util.Shell.clinit(Shell.java:326) at org.apache.hadoop.util.StringUtils.clinit(StringUtils.java:76) at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93) My code is following: @Test
InputStreamsSuite test failed
Hello ,I am a new guy on scala spark, yestday i compile spark from 1.0.0 source code and run test,there is and testcase failed: For example run command in shell : sbt/sbt testOnly org.apache.spark.streaming.InputStreamsSuite the testcase: test(socket input stream) would failed , test result like : ===test result=== [info] InputStreamsSuite: [info] - socket input stream *** FAILED *** (20 seconds, 547 milliseconds) [info] 0 did not equal 5 (InputStreamsSuite.scala:96) [info] org.scalatest.exceptions.TestFailedException: [info] at org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:318) [info] at org.apache.spark.streaming.InputStreamsSuite.newAssertionFailedException(InputStreamsSuite.scala:44) [info] at org.scalatest.Assertions$class.assert(Assertions.scala:401) [info] at org.apache.spark.streaming.InputStreamsSuite.assert(InputStreamsSuite.scala:44) [info] at org.apache.spark.streaming.InputStreamsSuite$$anonfun$1.apply$mcV$sp(InputStreamsSuite.scala:96) [info] at org.apache.spark.streaming.InputStreamsSuite$$anonfun$1.apply(InputStreamsSuite.scala:46) [info] at org.apache.spark.streaming.InputStreamsSuite$$anonfun$1.apply(InputStreamsSuite.scala:46) [info] at org.scalatest.FunSuite$$anon$1.apply(FunSuite.scala:1265) [info] at org.scalatest.Suite$class.withFixture(Suite.scala:1974) [info] at org.apache.spark.streaming.InputStreamsSuite.withFixture(InputStreamsSuite.scala:44) [info] at org.scalatest.FunSuite$class.invokeWithFixture$1(FunSuite.scala:1262) [info] at org.scalatest.FunSuite$$anonfun$runTest$1.apply(FunSuite.scala:1271) [info] at org.scalatest.FunSuite$$anonfun$runTest$1.apply(FunSuite.scala:1271) [info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:198) [info] at org.scalatest.FunSuite$class.runTest(FunSuite.scala:1271) [info] at org.apache.spark.streaming.InputStreamsSuite.org$scalatest$BeforeAndAfter$$super$runTest(InputStreamsSuite.scala:44) [info] at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:171) [info] at org.apache.spark.streaming.InputStreamsSuite.runTest(InputStreamsSuite.scala:44) [info] at org.scalatest.FunSuite$$anonfun$runTests$1.apply(FunSuite.scala:1304) [info] at org.scalatest.FunSuite$$anonfun$runTests$1.apply(FunSuite.scala:1304) [info] at org.scalatest.SuperEngine$$anonfun$org$scalatest$SuperEngine$$runTestsInBranch$1.apply(Engine.scala:260) [info] at org.scalatest.SuperEngine$$anonfun$org$scalatest$SuperEngine$$runTestsInBranch$1.apply(Engine.scala:249) [info] at scala.collection.immutable.List.foreach(List.scala:318) [info] at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:249) [info] at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:326) [info] at org.scalatest.FunSuite$class.runTests(FunSuite.scala:1304) [info] at org.apache.spark.streaming.InputStreamsSuite.runTests(InputStreamsSuite.scala:44) [info] at org.scalatest.Suite$class.run(Suite.scala:2303) [info] at org.apache.spark.streaming.InputStreamsSuite.org$scalatest$FunSuite$$super$run(InputStreamsSuite.scala:44) [info] at org.scalatest.FunSuite$$anonfun$run$1.apply(FunSuite.scala:1310) [info] at org.scalatest.FunSuite$$anonfun$run$1.apply(FunSuite.scala:1310) [info] at org.scalatest.SuperEngine.runImpl(Engine.scala:362) [info] at org.scalatest.FunSuite$class.run(FunSuite.scala:1310) [info] at org.apache.spark.streaming.InputStreamsSuite.org$scalatest$BeforeAndAfter$$super$run(InputStreamsSuite.scala:44) [info] at org.scalatest.BeforeAndAfter$class.run(BeforeAndAfter.scala:208) [info] at 
org.apache.spark.streaming.InputStreamsSuite.run(InputStreamsSuite.scala:44) [info] at org.scalatest.tools.ScalaTestFramework$ScalaTestRunner.run(ScalaTestFramework.scala:214) [info] at sbt.RunnerWrapper$1.runRunner2(FrameworkWrapper.java:223) [info] at sbt.RunnerWrapper$1.execute(FrameworkWrapper.java:236) [info] at sbt.ForkMain$Run$2.call(ForkMain.java:294) [info] at sbt.ForkMain$Run$2.call(ForkMain.java:284) [info] at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) [info] at java.util.concurrent.FutureTask.run(FutureTask.java:138) [info] at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) [info] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) [info] at java.lang.Thread.run(Thread.java:662) I checked the source code; it seems this assert fails (line 96): assert(output.size === expectedOutput.size). So I printed the output, outputBuffer, and expectedOutput values by adding the following around line 85:
// Verify whether data received was as expected
println()
println("output.size = " + output.size)
output.foreach(x => println("[" + x.mkString(",") + "]"))
println("outputBuffer.size = " + outputBuffer.size)
Re: Unit test failure: Address already in use
Hi, Could your problem come from the fact that you run your tests in parallel? If you run Spark in local mode, you cannot have concurrent Spark instances running; this means that your tests instantiating a SparkContext cannot be run in parallel. The easiest fix is to tell sbt not to run tests in parallel. This can be done by adding the following line to your build.sbt: parallelExecution in Test := false Cheers, Anselme 2014-06-17 23:01 GMT+02:00 SK skrishna...@gmail.com: Hi, I have 3 unit tests (independent of each other) in the /src/test/scala folder. When I run each of them individually using sbt test-only test, all 3 pass. But when I run them all using sbt test, they fail with the warning below. I am wondering if the binding exception causes the job to fail, thereby causing the test failure. If so, what can I do to address this binding exception? I am running these tests locally on a standalone machine (i.e. SparkContext("local", "test")). 14/06/17 13:42:48 WARN component.AbstractLifeCycle: FAILED org.eclipse.jetty.server.Server@3487b78d: java.net.BindException: Address already in use java.net.BindException: Address already in use at sun.nio.ch.Net.bind0(Native Method) at sun.nio.ch.Net.bind(Net.java:174) at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:139) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:77) thanks -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Unit-test-failure-Address-already-in-use-tp7771.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
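A minimal build.sbt sketch of the setting Anselme suggests; the project name, Scala version, and dependency versions are illustrative rather than taken from the thread:

name := "spark-unit-tests"

scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.0.0",
  "org.scalatest" %% "scalatest" % "2.1.5" % "test"
)

// Run test classes one at a time so only one local SparkContext exists per JVM.
parallelExecution in Test := false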
RE: Unit test failure: Address already in use
Disabling parallelExecution has worked for me. Other alternatives I’ve tried that also work include: 1. Using a lock – this will let tests execute in parallel except for those using a SparkContext. If you have a large number of tests that could execute in parallel, this can shave off some time. object TestingSparkContext { val lock = new Lock() } // before you instantiate your local SparkContext TestingSparkContext.lock.acquire() // after you call sc.stop() TestingSparkContext.lock.release() 2. Sharing a local SparkContext between tests. - This is nice because your tests will run faster. Start-up and shutdown is time consuming (can add a few seconds per test). - The downside is that your tests are using the same SparkContext so they are less independent of each other. I haven’t seen issues with this yet but there are likely some things that might crop up. Best, Todd From: Anselme Vignon [mailto:anselme.vig...@flaminem.com] Sent: Wednesday, June 18, 2014 12:33 AM To: user@spark.apache.org Subject: Re: Unit test failure: Address already in use Hi, Could your problem come from the fact that you run your tests in parallel ? If you are spark in local mode, you cannot have concurrent spark instances running. this means that your tests instantiating sparkContext cannot be run in parallel. The easiest fix is to tell sbt to not run parallel tests. This can be done by adding the following line in your build.sbt: parallelExecution in Test := false Cheers, Anselme 2014-06-17 23:01 GMT+02:00 SK skrishna...@gmail.commailto:skrishna...@gmail.com: Hi, I have 3 unit tests (independent of each other) in the /src/test/scala folder. When I run each of them individually using: sbt test-only test, all the 3 pass the test. But when I run them all using sbt test, then they fail with the warning below. I am wondering if the binding exception results in failure to run the job, thereby causing the failure. If so, what can I do to address this binding exception? I am running these tests locally on a standalone machine (i.e. SparkContext(local, test)). 14/06/17 13:42:48 WARN component.AbstractLifeCycle: FAILED org.eclipse.jetty.server.Server@3487b78dmailto:org.eclipse.jetty.server.Server@3487b78d: java.net.BindException: Address already in use java.net.BindException: Address already in use at sun.nio.ch.Net.bind0(Native Method) at sun.nio.ch.Net.bind(Net.java:174) at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:139) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:77) thanks -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Unit-test-failure-Address-already-in-use-tp7771.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
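A sketch of the lock-based approach from point 1, fleshed out into a runnable test; the suite, its data, and the use of ScalaTest's FunSuite are illustrative assumptions, not code from the thread:

import scala.concurrent.Lock
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._ // pair-RDD operations such as reduceByKey (pre-1.3 Spark)
import org.scalatest.FunSuite

// Shared across every test class in the JVM, so at most one test at a time
// owns a local SparkContext even when sbt runs suites in parallel.
object TestingSparkContext {
  val lock = new Lock()
}

class WordCountSuite extends FunSuite {
  test("word count with a local SparkContext") {
    TestingSparkContext.lock.acquire()
    val sc = new SparkContext("local", "word count test")
    try {
      val counts = sc.parallelize(Seq("a", "b", "a")).map((_, 1)).reduceByKey(_ + _).collectAsMap()
      assert(counts("a") === 2)
    } finally {
      sc.stop()
      TestingSparkContext.lock.release() // let the next test create its own context
    }
  }
}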
Re: Unit test failure: Address already in use
In my unit tests I have a base class that all my tests extend that has a setup and teardown method that they inherit. They look something like this: var spark: SparkContext = _ @Before def setUp() { Thread.sleep(100L) //this seems to give spark more time to reset from the previous test's tearDown spark = new SparkContext(local, test spark) } @After def tearDown() { spark.stop spark = null //not sure why this helps but it does! System.clearProperty(spark.master.port) } It's been since last fall (i.e. version 0.8.x) since I've examined this code and so I can't vouch that it is still accurate/necessary - but it still works for me. On 06/18/2014 12:59 PM, Lisonbee, Todd wrote: Disabling parallelExecution has worked for me. Other alternatives I’ve tried that also work include: 1. Using a lock – this will let tests execute in parallel except for those using a SparkContext. If you have a large number of tests that could execute in parallel, this can shave off some time. object TestingSparkContext { val lock = new Lock() } // before you instantiate your local SparkContext TestingSparkContext.lock.acquire() // after you call sc.stop() TestingSparkContext.lock.release() 2. Sharing a local SparkContext between tests. - This is nice because your tests will run faster. Start-up and shutdown is time consuming (can add a few seconds per test). - The downside is that your tests are using the same SparkContext so they are less independent of each other. I haven’t seen issues with this yet but there are likely some things that might crop up. Best, Todd *From:*Anselme Vignon [mailto:anselme.vig...@flaminem.com] *Sent:* Wednesday, June 18, 2014 12:33 AM *To:* user@spark.apache.org *Subject:* Re: Unit test failure: Address already in use Hi, Could your problem come from the fact that you run your tests in parallel ? If you are spark in local mode, you cannot have concurrent spark instances running. this means that your tests instantiating sparkContext cannot be run in parallel. The easiest fix is to tell sbt to not run parallel tests. This can be done by adding the following line in your build.sbt: parallelExecution in Test := false Cheers, Anselme 2014-06-17 23:01 GMT+02:00 SK skrishna...@gmail.com mailto:skrishna...@gmail.com: Hi, I have 3 unit tests (independent of each other) in the /src/test/scala folder. When I run each of them individually using: sbt test-only test, all the 3 pass the test. But when I run them all using sbt test, then they fail with the warning below. I am wondering if the binding exception results in failure to run the job, thereby causing the failure. If so, what can I do to address this binding exception? I am running these tests locally on a standalone machine (i.e. SparkContext(local, test)). 14/06/17 13:42:48 WARN component.AbstractLifeCycle: FAILED org.eclipse.jetty.server.Server@3487b78d mailto:org.eclipse.jetty.server.Server@3487b78d: java.net.BindException: Address already in use java.net.BindException: Address already in use at sun.nio.ch.Net.bind0(Native Method) at sun.nio.ch.Net.bind(Net.java:174) at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:139) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:77) thanks -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Unit-test-failure-Address-already-in-use-tp7771.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
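A cleaned-up sketch of that base class, with the string literals that lost their quotes in the archive restored; the class name is illustrative, and the sleep and clearProperty calls are the author's own 0.8.x-era workarounds rather than required API:

import org.apache.spark.SparkContext
import org.junit.{After, Before}

abstract class SharedSparkTestBase {
  var spark: SparkContext = _

  @Before def setUp() {
    Thread.sleep(100L) // give Spark time to settle after the previous test's tearDown
    spark = new SparkContext("local", "test spark")
  }

  @After def tearDown() {
    spark.stop()
    spark = null
    System.clearProperty("spark.master.port") // forget the port the previous context bound
  }
}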
Unit test failure: Address already in use
Hi, I have 3 unit tests (independent of each other) in the /src/test/scala folder. When I run each of them individually using sbt test-only test, all 3 pass. But when I run them all using sbt test, they fail with the warning below. I am wondering if the binding exception causes the job to fail, thereby causing the test failure. If so, what can I do to address this binding exception? I am running these tests locally on a standalone machine (i.e. SparkContext("local", "test")). 14/06/17 13:42:48 WARN component.AbstractLifeCycle: FAILED org.eclipse.jetty.server.Server@3487b78d: java.net.BindException: Address already in use java.net.BindException: Address already in use at sun.nio.ch.Net.bind0(Native Method) at sun.nio.ch.Net.bind(Net.java:174) at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:139) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:77) thanks -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Unit-test-failure-Address-already-in-use-tp7771.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
printing in unit test
Hi, My unit test is failing (the output is not matching the expected output). I would like to print out the value of the output, but rdd.foreach(r => println(r)) does not work from the unit test. How can I print or write out the output to a file/screen? thanks. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/printing-in-unit-test-tp7611.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
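One way to get at the values, sketched below under the assumption that the RDD is small enough to collect: bring it back to the driver and print or assert on it there, rather than printing inside foreach, whose output runs in Spark's task threads and may be swallowed or interleaved by the test runner. The object name, data, and output path are illustrative:

import org.apache.spark.SparkContext

object PrintRddOutput {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local", "print rdd output")
    try {
      val rdd = sc.parallelize(Seq("a", "b", "c"))
      rdd.collect().foreach(println) // printed on the driver, visible in the test output
      // Or persist it for later inspection:
      // rdd.saveAsTextFile("/tmp/unit-test-output")
    } finally {
      sc.stop()
    }
  }
}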
unit test
Hi! I have two questions: 1. I want to test my application. My app will write the result to Elasticsearch (stage 1) with saveAsHadoopFile. How can I write a test case for it? Do I need to pull up a MiniDFSCluster, or is there another way? My planned application flow: Kafka => Spark Streaming (enrich) -> Elasticsearch => Spark (map/reduce) -> HBase 2. Can Spark read data from Elasticsearch? What is the preferred way to do this? b0c1 -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/unit-test-tp7155.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
Test
Re: Re: java.io.FileNotFoundException: /test/spark-0.9.1/work/app-20140505053550-0000/2/stdout (No such file or directory)
I have just resolved the problem by running the master and worker daemons individually on the machines where they live. If I launch them via the script sbin/start-all.sh, the problem always exists. From: Francis.Hu [mailto:francis...@reachjunction.com] Sent: Tuesday, May 06, 2014 10:31 To: user@spark.apache.org Subject: Re: Re: java.io.FileNotFoundException: /test/spark-0.9.1/work/app-20140505053550-/2/stdout (No such file or directory) I looked into the log again; all exceptions are about FileNotFoundException. In the web UI there is no more info I can check except for the basic description of the job. I attached the log file, could you help take a look? Thanks. Francis.Hu From: Tathagata Das [mailto:tathagata.das1...@gmail.com] Sent: Tuesday, May 06, 2014 10:16 To: user@spark.apache.org Subject: Re: Re: java.io.FileNotFoundException: /test/spark-0.9.1/work/app-20140505053550-/2/stdout (No such file or directory) Can you check the Spark worker logs on that machine, either from the web UI or directly? They should be under /test/spark-XXX/logs/ See if that has any error. If there is no permission issue, I am not sure why stdout and stderr are not being generated. TD On Mon, May 5, 2014 at 7:13 PM, Francis.Hu francis...@reachjunction.com wrote: The file does not exist in fact, and there is no permission issue. francis@ubuntu-4:/test/spark-0.9.1$ ll work/app-20140505053550-/ total 24 drwxrwxr-x 6 francis francis 4096 May 5 05:35 ./ drwxrwxr-x 11 francis francis 4096 May 5 06:18 ../ drwxrwxr-x 2 francis francis 4096 May 5 05:35 2/ drwxrwxr-x 2 francis francis 4096 May 5 05:35 4/ drwxrwxr-x 2 francis francis 4096 May 5 05:35 7/ drwxrwxr-x 2 francis francis 4096 May 5 05:35 9/ Francis From: Tathagata Das [mailto:tathagata.das1...@gmail.com] Sent: Tuesday, May 06, 2014 3:45 To: user@spark.apache.org Subject: Re: java.io.FileNotFoundException: /test/spark-0.9.1/work/app-20140505053550-/2/stdout (No such file or directory) Do those files actually exist? Those stdout/stderr should have the output of Spark's executors running in the workers, and it's weird that they don't exist. Could it be a permission issue - maybe the directories/files are not being generated because they cannot be? TD On Mon, May 5, 2014 at 3:06 AM, Francis.Hu francis...@reachjunction.com wrote: Hi, All We run a Spark cluster with three workers. I created a Spark Streaming application, then ran the Spark project using the command below: shell sbt run spark://192.168.219.129:7077 tcp://192.168.20.118:5556 foo We looked at the web UI of the workers; jobs failed without any error or info, but a FileNotFoundException occurred in the workers' log file as below. Is this a known issue in Spark? 
-in workers' logs/spark-francis-org.apache.spark.deploy.worker.Worker-1-ubuntu-4.out 14/05/05 02:39:39 WARN AbstractHttpConnection: /logPage/?appId=app-20140505053550-executorId=2logType=stdout java.io.FileNotFoundException: /test/spark-0.9.1/work/app-20140505053550-/2/stdout (No such file or directory) at java.io.FileInputStream.open(Native Method) at java.io.FileInputStream.init(FileInputStream.java:138) at org.apache.spark.util.Utils$.offsetBytes(Utils.scala:687) at org.apache.spark.deploy.worker.ui.WorkerWebUI.logPage(WorkerWebUI.scala:119) at org.apache.spark.deploy.worker.ui.WorkerWebUI$$anonfun$6.apply(WorkerWebUI.scala:52) at org.apache.spark.deploy.worker.ui.WorkerWebUI$$anonfun$6.apply(WorkerWebUI.scala:52) at org.apache.spark.ui.JettyUtils$$anon$1.handle(JettyUtils.scala:61) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1040) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:976) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.eclipse.jetty.server.Server.handle(Server.java:363) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:483) at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:920) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:982) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:635) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82) at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:628) at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52
Re: Test
Got it. But it doesn't indicate that everyone can receive this test. The mailing list has been unstable recently. Sent from my iPhone5s On May 10, 2014, at 13:31, Matei Zaharia matei.zaha...@gmail.com wrote: This message has no content.
Re: Test
I didn't get the original message, only the reply. Ruh-roh. On Sun, May 11, 2014 at 8:09 AM, Azuryy azury...@gmail.com wrote: Got it. But it doesn't indicate that everyone can receive this test. The mailing list has been unstable recently. Sent from my iPhone5s On May 10, 2014, at 13:31, Matei Zaharia matei.zaha...@gmail.com wrote: *This message has no content.*
java.io.FileNotFoundException: /test/spark-0.9.1/work/app-20140505053550-0000/2/stdout (No such file or directory)
Hi,All We run a spark cluster with three workers. created a spark streaming application, then run the spark project using below command: shell sbt run spark://192.168.219.129:7077 tcp://192.168.20.118:5556 foo we looked at the webui of workers, jobs failed without any error or info, but FileNotFoundException occurred in workers' log file as below: Is this an existent issue of spark? -in workers' logs/spark-francis-org.apache.spark.deploy.worker.Worker-1-ubuntu-4.out- --- 14/05/05 02:39:39 WARN AbstractHttpConnection: /logPage/?appId=app-20140505053550-executorId=2logType=stdout java.io.FileNotFoundException: /test/spark-0.9.1/work/app-20140505053550-/2/stdout (No such file or directory) at java.io.FileInputStream.open(Native Method) at java.io.FileInputStream.init(FileInputStream.java:138) at org.apache.spark.util.Utils$.offsetBytes(Utils.scala:687) at org.apache.spark.deploy.worker.ui.WorkerWebUI.logPage(WorkerWebUI.scala:119) at org.apache.spark.deploy.worker.ui.WorkerWebUI$$anonfun$6.apply(WorkerWebUI.s cala:52) at org.apache.spark.deploy.worker.ui.WorkerWebUI$$anonfun$6.apply(WorkerWebUI.s cala:52) at org.apache.spark.ui.JettyUtils$$anon$1.handle(JettyUtils.scala:61) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java :1040) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java: 976) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135 ) at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:1 16) at org.eclipse.jetty.server.Server.handle(Server.java:363) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpCo nnection.java:483) at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpC onnection.java:920) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplet e(AbstractHttpConnection.java:982) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:635) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java :82) at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint. 
java:628) at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.j ava:52) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java: 608) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:5 43) at java.lang.Thread.run(Thread.java:722) 14/05/05 02:39:41 WARN AbstractHttpConnection: /logPage/?appId=app-20140505053550-executorId=9logType=stderr java.io.FileNotFoundException: /test/spark-0.9.1/work/app-20140505053550-/9/stderr (No such file or directory) at java.io.FileInputStream.open(Native Method) at java.io.FileInputStream.init(FileInputStream.java:138) at org.apache.spark.util.Utils$.offsetBytes(Utils.scala:687) at org.apache.spark.deploy.worker.ui.WorkerWebUI.logPage(WorkerWebUI.scala:119) at org.apache.spark.deploy.worker.ui.WorkerWebUI$$anonfun$6.apply(WorkerWebUI.s cala:52) at org.apache.spark.deploy.worker.ui.WorkerWebUI$$anonfun$6.apply(WorkerWebUI.s cala:52) at org.apache.spark.ui.JettyUtils$$anon$1.handle(JettyUtils.scala:61) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java :1040) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java: 976) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135 ) at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:1 16) at org.eclipse.jetty.server.Server.handle(Server.java:363) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpCo nnection.java:483) at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpC onnection.java:920) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplet e(AbstractHttpConnection.java:982) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:635) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java :82) at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint. java:628) at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.j ava
Re: java.io.FileNotFoundException: /test/spark-0.9.1/work/app-20140505053550-0000/2/stdout (No such file or directory)
The file does not exist in fact and no permission issue. francis@ubuntu-4:/test/spark-0.9.1$ ll work/app-20140505053550-/ total 24 drwxrwxr-x 6 francis francis 4096 May 5 05:35 ./ drwxrwxr-x 11 francis francis 4096 May 5 06:18 ../ drwxrwxr-x 2 francis francis 4096 May 5 05:35 2/ drwxrwxr-x 2 francis francis 4096 May 5 05:35 4/ drwxrwxr-x 2 francis francis 4096 May 5 05:35 7/ drwxrwxr-x 2 francis francis 4096 May 5 05:35 9/ Francis 发件人: Tathagata Das [mailto:tathagata.das1...@gmail.com] 发送时间: Tuesday, May 06, 2014 3:45 收件人: user@spark.apache.org 主题: Re: java.io.FileNotFoundException: /test/spark-0.9.1/work/app-20140505053550-/2/stdout (No such file or directory) Do those file actually exist? Those stdout/stderr should have the output of the spark's executors running in the workers, and its weird that they dont exist. Could be permission issue - maybe the directories/files are not being generated because it cannot? TD On Mon, May 5, 2014 at 3:06 AM, Francis.Hu francis...@reachjunction.com wrote: Hi,All We run a spark cluster with three workers. created a spark streaming application, then run the spark project using below command: shell sbt run spark://192.168.219.129:7077 tcp://192.168.20.118:5556 foo we looked at the webui of workers, jobs failed without any error or info, but FileNotFoundException occurred in workers' log file as below: Is this an existent issue of spark? -in workers' logs/spark-francis-org.apache.spark.deploy.worker.Worker-1-ubuntu-4.out 14/05/05 02:39:39 WARN AbstractHttpConnection: /logPage/?appId=app-20140505053550-executorId=2logType=stdout java.io.FileNotFoundException: /test/spark-0.9.1/work/app-20140505053550-/2/stdout (No such file or directory) at java.io.FileInputStream.open(Native Method) at java.io.FileInputStream.init(FileInputStream.java:138) at org.apache.spark.util.Utils$.offsetBytes(Utils.scala:687) at org.apache.spark.deploy.worker.ui.WorkerWebUI.logPage(WorkerWebUI.scala:119) at org.apache.spark.deploy.worker.ui.WorkerWebUI$$anonfun$6.apply(WorkerWebUI.scala:52) at org.apache.spark.deploy.worker.ui.WorkerWebUI$$anonfun$6.apply(WorkerWebUI.scala:52) at org.apache.spark.ui.JettyUtils$$anon$1.handle(JettyUtils.scala:61) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1040) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:976) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.eclipse.jetty.server.Server.handle(Server.java:363) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:483) at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:920) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:982) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:635) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82) at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:628) at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) at 
java.lang.Thread.run(Thread.java:722) 14/05/05 02:39:41 WARN AbstractHttpConnection: /logPage/?appId=app-20140505053550-executorId=9logType=stderr java.io.FileNotFoundException: /test/spark-0.9.1/work/app-20140505053550-/9/stderr (No such file or directory) at java.io.FileInputStream.open(Native Method) at java.io.FileInputStream.init(FileInputStream.java:138) at org.apache.spark.util.Utils$.offsetBytes(Utils.scala:687) at org.apache.spark.deploy.worker.ui.WorkerWebUI.logPage(WorkerWebUI.scala:119) at org.apache.spark.deploy.worker.ui.WorkerWebUI$$anonfun$6.apply(WorkerWebUI.scala:52) at org.apache.spark.deploy.worker.ui.WorkerWebUI$$anonfun$6.apply(WorkerWebUI.scala:52) at org.apache.spark.ui.JettyUtils$$anon$1.handle(JettyUtils.scala:61) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1040) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:976