Re: Meets "java.lang.IllegalArgumentException" when test spark ml pipe with DecisionTreeClassifier

2015-09-06 Thread Terry Hole
Sean

Do you know how to tell the decision tree that the "label" column is binary, or
how to set attributes on the dataframe so that it carries the number of classes?

Thanks!
- Terry

On Sun, Sep 6, 2015 at 5:23 PM, Sean Owen <so...@cloudera.com> wrote:

> (Sean)
> The error suggests that the type is not a binary or nominal attribute
> though. I think that's the missing step. A double-valued column need
> not be one of these attribute types.
>
> On Sun, Sep 6, 2015 at 10:14 AM, Terry Hole <hujie.ea...@gmail.com> wrote:
> > Hi, Owen,
> >
> > The dataframe "training" is from an RDD of a case class,
> > RDD[LabeledDocument], where the case class is defined as:
> > case class LabeledDocument(id: Long, text: String, label: Double)
> >
> > So there is already a default "label" column with "double" type.
> >
> > I already tried to set the label column for the decision tree like this:
> > val lr = new
> > DecisionTreeClassifier().setMaxDepth(5).setMaxBins(32).setImpurity("gini").setLabelCol("label")
> > It raised the same error.
> >
> > I also tried changing the "label" column to "int" type, which reported an
> > error like the following stack; I have no idea how to make this work.
> >
> > java.lang.IllegalArgumentException: requirement failed: Column label
> must be
> > of type DoubleType but was actually IntegerType.
> > at scala.Predef$.require(Predef.scala:233)
> > at
> >
> org.apache.spark.ml.util.SchemaUtils$.checkColumnType(SchemaUtils.scala:37)
> > at
> >
> org.apache.spark.ml.PredictorParams$class.validateAndTransformSchema(Predictor.scala:53)
> > at
> >
> org.apache.spark.ml.Predictor.validateAndTransformSchema(Predictor.scala:71)
> > at
> > org.apache.spark.ml.Predictor.transformSchema(Predictor.scala:116)
> > at
> >
> org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:162)
> > at
> >
> org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:162)
> > at
> >
> scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
> > at
> >
> scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
> > at
> > scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:108)
> > at
> org.apache.spark.ml.Pipeline.transformSchema(Pipeline.scala:162)
> > at
> > org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:59)
> > at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:116)
> > at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:51)
> > at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:56)
> > at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:58)
> > at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:60)
> > at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:62)
> > at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:64)
> > at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:66)
> > at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:68)
> > at $iwC$$iwC$$iwC$$iwC.<init>(<console>:70)
> > at $iwC$$iwC$$iwC.<init>(<console>:72)
> > at $iwC$$iwC.<init>(<console>:74)
> > at $iwC.<init>(<console>:76)
> > at <init>(<console>:78)
> > at .<init>(<console>:82)
> > at .<clinit>(<console>)
> > at .<init>(<console>:7)
> > at .<clinit>(<console>)
> > at $print(<console>)
> >
> > Thanks!
> > - Terry
> >
> > On Sun, Sep 6, 2015 at 4:53 PM, Sean Owen <so...@cloudera.com> wrote:
> >>
> >> I think somewhere along the line you've not specified your label
> >> column -- it's defaulting to "label" and it does not recognize it, or
> >> at least not as a binary or nominal attribute.
> >>
> >> On Sun, Sep 6, 2015 at 5:47 AM, Terry Hole <hujie.ea...@gmail.com>
> wrote:
> >> > Hi, Experts,
> >> >
> >> > I followed the Spark ML pipeline guide to test DecisionTreeClassifier in
> >> > the spark shell with Spark 1.4.1, but I always get an error like the
> >> > following. Do you have any idea how to fix this?
> >> >
> >> > The error stack:
> >> > java.lang.IllegalArgumentException: DecisionTreeClassifier was given
> >> > input
> >> > with invalid label column label, without the number of classes
> >> > specified.
> >> > See StringIndexer.
> >> > at
> >> >
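A minimal sketch of the StringIndexer route that the error message points to, reusing the LabeledDocument "training" DataFrame from this thread. The stage and column names such as "indexedLabel", "words" and "features" are only illustrative, and depending on the Spark version the label column may first need to be cast to a string column:

import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.DecisionTreeClassifier
import org.apache.spark.ml.feature.{HashingTF, StringIndexer, Tokenizer}

// Index the double-valued "label" column so that the output column carries
// nominal metadata (the number of classes), which DecisionTreeClassifier needs.
val labelIndexer = new StringIndexer()
  .setInputCol("label")
  .setOutputCol("indexedLabel")

val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")

val dt = new DecisionTreeClassifier()
  .setLabelCol("indexedLabel")
  .setFeaturesCol("features")
  .setMaxDepth(5)
  .setMaxBins(32)
  .setImpurity("gini")

val pipeline = new Pipeline().setStages(Array(labelIndexer, tokenizer, hashingTF, dt))
val model = pipeline.fit(training)  // "training" is the DataFrame from the thread

The point is only that the tree reads its label from the indexed column, which carries the class-count metadata, rather than from the raw double column.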

`sbt core/test` hangs on LogUrlsStandaloneSuite?

2015-09-02 Thread Jacek Laskowski
Hi,

Am I doing something off base when executing tests for the core module using
sbt as follows?

[spark]> core/test
...
[info] KryoSerializerAutoResetDisabledSuite:
[info] - sort-shuffle with bypassMergeSort (SPARK-7873) (53 milliseconds)
[info] - calling deserialize() after deserializeStream() (2 milliseconds)
[info] LogUrlsStandaloneSuite:
...AND HANGS HERE :(

The reason I'm asking is that the command hangs after printing the
above [info]. I'm on Mac OS and Java 8 with the latest sources -
fc48307797912dc1d53893dce741ddda8630957b.

While taking a thread dump I can see quite a few WAITINGs and
TIMED_WAITING "at sun.misc.Unsafe.park(Native Method)"

Pozdrawiam,
Jacek

--
Jacek Laskowski | http://blog.japila.pl | http://blog.jaceklaskowski.pl
Follow me at https://twitter.com/jaceklaskowski
Upvote at http://stackoverflow.com/users/1305344/jacek-laskowski

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Writing test case for spark streaming checkpointing

2015-08-27 Thread Cody Koeninger
Kill the job in the middle of a batch, look at the worker logs to see which
offsets were being processed, and verify that the messages for those offsets
are read when you start the job back up.

On Thu, Aug 27, 2015 at 10:14 AM, Hafiz Mujadid hafizmujadi...@gmail.com
wrote:

 Hi!

 I have enabled checkpointing in Spark Streaming with Kafka. I can see that
 Spark Streaming is checkpointing to the specified directory on HDFS. How can
 I test that it works fine and recovers with no data loss?


 Thanks



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/Writing-test-case-for-spark-streaming-checkpointing-tp24475.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org
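
A minimal sketch of a recoverable job for this kind of test, assuming the direct Kafka stream from spark-streaming-kafka (Spark 1.x); the broker, topic, and checkpoint directory below are placeholders:

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils}

val checkpointDir = "hdfs:///tmp/streaming-checkpoint"  // placeholder path

def createContext(): StreamingContext = {
  val conf = new SparkConf().setAppName("CheckpointRecoveryTest")
  val ssc = new StreamingContext(conf, Seconds(10))
  ssc.checkpoint(checkpointDir)

  val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")  // placeholder broker
  val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
    ssc, kafkaParams, Set("events"))  // placeholder topic

  // Log each batch's offset ranges so they can be compared with what gets
  // re-read after the job is killed mid-batch and restarted from the checkpoint.
  stream.foreachRDD { rdd =>
    rdd.asInstanceOf[HasOffsetRanges].offsetRanges.foreach(println)
  }
  ssc
}

// On restart, getOrCreate rebuilds the context and resumes from the checkpoint.
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
ssc.start()
ssc.awaitTermination()

Killing the job in the middle of a batch and restarting it should then show the same offset ranges being reprocessed, which is what Cody describes checking in the worker logs.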




Writing test case for spark streaming checkpointing

2015-08-27 Thread Hafiz Mujadid
Hi!

I have enabled checkpointing in Spark Streaming with Kafka. I can see that
Spark Streaming is checkpointing to the specified directory on HDFS. How can
I test that it works fine and recovers with no data loss?


Thanks



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Writing-test-case-for-spark-streaming-checkpointing-tp24475.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: How to unit test HiveContext without OutOfMemoryError (using sbt)

2015-08-26 Thread Mike Trienis
Thanks for your response Yana,

I can increase the MaxPermSize parameter and it will allow me to run the
unit test a few more times before I run out of memory.

However, the primary issue is that running the same unit test multiple times
in the same JVM results in increased memory usage with each run, and I believe
it has something to do with HiveContext not reclaiming memory after it is
finished (or I'm not shutting it down properly).

It could very well be related to sbt; however, that's not clear to me.


On Tue, Aug 25, 2015 at 1:12 PM, Yana Kadiyska yana.kadiy...@gmail.com
wrote:

 The PermGen space error is controlled with MaxPermSize parameter. I run
 with this in my pom, I think copied pretty literally from Spark's own
 tests... I don't know what the sbt equivalent is but you should be able to
 pass it...possibly via SBT_OPTS?


  <plugin>
    <groupId>org.scalatest</groupId>
    <artifactId>scalatest-maven-plugin</artifactId>
    <version>1.0</version>
    <configuration>
      <reportsDirectory>${project.build.directory}/surefire-reports</reportsDirectory>
      <parallel>false</parallel>
      <junitxml>.</junitxml>
      <filereports>SparkTestSuite.txt</filereports>
      <argLine>-Xmx3g -XX:MaxPermSize=256m -XX:ReservedCodeCacheSize=512m</argLine>
      <stderr/>
      <systemProperties>
        <java.awt.headless>true</java.awt.headless>
        <spark.testing>1</spark.testing>
        <spark.ui.enabled>false</spark.ui.enabled>
        <spark.driver.allowMultipleContexts>true</spark.driver.allowMultipleContexts>
      </systemProperties>
    </configuration>
    <executions>
      <execution>
        <id>test</id>
        <goals>
          <goal>test</goal>
        </goals>
      </execution>
    </executions>
  </plugin>
  </plugins>


 On Tue, Aug 25, 2015 at 2:10 PM, Mike Trienis mike.trie...@orcsol.com
 wrote:

 Hello,

 I am using sbt and created a unit test where I create a `HiveContext` and
 execute some query and then return. Each time I run the unit test the JVM
 will increase it's memory usage until I get the error:

 Internal error when running tests: java.lang.OutOfMemoryError: PermGen
 space
 Exception in thread Thread-2 java.io.EOFException

 As a work-around, I can fork a new JVM each time I run the unit test,
 however, it seems like a bad solution as takes a while to run the unit
 test.

 By the way, I tried to importing the TestHiveContext:

- import org.apache.spark.sql.hive.test.TestHiveContext

 However, it suffers from the same memory issue. Has anyone else suffered
 from the same problem? Note that I am running these unit tests on my mac.

 Cheers, Mike.





Re: How to unit test HiveContext without OutOfMemoryError (using sbt)

2015-08-26 Thread Michael Armbrust
I'd suggest setting sbt to fork when running tests.
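
For reference, a minimal build.sbt sketch of that (sbt 0.13-style keys; the memory flags mirror the Maven argLine quoted below and are only examples):

// Fork a separate JVM for tests so PermGen is released when the forked JVM exits.
fork in Test := true

// javaOptions only take effect for forked test JVMs.
javaOptions in Test ++= Seq(
  "-Xmx3g",
  "-XX:MaxPermSize=256m",
  "-XX:ReservedCodeCacheSize=512m")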

On Wed, Aug 26, 2015 at 10:51 AM, Mike Trienis mike.trie...@orcsol.com
wrote:

 Thanks for your response Yana,

 I can increase the MaxPermSize parameter and it will allow me to run the
 unit test a few more times before I run out of memory.

 However, the primary issue is that running the same unit test in the same
 JVM (multiple times) results in increased memory (each run of the unit
 test) and I believe it has something to do with HiveContext not reclaiming
 memory after it is finished (or I'm not shutting it down properly).

 It could very well be related to sbt, however, it's not clear to me.


 On Tue, Aug 25, 2015 at 1:12 PM, Yana Kadiyska yana.kadiy...@gmail.com
 wrote:

 The PermGen space error is controlled with MaxPermSize parameter. I run
 with this in my pom, I think copied pretty literally from Spark's own
 tests... I don't know what the sbt equivalent is but you should be able to
 pass it...possibly via SBT_OPTS?


  <plugin>
    <groupId>org.scalatest</groupId>
    <artifactId>scalatest-maven-plugin</artifactId>
    <version>1.0</version>
    <configuration>
      <reportsDirectory>${project.build.directory}/surefire-reports</reportsDirectory>
      <parallel>false</parallel>
      <junitxml>.</junitxml>
      <filereports>SparkTestSuite.txt</filereports>
      <argLine>-Xmx3g -XX:MaxPermSize=256m -XX:ReservedCodeCacheSize=512m</argLine>
      <stderr/>
      <systemProperties>
        <java.awt.headless>true</java.awt.headless>
        <spark.testing>1</spark.testing>
        <spark.ui.enabled>false</spark.ui.enabled>
        <spark.driver.allowMultipleContexts>true</spark.driver.allowMultipleContexts>
      </systemProperties>
    </configuration>
    <executions>
      <execution>
        <id>test</id>
        <goals>
          <goal>test</goal>
        </goals>
      </execution>
    </executions>
  </plugin>
  </plugins>


 On Tue, Aug 25, 2015 at 2:10 PM, Mike Trienis mike.trie...@orcsol.com
 wrote:

 Hello,

 I am using sbt and created a unit test where I create a `HiveContext`
 and execute some query and then return. Each time I run the unit test the
 JVM will increase it's memory usage until I get the error:

 Internal error when running tests: java.lang.OutOfMemoryError: PermGen
 space
 Exception in thread Thread-2 java.io.EOFException

 As a work-around, I can fork a new JVM each time I run the unit test,
 however, it seems like a bad solution as takes a while to run the unit
 test.

 By the way, I tried to importing the TestHiveContext:

- import org.apache.spark.sql.hive.test.TestHiveContext

 However, it suffers from the same memory issue. Has anyone else suffered
 from the same problem? Note that I am running these unit tests on my mac.

 Cheers, Mike.






How to unit test HiveContext without OutOfMemoryError (using sbt)

2015-08-25 Thread Mike Trienis
Hello,

I am using sbt and created a unit test where I create a `HiveContext`,
execute some query, and then return. Each time I run the unit test the JVM
will increase its memory usage until I get the error:

Internal error when running tests: java.lang.OutOfMemoryError: PermGen space
Exception in thread "Thread-2" java.io.EOFException

As a work-around, I can fork a new JVM each time I run the unit test; however,
it seems like a bad solution as it takes a while to run the unit test.

By the way, I tried importing the TestHiveContext:

   - import org.apache.spark.sql.hive.test.TestHiveContext

However, it suffers from the same memory issue. Has anyone else suffered
from the same problem? Note that I am running these unit tests on my Mac.

Cheers, Mike.
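
For what it's worth, a minimal sketch of a suite that at least stops the SparkContext when it finishes (the suite name and query are made up; as the replies note, forking the test JVM is the more reliable fix for the PermGen growth):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.scalatest.{BeforeAndAfterAll, FunSuite}

class HiveQuerySuite extends FunSuite with BeforeAndAfterAll {
  private var sc: SparkContext = _
  private var hive: HiveContext = _

  override def beforeAll(): Unit = {
    sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("HiveQuerySuite"))
    hive = new HiveContext(sc)
  }

  override def afterAll(): Unit = {
    // Stop the context so repeated runs in the same JVM do not pile up contexts.
    sc.stop()
  }

  test("hive context answers a simple query") {
    // The exact query does not matter; this only exercises HiveContext end to end.
    hive.sql("SHOW TABLES").collect()
  }
}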


Re:RE: Test case for the spark sql catalyst

2015-08-25 Thread Todd


Thanks Chenghao!




At 2015-08-25 13:06:40, Cheng, Hao hao.ch...@intel.com wrote:


Yes, check the source code under:
https://github.com/apache/spark/tree/master/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst

 

From: Todd [mailto:bit1...@163.com]
Sent: Tuesday, August 25, 2015 1:01 PM
To:user@spark.apache.org
Subject: Test case for the spark sql catalyst

 

Hi, are there test cases for the Spark SQL Catalyst, such as testing the rules
for transforming an unresolved query plan?
Thanks!

Re: How to unit test HiveContext without OutOfMemoryError (using sbt)

2015-08-25 Thread Yana Kadiyska
The PermGen space error is controlled with the MaxPermSize parameter. I run
with this in my pom, which I think I copied pretty literally from Spark's own
tests... I don't know what the sbt equivalent is, but you should be able to
pass it... possibly via SBT_OPTS?


 <plugin>
   <groupId>org.scalatest</groupId>
   <artifactId>scalatest-maven-plugin</artifactId>
   <version>1.0</version>
   <configuration>
     <reportsDirectory>${project.build.directory}/surefire-reports</reportsDirectory>
     <parallel>false</parallel>
     <junitxml>.</junitxml>
     <filereports>SparkTestSuite.txt</filereports>
     <argLine>-Xmx3g -XX:MaxPermSize=256m -XX:ReservedCodeCacheSize=512m</argLine>
     <stderr/>
     <systemProperties>
       <java.awt.headless>true</java.awt.headless>
       <spark.testing>1</spark.testing>
       <spark.ui.enabled>false</spark.ui.enabled>
       <spark.driver.allowMultipleContexts>true</spark.driver.allowMultipleContexts>
     </systemProperties>
   </configuration>
   <executions>
     <execution>
       <id>test</id>
       <goals>
         <goal>test</goal>
       </goals>
     </execution>
   </executions>
 </plugin>
 </plugins>


On Tue, Aug 25, 2015 at 2:10 PM, Mike Trienis mike.trie...@orcsol.com
wrote:

 Hello,

 I am using sbt and created a unit test where I create a `HiveContext` and
 execute some query and then return. Each time I run the unit test the JVM
 will increase it's memory usage until I get the error:

 Internal error when running tests: java.lang.OutOfMemoryError: PermGen
 space
 Exception in thread Thread-2 java.io.EOFException

 As a work-around, I can fork a new JVM each time I run the unit test,
 however, it seems like a bad solution as takes a while to run the unit
 test.

 By the way, I tried to importing the TestHiveContext:

- import org.apache.spark.sql.hive.test.TestHiveContext

 However, it suffers from the same memory issue. Has anyone else suffered
 from the same problem? Note that I am running these unit tests on my mac.

 Cheers, Mike.




RE: Test case for the spark sql catalyst

2015-08-24 Thread Cheng, Hao
Yes, check the source code under: 
https://github.com/apache/spark/tree/master/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst

From: Todd [mailto:bit1...@163.com]
Sent: Tuesday, August 25, 2015 1:01 PM
To: user@spark.apache.org
Subject: Test case for the spark sql catalyst

Hi, are there test cases for the Spark SQL Catalyst, such as testing the rules
for transforming an unresolved query plan?
Thanks!
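
For anyone looking for the shape of those tests, here is a rough sketch in the style of Catalyst's own rule suites (for example ConstantFoldingSuite); these are internal, version-specific APIs, so the exact imports and DSL may differ in your Spark version:

import org.apache.spark.sql.catalyst.dsl.expressions._
import org.apache.spark.sql.catalyst.dsl.plans._
import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.catalyst.optimizer.ConstantFolding
import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
import org.apache.spark.sql.catalyst.rules.RuleExecutor

// A tiny RuleExecutor that applies only the rule under test.
object FoldOnly extends RuleExecutor[LogicalPlan] {
  val batches = Batch("ConstantFolding", Once, ConstantFolding) :: Nil
}

val testRelation = LocalRelation('a.int)

// Unoptimized plan: SELECT (1 + 2) + a AS c FROM testRelation
val query = testRelation.select(Literal(1) + Literal(2) + 'a as 'c)

// Run the rule over the analyzed plan; the literal sub-expression should fold to 3.
val optimized = FoldOnly.execute(query.analyze)
println(optimized)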


Test case for the spark sql catalyst

2015-08-24 Thread Todd
Hi, are there test cases for the Spark SQL Catalyst, such as testing the rules
for transforming an unresolved query plan?
Thanks!


Re: HiBench test for hadoop/hive/spark cluster

2015-07-15 Thread Ted Yu
From log file:

15/07/16 11:16:56 INFO mapred.LocalDistributedCacheManager: Creating
symlink: /tmp/hadoop-root/mapred/local/1437016615898/user_agents -
/opt/HiBench-master/user_agents
15/07/16 11:16:56 INFO mapred.LocalDistributedCacheManager: Localized
hdfs://spark-study:9000/HiBench/Aggregation/temp/user_agents as
file:/tmp/hadoop-root/mapred/local/ 1437016615898/user_agents
...
java.io.FileNotFoundException:
file:/tmp/hadoop-root/mapred/local/1437016615898/user_agents (No such file
or directory)
  at java.io.FileInputStream.open(Native Method)

However, FileNotFoundException didn't happen to other localized files, such
as country_codes

FYI

On Wed, Jul 15, 2015 at 8:53 PM, luohui20...@sina.com wrote:

 Hi all

   When I am running HiBench on my Spark/Hadoop/Hive cluster, I
 found there is always a failure in my aggregation test. I suspect this
 problem may be some issue related to my Hive settings? Attached are my
 config file and log file.

   Any idea how to solve this issue?

   My Spark cluster is a one-node standalone cluster:

 hadoop:2.7

 spark:1.3

 hive:1.2.1

   here is the log:

 Prepare aggregation ...
 Exec script: /opt/HiBench-master/workloads/aggregation/prepare/prepare.sh
 Parsing conf: /opt/HiBench-master/conf/00-default-properties.conf
 Parsing conf: /opt/HiBench-master/conf/10-data-scale-profile.conf
 Parsing conf: /opt/HiBench-master/conf/99-user_defined_properties.conf
 Parsing conf:
 /opt/HiBench-master/workloads/aggregation/conf/00-aggregation-default.conf
 Parsing conf:
 /opt/HiBench-master/workloads/aggregation/conf/10-aggregation-userdefine.conf
 start HadoopPrepareAggregation bench
 hdfs rm -r: /usr/lib/hadoop/bin/hadoop --config /usr/lib/hadoop/etc/hadoop
 fs -rm -r -skipTrash hdfs://spark-study:9000/HiBench/Aggregation/Input
 15/07/16 11:16:26 WARN util.NativeCodeLoader: Unable to load native-hadoop
 library for your platform... using builtin-java classes where applicable
 Deleted hdfs://spark-study:9000/HiBench/Aggregation/Input
 Pages:12, USERVISITS:100
 Submit MapReduce Job: /usr/lib/hadoop/bin/hadoop --config
 /usr/lib/hadoop/etc/hadoop jar
 /opt/HiBench-master/src/autogen/target/autogen-4.0-SNAPSHOT-jar-with-dependencies.jar
 HiBench.DataGen -t hive -b hdfs://spark-study:9000/HiBench/Aggregation -n
 Input -m 12 -r 6 -p 12 -v 100 -o sequence
 15/07/16 11:16:34 WARN util.NativeCodeLoader: Unable to load native-hadoop
 library for your platform... using builtin-java classes where applicable
 sfubmzperrrbupqoq
 15/07/16 11:16:38 WARN mapreduce.JobResourceUploader: Hadoop command-line
 option parsing not performed. Implement the Tool interface and execute your
 application with ToolRunner to remedy this.
 15/07/16 11:16:40 INFO HiBench.HtmlCore: WARNING: dict empty!!!
 15/07/16 11:16:42 INFO HiBench.HtmlCore: WARNING: dict empty!!!
 15/07/16 11:16:43 INFO HiBench.HtmlCore: WARNING: dict empty!!!
 15/07/16 11:16:43 INFO HiBench.HtmlCore: WARNING: dict empty!!!
 15/07/16 11:16:44 INFO HiBench.HtmlCore: WARNING: dict empty!!!
 15/07/16 11:16:45 INFO HiBench.HtmlCore: WARNING: dict empty!!!
 15/07/16 11:16:45 INFO HiBench.HtmlCore: WARNING: dict empty!!!
 15/07/16 11:16:46 INFO HiBench.HtmlCore: WARNING: dict empty!!!
 15/07/16 11:16:46 INFO HiBench.HtmlCore: WARNING: dict empty!!!
 15/07/16 11:16:47 INFO HiBench.HtmlCore: WARNING: dict empty!!!
 15/07/16 11:16:47 INFO HiBench.HtmlCore: WARNING: dict empty!!!
 15/07/16 11:16:47 INFO HiBench.HtmlCore: WARNING: dict empty!!!
 15/07/16 11:16:48 INFO reduce.EventFetcher: EventFetcher is interrupted..
 Returning
 15/07/16 11:16:50 INFO HiBench.HiveData$GenerateRankingsReducer: pid: 0, 0
 erros, 0 missed
 15/07/16 11:16:50 INFO reduce.EventFetcher: EventFetcher is interrupted..
 Returning
 15/07/16 11:16:51 INFO HiBench.HiveData$GenerateRankingsReducer: pid: 1, 0
 erros, 0 missed
 15/07/16 11:16:51 INFO reduce.EventFetcher: EventFetcher is interrupted..
 Returning
 15/07/16 11:16:51 INFO HiBench.HiveData$GenerateRankingsReducer: pid: 2, 0
 erros, 0 missed
 15/07/16 11:16:52 INFO reduce.EventFetcher: EventFetcher is interrupted..
 Returning
 15/07/16 11:16:52 INFO HiBench.HiveData$GenerateRankingsReducer: pid: 3, 0
 erros, 0 missed
 15/07/16 11:16:52 INFO reduce.EventFetcher: EventFetcher is interrupted..
 Returning
 15/07/16 11:16:53 INFO HiBench.HiveData$GenerateRankingsReducer: pid: 4, 0
 erros, 0 missed
 15/07/16 11:16:53 INFO reduce.EventFetcher: EventFetcher is interrupted..
 Returning
 15/07/16 11:16:54 INFO HiBench.HiveData$GenerateRankingsReducer: pid: 5, 0
 erros, 0 missed
 15/07/16 11:16:55 WARN mapreduce.JobResourceUploader: Hadoop command-line
 option parsing not performed. Implement the Tool interface and execute your
 application with ToolRunner to remedy this.
 java.io.FileNotFoundException:
 file:/tmp/hadoop-root/mapred/local/1437016615898/user_agents (No such file
 or directory)
 java.io.FileNotFoundException:
 file:/tmp/hadoop-root/mapred/local/1437016615898/user_agents

Reply: Re: HiBench test for hadoop/hive/spark cluster

2015-07-15 Thread luohui20001
Hi Ted,
  Thanks for your advice. I found that there is something wrong with the
hadoop fs -get command, because I believe the localization of
hdfs://spark-study:9000/HiBench/Aggregation/temp/user_agents to
/tmp/hadoop-root/mapred/local/1437016615898/user_agents is a behaviour like
hadoop fs -get. I am now working to solve the issue below:

[root@spark-study HiBench-master]# hadoop fs -get /HiBench/Aggregation/temp/user_agents
15/07/16 12:31:55 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
15/07/16 12:31:58 WARN hdfs.DFSClient: DFSInputStream has been closed already

Hi Mike:
  I am new to HiBench, so I just set up a test environment of a 1-node
spark/hadoop cluster to test, with no data actually, because HiBench will
auto-generate the test data itself.


 

Thanks & Best regards!
San.Luo

----- Original Message -----
From: Ted Yu yuzhih...@gmail.com
To: 罗辉 luohui20...@sina.com
Cc: jie.huang jie.hu...@intel.com, user user@spark.apache.org
Subject: Re: HiBench test for hadoop/hive/spark cluster
Date: 2015-07-16 12:17

From log file:

15/07/16 11:16:56 INFO mapred.LocalDistributedCacheManager: Creating
symlink: /tmp/hadoop-root/mapred/local/1437016615898/user_agents -
/opt/HiBench-master/user_agents
15/07/16 11:16:56 INFO mapred.LocalDistributedCacheManager: Localized
hdfs://spark-study:9000/HiBench/Aggregation/temp/user_agents as
file:/tmp/hadoop-root/mapred/local/ 1437016615898/user_agents
...
java.io.FileNotFoundException:
file:/tmp/hadoop-root/mapred/local/1437016615898/user_agents (No such file
or directory)
  at java.io.FileInputStream.open(Native Method)
However, FileNotFoundException didn't happen to other localized files, such as 
country_codes
FYI
On Wed, Jul 15, 2015 at 8:53 PM,  luohui20...@sina.com wrote:
Hi all
  When I am running HiBench on my Spark/Hadoop/Hive cluster, I found
there is always a failure in my aggregation test. I suspect this problem may be
some issue related to my Hive settings? Attached are my config file and log
file. Any idea how to solve this issue?
  My Spark cluster is a one-node standalone cluster:
  hadoop: 2.7
  spark: 1.3
  hive: 1.2.1
  here is the log:
Prepare aggregation ...
Exec script: /opt/HiBench-master/workloads/aggregation/prepare/prepare.sh
Parsing conf: /opt/HiBench-master/conf/00-default-properties.conf
Parsing conf: /opt/HiBench-master/conf/10-data-scale-profile.conf
Parsing conf: /opt/HiBench-master/conf/99-user_defined_properties.conf
Parsing conf: 
/opt/HiBench-master/workloads/aggregation/conf/00-aggregation-default.conf
Parsing conf: 
/opt/HiBench-master/workloads/aggregation/conf/10-aggregation-userdefine.conf
start HadoopPrepareAggregation bench
hdfs rm -r: /usr/lib/hadoop/bin/hadoop --config /usr/lib/hadoop/etc/hadoop fs 
-rm -r -skipTrash hdfs://spark-study:9000/HiBench/Aggregation/Input
15/07/16 11:16:26 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
Deleted hdfs://spark-study:9000/HiBench/Aggregation/Input
Pages:12, USERVISITS:100
Submit MapReduce Job: /usr/lib/hadoop/bin/hadoop --config 
/usr/lib/hadoop/etc/hadoop jar 
/opt/HiBench-master/src/autogen/target/autogen-4.0-SNAPSHOT-jar-with-dependencies.jar
 HiBench.DataGen -t hive -b hdfs://spark-study:9000/HiBench/Aggregation -n 
Input -m 12 -r 6 -p 12 -v 100 -o sequence
15/07/16 11:16:34 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
sfubmzperrrbupqoq
15/07/16 11:16:38 WARN mapreduce.JobResourceUploader: Hadoop command-line 
option parsing not performed. Implement the Tool interface and execute your 
application with ToolRunner to remedy this.
15/07/16 11:16:40 INFO HiBench.HtmlCore: WARNING: dict empty!!!
15/07/16 11:16:42 INFO HiBench.HtmlCore: WARNING: dict empty!!!
15/07/16 11:16:43 INFO HiBench.HtmlCore: WARNING: dict empty!!!
15/07/16 11:16:43 INFO HiBench.HtmlCore: WARNING: dict empty!!!
15/07/16 11:16:44 INFO HiBench.HtmlCore: WARNING: dict empty!!!
15/07/16 11:16:45 INFO HiBench.HtmlCore: WARNING: dict empty!!!
15/07/16 11:16:45 INFO HiBench.HtmlCore: WARNING: dict empty!!!
15/07/16 11:16:46 INFO HiBench.HtmlCore: WARNING: dict empty!!!
15/07/16 11:16:46 INFO HiBench.HtmlCore: WARNING: dict empty!!!
15/07/16 11:16:47 INFO HiBench.HtmlCore: WARNING: dict empty!!!
15/07/16 11:16:47 INFO HiBench.HtmlCore: WARNING: dict empty!!!
15/07/16 11:16:47 INFO HiBench.HtmlCore: WARNING: dict empty!!!
15/07/16 11:16:48 INFO reduce.EventFetcher: EventFetcher is interrupted.. 
Returning
15/07/16 11:16:50 INFO HiBench.HiveData$GenerateRankingsReducer: pid: 0, 0 
erros, 0 missed
15/07/16 11:16:50 INFO reduce.EventFetcher: EventFetcher is interrupted.. 
Returning
15/07/16 11:16:51 INFO HiBench.HiveData$GenerateRankingsReducer: pid: 1, 0 
erros, 0 missed
15/07/16 11:16:51 INFO reduce.EventFetcher

Spark stream test throw org.apache.spark.SparkException: Task not serializable when execute in spark shell

2015-06-24 Thread yuemeng (A)
Hi, all

There are two examples: one throws "Task not serializable" when executed in the
spark shell, the other one is ok. I am very puzzled; can anyone explain what is
different about these two pieces of code and why the other one is ok?

1. The one which throws "Task not serializable":

import org.apache.spark._
import SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.broadcast._



@transient val ssc = new StreamingContext(sc, Seconds(5))
val lines = ssc.textFileStream("/a.log")

val testFun = (line: String) => {
  if ((line.contains(" ERROR")) || (line.startsWith("Spark"))) {
    true
  } else {
    false
  }
}

val p_date_bc = sc.broadcast("^\\w+ \\w+ \\d+ \\d{2}:\\d{2}:\\d{2} \\d{4}".r)
val p_ORA_bc = sc.broadcast("^ORA-\\d+.+".r)
val A = (iter: Iterator[String],
         data_bc: Broadcast[scala.util.matching.Regex],
         ORA_bc: Broadcast[scala.util.matching.Regex]) => {
  val p_date = data_bc.value
  val p_ORA = ORA_bc.value
  var res = List[String]()
  var lasttime = ""

  while (iter.hasNext) {
    val line = iter.next.toString
    val currentcode = p_ORA findFirstIn line getOrElse null
    if (currentcode != null) {
      res ::= lasttime + " | " + currentcode
    } else {
      val currentdate = p_date findFirstIn line getOrElse null
      if (currentdate != null) {
        lasttime = currentdate
      }
    }
  }
  res.iterator
}

val cdd = lines.filter(testFun).mapPartitions(x => A(x, p_date_bc, p_ORA_bc))  // org.apache.spark.SparkException: Task not serializable



2. The other one is ok:



import org.apache.spark._
import SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.broadcast._



val ssc = new StreamingContext(sc, Seconds(5))
val lines = ssc.textFileStream("/a.log")

val testFun = (line: String) => {
  if ((line.contains(" ERROR")) || (line.startsWith("Spark"))) {
    true
  } else {
    false
  }
}

val A = (iter: Iterator[String]) => {
  var res = List[String]()
  var lasttime = ""
  while (iter.hasNext) {
    val line = iter.next.toString
    val currentcode = "^\\w+ \\w+ \\d+ \\d{2}:\\d{2}:\\d{2} \\d{4}".r.findFirstIn(line).getOrElse(null)
    if (currentcode != null) {
      res ::= lasttime + " | " + currentcode
    } else {
      val currentdate = "^ORA-\\d+.+".r.findFirstIn(line).getOrElse(null)
      if (currentdate != null) {
        lasttime = currentdate
      }
    }
  }
  res.iterator
}

val cdd = lines.filter(testFun).mapPartitions(A)










Re: Spark stream test throw org.apache.spark.SparkException: Task not serializable when execute in spark shell

2015-06-24 Thread Yana Kadiyska
I can't tell immediately, but you might be able to get more info with the
hint provided here:
http://stackoverflow.com/questions/27980781/spark-task-not-serializable-with-simple-accumulator
(short version, set -Dsun.io.serialization.extendedDebugInfo=true)

Also, unless you're simplifying your example a lot, you only have 2
regexes, so I'm not quite sure why you want to broadcast them, as opposed
to just having an object that holds them on each executor, or just create
them at the start of mapPartitions (outside of iter.hasNext as shown in
your second snippet). Broadcasting seems overcomplicated, but maybe you
just showed a simplified example...
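
A minimal sketch of that last suggestion, reusing the lines and testFun definitions from the first snippet and building the regexes once per partition inside mapPartitions, so nothing from the shell scope has to be serialized for them:

val cdd = lines.filter(testFun).mapPartitions { iter =>
  // Compile the patterns locally on the executor, once per partition.
  val pDate = "^\\w+ \\w+ \\d+ \\d{2}:\\d{2}:\\d{2} \\d{4}".r
  val pOra = "^ORA-\\d+.+".r
  var res = List[String]()
  var lasttime = ""
  for (line <- iter) {
    pOra.findFirstIn(line) match {
      case Some(code) => res ::= lasttime + " | " + code
      case None => pDate.findFirstIn(line).foreach(d => lasttime = d)
    }
  }
  res.iterator
}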

On Wed, Jun 24, 2015 at 8:41 AM, yuemeng (A) yueme...@huawei.com wrote:

  hi ,all

 there two examples one is throw Task not serializable when execute in
 spark shell,the other one is ok,i am very puzzled,can anyone give what's
 different about this two code and why the other is ok

 1.The one which throw Task not serializable :

 import org.apache.spark._
 import SparkContext._
 import org.apache.spark.SparkConf
 import org.apache.spark.streaming.{Seconds, StreamingContext}
 import org.apache.spark.broadcast._



 @transient val ssc = new StreamingContext(sc, Seconds(5))
 val lines = ssc.textFileStream(/a.log)





 val testFun = (line:String) = {
 if ((line.contains( ERROR)) || (line.startsWith(Spark))){
 true
 }
 else{
 false
 }
 }

 val p_date_bc = sc.broadcast(^\\w+ \\w+ \\d+ \\d{2}:\\d{2}:\\d{2}
 \\d{4}.r)
 val p_ORA_bc = sc.broadcast(^ORA-\\d+.+.r)
 val A = (iter:
 Iterator[String],data_bc:Broadcast[scala.util.matching.Regex],ORA_bc:Broadcast[scala.util.matching.Regex])
 = {
 val p_date = data_bc.value
 val p_ORA = ORA_bc.value
 var res = List[String]()
 var lasttime = 

 while (iter.hasNext) {
 val line = iter.next.toString
 val currentcode = p_ORA findFirstIn line getOrElse null
 if (currentcode != null){
 res ::= lasttime +  |  + currentcode
 }else{
 val currentdate = p_date findFirstIn line getOrElse
 null
 if (currentdate != null){
 lasttime = currentdate
 }
 }
 }
 res.iterator
 }

 val cdd = lines.filter(testFun).mapPartitions(x =
 A(x,p_date_bc,p_ORA_bc))  //org.apache.spark.SparkException: Task not
 serializable



 2.The other one is ok:



 import org.apache.spark._
 import SparkContext._
 import org.apache.spark.SparkConf
 import org.apache.spark.streaming.{Seconds, StreamingContext}
 import org.apache.spark.broadcast._



 val ssc = new StreamingContext(sc, Seconds(5))
 val lines = ssc.textFileStream(/a.log)





 val testFun = (line:String) = {
 if ((line.contains( ERROR)) || (line.startsWith(Spark))){
 true
 }
 else{
 false
 }
 }



 val A = (iter: Iterator[String]) = {

 var res = List[String]()
 var lasttime = 
 while (iter.hasNext) {
 val line = iter.next.toString
 val currentcode = ^\\w+ \\w+ \\d+ \\d{2}:\\d{2}:\\d{2}
 \\d{4}.r.findFirstIn(line).getOrElse(null)
 if (currentcode != null){
 res ::= lasttime +  |  + currentcode
 }else{
  val currentdate =
 ^ORA-\\d+.+.r.findFirstIn(line).getOrElse(null)
 if (currentdate != null){
 lasttime = currentdate
 }
 }
 }
 res.iterator
 }



 val cdd= lines.filter(testFun).mapPartitions(A)











Re: Spark Maven Test error

2015-06-10 Thread Rick Moritz
Dear List,

I'm trying to reference a lonely message to this list from March 25th (
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Maven-Test-error-td22216.html
), but I'm unsure this will thread properly. Sorry if it didn't work out.

Anyway, using Spark 1.4.0-RC4 I run into the same issue when performing
tests, using

build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phadoop-provided
-Phive -Phive-thriftserver test
after a successful clean-package build.

The error is:
java.lang.IllegalStateException: failed to create a child event loop


Could this be due to another instance of Spark blocking ports? In that case
maybe the test case should be able to adapt to this particular issue.

Thanks for any help,

Rick


Re: HiveContext test, Spark Context did not initialize after waiting 10000ms

2015-05-26 Thread Mohammad Islam
I got a similar problem. I'm not sure if your problem is already resolved.
For the record, I solved this type of error by calling
sc.setMaster("yarn-cluster"). If you find the solution, please let us know.
Regards,
Mohammad




 On Friday, March 6, 2015 2:47 PM, nitinkak001 nitinkak...@gmail.com 
wrote:
   

 I am trying to run a Hive query from Spark using HiveContext. Here is the
code

/ val conf = new SparkConf().setAppName(HiveSparkIntegrationTest)
    
  
    conf.set(spark.executor.extraClassPath,
/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hive/lib);
    conf.set(spark.driver.extraClassPath,
/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hive/lib);
    conf.set(spark.yarn.am.waitTime, 30L)
    
    val sc = new SparkContext(conf)

    val sqlContext = new HiveContext(sc)

    def inputRDD = sqlContext.sql(describe
spark_poc.src_digital_profile_user);

    inputRDD.collect().foreach { println }
    
    println(inputRDD.schema.getClass.getName)
/

Getting this exception. Any clues? The weird part is if I try to do the same
thing but in Java instead of Scala, it runs fine.

/Exception in thread Driver java.lang.NullPointerException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:162)
15/03/06 17:39:32 ERROR yarn.ApplicationMaster: SparkContext did not
initialize after waiting for 1 ms. Please check earlier log output for
errors. Failing the application.
Exception in thread main java.lang.NullPointerException
    at
org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkContextInitialized(ApplicationMaster.scala:218)
    at
org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:110)
    at
org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:434)
    at
org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:53)
    at
org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:52)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at
org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:52)
    at
org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:433)
    at
org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
15/03/06 17:39:32 INFO yarn.ApplicationMaster: AppMaster received a signal./



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/HiveContext-test-Spark-Context-did-not-initialize-after-waiting-1ms-tp21953.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



  

Re: HiveContext test, Spark Context did not initialize after waiting 10000ms

2015-05-26 Thread Nitin kak
That is a much better solution than how I resolved it. I got around it by
placing comma-separated jar paths for all the hive-related jars in the --jars
clause.

I will try your solution. Thanks for sharing it.

On Tue, May 26, 2015 at 4:14 AM, Mohammad Islam misla...@yahoo.com wrote:

 I got a similar problem.
 I'm not sure if your problem is already resolved.

 For the record, I solved this type of error by calling sc..setMaster(
 yarn-cluster);

 If you find the solution, please let us know.

 Regards,
 Mohammad





   On Friday, March 6, 2015 2:47 PM, nitinkak001 nitinkak...@gmail.com
 wrote:


 I am trying to run a Hive query from Spark using HiveContext. Here is the
 code

 / val conf = new SparkConf().setAppName(HiveSparkIntegrationTest)


 conf.set(spark.executor.extraClassPath,
 /opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hive/lib);
 conf.set(spark.driver.extraClassPath,
 /opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hive/lib);
 conf.set(spark.yarn.am.waitTime, 30L)

 val sc = new SparkContext(conf)

 val sqlContext = new HiveContext(sc)

 def inputRDD = sqlContext.sql(describe
 spark_poc.src_digital_profile_user);

 inputRDD.collect().foreach { println }

 println(inputRDD.schema.getClass.getName)
 /

 Getting this exception. Any clues? The weird part is if I try to do the
 same
 thing but in Java instead of Scala, it runs fine.

 /Exception in thread Driver java.lang.NullPointerException
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at

 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at

 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at

 org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:162)
 15/03/06 17:39:32 ERROR yarn.ApplicationMaster: SparkContext did not
 initialize after waiting for 1 ms. Please check earlier log output for
 errors. Failing the application.
 Exception in thread main java.lang.NullPointerException
 at

 org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkContextInitialized(ApplicationMaster.scala:218)
 at

 org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:110)
 at

 org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:434)
 at

 org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:53)
 at

 org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:52)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at

 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
 at

 org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:52)
 at

 org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:433)
 at

 org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
 15/03/06 17:39:32 INFO yarn.ApplicationMaster: AppMaster received a
 signal./



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/HiveContext-test-Spark-Context-did-not-initialize-after-waiting-1ms-tp21953.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org






Re: [Unit Test Failure] Test org.apache.spark.streaming.JavaAPISuite.testCount failed

2015-05-20 Thread Tathagata Das
Has this been fixed for you now? There has been a number of patches since
then and it may have been fixed.

On Thu, May 14, 2015 at 7:20 AM, Wangfei (X) wangf...@huawei.com wrote:

  Yes, it fails repeatedly on my local Jenkins.

  Sent from my iPhone

  On May 14, 2015, at 18:30, Tathagata Das t...@databricks.com wrote:

   Do you get this failure repeatedly?



 On Thu, May 14, 2015 at 12:55 AM, kf wangf...@huawei.com wrote:

 Hi, all, i got following error when i run unit test of spark by
 dev/run-tests
 on the latest branch-1.4 branch.

 the latest commit id:
 commit d518c0369fa412567855980c3f0f426cde5c190d
 Author: zsxwing zsxw...@gmail.com
 Date:   Wed May 13 17:58:29 2015 -0700

 error

 [info] Test org.apache.spark.streaming.JavaAPISuite.testCount started
 [error] Test org.apache.spark.streaming.JavaAPISuite.testCount failed:
 org.apache.spark.SparkException: Error communicating with MapOutputTracker
 [error] at
 org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:113)
 [error] at
 org.apache.spark.MapOutputTracker.sendTracker(MapOutputTracker.scala:119)
 [error] at
 org.apache.spark.MapOutputTrackerMaster.stop(MapOutputTracker.scala:324)
 [error] at org.apache.spark.SparkEnv.stop(SparkEnv.scala:93)
 [error] at org.apache.spark.SparkContext.stop(SparkContext.scala:1577)
 [error] at

 org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:626)
 [error] at

 org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:597)
 [error] at

 org.apache.spark.streaming.TestSuiteBase$class.runStreamsWithPartitions(TestSuiteBase.scala:403)
 [error] at

 org.apache.spark.streaming.JavaTestUtils$.runStreamsWithPartitions(JavaTestUtils.scala:102)
 [error] at

 org.apache.spark.streaming.TestSuiteBase$class.runStreams(TestSuiteBase.scala:344)
 [error] at

 org.apache.spark.streaming.JavaTestUtils$.runStreams(JavaTestUtils.scala:102)
 [error] at

 org.apache.spark.streaming.JavaTestBase$class.runStreams(JavaTestUtils.scala:74)
 [error] at

 org.apache.spark.streaming.JavaTestUtils$.runStreams(JavaTestUtils.scala:102)
 [error] at
 org.apache.spark.streaming.JavaTestUtils.runStreams(JavaTestUtils.scala)
 [error] at
 org.apache.spark.streaming.JavaAPISuite.testCount(JavaAPISuite.java:103)
 [error] ...
 [error] Caused by: org.apache.spark.SparkException: Error sending message
 [message = StopMapOutputTracker]
 [error] at
 org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:116)
 [error] at
 org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
 [error] at
 org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:109)
 [error] ... 52 more
 [error] Caused by: java.util.concurrent.TimeoutException: Futures timed
 out
 after [120 seconds]
 [error] at
 scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
 [error] at
 scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
 [error] at
 scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
 [error] at

 scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
 [error] at scala.concurrent.Await$.result(package.scala:107)
 [error] at
 org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
 [error] ... 54 more



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/Unit-Test-Failure-Test-org-apache-spark-streaming-JavaAPISuite-testCount-failed-tp22879.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org





Re: [Unit Test Failure] Test org.apache.spark.streaming.JavaAPISuite.testCount failed

2015-05-14 Thread Wangfei (X)
Yes, it fails repeatedly on my local Jenkins.

Sent from my iPhone

On May 14, 2015, at 18:30, Tathagata Das t...@databricks.com wrote:

Do you get this failure repeatedly?



On Thu, May 14, 2015 at 12:55 AM, kf wangf...@huawei.com wrote:
Hi, all, i got following error when i run unit test of spark by dev/run-tests
on the latest branch-1.4 branch.

the latest commit id:
commit d518c0369fa412567855980c3f0f426cde5c190d
Author: zsxwing zsxw...@gmail.com
Date:   Wed May 13 17:58:29 2015 -0700

error

[info] Test org.apache.spark.streaming.JavaAPISuite.testCount started
[error] Test org.apache.spark.streaming.JavaAPISuite.testCount failed:
org.apache.spark.SparkException: Error communicating with MapOutputTracker
[error] at
org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:113)
[error] at
org.apache.spark.MapOutputTracker.sendTracker(MapOutputTracker.scala:119)
[error] at
org.apache.spark.MapOutputTrackerMaster.stop(MapOutputTracker.scala:324)
[error] at org.apache.spark.SparkEnv.stop(SparkEnv.scala:93)
[error] at org.apache.spark.SparkContext.stop(SparkContext.scala:1577)
[error] at
org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:626)
[error] at
org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:597)
[error] at
org.apache.spark.streaming.TestSuiteBase$class.runStreamsWithPartitions(TestSuiteBase.scala:403)
[error] at
org.apache.spark.streaming.JavaTestUtils$.runStreamsWithPartitions(JavaTestUtils.scala:102)
[error] at
org.apache.spark.streaming.TestSuiteBase$class.runStreams(TestSuiteBase.scala:344)
[error] at
org.apache.spark.streaming.JavaTestUtils$.runStreams(JavaTestUtils.scala:102)
[error] at
org.apache.spark.streaming.JavaTestBase$class.runStreams(JavaTestUtils.scala:74)
[error] at
org.apache.spark.streaming.JavaTestUtils$.runStreams(JavaTestUtils.scala:102)
[error] at
org.apache.spark.streaming.JavaTestUtils.runStreams(JavaTestUtils.scala)
[error] at
org.apache.spark.streaming.JavaAPISuite.testCount(JavaAPISuite.java:103)
[error] ...
[error] Caused by: org.apache.spark.SparkException: Error sending message
[message = StopMapOutputTracker]
[error] at
org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:116)
[error] at
org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
[error] at
org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:109)
[error] ... 52 more
[error] Caused by: java.util.concurrent.TimeoutException: Futures timed out
after [120 seconds]
[error] at
scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
[error] at
scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
[error] at
scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
[error] at
scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
[error] at scala.concurrent.Await$.result(package.scala:107)
[error] at
org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
[error] ... 54 more



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Unit-Test-Failure-Test-org-apache-spark-streaming-JavaAPISuite-testCount-failed-tp22879.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org




[Unit Test Failure] Test org.apache.spark.streaming.JavaAPISuite.testCount failed

2015-05-14 Thread kf
Hi all, I got the following error when I ran the unit tests of Spark via
dev/run-tests on the latest branch-1.4 branch.

the latest commit id: 
commit d518c0369fa412567855980c3f0f426cde5c190d
Author: zsxwing zsxw...@gmail.com
Date:   Wed May 13 17:58:29 2015 -0700

error

[info] Test org.apache.spark.streaming.JavaAPISuite.testCount started
[error] Test org.apache.spark.streaming.JavaAPISuite.testCount failed:
org.apache.spark.SparkException: Error communicating with MapOutputTracker
[error] at
org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:113)
[error] at
org.apache.spark.MapOutputTracker.sendTracker(MapOutputTracker.scala:119)
[error] at
org.apache.spark.MapOutputTrackerMaster.stop(MapOutputTracker.scala:324)
[error] at org.apache.spark.SparkEnv.stop(SparkEnv.scala:93)
[error] at org.apache.spark.SparkContext.stop(SparkContext.scala:1577)
[error] at
org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:626)
[error] at
org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:597)
[error] at
org.apache.spark.streaming.TestSuiteBase$class.runStreamsWithPartitions(TestSuiteBase.scala:403)
[error] at
org.apache.spark.streaming.JavaTestUtils$.runStreamsWithPartitions(JavaTestUtils.scala:102)
[error] at
org.apache.spark.streaming.TestSuiteBase$class.runStreams(TestSuiteBase.scala:344)
[error] at
org.apache.spark.streaming.JavaTestUtils$.runStreams(JavaTestUtils.scala:102)
[error] at
org.apache.spark.streaming.JavaTestBase$class.runStreams(JavaTestUtils.scala:74)
[error] at
org.apache.spark.streaming.JavaTestUtils$.runStreams(JavaTestUtils.scala:102)
[error] at
org.apache.spark.streaming.JavaTestUtils.runStreams(JavaTestUtils.scala)
[error] at
org.apache.spark.streaming.JavaAPISuite.testCount(JavaAPISuite.java:103)
[error] ...
[error] Caused by: org.apache.spark.SparkException: Error sending message
[message = StopMapOutputTracker]
[error] at
org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:116)
[error] at
org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
[error] at
org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:109)
[error] ... 52 more
[error] Caused by: java.util.concurrent.TimeoutException: Futures timed out
after [120 seconds]
[error] at
scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
[error] at
scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
[error] at
scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
[error] at
scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
[error] at scala.concurrent.Await$.result(package.scala:107)
[error] at
org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
[error] ... 54 more



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Unit-Test-Failure-Test-org-apache-spark-streaming-JavaAPISuite-testCount-failed-tp22879.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: [Unit Test Failure] Test org.apache.spark.streaming.JavaAPISuite.testCount failed

2015-05-14 Thread Tathagata Das
Do you get this failure repeatedly?



On Thu, May 14, 2015 at 12:55 AM, kf wangf...@huawei.com wrote:

 Hi, all, i got following error when i run unit test of spark by
 dev/run-tests
 on the latest branch-1.4 branch.

 the latest commit id:
 commit d518c0369fa412567855980c3f0f426cde5c190d
 Author: zsxwing zsxw...@gmail.com
 Date:   Wed May 13 17:58:29 2015 -0700

 error

 [info] Test org.apache.spark.streaming.JavaAPISuite.testCount started
 [error] Test org.apache.spark.streaming.JavaAPISuite.testCount failed:
 org.apache.spark.SparkException: Error communicating with MapOutputTracker
 [error] at
 org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:113)
 [error] at
 org.apache.spark.MapOutputTracker.sendTracker(MapOutputTracker.scala:119)
 [error] at
 org.apache.spark.MapOutputTrackerMaster.stop(MapOutputTracker.scala:324)
 [error] at org.apache.spark.SparkEnv.stop(SparkEnv.scala:93)
 [error] at org.apache.spark.SparkContext.stop(SparkContext.scala:1577)
 [error] at

 org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:626)
 [error] at

 org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:597)
 [error] at

 org.apache.spark.streaming.TestSuiteBase$class.runStreamsWithPartitions(TestSuiteBase.scala:403)
 [error] at

 org.apache.spark.streaming.JavaTestUtils$.runStreamsWithPartitions(JavaTestUtils.scala:102)
 [error] at

 org.apache.spark.streaming.TestSuiteBase$class.runStreams(TestSuiteBase.scala:344)
 [error] at

 org.apache.spark.streaming.JavaTestUtils$.runStreams(JavaTestUtils.scala:102)
 [error] at

 org.apache.spark.streaming.JavaTestBase$class.runStreams(JavaTestUtils.scala:74)
 [error] at

 org.apache.spark.streaming.JavaTestUtils$.runStreams(JavaTestUtils.scala:102)
 [error] at
 org.apache.spark.streaming.JavaTestUtils.runStreams(JavaTestUtils.scala)
 [error] at
 org.apache.spark.streaming.JavaAPISuite.testCount(JavaAPISuite.java:103)
 [error] ...
 [error] Caused by: org.apache.spark.SparkException: Error sending message
 [message = StopMapOutputTracker]
 [error] at
 org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:116)
 [error] at
 org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
 [error] at
 org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:109)
 [error] ... 52 more
 [error] Caused by: java.util.concurrent.TimeoutException: Futures timed out
 after [120 seconds]
 [error] at
 scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
 [error] at
 scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
 [error] at
 scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
 [error] at

 scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
 [error] at scala.concurrent.Await$.result(package.scala:107)
 [error] at
 org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
 [error] ... 54 more



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/Unit-Test-Failure-Test-org-apache-spark-streaming-JavaAPISuite-testCount-failed-tp22879.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




Re: Spark unit test fails

2015-05-07 Thread NoWisdom
I'm also getting the same error.

Any ideas?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-unit-test-fails-tp22368p22798.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



contributing code - how to test

2015-04-24 Thread Deborah Siegel
Hi,

I selected a starter task in JIRA, and made changes to my github fork of
the current code.

I assumed I would be able to build and test.
% mvn clean compile was fine
but
%mvn package failed

[ERROR] Failed to execute goal
org.apache.maven.plugins:maven-surefire-plugin:2.18:test (default-test) on
project spark-launcher_2.10: There are test failures.

I then reverted my changes, but same story. Any advice is appreciated!

Deb


Re: contributing code - how to test

2015-04-24 Thread Sean Owen
The standard incantation -- which is a little different from standard
Maven practice -- is:

mvn -DskipTests [your options] clean package
mvn [your options] test

Some tests require the assembly, so you have to do it this way.

I don't know what the test failures were, you didn't post them, but
I'm guessing this is the cause since it failed very early on the
launcher module and not on some module that you changed.

Sean


On Fri, Apr 24, 2015 at 7:35 PM, Deborah Siegel
deborah.sie...@gmail.com wrote:
 Hi,

 I selected a starter task in JIRA, and made changes to my github fork of
 the current code.

 I assumed I would be able to build and test.
 % mvn clean compile was fine
 but
 %mvn package failed

 [ERROR] Failed to execute goal
 org.apache.maven.plugins:maven-surefire-plugin:2.18:test (default-test) on
 project spark-launcher_2.10: There are test failures.

 I then reverted my changes, but same story. Any advice is appreciated!

 Deb

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



ChiSquared Test from user response flat files to RDD[Vector]?

2015-04-20 Thread Dan DeCapria, CivicScience
Hi Spark community,

I'm very new to the Apache Spark community; but if this (very active) group
is anything like the other Apache project user groups I've worked with, I'm
going to enjoy discussions here very much. Thanks in advance!

Use Case:
I am trying to go from flat files of user response data, to contingency
tables of frequency counts, to Pearson Chi-Square correlation statistics
and perform a Chi-Squared hypothesis test.  The user response data
represents a multiple choice question-answer (MCQ) format. The goal is to
compute all choose-two combinations of question answers (precondition,
question X question) contingency tables. Each cell of the contingency table
is the intersection of the users whom responded per each option per each
question of the table.

An overview of the problem:
// data ingestion and typing schema: Observation (u: String, d:
java.util.Date, t: String, q: String, v: String, a: Int)
// a question (q) has a finite set of response options (v) per which a user
(u) responds
// additional response fields are not required per this test
for (precondition a) {
  for (q_i in lex ordered questions) {
    for (q_j in lex ordered questions, q_j > q_i) {
      forall v_k \in q_i get set of distinct users {u}_ik
      forall v_l \in q_j get set of distinct users {u}_jl
      forall cells per table (a, q_i, q_j) define C_ijkl = |intersect({u}_ik, {u}_jl)| // contingency table construct
      compute chisq test per this contingency table and persist
    }
  }
}

The Scala main I'm testing is provided below. I was planning to follow the
provided example at
https://spark.apache.org/docs/1.3.1/mllib-statistics.html, however I am not
sure how to go from my RDD[Observation] to the RDD[Vector] that the example
expects as input.

  def main(args: Array[String]): Unit = {
    // setup main space for test
    val conf = new SparkConf().setAppName("TestMain")
    val sc = new SparkContext(conf)

    // data ETL and typing schema
    case class Observation(u: String, d: java.util.Date, t: String, q: String,
                           v: String, a: Int)
    val date_format = new java.text.SimpleDateFormat("MMdd")
    val data_project_abs_dir = "/my/path/to/data/files"
    val data_files = data_project_abs_dir + "/part-*.gz"
    val data = sc.textFile(data_files)
    val observations = data.map(line => line.split(",").map(_.trim)).map(r =>
      Observation(r(0).toString, date_format.parse(r(1).toString),
        r(2).toString, r(3).toString, r(4).toString, r(5).toInt))
    observations.cache

    // ToDo: the basic keying of the space, possibly...
    val qvu = observations.map(o => ((o.a, o.q, o.v), o.u)).distinct

    // ToDo: ok, so now how to get this into the precondition RDD[Vector]
    // from the Spark example to make a contingency table?...

    // ToDo: perform then persist the resulting chisq and p-value on these
    // contingency tables...
  }
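
For the first ToDo, a hedged sketch of computing the per-cell counts from the
observations RDD defined above (intermediate names like byUser and cellCounts
are assumptions, not part of my actual code):

    // for each precondition a and user u, gather that user's (question, option)
    // pairs, then emit one count per contingency cell (a, q_i, v_k, q_j, v_l)
    val byUser = observations.map(o => ((o.a, o.u), (o.q, o.v))).groupByKey()
    val cellCounts = byUser.flatMap { case ((a, _), qvs) =>
      val pairs = qvs.toSeq.distinct
      for {
        (qi, vk) <- pairs
        (qj, vl) <- pairs
        if qi < qj
      } yield ((a, qi, vk, qj, vl), 1L)
    }.reduceByKey(_ + _) // C_ijkl = |intersect({u}_ik, {u}_jl)|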


Any help is appreciated.

Thanks!  -Dan


Re: ChiSquared Test from user response flat files to RDD[Vector]?

2015-04-20 Thread Xiangrui Meng
You can find the user guide for vector creation here:
http://spark.apache.org/docs/latest/mllib-data-types.html#local-vector.
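A minimal sketch of going from such cell counts to MLlib's chi-squared
independence test (APIs as of roughly Spark 1.3; the literal counts below are
made up for illustration):

    import org.apache.spark.mllib.linalg.{Matrices, Vectors}
    import org.apache.spark.mllib.stat.Statistics

    // one contingency table (a, q_i, q_j): rows = options of q_i, columns =
    // options of q_j, values = the C_ijkl cell counts, given column-major
    val table = Matrices.dense(2, 3, Array(12.0, 7.0, 3.0, 8.0, 5.0, 9.0))
    val independence = Statistics.chiSqTest(table) // Pearson's independence test
    println(s"stat=${independence.statistic} p=${independence.pValue} df=${independence.degreesOfFreedom}")

    // the Vector / RDD[Vector] forms in the guide are for the goodness-of-fit variant
    val gof = Statistics.chiSqTest(Vectors.dense(12.0, 7.0, 3.0))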
-Xiangrui

On Mon, Apr 20, 2015 at 2:32 PM, Dan DeCapria, CivicScience
dan.decap...@civicscience.com wrote:
 Hi Spark community,

 I'm very new to the Apache Spark community; but if this (very active) group
 is anything like the other Apache project user groups I've worked with, I'm
 going to enjoy discussions here very much. Thanks in advance!

 Use Case:
 I am trying to go from flat files of user response data, to contingency
 tables of frequency counts, to Pearson Chi-Square correlation statistics and
 perform a Chi-Squared hypothesis test.  The user response data represents a
 multiple choice question-answer (MCQ) format. The goal is to compute all
 choose-two combinations of question answers (precondition, question X
 question) contingency tables. Each cell of the contingency table is the
 intersection of the users whom responded per each option per each question
 of the table.

 An overview of the problem:
 // data ingestion and typing schema: Observation (u: String, d:
 java.util.Date, t: String, q: String, v: String, a: Int)
 // a question (q) has a finite set of response options (v) per which a user
 (u) responds
 // additional response fields are not required per this test
 for (precondition a) {
   for (q_i in lex ordered questions) {
 for (q_j in lex ordered question, q_j  q_i) {
 forall v_k \in q_i get set of distinct users {u}_ik
 forall v_l \in q_j get set of distinct users {u}_jl
 forall cells per table (a,q_i,q_j) defn C_ijkl = |intersect({u}_ik,
 {u}_jl)| // contingency table construct
 compute chisq test per this contingency table and persist
 }
   }
 }

 The scala main I'm testing is provided below, and I was planning to use the
 provided example https://spark.apache.org/docs/1.3.1/mllib-statistics.html
 however I am not sure how to go from my RDD[Observation] to the necessary
 precondition of RDD[Vector] for ingestion

   def main(args: Array[String]): Unit = {
 // setup main space for test
 val conf = new SparkConf().setAppName(TestMain)
 val sc = new SparkContext(conf)

 // data ETL and typing schema
 case class Observation (u: String, d: java.util.Date, t: String, q:
 String, v: String, a: Int)
 val date_format = new java.text.SimpleDateFormat(MMdd)
 val data_project_abs_dir = /my/path/to/data/files
 val data_files = data_project_abs_dir + /part-*.gz
 val data = sc.textFile(data_files)
 val observations = data.map(line = line.split(,).map(_.trim)).map(r
 = Observation(r(0).toString, date_format.parse(r(1).toString),
 r(2).toString, r(3).toString, r(4).toString, r(5).toInt))
 observations.cache

 // ToDo: the basic keying of the space, possibly...
 val qvu = observations.map(o = ((o.a, o.q, o.v), o.u)).distinct

 // ToDo: ok, so now how to get this into the precondition RDD[Vector]
 from the Spark example to make a contingency table?...

 // ToDo: perform then persist the resulting chisq and p-value on these
 contingency tables...
   }


 Any help is appreciated.

 Thanks!  -Dan


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Cannot run unit test.

2015-04-08 Thread Mike Trienis
It's because your tests are running in parallel and you can only have one
context running at a time. 
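
If the project builds with sbt (an assumption), the usual settings for this are:

    // build.sbt sketch: run test suites sequentially so only one SparkContext
    // is alive at a time, and run them in a separate forked JVM
    parallelExecution in Test := false
    fork in Test := true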



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Cannot-run-unit-test-tp14459p22429.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Spark unit test fails

2015-04-06 Thread Manas Kar
Trying to bump up the rank of the question.
Can someone point to an example on GitHub?

..Manas

On Fri, Apr 3, 2015 at 9:39 AM, manasdebashiskar manasdebashis...@gmail.com
 wrote:

 Hi experts,
  I am trying to write unit tests for my spark application which fails with
 javax.servlet.FilterRegistration error.

 I am using CDH5.3.2 Spark and below is my dependencies list.
 val spark   = 1.2.0-cdh5.3.2
 val esriGeometryAPI = 1.2
 val csvWriter   = 1.0.0
 val hadoopClient= 2.3.0
 val scalaTest   = 2.2.1
 val jodaTime= 1.6.0
 val scalajHTTP  = 1.0.1
 val avro= 1.7.7
 val scopt   = 3.2.0
 val config  = 1.2.1
 val jobserver   = 0.4.1
 val excludeJBossNetty = ExclusionRule(organization = org.jboss.netty)
 val excludeIONetty = ExclusionRule(organization = io.netty)
 val excludeEclipseJetty = ExclusionRule(organization =
 org.eclipse.jetty)
 val excludeMortbayJetty = ExclusionRule(organization =
 org.mortbay.jetty)
 val excludeAsm = ExclusionRule(organization = org.ow2.asm)
 val excludeOldAsm = ExclusionRule(organization = asm)
 val excludeCommonsLogging = ExclusionRule(organization =
 commons-logging)
 val excludeSLF4J = ExclusionRule(organization = org.slf4j)
 val excludeScalap = ExclusionRule(organization = org.scala-lang,
 artifact = scalap)
 val excludeHadoop = ExclusionRule(organization = org.apache.hadoop)
 val excludeCurator = ExclusionRule(organization = org.apache.curator)
 val excludePowermock = ExclusionRule(organization = org.powermock)
 val excludeFastutil = ExclusionRule(organization = it.unimi.dsi)
 val excludeJruby = ExclusionRule(organization = org.jruby)
 val excludeThrift = ExclusionRule(organization = org.apache.thrift)
 val excludeServletApi = ExclusionRule(organization = javax.servlet,
 artifact = servlet-api)
 val excludeJUnit = ExclusionRule(organization = junit)

 I found the link (
 http://apache-spark-user-list.1001560.n3.nabble.com/Fwd-SecurityException-when-running-tests-with-Spark-1-0-0-td6747.html#a6749
 ) talking about the issue and the work around of the same.
 But that work around does not get rid of the problem for me.
 I am using an SBT build which can't be changed to maven.

 What am I missing?


 Stack trace
 -
 [info] FiltersRDDSpec:
 [info] - Spark Filter *** FAILED ***
 [info]   java.lang.SecurityException: class
 javax.servlet.FilterRegistration's signer information does not match
 signer information of other classes in the same package
 [info]   at java.lang.ClassLoader.checkCerts(Unknown Source)
 [info]   at java.lang.ClassLoader.preDefineClass(Unknown Source)
 [info]   at java.lang.ClassLoader.defineClass(Unknown Source)
 [info]   at java.security.SecureClassLoader.defineClass(Unknown Source)
 [info]   at java.net.URLClassLoader.defineClass(Unknown Source)
 [info]   at java.net.URLClassLoader.access$100(Unknown Source)
 [info]   at java.net.URLClassLoader$1.run(Unknown Source)
 [info]   at java.net.URLClassLoader$1.run(Unknown Source)
 [info]   at java.security.AccessController.doPrivileged(Native Method)
 [info]   at java.net.URLClassLoader.findClass(Unknown Source)

 Thanks
 Manas
  Manas Kar

 --
 View this message in context: Spark unit test fails
 http://apache-spark-user-list.1001560.n3.nabble.com/Spark-unit-test-fails-tp22368.html
 Sent from the Apache Spark User List mailing list archive
 http://apache-spark-user-list.1001560.n3.nabble.com/ at Nabble.com.



Spark unit test fails

2015-04-03 Thread Manas Kar
Hi experts,
 I am trying to write unit tests for my spark application which fails with
javax.servlet.FilterRegistration error.

I am using CDH5.3.2 Spark and below is my dependencies list.
val spark = "1.2.0-cdh5.3.2"
val esriGeometryAPI = "1.2"
val csvWriter = "1.0.0"
val hadoopClient = "2.3.0"
val scalaTest = "2.2.1"
val jodaTime = "1.6.0"
val scalajHTTP = "1.0.1"
val avro = "1.7.7"
val scopt = "3.2.0"
val config = "1.2.1"
val jobserver = "0.4.1"
val excludeJBossNetty = ExclusionRule(organization = "org.jboss.netty")
val excludeIONetty = ExclusionRule(organization = "io.netty")
val excludeEclipseJetty = ExclusionRule(organization = "org.eclipse.jetty")
val excludeMortbayJetty = ExclusionRule(organization = "org.mortbay.jetty")
val excludeAsm = ExclusionRule(organization = "org.ow2.asm")
val excludeOldAsm = ExclusionRule(organization = "asm")
val excludeCommonsLogging = ExclusionRule(organization = "commons-logging")
val excludeSLF4J = ExclusionRule(organization = "org.slf4j")
val excludeScalap = ExclusionRule(organization = "org.scala-lang", artifact = "scalap")
val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop")
val excludeCurator = ExclusionRule(organization = "org.apache.curator")
val excludePowermock = ExclusionRule(organization = "org.powermock")
val excludeFastutil = ExclusionRule(organization = "it.unimi.dsi")
val excludeJruby = ExclusionRule(organization = "org.jruby")
val excludeThrift = ExclusionRule(organization = "org.apache.thrift")
val excludeServletApi = ExclusionRule(organization = "javax.servlet", artifact = "servlet-api")
val excludeJUnit = ExclusionRule(organization = "junit")
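
For reference, a hedged sketch of how such rules are typically attached to the
Spark dependency in build.sbt (the module name spark-core is an assumption, not
taken from this thread):

    // apply the exclusion rules when pulling in Spark
    libraryDependencies += "org.apache.spark" %% "spark-core" % spark % "provided" excludeAll(
      excludeEclipseJetty, excludeMortbayJetty, excludeServletApi, excludeJUnit
    )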

I found the link (
http://apache-spark-user-list.1001560.n3.nabble.com/Fwd-SecurityException-when-running-tests-with-Spark-1-0-0-td6747.html#a6749
) talking about the issue and the work around of the same.
But that work around does not get rid of the problem for me.
I am using an SBT build which can't be changed to maven.

What am I missing?


Stack trace
-
[info] FiltersRDDSpec:
[info] - Spark Filter *** FAILED ***
[info]   java.lang.SecurityException: class
javax.servlet.FilterRegistration's signer information does not match
signer information of other classes in the same package
[info]   at java.lang.ClassLoader.checkCerts(Unknown Source)
[info]   at java.lang.ClassLoader.preDefineClass(Unknown Source)
[info]   at java.lang.ClassLoader.defineClass(Unknown Source)
[info]   at java.security.SecureClassLoader.defineClass(Unknown Source)
[info]   at java.net.URLClassLoader.defineClass(Unknown Source)
[info]   at java.net.URLClassLoader.access$100(Unknown Source)
[info]   at java.net.URLClassLoader$1.run(Unknown Source)
[info]   at java.net.URLClassLoader$1.run(Unknown Source)
[info]   at java.security.AccessController.doPrivileged(Native Method)
[info]   at java.net.URLClassLoader.findClass(Unknown Source)

Thanks
Manas


Re: rdd.toDF().saveAsParquetFile(tachyon://host:19998/test)

2015-03-28 Thread Yin Huai
You are hitting https://issues.apache.org/jira/browse/SPARK-6330. It has
been fixed in 1.3.1, which will be released soon.

On Fri, Mar 27, 2015 at 10:42 PM, sud_self 852677...@qq.com wrote:

 spark version is 1.3.0 with tanhyon-0.6.1

 QUESTION DESCRIPTION: rdd.saveAsObjectFile(tachyon://host:19998/test)
 and
 rdd.saveAsTextFile(tachyon://host:19998/test)  succeed,   but
 rdd.toDF().saveAsParquetFile(tachyon://host:19998/test) failure.

 ERROR MESSAGE:java.lang.IllegalArgumentException: Wrong FS:
 tachyon://host:19998/test, expected: hdfs://host:8020
 at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
 at
 org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:465)
 at

 org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$6.apply(newParquet.scala:252)
 at

 org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$6.apply(newParquet.scala:251)



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/rdd-toDF-saveAsParquetFile-tachyon-host-19998-test-tp22264.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




rdd.toDF().saveAsParquetFile(tachyon://host:19998/test)

2015-03-27 Thread sud_self
spark version is 1.3.0 with tachyon-0.6.1

QUESTION DESCRIPTION: rdd.saveAsObjectFile("tachyon://host:19998/test") and
rdd.saveAsTextFile("tachyon://host:19998/test") succeed, but
rdd.toDF().saveAsParquetFile("tachyon://host:19998/test") fails.

ERROR MESSAGE:java.lang.IllegalArgumentException: Wrong FS:
tachyon://host:19998/test, expected: hdfs://host:8020
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:465)
at
org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$6.apply(newParquet.scala:252)
at
org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$6.apply(newParquet.scala:251)



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/rdd-toDF-saveAsParquetFile-tachyon-host-19998-test-tp22264.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Spark Maven Test error

2015-03-25 Thread zzcclp
I use command to run Unit test, as follow:
./make-distribution.sh --tgz --skip-java-test -Pscala-2.10 -Phadoop-2.3
-Phive -Phive-thriftserver -Pyarn -Dyarn.version=2.3.0-cdh5.1.2
-Dhadoop.version=2.3.0-cdh5.1.2

mvn -Pscala-2.10 -Phadoop-2.3 -Phive -Phive-thriftserver -Pyarn
-Dyarn.version=2.3.0-cdh5.1.2 -Dhadoop.version=2.3.0-cdh5.1.2 -pl
core,launcher,network/common,network/shuffle,sql/core,sql/catalyst,sql/hive
-DwildcardSuites=org.apache.spark.sql.JoinSuite test

Error occur:
---

 T E S T S

---

Running org.apache.spark.network.ProtocolSuite

log4j:WARN No appenders could be found for logger
(io.netty.util.internal.logging.InternalLoggerFactory).

log4j:WARN Please initialize the log4j system properly.

log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for
more info.

Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.251 sec -
in org.apache.spark.network.ProtocolSuite

Running org.apache.spark.network.RpcIntegrationSuite

Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.54 sec -
in org.apache.spark.network.RpcIntegrationSuite

Running org.apache.spark.network.ChunkFetchIntegrationSuite

Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.34 sec -
in org.apache.spark.network.ChunkFetchIntegrationSuite

Running org.apache.spark.network.sasl.SparkSaslSuite

Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.103 sec -
in org.apache.spark.network.sasl.SparkSaslSuite

Running org.apache.spark.network.TransportClientFactorySuite

Tests run: 5, Failures: 0, Errors: 5, Skipped: 0, Time elapsed: 11.42 sec
<<< FAILURE! - in org.apache.spark.network.TransportClientFactorySuite

reuseClientsUpToConfigVariable(org.apache.spark.network.TransportClientFactorySuite)
Time elapsed: 2.332 sec <<< ERROR!

java.lang.IllegalStateException: failed to create a child event loop

at sun.nio.ch.IOUtil.makePipe(Native Method)

at
sun.nio.ch.EPollSelectorImpl.init(EPollSelectorImpl.java:65)

at
sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:36)

at
io.netty.channel.nio.NioEventLoop.openSelector(NioEventLoop.java:126)

at
io.netty.channel.nio.NioEventLoop.init(NioEventLoop.java:120)

at
io.netty.channel.nio.NioEventLoopGroup.newChild(NioEventLoopGroup.java:87)

at
io.netty.util.concurrent.MultithreadEventExecutorGroup.init(MultithreadEventExecutorGroup.java:64)

at
io.netty.channel.MultithreadEventLoopGroup.init(MultithreadEventLoopGroup.java:49)

at
io.netty.channel.nio.NioEventLoopGroup.init(NioEventLoopGroup.java:61)

at
io.netty.channel.nio.NioEventLoopGroup.init(NioEventLoopGroup.java:52)

at
org.apache.spark.network.util.NettyUtils.createEventLoop(NettyUtils.java:56)

at
org.apache.spark.network.client.TransportClientFactory.init(TransportClientFactory.java:104)

at
org.apache.spark.network.TransportContext.createClientFactory(TransportContext.java:76)

at
org.apache.spark.network.TransportContext.createClientFactory(TransportContext.java:80)

at
org.apache.spark.network.TransportClientFactorySuite.testClientReuse(TransportClientFactorySuite.java:86)

at
org.apache.spark.network.TransportClientFactorySuite.reuseClientsUpToConfigVariable(TransportClientFactorySuite.java:131)

 

reuseClientsUpToConfigVariableConcurrent(org.apache.spark.network.TransportClientFactorySuite)
Time elapsed: 2.279 sec <<< ERROR!

java.lang.IllegalStateException: failed to create a child event loop

at sun.nio.ch.IOUtil.makePipe(Native Method)

at
sun.nio.ch.EPollSelectorImpl.init(EPollSelectorImpl.java:65)

at
sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:36)

at
io.netty.channel.nio.NioEventLoop.openSelector(NioEventLoop.java:126)

at
io.netty.channel.nio.NioEventLoop.init(NioEventLoop.java:120)

at
io.netty.channel.nio.NioEventLoopGroup.newChild(NioEventLoopGroup.java:87)

at
io.netty.util.concurrent.MultithreadEventExecutorGroup.init(MultithreadEventExecutorGroup.java:64)

at
io.netty.channel.MultithreadEventLoopGroup.init(MultithreadEventLoopGroup.java:49)

at
io.netty.channel.nio.NioEventLoopGroup.init(NioEventLoopGroup.java:61)

at
io.netty.channel.nio.NioEventLoopGroup.init(NioEventLoopGroup.java:52)

at
org.apache.spark.network.util.NettyUtils.createEventLoop(NettyUtils.java:56)

at
org.apache.spark.network.client.TransportClientFactory.init(TransportClientFactory.java:104)

at
org.apache.spark.network.TransportContext.createClientFactory(TransportContext.java:76)

at
org.apache.spark.network.TransportContext.createClientFactory(TransportContext.java:80

HiveContext test, Spark Context did not initialize after waiting 10000ms

2015-03-06 Thread nitinkak001
I am trying to run a Hive query from Spark using HiveContext. Here is the
code

val conf = new SparkConf().setAppName("HiveSparkIntegrationTest")

conf.set("spark.executor.extraClassPath",
  "/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hive/lib")
conf.set("spark.driver.extraClassPath",
  "/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hive/lib")
conf.set("spark.yarn.am.waitTime", "30L")

val sc = new SparkContext(conf)

val sqlContext = new HiveContext(sc)

def inputRDD = sqlContext.sql("describe spark_poc.src_digital_profile_user")

inputRDD.collect().foreach { println }

println(inputRDD.schema.getClass.getName)

Getting this exception. Any clues? The weird part is if I try to do the same
thing but in Java instead of Scala, it runs fine.

Exception in thread "Driver" java.lang.NullPointerException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:162)
15/03/06 17:39:32 ERROR yarn.ApplicationMaster: SparkContext did not
initialize after waiting for 1 ms. Please check earlier log output for
errors. Failing the application.
Exception in thread "main" java.lang.NullPointerException
at
org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkContextInitialized(ApplicationMaster.scala:218)
at
org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:110)
at
org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:434)
at
org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:53)
at
org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:52)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at
org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:52)
at
org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:433)
at
org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
15/03/06 17:39:32 INFO yarn.ApplicationMaster: AppMaster received a signal.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/HiveContext-test-Spark-Context-did-not-initialize-after-waiting-1ms-tp21953.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: HiveContext test, Spark Context did not initialize after waiting 10000ms

2015-03-06 Thread Marcelo Vanzin
On Fri, Mar 6, 2015 at 2:47 PM, nitinkak001 nitinkak...@gmail.com wrote:
 I am trying to run a Hive query from Spark using HiveContext. Here is the
 code

 / val conf = new SparkConf().setAppName(HiveSparkIntegrationTest)


 conf.set(spark.executor.extraClassPath,
 /opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hive/lib);
 conf.set(spark.driver.extraClassPath,
 /opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hive/lib);
 conf.set(spark.yarn.am.waitTime, 30L)

You're missing /* at the end of your classpath entries. Also, since
you're on CDH 5.2, you'll probably need to filter out the guava jar
from Hive's lib directory, otherwise things might break. So things
will get a little more complicated.

With CDH 5.3 you shouldn't need to filter out the guava jar.
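
A minimal sketch of the corrected settings (the CDH paths come from the thread;
everything else is an assumption):

    val conf = new SparkConf()
      .setAppName("HiveSparkIntegrationTest")
      // append "/*" so the classpath picks up the jars inside the directory;
      // on CDH 5.2 you would additionally keep Hive's guava jar off this path
      .set("spark.executor.extraClassPath",
        "/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hive/lib/*")
      .set("spark.driver.extraClassPath",
        "/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hive/lib/*")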

-- 
Marcelo

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Spark Standard Application to Test

2015-02-25 Thread danilopds
Hello,
I am preparing some tests to execute in Spark in order to manipulate
properties and check the variations in results.

For this, I need to use a standard application in my environment, like the
well-known Hadoop benchmarks:  Terasort
https://hadoop.apache.org/docs/current/api/org/apache/hadoop/examples/terasort/package-summary.html
and especially  Terrier http://terrier.org/  or something similar. I do not
need applications like wordcount and grep because I have already used them.

Can anyone suggest me something about this?
Thanks!



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Standard-Application-to-Test-tp21803.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Test

2015-02-12 Thread Dima Zhiyanov


Sent from my iPhone

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



java.lang.NoClassDefFoundError: io/netty/util/TimerTask Error when running sbt test

2015-01-14 Thread Jianguo Li
I am using Spark-1.1.1. When I used sbt test, I ran into the
following exceptions. Any idea how to solve it? Thanks! I think
somebody posted this question before, but no one seemed to have
answered it. Could it be the version of io.netty I put in my
build.sbt? I included a dependency libraryDependencies += "io.netty"
% "netty" % "3.6.6.Final" in my build.sbt file.

java.lang.NoClassDefFoundError: io/netty/util/TimerTask
  at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:72)
  at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:168)
  at org.apache.spark.SparkEnv$.create(SparkEnv.scala:230)
  at org.apache.spark.SparkContext.<init>(SparkContext.scala:204)
  at 
spark.jobserver.util.DefaultSparkContextFactory.makeContext(SparkContextFactory.scala:34)
  at 
spark.jobserver.JobManagerActor.createContextFromConfig(JobManagerActor.scala:255)
  at 
spark.jobserver.JobManagerActor$$anonfun$wrappedReceive$1.applyOrElse(JobManagerActor.scala:104)
  at 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
  at 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
  at 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)...


TestSuiteBase based unit test using a sliding window join timesout

2015-01-07 Thread Enno Shioji
Hi,

I extended org.apache.spark.streaming.TestSuiteBase for some testing, and I
was able to run this test fine:

test("Sliding window join with 3 second window duration") {

  val input1 =
    Seq(
      Seq("req1"),
      Seq("req2", "req3"),
      Seq(),
      Seq("req4", "req5", "req6"),
      Seq("req7"),
      Seq(),
      Seq()
    )

  val input2 =
    Seq(
      Seq(("tx1", "req1")),
      Seq(),
      Seq(("tx2", "req3")),
      Seq(("tx3", "req2")),
      Seq(),
      Seq(("tx4", "req7")),
      Seq(("tx5", "req5"), ("tx6", "req4"))
    )

  val expectedOutput = Seq(
    Seq(("req1", (1, "tx1"))),
    Seq(),
    Seq(("req3", (1, "tx2"))),
    Seq(("req2", (1, "tx3"))),
    Seq(),
    Seq(("req7", (1, "tx4"))),
    Seq()
  )

  val operation = (rq: DStream[String], tx: DStream[(String, String)]) => {
    rq.window(Seconds(3), Seconds(1)).map(x => (x, 1)).join(tx.map { case (k, v) => (v, k) })
  }

  testOperation(input1, input2, operation, expectedOutput, useSet = true)
}

However, this seemingly OK looking test fails with operation timeout:

test("Sliding window join with 3 second window duration + a tumbling window") {

  val input1 =
    Seq(
      Seq("req1"),
      Seq("req2", "req3"),
      Seq(),
      Seq("req4", "req5", "req6"),
      Seq("req7"),
      Seq()
    )

  val input2 =
    Seq(
      Seq(("tx1", "req1")),
      Seq(),
      Seq(("tx2", "req3")),
      Seq(("tx3", "req2")),
      Seq(),
      Seq(("tx4", "req7"))
    )

  val expectedOutput = Seq(
    Seq(("req1", (1, "tx1"))),
    Seq(("req2", (1, "tx3")), ("req3", (1, "tx3"))),
    Seq(("req7", (1, "tx4")))
  )

  val operation = (rq: DStream[String], tx: DStream[(String, String)]) => {
    rq.window(Seconds(3), Seconds(2)).map(x => (x, 1))
      .join(tx.window(Seconds(2), Seconds(2)).map { case (k, v) => (v, k) })
  }

  testOperation(input1, input2, operation, expectedOutput, useSet = true)
}

Stacktrace:
10033 was not less than 1 Operation timed out after 10033 ms
org.scalatest.exceptions.TestFailedException: 10033 was not less than 1
Operation timed out after 10033 ms
at
org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500)
at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555)
at
org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466)
at
org.apache.spark.streaming.TestSuiteBase$class.runStreamsWithPartitions(TestSuiteBase.scala:338)

Does anybody know why this could be?


Why does consuming a RESTful web service (using javax.ws.rs.* and Jsersey) work in unit test but not when submitted to Spark?

2014-12-24 Thread Emre Sevinc
Hello,

I have a piece of code that runs inside Spark Streaming and tries to get
some data from a RESTful web service (that runs locally on my machine). The
code snippet in question is:

 Client client = ClientBuilder.newClient();
 WebTarget target = client.target("http://localhost:/rest");
 target = target.path("annotate")
                .queryParam("text",
                    UrlEscapers.urlFragmentEscaper().escape(spotlightSubmission))
                .queryParam("confidence", 0.3);

 logger.warn("!!! DEBUG !!! target: {}", target.getUri().toString());

 String response =
     target.request().accept(MediaType.APPLICATION_JSON_TYPE).get(String.class);

 logger.warn("!!! DEBUG !!! Spotlight response: {}", response);

When run inside a unit test as follows:

 mvn clean test -Dtest=SpotlightTest#testCountWords

it contacts the RESTful web service and retrieves some data as expected.
But when the same code is run as part of the application that is submitted
to Spark, using spark-submit script I receive the following error:

  java.lang.NoSuchMethodError:
javax.ws.rs.core.MultivaluedMap.addAll(Ljava/lang/Object;[Ljava/lang/Object;)V

I'm using Spark 1.1.0 and for consuming the web service I'm using Jersey in
my project's pom.xml:

 <dependency>
   <groupId>org.glassfish.jersey.containers</groupId>
   <artifactId>jersey-container-servlet-core</artifactId>
   <version>2.14</version>
 </dependency>

So I suspect that when the application is submitted to Spark, somehow
there's a different JAR in the environment that uses a different version of
Jersey / javax.ws.rs.*

Does anybody know which version of Jersey / javax.ws.rs.*  is used in the
Spark environment, or how to solve this conflict?


-- 
Emre Sevinç
https://be.linkedin.com/in/emresevinc/


Re: Why does consuming a RESTful web service (using javax.ws.rs.* and Jsersey) work in unit test but not when submitted to Spark?

2014-12-24 Thread Sean Owen
Your guess is right, that there are two incompatible versions of
Jersey (or really, JAX-RS) in your runtime. Spark doesn't use Jersey,
but its transitive dependencies may, or your transitive dependencies
may.

I don't see Jersey in Spark's dependency tree except from HBase tests,
which in turn only appear in examples, so that's unlikely to be it.
I'd take a look with 'mvn dependency:tree' on your own code first.
Maybe you are including JavaEE 6 for example?

On Wed, Dec 24, 2014 at 12:02 PM, Emre Sevinc emre.sev...@gmail.com wrote:
 Hello,

 I have a piece of code that runs inside Spark Streaming and tries to get
 some data from a RESTful web service (that runs locally on my machine). The
 code snippet in question is:

  Client client = ClientBuilder.newClient();
  WebTarget target = client.target(http://localhost:/rest;);
  target = target.path(annotate)
  .queryParam(text,
 UrlEscapers.urlFragmentEscaper().escape(spotlightSubmission))
  .queryParam(confidence, 0.3);

   logger.warn(!!! DEBUG !!! target: {}, target.getUri().toString());

   String response =
 target.request().accept(MediaType.APPLICATION_JSON_TYPE).get(String.class);

   logger.warn(!!! DEBUG !!! Spotlight response: {}, response);

 When run inside a unit test as follows:

  mvn clean test -Dtest=SpotlightTest#testCountWords

 it contacts the RESTful web service and retrieves some data as expected. But
 when the same code is run as part of the application that is submitted to
 Spark, using spark-submit script I receive the following error:

   java.lang.NoSuchMethodError:
 javax.ws.rs.core.MultivaluedMap.addAll(Ljava/lang/Object;[Ljava/lang/Object;)V

 I'm using Spark 1.1.0 and for consuming the web service I'm using Jersey in
 my project's pom.xml:

  dependency
   groupIdorg.glassfish.jersey.containers/groupId
   artifactIdjersey-container-servlet-core/artifactId
   version2.14/version
 /dependency

 So I suspect that when the application is submitted to Spark, somehow
 there's a different JAR in the environment that uses a different version of
 Jersey / javax.ws.rs.*

 Does anybody know which version of Jersey / javax.ws.rs.*  is used in the
 Spark environment, or how to solve this conflict?


 --
 Emre Sevinç
 https://be.linkedin.com/in/emresevinc/


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Why does consuming a RESTful web service (using javax.ws.rs.* and Jsersey) work in unit test but not when submitted to Spark?

2014-12-24 Thread Emre Sevinc
[INFO] |  |  \- javax.validation:validation-api:jar:1.1.0.Final:compile
[INFO] |  \- javax.ws.rs:javax.ws.rs-api:jar:2.0.1:compile
[INFO] +- org.apache.hadoop:hadoop-client:jar:2.4.0:compile
[INFO] |  +- org.apache.hadoop:hadoop-hdfs:jar:2.4.0:compile
[INFO] |  +- org.apache.hadoop:hadoop-mapreduce-client-app:jar:2.4.0:compile
[INFO] |  |  +-
org.apache.hadoop:hadoop-mapreduce-client-common:jar:2.4.0:compile
[INFO] |  |  |  +- org.apache.hadoop:hadoop-yarn-client:jar:2.4.0:compile
[INFO] |  |  |  |  \- com.sun.jersey:jersey-client:jar:1.9:compile
[INFO] |  |  |  \-
org.apache.hadoop:hadoop-yarn-server-common:jar:2.4.0:compile
[INFO] |  |  \-
org.apache.hadoop:hadoop-mapreduce-client-shuffle:jar:2.4.0:compile
[INFO] |  +- org.apache.hadoop:hadoop-yarn-api:jar:2.4.0:compile
[INFO] |  +-
org.apache.hadoop:hadoop-mapreduce-client-jobclient:jar:2.4.0:compile
[INFO] |  \- org.apache.hadoop:hadoop-annotations:jar:2.4.0:compile
[INFO] +- com.google.guava:guava:jar:16.0:compile
[INFO] +- org.apache.hadoop:hadoop-mapreduce-client-core:jar:2.4.0:compile
[INFO] |  +- org.apache.hadoop:hadoop-yarn-common:jar:2.4.0:compile
[INFO] |  |  +- javax.xml.bind:jaxb-api:jar:2.2.2:compile
[INFO] |  |  |  +- javax.xml.stream:stax-api:jar:1.0-2:compile
[INFO] |  |  |  \- javax.activation:activation:jar:1.1:compile
[INFO] |  |  +- javax.servlet:servlet-api:jar:2.5:compile
[INFO] |  |  +- com.google.inject:guice:jar:3.0:compile
[INFO] |  |  |  +- javax.inject:javax.inject:jar:1:compile
[INFO] |  |  |  \- aopalliance:aopalliance:jar:1.0:compile
[INFO] |  |  \- com.sun.jersey.contribs:jersey-guice:jar:1.9:compile
[INFO] |  +- com.google.protobuf:protobuf-java:jar:2.5.0:compile
[INFO] |  +- org.slf4j:slf4j-api:jar:1.7.5:compile
[INFO] |  +- com.google.inject.extensions:guice-servlet:jar:3.0:compile
[INFO] |  \- io.netty:netty:jar:3.6.2.Final:compile
[INFO] +- json-mapreduce:json-mapreduce:jar:1.0-SNAPSHOT:compile
[INFO] +- org.apache.avro:avro-mapred:jar:1.7.7:compile
[INFO] |  +- org.apache.avro:avro-ipc:jar:1.7.7:compile
[INFO] |  |  +- org.apache.velocity:velocity:jar:1.7:compile
[INFO] |  |  \- org.mortbay.jetty:servlet-api:jar:2.5-20081211:compile
[INFO] |  +- org.apache.avro:avro-ipc:jar:tests:1.7.7:compile
[INFO] |  +- org.codehaus.jackson:jackson-core-asl:jar:1.9.13:compile
[INFO] |  \- org.codehaus.jackson:jackson-mapper-asl:jar:1.9.13:compile
[INFO] +- junit:junit:jar:4.11:test
[INFO] |  \- org.hamcrest:hamcrest-core:jar:1.3:test
[INFO] +- org.apache.avro:avro:jar:1.7.7:compile
[INFO] |  +- com.thoughtworks.paranamer:paranamer:jar:2.3:compile
[INFO] |  +- org.xerial.snappy:snappy-java:jar:1.0.5:compile
[INFO] |  \- org.apache.commons:commons-compress:jar:1.4.1:compile
[INFO] | \- org.tukaani:xz:jar:1.0:compile
[INFO] +- org.apache.hadoop:hadoop-common:jar:2.4.0:provided
[INFO] |  +- commons-cli:commons-cli:jar:1.2:compile
[INFO] |  +- org.apache.commons:commons-math3:jar:3.1.1:provided
[INFO] |  +- xmlenc:xmlenc:jar:0.52:compile
[INFO] |  +- commons-httpclient:commons-httpclient:jar:3.1:provided
[INFO] |  +- commons-codec:commons-codec:jar:1.4:compile
[INFO] |  +- commons-io:commons-io:jar:2.4:compile
[INFO] |  +- commons-net:commons-net:jar:3.1:provided
[INFO] |  +- commons-collections:commons-collections:jar:3.2.1:compile
[INFO] |  +- org.mortbay.jetty:jetty:jar:6.1.26:compile
[INFO] |  +- org.mortbay.jetty:jetty-util:jar:6.1.26:compile
[INFO] |  +- com.sun.jersey:jersey-core:jar:1.9:compile
[INFO] |  +- com.sun.jersey:jersey-json:jar:1.9:compile
[INFO] |  |  +- org.codehaus.jettison:jettison:jar:1.1:compile
[INFO] |  |  +- com.sun.xml.bind:jaxb-impl:jar:2.2.3-1:compile
[INFO] |  |  +- org.codehaus.jackson:jackson-jaxrs:jar:1.8.3:compile
[INFO] |  |  \- org.codehaus.jackson:jackson-xc:jar:1.8.3:compile
[INFO] |  +- com.sun.jersey:jersey-server:jar:1.9:compile
[INFO] |  |  \- asm:asm:jar:3.1:compile
[INFO] |  +- tomcat:jasper-compiler:jar:5.5.23:provided
[INFO] |  +- tomcat:jasper-runtime:jar:5.5.23:provided
[INFO] |  +- javax.servlet.jsp:jsp-api:jar:2.1:provided
[INFO] |  +- commons-el:commons-el:jar:1.0:provided
[INFO] |  +- commons-logging:commons-logging:jar:1.1.3:compile
[INFO] |  +- log4j:log4j:jar:1.2.17:compile
[INFO] |  +- net.java.dev.jets3t:jets3t:jar:0.9.0:provided
[INFO] |  |  +- org.apache.httpcomponents:httpclient:jar:4.1.2:provided
[INFO] |  |  +- org.apache.httpcomponents:httpcore:jar:4.1.2:provided
[INFO] |  |  \- com.jamesmurty.utils:java-xmlbuilder:jar:0.4:provided
[INFO] |  +- commons-lang:commons-lang:jar:2.6:compile
[INFO] |  +- commons-configuration:commons-configuration:jar:1.6:provided
[INFO] |  |  +- commons-digester:commons-digester:jar:1.8:provided
[INFO] |  |  |  \- commons-beanutils:commons-beanutils:jar:1.7.0:provided
[INFO] |  |  \- commons-beanutils:commons-beanutils-core:jar:1.8.0:provided
[INFO] |  +- org.apache.hadoop:hadoop-auth:jar:2.4.0:provided
[INFO] |  +- com.jcraft:jsch:jar:0.1.42:provided
[INFO] |  +- com.google.code.findbugs:jsr305:jar:1.3.9:provided
[INFO

Re: Why does consuming a RESTful web service (using javax.ws.rs.* and Jsersey) work in unit test but not when submitted to Spark?

2014-12-24 Thread Emre Sevinc
It seems like YARN depends on an older version of Jersey, namely 1.9:

  https://github.com/apache/spark/blob/master/yarn/pom.xml

When I've modified my dependencies to have only:

  <dependency>
    <groupId>com.sun.jersey</groupId>
    <artifactId>jersey-core</artifactId>
    <version>1.9.1</version>
  </dependency>

And then modified the code to use the old Jersey API:

Client c = Client.create();
WebResource r = c.resource("http://localhost:/rest")
                 .path("annotate")
                 .queryParam("text",
                     UrlEscapers.urlFragmentEscaper().escape(spotlightSubmission))
                 .queryParam("confidence", "0.3");

logger.warn("!!! DEBUG !!! target: {}", r.getURI());

String response = r.accept(MediaType.APPLICATION_JSON_TYPE)
                   //.header()
                   .get(String.class);

logger.warn("!!! DEBUG !!! Spotlight response: {}", response);

It seems to work when I use spark-submit to submit the application that
includes this code.

Funny thing is, now my relevant unit test does not run, complaining about
not having enough memory:

Java HotSpot(TM) 64-Bit Server VM warning: INFO:
os::commit_memory(0xc490, 25165824, 0) failed; error='Cannot
allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 25165824 bytes for
committing reserved memory.

--
Emre


On Wed, Dec 24, 2014 at 1:46 PM, Sean Owen so...@cloudera.com wrote:

 Your guess is right, that there are two incompatible versions of
 Jersey (or really, JAX-RS) in your runtime. Spark doesn't use Jersey,
 but its transitive dependencies may, or your transitive dependencies
 may.

 I don't see Jersey in Spark's dependency tree except from HBase tests,
 which in turn only appear in examples, so that's unlikely to be it.
 I'd take a look with 'mvn dependency:tree' on your own code first.
 Maybe you are including JavaEE 6 for example?

 On Wed, Dec 24, 2014 at 12:02 PM, Emre Sevinc emre.sev...@gmail.com
 wrote:
  Hello,
 
  I have a piece of code that runs inside Spark Streaming and tries to get
  some data from a RESTful web service (that runs locally on my machine).
 The
  code snippet in question is:
 
   Client client = ClientBuilder.newClient();
   WebTarget target = client.target(http://localhost:/rest;);
   target = target.path(annotate)
   .queryParam(text,
  UrlEscapers.urlFragmentEscaper().escape(spotlightSubmission))
   .queryParam(confidence, 0.3);
 
logger.warn(!!! DEBUG !!! target: {},
 target.getUri().toString());
 
String response =
 
 target.request().accept(MediaType.APPLICATION_JSON_TYPE).get(String.class);
 
logger.warn(!!! DEBUG !!! Spotlight response: {}, response);
 
  When run inside a unit test as follows:
 
   mvn clean test -Dtest=SpotlightTest#testCountWords
 
  it contacts the RESTful web service and retrieves some data as expected.
 But
  when the same code is run as part of the application that is submitted to
  Spark, using spark-submit script I receive the following error:
 
java.lang.NoSuchMethodError:
 
 javax.ws.rs.core.MultivaluedMap.addAll(Ljava/lang/Object;[Ljava/lang/Object;)V
 
  I'm using Spark 1.1.0 and for consuming the web service I'm using Jersey
 in
  my project's pom.xml:
 
   dependency
groupIdorg.glassfish.jersey.containers/groupId
artifactIdjersey-container-servlet-core/artifactId
version2.14/version
  /dependency
 
  So I suspect that when the application is submitted to Spark, somehow
  there's a different JAR in the environment that uses a different version
 of
  Jersey / javax.ws.rs.*
 
  Does anybody know which version of Jersey / javax.ws.rs.*  is used in the
  Spark environment, or how to solve this conflict?
 
 
  --
  Emre Sevinç
  https://be.linkedin.com/in/emresevinc/
 




-- 
Emre Sevinc


Re: Why does consuming a RESTful web service (using javax.ws.rs.* and Jsersey) work in unit test but not when submitted to Spark?

2014-12-24 Thread Sean Owen
That could well be it -- oops, I forgot to run with the YARN profile
and so didn't see the YARN dependencies. Try the userClassPathFirst
option to make your app's copy take precedence.
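
A hedged sketch; the exact key depends on the Spark version and deploy mode, so
treat the names below as assumptions to verify against the docs for your release:

    // Spark 1.1/1.2 spelling (executors only); later releases use
    // spark.executor.userClassPathFirst / spark.driver.userClassPathFirst
    val conf = new SparkConf().set("spark.files.userClassPathFirst", "true")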

The second error is really a JVM bug, but it comes from having too little
memory available for the unit tests.

http://spark.apache.org/docs/latest/building-spark.html#setting-up-mavens-memory-usage

On Wed, Dec 24, 2014 at 1:56 PM, Emre Sevinc emre.sev...@gmail.com wrote:
 It seems like YARN depends an older version of Jersey, that is 1.9:

   https://github.com/apache/spark/blob/master/yarn/pom.xml

 When I've modified my dependencies to have only:

   dependency
   groupIdcom.sun.jersey/groupId
   artifactIdjersey-core/artifactId
   version1.9.1/version
 /dependency

 And then modified the code to use the old Jersey API:

 Client c = Client.create();
 WebResource r = c.resource(http://localhost:/rest;)
  .path(annotate)
  .queryParam(text,
 UrlEscapers.urlFragmentEscaper().escape(spotlightSubmission))
  .queryParam(confidence, 0.3);

 logger.warn(!!! DEBUG !!! target: {}, r.getURI());

 String response = r.accept(MediaType.APPLICATION_JSON_TYPE)
//.header()
.get(String.class);

 logger.warn(!!! DEBUG !!! Spotlight response: {}, response);

 It seems to work when I use spark-submit to submit the application that
 includes this code.

 Funny thing is, now my relevant unit test does not run, complaining about
 not having enough memory:

 Java HotSpot(TM) 64-Bit Server VM warning: INFO:
 os::commit_memory(0xc490, 25165824, 0) failed; error='Cannot
 allocate memory' (errno=12)
 #
 # There is insufficient memory for the Java Runtime Environment to continue.
 # Native memory allocation (mmap) failed to map 25165824 bytes for
 committing reserved memory.

 --
 Emre


 On Wed, Dec 24, 2014 at 1:46 PM, Sean Owen so...@cloudera.com wrote:

 Your guess is right, that there are two incompatible versions of
 Jersey (or really, JAX-RS) in your runtime. Spark doesn't use Jersey,
 but its transitive dependencies may, or your transitive dependencies
 may.

 I don't see Jersey in Spark's dependency tree except from HBase tests,
 which in turn only appear in examples, so that's unlikely to be it.
 I'd take a look with 'mvn dependency:tree' on your own code first.
 Maybe you are including JavaEE 6 for example?

 On Wed, Dec 24, 2014 at 12:02 PM, Emre Sevinc emre.sev...@gmail.com
 wrote:
  Hello,
 
  I have a piece of code that runs inside Spark Streaming and tries to get
  some data from a RESTful web service (that runs locally on my machine).
  The
  code snippet in question is:
 
   Client client = ClientBuilder.newClient();
   WebTarget target = client.target(http://localhost:/rest;);
   target = target.path(annotate)
   .queryParam(text,
  UrlEscapers.urlFragmentEscaper().escape(spotlightSubmission))
   .queryParam(confidence, 0.3);
 
logger.warn(!!! DEBUG !!! target: {},
  target.getUri().toString());
 
String response =
 
  target.request().accept(MediaType.APPLICATION_JSON_TYPE).get(String.class);
 
logger.warn(!!! DEBUG !!! Spotlight response: {}, response);
 
  When run inside a unit test as follows:
 
   mvn clean test -Dtest=SpotlightTest#testCountWords
 
  it contacts the RESTful web service and retrieves some data as expected.
  But
  when the same code is run as part of the application that is submitted
  to
  Spark, using spark-submit script I receive the following error:
 
java.lang.NoSuchMethodError:
 
  javax.ws.rs.core.MultivaluedMap.addAll(Ljava/lang/Object;[Ljava/lang/Object;)V
 
  I'm using Spark 1.1.0 and for consuming the web service I'm using Jersey
  in
  my project's pom.xml:
 
   dependency
groupIdorg.glassfish.jersey.containers/groupId
artifactIdjersey-container-servlet-core/artifactId
version2.14/version
  /dependency
 
  So I suspect that when the application is submitted to Spark, somehow
  there's a different JAR in the environment that uses a different version
  of
  Jersey / javax.ws.rs.*
 
  Does anybody know which version of Jersey / javax.ws.rs.*  is used in
  the
  Spark environment, or how to solve this conflict?
 
 
  --
  Emre Sevinç
  https://be.linkedin.com/in/emresevinc/
 




 --
 Emre Sevinc

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Why does consuming a RESTful web service (using javax.ws.rs.* and Jsersey) work in unit test but not when submitted to Spark?

2014-12-24 Thread Emre Sevinc
Sean,

Thanks a lot for the important information, especially  userClassPathFirst.

Cheers,
Emre

On Wed, Dec 24, 2014 at 3:38 PM, Sean Owen so...@cloudera.com wrote:

 That could well be it -- oops, I forgot to run with the YARN profile
 and so didn't see the YARN dependencies. Try the userClassPathFirst
 option to try to make your app's copy take precedence.

 The second error is really a JVM bug, but, is from having too little
 memory available for the unit tests.


 http://spark.apache.org/docs/latest/building-spark.html#setting-up-mavens-memory-usage

 On Wed, Dec 24, 2014 at 1:56 PM, Emre Sevinc emre.sev...@gmail.com
 wrote:
  It seems like YARN depends an older version of Jersey, that is 1.9:
 
https://github.com/apache/spark/blob/master/yarn/pom.xml
 
  When I've modified my dependencies to have only:
 
dependency
groupIdcom.sun.jersey/groupId
artifactIdjersey-core/artifactId
version1.9.1/version
  /dependency
 
  And then modified the code to use the old Jersey API:
 
  Client c = Client.create();
  WebResource r = c.resource(http://localhost:/rest;)
   .path(annotate)
   .queryParam(text,
  UrlEscapers.urlFragmentEscaper().escape(spotlightSubmission))
   .queryParam(confidence, 0.3);
 
  logger.warn(!!! DEBUG !!! target: {}, r.getURI());
 
  String response = r.accept(MediaType.APPLICATION_JSON_TYPE)
 //.header()
 .get(String.class);
 
  logger.warn(!!! DEBUG !!! Spotlight response: {}, response);
 
  It seems to work when I use spark-submit to submit the application that
  includes this code.
 
  Funny thing is, now my relevant unit test does not run, complaining about
  not having enough memory:
 
  Java HotSpot(TM) 64-Bit Server VM warning: INFO:
  os::commit_memory(0xc490, 25165824, 0) failed; error='Cannot
  allocate memory' (errno=12)
  #
  # There is insufficient memory for the Java Runtime Environment to
 continue.
  # Native memory allocation (mmap) failed to map 25165824 bytes for
  committing reserved memory.
 
  --
  Emre
 
 
  On Wed, Dec 24, 2014 at 1:46 PM, Sean Owen so...@cloudera.com wrote:
 
  Your guess is right, that there are two incompatible versions of
  Jersey (or really, JAX-RS) in your runtime. Spark doesn't use Jersey,
  but its transitive dependencies may, or your transitive dependencies
  may.
 
  I don't see Jersey in Spark's dependency tree except from HBase tests,
  which in turn only appear in examples, so that's unlikely to be it.
  I'd take a look with 'mvn dependency:tree' on your own code first.
  Maybe you are including JavaEE 6 for example?
 
  On Wed, Dec 24, 2014 at 12:02 PM, Emre Sevinc emre.sev...@gmail.com
  wrote:
   Hello,
  
   I have a piece of code that runs inside Spark Streaming and tries to
 get
   some data from a RESTful web service (that runs locally on my
 machine).
   The
   code snippet in question is:
  
Client client = ClientBuilder.newClient();
WebTarget target = client.target(http://localhost:/rest;);
target = target.path(annotate)
.queryParam(text,
   UrlEscapers.urlFragmentEscaper().escape(spotlightSubmission))
.queryParam(confidence, 0.3);
  
 logger.warn(!!! DEBUG !!! target: {},
   target.getUri().toString());
  
 String response =
  
  
 target.request().accept(MediaType.APPLICATION_JSON_TYPE).get(String.class);
  
 logger.warn(!!! DEBUG !!! Spotlight response: {}, response);
  
   When run inside a unit test as follows:
  
mvn clean test -Dtest=SpotlightTest#testCountWords
  
   it contacts the RESTful web service and retrieves some data as
 expected.
   But
   when the same code is run as part of the application that is submitted
   to
   Spark, using spark-submit script I receive the following error:
  
 java.lang.NoSuchMethodError:
  
  
 javax.ws.rs.core.MultivaluedMap.addAll(Ljava/lang/Object;[Ljava/lang/Object;)V
  
   I'm using Spark 1.1.0 and for consuming the web service I'm using
 Jersey
   in
   my project's pom.xml:
  
dependency
 groupIdorg.glassfish.jersey.containers/groupId
 artifactIdjersey-container-servlet-core/artifactId
 version2.14/version
   /dependency
  
   So I suspect that when the application is submitted to Spark, somehow
   there's a different JAR in the environment that uses a different
 version
   of
   Jersey / javax.ws.rs.*
  
   Does anybody know which version of Jersey / javax.ws.rs.*  is used in
   the
   Spark environment, or how to solve this conflict?
  
  
   --
   Emre Sevinç
   https://be.linkedin.com/in/emresevinc/
  
 
 
 
 
  --
  Emre Sevinc




-- 
Emre Sevinc


Re: Intermittent test failures

2014-12-17 Thread Marius Soutier
Using TestSQLContext from multiple tests leads to:

SparkException: Task not serializable

ERROR ContextCleaner: Error cleaning broadcast 10
java.lang.NullPointerException
at 
org.apache.spark.broadcast.TorrentBroadcast$.unpersist(TorrentBroadcast.scala:246)
at 
org.apache.spark.broadcast.TorrentBroadcastFactory.unbroadcast(TorrentBroadcastFactory.scala:46)
at 
org.apache.spark.broadcast.BroadcastManager.unbroadcast(BroadcastManager.scala:66)
at 
org.apache.spark.ContextCleaner.doCleanupBroadcast(ContextCleaner.scala:185)
at 
org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:147)
at 
org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:138)
at scala.Option.foreach(Option.scala:236)


On 15.12.2014, at 22:36, Marius Soutier mps@gmail.com wrote:

 Ok, maybe these test versions will help me then. I’ll check it out.
 
 On 15.12.2014, at 22:33, Michael Armbrust mich...@databricks.com wrote:
 
 Using a single SparkContext should not cause this problem.  In the SQL tests 
 we use TestSQLContext and TestHive which are global singletons for all of 
 our unit testing.
 
 On Mon, Dec 15, 2014 at 1:27 PM, Marius Soutier mps@gmail.com wrote:
 Possible, yes, although I’m trying everything I can to prevent it, i.e. fork 
 in Test := true and isolated. Can you confirm that reusing a single 
 SparkContext for multiple tests poses a problem as well?
 
 Other than that, just switching from SQLContext to HiveContext also provoked 
 the error.
 
 
 On 15.12.2014, at 20:22, Michael Armbrust mich...@databricks.com wrote:
 
 Is it possible that you are starting more than one SparkContext in a single 
 JVM with out stopping previous ones?  I'd try testing with Spark 1.2, which 
 will throw an exception in this case.
 
 On Mon, Dec 15, 2014 at 8:48 AM, Marius Soutier mps@gmail.com wrote:
 Hi,
 
 I’m seeing strange, random errors when running unit tests for my Spark 
 jobs. In this particular case I’m using Spark SQL to read and write Parquet 
 files, and one error that I keep running into is this one:
 
 org.apache.spark.SparkException: Job aborted due to stage failure: Task 19 
 in stage 6.0 failed 1 times, most recent failure: Lost task 19.0 in stage 
 6.0 (TID 223, localhost): java.io.IOException: PARSING_ERROR(2)
 org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:78)
 org.xerial.snappy.SnappyNative.uncompressedLength(Native Method)
 org.xerial.snappy.Snappy.uncompressedLength(Snappy.java:545)
 
 I can only prevent this from happening by using isolated Specs tests thats 
 always create a new SparkContext that is not shared between tests (but 
 there can also be only a single SparkContext per test), and also by using 
 standard SQLContext instead of HiveContext. It does not seem to have 
 anything to do with the actual files that I also create during the test run 
 with SQLContext.saveAsParquetFile.
 
 
 Cheers
 - Marius
 
 
 PS The full trace:
 
 org.apache.spark.SparkException: Job aborted due to stage failure: Task 19 
 in stage 6.0 failed 1 times, most recent failure: Lost task 19.0 in stage 
 6.0 (TID 223, localhost): java.io.IOException: PARSING_ERROR(2)
 org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:78)
 org.xerial.snappy.SnappyNative.uncompressedLength(Native Method)
 org.xerial.snappy.Snappy.uncompressedLength(Snappy.java:545)
 
 org.xerial.snappy.SnappyInputStream.readFully(SnappyInputStream.java:125)
 
 org.xerial.snappy.SnappyInputStream.readHeader(SnappyInputStream.java:88)
 
 org.xerial.snappy.SnappyInputStream.init(SnappyInputStream.java:58)
 
 org.apache.spark.io.SnappyCompressionCodec.compressedInputStream(CompressionCodec.scala:128)
 
 org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:232)
 
 org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readObject$1.apply$mcV$sp(TorrentBroadcast.scala:169)
 org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:927)
 
 org.apache.spark.broadcast.TorrentBroadcast.readObject(TorrentBroadcast.scala:155)
 sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 java.lang.reflect.Method.invoke(Method.java:606)
 
 java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
 
 java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
 
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
 java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
 
 java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990

Intermittent test failures

2014-12-15 Thread Marius Soutier
Hi,

I’m seeing strange, random errors when running unit tests for my Spark jobs. In 
this particular case I’m using Spark SQL to read and write Parquet files, and 
one error that I keep running into is this one:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 19 in 
stage 6.0 failed 1 times, most recent failure: Lost task 19.0 in stage 6.0 (TID 
223, localhost): java.io.IOException: PARSING_ERROR(2)
org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:78)
org.xerial.snappy.SnappyNative.uncompressedLength(Native Method)
org.xerial.snappy.Snappy.uncompressedLength(Snappy.java:545)

I can only prevent this from happening by using isolated Specs tests that
always create a new SparkContext that is not shared between tests (but there
can also be only a single SparkContext per test), and also by using standard 
SQLContext instead of HiveContext. It does not seem to have anything to do with 
the actual files that I also create during the test run with 
SQLContext.saveAsParquetFile.


Cheers
- Marius


PS The full trace:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 19 in 
stage 6.0 failed 1 times, most recent failure: Lost task 19.0 in stage 6.0 (TID 
223, localhost): java.io.IOException: PARSING_ERROR(2)
org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:78)
org.xerial.snappy.SnappyNative.uncompressedLength(Native Method)
org.xerial.snappy.Snappy.uncompressedLength(Snappy.java:545)

org.xerial.snappy.SnappyInputStream.readFully(SnappyInputStream.java:125)

org.xerial.snappy.SnappyInputStream.readHeader(SnappyInputStream.java:88)
org.xerial.snappy.SnappyInputStream.init(SnappyInputStream.java:58)

org.apache.spark.io.SnappyCompressionCodec.compressedInputStream(CompressionCodec.scala:128)

org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:232)

org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readObject$1.apply$mcV$sp(TorrentBroadcast.scala:169)
org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:927)

org.apache.spark.broadcast.TorrentBroadcast.readObject(TorrentBroadcast.scala:155)
sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)

sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:606)
java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)

java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)

java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)

org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)

org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:160)

java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
 ~[spark-core_2.10-1.1.1.jar:1.1.1]
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
 ~[spark-core_2.10-1.1.1.jar:1.1.1]
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
 ~[spark-core_2.10-1.1.1.jar:1.1.1]
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) 
~[scala-library.jar:na]
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) 
~[scala-library.jar:na]
at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173) 
~[spark-core_2.10-1.1.1.jar:1.1.1]
-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Intermittent test failures

2014-12-15 Thread Michael Armbrust
Is it possible that you are starting more than one SparkContext in a single
JVM without stopping previous ones? I'd try testing with Spark 1.2, which
will throw an exception in this case.
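
As an aside, a minimal ScalaTest fixture along those lines might look like the
sketch below (the suite name, the local[2] master, and the port-clearing step
are illustrative assumptions, not something from this thread):

import org.apache.spark.{SparkConf, SparkContext}
import org.scalatest.{BeforeAndAfterAll, FunSuite}

// Sketch: one SparkContext per suite, stopped in afterAll so no second
// context is ever alive in the same JVM when the next suite starts.
class SharedContextSuite extends FunSuite with BeforeAndAfterAll {

  @transient private var sc: SparkContext = _

  override def beforeAll(): Unit = {
    sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("shared-context-suite"))
  }

  override def afterAll(): Unit = {
    if (sc != null) { sc.stop(); sc = null }
    System.clearProperty("spark.driver.port")  // avoid port clashes between suites
  }

  test("word count on a tiny RDD") {
    assert(sc.parallelize(Seq("a", "b", "a")).countByValue()("a") == 2)
  }
}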

On Mon, Dec 15, 2014 at 8:48 AM, Marius Soutier mps@gmail.com wrote:





Re: Intermittent test failures

2014-12-15 Thread Marius Soutier
Ok, maybe these test versions will help me then. I’ll check it out.

On 15.12.2014, at 22:33, Michael Armbrust mich...@databricks.com wrote:

 Using a single SparkContext should not cause this problem.  In the SQL tests 
 we use TestSQLContext and TestHive which are global singletons for all of our 
 unit testing.
 
 On Mon, Dec 15, 2014 at 1:27 PM, Marius Soutier mps@gmail.com wrote:
 Possible, yes, although I’m trying everything I can to prevent it, i.e. fork 
 in Test := true and isolated. Can you confirm that reusing a single 
 SparkContext for multiple tests poses a problem as well?
 
 Other than that, just switching from SQLContext to HiveContext also provoked 
 the error.
 
 
 On 15.12.2014, at 20:22, Michael Armbrust mich...@databricks.com wrote:
 
 Is it possible that you are starting more than one SparkContext in a single 
 JVM with out stopping previous ones?  I'd try testing with Spark 1.2, which 
 will throw an exception in this case.
 
 On Mon, Dec 15, 2014 at 8:48 AM, Marius Soutier mps@gmail.com wrote:

How can I make Spark Streaming count the words in a file in a unit test?

2014-12-08 Thread Emre Sevinc
Hello,

I've successfully built a very simple Spark Streaming application in Java
that is based on the HdfsCount example in Scala at
https://github.com/apache/spark/blob/branch-1.1/examples/src/main/scala/org/apache/spark/examples/streaming/HdfsWordCount.scala
.

When I submit this application to my local Spark, it waits for a file to be
written to a given directory, and when I create that file it successfully
prints the number of words. I terminate the application by pressing Ctrl+C.

Now I've tried to create a very basic unit test for this functionality, but
in the test I was not able to print the same information, that is the
number of words.

What am I missing?

Below is the unit test file, and after that I've also included the code
snippet that shows the countWords method:

=
StarterAppTest.java
=
import com.google.common.io.Files;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;


import org.junit.*;

import java.io.*;

public class StarterAppTest {

  JavaStreamingContext ssc;
  File tempDir;

  @Before
  public void setUp() {
ssc = new JavaStreamingContext("local", "test", new Duration(3000));
tempDir = Files.createTempDir();
tempDir.deleteOnExit();
  }

  @After
  public void tearDown() {
ssc.stop();
ssc = null;
  }

  @Test
  public void testInitialization() {
Assert.assertNotNull(ssc.sc());
  }


  @Test
  public void testCountWords() {

StarterApp starterApp = new StarterApp();

try {
  JavaDStream<String> lines =
ssc.textFileStream(tempDir.getAbsolutePath());
  JavaPairDStream<String, Integer> wordCounts =
starterApp.countWords(lines);

  System.err.println("========== Word Counts ==========");
  wordCounts.print();
  System.err.println("========== Word Counts ==========");

  ssc.start();

  File tmpFile = new File(tempDir.getAbsolutePath(), "tmp.txt");
  PrintWriter writer = new PrintWriter(tmpFile, "UTF-8");
  writer.println("8-Dec-2014: Emre Emre Emre Ergin Ergin Ergin");
  writer.close();

  System.err.println("========== Word Counts ==========");
  wordCounts.print();
  System.err.println("========== Word Counts ==========");

} catch (FileNotFoundException e) {
  e.printStackTrace();
} catch (UnsupportedEncodingException e) {
  e.printStackTrace();
}


Assert.assertTrue(true);

  }

}
=

This test compiles and starts to run, and Spark Streaming prints a lot of
diagnostic messages on the console, but the calls to wordCounts.print()
do not print anything, whereas in StarterApp.java itself they do.

I've also added ssc.awaitTermination(); after ssc.start() but nothing
changed in that respect. After that I've also tried to create a new file in
the directory that this Spark Streaming application was checking but this
time it gave an error.

For completeness, below is the wordCounts method:


public JavaPairDStream<String, Integer> countWords(JavaDStream<String> lines) {
    JavaDStream<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
      @Override
      public Iterable<String> call(String x) { return Lists.newArrayList(SPACE.split(x)); }
    });

    JavaPairDStream<String, Integer> wordCounts = words.mapToPair(
        new PairFunction<String, String, Integer>() {
          @Override
          public Tuple2<String, Integer> call(String s) { return new Tuple2<>(s, 1); }
        }).reduceByKey((i1, i2) -> i1 + i2);

return wordCounts;
  }
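
One way to make such a test observable (a sketch only, written in Scala for
brevity; the buffer-collecting approach and the 10-second wait are assumptions,
not part of the original post) is to capture each batch with foreachRDD and
wait past at least one batch interval before asserting, instead of relying on
print():

import java.io.{File, PrintWriter}
import scala.collection.mutable.ArrayBuffer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingWordCountCheck {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("streaming-wordcount-check")
    val ssc = new StreamingContext(conf, Seconds(3))
    val dir = java.nio.file.Files.createTempDirectory("wc-test").toFile

    val results = ArrayBuffer.empty[Map[String, Long]]
    val counts = ssc.textFileStream(dir.getAbsolutePath)
      .flatMap(_.split(" "))
      .map(w => (w, 1L))
      .reduceByKey(_ + _)
    // Collect every batch to the driver instead of only printing it.
    counts.foreachRDD(rdd => results.synchronized { results += rdd.collect().toMap })

    ssc.start()
    // Write the input file only after the stream has started, then wait
    // longer than one batch interval before asserting.
    val writer = new PrintWriter(new File(dir, "input.txt"), "UTF-8")
    writer.println("Emre Emre Emre Ergin Ergin Ergin")
    writer.close()
    Thread.sleep(10000)

    ssc.stop()
    assert(results.exists(_.get("Emre") == Some(3L)))
  }
}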




Kind regards
Emre Sevinç


Re: How can I make Spark Streaming count the words in a file in a unit test?

2014-12-08 Thread Burak Yavuz
Hi,

https://github.com/databricks/spark-perf/tree/master/streaming-tests/src/main/scala/streaming/perf
contains some performance tests for streaming. There are examples of how to 
generate synthetic files during the test in that repo, maybe you
can find some code snippets that you can use there.

Best,
Burak

- Original Message -
From: Emre Sevinc emre.sev...@gmail.com
To: user@spark.apache.org
Sent: Monday, December 8, 2014 2:36:41 AM
Subject: How can I make Spark Streaming count the words in a file in a unit 
test?



-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



How can I compile only the core and streaming (so that I can get test utilities of streaming)?

2014-12-05 Thread Emre Sevinc
Hello,

I'm currently developing a Spark Streaming application and trying to write
my first unit test. I've used Java for this application, and I also need
use Java (and JUnit) for writing unit tests.

I could not find any documentation that focuses on Spark Streaming unit
testing, all I could find was the Java based unit tests in Spark Streaming
source code:


https://github.com/apache/spark/blob/branch-1.1/streaming/src/test/java/org/apache/spark/streaming/JavaAPISuite.java

that depends on a Scala file:


https://github.com/apache/spark/blob/branch-1.1/streaming/src/test/java/org/apache/spark/streaming/JavaTestUtils.scala

which, in turn, depends on the Scala test files in


https://github.com/apache/spark/tree/branch-1.1/streaming/src/test/scala/org/apache/spark/streaming

So I thought that I could grab the Spark source code, switch to branch-1.1
branch and then only compile 'core' and 'streaming' modules, hopefully
ending up with the compiled classes (or jar files) of the Streaming test
utilities, so that I can import them in my Java based Spark Streaming
application.

However, trying to build it via the following command line failed:

mvn -pl core,streaming package

You can see the full output at the end of this message.

Any ideas how to progress?

Full output of the build:

emre@emre-ubuntu:~/code/spark$ mvn -pl core,streaming package
[INFO] Scanning for projects...
[INFO]

[INFO] Reactor Build Order:
[INFO]
[INFO] Spark Project Core
[INFO] Spark Project Streaming
[INFO]

[INFO]

[INFO] Building Spark Project Core 1.1.2-SNAPSHOT
[INFO]

Downloading:
https://repo1.maven.org/maven2/org/apache/maven/plugins/maven-antrun-plugin/1.7/maven-antrun-plugin-1.7.pom
Downloaded:
https://repo1.maven.org/maven2/org/apache/maven/plugins/maven-antrun-plugin/1.7/maven-antrun-plugin-1.7.pom
(5 KB at 5.8 KB/sec)
Downloading:
https://repo1.maven.org/maven2/org/apache/maven/plugins/maven-antrun-plugin/1.7/maven-antrun-plugin-1.7.jar
Downloaded:
https://repo1.maven.org/maven2/org/apache/maven/plugins/maven-antrun-plugin/1.7/maven-antrun-plugin-1.7.jar
(31 KB at 200.4 KB/sec)
Downloading:
https://repo1.maven.org/maven2/org/apache/commons/commons-math3/3.3/commons-math3-3.3.pom
Downloaded:
https://repo1.maven.org/maven2/org/apache/commons/commons-math3/3.3/commons-math3-3.3.pom
(24 KB at 178.9 KB/sec)
Downloading:
https://repo1.maven.org/maven2/org/spark-project/akka/akka-testkit_2.10/2.2.3-shaded-protobuf/akka-testkit_2.10-2.2.3-shaded-protobuf.pom
Downloaded:
https://repo1.maven.org/maven2/org/spark-project/akka/akka-testkit_2.10/2.2.3-shaded-protobuf/akka-testkit_2.10-2.2.3-shaded-protobuf.pom
(3 KB at 22.5 KB/sec)
Downloading:
https://repo1.maven.org/maven2/org/scala-lang/scalap/2.10.4/scalap-2.10.4.pom
Downloaded:
https://repo1.maven.org/maven2/org/scala-lang/scalap/2.10.4/scalap-2.10.4.pom
(2 KB at 19.2 KB/sec)
Downloading:
https://repo1.maven.org/maven2/org/apache/derby/derby/10.4.2.0/derby-10.4.2.0.pom
Downloaded:
https://repo1.maven.org/maven2/org/apache/derby/derby/10.4.2.0/derby-10.4.2.0.pom
(2 KB at 14.9 KB/sec)
Downloading:
https://repo1.maven.org/maven2/org/mockito/mockito-all/1.9.0/mockito-all-1.9.0.pom
Downloaded:
https://repo1.maven.org/maven2/org/mockito/mockito-all/1.9.0/mockito-all-1.9.0.pom
(1010 B at 4.1 KB/sec)
Downloading:
https://repo1.maven.org/maven2/org/easymock/easymockclassextension/3.1/easymockclassextension-3.1.pom
Downloaded:
https://repo1.maven.org/maven2/org/easymock/easymockclassextension/3.1/easymockclassextension-3.1.pom
(5 KB at 42.9 KB/sec)
Downloading:
https://repo1.maven.org/maven2/org/easymock/easymock-parent/3.1/easymock-parent-3.1.pom
Downloaded:
https://repo1.maven.org/maven2/org/easymock/easymock-parent/3.1/easymock-parent-3.1.pom
(13 KB at 133.8 KB/sec)
Downloading:
https://repo1.maven.org/maven2/org/easymock/easymock/3.1/easymock-3.1.pom
Downloaded:
https://repo1.maven.org/maven2/org/easymock/easymock/3.1/easymock-3.1.pom
(6 KB at 38.8 KB/sec)
Downloading:
https://repo1

Re: How can I compile only the core and streaming (so that I can get test utilities of streaming)?

2014-12-05 Thread Ted Yu
Please specify '-DskipTests' on commandline. 
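
That is, the same command as above with test execution skipped:

mvn -pl core,streaming -DskipTests package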

Cheers

On Dec 5, 2014, at 3:52 AM, Emre Sevinc emre.sev...@gmail.com wrote:


Re: How can I compile only the core and streaming (so that I can get test utilities of streaming)?

2014-12-05 Thread Emre Sevinc
Hello,

Specifying '-DskipTests' on commandline worked, though I can't be sure
whether first running 'sbt assembly' also contributed to the solution.
(I've tried 'sbt assembly' because branch-1.1's README says to use sbt).

Thanks for the answer.

Kind regards,
Emre Sevinç


Problems creating and reading a large test file

2014-12-05 Thread Steve Lewis
I am trying to look at problems reading a data file over 4G. In my testing
I am trying to create such a file.
My plan is to create a fasta file (a simple format used in biology)
looking like
1
TCCTTACGGAGTTCGGGTGTTTATCTTACTTATCGCGGTTCGCTGCCGCTCCGGGAGCCCGGATAGGCTGCGTTAATACCTAAGGAGCGCGTATTG
2
GTCTGATCTAAATGCGACGACGTCTTTAGTGCTAAGTGGAACCCAATCTTAAGACCCAGGCTCTTAAGCAGAAACAGACCGTCCCTGCCTCCTGGAGTAT
3
...
I create a list with 5000 structures - use flatMap to add 5000 per entry
and then either call saveAsText or dnaFragmentIterator =
mySet.toLocalIterator(); and write to HDFS

Then I try to call JavaRDD<String> lines = ctx.textFile(hdfsFileName);

what I get on a 16 node cluster
14/12/06 01:49:21 ERROR SendingConnection: Exception while reading
SendingConnection to ConnectionManagerId(pltrd007.labs.uninett.no,50119)
java.nio.channels.ClosedChannelException

2 14/12/06 01:49:35 ERROR BlockManagerMasterActor: Got two different block
manager registrations on 20140711-081617-711206558-5050-2543-13

The code is at the line below - I did not want to spam the group although
it is only a couple of pages -
I am baffled - there are no issues when I create  a few thousand records
but things blow up when I try 25 million records or a file of 6B or so

Can someone take a look - it is not a lot of code

https://drive.google.com/file/d/0B4cgoSGuA4KWUmo3UzBZRmU5M3M/view?usp=sharing
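
For reference, a sketch of generating such a file without building the records
on the driver (assumptions: the record layout shown above, a placeholder HDFS
path, and 100-base sequences chosen just for illustration):

import org.apache.spark.{SparkConf, SparkContext}
import scala.util.Random

// Sketch: build 5000 * 5000 = 25 million synthetic records on the executors,
// write them out, then read them back and count.
object GenerateFasta {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("generate-fasta"))
    val bases = "ACGT"

    val records = sc.parallelize(1 to 5000, 200).flatMap { block =>
      val rng = new Random(block)                      // deterministic per input element
      (1 to 5000).map { i =>
        val id = (block - 1) * 5000L + i
        val seq = Array.fill(100)(bases(rng.nextInt(4))).mkString
        s"$id\n$seq"                                   // same layout as the example above
      }
    }

    val path = "hdfs:///tmp/synthetic.fasta"           // placeholder path
    records.saveAsTextFile(path)

    val lines = sc.textFile(path)
    println(s"lines read back: ${lines.count()}")
    sc.stop()
  }
}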


This is just a test

2014-11-26 Thread NingjunWang
Test message



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/This-is-just-a-test-tp19895.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Spark: Simple local test failed depending on memory settings

2014-11-21 Thread rzykov
Dear all,

We encountered problems of failed jobs with huge amount of data.

A simple local test was prepared for this question at
https://gist.github.com/copy-of-rezo/6a137e13a1e4f841e7eb
It generates 2 sets of key-value pairs, join them, selects distinct values
and counts data finally.

object Spill {
  def generate = {
    for {
      j <- 1 to 10
      i <- 1 to 200
    } yield (j, i)
  }

  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName(getClass.getSimpleName)
    conf.set("spark.shuffle.spill", "true")
    conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    val sc = new SparkContext(conf)
    println(generate)

    val dataA = sc.parallelize(generate)
    val dataB = sc.parallelize(generate)
    val dst = dataA.join(dataB).distinct().count()
    println(dst)
  }
}

We compiled it locally and run 3 times with different settings of memory:
1) *--executor-memory 10M --driver-memory 10M --num-executors 1
--executor-cores 1*
It fails with java.lang.OutOfMemoryError: GC overhead limit exceeded at
.
org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:137)

2) *--executor-memory 20M --driver-memory 20M --num-executors 1
--executor-cores 1*
It works OK

3)  *--executor-memory 10M --driver-memory 10M --num-executors 1
--executor-cores 1* But let's make less data for i from 200 to 100. It
reduces input data in 2 times and joined data in 4 times

  def generate = {
    for {
      j <- 1 to 10
      i <- 1 to 100   // previous value was 200
    } yield (j, i)
  }
This code works OK. 

We don't understand why 10M is not enough for such a simple operation on
approximately 32000 bytes of ints (2 * 10 * 200 * 2 * 4). 10M of RAM also works
if we cut the data volume in half (2000 records of (int, int)).
Why doesn't spilling to disk cover this case?
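
As a back-of-the-envelope sketch of the scale involved (based only on the code
above, not on anything measured), the join blows the data up well beyond the
raw input size before distinct() runs:

// Each of the 10 keys occurs 200 times on both sides, so the join produces
// 10 * 200 * 200 = 400,000 (Int, (Int, Int)) records before distinct(),
// each of them a boxed tuple on the JVM rather than a few bytes of raw ints.
val keys         = 10
val valuesPerKey = 200
val inputPairs   = 2 * keys * valuesPerKey              //   4,000 (Int, Int) pairs in total
val joinedPairs  = keys * valuesPerKey * valuesPerKey   // 400,000 pairs after the join
println(s"input = $inputPairs, joined = $joinedPairs")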





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Simple-local-test-failed-depending-on-memory-settings-tp19473.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: SPARK BENCHMARK TEST

2014-09-18 Thread VJ Shalish
Hi, please can someone advise on this?

On Wed, Sep 17, 2014 at 6:59 PM, VJ Shalish vjshal...@gmail.com wrote:






SPARK BENCHMARK TEST

2014-09-17 Thread VJ Shalish
 I am trying to benchmark spark in a hadoop cluster.
I need to design a sample spark job to test the CPU utilization, RAM usage,
Input throughput, Output throughput and Duration of execution in the
cluster.

I need to test the state of the cluster for :-

A spark job which uses high CPU
A spark job which uses high RAM
A spark job which uses high Input throughput
A spark job which uses high Output throughput
A spark job which takes long time.

These have to be tested individually and a combination of these scenarios
would also be used.

Please help me in understanding the factors of a Spark Job which would
contribute to  CPU utilization, RAM usage, Input throughput, Output
throughput, Duration of execution in the cluster. So that I can design
spark jobs that could be used for testing.



Thanks
Shalish.
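
For what it's worth, skeleton jobs along these lines (sketches only; record
counts, partition counts and HDFS paths are placeholders) are a common way to
stress one resource at a time:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object BenchmarkSkeletons {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("benchmark-skeletons"))

    // CPU-bound: tiny input, expensive per-record computation.
    sc.parallelize(1 to 10000000, 200)
      .map(i => (1 to 2000).foldLeft(i.toDouble)((acc, _) => math.sqrt(acc + 1)))
      .count()

    // RAM-bound: cache a large dataset and reuse it.
    val cached = sc.parallelize(1 to 10000000, 200)
      .map(i => (i, Array.fill(100)(i.toLong)))
      .persist(StorageLevel.MEMORY_ONLY)
    cached.count()
    cached.count()

    // Input-throughput-bound: scan a big file, do almost nothing per record.
    sc.textFile("hdfs:///benchmark/input").count()      // placeholder path

    // Output-throughput-bound: write a large result set back out.
    sc.parallelize(1 to 10000000, 200)
      .map(i => s"$i,${i * 2},${i * 3}")
      .saveAsTextFile("hdfs:///benchmark/output")       // placeholder path

    sc.stop()
  }
}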


Re: Cannot run unit test.

2014-09-17 Thread Jies
When I run `sbt test-only SparkTest` or `sbt test-only SparkTest1`, they
pass. But running `sbt test` to run both SparkTest and SparkTest1 together
fails.

If I merge all cases into one file, the tests pass.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Cannot-run-unit-test-tp14459p14506.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



[spark upgrade] Error communicating with MapOutputTracker when running test cases in latest spark

2014-09-10 Thread Adrian Mocanu
I use
https://github.com/apache/spark/blob/master/streaming/src/test/scala/org/apache/spark/streaming/TestSuiteBase.scala
to help me with testing.

In Spark 0.9.1 my tests depending on TestSuiteBase worked fine. As soon as I 
switched to the latest (1.0.1) all tests fail. My sbt import is: "org.apache.spark" 
%% "spark-core" % "1.1.0-SNAPSHOT" % "provided"

One exception I get is:
Error communicating with MapOutputTracker
org.apache.spark.SparkException: Error communicating with MapOutputTracker 

How can I fix this?

Found a thread on this error but not very helpful: 
http://mail-archives.apache.org/mod_mbox/spark-dev/201405.mbox/%3ctencent_6b37d69c54f76819509a5...@qq.com%3E

-Adrian



Re: What about implementing various hypothesis test for LogisticRegression in MLlib

2014-08-24 Thread Xiangrui Meng
Thanks for the reference! Many tests are not designed for big data:
http://magazine.amstat.org/blog/2010/09/01/statrevolution/ . So we
need to understand which tests are proper. Feel free to create a JIRA
and let's move our discussion there. -Xiangrui

On Fri, Aug 22, 2014 at 8:44 PM, guxiaobo1982 guxiaobo1...@qq.com wrote:
 Hi Xiangrui,

 You can refer to An Introduction to Statistical Learning with Applications
 in R, there are many stander hypothesis test to do regarding to linear
 regression and logistic regression, they should be implement in the fist
 order, then we will  list some other testes, which are also important when
 using logistic regression to build score cards.

 Xiaobo Gu


 -- Original --
 From:  Xiangrui Meng;men...@gmail.com;
 Send time: Wednesday, Aug 20, 2014 2:18 PM
 To: guxiaobo1...@qq.com;
 Cc: user@spark.apache.orguser@spark.apache.org;
 Subject:  Re: What about implementing various hypothesis test for
 LogisticRegression in MLlib

 We implemented chi-squared tests in v1.1:
 https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/stat/Statistics.scala#L166
 and we will add more after v1.1. Feedback on which tests should come
 first would be greatly appreciated. -Xiangrui

 On Tue, Aug 19, 2014 at 9:50 PM, guxiaobo1982 guxiaobo1...@qq.com wrote:
 Hi,

 From the documentation I think only the model fitting part is implement,
 what about the various hypothesis test and performance indexes used to
 evaluate the model fit?

 Regards,

 Xiaobo Gu

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: What about implementing various hypothesis test for LogisticRegression in MLlib

2014-08-22 Thread guxiaobo1982
Hi Xiangrui,


You can refer to "An Introduction to Statistical Learning with Applications in 
R"; there are many standard hypothesis tests to perform for linear 
regression and logistic regression, and they should be implemented first. Then we 
will list some other tests, which are also important when using 
logistic regression to build score cards.


Xiaobo Gu




-- Original --
From:  Xiangrui Meng;men...@gmail.com;
Send time: Wednesday, Aug 20, 2014 2:18 PM
To: guxiaobo1...@qq.com; 
Cc: user@spark.apache.orguser@spark.apache.org; 
Subject:  Re: What about implementing various hypothesis test for 
LogisticRegression in MLlib



We implemented chi-squared tests in v1.1:
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/stat/Statistics.scala#L166
and we will add more after v1.1. Feedback on which tests should come
first would be greatly appreciated. -Xiangrui

On Tue, Aug 19, 2014 at 9:50 PM, guxiaobo1982 guxiaobo1...@qq.com wrote:
 Hi,

 From the documentation I think only the model fitting part is implement,
 what about the various hypothesis test and performance indexes used to
 evaluate the model fit?

 Regards,

 Xiaobo Gu

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org

Re: What about implementing various hypothesis test for Logistic Regression in MLlib

2014-08-20 Thread Xiangrui Meng
We implemented chi-squared tests in v1.1:
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/stat/Statistics.scala#L166
and we will add more after v1.1. Feedback on which tests should come
first would be greatly appreciated. -Xiangrui
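
For anyone reading this later, usage of that API looks roughly like the sketch
below (the numbers are made up and `sc` is assumed to be an existing
SparkContext):

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.stat.Statistics

// Goodness-of-fit test against a uniform expected distribution (made-up counts).
val observed = Vectors.dense(42.0, 38.0, 50.0, 30.0)
println(Statistics.chiSqTest(observed))   // statistic, degrees of freedom, p-value

// Independence test of every feature against the label, over an RDD[LabeledPoint].
val data = sc.parallelize(Seq(
  LabeledPoint(0.0, Vectors.dense(1.0, 0.0)),
  LabeledPoint(1.0, Vectors.dense(0.0, 1.0)),
  LabeledPoint(1.0, Vectors.dense(0.0, 1.0))))
Statistics.chiSqTest(data).foreach(println)   // one result per feature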

On Tue, Aug 19, 2014 at 9:50 PM, guxiaobo1982 guxiaobo1...@qq.com wrote:
 Hi,

 From the documentation I think only the model fitting part is implement,
 what about the various hypothesis test and performance indexes used to
 evaluate the model fit?

 Regards,

 Xiaobo Gu

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



What about implementing various hypothesis test for Logistic Regression in MLlib

2014-08-19 Thread guxiaobo1982
Hi,

From the documentation I think only the model fitting part is implemented; what 
about the various hypothesis tests and performance indexes used to evaluate the 
model fit?


Regards,


Xiaobo Gu

Re: Unit Test for Spark Streaming

2014-08-08 Thread JiajiaJing
Hi TD,

I tried some different Maven setups these days, and now I can at least get
something when running mvn test. However, it seems like scalatest cannot
find the test cases specified in the test suite.
Here is the output I get: 
http://apache-spark-user-list.1001560.n3.nabble.com/file/n11825/Screen_Shot_2014-08-08_at_5.png
 

Could you please give me some details on how you set up ScalaTest on your
machine? I believe there must be some other setup issue on my machine but I
cannot figure out why...
And here are the plugins and dependencies related to scalatest in my pom.xml
:
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <version>2.7</version>
  <configuration>
    <skipTests>true</skipTests>
  </configuration>
</plugin>

<plugin>
  <groupId>org.scalatest</groupId>
  <artifactId>scalatest-maven-plugin</artifactId>
  <version>1.0</version>
  <configuration>
    <reportsDirectory>${project.build.directory}/surefire-reports</reportsDirectory>
    <junitxml>.</junitxml>
    <filereports>${project.build.directory}/SparkTestSuite.txt</filereports>
    <tagsToInclude>ATag</tagsToInclude>
    <systemProperties>
      <java.awt.headless>true</java.awt.headless>
      <spark.test.home>${session.executionRootDirectory}</spark.test.home>
      <spark.testing>1</spark.testing>
    </systemProperties>
  </configuration>
  <executions>
    <execution>
      <id>test</id>
      <goals>
        <goal>test</goal>
      </goals>
    </execution>
  </executions>
</plugin>


<dependency>
  <groupId>junit</groupId>
  <artifactId>junit</artifactId>
  <version>4.8.1</version>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>org.scalatest</groupId>
  <artifactId>scalatest_2.10</artifactId>
  <version>2.2.1</version>
  <scope>test</scope>
</dependency>

Thank you very much!

Best Regards,

Jiajia



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Unit-Test-for-Spark-Streaming-tp11394p11825.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Unit Test for Spark Streaming

2014-08-06 Thread JiajiaJing
Thank you TD,

I have worked around that problem and now the test compiles. 
However, I don't actually see that test running. When I do mvn test, it
just says BUILD SUCCESS, without any TEST section on stdout.
Are we supposed to use mvn test to run the test? Are there any other
methods that can be used to run this test?





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Unit-Test-for-Spark-Streaming-tp11394p11570.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Unit Test for Spark Streaming

2014-08-06 Thread Tathagata Das
Does it not show the name of the test suite on stdout, showing that it has
passed? Can you try writing a small unit test, in the same way as
your Kafka unit test, and with print statements on stdout ... to see
whether it works? I believe it is some configuration issue in Maven, which
is hard for me to guess.

TD
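
Something as small as the following sketch (suite and message names are
arbitrary) is usually enough to tell whether the surefire/scalatest wiring
picks up tests at all:

import org.scalatest.FunSuite

// If "mvn test" is wired up correctly, both the println output and the test
// name should show up in the ScalaTest report on stdout.
class SmokeSuite extends FunSuite {
  test("scalatest is actually being run") {
    println("SmokeSuite is running")
    assert(1 + 1 == 2)
  }
}

Note that if a ScalaTest tagsToInclude filter is configured (as in the pom.xml
shown earlier in this thread), an untagged suite like this can be filtered out,
which itself looks like "no tests run".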


On Wed, Aug 6, 2014 at 12:53 PM, JiajiaJing jj.jing0...@gmail.com wrote:

 Thank you TD,

 I have worked around that problem and now the test compiles.
 However, I don't actually see that test running. As when I do mvn test,
 it
 just says BUILD SUCCESS, without any TEST section on stdout.
 Are we suppose to use mvn test to run the test? Are there any other
 methods can be used to run this test?





 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/Unit-Test-for-Spark-Streaming-tp11394p11570.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




Re: Unit Test for Spark Streaming

2014-08-05 Thread Tathagata Das
That function simply deletes a directory recursively; you can use
alternative libraries. See this discussion:
http://stackoverflow.com/questions/779519/delete-files-recursively-in-java
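
For example, a plain-JDK replacement (a sketch, written in Scala) if you'd
rather not touch Spark's private Utils:

import java.io.File

// Minimal recursive delete, equivalent in spirit to Utils.deleteRecursively.
def deleteRecursively(f: File): Unit = {
  if (f.isDirectory) {
    Option(f.listFiles()).getOrElse(Array.empty[File]).foreach(deleteRecursively)
  }
  if (!f.delete() && f.exists()) {
    throw new java.io.IOException(s"Failed to delete ${f.getAbsolutePath}")
  }
}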


On Tue, Aug 5, 2014 at 5:02 PM, JiajiaJing jj.jing0...@gmail.com wrote:
 Hi TD,

 I encountered a problem when trying to run the KafkaStreamSuite.scala unit
 test.
 I added scalatest-maven-plugin to my pom.xml, then ran mvn test, and got
 the following error message:

 error: object Utils in package util cannot be accessed in package
 org.apache.spark.util
 [INFO] brokerConf.logDirs.foreach { f =>
 Utils.deleteRecursively(new File(f)) }
 [INFO]^

 I checked that Utils.scala does exist under
 spark/core/src/main/scala/org/apache/spark/util/, so I have no idea why I
 get this access error.
 Could you please help me with this?

 Thank you very much!

 Best Regards,

 Jiajia



 --
 View this message in context: 
 http://apache-spark-user-list.1001560.n3.nabble.com/Unit-Test-for-Spark-Streaming-tp11394p11505.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Unit Test for Spark Streaming

2014-08-04 Thread JiajiaJing
Hello Spark Users,

I have a Spark Streaming program that streams data from Kafka topics and
outputs Parquet files on HDFS.
Now I want to write a unit test for this program to make sure the output
data is correct (i.e. not missing any data from Kafka).
However, I have no idea how to do this, especially how to mock a Kafka
topic.
Can someone help me with this?

Thank you very much!

Best Regards,

Jiajia



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Unit-Test-for-Spark-Streaming-tp11394.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Unit Test for Spark Streaming

2014-08-04 Thread Tathagata Das
Appropriately timed question! Here is the PR that adds a real unit
test for Kafka stream in Spark Streaming. Maybe this will help!

https://github.com/apache/spark/pull/1751/files

On Mon, Aug 4, 2014 at 6:30 PM, JiajiaJing jj.jing0...@gmail.com wrote:
 Hello Spark Users,

 I have a spark streaming program that stream data from kafka topics and
 output as parquet file on HDFS.
 Now I want to write a unit test for this program to make sure the output
 data is correct (i.e not missing any data from kafka).
 However, I have no idea about how to do this, especially how to mock a kafka
 topic.
 Can someone help me with this?

 Thank you very much!

 Best Regards,

 Jiajia



 --
 View this message in context: 
 http://apache-spark-user-list.1001560.n3.nabble.com/Unit-Test-for-Spark-Streaming-tp11394.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Unit Test for Spark Streaming

2014-08-04 Thread JiajiaJing
This helps a lot!!
Thank you very much!

Jiajia



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Unit-Test-for-Spark-Streaming-tp11394p11396.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



sbt + idea + test

2014-07-14 Thread boci
Hi guys,


I want to use Elasticsearch and HBase in my spark project, I want to create
a test. I pulled up ES and Zookeeper, but if I put val htest = new
HBaseTestingUtility() to my app I got a strange exception (compilation
time, not runtime).

https://gist.github.com/b0c1/4a4b3f6350816090c3b5

Any idea?

--
Skype: boci13, Hangout: boci.b...@gmail.com


Re: Run spark unit test on Windows 7

2014-07-03 Thread Konstantin Kudryavtsev
It sounds really strange...

I guess it is a bug, a critical bug, and it must be fixed... at least some flag
must be added (unable.hadoop)

I found the following workaround:
1) download compiled winutils.exe from
http://social.msdn.microsoft.com/Forums/windowsazure/en-US/28a57efb-082b-424b-8d9e-731b1fe135de/please-read-if-experiencing-job-failures?forum=hdinsight
2) put this file into d:\winutil\bin
3) add in my test: System.setProperty("hadoop.home.dir", "d:\\winutil\\")

after that, the test runs
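
Wrapped up for reuse, that boils down to something like the sketch below (trait
and class names are illustrative); the key point is that the property has to be
set before the first SparkContext, and hence Hadoop's Shell class, is loaded in
the test JVM:

import org.apache.spark.SparkContext

// Mix this into Windows test classes: the property is set while the test class
// is being constructed, i.e. before any test method creates a SparkContext and
// triggers loading of Hadoop's Shell class.
trait WindowsHadoopHome {
  System.setProperty("hadoop.home.dir", "d:\\winutil\\")   // folder that contains bin\winutils.exe
}

class MyEtlTest extends WindowsHadoopHome {
  @org.junit.Test
  def testEtl(): Unit = {
    val sc = new SparkContext("local[2]", "etl-test")
    try {
      assert(sc.parallelize(1 to 10).count() == 10)
    } finally {
      sc.stop()
    }
  }
}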

Thank you,
Konstantin Kudryavtsev


On Wed, Jul 2, 2014 at 10:24 PM, Denny Lee denny.g@gmail.com wrote:

 You don't actually need it per se - it's just that some of the Spark
 libraries reference Hadoop libraries even if they ultimately do not
 call them. When I was doing some early builds of Spark on Windows, I
 admittedly had Hadoop on Windows running as well and had not run into this
 particular issue.



 On Wed, Jul 2, 2014 at 12:04 PM, Kostiantyn Kudriavtsev 
 kudryavtsev.konstan...@gmail.com wrote:

 No, I don't

 why do I need to have HDP installed? I don't use Hadoop at all and I'd
 like to read data from local filesystem

 On Jul 2, 2014, at 9:10 PM, Denny Lee denny.g@gmail.com wrote:

 By any chance do you have HDP 2.1 installed? you may need to install the
 utils and update the env variables per
 http://stackoverflow.com/questions/18630019/running-apache-hadoop-2-1-0-on-windows


 On Jul 2, 2014, at 10:20 AM, Konstantin Kudryavtsev 
 kudryavtsev.konstan...@gmail.com wrote:

 Hi Andrew,

 it's Windows 7 and I didn't set up any env variables here

 The full stack trace:

 14/07/02 19:59:31 WARN NativeCodeLoader: Unable to load native-hadoop
 library for your platform... using builtin-java classes where applicable
 14/07/02 19:59:31 ERROR Shell: Failed to locate the winutils binary in
 the hadoop binary path
 java.io.IOException: Could not locate executable null\bin\winutils.exe in
 the Hadoop binaries.
 at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318)
  at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333)
 at org.apache.hadoop.util.Shell.clinit(Shell.java:326)
  at org.apache.hadoop.util.StringUtils.clinit(StringUtils.java:76)
 at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93)
  at org.apache.hadoop.security.Groups.init(Groups.java:77)
 at
 org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:240)
  at
 org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:255)
 at
 org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:283)
  at
 org.apache.spark.deploy.SparkHadoopUtil.init(SparkHadoopUtil.scala:36)
 at
 org.apache.spark.deploy.SparkHadoopUtil$.init(SparkHadoopUtil.scala:109)
  at
 org.apache.spark.deploy.SparkHadoopUtil$.clinit(SparkHadoopUtil.scala)
 at org.apache.spark.SparkContext.init(SparkContext.scala:228)
  at org.apache.spark.SparkContext.init(SparkContext.scala:97)
 at my.example.EtlTest.testETL(IxtoolsDailyAggTest.scala:13)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
  at junit.framework.TestCase.runTest(TestCase.java:168)
 at junit.framework.TestCase.runBare(TestCase.java:134)
  at junit.framework.TestResult$1.protect(TestResult.java:110)
 at junit.framework.TestResult.runProtected(TestResult.java:128)
  at junit.framework.TestResult.run(TestResult.java:113)
 at junit.framework.TestCase.run(TestCase.java:124)
  at junit.framework.TestSuite.runTest(TestSuite.java:232)
 at junit.framework.TestSuite.run(TestSuite.java:227)
  at
 org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:81)
 at org.junit.runner.JUnitCore.run(JUnitCore.java:130)
  at
 com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74)
 at
 com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:211)
  at
 com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:67)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
 at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)


 Thank you,
 Konstantin Kudryavtsev


 On Wed, Jul 2, 2014 at 8:15 PM, Andrew Or and...@databricks.com wrote:

 Hi Konstatin,

 We use hadoop as a library in a few places in Spark. I wonder why the
 path includes null though.

 Could you provide the full stack trace?

 Andrew


 2014-07-02 9:38 GMT-07:00 Konstantin Kudryavtsev 
 kudryavtsev.konstan...@gmail.com:

 Hi all,

 I'm trying to run some transformation on *Spark*, it works

Re: Run spark unit test on Windows 7

2014-07-03 Thread Denny Lee
Hi Konstantin,

Could you please create a jira item at: 
https://issues.apache.org/jira/browse/SPARK/ so this issue can be tracked?

Thanks,
Denny


On July 2, 2014 at 11:45:24 PM, Konstantin Kudryavtsev 
(kudryavtsev.konstan...@gmail.com) wrote:


Re: Run spark unit test on Windows 7

2014-07-03 Thread Kostiantyn Kudriavtsev
Hi Denny,

just created https://issues.apache.org/jira/browse/SPARK-2356

On Jul 3, 2014, at 7:06 PM, Denny Lee denny.g@gmail.com wrote:

 Hi Konstantin,
 
 Could you please create a jira item at: 
 https://issues.apache.org/jira/browse/SPARK/ so this issue can be tracked?
 
 Thanks,
 Denny
 
 
 On July 2, 2014 at 11:45:24 PM, Konstantin Kudryavtsev 
 (kudryavtsev.konstan...@gmail.com) wrote:
 
 It sounds really strange...
 
 I guess it is a bug, a critical bug, and it must be fixed... at least some flag 
 must be added (unable.hadoop)
 
 I found the following workaround:
 1) download a compiled winutils.exe from 
 http://social.msdn.microsoft.com/Forums/windowsazure/en-US/28a57efb-082b-424b-8d9e-731b1fe135de/please-read-if-experiencing-job-failures?forum=hdinsight
 2) put this file into d:\winutil\bin
 3) add in my test: System.setProperty("hadoop.home.dir", "d:\\winutil\\")
 
 after that test runs
 
 Thank you,
 Konstantin Kudryavtsev
 
 
 On Wed, Jul 2, 2014 at 10:24 PM, Denny Lee denny.g@gmail.com wrote:
 You don't actually need it per se - its just that some of the Spark 
 libraries are referencing Hadoop libraries even if they ultimately do not 
 call them. When I was doing some early builds of Spark on Windows, I 
 admittedly had Hadoop on Windows running as well and had not run into this 
 particular issue.
 
 
 
 On Wed, Jul 2, 2014 at 12:04 PM, Kostiantyn Kudriavtsev 
 kudryavtsev.konstan...@gmail.com wrote:
 No, I don’t
 
 why do I need to have HDP installed? I don’t use Hadoop at all and I’d like 
 to read data from local filesystem
 
 On Jul 2, 2014, at 9:10 PM, Denny Lee denny.g@gmail.com wrote:
 
 By any chance do you have HDP 2.1 installed? you may need to install the 
 utils and update the env variables per 
 http://stackoverflow.com/questions/18630019/running-apache-hadoop-2-1-0-on-windows
 
 
 On Jul 2, 2014, at 10:20 AM, Konstantin Kudryavtsev 
 kudryavtsev.konstan...@gmail.com wrote:
 
 Hi Andrew,
 
 it's windows 7 and I doesn't set up any env variables here 
 
 The full stack trace:
 
 14/07/02 19:59:31 WARN NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 14/07/02 19:59:31 ERROR Shell: Failed to locate the winutils binary in the 
 hadoop binary path
 java.io.IOException: Could not locate executable null\bin\winutils.exe in 
 the Hadoop binaries.
 at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318)
 at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333)
 at org.apache.hadoop.util.Shell.clinit(Shell.java:326)
 at org.apache.hadoop.util.StringUtils.clinit(StringUtils.java:76)
 at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93)
 at org.apache.hadoop.security.Groups.init(Groups.java:77)
 at 
 org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:240)
 at 
 org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:255)
 at 
 org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:283)
 at org.apache.spark.deploy.SparkHadoopUtil.init(SparkHadoopUtil.scala:36)
 at 
 org.apache.spark.deploy.SparkHadoopUtil$.init(SparkHadoopUtil.scala:109)
 at org.apache.spark.deploy.SparkHadoopUtil$.clinit(SparkHadoopUtil.scala)
 at org.apache.spark.SparkContext.init(SparkContext.scala:228)
 at org.apache.spark.SparkContext.init(SparkContext.scala:97)
 at my.example.EtlTest.testETL(IxtoolsDailyAggTest.scala:13)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at junit.framework.TestCase.runTest(TestCase.java:168)
 at junit.framework.TestCase.runBare(TestCase.java:134)
 at junit.framework.TestResult$1.protect(TestResult.java:110)
 at junit.framework.TestResult.runProtected(TestResult.java:128)
 at junit.framework.TestResult.run(TestResult.java:113)
 at junit.framework.TestCase.run(TestCase.java:124)
 at junit.framework.TestSuite.runTest(TestSuite.java:232)
 at junit.framework.TestSuite.run(TestSuite.java:227)
 at 
 org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:81)
 at org.junit.runner.JUnitCore.run(JUnitCore.java:130)
 at 
 com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74)
 at 
 com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:211)
 at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:67)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120

Re: Run spark unit test on Windows 7

2014-07-03 Thread Denny Lee
Thanks!  will take a look at this later today. HTH!



 On Jul 3, 2014, at 11:09 AM, Kostiantyn Kudriavtsev 
 kudryavtsev.konstan...@gmail.com wrote:
 
 Hi Denny,
 
 just created https://issues.apache.org/jira/browse/SPARK-2356
 
 On Jul 3, 2014, at 7:06 PM, Denny Lee denny.g@gmail.com wrote:
 
 Hi Konstantin,
 
 Could you please create a jira item at: 
 https://issues.apache.org/jira/browse/SPARK/ so this issue can be tracked?
 
 Thanks,
 Denny
 
 
 On July 2, 2014 at 11:45:24 PM, Konstantin Kudryavtsev 
 (kudryavtsev.konstan...@gmail.com) wrote:
 
 It sounds really strange...
 
 I guess it is a bug, a critical bug, and it must be fixed... at least some flag 
 must be added (unable.hadoop)
 
 I found the following workaround:
 1) download a compiled winutils.exe from 
 http://social.msdn.microsoft.com/Forums/windowsazure/en-US/28a57efb-082b-424b-8d9e-731b1fe135de/please-read-if-experiencing-job-failures?forum=hdinsight
 2) put this file into d:\winutil\bin
 3) add in my test: System.setProperty("hadoop.home.dir", "d:\\winutil\\")
 
 after that test runs
 
 Thank you,
 Konstantin Kudryavtsev
 
 
 On Wed, Jul 2, 2014 at 10:24 PM, Denny Lee denny.g@gmail.com wrote:
 You don't actually need it per se - its just that some of the Spark 
 libraries are referencing Hadoop libraries even if they ultimately do not 
 call them. When I was doing some early builds of Spark on Windows, I 
 admittedly had Hadoop on Windows running as well and had not run into this 
 particular issue.
 
 
 
 On Wed, Jul 2, 2014 at 12:04 PM, Kostiantyn Kudriavtsev 
 kudryavtsev.konstan...@gmail.com wrote:
 No, I don’t
 
 why do I need to have HDP installed? I don’t use Hadoop at all and I’d 
 like to read data from local filesystem
 
 On Jul 2, 2014, at 9:10 PM, Denny Lee denny.g@gmail.com wrote:
 
 By any chance do you have HDP 2.1 installed? you may need to install the 
 utils and update the env variables per 
 http://stackoverflow.com/questions/18630019/running-apache-hadoop-2-1-0-on-windows
 
 
 On Jul 2, 2014, at 10:20 AM, Konstantin Kudryavtsev 
 kudryavtsev.konstan...@gmail.com wrote:
 
 Hi Andrew,
 
 it's windows 7 and I doesn't set up any env variables here 
 
 The full stack trace:
 
 14/07/02 19:59:31 WARN NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 14/07/02 19:59:31 ERROR Shell: Failed to locate the winutils binary in 
 the hadoop binary path
 java.io.IOException: Could not locate executable null\bin\winutils.exe 
 in the Hadoop binaries.
 at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318)
 at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333)
 at org.apache.hadoop.util.Shell.clinit(Shell.java:326)
 at org.apache.hadoop.util.StringUtils.clinit(StringUtils.java:76)
 at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93)
 at org.apache.hadoop.security.Groups.init(Groups.java:77)
 at 
 org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:240)
 at 
 org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:255)
 at 
 org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:283)
 at 
 org.apache.spark.deploy.SparkHadoopUtil.init(SparkHadoopUtil.scala:36)
 at 
 org.apache.spark.deploy.SparkHadoopUtil$.init(SparkHadoopUtil.scala:109)
 at 
 org.apache.spark.deploy.SparkHadoopUtil$.clinit(SparkHadoopUtil.scala)
 at org.apache.spark.SparkContext.init(SparkContext.scala:228)
 at org.apache.spark.SparkContext.init(SparkContext.scala:97)
 at my.example.EtlTest.testETL(IxtoolsDailyAggTest.scala:13)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at junit.framework.TestCase.runTest(TestCase.java:168)
 at junit.framework.TestCase.runBare(TestCase.java:134)
 at junit.framework.TestResult$1.protect(TestResult.java:110)
 at junit.framework.TestResult.runProtected(TestResult.java:128)
 at junit.framework.TestResult.run(TestResult.java:113)
 at junit.framework.TestCase.run(TestCase.java:124)
 at junit.framework.TestSuite.runTest(TestSuite.java:232)
 at junit.framework.TestSuite.run(TestSuite.java:227)
 at 
 org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:81)
 at org.junit.runner.JUnitCore.run(JUnitCore.java:130)
 at 
 com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74)
 at 
 com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:211)
 at 
 com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:67)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke

Run spark unit test on Windows 7

2014-07-02 Thread Konstantin Kudryavtsev
Hi all,

I'm trying to run some transformation on *Spark*; it works fine on a cluster
(YARN, Linux machines). However, when I try to run it on a local machine
(*Windows 7*) under a unit test, I get these errors:

java.io.IOException: Could not locate executable null\bin\winutils.exe
in the Hadoop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318)
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333)
at org.apache.hadoop.util.Shell.clinit(Shell.java:326)
at org.apache.hadoop.util.StringUtils.clinit(StringUtils.java:76)
at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93)


My code is following:

@Test
def testETL() = {
  val conf = new SparkConf()
  val sc = new SparkContext("local", "test", conf)
  try {
    val etl = new IxtoolsDailyAgg() // empty constructor

    val data = sc.parallelize(List("in1", "in2", "in3"))

    etl.etl(data) // rdd transformation, no access to SparkContext or Hadoop
    Assert.assertTrue(true)
  } finally {
    if (sc != null)
      sc.stop()
  }
}


Why is it trying to access Hadoop at all, and how can I fix it? Thank you
in advance

Thank you,
Konstantin Kudryavtsev


Re: Run spark unit test on Windows 7

2014-07-02 Thread Andrew Or
Hi Konstatin,

We use hadoop as a library in a few places in Spark. I wonder why the path
includes null though.

Could you provide the full stack trace?

Andrew


2014-07-02 9:38 GMT-07:00 Konstantin Kudryavtsev 
kudryavtsev.konstan...@gmail.com:

 Hi all,

 I'm trying to run some transformation on *Spark*, it works fine on
 cluster (YARN, linux machines). However, when I'm trying to run it on local
 machine (*Windows 7*) under unit test, I got errors:

 java.io.IOException: Could not locate executable null\bin\winutils.exe in the 
 Hadoop binaries.
 at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318)
 at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333)
 at org.apache.hadoop.util.Shell.clinit(Shell.java:326)
 at org.apache.hadoop.util.StringUtils.clinit(StringUtils.java:76)
 at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93)


 My code is following:

 @Test
 def testETL() = {
 val conf = new SparkConf()
 val sc = new SparkContext(local, test, conf)
 try {
 val etl = new IxtoolsDailyAgg() // empty constructor

 val data = sc.parallelize(List(in1, in2, in3))

 etl.etl(data) // rdd transformation, no access to SparkContext or 
 Hadoop
 Assert.assertTrue(true)
 } finally {
 if(sc != null)
 sc.stop()
 }
 }


 Why is it trying to access hadoop at all? and how can I fix it? Thank you
 in advance

 Thank you,
 Konstantin Kudryavtsev



Re: Run spark unit test on Windows 7

2014-07-02 Thread Konstantin Kudryavtsev
Hi Andrew,

it's Windows 7 and I didn't set up any env variables here

The full stack trace:

14/07/02 19:59:31 WARN NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
14/07/02 19:59:31 ERROR Shell: Failed to locate the winutils binary in the
hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in
the Hadoop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318)
 at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333)
at org.apache.hadoop.util.Shell.clinit(Shell.java:326)
 at org.apache.hadoop.util.StringUtils.clinit(StringUtils.java:76)
at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93)
 at org.apache.hadoop.security.Groups.init(Groups.java:77)
at
org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:240)
 at
org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:255)
at
org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:283)
 at org.apache.spark.deploy.SparkHadoopUtil.init(SparkHadoopUtil.scala:36)
at
org.apache.spark.deploy.SparkHadoopUtil$.init(SparkHadoopUtil.scala:109)
 at org.apache.spark.deploy.SparkHadoopUtil$.clinit(SparkHadoopUtil.scala)
at org.apache.spark.SparkContext.init(SparkContext.scala:228)
 at org.apache.spark.SparkContext.init(SparkContext.scala:97)
at my.example.EtlTest.testETL(IxtoolsDailyAggTest.scala:13)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
 at junit.framework.TestCase.runTest(TestCase.java:168)
at junit.framework.TestCase.runBare(TestCase.java:134)
 at junit.framework.TestResult$1.protect(TestResult.java:110)
at junit.framework.TestResult.runProtected(TestResult.java:128)
 at junit.framework.TestResult.run(TestResult.java:113)
at junit.framework.TestCase.run(TestCase.java:124)
 at junit.framework.TestSuite.runTest(TestSuite.java:232)
at junit.framework.TestSuite.run(TestSuite.java:227)
 at
org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:81)
at org.junit.runner.JUnitCore.run(JUnitCore.java:130)
 at
com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74)
at
com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:211)
 at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:67)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)


Thank you,
Konstantin Kudryavtsev


On Wed, Jul 2, 2014 at 8:15 PM, Andrew Or and...@databricks.com wrote:

 Hi Konstatin,

 We use hadoop as a library in a few places in Spark. I wonder why the path
 includes null though.

 Could you provide the full stack trace?

 Andrew


 2014-07-02 9:38 GMT-07:00 Konstantin Kudryavtsev 
 kudryavtsev.konstan...@gmail.com:

 Hi all,

 I'm trying to run some transformation on *Spark*, it works fine on
 cluster (YARN, linux machines). However, when I'm trying to run it on local
 machine (*Windows 7*) under unit test, I got errors:


 java.io.IOException: Could not locate executable null\bin\winutils.exe in 
 the Hadoop binaries.
 at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318)
 at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333)
 at org.apache.hadoop.util.Shell.clinit(Shell.java:326)
 at org.apache.hadoop.util.StringUtils.clinit(StringUtils.java:76)
 at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93)


 My code is following:


 @Test
 def testETL() = {
 val conf = new SparkConf()
 val sc = new SparkContext(local, test, conf)
 try {
 val etl = new IxtoolsDailyAgg() // empty constructor

 val data = sc.parallelize(List(in1, in2, in3))

 etl.etl(data) // rdd transformation, no access to SparkContext or 
 Hadoop
 Assert.assertTrue(true)
 } finally {
 if(sc != null)
 sc.stop()
 }
 }


 Why is it trying to access hadoop at all? and how can I fix it? Thank you
 in advance

 Thank you,
 Konstantin Kudryavtsev





Re: Run spark unit test on Windows 7

2014-07-02 Thread Denny Lee
By any chance do you have HDP 2.1 installed? you may need to install the utils 
and update the env variables per 
http://stackoverflow.com/questions/18630019/running-apache-hadoop-2-1-0-on-windows


 On Jul 2, 2014, at 10:20 AM, Konstantin Kudryavtsev 
 kudryavtsev.konstan...@gmail.com wrote:
 
 Hi Andrew,
 
 it's windows 7 and I doesn't set up any env variables here 
 
 The full stack trace:
 
 14/07/02 19:59:31 WARN NativeCodeLoader: Unable to load native-hadoop library 
 for your platform... using builtin-java classes where applicable
 14/07/02 19:59:31 ERROR Shell: Failed to locate the winutils binary in the 
 hadoop binary path
 java.io.IOException: Could not locate executable null\bin\winutils.exe in the 
 Hadoop binaries.
   at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318)
   at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333)
   at org.apache.hadoop.util.Shell.clinit(Shell.java:326)
   at org.apache.hadoop.util.StringUtils.clinit(StringUtils.java:76)
   at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93)
   at org.apache.hadoop.security.Groups.init(Groups.java:77)
   at 
 org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:240)
   at 
 org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:255)
   at 
 org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:283)
   at 
 org.apache.spark.deploy.SparkHadoopUtil.init(SparkHadoopUtil.scala:36)
   at 
 org.apache.spark.deploy.SparkHadoopUtil$.init(SparkHadoopUtil.scala:109)
   at 
 org.apache.spark.deploy.SparkHadoopUtil$.clinit(SparkHadoopUtil.scala)
   at org.apache.spark.SparkContext.init(SparkContext.scala:228)
   at org.apache.spark.SparkContext.init(SparkContext.scala:97)
   at my.example.EtlTest.testETL(IxtoolsDailyAggTest.scala:13)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at junit.framework.TestCase.runTest(TestCase.java:168)
   at junit.framework.TestCase.runBare(TestCase.java:134)
   at junit.framework.TestResult$1.protect(TestResult.java:110)
   at junit.framework.TestResult.runProtected(TestResult.java:128)
   at junit.framework.TestResult.run(TestResult.java:113)
   at junit.framework.TestCase.run(TestCase.java:124)
   at junit.framework.TestSuite.runTest(TestSuite.java:232)
   at junit.framework.TestSuite.run(TestSuite.java:227)
   at 
 org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:81)
   at org.junit.runner.JUnitCore.run(JUnitCore.java:130)
   at 
 com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74)
   at 
 com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:211)
   at 
 com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:67)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
 
 
 Thank you,
 Konstantin Kudryavtsev
 
 
 On Wed, Jul 2, 2014 at 8:15 PM, Andrew Or and...@databricks.com wrote:
 Hi Konstatin,
 
 We use hadoop as a library in a few places in Spark. I wonder why the path 
 includes null though.
 
 Could you provide the full stack trace?
 
 Andrew
 
 
 2014-07-02 9:38 GMT-07:00 Konstantin Kudryavtsev 
 kudryavtsev.konstan...@gmail.com:
 
 Hi all,
 
 I'm trying to run some transformation on Spark, it works fine on cluster 
 (YARN, linux machines). However, when I'm trying to run it on local machine 
 (Windows 7) under unit test, I got errors:
 
 
 java.io.IOException: Could not locate executable null\bin\winutils.exe in 
 the Hadoop binaries.
 at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318)
 at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333)
 at org.apache.hadoop.util.Shell.clinit(Shell.java:326)
 at org.apache.hadoop.util.StringUtils.clinit(StringUtils.java:76)
 at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93)
 
 My code is following:
 
 
 @Test
 def testETL() = {
 val conf = new SparkConf()
 val sc = new SparkContext(local, test, conf)
 try {
 val etl = new IxtoolsDailyAgg() // empty constructor
 
 val data = sc.parallelize(List(in1, in2, in3))
 
 etl.etl(data) // rdd transformation, no access to SparkContext or 
 Hadoop
 Assert.assertTrue(true

Re: Run spark unit test on Windows 7

2014-07-02 Thread Kostiantyn Kudriavtsev
No, I don’t

why do I need to have HDP installed? I don’t use Hadoop at all and I’d like to 
read data from local filesystem

On Jul 2, 2014, at 9:10 PM, Denny Lee denny.g@gmail.com wrote:

 By any chance do you have HDP 2.1 installed? you may need to install the 
 utils and update the env variables per 
 http://stackoverflow.com/questions/18630019/running-apache-hadoop-2-1-0-on-windows
 
 
 On Jul 2, 2014, at 10:20 AM, Konstantin Kudryavtsev 
 kudryavtsev.konstan...@gmail.com wrote:
 
 Hi Andrew,
 
 it's windows 7 and I doesn't set up any env variables here 
 
 The full stack trace:
 
 14/07/02 19:59:31 WARN NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 14/07/02 19:59:31 ERROR Shell: Failed to locate the winutils binary in the 
 hadoop binary path
 java.io.IOException: Could not locate executable null\bin\winutils.exe in 
 the Hadoop binaries.
  at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318)
  at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333)
  at org.apache.hadoop.util.Shell.clinit(Shell.java:326)
  at org.apache.hadoop.util.StringUtils.clinit(StringUtils.java:76)
  at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93)
  at org.apache.hadoop.security.Groups.init(Groups.java:77)
  at 
 org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:240)
  at 
 org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:255)
  at 
 org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:283)
  at 
 org.apache.spark.deploy.SparkHadoopUtil.init(SparkHadoopUtil.scala:36)
  at 
 org.apache.spark.deploy.SparkHadoopUtil$.init(SparkHadoopUtil.scala:109)
  at 
 org.apache.spark.deploy.SparkHadoopUtil$.clinit(SparkHadoopUtil.scala)
  at org.apache.spark.SparkContext.init(SparkContext.scala:228)
  at org.apache.spark.SparkContext.init(SparkContext.scala:97)
  at my.example.EtlTest.testETL(IxtoolsDailyAggTest.scala:13)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at junit.framework.TestCase.runTest(TestCase.java:168)
  at junit.framework.TestCase.runBare(TestCase.java:134)
  at junit.framework.TestResult$1.protect(TestResult.java:110)
  at junit.framework.TestResult.runProtected(TestResult.java:128)
  at junit.framework.TestResult.run(TestResult.java:113)
  at junit.framework.TestCase.run(TestCase.java:124)
  at junit.framework.TestSuite.runTest(TestSuite.java:232)
  at junit.framework.TestSuite.run(TestSuite.java:227)
  at 
 org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:81)
  at org.junit.runner.JUnitCore.run(JUnitCore.java:130)
  at 
 com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74)
  at 
 com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:211)
  at 
 com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:67)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
 
 
 Thank you,
 Konstantin Kudryavtsev
 
 
 On Wed, Jul 2, 2014 at 8:15 PM, Andrew Or and...@databricks.com wrote:
 Hi Konstatin,
 
 We use hadoop as a library in a few places in Spark. I wonder why the path 
 includes null though.
 
 Could you provide the full stack trace?
 
 Andrew
 
 
 2014-07-02 9:38 GMT-07:00 Konstantin Kudryavtsev 
 kudryavtsev.konstan...@gmail.com:
 
 Hi all,
 
 I'm trying to run some transformation on Spark, it works fine on cluster 
 (YARN, linux machines). However, when I'm trying to run it on local machine 
 (Windows 7) under unit test, I got errors:
 
 java.io.IOException: Could not locate executable null\bin\winutils.exe in 
 the Hadoop binaries.
 at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318)
 at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333)
 at org.apache.hadoop.util.Shell.clinit(Shell.java:326)
 at org.apache.hadoop.util.StringUtils.clinit(StringUtils.java:76)
 at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93)
 
 My code is following:
 
 @Test
 def testETL() = {
 val conf = new SparkConf()
 val sc = new SparkContext(local, test, conf)
 try {
 val etl = new IxtoolsDailyAgg() // empty constructor
 
 val data

Re: Run spark unit test on Windows 7

2014-07-02 Thread Denny Lee
You don't actually need it per se - it's just that some of the Spark
libraries are referencing Hadoop libraries even if they ultimately do not
call them. When I was doing some early builds of Spark on Windows, I
admittedly had Hadoop on Windows running as well and had not run into this
particular issue.



On Wed, Jul 2, 2014 at 12:04 PM, Kostiantyn Kudriavtsev 
kudryavtsev.konstan...@gmail.com wrote:

 No, I don’t

 why do I need to have HDP installed? I don’t use Hadoop at all and I’d
 like to read data from local filesystem

 On Jul 2, 2014, at 9:10 PM, Denny Lee denny.g@gmail.com wrote:

 By any chance do you have HDP 2.1 installed? you may need to install the
 utils and update the env variables per
 http://stackoverflow.com/questions/18630019/running-apache-hadoop-2-1-0-on-windows


 On Jul 2, 2014, at 10:20 AM, Konstantin Kudryavtsev 
 kudryavtsev.konstan...@gmail.com wrote:

 Hi Andrew,

 it's windows 7 and I doesn't set up any env variables here

 The full stack trace:

 14/07/02 19:59:31 WARN NativeCodeLoader: Unable to load native-hadoop
 library for your platform... using builtin-java classes where applicable
 14/07/02 19:59:31 ERROR Shell: Failed to locate the winutils binary in the
 hadoop binary path
 java.io.IOException: Could not locate executable null\bin\winutils.exe in
 the Hadoop binaries.
 at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318)
  at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333)
 at org.apache.hadoop.util.Shell.clinit(Shell.java:326)
  at org.apache.hadoop.util.StringUtils.clinit(StringUtils.java:76)
 at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93)
  at org.apache.hadoop.security.Groups.init(Groups.java:77)
 at
 org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:240)
  at
 org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:255)
 at
 org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:283)
  at
 org.apache.spark.deploy.SparkHadoopUtil.init(SparkHadoopUtil.scala:36)
 at
 org.apache.spark.deploy.SparkHadoopUtil$.init(SparkHadoopUtil.scala:109)
  at
 org.apache.spark.deploy.SparkHadoopUtil$.clinit(SparkHadoopUtil.scala)
 at org.apache.spark.SparkContext.init(SparkContext.scala:228)
  at org.apache.spark.SparkContext.init(SparkContext.scala:97)
 at my.example.EtlTest.testETL(IxtoolsDailyAggTest.scala:13)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
  at junit.framework.TestCase.runTest(TestCase.java:168)
 at junit.framework.TestCase.runBare(TestCase.java:134)
  at junit.framework.TestResult$1.protect(TestResult.java:110)
 at junit.framework.TestResult.runProtected(TestResult.java:128)
  at junit.framework.TestResult.run(TestResult.java:113)
 at junit.framework.TestCase.run(TestCase.java:124)
  at junit.framework.TestSuite.runTest(TestSuite.java:232)
 at junit.framework.TestSuite.run(TestSuite.java:227)
  at
 org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:81)
 at org.junit.runner.JUnitCore.run(JUnitCore.java:130)
  at
 com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74)
 at
 com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:211)
  at
 com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:67)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
 at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)


 Thank you,
 Konstantin Kudryavtsev


 On Wed, Jul 2, 2014 at 8:15 PM, Andrew Or and...@databricks.com wrote:

 Hi Konstatin,

 We use hadoop as a library in a few places in Spark. I wonder why the
 path includes null though.

 Could you provide the full stack trace?

 Andrew


 2014-07-02 9:38 GMT-07:00 Konstantin Kudryavtsev 
 kudryavtsev.konstan...@gmail.com:

 Hi all,

 I'm trying to run some transformation on *Spark*, it works fine on
 cluster (YARN, linux machines). However, when I'm trying to run it on local
 machine (*Windows 7*) under unit test, I got errors:

 java.io.IOException: Could not locate executable null\bin\winutils.exe in 
 the Hadoop binaries.
 at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318)
 at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333)
 at org.apache.hadoop.util.Shell.clinit(Shell.java:326)
 at org.apache.hadoop.util.StringUtils.clinit(StringUtils.java:76)
 at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93)


 My code is following:

 @Test

InputStreamsSuite test failed

2014-06-22 Thread crazymb
Hello, I am new to Scala and Spark. Yesterday I compiled Spark from the 1.0.0 
source code and ran the tests, and one test case failed:


  For example, run this command in a shell: sbt/sbt testOnly 
org.apache.spark.streaming.InputStreamsSuite




The test case test(socket input stream) fails; the test result looks like this:


===test result===
[info] InputStreamsSuite:
[info] - socket input stream *** FAILED *** (20 seconds, 547 milliseconds)
[info]   0 did not equal 5 (InputStreamsSuite.scala:96)
[info]   org.scalatest.exceptions.TestFailedException:
[info]   at 
org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:318)
[info]   at 
org.apache.spark.streaming.InputStreamsSuite.newAssertionFailedException(InputStreamsSuite.scala:44)
[info]   at org.scalatest.Assertions$class.assert(Assertions.scala:401)
[info]   at 
org.apache.spark.streaming.InputStreamsSuite.assert(InputStreamsSuite.scala:44)
[info]   at 
org.apache.spark.streaming.InputStreamsSuite$$anonfun$1.apply$mcV$sp(InputStreamsSuite.scala:96)
[info]   at 
org.apache.spark.streaming.InputStreamsSuite$$anonfun$1.apply(InputStreamsSuite.scala:46)
[info]   at 
org.apache.spark.streaming.InputStreamsSuite$$anonfun$1.apply(InputStreamsSuite.scala:46)
[info]   at org.scalatest.FunSuite$$anon$1.apply(FunSuite.scala:1265)
[info]   at org.scalatest.Suite$class.withFixture(Suite.scala:1974)
[info]   at 
org.apache.spark.streaming.InputStreamsSuite.withFixture(InputStreamsSuite.scala:44)
[info]   at 
org.scalatest.FunSuite$class.invokeWithFixture$1(FunSuite.scala:1262)
[info]   at org.scalatest.FunSuite$$anonfun$runTest$1.apply(FunSuite.scala:1271)
[info]   at org.scalatest.FunSuite$$anonfun$runTest$1.apply(FunSuite.scala:1271)
[info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:198)
[info]   at org.scalatest.FunSuite$class.runTest(FunSuite.scala:1271)
[info]   at 
org.apache.spark.streaming.InputStreamsSuite.org$scalatest$BeforeAndAfter$$super$runTest(InputStreamsSuite.scala:44)
[info]   at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:171)
[info]   at 
org.apache.spark.streaming.InputStreamsSuite.runTest(InputStreamsSuite.scala:44)
[info]   at 
org.scalatest.FunSuite$$anonfun$runTests$1.apply(FunSuite.scala:1304)
[info]   at 
org.scalatest.FunSuite$$anonfun$runTests$1.apply(FunSuite.scala:1304)
[info]   at 
org.scalatest.SuperEngine$$anonfun$org$scalatest$SuperEngine$$runTestsInBranch$1.apply(Engine.scala:260)
[info]   at 
org.scalatest.SuperEngine$$anonfun$org$scalatest$SuperEngine$$runTestsInBranch$1.apply(Engine.scala:249)
[info]   at scala.collection.immutable.List.foreach(List.scala:318)
[info]   at 
org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:249)
[info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:326)
[info]   at org.scalatest.FunSuite$class.runTests(FunSuite.scala:1304)
[info]   at 
org.apache.spark.streaming.InputStreamsSuite.runTests(InputStreamsSuite.scala:44)
[info]   at org.scalatest.Suite$class.run(Suite.scala:2303)
[info]   at 
org.apache.spark.streaming.InputStreamsSuite.org$scalatest$FunSuite$$super$run(InputStreamsSuite.scala:44)
[info]   at org.scalatest.FunSuite$$anonfun$run$1.apply(FunSuite.scala:1310)
[info]   at org.scalatest.FunSuite$$anonfun$run$1.apply(FunSuite.scala:1310)
[info]   at org.scalatest.SuperEngine.runImpl(Engine.scala:362)
[info]   at org.scalatest.FunSuite$class.run(FunSuite.scala:1310)
[info]   at 
org.apache.spark.streaming.InputStreamsSuite.org$scalatest$BeforeAndAfter$$super$run(InputStreamsSuite.scala:44)
[info]   at org.scalatest.BeforeAndAfter$class.run(BeforeAndAfter.scala:208)
[info]   at 
org.apache.spark.streaming.InputStreamsSuite.run(InputStreamsSuite.scala:44)
[info]   at 
org.scalatest.tools.ScalaTestFramework$ScalaTestRunner.run(ScalaTestFramework.scala:214)
[info]   at sbt.RunnerWrapper$1.runRunner2(FrameworkWrapper.java:223)
[info]   at sbt.RunnerWrapper$1.execute(FrameworkWrapper.java:236)
[info]   at sbt.ForkMain$Run$2.call(ForkMain.java:294)
[info]   at sbt.ForkMain$Run$2.call(ForkMain.java:284)
[info]   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
[info]   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
[info]   at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
[info]   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
[info]   at java.lang.Thread.run(Thread.java:662)


I checked the source code, and it seems this assert is the one that fails (line number is 96):


  assert(output.size === expectedOutput.size)


So I printed the output, outputBuffer and expectedOutput values:


 // Verify whether data received was as expected
 85 println()
 86 println("output.size = " + output.size)
 87
 88 output.foreach(x => println("[" + x.mkString(",") + "]"))
 89 println("outputBuffer.size = " + outputBuffer.size)
 90

Re: Unit test failure: Address already in use

2014-06-18 Thread Anselme Vignon
Hi,

Could your problem come from the fact that you run your tests in parallel?

If you are running Spark in local mode, you cannot have concurrent Spark instances
running. This means that your tests instantiating a SparkContext cannot be
run in parallel. The easiest fix is to tell sbt not to run tests in parallel.
This can be done by adding the following line to your build.sbt:

parallelExecution in Test := false
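
For context, a minimal build.sbt along those lines might look like the sketch below (the project name, Scala version and Spark version are only placeholders):

name := "spark-unit-tests"

scalaVersion := "2.10.4"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0"

// Run test suites one at a time so only one local SparkContext exists at any moment.
parallelExecution in Test := false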

Cheers,

Anselme




2014-06-17 23:01 GMT+02:00 SK skrishna...@gmail.com:

 Hi,

 I have 3 unit tests (independent of each other) in the /src/test/scala
 folder. When I run each of them individually using: sbt test-only test,
 all the 3 pass the test. But when I run them all using sbt test, then
 they
 fail with the warning below. I am wondering if the binding exception
 results
 in failure to run the job, thereby causing the failure. If so, what can I
 do
 to address this binding exception? I am running these tests locally on a
 standalone machine (i.e. SparkContext(local, test)).


 14/06/17 13:42:48 WARN component.AbstractLifeCycle: FAILED
 org.eclipse.jetty.server.Server@3487b78d: java.net.BindException: Address
 already in use
 java.net.BindException: Address already in use
 at sun.nio.ch.Net.bind0(Native Method)
 at sun.nio.ch.Net.bind(Net.java:174)
 at
 sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:139)
 at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:77)


 thanks



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/Unit-test-failure-Address-already-in-use-tp7771.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.



RE: Unit test failure: Address already in use

2014-06-18 Thread Lisonbee, Todd

Disabling parallelExecution has worked for me.

Other alternatives I’ve tried that also work include:

1. Using a lock – this will let tests execute in parallel except for those 
using a SparkContext.  If you have a large number of tests that could execute 
in parallel, this can shave off some time.

import scala.concurrent.Lock // Lock here is scala.concurrent.Lock

object TestingSparkContext {
  val lock = new Lock()
}

// before you instantiate your local SparkContext
TestingSparkContext.lock.acquire()

// after you call sc.stop()
TestingSparkContext.lock.release()


2. Sharing a local SparkContext between tests.

- This is nice because your tests will run faster.  Start-up and shutdown is 
time consuming (can add a few seconds per test).

- The downside is that your tests are using the same SparkContext so they are 
less independent of each other.  I haven’t seen issues with this yet but there 
are likely some things that might crop up.
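
For reference, the shared-context idea might be sketched like this in Scala (the object, trait and suite names below are made up):

import org.apache.spark.SparkContext

// A single local SparkContext, created lazily and reused by every suite that mixes in the trait.
object SharedSparkContext {
  lazy val sc: SparkContext = new SparkContext("local", "shared-test")
}

trait SharedSparkContext {
  def sc: SparkContext = SharedSparkContext.sc
}

// Example usage with ScalaTest:
//
// class MyAggSuite extends org.scalatest.FunSuite with SharedSparkContext {
//   test("counts elements") {
//     assert(sc.parallelize(1 to 10).count() == 10)
//   }
// }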

Best,

Todd


From: Anselme Vignon [mailto:anselme.vig...@flaminem.com]
Sent: Wednesday, June 18, 2014 12:33 AM
To: user@spark.apache.org
Subject: Re: Unit test failure: Address already in use

Hi,

Could your problem come from the fact that you run your tests in parallel ?

If you are spark in local mode, you cannot have concurrent spark instances 
running. this means that your tests instantiating sparkContext cannot be run in 
parallel. The easiest fix is to tell sbt to not run parallel tests.
This can be done by adding the following line in your build.sbt:

parallelExecution in Test := false

Cheers,

Anselme



2014-06-17 23:01 GMT+02:00 SK 
skrishna...@gmail.commailto:skrishna...@gmail.com:
Hi,

I have 3 unit tests (independent of each other) in the /src/test/scala
folder. When I run each of them individually using: sbt test-only test,
all the 3 pass the test. But when I run them all using sbt test, then they
fail with the warning below. I am wondering if the binding exception results
in failure to run the job, thereby causing the failure. If so, what can I do
to address this binding exception? I am running these tests locally on a
standalone machine (i.e. SparkContext(local, test)).


14/06/17 13:42:48 WARN component.AbstractLifeCycle: FAILED
org.eclipse.jetty.server.Server@3487b78dmailto:org.eclipse.jetty.server.Server@3487b78d:
 java.net.BindException: Address
already in use
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:174)
at
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:139)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:77)


thanks



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Unit-test-failure-Address-already-in-use-tp7771.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.



Re: Unit test failure: Address already in use

2014-06-18 Thread Philip Ogren
In my unit tests I have a base class that all my tests extend that has a 
setup and teardown method that they inherit.  They look something like this:


var spark: SparkContext = _

@Before
def setUp() {
  Thread.sleep(100L) // this seems to give spark more time to reset from the previous test's tearDown

  spark = new SparkContext("local", "test spark")
}

@After
def tearDown() {
  spark.stop
  spark = null // not sure why this helps but it does!
  System.clearProperty("spark.master.port")
}


It's been since last fall (i.e. version 0.8.x) since I've examined this 
code and so I can't vouch that it is still accurate/necessary - but it 
still works for me.



On 06/18/2014 12:59 PM, Lisonbee, Todd wrote:


Disabling parallelExecution has worked for me.

Other alternatives I’ve tried that also work include:

1. Using a lock – this will let tests execute in parallel except for 
those using a SparkContext.  If you have a large number of tests that 
could execute in parallel, this can shave off some time.


object TestingSparkContext {

val lock = new Lock()

}

// before you instantiate your local SparkContext

TestingSparkContext.lock.acquire()

// after you call sc.stop()

TestingSparkContext.lock.release()

2. Sharing a local SparkContext between tests.

- This is nice because your tests will run faster.  Start-up and 
shutdown is time consuming (can add a few seconds per test).


- The downside is that your tests are using the same SparkContext so 
they are less independent of each other.  I haven’t seen issues with 
this yet but there are likely some things that might crop up.


Best,

Todd

*From:*Anselme Vignon [mailto:anselme.vig...@flaminem.com]
*Sent:* Wednesday, June 18, 2014 12:33 AM
*To:* user@spark.apache.org
*Subject:* Re: Unit test failure: Address already in use

Hi,

Could your problem come from the fact that you run your tests in 
parallel ?


If you are spark in local mode, you cannot have concurrent spark 
instances running. this means that your tests instantiating 
sparkContext cannot be run in parallel. The easiest fix is to tell sbt 
to not run parallel tests.


This can be done by adding the following line in your build.sbt:

parallelExecution in Test := false

Cheers,

Anselme

2014-06-17 23:01 GMT+02:00 SK skrishna...@gmail.com 
mailto:skrishna...@gmail.com:


Hi,

I have 3 unit tests (independent of each other) in the /src/test/scala
folder. When I run each of them individually using: sbt test-only
test,
all the 3 pass the test. But when I run them all using sbt test,
then they
fail with the warning below. I am wondering if the binding
exception results
in failure to run the job, thereby causing the failure. If so,
what can I do
to address this binding exception? I am running these tests
locally on a
standalone machine (i.e. SparkContext(local, test)).


14/06/17 13:42:48 WARN component.AbstractLifeCycle: FAILED
org.eclipse.jetty.server.Server@3487b78d
mailto:org.eclipse.jetty.server.Server@3487b78d:
java.net.BindException: Address
already in use
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:174)
at
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:139)
at
sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:77)


thanks



--
View this message in context:

http://apache-spark-user-list.1001560.n3.nabble.com/Unit-test-failure-Address-already-in-use-tp7771.html
Sent from the Apache Spark User List mailing list archive at
Nabble.com.





Unit test failure: Address already in use

2014-06-17 Thread SK
Hi,

I have 3 unit tests (independent of each other) in the /src/test/scala
folder. When I run each of them individually using: sbt test-only test,
all the 3 pass the test. But when I run them all using sbt test, then they
fail with the warning below. I am wondering if the binding exception results
in failure to run the job, thereby causing the failure. If so, what can I do
to address this binding exception? I am running these tests locally on a
standalone machine (i.e. SparkContext(local, test)).


14/06/17 13:42:48 WARN component.AbstractLifeCycle: FAILED
org.eclipse.jetty.server.Server@3487b78d: java.net.BindException: Address
already in use
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:174)
at
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:139)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:77)


thanks



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Unit-test-failure-Address-already-in-use-tp7771.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


printing in unit test

2014-06-13 Thread SK
Hi,
My unit test is failing (the output is not matching the expected output). I
would like to print out the value of the output. But 
rdd.foreach(r => println(r)) does not work from the unit test. How can I print
or write out the output to a file/screen?

thanks.
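
For reference, one common approach (a sketch only, not from this thread) is to bring the data back to the driver before printing, since the closure passed to foreach runs inside the tasks and its println output may end up in executor logs rather than the test console. The names below are placeholders:

import org.apache.spark.SparkContext

object PrintRddExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local", "print-example")
    try {
      val rdd = sc.parallelize(Seq("a", "b", "c"))

      // Collect to the driver first, then print there (visible in the test console).
      rdd.collect().foreach(println)

      // For large RDDs, print only a sample.
      rdd.take(2).foreach(println)

      // Or write the RDD out and inspect the files afterwards
      // (the output directory must not already exist).
      rdd.saveAsTextFile("target/test-output")
    } finally {
      sc.stop()
    }
  }
}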



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/printing-in-unit-test-tp7611.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


unit test

2014-06-06 Thread b0c1
Hi!

I have two question:
1.
I want to test my application. My app will write the result to Elasticsearch
(stage 1) with saveAsHadoopFile. How can I write a test case for it? Do I need to
pull up a MiniDFSCluster, or is there another way?

My application flow plan: 
Kafka => Spark Streaming (enrich) -> Elasticsearch => Spark (map/reduce) ->
HBase

2.
Can Spark read data from Elasticsearch? What is the preferred way to do this?

b0c1



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/unit-test-tp7155.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


Test

2014-05-15 Thread Matei Zaharia


Re: Re: java.io.FileNotFoundException: /test/spark-0.9.1/work/app-20140505053550-0000/2/stdout (No such file or directory)

2014-05-11 Thread Francis . Hu
I have just resolved the problem by running the master and worker daemons 
individually on the machines where they live.

If I start them via the script sbin/start-all.sh, the problem always exists.  

 

 

From: Francis.Hu [mailto:francis...@reachjunction.com] 
Sent: Tuesday, May 06, 2014 10:31
To: user@spark.apache.org
Subject: Re: Re: java.io.FileNotFoundException: 
/test/spark-0.9.1/work/app-20140505053550-/2/stdout (No such file or 
directory)

 

I looked into the log again; all the exceptions are about FileNotFoundException. 
In the web UI there is no further info I can check except for the basic description of 
the job.  

I attached the log file, could you help take a look? Thanks.

 

Francis.Hu

 

From: Tathagata Das [mailto:tathagata.das1...@gmail.com] 
Sent: Tuesday, May 06, 2014 10:16
To: user@spark.apache.org
Subject: Re: Re: java.io.FileNotFoundException: 
/test/spark-0.9.1/work/app-20140505053550-/2/stdout (No such file or 
directory)

 

Can you check the Spark worker logs on that machine, either from the web UI or 
directly? They should be under /test/spark-XXX/logs/. See if that has any error.

If there is no permission issue, I am not sure why stdout and stderr are not being 
generated. 

 

TD

 

On Mon, May 5, 2014 at 7:13 PM, Francis.Hu francis...@reachjunction.com wrote:

The file does not exist, in fact, and there is no permission issue. 

 

francis@ubuntu-4:/test/spark-0.9.1$ ll work/app-20140505053550-/

total 24

drwxrwxr-x  6 francis francis 4096 May  5 05:35 ./

drwxrwxr-x 11 francis francis 4096 May  5 06:18 ../

drwxrwxr-x  2 francis francis 4096 May  5 05:35 2/

drwxrwxr-x  2 francis francis 4096 May  5 05:35 4/

drwxrwxr-x  2 francis francis 4096 May  5 05:35 7/

drwxrwxr-x  2 francis francis 4096 May  5 05:35 9/

 

Francis

 

From: Tathagata Das [mailto:tathagata.das1...@gmail.com] 
Sent: Tuesday, May 06, 2014 3:45
To: user@spark.apache.org
Subject: Re: java.io.FileNotFoundException: 
/test/spark-0.9.1/work/app-20140505053550-/2/stdout (No such file or 
directory)

 

Do those files actually exist? Those stdout/stderr files should have the output of the 
Spark executors running in the workers, and it's weird that they don't exist. 
Could it be a permission issue - maybe the directories/files are not being generated 
because they cannot be?

 

TD

 

On Mon, May 5, 2014 at 3:06 AM, Francis.Hu francis...@reachjunction.com wrote:

Hi,All

 

 

We run a spark cluster with three workers. 

created a spark streaming application,

then run the spark project using below command:

 

shell sbt run spark://192.168.219.129:7077 tcp://192.168.20.118:5556 foo

 

we looked at the webui of workers, jobs failed without any error or info, but 
FileNotFoundException occurred in workers' log file as below:

Is this an existent issue of spark? 

 

 

-in workers' 
logs/spark-francis-org.apache.spark.deploy.worker.Worker-1-ubuntu-4.out

 

14/05/05 02:39:39 WARN AbstractHttpConnection: 
/logPage/?appId=app-20140505053550-executorId=2logType=stdout

java.io.FileNotFoundException: 
/test/spark-0.9.1/work/app-20140505053550-/2/stdout (No such file or 
directory)

at java.io.FileInputStream.open(Native Method)

at java.io.FileInputStream.init(FileInputStream.java:138)

at org.apache.spark.util.Utils$.offsetBytes(Utils.scala:687)

at 
org.apache.spark.deploy.worker.ui.WorkerWebUI.logPage(WorkerWebUI.scala:119)

at 
org.apache.spark.deploy.worker.ui.WorkerWebUI$$anonfun$6.apply(WorkerWebUI.scala:52)

at 
org.apache.spark.deploy.worker.ui.WorkerWebUI$$anonfun$6.apply(WorkerWebUI.scala:52)

at org.apache.spark.ui.JettyUtils$$anon$1.handle(JettyUtils.scala:61)

at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1040)

at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:976)

at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)

at 
org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52)

at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)

at org.eclipse.jetty.server.Server.handle(Server.java:363)

at 
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:483)

at 
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:920)

at 
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:982)

at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:635)

at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)

at 
org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)

at 
org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:628)

at 
org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52

Re: Test

2014-05-11 Thread Azuryy
Got it.

But it doesn't indicate that everyone can receive this test.

The mailing list has been unstable recently.


Sent from my iPhone5s

 On 2014年5月10日, at 13:31, Matei Zaharia matei.zaha...@gmail.com wrote:
 
 This message has no content.


Re: Test

2014-05-11 Thread Aaron Davidson
I didn't get the original message, only the reply. Ruh-roh.


On Sun, May 11, 2014 at 8:09 AM, Azuryy azury...@gmail.com wrote:

 Got.

 But it doesn't indicate all can receive this test.

 Mail list is unstable recently.


 Sent from my iPhone5s

 On 2014年5月10日, at 13:31, Matei Zaharia matei.zaha...@gmail.com wrote:

 *This message has no content.*




java.io.FileNotFoundException: /test/spark-0.9.1/work/app-20140505053550-0000/2/stdout (No such file or directory)

2014-05-05 Thread Francis . Hu
Hi,All

 

 

We run a Spark cluster with three workers. 

We created a Spark Streaming application,

then ran the Spark project using the command below:

 

shell sbt run spark://192.168.219.129:7077 tcp://192.168.20.118:5556 foo

 

We looked at the web UI of the workers; the jobs failed without any error or info,
but a FileNotFoundException occurred in the workers' log files as below:

Is this an existing issue in Spark? 

 

 

-in workers' logs/spark-francis-org.apache.spark.deploy.worker.Worker-1-ubuntu-4.out----

 

14/05/05 02:39:39 WARN AbstractHttpConnection:
/logPage/?appId=app-20140505053550-executorId=2logType=stdout

java.io.FileNotFoundException: /test/spark-0.9.1/work/app-20140505053550-/2/stdout (No such file or directory)

at java.io.FileInputStream.open(Native Method)

at java.io.FileInputStream.init(FileInputStream.java:138)

at org.apache.spark.util.Utils$.offsetBytes(Utils.scala:687)

at org.apache.spark.deploy.worker.ui.WorkerWebUI.logPage(WorkerWebUI.scala:119)

at org.apache.spark.deploy.worker.ui.WorkerWebUI$$anonfun$6.apply(WorkerWebUI.scala:52)

at org.apache.spark.deploy.worker.ui.WorkerWebUI$$anonfun$6.apply(WorkerWebUI.scala:52)

at org.apache.spark.ui.JettyUtils$$anon$1.handle(JettyUtils.scala:61)

at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1040)

at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:976)

at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)

at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52)

at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)

at org.eclipse.jetty.server.Server.handle(Server.java:363)

at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:483)

at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:920)

at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:982)

at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:635)

at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)

at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)

at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:628)

at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)

at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)

at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)

at java.lang.Thread.run(Thread.java:722)

14/05/05 02:39:41 WARN AbstractHttpConnection: /logPage/?appId=app-20140505053550-executorId=9logType=stderr

java.io.FileNotFoundException: /test/spark-0.9.1/work/app-20140505053550-/9/stderr (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.init(FileInputStream.java:138)
at org.apache.spark.util.Utils$.offsetBytes(Utils.scala:687)
at org.apache.spark.deploy.worker.ui.WorkerWebUI.logPage(WorkerWebUI.scala:119)
at org.apache.spark.deploy.worker.ui.WorkerWebUI$$anonfun$6.apply(WorkerWebUI.scala:52)
at org.apache.spark.deploy.worker.ui.WorkerWebUI$$anonfun$6.apply(WorkerWebUI.scala:52)
at org.apache.spark.ui.JettyUtils$$anon$1.handle(JettyUtils.scala:61)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1040)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:976)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:363)
at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:483)
at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:920)
at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:982)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:635)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:628)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java
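The stack trace itself shows where the exception comes from: the worker UI's log page (WorkerWebUI.logPage) reads a slice of the requested executor log via Utils.offsetBytes, which opens the file at work/<appId>/<executorId>/<stdout|stderr>. If an executor never wrote those files, for example because it failed before it ever launched, the page request ends in exactly this FileNotFoundException. A rough sketch of that read path follows; it is simplified and not the actual Spark source, with the byte-range handling reduced to a start offset and a length:

import java.io.{File, RandomAccessFile}

// Simplified sketch of what the worker UI's log page effectively does
// (names mirror the stack trace; this is not the real Spark implementation).
def readLogSlice(workDir: String, appId: String, executorId: String,
                 logType: String, start: Long, length: Int): String = {
  val logFile = new File(new File(new File(workDir, appId), executorId), logType)
  // Opening a file that the executor never created is what produces the
  // java.io.FileNotFoundException seen in the worker log.
  val raf = new RandomAccessFile(logFile, "r")
  try {
    raf.seek(start)
    val buf = new Array[Byte](length)
    val read = raf.read(buf)
    new String(buf, 0, math.max(read, 0), "UTF-8")
  } finally {
    raf.close()
  }
}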

Re: java.io.FileNotFoundException: /test/spark-0.9.1/work/app-20140505053550-0000/2/stdout (No such file or directory)

2014-05-05 Thread Francis . Hu
The files do not in fact exist, and there is no permission issue.

 

francis@ubuntu-4:/test/spark-0.9.1$ ll work/app-20140505053550-/

total 24

drwxrwxr-x  6 francis francis 4096 May  5 05:35 ./

drwxrwxr-x 11 francis francis 4096 May  5 06:18 ../

drwxrwxr-x  2 francis francis 4096 May  5 05:35 2/

drwxrwxr-x  2 francis francis 4096 May  5 05:35 4/

drwxrwxr-x  2 francis francis 4096 May  5 05:35 7/

drwxrwxr-x  2 francis francis 4096 May  5 05:35 9/
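For completeness, the same check can be scripted from a Scala shell; a throwaway sketch like the one below (the work-directory path is taken from the subject line, everything else is illustrative) lists which executor directories actually contain stdout/stderr and whether they are readable:

import java.io.File

// Hypothetical check: for each executor directory under the app's work dir,
// report whether stdout/stderr exist and are readable.
val appDir = new File("/test/spark-0.9.1/work/app-20140505053550-0000")
for {
  execDir <- Option(appDir.listFiles()).getOrElse(Array.empty[File]) if execDir.isDirectory
  name    <- Seq("stdout", "stderr")
} {
  val log = new File(execDir, name)
  println(s"$log exists=${log.exists} readable=${log.canRead} size=${log.length}")
}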

 

Francis

 

From: Tathagata Das [mailto:tathagata.das1...@gmail.com]
Sent: Tuesday, May 06, 2014 3:45
To: user@spark.apache.org
Subject: Re: java.io.FileNotFoundException: /test/spark-0.9.1/work/app-20140505053550-/2/stdout (No such file or directory)

 

Do those files actually exist? The stdout/stderr files should have the output of the Spark executors running on the workers, and it's weird that they don't exist. Could it be a permission issue, i.e. the directories/files are not being generated because they cannot be?

 

TD

 

On Mon, May 5, 2014 at 3:06 AM, Francis.Hu francis...@reachjunction.com wrote:

Hi,All

 

 

We run a Spark cluster with three workers.

We created a Spark Streaming application,

and then ran the project using the command below:

 

shell sbt run spark://192.168.219.129:7077 tcp://192.168.20.118:5556 foo

 

We looked at the workers' web UI: the jobs failed without any error or info, but a
FileNotFoundException occurred in the workers' log files as shown below:

Is this a known issue in Spark?

 

 

-in workers' logs/spark-francis-org.apache.spark.deploy.worker.Worker-1-ubuntu-4.out

 

14/05/05 02:39:39 WARN AbstractHttpConnection: 
/logPage/?appId=app-20140505053550-executorId=2logType=stdout

java.io.FileNotFoundException: 
/test/spark-0.9.1/work/app-20140505053550-/2/stdout (No such file or 
directory)

at java.io.FileInputStream.open(Native Method)

at java.io.FileInputStream.init(FileInputStream.java:138)

at org.apache.spark.util.Utils$.offsetBytes(Utils.scala:687)

at 
org.apache.spark.deploy.worker.ui.WorkerWebUI.logPage(WorkerWebUI.scala:119)

at 
org.apache.spark.deploy.worker.ui.WorkerWebUI$$anonfun$6.apply(WorkerWebUI.scala:52)

at 
org.apache.spark.deploy.worker.ui.WorkerWebUI$$anonfun$6.apply(WorkerWebUI.scala:52)

at org.apache.spark.ui.JettyUtils$$anon$1.handle(JettyUtils.scala:61)

at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1040)

at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:976)

at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)

at 
org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52)

at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)

at org.eclipse.jetty.server.Server.handle(Server.java:363)

at 
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:483)

at 
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:920)

at 
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:982)

at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:635)

at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)

at 
org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)

at 
org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:628)

at 
org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)

at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)

at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)

at java.lang.Thread.run(Thread.java:722)

14/05/05 02:39:41 WARN AbstractHttpConnection: 
/logPage/?appId=app-20140505053550-executorId=9logType=stderr

java.io.FileNotFoundException: 
/test/spark-0.9.1/work/app-20140505053550-/9/stderr (No such file or 
directory)

at java.io.FileInputStream.open(Native Method)

at java.io.FileInputStream.init(FileInputStream.java:138)

at org.apache.spark.util.Utils$.offsetBytes(Utils.scala:687)

at 
org.apache.spark.deploy.worker.ui.WorkerWebUI.logPage(WorkerWebUI.scala:119)

at 
org.apache.spark.deploy.worker.ui.WorkerWebUI$$anonfun$6.apply(WorkerWebUI.scala:52)

at 
org.apache.spark.deploy.worker.ui.WorkerWebUI$$anonfun$6.apply(WorkerWebUI.scala:52)

at org.apache.spark.ui.JettyUtils$$anon$1.handle(JettyUtils.scala:61)

at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1040)

at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:976
