Re: The Dataset unit test is much slower than the RDD unit test (in Scala)
Which Spark version are you using? SPARK-36444[1] and SPARK-38138[2] may be related, please test w/ the patched version or disable DPP by setting spark.sql.optimizer.dynamicPartitionPruning.enabled=false to see if it helps. [1] https://issues.apache.org/jira/browse/SPARK-36444 [2] https://issues.apache.org/jira/browse/SPARK-38138 Thanks, Cheng Pan On Nov 2, 2022 at 00:14:34, Enrico Minack wrote: > Hi Tanin, > > running your test with option "spark.sql.planChangeLog.level" set to > "info" or "warn" (depending on your Spark log level) will show you > insights into the planning (which rules are applied, how long rules > take, how many iterations are done). > > Hoping this helps, > Enrico > > > Am 25.10.22 um 21:54 schrieb Tanin Na Nakorn: > > Hi All, > > > Our data job is very complex (e.g. 100+ joins), and we have switched > > from RDD to Dataset recently. > > > We've found that the unit test takes much longer. We profiled it and > > have found that it's the planning phase that is slow, not execution. > > > I wonder if anyone has encountered this issue before and if there's a > > way to make the planning phase faster (e.g. maybe disabling certain > > optimizers). > > > Any thoughts or input would be appreciated. > > > Thank you, > > Tanin > > > > > - > To unsubscribe e-mail: user-unsubscr...@spark.apache.org > >
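A minimal sketch of Cheng Pan's suggestion, assuming a Scala test that builds its own SparkSession (the app name is illustrative):

import org.apache.spark.sql.SparkSession

// Build the test session with dynamic partition pruning disabled, then re-run
// the slow test to see whether planning time drops.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("planning-regression-test")
  .config("spark.sql.optimizer.dynamicPartitionPruning.enabled", "false")
  .getOrCreate()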
Re: The Dataset unit test is much slower than the RDD unit test (in Scala)
Hi Tanin, running your test with option "spark.sql.planChangeLog.level" set to "info" or "warn" (depending on your Spark log level) will show you insights into the planning (which rules are applied, how long rules take, how many iterations are done). Hoping this helps, Enrico Am 25.10.22 um 21:54 schrieb Tanin Na Nakorn: Hi All, Our data job is very complex (e.g. 100+ joins), and we have switched from RDD to Dataset recently. We've found that the unit test takes much longer. We profiled it and have found that it's the planning phase that is slow, not execution. I wonder if anyone has encountered this issue before and if there's a way to make the planning phase faster (e.g. maybe disabling certain optimizers). Any thoughts or input would be appreciated. Thank you, Tanin - To unsubscribe e-mail: user-unsubscr...@spark.apache.org
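A small sketch of Enrico's suggestion, assuming an existing test SparkSession named `spark`; the option can also be set at runtime before the slow query is planned:

// Log every analyzer/optimizer rule that changes the plan, plus rule timing and
// iteration metrics, at WARN so it is visible under the default log level.
spark.conf.set("spark.sql.planChangeLog.level", "warn")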
The Dataset unit test is much slower than the RDD unit test (in Scala)
Hi All, Our data job is very complex (e.g. 100+ joins), and we have switched from RDD to Dataset recently. We've found that the unit test takes much longer. We profiled it and have found that it's the planning phase that is slow, not execution. I wonder if anyone has encountered this issue before and if there's a way to make the planning phase faster (e.g. maybe disabling certain optimizers). Any thoughts or input would be appreciated. Thank you, Tanin
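A sketch of the "disabling certain optimizers" idea mentioned above, not a recommendation for any particular rule: Spark lets individual optimizer rules be excluded by class name through spark.sql.optimizer.excludedRules (the rule named below is only an example, and some rules cannot be excluded):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("test")
  // exclude a rule identified as expensive via the plan change log
  .config("spark.sql.optimizer.excludedRules",
    "org.apache.spark.sql.catalyst.optimizer.CostBasedJoinReorder")
  .getOrCreate()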
Re: ivy unit test case filing for Spark
Are you using IvyVPN which causes this problem? If the VPN software changes the network URL silently you should avoid using them. Regards. On Wed, Dec 22, 2021 at 1:48 AM Pralabh Kumar wrote: > Hi Spark Team > > I am building a spark in VPN . But the unit test case below is failing. > This is pointing to ivy location which cannot be reached within VPN . Any > help would be appreciated > > test("SPARK-33084: Add jar support Ivy URI -- default transitive = true") > { > *sc *= new SparkContext(new > SparkConf().setAppName("test").setMaster("local-cluster[3, > 1, 1024]")) > *sc*.addJar("*ivy://org.apache.hive:hive-storage-api:2.7.0*") > assert(*sc*.listJars().exists(_.contains( > "org.apache.hive_hive-storage-api-2.7.0.jar"))) > assert(*sc*.listJars().exists(_.contains( > "commons-lang_commons-lang-2.6.jar"))) > } > > Error > > - SPARK-33084: Add jar support Ivy URI -- default transitive = true *** > FAILED *** > java.lang.RuntimeException: [unresolved dependency: > org.apache.hive#hive-storage-api;2.7.0: not found] > at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates( > SparkSubmit.scala:1447) > at org.apache.spark.util.DependencyUtils$.resolveMavenDependencies( > DependencyUtils.scala:185) > at org.apache.spark.util.DependencyUtils$.resolveMavenDependencies( > DependencyUtils.scala:159) > at org.apache.spark.SparkContext.addJar(SparkContext.scala:1996) > at org.apache.spark.SparkContext.addJar(SparkContext.scala:1928) > at org.apache.spark.SparkContextSuite.$anonfun$new$115(SparkContextSuite. > scala:1041) > at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > > Regards > Pralabh Kumar > > >
Re: ivy unit test case filing for Spark
You would have to make it available? This doesn't seem like a spark issue. On Tue, Dec 21, 2021, 10:48 AM Pralabh Kumar wrote: > Hi Spark Team > > I am building a spark in VPN . But the unit test case below is failing. > This is pointing to ivy location which cannot be reached within VPN . Any > help would be appreciated > > test("SPARK-33084: Add jar support Ivy URI -- default transitive = true") > { > *sc *= new SparkContext(new > SparkConf().setAppName("test").setMaster("local-cluster[3, > 1, 1024]")) > *sc*.addJar("*ivy://org.apache.hive:hive-storage-api:2.7.0*") > assert(*sc*.listJars().exists(_.contains( > "org.apache.hive_hive-storage-api-2.7.0.jar"))) > assert(*sc*.listJars().exists(_.contains( > "commons-lang_commons-lang-2.6.jar"))) > } > > Error > > - SPARK-33084: Add jar support Ivy URI -- default transitive = true *** > FAILED *** > java.lang.RuntimeException: [unresolved dependency: > org.apache.hive#hive-storage-api;2.7.0: not found] > at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates( > SparkSubmit.scala:1447) > at org.apache.spark.util.DependencyUtils$.resolveMavenDependencies( > DependencyUtils.scala:185) > at org.apache.spark.util.DependencyUtils$.resolveMavenDependencies( > DependencyUtils.scala:159) > at org.apache.spark.SparkContext.addJar(SparkContext.scala:1996) > at org.apache.spark.SparkContext.addJar(SparkContext.scala:1928) > at org.apache.spark.SparkContextSuite.$anonfun$new$115(SparkContextSuite. > scala:1041) > at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > > Regards > Pralabh Kumar > > >
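A sketch of "making it available" from inside the VPN, assuming an internal Maven/Ivy mirror exists; the repository URL is hypothetical, and whether these settings are picked up by addJar's ivy:// path may depend on the Spark version:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("test")
  .setMaster("local-cluster[3, 1, 1024]")
  // point dependency resolution at a mirror reachable from inside the VPN
  .set("spark.jars.repositories", "https://repo.internal.example.com/maven")
  // or supply a full Ivy settings file instead:
  // .set("spark.jars.ivySettings", "/path/to/ivysettings.xml")
val sc = new SparkContext(conf)
sc.addJar("ivy://org.apache.hive:hive-storage-api:2.7.0")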
ivy unit test case filing for Spark
Hi Spark Team I am building a spark in VPN . But the unit test case below is failing. This is pointing to ivy location which cannot be reached within VPN . Any help would be appreciated test("SPARK-33084: Add jar support Ivy URI -- default transitive = true") { *sc *= new SparkContext(new SparkConf().setAppName("test").setMaster("local-cluster[3, 1, 1024]")) *sc*.addJar("*ivy://org.apache.hive:hive-storage-api:2.7.0*") assert(*sc*.listJars().exists(_.contains( "org.apache.hive_hive-storage-api-2.7.0.jar"))) assert(*sc*.listJars().exists(_.contains( "commons-lang_commons-lang-2.6.jar"))) } Error - SPARK-33084: Add jar support Ivy URI -- default transitive = true *** FAILED *** java.lang.RuntimeException: [unresolved dependency: org.apache.hive#hive-storage-api;2.7.0: not found] at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates( SparkSubmit.scala:1447) at org.apache.spark.util.DependencyUtils$.resolveMavenDependencies( DependencyUtils.scala:185) at org.apache.spark.util.DependencyUtils$.resolveMavenDependencies( DependencyUtils.scala:159) at org.apache.spark.SparkContext.addJar(SparkContext.scala:1996) at org.apache.spark.SparkContext.addJar(SparkContext.scala:1928) at org.apache.spark.SparkContextSuite.$anonfun$new$115(SparkContextSuite. scala:1041) at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) Regards Pralabh Kumar
Re: Need Unit test complete reference for Pyspark
Hey, they are good libraries to get you started; I have used both of them. Unfortunately, as far as I saw when I started to use them, only a few people maintain them. But you can get pointers out of them for writing tests. The code below can get you started.

What you'll need is:
- a method to create a DataFrame on the fly, perhaps from a string; you can have a look at pandas, it will have methods for it
- a method to test DataFrame equality; you can use df1.subtract(df2)

I am assuming you are into DataFrames rather than RDDs, for which the two packages you mention should have everything you need.

hth
marco

import logging
import shutil

import pyspark
import pytest
from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext, SparkSession


@pytest.fixture
def spark_session():
    return SparkSession.builder \
        .master('local[1]') \
        .appName('SparkByExamples.com') \
        .getOrCreate()


def test_create_table(spark_session):
    df = spark_session.createDataFrame([['one', 'two']]).toDF(*['first', 'second'])
    df.show()
    df2 = spark_session.createDataFrame([['one', 'two']]).toDF(*['first', 'second'])
    assert df.subtract(df2).count() == 0

On Thu, Nov 19, 2020 at 6:38 AM Sachit Murarka wrote: > Hi Users, > > I have to write Unit Test cases for PySpark. > I think pytest-spark and "spark testing base" are good test libraries. > > Can anyone please provide full reference for writing the test cases in > Python using these? > > Kind Regards, > Sachit Murarka >
Need Unit test complete reference for Pyspark
Hi Users, I have to write Unit Test cases for PySpark. I think pytest-spark and "spark testing base" are good test libraries. Can anyone please provide full reference for writing the test cases in Python using these? Kind Regards, Sachit Murarka
Re: how do i force unit test to do whole stage codegen
Thanks Koert for the kind words. That part however is easy to fix and was surprised to have seen the old style referenced (!) Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski On Wed, Apr 5, 2017 at 6:14 PM, Koert Kuipers <ko...@tresata.com> wrote: > its pretty much impossible to be fully up to date with spark given how fast > it moves! > > the book is a very helpful reference > > On Wed, Apr 5, 2017 at 11:15 AM, Jacek Laskowski <ja...@japila.pl> wrote: >> >> Hi, >> >> I'm very sorry for not being up to date with the current style (and >> "promoting" the old style) and am going to review that part soon. I'm very >> close to touch it again since I'm with Optimizer these days. >> >> Jacek >> >> On 5 Apr 2017 6:08 a.m., "Kazuaki Ishizaki" <ishiz...@jp.ibm.com> wrote: >>> >>> Hi, >>> The page in the URL explains the old style of physical plan output. >>> The current style adds "*" as a prefix of each operation that the >>> whole-stage codegen can be apply to. >>> >>> So, in your test case, whole-stage codegen has been already enabled!! >>> >>> FYI. I think that it is a good topic for d...@spark.apache.org. >>> >>> Kazuaki Ishizaki >>> >>> >>> >>> From:Koert Kuipers <ko...@tresata.com> >>> To:"user@spark.apache.org" <user@spark.apache.org> >>> Date:2017/04/05 05:12 >>> Subject:how do i force unit test to do whole stage codegen >>> >>> >>> >>> >>> i wrote my own expression with eval and doGenCode, but doGenCode never >>> gets called in tests. >>> >>> also as a test i ran this in a unit test: >>> spark.range(10).select('id as 'asId).where('id === 4).explain >>> according to >>> >>> https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-sql-whole-stage-codegen.html >>> this is supposed to show: >>> == Physical Plan == >>> WholeStageCodegen >>> : +- Project [id#0L AS asId#3L] >>> : +- Filter (id#0L = 4) >>> :+- Range 0, 1, 8, 10, [id#0L] >>> >>> but it doesn't. instead it shows: >>> >>> == Physical Plan == >>> *Project [id#12L AS asId#15L] >>> +- *Filter (id#12L = 4) >>> +- *Range (0, 10, step=1, splits=Some(4)) >>> >>> so i am again missing the WholeStageCodegen. any idea why? >>> >>> i create spark session for unit tests simply as: >>> val session = SparkSession.builder >>> .master("local[*]") >>> .appName("test") >>> .config("spark.sql.shuffle.partitions", 4) >>> .getOrCreate() >>> >>> > - To unsubscribe e-mail: user-unsubscr...@spark.apache.org
Re: how do i force unit test to do whole stage codegen
its pretty much impossible to be fully up to date with spark given how fast it moves! the book is a very helpful reference On Wed, Apr 5, 2017 at 11:15 AM, Jacek Laskowski <ja...@japila.pl> wrote: > Hi, > > I'm very sorry for not being up to date with the current style (and > "promoting" the old style) and am going to review that part soon. I'm very > close to touch it again since I'm with Optimizer these days. > > Jacek > > On 5 Apr 2017 6:08 a.m., "Kazuaki Ishizaki" <ishiz...@jp.ibm.com> wrote: > >> Hi, >> The page in the URL explains the old style of physical plan output. >> The current style adds "*" as a prefix of each operation that the >> whole-stage codegen can be apply to. >> >> So, in your test case, whole-stage codegen has been already enabled!! >> >> FYI. I think that it is a good topic for d...@spark.apache.org. >> >> Kazuaki Ishizaki >> >> >> >> From:Koert Kuipers <ko...@tresata.com> >> To:"user@spark.apache.org" <user@spark.apache.org> >> Date:2017/04/05 05:12 >> Subject:how do i force unit test to do whole stage codegen >> -- >> >> >> >> i wrote my own expression with eval and doGenCode, but doGenCode never >> gets called in tests. >> >> also as a test i ran this in a unit test: >> spark.range(10).select('id as 'asId).where('id === 4).explain >> according to >> >> *https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-sql-whole-stage-codegen.html* >> <https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-sql-whole-stage-codegen.html> >> this is supposed to show: >> == Physical Plan == >> WholeStageCodegen >> : +- Project [id#0L AS asId#3L] >> : +- Filter (id#0L = 4) >> :+- Range 0, 1, 8, 10, [id#0L] >> >> but it doesn't. instead it shows: >> >> == Physical Plan == >> *Project [id#12L AS asId#15L] >> +- *Filter (id#12L = 4) >> +- *Range (0, 10, step=1, splits=Some(4)) >> >> so i am again missing the WholeStageCodegen. any idea why? >> >> i create spark session for unit tests simply as: >> val session = SparkSession.builder >> .master("local[*]") >> .appName("test") >> .config("spark.sql.shuffle.partitions", 4) >> .getOrCreate() >> >> >>
Re: how do i force unit test to do whole stage codegen
Hi, I'm very sorry for not being up to date with the current style (and "promoting" the old style) and am going to review that part soon. I'm very close to touch it again since I'm with Optimizer these days. Jacek On 5 Apr 2017 6:08 a.m., "Kazuaki Ishizaki" <ishiz...@jp.ibm.com> wrote: > Hi, > The page in the URL explains the old style of physical plan output. > The current style adds "*" as a prefix of each operation that the > whole-stage codegen can be apply to. > > So, in your test case, whole-stage codegen has been already enabled!! > > FYI. I think that it is a good topic for d...@spark.apache.org. > > Kazuaki Ishizaki > > > > From:Koert Kuipers <ko...@tresata.com> > To:"user@spark.apache.org" <user@spark.apache.org> > Date:2017/04/05 05:12 > Subject:how do i force unit test to do whole stage codegen > -- > > > > i wrote my own expression with eval and doGenCode, but doGenCode never > gets called in tests. > > also as a test i ran this in a unit test: > spark.range(10).select('id as 'asId).where('id === 4).explain > according to > > *https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-sql-whole-stage-codegen.html* > <https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-sql-whole-stage-codegen.html> > this is supposed to show: > == Physical Plan == > WholeStageCodegen > : +- Project [id#0L AS asId#3L] > : +- Filter (id#0L = 4) > :+- Range 0, 1, 8, 10, [id#0L] > > but it doesn't. instead it shows: > > == Physical Plan == > *Project [id#12L AS asId#15L] > +- *Filter (id#12L = 4) > +- *Range (0, 10, step=1, splits=Some(4)) > > so i am again missing the WholeStageCodegen. any idea why? > > i create spark session for unit tests simply as: > val session = SparkSession.builder > .master("local[*]") > .appName("test") > .config("spark.sql.shuffle.partitions", 4) > .getOrCreate() > > >
Re: how do i force unit test to do whole stage codegen
got it. thats good to know. thanks! On Wed, Apr 5, 2017 at 12:07 AM, Kazuaki Ishizaki <ishiz...@jp.ibm.com> wrote: > Hi, > The page in the URL explains the old style of physical plan output. > The current style adds "*" as a prefix of each operation that the > whole-stage codegen can be apply to. > > So, in your test case, whole-stage codegen has been already enabled!! > > FYI. I think that it is a good topic for d...@spark.apache.org. > > Kazuaki Ishizaki > > > > From:Koert Kuipers <ko...@tresata.com> > To:"user@spark.apache.org" <user@spark.apache.org> > Date:2017/04/05 05:12 > Subject:how do i force unit test to do whole stage codegen > -- > > > > i wrote my own expression with eval and doGenCode, but doGenCode never > gets called in tests. > > also as a test i ran this in a unit test: > spark.range(10).select('id as 'asId).where('id === 4).explain > according to > > *https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-sql-whole-stage-codegen.html* > <https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-sql-whole-stage-codegen.html> > this is supposed to show: > == Physical Plan == > WholeStageCodegen > : +- Project [id#0L AS asId#3L] > : +- Filter (id#0L = 4) > :+- Range 0, 1, 8, 10, [id#0L] > > but it doesn't. instead it shows: > > == Physical Plan == > *Project [id#12L AS asId#15L] > +- *Filter (id#12L = 4) > +- *Range (0, 10, step=1, splits=Some(4)) > > so i am again missing the WholeStageCodegen. any idea why? > > i create spark session for unit tests simply as: > val session = SparkSession.builder > .master("local[*]") > .appName("test") > .config("spark.sql.shuffle.partitions", 4) > .getOrCreate() > > >
Re: how do i force unit test to do whole stage codegen
Hi, The page in the URL explains the old style of physical plan output. The current style adds "*" as a prefix of each operation that the whole-stage codegen can be apply to. So, in your test case, whole-stage codegen has been already enabled!! FYI. I think that it is a good topic for d...@spark.apache.org. Kazuaki Ishizaki From: Koert Kuipers <ko...@tresata.com> To: "user@spark.apache.org" <user@spark.apache.org> Date: 2017/04/05 05:12 Subject:how do i force unit test to do whole stage codegen i wrote my own expression with eval and doGenCode, but doGenCode never gets called in tests. also as a test i ran this in a unit test: spark.range(10).select('id as 'asId).where('id === 4).explain according to https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-sql-whole-stage-codegen.html this is supposed to show: == Physical Plan == WholeStageCodegen : +- Project [id#0L AS asId#3L] : +- Filter (id#0L = 4) :+- Range 0, 1, 8, 10, [id#0L] but it doesn't. instead it shows: == Physical Plan == *Project [id#12L AS asId#15L] +- *Filter (id#12L = 4) +- *Range (0, 10, step=1, splits=Some(4)) so i am again missing the WholeStageCodegen. any idea why? i create spark session for unit tests simply as: val session = SparkSession.builder .master("local[*]") .appName("test") .config("spark.sql.shuffle.partitions", 4) .getOrCreate()
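A small sketch building on this explanation, assuming the `session` created in the quoted test: instead of eyeballing the "*" prefix in explain() output, a test can assert whole-stage codegen directly by collecting WholeStageCodegenExec nodes from the executed plan.

import org.apache.spark.sql.execution.WholeStageCodegenExec
import session.implicits._

val df = session.range(10).select('id as "asId").where('id === 4)
val codegenStages = df.queryExecution.executedPlan.collect {
  case stage: WholeStageCodegenExec => stage
}
// at least one pipeline of operators should have been fused into generated code
assert(codegenStages.nonEmpty)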
how do i force unit test to do whole stage codegen
i wrote my own expression with eval and doGenCode, but doGenCode never gets called in tests. also as a test i ran this in a unit test: spark.range(10).select('id as 'asId).where('id === 4).explain according to https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-sql-whole-stage-codegen.html this is supposed to show: == Physical Plan ==WholeStageCodegen : +- Project [id#0L AS asId#3L] : +- Filter (id#0L = 4) :+- Range 0, 1, 8, 10, [id#0L] but it doesn't. instead it shows: == Physical Plan == *Project [id#12L AS asId#15L] +- *Filter (id#12L = 4) +- *Range (0, 10, step=1, splits=Some(4)) so i am again missing the WholeStageCodegen. any idea why? i create spark session for unit tests simply as: val session = SparkSession.builder .master("local[*]") .appName("test") .config("spark.sql.shuffle.partitions", 4) .getOrCreate()
Re: How to unit test spark streaming?
Agreed with the statement in quotes below; whether or not one wants to do unit tests, it is a good practice to write code that way. But I think the more painful and tedious task is to mock/emulate all the nodes such as Spark workers/master/HDFS/the input source stream and all that. I wish there were something really simple. Perhaps the simplest thing is just to do integration tests, which also test the transformations/business logic. That way I can spawn a small cluster, run my tests, and bring the cluster down when I am done. And sure, if the cluster isn't available then I can't run the tests, but some node should be available even to run a single process. I somehow feel we may be doing too much work to fit into the archaic definition of unit tests. "Basically you abstract your transformations to take in a dataframe and return one, then you assert on the returned df" - this. On Tue, Mar 7, 2017 at 11:14 AM, Michael Armbrust wrote: > Basically you abstract your transformations to take in a dataframe and >> return one, then you assert on the returned df >> > > +1 to this suggestion. This is why we wanted streaming and batch > dataframes to share the same API. >
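A minimal sketch of the pattern quoted above (the schema and names are made up for illustration): keep the business logic as a plain DataFrame-to-DataFrame function and feed it a tiny hand-built DataFrame in the test, with no streaming source or cluster involved.

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

object Transformations {
  // the logic under test knows nothing about where the data comes from
  def countByUser(events: DataFrame): DataFrame =
    events.groupBy("userId").agg(count(lit(1)).as("events"))
}

val spark = SparkSession.builder().master("local[1]").appName("test").getOrCreate()
import spark.implicits._

val input = Seq(("a", 1), ("a", 2), ("b", 3)).toDF("userId", "value")
val result = Transformations.countByUser(input).orderBy("userId").collect()
assert(result.map(r => (r.getString(0), r.getLong(1))).toSeq == Seq(("a", 2L), ("b", 2L)))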
Re: How to unit test spark streaming?
> > Basically you abstract your transformations to take in a dataframe and > return one, then you assert on the returned df > +1 to this suggestion. This is why we wanted streaming and batch dataframes to share the same API.
Re: How to unit test spark streaming?
This depends on your target setup! I run for example for my open source libraries for spark integration tests (a dedicated folder a side the unit tests) a local spark master, but also use a minidfs cluster (to simulate HDFS on a node) and sometimes also a miniyarn cluster (see https://wiki.apache.org/hadoop/HowToDevelopUnitTests). An example can be found here: https://github.com/ZuInnoTe/hadoopcryptoledger/tree/master/examples/spark-bitcoinblock or - if you need Scala - https://github.com/ZuInnoTe/hadoopcryptoledger/tree/master/examples/scala-spark-bitcoinblock In both cases it is in the integration-tests (Java) or it (Scala) folder. Spark Streaming - I have no open source example at hand, but basically you need to simulate the source and the rest is as above. I will eventually write a blog post about this with more details. > On 7 Mar 2017, at 13:04, kant kodali <kanth...@gmail.com> wrote: > > Hi All, > > How to unit test spark streaming or spark in general? How do I test the > results of my transformations? Also, more importantly don't we need to spawn > master and worker JVM's either in one or multiple nodes? > > Thanks! > kant - To unsubscribe e-mail: user-unsubscr...@spark.apache.org
Re: How to unit test spark streaming?
Hey kant You can use holdens spark test base Have a look at some of the specs I wrote here to give you an idea https://github.com/samelamin/spark-bigquery/blob/master/src/test/scala/com/samelamin/spark/bigquery/BigQuerySchemaSpecs.scala Basically you abstract your transformations to take in a dataframe and return one, then you assert on the returned df Regards Sam On Tue, 7 Mar 2017 at 12:05, kant kodali <kanth...@gmail.com> wrote: > Hi All, > > How to unit test spark streaming or spark in general? How do I test the > results of my transformations? Also, more importantly don't we need to > spawn master and worker JVM's either in one or multiple nodes? > > Thanks! > kant >
How to unit test spark streaming?
Hi All, How to unit test spark streaming or spark in general? How do I test the results of my transformations? Also, more importantly don't we need to spawn master and worker JVM's either in one or multiple nodes? Thanks! kant
Error in run multiple unit test that extends DataFrameSuiteBase
I created two test cases, each a FlatSpec with DataFrameSuiteBase, but I get errors when I run sbt test, even though I am able to run each of them separately. My test cases do use sqlContext to read files. Here is the exception stack. Judging from the exception, I may need to unregister the RpcEndpoint after each test run. [info] Exception encountered when attempting to run a suite with class name: MyTestSuit *** ABORTED *** [info] java.lang.IllegalArgumentException: There is already an RpcEndpoint called LocalSchedulerBackendEndpoint [info] at org.apache.spark.rpc.netty.Dispatcher.registerRpcEndpoint(Dispatcher.scala:66) [info] at org.apache.spark.rpc.netty.NettyRpcEnv.setupEndpoint(NettyRpcEnv.scala:129) [info] at org.apache.spark.scheduler.local.LocalSchedulerBackend.start(LocalSchedulerBackend.scala:127) [info] at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:149) [info] at org.apache.spark.SparkContext.<init>(SparkContext.scala:500)
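One likely cause, though it is an assumption rather than something confirmed in this thread: sbt runs the two suites in parallel inside one JVM, so the second suite tries to start its local scheduler backend while the first SparkContext is still alive. A sketch of the corresponding build.sbt settings (sbt 0.13-era syntax), echoing the fix mentioned in the "Unit test with sqlContext" thread below:

parallelExecution in Test := false  // run suites one at a time, so only one SparkContext exists
fork in Test := true                // optionally run tests in a JVM separate from sbt itself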
RE: How this unit test passed on master trunk?
So in that case then the result will be following: [1,[1,1]][3,[3,1]][2,[2,1]]Thanks for explaining the meaning of the it. But the question is that how first() will be [3,[1,1]]? In fact, if there were any ordering in the final result, it will be [1,[1,1]], instead of [3,[1,1]], correct? Yong Subject: Re: How this unit test passed on master trunk? From: zzh...@hortonworks.com To: java8...@hotmail.com; gatorsm...@gmail.com CC: user@spark.apache.org Date: Sun, 24 Apr 2016 04:37:11 + There are multiple records for the DF scala> structDF.groupBy($"a").agg(min(struct($"record.*"))).show +---+-+ | a|min(struct(unresolvedstar()))| +---+-+ | 1|[1,1]| | 3|[3,1]| | 2|[2,1]| The meaning of .groupBy($"a").agg(min(struct($"record.*"))) is to get the min for all the records with the same $”a” For example: TestData2(1,1) :: TestData2(1,2) The result would be 1, (1, 1), since struct(1, 1) is less than struct(1, 2). Please check how the Ordering is implemented in InterpretedOrdering. The output itself does not have any ordering. I am not sure why the unit test and the real env have different environment. Xiao, I do see the difference between unit test and local cluster run. Do you know the reason? Thanks. Zhan Zhang On Apr 22, 2016, at 11:23 AM, Yong Zhang <java8...@hotmail.com> wrote: Hi, I was trying to find out why this unit test can pass in Spark code. in https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala for this unit test: test("Star Expansion - CreateStruct and CreateArray") { val structDf = testData2.select("a", "b").as("record") // CreateStruct and CreateArray in aggregateExpressions assert(structDf.groupBy($"a").agg(min(struct($"record.*"))).first() == Row(3, Row(3, 1))) assert(structDf.groupBy($"a").agg(min(array($"record.*"))).first() == Row(3, Seq(3, 1))) // CreateStruct and CreateArray in project list (unresolved alias) assert(structDf.select(struct($"record.*")).first() == Row(Row(1, 1))) assert(structDf.select(array($"record.*")).first().getAs[Seq[Int]](0) === Seq(1, 1)) // CreateStruct and CreateArray in project list (alias) assert(structDf.select(struct($"record.*").as("a")).first() == Row(Row(1, 1))) assert(structDf.select(array($"record.*").as("a")).first().getAs[Seq[Int]](0) === Seq(1, 1)) } >From my understanding, the data return in this case should be Row(1, Row(1, >1]), as that will be min of struct. In fact, if I run the spark-shell on my laptop, and I got the result I expected: ./bin/spark-shell Welcome to __ / __/__ ___ _/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 2.0.0-SNAPSHOT /_/ Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_91) Type in expressions to have them evaluated. Type :help for more information. scala> case class TestData2(a: Int, b: Int) defined class TestData2 scala> val testData2DF = sqlContext.sparkContext.parallelize(TestData2(1,1) :: TestData2(1,2) :: TestData2(2,1) :: TestData2(2,2) :: TestData2(3,1) :: TestData2(3,2) :: Nil, 2).toDF() scala> val structDF = testData2DF.select("a","b").as("record") scala> structDF.groupBy($"a").agg(min(struct($"record.*"))).first() res0: org.apache.spark.sql.Row = [1,[1,1]] scala> structDF.show +---+---+ | a| b| +---+---+ | 1| 1| | 1| 2| | 2| 1| | 2| 2| | 3| 1| | 3| 2| +---+---+ So from my spark, which I built on the master, I cannot get Row[3,[1,1]] back in this case. Why the unit test asserts that Row[3,[1,1]] should be the first, and it will pass? But I cannot reproduce that in my spark-shell? 
I am trying to understand how to interpret the meaning of "agg(min(struct($"record.*")))" Thanks Yong
Re: How this unit test passed on master trunk?
There are multiple records for the DF scala> structDF.groupBy($"a").agg(min(struct($"record.*"))).show +---+-+ | a|min(struct(unresolvedstar()))| +---+-+ | 1|[1,1]| | 3|[3,1]| | 2|[2,1]| The meaning of .groupBy($"a").agg(min(struct($"record.*"))) is to get the min for all the records with the same $”a” For example: TestData2(1,1) :: TestData2(1,2) The result would be 1, (1, 1), since struct(1, 1) is less than struct(1, 2). Please check how the Ordering is implemented in InterpretedOrdering. The output itself does not have any ordering. I am not sure why the unit test and the real env have different environment. Xiao, I do see the difference between unit test and local cluster run. Do you know the reason? Thanks. Zhan Zhang On Apr 22, 2016, at 11:23 AM, Yong Zhang <java8...@hotmail.com<mailto:java8...@hotmail.com>> wrote: Hi, I was trying to find out why this unit test can pass in Spark code. in https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala for this unit test: test("Star Expansion - CreateStruct and CreateArray") { val structDf = testData2.select("a", "b").as("record") // CreateStruct and CreateArray in aggregateExpressions assert(structDf.groupBy($"a").agg(min(struct($"record.*"))).first() == Row(3, Row(3, 1))) assert(structDf.groupBy($"a").agg(min(array($"record.*"))).first() == Row(3, Seq(3, 1))) // CreateStruct and CreateArray in project list (unresolved alias) assert(structDf.select(struct($"record.*")).first() == Row(Row(1, 1))) assert(structDf.select(array($"record.*")).first().getAs[Seq[Int]](0) === Seq(1, 1)) // CreateStruct and CreateArray in project list (alias) assert(structDf.select(struct($"record.*").as("a")).first() == Row(Row(1, 1))) assert(structDf.select(array($"record.*").as("a")).first().getAs[Seq[Int]](0) === Seq(1, 1)) } >From my understanding, the data return in this case should be Row(1, Row(1, >1]), as that will be min of struct. In fact, if I run the spark-shell on my laptop, and I got the result I expected: ./bin/spark-shell Welcome to __ / __/__ ___ _/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 2.0.0-SNAPSHOT /_/ Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_91) Type in expressions to have them evaluated. Type :help for more information. scala> case class TestData2(a: Int, b: Int) defined class TestData2 scala> val testData2DF = sqlContext.sparkContext.parallelize(TestData2(1,1) :: TestData2(1,2) :: TestData2(2,1) :: TestData2(2,2) :: TestData2(3,1) :: TestData2(3,2) :: Nil, 2).toDF() scala> val structDF = testData2DF.select("a","b").as("record") scala> structDF.groupBy($"a").agg(min(struct($"record.*"))).first() res0: org.apache.spark.sql.Row = [1,[1,1]] scala> structDF.show +---+---+ | a| b| +---+---+ | 1| 1| | 1| 2| | 2| 1| | 2| 2| | 3| 1| | 3| 2| +---+---+ So from my spark, which I built on the master, I cannot get Row[3,[1,1]] back in this case. Why the unit test asserts that Row[3,[1,1]] should be the first, and it will pass? But I cannot reproduce that in my spark-shell? I am trying to understand how to interpret the meaning of "agg(min(struct($"record.*")))" Thanks Yong
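Continuing the spark-shell session quoted above, a sketch of how the assertion could be made independent of partition ordering (this is not the fix applied to Spark's own suite): sort before calling first().

import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{min, struct}

val agg = structDF.groupBy($"a").agg(min(struct($"record.*")))
assert(agg.orderBy($"a").first() == Row(1, Row(1, 1)))       // smallest key first
assert(agg.orderBy($"a".desc).first() == Row(3, Row(3, 1)))  // largest key first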
Re: How this unit test passed on master trunk?
This was added by Xiao through: [SPARK-13320][SQL] Support Star in CreateStruct/CreateArray and Error Handling when DataFrame/DataSet Functions using Star I tried in spark-shell and got: scala> val first = structDf.groupBy($"a").agg(min(struct($"record.*"))).first() first: org.apache.spark.sql.Row = [1,[1,1]] BTW https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.7/715/consoleFull shows this test passing. On Fri, Apr 22, 2016 at 11:23 AM, Yong Zhang <java8...@hotmail.com> wrote: > Hi, > > I was trying to find out why this unit test can pass in Spark code. > > in > > https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala > > for this unit test: > > test("Star Expansion - CreateStruct and CreateArray") { > val structDf = testData2.select("a", "b").as("record") > // CreateStruct and CreateArray in aggregateExpressions > *assert(structDf.groupBy($"a").agg(min(struct($"record.*"))).first() == > Row(3, Row(3, 1)))* > assert(structDf.groupBy($"a").agg(min(array($"record.*"))).first() == > Row(3, Seq(3, 1))) > > // CreateStruct and CreateArray in project list (unresolved alias) > assert(structDf.select(struct($"record.*")).first() == Row(Row(1, 1))) > assert(structDf.select(array($"record.*")).first().getAs[Seq[Int]](0) === > Seq(1, 1)) > > // CreateStruct and CreateArray in project list (alias) > assert(structDf.select(struct($"record.*").as("a")).first() == Row(Row(1, > 1))) > > assert(structDf.select(array($"record.*").as("a")).first().getAs[Seq[Int]](0) > === Seq(1, 1)) > } > > From my understanding, the data return in this case should be Row(1, Row(1, > 1]), as that will be min of struct. > > In fact, if I run the spark-shell on my laptop, and I got the result I > expected: > > > ./bin/spark-shell > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 2.0.0-SNAPSHOT > /_/ > > Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_91) > Type in expressions to have them evaluated. > Type :help for more information. > > scala> case class TestData2(a: Int, b: Int) > defined class TestData2 > > scala> val testData2DF = sqlContext.sparkContext.parallelize(TestData2(1,1) > :: TestData2(1,2) :: TestData2(2,1) :: TestData2(2,2) :: TestData2(3,1) :: > TestData2(3,2) :: Nil, 2).toDF() > > scala> val structDF = testData2DF.select("a","b").as("record") > > scala> structDF.groupBy($"a").agg(min(struct($"record.*"))).first() > res0: org.apache.spark.sql.Row = [1,[1,1]] > > scala> structDF.show > +---+---+ > | a| b| > +---+---+ > | 1| 1| > | 1| 2| > | 2| 1| > | 2| 2| > | 3| 1| > | 3| 2| > +---+---+ > > So from my spark, which I built on the master, I cannot get Row[3,[1,1]] back > in this case. Why the unit test asserts that Row[3,[1,1]] should be the > first, and it will pass? But I cannot reproduce that in my spark-shell? I am > trying to understand how to interpret the meaning of > "agg(min(struct($"record.*")))" > > > Thanks > > Yong > >
How this unit test passed on master trunk?
Hi, I was trying to find out why this unit test can pass in Spark code. inhttps://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala for this unit test: test("Star Expansion - CreateStruct and CreateArray") { val structDf = testData2.select("a", "b").as("record") // CreateStruct and CreateArray in aggregateExpressions assert(structDf.groupBy($"a").agg(min(struct($"record.*"))).first() == Row(3, Row(3, 1))) assert(structDf.groupBy($"a").agg(min(array($"record.*"))).first() == Row(3, Seq(3, 1))) // CreateStruct and CreateArray in project list (unresolved alias) assert(structDf.select(struct($"record.*")).first() == Row(Row(1, 1))) assert(structDf.select(array($"record.*")).first().getAs[Seq[Int]](0) === Seq(1, 1)) // CreateStruct and CreateArray in project list (alias) assert(structDf.select(struct($"record.*").as("a")).first() == Row(Row(1, 1))) assert(structDf.select(array($"record.*").as("a")).first().getAs[Seq[Int]](0) === Seq(1, 1)) }From my understanding, the data return in this case should be Row(1, Row(1, 1]), as that will be min of struct.In fact, if I run the spark-shell on my laptop, and I got the result I expected: ./bin/spark-shell Welcome to __ / __/__ ___ _/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 2.0.0-SNAPSHOT /_/ Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_91) Type in expressions to have them evaluated. Type :help for more information. scala> case class TestData2(a: Int, b: Int) defined class TestData2scala> val testData2DF = sqlContext.sparkContext.parallelize(TestData2(1,1) :: TestData2(1,2) :: TestData2(2,1) :: TestData2(2,2) :: TestData2(3,1) :: TestData2(3,2) :: Nil, 2).toDF()scala> val structDF = testData2DF.select("a","b").as("record")scala> structDF.groupBy($"a").agg(min(struct($"record.*"))).first() res0: org.apache.spark.sql.Row = [1,[1,1]] scala> structDF.show +---+---+ | a| b| +---+---+ | 1| 1| | 1| 2| | 2| 1| | 2| 2| | 3| 1| | 3| 2| +---+---+So from my spark, which I built on the master, I cannot get Row[3,[1,1]] back in this case. Why the unit test asserts that Row[3,[1,1]] should be the first, and it will pass? But I cannot reproduce that in my spark-shell? I am trying to understand how to interpret the meaning of "agg(min(struct($"record.*")))" ThanksYong
Re: Unit test with sqlContext
If you prefer the py.test framework, I just wrote a blog post with some examples: Unit testing Apache Spark with py.test https://engblog.nextdoor.com/unit-testing-apache-spark-with-py-test-3b8970dc013b On Fri, Feb 5, 2016 at 11:43 AM, Steve Annessa <steve.anne...@gmail.com> wrote: > Thanks for all of the responses. > > I do have an afterAll that stops the sc. > > While looking over Holden's readme I noticed she mentioned "Make sure to > disable parallel execution." That was what I was missing; I added the > follow to my build.sbt: > > ``` > parallelExecution in Test := false > ``` > > Now all of my tests are running. > > I'm going to look into using the package she created. > > Thanks again, > > -- Steve > > > On Thu, Feb 4, 2016 at 8:50 PM, Rishi Mishra <rmis...@snappydata.io> > wrote: > >> Hi Steve, >> Have you cleaned up your SparkContext ( sc.stop()) , in a afterAll(). >> The error suggests you are creating more than one SparkContext. >> >> >> On Fri, Feb 5, 2016 at 10:04 AM, Holden Karau <hol...@pigscanfly.ca> >> wrote: >> >>> Thanks for recommending spark-testing-base :) Just wanted to add if >>> anyone has feature requests for Spark testing please get in touch (or add >>> an issue on the github) :) >>> >>> >>> On Thu, Feb 4, 2016 at 8:25 PM, Silvio Fiorito < >>> silvio.fior...@granturing.com> wrote: >>> >>>> Hi Steve, >>>> >>>> Have you looked at the spark-testing-base package by Holden? It’s >>>> really useful for unit testing Spark apps as it handles all the >>>> bootstrapping for you. >>>> >>>> https://github.com/holdenk/spark-testing-base >>>> >>>> DataFrame examples are here: >>>> https://github.com/holdenk/spark-testing-base/blob/master/src/test/1.3/scala/com/holdenkarau/spark/testing/SampleDataFrameTest.scala >>>> >>>> Thanks, >>>> Silvio >>>> >>>> From: Steve Annessa <steve.anne...@gmail.com> >>>> Date: Thursday, February 4, 2016 at 8:36 PM >>>> To: "user@spark.apache.org" <user@spark.apache.org> >>>> Subject: Unit test with sqlContext >>>> >>>> I'm trying to unit test a function that reads in a JSON file, >>>> manipulates the DF and then returns a Scala Map. >>>> >>>> The function has signature: >>>> def ingest(dataLocation: String, sc: SparkContext, sqlContext: >>>> SQLContext) >>>> >>>> I've created a bootstrap spec for spark jobs that instantiates the >>>> Spark Context and SQLContext like so: >>>> >>>> @transient var sc: SparkContext = _ >>>> @transient var sqlContext: SQLContext = _ >>>> >>>> override def beforeAll = { >>>> System.clearProperty("spark.driver.port") >>>> System.clearProperty("spark.hostPort") >>>> >>>> val conf = new SparkConf() >>>> .setMaster(master) >>>> .setAppName(appName) >>>> >>>> sc = new SparkContext(conf) >>>> sqlContext = new SQLContext(sc) >>>> } >>>> >>>> When I do not include sqlContext, my tests run. Once I add the >>>> sqlContext I get the following errors: >>>> >>>> 16/02/04 17:31:58 WARN SparkContext: Another SparkContext is being >>>> constructed (or threw an exception in its constructor). This may indicate >>>> an error, since only one SparkContext may be running in this JVM (see >>>> SPARK-2243). The other SparkContext was created at: >>>> org.apache.spark.SparkContext.(SparkContext.scala:81) >>>> >>>> 16/02/04 17:31:59 ERROR SparkContext: Error initializing SparkContext. >>>> akka.actor.InvalidActorNameException: actor name [ExecutorEndpoint] is >>>> not unique! 
>>>> >>>> and finally: >>>> >>>> [info] IngestSpec: >>>> [info] Exception encountered when attempting to run a suite with class >>>> name: com.company.package.IngestSpec *** ABORTED *** >>>> [info] akka.actor.InvalidActorNameException: actor name >>>> [ExecutorEndpoint] is not unique! >>>> >>>> >>>> What do I need to do to get a sqlContext through my tests? >>>> >>>> Thanks, >>>> >>>> -- Steve >>>> >>> >>> >>> >>> -- >>> Cell : 425-233-8271 >>> Twitter: https://twitter.com/holdenkarau >>> >> >> >> >> -- >> Regards, >> Rishitesh Mishra, >> SnappyData . (http://www.snappydata.io/) >> >> https://in.linkedin.com/in/rishiteshmishra >> > >
Re: Unit test with sqlContext
Thanks for all of the responses. I do have an afterAll that stops the sc. While looking over Holden's readme I noticed she mentioned "Make sure to disable parallel execution." That was what I was missing; I added the follow to my build.sbt: ``` parallelExecution in Test := false ``` Now all of my tests are running. I'm going to look into using the package she created. Thanks again, -- Steve On Thu, Feb 4, 2016 at 8:50 PM, Rishi Mishra <rmis...@snappydata.io> wrote: > Hi Steve, > Have you cleaned up your SparkContext ( sc.stop()) , in a afterAll(). The > error suggests you are creating more than one SparkContext. > > > On Fri, Feb 5, 2016 at 10:04 AM, Holden Karau <hol...@pigscanfly.ca> > wrote: > >> Thanks for recommending spark-testing-base :) Just wanted to add if >> anyone has feature requests for Spark testing please get in touch (or add >> an issue on the github) :) >> >> >> On Thu, Feb 4, 2016 at 8:25 PM, Silvio Fiorito < >> silvio.fior...@granturing.com> wrote: >> >>> Hi Steve, >>> >>> Have you looked at the spark-testing-base package by Holden? It’s really >>> useful for unit testing Spark apps as it handles all the bootstrapping for >>> you. >>> >>> https://github.com/holdenk/spark-testing-base >>> >>> DataFrame examples are here: >>> https://github.com/holdenk/spark-testing-base/blob/master/src/test/1.3/scala/com/holdenkarau/spark/testing/SampleDataFrameTest.scala >>> >>> Thanks, >>> Silvio >>> >>> From: Steve Annessa <steve.anne...@gmail.com> >>> Date: Thursday, February 4, 2016 at 8:36 PM >>> To: "user@spark.apache.org" <user@spark.apache.org> >>> Subject: Unit test with sqlContext >>> >>> I'm trying to unit test a function that reads in a JSON file, >>> manipulates the DF and then returns a Scala Map. >>> >>> The function has signature: >>> def ingest(dataLocation: String, sc: SparkContext, sqlContext: >>> SQLContext) >>> >>> I've created a bootstrap spec for spark jobs that instantiates the Spark >>> Context and SQLContext like so: >>> >>> @transient var sc: SparkContext = _ >>> @transient var sqlContext: SQLContext = _ >>> >>> override def beforeAll = { >>> System.clearProperty("spark.driver.port") >>> System.clearProperty("spark.hostPort") >>> >>> val conf = new SparkConf() >>> .setMaster(master) >>> .setAppName(appName) >>> >>> sc = new SparkContext(conf) >>> sqlContext = new SQLContext(sc) >>> } >>> >>> When I do not include sqlContext, my tests run. Once I add the >>> sqlContext I get the following errors: >>> >>> 16/02/04 17:31:58 WARN SparkContext: Another SparkContext is being >>> constructed (or threw an exception in its constructor). This may indicate >>> an error, since only one SparkContext may be running in this JVM (see >>> SPARK-2243). The other SparkContext was created at: >>> org.apache.spark.SparkContext.(SparkContext.scala:81) >>> >>> 16/02/04 17:31:59 ERROR SparkContext: Error initializing SparkContext. >>> akka.actor.InvalidActorNameException: actor name [ExecutorEndpoint] is >>> not unique! >>> >>> and finally: >>> >>> [info] IngestSpec: >>> [info] Exception encountered when attempting to run a suite with class >>> name: com.company.package.IngestSpec *** ABORTED *** >>> [info] akka.actor.InvalidActorNameException: actor name >>> [ExecutorEndpoint] is not unique! >>> >>> >>> What do I need to do to get a sqlContext through my tests? >>> >>> Thanks, >>> >>> -- Steve >>> >> >> >> >> -- >> Cell : 425-233-8271 >> Twitter: https://twitter.com/holdenkarau >> > > > > -- > Regards, > Rishitesh Mishra, > SnappyData . 
(http://www.snappydata.io/) > > https://in.linkedin.com/in/rishiteshmishra >
Unit test with sqlContext
I'm trying to unit test a function that reads in a JSON file, manipulates the DF and then returns a Scala Map. The function has signature: def ingest(dataLocation: String, sc: SparkContext, sqlContext: SQLContext) I've created a bootstrap spec for spark jobs that instantiates the Spark Context and SQLContext like so: @transient var sc: SparkContext = _ @transient var sqlContext: SQLContext = _ override def beforeAll = { System.clearProperty("spark.driver.port") System.clearProperty("spark.hostPort") val conf = new SparkConf() .setMaster(master) .setAppName(appName) sc = new SparkContext(conf) sqlContext = new SQLContext(sc) } When I do not include sqlContext, my tests run. Once I add the sqlContext I get the following errors: 16/02/04 17:31:58 WARN SparkContext: Another SparkContext is being constructed (or threw an exception in its constructor). This may indicate an error, since only one SparkContext may be running in this JVM (see SPARK-2243). The other SparkContext was created at: org.apache.spark.SparkContext.(SparkContext.scala:81) 16/02/04 17:31:59 ERROR SparkContext: Error initializing SparkContext. akka.actor.InvalidActorNameException: actor name [ExecutorEndpoint] is not unique! and finally: [info] IngestSpec: [info] Exception encountered when attempting to run a suite with class name: com.company.package.IngestSpec *** ABORTED *** [info] akka.actor.InvalidActorNameException: actor name [ExecutorEndpoint] is not unique! What do I need to do to get a sqlContext through my tests? Thanks, -- Steve
Re: Unit test with sqlContext
Hi Steve, Have you looked at the spark-testing-base package by Holden? It’s really useful for unit testing Spark apps as it handles all the bootstrapping for you. https://github.com/holdenk/spark-testing-base DataFrame examples are here: https://github.com/holdenk/spark-testing-base/blob/master/src/test/1.3/scala/com/holdenkarau/spark/testing/SampleDataFrameTest.scala Thanks, Silvio From: Steve Annessa <steve.anne...@gmail.com<mailto:steve.anne...@gmail.com>> Date: Thursday, February 4, 2016 at 8:36 PM To: "user@spark.apache.org<mailto:user@spark.apache.org>" <user@spark.apache.org<mailto:user@spark.apache.org>> Subject: Unit test with sqlContext I'm trying to unit test a function that reads in a JSON file, manipulates the DF and then returns a Scala Map. The function has signature: def ingest(dataLocation: String, sc: SparkContext, sqlContext: SQLContext) I've created a bootstrap spec for spark jobs that instantiates the Spark Context and SQLContext like so: @transient var sc: SparkContext = _ @transient var sqlContext: SQLContext = _ override def beforeAll = { System.clearProperty("spark.driver.port") System.clearProperty("spark.hostPort") val conf = new SparkConf() .setMaster(master) .setAppName(appName) sc = new SparkContext(conf) sqlContext = new SQLContext(sc) } When I do not include sqlContext, my tests run. Once I add the sqlContext I get the following errors: 16/02/04 17:31:58 WARN SparkContext: Another SparkContext is being constructed (or threw an exception in its constructor). This may indicate an error, since only one SparkContext may be running in this JVM (see SPARK-2243). The other SparkContext was created at: org.apache.spark.SparkContext.(SparkContext.scala:81) 16/02/04 17:31:59 ERROR SparkContext: Error initializing SparkContext. akka.actor.InvalidActorNameException: actor name [ExecutorEndpoint] is not unique! and finally: [info] IngestSpec: [info] Exception encountered when attempting to run a suite with class name: com.company.package.IngestSpec *** ABORTED *** [info] akka.actor.InvalidActorNameException: actor name [ExecutorEndpoint] is not unique! What do I need to do to get a sqlContext through my tests? Thanks, -- Steve
Re: Unit test with sqlContext
Hi Steve, Have you cleaned up your SparkContext ( sc.stop()) , in a afterAll(). The error suggests you are creating more than one SparkContext. On Fri, Feb 5, 2016 at 10:04 AM, Holden Karau <hol...@pigscanfly.ca> wrote: > Thanks for recommending spark-testing-base :) Just wanted to add if anyone > has feature requests for Spark testing please get in touch (or add an issue > on the github) :) > > > On Thu, Feb 4, 2016 at 8:25 PM, Silvio Fiorito < > silvio.fior...@granturing.com> wrote: > >> Hi Steve, >> >> Have you looked at the spark-testing-base package by Holden? It’s really >> useful for unit testing Spark apps as it handles all the bootstrapping for >> you. >> >> https://github.com/holdenk/spark-testing-base >> >> DataFrame examples are here: >> https://github.com/holdenk/spark-testing-base/blob/master/src/test/1.3/scala/com/holdenkarau/spark/testing/SampleDataFrameTest.scala >> >> Thanks, >> Silvio >> >> From: Steve Annessa <steve.anne...@gmail.com> >> Date: Thursday, February 4, 2016 at 8:36 PM >> To: "user@spark.apache.org" <user@spark.apache.org> >> Subject: Unit test with sqlContext >> >> I'm trying to unit test a function that reads in a JSON file, manipulates >> the DF and then returns a Scala Map. >> >> The function has signature: >> def ingest(dataLocation: String, sc: SparkContext, sqlContext: SQLContext) >> >> I've created a bootstrap spec for spark jobs that instantiates the Spark >> Context and SQLContext like so: >> >> @transient var sc: SparkContext = _ >> @transient var sqlContext: SQLContext = _ >> >> override def beforeAll = { >> System.clearProperty("spark.driver.port") >> System.clearProperty("spark.hostPort") >> >> val conf = new SparkConf() >> .setMaster(master) >> .setAppName(appName) >> >> sc = new SparkContext(conf) >> sqlContext = new SQLContext(sc) >> } >> >> When I do not include sqlContext, my tests run. Once I add the sqlContext >> I get the following errors: >> >> 16/02/04 17:31:58 WARN SparkContext: Another SparkContext is being >> constructed (or threw an exception in its constructor). This may indicate >> an error, since only one SparkContext may be running in this JVM (see >> SPARK-2243). The other SparkContext was created at: >> org.apache.spark.SparkContext.(SparkContext.scala:81) >> >> 16/02/04 17:31:59 ERROR SparkContext: Error initializing SparkContext. >> akka.actor.InvalidActorNameException: actor name [ExecutorEndpoint] is >> not unique! >> >> and finally: >> >> [info] IngestSpec: >> [info] Exception encountered when attempting to run a suite with class >> name: com.company.package.IngestSpec *** ABORTED *** >> [info] akka.actor.InvalidActorNameException: actor name >> [ExecutorEndpoint] is not unique! >> >> >> What do I need to do to get a sqlContext through my tests? >> >> Thanks, >> >> -- Steve >> > > > > -- > Cell : 425-233-8271 > Twitter: https://twitter.com/holdenkarau > -- Regards, Rishitesh Mishra, SnappyData . (http://www.snappydata.io/) https://in.linkedin.com/in/rishiteshmishra
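A minimal sketch of the setup/teardown being described, assuming ScalaTest and one context per suite; the trait name is illustrative:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.scalatest.{BeforeAndAfterAll, Suite}

trait SharedSparkContext extends BeforeAndAfterAll { self: Suite =>
  @transient var sc: SparkContext = _
  @transient var sqlContext: SQLContext = _

  override def beforeAll(): Unit = {
    super.beforeAll()
    sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("test"))
    sqlContext = new SQLContext(sc)
  }

  override def afterAll(): Unit = {
    if (sc != null) sc.stop()  // release ports and endpoint names for the next suite
    super.afterAll()
  }
}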
Re: Unit test with sqlContext
Thanks for recommending spark-testing-base :) Just wanted to add if anyone has feature requests for Spark testing please get in touch (or add an issue on the github) :) On Thu, Feb 4, 2016 at 8:25 PM, Silvio Fiorito < silvio.fior...@granturing.com> wrote: > Hi Steve, > > Have you looked at the spark-testing-base package by Holden? It’s really > useful for unit testing Spark apps as it handles all the bootstrapping for > you. > > https://github.com/holdenk/spark-testing-base > > DataFrame examples are here: > https://github.com/holdenk/spark-testing-base/blob/master/src/test/1.3/scala/com/holdenkarau/spark/testing/SampleDataFrameTest.scala > > Thanks, > Silvio > > From: Steve Annessa <steve.anne...@gmail.com> > Date: Thursday, February 4, 2016 at 8:36 PM > To: "user@spark.apache.org" <user@spark.apache.org> > Subject: Unit test with sqlContext > > I'm trying to unit test a function that reads in a JSON file, manipulates > the DF and then returns a Scala Map. > > The function has signature: > def ingest(dataLocation: String, sc: SparkContext, sqlContext: SQLContext) > > I've created a bootstrap spec for spark jobs that instantiates the Spark > Context and SQLContext like so: > > @transient var sc: SparkContext = _ > @transient var sqlContext: SQLContext = _ > > override def beforeAll = { > System.clearProperty("spark.driver.port") > System.clearProperty("spark.hostPort") > > val conf = new SparkConf() > .setMaster(master) > .setAppName(appName) > > sc = new SparkContext(conf) > sqlContext = new SQLContext(sc) > } > > When I do not include sqlContext, my tests run. Once I add the sqlContext > I get the following errors: > > 16/02/04 17:31:58 WARN SparkContext: Another SparkContext is being > constructed (or threw an exception in its constructor). This may indicate > an error, since only one SparkContext may be running in this JVM (see > SPARK-2243). The other SparkContext was created at: > org.apache.spark.SparkContext.(SparkContext.scala:81) > > 16/02/04 17:31:59 ERROR SparkContext: Error initializing SparkContext. > akka.actor.InvalidActorNameException: actor name [ExecutorEndpoint] is not > unique! > > and finally: > > [info] IngestSpec: > [info] Exception encountered when attempting to run a suite with class > name: com.company.package.IngestSpec *** ABORTED *** > [info] akka.actor.InvalidActorNameException: actor name > [ExecutorEndpoint] is not unique! > > > What do I need to do to get a sqlContext through my tests? > > Thanks, > > -- Steve > -- Cell : 425-233-8271 Twitter: https://twitter.com/holdenkarau
Re: how to run unit test for specific component only
try: mvn test -pl sql -DwildcardSuites=org.apache.spark.sql -Dtest=none On 12 Nov 2015, at 03:13, weoccc <weo...@gmail.com> wrote: Hi, I am wondering how to run unit test for specific spark component only. mvn test -DwildcardSuites="org.apache.spark.sql.*" -Dtest=none The above command doesn't seem to work. I'm using spark 1.5. Thanks, Weide
Re: how to run unit test for specific component only
Have you tried the following ? build/sbt "sql/test-only *" Cheers On Wed, Nov 11, 2015 at 7:13 PM, weoccc <weo...@gmail.com> wrote: > Hi, > > I am wondering how to run unit test for specific spark component only. > > mvn test -DwildcardSuites="org.apache.spark.sql.*" -Dtest=none > > The above command doesn't seem to work. I'm using spark 1.5. > > Thanks, > > Weide > > >
how to run unit test for specific component only
Hi, I am wondering how to run unit test for specific spark component only. mvn test -DwildcardSuites="org.apache.spark.sql.*" -Dtest=none The above command doesn't seem to work. I'm using spark 1.5. Thanks, Weide
Re: How to unit test HiveContext without OutOfMemoryError (using sbt)
Thanks for your response Yana, I can increase the MaxPermSize parameter and it will allow me to run the unit test a few more times before I run out of memory. However, the primary issue is that running the same unit test in the same JVM (multiple times) results in increased memory (each run of the unit test) and I believe it has something to do with HiveContext not reclaiming memory after it is finished (or I'm not shutting it down properly). It could very well be related to sbt, however, it's not clear to me. On Tue, Aug 25, 2015 at 1:12 PM, Yana Kadiyska yana.kadiy...@gmail.com wrote: The PermGen space error is controlled with MaxPermSize parameter. I run with this in my pom, I think copied pretty literally from Spark's own tests... I don't know what the sbt equivalent is but you should be able to pass it...possibly via SBT_OPTS? plugin groupIdorg.scalatest/groupId artifactIdscalatest-maven-plugin/artifactId version1.0/version configuration reportsDirectory${project.build.directory}/surefire-reports/reportsDirectory parallelfalse/parallel junitxml./junitxml filereportsSparkTestSuite.txt/filereports argLine-Xmx3g -XX:MaxPermSize=256m -XX:ReservedCodeCacheSize=512m/argLine stderr/ systemProperties java.awt.headlesstrue/java.awt.headless spark.testing1/spark.testing spark.ui.enabledfalse/spark.ui.enabled spark.driver.allowMultipleContextstrue/spark.driver.allowMultipleContexts /systemProperties /configuration executions execution idtest/id goals goaltest/goal /goals /execution /executions /plugin /plugins On Tue, Aug 25, 2015 at 2:10 PM, Mike Trienis mike.trie...@orcsol.com wrote: Hello, I am using sbt and created a unit test where I create a `HiveContext` and execute some query and then return. Each time I run the unit test the JVM will increase it's memory usage until I get the error: Internal error when running tests: java.lang.OutOfMemoryError: PermGen space Exception in thread Thread-2 java.io.EOFException As a work-around, I can fork a new JVM each time I run the unit test, however, it seems like a bad solution as takes a while to run the unit test. By the way, I tried to importing the TestHiveContext: - import org.apache.spark.sql.hive.test.TestHiveContext However, it suffers from the same memory issue. Has anyone else suffered from the same problem? Note that I am running these unit tests on my mac. Cheers, Mike.
Re: How to unit test HiveContext without OutOfMemoryError (using sbt)
I'd suggest setting sbt to fork when running tests. On Wed, Aug 26, 2015 at 10:51 AM, Mike Trienis mike.trie...@orcsol.com wrote: Thanks for your response Yana, I can increase the MaxPermSize parameter and it will allow me to run the unit test a few more times before I run out of memory. However, the primary issue is that running the same unit test in the same JVM (multiple times) results in increased memory (each run of the unit test) and I believe it has something to do with HiveContext not reclaiming memory after it is finished (or I'm not shutting it down properly). It could very well be related to sbt, however, it's not clear to me. On Tue, Aug 25, 2015 at 1:12 PM, Yana Kadiyska yana.kadiy...@gmail.com wrote: The PermGen space error is controlled with MaxPermSize parameter. I run with this in my pom, I think copied pretty literally from Spark's own tests... I don't know what the sbt equivalent is but you should be able to pass it...possibly via SBT_OPTS? plugin groupIdorg.scalatest/groupId artifactIdscalatest-maven-plugin/artifactId version1.0/version configuration reportsDirectory${project.build.directory}/surefire-reports/reportsDirectory parallelfalse/parallel junitxml./junitxml filereportsSparkTestSuite.txt/filereports argLine-Xmx3g -XX:MaxPermSize=256m -XX:ReservedCodeCacheSize=512m/argLine stderr/ systemProperties java.awt.headlesstrue/java.awt.headless spark.testing1/spark.testing spark.ui.enabledfalse/spark.ui.enabled spark.driver.allowMultipleContextstrue/spark.driver.allowMultipleContexts /systemProperties /configuration executions execution idtest/id goals goaltest/goal /goals /execution /executions /plugin /plugins On Tue, Aug 25, 2015 at 2:10 PM, Mike Trienis mike.trie...@orcsol.com wrote: Hello, I am using sbt and created a unit test where I create a `HiveContext` and execute some query and then return. Each time I run the unit test the JVM will increase it's memory usage until I get the error: Internal error when running tests: java.lang.OutOfMemoryError: PermGen space Exception in thread Thread-2 java.io.EOFException As a work-around, I can fork a new JVM each time I run the unit test, however, it seems like a bad solution as takes a while to run the unit test. By the way, I tried to importing the TestHiveContext: - import org.apache.spark.sql.hive.test.TestHiveContext However, it suffers from the same memory issue. Has anyone else suffered from the same problem? Note that I am running these unit tests on my mac. Cheers, Mike.
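For what it's worth, both the fork and the extra memory can be configured from sbt itself rather than through SBT_OPTS. A minimal build.sbt sketch (the option values mirror the Maven argLine above and are examples, not recommendations):

// run test suites in a forked JVM with more headroom
fork in Test := true

javaOptions in Test ++= Seq(
  "-Xmx3g",
  "-XX:MaxPermSize=256m",            // PermGen sizing only applies to Java 7 and earlier
  "-XX:ReservedCodeCacheSize=512m"
)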
How to unit test HiveContext without OutOfMemoryError (using sbt)
Hello, I am using sbt and created a unit test where I create a `HiveContext`, execute some query, and then return. Each time I run the unit test the JVM increases its memory usage until I get the error: Internal error when running tests: java.lang.OutOfMemoryError: PermGen space Exception in thread Thread-2 java.io.EOFException As a work-around, I can fork a new JVM each time I run the unit test; however, it seems like a bad solution as it takes a while to run the unit test. By the way, I tried importing the TestHiveContext: - import org.apache.spark.sql.hive.test.TestHiveContext However, it suffers from the same memory issue. Has anyone else suffered from the same problem? Note that I am running these unit tests on my Mac. Cheers, Mike.
Re: How to unit test HiveContext without OutOfMemoryError (using sbt)
The PermGen space error is controlled with MaxPermSize parameter. I run with this in my pom, I think copied pretty literally from Spark's own tests... I don't know what the sbt equivalent is but you should be able to pass it...possibly via SBT_OPTS?

<plugin>
  <groupId>org.scalatest</groupId>
  <artifactId>scalatest-maven-plugin</artifactId>
  <version>1.0</version>
  <configuration>
    <reportsDirectory>${project.build.directory}/surefire-reports</reportsDirectory>
    <parallel>false</parallel>
    <junitxml>.</junitxml>
    <filereports>SparkTestSuite.txt</filereports>
    <argLine>-Xmx3g -XX:MaxPermSize=256m -XX:ReservedCodeCacheSize=512m</argLine>
    <stderr/>
    <systemProperties>
      <java.awt.headless>true</java.awt.headless>
      <spark.testing>1</spark.testing>
      <spark.ui.enabled>false</spark.ui.enabled>
      <spark.driver.allowMultipleContexts>true</spark.driver.allowMultipleContexts>
    </systemProperties>
  </configuration>
  <executions>
    <execution>
      <id>test</id>
      <goals>
        <goal>test</goal>
      </goals>
    </execution>
  </executions>
</plugin>

On Tue, Aug 25, 2015 at 2:10 PM, Mike Trienis mike.trie...@orcsol.com wrote: Hello, I am using sbt and created a unit test where I create a `HiveContext` and execute some query and then return. Each time I run the unit test the JVM will increase it's memory usage until I get the error: Internal error when running tests: java.lang.OutOfMemoryError: PermGen space Exception in thread Thread-2 java.io.EOFException As a work-around, I can fork a new JVM each time I run the unit test, however, it seems like a bad solution as takes a while to run the unit test. By the way, I tried to importing the TestHiveContext: - import org.apache.spark.sql.hive.test.TestHiveContext However, it suffers from the same memory issue. Has anyone else suffered from the same problem? Note that I am running these unit tests on my mac. Cheers, Mike.
Re: [Unit Test Failure] Test org.apache.spark.streaming.JavaAPISuite.testCount failed
Has this been fixed for you now? There has been a number of patches since then and it may have been fixed. On Thu, May 14, 2015 at 7:20 AM, Wangfei (X) wangf...@huawei.com wrote: Yes it is repeatedly on my locally Jenkins. 发自我的 iPhone 在 2015年5月14日,18:30,Tathagata Das t...@databricks.com 写道: Do you get this failure repeatedly? On Thu, May 14, 2015 at 12:55 AM, kf wangf...@huawei.com wrote: Hi, all, i got following error when i run unit test of spark by dev/run-tests on the latest branch-1.4 branch. the latest commit id: commit d518c0369fa412567855980c3f0f426cde5c190d Author: zsxwing zsxw...@gmail.com Date: Wed May 13 17:58:29 2015 -0700 error [info] Test org.apache.spark.streaming.JavaAPISuite.testCount started [error] Test org.apache.spark.streaming.JavaAPISuite.testCount failed: org.apache.spark.SparkException: Error communicating with MapOutputTracker [error] at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:113) [error] at org.apache.spark.MapOutputTracker.sendTracker(MapOutputTracker.scala:119) [error] at org.apache.spark.MapOutputTrackerMaster.stop(MapOutputTracker.scala:324) [error] at org.apache.spark.SparkEnv.stop(SparkEnv.scala:93) [error] at org.apache.spark.SparkContext.stop(SparkContext.scala:1577) [error] at org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:626) [error] at org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:597) [error] at org.apache.spark.streaming.TestSuiteBase$class.runStreamsWithPartitions(TestSuiteBase.scala:403) [error] at org.apache.spark.streaming.JavaTestUtils$.runStreamsWithPartitions(JavaTestUtils.scala:102) [error] at org.apache.spark.streaming.TestSuiteBase$class.runStreams(TestSuiteBase.scala:344) [error] at org.apache.spark.streaming.JavaTestUtils$.runStreams(JavaTestUtils.scala:102) [error] at org.apache.spark.streaming.JavaTestBase$class.runStreams(JavaTestUtils.scala:74) [error] at org.apache.spark.streaming.JavaTestUtils$.runStreams(JavaTestUtils.scala:102) [error] at org.apache.spark.streaming.JavaTestUtils.runStreams(JavaTestUtils.scala) [error] at org.apache.spark.streaming.JavaAPISuite.testCount(JavaAPISuite.java:103) [error] ... [error] Caused by: org.apache.spark.SparkException: Error sending message [message = StopMapOutputTracker] [error] at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:116) [error] at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78) [error] at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:109) [error] ... 52 more [error] Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds] [error] at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) [error] at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) [error] at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) [error] at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) [error] at scala.concurrent.Await$.result(package.scala:107) [error] at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102) [error] ... 54 more -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Unit-Test-Failure-Test-org-apache-spark-streaming-JavaAPISuite-testCount-failed-tp22879.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: [Unit Test Failure] Test org.apache.spark.streaming.JavaAPISuite.testCount failed
Yes it is repeatedly on my locally Jenkins. 发自我的 iPhone 在 2015年5月14日,18:30,Tathagata Das t...@databricks.commailto:t...@databricks.com 写道: Do you get this failure repeatedly? On Thu, May 14, 2015 at 12:55 AM, kf wangf...@huawei.commailto:wangf...@huawei.com wrote: Hi, all, i got following error when i run unit test of spark by dev/run-tests on the latest branch-1.4 branch. the latest commit id: commit d518c0369fa412567855980c3f0f426cde5c190d Author: zsxwing zsxw...@gmail.commailto:zsxw...@gmail.com Date: Wed May 13 17:58:29 2015 -0700 error [info] Test org.apache.spark.streaming.JavaAPISuite.testCount started [error] Test org.apache.spark.streaming.JavaAPISuite.testCount failed: org.apache.spark.SparkException: Error communicating with MapOutputTracker [error] at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:113) [error] at org.apache.spark.MapOutputTracker.sendTracker(MapOutputTracker.scala:119) [error] at org.apache.spark.MapOutputTrackerMaster.stop(MapOutputTracker.scala:324) [error] at org.apache.spark.SparkEnv.stop(SparkEnv.scala:93) [error] at org.apache.spark.SparkContext.stop(SparkContext.scala:1577) [error] at org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:626) [error] at org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:597) [error] at org.apache.spark.streaming.TestSuiteBase$class.runStreamsWithPartitions(TestSuiteBase.scala:403) [error] at org.apache.spark.streaming.JavaTestUtils$.runStreamsWithPartitions(JavaTestUtils.scala:102) [error] at org.apache.spark.streaming.TestSuiteBase$class.runStreams(TestSuiteBase.scala:344) [error] at org.apache.spark.streaming.JavaTestUtils$.runStreams(JavaTestUtils.scala:102) [error] at org.apache.spark.streaming.JavaTestBase$class.runStreams(JavaTestUtils.scala:74) [error] at org.apache.spark.streaming.JavaTestUtils$.runStreams(JavaTestUtils.scala:102) [error] at org.apache.spark.streaming.JavaTestUtils.runStreams(JavaTestUtils.scala) [error] at org.apache.spark.streaming.JavaAPISuite.testCount(JavaAPISuite.java:103) [error] ... [error] Caused by: org.apache.spark.SparkException: Error sending message [message = StopMapOutputTracker] [error] at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:116) [error] at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78) [error] at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:109) [error] ... 52 more [error] Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds] [error] at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) [error] at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) [error] at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) [error] at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) [error] at scala.concurrent.Await$.result(package.scala:107) [error] at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102) [error] ... 54 more -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Unit-Test-Failure-Test-org-apache-spark-streaming-JavaAPISuite-testCount-failed-tp22879.html Sent from the Apache Spark User List mailing list archive at Nabble.comhttp://Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.orgmailto:user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.orgmailto:user-h...@spark.apache.org
[Unit Test Failure] Test org.apache.spark.streaming.JavaAPISuite.testCount failed
Hi, all, i got following error when i run unit test of spark by dev/run-tests on the latest branch-1.4 branch. the latest commit id: commit d518c0369fa412567855980c3f0f426cde5c190d Author: zsxwing zsxw...@gmail.com Date: Wed May 13 17:58:29 2015 -0700 error [info] Test org.apache.spark.streaming.JavaAPISuite.testCount started [error] Test org.apache.spark.streaming.JavaAPISuite.testCount failed: org.apache.spark.SparkException: Error communicating with MapOutputTracker [error] at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:113) [error] at org.apache.spark.MapOutputTracker.sendTracker(MapOutputTracker.scala:119) [error] at org.apache.spark.MapOutputTrackerMaster.stop(MapOutputTracker.scala:324) [error] at org.apache.spark.SparkEnv.stop(SparkEnv.scala:93) [error] at org.apache.spark.SparkContext.stop(SparkContext.scala:1577) [error] at org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:626) [error] at org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:597) [error] at org.apache.spark.streaming.TestSuiteBase$class.runStreamsWithPartitions(TestSuiteBase.scala:403) [error] at org.apache.spark.streaming.JavaTestUtils$.runStreamsWithPartitions(JavaTestUtils.scala:102) [error] at org.apache.spark.streaming.TestSuiteBase$class.runStreams(TestSuiteBase.scala:344) [error] at org.apache.spark.streaming.JavaTestUtils$.runStreams(JavaTestUtils.scala:102) [error] at org.apache.spark.streaming.JavaTestBase$class.runStreams(JavaTestUtils.scala:74) [error] at org.apache.spark.streaming.JavaTestUtils$.runStreams(JavaTestUtils.scala:102) [error] at org.apache.spark.streaming.JavaTestUtils.runStreams(JavaTestUtils.scala) [error] at org.apache.spark.streaming.JavaAPISuite.testCount(JavaAPISuite.java:103) [error] ... [error] Caused by: org.apache.spark.SparkException: Error sending message [message = StopMapOutputTracker] [error] at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:116) [error] at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78) [error] at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:109) [error] ... 52 more [error] Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds] [error] at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) [error] at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) [error] at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) [error] at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) [error] at scala.concurrent.Await$.result(package.scala:107) [error] at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102) [error] ... 54 more -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Unit-Test-Failure-Test-org-apache-spark-streaming-JavaAPISuite-testCount-failed-tp22879.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: [Unit Test Failure] Test org.apache.spark.streaming.JavaAPISuite.testCount failed
Do you get this failure repeatedly? On Thu, May 14, 2015 at 12:55 AM, kf wangf...@huawei.com wrote: Hi, all, i got following error when i run unit test of spark by dev/run-tests on the latest branch-1.4 branch. the latest commit id: commit d518c0369fa412567855980c3f0f426cde5c190d Author: zsxwing zsxw...@gmail.com Date: Wed May 13 17:58:29 2015 -0700 error [info] Test org.apache.spark.streaming.JavaAPISuite.testCount started [error] Test org.apache.spark.streaming.JavaAPISuite.testCount failed: org.apache.spark.SparkException: Error communicating with MapOutputTracker [error] at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:113) [error] at org.apache.spark.MapOutputTracker.sendTracker(MapOutputTracker.scala:119) [error] at org.apache.spark.MapOutputTrackerMaster.stop(MapOutputTracker.scala:324) [error] at org.apache.spark.SparkEnv.stop(SparkEnv.scala:93) [error] at org.apache.spark.SparkContext.stop(SparkContext.scala:1577) [error] at org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:626) [error] at org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:597) [error] at org.apache.spark.streaming.TestSuiteBase$class.runStreamsWithPartitions(TestSuiteBase.scala:403) [error] at org.apache.spark.streaming.JavaTestUtils$.runStreamsWithPartitions(JavaTestUtils.scala:102) [error] at org.apache.spark.streaming.TestSuiteBase$class.runStreams(TestSuiteBase.scala:344) [error] at org.apache.spark.streaming.JavaTestUtils$.runStreams(JavaTestUtils.scala:102) [error] at org.apache.spark.streaming.JavaTestBase$class.runStreams(JavaTestUtils.scala:74) [error] at org.apache.spark.streaming.JavaTestUtils$.runStreams(JavaTestUtils.scala:102) [error] at org.apache.spark.streaming.JavaTestUtils.runStreams(JavaTestUtils.scala) [error] at org.apache.spark.streaming.JavaAPISuite.testCount(JavaAPISuite.java:103) [error] ... [error] Caused by: org.apache.spark.SparkException: Error sending message [message = StopMapOutputTracker] [error] at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:116) [error] at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78) [error] at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:109) [error] ... 52 more [error] Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds] [error] at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) [error] at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) [error] at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) [error] at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) [error] at scala.concurrent.Await$.result(package.scala:107) [error] at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102) [error] ... 54 more -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Unit-Test-Failure-Test-org-apache-spark-streaming-JavaAPISuite-testCount-failed-tp22879.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: Spark unit test fails
I'm also getting the same error. Any ideas? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-unit-test-fails-tp22368p22798.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: Cannot run unit test.
It's because your tests are running in parallel and you can only have one context running at a time. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Cannot-run-unit-test-tp14459p22429.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
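If each suite really does need its own context, a common workaround is to stop sbt from running suites in parallel so that only one SparkContext exists at any moment. A minimal build.sbt sketch:

// run test suites one at a time so two SparkContexts never coexist in the JVM
parallelExecution in Test := false

// forking the test JVM additionally keeps state from leaking between runs
fork in Test := true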
Re: Spark unit test fails
Trying to bump up the rank of the question. Any example on Github can someone point to? ..Manas On Fri, Apr 3, 2015 at 9:39 AM, manasdebashiskar manasdebashis...@gmail.com wrote: Hi experts, I am trying to write unit tests for my spark application which fails with javax.servlet.FilterRegistration error. I am using CDH5.3.2 Spark and below is my dependencies list. val spark = 1.2.0-cdh5.3.2 val esriGeometryAPI = 1.2 val csvWriter = 1.0.0 val hadoopClient= 2.3.0 val scalaTest = 2.2.1 val jodaTime= 1.6.0 val scalajHTTP = 1.0.1 val avro= 1.7.7 val scopt = 3.2.0 val config = 1.2.1 val jobserver = 0.4.1 val excludeJBossNetty = ExclusionRule(organization = org.jboss.netty) val excludeIONetty = ExclusionRule(organization = io.netty) val excludeEclipseJetty = ExclusionRule(organization = org.eclipse.jetty) val excludeMortbayJetty = ExclusionRule(organization = org.mortbay.jetty) val excludeAsm = ExclusionRule(organization = org.ow2.asm) val excludeOldAsm = ExclusionRule(organization = asm) val excludeCommonsLogging = ExclusionRule(organization = commons-logging) val excludeSLF4J = ExclusionRule(organization = org.slf4j) val excludeScalap = ExclusionRule(organization = org.scala-lang, artifact = scalap) val excludeHadoop = ExclusionRule(organization = org.apache.hadoop) val excludeCurator = ExclusionRule(organization = org.apache.curator) val excludePowermock = ExclusionRule(organization = org.powermock) val excludeFastutil = ExclusionRule(organization = it.unimi.dsi) val excludeJruby = ExclusionRule(organization = org.jruby) val excludeThrift = ExclusionRule(organization = org.apache.thrift) val excludeServletApi = ExclusionRule(organization = javax.servlet, artifact = servlet-api) val excludeJUnit = ExclusionRule(organization = junit) I found the link ( http://apache-spark-user-list.1001560.n3.nabble.com/Fwd-SecurityException-when-running-tests-with-Spark-1-0-0-td6747.html#a6749 ) talking about the issue and the work around of the same. But that work around does not get rid of the problem for me. I am using an SBT build which can't be changed to maven. What am I missing? Stack trace - [info] FiltersRDDSpec: [info] - Spark Filter *** FAILED *** [info] java.lang.SecurityException: class javax.servlet.FilterRegistration's signer information does not match signer information of other classes in the same package [info] at java.lang.ClassLoader.checkCerts(Unknown Source) [info] at java.lang.ClassLoader.preDefineClass(Unknown Source) [info] at java.lang.ClassLoader.defineClass(Unknown Source) [info] at java.security.SecureClassLoader.defineClass(Unknown Source) [info] at java.net.URLClassLoader.defineClass(Unknown Source) [info] at java.net.URLClassLoader.access$100(Unknown Source) [info] at java.net.URLClassLoader$1.run(Unknown Source) [info] at java.net.URLClassLoader$1.run(Unknown Source) [info] at java.security.AccessController.doPrivileged(Native Method) [info] at java.net.URLClassLoader.findClass(Unknown Source) Thanks Manas Manas Kar -- View this message in context: Spark unit test fails http://apache-spark-user-list.1001560.n3.nabble.com/Spark-unit-test-fails-tp22368.html Sent from the Apache Spark User List mailing list archive http://apache-spark-user-list.1001560.n3.nabble.com/ at Nabble.com.
Spark unit test fails
Hi experts, I am trying to write unit tests for my spark application which fails with javax.servlet.FilterRegistration error. I am using CDH5.3.2 Spark and below is my dependencies list. val spark = 1.2.0-cdh5.3.2 val esriGeometryAPI = 1.2 val csvWriter = 1.0.0 val hadoopClient= 2.3.0 val scalaTest = 2.2.1 val jodaTime= 1.6.0 val scalajHTTP = 1.0.1 val avro= 1.7.7 val scopt = 3.2.0 val config = 1.2.1 val jobserver = 0.4.1 val excludeJBossNetty = ExclusionRule(organization = org.jboss.netty) val excludeIONetty = ExclusionRule(organization = io.netty) val excludeEclipseJetty = ExclusionRule(organization = org.eclipse.jetty) val excludeMortbayJetty = ExclusionRule(organization = org.mortbay.jetty) val excludeAsm = ExclusionRule(organization = org.ow2.asm) val excludeOldAsm = ExclusionRule(organization = asm) val excludeCommonsLogging = ExclusionRule(organization = commons-logging) val excludeSLF4J = ExclusionRule(organization = org.slf4j) val excludeScalap = ExclusionRule(organization = org.scala-lang, artifact = scalap) val excludeHadoop = ExclusionRule(organization = org.apache.hadoop) val excludeCurator = ExclusionRule(organization = org.apache.curator) val excludePowermock = ExclusionRule(organization = org.powermock) val excludeFastutil = ExclusionRule(organization = it.unimi.dsi) val excludeJruby = ExclusionRule(organization = org.jruby) val excludeThrift = ExclusionRule(organization = org.apache.thrift) val excludeServletApi = ExclusionRule(organization = javax.servlet, artifact = servlet-api) val excludeJUnit = ExclusionRule(organization = junit) I found the link ( http://apache-spark-user-list.1001560.n3.nabble.com/Fwd-SecurityException-when-running-tests-with-Spark-1-0-0-td6747.html#a6749 ) talking about the issue and the work around of the same. But that work around does not get rid of the problem for me. I am using an SBT build which can't be changed to maven. What am I missing? Stack trace - [info] FiltersRDDSpec: [info] - Spark Filter *** FAILED *** [info] java.lang.SecurityException: class javax.servlet.FilterRegistration's signer information does not match signer information of other classes in the same package [info] at java.lang.ClassLoader.checkCerts(Unknown Source) [info] at java.lang.ClassLoader.preDefineClass(Unknown Source) [info] at java.lang.ClassLoader.defineClass(Unknown Source) [info] at java.security.SecureClassLoader.defineClass(Unknown Source) [info] at java.net.URLClassLoader.defineClass(Unknown Source) [info] at java.net.URLClassLoader.access$100(Unknown Source) [info] at java.net.URLClassLoader$1.run(Unknown Source) [info] at java.net.URLClassLoader$1.run(Unknown Source) [info] at java.security.AccessController.doPrivileged(Native Method) [info] at java.net.URLClassLoader.findClass(Unknown Source) Thanks Manas
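The signer-information error generally means two different servlet-api jars, one signed and one unsigned, ended up on the test classpath (typically one via Spark/Jetty and one via the Hadoop artifacts). Defining the ExclusionRule values is not enough on its own; they have to be attached to the dependencies that drag the duplicate in. A hedged build.sbt sketch of that wiring, reusing the version values and rule names from the message above (the exact module list is illustrative):

libraryDependencies ++= Seq(
  // Spark itself already ships a servlet API via Jetty
  "org.apache.spark" %% "spark-core" % spark % "provided",
  // exclude the competing, signed servlet-api pulled in transitively
  ("org.apache.hadoop" % "hadoop-client" % hadoopClient)
    .excludeAll(excludeServletApi, excludeMortbayJetty, excludeEclipseJetty)
)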
TestSuiteBase-based unit test using a sliding window join times out
Hi, I extended org.apache.spark.streaming.TestSuiteBase for some testing, and I was able to run this test fine:

test("Sliding window join with 3 second window duration") {
  val input1 = Seq(
    Seq("req1"),
    Seq("req2", "req3"),
    Seq(),
    Seq("req4", "req5", "req6"),
    Seq("req7"),
    Seq(),
    Seq()
  )
  val input2 = Seq(
    Seq(("tx1", "req1")),
    Seq(),
    Seq(("tx2", "req3")),
    Seq(("tx3", "req2")),
    Seq(),
    Seq(("tx4", "req7")),
    Seq(("tx5", "req5"), ("tx6", "req4"))
  )
  val expectedOutput = Seq(
    Seq(("req1", (1, "tx1"))),
    Seq(),
    Seq(("req3", (1, "tx2"))),
    Seq(("req2", (1, "tx3"))),
    Seq(),
    Seq(("req7", (1, "tx4"))),
    Seq()
  )
  val operation = (rq: DStream[String], tx: DStream[(String, String)]) => {
    rq.window(Seconds(3), Seconds(1)).map(x => (x, 1)).join(tx.map { case (k, v) => (v, k) })
  }
  testOperation(input1, input2, operation, expectedOutput, useSet = true)
}

However, this seemingly OK looking test fails with an operation timeout:

test("Sliding window join with 3 second window duration + a tumbling window") {
  val input1 = Seq(
    Seq("req1"),
    Seq("req2", "req3"),
    Seq(),
    Seq("req4", "req5", "req6"),
    Seq("req7"),
    Seq()
  )
  val input2 = Seq(
    Seq(("tx1", "req1")),
    Seq(),
    Seq(("tx2", "req3")),
    Seq(("tx3", "req2")),
    Seq(),
    Seq(("tx4", "req7"))
  )
  val expectedOutput = Seq(
    Seq(("req1", (1, "tx1"))),
    Seq(("req2", (1, "tx3")), ("req3", (1, "tx3"))),
    Seq(("req7", (1, "tx4")))
  )
  val operation = (rq: DStream[String], tx: DStream[(String, String)]) => {
    rq.window(Seconds(3), Seconds(2)).map(x => (x, 1)).join(tx.window(Seconds(2), Seconds(2)).map { case (k, v) => (v, k) })
  }
  testOperation(input1, input2, operation, expectedOutput, useSet = true)
}

Stacktrace:

10033 was not less than 1 Operation timed out after 10033 ms
org.scalatest.exceptions.TestFailedException: 10033 was not less than 1 Operation timed out after 10033 ms
at org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500)
at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555)
at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466)
at org.apache.spark.streaming.TestSuiteBase$class.runStreamsWithPartitions(TestSuiteBase.scala:338)

Does anybody know why this could be?
Why does consuming a RESTful web service (using javax.ws.rs.* and Jsersey) work in unit test but not when submitted to Spark?
Hello, I have a piece of code that runs inside Spark Streaming and tries to get some data from a RESTful web service (that runs locally on my machine). The code snippet in question is: Client client = ClientBuilder.newClient(); WebTarget target = client.target(http://localhost:/rest;); target = target.path(annotate) .queryParam(text, UrlEscapers.urlFragmentEscaper().escape(spotlightSubmission)) .queryParam(confidence, 0.3); logger.warn(!!! DEBUG !!! target: {}, target.getUri().toString()); String response = target.request().accept(MediaType.APPLICATION_JSON_TYPE).get(String.class); logger.warn(!!! DEBUG !!! Spotlight response: {}, response); When run inside a unit test as follows: mvn clean test -Dtest=SpotlightTest#testCountWords it contacts the RESTful web service and retrieves some data as expected. But when the same code is run as part of the application that is submitted to Spark, using spark-submit script I receive the following error: java.lang.NoSuchMethodError: javax.ws.rs.core.MultivaluedMap.addAll(Ljava/lang/Object;[Ljava/lang/Object;)V I'm using Spark 1.1.0 and for consuming the web service I'm using Jersey in my project's pom.xml: dependency groupIdorg.glassfish.jersey.containers/groupId artifactIdjersey-container-servlet-core/artifactId version2.14/version /dependency So I suspect that when the application is submitted to Spark, somehow there's a different JAR in the environment that uses a different version of Jersey / javax.ws.rs.* Does anybody know which version of Jersey / javax.ws.rs.* is used in the Spark environment, or how to solve this conflict? -- Emre Sevinç https://be.linkedin.com/in/emresevinc/
Re: Why does consuming a RESTful web service (using javax.ws.rs.* and Jsersey) work in unit test but not when submitted to Spark?
Your guess is right, that there are two incompatible versions of Jersey (or really, JAX-RS) in your runtime. Spark doesn't use Jersey, but its transitive dependencies may, or your transitive dependencies may. I don't see Jersey in Spark's dependency tree except from HBase tests, which in turn only appear in examples, so that's unlikely to be it. I'd take a look with 'mvn dependency:tree' on your own code first. Maybe you are including JavaEE 6 for example? On Wed, Dec 24, 2014 at 12:02 PM, Emre Sevinc emre.sev...@gmail.com wrote: Hello, I have a piece of code that runs inside Spark Streaming and tries to get some data from a RESTful web service (that runs locally on my machine). The code snippet in question is: Client client = ClientBuilder.newClient(); WebTarget target = client.target(http://localhost:/rest;); target = target.path(annotate) .queryParam(text, UrlEscapers.urlFragmentEscaper().escape(spotlightSubmission)) .queryParam(confidence, 0.3); logger.warn(!!! DEBUG !!! target: {}, target.getUri().toString()); String response = target.request().accept(MediaType.APPLICATION_JSON_TYPE).get(String.class); logger.warn(!!! DEBUG !!! Spotlight response: {}, response); When run inside a unit test as follows: mvn clean test -Dtest=SpotlightTest#testCountWords it contacts the RESTful web service and retrieves some data as expected. But when the same code is run as part of the application that is submitted to Spark, using spark-submit script I receive the following error: java.lang.NoSuchMethodError: javax.ws.rs.core.MultivaluedMap.addAll(Ljava/lang/Object;[Ljava/lang/Object;)V I'm using Spark 1.1.0 and for consuming the web service I'm using Jersey in my project's pom.xml: dependency groupIdorg.glassfish.jersey.containers/groupId artifactIdjersey-container-servlet-core/artifactId version2.14/version /dependency So I suspect that when the application is submitted to Spark, somehow there's a different JAR in the environment that uses a different version of Jersey / javax.ws.rs.* Does anybody know which version of Jersey / javax.ws.rs.* is used in the Spark environment, or how to solve this conflict? -- Emre Sevinç https://be.linkedin.com/in/emresevinc/ - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: Why does consuming a RESTful web service (using javax.ws.rs.* and Jsersey) work in unit test but not when submitted to Spark?
On Wed, Dec 24, 2014 at 1:46 PM, Sean Owen so...@cloudera.com wrote: I'd take a look with 'mvn dependency:tree' on your own code first. Maybe you are including JavaEE 6 for example? For reference, my complete pom.xml looks like: project xmlns=http://maven.apache.org/POM/4.0.0; xmlns:xsi= http://www.w3.org/2001/XMLSchema-instance; xsi:schemaLocation=http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd; modelVersion4.0.0/modelVersion groupIdbigcontent/groupId artifactIdbigcontent/artifactId version1.0-SNAPSHOT/version packagingjar/packaging namebigcontent/name urlhttp://maven.apache.org/url properties project.build.sourceEncodingUTF-8/project.build.sourceEncoding /properties build plugins plugin groupIdorg.apache.maven.plugins/groupId artifactIdmaven-shade-plugin/artifactId version2.3/version configuration !-- put your configurations here -- /configuration executions execution phasepackage/phase goals goalshade/goal /goals /execution /executions /plugin plugin groupIdorg.apache.maven.plugins/groupId artifactIdmaven-compiler-plugin/artifactId version3.2/version configuration source1.7/source target1.7/target /configuration /plugin /plugins /build dependencies dependency groupIdorg.apache.spark/groupId artifactIdspark-streaming_2.10/artifactId version1.1.1/version scopeprovided/scope /dependency dependency groupIdorg.glassfish.jersey.containers/groupId artifactIdjersey-container-servlet-core/artifactId version2.14/version /dependency dependency groupIdorg.apache.hadoop/groupId artifactIdhadoop-client/artifactId version2.4.0/version /dependency dependency groupIdcom.google.guava/groupId artifactIdguava/artifactId version16.0/version /dependency dependency groupIdorg.apache.hadoop/groupId artifactIdhadoop-mapreduce-client-core/artifactId version2.4.0/version /dependency dependency groupIdjson-mapreduce/groupId artifactIdjson-mapreduce/artifactId version1.0-SNAPSHOT/version exclusions exclusion groupIdjavax.servlet/groupId artifactId*/artifactId /exclusion exclusion groupIdcommons-io/groupId artifactId*/artifactId /exclusion exclusion groupIdcommons-lang/groupId artifactId*/artifactId /exclusion exclusion groupIdorg.apache.hadoop/groupId artifactIdhadoop-common/artifactId /exclusion /exclusions /dependency dependency groupIdorg.apache.avro/groupId artifactIdavro-mapred/artifactId version1.7.7/version exclusions exclusion groupIdjavax.servlet/groupId artifactId*/artifactId /exclusion exclusion groupIdorg.apache.hadoop/groupId artifactIdhadoop-common/artifactId /exclusion /exclusions /dependency dependency groupIdjunit/groupId artifactIdjunit/artifactId version4.11/version scopetest/scope /dependency dependency groupIdorg.apache.avro/groupId artifactIdavro/artifactId version1.7.7/version exclusions exclusion groupIdjavax.servlet/groupId artifactId*/artifactId /exclusion /exclusions /dependency dependency groupIdorg.apache.hadoop/groupId artifactIdhadoop-common/artifactId version2.4.0/version scopeprovided/scope exclusions exclusion groupIdjavax.servlet/groupId artifactId*/artifactId /exclusion exclusion groupIdcom.google.guava/groupId artifactId*/artifactId /exclusion /exclusions /dependency dependency groupIdorg.slf4j/groupId artifactIdslf4j-log4j12/artifactId version1.7.7/version /dependency /dependencies /project And 'mvn dependency:tree' produces the following output: [INFO] Scanning for projects... 
[INFO] [INFO] [INFO] Building bigcontent 1.0-SNAPSHOT [INFO] [INFO] [INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ bigcontent --- [INFO] bigcontent:bigcontent:jar:1.0-SNAPSHOT [INFO] +- org.apache.spark:spark-streaming_2.10:jar:1.1.1:provided [INFO] | +- org.apache.spark:spark-core_2.10:jar:1.1.1:provided [INFO] | | +- org.apache.curator:curator-recipes:jar:2.4.0:provided [INFO] | | | \- org.apache.curator:curator-framework:jar:2.4.0:provided [INFO] | | | \-
Re: Why does consuming a RESTful web service (using javax.ws.rs.* and Jsersey) work in unit test but not when submitted to Spark?
It seems like YARN depends an older version of Jersey, that is 1.9: https://github.com/apache/spark/blob/master/yarn/pom.xml When I've modified my dependencies to have only: dependency groupIdcom.sun.jersey/groupId artifactIdjersey-core/artifactId version1.9.1/version /dependency And then modified the code to use the old Jersey API: Client c = Client.create(); WebResource r = c.resource(http://localhost:/rest;) .path(annotate) .queryParam(text, UrlEscapers.urlFragmentEscaper().escape(spotlightSubmission)) .queryParam(confidence, 0.3); logger.warn(!!! DEBUG !!! target: {}, r.getURI()); String response = r.accept(MediaType.APPLICATION_JSON_TYPE) //.header() .get(String.class); logger.warn(!!! DEBUG !!! Spotlight response: {}, response); It seems to work when I use spark-submit to submit the application that includes this code. Funny thing is, now my relevant unit test does not run, complaining about not having enough memory: Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0xc490, 25165824, 0) failed; error='Cannot allocate memory' (errno=12) # # There is insufficient memory for the Java Runtime Environment to continue. # Native memory allocation (mmap) failed to map 25165824 bytes for committing reserved memory. -- Emre On Wed, Dec 24, 2014 at 1:46 PM, Sean Owen so...@cloudera.com wrote: Your guess is right, that there are two incompatible versions of Jersey (or really, JAX-RS) in your runtime. Spark doesn't use Jersey, but its transitive dependencies may, or your transitive dependencies may. I don't see Jersey in Spark's dependency tree except from HBase tests, which in turn only appear in examples, so that's unlikely to be it. I'd take a look with 'mvn dependency:tree' on your own code first. Maybe you are including JavaEE 6 for example? On Wed, Dec 24, 2014 at 12:02 PM, Emre Sevinc emre.sev...@gmail.com wrote: Hello, I have a piece of code that runs inside Spark Streaming and tries to get some data from a RESTful web service (that runs locally on my machine). The code snippet in question is: Client client = ClientBuilder.newClient(); WebTarget target = client.target(http://localhost:/rest;); target = target.path(annotate) .queryParam(text, UrlEscapers.urlFragmentEscaper().escape(spotlightSubmission)) .queryParam(confidence, 0.3); logger.warn(!!! DEBUG !!! target: {}, target.getUri().toString()); String response = target.request().accept(MediaType.APPLICATION_JSON_TYPE).get(String.class); logger.warn(!!! DEBUG !!! Spotlight response: {}, response); When run inside a unit test as follows: mvn clean test -Dtest=SpotlightTest#testCountWords it contacts the RESTful web service and retrieves some data as expected. But when the same code is run as part of the application that is submitted to Spark, using spark-submit script I receive the following error: java.lang.NoSuchMethodError: javax.ws.rs.core.MultivaluedMap.addAll(Ljava/lang/Object;[Ljava/lang/Object;)V I'm using Spark 1.1.0 and for consuming the web service I'm using Jersey in my project's pom.xml: dependency groupIdorg.glassfish.jersey.containers/groupId artifactIdjersey-container-servlet-core/artifactId version2.14/version /dependency So I suspect that when the application is submitted to Spark, somehow there's a different JAR in the environment that uses a different version of Jersey / javax.ws.rs.* Does anybody know which version of Jersey / javax.ws.rs.* is used in the Spark environment, or how to solve this conflict? -- Emre Sevinç https://be.linkedin.com/in/emresevinc/ -- Emre Sevinc
Re: Why does consuming a RESTful web service (using javax.ws.rs.* and Jsersey) work in unit test but not when submitted to Spark?
That could well be it -- oops, I forgot to run with the YARN profile and so didn't see the YARN dependencies. Try the userClassPathFirst option to try to make your app's copy take precedence. The second error is really a JVM bug, but, is from having too little memory available for the unit tests. http://spark.apache.org/docs/latest/building-spark.html#setting-up-mavens-memory-usage On Wed, Dec 24, 2014 at 1:56 PM, Emre Sevinc emre.sev...@gmail.com wrote: It seems like YARN depends an older version of Jersey, that is 1.9: https://github.com/apache/spark/blob/master/yarn/pom.xml When I've modified my dependencies to have only: dependency groupIdcom.sun.jersey/groupId artifactIdjersey-core/artifactId version1.9.1/version /dependency And then modified the code to use the old Jersey API: Client c = Client.create(); WebResource r = c.resource(http://localhost:/rest;) .path(annotate) .queryParam(text, UrlEscapers.urlFragmentEscaper().escape(spotlightSubmission)) .queryParam(confidence, 0.3); logger.warn(!!! DEBUG !!! target: {}, r.getURI()); String response = r.accept(MediaType.APPLICATION_JSON_TYPE) //.header() .get(String.class); logger.warn(!!! DEBUG !!! Spotlight response: {}, response); It seems to work when I use spark-submit to submit the application that includes this code. Funny thing is, now my relevant unit test does not run, complaining about not having enough memory: Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0xc490, 25165824, 0) failed; error='Cannot allocate memory' (errno=12) # # There is insufficient memory for the Java Runtime Environment to continue. # Native memory allocation (mmap) failed to map 25165824 bytes for committing reserved memory. -- Emre On Wed, Dec 24, 2014 at 1:46 PM, Sean Owen so...@cloudera.com wrote: Your guess is right, that there are two incompatible versions of Jersey (or really, JAX-RS) in your runtime. Spark doesn't use Jersey, but its transitive dependencies may, or your transitive dependencies may. I don't see Jersey in Spark's dependency tree except from HBase tests, which in turn only appear in examples, so that's unlikely to be it. I'd take a look with 'mvn dependency:tree' on your own code first. Maybe you are including JavaEE 6 for example? On Wed, Dec 24, 2014 at 12:02 PM, Emre Sevinc emre.sev...@gmail.com wrote: Hello, I have a piece of code that runs inside Spark Streaming and tries to get some data from a RESTful web service (that runs locally on my machine). The code snippet in question is: Client client = ClientBuilder.newClient(); WebTarget target = client.target(http://localhost:/rest;); target = target.path(annotate) .queryParam(text, UrlEscapers.urlFragmentEscaper().escape(spotlightSubmission)) .queryParam(confidence, 0.3); logger.warn(!!! DEBUG !!! target: {}, target.getUri().toString()); String response = target.request().accept(MediaType.APPLICATION_JSON_TYPE).get(String.class); logger.warn(!!! DEBUG !!! Spotlight response: {}, response); When run inside a unit test as follows: mvn clean test -Dtest=SpotlightTest#testCountWords it contacts the RESTful web service and retrieves some data as expected. 
But when the same code is run as part of the application that is submitted to Spark, using spark-submit script I receive the following error: java.lang.NoSuchMethodError: javax.ws.rs.core.MultivaluedMap.addAll(Ljava/lang/Object;[Ljava/lang/Object;)V I'm using Spark 1.1.0 and for consuming the web service I'm using Jersey in my project's pom.xml: dependency groupIdorg.glassfish.jersey.containers/groupId artifactIdjersey-container-servlet-core/artifactId version2.14/version /dependency So I suspect that when the application is submitted to Spark, somehow there's a different JAR in the environment that uses a different version of Jersey / javax.ws.rs.* Does anybody know which version of Jersey / javax.ws.rs.* is used in the Spark environment, or how to solve this conflict? -- Emre Sevinç https://be.linkedin.com/in/emresevinc/ -- Emre Sevinc - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
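For completeness, the classpath-precedence settings can be supplied through SparkConf (or equivalently with --conf on spark-submit). The exact key has changed between releases -- early 1.x exposed spark.files.userClassPathFirst for executors, while later releases use spark.driver.userClassPathFirst and spark.executor.userClassPathFirst -- so the Scala sketch below is version-dependent and should be checked against the docs for the release in use:

import org.apache.spark.SparkConf

// Sketch: prefer the application's own Jersey 2.x jars over the copies shipped with YARN/Spark.
// These are the later-1.x key spellings; older releases used spark.files.userClassPathFirst.
val conf = new SparkConf()
  .setAppName("rest-client-job")
  .set("spark.driver.userClassPathFirst", "true")
  .set("spark.executor.userClassPathFirst", "true")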
Re: Why does consuming a RESTful web service (using javax.ws.rs.* and Jsersey) work in unit test but not when submitted to Spark?
Sean, Thanks a lot for the important information, especially userClassPathFirst. Cheers, Emre On Wed, Dec 24, 2014 at 3:38 PM, Sean Owen so...@cloudera.com wrote: That could well be it -- oops, I forgot to run with the YARN profile and so didn't see the YARN dependencies. Try the userClassPathFirst option to try to make your app's copy take precedence. The second error is really a JVM bug, but, is from having too little memory available for the unit tests. http://spark.apache.org/docs/latest/building-spark.html#setting-up-mavens-memory-usage On Wed, Dec 24, 2014 at 1:56 PM, Emre Sevinc emre.sev...@gmail.com wrote: It seems like YARN depends an older version of Jersey, that is 1.9: https://github.com/apache/spark/blob/master/yarn/pom.xml When I've modified my dependencies to have only: dependency groupIdcom.sun.jersey/groupId artifactIdjersey-core/artifactId version1.9.1/version /dependency And then modified the code to use the old Jersey API: Client c = Client.create(); WebResource r = c.resource(http://localhost:/rest;) .path(annotate) .queryParam(text, UrlEscapers.urlFragmentEscaper().escape(spotlightSubmission)) .queryParam(confidence, 0.3); logger.warn(!!! DEBUG !!! target: {}, r.getURI()); String response = r.accept(MediaType.APPLICATION_JSON_TYPE) //.header() .get(String.class); logger.warn(!!! DEBUG !!! Spotlight response: {}, response); It seems to work when I use spark-submit to submit the application that includes this code. Funny thing is, now my relevant unit test does not run, complaining about not having enough memory: Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0xc490, 25165824, 0) failed; error='Cannot allocate memory' (errno=12) # # There is insufficient memory for the Java Runtime Environment to continue. # Native memory allocation (mmap) failed to map 25165824 bytes for committing reserved memory. -- Emre On Wed, Dec 24, 2014 at 1:46 PM, Sean Owen so...@cloudera.com wrote: Your guess is right, that there are two incompatible versions of Jersey (or really, JAX-RS) in your runtime. Spark doesn't use Jersey, but its transitive dependencies may, or your transitive dependencies may. I don't see Jersey in Spark's dependency tree except from HBase tests, which in turn only appear in examples, so that's unlikely to be it. I'd take a look with 'mvn dependency:tree' on your own code first. Maybe you are including JavaEE 6 for example? On Wed, Dec 24, 2014 at 12:02 PM, Emre Sevinc emre.sev...@gmail.com wrote: Hello, I have a piece of code that runs inside Spark Streaming and tries to get some data from a RESTful web service (that runs locally on my machine). The code snippet in question is: Client client = ClientBuilder.newClient(); WebTarget target = client.target(http://localhost:/rest;); target = target.path(annotate) .queryParam(text, UrlEscapers.urlFragmentEscaper().escape(spotlightSubmission)) .queryParam(confidence, 0.3); logger.warn(!!! DEBUG !!! target: {}, target.getUri().toString()); String response = target.request().accept(MediaType.APPLICATION_JSON_TYPE).get(String.class); logger.warn(!!! DEBUG !!! Spotlight response: {}, response); When run inside a unit test as follows: mvn clean test -Dtest=SpotlightTest#testCountWords it contacts the RESTful web service and retrieves some data as expected. 
But when the same code is run as part of the application that is submitted to Spark, using spark-submit script I receive the following error: java.lang.NoSuchMethodError: javax.ws.rs.core.MultivaluedMap.addAll(Ljava/lang/Object;[Ljava/lang/Object;)V I'm using Spark 1.1.0 and for consuming the web service I'm using Jersey in my project's pom.xml: dependency groupIdorg.glassfish.jersey.containers/groupId artifactIdjersey-container-servlet-core/artifactId version2.14/version /dependency So I suspect that when the application is submitted to Spark, somehow there's a different JAR in the environment that uses a different version of Jersey / javax.ws.rs.* Does anybody know which version of Jersey / javax.ws.rs.* is used in the Spark environment, or how to solve this conflict? -- Emre Sevinç https://be.linkedin.com/in/emresevinc/ -- Emre Sevinc -- Emre Sevinc
How can I make Spark Streaming count the words in a file in a unit test?
Hello, I've successfully built a very simple Spark Streaming application in Java that is based on the HdfsCount example in Scala at https://github.com/apache/spark/blob/branch-1.1/examples/src/main/scala/org/apache/spark/examples/streaming/HdfsWordCount.scala . When I submit this application to my local Spark, it waits for a file to be written to a given directory, and when I create that file it successfully prints the number of words. I terminate the application by pressing Ctrl+C. Now I've tried to create a very basic unit test for this functionality, but in the test I was not able to print the same information, that is the number of words. What am I missing? Below is the unit test file, and after that I've also included the code snippet that shows the countWords method: = StarterAppTest.java = import com.google.common.io.Files; import org.apache.spark.streaming.Duration; import org.apache.spark.streaming.api.java.JavaDStream; import org.apache.spark.streaming.api.java.JavaPairDStream; import org.apache.spark.streaming.api.java.JavaStreamingContext; import org.junit.*; import java.io.*; public class StarterAppTest { JavaStreamingContext ssc; File tempDir; @Before public void setUp() { ssc = new JavaStreamingContext(local, test, new Duration(3000)); tempDir = Files.createTempDir(); tempDir.deleteOnExit(); } @After public void tearDown() { ssc.stop(); ssc = null; } @Test public void testInitialization() { Assert.assertNotNull(ssc.sc()); } @Test public void testCountWords() { StarterApp starterApp = new StarterApp(); try { JavaDStreamString lines = ssc.textFileStream(tempDir.getAbsolutePath()); JavaPairDStreamString, Integer wordCounts = starterApp.countWords(lines); System.err.println(= Word Counts ===); wordCounts.print(); System.err.println(= Word Counts ===); ssc.start(); File tmpFile = new File(tempDir.getAbsolutePath(), tmp.txt); PrintWriter writer = new PrintWriter(tmpFile, UTF-8); writer.println(8-Dec-2014: Emre Emre Emre Ergin Ergin Ergin); writer.close(); System.err.println(= Word Counts ===); wordCounts.print(); System.err.println(= Word Counts ===); } catch (FileNotFoundException e) { e.printStackTrace(); } catch (UnsupportedEncodingException e) { e.printStackTrace(); } Assert.assertTrue(true); } } = This test compiles and starts to run, Spark Streaming prints a lot of diagnostic messages on the console but the calls to wordCounts.print(); does not print anything, whereas in StarterApp.java itself, they do. I've also added ssc.awaitTermination(); after ssc.start() but nothing changed in that respect. After that I've also tried to create a new file in the directory that this Spark Streaming application was checking but this time it gave an error. For completeness, below is the wordCounts method: public JavaPairDStreamString, Integer countWords(JavaDStreamString lines) { JavaDStreamString words = lines.flatMap(new FlatMapFunctionString, String() { @Override public IterableString call(String x) { return Lists.newArrayList(SPACE.split(x)); } }); JavaPairDStreamString, Integer wordCounts = words.mapToPair( new PairFunctionString, String, Integer() { @Override public Tuple2String, Integer call(String s) { return new Tuple2(s, 1); } }).reduceByKey((i1, i2) - i1 + i2); return wordCounts; } Kind regards Emre Sevinç
Re: How can I make Spark Streaming count the words in a file in a unit test?
Hi, https://github.com/databricks/spark-perf/tree/master/streaming-tests/src/main/scala/streaming/perf contains some performance tests for streaming. There are examples of how to generate synthetic files during the test in that repo, maybe you can find some code snippets that you can use there. Best, Burak - Original Message - From: Emre Sevinc emre.sev...@gmail.com To: user@spark.apache.org Sent: Monday, December 8, 2014 2:36:41 AM Subject: How can I make Spark Streaming count the words in a file in a unit test? Hello, I've successfully built a very simple Spark Streaming application in Java that is based on the HdfsCount example in Scala at https://github.com/apache/spark/blob/branch-1.1/examples/src/main/scala/org/apache/spark/examples/streaming/HdfsWordCount.scala . When I submit this application to my local Spark, it waits for a file to be written to a given directory, and when I create that file it successfully prints the number of words. I terminate the application by pressing Ctrl+C. Now I've tried to create a very basic unit test for this functionality, but in the test I was not able to print the same information, that is the number of words. What am I missing? Below is the unit test file, and after that I've also included the code snippet that shows the countWords method: = StarterAppTest.java = import com.google.common.io.Files; import org.apache.spark.streaming.Duration; import org.apache.spark.streaming.api.java.JavaDStream; import org.apache.spark.streaming.api.java.JavaPairDStream; import org.apache.spark.streaming.api.java.JavaStreamingContext; import org.junit.*; import java.io.*; public class StarterAppTest { JavaStreamingContext ssc; File tempDir; @Before public void setUp() { ssc = new JavaStreamingContext(local, test, new Duration(3000)); tempDir = Files.createTempDir(); tempDir.deleteOnExit(); } @After public void tearDown() { ssc.stop(); ssc = null; } @Test public void testInitialization() { Assert.assertNotNull(ssc.sc()); } @Test public void testCountWords() { StarterApp starterApp = new StarterApp(); try { JavaDStreamString lines = ssc.textFileStream(tempDir.getAbsolutePath()); JavaPairDStreamString, Integer wordCounts = starterApp.countWords(lines); System.err.println(= Word Counts ===); wordCounts.print(); System.err.println(= Word Counts ===); ssc.start(); File tmpFile = new File(tempDir.getAbsolutePath(), tmp.txt); PrintWriter writer = new PrintWriter(tmpFile, UTF-8); writer.println(8-Dec-2014: Emre Emre Emre Ergin Ergin Ergin); writer.close(); System.err.println(= Word Counts ===); wordCounts.print(); System.err.println(= Word Counts ===); } catch (FileNotFoundException e) { e.printStackTrace(); } catch (UnsupportedEncodingException e) { e.printStackTrace(); } Assert.assertTrue(true); } } = This test compiles and starts to run, Spark Streaming prints a lot of diagnostic messages on the console but the calls to wordCounts.print(); does not print anything, whereas in StarterApp.java itself, they do. I've also added ssc.awaitTermination(); after ssc.start() but nothing changed in that respect. After that I've also tried to create a new file in the directory that this Spark Streaming application was checking but this time it gave an error. 
For completeness, below is the wordCounts method: public JavaPairDStreamString, Integer countWords(JavaDStreamString lines) { JavaDStreamString words = lines.flatMap(new FlatMapFunctionString, String() { @Override public IterableString call(String x) { return Lists.newArrayList(SPACE.split(x)); } }); JavaPairDStreamString, Integer wordCounts = words.mapToPair( new PairFunctionString, String, Integer() { @Override public Tuple2String, Integer call(String s) { return new Tuple2(s, 1); } }).reduceByKey((i1, i2) - i1 + i2); return wordCounts; } Kind regards Emre Sevinç - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
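On the original question: printing a DStream before ssc.start() only registers the output operation, it does not display data, and the test returns before any batch has run. One workable pattern is to collect each batch's counts into a driver-side buffer with foreachRDD, start the context, write the input file, and then poll until the expected counts arrive. A rough Scala sketch of that pattern, with names, master and timings chosen purely for illustration:

import java.io.{File, PrintWriter}
import scala.collection.mutable
import com.google.common.io.Files
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._

object WordCountTestSketch {
  def main(args: Array[String]): Unit = {
    val tempDir = Files.createTempDir()
    val ssc = new StreamingContext(
      new SparkConf().setMaster("local[2]").setAppName("word-count-test"), Seconds(1))

    val counts = mutable.Map.empty[String, Int]
    ssc.textFileStream(tempDir.getAbsolutePath)
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .foreachRDD { rdd =>
        // pull each batch's result back to the driver so the test can assert on it
        rdd.collect().foreach { case (w, c) =>
          counts.synchronized { counts(w) = counts.getOrElse(w, 0) + c }
        }
      }

    ssc.start()

    // write the input only after the context is running so the new file gets picked up
    val writer = new PrintWriter(new File(tempDir, "input.txt"), "UTF-8")
    writer.println("Emre Emre Emre Ergin Ergin Ergin")
    writer.close()

    // poll instead of sleeping blindly; give a few micro-batches time to fire
    val deadline = System.currentTimeMillis() + 30000
    while (System.currentTimeMillis() < deadline &&
           counts.synchronized(counts.getOrElse("Emre", 0)) < 3) {
      Thread.sleep(200)
    }
    assert(counts.synchronized(counts.getOrElse("Emre", 0)) == 3)
    ssc.stop()
  }
}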
Re: Cannot run unit test.
When I run `sbt test-only SparkTest` or `sbt test-only SparkTest1` on its own, the tests pass. But running `sbt test`, which runs both SparkTest and SparkTest1, fails. If I merge all the cases into one file, the tests pass. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Cannot-run-unit-test-tp14459p14506.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: Unit Test for Spark Streaming
Hi TD, I tried some different setup on maven these days, and now I can at least get something when running mvn test. However, it seems like scalatest cannot find the test cases specified in the test suite. Here is the output I get: http://apache-spark-user-list.1001560.n3.nabble.com/file/n11825/Screen_Shot_2014-08-08_at_5.png Could you please give me some details on how you setup the ScalaTest on your machine? I believe there must be some other setup issue on my machine but I cannot figure out why... And here are the plugins and dependencies related to scalatest in my pom.xml : plugin groupIdorg.apache.maven.plugins/groupId artifactIdmaven-surefire-plugin/artifactId version2.7/version configuration skipTeststrue/skipTests /configuration /plugin plugin groupIdorg.scalatest/groupId artifactIdscalatest-maven-plugin/artifactId version1.0/version configuration reportsDirectory${project.build.directory}/surefire-reports/reportsDirectory junitxml./junitxml filereports${project.build.directory}/SparkTestSuite.txt/filereports tagsToIncludeATag/tagsToInclude systemProperties java.awt.headlesstrue/java.awt.headless spark.test.home${session.executionRootDirectory}/spark.test.home spark.testing1/spark.testing /systemProperties /configuration executions execution idtest/id goals goaltest/goal /goals /execution /executions /plugin dependency groupIdjunit/groupId artifactIdjunit/artifactId version4.8.1/version scopetest/scope /dependency dependency groupIdorg.scalatest/groupId artifactIdscalatest_2.10/artifactId version2.2.1/version scopetest/scope /dependency Thank you very much! Best Regards, Jiajia -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Unit-Test-for-Spark-Streaming-tp11394p11825.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: Unit Test for Spark Streaming
Thank you TD, I have worked around that problem and now the test compiles. However, I don't actually see that test running: when I do mvn test, it just says BUILD SUCCESS, without any TEST section on stdout. Are we supposed to use mvn test to run the test? Are there any other methods that can be used to run this test? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Unit-Test-for-Spark-Streaming-tp11394p11570.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: Unit Test for Spark Streaming
Does it not show the name of the test suite on stdout, showing that it has passed? Can you try writing a small unit test, in the same way as your Kafka unit test, with print statements on stdout, to see whether it works? I believe it is some configuration issue in Maven, which is hard for me to guess. TD On Wed, Aug 6, 2014 at 12:53 PM, JiajiaJing jj.jing0...@gmail.com wrote: Thank you TD, I have worked around that problem and now the test compiles. However, I don't actually see that test running: when I do mvn test, it just says BUILD SUCCESS, without any TEST section on stdout. Are we supposed to use mvn test to run the test? Are there any other methods that can be used to run this test? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Unit-Test-for-Spark-Streaming-tp11394p11570.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
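A minimal suite of the kind suggested above (names are illustrative; assumes ScalaTest 2.x on the test classpath): if mvn test runs this and the println appears in the ScalaTest output, the Maven wiring is fine and the problem is specific to the Kafka suite.

import org.scalatest.FunSuite

// A trivial sanity-check suite: its only job is to prove that the
// scalatest-maven-plugin discovers and runs suites at all.
class SanityCheckSuite extends FunSuite {
  test("scalatest is wired into the maven build") {
    println("SanityCheckSuite is running")
    assert(1 + 1 === 2)
  }
}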
Re: Unit Test for Spark Streaming
That function simply deletes a directory recursively; you can use alternative libraries. See this discussion: http://stackoverflow.com/questions/779519/delete-files-recursively-in-java On Tue, Aug 5, 2014 at 5:02 PM, JiajiaJing jj.jing0...@gmail.com wrote: Hi TD, I encountered a problem when trying to run the KafkaStreamSuite.scala unit test. I added scalatest-maven-plugin to my pom.xml, then ran mvn test, and got the following error message: error: object Utils in package util cannot be accessed in package org.apache.spark.util [INFO] brokerConf.logDirs.foreach { f => Utils.deleteRecursively(new File(f)) } [INFO] ^ I checked that Utils.scala does exist under spark/core/src/main/scala/org/apache/spark/util/, so I have no idea why this access error occurs. Could you please help me with this? Thank you very much! Best Regards, Jiajia -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Unit-Test-for-Spark-Streaming-tp11394p11505.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
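Since org.apache.spark.util.Utils is private to Spark and cannot be referenced from code outside the spark package, a standalone test can carry its own helper. A small Scala sketch of a recursive delete using only java.io (Apache Commons IO's FileUtils.deleteDirectory is an equivalent off-the-shelf option); the object name is illustrative:

import java.io.{File, IOException}

object TestFileUtils {
  // Delete a file or an entire directory tree, similar to what the
  // Spark-internal Utils.deleteRecursively helper does.
  def deleteRecursively(file: File): Unit = {
    if (file.isDirectory) {
      Option(file.listFiles()).getOrElse(Array.empty[File]).foreach(deleteRecursively)
    }
    if (!file.delete() && file.exists()) {
      throw new IOException(s"Failed to delete ${file.getAbsolutePath}")
    }
  }
}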
Unit Test for Spark Streaming
Hello Spark Users, I have a Spark Streaming program that streams data from Kafka topics and writes the output as Parquet files on HDFS. Now I want to write a unit test for this program to make sure the output data is correct (i.e. not missing any data from Kafka). However, I have no idea how to do this, especially how to mock a Kafka topic. Can someone help me with this? Thank you very much! Best Regards, Jiajia -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Unit-Test-for-Spark-Streaming-tp11394.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: Unit Test for Spark Streaming
Appropriately timed question! Here is the PR that adds a real unit test for Kafka stream in Spark Streaming. Maybe this will help! https://github.com/apache/spark/pull/1751/files On Mon, Aug 4, 2014 at 6:30 PM, JiajiaJing jj.jing0...@gmail.com wrote: Hello Spark Users, I have a spark streaming program that stream data from kafka topics and output as parquet file on HDFS. Now I want to write a unit test for this program to make sure the output data is correct (i.e not missing any data from kafka). However, I have no idea about how to do this, especially how to mock a kafka topic. Can someone help me with this? Thank you very much! Best Regards, Jiajia -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Unit-Test-for-Spark-Streaming-tp11394.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: Unit Test for Spark Streaming
This helps a lot!! Thank you very much! Jiajia -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Unit-Test-for-Spark-Streaming-tp11394p11396.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Run spark unit test on Windows 7
Hi all, I'm trying to run some transformation on *Spark*, it works fine on cluster (YARN, linux machines). However, when I'm trying to run it on local machine (*Windows 7*) under unit test, I got errors:

java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318)
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:326)
at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:76)
at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93)

My code is following:

@Test
def testETL() = {
  val conf = new SparkConf()
  val sc = new SparkContext("local", "test", conf)
  try {
    val etl = new IxtoolsDailyAgg() // empty constructor
    val data = sc.parallelize(List("in1", "in2", "in3"))
    etl.etl(data) // rdd transformation, no access to SparkContext or Hadoop
    Assert.assertTrue(true)
  } finally {
    if (sc != null) sc.stop()
  }
}

Why is it trying to access hadoop at all? and how can I fix it? Thank you in advance Thank you, Konstantin Kudryavtsev
Re: Run spark unit test on Windows 7
Hi Konstatin, We use hadoop as a library in a few places in Spark. I wonder why the path includes null though. Could you provide the full stack trace? Andrew 2014-07-02 9:38 GMT-07:00 Konstantin Kudryavtsev kudryavtsev.konstan...@gmail.com: Hi all, I'm trying to run some transformation on *Spark*, it works fine on cluster (YARN, linux machines). However, when I'm trying to run it on local machine (*Windows 7*) under unit test, I got errors: java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries. at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318) at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333) at org.apache.hadoop.util.Shell.clinit(Shell.java:326) at org.apache.hadoop.util.StringUtils.clinit(StringUtils.java:76) at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93) My code is following: @Test def testETL() = { val conf = new SparkConf() val sc = new SparkContext(local, test, conf) try { val etl = new IxtoolsDailyAgg() // empty constructor val data = sc.parallelize(List(in1, in2, in3)) etl.etl(data) // rdd transformation, no access to SparkContext or Hadoop Assert.assertTrue(true) } finally { if(sc != null) sc.stop() } } Why is it trying to access hadoop at all? and how can I fix it? Thank you in advance Thank you, Konstantin Kudryavtsev
Re: Run spark unit test on Windows 7
Hi Andrew, it's windows 7 and I doesn't set up any env variables here The full stack trace: 14/07/02 19:59:31 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 14/07/02 19:59:31 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries. at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318) at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333) at org.apache.hadoop.util.Shell.clinit(Shell.java:326) at org.apache.hadoop.util.StringUtils.clinit(StringUtils.java:76) at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93) at org.apache.hadoop.security.Groups.init(Groups.java:77) at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:240) at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:255) at org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:283) at org.apache.spark.deploy.SparkHadoopUtil.init(SparkHadoopUtil.scala:36) at org.apache.spark.deploy.SparkHadoopUtil$.init(SparkHadoopUtil.scala:109) at org.apache.spark.deploy.SparkHadoopUtil$.clinit(SparkHadoopUtil.scala) at org.apache.spark.SparkContext.init(SparkContext.scala:228) at org.apache.spark.SparkContext.init(SparkContext.scala:97) at my.example.EtlTest.testETL(IxtoolsDailyAggTest.scala:13) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at junit.framework.TestCase.runTest(TestCase.java:168) at junit.framework.TestCase.runBare(TestCase.java:134) at junit.framework.TestResult$1.protect(TestResult.java:110) at junit.framework.TestResult.runProtected(TestResult.java:128) at junit.framework.TestResult.run(TestResult.java:113) at junit.framework.TestCase.run(TestCase.java:124) at junit.framework.TestSuite.runTest(TestSuite.java:232) at junit.framework.TestSuite.run(TestSuite.java:227) at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:81) at org.junit.runner.JUnitCore.run(JUnitCore.java:130) at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74) at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:211) at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:67) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120) Thank you, Konstantin Kudryavtsev On Wed, Jul 2, 2014 at 8:15 PM, Andrew Or and...@databricks.com wrote: Hi Konstatin, We use hadoop as a library in a few places in Spark. I wonder why the path includes null though. Could you provide the full stack trace? Andrew 2014-07-02 9:38 GMT-07:00 Konstantin Kudryavtsev kudryavtsev.konstan...@gmail.com: Hi all, I'm trying to run some transformation on *Spark*, it works fine on cluster (YARN, linux machines). 
However, when I'm trying to run it on local machine (*Windows 7*) under unit test, I got errors: java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries. at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318) at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333) at org.apache.hadoop.util.Shell.clinit(Shell.java:326) at org.apache.hadoop.util.StringUtils.clinit(StringUtils.java:76) at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93) My code is following: @Test def testETL() = { val conf = new SparkConf() val sc = new SparkContext(local, test, conf) try { val etl = new IxtoolsDailyAgg() // empty constructor val data = sc.parallelize(List(in1, in2, in3)) etl.etl(data) // rdd transformation, no access to SparkContext or Hadoop Assert.assertTrue(true) } finally { if(sc != null) sc.stop() } } Why is it trying to access hadoop at all? and how can I fix it? Thank you in advance Thank you, Konstantin Kudryavtsev
Re: Run spark unit test on Windows 7
By any chance do you have HDP 2.1 installed? you may need to install the utils and update the env variables per http://stackoverflow.com/questions/18630019/running-apache-hadoop-2-1-0-on-windows On Jul 2, 2014, at 10:20 AM, Konstantin Kudryavtsev kudryavtsev.konstan...@gmail.com wrote: Hi Andrew, it's windows 7 and I doesn't set up any env variables here The full stack trace: 14/07/02 19:59:31 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 14/07/02 19:59:31 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries. at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318) at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333) at org.apache.hadoop.util.Shell.clinit(Shell.java:326) at org.apache.hadoop.util.StringUtils.clinit(StringUtils.java:76) at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93) at org.apache.hadoop.security.Groups.init(Groups.java:77) at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:240) at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:255) at org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:283) at org.apache.spark.deploy.SparkHadoopUtil.init(SparkHadoopUtil.scala:36) at org.apache.spark.deploy.SparkHadoopUtil$.init(SparkHadoopUtil.scala:109) at org.apache.spark.deploy.SparkHadoopUtil$.clinit(SparkHadoopUtil.scala) at org.apache.spark.SparkContext.init(SparkContext.scala:228) at org.apache.spark.SparkContext.init(SparkContext.scala:97) at my.example.EtlTest.testETL(IxtoolsDailyAggTest.scala:13) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at junit.framework.TestCase.runTest(TestCase.java:168) at junit.framework.TestCase.runBare(TestCase.java:134) at junit.framework.TestResult$1.protect(TestResult.java:110) at junit.framework.TestResult.runProtected(TestResult.java:128) at junit.framework.TestResult.run(TestResult.java:113) at junit.framework.TestCase.run(TestCase.java:124) at junit.framework.TestSuite.runTest(TestSuite.java:232) at junit.framework.TestSuite.run(TestSuite.java:227) at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:81) at org.junit.runner.JUnitCore.run(JUnitCore.java:130) at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74) at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:211) at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:67) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120) Thank you, Konstantin Kudryavtsev On Wed, Jul 2, 2014 at 8:15 PM, Andrew Or and...@databricks.com wrote: Hi Konstatin, We use hadoop as a library in a few places in Spark. I wonder why the path includes null though. Could you provide the full stack trace? 
Andrew 2014-07-02 9:38 GMT-07:00 Konstantin Kudryavtsev kudryavtsev.konstan...@gmail.com: Hi all, I'm trying to run some transformation on Spark, it works fine on cluster (YARN, linux machines). However, when I'm trying to run it on local machine (Windows 7) under unit test, I got errors: java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries. at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318) at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333) at org.apache.hadoop.util.Shell.clinit(Shell.java:326) at org.apache.hadoop.util.StringUtils.clinit(StringUtils.java:76) at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93) My code is following: @Test def testETL() = { val conf = new SparkConf() val sc = new SparkContext(local, test, conf) try { val etl = new IxtoolsDailyAgg() // empty constructor val data = sc.parallelize(List(in1, in2, in3)) etl.etl(data) // rdd transformation, no access to SparkContext or Hadoop Assert.assertTrue(true
Re: Run spark unit test on Windows 7
No, I don’t why do I need to have HDP installed? I don’t use Hadoop at all and I’d like to read data from local filesystem On Jul 2, 2014, at 9:10 PM, Denny Lee denny.g@gmail.com wrote: By any chance do you have HDP 2.1 installed? you may need to install the utils and update the env variables per http://stackoverflow.com/questions/18630019/running-apache-hadoop-2-1-0-on-windows On Jul 2, 2014, at 10:20 AM, Konstantin Kudryavtsev kudryavtsev.konstan...@gmail.com wrote: Hi Andrew, it's windows 7 and I doesn't set up any env variables here The full stack trace: 14/07/02 19:59:31 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 14/07/02 19:59:31 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries. at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318) at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333) at org.apache.hadoop.util.Shell.clinit(Shell.java:326) at org.apache.hadoop.util.StringUtils.clinit(StringUtils.java:76) at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93) at org.apache.hadoop.security.Groups.init(Groups.java:77) at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:240) at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:255) at org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:283) at org.apache.spark.deploy.SparkHadoopUtil.init(SparkHadoopUtil.scala:36) at org.apache.spark.deploy.SparkHadoopUtil$.init(SparkHadoopUtil.scala:109) at org.apache.spark.deploy.SparkHadoopUtil$.clinit(SparkHadoopUtil.scala) at org.apache.spark.SparkContext.init(SparkContext.scala:228) at org.apache.spark.SparkContext.init(SparkContext.scala:97) at my.example.EtlTest.testETL(IxtoolsDailyAggTest.scala:13) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at junit.framework.TestCase.runTest(TestCase.java:168) at junit.framework.TestCase.runBare(TestCase.java:134) at junit.framework.TestResult$1.protect(TestResult.java:110) at junit.framework.TestResult.runProtected(TestResult.java:128) at junit.framework.TestResult.run(TestResult.java:113) at junit.framework.TestCase.run(TestCase.java:124) at junit.framework.TestSuite.runTest(TestSuite.java:232) at junit.framework.TestSuite.run(TestSuite.java:227) at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:81) at org.junit.runner.JUnitCore.run(JUnitCore.java:130) at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74) at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:211) at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:67) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120) Thank you, Konstantin Kudryavtsev On Wed, Jul 2, 2014 at 8:15 PM, Andrew Or and...@databricks.com 
wrote: Hi Konstatin, We use hadoop as a library in a few places in Spark. I wonder why the path includes null though. Could you provide the full stack trace? Andrew 2014-07-02 9:38 GMT-07:00 Konstantin Kudryavtsev kudryavtsev.konstan...@gmail.com: Hi all, I'm trying to run some transformation on Spark, it works fine on cluster (YARN, linux machines). However, when I'm trying to run it on local machine (Windows 7) under unit test, I got errors: java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries. at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318) at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333) at org.apache.hadoop.util.Shell.clinit(Shell.java:326) at org.apache.hadoop.util.StringUtils.clinit(StringUtils.java:76) at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93) My code is following: @Test def testETL() = { val conf = new SparkConf() val sc = new SparkContext(local, test, conf) try { val etl = new IxtoolsDailyAgg() // empty constructor val data
Re: Run spark unit test on Windows 7
You don't actually need it per se - its just that some of the Spark libraries are referencing Hadoop libraries even if they ultimately do not call them. When I was doing some early builds of Spark on Windows, I admittedly had Hadoop on Windows running as well and had not run into this particular issue. On Wed, Jul 2, 2014 at 12:04 PM, Kostiantyn Kudriavtsev kudryavtsev.konstan...@gmail.com wrote: No, I don’t why do I need to have HDP installed? I don’t use Hadoop at all and I’d like to read data from local filesystem On Jul 2, 2014, at 9:10 PM, Denny Lee denny.g@gmail.com wrote: By any chance do you have HDP 2.1 installed? you may need to install the utils and update the env variables per http://stackoverflow.com/questions/18630019/running-apache-hadoop-2-1-0-on-windows On Jul 2, 2014, at 10:20 AM, Konstantin Kudryavtsev kudryavtsev.konstan...@gmail.com wrote: Hi Andrew, it's windows 7 and I doesn't set up any env variables here The full stack trace: 14/07/02 19:59:31 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 14/07/02 19:59:31 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries. at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318) at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333) at org.apache.hadoop.util.Shell.clinit(Shell.java:326) at org.apache.hadoop.util.StringUtils.clinit(StringUtils.java:76) at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93) at org.apache.hadoop.security.Groups.init(Groups.java:77) at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:240) at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:255) at org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:283) at org.apache.spark.deploy.SparkHadoopUtil.init(SparkHadoopUtil.scala:36) at org.apache.spark.deploy.SparkHadoopUtil$.init(SparkHadoopUtil.scala:109) at org.apache.spark.deploy.SparkHadoopUtil$.clinit(SparkHadoopUtil.scala) at org.apache.spark.SparkContext.init(SparkContext.scala:228) at org.apache.spark.SparkContext.init(SparkContext.scala:97) at my.example.EtlTest.testETL(IxtoolsDailyAggTest.scala:13) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at junit.framework.TestCase.runTest(TestCase.java:168) at junit.framework.TestCase.runBare(TestCase.java:134) at junit.framework.TestResult$1.protect(TestResult.java:110) at junit.framework.TestResult.runProtected(TestResult.java:128) at junit.framework.TestResult.run(TestResult.java:113) at junit.framework.TestCase.run(TestCase.java:124) at junit.framework.TestSuite.runTest(TestSuite.java:232) at junit.framework.TestSuite.run(TestSuite.java:227) at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:81) at org.junit.runner.JUnitCore.run(JUnitCore.java:130) at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74) at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:211) at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:67) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120) Thank you, Konstantin Kudryavtsev On Wed, Jul 2, 2014 at 8:15 PM, Andrew Or and...@databricks.com wrote: Hi Konstatin, We use hadoop as a library in a few places in Spark. I wonder why the path includes null though. Could you provide the full stack trace? Andrew 2014-07-02 9:38 GMT-07:00 Konstantin Kudryavtsev kudryavtsev.konstan...@gmail.com: Hi all, I'm trying to run some transformation on *Spark*, it works fine on cluster (YARN, linux machines). However, when I'm trying to run it on local machine (*Windows 7*) under unit test, I got errors: java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries. at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318) at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333) at org.apache.hadoop.util.Shell.clinit(Shell.java:326) at org.apache.hadoop.util.StringUtils.clinit(StringUtils.java:76) at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93) My code is following: @Test
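The usual workarounds for this on a plain Windows box with no Hadoop installation are either to point Hadoop at a directory containing winutils.exe, or to ignore the ERROR, which is often non-fatal for purely local RDD work (jobs that read or write through Hadoop input/output formats may still need winutils). A hedged Scala sketch of the first option, assuming winutils.exe (matching the Hadoop client version on the classpath) has been placed in C:\hadoop\bin; the paths and object name are illustrative:

import org.apache.spark.{SparkConf, SparkContext}

object LocalSparkOnWindows {
  def main(args: Array[String]): Unit = {
    // Assumption: winutils.exe lives in C:\hadoop\bin. Setting hadoop.home.dir
    // (or the HADOOP_HOME environment variable) before the first SparkContext
    // is created stops the "Could not locate executable null\bin\winutils.exe" error.
    System.setProperty("hadoop.home.dir", "C:\\hadoop")

    val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("test"))
    try {
      // a tiny local-only job, mirroring the test in this thread
      println(sc.parallelize(Seq("in1", "in2", "in3")).count()) // expect 3
    } finally {
      sc.stop()
    }
  }
}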
Re: Unit test failure: Address already in use
Hi, Could your problem come from the fact that you run your tests in parallel? If you are running Spark in local mode, you cannot have concurrent Spark instances running. This means that your tests instantiating a SparkContext cannot be run in parallel. The easiest fix is to tell sbt not to run tests in parallel. This can be done by adding the following line to your build.sbt: parallelExecution in Test := false Cheers, Anselme 2014-06-17 23:01 GMT+02:00 SK skrishna...@gmail.com: Hi, I have 3 unit tests (independent of each other) in the /src/test/scala folder. When I run each of them individually using: sbt test-only test, all the 3 pass the test. But when I run them all using sbt test, then they fail with the warning below. I am wondering if the binding exception results in failure to run the job, thereby causing the failure. If so, what can I do to address this binding exception? I am running these tests locally on a standalone machine (i.e. SparkContext("local", "test")). 14/06/17 13:42:48 WARN component.AbstractLifeCycle: FAILED org.eclipse.jetty.server.Server@3487b78d: java.net.BindException: Address already in use java.net.BindException: Address already in use at sun.nio.ch.Net.bind0(Native Method) at sun.nio.ch.Net.bind(Net.java:174) at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:139) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:77) thanks -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Unit-test-failure-Address-already-in-use-tp7771.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
RE: Unit test failure: Address already in use
Disabling parallelExecution has worked for me. Other alternatives I’ve tried that also work include: 1. Using a lock – this will let tests execute in parallel except for those using a SparkContext. If you have a large number of tests that could execute in parallel, this can shave off some time. object TestingSparkContext { val lock = new Lock() } // before you instantiate your local SparkContext TestingSparkContext.lock.acquire() // after you call sc.stop() TestingSparkContext.lock.release() 2. Sharing a local SparkContext between tests. - This is nice because your tests will run faster. Start-up and shutdown is time consuming (can add a few seconds per test). - The downside is that your tests are using the same SparkContext so they are less independent of each other. I haven’t seen issues with this yet but there are likely some things that might crop up. Best, Todd From: Anselme Vignon [mailto:anselme.vig...@flaminem.com] Sent: Wednesday, June 18, 2014 12:33 AM To: user@spark.apache.org Subject: Re: Unit test failure: Address already in use Hi, Could your problem come from the fact that you run your tests in parallel ? If you are spark in local mode, you cannot have concurrent spark instances running. this means that your tests instantiating sparkContext cannot be run in parallel. The easiest fix is to tell sbt to not run parallel tests. This can be done by adding the following line in your build.sbt: parallelExecution in Test := false Cheers, Anselme 2014-06-17 23:01 GMT+02:00 SK skrishna...@gmail.commailto:skrishna...@gmail.com: Hi, I have 3 unit tests (independent of each other) in the /src/test/scala folder. When I run each of them individually using: sbt test-only test, all the 3 pass the test. But when I run them all using sbt test, then they fail with the warning below. I am wondering if the binding exception results in failure to run the job, thereby causing the failure. If so, what can I do to address this binding exception? I am running these tests locally on a standalone machine (i.e. SparkContext(local, test)). 14/06/17 13:42:48 WARN component.AbstractLifeCycle: FAILED org.eclipse.jetty.server.Server@3487b78dmailto:org.eclipse.jetty.server.Server@3487b78d: java.net.BindException: Address already in use java.net.BindException: Address already in use at sun.nio.ch.Net.bind0(Native Method) at sun.nio.ch.Net.bind(Net.java:174) at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:139) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:77) thanks -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Unit-test-failure-Address-already-in-use-tp7771.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: Unit test failure: Address already in use
In my unit tests I have a base class that all my tests extend; it has setup and teardown methods that they inherit. They look something like this:

var spark: SparkContext = _

@Before
def setUp() {
  Thread.sleep(100L) // this seems to give spark more time to reset from the previous test's tearDown
  spark = new SparkContext("local", "test spark")
}

@After
def tearDown() {
  spark.stop()
  spark = null // not sure why this helps but it does!
  System.clearProperty("spark.master.port")
}

It's been since last fall (i.e. version 0.8.x) since I've examined this code, so I can't vouch that it is still accurate/necessary - but it still works for me. On 06/18/2014 12:59 PM, Lisonbee, Todd wrote: Disabling parallelExecution has worked for me. Other alternatives I’ve tried that also work include: 1. Using a lock – this will let tests execute in parallel except for those using a SparkContext. If you have a large number of tests that could execute in parallel, this can shave off some time. object TestingSparkContext { val lock = new Lock() } // before you instantiate your local SparkContext TestingSparkContext.lock.acquire() // after you call sc.stop() TestingSparkContext.lock.release() 2. Sharing a local SparkContext between tests. - This is nice because your tests will run faster. Start-up and shutdown is time consuming (can add a few seconds per test). - The downside is that your tests are using the same SparkContext so they are less independent of each other. I haven’t seen issues with this yet but there are likely some things that might crop up. Best, Todd *From:*Anselme Vignon [mailto:anselme.vig...@flaminem.com] *Sent:* Wednesday, June 18, 2014 12:33 AM *To:* user@spark.apache.org *Subject:* Re: Unit test failure: Address already in use Hi, Could your problem come from the fact that you run your tests in parallel ? If you are running Spark in local mode, you cannot have concurrent Spark instances running. This means that your tests instantiating a SparkContext cannot be run in parallel. The easiest fix is to tell sbt not to run tests in parallel. This can be done by adding the following line to your build.sbt: parallelExecution in Test := false Cheers, Anselme 2014-06-17 23:01 GMT+02:00 SK skrishna...@gmail.com: Hi, I have 3 unit tests (independent of each other) in the /src/test/scala folder. When I run each of them individually using: sbt test-only test, all the 3 pass the test. But when I run them all using sbt test, then they fail with the warning below. I am wondering if the binding exception results in failure to run the job, thereby causing the failure. If so, what can I do to address this binding exception? I am running these tests locally on a standalone machine (i.e. SparkContext("local", "test")). 14/06/17 13:42:48 WARN component.AbstractLifeCycle: FAILED org.eclipse.jetty.server.Server@3487b78d: java.net.BindException: Address already in use java.net.BindException: Address already in use at sun.nio.ch.Net.bind0(Native Method) at sun.nio.ch.Net.bind(Net.java:174) at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:139) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:77) thanks -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Unit-test-failure-Address-already-in-use-tp7771.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
Unit test failure: Address already in use
Hi, I have 3 unit tests (independent of each other) in the /src/test/scala folder. When I run each of them individually using: sbt test-only test, all the 3 pass the test. But when I run them all using sbt test, then they fail with the warning below. I am wondering if the binding exception results in failure to run the job, thereby causing the failure. If so, what can I do to address this binding exception? I am running these tests locally on a standalone machine (i.e. SparkContext("local", "test")). 14/06/17 13:42:48 WARN component.AbstractLifeCycle: FAILED org.eclipse.jetty.server.Server@3487b78d: java.net.BindException: Address already in use java.net.BindException: Address already in use at sun.nio.ch.Net.bind0(Native Method) at sun.nio.ch.Net.bind(Net.java:174) at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:139) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:77) thanks -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Unit-test-failure-Address-already-in-use-tp7771.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
printing in unit test
Hi, My unit test is failing (the output does not match the expected output). I would like to print out the value of the output, but rdd.foreach(r => println(r)) does not work from the unit test. How can I print the output or write it out to a file/screen? thanks. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/printing-in-unit-test-tp7611.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
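A common way around this (a sketch, not a drop-in fix for the failing suite above): collect() the RDD back to the driver first, so the println runs in the test JVM rather than inside a task, where the test runner may not capture it. This is fine for the small RDDs typical of unit tests, and the collected array can also be asserted on or written to a file. File and object names below are illustrative:

import java.io.PrintWriter

import org.apache.spark.{SparkConf, SparkContext}

object PrintRddSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("print-rdd-sketch"))
    try {
      val rdd = sc.parallelize(Seq("a", "b", "c"))

      // collect() pulls the data back to the driver, so println happens in this JVM
      val result = rdd.collect()
      result.foreach(println)

      // the same array can be written to a file for later inspection
      val out = new PrintWriter("rdd-debug.txt")
      try result.foreach(out.println) finally out.close()
    } finally {
      sc.stop()
    }
  }
}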
unit test
Hi! I have two questions: 1. I want to test my application. My app will write the result to Elasticsearch (stage 1) with saveAsHadoopFile. How can I write a test case for it? Do I need to pull up a MiniDFSCluster, or is there another way? My application flow plan: Kafka => Spark Streaming (enrich) -> Elasticsearch => Spark (map/reduce) -> HBase 2. Can Spark read data from Elasticsearch? What is the preferred way to do this? b0c1 -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/unit-test-tp7155.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
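On question 2: Spark can read from Elasticsearch through the elasticsearch-hadoop connector (published for Spark as elasticsearch-spark). The exact artifact and API details depend on the connector version, so treat the following Scala sketch, which assumes a local ES node and uses an illustrative index/type name, as a starting point rather than a definitive recipe:

import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._ // adds esRDD / saveToEs to SparkContext and RDD

object EsReadSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("local")
      .setAppName("es-read-sketch")
      .set("es.nodes", "localhost") // assumption: an Elasticsearch node on localhost
      .set("es.port", "9200")

    val sc = new SparkContext(conf)
    try {
      // "myindex/mytype" is an illustrative index/type; each element is a
      // (document id, document fields as a Map) pair
      val docs = sc.esRDD("myindex/mytype")
      docs.take(10).foreach(println)
    } finally {
      sc.stop()
    }
  }
}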