Re: The Dataset unit test is much slower than the RDD unit test (in Scala)

2022-11-01 Thread Cheng Pan
Which Spark version are you using?

SPARK-36444 [1] and SPARK-38138 [2] may be related; please test with a
patched version, or disable DPP by setting
spark.sql.optimizer.dynamicPartitionPruning.enabled=false, to see if it
helps.

[1] https://issues.apache.org/jira/browse/SPARK-36444
[2] https://issues.apache.org/jira/browse/SPARK-38138
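
For reference, a minimal sketch (not from the original mail) of applying that
setting when building a test session; the app name and master are illustrative:

// A hedged sketch: build a local test SparkSession with dynamic partition
// pruning disabled, to check whether planning time drops.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("planning-check")
  .config("spark.sql.optimizer.dynamicPartitionPruning.enabled", "false")
  .getOrCreate()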


Thanks,
Cheng Pan


On Nov 2, 2022 at 00:14:34, Enrico Minack  wrote:

> Hi Tanin,
>
> running your test with option "spark.sql.planChangeLog.level" set to
> "info" or "warn" (depending on your Spark log level) will show you
> insights into the planning (which rules are applied, how long rules
> take, how many iterations are done).
>
> Hoping this helps,
> Enrico
>
>
> Am 25.10.22 um 21:54 schrieb Tanin Na Nakorn:
>
> Hi All,
>
>
> Our data job is very complex (e.g. 100+ joins), and we have switched
>
> from RDD to Dataset recently.
>
>
> We've found that the unit test takes much longer. We profiled it and
>
> have found that it's the planning phase that is slow, not execution.
>
>
> I wonder if anyone has encountered this issue before and if there's a
>
> way to make the planning phase faster (e.g. maybe disabling certain
>
> optimizers).
>
>
> Any thoughts or input would be appreciated.
>
>
> Thank you,
>
> Tanin
>
>
>
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>


Re: The Dataset unit test is much slower than the RDD unit test (in Scala)

2022-11-01 Thread Enrico Minack

Hi Tanin,

running your test with option "spark.sql.planChangeLog.level" set to 
"info" or "warn" (depending on your Spark log level) will show you 
insights into the planning (which rules are applied, how long rules 
take, how many iterations are done).
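
As an illustration (a sketch, not from the original mail), the option can be set
on the session before triggering planning:

// A hedged sketch: surface Catalyst plan-change logging for one test session.
// The chosen level must be visible under your current Spark log level.
spark.conf.set("spark.sql.planChangeLog.level", "warn")
spark.range(10).selectExpr("id * 2 AS doubled").explain()
// The driver log now reports each applied rule, its effect, and per-rule timing.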


Hoping this helps,
Enrico


Am 25.10.22 um 21:54 schrieb Tanin Na Nakorn:

Hi All,

Our data job is very complex (e.g. 100+ joins), and we have switched 
from RDD to Dataset recently.


We've found that the unit test takes much longer. We profiled it and 
have found that it's the planning phase that is slow, not execution.


I wonder if anyone has encountered this issue before and if there's a 
way to make the planning phase faster (e.g. maybe disabling certain 
optimizers).


Any thoughts or input would be appreciated.

Thank you,
Tanin




-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



The Dataset unit test is much slower than the RDD unit test (in Scala)

2022-10-25 Thread Tanin Na Nakorn
Hi All,

Our data job is very complex (e.g. 100+ joins), and we have switched from
RDD to Dataset recently.

We've found that the unit test takes much longer. We profiled it and have
found that it's the planning phase that is slow, not execution.

I wonder if anyone has encountered this issue before and if there's a way
to make the planning phase faster (e.g. maybe disabling certain optimizers).
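
One knob worth trying (an assumption on my part, not something confirmed in this
thread) is spark.sql.optimizer.excludedRules, which switches off individual
optimizer rules; note that some rules cannot be excluded, and the rule named
below is only an example:

// A hedged sketch: exclude a specific Catalyst optimizer rule to see whether it
// dominates planning time. The rule name here is illustrative.
spark.conf.set(
  "spark.sql.optimizer.excludedRules",
  "org.apache.spark.sql.catalyst.optimizer.PushDownPredicates")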

Any thoughts or input would be appreciated.

Thank you,
Tanin


Re: ivy unit test case failing for Spark

2021-12-21 Thread Wes Peng
Are you using IvyVPN, which causes this problem? If the VPN software silently
changes the network URL, you should avoid using it.

Regards.

On Wed, Dec 22, 2021 at 1:48 AM Pralabh Kumar 
wrote:

> Hi Spark Team
>
> I am building Spark inside a VPN, but the unit test case below is failing.
> It is pointing to an Ivy location which cannot be reached within the VPN. Any
> help would be appreciated.
>
> test("SPARK-33084: Add jar support Ivy URI -- default transitive = true")
> {
>   *sc *= new SparkContext(new 
> SparkConf().setAppName("test").setMaster("local-cluster[3,
> 1, 1024]"))
>   *sc*.addJar("*ivy://org.apache.hive:hive-storage-api:2.7.0*")
>   assert(*sc*.listJars().exists(_.contains(
> "org.apache.hive_hive-storage-api-2.7.0.jar")))
>   assert(*sc*.listJars().exists(_.contains(
> "commons-lang_commons-lang-2.6.jar")))
> }
>
> Error
>
> - SPARK-33084: Add jar support Ivy URI -- default transitive = true ***
> FAILED ***
> java.lang.RuntimeException: [unresolved dependency:
> org.apache.hive#hive-storage-api;2.7.0: not found]
> at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(
> SparkSubmit.scala:1447)
> at org.apache.spark.util.DependencyUtils$.resolveMavenDependencies(
> DependencyUtils.scala:185)
> at org.apache.spark.util.DependencyUtils$.resolveMavenDependencies(
> DependencyUtils.scala:159)
> at org.apache.spark.SparkContext.addJar(SparkContext.scala:1996)
> at org.apache.spark.SparkContext.addJar(SparkContext.scala:1928)
> at org.apache.spark.SparkContextSuite.$anonfun$new$115(SparkContextSuite.
> scala:1041)
> at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
> at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
> at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> at org.scalatest.Transformer.apply(Transformer.scala:22)
>
> Regards
> Pralabh Kumar
>
>
>


Re: ivy unit test case failing for Spark

2021-12-21 Thread Sean Owen
You would have to make it available? This doesn't seem like a spark issue.
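
For illustration only (my sketch, not something Sean spelled out; the path is a
placeholder), one way to make the artifact available is to point Spark's Ivy
resolution at a repository that is reachable from inside the VPN:

// A hedged sketch: resolve ivy:// URIs against an internal mirror described in
// an Ivy settings file. The ivysettings path is a placeholder.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("test")
  .setMaster("local-cluster[3, 1, 1024]")
  .set("spark.jars.ivySettings", "/path/to/ivysettings.xml")
val sc = new SparkContext(conf)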

On Tue, Dec 21, 2021, 10:48 AM Pralabh Kumar  wrote:

> Hi Spark Team
>
> I am building Spark inside a VPN, but the unit test case below is failing.
> It is pointing to an Ivy location which cannot be reached within the VPN. Any
> help would be appreciated.
>
> test("SPARK-33084: Add jar support Ivy URI -- default transitive = true")
> {
>   *sc *= new SparkContext(new 
> SparkConf().setAppName("test").setMaster("local-cluster[3,
> 1, 1024]"))
>   *sc*.addJar("*ivy://org.apache.hive:hive-storage-api:2.7.0*")
>   assert(*sc*.listJars().exists(_.contains(
> "org.apache.hive_hive-storage-api-2.7.0.jar")))
>   assert(*sc*.listJars().exists(_.contains(
> "commons-lang_commons-lang-2.6.jar")))
> }
>
> Error
>
> - SPARK-33084: Add jar support Ivy URI -- default transitive = true ***
> FAILED ***
> java.lang.RuntimeException: [unresolved dependency:
> org.apache.hive#hive-storage-api;2.7.0: not found]
> at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(
> SparkSubmit.scala:1447)
> at org.apache.spark.util.DependencyUtils$.resolveMavenDependencies(
> DependencyUtils.scala:185)
> at org.apache.spark.util.DependencyUtils$.resolveMavenDependencies(
> DependencyUtils.scala:159)
> at org.apache.spark.SparkContext.addJar(SparkContext.scala:1996)
> at org.apache.spark.SparkContext.addJar(SparkContext.scala:1928)
> at org.apache.spark.SparkContextSuite.$anonfun$new$115(SparkContextSuite.
> scala:1041)
> at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
> at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
> at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> at org.scalatest.Transformer.apply(Transformer.scala:22)
>
> Regards
> Pralabh Kumar
>
>
>


ivy unit test case failing for Spark

2021-12-21 Thread Pralabh Kumar
Hi Spark Team

I am building Spark inside a VPN, but the unit test case below is failing.
It is pointing to an Ivy location which cannot be reached within the VPN. Any
help would be appreciated.

test("SPARK-33084: Add jar support Ivy URI -- default transitive = true") {
  *sc *= new SparkContext(new
SparkConf().setAppName("test").setMaster("local-cluster[3,
1, 1024]"))
  *sc*.addJar("*ivy://org.apache.hive:hive-storage-api:2.7.0*")
  assert(*sc*.listJars().exists(_.contains(
"org.apache.hive_hive-storage-api-2.7.0.jar")))
  assert(*sc*.listJars().exists(_.contains(
"commons-lang_commons-lang-2.6.jar")))
}

Error

- SPARK-33084: Add jar support Ivy URI -- default transitive = true ***
FAILED ***
java.lang.RuntimeException: [unresolved dependency:
org.apache.hive#hive-storage-api;2.7.0: not found]
at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(
SparkSubmit.scala:1447)
at org.apache.spark.util.DependencyUtils$.resolveMavenDependencies(
DependencyUtils.scala:185)
at org.apache.spark.util.DependencyUtils$.resolveMavenDependencies(
DependencyUtils.scala:159)
at org.apache.spark.SparkContext.addJar(SparkContext.scala:1996)
at org.apache.spark.SparkContext.addJar(SparkContext.scala:1928)
at org.apache.spark.SparkContextSuite.$anonfun$new$115(SparkContextSuite.
scala:1041)
at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
at org.scalatest.Transformer.apply(Transformer.scala:22)

Regards
Pralabh Kumar


Re: Need Unit test complete reference for Pyspark

2020-11-19 Thread Sofia’s World
Hey,
They are good libraries to get you started; I have used both of them.
Unfortunately, as far as I saw when I started to use them, only a few
people maintain them.
But you can get pointers out of them for writing tests. The code below can
get you started.
What you'll need is:

- a method to create a dataframe on the fly, perhaps from a string; have a
look at pandas, which has methods for this
- a method to test dataframe equality; you can use df1.subtract(df2)

I am assuming you are into dataframes rather than RDDs; for RDDs, the two
packages you mention should have everything you need.

hth,
 marco


import pytest
from pyspark.sql import SparkSession


@pytest.fixture
def spark_session():
    # One local SparkSession shared by the test below.
    return SparkSession.builder \
        .master('local[1]') \
        .appName('SparkByExamples.com') \
        .getOrCreate()


def test_create_table(spark_session):
    # Build two identical single-row dataframes on the fly...
    df = spark_session.createDataFrame([['one', 'two']]).toDF('first', 'second')
    df.show()

    df2 = spark_session.createDataFrame([['one', 'two']]).toDF('first', 'second')

    # ...and assert equality by checking that their difference is empty.
    assert df.subtract(df2).count() == 0




On Thu, Nov 19, 2020 at 6:38 AM Sachit Murarka 
wrote:

> Hi Users,
>
> I have to write Unit Test cases for PySpark.
> I think pytest-spark and "spark testing base" are good test libraries.
>
> Can anyone please provide full reference for writing the test cases in
> Python using these?
>
> Kind Regards,
> Sachit Murarka
>


Need Unit test complete reference for Pyspark

2020-11-18 Thread Sachit Murarka
Hi Users,

I have to write Unit Test cases for PySpark.
I think pytest-spark and "spark testing base" are good test libraries.

Can anyone please provide a full reference for writing the test cases in
Python using these?

Kind Regards,
Sachit Murarka


Re: how do i force unit test to do whole stage codegen

2017-04-05 Thread Jacek Laskowski
Thanks Koert for the kind words. That part, however, is easy to fix, and I
was surprised to have seen the old style referenced (!)

Pozdrawiam,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Wed, Apr 5, 2017 at 6:14 PM, Koert Kuipers <ko...@tresata.com> wrote:
> its pretty much impossible to be fully up to date with spark given how fast
> it moves!
>
> the book is a very helpful reference
>
> On Wed, Apr 5, 2017 at 11:15 AM, Jacek Laskowski <ja...@japila.pl> wrote:
>>
>> Hi,
>>
>> I'm very sorry for not being up to date with the current style (and
>> "promoting" the old style); I am going to review that part soon. I'm very
>> close to touching it again since I'm working on the Optimizer these days.
>>
>> Jacek
>>
>> On 5 Apr 2017 6:08 a.m., "Kazuaki Ishizaki" <ishiz...@jp.ibm.com> wrote:
>>>
>>> Hi,
>>> The page in the URL explains the old style of physical plan output.
>>> The current style adds "*" as a prefix of each operation that
>>> whole-stage codegen can be applied to.
>>>
>>> So, in your test case, whole-stage codegen has already been enabled!
>>>
>>> FYI. I think that it is a good topic for d...@spark.apache.org.
>>>
>>> Kazuaki Ishizaki
>>>
>>>
>>>
>>> From:Koert Kuipers <ko...@tresata.com>
>>> To:"user@spark.apache.org" <user@spark.apache.org>
>>> Date:2017/04/05 05:12
>>> Subject:how do i force unit test to do whole stage codegen
>>> 
>>>
>>>
>>>
>>> i wrote my own expression with eval and doGenCode, but doGenCode never
>>> gets called in tests.
>>>
>>> also as a test i ran this in a unit test:
>>> spark.range(10).select('id as 'asId).where('id === 4).explain
>>> according to
>>>
>>> https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-sql-whole-stage-codegen.html
>>> this is supposed to show:
>>> == Physical Plan ==
>>> WholeStageCodegen
>>> :  +- Project [id#0L AS asId#3L]
>>> : +- Filter (id#0L = 4)
>>> :+- Range 0, 1, 8, 10, [id#0L]
>>>
>>> but it doesn't. instead it shows:
>>>
>>> == Physical Plan ==
>>> *Project [id#12L AS asId#15L]
>>> +- *Filter (id#12L = 4)
>>>   +- *Range (0, 10, step=1, splits=Some(4))
>>>
>>> so i am again missing the WholeStageCodegen. any idea why?
>>>
>>> i create spark session for unit tests simply as:
>>> val session = SparkSession.builder
>>>  .master("local[*]")
>>>  .appName("test")
>>>  .config("spark.sql.shuffle.partitions", 4)
>>>  .getOrCreate()
>>>
>>>
>

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: how do i force unit test to do whole stage codegen

2017-04-05 Thread Koert Kuipers
it's pretty much impossible to be fully up to date with spark given how fast
it moves!

the book is a very helpful reference

On Wed, Apr 5, 2017 at 11:15 AM, Jacek Laskowski <ja...@japila.pl> wrote:

> Hi,
>
> I'm very sorry for not being up to date with the current style (and
> "promoting" the old style); I am going to review that part soon. I'm very
> close to touching it again since I'm working on the Optimizer these days.
>
> Jacek
>
> On 5 Apr 2017 6:08 a.m., "Kazuaki Ishizaki" <ishiz...@jp.ibm.com> wrote:
>
>> Hi,
>> The page in the URL explains the old style of physical plan output.
>> The current style adds "*" as a prefix of each operation that
>> whole-stage codegen can be applied to.
>>
>> So, in your test case, whole-stage codegen has already been enabled!
>>
>> FYI. I think that it is a good topic for d...@spark.apache.org.
>>
>> Kazuaki Ishizaki
>>
>>
>>
>> From:Koert Kuipers <ko...@tresata.com>
>> To:"user@spark.apache.org" <user@spark.apache.org>
>> Date:2017/04/05 05:12
>> Subject:how do i force unit test to do whole stage codegen
>> --
>>
>>
>>
>> i wrote my own expression with eval and doGenCode, but doGenCode never
>> gets called in tests.
>>
>> also as a test i ran this in a unit test:
>> spark.range(10).select('id as 'asId).where('id === 4).explain
>> according to
>>
>> *https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-sql-whole-stage-codegen.html*
>> <https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-sql-whole-stage-codegen.html>
>> this is supposed to show:
>> == Physical Plan ==
>> WholeStageCodegen
>> :  +- Project [id#0L AS asId#3L]
>> : +- Filter (id#0L = 4)
>> :+- Range 0, 1, 8, 10, [id#0L]
>>
>> but it doesn't. instead it shows:
>>
>> == Physical Plan ==
>> *Project [id#12L AS asId#15L]
>> +- *Filter (id#12L = 4)
>>   +- *Range (0, 10, step=1, splits=Some(4))
>>
>> so i am again missing the WholeStageCodegen. any idea why?
>>
>> i create spark session for unit tests simply as:
>> val session = SparkSession.builder
>>  .master("local[*]")
>>  .appName("test")
>>  .config("spark.sql.shuffle.partitions", 4)
>>  .getOrCreate()
>>
>>
>>


Re: how do i force unit test to do whole stage codegen

2017-04-05 Thread Jacek Laskowski
Hi,

I'm very sorry for not being up to date with the current style (and
"promoting" the old style); I am going to review that part soon. I'm very
close to touching it again since I'm working on the Optimizer these days.

Jacek

On 5 Apr 2017 6:08 a.m., "Kazuaki Ishizaki" <ishiz...@jp.ibm.com> wrote:

> Hi,
> The page in the URL explains the old style of physical plan output.
> The current style adds "*" as a prefix of each operation that
> whole-stage codegen can be applied to.
>
> So, in your test case, whole-stage codegen has already been enabled!
>
> FYI. I think that it is a good topic for d...@spark.apache.org.
>
> Kazuaki Ishizaki
>
>
>
> From:Koert Kuipers <ko...@tresata.com>
> To:"user@spark.apache.org" <user@spark.apache.org>
> Date:2017/04/05 05:12
> Subject:how do i force unit test to do whole stage codegen
> --
>
>
>
> i wrote my own expression with eval and doGenCode, but doGenCode never
> gets called in tests.
>
> also as a test i ran this in a unit test:
> spark.range(10).select('id as 'asId).where('id === 4).explain
> according to
>
> *https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-sql-whole-stage-codegen.html*
> <https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-sql-whole-stage-codegen.html>
> this is supposed to show:
> == Physical Plan ==
> WholeStageCodegen
> :  +- Project [id#0L AS asId#3L]
> : +- Filter (id#0L = 4)
> :+- Range 0, 1, 8, 10, [id#0L]
>
> but it doesn't. instead it shows:
>
> == Physical Plan ==
> *Project [id#12L AS asId#15L]
> +- *Filter (id#12L = 4)
>   +- *Range (0, 10, step=1, splits=Some(4))
>
> so i am again missing the WholeStageCodegen. any idea why?
>
> i create spark session for unit tests simply as:
> val session = SparkSession.builder
>  .master("local[*]")
>  .appName("test")
>  .config("spark.sql.shuffle.partitions", 4)
>  .getOrCreate()
>
>
>


Re: how do i force unit test to do whole stage codegen

2017-04-04 Thread Koert Kuipers
got it. that's good to know. thanks!

On Wed, Apr 5, 2017 at 12:07 AM, Kazuaki Ishizaki <ishiz...@jp.ibm.com>
wrote:

> Hi,
> The page in the URL explains the old style of physical plan output.
> The current style adds "*" as a prefix of each operation that
> whole-stage codegen can be applied to.
>
> So, in your test case, whole-stage codegen has already been enabled!
>
> FYI. I think that it is a good topic for d...@spark.apache.org.
>
> Kazuaki Ishizaki
>
>
>
> From:Koert Kuipers <ko...@tresata.com>
> To:"user@spark.apache.org" <user@spark.apache.org>
> Date:2017/04/05 05:12
> Subject:how do i force unit test to do whole stage codegen
> --
>
>
>
> i wrote my own expression with eval and doGenCode, but doGenCode never
> gets called in tests.
>
> also as a test i ran this in a unit test:
> spark.range(10).select('id as 'asId).where('id === 4).explain
> according to
>
> *https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-sql-whole-stage-codegen.html*
> <https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-sql-whole-stage-codegen.html>
> this is supposed to show:
> == Physical Plan ==
> WholeStageCodegen
> :  +- Project [id#0L AS asId#3L]
> : +- Filter (id#0L = 4)
> :+- Range 0, 1, 8, 10, [id#0L]
>
> but it doesn't. instead it shows:
>
> == Physical Plan ==
> *Project [id#12L AS asId#15L]
> +- *Filter (id#12L = 4)
>   +- *Range (0, 10, step=1, splits=Some(4))
>
> so i am again missing the WholeStageCodegen. any idea why?
>
> i create spark session for unit tests simply as:
> val session = SparkSession.builder
>  .master("local[*]")
>  .appName("test")
>  .config("spark.sql.shuffle.partitions", 4)
>  .getOrCreate()
>
>
>


Re: how do i force unit test to do whole stage codegen

2017-04-04 Thread Kazuaki Ishizaki
Hi,
The page in the URL explains the old style of physical plan output.
The current style adds "*" as a prefix of each operation that
whole-stage codegen can be applied to.

So, in your test case, whole-stage codegen has already been enabled!
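
For completeness, a small sketch (mine, not from the original mail) of
inspecting the generated code directly rather than reading the plan prefix:

// A hedged sketch: print the Java code produced by whole-stage codegen.
import org.apache.spark.sql.execution.debug._
import spark.implicits._

spark.range(10).select($"id".as("asId")).where($"id" === 4).debugCodegen()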

FYI. I think that it is a good topic for d...@spark.apache.org.

Kazuaki Ishizaki



From:   Koert Kuipers <ko...@tresata.com>
To: "user@spark.apache.org" <user@spark.apache.org>
Date:   2017/04/05 05:12
Subject:how do i force unit test to do whole stage codegen



i wrote my own expression with eval and doGenCode, but doGenCode never 
gets called in tests.

also as a test i ran this in a unit test:
spark.range(10).select('id as 'asId).where('id === 4).explain
according to
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-sql-whole-stage-codegen.html
this is supposed to show:
== Physical Plan ==
WholeStageCodegen
:  +- Project [id#0L AS asId#3L]
: +- Filter (id#0L = 4)
:+- Range 0, 1, 8, 10, [id#0L]

but it doesn't. instead it shows:

== Physical Plan ==
*Project [id#12L AS asId#15L]
+- *Filter (id#12L = 4)
   +- *Range (0, 10, step=1, splits=Some(4))

so i am again missing the WholeStageCodegen. any idea why?

i create spark session for unit tests simply as:
val session = SparkSession.builder
  .master("local[*]")
  .appName("test")
  .config("spark.sql.shuffle.partitions", 4)
  .getOrCreate()





how do i force unit test to do whole stage codegen

2017-04-04 Thread Koert Kuipers
i wrote my own expression with eval and doGenCode, but doGenCode never gets
called in tests.

also as a test i ran this in a unit test:
spark.range(10).select('id as 'asId).where('id === 4).explain
according to
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-sql-whole-stage-codegen.html
this is supposed to show:

== Physical Plan ==WholeStageCodegen
:  +- Project [id#0L AS asId#3L]
: +- Filter (id#0L = 4)
:+- Range 0, 1, 8, 10, [id#0L]

but it doesn't. instead it shows:

== Physical Plan ==
*Project [id#12L AS asId#15L]
+- *Filter (id#12L = 4)
   +- *Range (0, 10, step=1, splits=Some(4))

so i am again missing the WholeStageCodegen. any idea why?

i create spark session for unit tests simply as:
val session = SparkSession.builder
  .master("local[*]")
  .appName("test")
  .config("spark.sql.shuffle.partitions", 4)
  .getOrCreate()


Re: How to unit test spark streaming?

2017-03-07 Thread kant kodali
Agreed with the statement quoted below: whether one wants to do unit
tests or not, it is a good practice to write code that way. But I think the
more painful and tedious task is to mock/emulate all the nodes, such as
Spark workers/master/HDFS/the input source stream and all that. I wish there were
something really simple. Perhaps the simplest thing to do is just to do
integration tests, which also test the transformations/business logic. That
way I can spawn a small cluster, run my tests, and bring the cluster down
when I am done. And sure, if the cluster isn't available then I can't run
the tests; however, some node should be available even to run a single
process. I somehow feel like we may be doing too much work to fit into the
archaic definition of unit tests.

 "Basically you abstract your transformations to take in a dataframe and
return one, then you assert on the returned df"

On Tue, Mar 7, 2017 at 11:14 AM, Michael Armbrust 
wrote:

> Basically you abstract your transformations to take in a dataframe and
>> return one, then you assert on the returned df
>>
>
> +1 to this suggestion.  This is why we wanted streaming and batch
> dataframes to share the same API.
>


Re: How to unit test spark streaming?

2017-03-07 Thread Michael Armbrust
>
> Basically you abstract your transformations to take in a dataframe and
> return one, then you assert on the returned df
>

+1 to this suggestion.  This is why we wanted streaming and batch
dataframes to share the same API.
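
To make the endorsed pattern concrete, a sketch with hypothetical names
(withGreeting and the column names are mine, not from the thread):

// A minimal sketch of the "test the transformation, not the stream" pattern.
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.lit

// A hypothetical transformation under test; the real one would hold your
// streaming/batch business logic.
object Transforms {
  def withGreeting(df: DataFrame): DataFrame = df.withColumn("greeting", lit("hello"))
}

// The test: build a small input DataFrame, apply the transformation, and assert
// on the returned DataFrame. No streaming source or cluster is needed.
val spark = SparkSession.builder().master("local[1]").appName("transform-test").getOrCreate()
import spark.implicits._

val input  = Seq("alice", "bob").toDF("name")
val result = Transforms.withGreeting(input)
assert(result.columns.contains("greeting"))
assert(result.filter($"greeting" === "hello").count() == 2)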


Re: How to unit test spark streaming?

2017-03-07 Thread Jörn Franke
This depends on your target setup! For the integration tests of my open source
libraries (kept in a dedicated folder alongside the unit tests) I run a local
Spark master, but I also use a MiniDFS cluster (to simulate HDFS on a node) and
sometimes a MiniYARN cluster (see
https://wiki.apache.org/hadoop/HowToDevelopUnitTests).

An example can be found here:
https://github.com/ZuInnoTe/hadoopcryptoledger/tree/master/examples/spark-bitcoinblock

or, if you need Scala:
https://github.com/ZuInnoTe/hadoopcryptoledger/tree/master/examples/scala-spark-bitcoinblock

In both cases the tests are in the integration-tests (Java) or it (Scala) folder.

For Spark Streaming I have no open source example at hand, but basically you need
to simulate the source; the rest is as above.

I will eventually write a blog post about this with more details.

> On 7 Mar 2017, at 13:04, kant kodali <kanth...@gmail.com> wrote:
> 
> Hi All,
> 
> How to unit test spark streaming or spark in general? How do I test the 
> results of my transformations? Also, more importantly don't we need to spawn 
> master and worker JVM's either in one or multiple nodes?
> 
> Thanks!
> kant

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: How to unit test spark streaming?

2017-03-07 Thread Sam Elamin
Hey Kant,

You can use Holden's spark-testing-base.

Have a look at some of the specs I wrote here to give you an idea:

https://github.com/samelamin/spark-bigquery/blob/master/src/test/scala/com/samelamin/spark/bigquery/BigQuerySchemaSpecs.scala

Basically you abstract your transformations to take in a dataframe and
return one, then you assert on the returned df

Regards
Sam
On Tue, 7 Mar 2017 at 12:05, kant kodali <kanth...@gmail.com> wrote:

> Hi All,
>
> How to unit test spark streaming or spark in general? How do I test the
> results of my transformations? Also, more importantly don't we need to
> spawn master and worker JVM's either in one or multiple nodes?
>
> Thanks!
> kant
>


How to unit test spark streaming?

2017-03-07 Thread kant kodali
Hi All,

How to unit test Spark Streaming, or Spark in general? How do I test the
results of my transformations? Also, more importantly, don't we need to
spawn master and worker JVMs, either on one node or multiple nodes?

Thanks!
kant


Error in run multiple unit test that extends DataFrameSuiteBase

2016-09-23 Thread Jinyuan Zhou
After I created two test cases that extend FlatSpec with DataFrameSuiteBase, I
got errors when running sbt test. I was able to run each of them separately. My
test cases do use sqlContext to read files. Here is the exception stack.
Judging from the exception, I may need to unregister the RpcEndpoint after each
test run.
[info] Exception encountered when attempting to run a suite with class name:
 MyTestSuit *** ABORTED ***
[info]   java.lang.IllegalArgumentException: There is already an
RpcEndpoint called LocalSchedulerBackendEndpoint
[info]   at
org.apache.spark.rpc.netty.Dispatcher.registerRpcEndpoint(Dispatcher.scala:66)
[info]   at
org.apache.spark.rpc.netty.NettyRpcEnv.setupEndpoint(NettyRpcEnv.scala:129)
[info]   at
org.apache.spark.scheduler.local.LocalSchedulerBackend.start(LocalSchedulerBackend.scala:127)
[info]   at
org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:149)
[info]   at org.apache.spark.SparkContext.<init>(SparkContext.scala:500)
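
For what it's worth, the usual remedies (an assumption on my part, though the
sqlContext thread later in this archive points the same way) are to make sure
each suite stops its SparkContext and to keep sbt from running suites in
parallel, so only one context is registered per JVM at a time. A minimal sketch:

// build.sbt -- run test suites serially in the test JVM
parallelExecution in Test := false

// In each suite that creates its own SparkContext:
override def afterAll(): Unit = {
  if (sc != null) sc.stop()
  super.afterAll()
}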


RE: How this unit test passed on master trunk?

2016-04-24 Thread Yong Zhang
So in that case the result will be the following:
[1,[1,1]] [3,[3,1]] [2,[2,1]]

Thanks for explaining the meaning of it. But the question is: how can first()
return [3,[1,1]]? In fact, if there were any ordering in the final result, it
would be [1,[1,1]] instead of [3,[1,1]], correct?

Yong
Subject: Re: How this unit test passed on master trunk?
From: zzh...@hortonworks.com
To: java8...@hotmail.com; gatorsm...@gmail.com
CC: user@spark.apache.org
Date: Sun, 24 Apr 2016 04:37:11 +






There are multiple records for the DF




scala> structDF.groupBy($"a").agg(min(struct($"record.*"))).show
+---+-----------------------------+
|  a|min(struct(unresolvedstar()))|
+---+-----------------------------+
|  1|                        [1,1]|
|  3|                        [3,1]|
|  2|                        [2,1]|
+---+-----------------------------+



The meaning of .groupBy($"a").agg(min(struct($"record.*"))) is to get the min
for all the records with the same $"a".

For example, for TestData2(1,1) :: TestData2(1,2), the result would be (1, (1, 1)),
since struct(1, 1) is less than struct(1, 2). Please check how the ordering is
implemented in InterpretedOrdering.

The output itself does not have any ordering. I am not sure why the unit test
and the real environment behave differently.



Xiao,



I do see the difference between unit test and local cluster run. Do you know 
the reason?



Thanks.



Zhan Zhang









 

On Apr 22, 2016, at 11:23 AM, Yong Zhang <java8...@hotmail.com> wrote:



Hi,



I was trying to find out why this unit test can pass in Spark code.



in
https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala



for this unit test:

  test("Star Expansion - CreateStruct and CreateArray") {
val structDf = testData2.select("a", "b").as("record")
// CreateStruct and CreateArray in aggregateExpressions
assert(structDf.groupBy($"a").agg(min(struct($"record.*"))).first() == 
Row(3, Row(3, 1)))
assert(structDf.groupBy($"a").agg(min(array($"record.*"))).first() == 
Row(3, Seq(3, 1)))

// CreateStruct and CreateArray in project list (unresolved alias)
assert(structDf.select(struct($"record.*")).first() == Row(Row(1, 1)))
assert(structDf.select(array($"record.*")).first().getAs[Seq[Int]](0) === 
Seq(1, 1))

// CreateStruct and CreateArray in project list (alias)
assert(structDf.select(struct($"record.*").as("a")).first() == Row(Row(1, 
1)))

assert(structDf.select(array($"record.*").as("a")).first().getAs[Seq[Int]](0) 
=== Seq(1, 1))
  }
From my understanding, the data returned in this case should be Row(1, Row(1,
1)), as that will be the min of the struct.
In fact, if I run the spark-shell on my laptop, I get the result I expected:


./bin/spark-shell
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.0-SNAPSHOT
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_91)
Type in expressions to have them evaluated.
Type :help for more information.

scala> case class TestData2(a: Int, b: Int)
defined class TestData2
scala> val testData2DF = sqlContext.sparkContext.parallelize(TestData2(1,1) :: 
TestData2(1,2) :: TestData2(2,1) :: TestData2(2,2) :: TestData2(3,1) :: 
TestData2(3,2) :: Nil, 2).toDF()
scala> val structDF = testData2DF.select("a","b").as("record")
scala> structDF.groupBy($"a").agg(min(struct($"record.*"))).first()
res0: org.apache.spark.sql.Row = [1,[1,1]]

scala> structDF.show
+---+---+
|  a|  b|
+---+---+
|  1|  1|
|  1|  2|
|  2|  1|
|  2|  2|
|  3|  1|
|  3|  2|
+---+---+
So from my Spark, which I built on master, I cannot get Row[3,[1,1]] back
in this case. Why does the unit test assert that Row[3,[1,1]] should be first,
and why does it pass? Why can't I reproduce that in my spark-shell? I am trying to
understand how to interpret the meaning of "agg(min(struct($"record.*")))".


Thanks
Yong 







  

Re: How this unit test passed on master trunk?

2016-04-23 Thread Zhan Zhang
There are multiple records for the DF

scala> structDF.groupBy($"a").agg(min(struct($"record.*"))).show
+---+-----------------------------+
|  a|min(struct(unresolvedstar()))|
+---+-----------------------------+
|  1|                        [1,1]|
|  3|                        [3,1]|
|  2|                        [2,1]|
+---+-----------------------------+

The meaning of .groupBy($"a").agg(min(struct($"record.*"))) is to get the min
for all the records with the same $"a".

For example, for TestData2(1,1) :: TestData2(1,2), the result would be (1, (1, 1)),
since struct(1, 1) is less than struct(1, 2). Please check how the ordering is
implemented in InterpretedOrdering.
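
To make that ordering concrete, a small sketch (mine, not from the mail); struct
values compare field by field, left to right:

// A hedged sketch: min(struct(a, b)) keeps the row with the smallest a,
// with ties broken by b, because struct comparison is lexicographic.
import spark.implicits._
import org.apache.spark.sql.functions.{min, struct}

val df = Seq((1, 2), (1, 1)).toDF("a", "b")
df.agg(min(struct($"a", $"b"))).show()   // [1,1], since (1,1) < (1,2)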

The output itself does not have any ordering. I am not sure why the unit test
and the real environment behave differently.

Xiao,

I do see the difference between unit test and local cluster run. Do you know 
the reason?

Thanks.

Zhan Zhang




On Apr 22, 2016, at 11:23 AM, Yong Zhang <java8...@hotmail.com> wrote:

Hi,

I was trying to find out why this unit test can pass in Spark code.

in
https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala

for this unit test:

  test("Star Expansion - CreateStruct and CreateArray") {
val structDf = testData2.select("a", "b").as("record")
// CreateStruct and CreateArray in aggregateExpressions
assert(structDf.groupBy($"a").agg(min(struct($"record.*"))).first() == 
Row(3, Row(3, 1)))
assert(structDf.groupBy($"a").agg(min(array($"record.*"))).first() == 
Row(3, Seq(3, 1)))

// CreateStruct and CreateArray in project list (unresolved alias)
assert(structDf.select(struct($"record.*")).first() == Row(Row(1, 1)))
assert(structDf.select(array($"record.*")).first().getAs[Seq[Int]](0) === 
Seq(1, 1))

// CreateStruct and CreateArray in project list (alias)
assert(structDf.select(struct($"record.*").as("a")).first() == Row(Row(1, 
1)))

assert(structDf.select(array($"record.*").as("a")).first().getAs[Seq[Int]](0) 
=== Seq(1, 1))
  }

From my understanding, the data returned in this case should be Row(1, Row(1,
1)), as that will be the min of the struct.

In fact, if I run the spark-shell on my laptop, and I got the result I expected:


./bin/spark-shell
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.0-SNAPSHOT
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_91)
Type in expressions to have them evaluated.
Type :help for more information.

scala> case class TestData2(a: Int, b: Int)
defined class TestData2

scala> val testData2DF = sqlContext.sparkContext.parallelize(TestData2(1,1) :: 
TestData2(1,2) :: TestData2(2,1) :: TestData2(2,2) :: TestData2(3,1) :: 
TestData2(3,2) :: Nil, 2).toDF()

scala> val structDF = testData2DF.select("a","b").as("record")

scala> structDF.groupBy($"a").agg(min(struct($"record.*"))).first()
res0: org.apache.spark.sql.Row = [1,[1,1]]

scala> structDF.show
+---+---+
|  a|  b|
+---+---+
|  1|  1|
|  1|  2|
|  2|  1|
|  2|  2|
|  3|  1|
|  3|  2|
+---+---+

So from my Spark, which I built on master, I cannot get Row[3,[1,1]] back
in this case. Why does the unit test assert that Row[3,[1,1]] should be first,
and why does it pass? Why can't I reproduce that in my spark-shell? I am trying to
understand how to interpret the meaning of "agg(min(struct($"record.*")))".


Thanks

Yong



Re: How this unit test passed on master trunk?

2016-04-22 Thread Ted Yu
This was added by Xiao through:

[SPARK-13320][SQL] Support Star in CreateStruct/CreateArray and Error
Handling when DataFrame/DataSet Functions using Star

I tried in spark-shell and got:

scala> val first =
structDf.groupBy($"a").agg(min(struct($"record.*"))).first()
first: org.apache.spark.sql.Row = [1,[1,1]]

BTW
https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.7/715/consoleFull
shows this test passing.

On Fri, Apr 22, 2016 at 11:23 AM, Yong Zhang <java8...@hotmail.com> wrote:

> Hi,
>
> I was trying to find out why this unit test can pass in Spark code.
>
> in
>
> https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
>
> for this unit test:
>
>   test("Star Expansion - CreateStruct and CreateArray") {
> val structDf = testData2.select("a", "b").as("record")
> // CreateStruct and CreateArray in aggregateExpressions
> assert(structDf.groupBy($"a").agg(min(struct($"record.*"))).first() ==
> Row(3, Row(3, 1)))
> assert(structDf.groupBy($"a").agg(min(array($"record.*"))).first() == 
> Row(3, Seq(3, 1)))
>
> // CreateStruct and CreateArray in project list (unresolved alias)
> assert(structDf.select(struct($"record.*")).first() == Row(Row(1, 1)))
> assert(structDf.select(array($"record.*")).first().getAs[Seq[Int]](0) === 
> Seq(1, 1))
>
> // CreateStruct and CreateArray in project list (alias)
> assert(structDf.select(struct($"record.*").as("a")).first() == Row(Row(1, 
> 1)))
> 
> assert(structDf.select(array($"record.*").as("a")).first().getAs[Seq[Int]](0) 
> === Seq(1, 1))
>   }
>
> From my understanding, the data returned in this case should be Row(1, Row(1,
> 1)), as that will be the min of the struct.
>
> In fact, if I run the spark-shell on my laptop, and I got the result I 
> expected:
>
>
> ./bin/spark-shell
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 2.0.0-SNAPSHOT
>       /_/
>
> Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_91)
> Type in expressions to have them evaluated.
> Type :help for more information.
>
> scala> case class TestData2(a: Int, b: Int)
> defined class TestData2
>
> scala> val testData2DF = sqlContext.sparkContext.parallelize(TestData2(1,1) 
> :: TestData2(1,2) :: TestData2(2,1) :: TestData2(2,2) :: TestData2(3,1) :: 
> TestData2(3,2) :: Nil, 2).toDF()
>
> scala> val structDF = testData2DF.select("a","b").as("record")
>
> scala> structDF.groupBy($"a").agg(min(struct($"record.*"))).first()
> res0: org.apache.spark.sql.Row = [1,[1,1]]
>
> scala> structDF.show
> +---+---+
> |  a|  b|
> +---+---+
> |  1|  1|
> |  1|  2|
> |  2|  1|
> |  2|  2|
> |  3|  1|
> |  3|  2|
> +---+---+
>
> So from my Spark, which I built on master, I cannot get Row[3,[1,1]] back
> in this case. Why does the unit test assert that Row[3,[1,1]] should be
> first, and why does it pass? Why can't I reproduce that in my spark-shell? I am
> trying to understand how to interpret the meaning of
> "agg(min(struct($"record.*")))".
>
>
> Thanks
>
> Yong
>
>


How this unit test passed on master trunk?

2016-04-22 Thread Yong Zhang
Hi,

I was trying to find out why this unit test can pass in Spark code.

in
https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala

for this unit test:

  test("Star Expansion - CreateStruct and CreateArray") {
    val structDf = testData2.select("a", "b").as("record")
    // CreateStruct and CreateArray in aggregateExpressions
    assert(structDf.groupBy($"a").agg(min(struct($"record.*"))).first() ==
      Row(3, Row(3, 1)))
    assert(structDf.groupBy($"a").agg(min(array($"record.*"))).first() ==
      Row(3, Seq(3, 1)))

    // CreateStruct and CreateArray in project list (unresolved alias)
    assert(structDf.select(struct($"record.*")).first() == Row(Row(1, 1)))
    assert(structDf.select(array($"record.*")).first().getAs[Seq[Int]](0) ===
      Seq(1, 1))

    // CreateStruct and CreateArray in project list (alias)
    assert(structDf.select(struct($"record.*").as("a")).first() ==
      Row(Row(1, 1)))
    assert(structDf.select(array($"record.*").as("a")).first().getAs[Seq[Int]](0)
      === Seq(1, 1))
  }

From my understanding, the data returned in this case should be Row(1, Row(1, 1)),
as that will be the min of the struct. In fact, if I run the spark-shell on my
laptop, I get the result I expected:

./bin/spark-shell
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.0-SNAPSHOT
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_91)
Type in expressions to have them evaluated.
Type :help for more information.

scala> case class TestData2(a: Int, b: Int)
defined class TestData2

scala> val testData2DF = sqlContext.sparkContext.parallelize(TestData2(1,1) ::
TestData2(1,2) :: TestData2(2,1) :: TestData2(2,2) :: TestData2(3,1) ::
TestData2(3,2) :: Nil, 2).toDF()

scala> val structDF = testData2DF.select("a","b").as("record")

scala> structDF.groupBy($"a").agg(min(struct($"record.*"))).first()
res0: org.apache.spark.sql.Row = [1,[1,1]]

scala> structDF.show
+---+---+
|  a|  b|
+---+---+
|  1|  1|
|  1|  2|
|  2|  1|
|  2|  2|
|  3|  1|
|  3|  2|
+---+---+

So from my Spark, which I built on master, I cannot get Row[3,[1,1]] back in
this case. Why does the unit test assert that Row[3,[1,1]] should be first, and
why does it pass? Why can't I reproduce that in my spark-shell? I am trying to
understand how to interpret the meaning of "agg(min(struct($"record.*")))".

Thanks
Yong

Re: Unit test with sqlContext

2016-03-19 Thread Vikas Kawadia
If you prefer  the py.test framework, I just wrote a blog post with some
examples:

Unit testing Apache Spark with py.test
https://engblog.nextdoor.com/unit-testing-apache-spark-with-py-test-3b8970dc013b

On Fri, Feb 5, 2016 at 11:43 AM, Steve Annessa <steve.anne...@gmail.com>
wrote:

> Thanks for all of the responses.
>
> I do have an afterAll that stops the sc.
>
> While looking over Holden's readme I noticed she mentioned "Make sure to
> disable parallel execution." That was what I was missing; I added the
> following to my build.sbt:
>
> ```
> parallelExecution in Test := false
> ```
>
> Now all of my tests are running.
>
> I'm going to look into using the package she created.
>
> Thanks again,
>
> -- Steve
>
>
> On Thu, Feb 4, 2016 at 8:50 PM, Rishi Mishra <rmis...@snappydata.io>
> wrote:
>
>> Hi Steve,
>> Have you cleaned up your SparkContext ( sc.stop())  , in a afterAll().
>> The error suggests you are creating more than one SparkContext.
>>
>>
>> On Fri, Feb 5, 2016 at 10:04 AM, Holden Karau <hol...@pigscanfly.ca>
>> wrote:
>>
>>> Thanks for recommending spark-testing-base :) Just wanted to add if
>>> anyone has feature requests for Spark testing please get in touch (or add
>>> an issue on the github) :)
>>>
>>>
>>> On Thu, Feb 4, 2016 at 8:25 PM, Silvio Fiorito <
>>> silvio.fior...@granturing.com> wrote:
>>>
>>>> Hi Steve,
>>>>
>>>> Have you looked at the spark-testing-base package by Holden? It’s
>>>> really useful for unit testing Spark apps as it handles all the
>>>> bootstrapping for you.
>>>>
>>>> https://github.com/holdenk/spark-testing-base
>>>>
>>>> DataFrame examples are here:
>>>> https://github.com/holdenk/spark-testing-base/blob/master/src/test/1.3/scala/com/holdenkarau/spark/testing/SampleDataFrameTest.scala
>>>>
>>>> Thanks,
>>>> Silvio
>>>>
>>>> From: Steve Annessa <steve.anne...@gmail.com>
>>>> Date: Thursday, February 4, 2016 at 8:36 PM
>>>> To: "user@spark.apache.org" <user@spark.apache.org>
>>>> Subject: Unit test with sqlContext
>>>>
>>>> I'm trying to unit test a function that reads in a JSON file,
>>>> manipulates the DF and then returns a Scala Map.
>>>>
>>>> The function has signature:
>>>> def ingest(dataLocation: String, sc: SparkContext, sqlContext:
>>>> SQLContext)
>>>>
>>>> I've created a bootstrap spec for spark jobs that instantiates the
>>>> Spark Context and SQLContext like so:
>>>>
>>>> @transient var sc: SparkContext = _
>>>> @transient var sqlContext: SQLContext = _
>>>>
>>>> override def beforeAll = {
>>>>   System.clearProperty("spark.driver.port")
>>>>   System.clearProperty("spark.hostPort")
>>>>
>>>>   val conf = new SparkConf()
>>>> .setMaster(master)
>>>> .setAppName(appName)
>>>>
>>>>   sc = new SparkContext(conf)
>>>>   sqlContext = new SQLContext(sc)
>>>> }
>>>>
>>>> When I do not include sqlContext, my tests run. Once I add the
>>>> sqlContext I get the following errors:
>>>>
>>>> 16/02/04 17:31:58 WARN SparkContext: Another SparkContext is being
>>>> constructed (or threw an exception in its constructor).  This may indicate
>>>> an error, since only one SparkContext may be running in this JVM (see
>>>> SPARK-2243). The other SparkContext was created at:
>>>> org.apache.spark.SparkContext.<init>(SparkContext.scala:81)
>>>>
>>>> 16/02/04 17:31:59 ERROR SparkContext: Error initializing SparkContext.
>>>> akka.actor.InvalidActorNameException: actor name [ExecutorEndpoint] is
>>>> not unique!
>>>>
>>>> and finally:
>>>>
>>>> [info] IngestSpec:
>>>> [info] Exception encountered when attempting to run a suite with class
>>>> name: com.company.package.IngestSpec *** ABORTED ***
>>>> [info]   akka.actor.InvalidActorNameException: actor name
>>>> [ExecutorEndpoint] is not unique!
>>>>
>>>>
>>>> What do I need to do to get a sqlContext through my tests?
>>>>
>>>> Thanks,
>>>>
>>>> -- Steve
>>>>
>>>
>>>
>>>
>>> --
>>> Cell : 425-233-8271
>>> Twitter: https://twitter.com/holdenkarau
>>>
>>
>>
>>
>> --
>> Regards,
>> Rishitesh Mishra,
>> SnappyData . (http://www.snappydata.io/)
>>
>> https://in.linkedin.com/in/rishiteshmishra
>>
>
>


Re: Unit test with sqlContext

2016-02-05 Thread Steve Annessa
Thanks for all of the responses.

I do have an afterAll that stops the sc.

While looking over Holden's readme I noticed she mentioned "Make sure to
disable parallel execution." That was what I was missing; I added the
following to my build.sbt:

```
parallelExecution in Test := false
```

Now all of my tests are running.

I'm going to look into using the package she created.

Thanks again,

-- Steve


On Thu, Feb 4, 2016 at 8:50 PM, Rishi Mishra <rmis...@snappydata.io> wrote:

> Hi Steve,
> Have you cleaned up your SparkContext ( sc.stop())  , in a afterAll(). The
> error suggests you are creating more than one SparkContext.
>
>
> On Fri, Feb 5, 2016 at 10:04 AM, Holden Karau <hol...@pigscanfly.ca>
> wrote:
>
>> Thanks for recommending spark-testing-base :) Just wanted to add if
>> anyone has feature requests for Spark testing please get in touch (or add
>> an issue on the github) :)
>>
>>
>> On Thu, Feb 4, 2016 at 8:25 PM, Silvio Fiorito <
>> silvio.fior...@granturing.com> wrote:
>>
>>> Hi Steve,
>>>
>>> Have you looked at the spark-testing-base package by Holden? It’s really
>>> useful for unit testing Spark apps as it handles all the bootstrapping for
>>> you.
>>>
>>> https://github.com/holdenk/spark-testing-base
>>>
>>> DataFrame examples are here:
>>> https://github.com/holdenk/spark-testing-base/blob/master/src/test/1.3/scala/com/holdenkarau/spark/testing/SampleDataFrameTest.scala
>>>
>>> Thanks,
>>> Silvio
>>>
>>> From: Steve Annessa <steve.anne...@gmail.com>
>>> Date: Thursday, February 4, 2016 at 8:36 PM
>>> To: "user@spark.apache.org" <user@spark.apache.org>
>>> Subject: Unit test with sqlContext
>>>
>>> I'm trying to unit test a function that reads in a JSON file,
>>> manipulates the DF and then returns a Scala Map.
>>>
>>> The function has signature:
>>> def ingest(dataLocation: String, sc: SparkContext, sqlContext:
>>> SQLContext)
>>>
>>> I've created a bootstrap spec for spark jobs that instantiates the Spark
>>> Context and SQLContext like so:
>>>
>>> @transient var sc: SparkContext = _
>>> @transient var sqlContext: SQLContext = _
>>>
>>> override def beforeAll = {
>>>   System.clearProperty("spark.driver.port")
>>>   System.clearProperty("spark.hostPort")
>>>
>>>   val conf = new SparkConf()
>>> .setMaster(master)
>>> .setAppName(appName)
>>>
>>>   sc = new SparkContext(conf)
>>>   sqlContext = new SQLContext(sc)
>>> }
>>>
>>> When I do not include sqlContext, my tests run. Once I add the
>>> sqlContext I get the following errors:
>>>
>>> 16/02/04 17:31:58 WARN SparkContext: Another SparkContext is being
>>> constructed (or threw an exception in its constructor).  This may indicate
>>> an error, since only one SparkContext may be running in this JVM (see
>>> SPARK-2243). The other SparkContext was created at:
>>> org.apache.spark.SparkContext.<init>(SparkContext.scala:81)
>>>
>>> 16/02/04 17:31:59 ERROR SparkContext: Error initializing SparkContext.
>>> akka.actor.InvalidActorNameException: actor name [ExecutorEndpoint] is
>>> not unique!
>>>
>>> and finally:
>>>
>>> [info] IngestSpec:
>>> [info] Exception encountered when attempting to run a suite with class
>>> name: com.company.package.IngestSpec *** ABORTED ***
>>> [info]   akka.actor.InvalidActorNameException: actor name
>>> [ExecutorEndpoint] is not unique!
>>>
>>>
>>> What do I need to do to get a sqlContext through my tests?
>>>
>>> Thanks,
>>>
>>> -- Steve
>>>
>>
>>
>>
>> --
>> Cell : 425-233-8271
>> Twitter: https://twitter.com/holdenkarau
>>
>
>
>
> --
> Regards,
> Rishitesh Mishra,
> SnappyData . (http://www.snappydata.io/)
>
> https://in.linkedin.com/in/rishiteshmishra
>


Unit test with sqlContext

2016-02-04 Thread Steve Annessa
I'm trying to unit test a function that reads in a JSON file, manipulates
the DF and then returns a Scala Map.

The function has signature:
def ingest(dataLocation: String, sc: SparkContext, sqlContext: SQLContext)

I've created a bootstrap spec for spark jobs that instantiates the Spark
Context and SQLContext like so:

@transient var sc: SparkContext = _
@transient var sqlContext: SQLContext = _

override def beforeAll = {
  System.clearProperty("spark.driver.port")
  System.clearProperty("spark.hostPort")

  val conf = new SparkConf()
.setMaster(master)
.setAppName(appName)

  sc = new SparkContext(conf)
  sqlContext = new SQLContext(sc)
}
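
(A sketch of the matching teardown, not part of the original mail; the replies
below point toward something along these lines so each suite releases its
context before the next one starts:)

override def afterAll(): Unit = {
  if (sc != null) sc.stop()
  System.clearProperty("spark.driver.port")
  System.clearProperty("spark.hostPort")
}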

When I do not include sqlContext, my tests run. Once I add the sqlContext I
get the following errors:

16/02/04 17:31:58 WARN SparkContext: Another SparkContext is being
constructed (or threw an exception in its constructor).  This may indicate
an error, since only one SparkContext may be running in this JVM (see
SPARK-2243). The other SparkContext was created at:
org.apache.spark.SparkContext.<init>(SparkContext.scala:81)

16/02/04 17:31:59 ERROR SparkContext: Error initializing SparkContext.
akka.actor.InvalidActorNameException: actor name [ExecutorEndpoint] is not
unique!

and finally:

[info] IngestSpec:
[info] Exception encountered when attempting to run a suite with class
name: com.company.package.IngestSpec *** ABORTED ***
[info]   akka.actor.InvalidActorNameException: actor name
[ExecutorEndpoint] is not unique!


What do I need to do to get a sqlContext through my tests?

Thanks,

-- Steve


Re: Unit test with sqlContext

2016-02-04 Thread Silvio Fiorito
Hi Steve,

Have you looked at the spark-testing-base package by Holden? It’s really useful 
for unit testing Spark apps as it handles all the bootstrapping for you.

https://github.com/holdenk/spark-testing-base

DataFrame examples are here: 
https://github.com/holdenk/spark-testing-base/blob/master/src/test/1.3/scala/com/holdenkarau/spark/testing/SampleDataFrameTest.scala
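
As a rough illustration of what those examples look like (my sketch, with
approximate names; check the project's README for the exact API of your
version):

// A hedged sketch of the spark-testing-base pattern.
import com.holdenkarau.spark.testing.DataFrameSuiteBase
import org.scalatest.FunSuite

class IngestSpec extends FunSuite with DataFrameSuiteBase {
  test("two equal dataframes compare equal") {
    import spark.implicits._
    val expected = Seq(("a", 1)).toDF("key", "value")
    val actual   = Seq(("a", 1)).toDF("key", "value")
    assertDataFrameEquals(expected, actual)  // assertion provided by DataFrameSuiteBase
  }
}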

Thanks,
Silvio

From: Steve Annessa <steve.anne...@gmail.com>
Date: Thursday, February 4, 2016 at 8:36 PM
To: "user@spark.apache.org<mailto:user@spark.apache.org>" 
<user@spark.apache.org<mailto:user@spark.apache.org>>
Subject: Unit test with sqlContext

I'm trying to unit test a function that reads in a JSON file, manipulates the 
DF and then returns a Scala Map.

The function has signature:
def ingest(dataLocation: String, sc: SparkContext, sqlContext: SQLContext)

I've created a bootstrap spec for spark jobs that instantiates the Spark 
Context and SQLContext like so:

@transient var sc: SparkContext = _
@transient var sqlContext: SQLContext = _

override def beforeAll = {
  System.clearProperty("spark.driver.port")
  System.clearProperty("spark.hostPort")

  val conf = new SparkConf()
.setMaster(master)
.setAppName(appName)

  sc = new SparkContext(conf)
  sqlContext = new SQLContext(sc)
}

When I do not include sqlContext, my tests run. Once I add the sqlContext I get 
the following errors:

16/02/04 17:31:58 WARN SparkContext: Another SparkContext is being constructed 
(or threw an exception in its constructor).  This may indicate an error, since 
only one SparkContext may be running in this JVM (see SPARK-2243). The other 
SparkContext was created at:
org.apache.spark.SparkContext.<init>(SparkContext.scala:81)

16/02/04 17:31:59 ERROR SparkContext: Error initializing SparkContext.
akka.actor.InvalidActorNameException: actor name [ExecutorEndpoint] is not 
unique!

and finally:

[info] IngestSpec:
[info] Exception encountered when attempting to run a suite with class name: 
com.company.package.IngestSpec *** ABORTED ***
[info]   akka.actor.InvalidActorNameException: actor name [ExecutorEndpoint] is 
not unique!


What do I need to do to get a sqlContext through my tests?

Thanks,

-- Steve


Re: Unit test with sqlContext

2016-02-04 Thread Rishi Mishra
Hi Steve,
Have you cleaned up your SparkContext ( sc.stop())  , in a afterAll(). The
error suggests you are creating more than one SparkContext.


On Fri, Feb 5, 2016 at 10:04 AM, Holden Karau <hol...@pigscanfly.ca> wrote:

> Thanks for recommending spark-testing-base :) Just wanted to add if anyone
> has feature requests for Spark testing please get in touch (or add an issue
> on the github) :)
>
>
> On Thu, Feb 4, 2016 at 8:25 PM, Silvio Fiorito <
> silvio.fior...@granturing.com> wrote:
>
>> Hi Steve,
>>
>> Have you looked at the spark-testing-base package by Holden? It’s really
>> useful for unit testing Spark apps as it handles all the bootstrapping for
>> you.
>>
>> https://github.com/holdenk/spark-testing-base
>>
>> DataFrame examples are here:
>> https://github.com/holdenk/spark-testing-base/blob/master/src/test/1.3/scala/com/holdenkarau/spark/testing/SampleDataFrameTest.scala
>>
>> Thanks,
>> Silvio
>>
>> From: Steve Annessa <steve.anne...@gmail.com>
>> Date: Thursday, February 4, 2016 at 8:36 PM
>> To: "user@spark.apache.org" <user@spark.apache.org>
>> Subject: Unit test with sqlContext
>>
>> I'm trying to unit test a function that reads in a JSON file, manipulates
>> the DF and then returns a Scala Map.
>>
>> The function has signature:
>> def ingest(dataLocation: String, sc: SparkContext, sqlContext: SQLContext)
>>
>> I've created a bootstrap spec for spark jobs that instantiates the Spark
>> Context and SQLContext like so:
>>
>> @transient var sc: SparkContext = _
>> @transient var sqlContext: SQLContext = _
>>
>> override def beforeAll = {
>>   System.clearProperty("spark.driver.port")
>>   System.clearProperty("spark.hostPort")
>>
>>   val conf = new SparkConf()
>> .setMaster(master)
>> .setAppName(appName)
>>
>>   sc = new SparkContext(conf)
>>   sqlContext = new SQLContext(sc)
>> }
>>
>> When I do not include sqlContext, my tests run. Once I add the sqlContext
>> I get the following errors:
>>
>> 16/02/04 17:31:58 WARN SparkContext: Another SparkContext is being
>> constructed (or threw an exception in its constructor).  This may indicate
>> an error, since only one SparkContext may be running in this JVM (see
>> SPARK-2243). The other SparkContext was created at:
>> org.apache.spark.SparkContext.<init>(SparkContext.scala:81)
>>
>> 16/02/04 17:31:59 ERROR SparkContext: Error initializing SparkContext.
>> akka.actor.InvalidActorNameException: actor name [ExecutorEndpoint] is
>> not unique!
>>
>> and finally:
>>
>> [info] IngestSpec:
>> [info] Exception encountered when attempting to run a suite with class
>> name: com.company.package.IngestSpec *** ABORTED ***
>> [info]   akka.actor.InvalidActorNameException: actor name
>> [ExecutorEndpoint] is not unique!
>>
>>
>> What do I need to do to get a sqlContext through my tests?
>>
>> Thanks,
>>
>> -- Steve
>>
>
>
>
> --
> Cell : 425-233-8271
> Twitter: https://twitter.com/holdenkarau
>



-- 
Regards,
Rishitesh Mishra,
SnappyData . (http://www.snappydata.io/)

https://in.linkedin.com/in/rishiteshmishra


Re: Unit test with sqlContext

2016-02-04 Thread Holden Karau
Thanks for recommending spark-testing-base :) Just wanted to add if anyone
has feature requests for Spark testing please get in touch (or add an issue
on the github) :)


On Thu, Feb 4, 2016 at 8:25 PM, Silvio Fiorito <
silvio.fior...@granturing.com> wrote:

> Hi Steve,
>
> Have you looked at the spark-testing-base package by Holden? It’s really
> useful for unit testing Spark apps as it handles all the bootstrapping for
> you.
>
> https://github.com/holdenk/spark-testing-base
>
> DataFrame examples are here:
> https://github.com/holdenk/spark-testing-base/blob/master/src/test/1.3/scala/com/holdenkarau/spark/testing/SampleDataFrameTest.scala
>
> Thanks,
> Silvio
>
> From: Steve Annessa <steve.anne...@gmail.com>
> Date: Thursday, February 4, 2016 at 8:36 PM
> To: "user@spark.apache.org" <user@spark.apache.org>
> Subject: Unit test with sqlContext
>
> I'm trying to unit test a function that reads in a JSON file, manipulates
> the DF and then returns a Scala Map.
>
> The function has signature:
> def ingest(dataLocation: String, sc: SparkContext, sqlContext: SQLContext)
>
> I've created a bootstrap spec for spark jobs that instantiates the Spark
> Context and SQLContext like so:
>
> @transient var sc: SparkContext = _
> @transient var sqlContext: SQLContext = _
>
> override def beforeAll = {
>   System.clearProperty("spark.driver.port")
>   System.clearProperty("spark.hostPort")
>
>   val conf = new SparkConf()
> .setMaster(master)
> .setAppName(appName)
>
>   sc = new SparkContext(conf)
>   sqlContext = new SQLContext(sc)
> }
>
> When I do not include sqlContext, my tests run. Once I add the sqlContext
> I get the following errors:
>
> 16/02/04 17:31:58 WARN SparkContext: Another SparkContext is being
> constructed (or threw an exception in its constructor).  This may indicate
> an error, since only one SparkContext may be running in this JVM (see
> SPARK-2243). The other SparkContext was created at:
> org.apache.spark.SparkContext.<init>(SparkContext.scala:81)
>
> 16/02/04 17:31:59 ERROR SparkContext: Error initializing SparkContext.
> akka.actor.InvalidActorNameException: actor name [ExecutorEndpoint] is not
> unique!
>
> and finally:
>
> [info] IngestSpec:
> [info] Exception encountered when attempting to run a suite with class
> name: com.company.package.IngestSpec *** ABORTED ***
> [info]   akka.actor.InvalidActorNameException: actor name
> [ExecutorEndpoint] is not unique!
>
>
> What do I need to do to get a sqlContext through my tests?
>
> Thanks,
>
> -- Steve
>



-- 
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau


Re: how to run unit test for specific component only

2015-11-13 Thread Steve Loughran
try:

mvn test -pl sql  -DwildcardSuites=org.apache.spark.sql -Dtest=none




On 12 Nov 2015, at 03:13, weoccc <weo...@gmail.com> wrote:
wrote:

Hi,

I am wondering how to run unit test for specific spark component only.

mvn test -DwildcardSuites="org.apache.spark.sql.*" -Dtest=none

The above command doesn't seem to work. I'm using spark 1.5.

Thanks,

Weide




Re: how to run unit test for specific component only

2015-11-11 Thread Ted Yu
Have you tried the following ?

build/sbt "sql/test-only *"
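
For a single suite rather than the whole module, test-only also takes a glob
on the suite name; the suite below is only an example:

build/sbt "sql/test-only *SQLQuerySuite"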

Cheers

On Wed, Nov 11, 2015 at 7:13 PM, weoccc <weo...@gmail.com> wrote:

> Hi,
>
> I am wondering how to run unit test for specific spark component only.
>
> mvn test -DwildcardSuites="org.apache.spark.sql.*" -Dtest=none
>
> The above command doesn't seem to work. I'm using spark 1.5.
>
> Thanks,
>
> Weide
>
>
>


how to run unit test for specific component only

2015-11-11 Thread weoccc
Hi,

I am wondering how to run unit test for specific spark component only.

mvn test -DwildcardSuites="org.apache.spark.sql.*" -Dtest=none

The above command doesn't seem to work. I'm using spark 1.5.

Thanks,

Weide


Re: How to unit test HiveContext without OutOfMemoryError (using sbt)

2015-08-26 Thread Mike Trienis
Thanks for your response Yana,

I can increase the MaxPermSize parameter and it will allow me to run the
unit test a few more times before I run out of memory.

However, the primary issue is that running the same unit test multiple times
in the same JVM results in increased memory usage with each run, and I believe
it has something to do with HiveContext not reclaiming memory after it is
finished (or with me not shutting it down properly).

It could very well be related to sbt, however, it's not clear to me.


On Tue, Aug 25, 2015 at 1:12 PM, Yana Kadiyska yana.kadiy...@gmail.com
wrote:

 The PermGen space error is controlled with MaxPermSize parameter. I run
 with this in my pom, I think copied pretty literally from Spark's own
 tests... I don't know what the sbt equivalent is but you should be able to
 pass it...possibly via SBT_OPTS?


  <plugin>
    <groupId>org.scalatest</groupId>
    <artifactId>scalatest-maven-plugin</artifactId>
    <version>1.0</version>
    <configuration>
      <reportsDirectory>${project.build.directory}/surefire-reports</reportsDirectory>
      <parallel>false</parallel>
      <junitxml>.</junitxml>
      <filereports>SparkTestSuite.txt</filereports>
      <argLine>-Xmx3g -XX:MaxPermSize=256m -XX:ReservedCodeCacheSize=512m</argLine>
      <stderr/>
      <systemProperties>
        <java.awt.headless>true</java.awt.headless>
        <spark.testing>1</spark.testing>
        <spark.ui.enabled>false</spark.ui.enabled>
        <spark.driver.allowMultipleContexts>true</spark.driver.allowMultipleContexts>
      </systemProperties>
    </configuration>
    <executions>
      <execution>
        <id>test</id>
        <goals>
          <goal>test</goal>
        </goals>
      </execution>
    </executions>
  </plugin>
  </plugins>


 On Tue, Aug 25, 2015 at 2:10 PM, Mike Trienis mike.trie...@orcsol.com
 wrote:

 Hello,

 I am using sbt and created a unit test where I create a `HiveContext` and
 execute some query and then return. Each time I run the unit test the JVM
 will increase it's memory usage until I get the error:

 Internal error when running tests: java.lang.OutOfMemoryError: PermGen
 space
 Exception in thread Thread-2 java.io.EOFException

 As a work-around, I can fork a new JVM each time I run the unit test,
 however, it seems like a bad solution as takes a while to run the unit
 test.

 By the way, I tried to importing the TestHiveContext:

- import org.apache.spark.sql.hive.test.TestHiveContext

 However, it suffers from the same memory issue. Has anyone else suffered
 from the same problem? Note that I am running these unit tests on my mac.

 Cheers, Mike.





Re: How to unit test HiveContext without OutOfMemoryError (using sbt)

2015-08-26 Thread Michael Armbrust
I'd suggest setting sbt to fork when running tests.
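
For reference, a minimal build.sbt sketch of that (sbt 0.13-style settings, as
used by most Spark 1.x projects; javaOptions only applies to forked test JVMs,
and the memory flags simply mirror the ones from Spark's own POM quoted below):

// fork a fresh JVM for the test run so PermGen is given back when the run ends
fork in Test := true
javaOptions in Test ++= Seq("-Xmx3g", "-XX:MaxPermSize=256m", "-XX:ReservedCodeCacheSize=512m")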

On Wed, Aug 26, 2015 at 10:51 AM, Mike Trienis mike.trie...@orcsol.com
wrote:

 Thanks for your response Yana,

 I can increase the MaxPermSize parameter and it will allow me to run the
 unit test a few more times before I run out of memory.

 However, the primary issue is that running the same unit test in the same
 JVM (multiple times) results in increased memory (each run of the unit
 test) and I believe it has something to do with HiveContext not reclaiming
 memory after it is finished (or I'm not shutting it down properly).

 It could very well be related to sbt, however, it's not clear to me.


 On Tue, Aug 25, 2015 at 1:12 PM, Yana Kadiyska yana.kadiy...@gmail.com
 wrote:

 The PermGen space error is controlled with MaxPermSize parameter. I run
 with this in my pom, I think copied pretty literally from Spark's own
 tests... I don't know what the sbt equivalent is but you should be able to
 pass it...possibly via SBT_OPTS?


  <plugin>
    <groupId>org.scalatest</groupId>
    <artifactId>scalatest-maven-plugin</artifactId>
    <version>1.0</version>
    <configuration>
      <reportsDirectory>${project.build.directory}/surefire-reports</reportsDirectory>
      <parallel>false</parallel>
      <junitxml>.</junitxml>
      <filereports>SparkTestSuite.txt</filereports>
      <argLine>-Xmx3g -XX:MaxPermSize=256m -XX:ReservedCodeCacheSize=512m</argLine>
      <stderr/>
      <systemProperties>
        <java.awt.headless>true</java.awt.headless>
        <spark.testing>1</spark.testing>
        <spark.ui.enabled>false</spark.ui.enabled>
        <spark.driver.allowMultipleContexts>true</spark.driver.allowMultipleContexts>
      </systemProperties>
    </configuration>
    <executions>
      <execution>
        <id>test</id>
        <goals>
          <goal>test</goal>
        </goals>
      </execution>
    </executions>
  </plugin>
  </plugins>


 On Tue, Aug 25, 2015 at 2:10 PM, Mike Trienis mike.trie...@orcsol.com
 wrote:

 Hello,

 I am using sbt and created a unit test where I create a `HiveContext`
 and execute some query and then return. Each time I run the unit test the
 JVM will increase it's memory usage until I get the error:

 Internal error when running tests: java.lang.OutOfMemoryError: PermGen
 space
 Exception in thread Thread-2 java.io.EOFException

 As a work-around, I can fork a new JVM each time I run the unit test,
 however, it seems like a bad solution as takes a while to run the unit
 test.

 By the way, I tried to importing the TestHiveContext:

- import org.apache.spark.sql.hive.test.TestHiveContext

 However, it suffers from the same memory issue. Has anyone else suffered
 from the same problem? Note that I am running these unit tests on my mac.

 Cheers, Mike.






How to unit test HiveContext without OutOfMemoryError (using sbt)

2015-08-25 Thread Mike Trienis
Hello,

I am using sbt and created a unit test where I create a `HiveContext` and
execute some query and then return. Each time I run the unit test the JVM
will increase it's memory usage until I get the error:

Internal error when running tests: java.lang.OutOfMemoryError: PermGen space
Exception in thread Thread-2 java.io.EOFException

As a work-around, I can fork a new JVM each time I run the unit test,
however, it seems like a bad solution as takes a while to run the unit
test.

By the way, I tried to importing the TestHiveContext:

   - import org.apache.spark.sql.hive.test.TestHiveContext

However, it suffers from the same memory issue. Has anyone else suffered
from the same problem? Note that I am running these unit tests on my mac.

Cheers, Mike.


Re: How to unit test HiveContext without OutOfMemoryError (using sbt)

2015-08-25 Thread Yana Kadiyska
The PermGen space error is controlled with MaxPermSize parameter. I run
with this in my pom, I think copied pretty literally from Spark's own
tests... I don't know what the sbt equivalent is but you should be able to
pass it...possibly via SBT_OPTS?


 <plugin>
   <groupId>org.scalatest</groupId>
   <artifactId>scalatest-maven-plugin</artifactId>
   <version>1.0</version>
   <configuration>
     <reportsDirectory>${project.build.directory}/surefire-reports</reportsDirectory>
     <parallel>false</parallel>
     <junitxml>.</junitxml>
     <filereports>SparkTestSuite.txt</filereports>
     <argLine>-Xmx3g -XX:MaxPermSize=256m -XX:ReservedCodeCacheSize=512m</argLine>
     <stderr/>
     <systemProperties>
       <java.awt.headless>true</java.awt.headless>
       <spark.testing>1</spark.testing>
       <spark.ui.enabled>false</spark.ui.enabled>
       <spark.driver.allowMultipleContexts>true</spark.driver.allowMultipleContexts>
     </systemProperties>
   </configuration>
   <executions>
     <execution>
       <id>test</id>
       <goals>
         <goal>test</goal>
       </goals>
     </execution>
   </executions>
 </plugin>
 </plugins>


On Tue, Aug 25, 2015 at 2:10 PM, Mike Trienis mike.trie...@orcsol.com
wrote:

 Hello,

 I am using sbt and created a unit test where I create a `HiveContext` and
 execute some query and then return. Each time I run the unit test the JVM
 will increase it's memory usage until I get the error:

 Internal error when running tests: java.lang.OutOfMemoryError: PermGen
 space
 Exception in thread Thread-2 java.io.EOFException

 As a work-around, I can fork a new JVM each time I run the unit test,
 however, it seems like a bad solution as takes a while to run the unit
 test.

 By the way, I tried to importing the TestHiveContext:

- import org.apache.spark.sql.hive.test.TestHiveContext

 However, it suffers from the same memory issue. Has anyone else suffered
 from the same problem? Note that I am running these unit tests on my mac.

 Cheers, Mike.




Re: [Unit Test Failure] Test org.apache.spark.streaming.JavaAPISuite.testCount failed

2015-05-20 Thread Tathagata Das
Has this been fixed for you now? There has been a number of patches since
then and it may have been fixed.

On Thu, May 14, 2015 at 7:20 AM, Wangfei (X) wangf...@huawei.com wrote:

  Yes, it fails repeatedly on my local Jenkins.

 Sent from my iPhone

 On May 14, 2015, at 18:30, Tathagata Das t...@databricks.com wrote:

   Do you get this failure repeatedly?



 On Thu, May 14, 2015 at 12:55 AM, kf wangf...@huawei.com wrote:

 Hi, all, i got following error when i run unit test of spark by
 dev/run-tests
 on the latest branch-1.4 branch.

 the latest commit id:
 commit d518c0369fa412567855980c3f0f426cde5c190d
 Author: zsxwing zsxw...@gmail.com
 Date:   Wed May 13 17:58:29 2015 -0700

 error

 [info] Test org.apache.spark.streaming.JavaAPISuite.testCount started
 [error] Test org.apache.spark.streaming.JavaAPISuite.testCount failed:
 org.apache.spark.SparkException: Error communicating with MapOutputTracker
 [error] at
 org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:113)
 [error] at
 org.apache.spark.MapOutputTracker.sendTracker(MapOutputTracker.scala:119)
 [error] at
 org.apache.spark.MapOutputTrackerMaster.stop(MapOutputTracker.scala:324)
 [error] at org.apache.spark.SparkEnv.stop(SparkEnv.scala:93)
 [error] at org.apache.spark.SparkContext.stop(SparkContext.scala:1577)
 [error] at

 org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:626)
 [error] at

 org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:597)
 [error] at

 org.apache.spark.streaming.TestSuiteBase$class.runStreamsWithPartitions(TestSuiteBase.scala:403)
 [error] at

 org.apache.spark.streaming.JavaTestUtils$.runStreamsWithPartitions(JavaTestUtils.scala:102)
 [error] at

 org.apache.spark.streaming.TestSuiteBase$class.runStreams(TestSuiteBase.scala:344)
 [error] at

 org.apache.spark.streaming.JavaTestUtils$.runStreams(JavaTestUtils.scala:102)
 [error] at

 org.apache.spark.streaming.JavaTestBase$class.runStreams(JavaTestUtils.scala:74)
 [error] at

 org.apache.spark.streaming.JavaTestUtils$.runStreams(JavaTestUtils.scala:102)
 [error] at
 org.apache.spark.streaming.JavaTestUtils.runStreams(JavaTestUtils.scala)
 [error] at
 org.apache.spark.streaming.JavaAPISuite.testCount(JavaAPISuite.java:103)
 [error] ...
 [error] Caused by: org.apache.spark.SparkException: Error sending message
 [message = StopMapOutputTracker]
 [error] at
 org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:116)
 [error] at
 org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
 [error] at
 org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:109)
 [error] ... 52 more
 [error] Caused by: java.util.concurrent.TimeoutException: Futures timed
 out
 after [120 seconds]
 [error] at
 scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
 [error] at
 scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
 [error] at
 scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
 [error] at

 scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
 [error] at scala.concurrent.Await$.result(package.scala:107)
 [error] at
 org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
 [error] ... 54 more



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/Unit-Test-Failure-Test-org-apache-spark-streaming-JavaAPISuite-testCount-failed-tp22879.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org





Re: [Unit Test Failure] Test org.apache.spark.streaming.JavaAPISuite.testCount failed

2015-05-14 Thread Wangfei (X)
Yes, it fails repeatedly on my local Jenkins.

Sent from my iPhone

On May 14, 2015, at 18:30, Tathagata Das t...@databricks.com wrote:

Do you get this failure repeatedly?



On Thu, May 14, 2015 at 12:55 AM, kf 
wangf...@huawei.commailto:wangf...@huawei.com wrote:
Hi, all, i got following error when i run unit test of spark by dev/run-tests
on the latest branch-1.4 branch.

the latest commit id:
commit d518c0369fa412567855980c3f0f426cde5c190d
Author: zsxwing zsxw...@gmail.commailto:zsxw...@gmail.com
Date:   Wed May 13 17:58:29 2015 -0700

error

[info] Test org.apache.spark.streaming.JavaAPISuite.testCount started
[error] Test org.apache.spark.streaming.JavaAPISuite.testCount failed:
org.apache.spark.SparkException: Error communicating with MapOutputTracker
[error] at
org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:113)
[error] at
org.apache.spark.MapOutputTracker.sendTracker(MapOutputTracker.scala:119)
[error] at
org.apache.spark.MapOutputTrackerMaster.stop(MapOutputTracker.scala:324)
[error] at org.apache.spark.SparkEnv.stop(SparkEnv.scala:93)
[error] at org.apache.spark.SparkContext.stop(SparkContext.scala:1577)
[error] at
org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:626)
[error] at
org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:597)
[error] at
org.apache.spark.streaming.TestSuiteBase$class.runStreamsWithPartitions(TestSuiteBase.scala:403)
[error] at
org.apache.spark.streaming.JavaTestUtils$.runStreamsWithPartitions(JavaTestUtils.scala:102)
[error] at
org.apache.spark.streaming.TestSuiteBase$class.runStreams(TestSuiteBase.scala:344)
[error] at
org.apache.spark.streaming.JavaTestUtils$.runStreams(JavaTestUtils.scala:102)
[error] at
org.apache.spark.streaming.JavaTestBase$class.runStreams(JavaTestUtils.scala:74)
[error] at
org.apache.spark.streaming.JavaTestUtils$.runStreams(JavaTestUtils.scala:102)
[error] at
org.apache.spark.streaming.JavaTestUtils.runStreams(JavaTestUtils.scala)
[error] at
org.apache.spark.streaming.JavaAPISuite.testCount(JavaAPISuite.java:103)
[error] ...
[error] Caused by: org.apache.spark.SparkException: Error sending message
[message = StopMapOutputTracker]
[error] at
org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:116)
[error] at
org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
[error] at
org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:109)
[error] ... 52 more
[error] Caused by: java.util.concurrent.TimeoutException: Futures timed out
after [120 seconds]
[error] at
scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
[error] at
scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
[error] at
scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
[error] at
scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
[error] at scala.concurrent.Await$.result(package.scala:107)
[error] at
org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
[error] ... 54 more



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Unit-Test-Failure-Test-org-apache-spark-streaming-JavaAPISuite-testCount-failed-tp22879.html
Sent from the Apache Spark User List mailing list archive at 
Nabble.comhttp://Nabble.com.

-
To unsubscribe, e-mail: 
user-unsubscr...@spark.apache.orgmailto:user-unsubscr...@spark.apache.org
For additional commands, e-mail: 
user-h...@spark.apache.orgmailto:user-h...@spark.apache.org




[Unit Test Failure] Test org.apache.spark.streaming.JavaAPISuite.testCount failed

2015-05-14 Thread kf
Hi all, I got the following error when I ran the Spark unit tests via dev/run-tests
on the latest branch-1.4 branch.

the latest commit id: 
commit d518c0369fa412567855980c3f0f426cde5c190d
Author: zsxwing zsxw...@gmail.com
Date:   Wed May 13 17:58:29 2015 -0700

error

[info] Test org.apache.spark.streaming.JavaAPISuite.testCount started
[error] Test org.apache.spark.streaming.JavaAPISuite.testCount failed:
org.apache.spark.SparkException: Error communicating with MapOutputTracker
[error] at
org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:113)
[error] at
org.apache.spark.MapOutputTracker.sendTracker(MapOutputTracker.scala:119)
[error] at
org.apache.spark.MapOutputTrackerMaster.stop(MapOutputTracker.scala:324)
[error] at org.apache.spark.SparkEnv.stop(SparkEnv.scala:93)
[error] at org.apache.spark.SparkContext.stop(SparkContext.scala:1577)
[error] at
org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:626)
[error] at
org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:597)
[error] at
org.apache.spark.streaming.TestSuiteBase$class.runStreamsWithPartitions(TestSuiteBase.scala:403)
[error] at
org.apache.spark.streaming.JavaTestUtils$.runStreamsWithPartitions(JavaTestUtils.scala:102)
[error] at
org.apache.spark.streaming.TestSuiteBase$class.runStreams(TestSuiteBase.scala:344)
[error] at
org.apache.spark.streaming.JavaTestUtils$.runStreams(JavaTestUtils.scala:102)
[error] at
org.apache.spark.streaming.JavaTestBase$class.runStreams(JavaTestUtils.scala:74)
[error] at
org.apache.spark.streaming.JavaTestUtils$.runStreams(JavaTestUtils.scala:102)
[error] at
org.apache.spark.streaming.JavaTestUtils.runStreams(JavaTestUtils.scala)
[error] at
org.apache.spark.streaming.JavaAPISuite.testCount(JavaAPISuite.java:103)
[error] ...
[error] Caused by: org.apache.spark.SparkException: Error sending message
[message = StopMapOutputTracker]
[error] at
org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:116)
[error] at
org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
[error] at
org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:109)
[error] ... 52 more
[error] Caused by: java.util.concurrent.TimeoutException: Futures timed out
after [120 seconds]
[error] at
scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
[error] at
scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
[error] at
scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
[error] at
scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
[error] at scala.concurrent.Await$.result(package.scala:107)
[error] at
org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
[error] ... 54 more



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Unit-Test-Failure-Test-org-apache-spark-streaming-JavaAPISuite-testCount-failed-tp22879.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: [Unit Test Failure] Test org.apache.spark.streaming.JavaAPISuite.testCount failed

2015-05-14 Thread Tathagata Das
Do you get this failure repeatedly?



On Thu, May 14, 2015 at 12:55 AM, kf wangf...@huawei.com wrote:

 Hi, all, i got following error when i run unit test of spark by
 dev/run-tests
 on the latest branch-1.4 branch.

 the latest commit id:
 commit d518c0369fa412567855980c3f0f426cde5c190d
 Author: zsxwing zsxw...@gmail.com
 Date:   Wed May 13 17:58:29 2015 -0700

 error

 [info] Test org.apache.spark.streaming.JavaAPISuite.testCount started
 [error] Test org.apache.spark.streaming.JavaAPISuite.testCount failed:
 org.apache.spark.SparkException: Error communicating with MapOutputTracker
 [error] at
 org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:113)
 [error] at
 org.apache.spark.MapOutputTracker.sendTracker(MapOutputTracker.scala:119)
 [error] at
 org.apache.spark.MapOutputTrackerMaster.stop(MapOutputTracker.scala:324)
 [error] at org.apache.spark.SparkEnv.stop(SparkEnv.scala:93)
 [error] at org.apache.spark.SparkContext.stop(SparkContext.scala:1577)
 [error] at

 org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:626)
 [error] at

 org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:597)
 [error] at

 org.apache.spark.streaming.TestSuiteBase$class.runStreamsWithPartitions(TestSuiteBase.scala:403)
 [error] at

 org.apache.spark.streaming.JavaTestUtils$.runStreamsWithPartitions(JavaTestUtils.scala:102)
 [error] at

 org.apache.spark.streaming.TestSuiteBase$class.runStreams(TestSuiteBase.scala:344)
 [error] at

 org.apache.spark.streaming.JavaTestUtils$.runStreams(JavaTestUtils.scala:102)
 [error] at

 org.apache.spark.streaming.JavaTestBase$class.runStreams(JavaTestUtils.scala:74)
 [error] at

 org.apache.spark.streaming.JavaTestUtils$.runStreams(JavaTestUtils.scala:102)
 [error] at
 org.apache.spark.streaming.JavaTestUtils.runStreams(JavaTestUtils.scala)
 [error] at
 org.apache.spark.streaming.JavaAPISuite.testCount(JavaAPISuite.java:103)
 [error] ...
 [error] Caused by: org.apache.spark.SparkException: Error sending message
 [message = StopMapOutputTracker]
 [error] at
 org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:116)
 [error] at
 org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
 [error] at
 org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:109)
 [error] ... 52 more
 [error] Caused by: java.util.concurrent.TimeoutException: Futures timed out
 after [120 seconds]
 [error] at
 scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
 [error] at
 scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
 [error] at
 scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
 [error] at

 scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
 [error] at scala.concurrent.Await$.result(package.scala:107)
 [error] at
 org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
 [error] ... 54 more



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/Unit-Test-Failure-Test-org-apache-spark-streaming-JavaAPISuite-testCount-failed-tp22879.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




Re: Spark unit test fails

2015-05-07 Thread NoWisdom
I'm also getting the same error.

Any ideas?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-unit-test-fails-tp22368p22798.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Cannot run unit test.

2015-04-08 Thread Mike Trienis
It's because your tests are running in parallel and you can only have one
context running at a time. 
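
A sketch of the two usual fixes, assuming ScalaTest and sbt (names below are
illustrative, not taken from the original poster's project):

// build.sbt -- run suites sequentially so only one SparkContext exists at a time
parallelExecution in Test := false

// In each suite, stop the context so the next suite can create its own:
import org.apache.spark.{SparkConf, SparkContext}
import org.scalatest.{BeforeAndAfterAll, FunSuite}

class MySparkSuite extends FunSuite with BeforeAndAfterAll {
  @transient private var sc: SparkContext = _

  override def beforeAll(): Unit = {
    sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("test"))
  }

  override def afterAll(): Unit = {
    if (sc != null) sc.stop()  // releases ports/actor names for the next suite
    sc = null
  }
}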



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Cannot-run-unit-test-tp14459p22429.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Spark unit test fails

2015-04-06 Thread Manas Kar
Trying to bump this question up.
Can someone point to an example on GitHub?

..Manas
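
For what it's worth, the workaround in the linked thread boils down to keeping
the signed servlet-api jar off the test classpath. A hedged sbt sketch of that,
using the versions from the original mail (the ExclusionRule vals only take
effect once attached to the artifacts that pull the jar in; the Cloudera
resolver is not shown, and whether this resolves the signer clash depends on
what else ends up on the test classpath):

libraryDependencies ++= Seq(
  ("org.apache.spark" %% "spark-core" % "1.2.0-cdh5.3.2" % "provided")
    .excludeAll(ExclusionRule(organization = "javax.servlet"), ExclusionRule(organization = "org.mortbay.jetty")),
  ("org.apache.hadoop" % "hadoop-client" % "2.3.0")
    .excludeAll(ExclusionRule(organization = "javax.servlet"), ExclusionRule(organization = "org.mortbay.jetty"))
)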

On Fri, Apr 3, 2015 at 9:39 AM, manasdebashiskar manasdebashis...@gmail.com
 wrote:

 Hi experts,
  I am trying to write unit tests for my spark application which fails with
 javax.servlet.FilterRegistration error.

 I am using CDH5.3.2 Spark and below is my dependencies list.
 val spark   = "1.2.0-cdh5.3.2"
 val esriGeometryAPI = "1.2"
 val csvWriter   = "1.0.0"
 val hadoopClient= "2.3.0"
 val scalaTest   = "2.2.1"
 val jodaTime= "1.6.0"
 val scalajHTTP  = "1.0.1"
 val avro= "1.7.7"
 val scopt   = "3.2.0"
 val config  = "1.2.1"
 val jobserver   = "0.4.1"
 val excludeJBossNetty = ExclusionRule(organization = "org.jboss.netty")
 val excludeIONetty = ExclusionRule(organization = "io.netty")
 val excludeEclipseJetty = ExclusionRule(organization = "org.eclipse.jetty")
 val excludeMortbayJetty = ExclusionRule(organization = "org.mortbay.jetty")
 val excludeAsm = ExclusionRule(organization = "org.ow2.asm")
 val excludeOldAsm = ExclusionRule(organization = "asm")
 val excludeCommonsLogging = ExclusionRule(organization = "commons-logging")
 val excludeSLF4J = ExclusionRule(organization = "org.slf4j")
 val excludeScalap = ExclusionRule(organization = "org.scala-lang", artifact = "scalap")
 val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop")
 val excludeCurator = ExclusionRule(organization = "org.apache.curator")
 val excludePowermock = ExclusionRule(organization = "org.powermock")
 val excludeFastutil = ExclusionRule(organization = "it.unimi.dsi")
 val excludeJruby = ExclusionRule(organization = "org.jruby")
 val excludeThrift = ExclusionRule(organization = "org.apache.thrift")
 val excludeServletApi = ExclusionRule(organization = "javax.servlet", artifact = "servlet-api")
 val excludeJUnit = ExclusionRule(organization = "junit")

 I found the link (
 http://apache-spark-user-list.1001560.n3.nabble.com/Fwd-SecurityException-when-running-tests-with-Spark-1-0-0-td6747.html#a6749
 ) talking about the issue and the work around of the same.
 But that work around does not get rid of the problem for me.
 I am using an SBT build which can't be changed to maven.

 What am I missing?


 Stack trace
 -
 [info] FiltersRDDSpec:
 [info] - Spark Filter *** FAILED ***
 [info]   java.lang.SecurityException: class
 javax.servlet.FilterRegistration's signer information does not match
 signer information of other classes in the same package
 [info]   at java.lang.ClassLoader.checkCerts(Unknown Source)
 [info]   at java.lang.ClassLoader.preDefineClass(Unknown Source)
 [info]   at java.lang.ClassLoader.defineClass(Unknown Source)
 [info]   at java.security.SecureClassLoader.defineClass(Unknown Source)
 [info]   at java.net.URLClassLoader.defineClass(Unknown Source)
 [info]   at java.net.URLClassLoader.access$100(Unknown Source)
 [info]   at java.net.URLClassLoader$1.run(Unknown Source)
 [info]   at java.net.URLClassLoader$1.run(Unknown Source)
 [info]   at java.security.AccessController.doPrivileged(Native Method)
 [info]   at java.net.URLClassLoader.findClass(Unknown Source)

 Thanks
 Manas
  Manas Kar

 --
 View this message in context: Spark unit test fails
 http://apache-spark-user-list.1001560.n3.nabble.com/Spark-unit-test-fails-tp22368.html
 Sent from the Apache Spark User List mailing list archive
 http://apache-spark-user-list.1001560.n3.nabble.com/ at Nabble.com.



Spark unit test fails

2015-04-03 Thread Manas Kar
Hi experts,
 I am trying to write unit tests for my Spark application, which fail with a
javax.servlet.FilterRegistration error.

I am using CDH5.3.2 Spark and below is my dependencies list.
val spark   = "1.2.0-cdh5.3.2"
val esriGeometryAPI = "1.2"
val csvWriter   = "1.0.0"
val hadoopClient= "2.3.0"
val scalaTest   = "2.2.1"
val jodaTime= "1.6.0"
val scalajHTTP  = "1.0.1"
val avro= "1.7.7"
val scopt   = "3.2.0"
val config  = "1.2.1"
val jobserver   = "0.4.1"
val excludeJBossNetty = ExclusionRule(organization = "org.jboss.netty")
val excludeIONetty = ExclusionRule(organization = "io.netty")
val excludeEclipseJetty = ExclusionRule(organization = "org.eclipse.jetty")
val excludeMortbayJetty = ExclusionRule(organization = "org.mortbay.jetty")
val excludeAsm = ExclusionRule(organization = "org.ow2.asm")
val excludeOldAsm = ExclusionRule(organization = "asm")
val excludeCommonsLogging = ExclusionRule(organization = "commons-logging")
val excludeSLF4J = ExclusionRule(organization = "org.slf4j")
val excludeScalap = ExclusionRule(organization = "org.scala-lang", artifact = "scalap")
val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop")
val excludeCurator = ExclusionRule(organization = "org.apache.curator")
val excludePowermock = ExclusionRule(organization = "org.powermock")
val excludeFastutil = ExclusionRule(organization = "it.unimi.dsi")
val excludeJruby = ExclusionRule(organization = "org.jruby")
val excludeThrift = ExclusionRule(organization = "org.apache.thrift")
val excludeServletApi = ExclusionRule(organization = "javax.servlet", artifact = "servlet-api")
val excludeJUnit = ExclusionRule(organization = "junit")

I found the link (
http://apache-spark-user-list.1001560.n3.nabble.com/Fwd-SecurityException-when-running-tests-with-Spark-1-0-0-td6747.html#a6749
) talking about the issue and the work around of the same.
But that work around does not get rid of the problem for me.
I am using an SBT build which can't be changed to maven.

What am I missing?


Stack trace
-
[info] FiltersRDDSpec:
[info] - Spark Filter *** FAILED ***
[info]   java.lang.SecurityException: class
javax.servlet.FilterRegistration's signer information does not match
signer information of other classes in the same package
[info]   at java.lang.ClassLoader.checkCerts(Unknown Source)
[info]   at java.lang.ClassLoader.preDefineClass(Unknown Source)
[info]   at java.lang.ClassLoader.defineClass(Unknown Source)
[info]   at java.security.SecureClassLoader.defineClass(Unknown Source)
[info]   at java.net.URLClassLoader.defineClass(Unknown Source)
[info]   at java.net.URLClassLoader.access$100(Unknown Source)
[info]   at java.net.URLClassLoader$1.run(Unknown Source)
[info]   at java.net.URLClassLoader$1.run(Unknown Source)
[info]   at java.security.AccessController.doPrivileged(Native Method)
[info]   at java.net.URLClassLoader.findClass(Unknown Source)

Thanks
Manas


TestSuiteBase based unit test using a sliding window join timesout

2015-01-07 Thread Enno Shioji
Hi,

I extended org.apache.spark.streaming.TestSuiteBase for some testing, and I
was able to run this test fine:

test("Sliding window join with 3 second window duration") {

  val input1 =
    Seq(
      Seq("req1"),
      Seq("req2", "req3"),
      Seq(),
      Seq("req4", "req5", "req6"),
      Seq("req7"),
      Seq(),
      Seq()
    )

  val input2 =
    Seq(
      Seq(("tx1", "req1")),
      Seq(),
      Seq(("tx2", "req3")),
      Seq(("tx3", "req2")),
      Seq(),
      Seq(("tx4", "req7")),
      Seq(("tx5", "req5"), ("tx6", "req4"))
    )

  val expectedOutput = Seq(
    Seq(("req1", (1, "tx1"))),
    Seq(),
    Seq(("req3", (1, "tx2"))),
    Seq(("req2", (1, "tx3"))),
    Seq(),
    Seq(("req7", (1, "tx4"))),
    Seq()
  )
  val operation = (rq: DStream[String], tx: DStream[(String, String)]) => {
    rq.window(Seconds(3), Seconds(1)).map(x => (x, 1)).join(tx.map { case (k, v) => (v, k) })
  }
  testOperation(input1, input2, operation, expectedOutput, useSet = true)
}

However, this seemingly OK looking test fails with operation timeout:

test("Sliding window join with 3 second window duration + a tumbling window") {

  val input1 =
    Seq(
      Seq("req1"),
      Seq("req2", "req3"),
      Seq(),
      Seq("req4", "req5", "req6"),
      Seq("req7"),
      Seq()
    )

  val input2 =
    Seq(
      Seq(("tx1", "req1")),
      Seq(),
      Seq(("tx2", "req3")),
      Seq(("tx3", "req2")),
      Seq(),
      Seq(("tx4", "req7"))
    )

  val expectedOutput = Seq(
    Seq(("req1", (1, "tx1"))),
    Seq(("req2", (1, "tx3")), ("req3", (1, "tx3"))),
    Seq(("req7", (1, "tx4")))
  )
  val operation = (rq: DStream[String], tx: DStream[(String, String)]) => {
    rq.window(Seconds(3), Seconds(2)).map(x => (x, 1)).join(tx.window(Seconds(2), Seconds(2)).map { case (k, v) => (v, k) })
  }
  testOperation(input1, input2, operation, expectedOutput, useSet = true)
}

Stacktrace:
10033 was not less than 1 Operation timed out after 10033 ms
org.scalatest.exceptions.TestFailedException: 10033 was not less than 1
Operation timed out after 10033 ms
at
org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500)
at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555)
at
org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466)
at
org.apache.spark.streaming.TestSuiteBase$class.runStreamsWithPartitions(TestSuiteBase.scala:338)

Does anybody know why this could be?


Why does consuming a RESTful web service (using javax.ws.rs.* and Jersey) work in unit test but not when submitted to Spark?

2014-12-24 Thread Emre Sevinc
Hello,

I have a piece of code that runs inside Spark Streaming and tries to get
some data from a RESTful web service (that runs locally on my machine). The
code snippet in question is:

 Client client = ClientBuilder.newClient();
 WebTarget target = client.target("http://localhost:/rest");
 target = target.path("annotate")
     .queryParam("text", UrlEscapers.urlFragmentEscaper().escape(spotlightSubmission))
     .queryParam("confidence", 0.3);

  logger.warn("!!! DEBUG !!! target: {}", target.getUri().toString());

  String response = target.request().accept(MediaType.APPLICATION_JSON_TYPE).get(String.class);

  logger.warn("!!! DEBUG !!! Spotlight response: {}", response);

When run inside a unit test as follows:

 mvn clean test -Dtest=SpotlightTest#testCountWords

it contacts the RESTful web service and retrieves some data as expected.
But when the same code is run as part of the application that is submitted
to Spark, using spark-submit script I receive the following error:

  java.lang.NoSuchMethodError:
javax.ws.rs.core.MultivaluedMap.addAll(Ljava/lang/Object;[Ljava/lang/Object;)V

I'm using Spark 1.1.0 and for consuming the web service I'm using Jersey in
my project's pom.xml:

 <dependency>
   <groupId>org.glassfish.jersey.containers</groupId>
   <artifactId>jersey-container-servlet-core</artifactId>
   <version>2.14</version>
 </dependency>

So I suspect that when the application is submitted to Spark, somehow
there's a different JAR in the environment that uses a different version of
Jersey / javax.ws.rs.*

Does anybody know which version of Jersey / javax.ws.rs.*  is used in the
Spark environment, or how to solve this conflict?


-- 
Emre Sevinç
https://be.linkedin.com/in/emresevinc/


Re: Why does consuming a RESTful web service (using javax.ws.rs.* and Jersey) work in unit test but not when submitted to Spark?

2014-12-24 Thread Sean Owen
Your guess is right, that there are two incompatible versions of
Jersey (or really, JAX-RS) in your runtime. Spark doesn't use Jersey,
but its transitive dependencies may, or your transitive dependencies
may.

I don't see Jersey in Spark's dependency tree except from HBase tests,
which in turn only appear in examples, so that's unlikely to be it.
I'd take a look with 'mvn dependency:tree' on your own code first.
Maybe you are including JavaEE 6 for example?

On Wed, Dec 24, 2014 at 12:02 PM, Emre Sevinc emre.sev...@gmail.com wrote:
 Hello,

 I have a piece of code that runs inside Spark Streaming and tries to get
 some data from a RESTful web service (that runs locally on my machine). The
 code snippet in question is:

  Client client = ClientBuilder.newClient();
  WebTarget target = client.target(http://localhost:/rest;);
  target = target.path(annotate)
  .queryParam(text,
 UrlEscapers.urlFragmentEscaper().escape(spotlightSubmission))
  .queryParam(confidence, 0.3);

   logger.warn(!!! DEBUG !!! target: {}, target.getUri().toString());

   String response =
 target.request().accept(MediaType.APPLICATION_JSON_TYPE).get(String.class);

   logger.warn(!!! DEBUG !!! Spotlight response: {}, response);

 When run inside a unit test as follows:

  mvn clean test -Dtest=SpotlightTest#testCountWords

 it contacts the RESTful web service and retrieves some data as expected. But
 when the same code is run as part of the application that is submitted to
 Spark, using spark-submit script I receive the following error:

   java.lang.NoSuchMethodError:
 javax.ws.rs.core.MultivaluedMap.addAll(Ljava/lang/Object;[Ljava/lang/Object;)V

 I'm using Spark 1.1.0 and for consuming the web service I'm using Jersey in
 my project's pom.xml:

  dependency
   groupIdorg.glassfish.jersey.containers/groupId
   artifactIdjersey-container-servlet-core/artifactId
   version2.14/version
 /dependency

 So I suspect that when the application is submitted to Spark, somehow
 there's a different JAR in the environment that uses a different version of
 Jersey / javax.ws.rs.*

 Does anybody know which version of Jersey / javax.ws.rs.*  is used in the
 Spark environment, or how to solve this conflict?


 --
 Emre Sevinç
 https://be.linkedin.com/in/emresevinc/


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Why does consuming a RESTful web service (using javax.ws.rs.* and Jersey) work in unit test but not when submitted to Spark?

2014-12-24 Thread Emre Sevinc
On Wed, Dec 24, 2014 at 1:46 PM, Sean Owen so...@cloudera.com wrote:

 I'd take a look with 'mvn dependency:tree' on your own code first.
 Maybe you are including JavaEE 6 for example?


For reference, my complete pom.xml looks like:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>bigcontent</groupId>
  <artifactId>bigcontent</artifactId>
  <version>1.0-SNAPSHOT</version>
  <packaging>jar</packaging>

  <name>bigcontent</name>
  <url>http://maven.apache.org</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>2.3</version>
        <configuration>
          <!-- put your configurations here -->
        </configuration>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
          </execution>
        </executions>
      </plugin>

      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.2</version>
        <configuration>
          <source>1.7</source>
          <target>1.7</target>
        </configuration>
      </plugin>
    </plugins>
  </build>

  <dependencies>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming_2.10</artifactId>
      <version>1.1.1</version>
      <scope>provided</scope>
    </dependency>

    <dependency>
      <groupId>org.glassfish.jersey.containers</groupId>
      <artifactId>jersey-container-servlet-core</artifactId>
      <version>2.14</version>
    </dependency>

    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>2.4.0</version>
    </dependency>

    <dependency>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
      <version>16.0</version>
    </dependency>

    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client-core</artifactId>
      <version>2.4.0</version>
    </dependency>

    <dependency>
      <groupId>json-mapreduce</groupId>
      <artifactId>json-mapreduce</artifactId>
      <version>1.0-SNAPSHOT</version>
      <exclusions>
        <exclusion>
          <groupId>javax.servlet</groupId>
          <artifactId>*</artifactId>
        </exclusion>
        <exclusion>
          <groupId>commons-io</groupId>
          <artifactId>*</artifactId>
        </exclusion>
        <exclusion>
          <groupId>commons-lang</groupId>
          <artifactId>*</artifactId>
        </exclusion>
        <exclusion>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-common</artifactId>
        </exclusion>
      </exclusions>
    </dependency>

    <dependency>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro-mapred</artifactId>
      <version>1.7.7</version>
      <exclusions>
        <exclusion>
          <groupId>javax.servlet</groupId>
          <artifactId>*</artifactId>
        </exclusion>
        <exclusion>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-common</artifactId>
        </exclusion>
      </exclusions>
    </dependency>

    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.11</version>
      <scope>test</scope>
    </dependency>

    <dependency>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro</artifactId>
      <version>1.7.7</version>
      <exclusions>
        <exclusion>
          <groupId>javax.servlet</groupId>
          <artifactId>*</artifactId>
        </exclusion>
      </exclusions>
    </dependency>

    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>2.4.0</version>
      <scope>provided</scope>
      <exclusions>
        <exclusion>
          <groupId>javax.servlet</groupId>
          <artifactId>*</artifactId>
        </exclusion>
        <exclusion>
          <groupId>com.google.guava</groupId>
          <artifactId>*</artifactId>
        </exclusion>
      </exclusions>
    </dependency>

    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-log4j12</artifactId>
      <version>1.7.7</version>
    </dependency>
  </dependencies>
</project>

And 'mvn dependency:tree' produces the following output:



[INFO] Scanning for projects...
[INFO]

[INFO]

[INFO] Building bigcontent 1.0-SNAPSHOT
[INFO]

[INFO]
[INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ bigcontent ---
[INFO] bigcontent:bigcontent:jar:1.0-SNAPSHOT
[INFO] +- org.apache.spark:spark-streaming_2.10:jar:1.1.1:provided
[INFO] |  +- org.apache.spark:spark-core_2.10:jar:1.1.1:provided
[INFO] |  |  +- org.apache.curator:curator-recipes:jar:2.4.0:provided
[INFO] |  |  |  \- org.apache.curator:curator-framework:jar:2.4.0:provided
[INFO] |  |  | \- 

Re: Why does consuming a RESTful web service (using javax.ws.rs.* and Jersey) work in unit test but not when submitted to Spark?

2014-12-24 Thread Emre Sevinc
It seems like YARN depends on an older version of Jersey, namely 1.9:

  https://github.com/apache/spark/blob/master/yarn/pom.xml

When I've modified my dependencies to have only:

  <dependency>
    <groupId>com.sun.jersey</groupId>
    <artifactId>jersey-core</artifactId>
    <version>1.9.1</version>
  </dependency>

And then modified the code to use the old Jersey API:

Client c = Client.create();
WebResource r = c.resource("http://localhost:/rest")
    .path("annotate")
    .queryParam("text", UrlEscapers.urlFragmentEscaper().escape(spotlightSubmission))
    .queryParam("confidence", "0.3");

logger.warn("!!! DEBUG !!! target: {}", r.getURI());

String response = r.accept(MediaType.APPLICATION_JSON_TYPE)
    //.header()
    .get(String.class);

logger.warn("!!! DEBUG !!! Spotlight response: {}", response);

It seems to work when I use spark-submit to submit the application that
includes this code.

Funny thing is, now my relevant unit test does not run, complaining about
not having enough memory:

Java HotSpot(TM) 64-Bit Server VM warning: INFO:
os::commit_memory(0xc490, 25165824, 0) failed; error='Cannot
allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 25165824 bytes for
committing reserved memory.

--
Emre


On Wed, Dec 24, 2014 at 1:46 PM, Sean Owen so...@cloudera.com wrote:

 Your guess is right, that there are two incompatible versions of
 Jersey (or really, JAX-RS) in your runtime. Spark doesn't use Jersey,
 but its transitive dependencies may, or your transitive dependencies
 may.

 I don't see Jersey in Spark's dependency tree except from HBase tests,
 which in turn only appear in examples, so that's unlikely to be it.
 I'd take a look with 'mvn dependency:tree' on your own code first.
 Maybe you are including JavaEE 6 for example?

 On Wed, Dec 24, 2014 at 12:02 PM, Emre Sevinc emre.sev...@gmail.com
 wrote:
  Hello,
 
  I have a piece of code that runs inside Spark Streaming and tries to get
  some data from a RESTful web service (that runs locally on my machine).
 The
  code snippet in question is:
 
   Client client = ClientBuilder.newClient();
   WebTarget target = client.target(http://localhost:/rest;);
   target = target.path(annotate)
   .queryParam(text,
  UrlEscapers.urlFragmentEscaper().escape(spotlightSubmission))
   .queryParam(confidence, 0.3);
 
logger.warn(!!! DEBUG !!! target: {},
 target.getUri().toString());
 
String response =
 
 target.request().accept(MediaType.APPLICATION_JSON_TYPE).get(String.class);
 
logger.warn(!!! DEBUG !!! Spotlight response: {}, response);
 
  When run inside a unit test as follows:
 
   mvn clean test -Dtest=SpotlightTest#testCountWords
 
  it contacts the RESTful web service and retrieves some data as expected.
 But
  when the same code is run as part of the application that is submitted to
  Spark, using spark-submit script I receive the following error:
 
java.lang.NoSuchMethodError:
 
 javax.ws.rs.core.MultivaluedMap.addAll(Ljava/lang/Object;[Ljava/lang/Object;)V
 
  I'm using Spark 1.1.0 and for consuming the web service I'm using Jersey
 in
  my project's pom.xml:
 
   dependency
groupIdorg.glassfish.jersey.containers/groupId
artifactIdjersey-container-servlet-core/artifactId
version2.14/version
  /dependency
 
  So I suspect that when the application is submitted to Spark, somehow
  there's a different JAR in the environment that uses a different version
 of
  Jersey / javax.ws.rs.*
 
  Does anybody know which version of Jersey / javax.ws.rs.*  is used in the
  Spark environment, or how to solve this conflict?
 
 
  --
  Emre Sevinç
  https://be.linkedin.com/in/emresevinc/
 




-- 
Emre Sevinc


Re: Why does consuming a RESTful web service (using javax.ws.rs.* and Jersey) work in unit test but not when submitted to Spark?

2014-12-24 Thread Sean Owen
That could well be it -- oops, I forgot to run with the YARN profile
and so didn't see the YARN dependencies. Try the userClassPathFirst
option to try to make your app's copy take precedence.
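
(For readers of the archive: the exact property name depends on the Spark
version -- 1.x had spark.files.userClassPathFirst for executors, while later
releases use spark.driver.userClassPathFirst / spark.executor.userClassPathFirst --
so check the configuration page for your version. Passed on the command line
it looks roughly like the following, with the jar name as a placeholder:

spark-submit --conf spark.driver.userClassPathFirst=true --conf spark.executor.userClassPathFirst=true ... your-assembly.jar
)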

The second error is really a JVM bug, but, is from having too little
memory available for the unit tests.

http://spark.apache.org/docs/latest/building-spark.html#setting-up-mavens-memory-usage

On Wed, Dec 24, 2014 at 1:56 PM, Emre Sevinc emre.sev...@gmail.com wrote:
 It seems like YARN depends an older version of Jersey, that is 1.9:

   https://github.com/apache/spark/blob/master/yarn/pom.xml

 When I've modified my dependencies to have only:

   dependency
   groupIdcom.sun.jersey/groupId
   artifactIdjersey-core/artifactId
   version1.9.1/version
 /dependency

 And then modified the code to use the old Jersey API:

 Client c = Client.create();
 WebResource r = c.resource(http://localhost:/rest;)
  .path(annotate)
  .queryParam(text,
 UrlEscapers.urlFragmentEscaper().escape(spotlightSubmission))
  .queryParam(confidence, 0.3);

 logger.warn(!!! DEBUG !!! target: {}, r.getURI());

 String response = r.accept(MediaType.APPLICATION_JSON_TYPE)
//.header()
.get(String.class);

 logger.warn(!!! DEBUG !!! Spotlight response: {}, response);

 It seems to work when I use spark-submit to submit the application that
 includes this code.

 Funny thing is, now my relevant unit test does not run, complaining about
 not having enough memory:

 Java HotSpot(TM) 64-Bit Server VM warning: INFO:
 os::commit_memory(0xc490, 25165824, 0) failed; error='Cannot
 allocate memory' (errno=12)
 #
 # There is insufficient memory for the Java Runtime Environment to continue.
 # Native memory allocation (mmap) failed to map 25165824 bytes for
 committing reserved memory.

 --
 Emre


 On Wed, Dec 24, 2014 at 1:46 PM, Sean Owen so...@cloudera.com wrote:

 Your guess is right, that there are two incompatible versions of
 Jersey (or really, JAX-RS) in your runtime. Spark doesn't use Jersey,
 but its transitive dependencies may, or your transitive dependencies
 may.

 I don't see Jersey in Spark's dependency tree except from HBase tests,
 which in turn only appear in examples, so that's unlikely to be it.
 I'd take a look with 'mvn dependency:tree' on your own code first.
 Maybe you are including JavaEE 6 for example?

 On Wed, Dec 24, 2014 at 12:02 PM, Emre Sevinc emre.sev...@gmail.com
 wrote:
  Hello,
 
  I have a piece of code that runs inside Spark Streaming and tries to get
  some data from a RESTful web service (that runs locally on my machine).
  The
  code snippet in question is:
 
   Client client = ClientBuilder.newClient();
   WebTarget target = client.target(http://localhost:/rest;);
   target = target.path(annotate)
   .queryParam(text,
  UrlEscapers.urlFragmentEscaper().escape(spotlightSubmission))
   .queryParam(confidence, 0.3);
 
logger.warn(!!! DEBUG !!! target: {},
  target.getUri().toString());
 
String response =
 
  target.request().accept(MediaType.APPLICATION_JSON_TYPE).get(String.class);
 
logger.warn(!!! DEBUG !!! Spotlight response: {}, response);
 
  When run inside a unit test as follows:
 
   mvn clean test -Dtest=SpotlightTest#testCountWords
 
  it contacts the RESTful web service and retrieves some data as expected.
  But
  when the same code is run as part of the application that is submitted
  to
  Spark, using spark-submit script I receive the following error:
 
java.lang.NoSuchMethodError:
 
  javax.ws.rs.core.MultivaluedMap.addAll(Ljava/lang/Object;[Ljava/lang/Object;)V
 
  I'm using Spark 1.1.0 and for consuming the web service I'm using Jersey
  in
  my project's pom.xml:
 
   dependency
groupIdorg.glassfish.jersey.containers/groupId
artifactIdjersey-container-servlet-core/artifactId
version2.14/version
  /dependency
 
  So I suspect that when the application is submitted to Spark, somehow
  there's a different JAR in the environment that uses a different version
  of
  Jersey / javax.ws.rs.*
 
  Does anybody know which version of Jersey / javax.ws.rs.*  is used in
  the
  Spark environment, or how to solve this conflict?
 
 
  --
  Emre Sevinç
  https://be.linkedin.com/in/emresevinc/
 




 --
 Emre Sevinc

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Why does consuming a RESTful web service (using javax.ws.rs.* and Jersey) work in unit test but not when submitted to Spark?

2014-12-24 Thread Emre Sevinc
Sean,

Thanks a lot for the important information, especially  userClassPathFirst.

Cheers,
Emre

On Wed, Dec 24, 2014 at 3:38 PM, Sean Owen so...@cloudera.com wrote:

 That could well be it -- oops, I forgot to run with the YARN profile
 and so didn't see the YARN dependencies. Try the userClassPathFirst
 option to try to make your app's copy take precedence.

 The second error is really a JVM bug, but, is from having too little
 memory available for the unit tests.


 http://spark.apache.org/docs/latest/building-spark.html#setting-up-mavens-memory-usage

 On Wed, Dec 24, 2014 at 1:56 PM, Emre Sevinc emre.sev...@gmail.com
 wrote:
  It seems like YARN depends an older version of Jersey, that is 1.9:
 
https://github.com/apache/spark/blob/master/yarn/pom.xml
 
  When I've modified my dependencies to have only:
 
dependency
groupIdcom.sun.jersey/groupId
artifactIdjersey-core/artifactId
version1.9.1/version
  /dependency
 
  And then modified the code to use the old Jersey API:
 
  Client c = Client.create();
  WebResource r = c.resource(http://localhost:/rest;)
   .path(annotate)
   .queryParam(text,
  UrlEscapers.urlFragmentEscaper().escape(spotlightSubmission))
   .queryParam(confidence, 0.3);
 
  logger.warn(!!! DEBUG !!! target: {}, r.getURI());
 
  String response = r.accept(MediaType.APPLICATION_JSON_TYPE)
 //.header()
 .get(String.class);
 
  logger.warn(!!! DEBUG !!! Spotlight response: {}, response);
 
  It seems to work when I use spark-submit to submit the application that
  includes this code.
 
  Funny thing is, now my relevant unit test does not run, complaining about
  not having enough memory:
 
  Java HotSpot(TM) 64-Bit Server VM warning: INFO:
  os::commit_memory(0xc490, 25165824, 0) failed; error='Cannot
  allocate memory' (errno=12)
  #
  # There is insufficient memory for the Java Runtime Environment to
 continue.
  # Native memory allocation (mmap) failed to map 25165824 bytes for
  committing reserved memory.
 
  --
  Emre
 
 
  On Wed, Dec 24, 2014 at 1:46 PM, Sean Owen so...@cloudera.com wrote:
 
  Your guess is right, that there are two incompatible versions of
  Jersey (or really, JAX-RS) in your runtime. Spark doesn't use Jersey,
  but its transitive dependencies may, or your transitive dependencies
  may.
 
  I don't see Jersey in Spark's dependency tree except from HBase tests,
  which in turn only appear in examples, so that's unlikely to be it.
  I'd take a look with 'mvn dependency:tree' on your own code first.
  Maybe you are including JavaEE 6 for example?
 
  On Wed, Dec 24, 2014 at 12:02 PM, Emre Sevinc emre.sev...@gmail.com
  wrote:
   Hello,
  
   I have a piece of code that runs inside Spark Streaming and tries to
 get
   some data from a RESTful web service (that runs locally on my
 machine).
   The
   code snippet in question is:
  
Client client = ClientBuilder.newClient();
WebTarget target = client.target(http://localhost:/rest;);
target = target.path(annotate)
.queryParam(text,
   UrlEscapers.urlFragmentEscaper().escape(spotlightSubmission))
.queryParam(confidence, 0.3);
  
 logger.warn(!!! DEBUG !!! target: {},
   target.getUri().toString());
  
 String response =
  
  
 target.request().accept(MediaType.APPLICATION_JSON_TYPE).get(String.class);
  
 logger.warn(!!! DEBUG !!! Spotlight response: {}, response);
  
   When run inside a unit test as follows:
  
mvn clean test -Dtest=SpotlightTest#testCountWords
  
   it contacts the RESTful web service and retrieves some data as
 expected.
   But
   when the same code is run as part of the application that is submitted
   to
   Spark, using spark-submit script I receive the following error:
  
 java.lang.NoSuchMethodError:
  
  
 javax.ws.rs.core.MultivaluedMap.addAll(Ljava/lang/Object;[Ljava/lang/Object;)V
  
   I'm using Spark 1.1.0 and for consuming the web service I'm using
 Jersey
   in
   my project's pom.xml:
  
dependency
 groupIdorg.glassfish.jersey.containers/groupId
 artifactIdjersey-container-servlet-core/artifactId
 version2.14/version
   /dependency
  
   So I suspect that when the application is submitted to Spark, somehow
   there's a different JAR in the environment that uses a different
 version
   of
   Jersey / javax.ws.rs.*
  
   Does anybody know which version of Jersey / javax.ws.rs.*  is used in
   the
   Spark environment, or how to solve this conflict?
  
  
   --
   Emre Sevinç
   https://be.linkedin.com/in/emresevinc/
  
 
 
 
 
  --
  Emre Sevinc




-- 
Emre Sevinc


How can I make Spark Streaming count the words in a file in a unit test?

2014-12-08 Thread Emre Sevinc
Hello,

I've successfully built a very simple Spark Streaming application in Java
that is based on the HdfsCount example in Scala at
https://github.com/apache/spark/blob/branch-1.1/examples/src/main/scala/org/apache/spark/examples/streaming/HdfsWordCount.scala
.

When I submit this application to my local Spark, it waits for a file to be
written to a given directory, and when I create that file it successfully
prints the number of words. I terminate the application by pressing Ctrl+C.

Now I've tried to create a very basic unit test for this functionality, but
in the test I was not able to print the same information, that is the
number of words.

What am I missing?

Below is the unit test file, and after that I've also included the code
snippet that shows the countWords method:

=
StarterAppTest.java
=
import com.google.common.io.Files;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;


import org.junit.*;

import java.io.*;

public class StarterAppTest {

  JavaStreamingContext ssc;
  File tempDir;

  @Before
  public void setUp() {
    ssc = new JavaStreamingContext("local", "test", new Duration(3000));
tempDir = Files.createTempDir();
tempDir.deleteOnExit();
  }

  @After
  public void tearDown() {
ssc.stop();
ssc = null;
  }

  @Test
  public void testInitialization() {
Assert.assertNotNull(ssc.sc());
  }


  @Test
  public void testCountWords() {

StarterApp starterApp = new StarterApp();

try {
      JavaDStream<String> lines = ssc.textFileStream(tempDir.getAbsolutePath());
      JavaPairDStream<String, Integer> wordCounts = starterApp.countWords(lines);

      System.err.println("= Word Counts ===");
      wordCounts.print();
      System.err.println("= Word Counts ===");

      ssc.start();

      File tmpFile = new File(tempDir.getAbsolutePath(), "tmp.txt");
      PrintWriter writer = new PrintWriter(tmpFile, "UTF-8");
      writer.println("8-Dec-2014: Emre Emre Emre Ergin Ergin Ergin");
      writer.close();

      System.err.println("= Word Counts ===");
      wordCounts.print();
      System.err.println("= Word Counts ===");

} catch (FileNotFoundException e) {
  e.printStackTrace();
} catch (UnsupportedEncodingException e) {
  e.printStackTrace();
}


Assert.assertTrue(true);

  }

}
=

This test compiles and starts to run; Spark Streaming prints a lot of
diagnostic messages on the console, but the calls to wordCounts.print()
do not print anything, whereas in StarterApp.java itself they do.

I've also added ssc.awaitTermination(); after ssc.start() but nothing
changed in that respect. After that I've also tried to create a new file in
the directory that this Spark Streaming application was checking but this
time it gave an error.

For completeness, below is the wordCounts method:


  public JavaPairDStream<String, Integer> countWords(JavaDStream<String> lines) {
    JavaDStream<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
      @Override
      public Iterable<String> call(String x) { return Lists.newArrayList(SPACE.split(x)); }
    });

    JavaPairDStream<String, Integer> wordCounts = words.mapToPair(
        new PairFunction<String, String, Integer>() {
          @Override
          public Tuple2<String, Integer> call(String s) { return new Tuple2<>(s, 1); }
        }).reduceByKey((i1, i2) -> i1 + i2);

    return wordCounts;
  }




Kind regards
Emre Sevinç


Re: How can I make Spark Streaming count the words in a file in a unit test?

2014-12-08 Thread Burak Yavuz
Hi,

https://github.com/databricks/spark-perf/tree/master/streaming-tests/src/main/scala/streaming/perf
contains some performance tests for streaming. There are examples of how to 
generate synthetic files during the test in that repo, maybe you
can find some code snippets that you can use there.
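
In case it helps, here is a rough, self-contained sketch of the same idea
(not taken from that repo; the object name, sample sentence and timing values
are made up): write the synthetic file into the monitored directory only
after the context has started, and collect each batch's output on the driver
so the test can assert on real data instead of relying on print().

import java.io.File
import java.nio.charset.StandardCharsets

import com.google.common.io.Files
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

import scala.collection.mutable

object StreamingWordCountHarness {

  // Runs a word count over a monitored temp directory and returns the word
  // counts observed within the timeout. Only plain Spark and Guava APIs are used.
  def observedWordCounts(timeoutMs: Long = 15000L): Map[String, Int] = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("streaming-test")
    val ssc = new StreamingContext(conf, Seconds(1))
    val dir = Files.createTempDir()
    val seen = mutable.ArrayBuffer.empty[String]

    // Collect every batch on the driver so the test can inspect it.
    ssc.textFileStream(dir.getAbsolutePath).foreachRDD { rdd =>
      val words = rdd.collect().flatMap(_.split("\\s+"))
      seen.synchronized { seen ++= words }
    }

    ssc.start()
    // Create the input file only after start(), so the first batch sees it as new.
    Files.write("Emre Emre Emre Ergin Ergin Ergin",
      new File(dir, "input.txt"), StandardCharsets.UTF_8)

    // Poll until at least one batch has produced output, or give up.
    val deadline = System.currentTimeMillis() + timeoutMs
    while (System.currentTimeMillis() < deadline && seen.synchronized(seen.isEmpty)) {
      Thread.sleep(200)
    }
    ssc.stop()

    seen.synchronized(seen.groupBy(identity).mapValues(_.size).toMap)
  }
}

A test can then simply assert on the returned map, e.g. that "Emre" maps to 3.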

Best,
Burak

- Original Message -
From: Emre Sevinc emre.sev...@gmail.com
To: user@spark.apache.org
Sent: Monday, December 8, 2014 2:36:41 AM
Subject: How can I make Spark Streaming count the words in a file in a unit 
test?

Hello,

I've successfully built a very simple Spark Streaming application in Java
that is based on the HdfsCount example in Scala at
https://github.com/apache/spark/blob/branch-1.1/examples/src/main/scala/org/apache/spark/examples/streaming/HdfsWordCount.scala
.

When I submit this application to my local Spark, it waits for a file to be
written to a given directory, and when I create that file it successfully
prints the number of words. I terminate the application by pressing Ctrl+C.

Now I've tried to create a very basic unit test for this functionality, but
in the test I was not able to print the same information, that is the
number of words.

What am I missing?

Below is the unit test file, and after that I've also included the code
snippet that shows the countWords method:

=
StarterAppTest.java
=
import com.google.common.io.Files;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;


import org.junit.*;

import java.io.*;

public class StarterAppTest {

  JavaStreamingContext ssc;
  File tempDir;

  @Before
  public void setUp() {
    ssc = new JavaStreamingContext("local", "test", new Duration(3000));
tempDir = Files.createTempDir();
tempDir.deleteOnExit();
  }

  @After
  public void tearDown() {
ssc.stop();
ssc = null;
  }

  @Test
  public void testInitialization() {
Assert.assertNotNull(ssc.sc());
  }


  @Test
  public void testCountWords() {

StarterApp starterApp = new StarterApp();

try {
      JavaDStream<String> lines = ssc.textFileStream(tempDir.getAbsolutePath());
      JavaPairDStream<String, Integer> wordCounts = starterApp.countWords(lines);

      System.err.println("= Word Counts ===");
      wordCounts.print();
      System.err.println("= Word Counts ===");

      ssc.start();

      File tmpFile = new File(tempDir.getAbsolutePath(), "tmp.txt");
      PrintWriter writer = new PrintWriter(tmpFile, "UTF-8");
      writer.println("8-Dec-2014: Emre Emre Emre Ergin Ergin Ergin");
      writer.close();

      System.err.println("= Word Counts ===");
      wordCounts.print();
      System.err.println("= Word Counts ===");

} catch (FileNotFoundException e) {
  e.printStackTrace();
} catch (UnsupportedEncodingException e) {
  e.printStackTrace();
}


Assert.assertTrue(true);

  }

}
=

This test compiles and starts to run, Spark Streaming prints a lot of
diagnostic messages on the console but the calls to wordCounts.print();
does not print anything, whereas in StarterApp.java itself, they do.

I've also added ssc.awaitTermination(); after ssc.start() but nothing
changed in that respect. After that I've also tried to create a new file in
the directory that this Spark Streaming application was checking but this
time it gave an error.

For completeness, below is the wordCounts method:


  public JavaPairDStream<String, Integer> countWords(JavaDStream<String> lines) {
    JavaDStream<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
      @Override
      public Iterable<String> call(String x) { return Lists.newArrayList(SPACE.split(x)); }
    });

    JavaPairDStream<String, Integer> wordCounts = words.mapToPair(
        new PairFunction<String, String, Integer>() {
          @Override
          public Tuple2<String, Integer> call(String s) { return new Tuple2<>(s, 1); }
        }).reduceByKey((i1, i2) -> i1 + i2);

    return wordCounts;
  }




Kind regards
Emre Sevinç


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Cannot run unit test.

2014-09-17 Thread Jies
When I run `sbt test-only SparkTest` or `sbt test-only SparkTest1`, it
was pass. But run `set test` to tests SparkTest and SparkTest1, it was
failed.

If merge all cases into one file, the test was pass.
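
(For what it's worth, this pattern usually means that the two suites each
create their own SparkContext and sbt runs them in parallel. Below is a
minimal build.sbt fragment that forces sequential test execution; a sketch,
assuming an sbt build, not taken from the poster's project.)

// build.sbt (fragment): run test suites one at a time so that only one
// local SparkContext exists in the JVM at any moment.
parallelExecution in Test := false

// Optionally run tests in a forked JVM as well, to keep state from leaking
// between sbt sessions.
fork in Test := true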



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Cannot-run-unit-test-tp14459p14506.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Unit Test for Spark Streaming

2014-08-08 Thread JiajiaJing
Hi TD,

I tried some different Maven setups these days, and now I at least get some
output when running mvn test. However, it seems like ScalaTest cannot
find the test cases specified in the test suite.
Here is the output I get: 
http://apache-spark-user-list.1001560.n3.nabble.com/file/n11825/Screen_Shot_2014-08-08_at_5.png
 

Could you please give me some details on how you set up ScalaTest on your
machine? I believe there must be some other setup issue on my machine but I
cannot figure out why...
And here are the plugins and dependencies related to scalatest in my pom.xml
:
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <version>2.7</version>
  <configuration>
    <skipTests>true</skipTests>
  </configuration>
</plugin>

<plugin>
  <groupId>org.scalatest</groupId>
  <artifactId>scalatest-maven-plugin</artifactId>
  <version>1.0</version>
  <configuration>
    <reportsDirectory>${project.build.directory}/surefire-reports</reportsDirectory>
    <junitxml>.</junitxml>
    <filereports>${project.build.directory}/SparkTestSuite.txt</filereports>
    <tagsToInclude>ATag</tagsToInclude>
    <systemProperties>
      <java.awt.headless>true</java.awt.headless>
      <spark.test.home>${session.executionRootDirectory}</spark.test.home>
      <spark.testing>1</spark.testing>
    </systemProperties>
  </configuration>
  <executions>
    <execution>
      <id>test</id>
      <goals>
        <goal>test</goal>
      </goals>
    </execution>
  </executions>
</plugin>


<dependency>
  <groupId>junit</groupId>
  <artifactId>junit</artifactId>
  <version>4.8.1</version>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>org.scalatest</groupId>
  <artifactId>scalatest_2.10</artifactId>
  <version>2.2.1</version>
  <scope>test</scope>
</dependency>

Thank you very much!

Best Regards,

Jiajia



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Unit-Test-for-Spark-Streaming-tp11394p11825.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Unit Test for Spark Streaming

2014-08-06 Thread JiajiaJing
Thank you TD,

I have worked around that problem and now the test compiles.
However, I don't actually see the test running. When I do mvn test, it
just says BUILD SUCCESS, without any TEST section on stdout.
Are we supposed to use mvn test to run the test? Are there any other
methods that can be used to run this test?





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Unit-Test-for-Spark-Streaming-tp11394p11570.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Unit Test for Spark Streaming

2014-08-06 Thread Tathagata Das
Does it not show the name of the test suite on stdout, showing that it has
passed? Can you try writing a small unit test, in the same way as
your Kafka unit test, and with print statements on stdout ... to see
whether it works? I believe it is some configuration issue in Maven, which
is hard for me to guess.
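
Something as small as this would be enough; a sketch, assuming the
scalatest_2.10 dependency from the pom above, and note that the plugin
configuration there only includes tests tagged ATag, so the smoke test
carries that tag:

import org.scalatest.{FunSuite, Tag}

// The tag name must match the tagsToInclude filter in the pom,
// otherwise the scalatest-maven-plugin silently skips the test.
object ATag extends Tag("ATag")

class SmokeTestSuite extends FunSuite {
  test("scalatest runs and prints to stdout", ATag) {
    println("hello from SmokeTestSuite")
    assert(1 + 1 === 2)
  }
}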

TD


On Wed, Aug 6, 2014 at 12:53 PM, JiajiaJing jj.jing0...@gmail.com wrote:

 Thank you TD,

 I have worked around that problem and now the test compiles.
 However, I don't actually see that test running. As when I do mvn test,
 it
 just says BUILD SUCCESS, without any TEST section on stdout.
 Are we suppose to use mvn test to run the test? Are there any other
 methods can be used to run this test?





 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/Unit-Test-for-Spark-Streaming-tp11394p11570.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




Re: Unit Test for Spark Streaming

2014-08-05 Thread Tathagata Das
That function simply deletes a directory recursively. You can use
alternative libraries; see this discussion:
http://stackoverflow.com/questions/779519/delete-files-recursively-in-java
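
For example, a small self-contained helper along these lines (just a sketch
using plain java.io, not the Spark-internal Utils):

import java.io.{File, IOException}

// Deletes a file or a whole directory tree.
def deleteRecursively(file: File): Unit = {
  if (file.isDirectory) {
    Option(file.listFiles()).getOrElse(Array.empty[File]).foreach(deleteRecursively)
  }
  if (file.exists() && !file.delete()) {
    throw new IOException("Failed to delete " + file.getAbsolutePath)
  }
}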


On Tue, Aug 5, 2014 at 5:02 PM, JiajiaJing jj.jing0...@gmail.com wrote:
 Hi TD,

 I encountered a problem when trying to run the KafkaStreamSuite.scala unit
 test.
 I added scalatest-maven-plugin to my pom.xml, then ran mvn test, and got
 the follow error message:

 error: object Utils in package util cannot be accessed in package
 org.apache.spark.util
 [INFO] brokerConf.logDirs.foreach { f =>
 Utils.deleteRecursively(new File(f)) }
 [INFO]^

 I checked that Utils.scala does exist under
 spark/core/src/main/scala/org/apache/spark/util/, so I have no idea
 why I get this access error.
 Could you please help me with this?

 Thank you very much!

 Best Regards,

 Jiajia



 --
 View this message in context: 
 http://apache-spark-user-list.1001560.n3.nabble.com/Unit-Test-for-Spark-Streaming-tp11394p11505.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Unit Test for Spark Streaming

2014-08-04 Thread JiajiaJing
Hello Spark Users,

I have a Spark Streaming program that streams data from Kafka topics and
writes the output as Parquet files on HDFS.
Now I want to write a unit test for this program to make sure the output
data is correct (i.e. not missing any data from Kafka).
However, I have no idea how to do this, especially how to mock a Kafka
topic.
Can someone help me with this?

Thank you very much!

Best Regards,

Jiajia



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Unit-Test-for-Spark-Streaming-tp11394.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Unit Test for Spark Streaming

2014-08-04 Thread Tathagata Das
Appropriately timed question! Here is the PR that adds a real unit
test for Kafka stream in Spark Streaming. Maybe this will help!

https://github.com/apache/spark/pull/1751/files

On Mon, Aug 4, 2014 at 6:30 PM, JiajiaJing jj.jing0...@gmail.com wrote:
 Hello Spark Users,

 I have a spark streaming program that stream data from kafka topics and
 output as parquet file on HDFS.
 Now I want to write a unit test for this program to make sure the output
 data is correct (i.e not missing any data from kafka).
 However, I have no idea about how to do this, especially how to mock a kafka
 topic.
 Can someone help me with this?

 Thank you very much!

 Best Regards,

 Jiajia



 --
 View this message in context: 
 http://apache-spark-user-list.1001560.n3.nabble.com/Unit-Test-for-Spark-Streaming-tp11394.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Unit Test for Spark Streaming

2014-08-04 Thread JiajiaJing
This helps a lot!!
Thank you very much!

Jiajia



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Unit-Test-for-Spark-Streaming-tp11394p11396.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Run spark unit test on Windows 7

2014-07-03 Thread Konstantin Kudryavtsev
 fine on
 cluster (YARN, linux machines). However, when I'm trying to run it on local
 machine (*Windows 7*) under unit test, I got errors:

 java.io.IOException: Could not locate executable null\bin\winutils.exe in 
 the Hadoop binaries.
 at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318)
 at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333)
 at org.apache.hadoop.util.Shell.clinit(Shell.java:326)
 at org.apache.hadoop.util.StringUtils.clinit(StringUtils.java:76)
 at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93)


 My code is following:

 @Test
 def testETL() = {
 val conf = new SparkConf()
 val sc = new SparkContext("local", "test", conf)
 try {
 val etl = new IxtoolsDailyAgg() // empty constructor

 val data = sc.parallelize(List("in1", "in2", "in3"))

 etl.etl(data) // rdd transformation, no access to SparkContext or 
 Hadoop
 Assert.assertTrue(true)
 } finally {
 if(sc != null)
 sc.stop()
 }
 }


 Why is it trying to access hadoop at all? and how can I fix it? Thank
 you in advance

 Thank you,
 Konstantin Kudryavtsev








Re: Run spark unit test on Windows 7

2014-07-03 Thread Denny Lee
?

Andrew


2014-07-02 9:38 GMT-07:00 Konstantin Kudryavtsev 
kudryavtsev.konstan...@gmail.com:

Hi all,

I'm trying to run some transformation on Spark, it works fine on cluster (YARN, 
linux machines). However, when I'm trying to run it on local machine (Windows 
7) under unit test, I got errors:


java.io.IOException: Could not locate executable null\bin\winutils.exe in the 
Hadoop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318)
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333)
at org.apache.hadoop.util.Shell.clinit(Shell.java:326)
at org.apache.hadoop.util.StringUtils.clinit(StringUtils.java:76)
at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93)

My code is following:


@Test
def testETL() = {
val conf = new SparkConf()
 val sc = new SparkContext("local", "test", conf)
try {
val etl = new IxtoolsDailyAgg() // empty constructor

 val data = sc.parallelize(List("in1", "in2", "in3"))

etl.etl(data) // rdd transformation, no access to SparkContext or Hadoop
Assert.assertTrue(true)
} finally {
if(sc != null)
sc.stop()
}
}

Why is it trying to access hadoop at all? and how can I fix it? Thank you in 
advance

Thank you,
Konstantin Kudryavtsev







Re: Run spark unit test on Windows 7

2014-07-03 Thread Kostiantyn Kudriavtsev
)
 
 
 Thank you,
 Konstantin Kudryavtsev
 
 
 On Wed, Jul 2, 2014 at 8:15 PM, Andrew Or and...@databricks.com wrote:
 Hi Konstatin,
 
 We use hadoop as a library in a few places in Spark. I wonder why the path 
 includes null though.
 
 Could you provide the full stack trace?
 
 Andrew
 
 
 2014-07-02 9:38 GMT-07:00 Konstantin Kudryavtsev 
 kudryavtsev.konstan...@gmail.com:
 
 Hi all,
 
 I'm trying to run some transformation on Spark, it works fine on cluster 
 (YARN, linux machines). However, when I'm trying to run it on local 
 machine (Windows 7) under unit test, I got errors:
 java.io.IOException: Could not locate executable null\bin\winutils.exe in 
 the Hadoop binaries.
 at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318)
 at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333)
 at org.apache.hadoop.util.Shell.clinit(Shell.java:326)
 at org.apache.hadoop.util.StringUtils.clinit(StringUtils.java:76)
 at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93)
 
 My code is following:
 @Test
 def testETL() = {
 val conf = new SparkConf()
 val sc = new SparkContext("local", "test", conf)
 try {
 val etl = new IxtoolsDailyAgg() // empty constructor
 
 val data = sc.parallelize(List("in1", "in2", "in3"))
 
 etl.etl(data) // rdd transformation, no access to SparkContext or 
 Hadoop
 Assert.assertTrue(true)
 } finally {
 if(sc != null)
 sc.stop()
 }
 }
 
 Why is it trying to access hadoop at all? and how can I fix it? Thank you 
 in advance
 
 Thank you,
 Konstantin Kudryavtsev



Re: Run spark unit test on Windows 7

2014-07-03 Thread Denny Lee
(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
 
 
 Thank you,
 Konstantin Kudryavtsev
 
 
 On Wed, Jul 2, 2014 at 8:15 PM, Andrew Or and...@databricks.com wrote:
 Hi Konstatin,
 
 We use hadoop as a library in a few places in Spark. I wonder why the 
 path includes null though.
 
 Could you provide the full stack trace?
 
 Andrew
 
 
 2014-07-02 9:38 GMT-07:00 Konstantin Kudryavtsev 
 kudryavtsev.konstan...@gmail.com:
 
 Hi all,
 
 I'm trying to run some transformation on Spark, it works fine on 
 cluster (YARN, linux machines). However, when I'm trying to run it on 
 local machine (Windows 7) under unit test, I got errors:
 java.io.IOException: Could not locate executable null\bin\winutils.exe 
 in the Hadoop binaries.
 at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318)
 at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333)
 at org.apache.hadoop.util.Shell.clinit(Shell.java:326)
 at org.apache.hadoop.util.StringUtils.clinit(StringUtils.java:76)
 at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93)
 
 My code is following:
 @Test
 def testETL() = {
 val conf = new SparkConf()
 val sc = new SparkContext("local", "test", conf)
 try {
 val etl = new IxtoolsDailyAgg() // empty constructor
 
 val data = sc.parallelize(List("in1", "in2", "in3"))
 
 etl.etl(data) // rdd transformation, no access to SparkContext 
 or Hadoop
 Assert.assertTrue(true)
 } finally {
 if(sc != null)
 sc.stop()
 }
 }
 
 Why is it trying to access hadoop at all? and how can I fix it? Thank 
 you in advance
 
 Thank you,
 Konstantin Kudryavtsev
 


Run spark unit test on Windows 7

2014-07-02 Thread Konstantin Kudryavtsev
Hi all,

I'm trying to run some transformation on *Spark*, it works fine on cluster
(YARN, linux machines). However, when I'm trying to run it on local machine
(*Windows 7*) under unit test, I got errors:

java.io.IOException: Could not locate executable null\bin\winutils.exe
in the Hadoop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318)
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:326)
at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:76)
at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93)


My code is following:

@Test
def testETL() = {
val conf = new SparkConf()
val sc = new SparkContext("local", "test", conf)
try {
val etl = new IxtoolsDailyAgg() // empty constructor

val data = sc.parallelize(List("in1", "in2", "in3"))

etl.etl(data) // rdd transformation, no access to SparkContext or Hadoop
Assert.assertTrue(true)
} finally {
if(sc != null)
sc.stop()
}
}


Why is it trying to access hadoop at all? and how can I fix it? Thank you
in advance

Thank you,
Konstantin Kudryavtsev


Re: Run spark unit test on Windows 7

2014-07-02 Thread Andrew Or
Hi Konstatin,

We use hadoop as a library in a few places in Spark. I wonder why the path
includes null though.

Could you provide the full stack trace?

Andrew


2014-07-02 9:38 GMT-07:00 Konstantin Kudryavtsev 
kudryavtsev.konstan...@gmail.com:

 Hi all,

 I'm trying to run some transformation on *Spark*, it works fine on
 cluster (YARN, linux machines). However, when I'm trying to run it on local
 machine (*Windows 7*) under unit test, I got errors:

 java.io.IOException: Could not locate executable null\bin\winutils.exe in the 
 Hadoop binaries.
 at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318)
 at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333)
 at org.apache.hadoop.util.Shell.clinit(Shell.java:326)
 at org.apache.hadoop.util.StringUtils.clinit(StringUtils.java:76)
 at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93)


 My code is following:

 @Test
 def testETL() = {
 val conf = new SparkConf()
 val sc = new SparkContext("local", "test", conf)
 try {
 val etl = new IxtoolsDailyAgg() // empty constructor

 val data = sc.parallelize(List("in1", "in2", "in3"))

 etl.etl(data) // rdd transformation, no access to SparkContext or 
 Hadoop
 Assert.assertTrue(true)
 } finally {
 if(sc != null)
 sc.stop()
 }
 }


 Why is it trying to access hadoop at all? and how can I fix it? Thank you
 in advance

 Thank you,
 Konstantin Kudryavtsev



Re: Run spark unit test on Windows 7

2014-07-02 Thread Konstantin Kudryavtsev
Hi Andrew,

it's windows 7 and I doesn't set up any env variables here

The full stack trace:

14/07/02 19:59:31 WARN NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
14/07/02 19:59:31 ERROR Shell: Failed to locate the winutils binary in the
hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in
the Hadoop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318)
 at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:326)
at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:76)
at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93)
at org.apache.hadoop.security.Groups.<init>(Groups.java:77)
at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:240)
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:255)
at org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:283)
at org.apache.spark.deploy.SparkHadoopUtil.<init>(SparkHadoopUtil.scala:36)
at org.apache.spark.deploy.SparkHadoopUtil$.<init>(SparkHadoopUtil.scala:109)
at org.apache.spark.deploy.SparkHadoopUtil$.<clinit>(SparkHadoopUtil.scala)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:228)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:97)
at my.example.EtlTest.testETL(IxtoolsDailyAggTest.scala:13)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
 at junit.framework.TestCase.runTest(TestCase.java:168)
at junit.framework.TestCase.runBare(TestCase.java:134)
 at junit.framework.TestResult$1.protect(TestResult.java:110)
at junit.framework.TestResult.runProtected(TestResult.java:128)
 at junit.framework.TestResult.run(TestResult.java:113)
at junit.framework.TestCase.run(TestCase.java:124)
 at junit.framework.TestSuite.runTest(TestSuite.java:232)
at junit.framework.TestSuite.run(TestSuite.java:227)
 at
org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:81)
at org.junit.runner.JUnitCore.run(JUnitCore.java:130)
 at
com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74)
at
com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:211)
 at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:67)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)


Thank you,
Konstantin Kudryavtsev


On Wed, Jul 2, 2014 at 8:15 PM, Andrew Or and...@databricks.com wrote:

 Hi Konstatin,

 We use hadoop as a library in a few places in Spark. I wonder why the path
 includes null though.

 Could you provide the full stack trace?

 Andrew


 2014-07-02 9:38 GMT-07:00 Konstantin Kudryavtsev 
 kudryavtsev.konstan...@gmail.com:

 Hi all,

 I'm trying to run some transformation on *Spark*, it works fine on
 cluster (YARN, linux machines). However, when I'm trying to run it on local
 machine (*Windows 7*) under unit test, I got errors:


 java.io.IOException: Could not locate executable null\bin\winutils.exe in 
 the Hadoop binaries.
 at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318)
 at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333)
 at org.apache.hadoop.util.Shell.clinit(Shell.java:326)
 at org.apache.hadoop.util.StringUtils.clinit(StringUtils.java:76)
 at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93)


 My code is following:


 @Test
 def testETL() = {
 val conf = new SparkConf()
 val sc = new SparkContext("local", "test", conf)
 try {
 val etl = new IxtoolsDailyAgg() // empty constructor

 val data = sc.parallelize(List("in1", "in2", "in3"))

 etl.etl(data) // rdd transformation, no access to SparkContext or 
 Hadoop
 Assert.assertTrue(true)
 } finally {
 if(sc != null)
 sc.stop()
 }
 }


 Why is it trying to access hadoop at all? and how can I fix it? Thank you
 in advance

 Thank you,
 Konstantin Kudryavtsev





Re: Run spark unit test on Windows 7

2014-07-02 Thread Denny Lee
By any chance do you have HDP 2.1 installed? You may need to install the utils
and update the env variables per
http://stackoverflow.com/questions/18630019/running-apache-hadoop-2-1-0-on-windows
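
If you only need the error to go away for local unit tests, pointing
hadoop.home.dir at a directory that contains bin\winutils.exe before the
first SparkContext is created is usually enough. A sketch (the C:\hadoop
path is an assumption; use wherever the binary was actually placed):

// In the test setup, before any Spark/Hadoop class is initialized.
// Assumes winutils.exe was downloaded to C:\hadoop\bin\winutils.exe.
System.setProperty("hadoop.home.dir", "C:\\hadoop")

val sc = new org.apache.spark.SparkContext("local", "test")
try {
  // ... run the code under test ...
} finally {
  sc.stop()
}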


 On Jul 2, 2014, at 10:20 AM, Konstantin Kudryavtsev 
 kudryavtsev.konstan...@gmail.com wrote:
 
 Hi Andrew,
 
 it's windows 7 and I doesn't set up any env variables here 
 
 The full stack trace:
 
 14/07/02 19:59:31 WARN NativeCodeLoader: Unable to load native-hadoop library 
 for your platform... using builtin-java classes where applicable
 14/07/02 19:59:31 ERROR Shell: Failed to locate the winutils binary in the 
 hadoop binary path
 java.io.IOException: Could not locate executable null\bin\winutils.exe in the 
 Hadoop binaries.
   at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318)
   at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333)
   at org.apache.hadoop.util.Shell.clinit(Shell.java:326)
   at org.apache.hadoop.util.StringUtils.clinit(StringUtils.java:76)
   at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93)
   at org.apache.hadoop.security.Groups.init(Groups.java:77)
   at 
 org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:240)
   at 
 org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:255)
   at 
 org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:283)
   at 
 org.apache.spark.deploy.SparkHadoopUtil.init(SparkHadoopUtil.scala:36)
   at 
 org.apache.spark.deploy.SparkHadoopUtil$.init(SparkHadoopUtil.scala:109)
   at 
 org.apache.spark.deploy.SparkHadoopUtil$.clinit(SparkHadoopUtil.scala)
   at org.apache.spark.SparkContext.init(SparkContext.scala:228)
   at org.apache.spark.SparkContext.init(SparkContext.scala:97)
   at my.example.EtlTest.testETL(IxtoolsDailyAggTest.scala:13)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at junit.framework.TestCase.runTest(TestCase.java:168)
   at junit.framework.TestCase.runBare(TestCase.java:134)
   at junit.framework.TestResult$1.protect(TestResult.java:110)
   at junit.framework.TestResult.runProtected(TestResult.java:128)
   at junit.framework.TestResult.run(TestResult.java:113)
   at junit.framework.TestCase.run(TestCase.java:124)
   at junit.framework.TestSuite.runTest(TestSuite.java:232)
   at junit.framework.TestSuite.run(TestSuite.java:227)
   at 
 org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:81)
   at org.junit.runner.JUnitCore.run(JUnitCore.java:130)
   at 
 com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74)
   at 
 com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:211)
   at 
 com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:67)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
 
 
 Thank you,
 Konstantin Kudryavtsev
 
 
 On Wed, Jul 2, 2014 at 8:15 PM, Andrew Or and...@databricks.com wrote:
 Hi Konstatin,
 
 We use hadoop as a library in a few places in Spark. I wonder why the path 
 includes null though.
 
 Could you provide the full stack trace?
 
 Andrew
 
 
 2014-07-02 9:38 GMT-07:00 Konstantin Kudryavtsev 
 kudryavtsev.konstan...@gmail.com:
 
 Hi all,
 
 I'm trying to run some transformation on Spark, it works fine on cluster 
 (YARN, linux machines). However, when I'm trying to run it on local machine 
 (Windows 7) under unit test, I got errors:
 
 
 java.io.IOException: Could not locate executable null\bin\winutils.exe in 
 the Hadoop binaries.
 at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318)
 at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333)
 at org.apache.hadoop.util.Shell.clinit(Shell.java:326)
 at org.apache.hadoop.util.StringUtils.clinit(StringUtils.java:76)
 at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93)
 
 My code is following:
 
 
 @Test
 def testETL() = {
 val conf = new SparkConf()
 val sc = new SparkContext("local", "test", conf)
 try {
 val etl = new IxtoolsDailyAgg() // empty constructor
 
 val data = sc.parallelize(List("in1", "in2", "in3"))
 
 etl.etl(data) // rdd transformation, no access to SparkContext or 
 Hadoop
 Assert.assertTrue(true

Re: Run spark unit test on Windows 7

2014-07-02 Thread Kostiantyn Kudriavtsev
No, I don’t

why do I need to have HDP installed? I don’t use Hadoop at all and I’d like to 
read data from local filesystem

On Jul 2, 2014, at 9:10 PM, Denny Lee denny.g@gmail.com wrote:

 By any chance do you have HDP 2.1 installed? you may need to install the 
 utils and update the env variables per 
 http://stackoverflow.com/questions/18630019/running-apache-hadoop-2-1-0-on-windows
 
 
 On Jul 2, 2014, at 10:20 AM, Konstantin Kudryavtsev 
 kudryavtsev.konstan...@gmail.com wrote:
 
 Hi Andrew,
 
 it's windows 7 and I doesn't set up any env variables here 
 
 The full stack trace:
 
 14/07/02 19:59:31 WARN NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 14/07/02 19:59:31 ERROR Shell: Failed to locate the winutils binary in the 
 hadoop binary path
 java.io.IOException: Could not locate executable null\bin\winutils.exe in 
 the Hadoop binaries.
  at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318)
  at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333)
  at org.apache.hadoop.util.Shell.clinit(Shell.java:326)
  at org.apache.hadoop.util.StringUtils.clinit(StringUtils.java:76)
  at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93)
  at org.apache.hadoop.security.Groups.init(Groups.java:77)
  at 
 org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:240)
  at 
 org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:255)
  at 
 org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:283)
  at 
 org.apache.spark.deploy.SparkHadoopUtil.init(SparkHadoopUtil.scala:36)
  at 
 org.apache.spark.deploy.SparkHadoopUtil$.init(SparkHadoopUtil.scala:109)
  at 
 org.apache.spark.deploy.SparkHadoopUtil$.clinit(SparkHadoopUtil.scala)
  at org.apache.spark.SparkContext.init(SparkContext.scala:228)
  at org.apache.spark.SparkContext.init(SparkContext.scala:97)
  at my.example.EtlTest.testETL(IxtoolsDailyAggTest.scala:13)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at junit.framework.TestCase.runTest(TestCase.java:168)
  at junit.framework.TestCase.runBare(TestCase.java:134)
  at junit.framework.TestResult$1.protect(TestResult.java:110)
  at junit.framework.TestResult.runProtected(TestResult.java:128)
  at junit.framework.TestResult.run(TestResult.java:113)
  at junit.framework.TestCase.run(TestCase.java:124)
  at junit.framework.TestSuite.runTest(TestSuite.java:232)
  at junit.framework.TestSuite.run(TestSuite.java:227)
  at 
 org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:81)
  at org.junit.runner.JUnitCore.run(JUnitCore.java:130)
  at 
 com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74)
  at 
 com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:211)
  at 
 com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:67)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
 
 
 Thank you,
 Konstantin Kudryavtsev
 
 
 On Wed, Jul 2, 2014 at 8:15 PM, Andrew Or and...@databricks.com wrote:
 Hi Konstatin,
 
 We use hadoop as a library in a few places in Spark. I wonder why the path 
 includes null though.
 
 Could you provide the full stack trace?
 
 Andrew
 
 
 2014-07-02 9:38 GMT-07:00 Konstantin Kudryavtsev 
 kudryavtsev.konstan...@gmail.com:
 
 Hi all,
 
 I'm trying to run some transformation on Spark, it works fine on cluster 
 (YARN, linux machines). However, when I'm trying to run it on local machine 
 (Windows 7) under unit test, I got errors:
 
 java.io.IOException: Could not locate executable null\bin\winutils.exe in 
 the Hadoop binaries.
 at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318)
 at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333)
 at org.apache.hadoop.util.Shell.clinit(Shell.java:326)
 at org.apache.hadoop.util.StringUtils.clinit(StringUtils.java:76)
 at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93)
 
 My code is following:
 
 @Test
 def testETL() = {
 val conf = new SparkConf()
 val sc = new SparkContext("local", "test", conf)
 try {
 val etl = new IxtoolsDailyAgg() // empty constructor
 
 val data

Re: Run spark unit test on Windows 7

2014-07-02 Thread Denny Lee
You don't actually need it per se; it's just that some of the Spark
libraries reference Hadoop libraries even if they ultimately do not
call them. When I was doing some early builds of Spark on Windows, I
admittedly had Hadoop on Windows running as well and had not run into this
particular issue.



On Wed, Jul 2, 2014 at 12:04 PM, Kostiantyn Kudriavtsev 
kudryavtsev.konstan...@gmail.com wrote:

 No, I don’t

 why do I need to have HDP installed? I don’t use Hadoop at all and I’d
 like to read data from local filesystem

 On Jul 2, 2014, at 9:10 PM, Denny Lee denny.g@gmail.com wrote:

 By any chance do you have HDP 2.1 installed? you may need to install the
 utils and update the env variables per
 http://stackoverflow.com/questions/18630019/running-apache-hadoop-2-1-0-on-windows


 On Jul 2, 2014, at 10:20 AM, Konstantin Kudryavtsev 
 kudryavtsev.konstan...@gmail.com wrote:

 Hi Andrew,

 it's windows 7 and I doesn't set up any env variables here

 The full stack trace:

 14/07/02 19:59:31 WARN NativeCodeLoader: Unable to load native-hadoop
 library for your platform... using builtin-java classes where applicable
 14/07/02 19:59:31 ERROR Shell: Failed to locate the winutils binary in the
 hadoop binary path
 java.io.IOException: Could not locate executable null\bin\winutils.exe in
 the Hadoop binaries.
 at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318)
  at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333)
 at org.apache.hadoop.util.Shell.clinit(Shell.java:326)
  at org.apache.hadoop.util.StringUtils.clinit(StringUtils.java:76)
 at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93)
  at org.apache.hadoop.security.Groups.init(Groups.java:77)
 at
 org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:240)
  at
 org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:255)
 at
 org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:283)
  at
 org.apache.spark.deploy.SparkHadoopUtil.init(SparkHadoopUtil.scala:36)
 at
 org.apache.spark.deploy.SparkHadoopUtil$.init(SparkHadoopUtil.scala:109)
  at
 org.apache.spark.deploy.SparkHadoopUtil$.clinit(SparkHadoopUtil.scala)
 at org.apache.spark.SparkContext.init(SparkContext.scala:228)
  at org.apache.spark.SparkContext.init(SparkContext.scala:97)
 at my.example.EtlTest.testETL(IxtoolsDailyAggTest.scala:13)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
  at junit.framework.TestCase.runTest(TestCase.java:168)
 at junit.framework.TestCase.runBare(TestCase.java:134)
  at junit.framework.TestResult$1.protect(TestResult.java:110)
 at junit.framework.TestResult.runProtected(TestResult.java:128)
  at junit.framework.TestResult.run(TestResult.java:113)
 at junit.framework.TestCase.run(TestCase.java:124)
  at junit.framework.TestSuite.runTest(TestSuite.java:232)
 at junit.framework.TestSuite.run(TestSuite.java:227)
  at
 org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:81)
 at org.junit.runner.JUnitCore.run(JUnitCore.java:130)
  at
 com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74)
 at
 com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:211)
  at
 com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:67)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
 at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)


 Thank you,
 Konstantin Kudryavtsev


 On Wed, Jul 2, 2014 at 8:15 PM, Andrew Or and...@databricks.com wrote:

 Hi Konstatin,

 We use hadoop as a library in a few places in Spark. I wonder why the
 path includes null though.

 Could you provide the full stack trace?

 Andrew


 2014-07-02 9:38 GMT-07:00 Konstantin Kudryavtsev 
 kudryavtsev.konstan...@gmail.com:

 Hi all,

 I'm trying to run some transformation on *Spark*, it works fine on
 cluster (YARN, linux machines). However, when I'm trying to run it on local
 machine (*Windows 7*) under unit test, I got errors:

 java.io.IOException: Could not locate executable null\bin\winutils.exe in 
 the Hadoop binaries.
 at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318)
 at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333)
 at org.apache.hadoop.util.Shell.clinit(Shell.java:326)
 at org.apache.hadoop.util.StringUtils.clinit(StringUtils.java:76)
 at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93)


 My code is following:

 @Test

Re: Unit test failure: Address already in use

2014-06-18 Thread Anselme Vignon
Hi,

Could your problem come from the fact that you run your tests in parallel?

If you are running Spark in local mode, you cannot have concurrent Spark
instances running. This means that your tests instantiating a SparkContext
cannot be run in parallel. The easiest fix is to tell sbt not to run
parallel tests. This can be done by adding the following line in your build.sbt:

parallelExecution in Test := false

Cheers,

Anselme




2014-06-17 23:01 GMT+02:00 SK skrishna...@gmail.com:

 Hi,

 I have 3 unit tests (independent of each other) in the /src/test/scala
 folder. When I run each of them individually using: sbt test-only test,
 all the 3 pass the test. But when I run them all using sbt test, then
 they
 fail with the warning below. I am wondering if the binding exception
 results
 in failure to run the job, thereby causing the failure. If so, what can I
 do
 to address this binding exception? I am running these tests locally on a
 standalone machine (i.e. SparkContext("local", "test")).


 14/06/17 13:42:48 WARN component.AbstractLifeCycle: FAILED
 org.eclipse.jetty.server.Server@3487b78d: java.net.BindException: Address
 already in use
 java.net.BindException: Address already in use
 at sun.nio.ch.Net.bind0(Native Method)
 at sun.nio.ch.Net.bind(Net.java:174)
 at
 sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:139)
 at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:77)


 thanks



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/Unit-test-failure-Address-already-in-use-tp7771.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.



RE: Unit test failure: Address already in use

2014-06-18 Thread Lisonbee, Todd

Disabling parallelExecution has worked for me.

Other alternatives I’ve tried that also work include:

1. Using a lock – this will let tests execute in parallel except for those 
using a SparkContext.  If you have a large number of tests that could execute 
in parallel, this can shave off some time.

object TestingSparkContext {
  val lock = new Lock()
}

// before you instantiate your local SparkContext
TestingSparkContext.lock.acquire()

// after you call sc.stop()
TestingSparkContext.lock.release()


2. Sharing a local SparkContext between tests.

- This is nice because your tests will run faster.  Start-up and shutdown is 
time consuming (can add a few seconds per test).

- The downside is that your tests are using the same SparkContext so they are 
less independent of each other.  I haven’t seen issues with this yet but there 
are likely some things that might crop up.
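
A rough sketch of option 2 (names are made up; each suite mixes in the trait
instead of building its own context):

import org.apache.spark.SparkContext

// One lazily created SparkContext shared by every suite in the test JVM.
object SharedSparkContext {
  lazy val sc: SparkContext = new SparkContext("local", "shared-test-context")
}

trait SparkTestSupport {
  // Suites call this instead of new SparkContext(...) in their setUp.
  def sc: SparkContext = SharedSparkContext.sc
}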

Best,

Todd


From: Anselme Vignon [mailto:anselme.vig...@flaminem.com]
Sent: Wednesday, June 18, 2014 12:33 AM
To: user@spark.apache.org
Subject: Re: Unit test failure: Address already in use

Hi,

Could your problem come from the fact that you run your tests in parallel ?

If you are spark in local mode, you cannot have concurrent spark instances 
running. this means that your tests instantiating sparkContext cannot be run in 
parallel. The easiest fix is to tell sbt to not run parallel tests.
This can be done by adding the following line in your build.sbt:

parallelExecution in Test := false

Cheers,

Anselme



2014-06-17 23:01 GMT+02:00 SK skrishna...@gmail.com:
Hi,

I have 3 unit tests (independent of each other) in the /src/test/scala
folder. When I run each of them individually using: sbt test-only test,
all the 3 pass the test. But when I run them all using sbt test, then they
fail with the warning below. I am wondering if the binding exception results
in failure to run the job, thereby causing the failure. If so, what can I do
to address this binding exception? I am running these tests locally on a
standalone machine (i.e. SparkContext("local", "test")).


14/06/17 13:42:48 WARN component.AbstractLifeCycle: FAILED
org.eclipse.jetty.server.Server@3487b78d: java.net.BindException: Address
already in use
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:174)
at
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:139)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:77)


thanks



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Unit-test-failure-Address-already-in-use-tp7771.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.



Re: Unit test failure: Address already in use

2014-06-18 Thread Philip Ogren
In my unit tests I have a base class that all my tests extend that has a 
setup and teardown method that they inherit.  They look something like this:


var spark: SparkContext = _

@Before
def setUp() {
  Thread.sleep(100L) // this seems to give spark more time to reset from the previous test's tearDown

  spark = new SparkContext("local", "test spark")
}

@After
def tearDown() {
  spark.stop
  spark = null // not sure why this helps but it does!
  System.clearProperty("spark.master.port")
}


It's been since last fall (i.e. version 0.8.x) since I've examined this 
code and so I can't vouch that it is still accurate/necessary - but it 
still works for me.



On 06/18/2014 12:59 PM, Lisonbee, Todd wrote:


Disabling parallelExecution has worked for me.

Other alternatives I’ve tried that also work include:

1. Using a lock – this will let tests execute in parallel except for 
those using a SparkContext.  If you have a large number of tests that 
could execute in parallel, this can shave off some time.


object TestingSparkContext {

val lock = new Lock()

}

// before you instantiate your local SparkContext

TestingSparkContext.lock.acquire()

// after you call sc.stop()

TestingSparkContext.lock.release()

2. Sharing a local SparkContext between tests.

- This is nice because your tests will run faster.  Start-up and 
shutdown is time consuming (can add a few seconds per test).


- The downside is that your tests are using the same SparkContext so 
they are less independent of each other.  I haven’t seen issues with 
this yet but there are likely some things that might crop up.


Best,

Todd

*From:*Anselme Vignon [mailto:anselme.vig...@flaminem.com]
*Sent:* Wednesday, June 18, 2014 12:33 AM
*To:* user@spark.apache.org
*Subject:* Re: Unit test failure: Address already in use

Hi,

Could your problem come from the fact that you run your tests in 
parallel ?


If you are spark in local mode, you cannot have concurrent spark 
instances running. this means that your tests instantiating 
sparkContext cannot be run in parallel. The easiest fix is to tell sbt 
to not run parallel tests.


This can be done by adding the following line in your build.sbt:

parallelExecution in Test := false

Cheers,

Anselme

2014-06-17 23:01 GMT+02:00 SK skrishna...@gmail.com:


Hi,

I have 3 unit tests (independent of each other) in the /src/test/scala
folder. When I run each of them individually using: sbt test-only
test,
all the 3 pass the test. But when I run them all using sbt test,
then they
fail with the warning below. I am wondering if the binding
exception results
in failure to run the job, thereby causing the failure. If so,
what can I do
to address this binding exception? I am running these tests
locally on a
standalone machine (i.e. SparkContext("local", "test")).


14/06/17 13:42:48 WARN component.AbstractLifeCycle: FAILED
org.eclipse.jetty.server.Server@3487b78d:
java.net.BindException: Address
already in use
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:174)
at
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:139)
at
sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:77)


thanks



--
View this message in context:

http://apache-spark-user-list.1001560.n3.nabble.com/Unit-test-failure-Address-already-in-use-tp7771.html
Sent from the Apache Spark User List mailing list archive at
Nabble.com.





Unit test failure: Address already in use

2014-06-17 Thread SK
Hi,

I have 3 unit tests (independent of each other) in the /src/test/scala
folder. When I run each of them individually using: sbt test-only test,
all the 3 pass the test. But when I run them all using sbt test, then they
fail with the warning below. I am wondering if the binding exception results
in failure to run the job, thereby causing the failure. If so, what can I do
to address this binding exception? I am running these tests locally on a
standalone machine (i.e. SparkContext("local", "test")).


14/06/17 13:42:48 WARN component.AbstractLifeCycle: FAILED
org.eclipse.jetty.server.Server@3487b78d: java.net.BindException: Address
already in use
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:174)
at
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:139)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:77)


thanks



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Unit-test-failure-Address-already-in-use-tp7771.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


printing in unit test

2014-06-13 Thread SK
Hi,
My unit test is failing (the output is not matching the expected output). I
would like to print out the value of the output. But
rdd.foreach(r => println(r)) does not work from the unit test. How can I print
or write out the output to a file/screen?
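
(In case it is relevant: foreach runs where the partitions are evaluated, so
the usual way to inspect a small RDD from test code is to collect it to the
driver first. A minimal sketch, assuming the data fits in driver memory:)

import org.apache.spark.SparkContext

val sc = new SparkContext("local", "print-test")
try {
  val rdd = sc.parallelize(Seq("a", "b", "c"))   // stand-in for the RDD under test
  val result = rdd.collect()                     // brings the data into the test JVM
  result.foreach(r => println(r))                // now the println output is visible
  // or: rdd.saveAsTextFile("/tmp/test-output")  // write it out for later inspection
} finally {
  sc.stop()
}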

thanks.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/printing-in-unit-test-tp7611.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


unit test

2014-06-06 Thread b0c1
Hi!

I have two questions:
1.
I want to test my application. My app will write the result to Elasticsearch
(stage 1) with saveAsHadoopFile. How can I write a test case for it? Do I need
to pull up a MiniDFSCluster? Or is there another way?

My application flow plan:
Kafka => Spark Streaming (enrich) -> Elasticsearch => Spark (map/reduce) ->
HBase

2.
Can Spark read data from Elasticsearch? What is the preferred way to do this?
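
For question 1, one option that avoids a MiniDFSCluster is to point the
stage's output at a local temp directory in the test and assert on what gets
written. A sketch (the sample records are made up, and nothing here is
Elasticsearch-specific):

import com.google.common.io.Files
import org.apache.spark.SparkContext

val sc = new SparkContext("local", "stage1-test")
try {
  val outDir = new java.io.File(Files.createTempDir(), "stage1-output").getAbsolutePath
  val result = sc.parallelize(Seq("doc1", "doc2"))  // placeholder for the real stage-1 RDD
  result.saveAsTextFile(outDir)                     // or saveAsHadoopFile with a file:// path

  val written = sc.textFile(outDir).collect().toSet
  assert(written == Set("doc1", "doc2"))
} finally {
  sc.stop()
}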

b0c1



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/unit-test-tp7155.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.