On 25.10.22 at 21:54, Tanin Na Nakorn wrote:
Hi All,
Our data job is very complex (e.g. 100+ joins), and we have switched from
RDD to Dataset recently.
We've found that the unit test takes much longer. We profiled it and have
found that it's the planning phase that is slow, not execution.
I wonder if anyone has encountered this issue before.
Are you using IvyVPN, which causes this problem? If the VPN software silently rewrites network URLs, you should avoid using it.
Regards.
On Wed, Dec 22, 2021 at 1:48 AM Pralabh Kumar
wrote:
> Hi Spark Team
>
> I am building a spark in VPN . But the unit test case below i
You would have to make it available; this doesn't seem like a Spark issue.
On Tue, Dec 21, 2021, 10:48 AM Pralabh Kumar wrote:
> Hi Spark Team
>
> I am building a spark in VPN . But the unit test case below is failing.
> This is pointing to ivy location which cannot be reached with
Hi Spark Team,
I am building Spark in a VPN, but the unit test case below is failing. It points to an Ivy location which cannot be reached from within the VPN. Any help would be appreciated.
test("SPARK-33084: Add jar support Ivy URI -- default transitive = true") {
  sc = new SparkC
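If the real blocker is that the default Maven/Ivy repositories are unreachable from inside the VPN, one workaround (a sketch, not discussed in this thread; the settings-file path is illustrative) is to point Spark's jar resolution at an internal mirror via the `spark.jars.ivySettings` configuration:

```scala
// Sketch: direct Ivy resolution for spark.jars.packages / ADD JAR at an
// internal mirror that is reachable inside the VPN.
// The ivysettings.xml path is illustrative.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.jars.ivySettings", "/path/to/ivysettings.xml")
```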
= spark_session.createDataFrame([['one',
'two']]).toDF(*['first', 'second'])
assert df.subtract(df2).count() == 0
On Thu, Nov 19, 2020 at 6:38 AM, Sachit Murarka wrote:
Hi Users,
I have to write Unit Test cases for PySpark.
I think pytest-spark and "spark testing base" are good test libraries.
Can anyone please provide a full reference for writing test cases in Python using these?
Kind Regards,
Sachit Murarka
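One common need in such tests is comparing a result DataFrame with expected rows. A minimal sketch (the helper name is ours, not part of pytest-spark or spark-testing-base): in a pytest-spark test, the `spark_session` fixture supplies the session and the row lists come from `df.collect()`, but the comparison core is plain Python.

```python
def rows_match(actual, expected):
    """Order-insensitive comparison of collected rows, since DataFrame
    output order is not guaranteed without an explicit sort."""
    return sorted(map(tuple, actual)) == sorted(map(tuple, expected))

# In a pytest-spark test this would be used as, e.g.:
#   result = transform(spark_session.createDataFrame(...)).collect()
#   assert rows_match(result, [("one", "two")])
assert rows_match([(1, "a"), (2, "b")], [(2, "b"), (1, "a")])
```

Because `collect()` pulls everything to the driver, keep test inputs small.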
…of each operation that whole-stage codegen can be applied to.
So, in your test case, whole-stage codegen has already been enabled!
FYI, I think that it is a good topic for d...@spark.apache.org.
Kazuaki Ishizaki
From: Koert Kuipers <ko...@tresata.com>
To: "user@spark.apache.org" <user@spark.apache.org>
Date: 2017/04/05 05:12
Subject: how do i force unit test to do whole stage codegen
I wrote my own expression with eval and doGenCode, but doGenCode never gets called in tests.
Also, as a test, I ran this in a unit test:
spark.range(10).select('id as 'asId).where('id === 4).explain
according to
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-sql-whole-stage
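One way to check this directly in a test: in the Spark 2.x versions discussed here, the physical plan printed by `explain` prefixes operators covered by whole-stage codegen with `*`, and the `debug` package can dump the generated code. A sketch (APIs may differ in other Spark versions):

```scala
// Sketch: inspect whole-stage codegen from a unit test (Spark 2.x-era APIs).
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.debug._  // adds debugCodegen()

val spark = SparkSession.builder()
  .master("local[2]")
  .appName("codegen-test")
  .getOrCreate()
import spark.implicits._

val df = spark.range(10).select('id as 'asId).where('id === 4)
df.explain()       // operators handled by whole-stage codegen are prefixed with '*'
df.debugCodegen()  // dumps the generated Java source per codegen stage
```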
Agreed with the statement in quotes below; whether one wants to do unit tests or not, it is good practice to write code that way. But I think the more painful and tedious task is to mock/emulate all the nodes, such as the Spark workers/master/HDFS/input source stream and all that. I wish there is
>
> Basically you abstract your transformations to take in a dataframe and
> return one, then you assert on the returned df
>
+1 to this suggestion. This is why we wanted streaming and batch
dataframes to share the same API.
Basically you abstract your transformations to take in a dataframe and return one, then you assert on the returned df.
Regards
Sam
On Tue, 7 Mar 2017 at 12:05, kant kodali <kanth...@gmail.com> wrote:
Hi All,
How do I unit test Spark Streaming, or Spark in general? How do I test the results of my transformations? Also, more importantly, don't we need to spawn master and worker JVMs, either on one or multiple nodes?
Thanks!
kant
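The pattern suggested above (keep transformations as plain functions that take a DataFrame and return one, then assert on the result) can be sketched as follows. To keep the sketch runnable without a cluster, the "transformation" below works on a list of dicts; in real code it would take and return a pyspark DataFrame, and the test would build a tiny input with `spark.createDataFrame`.

```python
# Pattern sketch: the transformation is a pure function, so the test needs
# no streaming source, master, or worker JVMs -- just input in, output out.

def dedupe_by_key(rows, key):
    """Keep the first row seen for each value of `key` (illustrative logic)."""
    seen, out = set(), []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            out.append(row)
    return out

rows = [{"id": 1, "v": "a"}, {"id": 1, "v": "b"}, {"id": 2, "v": "c"}]
assert dedupe_by_key(rows, "id") == [{"id": 1, "v": "a"}, {"id": 2, "v": "c"}]
```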
After I created two test cases that use FlatSpec with DataFrameSuiteBase, I got errors when running sbt test. I was able to run each of them separately. My test cases use sqlContext to read files. Here is the exception stack. Judging from the exception, I may need to unregister the RpcEndpoint after
Subject: Re: How this unit test passed on master trunk?
From: zzh...@hortonworks.com
To: java8...@hotmail.com; gatorsm...@gmail.com
CC: user@spark.apache.org
Date: Sun, 24 Apr 2016 04:37:11 +
There are multiple records for the DF
scala> structDF.groupBy($"a").agg(min(st
struct(1, 2). Please check how the Ordering is implemented in InterpretedOrdering.
The output itself does not have any ordering. I am not sure why the unit test and the real environment behave differently.
Xiao,
I do see the difference between the unit test and the local cluster run. Do you know the reason
"))).first()
first: org.apache.spark.sql.Row = [1,[1,1]]
BTW
https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.7/715/consoleFull
shows this test passing.
On Fri, Apr 22, 2016 at 11:23 AM, Yong Zhang <java8...@hotmail.com> wrote:
Hi,
I was trying to find out why this unit test can pass in Spark code.
in https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
for this unit test:
test("Star Expansion - CreateStruct and CreateArray") {
val structDf = testDa
bootstrapping for you.

https://github.com/holdenk/spark-testing-base

DataFrame examples are here:
https://github.com/holdenk/spark-testing-base/blob/master/src/test/1.3/scala/com/holdenkarau/spark/testing/SampleDataFrameTest.scala

Thanks,
Silvio

From: Steve Annessa <steve.anne...@gmail.com>
Date: Thursday, February 4, 2016 at 8:36 PM
I'm trying to unit test a function that reads in a JSON file, manipulates
the DF and then returns a Scala Map.
The function has signature:
def ingest(dataLocation: String, sc: SparkContext, sqlContext: SQLContext)
I've created a bootstrap spec for spark jobs that instantiates the Spark
Context
Subject: Unit test with sqlContext
try:
mvn test -pl sql -DwildcardSuites=org.apache.spark.sql -Dtest=none
On 12 Nov 2015, at 03:13, weoccc <weo...@gmail.com>
wrote:
Hi,
I am wondering how to run unit test for specific spark component only.
mvn test -DwildcardSuites="org.apache.sp
Have you tried the following?
build/sbt "sql/test-only *"
Cheers
On Wed, Nov 11, 2015 at 7:13 PM, weoccc <weo...@gmail.com> wrote:
> Hi,
>
> I am wondering how to run unit test for specific spark component only.
>
> mvn test -DwildcardSuites="org.apache.sp
Hi,
I am wondering how to run unit test for specific spark component only.
mvn test -DwildcardSuites="org.apache.spark.sql.*" -Dtest=none
The above command doesn't seem to work. I'm using Spark 1.5.
Thanks,
Weide
Thanks for your response Yana,
I can increase the MaxPermSize parameter and it will allow me to run the
unit test a few more times before I run out of memory.
However, the primary issue is that running the same unit test in the same
JVM (multiple times) results in increased memory (each run
I'd suggest setting sbt to fork when running tests.
On Wed, Aug 26, 2015 at 10:51 AM, Mike Trienis mike.trie...@orcsol.com
wrote:
Thanks for your response Yana,
I can increase the MaxPermSize parameter and it will allow me to run the
unit test a few more times before I run out of memory
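The forking suggestion can be sketched in build.sbt (sbt 0.13-era keys, contemporary with this thread; the memory sizes are illustrative):

```scala
// build.sbt: run tests in a forked JVM so classloaders, Derby state and
// PermGen allocated by a HiveContext are released when the test JVM exits.
fork in Test := true
javaOptions in Test ++= Seq("-Xmx2g", "-XX:MaxPermSize=512m")
```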
Hello,
I am using sbt and created a unit test where I create a `HiveContext` and execute some query and then return. Each time I run the unit test, the JVM increases its memory usage until I get the error:
Internal error when running tests: java.lang.OutOfMemoryError: PermGen space
Exception in thread Thread-2
Do you get this failure repeatedly?
On Thu, May 14, 2015 at 12:55 AM, kf wangf...@huawei.com wrote:
Hi, all, i got following error when i run unit test of spark by
dev/run-tests
on the latest branch-1.4 branch.
the latest commit id:
commit d518c0369fa412567855980c3f0f426cde5c190d
Hi all, I got the following error when I ran the unit tests of Spark via dev/run-tests on the latest branch-1.4 branch.
The latest commit id:
commit d518c0369fa412567855980c3f0f426cde5c190d
Author: zsxwing <zsxw...@gmail.com>
Date: Wed May 13 17:58:29 2015 -0700
error
[info] Test
I'm also getting the same error.
Any ideas?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-unit-test-fails-tp22368p22798.html
Sent from the Apache Spark User List mailing list archive at Nabble.com
It's because your tests are running in parallel and you can only have one
context running at a time.
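With sbt this is typically addressed by running test suites sequentially; a sketch using sbt 0.13-era keys:

```scala
// build.sbt: run test suites one at a time -- a JVM can host only one
// active SparkContext, and parallel suites also race for the same ports.
parallelExecution in Test := false
```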
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Cannot-run-unit-test-tp14459p22429.html
--
View this message in context: Spark unit test fails
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-unit-test-fails-tp22368.html
Hi experts,
I am trying to write unit tests for my spark application which fails with
javax.servlet.FilterRegistration error.
I am using CDH5.3.2 Spark and below is my dependencies list.
val spark = "1.2.0-cdh5.3.2"
val esriGeometryAPI = "1.2"
val csvWriter = "1.0.0"
Hi,
I extended org.apache.spark.streaming.TestSuiteBase for some testing, and I
was able to run this test fine:
test("Sliding window join with 3 second window duration") {
  val input1 =
    Seq(
      Seq("req1"),
      Seq("req2", "req3"),
      Seq(),
      Seq("req4", "req5", "req6"),
      Seq("req7"),
().accept(MediaType.APPLICATION_JSON_TYPE).get(String.class);
logger.warn("!!! DEBUG !!! Spotlight response: {}", response);
When run inside a unit test as follows:
mvn clean test -Dtest=SpotlightTest#testCountWords
it contacts the RESTful web service and retrieves some data as expected. But when the same code is run as part
On Wed, Dec 24, 2014 at 1:46 PM, Sean Owen so...@cloudera.com wrote:
I'd take a look with 'mvn dependency:tree' on your own code first.
Maybe you are including JavaEE 6 for example?
For reference, my complete pom.xml looks like:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi=
: {}", response);
It seems to work when I use spark-submit to submit the application that includes this code.
Funny thing is, now my relevant unit test does not run, complaining about not having enough memory:
Java HotSpot(TM) 64-Bit Server VM warning: INFO:
os::commit_memory(0xc490, 25165824, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java
to my local Spark, it waits for a file to be
written to a given directory, and when I create that file it successfully
prints the number of words. I terminate the application by pressing Ctrl+C.
Now I've tried to create a very basic unit test for this functionality, but
in the test I was not able
there.
Best,
Burak
- Original Message -
From: Emre Sevinc emre.sev...@gmail.com
To: user@spark.apache.org
Sent: Monday, December 8, 2014 2:36:41 AM
Subject: How can I make Spark Streaming count the words in a file in a unit
test?
Hello,
I've successfully built a very simple Spark Streaming
/Cannot-run-unit-test-tp14459p14506.html
-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
.nabble.com/Unit-Test-for-Spark-Streaming-tp11394p11825.html
be used to run this test?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Unit-Test-for-Spark-Streaming-tp11394p11570.html
Does it not show the name of the test suite on stdout, showing that it has passed? Can you try writing a small unit test, in the same way as your Kafka unit test, with print statements on stdout, to see whether it works? I believe it is some configuration issue in Maven, which is hard
when trying to run the KafkaStreamSuite.scala unit
test.
I added scalatest-maven-plugin to my pom.xml, then ran mvn test, and got
the follow error message:
error: object Utils in package util cannot be accessed in package
org.apache.spark.util
[INFO
Hello Spark Users,
I have a Spark Streaming program that streams data from Kafka topics and outputs it as Parquet files on HDFS.
Now I want to write a unit test for this program to make sure the output data is correct (i.e. not missing any data from Kafka).
However, I have no idea about how to do
Appropriately timed question! Here is the PR that adds a real unit
test for Kafka stream in Spark Streaming. Maybe this will help!
https://github.com/apache/spark/pull/1751/files
On Mon, Aug 4, 2014 at 6:30 PM, JiajiaJing jj.jing0...@gmail.com wrote:
Hello Spark Users,
I have a spark
This helps a lot!!
Thank you very much!
Jiajia
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Unit-Test-for-Spark-Streaming-tp11394p11396.html
Konstantin Kudryavtsev <kudryavtsev.konstan...@gmail.com> wrote:

Hi all,
I'm trying to run some transformation on Spark; it works fine on a cluster (YARN, Linux machines). However, when I try to run it on a local machine (Windows 7) under a unit test, I get errors:

java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
    at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318)
    at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333
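A common workaround (not stated in this thread; the path below is illustrative) is to download a winutils.exe and point `hadoop.home.dir` at its parent directory before any Spark or Hadoop code runs:

```scala
// Sketch: tell the Hadoop libraries where to find bin\winutils.exe on
// Windows. Must run before any SparkContext/Hadoop code is touched.
// The directory must contain bin\winutils.exe; "C:\\hadoop" is illustrative.
System.setProperty("hadoop.home.dir", "C:\\hadoop")
```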
in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Unit-test-failure-Address-already-in-use-tp7771.html
,
Todd
From: Anselme Vignon [mailto:anselme.vig...@flaminem.com]
Sent: Wednesday, June 18, 2014 12:33 AM
To: user@spark.apache.org
Subject: Re: Unit test failure: Address already in use
Hi,
Could your problem come from the fact that you run your tests in parallel? If you are running Spark in local mode, you cannot have concurrent Spark instances running. This means that your tests instantiating a SparkContext cannot be run
)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:77)
thanks
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Unit-test-failure-Address-already-in-use-tp7771.html
Hi,
My unit test is failing (the output does not match the expected output). I would like to print out the value of the output, but rdd.foreach(r => println(r)) does not work from the unit test. How can I print, or write the output to a file or the screen?
Thanks.
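`rdd.foreach(println)` executes on the executors, so its output goes to executor stdout rather than the test's console. The usual fix is to collect to the driver first; a sketch (fine for small test data):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(
  new SparkConf().setMaster("local[2]").setAppName("print-test"))
val rdd = sc.parallelize(Seq(1, 2, 3))

// rdd.foreach(println) runs on the executors, so output may never reach
// the test's stdout; collect to the driver, then print.
rdd.collect().foreach(println)
// or cap the amount printed:
rdd.take(20).foreach(println)
```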
) - Elasticsearch = Spark (map/reduce) -
HBase
2.
Can Spark read data from Elasticsearch? What is the preferred way to do this?
b0c1
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/unit-test-tp7155.html