Re: Run a specific PySpark test or group of tests

2018-12-06 Thread Xiao Li
Yes! This is very helpful!


Re: Run a specific PySpark test or group of tests

2018-12-05 Thread Wenchen Fan
great job! thanks a lot!


Re: Run a specific PySpark test or group of tests

2018-12-05 Thread Hyukjin Kwon
It's merged now and in the developer tools page -
http://spark.apache.org/developer-tools.html#individual-tests
Have some fun with PySpark testing!



Re: Run a specific PySpark test or group of tests

2018-12-05 Thread Hyukjin Kwon
Hey all, I kind of met the goal with a minimised fix, keeping the existing
framework and options. See

https://github.com/apache/spark/pull/23203
https://github.com/apache/spark-website/pull/161

I know it's not perfect, and other Python testing frameworks provide many
other good features, but it should be good enough for now.
Thanks!



Re: Run a specific PySpark test or group of tests

2017-08-16 Thread Nicholas Chammas
Looks like it doesn’t take too much work to get pytest working on our code
base, since it knows how to run unittest tests.

https://github.com/apache/spark/compare/master...nchammas:pytest


For example, I was able to do this from that branch (run from the python/
directory) and it did the right thing, running only the tests with ‘string’
in their name:

$ ../bin/spark-submit ./pytest-run-tests.py ./pyspark/sql/tests.py -v -k string

However, looking more closely at the whole test setup, I’m hesitant to work
any further on this.

My intention was to see if we could leverage pytest, tox, and other test
tools that are standard in the Python ecosystem to replace some of the
homegrown stuff we have. We have our own test dependency tracking code, our
own breakdown of tests into module-scoped chunks, and our own machinery to
parallelize test execution. It seems like it would be a lot of work to reap
the benefits of using the standard tools while ensuring that we don’t lose
any of the benefits our current test setup provides.

Nick


Re: Run a specific PySpark test or group of tests

2017-08-15 Thread Bryan Cutler
This generally works for me to run just the tests within a class, or even a
single test. It's not as flexible as pytest -k, which would be nice.

$ SPARK_TESTING=1 bin/pyspark pyspark.sql.tests ArrowTests
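For anyone curious, that invocation roughly hands the module and class name to the stock unittest machinery. A minimal, self-contained sketch of the same selection, with the caveat that the ArrowTests class below is a toy stand-in (the real pyspark.sql.tests.ArrowTests needs a SparkSession):

```python
import unittest

# Toy stand-in for a class like pyspark.sql.tests.ArrowTests; the real
# one needs a SparkSession, this one deliberately does not.
class ArrowTests(unittest.TestCase):
    def test_roundtrip(self):
        self.assertEqual(sorted([3, 1, 2]), [1, 2, 3])

    def test_nulls(self):
        self.assertIsNone(None)

loader = unittest.TestLoader()
runner = unittest.TextTestRunner(verbosity=0)

# Whole class, as in `... pyspark.sql.tests ArrowTests`:
class_result = runner.run(loader.loadTestsFromTestCase(ArrowTests))

# A single method, as in `... pyspark.sql.tests ArrowTests.test_roundtrip`:
single_result = runner.run(unittest.TestSuite([ArrowTests("test_roundtrip")]))
```

The first run executes both tests in the class; the second executes only the one named method.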



Re: Run a specific PySpark test or group of tests

2017-08-15 Thread Nicholas Chammas
Pytest does support unittest-based tests, allowing for incremental
adoption. I'll see how convenient it is to use with our current test layout.
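To illustrate that incremental path: a plain unittest-style test class needs no changes for pytest to collect it, and keeps running under the stock unittest runner in the meantime (the class name below is made up, not a real PySpark test):

```python
import unittest

# A vanilla unittest-style test. pytest discovers unittest.TestCase
# subclasses like this as-is, so files can be adopted incrementally.
class UdfTests(unittest.TestCase):
    def setUp(self):
        self.values = [1, 2, 3]

    def test_sum(self):
        self.assertEqual(sum(self.values), 6)

# Still runs under the stock unittest runner today...
result = unittest.TextTestRunner(verbosity=0).run(
    unittest.defaultTestLoader.loadTestsFromTestCase(UdfTests)
)
# ...and `pytest this_file.py` would collect the same class unchanged.
```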



Re: Run a specific PySpark test or group of tests

2017-08-14 Thread Hyukjin Kwon
For me, I would like this if it can be done with relatively small changes.
How about adding more granular options, for example, specifying or filtering
a smaller set of test goals in the run-tests.py script?
I think it'd be quite a small change, and we could roughly reach this goal
if I understood correctly.
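One sketch of what such a run-tests.py option might look like: accept an fnmatch-style pattern and keep only the matching test goals. This is only an illustration of the idea, and the goal names below are made up:

```python
from fnmatch import fnmatch

# Hypothetical list of dotted test goals, as run-tests.py might track them.
test_goals = [
    "pyspark.sql.tests.ArrowTests",
    "pyspark.sql.tests.SQLTests",
    "pyspark.streaming.tests.BasicOperationTests",
]

def filter_goals(goals, pattern):
    """Keep only the goals whose dotted name matches the pattern."""
    return [g for g in goals if fnmatch(g, pattern)]

print(filter_goals(test_goals, "pyspark.sql.*"))
# → ['pyspark.sql.tests.ArrowTests', 'pyspark.sql.tests.SQLTests']
```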




Run a specific PySpark test or group of tests

2017-08-14 Thread Nicholas Chammas
Say you’re working on something and you want to rerun the PySpark tests,
focusing on a specific test or group of tests. Is there a way to do that?

I know that you can test entire modules with this:

./python/run-tests --modules pyspark-sql

But I’m looking for something more granular, like pytest’s -k option.

On that note, does anyone else think it would be valuable to use a test
runner like pytest to run our Python tests? The biggest benefits would be
the use of fixtures, and
more flexibility on test running and reporting. Just wondering if we’ve
already considered this.

Nick
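As a footnote on -k: since Python 3.7, unittest's own loader supports a similar kind of name-based selection via testNamePatterns. A toy sketch (the test class is illustrative only):

```python
import unittest

class StringTests(unittest.TestCase):
    def test_string_upper(self):
        self.assertEqual("abc".upper(), "ABC")

    def test_int_add(self):
        self.assertEqual(1 + 2, 3)

# Roughly the equivalent of `pytest -k string`: select only tests whose
# name matches an fnmatch-style pattern (requires Python 3.7+).
loader = unittest.TestLoader()
loader.testNamePatterns = ["*string*"]
result = unittest.TextTestRunner(verbosity=0).run(
    loader.loadTestsFromTestCase(StringTests)
)
# Only test_string_upper is selected; test_int_add is skipped.
```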