[jira] [Commented] (SPARK-7721) Generate test coverage report from Python

2018-11-22 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16695652#comment-16695652
 ] 

Apache Spark commented on SPARK-7721:
-

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/23117

> Generate test coverage report from Python
> -
>
> Key: SPARK-7721
> URL: https://issues.apache.org/jira/browse/SPARK-7721
> Project: Spark
>  Issue Type: Test
>  Components: PySpark, Tests
>Reporter: Reynold Xin
>Priority: Major
>
> Would be great to have a test coverage report for Python. Compared with Scala, 
> it is trickier to understand the coverage without coverage reports in Python 
> because we employ both docstring tests and unit tests in test files. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7721) Generate test coverage report from Python

2018-11-22 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16695650#comment-16695650
 ] 

Apache Spark commented on SPARK-7721:
-

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/23117




[jira] [Commented] (SPARK-7721) Generate test coverage report from Python

2018-03-02 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16383532#comment-16383532
 ] 

Hyukjin Kwon commented on SPARK-7721:
-

[~rxin], I am sorry that this has been delayed. I will finish it for sure; I was 
away from it because of release work.

I will try the "Another one" approach first and fall back to the "Simplest one" 
if that fails ([both are described in this 
comment|https://issues.apache.org/jira/browse/SPARK-7721?focusedCommentId=16305108&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16305108]).




[jira] [Commented] (SPARK-7721) Generate test coverage report from Python

2018-01-09 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16318639#comment-16318639
 ] 

Apache Spark commented on SPARK-7721:
-

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/20204




[jira] [Commented] (SPARK-7721) Generate test coverage report from Python

2018-01-04 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312398#comment-16312398
 ] 

Reynold Xin commented on SPARK-7721:


I think it's fine even if you don't preserve the history forever ...





[jira] [Commented] (SPARK-7721) Generate test coverage report from Python

2018-01-04 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312331#comment-16312331
 ] 

Hyukjin Kwon commented on SPARK-7721:
-

I (and possibly a few committers, given [the comment 
above|https://issues.apache.org/jira/browse/SPARK-7721?focusedCommentId=14551198&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14551198])
 would run this manually anyway, but yes, it only becomes really powerful 
when it runs automatically.

If we are all fine with a single up-to-date coverage site ("Simplest one") 
for now, that is easy to do. It is what I have done so far at 
https://spark-test.github.io/pyspark-coverage-site, and the only thing left is 
to automate it: clone the latest commit and push the result.

I know it would be better to keep the history of coverage reports and leave a 
link in each PR ("Another one"). In that case we need to work out how to keep 
that history. This is where I need to investigate more and verify the idea.

In any case, I will test the integration further and try the "Another one" 
approach too. If that fails, I think we can fall back to the "Simplest one" for 
now. Does this sound good to you? Either way, I think I can make sure this 
eventually runs automatically.

BTW, can you take a look at https://github.com/apache/spark/pull/20151 too?

It keeps the coverage-only changes separate. I am trying to isolate that logic 
as much as possible in case we come up with a better idea in the future.





[jira] [Commented] (SPARK-7721) Generate test coverage report from Python

2018-01-04 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16311731#comment-16311731
 ] 

Reynold Xin commented on SPARK-7721:


We can add it first but in my experience this will only be used when it is 
automatic :)





[jira] [Commented] (SPARK-7721) Generate test coverage report from Python

2018-01-03 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16309585#comment-16309585
 ] 

Hyukjin Kwon commented on SPARK-7721:
-

[~rxin] What do you think about doing a script and then integrating it with 
Jenkins? Or would you want me to check "2. Integrating with Jenkins" a bit more 
for clarification?




[jira] [Commented] (SPARK-7721) Generate test coverage report from Python

2017-12-27 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305108#comment-16305108
 ] 

Hyukjin Kwon commented on SPARK-7721:
-

I roughly checked the coverage results and they seem fine. There is one trivial 
nit, though: 
https://github.com/apache/spark/blob/04e44b37cc04f62fbf9e08c7076349e0a4d12ea8/python/pyspark/daemon.py#L148-L169
 is not in the coverage results, because I produce the coverage results in 
{{worker.py}} separately and then merge them. I believe it's not a big deal.

So, if you are fine with everything so far, how about I proceed with two PRs:

1. Adding the script only (after cleaning it up, of course)

   The script alone should already be useful: reviewers can at least run it 
manually when they check PRs.

2. Integrating with Jenkins

  I have two thoughts on this:

  - Simplest one: Only run it in a specific master in Jenkins and always keep a 
single up-to-date coverage site. We can simply push it. I think this is quite 
straightforward and pretty feasible. 

  - Another one: I make a simple site in git pages that lists the coverage 
reports of all other builds (including PR builds), and then leave a link in 
each PR's Jenkins build success message. I think this is also feasible, but I 
need to look into it further.

BTW, I will be able to start working on this next week or two weeks from now.




[jira] [Commented] (SPARK-7721) Generate test coverage report from Python

2017-12-27 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305103#comment-16305103
 ] 

Hyukjin Kwon commented on SPARK-7721:
-

Hey [~rxin], I think I got it working with a few modifications to the script 
and by forcing {{worker.py}} to produce the coverage results.
I ran it with Python 3 and Coverage 4.4; all tests passed, and I just updated 
the site: https://spark-test.github.io/pyspark-coverage-site

FYI, here is the diff I applied to the main code to force it to produce the 
results (an addition of about 15 lines):

{code}
diff --git a/python/pyspark/worker.py b/python/pyspark/worker.py
index e6737ae1c12..088debcf796 100644
--- a/python/pyspark/worker.py
+++ b/python/pyspark/worker.py
@@ -159,7 +159,7 @@ def read_udfs(pickleSer, infile, eval_type):
     return func, None, ser, ser


-def main(infile, outfile):
+def _main(infile, outfile):
     try:
         boot_time = time.time()
         split_index = read_int(infile)
@@ -259,6 +259,22 @@ def main(infile, outfile):
         exit(-1)


+if "COVERAGE_PROCESS_START" in os.environ:
+    def _cov_wrapped(*args, **kwargs):
+        import coverage
+        cov = coverage.coverage(
+            config_file=os.environ["COVERAGE_PROCESS_START"])
+        cov.start()
+        try:
+            _main(*args, **kwargs)
+        finally:
+            cov.stop()
+            cov.save()
+    main = _cov_wrapped
+else:
+    main = _main
+
+
 if __name__ == '__main__':
     # Read a local port to connect to from stdin
     java_port = int(sys.stdin.readline())
{code}
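
The same pattern can be written as a small standalone helper. This is only a minimal sketch, not the code from the pull request: the name {{with_optional_coverage}} is hypothetical, and it assumes coverage.py is installed whenever {{COVERAGE_PROCESS_START}} is set (it is imported lazily so nothing changes otherwise).

```python
import functools
import os


def with_optional_coverage(func, env_var="COVERAGE_PROCESS_START"):
    """Return func unchanged, or a wrapper that records coverage around it.

    Hypothetical helper: the wrapper is only used when env_var is set, and
    the variable's value is taken as the path to the coverage config file.
    """
    if env_var not in os.environ:
        # Coverage was not requested: keep the original entry point.
        return func

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Imported lazily so coverage.py is only needed when requested.
        import coverage

        cov = coverage.Coverage(config_file=os.environ[env_var])
        cov.start()
        try:
            return func(*args, **kwargs)
        finally:
            # Always stop and write the data file, even if func raised.
            cov.stop()
            cov.save()

    return wrapper
```

A worker entry point could then be bound once at import time, for example {{main = with_optional_coverage(_main)}}, which mirrors the {{if}}/{{else}} in the diff above.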






[jira] [Commented] (SPARK-7721) Generate test coverage report from Python

2017-12-13 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16290175#comment-16290175
 ] 

Hyukjin Kwon commented on SPARK-7721:
-

Sure, I didn't mean to rush ahead without investigating and checking everything 
first. I just wanted to get your thoughts in advance. I will try to find some 
time to look into this and proceed bit by bit, and of course will keep you 
updated.




[jira] [Commented] (SPARK-7721) Generate test coverage report from Python

2017-12-13 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16290134#comment-16290134
 ] 

Reynold Xin commented on SPARK-7721:


We definitely don't need to do it in one go, but with things like this the key 
is to know for sure that we can do it. Otherwise it becomes half-baked infra 
that's committed but not actually functioning, and brings more hassle than 
needed.





[jira] [Commented] (SPARK-7721) Generate test coverage report from Python

2017-12-09 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16284797#comment-16284797
 ] 

Hyukjin Kwon commented on SPARK-7721:
-

For 2., I will take another look and come back with a way to solve it.

[~rxin], do you expect me to do this in one go, or in separate steps (1. 
getting coverage working and exposed with Jenkins, and 2. making it work with 
doctests and tracking worker processes)? 

BTW, I am somehow missing notifications from old JIRAs unless I am explicitly 
cc'ed.






[jira] [Commented] (SPARK-7721) Generate test coverage report from Python

2017-12-09 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16284795#comment-16284795
 ] 

Hyukjin Kwon commented on SPARK-7721:
-

[~rxin] Yup, it is. Coverage from the worker side is missing too, as described 
above. So, there are currently two problems:

1. Worker-side coverage is not captured:

I couldn't figure out how to track worker processes with {{daemon.py}} and 
{{fork()}} either, as Josh described. I think this has become more important 
now, in particular for recent changes such as Pandas UDFs.

  One thing I could do is make some manual changes in our codebase on the 
Python worker side to force it to work. It's ugly, but I think we could at 
least get this working.

2. Doctests seem to be missing:

I think I also need a manual fix to run these with coverage. I have not figured 
out a clean way to do it yet, but I think I can at least make it work.
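
To illustrate the doctest side of the problem: docstring examples are ordinary code that has to be executed by a runner, so they only show up in a report if that runner executes inside a process coverage.py is measuring. The sketch below runs one function's docstring examples programmatically with the standard {{doctest}} module. The names {{double}} and {{run_docstring_examples}} are hypothetical, and this is not how PySpark's test scripts invoke doctests.

```python
import doctest


def double(x):
    """Return twice the input.

    >>> double(21)
    42
    """
    return x * 2


def run_docstring_examples(func):
    """Run the examples in func's docstring; return (failed, attempted)."""
    parser = doctest.DocTestParser()
    # Build a DocTest from the docstring, exposing func under its own name.
    test = parser.get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)
    return results.failed, results.attempted
```

Calling {{run_docstring_examples(double)}} executes the body of {{double}}, which is the kind of execution a coverage run would need to observe for doctested methods to stop being reported as missing.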





[jira] [Commented] (SPARK-7721) Generate test coverage report from Python

2017-11-26 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16266358#comment-16266358
 ] 

Reynold Xin commented on SPARK-7721:


This is really cool. I took a look, but it looks like doctests are missing? For 
example, sortWithinPartitions is labeled as missing, but there is a doctest for 
that.





[jira] [Commented] (SPARK-7721) Generate test coverage report from Python

2017-11-24 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16265588#comment-16265588
 ] 

Hyukjin Kwon commented on SPARK-7721:
-

Hey [~rxin] and [~joshrosen], I just did a simple demo here:

https://spark-test.github.io/pyspark-coverage-site/
https://github.com/spark-test/pyspark-coverage-site






[jira] [Commented] (SPARK-7721) Generate test coverage report from Python

2017-11-23 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16264349#comment-16264349
 ] 

Hyukjin Kwon commented on SPARK-7721:
-

I knew of a similar approach but wasn't sure whether it was the only one, so I 
was hesitant, and then I found this JIRA.

I can give it a shot if using git pages sounds good to you.




[jira] [Commented] (SPARK-7721) Generate test coverage report from Python

2017-11-23 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263944#comment-16263944
 ] 

Hyukjin Kwon commented on SPARK-7721:
-

[~joshrosen], ahh, I happened to duplicate your earlier efforts here.

So, it seems Jenkins <> Codecov is declined for now? Probably one easy 
workaround is to use GitHub Pages (https://pages.github.com/). We would 
probably just need to push the changes to a repo when the tests pass, which 
automatically updates its page.

I did this before to demonstrate the SQL function docs:

https://spark-test.github.io/sparksqldoc/
https://github.com/spark-test/sparksqldoc 

FWIW, I recently added the {{spark.python.use.daemon}} config, as in SparkR, to 
disable os.fork. This (of course) enables tracking worker processes, although 
we should not disable the daemon in Jenkins tests because it's extremely slow. 
It was good enough for small tests that verify a PR or a change, though.
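
As a rough illustration of that last point, a small local check could be launched with the daemon turned off. The sketch below only builds the argument list. The function name and the script path are hypothetical; {{--conf key=value}} is the usual spark-submit way to pass a setting, and {{spark.python.use.daemon}} is the config named above.

```python
def spark_submit_command(script, use_daemon=False):
    """Build a spark-submit argument list that sets spark.python.use.daemon.

    Hypothetical helper for local experiments; `script` is whatever small
    test script is being checked.
    """
    value = "true" if use_daemon else "false"
    return [
        "spark-submit",
        "--conf", "spark.python.use.daemon=" + value,
        script,
    ]
```

The resulting list can be handed to {{subprocess.check_call}}. Given the slowdown mentioned above, this would only make sense for small, targeted runs.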




[jira] [Commented] (SPARK-7721) Generate test coverage report from Python

2016-10-17 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15583642#comment-15583642
 ] 

Josh Rosen commented on SPARK-7721:
---

IIRC when I looked into this I hit problems with the HTML Publisher Plugin not 
being able to properly publish / serve HTML reports which weren't present on 
the Jenkins master because the underlying files weren't being archived properly 
from the remote build workspaces. From a cursory Google search, it looks like 
other folks have hit similar problems with this: 
https://issues.jenkins-ci.org/browse/JENKINS-6780 
https://issues.jenkins-ci.org/browse/JENKINS-15301

Ideally we could use the Codecov service to aggregate and publish these 
reports. Last month I opened a ticket with Apache Infra to ask about obtaining 
the token which would let us push results to that service, but they haven't 
responded back to my latest comment yet: 
https://issues.apache.org/jira/browse/INFRA-12640

Alternatively, we could write some one-off shell to archive the reports to a 
public S3 bucket and serve them as static files.




[jira] [Commented] (SPARK-7721) Generate test coverage report from Python

2016-10-16 Thread holdenk (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15579874#comment-15579874
 ] 

holdenk commented on SPARK-7721:


[~joshrosen] is this something you're still looking at / interested in, or, if 
you have the review bandwidth, would this be a good place for someone else to 
step up and help out?




[jira] [Commented] (SPARK-7721) Generate test coverage report from Python

2015-06-15 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14586280#comment-14586280
 ] 

Josh Rosen commented on SPARK-7721:
---

We now have the Jenkins HTML publisher plugin installed, so we can now easily 
publish HTML reports from tools like coverage.py 
(https://wiki.jenkins-ci.org/display/JENKINS/HTML+Publisher+Plugin).  I might 
give this a try on NewSparkPullRequestBuilder today. 




[jira] [Commented] (SPARK-7721) Generate test coverage report from Python

2015-05-31 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14566727#comment-14566727
 ] 

Josh Rosen commented on SPARK-7721:
---

I played around with {{coverage.py}} a bit this morning and set up a script 
which runs the Python unit tests with coverage, combines the coverage data 
files, then generates a combined HTML report.  You can find my code at 
https://gist.github.com/JoshRosen/60d590b1cdc271d332e5; just clone that Gist 
and configure the environment variables properly, then run the bash script from 
the Gist directory.

One gotcha: I don't think that this is properly capturing coverage metrics for 
Python worker processes.  This may actually be somewhat complicated because I'm 
not sure that our use of {{fork()}} in {{daemon.py}} will play nicely with 
{{coverage.py}}'s parallel coverage file support (the feature that writes 
each process's coverage data to a separate file).  We may have to reach a 
bit more deeply into PySpark's internals in order to integrate coverage metrics 
for worker-side code, perhaps by adding code to programmatically start the 
coverage capturing after the fork.  It would be great if someone wants to work 
on this, although I imagine that worker-side coverage is a lower priority than 
having any form of basic coverage for the driver-side code.
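
The run / combine / report sequence described above can be sketched as a list of coverage.py command lines. This is not the script from the Gist: the function name and the test script paths are hypothetical, and only the standard {{coverage run --parallel-mode}}, {{coverage combine}} and {{coverage html -d}} commands are assumed.

```python
def coverage_commands(test_scripts, html_dir="htmlcov"):
    """Return the command lines for a run / combine / HTML-report sequence.

    Each test script runs in parallel mode, so every process writes its own
    data file; the files are then merged and rendered as one HTML report.
    """
    commands = [
        ["coverage", "run", "--parallel-mode", script]
        for script in test_scripts
    ]
    # Merge the per-process data files into a single data file.
    commands.append(["coverage", "combine"])
    # Render the combined data as an HTML report in html_dir.
    commands.append(["coverage", "html", "-d", html_dir])
    return commands
```

Each entry can be passed to {{subprocess.check_call}} in order. As noted above, this alone covers driver-side code only; forked worker processes would still need coverage started after the fork.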




[jira] [Commented] (SPARK-7721) Generate test coverage report from Python

2015-05-19 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551255#comment-14551255
 ] 

Josh Rosen commented on SPARK-7721:
---

Codacy doesn't require repo hook access in order to work; I have a private 
Codacy build for Spark that I set up a while back.  I haven't really played 
around with it much, though.




[jira] [Commented] (SPARK-7721) Generate test coverage report from Python

2015-05-19 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551252#comment-14551252
 ] 

Reynold Xin commented on SPARK-7721:


Would we have permission to use this?




[jira] [Commented] (SPARK-7721) Generate test coverage report from Python

2015-05-19 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551249#comment-14551249
 ] 

Josh Rosen commented on SPARK-7721:
---

Actually, we should check out Codacy, since they support Scala + Python and 
have a way to display coverage reports: https://www.codacy.com/features




[jira] [Commented] (SPARK-7721) Generate test coverage report from Python

2015-05-19 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551240#comment-14551240
 ] 

Josh Rosen commented on SPARK-7721:
---

If we just want to be able to view the coverage reports for individual builds, 
we can probably hook {{coverage.py}} into the build and rig it so that each 
Python process + test run logs its coverage data to a separate file.  Given 
these files, I think it's possible to have {{coverage.py}} generate a combined 
coverage report.  Maybe we could attach / serve these combined HTML reports 
from Jenkins.
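
A rough sketch of that flow, assuming {{coverage.py}} is installed: each test 
run is launched with {{coverage run -p}} so that it writes its own data file, 
and the files are then merged into one HTML report. The file names 
({{mylib.py}}, {{test_a.py}}, {{test_b.py}}) and the helper 
{{run_and_combine}} are made up for illustration and are not part of Spark's 
build.

```python
import os
import subprocess
import sys
import tempfile

import coverage  # third-party; assumes coverage.py is installed


def run_and_combine(workdir):
    """Run two toy test scripts under coverage and merge their data files.

    Hypothetical helper for illustration only.
    """
    # Stand-in for the code under test.
    with open(os.path.join(workdir, "mylib.py"), "w") as f:
        f.write("def double(x):\n    return 2 * x\n")

    for name in ("test_a.py", "test_b.py"):
        with open(os.path.join(workdir, name), "w") as f:
            f.write("import mylib\nassert mylib.double(2) == 4\n")
        # -p (parallel mode) makes each run write its own .coverage.* file.
        subprocess.check_call(
            [sys.executable, "-m", "coverage", "run", "-p", name],
            cwd=workdir,
        )

    data_files = [n for n in os.listdir(workdir) if n.startswith(".coverage.")]

    # Merge the per-run files into a single data file and render HTML.
    cov = coverage.Coverage(data_file=os.path.join(workdir, ".coverage"))
    cov.combine(data_paths=[workdir])
    cov.save()
    report_dir = os.path.join(workdir, "htmlcov")
    cov.html_report(directory=report_dir)
    return data_files, report_dir


if __name__ == "__main__":
    files, report = run_and_combine(tempfile.mkdtemp())
    print(len(files), "data files combined; report written to", report)
```

The resulting {{htmlcov}} directory is the kind of artifact that could be 
attached to or served from a Jenkins build.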

If we want to be able to compare coverage across builds, we could look into 
setting up an integration with Coveralls (coveralls.io), but we might run into 
issues with being unable to obtain the right GitHub permissions from Apache.  
We could also investigate Sonar.




[jira] [Commented] (SPARK-7721) Generate test coverage report from Python

2015-05-19 Thread Davies Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551198#comment-14551198
 ] 

Davies Liu commented on SPARK-7721:
---

There are some tools to generate test coverage for Python; what's the best way 
to show the reports? [~joshrosen] and I sometimes check the reports manually.




[jira] [Commented] (SPARK-7721) Generate test coverage report from Python

2015-05-18 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549873#comment-14549873
 ] 

Reynold Xin commented on SPARK-7721:


[~davies] any idea on this?
