[ https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14566727#comment-14566727 ]

Josh Rosen commented on SPARK-7721:
-----------------------------------

I played around with {{coverage.py}} a bit this morning and set up a script 
which runs the Python unit tests with coverage, combines the coverage data 
files, then generates a combined HTML report.  You can find my code at 
https://gist.github.com/JoshRosen/60d590b1cdc271d332e5; just clone that Gist 
and configure the environment variables properly, then run the bash script from 
the Gist directory.
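
Roughly, the combine-and-report steps boil down to something like the 
following (a minimal sketch using {{coverage.py}}'s Python API on a recent 
coverage.py release; the {{htmlcov}} output directory is just an example, not 
necessarily what the Gist uses):

{code:python}
import coverage

# Merge the per-process .coverage.* data files in the current directory
# into a single .coverage file, then render a combined HTML report.
cov = coverage.Coverage()
cov.combine()
cov.html_report(directory="htmlcov")
{code}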

One gotcha: I don't think that this properly captures coverage metrics for 
Python worker processes.  This may be somewhat complicated because I'm not 
sure that our use of {{fork()}} in {{daemon.py}} will play nicely with 
{{coverage.py}}'s parallel coverage file support (the feature that writes 
each process's coverage data to a separate file).  We may have to reach a 
bit more deeply into PySpark's internals in order to integrate coverage 
metrics for worker-side code, perhaps by adding code to programmatically 
start coverage capturing after the fork (see the sketch below).  It would be 
great if someone wants to work on this, although I imagine that worker-side 
coverage is a lower priority than having any form of basic coverage for the 
driver-side code.
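
As a rough illustration of that fork-time hook (the {{worker_main}} placement 
and {{run_worker}} name are hypothetical, not PySpark's actual {{daemon.py}} 
structure):

{code:python}
import coverage

def worker_main():
    # Hypothetical hook in the forked child, right after fork() returns 0.
    # data_suffix=True makes coverage.py write a unique
    # .coverage.<host>.<pid>.<random> file per process, so `coverage combine`
    # can later merge worker-side data with the driver-side data.
    cov = coverage.Coverage(data_suffix=True)
    cov.start()
    try:
        run_worker()  # hypothetical stand-in for the real worker loop
    finally:
        cov.stop()
        cov.save()
{code}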

> Generate test coverage report from Python
> -----------------------------------------
>
>                 Key: SPARK-7721
>                 URL: https://issues.apache.org/jira/browse/SPARK-7721
>             Project: Spark
>          Issue Type: Test
>          Components: PySpark, Tests
>            Reporter: Reynold Xin
>
> Would be great to have a test coverage report for Python. Compared with 
> Scala, it is trickier to understand coverage in Python without coverage 
> reports because we employ both docstring tests and unit tests in test files. 


