[ https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14566727#comment-14566727 ]
Josh Rosen commented on SPARK-7721:
-----------------------------------

I played around with {{coverage.py}} a bit this morning and set up a script that runs the Python unit tests with coverage, combines the coverage data files, then generates a combined HTML report. You can find my code at https://gist.github.com/JoshRosen/60d590b1cdc271d332e5; just clone that Gist, configure the environment variables properly, then run the bash script from the Gist directory.

One gotcha: I don't think this properly captures coverage metrics for Python worker processes. This may actually be somewhat complicated, because I'm not sure that our use of {{fork()}} in {{daemon.py}} will play nicely with {{coverage.py}}'s parallel coverage file support (the feature that writes different processes' coverage data to different files). We may have to reach a bit more deeply into PySpark's internals in order to integrate coverage metrics for worker-side code, perhaps by adding code to programmatically start coverage capture after the fork. It would be great if someone wants to work on this, although I imagine that worker-side coverage is a lower priority than having any form of basic coverage for the driver-side code.

> Generate test coverage report from Python
> -----------------------------------------
>
> Key: SPARK-7721
> URL: https://issues.apache.org/jira/browse/SPARK-7721
> Project: Spark
> Issue Type: Test
> Components: PySpark, Tests
> Reporter: Reynold Xin
>
> It would be great to have a test coverage report for Python. Compared with
> Scala, it is trickier to understand coverage without coverage reports in
> Python, because we employ both docstring tests and unit tests in test files.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
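The idea of programmatically starting coverage capture after the fork could look roughly like the following. This is a minimal sketch, not PySpark code: the function names {{start_worker_coverage}} / {{stop_worker_coverage}} are hypothetical hooks that {{daemon.py}} would call in the forked child, and it assumes {{coverage.py}} is importable in the worker's Python environment. The key piece is {{data_suffix=True}}, which makes each process write its own {{.coverage.*}} data file so forked workers don't clobber the parent's data; the per-process files can then be merged with {{coverage combine}}.

```python
def start_worker_coverage():
    """Begin recording coverage in a freshly forked worker process.

    Hypothetical hook: daemon.py would call this in the child right
    after fork(). Returns the Coverage object, or None if coverage.py
    is not installed.
    """
    try:
        import coverage
    except ImportError:
        return None
    # data_suffix=True appends hostname.pid.random to the data file name,
    # so each forked worker writes its own .coverage.* file instead of
    # racing with the parent or sibling workers.
    cov = coverage.Coverage(data_suffix=True)
    cov.start()
    return cov


def stop_worker_coverage(cov):
    """Hypothetical hook: called just before the worker exits."""
    if cov is not None:
        cov.stop()
        cov.save()  # writes the per-process .coverage.<suffix> file
```

After a test run, the per-worker files would be merged and reported with {{coverage combine}} followed by {{coverage html}}, which is essentially what the driver-side script in the Gist already does.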