[jira] [Commented] (SPARK-7721) Generate test coverage report from Python
[ https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16695652#comment-16695652 ]

Apache Spark commented on SPARK-7721:
-------------------------------------

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/23117

> Generate test coverage report from Python
> -----------------------------------------
>
>                 Key: SPARK-7721
>                 URL: https://issues.apache.org/jira/browse/SPARK-7721
>             Project: Spark
>          Issue Type: Test
>          Components: PySpark, Tests
>            Reporter: Reynold Xin
>            Priority: Major
>
> Would be great to have test coverage report for Python. Compared with Scala,
> it is trickier to understand the coverage without coverage reports in Python
> because we employ both docstring tests and unit tests in test files.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7721) Generate test coverage report from Python
[ https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16383532#comment-16383532 ]

Hyukjin Kwon commented on SPARK-7721:
-------------------------------------

[~rxin], I am sorry that this has been delayed. I will finish it for sure; I was away from it because of release work. I will try the "Another one" approach first and fall back to the "Simplest one" if that fails (both are described [in this comment|https://issues.apache.org/jira/browse/SPARK-7721?focusedCommentId=16305108&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16305108]).
[jira] [Commented] (SPARK-7721) Generate test coverage report from Python
[ https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16318639#comment-16318639 ]

Apache Spark commented on SPARK-7721:
-------------------------------------

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/20204
[jira] [Commented] (SPARK-7721) Generate test coverage report from Python
[ https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312398#comment-16312398 ]

Reynold Xin commented on SPARK-7721:
------------------------------------

I think it's fine even if you don't preserve the history forever ...
[jira] [Commented] (SPARK-7721) Generate test coverage report from Python
[ https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312331#comment-16312331 ]

Hyukjin Kwon commented on SPARK-7721:
-------------------------------------

I (and possibly a few committers, given [the comment above|https://issues.apache.org/jira/browse/SPARK-7721?focusedCommentId=14551198&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14551198]) would run this manually anyway, but yes, it only becomes really powerful once we can run it automatically.

If we are all fine with having a single up-to-date coverage site (the "Simplest one") for now, that is easy to do. It is just what I have done so far at https://spark-test.github.io/pyspark-coverage-site, and the only thing left is to make it automatic: clone the latest commit and push it.

I know it would be better to keep the history of the coverage reports and leave a link in each PR (the "Another one"). In that case we need to consider how to keep that history, and this is where I should investigate more and verify the idea.

I will test and investigate the integration further and try the "Another one" approach too. If that fails, I think we can fall back to the "Simplest one" for now. Does this sound good to you? This way, I can make sure we eventually run this automatically.

BTW, could you take a look at https://github.com/apache/spark/pull/20151 too? It keeps the coverage-only changes separate; I am trying to isolate this logic as much as possible in case we come up with a better idea in the future.
[jira] [Commented] (SPARK-7721) Generate test coverage report from Python
[ https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16311731#comment-16311731 ]

Reynold Xin commented on SPARK-7721:
------------------------------------

We can add it first but in my experience this will only be used when it is automatic :)
[jira] [Commented] (SPARK-7721) Generate test coverage report from Python
[ https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16309585#comment-16309585 ]

Hyukjin Kwon commented on SPARK-7721:
-------------------------------------

[~rxin] What do you think about adding a script first and then integrating it with Jenkins? Or would you like me to look into "2. Integrating with Jenkins" a bit more for clarification?
[jira] [Commented] (SPARK-7721) Generate test coverage report from Python
[ https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305108#comment-16305108 ]

Hyukjin Kwon commented on SPARK-7721:
-------------------------------------

I roughly checked the coverage results and they seem fine. There is one trivial nit, though: https://github.com/apache/spark/blob/04e44b37cc04f62fbf9e08c7076349e0a4d12ea8/python/pyspark/daemon.py#L148-L169 is not in the coverage results, because I produce the coverage results in {{worker.py}} separately and then merge them. I believe it's not a big deal.

So, if you are fine with everything so far, how about I proceed with two PRs:

1. Adding the script only (after cleaning it up, of course).
   The script alone should already be useful: reviewers can at least run it manually when they check PRs.

2. Integrating with Jenkins.
   I have two thoughts on this:
   - Simplest one: only run it in a specific master build in Jenkins and always keep a single up-to-date coverage site. We can simply push it. I think this is quite straightforward and pretty feasible.
   - Another one: I make a simple site on git pages that lists the coverage reports of all other builds (including PR builds), and then leave a link in each PR's Jenkins build success message. I think this is also feasible, but I need to take a further look.

BTW, I will be able to start working on this next week or two weeks from now.
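The "Simplest one" option above (keep one up-to-date coverage site and push it) could be sketched roughly as follows. This is only an illustration: the function name `publish_commands`, the checkout directory, and the commit message are hypothetical, and the site repository URL is whatever Pages repository ends up being used. The sketch only builds the command lines; it does not run them.

```python
def publish_commands(html_dir, site_repo_url,
                     checkout_dir="coverage-site",
                     message="Update coverage report"):
    """Build the shell commands that copy an HTML report into a Pages repo.

    All names here are illustrative assumptions, not part of Spark's build.
    """
    return [
        # Shallow clone: only the latest state of the site is needed.
        ["git", "clone", "--depth", "1", site_repo_url, checkout_dir],
        # Copy the *contents* of the report directory into the checkout.
        ["cp", "-R", html_dir + "/.", checkout_dir],
        ["git", "-C", checkout_dir, "add", "--all"],
        ["git", "-C", checkout_dir, "commit", "-m", message],
        ["git", "-C", checkout_dir, "push", "origin", "HEAD"],
    ]


if __name__ == "__main__":
    for argv in publish_commands("htmlcov", "<site repository URL>"):
        print(" ".join(argv))
```

Note that `git commit` exits with a non-zero status when nothing changed, so a real job would probably need to tolerate that case.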
[jira] [Commented] (SPARK-7721) Generate test coverage report from Python
[ https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305103#comment-16305103 ]

Hyukjin Kwon commented on SPARK-7721:
-------------------------------------

Hey [~rxin], I think I got it working now with a few modifications to the script and by forcing {{worker.py}} to produce the coverage results. I ran it with Python 3 and Coverage 4.4, all tests passed, and I just updated the site: https://spark-test.github.io/pyspark-coverage-site

FYI, here is the diff I used in the main code to force it to produce the results (an addition of about 15 lines):

{code}
diff --git a/python/pyspark/worker.py b/python/pyspark/worker.py
index e6737ae1c12..088debcf796 100644
--- a/python/pyspark/worker.py
+++ b/python/pyspark/worker.py
@@ -159,7 +159,7 @@ def read_udfs(pickleSer, infile, eval_type):
     return func, None, ser, ser
 
 
-def main(infile, outfile):
+def _main(infile, outfile):
     try:
         boot_time = time.time()
         split_index = read_int(infile)
@@ -259,6 +259,22 @@ def main(infile, outfile):
         exit(-1)
 
 
+if "COVERAGE_PROCESS_START" in os.environ:
+    def _cov_wrapped(*args, **kwargs):
+        import coverage
+        cov = coverage.coverage(
+            config_file=os.environ["COVERAGE_PROCESS_START"])
+        cov.start()
+        try:
+            _main(*args, **kwargs)
+        finally:
+            cov.stop()
+            cov.save()
+    main = _cov_wrapped
+else:
+    main = _main
+
+
 if __name__ == '__main__':
     # Read a local port to connect to from stdin
     java_port = int(sys.stdin.readline())
{code}
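The wrapper in the diff above only activates when {{COVERAGE_PROCESS_START}} is present in the worker's environment, and it passes that variable's value to coverage.py as the config file path. A minimal sketch of preparing such an environment for a test run might look like this; the helper name `coverage_env` and the config file location are assumptions for illustration, not part of the patch.

```python
import os


def coverage_env(config_path, base_env=None):
    """Return a copy of the environment with COVERAGE_PROCESS_START set.

    config_path is a (hypothetical) coverage.py config file; an absolute
    path is used so that worker processes started from another working
    directory can still find it.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["COVERAGE_PROCESS_START"] = os.path.abspath(config_path)
    return env


if __name__ == "__main__":
    env = coverage_env(".coveragerc")
    print(env["COVERAGE_PROCESS_START"])
    # The test runner would then be launched with this environment,
    # for example via subprocess.run(<runner command>, env=env).
```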
[jira] [Commented] (SPARK-7721) Generate test coverage report from Python
[ https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16290175#comment-16290175 ]

Hyukjin Kwon commented on SPARK-7721:
-------------------------------------

Sure, I didn't mean to rush and proceed without investigating and checking everything first. I just wanted to check your thoughts ahead of time. I will try to find some time to look into this and proceed bit by bit, and of course I will keep you updated.
[jira] [Commented] (SPARK-7721) Generate test coverage report from Python
[ https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16290134#comment-16290134 ]

Reynold Xin commented on SPARK-7721:
------------------------------------

We definitely don't need to do it in one go, but with things like this the key is to know for sure that we can do it. Otherwise it becomes half-baked infra that's committed but not actually functioning, and brings more hassle than needed.
[jira] [Commented] (SPARK-7721) Generate test coverage report from Python
[ https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16284797#comment-16284797 ]

Hyukjin Kwon commented on SPARK-7721:
-------------------------------------

For 2., I will take another look and come back with a way to solve it.

[~rxin], do you expect me to do this in one go, or in separate steps (1. exposing a working coverage report with Jenkins, and 2. making this work with doctests and tracking worker processes)?

BTW, somehow I am missing notifications from old JIRAs(?) unless I am explicitly cc'ed(?).
[jira] [Commented] (SPARK-7721) Generate test coverage report from Python
[ https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16284795#comment-16284795 ]

Hyukjin Kwon commented on SPARK-7721:
-------------------------------------

[~rxin] Yup, it is. Coverage from the worker side is missing too, as described above. So, there are currently two problems:

1. Worker-side coverage is not captured: I couldn't figure out how to track worker processes with {{daemon.py}} and {{fork()}}, as Josh described. I think this has become more important now, in particular for recent changes like Pandas UDFs. One thing I could do is make some manual changes in our codebase on the Python worker side to force it to work. It's ugly, but I think we could at least make this work.

2. Doctests seem to be missed: I think I also need a manual fix to run them with coverage. I could not figure out a clean way to do this, but I think I can at least make it work.
[jira] [Commented] (SPARK-7721) Generate test coverage report from Python
[ https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16266358#comment-16266358 ]

Reynold Xin commented on SPARK-7721:
------------------------------------

This is really cool. I took a look, but it looks like doctests are missing? For example, sortWithinPartitions is labeled as missing, but there is a doctest for that.
[jira] [Commented] (SPARK-7721) Generate test coverage report from Python
[ https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16265588#comment-16265588 ]

Hyukjin Kwon commented on SPARK-7721:
-------------------------------------

Hey [~rxin] and [~joshrosen], I just did a simple demo here:

https://spark-test.github.io/pyspark-coverage-site/
https://github.com/spark-test/pyspark-coverage-site
[jira] [Commented] (SPARK-7721) Generate test coverage report from Python
[ https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16264349#comment-16264349 ]

Hyukjin Kwon commented on SPARK-7721:
-------------------------------------

I knew of a similar way but wasn't sure if it was the only way, so I was hesitant, but then I found this JIRA. I can give it a shot if using git pages sounds good to you guys.
[jira] [Commented] (SPARK-7721) Generate test coverage report from Python
[ https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263944#comment-16263944 ]

Hyukjin Kwon commented on SPARK-7721:
-------------------------------------

[~joshrosen], ahh, I happened to duplicate the efforts here before. So, it seems the Jenkins <> Codecov integration is declined for now?

Probably one easy workaround is to just use GitHub Pages (https://pages.github.com/). All we would need to do is push the changes to a repo if the tests pass, which automatically updates its page. I did this before to demonstrate the SQL function docs:

https://spark-test.github.io/sparksqldoc/
https://github.com/spark-test/sparksqldoc

FWIW, I recently added the {{spark.python.use.daemon}} config, as in SparkR, to disable {{os.fork}}, and this (of course) enables tracking worker processes, although we should not disable the daemon in Jenkins tests because it is extremely slow. It was good enough for small tests to verify a PR or changes, though.
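For a small local run, the {{spark.python.use.daemon}} setting mentioned above could be passed on the command line. The sketch below only assembles a `spark-submit` invocation; the helper name `submit_args` and the script name are hypothetical, and it assumes `spark-submit` is on the PATH.

```python
def submit_args(script, use_daemon=False):
    """Build a spark-submit command line that toggles the Python daemon.

    With use_daemon=False, Python workers are started without the forking
    daemon, which the comment above describes as slow but trackable.
    """
    conf = "spark.python.use.daemon=" + ("true" if use_daemon else "false")
    return ["spark-submit", "--conf", conf, script]


if __name__ == "__main__":
    # "small_test.py" is a placeholder for a short verification script.
    print(" ".join(submit_args("small_test.py")))
```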
[jira] [Commented] (SPARK-7721) Generate test coverage report from Python
[ https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15583642#comment-15583642 ]

Josh Rosen commented on SPARK-7721:
-----------------------------------

IIRC when I looked into this I hit problems with the HTML Publisher Plugin not being able to properly publish / serve HTML reports which weren't present on the Jenkins master because the underlying files weren't being archived properly from the remote build workspaces. From a cursory Google search, it looks like other folks have hit similar problems with this:

https://issues.jenkins-ci.org/browse/JENKINS-6780
https://issues.jenkins-ci.org/browse/JENKINS-15301

Ideally we could use the Codecov service to aggregate and publish these reports. Last month I opened a ticket with Apache Infra to ask about obtaining the token which would let us push results to that service, but they haven't responded back to my latest comment yet: https://issues.apache.org/jira/browse/INFRA-12640

Alternatively, we could write some one-off shell to archive the reports to a public S3 bucket and serve them as static files.
[jira] [Commented] (SPARK-7721) Generate test coverage report from Python
[ https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15579874#comment-15579874 ]

holdenk commented on SPARK-7721:
--------------------------------

[~joshrosen] is this something you're still looking at / interested in, or, if you have the review bandwidth, would this be a good place for someone else to step up and help out?
[jira] [Commented] (SPARK-7721) Generate test coverage report from Python
[ https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14586280#comment-14586280 ]

Josh Rosen commented on SPARK-7721:
-----------------------------------

We now have the Jenkins HTML publisher plugin installed, so we can now easily publish HTML reports from tools like coverage.py (https://wiki.jenkins-ci.org/display/JENKINS/HTML+Publisher+Plugin). I might give this a try on NewSparkPullRequestBuilder today.
[jira] [Commented] (SPARK-7721) Generate test coverage report from Python
[ https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14566727#comment-14566727 ]

Josh Rosen commented on SPARK-7721:
-----------------------------------

I played around with {{coverage.py}} a bit this morning and set up a script which runs the Python unit tests with coverage, combines the coverage data files, then generates a combined HTML report. You can find my code at https://gist.github.com/JoshRosen/60d590b1cdc271d332e5; just clone that Gist and configure the environment variables properly, then run the bash script from the Gist directory.

One gotcha: I don't think that this is properly capturing coverage metrics for Python worker processes. This may actually be somewhat complicated because I'm not sure that our use of {{fork()}} in {{daemon.py}} will play nicely with {{coverage.py}}'s parallel coverage file support (the feature that writes different processes' coverage data to different files). We may have to reach a bit more deeply into PySpark's internals in order to integrate coverage metrics for worker-side code, perhaps by adding code to programmatically start the coverage capturing after the fork. It would be great if someone wants to work on this, although I imagine that worker-side coverage is a lower priority than having any form of basic coverage for the driver-side code.
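The three steps described above (run the tests with coverage, combine the data files, generate an HTML report) map onto coverage.py's `run`, `combine` and `html` subcommands. The following is a rough sketch of that sequence, not the contents of the Gist: the helper name `coverage_pipeline` and the test module name are placeholders, and the commands are only built, not executed.

```python
import shlex


def coverage_pipeline(test_modules, html_dir="htmlcov"):
    """Build argv lists for a run -> combine -> html coverage pipeline."""
    commands = [
        # --parallel-mode makes each run write its own data file,
        # so the results of several runs do not overwrite each other.
        ["coverage", "run", "--parallel-mode", "-m", module]
        for module in test_modules
    ]
    # Merge all per-run data files into a single data file.
    commands.append(["coverage", "combine"])
    # Render the merged data as a browsable HTML report.
    commands.append(["coverage", "html", "--directory", html_dir])
    return commands


if __name__ == "__main__":
    # "pyspark.tests" is used here purely as an example module name.
    for argv in coverage_pipeline(["pyspark.tests"]):
        print(shlex.join(argv))
```

As the comment above notes, this covers driver-side code only; forked worker processes need separate handling.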
[jira] [Commented] (SPARK-7721) Generate test coverage report from Python
[ https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551255#comment-14551255 ]

Josh Rosen commented on SPARK-7721:
-----------------------------------

Codacy doesn't require repo hook access in order to work; I have a private Codacy build for Spark that I set up a while back. I haven't really played around with it much, though.
[jira] [Commented] (SPARK-7721) Generate test coverage report from Python
[ https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551252#comment-14551252 ]

Reynold Xin commented on SPARK-7721:
------------------------------------

Would we have permission to use this?
[jira] [Commented] (SPARK-7721) Generate test coverage report from Python
[ https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551249#comment-14551249 ]

Josh Rosen commented on SPARK-7721:
-----------------------------------

Actually, we should check out Codacy, since they support Scala + Python and have a way to display coverage reports: https://www.codacy.com/features
[jira] [Commented] (SPARK-7721) Generate test coverage report from Python
[ https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551240#comment-14551240 ]

Josh Rosen commented on SPARK-7721:
-----------------------------------

If we just want to be able to view the coverage reports for individual builds, we can probably hook {{coverage.py}} into the build and rig it so that each Python process + test run logs its coverage data to a separate file. Given these files, I think it's possible to have {{coverage.py}} generate a combined coverage report. Maybe we could attach / serve these combined HTML reports from Jenkins.

If we want to be able to compare coverage across builds, we could look into setting up an integration with Coveralls (coveralls.io), but we might run into issues with being unable to obtain the right GitHub permissions from Apache.

We could also investigate Sonar.
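
The per-process data files and combined HTML report described in this comment could be sketched roughly as follows. This is only an illustration, not the approach the project adopted: the helper names (coverage_env, build_commands, run_all) and the idea of passing a list of test scripts are assumptions made for the example. It relies on coverage.py's "run --parallel-mode", "combine" and "html -d" subcommands and on its COVERAGE_FILE environment variable.

```python
import os
import subprocess
import sys


def coverage_env(data_dir):
    """Copy of the environment that points coverage.py's data file into data_dir.

    With --parallel-mode, coverage.py appends a unique suffix to this base
    name, so every test run writes its own file.
    """
    env = dict(os.environ)
    env["COVERAGE_FILE"] = os.path.join(data_dir, ".coverage")
    return env


def build_commands(test_scripts, html_dir):
    """Build one 'coverage run' per test script, then 'combine' and 'html'.

    test_scripts and html_dir are hypothetical inputs for this sketch.
    """
    runs = [
        [sys.executable, "-m", "coverage", "run", "--parallel-mode", script]
        for script in test_scripts
    ]
    combine = [sys.executable, "-m", "coverage", "combine"]
    report = [sys.executable, "-m", "coverage", "html", "-d", html_dir]
    return runs + [combine, report]


def run_all(test_scripts, data_dir, html_dir):
    """Run every command in order; requires coverage.py to be installed."""
    env = coverage_env(data_dir)
    for cmd in build_commands(test_scripts, html_dir):
        subprocess.check_call(cmd, env=env)
```

The resulting html_dir is what a Jenkins job could archive or serve. Note that this only measures the processes launched directly by "coverage run"; child processes started by the tests themselves would need extra setup that is not shown here.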
[jira] [Commented] (SPARK-7721) Generate test coverage report from Python
[ https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551198#comment-14551198 ]

Davies Liu commented on SPARK-7721:
-----------------------------------

There are some tools to generate test coverage for Python. What's the best way to show the reports? [~joshrosen] and I sometimes check the reports manually.
[jira] [Commented] (SPARK-7721) Generate test coverage report from Python
[ https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549873#comment-14549873 ]

Reynold Xin commented on SPARK-7721:
------------------------------------

[~davies] any idea on this?