Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/19894 )
Change subject: IMPALA-12171: Optimize delimited output ...................................................................... Patch Set 8: (8 comments) http://gerrit.cloudera.org:8080/#/c/19894/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19894/6//COMMIT_MSG@16 PS6, Line 16: python2 + hs2: 42s516ms -> 22s335ms > Optional: you could also include a percentage of performance improvement, i I didn't want to add percentages as I cannot say something like "ClientFetchWaitTimer was improved 50% for Python3 + HS2" - this really depends on other factors like data types and number of columns/rows. This specific benchmarks show s that the improvement is significant, but I wouldn't want to extrapolate to other queries. http://gerrit.cloudera.org:8080/#/c/19894/6/shell/shell_output.py File shell/shell_output.py: http://gerrit.cloudera.org:8080/#/c/19894/6/shell/shell_output.py@102 PS6, Line 102: [ > Optional: You could use () instead of [], which would create a lazy iterato Stayed with [] as I do not know lazy iterators that well. I tested it and the speed didn't change. http://gerrit.cloudera.org:8080/#/c/19894/6/shell/shell_output.py@105 PS6, Line 105: or result.find(quote) != -1: > Currently we can't turn the try_quick_path flag back to True once it's turn I wouldn't complicate this part as I am considering actually implementing quoting in the future to handle all cases. The quoting rules are not too complex but would need robust testing to ensure that we always return the same results as Python's csv module. The find/count functions add minimal overhead, the only one that is somewhat significant is decoding. Note that DelimitedOutputFormatter is created per query, so if there are several queries the shell will retry the quick path in each query. http://gerrit.cloudera.org:8080/#/c/19894/6/shell/shell_output.py@123 PS6, Line 123: > nit: don't need 'pass' Done http://gerrit.cloudera.org:8080/#/c/19894/6/tests/shell/test_shell_commandline.py File tests/shell/test_shell_commandline.py: http://gerrit.cloudera.org:8080/#/c/19894/6/tests/shell/test_shell_commandline.py@33 PS6, Line 33: try: > Could we check the python version and branch on that instead of try-except? This was copied from shell_output.py and I am not sure that the comments are always correct (is cStringIO always available on Python2?). http://gerrit.cloudera.org:8080/#/c/19894/6/tests/shell/test_shell_commandline.py@1597 PS6, Line 1597: > Nit: result sets. Done http://gerrit.cloudera.org:8080/#/c/19894/6/tests/shell/test_shell_commandline.py@1599 PS6, Line 1599: . > Nit: . (full stop). Done http://gerrit.cloudera.org:8080/#/c/19894/6/tests/shell/test_shell_commandline.py@1628 PS6, Line 1628: > Can we add tests for decoding errors? We can use the queries in test_utf8_d Added a test and bumped into an issue with Hive: HIVE-27418 (and learned that with strict hs2 protocol the shell tests connect to Hive). -- To view, visit http://gerrit.cloudera.org:8080/19894 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I671c6f538c588f8ad4ef4067f7bc8a6b8a5220cb Gerrit-Change-Number: 19894 Gerrit-PatchSet: 8 Gerrit-Owner: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Daniel Becker <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Jason Fehr <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Comment-Date: Wed, 07 Jun 2023 16:51:35 +0000 Gerrit-HasComments: Yes
