Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19894 )

Change subject: IMPALA-12171: Optimize delimited output
......................................................................


Patch Set 8:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/19894/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19894/6//COMMIT_MSG@16
PS6, Line 16: python2 + hs2: 42s516ms -> 22s335ms
> Optional: you could also include a percentage of performance improvement, i
I didn't want to add percentages as I cannot say something like 
"ClientFetchWaitTimer was improved 50% for Python3 + HS2" - this really depends 
on other factors like data types and number of columns/rows. This specific 
benchmarks show s that the improvement is significant, but I wouldn't want to 
extrapolate to other queries.


http://gerrit.cloudera.org:8080/#/c/19894/6/shell/shell_output.py
File shell/shell_output.py:

http://gerrit.cloudera.org:8080/#/c/19894/6/shell/shell_output.py@102
PS6, Line 102: [
> Optional: You could use () instead of [], which would create a lazy iterato
Stayed with [] as I do not know lazy iterators that well. I tested it and the 
speed didn't change.


http://gerrit.cloudera.org:8080/#/c/19894/6/shell/shell_output.py@105
PS6, Line 105:         or result.find(quote) != -1:
> Currently we can't turn the try_quick_path flag back to True once it's turn
I wouldn't complicate this part as I am considering actually implementing 
quoting in the future to handle all cases. The quoting rules are not too 
complex but would need robust testing to ensure that we always return the same 
results as Python's csv module.

The find/count functions add minimal overhead, the only one that is somewhat 
significant is decoding.

Note that DelimitedOutputFormatter is created per query, so if there are 
several queries the shell will retry the quick path in each query.


http://gerrit.cloudera.org:8080/#/c/19894/6/shell/shell_output.py@123
PS6, Line 123:
> nit: don't need 'pass'
Done


http://gerrit.cloudera.org:8080/#/c/19894/6/tests/shell/test_shell_commandline.py
File tests/shell/test_shell_commandline.py:

http://gerrit.cloudera.org:8080/#/c/19894/6/tests/shell/test_shell_commandline.py@33
PS6, Line 33: try:
> Could we check the python version and branch on that instead of try-except?
This was copied from shell_output.py and I am not sure that the comments are 
always correct (is cStringIO always available on Python2?).


http://gerrit.cloudera.org:8080/#/c/19894/6/tests/shell/test_shell_commandline.py@1597
PS6, Line 1597:
> Nit: result sets.
Done


http://gerrit.cloudera.org:8080/#/c/19894/6/tests/shell/test_shell_commandline.py@1599
PS6, Line 1599: .
> Nit: . (full stop).
Done


http://gerrit.cloudera.org:8080/#/c/19894/6/tests/shell/test_shell_commandline.py@1628
PS6, Line 1628:
> Can we add tests for decoding errors? We can use the queries in test_utf8_d
Added a test and bumped into an issue with Hive: HIVE-27418 (and learned that 
with strict hs2 protocol the shell tests connect to Hive).



--
To view, visit http://gerrit.cloudera.org:8080/19894
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I671c6f538c588f8ad4ef4067f7bc8a6b8a5220cb
Gerrit-Change-Number: 19894
Gerrit-PatchSet: 8
Gerrit-Owner: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Jason Fehr <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Comment-Date: Wed, 07 Jun 2023 16:51:35 +0000
Gerrit-HasComments: Yes

Reply via email to