[Impala-ASF-CR] IMPALA-1144: Fix exception when cancelling query in Impala-shell with CTRL-C
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8549 ) Change subject: IMPALA-1144: Fix exception when cancelling query in Impala-shell with CTRL-C .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/8549/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8549/2//COMMIT_MSG@25 PS2, Line 25: Tested manually starting with a select from a random table that has few > I haven't figured out a way to test this automatically. So, um, if you want a query that returns a lot of rows: with v as (values (1 as x), (2), (3), (4)) select * from v, v v2, v v3, v v4, v v5, v v6, v v7, v v8, v v9 will do it. (Customize as necessary.) In my very quick experiment, that query is long enough that Ctrl-C worked (but got the query handle error), and, um, takes 20 seconds to go to /dev/null, so you've got time to try the cancellation. $impala-shell.sh --query ' with v as (values (1 as x), (2), (3), (4)) select * from v, v v2, v v3, v v4, v v5, v v6, v v7, v v8, v v9' -o /dev/null Starting Impala Shell without Kerberos authentication Connected to localhost:21000 Server version: impalad version 2.11.0-SNAPSHOT DEBUG (build 454a4aefa426725ab8806cfc814fb8e9df3ab8e5) Query: with v as (values (1 as x), (2), (3), (4)) select * from v, v v2, v v3, v v4, v v5, v v6, v v7, v v8, v v9 Fetched 262144 row(s) in 21.50s $impala-shell.sh --query ' with v as (values (1 as x), (2), (3), (4)) select * from v, v v2, v v3, v v4, v v5, v v6, v v7, v v8, v v9' -o /dev/null Starting Impala Shell without Kerberos authentication Connected to localhost:21000 Server version: impalad version 2.11.0-SNAPSHOT DEBUG (build 454a4aefa426725ab8806cfc814fb8e9df3ab8e5) Query: with v as (values (1 as x), (2), (3), (4)) select * from v, v v2, v v3, v v4, v v5, v v6, v v7, v v8, v v9 ^C Cancelling Query ERROR: Invalid query handle: 864b2a5cd241853b:c45bb8fe Could not execute command: with v as (values (1 as x), (2), (3), (4)) select * from v, v v2, v v3, v v4, v v5, v v6, v v7, v v8, v v9 Note that I didn't get "Invalid query handle" every time; depends a little bit on the timing of Ctrl-C. -- To view, visit http://gerrit.cloudera.org:8080/8549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6cefaf1dae78baae238289816a7cb9d210fb38e2 Gerrit-Change-Number: 8549 Gerrit-PatchSet: 3 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 15 Nov 2017 05:03:45 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1144: Fix exception when CTRL+C on running query in Impala-shell
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8549 ) Change subject: IMPALA-1144: Fix exception when CTRL+C on running query in Impala-shell .. Patch Set 2: (5 comments) The Python looks fine to me (after David's semicolon comment). I think it'd be good to try to add a regular test for cancellation in the shell. http://gerrit.cloudera.org:8080/#/c/8549/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8549/2//COMMIT_MSG@7 PS2, Line 7: IMPALA-1144: Fix exception when CTRL+C on running query in Impala-shell Perhaps: Fix exception when cancelling query in impala-shell with Ctrl-C "when CTRL+C" reads oddly to me. http://gerrit.cloudera.org:8080/#/c/8549/2//COMMIT_MSG@11 PS2, Line 11: 'Invalid query handle'. nit: reformat this as one paragraph? http://gerrit.cloudera.org:8080/#/c/8549/2//COMMIT_MSG@22 PS2, Line 22: was issued automatically. This happened to historical reasons but not needed nit: "but is not needed" (insert "is") http://gerrit.cloudera.org:8080/#/c/8549/2//COMMIT_MSG@25 PS2, Line 25: Tested manually starting with a select from a random table that has few Can we write a normal test for this? I don't know if it's true, but "select sleep(6) from functional.whatever" may actually work with cancellation. Or it may not: I haven't tried it! http://gerrit.cloudera.org:8080/#/c/8549/2/shell/impala_shell.py File shell/impala_shell.py: http://gerrit.cloudera.org:8080/#/c/8549/2/shell/impala_shell.py@485 PS2, Line 485: new_imp_client = self._new_impala_client() does new_imp_client need to be closed? or old_imp_client? -- To view, visit http://gerrit.cloudera.org:8080/8549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6cefaf1dae78baae238289816a7cb9d210fb38e2 Gerrit-Change-Number: 8549 Gerrit-PatchSet: 2 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 15 Nov 2017 00:52:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2235: Fix current db when shell auto-reconnects
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8368 ) Change subject: IMPALA-2235: Fix current db when shell auto-reconnects .. Patch Set 5: Code-Review+1 Thanks. Tim: I'm good with this; you'll have to +2. -- To view, visit http://gerrit.cloudera.org:8080/8368 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I40dfa00ba0314d356fe8617446f516505c925e5e Gerrit-Change-Number: 8368 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 15 Nov 2017 00:12:38 + Gerrit-HasComments: No
[Impala-ASF-CR] [PREVIEW] IMPALA-4886: Expose table metrics in the catalog web UI.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8529 ) Change subject: [PREVIEW] IMPALA-4886: Expose table metrics in the catalog web UI. .. Patch Set 1: (3 comments) I took a very brief look. I was worried when you said "lightweight framework", but it turns out you're using codahale/yammer metrics, which is the right thing to do. You know this, but TopNCache doesn't have a unit test. I didn't look at all of this. http://gerrit.cloudera.org:8080/#/c/8529/1/common/thrift/JniCatalog.thrift File common/thrift/JniCatalog.thrift: http://gerrit.cloudera.org:8080/#/c/8529/1/common/thrift/JniCatalog.thrift@602 PS1, Line 602: 1: required list large_tables Is the idea that len(large_tables)==len(memory_estimates), and likewise len(frequent_table)=len(num_metadata_operations)? I think it'd be more honest for this to be "list large_tables", with LargeTable being a struct that has a tablename and details about that table that you want to share. http://gerrit.cloudera.org:8080/#/c/8529/1/fe/pom.xml File fe/pom.xml: http://gerrit.cloudera.org:8080/#/c/8529/1/fe/pom.xml@365 PS1, Line 365: metrics-core Thanks. This is the right thing to use in my experience. http://gerrit.cloudera.org:8080/#/c/8529/1/fe/src/main/java/org/apache/impala/catalog/CatalogUsageMonitor.java File fe/src/main/java/org/apache/impala/catalog/CatalogUsageMonitor.java: http://gerrit.cloudera.org:8080/#/c/8529/1/fe/src/main/java/org/apache/impala/catalog/CatalogUsageMonitor.java@36 PS1, Line 36: // TODO: Consider making it a configurable parameter. A somewhat cheap way to do this is to use: Integer.getInteger("org.apache.impala.catalog.CatalogUsageMonitor.NUM_TABLES_TRACKED", 25) here. That gets the number from system properties. It's pretty weak configuration, but I've used it to nice effect for constants that really should be fine as is. -- To view, visit http://gerrit.cloudera.org:8080/8529 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I37d407979e6d3b1a444b6b6265900b148facde9e Gerrit-Change-Number: 8529 Gerrit-PatchSet: 1 Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Tue, 14 Nov 2017 20:00:54 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5624: ProcessStateInfo::ReadProcFileDescriptorInfo() should not fork a process
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8546 ) Change subject: IMPALA-5624: ProcessStateInfo::ReadProcFileDescriptorInfo() should not fork a process .. Patch Set 2: Based on my reading of "man readdir", it's not threadsafe. I think the only usage here is if a user hits "http://impalad:.../"; to see the web UI. It's possible that there's a lock somewhere preventing concurrent use, but given that this is already reasonably expensive, I'd recommend using the reentrant readdir_r instead. I looked around, and it looks like C++17 and boost have filesystem abstractions, but I think readdir() is simple enough. -- To view, visit http://gerrit.cloudera.org:8080/8546 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0 Gerrit-Change-Number: 8546 Gerrit-PatchSet: 2 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Tue, 14 Nov 2017 19:09:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2235: Fix current db when shell auto-reconnects
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8368 ) Change subject: IMPALA-2235: Fix current db when shell auto-reconnects .. Patch Set 4: Code-Review+1 (1 comment) Thanks for the commit message. Please see my comment about the sleep you added. If you think it won't work, I'm fine with this as is. http://gerrit.cloudera.org:8080/#/c/8368/4/tests/custom_cluster/test_shell_interactive_reconnect.py File tests/custom_cluster/test_shell_interactive_reconnect.py: http://gerrit.cloudera.org:8080/#/c/8368/4/tests/custom_cluster/test_shell_interactive_reconnect.py@60 PS4, Line 60: time.sleep(1) Would "p.send_cmd("show tables"); p.get_result()" make this stable, without doing a sleep? I always worry a bit about sleeps in tests, as they have a tendency to show up at annoying times as problems. -- To view, visit http://gerrit.cloudera.org:8080/8368 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I40dfa00ba0314d356fe8617446f516505c925e5e Gerrit-Change-Number: 8368 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 14 Nov 2017 16:58:04 + Gerrit-HasComments: Yes
[Impala-ASF-CR] Pin gen build version's git handling to typical git dir.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8500 ) Change subject: Pin gen_build_version's git handling to typical git dir. .. Patch Set 1: I'm seeing a compile error here. https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/613/console. Seems unrelated, so we'll probably try again later? 21:19:41 [ 34%] Building CXX object be/src/udf/CMakeFiles/Udf.dir/udf-test-harness.cc.o 21:19:41 /home/ubuntu/Impala-lzo/hdfs-lzo-text-scanner.cc: In member function ‘impala::Status impala::HdfsLzoTextScanner::ReadAndDecompressData(impala::MemPool*)’: 21:19:41 /home/ubuntu/Impala-lzo/hdfs-lzo-text-scanner.cc:589:44: error: no matching function for call to ‘impala::ScannerContext::ReleaseCompletedResources(bool)’ 21:19:41context_->ReleaseCompletedResources(false); 21:19:41 ^ 21:19:41 /home/ubuntu/Impala-lzo/hdfs-lzo-text-scanner.cc:589:44: note: candidate is: 21:19:41 In file included from /home/ubuntu/Impala/be/src/exec/hdfs-scanner.h:31:0, 21:19:41 from /home/ubuntu/Impala/be/src/exec/hdfs-text-scanner.h:22, 21:19:41 from /home/ubuntu/Impala-lzo/hdfs-lzo-text-scanner.h:9, 21:19:41 from /home/ubuntu/Impala-lzo/hdfs-lzo-text-scanner.cc:22: 21:19:41 /home/ubuntu/Impala/be/src/exec/scanner-context.h:298:8: note: void impala::ScannerContext::ReleaseCompletedResources(impala::RowBatch*, bool) 21:19:41void ReleaseCompletedResources(RowBatch* batch, bool done); 21:19:41 ^ 21:19:41 /home/ubuntu/Impala/be/src/exec/scanner-context.h:298:8: note: candidate expects 2 arguments, 1 provided -- To view, visit http://gerrit.cloudera.org:8080/8500 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I98870a1cb2073ddcf9ba1620e3801e1964930edd Gerrit-Change-Number: 8500 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Mon, 13 Nov 2017 22:19:18 + Gerrit-HasComments: No
[Impala-ASF-CR] Pin gen build version's git handling to typical git dir.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8500 ) Change subject: Pin gen_build_version's git handling to typical git dir. .. Patch Set 1: Ping? Thanks! -- To view, visit http://gerrit.cloudera.org:8080/8500 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I98870a1cb2073ddcf9ba1620e3801e1964930edd Gerrit-Change-Number: 8500 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Mon, 13 Nov 2017 21:03:41 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2235: Fix current db when shell auto-reconnects
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8368 ) Change subject: IMPALA-2235: Fix current db when shell auto-reconnects .. Patch Set 3: Code-Review+1 (3 comments) Thanks for the updates. This looks fine to me, with 2 minor issues: * Please re-work the first line of the commit message; it doesn't make sense to me. (Or maybe I'm missing something, in which case just let me know.) * If you're using NamedTemporaryFile to just get a filename, I suggested a shorter solution. There are two cases in this diff where you could take advantage of that. I'm fine with both of those changes going in without further review; they're very minor. http://gerrit.cloudera.org:8080/#/c/8368/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8368/3//COMMIT_MSG@9 PS3, Line 9: When precmd tested the connection it didn't validate that if we are This sentence didn't parse for me. Specifically, I'm not sure what "that" in "it didn't validate that" is referring to. http://gerrit.cloudera.org:8080/#/c/8368/3/tests/custom_cluster/test_shell_interactive_reconnect.py File tests/custom_cluster/test_shell_interactive_reconnect.py: http://gerrit.cloudera.org:8080/#/c/8368/3/tests/custom_cluster/test_shell_interactive_reconnect.py@36 PS3, Line 36: tempfile = NamedTemporaryFile(delete=False) : tempfile.close() : cls.tempfile_name = tempfile.name Minor: I think this is just: cls.tempfile_name = tempfile.mktemp() Or did you use NamedTemporaryFile to create the file too, and that's important somehow? http://gerrit.cloudera.org:8080/#/c/8368/3/tests/shell/util.py File tests/shell/util.py: http://gerrit.cloudera.org:8080/#/c/8368/3/tests/shell/util.py@121 PS3, Line 121: """ Moves back history file from given filepath """ mention that this is a no-op if filepath doesn't exist. -- To view, visit http://gerrit.cloudera.org:8080/8368 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I40dfa00ba0314d356fe8617446f516505c925e5e Gerrit-Change-Number: 8368 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 13 Nov 2017 19:14:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6148: Specifying thirdparty deps as URLs
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8456 ) Change subject: IMPALA-6148: Specifying thirdparty deps as URLs .. Patch Set 6: Well, that's what I get for being clever with bash. I've removed the magical "unset" for loop and am now unsetting every URL explicitly. -- To view, visit http://gerrit.cloudera.org:8080/8456 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4628d86022d4bd8b762313f7056d76416a58b422 Gerrit-Change-Number: 8456 Gerrit-PatchSet: 6 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Thu, 09 Nov 2017 18:18:07 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6148: Specifying thirdparty deps as URLs
Hello David Knupp, Joe McDonnell, Zach Amsden, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8456 to look at the new patch set (#6). Change subject: IMPALA-6148: Specifying thirdparty deps as URLs .. IMPALA-6148: Specifying thirdparty deps as URLs If the environment variable $IMPALA__URL is configured in impala-config-branch.sh or impala-config-local, for a thirdparty dependency, use that to download it instead of the s3://native-toolchain bucket. This makes testing against arbitrary versions of the dependencies easier. I did a little bit of refactoring while here, creating a small class for a Package to handle reading the environment variables. I also changed bootstrap_toolchain.py to use Python logging, which cleans up the output during the multi-threaded downloading. I tested this by both with customized URLs and by running the regular build (pre-review-test, without most of the slow test suites). Change-Id: I4628d86022d4bd8b762313f7056d76416a58b422 --- M bin/bootstrap_toolchain.py M bin/impala-config.sh 2 files changed, 138 insertions(+), 57 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/8456/6 -- To view, visit http://gerrit.cloudera.org:8080/8456 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4628d86022d4bd8b762313f7056d76416a58b422 Gerrit-Change-Number: 8456 Gerrit-PatchSet: 6 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] Remove unused/defunct Maven repositories.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8497 ) Change subject: Remove unused/defunct Maven repositories. .. Patch Set 1: https://jenkins.impala.io/view/Utility/job/pre-review-test/74/ ran happily with this change. The "-from-scratch" aspect of it probably suggests that it ran with a clean .m2 cache (though I haven't explicitly verified that). -- To view, visit http://gerrit.cloudera.org:8080/8497 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I79eb6c483561726c7cbaf86874001f1979128720 Gerrit-Change-Number: 8497 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Wed, 08 Nov 2017 23:00:17 + Gerrit-HasComments: No
[Impala-ASF-CR] Remove unused/defunct Maven repositories.
Philip Zeyliger has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8497 Change subject: Remove unused/defunct Maven repositories. .. Remove unused/defunct Maven repositories. Removes three Maven repositories. davidtrott and codehaus both don't exist any more, so they're not doing anyone any good. (We had previously cleaned up Codehaus in IMPALA-5224, but a reference was resurrected.) The libphonenumber repo was simply misconfigured: the library exists in Maven central in the "normal" place, and a subdirectory repo is unnecessary. To test this, I ran "buildall" after removing ~/.m2/ on my machine. Change-Id: I79eb6c483561726c7cbaf86874001f1979128720 --- M common/yarn-extras/pom.xml M fe/pom.xml M tests/test-hive-udfs/pom.xml 3 files changed, 0 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/8497/1 -- To view, visit http://gerrit.cloudera.org:8080/8497 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I79eb6c483561726c7cbaf86874001f1979128720 Gerrit-Change-Number: 8497 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger
[Impala-ASF-CR] Expose $IMPALA MAVEN OPTIONS for configuring Maven.
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8496 to look at the new patch set (#3). Change subject: Expose $IMPALA_MAVEN_OPTIONS for configuring Maven. .. Expose $IMPALA_MAVEN_OPTIONS for configuring Maven. With this commit, $IMPALA_MAVEN_OPTIONS is used by bin/mvn-quiet.sh to configure Maven slightly. The default is no extra options. This is handy for giving Maven a settings file with the "-s" flag, to control, for example, repositories and their mirrors. In fact, I considered exposing IMPALA_MAVEN_SETTINGS_FILE explicitly, but decided that the generic option would be as good. It's useful to customize how Maven works, especially to provide a settings file with repository mirrors. Change-Id: I2c62185476fd2388c7cda8884276b79a77370127 --- M bin/impala-config.sh M bin/mvn-quiet.sh M testdata/bin/generate-block-ids.sh M testdata/bin/generate-load-nested.sh M testdata/bin/split-hbase.sh 5 files changed, 14 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/8496/3 -- To view, visit http://gerrit.cloudera.org:8080/8496 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2c62185476fd2388c7cda8884276b79a77370127 Gerrit-Change-Number: 8496 Gerrit-PatchSet: 3 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-6084: Avoid using of global namespace for llvm
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8489 ) Change subject: IMPALA-6084: Avoid using of global namespace for llvm .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/8489/3/be/src/codegen/instruction-counter.cc File be/src/codegen/instruction-counter.cc: http://gerrit.cloudera.org:8080/#/c/8489/3/be/src/codegen/instruction-counter.cc@28 PS3, Line 28: const char* InstructionCounter::TOTAL_INSTS = "Total llvm::Instructions"; This was probably not intentional? -- To view, visit http://gerrit.cloudera.org:8080/8489 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7098baff5f746bcc2965583ca99f6188997b733a Gerrit-Change-Number: 8489 Gerrit-PatchSet: 3 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 08 Nov 2017 22:41:34 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6148: Specifying thirdparty deps as URLs
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8456 ) Change subject: IMPALA-6148: Specifying thirdparty deps as URLs .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/8456/2/bin/bootstrap_toolchain.py File bin/bootstrap_toolchain.py: http://gerrit.cloudera.org:8080/#/c/8456/2/bin/bootstrap_toolchain.py@112 PS2, Line 112: if re.search(k, release): > This is probably an obtuse question (you can count on me for those): for an The 57 times is why we're adding the cache. It was spamming the log, mostly. (It also saves a second, which is sort of nice on a warm build.) But release can be passed in as an argument to get_platform_release_label(), which is done if Kudu isn't supported; line 240. -- To view, visit http://gerrit.cloudera.org:8080/8456 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4628d86022d4bd8b762313f7056d76416a58b422 Gerrit-Change-Number: 8456 Gerrit-PatchSet: 4 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Wed, 08 Nov 2017 21:09:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2181: Add query option levels for display
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8447 ) Change subject: IMPALA-2181: Add query option levels for display .. Patch Set 9: (1 comment) I'm still a little weirded out that we want "SET" to behave differently in the impala-shell and in HS2. Tim/Dan: you ok with it? http://gerrit.cloudera.org:8080/#/c/8447/9/common/thrift/beeswax.thrift File common/thrift/beeswax.thrift: http://gerrit.cloudera.org:8080/#/c/8447/9/common/thrift/beeswax.thrift@103 PS9, Line 103: 4: optional byte level Why is this a byte? Shouldn't this be an enum? -- To view, visit http://gerrit.cloudera.org:8080/8447 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I75720d0d454527e1a0ed19bb43cf9e4f018ce1d1 Gerrit-Change-Number: 8447 Gerrit-PatchSet: 9 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 08 Nov 2017 20:56:55 + Gerrit-HasComments: Yes
[Impala-ASF-CR] Pin gen build version's git handling to typical git dir.
Philip Zeyliger has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8500 Change subject: Pin gen_build_version's git handling to typical git dir. .. Pin gen_build_version's git handling to typical git dir. gen_build_version.py now specifies --git-dir explicitly, to avoid fetching the revision of a containing git directory that's not an Impala checkout. This came up when building Impala without a git directory, but within a build system that happens to have a git directory higher up in the directory tree. I tested this by running the script manually and observing it works identically in the normal case. Change-Id: I98870a1cb2073ddcf9ba1620e3801e1964930edd --- M bin/gen_build_version.py 1 file changed, 5 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/8500/1 -- To view, visit http://gerrit.cloudera.org:8080/8500 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I98870a1cb2073ddcf9ba1620e3801e1964930edd Gerrit-Change-Number: 8500 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger
[Impala-ASF-CR] IMPALA-6148: Specifying thirdparty deps as URLs
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8456 ) Change subject: IMPALA-6148: Specifying thirdparty deps as URLs .. Patch Set 4: (6 comments) Thanks for the reviews! I think I addressed/responded all the comments. http://gerrit.cloudera.org:8080/#/c/8456/2/bin/bootstrap_toolchain.py File bin/bootstrap_toolchain.py: http://gerrit.cloudera.org:8080/#/c/8456/2/bin/bootstrap_toolchain.py@36 PS2, Line 36: # python bootstrap_toolchain.py > Not a blocker, but would it make sense to find/replace print -> logging thr Done http://gerrit.cloudera.org:8080/#/c/8456/2/bin/bootstrap_toolchain.py@82 PS2, Line 82: if not self.version: > Nit: Probably a bit pedantic, but catching a specific exception only to rai I fixed this by getting rid of catching KeyError entirely. http://gerrit.cloudera.org:8080/#/c/8456/2/bin/bootstrap_toolchain.py@112 PS2, Line 112: if re.search(k, release): > Also, would it be appropriate to cache the value as an attribute of Package You can't trivially cache v because it depends on the argument 'release'. I'm happy to revert this function to its previous incarnation. The overall logic here is complicated enough that I don't want to do a further refactor of moving the URL-creation stuff into Package in the same commit. http://gerrit.cloudera.org:8080/#/c/8456/2/bin/bootstrap_toolchain.py@373 PS2, Line 373: > Nit: trailing white space Done http://gerrit.cloudera.org:8080/#/c/8456/2/bin/bootstrap_toolchain.py@399 PS2, Line 399: logging.basicConfig(level=logging.INFO, > Nice. Done http://gerrit.cloudera.org:8080/#/c/8456/3/bin/impala-config.sh File bin/impala-config.sh: http://gerrit.cloudera.org:8080/#/c/8456/3/bin/impala-config.sh@138 PS3, Line 138: # Download URLs for toolchain dependencies can be overridden by : # IMPALA__URLs in impala-config-*.sh. We unset them here first: : for var in "${!IMPALA_@}"; do : if [[ "$var" == IMPALA_*_URL ]]; then : unset $var : fi : done > If I understand this correctly, setting these in the environment won't work I updated commit message and some comments to reflect this. I think it's appropriate for IMPALA_HIVE_URL and IMPALA_HIVE_VERSION to have the same lifecycle, which in this case is impala-config*.sh. -- To view, visit http://gerrit.cloudera.org:8080/8456 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4628d86022d4bd8b762313f7056d76416a58b422 Gerrit-Change-Number: 8456 Gerrit-PatchSet: 4 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Wed, 08 Nov 2017 18:10:05 + Gerrit-HasComments: Yes
[Impala-ASF-CR] Expose $IMPALA MAVEN OPTIONS for configuring Maven.
Philip Zeyliger has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8496 Change subject: Expose $IMPALA_MAVEN_OPTIONS for configuring Maven. .. Expose $IMPALA_MAVEN_OPTIONS for configuring Maven. With this commit, $IMPALA_MAVEN_OPTIONS is used by bin/mvn-quiet.sh to configure Maven slightly. The default is no extra options. This is handy for giving Maven a settings file with the "-s" flag, to control, for example, repositories and their mirrors. In fact, I considered exposing IMPALA_MAVEN_SETTINGS_FILE explicitly, but decided that the generic option would be as good. It's useful to customize how Maven works, especially to provide a settings file with repository mirrors. Change-Id: I2c62185476fd2388c7cda8884276b79a77370127 --- M bin/impala-config.sh M bin/mvn-quiet.sh 2 files changed, 8 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/8496/1 -- To view, visit http://gerrit.cloudera.org:8080/8496 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I2c62185476fd2388c7cda8884276b79a77370127 Gerrit-Change-Number: 8496 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger
[Impala-ASF-CR] IMPALA-6148: Specifying thirdparty deps as URLs
Hello David Knupp, Joe McDonnell, Zach Amsden, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8456 to look at the new patch set (#4). Change subject: IMPALA-6148: Specifying thirdparty deps as URLs .. IMPALA-6148: Specifying thirdparty deps as URLs If the environment variable $IMPALA__URL is configured in impala-config-branch.sh or impala-config-local, for a thirdparty dependency, use that to download it instead of the s3://native-toolchain bucket. This makes testing against arbitrary versions of the dependencies easier. I did a little bit of refactoring while here, creating a small class for a Package to handle reading the environment variables. I also changed bootstrap_toolchain.py to use Python logging, which cleans up the output during the multi-threaded downloading. I tested this by both with customized URLs and by running the regular build (pre-review-test, without most of the slow test suites). Change-Id: I4628d86022d4bd8b762313f7056d76416a58b422 --- M bin/bootstrap_toolchain.py M bin/impala-config.sh 2 files changed, 101 insertions(+), 57 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/8456/4 -- To view, visit http://gerrit.cloudera.org:8080/8456 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4628d86022d4bd8b762313f7056d76416a58b422 Gerrit-Change-Number: 8456 Gerrit-PatchSet: 4 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-6148: Specifying thirdparty deps as URLs
Hello David Knupp, Joe McDonnell, Zach Amsden, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8456 to look at the new patch set (#3). Change subject: IMPALA-6148: Specifying thirdparty deps as URLs .. IMPALA-6148: Specifying thirdparty deps as URLs If the environment variable $IMPALA__URL is configured for a thirdparty dependency, use that to download it instead of the s3://native-toolchain bucket. This makes testing against arbitrary versions of the dependencies easier. I did a little bit of refactoring while here, creating a small class for a Package to handle reading the environment variables. I also changed bootstrap_toolchain.py to use Python logging, which cleans up the output during the multi-threaded downloading. I tested this by both with customized URLs and by running the regular build (pre-review-test, without most of the slow test suites). Change-Id: I4628d86022d4bd8b762313f7056d76416a58b422 --- M bin/bootstrap_toolchain.py M bin/impala-config.sh 2 files changed, 85 insertions(+), 44 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/8456/3 -- To view, visit http://gerrit.cloudera.org:8080/8456 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4628d86022d4bd8b762313f7056d76416a58b422 Gerrit-Change-Number: 8456 Gerrit-PatchSet: 3 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] Fix errant, newline-including log directory.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8459 ) Change subject: Fix errant, newline-including log directory. .. Patch Set 1: Looking through https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/574/artifact/Impala/logs_static/logs/, I think the following is the relevant issue. I'm hesitant to believe that a change to bin/clean.sh could have caused this. F1103 20:52:33.350702 67042 scalar-expr-evaluator.cc:70] Check failed: !initialized_ || closed_ I1103 20:52:33.415714 67008 status.cc:124] ImpalaRuntimeException: Unable to find class. CAUSED BY: ClassNotFoundException: org.apache.impala.TestUdf @ 0x15a0bef impala::Status::Status() @ 0x19a42c8 impala::JniUtil::GetJniExceptionMsg() @ 0x1d28b85 impala::HiveUdfCall::OpenEvaluator() @ 0x1d2b76d impala::ScalarExpr::OpenEvaluator() @ 0x1d6f55d impala::ScalarFnCall::OpenEvaluator() @ 0x1d31d94 impala::ScalarExprEvaluator::Clone() @ 0x1d31fa6 impala::ScalarExprEvaluator::Clone() @ 0x1ad088c impala::HdfsScanner::Open() @ 0x1b10f80 impala::HdfsTextScanner::Open() @ 0x1ab4fdf impala::HdfsScanNodeBase::CreateAndOpenScanner() @ 0x1aa9f99 impala::HdfsScanNode::ProcessSplit() @
[Impala-ASF-CR] Fix errant, newline-including log directory.
Philip Zeyliger has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8459 Change subject: Fix errant, newline-including log directory. .. Fix errant, newline-including log directory. We've seen cases where a directory named "cluster\n " sneaks into the logs directory. The intention is that $IMPALA_ALL_LOGS_DIRS is a space-separated list, but clean.sh (as opposed to buildall.sh) treats it as a single argument. I tested this manually: Before: $ls logs/ be_tests/ custom_cluster_tests/ data_loading/ ee_tests/ fe_tests/ Bad command: $mkdir -p "${IMPALA_ALL_LOGS_DIRS}" After: $ls logs/ be_tests/ cluster? / custom_cluster_tests/ data_loading/ ee_tests/ fe_tests/ Change-Id: Id18404c3f3fc0008f67b01afbac9d908532fc317 --- M bin/clean.sh 1 file changed, 1 insertion(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/8459/1 -- To view, visit http://gerrit.cloudera.org:8080/8459 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Id18404c3f3fc0008f67b01afbac9d908532fc317 Gerrit-Change-Number: 8459 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-6148: Specifying thirdparty deps as URLs
Philip Zeyliger has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8456 Change subject: IMPALA-6148: Specifying thirdparty deps as URLs .. IMPALA-6148: Specifying thirdparty deps as URLs If the environment variable $IMPALA__URL is configured for a thirdparty dependency, use that to download it instead of the s3://native-toolchain bucket. This makes testing against arbitrary versions of the dependencies easier. I did a little bit of refactoring while here, creating a small class for a Package to handle reading the environment variables. I also changed bootstrap_toolchain.py to use Python logging, which cleans up the output during the multi-threaded downloading. I tested this by both with customized URLs and by running the regular build (pre-review-test, without most of the slow test suites). Change-Id: I4628d86022d4bd8b762313f7056d76416a58b422 --- M bin/bootstrap_toolchain.py 1 file changed, 77 insertions(+), 44 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/8456/2 -- To view, visit http://gerrit.cloudera.org:8080/8456 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I4628d86022d4bd8b762313f7056d76416a58b422 Gerrit-Change-Number: 8456 Gerrit-PatchSet: 2 Gerrit-Owner: Philip Zeyliger
[Impala-ASF-CR] IMPALA-6134: Update code base to use impala::ConditionVariable
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8428 ) Change subject: IMPALA-6134: Update code base to use impala::ConditionVariable .. Patch Set 1: (1 comment) > Should I create tests for condition-variable.h? We don't seem to have unit > tests yet. I think ideally we'd have tests for helper code like this. I'm ok if we think it's not worth it here given the low complexity. http://gerrit.cloudera.org:8080/#/c/8428/1/be/src/util/condition-variable.h File be/src/util/condition-variable.h: http://gerrit.cloudera.org:8080/#/c/8428/1/be/src/util/condition-variable.h@38 PS1, Line 38: void Wait(boost::unique_lock& lock) { > About the 'struct' keyword: it is something that C requires, but in C++ it Thanks! -- To view, visit http://gerrit.cloudera.org:8080/8428 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3085c6dcb42350b61244df6e7f091a1e7db356c9 Gerrit-Change-Number: 8428 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 03 Nov 2017 16:38:00 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2181: Add query option levels for display
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8447 ) Change subject: IMPALA-2181: Add query option levels for display .. Patch Set 5: Code-Review-1 (3 comments) I took a quick look through. I'm uncomfortable with this change as it stands, because I feel it abuses the Thrift structures too much. Specifically, keeping the map of query option levels in TQueryOptions seems wrong: you're really describing "Query Option Metadata" (which might include name, description/help, and level), rather than the query options themselves. And, for sending this stuff to impala-shell, abusing the 'description' field seems wrong. (In fact, impala-shell should actually show useful help for these options. Things like "MT_DOP" are completely opaque. Implementation wise, I think we also need to worry about what happens for the HiveServer2 side of things. Specifically, we respond to "SET" queries over HS2 at https://github.com/apache/incubator-impala/blob/1803b403e38b3f952afc4ff3fd3e5f4d14c088f8/be/src/service/client-request-state.cc#L194. I think we want to stay away from adding information into the Beeswax API that isn't also exposed in HS2. One option is to let "SET ALL" be meaningful as a query for HS2, though I don't quite know what that means in terms of the schema returned: how would you divide the regular/advanced/deprecated options over that API? http://gerrit.cloudera.org:8080/#/c/8447/5/common/thrift/ImpalaInternalService.thrift File common/thrift/ImpalaInternalService.thrift: http://gerrit.cloudera.org:8080/#/c/8447/5/common/thrift/ImpalaInternalService.thrift@65 PS5, Line 65: // Levels to use when displaying query options from Impala shell Let's leave a pointer here indicating where the mapping from the options in TQueryOptions to these levels is defined. http://gerrit.cloudera.org:8080/#/c/8447/5/common/thrift/ImpalaInternalService.thrift@297 PS5, Line 297: // Map for query option name to option level mapping. Populated by ImpalaServer : // on startup : 61: required map query_option_levels; This worries me a bit, for a couple of reasons. First, 'required' is forever. (See https://diwakergupta.github.io/thrift-missing-guide/#_defining_structs.) I think it means that a Thrift client will refuse to serialize a struct if this isn't set. For some uses, like TClientRequest, it makes no sense for us to carry this around. It's possible it works out because it can be set to the empty map often, but it's still odd. Second, this isn't a query option. It's certainly describing query options, but it's very very different from, say, "mem_limit", which applies to the query. http://gerrit.cloudera.org:8080/#/c/8447/5/shell/impala_client.py File shell/impala_client.py: http://gerrit.cloudera.org:8080/#/c/8447/5/shell/impala_client.py@100 PS5, Line 100: def get_query_option_level_from_description(self, description): Have we decided we can't change ConfigVariable to include a level? It seems like this is very much not a description, and we're abusing it. /** Represents a Hadoop-style configuration variable. */ struct ConfigVariable { 1: string key, 2: string value, 3: string description } -- To view, visit http://gerrit.cloudera.org:8080/8447 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I75720d0d454527e1a0ed19bb43cf9e4f018ce1d1 Gerrit-Change-Number: 8447 Gerrit-PatchSet: 5 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 03 Nov 2017 16:33:03 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5564: Release lock during planning. (wip)
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8434 ) Change subject: IMPALA-5564: Release lock during planning. (wip) .. Patch Set 1: (1 comment) Thanks for the reviews! I'll report back about cancellations and how this looks in tools like CM that look at profiles. Cancellations isn't something I considered yet, so thanks for the tip! http://gerrit.cloudera.org:8080/#/c/8434/1/be/src/service/client-request-state.h File be/src/service/client-request-state.h: http://gerrit.cloudera.org:8080/#/c/8434/1/be/src/service/client-request-state.h@342 PS1, Line 342: is_planning_ > maybe make that is_planning_finished_ (or similar), and initialize it to fa I started out with this being planning_finished_ (or some such). I ran into the fact that some calls (like GetSchemas()) avoid the ExecuteInternal() path entirely: there's no planning to be done, but yet they also have a ClientRequestState, and they use get_result_metadata() and so on. In my case, the JDBC tests exposed this very clearly, but many other tests would have as well. So, I decided to have state around only the query paths that actually do planning, so as to avoid saying that "GetSchemas()" has finished planning, since that makes less sense. I could also think about this as a query state, but changing the query state enums seems much more delicate than this approach. -- To view, visit http://gerrit.cloudera.org:8080/8434 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1e3fc4c28d7a5ad8546d48bcd22c03fbce502e5b Gerrit-Change-Number: 8434 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Thu, 02 Nov 2017 22:32:56 + Gerrit-HasComments: Yes
[Impala-ASF-CR] Correct log line in start-impala-cluster.py.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8432 ) Change subject: Correct log line in start-impala-cluster.py. .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/8432/1/bin/start-impala-cluster.py File bin/start-impala-cluster.py: http://gerrit.cloudera.org:8080/#/c/8432/1/bin/start-impala-cluster.py@398 PS1, Line 398: executors = options.cluster_size - options.num_coordinators > Could executors be negative due to the same bug? No. I think it's handled by line 343-344. $bin/start-impala-cluster.py --impalad_args="--stress_metadata_loading_pause_injection_ms=5000" --log_level=2 --cluster_size=1 --num_coordinators=1 --use_exclusive_coordinators Cannot start an Impala cluster with no executors -- To view, visit http://gerrit.cloudera.org:8080/8432 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7ceece1c05b9a4ca9f0a08fa30d195f811490c0e Gerrit-Change-Number: 8432 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Thu, 02 Nov 2017 22:08:45 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6136: Part 1: Query duration should not be normally negative.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8430 ) Change subject: IMPALA-6136: Part 1: Query duration should not be normally negative. .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/8430/2/be/src/service/impala-http-handler.cc File be/src/service/impala-http-handler.cc: http://gerrit.cloudera.org:8080/#/c/8430/2/be/src/service/impala-http-handler.cc@303 PS2, Line 303: // record.end_time_us might still be zero if the query is not yet done nit: Could you update the following comment in client-request-state.h and impala-server.h (once for ClientRequestState, once for QueryStateRecord) to say something about how 0 is a special value. /// Start/end time of the query, in Unix microseconds. int64_t start_time_us_, end_time_us_; -- To view, visit http://gerrit.cloudera.org:8080/8430 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib6f971ebd5f0f00f3e38df0495692ffe9d52ef90 Gerrit-Change-Number: 8430 Gerrit-PatchSet: 2 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Wed, 01 Nov 2017 16:02:19 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5564: Release lock during planning. (wip)
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8434 ) Change subject: IMPALA-5564: Release lock during planning. (wip) .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/8434/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8434/1//COMMIT_MSG@17 PS1, Line 17: whereas the web UI would previously block on the lock. > I tried this out (admittedly 1.5 weeks ago, but I don't think it's changed) Looking over https://gerrit.cloudera.org/c/6707/12/be/src/service/impala-server.cc a bit more clearly, this is replacing "Query plan is not ready" with the actual profile. The profile is useful at this point: it has been populated with the query string, the time that planning started, and so on. -- To view, visit http://gerrit.cloudera.org:8080/8434 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1e3fc4c28d7a5ad8546d48bcd22c03fbce502e5b Gerrit-Change-Number: 8434 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Wed, 01 Nov 2017 15:36:34 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5564: Release lock during planning. (wip)
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8434 ) Change subject: IMPALA-5564: Release lock during planning. (wip) .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/8434/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8434/1//COMMIT_MSG@17 PS1, Line 17: whereas the web UI would previously block on the lock. > After IMPALA-1972, it doesn't block, it just returns an empty profile. I tried this out (admittedly 1.5 weeks ago, but I don't think it's changed). You can go to the web UI and see the queries, but if you click on "Profile", you'll get stuck getting the lock in ImpalaServer::GetRuntimeProfileStr() below. Am I missing something? Status ImpalaServer::GetRuntimeProfileStr(const TUniqueId& query_id, const string& user, bool base64_encoded, stringstream* output) { DCHECK(output != nullptr); // Search for the query id in the active query map { shared_ptr request_state = GetClientRequestState(query_id); if (request_state.get() != nullptr) { lock_guard l(*request_state->lock()); http://gerrit.cloudera.org:8080/#/c/8434/1//COMMIT_MSG@26 PS1, Line 26: know the query id : that's necessary to call GetResultSetMetadata() > It looks to me like this is against IMPALA-2568. I guess we eventually want Yes, IMPALA-2568 is one of the eventual goals. This is actually listed as the first subtask of that JIRA. It's easier to argue that this one is compatible. As for this one needing to be undone, it's hard to say. I think you solve IMPALA-2568 by having more fine-grained and clear query states/lifecycles. I think this is a step in that direction. What would you suggest? -- To view, visit http://gerrit.cloudera.org:8080/8434 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1e3fc4c28d7a5ad8546d48bcd22c03fbce502e5b Gerrit-Change-Number: 8434 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Wed, 01 Nov 2017 15:32:48 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5564: Release lock during planning. (wip)
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8434 ) Change subject: IMPALA-5564: Release lock during planning. (wip) .. Patch Set 1: Hi Dan, This is a draft of the change to let query profiles show up during planning. Feedback appreciated! Thanks! -- Philip -- To view, visit http://gerrit.cloudera.org:8080/8434 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1e3fc4c28d7a5ad8546d48bcd22c03fbce502e5b Gerrit-Change-Number: 8434 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Tue, 31 Oct 2017 22:46:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5564: Release lock during planning. (wip)
Philip Zeyliger has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8434 Change subject: IMPALA-5564: Release lock during planning. (wip) .. IMPALA-5564: Release lock during planning. (wip) ** I'm looking for feedback on the approach here; if folks think this is the right direction, I'll work on chasing down why Thrift Exception messages aren't propagated and adding tests for any Thrift method that takes a query_id, across Beeswax and HS2 APIs. ** This commit changes the query path to release the client request state lock during planning. As a result, the plan becomes available to the web UI, whereas the web UI would previously block on the lock. This introduces a new boolean, is_planning_, to capture that that we're in planning. This is done largely to be able to assert that result metadata is not queried during planning. The common workflow here is: client, using Thrift, calls ImpalaServer::ExecuteStatement() (in impala-hs2-server.cc), which calls ImpalaServer::Execute(), which calls ImpalaServer::ExecuteInternal(), where the plannig happens. Because the first time the client can cleanly get the query id is the return of ExecuteStatement(), there shouldn't be a way to know the query id that's necessary to call GetResultSetMetadata(). If someone reaches around (e.g., via the web UI), they will get an error message. This happens to addess one of the points in IMPALA-3882, which talks about releasing this very lock. The core tests pass. One exhaustive test had to be modified, and I saw exhaustive failures that are unrelated. I added a new test for one Thrift Beeswax API whose behavior has changed. If folks think this approach is fine, I'll work through the other APIs. Change-Id: I1e3fc4c28d7a5ad8546d48bcd22c03fbce502e5b --- M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M be/src/service/impala-beeswax-server.cc M be/src/service/impala-hs2-server.cc M be/src/service/impala-server.cc M tests/common/impala_connection.py A tests/custom_cluster/test_profile_during_planning.py M tests/custom_cluster/test_query_concurrency.py 8 files changed, 147 insertions(+), 26 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/34/8434/1 -- To view, visit http://gerrit.cloudera.org:8080/8434 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I1e3fc4c28d7a5ad8546d48bcd22c03fbce502e5b Gerrit-Change-Number: 8434 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger
[Impala-ASF-CR] Correct log line in start-impala-cluster.py.
Philip Zeyliger has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8432 Change subject: Correct log line in start-impala-cluster.py. .. Correct log line in start-impala-cluster.py. Updated logging in start-impala-cluster to accurately specify how many impala nodes were started, and how many of these were coordinators or executors or both. The new logging looks like: Impala Cluster Running with 1 nodes (1 coordinators, 1 executors). Previously, when invoking this script with --cluster_size=1, it would report "1 nodes and 3 coordinators" which was wrong (because there was only 1 coordinator) and confusing (because it seemed like a coordinator was a separate thing from a node). I also removed an unused import. I have run core and exhaustive tests with these change, as part of testing other changes. Nothing untoward happened. Change-Id: I7ceece1c05b9a4ca9f0a08fa30d195f811490c0e --- M bin/start-impala-cluster.py 1 file changed, 8 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/8432/1 -- To view, visit http://gerrit.cloudera.org:8080/8432 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I7ceece1c05b9a4ca9f0a08fa30d195f811490c0e Gerrit-Change-Number: 8432 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger
[Impala-ASF-CR] Install OpenJDK-dbg for development environments.
Philip Zeyliger has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8431 Change subject: Install OpenJDK-dbg for development environments. .. Install OpenJDK-dbg for development environments. Updates the boostrap script to install the OpenJDK debug symbols. OpenJDK comes with a gdb stack unwinder that errors out if the OpenJDK debugging symbols aren't installed. This fixes the following error message in gdb: Installing openjdk unwinder Traceback (most recent call last): File "/usr/share/gdb/auto-load/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so-gdb.py", line 52, in class Types(object): File "/usr/share/gdb/auto-load/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so-gdb.py", line 66, in Types nmethodp_t = gdb.lookup_type('nmethod').pointer() gdb.error: No type named nmethod. I tested this by installing the package manually on Ubuntu16.04 and using gdb a bit. Change-Id: I9aa3a5e1bbba55a4b0b489e4a4dfa39e35fc0ae0 --- M bin/bootstrap_system.sh 1 file changed, 2 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/8431/1 -- To view, visit http://gerrit.cloudera.org:8080/8431 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I9aa3a5e1bbba55a4b0b489e4a4dfa39e35fc0ae0 Gerrit-Change-Number: 8431 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger
[Impala-ASF-CR] IMPALA-3887: Use dfs.namenode.replication.min=3
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8426 ) Change subject: IMPALA-3887: Use dfs.namenode.replication.min=3 .. Patch Set 3: > Patch Set 3: > > I'm still testing this change. With this change we seem to be more prone to > hitting the Hive/mapred local executor race (MAPREDUCE-6441 MAPREDUCE-6992). > Will sync with PhilZ to see how to proceed. https://gerrit.cloudera.org/c/8405/ is ready to commit. We were holding off because Lars said to hold off until the build stabilized. I don't know where you're running into MAPREDUCE-6441. My change only addresses it for data load, since it's an option you have to pass to beeline. c/8405 is splittable into two parts (beeline change for the option; parallelization), but I don't see much benefit into splitting it. -- To view, visit http://gerrit.cloudera.org:8080/8426 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie09efdd9512cc4f0e338919af42bcdcede2cfb93 Gerrit-Change-Number: 8426 Gerrit-PatchSet: 3 Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Tue, 31 Oct 2017 20:38:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6134: Update code base to use impala::ConditionVariable
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8428 ) Change subject: IMPALA-6134: Update code base to use impala::ConditionVariable .. Patch Set 1: (1 comment) Most of this looks mechanical, and it looked fine. You changed some 1-line if's into 3-line ifs, which may be against house style. We don't seem to have any tests for condition-variable.h. How did you test this? http://gerrit.cloudera.org:8080/#/c/8428/1/be/src/util/condition-variable.h File be/src/util/condition-variable.h: http://gerrit.cloudera.org:8080/#/c/8428/1/be/src/util/condition-variable.h@38 PS1, Line 38: void Wait(boost::unique_lock& lock) { I'm not yet familiar with this part of C++; why is 'inline' getting removed here? And 'struct' in line 47? -- To view, visit http://gerrit.cloudera.org:8080/8428 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3085c6dcb42350b61244df6e7f091a1e7db356c9 Gerrit-Change-Number: 8428 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 31 Oct 2017 17:23:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6134: Update code base to use impala::ConditionVariable
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8428 ) Change subject: IMPALA-6134: Update code base to use impala::ConditionVariable .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/8428/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8428/1//COMMIT_MSG@14 PS1, Line 14: This commit substitues every boost::condtion_variable in nit: condtion_variable -> condition_variable. -- To view, visit http://gerrit.cloudera.org:8080/8428 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3085c6dcb42350b61244df6e7f091a1e7db356c9 Gerrit-Change-Number: 8428 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 31 Oct 2017 17:15:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6073: Fail on misconfigured CLASSPATH.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8327 ) Change subject: IMPALA-6073: Fail on misconfigured CLASSPATH. .. Patch Set 1: Code-Review-1 > Patch Set 1: > > > I think this check seems not to be enough. Let me leave a detailed > > comment on the JIRA ticket. > > It is OK to leave detailed comments here in the code review tool. In fact, it > is common. On JIRA, Kim Jin Chul said: > CLASSPATH should be set after loading impala-config.sh but the variable may > not have sufficient libraries. I'm -1'ing this for a bit; I think the trick is to figure out if we can get rid of impala-config.sh setting this at all. I'll try it out. -- To view, visit http://gerrit.cloudera.org:8080/8327 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icb92fa5cad8bbc29b620bec5904e45ed7aeff13d Gerrit-Change-Number: 8327 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Mon, 30 Oct 2017 21:06:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8405 ) Change subject: IMPALA-6108, IMPALA-6070: Parallel data load (re-instated). .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/8405/2/bin/load-data.py File bin/load-data.py: http://gerrit.cloudera.org:8080/#/c/8405/2/bin/load-data.py@114 PS2, Line 114: # When HiveServer2 is configured to use "local" mode (i.e., MR jobs are run > Let's mention the HADOOP JIRA in case it's ever fixed and we can remove the Good idea. Done. -- To view, visit http://gerrit.cloudera.org:8080/8405 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I60d65794da08de4bb3eb439a2414c095f5be0c10 Gerrit-Change-Number: 8405 Gerrit-PatchSet: 3 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 28 Oct 2017 03:48:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).
Hello Joe McDonnell, Tim Armstrong, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8405 to look at the new patch set (#3). Change subject: IMPALA-6108, IMPALA-6070: Parallel data load (re-instated). .. IMPALA-6108, IMPALA-6070: Parallel data load (re-instated). This is a revert of a revert, re-enabling parallel data load. It avoid the race condition by explicitly configuring the temporary directory in question in load-data.py. When the parallel data load change went in, we discovered a race with a signature of: java.io.FileNotFoundException: File /tmp/hadoop-jenkins/mapred/local/1508958341829_tmp does not exist The number in this path is milliseconds since the epoch, and the race occurs when two queries submitted to HiveServer2, running with the local runner, hit the same millisecond time stamp. The upstream bug is https://issues.apache.org/jira/browse/MAPREDUCE-6441, and I described the symptoms in https://issues.apache.org/jira/browse/MAPREDUCE-6992 (which is now marked as a dupe). I've tested this by running data load 5 times on the same machines where it failed before. I also ran data load manually and inspected the system to make sure that the temporary directories are getting created as expected in /tmp/impala-data-load-*. Change-Id: I60d65794da08de4bb3eb439a2414c095f5be0c10 --- M bin/load-data.py M testdata/bin/create-load-data.sh M testdata/bin/run-hive-server.sh M testdata/bin/run-step.sh 4 files changed, 59 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/8405/3 -- To view, visit http://gerrit.cloudera.org:8080/8405 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I60d65794da08de4bb3eb439a2414c095f5be0c10 Gerrit-Change-Number: 8405 Gerrit-PatchSet: 3 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).
Hello Joe McDonnell, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8405 to look at the new patch set (#2). Change subject: IMPALA-6108, IMPALA-6070: Parallel data load (re-instated). .. IMPALA-6108, IMPALA-6070: Parallel data load (re-instated). This is a revert of a revert, re-enabling parallel data load. It avoid the race condition by explicitly configuring the temporary directory in question in load-data.py. When the parallel data load change went in, we discovered a race with a signature of: java.io.FileNotFoundException: File /tmp/hadoop-jenkins/mapred/local/1508958341829_tmp does not exist The number in this path is milliseconds since the epoch, and the race occurs when two queries submitted to HiveServer2, running with the local runner, hit the same millisecond time stamp. The upstream bug is https://issues.apache.org/jira/browse/MAPREDUCE-6441, and I described the symptoms in https://issues.apache.org/jira/browse/MAPREDUCE-6992 (which is now marked as a dupe). I've tested this by running data load 5 times on the same machines where it failed before. I also ran data load manually and inspected the system to make sure that the temporary directories are getting created as expected in /tmp/impala-data-load-*. Change-Id: I60d65794da08de4bb3eb439a2414c095f5be0c10 --- M bin/load-data.py M testdata/bin/create-load-data.sh M testdata/bin/run-hive-server.sh M testdata/bin/run-step.sh 4 files changed, 58 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/8405/2 -- To view, visit http://gerrit.cloudera.org:8080/8405 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I60d65794da08de4bb3eb439a2414c095f5be0c10 Gerrit-Change-Number: 8405 Gerrit-PatchSet: 2 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger
[Impala-ASF-CR] IMPALA-6108, IMPALA-6070: Parallel data load (re-instated).
Philip Zeyliger has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8405 Change subject: IMPALA-6108, IMPALA-6070: Parallel data load (re-instated). .. IMPALA-6108, IMPALA-6070: Parallel data load (re-instated). This is a revert of a revert, re-enabling parallel data load. It avoid the race condition by explicitly configuring the temporary directory in question in load-data.py. When the parallel data load change went in, we discovered a race with a signature of: java.io.FileNotFoundException: File /tmp/hadoop-jenkins/mapred/local/1508958341829_tmp does not exist The number in this path is milliseconds since the epoch, and the race occurs when two queries submitted to HiveServer2, running with the local runner, hit the same millisecond time stamp. The upstream bug is https://issues.apache.org/jira/browse/MAPREDUCE-6441, and I described the symptoms in https://issues.apache.org/jira/browse/MAPREDUCE-6992 (which is now marked as a dupe). I've tested this by running data load 5 times on the same machines where it failed before. I also ran data load manually and inspected the system to make sure that the temporary directories are getting created as expected in /tmp/impala-data-load-*. Change-Id: I60d65794da08de4bb3eb439a2414c095f5be0c10 --- M bin/load-data.py M testdata/bin/create-load-data.sh M testdata/bin/run-hive-server.sh M testdata/bin/run-step.sh 4 files changed, 61 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/8405/1 -- To view, visit http://gerrit.cloudera.org:8080/8405 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I60d65794da08de4bb3eb439a2414c095f5be0c10 Gerrit-Change-Number: 8405 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger
[Impala-ASF-CR] IMPALA-5243: Speed up code gen for wide Avro tables.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8211 ) Change subject: IMPALA-5243: Speed up code gen for wide Avro tables. .. Patch Set 8: > Patch Set 8: > > > Oof. GVO caught a (correct) clang-tidy issue. Alas, clang-tidy is > > not in pre-review-test, so this was my first experience with that. > > You can also use > https://jenkins.impala.io/view/Utility/job/gerrit-verify-dryrun-external/ , > which does run clang-tidy. Thanks for the pointer. I'm giving it a shot at https://jenkins.impala.io/job/gerrit-verify-dryrun-external/25/ My local clang-tidy build was clean: $ bin/run_clang_tidy.sh | tee clang_tidy.txt $ cat clang_tidy.txt | grep ']' $ -- To view, visit http://gerrit.cloudera.org:8080/8211 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7f1b390be4adf6e6699a18344234f8ff7ee74476 Gerrit-Change-Number: 8211 Gerrit-PatchSet: 8 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 27 Oct 2017 21:39:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5243: Speed up code gen for wide Avro tables.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8211 ) Change subject: IMPALA-5243: Speed up code gen for wide Avro tables. .. Patch Set 8: Oof. GVO caught a (correct) clang-tidy issue. Alas, clang-tidy is not in pre-review-test, so this was my first experience with that. I've made the trivial change (removed a const in llvm-codegen.h) and am running it through the relevant gauntlet locally. -- To view, visit http://gerrit.cloudera.org:8080/8211 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7f1b390be4adf6e6699a18344234f8ff7ee74476 Gerrit-Change-Number: 8211 Gerrit-PatchSet: 8 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 27 Oct 2017 20:56:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5243: Speed up code gen for wide Avro tables.
Hello Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8211 to look at the new patch set (#8). Change subject: IMPALA-5243: Speed up code gen for wide Avro tables. .. IMPALA-5243: Speed up code gen for wide Avro tables. HdfsAvroScanner::CodegenMaterializeTuple generates a function linear in size to the number of columns. On 1000 column tables, codegen time is significant. This commit roughly halves it for wide columns. (Note that this had been much worse in recent history (<= Impala 2.9).) It does so by breaking up MaterializeTuple() into multiple smaller functions, and then calls them in order. When breaking up into 200-column chunks, there is a noticeable speed-up. I've made the helper code for generating LLVM function prototypes have a mutable function name, so that the builder can be re-used multiple times. I've checked by inspecting optimized LLVM that in the case where there's only 1 helper function, code gets inlined so that there doesn't seem to be an extra function. I measured codegen time for various "step sizes." The case where there are no helper functions is about 2.7s. The best case was about a step size of 200, with timings of 1.3s. For the query "select count(int_col16) from functional_avro.widetable_1000_cols", codegen times as a function of step size are roughly as follows. This is averaged across 5 executions, and rounded to 0.1s. step time 10 2.4 50 2.5 75 2.9 100 3.0 125 3.0 150 1.4 175 1.3 200 1.3 <-- chosen step size 225 1.5 250 1.4 300 1.6 400 1.6 500 1.8 1000 2.7 The raw data was generated like so, with some code that let me change the step size at runtime: $(for step in 10 50 75 100 125 150 175 200 225 250 300 400 500 1000; do for try in $(seq 5); do echo $step > /tmp/step_size.txt; echo -n "$step "; impala-shell.sh -q "select count(int_col16) from functional_avro.widetable_1000_cols; profile;" 2> /dev/null | grep -A9 'CodeGen:(Total: [0-9]*s' -m 1 | sed -e 's/ - / /' | sed -e 's/([0-9]*)//' | tr -d '\n' | tr -s ' ' ' '; echo; done; done) | tee out.txt ... 200 CodeGen:(Total: 1s333ms, non-child: 1s333ms, % non-child: 100.00%) CodegenTime: 613.562us CompileTime: 605.320ms LoadTime: 0.000ns ModuleBitcodeSize: 1.95 MB NumFunctions: 38 NumInstructions: 8.44K OptimizationTime: 701.276ms PeakMemoryUsage: 4.12 MB PrepareTime: 10.014ms ... 1000 CodeGen:(Total: 2s659ms, non-child: 2s659ms, % non-child: 100.00%) CodegenTime: 558.860us CompileTime: 1s267ms LoadTime: 0.000ns ModuleBitcodeSize: 1.95 MB NumFunctions: 34 NumInstructions: 8.41K OptimizationTime: 1s362ms PeakMemoryUsage: 4.11 MB PrepareTime: 10.574ms I have run the core tests with this change. Change-Id: I7f1b390be4adf6e6699a18344234f8ff7ee74476 --- M be/src/codegen/llvm-codegen.h M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-avro-scanner.h 3 files changed, 158 insertions(+), 100 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/8211/8 -- To view, visit http://gerrit.cloudera.org:8080/8211 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7f1b390be4adf6e6699a18344234f8ff7ee74476 Gerrit-Change-Number: 8211 Gerrit-PatchSet: 8 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5243: Speed up code gen for wide Avro tables.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8211 ) Change subject: IMPALA-5243: Speed up code gen for wide Avro tables. .. Patch Set 7: Code-Review+1 Carrying the +1. I've not made any changes here (except rebases). I've run the core tests successfully. I had run into https://issues.apache.org/jira/browse/IMPALA-6118 and https://issues.apache.org/jira/browse/IMPALA-6106 in previous testing. I didn't see any evidence that this change breaks the build specifically. The specific failed GVO build had the following assertion. That's IMPALA-6099, which is also reasonably unrelated to the Avro-only changes here. F1023 19:06:33.677640 43548 filter-context.cc:49] Check failed: it != counters.end() Tried to increment unknown counter group *** Check failure stack trace: *** @ 0x2f1e11d google::LogMessage::Fail() F1023 19:06:33.677934 43350 filter-context.cc:49] Check failed: it != counters.end() Tried to increment unknown counter group *** Check failure stack trace: *** @ 0x2f1f9c2 google::LogMessage::SendToLog() @ 0x2f1daf7 google::LogMessage::Flush() @ 0x2f1e11d google::LogMessage::Fail() @ 0x2f210be google::LogMessageFatal::~LogMessageFatal() @ 0x2f1f9c2 google::LogMessage::SendToLog() @ 0x1ca4d99 impala::FilterStats::IncrCounters() @ 0x1ca72a4 impala::FilterContext::CheckForAlwaysFalse() @ 0x2f1daf7 google::LogMessage::Flush() @ 0x1aa537d impala::HdfsScanNodeBase::PartitionPassesFilters() @ 0x2f210be google::LogMessageFatal::~LogMessageFatal() @ 0x1b0c9c6 impala::HdfsParquetScanner::GetNextInternal() @ 0x1ca4d99 impala::FilterStats::IncrCounters() @ 0x1b0acca impala::HdfsParquetScanner::ProcessSplit() @ 0x1a99cc0 impala::HdfsScanNode::ProcessSplit() @ 0x1ca72a4 impala::FilterContext::CheckForAlwaysFalse() @ 0x1a99136 impala::HdfsScanNode::ScannerThread() @ 0x1a985d3 _ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_17ThreadResourceMgr12ResourcePoolEENKUlvE_clEv @ 0x1aa537d impala::HdfsScanNodeBase::PartitionPassesFilters() @ 0x1a9a501 _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_17ThreadResourceMgr12ResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE @ 0x171bdfc boost::function0<>::operator()() @ 0x1b0c9c6 impala::HdfsParquetScanner::GetNextInternal() @ 0x19f3393 impala::Thread::SuperviseThread() @ 0x1b0acca impala::HdfsParquetScanner::ProcessSplit() @ 0x19fbf26 boost::_bi::list4<>::operator()<>() @ 0x1a99cc0 impala::HdfsScanNode::ProcessSplit() @ 0x19fbe69 boost::_bi::bind_t<>::operator()() @ 0x19fbe2c boost::detail::thread_data<>::run() @ 0x1a99136 impala::HdfsScanNode::ScannerThread() @ 0x1a985d3 _ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_17ThreadResourceMgr12ResourcePoolEENKUlvE_clEv @ 0x20a7c9a thread_proxy @ 0x1a9a501 _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_17ThreadResourceMgr12ResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE @ 0x7f0cd8e3a6ba start_thread @ 0x7f0cd8b703dd clone -- To view, visit http://gerrit.cloudera.org:8080/8211 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7f1b390be4adf6e6699a18344234f8ff7ee74476 Gerrit-Change-Number: 8211 Gerrit-PatchSet: 7 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 27 Oct 2017 16:17:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2235: Fix current db when shell auto-reconnects
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8368 ) Change subject: IMPALA-2235: Fix current db when shell auto-reconnects .. Patch Set 2: (7 comments) Thanks for tackling this! As a user, I hate that this was broken. I think your changes to impala_shell.py look good. I'm a bit more wary of having another superclass for tests with helper functions. http://gerrit.cloudera.org:8080/#/c/8368/2/shell/impala_shell.py File shell/impala_shell.py: http://gerrit.cloudera.org:8080/#/c/8368/2/shell/impala_shell.py@729 PS2, Line 729: def _validate_database(self, immediately=False): Is there a case where you ever want immediately=True? i.e., could we get rid of the two paths here and converge to just one? http://gerrit.cloudera.org:8080/#/c/8368/2/tests/common/cluster_controller.py File tests/common/cluster_controller.py: http://gerrit.cloudera.org:8080/#/c/8368/2/tests/common/cluster_controller.py@33 PS2, Line 33: class ClusterController(CustomClusterTestSuite): I'm generally not fond of the inheritance and multiple-inheritance in these tests. Is this actually distinct from CustomClusterTestSuite? http://gerrit.cloudera.org:8080/#/c/8368/2/tests/common/cluster_controller.py@64 PS2, Line 64: def start_cluster_with_args(self, **kwargs): This could go straight into the super class. It's tightly coupled with _start_impala_cluster. http://gerrit.cloudera.org:8080/#/c/8368/2/tests/custom_cluster/test_shell_interactive_reconnect.py File tests/custom_cluster/test_shell_interactive_reconnect.py: http://gerrit.cloudera.org:8080/#/c/8368/2/tests/custom_cluster/test_shell_interactive_reconnect.py@25 PS2, Line 25: TMP_HISTORY_FILE = os.path.expanduser("~/.impalahistorytmp") Perhaps use an actual tempfile created with tempfile.[something] http://gerrit.cloudera.org:8080/#/c/8368/2/tests/custom_cluster/test_shell_interactive_reconnect.py@27 PS2, Line 27: class TestShellInteractiveReconnect(ClusterController): Any reason not to add this simply to tests/shell/test_shell_interactive.py? I think because you need CustomCluster to do the restart? If so, please add a comment about this. http://gerrit.cloudera.org:8080/#/c/8368/2/tests/custom_cluster/test_shell_interactive_reconnect.py@40 PS2, Line 40: @pytest.mark.execute_serially I think all CustomCluster tests run serially? This one seems like it could be parallel. http://gerrit.cloudera.org:8080/#/c/8368/2/tests/shell/util.py File tests/shell/util.py: http://gerrit.cloudera.org:8080/#/c/8368/2/tests/shell/util.py@116 PS2, Line 116: """ Moves history file to given filepath """ The underlying bug here is that the shell doesn't have a historypath option, but I think this is fine... -- To view, visit http://gerrit.cloudera.org:8080/8368 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I40dfa00ba0314d356fe8617446f516505c925e5e Gerrit-Change-Number: 8368 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 26 Oct 2017 17:58:48 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6108: Revert "IMPALA-6070: Parallel data load."
Philip Zeyliger has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8386 Change subject: IMPALA-6108: Revert "IMPALA-6070: Parallel data load." .. IMPALA-6108: Revert "IMPALA-6070: Parallel data load." We may be seeing a race with errors like "java.io.FileNotFoundException: File /tmp/hadoop-jenkins/mapred/local/1508958341829_tmp does not exist". This reverts commit e020c37106383be5416f882cbe11fc25efad8968. Change-Id: I46da93f4315a5a4bdaa96fa464cb51922bd6c419 --- M testdata/bin/create-load-data.sh M testdata/bin/run-hive-server.sh M testdata/bin/run-step.sh 3 files changed, 5 insertions(+), 44 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/86/8386/1 -- To view, visit http://gerrit.cloudera.org:8080/8386 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I46da93f4315a5a4bdaa96fa464cb51922bd6c419 Gerrit-Change-Number: 8386 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger
[Impala-ASF-CR] IMPALA-3998: deprecate --refresh after connect
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8381 ) Change subject: IMPALA-3998: deprecate --refresh_after_connect .. Patch Set 1: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/8381 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id297f80c0f596a69ef8ecde948812b82d2a5c0fa Gerrit-Change-Number: 8381 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Wed, 25 Oct 2017 18:02:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5243: Speed up code gen for wide Avro tables.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8211 ) Change subject: IMPALA-5243: Speed up code gen for wide Avro tables. .. Patch Set 6: Code-Review-1 The test failures are real; I'm going to figure it out. -- To view, visit http://gerrit.cloudera.org:8080/8211 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7f1b390be4adf6e6699a18344234f8ff7ee74476 Gerrit-Change-Number: 8211 Gerrit-PatchSet: 6 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 25 Oct 2017 15:35:25 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8294 ) Change subject: IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs .. Patch Set 1: Hi Laszlo, We're interested in this change for avoiding some compatibility problems with Hadoop 3. Any news here? Thanks! -- To view, visit http://gerrit.cloudera.org:8080/8294 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae Gerrit-Change-Number: 8294 Gerrit-PatchSet: 1 Gerrit-Owner: Laszlo Gaal Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Tue, 24 Oct 2017 23:03:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4456: Address scalability issues of qs map lock and client request state map lock
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8363 ) Change subject: IMPALA-4456: Address scalability issues of qs_map_lock_ and client_request_state_map_lock_ .. Patch Set 3: (4 comments) http://gerrit.cloudera.org:8080/#/c/8363/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8363/3//COMMIT_MSG@30 PS3, Line 30: shareded to a default of 4 buckets initally. nit: sharded http://gerrit.cloudera.org:8080/#/c/8363/3/be/src/runtime/query-exec-mgr.cc File be/src/runtime/query-exec-mgr.cc: http://gerrit.cloudera.org:8080/#/c/8363/3/be/src/runtime/query-exec-mgr.cc@84 PS3, Line 84: uint8_t qs_map_bucket = QueryIdToBucket(query_id); : lock_guard l(qs_map_lock_[qs_map_bucket]); Would it make sense to write a helper so that this code could look like: // I don't like my naming here... gs_map_bucket_guard qs_map_bucket_(get_guard(query_id)); qs_map_bucket_.find(query_id); ... ... i.e., every time you use this thing, you have to use QueryIdToBucket(), and then use the result of that for both the lock guard andthe map. Could that be shortened with a helper class that contains both the lock guard and the map you're allowed to use? http://gerrit.cloudera.org:8080/#/c/8363/3/be/src/service/impala-http-handler.cc File be/src/service/impala-http-handler.cc: http://gerrit.cloudera.org:8080/#/c/8363/3/be/src/service/impala-http-handler.cc@244 PS3, Line 244: for (int i = 0 ; i < 4 ; ++i) { 4 should be the constant here, no? http://gerrit.cloudera.org:8080/#/c/8363/3/be/src/service/impala-server.h File be/src/service/impala-server.h: http://gerrit.cloudera.org:8080/#/c/8363/3/be/src/service/impala-server.h@562 PS3, Line 562: /// on disk. Must be called with the corressponding client_request_state_map_lock_[] nit: corresponding -- To view, visit http://gerrit.cloudera.org:8080/8363 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1 Gerrit-Change-Number: 8363 Gerrit-PatchSet: 3 Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Mon, 23 Oct 2017 22:34:56 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6070: Parallel compute table stats.py
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8354 ) Change subject: IMPALA-6070: Parallel compute_table_stats.py .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/8354/2/tests/util/compute_table_stats.py File tests/util/compute_table_stats.py: http://gerrit.cloudera.org:8080/#/c/8354/2/tests/util/compute_table_stats.py@71 PS2, Line 71: end(pool.apply_async(compute_stats_table > Thanks for the clarification. I actually missed that you do the same thing This script ignores errors by default, and that's not including silent issues like the one you mention. For its purposes: computing table stats on everything, it seems to do the trick. If I were shipping a "stats computation script", I wouldn't ship this one. (For one, it computes stats even if they've been computed.) In fact, the current invocation tries to compute stats on a view in Kudu and fails, but the failure is ignored. -- To view, visit http://gerrit.cloudera.org:8080/8354 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifb080f2552b9dbe304ecadd6e52429214094237d Gerrit-Change-Number: 8354 Gerrit-PatchSet: 4 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Mon, 23 Oct 2017 19:28:59 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6070: Parallel compute table stats.py
Hello Michael Brown, David Knupp, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8354 to look at the new patch set (#4). Change subject: IMPALA-6070: Parallel compute_table_stats.py .. IMPALA-6070: Parallel compute_table_stats.py Uses a thread pool to issue many compute stats commands in parallel to Impala, rather than doing it serially. Where it was obvious, I combined multiple stats commands into fewer, to reduce the number of "show databses" and serialized "show tables" commands. This speeds up the compute stats step in data loading significantly. My measurements for testdata/bin/compute-table-stats.sh running before and after this change, with the Impala daemons restarted (cold) or not restarted (warm) on an 8-core, 32GB RAM machine were: old, cold: 7m44s new, cold: 1m42s old, warm: 1m23s new, warm: 48s The data load in the full test build behaves in a cold fashion. It's typical for https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/ to run this compute stats step for 9 or 10 minutes. With this change, this will come down to about 2 minutes. Change-Id: Ifb080f2552b9dbe304ecadd6e52429214094237d --- M testdata/bin/compute-table-stats.sh M tests/util/compute_table_stats.py 2 files changed, 59 insertions(+), 30 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/8354/4 -- To view, visit http://gerrit.cloudera.org:8080/8354 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifb080f2552b9dbe304ecadd6e52429214094237d Gerrit-Change-Number: 8354 Gerrit-PatchSet: 4 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger
[Impala-ASF-CR] IMPALA-6070: Parallel compute table stats.py
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8354 ) Change subject: IMPALA-6070: Parallel compute_table_stats.py .. Patch Set 3: (4 comments) http://gerrit.cloudera.org:8080/#/c/8354/2/tests/util/compute_table_stats.py File tests/util/compute_table_stats.py: http://gerrit.cloudera.org:8080/#/c/8354/2/tests/util/compute_table_stats.py@44 PS2, Line 44: if not continue_on_error: raise e > Two nits: even though we tend to do this a lot, PEP 8 frowns on inline if s Sure; fixed. (This was pre-existing.) http://gerrit.cloudera.org:8080/#/c/8354/2/tests/util/compute_table_stats.py@48 PS2, Line 48: > I'm curious -- why is it necessary to define the client factory inside of t It's a closure. Doing it inline in the main block lets the closure access options.impalad, and so on. Obviously, you could create a class, which takes as init parameters all the options, and then make this a class method. Or you could use globals. I've not changed anything here. http://gerrit.cloudera.org:8080/#/c/8354/2/tests/util/compute_table_stats.py@71 PS2, Line 71: _factory, db, table, continue_on_error,) > It seems to me like you don't need the intersection. You already have eithe Let's say all_tables = ["a", "b", "c"] and selected_tables = ["b", "c", "d"]. This code is trying to avoid running compute stats on "d", which has been selected but isn't in all tables. http://gerrit.cloudera.org:8080/#/c/8354/2/tests/util/compute_table_stats.py@116 PS2, Line 116: yield impala_client > Would it be possible to decorate client_factory as a @contextmanager, somet Done -- To view, visit http://gerrit.cloudera.org:8080/8354 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifb080f2552b9dbe304ecadd6e52429214094237d Gerrit-Change-Number: 8354 Gerrit-PatchSet: 3 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Mon, 23 Oct 2017 18:42:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6070: Parallel compute table stats.py
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8354 ) Change subject: IMPALA-6070: Parallel compute_table_stats.py .. Patch Set 2: > Patch Set 2: > > (1 comment) Sure. I did the @contextmanager trick. I'm a little bit ambivalent about it, because it's kind of "fancy python", but I think it looks ok. -- To view, visit http://gerrit.cloudera.org:8080/8354 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifb080f2552b9dbe304ecadd6e52429214094237d Gerrit-Change-Number: 8354 Gerrit-PatchSet: 2 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Mon, 23 Oct 2017 18:08:10 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6070: Parallel compute table stats.py
Hello Michael Brown, David Knupp, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8354 to look at the new patch set (#3). Change subject: IMPALA-6070: Parallel compute_table_stats.py .. IMPALA-6070: Parallel compute_table_stats.py Uses a thread pool to issue many compute stats commands in parallel to Impala, rather than doing it serially. Where it was obvious, I combined multiple stats commands into fewer, to reduce the number of "show databses" and serialized "show tables" commands. This speeds up the compute stats step in data loading significantly. My measurements for testdata/bin/compute-table-stats.sh running before and after this change, with the Impala daemons restarted (cold) or not restarted (warm) on an 8-core, 32GB RAM machine were: old, cold: 7m44s new, cold: 1m42s old, warm: 1m23s new, warm: 48s The data load in the full test build behaves in a cold fashion. It's typical for https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/ to run this compute stats step for 9 or 10 minutes. With this change, this will come down to about 2 minutes. Change-Id: Ifb080f2552b9dbe304ecadd6e52429214094237d --- M testdata/bin/compute-table-stats.sh M tests/util/compute_table_stats.py 2 files changed, 58 insertions(+), 30 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/8354/3 -- To view, visit http://gerrit.cloudera.org:8080/8354 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifb080f2552b9dbe304ecadd6e52429214094237d Gerrit-Change-Number: 8354 Gerrit-PatchSet: 3 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger
[Impala-ASF-CR] IMPALA-6070: Parallel compute table stats.py
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8354 ) Change subject: IMPALA-6070: Parallel compute_table_stats.py .. Patch Set 2: BTW, https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/481/consoleFull did fine. It had both data load parallelism and table stats parallelism. Ran in 3h17m. A baseline is, say, https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/479/, which ran for 3h52m. So, between these two changes, I've picked up about 35 minutes. 22:51:43 Computing table stats (logging to /home/ubuntu/Impala/logs/data_loading/compute-table-stats.log)... 22:53:28 Computing table stats OK (Took: 1 min 45 sec) -- To view, visit http://gerrit.cloudera.org:8080/8354 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifb080f2552b9dbe304ecadd6e52429214094237d Gerrit-Change-Number: 8354 Gerrit-PatchSet: 2 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Mon, 23 Oct 2017 17:34:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6070: Parallel compute table stats.py
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8354 ) Change subject: IMPALA-6070: Parallel compute_table_stats.py .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/8354/1/tests/util/compute_table_stats.py File tests/util/compute_table_stats.py: http://gerrit.cloudera.org:8080/#/c/8354/1/tests/util/compute_table_stats.py@37 PS1, Line 37: statement = "compute stats %s" % (db_table,) > Do these prints come out clean or all garbled due to the parallelism? They do. At first I thought I didn't care, but you're right, and I've fixed it. I switched to using python logging, which has its own logging to prevent interleaving. As a bonus, we get timestamps if we want to debug how long things take, and thread ids to make it obvious that there's parallelism. http://gerrit.cloudera.org:8080/#/c/8354/1/tests/util/compute_table_stats.py@119 PS1, Line 119: impala_client.connect() > please add newline to separate more cleanly Done -- To view, visit http://gerrit.cloudera.org:8080/8354 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifb080f2552b9dbe304ecadd6e52429214094237d Gerrit-Change-Number: 8354 Gerrit-PatchSet: 2 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Mon, 23 Oct 2017 17:29:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6070: Parallel compute table stats.py
Hello Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8354 to look at the new patch set (#2). Change subject: IMPALA-6070: Parallel compute_table_stats.py .. IMPALA-6070: Parallel compute_table_stats.py Uses a thread pool to issue many compute stats commands in parallel to Impala, rather than doing it serially. Where it was obvious, I combined multiple stats commands into fewer, to reduce the number of "show databses" and serialized "show tables" commands. This speeds up the compute stats step in data loading significantly. My measurements for testdata/bin/compute-table-stats.sh running before and after this change, with the Impala daemons restarted (cold) or not restarted (warm) on an 8-core, 32GB RAM machine were: old, cold: 7m44s new, cold: 1m42s old, warm: 1m23s new, warm: 48s The data load in the full test build behaves in a cold fashion. It's typical for https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/ to run this compute stats step for 9 or 10 minutes. With this change, this will come down to about 2 minutes. Change-Id: Ifb080f2552b9dbe304ecadd6e52429214094237d --- M testdata/bin/compute-table-stats.sh M tests/util/compute_table_stats.py 2 files changed, 62 insertions(+), 31 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/8354/2 -- To view, visit http://gerrit.cloudera.org:8080/8354 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifb080f2552b9dbe304ecadd6e52429214094237d Gerrit-Change-Number: 8354 Gerrit-PatchSet: 2 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Alex Behm
[Impala-ASF-CR] IMPALA-4524: Batch ALTER TABLE...ADD PARTITION calls.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8238 ) Change subject: IMPALA-4524: Batch ALTER TABLE...ADD PARTITION calls. .. Patch Set 8: (4 comments) Thanks for the review, Alex! I believe I addressed your comments. I've re-run test_alter_table_create_many_partitions (the new test). I'm kicking off a full test run (https://jenkins.impala.io/view/Utility/job/pre-review-test/68/) to make sure my Lists.partition() change wasn't fat-fingered somehow. LD_LIBRARY_PATH=./toolchain/kudu-bec2a24/release/lib:$LD_LIBRARY_PATH infra/python/env/bin/py.test tests/metadata/test_ddl.py -k test_alter_table_create_many_partitions -s http://gerrit.cloudera.org:8080/#/c/8238/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/8238/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1958 PS7, Line 1958: Lists.partition(allHmsPartitionsToAdd, MAX_PARTITION_UPDATES_PER_RPC)) { > While you are here, can you also change bulkAlterPartitions() to use Lists. Done http://gerrit.cloudera.org:8080/#/c/8238/7/tests/metadata/test_ddl.py File tests/metadata/test_ddl.py: http://gerrit.cloudera.org:8080/#/c/8238/7/tests/metadata/test_ddl.py@514 PS7, Line 514: service.wait_for_metric_value(class_cache_misses_metric, class_cache_misses) > This test doesn't belong in the TestLibCache class. I moved it up in this faile in test_ddl. Good catch that this file had multiple classes; I missed that somehow. I found a test test_hms_integration.py:test_add_overlapping_partitions that does have multi-partition add alter, but it's all about Hive/Impala compatibility, so I think test_ddl.py is a slightly better home here. I'm very happy to defer to you. http://gerrit.cloudera.org:8080/#/c/8238/7/tests/metadata/test_ddl.py@519 PS7, Line 519: cache. create_stmts is the list of CREATE statements to be executed in order > "use" can generally cause problems in tests because we sometimes run a test Done http://gerrit.cloudera.org:8080/#/c/8238/7/tests/metadata/test_ddl.py@527 PS7, Line 527: self.client.set_configuration(vector.get_value("exec_option")) > This comment can become stale quickly. I suggest not adding it an output ro Done -- To view, visit http://gerrit.cloudera.org:8080/8238 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f Gerrit-Change-Number: 8238 Gerrit-PatchSet: 8 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Mon, 23 Oct 2017 17:20:28 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4524: Batch ALTER TABLE...ADD PARTITION calls.
Hello Dimitris Tsirogiannis, Alex Behm, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8238 to look at the new patch set (#9). Change subject: IMPALA-4524: Batch ALTER TABLE...ADD PARTITION calls. .. IMPALA-4524: Batch ALTER TABLE...ADD PARTITION calls. This commit allows users to add more than 500 (=MAX_PARTITION_UPDATES_PER_RPC) partitions in a single ALTER TABLE command. We batch the operations against Hive into groups of 500. I tested this manually, creating 1002 partitions and observing the expected 3 API calls against the Hive Metastore in the log. I can confirm that there is coverage of this in some existing tests. A new, simple, test has been added that confirms that creating 502 partitions works. Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f --- M fe/src/main/java/org/apache/impala/analysis/AlterTableAddPartitionStmt.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M tests/metadata/test_ddl.py 4 files changed, 45 insertions(+), 48 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/8238/9 -- To view, visit http://gerrit.cloudera.org:8080/8238 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f Gerrit-Change-Number: 8238 Gerrit-PatchSet: 9 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger
[Impala-ASF-CR] IMPALA-4524: Batch ALTER TABLE...ADD PARTITION calls.
Hello Dimitris Tsirogiannis, Alex Behm, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8238 to look at the new patch set (#8). Change subject: IMPALA-4524: Batch ALTER TABLE...ADD PARTITION calls. .. IMPALA-4524: Batch ALTER TABLE...ADD PARTITION calls. This commit allows users to add more than 500 (=MAX_PARTITION_UPDATES_PER_RPC) partitions in a single ALTER TABLE command. We batch the operations against Hive into groups of 500. I tested this manually, creating 1002 partitions and observing the expected 3 API calls against the Hive Metastore in the log. I can confirm that there is coverage of this in some existing tests. A new, simple, test has been added that confirms that creating 502 partitions works. Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f --- M fe/src/main/java/org/apache/impala/analysis/AlterTableAddPartitionStmt.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M tests/metadata/test_ddl.py 4 files changed, 45 insertions(+), 48 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/8238/8 -- To view, visit http://gerrit.cloudera.org:8080/8238 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f Gerrit-Change-Number: 8238 Gerrit-PatchSet: 8 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger
[Impala-ASF-CR] IMPALA-6070: Parallel compute table stats.py
Philip Zeyliger has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8354 Change subject: IMPALA-6070: Parallel compute_table_stats.py .. IMPALA-6070: Parallel compute_table_stats.py Uses a thread pool to issue many compute stats commands in parallel to Impala, rather than doing it serially. Where it was obvious, I combined multiple stats commands into fewer, to reduce the number of "show databses" and serialized "show tables" commands. This speeds up the compute stats step in data loading significantly. My measurements for testdata/bin/compute-table-stats.sh running before and after this change, with the Impala daemons restarted (cold) or not restarted (warm) on an 8-core, 32GB RAM machine were: old, cold: 7m44s new, cold: 1m42s old, warm: 1m23s new, warm: 48s The data load in the full test build behaves in a cold fashion. It's typical for https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/ to run this compute stats step for 9 or 10 minutes. With this change, this will come down to about 2 minutes. Change-Id: Ifb080f2552b9dbe304ecadd6e52429214094237d --- M testdata/bin/compute-table-stats.sh M tests/util/compute_table_stats.py 2 files changed, 58 insertions(+), 29 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/8354/1 -- To view, visit http://gerrit.cloudera.org:8080/8354 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ifb080f2552b9dbe304ecadd6e52429214094237d Gerrit-Change-Number: 8354 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger
[Impala-ASF-CR] IMPALA-5243: Speed up code gen for wide Avro tables.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8211 ) Change subject: IMPALA-5243: Speed up code gen for wide Avro tables. .. Patch Set 4: (4 comments) Thanks. I think I got all the long lines out. "git show | egrep '^\+.{91}' | grep -v '^+//'" doesn't return anything any more. (Obviously, we're not wrapping the example codegen.) http://gerrit.cloudera.org:8080/#/c/8211/3/be/src/exec/hdfs-avro-scanner.h File be/src/exec/hdfs-avro-scanner.h: http://gerrit.cloudera.org:8080/#/c/8211/3/be/src/exec/hdfs-avro-scanner.h@241 PS3, Line 241: const SchemaPath& path, const AvroSchemaElement& record, int child_start, > long line Done http://gerrit.cloudera.org:8080/#/c/8211/3/be/src/exec/hdfs-avro-scanner.cc File be/src/exec/hdfs-avro-scanner.cc: http://gerrit.cloudera.org:8080/#/c/8211/3/be/src/exec/hdfs-avro-scanner.cc@856 PS3, Line 856: BasicBlock* helper_block = BasicBlock::Create( > Long line Done http://gerrit.cloudera.org:8080/#/c/8211/3/be/src/exec/hdfs-avro-scanner.cc@867 PS3, Line 867: Function* fnHelper = helper_functions[i]; > Long line. Done http://gerrit.cloudera.org:8080/#/c/8211/3/be/src/exec/hdfs-avro-scanner.cc@889 PS3, Line 889: > long line. Done -- To view, visit http://gerrit.cloudera.org:8080/8211 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7f1b390be4adf6e6699a18344234f8ff7ee74476 Gerrit-Change-Number: 8211 Gerrit-PatchSet: 4 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sun, 22 Oct 2017 04:13:44 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5243: Speed up code gen for wide Avro tables.
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8211 to look at the new patch set (#5). Change subject: IMPALA-5243: Speed up code gen for wide Avro tables. .. IMPALA-5243: Speed up code gen for wide Avro tables. HdfsAvroScanner::CodegenMaterializeTuple generates a function linear in size to the number of columns. On 1000 column tables, codegen time is significant. This commit roughly halves it for wide columns. (Note that this had been much worse in recent history (<= Impala 2.9).) It does so by breaking up MaterializeTuple() into multiple smaller functions, and then calls them in order. When breaking up into 200-column chunks, there is a noticeable speed-up. I've made the helper code for generating LLVM function prototypes have a mutable function name, so that the builder can be re-used multiple times. I've checked by inspecting optimized LLVM that in the case where there's only 1 helper function, code gets inlined so that there doesn't seem to be an extra function. I measured codegen time for various "step sizes." The case where there are no helper functions is about 2.7s. The best case was about a step size of 200, with timings of 1.3s. For the query "select count(int_col16) from functional_avro.widetable_1000_cols", codegen times as a function of step size are roughly as follows. This is averaged across 5 executions, and rounded to 0.1s. step time 10 2.4 50 2.5 75 2.9 100 3.0 125 3.0 150 1.4 175 1.3 200 1.3 <-- chosen step size 225 1.5 250 1.4 300 1.6 400 1.6 500 1.8 1000 2.7 The raw data was generated like so, with some code that let me change the step size at runtime: $(for step in 10 50 75 100 125 150 175 200 225 250 300 400 500 1000; do for try in $(seq 5); do echo $step > /tmp/step_size.txt; echo -n "$step "; impala-shell.sh -q "select count(int_col16) from functional_avro.widetable_1000_cols; profile;" 2> /dev/null | grep -A9 'CodeGen:(Total: [0-9]*s' -m 1 | sed -e 's/ - / /' | sed -e 's/([0-9]*)//' | tr -d '\n' | tr -s ' ' ' '; echo; done; done) | tee out.txt ... 200 CodeGen:(Total: 1s333ms, non-child: 1s333ms, % non-child: 100.00%) CodegenTime: 613.562us CompileTime: 605.320ms LoadTime: 0.000ns ModuleBitcodeSize: 1.95 MB NumFunctions: 38 NumInstructions: 8.44K OptimizationTime: 701.276ms PeakMemoryUsage: 4.12 MB PrepareTime: 10.014ms ... 1000 CodeGen:(Total: 2s659ms, non-child: 2s659ms, % non-child: 100.00%) CodegenTime: 558.860us CompileTime: 1s267ms LoadTime: 0.000ns ModuleBitcodeSize: 1.95 MB NumFunctions: 34 NumInstructions: 8.41K OptimizationTime: 1s362ms PeakMemoryUsage: 4.11 MB PrepareTime: 10.574ms I have run the core tests with this change. Change-Id: I7f1b390be4adf6e6699a18344234f8ff7ee74476 --- M be/src/codegen/llvm-codegen.h M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-avro-scanner.h 3 files changed, 158 insertions(+), 100 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/8211/5 -- To view, visit http://gerrit.cloudera.org:8080/8211 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7f1b390be4adf6e6699a18344234f8ff7ee74476 Gerrit-Change-Number: 8211 Gerrit-PatchSet: 5 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5243: Speed up code gen for wide Avro tables.
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8211 to look at the new patch set (#4). Change subject: IMPALA-5243: Speed up code gen for wide Avro tables. .. IMPALA-5243: Speed up code gen for wide Avro tables. HdfsAvroScanner::CodegenMaterializeTuple generates a function linear in size to the number of columns. On 1000 column tables, codegen time is significant. This commit roughly halves it for wide columns. (Note that this had been much worse in recent history (<= Impala 2.9).) It does so by breaking up MaterializeTuple() into multiple smaller functions, and then calls them in order. When breaking up into 200-column chunks, there is a noticeable speed-up. I've made the helper code for generating LLVM function prototypes have a mutable function name, so that the builder can be re-used multiple times. I've checked by inspecting optimized LLVM that in the case where there's only 1 helper function, code gets inlined so that there doesn't seem to be an extra function. I measured codegen time for various "step sizes." The case where there are no helper functions is about 2.7s. The best case was about a step size of 200, with timings of 1.3s. For the query "select count(int_col16) from functional_avro.widetable_1000_cols", codegen times as a function of step size are roughly as follows. This is averaged across 5 executions, and rounded to 0.1s. step time 10 2.4 50 2.5 75 2.9 100 3.0 125 3.0 150 1.4 175 1.3 200 1.3 <-- chosen step size 225 1.5 250 1.4 300 1.6 400 1.6 500 1.8 1000 2.7 The raw data was generated like so, with some code that let me change the step size at runtime: $(for step in 10 50 75 100 125 150 175 200 225 250 300 400 500 1000; do for try in $(seq 5); do echo $step > /tmp/step_size.txt; echo -n "$step "; impala-shell.sh -q "select count(int_col16) from functional_avro.widetable_1000_cols; profile;" 2> /dev/null | grep -A9 'CodeGen:(Total: [0-9]*s' -m 1 | sed -e 's/ - / /' | sed -e 's/([0-9]*)//' | tr -d '\n' | tr -s ' ' ' '; echo; done; done) | tee out.txt ... 200 CodeGen:(Total: 1s333ms, non-child: 1s333ms, % non-child: 100.00%) CodegenTime: 613.562us CompileTime: 605.320ms LoadTime: 0.000ns ModuleBitcodeSize: 1.95 MB NumFunctions: 38 NumInstructions: 8.44K OptimizationTime: 701.276ms PeakMemoryUsage: 4.12 MB PrepareTime: 10.014ms ... 1000 CodeGen:(Total: 2s659ms, non-child: 2s659ms, % non-child: 100.00%) CodegenTime: 558.860us CompileTime: 1s267ms LoadTime: 0.000ns ModuleBitcodeSize: 1.95 MB NumFunctions: 34 NumInstructions: 8.41K OptimizationTime: 1s362ms PeakMemoryUsage: 4.11 MB PrepareTime: 10.574ms I have run the core tests with this change. Change-Id: I7f1b390be4adf6e6699a18344234f8ff7ee74476 --- M be/src/codegen/llvm-codegen.h M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-avro-scanner.h 3 files changed, 156 insertions(+), 100 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/8211/4 -- To view, visit http://gerrit.cloudera.org:8080/8211 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7f1b390be4adf6e6699a18344234f8ff7ee74476 Gerrit-Change-Number: 8211 Gerrit-PatchSet: 4 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-6070: Parallel data load.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8320 ) Change subject: IMPALA-6070: Parallel data load. .. Patch Set 2: (9 comments) Thanks for the reviews! I observed memory when watching this, and on my 32GB machine, I always has ~20GB available. I agree with Alex on adding in more things: there are similar changes that can continue to help here, but I'm doing them one at a time. http://gerrit.cloudera.org:8080/#/c/8320/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8320/1//COMMIT_MSG@9 PS1, Line 9: This commit loads functional-query, TPC-H data, and TPC-DS data in > nit: Can you wrap this at the red line provided by gerrit? I think it is 72 Done. "gqip" does it in vi. It looks like it's 72 chars. http://gerrit.cloudera.org:8080/#/c/8320/1//COMMIT_MSG@12 PS1, Line 12: 13 minut > nit: minutes Done http://gerrit.cloudera.org:8080/#/c/8320/1//COMMIT_MSG@33 PS1, Line 33: 16:14:33Loading workload 'tpcds' using exploration strategy 'core' OK (Took: 16 min 29 sec) > What testing did you do? Does the data load still run on a non-beefy local Define non-beefy? My desktop is 32 GB and 8 cores. This ran fine. http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/create-load-data.sh File testdata/bin/create-load-data.sh: http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/create-load-data.sh@480 PS1, Line 480: # Run some steps in parallel, with run-step-backgroundable / run-step-wait-all. > Could add a comment about what you decided to background and what you decid Done. http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/create-load-data.sh@492 PS1, Line 492: LOAD_NESTED_ARGS="--cm-host $CM_HOST" > I don't see any reason this also couldn't run in parallel. Yes, but I've not tested this one. http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/create-load-data.sh@505 PS1, Line 505: load-data "functional-query" "core" "hbase/none" : fi : : if $KUDU_IS_SUPPORTED; then : # Tests depend on the kudu data being clean, so load > It should be possible to do the same thing for these. That will only save a Yes. I am testing this one, but I'll do a separate patch for it. http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/run-hive-server.sh File testdata/bin/run-hive-server.sh: http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/run-hive-server.sh@75 PS1, Line 75: HADOOP_HEAPSIZE="512" hive --service hiveserver2 > ${LOGDIR}/hive-server2.out 2>&1 & > :). I'm still using that good-old machine, mem should be fine (fingers cros 512 works, so that's what I've changed it to. I'm not investigating using -Xms -Xmx to give this more flexibility (but even less predictability). http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/run-step.sh File testdata/bin/run-step.sh: http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/run-step.sh@53 PS1, Line 53: > nit: only one empty line, to match context Done http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/run-step.sh@84 PS1, Line 84: RUN_STEP_MSGS=() > Do you want to reset MSGS, too? Good catch. Done. -- To view, visit http://gerrit.cloudera.org:8080/8320 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I836c4e1586f229621c102c4f4ba22ce7224ab9ac Gerrit-Change-Number: 8320 Gerrit-PatchSet: 2 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Sat, 21 Oct 2017 21:32:51 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6070: Parallel data load.
Hello Jim Apple, Joe McDonnell, Alex Behm, Zach Amsden, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8320 to look at the new patch set (#2). Change subject: IMPALA-6070: Parallel data load. .. IMPALA-6070: Parallel data load. This commit loads functional-query, TPC-H data, and TPC-DS data in parallel. In parallel, these take about 37 minutes, dominated by functional-query. Serially, these take about 30 minutes more, namely the 13 minutes of tpcds and 16 minutes of tpcds. This works out nicely because CPU usage during data load is very low in aggregate. (We don't sustain more than 1 CPU of load, whereas build machines are likely to have many CPUs.) To do this, I added support to run-step.sh to have a notion of a backgroundable task, and support waiting for all tasks. I also increased the heapsize of our HiveServer2 server. When datasets were being loaded in parallel, we ran out of memory at 256MB of heap. The resulting log output is currently like so (but without the timestamps): 15:58:04 Started Loading functional-query data in background; pid 8105. 15:58:04 Started Loading TPC-H data in background; pid 8106. 15:58:04 Loading functional-query data (logging to /home/impdev/Impala/logs/data_loading/load-functional-query.log)... 15:58:04 Started Loading TPC-DS data in background; pid 8107. 15:58:04 Loading TPC-H data (logging to /home/impdev/Impala/logs/data_loading/load-tpch.log)... 15:58:04 Loading TPC-DS data (logging to /home/impdev/Impala/logs/data_loading/load-tpcds.log)... 16:11:31Loading workload 'tpch' using exploration strategy 'core' OK (Took: 13 min 27 sec) 16:14:33Loading workload 'tpcds' using exploration strategy 'core' OK (Took: 16 min 29 sec) 16:35:08Loading workload 'functional-query' using exploration strategy 'exhaustive' OK (Took: 37 min 4 sec) I tested dataloading with the following command on an 8-core, 32GB machine. I saw 19GB of available memory during my run: ./buildall.sh -testdata -build_shared_libs -start_minicluster -start_impala_cluster -format Change-Id: I836c4e1586f229621c102c4f4ba22ce7224ab9ac --- M testdata/bin/create-load-data.sh M testdata/bin/run-hive-server.sh M testdata/bin/run-step.sh 3 files changed, 44 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/8320/2 -- To view, visit http://gerrit.cloudera.org:8080/8320 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I836c4e1586f229621c102c4f4ba22ce7224ab9ac Gerrit-Change-Number: 8320 Gerrit-PatchSet: 2 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-5736: Add impala-shell argument to set default query options
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8038 ) Change subject: IMPALA-5736: Add impala-shell argument to set default query options .. Patch Set 10: (1 comment) http://gerrit.cloudera.org:8080/#/c/8038/7/shell/option_parser.py File shell/option_parser.py: http://gerrit.cloudera.org:8080/#/c/8038/7/shell/option_parser.py@91 PS7, Line 91: conve > Unfortunately Python 2.7 is not ubiquitous in Linux distributions like Cent To be specific: CentOS/Redhat 6 uses py2.6. CentOS/Redhat 5: py2.4. -- To view, visit http://gerrit.cloudera.org:8080/8038 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I26a3b67230c80a99bd246b6af205d558fec9a986 Gerrit-Change-Number: 8038 Gerrit-PatchSet: 10 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: anujphadke Gerrit-Comment-Date: Thu, 19 Oct 2017 16:47:19 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6073: Fail on misconfigured CLASSPATH.
Philip Zeyliger has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8327 Change subject: IMPALA-6073: Fail on misconfigured CLASSPATH. .. IMPALA-6073: Fail on misconfigured CLASSPATH. Asserts, early, that $CLASSPATH is set and has no wildcards. Several developers have been tripped up running C++ tests without first running 'bin/set-classpath.sh'. This causes them to run into HDFS-11851 (getGlobalJNIEnv() may deadlock on exception), wherein, instead of a relatively clear "Class not found" error, we hang forever. I considered limiting this to tests, but we clearly need a $CLASSPATH for the daemons themselves. Testing: * Ran a backend test without $CLASSPATH set. * Started Impala cluster with this change. Change-Id: Icb92fa5cad8bbc29b620bec5904e45ed7aeff13d --- M be/src/common/init.cc 1 file changed, 7 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/8327/1 -- To view, visit http://gerrit.cloudera.org:8080/8327 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Icb92fa5cad8bbc29b620bec5904e45ed7aeff13d Gerrit-Change-Number: 8327 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger
[Impala-ASF-CR] IMPALA-6045: Make build scripts more friendly to Ubuntu 16.04
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8262 ) Change subject: IMPALA-6045: Make build scripts more friendly to Ubuntu 16.04 .. Patch Set 3: > Patch Set 3: > > > Sorry to get to this a bit late. > > No apology required. > > I agree with all of your suggestions. > > If you want to file a ticket and assign it to me or if you want to send me a > patch to review, I'd be amenable. Filed https://issues.apache.org/jira/browse/IMPALA-6079. I've left it unassigned. It's not clear to me yet whether some other work will lead me right back to this :) -- To view, visit http://gerrit.cloudera.org:8080/8262 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8196a2a87bce5893a349a1b290c3f3d04fd80317 Gerrit-Change-Number: 8262 Gerrit-PatchSet: 3 Gerrit-Owner: Jim Apple Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Wed, 18 Oct 2017 22:38:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6070: Parallel data load.
Philip Zeyliger has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8320 Change subject: IMPALA-6070: Parallel data load. .. IMPALA-6070: Parallel data load. This commit loads functional-query, TPC-H data, and TPC-DS data in parallel. In parallel, these take about 37 minutes, dominated by functional-query. Serially, these take about 30 minutes more, namely the 13 minutes of tpcds and 16 minuites of tpcds. This works out nicely because CPU usage during data load is very low in aggregate. (We don't sustain more than 1 CPU of load, whereas build machines are likely to have many CPUs.) To do this, I added support to run-step.sh to have a notion of a backgroundable task, and support waiting for all tasks. I also increased the heapsize of our HiveServer2 server. When datasets were being loaded in parallel, we ran out of memory at 256MB of heap. The resulting log output is currently like so (but without the timestamps): 15:58:04 Started Loading functional-query data in background; pid 8105. 15:58:04 Started Loading TPC-H data in background; pid 8106. 15:58:04 Loading functional-query data (logging to /home/impdev/Impala/logs/data_loading/load-functional-query.log)... 15:58:04 Started Loading TPC-DS data in background; pid 8107. 15:58:04 Loading TPC-H data (logging to /home/impdev/Impala/logs/data_loading/load-tpch.log)... 15:58:04 Loading TPC-DS data (logging to /home/impdev/Impala/logs/data_loading/load-tpcds.log)... 16:11:31Loading workload 'tpch' using exploration strategy 'core' OK (Took: 13 min 27 sec) 16:14:33Loading workload 'tpcds' using exploration strategy 'core' OK (Took: 16 min 29 sec) 16:35:08Loading workload 'functional-query' using exploration strategy 'exhaustive' OK (Took: 37 min 4 sec) Change-Id: I836c4e1586f229621c102c4f4ba22ce7224ab9ac --- M testdata/bin/create-load-data.sh M testdata/bin/run-hive-server.sh M testdata/bin/run-step.sh 3 files changed, 40 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/8320/1 -- To view, visit http://gerrit.cloudera.org:8080/8320 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I836c4e1586f229621c102c4f4ba22ce7224ab9ac Gerrit-Change-Number: 8320 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Philip Zeyliger
[Impala-ASF-CR] IMPALA-5243: Speed up code gen for wide Avro tables.
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8211 to look at the new patch set (#3). Change subject: IMPALA-5243: Speed up code gen for wide Avro tables. .. IMPALA-5243: Speed up code gen for wide Avro tables. HdfsAvroScanner::CodegenMaterializeTuple generates a function linear in size to the number of columns. On 1000 column tables, codegen time is significant. This commit roughly halves it for wide columns. (Note that this had been much worse in recent history (<= Impala 2.9).) It does so by breaking up MaterializeTuple() into multiple smaller functions, and then calls them in order. When breaking up into 200-column chunks, there is a noticeable speed-up. I've made the helper code for generating LLVM function prototypes have a mutable function name, so that the builder can be re-used multiple times. I've checked by inspecting optimized LLVM that in the case where there's only 1 helper function, code gets inlined so that there doesn't seem to be an extra function. I measured codegen time for various "step sizes." The case where there are no helper functions is about 2.7s. The best case was about a step size of 200, with timings of 1.3s. For the query "select count(int_col16) from functional_avro.widetable_1000_cols", codegen times as a function of step size are roughly as follows. This is averaged across 5 executions, and rounded to 0.1s. step time 10 2.4 50 2.5 75 2.9 100 3.0 125 3.0 150 1.4 175 1.3 200 1.3 <-- chosen step size 225 1.5 250 1.4 300 1.6 400 1.6 500 1.8 1000 2.7 The raw data was generated like so, with some code that let me change the step size at runtime: $(for step in 10 50 75 100 125 150 175 200 225 250 300 400 500 1000; do for try in $(seq 5); do echo $step > /tmp/step_size.txt; echo -n "$step "; impala-shell.sh -q "select count(int_col16) from functional_avro.widetable_1000_cols; profile;" 2> /dev/null | grep -A9 'CodeGen:(Total: [0-9]*s' -m 1 | sed -e 's/ - / /' | sed -e 's/([0-9]*)//' | tr -d '\n' | tr -s ' ' ' '; echo; done; done) | tee out.txt ... 200 CodeGen:(Total: 1s333ms, non-child: 1s333ms, % non-child: 100.00%) CodegenTime: 613.562us CompileTime: 605.320ms LoadTime: 0.000ns ModuleBitcodeSize: 1.95 MB NumFunctions: 38 NumInstructions: 8.44K OptimizationTime: 701.276ms PeakMemoryUsage: 4.12 MB PrepareTime: 10.014ms ... 1000 CodeGen:(Total: 2s659ms, non-child: 2s659ms, % non-child: 100.00%) CodegenTime: 558.860us CompileTime: 1s267ms LoadTime: 0.000ns ModuleBitcodeSize: 1.95 MB NumFunctions: 34 NumInstructions: 8.41K OptimizationTime: 1s362ms PeakMemoryUsage: 4.11 MB PrepareTime: 10.574ms I have run the core tests with this change. Change-Id: I7f1b390be4adf6e6699a18344234f8ff7ee74476 --- M be/src/codegen/llvm-codegen.h M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-avro-scanner.h 3 files changed, 148 insertions(+), 93 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/8211/3 -- To view, visit http://gerrit.cloudera.org:8080/8211 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7f1b390be4adf6e6699a18344234f8ff7ee74476 Gerrit-Change-Number: 8211 Gerrit-PatchSet: 3 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5243: Speed up code gen for wide Avro tables.
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8211 to look at the new patch set (#2). Change subject: IMPALA-5243: Speed up code gen for wide Avro tables. .. IMPALA-5243: Speed up code gen for wide Avro tables. HdfsAvroScanner::CodegenMaterializeTuple generates a function linear in size to the number of columns. On 1000 column tables, codegen time is significant. This commit roughly halves it for wide columns. (Note that this had been much worse in recent history (<= Impala 2.9).) It does so by breaking up MaterializeTuple() into multiple smaller functions, and then calls them in order. When breaking up into 200-column chunks, there is a noticeable speed-up. I've made the helper code for generating LLVM function prototypes have a mutable function name, so that the builder can be re-used multiple times. I've checked by inspecting optimized LLVM that in the case where there's only 1 helper function, code gets inlined so that there doesn't seem to be an extra function. I measured codegen time for various "step sizes." The case where there are no helper functions is about 2.7s. The best case was about a step size of 200, with timings of 1.3s. For the query "select count(int_col16) from functional_avro.widetable_1000_cols", codegen times as a function of step size are roughly as follows. This is averaged across 5 executions, and rounded to 0.1s. step time 10 2.4 50 2.5 75 2.9 100 3.0 125 3.0 150 1.4 175 1.3 200 1.3 <-- chosen step size 225 1.5 250 1.4 300 1.6 400 1.6 500 1.8 1000 2.7 The raw data was generated like so, with some code that let me change the step size at runtime: $(for step in 10 50 75 100 125 150 175 200 225 250 300 400 500 1000; do for try in $(seq 5); do echo $step > /tmp/step_size.txt; echo -n "$step "; impala-shell.sh -q "select count(int_col16) from functional_avro.widetable_1000_cols; profile;" 2> /dev/null | grep -A9 'CodeGen:(Total: [0-9]*s' -m 1 | sed -e 's/ - / /' | sed -e 's/([0-9]*)//' | tr -d '\n' | tr -s ' ' ' '; echo; done; done) | tee out.txt ... 200 CodeGen:(Total: 1s333ms, non-child: 1s333ms, % non-child: 100.00%) CodegenTime: 613.562us CompileTime: 605.320ms LoadTime: 0.000ns ModuleBitcodeSize: 1.95 MB NumFunctions: 38 NumInstructions: 8.44K OptimizationTime: 701.276ms PeakMemoryUsage: 4.12 MB PrepareTime: 10.014ms ... 1000 CodeGen:(Total: 2s659ms, non-child: 2s659ms, % non-child: 100.00%) CodegenTime: 558.860us CompileTime: 1s267ms LoadTime: 0.000ns ModuleBitcodeSize: 1.95 MB NumFunctions: 34 NumInstructions: 8.41K OptimizationTime: 1s362ms PeakMemoryUsage: 4.11 MB PrepareTime: 10.574ms I have run the core tests with this change. Change-Id: I7f1b390be4adf6e6699a18344234f8ff7ee74476 --- M be/src/codegen/llvm-codegen.h M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-avro-scanner.h 3 files changed, 148 insertions(+), 93 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/8211/2 -- To view, visit http://gerrit.cloudera.org:8080/8211 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7f1b390be4adf6e6699a18344234f8ff7ee74476 Gerrit-Change-Number: 8211 Gerrit-PatchSet: 2 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8238 ) Change subject: IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION. .. Patch Set 7: Code-Review+1 Carry post rebase. -- To view, visit http://gerrit.cloudera.org:8080/8238 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f Gerrit-Change-Number: 8238 Gerrit-PatchSet: 7 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Wed, 18 Oct 2017 21:32:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8238 ) Change subject: IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION. .. Patch Set 6: Code-Review+1 (1 comment) Carrying Dimitris' +1. http://gerrit.cloudera.org:8080/#/c/8238/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/8238/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1974 PS3, Line 1974: // HMS because some of them may already exist there. In that case, we load in the : // catalog the partitions that already exist in HMS but aren't in the catalog yet. : if (allHmsPartitionsToAdd.size() != addedHmsPartitions.size()) { : List difference = computeDifference(allHmsPartitionsToAdd, : addedHmsPartitions); : addedHmsPartitions.addAll( : getPartitionsFromHms(msTbl, msClient, tableName, difference)); : } : : for (Partition partition: addedHmsPartitions) { : // Create and add the HdfsPartition to catalog. Return the table object with an : > Well, you're only calling it for the diff, right? I would expect in most ca Yep, I agree. -- To view, visit http://gerrit.cloudera.org:8080/8238 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f Gerrit-Change-Number: 8238 Gerrit-PatchSet: 6 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Wed, 18 Oct 2017 21:32:02 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8294 ) Change subject: IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/8294/1/bin/impala-config.sh File bin/impala-config.sh: http://gerrit.cloudera.org:8080/#/c/8294/1/bin/impala-config.sh@276 PS1, Line 276: 169.254.169.254 > What is this address? Can we use a domain name and https? http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html This is EC2 magic. If you're not on EC2, this will tell you to set the environment variables (if you're testing S3). If you're on EC2, this will return something. We don't actually check that those credentials can access the relevant bucket, but at least it lets you through. -- To view, visit http://gerrit.cloudera.org:8080/8294 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae Gerrit-Change-Number: 8294 Gerrit-PatchSet: 1 Gerrit-Owner: Laszlo Gaal Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Wed, 18 Oct 2017 21:06:13 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8294 ) Change subject: IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/8294/1/bin/impala-config.sh File bin/impala-config.sh: http://gerrit.cloudera.org:8080/#/c/8294/1/bin/impala-config.sh@256 PS1, Line 256: if [[ -z ${AWS_ACCESS_KEY_ID-} ]]; then > We may want to include a check here that "set -x" has not been enabled? (Bash hackery) If you want, you can use a subshell: $(set -x; (set +x; [ $USER == philip ])) && echo yes || echo no + set +x yes 10:46:02 mystery ~ $(set -x; (set +x; [ $USER == phili ])) && echo yes || echo no + set +x no Return values survive, and you can just do if (set +x; [[ -z ${AWS_SECRET_ACCESS_KEY_ID-} ]]); then ... fi I tested something similar above. -- To view, visit http://gerrit.cloudera.org:8080/8294 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae Gerrit-Change-Number: 8294 Gerrit-PatchSet: 1 Gerrit-Owner: Laszlo Gaal Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Wed, 18 Oct 2017 17:47:25 + Gerrit-HasComments: Yes
[Impala-ASF-CR] Add 'psmisc' to bootstrap system.sh.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8306 ) Change subject: Add 'psmisc' to bootstrap_system.sh. .. Patch Set 1: > Would you like me to submit this for you, Phil? Yes, please. I don't have permission to do it. Thanks! -- To view, visit http://gerrit.cloudera.org:8080/8306 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I378b6fbf7a7b3f0b42ecf064e98cfe6e29b3 Gerrit-Change-Number: 8306 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Wed, 18 Oct 2017 17:10:17 + Gerrit-HasComments: No
[Impala-ASF-CR] Add 'psmisc' to bootstrap system.sh.
Philip Zeyliger has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8306 Change subject: Add 'psmisc' to bootstrap_system.sh. .. Add 'psmisc' to bootstrap_system.sh. testdata/bin/kill-all.sh uses /usr/bin/killall, which is provided by the 'psmisc' package on Ubuntu. This commit adds the package to the list of packages we install. The Docker base image for Ubuntu 16.04 doesn't contain this package, which is how this came up. Change-Id: I378b6fbf7a7b3f0b42ecf064e98cfe6e29b3 --- M bin/bootstrap_system.sh 1 file changed, 1 insertion(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/8306/1 -- To view, visit http://gerrit.cloudera.org:8080/8306 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I378b6fbf7a7b3f0b42ecf064e98cfe6e29b3 Gerrit-Change-Number: 8306 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger
[Impala-ASF-CR] Making bin/bootstrap system.sh executable.
Philip Zeyliger has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8292 Change subject: Making bin/bootstrap_system.sh executable. .. Making bin/bootstrap_system.sh executable. Script should be executable, since it's reasonable to run it standalone. Change-Id: I15973f68e0d6cba86da58a104a607cac0627c4cb --- M bin/bootstrap_system.sh 1 file changed, 0 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/8292/1 -- To view, visit http://gerrit.cloudera.org:8080/8292 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I15973f68e0d6cba86da58a104a607cac0627c4cb Gerrit-Change-Number: 8292 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger
[Impala-ASF-CR] IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.
Hello Dimitris Tsirogiannis, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8238 to look at the new patch set (#5). Change subject: IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION. .. IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION. This commit allows users to add more than 500 (=MAX_PARTITION_UPDATES_PER_RPC) partitions in a single ALTER TABLE command. We batch the operations against Hive into groups of 500. I tested this manually, creating 1002 partitions and observing the expected 3 API calls against the Hive Metastore in the log. I can confirm that there is coverage of this in some existing tests. A new, simple, test has been added that confirms that creating 502 partitions works. Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f --- M fe/src/main/java/org/apache/impala/analysis/AlterTableAddPartitionStmt.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M tests/metadata/test_ddl.py 4 files changed, 40 insertions(+), 42 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/8238/5 -- To view, visit http://gerrit.cloudera.org:8080/8238 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f Gerrit-Change-Number: 8238 Gerrit-PatchSet: 5 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Philip Zeyliger
[Impala-ASF-CR] IMPALA-6002: Add a LLVM diagnostic handler for LLVM linker errors
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8233 ) Change subject: IMPALA-6002: Add a LLVM diagnostic handler for LLVM linker errors .. Patch Set 4: Code-Review+1 (2 comments) Thanks! http://gerrit.cloudera.org:8080/#/c/8233/4/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: http://gerrit.cloudera.org:8080/#/c/8233/4/be/src/codegen/llvm-codegen.cc@308 PS4, Line 308: if (!diagnostic_err.empty()) ss <<" "<< diagnostic_err; nit: Do we do spacing around "<<" operators? http://gerrit.cloudera.org:8080/#/c/8233/4/be/src/codegen/llvm-codegen.cc@1625 PS4, Line 1625: diagnostic_printer <<"LLVM diagnostic error: "; nit: space around "<<"? -- To view, visit http://gerrit.cloudera.org:8080/8233 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I907ff1f8975c02d422b4a669afbfc2e536ddf1ff Gerrit-Change-Number: 8233 Gerrit-PatchSet: 4 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-Comment-Date: Mon, 16 Oct 2017 23:28:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8238 ) Change subject: IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION. .. Patch Set 4: (2 comments) Thanks for the review! I ran the core tests. Added a python test, as you suggested. http://gerrit.cloudera.org:8080/#/c/8238/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/8238/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@a1954 PS3, Line 1954: > Why did you remove this? Mistake. Re-instated. (I missed that MetaStoreClient was autocloseable...) http://gerrit.cloudera.org:8080/#/c/8238/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1974 PS3, Line 1974: // HMS because some of them may already exist there. In that case, we load in the : // catalog the partitions that already exist in HMS but aren't in the catalog yet. : if (allHmsPartitionsToAdd.size() != addedHmsPartitions.size()) { : List difference = computeDifference(allHmsPartitionsToAdd, : addedHmsPartitions); : addedHmsPartitions.addAll( : getPartitionsFromHms(msTbl, msClient, tableName, difference)); : } : : for (Partition partition: addedHmsPartitions) { : // Create and add the HdfsPartition to catalog. Return the table object with an : > I think you can refactor this so that you make only one call to getPartitio Sure. I tightened the partitioned loop to only include the msClient call. Note that this implies that getPartitionFromHms() is ok to call with a big (> MAX_PARTITION_UPDATES_PER_RPC) batch. Is that against the spirit here? -- To view, visit http://gerrit.cloudera.org:8080/8238 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f Gerrit-Change-Number: 8238 Gerrit-PatchSet: 4 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Mon, 16 Oct 2017 18:43:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.
Hello Dimitris Tsirogiannis, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8238 to look at the new patch set (#4). Change subject: IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION. .. IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION. This commit allows users to add more than 500 (=MAX_PARTITION_UPDATES_PER_RPC) partitions in a single ALTER TABLE command. We batch the operations against Hive into groups of 500. I tested this manually, creating 1002 partitions and observing the expected 3 API calls against the Hive Metastore in the log. I can confirm that there is coverage of this in some existing tests. A new, simple, test has been added that confirms that creating 502 partitions works. Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f --- M fe/src/main/java/org/apache/impala/analysis/AlterTableAddPartitionStmt.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M tests/metadata/test_ddl.py 4 files changed, 39 insertions(+), 42 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/8238/4 -- To view, visit http://gerrit.cloudera.org:8080/8238 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f Gerrit-Change-Number: 8238 Gerrit-PatchSet: 4 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis
[Impala-ASF-CR] IMPALA-6055: Fix hdfs encryption test far Hadoop 2.8+
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8274 ) Change subject: IMPALA-6055: Fix hdfs encryption test far Hadoop 2.8+ .. Patch Set 1: Code-Review+1 (3 comments) This looks fine to me, with minor comment nits. http://gerrit.cloudera.org:8080/#/c/8274/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8274/1//COMMIT_MSG@13 PS1, Line 13: N.B. this is probably worth fixing upstream. We are upstream! http://gerrit.cloudera.org:8080/#/c/8274/1/tests/metadata/test_hdfs_encryption.py File tests/metadata/test_hdfs_encryption.py: http://gerrit.cloudera.org:8080/#/c/8274/1/tests/metadata/test_hdfs_encryption.py@191 PS1, Line 191: # exists. This behavior is expected due to the difference in encryption zones "the difference in encryption zones between the .Trash and the warehouse directory" If that's true, could we elaborate on "difference in encryption zones" perhaps as suggested here? http://gerrit.cloudera.org:8080/#/c/8274/1/tests/metadata/test_hdfs_encryption.py@198 PS1, Line 198: # New HDFS behavior succeeds the query and creates trash; the partition removal s/New HDFS/HDFS 2.8+/? -- To view, visit http://gerrit.cloudera.org:8080/8274 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30234aa50fea93f316e75beea2ced002dcea0c24 Gerrit-Change-Number: 8274 Gerrit-PatchSet: 1 Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Fri, 13 Oct 2017 22:07:59 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6045: Make build scripts more friendly to Ubuntu 16.04
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8262 ) Change subject: IMPALA-6045: Make build scripts more friendly to Ubuntu 16.04 .. Patch Set 3: (2 comments) Sorry to get to this a bit late. I find the distinctions between the (now 3) bootstrap_* scripts to be really hard to grok. Since we're multiplying scripts in this commit, it seems like we should clarify those distinctions. I very much like separating bootstrap_system from bootstrap_development, but I think it doesn't make sense to separate it into two scripts. Rather, I think this would be clearer to use if it looked like: bootstrap_development [install-packages] [checkout-impala] [configure] [build] [loadtestdata] [runtests] (Or something like it; point being to have on entry point with a few variants.) The default actions would be whatever bootstrap_development does today. There's a mild prototype at https://github.com/philz/incubator-impala/commit/e0e7ad3ce07335709987b1b4088ac66ef312e7af if you're interested. This came up for me while trying to time the different parts of the bootstrap script for some Docker work I was doing. In it, I just broke the existing script into ~5 bash functions. http://gerrit.cloudera.org:8080/#/c/8262/3/bin/bootstrap_build.sh File bin/bootstrap_build.sh: http://gerrit.cloudera.org:8080/#/c/8262/3/bin/bootstrap_build.sh@35 PS3, Line 35: python-dev python-setuptools libffi-dev libkrb5-dev Consider just adding openjdk-${JDK_VERSION}-jdk (and friends) here? I don't know that one apt-get command is faster than two, but it's nice to have them all together. http://gerrit.cloudera.org:8080/#/c/8262/3/bin/bootstrap_system.sh File bin/bootstrap_system.sh: http://gerrit.cloudera.org:8080/#/c/8262/3/bin/bootstrap_system.sh@20 PS3, Line 20: # This script bootstraps a system for Impala development from almost nothing; it is known This is very heavily replicated with bootstrap_development; highlighting the differences may make sense. -- To view, visit http://gerrit.cloudera.org:8080/8262 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8196a2a87bce5893a349a1b290c3f3d04fd80317 Gerrit-Change-Number: 8262 Gerrit-PatchSet: 3 Gerrit-Owner: Jim Apple Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Fri, 13 Oct 2017 16:54:34 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5736: Add impala-shell argument to set default query options
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8038 ) Change subject: IMPALA-5736: Add impala-shell argument to set default query options .. Patch Set 7: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/8038 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I26a3b67230c80a99bd246b6af205d558fec9a986 Gerrit-Change-Number: 8038 Gerrit-PatchSet: 7 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: anujphadke Gerrit-Comment-Date: Fri, 13 Oct 2017 16:09:48 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.
Hello Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8238 to look at the new patch set (#3). Change subject: IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION. .. IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION. This commit allows users to add more than 500 (=MAX_PARTITION_UPDATES_PER_RPC) partitions in a single ALTER TABLE command. We batch the operations against Hive into groups of 500. I tested this manually, creating 1002 partitions and observing the expected 3 API calls against the Hive Metastore in the log. I can confirm that there is coverage of this in some existing tests. I'm very open to additional suggestions for how to test this. I can certainly add a query test that reproduces my manual testing, if reviewers feel like it's the right direction. Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f --- M fe/src/main/java/org/apache/impala/analysis/AlterTableAddPartitionStmt.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java 3 files changed, 12 insertions(+), 37 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/8238/3 -- To view, visit http://gerrit.cloudera.org:8080/8238 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f Gerrit-Change-Number: 8238 Gerrit-PatchSet: 3 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Alex Behm
[Impala-ASF-CR] IMPALA-5736: Add impala-shell argument to set default query options
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8038 ) Change subject: IMPALA-5736: Add impala-shell argument to set default query options .. Patch Set 4: (10 comments) Thanks! I think this is much clearer than the previous iteration. All of my comments are either nits or requests to add a few more comments. http://gerrit.cloudera.org:8080/#/c/8038/4/shell/impala_shell.py File shell/impala_shell.py: http://gerrit.cloudera.org:8080/#/c/8038/4/shell/impala_shell.py@686 PS4, Line 686: tmp_set = {} Add a comment: # Use a temporary to avoid changing set_query_options during iteration. (An alternative is to use del, but use .items() instead of .iteritems()) http://gerrit.cloudera.org:8080/#/c/8038/4/shell/impala_shell.py@1330 PS4, Line 1330: if __name__ == "__main__": Perhaps we need a big comment: """ There are two types of options: shell options and query_options. You can set these in the shell itself, which overrides settings on the command line, which override the defaults in impala_shell_config_defaults.py. Query options have no defaults within the impala-shell, but they do have defaults on the server. """ http://gerrit.cloudera.org:8080/#/c/8038/4/shell/impala_shell.py@1350 PS4, Line 1350: # default options loaded in from impala_shell_config_defaults.py Let's take the time to update this comment by disambiguating "shell options" vs "query options" here? http://gerrit.cloudera.org:8080/#/c/8038/4/shell/impala_shell.py@1354 PS4, Line 1354: impala_shell_defaults.update(loaded_shell_options) I think "impala_query_options_default" is empty, but this assymetry between impala_shell_defaults and query_options is weird. It's weird that you're updating impala_shell_defaults, rather than updating "shell_options". http://gerrit.cloudera.org:8080/#/c/8038/4/shell/impala_shell.py@1437 PS4, Line 1437: query_options.update( Add comment: Override query_options with those specified on the command line. http://gerrit.cloudera.org:8080/#/c/8038/4/shell/option_parser.py File shell/option_parser.py: http://gerrit.cloudera.org:8080/#/c/8038/4/shell/option_parser.py@36 PS4, Line 36: def convert_loaded_shell_option(option, value, default_options, config_filename): Please document config_filename. I'm unclear how it's used. http://gerrit.cloudera.org:8080/#/c/8038/4/shell/option_parser.py@47 PS4, Line 47: # if the option is not set to either true or false, use the default Do we need to log about the ignored value here? http://gerrit.cloudera.org:8080/#/c/8038/4/shell/option_parser.py@36 PS4, Line 36: def convert_loaded_shell_option(option, value, default_options, config_filename): : """Converts some values based on their type in default_options : """ : if default_options[option] in [True, False]: : # validate the option if it can only be a boolean value : # the only choice for these options is true or false : if value.lower() == "true": : return True : elif value.lower() == 'false': : return False : else: : # if the option is not set to either true or false, use the default : return default_options[option] : elif value.lower() == "none": : return None : elif option.lower() == "config_file": : return config_filename This function is mixing 2-space indent and 4-space indent. Please clean up. http://gerrit.cloudera.org:8080/#/c/8038/4/shell/option_parser.py@164 PS4, Line 164: "Only specify this as a option in the commandline." s/as a option/as an option/ http://gerrit.cloudera.org:8080/#/c/8038/4/shell/option_parser.py@183 PS4, Line 183: help="Sets default query options." Add: "May be specified multiple times." Unless it's clear from the usage? -- To view, visit http://gerrit.cloudera.org:8080/8038 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I26a3b67230c80a99bd246b6af205d558fec9a986 Gerrit-Change-Number: 8038 Gerrit-PatchSet: 4 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Wed, 11 Oct 2017 18:42:39 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6002: Add a LLVM diagnostic handler for LLVM linker errors
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8233 ) Change subject: IMPALA-6002: Add a LLVM diagnostic handler for LLVM linker errors .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/8233/3/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: http://gerrit.cloudera.org:8080/#/c/8233/3/be/src/codegen/llvm-codegen.cc@308 PS3, Line 308: } nit: This might look like "... to main module.Bla bla." I.e., no space after the period. You might want to add '<< " "' in here to make it look nicer. http://gerrit.cloudera.org:8080/#/c/8233/3/be/src/codegen/llvm-codegen.cc@1635 PS3, Line 1635: I might be confused about the C++11 stuff, but given that you just std::move'd it, I don't think you need to clear it. http://gerrit.cloudera.org:8080/#/c/8233/1/tests/query_test/test_codegen.py File tests/query_test/test_codegen.py: http://gerrit.cloudera.org:8080/#/c/8233/1/tests/query_test/test_codegen.py@45 PS1, Line 45: def test_codegen_diagnostic_handler(self, vector): > I tried moving this to llvm-codegen-test.cc but it turns out that a lot ug I don't know LLVM well enough to suggest how to get at these errors. That said, LoadModuleFromFile() takes a regular file, I think, and not one on HDFS. You might be able to just cp ./llvm-ir/test-loop.bc to a temporary file to induce this. (Or create a second test-loop that has a link-time conflict: presumably name collisions cause errors?) -- To view, visit http://gerrit.cloudera.org:8080/8233 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I907ff1f8975c02d422b4a669afbfc2e536ddf1ff Gerrit-Change-Number: 8233 Gerrit-PatchSet: 1 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-Comment-Date: Wed, 11 Oct 2017 17:43:54 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6027: Retry downloading toolchain components.
Philip Zeyliger has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/8258 ) Change subject: IMPALA-6027: Retry downloading toolchain components. .. IMPALA-6027: Retry downloading toolchain components. We've seen intermittent 500 errors when downloading the toolchain from S3 over the HTTPS URLs. As a first stab, this commit retries 3 times, with some jitter. I also changed the threadpool introduced previously to have a limit of 4 threads, because that's sufficient to get the speed improvement. The 500 errors have been observed both before and after the threadpool change. For testing, I ran the straight-forward case directly. I introduced a broken version string to observe that retries would happen on any error from wget. Change-Id: I7669c7d41240aa0eb43c30d5bf2bd5c01b66180b --- M bin/bootstrap_toolchain.py 1 file changed, 15 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/8258/2 -- To view, visit http://gerrit.cloudera.org:8080/8258 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7669c7d41240aa0eb43c30d5bf2bd5c01b66180b Gerrit-Change-Number: 8258 Gerrit-PatchSet: 2 Gerrit-Owner: Philip Zeyliger
[Impala-ASF-CR] IMPALA-6027: Retry downloading toolchain components.
Philip Zeyliger has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8258 Change subject: IMPALA-6027: Retry downloading toolchain components. .. IMPALA-6027: Retry downloading toolchain components. We've seen intermittent 500 errors when downloading the toolchain from S3 over the HTTPS URLs. As a first stab, this commit retries 3 times, with some jitter. I also changed the threadpool introduced previously to have a limit of 4 threads, because that's sufficient to get the speed improvement. The 500 errors have been observed both before and after the threadpool change. For testing, I ran the straight-forward case directly. I introduced a broken version string to observe that retries would happen on any error from wget. Change-Id: I7669c7d41240aa0eb43c30d5bf2bd5c01b66180b --- M bin/bootstrap_toolchain.py 1 file changed, 15 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/8258/1 -- To view, visit http://gerrit.cloudera.org:8080/8258 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I7669c7d41240aa0eb43c30d5bf2bd5c01b66180b Gerrit-Change-Number: 8258 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger
[Impala-ASF-CR] Download toolchain in parallel.
Philip Zeyliger has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8237 Change subject: Download toolchain in parallel. .. Download toolchain in parallel. By downloading from the toolchain S3 buckets in parallel with extracting them, this improves bootstrap_toolchain on my machine from about 1m5s to about 30s. $rm -rf toolchain; time bin/bootstrap_toolchain.py > /dev/null real0m29.226s user0m46.516s sys 0m33.820s On a large EC2 machine, closer to the S3 buckets, the new time is 21s. Because multiprocessing hasn't always been available (python2.4 on RHEL5 won't have it), I fall back to a simpler implementation Change-Id: I46a6088bb002402c7653dbc8257dff869afb26ec --- M bin/bootstrap_toolchain.py 1 file changed, 30 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/8237/1 -- To view, visit http://gerrit.cloudera.org:8080/8237 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I46a6088bb002402c7653dbc8257dff869afb26ec Gerrit-Change-Number: 8237 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger
[Impala-ASF-CR] IMPALA-5940: Avoid log spew by using Status::Expected.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8100 ) Change subject: IMPALA-5940: Avoid log spew by using Status::Expected. .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/8100/3/be/src/runtime/client-cache.h File be/src/runtime/client-cache.h: http://gerrit.cloudera.org:8080/#/c/8100/3/be/src/runtime/client-cache.h@246 PS3, Line 246: > Maybe we should include the RPC name in the log message and that will disam I followed Michael's foot steps and looked at the ~17 callers of DoRpc. Far from all of them log. I'm now proposing we do this patch without the DoRpc() changes. From a performance standpoint, if you're timing out on RPCs, the backtrace is the least of your problems, and, yet, I think knowing which RPC is timing out is very helpful. I wasn't able to see a quick way to get the name of the RPC without discouraged C++ shenanigans. I checked that the Thrift Exception doesn't have that, nor do the Thrift structs seem to have self-identifying information. -- To view, visit http://gerrit.cloudera.org:8080/8100 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I38088482377a1c3e794a9c8178ef83f29957a330 Gerrit-Change-Number: 8100 Gerrit-PatchSet: 6 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Fri, 06 Oct 2017 23:28:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5940: Avoid log spew by using Status::Expected.
Hello Michael Ho, Sailesh Mukil, Alex Behm, Mostafa Mokhtar, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8100 to look at the new patch set (#6). Change subject: IMPALA-5940: Avoid log spew by using Status::Expected. .. IMPALA-5940: Avoid log spew by using Status::Expected. In IMPALA-5926, we fixed a case where closing the session triggered a stack trace in the logs which impacted performance for short-running queries. Looking at log files from several active clusters, I identified a few other cases where we could clean up log spew with the same (trivial) approach. The following messages will no longer log: "Failed to codegen MaterializeTuple() due to unsupported type: $0" "Not implemented for this format." "ScalarFnCall Codegen not supported for CHAR" "Could not codegen CodegenMaterializeExprs: " "Could not codegen TupleRowComparator::Compare(): $0" These codegen related messages were happening at every execution node, and were therefore very common. In addition, the following messages, which were very frequent in the logs, now will log, but without the back trace: "Query Id ... not found." To do this, I changed them from logging implicitly via Status(...) to logging explicitly. I have not fixed this message: "... Client ... timed-out during recv call." It's in DoRpc(), and not all callers of DoRpc() have independent logging. For now, we'll trade a little bit of log spam for clearer debugging. The snippet I used to identify these was: find . -type f -name '*IMPALAD*.gz' | xargs gzcat | awk '/^I/ { if(x) { print x; } x = "" } /status.cc/ { x=" "; } { if(x) { x=x $0 } }' | sed -e 's/0x[0-9a-fx]* //g' | sed -e 's/[0-9a-f]\{16\}:[0-9a-f]*/QUERYID/g' | tr -s '\t' ' ' | tr '[0-9]' 'N' | sort | uniq -c | sort -n | tee output.txt I also analyzed some logs using SQL, against a pre-processed logs table: with v as ( select regexp_replace( regexp_replace( translate(substr(message, 42), "\n\t", " "), "[a-zA-Z0-9.-]*[.][a-zA-Z0-9-]*:[0-9]*", ""), "@.*$", "@@@...") as m from logs_table where `class`="status.cc") select m, count(*) from v group by 1 order by 2 desc limit 100 Testing: * Automated tests. * Manual testing for one of the new back-trace-suppressed paths: $ impala-python shell/gen-py/ImpalaService/ImpalaService-remote -h localhost:21000 GetRuntimeProfile 'beeswaxd.ttypes.QueryHandle()' Traceback (most recent call last): File "shell/gen-py/ImpalaService/ImpalaService-remote", line 106, in pp.pprint(client.GetRuntimeProfile(eval(args[0]),)) File "/home/philip/src/impala/shell/gen-py/ImpalaService/ImpalaService.py", line 161, in GetRuntimeProfile return self.recv_GetRuntimeProfile() File "/home/philip/src/impala/shell/gen-py/ImpalaService/ImpalaService.py", line 184, in recv_GetRuntimeProfile raise result.error beeswaxd.ttypes.BeeswaxException: BeeswaxException(handle=QueryHandle(log_context='', id=''), log_context='', SQLState='HY000', _message='GetRuntimeProfile error: Query id 0:0 not found.\n', errorCode=0) $ grep 'Query id' logs/cluster/impalad.INFO | tail -n 1 I0926 20:29:09.44 6787 impala-server.cc:642] Query id 0:0 not found. Change-Id: I38088482377a1c3e794a9c8178ef83f29957a330 --- M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exprs/scalar-fn-call.cc M be/src/runtime/tuple.cc M be/src/service/impala-server.cc M be/src/util/tuple-row-compare.cc 6 files changed, 16 insertions(+), 15 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/8100/6 -- To view, visit http://gerrit.cloudera.org:8080/8100 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I38088482377a1c3e794a9c8178ef83f29957a330 Gerrit-Change-Number: 8100 Gerrit-PatchSet: 6 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] IMPALA-6002: Add a LLVM diagnostic handler for LLVM linker errors
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8233 ) Change subject: IMPALA-6002: Add a LLVM diagnostic handler for LLVM linker errors .. Patch Set 1: (5 comments) I'm excited by anything that makes LLVM debugging easier! http://gerrit.cloudera.org:8080/#/c/8233/1/be/src/codegen/llvm-codegen.h File be/src/codegen/llvm-codegen.h: http://gerrit.cloudera.org:8080/#/c/8233/1/be/src/codegen/llvm-codegen.h@783 PS1, Line 783: class DiagnosticState { Based on http://llvm.org/doxygen/DiagnosticHandler_8h_source.html, it looks like they recommend us subclassing their class to implement this, and that there are different "remarks" that we can get. It's tricky, but if they've got different warnings, I sure as heck would want to see them during development. http://gerrit.cloudera.org:8080/#/c/8233/1/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: http://gerrit.cloudera.org:8080/#/c/8233/1/be/src/codegen/llvm-codegen.cc@186 PS1, Line 186: &this->diagnostic_state_, true); Could you comment on what this third argument does? The llvm header says "expects enabled diagnostics", which I don't understand. http://gerrit.cloudera.org:8080/#/c/8233/1/be/src/codegen/llvm-codegen.cc@304 PS1, Line 304: if (error || diagnostic_state_.EncounteredError()) { So do the diagnostic messages get printed anywhere? http://gerrit.cloudera.org:8080/#/c/8233/1/be/src/codegen/llvm-codegen.cc@1622 PS1, Line 1622: info.print(diagnostic_printer); Does this make it to our logs? http://gerrit.cloudera.org:8080/#/c/8233/1/tests/query_test/test_codegen.py File tests/query_test/test_codegen.py: http://gerrit.cloudera.org:8080/#/c/8233/1/tests/query_test/test_codegen.py@45 PS1, Line 45: def test_codegen_diagnostic_handler(self, vector): You could probably test this in be/src/codegen/llvm-codegen-test.cc with considerably less fanfare. I think the end-to-end test that we don't fail on a custom UDF is lovely, but a more targeted test makes sense too. -- To view, visit http://gerrit.cloudera.org:8080/8233 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I907ff1f8975c02d422b4a669afbfc2e536ddf1ff Gerrit-Change-Number: 8233 Gerrit-PatchSet: 1 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Fri, 06 Oct 2017 23:22:29 + Gerrit-HasComments: Yes