[Impala-ASF-CR] Expose $IMPALA MAVEN OPTIONS for configuring Maven.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8496 ) Change subject: Expose $IMPALA_MAVEN_OPTIONS for configuring Maven. .. Patch Set 4: Code-Review+2 Rebase, carry + 2 -- To view, visit http://gerrit.cloudera.org:8080/8496 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2c62185476fd2388c7cda8884276b79a77370127 Gerrit-Change-Number: 8496 Gerrit-PatchSet: 4 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 13 Nov 2017 21:58:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6148: Specifying thirdparty deps as URLs
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8456 ) Change subject: IMPALA-6148: Specifying thirdparty deps as URLs .. Patch Set 6: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8456 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4628d86022d4bd8b762313f7056d76416a58b422 Gerrit-Change-Number: 8456 Gerrit-PatchSet: 6 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Thu, 09 Nov 2017 23:22:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6148: Specifying thirdparty deps as URLs
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8456 ) Change subject: IMPALA-6148: Specifying thirdparty deps as URLs .. Patch Set 7: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8456 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4628d86022d4bd8b762313f7056d76416a58b422 Gerrit-Change-Number: 8456 Gerrit-PatchSet: 7 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Thu, 09 Nov 2017 23:22:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6148: Specifying thirdparty deps as URLs
Joe McDonnell has removed a vote on this change. Change subject: IMPALA-6148: Specifying thirdparty deps as URLs .. Removed Verified+1 by Joe McDonnell -- To view, visit http://gerrit.cloudera.org:8080/8456 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: I4628d86022d4bd8b762313f7056d76416a58b422 Gerrit-Change-Number: 8456 Gerrit-PatchSet: 5 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-6148: Specifying thirdparty deps as URLs
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8456 ) Change subject: IMPALA-6148: Specifying thirdparty deps as URLs .. Patch Set 5: Verified+1 Code-Review+2 Rebase, carry +2. -- To view, visit http://gerrit.cloudera.org:8080/8456 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4628d86022d4bd8b762313f7056d76416a58b422 Gerrit-Change-Number: 8456 Gerrit-PatchSet: 5 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Wed, 08 Nov 2017 23:03:42 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4835 (prep only): create io subfolder and namespace
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8424 ) Change subject: IMPALA-4835 (prep only): create io subfolder and namespace .. Patch Set 6: (1 comment) One small comment http://gerrit.cloudera.org:8080/#/c/8424/6/be/src/runtime/io/disk-io-mgr.h File be/src/runtime/io/disk-io-mgr.h: http://gerrit.cloudera.org:8080/#/c/8424/6/be/src/runtime/io/disk-io-mgr.h@184 PS6, Line 184: /// TODO: Break this up the .h/.cc into multiple files under an /io subdirectory. Can remove this TODO -- To view, visit http://gerrit.cloudera.org:8080/8424 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If807f93a47d8027a43e56dd80b1b535d0bb74e1b Gerrit-Change-Number: 8424 Gerrit-PatchSet: 6 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 07 Nov 2017 19:39:42 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6148: Specifying thirdparty deps as URLs
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8456 ) Change subject: IMPALA-6148: Specifying thirdparty deps as URLs .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/8456/3/bin/impala-config.sh File bin/impala-config.sh: http://gerrit.cloudera.org:8080/#/c/8456/3/bin/impala-config.sh@138 PS3, Line 138: # Download URLs for toolchain dependencies can be overridden by : # IMPALA__URLs in impala-config-*.sh. We unset them here first: : for var in "${!IMPALA_@}"; do : if [[ "$var" == IMPALA_*_URL ]]; then : unset $var : fi : done If I understand this correctly, setting these in the environment won't work. Instead, they must be set in impala-config-branch.sh or impala-config-local.sh. Is that what we want? One thing that we do for some environment variables crucial to the build is to have an OVERRIDE version of these variables. Sometimes we also respect the environment itself. See: DOWNLOAD_CDH_COMPONENTS HADOOP_INCLUDE_DIR_OVERRIDE HADOOP_LIB_DIR_OVERRIDE HIVE_SRC_DIR_OVERRIDE -- To view, visit http://gerrit.cloudera.org:8080/8456 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4628d86022d4bd8b762313f7056d76416a58b422 Gerrit-Change-Number: 8456 Gerrit-PatchSet: 3 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Tue, 07 Nov 2017 01:34:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8034 ) Change subject: IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder .. Patch Set 15: (3 comments) http://gerrit.cloudera.org:8080/#/c/8034/15/be/src/util/dict-encoding.h File be/src/util/dict-encoding.h: http://gerrit.cloudera.org:8080/#/c/8034/15/be/src/util/dict-encoding.h@77 PS15, Line 77: void ClearIndices() { > We generally use the more concise one-line formatting for very short single To add to what Tim said, this code didn't actually change, so it is nice to avoid touching code if you can. It makes it easier to track down what code change last impacted a piece of code. http://gerrit.cloudera.org:8080/#/c/8034/15/be/src/util/dict-encoding.h@152 PS15, Line 152: /// Destructor, release the bytes used by Memtracker. : ~DictEncoder() { : ReleaseBytes(); : DCHECK(dict_bytes_cnt_ == 0); : } I think we can omit this destructor, since DictEncoderBase does the same thing. http://gerrit.cloudera.org:8080/#/c/8034/15/be/src/util/dict-encoding.h@296 PS15, Line 296: /// Destructor, release the bytes used by Memtracker and close. : ~DictDecoder() { : ReleaseBytes(); : DCHECK(dict_bytes_cnt_ == 0); :} I think we can omit this destructor, as DictDecoderBase does this. -- To view, visit http://gerrit.cloudera.org:8080/8034 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299 Gerrit-Change-Number: 8034 Gerrit-PatchSet: 15 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Pranay Singh Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 03 Nov 2017 23:14:19 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4835 (prep only): create io subfolder and namespace
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8424 ) Change subject: IMPALA-4835 (prep only): create io subfolder and namespace .. Patch Set 4: Made a first pass through this and it makes sense so far. I will make a second pass soon. -- To view, visit http://gerrit.cloudera.org:8080/8424 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If807f93a47d8027a43e56dd80b1b535d0bb74e1b Gerrit-Change-Number: 8424 Gerrit-PatchSet: 4 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 02 Nov 2017 21:24:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8034 ) Change subject: IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder .. Patch Set 14: (1 comment) http://gerrit.cloudera.org:8080/#/c/8034/12/be/src/util/dict-encoding.h File be/src/util/dict-encoding.h: http://gerrit.cloudera.org:8080/#/c/8034/12/be/src/util/dict-encoding.h@63 PS12, Line 63: /// dict_encoded_size() bytes. : virtual void Writ > implemented the logic of Close() inside ClearIndices() Close() should call ClearIndices() as one of its steps, ClearIndices() still needs to exist separately and cannot incorporate the logic of Close(). -- To view, visit http://gerrit.cloudera.org:8080/8034 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299 Gerrit-Change-Number: 8034 Gerrit-PatchSet: 14 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Pranay Singh Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 01 Nov 2017 01:16:49 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1575: part 2: yield admission control resourcesa
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8323 ) Change subject: IMPALA-1575: part 2: yield admission control resourcesa .. Patch Set 4: Code-Review+1 This looks good to me. It is much clearer than my original patch. -- To view, visit http://gerrit.cloudera.org:8080/8323 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I80279eb2bda740d7f61420f52db3bfa42a6a51ac Gerrit-Change-Number: 8323 Gerrit-PatchSet: 4 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 30 Oct 2017 23:10:06 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8034 ) Change subject: IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder .. Patch Set 12: (5 comments) I think this is very close. http://gerrit.cloudera.org:8080/#/c/8034/12/be/src/exec/hdfs-parquet-scanner.h File be/src/exec/hdfs-parquet-scanner.h: http://gerrit.cloudera.org:8080/#/c/8034/12/be/src/exec/hdfs-parquet-scanner.h@352 PS12, Line 352: MemTracker* dict_mem_tracker() { return scan_node_->mem_tracker(); } See comment in HdfsParquetTableWriter. Also, see how we pass the mem_tracker to data_page_pool_. http://gerrit.cloudera.org:8080/#/c/8034/12/be/src/exec/hdfs-parquet-table-writer.h File be/src/exec/hdfs-parquet-table-writer.h: http://gerrit.cloudera.org:8080/#/c/8034/12/be/src/exec/hdfs-parquet-table-writer.h@81 PS12, Line 81: /// Memory tracker to track the memory used by encoder. : MemTracker* dict_mem_tracker(); When I'm reading the code, I'm thinking that this is hiding more than it is illuminating. We use this in exactly one place. I would rather have the exact code right there with a good comment explaining which mem_tracker we are using. http://gerrit.cloudera.org:8080/#/c/8034/12/be/src/exec/hdfs-parquet-table-writer.cc File be/src/exec/hdfs-parquet-table-writer.cc: http://gerrit.cloudera.org:8080/#/c/8034/12/be/src/exec/hdfs-parquet-table-writer.cc@179 PS12, Line 179: dict_encoder_base_->ClearIndices(); : dict_encoder_base_->Close(); When Close() does a call to ClearIndices(), this can be simplified to just a Close() call. http://gerrit.cloudera.org:8080/#/c/8034/12/be/src/exec/hdfs-parquet-table-writer.cc@322 PS12, Line 322:plain_encoded_value_size_, :parent_->dict_mem_tracker())); Nit: Indentation in this case should be even with the "D" in new DictEncoder. It goes against every editor's typical behavior, but it's the standard we have. Also, feel free to keep plain_encoded_value_size_ on the first line. (It's on the edge of our 90 char limit.) http://gerrit.cloudera.org:8080/#/c/8034/12/be/src/util/dict-encoding.h File be/src/util/dict-encoding.h: http://gerrit.cloudera.org:8080/#/c/8034/12/be/src/util/dict-encoding.h@63 PS12, Line 63: void Close() { : ReleaseBytes(); I think Close() should do ClearIndices(). -- To view, visit http://gerrit.cloudera.org:8080/8034 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299 Gerrit-Change-Number: 8034 Gerrit-PatchSet: 12 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Pranay Singh Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 27 Oct 2017 23:59:23 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6068: Fix dataload for complextypes fileformat
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8350 ) Change subject: IMPALA-6068: Fix dataload for complextypes_fileformat .. Patch Set 6: Code-Review+2 Rebased, carry +2 -- To view, visit http://gerrit.cloudera.org:8080/8350 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7152306b2907198204a6d8d282a0bad561129b82 Gerrit-Change-Number: 8350 Gerrit-PatchSet: 6 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Wed, 25 Oct 2017 00:18:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6068: Fix dataload for complextypes fileformat
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8350 ) Change subject: IMPALA-6068: Fix dataload for complextypes_fileformat .. Patch Set 5: Code-Review+1 Carry Alex's +1 -- To view, visit http://gerrit.cloudera.org:8080/8350 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7152306b2907198204a6d8d282a0bad561129b82 Gerrit-Change-Number: 8350 Gerrit-PatchSet: 5 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Joe McDonnell Gerrit-Comment-Date: Tue, 24 Oct 2017 20:53:36 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6068: Fix dataload for complextypes fileformat
Hello Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8350 to look at the new patch set (#4). Change subject: IMPALA-6068: Fix dataload for complextypes_fileformat .. IMPALA-6068: Fix dataload for complextypes_fileformat Dataload typically follows a pattern of loading data into a text version of a table, and then using an insert overwrite from the text table to populate the table for other file formats. This insert is always done in Impala for Parquet and Kudu. Otherwise it runs in Hive. Since Impala doesn't support writing nested data, the population of complextypes_fileformat tries to hack the insert to run in Hive by including it in the ALTER part of the table definition. ALTER runs immediately after CREATE and always runs in Hive. The problem is that ALTER also runs before the base table (functional.complextypes_fileformat) is populated. The insert succeeds, but it is inserting zero rows. This code change introduces a way to force the Parquet load to run using Hive. This lets complextypes_fileformat specify that the insert should happen in Hive and fixes the ordering so that the table is populated correctly. This is also useful for loading custom Parquet files into Parquet tables. Hive supports the DATA LOAD LOCAL syntax, which can read a file from the local filesystem. This means that several locations that currently use the hdfs commandline can be modified to use this SQL. This change speeds up dataload by a few minutes, as it avoids the overhead of the hdfs commandline. Any other location that could use DATA LOAD LOCAL is also switched over to use it. This includes the testescape* tables which now print the appropriate DATA LOAD commands as a result of text_delims_table.py. Any location that already uses DATA LOAD LOCAL is also switched to indicate that it must run in Hive. Any location that was doing an HDFS command in the LOAD section is moved to the LOAD_DEPENDENT_HIVE section. Testing: Ran dataload and core tests. Also verified that functional_parquet.complextypes_fileformat has rows. Change-Id: I7152306b2907198204a6d8d282a0bad561129b82 --- M testdata/bin/create-load-data.sh M testdata/bin/generate-schema-statements.py M testdata/common/text_delims_table.py M testdata/common/widetable.py M testdata/datasets/functional/functional_schema_template.sql 5 files changed, 160 insertions(+), 152 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/8350/4 -- To view, visit http://gerrit.cloudera.org:8080/8350 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7152306b2907198204a6d8d282a0bad561129b82 Gerrit-Change-Number: 8350 Gerrit-PatchSet: 4 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Joe McDonnell
[Impala-ASF-CR] IMPALA-6068: Fix dataload for complextypes fileformat
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8350 ) Change subject: IMPALA-6068: Fix dataload for complextypes_fileformat .. Patch Set 3: (1 comment) Found a couple issues in testing. Backed out some unrelated optimizations to fix the issue. I will merge those separately. http://gerrit.cloudera.org:8080/#/c/8350/3/testdata/common/text_delims_table.py File testdata/common/text_delims_table.py: http://gerrit.cloudera.org:8080/#/c/8350/3/testdata/common/text_delims_table.py@60 PS3, Line 60: print ("LOAD DATA LOCAL INPATH '%s' " > Mention that this is intended to be printed into one of the sections of our Added comments to this file and widetable.py to clarify this. -- To view, visit http://gerrit.cloudera.org:8080/8350 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7152306b2907198204a6d8d282a0bad561129b82 Gerrit-Change-Number: 8350 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Joe McDonnell Gerrit-Comment-Date: Tue, 24 Oct 2017 20:52:44 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1575: Part 1: eagerly release query exec resources
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8303 ) Change subject: IMPALA-1575: Part 1: eagerly release query exec resources .. Patch Set 9: Code-Review+1 (1 comment) This makes sense to me. http://gerrit.cloudera.org:8080/#/c/8303/9/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: http://gerrit.cloudera.org:8080/#/c/8303/9/be/src/runtime/query-state.cc@314 PS9, Line 314: exec_resource_refcnt_.Add(1); // decremented in ExecFInstance() Does it make sense for these exec_resource_refcnt_.Add(1) locations to use AcquireExecResourceRefcount? -- To view, visit http://gerrit.cloudera.org:8080/8303 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I41ff374b0403f10a145f7fee9b3145953ee32341 Gerrit-Change-Number: 8303 Gerrit-PatchSet: 9 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 23 Oct 2017 22:18:51 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6068: Fix dataload for complextypes fileformat
Hello Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8350 to look at the new patch set (#3). Change subject: IMPALA-6068: Fix dataload for complextypes_fileformat .. IMPALA-6068: Fix dataload for complextypes_fileformat Dataload typically follows a pattern of loading data into a text version of a table, and then using an insert overwrite from the text table to populate the table for other file formats. This insert is always done in Impala for Parquet and Kudu. Otherwise it runs in Hive. Since Impala doesn't support writing nested data, the population of complextypes_fileformat tries to hack the insert to run in Hive by including it in the ALTER part of the table definition. ALTER runs immediately after CREATE and always runs in Hive. The problem is that ALTER also runs before the base table (functional.complextypes_fileformat) is populated. The insert succeeds, but it is inserting zero rows. This code change introduces a way to force the Parquet load to run using Hive. This lets complextypes_fileformat specify that the insert should happen in Hive and fixes the ordering so that the table is populated correctly. This is also useful for loading custom Parquet files into Parquet tables. Hive supports the DATA LOAD LOCAL syntax, which can read a file from the local filesystem. This means that several locations that currently use the hdfs commandline can be modified to use this SQL. This change speeds up dataload by a few minutes, as it avoids the overhead of the hdfs commandline. Any other location that could use DATA LOAD LOCAL is also switched over to use it. This includes the testescape* tables which now print the appropriate DATA LOAD commands as a result of text_delims_table.py. Any location that already uses DATA LOAD LOCAL is also switched to indicate that it must run in Hive. Any location that was doing an HDFS command in the LOAD section is moved to the LOAD_DEPENDENT_HIVE section. Testing: Ran dataload and core tests. Also verified that functional_parquet.complextypes_fileformat has rows. Change-Id: I7152306b2907198204a6d8d282a0bad561129b82 --- M testdata/bin/create-load-data.sh M testdata/bin/generate-schema-statements.py M testdata/common/text_delims_table.py M testdata/datasets/functional/functional_schema_template.sql 4 files changed, 154 insertions(+), 156 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/8350/3 -- To view, visit http://gerrit.cloudera.org:8080/8350 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7152306b2907198204a6d8d282a0bad561129b82 Gerrit-Change-Number: 8350 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Joe McDonnell
[Impala-ASF-CR] IMPALA-6068: Fix dataload for complextypes fileformat
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8350 ) Change subject: IMPALA-6068: Fix dataload for complextypes_fileformat .. Patch Set 2: (1 comment) The next upload does two things: 1. It adds asserts to make sure CREATE and CREATE_HIVE can't both be specified and DEPENDENT_LOAD and DEPENDENT_LOAD_HIVE can't both be specified. 2. It changes the testescape* tables to use LOAD DATA statements (by printing them from text_delims_table.py). 3. Fixes a typo in one of the SQLs http://gerrit.cloudera.org:8080/#/c/8350/2/testdata/bin/generate-schema-statements.py File testdata/bin/generate-schema-statements.py: http://gerrit.cloudera.org:8080/#/c/8350/2/testdata/bin/generate-schema-statements.py@528 PS2, Line 528: insert = eval_section(section['DEPENDENT_LOAD']) > Are these two sections intended to be mutually exclusive? If so, then we sh Added an assert that we don't specify both. -- To view, visit http://gerrit.cloudera.org:8080/8350 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7152306b2907198204a6d8d282a0bad561129b82 Gerrit-Change-Number: 8350 Gerrit-PatchSet: 2 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Joe McDonnell Gerrit-Comment-Date: Mon, 23 Oct 2017 22:08:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6068: Fix dataload for complextypes fileformat
Joe McDonnell has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/8350 ) Change subject: IMPALA-6068: Fix dataload for complextypes_fileformat .. IMPALA-6068: Fix dataload for complextypes_fileformat Dataload typically follows a pattern of loading data into a text version of a table, and then using an insert overwrite from the text table to populate the table for other file formats. This insert is always done in Impala for Parquet and Kudu. Otherwise it runs in Hive. Since Impala doesn't support writing nested data, the population of complextypes_fileformat tries to hack the insert to run in Hive by including it in the ALTER part of the table definition. ALTER runs immediately after CREATE and always runs in Hive. The problem is that ALTER also runs before the base table (functional.complextypes_fileformat) is populated. The insert succeeds, but it is inserting zero rows. This code change introduces a way to force the Parquet load to run using Hive. This lets complextypes_fileformat specify that the insert should happen in Hive and fixes the ordering so that the table is populated correctly. This is also useful for loading custom Parquet files into Parquet tables. Hive supports the DATA LOAD LOCAL syntax, which can read a file from the local filesystem. This means that several locations that currently use the hdfs commandline can be modified to use this SQL. This change speeds up dataload by a few minutes, as it avoids the overhead of the hdfs commandline. Any other location that could use DATA LOAD LOCAL is also switched over to use it. Any location that already uses DATA LOAD LOCAL is also switched to indicate that it must run in Hive. Testing: Ran dataload and verified that functional_parquet.complextypes_fileformat has rows. Change-Id: I7152306b2907198204a6d8d282a0bad561129b82 --- M testdata/bin/create-load-data.sh M testdata/bin/generate-schema-statements.py M testdata/datasets/functional/functional_schema_template.sql 3 files changed, 141 insertions(+), 138 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/8350/2 -- To view, visit http://gerrit.cloudera.org:8080/8350 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7152306b2907198204a6d8d282a0bad561129b82 Gerrit-Change-Number: 8350 Gerrit-PatchSet: 2 Gerrit-Owner: Joe McDonnell
[Impala-ASF-CR] IMPALA-6068: Fix dataload for complextypes fileformat
Joe McDonnell has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8350 Change subject: IMPALA-6068: Fix dataload for complextypes_fileformat .. IMPALA-6068: Fix dataload for complextypes_fileformat Dataload typically follows a pattern of loading data into a text version of a table, and then using an insert overwrite from the text table to populate the table for other file formats. This insert is always done in Impala for Parquet and Kudu. Otherwise it runs in Hive. Since Impala doesn't support writing nested data, the population of complextypes_fileformat tries to hack the insert to run in Hive by including it in the ALTER part of the table definition. ALTER runs immediately after CREATE and always runs in Hive. The problem is that ALTER also runs before the base table (functional.complextypes_fileformat) is populated. The insert succeeds, but it is inserting zero rows. This code change introduces a way to force the Parquet load to run using Hive. This lets complextypes_fileformat specify that the insert should happen in Hive and fixes the ordering so that the table is populated correctly. This is also useful for loading custom Parquet files into Parquet tables. Hive supports the DATA LOAD LOCAL syntax, which can read a file from the local filesystem. This means that several locations that currently use the hdfs commandline can be modified to use this SQL. This change speeds up dataload by a few minutes, as it avoids the overhead of the hdfs commandline. Any other location that could use DATA LOAD LOCAL is also switched over to use it. Any location that already uses DATA LOAD LOCAL is also switched to indicate that it must run in Hive. Testing: Ran dataload and verified that functional_parquet.complextypes_fileformat has rows. Change-Id: I7152306b2907198204a6d8d282a0bad561129b82 --- M testdata/bin/create-load-data.sh M testdata/bin/generate-schema-statements.py M testdata/datasets/functional/functional_schema_template.sql 3 files changed, 133 insertions(+), 130 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/8350/1 -- To view, visit http://gerrit.cloudera.org:8080/8350 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I7152306b2907198204a6d8d282a0bad561129b82 Gerrit-Change-Number: 8350 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell
[Impala-ASF-CR] IMPALA-1575: Part 1: eagerly release query exec resources
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8303 ) Change subject: IMPALA-1575: Part 1: eagerly release query exec resources .. Patch Set 9: (1 comment) Making my way through this. http://gerrit.cloudera.org:8080/#/c/8303/9/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/8303/9/be/src/runtime/coordinator.cc@842 PS9, Line 842: if (!needs_finalization_ && !status.ok()) return status; Just checking my own understanding of this. If this case triggers, the resource refcnt should be released in CancelInternal(). Is that right? -- To view, visit http://gerrit.cloudera.org:8080/8303 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I41ff374b0403f10a145f7fee9b3145953ee32341 Gerrit-Change-Number: 8303 Gerrit-PatchSet: 9 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Joe McDonnell Gerrit-Comment-Date: Fri, 20 Oct 2017 17:33:22 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6070: Parallel data load.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8320 ) Change subject: IMPALA-6070: Parallel data load. .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/create-load-data.sh File testdata/bin/create-load-data.sh: http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/create-load-data.sh@505 PS1, Line 505: # Tests depend on the kudu data being clean, so load the data from scratch. : run-step "Loading Kudu functional" load-kudu.log \ : load-data "functional-query" "core" "kudu/none/none" force : run-step "Loading Kudu TPCH" load-kudu-tpch.log \ : load-data "tpch" "core" "kudu/none/none" force It should be possible to do the same thing for these. That will only save about 4 minutes, but this runs even when loading from a snapshot. -- To view, visit http://gerrit.cloudera.org:8080/8320 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I836c4e1586f229621c102c4f4ba22ce7224ab9ac Gerrit-Change-Number: 8320 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Wed, 18 Oct 2017 23:16:25 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8294 ) Change subject: IMPALA-6067: Enable s3 access via IAM roles for EC2 VMs .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/8294/1/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java: http://gerrit.cloudera.org:8080/#/c/8294/1/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@3000 PS1, Line 3000: AnalysisError(String.format("load data inpath '%s' %s into table " + : "tpch.lineitem", "s3a://bucket/test-warehouse/test.out", overwrite), : "INPATH location 's3a://bucket/test-warehouse/test.out' must point to an " + : "HDFS, S3A or ADL filesystem."); Impala cannot load data from s3n. I think this test is intended to verify our error message when given an s3n path, so I don't think an s3a path will work here. The source of the error is LoadDataStmt::analyzePaths(). -- To view, visit http://gerrit.cloudera.org:8080/8294 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I14cd9d4453a91baad3c379aa7e4944993fca95ae Gerrit-Change-Number: 8294 Gerrit-PatchSet: 1 Gerrit-Owner: Laszlo Gaal Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Tue, 17 Oct 2017 23:30:26 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8034 ) Change subject: IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder .. Patch Set 8: (8 comments) http://gerrit.cloudera.org:8080/#/c/8034/8/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/8034/8/be/src/exec/hdfs-parquet-scanner.cc@274 PS8, Line 274: // Before closing the Column readers, the accounted bytes in mem_tracker for : // dict_decoder_ is released. : if (mem_tracker_ != nullptr) { : for (ParquetColumnReader* col_reader : column_readers_) col_reader->Cleanup(); : } I think this code won't be necessary if col_reader->Close() does the dictionary close. http://gerrit.cloudera.org:8080/#/c/8034/8/be/src/exec/hdfs-parquet-scanner.cc@348 PS8, Line 348: CloseAndUnregisterFromParent Same question here as elsewhere: Do we really need CloseAndUnregisterFromParent() or will Close() do? http://gerrit.cloudera.org:8080/#/c/8034/8/be/src/exec/hdfs-parquet-table-writer.cc File be/src/exec/hdfs-parquet-table-writer.cc: http://gerrit.cloudera.org:8080/#/c/8034/8/be/src/exec/hdfs-parquet-table-writer.cc@178 PS8, Line 178: dict_encoder_base_->ClearIndices(); This should call the new base Close() function (which should do ClearIndices()). http://gerrit.cloudera.org:8080/#/c/8034/8/be/src/exec/hdfs-parquet-table-writer.cc@1057 PS8, Line 1057: CloseAndUnregisterFromParent(); I think we want to limit use of CloseAndUnregisterFromParent() to cases that really need it. I think the ordinary Close() should work here. Check if that works. http://gerrit.cloudera.org:8080/#/c/8034/8/be/src/exec/parquet-column-readers.h File be/src/exec/parquet-column-readers.h: http://gerrit.cloudera.org:8080/#/c/8034/8/be/src/exec/parquet-column-readers.h@234 PS8, Line 234: /// Delete the bytes used for memory tracking. : virtual void Cleanup() {} I think this can be folded into Close() and removed. http://gerrit.cloudera.org:8080/#/c/8034/8/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/8034/8/be/src/exec/parquet-column-readers.cc@269 PS8, Line 269: virtual void Cleanup() { : dict_decoder_.Close(); : } I think this can be folded into Close() without a separate Cleanup() function. http://gerrit.cloudera.org:8080/#/c/8034/8/be/src/util/dict-encoding.h File be/src/util/dict-encoding.h: http://gerrit.cloudera.org:8080/#/c/8034/8/be/src/util/dict-encoding.h@107 PS8, Line 107: DictEncoder(MemPool* pool, int encoded_value_size, MemTracker* mem_tracker) : : DictEncoderBase(pool), buckets_(HASH_TABLE_SIZE, Node::INVALID_INDEX), : encoded_value_size_(encoded_value_size), : dict_mem_tracker_(mem_tracker), : dict_bytes_cnt_(0) { } As I understand it, the DictEncoderBase/DictEncoder split is that DictEncoderBase contains everything that does not rely on T and DictEncoder contains the type-specific stuff. The same split applies for DictDecoderBase/DictDecoder. Since the memory tracking is not type specific, I think it makes sense to put it in the base classes. This should simplify the code in HdfsParquetWriter, because then you can call Close() on the DictEncoderBase rather than having to go to the typed version. Close() on DictEncoderBase should also do ClearIndices(). http://gerrit.cloudera.org:8080/#/c/8034/8/be/src/util/dict-encoding.h@251 PS8, Line 251: ReleaseBytes(); Is there any codepath that should legitimately use this? If not, DCHECK that dict_bytes_cnt_ is zero. -- To view, visit http://gerrit.cloudera.org:8080/8034 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299 Gerrit-Change-Number: 8034 Gerrit-PatchSet: 8 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Pranay Singh Gerrit-Comment-Date: Thu, 12 Oct 2017 17:23:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4623: [DOCS] Document file handle caching
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8200 ) Change subject: IMPALA-4623: [DOCS] Document file handle caching .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8200 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I261c29eff80dc376528bba29ffb7d8e0f895e25f Gerrit-Change-Number: 8200 Gerrit-PatchSet: 3 Gerrit-Owner: John Russell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: John Russell Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Comment-Date: Thu, 05 Oct 2017 23:38:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4623: [DOCS] Document file handle caching
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8200 ) Change subject: IMPALA-4623: [DOCS] Document file handle caching .. Patch Set 3: Code-Review+1 This looks right to me. Are we expecting any other reviewers? -- To view, visit http://gerrit.cloudera.org:8080/8200 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I261c29eff80dc376528bba29ffb7d8e0f895e25f Gerrit-Change-Number: 8200 Gerrit-PatchSet: 3 Gerrit-Owner: John Russell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: John Russell Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Comment-Date: Thu, 05 Oct 2017 23:22:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4623: [DOCS] Document file handle caching
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8200 ) Change subject: IMPALA-4623: [DOCS] Document file handle caching .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/8200/1/docs/topics/impala_scalability.xml File docs/topics/impala_scalability.xml: http://gerrit.cloudera.org:8080/#/c/8200/1/docs/topics/impala_scalability.xml@967 PS1, Line 967: although the encryption layer : adds overhead that might lessen the benefit of the caching. > I had written in the notes from our conversation HDFS encryption adds overh The closest thing I can think of as being related to this is IMPALA-5909, where a problem with HDFS code causes excessive logging. However, I don't think that matches this statement, and it is a bug. If the sole source for this is me, then I don't think it should be included. I don't remember what I said originally (and it may have been ambiguous), but I don't currently think this statement is true. -- To view, visit http://gerrit.cloudera.org:8080/8200 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I261c29eff80dc376528bba29ffb7d8e0f895e25f Gerrit-Change-Number: 8200 Gerrit-PatchSet: 2 Gerrit-Owner: John Russell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: John Russell Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Comment-Date: Thu, 05 Oct 2017 22:44:25 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4623: [DOCS] Document file handle caching
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8200 ) Change subject: IMPALA-4623: [DOCS] Document file handle caching .. Patch Set 1: (5 comments) http://gerrit.cloudera.org:8080/#/c/8200/1/docs/topics/impala_known_issues.xml File docs/topics/impala_known_issues.xml: http://gerrit.cloudera.org:8080/#/c/8200/1/docs/topics/impala_known_issues.xml@338 PS1, Line 338: continuously appended by an HDFS mechanism This also applies if an HDFS file is overwritten in place. http://gerrit.cloudera.org:8080/#/c/8200/1/docs/topics/impala_scalability.xml File docs/topics/impala_scalability.xml: http://gerrit.cloudera.org:8080/#/c/8200/1/docs/topics/impala_scalability.xml@967 PS1, Line 967: although the encryption layer : adds overhead that might lessen the benefit of the caching. I'm not familiar with this overhead. What is this referring to? http://gerrit.cloudera.org:8080/#/c/8200/1/docs/topics/impala_scalability.xml@973 PS1, Line 973: 20 thousand Just curious: How do you decide to use "20 thousand" vs "20,000"? http://gerrit.cloudera.org:8080/#/c/8200/1/docs/topics/impala_scalability.xml@991 PS1, Line 991: evict any stale file handles from the cache The file handles won't actually be evicted directly. The new metadata will mean that new statements will no longer use that file handle and eventually it will get aged out. I'm not sure if this distinction is important for documentation, but I think the important thing is that the memory may not be freed immediately. (This is something we are likely to change in a future release.) http://gerrit.cloudera.org:8080/#/c/8200/1/docs/topics/impala_scalability.xml@995 PS1, Line 995: To evaluate the effectiveness of file handle caching for a particular workload, issue the : PROFILE statement in impala-shell or examine query : profiles in the Impala web UI. Look for the ratio of CachedFileHandlesHitCount : (ideally, should be high) to CachedFileHandlesMissCount (ideally, should be low). : Before starting any evaluation, run some representative queries to warm up the cache, : because the first time each data file is accessed is always recorded as a cache miss. I'm not sure this belongs here, but information about the cache across the whole impalad is available via the metrics page under impala-server: impala-server.io.mgr.cached-file-handles-miss-count impala-server.io.mgr.cached-file-handles-hit-count The total number of file handles in the cache is: impala-server.io.mgr.num-cached-file-handles -- To view, visit http://gerrit.cloudera.org:8080/8200 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I261c29eff80dc376528bba29ffb7d8e0f895e25f Gerrit-Change-Number: 8200 Gerrit-PatchSet: 1 Gerrit-Owner: John Russell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Comment-Date: Thu, 05 Oct 2017 02:37:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5525 Extend TestScannersFuzzing to test uncompressed parquet
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8056 ) Change subject: IMPALA-5525 Extend TestScannersFuzzing to test uncompressed parquet .. Patch Set 5: (2 comments) I'm close to a +1 on this. Only minor thoughts left for me. I just invited Tim to the review. http://gerrit.cloudera.org:8080/#/c/8056/5/tests/query_test/test_scanners_fuzz.py File tests/query_test/test_scanners_fuzz.py: http://gerrit.cloudera.org:8080/#/c/8056/5/tests/query_test/test_scanners_fuzz.py@103 PS5, Line 103: with no compression. Nit: I would emphasize that this has a new table. Something like: "into a new table with no compression" http://gerrit.cloudera.org:8080/#/c/8056/5/tests/query_test/test_scanners_fuzz.py@110 PS5, Line 110: for orig_tbl_name in tbl_list: : src_table_name = "parquet_uncomp_src_" + orig_tbl_name : dst_table_name = "parquet_uncomp_dst_" + orig_tbl_name Nit: I think I would prefer to have "dst_table_name" be "fuzz_table_name". Also, in other code, I think it would be good to be explicit about this being the fuzz_db and fuzz_table rather than dst_db and dst_table. -- To view, visit http://gerrit.cloudera.org:8080/8056 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I760de7203a51cf82b16016fa8043cadc7c8325bc Gerrit-Change-Number: 8056 Gerrit-PatchSet: 5 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Pranay Singh Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 02 Oct 2017 22:05:50 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8034 ) Change subject: IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/8034/3/be/src/util/dict-encoding.h File be/src/util/dict-encoding.h: http://gerrit.cloudera.org:8080/#/c/8034/3/be/src/util/dict-encoding.h@412 PS3, Line 412: ConsumeBytes(sizeof(value)); > using num_values * size of type might not work when dealing with variable s For strings, the StringValue is a pointer and a length. In the case of the dictionary, the pointer is pointing into the dictionary (allocated from dictionary_pool_ in InitDictionary()). Since the memory for the dictionary page is already tracked, we only need to track these StringValue's, which are fixed-size. Either way, we should aggregate the Consume() calls here. -- To view, visit http://gerrit.cloudera.org:8080/8034 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299 Gerrit-Change-Number: 8034 Gerrit-PatchSet: 3 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Pranay Singh Gerrit-Comment-Date: Mon, 02 Oct 2017 21:48:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8034 ) Change subject: IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder .. Patch Set 3: (5 comments) http://gerrit.cloudera.org:8080/#/c/8034/3/be/src/exec/data-sink.h File be/src/exec/data-sink.h: http://gerrit.cloudera.org:8080/#/c/8034/3/be/src/exec/data-sink.h@126 PS3, Line 126: /// This memtracker tracks all the allocations done by parquet dictonary encoder. : boost::scoped_ptr dict_mem_tracker_; Can we move this down to HdfsParquetTableWriter? http://gerrit.cloudera.org:8080/#/c/8034/3/be/src/exec/exec-node.h File be/src/exec/exec-node.h: http://gerrit.cloudera.org:8080/#/c/8034/3/be/src/exec/exec-node.h@325 PS3, Line 325: /// Tracks all the memory allocation by parquet dictonary decoders : boost::scoped_ptr dict_mem_tracker_; Can this move down to HdfsParquetScanner? Other ExecNodes don't need this. http://gerrit.cloudera.org:8080/#/c/8034/3/be/src/util/dict-encoding.h File be/src/util/dict-encoding.h: http://gerrit.cloudera.org:8080/#/c/8034/3/be/src/util/dict-encoding.h@113 PS3, Line 113: Remove trailing spaces. This also applies to lines that don't have any code (there are a couple others in this file). These are easily visible in the Gerrit UI. If using emacs, the following will highlight trailing whitespace: (setq-default show-trailing-whitespace t) http://gerrit.cloudera.org:8080/#/c/8034/3/be/src/util/dict-encoding.h@164 PS3, Line 164: mtrackp For class fields, the convention is to end the field name with an underscore. Also, look at other code that uses MemTrackers and see what variable name that code uses. It is good to harmonize these things across the codebase. http://gerrit.cloudera.org:8080/#/c/8034/3/be/src/util/dict-encoding.h@412 PS3, Line 412: ConsumeBytes(sizeof(value)); The parquet DictionaryPageHeader contains a num_values field. Look where we call CreateDictionaryDecoder and you'll see that we reference to it on the next line. I think we should consider doing a single ConsumeBytes using num_values * size of type. -- To view, visit http://gerrit.cloudera.org:8080/8034 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299 Gerrit-Change-Number: 8034 Gerrit-PatchSet: 3 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Pranay Singh Gerrit-Comment-Date: Thu, 28 Sep 2017 16:55:24 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5975: Work around broken beeline clients
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8132 ) Change subject: IMPALA-5975: Work around broken beeline clients .. Patch Set 2: Code-Review+2 Carrying +2 -- To view, visit http://gerrit.cloudera.org:8080/8132 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id8b9f3dde4445513f1f389785a002c6cc6b3dada Gerrit-Change-Number: 8132 Gerrit-PatchSet: 2 Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Tue, 26 Sep 2017 23:25:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5525 Extend TestScannersFuzzing to test uncompressed parquet
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8056 ) Change subject: IMPALA-5525 Extend TestScannersFuzzing to test uncompressed parquet .. Patch Set 4: (4 comments) I like the overall approach. I have a few small naming/style issues, but I think this is getting closer. http://gerrit.cloudera.org:8080/#/c/8056/4/tests/query_test/test_scanners_fuzz.py File tests/query_test/test_scanners_fuzz.py: http://gerrit.cloudera.org:8080/#/c/8056/4/tests/query_test/test_scanners_fuzz.py@101 PS4, Line 101: """ Clone an existing parquet table with codec as none in the : unique database. This cloned table is passed to run_fuzz_test : which clones the table and corrupts the table. The test later : checks that there is no crash while performing SQL queries on : a corrupt table. : """ I think this comment should focus on why this test is different from the others. For example, it should explain that the parquet tables in the default schema are always compressed. So, in order to test uncompressed parquet, we need to create a new source table. I think you can skip the last two sentences. http://gerrit.cloudera.org:8080/#/c/8056/4/tests/query_test/test_scanners_fuzz.py@111 PS4, Line 111: db_name = unique_database I would prefer to emphasize that the source and destination are the unique_database. To make that clearer, I think I would get rid of this variable and just use unique_database directly in each location. http://gerrit.cloudera.org:8080/#/c/8056/4/tests/query_test/test_scanners_fuzz.py@117 PS4, Line 117: functional_parquet.alltypes Can we extend this to do fuzzing on decimal_tbl as well? I was thinking this could be a loop that runs fuzzing over a list of tables (that happens to have two entries). http://gerrit.cloudera.org:8080/#/c/8056/4/tests/query_test/test_scanners_fuzz.py@118 PS4, Line 118: .format(fq_tbl_name)) This indentation is a bit awkward. I don't think .format should be on its own line. One way to get around this is to use only 4 space indentation for the second line (" select"...) and then put the .format on that line. -- To view, visit http://gerrit.cloudera.org:8080/8056 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I760de7203a51cf82b16016fa8043cadc7c8325bc Gerrit-Change-Number: 8056 Gerrit-PatchSet: 4 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Pranay Singh Gerrit-Comment-Date: Mon, 25 Sep 2017 23:25:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5174: Suppress kudu flags that aren't relevant to Impala
Joe McDonnell has posted comments on this change. Change subject: IMPALA-5174: Suppress kudu flags that aren't relevant to Impala .. Patch Set 1: Code-Review+1 I think this makes sense. At some point, should we add a test that tracks when parameters are added or removed from the UI? It seems like something that could happen inadvertently. It would add some overhead to adding/removing params. -- To view, visit http://gerrit.cloudera.org:8080/8074 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I499c903cde92595c4a02803ecbf98ac1d41517b4 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Joe McDonnell Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5941: Fix Metastore schema creation in create-test-configuration.sh
Joe McDonnell has posted comments on this change. Change subject: IMPALA-5941: Fix Metastore schema creation in create-test-configuration.sh .. Patch Set 3: Ran full dataload + tests and everything worked. I think this is ready to merge. -- To view, visit http://gerrit.cloudera.org:8080/8081 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic312df4597c7d211d4ecd551d572f751aea0cd24 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5525 Extend TestScannersFuzzing to test uncompressed parquet
Joe McDonnell has posted comments on this change. Change subject: IMPALA-5525 Extend TestScannersFuzzing to test uncompressed parquet .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/8056/2/tests/query_test/test_scanners_fuzz.py File tests/query_test/test_scanners_fuzz.py: PS2, Line 97: """ > I'll use the functional-parquet to create the cloned table which will be pa One thing that I failed to mention clearly enough is that the uncompressed parquet table should not be created in the default schemas. Instead, it should be created in unique_database. This is why the function signature of run_fuzz_test will change. The normal tests will pass in table_format and a table from the default schema. Your new test will pass in unique_database and the table you created. -- To view, visit http://gerrit.cloudera.org:8080/8056 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I760de7203a51cf82b16016fa8043cadc7c8325bc Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Pranay Singh Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5525 Extend TestScannersFuzzing to test uncompressed parquet
Joe McDonnell has posted comments on this change. Change subject: IMPALA-5525 Extend TestScannersFuzzing to test uncompressed parquet .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/8056/2/tests/query_test/test_scanners_fuzz.py File tests/query_test/test_scanners_fuzz.py: PS2, Line 97: fq_tbl_name = "functional_parquet" + "." + tbl_name > Do not create a table in the default schemas. Instead, with the changes I d Another option is to keep the clone code in run_fuzz_test, but change the arguments so that it specifies a source database, source table, destination database, and destination table. The existing code would simply pass in the appropriate existing table. The uncompressed parquet code would create an uncompressed parquet table and pass that in. -- To view, visit http://gerrit.cloudera.org:8080/8056 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I760de7203a51cf82b16016fa8043cadc7c8325bc Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Pranay Singh Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5941: Fix Metastore schema creation in create-test-configuration.sh
Joe McDonnell has posted comments on this change. Change subject: IMPALA-5941: Fix Metastore schema creation in create-test-configuration.sh .. Patch Set 1: (1 comment) We have had it this way for a very long time, so I'm kicking off some Jenkins jobs to test data load. http://gerrit.cloudera.org:8080/#/c/8081/1/bin/create-test-configuration.sh File bin/create-test-configuration.sh: Line 99:-f ${HIVE_HOME}/scripts/metastore/upgrade/postgres/hive-schema-1.1.0.postgres.sql > Very minor comment: it might be simpler to use a relative path here and avo Fixed, that is definitely cleaner. I tested it on my box and it seems to work. -- To view, visit http://gerrit.cloudera.org:8080/8081 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic312df4597c7d211d4ecd551d572f751aea0cd24 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5941: Fix Metastore schema creation in create-test-configuration.sh
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8081 to look at the new patch set (#2). Change subject: IMPALA-5941: Fix Metastore schema creation in create-test-configuration.sh .. IMPALA-5941: Fix Metastore schema creation in create-test-configuration.sh The Hive Metastore schema script includes other SQL scripts using \i, which expects absolute paths. Since we currently invoke it from outside the schema script directory, it is unable to find those included scripts. The fix is to switch to the Hive Metastore script directory when invoking the schema script. Change-Id: Ic312df4597c7d211d4ecd551d572f751aea0cd24 --- M bin/create-test-configuration.sh 1 file changed, 5 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/81/8081/2 -- To view, visit http://gerrit.cloudera.org:8080/8081 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic312df4597c7d211d4ecd551d572f751aea0cd24 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5525 Extend TestScannersFuzzing to test uncompressed parquet
Joe McDonnell has posted comments on this change. Change subject: IMPALA-5525 Extend TestScannersFuzzing to test uncompressed parquet .. Patch Set 2: (2 comments) Take a look at these suggestions and let me know if they make sense. http://gerrit.cloudera.org:8080/#/c/8056/2/tests/query_test/test_scanners_fuzz.py File tests/query_test/test_scanners_fuzz.py: PS2, Line 97: fq_tbl_name = "functional_parquet" + "." + tbl_name Do not create a table in the default schemas. Instead, with the changes I described elsewhere, create a separate clone function that will create an uncompressed parquet table directly in unique_database and then pass that into run_fuzz_test. The existing tests will run a simple table clone function that is equivalent to the current code. PS2, Line 129: self.execute_query("create table %s.%s like %s" % (unique_database, table, table)) : fuzz_table_location = get_fs_path("/test-warehouse/{0}.db/{1}".format( : unique_database, table)) Pull this logic out into its own function e.g. clone_table. run_fuzz_test should take in a table that has already been created in unique_database. It should not do the clone itself. This allows you to use a different clone function to create a parquet table without compression. -- To view, visit http://gerrit.cloudera.org:8080/8056 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I760de7203a51cf82b16016fa8043cadc7c8325bc Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Pranay Singh Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder
Joe McDonnell has posted comments on this change. Change subject: IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/8034/2/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: PS2, Line 213: dict_decoder_(parent_->dictionary_pool_.get()), We should think about having a separate MemPool for parts of the dictionary that will not be referenced by the RowBatch and can be freed at the end of the row group. Since some datatypes are copied by value rather than pointing into the dictionary, this might be used for those as well. http://gerrit.cloudera.org:8080/#/c/8034/2/be/src/util/dict-encoding.h File be/src/util/dict-encoding.h: PS2, Line 229: memcpy(val_ptr, dict_[index], sizeof(T)); The memcpy is doing a dereference here and is not different from *val_ptr = *dict_[index]; PS2, Line 238: std::vector dict_; Using a T* means that GetValue() must do a dereference to obtain the value. This is a very performance sensitive codepath, so we need to avoid this extra dereference. This should maintain the value directly in the vector. -- To view, visit http://gerrit.cloudera.org:8080/8034 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Pranay Singh Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5941: Fix Metastore schema creation in create-test-configuration.sh
Joe McDonnell has uploaded a new change for review. http://gerrit.cloudera.org:8080/8081 Change subject: IMPALA-5941: Fix Metastore schema creation in create-test-configuration.sh .. IMPALA-5941: Fix Metastore schema creation in create-test-configuration.sh The Hive Metastore schema script includes other SQL scripts using \i, which expects absolute paths. Since we currently invoke it from outside the schema script directory, it is unable to find those included scripts. The fix is to switch to the Hive Metastore script directory when invoking the schema script. Change-Id: Ic312df4597c7d211d4ecd551d572f751aea0cd24 --- M bin/create-test-configuration.sh 1 file changed, 4 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/81/8081/1 -- To view, visit http://gerrit.cloudera.org:8080/8081 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ic312df4597c7d211d4ecd551d572f751aea0cd24 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell
[Impala-ASF-CR] IMPALA-5525 Extend TestScannersFuzzing to test uncompressed parquet
Joe McDonnell has posted comments on this change. Change subject: IMPALA-5525 Extend TestScannersFuzzing to test uncompressed parquet .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/8056/1/tests/query_test/test_scanners_fuzz.py File tests/query_test/test_scanners_fuzz.py: Line 68 Avoid unrelated whitespace diffs. One way of getting a graphical diff that can help with this is to use the tool meld. For example: git difftool -y -t meld Where could be asf-gerrit/master or origin/master or whatnot. PS1, Line 96: fq_tbl_name = "functional_parquet" + "." + tbl_name I'm wary of creating tables in our default schemas. This won't get cleaned up, and it is subtle behavior. If we can create the new table in the unique_database that would be nice PS1, Line 98: create = ("create table {0} stored as parquet as select * from functional.alltypes" : .format(fq_tbl_name)) I think we need to verify that the right options are being set when we create this table. As I understand it, you need to specify the query option compression_codec = none to create a parquet file without compression. https://www.cloudera.com/documentation/enterprise/5-8-x/topics/impala_compression_codec.html -- To view, visit http://gerrit.cloudera.org:8080/8056 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I760de7203a51cf82b16016fa8043cadc7c8325bc Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Joe McDonnell Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1649 Pass precision to Print TCounterType::DOUBLE VALUE
Joe McDonnell has posted comments on this change. Change subject: IMPALA-1649 Pass precision to Print TCounterType::DOUBLE_VALUE .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/7725/2/be/src/util/pretty-printer.h File be/src/util/pretty-printer.h: PS2, Line 142: case TUnit::DOUBLE_VALUE: { : double output = static_cast(value); : ss << std::setprecision(precision) << output << " "; I think TUnit::DOUBLE_VALUE is so similar to TUnit::UNIT that we should be looking to remove uses of DOUBLE_VALUE and allow UNIT to specify a precision. -- To view, visit http://gerrit.cloudera.org:8080/7725 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icc63d8a59bdf175341097df143798fb1a957d93f Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Joe McDonnell Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder
Joe McDonnell has posted comments on this change. Change subject: IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/8034/1/be/src/exec/exec-node.h File be/src/exec/exec-node.h: PS1, Line 215: MemTracker* decoder_mem_tracker() { return decoder_mem_tracker_.get(); } The dictionary is only used for Parquet, so I think the MemPool should not be in the ExecNode, as that is used for a large number of other things. Look into moving this down to HdfsParquetScanner (or reuse dictionary_pool_). http://gerrit.cloudera.org:8080/#/c/8034/1/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: PS1, Line 213: dict_decoder_(new MemPool(parent->scan_node_->decoder_mem_tracker())), It is important not to create objects on a per-column basis unless truly necessary. I think it makes more sense to have a single MemPool up at the HdfsParquetScanner level. Look at how dictionary_pool_ works. I think it might be better to reuse that pool. http://gerrit.cloudera.org:8080/#/c/8034/1/be/src/util/dict-encoding.h File be/src/util/dict-encoding.h: PS1, Line 234: *val_ptr = *dict_[index]; dict_ needs to return T's directly rather than T*'s. This is a performance critical path, and an extra dereference is too expensive. -- To view, visit http://gerrit.cloudera.org:8080/8034 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Joe McDonnell Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5750: Catch exceptions from boost thread creation
Hello Sailesh Mukil, Dan Hecht, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7730 to look at the new patch set (#12). Change subject: IMPALA-5750: Catch exceptions from boost thread creation .. IMPALA-5750: Catch exceptions from boost thread creation The boost thread constructor will throw boost::thread_resource_error if it is unable to spawn a thread on the system (e.g. due to a ulimit). This uncaught exception crashes Impala. Systems with a large number of nodes and threads are hitting this limit. This change catches the exception from the thread constructor and converts it to a Status. This requires several changes: 1. util/thread.h's Thread constructor is now private and all Threads are constructed via a new Create() static factory method. 2. util/thread-pool.h's ThreadPool requires that Init() be called after the ThreadPool is constructed. 3. To propagate the Status, Threads cannot be created in constructors, so this is moved to initialization methods that can return Status. 4. Threads now use unique_ptr's for management in all cases. Threads cannot be used as stack-allocated local variables or direct declarations in classes. Query execution code paths will now handle the error: 1. If the scan node fails to spawn any scanner thread, it will abort the query. 2. Failing to spawn a fragment instance from the query state in StartFInstances() will correctly report the error to the coordinator and tear down the query. Testing: This introduces the parameter thread_creation_fault_injection, which will cause Thread::Create() calls in eligible locations to fail randomly roughly 1% of the time. Quite a few locations of Thread::Create() and ThreadPool::Init() are necessary for startup and cannot be eligible. However, all the locations used for query execution are marked as eligible and governed by this parameter. The code was tested by setting this parameter to true and running queries to verify that queries either run to completion with the correct result or fail with appropriate status. Change-Id: I15a2f278dc71892b7fec09593f81b1a57ab725c0 --- M be/src/benchmarks/thread-create-benchmark.cc M be/src/catalog/catalog-server.cc M be/src/catalog/catalog-server.h M be/src/catalog/catalogd-main.cc M be/src/common/global-flags.cc M be/src/common/init.cc M be/src/common/init.h M be/src/exec/blocking-join-node.cc M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scan-node.h M be/src/exec/kudu-scan-node.cc M be/src/exec/kudu-scan-node.h M be/src/rpc/TAcceptQueueServer.cpp M be/src/rpc/auth-provider.h M be/src/rpc/authentication.cc M be/src/rpc/thrift-server-test.cc M be/src/rpc/thrift-server.cc M be/src/rpc/thrift-server.h M be/src/rpc/thrift-thread.cc M be/src/rpc/thrift-thread.h M be/src/runtime/data-stream-sender.cc M be/src/runtime/disk-io-mgr-handle-cache.h M be/src/runtime/disk-io-mgr-handle-cache.inline.h M be/src/runtime/disk-io-mgr.cc M be/src/runtime/exec-env.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/parallel-executor.cc M be/src/runtime/query-exec-mgr.cc M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/thread-resource-mgr.h M be/src/scheduling/admission-controller.cc M be/src/scheduling/admission-controller.h M be/src/service/child-query.cc M be/src/service/child-query.h M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M be/src/service/impala-beeswax-server.cc M be/src/service/impala-hs2-server.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h M be/src/service/impalad-main.cc M be/src/statestore/statestore-subscriber.cc M be/src/statestore/statestore-subscriber.h M be/src/statestore/statestore.cc M be/src/statestore/statestore.h M be/src/statestore/statestored-main.cc M be/src/testutil/in-process-servers.cc M be/src/testutil/in-process-servers.h M be/src/util/thread-pool-test.cc M be/src/util/thread-pool.h M be/src/util/thread.cc M be/src/util/thread.h M common/thrift/generate_error_codes.py 55 files changed, 455 insertions(+), 215 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/7730/12 -- To view, visit http://gerrit.cloudera.org:8080/7730 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I15a2f278dc71892b7fec09593f81b1a57ab725c0 Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5750: Catch exceptions from boost thread creation
Joe McDonnell has posted comments on this change. Change subject: IMPALA-5750: Catch exceptions from boost thread creation .. Patch Set 11: (1 comment) Rebased all the way and incorporated changes from IMPALA-5892. http://gerrit.cloudera.org:8080/#/c/7730/11/be/src/exec/hdfs-scan-node.cc File be/src/exec/hdfs-scan-node.cc: PS11, Line 361: This > The "this" was confusing the first time I read it (I thought it was referri Done -- To view, visit http://gerrit.cloudera.org:8080/7730 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I15a2f278dc71892b7fec09593f81b1a57ab725c0 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5888: free other local allocations in Parquet
Joe McDonnell has posted comments on this change. Change subject: IMPALA-5888: free other local allocations in Parquet .. Patch Set 2: Code-Review+1 This makes sense to me. -- To view, visit http://gerrit.cloudera.org:8080/7933 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7792552510b54aa95044e44218e3351a36d6f9a8 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5892: Allow reporting status independent of fragment instance
Hello Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7943 to look at the new patch set (#3). Change subject: IMPALA-5892: Allow reporting status independent of fragment instance .. IMPALA-5892: Allow reporting status independent of fragment instance Queries can hit an error that is not specific to a particular fragment instance. For example, QueryState::StartFInstances() calls DescriptorTbl::Create() before any fragment instances start. This location has no reason to report status via a particular fragment, and there is currently no way to report status otherwise. This leads to a query hang, because the status is never propagated back to the coordinator. This adds the ability to report status that is not associated with a particular fragment instance. By reporting status, the coordinator will now correctly abort the query in the case of the QueryState::StartFInstances() scenario described. Change-Id: I4cd98022f1d62a999c7c80ff5474fa8d069eb12c --- M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/coordinator-backend-state.h M be/src/runtime/coordinator.cc M be/src/runtime/coordinator.h M be/src/runtime/query-state.cc M common/thrift/ImpalaInternalService.thrift 6 files changed, 73 insertions(+), 22 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/7943/3 -- To view, visit http://gerrit.cloudera.org:8080/7943 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4cd98022f1d62a999c7c80ff5474fa8d069eb12c Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger
[Impala-ASF-CR] IMPALA-5892: Allow reporting status independent of fragment instance
Joe McDonnell has posted comments on this change. Change subject: IMPALA-5892: Allow reporting status independent of fragment instance .. Patch Set 2: (3 comments) Also rebased all the way. http://gerrit.cloudera.org:8080/#/c/7943/2/be/src/runtime/coordinator-backend-state.cc File be/src/runtime/coordinator-backend-state.cc: Line 221: DCHECK(requests_fragment_detail || omits_fragment_args); > how about just: Done http://gerrit.cloudera.org:8080/#/c/7943/2/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: Line 873: // in that case). Coordinator fragment failed in this case so we log the query_id. > not your change (and doesn't have to be addressed now) but this comment doe I think the first sentence makes some sense. If there is already an error in query_status_, then UpdateStatus() will return that rather than the status passed in. If a fragment hit an error and called CancelInternal(), that would call ReleaseResources(), which closes the coord_sink_. That should cause GetNext() to return an error, but not the original error. The second sentence doesn't match anything that I can see. http://gerrit.cloudera.org:8080/#/c/7943/2/common/thrift/ImpalaInternalService.thrift File common/thrift/ImpalaInternalService.thrift: PS2, Line 613: =cdfklnoru > some random characters snuck in Done -- To view, visit http://gerrit.cloudera.org:8080/7943 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4cd98022f1d62a999c7c80ff5474fa8d069eb12c Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5750: Catch exceptions from boost thread creation
Hello Sailesh Mukil, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7730 to look at the new patch set (#11). Change subject: IMPALA-5750: Catch exceptions from boost thread creation .. IMPALA-5750: Catch exceptions from boost thread creation The boost thread constructor will throw boost::thread_resource_error if it is unable to spawn a thread on the system (e.g. due to a ulimit). This uncaught exception crashes Impala. Systems with a large number of nodes and threads are hitting this limit. This change catches the exception from the thread constructor and converts it to a Status. This requires several changes: 1. util/thread.h's Thread constructor is now private and all Threads are constructed via a new Create() static factory method. 2. util/thread-pool.h's ThreadPool requires that Init() be called after the ThreadPool is constructed. 3. To propagate the Status, Threads cannot be created in constructors, so this is moved to initialization methods that can return Status. 4. Threads now use unique_ptr's for management in all cases. Threads cannot be used as stack-allocated local variables or direct declarations in classes. Query execution code paths will now handle the error: 1. If the scan node fails to spawn any scanner thread, it will abort the query. 2. Failing to spawn a fragment instance from the query state in StartFInstances() will correctly report the error to the coordinator and tear down the query. Testing: This introduces the parameter thread_creation_fault_injection, which will cause Thread::Create() calls in eligible locations to fail randomly roughly 1% of the time. Quite a few locations of Thread::Create() and ThreadPool::Init() are necessary for startup and cannot be eligible. However, all the locations used for query execution are marked as eligible and governed by this parameter. The code was tested by setting this parameter to true and running queries to verify that queries either run to completion with the correct result or fail with appropriate status. Change-Id: I15a2f278dc71892b7fec09593f81b1a57ab725c0 --- M be/src/benchmarks/thread-create-benchmark.cc M be/src/catalog/catalog-server.cc M be/src/catalog/catalog-server.h M be/src/catalog/catalogd-main.cc M be/src/common/global-flags.cc M be/src/common/init.cc M be/src/common/init.h M be/src/exec/blocking-join-node.cc M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scan-node.h M be/src/exec/kudu-scan-node.cc M be/src/exec/kudu-scan-node.h M be/src/rpc/TAcceptQueueServer.cpp M be/src/rpc/auth-provider.h M be/src/rpc/authentication.cc M be/src/rpc/thrift-server-test.cc M be/src/rpc/thrift-server.cc M be/src/rpc/thrift-server.h M be/src/rpc/thrift-thread.cc M be/src/rpc/thrift-thread.h M be/src/runtime/data-stream-sender.cc M be/src/runtime/disk-io-mgr-handle-cache.h M be/src/runtime/disk-io-mgr-handle-cache.inline.h M be/src/runtime/disk-io-mgr.cc M be/src/runtime/exec-env.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/parallel-executor.cc M be/src/runtime/query-exec-mgr.cc M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/thread-resource-mgr.h M be/src/scheduling/admission-controller.cc M be/src/scheduling/admission-controller.h M be/src/service/child-query.cc M be/src/service/child-query.h M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M be/src/service/impala-beeswax-server.cc M be/src/service/impala-hs2-server.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h M be/src/service/impalad-main.cc M be/src/statestore/statestore-subscriber.cc M be/src/statestore/statestore-subscriber.h M be/src/statestore/statestore.cc M be/src/statestore/statestore.h M be/src/statestore/statestored-main.cc M be/src/testutil/in-process-servers.cc M be/src/testutil/in-process-servers.h M be/src/util/thread-pool-test.cc M be/src/util/thread-pool.h M be/src/util/thread.cc M be/src/util/thread.h M common/thrift/generate_error_codes.py 55 files changed, 491 insertions(+), 239 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/7730/11 -- To view, visit http://gerrit.cloudera.org:8080/7730 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I15a2f278dc71892b7fec09593f81b1a57ab725c0 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5750: Catch exceptions from boost thread creation
Joe McDonnell has posted comments on this change. Change subject: IMPALA-5750: Catch exceptions from boost thread creation .. Patch Set 10: (1 comment) Also rebased. http://gerrit.cloudera.org:8080/#/c/7730/8/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: PS8, Line 332: ery > My test shows that the fragments that are not on the coordinator node will Moved the call to ReportExecStatusAux to after all fragment instances are prepared and verified that this works. The code to do this is ugly (because it relies on calling ReportExecStatusAux using the failed fis), and it will require IMPALA-5892 to become cleaner (where we can pass fis=nullptr to ReportExecStatusAux). -- To view, visit http://gerrit.cloudera.org:8080/7730 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I15a2f278dc71892b7fec09593f81b1a57ab725c0 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5750: Catch exceptions from boost thread creation
Joe McDonnell has posted comments on this change. Change subject: IMPALA-5750: Catch exceptions from boost thread creation .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/7730/8/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: PS8, Line 332: true > Ah right. So I think the assumption is that the coordinator may not be able My test shows that the fragments that are not on the coordinator node will exit. (This test is the equivalent of instances_started=false with complete RPC failure.) However, if the message doesn't get to the coordinator, then the coordinator fragments will hang (seemingly indefinitely). I'm using a simple query, so it could be that this is not representative. -- To view, visit http://gerrit.cloudera.org:8080/7730 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I15a2f278dc71892b7fec09593f81b1a57ab725c0 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5750: Catch exceptions from boost thread creation
Joe McDonnell has posted comments on this change. Change subject: IMPALA-5750: Catch exceptions from boost thread creation .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/7730/8/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: PS8, Line 332: true > What's the consequence of using instances_started=false? Did you verify th instances_started only impacts the case when there are 3 repeated RPC failures to the coordinator. I will run a test to see what happens. -- To view, visit http://gerrit.cloudera.org:8080/7730 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I15a2f278dc71892b7fec09593f81b1a57ab725c0 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5750: Catch exceptions from boost thread creation
Hello Sailesh Mukil, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7730 to look at the new patch set (#10). Change subject: IMPALA-5750: Catch exceptions from boost thread creation .. IMPALA-5750: Catch exceptions from boost thread creation The boost thread constructor will throw boost::thread_resource_error if it is unable to spawn a thread on the system (e.g. due to a ulimit). This uncaught exception crashes Impala. Systems with a large number of nodes and threads are hitting this limit. This change catches the exception from the thread constructor and converts it to a Status. This requires several changes: 1. util/thread.h's Thread constructor is now private and all Threads are constructed via a new Create() static factory method. 2. util/thread-pool.h's ThreadPool requires that Init() be called after the ThreadPool is constructed. 3. To propagate the Status, Threads cannot be created in constructors, so this is moved to initialization methods that can return Status. 4. Threads now use unique_ptr's for management in all cases. Threads cannot be used as stack-allocated local variables or direct declarations in classes. Query execution code paths will now handle the error: 1. If the scan node fails to spawn any scanner thread, it will abort the query. 2. Failing to spawn a fragment instance from the query state in StartFInstances() will correctly report the error to the coordinator and tear down the query. Testing: This introduces the parameter thread_creation_fault_injection, which will cause Thread::Create() calls in eligible locations to fail randomly roughly 1% of the time. Quite a few locations of Thread::Create() and ThreadPool::Init() are necessary for startup and cannot be eligible. However, all the locations used for query execution are marked as eligible and governed by this parameter. The code was tested by setting this parameter to true and running queries to verify that queries either run to completion with the correct result or fail with appropriate status. Change-Id: I15a2f278dc71892b7fec09593f81b1a57ab725c0 --- M be/src/benchmarks/thread-create-benchmark.cc M be/src/catalog/catalog-server.cc M be/src/catalog/catalog-server.h M be/src/catalog/catalogd-main.cc M be/src/common/global-flags.cc M be/src/common/init.cc M be/src/common/init.h M be/src/exec/blocking-join-node.cc M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scan-node.h M be/src/exec/kudu-scan-node.cc M be/src/exec/kudu-scan-node.h M be/src/rpc/TAcceptQueueServer.cpp M be/src/rpc/auth-provider.h M be/src/rpc/authentication.cc M be/src/rpc/thrift-server-test.cc M be/src/rpc/thrift-server.cc M be/src/rpc/thrift-server.h M be/src/rpc/thrift-thread.cc M be/src/rpc/thrift-thread.h M be/src/runtime/data-stream-sender.cc M be/src/runtime/disk-io-mgr-handle-cache.h M be/src/runtime/disk-io-mgr-handle-cache.inline.h M be/src/runtime/disk-io-mgr.cc M be/src/runtime/exec-env.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/parallel-executor.cc M be/src/runtime/query-exec-mgr.cc M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/thread-resource-mgr.h M be/src/scheduling/admission-controller.cc M be/src/scheduling/admission-controller.h M be/src/service/child-query.cc M be/src/service/child-query.h M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M be/src/service/impala-beeswax-server.cc M be/src/service/impala-hs2-server.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h M be/src/service/impalad-main.cc M be/src/statestore/statestore-subscriber.cc M be/src/statestore/statestore-subscriber.h M be/src/statestore/statestore.cc M be/src/statestore/statestore.h M be/src/statestore/statestored-main.cc M be/src/testutil/in-process-servers.cc M be/src/testutil/in-process-servers.h M be/src/util/thread-pool-test.cc M be/src/util/thread-pool.h M be/src/util/thread.cc M be/src/util/thread.h M common/thrift/generate_error_codes.py 55 files changed, 480 insertions(+), 238 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/7730/10 -- To view, visit http://gerrit.cloudera.org:8080/7730 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I15a2f278dc71892b7fec09593f81b1a57ab725c0 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5750: Catch exceptions from boost thread creation
Joe McDonnell has posted comments on this change. Change subject: IMPALA-5750: Catch exceptions from boost thread creation .. Patch Set 8: (5 comments) http://gerrit.cloudera.org:8080/#/c/7730/8/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: Line 226: if (fis->IsPrepared()) { > should this check that the prepare status is ok? if Prepare() failed, then The first thing that Prepare() does is allocate the RuntimeState. It has a comment saying not to return before that. See https://github.com/apache/incubator-impala/blob/master/be/src/runtime/fragment-instance-state.cc#L119 We also check whether fis->profile() is null. So, I think this should be safe. When IMPALA-5892 goes in, I will just use fis==nullptr. That will make this part of the change unnecessary, as nothing needs to send status for an unprepared fragment instance. Line 328: ExecEnv::GetInstance()->query_exec_mgr()->ReleaseQueryState(this); > this is a little weird since ReleaseQueryState() can delete this. Except it Added a comment. PS8, Line 332: true > passing instances_started=true can lead to a deadlock if the RPC send fails Good point, changed this to instances_started=false. If the RPC to the coordinator fails, I think there are other ways to hang, but this is clearly wrong. I'll think about whether there is a better way to handle this case. PS8, Line 336: update fis_map_ > nit: seems excessive Done http://gerrit.cloudera.org:8080/#/c/7730/8/be/src/runtime/thread-resource-mgr.h File be/src/runtime/thread-resource-mgr.h: Line 288: if (!in_callback) InvokeCallbacks(); > As we discussed in person, I don't think it's safe even in that case genera I went with option #1. I added a comment with a warning to thread-resource-mgr.h and noted why we were using skip_callbacks in kudu-scan-node.cc and hdfs-scan-node.cc. -- To view, visit http://gerrit.cloudera.org:8080/7730 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I15a2f278dc71892b7fec09593f81b1a57ab725c0 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5892: Allow reporting status independent of fragment instance
Joe McDonnell has uploaded a new patch set (#2). Change subject: IMPALA-5892: Allow reporting status independent of fragment instance .. IMPALA-5892: Allow reporting status independent of fragment instance Queries can hit an error that is not specific to a particular fragment instance. For example, QueryState::StartFInstances() calls DescriptorTbl::Create() before any fragment instances start. This location has no reason to report status via a particular fragment, and there is currently no way to report status otherwise. This leads to a query hang, because the status is never propagated back to the coordinator. This adds the ability to report status that is not associated with a particular fragment instance. By reporting status, the coordinator will now correctly abort the query in the case of the QueryState::StartFInstances() scenario described. Change-Id: I4cd98022f1d62a999c7c80ff5474fa8d069eb12c --- M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/coordinator-backend-state.h M be/src/runtime/coordinator.cc M be/src/runtime/coordinator.h M be/src/runtime/query-state.cc M common/thrift/ImpalaInternalService.thrift 6 files changed, 79 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/7943/2 -- To view, visit http://gerrit.cloudera.org:8080/7943 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4cd98022f1d62a999c7c80ff5474fa8d069eb12c Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger
[Impala-ASF-CR] IMPALA-5892: Allow reporting status independent of fragment instance
Joe McDonnell has posted comments on this change. Change subject: IMPALA-5892: Allow reporting status independent of fragment instance .. Patch Set 1: (8 comments) http://gerrit.cloudera.org:8080/#/c/7943/1/be/src/runtime/coordinator-backend-state.cc File be/src/runtime/coordinator-backend-state.cc: PS1, Line 219: *failed_instance_id = failed_instance_id_; > see header question -- this is why it seems we should require that the call Changed the semantic so that the caller must specify both or neither. PS1, Line 309: independently > independent of an instance Rewrote this comment. PS1, Line 310: incorporated > incorporated to where? and what is "here" referring to, given that you jus Rewrote this comment. PS1, Line 312: cumulative_status > that name confused me because it's doesn't seem like this is the cumulative Changed this to "overall_backend_status" and rewrote the comment. http://gerrit.cloudera.org:8080/#/c/7943/1/be/src/runtime/coordinator-backend-state.h File be/src/runtime/coordinator-backend-state.h: Line 95: /// (with not specific fragment responsible). > nit: s/not/no/? Done PS1, Line 100: has_failed_instance == false > what does that mean? do you mean '*has_failed_instance == false' or somethi Changed the semantics so that the caller must specify both or neither. Also updated the comment to make the semantics clear. http://gerrit.cloudera.org:8080/#/c/7943/1/be/src/runtime/coordinator.h File be/src/runtime/coordinator.h: PS1, Line 375: Status UpdateStatus(const Status& status, const TUniqueId& failed_fragment, : const std::string& instance_hostname, : bool has_failed_instance = true) WARN_UNUSED_RESULT; > If we keep this one method, let's be careful with our use of default parame Changed the ordering and made the arguments required. http://gerrit.cloudera.org:8080/#/c/7943/1/common/thrift/ImpalaInternalService.thrift File common/thrift/ImpalaInternalService.thrift: PS1, Line 628: This incorporates status from : // TFragmentInstanceExecStatus as well as errors that occur independently > given today's implementation it doesn't matter, but from a protocol standpo Updated the comment to be clear about the priorities. We don't currently report multiple errors, but I specified what I think is a reasonable semantic for that case. -- To view, visit http://gerrit.cloudera.org:8080/7943 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4cd98022f1d62a999c7c80ff5474fa8d069eb12c Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5750: Catch exceptions from boost thread creation
Joe McDonnell has posted comments on this change. Change subject: IMPALA-5750: Catch exceptions from boost thread creation .. Patch Set 8: (9 comments) http://gerrit.cloudera.org:8080/#/c/7730/8/be/src/catalog/catalog-server.h File be/src/catalog/catalog-server.h: PS8, Line 39: class TGetAllCatalogObjectsResponse; > nit: Not your change, but no longer required. Done http://gerrit.cloudera.org:8080/#/c/7730/7/be/src/exec/kudu-scan-node.cc File be/src/exec/kudu-scan-node.cc: PS7, Line 240: SetDoneInternal(); > We previously didn't call materialized_row_batches_->Shutdown() here, but n This previously called Shutdown() when the last scanner thread is done. I looked into this, and there doesn't seem to be any reason to wait. It is safe to call Shutdown() when the other scanner threads are still around. We do that for the HDFS version of this. We also do it if we reach a limit. I looked at the different methods on the queue, and I don't see any problem. http://gerrit.cloudera.org:8080/#/c/7730/8/be/src/rpc/auth-provider.h File be/src/rpc/auth-provider.h: PS8, Line 22: #include > No longer needed. Done http://gerrit.cloudera.org:8080/#/c/7730/8/be/src/rpc/thrift-thread.cc File be/src/rpc/thrift-thread.cc: PS8, Line 40: throw atc::SystemResourceException("Thread::Create() failed"); > is it worth incorporating the status msg into the exception message? I think that makes sense. The status has the category and thread name, which could be useful to us. http://gerrit.cloudera.org:8080/#/c/7730/8/be/src/runtime/disk-io-mgr-handle-cache.h File be/src/runtime/disk-io-mgr-handle-cache.h: Line 28: #include "common/hdfs.h" > nit: Include what you use, status.h (it's already being pulled in through o Done http://gerrit.cloudera.org:8080/#/c/7730/8/be/src/runtime/exec-env.cc File be/src/runtime/exec-env.cc: PS8, Line 341: admission_controller_->Init() > The admission controller dequeue thread starts a little later than before n I think it is ok. ExecEnv::StartServices() is called from impalad-main.cc soon after constructing the ExecEnv and before starting the backend or beeswax or hs2 servers. There should be nothing to admit. One thing I noticed is that StartServices() is not called from FeSupport. I think that should be ok, but I'm checking. http://gerrit.cloudera.org:8080/#/c/7730/8/be/src/runtime/thread-resource-mgr.h File be/src/runtime/thread-resource-mgr.h: Line 288: if (!in_callback) InvokeCallbacks(); > This explains why we need to skip the callbacks. But why is it safe to do s You're right. This is only safe from a callback if a thread did TryAcquireThreadToken and then wants to return it without using it. This correct usage is equivalent to just passing on picking up a thread token. If the callback did a release of an unrelated thread, then that would be a problem, because we won't find a replacement for that thread. There is no real way to tell if something has called TryAcquireThreadToken. We don't give it an object or anything. I can augment the comment to detail that it must be used in a very targeted way, or there could be separate functions. http://gerrit.cloudera.org:8080/#/c/7730/8/be/src/service/child-query.h File be/src/service/child-query.h: PS8, Line 22: > boost/thread/mutex.hpp Done http://gerrit.cloudera.org:8080/#/c/7730/8/be/src/statestore/statestore.h File be/src/statestore/statestore.h: Line 30: > nit: include what you use: status.h Done -- To view, visit http://gerrit.cloudera.org:8080/7730 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I15a2f278dc71892b7fec09593f81b1a57ab725c0 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5750: Catch exceptions from boost thread creation
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7730 to look at the new patch set (#9). Change subject: IMPALA-5750: Catch exceptions from boost thread creation .. IMPALA-5750: Catch exceptions from boost thread creation The boost thread constructor will throw boost::thread_resource_error if it is unable to spawn a thread on the system (e.g. due to a ulimit). This uncaught exception crashes Impala. Systems with a large number of nodes and threads are hitting this limit. This change catches the exception from the thread constructor and converts it to a Status. This requires several changes: 1. util/thread.h's Thread constructor is now private and all Threads are constructed via a new Create() static factory method. 2. util/thread-pool.h's ThreadPool requires that Init() be called after the ThreadPool is constructed. 3. To propagate the Status, Threads cannot be created in constructors, so this is moved to initialization methods that can return Status. 4. Threads now use unique_ptr's for management in all cases. Threads cannot be used as stack-allocated local variables or direct declarations in classes. Query execution code paths will now handle the error: 1. If the scan node fails to spawn any scanner thread, it will abort the query. 2. Failing to spawn a fragment instance from the query state in StartFInstances() will correctly report the error to the coordinator and tear down the query. Testing: This introduces the parameter thread_creation_fault_injection, which will cause Thread::Create() calls in eligible locations to fail randomly roughly 1% of the time. Quite a few locations of Thread::Create() and ThreadPool::Init() are necessary for startup and cannot be eligible. However, all the locations used for query execution are marked as eligible and governed by this parameter. The code was tested by setting this parameter to true and running queries to verify that queries either run to completion with the correct result or fail with appropriate status. Change-Id: I15a2f278dc71892b7fec09593f81b1a57ab725c0 --- M be/src/benchmarks/thread-create-benchmark.cc M be/src/catalog/catalog-server.cc M be/src/catalog/catalog-server.h M be/src/catalog/catalogd-main.cc M be/src/common/global-flags.cc M be/src/common/init.cc M be/src/common/init.h M be/src/exec/blocking-join-node.cc M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scan-node.h M be/src/exec/kudu-scan-node.cc M be/src/exec/kudu-scan-node.h M be/src/rpc/TAcceptQueueServer.cpp M be/src/rpc/auth-provider.h M be/src/rpc/authentication.cc M be/src/rpc/thrift-server-test.cc M be/src/rpc/thrift-server.cc M be/src/rpc/thrift-server.h M be/src/rpc/thrift-thread.cc M be/src/rpc/thrift-thread.h M be/src/runtime/data-stream-sender.cc M be/src/runtime/disk-io-mgr-handle-cache.h M be/src/runtime/disk-io-mgr-handle-cache.inline.h M be/src/runtime/disk-io-mgr.cc M be/src/runtime/exec-env.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/parallel-executor.cc M be/src/runtime/query-exec-mgr.cc M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/thread-resource-mgr.h M be/src/scheduling/admission-controller.cc M be/src/scheduling/admission-controller.h M be/src/service/child-query.cc M be/src/service/child-query.h M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M be/src/service/impala-beeswax-server.cc M be/src/service/impala-hs2-server.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h M be/src/service/impalad-main.cc M be/src/statestore/statestore-subscriber.cc M be/src/statestore/statestore-subscriber.h M be/src/statestore/statestore.cc M be/src/statestore/statestore.h M be/src/statestore/statestored-main.cc M be/src/testutil/in-process-servers.cc M be/src/testutil/in-process-servers.h M be/src/util/thread-pool-test.cc M be/src/util/thread-pool.h M be/src/util/thread.cc M be/src/util/thread.h M common/thrift/generate_error_codes.py 55 files changed, 471 insertions(+), 238 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/7730/9 -- To view, visit http://gerrit.cloudera.org:8080/7730 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I15a2f278dc71892b7fec09593f81b1a57ab725c0 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5892: Allow reporting status independent of fragment instance
Joe McDonnell has posted comments on this change. Change subject: IMPALA-5892: Allow reporting status independent of fragment instance .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/7943/1/be/src/runtime/coordinator.h File be/src/runtime/coordinator.h: PS1, Line 375: Status UpdateStatus(const Status& status, const TUniqueId& failed_fragment, : const std::string& instance_hostname, : bool has_failed_instance = true) WARN_UNUSED_RESULT; Might be cleaner to add two functions UpdateFragmentStatus(status, failed_fragment, instance_hostname) and UpdateGeneralStatus(status, instance_hostname). -- To view, visit http://gerrit.cloudera.org:8080/7943 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4cd98022f1d62a999c7c80ff5474fa8d069eb12c Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5892: Allow reporting status independent of fragment instance
Joe McDonnell has uploaded a new change for review. http://gerrit.cloudera.org:8080/7943 Change subject: IMPALA-5892: Allow reporting status independent of fragment instance .. IMPALA-5892: Allow reporting status independent of fragment instance Queries can hit an error that is not specific to a particular fragment instance. For example, QueryState::StartFInstances() calls DescriptorTbl::Create() before any fragment instances start. This location has no reason to report status via a particular fragment, and there is currently no way to report status otherwise. This leads to a query hang, because the status is never propagated back to the coordinator. This adds the ability to report status that is not associated with a particular fragment instance. By reporting status, the coordinator will now correctly abort the query in the case of the QueryState::StartFInstances() scenario described. Change-Id: I4cd98022f1d62a999c7c80ff5474fa8d069eb12c --- M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/coordinator-backend-state.h M be/src/runtime/coordinator.cc M be/src/runtime/coordinator.h M be/src/runtime/query-state.cc M common/thrift/ImpalaInternalService.thrift 6 files changed, 58 insertions(+), 17 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/7943/1 -- To view, visit http://gerrit.cloudera.org:8080/7943 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I4cd98022f1d62a999c7c80ff5474fa8d069eb12c Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell
[Impala-ASF-CR] IMPALA-5750: Catch exceptions from boost thread creation
Joe McDonnell has posted comments on this change. Change subject: IMPALA-5750: Catch exceptions from boost thread creation .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/7730/7/be/src/exec/kudu-scan-node.cc File be/src/exec/kudu-scan-node.cc: PS7, Line 173: scanner_threads_.AddThread(move(t)); > There may be another race here where if the KuduScanNode is closed before w Here is my understanding: SetDone() in Close() gets lock_ before calling SetDoneInternal(), so it holds lock_ regardless of whether done_ is true. The thread in ThreadAvailableCb() also gets lock_ before reading done_. So, they agree about done_ and either the thread in Close waits for the thread in ThreadAvailableCb() to add the thread and will need to join with it, or the thread in ThreadAvailableCb() will see done_==true and bail. I might be missing something. -- To view, visit http://gerrit.cloudera.org:8080/7730 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I15a2f278dc71892b7fec09593f81b1a57ab725c0 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5750: Catch exceptions from boost thread creation
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7730 to look at the new patch set (#8). Change subject: IMPALA-5750: Catch exceptions from boost thread creation .. IMPALA-5750: Catch exceptions from boost thread creation The boost thread constructor will throw boost::thread_resource_error if it is unable to spawn a thread on the system (e.g. due to a ulimit). This uncaught exception crashes Impala. Systems with a large number of nodes and threads are hitting this limit. This change catches the exception from the thread constructor and converts it to a Status. This requires several changes: 1. util/thread.h's Thread constructor is now private and all Threads are constructed via a new Create() static factory method. 2. util/thread-pool.h's ThreadPool requires that Init() be called after the ThreadPool is constructed. 3. To propagate the Status, Threads cannot be created in constructors, so this is moved to initialization methods that can return Status. 4. Threads now use unique_ptr's for management in all cases. Threads cannot be used as stack-allocated local variables or direct declarations in classes. Query execution code paths will now handle the error: 1. If the scan node fails to spawn any scanner thread, it will abort the query. 2. Failing to spawn a fragment instance from the query state in StartFInstances() will correctly report the error to the coordinator and tear down the query. Testing: This introduces the parameter thread_creation_fault_injection, which will cause Thread::Create() calls in eligible locations to fail randomly roughly 1% of the time. Quite a few locations of Thread::Create() and ThreadPool::Init() are necessary for startup and cannot be eligible. However, all the locations used for query execution are marked as eligible and governed by this parameter. The code was tested by setting this parameter to true and running queries to verify that queries either run to completion with the correct result or fail with appropriate status. Change-Id: I15a2f278dc71892b7fec09593f81b1a57ab725c0 --- M be/src/benchmarks/thread-create-benchmark.cc M be/src/catalog/catalog-server.cc M be/src/catalog/catalog-server.h M be/src/catalog/catalogd-main.cc M be/src/common/global-flags.cc M be/src/common/init.cc M be/src/common/init.h M be/src/exec/blocking-join-node.cc M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scan-node.h M be/src/exec/kudu-scan-node.cc M be/src/exec/kudu-scan-node.h M be/src/rpc/TAcceptQueueServer.cpp M be/src/rpc/auth-provider.h M be/src/rpc/authentication.cc M be/src/rpc/thrift-server-test.cc M be/src/rpc/thrift-server.cc M be/src/rpc/thrift-server.h M be/src/rpc/thrift-thread.cc M be/src/rpc/thrift-thread.h M be/src/runtime/data-stream-sender.cc M be/src/runtime/disk-io-mgr-handle-cache.h M be/src/runtime/disk-io-mgr-handle-cache.inline.h M be/src/runtime/disk-io-mgr.cc M be/src/runtime/exec-env.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/parallel-executor.cc M be/src/runtime/query-exec-mgr.cc M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/thread-resource-mgr.h M be/src/scheduling/admission-controller.cc M be/src/scheduling/admission-controller.h M be/src/service/child-query.cc M be/src/service/child-query.h M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M be/src/service/impala-beeswax-server.cc M be/src/service/impala-hs2-server.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h M be/src/service/impalad-main.cc M be/src/statestore/statestore-subscriber.cc M be/src/statestore/statestore-subscriber.h M be/src/statestore/statestore.cc M be/src/statestore/statestore.h M be/src/statestore/statestored-main.cc M be/src/testutil/in-process-servers.cc M be/src/testutil/in-process-servers.h M be/src/util/thread-pool-test.cc M be/src/util/thread-pool.h M be/src/util/thread.cc M be/src/util/thread.h M common/thrift/generate_error_codes.py 55 files changed, 465 insertions(+), 235 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/7730/8 -- To view, visit http://gerrit.cloudera.org:8080/7730 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I15a2f278dc71892b7fec09593f81b1a57ab725c0 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5750: Catch exceptions from boost thread creation
Joe McDonnell has posted comments on this change. Change subject: IMPALA-5750: Catch exceptions from boost thread creation .. Patch Set 7: (5 comments) http://gerrit.cloudera.org:8080/#/c/7730/7/be/src/exec/hdfs-scan-node.cc File be/src/exec/hdfs-scan-node.cc: PS7, Line 371: COUNTER_ADD(num_scanner_threads_started_counter_, 1); > The problem with moving this here is when the node is under heavy stress, t Yes, this could be off. I suppose there is no way around it. http://gerrit.cloudera.org:8080/#/c/7730/7/be/src/exec/kudu-scan-node.cc File be/src/exec/kudu-scan-node.cc: PS7, Line 171: ++num_active_scanners_; > This can cause quite a few races. It can race with L220, L244 and L245. num_active_scanners_ is protected by lock_. We get lock_ at the top of this loop and the other locations get lock_. http://gerrit.cloudera.org:8080/#/c/7730/6/be/src/runtime/query-exec-mgr.cc File be/src/runtime/query-exec-mgr.cc: PS6, Line 59: status = Thread::Create("query-exec-mgr", : Substitute("start-query-finstances-$0", PrintId(query_id)), : &QueryExecMgr::StartQueryHelper, this, qs, &t, true); > Done In testing, I found that this also needs qs->ReleaseInitialReservationRefcount(), which was taken in Init(). Added. http://gerrit.cloudera.org:8080/#/c/7730/7/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: PS7, Line 335: // Fragment instance successfully started : // update fis_map_ : fis_map_.emplace(fis->instance_id(), fis); : // update fragment_map_ : vector& fis_list = fragment_map_[instance_ctx.fragment_idx]; : fis_list.push_back(fis); > Is it safe to update the map with the Fragment instance state after already My thought process is something like this: fis is a local variable. Both fis_map_ and fragment_map_ are private variables on QueryState. None of these variables are referenced outside this QueryState except through APIs protected by the promise. It looks like the fragment instance doesn't care about using the functions that would access these variables. Worth considering if there is any way around it, but I think it is safe. http://gerrit.cloudera.org:8080/#/c/7730/7/be/src/util/thread.cc File be/src/util/thread.cc: PS7, Line 303: rand() > Not to be pedantic, but a rand() without seeding the PRNG first, will cause I was thinking it might make sense for us to do a single srand() at startup. I will look into the appropriate place to do this. -- To view, visit http://gerrit.cloudera.org:8080/7730 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I15a2f278dc71892b7fec09593f81b1a57ab725c0 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5750: Catch exceptions from boost thread creation
Joe McDonnell has posted comments on this change. Change subject: IMPALA-5750: Catch exceptions from boost thread creation .. Patch Set 6: (11 comments) http://gerrit.cloudera.org:8080/#/c/7730/6/be/src/exec/kudu-scan-node.cc File be/src/exec/kudu-scan-node.cc: PS6, Line 159: Status status; : status = > nit: combine these lines Done PS6, Line 163: num_scanner_threads_started_counter_ > here and other scan node: I guess we undo this rather than just waiting unt I checked, and this is only used as a statistic and for making the names of the threads unique. I changed this to increment after successful thread create (here and for hdfs-scan-node.cc). (A side effect is that the thread indexes start from zero.) http://gerrit.cloudera.org:8080/#/c/7730/6/be/src/runtime/query-exec-mgr.cc File be/src/runtime/query-exec-mgr.cc: PS6, Line 59: RETURN_IF_ERROR(Thread::Create("query-exec-mgr", : Substitute("start-query-finstances-$0", PrintId(query_id)), : &QueryExecMgr::StartQueryHelper, this, qs, &t, true)); > Yes, that looks like a bug. I will fix in the next upload. Done http://gerrit.cloudera.org:8080/#/c/7730/6/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: PS6, Line 199: // If this fragment instance is not prepared, it must be reporting an error. : DCHECK(fis->IsPrepared() || !status.ok()); > If fis==null case doesn't report the error, then that's a bug we need to fi Looking into a fix. I will also do fault injection at that point to see what happens. The two might be slightly different, because in that case no fragment instances have been started. PS6, Line 331: // Remove fis from fis_map_ and fragment_map_ : fis_map_.erase(fis->instance_id()); : fis_list.pop_back(); > why don't we just do this on the success path? is it to try to be consisten I checked and all other uses of these two things check that the instances_preapred_promise_ is set. So, I moved it to the success path, since that is safe. http://gerrit.cloudera.org:8080/#/c/7730/6/be/src/runtime/thread-resource-mgr.h File be/src/runtime/thread-resource-mgr.h: PS6, Line 115: bool in_callback = false); > this seems complicated. why is it needed now? ReleaseThreadToken calls InvokeCallbacks. The two locations where we use this (hdfs-scan-node.cc and kudu-scan-node.cc) are in functions that are registered as callbacks. If we try to release inside the callback, this is mutual recursion. http://gerrit.cloudera.org:8080/#/c/7730/6/be/src/service/child-query.h File be/src/service/child-query.h: Line 159: Status ExecAsync(std::vector&& child_queries); > WARN_UNUSED_RESULT (or are we relying on the Status annotation now?) Added WARN_UNUSED_RESULT. http://gerrit.cloudera.org:8080/#/c/7730/6/be/src/service/client-request-state.cc File be/src/service/client-request-state.cc: PS6, Line 598: Thread::Create("query-exec-state", "wait-thread", : &ClientRequestState::Wait, this, &wait_thread_); > Good point. I will add it to the list of locations that are eligible for fa Done http://gerrit.cloudera.org:8080/#/c/7730/6/be/src/service/impala-beeswax-server.cc File be/src/service/impala-beeswax-server.cc: PS6, Line 71: Status status; : status = request_state->WaitAsync(); > nit: combine Done http://gerrit.cloudera.org:8080/#/c/7730/6/be/src/util/thread.cc File be/src/util/thread.cc: Line 304: if ((nanos % 100) == 1) { > Good point, will make that change. Changed to use rand() Line 327: thread->swap(t); > *thread = move(t) seems clearer Done -- To view, visit http://gerrit.cloudera.org:8080/7730 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I15a2f278dc71892b7fec09593f81b1a57ab725c0 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5750: Catch exceptions from boost thread creation
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7730 to look at the new patch set (#7). Change subject: IMPALA-5750: Catch exceptions from boost thread creation .. IMPALA-5750: Catch exceptions from boost thread creation The boost thread constructor will throw boost::thread_resource_error if it is unable to spawn a thread on the system (e.g. due to a ulimit). This uncaught exception crashes Impala. Systems with a large number of nodes and threads are hitting this limit. This change catches the exception from the thread constructor and converts it to a Status. This requires several changes: 1. util/thread.h's Thread constructor is now private and all Threads are constructed via a new Create() static factory method. 2. util/thread-pool.h's ThreadPool requires that Init() be called after the ThreadPool is constructed. 3. To propagate the Status, Threads cannot be created in constructors, so this is moved to initialization methods that can return Status. 4. Threads now use unique_ptr's for management in all cases. Threads cannot be used as stack-allocated local variables or direct declarations in classes. Query execution code paths will now handle the error: 1. If the scan node fails to spawn any scanner thread, it will abort the query. 2. Failing to spawn a fragment instance from the query state in StartFInstances() will correctly report the error to the coordinator and tear down the query. Testing: This introduces the parameter thread_creation_fault_injection, which will cause Thread::Create() calls in eligible locations to fail randomly roughly 1% of the time. Quite a few locations of Thread::Create() and ThreadPool::Init() are necessary for startup and cannot be eligible. However, all the locations used for query execution are marked as eligible and governed by this parameter. The code was tested by setting this parameter to true and running queries to verify that queries either run to completion with the correct result or fail with appropriate status. Change-Id: I15a2f278dc71892b7fec09593f81b1a57ab725c0 --- M be/src/benchmarks/thread-create-benchmark.cc M be/src/catalog/catalog-server.cc M be/src/catalog/catalog-server.h M be/src/catalog/catalogd-main.cc M be/src/common/global-flags.cc M be/src/common/init.cc M be/src/common/init.h M be/src/exec/blocking-join-node.cc M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scan-node.h M be/src/exec/kudu-scan-node.cc M be/src/exec/kudu-scan-node.h M be/src/rpc/TAcceptQueueServer.cpp M be/src/rpc/auth-provider.h M be/src/rpc/authentication.cc M be/src/rpc/thrift-server-test.cc M be/src/rpc/thrift-server.cc M be/src/rpc/thrift-server.h M be/src/rpc/thrift-thread.cc M be/src/rpc/thrift-thread.h M be/src/runtime/data-stream-sender.cc M be/src/runtime/disk-io-mgr-handle-cache.h M be/src/runtime/disk-io-mgr-handle-cache.inline.h M be/src/runtime/disk-io-mgr.cc M be/src/runtime/exec-env.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/parallel-executor.cc M be/src/runtime/query-exec-mgr.cc M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/thread-resource-mgr.h M be/src/scheduling/admission-controller.cc M be/src/scheduling/admission-controller.h M be/src/service/child-query.cc M be/src/service/child-query.h M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M be/src/service/impala-beeswax-server.cc M be/src/service/impala-hs2-server.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h M be/src/service/impalad-main.cc M be/src/statestore/statestore-subscriber.cc M be/src/statestore/statestore-subscriber.h M be/src/statestore/statestore.cc M be/src/statestore/statestore.h M be/src/statestore/statestored-main.cc M be/src/testutil/in-process-servers.cc M be/src/testutil/in-process-servers.h M be/src/util/thread-pool-test.cc M be/src/util/thread-pool.h M be/src/util/thread.cc M be/src/util/thread.h M common/thrift/generate_error_codes.py 55 files changed, 464 insertions(+), 235 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/7730/7 -- To view, visit http://gerrit.cloudera.org:8080/7730 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I15a2f278dc71892b7fec09593f81b1a57ab725c0 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5750: Catch exceptions from boost thread creation
Joe McDonnell has posted comments on this change. Change subject: IMPALA-5750: Catch exceptions from boost thread creation .. Patch Set 6: (9 comments) Still going through the comments, but I thought I'd put up some quick replies. http://gerrit.cloudera.org:8080/#/c/7730/6//COMMIT_MSG Commit Message: Line 28: variables or direct declarations in classes. > any reason Thread and ThreadPool don't follow the same pattern? i.e. both u ThreadPool uses the constructor+init pattern because we subclass it for the CallableThreadPool. Subclassing gets awkward with the static create pattern. If someone forgets to run Init(), they find out when they offer a piece of work to the ThreadPool. For Thread, I think it would be feasible to use the constructor+init pattern. The difference is that if someone forgets an Init(), they might be expecting a thread and it just isn't there. http://gerrit.cloudera.org:8080/#/c/7730/6/be/src/exec/hdfs-scan-node.cc File be/src/exec/hdfs-scan-node.cc: PS6, Line 363: pool->ReleaseThreadToken(false, true); > Do you need to drop the lock before calling this? Based on the comment from Yes, that is what in_callback is for. ReleaseThreadToken can call these thread availability callbacks, so we could end up back in this code. The in_callback=true prevents that. http://gerrit.cloudera.org:8080/#/c/7730/6/be/src/runtime/query-exec-mgr.cc File be/src/runtime/query-exec-mgr.cc: PS6, Line 59: RETURN_IF_ERROR(Thread::Create("query-exec-mgr", : Substitute("start-query-finstances-$0", PrintId(query_id)), : &QueryExecMgr::StartQueryHelper, this, qs, &t, true)); > Shouldn't we call ReleaseQueryState(qs); if the thread failed to start here Yes, that looks like a bug. I will fix in the next upload. http://gerrit.cloudera.org:8080/#/c/7730/6/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: PS6, Line 199: // If this fragment instance is not prepared, it must be reporting an error. : DCHECK(fis->IsPrepared() || !status.ok()); > Is this necessarily true? The race might exist. I'm looking at the code to check. The old DCHECK is stricter than the new one. Right now, we require that fis is nullptr or fis->IsPrepared(). I'm carving out an exception for failure during thread spawn when the fis won't have a chance to be prepared. PS6, Line 199: // If this fragment instance is not prepared, it must be reporting an error. : DCHECK(fis->IsPrepared() || !status.ok()); > I don't really understand this DCHECK. oh, because we pass 'fis' in the ne In testing, I noticed that fis==nullptr doesn't actually convey the status. If I use fis==nullptr, the coordinator never gets notified of the status and the query hangs. When I went through ReportExecStatusAux, I noticed that apart from some DCHECKs, we don't use the status argument unless fis is non-null. TReportExecStatusParams doesn't really have a place for status apart from in the TFragmentInstanceExecStatus's. PS6, Line 328: prepare_status > why is this called "prepare_status"? the old 'prepare_status' was the stat The idea is to share the codepath below, which is waiting for the fragment instance states to be prepared. To incorporate the thread create status into the instances_prepared_promise_, the status needs to be available outside the loop. http://gerrit.cloudera.org:8080/#/c/7730/6/be/src/service/client-request-state.cc File be/src/service/client-request-state.cc: PS6, Line 598: Thread::Create("query-exec-state", "wait-thread", : &ClientRequestState::Wait, this, &wait_thread_); > Why not inject the thread failures here as well? I think it's an important Good point. I will add it to the list of locations that are eligible for fault injection. I will be doing more test runs as I go, so this will now be tested. http://gerrit.cloudera.org:8080/#/c/7730/6/be/src/util/thread.cc File be/src/util/thread.cc: Line 304: if ((nanos % 100) == 1) { > on some platforms, the monotonic clock may not actually have nanosecond res Good point, will make that change. http://gerrit.cloudera.org:8080/#/c/7730/6/be/src/util/thread.h File be/src/util/thread.h: PS6, Line 158: thread > nit: 'thread' Done -- To view, visit http://gerrit.cloudera.org:8080/7730 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I15a2f278dc71892b7fec09593f81b1a57ab725c0 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5750: Catch exceptions from boost thread creation
Joe McDonnell has posted comments on this change. Change subject: IMPALA-5750: Catch exceptions from boost thread creation .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/7730/5/be/src/util/thread.cc File be/src/util/thread.cc: Line 48: DEFINE_bool(thread_creation_fault_injection, false, "Causes calls to Thread::Create() " > Nice! That makes sense. It is good for it to be close to the other fault injection options. I made this only for DEBUG builds. I think that is sufficient for now. -- To view, visit http://gerrit.cloudera.org:8080/7730 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I15a2f278dc71892b7fec09593f81b1a57ab725c0 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5750: Catch exceptions from boost thread creation
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7730 to look at the new patch set (#6). Change subject: IMPALA-5750: Catch exceptions from boost thread creation .. IMPALA-5750: Catch exceptions from boost thread creation The boost thread constructor will throw boost::thread_resource_error if it is unable to spawn a thread on the system (e.g. due to a ulimit). This uncaught exception crashes Impala. Systems with a large number of nodes and threads are hitting this limit. This change catches the exception from the thread constructor and converts it to a Status. This requires several changes: 1. util/thread.h's Thread constructor is now private and all Threads are constructed via a new Create() static factory method. 2. util/thread-pool.h's ThreadPool requires that Init() be called after the ThreadPool is constructed. 3. To propagate the Status, Threads cannot be created in constructors, so this is moved to initialization methods that can return Status. 4. Threads now use unique_ptr's for management in all cases. Threads cannot be used as stack-allocated local variables or direct declarations in classes. Query execution code paths will now handle the error: 1. If the scan node fails to spawn any scanner thread, it will abort the query. 2. Failing to spawn a fragment instance from the query state in StartFInstances() will correctly report the error to the coordinator and tear down the query. Testing: This introduces the parameter thread_creation_fault_injection, which will cause Thread::Create() calls in eligible locations to fail randomly roughly 1% of the time. Quite a few locations of Thread::Create() and ThreadPool::Init() are necessary for startup and cannot be eligible. However, all the locations used for query execution are marked as eligible and governed by this parameter. The code was tested by setting this parameter to true and running queries to verify that queries either run to completion with the correct result or fail with appropriate status. Change-Id: I15a2f278dc71892b7fec09593f81b1a57ab725c0 --- M be/src/benchmarks/thread-create-benchmark.cc M be/src/catalog/catalog-server.cc M be/src/catalog/catalog-server.h M be/src/catalog/catalogd-main.cc M be/src/common/global-flags.cc M be/src/common/init.cc M be/src/common/init.h M be/src/exec/blocking-join-node.cc M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scan-node.h M be/src/exec/kudu-scan-node.cc M be/src/exec/kudu-scan-node.h M be/src/rpc/TAcceptQueueServer.cpp M be/src/rpc/auth-provider.h M be/src/rpc/authentication.cc M be/src/rpc/thrift-server-test.cc M be/src/rpc/thrift-server.cc M be/src/rpc/thrift-server.h M be/src/rpc/thrift-thread.cc M be/src/rpc/thrift-thread.h M be/src/runtime/data-stream-sender.cc M be/src/runtime/disk-io-mgr-handle-cache.h M be/src/runtime/disk-io-mgr-handle-cache.inline.h M be/src/runtime/disk-io-mgr.cc M be/src/runtime/exec-env.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/parallel-executor.cc M be/src/runtime/query-exec-mgr.cc M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/thread-resource-mgr.h M be/src/scheduling/admission-controller.cc M be/src/scheduling/admission-controller.h M be/src/service/child-query.cc M be/src/service/child-query.h M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M be/src/service/impala-beeswax-server.cc M be/src/service/impala-hs2-server.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h M be/src/service/impalad-main.cc M be/src/statestore/statestore-subscriber.cc M be/src/statestore/statestore-subscriber.h M be/src/statestore/statestore.cc M be/src/statestore/statestore.h M be/src/statestore/statestored-main.cc M be/src/testutil/in-process-servers.cc M be/src/testutil/in-process-servers.h M be/src/util/thread-pool-test.cc M be/src/util/thread-pool.h M be/src/util/thread.cc M be/src/util/thread.h M common/thrift/generate_error_codes.py 55 files changed, 460 insertions(+), 227 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/7730/6 -- To view, visit http://gerrit.cloudera.org:8080/7730 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I15a2f278dc71892b7fec09593f81b1a57ab725c0 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5750: Catch exceptions from boost thread creation
Joe McDonnell has uploaded a new patch set (#5). Change subject: IMPALA-5750: Catch exceptions from boost thread creation .. IMPALA-5750: Catch exceptions from boost thread creation The boost thread constructor will throw boost::thread_resource_error if it is unable to spawn a thread on the system (e.g. due to a ulimit). This uncaught exception crashes Impala. Systems with a large number of nodes and threads are hitting this limit. This change catches the exception from the thread constructor and converts it to a Status. This requires several changes: 1. util/thread.h's Thread constructor is now private and all Threads are constructed via a new Create() static factory method. 2. util/thread-pool.h's ThreadPool requires that Init() be called after the ThreadPool is constructed. 3. To propagate the Status, Threads cannot be created in constructors, so this is moved to initialization methods that can return Status. 4. Threads now use unique_ptr's for management in all cases. Threads cannot be used as stack-allocated local variables or direct declarations in classes. Query execution code paths will now handle the error: 1. If the scan node fails to spawn any scanner thread, it will abort the query. 2. Failing to spawn a fragment instance from the query state in StartFInstances() will correctly report the error to the coordinator and tear down the query. Testing: This introduces the parameter thread_creation_fault_injection, which will cause Thread::Create() calls in eligible locations to fail randomly roughly 1% of the time. Quite a few locations of Thread::Create() and ThreadPool::Init() are necessary for startup and cannot be eligible. However, all the locations used for query execution are marked as eligible and governed by this parameter. The code was tested by setting this parameter to true and running queries to verify that queries either run to completion with the correct result or fail with appropriate status. Change-Id: I15a2f278dc71892b7fec09593f81b1a57ab725c0 --- M be/src/benchmarks/thread-create-benchmark.cc M be/src/catalog/catalog-server.cc M be/src/catalog/catalog-server.h M be/src/catalog/catalogd-main.cc M be/src/common/init.cc M be/src/common/init.h M be/src/exec/blocking-join-node.cc M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scan-node.h M be/src/exec/kudu-scan-node.cc M be/src/exec/kudu-scan-node.h M be/src/rpc/TAcceptQueueServer.cpp M be/src/rpc/auth-provider.h M be/src/rpc/authentication.cc M be/src/rpc/thrift-server-test.cc M be/src/rpc/thrift-server.cc M be/src/rpc/thrift-server.h M be/src/rpc/thrift-thread.cc M be/src/rpc/thrift-thread.h M be/src/runtime/data-stream-sender.cc M be/src/runtime/disk-io-mgr-handle-cache.h M be/src/runtime/disk-io-mgr-handle-cache.inline.h M be/src/runtime/disk-io-mgr.cc M be/src/runtime/exec-env.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/parallel-executor.cc M be/src/runtime/query-exec-mgr.cc M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/thread-resource-mgr.h M be/src/scheduling/admission-controller.cc M be/src/scheduling/admission-controller.h M be/src/service/child-query.cc M be/src/service/child-query.h M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M be/src/service/impala-beeswax-server.cc M be/src/service/impala-hs2-server.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h M be/src/service/impalad-main.cc M be/src/statestore/statestore-subscriber.cc M be/src/statestore/statestore-subscriber.h M be/src/statestore/statestore.cc M be/src/statestore/statestore.h M be/src/statestore/statestored-main.cc M be/src/testutil/in-process-servers.cc M be/src/testutil/in-process-servers.h M be/src/util/thread-pool-test.cc M be/src/util/thread-pool.h M be/src/util/thread.cc M be/src/util/thread.h M common/thrift/generate_error_codes.py 54 files changed, 454 insertions(+), 227 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/7730/5 -- To view, visit http://gerrit.cloudera.org:8080/7730 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I15a2f278dc71892b7fec09593f81b1a57ab725c0 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5750: Catch exceptions from boost thread creation
Joe McDonnell has posted comments on this change. Change subject: IMPALA-5750: Catch exceptions from boost thread creation .. Patch Set 4: (7 comments) http://gerrit.cloudera.org:8080/#/c/7730/3//COMMIT_MSG Commit Message: Line 28: variables or direct declarations in classes. > Yeah, it does seem like there's an important distinction there. I labeled the locations that are eligible for fault injection, and then added a parameter to turn on thread creation fault injection. This should make testing much easier. http://gerrit.cloudera.org:8080/#/c/7730/3/be/src/exec/hdfs-scan-node.cc File be/src/exec/hdfs-scan-node.cc: Line 361: COUNTER_ADD(&active_scanner_thread_counter_, -1); > Oh I see. Seems ok. I'm a little skeptical that it's worth trying to recove There are arguments in either direction. Given that we think this should be rare, I think it makes sense to go ahead and abort the query. I changed this code and kudu-scan-node.cc to do that. http://gerrit.cloudera.org:8080/#/c/7730/4/be/src/exec/kudu-scan-node.cc File be/src/exec/kudu-scan-node.cc: Line 164: UndoGetNextScanToken(token); > In this case it does seem simpler to fail the query, since it we mess this Changed this to fail the query. http://gerrit.cloudera.org:8080/#/c/7730/4/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: Line 330: // Either get this to work or remove before merge. > Still relevant? Removed Line 350: Status instance_status = entry.second->WaitForPrepare(); > I think we should preserve the postcondition that all fragment instances ha That makes sense. One thing that I noticed when making this change is that in some circumstances the fragment instance will get into an RPC call before the query starts cancelling. In this case, the RPC call needs to timeout for the query to complete its cancel. The query does eventually end and nothing crashes. http://gerrit.cloudera.org:8080/#/c/7730/3/be/src/util/thread-pool.h File be/src/util/thread-pool.h: Line 61: > Yeah I think that would make most sense to me - reduces the number of diffe Fixed this by doing Shutdown() + Join() in the error condition. http://gerrit.cloudera.org:8080/#/c/7730/3/be/src/util/thread.cc File be/src/util/thread.cc: Line 301: unique_ptr t(new Thread(category, name)); > Sounds like a good idea to me. Added. -- To view, visit http://gerrit.cloudera.org:8080/7730 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I15a2f278dc71892b7fec09593f81b1a57ab725c0 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5750: Catch exceptions from boost thread creation
Joe McDonnell has uploaded a new patch set (#4). Change subject: IMPALA-5750: Catch exceptions from boost thread creation .. IMPALA-5750: Catch exceptions from boost thread creation The boost thread constructor will throw boost::thread_resource_error if it is unable to spawn a thread on the system (e.g. due to a ulimit). This uncaught exception crashes Impala. Systems with a large number of nodes and threads are hitting this limit. This change catches the exception from the thread constructor and converts it to a Status. This requires several changes: 1. util/thread.h's Thread constructor is now private and all Threads are constructed via a new Create() static factory method. 2. util/thread-pool.h's ThreadPool requires that Init() be called after the ThreadPool is constructed. 3. To propagate the Status, Threads cannot be created in constructors, so this is moved to initialization methods that can return Status. 4. Threads now use unique_ptr's for management in all cases. Threads cannot be used as stack-allocated local variables or direct declarations in classes. Query execution code paths will now handle the error: 1. Non-MT scan nodes, such as HDFS scan node and Kudu scan node require at least one scanner thread to make progress. This patch will abort the query if the scan node cannot obtain at least one scanner thread. However, if the scan node has at least one scanner thread, the query will continue. 2. Failing to spawn a fragment instance from the query state in StartFInstances() will correctly report the error to the coordinator and tear down the query. Testing: Thread::Create() has an optional argument to randomly inject thread creation errors at any single point in the code. It causes roughly 1% of the calls to fail. The code was tested by setting this option in different code points and verifying that the queries either run to completion with the correct result or fail with the appropriate status. Change-Id: I15a2f278dc71892b7fec09593f81b1a57ab725c0 --- M be/src/benchmarks/thread-create-benchmark.cc M be/src/catalog/catalog-server.cc M be/src/catalog/catalog-server.h M be/src/catalog/catalogd-main.cc M be/src/common/init.cc M be/src/common/init.h M be/src/exec/blocking-join-node.cc M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scan-node.h M be/src/exec/kudu-scan-node-base.cc M be/src/exec/kudu-scan-node-base.h M be/src/exec/kudu-scan-node.cc M be/src/exec/kudu-scan-node.h M be/src/rpc/TAcceptQueueServer.cpp M be/src/rpc/auth-provider.h M be/src/rpc/authentication.cc M be/src/rpc/thrift-server-test.cc M be/src/rpc/thrift-server.cc M be/src/rpc/thrift-server.h M be/src/rpc/thrift-thread.cc M be/src/rpc/thrift-thread.h M be/src/runtime/data-stream-sender.cc M be/src/runtime/disk-io-mgr-handle-cache.h M be/src/runtime/disk-io-mgr-handle-cache.inline.h M be/src/runtime/disk-io-mgr.cc M be/src/runtime/exec-env.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/parallel-executor.cc M be/src/runtime/query-exec-mgr.cc M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/thread-resource-mgr.h M be/src/scheduling/admission-controller.cc M be/src/scheduling/admission-controller.h M be/src/service/child-query.cc M be/src/service/child-query.h M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M be/src/service/impala-beeswax-server.cc M be/src/service/impala-hs2-server.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h M be/src/service/impalad-main.cc M be/src/statestore/statestore-subscriber.cc M be/src/statestore/statestore-subscriber.h M be/src/statestore/statestore.cc M be/src/statestore/statestore.h M be/src/statestore/statestored-main.cc M be/src/testutil/in-process-servers.cc M be/src/testutil/in-process-servers.h M be/src/util/thread-pool-test.cc M be/src/util/thread-pool.h M be/src/util/thread.cc M be/src/util/thread.h M common/thrift/generate_error_codes.py 56 files changed, 444 insertions(+), 225 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/7730/4 -- To view, visit http://gerrit.cloudera.org:8080/7730 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I15a2f278dc71892b7fec09593f81b1a57ab725c0 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5750: Catch exceptions from boost thread creation
Joe McDonnell has posted comments on this change. Change subject: IMPALA-5750: Catch exceptions from boost thread creation .. Patch Set 3: (9 comments) http://gerrit.cloudera.org:8080/#/c/7730/3//COMMIT_MSG Commit Message: Line 28: variables or direct declarations in classes. > For testing, it might be worth adding a stress startup option to make threa I have done hand testing on a variety of codepaths. I'm looking into automating the testing. It is useful to make a distinction between the thread creates that are necessary for startup and/or instance life and those that are only needed for query success. I think tagging the query locations can lead to a repeatable test: either the queries return the right answer or they return this error. http://gerrit.cloudera.org:8080/#/c/7730/3/be/src/exec/hdfs-scan-node.cc File be/src/exec/hdfs-scan-node.cc: Line 361: COUNTER_ADD(num_scanner_threads_started_counter_, -1); > It seems ok to omit fixing up num_scanner_threads_started_counter_. I don't I'm ok with either way. My thought is that in cases where the query doesn't abort we may want to know how many threads actually ran. http://gerrit.cloudera.org:8080/#/c/7730/3/be/src/exec/kudu-scan-node.cc File be/src/exec/kudu-scan-node.cc: Line 154: ++num_active_scanners_; > This is protected by lock_ so we could just wait until the thread is create Done http://gerrit.cloudera.org:8080/#/c/7730/3/be/src/rpc/thrift-thread.cc File be/src/rpc/thrift-thread.cc: Line 37: if (!status.ok()) throw atc::SystemResourceException("Thread::Create() failed"); > This could use some explanation, since it's an unusual idiom in the code. I Added a comment for this, but it is a bit unclear to me how thrift handles the exception. It seems like it gets sent back to the client, but I'm not 100% sure. http://gerrit.cloudera.org:8080/#/c/7730/3/be/src/service/impala-hs2-server.cc File be/src/service/impala-hs2-server.cc: Line 469: HS2_RETURN_ERROR(return_val, status.GetDetail(), SQLSTATE_GENERAL_ERROR); > It looks like we have identical error-handling code in three places. May be Changed to gotos and shared code. PS3, Line 476: (void) > discard_result Done http://gerrit.cloudera.org:8080/#/c/7730/3/be/src/util/thread-pool.h File be/src/util/thread-pool.h: Line 61: /// error spawning the threads. > How do things get cleaned up on an error? Does Shutdown() handle that case? When we get an error, only successfully started threads are in the threads_ ThreadGroup. ~ThreadPool calls Shutdown() and Join(), so we know that the threads will be cleaned up eventually. The owner of the ThreadPool might also call Shutdown() and Join(), and this is fine, because both of those functions are idempotent. The only thing the owner can do wrong is to Join() without calling Shutdown(), which makes no sense. It might make sense to have the error case automatically call Shutdown() + Join(). http://gerrit.cloudera.org:8080/#/c/7730/3/be/src/util/thread.cc File be/src/util/thread.cc: Line 301: return Status(e.what()); > Might be worth adding a status code to common/thrift/generate_error_codes.p Added this. I don't think the category or name would be useful to the customer, but if they would be useful to us, we can add it to the error message. http://gerrit.cloudera.org:8080/#/c/7730/3/be/src/util/thread.h File be/src/util/thread.h: PS3, Line 69: std::unique_ptr& > we'd normally use pointers for output arguments - https://google.github.io/ Done -- To view, visit http://gerrit.cloudera.org:8080/7730 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I15a2f278dc71892b7fec09593f81b1a57ab725c0 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5352: Age out unused file handles from the cache
Hello Impala Public Jenkins, Matthew Jacobs, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7640 to look at the new patch set (#10). Change subject: IMPALA-5352: Age out unused file handles from the cache .. IMPALA-5352: Age out unused file handles from the cache Currently, a file handle in the file handle cache will only be evicted if the cache reaches its capacity. This means that file handles can be retained for an indefinite amount of time. This is true even for files that have been deleted, replaced, or modified. Since a file handle maintains a file descriptor for local files, this can prevent the disk space from being freed. Additionally, unused file handles are wasted memory. This adds code to evict file handles that have been unused for longer than a specified threshold. A thread periodically checks the file handle cache to see if any file handle should be evicted. The threshold is specified by 'unused_file_handle_timeout_sec'; it defaults to 6 hours. This adds a test to custom_cluster/test_hdfs_fd_caching.py to verify the eviction behavior. Change-Id: Iefe04b3e2e22123ecb8b3e494934c93dfb29682e --- M be/src/runtime/disk-io-mgr-handle-cache.h M be/src/runtime/disk-io-mgr-handle-cache.inline.h M be/src/runtime/disk-io-mgr.cc M tests/custom_cluster/test_hdfs_fd_caching.py 4 files changed, 172 insertions(+), 31 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/7640/10 -- To view, visit http://gerrit.cloudera.org:8080/7640 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iefe04b3e2e22123ecb8b3e494934c93dfb29682e Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5352: Age out unused file handles from the cache
Joe McDonnell has posted comments on this change. Change subject: IMPALA-5352: Age out unused file handles from the cache .. Patch Set 9: (1 comment) http://gerrit.cloudera.org:8080/#/c/7640/9/be/src/runtime/disk-io-mgr-handle-cache.inline.h File be/src/runtime/disk-io-mgr-handle-cache.inline.h: PS9, Line 190: bool timed_out = true; : // This Get() will time out until shutdown, when the promise is set. : while (timed_out) { > this is weird. normally I'm not in favor of while (true), but in this case Yes, that is better. Changed it. -- To view, visit http://gerrit.cloudera.org:8080/7640 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefe04b3e2e22123ecb8b3e494934c93dfb29682e Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5352: Age out unused file handles from the cache
Hello Impala Public Jenkins, Matthew Jacobs, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7640 to look at the new patch set (#9). Change subject: IMPALA-5352: Age out unused file handles from the cache .. IMPALA-5352: Age out unused file handles from the cache Currently, a file handle in the file handle cache will only be evicted if the cache reaches its capacity. This means that file handles can be retained for an indefinite amount of time. This is true even for files that have been deleted, replaced, or modified. Since a file handle maintains a file descriptor for local files, this can prevent the disk space from being freed. Additionally, unused file handles are wasted memory. This adds code to evict file handles that have been unused for longer than a specified threshold. A thread periodically checks the file handle cache to see if any file handle should be evicted. The threshold is specified by 'unused_file_handle_timeout_sec'; it defaults to 6 hours. This adds a test to custom_cluster/test_hdfs_fd_caching.py to verify the eviction behavior. Change-Id: Iefe04b3e2e22123ecb8b3e494934c93dfb29682e --- M be/src/runtime/disk-io-mgr-handle-cache.h M be/src/runtime/disk-io-mgr-handle-cache.inline.h M be/src/runtime/disk-io-mgr.cc M tests/custom_cluster/test_hdfs_fd_caching.py 4 files changed, 171 insertions(+), 31 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/7640/9 -- To view, visit http://gerrit.cloudera.org:8080/7640 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iefe04b3e2e22123ecb8b3e494934c93dfb29682e Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5352: Age out unused file handles from the cache
Joe McDonnell has posted comments on this change. Change subject: IMPALA-5352: Age out unused file handles from the cache .. Patch Set 8: Fixed the run time of disk-io-mgr-test by changing to an interruptible wait. -- To view, visit http://gerrit.cloudera.org:8080/7640 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefe04b3e2e22123ecb8b3e494934c93dfb29682e Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5352: Age out unused file handles from the cache
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7640 to look at the new patch set (#5). Change subject: IMPALA-5352: Age out unused file handles from the cache .. IMPALA-5352: Age out unused file handles from the cache Currently, a file handle in the file handle cache will only be evicted if the cache reaches its capacity. This means that file handles can be retained for an indefinite amount of time. This is true even for files that have been deleted, replaced, or modified. Since a file handle maintains a file descriptor for local files, this can prevent the disk space from being freed. Additionally, unused file handles are wasted memory. This adds code to evict file handles that have been unused for longer than a specified threshold. A thread periodically checks the file handle cache to see if any file handle should be evicted. The threshold is specified by 'unused_file_handle_timeout_sec'; it defaults to 6 hours. This adds a test to custom_cluster/test_hdfs_fd_caching.py to verify the eviction behavior. Change-Id: Iefe04b3e2e22123ecb8b3e494934c93dfb29682e --- M be/src/runtime/disk-io-mgr-handle-cache.h M be/src/runtime/disk-io-mgr-handle-cache.inline.h M be/src/runtime/disk-io-mgr.cc M tests/custom_cluster/test_hdfs_fd_caching.py 4 files changed, 169 insertions(+), 31 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/7640/5 -- To view, visit http://gerrit.cloudera.org:8080/7640 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iefe04b3e2e22123ecb8b3e494934c93dfb29682e Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5352: Age out unused file handles from the cache
Joe McDonnell has posted comments on this change. Change subject: IMPALA-5352: Age out unused file handles from the cache .. Patch Set 4: (5 comments) http://gerrit.cloudera.org:8080/#/c/7640/4/be/src/runtime/disk-io-mgr-handle-cache.h File be/src/runtime/disk-io-mgr-handle-cache.h: Line 94: ~FileHandleCache(); > Worth mentioning this is only used for backend tests. Done PS4, Line 187: true > SHUTDOWN_TRUE. Done http://gerrit.cloudera.org:8080/#/c/7640/4/tests/custom_cluster/test_hdfs_fd_caching.py File tests/custom_cluster/test_hdfs_fd_caching.py: Line 63: def run_fd_caching_test(self, vector, caching_expected, cache_capacity, \ > nit: line continuation shouldn't be needed inside parens Done PS4, Line 113: ( > nit: unnecessary parens Good point. Fixed. PS4, Line 150: ): > nit: unnecessary parens Done -- To view, visit http://gerrit.cloudera.org:8080/7640 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefe04b3e2e22123ecb8b3e494934c93dfb29682e Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5750: Catch exceptions from boost thread creation
Joe McDonnell has uploaded a new patch set (#3). Change subject: IMPALA-5750: Catch exceptions from boost thread creation .. IMPALA-5750: Catch exceptions from boost thread creation The boost thread constructor will throw boost::thread_resource_error if it is unable to spawn a thread on the system (e.g. due to a ulimit). This uncaught exception crashes Impala. Systems with a large number of nodes and threads are hitting this limit. This change catches the exception from the thread constructor and converts it to a Status. This requires several changes: 1. util/thread.h's Thread constructor is now private and all Threads are constructed via a new Create() static factory method. 2. util/thread-pool.h's ThreadPool requires that Init() be called after the ThreadPool is constructed. 3. To propagate the Status, Threads cannot be created in constructors, so this is moved to initialization methods that can return Status. 4. Threads now use unique_ptr's for management in all cases. Threads cannot be used as stack-allocated local variables or direct declarations in classes. Change-Id: I15a2f278dc71892b7fec09593f81b1a57ab725c0 --- M be/src/benchmarks/thread-create-benchmark.cc M be/src/catalog/catalog-server.cc M be/src/catalog/catalog-server.h M be/src/catalog/catalogd-main.cc M be/src/common/init.cc M be/src/common/init.h M be/src/exec/blocking-join-node.cc M be/src/exec/hdfs-scan-node.cc M be/src/exec/kudu-scan-node.cc M be/src/rpc/TAcceptQueueServer.cpp M be/src/rpc/auth-provider.h M be/src/rpc/authentication.cc M be/src/rpc/thrift-server-test.cc M be/src/rpc/thrift-server.cc M be/src/rpc/thrift-server.h M be/src/rpc/thrift-thread.cc M be/src/rpc/thrift-thread.h M be/src/runtime/data-stream-sender.cc M be/src/runtime/disk-io-mgr.cc M be/src/runtime/exec-env.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/parallel-executor.cc M be/src/runtime/query-exec-mgr.cc M be/src/runtime/query-state.cc M be/src/scheduling/admission-controller.cc M be/src/scheduling/admission-controller.h M be/src/service/child-query.cc M be/src/service/child-query.h M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M be/src/service/impala-beeswax-server.cc M be/src/service/impala-hs2-server.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h M be/src/service/impalad-main.cc M be/src/statestore/statestore-subscriber.cc M be/src/statestore/statestore-subscriber.h M be/src/statestore/statestore.cc M be/src/statestore/statestore.h M be/src/statestore/statestored-main.cc M be/src/testutil/in-process-servers.cc M be/src/testutil/in-process-servers.h M be/src/util/thread-pool-test.cc M be/src/util/thread-pool.h M be/src/util/thread.cc M be/src/util/thread.h 47 files changed, 297 insertions(+), 149 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/7730/3 -- To view, visit http://gerrit.cloudera.org:8080/7730 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I15a2f278dc71892b7fec09593f81b1a57ab725c0 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5750: Catch exceptions from boost thread creation
Joe McDonnell has posted comments on this change. Change subject: IMPALA-5750: Catch exceptions from boost thread creation .. Patch Set 1: (17 comments) http://gerrit.cloudera.org:8080/#/c/7730/1//COMMIT_MSG Commit Message: Line 38: 3. The Thread::Create and ThreadPool::Create have > Would unique_ptr do the job? I think you will need to support unique_ptr an Changed to unique_ptr http://gerrit.cloudera.org:8080/#/c/7730/2//COMMIT_MSG Commit Message: Line 18: 1. util/thread.h's Thread constructor is now private > I think it's a matter of taste, but in some ways I prefer the constructor + Yes, the constructor + Init() also makes subclassing easier. I switched ThreadPool over to this approach. I think it makes more sense for that. Thread seems like either would work, so I'm sticking with a static factory method for now. Line 34: 1. QueryState::StartFInstances() needs appropriate > One option is to defer the tricky cases til a follow-up patch and bring dow I did that for query-state.cc, but I think that is probably the most common place for new threads to hit this exception, so it would be really nice to get it working. http://gerrit.cloudera.org:8080/#/c/7730/1/be/src/exec/blocking-join-node.cc File be/src/exec/blocking-join-node.cc: Line 197: RETURN_IF_ERROR(Thread::Create(FragmentInstanceState::FINST_THREAD_GROUP_NAME, > Do we need to release the thread token on this error path? Good point, this definitely needs to release the thread token. Fixed. http://gerrit.cloudera.org:8080/#/c/7730/2/be/src/rpc/TAcceptQueueServer.cpp File be/src/rpc/TAcceptQueueServer.cpp: Line 226: stop_ = true; > Should we throw an exception here? What's the behavior when it continues ex Setting stop_=true means we will not execute the loop and instead will do shutdown cleanup in the "if (stop_)" block and exit the function. Exiting the function stops the server. At various points in this file, we catch exceptions, so I think we don't want an exception. http://gerrit.cloudera.org:8080/#/c/7730/1/be/src/runtime/parallel-executor.cc File be/src/runtime/parallel-executor.cc: Line 39: RETURN_IF_ERROR(Thread::Create("parallel-executor", ss.str(), > If we created some threads I think we still need to call JoinAll() to avoid Yes, that definitely needs to happen. Those threads reference stack variables, so it would be very dangerous to just leave. Changed to do the JoinAll(). http://gerrit.cloudera.org:8080/#/c/7730/1/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: Line 339: // TODO: This error handling needs work. > I think worst-case we could do a LOG(FATAL) for now and come back to fix th Changed to LOG(FATAL) for the moment, but I am still looking into getting this to work. http://gerrit.cloudera.org:8080/#/c/7730/1/be/src/service/impala-beeswax-server.cc File be/src/service/impala-beeswax-server.cc: PS1, Line 74: (void) U > should use discard_result() once you rebase onto my change (GCC 7 does not Done http://gerrit.cloudera.org:8080/#/c/7730/1/be/src/statestore/statestore.h File be/src/statestore/statestore.h: Line 404: boost::scoped_ptr> subscriber_topic_update_threadpool_; > long lines Done http://gerrit.cloudera.org:8080/#/c/7730/2/be/src/statestore/statestored-main.cc File be/src/statestore/statestored-main.cc: Line 71: ABORT_IF_ERROR(RegisterMemoryMetrics(metrics.get(), false, nullptr, nullptr)); > You also need to do: Done PS2, Line 72: StartMemoryMaintenanceThread(); > ABORT_IF_ERROR(StartMemoryMaintenanceThread()); Done http://gerrit.cloudera.org:8080/#/c/7730/1/be/src/util/thread-pool.h File be/src/util/thread-pool.h: Line 123: Status CreateThreads(const std::string& group, const std::string& thread_prefix, > Should add WARN_UNUSED_RESULT to make sure it's not dropped anywhere. Done Line 185: static Status Create(const std::string& group, const std::string& thread_prefix, > Should add WARN_UNUSED_RESULT to make sure it's not dropped anywhere. Done http://gerrit.cloudera.org:8080/#/c/7730/2/be/src/util/thread-pool.h File be/src/util/thread-pool.h: Line 46: /// -- work_function: the function to run every time an item is consumed from the queue > Update comment to include 'tpool' arg. Changed approach on ThreadPool to make it follow the constructor + Init() pattern, so this change is gone. PS2, Line 184: /// TODO: add comment > Comment? Removed PS2, Line 188: DCHECK(pool == nullptr); : boost::scoped_ptr local_tpool; : local_tpool.reset(new CallableThreadPool(queue_size, &CallableThreadPool::Worker)); : RETURN_IF_ERROR(local_tpool->CreateThreads(group, thread_prefix, num_threads)); : pool.swap(local_tpool); : return Status::OK(); > Why not just call the parent's Create() here? Switching to the constructor + Init() pattern eliminated the need for this code. Now Call
[Impala-ASF-CR] IMPALA-5352: Age out unused file handles from the cache
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7640 to look at the new patch set (#4). Change subject: IMPALA-5352: Age out unused file handles from the cache .. IMPALA-5352: Age out unused file handles from the cache Currently, a file handle in the file handle cache will only be evicted if the cache reaches its capacity. This means that file handles can be retained for an indefinite amount of time. This is true even for files that have been deleted, replaced, or modified. Since a file handle maintains a file descriptor for local files, this can prevent the disk space from being freed. Additionally, unused file handles are wasted memory. This adds code to evict file handles that have been unused for longer than a specified threshold. A thread periodically checks the file handle cache to see if any file handle should be evicted. The threshold is specified by 'unused_file_handle_timeout_sec'; it defaults to 6 hours. This adds a test to custom_cluster/test_hdfs_fd_caching.py to verify the eviction behavior. Change-Id: Iefe04b3e2e22123ecb8b3e494934c93dfb29682e --- M be/src/runtime/disk-io-mgr-handle-cache.h M be/src/runtime/disk-io-mgr-handle-cache.inline.h M be/src/runtime/disk-io-mgr.cc M tests/custom_cluster/test_hdfs_fd_caching.py 4 files changed, 167 insertions(+), 30 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/7640/4 -- To view, visit http://gerrit.cloudera.org:8080/7640 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iefe04b3e2e22123ecb8b3e494934c93dfb29682e Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5352: Age out unused file handles from the cache
Joe McDonnell has posted comments on this change. Change subject: IMPALA-5352: Age out unused file handles from the cache .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/7640/3/be/src/runtime/disk-io-mgr-handle-cache.h File be/src/runtime/disk-io-mgr-handle-cache.h: PS3, Line 137:/// Move constructor : FileHandleEntry(FileHandleEntry&& entry) : : fh(std::move(entry.fh)), lru_entry(entry.lru_entry) { : DCHECK(!entry.in_use); : } > Is this necessary? I think the compiler should generate a move constructor Good point, it isn't needed. Removed. -- To view, visit http://gerrit.cloudera.org:8080/7640 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefe04b3e2e22123ecb8b3e494934c93dfb29682e Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5352: Age out unused file handles from the cache
Joe McDonnell has posted comments on this change. Change subject: IMPALA-5352: Age out unused file handles from the cache .. Patch Set 2: (12 comments) Also rebased to the latest http://gerrit.cloudera.org:8080/#/c/7640/2//COMMIT_MSG Commit Message: Line 25: This adds a test to custom_cluster/test_hdfs_fd_caching.py > worth thinking about testing this on a cluster where there is a huge burst Since this is happening every second, the cache should have a limited number of file handles to clean up on a single partition in a single iteration. http://gerrit.cloudera.org:8080/#/c/7640/2/be/src/runtime/disk-io-mgr-handle-cache.h File be/src/runtime/disk-io-mgr-handle-cache.h: PS2, Line 97: Returns an error if thread creation fails. > now it's void :) Done http://gerrit.cloudera.org:8080/#/c/7640/2/be/src/runtime/disk-io-mgr-handle-cache.inline.h File be/src/runtime/disk-io-mgr-handle-cache.inline.h: Line 49: shut_down_.Store(SHUTDOWN_FALSE); > nit: I think we can just initialize this in the initialiser list. Done Line 67: if (eviction_thread_.get() != nullptr) { > nit: conditional fits on one line. Done PS2, Line 64: template : FileHandleCache::~FileHandleCache() { : shut_down_.Store(SHUTDOWN_TRUE); : if (eviction_thread_.get() != nullptr) { : eviction_thread_->Join(); : } : } > this never actually runs because this has process-lifetime, right? Since this is part of the DiskIoMgr object, disk-io-mgr-test.cc covers this. Some of the tests construct a DiskIoMgr with a smart pointer and later reset that pointer. PS2, Line 116: // This emplace_hint requires constructing a pair in the map. Emplacing an : // element with a multiple argument constructor (such as FileHandleEntry) : // introduces ambiguity between the key's arguments and the value's arguments. : // To resolve the ambiguity, use a piecewise construction and separate the : // arguments for the key from the value. : typename MapType::iterator new_it = p.cache.emplace_hint(range.second, : std::piecewise_construct, std::forward_as_tuple(*fname), : std::forward_as_tuple(new_fh, p.lru_list)); > Yeah I'm wondering if there's a simpler way to express it. Changed this to use a move constructor. This is not performance critical. We can revisit this later if it bothers us. PS2, Line 196: FileHandleCache::FileHandleCachePartition& p > how about moving this function into FileHandleCachePartition, then calling I added detailed comments about the lru list manipulations, but I don't think putting it in a separate method makes sense. I looked at the function signatures and they would need a lot of explanation in the function comments, so I would rather have the comments inline. My main thought is that this code is one coherent whole, and anyone doing maintenance or improvement on it is going to have to understand every line. I'm reluctant to add code structures to hide anything. Right now, FileHandleCachePartition is purely for memory layout. My general feeling is that adding methods to it and turning FileHandleCachePartition into a full object is overkill. It moves code around, but it doesn't really change anything. PS2, Line 197: uint64_t now = MonotonicSeconds(); > maybe this should be determined in EvictHandlesLoop() before you start taki This is called every second. I don't think there needs to be any consistency between partitions. PS2, Line 198: valid > valid is a bit vague Changed to "allowed" PS2, Line 199: 0 > hitting this branch seems extremely unlikely, only if the user sets the tim Changed the non-eviction test to set the flag to a very large value. PS2, Line 202: lru_entry = p.lru_list.front(); : typename MapType::iterator evict_it = lru_entry.map_entry; : uint64_t lru_oldest_timestamp > it'd be nice to use the same naming convention for these vars since they're Done http://gerrit.cloudera.org:8080/#/c/7640/2/be/src/runtime/disk-io-mgr.cc File be/src/runtime/disk-io-mgr.cc: Line 112: DEFINE_uint64(unused_file_handle_timeout, 43200, "Maximum time, in seconds, that an " > I do like having the units in the name, would you mind appending _sec ? Done -- To view, visit http://gerrit.cloudera.org:8080/7640 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefe04b3e2e22123ecb8b3e494934c93dfb29682e Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5352: Age out unused file handles from the cache
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7640 to look at the new patch set (#3). Change subject: IMPALA-5352: Age out unused file handles from the cache .. IMPALA-5352: Age out unused file handles from the cache Currently, a file handle in the file handle cache will only be evicted if the cache reaches its capacity. This means that file handles can be retained for an indefinite amount of time. This is true even for files that have been deleted, replaced, or modified. Since a file handle maintains a file descriptor for local files, this can prevent the disk space from being freed. Additionally, unused file handles are wasted memory. This adds code to evict file handles that have been unused for longer than a specified threshold. A thread periodically checks the file handle cache to see if any file handle should be evicted. The threshold is specified by 'unused_file_handle_timeout_sec'; it defaults to 6 hours. This adds a test to custom_cluster/test_hdfs_fd_caching.py to verify the eviction behavior. Change-Id: Iefe04b3e2e22123ecb8b3e494934c93dfb29682e --- M be/src/runtime/disk-io-mgr-handle-cache.h M be/src/runtime/disk-io-mgr-handle-cache.inline.h M be/src/runtime/disk-io-mgr.cc M tests/custom_cluster/test_hdfs_fd_caching.py 4 files changed, 173 insertions(+), 30 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/7640/3 -- To view, visit http://gerrit.cloudera.org:8080/7640 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iefe04b3e2e22123ecb8b3e494934c93dfb29682e Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5750: Catch exceptions from boost thread creation
Joe McDonnell has uploaded a new patch set (#2). Change subject: IMPALA-5750: Catch exceptions from boost thread creation .. IMPALA-5750: Catch exceptions from boost thread creation The boost thread constructor will throw boost::thread_resource_error if it is unable to spawn a thread on the system (e.g. due to a ulimit). This uncaught exception crashes Impala. Systems with a large number of nodes and threads are hitting this limit. This change catches the exception from the thread constructor and converts it to a Status. This requires several changes: 1. util/thread.h's Thread constructor is now private and all Threads are constructed via a new Create() static factory method. 2. util/thread-pool.h's ThreadPool constructor is now private and all ThreadPools are constructed via a new Create() static factory method. 3. To propagate the Status, Threads and ThreadPools cannot be created in constructors, so this is moved to initialization methods that can return Status. 4. Since Threads and ThreadPools need to be passed as arguments to the Create method, this eliminates the ability to use them as stack-allocated local variables or direct declarations in classes. These have been replaced with scoped_ptr's. Remaining TODO: 1. QueryState::StartFInstances() needs appropriate cleanup if thread create fails. This is intricate. 2. There is some duplicate code between ThreadPool and CallableThreadPool to try to eliminate. 3. The Thread::Create and ThreadPool::Create have both pointer and scoped_ptr versions. It would be nice to only have one version. Change-Id: I15a2f278dc71892b7fec09593f81b1a57ab725c0 --- M be/src/benchmarks/thread-create-benchmark.cc M be/src/catalog/catalog-server.cc M be/src/common/init.cc M be/src/common/init.h M be/src/exec/blocking-join-node.cc M be/src/exec/hdfs-scan-node.cc M be/src/exec/kudu-scan-node.cc M be/src/rpc/TAcceptQueueServer.cpp M be/src/rpc/authentication.cc M be/src/rpc/thrift-server-test.cc M be/src/rpc/thrift-server.cc M be/src/rpc/thrift-thread.cc M be/src/runtime/data-stream-sender.cc M be/src/runtime/disk-io-mgr.cc M be/src/runtime/exec-env.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/parallel-executor.cc M be/src/runtime/query-exec-mgr.cc M be/src/runtime/query-state.cc M be/src/scheduling/admission-controller.cc M be/src/service/child-query.cc M be/src/service/child-query.h M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M be/src/service/impala-beeswax-server.cc M be/src/service/impala-hs2-server.cc M be/src/service/impala-server.cc M be/src/statestore/statestore-subscriber.cc M be/src/statestore/statestore.cc M be/src/statestore/statestore.h M be/src/statestore/statestored-main.cc M be/src/testutil/in-process-servers.cc M be/src/util/hdfs-bulk-ops.cc M be/src/util/hdfs-bulk-ops.h M be/src/util/thread-pool-test.cc M be/src/util/thread-pool.h M be/src/util/thread.cc M be/src/util/thread.h 38 files changed, 371 insertions(+), 177 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/7730/2 -- To view, visit http://gerrit.cloudera.org:8080/7730 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I15a2f278dc71892b7fec09593f81b1a57ab725c0 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell
[Impala-ASF-CR] IMPALA-5750: Catch exceptions from boost thread creation
Joe McDonnell has uploaded a new change for review. http://gerrit.cloudera.org:8080/7730 Change subject: IMPALA-5750: Catch exceptions from boost thread creation .. IMPALA-5750: Catch exceptions from boost thread creation The boost thread constructor will throw boost::thread_resource_error if it is unable to spawn a thread on the system (e.g. due to a ulimit). This uncaught exception crashes Impala. Several customers are seeing this. This change catches the exception from the thread constructor and converts it to a Status. This requires several changes: 1. util/thread.h's Thread constructor is now private and all Threads are constructed via a new Create() static factory method. 2. util/thread-pool.h's ThreadPool constructor is now private and all ThreadPools are constructed via a new Create() static factory method. 3. To propagate the Status, Threads and ThreadPools cannot be created in constructors, so this is moved to initialization methods that can return Status. 4. Since Threads and ThreadPools need to be passed as arguments to the Create method, this eliminates the ability to use them as stack-allocated local variables or direct declarations in classes. These have been replaced with scoped_ptr's. Remaining TODO: 1. QueryState::StartFInstances() needs appropriate cleanup if thread create fails. This is intricate. 2. There is some duplicate code between ThreadPool and CallableThreadPool to try to eliminate. 3. The Thread::Create and ThreadPool::Create have both pointer and scoped_ptr versions. It would be nice to only have one version. Change-Id: I15a2f278dc71892b7fec09593f81b1a57ab725c0 --- M be/src/benchmarks/thread-create-benchmark.cc M be/src/catalog/catalog-server.cc M be/src/common/init.cc M be/src/common/init.h M be/src/exec/blocking-join-node.cc M be/src/exec/hdfs-scan-node.cc M be/src/exec/kudu-scan-node.cc M be/src/rpc/TAcceptQueueServer.cpp M be/src/rpc/authentication.cc M be/src/rpc/thrift-server-test.cc M be/src/rpc/thrift-server.cc M be/src/rpc/thrift-thread.cc M be/src/runtime/data-stream-sender.cc M be/src/runtime/disk-io-mgr.cc M be/src/runtime/exec-env.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/parallel-executor.cc M be/src/runtime/query-exec-mgr.cc M be/src/runtime/query-state.cc M be/src/scheduling/admission-controller.cc M be/src/service/child-query.cc M be/src/service/child-query.h M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M be/src/service/impala-beeswax-server.cc M be/src/service/impala-hs2-server.cc M be/src/service/impala-server.cc M be/src/statestore/statestore-subscriber.cc M be/src/statestore/statestore.cc M be/src/statestore/statestore.h M be/src/statestore/statestored-main.cc M be/src/testutil/in-process-servers.cc M be/src/util/hdfs-bulk-ops.cc M be/src/util/hdfs-bulk-ops.h M be/src/util/thread-pool-test.cc M be/src/util/thread-pool.h M be/src/util/thread.cc M be/src/util/thread.h 38 files changed, 371 insertions(+), 177 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/7730/1 -- To view, visit http://gerrit.cloudera.org:8080/7730 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I15a2f278dc71892b7fec09593f81b1a57ab725c0 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell
[Impala-ASF-CR] IMPALA-5352: Age out unused file handles from the cache
Joe McDonnell has uploaded a new patch set (#2). Change subject: IMPALA-5352: Age out unused file handles from the cache .. IMPALA-5352: Age out unused file handles from the cache Currently, a file handle in the file handle cache will only be evicted if the cache reaches its capacity. This means that file handles can be retained for an indefinite amount of time. This is true even for files that have been deleted, replaced, or modified. Since a file handle maintains a file descriptor for local files, this can prevent the disk space from being freed. Additionally, unused file handles are wasted memory. This adds code to evict file handles that have been unused for longer than a specified threshold. A thread periodically checks the file handle cache to see if any file handle should be evicted. The threshold is specified by 'unused_file_handle_timeout'; it defaults to 6 hours. This adds a test to custom_cluster/test_hdfs_fd_caching.py to verify the eviction behavior. Change-Id: Iefe04b3e2e22123ecb8b3e494934c93dfb29682e --- M be/src/runtime/disk-io-mgr-handle-cache.h M be/src/runtime/disk-io-mgr-handle-cache.inline.h M be/src/runtime/disk-io-mgr.cc M tests/custom_cluster/test_hdfs_fd_caching.py 4 files changed, 151 insertions(+), 25 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/7640/2 -- To view, visit http://gerrit.cloudera.org:8080/7640 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iefe04b3e2e22123ecb8b3e494934c93dfb29682e Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5352: Age out unused file handles from the cache
Joe McDonnell has posted comments on this change. Change subject: IMPALA-5352: Age out unused file handles from the cache .. Patch Set 1: (14 comments) http://gerrit.cloudera.org:8080/#/c/7640/1/be/src/runtime/disk-io-mgr-handle-cache.h File be/src/runtime/disk-io-mgr-handle-cache.h: PS1, Line 98: Status > comment on when this returns an error Changed to void. This was anticipating IMPALA-5750, but I will just change this in that patch. PS1, Line 126: struct LruListEntry { : LruListEntry(typename MapType::iterator map_entry_in); : typename MapType::iterator map_entry; : uint64_t timestamp_seconds; : }; > It'd be nice to have a validate() method to check internal consistency of t Changed the semantics of the lru_entry on FileHandleEntry so that it can DCHECK that it is being used correctly. Looking into whether there are other checks that make sense. Line 170: /// Periodic check to evict unused file handles > maybe mention "executed by eviction_thread_". Done Line 178: size_t total_capacity_; > total_capacity_ could do with a short comment. Alternatively I'm not sure t Removed PS1, Line 180: Maximum time that a file handle can be unused before > Maximum time before an unused file handle is aged out of the cache. Done. That is much better. Line 187: bool shut_down_ = false; > It would be better to use an atomic so that it's obviously correct and ther Done http://gerrit.cloudera.org:8080/#/c/7640/1/be/src/runtime/disk-io-mgr-handle-cache.inline.h File be/src/runtime/disk-io-mgr-handle-cache.inline.h: PS1, Line 73: Status > Might as well be a void fn Done Line 76: if (unused_handle_timeout_secs_ != 0 && total_capacity_ != 0) { > It's worth considering if we should just always run the thread for simplici Changed to always run the thread. That eliminates the need for total_capacity_. Line 78: &FileHandleCache::EvictHandlesLoop, this)); > nit: 4spaces Done PS1, Line 187: 1000 > Might be helpful to make this a constant in the class. Done PS1, Line 197: true > Can we combine this line and the one below? Done http://gerrit.cloudera.org:8080/#/c/7640/1/be/src/runtime/disk-io-mgr.cc File be/src/runtime/disk-io-mgr.cc: PS1, Line 101: insures > nit: ensures Done http://gerrit.cloudera.org:8080/#/c/7640/1/tests/custom_cluster/test_hdfs_fd_caching.py File tests/custom_cluster/test_hdfs_fd_caching.py: PS1, Line 137: unused_file_handle_timeout_mins > might be better to take the timeout in seconds, then you can make the timeo Changed the timeout to seconds and lowered the value. PS1, Line 147: 60 > Maybe extract into a local variable and mention that it's derived from unus Changed to take the timeout in seconds. It looks like we don't have any parameters that are floating point, so I thought it would be easier in seconds. I introduced a variable to clarify the test. -- To view, visit http://gerrit.cloudera.org:8080/7640 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefe04b3e2e22123ecb8b3e494934c93dfb29682e Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5598: Fix excessive dumping in MemLimitExceeded
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7597 to look at the new patch set (#4). Change subject: IMPALA-5598: Fix excessive dumping in MemLimitExceeded .. IMPALA-5598: Fix excessive dumping in MemLimitExceeded ExecQueryFInstances RPC timeouts in stress tests were traced to excessive dumping in MemLimitExceeded() and LogUsage(). The QueryState::Init() hits the process memory limit, and this causes MemLimitExceeded to dump the process memory tracker. On the stress test, this involves dumping hundreds of queries and all the structures underneath. This is expensive and the contention between threads accessing these structures causes RPC timeouts. This adds the ability to the limit the recursive depth when dumping memory trackers. It also modifies the dumping in MemLimitExceeded() to have the following semantics: 1. The query memory tracker is always logged in full if it exists. 2. The process memory tracker is logged if the query memory tracker doesn't exist or if the process memory limit is being hit. The process memory tracker is limited to dumping only two levels of children. Other uses of LogUsage() are not limited (e.g. /memz and the query memory page on the web UI). A stress test run with the process memory tracker limited to two levels showed no RPC timeouts. Change-Id: Ib151c3894d4c43b508779e6809d77e2f8af89cf2 --- M be/src/exec/exec-node.cc M be/src/exec/hash-table-test.cc M be/src/runtime/bufferpool/reservation-tracker-test.cc M be/src/runtime/initial-reservations.cc M be/src/runtime/mem-tracker.cc M be/src/runtime/mem-tracker.h M be/src/runtime/runtime-state.cc M be/src/service/impala-http-handler.cc M be/src/util/default-path-handlers.cc 9 files changed, 61 insertions(+), 31 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/7597/4 -- To view, visit http://gerrit.cloudera.org:8080/7597 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib151c3894d4c43b508779e6809d77e2f8af89cf2 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5598: Fix excessive dumping in MemLimitExceeded
Joe McDonnell has posted comments on this change. Change subject: IMPALA-5598: Fix excessive dumping in MemLimitExceeded .. Patch Set 3: (2 comments) Also rebased to the latest. http://gerrit.cloudera.org:8080/#/c/7597/3/be/src/runtime/mem-tracker.cc File be/src/runtime/mem-tracker.cc: PS3, Line 353: logged_query_tracker > rather than introducing another control variable, why not just 'state == nu Done http://gerrit.cloudera.org:8080/#/c/7597/3/be/src/runtime/mem-tracker.h File be/src/runtime/mem-tracker.h: PS3, Line 338: summary of the queries > does this mean it it dumps the query-level tracker but nothing below? if so Yes, it has a single-line summary of the query tracker. I updated the comment to indicate this. -- To view, visit http://gerrit.cloudera.org:8080/7597 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib151c3894d4c43b508779e6809d77e2f8af89cf2 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5598: Fix excessive dumping in MemLimitExceeded
Joe McDonnell has posted comments on this change. Change subject: IMPALA-5598: Fix excessive dumping in MemLimitExceeded .. Patch Set 2: (1 comment) Fix long line and fix a backend test I had missed. http://gerrit.cloudera.org:8080/#/c/7597/2/be/src/exec/exec-node.cc File be/src/exec/exec-node.cc: Line 218:<< state->instance_mem_tracker()->LogUsage(MemTracker::UNLIMITED_DEPTH); > Long line Done -- To view, visit http://gerrit.cloudera.org:8080/7597 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib151c3894d4c43b508779e6809d77e2f8af89cf2 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5598: Fix excessive dumping in MemLimitExceeded
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7597 to look at the new patch set (#3). Change subject: IMPALA-5598: Fix excessive dumping in MemLimitExceeded .. IMPALA-5598: Fix excessive dumping in MemLimitExceeded ExecQueryFInstances RPC timeouts in stress tests were traced to excessive dumping in MemLimitExceeded() and LogUsage(). The QueryState::Init() hits the process memory limit, and this causes MemLimitExceeded to dump the process memory tracker. On the stress test, this involves dumping hundreds of queries and all the structures underneath. This is expensive and the contention between threads accessing these structures causes RPC timeouts. This adds the ability to the limit the recursive depth when dumping memory trackers. It also modifies the dumping in MemLimitExceeded() to have the following semantics: 1. The query memory tracker is always logged in full if it exists. 2. The process memory tracker is logged if the query memory tracker doesn't exist or if the process memory limit is being hit. The process memory tracker is limited to dumping only two levels of children. Other uses of LogUsage() are not limited (e.g. /memz and the query memory page on the web UI). A stress test run with the process memory tracker limited to two levels showed no RPC timeouts. Change-Id: Ib151c3894d4c43b508779e6809d77e2f8af89cf2 --- M be/src/exec/exec-node.cc M be/src/exec/hash-table-test.cc M be/src/runtime/bufferpool/reservation-tracker-test.cc M be/src/runtime/initial-reservations.cc M be/src/runtime/mem-tracker.cc M be/src/runtime/mem-tracker.h M be/src/runtime/runtime-state.cc M be/src/service/impala-http-handler.cc M be/src/util/default-path-handlers.cc 9 files changed, 62 insertions(+), 31 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/7597/3 -- To view, visit http://gerrit.cloudera.org:8080/7597 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib151c3894d4c43b508779e6809d77e2f8af89cf2 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5598: Fix excessive dumping in MemLimitExceeded
Joe McDonnell has uploaded a new patch set (#2). Change subject: IMPALA-5598: Fix excessive dumping in MemLimitExceeded .. IMPALA-5598: Fix excessive dumping in MemLimitExceeded ExecQueryFInstances RPC timeouts in stress tests were traced to excessive dumping in MemLimitExceeded() and LogUsage(). The QueryState::Init() hits the process memory limit, and this causes MemLimitExceeded to dump the process memory tracker. On the stress test, this involves dumping hundreds of queries and all the structures underneath. This is expensive and the contention between threads accessing these structures causes RPC timeouts. This adds the ability to the limit the recursive depth when dumping memory trackers. It also modifies the dumping in MemLimitExceeded() to have the following semantics: 1. The query memory tracker is always logged in full if it exists. 2. The process memory tracker is logged if the query memory tracker doesn't exist or if the process memory limit is being hit. The process memory tracker is limited to dumping only two levels of children. Other uses of LogUsage() are not limited (e.g. /memz and the query memory page on the web UI). A stress test run with the process memory tracker limited to two levels showed no RPC timeouts. Change-Id: Ib151c3894d4c43b508779e6809d77e2f8af89cf2 --- M be/src/exec/exec-node.cc M be/src/exec/hash-table-test.cc M be/src/runtime/initial-reservations.cc M be/src/runtime/mem-tracker.cc M be/src/runtime/mem-tracker.h M be/src/runtime/runtime-state.cc M be/src/service/impala-http-handler.cc M be/src/util/default-path-handlers.cc 8 files changed, 61 insertions(+), 30 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/7597/2 -- To view, visit http://gerrit.cloudera.org:8080/7597 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib151c3894d4c43b508779e6809d77e2f8af89cf2 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5598: Fix excessive dumping in MemLimitExceeded
Joe McDonnell has posted comments on this change. Change subject: IMPALA-5598: Fix excessive dumping in MemLimitExceeded .. Patch Set 1: (3 comments) Rebased all the way. http://gerrit.cloudera.org:8080/#/c/7597/1/be/src/runtime/mem-tracker.cc File be/src/runtime/mem-tracker.cc: PS1, Line 341: (process_capacity < failed_allocation_size) > nit: unnecessary parens. Done PS1, Line 344: 1 > +1 I was wondering about this as well. The pool tracker is probably not goi Changed this to two levels. Stress tests show that this isn't a problem. I also added constants for the code to use. http://gerrit.cloudera.org:8080/#/c/7597/1/be/src/runtime/mem-tracker.h File be/src/runtime/mem-tracker.h: PS1, Line 324: int max_recursive_levels = INT_MAX > Maybe we should make it a required argument to force new callsites to think Changed this to a required argument. -- To view, visit http://gerrit.cloudera.org:8080/7597 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib151c3894d4c43b508779e6809d77e2f8af89cf2 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5352: Age out unused file handles from the cache
Joe McDonnell has uploaded a new change for review. http://gerrit.cloudera.org:8080/7640 Change subject: IMPALA-5352: Age out unused file handles from the cache .. IMPALA-5352: Age out unused file handles from the cache Currently, a file handle in the file handle cache will only be evicted if the cache reaches its capacity. This means that file handles can be retained for an indefinite amount of time. This is true even for files that have been deleted, replaced, or modified. Since a file handle maintains a file descriptor for local files, this can prevent the disk space from being freed. Additionally, unused file handles are wasted memory. This adds code to evict file handles that have been unused for longer than a specified threshold. A thread periodically checks the file handle cache to see if any file handle should be evicted. The threshold is specified by 'unused_file_handle_timeout_mins'; it defaults to 6 hours. This adds a test to custom_cluster/test_hdfs_fd_caching.py to verify the eviction behavior. Change-Id: Iefe04b3e2e22123ecb8b3e494934c93dfb29682e --- M be/src/runtime/disk-io-mgr-handle-cache.h M be/src/runtime/disk-io-mgr-handle-cache.inline.h M be/src/runtime/disk-io-mgr.cc M tests/custom_cluster/test_hdfs_fd_caching.py 4 files changed, 138 insertions(+), 19 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/7640/1 -- To view, visit http://gerrit.cloudera.org:8080/7640 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Iefe04b3e2e22123ecb8b3e494934c93dfb29682e Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell
[Impala-ASF-CR] IMPALA-5158,IMPALA-5236: account for unused buffer pool reservations
Joe McDonnell has posted comments on this change. Change subject: IMPALA-5158,IMPALA-5236: account for unused buffer pool reservations .. Patch Set 6: Code-Review+1 This looks good to me. -- To view, visit http://gerrit.cloudera.org:8080/7380 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Idb1fa3110dc893321f9f4e8ced6b7ede12194dad Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5158,IMPALA-5236: account for unused buffer pool reservations
Joe McDonnell has posted comments on this change. Change subject: IMPALA-5158,IMPALA-5236: account for unused buffer pool reservations .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/7380/5/be/src/runtime/mem-tracker.h File be/src/runtime/mem-tracker.h: PS5, Line 266: uint64_t val = consumption_metric_->value(); : if (negate_consumption_metric_) val = -val; val needs to be signed. -- To view, visit http://gerrit.cloudera.org:8080/7380 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Idb1fa3110dc893321f9f4e8ced6b7ede12194dad Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes