[Impala-ASF-CR] IMPALA-6994: Avoid reloading a table's HMS data for file-only operations
Pranay Singh has abandoned this change. ( http://gerrit.cloudera.org:8080/10587 ) Change subject: IMPALA-6994: Avoid reloading a table's HMS data for file-only operations .. Abandoned -- To view, visit http://gerrit.cloudera.org:8080/10587 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: abandon Gerrit-Change-Id: I331bb0371fde287f43a85b025b4f98cb45f3eb3c Gerrit-Change-Number: 10587 Gerrit-PatchSet: 4 Gerrit-Owner: Pranay Singh (320) Gerrit-Reviewer: Balazs Jeszenszky Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Pranay Singh (320) Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-6994: Avoid reloading a table's HMS data for file-only operations
Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/10587 ) Change subject: IMPALA-6994: Avoid reloading a table's HMS data for file-only operations .. Patch Set 4: (10 comments) http://gerrit.cloudera.org:8080/#/c/10587/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/10587/4//COMMIT_MSG@9 PS4, Line 9: new row this should apply more generally to new files, not just rows? either way, the wording here is inconsistent with the title, which refers to "file-only operations". http://gerrit.cloudera.org:8080/#/c/10587/4//COMMIT_MSG@17 PS4, Line 17: the all? which partitions? http://gerrit.cloudera.org:8080/#/c/10587/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/10587/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1417 PS4, Line 1417: a nit: an http://gerrit.cloudera.org:8080/#/c/10587/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1420 PS4, Line 1420: by calling nit: from http://gerrit.cloudera.org:8080/#/c/10587/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/10587/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@254 PS4, Line 254: reloads the table schema from Hive Meta Store can in-line this comment directly on L259. same with the other descriptions of what each flag does when set. doing so will remove some redundancy in the comment. http://gerrit.cloudera.org:8080/#/c/10587/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@585 PS4, Line 585: noneOf(MetaDataLoadFlag.class); : flags.add simplify to .of(...) since we're always adding RELOAD_PARTITION_METADATA. http://gerrit.cloudera.org:8080/#/c/10587/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@663 PS4, Line 663: MetaDataLoadFlag of type 'RELOAD_TABLE_METADATA', 'RELOAD_PARTITION_METADATA', :* 'RELOAD_FILE_METADATA' replace with just flags. http://gerrit.cloudera.org:8080/#/c/10587/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@670 PS4, Line 670: msTbl why is this pulled out now as a parameter. I see one place where its used-- what requires this change? if its really needed, please add a comment (e.g., what does a null msTbl do?) http://gerrit.cloudera.org:8080/#/c/10587/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3484 PS4, Line 3484: noneOf(MetaDataLoadFlag.class); : flags.add(MetaDataLoadFlag.RELOAD_FILE_METADATA) simplify to of(...) since the flag on L3485 is always included. http://gerrit.cloudera.org:8080/#/c/10587/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3486 PS4, Line 3486: == true remove -- To view, visit http://gerrit.cloudera.org:8080/10587 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I331bb0371fde287f43a85b025b4f98cb45f3eb3c Gerrit-Change-Number: 10587 Gerrit-PatchSet: 4 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Balazs Jeszenszky Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Pranay Singh Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Wed, 01 Aug 2018 23:26:11 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6994: Avoid reloading a table's HMS data for file-only operations
Hello Bharath Vissapragada, Balazs Jeszenszky, Todd Lipcon, Vuk Ercegovac, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10587 to look at the new patch set (#4). Change subject: IMPALA-6994: Avoid reloading a table's HMS data for file-only operations .. IMPALA-6994: Avoid reloading a table's HMS data for file-only operations The problem is that while inserting a new row to an existing partition of a HDFS table or to an unpartitioned table, catalogd makes an unnecessary call to Hive MetaStore to do a getTable() and list the partitions of the table. In such a case where a row is being inserted to an existing partition or to an unpartitioned table only the file metadata for HDFS tables needs to be updated. This change avoids calling the Hive Meta Store in updatePartitionsFromHms() by providing a hint from the caller, to skip loading the partitions. Testing: Ran core test without failure. Change-Id: I331bb0371fde287f43a85b025b4f98cb45f3eb3c --- M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java 2 files changed, 126 insertions(+), 52 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/10587/4 -- To view, visit http://gerrit.cloudera.org:8080/10587 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I331bb0371fde287f43a85b025b4f98cb45f3eb3c Gerrit-Change-Number: 10587 Gerrit-PatchSet: 4 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Balazs Jeszenszky Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Pranay Singh Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-6994: Avoid reloading a table's HMS data for file-only operations
Pranay Singh has posted comments on this change. ( http://gerrit.cloudera.org:8080/10587 ) Change subject: IMPALA-6994: Avoid reloading a table's HMS data for file-only operations .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/10587/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/10587/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1251 PS3, Line 1251: RELOAD_PARTITION_METADATA if not set in flags results in not getting the :* partition details from HiveMetaStore in updatePartitionsFromHms(). > nit: this double negative is a bit confusing. Perhaps it's clearer to rephr Changed the comment, thanks for rewording it. http://gerrit.cloudera.org:8080/#/c/10587/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1254 PS3, Line 1254: or if there are no partitions in the table > I'm not sure I follow this part. How does the caller know if there are no p That's right the unpartitioned case doesn't have to consider this, I have removed the sentence. -- To view, visit http://gerrit.cloudera.org:8080/10587 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I331bb0371fde287f43a85b025b4f98cb45f3eb3c Gerrit-Change-Number: 10587 Gerrit-PatchSet: 3 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Balazs Jeszenszky Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Pranay Singh Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 20 Jul 2018 18:57:05 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6994: Avoid reloading a table's HMS data for file-only operations
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/10587 ) Change subject: IMPALA-6994: Avoid reloading a table's HMS data for file-only operations .. Patch Set 3: (3 comments) Just left some notes on the HdfsTable side. I dont know the CatalogOpExecutor code well so probably better for someone else to take a look at that part. http://gerrit.cloudera.org:8080/#/c/10587/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/10587/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1251 PS3, Line 1251: RELOAD_PARTITION_METADATA if not set in flags results in not getting the :* partition details from HiveMetaStore in updatePartitionsFromHms(). nit: this double negative is a bit confusing. Perhaps it's clearer to rephrase like: If RELOAD_PARTITION_METADATA is set in flags, the partition details will be reloaded from the HMS. In some cases, this may not be necessary, and when unset, an expensive call to the HMS can be avoided. http://gerrit.cloudera.org:8080/#/c/10587/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1254 PS3, Line 1254: or if there are no partitions in the table I'm not sure I follow this part. How does the caller know if there are no partitions in the table without fetching the list of partitions? In the case of an unpartitioned table (no partition keys) the code already takes care of this internally without the caller having to pass it, right? http://gerrit.cloudera.org:8080/#/c/10587/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1418 PS3, Line 1418: Updates the partitions of an HdfsTable so that they are in sync with the Hive :* Metastore. this doesn't seem to be an accurate description of the function, since the function also reloads file metadata, right? Maybe the function should just be called updatePartitionMetadata? Then it's clearer that the flags specify whether we want to update the file metadata, the list of partitions, or both, right? -- To view, visit http://gerrit.cloudera.org:8080/10587 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I331bb0371fde287f43a85b025b4f98cb45f3eb3c Gerrit-Change-Number: 10587 Gerrit-PatchSet: 3 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Balazs Jeszenszky Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Pranay Singh Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 20 Jul 2018 16:57:43 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6994:Avoid reloading a table's HMS data for file-only operations
Pranay Singh has posted comments on this change. ( http://gerrit.cloudera.org:8080/10587 ) Change subject: IMPALA-6994:Avoid reloading a table's HMS data for file-only operations .. Patch Set 2: (8 comments) http://gerrit.cloudera.org:8080/#/c/10587/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/10587/2//COMMIT_MSG@7 PS2, Line 7: IMPALA-6994:Avoid reloading a table's HMS data for file-only operations > nit: add a space Done http://gerrit.cloudera.org:8080/#/c/10587/2//COMMIT_MSG@10 PS2, Line 10: unecessary > nit: check spelling and remove extra spaces. Done http://gerrit.cloudera.org:8080/#/c/10587/2//COMMIT_MSG@9 PS2, Line 9: unpartitioned HDFS table, : or to an existing partition > simplify to just HDFS table, or is there some case that's being excluded? Done http://gerrit.cloudera.org:8080/#/c/10587/2//COMMIT_MSG@21 PS2, Line 21: Testing: Ran core test without failure. > Is there any new scenario that you want to test for this change? If existin Yes I want to test the scenario where concurrent removal of partition and insertion of row to the same partition happens. a) Impala is inserting the row to an existing partition b) Hive is removing the partition at the same time. http://gerrit.cloudera.org:8080/#/c/10587/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/10587/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1252 PS2, Line 1252: nonewPartitionHint > needs a comment. Done http://gerrit.cloudera.org:8080/#/c/10587/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1252 PS2, Line 1252: nonewPartitionHint > needs a comment. Added the comment http://gerrit.cloudera.org:8080/#/c/10587/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1365 PS2, Line 1365: loadMetadataAndDiskIds(partitionsToUpdateFileMdByPath, true); > any way to refactor this function so you don't need to duplicate this code Refactored the code and added a new function to reload the partition. http://gerrit.cloudera.org:8080/#/c/10587/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/10587/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@640 PS2, Line 640: private void loadTableMetadata(Table tbl, long newCatalogVersion, > the arguments here are getting quite unwieldly, and it seems that the two e Made changes to the code to use enum MetadataLoadFlag -- To view, visit http://gerrit.cloudera.org:8080/10587 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I331bb0371fde287f43a85b025b4f98cb45f3eb3c Gerrit-Change-Number: 10587 Gerrit-PatchSet: 2 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Pranay Singh Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Thu, 19 Jul 2018 18:27:08 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6994: Avoid reloading a table's HMS data for file-only operations
Hello Bharath Vissapragada, Todd Lipcon, Vuk Ercegovac, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10587 to look at the new patch set (#3). Change subject: IMPALA-6994: Avoid reloading a table's HMS data for file-only operations .. IMPALA-6994: Avoid reloading a table's HMS data for file-only operations The problem is that while inserting a new row to an existing partition of a HDFS table or to an unpartitioned table, catalogd makes an unnecessary call to Hive MetaStore to do a getTable() and list the partitions of the table. In such a case where a row is being inserted to an existing partition or to an unpartitioned table only the file metadata for HDFS tables needs to be updated. This change avoids calling the Hive Meta Store in updatePartitionsFromHms() by providing a hint from the caller, to skip loading the partitions. Testing: Ran core test without failure. Change-Id: I331bb0371fde287f43a85b025b4f98cb45f3eb3c --- M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java 2 files changed, 126 insertions(+), 52 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/10587/3 -- To view, visit http://gerrit.cloudera.org:8080/10587 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I331bb0371fde287f43a85b025b4f98cb45f3eb3c Gerrit-Change-Number: 10587 Gerrit-PatchSet: 3 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Pranay Singh Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-6994:Avoid reloading a table's HMS data for file-only operations
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/10587 ) Change subject: IMPALA-6994:Avoid reloading a table's HMS data for file-only operations .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/10587/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/10587/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1365 PS2, Line 1365: loadMetadataAndDiskIds(partitionsToUpdateFileMdByPath, true); any way to refactor this function so you don't need to duplicate this code from line 1429-1435 up to here? perhaps extracting another function for all of the partition-reloading code and only call it when you dont have the 'no new partitions' hint? http://gerrit.cloudera.org:8080/#/c/10587/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/10587/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@640 PS2, Line 640: private void loadTableMetadata(Table tbl, long newCatalogVersion, the arguments here are getting quite unwieldly, and it seems that the two existing booleans have a positive sense ("DO reload filemetadata" and "DO reload table schema") whereas this new boolean has a negative sense "DONT reload data from HMS". How about refactoring this slightly to use an enum and an EnumSet, something like: static enum MetadataLoadFlag { RELOAD_TABLE_METADATA, RELOAD_PARTITION_METADATA, RELOAD_FILE_METADATA } void loadTableMetadata(Table tbl, long newCatalogVersion, EnumSet flags, Set partitionsToUpdate, Table msTable); .. and perhaps add Preconditions that validate that the optional msTable, partitionsToUpdate, etc params are only set when it makes sense according to the flags? I think this would make the call sites and code a lot more understandable. -- To view, visit http://gerrit.cloudera.org:8080/10587 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I331bb0371fde287f43a85b025b4f98cb45f3eb3c Gerrit-Change-Number: 10587 Gerrit-PatchSet: 2 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 06 Jul 2018 17:54:11 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6994:Avoid reloading a table's HMS data for file-only operations
Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/10587 ) Change subject: IMPALA-6994:Avoid reloading a table's HMS data for file-only operations .. Patch Set 2: (8 comments) http://gerrit.cloudera.org:8080/#/c/10587/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/10587/2//COMMIT_MSG@7 PS2, Line 7: IMPALA-6994:Avoid reloading a table's HMS data for file-only operations nit: add a space http://gerrit.cloudera.org:8080/#/c/10587/2//COMMIT_MSG@9 PS2, Line 9: remove the description block's indent http://gerrit.cloudera.org:8080/#/c/10587/2//COMMIT_MSG@10 PS2, Line 10: unecessary nit: check spelling and remove extra spaces. http://gerrit.cloudera.org:8080/#/c/10587/2//COMMIT_MSG@9 PS2, Line 9: unpartitioned HDFS table, : or to an existing partition simplify to just HDFS table, or is there some case that's being excluded? http://gerrit.cloudera.org:8080/#/c/10587/2//COMMIT_MSG@14 PS2, Line 14: This extra call to HMS introduces a point of failure, we need to handle : for error scenario when Hive MetaStore crashes So is this change an optimization, a correctness/robustness fix, or both? Also, I don't understand the point about a failure-- pls add an example with a sequence of steps that currently leads to an error. Can add such an example to the jira. http://gerrit.cloudera.org:8080/#/c/10587/2//COMMIT_MSG@15 PS2, Line 15: This change removes the : extra call to Hive Meta Store for the case when a row is inserted to an : existing partition in HDFS table or when a row is added to unpartitioned : table. Thus, an optimization as it reduces the call to Hive Meta Store : during Update of Catalog. lots of redundancy here with the prev. paragraph-- pls condense or remove. http://gerrit.cloudera.org:8080/#/c/10587/2//COMMIT_MSG@21 PS2, Line 21: Testing: Ran core test without failure. Is there any new scenario that you want to test for this change? If existing tests cover it, is there any way to assert that this change is operating as expected? http://gerrit.cloudera.org:8080/#/c/10587/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/10587/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1252 PS2, Line 1252: nonewPartitionHint needs a comment. -- To view, visit http://gerrit.cloudera.org:8080/10587 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I331bb0371fde287f43a85b025b4f98cb45f3eb3c Gerrit-Change-Number: 10587 Gerrit-PatchSet: 2 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 06 Jul 2018 16:19:48 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6994:Avoid reloading a table's HMS data for file-only operations
Pranay Singh has uploaded this change for review. ( http://gerrit.cloudera.org:8080/10587 Change subject: IMPALA-6994:Avoid reloading a table's HMS data for file-only operations .. IMPALA-6994:Avoid reloading a table's HMS data for file-only operations The problem is that while inserting a new row to an unpartitioned HDFS table, or to an existing partition, the catalogd makes unecessary call to Hive Meta Store to do a getTable() and list the partitions of the table, when in fact only the file metadata for HDFS tables needs to be updated. This extra call to HMS introduces a point of failure, we need to handle for error scenario when Hive MetaStore crashes. This change removes the extra call to Hive Meta Store for the case when a row is inserted to an existing partition in HDFS table or when a row is added to unpartitioned table. Thus, an optimization as it reduces the call to Hive Meta Store during Update of Catalog. Testing: Ran core test without failure. Change-Id: I331bb0371fde287f43a85b025b4f98cb45f3eb3c --- M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java 2 files changed, 34 insertions(+), 17 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/10587/1 -- To view, visit http://gerrit.cloudera.org:8080/10587 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I331bb0371fde287f43a85b025b4f98cb45f3eb3c Gerrit-Change-Number: 10587 Gerrit-PatchSet: 1 Gerrit-Owner: Pranay Singh
[Impala-ASF-CR] IMPALA-6994: Avoid reloading a table's HMS data for file-only operations.
Pranay Singh has abandoned this change. ( http://gerrit.cloudera.org:8080/10450 ) Change subject: IMPALA-6994: Avoid reloading a table's HMS data for file-only operations. .. Abandoned -- To view, visit http://gerrit.cloudera.org:8080/10450 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: abandon Gerrit-Change-Id: Iaabdf38af3f30c65ada9734eb471dbfa6ecdd74a Gerrit-Change-Number: 10450 Gerrit-PatchSet: 2 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Alex BehmGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Pranay Singh Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-6994: Avoid reloading a table's HMS data for file-only operations.
Pranay Singh has posted comments on this change. ( http://gerrit.cloudera.org:8080/10450 ) Change subject: IMPALA-6994: Avoid reloading a table's HMS data for file-only operations. .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/10450/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/10450/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1410 PS2, Line 1410: partitionsToUpdateFileMdByPath = getPartitionsByPath(partitionsToUpdate); : loadMetadataAndDiskIds(partitionsToUpdateFileMdByPath, true); > I don't think so. In line 1404, the dirtyPartitions are all added to partit Yes missed on that one, I'll rollback this change. -- To view, visit http://gerrit.cloudera.org:8080/10450 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iaabdf38af3f30c65ada9734eb471dbfa6ecdd74a Gerrit-Change-Number: 10450 Gerrit-PatchSet: 2 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Alex BehmGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Pranay Singh Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 21 May 2018 15:53:22 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6994: Avoid reloading a table's HMS data for file-only operations.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/10450 ) Change subject: IMPALA-6994: Avoid reloading a table's HMS data for file-only operations. .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/10450/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/10450/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1410 PS2, Line 1410: partitionsToUpdateFileMdByPath = getPartitionsByPath(partitionsToUpdate); : loadMetadataAndDiskIds(partitionsToUpdateFileMdByPath, true); > The new change behaves very much like the old code except for the case when I don't think so. In line 1404, the dirtyPartitions are all added to partitionsToRemove. When dirtyPartitions exist, partitionsToRemove won't be empty. Thus the if-branch won't be chosen and the else-branch performs the same as the old codes. -- To view, visit http://gerrit.cloudera.org:8080/10450 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iaabdf38af3f30c65ada9734eb471dbfa6ecdd74a Gerrit-Change-Number: 10450 Gerrit-PatchSet: 2 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Alex BehmGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Pranay Singh Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Sun, 20 May 2018 01:27:25 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6994: Avoid reloading a table's HMS data for file-only operations.
Pranay Singh has posted comments on this change. ( http://gerrit.cloudera.org:8080/10450 ) Change subject: IMPALA-6994: Avoid reloading a table's HMS data for file-only operations. .. Patch Set 2: > > (2 comments) > > > > Just interested in this optimization. May I ask some questions? > > > > Looks like we are optimizing the case when partitionsToUpdate != > > null and partitions were neither dropped, created. Can we > optimize > > the case that partitionsToUpdate != null and some partitions are > > dropped? For example when an INSERT OVERWRITE statement updates > the > > majority of the partitions and only drops few of them. > > Will this case not introduce inconsistency between HMS and Impala? The problem with optimization is that introduces inconsistency, something which my change introduces too, when ALTER TABLE is done and HMS crashes. -- To view, visit http://gerrit.cloudera.org:8080/10450 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iaabdf38af3f30c65ada9734eb471dbfa6ecdd74a Gerrit-Change-Number: 10450 Gerrit-PatchSet: 2 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Alex BehmGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Pranay Singh Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Sat, 19 May 2018 18:18:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6994: Avoid reloading a table's HMS data for file-only operations.
Pranay Singh has posted comments on this change. ( http://gerrit.cloudera.org:8080/10450 ) Change subject: IMPALA-6994: Avoid reloading a table's HMS data for file-only operations. .. Patch Set 2: > (2 comments) > > Just interested in this optimization. May I ask some questions? > > Looks like we are optimizing the case when partitionsToUpdate != > null and partitions were neither dropped, created. Can we optimize > the case that partitionsToUpdate != null and some partitions are > dropped? For example when an INSERT OVERWRITE statement updates the > majority of the partitions and only drops few of them. Will this case not introduce inconsistency between HMS and Impala? -- To view, visit http://gerrit.cloudera.org:8080/10450 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iaabdf38af3f30c65ada9734eb471dbfa6ecdd74a Gerrit-Change-Number: 10450 Gerrit-PatchSet: 2 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Alex BehmGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Pranay Singh Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Sat, 19 May 2018 18:15:20 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6994: Avoid reloading a table's HMS data for file-only operations.
Pranay Singh has posted comments on this change. ( http://gerrit.cloudera.org:8080/10450 ) Change subject: IMPALA-6994: Avoid reloading a table's HMS data for file-only operations. .. Patch Set 2: (2 comments) > (2 comments) > > Just interested in this optimization. May I ask some questions? > > Looks like we are optimizing the case when partitionsToUpdate != > null and partitions were neither dropped, created. Can we optimize > the case that partitionsToUpdate != null and some partitions are > dropped? For example when an INSERT OVERWRITE statement updates the > majority of the partitions and only drops few of them. Will this case not cause introduce inconsistency between HMS and Impala ? http://gerrit.cloudera.org:8080/#/c/10450/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/10450/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1409 PS2, Line 1409: size() == 0 > nit: can be simplified by isEmpty() OK http://gerrit.cloudera.org:8080/#/c/10450/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1410 PS2, Line 1410: partitionsToUpdateFileMdByPath = getPartitionsByPath(partitionsToUpdate); : loadMetadataAndDiskIds(partitionsToUpdateFileMdByPath, true); > Looks like the original codes perform the same as these two lines. Since dr The new change behaves very much like the old code except for the case when dirtyPartitions exist in that case there is an overhead of dropping and loading the dirty partitions from Metastore. -- To view, visit http://gerrit.cloudera.org:8080/10450 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iaabdf38af3f30c65ada9734eb471dbfa6ecdd74a Gerrit-Change-Number: 10450 Gerrit-PatchSet: 2 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Alex BehmGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Pranay Singh Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Sat, 19 May 2018 18:14:42 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6994: Avoid reloading a table's HMS data for file-only operations.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/10450 ) Change subject: IMPALA-6994: Avoid reloading a table's HMS data for file-only operations. .. Patch Set 2: (2 comments) Just interested in this optimization. May I ask some questions? Looks like we are optimizing the case when partitionsToUpdate != null and partitions were neither dropped, created. Can we optimize the case that partitionsToUpdate != null and some partitions are dropped? For example when an INSERT OVERWRITE statement updates the majority of the partitions and only drops few of them. http://gerrit.cloudera.org:8080/#/c/10450/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/10450/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1409 PS2, Line 1409: size() == 0 nit: can be simplified by isEmpty() http://gerrit.cloudera.org:8080/#/c/10450/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1410 PS2, Line 1410: partitionsToUpdateFileMdByPath = getPartitionsByPath(partitionsToUpdate); : loadMetadataAndDiskIds(partitionsToUpdateFileMdByPath, true); Looks like the original codes perform the same as these two lines. Since dropPartitions, loadPartitionsFromMetastore and loadPartitionsFromMetastore will return fast when their first parameter is empty. partitionExist is not empty means partitionsToUpdate != null, so the original codes will perform as these two lines. My question is what are the HMS requests we save? -- To view, visit http://gerrit.cloudera.org:8080/10450 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iaabdf38af3f30c65ada9734eb471dbfa6ecdd74a Gerrit-Change-Number: 10450 Gerrit-PatchSet: 2 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Alex BehmGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Sat, 19 May 2018 13:17:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6994: Avoid reloading a table's HMS data for file-only operations.
Pranay Singh has uploaded this change for review. ( http://gerrit.cloudera.org:8080/10450 Change subject: IMPALA-6994: Avoid reloading a table's HMS data for file-only operations. .. IMPALA-6994: Avoid reloading a table's HMS data for file-only operations. Catalog currently loads the HMS data even, if only the file metadata load is requested. This change detects some of those updates and tries to skip the loading of HMS data if only the file metadata load is requested. Testing: Ran the core test without failures. Change-Id: Iaabdf38af3f30c65ada9734eb471dbfa6ecdd74a --- M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java 1 file changed, 39 insertions(+), 23 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/10450/1 -- To view, visit http://gerrit.cloudera.org:8080/10450 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Iaabdf38af3f30c65ada9734eb471dbfa6ecdd74a Gerrit-Change-Number: 10450 Gerrit-PatchSet: 1 Gerrit-Owner: Pranay Singh