[Impala-ASF-CR] IMPALA-7501: Slim down metastore Partition objects in LocalCatalog
Paul Rogers has abandoned this change. ( http://gerrit.cloudera.org:8080/11688 ) Change subject: IMPALA-7501: Slim down metastore Partition objects in LocalCatalog .. Abandoned Change has become too involved for a simple ramp-up task; really requires the deep understanding of the metadata team. Please incorporate the specific changes if/where they make sense. -- To view, visit http://gerrit.cloudera.org:8080/11688 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: abandon Gerrit-Change-Id: I4c861452cf974970ab511406fe2e175ea3dc668d Gerrit-Change-Number: 11688 Gerrit-PatchSet: 2 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Todd Lipcon
[Impala-ASF-CR] IMPALA-7501: Slim down metastore Partition objects in LocalCatalog
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/11688 ) Change subject: IMPALA-7501: Slim down metastore Partition objects in LocalCatalog .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/11688/2/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: http://gerrit.cloudera.org:8080/#/c/11688/2/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@807 PS2, Line 807: null, // Don't include the schema; it is not used Are we sure we don't have any other unintended consequences here? Tracking the callers of toHmsPartition(), we also use it [1] to make changes to this partition on the HMS side. Do we also wipe out the FieldSchemas on the HMS side if that happens? [1] https://github.com/apache/impala/blob/449fe73d2145bd22f0f857623c3652a097f06d73/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L3078 http://gerrit.cloudera.org:8080/#/c/11688/1/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java File fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java: http://gerrit.cloudera.org:8080/#/c/11688/1/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@1169 PS1, Line 1169: // The per-partition schema information is not used by Impala. > Good idea. In looking into this, the catalogd caches Impala-specific object I'm not quite sure I understand why we still need this on the LocalCatalog side even after stripping them on the Catalog server side. Rephrasing it, why do we hit this line with the "original Partition object" even after the fix in toHmsPartition()? -- To view, visit http://gerrit.cloudera.org:8080/11688 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4c861452cf974970ab511406fe2e175ea3dc668d Gerrit-Change-Number: 11688 Gerrit-PatchSet: 2 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Sun, 28 Oct 2018 19:52:21 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7501: Slim down metastore Partition objects in LocalCatalog
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/11688 ) Change subject: IMPALA-7501: Slim down metastore Partition objects in LocalCatalog .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/11688/1/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java File fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java: http://gerrit.cloudera.org:8080/#/c/11688/1/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@1169 PS1, Line 1169: // The per-partition schema information is not used by Impala. > Good idea. In looking into this, the catalogd caches Impala-specific object The background here is that we were hoping to be able to run Impala in something approaching "direct from HMS" mode (see DirectMetaProvider). Thus, it makes sense to expose an API that looks not too dissimilar from the HMS itself. Since then, we switched back to a design that uses catalogd as a cache, and we need to decide whether we will stick with caching in catalogd or try to push forward to pulling everything from HMS. Of course, even if we pull from HMS, we could still convert it to internal formats before caching, which might make sense. The one particular potential tricky bit here is that, currently, HdfsPartition (the internal partition object) is "relative" to a given HdfsTable object in the sense that it references the table's hostIndex_, etc. We ended up solving that here by having a cache-local host index, so maybe the same solution could be done. Another wrinkle is that HdfsPartition is, if I recall correctly, not immutable. It would be nice to have an immutable stripped-down Partition object that we could use for caching. -- To view, visit http://gerrit.cloudera.org:8080/11688 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4c861452cf974970ab511406fe2e175ea3dc668d Gerrit-Change-Number: 11688 Gerrit-PatchSet: 1 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 26 Oct 2018 23:55:17 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7501: Slim down metastore Partition objects in LocalCatalog
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11688 ) Change subject: IMPALA-7501: Slim down metastore Partition objects in LocalCatalog .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1182/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/11688 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4c861452cf974970ab511406fe2e175ea3dc668d Gerrit-Change-Number: 11688 Gerrit-PatchSet: 2 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 26 Oct 2018 23:07:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7501: Slim down metastore Partition objects in LocalCatalog
Paul Rogers has posted comments on this change. ( http://gerrit.cloudera.org:8080/11688 ) Change subject: IMPALA-7501: Slim down metastore Partition objects in LocalCatalog .. Patch Set 2: (2 comments) Updated with changes from code review comments. http://gerrit.cloudera.org:8080/#/c/11688/1/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java File fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java: http://gerrit.cloudera.org:8080/#/c/11688/1/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@1169 PS1, Line 1169: // The per-partition schema information is not used by Impala. > good idea, that would reduce network bandwidth noticeably considering we do Good idea. In looking into this, the catalogd caches Impala-specific objects, then recreates an HMS Partition object, passing in the table schema when creating the StorageDescriptor(). The fix is to simply not add the schema. When running in local catalog mode, we hit this line, with the original Partition object, so we still need this line. There was discussion on some list or other about the idea of caching the Impala catalog objects rather than the HMS objects. The Impala objects appear to be stripped down. By caching HMS objects, we will find ourselves going through the same process that likely led to those light-weight Impala objects. http://gerrit.cloudera.org:8080/#/c/11688/1/fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java File fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java: http://gerrit.cloudera.org:8080/#/c/11688/1/fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java@71 PS1, Line 71: > checkArgument Done -- To view, visit http://gerrit.cloudera.org:8080/11688 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4c861452cf974970ab511406fe2e175ea3dc668d Gerrit-Change-Number: 11688 Gerrit-PatchSet: 2 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 26 Oct 2018 22:33:23 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7501: Slim down metastore Partition objects in LocalCatalog
Hello Bharath Vissapragada, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11688 to look at the new patch set (#2). Change subject: IMPALA-7501: Slim down metastore Partition objects in LocalCatalog .. IMPALA-7501: Slim down metastore Partition objects in LocalCatalog Used existing method to remove cached per-partition column information as requested in the JIRA ticket. See IMPALA-7501 for details. Then, revised catalogd to not include the unwanted partition schema info when re-creating an HMS parition object. Cleaned up a few unused imports in test files used during this effort. Tested by inspecting a heap dump before and after the change to ensure that the column schema is no longer referenced from the Hive Partition objects. (At present, we have no automated way to inspect a heap dump.) Change-Id: I4c861452cf974970ab511406fe2e175ea3dc668d --- M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoTest.java 5 files changed, 5 insertions(+), 15 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/88/11688/2 -- To view, visit http://gerrit.cloudera.org:8080/11688 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4c861452cf974970ab511406fe2e175ea3dc668d Gerrit-Change-Number: 11688 Gerrit-PatchSet: 2 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Todd Lipcon
[Impala-ASF-CR] IMPALA-7501: Slim down metastore Partition objects in LocalCatalog cache
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/11688 ) Change subject: IMPALA-7501: Slim down metastore Partition objects in LocalCatalog cache .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/11688/1/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java File fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java: http://gerrit.cloudera.org:8080/#/c/11688/1/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@1169 PS1, Line 1169: // The per-partition schema information is not used by Impala. > Can we slim them down even before serializing it at the Catalog server? Mig good idea, that would reduce network bandwidth noticeably considering we don't currently do compression on the wire -- To view, visit http://gerrit.cloudera.org:8080/11688 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4c861452cf974970ab511406fe2e175ea3dc668d Gerrit-Change-Number: 11688 Gerrit-PatchSet: 1 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 26 Oct 2018 17:00:22 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7501: Slim down metastore Partition objects in LocalCatalog cache
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/11688 ) Change subject: IMPALA-7501: Slim down metastore Partition objects in LocalCatalog cache .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/11688/1/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java File fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java: http://gerrit.cloudera.org:8080/#/c/11688/1/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@1169 PS1, Line 1169: // The per-partition schema information is not used by Impala. Can we slim them down even before serializing it at the Catalog server? Might as well save some CPU and RPC costs https://github.com/apache/impala/blob/1dbb3c1be634c2a3e356c3b2ee5deac2436c9815/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L1727 -- To view, visit http://gerrit.cloudera.org:8080/11688 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4c861452cf974970ab511406fe2e175ea3dc668d Gerrit-Change-Number: 11688 Gerrit-PatchSet: 1 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 26 Oct 2018 01:21:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7501: Slim down metastore Partition objects in LocalCatalog cache
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/11688 ) Change subject: IMPALA-7501: Slim down metastore Partition objects in LocalCatalog cache .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/11688/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11688/1//COMMIT_MSG@14 PS1, Line 14: objects. (At present, we have no automated way to inspect a heap dump.) fwiw I tested changes like this in the past by starting an impalad, querying some table(s) to populate the cache, and then using 'jmap -histo:live' on the impalad. See 1dd4403fdcae665b0f343bf3fb7cb644c1875851's commit message for an example along with a script that can be handy to compare before/after jmap histo dumps. Not super critical to go back and do it for this change if you're busy with other stuff. http://gerrit.cloudera.org:8080/#/c/11688/1/fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java File fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java: http://gerrit.cloudera.org:8080/#/c/11688/1/fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java@71 PS1, Line 71: checkState checkArgument -- To view, visit http://gerrit.cloudera.org:8080/11688 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4c861452cf974970ab511406fe2e175ea3dc668d Gerrit-Change-Number: 11688 Gerrit-PatchSet: 1 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 25 Oct 2018 23:36:21 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7501: Slim down metastore Partition objects in LocalCatalog cache
Paul Rogers has posted comments on this change. ( http://gerrit.cloudera.org:8080/11688 ) Change subject: IMPALA-7501: Slim down metastore Partition objects in LocalCatalog cache .. Patch Set 1: Hey Todd, you asked for this fix. Can you take a look at this change? Thanks! -- To view, visit http://gerrit.cloudera.org:8080/11688 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4c861452cf974970ab511406fe2e175ea3dc668d Gerrit-Change-Number: 11688 Gerrit-PatchSet: 1 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 25 Oct 2018 23:25:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7501: Slim down metastore Partition objects in LocalCatalog cache
Paul Rogers has uploaded this change for review. ( http://gerrit.cloudera.org:8080/11688 Change subject: IMPALA-7501: Slim down metastore Partition objects in LocalCatalog cache .. IMPALA-7501: Slim down metastore Partition objects in LocalCatalog cache Used existing method to remove cached per-partition column information as requested in the JIRA ticket. See IMPALA-7501 for details. Tested by inspecting a heap dump before and after the change to ensure that the column schema is no longer referenced from the Hive Partition objects. (At present, we have no automated way to inspect a heap dump.) Change-Id: I4c861452cf974970ab511406fe2e175ea3dc668d --- M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java 2 files changed, 7 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/88/11688/1 -- To view, visit http://gerrit.cloudera.org:8080/11688 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I4c861452cf974970ab511406fe2e175ea3dc668d Gerrit-Change-Number: 11688 Gerrit-PatchSet: 1 Gerrit-Owner: Paul Rogers