[Impala-ASF-CR] IMPALA-7501: Slim down metastore Partition objects in LocalCatalog

2018-11-01 Thread Paul Rogers (Code Review)
Paul Rogers has abandoned this change. ( http://gerrit.cloudera.org:8080/11688 )

Change subject: IMPALA-7501: Slim down metastore Partition objects in 
LocalCatalog
..


Abandoned

Change has become too involved for a simple ramp-up task; really requires the 
deep understanding of the metadata team. Please incorporate the specific 
changes if/where they make sense.
--
To view, visit http://gerrit.cloudera.org:8080/11688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: I4c861452cf974970ab511406fe2e175ea3dc668d
Gerrit-Change-Number: 11688
Gerrit-PatchSet: 2
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Todd Lipcon 


[Impala-ASF-CR] IMPALA-7501: Slim down metastore Partition objects in LocalCatalog

2018-10-28 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11688 )

Change subject: IMPALA-7501: Slim down metastore Partition objects in 
LocalCatalog
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/11688/2/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

http://gerrit.cloudera.org:8080/#/c/11688/2/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@807
PS2, Line 807: null, // Don't include the schema; it is not used
Are we sure we don't have any other unintended consequences here? Tracking the 
callers of toHmsPartition(), we also use it [1] to make changes to this 
partition on the HMS side. Do we also wipe out the FieldSchemas on the HMS side 
if that happens?

[1] 
https://github.com/apache/impala/blob/449fe73d2145bd22f0f857623c3652a097f06d73/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L3078


http://gerrit.cloudera.org:8080/#/c/11688/1/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
File fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java:

http://gerrit.cloudera.org:8080/#/c/11688/1/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@1169
PS1, Line 1169:   // The per-partition schema information is not used by 
Impala.
> Good idea. In looking into this, the catalogd caches Impala-specific object
I'm not quite sure I understand why we still need this on the LocalCatalog side 
even after stripping them on the Catalog server side.

Rephrasing it, why do we hit this line with the "original Partition object" 
even after the fix in toHmsPartition()?



--
To view, visit http://gerrit.cloudera.org:8080/11688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4c861452cf974970ab511406fe2e175ea3dc668d
Gerrit-Change-Number: 11688
Gerrit-PatchSet: 2
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Sun, 28 Oct 2018 19:52:21 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7501: Slim down metastore Partition objects in LocalCatalog

2018-10-26 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11688 )

Change subject: IMPALA-7501: Slim down metastore Partition objects in 
LocalCatalog
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11688/1/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
File fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java:

http://gerrit.cloudera.org:8080/#/c/11688/1/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@1169
PS1, Line 1169:   // The per-partition schema information is not used by 
Impala.
> Good idea. In looking into this, the catalogd caches Impala-specific object
The background here is that we were hoping to be able to run Impala in 
something approaching "direct from HMS" mode (see DirectMetaProvider). Thus, it 
makes sense to expose an API that looks not too dissimilar from the HMS itself.

Since then, we switched back to a design that uses catalogd as a cache, and we 
need to decide whether we will stick with caching in catalogd or try to push 
forward to pulling everything from HMS.

Of course, even if we pull from HMS, we could still convert it to internal 
formats before caching, which might make sense. The one particular potential 
tricky bit here is that, currently, HdfsPartition (the internal partition 
object) is "relative" to a given HdfsTable object in the sense that it 
references the table's hostIndex_, etc. We ended up solving that here by having 
a cache-local host index, so maybe the same solution could be done.

Another wrinkle is that HdfsPartition is, if I recall correctly, not immutable. 
It would be nice to have an immutable stripped-down Partition object that we 
could use for caching.



--
To view, visit http://gerrit.cloudera.org:8080/11688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4c861452cf974970ab511406fe2e175ea3dc668d
Gerrit-Change-Number: 11688
Gerrit-PatchSet: 1
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 26 Oct 2018 23:55:17 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7501: Slim down metastore Partition objects in LocalCatalog

2018-10-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11688 )

Change subject: IMPALA-7501: Slim down metastore Partition objects in 
LocalCatalog
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/1182/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/11688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4c861452cf974970ab511406fe2e175ea3dc668d
Gerrit-Change-Number: 11688
Gerrit-PatchSet: 2
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 26 Oct 2018 23:07:16 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7501: Slim down metastore Partition objects in LocalCatalog

2018-10-26 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11688 )

Change subject: IMPALA-7501: Slim down metastore Partition objects in 
LocalCatalog
..


Patch Set 2:

(2 comments)

Updated with changes from code review comments.

http://gerrit.cloudera.org:8080/#/c/11688/1/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
File fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java:

http://gerrit.cloudera.org:8080/#/c/11688/1/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@1169
PS1, Line 1169:   // The per-partition schema information is not used by 
Impala.
> good idea, that would reduce network bandwidth noticeably considering we do
Good idea. In looking into this, the catalogd caches Impala-specific objects, 
then recreates an HMS Partition object, passing in the table schema when 
creating the StorageDescriptor(). The fix is to simply not add the schema.

When running in local catalog mode, we hit this line, with the original 
Partition object, so we still need this line.

There was discussion on some list or other about the idea of caching the Impala 
catalog objects rather than the HMS objects. The Impala objects appear to be 
stripped down. By caching HMS objects, we will find ourselves going through the 
same process that likely led to those light-weight Impala objects.


http://gerrit.cloudera.org:8080/#/c/11688/1/fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java:

http://gerrit.cloudera.org:8080/#/c/11688/1/fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java@71
PS1, Line 71:
> checkArgument
Done



--
To view, visit http://gerrit.cloudera.org:8080/11688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4c861452cf974970ab511406fe2e175ea3dc668d
Gerrit-Change-Number: 11688
Gerrit-PatchSet: 2
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 26 Oct 2018 22:33:23 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7501: Slim down metastore Partition objects in LocalCatalog

2018-10-26 Thread Paul Rogers (Code Review)
Hello Bharath Vissapragada, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11688

to look at the new patch set (#2).

Change subject: IMPALA-7501: Slim down metastore Partition objects in 
LocalCatalog
..

IMPALA-7501: Slim down metastore Partition objects in LocalCatalog

Used existing method to remove cached per-partition column information
as requested in the JIRA ticket. See IMPALA-7501 for details.

Then, revised catalogd to not include the unwanted partition schema
info when re-creating an HMS parition object.

Cleaned up a few unused imports in test files used during this effort.

Tested by inspecting a heap dump before and after the change to ensure
that the column schema is no longer referenced from the Hive Partition
objects. (At present, we have no automated way to inspect a heap dump.)

Change-Id: I4c861452cf974970ab511406fe2e175ea3dc668d
---
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoTest.java
5 files changed, 5 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/88/11688/2
--
To view, visit http://gerrit.cloudera.org:8080/11688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4c861452cf974970ab511406fe2e175ea3dc668d
Gerrit-Change-Number: 11688
Gerrit-PatchSet: 2
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Todd Lipcon 


[Impala-ASF-CR] IMPALA-7501: Slim down metastore Partition objects in LocalCatalog cache

2018-10-26 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11688 )

Change subject: IMPALA-7501: Slim down metastore Partition objects in 
LocalCatalog cache
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11688/1/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
File fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java:

http://gerrit.cloudera.org:8080/#/c/11688/1/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@1169
PS1, Line 1169:   // The per-partition schema information is not used by 
Impala.
> Can we slim them down even before serializing it at the Catalog server? Mig
good idea, that would reduce network bandwidth noticeably considering we don't 
currently do compression on the wire



--
To view, visit http://gerrit.cloudera.org:8080/11688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4c861452cf974970ab511406fe2e175ea3dc668d
Gerrit-Change-Number: 11688
Gerrit-PatchSet: 1
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 26 Oct 2018 17:00:22 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7501: Slim down metastore Partition objects in LocalCatalog cache

2018-10-25 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11688 )

Change subject: IMPALA-7501: Slim down metastore Partition objects in 
LocalCatalog cache
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11688/1/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
File fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java:

http://gerrit.cloudera.org:8080/#/c/11688/1/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@1169
PS1, Line 1169:   // The per-partition schema information is not used by 
Impala.
Can we slim them down even before serializing it at the Catalog server? Might 
as well save some CPU and RPC costs

https://github.com/apache/impala/blob/1dbb3c1be634c2a3e356c3b2ee5deac2436c9815/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L1727



--
To view, visit http://gerrit.cloudera.org:8080/11688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4c861452cf974970ab511406fe2e175ea3dc668d
Gerrit-Change-Number: 11688
Gerrit-PatchSet: 1
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 26 Oct 2018 01:21:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7501: Slim down metastore Partition objects in LocalCatalog cache

2018-10-25 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11688 )

Change subject: IMPALA-7501: Slim down metastore Partition objects in 
LocalCatalog cache
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/11688/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11688/1//COMMIT_MSG@14
PS1, Line 14: objects. (At present, we have no automated way to inspect a heap 
dump.)
fwiw I tested changes like this in the past by starting an impalad, querying 
some table(s) to populate the cache, and then using 'jmap -histo:live' on the 
impalad. See 1dd4403fdcae665b0f343bf3fb7cb644c1875851's commit message for an 
example along with a script that can be handy to compare before/after jmap 
histo dumps.

Not super critical to go back and do it for this change if you're busy with 
other stuff.


http://gerrit.cloudera.org:8080/#/c/11688/1/fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java:

http://gerrit.cloudera.org:8080/#/c/11688/1/fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java@71
PS1, Line 71: checkState
checkArgument



--
To view, visit http://gerrit.cloudera.org:8080/11688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4c861452cf974970ab511406fe2e175ea3dc668d
Gerrit-Change-Number: 11688
Gerrit-PatchSet: 1
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 25 Oct 2018 23:36:21 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7501: Slim down metastore Partition objects in LocalCatalog cache

2018-10-25 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11688 )

Change subject: IMPALA-7501: Slim down metastore Partition objects in 
LocalCatalog cache
..


Patch Set 1:

Hey Todd, you asked for this fix. Can you take a look at this change? Thanks!


--
To view, visit http://gerrit.cloudera.org:8080/11688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4c861452cf974970ab511406fe2e175ea3dc668d
Gerrit-Change-Number: 11688
Gerrit-PatchSet: 1
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 25 Oct 2018 23:25:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7501: Slim down metastore Partition objects in LocalCatalog cache

2018-10-15 Thread Paul Rogers (Code Review)
Paul Rogers has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/11688


Change subject: IMPALA-7501: Slim down metastore Partition objects in 
LocalCatalog cache
..

IMPALA-7501: Slim down metastore Partition objects in LocalCatalog cache

Used existing method to remove cached per-partition column information
as requested in the JIRA ticket. See IMPALA-7501 for details.

Tested by inspecting a heap dump before and after the change to ensure
that the column schema is no longer referenced from the Hive Partition
objects. (At present, we have no automated way to inspect a heap dump.)

Change-Id: I4c861452cf974970ab511406fe2e175ea3dc668d
---
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
2 files changed, 7 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/88/11688/1
--
To view, visit http://gerrit.cloudera.org:8080/11688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I4c861452cf974970ab511406fe2e175ea3dc668d
Gerrit-Change-Number: 11688
Gerrit-PatchSet: 1
Gerrit-Owner: Paul Rogers