[Impala-ASF-CR] IMPALA-7219. Avoid wasting ~160 bytes per IncompleteTable in Catalog Server heap.

2018-07-21 Thread Misha Dmitriev (Code Review)
Misha Dmitriev has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10982 )

Change subject: IMPALA-7219. Avoid wasting ~160 bytes per IncompleteTable in 
Catalog Server heap.
..


Patch Set 4:

Patch set 4 failed twice in the Jenkins build, so I suspect there is some 
problem in it. The last failure is 
https://jenkins.impala.io/job/pre-review-test/190/ Unfortunately, I don't see 
any clues in the main build log, and I don't know how to get to the Java impala 
catalog logs. Any help would be appreciated.


--
To view, visit http://gerrit.cloudera.org:8080/10982
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If9c75f65ecb3ba3f2c739fa483a84dc052f471c6
Gerrit-Change-Number: 10982
Gerrit-PatchSet: 4
Gerrit-Owner: Misha Dmitriev 
Gerrit-Reviewer: Misha Dmitriev 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Sat, 21 Jul 2018 21:36:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7219. Avoid wasting ~160 bytes per IncompleteTable in Catalog Server heap.

2018-07-21 Thread Misha Dmitriev (Code Review)
Misha Dmitriev has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/10982 )

Change subject: IMPALA-7219. Avoid wasting ~160 bytes per IncompleteTable in 
Catalog Server heap.
..

IMPALA-7219. Avoid wasting ~160 bytes per IncompleteTable in Catalog Server 
heap.

This change switches initialization from eager to lazy for the following fields
in Table: HashMap colsByName_ and ArrayType type_. In this way, we
avoid wasting memory when HashMaps stay empty (an empty HashMap still uses at
least 48 bytes in the heap), and when a StructType with its own empty HashMap
and ArrayList instances is initialized, but stays unused. I have calculated
ArrayType, StructType, two HashMaps and one ArrayList collectively use up
160 bytes in the heap per a Table. Further analysis can be found in the JIRA.
This optimization becomes really relevant when a catalog server loads a very
large number (millions) of tables, and all/most of them are represented as
IncompleteTable instances.

Change-Id: If9c75f65ecb3ba3f2c739fa483a84dc052f471c6
---
M fe/src/main/java/org/apache/impala/catalog/Table.java
1 file changed, 24 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/10982/4
--
To view, visit http://gerrit.cloudera.org:8080/10982
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If9c75f65ecb3ba3f2c739fa483a84dc052f471c6
Gerrit-Change-Number: 10982
Gerrit-PatchSet: 4
Gerrit-Owner: Misha Dmitriev 
Gerrit-Reviewer: Misha Dmitriev 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-7219. 7.5% of Catalog Server heap wasted by empty HashMaps and ArrayLists

2018-07-20 Thread Misha Dmitriev (Code Review)
Misha Dmitriev has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10982 )

Change subject: IMPALA-7219. 7.5% of Catalog Server heap wasted by empty 
HashMaps and ArrayLists
..


Patch Set 2:

Thank you for your comments/clarifications, guys. I see what you mean now.

I've checked the heap dump more carefully, and now I realize that all the 
StructType instances (all of which have both of their data fields, fieldMap_ 
and fields_, empty) come from IncompleteTable's. Here is the relevant excerpt 
from the jxray heap report:


182,587K (1.2%): org.apache.impala.catalog.StructType: 7790403 / 100% objects
↖org.apache.impala.catalog.ArrayType.itemType_
↖org.apache.impala.catalog.IncompleteTable.type_
↖{java.util.concurrent.ConcurrentHashMap}.values
↖org.apache.impala.catalog.CatalogObjectCache.metadataCache_
↖org.apache.impala.catalog.Db.tableCache_
↖{java.util.concurrent.ConcurrentHashMap}.values
↖java.util.concurrent.atomic.AtomicReference.value
↖org.apache.impala.catalog.CatalogServiceCatalog.dbCache_
↖org.apache.impala.catalog.BuiltinsDb.parentCatalog_
↖Java Static org.apache.impala.catalog.Catalog.builtinsDb_

So, StructType objects themselves take 1.2% of the heap, plus their empty 
fieldMap_ maps take 2.5% and empty fields_ lists take 1.2%. And looks like if 
we simply initialize these StructType objects lazily (more precisely, the 
'ArrayType type_' field in Table), we will save all the above memory, i.e. 
considerably more than the current patch saves. Furthermore, the lazy changes 
in StructType itself can be reverted, which would keep the code more readable.

Makes sense? I think this is the best I can do in Impala code with my current 
very limited understanding of it. More advanced changes, like making 
IncompleteTable not inherit from Table, would likely require a much deeper 
knowledge of how this code works.


--
To view, visit http://gerrit.cloudera.org:8080/10982
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If9c75f65ecb3ba3f2c739fa483a84dc052f471c6
Gerrit-Change-Number: 10982
Gerrit-PatchSet: 2
Gerrit-Owner: Misha Dmitriev 
Gerrit-Reviewer: Misha Dmitriev 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 20 Jul 2018 19:50:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7219. 7.5% of Catalog Server heap wasted by empty HashMaps and ArrayLists

2018-07-20 Thread Misha Dmitriev (Code Review)
Misha Dmitriev has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10982 )

Change subject: IMPALA-7219. 7.5% of Catalog Server heap wasted by empty 
HashMaps and ArrayLists
..


Patch Set 2:

Thank you for the quick response, Vuk. Answering your questions:

1. The respective IMPALA-7219 contains more details. In the case that I looked 
at, there was exactly the same number (a little less than 8 million) of both 
kinds of HashMaps. I.e. ~8M coming from IncompleteTable.colsByName_ and ~8M 
coming from StructType.fieldMap_. Each empty HashMap takes 48 bytes, so 
collectively they wasted ~700MB of memory.

2. Let me clarify. There is no such thing as an object (collection) "with no 
fields". When you call 'new HashMap()', you always create a HashMap object, 
which internally has fields like 'int size, threshold' etc. The good news is 
that the most important field, 'HashMap$Entry table[]' is null until the first 
element is added to this table. But just the above data fields, plus the 
12-byte internal header that each object in the JVM heap has, result in the 
fact that an empty, completely unpopulated HashMap, occupies 48 bytes. This is 
what I want to get rid of. If all this still seems a little vague, I suggest 
you take a look at the slides here: 
https://www.slideshare.net/MikhailMishaDmitriev/java-memory-analysis-problems-and-solutions?trk=v-feed
 In particular, slide 10 shows the internals and lifecycle of a HashMap.

So your suggestion in (2) may or may not make sense depending what exactly you 
had in mind, and needs some clarification.

Regarding tests: yes, I ran it through the Impala Jenkins build, and all tests 
passed: https://jenkins.impala.io/job/pre-review-test/186/


--
To view, visit http://gerrit.cloudera.org:8080/10982
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If9c75f65ecb3ba3f2c739fa483a84dc052f471c6
Gerrit-Change-Number: 10982
Gerrit-PatchSet: 2
Gerrit-Owner: Misha Dmitriev 
Gerrit-Reviewer: Misha Dmitriev 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 20 Jul 2018 18:41:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7219. 7.5% of Catalog Server heap wasted by empty HashMaps and ArrayLists

2018-07-20 Thread Misha Dmitriev (Code Review)
Misha Dmitriev has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/10982


Change subject: IMPALA-7219. 7.5% of Catalog Server heap wasted by empty 
HashMaps and ArrayLists
..

IMPALA-7219. 7.5% of Catalog Server heap wasted by empty HashMaps and ArrayLists

This change switches initialization of HashMaps in IncompleteTable.colsByName_ 
and
StructType.fieldMap_ from eager to lazy. In this way, we avoid wasting memory
when HashMaps stay empty (even an empty HashMap uses at least 48 bytes in the 
heap).
This optimization becomes really relevant when a catalog server loads a very 
large
number (millions) of tables.

Change-Id: If9c75f65ecb3ba3f2c739fa483a84dc052f471c6
---
M fe/src/main/java/org/apache/impala/catalog/StructType.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
2 files changed, 35 insertions(+), 8 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/10982/2
--
To view, visit http://gerrit.cloudera.org:8080/10982
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: If9c75f65ecb3ba3f2c739fa483a84dc052f471c6
Gerrit-Change-Number: 10982
Gerrit-PatchSet: 2
Gerrit-Owner: Misha Dmitriev 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac