Misha Dmitriev created HIVE-16489:
-------------------------------------
Summary: HMS wastes 26.4% of memory due to dup strings in
metastore.api.Partition.parameters
Key: HIVE-16489
URL: https://issues.apache.org/jira/browse/HIVE-16489
Project: Hive
Issue Type: Improvement
Components: HiveServer2
Reporter: Misha Dmitriev
Assignee: Misha Dmitriev
I've created a Hive table with 2000 partitions, each backed by two files, with
one row in each file. When I execute some number of concurrent queries against
this table, e.g. as follows
{code}
for i in `seq 1 50`; do beeline -u jdbc:hive2://localhost:10000 -n admin -p
admin -e "select count(i_f_1) from misha_table;" & done
{code}
it results in a big memory spike. With 20 queries I caused an OOM in a HS2
server with -Xmx200m and with 50 queries - in the one with -Xmx500m.
I am attaching the results of jxray (www.jxray.com) analysis of a heap dump
that was generated in the 50queries/500m heap scenario. It suggests that there
are several opportunities to reduce memory pressure with not very invasive
changes to the code. One (duplicate strings) has been addressed in
https://issues.apache.org/jira/browse/HIVE-15882 In this ticket, I am going to
address the fact that almost 20% of memory is used by instances of
java.util.Properties. These objects are highly duplicate, since for each
partition each concurrently running query creates its own copy of Partion,
PartitionDesc and Properties. Thus we have nearly 100,000 (50 queries * 2,000
partitions) Properties in memory. By interning/deduplicating these objects we
may be able to save perhaps 15% of memory.
Note, however, that if there are queries that mutate partitions, the
corresponding Properties would be mutated as well. Thus we cannot simply use a
single "canonicalized" Properties object at all times for all Partition objects
representing the same DB partition. Instead, I am going to introduce a special
CopyOnFirstWriteProperties class. Such an object initially internally
references a canonicalized Properties object, and keeps doing so while only
read methods are called. However, once any mutating method is called, the given
CopyOnFirstWriteProperties copies the data into its own table from the
canonicalized table, and uses it ever after.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)