[ https://issues.apache.org/jira/browse/HDFS-13752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Templeton reassigned HDFS-13752:
---------------------------------------

    Assignee: Barnabas Maidics

> fs.Path stores file path in java.net.URI causes big memory waste
> ----------------------------------------------------------------
>
>                 Key: HDFS-13752
>                 URL: https://issues.apache.org/jira/browse/HDFS-13752
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 2.7.6
>         Environment: Hive 2.1.1 and hadoop 2.7.6
>            Reporter: Barnabas Maidics
>            Assignee: Barnabas Maidics
>            Priority: Major
>         Attachments: HDFS-13752.001.patch, HDFS-13752.002.patch,
> HDFS-13752.003.patch, HDFSbenchmark.pdf, Screen Shot 2018-07-20 at
> 11.12.38.png, heapdump-100000partitions.html, measurement.pdf
>
>
> I was looking at HiveServer2 memory usage, and a large percentage of it was
> due to org.apache.hadoop.fs.Path, which stores file paths in a
> java.net.URI object. The URI implementation stores the same string in 3
> different objects (see the attached image). In Hive, when there are many
> partitions, this causes high memory usage. In my particular case, 42% of
> memory was used by java.net.URI, so it could be reduced to 14%.
> I wonder whether the community is open to replacing it with a more
> memory-efficient implementation, and what else should be considered here.
> It could be a huge memory improvement for Hadoop and for Hive as well.
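
To illustrate the duplication described in the issue, here is a minimal Java sketch. It is not taken from the attached patches. The class name UriCopies, the helper copies(), and the sample partition path are hypothetical, and the sketch assumes a scheme-less path wrapped with the five-argument java.net.URI constructor. It only shows that one URI exposes the same text through three accessors. Whether each accessor is backed by its own String object on the heap depends on the JDK version, so the sketch makes no claim about object identity or about the memory figures in the report.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriCopies {
    // Hypothetical helper: wraps a scheme-less path in a java.net.URI and
    // returns three string views that the URI exposes for it.
    static String[] copies(String path) {
        try {
            // scheme, authority, query and fragment are all absent
            URI uri = new URI(null, null, path, null, null);
            return new String[] {
                uri.toString(),              // full URI text
                uri.getPath(),               // decoded path component
                uri.getSchemeSpecificPart()  // decoded scheme-specific part
            };
        } catch (URISyntaxException e) {
            // Rethrow unchecked so callers need no throws clause
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up partition path, for illustration only
        for (String s : copies("/warehouse/sales/year=2018/month=07")) {
            System.out.println(s);
        }
    }
}
```

Running main prints the same path three times. With many partitions, each held in its own Path, any per-URI duplication of this text is multiplied accordingly, which is the effect the reporter observed in the heap dump.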