> On Feb. 24, 2017, 7:38 a.m., Rui Li wrote: > > common/src/java/org/apache/hadoop/hive/common/StringInternUtils.java, line > > 67 > > <https://reviews.apache.org/r/56687/diff/2/?file=1642999#file1642999line67> > > > > why the stringField doesn't need the null check like other fields?
There are strong indicators in the URI.java code that this field is never null. For example, the javadoc says it's the only serializable field. However, it's hard to prove this with 100% accuracy, and it might change in the future. So, added a null check for safety. > On Feb. 24, 2017, 7:38 a.m., Rui Li wrote: > > ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java, line 3147 > > <https://reviews.apache.org/r/56687/diff/2/?file=1643000#file1643000line3147> > > > > How about intern the path in the createEmptyFile method? Makes sense, done. > On Feb. 24, 2017, 7:38 a.m., Rui Li wrote: > > ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveLockObject.java, line 183 > > <https://reviews.apache.org/r/56687/diff/2/?file=1643005#file1643005line183> > > > > can we call the util method? Done. > On Feb. 24, 2017, 7:38 a.m., Rui Li wrote: > > ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveLockObject.java, line 188 > > <https://reviews.apache.org/r/56687/diff/2/?file=1643005#file1643005line188> > > > > guess we can also add a util method for this Good idea, done. > On Feb. 24, 2017, 7:38 a.m., Rui Li wrote: > > ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java, line > > 253 > > <https://reviews.apache.org/r/56687/diff/2/?file=1643008#file1643008line253> > > > > since we'll intern strings in the new path, do we have to intern > > taskTmpDir here? We need to - you can see that this String object is also added to several tables below on its own, not as part of taskTmpDirPath. > On Feb. 24, 2017, 7:38 a.m., Rui Li wrote: > > ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java, > > line 322 > > <https://reviews.apache.org/r/56687/diff/2/?file=1643011#file1643011line322> > > > > will this cause the hash map to resize since the default load factor is > > 0.75? and several similar concerns below You are probably right, in that this constructor's parameter is the initial capacity of this table (more or less the size of the internal array) - not how many elements the table is expected to hold. However, if you check the code of HashMap, the things are more interesting. The actual capacity of the table is always a power of two, so unless this parameter is also a power of two, the capacity will be chosen as the nearest higher power of two, i.e. it will be higher than the parameter and closer to what we actually need. Also, if we create a table with the default size (16) here and then will put many more elements into it, it will be resized several times, whereas with the current code it will be resized at most once. Trying to "factor in" the load factor will likely add more confusion/complexity. All in all, given that choosing capacity in HashMap internally is non-trivial, I think it's easier/safer to just call 'new HashMap(oldMap.size())' as we do now. - Misha ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/56687/#review166649 ----------------------------------------------------------- On Feb. 23, 2017, 9:01 p.m., Misha Dmitriev wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/56687/ > ----------------------------------------------------------- > > (Updated Feb. 23, 2017, 9:01 p.m.) > > > Review request for hive, Chaoyu Tang, Mohit Sabharwal, and Sergio Pena. > > > Bugs: https://issues.apache.org/jira/browse/HIVE-15882 > > https://issues.apache.org/jira/browse/https://issues.apache.org/jira/browse/HIVE-15882 > > > Repository: hive-git > > > Description > ------- > > See the description of the problem in > https://issues.apache.org/jira/browse/HIVE-15882 Interning strings per this > review removes most of the overhead due to duplicate strings. > > Also, where maps in several places are created from other maps, use the > original map's size for the new map. This is to avoid the situation when a > map with default capacity (typically 16) is created to hold just 2-3 entries, > and the rest of the internal 16-entry array is wasted. > > > Diffs > ----- > > common/src/java/org/apache/hadoop/hive/common/StringInternUtils.java > PRE-CREATION > ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java > e81cbce3e333d44a4088c10491f399e92a505293 > ql/src/java/org/apache/hadoop/hive/ql/hooks/Entity.java > 08420664d59f28f75872c25c9f8ee42577b23451 > ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java > e91064b9c75e8adb2b36f21ff19ec0c1539b03b9 > ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java > 51530ac16c92cc75d501bfcb573557754ba0c964 > ql/src/java/org/apache/hadoop/hive/ql/io/SymbolicInputFormat.java > 55b3b551a1dac92583b6e03b10beb8172ca93d45 > ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveLockObject.java > 82dc89803be9cf9e0018720eeceb90ff450bfdc8 > ql/src/java/org/apache/hadoop/hive/ql/metadata/Partition.java > c0edde9e92314d86482b5c46178987e79fae57fe > ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java > c6ae6f290857cfd10f1023058ede99bf4a10f057 > ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java > 24d16812515bdfa90b4be7a295c0388fcdfe95ef > > ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/GenMRSkewJoinProcessor.java > ede4fcbe342052ad86dadebcc49da2c0f515ea98 > > ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/NullScanTaskDispatcher.java > 0882ae2c6205b1636cbc92e76ef66bb70faadc76 > > ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java > 68b0ad9ea63f051f16fec3652d8525f7ab07eb3f > ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java > d4bdd96eaf8d179bed43b8a8c3be0d338940154a > ql/src/java/org/apache/hadoop/hive/ql/plan/MsckDesc.java > b7a7e4b7a5f8941b080c7805d224d3885885f444 > ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java > 73981e826870139a42ad881103fdb0a2ef8433a2 > > Diff: https://reviews.apache.org/r/56687/diff/ > > > Testing > ------- > > I've measured how much memory this change plus another one (interning > Properties in PartitionDesc) save in my HS2 benchmark - the result is 37%. > See the details in HIVE-15882. > > > Thanks, > > Misha Dmitriev > >
