> On Feb. 20, 2014, 8:06 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/mr/JarCache.java, line 67
> > <https://reviews.apache.org/r/18200/diff/8/?file=498177#file498177line67>
> >
> > For my info, does Hadoop know that a file is already in the distributed
> > cache so that it can skip it? Otherwise, it will cache the file every time
> > a job is launched. I couldn't find documentation about this.

The big win from our perspective is that we avoid putting the data in HDFS each time. YARN has future work to share these files amongst jobs: https://issues.apache.org/jira/browse/YARN-1492


> On Feb. 20, 2014, 8:06 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/mr/JarCache.java, line 81
> > <https://reviews.apache.org/r/18200/diff/8/?file=498177#file498177line81>
> >
> > I'm not sure if we need to put this in a synchronized block.

Probably better to, since HS2 typically runs on a single host and calculating hashes is CPU intensive.


> On Feb. 20, 2014, 8:06 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/mr/JarCache.java, line 92
> > <https://reviews.apache.org/r/18200/diff/8/?file=498177#file498177line92>
> >
> > 2. So the cached file is named without including its original name?
> > This might make it hard to figure out if a problem arises.

Added the name for debugging purposes.


- Brock


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18200/#review35042
-----------------------------------------------------------


On Feb. 19, 2014, 8:35 p.m., Brock Noland wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18200/
> -----------------------------------------------------------
>
> (Updated Feb. 19, 2014, 8:35 p.m.)
>
>
> Review request for hive.
>
>
> Bugs: HIVE-860
>     https://issues.apache.org/jira/browse/HIVE-860
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> Caches auxiliary jars and remote runtime jars in /user/$user/.hiveJars by
> their sha1 hash.
> This results in:
>
> 1) faster queries
> 2) less distributed cache churn
> 3) a smaller/cleaner hive-exec jar
>
>
> Diffs
> -----
>
>   bin/hive 3bd949f
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a182cd7
>   conf/hive-default.xml.template 0d08aa2
>   packaging/src/main/assembly/bin.xml a97ef7d
>   ql/pom.xml 53d0b9e
>   ql/src/java/org/apache/hadoop/hive/ql/exec/HiveAuxClasspathBuilder.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java 288da8e
>   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/JarCache.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapRedTask.java 326654f
>   shims/aggregator/pom.xml 7aa8c4c
>
> Diff: https://reviews.apache.org/r/18200/diff/
>
>
> Testing
> -------
>
> Tested manually on a cluster.
>
>
> Thanks,
>
> Brock Noland
>
>
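
For anyone following the thread without the diff open, the idea in the description (cache a jar under /user/$user/.hiveJars, keyed by its sha1 hash, with the original name kept for debugging) might look roughly like the sketch below. This is not the code from JarCache.java: the class name `JarCacheSketch`, the method names, the lock object, and the `<name>-<sha1>.jar` naming layout are all assumptions made for illustration, and the HDFS upload itself is left out.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch; names and file layout are assumptions, not the patch's API.
final class JarCacheSketch {
    // Hashing is CPU intensive, so this sketch serializes it behind one lock,
    // in the spirit of the synchronized-block discussion above.
    private static final Object HASH_LOCK = new Object();

    /** Returns the lowercase hex SHA-1 digest of the given bytes. */
    static String sha1Hex(byte[] data) {
        synchronized (HASH_LOCK) {
            try {
                MessageDigest md = MessageDigest.getInstance("SHA-1");
                StringBuilder hex = new StringBuilder();
                for (byte b : md.digest(data)) {
                    // Mask to an unsigned value so each byte prints as two hex digits.
                    hex.append(String.format("%02x", b & 0xff));
                }
                return hex.toString();
            } catch (NoSuchAlgorithmException e) {
                throw new IllegalStateException("SHA-1 not available", e);
            }
        }
    }

    /**
     * Builds the cache location for a jar. The base name is included only to
     * make debugging easier; the hash alone identifies the content.
     * The "<name>-<sha1>.jar" layout is assumed for this example.
     */
    static String cachedPath(String user, String baseName, byte[] jarBytes) {
        return "/user/" + user + "/.hiveJars/" + baseName + "-" + sha1Hex(jarBytes) + ".jar";
    }
}
```

Because the path depends only on the user, the name, and the content hash, a jar whose bytes have not changed maps to the same location on every query, so a caller could check for that path and skip the upload when it already exists.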