-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56995/#review166714
-----------------------------------------------------------
Thanks for the patch Vihang!
I had alternate version in my head, but your one fared better on my quick
performance test (see below).
Another thing - as I working on HIVE-15051 to run Yetus as a precommit check I
happily reporting nits found by checkstyle/findbugs/rat etc.
Here is the list:
- Checkstyle:
./ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java:23:import
java.util.Collections;:8: warning: Unused import - java.util.Collections.
./ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java:31:import
java.util.concurrent.ConcurrentHashMap;:8: warning: Unused import -
java.util.concurrent.ConcurrentHashMap.
./ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java:34:import
java.util.concurrent.ExecutorService;:8: warning: Unused import -
java.util.concurrent.ExecutorService.
./ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java:37:import
java.util.concurrent.LinkedBlockingQueue;:8: warning: Unused import -
java.util.concurrent.LinkedBlockingQueue.
./ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java:39:import
java.util.concurrent.TimeUnit;:8: warning: Unused import -
java.util.concurrent.TimeUnit.
./ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java:40:import
java.util.concurrent.atomic.AtomicInteger;:8: warning: Unused import -
java.util.concurrent.atomic.AtomicInteger.
./ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java:429:
checkPartitionDirsSingleThreaded(basePaths, allDirs,
basePath.getFileSystem(conf), maxDepth, maxDepth);: warning: Line is longer
than 100 characters (found 109).
./ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java:465:
return processPathDepthInfo(pd);: warning: method def child at
indentation level 8 not at correct indentation, 6
./ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java:469:
throws FileNotFoundException, IOException, HiveException,
InterruptedException {:16: warning: Redundant throws: 'FileNotFoundException'
is subclass of 'IOException'.
./ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java:511:
final Path p;:16: warning: Variable 'p' must be private and have accessor
methods.
./ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java:512:
final int depth;:15: warning: Variable 'depth' must be private and have
accessor methods.
./ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java:537:
futures.add(pool.submit(new PathDepthInfoCallable(nextLevel.poll(),
maxDepth, fs, tempQueue)));: warning: Line is longer than 100 characters (found
105).
./ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java:559:
private void checkPartitionDirsSingleThreaded(Queue<Path> basePaths,final
Set<Path> allDirs,:71: warning: ',' is not followed by whitespace.
- Findbugs:
Performance Warning: Should
org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker$PathDepthInfo be a
_static_ inner class?
"This class is an inner class, but does not use its embedded reference
to the object which created it. This reference makes the instances
of the class larger, and may keep the reference to the creator object
alive longer than necessary. If possible, the class should be
made static."
Thanks for the patch,
Peter
ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java (lines
526 - 532)
<https://reviews.apache.org/r/56995/#comment238732>
Interesting, and fast solution.
My first reaction was: Why not use a a Queue to collect the futures, and
simply wait until the every future is resolved, and the futures queue is empty?
The child future would be added to the queue before the parent finishes, so we
have solution for the synchronization that way.
Something like this:
ConcurrentLinkedQueue<Future<Path>> futures = new
ConcurrentLinkedQueue<Future<Path>>();
futures.add(pool.submit(new PathDepthInfoCallable(new
PathDepthInfo(basePath, 0), maxDepth, fs, futures, pool)));
while(!futures.isEmpty()) {
Path p = futures.poll().get();
if (p != null) {
result.add(p);
}
}
But I have tried, and your version is performed better on local fs with 10k
files, 4 depth (year/month/day/job). 2800 ms vs. 3000 ms
I still think that my version is more readable, but speed is speed :D
- Peter Vary
On Feb. 23, 2017, 8:15 p.m., Vihang Karajgaonkar wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56995/
> -----------------------------------------------------------
>
> (Updated Feb. 23, 2017, 8:15 p.m.)
>
>
> Review request for hive, Aihua Xu, Ashutosh Chauhan, and Sergio Pena.
>
>
> Bugs: HIVE-15879
> https://issues.apache.org/jira/browse/HIVE-15879
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> HIVE-15879 : Fix HiveMetaStoreChecker.checkPartitionDirs method
>
>
> Diffs
> -----
>
> ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java
> 7c94c95f00492467ba27dedc9ce513e13c85ea61
>
> ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHiveMetaStoreChecker.java
> 35f52cd522e0e48a333e30966245bec65cc2ec9c
>
> Diff: https://reviews.apache.org/r/56995/diff/
>
>
> Testing
> -------
>
> Tested using existing and newly added test cases
>
>
> Thanks,
>
> Vihang Karajgaonkar
>
>