-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8611/
-----------------------------------------------------------

(Updated Feb. 15, 2013, 7:15 p.m.)


Review request for giraph.


Description
-------

One notable addition is the concept of "profiles", which makes it easy to read
from and write to multiple tables. This should remove a lot of the cruft
around the GiraphHCat* classes.
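
To illustrate the idea only, here is a minimal sketch of profile-scoped
settings. It is not the HiveProfiles class from the diff: the class name
ProfileConfSketch, the "sketch.hive." key prefix, and the profile names used
with it are all assumptions made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration; not the HiveProfiles class from the diff. */
class ProfileConfSketch {
  /** Flat key/value store standing in for a Hadoop Configuration. */
  private final Map<String, String> conf = new HashMap<>();

  /** Builds a profile-scoped key such as "sketch.hive.vertex_input.table". */
  static String key(String profile, String name) {
    return "sketch.hive." + profile + "." + name;
  }

  /** Records which database and table a profile refers to. */
  void setTable(String profile, String database, String table) {
    conf.put(key(profile, "database"), database);
    conf.put(key(profile, "table"), table);
  }

  /** Returns "database.table" for a profile, or throws if it was never set. */
  String qualifiedTable(String profile) {
    String database = conf.get(key(profile, "database"));
    String table = conf.get(key(profile, "table"));
    if (database == null || table == null) {
      throw new IllegalArgumentException("Unknown profile: " + profile);
    }
    return database + "." + table;
  }
}
```

Because every setting is namespaced by its profile, one job could describe a
vertex input table, an edge input table, and an output table side by side
without the keys colliding.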

Note in the diff I separated the code so that there would be a Giraph-unrelated 
Hive-only portion (under package org.apache.hadoop.hive). Things under this 
package (and its children) do not touch any Giraph code, and so can be 
contributed as an IOFormat back to Hive itself.

Also note the new (I think improved) interface: users no longer need to
implement an XInputFormat. They just create a class that implements the
HiveToVertex (HiveToEdge, VertexToHive) interface, plug that in, and use
HiveVertexInputFormat. This should make user code much cleaner.
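
The shape of that pattern can be sketched roughly as follows. This is a
self-contained illustration, not the API in the diff: RowSketch,
RowToVertexSketch, ArrayRowSketch, PageRankRowToVertex and VertexLoaderSketch
are hypothetical stand-ins, and the real HiveToVertex and
HiveVertexInputFormat signatures may differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Stand-in for a read-only Hive row (hypothetical). */
interface RowSketch {
  long getLong(int column);
  double getDouble(int column);
}

/** Stand-in for the conversion interface a user would implement (hypothetical). */
interface RowToVertexSketch<I, V> {
  I vertexId(RowSketch row);
  V vertexValue(RowSketch row);
}

/** Simple in-memory row, used here only to exercise the sketch. */
class ArrayRowSketch implements RowSketch {
  private final Object[] values;
  ArrayRowSketch(Object... values) { this.values = values; }
  public long getLong(int column) { return ((Number) values[column]).longValue(); }
  public double getDouble(int column) { return ((Number) values[column]).doubleValue(); }
}

/** User code: the only class a job author writes in this sketch. */
class PageRankRowToVertex implements RowToVertexSketch<Long, Double> {
  public Long vertexId(RowSketch row) { return row.getLong(0); }
  public Double vertexValue(RowSketch row) { return row.getDouble(1); }
}

/** Stand-in for the framework-provided input format that drives the converter. */
class VertexLoaderSketch {
  static <I, V> Map<I, V> load(List<RowSketch> rows, RowToVertexSketch<I, V> converter) {
    Map<I, V> vertices = new LinkedHashMap<>();
    for (RowSketch row : rows) {
      vertices.put(converter.vertexId(row), converter.vertexValue(row));
    }
    return vertices;
  }
}
```

The point of the design is that splits, record readers and schema lookup stay
inside the framework-provided class, while user code shrinks to a per-row
conversion.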


This addresses bug GIRAPH-453.
    https://issues.apache.org/jira/browse/GIRAPH-453


Diffs
-----

  giraph-accumulo/pom.xml cb9fbc02e6fc8adcb0ec41e0c6aeff75b1ef3f06 
  giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyClient.java 89ef87fea7a370354156fb7be02ef4249e0a6111
  giraph-core/src/main/java/org/apache/giraph/conf/GiraphConfiguration.java 9e129efebe39c42bab9d59b3246055b79cdbdfa3
  giraph-core/src/main/java/org/apache/giraph/utils/ConfigurationUtils.java PRE-CREATION
  giraph-hbase/pom.xml 7bbbd98c0b3db6878aee4be21eecd821448da7ef
  giraph-hcatalog/pom.xml 4a8227295ca426cf273527cdf3c700d25c256ac2
  giraph-hcatalog/src/main/java/org/apache/giraph/io/hcatalog/HCatGiraphRunner.java PRE-CREATION
  giraph-hcatalog/src/main/java/org/apache/giraph/io/hcatalog/HiveGiraphRunner.java fbcef720d3caa944af70a859996aac40a2f67558
  giraph-hcatalog/src/main/java/org/apache/giraph/io/hcatalog/HiveUtils.java c1f76f1a46d1fc9af489a916256884520c138cb4
  giraph-hive/pom.xml PRE-CREATION
  giraph-hive/src/main/assembly/compile.xml PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/common/HiveProfiles.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/common/HiveUtils.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveEdgeInputFormat.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveEdgeReader.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveToEdge.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/package-info.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/input/package-info.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/HiveToVertex.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/HiveVertexInputFormat.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/HiveVertexReader.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/package-info.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/output/HiveVertexOutputFormat.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/output/HiveVertexWriter.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/output/VertexToHive.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/output/package-info.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/package-info.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveReadableRecord.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveRecord.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveTableSchema.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveTableSchemaAware.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveTableSchemas.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveWritableRecord.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/HiveApiRecord.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/HiveApiTableSchema.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/Classes.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/FileSystems.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/HadoopUtils.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/HiveMetastores.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/HiveUtils.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/Inspectors.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/ProgressReporter.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/SerDes.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/Writables.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/package-info.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/HiveApiInputSplit.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/HiveApiRecordReader.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/InputConf.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/InputInfo.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/InputPartition.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/InputSplitData.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/NoOpInputObserver.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/BenchmarkArgs.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/CounterRatioGauge.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/InputBenchmark.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/MetricsObserver.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/package-info.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/package-info.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/HiveApiOutputCommitter.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/HiveApiRecordWriter.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/NoOpOutputObserver.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/OutputConf.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/OutputInfo.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/package-info.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/package-info.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/HiveApiInputFormat.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/HiveApiInputObserver.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/HiveInputDescription.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/package-info.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/output/HiveApiOutputFormat.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/output/HiveApiOutputObserver.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/output/HiveOutputDescription.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/output/package-info.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/package-info.java PRE-CREATION
  pom.xml f6e9302d694dab9a075de11ad00e6dcfc878e400 

Diff: https://reviews.apache.org/r/8611/diff/


Testing (updated)
-------

Ran on some production jobs and verified results were exactly the same.

Here are some performance comparisons on real workloads ("base" is hcatalog,
"mine" is hive):
https://gist.github.com/nitay/b34c8397b7aa1821f858/raw/b5a960891ed0e45e4f7423758471231fc88d7614/current_city
https://gist.github.com/nitay/5bc7f9da50c9b4b4dba2/raw/0dd899e78fbb04ef8c990073fbc1c862db8d5b5b/college
https://gist.github.com/nitay/569cc1a37694de458a74/raw/ca8df93a804f9236b20d251a0dcd6cc97e205008/high_school

We see that even before significant performance improvements, this already
speeds up input time. Some of the jobs allocate memory so quickly that they
trigger full GCs, which kill performance, but I expect that has more to do
with tuning GC better to match the faster loading. There is also an increase
in physical memory usage, which I will investigate.

Also, there are a few performance improvement ideas coming; this is just the
first working version.


Thanks,

Nitay Joffe
