[
https://issues.apache.org/jira/browse/HIVE-352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704617#action_12704617
]
Zheng Shao commented on HIVE-352:
---------------------------------
hive-352-2009-4-30-4.patch:
Thanks Yongqiang. I tried it and it works now.
* exec/Utilities.java:createRCFileWriter:
parameter fs never used.
* RCFile.java:
Can you add some comments in the header to explain how you determine "row
split"?
What does this mean? <li>A sync-marker every few <code>100</code> bytes or
so.</li>
KeyBuffer.write(DataInput in, int length): length never used. Is this method
ever used? If not, remove it.
ColumnBuffer.getLength(): Let's remove the synchronized keyword for the sake of
performance.
ColumnBuffer.ColumnBuffer(Class valClass): Let's remove valClass since it's
never used.
Writer: how do you pass the column number from Hive to the configuration and
then to RCFile.Writer?
+ this.columnNumber = conf.getInt(COLUMN_NUMBER_CONF_STR, 0);
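For reference, a minimal self-contained sketch of that flow, with java.util.Properties standing in for Hadoop's Configuration so it runs on its own; the key string mirrors COLUMN_NUMBER_CONF_STR from the patch, and the helper names are illustrative:
{code}
import java.util.Properties;

public class ColumnConfDemo {
    // Mirrors RCFile.COLUMN_NUMBER_CONF_STR from the patch; Properties stands
    // in for org.apache.hadoop.conf.Configuration to keep the sketch self-contained.
    static final String COLUMN_NUMBER_CONF_STR = "hive.io.rcfile.column.number.conf";

    // Writer side: read the column count back, defaulting to 0 as the patch does.
    static int readColumnNumber(Properties conf) {
        return Integer.parseInt(conf.getProperty(COLUMN_NUMBER_CONF_STR, "0"));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // Caller side (e.g. the output format) publishes the table width first.
        conf.setProperty(COLUMN_NUMBER_CONF_STR, Integer.toString(8));
        System.out.println(readColumnNumber(conf));       // 8
        System.out.println(readColumnNumber(new Properties())); // default: 0
    }
}
{code}
The point to document is which caller is responsible for setting the key before the Writer is constructed, since a missing key silently falls back to 0 columns.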
Writer: Per Java convention, COLUMNS_BUFFER_SIZE should be columnsBufferSize.
+ this.COLUMNS_BUFFER_SIZE = conf.getInt(COLUMNS_BUFFER_SIZE_CONF_STR,
+ 4 * 1024 * 1024);
Reader: add javadoc for "public synchronized boolean nextColumnsBatch()"
* RCFileRecordReader.java
Remove "synchronized" in all methods for the sake of performance
+ protected synchronized boolean next(LongWritable key) throws IOException {
* TestRCFile.java
You might want to use Text.encode() instead of "String".getBytes(). At least we
should use getBytes("UTF-8").
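The issue with bare "String".getBytes() is that it uses the JVM's platform default charset, so the bytes written by the test can differ between machines. A small plain-JDK sketch of the difference (Hadoop's Text.encode() also produces UTF-8, returning a ByteBuffer):
{code}
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8BytesDemo {
    public static void main(String[] args) throws Exception {
        String s = "héllo"; // non-ASCII, so the charset actually matters
        // Platform-dependent: 5 bytes on a latin-1 JVM, 6 on a UTF-8 JVM.
        byte[] defaultBytes = s.getBytes();
        // Deterministic regardless of the JVM's file.encoding:
        byte[] utf8Bytes = s.getBytes("UTF-8");
        byte[] utf8Bytes2 = s.getBytes(StandardCharsets.UTF_8); // same, no checked exception
        System.out.println(utf8Bytes.length);                    // 6: 'é' is two bytes in UTF-8
        System.out.println(Arrays.equals(utf8Bytes, utf8Bytes2)); // true
    }
}
{code}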
* ql/src/test/queries/clientpositive/columnarserde_create_shortcut.q
For all tests, please drop the created tables again at the end of the .q file.
This helps to make "show tables" in other .q files return a deterministic
result.
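For example, the end of the .q file would look like this (the table name here is illustrative, taken from the .q file name; use whatever tables the test actually creates):
{code}
-- ... existing queries ...
SELECT * FROM columnarserde_create_shortcut;
-- clean up so "SHOW TABLES" in later .q files stays deterministic
DROP TABLE columnarserde_create_shortcut;
{code}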
* BytesRefArrayWritable.java
What's the difference between get() and unCheckedGet()? The code is the same.
BytesRefArrayWritable.resetValid: Let's call it resize.
set(..): the condition should be valid <= index (the current < misses the index == valid case):
+ if (valid < index)
+ valid = index + 1;
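To spell out the off-by-one: with the current valid < index guard, setting the element at index == valid leaves the valid count unchanged. A toy stand-alone sketch of the corrected bookkeeping (this class is illustrative, not BytesRefArrayWritable itself):
{code}
public class ValidCountDemo {
    private final String[] slots = new String[16];
    private int valid = 0; // number of in-use entries, i.e. highest set index + 1

    void set(int index, String value) {
        slots[index] = value;
        // Must also grow when index == valid, hence <= rather than <.
        if (valid <= index) {
            valid = index + 1;
        }
    }

    int size() { return valid; }

    public static void main(String[] args) {
        ValidCountDemo d = new ValidCountDemo();
        d.set(0, "a"); // with the buggy `valid < index` guard, valid would stay 0 here
        System.out.println(d.size()); // 1
        d.set(3, "b");
        System.out.println(d.size()); // 4
    }
}
{code}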
* BytesRefWritable.java
Shall we rename getBytes() to getBytesCopy()?
* ColumnarStruct.java
init(...): Instead of clearing out the object, we can call init(null, 0, 0), which
has the same effect but avoids creating a new LazyObject for the next row.
{code}
Current:
+ if (field.length > 0) {
+ if (fields[fieldIndex] == null)
+ fields[fieldIndex] = LazyFactory.createLazyObject(fieldTypeInfos
+ .get(fieldIndex));
+ fields[fieldIndex].init(cachedByteArrayRef[fieldIndex], field
+ .getStart(), field.getLength());
+ } else if (fields[fieldIndex] != null
+ && fields[fieldIndex].getObject() != null) {
+ fields[fieldIndex] = null;
+ }
Let's change it to (Please have a test)
+ if (fields[fieldIndex] == null)
+ fields[fieldIndex] = LazyFactory.createLazyObject(fieldTypeInfos
+ .get(fieldIndex));
+ if (field.length > 0) {
+ fields[fieldIndex].init(cachedByteArrayRef[fieldIndex], field
+ .getStart(), field.getLength());
+ } else {
+ fields[fieldIndex].init(null, 0, 0);
+ }
{code}
* Can you add one more test for adding/deleting columns? The value of the newly
added columns should be NULL. RCFile should also be able to ignore the extra
columns from the data.
{code}
CREATE TABLE xxx (a string, b string) STORED AS RCFILE;
INSERT OVERWRITE TABLE xxx ...;
ALTER TABLE xxx ADD COLUMNS (c string);
SELECT * FROM xxx;
ALTER TABLE xxx REPLACE COLUMNS (a string);
SELECT * FROM xxx;
{code}
* General comments:
1 We don't need to put "this." in the code unless there is a local var with the
same name.
2 Please add more javadoc - the general rule for Hive is that every public
class/method should have javadoc, except getters and setters.
** Very important: Eclipse automatically generates a lot of empty javadocs for
variables etc. Please remove them if you don't have a comment for that
variable, otherwise "ant -Dhadoop.version=0.17.0 javadoc" will have warnings.
Please make sure to remove all warnings.
{code}
[javadoc]
/data/users/zshao/tools/352-trunk-apache-hive/serde/src/java/org/apache/hadoop/hive/serde2/columnar/BytesRefWritable.java:95:
warning - @return tag has no arguments.
[javadoc]
/data/users/zshao/tools/352-trunk-apache-hive/serde/src/java/org/apache/hadoop/hive/serde2/columnar/BytesRefWritable.java:122:
warning - Tag @see: missing '#': "set(byte[] newData, int offset, int length)"
[javadoc]
/data/users/zshao/tools/352-trunk-apache-hive/serde/src/java/org/apache/hadoop/hive/serde2/columnar/BytesRefWritable.java:105:
warning - Tag @see: missing '#': "readFields(DataInput in)"
[javadoc]
/data/users/zshao/tools/352-trunk-apache-hive/serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarStruct.java:38:
warning - Tag @link: missing '#': "init(BytesRefArrayWritable cols)"
[javadoc]
/data/users/zshao/tools/352-trunk-apache-hive/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java:522:
warning - @param argument "keyClass" is not a parameter name.
[javadoc]
/data/users/zshao/tools/352-trunk-apache-hive/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java:522:
warning - @param argument "valClass" is not a parameter name.
[javadoc]
/data/users/zshao/tools/352-trunk-apache-hive/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java:149:
warning - Tag @see: malformed: "CompressionCodec, SequenceFile"
[javadoc]
/data/users/zshao/tools/352-trunk-apache-hive/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java:149:
warning - Tag @see: reference not found: CompressionCodec, SequenceFile
[javadoc]
/data/users/zshao/tools/352-trunk-apache-hive/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java:1214:
warning - @return tag has no arguments.
[javadoc]
/data/users/zshao/tools/352-trunk-apache-hive/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java:1214:
warning - Tag @link: missing '#': "nextColumnBatch()"
[javadoc]
/data/users/zshao/tools/352-trunk-apache-hive/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java:1214:
warning - Tag @link: can't find nextColumnBatch() in
org.apache.hadoop.hive.ql.io.RCFile.Reader
[javadoc]
/data/users/zshao/tools/352-trunk-apache-hive/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java:1288:
warning - @return tag has no arguments.
[javadoc]
/data/users/zshao/tools/352-trunk-apache-hive/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java:1288:
warning - @return tag cannot be used in method with void return type.
[javadoc]
/data/users/zshao/tools/352-trunk-apache-hive/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java:543:
warning - @param argument "keyClass" is not a parameter name.
[javadoc]
/data/users/zshao/tools/352-trunk-apache-hive/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java:543:
warning - @param argument "valClass" is not a parameter name.
[javadoc]
/data/users/zshao/tools/352-trunk-apache-hive/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java:564:
warning - @param argument "keyClass" is not a parameter name.
[javadoc]
/data/users/zshao/tools/352-trunk-apache-hive/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java:564:
warning - @param argument "valClass" is not a parameter name.
[javadoc]
/data/users/zshao/tools/352-trunk-apache-hive/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java:591:
warning - @param argument "keyClass" is not a parameter name.
[javadoc]
/data/users/zshao/tools/352-trunk-apache-hive/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java:591:
warning - @param argument "valClass" is not a parameter name.
[javadoc]
/data/users/zshao/tools/352-trunk-apache-hive/ql/src/java/org/apache/hadoop/hive/ql/io/RCFileOutputFormat.java:105:
warning - @return tag has no arguments.
[javadoc]
/data/users/zshao/tools/352-trunk-apache-hive/ql/src/java/org/apache/hadoop/hive/ql/io/RCFileOutputFormat.java:105:
warning - @param argument "tableInfo" is not a parameter name.
[javadoc]
/data/users/zshao/tools/352-trunk-apache-hive/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java:745:
warning - @param argument "dest" is not a parameter name.
{code}
> Make Hive support column based storage
> --------------------------------------
>
> Key: HIVE-352
> URL: https://issues.apache.org/jira/browse/HIVE-352
> Project: Hadoop Hive
> Issue Type: New Feature
> Reporter: He Yongqiang
> Assignee: He Yongqiang
> Attachments: 4-22 performace2.txt, 4-22 performance.txt, 4-22
> progress.txt, hive-352-2009-4-15.patch, hive-352-2009-4-16.patch,
> hive-352-2009-4-17.patch, hive-352-2009-4-19.patch,
> hive-352-2009-4-22-2.patch, hive-352-2009-4-22.patch,
> hive-352-2009-4-23.patch, hive-352-2009-4-27.patch,
> hive-352-2009-4-30-2.patch, hive-352-2009-4-30-3.patch,
> hive-352-2009-4-30-4.patch, HIve-352-draft-2009-03-28.patch,
> Hive-352-draft-2009-03-30.patch
>
>
> Column-based storage has been proven a better storage layout for OLAP.
> Hive does a great job on raw row-oriented storage. In this issue, we will
> enhance Hive to support column-based storage.
> Actually we have done some work on column-based storage on top of HDFS; I
> think it will need some review and refactoring to port it to Hive.
> Any thoughts?