Review Request 68099: SerDe to support Teradata Binary Format

2018-07-29 Thread Lu Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68099/
---

Review request for hive and Carl Steinbach.


Bugs: HIVE-20225
https://issues.apache.org/jira/browse/HIVE-20225


Repository: hive-git


Description
---

When TPT/BTEQ is used to export data from Teradata, Teradata writes binary 
files based on the schema.

A custom SerDe is needed to read these files directly from Hive.

CREATE EXTERNAL TABLE `TABLE1`(
...)
PARTITIONED BY (
...)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.contrib.serde2.TeradataBinarySerde'
STORED AS INPUTFORMAT
 
'org.apache.hadoop.hive.contrib.fileformat.teradata.TeradataBinaryFileInputFormat'
OUTPUTFORMAT
 
'org.apache.hadoop.hive.contrib.fileformat.teradata.TeradataBinaryFileOutputFormat'
LOCATION ...;

SELECT * FROM `TABLE1`;

Problem Statement:

Right now the fast way to export data from Teradata is to use TPT. However, 
Hive cannot directly use these exported binary files because it doesn't 
have a SerDe for them.

Result:

With the SerDe, Hive can operate on exported Teradata binary format files 
transparently.


Diffs
---

  contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/teradata/TeradataBinaryFileInputFormat.java PRE-CREATION
  contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/teradata/TeradataBinaryFileOutputFormat.java PRE-CREATION
  contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/teradata/TeradataBinaryRecordReader.java PRE-CREATION
  contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinaryDataInputStream.java PRE-CREATION
  contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinaryDataOutputStream.java PRE-CREATION
  contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinarySerde.java PRE-CREATION
  contrib/src/test/org/apache/hadoop/hive/contrib/serde2/TestGeneralFunctions.java PRE-CREATION
  contrib/src/test/org/apache/hadoop/hive/contrib/serde2/TestTeradataBinarySerdeForDate.java PRE-CREATION
  contrib/src/test/org/apache/hadoop/hive/contrib/serde2/TestTeradataBinarySerdeForDecimal.java PRE-CREATION
  contrib/src/test/org/apache/hadoop/hive/contrib/serde2/TestTeradataBinarySerdeForTimeStamp.java PRE-CREATION
  contrib/src/test/org/apache/hadoop/hive/contrib/serde2/TestTeradataBinarySerdeGeneral.java PRE-CREATION


Diff: https://reviews.apache.org/r/68099/diff/1/


Testing
---

JUnit tests have been added for the serialization and deserialization functions.


Thanks,

Lu Li



Re: Review Request 68099: SerDe to support Teradata Binary Format

2018-08-17 Thread Lu Li


(Updated Aug. 17, 2018, 11:21 p.m.)


Review request for hive, Carl Steinbach and Daniel Dai.


Changes
---

1. Made changes based on the review.
2. Added teradata.row.length to support files with 1 MB records.
3. Added a query unit test.


Bugs: HIVE-20225
https://issues.apache.org/jira/browse/HIVE-20225


Repository: hive-git


Diffs (updated)
---

  data/files/teradata_binary_file/td_data_with_1mb_rowsize.teradata.gz PRE-CREATION
  data/files/teradata_binary_file/teradata_binary_table.deflate PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/io/TeradataBinaryFileInputFormat.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/io/TeradataBinaryFileOutputFormat.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/io/TeradataBinaryRecordReader.java PRE-CREATION
  ql/src/test/queries/clientpositive/test_teradatabinaryfile.q PRE-CREATION
  ql/src/test/results/clientpositive/test_teradatabinaryfile.q.out PRE-CREATION
  serde/src/java/org/apache/hadoop/hive/serde2/teradata/TeradataBinaryDataInputStream.java PRE-CREATION
  serde/src/java/org/apache/hadoop/hive/serde2/teradata/TeradataBinaryDataOutputStream.java PRE-CREATION
  serde/src/java/org/apache/hadoop/hive/serde2/teradata/TeradataBinarySerde.java PRE-CREATION
  serde/src/test/org/apache/hadoop/hive/serde2/teradata/TestTeradataBinarySerdeForDate.java PRE-CREATION
  serde/src/test/org/apache/hadoop/hive/serde2/teradata/TestTeradataBinarySerdeForDecimal.java PRE-CREATION
  serde/src/test/org/apache/hadoop/hive/serde2/teradata/TestTeradataBinarySerdeForTimeStamp.java PRE-CREATION
  serde/src/test/org/apache/hadoop/hive/serde2/teradata/TestTeradataBinarySerdeGeneral.java PRE-CREATION


Diff: https://reviews.apache.org/r/68099/diff/3/

Changes: https://reviews.apache.org/r/68099/diff/2-3/


Testing
---

JUnit tests have been added for the serialization and deserialization functions.


Thanks,

Lu Li



Re: Review Request 68099: SerDe to support Teradata Binary Format

2018-07-30 Thread Lu Li


(Updated July 30, 2018, 10:22 p.m.)


Review request for hive and Carl Steinbach.


Changes
---

Fixed the issues found by Hive QA.


Bugs: HIVE-20225
https://issues.apache.org/jira/browse/HIVE-20225


Repository: hive-git


Diffs (updated)
---

  contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/teradata/TeradataBinaryFileInputFormat.java PRE-CREATION
  contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/teradata/TeradataBinaryFileOutputFormat.java PRE-CREATION
  contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/teradata/TeradataBinaryRecordReader.java PRE-CREATION
  contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinaryDataInputStream.java PRE-CREATION
  contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinaryDataOutputStream.java PRE-CREATION
  contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinarySerde.java PRE-CREATION
  contrib/src/test/org/apache/hadoop/hive/contrib/serde2/TestGeneralFunctions.java PRE-CREATION
  contrib/src/test/org/apache/hadoop/hive/contrib/serde2/TestTeradataBinarySerdeForDate.java PRE-CREATION
  contrib/src/test/org/apache/hadoop/hive/contrib/serde2/TestTeradataBinarySerdeForDecimal.java PRE-CREATION
  contrib/src/test/org/apache/hadoop/hive/contrib/serde2/TestTeradataBinarySerdeForTimeStamp.java PRE-CREATION
  contrib/src/test/org/apache/hadoop/hive/contrib/serde2/TestTeradataBinarySerdeGeneral.java PRE-CREATION


Diff: https://reviews.apache.org/r/68099/diff/2/

Changes: https://reviews.apache.org/r/68099/diff/1-2/


Testing
---

JUnit tests have been added for the serialization and deserialization functions.


Thanks,

Lu Li



Re: Review Request 68099: SerDe to support Teradata Binary Format

2018-08-06 Thread Carl Steinbach

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68099/#review206929
---



Please note that while I have not commented on every occurrence of the 
following issues, I would still like them all to be fixed:
 * Unnecessary 'else' clauses
 * Unnecessary uses of 'this'
 * RuntimeExceptions which should be replaced with checked exceptions.
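
To illustrate the last item in the list above, here is a minimal sketch of 
replacing an unchecked exception with a checked one. The class name, method 
name, and length check are hypothetical and not taken from the patch; 
java.io.IOException stands in for whichever checked exception the module 
would actually use.

```java
import java.io.IOException;

// Hypothetical helper; names and the length check are illustrative assumptions.
final class RecordLengthCheck {
  private RecordLengthCheck() {
  }

  // Before: throw new RuntimeException(...) is invisible in the signature.
  // After: the checked IOException is declared, so callers must handle it.
  static int checkedLength(int declaredLength, int maxLength) throws IOException {
    if (declaredLength < 0 || declaredLength > maxLength) {
      throw new IOException("Invalid record length: " + declaredLength);
    }
    return declaredLength;
  }
}
```

The checked exception becomes part of the method contract, so a caller cannot 
silently ignore a malformed record.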


contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/teradata/TeradataBinaryFileInputFormat.java
Lines 1 (patched)


I think this code and the other files in this patch belong in the serde 
module. Please move the code to 
serde/src/java/org/apache/hadoop/hive/serde2/teradata



contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/teradata/TeradataBinaryFileOutputFormat.java
Lines 71 (patched)


This should be a static final class variable, i.e.:

static final byte RECORD_END_BYTE = ...



contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/teradata/TeradataBinaryRecordReader.java
Lines 74 (patched)


Please avoid unnecessary uses of "this", both in this file and others in 
the patch.



contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/teradata/TeradataBinaryRecordReader.java
Lines 89 (patched)


Change message to "Input file is compressed. Using compression code %s"



contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/teradata/TeradataBinaryRecordReader.java
Lines 92 (patched)


Please remove or change message to "The input file is not compressed".



contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/teradata/TeradataBinaryRecordReader.java
Lines 116 (patched)


Use a static import for String.format() in order to avoid constantly using the 
"String." prefix.
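
A minimal sketch of the static import suggested above. The class name, 
method name, and message wording are hypothetical.

```java
import static java.lang.String.format;

// Hypothetical example class; only the static import is the point here.
final class FormatImportExample {
  private FormatImportExample() {
  }

  // With the static import, format(...) can be called without the "String." prefix.
  static String describe(long offset, int length) {
    return format("Read %d bytes at offset %d", length, offset);
  }
}
```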



contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/teradata/TeradataBinaryRecordReader.java
Lines 155 (patched)


Magic constants (e.g. "0x0a") should be defined in one place (e.g. 
TeradataConstants.java) as a static final variable.
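
A minimal sketch of such a constants holder. The class name follows the 
suggestion above and 0x0a is the value quoted in this comment; the field 
name, its role, and the helper method are hypothetical.

```java
// Sketch of a shared constants holder; the field name and role are assumptions.
final class TeradataConstants {
  // 0x0a is the magic value quoted in the review comment.
  static final byte EXAMPLE_MARKER_BYTE = (byte) 0x0a;

  private TeradataConstants() {
    // Not instantiable: this class only holds constants.
  }

  // Illustrative caller: compares against the named constant, not a literal.
  static boolean isMarker(byte b) {
    return b == EXAMPLE_MARKER_BYTE;
  }
}
```

Callers then reference TeradataConstants.EXAMPLE_MARKER_BYTE, so the value is 
defined once and its meaning is visible at every use.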



contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/teradata/TeradataBinaryRecordReader.java
Lines 197 (patched)


Remove



contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/teradata/TeradataBinaryRecordReader.java
Lines 211 (patched)


Please make this more readable by replacing the ?: operator with equivalent 
if(){} else {} code.
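
A minimal sketch of the kind of rewrite requested above. The method and its 
parameters are hypothetical, not the code at this line of the patch.

```java
// Hypothetical example; the names do not come from the patch.
final class TernaryRewriteExample {
  private TernaryRewriteExample() {
  }

  // Before (compact but harder to scan):
  //   return remaining > bufferSize ? bufferSize : remaining;
  static int bytesToRead(int remaining, int bufferSize) {
    if (remaining > bufferSize) {
      return bufferSize;
    } else {
      return remaining;
    }
  }
}
```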



contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinaryDataInputStream.java
Lines 101 (patched)


Unnecessary else clause.



contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinaryDataInputStream.java
Lines 122 (patched)


This else clause is unnecessary.



contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinaryDataOutputStream.java
Lines 105 (patched)


This else clause is unnecessary if you explicitly return from the previous 
block.
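
A minimal sketch of the pattern described above, using a hypothetical method 
that is not taken from the patch.

```java
// Hypothetical example of dropping an else after an explicit return.
final class EarlyReturnExample {
  private EarlyReturnExample() {
  }

  static String describeValue(String value) {
    if (value == null) {
      return "NULL";
    }
    // No else needed: the null case has already returned.
    return value.trim();
  }
}
```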



contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinaryDataOutputStream.java
Lines 148 (patched)


Consider breaking this into multiple lines for improved readability:

int toWrite = date.get().getYear() * 10000 +
  date.get().getMonth() * 100 +
  ...
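
For reference, a self-contained sketch of the same multi-line style. It 
assumes Teradata's integer DATE representation, 
(year - 1900) * 10000 + month * 100 + day, and uses java.time.LocalDate 
instead of the date type in the patch; the class and method names are 
hypothetical, and the patch's own arithmetic may differ.

```java
import java.time.LocalDate;

// Sketch only: assumes the (year - 1900) * 10000 + month * 100 + day encoding.
final class TeradataDateEncodingSketch {
  private TeradataDateEncodingSketch() {
  }

  // One term per line keeps the three components easy to check.
  static int encode(LocalDate date) {
    return (date.getYear() - 1900) * 10000
        + date.getMonthValue() * 100
        + date.getDayOfMonth();
  }
}
```

Under that assumption, 2018-07-29 encodes to 1180729.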



contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinaryDataOutputStream.java
Lines 178 (patched)


Add explicit return and remove unnecessary else clause.



contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinaryDataOutputStream.java
Lines 184 (patched)


Instead of logging this info separately, I think it would make more sense 
to include this in the exception message.
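
A minimal sketch of carrying the details in the exception rather than in a 
separate log line. The class name, method name, parameters, and message 
wording are hypothetical.

```java
import java.io.IOException;

// Hypothetical example; names and wording are illustrative assumptions.
final class ExceptionMessageExample {
  private ExceptionMessageExample() {
  }

  // Instead of logging the details and then throwing a bare exception,
  // put the details into the exception message itself.
  static IOException lengthExceeded(String column, int actual, int max) {
    return new IOException(String.format(
        "Column %s: value length %d exceeds maximum %d", column, actual, max));
  }
}
```

Whoever catches or logs the exception then sees the offending values without 
having to correlate it with an earlier log entry.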



contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinaryDataOutputStream.java
Lines 191 (patched)


Unnecessary else clause.



contrib/src/java/org/apache/hadoop/hive/contrib/serde2/teradata/TeradataBinarySerde.java
Lines 190 (patched)


Is this worth logging? If so, consider changing the log level to DEBUG.



contrib/src/java/org/apache/hadoop/hive/contrib/ser