[ 
https://issues.apache.org/jira/browse/KUDU-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2454:
------------------------------
    Affects Version/s: 1.5.0

> Avro Import/Export does not round trip
> --------------------------------------
>
>                 Key: KUDU-2454
>                 URL: https://issues.apache.org/jira/browse/KUDU-2454
>             Project: Kudu
>          Issue Type: Bug
>    Affects Versions: 1.5.0
>            Reporter: Grant Henke
>            Priority: Critical
>
> When exporting to Avro, columns with type Byte or Short are treated as 
> Integers because Avro doesn't have a Byte or Short type. When re-importing 
> the data, the job fails because the column types do not match.
> Ideally spark-avro would solve this by safely casting the values back to the 
> smaller type. Guava has utilities to make this straightforward (e.g. 
> Shorts.checkedCast(i)). We could send a pull request to spark-avro to fix 
> this, or add some special handling to the Kudu side to handle the safe 
> downconversion. 
> Another type issue when exporting is that Decimal values are written as 
> Strings instead of BigDecimal logical types. There are a few unmerged pull 
> requests that fix that here: 
>  * [https://github.com/databricks/spark-avro/pull/276]
>  * [https://github.com/databricks/spark-avro/pull/121]
> Additionally, Timestamp values are written as longs instead of Timestamp 
> logical types (timestamp-micros). This is a data corruption issue because the 
> long [value that is 
> output|https://github.com/databricks/spark-avro/blob/0764d699015975acf87dc5210cca8a43db84196a/src/main/scala/com/databricks/spark/avro/AvroOutputWriter.scala#L103]
>  is in milliseconds (Timestamp.getTime()) but the expected long value for a 
> Kudu Timestamp column should be in microseconds.
> Given all these issues, ImportExportFiles needs a lot more test coverage 
> before we suggest its use. Currently it only tests importing Strings from a 
> CSV and does not test Avro or Parquet support.
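
The safe downconversion described in the issue could look roughly like the sketch below. It is a dependency-free illustration of the same idea as Guava's Shorts.checkedCast and SignedBytes.checkedCast, which throw IllegalArgumentException when the value does not fit. The class and method names (SafeNarrowing, toShort, toByte) are hypothetical and are not part of Kudu or spark-avro.

```java
// Hypothetical helper for illustration only; not part of Kudu or spark-avro.
public final class SafeNarrowing {
    private SafeNarrowing() {}

    /** Narrows an Avro int back to a short, failing instead of silently truncating. */
    public static short toShort(int value) {
        short result = (short) value;
        // If the round trip changes the value, it did not fit in 16 bits.
        if (result != value) {
            throw new IllegalArgumentException("Out of range for short: " + value);
        }
        return result;
    }

    /** Narrows an Avro int back to a byte, failing instead of silently truncating. */
    public static byte toByte(int value) {
        byte result = (byte) value;
        // If the round trip changes the value, it did not fit in 8 bits.
        if (result != value) {
            throw new IllegalArgumentException("Out of range for byte: " + value);
        }
        return result;
    }
}
```

Values that were originally exported from a Byte or Short column always pass the range check, so a failure here would indicate that the Avro file does not match the target Kudu schema.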

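To make the millisecond/microsecond mismatch from the issue concrete, here is a minimal sketch of the unit conversion using only the JDK. The class and method names (TimestampMicros, millisToMicros, toMicros) are hypothetical. Whether such a conversion belongs in spark-avro or on the Kudu side is left open, as in the issue.

```java
import java.sql.Timestamp;

// Hypothetical helper for illustration only; not part of Kudu or spark-avro.
public final class TimestampMicros {
    private TimestampMicros() {}

    /** Scales epoch milliseconds (e.g. from Timestamp.getTime()) to epoch microseconds. */
    public static long millisToMicros(long epochMillis) {
        // multiplyExact throws ArithmeticException on overflow instead of wrapping.
        return Math.multiplyExact(epochMillis, 1000L);
    }

    /** Converts a Timestamp to epoch microseconds, keeping sub-millisecond precision. */
    public static long toMicros(Timestamp ts) {
        // floorDiv keeps whole seconds correct for timestamps before the epoch.
        long seconds = Math.floorDiv(ts.getTime(), 1000L);
        // getNanos() holds the full fractional second in nanoseconds.
        long fractionMicros = ts.getNanos() / 1000L;
        return Math.addExact(Math.multiplyExact(seconds, 1_000_000L), fractionMicros);
    }
}
```

Writing the raw getTime() value into a column that expects microseconds yields a value that is a factor of 1000 too small, which is why the issue classifies this as data corruption rather than a type mismatch.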


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
