[ https://issues.apache.org/jira/browse/HIVE-17451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16169474#comment-16169474 ]
Matt McCline commented on HIVE-17451:
-------------------------------------

Seems like avro-tools.jar tojson isn't converting the binary (physical type) to decimal (logical type).

> Cannot read decimal from avro file created with HIVE
> ----------------------------------------------------
>
>                 Key: HIVE-17451
>                 URL: https://issues.apache.org/jira/browse/HIVE-17451
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 1.1.0
>            Reporter: liviu
>            Assignee: Ganesh Tripathi
>            Priority: Blocker
>
> Hi,
> When we export decimal data from a Hive managed table to a Hive Avro external table (as bytes with the decimal logicalType), the value in the Avro file cannot be read with any other tool (e.g. avro-tools, Spark, DataStage...).
> _+Scenario:+_
> *Create a Hive managed table and insert a decimal record:*
> {code:java}
> create table test_decimal (col1 decimal(20,2));
> insert into table test_decimal values (3.12);
> {code}
> *Create the Avro schema /tmp/test_decimal.avsc with the content below:*
> {code:java}
> {
>   "type" : "record",
>   "name" : "decimal_test_avro",
>   "fields" : [ {
>     "name" : "col1",
>     "type" : [ "null", {
>       "type" : "bytes",
>       "logicalType" : "decimal",
>       "precision" : 20,
>       "scale" : 2
>     } ],
>     "default" : null,
>     "columnName" : "col1",
>     "sqlType" : "2"
>   } ],
>   "tableName" : "decimal_test_avro"
> }
> {code}
> *Create a Hive external table stored as Avro:*
> {code:java}
> create external table test_decimal_avro
> STORED AS AVRO
> LOCATION '/tmp/test_decimal'
> TBLPROPERTIES (
>   'avro.schema.url'='/tmp/test_decimal.avsc',
>   'orc.compress'='SNAPPY');
> {code}
> *Insert data into the Avro external table from the Hive managed table:*
> {code:java}
> set hive.exec.compress.output=true;
> set hive.exec.compress.intermediate=true;
> set avro.output.codec=snappy;
> insert overwrite table test_decimal_avro select * from test_decimal;
> {code}
> *Reading data from the Hive Avro table through the Hive CLI succeeds:*
> {code:java}
> select * from test_decimal_avro;
> OK
> 3.12
> {code}
> *The Avro schema read back from the created Avro file is OK:*
> {code:java}
> hadoop jar /avro-tools.jar getschema /tmp/test_decimal/000000_0
> {
>   "type" : "record",
>   "name" : "decimal_test_avro",
>   "fields" : [ {
>     "name" : "col1",
>     "type" : [ "null", {
>       "type" : "bytes",
>       "logicalType" : "decimal",
>       "precision" : 20,
>       "scale" : 2
>     } ],
>     "default" : null,
>     "columnName" : "col1",
>     "sqlType" : "2"
>   } ],
>   "tableName" : "decimal_test_avro"
> }
> {code}
> *Reading data from the Avro file with avro-tools {color:#d04437}fails{color}: got {color:#d04437}"\u00018"{color} instead of the correct value:*
> {code:java}
> hadoop jar avro-tools.jar tojson /tmp/test_decimal/000000_0
> {"col1":{"bytes":"\u00018"}}
> {code}
> *Reading the data into a Spark DataFrame also {color:#d04437}fails{color}: got {color:#d04437}[01 38]{color}, and {color:#d04437}8{color} when cast to string, instead of the correct "3.12" value:*
> {code:java}
> val df = sql.read.avro("/tmp/test_decimal")
> df: org.apache.spark.sql.DataFrame = [col1: binary]
> scala> df.show()
> +-------+
> |   col1|
> +-------+
> |[01 38]|
> +-------+
> scala> df.withColumn("col2", 'col1.cast("String")).select("col2").show()
> +----+
> |col2|
> +----+
> |   8|
> +----+
> {code}
> Is this a Hive bug, or is there anything else I can do in order to get correct values in the Avro file created by Hive?
> Thanks,

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
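To illustrate the comment above: the bytes Hive wrote appear to be a valid Avro decimal encoding. The {{"\u00018"}} seen in the tojson output is the two bytes {{0x01 0x38}}, i.e. the big-endian two's-complement unscaled value 312, which at the schema's scale of 2 is exactly 3.12. The readers are handing back the raw bytes instead of applying the logical type. A minimal Java sketch of the conversion step that is being skipped (the class name and hard-coded bytes are illustrative only, not code from any of the tools involved):

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class AvroDecimalDecode {

    // Per the Avro spec, a "decimal" logical type backed by bytes stores the
    // unscaled value as a big-endian two's-complement integer; the scale is
    // taken from the schema, not from the data.
    static BigDecimal decode(byte[] unscaledBytes, int scale) {
        return new BigDecimal(new BigInteger(unscaledBytes), scale);
    }

    public static void main(String[] args) {
        // "\u00018" from the tojson output == bytes {0x01, 0x38} == 312 unscaled
        byte[] raw = {0x01, 0x38};
        int scale = 2; // "scale" : 2 in the writer schema above
        System.out.println(decode(raw, scale)); // prints 3.12
    }
}
```

Applying this to the file's bytes recovers the original 3.12, which suggests the data on disk is fine and the problem is on the reading side.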