[ https://issues.apache.org/jira/browse/SPARK-15848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15323195#comment-15323195 ]
Zhan Zhang commented on SPARK-15848:
------------------------------------

cat > file1.csv <<EOF
0,38,91
0,65,28
0,78,16
1,34,96
1,78,14
1,11,43
EOF

cat > file2.csv <<EOF
5,300,100
7,650,20
8,780,160
1,340,963
9,780,142
2,110,430
EOF

CREATE TABLE csv_table (
  STUDENT_ID INT,
  SUBJECT_ID INT,
  marks INT)
PARTITIONED BY (Year INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH "file1.csv" OVERWRITE INTO TABLE csv_table PARTITION (Year = '2002');
LOAD DATA LOCAL INPATH "file2.csv" OVERWRITE INTO TABLE csv_table PARTITION (Year = '2000');

CREATE TABLE avro_table_uppercase
PARTITIONED BY (YEAR INT)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS
  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES (
  'avro.schema.literal'='{
    "namespace": "com.example.avro",
    "name": "student_marks",
    "type": "record",
    "fields": [
      { "name":"STUDENT_ID","type":"int"},
      { "name":"SUBJECT_ID","type":"int"},
      { "name":"marks","type":"int"}]
  }');

INSERT OVERWRITE TABLE avro_table_uppercase PARTITION (Year)
SELECT STUDENT_ID, SUBJECT_ID, marks, Year FROM csv_table;

Now, from Hive, the following query runs successfully:

select * from avro_table_uppercase;

But from spark-shell, the columns declared in upper case in the Avro schema come back as null:

scala> val tbl = sqlContext.table("default.avro_table_uppercase");
scala> tbl.show
+----------+----------+-----+----+
|student_id|subject_id|marks|year|
+----------+----------+-----+----+
|      null|      null|  100|2000|
|      null|      null|   20|2000|
|      null|      null|  160|2000|
|      null|      null|  963|2000|
|      null|      null|  142|2000|
|      null|      null|  430|2000|
|      null|      null|   91|2002|
|      null|      null|   28|2002|
|      null|      null|   16|2002|
|      null|      null|   96|2002|
|      null|      null|   14|2002|
|      null|      null|   43|2002|
+----------+----------+-----+----+

> Spark unable to read partitioned table in avro format and column name in
> upper case
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-15848
>                 URL: https://issues.apache.org/jira/browse/SPARK-15848
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Zhan Zhang
>
> If external partitioned Hive tables are created in Avro format,
> Spark returns "null" values if column names are in upper case in the
> Avro schema.
> The same tables return proper data when queried in the Hive client.
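
In the output above, only the column declared in lower case in the Avro schema (marks) is populated. That pattern would be consistent with the Avro fields being looked up by the lower-cased table column names in a case-sensitive way. This is an assumption about the cause, not something the report confirms. The sketch below is plain Scala with no Spark or Hive calls; the object and method names are hypothetical and exist only to show how such a lookup would produce this pattern for the first row of file1.csv.

```scala
import java.util.Locale

// Hypothetical illustration only: none of these names come from Spark or Hive.
object CaseSensitivitySketch {
  // First row of file1.csv, keyed by the field names used in avro.schema.literal.
  val avroRecord: Map[String, Int] =
    Map("STUDENT_ID" -> 0, "SUBJECT_ID" -> 38, "marks" -> 91)

  // Column names as they appear in the spark-shell output (all lower case).
  val tableColumns: Seq[String] = Seq("student_id", "subject_id", "marks")

  // Case-sensitive lookup: fields declared in upper case are not found.
  def readCaseSensitive(record: Map[String, Int]): Seq[Option[Int]] =
    tableColumns.map(c => record.get(c))

  // Case-insensitive lookup: lower-case the record keys before matching.
  def readCaseInsensitive(record: Map[String, Int]): Seq[Option[Int]] = {
    val normalised = record.map { case (k, v) => k.toLowerCase(Locale.ROOT) -> v }
    tableColumns.map(c => normalised.get(c))
  }
}
```

Pasted into the Scala REPL, readCaseSensitive(avroRecord) leaves the first two columns empty and keeps marks, which mirrors the null/null/91 row shown for Year 2002, while readCaseInsensitive(avroRecord) recovers all three values.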