Extra ZEROS getting generated in Target table
Hey folks,

I am executing an INSERT INTO SELECT Hive query through an Oozie workflow. The details are below. Extra zeros are being generated in the target table (parquet_table). Can anyone help me identify the issue?

Example:
Value of COLUMN_SAMPLE in source table (tab1): 1
Value of COLUMN_SAMPLE in destination table (parquet_table): 100

Avro column:
"name":"COLUMN_SAMPLE","type":["null",{"type":"bytes","logicalType":"decimal","precision":"38","scale":"18"}]

Select query (insert.sql):
insert into table parquet_table select * from default.tab1;

Oozie Hive action:
${jobTracker}
${nameNode}
${hive_site_location}
hive.querylog.location /tmp
hive.exec.local.scratchdir /tmp
hive.root.logger INFO,console
/user/file/insert.sql

Regards,
Dhaval
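No resolution is recorded for this question, so the following is only background that may help narrow it down. An Avro decimal stores just the unscaled integer as big-endian two's-complement bytes; the scale lives in the schema. If the writer and the reader disagree on the scale, the same bytes decode to a value that is off by a power of ten. The sketch below shows that effect with plain Python. The function names and the scale values (2 and 0) are assumptions chosen to reproduce "1 becomes 100"; nothing here confirms that this is what happened in the workflow above. It may also be worth checking that the Avro specification defines precision and scale as JSON integers, whereas the column definition above quotes them as strings.

```python
from decimal import Decimal


def encode_unscaled(value: Decimal, scale: int) -> bytes:
    """Encode a decimal the way the Avro decimal logical type does:
    the unscaled integer as big-endian two's-complement bytes."""
    # Shift the decimal point right by `scale` digits; extra digits are truncated.
    unscaled = int(value.scaleb(scale))
    # One spare bit for the sign, rounded up to whole bytes.
    length = max(1, (unscaled.bit_length() + 8) // 8)
    return unscaled.to_bytes(length, byteorder="big", signed=True)


def decode_unscaled(raw: bytes, scale: int) -> Decimal:
    """Decode the bytes using the scale the *reader* believes is correct."""
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    return Decimal(unscaled).scaleb(-scale)


# Hypothetical scenario: written with scale 2, read back with scale 0.
raw = encode_unscaled(Decimal("1"), 2)
print(decode_unscaled(raw, 2))  # reader agrees with the writer
print(decode_unscaled(raw, 0))  # reader assumes a different scale
```

Comparing the scale declared in the Avro schema with the DECIMAL(precision, scale) declared on the Hive source and target columns would be a reasonable first check.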
RE: Want to Add New Column in Avro Schema
Thanks guys. I just updated the .avsc and it's done. No need to recreate the table.

Regards,
Dhaval

From: Maulik Gandhi [mailto:mmg...@gmail.com]
Sent: Wednesday, March 23, 2016 7:05 PM
To: user
Cc: er.dcpa...@gmail.com
Subject: Re: Want to Add New Column in Avro Schema

The CREATE TABLE DDL looks right to me. How are you updating avro.schema.url?

Thanks.
- Maulik
RE: Want to Add New Column in Avro Schema
Here is the DDL.

DROP TABLE IF EXISTS TEST;
CREATE EXTERNAL TABLE TEST
PARTITIONED BY (
  COL1 STRING,
  COL2 STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS
  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION 'hdfs:///data/hive/TEST'
TBLPROPERTIES ('avro.schema.url'='hdfs:///user/Test.avsc');

Thanks,
Dhaval

From: Aaron.Dossett [mailto:aaron.doss...@target.com]
Sent: Wednesday, March 23, 2016 6:50 PM
To: user@avro.apache.org
Cc: 'er.dcpa...@gmail.com'
Subject: Re: Want to Add New Column in Avro Schema

You shouldn't have to drop the table; just update the .avsc. Can you share the DDL you use to create the table?
RE: Want to Add New Column in Avro Schema
Yes. I made the required changes in the .avsc file, then dropped the table and re-created it using the updated .avsc. But I am not getting the existing data in that case. Where am I going wrong? Can you throw some light on this?

Thanks,
Dhaval

From: Aaron.Dossett [mailto:aaron.doss...@target.com]
Sent: Wednesday, March 23, 2016 6:36 PM
To: user@avro.apache.org
Cc: 'er.dcpa...@gmail.com'
Subject: Re: Want to Add New Column in Avro Schema

If you create the external table by reference to the .avsc file (TBLPROPERTIES ('avro.schema.url'='hdfs://foo.avsc')), then all you have to do is update that .avsc file in a compatible way and Hive should reflect the new schema. I've implemented this pattern in my production system for several months now.

-Aaron
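To make "update that .avsc file in a compatible way" concrete: a new field is generally safe to add when it carries a default, because a reader using the new schema fills in that default for files written before the field existed. The sketch below appends a nullable field with a null default to a record schema using only the json module. The record name, the existing field ID, and the new field NEW_COL are hypothetical placeholders, not taken from the actual Test.avsc.

```python
import json


def add_nullable_field(schema: dict, name: str, avro_type: str = "string") -> dict:
    """Return a copy of a record schema with one optional field appended."""
    if schema.get("type") != "record":
        raise ValueError("expected a record schema")
    if any(field["name"] == name for field in schema["fields"]):
        raise ValueError(f"field {name!r} already exists")
    # "null" comes first so that the null default matches the first union branch.
    new_field = {"name": name, "type": ["null", avro_type], "default": None}
    return {**schema, "fields": [*schema["fields"], new_field]}


# Hypothetical stand-in for the current contents of the .avsc file.
old_schema = json.loads("""
{
  "type": "record",
  "name": "Test",
  "fields": [
    {"name": "ID", "type": ["null", "string"], "default": null}
  ]
}
""")

new_schema = add_nullable_field(old_schema, "NEW_COL")
print(json.dumps(new_schema, indent=2))
```

The printed JSON is what would replace the file that avro.schema.url points to. As for the missing rows after the drop and re-create: the table is partitioned, so one plausible explanation (an assumption, not confirmed in this thread) is that the partition metadata was lost with the drop and would need to be registered again, for example with MSCK REPAIR TABLE.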
Want to Add New Column in Avro Schema
Hey folks,

I want to add a new column to an existing Hive table. We created an external Hive table with the help of a .avsc file. Now I want to add a new column to that table. How can I do that without disturbing any data present in the table? Please help.

Regards,
Dhaval
Exception : Not in union
Hey folks,

I am getting the exception below. It seems to be a problem with the schema file, but I am unable to figure it out. Can anyone help me?

org.apache.crunch.CrunchRuntimeException: org.apache.avro.file.DataFileWriter$AppendWriteException: org.apache.avro.UnresolvedUnionException: Not in union ["null","string"]: 0
    at org.apache.crunch.impl.mr.emit.MultipleOutputEmitter.emit(MultipleOutputEmitter.java:45)
    at org.apache.crunch.MapFn.process(MapFn.java:34)
    at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:98)
    at org.apache.crunch.impl.mr.emit.IntermediateEmitter.emit(IntermediateEmitter.java:56)
    at com.citi.risk.retail.etl.bigdata.function.GC20FeedEnrichmentFn.process(GC20FeedEnrichmentFn.java:71)
    at com.citi.risk.retail.etl.bigdata.function.GC20FeedEnrichmentFn.process(GC20FeedEnrichmentFn.java:28)
    at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:98)
    at org.apache.crunch.impl.mr.emit.IntermediateEmitter.emit(IntermediateEmitter.java:56)
    at org.apache.crunch.MapFn.process(MapFn.java:34)
    at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:98)
    at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:109)
    at org.apache.crunch.impl.mr.run.CrunchMapper.map(CrunchMapper.java:60)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: org.apache.avro.file.DataFileWriter$AppendWriteException: org.apache.avro.UnresolvedUnionException: Not in union ["null","string"]: 0
    at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:296)
    at org.apache.crunch.types.avro.AvroOutputFormat$1.write(AvroOutputFormat.java:86)
    at org.apache.crunch.types.avro.AvroOutputFormat$1.write(AvroOutputFormat.java:83)
    at org.apache.crunch.io.CrunchOutputs.write(CrunchOutputs.java:133)
    at org.apache.crunch.impl.mr.emit.MultipleOutputEmitter.emit(MultipleOutputEmitter.java:41)
    ... 19 more
Caused by: org.apache.avro.UnresolvedUnionException: Not in union ["null","string"]: 0
    at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:607)
    at org.apache.avro.generic.GenericDatumWriter.resolveUnion(GenericDatumWriter.java:151)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:71)
    at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:114)
    at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
    at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:290)
    ... 23 more

Regards,
Dhaval
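A note on reading this message: the text after the final colon is the datum that could not be matched, so the writer was handed a value that prints as 0 and is neither null nor a string, while the field is declared as ["null","string"]. A likely cause (an assumption, since the record-building code is not shown) is that a numeric value is being set on that field in GC20FeedEnrichmentFn. The sketch below is a simplified Python imitation of the union check, not Avro's actual implementation; resolve_union, MATCHERS, and to_nullable_string are illustrative names.

```python
import json

# Simplified checks for a few Avro primitive branches (illustrative only).
MATCHERS = {
    "null": lambda d: d is None,
    "string": lambda d: isinstance(d, str),
    "int": lambda d: isinstance(d, int) and not isinstance(d, bool),
}


def resolve_union(branches, datum):
    """Return the index of the first union branch that accepts the datum."""
    for index, branch in enumerate(branches):
        if MATCHERS[branch](datum):
            return index
    schema_text = json.dumps(branches, separators=(",", ":"))
    raise ValueError(f"Not in union {schema_text}: {datum}")


def to_nullable_string(value):
    """Convert a value so that it fits a ["null","string"] field."""
    return None if value is None else str(value)


union = ["null", "string"]
try:
    resolve_union(union, 0)  # an integer matches neither branch
except ValueError as err:
    print(err)

# Converting the value before setting the field lets the string branch match.
print(resolve_union(union, to_nullable_string(0)))
```

The two usual ways out are to convert the value to a string before putting it on the record, or to change the field's declared type in the schema to one that matches the data (for example ["null","int"]), whichever reflects the intended column type.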