[jira] [Commented] (PIG-4628) Pig 0.14 job with order by fails in mapreduce mode with Oozie
[ https://issues.apache.org/jira/browse/PIG-4628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693974#comment-14693974 ]

Viraj Bhat commented on PIG-4628:
-

Thanks Koji for your help.

Viraj

> Pig 0.14 job with order by fails in mapreduce mode with Oozie
> -
>
> Key: PIG-4628
> URL: https://issues.apache.org/jira/browse/PIG-4628
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.14.0, 0.15.0
> Reporter: Koji Noguchi
> Assignee: Koji Noguchi
> Fix For: 0.15.1
>
> Attachments: pig-4628-v01.patch, pig-4628-v02.patch
>
>
> A simple Pig script with ORDER BY, submitted through Oozie and running in
> mapreduce mode,
> {code}
> A = LOAD '$input' AS (a1:CHARARRAY,a2:CHARARRAY, );
> A_sorted = ORDER A BY url DESC PARALLEL 2;
> STORE A_sorted INTO '$output';
> {code}
> failed on our Hadoop cluster, which had security turned on. Part of the stack
> trace was:
> {noformat}
> 2015-06-08 22:24:39,246 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: Exception reading file:/tmp/2/yarn-local/usercache/userA/appcache/application_1432697993142_199266/container_e06_1432697993142_199266_01_03/container_tokens
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.init(WeightedRangePartitioner.java:155)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.getPartition(WeightedRangePartitioner.java:75)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.getPartition(WeightedRangePartitioner.java:58)
> at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:712)
> at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
> at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map.collect(PigGenericMapReduce.java:135)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:281)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:274)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> {noformat}
> The failing job was application_1432697993142_199305, but the error path
> belonged to application_1432697993142_199266, which was an Oozie pig-launcher job.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
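The key clue in the report is that the file the failing task tried to read lives under a different YARN application (the Oozie launcher, application_1432697993142_199266) than the job that actually failed (application_1432697993142_199305). As a rough diagnostic aid, the sketch below pulls the application ID out of a local path and compares it with the ID of the running job. It assumes only the `application_<clusterTimestamp>_<sequence>` naming visible in the trace; `AppIdCheck` and its methods are hypothetical and not part of Pig, Hadoop or Oozie, and the thread does not say how the launcher's path reached the child task.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical diagnostic helper; not part of Pig, Hadoop or Oozie.
class AppIdCheck {
    // YARN application IDs look like application_<clusterTimestamp>_<sequence>.
    private static final Pattern APP_ID = Pattern.compile("application_\\d+_\\d+");

    // Returns the first application ID embedded in the path, if there is one.
    static Optional<String> appIdInPath(String path) {
        Matcher m = APP_ID.matcher(path);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    // True when the path names an application other than the one that is running.
    static boolean belongsToOtherApp(String path, String currentAppId) {
        return appIdInPath(path).map(id -> !id.equals(currentAppId)).orElse(false);
    }

    public static void main(String[] args) {
        String path = "file:/tmp/2/yarn-local/usercache/userA/appcache/"
                + "application_1432697993142_199266/"
                + "container_e06_1432697993142_199266_01_03/container_tokens";
        System.out.println(appIdInPath(path).orElse("none"));
        System.out.println(belongsToOtherApp(path, "application_1432697993142_199305"));
    }
}
```

Run against the path from the trace with the failing job's ID, the check reports a mismatch, which is exactly the symptom described in the issue.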
[jira] [Commented] (PIG-4628) Pig 0.14 job with order by fails in mapreduce mode with Oozie
[ https://issues.apache.org/jira/browse/PIG-4628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692669#comment-14692669 ]

Viraj Bhat commented on PIG-4628:
-

Rohini, can you please commit this to trunk and/or backport it to 0.14? We are running Pig 0.14 in M/R mode and have run into this problem.

Viraj
[jira] [Updated] (PIG-4498) AvroStorage in Piggbank does not handle bad records and fails
[ https://issues.apache.org/jira/browse/PIG-4498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viraj Bhat updated PIG-4498:

Attachment: PIG-4498.patch

> AvroStorage in Piggbank does not handle bad records and fails
> -
>
> Key: PIG-4498
> URL: https://issues.apache.org/jira/browse/PIG-4498
> Project: Pig
> Issue Type: Bug
> Components: piggybank
> Affects Versions: 0.12.0, 0.11.1, 0.13.1, 0.14.1
> Reporter: Viraj Bhat
> Assignee: Viraj Bhat
> Labels: piggybank
> Fix For: 0.14.1
>
> Attachments: PIG-4498.patch
>
>
> The following Pig script fails if the records within the file are corrupted.
> {code}
> DEFINE AvroLoader org.apache.pig.piggybank.storage.avro.AvroStorage('ignore_bad_files');
> DH_RAW = LOAD 'bad_data*' USING AvroLoader();
> STORE DH_RAW INTO 'output' USING PigStorage();
> {code}
> Here is the stack trace:
> {quote}
> java.lang.ArrayIndexOutOfBoundsException: -49
> at org.apache.pig.piggybank.storage.avro.PigAvroRecordReader.getCurrentValue(PigAvroRecordReader.java:230)
> at org.apache.pig.piggybank.storage.avro.AvroStorage.getNext(AvroStorage.java:407)
> ... 12 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: -49
> at org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:364)
> at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
> at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
> at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
> at org.apache.pig.piggybank.storage.avro.PigAvroDatumReader.readMap(PigAvroDatumReader.java:89)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
> at org.apache.pig.piggybank.storage.avro.PigAvroDatumReader.readRecord(PigAvroDatumReader.java:73)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148)
> at org.apache.pig.piggybank.storage.avro.PigAvroDatumReader.readRecord(PigAvroDatumReader.java:73)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139)
> at org.apache.avro.file.DataFileStream.next(DataFileStream.java:233)
> at org.apache.avro.file.DataFileStream.next(DataFileStream.java:220)
> at org.apache.pig.piggybank.storage.avro.PigAvroRecordReader.getCurrentValue(PigAvroRecordReader.java:198)
> ..
> {quote}
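The `-49` in the trace is raised from `ResolvingDecoder.readIndex` and `Symbol$Alternative.getSymbol`, i.e. while deciding which branch of an Avro union comes next. In Avro's binary encoding that branch index is stored as a zig-zag variable-length integer, so damaged bytes can easily decode to a negative index. The sketch below is a standalone reimplementation of zig-zag varint decoding for illustration only (it does not call the Avro library); the byte `0x61` is an arbitrary example, not the actual corrupt byte in the reported file.

```java
// Standalone illustration of Avro-style zig-zag varint decoding for an int;
// it does not use the Avro library.
class ZigZag {
    // Decodes one zig-zag varint int from the start of buf.
    static int readInt(byte[] buf) {
        int raw = 0;   // unsigned value, accumulated 7 bits per byte
        int shift = 0;
        for (byte b : buf) {
            raw |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                // Undo the zig-zag mapping: 0 -> 0, 1 -> -1, 2 -> 1, 3 -> -2, ...
                return (raw >>> 1) ^ -(raw & 1);
            }
            shift += 7;
            if (shift > 28) {
                throw new IllegalArgumentException("varint longer than 5 bytes");
            }
        }
        throw new IllegalArgumentException("truncated varint");
    }

    public static void main(String[] args) {
        // Arbitrary example byte: 0x61 (97) decodes to -49.
        System.out.println(readInt(new byte[] {0x61}));
    }
}
```

Because a negative (or too-large) branch index can only come from damaged data, a reader that wants to survive bad records has to catch the resulting exception rather than trust the decoded index.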
[jira] [Updated] (PIG-4498) AvroStorage in Piggbank does not handle bad records and fails
[ https://issues.apache.org/jira/browse/PIG-4498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viraj Bhat updated PIG-4498:

Labels: piggybank (was: )
Status: Patch Available (was: Open)
[jira] [Updated] (PIG-4498) AvroStorage in Piggbank does not handle bad records and fails
[ https://issues.apache.org/jira/browse/PIG-4498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viraj Bhat updated PIG-4498:

Affects Version/s: 0.13.1
                   0.12.0
[jira] [Created] (PIG-4498) AvroStorage in Piggbank does not handle bad records and fails
Viraj Bhat created PIG-4498:
---

Summary: AvroStorage in Piggbank does not handle bad records and fails
Key: PIG-4498
URL: https://issues.apache.org/jira/browse/PIG-4498
Project: Pig
Issue Type: Bug
Components: piggybank
Affects Versions: 0.11.1, 0.14.1
Reporter: Viraj Bhat
Assignee: Viraj Bhat
Fix For: 0.14.1
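One general way to make a loader tolerate corrupt records, as the PIG-4498 summary asks, is to catch the per-record decoding exception, count it, and move on, failing only once the number of bad records crosses a threshold. The sketch below is a hypothetical wrapper and not the PIG-4498 patch: `RecordSource`, `SkippingReader` and `maxBadRecords` are illustrative names. With a real Avro data file, continuing after a decode error would likely also require skipping ahead to the next sync marker, which this sketch does not model.

```java
// Hypothetical record source: next() returns null at end of input and may
// throw (for example ArrayIndexOutOfBoundsException) on a corrupt record.
interface RecordSource<T> {
    T next() throws Exception;
}

// Hypothetical sketch, not the PIG-4498 patch: skip records that fail to
// decode and give up only after more than maxBadRecords failures.
class SkippingReader<T> {
    private final RecordSource<T> source;
    private final int maxBadRecords;
    private int badRecords = 0;

    SkippingReader(RecordSource<T> source, int maxBadRecords) {
        this.source = source;
        this.maxBadRecords = maxBadRecords;
    }

    // Returns the next record that decodes cleanly, or null at end of input.
    T next() {
        while (true) {
            try {
                return source.next();
            } catch (Exception e) {
                badRecords++;
                if (badRecords > maxBadRecords) {
                    throw new IllegalStateException("too many bad records: " + badRecords, e);
                }
                // Otherwise drop this record and try the next one.
            }
        }
    }

    int badRecordCount() {
        return badRecords;
    }
}
```

A wrapper like this would sit underneath a loader's `getNext()` (the method that appears in the stack trace above); keeping a threshold rather than skipping unconditionally means a file that is corrupt throughout still fails loudly.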
[jira] [Commented] (PIG-3222) New UDFContextSignature assignments in Pig 0.11 breaks HCatalog.HCatStorer
[ https://issues.apache.org/jira/browse/PIG-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887401#comment-13887401 ]

Viraj Bhat commented on PIG-3222:
-

Hi Daniel,

It seems that this patch is already in our code base for Pig 0.11, but the query still fails there. It succeeds in Pig 0.12. I have asked Rohini whether she has an idea about this.

Thanks again
Viraj

> New UDFContextSignature assignments in Pig 0.11 breaks HCatalog.HCatStorer
> ---
>
> Key: PIG-3222
> URL: https://issues.apache.org/jira/browse/PIG-3222
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.11
> Reporter: Feng Peng
> Labels: hcatalog
> Attachments: PigStorerDemo.java, hcat.trace, hcatstorer.trace.txt
>
>
> Pig 0.11 assigns a different UDFContextSignature to different invocations of
> the same load/store statement. This change breaks HCatStorer, which assumes
> that all front-end and back-end invocations of the same store statement
> have the same UDFContextSignature so that it can read the previously stored
> information correctly.
> The related HCatalog code is in
> https://svn.apache.org/repos/asf/incubator/hcatalog/branches/branch-0.5/hcatalog-pig-adapter/src/main/java/org/apache/hcatalog/pig/HCatStorer.java
> (the setStoreLocation() function).

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
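The breakage described in PIG-3222 comes down to a property store keyed by signature: the front end writes information under one signature and the back end looks it up under another. The sketch below models that with a hypothetical `SignatureKeyedStore` class; it is not Pig's `UDFContext` (whose real API differs), and the signature strings are made up. It only shows why a signature that changes between invocations of the same STORE statement makes previously stored information invisible.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical model of per-signature UDF properties; it is not Pig's
// UDFContext, only an illustration of a store keyed by (class, signature).
class SignatureKeyedStore {
    private final Map<String, Properties> bySignature = new HashMap<>();

    // Returns the Properties for a (class, signature) pair, creating an empty one if absent.
    Properties propertiesFor(Class<?> udfClass, String signature) {
        return bySignature.computeIfAbsent(udfClass.getName() + "#" + signature, k -> new Properties());
    }

    // Stand-in for a storer such as HCatStorer.
    static class DemoStorer {}

    public static void main(String[] args) {
        SignatureKeyedStore store = new SignatureKeyedStore();
        // Front end: the storer records its schema under the signature it was given.
        store.propertiesFor(DemoStorer.class, "sig-1").setProperty("schema", "a:int,b:chararray");
        // Back end, same signature: the schema is found.
        System.out.println(store.propertiesFor(DemoStorer.class, "sig-1").getProperty("schema"));
        // Back end, new signature for the same STORE statement: nothing is found (prints null).
        System.out.println(store.propertiesFor(DemoStorer.class, "sig-2").getProperty("schema"));
    }
}
```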
[jira] [Commented] (PIG-3222) New UDFContextSignature assignments in Pig 0.11 breaks HCatalog.HCatStorer
[ https://issues.apache.org/jira/browse/PIG-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887216#comment-13887216 ]

Viraj Bhat commented on PIG-3222:
-

Hi Feng,

Thanks for finding this error in Pig 0.11. It seems that the LIMIT-to-HCatStorer case works fine with Pig 0.12 but is still a problem with Pig 0.11. I am not sure whether we need to backport whatever change got this working in Pig 0.12.

Viraj
FW: IEEE CloudCom 2013 Call For Papers
Kindly consider submitting.

Viraj

From: c...@grid.chu.edu.tw [mailto:c...@grid.chu.edu.tw]
Sent: Saturday, July 06, 2013 10:42 PM
To: Viraj Bhat
Subject: IEEE CloudCom 2013 Call For Papers

Call for Papers
IEEE CloudCom 2013
(5th IEEE International Conference on Cloud Computing, Technology and Science)
2-5 December 2013, Bristol, UK
2013.cloudcom.org

General Information
---
The “Cloud” is a natural evolution of distributed computing and of the widespread adoption of virtualization and SOA. In Cloud Computing, IT-related capabilities and resources are provided as services, via the Internet and on demand, accessible without requiring detailed knowledge of the underlying technology. The IEEE International Conference and Workshops on Cloud Computing Technology and Science, steered by the Cloud Computing Association, aim to bring together researchers who work on cloud computing and related technologies.

Important Dates
---
Paper submission - July 31, 2013
Workshop, poster and demo papers - August 5, 2013
Notification - September 2, 2013
Camera-ready - September 16, 2013

Paper Submission
-
Manuscripts need to be prepared according to the IEEE CS format: http://www.computer.org/portal/web/cscps/formatting
For regular papers, the page limit is 8 pages.
For workshops and the Ph.D. consortium, the page limit is 6 pages.
For posters and demos, the page limit is 4 pages.
All accepted papers will be published by IEEE CS Press (IEEE Xplore) and indexed by EI and ISSN. Authors of accepted papers will be asked to present them in a plenary session. Authors of distinguished papers will be invited to submit extended versions to prestigious international journals.
IEEE Transactions on Cloud Computing (TCC: http://computer.org/TCC) is organising a Special Issue which encourages submission of revised and extended versions of the best/top-rated papers in the area of Cloud Computing from IEEE CloudCom 2013.
The IEEE CloudCom 2013 submission site is: https://www.easychair.org/conferences/?conf=ieeecloudcom2013

Topics of Interest
--
‧ Cloud architecture
‧ Big Data
‧ Security and Privacy in the Cloud
‧ Cloud services and Applications
‧ Virtualization
‧ HPC on Cloud
‧ IoT and Mobile on Cloud

For further details and workshop information, see http://2013.cloudcom.org or send enquiries to ieeecloudcom2...@easychair.org
[jira] [Updated] (PIG-3318) AVRO: 'default value' not honored when merging schemas on load with AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viraj Bhat updated PIG-3318:

Attachment: Employee6.avro

> AVRO: 'default value' not honored when merging schemas on load with AvroStorage
> ---
>
> Key: PIG-3318
> URL: https://issues.apache.org/jira/browse/PIG-3318
> Project: Pig
> Issue Type: Bug
> Components: piggybank
> Affects Versions: 0.11.2
> Reporter: Viraj Bhat
> Assignee: Viraj Bhat
> Labels: patch
> Fix For: 0.12, 0.11.2
>
> Attachments: Employee3.avro, Employee4.avro, Employee6.avro, PIG-3318_5.patch
>
>
> Piggybank AvroStorage: when merging multiple schemas in which default values
> have been specified in the Avro schema, AvroStorage puts nulls in the merged
> data set instead of the defaults.
>
> ==> Employee3.avro <==
> {
>   "type" : "record",
>   "name" : "employee",
>   "fields" : [
>     {"name" : "name", "type" : "string", "default" : "NU"},
>     {"name" : "age", "type" : "int", "default" : 0},
>     {"name" : "dept", "type" : "string", "default" : "DU"}
>   ]
> }
>
> ==> Employee4.avro <==
> {
>   "type" : "record",
>   "name" : "employee",
>   "fields" : [
>     {"name" : "name", "type" : "string", "default" : "NU"},
>     {"name" : "age", "type" : "int", "default" : 0},
>     {"name" : "dept", "type" : "string", "default" : "DU"},
>     {"name" : "office", "type" : "string", "default" : "OU"}
>   ]
> }
>
> ==> Employee6.avro <==
> {
>   "type" : "record",
>   "name" : "employee",
>   "fields" : [
>     {"name" : "name", "type" : "string", "default" : "NU"},
>     {"name" : "lastname", "type" : "string", "default" : "LNU"},
>     {"name" : "age", "type" : "int", "default" : 0},
>     {"name" : "salary", "type" : "int", "default" : 0},
>     {"name" : "dept", "type" : "string", "default" : "DU"},
>     {"name" : "office", "type" : "string", "default" : "OU"}
>   ]
> }
>
> The Pig script:
> employee = load 'employee{3,4,6}.ser' using org.apache.pig.piggybank.storage.avro.AvroStorage('multiple_schemas');
> describe employee;
> dump employee;
>
> Output Schema:
> employee: {name: chararray,age: int,dept: chararray,lastname: chararray,salary: int,office: chararray}
> (Milo,30,DH,,,)
> (Asmya,34,PQ,,,)
> (Baljit,23,RS,,,)
> (Pune,60,Astrophysics,Warriors,5466,UTA)
> (Rajsathan,20,Biochemistry,Royals,1378,Stanford)
> (Chennai,50,Microbiology,Superkings,7338,Hopkins)
> (Mumbai,20,Applied Math,Indians,4468,UAH)
> (Praj,54,RMX,,,Champaign)
> (Buba,767,HD,,,Sunnyvale)
> (Manku,375,MS,,,New York)
>
> Regards
> Viraj

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
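What the report expects is that a field missing from a file's own schema takes the `default` declared in the merged schema rather than null; under that reading, a row carrying only name, age and dept, such as `(Milo,30,DH,,,)`, would come out as `(Milo,30,DH,LNU,0,OU)`. The sketch below illustrates that projection under simplifying assumptions: the merged schema is already available as an ordered map from field name to default value, and each input row is a map from field name to value. `DefaultFiller` and `project` are hypothetical names; this is not the AvroStorage patch and does not use the Avro library.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: project a row onto a merged schema, filling fields the
// row's own schema lacks with the merged schema's declared defaults.
class DefaultFiller {
    // mergedDefaults maps field name -> default value, in output column order.
    static List<Object> project(LinkedHashMap<String, Object> mergedDefaults, Map<String, Object> row) {
        List<Object> out = new ArrayList<>();
        for (Map.Entry<String, Object> field : mergedDefaults.entrySet()) {
            // Keep the row's own value (even an explicit null) when its schema has the field.
            out.add(row.containsKey(field.getKey()) ? row.get(field.getKey()) : field.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        LinkedHashMap<String, Object> merged = new LinkedHashMap<>();
        merged.put("name", "NU");
        merged.put("age", 0);
        merged.put("dept", "DU");
        merged.put("lastname", "LNU");
        merged.put("salary", 0);
        merged.put("office", "OU");

        // A row read with a schema that has only name, age and dept.
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("name", "Milo");
        row.put("age", 30);
        row.put("dept", "DH");

        System.out.println(project(merged, row)); // [Milo, 30, DH, LNU, 0, OU]
    }
}
```

Testing `containsKey` rather than checking for null keeps a genuine null from a nullable field distinct from a field that is absent from the writer's schema.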
[jira] [Updated] (PIG-3318) AVRO: 'default value' not honored when merging schemas on load with AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viraj Bhat updated PIG-3318:

Attachment: Employee4.avro
[jira] [Updated] (PIG-3318) AVRO: 'default value' not honored when merging schemas on load with AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viraj Bhat updated PIG-3318:

Attachment: PIG-3318_5.patch
[jira] [Updated] (PIG-3318) AVRO: 'default value' not honored when merging schemas on load with AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated PIG-3318: Attachment: Employee3.avro > AVRO: 'default value' not honored when merging schemas on load with > AvroStorage > --- > > Key: PIG-3318 > URL: https://issues.apache.org/jira/browse/PIG-3318 > Project: Pig > Issue Type: Bug > Components: piggybank >Affects Versions: 0.11.2 >Reporter: Viraj Bhat >Assignee: Viraj Bhat > Labels: patch > Fix For: 0.12, 0.11.2 > > Attachments: Employee3.avro, Employee4.avro, Employee6.avro, > PIG-3318_5.patch > > > Piggybank - AvroStorage. When merging multiple schemas in which default values > have been specified in the Avro schema, AvroStorage puts nulls in the merged data set. > ==> Employee3.avro <== > { > "type" : "record", > "name" : "employee", > "fields":[ > {"name" : "name", "type" : "string", "default" : "NU"}, > {"name" : "age", "type" : "int", "default" : 0 }, > {"name" : "dept", "type": "string", "default" : "DU"} ] } > ==> Employee4.avro <== > { > "type" : "record", > "name" : "employee", > "fields":[ > {"name" : "name", "type" : "string", "default" : "NU"}, > {"name" : "age", "type" : "int", "default" : 0}, > {"name" : "dept", "type": "string", "default" : "DU"}, > {"name" : "office", "type": "string", "default" : "OU"} ] } > ==> Employee6.avro <== > { > "type" : "record", > "name" : "employee", > "fields":[ > {"name" : "name", "type" : "string", "default" : "NU"}, > {"name" : "lastname", "type": "string", "default" : "LNU"}, > {"name" : "age", "type" : "int","default" : 0}, > {"name" : "salary", "type": "int", "default" : 0}, > {"name" : "dept", "type": "string","default" : "DU"}, > {"name" : "office", "type": "string","default" : "OU"} ] } > The Pig script: > employee = load 'employee{3,4,6}.ser' using > org.apache.pig.piggybank.storage.avro.AvroStorage('multiple_schemas'); > describe employee; > dump employee; > Output Schema: > employee: {name: chararray,age: int,dept: chararray,lastname: > chararray,salary: int,office: chararray} > (Milo,30,DH,,,) > (Asmya,34,PQ,,,) > (Baljit,23,RS,,,) > (Pune,60,Astrophysics,Warriors,5466,UTA) > (Rajsathan,20,Biochemistry,Royals,1378,Stanford) > (Chennai,50,Microbiology,Superkings,7338,Hopkins) > (Mumbai,20,Applied Math,Indians,4468,UAH) > (Praj,54,RMX,,,Champaign) > (Buba,767,HD,,,Sunnyvale) > (Manku,375,MS,,,New York) > Regards > Viraj -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
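The report implies that when a record comes from a file whose schema lacks some field of the merged schema, the loader should emit that field's declared default instead of null. Under the Output Schema shown, the Employee3 row (Milo,30,DH,,,) would then read (Milo,30,DH,LNU,0,OU). The Java sketch below illustrates that idea only; it is not the AvroStorage patch. Schemas are modelled as hypothetical ordered maps from field name to default value, and the class and helper names (DefaultFillSketch, orderedMap, mergeSchemas, toMergedRow) are illustrative assumptions. Merging in the order Employee3, Employee6, Employee4 happens to reproduce the report's field order; the order AvroStorage itself uses is not stated here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not AvroStorage code: a record schema is modelled as an
// ordered map from field name to the field's declared Avro default value.
public class DefaultFillSketch {

    // Builds an insertion-ordered map from alternating key/value arguments.
    public static Map<String, Object> orderedMap(Object... keyValuePairs) {
        if (keyValuePairs.length % 2 != 0) {
            throw new IllegalArgumentException("expected key/value pairs");
        }
        Map<String, Object> map = new LinkedHashMap<>();
        for (int i = 0; i < keyValuePairs.length; i += 2) {
            map.put((String) keyValuePairs[i], keyValuePairs[i + 1]);
        }
        return map;
    }

    // Union of all fields in first-seen order; the first default seen for a name is kept.
    public static Map<String, Object> mergeSchemas(List<Map<String, Object>> schemas) {
        Map<String, Object> merged = new LinkedHashMap<>();
        for (Map<String, Object> schema : schemas) {
            for (Map.Entry<String, Object> field : schema.entrySet()) {
                if (!merged.containsKey(field.getKey())) {
                    merged.put(field.getKey(), field.getValue());
                }
            }
        }
        return merged;
    }

    // Lays out one record in merged-schema order; a field the record does not
    // carry gets the merged schema's default instead of null.
    public static List<Object> toMergedRow(Map<String, Object> merged, Map<String, Object> record) {
        List<Object> row = new ArrayList<>();
        for (Map.Entry<String, Object> field : merged.entrySet()) {
            String name = field.getKey();
            row.add(record.containsKey(name) ? record.get(name) : field.getValue());
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, Object> employee3 = orderedMap("name", "NU", "age", 0, "dept", "DU");
        Map<String, Object> employee4 = orderedMap("name", "NU", "age", 0, "dept", "DU", "office", "OU");
        Map<String, Object> employee6 = orderedMap("name", "NU", "lastname", "LNU", "age", 0,
                "salary", 0, "dept", "DU", "office", "OU");

        Map<String, Object> merged = mergeSchemas(List.of(employee3, employee6, employee4));
        System.out.println(merged.keySet()); // [name, age, dept, lastname, salary, office]

        // A record read from an Employee3 file carries only name, age and dept.
        Map<String, Object> milo = orderedMap("name", "Milo", "age", 30, "dept", "DH");
        System.out.println(toMergedRow(merged, milo)); // [Milo, 30, DH, LNU, 0, OU]
    }
}
```

A real fix would also have to decide what happens when two files declare different defaults for the same field; this sketch simply keeps the first one it sees.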
Re: Review Request: PIG-3318 Patch to address default values when schemas are merged in AvroStorage. It does this for Records containing primitive values
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11135/ --- (Updated June 14, 2013, 12:15 a.m.) Review request for pig and Rohini Palaniswamy. Changes --- Indentation and variable case changes. Description --- Default values are not honoured when merging schemas. This addresses bug PIG-3318. https://issues.apache.org/jira/browse/PIG-3318 Diffs (updated) - http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java 1491556 http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java 1491556 http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroInputFormat.java 1491556 http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java 1491556 http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 1491562 Diff: https://reviews.apache.org/r/11135/diff/ Testing --- Yes Thanks, Viraj Bhat
[jira] [Updated] (PIG-3318) AVRO: 'default value' not honored when merging schemas on load with AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated PIG-3318: Attachment: (was: Employee3.ser) > AVRO: 'default value' not honored when merging schemas on load with > AvroStorage > --- > > Key: PIG-3318 > URL: https://issues.apache.org/jira/browse/PIG-3318 > Project: Pig > Issue Type: Bug > Components: piggybank >Affects Versions: 0.11.2 >Reporter: Viraj Bhat >Assignee: Viraj Bhat > Labels: patch > Fix For: 0.12, 0.11.2
[jira] [Updated] (PIG-3318) AVRO: 'default value' not honored when merging schemas on load with AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated PIG-3318: Attachment: (was: Employee6.ser) > AVRO: 'default value' not honored when merging schemas on load with > AvroStorage > --- > > Key: PIG-3318 > URL: https://issues.apache.org/jira/browse/PIG-3318 > Project: Pig > Issue Type: Bug > Components: piggybank >Affects Versions: 0.11.2 >Reporter: Viraj Bhat >Assignee: Viraj Bhat > Labels: patch > Fix For: 0.12, 0.11.2
[jira] [Updated] (PIG-3318) AVRO: 'default value' not honored when merging schemas on load with AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated PIG-3318: Attachment: (was: Employee4.ser) > AVRO: 'default value' not honored when merging schemas on load with > AvroStorage > --- > > Key: PIG-3318 > URL: https://issues.apache.org/jira/browse/PIG-3318 > Project: Pig > Issue Type: Bug > Components: piggybank >Affects Versions: 0.11.2 >Reporter: Viraj Bhat >Assignee: Viraj Bhat > Labels: patch > Fix For: 0.12, 0.11.2
Re: Review Request: PIG-3318 Patch to address default values when schemas are merged in AvroStorage. It does this for Records containing primitive values
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11135/ --- (Updated June 13, 2013, 6:34 p.m.) Review request for pig and Rohini Palaniswamy. Changes --- 1) Change the test case to use mock.Storage 2) Remove the condition under which results were not verified on Hadoop 23 3) Add back the "usemultipleSchemas" flag to handle cases where schemaToMergedSchemaMap is null and multiple_schemas is invoked; test case testMultipleSchema1 failed with the previous patch 4) Testing done with Hadoop 23 Description --- Default values are not honoured when merging schemas. This addresses bug PIG-3318. https://issues.apache.org/jira/browse/PIG-3318 Diffs (updated) - http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java 1491556 http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java 1491556 http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroInputFormat.java 1491556 http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java 1491556 http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 1491562 Diff: https://reviews.apache.org/r/11135/diff/ Testing --- Yes Thanks, Viraj Bhat
[jira] [Updated] (PIG-3318) AVRO: 'default value' not honored when merging schemas on load with AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated PIG-3318: Attachment: (was: PIG-3318_3.patch) > AVRO: 'default value' not honored when merging schemas on load with > AvroStorage > --- > > Key: PIG-3318 > URL: https://issues.apache.org/jira/browse/PIG-3318 > Project: Pig > Issue Type: Bug > Components: piggybank >Affects Versions: 0.11.2 >Reporter: Viraj Bhat >Assignee: Viraj Bhat > Labels: patch > Fix For: 0.12, 0.11.2 > > Attachments: Employee3.ser, Employee4.ser, Employee6.ser
[jira] [Commented] (PIG-3318) AVRO: 'default value' not honored when merging schemas on load with AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681848#comment-13681848 ] Viraj Bhat commented on PIG-3318: - Sorry for attaching the wrong patch; it makes the test case write to an Avro file. I have modified the test to use mock.Storage() and will reattach the correct patch. Viraj > AVRO: 'default value' not honored when merging schemas on load with > AvroStorage > --- > > Key: PIG-3318 > URL: https://issues.apache.org/jira/browse/PIG-3318 > Project: Pig > Issue Type: Bug > Components: piggybank >Affects Versions: 0.11.2 >Reporter: Viraj Bhat >Assignee: Viraj Bhat > Labels: patch > Fix For: 0.12, 0.11.2 > > Attachments: Employee3.ser, Employee4.ser, Employee6.ser, > PIG-3318_3.patch
[jira] [Updated] (PIG-3318) AVRO: 'default value' not honored when merging schemas on load with AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated PIG-3318: Attachment: (was: expected_testMultipleSchemasWithDefaultValue.avro) > AVRO: 'default value' not honored when merging schemas on load with > AvroStorage > --- > > Key: PIG-3318 > URL: https://issues.apache.org/jira/browse/PIG-3318 > Project: Pig > Issue Type: Bug > Components: piggybank >Affects Versions: 0.11.2 >Reporter: Viraj Bhat >Assignee: Viraj Bhat > Labels: patch > Fix For: 0.12, 0.11.2 > > Attachments: Employee3.ser, Employee4.ser, Employee6.ser, > PIG-3318_3.patch
[jira] [Updated] (PIG-3318) AVRO: 'default value' not honored when merging schemas on load with AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated PIG-3318: Attachment: PIG-3318_3.patch > AVRO: 'default value' not honored when merging schemas on load with > AvroStorage > --- > > Key: PIG-3318 > URL: https://issues.apache.org/jira/browse/PIG-3318 > Project: Pig > Issue Type: Bug > Components: piggybank >Affects Versions: 0.11.2 >Reporter: Viraj Bhat >Assignee: Viraj Bhat > Labels: patch > Fix For: 0.12, 0.11.2 > > Attachments: Employee3.ser, Employee4.ser, Employee6.ser, > expected_testMultipleSchemasWithDefaultValue.avro, PIG-3318_3.patch
Re: Review Request: PIG-3318 Patch to address default values when schemas are merged in AvroStorage. It does this for Records containing primitive values
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11135/ --- (Updated June 12, 2013, 2:05 a.m.) Review request for pig and Rohini Palaniswamy. Changes --- Formatting changes Description --- Default values are not honoured when merging schemas. This addresses bug PIG-3318. https://issues.apache.org/jira/browse/PIG-3318 Diffs (updated) - http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java 1491556 http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java 1491556 http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java 1491556 http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 1491562 Diff: https://reviews.apache.org/r/11135/diff/ Testing --- Yes Thanks, Viraj Bhat
[jira] [Updated] (PIG-3318) AVRO: 'default value' not honored when merging schemas on load with AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated PIG-3318: Attachment: (was: PIG-3318_2.patch) > AVRO: 'default value' not honored when merging schemas on load with > AvroStorage > --- > > Key: PIG-3318 > URL: https://issues.apache.org/jira/browse/PIG-3318 > Project: Pig > Issue Type: Bug > Components: piggybank >Affects Versions: 0.11.2 >Reporter: Viraj Bhat >Assignee: Viraj Bhat > Labels: patch > Fix For: 0.12, 0.11.2 > > Attachments: Employee3.ser, Employee4.ser, Employee6.ser, > expected_testMultipleSchemasWithDefaultValue.avro
[jira] [Updated] (PIG-3318) AVRO: 'default value' not honored when merging schemas on load with AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated PIG-3318: Attachment: (was: PIG-3318_1.patch) > AVRO: 'default value' not honored when merging schemas on load with > AvroStorage > --- > > Key: PIG-3318 > URL: https://issues.apache.org/jira/browse/PIG-3318 > Project: Pig > Issue Type: Bug > Components: piggybank >Affects Versions: 0.11.2 >Reporter: Viraj Bhat >Assignee: Viraj Bhat > Labels: patch > Fix For: 0.12, 0.11.2 > > Attachments: Employee3.ser, Employee4.ser, Employee6.ser, > expected_testMultipleSchemasWithDefaultValue.avro, PIG-3318_2.patch
[jira] [Updated] (PIG-3318) AVRO: 'default value' not honored when merging schemas on load with AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated PIG-3318: Attachment: PIG-3318_2.patch > AVRO: 'default value' not honored when merging schemas on load with > AvroStorage > --- > > Key: PIG-3318 > URL: https://issues.apache.org/jira/browse/PIG-3318 > Project: Pig > Issue Type: Bug > Components: piggybank >Affects Versions: 0.11.2 >Reporter: Viraj Bhat >Assignee: Viraj Bhat > Labels: patch > Fix For: 0.12, 0.11.2 > > Attachments: Employee3.ser, Employee4.ser, Employee6.ser, > expected_testMultipleSchemasWithDefaultValue.avro, PIG-3318_2.patch
Re: Review Request: PIG-3318 Patch to address default values when schemas are merged in AvroStorage. It does this for Records containing primitive values
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11135/ --- (Updated June 11, 2013, 9:40 p.m.) Review request for pig and Rohini Palaniswamy. Changes --- Addressing comments in diff6 Description --- Default values are not honoured when merging schemas. This addresses bug PIG-3318. https://issues.apache.org/jira/browse/PIG-3318 Diffs (updated) - http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java 1491556 http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java 1491556 http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java 1491556 http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 1491562 Diff: https://reviews.apache.org/r/11135/diff/ Testing --- Yes Thanks, Viraj Bhat
[jira] [Updated] (PIG-3318) AVRO: 'default value' not honored when merging schemas on load with AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated PIG-3318: Attachment: (was: PIG-3118.0.11.patch) > AVRO: 'default value' not honored when merging schemas on load with > AvroStorage > --- > > Key: PIG-3318 > URL: https://issues.apache.org/jira/browse/PIG-3318 > Project: Pig > Issue Type: Bug > Components: piggybank >Affects Versions: 0.11.2 >Reporter: Viraj Bhat >Assignee: Viraj Bhat > Labels: patch > Fix For: 0.12, 0.11.2 > > Attachments: Employee3.ser, Employee4.ser, Employee6.ser, > expected_testMultipleSchemasWithDefaultValue.avro, PIG-3318_1.patch
[jira] [Updated] (PIG-3318) AVRO: 'default value' not honored when merging schemas on load with AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated PIG-3318: Attachment: expected_testMultipleSchemasWithDefaultValue.avro > AVRO: 'default value' not honored when merging schemas on load with > AvroStorage > --- > > Key: PIG-3318 > URL: https://issues.apache.org/jira/browse/PIG-3318 > Project: Pig > Issue Type: Bug > Components: piggybank >Affects Versions: 0.11.2 >Reporter: Viraj Bhat >Assignee: Viraj Bhat > Labels: patch > Fix For: 0.12, 0.11.2 > > Attachments: Employee3.ser, Employee4.ser, Employee6.ser, > expected_testMultipleSchemasWithDefaultValue.avro, PIG-3318_1.patch
[jira] [Updated] (PIG-3318) AVRO: 'default value' not honored when merging schemas on load with AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated PIG-3318: Attachment: PIG-3318_1.patch > AVRO: 'default value' not honored when merging schemas on load with > AvroStorage > --- > > Key: PIG-3318 > URL: https://issues.apache.org/jira/browse/PIG-3318 > Project: Pig > Issue Type: Bug > Components: piggybank >Affects Versions: 0.11.2 >Reporter: Viraj Bhat >Assignee: Viraj Bhat > Labels: patch > Fix For: 0.12, 0.11.2 > > Attachments: Employee3.ser, Employee4.ser, Employee6.ser, > PIG-3118.0.11.patch, PIG-3318_1.patch
[jira] [Updated] (PIG-3318) AVRO: 'default value' not honored when merging schemas on load with AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated PIG-3318: Attachment: (was: expected_testMultipleSchemasDefault1.avro) > AVRO: 'default value' not honored when merging schemas on load with > AvroStorage > --- > > Key: PIG-3318 > URL: https://issues.apache.org/jira/browse/PIG-3318 > Project: Pig > Issue Type: Bug > Components: piggybank >Affects Versions: 0.11.2 >Reporter: Viraj Bhat >Assignee: Viraj Bhat > Labels: patch > Fix For: 0.12, 0.11.2 > > Attachments: Employee3.ser, Employee4.ser, Employee6.ser, > PIG-3118.0.11.patch, PIG-3318_1.patch
[jira] [Updated] (PIG-3318) AVRO: 'default value' not honored when merging schemas on load with AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated PIG-3318: Attachment: (was: PIG-3318.patch) > AVRO: 'default value' not honored when merging schemas on load with > AvroStorage > --- > > Key: PIG-3318 > URL: https://issues.apache.org/jira/browse/PIG-3318 > Project: Pig > Issue Type: Bug > Components: piggybank >Affects Versions: 0.11.2 >Reporter: Viraj Bhat >Assignee: Viraj Bhat > Labels: patch > Fix For: 0.12, 0.11.2 > > Attachments: Employee3.ser, Employee4.ser, Employee6.ser, > PIG-3118.0.11.patch, PIG-3318_1.patch
Re: Review Request: PIG-3318 Patch to address default values when schemas are merged in AvroStorage. It does this for Records containing primitive values
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11135/ --- (Updated June 11, 2013, 3:04 a.m.) Review request for pig and Rohini Palaniswamy. Changes --- Updated patch Description --- Default values are not honoured when merging schemas. This addresses bug PIG-3318. https://issues.apache.org/jira/browse/PIG-3318 Diffs (updated) - http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java 1491556 http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java 1491556 http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java 1491556 http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 1491562 Diff: https://reviews.apache.org/r/11135/diff/ Testing --- Yes Thanks, Viraj Bhat
Re: Review Request: PIG-3318 Patch to address default values when schemas are merged in AvroStorage. It does this for Records containing primitive values
> On June 10, 2013, 7:06 p.m., Rohini Palaniswamy wrote: > > http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java, > > lines 327-338 > > <https://reviews.apache.org/r/11135/diff/5/?file=295050#file295050line327> > > > > This can be simplified into few lines This was fixed by creating a new function which will make the code more readable. - Viraj --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11135/#review21662 ------- On May 30, 2013, 2:28 a.m., Viraj Bhat wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/11135/ > --- > > (Updated May 30, 2013, 2:28 a.m.) > > > Review request for pig and Rohini Palaniswamy. > > > Description > --- > > Default values are not honoured when merging default schema > > > This addresses bug PIG-3318. > https://issues.apache.org/jira/browse/PIG-3318 > > > Diffs > - > > > http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java > 1484564 > > http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java > 1484564 > > http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroInputFormat.java > 1484564 > > http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java > 1484564 > > http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java > 1484564 > > Diff: https://reviews.apache.org/r/11135/diff/ > > > Testing > --- > > Yes > > > Thanks, > > Viraj Bhat > >
[jira] [Created] (PIG-3353) Feature parity between HCatStorer and HCatLoader in Pig using Avroserde and Piggybank AvroStorage
Viraj Bhat created PIG-3353: --- Summary: Feature parity between HCatStorer and HCatLoader in Pig using Avroserde and Piggybank AvroStorage Key: PIG-3353 URL: https://issues.apache.org/jira/browse/PIG-3353 Project: Pig Issue Type: Improvement Reporter: Viraj Bhat Currently there are two paths for accessing an Avro file in Pig: one uses HCatLoader and HCatStorer, and the other uses AvroStorage from piggybank. We need to investigate the feature differences between these two access patterns. Regards Viraj
[jira] [Updated] (PIG-3331) Default values not stored in avro file when using specific schemas during store in AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated PIG-3331: Attachment: PIG-3331_1.patch Updated patch > Default values not stored in avro file when using specific schemas during > store in AvroStorage > -- > > Key: PIG-3331 > URL: https://issues.apache.org/jira/browse/PIG-3331 > Project: Pig > Issue Type: Bug > Components: piggybank >Affects Versions: 0.11.1 >Reporter: Viraj Bhat >Assignee: Viraj Bhat > Fix For: 0.11.2 > > Attachments: expected_DefaultSchemaWrite.avro, PIG-3331_1.patch > > > Script which stores Avro using a predefined schema does not store the default > values in the file > {code} > a = LOAD 'numbers.txt' USING PigStorage (':') as (intnum1000: int,id: > int,intnum5: int,intnum100: int,intnum: int,longnum: long,floatnum: > float,doublenum: double); > b2 = foreach a generate id, intnum5, intnum100; > c2 = filter b2 by 110 <= id and id < 120; > STORE c2 INTO '/tmp/TestAvroStorage/testDefaultValueWrite' USING > org.apache.pig.piggybank.storage.avro.AvroStorage (' { "debug" : 5, "schema" : > { "name" : "rmyrecord", "type" : "record", "fields" : [ { "name" : "id", > "type" : "int" , "default" : 0 }, { "name" : "intnum5", "type" : "int", > "default" : 0 }, { "name" : "intnum100", "type" : "int", "default" : 0 } ] } } > '); > {code} > Opening the file shows the following schema > {noformat} > avro.schema > {"type":"record","name":"rmyrecord","fields":[{"name":"id","type":"int"},{"name":"intnum5","type":"int"},{"name":"intnum100","type":"int"}]} > {noformat} > There seems to be a problem storing the schema. > Viraj
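The report comes down to one problem. The `default` attribute is dropped when the user-supplied schema is turned into the writer schema that ends up in the file's `avro.schema` metadata. The sketch below is a hypothetical illustration only. The `FieldSpec` and `WriterSchemaSketch` names are invented here and are not the PigSchema2Avro code. It serializes a flat record schema and emits `"default"` only when a field declares one. It assumes int-valued defaults and names that need no JSON escaping.

```java
import java.util.List;

// Hypothetical flat field description; not part of Pig or Avro.
final class FieldSpec {
    final String name;
    final String type;
    final Integer defaultValue; // null means "no default declared"

    FieldSpec(String name, String type, Integer defaultValue) {
        this.name = name;
        this.type = type;
        this.defaultValue = defaultValue;
    }
}

final class WriterSchemaSketch {

    private WriterSchemaSketch() {}

    // Serialize a record schema as compact JSON, keeping "default" when present.
    // Assumes names need no JSON escaping and defaults are ints.
    static String toJson(String recordName, List<FieldSpec> fields) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"type\":\"record\",\"name\":\"").append(recordName).append("\",\"fields\":[");
        for (int i = 0; i < fields.size(); i++) {
            FieldSpec f = fields.get(i);
            if (i > 0) {
                sb.append(',');
            }
            sb.append("{\"name\":\"").append(f.name).append("\",\"type\":\"").append(f.type).append('"');
            if (f.defaultValue != null) {
                // This is the attribute the reported writer schema is missing.
                sb.append(",\"default\":").append(f.defaultValue);
            }
            sb.append('}');
        }
        return sb.append("]}").toString();
    }
}
```

Suppose all three defaults are set to 0, as in the STORE statement above. The output is then the observed `avro.schema` JSON with `"default":0` added to each field. Passing `null` defaults reproduces the observed schema exactly.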
[jira] [Updated] (PIG-3331) Default values not stored in avro file when using specific schemas during store in AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated PIG-3331: Attachment: (was: PIG-3331_1.patch) > Default values not stored in avro file when using specific schemas during > store in AvroStorage > -- > > Key: PIG-3331 > URL: https://issues.apache.org/jira/browse/PIG-3331 > Project: Pig > Issue Type: Bug > Components: piggybank >Affects Versions: 0.11.1 >Reporter: Viraj Bhat >Assignee: Viraj Bhat > Fix For: 0.11.2 > > Attachments: expected_DefaultSchemaWrite.avro, PIG-3331_1.patch
Re: Review Request: PIG-3331 Default values not written to Schema when specified in the output schema
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11355/ --- (Updated June 4, 2013, 11:23 p.m.) Review request for pig and Rohini Palaniswamy. Changes --- Updated the patch based on PIG-3322 Viraj Description --- Patch to make AvroStorage write default values into the stored schema when the writer schema specifies them. Diffs (updated) - http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigSchema2Avro.java 1485826 http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 1489655 Diff: https://reviews.apache.org/r/11355/diff/ Testing --- Yes against the Piggybank in Pig trunk/Pig 0.12 Thanks, Viraj Bhat
Re: Review Request: PIG-3331 Default values not written to Schema when specified in the output schema
> On June 2, 2013, 8:55 p.m., Cheolsoo Park wrote: > > Hi Viraj, > > > > I have a couple of comments: > > - 5k records seems unnecessary for a unit test case. You need just a few > > records to verify your fix, don't you? > > - In your test case, can't you use mock.Storage instead of PigStorage? Then, > > you won't need an extra input file (numbers.txt). Please see > > org.apache.pig.builtin.mock.Storage.java. > > - Can you put code changes and test files in a single patch and attach it > > in the jira? It would be very helpful if I could apply everything with a > > single patch command. > > > > Thank you! Hi Cheolsoo, Thanks for your comments. I fixed the test case and replaced PigStorage() with mock.Storage. Viraj - Viraj --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11355/#review21304 --- On June 4, 2013, 9:50 p.m., Viraj Bhat wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/11355/ > --- > > (Updated June 4, 2013, 9:50 p.m.) > > > Review request for pig and Rohini Palaniswamy. > > > Description > --- > > Patch to write default values to the Schema when the writer schema contains > that in the AvroStorage. > > > Diffs > - > > > http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigSchema2Avro.java > 1485826 > > http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java > 1485826 > > Diff: https://reviews.apache.org/r/11355/diff/ > > > Testing > --- > > Yes against the Piggybank in Pig trunk/Pig 0.12 > > > Thanks, > > Viraj Bhat > >
[jira] [Updated] (PIG-3331) Default values not stored in avro file when using specific schemas during store in AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated PIG-3331: Attachment: PIG-3331_1.patch Latest patch > Default values not stored in avro file when using specific schemas during > store in AvroStorage > -- > > Key: PIG-3331 > URL: https://issues.apache.org/jira/browse/PIG-3331 > Project: Pig > Issue Type: Bug > Components: piggybank >Affects Versions: 0.11.1 >Reporter: Viraj Bhat >Assignee: Viraj Bhat > Fix For: 0.11.2 > > Attachments: expected_DefaultSchemaWrite.avro, PIG-3331_1.patch
[jira] [Updated] (PIG-3331) Default values not stored in avro file when using specific schemas during store in AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated PIG-3331: Attachment: (was: PIG-3331_1.patch) > Default values not stored in avro file when using specific schemas during > store in AvroStorage > -- > > Key: PIG-3331 > URL: https://issues.apache.org/jira/browse/PIG-3331 > Project: Pig > Issue Type: Bug > Components: piggybank >Affects Versions: 0.11.1 >Reporter: Viraj Bhat >Assignee: Viraj Bhat > Fix For: 0.11.2 > > Attachments: expected_DefaultSchemaWrite.avro, PIG-3331_1.patch
Re: Review Request: PIG-3331 Default values not written to Schema when specified in the output schema
> On June 3, 2013, 1:20 p.m., Rohini Palaniswamy wrote: > > http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java, > > lines 629-636 > > <https://reviews.apache.org/r/11355/diff/2/?file=295976#file295976line629> > > > > Isn't a load and store enough to reproduce the test case? Why such a > > long pig script? Please try to keep the unit tests simple. Made a smaller script to test it. - Viraj --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11355/#review21315 ---
Re: Review Request: PIG-3331 Default values not written to Schema when specified in the output schema
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11355/ --- (Updated June 4, 2013, 9:50 p.m.) Review request for pig and Rohini Palaniswamy. Changes --- 1) Changed patch to use mock.Storage() 2) Smaller generated avro file Description --- Patch to make AvroStorage write default values into the stored schema when the writer schema specifies them. Diffs (updated) - http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigSchema2Avro.java 1485826 http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 1485826 Diff: https://reviews.apache.org/r/11355/diff/ Testing --- Yes against the Piggybank in Pig trunk/Pig 0.12 Thanks, Viraj Bhat
[jira] [Updated] (PIG-3331) Default values not stored in avro file when using specific schemas during store in AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viraj Bhat updated PIG-3331:
    Attachment: expected_DefaultSchemaWrite.avro

Expected Avro file

> Default values not stored in avro file when using specific schemas during store in AvroStorage
> --
>
> Key: PIG-3331
> URL: https://issues.apache.org/jira/browse/PIG-3331
> Project: Pig
> Issue Type: Bug
> Components: piggybank
> Affects Versions: 0.11.1
> Reporter: Viraj Bhat
> Assignee: Viraj Bhat
> Fix For: 0.11.2
>
> Attachments: expected_DefaultSchemaWrite.avro, PIG-3331_1.patch
>
>
> A script that stores Avro data using a predefined schema does not store the default values in the file:
> {code}
> a = LOAD 'numbers.txt' USING PigStorage (':') as (intnum1000: int,id: int,intnum5: int,intnum100: int,intnum: int,longnum: long,floatnum: float,doublenum: double);
> b2 = foreach a generate id, intnum5, intnum100;
> c2 = filter b2 by 110 <= id and id < 120;
> STORE c2 INTO '/tmp/TestAvroStorage/testDefaultValueWrite' USING org.apache.pig.piggybank.storage.avro.AvroStorage (' { "debug" : 5, "schema" : { "name" : "rmyrecord", "type" : "record", "fields" : [ { "name" : "id", "type" : "int" , "default" : 0 }, { "name" : "intnum5", "type" : "int", "default" : 0 }, { "name" : "intnum100", "type" : "int", "default" : 0 } ] } } ');
> {code}
> Opening the file shows the following schema:
> {noformat}
> avro.schema
> {"type":"record","name":"rmyrecord","fields":[{"name":"id","type":"int"},{"name":"intnum5","type":"int"},{"name":"intnum100","type":"int"}]}
> {noformat}
> There seems to be a problem storing the schema: the "default" entries given in the STORE statement are dropped.
> Viraj

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
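To make the expected behaviour concrete, here is a minimal, self-contained Java sketch of the idea: when the output schema is generated from the Pig schema, each field should inherit the "default" declared for the same-named field in the user-supplied writer schema. FieldSpec and WriterDefaults are hypothetical names for illustration only. This is not the PigSchema2Avro code in the patch, and it does not use the Avro library.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of one Avro record field (not the Avro or piggybank API).
final class FieldSpec {
    final String name;
    final String type;
    final Object defaultValue; // null means "no default declared"

    FieldSpec(String name, String type, Object defaultValue) {
        this.name = name;
        this.type = type;
        this.defaultValue = defaultValue;
    }

    // Renders the field as it would appear inside the avro.schema JSON.
    String toJson() {
        StringBuilder json = new StringBuilder();
        json.append("{\"name\":\"").append(name).append("\",\"type\":\"").append(type).append('"');
        if (defaultValue != null) {
            // Strings are quoted; numbers and booleans are written as-is (no JSON escaping in this sketch).
            String literal = defaultValue instanceof String
                    ? "\"" + defaultValue + "\""
                    : String.valueOf(defaultValue);
            json.append(",\"default\":").append(literal);
        }
        return json.append('}').toString();
    }
}

final class WriterDefaults {
    // Returns a copy of the generated fields in which every field without a
    // default inherits the default declared for the same-named field of the
    // user-supplied writer schema.
    static List<FieldSpec> applyWriterDefaults(List<FieldSpec> generated, List<FieldSpec> writer) {
        Map<String, Object> declared = new HashMap<>();
        for (FieldSpec field : writer) {
            if (field.defaultValue != null) {
                declared.put(field.name, field.defaultValue);
            }
        }
        List<FieldSpec> result = new ArrayList<>();
        for (FieldSpec field : generated) {
            Object def = field.defaultValue != null ? field.defaultValue : declared.get(field.name);
            result.add(new FieldSpec(field.name, field.type, def));
        }
        return result;
    }
}
```

With `"default" : 0` in the writer schema, the id field would render as `{"name":"id","type":"int","default":0}` rather than the `{"name":"id","type":"int"}` seen in the stored file. Because null stands for "no default" here, this sketch cannot express an explicit Avro `"default": null`.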
[jira] [Updated] (PIG-3331) Default values not stored in avro file when using specific schemas during store in AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viraj Bhat updated PIG-3331:
    Attachment: PIG-3331_1.patch

Updated Pig patch

> Default values not stored in avro file when using specific schemas during store in AvroStorage
> --
>
> Key: PIG-3331
> URL: https://issues.apache.org/jira/browse/PIG-3331
> Project: Pig
> Issue Type: Bug
> Components: piggybank
> Affects Versions: 0.11.1
> Reporter: Viraj Bhat
> Assignee: Viraj Bhat
> Fix For: 0.11.2
>
> Attachments: expected_DefaultSchemaWrite.avro, PIG-3331_1.patch
[jira] [Updated] (PIG-3331) Default values not stored in avro file when using specific schemas during store in AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viraj Bhat updated PIG-3331:
    Attachment: (was: expected_DefaultSchemaWrite.avro)

> Default values not stored in avro file when using specific schemas during store in AvroStorage
> --
>
> Key: PIG-3331
> URL: https://issues.apache.org/jira/browse/PIG-3331
> Project: Pig
> Issue Type: Bug
> Components: piggybank
> Affects Versions: 0.11.1
> Reporter: Viraj Bhat
> Assignee: Viraj Bhat
> Fix For: 0.11.2
[jira] [Updated] (PIG-3331) Default values not stored in avro file when using specific schemas during store in AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viraj Bhat updated PIG-3331:
    Attachment: (was: numbers.txt)

> Default values not stored in avro file when using specific schemas during store in AvroStorage
> --
>
> Key: PIG-3331
> URL: https://issues.apache.org/jira/browse/PIG-3331
> Project: Pig
> Issue Type: Bug
> Components: piggybank
> Affects Versions: 0.11.1
> Reporter: Viraj Bhat
> Assignee: Viraj Bhat
> Fix For: 0.11.2
[jira] [Updated] (PIG-3322) AVRO: AvroStorage give NPE on reading file with union as top level schema
[ https://issues.apache.org/jira/browse/PIG-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viraj Bhat updated PIG-3322:
    Attachment: (was: test_loadavrowithnulls.avro)

> AVRO: AvroStorage give NPE on reading file with union as top level schema
> -
>
> Key: PIG-3322
> URL: https://issues.apache.org/jira/browse/PIG-3322
> Project: Pig
> Issue Type: Bug
> Components: piggybank
> Affects Versions: 0.11.2
> Reporter: Egil Sorensen
> Assignee: Viraj Bhat
> Labels: patch
> Fix For: 0.12
>
> Attachments: PIG-3322_3.patch, test_loadavrowithnulls.avro
>
>
> I am getting an NPE when loading, with AvroStorage, a file that has a schema like:
> {code}
> ["null",{"type":"record","name":"TUPLE_0","fields":[{"name":"name","type":["null","string"],"doc":"autogenerated from Pig Field Schema"},{"name":"age","type":["null","int"],"doc":"autogenerated from Pig Field Schema"},{"name":"gpa","type":["null","double"],"doc":"autogenerated from Pig Field Schema"}]}]
> {code}
> E.g. see the e2e style test, which fails on this:
> {code}
> {
>   'num' => 4,
>   # storing file with Pig type tuple relying on conversion to record
>   # loading using stored schemas
>   'notmq' => 1,
>   'pig' => q\
> a = load ':INPATH:/singlefile/studentcomplextab10k' using PigStorage() as (m:[], t:(name:chararray, age:int, gpa:double), b:{t:(name:chararray, age:int, gpa:double)});
> b = foreach a generate t;
> describe b;
> store b into ':OUTPATH:.intermediate' USING org.apache.pig.piggybank.storage.avro.AvroStorage();
> exec;
> -- Read back what was stored with Avro
> u = load ':OUTPATH:.intermediate' USING org.apache.pig.piggybank.storage.avro.AvroStorage();
> describe u;
> store u into ':OUTPATH:';
> \,
>   'verify_pig_script' => q\
> a = load ':INPATH:/singlefile/studentcomplextab10k' using PigStorage() as (m:[], t:(name:chararray, age:int, gpa:double), b:{t:(name:chararray, age:int, gpa:double)});
> b = foreach a generate t;
> describe b;
> store b into ':OUTPATH:';
> \,
> },
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
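One plausible way to avoid an NPE like this is sketched below in plain Java. The sketch treats a top-level union of "null" and a single record type as that record type, and maps a null datum (the union's "null" branch) to a null tuple instead of dereferencing it. NullableUnion, nonNullBranch and toFields are hypothetical names. Records are modelled as maps keyed by field name rather than as Avro GenericRecord objects. This is not the code in PIG-3322_3.patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helpers illustrating one way to read a top-level ["null", record] union.
final class NullableUnion {
    // Returns the single non-null branch of a union such as ["null", "TUPLE_0"],
    // or null when the union has no "null" branch or more than one other branch.
    static String nonNullBranch(List<String> branches) {
        String other = null;
        boolean sawNull = false;
        for (String branch : branches) {
            if ("null".equals(branch)) {
                sawNull = true;
            } else if (other == null) {
                other = branch;
            } else {
                return null; // two or more non-null branches: not a simple nullable union
            }
        }
        return sawNull ? other : null;
    }

    // Converts one datum (modelled as a map from field name to value) into an
    // ordered list of field values. A null datum, i.e. the union's "null"
    // branch, yields null instead of triggering a NullPointerException.
    static List<Object> toFields(Map<String, Object> datum, List<String> fieldNames) {
        if (datum == null) {
            return null;
        }
        List<Object> values = new ArrayList<>();
        for (String name : fieldNames) {
            values.add(datum.get(name));
        }
        return values;
    }
}
```

For the schema in the report, `nonNullBranch` would pick `TUPLE_0`. Field lookups then go through the record branch, and null rows are passed through as nulls.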
[jira] [Updated] (PIG-3322) AVRO: AvroStorage give NPE on reading file with union as top level schema
[ https://issues.apache.org/jira/browse/PIG-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viraj Bhat updated PIG-3322:
    Attachment: PIG-3322_3.patch

> AVRO: AvroStorage give NPE on reading file with union as top level schema
> -
>
> Key: PIG-3322
> URL: https://issues.apache.org/jira/browse/PIG-3322
> Project: Pig
> Issue Type: Bug
> Components: piggybank
> Affects Versions: 0.11.2
> Reporter: Egil Sorensen
> Assignee: Viraj Bhat
> Labels: patch
> Fix For: 0.12
>
> Attachments: PIG-3322_3.patch, test_loadavrowithnulls.avro
[jira] [Updated] (PIG-3322) AVRO: AvroStorage give NPE on reading file with union as top level schema
[ https://issues.apache.org/jira/browse/PIG-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viraj Bhat updated PIG-3322:
    Attachment: test_loadavrowithnulls.avro

> AVRO: AvroStorage give NPE on reading file with union as top level schema
> -
>
> Key: PIG-3322
> URL: https://issues.apache.org/jira/browse/PIG-3322
> Project: Pig
> Issue Type: Bug
> Components: piggybank
> Affects Versions: 0.11.2
> Reporter: Egil Sorensen
> Assignee: Viraj Bhat
> Labels: patch
> Fix For: 0.12
>
> Attachments: PIG-3322_3.patch, test_loadavrowithnulls.avro
[jira] [Updated] (PIG-3322) AVRO: AvroStorage give NPE on reading file with union as top level schema
[ https://issues.apache.org/jira/browse/PIG-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viraj Bhat updated PIG-3322:
    Attachment: (was: expected_testLoadAvrowithNulls.txt)

> AVRO: AvroStorage give NPE on reading file with union as top level schema
> -
>
> Key: PIG-3322
> URL: https://issues.apache.org/jira/browse/PIG-3322
> Project: Pig
> Issue Type: Bug
> Components: piggybank
> Affects Versions: 0.11.2
> Reporter: Egil Sorensen
> Assignee: Viraj Bhat
> Labels: patch
> Fix For: 0.12
>
> Attachments: PIG-3322_3.patch, test_loadavrowithnulls.avro
[jira] [Updated] (PIG-3322) AVRO: AvroStorage give NPE on reading file with union as top level schema
[ https://issues.apache.org/jira/browse/PIG-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viraj Bhat updated PIG-3322:
    Attachment: (was: PIG-3322_2.patch)

> AVRO: AvroStorage give NPE on reading file with union as top level schema
> -
>
> Key: PIG-3322
> URL: https://issues.apache.org/jira/browse/PIG-3322
> Project: Pig
> Issue Type: Bug
> Components: piggybank
> Affects Versions: 0.11.2
> Reporter: Egil Sorensen
> Assignee: Viraj Bhat
> Labels: patch
> Fix For: 0.12
>
> Attachments: PIG-3322_3.patch, test_loadavrowithnulls.avro
Re: Review Request: PIG-3322 Fix the issue where NPE is thrown when reading a union which has nulls and add a testcase
> On June 2, 2013, 9:27 p.m., Cheolsoo Park wrote:
> > http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java, line 1104
> > <https://reviews.apache.org/r/11333/diff/5/?file=298357#file298357line1104>
> >
> > If you use mock.Storage here instead of PigStorage, you won't need the verifyTextResults method and extra output file. Can you please update your test?
> >
> > Please see org.apache.pig.builtin.mock.Storage.java.

Added Mock Storage.

- Viraj

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11333/#review21305
---
Re: Review Request: PIG-3322 Fix the issue where NPE is thrown when reading a union which has nulls and add a testcase
> On June 3, 2013, 1:03 p.m., Rohini Palaniswamy wrote:
> > Just minor comments in the naming of the variable. Java variable names should be camel case.

Thanks, but the verifyTxtResults method is no longer used.

- Viraj

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11333/#review21312
---
Re: Review Request: PIG-3322 Fix the issue where NPE is thrown when reading a union which has nulls and add a testcase
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11333/
---

(Updated June 4, 2013, 12:15 a.m.)

Review request for pig and Rohini Palaniswamy.

Changes
---

Using MockStorage instead of PigStorage and comparing results inline for 4 records.

Description
---

Null pointer exception when loading a union with null in its schema. A sample test case was also added.

Diffs (updated)
-

  http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java 1485358
  http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java 1485358
  http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 1485358

Diff: https://reviews.apache.org/r/11333/diff/

Testing
---

Yes, all tests pass in the piggybank.

Thanks,

Viraj Bhat
Re: Review Request: PIG-3318 Patch to address default values when schemas are merged in AvroStorage. It does this for Records containing primitive values
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11135/
---

(Updated May 30, 2013, 2:28 a.m.)

Review request for pig and Rohini Palaniswamy.

Description
---

Default values are not honoured when schemas are merged. This addresses bug PIG-3318.

https://issues.apache.org/jira/browse/PIG-3318

Diffs
-

  http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java 1484564
  http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java 1484564
  http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroInputFormat.java 1484564
  http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java 1484564
  http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 1484564

Diff: https://reviews.apache.org/r/11135/diff/

Testing
---

Yes

Thanks,

Viraj Bhat
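This sketch assumes that the PIG-3318 problem is the following: when AvroStorage merges the schemas of several input files, a primitive field missing from one file's schema comes back as null instead of as the declared default. Under that assumption, honouring defaults means the following when a record is projected onto the merged schema. Any field the file lacks takes the merged field's default. MergedField and SchemaMergeDefaults are hypothetical names, and records are modelled as maps. This is not the AvroStorage or AvroStorageUtils code in the review.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model of one field in the merged schema (not the AvroStorage API).
final class MergedField {
    final String name;
    final Object defaultValue; // null means no default was declared

    MergedField(String name, Object defaultValue) {
        this.name = name;
        this.defaultValue = defaultValue;
    }
}

final class SchemaMergeDefaults {
    // Projects a record read from one input file onto the merged field list.
    // A field the file's own schema does not contain gets the merged field's
    // default (or null when none was declared). A field that is present keeps
    // its value, even if that value is null.
    static List<Object> project(Map<String, Object> record, List<MergedField> merged) {
        List<Object> row = new ArrayList<>();
        for (MergedField field : merged) {
            row.add(record.containsKey(field.name) ? record.get(field.name) : field.defaultValue);
        }
        return row;
    }
}
```

Using `containsKey` rather than a null check keeps an explicit null written by the file distinct from a field that the file's schema never had.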
Re: Review Request: PIG-3331 Default values not written to Schema when specified in the output schema
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11355/
---

(Updated May 30, 2013, 2:29 a.m.)

Review request for pig and Rohini Palaniswamy.

Description
---

Patch to write default values into the stored schema when the writer schema passed to AvroStorage specifies them.

Diffs
-

  http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigSchema2Avro.java 1485826
  http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 1485826
  http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/numbers.txt PRE-CREATION

Diff: https://reviews.apache.org/r/11355/diff/

Testing
---

Yes, against the Piggybank in Pig trunk/Pig 0.12

Thanks,

Viraj Bhat
[jira] [Updated] (PIG-3322) AVRO: AvroStorage give NPE on reading file with union as top level schema
[ https://issues.apache.org/jira/browse/PIG-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viraj Bhat updated PIG-3322:
    Attachment: expected_testLoadAvrowithNulls.txt

Golden test file generated

> AVRO: AvroStorage give NPE on reading file with union as top level schema
> -
>
> Key: PIG-3322
> URL: https://issues.apache.org/jira/browse/PIG-3322
> Project: Pig
> Issue Type: Bug
> Components: piggybank
> Affects Versions: 0.11.2
> Reporter: Egil Sorensen
> Assignee: Viraj Bhat
> Labels: patch
> Fix For: 0.12
>
> Attachments: expected_testLoadAvrowithNulls.txt, PIG-3322_2.patch, test_loadavrowithnulls.avro
[jira] [Updated] (PIG-3322) AVRO: AvroStorage give NPE on reading file with union as top level schema
[ https://issues.apache.org/jira/browse/PIG-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viraj Bhat updated PIG-3322:
    Attachment: test_loadavrowithnulls.avro

Test Input Avro file

> AVRO: AvroStorage give NPE on reading file with union as top level schema
> -
>
> Key: PIG-3322
> URL: https://issues.apache.org/jira/browse/PIG-3322
> Project: Pig
> Issue Type: Bug
> Components: piggybank
> Affects Versions: 0.11.2
> Reporter: Egil Sorensen
> Assignee: Viraj Bhat
> Labels: patch
> Fix For: 0.12
>
> Attachments: expected_testLoadAvrowithNulls.txt, PIG-3322_2.patch, test_loadavrowithnulls.avro
[jira] [Updated] (PIG-3322) AVRO: AvroStorage give NPE on reading file with union as top level schema
[ https://issues.apache.org/jira/browse/PIG-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viraj Bhat updated PIG-3322:
    Attachment: PIG-3322_2.patch

Patch for PIG-3322

> AVRO: AvroStorage give NPE on reading file with union as top level schema
> -
>
> Key: PIG-3322
> URL: https://issues.apache.org/jira/browse/PIG-3322
> Project: Pig
> Issue Type: Bug
> Components: piggybank
> Affects Versions: 0.11.2
> Reporter: Egil Sorensen
> Assignee: Viraj Bhat
> Labels: patch
> Fix For: 0.12
>
> Attachments: expected_testLoadAvrowithNulls.txt, PIG-3322_2.patch, test_loadavrowithnulls.avro
[jira] [Updated] (PIG-3322) AVRO: AvroStorage give NPE on reading file with union as top level schema
[ https://issues.apache.org/jira/browse/PIG-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viraj Bhat updated PIG-3322:

Attachment: (was: test_loadavrowithnulls.avro)
[jira] [Updated] (PIG-3322) AVRO: AvroStorage give NPE on reading file with union as top level schema
[ https://issues.apache.org/jira/browse/PIG-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viraj Bhat updated PIG-3322:

Attachment: (was: expected_testLoadAvrowithNulls.txt)
Re: Review Request: PIG-3322 Fix the issue where NPE is thrown when reading a union which has nulls and add a testcase
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11333/
---

(Updated May 29, 2013, 11:07 p.m.)

Review request for pig and Rohini Palaniswamy.

Changes
---
Smaller input files and output golden files

Description
---
Null pointer exception when loading a union with null in its schema.
TestAvroStorage was also updated with a sample test case.

Diffs (updated)
-
http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java 1485358
http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java 1485358
http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 1485358
http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/expected_testLoadAvrowithNulls.txt PRE-CREATION

Diff: https://reviews.apache.org/r/11333/diff/

Testing
---
Yes, all tests pass in the piggybank.

Thanks,

Viraj Bhat
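The review attributes the NPE to loading a union that contains null, and the diff touches `PigAvroRecordReader.java`. When the top-level schema is `["null", record]`, a decoded row can itself be null, so any code that reads its fields needs a null check first. Below is a minimal Python sketch of that guard, using made-up sample rows. `datum_to_tuple` is hypothetical and only illustrates the idea. It is not the code in the patch.

```python
from typing import Any, Mapping, Optional, Sequence, Tuple


def datum_to_tuple(datum: Optional[Mapping[str, Any]],
                   field_names: Sequence[str]) -> Optional[Tuple[Any, ...]]:
    """Convert one decoded Avro record into a Pig-style tuple.

    With a ["null", record] top-level schema the decoded datum may itself be
    None, so it is checked before any field is read. Nullable fields inside
    the record are passed through as None.
    """
    if datum is None:
        return None  # emit a null row instead of dereferencing a missing record
    return tuple(datum.get(name) for name in field_names)


FIELDS = ["name", "age", "gpa"]
# Made-up sample rows: a full record, a null record, a record with null fields.
rows = [{"name": "alice", "age": 20, "gpa": 3.5},
        None,
        {"name": None, "age": 21, "gpa": None}]
print([datum_to_tuple(row, FIELDS) for row in rows])
# [('alice', 20, 3.5), None, (None, 21, None)]
```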
[jira] [Commented] (PIG-3331) Default values not stored in avro file when using specific schemas during store in AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13665713#comment-13665713 ]

Viraj Bhat commented on PIG-3331:
-

Patch posted on the review board: https://reviews.apache.org/r/11355/

Viraj

> Default values not stored in avro file when using specific schemas during
> store in AvroStorage
> --
>
> Key: PIG-3331
> URL: https://issues.apache.org/jira/browse/PIG-3331
> Project: Pig
> Issue Type: Bug
> Components: piggybank
> Affects Versions: 0.11.1
> Reporter: Viraj Bhat
> Assignee: Viraj Bhat
> Fix For: 0.11.2
>
> Attachments: expected_DefaultSchemaWrite.avro, numbers.txt
>
>
> A script that stores Avro using a predefined schema does not store the default
> values in the file:
> {code}
> a = LOAD 'numbers.txt' USING PigStorage (':') as (intnum1000: int,id: int,intnum5: int,intnum100: int,intnum: int,longnum: long,floatnum: float,doublenum: double);
> b2 = foreach a generate id, intnum5, intnum100;
> c2 = filter b2 by 110 <= id and id < 120;
> STORE c2 INTO '/tmp/TestAvroStorage/testDefaultValueWrite' USING org.apache.pig.piggybank.storage.avro.AvroStorage (' { "debug" : 5, "schema" : { "name" : "rmyrecord", "type" : "record", "fields" : [ { "name" : "id", "type" : "int" , "default" : 0 }, { "name" : "intnum5", "type" : "int", "default" : 0 }, { "name" : "intnum100", "type" : "int", "default" : 0 } ] } } ');
> {code}
> Opening the file shows the following schema:
> {noformat}
> avro.schema
> {"type":"record","name":"rmyrecord","fields":[{"name":"id","type":"int"},{"name":"intnum5","type":"int"},{"name":"intnum100","type":"int"}]}
> {noformat}
> The "default" attributes given in the STORE schema are missing, so there seems
> to be a problem storing the schema.
> Viraj
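You can check the symptom mechanically by comparing the schema passed to AvroStorage with the `avro.schema` value read back from the file. Below is a minimal Python sketch that uses only `json`, with both schema strings copied from the report. `missing_defaults` is a hypothetical helper, and extracting the header from the `.avro` file itself is not shown.

```python
import json


def missing_defaults(requested_json, stored_json):
    """Names of fields whose "default" in the requested schema is absent from,
    or different in, the schema stored in the Avro file header."""
    requested = json.loads(requested_json)
    stored_fields = {f["name"]: f for f in json.loads(stored_json)["fields"]}
    missing = []
    for field in requested["fields"]:
        if "default" not in field:
            continue  # nothing to preserve for this field
        stored = stored_fields.get(field["name"], {})
        if "default" not in stored or stored["default"] != field["default"]:
            missing.append(field["name"])
    return missing


# The "schema" part of the AvroStorage argument in the PIG-3331 script.
REQUESTED = """{ "name" : "rmyrecord", "type" : "record", "fields" : [
  { "name" : "id", "type" : "int", "default" : 0 },
  { "name" : "intnum5", "type" : "int", "default" : 0 },
  { "name" : "intnum100", "type" : "int", "default" : 0 } ] }"""

# The avro.schema value reported after opening the stored file.
STORED = ('{"type":"record","name":"rmyrecord","fields":[{"name":"id","type":"int"},'
          '{"name":"intnum5","type":"int"},{"name":"intnum100","type":"int"}]}')

print(missing_defaults(REQUESTED, STORED))  # ['id', 'intnum5', 'intnum100']
```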
[jira] [Updated] (PIG-3331) Default values not stored in avro file when using specific schemas during store in AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viraj Bhat updated PIG-3331:

Attachment: numbers.txt

Input text file with numbers
[jira] [Updated] (PIG-3331) Default values not stored in avro file when using specific schemas during store in AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viraj Bhat updated PIG-3331:

Attachment: expected_DefaultSchemaWrite.avro

Expected Avro file with default schema
Re: Review Request: PIG-3318 Patch to address default values when schemas are merged in AvroStorage. It does this for Records containing primitive values
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11135/
---

(Updated May 23, 2013, 12:12 a.m.)

Review request for pig.

Summary (updated)
-
PIG-3318 Patch to address default values when schemas are merged in AvroStorage. It does this for Records containing primitive values

Description
---
Default values are not honoured when schemas are merged.

This addresses bug PIG-3318.
https://issues.apache.org/jira/browse/PIG-3318

Diffs
-
http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java 1484564
http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java 1484564
http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroInputFormat.java 1484564
http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java 1484564
http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 1484564

Diff: https://reviews.apache.org/r/11135/diff/

Testing
---
Yes

Thanks,

Viraj Bhat
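The review does not spell out the merge rule. The following is therefore only an assumption about what "honouring" defaults during a merge could mean for records of primitive fields: fields are matched by name, and a `default` declared in either input schema survives into the merged schema. `merge_record_fields` is a hypothetical Python helper. It assumes that fields with the same name have the same type, and it does no type reconciliation.

```python
def merge_record_fields(first, second):
    """Merge the field lists of two record schemas by field name.

    Fields keep the order in which they are first seen. A "default" declared
    on either side is kept; if both declare one, the first schema's value wins.
    Fields with the same name are assumed to have the same primitive type.
    """
    merged = {}
    order = []
    for schema in (first, second):
        for field in schema["fields"]:
            name = field["name"]
            if name not in merged:
                merged[name] = dict(field)  # copy so the inputs are not mutated
                order.append(name)
            elif "default" not in merged[name] and "default" in field:
                merged[name]["default"] = field["default"]
    return {"type": "record", "name": first["name"],
            "fields": [merged[name] for name in order]}


left = {"type": "record", "name": "r", "fields": [
    {"name": "id", "type": "int", "default": 0},
    {"name": "x", "type": "string"}]}
right = {"type": "record", "name": "r", "fields": [
    {"name": "x", "type": "string", "default": "n/a"},
    {"name": "y", "type": "long", "default": 1}]}

merged = merge_record_fields(left, right)
print([(f["name"], f.get("default")) for f in merged["fields"]])
# [('id', 0), ('x', 'n/a'), ('y', 1)]
```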
[jira] [Commented] (PIG-3322) AVRO: AvroStorage give NPE on reading file with union as top level schema
[ https://issues.apache.org/jira/browse/PIG-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13664543#comment-13664543 ]

Viraj Bhat commented on PIG-3322:
-

Review board: https://reviews.apache.org/r/11333/
[jira] [Updated] (PIG-3322) AVRO: AvroStorage give NPE on reading file with union as top level schema
[ https://issues.apache.org/jira/browse/PIG-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viraj Bhat updated PIG-3322:

Fix Version/s: 0.12
[jira] [Updated] (PIG-3322) AVRO: AvroStorage give NPE on reading file with union as top level schema
[ https://issues.apache.org/jira/browse/PIG-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viraj Bhat updated PIG-3322:

Attachment: expected_testLoadAvrowithNulls.txt

Expected File generated from the testcase
[jira] [Updated] (PIG-3322) AVRO: AvroStorage give NPE on reading file with union as top level schema
[ https://issues.apache.org/jira/browse/PIG-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viraj Bhat updated PIG-3322:

Attachment: test_loadavrowithnulls.avro

Avro file used for the TestAvroStorage.java
[jira] [Commented] (PIG-2330) Problem in org.apache.pig.piggybank.storage.avro.AvroStorage when storing a record with a single field.
[ https://issues.apache.org/jira/browse/PIG-2330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13662567#comment-13662567 ]

Viraj Bhat commented on PIG-2330:
-

Hi,

The issue here is not related to PIG-3322. The one-line fix should solve the above problem.

Consider changing the script to wrap the field with TOTUPLE. The following works and generates the output below:
{code}
A = load 'input.txt' AS (name1:chararray, name2:chararray);
B = foreach A generate TOTUPLE($0);
dump B;
store B into 'singlefieldoutput' using org.apache.pig.piggybank.storage.avro.AvroStorage('{"schema": {"type": "record", "name": "main", "fields": [{"name": "name", "type": ["null", "string"]}]}}');
{code}
Output
{noformat}
((Viraj))
((Roh))
((Govind))
{noformat}

The table at https://cwiki.apache.org/PIG/avrostorage.html shows that a Pig tuple can be converted to an Avro record, since both are sets of ordered fields, but that a "chararray" cannot be converted to a "record". In Pig you cannot generate a single chararray; it is always wrapped in a tuple.

Now try loading the output generated by the Pig script above:
{code}
A = load 'singlefieldoutput' using org.apache.pig.piggybank.storage.avro.AvroStorage();
describe A;
dump A;
{code}
This prints:
{noformat}
(Viraj)
(Roh)
(Govind)
{noformat}
which differs from the output of "dump B": each row has one less level of tuple nesting.

Viraj

> Problem in org.apache.pig.piggybank.storage.avro.AvroStorage when storing a
> record with a single field.
> ---
>
> Key: PIG-2330
> URL: https://issues.apache.org/jira/browse/PIG-2330
> Project: Pig
> Issue Type: Bug
> Components: piggybank
> Affects Versions: 0.9.0
> Reporter: Stan Rosenberg
> Attachments: AvroStorage.patch, input.txt
>
>
> Running the following script yields a RuntimeException. If the schema is
> changed to contain two fields, then A can be stored successfully.
> {noformat}
> REGISTER 'piggybank.jar'
> REGISTER 'avro-1.5.4.jar'
> REGISTER 'json-simple-1.1.jar'
> A = load 'input.txt' AS (name1:chararray, name2:chararray);
> B = foreach A generate $0;
> store B into './output' using org.apache.pig.piggybank.storage.avro.AvroStorage(
> '{"schema": {"type": "record", "name": "main", "fields": [{"name": "name", "type": ["null", "string"]}]}}');
> {noformat}
[jira] [Commented] (PIG-3323) AVRO: default value not stored in file when given as paramter to AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13662558#comment-13662558 ]

Viraj Bhat commented on PIG-3323:
-

Hi Scott,

Thanks for your explanation of default values; the documentation on this is limited.

BTW, I have opened PIG-3331, which I think is valid. Please let me know if it is not.

Regards,
Viraj

> AVRO: default value not stored in file when given as paramter to AvroStorage
>
>
> Key: PIG-3323
> URL: https://issues.apache.org/jira/browse/PIG-3323
> Project: Pig
> Issue Type: Bug
> Components: piggybank
> Affects Versions: 0.11.2
> Reporter: Egil Sorensen
> Assignee: Viraj Bhat
> Labels: patch
> Fix For: 0.12, 0.11.2
>
>
> A Pig script like the one below succeeds, but inspecting the resulting file I
> find that the schema has been stripped of the default value specification.
> {code}
> a = load ':INPATH:/types/numbers.txt' using PigStorage(':') as (intnum1000: int,id: int,intnum5: int,intnum100: int,intnum: int,longnum: long,floatnum: float,doublenum: double);
> b2 = foreach a generate id, intnum5, intnum100;
> c2 = filter b2 by 110 <= id and id < 120;
> describe c2;
> dump c2;
> store c2 into ':OUTPATH:.intermediate_2' USING
> org.apache.pig.piggybank.storage.avro.AvroStorage('
> {
>   "debug" : 5,
>   "schema" : {
>     "name" : "schema_2",
>     "type" : "record",
>     "fields" : [
>       { "name" : "id", "type" : [ "null", "int" ] },
>       { "name" : "intnum5", "type" : [ "null", "int" ] },
>       { "name" : "intnum100", "type" : [ "null", "int" ], "default" : 0 }
>     ]
>   }
> }
> ');
> {code}
> BTW, the documentation at https://cwiki.apache.org/PIG/avrostorage.html is
> silent on the subject of defaults, so the first question is: is my expectation
> that the default is written to the file incorrect?
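Scott's explanation is not included in this thread. One relevant rule from the Avro specification is that the default of a union-typed field must match the first schema in the union. Under that rule, `"default" : 0` on `intnum100`, whose type is `["null", "int"]`, is not a valid default. The default would have to be `null`, or the union would have to be reordered to `["int", "null"]`. The Python sketch below checks that rule for primitive types only. `default_is_valid` is a hypothetical helper, and this thread does not establish whether this rule is why AvroStorage dropped the default.

```python
# Value checks for Avro primitive types, as they appear in JSON-decoded defaults.
PRIMITIVE_CHECKS = {
    "null": lambda v: v is None,
    "boolean": lambda v: isinstance(v, bool),
    "int": lambda v: isinstance(v, int) and not isinstance(v, bool) and -2**31 <= v < 2**31,
    "long": lambda v: isinstance(v, int) and not isinstance(v, bool) and -2**63 <= v < 2**63,
    "float": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "double": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
}


def default_is_valid(field):
    """Check a record field's "default" against its type (primitive types only).

    For a union, the default must match the union's first branch.
    """
    if "default" not in field:
        return True
    field_type = field["type"]
    if isinstance(field_type, list):
        field_type = field_type[0]  # only the first branch counts for defaults
    if not isinstance(field_type, str) or field_type not in PRIMITIVE_CHECKS:
        raise NotImplementedError("this sketch handles primitive types only")
    return PRIMITIVE_CHECKS[field_type](field["default"])


# The intnum100 field from the PIG-3323 script, and two variants.
print(default_is_valid({"name": "intnum100", "type": ["null", "int"], "default": 0}))     # False
print(default_is_valid({"name": "intnum100", "type": ["int", "null"], "default": 0}))     # True
print(default_is_valid({"name": "intnum100", "type": ["null", "int"], "default": None}))  # True
```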
Re: Review Request: Patch to address default values when schemas are merged in AvroStorage. It does this for Records containing primitive values
(Updated May 21, 2013, 1:05 a.m.)

Review request for pig.

Changes
---
Sorry for the spam. Hopefully no more white spaces escaped my attention.

Thanks,

Viraj Bhat
Re: Review Request: Patch to address default values when schemas are merged in AvroStorage. It does this for Records containing primitive values
(Updated May 21, 2013, 1 a.m.)

Review request for pig.

Changes
---
Removed extra white spaces which escaped my attention and minor formatting changes.

Thanks,

Viraj Bhat
Re: Review Request: Patch to address default values when schemas are merged in AvroStorage. It does this for Records containing primitive values
(Updated May 21, 2013, 12:42 a.m.)

Review request for pig.

Changes
---
Removed extra white spaces.

Thanks,

Viraj Bhat
Re: Review Request: Patch to address default values when schemas are merged in AvroStorage. It does this for Records containing primitive values
(Updated May 21, 2013, 12:23 a.m.)

Review request for pig.

Changes
---
Removed Tabs and rebased patch with PIG-3321

Thanks,

Viraj Bhat
Re: Review Request: Patch to address default values when schemas are merged in AvroStorage. It does this for Records containing primitive values
> On May 15, 2013, 12:19 a.m., Rohini Palaniswamy wrote: > > Please fix formatting - spaces instead of tabs and no extra white spaces. > > This patch will conflict with PIG-3321. Can you merge the changes once that > > is committed and upload a new patch? Hi Rohini, I have removed all the tabs and merged in the PIG-3321 changes. Resubmitting. Viraj - Viraj --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11135/#review20552 --- On May 14, 2013, 1:09 a.m., Viraj Bhat wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/11135/ > --- > > (Updated May 14, 2013, 1:09 a.m.) > > > Review request for pig. > > > Description > --- > > Default values are not honoured when merging default schema > > > This addresses bug PIG-3318. > https://issues.apache.org/jira/browse/PIG-3318 > > > Diffs > - > > > http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java > 1481245 > > http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java > 1481245 > > http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroInputFormat.java > 1481245 > > http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java > 1481245 > > http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java > 1481245 > > Diff: https://reviews.apache.org/r/11135/diff/ > > > Testing > --- > > Yes > > > Thanks, > > Viraj Bhat > >
[jira] [Updated] (PIG-3331) Default values not stored in avro file when using specific schemas during store in AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated PIG-3331: Description: A script that stores Avro data using a predefined schema does not store the default values in the file:
{code}
a = LOAD 'numbers.txt' USING PigStorage(':') as (intnum1000: int, id: int, intnum5: int, intnum100: int, intnum: int, longnum: long, floatnum: float, doublenum: double);
b2 = foreach a generate id, intnum5, intnum100;
c2 = filter b2 by 110 <= id and id < 120;
STORE c2 INTO '/tmp/TestAvroStorage/testDefaultValueWrite' USING org.apache.pig.piggybank.storage.avro.AvroStorage('
{
  "debug" : 5,
  "schema" : {
    "name" : "rmyrecord",
    "type" : "record",
    "fields" : [
      { "name" : "id", "type" : "int", "default" : 0 },
      { "name" : "intnum5", "type" : "int", "default" : 0 },
      { "name" : "intnum100", "type" : "int", "default" : 0 }
    ]
  }
}
');
{code}
Opening the file shows the following schema, in which none of the fields carries its default:
{noformat}
avro.schema {"type":"record","name":"rmyrecord","fields":[{"name":"id","type":"int"},{"name":"intnum5","type":"int"},{"name":"intnum100","type":"int"}]}
{noformat}
There seems to be a problem storing the schema.
Viraj was: Script which stores Avro using a predefined schema does not store the default values in the file {code} a = LOAD 'numbers.txt' USING PigStorage (':') as (intnum1000: int,id: int,intnum5: int,intnum100: int,intnum: int,longnum: long,floatnum: float,doublenum: double); b2 = foreach a generate id, intnum5, intnum100; c2 = filter b2 by 110 <= id and id < 120; STORE c2 INTO '/tmp/TestAvroStorage/testDefaultValueWrite' USING org.apache.pig.piggybank.storage.avro.AvroStorage (' { "debug" : 5, "schema" : { "name" : "rmyrecord", "type" : "record", "fields" : [ { "name" : "id", "type" : "int" , "default" : 0 }, { "name" : "intnum5", "type" : "int", "default" : 0 }, { "name" : "intnum100", "type" : "int", "default" : 0 } ] } } '); {code} Opening the file shows the following schema {quote} avro.schema {"type":"record","name":"rmyrecord","fields":[{"name":"id","type":"int"},{"name":"intnum5","type":"int"},{"name":"intnum100","type":"int"}]} {quote} There seems to be a problem storing the schema. 
Viraj > Default values not stored in avro file when using specific schemas during > store in AvroStorage > -- > > Key: PIG-3331 > URL: https://issues.apache.org/jira/browse/PIG-3331 > Project: Pig > Issue Type: Bug > Components: piggybank >Affects Versions: 0.11.1 >Reporter: Viraj Bhat >Assignee: Viraj Bhat > Fix For: 0.11.2 > > > Script which stores Avro using a predefined schema does not store the default > values in the file > {code} > a = LOAD 'numbers.txt' USING PigStorage (':') as (intnum1000: int,id: > int,intnum5: int,intnum100: int,intnum: int,longnum: long,floatnum: > float,doublenum: double); > b2 = foreach a generate id, intnum5, intnum100; > c2 = filter b2 by 110 <= id and id < 120; > STORE c2 INTO '/tmp/TestAvroStorage/testDefaultValueWrite' USING > org.apache.pig.piggybank.storage.avro.AvroStorage (' { "debug" : 5, "schema" : > { "name" : "rmyrecord", "type" : "record", "fields" : [ { "name" : "id", > "type" : "int" , "default" : 0 }, { "name" : "intnum5", "type" : "int", > "default" : 0 }, { "name" : "intnum100", "type" : "int", "default" : 0 } ] } } > '); > {code} > Opening the file shows the following schema > {noformat} > avro.schema > {"type":"record","name":"rmyrecord","fields":[{"name":"id","type":"int"},{"name":"intnum5","type":"int"},{"name":"intnum100","type":"int"}]} > {noformat} > There seems to be a problem storing the schema. > Viraj -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (PIG-3331) Default values not stored in avro file when using specific schemas during store in AvroStorage
Viraj Bhat created PIG-3331: --- Summary: Default values not stored in avro file when using specific schemas during store in AvroStorage Key: PIG-3331 URL: https://issues.apache.org/jira/browse/PIG-3331 Project: Pig Issue Type: Bug Components: piggybank Affects Versions: 0.11.1 Reporter: Viraj Bhat Assignee: Viraj Bhat Fix For: 0.11.2 Script which stores Avro using a predefined schema does not store the default values in the file {code} a = LOAD 'numbers.txt' USING PigStorage (':') as (intnum1000: int,id: int,intnum5: int,intnum100: int,intnum: int,longnum: long,floatnum: float,doublenum: double); b2 = foreach a generate id, intnum5, intnum100; c2 = filter b2 by 110 <= id and id < 120; STORE c2 INTO '/tmp/TestAvroStorage/testDefaultValueWrite' USING org.apache.pig.piggybank.storage.avro.AvroStorage (' { "debug" : 5, "schema" : { "name" : "rmyrecord", "type" : "record", "fields" : [ { "name" : "id", "type" : "int" , "default" : 0 }, { "name" : "intnum5", "type" : "int", "default" : 0 }, { "name" : "intnum100", "type" : "int", "default" : 0 } ] } } '); {code} Opening the file shows the following schema {quote} avro.schema {"type":"record","name":"rmyrecord","fields":[{"name":"id","type":"int"},{"name":"intnum5","type":"int"},{"name":"intnum100","type":"int"}]} {quote} There seems to be a problem storing the schema. Viraj -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
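To make the PIG-3331 symptom easy to check, here is a minimal Python sketch that compares the schema handed to AvroStorage with the avro.schema read back from the file, and lists the fields whose default was lost. It is not part of Pig or Avro. The helper name `dropped_defaults` is hypothetical, and the sketch only inspects top-level record fields. Both schemas are copied from the issue description.

```python
import json

# Schema handed to AvroStorage in the PIG-3331 script (the "schema" entry).
user_schema = json.loads("""
{"name": "rmyrecord", "type": "record", "fields": [
  {"name": "id", "type": "int", "default": 0},
  {"name": "intnum5", "type": "int", "default": 0},
  {"name": "intnum100", "type": "int", "default": 0}]}
""")

# avro.schema found in the output file, as quoted in the issue.
file_schema = json.loads(
    '{"type":"record","name":"rmyrecord","fields":['
    '{"name":"id","type":"int"},'
    '{"name":"intnum5","type":"int"},'
    '{"name":"intnum100","type":"int"}]}'
)

_MISSING = object()  # sentinel: distinguishes "no default" from "default": null


def dropped_defaults(requested, written):
    """Names of top-level record fields whose "default" was requested but is
    missing (or different) in the written schema. Hypothetical helper."""
    written_fields = {f["name"]: f for f in written["fields"]}
    dropped = []
    for field in requested["fields"]:
        if "default" not in field:
            continue  # no default was requested for this field
        got = written_fields.get(field["name"], {}).get("default", _MISSING)
        if got is _MISSING or got != field["default"]:
            dropped.append(field["name"])
    return dropped


print(dropped_defaults(user_schema, file_schema))  # ['id', 'intnum5', 'intnum100']
```

With a correctly written file, the same call against the file's schema would be expected to return an empty list.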
[jira] [Resolved] (PIG-3323) AVRO: default value not stored in file when given as paramter to AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat resolved PIG-3323. - Resolution: Invalid > AVRO: default value not stored in file when given as paramter to AvroStorage > > > Key: PIG-3323 > URL: https://issues.apache.org/jira/browse/PIG-3323 > Project: Pig > Issue Type: Bug > Components: piggybank >Affects Versions: 0.11.2 >Reporter: Egil Sorensen >Assignee: Viraj Bhat > Labels: patch > Fix For: 0.12, 0.11.2 > > > A pig script like the below succeeds, but inspecting the resulting file I > find that the schema is stripped of the default value specification. > {code} > a = load ':INPATH:/types/numbers.txt' using PigStorage(':') as (intnum1000: > int,id: int,intnum5: int,intnum100: int,intnum: int,longnum: long,floatnum: > float,doublenum: double); > b2 = foreach a generate id, intnum5, intnum100; > c2 = filter b2 by 110 <= id and id < 120; > describe c2; > dump c2; > store c2 into ':OUTPATH:.intermediate_2' USING > org.apache.pig.piggybank.storage.avro.AvroStorage(' > { >"debug" : 5, >"schema" : { > "name" : "schema_2", > "type" : "record", > "fields" : [ > { > "name" : "id", > "type" : [ >"null", >"int" > ] > }, > { > "name" : "intnum5", > "type" : [ >"null", >"int" > ] > }, > { > "name" : "intnum100", > "type" : [ >"null", >"int" > ], > "default" : 0 > } > ] >} > } > '); > {code} > BTW, the documentation on https://cwiki.apache.org/PIG/avrostorage.html is > mute on the subject of defaults, so first question is: is my expectation that > the default is to be written to file not correct? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3323) AVRO: default value not stored in file when given as paramter to AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13661142#comment-13661142 ] Viraj Bhat commented on PIG-3323: - One correction to my first comment: according to the specification, default values for union fields correspond to the first schema in the union. So for the use case posted by Egil above, the final output schema should not contain the default value. However, there is a separate bug in AvroStorage: it does not write the default values of individual fields. I will open another Jira for that and close this one. Viraj
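The union rule cited above can be sketched in a few lines of Python. `default_is_valid` is a hypothetical helper, not an AvroStorage or Avro API. It handles primitive types only, and it checks a union default against the first branch alone. Under this check, Egil's "intnum100" (type ["null", "int"], default 0) fails, while ["int", "null"] with default 0 would pass.

```python
def _is_integer(v):
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(v, int) and not isinstance(v, bool)


# Which JSON default values are acceptable for each primitive Avro type.
_CHECKS = {
    "null": lambda v: v is None,
    "boolean": lambda v: isinstance(v, bool),
    "int": lambda v: _is_integer(v) and -2**31 <= v < 2**31,
    "long": lambda v: _is_integer(v) and -2**63 <= v < 2**63,
    "float": lambda v: _is_integer(v) or isinstance(v, float),
    "double": lambda v: _is_integer(v) or isinstance(v, float),
    "string": lambda v: isinstance(v, str),
}


def default_is_valid(field):
    """True if the field's "default" (if any) matches its type. For a union,
    only the FIRST branch is considered, as the Avro spec requires."""
    if "default" not in field:
        return True
    ftype = field["type"]
    if isinstance(ftype, list):
        ftype = ftype[0]  # union: the default belongs to the first branch
    if not isinstance(ftype, str) or ftype not in _CHECKS:
        raise ValueError(f"sketch handles primitive types only, got {ftype!r}")
    return _CHECKS[ftype](field["default"])


print(default_is_valid({"name": "intnum100", "type": ["null", "int"], "default": 0}))  # False
print(default_is_valid({"name": "intnum100", "type": ["int", "null"], "default": 0}))  # True
```

Under this rule, a default of 0 for intnum100 needs "int" listed first in the union. With "null" first, the only valid default is null.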
[jira] [Commented] (PIG-3323) AVRO: default value not stored in file when given as paramter to AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13661019#comment-13661019 ] Viraj Bhat commented on PIG-3323: - Spoke to Egil offline. His original questions were:
1) Should the default value be written to the file?
Ans) It should be, if it is specified for a valid complex type.
2) Should the default specification in the schema be written to the file's metadata?
Ans) It should be, if it is valid for that complex type.
Since a union does not support defaults, the default was not written out. But we need to see how defaults work for the other data types.
Viraj
[jira] [Commented] (PIG-3323) AVRO: default value not stored in file when given as paramter to AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13660959#comment-13660959 ] Viraj Bhat commented on PIG-3323: - Hi Egil, I looked at the specification of unions and default values, and at the source code in "PigAvroDatumWriter". Field "intnum100" is a union of "null" and "int", so its value can be either null or an int. That means that if Pig does not find a value for "intnum100" in the step before the store, it will write null, which is perfectly acceptable here. So the default value makes no sense here if the item does not exist. Even if you remove "null" from the specification of "intnum100" and expect the default value to be written out, there is another problem. If you read the specification for unions (http://avro.apache.org/docs/current/spec.html#Unions) together with the section on default values (http://avro.apache.org/docs/current/spec.html#schema_complex), unions do not have any default values in the specification. Closing as INVALID. Regards Viraj
[jira] [Reopened] (PIG-3322) AVRO: AvroStorage give NPE on reading file with union as top level schema
[ https://issues.apache.org/jira/browse/PIG-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat reopened PIG-3322: - Hi Egil, The issue here is that the field "t" in the original data set "studentcomplextab10k" contains nulls.
(fred hernandez,73,1.87)
(fred hernandez,20,2.11)
(calvin allen,60,2.49)
(yuri zipper,76,2.05)
So when this relation is stored via AvroStorage, nulls are stored for the record. When the Avro file written by that store is read back, the load fails with a NullPointerException. The following snippet works without any problems:
{code}
a = load 'studentcomplextab10k' using PigStorage() as (m:[], t:(name:chararray, age:int, gpa:double), b:{t:(name:chararray, age:int, gpa:double)});
b = foreach a generate t;
c = filter b by t is not null;
store c into 'singltupleavronotnull' USING org.apache.pig.piggybank.storage.avro.AvroStorage();
exec;
b = load 'singltupleavronotnull' USING org.apache.pig.piggybank.storage.avro.AvroStorage();
describe b;
dump b;
{code}
Kindly note: this issue is different from PIG-2330.
> AVRO: AvroStorage give NPE on reading file with union as top level schema > - > > Key: PIG-3322 > URL: https://issues.apache.org/jira/browse/PIG-3322 > Project: Pig > Issue Type: Bug > Components: piggybank >Affects Versions: 0.11.2 >Reporter: Egil Sorensen >Assignee: Viraj Bhat > Labels: patch > > I am getting NPE when loading a file with AvroStorage a file that has schema > like: > {code} > ["null",{"type":"record","name":"TUPLE_0","fields":[{"name":"name","type":["null","string"],"doc":"autogenerated > from Pig Field > Schema"},{"name":"age","type":["null","int"],"doc":"autogenerated from Pig > Field Schema"},{"name":"gpa","type":["null","double"],"doc":"autogenerated > from Pig Field Schema"}]}] > {code} > E.g.
see the e2e style test, which fails on this: > {code} > { > 'num' => 4, > # storing file with Pig type tuple relying on > conversion to record > # loading using stored schemas > 'notmq' => 1, > 'pig' => q\ > a = load ':INPATH:/singlefile/studentcomplextab10k' using PigStorage() as > (m:[], t:(name:chararray, age:int, gpa:double), b:{t:(name:chararray, > age:int, gpa:double)}); > b = foreach a generate t; > describe b; > store b into ':OUTPATH:.intermediate' USING > org.apache.pig.piggybank.storage.avro.AvroStorage(); > exec; > -- Read back what was stored with Avro > u = load ':OUTPATH:.intermediate' USING > org.apache.pig.piggybank.storage.avro.AvroStorage(); > describe u; > store u into ':OUTPATH:'; > \, > 'verify_pig_script' => q\ > a = load ':INPATH:/singlefile/studentcomplextab10k' using PigStorage() as > (m:[], t:(name:chararray, age:int, gpa:double), b:{t:(name:chararray, > age:int, gpa:double)}); > b = foreach a generate t; > describe b; > store b into ':OUTPATH:'; > \, > }, > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
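For illustration only: a loader that reads a top-level schema such as ["null", TUPLE_0] has to handle the null branch before it touches any record fields. The Python sketch below assumes that decoded records arrive as dicts and that nulls arrive as None. It unwraps the nullable union and returns None for a null datum instead of dereferencing it. Both helpers are hypothetical and do not describe how AvroStorage is implemented. The workaround given in the comment above is different: it filters out null tuples before storing.

```python
import json

# Top-level schema from the PIG-3322 report ("doc" strings omitted).
top_level = json.loads(
    '["null", {"type": "record", "name": "TUPLE_0", "fields": ['
    '{"name": "name", "type": ["null", "string"]},'
    '{"name": "age", "type": ["null", "int"]},'
    '{"name": "gpa", "type": ["null", "double"]}]}]'
)


def unwrap_nullable(schema):
    """If schema is the union ["null", X] (in either order), return X;
    otherwise return schema unchanged."""
    if isinstance(schema, list) and len(schema) == 2 and "null" in schema:
        return schema[1] if schema[0] == "null" else schema[0]
    return schema


def record_to_tuple(datum, record_schema):
    """Convert one decoded record to a tuple in field order. A null datum
    is returned as None rather than being dereferenced."""
    if datum is None:
        return None
    return tuple(datum.get(f["name"]) for f in record_schema["fields"])


record = unwrap_nullable(top_level)
print(record["name"])  # TUPLE_0
print(record_to_tuple({"name": "fred hernandez", "age": 73, "gpa": 1.87}, record))
print(record_to_tuple(None, record))  # None
```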
[jira] [Resolved] (PIG-3320) AVRO: no empty field expressed when loading with AvroStorage using reader schema with extra field that has no default
[ https://issues.apache.org/jira/browse/PIG-3320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat resolved PIG-3320. - Resolution: Invalid > AVRO: no empty field expressed when loading with AvroStorage using reader > schema with extra field that has no default > - > > Key: PIG-3320 > URL: https://issues.apache.org/jira/browse/PIG-3320 > Project: Pig > Issue Type: Bug > Components: piggybank >Affects Versions: 0.11.2 >Reporter: Egil Sorensen >Assignee: Viraj Bhat > Labels: patch > Fix For: 0.12, 0.11.2 > > > Somewhat different use case than PIG-3318: > Loading with AvroStorage giving a loader schema that relative to the schema > in the Avro file had an extra filed w/o default and expected to see an extra > empty column, but the schema is as in the avro file w/o the extra column. > E.g. see the e2e style test, which fails on this: > {code} > { > 'num' => 2, > # storing using writer schema > # loading using reader schema with extra field that > has no default > 'notmq' => 1, > 'pig' => q\ > a = load ':INPATH:/types/numbers.txt' using PigStorage(':') as (intnum1000: > int,id: int,intnum5: int,intnum100: int,intnum: int,longnum: long,floatnum: > float,doublenum: double); > -- Store Avro file w. 
schema > b1 = foreach a generate id, intnum5; > c1 = filter b1 by 10 <= id and id < 20; > describe c1; > dump c1; > store c1 into ':OUTPATH:.intermediate_1' USING > org.apache.pig.piggybank.storage.avro.AvroStorage(' > { >"schema" : { > "name" : "schema_writing", > "type" : "record", > "fields" : [ > { > "name" : "id", > "type" : [ >"null", >"int" > ] > }, > { > "name" : "intnum5", > "type" : [ >"null", >"int" > ] > } > ] >} > } > '); > exec; > -- Read back what was stored with Avro adding extra field to reader schema > u = load ':OUTPATH:.intermediate_1' USING > org.apache.pig.piggybank.storage.avro.AvroStorage(' > { >"debug" : 5, >"schema" : { > "name" : "schema_reading", > "type" : "record", > "fields" : [ > { > "name" : "id", > "type" : [ >"null", >"int" > ] > }, > { > "name" : "intnum5", > "type" : [ >"null", >"string" > ] > }, > { > "name" : "intnum100", > "type" : [ >"null", >"int" > ] > } > ] >} > } > '); > describe u; > dump u; > store u into ':OUTPATH:'; > \, > 'verify_pig_script' => q\ > a = load ':INPATH:/types/numbers.txt' using PigStorage(':') as (intnum1000: > int,id: int,intnum5: int,intnum100: int,intnum: int,longnum: long,floatnum: > float,doublenum: double); > b = filter a by (10 <= id and id < 20); > c = foreach b generate id, intnum5, ''; > store c into ':OUTPATH:'; > \, > }, > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3320) AVRO: no empty field expressed when loading with AvroStorage using reader schema with extra field that has no default
[ https://issues.apache.org/jira/browse/PIG-3320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659803#comment-13659803 ] Viraj Bhat commented on PIG-3320: - With PIG-3321 committed, the above script throws the error listed in Comment 2 of this Jira. Suppose we want AvroStorage() to return the extra field "intnum100" as null instead of throwing the error in Comment 2. We would have to do the following:
1) Pass a null reader schema to PigAvroDatumReader.
2) Construct an mProtoTuple whose field count equals that of the reader schema.
3) Reconcile the schemas manually, using the logic in getSchemaToMergedSchemaMap().
4) Populate mProtoTuple using that map, keeping track of each field's new and old positions.
Doing all of the above would undo the changes made in PIG-3321, because the reader schema would no longer be passed to PigAvroDatumReader(). We want Avro to handle the schema merge in this case, and it does so correctly by throwing an error. Closing this Jira as invalid for now.
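Steps 2 to 4 above amount to two operations: build a position map from reader fields to writer fields, then copy values through it. The Python sketch below is a hypothetical illustration of that idea. It is not the code of getSchemaToMergedSchemaMap(). Unlike real schema resolution, it ignores type differences such as intnum5 changing from int to string. The field names come from schema_writing and schema_reading in this issue; the sample values are made up.

```python
def reader_to_writer_positions(writer_fields, reader_fields):
    """Step 3: for each reader-field position, the writer position of the
    field with the same name, or None if the writer never wrote it."""
    writer_pos = {name: i for i, name in enumerate(writer_fields)}
    return [writer_pos.get(name) for name in reader_fields]


def build_tuple(writer_values, position_map):
    """Steps 2 and 4: allocate a tuple sized to the reader schema, then copy
    each value from its old (writer) position to its new (reader) position.
    Fields the writer never had stay None, i.e. Pig null."""
    out = [None] * len(position_map)
    for new_pos, old_pos in enumerate(position_map):
        if old_pos is not None:
            out[new_pos] = writer_values[old_pos]
    return tuple(out)


pos_map = reader_to_writer_positions(["id", "intnum5"], ["id", "intnum5", "intnum100"])
print(pos_map)                        # [0, 1, None]
print(build_tuple((12, 3), pos_map))  # (12, 3, None)
```

This is exactly the extra bookkeeping the comment argues against. Letting Avro resolve the schemas keeps the loader simpler, at the cost of an error instead of a null column.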
[jira] [Commented] (PIG-3320) AVRO: no empty field expressed when loading with AvroStorage using reader schema with extra field that has no default
[ https://issues.apache.org/jira/browse/PIG-3320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13658808#comment-13658808 ] Viraj Bhat commented on PIG-3320: - Hi Rohini, The error in the 2nd comment occurs with PIG-3321 applied. Viraj
[jira] [Commented] (PIG-3320) AVRO: no empty field expressed when loading with AvroStorage using reader schema with extra field that has no default
[ https://issues.apache.org/jira/browse/PIG-3320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13658680#comment-13658680 ] Viraj Bhat commented on PIG-3320: - Hi all, What I found is that if you supply a user-defined schema that differs from the schema of the actual data, no reconciliation happens. In fact, we would have to reconcile the schemas case by case, using the same logic that multiple_schemas uses. After changing part of the source code to read with the user-defined schema, the script throws the following error. I think this is valid, considering that previously the script was passing and returning results with no extra column.
{noformat}
java.lang.Exception: java.io.IOException: org.apache.avro.AvroTypeException: Found { "type" : "record", "name" : "schema_writing", "fields" : [ { "name" : "id", "type" : [ "null", "int" ] }, { "name" : "intnum5", "type" : [ "null", "int" ] } ] }, expecting { "type" : "record", "name" : "schema_reading", "fields" : [ { "name" : "id", "type" : [ "null", "int" ] }, { "name" : "intnum5", "type" : [ "null", "string" ] }, { "name" : "intnum100", "type" : [ "null", "int" ] } ] }
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:399)
Caused by: java.io.IOException: org.apache.avro.AvroTypeException: Found { "type" : "record", "name" : "schema_writing", "fields" : [ { "name" : "id", "type" : [ "null", "int" ] }, { "name" : "intnum5", "type" : [ "null", "int" ] } ] }, expecting { "type" : "record", "name" : "schema_reading", "fields" : [ { "name" : "id", "type" : [ "null", "int" ] }, { "name" : "intnum5", "type" : [ "null", "string" ] }, { "name" : "intnum100", "type" : [ "null", "int" ] } ] }
    at org.apache.pig.piggybank.storage.avro.AvroStorage.getNext(AvroStorage.java:370)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:194)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:497)
    at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:726)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:333)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:231)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:662)
{noformat}
Regards Viraj
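One Avro schema-resolution rule is relevant to this error. If the reader's record has a field that the writer's record lacks, and that field has no default, resolution fails. The Python sketch below checks only that rule against the writer and reader schemas from this issue. `fields_without_source` is a hypothetical name. The sketch does not check record names or type compatibility (for example, intnum5 changing from int to string), which real resolution also has to handle. So it should not be read as identifying the exact cause of the trace above.

```python
import json

# Writer and reader schemas from the PIG-3320 test script.
schema_writing = json.loads(
    '{"type": "record", "name": "schema_writing", "fields": ['
    '{"name": "id", "type": ["null", "int"]},'
    '{"name": "intnum5", "type": ["null", "int"]}]}'
)
schema_reading = json.loads(
    '{"type": "record", "name": "schema_reading", "fields": ['
    '{"name": "id", "type": ["null", "int"]},'
    '{"name": "intnum5", "type": ["null", "string"]},'
    '{"name": "intnum100", "type": ["null", "int"]}]}'
)


def fields_without_source(writer, reader):
    """Reader fields that the writer lacks and that have no default.
    Per the Avro spec, any such field makes schema resolution fail."""
    writer_names = {f["name"] for f in writer["fields"]}
    return [f["name"] for f in reader["fields"]
            if f["name"] not in writer_names and "default" not in f]


print(fields_without_source(schema_writing, schema_reading))  # ['intnum100']
```

Giving intnum100 a "default" of null would satisfy this particular rule. That default is valid because "null" is the first branch of its union.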
[jira] [Commented] (PIG-3320) AVRO: no empty field expressed when loading with AvroStorage using reader schema with extra field that has no default
[ https://issues.apache.org/jira/browse/PIG-3320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13658093#comment-13658093 ] Viraj Bhat commented on PIG-3320: -
It seems that the schema specified at load time is stored in "outputAvroSchema" but is not used when reading the underlying data; it is used when writing out the data. PIG-3321 will make it possible to use this schema when reading the data, but I will need to investigate whether that fixes the above problem.
> AVRO: no empty field expressed when loading with AvroStorage using reader
> schema with extra field that has no default
> -
>
> Key: PIG-3320
> URL: https://issues.apache.org/jira/browse/PIG-3320
> Project: Pig
> Issue Type: Bug
> Components: piggybank
> Affects Versions: 0.11.2
> Reporter: Egil Sorensen
> Assignee: Viraj Bhat
> Labels: patch
> Fix For: 0.12, 0.11.2
>
>
> Somewhat different use case than PIG-3318:
> I loaded with AvroStorage, giving a loader schema that, relative to the schema
> in the Avro file, had an extra field w/o default, and expected to see an extra
> empty column; but the schema is as in the Avro file, w/o the extra column.
> E.g. see the e2e style test, which fails on this:
> {code}
> {
>   'num' => 2,
>   # storing using writer schema
>   # loading using reader schema with extra field that has no default
>   'notmq' => 1,
>   'pig' => q\
> a = load ':INPATH:/types/numbers.txt' using PigStorage(':') as (intnum1000: int,id: int,intnum5: int,intnum100: int,intnum: int,longnum: long,floatnum: float,doublenum: double);
> -- Store Avro file w. schema
> b1 = foreach a generate id, intnum5;
> c1 = filter b1 by 10 <= id and id < 20;
> describe c1;
> dump c1;
> store c1 into ':OUTPATH:.intermediate_1' USING org.apache.pig.piggybank.storage.avro.AvroStorage('
> {
>   "schema" : {
>     "name" : "schema_writing",
>     "type" : "record",
>     "fields" : [
>       { "name" : "id", "type" : [ "null", "int" ] },
>       { "name" : "intnum5", "type" : [ "null", "int" ] }
>     ]
>   }
> }
> ');
> exec;
> -- Read back what was stored with Avro adding extra field to reader schema
> u = load ':OUTPATH:.intermediate_1' USING org.apache.pig.piggybank.storage.avro.AvroStorage('
> {
>   "debug" : 5,
>   "schema" : {
>     "name" : "schema_reading",
>     "type" : "record",
>     "fields" : [
>       { "name" : "id", "type" : [ "null", "int" ] },
>       { "name" : "intnum5", "type" : [ "null", "string" ] },
>       { "name" : "intnum100", "type" : [ "null", "int" ] }
>     ]
>   }
> }
> ');
> describe u;
> dump u;
> store u into ':OUTPATH:';
> \,
>   'verify_pig_script' => q\
> a = load ':INPATH:/types/numbers.txt' using PigStorage(':') as (intnum1000: int,id: int,intnum5: int,intnum100: int,intnum: int,longnum: long,floatnum: float,doublenum: double);
> b = filter a by (10 <= id and id < 20);
> c = foreach b generate id, intnum5, '';
> store c into ':OUTPATH:';
> \,
> },
> {code}
-- 
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3322) AVRO: AvroStorage give NPE on reading file with union as top level schema
[ https://issues.apache.org/jira/browse/PIG-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13658092#comment-13658092 ] Viraj Bhat commented on PIG-3322: -
I meant PIG-3320 ..
> AVRO: AvroStorage give NPE on reading file with union as top level schema
> -
>
> Key: PIG-3322
> URL: https://issues.apache.org/jira/browse/PIG-3322
> Project: Pig
> Issue Type: Bug
> Components: piggybank
> Affects Versions: 0.11.2
> Reporter: Egil Sorensen
> Assignee: Viraj Bhat
> Labels: patch
> Fix For: 0.12, 0.11.2
>
>
> I am getting an NPE when loading, with AvroStorage, a file that has a schema
> like:
> {code}
> ["null",{"type":"record","name":"TUPLE_0","fields":[{"name":"name","type":["null","string"],"doc":"autogenerated from Pig Field Schema"},{"name":"age","type":["null","int"],"doc":"autogenerated from Pig Field Schema"},{"name":"gpa","type":["null","double"],"doc":"autogenerated from Pig Field Schema"}]}]
> {code}
> E.g. see the e2e style test, which fails on this:
> {code}
> {
>   'num' => 4,
>   # storing file with Pig type tuple relying on conversion to record
>   # loading using stored schemas
>   'notmq' => 1,
>   'pig' => q\
> a = load ':INPATH:/singlefile/studentcomplextab10k' using PigStorage() as (m:[], t:(name:chararray, age:int, gpa:double), b:{t:(name:chararray, age:int, gpa:double)});
> b = foreach a generate t;
> describe b;
> store b into ':OUTPATH:.intermediate' USING org.apache.pig.piggybank.storage.avro.AvroStorage();
> exec;
> -- Read back what was stored with Avro
> u = load ':OUTPATH:.intermediate' USING org.apache.pig.piggybank.storage.avro.AvroStorage();
> describe u;
> store u into ':OUTPATH:';
> \,
>   'verify_pig_script' => q\
> a = load ':INPATH:/singlefile/studentcomplextab10k' using PigStorage() as (m:[], t:(name:chararray, age:int, gpa:double), b:{t:(name:chararray, age:int, gpa:double)});
> b = foreach a generate t;
> describe b;
> store b into ':OUTPATH:';
> \,
> },
> {code}
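The schema quoted in PIG-3322 has a union at the top level, ["null", TUPLE_0 record], instead of a plain record. A loader that builds its Pig schema from a top-level record must first decide what to do with such a union. One possible approach is to unwrap a nullable union to its single non-null branch and reject anything else with an explicit error.

The sketch below shows only that decision, with branch types reduced to their names. TopLevelUnion.unwrapNullable is a hypothetical helper. It is not how AvroStorage or the eventual fix handles this case.

```java
import java.util.List;

final class TopLevelUnion {
    // Given the branch type names of a top-level Avro schema (a plain record
    // is passed as a one-element list), returns the one branch a loader could
    // build its Pig schema from. A nullable union such as ["null", "TUPLE_0"]
    // is unwrapped to its non-null branch. Any other union is rejected with an
    // explicit message rather than being allowed to fail obscurely later.
    static String unwrapNullable(List<String> branches) {
        String chosen = null;
        for (String branch : branches) {
            if ("null".equals(branch)) {
                continue;
            }
            if (chosen != null) {
                throw new IllegalArgumentException("more than one non-null branch: " + branches);
            }
            chosen = branch;
        }
        if (chosen == null) {
            throw new IllegalArgumentException("no non-null branch: " + branches);
        }
        return chosen;
    }
}
```

A real loader would also have to produce a null tuple for rows whose value takes the "null" branch. That part is left out here.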
[jira] [Commented] (PIG-3322) AVRO: AvroStorage give NPE on reading file with union as top level schema
[ https://issues.apache.org/jira/browse/PIG-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13658091#comment-13658091 ] Viraj Bhat commented on PIG-3322: -
Sorry the above comment was intended for PIG-3220
Viraj
[jira] [Commented] (PIG-3322) AVRO: AvroStorage give NPE on reading file with union as top level schema
[ https://issues.apache.org/jira/browse/PIG-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13657784#comment-13657784 ] Viraj Bhat commented on PIG-3322: -
It seems that the schema specified at load time is stored in "outputAvroSchema" but is not used when reading the underlying data; it is used when writing out the data. PIG-3321 will make it possible to use this schema when reading the data, but I will need to investigate whether that fixes the above problem.
Review Request: Patch to address default values when schemas are merged in AvroStorage. It does this for Records containing primitive values
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11135/
---
Review request for pig.

Description
---
Default values are not honoured when merging schemas.
This addresses bug PIG-3318.
https://issues.apache.org/jira/browse/PIG-3318

Diffs
-
  http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java 1481245
  http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java 1481245
  http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroInputFormat.java 1481245
  http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java 1481245
  http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 1481245

Diff: https://reviews.apache.org/r/11135/diff/

Testing
---
Yes

Thanks,
Viraj Bhat
[jira] [Updated] (PIG-3318) AVRO: 'default value' not honored when merging schemas on load with AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated PIG-3318:
Tags: AvroStorage
Labels: patch (was: )
Status: Patch Available (was: Open)
Patch for adding default values for merged schemas.
> AVRO: 'default value' not honored when merging schemas on load with
> AvroStorage
> ---
>
> Key: PIG-3318
> URL: https://issues.apache.org/jira/browse/PIG-3318
> Project: Pig
> Issue Type: Bug
> Components: piggybank
> Affects Versions: 0.11.2
> Reporter: Viraj Bhat
> Assignee: Viraj Bhat
> Labels: patch
> Fix For: 0.12, 0.11.2
>
> Attachments: Employee3.ser, Employee4.ser, Employee6.ser,
> expected_testMultipleSchemasDefault1.avro, PIG-3118.0.11.patch, PIG-3318.patch
>
>
> Piggybank - AvroStorage: when merging multiple schemas in which default values
> have been specified in the Avro schema, AvroStorage puts nulls in the merged data set.
> ==> Employee3.avro <==
> {
>   "type" : "record",
>   "name" : "employee",
>   "fields":[
>     {"name" : "name", "type" : "string", "default" : "NU"},
>     {"name" : "age", "type" : "int", "default" : 0 },
>     {"name" : "dept", "type": "string", "default" : "DU"} ] }
> ==> Employee4.avro <==
> {
>   "type" : "record",
>   "name" : "employee",
>   "fields":[
>     {"name" : "name", "type" : "string", "default" : "NU"},
>     {"name" : "age", "type" : "int", "default" : 0},
>     {"name" : "dept", "type": "string", "default" : "DU"},
>     {"name" : "office", "type": "string", "default" : "OU"} ] }
> ==> Employee6.avro <==
> {
>   "type" : "record",
>   "name" : "employee",
>   "fields":[
>     {"name" : "name", "type" : "string", "default" : "NU"},
>     {"name" : "lastname", "type": "string", "default" : "LNU"},
>     {"name" : "age", "type" : "int","default" : 0},
>     {"name" : "salary", "type": "int", "default" : 0},
>     {"name" : "dept", "type": "string","default" : "DU"},
>     {"name" : "office", "type": "string","default" : "OU"} ] }
> The pig script:
> employee = load 'employee{3,4,6}.ser' using org.apache.pig.piggybank.storage.avro.AvroStorage('multiple_schemas');
> describe employee;
> dump employee;
> Output Schema:
> employee: {name: chararray,age: int,dept: chararray,lastname: chararray,salary: int,office: chararray}
> (Milo,30,DH,,,)
> (Asmya,34,PQ,,,)
> (Baljit,23,RS,,,)
> (Pune,60,Astrophysics,Warriors,5466,UTA)
> (Rajsathan,20,Biochemistry,Royals,1378,Stanford)
> (Chennai,50,Microbiology,Superkings,7338,Hopkins)
> (Mumbai,20,Applied Math,Indians,4468,UAH)
> (Praj,54,RMX,,,Champaign)
> (Buba,767,HD,,,Sunnyvale)
> (Manku,375,MS,,,New York)
> Regards
> Viraj
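In the output above, the Employee3 rows (e.g. Milo) show empty lastname, salary and office columns. The Employee4 rows (e.g. Praj) show empty lastname and salary columns. Yet every one of those fields declares a default in the schema that introduces it.

The following Java sketch shows one way a loader could honour those defaults: merge the per-file field lists, then fill each record's missing fields from the merged defaults rather than with null. DefaultMerge and its methods are hypothetical. This is not the code in PIG-3318.patch. The rule that the first schema declaring a field fixes its position and default is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DefaultMerge {
    // Builds an insertion-ordered "field name -> declared default" map from
    // alternating name/value arguments, e.g. fields("name", "NU", "age", 0).
    static LinkedHashMap<String, Object> fields(Object... kv) {
        LinkedHashMap<String, Object> m = new LinkedHashMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) {
            m.put((String) kv[i], kv[i + 1]);
        }
        return m;
    }

    // Merges per-file schemas (field name -> default). A field keeps the
    // position and default of the first schema that declares it.
    static LinkedHashMap<String, Object> mergeDefaults(List<? extends Map<String, Object>> schemas) {
        LinkedHashMap<String, Object> merged = new LinkedHashMap<>();
        for (Map<String, Object> schema : schemas) {
            for (Map.Entry<String, Object> e : schema.entrySet()) {
                if (!merged.containsKey(e.getKey())) {
                    merged.put(e.getKey(), e.getValue());
                }
            }
        }
        return merged;
    }

    // Lays one record out in merged-field order, substituting the merged
    // default (rather than null) for every field its own file did not have.
    static List<Object> project(Map<String, Object> record, Map<String, Object> merged) {
        List<Object> row = new ArrayList<>(merged.size());
        for (Map.Entry<String, Object> e : merged.entrySet()) {
            String name = e.getKey();
            row.add(record.containsKey(name) ? record.get(name) : e.getValue());
        }
        return row;
    }
}
```

Merging in the order Employee3, Employee6, Employee4 reproduces the column order of the Output Schema quoted above. With that order, Milo's row becomes (Milo,30,DH,LNU,0,OU) instead of (Milo,30,DH,,,). The issue does not say which order AvroStorage itself uses.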
[jira] [Updated] (PIG-3318) AVRO: 'default value' not honored when merging schemas on load with AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated PIG-3318:
Attachment: Employee6.ser
Avro test file
[jira] [Updated] (PIG-3318) AVRO: 'default value' not honored when merging schemas on load with AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated PIG-3318:
Attachment: expected_testMultipleSchemasDefault1.avro
Expected resulting avro file
[jira] [Updated] (PIG-3318) AVRO: 'default value' not honored when merging schemas on load with AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated PIG-3318:
Attachment: Employee4.ser
avro test file
[jira] [Updated] (PIG-3318) AVRO: 'default value' not honored when merging schemas on load with AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated PIG-3318:
Attachment: PIG-3118.0.11.patch
Patch for branch 0.11.2
[jira] [Updated] (PIG-3318) AVRO: 'default value' not honored when merging schemas on load with AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated PIG-3318:
Attachment: Employee3.ser
Avro file