[ https://issues.apache.org/jira/browse/PIG-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681267#comment-13681267 ]
Rohini Palaniswamy commented on PIG-3318:
-----------------------------------------

Ran TestAvroStorage before committing and encountered:

Testcase: testMultipleSchemasWithDefaultValue took 3.543 sec
	Caused an ERROR
Not a data file.
java.io.IOException: Not a data file.
	at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105)
	at org.apache.avro.file.DataFileStream.<init>(DataFileStream.java:84)
	at org.apache.pig.piggybank.test.storage.avro.TestAvroStorage.verifyResults(TestAvroStorage.java:1292)
	at org.apache.pig.piggybank.test.storage.avro.TestAvroStorage.verifyResults(TestAvroStorage.java:1262)
	at org.apache.pig.piggybank.test.storage.avro.TestAvroStorage.testMultipleSchemasWithDefaultValue(TestAvroStorage.java:704)

The problem is that the test case stores its output using PigStorage, so the output is not an Avro file. I tried changing it to AvroStorage, but it still failed because the output did not match expected_testMultipleSchemasWithDefaultValue.avro. Can you fix the test case? Also, can you rename Employee*.ser to Employee*.avro to be consistent with the naming?

> AVRO: 'default value' not honored when merging schemas on load with AvroStorage
> -------------------------------------------------------------------------------
>
>                 Key: PIG-3318
>                 URL: https://issues.apache.org/jira/browse/PIG-3318
>             Project: Pig
>          Issue Type: Bug
>          Components: piggybank
>    Affects Versions: 0.11.2
>            Reporter: Viraj Bhat
>            Assignee: Viraj Bhat
>              Labels: patch
>             Fix For: 0.12, 0.11.2
>
>         Attachments: Employee3.ser, Employee4.ser, Employee6.ser,
> expected_testMultipleSchemasWithDefaultValue.avro, PIG-3318_3.patch
>
>
> Piggybank - AvroStorage. When merging multiple schemas where default values
> have been specified in the Avro schema, AvroStorage puts nulls in the merged
> data set.
>
> ==> Employee3.avro <==
> {
>   "type" : "record",
>   "name" : "employee",
>   "fields":[
>     {"name" : "name", "type" : "string", "default" : "NU"},
>     {"name" : "age", "type" : "int", "default" : 0 },
>     {"name" : "dept", "type": "string", "default" : "DU"} ] }
>
> ==> Employee4.avro <==
> {
>   "type" : "record",
>   "name" : "employee",
>   "fields":[
>     {"name" : "name", "type" : "string", "default" : "NU"},
>     {"name" : "age", "type" : "int", "default" : 0},
>     {"name" : "dept", "type": "string", "default" : "DU"},
>     {"name" : "office", "type": "string", "default" : "OU"} ] }
>
> ==> Employee6.avro <==
> {
>   "type" : "record",
>   "name" : "employee",
>   "fields":[
>     {"name" : "name", "type" : "string", "default" : "NU"},
>     {"name" : "lastname", "type": "string", "default" : "LNU"},
>     {"name" : "age", "type" : "int","default" : 0},
>     {"name" : "salary", "type": "int", "default" : 0},
>     {"name" : "dept", "type": "string","default" : "DU"},
>     {"name" : "office", "type": "string","default" : "OU"} ] }
>
> The Pig script:
>
> employee = load 'employee{3,4,6}.ser' using
>     org.apache.pig.piggybank.storage.avro.AvroStorage('multiple_schemas');
> describe employee;
> dump employee;
>
> Output Schema:
> employee: {name: chararray,age: int,dept: chararray,lastname: chararray,salary: int,office: chararray}
>
> (Milo,30,DH,,,)
> (Asmya,34,PQ,,,)
> (Baljit,23,RS,,,)
> (Pune,60,Astrophysics,Warriors,5466,UTA)
> (Rajsathan,20,Biochemistry,Royals,1378,Stanford)
> (Chennai,50,Microbiology,Superkings,7338,Hopkins)
> (Mumbai,20,Applied Math,Indians,4468,UAH)
> (Praj,54,RMX,,,Champaign)
> (Buba,767,HD,,,Sunnyvale)
> (Manku,375,MS,,,New York)
>
> Regards
> Viraj
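On the "Not a data file." error from the test run: Avro's DataFileStream expects an object container file to begin with the four magic bytes 'O', 'b', 'j', 1, and throws that IOException when they are missing. Tab-delimited text written by PigStorage never starts with those bytes, which would explain the failure in verifyResults. The sketch below is a hypothetical helper, not part of Pig or Avro: the class and method names (AvroMagicCheck, hasAvroMagic) are made up for illustration, and the sample row is illustrative only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/**
 * Hypothetical helper (not Pig/Avro API): reports whether bytes look like an
 * Avro object container file, i.e. start with the magic 'O', 'b', 'j', 1.
 */
public class AvroMagicCheck {
    // Avro container files begin with these four bytes.
    static final byte[] AVRO_MAGIC = {(byte) 'O', (byte) 'b', (byte) 'j', 1};

    /** True if data starts with the Avro container-file magic. */
    static boolean hasAvroMagic(byte[] data) {
        if (data.length < AVRO_MAGIC.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(data, AVRO_MAGIC.length), AVRO_MAGIC);
    }

    /** Reads only the first few bytes of a file (e.g. a part-m-00000 output file). */
    static boolean hasAvroMagic(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return hasAvroMagic(in.readNBytes(AVRO_MAGIC.length)); // Java 11+
        }
    }

    public static void main(String[] args) {
        // Illustrative PigStorage-style output: tab-delimited text, no Avro header.
        byte[] pigStorageOutput = "Milo\t30\tDH\t\t\t\n".getBytes(StandardCharsets.UTF_8);
        // The first bytes of a real Avro container file.
        byte[] avroHeader = {(byte) 'O', (byte) 'b', (byte) 'j', 1, 0, 0};

        System.out.println(hasAvroMagic(pigStorageOutput)); // false
        System.out.println(hasAvroMagic(avroHeader));       // true
    }
}
```

A check like this only confirms the file format; whether the AvroStorage output then matches expected_testMultipleSchemasWithDefaultValue.avro is a separate question.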
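The behaviour the issue asks for matches Avro's schema-resolution rule: if the reading schema has a field with a default and the file's own schema lacks that field, the default is used instead of null. A record such as (Milo,30,DH) from Employee3 would then get "LNU", 0 and "OU" rather than empty values. The sketch below models each schema as an ordered map of field name to default value. This model, the class MergeWithDefaults and its methods are all hypothetical and are not AvroStorage's actual code. It also assumes that the first default seen for a field name wins when schemas disagree.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not AvroStorage code): merge several schemas and fill
 * fields that are absent from a record with the field's declared default.
 */
public class MergeWithDefaults {

    /** Builds an insertion-ordered map from alternating name/value pairs. */
    static Map<String, Object> fields(Object... nameValuePairs) {
        Map<String, Object> m = new LinkedHashMap<>();
        for (int i = 0; i < nameValuePairs.length; i += 2) {
            m.put((String) nameValuePairs[i], nameValuePairs[i + 1]);
        }
        return m;
    }

    /** Union of all fields in first-seen order; the first default seen for a name wins. */
    static Map<String, Object> mergeSchemas(List<Map<String, Object>> schemas) {
        Map<String, Object> merged = new LinkedHashMap<>();
        for (Map<String, Object> schema : schemas) {
            for (Map.Entry<String, Object> field : schema.entrySet()) {
                merged.putIfAbsent(field.getKey(), field.getValue());
            }
        }
        return merged;
    }

    /** Lays a record out in merged-schema order, using defaults (not null) for absent fields. */
    static List<Object> toTuple(Map<String, Object> record, Map<String, Object> merged) {
        List<Object> tuple = new ArrayList<>();
        for (Map.Entry<String, Object> field : merged.entrySet()) {
            String name = field.getKey();
            tuple.add(record.containsKey(name) ? record.get(name) : field.getValue());
        }
        return tuple;
    }

    public static void main(String[] args) {
        // Field names and defaults copied from the Employee3/4/6 schemas in the issue.
        Map<String, Object> employee3 = fields("name", "NU", "age", 0, "dept", "DU");
        Map<String, Object> employee4 = fields("name", "NU", "age", 0, "dept", "DU", "office", "OU");
        Map<String, Object> employee6 = fields("name", "NU", "lastname", "LNU", "age", 0,
                "salary", 0, "dept", "DU", "office", "OU");
        Map<String, Object> merged = mergeSchemas(List.of(employee3, employee4, employee6));

        // A record written with the Employee3 schema has no office, lastname or salary.
        Map<String, Object> milo = fields("name", "Milo", "age", 30, "dept", "DH");
        System.out.println(toTuple(milo, merged)); // prints [Milo, 30, DH, OU, LNU, 0]
    }
}
```

The first-seen field order used here (name, age, dept, office, lastname, salary) differs from the order AvroStorage reports in the output schema above. Only the default-filling step is meant to illustrate the requested behaviour.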