[ https://issues.apache.org/jira/browse/PIG-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13652591#comment-13652591 ]
Viraj Bhat commented on PIG-3312:
---------------------------------

Hi Hans,

Could you try upgrading only the piggybank.jar, which contains the AvroStorage-related classes, from Pig 0.8.1 to Pig 0.10.1? I did not see this problem in Pig 0.10.1 and beyond.

user_data= LOAD 'twitter_files/twitter.avro' using org.apache.pig.piggybank.storage.avro.AvroStorage();
describe user_data;
dump user_data;

Results in:

(miguno,Rock: Nerf paper, scissors is fine.,1366150681)
(BlizzardCS,Works as intended. Terran is IMBA.,1366154481)
(Test1,One Tweet,1366154490)

You cannot, however, read twitter.json using AvroStorage:

Caused by: java.io.IOException: Not a data file.
	at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105)
	at org.apache.avro.file.DataFileStream.<init>(DataFileStream.java:84)
	at org.apache.pig.piggybank.storage.avro.AvroStorage.getSchema(AvroStorage.java:218)
	at org.apache.pig.piggybank.storage.avro.AvroStorage.getAvroSchema(AvroStorage.java:169)
	at org.apache.pig.piggybank.storage.avro.AvroStorage.getAvroSchema(AvroStorage.java:145)
	at org.apache.pig.piggybank.storage.avro.AvroStorage.getSchema(AvroStorage.java:293)
	at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:151)
	... 18 more

Viraj

> Pig duplicates avro records
> ---------------------------
>
>                 Key: PIG-3312
>                 URL: https://issues.apache.org/jira/browse/PIG-3312
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>   Affects Versions: 0.8.1
>            Reporter: Hans Uhlig
>         Attachments: twitter.avro, twitter.avsc, twitter.json
>
>
> Pig will report avro records twice.
> To Reproduce:
> * Place attached files on hdfs
> * run pig
> > register lib/piggybank.jar
> > register lib/avro-1.7.4.jar
> > register lib/json-simple-1.1.jar
> > register lib/jackson-mapper-asl-1.6.0.jar
> > register lib/jackson-core-asl-1.6.0.jar
> > user_data= LOAD 'twitter.avro' using org.apache.pig.piggybank.storage.avro.AvroStorage();
> > dump user_data;
> Result:
> (miguno,Rock: Nerf paper, scissors is fine.,1366150681)
> (BlizzardCS,Works as intended. Terran is IMBA.,1366154481)
> (Test1,One Tweet,1366154490)
> (miguno,Rock: Nerf paper, scissors is fine.,1366150681)
> (BlizzardCS,Works as intended. Terran is IMBA.,1366154481)
> (Test1,One Tweet,1366154490)
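
A note on the "Not a data file" error in the comment above: it is most likely raised because twitter.json is plain JSON text, not an Avro object container file. Container files begin with the four magic bytes 'O', 'b', 'j', 0x01, and a reader can reject a file on that header alone. The Python sketch below checks only that header against a local copy of a file. It is an illustration, not part of Pig or AvroStorage; the function name is hypothetical, and it does not validate the rest of the file.

```python
# Magic header of an Avro object container file: ASCII "Obj" followed by byte 0x01.
AVRO_MAGIC = b"Obj\x01"


def looks_like_avro_container(path):
    """Return True if the local file at `path` starts with the Avro magic bytes.

    Hypothetical helper for illustration: it inspects only the first four
    bytes and says nothing about whether the rest of the file is valid.
    """
    with open(path, "rb") as f:
        return f.read(len(AVRO_MAGIC)) == AVRO_MAGIC
```

Running it against local copies of the attachments (for example, looks_like_avro_container("twitter.avro") and looks_like_avro_container("twitter.json")) should tell the two apart before they are handed to AvroStorage.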