[ https://issues.apache.org/jira/browse/PIG-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13652591#comment-13652591 ]
Viraj Bhat commented on PIG-3312:
---------------------------------
Hi Hans,
Could you try upgrading only piggybank.jar, which contains the
AvroStorage-related classes, from the Pig 0.8.1 build to the Pig 0.10.1 build?
I did not see this problem in Pig 0.10.1 and later.
user_data = LOAD 'twitter_files/twitter.avro' using org.apache.pig.piggybank.storage.avro.AvroStorage();
describe user_data;
dump user_data;
Results in:
(miguno,Rock: Nerf paper, scissors is fine.,1366150681)
(BlizzardCS,Works as intended. Terran is IMBA.,1366154481)
(Test1,One Tweet,1366154490)
However, you cannot read twitter.json using AvroStorage, since it is a JSON file rather than an Avro data file:
Caused by: java.io.IOException: Not a data file.
    at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105)
    at org.apache.avro.file.DataFileStream.<init>(DataFileStream.java:84)
    at org.apache.pig.piggybank.storage.avro.AvroStorage.getSchema(AvroStorage.java:218)
    at org.apache.pig.piggybank.storage.avro.AvroStorage.getAvroSchema(AvroStorage.java:169)
    at org.apache.pig.piggybank.storage.avro.AvroStorage.getAvroSchema(AvroStorage.java:145)
    at org.apache.pig.piggybank.storage.avro.AvroStorage.getSchema(AvroStorage.java:293)
    at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:151)
    ... 18 more
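If it helps to check a file before pointing AvroStorage at it, here is a rough sketch (not part of Pig or piggybank) that tests for the four magic bytes that open an Avro object container file. The helper name looks_like_avro_container is made up for illustration, and it reads a local copy of the file rather than HDFS.

```python
# Avro object container files begin with the bytes 'O', 'b', 'j', 0x01.
AVRO_MAGIC = b"Obj\x01"


def looks_like_avro_container(path):
    """Return True if the local file at `path` starts with the Avro magic bytes.

    Hypothetical helper for illustration only; it checks the header and
    nothing else, so it does not validate the schema or the data blocks.
    """
    with open(path, "rb") as f:
        return f.read(len(AVRO_MAGIC)) == AVRO_MAGIC
```

A file such as twitter.json would be expected to fail this check, which is consistent with the "Not a data file" exception above.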
Viraj
> Pig duplicates avro records
> ---------------------------
>
> Key: PIG-3312
> URL: https://issues.apache.org/jira/browse/PIG-3312
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.8.1
> Reporter: Hans Uhlig
> Attachments: twitter.avro, twitter.avsc, twitter.json
>
>
> Pig will report avro records twice.
> To Reproduce:
> * Place attached files on hdfs
> * run pig
> > register lib/piggybank.jar
> > register lib/avro-1.7.4.jar
> > register lib/json-simple-1.1.jar
> > register lib/jackson-mapper-asl-1.6.0.jar
> > register lib/jackson-core-asl-1.6.0.jar
> > user_data= LOAD 'twitter.avro' using
> > org.apache.pig.piggybank.storage.avro.AvroStorage();
> > dump user_data;
> Result:
> (miguno,Rock: Nerf paper, scissors is fine.,1366150681)
> (BlizzardCS,Works as intended. Terran is IMBA.,1366154481)
> (Test1,One Tweet,1366154490)
> (miguno,Rock: Nerf paper, scissors is fine.,1366150681)
> (BlizzardCS,Works as intended. Terran is IMBA.,1366154481)
> (Test1,One Tweet,1366154490)