[ https://issues.apache.org/jira/browse/PIG-760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dmitriy V. Ryaboy updated PIG-760:
----------------------------------

    Attachment: pigstorageschema-2.patch

New patch to address findbugs and make the classes a little nicer to use.

Made internal fields protected, since having them public *and* having getters/setters didn't really make sense. Setters now return "this", so that they can be chained. Array setters make a copy of the passed-in array. Getters return the internal array, so it's still possible to shoot oneself in the foot (as findbugs points out), but side-effecting those arrays is the intended use case.

Still flat schemas only; I haven't gotten around to wrestling the Jackson parser on this one. David, do you need nested schemas?

Submitting as a patch so that Hudson can have a go. I would appreciate code comments, especially with regard to the interfaces (and the changes I made to them) from the Load/Store redesign proposal. We probably want to hold off on committing this until the new interfaces settle in a bit.

> Serialize schemas for PigStorage() and other storage types.
> -----------------------------------------------------------
>
>                 Key: PIG-760
>                 URL: https://issues.apache.org/jira/browse/PIG-760
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: David Ciemiewicz
>            Assignee: Dmitriy V. Ryaboy
>             Fix For: 0.6.0
>
>         Attachments: pigstorageschema-2.patch, pigstorageschema.patch
>
>
> I'm finding PigStorage() really convenient for storage and data interchange because it compresses well and imports well into Excel and other analysis environments.
> However, it is a pain when it comes to maintenance, because the columns are in fixed locations and I'd like to add columns in some cases.
> It would be great if load PigStorage() could read a default schema from a .schema file stored with the data, and if store PigStorage() could store a .schema file with the data.
> I have tested this out, and both Hadoop HDFS and Pig in -exectype local mode will ignore a file called .schema in a directory of part files.
> So, for example, if I have a chain of Pig scripts I execute such as:
>
>     A = load 'data-1' using PigStorage() as ( a: int , b: int );
>     store A into 'data-2' using PigStorage();
>
>     B = load 'data-2' using PigStorage();
>     describe B;
>
> describe B should output something like { a: int, b: int }

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
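For reviewers, here is a minimal sketch of the accessor conventions described in the comment (protected fields, setters that return "this" for chaining, array setters that copy their argument, getters that hand back the internal array). The class name FlatSchemaSketch and its two String[] fields are hypothetical illustrations, not the classes in pigstorageschema-2.patch, and only flat schemas are modeled.

```java
import java.util.Arrays;

/**
 * Hypothetical flat (non-nested) schema holder illustrating the conventions
 * from the PIG-760 comment. Not the actual patch classes.
 */
public class FlatSchemaSketch {
    // Protected rather than public: subclasses can reach them directly,
    // everyone else goes through the getters/setters below.
    protected String[] names = new String[0];
    protected String[] types = new String[0];

    // Array setters store a copy, so later writes to the caller's array do
    // not leak into this object. Returning "this" allows chained calls.
    public FlatSchemaSketch setNames(String[] names) {
        this.names = Arrays.copyOf(names, names.length);
        return this;
    }

    public FlatSchemaSketch setTypes(String[] types) {
        this.types = Arrays.copyOf(types, types.length);
        return this;
    }

    // Getters return the internal array itself (no copy). Writing into the
    // returned array changes this object: the pattern findbugs warns about,
    // kept on purpose because in-place edits are the intended use case.
    public String[] getNames() {
        return names;
    }

    public String[] getTypes() {
        return types;
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder("{");
        int n = Math.min(names.length, types.length);
        for (int i = 0; i < n; i++) {
            if (i > 0) sb.append(", ");
            sb.append(names[i]).append(": ").append(types[i]);
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        String[] cols = {"a", "b"};
        FlatSchemaSketch s = new FlatSchemaSketch()
                .setNames(cols)
                .setTypes(new String[] {"int", "int"});
        cols[0] = "changed";      // no effect: the setter stored a copy
        s.getTypes()[1] = "long"; // takes effect: the getter exposes the internal array
        System.out.println(s);    // prints {a: int, b: long}
    }
}
```

The asymmetry is the design point: copying on the way in protects the object from callers who reuse their arrays, while returning the live array on the way out lets code mutate field types in place without a round trip through a setter.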
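The proposal in the issue amounts to a round trip: store writes a .schema side file into the output directory next to the part files, and a later load reads it back when no "as (...)" clause is given. The sketch below shows that round trip under loudly stated assumptions. It uses the local file system via java.nio instead of HDFS, and a hypothetical one-line "a:int,b:int" text format instead of the JSON that the comment's mention of the Jackson parser suggests. SchemaSideFileSketch, storeSchema and loadSchema are illustrative names only, and schemas are flat.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the ".schema" side-file idea from PIG-760.
 * Local file system and a plain "name:type,..." format are assumptions.
 */
public class SchemaSideFileSketch {
    public static final String SCHEMA_FILE = ".schema";

    // "store": serialize an ordered name -> type map as one "a:int,b:int" line
    // and write it next to the part files in the output directory.
    public static void storeSchema(Path outputDir, Map<String, String> schema) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : schema.entrySet()) {
            if (sb.length() > 0) sb.append(',');
            sb.append(e.getKey()).append(':').append(e.getValue());
        }
        Files.createDirectories(outputDir);
        Files.write(outputDir.resolve(SCHEMA_FILE), sb.toString().getBytes(StandardCharsets.UTF_8));
    }

    // "load": read the side file back, preserving column order. Returns an
    // empty map when no .schema file exists, i.e. fields stay untyped.
    public static Map<String, String> loadSchema(Path inputDir) throws IOException {
        Map<String, String> schema = new LinkedHashMap<>();
        Path file = inputDir.resolve(SCHEMA_FILE);
        if (!Files.exists(file)) return schema;
        String line = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
        if (line.isEmpty()) return schema;
        for (String field : line.split(",")) {
            String[] parts = field.split(":", 2);
            schema.put(parts[0].trim(), parts.length > 1 ? parts[1].trim() : "bytearray");
        }
        return schema;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("data-2");
        Map<String, String> schema = new LinkedHashMap<>();
        schema.put("a", "int");
        schema.put("b", "int");
        storeSchema(dir, schema);            // analogous to: store A into 'data-2'
        System.out.println(loadSchema(dir)); // prints {a=int, b=int}
    }
}
```

A LinkedHashMap keeps column order, which matters because PigStorage() fields are positional. Falling back to "bytearray" for a field without a type is also an assumption of this sketch.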