[
https://issues.apache.org/jira/browse/PIG-3142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13589116#comment-13589116
]
Cheolsoo Park commented on PIG-3142:
[~jpacker], thank you very much for the patch! This looks very useful. I have a
few minor comments:
* Can you move the following code to prepareToRead() and remove
requiredFieldsInitialized? The comment in LoadFunc says "This will be called
during execution before any calls to getNext()", so I think this kind of
initialization should be done there not in getNext().
{code:title=FixedWidthLoader.java}
+if (!requiredFieldsInitialized) {
+UDFContext udfc = UDFContext.getUDFContext();
+Properties p = udfc.getUDFProperties(this.getClass(), new String[]
{ udfContextSignature });
+requiredFields = (boolean[])
ObjectSerializer.deserialize(p.getProperty(REQUIRED_FIELDS_SIGNATURE));
+
+if (requiredFields != null) {
+numRequiredFields = 0;
+for (int i = 0; i < requiredFields.length; i++) {
+if (requiredFields[i])
+numRequiredFields++;
+}
+}
+
+requiredFieldsInitialized = true;
+}
{code}
In fact, I found almost all LoadFunc implementations (including PigStorage) do
the same trick. Looking at the commit history, I believe that this is a legacy
pattern that was introduced before the LoadFunc resign. It seems that we never
cleaned up this pattern when merging the LoadFunc redesign to trunk. At least,
let's stop using it from now on.
* Can you update the usage messages with Apache Pig namespace?
{code:title=FixedWidthLoader.java}
"Usage: com.mortardata.pig.FixedWidthLoader(" +
{code}
Also, the usage message in FixedWidthStorage incorrectly refers to
FixedWidthLoader.
{code:title=FixedWidthStorage.java}
"Usage: com.mortardata.pig.FixedWidthLoader(" +
{code}
* Can you factor out parseColumnSpec(String spec) and reuse it in both
FixedWidthLoader and FixedWidthStorage instead have duplicate code?
* Can you rename FixedWidthStorage to FixedWidthStorer? I think Storage refers
to Loader + Storer (e.g. PigStorage, AvroStorage, etc). So FixedWidthStorer is
probably a better name (e.g. HCatStorer).
Please let me know what you think. Thanks!
> Fixed-width load and store functions for the Piggybank
> --
>
> Key: PIG-3142
> URL: https://issues.apache.org/jira/browse/PIG-3142
> Project: Pig
> Issue Type: New Feature
> Components: piggybank
>Affects Versions: 0.11
>Reporter: Jonathan Packer
> Attachments: fixed-width.patch, fixed-width-updated.patch
>
>
> Adds load/store functions for fixed width data to the Piggybank. They use the
> syntax of the unix "cut" command to specify column positions, and have an
> option to skip the header row when loading or to write a header row when
> storing.
> The header handling works properly with multiple small files each with a
> header being combined into one split, or a large file with a single header
> being split into multiple splits.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira