[ 
https://issues.apache.org/jira/browse/PIG-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064942#comment-13064942
 ] 

Thejas M Nair commented on PIG-2143:
------------------------------------

In PigStorage.applySchema, the schema is being deserialized on every call. It 
needs to be done only once, on the first call.
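
One way to deserialize only once is to cache the result on first use. The 
following is a minimal sketch of that idea, not the actual PigStorage code: 
the class CachedSchema, its fields, and the deserializer function are all 
hypothetical names chosen for illustration, and the call counter exists only 
to make the behavior visible.

```java
import java.util.function.Function;

// Hypothetical stand-in for illustration; not Pig's PigStorage implementation.
final class CachedSchema<S> {
    private final String serialized;                 // serialized schema text
    private final Function<String, S> deserializer;  // assumed parsing function
    private S schema;                                // cached result
    private boolean loaded = false;                  // true after the first call
    private int deserializeCalls = 0;                // for demonstration only

    CachedSchema(String serialized, Function<String, S> deserializer) {
        this.serialized = serialized;
        this.deserializer = deserializer;
    }

    // Deserializes on the first call and returns the cached value afterwards.
    // Not thread-safe; a per-instance loader used by one task would not need it.
    S get() {
        if (!loaded) {
            schema = deserializer.apply(serialized);
            deserializeCalls++;
            loaded = true;
        }
        return schema;
    }

    int getDeserializeCalls() {
        return deserializeCalls;
    }
}
```

With something like this, applySchema would call get() for every tuple while 
the parsing cost is paid only once.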

Some minor things:

The formatting of the curly braces in the functions PigStorage(String delimiter, 
String options) and applySchema(Tuple tup) is off.

I ran test-patch, and there was a new javac warning:

>     [javac] /tmp/trunk/src/org/apache/pig/builtin/JsonMetadata.java:94: 
> warning: [deprecation] 
> fullPath(java.lang.String,org.apache.pig.backend.datastorage.DataStorage) in 
> org.apache.pig.impl.io.FileLocalizer has been deprecated
>     [javac]         String fullPath = FileLocalizer.fullPath(path, storage);

There is also a javadoc warning from JsonMetadata.java, but that can be addressed 
separately (in a different JIRA), since JsonMetadata.java is just copied from 
an existing file:

> /tmp/trunk/src/org/apache/pig/builtin/JsonMetadata.java:191: warning - Tag 
> @see: can't find getStatistics(String, Configuration) in 
> org.apache.pig.LoadMetadata


Other thoughts (we can also track these in separate JIRAs):
- If PigStorage is used without parameters (i.e. the same as specifying no load 
func) on data that has been stored with '-schema', should the schema and delim 
from the metadata be used?

- If PigStorage is used with a delim on a file stored with '-schema', should it 
throw an error if the delim in the metadata file is different? Or should it 
warn and just use the delim specified in the metadata file?
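
To make the two open questions concrete, here is a rough sketch of how the 
choice could be expressed. It does not propose an answer, and nothing in it 
is existing Pig API: DelimiterPolicy, OnMismatch, and resolve are 
hypothetical names, and a null argument is assumed to mean "not specified" or 
"no metadata present".

```java
// Hypothetical helper for illustration; not part of Pig.
final class DelimiterPolicy {
    // The two behaviors raised in the comment above.
    enum OnMismatch { FAIL, WARN_AND_USE_METADATA }

    // specified:    delimiter passed to the loader, or null if none was given
    // fromMetadata: delimiter recorded with '-schema', or null if absent
    static String resolve(String specified, String fromMetadata, OnMismatch policy) {
        if (fromMetadata == null) {
            return specified;            // nothing stored, use what the user gave
        }
        if (specified == null || specified.equals(fromMetadata)) {
            return fromMetadata;         // no conflict, metadata wins by default
        }
        if (policy == OnMismatch.FAIL) {
            throw new IllegalArgumentException(
                "Delimiter '" + specified + "' differs from stored delimiter '"
                + fromMetadata + "'");
        }
        System.err.println("WARN: ignoring specified delimiter, using stored one");
        return fromMetadata;
    }
}
```

The first branch pair covers the no-parameter case; the policy argument 
covers the mismatch case, leaving the default to be decided.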



> Improvements for PigStorage
> ---------------------------
>
>                 Key: PIG-2143
>                 URL: https://issues.apache.org/jira/browse/PIG-2143
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Dmitriy V. Ryaboy
>            Assignee: Dmitriy V. Ryaboy
>             Fix For: 0.10
>
>         Attachments: PIG-2143.diff
>
>
> I'd like to propose that we allow for a greater degree of customization in 
> PigStorage.
> An incomplete list of features that we might want to add:
> - flag to tell it to overwrite existing output if it exists
> - flag to tell it to compress output using gzip|bzip|lzo (currently this can 
> be achieved by setting the directory name to end in .gz or .bz2, which is a 
> bit awkward)
> - flag to tell it to store the schema and header (perhaps by merging in 
> PigStorageSchema work?)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
