[ 
https://issues.apache.org/jira/browse/PIG-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144225#comment-13144225
 ] 

Alan Gates commented on PIG-2209:
---------------------------------

I agree with Dmitriy that this will be very useful to most users at pretty 
minimal cost.  My concern is the glob case, where we're potentially doing 
thousands of stats on the NameNode.  I would suggest adding a cap on the number 
of directories it could read, and providing a variable users could set to raise 
the cap if they need to.  For example, if a glob tried to access more than 100 
directories, it would fail with a message like:

Error:  PigStorage exceeded max number of input directories.  To avoid this, 
you can turn off auto schema detection by setting what.ever.the.variable.is to 
false or you can increase the maximum allowed directories by setting 
what.ever.that.variable.is (warning, this will increase the load on your 
NameNode).
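
A minimal sketch of what such a cap could look like, not the actual PigStorage 
or JsonMetadata code.  The class name, method name, and default of 100 are 
hypothetical (the 100 comes from the example above), and the property names are 
the same placeholders used in the message.  It assumes the glob has already 
been expanded into a list of directories before any schema files are read.

```java
import java.util.List;

public class InputDirectoryCap {
    // Hypothetical default, taken from the "more than 100 directories" example.
    static final int DEFAULT_MAX_DIRS = 100;

    /**
     * Fails fast if a glob expanded to more directories than the cap allows,
     * so schema detection never issues thousands of stats on the NameNode.
     * Returns the number of directories when the count is within the cap.
     */
    static int checkDirectoryCount(List<String> expandedDirs, int maxDirs) {
        if (expandedDirs.size() > maxDirs) {
            // Property names below are placeholders, as in the proposed message.
            throw new IllegalStateException(
                "PigStorage exceeded max number of input directories ("
                + expandedDirs.size() + " > " + maxDirs + ").  To avoid this, "
                + "you can turn off auto schema detection by setting "
                + "what.ever.the.variable.is to false or you can increase the "
                + "maximum allowed directories by setting "
                + "what.ever.that.variable.is (warning, this will increase the "
                + "load on your NameNode).");
        }
        return expandedDirs.size();
    }
}
```

The check runs before any schema file is touched, so a glob over the cap costs 
only the expansion itself, and users who need more can raise the limit knowingly.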

Olga, I don't understand your concern for backward compatibility.  If the user 
has both a schema and an as clause we try to massage the schema into the as 
clause.  The only issue will be if they store it with a schema and then give an 
as clause that is not compatible by our casting rules (e.g. the schema says a 
field is a long and they declare it as a string in the as clause).  Do you 
think that case is common?

                
> JsonMetadata fails to find schema for glob paths
> ------------------------------------------------
>
>                 Key: PIG-2209
>                 URL: https://issues.apache.org/jira/browse/PIG-2209
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.10
>            Reporter: Dmitriy V. Ryaboy
>            Assignee: Dmitriy V. Ryaboy
>            Priority: Blocker
>             Fix For: 0.10
>
>
> JsonMetadata, used in PigStorage to work with serialized schemas, does not 
> correctly interpret paths like '/foo/bar/{1,2,3}' and throws an exception:
> {code}
> Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1131: 
> Could not find schema file for file:///foo/bar/{1,2}
>       at 
> org.apache.pig.builtin.JsonMetadata.nullOrException(JsonMetadata.java:217)
>       at org.apache.pig.builtin.JsonMetadata.getSchema(JsonMetadata.java:186)
>       at org.apache.pig.builtin.PigStorage.getSchema(PigStorage.java:438)
>       at 
> org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:150)
>       ... 17 more
> Caused by: java.io.IOException: Unable to read file:///foo/bar/z/{1,2}
>       at 
> org.apache.pig.builtin.JsonMetadata.findMetaFile(JsonMetadata.java:106)
>       at org.apache.pig.builtin.JsonMetadata.getSchema(JsonMetadata.java:183)
>       ... 19 more
> Caused by: java.net.URISyntaxException: Illegal character in path at index 
> 36: file:///foo/bar/{1,2}
>       at java.net.URI$Parser.fail(URI.java:2809)
>       at java.net.URI$Parser.checkChars(URI.java:2982)
> {code}


        
