[
https://issues.apache.org/jira/browse/PIG-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144225#comment-13144225
]
Alan Gates commented on PIG-2209:
---------------------------------
I agree with Dmitriy that this will be very useful to most users with pretty
minimal cost. My concern is in the glob case, where we're potentially doing
thousands of stats on the NameNode. I would suggest adding a cap on the number
of directories it could read, and providing a variable users could set to raise
this if they need to. For example, if a glob tried to access more than 100
directories, it would fail with a message like:
Error: PigStorage exceeded max number of input directories. To avoid this,
you can turn off auto schema detection by setting what.ever.the.variable.is to
false or you can increase the maximum allowed directories by setting
what.ever.that.variable.is (warning, this will increase the load on your
NameNode).
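
A minimal sketch of that cap, in plain Java with no Hadoop or Pig dependencies. The
class name, method names, and both property names are hypothetical placeholders (the
real variable names are not settled here), and the list of matched directories is
assumed to come from wherever the glob is expanded:

```java
import java.util.List;
import java.util.Properties;

/** Hypothetical sketch of a directory cap for schema lookup on glob inputs. */
final class SchemaDirCapSketch {
    // Placeholder property names; the actual names are still undecided.
    static final String AUTO_SCHEMA_KEY = "example.schema.autodetect";
    static final String MAX_DIRS_KEY = "example.schema.max.dirs";
    static final int DEFAULT_MAX_DIRS = 100;

    /** Returns the configured cap, falling back to the default of 100. */
    static int maxDirs(Properties conf) {
        return Integer.parseInt(
            conf.getProperty(MAX_DIRS_KEY, String.valueOf(DEFAULT_MAX_DIRS)));
    }

    /** Throws if the glob matched more directories than the cap allows. */
    static void checkDirCount(List<String> matchedDirs, Properties conf) {
        boolean autoDetect =
            Boolean.parseBoolean(conf.getProperty(AUTO_SCHEMA_KEY, "true"));
        if (!autoDetect) {
            return; // schema detection is off, so no extra NameNode stats happen
        }
        int cap = maxDirs(conf);
        if (matchedDirs.size() > cap) {
            throw new IllegalStateException(
                "PigStorage exceeded max number of input directories ("
                + matchedDirs.size() + " > " + cap + "). Set " + AUTO_SCHEMA_KEY
                + " to false, or raise " + MAX_DIRS_KEY
                + " (warning, this will increase the load on your NameNode).");
        }
    }
}
```

Checking the count before any schema file is read keeps the failure cheap: the
job stops before the per-directory stats are issued.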
Olga, I don't understand your concern for backward compatibility. If the user
has both a schema and an as clause we try to massage the schema into the as
clause. The only issue will be if they store it with a schema and then give an
as clause that is not compatible by our casting rules (e.g. the schema says a
field is a long and they declare it as a string in the as clause). Do you
think that case is common?
> JsonMetadata fails to find schema for glob paths
> ------------------------------------------------
>
> Key: PIG-2209
> URL: https://issues.apache.org/jira/browse/PIG-2209
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.10
> Reporter: Dmitriy V. Ryaboy
> Assignee: Dmitriy V. Ryaboy
> Priority: Blocker
> Fix For: 0.10
>
>
> JsonMetadata, used in PigStorage to work with serialized schemas, does not
> correctly interpret paths like '/foo/bar/{1,2,3}' and throws an exception:
> {code}
> Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1131: Could not find schema file for file:///foo/bar/{1,2}
> at org.apache.pig.builtin.JsonMetadata.nullOrException(JsonMetadata.java:217)
> at org.apache.pig.builtin.JsonMetadata.getSchema(JsonMetadata.java:186)
> at org.apache.pig.builtin.PigStorage.getSchema(PigStorage.java:438)
> at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:150)
> ... 17 more
> Caused by: java.io.IOException: Unable to read file:///foo/bar/z/{1,2}
> at org.apache.pig.builtin.JsonMetadata.findMetaFile(JsonMetadata.java:106)
> at org.apache.pig.builtin.JsonMetadata.getSchema(JsonMetadata.java:183)
> ... 19 more
> Caused by: java.net.URISyntaxException: Illegal character in path at index 36: file:///foo/bar/{1,2}
> at java.net.URI$Parser.fail(URI.java:2809)
> at java.net.URI$Parser.checkChars(URI.java:2982)
> {code}
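
The innermost exception in the trace above can be reproduced with the JDK alone:
java.net.URI rejects '{' and '}' in a path, so a glob string cannot be parsed as a
URI directly. The sketch below shows that failure and one possible workaround,
expanding a single brace group into concrete paths before any URI is built. The
class and method names are hypothetical, the expansion handles only one
non-nested group, and this is not necessarily how JsonMetadata does or should
resolve globs:

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: why a glob path fails as a URI, and a simple expansion. */
final class GlobUriSketch {
    /** True if the string parses as a java.net.URI. */
    static boolean parsesAsUri(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false; // braces are illegal characters in a URI path
        }
    }

    /** Expands the first {a,b,c} group; a single non-nested group only. */
    static List<String> expandBraces(String path) {
        List<String> out = new ArrayList<>();
        int open = path.indexOf('{');
        int close = path.indexOf('}', open + 1);
        if (open < 0 || close < 0) {
            out.add(path); // nothing to expand
            return out;
        }
        String prefix = path.substring(0, open);
        String suffix = path.substring(close + 1);
        for (String alt : path.substring(open + 1, close).split(",")) {
            out.add(prefix + alt + suffix);
        }
        return out;
    }

    public static void main(String[] args) {
        String glob = "file:///foo/bar/{1,2,3}";
        System.out.println(parsesAsUri(glob));
        for (String p : expandBraces(glob)) {
            System.out.println(p + " parses: " + parsesAsUri(p));
        }
    }
}
```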