[ https://issues.apache.org/jira/browse/PIG-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144225#comment-13144225 ]
Alan Gates commented on PIG-2209:
---------------------------------

I agree with Dmitriy that this will be very useful to most users at pretty minimal cost. My concern is the glob case, where we're potentially doing thousands of stats on the NameNode. I would suggest adding a cap on the number of directories it can read, and providing a variable users can set to raise the cap if they need to. For example, if a glob tried to access more than 100 directories, it would fail with a message like:

Error: PigStorage exceeded max number of input directories. To avoid this, you can turn off auto schema detection by setting what.ever.the.variable.is to false, or you can increase the maximum allowed directories by setting what.ever.that.variable.is (warning: this will increase the load on your NameNode).

Olga, I don't understand your concern about backward compatibility. If the user has both a schema and an as clause, we try to massage the schema into the as clause. The only issue will be if they store it with a schema and then give an as clause that is not compatible under our casting rules (e.g. the schema says a field is a long and they declare it as a string in the as clause). Do you think that case is common?

> JsonMetadata fails to find schema for glob paths
> ------------------------------------------------
>
>                 Key: PIG-2209
>                 URL: https://issues.apache.org/jira/browse/PIG-2209
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.10
>            Reporter: Dmitriy V. Ryaboy
>            Assignee: Dmitriy V. Ryaboy
>            Priority: Blocker
>             Fix For: 0.10
>
>
> JsonMetadata, used in PigStorage to work with serialized schemas, does not
> correctly interpret paths like '/foo/bar/{1,2,3}' and throws an exception:
> {code}
> Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1131: Could not find schema file for file:///foo/bar/{1,2}
> 	at org.apache.pig.builtin.JsonMetadata.nullOrException(JsonMetadata.java:217)
> 	at org.apache.pig.builtin.JsonMetadata.getSchema(JsonMetadata.java:186)
> 	at org.apache.pig.builtin.PigStorage.getSchema(PigStorage.java:438)
> 	at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:150)
> 	... 17 more
> Caused by: java.io.IOException: Unable to read file:///foo/bar/z/{1,2}
> 	at org.apache.pig.builtin.JsonMetadata.findMetaFile(JsonMetadata.java:106)
> 	at org.apache.pig.builtin.JsonMetadata.getSchema(JsonMetadata.java:183)
> 	... 19 more
> Caused by: java.net.URISyntaxException: Illegal character in path at index 36: file:///foo/bar/{1,2}
> 	at java.net.URI$Parser.fail(URI.java:2809)
> 	at java.net.URI$Parser.checkChars(URI.java:2982)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
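
The directory cap suggested in the comment above could look roughly like the following. This is only a sketch in plain Java, not code from Pig: the class name, the method name, and both property names are hypothetical stand-ins (the comment deliberately leaves the real variable names open), and the default of 100 is taken from the example in the comment. The check is meant to run on the already-expanded glob, before any per-directory stat is issued.

```java
import java.io.IOException;
import java.util.List;
import java.util.Properties;

// Hypothetical helper; not part of Pig.
final class InputDirectoryCap {
    // Placeholder property names; the real names were not decided in the comment.
    static final String MAX_DIRS_KEY = "example.schema.max.input.dirs";
    static final String AUTO_SCHEMA_KEY = "example.schema.auto.detect";
    // Default cap, following the "more than 100 directories" example.
    static final int DEFAULT_MAX_DIRS = 100;

    // Fails fast when the expanded glob names more directories than the cap
    // allows, so that no schema lookups are attempted for an oversized glob.
    static void checkDirectoryCount(List<String> expandedDirs, Properties conf)
            throws IOException {
        int max = Integer.parseInt(
                conf.getProperty(MAX_DIRS_KEY, String.valueOf(DEFAULT_MAX_DIRS)));
        if (expandedDirs.size() > max) {
            throw new IOException(
                "PigStorage exceeded max number of input directories ("
                + expandedDirs.size() + " > " + max + "). To avoid this, you can"
                + " turn off auto schema detection by setting " + AUTO_SCHEMA_KEY
                + " to false, or you can increase the maximum allowed directories"
                + " by setting " + MAX_DIRS_KEY
                + " (warning: this will increase the load on your NameNode).");
        }
    }
}
```

A caller would expand the glob first, pass the resulting directory list to checkDirectoryCount, and only then look for schema files. Users who really need a larger glob raise the cap explicitly, which keeps the extra NameNode load an opt-in decision.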