[ https://issues.apache.org/jira/browse/PIG-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143649#comment-14143649 ]
Daniel Dai commented on PIG-3441:
---------------------------------

I see the exception. The new resource gets introduced in LogicalPlanBuilder.buildLoadOp when we process the load statement. When we create LOLoad, we read the schema from MyFileSystem, and MyFileSystem does expect the entry in myfs.xml there. Here is the stack:
{code}
Caused by: java.lang.IllegalStateException: This is the error mentioned in PIG-3441
	at pig.pig3441.MyFileSystem.initialize(MyFileSystem.java:38)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2397)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
	at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:70)
	at org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(HDataStorage.java:53)
	at org.apache.pig.builtin.JsonMetadata.findMetaFile(JsonMetadata.java:109)
	at org.apache.pig.builtin.JsonMetadata.getSchema(JsonMetadata.java:189)
	at org.apache.pig.builtin.PigStorage.getSchema(PigStorage.java:538)
	at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:175)
	at org.apache.pig.newplan.logical.relational.LOLoad.<init>(LOLoad.java:89)
	at org.apache.pig.parser.LogicalPlanBuilder.buildLoadOp(LogicalPlanBuilder.java:885)
{code}
LogicalPlanBuilder.buildLoadOp runs after HExecutionEngine.init, where we inject all the configuration, so the new entries in myfs-site.xml are not there. By the time we check for missing resources again in MapReduceLauncher.launchPig, it is too late. The reason your fix works is that in HDataStorage.init:67 we create a configuration with defaults and then pass it to FileSystem.get().
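To make the ordering problem concrete, here is a minimal, self-contained sketch. It does not use Hadoop or Pig: `Conf` and `MyFs` are hypothetical stand-ins that only model the idea of statically registered default resources, and the key `myfs.endpoint` and its value are invented for illustration. It shows roughly why a configuration built without defaults before the driver class is loaded misses the driver's entries, while one built with defaults afterwards sees them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DefaultResourceTiming {

    /** Hypothetical stand-in for a Hadoop-style Configuration; NOT the real class. */
    static final class Conf {
        // Simulated classpath: resource file name -> key/value pairs it defines.
        static final Map<String, Map<String, String>> CLASSPATH = new HashMap<>();
        // Resources registered statically, in registration order.
        static final List<String> DEFAULT_RESOURCES = new ArrayList<>();

        private final boolean loadDefaults;
        private final Map<String, String> explicit = new HashMap<>();

        Conf(boolean loadDefaults) {
            this.loadDefaults = loadDefaults;
        }

        static void addDefaultResource(String name) {
            DEFAULT_RESOURCES.add(name);
        }

        void set(String key, String value) {
            explicit.put(key, value);
        }

        String get(String key) {
            if (explicit.containsKey(key)) {
                return explicit.get(key);
            }
            if (!loadDefaults) {
                return null; // statically registered resources are never consulted
            }
            String found = null;
            for (String resource : DEFAULT_RESOURCES) {
                Map<String, String> props = CLASSPATH.get(resource);
                if (props != null && props.containsKey(key)) {
                    found = props.get(key); // a later resource overrides an earlier one
                }
            }
            return found;
        }
    }

    /** Hypothetical driver that registers its resource in a static initializer. */
    static final class MyFs {
        static {
            Conf.addDefaultResource("myfs-site.xml");
        }

        static void touch() {
            // Calling any static method forces the static initializer to run.
        }
    }

    /** Returns { value seen by the early snapshot, value seen with defaults }. */
    public static String[] demo() {
        Map<String, String> site = new HashMap<>();
        site.put("myfs.endpoint", "host:1234"); // invented key and value
        Conf.CLASSPATH.put("myfs-site.xml", site);

        // Engine start-up: properties are copied into a configuration without defaults.
        Conf snapshot = new Conf(false);
        snapshot.set("some.engine.property", "x");

        // Later, processing the load statement loads the driver class.
        MyFs.touch();

        // A configuration created with defaults at this point sees the new resource.
        Conf withDefaults = new Conf(true);

        return new String[] {
            snapshot.get("myfs.endpoint"),
            withDefaults.get("myfs.endpoint")
        };
    }

    public static void main(String[] args) {
        String[] result = demo();
        System.out.println("snapshot:      " + result[0]);
        System.out.println("with defaults: " + result[1]);
    }
}
```

In this model the snapshot lookup returns null and the lookup with defaults returns the value from myfs-site.xml. That corresponds to the difference between the configuration injected in HExecutionEngine.init and the one created with defaults in HDataStorage.init.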

Before we make the potentially disruptive change of always creating the configuration with defaults, can we first introduce a config option through which you can pass an additional resource file (myfs-site.xml)?

> Allow Pig to use default resources from Configuration objects
> -------------------------------------------------------------
>
>                 Key: PIG-3441
>                 URL: https://issues.apache.org/jira/browse/PIG-3441
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.11.1
>            Reporter: Bhooshan Mogal
>            Assignee: Daniel Dai
>         Attachments: PIG-3441-2.patch, PIG-3441-3.patch, PIG-3441.patch, PIG-3441_1.patch
>
>
> Pig currently ignores parameters from configuration files added statically to
> Configuration objects via Configuration.addDefaultResource(filename.xml).
> Consider the following scenario:
> In a Hadoop FileSystem driver for a non-HDFS filesystem, you load properties
> specific to that FileSystem in a static initializer block in the class that
> extends org.apache.hadoop.fs.FileSystem, like below:
> {code}
> class MyFileSystem extends FileSystem {
>     static {
>         Configuration.addDefaultResource("myfs-default.xml");
>         Configuration.addDefaultResource("myfs-site.xml");
>     }
>     // implementations of FileSystem's abstract methods omitted
> }
> {code}
> Interfaces like the Hadoop CLI, Hive, and Hadoop M/R can find configuration
> parameters defined in these configuration files as long as they are on the
> classpath.
> However, Pig cannot find parameters from these files, because it ignores
> configuration files added statically.
> Pig should allow users to specify whether they would like Pig to read parameters
> from resources loaded statically.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)