[ https://issues.apache.org/jira/browse/PIG-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Pradeep Kamath resolved PIG-1378. --------------------------------- Release Note: The fix for this issue described in this jira depends on a issue with Hadoop code which was fixed on the hadoop trunk ( https://issues.apache.org/jira/browse/MAPREDUCE-1522). Until that goes into a hadoop release which is used by pig, this will remain an issue Resolution: Fixed Am closing this bug since the pig changes are in and hadoop changes are in trunk - this should work once we use the appropriate hadoop release. > har url of the form har:///<path> not usable in Pig scripts > (har://hdfs-namenode:port/<path> works) > --------------------------------------------------------------------------------------------------- > > Key: PIG-1378 > URL: https://issues.apache.org/jira/browse/PIG-1378 > Project: Pig > Issue Type: Bug > Components: impl > Affects Versions: 0.7.0 > Reporter: Viraj Bhat > Assignee: Pradeep Kamath > Fix For: 0.8.0 > > Attachments: PIG-1378-2.patch, PIG-1378-3.patch, PIG-1378-4.patch, > PIG-1378.patch > > > I am trying to use har (Hadoop Archives) in my Pig script. > I can use them through the HDFS shell > {noformat} > $hadoop fs -ls 'har:///user/viraj/project/subproject/files/size/data' > Found 1 items > -rw------- 5 viraj users 1537234 2010-04-14 09:49 > user/viraj/project/subproject/files/size/data/part-00001 > {noformat} > Using similar URL's in grunt yields > {noformat} > grunt> a = load 'har:///user/viraj/project/subproject/files/size/data'; > grunt> dump a; > {noformat} > {noformat} > 2010-04-14 22:08:48,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR > 2998: Unhandled internal error. > org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible > file URI scheme: har : hdfs > 2010-04-14 22:08:48,814 [main] WARN org.apache.pig.tools.grunt.Grunt - There > is no log file to write to. > 2010-04-14 22:08:48,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - > java.lang.Error: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: > Incompatible file URI scheme: har : hdfs > at > org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1483) > at > org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1245) > at > org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:911) > at > org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:700) > at > org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63) > at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1164) > at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1114) > at org.apache.pig.PigServer.registerQuery(PigServer.java:425) > at > org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:737) > at > org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324) > at > org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162) > at > org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138) > at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75) > at org.apache.pig.Main.main(Main.java:357) > Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: > Incompatible file URI scheme: har : hdfs > at org.apache.pig.LoadFunc.getAbsolutePath(LoadFunc.java:249) > at org.apache.pig.LoadFunc.relativeToAbsolutePath(LoadFunc.java:62) > at > org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1472) > ... 13 more > {noformat} > According to Jira http://issues.apache.org/jira/browse/PIG-1234 I try the > following as stated in the original description > {noformat} > grunt> a = load > 'har://namenode-location/user/viraj/project/subproject/files/size/data'; > grunt> dump a; > {noformat} > {noformat} > Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: > Unable to create input splits for: > har://namenode-location/user/viraj/project/subproject/files/size/data'; > ... 8 more > Caused by: java.io.IOException: No FileSystem for scheme: namenode-location > at .apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1375) > at .apache.hadoop.fs.FileSystem.access(200(FileSystem.java:66) > at .apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390) > at .apache.hadoop.fs.FileSystem.get(FileSystem.java:196) > at .apache.hadoop.fs.HarFileSystem.initialize(HarFileSystem.java:104) > at .apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378) > at .apache.hadoop.fs.FileSystem.get(FileSystem.java:193) > at .apache.hadoop.fs.Path.getFileSystem(Path.java:175) > at > .apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:208) > at > .apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36) > at > .apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:246) > at > .apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:245) > {noformat} > Viraj -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.