[ https://issues.apache.org/jira/browse/PIG-517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Pradeep Kamath updated PIG-517:
-------------------------------
Assignee: Pradeep Kamath
Status: Patch Available (was: Open)
Attached patch - details on the issue:
When a bytearray is cast to any other type, TypeCheckingVisitor tries to match
the cast to the load function that produced the data, so that the load function
can perform the cast. There is no generic way to convert a bytearray to another
type: only the load function that is the source of the bytearray knows how to
interpret its bytes. In the current code, TypeCheckingVisitor sets a "LoadFunc"
reference in LOCast for this purpose. That reference carries no information
about the arguments needed to construct the loader object at runtime. The
problem surfaces in LogToPhyTranslationVisitor when it tries to create a POCast
from the LOCast.
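
To illustrate why the cast has to be delegated to the loader, here is a minimal
sketch, not Pig code. The class and method names are made up for this example.
It shows two plausible encodings of the same integer: a loader that reads UTF-8
text and a loader that reads big-endian binary have to interpret the bytes
differently, and nothing in the bytes themselves says which one applies.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; these are not Pig classes.
public class BytesToIntSketch {

    // A text-oriented loader would treat the bytes as UTF-8 digits.
    public static int fromUtf8Text(byte[] raw) {
        return Integer.parseInt(new String(raw, StandardCharsets.UTF_8).trim());
    }

    // A binary-oriented loader would treat the bytes as a 4-byte big-endian int.
    public static int fromBigEndianBinary(byte[] raw) {
        return ByteBuffer.wrap(raw).getInt();
    }
}
```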
The patch now uses FuncSpec to represent the loader function rather than a
LoadFunc reference. The FuncSpec has both the classname and the arguments to
the constructor of the loader.
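
A rough sketch of the idea, under assumptions: SimpleFuncSpec and
RegexLoaderStub below are hypothetical stand-ins written for this example, not
the actual FuncSpec or RegexLoader classes, and the instantiate method is not
the code in PigContext. The point is only that a class name plus constructor
arguments is enough to rebuild the loader, while a class name alone is not when
the loader has no no-argument constructor.

```java
import java.util.Arrays;

// Hypothetical illustration only; these are not Pig classes.
public class LoaderSpecSketch {

    /** Stand-in for a loader whose only constructor takes a regex string. */
    public static class RegexLoaderStub {
        private final String regex;

        public RegexLoaderStub(String regex) {
            this.regex = regex;
        }

        public String getRegex() {
            return regex;
        }
    }

    /** Stand-in for a function spec: class name plus constructor arguments. */
    public static class SimpleFuncSpec {
        private final String className;
        private final String[] ctorArgs;

        public SimpleFuncSpec(String className, String... ctorArgs) {
            this.className = className;
            this.ctorArgs = ctorArgs.clone();
        }

        public String getClassName() {
            return className;
        }

        public String[] getCtorArgs() {
            return ctorArgs.clone();
        }
    }

    /** Builds the object from the spec, assuming all constructor parameters are Strings. */
    public static Object instantiate(SimpleFuncSpec spec) {
        String[] args = spec.getCtorArgs();
        try {
            Class<?> cls = Class.forName(spec.getClassName());
            if (args.length == 0) {
                // With no arguments recorded, only a no-arg constructor can be used.
                return cls.getDeclaredConstructor().newInstance();
            }
            Class<?>[] paramTypes = new Class<?>[args.length];
            Arrays.fill(paramTypes, String.class);
            return cls.getDeclaredConstructor(paramTypes).newInstance((Object[]) args);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException("could not instantiate '" + spec.getClassName()
                    + "' with arguments '" + Arrays.toString(args) + "'", e);
        }
    }
}
```

In this sketch, a spec built with the argument '4*8' yields a usable
RegexLoaderStub, while a spec built with the class name alone throws a
RuntimeException, which is the same kind of failure reported in the issue.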
> Custom Loader Function which takes in a constructor argument fails during
> typecast
> ----------------------------------------------------------------------------------
>
> Key: PIG-517
> URL: https://issues.apache.org/jira/browse/PIG-517
> Project: Pig
> Issue Type: Bug
> Affects Versions: types_branch
> Reporter: Viraj Bhat
> Assignee: Pradeep Kamath
> Fix For: types_branch
>
> Attachments: phonenumber.txt, RegexLoader.java
>
>
> I have a custom loader function, RegexLoader, that parses a line of
> input into fields using a regex and then sets the fields. RegexLoader
> extends Utf8StorageConverter and implements LoadFunc. Its constructor
> takes a regex string supplied by the user.
> The following script works when the loaded fields are not cast.
> {code}
> REGISTER pigudf2.0/java/build/loader.jar
> fullfile = load 'phonenumber.txt'
>     using loader.RegexLoader('4*8')
>     as (a,z,n) ;
> -- project required fields
> phonerecords = foreach fullfile {
>     generate
>         a as area,
>         z as zone,
>         n as number;
> }
> dump phonerecords;
> {code}
> But when the alias a is cast to int, the script fails with the error:
> java.io.IOException: Unable to open iterator for alias: phonerecords [Unable
> to store for alias: phonerecords [could not instantiate 'loader.RegexLoader'
> with arguments 'null']]
> {code}
> REGISTER pigudf2.0/java/build/loader.jar
> fullfile = load 'phonenumber.txt'
>     using loader.RegexLoader('4*8')
>     as (a,z,n) ;
> -- project required fields
> phonerecords = foreach fullfile {
>     generate
>         (int)a as area,
>         z as zone,
>         n as number;
> }
> dump phonerecords;
> {code}
> Full stack trace of the error:
> ==================================================================================================================
> java.io.IOException: Unable to open iterator for alias: phonerecords [Unable to store for alias: phonerecords [could not instantiate 'loader.RegexLoader' with arguments 'null']]
> at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:448)
> at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:454)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.instantiateFunc(POCast.java:67)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.setLoadFSpec(POCast.java:73)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:1157)
> at org.apache.pig.impl.logicalLayer.LOCast.visit(LOCast.java:60)
> at org.apache.pig.impl.logicalLayer.LOCast.visit(LOCast.java:28)
> at org.apache.pig.impl.plan.DependencyOrderWalkerWOSeenChk.walk(DependencyOrderWalkerWOSeenChk.java:68)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:805)
> at org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:121)
> at org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:40)
> at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:68)
> at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
> at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:232)
> at org.apache.pig.PigServer.compilePp(PigServer.java:731)
> at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:644)
> at org.apache.pig.PigServer.store(PigServer.java:452)
> at org.apache.pig.PigServer.store(PigServer.java:421)
> at org.apache.pig.PigServer.openIterator(PigServer.java:384)
> at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:269)
> at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:178)
> at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:84)
> at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:64)
> at org.apache.pig.Main.main(Main.java:306)
> Caused by: java.io.IOException: Unable to store for alias: phonerecords [could not instantiate 'loader.RegexLoader' with arguments 'null']
> ... 24 more
> Caused by: java.lang.RuntimeException: could not instantiate 'loader.RegexLoader' with arguments 'null'
> ... 24 more
> Caused by: java.lang.InstantiationException: loader.RegexLoader
> at java.lang.Class.newInstance0(Class.java:340)
> at java.lang.Class.newInstance(Class.java:308)
> at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:418)
> ... 23 more
> ==================================================================================================================
> Attaching the custom RegexLoader to this Jira.