[ https://issues.apache.org/jira/browse/PIG-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vivek Padmanabhan updated PIG-2098:
-----------------------------------
    Description:
While using a Python UDF, if I create a tuple with a single field, Pig execution fails with a ClassCastException.

Caused by: java.io.IOException: Error executing function: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Cannot convert jython type to pig datatype java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.pig.data.Tuple
        at org.apache.pig.scripting.jython.JythonFunction.exec(JythonFunction.java:111)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:245)

An example to reproduce the issue:

Pig Script
{code}
register 'mapkeys.py' using jython as mapkeys;
A = load 'mapkeys.data' using PigStorage() as ( aMap: map[] );
C = foreach A generate mapkeys.keys(aMap);
dump C;
{code}

mapkeys.py
{code}
@outputSchema("keys:bag{t:tuple(key:chararray)}")
def keys(map):
    print "mapkeys.py:keys:map:", map
    outBag = []
    for key in map.iterkeys():
        t = (key)     ## doesn't work, causes Pig to crash
        #t = (key,)   ## adding a trailing comma works :-/
        outBag.append(t)
    print "mapkeys.py:keys:outBag:", outBag
    return outBag
{code}

Input data 'mapkeys.data'
[name#John,phone#5551212]

In the UDF, t = (key) does not create a tuple: in Python the parentheses only group the expression, so t is the key string itself. Each item inside the bag is therefore treated as a string instead of a tuple, which causes the ClassCastException.
If I add a trailing comma, t = (key,), then the script runs fine.

From the code, what I can see is that for "t = (key,)", pythonToPig(..) receives the pyObject as [(u'name',), (u'phone',)] from the PyFunction call.
But for "t = (key)", the return from the PyFunction call is [u'name', u'phone'].

  was:
While using a Python UDF, if I create a tuple with a single field, Pig execution fails with a ClassCastException.

Caused by: java.io.IOException: Error executing function: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Cannot convert jython type to pig datatype java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.pig.data.Tuple
        at org.apache.pig.scripting.jython.JythonFunction.exec(JythonFunction.java:111)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:245)

An example to reproduce the issue:


> jython - problem with single item tuple in bag
> ----------------------------------------------
>
>                 Key: PIG-2098
>                 URL: https://issues.apache.org/jira/browse/PIG-2098
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.1, 0.9.0
>            Reporter: Vivek Padmanabhan
>
> While using a Python UDF, if I create a tuple with a single field, Pig
> execution fails with a ClassCastException.
>
> Caused by: java.io.IOException: Error executing function:
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: Cannot convert
> jython type to pig datatype java.lang.ClassCastException: java.lang.String
> cannot be cast to org.apache.pig.data.Tuple
>         at org.apache.pig.scripting.jython.JythonFunction.exec(JythonFunction.java:111)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:245)
>
> An example to reproduce the issue:
>
> Pig Script
> {code}
> register 'mapkeys.py' using jython as mapkeys;
> A = load 'mapkeys.data' using PigStorage() as ( aMap: map[] );
> C = foreach A generate mapkeys.keys(aMap);
> dump C;
> {code}
>
> mapkeys.py
> {code}
> @outputSchema("keys:bag{t:tuple(key:chararray)}")
> def keys(map):
>     print "mapkeys.py:keys:map:", map
>     outBag = []
>     for key in map.iterkeys():
>         t = (key)     ## doesn't work, causes Pig to crash
>         #t = (key,)   ## adding a trailing comma works :-/
>         outBag.append(t)
>     print "mapkeys.py:keys:outBag:", outBag
>     return outBag
> {code}
>
> Input data 'mapkeys.data'
> [name#John,phone#5551212]
>
> In the UDF, t = (key) does not create a tuple: in Python the parentheses only
> group the expression, so t is the key string itself. Each item inside the bag
> is therefore treated as a string instead of a tuple, which causes the
> ClassCastException.
> If I add a trailing comma, t = (key,), then the script runs fine.
>
> From the code, what I can see is that for "t = (key,)", pythonToPig(..)
> receives the pyObject as [(u'name',), (u'phone',)] from the PyFunction call.
> But for "t = (key)", the return from the PyFunction call is [u'name', u'phone'].
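
The difference between t = (key) and t = (key,) described in this report can be checked in plain Python, outside Pig. The following is only a minimal sketch: the helper names and sample_map are hypothetical, sorted() is used just to make the output order deterministic, and it does not reproduce the Pig ClassCastException itself. It only shows what each form puts in the list that the UDF returns.

```python
# Illustrative sketch in plain Python; it does not call Pig or Jython internals.
# The helper names and sample_map are hypothetical, chosen to mirror the report.

def keys_without_comma(mapping):
    """Mirrors t = (key): parentheses only group, so each item stays a string."""
    out_bag = []
    for key in sorted(mapping):  # sorted() only for a deterministic order
        t = (key)
        out_bag.append(t)
    return out_bag


def keys_with_comma(mapping):
    """Mirrors t = (key,): the trailing comma builds a one-element tuple."""
    out_bag = []
    for key in sorted(mapping):
        t = (key,)
        out_bag.append(t)
    return out_bag


sample_map = {"name": "John", "phone": "5551212"}

broken = keys_without_comma(sample_map)
fixed = keys_with_comma(sample_map)

print(broken)  # ['name', 'phone']
print(fixed)   # [('name',), ('phone',)]
print(all(isinstance(item, tuple) for item in broken))  # False
print(all(isinstance(item, tuple) for item in fixed))   # True
```

Since the declared output schema is bag{t:tuple(key:chararray)}, only the second form returns items that match the tuple the schema promises.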