[ 
https://issues.apache.org/jira/browse/PIG-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vivek Padmanabhan updated PIG-2098:
-----------------------------------

    Description: 
While using phython udf, if I create a tuple with a single field, Pig execution 
fails with ClassCastException.

Caused by: java.io.IOException: Error executing function: 
org.apache.pig.backend.executionengine.ExecException: ERROR 0: Cannot convert 
jython type to pig datatype java.lang.ClassCastException: java.lang.String 
cannot be cast to org.apache.pig.data.Tuple
        at 
org.apache.pig.scripting.jython.JythonFunction.exec(JythonFunction.java:111)
        at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:245)


An example to reproduce the issuue ;

Pig Script
{code}
register 'mapkeys.py' using jython as mapkeys;
A = load 'mapkeys.data' using PigStorage() as ( aMap: map[] );
C = foreach A generate mapkeys.keys(aMap);
dump C;
{code}


mapkeys.py
{code}
@outputSchema("keys:bag{t:tuple(key:chararray)}")
def keys(map):
  print "mapkeys.py:keys:map:", map
  outBag = []
  for key in map.iterkeys():
    t = (key) ## doesn't work, causes Pig to crash
    #t = (key,) ## adding empty value works :-/
    outBag.append(t)
  print "mapkeys.py:keys:outBag:", outBag
  return outBag
{code}

Input data 'mapkeys.data'
[name#John,phone#5551212]


In the udf, t = (key) , because of this the item inside the bag is treated as a 
string instead of a tuple which causes for the class cast execption.
If I provide an additional comma, t = (key,) , then the script goes through 
fine.


>From code what I can see is that ,for "t = (key,)" , pythonToPig(..) recieves 
>the pyObject as  [(u'name',), (u'phone',)] from the PyFunction call .
But for "t = (key)" the return from PyFunction call is [u'name', u'phone']



  was:
While using phython udf, if I create a tuple with a single field, Pig execution 
fails with ClassCastException.

Caused by: java.io.IOException: Error executing function: 
org.apache.pig.backend.executionengine.ExecException: ERROR 0: Cannot convert 
jython type to pig datatype java.lang.ClassCastException: java.lang.String 
cannot be cast to org.apache.pig.data.Tuple
        at 
org.apache.pig.scripting.jython.JythonFunction.exec(JythonFunction.java:111)
        at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:245)


An example to reproduce the issuue ;





> jython - problem with single item tuple in bag
> ----------------------------------------------
>
>                 Key: PIG-2098
>                 URL: https://issues.apache.org/jira/browse/PIG-2098
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.1, 0.9.0
>            Reporter: Vivek Padmanabhan
>
> While using phython udf, if I create a tuple with a single field, Pig 
> execution fails with ClassCastException.
> Caused by: java.io.IOException: Error executing function: 
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: Cannot convert 
> jython type to pig datatype java.lang.ClassCastException: java.lang.String 
> cannot be cast to org.apache.pig.data.Tuple
>       at 
> org.apache.pig.scripting.jython.JythonFunction.exec(JythonFunction.java:111)
>       at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:245)
> An example to reproduce the issuue ;
> Pig Script
> {code}
> register 'mapkeys.py' using jython as mapkeys;
> A = load 'mapkeys.data' using PigStorage() as ( aMap: map[] );
> C = foreach A generate mapkeys.keys(aMap);
> dump C;
> {code}
> mapkeys.py
> {code}
> @outputSchema("keys:bag{t:tuple(key:chararray)}")
> def keys(map):
>   print "mapkeys.py:keys:map:", map
>   outBag = []
>   for key in map.iterkeys():
>     t = (key) ## doesn't work, causes Pig to crash
>     #t = (key,) ## adding empty value works :-/
>     outBag.append(t)
>   print "mapkeys.py:keys:outBag:", outBag
>   return outBag
> {code}
> Input data 'mapkeys.data'
> [name#John,phone#5551212]
> In the udf, t = (key) , because of this the item inside the bag is treated as 
> a string instead of a tuple which causes for the class cast execption.
> If I provide an additional comma, t = (key,) , then the script goes through 
> fine.
> From code what I can see is that ,for "t = (key,)" , pythonToPig(..) recieves 
> the pyObject as  [(u'name',), (u'phone',)] from the PyFunction call .
> But for "t = (key)" the return from PyFunction call is [u'name', u'phone']

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to