script UDF (jython) should utilize the intended output schema to more directly convert Py objects to Pig objects
----------------------------------------------------------------------------------------------------------------

                 Key: PIG-1942
                 URL: https://issues.apache.org/jira/browse/PIG-1942
             Project: Pig
          Issue Type: Improvement
          Components: impl
    Affects Versions: 0.8.0, 0.9.0
            Reporter: Woody Anderson
            Priority: Minor
             Fix For: 0.9.0


Example taken from https://issues.apache.org/jira/browse/PIG-1824:

{code}
import re
@outputSchema("y:bag{t:tuple(word:chararray)}")
def strsplittobag(content,regex):
        return re.compile(regex).split(content)
{code}

This does not work because split() returns a list of strings. However, the output
schema is known, so it would be quite simple to implicitly promote each string
element to a single-field tuple.
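
Until such promotion exists, the function has to build the tuples itself. Below is a minimal sketch of that workaround, assuming (as we understand the current Jython conversion) that a Python list becomes a bag only when its elements are already tuples. It also assumes `outputSchema` is injected by Pig's Jython runtime; the fallback defined here is only a no-op stand-in so the function can be exercised outside Pig.

```python
import re

try:
    outputSchema  # assumed to be provided by Pig's Jython UDF runtime
except NameError:
    # No-op stand-in decorator, used only when running outside Pig.
    def outputSchema(schema):
        def decorator(func):
            return func
        return decorator

@outputSchema("y:bag{t:tuple(word:chararray)}")
def strsplittobag(content, regex):
    # Wrap every piece in a one-element tuple so each bag element is a tuple.
    return [(word,) for word in re.split(regex, content)]
```

This is exactly the per-element re-wrapping that a schema-aware conversion would make unnecessary.
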
More generally, a list, array, tuple, set, etc. is equally convertible to a bag, and
a list, array, or tuple is equally convertible to a Tuple; using the schema, these
conversions can be done far less rigidly.
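
To illustrate the idea, here is a plain-Python sketch of schema-directed conversion. It is not the attached patch and does not use Pig's Schema/FieldSchema classes: the nested-tuple schema model and the `coerce_to_schema` function are hypothetical, chosen only to show the promotion rules (any list/array/tuple/set backs a bag; any list/array/tuple backs a Tuple; a bare scalar is promoted to a one-field tuple).

```python
import array

# Hypothetical, simplified schema model (not Pig's API):
#   "chararray", "int", ...         -> atomic field
#   ("tuple", [field, field, ...])  -> tuple with the given fields
#   ("bag", tuple_schema)           -> bag whose elements match tuple_schema
TUPLE_LIKE = (list, tuple, array.array)
BAG_LIKE = TUPLE_LIKE + (set, frozenset)

def coerce_to_schema(value, schema):
    """Convert a Python value into the shape the output schema asks for."""
    if value is None:
        return None
    if isinstance(schema, tuple) and schema[0] == "bag":
        # Any container type can back a bag; coerce each element to the
        # bag's inner tuple schema.
        if not isinstance(value, BAG_LIKE):
            raise TypeError("cannot convert %r to a bag" % (value,))
        return [coerce_to_schema(elem, schema[1]) for elem in value]
    if isinstance(schema, tuple) and schema[0] == "tuple":
        fields = schema[1]
        if isinstance(value, TUPLE_LIKE):
            items = list(value)
        else:
            # Implicit promotion: a bare scalar becomes a one-field tuple.
            items = [value]
        if len(items) != len(fields):
            raise ValueError("expected %d fields, got %d" % (len(fields), len(items)))
        return tuple(coerce_to_schema(v, f) for v, f in zip(items, fields))
    # Atomic field: passed through unchanged in this sketch.
    return value
```

For example, `coerce_to_schema(re.split(",", "a,b,c"), ("bag", ("tuple", ["chararray"])))` yields `[('a',), ('b',), ('c',)]`, so the original `strsplittobag` could return the raw list. The sketch deliberately ignores ambiguous cases, such as a one-field tuple whose only field is itself a bag.
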

This allows much easier reuse of existing Python code and avoids the memory
overhead of building intermediate objects just to re-convert their types.
I wrote the code to do this a while back as part of my version of the
Jython script framework; I'll isolate it and attach it here.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
