[jira] [Updated] (PIG-1942) script UDF (jython) should utilize the intended output schema to more directly convert Py objects to Pig objects

2013-02-04 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1942:


Status: Open  (was: Patch Available)

Marking open pending response to Thejas' comments.

> script UDF (jython) should utilize the intended output schema to more 
> directly convert Py objects to Pig objects
> 
>
> Key: PIG-1942
> URL: https://issues.apache.org/jira/browse/PIG-1942
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.9.0, 0.8.0
>Reporter: Woody Anderson
>Assignee: Woody Anderson
>Priority: Minor
>  Labels: python, schema, udf
> Attachments: 1942.patch, 1942_with_junit.patch
>
>
> from https://issues.apache.org/jira/browse/PIG-1824
> {code}
> import re
> # outputSchema is supplied by Pig's Jython UDF runtime
> @outputSchema("y:bag{t:tuple(word:chararray)}")
> def strsplittobag(content, regex):
>     return re.compile(regex).split(content)
> {code}
> does not work because split returns a list of strings. However, the output 
> schema is known, so it would be quite simple to implicitly promote each 
> string element to a single-field tuple.
> Also, a list, array, tuple, set, etc. is equally convertible to a bag, and 
> a list, array, or tuple is equally convertible to a tuple, so by using the 
> schema this conversion can be done much less rigidly.
> This allows much easier reuse of existing Python code and less memory 
> overhead from creating intermediate, re-converted objects.
> I wrote the code to do this a while back as part of my version of the 
> Jython script framework; I'll isolate it and attach it.
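Without schema-driven promotion, the UDF has to build the bag-of-tuples shape itself by wrapping each string in a one-element Python tuple. A minimal sketch of that workaround follows. It assumes, as the description implies, that a list of Python tuples is accepted where a bag of tuples is declared; the `try`/`except` stub for `outputSchema` is an assumption added only so the snippet also runs outside Pig.

```python
import re

# Inside Pig, outputSchema is supplied by the Jython UDF runtime. The stub
# below is an assumption that only lets this sketch run outside Pig.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def decorate(func):
            func.outputSchema = schema  # record the schema string, nothing more
            return func
        return decorate


@outputSchema("y:bag{t:tuple(word:chararray)}")
def strsplittobag(content, regex):
    # Wrap each string in a one-field tuple so the returned list already has
    # the bag-of-tuples shape declared in the schema.
    return [(word,) for word in re.compile(regex).split(content)]
```

This manual `(word,)` wrapping is exactly the step the proposed schema-aware conversion would make unnecessary.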

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-1942) script UDF (jython) should utilize the intended output schema to more directly convert Py objects to Pig objects

2011-10-06 Thread Alan Gates (Updated) (JIRA)


Alan Gates updated PIG-1942:


Fix Version/s: (was: 0.10)

Unlinking from 0.10 as we don't seem to have agreement on how to proceed yet.

> script UDF (jython) should utilize the intended output schema to more 
> directly convert Py objects to Pig objects
> 
>
> Key: PIG-1942
> URL: https://issues.apache.org/jira/browse/PIG-1942
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.8.0, 0.9.0
>Reporter: Woody Anderson
>Assignee: Woody Anderson
>Priority: Minor
>  Labels: python, schema, udf
> Attachments: 1942.patch, 1942_with_junit.patch
>
>





[jira] [Updated] (PIG-1942) script UDF (jython) should utilize the intended output schema to more directly convert Py objects to Pig objects

2011-07-15 Thread Alan Gates (JIRA)


Alan Gates updated PIG-1942:


Status: Patch Available  (was: Open)

Marking as Submit Patch so that this can be reviewed and committed.

> script UDF (jython) should utilize the intended output schema to more 
> directly convert Py objects to Pig objects
> 
>
> Key: PIG-1942
> URL: https://issues.apache.org/jira/browse/PIG-1942
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.8.0, 0.9.0
>Reporter: Woody Anderson
>Assignee: Woody Anderson
>Priority: Minor
>  Labels: python, schema, udf
> Fix For: 0.10
>
> Attachments: 1942.patch, 1942_with_junit.patch
>
>





[jira] [Updated] (PIG-1942) script UDF (jython) should utilize the intended output schema to more directly convert Py objects to Pig objects

2011-05-03 Thread Woody Anderson (JIRA)


Woody Anderson updated PIG-1942:


Attachment: 1942_with_junit.patch

I forgot to svn add my unit test, which contains a lot of useful tests and 
comments.

It is included in this patch. It has a timing loop at the end that you can 
enable by adding an annotation or by running it directly in Eclipse, to 
show the performance difference between the methods.

> script UDF (jython) should utilize the intended output schema to more 
> directly convert Py objects to Pig objects
> 
>
> Key: PIG-1942
> URL: https://issues.apache.org/jira/browse/PIG-1942
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.8.0, 0.9.0
>Reporter: Woody Anderson
>Priority: Minor
>  Labels: python, schema, udf
> Fix For: 0.10
>
> Attachments: 1942.patch, 1942_with_junit.patch
>
>



[jira] [Updated] (PIG-1942) script UDF (jython) should utilize the intended output schema to more directly convert Py objects to Pig objects

2011-05-03 Thread Woody Anderson (JIRA)


Woody Anderson updated PIG-1942:


Attachment: 1942.patch

I wanted to get this started, as this is a bit of a change.

It seems that people often misuse the outputSchema annotation so that the 
output does not match the specified schema. At least one unit test did 
this, and a few users in the wild may have the same issue.

At any rate, this patch includes code in JythonUtils that coerces Jython 
object-model output into the schema that the function is annotated with.

It is faster than the existing code and has considerably more functionality: 
it can convert arrays and many more types than before. It also makes it much 
easier and faster to return [1,2,3] and have it converted to a bag, rather 
than having to build [(1,), (2,), (3,)] in Jython.
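The coercion described above can be illustrated with a minimal sketch in plain Python. It is not the JythonUtils code from the patch: the nested-tuple schema notation and the `coerce` helper are hypothetical, Pig bags are modeled as Python lists and Pig tuples as Python tuples, and only a few scalar types are covered. It shows the two rules from the issue description: any list, array, tuple, or set can become a bag, and a bare scalar is promoted to a one-field tuple when the schema calls for one.

```python
from array import array

# Hypothetical, simplified schema notation (an assumption; not Pig's Schema API):
#   ("bag", tuple_schema)      a bag whose elements must match tuple_schema
#   ("tuple", [field, ...])    a tuple with one schema per field
#   "chararray", "int", "long", "double"   scalar field types
SCALARS = {"chararray": str, "int": int, "long": int, "double": float}

# Python containers accepted as a bag and as a tuple, respectively.
BAG_SOURCES = (list, tuple, set, frozenset, array)
TUPLE_SOURCES = (list, tuple, array)


def coerce(value, schema):
    """Convert a plain Python value to match schema.

    Bags are modeled as Python lists and Pig tuples as Python tuples.
    """
    if value is None:
        return None  # nulls pass through unchanged
    if isinstance(schema, str):
        return SCALARS[schema](value)
    kind, inner = schema
    if kind == "bag":
        if not isinstance(value, BAG_SOURCES):
            raise TypeError("cannot convert %r to a bag" % (value,))
        return [coerce(item, inner) for item in value]
    if kind == "tuple":
        if isinstance(value, TUPLE_SOURCES):
            if len(value) != len(inner):
                raise ValueError("expected %d fields, got %d" % (len(inner), len(value)))
            return tuple(coerce(v, f) for v, f in zip(value, inner))
        if len(inner) == 1:
            # Promote a bare scalar (e.g. 1 or "word") to a one-field tuple.
            return (coerce(value, inner[0]),)
        raise TypeError("cannot convert %r to a %d-field tuple" % (value, len(inner)))
    raise ValueError("unknown schema kind %r" % (kind,))
```

For example, `coerce([1, 2, 3], ("bag", ("tuple", ["int"])))` returns `[(1,), (2,), (3,)]`, so the UDF can return the flat list unchanged. Driving the conversion from the declared schema, rather than from the runtime Python type alone, is what lets a bare string or number be wrapped without the UDF author writing `(x,)` by hand.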

Given that this changes the behavior of UDFs that use @outputSchema (by 
enforcing schema adherence), we may want to use a different annotation and 
keep @outputSchema in its previous form, which does not actually convert the 
output to the schema.


> script UDF (jython) should utilize the intended output schema to more 
> directly convert Py objects to Pig objects
> 
>
> Key: PIG-1942
> URL: https://issues.apache.org/jira/browse/PIG-1942
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.8.0, 0.9.0
>Reporter: Woody Anderson
>Priority: Minor
>  Labels: python, schema, udf
> Fix For: 0.10
>
> Attachments: 1942.patch
>
>



[jira] [Updated] (PIG-1942) script UDF (jython) should utilize the intended output schema to more directly convert Py objects to Pig objects

2011-04-06 Thread Olga Natkovich (JIRA)


Olga Natkovich updated PIG-1942:


Fix Version/s: 0.10  (was: 0.9.0)

> script UDF (jython) should utilize the intended output schema to more 
> directly convert Py objects to Pig objects
> 
>
> Key: PIG-1942
> URL: https://issues.apache.org/jira/browse/PIG-1942
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.8.0, 0.9.0
>Reporter: Woody Anderson
>Priority: Minor
>  Labels: python, schema, udf
> Fix For: 0.10
>
>
