[jira] [Created] (PIG-2160) recent regression wrt FrontendException: ERROR 1000

2011-07-13 Thread Woody Anderson (JIRA)
recent regression wrt FrontendException: ERROR 1000
---

 Key: PIG-2160
 URL: https://issues.apache.org/jira/browse/PIG-2160
 Project: Pig
  Issue Type: Bug
Reporter: Woody Anderson


i recently svn up'd http://svn.apache.org/repos/asf/pig/branches/branch-0.9,
rebuilt, and tested the Antispam pig loader against the new 0.9.1 jar to ensure
everything is fine.
it's not. this was working previously, when the build version for the branch
was 0.9.0; it is currently not working at Revision: 1145388.

i'm a bit confused, so hopefully someone can help me out:

contents of ./target/surefire-reports/TEST-com.XTest.xml:
..
 error message="Error during parsing. <line 1, column 113>  mismatched
input '(' expecting SEMI_COLON"
type="org.apache.pig.impl.logicalLayer.FrontendException"
org.apache.pig.impl.logicalLayer.FrontendException:
 ERROR 1000: Error during parsing. <line 1, column 113>  mismatched input
'(' expecting SEMI_COLON
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1638)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1583)
at org.apache.pig.PigServer.registerQuery(PigServer.java:583)
at org.apache.pig.PigServer.registerQuery(PigServer.java:596)
at com...XTest.testLoadData(XTest.java:74)
..


that test code method looks like this:

@SuppressWarnings("unchecked")
@Test
public void testLoadData() throws Exception {
    ...
    PigServer pigServer = new PigServer(ExecType.LOCAL);
    pigServer.registerQuery("A = load 'file:" +
        Util.encodeEscape(f.getAbsolutePath()) + "' using com.Storage(" +
        "'a, b, c, d, e, f, g, h, i'" +
        ") as (a:chararray, b:long, c:chararray, " +
        "d:chararray, e:int, f:chararray, g:int, h:int, i:int);");
    ...
}


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (PIG-2116) HashPartitioner is not a safe partitioner for non-prime number of reducers, particularly bad for 2^n, which seems to be a common use

2011-06-09 Thread Woody Anderson (JIRA)
HashPartitioner is not a safe partitioner for non-prime number of reducers, 
particularly bad for 2^n, which seems to be a common use


 Key: PIG-2116
 URL: https://issues.apache.org/jira/browse/PIG-2116
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.1, 0.9.0
Reporter: Woody Anderson


the implementation of hashCode should not be assumed to be good.
in particular, the hashCode of String and List (used by Tuple) are very bad for 
modulus 2^n.

we propose to add an additional perturbation of the int before doing the % 
reducers bucketing.
HashMap.java uses this to prevent the String.hashCode from causing massive 
bucket collisions etc. but that perturbation is targeted explicitly for a 2^n 
number of buckets, which Pig is not doing in general.

we propose possibly using the final mixing step from murmur3.

here is some discussion of this issue for context:

This has some amusing implications: this hash is terrible for
2,4,8,16,31, and 32 reducers, so even in normal situations that's pretty
bad, especially if pig happens to pick 31 reducers because it has
104-106 mappers * 0.3.

31 is congruent to -1 mod 2^k for all 2 <= k <= 5, so in that case the hash is
effectively:

t[0]*(-1)^(n-1) + t[1]*(-1)^(n-2) + ... + t[n-2]*(-1) + t[n-1]

= (for odd n) t[0] - t[1] + t[2] - t[3] + t[4] + ...

So for example the string "mississippim" hashes to 0 (mod 32), as
every even input character is cancelled out by an equal odd input
elsewhere.

H = 0
for c in "mississippim":
  H = H*31 + ord(c)
  print("%c: H=%d (mod 32)" % (c, H%32))

m: H=13 (mod 32)
i: H=28 (mod 32)
s: H=23 (mod 32)
s: H=28 (mod 32)
i: H=13 (mod 32)
s: H=6 (mod 32)
s: H=13 (mod 32)
i: H=28 (mod 32)
p: H=20 (mod 32)
p: H=28 (mod 32)
i: H=13 (mod 32)
m: H=0 (mod 32)

Similarly with exactly 31 reducers, the hash function cancels out
entirely (31 is 0 mod 31, so everything but the last item is multiplied
by 0^i) and the result is simply the value of the last item.

A simple fix is to add a post-hash mixing step that nontrivially affects
the bits in the state over all other bits in the hash output, ideally
with probability 1/2 for all bits.  That way the modulo doesn't
distribute across the whole function back to the input, and the internal
state of the hash above whatever modulus has some effect.

H = 0
for c in "mississippim":
  H = H*31 + ord(c)
  # these & 0xffffffff ops are to simulate unsigned 32-bit math in python
  H = H & 0xffffffff
  Hout = (H + (H << 3)) & 0xffffffff
  Hout = Hout ^ (Hout >> 11)
  Hout = (Hout + (Hout << 15)) & 0xffffffff
  print("%c: H=%08x === %d (mod 32)" % (c, Hout, Hout%32))

m: H=01ea83d5 === 21 (mod 32)
i: H=3d39fa73 === 19 (mod 32)
s: H=6c78d8d4 === 20 (mod 32)
s: H=3c76f555 === 21 (mod 32)
i: H=0abb25ff === 31 (mod 32)
s: H=40df81c9 === 9 (mod 32)
s: H=cfc8a427 === 7 (mod 32)
i: H=cea62c2b === 11 (mod 32)
p: H=4594d493 === 19 (mod 32)
p: H=f14b432a === 10 (mod 32)
i: H=169be0b0 === 16 (mod 32)
m: H=7d57b59c === 28 (mod 32)

The mixing step only needs to be done once at the end.  The one I
inserted was stolen from Bob Jenkins' hash site, which is required
reading for anyone who decides to implement their own hashing.

Or you could use a real (good, fast, tested) hash function like murmur3.

-Andy
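
As a rough sketch of the murmur3 suggestion above: the block below applies the final mixing step of 32-bit MurmurHash3 to the 31-based string hash before bucketing. The function names are made up for illustration, and the bucketing rule (clear the sign bit, then take the modulus) is an assumption about how the partitioner works, not a copy of Pig's code.

```python
MASK32 = 0xFFFFFFFF

def java_string_hash(s):
    # Same 31-based polynomial as the loops above, truncated to 32 bits.
    h = 0
    for ch in s:
        h = (h * 31 + ord(ch)) & MASK32
    return h

def fmix32(h):
    # Final avalanche step of 32-bit MurmurHash3.
    h &= MASK32
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h

def bucket(h, reducers):
    # Assumption: the partitioner clears the sign bit and takes the modulus.
    return (h & 0x7FFFFFFF) % reducers

h = java_string_hash("mississippim")
print("plain bucket: %d   mixed bucket: %d" % (bucket(h, 32), bucket(fmix32(h), 32)))
```

The plain bucket is 0, matching the first trace above. Each step of the mix is invertible, so distinct hash values stay distinct after mixing; only the way they fall into buckets changes.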


On Thu, Jun 02, 2011 at 03:37:56PM -0700, Woody Anderson wrote:
 This caught me off guard the other day, so i figured i'd pass it along:
 
 the hashCode implementation of Tuple and String have very specific expansions 
 which do not provide a lot of hashCode variance mod 2^k when the elements are 
 all equal.
 
 string:
  t[0]*31^(n-1) + t[1]*31^(n-2) + ... + t[n-1]
 tuple:
 ..(((31 + t[0])*31 + t[1])*31 + t[2])*31 + t[3]..
 
 this expansion modulo powers of 2 is degenerate if t[i] are all equal.
 eg. you group by (n0, n1) to do some work, and there are an unusually high 
 number of tuples where n0 == n1: the value of n0/n1 makes no difference, the 
 hash will equal 1 mod 16.
 the same goes if you're grouping by strings, and have a lot of 'a', 'aa', 
 'aaa', 'b', 'bb', 'bbb', etc. type data.
 this results in all the data ending up in a single reducer/part file, which 
 is either a waste or going to kill your job.
 so, if you use 2^k reducers then that's a terrible group-by. and it's not 
 going to be good (in general) for any non-prime.
 
 under 'normal' circumstances you probably won't notice this being a factor. I 
 didn't notice until i used string.hashCode as part of a group-by to both 
 group by my string and produce a semi-randomized output ordering (sherpa 
 requirement); this completely blew up when simply grouping by the string 
 hadn't.
 
 so, if you have highly varied data elements, this is less of an issue, 
 though a prime will usually generalize better, and you won't suddenly wonder 
 about the bad dispersal you're getting.
 -w
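
A small sketch of the degenerate case described in the message above. It uses the two expansions exactly as quoted there (it is not taken from Pig's Tuple source) and assumes integer fields hash to their own value, as java.lang.Integer does.

```python
MASK32 = 0xFFFFFFFF

def list_hash(items):
    # Expansion quoted above: (((31 + t[0])*31 + t[1])*31 + ...
    h = 1
    for x in items:
        h = (31 * h + x) & MASK32
    return h

def string_hash(s):
    # t[0]*31^(n-1) + t[1]*31^(n-2) + ... + t[n-1]
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & MASK32
    return h

# Group keys (n, n) with 16 reducers: 961 + 32*n is always 1 mod 16.
pair_buckets = set(list_hash([n, n]) % 16 for n in range(1000))

# Strings 'a', 'aa', 'aaa', ...: 31 is -1 mod 16, so the terms cancel in pairs.
string_buckets = set(string_hash("a" * n) % 16 for n in range(1, 50))

print(sorted(pair_buckets), sorted(string_buckets))  # [1] [0, 1]
```

All 1000 pairs land in bucket 1, and the repeated-letter strings only ever reach two of the 16 buckets.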


[jira] [Assigned] (PIG-2098) jython - problem with single item tuple in bag

2011-05-26 Thread Woody Anderson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Woody Anderson reassigned PIG-2098:
---

Assignee: Woody Anderson

 jython - problem with single item tuple in bag
 --

 Key: PIG-2098
 URL: https://issues.apache.org/jira/browse/PIG-2098
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.1, 0.9.0
Reporter: Vivek Padmanabhan
Assignee: Woody Anderson

 While using a Python udf, if I create a tuple with a single field, Pig 
 execution fails with ClassCastException.
 Caused by: java.io.IOException: Error executing function: 
 org.apache.pig.backend.executionengine.ExecException: ERROR 0: Cannot convert 
 jython type to pig datatype java.lang.ClassCastException: java.lang.String 
 cannot be cast to org.apache.pig.data.Tuple
   at 
 org.apache.pig.scripting.jython.JythonFunction.exec(JythonFunction.java:111)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:245)
 An example to reproduce the issue:
 Pig Script
 {code}
 register 'mapkeys.py' using jython as mapkeys;
 A = load 'mapkeys.data' using PigStorage() as ( aMap: map[] );
 C = foreach A generate mapkeys.keys(aMap);
 dump C;
 {code}
 mapkeys.py
 {code}
 @outputSchema("keys:bag{t:tuple(key:chararray)}")
 def keys(map):
   print "mapkeys.py:keys:map:", map
   outBag = []
   for key in map.iterkeys():
     t = (key) ## doesn't work, causes Pig to crash
     #t = (key,) ## adding empty value works :-/
     outBag.append(t)
   print "mapkeys.py:keys:outBag:", outBag
   return outBag
 {code}
 Input data 'mapkeys.data'
 [name#John,phone#5551212]
 In the udf, t = (key) is not a tuple, so the item inside the bag is treated as 
 a string instead of a tuple, which causes the class cast exception.
 If I provide an additional comma, t = (key,) , then the script goes through 
 fine.
 From the code, what I can see is that for t = (key,) , pythonToPig(..) receives 
 the pyObject as  [(u'name',), (u'phone',)] from the PyFunction call.
 But for t = (key) the return from the PyFunction call is [u'name', u'phone'].



[jira] [Commented] (PIG-2098) jython - problem with single item tuple in bag

2011-05-26 Thread Woody Anderson (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039803#comment-13039803
 ] 

Woody Anderson commented on PIG-2098:
-

to be clear on the parens issue, Nicolas Torzec cleared that up:

 In Python, a tuple is recognized by the commas that separate its elements, not 
by its surrounding parentheses, which are just used for grouping expressions...

 That’s why both “t = (key, )” and “t = key, ” work, but not “t = (key)”. 

 Nicolas.
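
The point about commas can be checked in plain Python, outside Pig. This is a minimal sketch with illustrative variable names:

```python
key = u"name"

grouped = (key)     # parentheses only group: this is still a string
one_tuple = (key,)  # the trailing comma makes a one-element tuple
bare_tuple = key,   # the parentheses are optional

print(type(grouped).__name__, type(one_tuple).__name__, type(bare_tuple).__name__)
```

Only the last two are tuples, which is consistent with the bag of plain strings that the udf returned for t = (key).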



[jira] [Resolved] (PIG-2098) jython - problem with single item tuple in bag

2011-05-26 Thread Woody Anderson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Woody Anderson resolved PIG-2098.
-

  Resolution: Duplicate
Release Note: dupe of PIG-1942

dupe of PIG-1942



[jira] [Created] (PIG-2093) add comparison order to TOP udf to allow for optional sort order asc/desc

2011-05-24 Thread Woody Anderson (JIRA)
add comparison order to TOP udf to allow for optional sort order asc/desc
-

 Key: PIG-2093
 URL: https://issues.apache.org/jira/browse/PIG-2093
 Project: Pig
  Issue Type: Improvement
Reporter: Woody Anderson
Assignee: Woody Anderson
Priority: Minor


easy enough to allow the comparison used with the priority queue to be asc/desc 
with a simple boolean input to the UDF
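
A minimal sketch of the idea, in Python rather than the UDF's Java. The function name, the argument order, and the `ascending` flag are hypothetical and are not the actual patch; the standard heapq module stands in for the priority queue.

```python
import heapq

def top(n, column, rows, ascending=False):
    # Keep the n best rows ranked on rows[i][column]; the flag flips the order.
    pick = heapq.nsmallest if ascending else heapq.nlargest
    return pick(n, rows, key=lambda row: row[column])

rows = [("a", 3), ("b", 1), ("c", 2)]
print(top(2, 1, rows))                  # [('a', 3), ('c', 2)]
print(top(2, 1, rows, ascending=True))  # [('b', 1), ('c', 2)]
```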



[jira] [Updated] (PIG-1824) Support import modules in Jython UDF

2011-05-20 Thread Woody Anderson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Woody Anderson updated PIG-1824:


Release Note: 
module import state is determined before and after user code is executed. The 
resolved modules are inspected and added to the pigContext, then they are added 
to the job jar.

this patch addresses the following import modes:
- import re, which will (if configured) find re on the filesystem in the jython 
install root
- import foo (which can import bar), this works now provided bar is resolvable 
via JYTHON_HOME, JYTHONPATH, curdir, etc.
- from pkg import *, which works when the cachedir is writable
- import non.jvm.class, which works when the cachedir is writable
- the directly imported module may use schema decorators, but recursively 
imported modules cannot until PIG-1943 is addressed
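
The before/after snapshot described in the release note can be sketched in plain Python as below. This only illustrates the idea and is not the patch's code: the helper name is made up, and the real patch adds the resolved files to the pigContext and job jar, which is not shown here.

```python
import sys

def modules_imported_by(source):
    # Snapshot the loaded modules, run the user code, then diff the two sets.
    before = set(sys.modules)
    exec(compile(source, "<udf>", "exec"), {})
    found = {}
    for name in sorted(set(sys.modules) - before):
        path = getattr(sys.modules.get(name), "__file__", None)
        if path:  # built-in modules have no file to ship
            found[name] = path
    return found

# colorsys is used only because it is unlikely to be loaded already.
shipped = modules_imported_by("import colorsys")
for name, path in shipped.items():
    print(name, path)
```

A module that was already loaded before the snapshot does not show up in the diff, so a second call with the same source returns an empty result.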


  was:
module import state is determined before and after user code is executed. The 
resolved modules are inspected and added to the pigContext, then they are added 
to the job jar.

this patch addresses the following import modes:
- import re, which will (if configured) find re on the filesystem in the jython 
install root
- import foo (which can import bar), this works now provided bar is resolvable 
JYTHON_HOME, JYTHONPATH, curdir, etc.
- from pkg import *, which works when the cachedir is writable
- import non.jvm.class, which works when the cachedir is writable


 Support import modules in Jython UDF
 

 Key: PIG-1824
 URL: https://issues.apache.org/jira/browse/PIG-1824
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.8.0, 0.9.0
Reporter: Richard Ding
Assignee: Woody Anderson
 Fix For: 0.10

 Attachments: 1824.patch, 1824_final.patch, 1824a.patch, 1824b.patch, 
 1824c.patch, 1824d.patch, 1824x.patch, 
 TEST-org.apache.pig.test.TestGrunt.txt, 
 TEST-org.apache.pig.test.TestScriptLanguage.txt, 
 TEST-org.apache.pig.test.TestScriptUDF.txt


 Currently, Jython UDF script doesn't support Jython import statement as in 
 the following example:
 {code}
 #!/usr/bin/python
 import re
 @outputSchema("word:chararray")
 def resplit(content, regex, index):
     return re.compile(regex).split(content)[index]
 {code}
 Can Pig automatically locate the Jython module file and ship it to the 
 backend? Or should we add a ship clause to let user explicitly specify the 
 module to ship? 



[jira] [Created] (PIG-2086) grunt parser fails for: load .. as \n (b:bag{});

2011-05-20 Thread Woody Anderson (JIRA)
grunt parser fails for: load .. as \n (b:bag{}); 
-

 Key: PIG-2086
 URL: https://issues.apache.org/jira/browse/PIG-2086
 Project: Pig
  Issue Type: Bug
  Components: grunt
Affects Versions: 0.10
 Environment: mac 10.5.8
Reporter: Woody Anderson


this snippet fails:
{code}
IN4 = load '$in' using
com.zzz.Storage() as
( inpt:bag{} );
{code}
this works ('as' on the same line as the semi-colon):
{code}
IN4 = load '$in' using
com.zzz.Storage()
as ( inpt:bag{} );
{code}

this is the grunt error:
2011-05-20 20:19:34,934 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
1200: <file loadstore.pig, line 68, column 16>  mismatched input ';' expecting 
RIGHT_PAREN

this only happens in cases where the types of the fields are complex e.g. 
bags/tuples
eg. change the type of _inpt_ to be _chararray_ and it will parse.

this is very strange! and i spent hours debugging my schema writing skills and 
reading QueryParser.g before simply trying as (expr); on the same line.

_all_ of my scripts had been written with the lines split the other way (with 
lots of ctor args and as-clause elements: hence the line breaks), this is not 
an issue if i don't load complicated types, but it fails in this particular 
case.
This is quite unexpected and seems to be undocumented and a bug imho.
i don't know enough about antlr (i was a javacc person) to make sense of why 
this would be an issue for the parser b/c the grammar looks good assuming 
newline is basically whitespace.

though i can't figure out how newlines are treated in the grammar, there does 
not seem to be a newline routine ala 
https://supportweb.cs.bham.ac.uk/documentation/tutorials/docsystem/build/tutorials/antlr/antlr.html

I'm going to assume the grammar author is much more sophisticated than that 
tutorial and knows how to fix this.



[jira] [Commented] (PIG-1824) Support import modules in Jython UDF

2011-05-19 Thread Woody Anderson (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036496#comment-13036496
 ] 

Woody Anderson commented on PIG-1824:
-

cool. can we get this into trunk so i don't have to keep fixing the patches?


 Support import modules in Jython UDF
 

 Key: PIG-1824
 URL: https://issues.apache.org/jira/browse/PIG-1824
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.8.0, 0.9.0
Reporter: Richard Ding
Assignee: Woody Anderson
 Fix For: 0.10

 Attachments: 1824.patch, 1824_final.patch, 1824a.patch, 1824b.patch, 
 1824c.patch, 1824d.patch, 1824x.patch, 
 TEST-org.apache.pig.test.TestGrunt.txt, 
 TEST-org.apache.pig.test.TestScriptLanguage.txt, 
 TEST-org.apache.pig.test.TestScriptUDF.txt




[jira] [Commented] (PIG-1824) Support import modules in Jython UDF

2011-05-17 Thread Woody Anderson (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034932#comment-13034932
 ] 

Woody Anderson commented on PIG-1824:
-

hmm.. i ran each of those tests via:

ant -noclasspath test -Dtestcase=org.apache.pig.test.TestScriptUDF
etc. and they all passed.

is your environment clean?
% printenv | grep YTHON
(should be empty)

is there anything else i should be doing to try to mirror your test framework 
(while not having to run all tests for the 18 hours that that requires)?

 Support import modules in Jython UDF
 

 Key: PIG-1824
 URL: https://issues.apache.org/jira/browse/PIG-1824
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.8.0, 0.9.0
Reporter: Richard Ding
Assignee: Woody Anderson
 Fix For: 0.10

 Attachments: 1824.patch, 1824a.patch, 1824b.patch, 1824c.patch, 
 1824d.patch, 1824x.patch, TEST-org.apache.pig.test.TestGrunt.txt, 
 TEST-org.apache.pig.test.TestScriptLanguage.txt, 
 TEST-org.apache.pig.test.TestScriptUDF.txt




[jira] [Updated] (PIG-1824) Support import modules in Jython UDF

2011-05-17 Thread Woody Anderson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Woody Anderson updated PIG-1824:


Attachment: 1824_final.patch

ok. my bad!

testcase=full.package.path doesn't even run the test, so tho i claimed that the 
tests were passing, it was in fact simply that junit could run.


Here's a new patch:
there was an extra line that i mistakenly didn't delete when creating the 
re-trunked code.

this patch will pass the tests

 Support import modules in Jython UDF
 

 Key: PIG-1824
 URL: https://issues.apache.org/jira/browse/PIG-1824
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.8.0, 0.9.0
Reporter: Richard Ding
Assignee: Woody Anderson
 Fix For: 0.10

 Attachments: 1824.patch, 1824_final.patch, 1824a.patch, 1824b.patch, 
 1824c.patch, 1824d.patch, 1824x.patch, 
 TEST-org.apache.pig.test.TestGrunt.txt, 
 TEST-org.apache.pig.test.TestScriptLanguage.txt, 
 TEST-org.apache.pig.test.TestScriptUDF.txt




[jira] [Updated] (PIG-1824) Support import modules in Jython UDF

2011-05-16 Thread Woody Anderson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Woody Anderson updated PIG-1824:


Attachment: 1824x.patch

patch for trunk

 Support import modules in Jython UDF
 

 Key: PIG-1824
 URL: https://issues.apache.org/jira/browse/PIG-1824
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.8.0, 0.9.0
Reporter: Richard Ding
Assignee: Woody Anderson
 Fix For: 0.10

 Attachments: 1824.patch, 1824a.patch, 1824b.patch, 1824c.patch, 
 1824d.patch, 1824x.patch




[jira] [Updated] (PIG-2051) new LogicalSchema column prune code does not preserve type information for map subfields

2011-05-09 Thread Woody Anderson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Woody Anderson updated PIG-2051:


Attachment: 2051.patch

this patch propagates type information more correctly (though not 
recursively/fully) to the pushProjection call.

Mainly, this means putting type information into map types via subfields.

It doesn't fully descend and provide type information for subfields of 
subfields etc. But the fields that are provided have the correct type 
information rather than DataType.BYTEARRAY.


 new LogicalSchema column prune code does not preserve type information for 
 map subfields
 

 Key: PIG-2051
 URL: https://issues.apache.org/jira/browse/PIG-2051
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.10
Reporter: Woody Anderson
Assignee: Woody Anderson
 Fix For: 0.10

 Attachments: 2051.patch


 current impl of ColumnPruneVisitor.visit ignores field type info and passes 
 type BYTEARRAY for all map fields.
 the correct type is pretty easy to fill in, especially since map field info 
 is only attempted 1 level deep.
 i came across this b/c i utilize the type information in the pushProjection 
 call, which previously carried the 'correct' type information; the change 
 over to LogicalSchema caused a regression.



[jira] [Created] (PIG-2053) PigInputFormat uses class.isAssignableFrom() where instanceof is more appropriate

2011-05-09 Thread Woody Anderson (JIRA)
PigInputFormat uses class.isAssignableFrom() where instanceof is more 
appropriate
-

 Key: PIG-2053
 URL: https://issues.apache.org/jira/browse/PIG-2053
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.10
Reporter: Woody Anderson
Priority: Minor
 Fix For: 0.10
 Attachments: 2053.patch

This is a code style/quality improvement.

isAssignableFrom is appropriate when the class is not known at compile time, 
but assignment needs to be checked.
e.g. foo.getClass().isAssignableFrom(bar.getClass())

but, if the class of foo is known (e.g. X.class), then instanceof is more 
appropriate and readable.
i also made use of de morgan's to simplify the "is combinable" boolean 
statement, which is hard to grok as written.
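
A loose Python analogue of the distinction, for readers who don't have the Java in front of them. isinstance plays the role of instanceof and issubclass the role of Class.isAssignableFrom; the class names are hypothetical and have nothing to do with PigInputFormat's real types.

```python
class Split(object):
    pass

class CombinableSplit(Split):  # hypothetical names, for illustration only
    pass

obj = CombinableSplit()

# Type known up front: the direct check (Java: obj instanceof Split).
direct = isinstance(obj, Split)

# Both types only known at runtime
# (Java: target.getClass().isAssignableFrom(value.getClass())).
def assignable(target, value):
    return issubclass(type(value), type(target))

# De Morgan's law, as used to simplify a negated boolean condition.
def de_morgan_holds(a, b):
    return (not (a or b)) == ((not a) and (not b))

print(direct, assignable(Split(), obj), assignable(obj, Split()))
```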



[jira] [Updated] (PIG-2053) PigInputFormat uses class.isAssignableFrom() where instanceof is more appropriate

2011-05-09 Thread Woody Anderson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Woody Anderson updated PIG-2053:


Attachment: 2053.patch

patch



[jira] [Commented] (PIG-2012) Comments at the begining of the file throws off line numbers in errors

2011-05-09 Thread Woody Anderson (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031001#comment-13031001
 ] 

Woody Anderson commented on PIG-2012:
-

thanks for this one! this has been a major pain for me.

 Comments at the begining of the file throws off line numbers in errors
 --

 Key: PIG-2012
 URL: https://issues.apache.org/jira/browse/PIG-2012
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.9.0
Reporter: Alan Gates
Assignee: Richard Ding
 Fix For: 0.9.0

 Attachments: PIG-2012_1.patch, PIG-2012_2.patch, macro.pig


 The preprocessor does not appear to be handling leading comments properly 
 when calculating line numbers for error messages.  In the attached script, 
 the error is reported to be on line 7.  It is actually on line 10.



[jira] [Updated] (PIG-1824) Support import modules in Jython UDF

2011-05-06 Thread Woody Anderson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Woody Anderson updated PIG-1824:


Attachment: 1824d.patch

patch includes throw new IllegalStateException if the stream is null.

 Support import modules in Jython UDF
 

 Key: PIG-1824
 URL: https://issues.apache.org/jira/browse/PIG-1824
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.8.0, 0.9.0
Reporter: Richard Ding
Assignee: Woody Anderson
 Fix For: 0.10

 Attachments: 1824.patch, 1824a.patch, 1824b.patch, 1824c.patch, 
 1824d.patch




[jira] [Commented] (PIG-1824) Support import modules in Jython UDF

2011-05-06 Thread Woody Anderson (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030116#comment-13030116
 ] 

Woody Anderson commented on PIG-1824:
-

i'm not sure what's really left to keep this out of the next release, given 
we've been going back and forth over issues that don't even affect functionality.
but there are other jython-related bugs in the pipe for 0.10 anyway, so 
perhaps having them all in the same release is a good idea from a feature 
grouping perspective.

 Support import modules in Jython UDF
 

 Key: PIG-1824
 URL: https://issues.apache.org/jira/browse/PIG-1824
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.8.0, 0.9.0
Reporter: Richard Ding
Assignee: Woody Anderson
 Fix For: 0.10

 Attachments: 1824.patch, 1824a.patch, 1824b.patch, 1824c.patch, 
 1824d.patch


 Currently, Jython UDF script doesn't support Jython import statement as in 
 the following example:
 {code}
 #!/usr/bin/python
 import re
 @outputSchema("word:chararray")
 def resplit(content, regex, index):
     return re.compile(regex).split(content)[index]
 {code}
 Can Pig automatically locate the Jython module file and ship it to the 
 backend? Or should we add a ship clause to let user explicitly specify the 
 module to ship? 



[jira] [Updated] (PIG-1942) script UDF (jython) should utilize the intended output schema to more directly convert Py objects to Pig objects

2011-05-03 Thread Woody Anderson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Woody Anderson updated PIG-1942:


Attachment: 1942.patch

I wanted to get this started, as this is a bit of a change.

Often, it seems that people misuse the outputSchema annotation such that the 
output does not match the specified schema. At least, there was a unit test 
that did this, and it's possible that a few users in the wild have this issue 
as well.

At any rate, this patch includes code in JythonUtils that will coerce Jython 
object model output into the schema that the function is annotated with.

It's faster than the existing code and has quite a bit more functionality. It 
can convert arrays and many more types than previously. It also makes it much 
easier and faster to convert [1,2,3] to a bag, rather than having to create 
[(1), (2), (3)] in Jython.

Given that this changes the functionality of UDFs that use @outputSchema (by 
coercing schema adherence), we may want to use a different annotation, and 
allow outputSchema to exist in its previous form, in which it doesn't actually 
convert the output to the schema.
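
To make the [1,2,3] to bag case concrete, here is a minimal sketch of the kind of coercion described above. It is not the code from the attached patch: plain Java lists stand in for Pig's Tuple and DataBag, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical sketch: plain Java lists stand in for Pig's Tuple and DataBag.
final class BagCoercionSketch {

    // Wrap one element as a "tuple": a collection is copied field by field,
    // anything else becomes a single-field tuple.
    static List<Object> toTuple(Object element) {
        if (element instanceof Collection) {
            return new ArrayList<Object>((Collection<?>) element);
        }
        List<Object> tuple = new ArrayList<Object>(1);
        tuple.add(element);
        return tuple;
    }

    // Coerce any collection into a "bag" (a list of tuples), which is what a
    // declared schema of bag{tuple(...)} asks for.
    static List<List<Object>> toBag(Collection<?> items) {
        List<List<Object>> bag = new ArrayList<List<Object>>(items.size());
        for (Object item : items) {
            bag.add(toTuple(item));
        }
        return bag;
    }

    public static void main(String[] args) {
        // prints [[1], [2], [3]]
        System.out.println(toBag(Arrays.asList(1, 2, 3)));
    }
}
```

The point of the sketch is only that the declared schema tells the converter which wrapping to apply, so the script author does not have to build the tuples by hand.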


 script UDF (jython) should utilize the intended output schema to more 
 directly convert Py objects to Pig objects
 

 Key: PIG-1942
 URL: https://issues.apache.org/jira/browse/PIG-1942
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.8.0, 0.9.0
Reporter: Woody Anderson
Priority: Minor
  Labels: python, schema, udf
 Fix For: 0.10

 Attachments: 1942.patch


 from https://issues.apache.org/jira/browse/PIG-1824
 {code}
 import re
 @outputSchema("y:bag{t:tuple(word:chararray)}")
 def strsplittobag(content,regex):
     return re.compile(regex).split(content)
 {code}
 does not work because split returns a list of strings. However, the output 
 schema is known, and it would be quite simple to implicitly promote each 
 string element to a single-field tuple.
 Also, a list/array/tuple/set etc. are all equally convertible to a bag, and 
 list/array/tuple are equally convertible to a Tuple; this conversion can be 
 done in a much less rigid way with the use of the schema.
 This allows much easier re-use of existing Python code and less memory 
 overhead from creating intermediate objects during re-conversion.
 I wrote the code to do this a while back as part of my version of the 
 Jython script framework; I'll isolate that and attach it.



[jira] [Updated] (PIG-1942) script UDF (jython) should utilize the intended output schema to more directly convert Py objects to Pig objects

2011-05-03 Thread Woody Anderson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Woody Anderson updated PIG-1942:


Attachment: 1942_with_junit.patch

I forgot to svn add my unit test, which contains a lot of useful tests and 
comments.

It's included in this patch. It has a timing loop at the end that you can 
enable by adding an annotation, or by running it directly in Eclipse, to 
show the performance difference between the methods.

 script UDF (jython) should utilize the intended output schema to more 
 directly convert Py objects to Pig objects
 

 Key: PIG-1942
 URL: https://issues.apache.org/jira/browse/PIG-1942
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.8.0, 0.9.0
Reporter: Woody Anderson
Priority: Minor
  Labels: python, schema, udf
 Fix For: 0.10

 Attachments: 1942.patch, 1942_with_junit.patch


 from https://issues.apache.org/jira/browse/PIG-1824
 {code}
 import re
 @outputSchema("y:bag{t:tuple(word:chararray)}")
 def strsplittobag(content,regex):
     return re.compile(regex).split(content)
 {code}
 does not work because split returns a list of strings. However, the output 
 schema is known, and it would be quite simple to implicitly promote each 
 string element to a single-field tuple.
 Also, a list/array/tuple/set etc. are all equally convertible to a bag, and 
 list/array/tuple are equally convertible to a Tuple; this conversion can be 
 done in a much less rigid way with the use of the schema.
 This allows much easier re-use of existing Python code and less memory 
 overhead from creating intermediate objects during re-conversion.
 I wrote the code to do this a while back as part of my version of the 
 Jython script framework; I'll isolate that and attach it.



[jira] [Assigned] (PIG-1942) script UDF (jython) should utilize the intended output schema to more directly convert Py objects to Pig objects

2011-05-03 Thread Woody Anderson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Woody Anderson reassigned PIG-1942:
---

Assignee: Woody Anderson

 script UDF (jython) should utilize the intended output schema to more 
 directly convert Py objects to Pig objects
 

 Key: PIG-1942
 URL: https://issues.apache.org/jira/browse/PIG-1942
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.8.0, 0.9.0
Reporter: Woody Anderson
Assignee: Woody Anderson
Priority: Minor
  Labels: python, schema, udf
 Fix For: 0.10

 Attachments: 1942.patch, 1942_with_junit.patch


 from https://issues.apache.org/jira/browse/PIG-1824
 {code}
 import re
 @outputSchema("y:bag{t:tuple(word:chararray)}")
 def strsplittobag(content,regex):
     return re.compile(regex).split(content)
 {code}
 does not work because split returns a list of strings. However, the output 
 schema is known, and it would be quite simple to implicitly promote each 
 string element to a single-field tuple.
 Also, a list/array/tuple/set etc. are all equally convertible to a bag, and 
 list/array/tuple are equally convertible to a Tuple; this conversion can be 
 done in a much less rigid way with the use of the schema.
 This allows much easier re-use of existing Python code and less memory 
 overhead from creating intermediate objects during re-conversion.
 I wrote the code to do this a while back as part of my version of the 
 Jython script framework; I'll isolate that and attach it.



[jira] [Commented] (PIG-1824) Support import modules in Jython UDF

2011-04-25 Thread Woody Anderson (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025007#comment-13025007
 ] 

Woody Anderson commented on PIG-1824:
-

agree:

in re: PYTHON_CACHEDIR: the code behaves as you wish, in that it only deletes 
the dir if it (pig) created it.
sorry for not being clear in the comments about that, but if you read the 
code you'll see it.

if we can't write, i (pig) was creating an alternate directory. It may be 
possible to pre-populate this, and i understand (and had) the desire to have an 
error instead of a new directory, but I was initially experiencing this error:
{code}
*sys-package-mgr*: can't create package cache dir, 
'/grid/0/Releases/pig-0.8.0..1103222002-20110401-000/share/pig-0.8.0..1103222002/lib/cachedir/packages'
{code}

which is why i added the 'is writable' check, but after reviewing (per your 
comment), it seems that cachedir is not set on the grid (at least at the point 
when the static block runs). If left as null, it seems to default to some grid 
location that is not writable (and thus doesn't work), but if i set it to a 
writable tmp first, it works.
so.. i can safely agree that raising an error if the dir isn't writable is 
both desirable and workable.

as for the getScriptAsStream():
i followed the existing code convention on that one, though i didn't like it 
either.
again, if you read down a bit you'll see that the impl of getScriptAsStream() 
is:
{code}
..
if (is == null) {
    throw new IllegalStateException(
        "Could not initialize interpreter (from file system or classpath) with "
            + scriptPath);
}
return is;
{code}

so, the null check is superfluous but does quiet the not-null check warnings.
i didn't add an additional throw statement in this case b/c, if the impl of 
getScriptAsStream somehow changed and could return null, my code wouldn't 
add any _new_ errors that the existing code didn't already exhibit.

anyway, i'll upload a new patch to address the writable issue. if you think it's 
a big deal we can add an 'else throw' statement around getScriptAsStream

 Support import modules in Jython UDF
 

 Key: PIG-1824
 URL: https://issues.apache.org/jira/browse/PIG-1824
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.8.0, 0.9.0
Reporter: Richard Ding
Assignee: Woody Anderson
 Fix For: 0.9.0

 Attachments: 1824.patch, 1824a.patch, 1824b.patch


 Currently, Jython UDF script doesn't support Jython import statement as in 
 the following example:
 {code}
 #!/usr/bin/python
 import re
 @outputSchema("word:chararray")
 def resplit(content, regex, index):
     return re.compile(regex).split(content)[index]
 {code}
 Can Pig automatically locate the Jython module file and ship it to the 
 backend? Or should we add a ship clause to let user explicitly specify the 
 module to ship? 



[jira] [Updated] (PIG-1824) Support import modules in Jython UDF

2011-04-25 Thread Woody Anderson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Woody Anderson updated PIG-1824:


Attachment: 1824c.patch

 Support import modules in Jython UDF
 

 Key: PIG-1824
 URL: https://issues.apache.org/jira/browse/PIG-1824
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.8.0, 0.9.0
Reporter: Richard Ding
Assignee: Woody Anderson
 Fix For: 0.9.0

 Attachments: 1824.patch, 1824a.patch, 1824b.patch, 1824c.patch


 Currently, Jython UDF script doesn't support Jython import statement as in 
 the following example:
 {code}
 #!/usr/bin/python
 import re
 @outputSchema("word:chararray")
 def resplit(content, regex, index):
     return re.compile(regex).split(content)[index]
 {code}
 Can Pig automatically locate the Jython module file and ship it to the 
 backend? Or should we add a ship clause to let user explicitly specify the 
 module to ship? 



[jira] [Commented] (PIG-1973) UDFContext.getUDFContext has a thread race condition around it's ThreadLocal

2011-04-21 Thread Woody Anderson (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13022939#comment-13022939
 ] 

Woody Anderson commented on PIG-1973:
-

ok. i agree. it's not a bug.
though i still find the code misleading, in that it doesn't utilize the easy, 
concise form, and at least to me it looks wrong on first and second inspection.

 UDFContext.getUDFContext has a thread race condition around it's ThreadLocal
 

 Key: PIG-1973
 URL: https://issues.apache.org/jira/browse/PIG-1973
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0, 0.9.0
Reporter: Woody Anderson
Assignee: Woody Anderson
Priority: Minor
 Fix For: 0.9.0

 Attachments: 1973.patch


 this probably isn't manifesting anywhere, but it's an incorrect use of the 
 ThreadLocal pattern.



[jira] [Commented] (PIG-1973) UDFContext.getUDFContext has a thread race condition around it's ThreadLocal

2011-04-20 Thread Woody Anderson (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13022584#comment-13022584
 ] 

Woody Anderson commented on PIG-1973:
-

incorrect.

initialValue is invoked when get() is first called. However, in the old code, 
initialValue returns null b/c it was not overridden.
thus, if 2 threads call getUDFContext() at the same time they may get 2 
different UDFContext objects, b/c the method does an unprotected 
comparison/set check:

{code}
public static UDFContext getUDFContext() {
    if (tss.get() == null) {
        UDFContext ctx = new UDFContext();
        tss.set(ctx);
    }
    return tss.get();
}
{code}

this is a CLASSIC race condition.


 UDFContext.getUDFContext has a thread race condition around it's ThreadLocal
 

 Key: PIG-1973
 URL: https://issues.apache.org/jira/browse/PIG-1973
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0, 0.9.0
Reporter: Woody Anderson
Assignee: Woody Anderson
Priority: Minor
 Fix For: 0.9.0

 Attachments: 1973.patch


 this probably isn't manifesting anywhere, but it's an incorrect use of the 
 ThreadLocal pattern.



[jira] [Updated] (PIG-2001) DefaultTuple(List) constructor is inefficient, causes List.size() System.arraycopy() calls (though they are 0 byte copies), DefaultTuple(int) constructor is a bit misleadin

2011-04-19 Thread Woody Anderson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Woody Anderson updated PIG-2001:


Attachment: 2001.patch

 DefaultTuple(List) constructor is inefficient, causes List.size() 
 System.arraycopy() calls (though they are 0 byte copies), DefaultTuple(int) 
 constructor is a bit misleading wrt time complexity
 -

 Key: PIG-2001
 URL: https://issues.apache.org/jira/browse/PIG-2001
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.8.0, 0.9.0
Reporter: Woody Anderson
Assignee: Woody Anderson
Priority: Minor
 Fix For: 0.10

 Attachments: 2001.patch


 I was perusing the Tuple created by the default Tuple factory, when I wanted 
 it to copy my input list.
 Here I noticed that the List constructor uses List.add(index, element), which 
 differs from set(index, element) in that it shifts the right side of the 
 list; with ArrayList this causes a no-op System.arraycopy call, which is 
 completely unnecessary.
 Even though the array copy call isn't actually copying any bytes, it's still 
 unnecessary, and can be easily avoided.
 It's also N iterate/add function calls, which can be avoided by using:
 {code}
 new ArrayList<Object>(c);
 {code}
 which is more efficient. For arbitrary collection inputs this is at worst N 
 iterator calls (same as existing code); when constructing from ArrayLists or 
 Arrays.asList, the construction is accomplished via a single System.arraycopy 
 call, which is an actual improvement.
 There do not seem to be DefaultTuple tests.
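
For reference, the two copy strategies compared above can be sketched side by side. This is an illustration only, not the DefaultTuple source: the class and method names are hypothetical, and only the ArrayList calls are real API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the two ways to copy an input list into a tuple's
// backing list; not the actual DefaultTuple code.
final class TupleCopySketch {

    // Element-by-element copy with add(index, element), the pattern the
    // issue describes: one call per field.
    static List<Object> copyWithIndexedAdd(List<Object> source) {
        List<Object> fields = new ArrayList<Object>(source.size());
        int i = 0;
        for (Object o : source) {
            fields.add(i++, o);
        }
        return fields;
    }

    // Bulk copy through the ArrayList(Collection) constructor.
    static List<Object> copyWithConstructor(List<Object> source) {
        return new ArrayList<Object>(source);
    }

    public static void main(String[] args) {
        List<Object> input = Arrays.<Object>asList("a", 1L, 2.5);
        // Both strategies produce equal lists; only the cost differs.
        System.out.println(copyWithIndexedAdd(input).equals(copyWithConstructor(input)));
    }
}
```

Both methods return an independent copy with the same contents, so the swap does not change behavior for callers.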



[jira] [Commented] (PIG-1824) Support import modules in Jython UDF

2011-04-08 Thread Woody Anderson (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017549#comment-13017549
 ] 

Woody Anderson commented on PIG-1824:
-

1. i could re-work the initialization into the static block of the inner class 
Interpreter, it simply needs to be done before the interpreter is allocated. 
I'm not sure what you mean by not wanting a cache dir when using python udfs or 
control flow? can you clarify?
2. separate the logic out of init into what? I think it should, in general, be 
the contract of any script environment to handle resource inclusion (if 
possible). Are you imagining some scenario where init(file,..) would not 
actually parse/internalize the code inside init()? I don't much care where the 
code is parsed and added to a ScriptEngine, but when it is, it should handle 
all other evaluated resources that are necessary to succeed. In the current 
API, a user-provided script file is given to init(), so that's where it must do 
this. There is really no other place to evaluate resource inclusions, and i 
think i might not be understanding your suggestion. As for other ScriptEngines 
that may not be able to support this concept, are you suggesting a 
supportsFeature() method that we use to test various SEs to determine if 
they can support this (or other) features? I'm not sure what we'd do with this 
knowledge.

 Support import modules in Jython UDF
 

 Key: PIG-1824
 URL: https://issues.apache.org/jira/browse/PIG-1824
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.8.0, 0.9.0
Reporter: Richard Ding
Assignee: Woody Anderson
 Fix For: 0.9.0

 Attachments: 1824.patch, 1824a.patch


 Currently, Jython UDF script doesn't support Jython import statement as in 
 the following example:
 {code}
 #!/usr/bin/python
 import re
 @outputSchema("word:chararray")
 def resplit(content, regex, index):
     return re.compile(regex).split(content)[index]
 {code}
 Can Pig automatically locate the Jython module file and ship it to the 
 backend? Or should we add a ship clause to let user explicitly specify the 
 module to ship? 



[jira] [Created] (PIG-1985) Utils.getSchemaFromString does not use the new parser, and thus fails to parse valid schema

2011-04-08 Thread Woody Anderson (JIRA)
Utils.getSchemaFromString does not use the new parser, and thus fails to parse 
valid schema
---

 Key: PIG-1985
 URL: https://issues.apache.org/jira/browse/PIG-1985
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.9.0
Reporter: Woody Anderson
 Fix For: 0.9.0


I've been told this is because Utils.getSchemaFromString does not use the new 
parser to parse the schema, so we should update the impl to use the new parser:

{code}
Utils.getSchemaFromString("f: map[]")
{code}
results in: (org.apache.pig.impl.logicalLayer.schema.Schema) {f: map[]}

{code}
Utils.getSchemaFromString("f: map[int]")
{code}
results in: An exception occurred: 
org.apache.pig.impl.logicalLayer.parser.ParseException
..
org.apache.pig.impl.logicalLayer.parser.ParseException: Encountered  map 
map  at line 1, column 4.
Was expecting one of:
int ...
long ...
float ...
double ...
chararray ...
bytearray ...
int ...
long ...
float ...
double ...
chararray ...
bytearray ...  



[jira] [Commented] (PIG-1985) Utils.getSchemaFromString does not use the new parser, and thus fails to parse valid schema

2011-04-08 Thread Woody Anderson (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017727#comment-13017727
 ] 

Woody Anderson commented on PIG-1985:
-

this is a bug, why are we targeting for .10?

 Utils.getSchemaFromString does not use the new parser, and thus fails to 
 parse valid schema
 ---

 Key: PIG-1985
 URL: https://issues.apache.org/jira/browse/PIG-1985
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.9.0
Reporter: Woody Anderson
 Fix For: 0.10


 I've been told this is because Utils.getSchemaFromString does not use the new 
 parser to parse the schema, so we should update the impl to use the new 
 parser:
 {code}
 Utils.getSchemaFromString("f: map[]")
 {code}
 results in: (org.apache.pig.impl.logicalLayer.schema.Schema) {f: map[]}
 {code}
 Utils.getSchemaFromString("f: map[int]")
 {code}
 results in: An exception occurred: 
 org.apache.pig.impl.logicalLayer.parser.ParseException
 ..
 org.apache.pig.impl.logicalLayer.parser.ParseException: Encountered  map 
 map  at line 1, column 4.
 Was expecting one of:
 int ...
 long ...
 float ...
 double ...
 chararray ...
 bytearray ...
 int ...
 long ...
 float ...
 double ...
 chararray ...
 bytearray ...



[jira] [Commented] (PIG-1824) Support import modules in Jython UDF

2011-04-08 Thread Woody Anderson (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017789#comment-13017789
 ] 

Woody Anderson commented on PIG-1824:
-

ok. i understand your thoughts on static, and mostly i share them, but the 
PythonInterpreter is a static member of the Interpreter class, and the code i 
wrote must run BEFORE that interpreter is constructed.

Interpreter is a private inner class, so it cannot be caused to load before 
normal use patterns. So, moving the static block into the static block for 
Interpreter addresses your concerns.

import will not cause the static block to be executed btw, it's the first 
executed reference to the class. However, i take the point that some code could 
have been:
{code}
Class<?> c = JythonScriptEngine.class;
{code}
or something like that to cause the class to be loaded. Still, as i said: 
the Interpreter static block addresses this, and the ctor is out b/c of the 
static nature of Interpreter.interpreter.
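
The ordering argument can be illustrated with a small stand-alone sketch. The names here are hypothetical stand-ins, not the JythonScriptEngine source; the sketch only shows that setup code placed in the nested class's static block runs on first use of that class, ahead of the static member it guards.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a nested holder class whose static block performs
// setup before its static member is created.
final class StaticInitSketch {
    static final List<String> EVENTS = new ArrayList<String>();

    // Stand-in for the private inner Interpreter class discussed above.
    private static final class Interpreter {
        static final Object interpreter;
        static {
            EVENTS.add("setup");               // e.g. configure the cache dir
            interpreter = new Object();        // stand-in for new PythonInterpreter()
            EVENTS.add("interpreter created");
        }
    }

    // First call triggers Interpreter's static block; later calls do not.
    static Object interpreter() {
        return Interpreter.interpreter;
    }

    public static void main(String[] args) {
        System.out.println(EVENTS);   // prints []
        interpreter();
        interpreter();
        System.out.println(EVENTS);   // prints [setup, interpreter created]
    }
}
```

Because the nested class is private, nothing outside the enclosing class can touch it early, which is the property the comment relies on.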

on the second point:
i don't see the point of an includeResources() method. if it can be done, it can 
be done in init(); if not, it won't be done. Why add a new method?


 Support import modules in Jython UDF
 

 Key: PIG-1824
 URL: https://issues.apache.org/jira/browse/PIG-1824
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.8.0, 0.9.0
Reporter: Richard Ding
Assignee: Woody Anderson
 Fix For: 0.9.0

 Attachments: 1824.patch, 1824a.patch


 Currently, Jython UDF script doesn't support Jython import statement as in 
 the following example:
 {code}
 #!/usr/bin/python
 import re
 @outputSchema("word:chararray")
 def resplit(content, regex, index):
     return re.compile(regex).split(content)[index]
 {code}
 Can Pig automatically locate the Jython module file and ship it to the 
 backend? Or should we add a ship clause to let user explicitly specify the 
 module to ship? 



[jira] [Updated] (PIG-1973) UDFContext.getUDFContext has a thread race condition around it's ThreadLocal

2011-04-06 Thread Woody Anderson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Woody Anderson updated PIG-1973:


Attachment: 1973.patch

use the initialValue method of ThreadLocal, which is how to correctly handle 
lazy initialization. 
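
As a rough illustration of the initialValue form (not the contents of 1973.patch), the pattern looks like the sketch below. Context is a hypothetical stand-in for UDFContext, and the other names are made up for the example.

```java
// Hypothetical sketch of lazy per-thread initialization via initialValue().
final class ContextHolderSketch {

    // Stand-in for UDFContext.
    static final class Context { }

    private static final ThreadLocal<Context> TSS = new ThreadLocal<Context>() {
        @Override
        protected Context initialValue() {
            // Called at most once per thread, on that thread's first get().
            return new Context();
        }
    };

    // No null check or explicit set() is needed in the accessor.
    static Context getContext() {
        return TSS.get();
    }

    // Helper for the demo: fetch the context as seen from a fresh thread.
    static Context contextFromNewThread() {
        final Context[] seen = new Context[1];
        Thread t = new Thread(new Runnable() {
            public void run() {
                seen[0] = getContext();
            }
        });
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(getContext() == getContext());            // prints true
        System.out.println(getContext() == contextFromNewThread());  // prints false
    }
}
```

The accessor shrinks to a single get() call, which is the concise form the patch comment refers to.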

 UDFContext.getUDFContext has a thread race condition around it's ThreadLocal
 

 Key: PIG-1973
 URL: https://issues.apache.org/jira/browse/PIG-1973
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0, 0.9.0
Reporter: Woody Anderson
Assignee: Woody Anderson
Priority: Minor
 Fix For: 0.8.0

 Attachments: 1973.patch


 this probably isn't manifesting anywhere, but it's an incorrect use of the 
 ThreadLocal pattern.



[jira] [Created] (PIG-1973) UDFContext.getUDFContext has a thread race condition around it's ThreadLocal

2011-04-06 Thread Woody Anderson (JIRA)
UDFContext.getUDFContext has a thread race condition around it's ThreadLocal


 Key: PIG-1973
 URL: https://issues.apache.org/jira/browse/PIG-1973
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0, 0.9.0
Reporter: Woody Anderson
Assignee: Woody Anderson
Priority: Minor
 Fix For: 0.8.0
 Attachments: 1973.patch

this probably isn't manifesting anywhere, but it's an incorrect use of the 
ThreadLocal pattern.



[jira] [Updated] (PIG-1955) PhysicalOperator has a member variable (non-static) Log object that is non-transient, this causes serialization errors

2011-04-04 Thread Woody Anderson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Woody Anderson updated PIG-1955:


Attachment: 1955-static.patch

Agreed, i prefer the static approach.

 PhysicalOperator has a member variable (non-static) Log object that is 
 non-transient, this causes serialization errors
 --

 Key: PIG-1955
 URL: https://issues.apache.org/jira/browse/PIG-1955
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0, 0.9.0
Reporter: Woody Anderson
Assignee: Woody Anderson
 Fix For: 0.8.0, 0.9.0

 Attachments: 1955-po.patch, 1955-static.patch, 1955.patch


 I found this while trying to write unit tests. Creating a local PigServer to 
 test my LoadFunc caused a serialization of the PhysicalOperator class, which 
 failed due to:
 ..
 Caused by: java.io.NotSerializableException: 
 org.apache.commons.logging.impl.Log4JCategoryLog
 ..
 this is easily fixed by adding the transient keyword to the definition of log.
 e.g.
 on trunk:
 private final transient Log log = LogFactory.getLog(getClass());
 on the 0.8 tag:
 private transient Log log = LogFactory.getLog(getClass());



[jira] [Commented] (PIG-1824) Support import modules in Jython UDF

2011-03-31 Thread Woody Anderson (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014086#comment-13014086
 ] 

Woody Anderson commented on PIG-1824:
-

The following may not be immediately self-evident to all developers:

import statements that execute from within runtime function calls will not work 
(unless the dependency has already been satisfied statically), e.g.:
{code}
def resplit(content, regex, index):
    import re
    return re.compile(regex).split(content)[index]
{code}

will not work b/c the import is not attempted until after the job has been 
defined, built, and deployed.
This import practice is frowned upon and is used very rarely. If you happen to 
be doing it (i'll assume you have a good reason), then you probably know how to 
fix it. If you're using someone else's code that is written like this, you can 
satisfy the dependency by explicitly importing the module up front; this will 
cause it to be added to the jar, and subsequent uses will succeed.


 Support import modules in Jython UDF
 

 Key: PIG-1824
 URL: https://issues.apache.org/jira/browse/PIG-1824
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.8.0, 0.9.0
Reporter: Richard Ding
Assignee: Woody Anderson
 Fix For: 0.8.0, 0.9.0, 0.10

 Attachments: 1824.patch


 Currently, Jython UDF script doesn't support Jython import statement as in 
 the following example:
 {code}
 #!/usr/bin/python
 import re
 @outputSchema("word:chararray")
 def resplit(content, regex, index):
     return re.compile(regex).split(content)[index]
 {code}
 Can Pig automatically locate the Jython module file and ship it to the 
 backend? Or should we add a ship clause to let user explicitly specify the 
 module to ship? 



[jira] [Updated] (PIG-1955) PhysicalOperator has a member variable (non-static) Log object that is non-transient, this causes serialization errors

2011-03-31 Thread Woody Anderson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Woody Anderson updated PIG-1955:


Attachment: 1955.patch

this doesn't have all of the unit test trimmings, but is that all really needed 
to mark a logger as transient?

 PhysicalOperator has a member variable (non-static) Log object that is 
 non-transient, this causes serialization errors
 --

 Key: PIG-1955
 URL: https://issues.apache.org/jira/browse/PIG-1955
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0, 0.9.0
Reporter: Woody Anderson
Assignee: Woody Anderson
 Fix For: 0.8.0, 0.9.0

 Attachments: 1955.patch


 I found this while trying to write unit tests. Creating a local PigServer to 
 test my LoadFunc caused a serialization of the PhysicalOperator class, which 
 failed due to:
 ..
 Caused by: java.io.NotSerializableException: 
 org.apache.commons.logging.impl.Log4JCategoryLog
 ..
 this is easily fixed by adding the transient keyword to the definition of log.
 e.g.
 on trunk:
 private final transient Log log = LogFactory.getLog(getClass());
 on the 0.8 tag:
 private transient Log log = LogFactory.getLog(getClass());



[jira] [Created] (PIG-1955) PhysicalOperator has a member variable (non-static) Log object that is non-transient, this causes serialization errors

2011-03-31 Thread Woody Anderson (JIRA)
PhysicalOperator has a member variable (non-static) Log object that is 
non-transient, this causes serialization errors
--

 Key: PIG-1955
 URL: https://issues.apache.org/jira/browse/PIG-1955
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0, 0.9.0
Reporter: Woody Anderson
Assignee: Woody Anderson
 Fix For: 0.9.0, 0.8.0
 Attachments: 1955.patch

I found this while trying to write unit tests. Creating a local PigServer to 
test my LoadFunc caused a serialization of the PhysicalOperator class, which 
failed due to:
..
Caused by: java.io.NotSerializableException: 
org.apache.commons.logging.impl.Log4JCategoryLog
..


this is easily fixed by adding the transient keyword to the definition of log.

e.g.

on trunk:
private final transient Log log = LogFactory.getLog(getClass());
on the 0.8 tag:
private transient Log log = LogFactory.getLog(getClass());
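
The failure and the transient fix can be reproduced without Pig in a small sketch. FakeLog is a hypothetical stand-in for a logger implementation that is not Serializable, and the operator classes are made up for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical sketch: why a non-transient logger field breaks serialization.
final class TransientLogSketch {

    // Stand-in for a logger implementation that is not Serializable.
    static final class FakeLog { }

    // Non-transient field: serialization tries to write the logger and fails.
    static class BrokenOperator implements Serializable {
        private static final long serialVersionUID = 1L;
        private final FakeLog log = new FakeLog();
    }

    // Transient field: serialization skips the logger.
    static class FixedOperator implements Serializable {
        private static final long serialVersionUID = 1L;
        private final transient FakeLog log = new FakeLog();
    }

    // Returns true if the object can be written to an ObjectOutputStream.
    static boolean serializes(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serializes(new BrokenOperator()));  // prints false
        System.out.println(serializes(new FixedOperator()));   // prints true
    }
}
```

One caveat with the transient approach: field initializers are not re-run when a Serializable object is deserialized, so a transient instance logger comes back as null on the receiving side unless it is restored explicitly.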





[jira] [Updated] (PIG-1955) PhysicalOperator has a member variable (non-static) Log object that is non-transient, this causes serialization errors

2011-03-31 Thread Woody Anderson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Woody Anderson updated PIG-1955:


Attachment: 1955-po.patch

Ok. Unfortunately, this issue is more pervasive than I originally thought.

The 'simple' fix that is attached makes the PO (PhysicalOperator) logger
transient and protected, and removes the loggers from all subclasses, where
they are often defined incorrectly (as non-transient members) and inconsistently.

Personally, when I define loggers I always make them private and static so that
there is no getClass() call. This makes it much simpler to find the class where
a log line resides in the source code. I dislike loggers that define themselves
with getClass() because logging code in A.java will be reported as class B in
the output if class B extends class A.
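
The naming effect described above is not specific to Java. As a rough analogy in Python (the language of the UDF examples in this thread), a logger name derived from the runtime type plays the role of getClass(), and a name fixed where the class is defined plays the role of a private static logger. The class names A and B are hypothetical, and this is only a sketch of the behavior, not Pig code.

```python
import logging


class A(object):
    # Fixed at class-definition time, like a private static logger in A.java.
    static_log = logging.getLogger("A")

    def __init__(self):
        # Resolved from the runtime type, like LogFactory.getLog(getClass()).
        self.instance_log = logging.getLogger(type(self).__name__)


class B(A):
    pass


b = B()
print(b.instance_log.name)  # B: code written in A reports itself as B
print(b.static_log.name)    # A
```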

I did not change this behavior because perhaps someone has their reasons for
doing what they did. I did, however, remove some of the static loggers simply to
ensure consistency (the majority were defined as member variables).

The change to private static is also not a big deal if people agree we
should consistently go that way instead.

 PhysicalOperator has a member variable (non-static) Log object that is 
 non-transient, this causes serialization errors
 --

 Key: PIG-1955
 URL: https://issues.apache.org/jira/browse/PIG-1955
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0, 0.9.0
Reporter: Woody Anderson
Assignee: Woody Anderson
 Fix For: 0.8.0, 0.9.0

 Attachments: 1955-po.patch, 1955.patch


 I found this while trying to write unit tests. Creating a local PigServer to 
 test my LoadFunc caused a serialization of the PhysicalOperator class, which 
 failed due to:
 ..
 Caused by: java.io.NotSerializableException: 
 org.apache.commons.logging.impl.Log4JCategoryLog
 ..
 this is easily fixed by adding the transient keyword to the definition of log.
 e.g.
 on trunk:
 private final transient Log log = LogFactory.getLog(getClass());
 on the 0.8 tag:
 private transient Log log = LogFactory.getLog(getClass());



[jira] [Updated] (PIG-1824) Support import modules in Jython UDF

2011-03-31 Thread Woody Anderson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Woody Anderson updated PIG-1824:


Attachment: 1824a.patch

This altered patch removes the explicit 'import re' test, as it relies on
having a Jython 2.5.0 install on disk, configured to be visible to the runtime.

The 'nested' test already exercises the mechanism used by 'import re', so
removing the explicit test simply makes the tests more portable.

 Support import modules in Jython UDF
 

 Key: PIG-1824
 URL: https://issues.apache.org/jira/browse/PIG-1824
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.8.0, 0.9.0
Reporter: Richard Ding
Assignee: Woody Anderson
 Fix For: 0.8.0, 0.9.0, 0.10

 Attachments: 1824.patch, 1824a.patch


 Currently, Jython UDF script doesn't support Jython import statement as in 
 the following example:
 {code}
 #!/usr/bin/python
 import re
 @outputSchema("word:chararray")
 def resplit(content, regex, index):
     return re.compile(regex).split(content)[index]
 {code}
 Can Pig automatically locate the Jython module file and ship it to the 
 backend? Or should we add a ship clause to let user explicitly specify the 
 module to ship? 



[jira] [Updated] (PIG-1824) Support import modules in Jython UDF

2011-03-30 Thread Woody Anderson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Woody Anderson updated PIG-1824:


Attachment: 1824.patch

here's the patch file.

 Support import modules in Jython UDF
 

 Key: PIG-1824
 URL: https://issues.apache.org/jira/browse/PIG-1824
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.8.0, 0.9.0
Reporter: Richard Ding
Assignee: Woody Anderson
 Fix For: 0.8.0, 0.9.0, 0.10

 Attachments: 1824.patch


 Currently, Jython UDF script doesn't support Jython import statement as in 
 the following example:
 {code}
 #!/usr/bin/python
 import re
 @outputSchema("word:chararray")
 def resplit(content, regex, index):
     return re.compile(regex).split(content)[index]
 {code}
 Can Pig automatically locate the Jython module file and ship it to the 
 backend? Or should we add a ship clause to let user explicitly specify the 
 module to ship? 



[jira] [Created] (PIG-1942) script UDF (jython) should utilize the intended output schema to more directly convert Py objects to Pig objects

2011-03-29 Thread Woody Anderson (JIRA)
script UDF (jython) should utilize the intended output schema to more directly 
convert Py objects to Pig objects


 Key: PIG-1942
 URL: https://issues.apache.org/jira/browse/PIG-1942
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.8.0, 0.9.0
Reporter: Woody Anderson
Priority: Minor
 Fix For: 0.9.0


from https://issues.apache.org/jira/browse/PIG-1824

{code}
import re
@outputSchema("y:bag{t:tuple(word:chararray)}")
def strsplittobag(content, regex):
    return re.compile(regex).split(content)
{code}

This does not work because split returns a list of strings, whereas the declared
schema is a bag of tuples. However, the output schema is known, and it would be
quite simple to implicitly promote each string element to a single-field tuple.
Also, a list, array, tuple, set, etc. are all equally convertible to a bag, and a
list, array, or tuple is equally convertible to a Tuple, so this conversion can
be done in a much less rigid way by using the schema.

This allows much easier re-use of existing Python code, and less memory
overhead from creating intermediate objects just to re-convert types.
I wrote the code to do this a while back as part of my version of the
Jython script framework; I'll isolate that and attach it.
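
A minimal sketch of the promotion described above, in plain Python rather than Pig's actual conversion code. The helper names to_tuple and to_bag are hypothetical, and a bag is modelled here simply as a list of tuples; a real implementation would consult the parsed output schema to decide which conversion to apply at each level.

```python
import re


def to_tuple(value):
    """Promote a value to a tuple: sequences become tuples, scalars become 1-tuples."""
    if isinstance(value, (list, tuple)):
        return tuple(value)
    return (value,)


def to_bag(values):
    """Convert a list/tuple/set of elements to a bag (modelled as a list of tuples)."""
    if not isinstance(values, (list, tuple, set, frozenset)):
        raise TypeError("cannot convert %r to a bag" % (values,))
    return [to_tuple(v) for v in values]


# split returns a plain list of strings; to_bag wraps each one in a tuple.
words = re.compile(r"\s+").split("a b c")
bag = to_bag(words)  # [('a',), ('b',), ('c',)]
```

With a conversion like this applied by the framework, a UDF such as strsplittobag could return the result of split directly instead of building the tuples by hand.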



[jira] [Updated] (PIG-1824) Support import modules in Jython UDF

2011-03-29 Thread Woody Anderson (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Woody Anderson updated PIG-1824:


Description: 
Currently, Jython UDF script doesn't support Jython import statement as in the 
following example:

{code}
#!/usr/bin/python

import re
@outputSchema("word:chararray")
def resplit(content, regex, index):
    return re.compile(regex).split(content)[index]
{code}

Can Pig automatically locate the Jython module file and ship it to the backend? 
Or should we add a ship clause to let user explicitly specify the module to 
ship? 

  was:
Currently, Jython UDF script doesn't support Jython import statement as in the 
following example:

{code}
#!/usr/bin/python

import re
@outputSchema("y:bag{t:tuple(word:chararray)}")
def strsplittobag(content, regex):
    return re.compile(regex).split(content)
{code}

Can Pig automatically locate the Jython module file and ship it to the backend? 
Or should we add a ship clause to let user explicitly specify the module to 
ship? 


 Support import modules in Jython UDF
 

 Key: PIG-1824
 URL: https://issues.apache.org/jira/browse/PIG-1824
 Project: Pig
  Issue Type: Improvement
Reporter: Richard Ding
Assignee: Woody Anderson
 Fix For: 0.10


 Currently, Jython UDF script doesn't support Jython import statement as in 
 the following example:
 {code}
 #!/usr/bin/python
 import re
 @outputSchema("word:chararray")
 def resplit(content, regex, index):
     return re.compile(regex).split(content)[index]
 {code}
 Can Pig automatically locate the Jython module file and ship it to the 
 backend? Or should we add a ship clause to let user explicitly specify the 
 module to ship? 



[jira] [Created] (PIG-1943) jython functions can use the @outputSchema decorator, but only if in the out script that is imported, we should add a builting module pigdecorators.py so that developers ca

2011-03-29 Thread Woody Anderson (JIRA)
jython functions can use the @outputSchema decorator, but only if in the out 
script that is imported, we should add a built-in module pigdecorators.py so 
that developers can import and use them in lib scripts


 Key: PIG-1943
 URL: https://issues.apache.org/jira/browse/PIG-1943
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.8.0, 0.9.0
Reporter: Woody Anderson
Assignee: Woody Anderson
Priority: Minor
 Fix For: 0.9.0


If you have Pig UDF functions in a script and want to re-use them (e.g.
import them from another script), the decorators must be defined. They will not
be, due to scoping rules, so the decorators should be available via a standard
importable module that ships with the Jython framework (we already define
the decorators as part of initializing the interpreter).

This simply involves adding an appropriately named module, pigdecorators.py, to
the classpath, so a dev can do:

{quote}
from pigdecorators import *
@outputSchema("w:chararray")
def word():
    return 'word'
{quote}

The decorators can already be used in the primary script, but once
https://issues.apache.org/jira/browse/PIG-1824 is completed, that script would
not import properly when used from within another script.
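
For illustration, a minimal sketch of what such a pigdecorators.py module might contain. The class body below is an assumption, not the code Pig ships: it only records the schema string on the function so that a framework could read it back later, and the attribute name outputSchema on the function object is hypothetical.

```python
# pigdecorators.py (hypothetical contents)

class outputSchema(object):
    """Decorator that attaches a Pig schema string to a function."""

    def __init__(self, schema_def):
        self.schema_def = schema_def

    def __call__(self, func):
        # Record the schema on the function object and return it unchanged.
        func.outputSchema = self.schema_def
        return func


# In a library script this would follow "from pigdecorators import *".
@outputSchema("w:chararray")
def word():
    return 'word'
```

Because the decorator returns the original function, a library script that imports it remains usable as ordinary Python outside of Pig as well.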
