[
https://issues.apache.org/jira/browse/PIG-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233708#comment-13233708
]
[email protected] commented on PIG-2317:
----------------------------------------------------
bq. On 2012-03-03 22:26:58, Julien Le Dem wrote:
bq. > Thanks Jonathan for this contribution. A lot of good stuff in there.
bq. > Please add comments when using Ruby specific mechanisms in there (child
classes notification, Class naming after the variable it is affected to, ...)
so that less Ruby savvy contributors can follow.
I'm going to update the JIRA with some of the broader comments.
My todo at this point is:
- Polish up the Javadocs (I should probably add @params and whatnot)
- Clean up the naming convention in the Ruby API defined in Java
- Add more test coverage (I added a couple big features, they need to be used
more in the e2e tests especially)
- Add varargs support
That said, the tests pass and it seems to work. The API feels quite clean to me.
bq. On 2012-03-03 22:26:58, Julien Le Dem wrote:
bq. > trunk/src/org/apache/pig/scripting/jruby/JrubyAccumulatorEvalFunc.java,
line 41
bq. > <https://reviews.apache.org/r/4091/diff/1/?file=86324#file86324line41>
bq. >
bq. > ruby.getClass("DataBag") could be done only once
It could, but there is a caveat: the Ruby runtime HAS to be identical within
calls, or you're going to run into huge issues. The getClass call isn't
particularly heinous, and I think it's cleaner to leave in. For the time being,
all of the Ruby runtimes are identical, but in the future there may be multiple
(a different runtime per UDF or per script registered), so this is safer.
bq. On 2012-03-03 22:26:58, Julien Le Dem wrote:
bq. > trunk/pigudf.rb, line 110
bq. > <https://reviews.apache.org/r/4091/diff/1/?file=86320#file86320line110>
bq. >
bq. > You could call it RegisterDescendentsUDF or a similar name as this
is a based class for UDFs.
bq. > The descendents registration is common to the PigUDF as well. Do you
want to unify the class hierarchy?
Agreed, I renamed it
bq. On 2012-03-03 22:26:58, Julien Le Dem wrote:
bq. > trunk/pigudf.rb, line 87
bq. > <https://reviews.apache.org/r/4091/diff/1/?file=86320#file86320line87>
bq. >
bq. > I would add a separate field for schema func instead of this
convention.
I split it out into another class like you suggested
bq. On 2012-03-03 22:26:58, Julien Le Dem wrote:
bq. > trunk/pigudf.rb, line 41
bq. > <https://reviews.apache.org/r/4091/diff/1/?file=86320#file86320line41>
bq. >
bq. > - please add a comment explain the mechanism that names the class
after the variable it is assigned to.
added a ton of comments
bq. On 2012-03-03 22:26:58, Julien Le Dem wrote:
bq. > trunk/pigudf.rb, line 26
bq. > <https://reviews.apache.org/r/4091/diff/1/?file=86320#file86320line26>
bq. >
bq. > you could define a class for this
Agreed
bq. On 2012-03-03 22:26:58, Julien Le Dem wrote:
bq. > trunk/pigudf.rb, line 6
bq. > <https://reviews.apache.org/r/4091/diff/1/?file=86320#file86320line6>
bq. >
bq. > maybe you need to separate the top level ruby script to be used
outside of pig and the one used inside using includes?
I was able to clean this up significantly in multiple ways. Will document how
in the JIRA.
bq. On 2012-03-03 22:26:58, Julien Le Dem wrote:
bq. > trunk/src/org/apache/pig/scripting/jruby/JrubyAccumulatorEvalFunc.java,
line 65
bq. > <https://reviews.apache.org/r/4091/diff/1/?file=86324#file86324line65>
bq. >
bq. > I would override OutputSchema instead. In the case of a bag you are
losing the schema of the Tuple inside the bag.
Wisdom of the ages. Done :)
bq. On 2012-03-03 22:26:58, Julien Le Dem wrote:
bq. > trunk/src/org/apache/pig/scripting/jruby/JrubyAlgebraicEvalFunc.java,
line 86
bq. > <https://reviews.apache.org/r/4091/diff/1/?file=86325#file86325line86>
bq. >
bq. > some of the ruby calling logic and exception catching could be
factored in this class
Agreed...the win felt small, I will take this into account, but it's not a
priority
bq. On 2012-03-03 22:26:58, Julien Le Dem wrote:
bq. > trunk/src/org/apache/pig/scripting/jruby/JrubyEvalFunc.java, line 60
bq. > <https://reviews.apache.org/r/4091/diff/1/?file=86326#file86326line60>
bq. >
bq. > check for the varargs case
I'll make varargs a top priority for the next round of changes... didn't seem
super pressing atm, but maybe it is.
bq. On 2012-03-03 22:26:58, Julien Le Dem wrote:
bq. > trunk/src/org/apache/pig/scripting/jruby/JrubyFilterFunc.java, line 15
bq. > <https://reviews.apache.org/r/4091/diff/1/?file=86327#file86327line15>
bq. >
bq. > It is very similar to JrubyEvalFunc. You can factor out more of this
code.
I suppose this begs the question of why we even have FilterFunc anymore, given
that it just extends EvalFunc<Boolean> now. I treated it separately because I
wasn't sure about that (I could just make FilterFuncs EvalFunc<Boolean>s and be
done with it)... it seems like something we should change in Pig (ie abolish
FilterFuncs). But if FilterFunc is treater separately, I want the two
classes... I suppose I could pull out the functionality, and should, but I'd
like to have some resolution on that, because it'll guide how I do so.
bq. On 2012-03-03 22:26:58, Julien Le Dem wrote:
bq. > trunk/src/org/apache/pig/scripting/jruby/JrubyScriptEngine.java, line 107
bq. > <https://reviews.apache.org/r/4091/diff/1/?file=86328#file86328line107>
bq. >
bq. > this will strip the inner schema of bags and tuples. See comments
about overriding outputSchema
Can you avoid this with Algebraic UDFs? Fixed for Accumulators
bq. On 2012-03-03 22:26:58, Julien Le Dem wrote:
bq. > trunk/src/org/apache/pig/scripting/jruby/RubyBagIterator.java, line 67
bq. > <https://reviews.apache.org/r/4091/diff/1/?file=86331#file86331line67>
bq. >
bq. > please make it private and at the top
I don't even ned it, I think
bq. On 2012-03-03 22:26:58, Julien Le Dem wrote:
bq. > trunk/src/org/apache/pig/scripting/jruby/RubyDataBag.java, line 40
bq. > <https://reviews.apache.org/r/4091/diff/1/?file=86332#file86332line40>
bq. >
bq. > make it private
bq. > 1 is a perfectly good value. it does not need to be unique.
bq. > Changing the serialVersionUID value will make the serialized data
format incompatible.
you're correct. I think this was an artifact from a previous implementation and
isn't actually necessary. good to know, though...
bq. On 2012-03-03 22:26:58, Julien Le Dem wrote:
bq. > trunk/src/org/apache/pig/scripting/jruby/RubyDataBag.java, line 104
bq. > <https://reviews.apache.org/r/4091/diff/1/?file=86332#file86332line104>
bq. >
bq. > in this case addAll would replace the current content. Is it
intended ?
Good call. It was a bug. I removed addAll and just made the functionality a
part of add.
bq. On 2012-03-03 22:26:58, Julien Le Dem wrote:
bq. > trunk/src/org/apache/pig/scripting/jruby/RubyDataBag.java, line 149
bq. > <https://reviews.apache.org/r/4091/diff/1/?file=86332#file86332line149>
bq. >
bq. > you could call runtime.getClass("DataBag").getClass("BagIterator")
once at the beginning
I eliminated BagIterator, it didn't seem necessary. That said, re: the getClass
calls, see above.
bq. On 2012-03-03 22:26:58, Julien Le Dem wrote:
bq. > trunk/src/org/apache/pig/scripting/jruby/RubyDataByteArray.java, line 98
bq. > <https://reviews.apache.org/r/4091/diff/1/?file=86333#file86333line98>
bq. >
bq. > give meaningful names to parameters
the issue with ruby parameters is that a lot of the functions can take multiple
types, etc, and the meaning changes depending. I could handle this a couple of
ways.
1) in the case where there is only one option, use that
2) make better use of @param javadoc
3) in the case where there are multiple possibilities, make sure to explicitly
create an object that is named to reflect what it is supposed to be
Would appreciate your thoughts on the matter
bq. On 2012-03-03 22:26:58, Julien Le Dem wrote:
bq. > trunk/test/e2e/pig/udfs/ruby/scriptingudfs.rb, line 32
bq. > <https://reviews.apache.org/r/4091/diff/1/?file=86341#file86341line32>
bq. >
bq. > What about keeping only the one parameter ouptutSchema and
outputSchemaFunctions?
bq. > Users would just use the function right before defining the UDF.
bq. >
bq. > outputSchema "word:chararray"
bq. > def concat input, input2
bq. > return input + input2
bq. > end
it'd be good to test both, agreed
- Jonathan
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4091/#review5601
-----------------------------------------------------------
On 2012-02-28 22:02:52, Jonathan Coveney wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/4091/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2012-02-28 22:02:52)
bq.
bq.
bq. Review request for pig, Julien Le Dem and Alan Gates.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. This is the reviewboard for the following:
bq.
bq. https://issues.apache.org/jira/browse/PIG-2317
bq.
bq.
bq. This addresses bug PIG-2317.
bq. https://issues.apache.org/jira/browse/PIG-2317
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. trunk/src/org/apache/pig/AccumulatorEvalFunc.java PRE-CREATION
bq. trunk/src/org/apache/pig/AlgebraicEvalFunc.java PRE-CREATION
bq. trunk/src/org/apache/pig/scripting/ScriptEngine.java 1294756
bq. trunk/src/org/apache/pig/scripting/jruby/JrubyAccumulatorEvalFunc.java
PRE-CREATION
bq. trunk/src/org/apache/pig/scripting/jruby/JrubyAlgebraicEvalFunc.java
PRE-CREATION
bq. trunk/bin/pig 1294756
bq. trunk/ivy.xml 1294756
bq. trunk/ivy/libraries.properties 1294756
bq. trunk/pigudf.rb PRE-CREATION
bq. trunk/src/org/apache/pig/scripting/jruby/JrubyEvalFunc.java PRE-CREATION
bq. trunk/src/org/apache/pig/scripting/jruby/JrubyFilterFunc.java
PRE-CREATION
bq. trunk/src/org/apache/pig/scripting/jruby/JrubyScriptEngine.java
PRE-CREATION
bq. trunk/src/org/apache/pig/scripting/jruby/JrubyUtils.java PRE-CREATION
bq. trunk/src/org/apache/pig/scripting/jruby/PigJrubyLibrary.java
PRE-CREATION
bq. trunk/src/org/apache/pig/scripting/jruby/RubyBagIterator.java
PRE-CREATION
bq. trunk/src/org/apache/pig/scripting/jruby/RubyDataBag.java PRE-CREATION
bq. trunk/src/org/apache/pig/scripting/jruby/RubyDataByteArray.java
PRE-CREATION
bq. trunk/src/org/apache/pig/scripting/jruby/RubyExampleA.java PRE-CREATION
bq. trunk/src/org/apache/pig/scripting/jruby/RubyExampleB.java PRE-CREATION
bq. trunk/test/e2e/pig/build.xml 1294756
bq. trunk/test/e2e/pig/conf/default.conf 1294756
bq. trunk/test/e2e/pig/drivers/TestDriverPig.pm 1294756
bq. trunk/test/e2e/pig/tests/nightly.conf 1294756
bq. trunk/test/e2e/pig/udfs/ruby/morerubyudfs.rb PRE-CREATION
bq. trunk/test/e2e/pig/udfs/ruby/scriptingudfs.rb PRE-CREATION
bq. trunk/test/org/apache/pig/test/TestUDF.java 1294756
bq. trunk/test/org/apache/pig/test/utils/HelperEvalFuncUtils.java
PRE-CREATION
bq.
bq. Diff: https://reviews.apache.org/r/4091/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq. e2e tests can be run as follows:
bq.
bq. ant -Dharness.hadoop.home=<path_to_hadoop>
-Dharness.old.pig=<path_to_old_pig> test-e2e-deploy-local
bq. ant -Dharness.hadoop.home=<path_to_hadoop>
-Dharness.old.pig=<path_to_old_pig> -Dtests.to.run="-t RubyUDFs" test-e2e-local
bq.
bq.
bq. Thanks,
bq.
bq. Jonathan
bq.
bq.
> Ruby/Jruby UDFs
> ---------------
>
> Key: PIG-2317
> URL: https://issues.apache.org/jira/browse/PIG-2317
> Project: Pig
> Issue Type: New Feature
> Reporter: Jacob Perkins
> Assignee: Jonathan Coveney
> Priority: Minor
> Attachments: PIG-2317-8.patch, PIG-2317-8_plus.patch, PigUdf.rb,
> PigUdf.rb, jruby_scripting.patch, jruby_scripting_2_real.patch,
> jruby_scripting_3.patch, jruby_scripting_4.patch, jruby_scripting_5.patch,
> jruby_scripting_6.patch, jruby_scripting_7.patch, pigjruby.rb, pigjruby.rb,
> pigjruby.rb, pigudf.rb
>
>
> It should be possible to write UDFs in Ruby. These UDFs will be registered in
> the same way as python and javascript UDFs.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira