[ https://issues.apache.org/jira/browse/PIG-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221720#comment-13221720 ]
jirapos...@reviews.apache.org commented on PIG-2317: ---------------------------------------------------- ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4091/#review5601 ----------------------------------------------------------- Thanks Jonathan for this contribution. A lot of good stuff in there. Please add comments when using Ruby specific mechanisms in there (child classes notification, Class naming after the variable it is affected to, ...) so that less Ruby savvy contributors can follow. trunk/pigudf.rb <https://reviews.apache.org/r/4091/#comment12179> maybe you need to separate the top level ruby script to be used outside of pig and the one used inside using includes? trunk/pigudf.rb <https://reviews.apache.org/r/4091/#comment12208> please add doc on public functions trunk/pigudf.rb <https://reviews.apache.org/r/4091/#comment12180> @function_to_register["GETCLASSFROMOBJECT"] could be replaced by a member @current_get_class_from_object trunk/pigudf.rb <https://reviews.apache.org/r/4091/#comment12214> you could define a class for this trunk/pigudf.rb <https://reviews.apache.org/r/4091/#comment12209> - please add a comment explain the mechanism that names the class after the variable it is assigned to. trunk/pigudf.rb <https://reviews.apache.org/r/4091/#comment12210> Add a comment to explain that classes inheriting PigUdf will automatically get registered here. trunk/pigudf.rb <https://reviews.apache.org/r/4091/#comment12211> I would add a separate field for schema func instead of this convention. trunk/pigudf.rb <https://reviews.apache.org/r/4091/#comment12181> I would throw an exception if both @schema and @functions_to_register[id] are set. trunk/pigudf.rb <https://reviews.apache.org/r/4091/#comment12182> You could call it RegisterDescendentsUDF or a similar name as this is a based class for UDFs. The descendents registration is common to the PigUDF as well. Do you want to unify the class hierarchy? trunk/src/org/apache/pig/scripting/jruby/JrubyAccumulatorEvalFunc.java <https://reviews.apache.org/r/4091/#comment12184> ruby.getClass("DataBag") could be done only once trunk/src/org/apache/pig/scripting/jruby/JrubyAccumulatorEvalFunc.java <https://reviews.apache.org/r/4091/#comment12183> throw an Exception here trunk/src/org/apache/pig/scripting/jruby/JrubyAccumulatorEvalFunc.java <https://reviews.apache.org/r/4091/#comment12185> I would override OutputSchema instead. In the case of a bag you are losing the schema of the Tuple inside the bag. trunk/src/org/apache/pig/scripting/jruby/JrubyAlgebraicEvalFunc.java <https://reviews.apache.org/r/4091/#comment12212> If you can add the name of the function in the message that would help with debugging. trunk/src/org/apache/pig/scripting/jruby/JrubyAlgebraicEvalFunc.java <https://reviews.apache.org/r/4091/#comment12186> you could factor out the function calling and bag instantiated code. trunk/src/org/apache/pig/scripting/jruby/JrubyAlgebraicEvalFunc.java <https://reviews.apache.org/r/4091/#comment12215> some of the ruby calling logic and exception catching could be factored in this class trunk/src/org/apache/pig/scripting/jruby/JrubyAlgebraicEvalFunc.java <https://reviews.apache.org/r/4091/#comment12187> Override outputSchema instead like you did for EvalFunc trunk/src/org/apache/pig/scripting/jruby/JrubyEvalFunc.java <https://reviews.apache.org/r/4091/#comment12213> private trunk/src/org/apache/pig/scripting/jruby/JrubyEvalFunc.java <https://reviews.apache.org/r/4091/#comment12189> check for the varargs case trunk/src/org/apache/pig/scripting/jruby/JrubyEvalFunc.java <https://reviews.apache.org/r/4091/#comment12190> you should throw an exception here trunk/src/org/apache/pig/scripting/jruby/JrubyFilterFunc.java <https://reviews.apache.org/r/4091/#comment12217> It is very similar to JrubyEvalFunc. You can factor out more of this code. trunk/src/org/apache/pig/scripting/jruby/JrubyFilterFunc.java <https://reviews.apache.org/r/4091/#comment12216> private trunk/src/org/apache/pig/scripting/jruby/JrubyScriptEngine.java <https://reviews.apache.org/r/4091/#comment12218> turn these comments into javadoc trunk/src/org/apache/pig/scripting/jruby/JrubyScriptEngine.java <https://reviews.apache.org/r/4091/#comment12192> this will strip the inner schema of bags and tuples. See comments about overriding outputSchema trunk/src/org/apache/pig/scripting/jruby/JrubyUtils.java <https://reviews.apache.org/r/4091/#comment12219> please add javadoc to public methods trunk/src/org/apache/pig/scripting/jruby/PigJrubyLibrary.java <https://reviews.apache.org/r/4091/#comment12193> add javadoc trunk/src/org/apache/pig/scripting/jruby/RubyBagIterator.java <https://reviews.apache.org/r/4091/#comment12194> please make it private and at the top trunk/src/org/apache/pig/scripting/jruby/RubyDataBag.java <https://reviews.apache.org/r/4091/#comment12195> add javadoc trunk/src/org/apache/pig/scripting/jruby/RubyDataBag.java <https://reviews.apache.org/r/4091/#comment12196> make it private 1 is a perfectly good value. it does not need to be unique. Changing the serialVersionUID value will make the serialized data format incompatible. trunk/src/org/apache/pig/scripting/jruby/RubyDataBag.java <https://reviews.apache.org/r/4091/#comment12197> in this case addAll would replace the current content. Is it intended ? trunk/src/org/apache/pig/scripting/jruby/RubyDataBag.java <https://reviews.apache.org/r/4091/#comment12220> want to remove it ? trunk/src/org/apache/pig/scripting/jruby/RubyDataBag.java <https://reviews.apache.org/r/4091/#comment12198> internalDB.add(JrubyUtils.rubyToPig((RubyArray)elem)); trunk/src/org/apache/pig/scripting/jruby/RubyDataBag.java <https://reviews.apache.org/r/4091/#comment12199> you could call runtime.getClass("DataBag").getClass("BagIterator") once at the beginning trunk/src/org/apache/pig/scripting/jruby/RubyDataBag.java <https://reviews.apache.org/r/4091/#comment12200> add spaces around operator and prefer the normalized "if" with { } if () { // } trunk/src/org/apache/pig/scripting/jruby/RubyDataBag.java <https://reviews.apache.org/r/4091/#comment12201> flattten with 3 Ts? It's going to be really flat. trunk/src/org/apache/pig/scripting/jruby/RubyDataByteArray.java <https://reviews.apache.org/r/4091/#comment12202> make this a javadoc comment trunk/src/org/apache/pig/scripting/jruby/RubyDataByteArray.java <https://reviews.apache.org/r/4091/#comment12203> private trunk/src/org/apache/pig/scripting/jruby/RubyDataByteArray.java <https://reviews.apache.org/r/4091/#comment12204> space after comma trunk/src/org/apache/pig/scripting/jruby/RubyDataByteArray.java <https://reviews.apache.org/r/4091/#comment12205> give meaningful names to parameters trunk/test/e2e/pig/udfs/ruby/scriptingudfs.rb <https://reviews.apache.org/r/4091/#comment12221> What about keeping only the one parameter ouptutSchema and outputSchemaFunctions? Users would just use the function right before defining the UDF. outputSchema "word:chararray" def concat input, input2 return input + input2 end trunk/test/org/apache/pig/test/TestUDF.java <https://reviews.apache.org/r/4091/#comment12222> if this value is known you may as well compare each value to the known constant instead of comparing to the previous value. - Julien On 2012-02-28 22:02:52, Jonathan Coveney wrote: bq. bq. ----------------------------------------------------------- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/4091/ bq. ----------------------------------------------------------- bq. bq. (Updated 2012-02-28 22:02:52) bq. bq. bq. Review request for pig, Julien Le Dem and Alan Gates. bq. bq. bq. Summary bq. ------- bq. bq. This is the reviewboard for the following: bq. bq. https://issues.apache.org/jira/browse/PIG-2317 bq. bq. bq. This addresses bug PIG-2317. bq. https://issues.apache.org/jira/browse/PIG-2317 bq. bq. bq. Diffs bq. ----- bq. bq. trunk/src/org/apache/pig/AccumulatorEvalFunc.java PRE-CREATION bq. trunk/src/org/apache/pig/AlgebraicEvalFunc.java PRE-CREATION bq. trunk/src/org/apache/pig/scripting/ScriptEngine.java 1294756 bq. trunk/src/org/apache/pig/scripting/jruby/JrubyAccumulatorEvalFunc.java PRE-CREATION bq. trunk/src/org/apache/pig/scripting/jruby/JrubyAlgebraicEvalFunc.java PRE-CREATION bq. trunk/bin/pig 1294756 bq. trunk/ivy.xml 1294756 bq. trunk/ivy/libraries.properties 1294756 bq. trunk/pigudf.rb PRE-CREATION bq. trunk/src/org/apache/pig/scripting/jruby/JrubyEvalFunc.java PRE-CREATION bq. trunk/src/org/apache/pig/scripting/jruby/JrubyFilterFunc.java PRE-CREATION bq. trunk/src/org/apache/pig/scripting/jruby/JrubyScriptEngine.java PRE-CREATION bq. trunk/src/org/apache/pig/scripting/jruby/JrubyUtils.java PRE-CREATION bq. trunk/src/org/apache/pig/scripting/jruby/PigJrubyLibrary.java PRE-CREATION bq. trunk/src/org/apache/pig/scripting/jruby/RubyBagIterator.java PRE-CREATION bq. trunk/src/org/apache/pig/scripting/jruby/RubyDataBag.java PRE-CREATION bq. trunk/src/org/apache/pig/scripting/jruby/RubyDataByteArray.java PRE-CREATION bq. trunk/src/org/apache/pig/scripting/jruby/RubyExampleA.java PRE-CREATION bq. trunk/src/org/apache/pig/scripting/jruby/RubyExampleB.java PRE-CREATION bq. trunk/test/e2e/pig/build.xml 1294756 bq. trunk/test/e2e/pig/conf/default.conf 1294756 bq. trunk/test/e2e/pig/drivers/TestDriverPig.pm 1294756 bq. trunk/test/e2e/pig/tests/nightly.conf 1294756 bq. trunk/test/e2e/pig/udfs/ruby/morerubyudfs.rb PRE-CREATION bq. trunk/test/e2e/pig/udfs/ruby/scriptingudfs.rb PRE-CREATION bq. trunk/test/org/apache/pig/test/TestUDF.java 1294756 bq. trunk/test/org/apache/pig/test/utils/HelperEvalFuncUtils.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/4091/diff bq. bq. bq. Testing bq. ------- bq. bq. e2e tests can be run as follows: bq. bq. ant -Dharness.hadoop.home=<path_to_hadoop> -Dharness.old.pig=<path_to_old_pig> test-e2e-deploy-local bq. ant -Dharness.hadoop.home=<path_to_hadoop> -Dharness.old.pig=<path_to_old_pig> -Dtests.to.run="-t RubyUDFs" test-e2e-local bq. bq. bq. Thanks, bq. bq. Jonathan bq. bq. > Ruby/Jruby UDFs > --------------- > > Key: PIG-2317 > URL: https://issues.apache.org/jira/browse/PIG-2317 > Project: Pig > Issue Type: New Feature > Reporter: Jacob Perkins > Assignee: Jonathan Coveney > Priority: Minor > Attachments: PIG-2317-8.patch, PigUdf.rb, PigUdf.rb, > jruby_scripting.patch, jruby_scripting_2_real.patch, jruby_scripting_3.patch, > jruby_scripting_4.patch, jruby_scripting_5.patch, jruby_scripting_6.patch, > jruby_scripting_7.patch, pigjruby.rb, pigjruby.rb, pigjruby.rb, pigudf.rb > > > It should be possible to write UDFs in Ruby. These UDFs will be registered in > the same way as python and javascript UDFs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira