[
https://issues.apache.org/jira/browse/PIG-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221720#comment-13221720
]
[email protected] commented on PIG-2317:
----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4091/#review5601
-----------------------------------------------------------
Thanks Jonathan for this contribution. A lot of good stuff in there.
Please add comments when using Ruby specific mechanisms in there (child classes
notification, Class naming after the variable it is affected to, ...) so that
less Ruby savvy contributors can follow.
trunk/pigudf.rb
<https://reviews.apache.org/r/4091/#comment12179>
maybe you need to separate the top level ruby script to be used outside of
pig and the one used inside using includes?
trunk/pigudf.rb
<https://reviews.apache.org/r/4091/#comment12208>
please add doc on public functions
trunk/pigudf.rb
<https://reviews.apache.org/r/4091/#comment12180>
@function_to_register["GETCLASSFROMOBJECT"] could be replaced by a member
@current_get_class_from_object
trunk/pigudf.rb
<https://reviews.apache.org/r/4091/#comment12214>
you could define a class for this
trunk/pigudf.rb
<https://reviews.apache.org/r/4091/#comment12209>
- please add a comment explain the mechanism that names the class after the
variable it is assigned to.
trunk/pigudf.rb
<https://reviews.apache.org/r/4091/#comment12210>
Add a comment to explain that classes inheriting PigUdf will automatically
get registered here.
trunk/pigudf.rb
<https://reviews.apache.org/r/4091/#comment12211>
I would add a separate field for schema func instead of this convention.
trunk/pigudf.rb
<https://reviews.apache.org/r/4091/#comment12181>
I would throw an exception if both @schema and @functions_to_register[id]
are set.
trunk/pigudf.rb
<https://reviews.apache.org/r/4091/#comment12182>
You could call it RegisterDescendentsUDF or a similar name as this is a
based class for UDFs.
The descendents registration is common to the PigUDF as well. Do you want
to unify the class hierarchy?
trunk/src/org/apache/pig/scripting/jruby/JrubyAccumulatorEvalFunc.java
<https://reviews.apache.org/r/4091/#comment12184>
ruby.getClass("DataBag") could be done only once
trunk/src/org/apache/pig/scripting/jruby/JrubyAccumulatorEvalFunc.java
<https://reviews.apache.org/r/4091/#comment12183>
throw an Exception here
trunk/src/org/apache/pig/scripting/jruby/JrubyAccumulatorEvalFunc.java
<https://reviews.apache.org/r/4091/#comment12185>
I would override OutputSchema instead. In the case of a bag you are losing
the schema of the Tuple inside the bag.
trunk/src/org/apache/pig/scripting/jruby/JrubyAlgebraicEvalFunc.java
<https://reviews.apache.org/r/4091/#comment12212>
If you can add the name of the function in the message that would help with
debugging.
trunk/src/org/apache/pig/scripting/jruby/JrubyAlgebraicEvalFunc.java
<https://reviews.apache.org/r/4091/#comment12186>
you could factor out the function calling and bag instantiated code.
trunk/src/org/apache/pig/scripting/jruby/JrubyAlgebraicEvalFunc.java
<https://reviews.apache.org/r/4091/#comment12215>
some of the ruby calling logic and exception catching could be factored in
this class
trunk/src/org/apache/pig/scripting/jruby/JrubyAlgebraicEvalFunc.java
<https://reviews.apache.org/r/4091/#comment12187>
Override outputSchema instead like you did for EvalFunc
trunk/src/org/apache/pig/scripting/jruby/JrubyEvalFunc.java
<https://reviews.apache.org/r/4091/#comment12213>
private
trunk/src/org/apache/pig/scripting/jruby/JrubyEvalFunc.java
<https://reviews.apache.org/r/4091/#comment12189>
check for the varargs case
trunk/src/org/apache/pig/scripting/jruby/JrubyEvalFunc.java
<https://reviews.apache.org/r/4091/#comment12190>
you should throw an exception here
trunk/src/org/apache/pig/scripting/jruby/JrubyFilterFunc.java
<https://reviews.apache.org/r/4091/#comment12217>
It is very similar to JrubyEvalFunc. You can factor out more of this code.
trunk/src/org/apache/pig/scripting/jruby/JrubyFilterFunc.java
<https://reviews.apache.org/r/4091/#comment12216>
private
trunk/src/org/apache/pig/scripting/jruby/JrubyScriptEngine.java
<https://reviews.apache.org/r/4091/#comment12218>
turn these comments into javadoc
trunk/src/org/apache/pig/scripting/jruby/JrubyScriptEngine.java
<https://reviews.apache.org/r/4091/#comment12192>
this will strip the inner schema of bags and tuples. See comments about
overriding outputSchema
trunk/src/org/apache/pig/scripting/jruby/JrubyUtils.java
<https://reviews.apache.org/r/4091/#comment12219>
please add javadoc to public methods
trunk/src/org/apache/pig/scripting/jruby/PigJrubyLibrary.java
<https://reviews.apache.org/r/4091/#comment12193>
add javadoc
trunk/src/org/apache/pig/scripting/jruby/RubyBagIterator.java
<https://reviews.apache.org/r/4091/#comment12194>
please make it private and at the top
trunk/src/org/apache/pig/scripting/jruby/RubyDataBag.java
<https://reviews.apache.org/r/4091/#comment12195>
add javadoc
trunk/src/org/apache/pig/scripting/jruby/RubyDataBag.java
<https://reviews.apache.org/r/4091/#comment12196>
make it private
1 is a perfectly good value. it does not need to be unique.
Changing the serialVersionUID value will make the serialized data format
incompatible.
trunk/src/org/apache/pig/scripting/jruby/RubyDataBag.java
<https://reviews.apache.org/r/4091/#comment12197>
in this case addAll would replace the current content. Is it intended ?
trunk/src/org/apache/pig/scripting/jruby/RubyDataBag.java
<https://reviews.apache.org/r/4091/#comment12220>
want to remove it ?
trunk/src/org/apache/pig/scripting/jruby/RubyDataBag.java
<https://reviews.apache.org/r/4091/#comment12198>
internalDB.add(JrubyUtils.rubyToPig((RubyArray)elem));
trunk/src/org/apache/pig/scripting/jruby/RubyDataBag.java
<https://reviews.apache.org/r/4091/#comment12199>
you could call runtime.getClass("DataBag").getClass("BagIterator") once at
the beginning
trunk/src/org/apache/pig/scripting/jruby/RubyDataBag.java
<https://reviews.apache.org/r/4091/#comment12200>
add spaces around operator and prefer the normalized "if" with { }
if () {
//
}
trunk/src/org/apache/pig/scripting/jruby/RubyDataBag.java
<https://reviews.apache.org/r/4091/#comment12201>
flattten with 3 Ts? It's going to be really flat.
trunk/src/org/apache/pig/scripting/jruby/RubyDataByteArray.java
<https://reviews.apache.org/r/4091/#comment12202>
make this a javadoc comment
trunk/src/org/apache/pig/scripting/jruby/RubyDataByteArray.java
<https://reviews.apache.org/r/4091/#comment12203>
private
trunk/src/org/apache/pig/scripting/jruby/RubyDataByteArray.java
<https://reviews.apache.org/r/4091/#comment12204>
space after comma
trunk/src/org/apache/pig/scripting/jruby/RubyDataByteArray.java
<https://reviews.apache.org/r/4091/#comment12205>
give meaningful names to parameters
trunk/test/e2e/pig/udfs/ruby/scriptingudfs.rb
<https://reviews.apache.org/r/4091/#comment12221>
What about keeping only the one parameter ouptutSchema and
outputSchemaFunctions?
Users would just use the function right before defining the UDF.
outputSchema "word:chararray"
def concat input, input2
return input + input2
end
trunk/test/org/apache/pig/test/TestUDF.java
<https://reviews.apache.org/r/4091/#comment12222>
if this value is known you may as well compare each value to the known
constant instead of comparing to the previous value.
- Julien
On 2012-02-28 22:02:52, Jonathan Coveney wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/4091/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2012-02-28 22:02:52)
bq.
bq.
bq. Review request for pig, Julien Le Dem and Alan Gates.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. This is the reviewboard for the following:
bq.
bq. https://issues.apache.org/jira/browse/PIG-2317
bq.
bq.
bq. This addresses bug PIG-2317.
bq. https://issues.apache.org/jira/browse/PIG-2317
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. trunk/src/org/apache/pig/AccumulatorEvalFunc.java PRE-CREATION
bq. trunk/src/org/apache/pig/AlgebraicEvalFunc.java PRE-CREATION
bq. trunk/src/org/apache/pig/scripting/ScriptEngine.java 1294756
bq. trunk/src/org/apache/pig/scripting/jruby/JrubyAccumulatorEvalFunc.java
PRE-CREATION
bq. trunk/src/org/apache/pig/scripting/jruby/JrubyAlgebraicEvalFunc.java
PRE-CREATION
bq. trunk/bin/pig 1294756
bq. trunk/ivy.xml 1294756
bq. trunk/ivy/libraries.properties 1294756
bq. trunk/pigudf.rb PRE-CREATION
bq. trunk/src/org/apache/pig/scripting/jruby/JrubyEvalFunc.java PRE-CREATION
bq. trunk/src/org/apache/pig/scripting/jruby/JrubyFilterFunc.java
PRE-CREATION
bq. trunk/src/org/apache/pig/scripting/jruby/JrubyScriptEngine.java
PRE-CREATION
bq. trunk/src/org/apache/pig/scripting/jruby/JrubyUtils.java PRE-CREATION
bq. trunk/src/org/apache/pig/scripting/jruby/PigJrubyLibrary.java
PRE-CREATION
bq. trunk/src/org/apache/pig/scripting/jruby/RubyBagIterator.java
PRE-CREATION
bq. trunk/src/org/apache/pig/scripting/jruby/RubyDataBag.java PRE-CREATION
bq. trunk/src/org/apache/pig/scripting/jruby/RubyDataByteArray.java
PRE-CREATION
bq. trunk/src/org/apache/pig/scripting/jruby/RubyExampleA.java PRE-CREATION
bq. trunk/src/org/apache/pig/scripting/jruby/RubyExampleB.java PRE-CREATION
bq. trunk/test/e2e/pig/build.xml 1294756
bq. trunk/test/e2e/pig/conf/default.conf 1294756
bq. trunk/test/e2e/pig/drivers/TestDriverPig.pm 1294756
bq. trunk/test/e2e/pig/tests/nightly.conf 1294756
bq. trunk/test/e2e/pig/udfs/ruby/morerubyudfs.rb PRE-CREATION
bq. trunk/test/e2e/pig/udfs/ruby/scriptingudfs.rb PRE-CREATION
bq. trunk/test/org/apache/pig/test/TestUDF.java 1294756
bq. trunk/test/org/apache/pig/test/utils/HelperEvalFuncUtils.java
PRE-CREATION
bq.
bq. Diff: https://reviews.apache.org/r/4091/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq. e2e tests can be run as follows:
bq.
bq. ant -Dharness.hadoop.home=<path_to_hadoop>
-Dharness.old.pig=<path_to_old_pig> test-e2e-deploy-local
bq. ant -Dharness.hadoop.home=<path_to_hadoop>
-Dharness.old.pig=<path_to_old_pig> -Dtests.to.run="-t RubyUDFs" test-e2e-local
bq.
bq.
bq. Thanks,
bq.
bq. Jonathan
bq.
bq.
> Ruby/Jruby UDFs
> ---------------
>
> Key: PIG-2317
> URL: https://issues.apache.org/jira/browse/PIG-2317
> Project: Pig
> Issue Type: New Feature
> Reporter: Jacob Perkins
> Assignee: Jonathan Coveney
> Priority: Minor
> Attachments: PIG-2317-8.patch, PigUdf.rb, PigUdf.rb,
> jruby_scripting.patch, jruby_scripting_2_real.patch, jruby_scripting_3.patch,
> jruby_scripting_4.patch, jruby_scripting_5.patch, jruby_scripting_6.patch,
> jruby_scripting_7.patch, pigjruby.rb, pigjruby.rb, pigjruby.rb, pigudf.rb
>
>
> It should be possible to write UDFs in Ruby. These UDFs will be registered in
> the same way as python and javascript UDFs.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira