I've found two ways to use R from Java. Both are a bit of a hassle,
depending on your setup.

JRI is a JNI bridge to R, so you don't need R installed on the machine for it
to work. You do need to ship a set of DLLs with it, though; the best way I've
found is to bundle the DLLs in the .jar and copy them to the local directory
at runtime (copying them elsewhere and changing java.library.path won't
work). JRI is also missing some features, especially support for multiple
environments/sessions. I don't have a plan for the R/Pig integration worked
out yet, but sessions might be useful.
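
To make the bundle-and-copy idea concrete, here is a rough sketch of the
extraction step. It is not taken from any existing code: the class name
NativeLibExtractor and the resource path "/native/jri.dll" are made up for
illustration, and it only covers copying a bundled file out of the jar into
the working directory and loading it by absolute path.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class NativeLibExtractor {

    // Copies a stream (e.g. a DLL bundled in the jar) into targetDir and
    // returns the path of the file that was written.
    public static Path extract(InputStream in, String fileName, Path targetDir)
            throws IOException {
        Files.createDirectories(targetDir);
        Path target = targetDir.resolve(fileName);
        Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        return target;
    }

    // Looks up a bundled resource on the classpath and extracts it.
    public static Path extractResource(String resourcePath, Path targetDir)
            throws IOException {
        String fileName = resourcePath.substring(resourcePath.lastIndexOf('/') + 1);
        try (InputStream in = NativeLibExtractor.class.getResourceAsStream(resourcePath)) {
            if (in == null) {
                throw new IOException("Resource not found on classpath: " + resourcePath);
            }
            return extract(in, fileName, targetDir);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical resource location inside the jar; adjust to your layout.
        Path workingDir = Paths.get("").toAbsolutePath();
        Path lib = extractResource("/native/jri.dll", workingDir);
        // System.load takes an absolute file path rather than a library name.
        System.load(lib.toString());
    }
}
```

If several DLLs depend on each other, each one would be extracted the same
way before the first load call.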

The other method is through Rserve, which is both a Java package and an
application; the application sets up an R server that by default allows
only a single connection from a local machine (if you wanted, each
map-reduce job could connect to the same R server/instance, but I don't
think that's useful). To use it, you need R installed and Rserve running.
In EMR this would be possible, as it does have R, so you would just need a
bootstrap script to start Rserve. It is probably also possible to start
Rserve from within Java, but that's much trickier.

I prefer the first method, as it eliminates the requirement of having R
installed; however, I'm hoping to implement both (for Rserve, I'll require
that the server is already started, and maybe include an option for
connecting to a specific server).
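
As a sketch of what "require that the server is already started" could look
like, the snippet below just checks whether anything accepts TCP connections
at a given host and port before going further. The class name RserveProbe is
made up, and 6311 is assumed to be the port because it is Rserve's usual
default. It does not speak the Rserve protocol, so it can't tell Rserve apart
from any other listener, and on a single-connection server the probe briefly
occupies that connection.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class RserveProbe {

    // Assumed default; pass a different port if the server was configured otherwise.
    public static final int DEFAULT_RSERVE_PORT = 6311;

    // Returns true if a TCP connection to host:port succeeds within the timeout.
    public static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts and unreachable hosts.
            return false;
        }
    }

    public static void main(String[] args) {
        // Optional arguments let the caller point at a specific server.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : DEFAULT_RSERVE_PORT;
        if (!isListening(host, port, 2000)) {
            System.err.println("No server accepting connections at " + host + ":" + port);
            System.exit(1);
        }
        System.out.println("Something is listening at " + host + ":" + port);
    }
}
```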

I don't have a clear vision of how R/Pig will interact; it will have to be
something different from Python or JScript, but I don't know how different.
I want to just scratch out something basic and then try to evolve it from
there.

I'll go ahead and submit that Jira.

Thanks,

- Connor


On Tue, Jan 22, 2013 at 4:44 PM, Jonathan Coveney <jcove...@gmail.com> wrote:

> Ahhh, I see. That makes sense. Sadly, this isn't possible in the current
> version of Pig, but this is a really good reason to want to do it. Can you
> make a ticket about making it possible to plug in ScriptingEngines without
> having to make a code change to Pig? I think this would be useful for this
> reason.
>
> That said, if you dig down into how these implementations work, they are
> based on EvalFunc's, so manually making UDF's to do it is an annoyance, but
> functionally quite similar.
>
> Question about R: is there a JVM implementation, or are you shelling out?
>
>
> 2013/1/22 Connor Woodson <cwoodson....@gmail.com>
>
> > I'm starting work on an R scripting engine; I'm not entirely sure how it
> > will be used, but I know that there have been attempts to get R working
> > with MapReduce / EMR and I thought it would be cool to do that through
> > Pig. (One fun use case might be to generate plots/graphs during the MR
> > job (then do something with them))
> >
> > The easy answer for how to get this working with Pig is to just stick new
> > scripting engines with the existing ones and update the ScriptingEngine
> > enum to include those; however, I would like to use this in EMR which
> > doesn't update its software regularly and so I was hoping there was some
> > hook to get this scripting engine called, but it looks like it'll just
> > have to be used for UDFs for now.
> >
> > If a change is going to be made, I think what would be helpful is a
> > change in how the ScriptingEngine decides which subclass to call; right
> > now (from what I can tell) it will only look at the file suffix or the #!
> > first line of the script and try and match those with its internal list.
> > Maybe allow an annotation like
> > #@ <FQCN of a ScriptingEngine>
> > as the first line of a script to force Pig to use a specific engine.
> >
> > - Connor
> >
> >
> > On Tue, Jan 22, 2013 at 3:56 PM, Jonathan Coveney <jcove...@gmail.com>
> > wrote:
> >
> > > So, something like this is not currently possible, but I think it would
> > > be possible to expose a set of interfaces that would make this possible.
> > > That said, why is this desirable? Is your goal to override one of the
> > > existing SE's, or something? I could imagine reworking things so that
> > > anyone can register an arbitrary SE, and then we can implement the
> > > current SE's in terms of that interface. That said, I'm not sure of a
> > > compelling reason to do this, and would love a use case.
> > >
> > > I worked on the JRuby implementation and reviewed the Groovy one and
> > > think that we could be doing a lot more with scripting languages, so you
> > > have my attention.
> > >
> > >
> > > 2013/1/21 Connor Woodson <cwoodson....@gmail.com>
> > >
> > > > I want to write a custom scripting engine and I would like to not have
> > > > to modify the enum in ScriptingEngine.java to get it to work both in
> > > > the 'register' command for UDFs, but also for embedded scripts. From
> > > > what I can tell, the former is possible by passing in a FQCN to the
> > > > register command instead of one of the keywords; however, I can't tell
> > > > if it is possible to get Pig to run my scripting engine when I pass it
> > > > a non-pig file (e.g. you pass it a .py file and it runs the jython
> > > > scripting engine). So is this second use possible, or (for now) can
> > > > custom SE's only be used for UDFs?
> > > >
> > > > (I'll admit here that I don't understand what I meant in the end of my
> > > > previous email; feel free to ignore it).
> > > >
> > > > Thanks,
> > > >
> > > > - Connor
> > > >
> > > >
> > > > On Mon, Jan 21, 2013 at 5:04 PM, Jonathan Coveney <jcove...@gmail.com>
> > > > wrote:
> > > >
> > > > > Can you describe at a higher level what you have in mind?
> > > > >
> > > > >
> > > > > 2013/1/21 Connor Woodson <cwoodson....@gmail.com>
> > > > >
> > > > > > Is there a way to get Pig to use your custom scripting engine
> > > > > > without having to modify ScriptingEngine.java and placing it in the
> > > > > > enum? It looks like it's possible with enums, but what about for
> > > > > > embedding pig? (as in how Pig can run python scripts).
> > > > > >
> > > > > > - Connor
> > > > > >
> > > > > >
> > > > > > On Mon, Jan 21, 2013 at 1:59 PM, Daniel Dai <da...@hortonworks.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Pig currently supports jython, jruby, javascript and groovy. If you
> > > > > > > need to write another scripting engine, extend ScriptEngine.
> > > > > > >
> > > > > > > Here are some references:
> > > > > > > 1. http://www.slideshare.net/daijy/pig-programming-is-more-fun-new-features-in-pig
> > > > > > > (pp 24, 25)
> > > > > > > 2. Groovy UDF: https://issues.apache.org/jira/browse/PIG-2763
> > > > > > > 3. JRuby UDF: https://issues.apache.org/jira/browse/PIG-2317
> > > > > > > 4. Javascript UDF: https://issues.apache.org/jira/browse/PIG-1794
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Daniel
> > > > > > >
> > > > > > > On Fri, Jan 18, 2013 at 6:42 PM, Connor Woodson <cwoodson....@gmail.com>
> > > > > > > wrote:
> > > > > > > > Is there any support for a custom scripting engine, to allow UDFs
> > > > > > > > to be written in a different language / embed pig in another
> > > > > > > > language?
> > > > > > > >
> > > > > > > > - Connor
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
