Re: [Boston.pm] Inline::Java
Jerrad,

Thanks for pointing me to these projects. Unfortunately, Algorithm::CRF (http://search.cpan.org/~clsung/Algorithm-CRF-0.04/lib/Algorithm/CRF.pm) was last updated in 2006. I haven't tested it, but it appears to be alpha quality, judging by its stub documentation. It also provides a wrapper around a C++ library rather than a pure Perl implementation.

It seems that any solution for using CRFs within Perl will rely on an implementation written in another language. I'm inclined to use the Mallet Java library and connect it to my Perl code with Inline::Java. Is there any reason to believe that connecting to a Java library with Inline::Java will be more difficult than interfacing with a C++ library or an R library? I haven't tried Statistics::R or a C++ library with a Perl shim, so I'd be curious whether anyone can compare them to Inline::Java.

Thanks,
David

On Wed, May 29, 2013 at 4:52 PM, Jerrad Pierce belg4...@pthbb.org wrote:
> Searching for a few of the broad categories at http://docs.scipy.org/doc/scipy-0.12.0/reference/, I am able to quickly turn up PDL equivalents for many of the categories of things SciPy does. YMMV.
> (CRF is not in this list, natively, nor is it something one would expect an image processing library to have developed initially... and of course somebody has to write the first implementation ;-)
> FWIW, there's Algorithm::CRF, and CRF++'s own Perl shim: http://code.google.com/p/crfpp/source/browse/trunk/perl/

___
Boston-pm mailing list
Boston-pm@mail.pm.org
http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] Inline::Java
Thanks William,

My biggest concern with Inline::Java is why it isn't more widely known and used. For data analysis, there are two main choices: Python and the JVM stack. CPAN is nice, but it just doesn't have the data libraries these platforms do. You would expect Inline::Java to be frequently touted as a savior of Perl, letting you write code that uses both powerful Java libraries and CPAN while keeping the benefits of Perl.

Are the reasons Inline::Java isn't more widely known and used cultural (e.g. Perl developers have an aversion to Java, most Perl developers don't need data analysis, Inline::Java is badly marketed)? Or are there technical reasons why Inline::Java is less attractive than it first appears? Would it make sense to use Inline::Java and Perl for a greenfield project, or would it be better to just use Java or another JVM language such as Scala or Jython when Java libraries are a major component?

--
David

On Tue, May 28, 2013 at 10:42 PM, William Cox mydimens...@gmail.com wrote:
> I've used Inline::Java for two main projects in my $work: one is a wrapper around the Maven libraries to introspect POMs, the other is a long-running daemon using an internal Java library. I've found it to be extremely reliable. To help maintain performance, keep in mind the following:
>
> 1. Use the JNI interface so that all Perl-Java communication happens in the same process. The other mode runs a separate JVM process and communicates with it over a socket.
> 2. More importantly, plan your calls from Perl to Java (method calls, property lookups, object construction, etc.) very carefully. I've found that the time to communicate even simple data can be much longer than the time taken to process that data (too much context switching).
>
> Point (2) was exemplified for me when wrapping our internal Java library. I was doing lots of back-and-forth. After implementing a Java-side layer that made the same calls I was doing from Perl, I saw a significant speed boost.
>
> Hope this helps.
>
> On Tue, May 28, 2013 at 8:59 PM, David Larochelle da...@larochelle.name wrote:
>> Does anyone have experience with Inline::Java? I did some basic tests with it, and it seems to work, but I'm concerned about its reliability in production use.
>>
>> The use case is a large data processing system implemented in Perl. I'd like to add an algorithm (Conditional Random Fields) that's not implemented on CPAN but is available as a Java library. I'm hoping to use Inline::Java to allow this algorithm to be called from the existing Perl code. An alternative is to execute the Java code as a separate process and pass data to the Perl process through pipes and files.
>>
>> In my case, the Java code will be run around 50,000 times a day to process incoming data, so I want to make sure that whatever solution I use is stable and low-overhead.
>>
>> Thanks,
>> David
>
> --
> William Cox
> email: mydimens...@gmail.com
> sgp.cm/mydimension
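William's second point is about crossing the Perl/Java boundary as rarely as possible. The sketch below is pure Perl and does not use Inline::Java at all: a hypothetical `Bridge` package stands in for the language boundary and only counts crossings, so a chatty one-call-per-item loop can be compared with a single batched call into a (hypothetical) Java-side helper that does the looping itself. The names `Bridge`, `process_one` and `process_batch` are illustrative assumptions, not part of any library.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the Perl/Java boundary. Each call counts as
# one round trip; with Inline::Java each would be a method call,
# property lookup or object construction.
package Bridge;

sub new { my ($class) = @_; return bless { crossings => 0 }, $class }

# Chatty pattern: one item per crossing.
sub process_one {
    my ($self, $item) = @_;
    $self->{crossings}++;
    return length $item;    # placeholder for the real work on the far side
}

# Batched pattern: the loop runs on the far side, so the inputs and the
# results cross the boundary only once.
sub process_batch {
    my ($self, $items) = @_;
    $self->{crossings}++;
    return [ map { length $_ } @$items ];
}

sub crossings { $_[0]{crossings} }

package main;

my @records = ('alpha', 'beta', 'gamma', 'delta');

my $chatty   = Bridge->new;
my @per_item = map { $chatty->process_one($_) } @records;

my $batched = Bridge->new;
my $results = $batched->process_batch(\@records);

# prints "per-item: 4 crossings, batched: 1 crossing"
printf "per-item: %d crossings, batched: %d crossing\n",
    $chatty->crossings, $batched->crossings;
```

Both paths compute the same results; only the number of boundary crossings differs, which is where William reports the time going.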
Re: [Boston.pm] Inline::Java
For general data analysis, PDL, Statistics::R or PDL::R::math seem like more logical choices than the languages you mentioned.
Re: [Boston.pm] Inline::Java
On Wed, May 29, 2013 at 01:41:56PM -0400, David Larochelle wrote:
> My biggest concern with Inline::Java is why it isn't more widely known and used.

I would not read too much into the fact that a particular solution is not popular. I have used, and continue to use, such tools when appropriate.

Having said that, most environments and programmers tend to be monolingual and prefer solutions that don't cross languages. It's an easy and safe decision to make, whereas the alternative requires careful thought and good reasons.

-Gyepi

--
Kites rise highest against the wind, not with it. --Winston Churchill
Re: [Boston.pm] Inline::Java
My suspicion is that it's twofold:

1. Java doesn't fit the Perl mold.
2. Inline::Java can be tricky to set up (despite the docs making it appear easy).

On (1), the development philosophies of Perl and Java tend to be so different that each side almost writes the other off.

In my experience with (2): for example, my BEGIN block (for the Maven wrapper) does the following:

1. Find the installation of the mvn command-line utility, then add all of its jars to $ENV{CLASSPATH}.
2. Map Java class names to constant subs in the current package, each returning the Perl package that stands in for that Java class.

Then I execute the 'use Inline (args);' line. Here is a brief example of what I'm doing:

    use Cwd ();
    use Package::Stash;

    our ($maven_home, @class, %class);

    BEGIN {
        require Inline::Java;    # makes Inline::Java::java2perl available here

        chomp(my $mvn = `which mvn`);
        $mvn or die "Unable to locate maven installation - is it installed?\n";
        $maven_home = Cwd::realpath($mvn);
        $maven_home =~ s{/bin/mvn$}{};
        $ENV{CLASSPATH} = join ':',
            glob("$maven_home/lib/*.jar"), glob("$maven_home/boot/*.jar");

        @class = (
            'java.io.File',
            'java.io.FileInputStream',
            # ...
            'org.apache.maven.Maven',
            # ...
        );

        # short class name => Perl package standing in for the Java class
        %class = map {
            (/\.(\w+)$/ => Inline::Java::java2perl(__PACKAGE__, $_))
        } @class;

        # this will create subs that simply return the full class of the java
        # package
        my $stash = Package::Stash->new(__PACKAGE__);
        while (my ($C, $P) = each %class) {
            my $symbol = "&$C";
            next if $stash->has_symbol($symbol);
            $stash->add_symbol($symbol, sub () { $P });
        }
    }

    use Inline (
        Java            => 'STUDY',
        EXTRA_JAVA_ARGS => "-Dmaven.home=$maven_home "
                         . "-Dclassworlds.conf=$maven_home/bin/m2.conf "
                         . "-Xms2048m -Xmx2048m",
        STUDY           => \@class,
        AUTOSTUDY       => 1,
        JNI             => 1,
        CLASSPATH       => $ENV{CLASSPATH},
        NAME            => __PACKAGE__,
        DIRECTORY       => ((__FILE__ =~ m{^(.*)\.pm$})[0] . '/_Inline'),
    );

All of this makes for a rather non-trivial use case for Java inside Perl. I spent a lot of time finding the right setup for my needs. And that tends to be common for Java in general (why engineer it once when you can over-engineer it twice).
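Step 2 of William's BEGIN block installs one constant sub per Java class so that a short name like `File` can be used in place of a long generated package name. The sketch below shows the same aliasing idea with core Perl only (plain glob assignment instead of Package::Stash, and no Inline::Java). The `My::Wrapper::java::io::...` package names are hypothetical placeholders for whatever `Inline::Java::java2perl` would return; the helper `install_class_constants` is likewise illustrative.

```perl
use strict;
use warnings;

# Short class name => Perl package standing in for the Java class.
# Hypothetical names; in William's code these come from
# Inline::Java::java2perl(__PACKAGE__, $java_class).
my %class = (
    File            => 'My::Wrapper::java::io::File',
    FileInputStream => 'My::Wrapper::java::io::FileInputStream',
);

# Install a constant sub per short name in the target package, skipping
# names that already exist (the equivalent of the has_symbol check).
sub install_class_constants {
    my ($pkg, $map) = @_;
    no strict 'refs';
    for my $short (sort keys %$map) {
        my $full = $map->{$short};
        next if defined &{"${pkg}::$short"};
        *{"${pkg}::$short"} = sub () { $full };
    }
}

install_class_constants('My::Wrapper', \%class);

# prints "File => My::Wrapper::java::io::File"
print 'File => ', My::Wrapper::File(), "\n";
```

The reason William does this inside BEGIN is that Perl only treats `File->new(...)` as `File()->new(...)` if a sub named `File` already exists when that line is compiled; subs installed at run time, as in this standalone sketch, are reachable only with explicit parentheses.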
Re: [Boston.pm] Inline::Java
Jerrad,

I haven't used Statistics::R, but it's probably worth considering. I suppose R also merits mention as a language for data analysis, in addition to Python and Java/JVM.

With regard to PDL, for most tasks you want to be able to easily apply a known algorithm to your data without having to worry about writing it from scratch. PDL may or may not be a better platform than Python's NumPy, but there are sophisticated libraries such as SciPy and PyBrain built on top of NumPy, and no real equivalent for Perl and PDL.

--
David
Re: [Boston.pm] Inline::Java
William,

Thank you for offering your thoughts and experiences. I feel better about using Inline::Java. My application is large enough that porting it to another language would be nontrivial, and Inline::Java seems like a better choice than alternatives such as calling Java with system() or reimplementing the algorithms.

--
David
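David's original post mentions the alternative of running the Java code as a separate process and passing data over pipes. If that route were taken, one way to keep per-record overhead low at around 50,000 runs a day (roughly one every 1.7 seconds on average) is to start the worker once and stream records to it, rather than launching a new process per record. The sketch below uses the core IPC::Open2 module. It assumes a line-oriented protocol (one record in, one line out) and a worker that flushes after every reply, since otherwise both sides can block waiting on each other. The worker here is a stand-in child perl that upper-cases each line; the Java command in the comment is hypothetical.

```perl
use strict;
use warnings;
use IPC::Open2 qw(open2);

# Stand-in worker: a child perl that upper-cases each line and flushes
# after every line. A real setup would run a hypothetical line-oriented
# Java program instead, e.g. ('java', '-cp', 'crf.jar', 'CrfWorker'),
# which must also flush its output after each reply.
my @worker_cmd = ($^X, '-pe', 'BEGIN { $| = 1 } $_ = uc $_;');

# Start the worker once, so its startup cost is paid a single time.
# open2 turns on autoflush for the handle we write to.
my $pid = open2(my $from_worker, my $to_worker, @worker_cmd);

# Send one record and wait for exactly one line back.
sub classify {
    my ($record) = @_;
    print {$to_worker} "$record\n";
    my $reply = <$from_worker>;
    die "worker exited unexpectedly\n" unless defined $reply;
    chomp $reply;
    return $reply;
}

my @answers = map { classify($_) } qw(first second third);

# Closing the worker's stdin lets it see EOF and exit; then reap it.
close $to_worker;
waitpid $pid, 0;

print "$_\n" for @answers;
```

Compared with Inline::Java in JNI mode, this keeps the JVM in its own process, so a crash there cannot take down the Perl process, at the cost of serializing every record as text.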