Re: [Boston.pm] Inline::Java

2013-06-03 Thread David Larochelle
Jerrad,

Thanks for pointing me to these projects. Unfortunately,
Algorithm::CRF <http://search.cpan.org/~clsung/Algorithm-CRF-0.04/lib/Algorithm/CRF.pm>
was last updated in 2006. I haven't tested it, but it appears to be of alpha
quality, judging by its stub documentation. Also, it provides a wrapper
around a C++ library rather than a pure Perl implementation.

It seems like any solution to use CRFs within Perl will use an
implementation written in another language. I'm inclined to use the Mallet
Java library and connect it to my Perl code with Inline::Java. Is there any
reason to believe that connecting to a Java library with Inline::Java will
be more difficult than interfacing with a C++ library or an R library?

I haven't tried using Statistics::R or a C++ library with a Perl shim. So
I'd be curious if anyone can compare these to Inline::Java.

Thanks,

David

On Wed, May 29, 2013 at 4:52 PM, Jerrad Pierce <belg4...@pthbb.org> wrote:

 With regard to PDL, for most tasks you want to be able to easily apply a
 known algorithm to your data and not have to worry about writing it from
 scratch. PDL may or may not be a better platform than Python's numpy, but
 there are sophisticated libraries such as SciPy and PyBrain built on top
 of numpy and no real equivalent for Perl and PDL.

 Searching for a few of the broad categories listed at
 http://docs.scipy.org/doc/scipy-0.12.0/reference/ I am quickly able to
 turn up PDL equivalents for many of the categories of things SciPy does. YMMV.
 (CRF is not on this list natively, nor is it something one would
 expect an image processing library to have developed initially...
 and of course somebody has to write the first implementation ;-)

 FWIW, there's Algorithm::CRF and CRF++'s own Perl shim:
 http://code.google.com/p/crfpp/source/browse/trunk/perl/


___
Boston-pm mailing list
Boston-pm@mail.pm.org
http://mail.pm.org/mailman/listinfo/boston-pm


Re: [Boston.pm] Inline::Java

2013-05-29 Thread David Larochelle
Thanks William,

My biggest concern with Inline::Java is why it isn't more widely known and
used. For data analysis, there are two main choices: Python and the JVM
stack. CPAN is nice, but it just doesn't have the data libraries these
platforms do. It would seem like Inline::Java would frequently be touted as
a savior of Perl, letting you write code that uses both powerful Java
libraries and CPAN while still having the benefits of Perl.

Are the reasons why Inline::Java isn't more widely known and used cultural?
E.g. Perl developers have an aversion to Java, most Perl developers don't
need data analysis, Inline::Java is badly marketed, etc. Or are there
technical reasons why Inline::Java is less attractive than it first
appears?

Would it make sense to use Inline::Java and Perl for a green field project
or would it be better to just use Java or other JVM languages such as Scala
or Jython when Java libraries are a major component?

--


David

On Tue, May 28, 2013 at 10:42 PM, William Cox <mydimens...@gmail.com> wrote:

 I've used Inline::Java for 2 main projects in my $work: one is a
 wrapper around the Maven libraries to introspect POMs, the other is a
 long-running daemon using an internal Java library. I've found it to be
 extremely reliable. To help maintain performance, keep in mind the
 following:

 1. Use the JNI interface so that all Perl-Java communication happens
 in the same process. The other mode runs a separate process and
 communicates over a socket.
 2. More importantly, plan your calls from Perl to Java (method calls,
 property lookups, object construction, etc.) very carefully. I've
 found that communicating even simple data can take much longer than
 processing that data (too much context switching).

 (2) was exemplified for me when wrapping our internal Java
 library. I was doing lots of back-and-forth. After implementing a
 Java-side layer that made the same calls I had been making from Perl,
 I saw a significant speed boost.
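William's second point, keeping the number of Perl-to-Java calls low, can be sketched in plain Perl. This assumes a hypothetical Java-side facade with one method that accepts a whole array of records; here that method is replaced by a Perl stub, `score_batch`, which only counts how many times the language boundary would be crossed. The helper names and the batch size of 1,000 are illustrative choices, not part of Inline::Java.

```perl
use strict;
use warnings;

# Split a list into consecutive array refs of at most $size elements each.
sub batches {
    my ($size, @items) = @_;
    die "batch size must be positive\n" unless $size > 0;
    my @out;
    push @out, [ splice(@items, 0, $size) ] while @items;
    return @out;
}

# Hypothetical stand-in for ONE call into a Java-side facade that processes
# a whole array per call. It just returns string lengths and counts calls.
my $crossings = 0;
sub score_batch {
    my ($batch) = @_;
    $crossings++;    # one Perl-to-Java boundary crossing per batch
    return [ map { length($_) } @$batch ];
}

my @records = map { "record-$_" } 1 .. 50_000;    # roughly one day's volume
my @scores  = map { @{ score_batch($_) } } batches(1_000, @records);

print scalar(@scores), " records scored with $crossings crossings\n";
# prints "50000 records scored with 50 crossings"
```

With a real facade, the per-record loop moves to the Java side, so the fixed cost of each crossing is paid once per batch instead of once per record.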

 Hope this helps.

 On Tue, May 28, 2013 at 8:59 PM, David Larochelle <da...@larochelle.name> wrote:
  Does anyone have experience with Inline::Java?
 
  I did some basic tests with it and it seems to work, but I'm concerned
  about its reliability in production use.
 
  The use case is a large data processing system implemented in Perl. I'd
  like to add an algorithm (Conditional Random Fields) that's not
  implemented on CPAN but is available as a Java library.
  I'm hoping to use Inline::Java to allow this algorithm to be called from
  the existing Perl code.
  An alternative is to execute the Java code as a separate process and pass
  data to the Perl process with pipes and files.
 
  In my case the Java code will be run around 50,000 times a day to process
  incoming data. So I want to make sure that whatever solution I use is
  stable and low overhead.
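For the separate-process alternative, much of the overhead at 50,000 runs a day would likely come from starting a new JVM for each run. One way around that, sketched here with the core IPC::Open2 module, is to start a single worker once and stream records to it over pipes. The line-oriented protocol (one record in, one result line out) is an assumption, and the worker is a Perl one-liner standing in for a hypothetical Java program.

```perl
use strict;
use warnings;
use IO::Handle;
use IPC::Open2 qw(open2);

# Stand-in worker: a Perl one-liner that echoes each line upper-cased, with
# autoflush on so replies are not held in a buffer. A real worker would be a
# long-running Java program (hypothetical), e.g.
#   ('java', '-cp', $ENV{CLASSPATH}, 'HypotheticalCrfWorker')
my @worker_cmd = ($^X, '-ne', 'BEGIN { $| = 1 } print uc($_)');

# Start the worker once; every record then reuses the same process.
my $pid = open2(my $from_worker, my $to_worker, @worker_cmd);
$to_worker->autoflush(1);

# Protocol assumption: one record per line in, exactly one result line out.
sub process_record {
    my ($record) = @_;
    print {$to_worker} "$record\n";
    defined(my $reply = <$from_worker>) or die "worker exited unexpectedly\n";
    chomp $reply;
    return $reply;
}

my @results = map { process_record($_) } qw(alpha beta gamma);
print "@results\n";    # prints "ALPHA BETA GAMMA"

# Closing the worker's input sends EOF so it exits; then reap it.
close $to_worker;
waitpid($pid, 0);
```

The same pattern should carry over to a Java worker, provided it flushes its output after every reply; otherwise the parent blocks waiting for a line that is still sitting in the worker's buffer.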
 
  Thanks,
 
  David
 



 --
 William Cox

 email: mydimens...@gmail.com
 sgp.cm/mydimension

 -BEGIN GEEK CODE BLOCK-
 Version: 3.1
 GCS d- s+:+() a C++()$ UBLC(++)$
 P+++()$ L++(+++)$ !E--- W++(+++)$
 !N !o? K--? !w--- !O M++ !V- PS-(--)@ PE+()
 Y+ !PGP t++ !5 X+++ !R tv(+) b+++
 DI+(++) D+() G e h--- r+++ y+++
 --END GEEK CODE BLOCK--




Re: [Boston.pm] Inline::Java

2013-05-29 Thread Jerrad Pierce
For general data analysis, PDL, Statistics::R or PDL::R::math
seem like more logical choices than the languages you mentioned.



Re: [Boston.pm] Inline::Java

2013-05-29 Thread Gyepi SAM
On Wed, May 29, 2013 at 01:41:56PM -0400, David Larochelle wrote:
 My biggest concern with Inline::Java is why it isn't more widely known and 
 used.

I would not read too much into the fact that a particular solution is not
popular. I have used, and continue to use, such tools when appropriate.

Having said that, most environments and programmers tend to be monolingual
and tend to prefer solutions that don't cross languages. It's an easy and safe
decision to make whereas the other requires careful thought and good reasons.

-Gyepi

-- 
Kites rise highest against the wind---not with it.
--Winston Churchill  



Re: [Boston.pm] Inline::Java

2013-05-29 Thread William Cox
My suspicion is that it's 2-fold:
  1. Java doesn't fit the Perl mold
  2. Inline::Java can be tricky to set up (despite the docs
appearing to make it easy)

On (1), the development philosophies of Perl and Java tend to be
so different that each side almost writes the other off.

On (2), as an example from my experience, my BEGIN block does the
following things (for the Maven wrapper):
  1. find the installation of the mvn command-line utility, then add
all the jars to $ENV{CLASSPATH}
  2. map Java class names to constant subs in the current package, each
returning the name of the Perl package standing in for that Java class
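Step 1 can also be done without shelling out to `which`. The following sketch uses only core modules and assumes the Maven layout the BEGIN block relies on (the launcher at bin/mvn, jars under lib/ and boot/); `find_maven_home` and `maven_classpath` are hypothetical helper names.

```perl
use strict;
use warnings;
use Cwd qw(realpath);
use File::Glob qw(bsd_glob);
use File::Spec;

# Walk $ENV{PATH}, resolve the first executable "mvn" through any symlinks,
# and strip the trailing /bin/mvn to get the Maven home directory.
sub find_maven_home {
    for my $dir (File::Spec->path) {
        my $candidate = File::Spec->catfile($dir, 'mvn');
        next unless -f $candidate && -x _;
        my $real = realpath($candidate) or next;
        (my $home = $real) =~ s{/bin/mvn\z}{} or next;
        return $home;
    }
    die "Unable to locate maven installation - is it installed?\n";
}

# Join every jar under lib/ and boot/ into a colon-separated class path.
# bsd_glob is used instead of glob so spaces in the path are not treated
# as separators between patterns.
sub maven_classpath {
    my ($maven_home) = @_;
    my @jars = map { bsd_glob("$maven_home/$_/*.jar") } qw(lib boot);
    die "No jars found under $maven_home\n" unless @jars;
    return join ':', @jars;
}
```

Inside the BEGIN block this would amount to `$maven_home = find_maven_home(); $ENV{CLASSPATH} = maven_classpath($maven_home);`.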

Then I execute the 'use Inline (args);' line. Here is a brief example
of what I'm doing:

use Cwd ();
use Package::Stash;

our ($maven_home, @class, %class);

BEGIN {
    require Inline::Java;    # provides java2perl(), called below

    chomp(my $mvn = `which mvn`);
    $mvn or die "Unable to locate maven installation - is it installed?";

    $maven_home = Cwd::realpath($mvn);
    $maven_home =~ s{/bin/mvn$}{};

    $ENV{CLASSPATH} = join ':', glob("$maven_home/lib/*.jar"),
                                glob("$maven_home/boot/*.jar");

    @class = (
        'java.io.File',
        'java.io.FileInputStream',
        # ...
        'org.apache.maven.Maven',
        # ...
    );

    # short Java class name => Perl package standing in for that class
    %class = map { (/\.(\w+)$/ => Inline::Java::java2perl(__PACKAGE__, $_)) } @class;

    # this will create subs that simply return the full class of the java
    # package
    my $stash = Package::Stash->new(__PACKAGE__);
    while (my ($C, $P) = each %class) {
        my $symbol = "&$C";
        next if $stash->has_symbol($symbol);
        my $perl_class = $P;    # fresh lexical for each constant sub
        $stash->add_symbol($symbol, sub () { $perl_class });
    }
}

use Inline (
    Java            => 'STUDY',
    EXTRA_JAVA_ARGS => "-Dmaven.home=$maven_home "
                     . "-Dclassworlds.conf=$maven_home/bin/m2.conf -Xms2048m -Xmx2048m",
    STUDY           => \@class,
    AUTOSTUDY       => 1,
    JNI             => 1,
    CLASSPATH       => $ENV{CLASSPATH},
    NAME            => __PACKAGE__,
    DIRECTORY       => ((__FILE__ =~ m{^(.*)\.pm$})[0] . '/_Inline'),
);

All of this makes for a rather non-trivial use case for Java inside
Perl. I spent a lot of time finding the right setup for my needs, and
that tends to be common with Java in general (why engineer it once when
you can over-engineer it twice?).
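The constant-sub step in the BEGIN block depends on Package::Stash from CPAN. A core-only alternative, sketched here under the assumption that plain typeglob assignment is acceptable in your codebase, installs one closure per class; `install_class_constants` is a hypothetical name, and the package names in the usage lines are illustrative rather than actual java2perl() output.

```perl
use strict;
use warnings;

# For each short_name => perl_package pair, define a sub short_name() in $pkg
# that returns perl_package, unless a sub of that name already exists there.
sub install_class_constants {
    my ($pkg, $map) = @_;
    no strict 'refs';
    for my $short (sort keys %$map) {
        my $full = "${pkg}::$short";
        next if defined &{$full};           # leave existing subs alone
        my $perl_class = $map->{$short};    # fresh lexical captured per sub
        *{$full} = sub () { $perl_class };
    }
    return;
}

# Usage (illustrative package names):
install_class_constants('My::MavenWrapper', {
    File  => 'My::MavenWrapper::java::io::File',
    Maven => 'My::MavenWrapper::org::apache::maven::Maven',
});
print My::MavenWrapper::File(), "\n";    # prints My::MavenWrapper::java::io::File
```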


Re: [Boston.pm] Inline::Java

2013-05-29 Thread David Larochelle
Jerrad,

I haven't used Statistics::R, but it's probably worth considering. I suppose
R also merits mention as a language for data analysis, in addition to
Python and Java/JVM.

With regard to PDL, for most tasks you want to be able to easily apply a
known algorithm to your data and not have to worry about writing it from
scratch. PDL may or may not be a better platform than Python's numpy, but
there are sophisticated libraries such as SciPy and PyBrain built on top
of numpy and no real equivalent for Perl and PDL.

--

David

On Wed, May 29, 2013 at 1:45 PM, Jerrad Pierce <belg4...@pthbb.org> wrote:

 For general data analysis, PDL, Statistics::R or PDL::R::math
 seem like more logical choices than the languages you mentioned.




Re: [Boston.pm] Inline::Java

2013-05-29 Thread David Larochelle
William,

Thank you for offering your thoughts and your experiences.

I feel better about using Inline::Java. My application is large
enough that porting it to another language would be nontrivial.
Inline::Java seems like a better choice than the alternatives, such as
calling Java with system() or reimplementing the algorithms.

--

David

On Wed, May 29, 2013 at 3:15 PM, William Cox <mydimens...@gmail.com> wrote:

 My suspicion is that it's 2-fold:
   1. Java doesn't fit the Perl mold
   2. Inline::Java can be tricky to set up (despite the docs
 appearing to make it easy)
