[ 
https://issues.apache.org/jira/browse/DATAFU-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880601#comment-13880601
 ] 

Matthew Hayes commented on DATAFU-14:
-------------------------------------

I cloned your repo and took a look.  It could be that autojar is getting 
confused.  It only tries to include the classes that are absolutely necessary.  
If lucene is referencing classes dynamically then autojar may not discover 
those dependencies and will leave them out of the JAR.  Autojar supposedly 
tries to handle cases like this, but maybe it isn't working here.  Other than 
relying on autojar, the options are:

1) Packaging all required lucene JARs in datafu (under a different namespace of 
course) -- requires changing the build.xml 
2) Requiring that the lucene JARs be present in the classpath if you want to 
use the wrapper functions -- only requires moving your conf for the lucene jars 
from "packaged" to "common" in ivy.xml
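
To make the dynamic-reference problem mentioned above concrete, here is a 
minimal sketch in plain Java.  It is not Lucene or autojar code, and every 
name in it (Greeting, GreetingImpl, DynamicLoadDemo) is hypothetical.  It 
assumes the failure comes from an implementation class being looked up by a 
name built at runtime, which a tool that walks static class references would 
have no way to see:

```java
// Hypothetical names throughout; this only illustrates the general pattern.
interface Greeting {
    String text();
}

// In loadByName below, this class is reached only through a string built at
// runtime, so that code path carries no static reference to it.
class GreetingImpl implements Greeting {
    public String text() {
        return "hello";
    }
}

class DynamicLoadDemo {
    // Static reference: the compiled bytecode names GreetingImpl directly,
    // so a dependency walker can follow it.
    static Greeting loadStatically() {
        return new GreetingImpl();
    }

    // Dynamic reference: the class name is assembled at runtime, so only the
    // interface and the string "Impl" are visible in the bytecode.
    static Greeting loadByName(Class<? extends Greeting> iface) {
        String implName = iface.getName() + "Impl";
        try {
            return (Greeting) Class.forName(implName)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            // This is what you would hit if the class had been stripped
            // from the JAR.
            throw new IllegalArgumentException(
                    "Could not find implementing class for " + iface.getName(), e);
        }
    }

    public static void main(String[] args) {
        System.out.println(loadStatically().text());
        System.out.println(loadByName(Greeting.class).text());
    }
}
```

If GreetingImpl were missing from the JAR, loadStatically() would fail as 
well, but a tool following static references would never have dropped the 
class in that case.  Only the loadByName() path is invisible to it.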

We may want to start a separate discussion on JAR packaging actually and come 
up with some guidelines or policy.  I started doing this when we added the 
fastutil dependency, as fastutil is a very large JAR that you don't want to 
include in its entirety.  Autojar is nice because it strips out what you don't 
need.  It's also nice to not have to worry about other JARs.  Just get the 
datafu JAR, register it and you're set.  But maybe there are cases where the 
user should get the necessary JARs (rather than having them packaged), 
especially in cases where the UDF is a somewhat simple wrapper around 
functionality from another JAR.  Or, we could ship a separate artifact with the 
UDFs plus the necessary dependencies (e.g. lucene) in a single JAR, like 
datafu-pig-lucene-x.y.z.jar.  Or, as another example, 
datafu-pig-opennlp-x.y.z.jar.  I'm not sure what the right approach is; I'll 
have to think on it some more.

> Add NGram Tokenizer to datafu.pig.text.lucene
> ---------------------------------------------
>
>                 Key: DATAFU-14
>                 URL: https://issues.apache.org/jira/browse/DATAFU-14
>             Project: DataFu
>          Issue Type: Improvement
>         Environment: plants
>            Reporter: Russell Jurney
>
> See 
> https://github.com/rjurney/datafu/blob/lucene/src/java/datafu/pig/text/lucene/NGramTokenize.java
> Held up by 
> http://stackoverflow.com/questions/21064520/how-to-use-lucene-shinglefilter-could-not-find-implementing-class-for-org-apach/21067142?noredirect=1#21067142



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
