[ https://issues.apache.org/jira/browse/DATAFU-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880601#comment-13880601 ]
Matthew Hayes commented on DATAFU-14:
-------------------------------------

I cloned your repo and took a look. It could be that autojar is getting confused. It only tries to include the classes that are absolutely necessary. If lucene is referencing classes dynamically, then autojar may not discover the dependency and may leave those classes out of the JAR. Autojar supposedly tries to handle cases like this, but maybe it isn't working here.

The other options are:

1) Packaging all required lucene JARs in datafu (under a different namespace of course) -- requires changing the build.xml
2) Requiring that the lucene JARs be present in the classpath if you want to use the wrapper functions -- only requires moving your conf for the lucene jars from "packaged" to "common" in ivy.xml

We may want to start a separate discussion on JAR packaging and come up with some guidelines or a policy. I started doing this when we added the fastutil dependency, as fastutil is a very large JAR that you don't want to include in its entirety. Autojar is nice because it strips out what you don't need. It's also nice not to have to worry about other JARs: just get the datafu JAR, register it, and you're set.

But maybe there are cases where the user should get the necessary JARs (rather than having them packaged), especially where the UDF is a fairly simple wrapper around functionality from another JAR. Or we could ship a separate artifact with the UDFs plus the necessary dependencies (e.g. lucene) in a single JAR, like datafu-pig-lucene-x.y.z.jar. Or, as another example, datafu-pig-opennlp-x.y.z.jar.

I'm not sure what the right approach is; I'll have to think on it some more.
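To illustrate the dynamic-reference problem described above, here is a minimal Java sketch. It is not taken from the lucene or autojar sources; the class name DynamicReferenceSketch, its methods, and the name com.example.hypothetical.StrippedImpl are made up for illustration. The assumption is that a tool which decides what to package by following references in compiled classes can see the first kind of reference but may miss the second, where the class name only exists as a string assembled at runtime.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names come from datafu, lucene, or autojar.
final class DynamicReferenceSketch {

    // Static reference: ArrayList is named directly in the compiled class,
    // so a dependency scanner that reads the bytecode can find it.
    static List<String> staticReference() {
        return new ArrayList<>();
    }

    // Dynamic reference: the class name is built from two strings at runtime,
    // so the full name never appears as a class reference in the bytecode.
    static Object dynamicReference(String prefix, String suffix)
            throws ReflectiveOperationException {
        String className = prefix + suffix;
        Class<?> cls = Class.forName(className);
        return cls.getDeclaredConstructor().newInstance();
    }

    // Reports whether a class can be loaded by name. If a class that is only
    // referenced dynamically was left out of the JAR, this is where it shows up.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        System.out.println(staticReference().getClass().getName());
        System.out.println(dynamicReference("java.util.Array", "List").getClass().getName());
        // Stands in for a class that was stripped from the packaged JAR.
        System.out.println(isLoadable("com.example.hypothetical.StrippedImpl"));
    }
}
```

If that is what is happening here, a check like isLoadable run against the packaged datafu JAR would be one way to confirm which classes went missing before choosing between the packaging options.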
> Add NGram Tokenizer to datafu.pig.text.lucene
> ---------------------------------------------
>
>                 Key: DATAFU-14
>                 URL: https://issues.apache.org/jira/browse/DATAFU-14
>             Project: DataFu
>          Issue Type: Improvement
>         Environment: plants
>            Reporter: Russell Jurney
>
> See https://github.com/rjurney/datafu/blob/lucene/src/java/datafu/pig/text/lucene/NGramTokenize.java
> Held up by http://stackoverflow.com/questions/21064520/how-to-use-lucene-shinglefilter-could-not-find-implementing-class-for-org-apach/21067142?noredirect=1#21067142

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)