[
https://issues.apache.org/jira/browse/DATAFU-88?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14317125#comment-14317125
]
Matthew Hayes commented on DATAFU-88:
-------------------------------------
Thanks Jakob. I think this feature can be treated as optional.
Suppose we add a compile-time dependency to the project, like below. When you
build, the library will be downloaded automatically, but it will not be
packaged in the final DataFu jar. The UDF will be included in the final jar,
but it won't work unless you download this dependency yourself. We can
provide instructions on how to do that. Does this seem okay?
{code}
diff --git a/datafu-pig/build.gradle b/datafu-pig/build.gradle
index ea385d2..56466ed 100644
--- a/datafu-pig/build.gradle
+++ b/datafu-pig/build.gradle
@@ -151,6 +151,9 @@ dependencies {
autojarred "org.apache.opennlp:opennlp-tools:$openNlpVersion"
autojarred "org.apache.opennlp:opennlp-uima:$openNlpVersion"
autojarred "org.apache.opennlp:opennlp-maxent:$openNlpMaxEntVersion"
+
+ // not autojarred because this is GPL
+ compile "edu.stanford.nlp:stanford-corenlp:$stanfordCoreNlpVersion"
// needed to run jarjar
jarjar "com.googlecode.jarjar:jarjar:1.3"
@@ -218,4 +221,4 @@ test {
systemProperty 'datafu.data.dir', file('data')
maxHeapSize = "2G"
-}
\ No newline at end of file
+}
diff --git a/gradle/dependency-versions.gradle b/gradle/dependency-versions.gradle
index 3b0835f..81012fc 100644
--- a/gradle/dependency-versions.gradle
+++ b/gradle/dependency-versions.gradle
@@ -39,4 +39,5 @@ ext {
jsonVersion="20090211"
jsr311Version="1.1.1"
slf4jVersion="1.6.4"
+ stanfordCoreNlpVersion="3.5.0"
}
{code}
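Since the UDF would ship without the library, it could check for it at
runtime and fail with a clear message instead of a bare NoClassDefFoundError.
A minimal sketch of that idea follows. The class and method names
(OptionalDependency, isAvailable, require) are hypothetical and not part of
DataFu; only the Maven coordinates come from the diff above.

```java
// Sketch only: OptionalDependency and its methods are hypothetical names,
// not existing DataFu code.
final class OptionalDependency {
    private OptionalDependency() {}

    /** Returns true if the named class can be located on the current classpath. */
    static boolean isAvailable(String className) {
        try {
            // initialize=false: only locate the class, do not run static initializers
            Class.forName(className, false, OptionalDependency.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Throws with an actionable message if the optional library is missing. */
    static void require(String className, String artifact) {
        if (!isAvailable(className)) {
            throw new IllegalStateException(
                "Optional dependency " + artifact + " is not on the classpath; "
                + "download it and register it before using this UDF.");
        }
    }
}
```

A UDF constructor could then call something like
OptionalDependency.require("edu.stanford.nlp.pipeline.StanfordCoreNLP",
"edu.stanford.nlp:stanford-corenlp") before touching any CoreNLP class.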
> Port Stanford Core NLP Functionality to DataFu
> ----------------------------------------------
>
> Key: DATAFU-88
> URL: https://issues.apache.org/jira/browse/DATAFU-88
> Project: DataFu
> Issue Type: New Feature
> Affects Versions: 1.3.0
> Reporter: Russell Jurney
> Assignee: Russell Jurney
> Labels: lemmatizer, nlp, pig, pig_udf, stanford, stemmer
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> For starters I need the Stanford Core NLP stemmer and lemmatizer.
> It looks like maybe I can add something generic and feed arguments to code
> like: props.put("annotators", "tokenize, ssplit, pos, lemma");
> Helpful example of lemmatizing at
> http://stackoverflow.com/questions/1578062/lemmatization-java
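
The "generic" approach described in the issue could be sketched as a small
helper that turns a list of annotator names into the Properties object.
AnnotatorProps and forAnnotators are hypothetical names used for
illustration; the "annotators" key and the example values are taken from the
issue description.

```java
import java.util.Properties;

// Sketch only: AnnotatorProps is a hypothetical helper, not DataFu code.
final class AnnotatorProps {
    private AnnotatorProps() {}

    /** Builds a Properties object whose "annotators" entry lists the given names in order. */
    static Properties forAnnotators(String... annotators) {
        Properties props = new Properties();
        // Names are joined into one comma-separated value, matching the
        // "tokenize, ssplit, pos, lemma" format shown in the issue.
        props.setProperty("annotators", String.join(", ", annotators));
        return props;
    }
}
```

With the library on the classpath, the result of
AnnotatorProps.forAnnotators("tokenize", "ssplit", "pos", "lemma") would be
what gets handed to the StanfordCoreNLP constructor.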