[jira] [Commented] (OPENNLP-862) BRAT format packages do not handle punctuation correctly when training NER model
[ https://issues.apache.org/jira/browse/OPENNLP-862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15544995#comment-15544995 ]

Joern Kottmann commented on OPENNLP-862:

OpenNLP has to tokenize its input text. Brat can avoid this by just letting the user decide how to mark things. In the end you will need a tokenizer. The WhitespaceTokenizer has the issue you mentioned; the SimpleTokenizer splits on character class and will probably work better for you.

Anyway, I think it makes sense to add an option to let the Brat parser assume that annotation boundaries are also always token boundaries. It would be very nice if you could send us a patch to add this option.

> BRAT format packages do not handle punctuation correctly when training NER model
>
> Key: OPENNLP-862
> URL: https://issues.apache.org/jira/browse/OPENNLP-862
> Project: OpenNLP
> Issue Type: Bug
> Components: Formats
> Affects Versions: 1.6.0
> Reporter: Gregory Werner
>
> BRAT does not require preprocessing of text files in order to add annotations to text documents. And this is great because I can feed documents from corpora I am given directly into BRAT. If I have a line such as:
> Residence: Athens, Georgia
> I would provide 2 annotations in BRAT, Athens and Georgia, and BRAT would generate the offsets and everything would be fine.
> It appears though that only 1 entity, Georgia, is processed correctly (and the other dropped) in OpenNLP with TokenNameFinderTrainer.brat, because the comma is not separated from Athens. I have 789 annotated raw, non pre-processed text documents from past efforts. I believe that OpenNLP should be able to handle lines like the above in the case of the BRAT format code.
> It appears that BratNameSampleStream uses the WhitespaceTokenizer and that is what creates "Athens," as a token. From my limited testing of raw documents, I find that the SimpleTokenizer might perform better with BRAT if the current general approach is kept.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
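To make the boundary problem concrete, here is a minimal standalone sketch. It deliberately does not call the OpenNLP API: `whitespaceSpans` and `charClassSpans` are hypothetical helpers that only approximate the behaviour of WhitespaceTokenizer and SimpleTokenizer described above, and `aligned` is an assumed check for whether an annotation's character offsets coincide with token boundaries.

```java
import java.util.ArrayList;
import java.util.List;

final class TokenBoundarySketch {

    /** Half-open character span [start, end). */
    static final class Span {
        final int start;
        final int end;
        Span(int start, int end) { this.start = start; this.end = end; }
    }

    // Approximates whitespace tokenization: a token is a maximal run of non-whitespace characters.
    static List<Span> whitespaceSpans(String text) {
        List<Span> spans = new ArrayList<>();
        int start = -1;
        for (int i = 0; i <= text.length(); i++) {
            boolean ws = i == text.length() || Character.isWhitespace(text.charAt(i));
            if (!ws && start < 0) start = i;
            if (ws && start >= 0) { spans.add(new Span(start, i)); start = -1; }
        }
        return spans;
    }

    // Character classes: 0 = whitespace, 1 = letter, 2 = digit, 3 = anything else.
    static int charClass(char c) {
        if (Character.isWhitespace(c)) return 0;
        if (Character.isLetter(c)) return 1;
        if (Character.isDigit(c)) return 2;
        return 3;
    }

    // Approximates character-class tokenization: also split wherever the class changes.
    static List<Span> charClassSpans(String text) {
        List<Span> spans = new ArrayList<>();
        int start = -1;
        int prev = 0;
        for (int i = 0; i <= text.length(); i++) {
            int cls = i == text.length() ? 0 : charClass(text.charAt(i));
            if (start >= 0 && cls != prev) { spans.add(new Span(start, i)); start = -1; }
            if (cls != 0 && start < 0) start = i;
            prev = cls;
        }
        return spans;
    }

    // An annotation is usable only if it begins at a token start and ends at a token end.
    static boolean aligned(List<Span> tokens, int start, int end) {
        boolean startsOk = false;
        boolean endsOk = false;
        for (Span t : tokens) {
            if (t.start == start) startsOk = true;
            if (t.end == end) endsOk = true;
        }
        return startsOk && endsOk;
    }

    public static void main(String[] args) {
        String text = "Residence: Athens, Georgia";
        int s = text.indexOf("Athens");
        int e = s + "Athens".length();
        System.out.println("whitespace tokens aligned: " + aligned(whitespaceSpans(text), s, e));
        System.out.println("char-class tokens aligned: " + aligned(charClassSpans(text), s, e));
    }
}
```

With whitespace splitting the token covering the first annotation is "Athens," (comma included), so the annotation's end offset falls inside a token; splitting on character class puts the comma in its own token and the offsets line up.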
[jira] [Comment Edited] (OPENNLP-862) BRAT format packages do not handle punctuation correctly when training NER model
[ https://issues.apache.org/jira/browse/OPENNLP-862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15544995#comment-15544995 ]

Joern Kottmann edited comment on OPENNLP-862 at 10/4/16 10:22 AM:

OpenNLP has to tokenize its input text. Brat can avoid this by just letting the user decide how he wants to mark things. In the end you will need a tokenizer, the Whitespace Tokenizer has the issue you mentioned, the SimpleTokenizer splits on character class and will probably work better for you. Anyway, I think it makes sense to add an option to let the Brat parser assume that annotation boundaries are also always token boundaries. It would be very nice if you could send us a patch to add this option.

was (Author: joern):
OpenNLP has to tokenize its input text. Brat can avoid this by just letting the user decide how he wants to mark things. In the end you will need a tokenizer, the Whitespace Tokenizer has the isue you mentioned, the SimpleTokenizer splits on character class and will probably work better for you. Anyway, I think it makes sense to add an option to let the Brat parser assume that annotation boundaries are also always token boundaries. It would be very nice if you could send us a patch to add this option.
[jira] [Commented] (OPENNLP-776) Model Objects should be Serializable
[ https://issues.apache.org/jira/browse/OPENNLP-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15545212#comment-15545212 ]

Joern Kottmann commented on OPENNLP-776:

Thanks, looks good, I think we can more or less merge it like that for the 1.6.1 release.

One question: in which case can the else block of the if( in instanceof InputStream ) be entered in the read and write methods? As far as I understand, this will always be true, since the type is defined as part of the Java API and won't change. I suggest we drop the else block.

I will test this on my cluster in the next few days and then report back here.

> Model Objects should be Serializable
>
> Key: OPENNLP-776
> URL: https://issues.apache.org/jira/browse/OPENNLP-776
> Project: OpenNLP
> Issue Type: Improvement
> Affects Versions: tools-1.5.3
> Reporter: Tristan Nixon
> Assignee: Joern Kottmann
> Priority: Minor
> Labels: features, patch
> Fix For: 1.6.1
> Attachments: externalizable.patch, serializable-basemodel.patch, serialization_proxy.patch
>
> Marking model objects (ParserModel, SentenceModel, etc.) as Serializable can enable a number of features offered by other Java frameworks (my own use case is described below). You've already got a good mechanism for (de-)serialization, but it cannot be leveraged by other frameworks without implementing the Serializable interface. I'm attaching a patch to BaseModel that implements the methods in the java.io.Externalizable interface as wrappers to the existing (de-)serialization methods. This simple change can open up a number of useful opportunities for integrating OpenNLP with other frameworks.
> My use case is that I am incorporating OpenNLP into a Spark application. This requires that components of the system be distributed between the driver and worker nodes within the cluster. In order to do this, Spark uses the Java serialization API to transmit objects between nodes. This is far more efficient than instantiating models on each node independently.
[jira] [Commented] (OPENNLP-776) Model Objects should be Serializable
[ https://issues.apache.org/jira/browse/OPENNLP-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15545536#comment-15545536 ]

Tristan Nixon commented on OPENNLP-776:

Thanks, that's great! While it is true that ObjectInputStream is the only implementation of the ObjectInput interface (and ObjectOutputStream for ObjectOutput) in the JSE, there are different implementations in other frameworks, which is why I didn't want to presume and simply cast it.
[jira] [Commented] (OPENNLP-776) Model Objects should be Serializable
[ https://issues.apache.org/jira/browse/OPENNLP-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15545560#comment-15545560 ]

Joern Kottmann commented on OPENNLP-776:

Can you give me an example? OpenNLP today only runs on Java 7 and is not tested on any other JVMs, so you will probably run into issues here and there. Do you run it on Android?

I think it is safe to simply hand over the stream and assume the type is InputStream / OutputStream.
[jira] [Commented] (OPENNLP-776) Model Objects should be Serializable
[ https://issues.apache.org/jira/browse/OPENNLP-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15545610#comment-15545610 ]

Tristan Nixon commented on OPENNLP-776:

It's not about the JVM version, it's about third-party frameworks that provide their own serialization implementation. I use OpenNLP in Spark, which makes use of Kryo, which provides a more optimized serialization implementation. Kryo has implementations of these interfaces that are not direct sub-types of InputStream/OutputStream. For example, see:
https://github.com/EsotericSoftware/kryo/blob/cef15a3dc55e74162399fce163e19d4845a9f890/src/com/esotericsoftware/kryo/io/KryoObjectOutput.java

If we removed these checks, it would work with JSE's serialization, but not Kryo's (even though they're both running on the same JVM).
[jira] [Commented] (OPENNLP-776) Model Objects should be Serializable
[ https://issues.apache.org/jira/browse/OPENNLP-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15545655#comment-15545655 ]

Joern Kottmann commented on OPENNLP-776:

Hmm, then I don't understand it. The write method can only be called with an object of type java.io.ObjectOutputStream, and that must extend OutputStream, so it should be safe to assume that, no?

ObjectOutputStream is a class and not an interface. It is possible to pass in an instance of it, or to define a new class which extends it; in both cases the object also has the type OutputStream, right?
[jira] [Commented] (OPENNLP-776) Model Objects should be Serializable
[ https://issues.apache.org/jira/browse/OPENNLP-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15545791#comment-15545791 ]

Tristan Nixon commented on OPENNLP-776:

Well, it's a bit of a messy type hierarchy, since the write( int) method is defined on both the abstract class OutputStream AND on the interface DataOutput, which is inherited by interface ObjectOutput. The ObjectOutputStream class inherits from BOTH OutputStream AND ObjectOutput. However, the Externalizable interface defines the method writeExternal( ObjectOutput ), which implies that there could be other implementations of this interface that are not necessarily subtypes of OutputStream. This is in fact what some other frameworks do - they provide an alternative implementation.
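The type-hierarchy point can be sketched with plain JDK classes. The following is only an illustration under stated assumptions, not the code from the attached patches: `serialize` stands in for an existing stream-based serializer, `writeTo` is a hypothetical bridge that keeps the `instanceof` check and falls back to a thin adapter, and `BufferObjectOutput` is a made-up `ObjectOutput` implementation that is not an `OutputStream` (a stand-in for what a third-party framework might supply).

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutput;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

final class ObjectOutputAdapterSketch {

    /** Stand-in for an existing serializer that only knows about OutputStream (hypothetical). */
    static void serialize(String payload, OutputStream out) throws IOException {
        out.write(payload.getBytes(StandardCharsets.UTF_8));
    }

    /** Bridges any ObjectOutput to the stream-based serializer. */
    static void writeTo(String payload, final ObjectOutput target) throws IOException {
        if (target instanceof OutputStream) {
            // Standard Java serialization: ObjectOutputStream is also an OutputStream.
            serialize(payload, (OutputStream) target);
        } else {
            // Any other ObjectOutput: forward the bytes through a thin adapter.
            serialize(payload, new OutputStream() {
                @Override public void write(int b) throws IOException { target.write(b); }
                @Override public void write(byte[] b, int off, int len) throws IOException {
                    target.write(b, off, len);
                }
            });
        }
    }

    /** An ObjectOutput that is NOT an OutputStream (hypothetical, for illustration only). */
    static final class BufferObjectOutput implements ObjectOutput {
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        private final DataOutputStream data = new DataOutputStream(bytes);

        public void write(int b) throws IOException { data.write(b); }
        public void write(byte[] b) throws IOException { data.write(b); }
        public void write(byte[] b, int off, int len) throws IOException { data.write(b, off, len); }
        public void writeBoolean(boolean v) throws IOException { data.writeBoolean(v); }
        public void writeByte(int v) throws IOException { data.writeByte(v); }
        public void writeShort(int v) throws IOException { data.writeShort(v); }
        public void writeChar(int v) throws IOException { data.writeChar(v); }
        public void writeInt(int v) throws IOException { data.writeInt(v); }
        public void writeLong(long v) throws IOException { data.writeLong(v); }
        public void writeFloat(float v) throws IOException { data.writeFloat(v); }
        public void writeDouble(double v) throws IOException { data.writeDouble(v); }
        public void writeBytes(String s) throws IOException { data.writeBytes(s); }
        public void writeChars(String s) throws IOException { data.writeChars(s); }
        public void writeUTF(String s) throws IOException { data.writeUTF(s); }
        public void writeObject(Object obj) { throw new UnsupportedOperationException(); }
        public void flush() throws IOException { data.flush(); }
        public void close() throws IOException { data.close(); }
    }

    public static void main(String[] args) throws IOException {
        BufferObjectOutput out = new BufferObjectOutput();
        writeTo("model-bytes", out); // takes the else branch
        System.out.println(new String(out.bytes.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

The else branch is reachable precisely because `writeExternal` is declared against the `ObjectOutput` interface; a method declared against `ObjectOutputStream` could never receive such an object.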
[jira] [Commented] (OPENNLP-776) Model Objects should be Serializable
[ https://issues.apache.org/jira/browse/OPENNLP-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15545809#comment-15545809 ]

Joern Kottmann commented on OPENNLP-776:

But it is not possible to pass an object which only implements ObjectOutput to
void writeObject(final ObjectOutputStream out)
Therefore the passed-in object must be of type OutputStream, and the implementation in the patch will always execute the first branch of the if and can't go into the else block.

Anyway, don't get me wrong, if we can make it work with Kryo serialization as well, that would be great.
[jira] [Commented] (OPENNLP-776) Model Objects should be Serializable
[ https://issues.apache.org/jira/browse/OPENNLP-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15545832#comment-15545832 ]

Tristan Nixon commented on OPENNLP-776:

But that's not what my implementation does. We use writeReplace() to supply a proxy object and the proxy is Externalizable meaning that writeExternal( ObjectOutput ) gets called. We discussed this, above (Aug 18,19). See BaseModel$BaseModelSerializationProxy.
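For readers unfamiliar with the pattern, the writeReplace() plus Externalizable proxy approach can be sketched as follows. This is a generic illustration of the serialization proxy pattern using only JDK classes; `Model` and `ModelProxy` are hypothetical stand-ins and not the BaseModel or BaseModelSerializationProxy code from the patch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class SerializationProxySketch {

    /** Hypothetical stand-in for a model class. */
    static final class Model implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;

        Model(String name) { this.name = name; }

        // Java serialization writes the returned proxy instead of this object.
        private Object writeReplace() { return new ModelProxy(this); }
    }

    /** Externalizable proxy: the stream calls writeExternal(ObjectOutput) and readExternal(ObjectInput). */
    static final class ModelProxy implements Externalizable {
        private Model model;

        public ModelProxy() { } // Externalizable requires a public no-arg constructor

        ModelProxy(Model model) { this.model = model; }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeUTF(model.name);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            model = new Model(in.readUTF());
        }

        // After deserialization, hand back the real object instead of the proxy.
        private Object readResolve() { return model; }
    }

    /** Serializes and deserializes a model through standard Java serialization. */
    static Model roundTrip(Model m) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(m);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Model) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip(new Model("demo-model")).name);
    }
}
```

Because the proxy's methods are declared against `ObjectOutput` and `ObjectInput` rather than the concrete stream classes, a framework that supplies its own implementations of those interfaces can in principle drive the same code path.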
[jira] [Commented] (OPENNLP-776) Model Objects should be Serializable
[ https://issues.apache.org/jira/browse/OPENNLP-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15545887#comment-15545887 ]

Joern Kottmann commented on OPENNLP-776:

Sorry for the confusion, I was speaking all the time about serializable-basemodel.patch and not serialization_proxy.patch.
[jira] [Comment Edited] (OPENNLP-776) Model Objects should be Serializable
[ https://issues.apache.org/jira/browse/OPENNLP-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15545887#comment-15545887 ]

Joern Kottmann edited comment on OPENNLP-776 at 10/4/16 4:30 PM:

Sorry for the confusion I was speaking all the time about serializable-basemodel.patch and not serialization_proxy.patch.

was (Author: joern):
Sorry for the confusion is was speaking all the time about serializable-basemodel.patch and not serialization_proxy.patch.
[jira] [Commented] (OPENNLP-776) Model Objects should be Serializable
[ https://issues.apache.org/jira/browse/OPENNLP-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15545918#comment-15545918 ]

Tristan Nixon commented on OPENNLP-776:

Sorry, I probably should have removed that older patch.
[jira] [Comment Edited] (OPENNLP-776) Model Objects should be Serializable
[ https://issues.apache.org/jira/browse/OPENNLP-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15545918#comment-15545918 ]

Tristan Nixon edited comment on OPENNLP-776 at 10/4/16 4:44 PM:

Sorry, I probably should have removed that older patch and consolidated them into a single patch.

was (Author: tnixon):
Sorry, I probably should have removed that older patch.