[jira] [Updated] (OPENNLP-859) Cannot get entities from trained model using DictionaryFeatureGenerator

Damiano Porta (JIRA) Tue, 16 Aug 2016 08:23:51 -0700

     [ https://issues.apache.org/jira/browse/OPENNLP-859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Damiano Porta updated OPENNLP-859:
----------------------------------
    Description: 
Hello,
I have created the following training data.

{code:title=train.txt|borderStyle=solid}
Ciao mi chiamo <START:person> Damiano <END> ed abito a Roma  .
il mio indirizzo è via del <START:person> Corso <END> nella provincia di Roma .
il mio cap è lo 00144 nella capitale e e il mio nome è  <START:person> john <END> .
Abito a Roma in via tar dei tali 10 , <START:person> Mario <END> è il mio amico .
Oggi ho incontrato <START:person> giovanni <END> e siamo andati a giocare a calcio .
{code}

And then this code:

{code:title=test.java|borderStyle=solid}

        Charset charset = Charset.forName("UTF-8");
        ObjectStream<String> lineStream =
                new PlainTextByLineStream(new FileInputStream("/home/damiano/person.train"), charset);
        ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream);

        TokenNameFinderModel model;

        Dictionary dictionary = new Dictionary();
        dictionary.put(new StringList(new String[]{"giovanni"}));
        dictionary.put(new StringList(new String[]{"maria"}));
        dictionary.put(new StringList(new String[]{"luca"}));

        BufferedOutputStream aa = null;

        AdaptiveFeatureGenerator featureGenerator = new CachedFeatureGenerator(
                new AdaptiveFeatureGenerator[]{
                    new WindowFeatureGenerator(new TokenFeatureGenerator(), 2, 2),
                    new WindowFeatureGenerator(new TokenClassFeatureGenerator(true), 2, 2),
                    new OutcomePriorFeatureGenerator(),
                    new PreviousMapFeatureGenerator(),
                    new BigramNameFeatureGenerator(),
                    new SentenceFeatureGenerator(true, false),
                    new DictionaryFeatureGenerator("person", dictionary)
                });

        try {
            model = NameFinderME.train("it", "person", sampleStream, TrainingParameters.defaultParams(),
                    featureGenerator, Collections.<String, Object>emptyMap());
        }
        finally {
            sampleStream.close();
        }

        // Save trained model
        try (BufferedOutputStream modelOut = new BufferedOutputStream(
                new FileOutputStream("/home/damiano/it-person-custom.bin"))) {
            model.serialize(modelOut);
        }

        // Read the trained model
        try (InputStream modelIn = new FileInputStream("/home/damiano/it-person-custom.bin")) {

            TokenNameFinderModel nerModel = new TokenNameFinderModel(modelIn);

            NameFinderME nameFinder = new NameFinderME(nerModel, featureGenerator, NameFinderME.DEFAULT_BEAM_SIZE);

            String sentence[] = new String[]{
                "Ciao", "mi", "chiamo", "Damiano", "e", "sono", "di", "Roma", "."
            };

            Span nameSpans[] = nameFinder.find(sentence);

            System.out.println(Arrays.toString(Span.spansToStrings(nameSpans, sentence)));
        }
{code}

When I try `"Ciao", "mi", "chiamo", "Damiano", "e", "sono", "di", "Roma", "."`
it correctly detects "Damiano" as PERSON, but if I change the sentence to:

"Ciao", "mi", "chiamo", "maria", "e", "sono", "di", "Roma", "."

it does not detect "maria" as PERSON, even though I added "maria" to the
dictionary, so I expected it to be found. Why is it not detected?
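
One possible angle on the question above, offered as an assumption and not a confirmed diagnosis: a dictionary feature generator only contributes features to the statistical model, so a dictionary entry is not a hard rule, and a feature that fires very rarely in the training data may end up with little or no weight. The sketch below is plain Java, not OpenNLP code. It simply counts how many tokens of the five training sentences appear in the dictionary. The class name, the `countDictionaryHits` helper, and the `assumedCutoff` value of 5 are illustrative assumptions.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class DictionaryHitCount {

    // Hypothetical helper (not part of OpenNLP): counts whitespace-separated
    // tokens that appear in the dictionary, ignoring case.
    public static int countDictionaryHits(List<String> sentences, Set<String> dictionary) {
        int hits = 0;
        for (String sentence : sentences) {
            for (String token : sentence.trim().split("\\s+")) {
                if (dictionary.contains(token.toLowerCase(Locale.ROOT))) {
                    hits++;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Same three entries as in the report.
        Set<String> dictionary = new HashSet<>(Arrays.asList("giovanni", "maria", "luca"));

        // The five training sentences from train.txt ("\u00e8" is the letter "è").
        List<String> trainingSentences = Arrays.asList(
            "Ciao mi chiamo <START:person> Damiano <END> ed abito a Roma  .",
            "il mio indirizzo \u00e8 via del <START:person> Corso <END> nella provincia di Roma .",
            "il mio cap \u00e8 lo 00144 nella capitale e e il mio nome \u00e8  <START:person> john <END> .",
            "Abito a Roma in via tar dei tali 10 , <START:person> Mario <END> \u00e8 il mio amico .",
            "Oggi ho incontrato <START:person> giovanni <END> e siamo andati a giocare a calcio ."
        );

        // Assumption: a feature cutoff of 5 is in effect during training.
        int assumedCutoff = 5;

        int hits = countDictionaryHits(trainingSentences, dictionary);
        System.out.println("dictionary hits in training data: " + hits);
        System.out.println("below assumed cutoff: " + (hits < assumedCutoff));
    }
}
```

With this data only "giovanni" matches, so the dictionary evidence appears once in the whole training set. If that is indeed the cause, more training sentences containing dictionary names, or a lower cutoff, would be the things to try first.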

Thanks!

  was:
Hello,
I have created the following training data.

```
Ciao mi chiamo <START:person> Damiano <END> ed abito a Roma  .
il mio indirizzo è via del <START:person> Corso <END> nella provincia di Roma .
il mio cap è lo 00144 nella capitale e e il mio nome è  <START:person> john <END> .
Abito a Roma in via tar dei tali 10 , <START:person> Mario <END> è il mio amico .
Oggi ho incontrato <START:person> giovanni <END> e siamo andati a giocare a calcio .
```
And then this code:

```

        Charset charset = Charset.forName("UTF-8");
        ObjectStream<String> lineStream =
                new PlainTextByLineStream(new FileInputStream("/home/damiano/person.train"), charset);
        ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream);

        TokenNameFinderModel model;

        Dictionary dictionary = new Dictionary();
        dictionary.put(new StringList(new String[]{"giovanni"}));
        dictionary.put(new StringList(new String[]{"maria"}));
        dictionary.put(new StringList(new String[]{"luca"}));

        BufferedOutputStream aa = null;

        AdaptiveFeatureGenerator featureGenerator = new CachedFeatureGenerator(
                new AdaptiveFeatureGenerator[]{
                    new WindowFeatureGenerator(new TokenFeatureGenerator(), 2, 2),
                    new WindowFeatureGenerator(new TokenClassFeatureGenerator(true), 2, 2),
                    new OutcomePriorFeatureGenerator(),
                    new PreviousMapFeatureGenerator(),
                    new BigramNameFeatureGenerator(),
                    new SentenceFeatureGenerator(true, false),
                    new DictionaryFeatureGenerator("person", dictionary)
                });

        try {
            model = NameFinderME.train("it", "person", sampleStream, TrainingParameters.defaultParams(),
                    featureGenerator, Collections.<String, Object>emptyMap());
        }
        finally {
            sampleStream.close();
        }

        // Save trained model
        try (BufferedOutputStream modelOut = new BufferedOutputStream(
                new FileOutputStream("/home/damiano/it-person-custom.bin"))) {
            model.serialize(modelOut);
        }

        // Read the trained model
        try (InputStream modelIn = new FileInputStream("/home/damiano/it-person-custom.bin")) {

            TokenNameFinderModel nerModel = new TokenNameFinderModel(modelIn);

            NameFinderME nameFinder = new NameFinderME(nerModel, featureGenerator, NameFinderME.DEFAULT_BEAM_SIZE);

            String sentence[] = new String[]{
                "Ciao", "mi", "chiamo", "Damiano", "e", "sono", "di", "Roma", "."
            };

            Span nameSpans[] = nameFinder.find(sentence);

            System.out.println(Arrays.toString(Span.spansToStrings(nameSpans, sentence)));
        }
```

When I try `"Ciao", "mi", "chiamo", "Damiano", "e", "sono", "di", "Roma", "."`
it correctly detects "Damiano" as PERSON, but if I change the sentence to:

"Ciao", "mi", "chiamo", "maria", "e", "sono", "di", "Roma", "."

it does not detect "maria" as PERSON, even though I added "maria" to the
dictionary, so I expected it to be found. Why is it not detected?

Thanks!


> Cannot get entities from trained model using DictionaryFeatureGenerator 
> ------------------------------------------------------------------------
>
>                 Key: OPENNLP-859
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-859
>             Project: OpenNLP
>          Issue Type: Question
>          Components: Name Finder
>    Affects Versions: 1.6.0
>         Environment: ubuntu 16.04 java 8
>            Reporter: Damiano Porta
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
