Hi again,
OK, found it. Now I understand what you meant by "manifest". I did
this:
--------------------------------
if (eosCharacters != null)
    setManifestProperty(EOS_CHARACTERS_PROPERTY,
        eosCharArrayToString(eosCharacters));
--------------------------------
Now it works.
Best
Katrin
On 02/09/2012 02:20 PM, Katrin Tomanek wrote:
Hi Jörn,
I did that:
public SentenceModel(String languageCode, AbstractModel sentModel,
    boolean useTokenEnd, Dictionary abbreviations, char[] eosCharacters,
    Map<String, String> manifestInfoEntries) {
  super(COMPONENT_NAME, languageCode, manifestInfoEntries);
  artifactMap.put(MAXENT_MODEL_ENTRY_NAME, sentModel);
  setManifestProperty(TOKEN_END_PROPERTY, Boolean.toString(useTokenEnd));
  // Abbreviations are optional
  if (abbreviations != null)
    artifactMap.put(ABBREVIATIONS_ENTRY_NAME, abbreviations);
  // EOS characters are optional
  if (eosCharacters != null)
    artifactMap.put(EOS_CHARACTERS_ENTRY_NAME,
        eosCharArrayToString(eosCharacters));
  checkArtifactMap();
}
The EOS char array is transformed into a string, which is written to the
manifest.
Still, when serializing the model, I get:
Exception in thread "main" java.lang.IllegalStateException: Missing
serializer for eosCharacters
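A toy model of the behaviour described here may help explain the exception. It assumes that every entry in the artifact map needs a registered serializer, while manifest properties are plain strings that need none. The class ToyModel and all of its fields are hypothetical and only mimic the behaviour reported in this thread; this is not OpenNLP's actual implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.function.Function;

/** Hypothetical stand-in for a model class; illustration only. */
final class ToyModel {
    // Artifacts are arbitrary objects, so each entry needs a serializer.
    final Map<String, Object> artifactMap = new HashMap<>();
    final Map<String, Function<Object, byte[]>> serializers = new HashMap<>();
    // The manifest holds plain string properties and needs no serializer.
    final Properties manifest = new Properties();

    /** Checks that every artifact entry has a serializer (writing is omitted). */
    void serialize() {
        for (String entry : artifactMap.keySet()) {
            if (!serializers.containsKey(entry)) {
                throw new IllegalStateException("Missing serializer for " + entry);
            }
        }
    }

    public static void main(String[] args) {
        // Storing the string as an artifact fails: no serializer is registered.
        ToyModel viaArtifact = new ToyModel();
        viaArtifact.artifactMap.put("eosCharacters", ".?!");
        try {
            viaArtifact.serialize();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }

        // Storing the same string as a manifest property passes the check.
        ToyModel viaManifest = new ToyModel();
        viaManifest.manifest.setProperty("eosCharacters", ".?!");
        viaManifest.serialize();
        System.out.println("manifest variant serialized");
    }
}
```

Under these assumptions, converting the char[] to a String is not enough on its own; what matters is whether the value lands in the artifact map or in the manifest.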
Best,
Katrin
On 02/09/2012 12:48 PM, Joern Kottmann wrote:
The artifactMap contains a manifest (a Properties object).
You should store the EOS chars in this manifest. We need a smart way to
convert them into a String.
The Sentence Detector should then retrieve the EOS chars from the model,
e.g. via a getEosChars method.
Have a look at the other model classes as well, e.g. the tokenizer model.
It stores some settings in the manifest, so it is a good place to look
for a code sample.
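One simple way to do the conversion suggested above is to concatenate the characters into a single string and split it again on retrieval. The sketch below uses a plain java.util.Properties object in place of the model manifest. The class name EosManifestSketch and the key value "eosCharacters" are assumptions for illustration; the method names eosCharArrayToString and getEosChars follow the names used in this thread, but their bodies here are only a guess at what they might look like.

```java
import java.util.Arrays;
import java.util.Properties;

/** Illustration only: stores EOS characters as one string property. */
final class EosManifestSketch {

    // Hypothetical property key; the real constant's value is not shown in the thread.
    static final String EOS_CHARACTERS_PROPERTY = "eosCharacters";

    /** Joins the characters into one string, e.g. {'.', '?'} becomes ".?". */
    static String eosCharArrayToString(char[] eosCharacters) {
        return new String(eosCharacters);
    }

    /** Reads the characters back; returns null when the property is absent. */
    static char[] getEosChars(Properties manifest) {
        String value = manifest.getProperty(EOS_CHARACTERS_PROPERTY);
        return value == null ? null : value.toCharArray();
    }

    public static void main(String[] args) {
        Properties manifest = new Properties();
        char[] eos = {'.', '?', '!'};
        manifest.setProperty(EOS_CHARACTERS_PROPERTY, eosCharArrayToString(eos));

        // The round trip yields the same characters in the same order.
        System.out.println(Arrays.equals(eos, getEosChars(manifest)));
    }
}
```

Returning null for a missing property lets the caller distinguish "not configured" from an explicitly configured set of characters.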
Jörn
On Thu, Feb 9, 2012 at 12:38 PM, Katrin Tomanek
<[email protected]> wrote:
Hi,
I am moving the discussion on making the EOS characters of the sentence
splitter configurable to the dev list (it was previously on the user list).
I am currently trying to make the EOS characters a parameter of
SentenceDetectorME and store them as a model parameter.
So far this works fine, although it requires changes in quite a few places
in the code.
I am putting a "char[] eosCharacters" into the artifactMap in SentenceModel.
When predicting with a model, I test whether the EOS parameter is set; if
so, I use these EOS symbols, otherwise the language-dependent ones.
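The fallback described above could look roughly like the following sketch. The class EosFallbackSketch, the method resolveEosChars, and the default characters are all hypothetical names and values chosen for illustration; they are not taken from the actual sentence detector code.

```java
import java.util.Arrays;

/** Illustration only: prefer EOS characters from the model, else use defaults. */
final class EosFallbackSketch {

    /**
     * Returns the model's EOS characters when they are set (non-null),
     * otherwise the language-dependent defaults.
     */
    static char[] resolveEosChars(char[] modelEosChars, char[] languageDefaults) {
        return modelEosChars != null ? modelEosChars : languageDefaults;
    }

    public static void main(String[] args) {
        // Assumed defaults for some language; the real defaults are not shown in the thread.
        char[] defaults = {'.', '?', '!'};

        // A model trained with custom EOS characters uses them.
        System.out.println(Arrays.toString(resolveEosChars(new char[] {';'}, defaults)));

        // A model without the parameter falls back to the defaults.
        System.out.println(Arrays.toString(resolveEosChars(null, defaults)));
    }
}
```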
However, I am now running into trouble when serializing the model with
the new "char[]" parameter:
Writing sentence detector model ... Exception in thread "main"
java.lang.IllegalStateException: Missing serializer for eosCharacters
I know that I would have to write such a serializer, but I am a bit
lost here. Any hints? (Maybe there is already a serializer for char[]
that I could easily use.)
Best
Katrin
--
Dr. Katrin Tomanek
Averbis GmbH
Tennenbacher Strasse 11
D-79106 Freiburg
Phone: +49 (0) 761 - 203 97696
Fax: +49 (0) 761 - 203 97694
E-Mail: [email protected]
Managing Directors: Dr. med. Philipp Daumke, Dr. Kornél Markó
Registered office: Freiburg i. Br.
Commercial register: Freiburg i. Br. District Court, HRB 701080