Following your suggestion (I think), I built a tokenfilter with the following
code for next():
public final Token next() throws IOException {
Token newToken = input.next();
termText = newToken.termText();
Character tempChar = termText.charAt(0);
if(Character.isUpperCase(tempChar)) {
for(int current = 0; current < termText.length();
current++){
Character currentChar =
termText.charAt(current);
if (Character.isWhitespace(currentChar) &
Character.isUpperCase(currentChar + 1) & current != termText.length()) {
return newToken;
}
}
}
return null;
}
-----------
and in calling this filter, i'm also calling NGramAnalyzerWrapper wrapping
WhitespaceAnalyzer (these two work together), but when building my index i
get the following error:
Exception in thread "main" java.lang.NullPointerException
at rem.NamedEntityTokenFilter.next(NamedEntityTokenFilter.java:21)
at
org.apache.lucene.index.DocumentWriter.invertDocument(DocumentWriter.java:219)
at
org.apache.lucene.index.DocumentWriter.addDocument(DocumentWriter.java:95)
at
org.apache.lucene.index.IndexWriter.buildSingleDocSegment(IndexWriter.java:1013)
at
org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1001)
at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:983)
at ancorpMethods.Handlers.handleDOC(Handlers.java:92)
at ancorpMethods.Handlers.handleDir(Handlers.java:32)
at ancorpMethods.Handlers.handleDir(Handlers.java:30)
at ancorpMethods.Handlers.handleDir(Handlers.java:30)
at ancorpMethods.Handlers.handleDir(Handlers.java:30)
at ancorpMethods.Handlers.handleDir(Handlers.java:30)
at Base.Indexer.indexCapitalNgrams(Indexer.java:155)
at Base.Indexer.main(Indexer.java:81)
----------
am I forgetting something or am I going the wrong way? :|
--
View this message in context:
http://www.nabble.com/Basic-Named-Entity-Indexing-tp14291880p14691223.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]