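Hi Chris,

A NullPointerException can be caused by not checking newToken for null after this line:

    Token newToken = input.next();

input.next() returns null once the underlying stream is exhausted, so the following call to newToken.termText() fails.
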
I think Hoss meant to call next() on the input for as long as the returned tokens do not satisfy the check for being a named entity. Also, this code assumes white space inside the token, which you won't have since you are using a WhitespaceAnalyzer.

For returning single-word names I think something like this should work:

    Token t;
    while ((t = input.next()) != null
           && !Character.isUpperCase(t.termText().charAt(0))) {
      // skip tokens that do not start with an upper case character
    }
    return t;

For identifying two consecutive tokens starting with an upper case character and returning them as a single name, a bit more code is required.

Btw, I don't understand why the NGram.

HTH,
Doron

On Jan 8, 2008 5:05 PM, chris.b <[EMAIL PROTECTED]> wrote:
>
> Following your suggestion (I think), I built a tokenfilter with the
> following code for next():
>
>   public final Token next() throws IOException {
>     Token newToken = input.next();
>     termText = newToken.termText();
>     Character tempChar = termText.charAt(0);
>     if(Character.isUpperCase(tempChar)) {
>       for(int current = 0; current < termText.length(); current++){
>         Character currentChar = termText.charAt(current);
>         if (Character.isWhitespace(currentChar) &
>             Character.isUpperCase(currentChar + 1) &
>             current != termText.length()) {
>           return newToken;
>         }
>       }
>     }
>     return null;
>   }
>
> -----------
> and in calling this filter, i'm also calling NGramAnalyzerWrapper wrapping
> WhitespaceAnalyzer (these two work together), but when building my index i
> get the following error:
>
> Exception in thread "main" java.lang.NullPointerException
>   at rem.NamedEntityTokenFilter.next(NamedEntityTokenFilter.java:21)
>   at org.apache.lucene.index.DocumentWriter.invertDocument(DocumentWriter.java:219)
>   at org.apache.lucene.index.DocumentWriter.addDocument(DocumentWriter.java:95)
>   at org.apache.lucene.index.IndexWriter.buildSingleDocSegment(IndexWriter.java:1013)
>   at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1001)
>   at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:983)
>   at ancorpMethods.Handlers.handleDOC(Handlers.java:92)
>   at ancorpMethods.Handlers.handleDir(Handlers.java:32)
>   at ancorpMethods.Handlers.handleDir(Handlers.java:30)
>   at ancorpMethods.Handlers.handleDir(Handlers.java:30)
>   at ancorpMethods.Handlers.handleDir(Handlers.java:30)
>   at ancorpMethods.Handlers.handleDir(Handlers.java:30)
>   at Base.Indexer.indexCapitalNgrams(Indexer.java:155)
>   at Base.Indexer.main(Indexer.java:81)
>
> ----------
> am I forgetting something or am I going the wrong way? :|
>
>
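
PS: For the two-token case mentioned in the reply, here is a rough sketch of the look-ahead logic. To keep it runnable on its own it works on plain strings rather than Lucene Token objects, so CapitalizedNameScanner, allNames() and the pending field are hypothetical names for illustration, not Lucene API. In a real TokenFilter the same idea would apply with input.next() and termText() in place of the iterator, and the merged Token would also need sensible offsets. Treat it as one possible approach, not a tested filter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper (not part of Lucene): scans whitespace-split tokens and
// returns capitalized tokens, merging two consecutive ones into a single name.
class CapitalizedNameScanner {
    private final Iterator<String> input;
    private String pending; // token read ahead but not yet consumed, or null

    CapitalizedNameScanner(Iterator<String> input) {
        this.input = input;
    }

    // Null-safe and empty-safe check for a leading upper case character.
    private static boolean startsUpper(String s) {
        return s != null && !s.isEmpty() && Character.isUpperCase(s.charAt(0));
    }

    // Returns the pushed-back token if there is one, else the next input
    // token, else null at end of input (mirroring the null-at-end convention).
    private String read() {
        if (pending != null) {
            String t = pending;
            pending = null;
            return t;
        }
        return input.hasNext() ? input.next() : null;
    }

    // Returns the next name (one or two capitalized tokens), or null at end.
    String next() {
        String t;
        while ((t = read()) != null && !startsUpper(t)) {
            // skip tokens that do not start with an upper case character
        }
        if (t == null) {
            return null; // input exhausted, nothing to dereference
        }
        String following = read(); // look one token ahead
        if (startsUpper(following)) {
            return t + " " + following; // merge into a two-word name
        }
        pending = following; // push back (may be null at end of input)
        return t;
    }

    // Convenience wrapper: collects every name found in the token list.
    static List<String> allNames(List<String> tokens) {
        CapitalizedNameScanner scanner = new CapitalizedNameScanner(tokens.iterator());
        List<String> names = new ArrayList<String>();
        String name;
        while ((name = scanner.next()) != null) {
            names.add(name);
        }
        return names;
    }
}
```

For example, allNames(Arrays.asList("yesterday", "Doron", "Cohen", "met", "Chris", "in", "Haifa")) yields "Doron Cohen", "Chris" and "Haifa". Note that this sketch merges at most two tokens, so three capitalized tokens in a row come back as a two-word name followed by a single-word one.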