RE: need to find locations of query hits in doc: works fine for regular text but not for phone numbers
OK, so I figured out what the problem was. It wasn't with the digits but rather with the various delimiters, like ( and -, that I use. Essentially, the statement

    String[] subTerms = qstr.split("\\s+");

does not split a query the same way the query parser would. And thanks: query.toString() helped me see that.

My question now is this: is there an easy way to extract a sequence of substrings from the query to use in place of the subTerms array I get from split? I see that sometimes query.toString() returns things like

    contents:800 contents:555 contents:1212

but other times it's something like

    contents:800 (contents:555 contents:1212)

So instead of trying to guess what other formats query.toString() can produce and trying to parse those, can I somehow extract the substrings of the query reliably?

Thanks!

To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
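A minimal sketch of the mismatch described in the message above, using only the JDK. Splitting on whitespace leaves the phone number as a single token, while splitting on runs of non-alphanumeric characters yields three. The second method is only a rough stand-in for an analyzer, not Lucene's actual tokenization rules, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SplitDemo {
    // Splits on whitespace only, as the code in the original post does.
    static List<String> whitespaceSplit(String q) {
        return Arrays.asList(q.trim().split("\\s+"));
    }

    // Assumption: approximates an analyzer by breaking on any run of
    // characters that are not letters or digits, then lowercasing.
    static List<String> delimiterSplit(String q) {
        List<String> out = new ArrayList<String>();
        for (String t : q.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) { // a leading delimiter produces an empty token
                out.add(t.toLowerCase());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(whitespaceSplit("800-555-1212"));
        System.out.println(delimiterSplit("(800) 555-1212"));
    }
}
```

Any hand-written splitter like this can still drift from what the configured analyzer really does, which is why taking the terms from the parsed query itself is the safer route.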
RE: need to find locations of query hits in doc: works fine for regular text but not for phone numbers
Just take the BooleanQuery returned by the QueryParser and get its clauses (sub-queries such as TermQuery, PhraseQuery, or other BooleanQuery instances). That gives you all the query components. In most cases, some recursive instanceof checking for the various Query subclasses can do this.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
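The recursive instanceof walk suggested above might look roughly like the sketch below. To keep it runnable without Lucene on the classpath, it uses small hypothetical stand-in classes (TermNode, PhraseNode, BoolNode) rather than the real Query types. In Lucene 3.x the corresponding branches would, as far as I know, test for TermQuery, PhraseQuery and BooleanQuery and read getTerm(), getTerms() and getClauses() respectively.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class QueryWalkDemo {
    // Hypothetical stand-ins for a parsed query tree; these are NOT Lucene classes.
    interface Node {}

    static final class TermNode implements Node {
        final String text;
        TermNode(String text) { this.text = text; }
    }

    static final class PhraseNode implements Node {
        final List<String> words;
        PhraseNode(String... words) { this.words = Arrays.asList(words); }
    }

    static final class BoolNode implements Node {
        final List<Node> clauses;
        BoolNode(Node... clauses) { this.clauses = Arrays.asList(clauses); }
    }

    // Depth-first walk: dispatch on the concrete type and recurse into clauses.
    static void collect(Node q, List<String> out) {
        if (q instanceof TermNode) {
            out.add(((TermNode) q).text);
        } else if (q instanceof PhraseNode) {
            out.addAll(((PhraseNode) q).words);
        } else if (q instanceof BoolNode) {
            for (Node clause : ((BoolNode) q).clauses) {
                collect(clause, out);
            }
        }
        // Any other node type is silently ignored in this sketch.
    }

    static List<String> terms(Node q) {
        List<String> out = new ArrayList<String>();
        collect(q, out);
        return out;
    }

    public static void main(String[] args) {
        // Mirrors the shape "contents:800 (contents:555 contents:1212)".
        Node q = new BoolNode(new TermNode("800"),
                new BoolNode(new TermNode("555"), new TermNode("1212")));
        System.out.println(terms(q));
    }
}
```

Because the walk recurses into nested boolean nodes, both the flat and the nested shapes reported for query.toString() yield the same term list.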
Re: need to find locations of query hits in doc: works fine for regular text but not for phone numbers
: Subject: need to find locations of query hits in doc: works fine for regular
: text but not for phone numbers
: Message-ID: a57498edec10c64781ea0f7dba665cef264de...@ex2010mb01-1.caci.com
: References: 1339635547170-3989548.p...@n3.nabble.com
: In-Reply-To: 1339635547170-3989548.p...@n3.nabble.com

https://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you change the subject line of your email, other mail headers still track which thread you replied to and your question is hidden in that thread and gets less attention. It makes following discussions in the mailing list archives particularly difficult.

-Hoss
RE: need to find locations of query hits in doc: works fine for regular text but not for phone numbers
Uwe, sorry, but I am having trouble understanding this. Can you point me to a place in the documentation that explains this in more detail (I've read http://lucene.apache.org/core/old_versioned_docs/versions/3_4_0/api/core/org/apache/lucene/queryParser/QueryParser.html but am still confused) or to some example code?

Thanks much,
Ilya
Re: need to find locations of query hits in doc: works fine for regular text but not for phone numbers
Look at this code:

    QueryTermExtractor.getTerms(Query query)

http://lucene.apache.org/core/3_6_0/api/contrib-highlighter/org/apache/lucene/search/highlight/QueryTermExtractor.html

-- Jack Krupansky
RE: need to find locations of query hits in doc: works fine for regular text but not for phone numbers
Worked like a charm! Thx!
need to find locations of query hits in doc: works fine for regular text but not for phone numbers
Hello All,

I am using Lucene 3.4. I need to find the locations of query hits in a document. What I've implemented works fine for textual queries but does not work for phone numbers.

Here's how I index my docs:

    String oc = "Joe dialed 800-555-1212 but got a busy signal";
    doc.add(new Field("contents", oc, Field.Store.NO, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));

Now, here is how I find locations. I search for a query. If I get a hit, I split my query (in case it's multi-word) into words and search for each of them using TermFreqVector like this:

    //String qstr = "my multiword query"; // for queries like this it works fine...
    String qstr = "800-555-1212"; // ...but not for ones like this
    Query query = parser.parse(qstr);
    TopDocs results = searcher.search(query, Integer.MAX_VALUE);
    ScoreDoc[] hits = results.scoreDocs;
    String[] subTerms = qstr.split("\\s+"); // phone string stays intact here
    for (int i = 0; i < hits.length; i++) {
        int docId = hits[i].doc;
        Document doc = searcher.doc(docId);
        TermFreqVector tfvector = reader.getTermFreqVector(docId, "contents");
        TermPositionVector tpvector = (TermPositionVector) tfvector;
        for (String subTerm : subTerms) {
            String subq = subTerm.toLowerCase();
            int termidx = tfvector.indexOf(subq); // get termidx = -1 here
            TermVectorOffsetInfo[] tvoffsetinfo = tpvector.getOffsets(termidx);
            for (int j = 0; j < tvoffsetinfo.length; j++) {
                int offsetStart = tvoffsetinfo[j].getStartOffset();
                int offsetEnd = tvoffsetinfo[j].getEndOffset();
                // ...

For a query like 800-555-1212, tfvector.indexOf returns -1. What am I doing wrong?

Thanks,
Ilya Zavorin
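As a hedged illustration of the lookup step above, the sketch below models a term vector with plain arrays and skips any sub-term whose index comes back as -1 instead of passing it on to the offsets lookup. Everything here is a hypothetical stand-in, not the Lucene API. It assumes the analyzer indexed the phone number as the separate terms 800, 555 and 1212, which matches the query.toString() output reported elsewhere in this thread, and it lists only a subset of the sentence's terms for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class OffsetLookupDemo {
    // Hypothetical stand-in for one document's term vector (sorted terms).
    // SPANS[i] holds the [start, end) character offsets of TERMS[i] in
    // "Joe dialed 800-555-1212 but got a busy signal".
    static final String[] TERMS = {"1212", "555", "800", "dialed", "joe"};
    static final int[][][] SPANS = {
        {{19, 23}}, {{15, 18}}, {{11, 14}}, {{4, 10}}, {{0, 3}}
    };

    // Returns the index of the term, or -1 when the term is absent.
    static int indexOf(String term) {
        int i = Arrays.binarySearch(TERMS, term);
        return i >= 0 ? i : -1;
    }

    // Collects offset spans for each sub-term, skipping unindexed terms.
    static List<int[]> locate(List<String> subTerms) {
        List<int[]> hits = new ArrayList<int[]>();
        for (String t : subTerms) {
            int idx = indexOf(t.toLowerCase());
            if (idx < 0) {
                continue; // e.g. "800-555-1212" is not a single indexed term
            }
            hits.addAll(Arrays.asList(SPANS[idx]));
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(locate(Arrays.asList("800-555-1212")).size());
        System.out.println(locate(Arrays.asList("800", "555", "1212")).size());
    }
}
```

The guard only avoids the failed lookup; getting real hits still depends on feeding in sub-terms that match what was actually indexed.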
Re: need to find locations of query hits in doc: works fine for regular text but not for phone numbers
Try putting the phone number in quotes in the query:

    String qstr = "\"800-555-1212\"";

And check query.toString() to see how the query parser analyzed the term, both with and without quotes. And make sure you initialized the query parser with "contents" as the default field.

-- Jack Krupansky