Duplicate Hits

2005-02-01 Thread Jerry Jalenak
Is there a way to eliminate duplicate hits being returned from the index?

Jerry Jalenak
Senior Programmer / Analyst, Web Publishing
LabOne, Inc.
10101 Renner Blvd.
Lenexa, KS  66219
(913) 577-1496

[EMAIL PROTECTED]


This transmission (and any information attached to it) may be confidential and
is intended solely for the use of the individual or entity to which it is
addressed. If you are not the intended recipient or the person responsible for
delivering the transmission to the intended recipient, be advised that you
have received this transmission in error and that any use, dissemination,
forwarding, printing, or copying of this information is strictly prohibited.
If you have received this transmission in error, please immediately notify
LabOne at the following email address: [EMAIL PROTECTED]


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: Duplicate Hits

2005-02-01 Thread Jerry Jalenak
Ok, OK.  Should have seen that response coming  8-)

The documents I'm indexing are sent from a legacy system, and can be sent
multiple times - but I only want to keep the documents if something has
changed.  If the indexed fields match exactly, I don't want to index the
second (or third, fourth, etc.) copy.  If the indexed fields have
changed, then I want to index the 'new' document, and keep it.

Given Erik's response of 'don't put duplicate documents in the index', how
can I accomplish this in the IndexWriter?



-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Tuesday, February 01, 2005 8:35 AM
To: Lucene Users List
Subject: Re: Duplicate Hits


On Feb 1, 2005, at 9:01 AM, Jerry Jalenak wrote:
 Is there a way to eliminate duplicate hits being returned from the 
 index?

Sure, don't put duplicate documents in the index :)

Erik





RE: Duplicate Hits

2005-02-01 Thread Jerry Jalenak
Nice idea John - one I hadn't considered.  Once you have the checksum, do
you 'check' in the index first before storing the second document?  Or do
you filter on the query side?



-Original Message-
From: John Haxby [mailto:[EMAIL PROTECTED]
Sent: Tuesday, February 01, 2005 9:06 AM
To: Lucene Users List
Subject: Re: Duplicate Hits


Jerry Jalenak wrote:

Given Erik's response of 'don't put duplicate documents in the index', how
can I accomplish this in the IndexWriter?
  

I was dealing with a similar requirement recently.   I eventually 
decided on storing the MD5 checksum of the document as a keyword.   It 
means reading it twice (once to calculate the checksum, once to index 
it), but it seems to do the trick.

jch
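A minimal JDK-only sketch of the checksum idea, assuming the checksum is taken over the indexed field values (matching Jerry's definition of "changed") rather than over the raw file. `DocChecksum` and its methods are hypothetical names; the hex string it returns is what would be stored as a keyword field and searched for before indexing.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: an MD5 checksum over a document's indexed field values,
// so two submissions with identical fields produce the same key.
final class DocChecksum {
    // Separator assumed never to appear inside a field value.
    private static final String SEP = "\u001f";

    private DocChecksum() {}

    // Hex-encodes the MD5 digest of a UTF-8 string (32 lowercase hex chars).
    static String md5Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support MD5.
            throw new IllegalStateException(e);
        }
    }

    // Checksums the field values in the order given; callers must always
    // pass the fields in the same order.
    static String fieldChecksum(String... fieldValues) {
        return md5Hex(String.join(SEP, fieldValues));
    }
}
```

Joining the values with a separator keeps ("ab", "c") and ("a", "bc") from producing the same checksum.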




RE: Duplicate Hits

2005-02-01 Thread Jerry Jalenak
Just to make sure I understand...

Do you keep an IndexReader open at the same time you are running the
IndexWriter?  From what I can see in the JavaDocs, it looks like only
IndexReader (or IndexSearcher) can peek into the index and see if a document
exists or not.

Thanks!



-Original Message-
From: John Haxby [mailto:[EMAIL PROTECTED]
Sent: Tuesday, February 01, 2005 9:39 AM
To: Lucene Users List
Subject: Re: Duplicate Hits


Jerry Jalenak wrote:

Nice idea John - one I hadn't considered.  Once you have the checksum, do
you 'check' in the index first before storing the second document?  Or do
you filter on the query side?
  

I do a quick search for the md5 checksum before indexing.

Although I suspect it's not applicable in your case, I also maintained a
"last indexed" timestamp alongside the index.  I used this
to drastically prune the number of documents that needed to be
considered for indexing if I restarted; anything modified before then
wasn't a candidate.  Since the MD5 checksum provides the definitive (for
a sufficiently loose definition of definitive) indication of whether a
document is indexed, I didn't need to worry about ultra-fine granularity
in the timestamp, and I didn't need to worry about it being committed to
disk; it generally got committed to the magnetic stuff every few seconds
or so.

It does help a lot, though, if documents have nice unique identifiers that
you can use instead; then you can use the identifier and the last-modified
time to decide whether or not to re-index.

jch
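John's last suggestion (a unique identifier plus a last-modified time, with the checksum as the definitive test) might be sketched like this. It is a hypothetical, JDK-only model: the two maps stand in for searches against the index on an id or checksum field, and `ReindexDecider` is a made-up name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for "what the index already holds". In a real system
// these lookups would be index searches, not in-memory maps.
final class ReindexDecider {
    private final Map<String, Long> lastModifiedById = new HashMap<>();
    private final Map<String, String> checksumById = new HashMap<>();

    // Returns true if the document is new or its content changed, and
    // records what was seen so later duplicates are rejected.
    boolean shouldIndex(String id, long lastModified, String checksum) {
        Long seenTime = lastModifiedById.get(id);
        if (seenTime != null && lastModified <= seenTime) {
            return false; // not modified since it was last seen
        }
        if (checksum.equals(checksumById.get(id))) {
            // Timestamp moved but content is identical: note the time only.
            lastModifiedById.put(id, lastModified);
            return false;
        }
        lastModifiedById.put(id, lastModified);
        checksumById.put(id, checksum);
        return true;
    }
}
```

The cheap timestamp comparison runs first; the checksum only decides the cases where the timestamp says "maybe changed".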




RE: Duplicate Hits

2005-02-01 Thread Jerry Jalenak
OK - but I'm dealing with indexing between 1.5 and 2 million documents, so I
really don't want to 'batch' them up if I can avoid it.  I also don't
think I can keep an IndexReader open on the index at the same time I have an
IndexWriter open.  I may have to try to deal with this issue through some
sort of filter on the query side, provided it doesn't impact performance too
much.

Thanks.
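If de-duplication has to happen on the query side, one option is to collapse hits that share a stored checksum, keeping the highest-ranked one. This is a JDK-only sketch, assuming each document stores a checksum field and that hits arrive in rank order; `HitCollapser` and its `Hit` class are hypothetical stand-ins for real search results.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical query-side de-duplication over ranked hits.
final class HitCollapser {
    // Minimal stand-in for a hit: document number plus stored checksum.
    static final class Hit {
        final int doc;
        final String checksum;
        Hit(int doc, String checksum) { this.doc = doc; this.checksum = checksum; }
    }

    // Keeps the first (highest-ranked) hit per checksum, preserving rank order.
    static List<Hit> collapse(List<Hit> rankedHits) {
        Map<String, Hit> firstByChecksum = new LinkedHashMap<>();
        for (Hit h : rankedHits) {
            firstByChecksum.putIfAbsent(h.checksum, h);
        }
        return new ArrayList<>(firstByChecksum.values());
    }
}
```

Because the collapsing happens after the search, the raw hit count still includes the duplicates, and the cost grows with the number of hits examined.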



-Original Message-
From: John Haxby [mailto:[EMAIL PROTECTED]
Sent: Tuesday, February 01, 2005 9:48 AM
To: Lucene Users List
Subject: Re: Duplicate Hits


Jerry Jalenak wrote:

Just to make sure I understand

Do you keep an IndexReader open at the same time you are running the
IndexWriter?  From what I can see in the JavaDocs, it looks like only
IndexReader (or IndexSearcher) can peek into the index and see if a document
exists or not.
  

I slightly misled you: it wasn't Lucene that I was using at the time, and
in that system the distinction between IndexReader and IndexWriter
didn't exist.  I'm really just getting to grips with Lucene, but it
would seem to be possible to use a similar scheme, especially if you
batch up your documents for indexing: as they come in, check the MD5
checksum against what's already known and what's already queued; then,
when the time comes to process the queue, you know that everything in it
needs to be indexed.

jch
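John's batching scheme can be sketched as a small queue that rejects a document whose checksum is already indexed or already queued, so everything handed back at flush time is known to need indexing. This is a hypothetical, JDK-only model: the `Set` of indexed checksums stands in for a lookup against the index, and `IndexQueue` is a made-up name.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical batching front end for the indexer.
final class IndexQueue {
    private final Set<String> indexed;              // checksums already in the index
    private final Set<String> queued = new HashSet<>();
    private final List<String> pending = new ArrayList<>();

    IndexQueue(Set<String> alreadyIndexed) {
        this.indexed = new HashSet<>(alreadyIndexed);
    }

    // Queues the document unless its checksum is known; returns whether it was queued.
    boolean offer(String checksum, String document) {
        if (indexed.contains(checksum) || !queued.add(checksum)) {
            return false; // duplicate of something indexed or already waiting
        }
        pending.add(document);
        return true;
    }

    // Returns the batch to index and marks its checksums as indexed.
    List<String> drain() {
        List<String> batch = new ArrayList<>(pending);
        indexed.addAll(queued);
        queued.clear();
        pending.clear();
        return batch;
    }
}
```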




Index Layout Question

2005-01-27 Thread Jerry Jalenak
I am in the process of indexing about 1.5 million documents, and have
started down the path of indexing these by month.  Each month has between
100,000 and 200,000 documents.  From a performance standpoint, is this the
right approach?  This allows me to use MultiSearcher (or
ParallelMultiSearcher), but I'm not sure if the performance gains are really
there.  Would one monolithic index be better?

Thanks.




RE: Index Layout Question

2005-01-27 Thread Jerry Jalenak
That's good to know.

I'm indexing 11 fields (9 keyword, 2 text).  The documents themselves are
between 1K and 2K in size.

Is there a point at which IndexSearcher performance begins to fall off (in
terms of the number of indexed documents)?



-Original Message-
From: Ian Soboroff [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 27, 2005 10:31 AM
To: Lucene Users List
Subject: Re: Index Layout Question


Jerry Jalenak [EMAIL PROTECTED] writes:

 I am in the process of indexing about 1.5 million documents, and have
 started down the path of indexing these by month.  Each month has between
 100,000 and 200,000 documents.  From a performance standpoint, is this the
 right approach?  This allows me to use MultiSearcher (or
 ParallelMultiSearcher), but I'm not sure if the performance gains are really
 there.  Would one monolithic index be better?

Depends on your search infrastructure.  Doug Cutting has sent out some
basic optimization guidelines on this list which should be in the
archives... simply, you need to think about how many CPUs and spindles
are involved.

1.5m documents isn't a challenge for Lucene to index or search on a
single machine with a monolithic index.  I indexed about 1.6m web
pages in 22 hours on a single machine with all data local, and search
with a single IndexSearcher was instantaneous.  We've also done some
testing with a larger collection (25m pages) and
ParallelMultiSearchers on several machines, and likewise on a fast
network haven't felt a slowdown, but we haven't actually benchmarked
it.

Ian






[HOWTO] Setting BooleanQuery MaxClauseCount

2005-01-26 Thread Jerry Jalenak
Is there a way to set the maxClauseCount field of BooleanQuery when using
QueryParser?




RE: [HOWTO] Setting BooleanQuery MaxClauseCount

2005-01-26 Thread Jerry Jalenak
Never mind.

<disclaimer>
These types of questions are what happen when one is trying to do too many
things at the same time.
</disclaimer>




RE: Filtering w/ Multiple Terms

2005-01-24 Thread Jerry Jalenak
I spent some time reading the Lucene in Action book this weekend (great job,
btw), and came across the section on using custom filters.  Since the data
I need to filter my hit set with comes from a database, I thought it would
be worth my effort this morning to write a custom filter to handle the
filtering for me.  So, using the example from the book (page 210), I've
coded an AccountFilter:

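import java.io.IOException;
import java.util.BitSet;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.search.Filter;

public class AccountFilter extends Filter
{
    public AccountFilter()
    {}

    public BitSet bits(IndexReader indexReader)
        throws IOException
    {
        System.out.println("Entering AccountFilter...");
        // One bit per document; a set bit marks a document the search may return.
        BitSet bitSet = new BitSet(indexReader.maxDoc());

        // Accounts to allow (hard-coded in this version).
        String[] reportingAccounts = new String[] {"0011", "4kfs"};

        // Buffers for TermDocs.read(); with length 1, each read() call
        // returns at most one matching document.
        int[] docs = new int[1];
        int[] freqs = new int[1];

        for (int i = 0; i < reportingAccounts.length; i++)
        {
            String reportingAccount = reportingAccounts[i];
            if (reportingAccount != null)
            {
                TermDocs termDocs = indexReader.termDocs(
                    new Term("account", reportingAccount));
                // read() is called only once here, so only the first
                // document containing this account gets its bit set.
                int count = termDocs.read(docs, freqs);
                if (count == 1)
                {
                    System.out.println("Setting bit on");
                    bitSet.set(docs[0]);
                }
                termDocs.close();
            }
        }
        System.out.println("Leaving AccountFilter...");
        return bitSet;
    }
}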

I see where the AccountFilter is setting the corresponding 'bits', but I end
up without any 'hits':

Entering AccountFilter...
Entering AccountFilter...
Entering AccountFilter...
Setting bit on
Setting bit on
Setting bit on
Setting bit on
Setting bit on
Leaving AccountFilter...
Leaving AccountFilter...
Leaving AccountFilter...
... Found 0 matching documents in 1000 ms

Can anyone tell me what I've done wrong?



 -Original Message-
 From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
 Sent: Friday, January 21, 2005 8:15 AM
 To: Lucene Users List
 Subject: RE: Filtering w/ Multiple Terms
 
 
 This:
 http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/BooleanQuery.TooManyClauses.html
 ?
 
 You can control that limit via
 http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/BooleanQuery.html#maxClauseCount
 
 Otis
 
 

RE: Filtering w/ Multiple Terms

2005-01-24 Thread Jerry Jalenak
Paul / Erik - 

I'm using the ParallelMultiSearcher to search three indexes concurrently -
hence the three entries into AccountFilter.  If I remove the filter from my
query, and simply enter the query on the command line, I get two hits back.
In other words, I can enter this:

smith AND (account:0011)

and get hits back.  When I add the filter back in (which should take care of
the account:0011 part of the query), and enter only smith as my query, I get
0 hits.





 -Original Message-
 From: Erik Hatcher [mailto:[EMAIL PROTECTED]
 Sent: Monday, January 24, 2005 1:07 PM
 To: Lucene Users List
 Subject: Re: Filtering w/ Multiple Terms
 
 
 
 On Jan 24, 2005, at 12:26 PM, Jerry Jalenak wrote:
  I spent some time reading the Lucene in Action book this weekend 
  (great job,
  btw)
 
 Thanks!
 
  public class AccountFilter extends Filter
  I see where the AccountFilter is setting the corresponding 'bits', but
  I end up without any 'hits':
 
  Entering AccountFilter...
  Entering AccountFilter...
  Entering AccountFilter...
  Setting bit on
  Setting bit on
  Setting bit on
  Setting bit on
  Setting bit on
  Leaving AccountFilter...
  Leaving AccountFilter...
  Leaving AccountFilter...
  ... Found 0 matching documents in 1000 ms
 
  Can anyone tell me what I've done wrong?
 
 A filter constrains which documents will be consulted during a search,
 but the Query needs to match some documents that are turned on by the
 filter bits.  I'm guessing that your Query did not match any of the
 documents you turned on.
 
   Erik
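Erik's point can be modeled with plain `java.util.BitSet`s: a document is returned only if it matches the query and its filter bit is set, so a filter whose bits miss every query match yields zero hits. This is a conceptual sketch with invented document numbers, not Lucene's internal implementation; `FilterModel` is a hypothetical name.

```java
import java.util.BitSet;

// Conceptual model: hits = (documents matching the query) AND (filter bits).
final class FilterModel {
    static BitSet hits(BitSet queryMatches, BitSet filterBits) {
        BitSet result = (BitSet) queryMatches.clone(); // leave the inputs untouched
        result.and(filterBits);
        return result;
    }
}
```

Printing both sets, as Erik suggests, is the quickest way to see whether they overlap at all.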
 
 



RE: Filtering w/ Multiple Terms

2005-01-24 Thread Jerry Jalenak
<sheepish-look-on-face/>

After re-reading the book (again), and the javadocs (again), it dawned on my
little brain that I needed to have doc and freq arrays *the size of
maxDoc()* for the index reader.  I also needed to iterate through the docs
array and call bitSet.set for each entry in docs (that was valid, of
course).  Everything is good now.

Thanks!
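The fix comes down to setting a bit for every document that contains each account term, not just the first one read. Below is a JDK-only model of that logic; the `Map` is a hypothetical stand-in for the term postings (account value to document numbers), and `AccountBitsModel` is a made-up name. With Lucene's `TermDocs`, as we understand the 1.x API, the usual way to walk the postings is `while (termDocs.next()) { bits.set(termDocs.doc()); }`, which avoids sizing arrays to `maxDoc()`; if `read(int[], int[])` is used instead, it is safer to call it repeatedly until it returns 0, since a single call is not guaranteed to return every match (for example, on an index with several segments).

```java
import java.util.BitSet;
import java.util.List;
import java.util.Map;

// JDK-only model of the corrected filter: one bit per matching document.
final class AccountBitsModel {
    static BitSet bits(Map<String, List<Integer>> postings, int maxDoc, String... accounts) {
        BitSet bits = new BitSet(maxDoc);
        for (String account : accounts) {
            List<Integer> docs = postings.get(account);
            if (docs == null) {
                continue; // account term not present in this index
            }
            for (int doc : docs) {
                bits.set(doc); // every matching document, not only the first
            }
        }
        return bits;
    }
}
```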



 -Original Message-
 From: Erik Hatcher [mailto:[EMAIL PROTECTED]
 Sent: Monday, January 24, 2005 1:27 PM
 To: Lucene Users List
 Subject: Re: Filtering w/ Multiple Terms
 
 
 As Paul suggested, output the Lucene document numbers from your Hits,
 and also output which bit you're setting in your filter.  Do those sets
 overlap?
 
   Erik
 



RE: Filtering w/ Multiple Terms

2005-01-21 Thread Jerry Jalenak
OK.  But isn't there a limit on the number of BooleanQueries that can be
combined with AND / OR / etc?





 -Original Message-
 From: Erik Hatcher [mailto:[EMAIL PROTECTED]
 Sent: Thursday, January 20, 2005 5:05 PM
 To: Lucene Users List
 Subject: Re: Filtering w/ Multiple Terms
 
 
 
 On Jan 20, 2005, at 5:02 PM, Jerry Jalenak wrote:
 
  In looking at the examples for filtering of hits, it looks like I can only
  specify a single term; i.e.
 
  Filter f = new QueryFilter(new TermQuery(new Term("acct", "acct1")));
 
  I need to specify more than one term in my filter.  Short of using
  something like ChainFilter, how are others handling this?
 
 You can make as complex of a Query as you want for QueryFilter.  If you
 want to filter on multiple terms, construct a BooleanQuery with nested
 TermQuerys, either in an AND or OR fashion.
 
   Erik
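Erik's suggestion selects the same documents as combining each term's document set: OR-ing the sets for optional clauses, AND-ing them for required ones. A JDK-only model of the two combinations, with invented document numbers; `MultiTermModel` is a hypothetical name. In Lucene 1.4's `BooleanQuery.add(Query, boolean required, boolean prohibited)`, the OR case corresponds to adding each `TermQuery` with both flags false and the AND case to `required = true`; each clause also counts toward the clause limit asked about later in the thread.

```java
import java.util.BitSet;

// JDK-only model of filtering on several terms at once.
final class MultiTermModel {
    // OR: a document passes if it contains any of the terms.
    static BitSet anyOf(BitSet... perTerm) {
        BitSet result = new BitSet();
        for (BitSet s : perTerm) {
            result.or(s);
        }
        return result;
    }

    // AND: a document passes only if it contains all of the terms.
    static BitSet allOf(BitSet... perTerm) {
        if (perTerm.length == 0) {
            return new BitSet();
        }
        BitSet result = (BitSet) perTerm[0].clone(); // don't mutate the inputs
        for (int i = 1; i < perTerm.length; i++) {
            result.and(perTerm[i]);
        }
        return result;
    }
}
```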
 
 



Filtering w/ Multiple Terms

2005-01-20 Thread Jerry Jalenak
In looking at the examples for filtering of hits, it looks like I can only
specify a single term; i.e.

Filter f = new QueryFilter(new TermQuery(new Term("acct", "acct1")));

I need to specify more than one term in my filter.  Short of using something
like ChainFilter, how are others handling this?

Thanks!




[newbie] Confused about PrefixQuery

2005-01-19 Thread Jerry Jalenak
) + 
", DOB = " +
document.get("dob") + ", Collected = " + document.get("collected") +
", Created = " + document.get("created"));

//System.out.println(document.get("content"));
}
}
}
}
catch(Exception e)
{
System.out.println(e.getClass() + " caught with message " +
e.getMessage());
}
}
</snip>

When I run this using a criteria string of 

lastname:mar*

I get back the following:

Query: 
lastname:mar*
Searching for: lastname:mar*
... Found 9 matching documents

Hit 0: Specimen = 40062720, Account = 0001, Status = N, Name = LOIS MARTIN, SSN = 536628498, DOB = 19010101, Collected = 20050118, Created = 20050119
Hit 1: Specimen = 38843845, Account = 4NEK, Status = N, Name = RENEE CAPPETTA, SSN = 585132901, DOB = 19010101, Collected = 20050117, Created = 20050119
Hit 2: Specimen = 39894441, Account = 3384, Status = N, Name = LINDA CANTU, SSN = 453539817, DOB = 19010101, Collected = 20050118, Created = 20050119
Hit 3: Specimen = 39894441, Account = 3384, Status = N, Name = LINDA CANTU, SSN = 453539817, DOB = 19010101, Collected = 20050118, Created = 20050119
Hit 4: Specimen = 38247027, Account = 23SQ, Status = N, Name = ROBERT BASTOW, SSN = 528960058, DOB = 19010101, Collected = 20050118, Created = 20050119
Hit 5: Specimen = 38247027, Account = 23SQ, Status = N, Name = ROBERT BASTOW, SSN = 528960058, DOB = 19010101, Collected = 20050118, Created = 20050119
Hit 6: Specimen = 38247027, Account = 23SQ, Status = N, Name = ROBERT BASTOW, SSN = 528960058, DOB = 19010101, Collected = 20050118, Created = 20050119
Hit 7: Specimen = 38247027, Account = 23SQ, Status = N, Name = ROBERT BASTOW, SSN = 528960058, DOB = 19010101, Collected = 20050118, Created = 20050119
Hit 8: Specimen = 38247027, Account = 23SQ, Status = N, Name = ROBERT BASTOW, SSN = 528960058, DOB = 19010101, Collected = 20050118, Created = 20050119

I'm at a loss to explain why I'm getting hits 1-8: the last names don't
start with "mar"!  I suspect it is due to an incorrect use of Field.Keyword vs.
Field.Text in the indexer, but I can't seem to figure it out...
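A toy model may help check the Field.Keyword vs. Field.Text suspicion. In the Lucene 1.x API used here, Field.Keyword indexes the whole value as one untokenized term, while Field.Text runs the value through the analyzer (StandardAnalyzer, for example, splits it into words and lowercases them). A PrefixQuery then matches any document that has at least one term in that field starting with the prefix, compared character for character. This sketch is plain Java and is NOT Lucene's API: the class and method names are hypothetical, and the regex tokenizer is only a rough stand-in for a real analyzer.

```java
import java.util.*;

// Toy model (hypothetical, not Lucene's API) of how the indexing mode of a
// field changes what a prefix query such as lastname:mar* can match.
class PrefixModel {
    // field -> (sorted term -> ids of docs containing that term)
    private final Map<String, TreeMap<String, Set<Integer>>> index = new HashMap<>();

    // Like Field.Keyword: the whole value becomes one term, case preserved.
    void addKeyword(int doc, String field, String value) {
        addTerm(doc, field, value);
    }

    // Rough stand-in for Field.Text with a lowercasing analyzer:
    // split on non-alphanumerics and lowercase each token.
    void addText(int doc, String field, String value) {
        for (String tok : value.split("[^A-Za-z0-9]+")) {
            if (!tok.isEmpty()) addTerm(doc, field, tok.toLowerCase(Locale.ROOT));
        }
    }

    private void addTerm(int doc, String field, String term) {
        index.computeIfAbsent(field, f -> new TreeMap<>())
             .computeIfAbsent(term, t -> new TreeSet<>())
             .add(doc);
    }

    // Walk the sorted terms from the prefix onward, stop at the first term
    // that no longer starts with it, and union the matching doc ids.
    Set<Integer> prefixSearch(String field, String prefix) {
        Set<Integer> hits = new TreeSet<>();
        TreeMap<String, Set<Integer>> terms = index.get(field);
        if (terms == null) return hits;
        for (Map.Entry<String, Set<Integer>> e : terms.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) break;
            hits.addAll(e.getValue());
        }
        return hits;
    }

    public static void main(String[] args) {
        PrefixModel m = new PrefixModel();
        m.addKeyword(0, "lastname", "MARTIN");      // indexed verbatim: "MARTIN"
        m.addText(1, "lastname", "LOIS MARTIN");    // indexed as "lois", "martin"
        m.addText(2, "lastname", "CAPPETTA");       // indexed as "cappetta"
        System.out.println(m.prefixSearch("lastname", "mar"));  // [1]
        System.out.println(m.prefixSearch("lastname", "MAR"));  // [0]
        System.out.println(m.prefixSearch("lastname", "lo"));   // [1]
    }
}
```

With the keyword-indexed copy only the upper-case prefix finds doc 0, and the tokenized copy also matches on the first-name word. In every case a prefix query can only hit documents whose indexed terms really start with the prefix, so unexpected hits most likely point at what actually ended up in the field.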

Thanks.

Jerry Jalenak



RE: [newbie] Confused about PrefixQuery

2005-01-19 Thread Jerry Jalenak
Erik,

Thanks for the reply.  Some lists want all the info, some don't.  Just thought
I'd try to provide as much info as possible  8-)

That being said, where do I find Luke?



Jerry Jalenak


 -Original Message-
 From: Erik Hatcher [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, January 19, 2005 2:42 PM
 To: Lucene Users List
 Subject: Re: [newbie] Confused about PrefixQuery
 
 
 
 On Jan 19, 2005, at 3:16 PM, Jerry Jalenak wrote:
  The text files have two control lines at the beginning of them - CC
  and AN.
 
 That's quite a complex example to ask a user list to decipher.
 
 Simplifying the example, besides making it easier for us to
 understand, would likely shed light on the problem.
 
  Everything (I think) indexes correctly.
 
 To be sure, try Luke out and see what got indexed exactly.  You can
 also use Luke as an ad-hoc search tool rather than writing your own.
 
  When I search against this index, though, I get some weird results,
  especially when using an '*' at the end of my criteria.
 
 The results you got definitely are weird given the query, and in my
 initial glance through your code I did not see the issue pop out.  Luke
 will likely shed much more light on the matter.
 
   Erik
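Erik's advice to look at what actually got indexed can be illustrated with another plain-Java sketch (hypothetical names, not Lucene or Luke code). Given the terms each document really has in one field, the kind of information a tool like Luke lets you inspect, it reports which term or terms let each document match a prefix. The term lists in main are invented purely for illustration and are not data from this thread.

```java
import java.util.*;

// Hypothetical debugging aid (not Lucene/Luke code): for each document,
// list the indexed terms in one field that start with a given prefix.
class PrefixMatchExplainer {
    static Map<Integer, List<String>> explain(Map<Integer, List<String>> termsByDoc,
                                              String prefix) {
        Map<Integer, List<String>> reasons = new TreeMap<>();   // sorted by doc id
        for (Map.Entry<Integer, List<String>> e : termsByDoc.entrySet()) {
            List<String> matched = new ArrayList<>();
            for (String term : e.getValue()) {
                if (term.startsWith(prefix)) matched.add(term);
            }
            if (!matched.isEmpty()) reasons.put(e.getKey(), matched);  // doc is a hit
        }
        return reasons;
    }

    public static void main(String[] args) {
        // Invented "lastname" terms for three documents, for illustration only.
        Map<Integer, List<String>> terms = new HashMap<>();
        terms.put(0, Arrays.asList("martin"));
        terms.put(1, Arrays.asList("smith"));
        terms.put(2, Arrays.asList("jones", "marsh"));
        System.out.println(explain(terms, "mar"));   // {0=[martin], 2=[marsh]}
    }
}
```

A document matches as soon as any one of its terms in the field starts with the prefix, so a single unexpected term (for example, text that landed in the wrong field) is enough to produce a hit.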
 
 



RE: [newbie] Confused about PrefixQuery

2005-01-19 Thread Jerry Jalenak
<oops />

Never mind.  Stupid, stupid assumption on my part about the data.

Thanks anyway.

Jerry Jalenak


 -Original Message-
 From: Jerry Jalenak [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, January 19, 2005 3:12 PM
 To: 'Lucene Users List'
 Subject: RE: [newbie] Confused about PrefixQuery
 
 



RE: [newbie] Confused about PrefixQuery

2005-01-19 Thread Jerry Jalenak
Sorry.  I thought Luke came bundled with Lucene and I was just missing it.

Jerry Jalenak


 -Original Message-
 From: Erik Hatcher [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, January 19, 2005 3:28 PM
 To: Lucene Users List
 Subject: Re: [newbie] Confused about PrefixQuery
 
 
 
 On Jan 19, 2005, at 4:12 PM, Jerry Jalenak wrote:
  Thanks for the reply.  Some lists want all the info, some don't.  Just
  thought I'd try to provide as much info as possible  8-)
 
 The info is good... I just push for simple examples :)  By simplifying,
 often the problem becomes apparent and trivial.
 
  That being said, where do I find Luke?
 
 Silly response, but go to Google, type in _luke lucene_ and press "I'm
 Feeling Lucky" :)
 
 But, since I already have the URL handy, here it is:
 
   http://www.getopt.org/luke/
 
   Erik
 
 