I did the change and here are the results: Query (default field is COMP_PART_NUMBER): 2444* Query: COMP_PART_NUMBER:2444* Query Time: 328 ms - time for query to run. 383 total matching documents. Cycle Time: 141 ms - time to run through hits.
Query (default field is COMP_PART_NUMBER): *91822* Query: COMP_PART_NUMBER:*91822* Query Time: 9375 ms 251 total matching documents. Cycle Time: 20094 ms On 9/8/05, Chris Hostetter <[EMAIL PROTECTED]> wrote: > > : is if the query starts with a wildcard. In the case where it starts with > a > : wildcard, lucene has no option but to linearly go over every term in the > : index to see if it matches your pattern. It must visit every singe term > in > > That would explain why the search itself takes a while, but not why > accessing the hits after the call to search would take a while. note > where the timing code is in his example. > > There are two possible explanations i can think of... > > : >>It seems when I have a wilcard query like *abcd* vs weqrew*, the > *abcd* > : query will always take longer to retrieve the documents even if they are > of > : simular result sizes. We are talking a big difference 1 second vs 16. It > is > > 1) How similar, and how many? ... If i remember correctly, the Hits > constructor does some work to pre-fetch the first 100 results. So if you > are iterating over all of the results, the first 100 are free. On the > 101st iteration the prefetching method is called again to fetch N more (i > don't remember what N is off the top of my head. > > what this means is that if you are only timing the method calls on Hits, > then the first 100 documents are free -- if one wildcard search returns 99 > results, and the other returns 105 results, those numbers may not seemthat > different, but in the first case the code you are timing is accessing > nothing but memory, and in the second case it has to read from disk. > > 2) The second idea also requires you to answer a question" the number of > results returned for each query might be identicle, but are the > results themselves identical? > > I'm guessing that either the documents from the "slow" case are either > much bigger (ie: larger stored fields) or the results from the fast case > are all documents that are "near" eachother on disk, so fetching back all > of hte stored fields would require less IO then if the results are stored > farther apart. If i remember correctly, the stored fields of documents > are kept in order that the documents are added, so hypothetically, the > query you did was on a "name" field, and the documents were added to the > index in alphabetical order by "name" then by definition the results for > "weqrew*' will all be close together, while the results for "*abcd*" will > be spread out throughout the index. > > an easy way to disprove that 2nd theory would be to change your timing > code to this and see what happens... > > > Hits hits = searcher.search(query); > long startTime = System.currentTimeMillis(); > for (int i = 0; i < hits.length(); i++) { > int id = hits.id(i); > } > > > > -Hoss > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > >