Thank you Alexander. 

I will try the SVN right now. :) 
I will keep you posted on the results.

See you.

----- Original Message ----
From: Alexander Veremyev <[EMAIL PROTECTED]>
To: Sebi <[EMAIL PROTECTED]>
Cc: fw-general@lists.zend.com
Sent: Thursday, February 8, 2007 6:27:07 PM
Subject: Re: [fw-general] Zend_Search_Lucene questions ...

Hi Sebi,

I've finished the first optimization issue and committed it. Please try 
the current SVN version.

In my environment it's faster (!) than Java Lucene for simple queries 
and gives nearly the same result for complex boolean queries.

If the second optimization issue is also fixed, it has a chance to 
be faster in all cases :)
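
The first optimization (item 2 in the quoted message further down: a Boolean query should skip non-matching documents instead of scoring them) can be illustrated with a small sketch. This is Python rather than PHP, it is not the Zend_Search_Lucene code, and the function names and the toy scoring rule are assumptions made only for the illustration.

```python
def score(doc_id, postings, required):
    # Toy score (an assumption): number of required terms found in the document.
    return sum(1 for term in required if doc_id in postings.get(term, ()))


def search_naive(postings, all_docs, required):
    """Score every document, then keep only those matching all required terms.

    `required` is assumed to be a non-empty list of terms.
    Returns (hits, number_of_documents_scored).
    """
    scored = {doc: score(doc, postings, required) for doc in all_docs}
    hits = {doc: s for doc, s in scored.items() if s == len(required)}
    return hits, len(scored)


def search_skipping(postings, required):
    """Intersect the posting sets first, so only matching documents are scored.

    Returns (hits, number_of_documents_scored).
    """
    sets = sorted((postings.get(term, set()) for term in required), key=len)
    candidates = set.intersection(*sets) if sets else set()
    hits = {doc: score(doc, postings, required) for doc in candidates}
    return hits, len(candidates)


# Hypothetical postings: term -> set of document ids containing it.
postings = {"galeria": {1, 3, 7}, "arte": {2, 3}}
all_docs = range(1, 9)

naive_hits, naive_scored = search_naive(postings, all_docs, ["galeria", "arte"])
fast_hits, fast_scored = search_skipping(postings, ["galeria", "arte"])
print(naive_hits, naive_scored)  # {3: 2} 8
print(fast_hits, fast_scored)    # {3: 2} 1
```

Both functions return the same hits, but the second one scores only the documents that can match, which is where the time is saved.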


Time should be measured including index opening time 
(Zend_Search_Lucene loads the term dictionary index only at the first 
find() request).
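
A minimal sketch of why the measurement has to cover opening plus the first search. It is written in Python with a toy index; the `LazyIndex` class, its `loads` counter and the `timed` helper are hypothetical names, not part of Zend_Search_Lucene.

```python
import time


class LazyIndex:
    """Toy index that defers loading its term dictionary until the first find()."""

    def __init__(self, postings):
        self._source = postings   # stands in for the on-disk index
        self._dictionary = None   # nothing is loaded at open time
        self.loads = 0            # how many times the dictionary was loaded

    def _load_dictionary(self):
        # Hypothetical stand-in for reading the term dictionary from disk.
        self._dictionary = {t: sorted(docs) for t, docs in self._source.items()}
        self.loads += 1

    def find(self, term):
        if self._dictionary is None:
            self._load_dictionary()  # one-time cost, paid by the first query
        return self._dictionary.get(term, [])


def timed(fn):
    """Return (result, elapsed_seconds) for a zero-argument callable."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def open_and_find(postings, term):
    index = LazyIndex(postings)
    return index, index.find(term)


postings = {"galeria": {3, 1, 7}, "arte": {2, 3}}

# Fair measurement: opening the index and the first find() together.
(index, hits), cold = timed(lambda: open_and_find(postings, "galeria"))
# A later find() no longer pays the dictionary loading cost.
_, warm = timed(lambda: index.find("arte"))
print(f"open + first find: {cold:.6f}s, later find: {warm:.6f}s")
```

Timing only a later find() call, or only a search on an index that is already open, leaves the loading cost out of the comparison.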



The second optimization issue is also important because of how the 
"default search field" is processed. Java Lucene searches the 'contents' 
field by default, but Zend_Search_Lucene searches all fields. So it 
transforms the query into a more complex one, but a query optimizer can 
reduce this complexity.
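
To make the idea concrete, here is a small sketch in Python of how a term without a field can expand into one clause per field, and how an optimizer can collapse redundant groups again. The tuple representation and the helpers `expand_default`, `simplify` and `render` are assumptions for illustration only; they are not the Zend_Search_Lucene implementation. The rewrite preserves which documents match; scoring details are ignored here.

```python
def term(field, text):
    return ("term", field, text)


def boolean(*clauses):
    # Each clause is (sign, subquery); sign is "+", "-" or "" (optional).
    return ("bool", list(clauses))


def group(query):
    # A parenthesised group holding a single optional clause.
    return boolean(("", query))


def expand_default(text, fields):
    """A term without a field becomes one optional clause per searched field."""
    return boolean(*[("", term(f, text)) for f in fields])


def simplify(query):
    """Collapse redundant groups without changing which documents match."""
    if query[0] != "bool":
        return query
    clauses = []
    for sign, sub in query[1]:
        sub = simplify(sub)
        # An optional group of optional clauses can be merged into its parent.
        if sign == "" and sub[0] == "bool" and all(s == "" for s, _ in sub[1]):
            clauses.extend(sub[1])
        else:
            clauses.append((sign, sub))
    # A group with one non-prohibited clause matches the same documents as that clause.
    if len(clauses) == 1 and clauses[0][0] != "-":
        return clauses[0][1]
    return ("bool", clauses)


def render(query):
    if query[0] == "term":
        return f"{query[1]}:{query[2]}"
    return "(" + " ".join(sign + render(sub) for sign, sub in query[1]) + ")"


# One default field: the expansion collapses back to a plain term query.
print(render(simplify(expand_default("galeria", ["contents"]))))
# contents:galeria

# The generated query quoted later in this thread, rebuilt by hand.
nested = boolean(
    ("+", boolean(
        ("", group(group(term("titleSrch", "galeria")))),
        ("", group(group(term("descriptionSrch", "galeria")))),
        ("", group(group(term("tagsSrch", "galeria")))),
    )),
    ("+", group(term("countryID", "1"))),
)
print(render(simplify(nested)))
# (+(titleSrch:galeria descriptionSrch:galeria tagsSrch:galeria) +countryID:1)
```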

With best regards,
    Alexander Veremyev.


Alexander Veremyev wrote:
> Hi Sebi,
> 
> I have had these improvements in mind for a long time, so I'll do this.
> I think it will be done soon because these improvements are important.
> 
> Actually, I am already working on Boolean query optimization.
> 
> 
> With best regards,
>    Alexander Veremyev.
> 
> 
> Sebi wrote:
>> Thank you for your support, Alexander. Now we have to find developers 
>> to implement these
>> optimizations. When do you think they will be implemented?
>>
>>
>>     * Hi Sebi,
>>     * I did some research on the performance problem.
>>     * Several things make the search times of Java Lucene and
>>     * Zend_Search_Lucene so different.
>>     * 1. The Luke and Java Lucene search examples don't include index
>>     * opening time.
>>     * Including this time, Java Lucene is only two times faster than
>>     * Zend_Search_Lucene.
>>     * 2. Boolean queries are not optimal now.
>>     * A Boolean query should skip non-matching documents, but it tries to
>>     * calculate a score for them (and gets the right result, "0").
>>     * Once that's fixed, search time will be nearly the same as for
>>     * Java Lucene.
>>     * 3. Zend_Search_Lucene doesn't optimize queries yet. It should
>>     * transform a query to its simplest form and it's designed to do this,
>>     * but this feature is not implemented yet.
>>     * Implementing this feature may give the same result as Boolean
>>     * query optimization (most queries can be transformed to
>>     * term/multi-term queries).
>>     * Of course, both these optimizations have to be implemented.
>>     * In addition, a lot of time is taken by I/O operations (30-40%).
>>     * As I tested before, moving these operations into a C extension makes
>>     * them several times faster.
>>     * So moving I/O into an optional C extension may make Zend_Search
>>     * faster than Java Lucene :)
>>     * With best regards,
>>     *     Alexander Veremyev.
>>     * Sebi wrote:
>>     *  > I see. You got good results. I want to have them too. I think
>>       there might be one problem: my computer's performance.
>>     *  > For about 9000 docs, with the index optimized (using the
>>       optimize() function), a search that returned about 70 docs
>>       takes 1.5 sec. Anyway, this is slow. The interesting point is
>>       that Luke executes the same query in only 56 ms.
>>     *  >
>>     *  > I have the following questions:
>>     *  >
>>     *  > 1. Why do you think the Luke tool runs the same query in 56
>>       ms? Is PHP execution so slow?
>>     *  >
>>     *  > 2. I have version 7 of Zend installed. Should I get the
>>       latest snapshot?
>>     *  >
>>     *  > 3. Do you have any advice for improving this search process?
>>     *  >
>>     *  >
>>     *  >
>>     *  > Hi Sebi,
>>     *  >
>>     *  > 1. I've just added necessary methods.
>>     *  >
>>     *  > $index->numDocs() may be used to retrieve the number of non-deleted
>>       documents.
>>     *  > $index->maxDoc() returns one greater than the largest possible
>>       document
>>     *  > number (synonym for $index->count()).
>>     *  >
>>     *  >
>>     *  > 2. I think it's simply the speed of PHP string/object processing
>>     *  > itself, plus the large result set.
>>     *  >
>>     *  > I just made some tests:
>>     *  > PHP v5.2, WinXP
>>     *  > AMD Athlon 64 3000+, Seagate ST316082 7AS 160Gb SATA HD
>>     *  >
>>     *  > a.
>>     *  > index size - 11,000 documents
>>     *  > optimized index - ~42Mb (document content is also stored)
>>     *  > source documents size - 33Mb
>>     *  >
>>     *  > Results:
>>     *  > ---------------------------
>>     *  > find() with 11000 docs result set - ~2.0 sec
>>     *  > find() with 4000 docs result set  - ~0.86 sec
>>     *  > find() with 1000 docs result set  - ~0.35 sec
>>     *  > ---------------------------
>>     *  >
>>     *  > b.
>>     *  > index size - 6,059 documents
>>     *  > optimized index - ~40Mb
>>     *  > source documents size - 31Mb (document content is also stored)
>>     *  >
>>     *  > Results:
>>     *  > ---------------------------
>>     *  > find() with 6059 docs result set - ~0.90 sec
>>     *  > find() with 2 docs result set  - ~0.17 sec
>>     *  > find() with 0 docs result set  - ~0.17 sec
>>     *  > ---------------------------
>>     *  >
>>     *  >
>>     *  > I think it's also possible to make some optimizations.
>>     *  > Please add an issue to the issue tracker for this (or I can do it).
>>     *  >
>>     *  >
>>     *  > 3. I got one report for a large index some time ago:
>>     *  > Source data: 8Gb
>>     *  > 2xAMD 64 Opteron 250
>>     *  > iSCSI 4x36Gb in RAID 1+0
>>     *  > FreeBSD 7.0
>>     *  > Search time is 5-10 sec
>>     *  >
>>     *  > I also have some ideas for search optimization, which will work
>>     *  > especially for large indices.
>>     *  >
>>     *  >
>>     *  > With best regards,
>>     *  >     Alexander Veremyev.
>>     *  >
>>     *  >
>>     *  > Sebi wrote:
>>     *  >> Any answer? Alexander?
>>     *  >>
>>     *  >> Anyway, I want to add some more questions.
>>     *  >>
>>     *  >> 1. The $index->count() does not reflect the real content of
>>       the database. I need to optimize the index for retrieving the
>>       correct number of documents. Is there any other way to find the
>>       exact count of documents?
>>     *  >>
>>     *  >> 2. I want to reopen the search problem. The time is too big.
>>     *  >>
>>     *  >>> I have 8737 documents which are indexed right now. When I
>>       search for keywords like 'arte', 'galeria', etc., I get a time
>>       of about 3.15 sec. When I had
>>     *  >>> only  4500 documents my time was about 1.6 sec. The generated
>>       query looks like: +(((titleSrch:galeria))
>>       ((descriptionSrch:galeria)) ((tagsSrch:galeria)))
>>     *  >>> +(countryID:1) .
>>     *  >>> Note that I measure only the time of the find() call itself,
>>       without retrieving the document fields.
>>     *  >> I optimized the index using the optimize function and the search
>>       improved. The time was about 1.5 sec (2 times faster). But
>>       again, that is too big. I have only 8737 documents and an index size
>>       of about 2.7 MB. Another interesting thing is that, if I use Luke for
>>       searching, the time is only 56 ms. So, what is the problem? The
>>       PHP file system access?
>>     *  >>
>>     *  >> I want help with search time because that was my first goal:
>>       to have a fast search by relevance. And this is not what I get
>>       right now.
>>     *  >>
>>     *  >> 3. How will this engine behave with 1 million documents?
>>       For searching inside.
>>     *  >>
>>     *








