Oh, okay.  I think I just misunderstood you before.  If you're not
regressing case sensitive search, and case insensitive search is the new
feature, then I'm a lot less worried.  Sorry for my confusion here.

-j

On Thu, Aug 21, 2008 at 01:38:01PM -0700, Brock Pytlik wrote:
> [EMAIL PROTECTED] wrote:
> > Brock,
> >
> >   
> >> http://cr.opensolaris.org/~bpytlik/ips-2672-v1/
> >> has the patch.
> >>     
> >
> > The code here looks okay to me.  I still don't completely grasp what's
> > going on in query_engine, but I did take a look at all the files.
> >
> >   
> Ok, if there's anything specific where I've lost you, let me know or 
> stop by and I can (hopefully) explain what's going on.
> >> Remote search takes the biggest hit, going from .1-.2 seconds to .6-.7 
> >> seconds. For now, I think this is still acceptable. The time it takes 
> >> will grow with the number of new search tokens introduced.
> >>     
> >
> > So this is between 6x and 3.5x slower?  That seems like a large
> > regression.  I'm assuming case insensitive search is lumped into this
> > number, but it's not entirely clear.
> >   
> Sorry, to be clear, case sensitive search (once it's again available on 
> the server) will continue to be at the .1-.2 second response time. The 
> hit comes from switching from a direct hash lookup on the token to a 
> regular expression match against all known tokens. You can see the same 
> performance hit when you do a wildcard search. For example, (on ipkg so 
> the numbers aren't directly comparable to the numbers above) searching 
> for an exact token match takes .15 seconds, while simply tacking a * on 
> the end of each query makes it take about .86 seconds on average. There 
> is a huge variation in times though that I can't really explain. The max 
> is about 1 second, but a few searches only took .13 or .14 seconds, and 
> I don't have a good explanation for that.
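
To illustrate the difference described above, here is a minimal sketch in
Python. It is not the actual query_engine code: the INDEX contents, the
package names, and both function names are made up for illustration. An
exact search is a single dictionary lookup, while a wildcard or case
insensitive search has to test the pattern against every known token, so
its cost grows with the number of unique tokens.

```python
import fnmatch
import re

# Hypothetical index: token -> list of package names (made-up data).
INDEX = {
    "Python": ["pkg-a"],
    "python": ["pkg-b"],
    "pytest": ["pkg-c"],
    "perl": ["pkg-d"],
}


def exact_lookup(index, token):
    """Single hash lookup; cost does not depend on the index size."""
    return list(index.get(token, []))


def pattern_lookup(index, pattern, ignore_case=False):
    """Test every known token against the pattern.

    The loop visits each unique token once, so the cost is linear in
    the number of unique tokens.
    """
    flags = re.IGNORECASE if ignore_case else 0
    # fnmatch.translate turns a shell-style wildcard into a regex.
    rx = re.compile(fnmatch.translate(pattern), flags)
    hits = []
    for token, pkgs in index.items():
        if rx.match(token):
            hits.extend(pkgs)
    return hits
```

With this sketch, exact_lookup(INDEX, "python") touches one entry, while
pattern_lookup(INDEX, "py*") and pattern_lookup(INDEX, "python",
ignore_case=True) both walk the whole index.
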
> > How does this scale?  You've said the time will grow with the number of
> > tokens.  Is the growth linear with respect to the number of tokens?
> >   
> Yes, it should be linear in the number of unique tokens.
> > It would be interesting to see at what point this gets painful.  Have
> > you run an experiment that would simulate a repository with a large
> > number of builds, just to see how long we have before this becomes a
> > serious issue?
> >
> >   
> I think I can easily tell you the relationship between number of unique 
> tokens and time for search. What's much harder to determine is the 
> relationship between number of unique tokens and builds. For example, in 
> one extreme, if we just republished a build exactly, it should have 
> essentially no effect on search times. On the other hand, if we brought 
> in a new consolidation (SFW for example), that would mean a huge bump in 
> the number of unique tokens. Having said all that, I'll do my best to 
> get the data I can so that we can at least start looking at the numbers. 
> I'll work on that this afternoon and send out the figures once I have them.
> >> If this becomes a performance issue in the future, there are several
> >> possible optimizations that can be made on the server side. My choice
> >> would be to essentially pre-build the case insensitive dictionary in
> >> addition to the case-sensitive dictionary.
> >>     
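
One way the pre-built dictionary idea could look, as a rough sketch only:
a second dictionary keyed on lowercased tokens is built once at indexing
time, so a case insensitive exact search becomes a single hash lookup
again. The function names and sample data below are assumptions for
illustration and are not taken from the patch.

```python
def build_folded_index(index):
    """Build a second dictionary keyed on lowercased tokens.

    This is done once, when the index is built, rather than per query.
    """
    folded = {}
    for token, pkgs in index.items():
        folded.setdefault(token.lower(), []).extend(pkgs)
    return folded


def insensitive_lookup(folded, token):
    """Case insensitive exact match as a single hash lookup."""
    return list(folded.get(token.lower(), []))


# Hypothetical index: token -> list of package names (made-up data).
index = {"Python": ["pkg-a"], "python": ["pkg-b"], "perl": ["pkg-d"]}
folded = build_folded_index(index)
```

The trade-off is extra memory and indexing time for the second
dictionary. Wildcard searches would still need to scan the tokens.
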
> >
> > Glad you know how to solve this problem when it shows up. ;)
> >
> >   
> > -j
> >
> >   
> 
> Thanks for the feedback,
> 
> Brock
> _______________________________________________
> pkg-discuss mailing list
> [email protected]
> http://mail.opensolaris.org/mailman/listinfo/pkg-discuss
