Oh, okay. I think I just misunderstood you before. If you're not regressing case-sensitive search, and case-insensitive search is the new feature, then I'm a lot less worried. Sorry for my confusion here.

-j

On Thu, Aug 21, 2008 at 01:38:01PM -0700, Brock Pytlik wrote:
> [EMAIL PROTECTED] wrote:
> > Brock,
> >
> >> http://cr.opensolaris.org/~bpytlik/ips-2672-v1/
> >> has the patch.
> >
> > The code here looks okay to me. I still don't completely grasp what's
> > going on in query_engine, but I did take a look at all the files.
> >
> Ok, if there's anything specific where I've lost you, let me know or
> stop by and I can (hopefully) explain what's going on.
>
> >> Remote search takes the biggest hit, going from .1-.2 seconds to .6-.7
> >> seconds. For now, I think this is still acceptable. The time it takes
> >> will grow with the number of new search tokens introduced.
> >
> > So this is between 6x and 3.5x slower? That seems like a large
> > regression. I'm assuming case insensitive search is lumped into this
> > number, but it's not entirely clear.
> >
> Sorry, to be clear, case sensitive search (once it's again available on
> the server) will continue to be at the .1-.2 second response time. The
> hit comes from switching from a direct hash lookup on the token to a
> regular expression match against all known tokens. You can see the same
> performance hit when you do a wildcard search. For example, (on ipkg so
> the numbers aren't directly comparable to the numbers above) searching
> for an exact token match takes .15 seconds, while simply tacking a * on
> the end of each query makes them take about .86 seconds on average.
> There is a huge variation in times though that I can't really explain.
> The max is about 1 second, but a few searches only took .13 or .14
> seconds, and I don't have a good explanation for that.
>
> > How does this scale? You've said the time will grow with the number of
> > tokens. Is the growth linear with respect to the number of tokens?
> >
> Yes, it should be linear in the number of unique tokens.
>
> > It would be interesting to see at what point this gets painful. Have
> > you run an experiment that would simulate a repository with a large
> > number of builds, just to see how long we have before this becomes a
> > serious issue?
> >
> I think I can easily tell you the relationship between number of unique
> tokens and time for search. What's much harder to determine is the
> relationship between number of unique tokens and builds. For example, in
> one extreme, if we just republished a build exactly, it should have
> essentially no effect on search times. On the other hand, if we brought
> in a new consolidation (SFW for example), that would mean a huge bump in
> the number of unique tokens. Having said all that, I'll do my best to
> get the data I can so that we can at least start looking at the numbers.
> I'll work on that this afternoon and send out the figures once I have
> them.
>
> >> If this becomes a performance issue in the future, there are several
> >> possible optimizations that can be made on the server side. My choice
> >> would be to essentially pre-build the case insensitive dictionary in
> >> addition to the case-sensitive dictionary.
> >
> > Glad you know how to solve this problem when it shows up. ;)
> >
> > -j
> >
> Thanks for the feedback,
>
> Brock
> _______________________________________________
> pkg-discuss mailing list
> [email protected]
> http://mail.opensolaris.org/mailman/listinfo/pkg-discuss
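
For readers following the thread, the trade-off Brock describes can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual query_engine code: the toy index, the package names, and every function name here (exact_search, regex_search, build_folded_index, folded_search) are hypothetical. It shows a single hash lookup for an exact token, a regular expression tried against every known token (linear in the number of unique tokens), and a pre-built case-insensitive dictionary that turns the case-insensitive case back into a single lookup.

```python
import re
from collections import defaultdict

# Hypothetical toy index: token -> set of package names.
# This is NOT the real on-disk search index format.
INDEX = {
    "SUNWcs": {"pkg-a"},
    "sunwcs": {"pkg-b"},
    "SUNWipkg": {"pkg-c"},
    "Apache": {"pkg-d"},
}


def exact_search(index, token):
    """Case-sensitive search: one hash lookup, independent of index size."""
    return set(index.get(token, set()))


def regex_search(index, pattern, ignore_case=True):
    """Try the pattern against every known token: linear in unique tokens."""
    flags = re.IGNORECASE if ignore_case else 0
    rx = re.compile(pattern, flags)
    hits = set()
    for token, pkgs in index.items():
        if rx.fullmatch(token):
            hits |= pkgs
    return hits


def build_folded_index(index):
    """Pre-build a case-insensitive dictionary keyed on the lowercased token."""
    folded = defaultdict(set)
    for token, pkgs in index.items():
        folded[token.lower()] |= pkgs
    return dict(folded)


def folded_search(folded, token):
    """Case-insensitive exact search: again a single hash lookup."""
    return set(folded.get(token.lower(), set()))
```

The folded dictionary costs extra memory and build time on the server, and it only helps exact case-insensitive matches. A wildcard query would still have to visit every token, as regex_search does.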
