The model I proposed will not buy us a significant performance gain, but performance was not my main motivation anyway. (That said, I second the idea of implementing a benchmark.) What I wanted to address is the separation of concerns between handling bots and handling regular devices.
Maybe I should rephrase my starting point: How can we add new bot and HTTP client footprints to the existing DDR?

On Tue Dec 09 2014 at 2:31:24 PM Reza Naghibi <[email protected]> wrote:

> So let me explain some of the issues with this. Regardless, I would still
> like you to benchmark said patch and share the results. This will help
> drive the direction of future work on the clients.
>
> 1) I'm almost certain isBot(ua) will perform worse than classify(ua),
> defeating the whole purpose of short circuiting classify. How do you plan
> on implementing isBot()? If that algorithm performs better than classify(),
> we might as well use it to match the entire DDR. No?
>
> 2) Under no circumstances should we implement DDR logic in code. The code
> should remain as generic as possible. This means that it's just a plain
> old ngram matcher. This kind of logic belongs in the DDR definition. Right
> now this allows for patterns and ranking. So maybe what you are asking is that
> high ranking patterns be checked for first in a very quick way? Well, why
> are bots so high ranking? In normal traffic, bots make up a very small
> percentage. So wouldn't it make sense to check for Samsung and Apple
> products?
>
> Once again, if possible, please benchmark some before and afters so we can
> get a better idea of what we are working with here. Even though I'm leaning
> towards saying this is a bad idea, I think it is a good exercise.
>
>
> From: Volkan YAZICI <[email protected]>
> To: "[email protected]" <devicemap-dev@incubator.apache.org>
> Sent: Tuesday, December 9, 2014 7:34 AM
> Subject: Handling Bots and HTTP Clients
>
> Hello,
>
> In the context of discussion "how do we handle HTTP clients", I would like
> to vote for treating them as bots. Further, I want to propose adding a thin
> layer above DeviceMapClient.classify() to make a shortcut for handling of
> the bots as follows.
>
>     private final static Map<String, String> botAttributes =
>             Collections.singletonMap("is_bot", "true");
>
>     public Map<String, String> classify(String userAgent) {
>         if (isBot(userAgent)) return botAttributes;
>     }
>
> The motivation for this change is as follows:
>
> - Almost all of the attributes are making no sense for a bot and we are
>   losing time to match it against the whole DDR.
> - Bot database will be able to evolve independently.
> - We can come up with a single compiled j.u.regex.Pattern to check bots.
>   (I am pretty sure Reza knows a lot better performing approaches, but maybe
>   for a future release.)
>
> If the development team is ok with that, I want to implement this feature.
>
> Best.
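
For reference, the thin layer from the original proposal could be sketched roughly as below. This is only an illustration under assumptions: the class name BotShortcutClassifier, the BOT_PATTERN contents, and the delegate function are all hypothetical, and the delegate merely stands in for whatever currently performs the full DDR match. The snippet in the quoted message also omits the non-bot path; here it falls through to the delegate.

```java
import java.util.Collections;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;

/** Hypothetical wrapper for illustration; not part of the DeviceMap client. */
final class BotShortcutClassifier {

    // Illustrative footprints only (assumption). A real list would be
    // maintained separately so that it can evolve independently.
    private static final Pattern BOT_PATTERN = Pattern.compile(
            "googlebot|bingbot|curl/|wget/|python-requests",
            Pattern.CASE_INSENSITIVE);

    private static final Map<String, String> BOT_ATTRIBUTES =
            Collections.singletonMap("is_bot", "true");

    // Stand-in for the existing full DDR classification (assumption).
    private final Function<String, Map<String, String>> delegate;

    BotShortcutClassifier(Function<String, Map<String, String>> delegate) {
        this.delegate = delegate;
    }

    /** Returns true if the user agent contains one of the known footprints. */
    static boolean isBot(String userAgent) {
        return userAgent != null && BOT_PATTERN.matcher(userAgent).find();
    }

    /** Short-circuits for bots; everything else goes to the delegate. */
    Map<String, String> classify(String userAgent) {
        if (isBot(userAgent)) {
            return BOT_ATTRIBUTES;
        }
        return delegate.apply(userAgent);
    }
}
```

Whether a single compiled pattern like this is actually faster than the existing matcher is exactly what the benchmark should tell us; the sketch only shows how the two concerns could be kept apart.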
