Of course. Here is how it would look:
https://gist.github.com/rezan/7215f5b8d85db1eed0b0
So that configuration goes into a patch file. Patch files are overlaid
onto the standard devicemap index. So the above XML would go into:
BuilderDataSourcePatch.xml
I just did a quick test on the Java client and it works perfectly :) The one
catch is that you need to bundle the patch file with the DDR. This makes
loading from a JAR or URL a bit harder, since you cannot easily insert a patch
file into those sources. So you need to download the DDR and load it from a
filesystem folder. I'm looking to make this easier in future versions.
From: Volkan YAZICI <[email protected]>
To: [email protected]; Reza Naghibi <[email protected]>
Sent: Tuesday, December 9, 2014 11:59 AM
Subject: Re: Handling Bots and HTTP Clients
The model I proposed will not buy us a significant performance gain, which was
also not my major motivation. (That being said, I also second the idea of
implementing a benchmark.) Instead, I wanted to address the issue of separating
the concerns of handling bots and regular devices.
Maybe I should rephrase my starting point: How can we add new bot and
HTTP client footprints to the existing DDR?
On Tue Dec 09 2014 at 2:31:24 PM Reza Naghibi <[email protected]>
wrote:
So let me explain some of the issues with this. Regardless, I would still like
you to benchmark said patch and share the results. This will help drive the
direction of future work on the clients.
1) I'm almost certain isBot(ua) will perform worse than classify(ua), defeating
the whole purpose of short-circuiting classify. How do you plan on implementing
isBot()? If that algorithm performs better than classify(), we might as well
use it to match the entire DDR. No?
2) Under no circumstances should we implement DDR logic in code. The code
should remain as generic as possible. This means that it's just a plain old
ngram matcher. This kind of logic belongs in the DDR definition, which right now
allows for patterns and ranking. So maybe what you are asking is that
high-ranking patterns be checked first in a very quick way? Well, why would bots
rank so high? In normal traffic, bots make up a very small percentage. So
wouldn't it make more sense to check for Samsung and Apple products first?
Once again, if possible, please benchmark some before and afters so we can get
a better idea of what we are working with here. Even though I'm leaning towards
saying this is a bad idea, I think it is a good exercise.
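As a rough illustration of such a before/after comparison, here is a minimal
timing sketch. It is not part of the DeviceMap client: the class name
ClassifyBenchmark and the method averageNanos are hypothetical, and the
function passed in stands for whichever classify() variant is being measured.
A proper harness such as JMH would give more trustworthy numbers; this only
shows the shape of the measurement.

```java
import java.util.List;
import java.util.Objects;
import java.util.function.Function;

// Hypothetical helper, not part of the DeviceMap client.
final class ClassifyBenchmark {

    // Returns the average time per call in nanoseconds for fn over userAgents.
    // One untimed warm-up pass of the same size runs first so the JIT has a
    // chance to compile the hot path before measuring.
    static <R> double averageNanos(Function<String, R> fn,
                                   List<String> userAgents,
                                   int rounds) {
        long sink = 0; // consume results so the calls cannot be optimized away

        // Warm-up pass (not timed).
        for (int i = 0; i < rounds; i++) {
            for (String ua : userAgents) {
                sink += Objects.hashCode(fn.apply(ua));
            }
        }

        // Timed pass.
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            for (String ua : userAgents) {
                sink += Objects.hashCode(fn.apply(ua));
            }
        }
        long elapsed = System.nanoTime() - start;

        if (sink == Long.MIN_VALUE) {
            System.out.print(""); // keeps sink observably used
        }
        return (double) elapsed / ((long) rounds * userAgents.size());
    }
}
```

Running it once with the plain classify() and once with the short-circuited
variant, over the same list of real-world User-Agent strings, would give the
two numbers to compare.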
From: Volkan YAZICI <[email protected]>
To: "[email protected]" <[email protected]>
Sent: Tuesday, December 9, 2014 7:34 AM
Subject: Handling Bots and HTTP Clients
Hello,
In the context of the discussion "how do we handle HTTP clients", I would like
to vote for treating them as bots. Further, I want to propose adding a thin
layer above DeviceMapClient.classify() that short-circuits the handling of
bots, as follows.
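    private static final Map<String, String> botAttributes =
            Collections.singletonMap("is_bot", "true");

    public Map<String, String> classify(String userAgent) {
        // Short-circuit: skip the DDR match entirely for known bots.
        if (isBot(userAgent)) return botAttributes;
        // ... otherwise fall through to the existing DDR matching.
    }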
The motivation for this change is as follows:
- Almost all of the attributes make no sense for a bot, and we lose time
matching it against the whole DDR.
- The bot database will be able to evolve independently.
- We can come up with a single compiled j.u.regex.Pattern to check bots.
(I am pretty sure Reza knows much better-performing approaches, but maybe
for a future release.)
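To make the idea concrete, here is a minimal, self-contained sketch of such a
layer built around a single compiled Pattern. The class name BotShortcut, the
example footprints in the pattern, and the delegate function (a stand-in for
the existing DeviceMapClient.classify() call) are all assumptions made for
illustration; a real bot list would live in its own data file.

```java
import java.util.Collections;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;

// Hypothetical wrapper, not part of the DeviceMap client.
final class BotShortcut {

    // Example footprints only (assumed for illustration).
    private static final Pattern BOT_PATTERN = Pattern.compile(
            "googlebot|bingbot|curl/|wget/|python-requests",
            Pattern.CASE_INSENSITIVE);

    private static final Map<String, String> BOT_ATTRIBUTES =
            Collections.singletonMap("is_bot", "true");

    // Stand-in for the existing DDR match, e.g. a DeviceMapClient.classify() call.
    private final Function<String, Map<String, String>> delegate;

    BotShortcut(Function<String, Map<String, String>> delegate) {
        this.delegate = delegate;
    }

    // True if the User-Agent contains any of the known bot footprints.
    static boolean isBot(String userAgent) {
        return userAgent != null && BOT_PATTERN.matcher(userAgent).find();
    }

    // Returns the fixed bot attributes for bots; otherwise defers to the delegate.
    Map<String, String> classify(String userAgent) {
        if (isBot(userAgent)) {
            return BOT_ATTRIBUTES;
        }
        return delegate.apply(userAgent);
    }
}
```

Passing the underlying match in as a function keeps the sketch independent of
the client API, so the same wrapper can be timed with and without the shortcut.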
If the development team is ok with that, I want to implement this feature.
Best.