So I looked over the results.

The biggest mismatch seems to be over the Pre. The problem appears to be in 
the OpenDDR client: it misidentifies browsers as the Pre because it matches on 
Presto, Preload, Wordpress, etc. For example:

0.0666|desktopDevice|Opera/9.80 (Android 3.0.1; Linux; Opera Tablet/ADR-1106291546; U; en) Presto/2.8.149 Version/11.10 |Pre/3.0
0.0906|desktopDevice|Opera/9.80 (Windows NT 6.1; WOW64; U; IBM EVV/3.0/EAK01AG9/LE; MRA 5.10 (build 5310); ro) Presto/2.10.229 Version/11.62|Pre/3.0
0.0816|desktopDevice|Opera/9.80 (Windows NT 5.1; U; MRA 5.6 (build 03402); MRSPUTNIK OW 2, 3, 0, 104; ru) Presto/2.6.30 Version/10.63|Pre/3.0
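
The Presto hits look like plain substring matching. Here is a minimal sketch of the difference between substring and whole-token matching. It assumes a hypothetical tokenizer that splits the user agent on anything other than letters, digits and hyphens; the real OpenDDR client's tokenizer may work differently. The user agent is the third one from the list above.

```python
import re
from typing import List

# Hypothetical tokenizer: split on any run of characters that are not
# letters, digits or hyphens (so "SCH-M828C" would stay one token).
TOKEN_SPLIT = re.compile(r"[^A-Za-z0-9\-]+")


def tokens(user_agent: str) -> List[str]:
    return [t for t in TOKEN_SPLIT.split(user_agent) if t]


def substring_match(pattern: str, user_agent: str) -> bool:
    # Roughly what seems to be happening now: "pre" is found inside "presto".
    return pattern.lower() in user_agent.lower()


def token_match(pattern: str, user_agent: str) -> bool:
    # Only a whole token equal to the pattern counts as a hit.
    return pattern.lower() in (t.lower() for t in tokens(user_agent))


ua = ("Opera/9.80 (Windows NT 5.1; U; MRA 5.6 (build 03402); "
      "MRSPUTNIK OW 2, 3, 0, 104; ru) Presto/2.6.30 Version/10.63")
print(substring_match("Pre", ua))  # True: false positive via "Presto"
print(token_match("Pre", ua))      # False: no token is exactly "Pre"
```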

Next biggest offender: Nokia7210. This is defined as a TwoStep pattern but 
occurs as a unigram in the user agent.

Next: r451. Same problem: it is defined as a TwoStep but should be a unigram. 
A lot of the detections are wrong on the OpenDDR side.

Next: Droid. On the OpenDDR side this is hitting a partial word: the "droid" 
in "Android". We should have an "Android" pattern anyway, as a catch-all for 
Android phones.

Next: HTC Desire HD. This one is fine; the two sides just use different ids.

Next: SCH-M828C. This one has a [ following it, so it isn't being tokenized 
properly. This is fixable.
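
A sketch of the tokenization fix, assuming the tokenizer splits on a fixed set of delimiter characters. Both delimiter sets below are hypothetical, and so is the bracketed suffix in the sample user agent; the point is only that adding [ and ] to the delimiters lets SCH-M828C come out as its own token.

```python
import re
from typing import List

# Hypothetical delimiter sets; the OpenDDR client's real rules may differ.
OLD_DELIMS = " ;/()"
NEW_DELIMS = " ;/()[]"


def tokenize(user_agent: str, delims: str) -> List[str]:
    # Build a character class from the delimiters and split on runs of them.
    pattern = "[" + re.escape(delims) + "]+"
    return [t for t in re.split(pattern, user_agent) if t]


# Illustrative user agent with a bracketed suffix glued to the model id.
ua = "Mozilla/5.0 (Linux; U; Android 2.2; en-us; SCH-M828C[123] Build/FROYO)"
print(tokenize(ua, OLD_DELIMS))  # contains 'SCH-M828C[123]'
print(tokenize(ua, NEW_DELIMS))  # contains 'SCH-M828C' and '123'
```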

Next: NokiaC3. Defined as a TwoStep, appears as a unigram.


Next: GT-I9100. A lot of the user agents have a letter following the 9100, so 
the token doesn't match the pattern exactly.

Next: LG220C. Defined as a TwoStep, appears as a unigram.

Next: Nokia6300. Defined as a TwoStep, appears as a unigram.


So I'm going to stop there. This accounts for about 1,000 of the 3,000 
mismatches, maybe more. Next I'm going to add logic to also match TwoStep 
patterns combined into a single unigram. That should fix a big chunk of these.
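
One way the combining logic could look, as a minimal sketch. It assumes a TwoStep pattern can be modeled as a pair of tokens expected to appear consecutively (e.g. "Nokia" then "7210"); `TwoStepPattern`, `build_unigram_index` and `match` are hypothetical names, and OpenDDR's actual builder data is structured differently. Each pattern is also indexed under its concatenated form, so "Nokia7210" as a single token hits the same device.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class TwoStepPattern:
    # Hypothetical model of a TwoStep pattern: two consecutive tokens.
    device_id: str
    first: str
    second: str


def build_unigram_index(patterns: List[TwoStepPattern]) -> Dict[str, str]:
    # Index each pattern under its concatenated, lowercased form.
    return {(p.first + p.second).lower(): p.device_id for p in patterns}


def match(tokens: List[str], patterns: List[TwoStepPattern],
          unigram_index: Dict[str, str]) -> Optional[str]:
    lowered = [t.lower() for t in tokens]
    # 1) Combined form: e.g. "Nokia7210" appearing as one token.
    for t in lowered:
        if t in unigram_index:
            return unigram_index[t]
    # 2) Original TwoStep form: first token immediately followed by second.
    for a, b in zip(lowered, lowered[1:]):
        for p in patterns:
            if a == p.first.lower() and b == p.second.lower():
                return p.device_id
    return None


patterns = [TwoStepPattern("Nokia7210", "Nokia", "7210"),
            TwoStepPattern("NokiaC3", "Nokia", "C3")]
index = build_unigram_index(patterns)
print(match(["Nokia7210", "Profile", "MIDP-2.0"], patterns, index))
print(match(["Nokia", "C3"], patterns, index))
```

Checking the unigram form first means the existing TwoStep behavior is kept as a fallback rather than replaced.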

However, at some point we should maybe address some of these problems in the 
DDR data itself.

Also, in dclass I made a lot of fallback patterns that catch large groups of 
device classes, for example: Android, Nokia...., LG....?. We should consider 
doing this as well, maybe without using regex.
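
A regex-free fallback could be as simple as a prefix check on tokens, tried only after the exact patterns miss. A minimal sketch, where the `FALLBACKS` table, the generic device ids and `detect` are all hypothetical:

```python
from typing import Dict, List, Optional

# Hypothetical fallback table: lowercased token prefix -> generic device id.
FALLBACKS: Dict[str, str] = {
    "android": "GenericAndroid",
    "nokia": "GenericNokia",
    "lg": "GenericLG",
}


def detect(tokens: List[str], exact: Dict[str, str],
           fallbacks: Dict[str, str] = FALLBACKS) -> Optional[str]:
    lowered = [t.lower() for t in tokens]
    # 1) Exact device patterns always win.
    for t in lowered:
        if t in exact:
            return exact[t]
    # 2) Otherwise pick the longest fallback prefix found on any token,
    #    using str.startswith instead of a regex.
    best_prefix, best_id = "", None
    for t in lowered:
        for prefix, device_id in fallbacks.items():
            if t.startswith(prefix) and len(prefix) > len(best_prefix):
                best_prefix, best_id = prefix, device_id
    return best_id


exact = {"nokia7210": "Nokia7210"}
print(detect(["Nokia7210", "Profile"], exact))          # Nokia7210
print(detect(["Mozilla", "Linux", "Android"], exact))   # GenericAndroid
print(detect(["LG220C"], exact))                        # GenericLG
```

Very short prefixes like "lg" can overmatch in the same way "Pre" matches "Presto", so each fallback prefix would need to be chosen with care.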



________________________________
 From: eberhard speer jr. <[email protected]>
To: "[email protected]" <[email protected]> 
Sent: Tuesday, June 25, 2013 10:19 PM
Subject: Test results - continued
 

Just noticed some UAProf URLs slipped thru as UserAgent, so that
reduces the 'unknown' count to 1,774.

esjr
