So I looked over the results. The biggest mismatch seems to be over the Pre. It looks like the problem is with the OpenDDR client: it's misidentifying browsers as the Pre due to matching on Presto, Preload, Wordpress, etc. Ex:
0.0666|desktopDevice|Opera/9.80 (Android 3.0.1; Linux; Opera Tablet/ADR-1106291546; U; en) Presto/2.8.149 Version/11.10 |Pre/3.0
0.0906|desktopDevice|Opera/9.80 (Windows NT 6.1; WOW64; U; IBM EVV/3.0/EAK01AG9/LE; MRA 5.10 (build 5310); ro) Presto/2.10.229 Version/11.62|Pre/3.0
0.0816|desktopDevice|Opera/9.80 (Windows NT 5.1; U; MRA 5.6 (build 03402); MRSPUTNIK OW 2, 3, 0, 104; ru) Presto/2.6.30 Version/10.63|Pre/3.0

Next biggest offender: Nokia7210. This is defined as a TwoStep pattern, but it occurs as a unigram in the user agent.

Next: r451. Same problem: it's defined as a TwoStep when it should be a unigram. A lot of detections are wrong on the OpenDDR side.

Next: Droid. This is hitting a partial word, Android, on the OpenDDR side. We should have an "Android" pattern which is a catch-all for Android phones anyway.

Next: HTC Desire HD. This one is fine, the ids used are different.

Next: SCH-M828C. This one has a [ following it, so it isn't being tokenized properly. This is fixable.

Next: NokiaC3. Defined as a TwoStep, appears as a unigram.

Next: GT-I9100. A lot of the user agents have a letter following the 9100...

Next: LG220C. Defined as a TwoStep, appears as a unigram.

Next: Nokia6300. Defined as a TwoStep, appears as a unigram.

So I'm going to stop there. This accounts for about 1000 of the 3000 mismatches, but maybe more.

So I'm going to add logic to combine TwoStep patterns into a unigram. That should fix a big chunk of these. However, at some point we should maybe address some of these problems in the DDR data.

Also, something I did in dclass was make a lot of fallback patterns which catch large groups of device classes. Example: Android, Nokia...., LG....?. So we should consider doing this as well, and maybe without the use of regex...

________________________________
From: eberhard speer jr. <[email protected]>
To: "[email protected]" <[email protected]>
Sent: Tuesday, June 25, 2013 10:19 PM
Subject: Test results - continued

Just noticed some UAProf URLs slipped thru as UserAgent, so that reduces the 'unknown' count to 1,774.

esjr
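
For reference, here is a rough sketch of the matching ideas from my reply above: match on whole tokens instead of substrings, let a TwoStep pattern also match when its parts are fused into one token, and fall back to prefix checks without regex. It is Python only for brevity and is not the OpenDDR or dclass code. The pattern table, the device ids (palmPre, genericAndroid, genericNokia), the function names, and the choice to treat a TwoStep as two consecutive tokens are all assumptions made up for illustration.

```python
def tokenize(user_agent):
    """Split a user agent into lowercase tokens of letters, digits and '-'.

    Any other character (space, '/', ';', '[', '(' ...) ends the current token,
    so a trailing '[' as in 'SCH-M828C[' no longer sticks to the model name.
    """
    tokens, current = [], []
    for ch in user_agent.lower():
        if ch.isalnum() or ch == "-":
            current.append(ch)
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


def matches(parts, tokens):
    """True if the parts appear fused into one token or as consecutive tokens."""
    parts = [p.lower() for p in parts]
    if "".join(parts) in tokens:  # TwoStep parts combined into a unigram
        return True
    n = len(parts)
    return any(tokens[i:i + n] == parts for i in range(len(tokens) - n + 1))


# Hypothetical pattern data, not taken from the DDR files.
PATTERNS = [
    (("pre",), "palmPre"),
    (("nokia", "7210"), "Nokia7210"),
    (("droid",), "droid"),
]

# Hypothetical catch-all prefixes, checked only when no exact pattern hits.
FALLBACK_PREFIXES = [
    ("android", "genericAndroid"),
    ("nokia", "genericNokia"),
]


def detect(user_agent):
    """Return a device id for the user agent, or None if nothing matches."""
    tokens = tokenize(user_agent)
    for parts, device_id in PATTERNS:
        if matches(parts, tokens):
            return device_id
    for prefix, device_id in FALLBACK_PREFIXES:
        if any(token.startswith(prefix) for token in tokens):
            return device_id
    return None
```

With this, the Opera user agents above no longer hit "pre" or "droid", because Presto and Android are whole tokens. The Android tablet one lands on the Android fallback, and the Windows ones come back as None.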
