Dear Wiki user, You have subscribed to a wiki page or wiki category on "Devicemap Wiki" for change notification.
The "esjr/Test Data" page has been changed by esjr:
https://wiki.apache.org/devicemap/esjr/Test%20Data

New page:

##master-page:HomepageReadWritePageTemplate
##master-date:Unknown-Date
#format wiki
#language en
= UserAgent Test Data =
This document describes the test data files used in DeviceMap tests. <<BR>>
''(todo : add svn link once upload finishes)''

== UserAgentString.txt ==
Columns :
 * UserAgentString : nvarchar(1500)

Currently contains 918,709 unique user agent strings.<<BR>>
Most were collected from the access logs of live web servers.<<BR>>
102,121 of these were identified as belonging to mobile or other devices.

== UserAgentDetail.txt ==
Pipe-separated text file.<<BR>>
Columns :
 * StringHash : varbinary(32) : hashbytes('SHA2_256', UserAgentString)
 * TypeId : int
 * Flag : int

Because no separator character can be relied on not to appear inside a user agent string, the string itself is kept apart from its properties, which are stored in UserAgentDetail.txt.<<BR>>
The user agent string is linked to its detail record via its SHA-256 hash. (In an RDBMS such as MS SQL, adding this field as a persisted computed column speeds things up '''considerably'''.)<<BR>>
The TypeId field is the PK or Id of the types listed in UserAgentType.txt.<<BR>>
The Flag field is used to mark user agent strings so that the same set can be used in different tests (see below).

== UserAgentType.txt ==
Pipe-separated text file.<<BR>>
Columns :
 * Id : int
 * Type : nvarchar(50)

UserAgentType.txt lists 76 types of user agent strings (some of which are debatable).

== UserAgentDevice.txt ==
Pipe-separated text file.<<BR>>
Columns :
 * StringHash : UserAgent SHA-256 hash
 * OpenDdr : OpenDdr device Id found via OpenDdr code
 * DeviceMap : OpenDdr device Id found via DeviceMapClient code
 * Flag : used to separate data sets for testing

== Testing ==
For tests the data is best loaded in an RDBMS.<<BR>>
This is the general procedure I use :<<BR>>
 1. Create an instance of the client/parser class
 2. GetDataSet : SELECT PK and UserAgentString : random, based on type or flagged data set
 3. Do a 'cold' run using 3 pre-selected user agent strings
 4. For each UserAgentString in DataSet
  * Start Timer
  * Map/Resolve
  * Stop Timer
  * INSERT PK, TimeTaken and DeviceId (or 'unknown') in ResultLog
 5. Rinse and repeat...
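
The procedure above could be sketched roughly as follows. This is only an illustration, not the actual test harness: it uses an in-memory SQLite database instead of MS SQL, and the table layout (`UserAgent`, `ResultLog`), the `resolve` function and the sample rows are all hypothetical stand-ins for the real client/parser and data.

```python
import sqlite3
import time


def resolve(user_agent):
    """Hypothetical stand-in for the Map/Resolve call of a client/parser.

    Returns a device id, or None when the string is not recognised.
    """
    return "generic-iphone" if "iPhone" in user_agent else None


def make_demo_db():
    """Build a tiny in-memory database (assumed schema, made-up rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE UserAgent (Id INTEGER PRIMARY KEY, UserAgentString TEXT, Flag INTEGER)"
    )
    conn.execute(
        "CREATE TABLE ResultLog (UserAgentId INTEGER, TimeTaken REAL, DeviceId TEXT)"
    )
    conn.executemany(
        "INSERT INTO UserAgent VALUES (?, ?, ?)",
        [
            (1, "Mozilla/5.0 (iPhone; CPU iPhone OS 7_0 like Mac OS X)", 1),
            (2, "Mozilla/5.0 (Windows NT 6.1)", 1),
            (3, "curl/7.30.0", 2),
        ],
    )
    return conn


def run_test(conn, cold_strings, flag):
    """Time resolve() for every flagged string and log the outcome."""
    # Step 2: fetch PK and user agent string for the flagged data set.
    rows = conn.execute(
        "SELECT Id, UserAgentString FROM UserAgent WHERE Flag = ? ORDER BY Id",
        (flag,),
    ).fetchall()

    # Step 3: untimed 'cold' run so one-off start-up cost is not measured.
    for ua in cold_strings:
        resolve(ua)

    # Step 4: timed run, one ResultLog row per user agent string.
    for pk, ua in rows:
        start = time.perf_counter()
        device_id = resolve(ua)
        elapsed = time.perf_counter() - start
        conn.execute(
            "INSERT INTO ResultLog (UserAgentId, TimeTaken, DeviceId) VALUES (?, ?, ?)",
            (pk, elapsed, device_id or "unknown"),
        )
    conn.commit()
    return len(rows)
```

Calling `run_test(make_demo_db(), ["a", "b", "c"], flag=1)` processes the two rows flagged 1 and leaves one `ResultLog` row for each. Only the resolve call sits between the timer start and stop, so the time spent on the INSERT is not counted.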

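
For anyone working outside an RDBMS, the StringHash link between UserAgentString.txt and UserAgentDetail.txt might be reproduced along these lines. Treat it as a sketch under assumptions: the hash is taken over the UTF-16LE bytes of the string (which is how SQL Server's hashbytes treats nvarchar input), and the file is assumed to store the hash as hex text, possibly with a `0x` prefix. Neither detail is confirmed by this page, and the function names are made up for illustration.

```python
import hashlib


def string_hash(user_agent, encoding="utf-16-le"):
    """SHA-256 of the user agent string as upper-case hex.

    Assumption: the bytes hashed are the UTF-16LE encoding, matching
    hashbytes('SHA2_256', ...) applied to an nvarchar value.
    """
    return hashlib.sha256(user_agent.encode(encoding)).hexdigest().upper()


def parse_detail_line(line):
    """Split one 'StringHash|TypeId|Flag' line (assumed hex hash text)."""
    hash_hex, type_id, flag = line.rstrip("\n").split("|")
    if hash_hex.lower().startswith("0x"):
        hash_hex = hash_hex[2:]
    return hash_hex.upper(), int(type_id), int(flag)


def build_index(detail_lines):
    """Map StringHash -> (TypeId, Flag) for quick lookup by hash."""
    return {h: (t, f) for h, t, f in map(parse_detail_line, detail_lines)}
```

With such an index, `build_index(lines)[string_hash(ua)]` returns the `(TypeId, Flag)` pair for a user agent string. If lookups fail against the real files, the encoding or the hex formatting assumption is the first thing to check.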