[
https://issues.apache.org/jira/browse/DMAP-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14624864#comment-14624864
]
Reza Naghibi commented on DMAP-107:
-----------------------------------
Regarding removing the 2 test strings, looks like "HTC One X" and "HTC One X+"
match the same since 1.0 normalizes out the regex. Otherwise, im guessing the +
is an error because its likely meant to be a literal \+ and not a regex. Since
the patterns are identical, the device choosen is the first one reached during
iteration. The patch changes this because the data structure changes from a
HashMap to a HashSet, so iteration order is different.
I could add logic to the ranking function to fix this, but at this point there
is no use. Getting matching to properly work on the ODDR data will never be
perfect because the data has many errors like the one above. So as always, the
solution here is the fix the bad pattern.
Also, some problems with your split() method. It shouldn't be static and you
can remove the reference to Apache commons by using String.isEmpty(). Not sure
we need the null check, but null is allowed in normalize(), so its best to err
on the side of safety. Below is the corrected version:
{code:java}
private List<String> split(String text) {
List<String> nonemptyParts = new ArrayList<String>();
String[] parts = TEXT_SPLIT_PATTERN.split(text);
for (String part : parts) {
String normalizedPart = Pattern.normalize(part);
if (normalizedPart != null && !normalizedPart.isEmpty()) {
nonemptyParts.add(normalizedPart);
}
}
return nonemptyParts;
}
{code}
Also, the style of the 1.0 Java client is to be explicit with imports and not
use the wildcard. Just a small style nitpick. So if you can correct the above
split() function (and fix the imports), your patch should be good to go with
the HTC One X tests removed.
> Performance optimizations for DeviceMapClient.classify()
> --------------------------------------------------------
>
> Key: DMAP-107
> URL: https://issues.apache.org/jira/browse/DMAP-107
> Project: DeviceMap
> Issue Type: Improvement
> Components: Java Client
> Affects Versions: 1.1.0 Java
> Reporter: Volkan Yazıcı
> Fix For: 1.1.1 Java
>
> Attachments: classify.diff
>
>
> This patch removes redundant {{DeviceType}} checks and User-Agent string
> normalization calls. Performance gain is more than 2x. Check out the
> devicemap-client-benchmark (introduced in issue DMAP-106) output:
> {code}
> $ export userAgentFile=/path/to/user-agents.txt
> $ wc -l $userAgentFile
> 195325
> $ java \
> -jar
> devicemap/java/classifier-benchmark/target/devicemap-client-benchmark.jar \
> -jvmArgsAppend "-server -XX:+TieredCompilation -XX:+AggressiveOpts
> -Xms1024m -Xmx4096m -DuserAgentFile=$userAgentFile" \
> -wi 5 -i 5 -bm avgt -tu ms -f 3 \
> ".*DeviceMapClientBenchmark.*"
> # Using the most recent trunk.
> Result: 12079.408 ±(99.9%) 1240.628 ms/op [Average]
> Statistics: (min, avg, max) = (11232.424, 12079.408, 16011.000), stdev =
> 1160.484
> Confidence interval (99.9%): [10838.781, 13320.036]
> # Using the enhanced classify().
> Result: 5505.355 ±(99.9%) 441.748 ms/op [Average]
> Statistics: (min, avg, max) = (5060.269, 5505.355, 6508.699), stdev =
> 413.211
> Confidence interval (99.9%): [5063.607, 5947.103]
> {code}
> JVM version: 1.8.0_25
> OS: OS X 10.9.5
> Processor: 2.6 GHz Intel Core i5
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)