Volkan,

Thanks for the performance patch. I reviewed it and it looks pretty good. Before
the patch, we were running each ngram set through some raw string processing
normalizations. Your patch does a good job of moving that to the beginning and
optimizing the regex. Good job :)
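
To illustrate the idea (a rough sketch only, not the actual DeviceMap code; the
class name, method names, and the regex are made up for this example): the
input is normalized once with a precompiled pattern, and the ngrams are then
built from the already clean tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

public class NormalizeOnce {
    // Hypothetical pattern, compiled once and reused for every call.
    // String.replaceAll(regex, ...) would compile the regex on each call.
    private static final Pattern NON_ALNUM = Pattern.compile("[^a-z0-9]+");

    // Normalize the whole user agent up front, then split it into tokens.
    static List<String> tokens(String userAgent) {
        String lower = userAgent.toLowerCase(Locale.ROOT);
        String normalized = NON_ALNUM.matcher(lower).replaceAll(" ").trim();
        List<String> out = new ArrayList<>();
        if (normalized.isEmpty()) {
            return out;
        }
        for (String token : normalized.split(" ")) {
            out.add(token);
        }
        return out;
    }

    // Build ngrams of up to maxN adjacent tokens; no per-ngram cleanup needed.
    static List<String> ngrams(List<String> tokens, int maxN) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            StringBuilder sb = new StringBuilder();
            for (int n = 0; n < maxN && i + n < tokens.size(); n++) {
                sb.append(tokens.get(i + n));
                out.add(sb.toString());
            }
        }
        return out;
    }
}
```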

As for pattern matching, if you look at the normalization method, we only look
at alphanumerics. This was done for simplicity's sake. The downside here is that
we weaken any pattern which contains non-alphanumerics. There are several ways
to address this, but since DeviceMap has control over its own data, I prefer
fixing the patterns and keeping the matching engine simple. The thing to
remember is that our data came from OpenDDR, which had a more complex
classification algorithm and heuristics, so we have a bit of legacy baggage to
sort through as this project evolves.
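
A small sketch of what I mean by weakened patterns (this is an illustration
under my own assumptions, not the normalization method from the client; the
class name and sample strings are hypothetical): once everything except
letters and digits is dropped, patterns that differ only in punctuation or
spacing become indistinguishable.

```java
import java.util.Locale;

public class AlnumNormalization {
    // Keep only ASCII letters and digits, lowercased (assumed behavior).
    static String normalize(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toLowerCase(Locale.ROOT).toCharArray()) {
            if ((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The dash is lost, so the pattern effectively matches "gti9300".
        System.out.println(normalize("GT-I9300"));
        // Two different version strings collapse to the same key.
        System.out.println(normalize("os 1.0").equals(normalize("OS 10")));
    }
}
```

Fixing this on the data side would mean writing the patterns so they stay
distinct after this kind of normalization, rather than teaching the engine
about punctuation.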

Regarding our next release, I already have the Java client 1.1.0 ready to go. I 
would like to get your patch in on the next release, 1.1.1.

Reza


From: Volkan YAZICI <[email protected]>
To: "[email protected]" <[email protected]>
Sent: Wednesday, December 10, 2014 9:32 AM
Subject: 2x Performance Increase in classify()
   
Good news everyone!

Here is the patch that introduces JMH-based benchmarks for the Java client:
DMAP-106 <https://issues.apache.org/jira/browse/DMAP-106>

And here is the patch that introduces a >2x performance gain: DMAP-107
<https://issues.apache.org/jira/browse/DMAP-107>

*Sample output:*

$ export userAgentFile=/path/to/user-agents.txt
$ wc -l $userAgentFile
195325
$ java \
    -jar devicemap/java/classifier-benchmark/target/devicemap-client-benchmark.jar \
    -jvmArgsAppend "-server -XX:+TieredCompilation -XX:+AggressiveOpts -Xms1024m -Xmx4096m -DuserAgentFile=$userAgentFile" \
    -wi 5 -i 5 -bm avgt -tu ms -f 3 \
    ".*DeviceMapClientBenchmark.*"

# Using the most recent trunk.
Result: 12079.408 ±(99.9%) 1240.628 ms/op [Average]
  Statistics: (min, avg, max) = (11232.424, 12079.408, 16011.000), stdev = 1160.484
  Confidence interval (99.9%): [10838.781, 13320.036]

# Using the enhanced classify().
Result: 5505.355 ±(99.9%) 441.748 ms/op [Average]
  Statistics: (min, avg, max) = (5060.269, 5505.355, 6508.699), stdev = 413.211
  Confidence interval (99.9%): [5063.607, 5947.103]


Cheers!

  
