Hi,
I have some doubts about your patch.
1. Consider that WHITESPACE_PATTERN is compiled only once, during class
load, since it is a static variable:
private static final Pattern WHITESPACE_PATTERN =
        Pattern.compile("(?: |\\u00A0|\\s|[\\s&&[^ ]])\\s*");
Given that, I doubt the existing code takes more time than yours when
this function is called N times (N in the millions) after the class has
been loaded. Compilation is the only point at which the pattern is
expensive; otherwise it runs smoothly in O(N).
Consider a scenario where you need to call the API 1 million times on
strings like
"A B B F B G F"
2. Just a vague idea:
if we call str.trim() before starting the loop, we don't need the
whitespacesCount variable, which would remove some conditions from the
code.
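To make both points concrete, here is a rough sketch of what I have in
mind. It is not taken from the patch or from StringUtils: the class and
method names are hypothetical, and the loop variant assumes that
"whitespace" means the same characters that String.trim() strips (code
points <= ' '), which is slightly different from what the regex matches
(the regex also treats \u00A0 as whitespace).

```java
import java.util.regex.Pattern;

// Hypothetical sketch class; names are illustrative only.
final class NormalizeSpaceSketch {

    // Compiled once at class load and reused by every call.
    private static final Pattern WHITESPACE_PATTERN =
            Pattern.compile("(?: |\\u00A0|\\s|[\\s&&[^ ]])\\s*");

    // Regex variant: trim, then collapse each whitespace run to one space.
    static String normalizeWithRegex(String str) {
        if (str == null) {
            return null;
        }
        return WHITESPACE_PATTERN.matcher(str.trim()).replaceAll(" ");
    }

    // Loop variant: trimming first means the string cannot start or end
    // with whitespace, so no leading/trailing counter is needed.
    // Assumption: whitespace is any char <= ' ', matching String.trim().
    static String normalizeWithLoop(String str) {
        if (str == null) {
            return null;
        }
        String trimmed = str.trim();
        StringBuilder out = new StringBuilder(trimmed.length());
        boolean previousWasSpace = false;
        for (int i = 0; i < trimmed.length(); i++) {
            char c = trimmed.charAt(i);
            if (c <= ' ') {
                // Emit a single space for the whole run.
                if (!previousWasSpace) {
                    out.append(' ');
                }
                previousWasSpace = true;
            } else {
                out.append(c);
                previousWasSpace = false;
            }
        }
        return out.toString();
    }
}
```

With an input such as "  A  B\tB F ", both methods should return
"A B B F". Timing the two variants over a million such calls, after
warm-up, would show how much of the difference is really due to the
regex once compilation is out of the picture.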
Thanks
Anshul Zunke
On Fri, Jun 20, 2014 at 3:21 PM, librucha <[email protected]> wrote:
> GitHub user librucha opened a pull request:
>
> https://github.com/apache/commons-lang/pull/27
>
> Reimplemented normalize space
>
> Hi.
> I reimplemented the normalize space method of String utils, because a
> regexp is a much heavier operation than array traversal and I use this
> normalization often.
> Here is a benchmark created using Google Caliper.
>
> 0% Scenario{vm=java, trial=0, benchmark=ApacheNormalize, length=0}
> 1553.39 ns; σ=142.64 ns @ 10 trials
> 13% Scenario{vm=java, trial=0, benchmark=NewNormalize, length=0}
> 114.80 ns; σ=1.00 ns @ 3 trials
> 25% Scenario{vm=java, trial=0, benchmark=ApacheNormalize, length=10}
> 1513.44 ns; σ=133.58 ns @ 10 trials
> 38% Scenario{vm=java, trial=0, benchmark=NewNormalize, length=10}
> 114.49 ns; σ=1.05 ns @ 3 trials
> 50% Scenario{vm=java, trial=0, benchmark=ApacheNormalize, length=100}
> 1545.23 ns; σ=107.97 ns @ 10 trials
> 63% Scenario{vm=java, trial=0, benchmark=NewNormalize, length=100}
> 114.28 ns; σ=0.65 ns @ 3 trials
> 75% Scenario{vm=java, trial=0, benchmark=ApacheNormalize, length=1000}
> 1550.35 ns; σ=138.57 ns @ 10 trials
> 88% Scenario{vm=java, trial=0, benchmark=NewNormalize, length=1000}
> 115.16 ns; σ=1.14 ns @ 3 trials
>
> benchmark        length    ns  linear runtime
> ApacheNormalize       0  1553  ==============================
> ApacheNormalize      10  1513  =============================
> ApacheNormalize     100  1545  =============================
> ApacheNormalize    1000  1550  =============================
> NewNormalize          0   115  ==
> NewNormalize         10   114  ==
> NewNormalize        100   114  ==
> NewNormalize       1000   115  ==
>
> vm: java
> trial: 0
>
> You can merge this pull request into a Git repository by running:
>
> $ git pull https://github.com/librucha/commons-lang trunk
>
> Alternatively you can review and apply these changes as the patch at:
>
> https://github.com/apache/commons-lang/pull/27.patch
>
> To close this pull request, make a commit to your master/trunk branch
> with (at least) the following in the commit message:
>
> This closes #27
>
> ----
> commit 07437c259692e6f6ecdcef86411586df6800cffd
> Author: Libor Ondrusek <[email protected]>
> Date: 2014-06-20T09:36:43Z
>
> Reimplemented normalize space
>
> ----