Re: [GitHub] commons-lang pull request: Reimplemented normalize space

Anshul Zunke Fri, 20 Jun 2014 04:54:27 -0700

Hi

I have some doubts about your patch.


1. Suppose WHITESPACE_PATTERN is compiled only once, during class
loading, since it is a static variable:

private static final Pattern WHITESPACE_PATTERN =
    Pattern.compile("(?: |\\u00A0|\\s|[\\s&&[^ ]])\\s*");


In that case I doubt the existing code takes more time than yours,
considering that this function is called N times (N in the millions)
after the class has been loaded. Compilation is the only point at which
the pattern is costly; after that, matching runs smoothly in O(N).

Consider a scenario where you need to call the API 1 million times on
strings like

"A    B         B       F            B           G         F"

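To make the scenario concrete, here is a rough sketch of how it could be
timed. It assumes the regex version simply trims the input and replaces
every match of the precompiled pattern with a single space; the class and
method names are made up for illustration and are not taken from
commons-lang or from the patch. A plain timing loop like this ignores JIT
warm-up, so a harness such as Caliper would give more trustworthy numbers.

```java
import java.util.regex.Pattern;

public class PrecompiledPatternSketch {
    // Compiled once, when the class is loaded.
    private static final Pattern WHITESPACE_PATTERN =
            Pattern.compile("(?: |\\u00A0|\\s|[\\s&&[^ ]])\\s*");

    // Assumed shape of the regex-based approach: trim, then collapse
    // each whitespace run into one space.
    static String normalizeWithRegex(String str) {
        if (str == null) {
            return null;
        }
        return WHITESPACE_PATTERN.matcher(str.trim()).replaceAll(" ");
    }

    public static void main(String[] args) {
        String input =
                "A    B         B       F            B           G         F";
        int calls = 1_000_000;
        int sink = 0; // keeps the JIT from discarding the work

        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            sink += normalizeWithRegex(input).length();
        }
        long elapsed = System.nanoTime() - start;

        System.out.println("regex: " + (elapsed / calls)
                + " ns/call (sink=" + sink + ")");
    }
}
```

Running the same loop against the array-based version would show whether
the gap in the benchmark comes from pattern compilation or from matching
itself.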


2. Just a vague idea.

If we call str.trim() before starting the loop, we don't need the
variable whitespacesCount, which would remove some conditions from the
code.
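Something along these lines is what I have in mind. This is only a
sketch and not the code from the patch: the class name, method body and
the boolean flag are my own assumptions. Once the string has been
trimmed, the loop only has to collapse inner whitespace runs and needs no
special handling for the start or end of the string. One caveat: trim()
only strips characters at or below U+0020, so other whitespace that
Character.isWhitespace() recognises could still appear at either end.

```java
public class TrimFirstSketch {

    // Hypothetical loop-based normalization that trims first.
    static String normalizeSpace(String str) {
        if (str == null) {
            return null;
        }
        String trimmed = str.trim();
        StringBuilder out = new StringBuilder(trimmed.length());
        boolean previousWasWhitespace = false;

        for (int i = 0; i < trimmed.length(); i++) {
            char c = trimmed.charAt(i);
            if (Character.isWhitespace(c)) {
                // Emit one space per whitespace run, skip the rest.
                if (!previousWasWhitespace) {
                    out.append(' ');
                }
                previousWasWhitespace = true;
            } else {
                out.append(c);
                previousWasWhitespace = false;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + normalizeSpace("  A    B     C  ") + "]");
    }
}
```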

Thanks
Anshul Zunke




On Fri, Jun 20, 2014 at 3:21 PM, librucha <[email protected]> wrote:

> GitHub user librucha opened a pull request:
>
>     https://github.com/apache/commons-lang/pull/27
>
>     Reimplemented normalize space
>
>     Hi.
>     I reimplemented normalize space method of String utils, because regexp
> is much more rich operation than array traversing and I use this
> normalization often.
>     Here is benchmark created using google caliper.
>
>      0% Scenario{vm=java, trial=0, benchmark=ApacheNormalize, length=0}
> 1553.39 ns; σ=142.64 ns @ 10 trials
>     13% Scenario{vm=java, trial=0, benchmark=NewNormalize, length=0}
> 114.80 ns; σ=1.00 ns @ 3 trials
>     25% Scenario{vm=java, trial=0, benchmark=ApacheNormalize, length=10}
> 1513.44 ns; σ=133.58 ns @ 10 trials
>     38% Scenario{vm=java, trial=0, benchmark=NewNormalize, length=10}
> 114.49 ns; σ=1.05 ns @ 3 trials
>     50% Scenario{vm=java, trial=0, benchmark=ApacheNormalize, length=100}
> 1545.23 ns; σ=107.97 ns @ 10 trials
>     63% Scenario{vm=java, trial=0, benchmark=NewNormalize, length=100}
> 114.28 ns; σ=0.65 ns @ 3 trials
>     75% Scenario{vm=java, trial=0, benchmark=ApacheNormalize, length=1000}
> 1550.35 ns; σ=138.57 ns @ 10 trials
>     88% Scenario{vm=java, trial=0, benchmark=NewNormalize, length=1000}
> 115.16 ns; σ=1.14 ns @ 3 trials
>
>           benchmark length   ns linear runtime
>     ApacheNormalize      0 1553 ==============================
>     ApacheNormalize     10 1513 =============================
>     ApacheNormalize    100 1545 =============================
>     ApacheNormalize   1000 1550 =============================
>        NewNormalize      0  115 ==
>        NewNormalize     10  114 ==
>        NewNormalize    100  114 ==
>        NewNormalize   1000  115 ==
>
>     vm: java
>     trial: 0
>
> You can merge this pull request into a Git repository by running:
>
>     $ git pull https://github.com/librucha/commons-lang trunk
>
> Alternatively you can review and apply these changes as the patch at:
>
>     https://github.com/apache/commons-lang/pull/27.patch
>
> To close this pull request, make a commit to your master/trunk branch
> with (at least) the following in the commit message:
>
>     This closes #27
>
> ----
> commit 07437c259692e6f6ecdcef86411586df6800cffd
> Author: Libor Ondrusek <[email protected]>
> Date:   2014-06-20T09:36:43Z
>
>     Reimplemented normalize space
>
> ----
>
>
> ---
> If your project is set up for it, you can reply to this email and have your
> reply appear on GitHub as well. If your project does not have this feature
> enabled and wishes so, or if the feature is enabled but not working, please
> contact infrastructure at [email protected] or file a JIRA ticket
> with INFRA.
> ---
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>


-- 
Anshul Zunke
