[jira] [Commented] (TEXT-98) Remove isDelimiter() and use HashSets for delimiter check

ASF GitHub Bot (JIRA) Mon, 24 Jul 2017 10:36:19 -0700

    [ https://issues.apache.org/jira/browse/TEXT-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16098884#comment-16098884 ]


ASF GitHub Bot commented on TEXT-98:
------------------------------------

Github user ameyjadiye commented on a diff in the pull request:

    https://github.com/apache/commons-text/pull/57#discussion_r129102692
  
    --- Diff: src/main/java/org/apache/commons/text/WordUtils.java ---
    @@ -747,45 +750,29 @@ public static boolean containsAllWords(final CharSequence word, final CharSequen
             return true;
         }
     
    -    //-----------------------------------------------------------------------
    +    // -----------------------------------------------------------------------
         /**
    -     * Is the character a delimiter.
    +     * <p>
    +     * Converts an array of delimiters to a hash set of code points. Code point of space(32) is added as the default
    +     * value if delimiters is null. The generated hash set provides O(1) lookup time.
    +     * </p>
          *
    -     * @param ch  the character to check
    -     * @param delimiters  the delimiters
    -     * @return true if it is a delimiter
    +     * @param delimiters set of characters to determine capitalization, null means whitespace
    +     * @return Set<Integer>
          */
    -    public static boolean isDelimiter(final char ch, final char[] delimiters) {
    -        if (delimiters == null) {
    -            return Character.isWhitespace(ch);
    -        }
    -        for (final char delimiter : delimiters) {
    -            if (ch == delimiter) {
    -                return true;
    +    private static Set<Integer> generateDelimiterSet(final char[] delimiters) {
    +        Set<Integer> delimiterHashSet = new HashSet<>();
    +        if (delimiters == null || delimiters.length == 0) {
    +            if (delimiters == null) {
    +                delimiterHashSet.add(Character.codePointAt(new char[] {' '}, 0));
                 }
    +            return delimiterHashSet;
             }
    -        return false;
    -    }
     
    -  //-----------------------------------------------------------------------
    -    /**
    -     * Is the codePoint a delimiter.
    -     *
    -     * @param codePoint the codePint to check
    -     * @param delimiters  the delimiters
    -     * @return true if it is a delimiter
    -     */
    -    public static boolean isDelimiter(final int codePoint, final char[] delimiters) {
    --- End diff --
    
    Rather than removing it, we should keep this method.
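A minimal sketch of what keeping the method could look like, assuming the public `isDelimiter(char, char[])` and `isDelimiter(int, char[])` signatures stay for backward compatibility while the hot loops switch to a precomputed set. The class name `DelimiterSketch` and the helpers `toDelimiterSet` and `inDelimiterSet` are hypothetical, not part of Commons Text. The sketch also preserves the removed method's `null` semantics (any `Character.isWhitespace` code point), which differ from `generateDelimiterSet` in the diff, where `null` adds only the space code point (32).

```java
import java.util.HashSet;
import java.util.Set;

public final class DelimiterSketch {

    // Hypothetical helper: builds the lookup set once, O(k) for k delimiter chars.
    // Returns null for null input so callers can fall back to Character.isWhitespace,
    // matching the removed isDelimiter. Each char is stored as its own int value;
    // surrogate pairs are NOT combined into one code point (assumption of this sketch).
    static Set<Integer> toDelimiterSet(final char[] delimiters) {
        if (delimiters == null) {
            return null;
        }
        final Set<Integer> set = new HashSet<>();
        for (final char c : delimiters) {
            set.add((int) c);
        }
        return set;
    }

    // Hypothetical helper: expected O(1) check against a precomputed set.
    // A null set means "any whitespace".
    static boolean inDelimiterSet(final int codePoint, final Set<Integer> delimiterSet) {
        return delimiterSet == null
                ? Character.isWhitespace(codePoint)
                : delimiterSet.contains(codePoint);
    }

    // Public signatures kept as the reviewer suggests. They build the set on every
    // call, so loops over a whole string should call inDelimiterSet with a set built once.
    public static boolean isDelimiter(final int codePoint, final char[] delimiters) {
        return inDelimiterSet(codePoint, toDelimiterSet(delimiters));
    }

    public static boolean isDelimiter(final char ch, final char[] delimiters) {
        return isDelimiter((int) ch, delimiters);
    }

    public static void main(final String[] args) {
        System.out.println(isDelimiter('\t', null));            // true: null means whitespace
        System.out.println(isDelimiter('-', new char[] {'-'}));  // true
        System.out.println(isDelimiter('x', new char[] {'-'}));  // false
    }
}
```

With this split, existing callers of `isDelimiter` keep compiling and behaving the same, while internal code pays the O(k) set construction only once per input string.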


> Remove isDelimiter() and use HashSets for delimiter check
> ---------------------------------------------------------
>
>                 Key: TEXT-98
>                 URL: https://issues.apache.org/jira/browse/TEXT-98
>             Project: Commons Text
>          Issue Type: Improvement
>    Affects Versions: 1.1
>            Reporter: Arun Vinud 
>            Priority: Minor
>             Fix For: 1.2
>
>
> The current implementation of *capitalize*, *uncapitalize* and *initials* in
> *WordUtils* calls *isDelimiter* for every character or code point, and
> isDelimiter loops through the whole delimiter array to check for a match.
> For a string of length n and k delimiters this costs O(nk). Building a hash
> set of the delimiters once reduces it to O(n + k), i.e. O(n) if n > k or
> O(k) if k > n.
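To make the complexity argument concrete, here is a simplified, hypothetical `capitalize` that builds the delimiter set once (O(k)) and then performs one expected-O(1) `HashSet` lookup per code point (O(n)), for O(n + k) overall instead of O(nk). The class name `CapitalizeSketch` is an assumption; this is not the `WordUtils` implementation, only an illustration of the technique proposed in the issue.

```java
import java.util.HashSet;
import java.util.Set;

public final class CapitalizeSketch {

    // Hypothetical simplified capitalize: title-cases the first code point of each
    // delimiter-separated word. null delimiters means whitespace.
    public static String capitalize(final String str, final char... delimiters) {
        if (str == null || str.isEmpty()) {
            return str;
        }
        // O(k): build the lookup set once instead of scanning the array per character.
        final Set<Integer> delimiterSet;
        if (delimiters == null) {
            delimiterSet = null; // fall back to Character.isWhitespace
        } else {
            delimiterSet = new HashSet<>();
            for (final char c : delimiters) {
                delimiterSet.add((int) c);
            }
        }
        final StringBuilder out = new StringBuilder(str.length());
        boolean capitalizeNext = true;
        // O(n): one expected-O(1) membership test per code point.
        for (int i = 0; i < str.length(); ) {
            final int cp = str.codePointAt(i);
            final boolean isDelim = delimiterSet == null
                    ? Character.isWhitespace(cp)
                    : delimiterSet.contains(cp);
            if (isDelim) {
                capitalizeNext = true;
                out.appendCodePoint(cp);
            } else if (capitalizeNext) {
                out.appendCodePoint(Character.toTitleCase(cp));
                capitalizeNext = false;
            } else {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp); // advance past surrogate pairs correctly
        }
        return out.toString();
    }

    public static void main(final String[] args) {
        System.out.println(capitalize("i am fine", (char[]) null)); // I Am Fine
        System.out.println(capitalize("i.am-fine", '.', '-'));      // I.Am-Fine
    }
}
```

The O(1) bound is expected rather than worst case, since it relies on `HashSet` hashing; for very small k a linear scan may be just as fast in practice.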



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
