[jira] [Commented] (TEXT-98) Remove isDelimiter() and use HashSets for delimiter check

2017-08-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TEXT-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16109211#comment-16109211
 ] 

ASF GitHub Bot commented on TEXT-98:


Github user asfgit closed the pull request at:

https://github.com/apache/commons-text/pull/57


> Remove isDelimiter() and use HashSets for delimiter check
> -
>
> Key: TEXT-98
> URL: https://issues.apache.org/jira/browse/TEXT-98
> Project: Commons Text
> Issue Type: Improvement
> Affects Versions: 1.1
> Reporter: Arun Vinud
> Priority: Minor
> Fix For: 1.2
>
>
> The current implementation of *capitalize*, *uncapitalize* and *initials* in 
> *WordUtils* calls *isDelimiter* for every character and/or code point, and 
> *isDelimiter* loops through the array of delimiters to check for a match. 
> For an input of n characters and k delimiters this costs O(nk); with a 
> hash set it can be reduced to O(n + k), i.e. O(n) if n > k or O(k) if k > n.
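
To make the complexity argument in the issue description concrete, here is a minimal sketch of a capitalize-style loop that builds a hash set of delimiter code points once (O(k) for k delimiters) and then does one expected O(1) lookup per code point of the input (O(n) for n characters). The class name `DelimiterSetSketch` and the helper `toCodePointSet` are hypothetical and are not taken from Commons Text; the sketch also assumes the delimiter array is non-null, so it does not reproduce the library's null handling.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not the Commons Text implementation.
final class DelimiterSetSketch {

    // Builds the lookup set once: O(k) for k delimiters.
    // Assumption: delimiters is non-null.
    static Set<Integer> toCodePointSet(final char[] delimiters) {
        final Set<Integer> set = new HashSet<>();
        for (final char delimiter : delimiters) {
            set.add((int) delimiter);
        }
        return set;
    }

    // Title-cases the first letter of every delimiter-separated word.
    // One set lookup per code point, so O(n + k) overall.
    static String capitalize(final String str, final char... delimiters) {
        final Set<Integer> delimiterSet = toCodePointSet(delimiters);
        final StringBuilder out = new StringBuilder(str.length());
        boolean capitalizeNext = true;
        for (int i = 0; i < str.length(); ) {
            final int codePoint = str.codePointAt(i);
            if (delimiterSet.contains(codePoint)) {
                // Delimiter: copy it and capitalize the next word start.
                capitalizeNext = true;
                out.appendCodePoint(codePoint);
            } else if (capitalizeNext) {
                out.appendCodePoint(Character.toTitleCase(codePoint));
                capitalizeNext = false;
            } else {
                out.appendCodePoint(codePoint);
            }
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }
}
```

With a linear scan over the delimiter array in place of `delimiterSet.contains`, the same loop would perform up to k comparisons per code point, which is where the O(nk) bound comes from.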



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (TEXT-98) Remove isDelimiter() and use HashSets for delimiter check

2017-08-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TEXT-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16109121#comment-16109121
 ] 

ASF GitHub Bot commented on TEXT-98:


Github user chtompki commented on the issue:

https://github.com/apache/commons-text/pull/57
  
Will get to this today.




[jira] [Commented] (TEXT-98) Remove isDelimiter() and use HashSets for delimiter check

2017-08-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TEXT-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16109052#comment-16109052
 ] 

ASF GitHub Bot commented on TEXT-98:


Github user arunvinudss commented on the issue:

https://github.com/apache/commons-text/pull/57
  
@chtompki Please review and merge.




[jira] [Commented] (TEXT-98) Remove isDelimiter() and use HashSets for delimiter check

2017-08-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TEXT-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16109049#comment-16109049
 ] 

ASF GitHub Bot commented on TEXT-98:


Github user coveralls commented on the issue:

https://github.com/apache/commons-text/pull/57
  

[![Coverage Status](https://coveralls.io/builds/12640859/badge)](https://coveralls.io/builds/12640859)

Coverage decreased (-0.2%) to 98.021% when pulling 
**fb6d5935451397c561bd52cf1d483975f83b2c7b on arunvinudss:TEXT-98** into 
**998764ebe38113eb51e6850058ca01936625dd92 on apache:master**.





[jira] [Commented] (TEXT-98) Remove isDelimiter() and use HashSets for delimiter check

2017-07-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TEXT-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099967#comment-16099967
 ] 

ASF GitHub Bot commented on TEXT-98:


Github user ameyjadiye commented on the issue:

https://github.com/apache/commons-text/pull/57
  
@chtompki, I think the items piled up for 2.x are not too critical, so we 
should wait for the 2.x release.
If we release some major improvement or fix, we can release all the queued 
items along with it. ATM I don't see anything.




[jira] [Commented] (TEXT-98) Remove isDelimiter() and use HashSets for delimiter check

2017-07-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TEXT-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099935#comment-16099935
 ] 

ASF GitHub Bot commented on TEXT-98:


Github user chtompki commented on the issue:

https://github.com/apache/commons-text/pull/57
  
This all raises the question of going to `2.x`. I think we have a couple of 
things that would warrant a 2.x move. Do we want to attempt that in the fall, 
or is it still premature?




[jira] [Commented] (TEXT-98) Remove isDelimiter() and use HashSets for delimiter check

2017-07-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TEXT-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099520#comment-16099520
 ] 

ASF GitHub Bot commented on TEXT-98:


Github user ameyjadiye commented on the issue:

https://github.com/apache/commons-text/pull/57
  
@arunvinudss, just an addition to @PascalSchumacher's comment: at this point 
we don't know who else, other than Commons Text, has a dependency on 
`isDelimiter`, so it is better to mark it deprecated and remove the code 
altogether in 2.x.




[jira] [Commented] (TEXT-98) Remove isDelimiter() and use HashSets for delimiter check

2017-07-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TEXT-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099101#comment-16099101
 ] 

ASF GitHub Bot commented on TEXT-98:


Github user PascalSchumacher commented on the issue:

https://github.com/apache/commons-text/pull/57
  
@arunvinudss While I agree that `isDelimiter` should have been private, it 
is public and was released with commons-text `1.1`. Due to the strict binary 
compatibility promise of Commons, it cannot be removed before `2.0`. For now 
the best we can do is mark it as deprecated and explain that it will be removed 
in version `2.0`.
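
As an illustration of the deprecation route suggested here, the sketch below keeps the public method and its behaviour unchanged and only marks it for removal. The method body follows the `isDelimiter(char, char[])` code quoted in the pull request diff; the enclosing class name `DeprecationSketch` and the exact wording of the `@deprecated` note are assumptions for illustration, not the text that was committed.

```java
// Hypothetical holder class; in Commons Text the method lives in WordUtils.
final class DeprecationSketch {

    /**
     * Is the character a delimiter.
     *
     * @param ch  the character to check
     * @param delimiters  the delimiters
     * @return true if it is a delimiter
     * @deprecated as of 1.2, to be removed in 2.0 (illustrative wording)
     */
    @Deprecated
    public static boolean isDelimiter(final char ch, final char[] delimiters) {
        // Behaviour is unchanged so existing callers stay binary compatible.
        if (delimiters == null) {
            return Character.isWhitespace(ch);
        }
        for (final char delimiter : delimiters) {
            if (ch == delimiter) {
                return true;
            }
        }
        return false;
    }
}
```

The `@Deprecated` annotation makes the compiler warn callers, while the `@deprecated` Javadoc tag tells them when removal is planned; neither changes the method signature, so code compiled against `1.1` would keep linking.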




[jira] [Commented] (TEXT-98) Remove isDelimiter() and use HashSets for delimiter check

2017-07-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TEXT-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099033#comment-16099033
 ] 

ASF GitHub Bot commented on TEXT-98:


Github user arunvinudss commented on the issue:

https://github.com/apache/commons-text/pull/57
  
@ameyjadiye I want to remove the isDelimiter method. I would be surprised 
if anyone uses isDelimiter on its own, because all it does is check whether a 
given element is present in an array. Moreover, the char version of isDelimiter 
is already dead code, as we no longer use it. I would say isDelimiter should 
have been private.




[jira] [Commented] (TEXT-98) Remove isDelimiter() and use HashSets for delimiter check

2017-07-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TEXT-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098884#comment-16098884
 ] 

ASF GitHub Bot commented on TEXT-98:


Github user ameyjadiye commented on a diff in the pull request:

https://github.com/apache/commons-text/pull/57#discussion_r129102692
  
--- Diff: src/main/java/org/apache/commons/text/WordUtils.java ---
@@ -747,45 +750,29 @@ public static boolean containsAllWords(final CharSequence word, final CharSequen
         return true;
     }
 
-    //---
+    // ---
     /**
-     * Is the character a delimiter.
+     *
+     * Converts an array of delimiters to a hash set of code points. Code point of space(32) is added as the default
+     * value if delimiters is null. The generated hash set provides O(1) lookup time.
+     *
      *
-     * @param ch  the character to check
-     * @param delimiters  the delimiters
-     * @return true if it is a delimiter
+     * @param delimiters set of characters to determine capitalization, null means whitespace
+     * @return Set<Integer>
      */
-    public static boolean isDelimiter(final char ch, final char[] delimiters) {
-        if (delimiters == null) {
-            return Character.isWhitespace(ch);
-        }
-        for (final char delimiter : delimiters) {
-            if (ch == delimiter) {
-                return true;
+    private static Set<Integer> generateDelimiterSet(final char[] delimiters) {
+        Set<Integer> delimiterHashSet = new HashSet<>();
+        if (delimiters == null || delimiters.length == 0) {
+            if (delimiters == null) {
+                delimiterHashSet.add(Character.codePointAt(new char[] {' '}, 0));
             }
+            return delimiterHashSet;
         }
-        return false;
-    }
 
-    //---
-    /**
-     * Is the codePoint a delimiter.
-     *
-     * @param codePoint the codePint to check
-     * @param delimiters  the delimiters
-     * @return true if it is a delimiter
-     */
-    public static boolean isDelimiter(final int codePoint, final char[] delimiters) {
--- End diff --

Rather than removing it, we should keep this method.




[jira] [Commented] (TEXT-98) Remove isDelimiter() and use HashSets for delimiter check

2017-07-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TEXT-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098883#comment-16098883
 ] 

ASF GitHub Bot commented on TEXT-98:


Github user ameyjadiye commented on a diff in the pull request:

https://github.com/apache/commons-text/pull/57#discussion_r129102634
  
--- Diff: src/main/java/org/apache/commons/text/WordUtils.java ---
@@ -747,45 +750,29 @@ public static boolean containsAllWords(final CharSequence word, final CharSequen
         return true;
     }
 
-    //---
+    // ---
     /**
-     * Is the character a delimiter.
+     *
+     * Converts an array of delimiters to a hash set of code points. Code point of space(32) is added as the default
+     * value if delimiters is null. The generated hash set provides O(1) lookup time.
+     *
      *
-     * @param ch  the character to check
-     * @param delimiters  the delimiters
-     * @return true if it is a delimiter
+     * @param delimiters set of characters to determine capitalization, null means whitespace
+     * @return Set<Integer>
      */
-    public static boolean isDelimiter(final char ch, final char[] delimiters) {
--- End diff --

Rather than removing it, we should keep this method.




[jira] [Commented] (TEXT-98) Remove isDelimiter() and use HashSets for delimiter check

2017-07-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TEXT-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097920#comment-16097920
 ] 

ASF GitHub Bot commented on TEXT-98:


GitHub user arunvinudss opened a pull request:

https://github.com/apache/commons-text/pull/57

TEXT-98: Remove isDelimiter and use HashSets for delimiter checks



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/arunvinudss/commons-text TEXT-98

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/commons-text/pull/57.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #57


commit eabb18efa39b1fbebf66d46282d6abc3f9b2c7aa
Author: Arun Vinud 
Date:   2017-07-23T14:57:37Z

Remove isDelimiter and using HashSets for delimiter checks



