[text] Re: CharSequence vs. String (was Re: [GitHub] commons-text pull request #46: TEXT-85:Added CaseUtils class with camel case...)
> If a method doesn't intrinsically require a String, then I prefer CharSequence. It's probable that sooner or later something is going to demand a String, but that's not a good reason to be "that guy" :-)

I lean towards using CharSequence when that makes sense too (i.e. suggesting we are working on code points, and supporting implementations of CharSequence). The tdebatty/java-string-similarity library works only with Strings, I think. Others like LingPipe, ICU4J, Lucene, Apache Commons Text, and Apache OpenNLP use both CharSequence and String.

Analysing the use of CharSequence and String could be an interesting idea for a blog post, and could even raise some tickets to fix consistency in the API of [text] or some other component/project.

> Also, wouldn't some sort of low-space-overhead string storage be a good fit for text?

Sounds interesting. Normally when I have some idea like that for [text] (or for other projects/components) I first note it down somewhere (normally at http://kinoshita.eti.br/todo/), and then file an issue like TEXT-71, TEXT-77, TEXT-78, or TEXT-79 to start investigating it. If you have some idea of how that could be implemented, or know about some projects for that, feel free to suggest it in a JIRA ticket, or start another thread here on the mailing list.

Cheers
Bruno

From: Simon Spero <sesunc...@gmail.com>
To: Commons Developers List <dev@commons.apache.org>
Sent: Tuesday, 20 June 2017 1:39 AM
Subject: CharSequence vs. String (was Re: [GitHub] commons-text pull request #46: TEXT-85:Added CaseUtils class with camel case...)

On Jun 12, 2017 10:47 AM, "arunvinudss" <g...@git.apache.org> wrote:

> Github user arunvinudss commented on a diff in the pull request:
>
> I am a bit biased towards using String instead of CharSequence. Yes, CharSequence allows us to pass StringBuffers and builders and other types as input, potentially increasing the scope of the function, but considering the nature of work we do in this particular method it may not necessarily be a good idea. My basic contention is that the minute we call toString() on a CharSequence to do any sort of manipulation it becomes a costly operation and we may lose performance.

True if the particular CharSequence is not in fact an instance of String. String::toString returns this.

The bigger problem is that too many methods use String as a parameter or return type, when CharSequence would serve just as well. This indeed requires the invocation of Object::toString.

For methods that use String as the return type, changing the result to CharSequence is source and binary incompatible, and properly so (since at some point the user may actually need a String).

A generic method with a type parameter bounded by CharSequence (T extends CharSequence) can sometimes be useful, and can be added in addition to methods taking String arguments, but can't replace them.

There are some places in javac that have special treatment for String, for example the + operator, but JDK 9 reduces that particular win by indyfying concat.

If a method doesn't intrinsically require a String, then I prefer CharSequence. It's probable that sooner or later something is going to demand a String, but that's not a good reason to be "that guy" :-)

Note: Strings can be an incredible waste of memory; 40 + ⌈length/4⌉ bytes (reduced to a mere 40 + ⌈length/8⌉ bytes in JDK 9 when compact strings can be used). This is incredibly painful if you have a vast number of small "strings", which may not all need to be materialized simultaneously.

See e.g. [1] (~50MiB of UTF-8 chars becomes ~250MiB of Strings. And since there's no individual humongous object they all get to make the journey from TLAB to Old Space the hard way. Note this predates JDK 9, but illustrates some of the win from compact strings)

Storing the character data in a shared byte array is a huge win. Someone should tell the JDK implementors to look at applications that do this. Like, um, javac :-)

Materializing these strings as possibly transient CharSequences is really convenient... until some method just has to have a String.

Also, wouldn't some sort of low-space-overhead string storage be a good fit for text?

Simon

[1] Spero, S. (2015). Time And Relative Dimensions In Semantics: Is OWL Bigger On The Inside? OWLED 2015. Available at http://cgi.csc.liv.ac.uk/~valli/OWLED2015/OWLED_2015_paper_12.pdf
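The points about working on code points through CharSequence, about String::toString returning this, and about a bounded generic method can be illustrated with a small sketch. The class and method names below (CharSequenceSketch, countCodePoints, requireNonEmpty) are hypothetical and are not part of commons-text; only standard JDK calls are used.

```java
// Hypothetical illustration, not commons-text code.
final class CharSequenceSketch {

    /** Counts Unicode code points without converting the input to a String. */
    static int countCodePoints(final CharSequence cs) {
        int count = 0;
        for (int i = 0; i < cs.length(); ) {
            // Character.codePointAt has an overload that accepts any CharSequence.
            final int codePoint = Character.codePointAt(cs, i);
            i += Character.charCount(codePoint);
            count++;
        }
        return count;
    }

    /** Bounded generic: the caller gets back the same static type it passed in. */
    static <T extends CharSequence> T requireNonEmpty(final T cs) {
        if (cs == null || cs.length() == 0) {
            throw new IllegalArgumentException("empty input");
        }
        return cs;
    }
}
```

A String and a StringBuilder with the same contents give the same count, and calling toString() on a String hands back the very same object, so a CharSequence parameter costs nothing extra when the caller already holds a String.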
CharSequence vs. String (was Re: [GitHub] commons-text pull request #46: TEXT-85:Added CaseUtils class with camel case...)
On Jun 12, 2017 10:47 AM, "arunvinudss" wrote:

> Github user arunvinudss commented on a diff in the pull request:
>
> I am a bit biased towards using String instead of CharSequence. Yes, CharSequence allows us to pass StringBuffers and builders and other types as input, potentially increasing the scope of the function, but considering the nature of work we do in this particular method it may not necessarily be a good idea. My basic contention is that the minute we call toString() on a CharSequence to do any sort of manipulation it becomes a costly operation and we may lose performance.

True if the particular CharSequence is not in fact an instance of String. String::toString returns this.

The bigger problem is that too many methods use String as a parameter or return type, when CharSequence would serve just as well. This indeed requires the invocation of Object::toString.

For methods that use String as the return type, changing the result to CharSequence is source and binary incompatible, and properly so (since at some point the user may actually need a String).

A generic method with a type parameter bounded by CharSequence (T extends CharSequence) can sometimes be useful, and can be added in addition to methods taking String arguments, but can't replace them.

There are some places in javac that have special treatment for String, for example the + operator, but JDK 9 reduces that particular win by indyfying concat.

If a method doesn't intrinsically require a String, then I prefer CharSequence. It's probable that sooner or later something is going to demand a String, but that's not a good reason to be "that guy" :-)

Note: Strings can be an incredible waste of memory; 40 + ⌈length/4⌉ bytes (reduced to a mere 40 + ⌈length/8⌉ bytes in JDK 9 when compact strings can be used). This is incredibly painful if you have a vast number of small "strings", which may not all need to be materialized simultaneously.

See e.g. [1] (~50MiB of UTF-8 chars becomes ~250MiB of Strings. And since there's no individual humongous object they all get to make the journey from TLAB to Old Space the hard way. Note this predates JDK 9, but illustrates some of the win from compact strings)

Storing the character data in a shared byte array is a huge win. Someone should tell the JDK implementors to look at applications that do this. Like, um, javac :-)

Materializing these strings as possibly transient CharSequences is really convenient... until some method just has to have a String.

Also, wouldn't some sort of low-space-overhead string storage be a good fit for text?

Simon

[1] Spero, S. (2015). Time And Relative Dimensions In Semantics: Is OWL Bigger On The Inside? OWLED 2015. Available at http://cgi.csc.liv.ac.uk/~valli/OWLED2015/OWLED_2015_paper_12.pdf
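The idea of keeping character data in a shared byte array and materializing transient CharSequences over it could look roughly like the following. This is only a sketch under simplifying assumptions: the class name SharedBytesSequence is hypothetical, and the bytes are assumed to be ISO-8859-1 (one byte per char) so that indexing stays trivial; real UTF-8 data would need decoding logic that is not shown here.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration: a read-only slice of a shared byte array exposed as a CharSequence.
final class SharedBytesSequence implements CharSequence {
    private final byte[] shared; // many sequences may point into the same array
    private final int offset;
    private final int length;

    SharedBytesSequence(final byte[] shared, final int offset, final int length) {
        if (offset < 0 || length < 0 || length > shared.length - offset) {
            throw new IndexOutOfBoundsException();
        }
        this.shared = shared;
        this.offset = offset;
        this.length = length;
    }

    @Override
    public int length() {
        return length;
    }

    @Override
    public char charAt(final int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException();
        }
        // Assumption: one byte per character (ISO-8859-1), so widen the unsigned byte.
        return (char) (shared[offset + index] & 0xFF);
    }

    @Override
    public CharSequence subSequence(final int start, final int end) {
        if (start < 0 || end > length || start > end) {
            throw new IndexOutOfBoundsException();
        }
        // No copy: the sub-sequence is just another window onto the same array.
        return new SharedBytesSequence(shared, offset + start, end - start);
    }

    @Override
    public String toString() {
        // Only here is a real String (and its per-object overhead) materialized.
        return new String(shared, offset, length, StandardCharsets.ISO_8859_1);
    }
}
```

Each instance carries two ints and a reference instead of its own character array, which is the kind of saving the message describes; the cost reappears as soon as some API insists on a String and toString() has to be called.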
[GitHub] commons-text pull request #46: TEXT-85:Added CaseUtils class with camel case...
Github user asfgit closed the pull request at: https://github.com/apache/commons-text/pull/46 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org
[GitHub] commons-text pull request #46: TEXT-85:Added CaseUtils class with camel case...
Github user arunvinudss commented on a diff in the pull request:

    https://github.com/apache/commons-text/pull/46#discussion_r121417851

    --- Diff: src/main/java/org/apache/commons/text/CaseUtils.java ---
    @@ -0,0 +1,140 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements. See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License. You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.commons.text;
    +
    +import org.apache.commons.lang3.StringUtils;
    +
    +import java.util.HashSet;
    +import java.util.Set;
    +
    +/**
    + * Case manipulation operations on Strings that contain words.
    + *
    + * This class tries to handle null input gracefully.
    + * An exception will not be thrown for a null input.
    + * Each method documents its behaviour in more detail.
    + *
    + * @since 1.0
    + */
    +public class CaseUtils {
    +
    +    /**
    +     * CaseUtils instances should NOT be constructed in
    +     * standard programming. Instead, the class should be used as
    +     * CaseUtils.toCamelCase("foo bar", true, new char[]{'-'});.
    +     *
    +     * This constructor is public to permit tools that require a JavaBean
    +     * instance to operate.
    +     */
    +    public CaseUtils() {
    +        super();
    +    }
    +
    +    // Camel Case
    +    //---
    +    /**
    +     * Converts all the delimiter separated words in a String into camelCase,
    +     * that is each word is made up of a titlecase character and then a series of
    +     * lowercase characters. The
    +     *
    +     * The delimiters represent a set of characters understood to separate words.
    +     * The first non-delimiter character after a delimiter will be capitalized. The first String
    +     * character may or may not be capitalized and it's determined by the user input for capitalizeFirstLetter
    +     * variable.
    +     *
    +     * A null input String returns null.
    +     * Capitalization uses the Unicode title case, normally equivalent to
    +     * upper case.
    +     *
    +     *
    +     * CaseUtils.toCamelCase(null, false) = null
    +     * CaseUtils.toCamelCase("", false, *) = ""
    +     * CaseUtils.toCamelCase(*, false, null) = *
    +     * CaseUtils.toCamelCase(*, true, new char[0]) = *
    +     * CaseUtils.toCamelCase("To.Camel.Case", false, new char[]{'.'}) = "toCamelCase"
    +     * CaseUtils.toCamelCase(" to @ Camel case", true, new char[]{'@'}) = "toCamelCase"
    +     * CaseUtils.toCamelCase(" @to @ Camel case", false, new char[]{'@'}) = "toCamelCase"
    +     *
    +     *
    +     * @param str the String to be converted to camelCase, may be null
    +     * @param capitalizeFirstLetter boolean that determines if the first character of first word should be title case.
    +     * @param delimiters set of characters to determine capitalization, null and/or empty array means whitespace
    +     * @return camelCase of String, null if null String input
    +     */
    +    public static String toCamelCase(String str, boolean capitalizeFirstLetter, final char... delimiters) {
    --- End diff --

    I am a bit biased towards using String instead of CharSequence. Yes, CharSequence allows us to pass StringBuffers and builders and other types as input, potentially increasing the scope of the function, but considering the nature of work we do in this particular method it may not necessarily be a good idea. My basic contention is that the minute we call toString() on a CharSequence to do any sort of manipulation it becomes a costly operation and we may lose performance.
[GitHub] commons-text pull request #46: TEXT-85:Added CaseUtils class with camel case...
Github user arunvinudss commented on a diff in the pull request:

    https://github.com/apache/commons-text/pull/46#discussion_r121408959

    --- Diff: src/main/java/org/apache/commons/text/CaseUtils.java ---
    @@ -0,0 +1,140 @@
    [...]
    +     * @param str the String to be converted to camelCase, may be null
    +     * @param capitalizeFirstLetter boolean that determines if the first character of first word should be title case.
    +     * @param delimiters set of characters to determine capitalization, null and/or empty array means whitespace
    +     * @return camelCase of String, null if null String input
    +     */
    +    public static String toCamelCase(String str, boolean capitalizeFirstLetter, final char... delimiters) {
    --- End diff --

    Yes, the boolean should be final; not sure why I missed that. The input string is not final because, as you mentioned, I change it to lower case inside the method.
[GitHub] commons-text pull request #46: TEXT-85:Added CaseUtils class with camel case...
Github user arunvinudss commented on a diff in the pull request:

    https://github.com/apache/commons-text/pull/46#discussion_r121406140

    --- Diff: src/main/java/org/apache/commons/text/CaseUtils.java ---
    @@ -0,0 +1,140 @@
    [...]
    +     * CaseUtils.toCamelCase(null, false) = null
    +     * CaseUtils.toCamelCase("", false, *) = ""
    +     * CaseUtils.toCamelCase(*, false, null) = *
    +     * CaseUtils.toCamelCase(*, true, new char[0]) = *
    +     * CaseUtils.toCamelCase("To.Camel.Case", false, new char[]{'.'}) = "toCamelCase"
    +     * CaseUtils.toCamelCase(" to @ Camel case", true, new char[]{'@'}) = "toCamelCase"
    --- End diff --

    Yes, it's a documentation error; I will update it.
[GitHub] commons-text pull request #46: TEXT-85:Added CaseUtils class with camel case...
Github user chtompki commented on a diff in the pull request:

    https://github.com/apache/commons-text/pull/46#discussion_r121374008

    --- Diff: src/main/java/org/apache/commons/text/CaseUtils.java ---
    @@ -0,0 +1,140 @@
    [...]
    +     * CaseUtils.toCamelCase(null, false) = null
    +     * CaseUtils.toCamelCase("", false, *) = ""
    +     * CaseUtils.toCamelCase(*, false, null) = *
    +     * CaseUtils.toCamelCase(*, true, new char[0]) = *
    +     * CaseUtils.toCamelCase("To.Camel.Case", false, new char[]{'.'}) = "toCamelCase"
    +     * CaseUtils.toCamelCase(" to @ Camel case", true, new char[]{'@'}) = "toCamelCase"
    --- End diff --

    Wouldn't we want
    ```java
    CaseUtils.toCamelCase(" to @ Camel case", true, new char[]{'@'}) = "ToCamelCase"
    ```
    because of the `true` boolean?
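The effect of the boolean flag raised in this comment can be checked with a small stand-alone sketch. This is not the code from the pull request: CamelCaseSketch is a hypothetical name, the space character is assumed to always act as a delimiter, and input consisting only of delimiters yields an empty string here, which may differ from the method under review.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical simplified sketch of the behaviour under discussion, not the PR's implementation.
final class CamelCaseSketch {

    static String toCamelCase(final CharSequence input, final boolean capitalizeFirstLetter,
                              final char... delimiters) {
        if (input == null) {
            return null;
        }
        final Set<Integer> delimiterSet = new HashSet<>();
        delimiterSet.add((int) ' '); // assumption: space is always a delimiter
        if (delimiters != null) {
            for (final char d : delimiters) {
                delimiterSet.add((int) d);
            }
        }
        final StringBuilder out = new StringBuilder(input.length());
        boolean capitalizeNext = capitalizeFirstLetter;
        for (int i = 0; i < input.length(); ) {
            final int codePoint = Character.codePointAt(input, i);
            i += Character.charCount(codePoint);
            if (delimiterSet.contains(codePoint)) {
                // Leading delimiters must not force a capital when the flag is false.
                capitalizeNext = out.length() > 0 || capitalizeFirstLetter;
            } else if (capitalizeNext) {
                out.appendCodePoint(Character.toTitleCase(codePoint));
                capitalizeNext = false;
            } else {
                out.appendCodePoint(Character.toLowerCase(codePoint));
            }
        }
        return out.toString();
    }
}
```

With this sketch, `toCamelCase(" to @ Camel case", true, '@')` gives "ToCamelCase" and the same call with `false` gives "toCamelCase", which matches the reading of the flag suggested in the comment.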
[GitHub] commons-text pull request #46: TEXT-85:Added CaseUtils class with camel case...
Github user chtompki commented on a diff in the pull request:

    https://github.com/apache/commons-text/pull/46#discussion_r121376797

    --- Diff: src/main/java/org/apache/commons/text/CaseUtils.java ---
    @@ -0,0 +1,140 @@
    [...]
    +     * @param str the String to be converted to camelCase, may be null
    +     * @param capitalizeFirstLetter boolean that determines if the first character of first word should be title case.
    +     * @param delimiters set of characters to determine capitalization, null and/or empty array means whitespace
    +     * @return camelCase of String, null if null String input
    +     */
    +    public static String toCamelCase(String str, boolean capitalizeFirstLetter, final char... delimiters) {
    --- End diff --

    I think that we'll want:
    ```java
    public static String toCamelCase(final String str, final boolean capitalizeFirstLetter, final char... delimiters)
    ```
    but I see that you've fiddled with the `str` variable below. So I'm open to discussion on that point.

    Generally, I'm torn on whether to have the signature be:
    ```java
    public static String toCamelCase(final String str, final boolean capitalizeFirstLetter, final char... delimiters);
    ```
    or
    ```java
    public static String toCamelCase(final CharSequence str, final boolean capitalizeFirstLetter, final char... delimiters);
    ```
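One way to reconcile a `final` parameter with the lower-casing step is to assign the lower-cased value to a new local variable instead of reassigning the parameter. The sketch below assumes the CharSequence variant of the signature; SignatureSketch and lowerCased are hypothetical names, and the use of Locale.ROOT is a choice made for this example rather than something taken from the pull request.

```java
import java.util.Locale;

// Hypothetical illustration of a final CharSequence parameter plus a local copy.
final class SignatureSketch {

    static String lowerCased(final CharSequence str) {
        if (str == null) {
            return null;
        }
        // For a String argument toString() returns the same object,
        // so only non-String inputs pay for a conversion here.
        final String lowered = str.toString().toLowerCase(Locale.ROOT);
        return lowered;
    }
}
```

The parameter stays untouched, so every parameter in the signature can be declared `final`, and callers may pass a String, StringBuilder, or any other CharSequence.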
[GitHub] commons-text pull request #46: TEXT-85:Added CaseUtils class with camel case...
Github user chtompki commented on a diff in the pull request:

    https://github.com/apache/commons-text/pull/46#discussion_r121380327

    --- Diff: src/main/java/org/apache/commons/text/CaseUtils.java ---
    @@ -0,0 +1,140 @@
    [...]
    +     * @return camelCase of String, null if null String input
    +     */
    +    public static String toCamelCase(String str, boolean capitalizeFirstLetter, final char... delimiters) {
    +        if (StringUtils.isEmpty(str)) {
    +            return str;
    +        }
    +        str = str.toLowerCase();
    +        int strLen = str.length();
    +        int [] newCodePoints = new int[strLen];
    +        int outOffset = 0;
    +        Set<Integer> delimiterSet = generateDelimiterSet(delimiters);
    +        boolean capitalizeNext = false;
    +        if (capitalizeFirstLetter) {
    +            capitalizeNext = true;
    +        }
    +        for (int index = 0; index < strLen;) {
    +            final int codePoint = str.codePointAt(index);
    +
    +            if (delimiterSet.contains(codePoint)) {
    +                capitalizeNext = true;
    +                if (outOffset == 0) {
    +                    capitalizeNext = false;
    +                }
    +                index += Character.charCount(codePoint);
    +            } else if (capitalizeNext || (outOffset == 0 &&
[GitHub] commons-text pull request #46: TEXT-85:Added CaseUtils class with camel case...
Github user chtompki commented on a diff in the pull request:

    https://github.com/apache/commons-text/pull/46#discussion_r121374329

    --- Diff: src/main/java/org/apache/commons/text/CaseUtils.java ---
    @@ -0,0 +1,140 @@
    [...]
    +public class CaseUtils {
    +
    +    /**
    +     * CaseUtils instances should NOT be constructed in
    +     * standard programming. Instead, the class should be used as
    +     * CaseUtils.toCamelCase("foo bar", true, new char[]{'-'});.
    +     *
    +     * This constructor is public to permit tools that require a JavaBean
    +     * instance to operate.
    +     */
    +    public CaseUtils() {
    --- End diff --

    You could make the constructor `private`, but I do like the consistency with [`org.apache.commons.lang3.StringUtils.java`](https://github.com/apache/commons-lang/blob/master/src/main/java/org/apache/commons/lang3/StringUtils.java#L176-L186).
[GitHub] commons-text pull request #46: TEXT-85:Added CaseUtils class with camel case...
Github user chtompki commented on a diff in the pull request: https://github.com/apache/commons-text/pull/46#discussion_r121380270 --- Diff: src/main/java/org/apache/commons/text/CaseUtils.java --- @@ -0,0 +1,140 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.commons.text; + +import org.apache.commons.lang3.StringUtils; + +import java.util.HashSet; +import java.util.Set; + +/** + * Case manipulation operations on Strings that contain words. + * + * This class tries to handle null input gracefully. + * An exception will not be thrown for a null input. + * Each method documents its behaviour in more detail. + * + * @since 1.0 + */ +public class CaseUtils { + +/** + * CaseUtils instances should NOT be constructed in + * standard programming. Instead, the class should be used as + * CaseUtils.toCamelCase("foo bar", true, new char[]{'-'});. + * + * This constructor is public to permit tools that require a JavaBean + * instance to operate. + */ +public CaseUtils() { +super(); +} + +// Camel Case + //--- +/** + * Converts all the delimiter separated words in a String into camelCase, + * that is each word is made up of a titlecase character and then a series of + * lowercase characters. 
The + * + * The delimiters represent a set of characters understood to separate words. + * The first non-delimiter character after a delimiter will be capitalized. The first String + * character may or may not be capitalized and it's determined by the user input for capitalizeFirstLetter + * variable. + * + * A null input String returns null. + * Capitalization uses the Unicode title case, normally equivalent to + * upper case. + * --- End diff -- I wonder if, like [`Character.toUpperCase`](https://docs.oracle.com/javase/8/docs/api/java/lang/Character.html#toUpperCase-char-), we should document that we are not locale-based, which I support. Adding `Locale` to the mix brings complexity that may not be needed. Alternatively, we could document that we rely on `Character.toTitleCase`, which has no `Locale`-based analogue in `String`.
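To make the locale point concrete, here is a minimal sketch (the class and method names are made up for illustration and are not part of the pull request). It contrasts `Character.toTitleCase(int)`, which takes no `Locale`, with the `String` case methods, which do. Note that the no-argument `str.toLowerCase()` call in the diff uses the JVM default locale, so the method as written is not fully locale-independent.

```java
import java.util.Locale;

// Hypothetical demo class, not part of commons-text.
final class TitleCaseLocaleDemo {

    /** Title-cases one code point; Character.toTitleCase has no Locale parameter. */
    static String titleCaseCodePoint(int codePoint) {
        return new String(Character.toChars(Character.toTitleCase(codePoint)));
    }

    /** Upper-cases a String with an explicit locale, for comparison. */
    static String upperCase(String s, Locale locale) {
        return s.toUpperCase(locale);
    }

    /** Lower-cases a String with an explicit locale, for comparison. */
    static String lowerCase(String s, Locale locale) {
        return s.toLowerCase(locale);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");
        // Locale-independent: always LATIN CAPITAL LETTER I.
        System.out.println(titleCaseCodePoint('i'));
        // Locale-sensitive: Turkish maps i to dotted capital I (U+0130)
        // and I to dotless small i (U+0131).
        System.out.println(upperCase("i", Locale.ROOT));
        System.out.println(upperCase("i", turkish));
        System.out.println(lowerCase("I", turkish));
    }
}
```

If locale independence is the documented contract, one option would be to lower-case with `Locale.ROOT` rather than the default locale; that is a suggestion, not something the pull request does.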
[GitHub] commons-text pull request #46: TEXT-85:Added CaseUtils class with camel case...
Github user chtompki commented on a diff in the pull request: https://github.com/apache/commons-text/pull/46#discussion_r121378617 --- Diff: src/main/java/org/apache/commons/text/CaseUtils.java --- @@ -0,0 +1,140 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.commons.text; + +import org.apache.commons.lang3.StringUtils; + +import java.util.HashSet; +import java.util.Set; + +/** + * Case manipulation operations on Strings that contain words. + * + * This class tries to handle null input gracefully. + * An exception will not be thrown for a null input. + * Each method documents its behaviour in more detail. + * + * @since 1.0 + */ +public class CaseUtils { + +/** + * CaseUtils instances should NOT be constructed in + * standard programming. Instead, the class should be used as + * CaseUtils.toCamelCase("foo bar", true, new char[]{'-'});. + * + * This constructor is public to permit tools that require a JavaBean + * instance to operate. + */ +public CaseUtils() { +super(); +} + +// Camel Case + //--- +/** + * Converts all the delimiter separated words in a String into camelCase, + * that is each word is made up of a titlecase character and then a series of + * lowercase characters. 
The + * + * The delimiters represent a set of characters understood to separate words. + * The first non-delimiter character after a delimiter will be capitalized. The first String + * character may or may not be capitalized and it's determined by the user input for capitalizeFirstLetter + * variable. + * + * A null input String returns null. + * Capitalization uses the Unicode title case, normally equivalent to + * upper case. + * + * + * CaseUtils.toCamelCase(null, false) = null + * CaseUtils.toCamelCase("", false, *) = "" + * CaseUtils.toCamelCase(*, false, null) = * + * CaseUtils.toCamelCase(*, true, new char[0]) = * + * CaseUtils.toCamelCase("To.Camel.Case", false, new char[]{'.'}) = "toCamelCase" + * CaseUtils.toCamelCase(" to @ Camel case", true, new char[]{'@'}) = "toCamelCase" + * CaseUtils.toCamelCase(" @to @ Camel case", false, new char[]{'@'}) = "toCamelCase" + * + * + * @param str the String to be converted to camelCase, may be null + * @param capitalizeFirstLetter boolean that determines if the first character of first word should be title case. + * @param delimiters set of characters to determine capitalization, null and/or empty array means whitespace + * @return camelCase of String, null if null String input + */ +public static String toCamelCase(String str, boolean capitalizeFirstLetter, final char... delimiters) { +if (StringUtils.isEmpty(str)) { +return str; +} +str = str.toLowerCase(); +int strLen = str.length(); +int [] newCodePoints = new int[strLen]; +int outOffset = 0; +Set delimiterSet = generateDelimiterSet(delimiters); +boolean capitalizeNext = false; +if (capitalizeFirstLetter) { +capitalizeNext = true; +} +for (int index = 0; index < strLen;) { +final int codePoint = str.codePointAt(index); --- End diff -- If we decide to go with `CharSequence` here, this becomes a tougher move: we have to begin worrying about UTF-16 and "surrogate pairs" of characters.
See [docs here](https://docs.oracle.com/javase/8/docs/api/java/lang/Character.html#unicode)
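As a rough illustration of the surrogate-pair concern, the sketch below walks the code points of an arbitrary `CharSequence` without calling `toString()`. It relies on the static `Character.codePointAt(CharSequence, int)` and `Character.charCount(int)` helpers from the JDK; the class and method names are hypothetical and only one possible way to approach this, not what the pull request implements.

```java
import java.util.Arrays;

// Hypothetical helper, not part of commons-text.
final class CodePointWalk {

    /** Collects the code points of any CharSequence, pairing surrogates as it goes. */
    static int[] codePointsOf(CharSequence seq) {
        // Upper bound: at most one code point per UTF-16 char.
        int[] out = new int[seq.length()];
        int count = 0;
        for (int index = 0; index < seq.length(); ) {
            // Reads a full supplementary code point when a valid surrogate pair starts here.
            int codePoint = Character.codePointAt(seq, index);
            out[count++] = codePoint;
            // Advance by 1 for BMP characters, by 2 for surrogate pairs.
            index += Character.charCount(codePoint);
        }
        return Arrays.copyOf(out, count);
    }

    public static void main(String[] args) {
        // A StringBuilder holding 'a', U+1F600 (a surrogate pair in UTF-16), and 'b'.
        CharSequence seq = new StringBuilder("a").appendCodePoint(0x1F600).append('b');
        System.out.println(seq.length());               // number of UTF-16 chars
        System.out.println(codePointsOf(seq).length);   // number of code points
    }
}
```

The loop has the same shape as the `str.codePointAt(index)` loop in the diff, so the main cost of switching to `CharSequence` would likely sit in the `toLowerCase()` step, which has no `CharSequence` equivalent.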
[GitHub] commons-text pull request #46: TEXT-85:Added CaseUtils class with camel case...
Github user chtompki commented on a diff in the pull request: https://github.com/apache/commons-text/pull/46#discussion_r121376091 --- Diff: src/main/java/org/apache/commons/text/CaseUtils.java --- @@ -0,0 +1,140 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.commons.text; + +import org.apache.commons.lang3.StringUtils; + +import java.util.HashSet; +import java.util.Set; + +/** + * Case manipulation operations on Strings that contain words. + * + * This class tries to handle null input gracefully. + * An exception will not be thrown for a null input. + * Each method documents its behaviour in more detail. + * + * @since 1.0 + */ +public class CaseUtils { + +/** + * CaseUtils instances should NOT be constructed in + * standard programming. Instead, the class should be used as + * CaseUtils.toCamelCase("foo bar", true, new char[]{'-'});. + * + * This constructor is public to permit tools that require a JavaBean + * instance to operate. + */ +public CaseUtils() { +super(); +} + +// Camel Case + //--- +/** + * Converts all the delimiter separated words in a String into camelCase, + * that is each word is made up of a titlecase character and then a series of + * lowercase characters. 
The + * + * The delimiters represent a set of characters understood to separate words. + * The first non-delimiter character after a delimiter will be capitalized. The first String + * character may or may not be capitalized and it's determined by the user input for capitalizeFirstLetter + * variable. + * + * A null input String returns null. + * Capitalization uses the Unicode title case, normally equivalent to + * upper case. + * + * + * CaseUtils.toCamelCase(null, false) = null + * CaseUtils.toCamelCase("", false, *) = "" + * CaseUtils.toCamelCase(*, false, null) = * + * CaseUtils.toCamelCase(*, true, new char[0]) = * + * CaseUtils.toCamelCase("To.Camel.Case", false, new char[]{'.'}) = "toCamelCase" + * CaseUtils.toCamelCase(" to @ Camel case", true, new char[]{'@'}) = "toCamelCase" + * CaseUtils.toCamelCase(" @to @ Camel case", false, new char[]{'@'}) = "toCamelCase" + * + * + * @param str the String to be converted to camelCase, may be null + * @param capitalizeFirstLetter boolean that determines if the first character of first word should be title case. + * @param delimiters set of characters to determine capitalization, null and/or empty array means whitespace + * @return camelCase of String, null if null String input + */ +public static String toCamelCase(String str, boolean capitalizeFirstLetter, final char... delimiters) { +if (StringUtils.isEmpty(str)) { +return str; +} +str = str.toLowerCase(); +int strLen = str.length(); +int [] newCodePoints = new int[strLen]; +int outOffset = 0; +Set delimiterSet = generateDelimiterSet(delimiters); +boolean capitalizeNext = false; +if (capitalizeFirstLetter) { +capitalizeNext = true; +} +for (int index = 0; index < strLen;) { +final int codePoint = str.codePointAt(index); + +if (delimiterSet.contains(codePoint)) { +capitalizeNext = true; +if (outOffset == 0) { +capitalizeNext = false; +} +index += Character.charCount(codePoint); +} else if (capitalizeNext || (outOffset == 0 &&
[GitHub] commons-text pull request #46: TEXT-85:Added CaseUtils class with camel case...
Github user chtompki commented on a diff in the pull request: https://github.com/apache/commons-text/pull/46#discussion_r121376021 --- Diff: src/main/java/org/apache/commons/text/CaseUtils.java --- @@ -0,0 +1,140 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.commons.text; + +import org.apache.commons.lang3.StringUtils; + +import java.util.HashSet; +import java.util.Set; + +/** + * Case manipulation operations on Strings that contain words. + * + * This class tries to handle null input gracefully. + * An exception will not be thrown for a null input. + * Each method documents its behaviour in more detail. + * + * @since 1.0 + */ +public class CaseUtils { + +/** + * CaseUtils instances should NOT be constructed in + * standard programming. Instead, the class should be used as + * CaseUtils.toCamelCase("foo bar", true, new char[]{'-'});. + * + * This constructor is public to permit tools that require a JavaBean + * instance to operate. + */ +public CaseUtils() { +super(); +} + +// Camel Case + //--- +/** + * Converts all the delimiter separated words in a String into camelCase, + * that is each word is made up of a titlecase character and then a series of + * lowercase characters. 
The + * + * The delimiters represent a set of characters understood to separate words. + * The first non-delimiter character after a delimiter will be capitalized. The first String + * character may or may not be capitalized and it's determined by the user input for capitalizeFirstLetter + * variable. + * + * A null input String returns null. + * Capitalization uses the Unicode title case, normally equivalent to + * upper case. + * + * + * CaseUtils.toCamelCase(null, false) = null + * CaseUtils.toCamelCase("", false, *) = "" + * CaseUtils.toCamelCase(*, false, null) = * + * CaseUtils.toCamelCase(*, true, new char[0]) = * + * CaseUtils.toCamelCase("To.Camel.Case", false, new char[]{'.'}) = "toCamelCase" + * CaseUtils.toCamelCase(" to @ Camel case", true, new char[]{'@'}) = "toCamelCase" --- End diff -- After looking at the tests, this appears to be merely a documentation error.
[GitHub] commons-text pull request #46: TEXT-85:Added CaseUtils class with camel case...
GitHub user arunvinudss opened a pull request: https://github.com/apache/commons-text/pull/46 TEXT-85:Added CaseUtils class with camel case conversion support You can merge this pull request into a Git repository by running: $ git pull https://github.com/arunvinudss/commons-text TEXT-85 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/commons-text/pull/46.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #46 commit 5b8c5ea3b7e39a49a9ee66588a2d8fbc5d8cc6e7 Author: Arun Vinud Date: 2017-06-11T11:36:21Z TEXT-85:Added CaseUtils class with camel case conversion