theshoeshiner commented on code in PR #450:
URL: https://github.com/apache/commons-text/pull/450#discussion_r1324802675


##########
src/main/java/org/apache/commons/text/cases/CamelCase.java:
##########
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.text.cases;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.commons.lang3.CharUtils;
+import org.apache.commons.lang3.StringUtils;
+
+/**
+ * Case implementation that parses and formats strings of the form 'myCamelCase'
+ * <p>
+ * This case separates tokens on uppercase ASCII alpha characters. Each token begins with an

Review Comment:
   So, in making this change, I ran into another quirk. Unicode has three cases 
(upper, lower, and title), and my updated implementation only handles upper 
and lower. That is, to convert the token `Xabc` into a Pascal string, the `X` 
character _must_ translate to (or already be) an uppercase character. However, 
one might argue that a titlecase character is also sufficient, since the 
purpose of Pascal case is to uppercase only the first character, which is also 
the purpose of Unicode titlecase.
   
   So the string `dzabc` (note that "dz" here is a single digraph character) 
could be represented as either:
   
   - `DZabc` (dz to uppercase)
   - `Dzabc` (dz to titlecase)
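
The two options can be reproduced with the JDK's own mappings. This is a minimal sketch, assuming the digraph in question is U+01F3 (LATIN SMALL LETTER DZ); the class and method names are hypothetical and not part of the PR.

```java
final class DigraphCaseDemo {

    // Assumption: the "dz" digraph is U+01F3. Its uppercase form is U+01F1
    // and its titlecase form is U+01F2.
    static final char DZ_LOWER = '\u01F3';

    // Replaces the first char with its uppercase mapping.
    static String withFirstUpper(String token) {
        return Character.toUpperCase(token.charAt(0)) + token.substring(1);
    }

    // Replaces the first char with its titlecase mapping.
    static String withFirstTitle(String token) {
        return Character.toTitleCase(token.charAt(0)) + token.substring(1);
    }

    public static void main(String[] args) {
        String token = DZ_LOWER + "abc";
        String upper = withFirstUpper(token);
        String title = withFirstTitle(token);
        // Print code points, since the three digraph forms look alike in many fonts.
        System.out.printf("upper: U+%04X%n", (int) upper.charAt(0));
        System.out.printf("title: U+%04X%n", (int) title.charAt(0));
    }
}
```

For plain ASCII letters both methods return the same result, so the choice only matters for the small set of characters that have a distinct titlecase form.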
   
   For that specific character both options exist. However, there are also 
lowercase characters whose uppercase mapping is itself in the **titlecase** 
category, which means that anywhere we use the `Character.isUpperCase` check, 
it will fail even though the conversion arguably worked.
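
A small sketch of that failure mode, using U+1F80 (GREEK SMALL LETTER ALPHA WITH PSILI AND YPOGEGRAMMENI) as an example of such a character. The choice of character and the `isUpperOrTitle` helper are illustrative assumptions, not code from the PR.

```java
final class TitlecaseCheckDemo {

    // Hypothetical relaxed check that accepts titlecase letters as "upper".
    static boolean isUpperOrTitle(char c) {
        return Character.isUpperCase(c) || Character.isTitleCase(c);
    }

    public static void main(String[] args) {
        char lower = '\u1F80';
        // The simple uppercase mapping of U+1F80 is U+1F88, whose general
        // category is Lt (titlecase letter), not Lu (uppercase letter).
        char mapped = Character.toUpperCase(lower);

        System.out.printf("mapped to U+%04X%n", (int) mapped);
        System.out.println("isUpperCase:    " + Character.isUpperCase(mapped));
        System.out.println("isTitleCase:    " + Character.isTitleCase(mapped));
        System.out.println("isUpperOrTitle: " + isUpperOrTitle(mapped));
    }
}
```

A strict `Character.isUpperCase` check rejects the mapped character, while the relaxed check accepts it, which is the trade-off behind the questions that follow.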
   
   I see a few questions / decisions:
   
   - Should we allow titlecase characters to serve as uppercase?
   - Should we _prefer_ Unicode titlecase over uppercase when both exist?
   - Should the user be able to choose?
   
   (In all cases, if a character cannot be mapped to lowercase, then logic that 
requires it will throw an exception).
   
   FWIW - my inclination is to handle only upper/lower case. Title case feels 
more like a feature of proper grammar, and this library makes no such 
assumptions. A character is a character, and in reality we have no clue what 
language the user is using. So `dzabc` would transform/output to `DZabc`. The 
user needs to be aware either way that they're using Unicode and that DZ is a 
single character.
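
To make the upper/lower-only policy concrete, here is a rough sketch of how the first character of a Pascal token might be handled under it. `capitalizeFirst` is a hypothetical helper, not the PR's implementation, and it works on `char` values only, so supplementary code points are ignored.

```java
final class UpperLowerOnlySketch {

    // Uppercases the first char of the token and rejects anything that does
    // not end up in the uppercase category (titlecase is NOT accepted).
    static String capitalizeFirst(String token) {
        if (token.isEmpty()) {
            throw new IllegalArgumentException("Token must not be empty");
        }
        char first = Character.toUpperCase(token.charAt(0));
        if (!Character.isUpperCase(first)) {
            throw new IllegalArgumentException(String.format(
                "Character U+%04X has no uppercase form", (int) token.charAt(0)));
        }
        return first + token.substring(1);
    }

    public static void main(String[] args) {
        System.out.println(capitalizeFirst("xabc"));
        // Assumption: U+01F3 stands in for the "dz" digraph; it maps to U+01F1.
        System.out.printf("U+%04X%n", (int) capitalizeFirst("\u01F3abc").charAt(0));
        try {
            // U+1F80 only maps to a titlecase letter, so this policy rejects it.
            capitalizeFirst("\u1F80abc");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Under this policy the digraph token becomes the all-uppercase form, and characters whose only mapping is a titlecase letter raise an exception rather than being accepted silently.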
   



