What happens when a token contains an unpermitted character?

On Wed, Aug 9, 2023 at 8:30 PM Daniel Watson <dcwatso...@gmail.com> wrote:
>
> Here's my stab at a spec. Wanted to clarify some parts of the Case
> interface first before jumping into the implementations. Also wondering
> what a good package name for this stuff is, given that "case" is a
> reserved word?
>
> Case (interface)
> The Case interface defines two methods:
> * String format(Iterable<String> tokens)
> The format method accepts an Iterable of String tokens and returns a single
> String formatted according to the implementation. The format method is
> intended to handle transforming between cases, so tokens passed to the
> format() method need not be properly formatted for the given Case instance,
> though they must still respect any reserved-character restrictions.
> * List<String> parse(String string)
> The parse method accepts a single string and returns a List of string
> tokens that abide by the Case implementation.
> Note: format() and parse() must be fully reciprocal, i.e. on a single
> Case instance, calling parse() with a valid string and passing the
> resulting tokens into format() should return a matching string.
>
> DelimitedCase (base class for kebab and snake)
> Defines a Case where all tokens are separated by a single-character
> delimiter. The delimiter is considered a reserved character and is not
> allowed to appear within tokens when formatting. This base implementation
> places no further restrictions on token contents: tokens can contain any
> valid Java String character. DelimitedCases can support zero-length
> tokens, which occur when there are no characters between two instances
> of the delimiter or when the parsed string begins or ends with the
> delimiter.
> Note: Other Case implementations may not support zero-length tokens, and
> attempts to call format(...) with empty tokens may fail.
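A minimal Java sketch of the Case contract and the DelimitedCase rules quoted above. It assumes one possible answer to the question at the top of this message: format() rejects a token that contains the reserved delimiter by throwing IllegalArgumentException, rather than stripping or escaping the character. The thread has not settled that point. The type and method names follow the spec. The rest is illustrative and is not a commons-text API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of the Case contract described in the spec (not a commons-text API).
interface Case {
    String format(Iterable<String> tokens);

    List<String> parse(String string);
}

// Sketch of DelimitedCase: one reserved delimiter, zero-length tokens allowed.
class DelimitedCase implements Case {
    private final char delimiter;

    DelimitedCase(char delimiter) {
        this.delimiter = delimiter;
    }

    @Override
    public String format(Iterable<String> tokens) {
        StringBuilder sb = new StringBuilder();
        boolean first = true;
        for (String token : tokens) {
            // Assumption: a token containing the reserved delimiter is an error,
            // because joining it would make parse() split it apart again.
            if (token.indexOf(delimiter) >= 0) {
                throw new IllegalArgumentException(
                        "Token contains reserved delimiter '" + delimiter + "': " + token);
            }
            if (!first) {
                sb.append(delimiter);
            }
            sb.append(token);
            first = false;
        }
        return sb.toString();
    }

    @Override
    public List<String> parse(String string) {
        // A limit of -1 keeps trailing empty strings, so leading, trailing, and
        // doubled delimiters all yield zero-length tokens as the spec describes.
        String[] parts = string.split(Pattern.quote(String.valueOf(delimiter)), -1);
        return new ArrayList<>(Arrays.asList(parts));
    }
}
```

With '-' as the delimiter, parse("-a--b-") returns ["", "a", "", "b", ""], and formatting those tokens gives back "-a--b-". That is the round trip the reciprocity note asks for. Without the -1 limit, String.split would silently drop the trailing empty token and break the round trip.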
>
> KebabCase
> Extends DelimitedCase and initializes the delimiter as the hyphen '-'
> character. This case allows only alphanumeric characters within tokens.
>
> SnakeCase
> Extends DelimitedCase and initializes the delimiter as the underscore '_'
> character. This case allows only alphanumeric characters within tokens.
>
> PascalCase
> Defines a Case where tokens begin with an uppercase alpha character. All
> subsequent token characters must be lowercase alpha or numeric characters.
> Whenever an uppercase alpha character is encountered, the previous token is
> considered complete and a new token begins, with the uppercase character
> being the first character of the new token. PascalCase does not allow
> zero-length tokens when formatting, as they would violate the reciprocal
> contract of format() and parse().
>
> CamelCase
> Extends PascalCase and adds one restriction: the first character of the
> first token (i.e. the first character of the full string) must be a
> lowercase alpha character (rather than the uppercase requirement of
> PascalCase). All other restrictions of PascalCase apply.
>
>
> On Tue, Aug 8, 2023 at 8:55 PM Daniel Watson <dcwatso...@gmail.com> wrote:
>
> > Kebab case is extremely common for web identifiers, e.g. HTML element IDs,
> > classes, attributes, etc.
> >
> > Regarding PascalCase, I agree that most people won't understand the
> > reasoning behind the name, but it is nevertheless a widely accepted term
> > for that case style. If an alternative is deemed necessary then
> > "ProperCase" might work, since that is also how English proper nouns are
> > cased. Understanding that name just depends on your knowledge of English
> > grammar.
> >
> > A spec can definitely be written for the 4 provided concrete
> > implementations. And... I may eat these words, but... the spec should not
> > be all that complex. I will take a stab at it.
> >
> > Thanks for the feedback!
> > Any other thoughts or comments are welcome!
> >
> > Dan
> >
> >
> > On Tue, Aug 8, 2023, 7:45 PM Elliotte Rusty Harold <elh...@ibiblio.org> wrote:
> >
> >> This is a good idea and seems like useful functionality. In order to
> >> accept it into commons, it needs solid documentation and excellent
> >> test coverage. I've worked on code like this in another language (not
> >> Java) and the production bugs were bad. E.g. what happens when a
> >> string contains numbers as well as letters?
> >>
> >> I'd like to see a full spec that unambiguously defines how every
> >> Unicode string is converted into camel/snake/kebab case. The spec
> >> should be independent of the code. That's not easy to write but it's
> >> essential.
> >>
> >> I don't want any loose/strict modes. It should all be strict according to
> >> spec.
> >>
> >> I've never heard of kebab case before. Is that a common name? I'd
> >> also like to rename Pascal case. How many programmers under 40 have
> >> even heard of Pascal, much less are familiar with its case
> >> conventions?
> >>
> >> Long story short - a PR is premature until there's an agreed-upon spec.
> >>
> >> On Tue, Aug 8, 2023 at 8:04 PM Daniel Watson <dcwatso...@gmail.com>
> >> wrote:
> >> >
> >> > I have a bit of code that adds the ability to parse and format strings
> >> > into various case patterns. Wanted to check if it's of worth and
> >> > in-scope for commons-text...
> >> >
> >> > It's a bit broader than the existing CaseUtils.toCamelCase(...) method.
> >> > Rather than simply formatting tokens into the case, this API adds the
> >> > additional goal of being able to transform one case to another, e.g.
> >> >
> >> > SnakeCase.format(PascalCase.parse("MyPascalString")); // returns My_Pascal_String
> >> > CamelCase.format(SnakeCase.parse("my_snake_string")); // returns mySnakeString
> >> > KebabCase.format(CamelCase.parse("myCamelString"));   // returns my-Camel-String
> >> > // Note that kebab and snake do not alter the alphabetic case of the
> >> > // tokens, as they are essentially case-agnostic joining, according to
> >> > // this implementation. Though this can be overridden by end users.
> >> >
> >> > The API has one core interface: Case, which has format and parse methods.
> >> > There is a single abstract implementation of it, AbstractConfigurableCase,
> >> > which is a configuration-driven way to create a case pattern. It has
> >> > enough options to accommodate the 4 popular cases, and thus the
> >> > subclasses just have to configure these options rather than implement
> >> > them directly. Any further extensions can override or extend the API as
> >> > necessary.
> >> >
> >> > There are five core concrete implementations:
> >> >
> >> > PascalCase
> >> > CamelCase (extends PascalCase)
> >> > DelimitedCase
> >> > KebabCase (extends DelimitedCase)
> >> > SnakeCase (extends DelimitedCase)
> >> >
> >> > Each has a static INSTANCE field to avoid redundant instantiation.
> >> >
> >> > Some of my reasoning / concerns...
> >> >
> >> > * I considered bundling all of this logic into static methods, similar
> >> > to CaseUtils, but that prevents the user from truly customizing or
> >> > extending the code for odd cases. This approach is, in my opinion, far
> >> > easier to understand, extend, and debug.
> >> > * I believe the parsing side should potentially have a loose / strict
> >> > mode, in that the logic can ignore non-critical rules on the parsing
> >> > side. e.g. the command CamelCase.parse("MyString") should work, even
> >> > though the input is not strictly camel case.
> >> > Strict parsing would ensure (if possible) that the input abides by all
> >> > elements of the format.
> >> > * I'm still unsure about how best to handle reserved characters when
> >> > translating, e.g. how should
> >> > KebabCase.format(PascalCase.parse("MyPascal-String")) handle the hyphen?
> >> > Should the kebab case strip the reserved character from the token
> >> > values?
> >> >
> >> > Long story short - is this worth pursuing in the form of a pull request
> >> > for review? Or is it out of scope for commons-text?
> >> >
> >> > Dan
> >>
> >>
> >>
> >> --
> >> Elliotte Rusty Harold
> >> elh...@ibiblio.org
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> >> For additional commands, e-mail: dev-h...@commons.apache.org
> >>
--
Elliotte Rusty Harold
elh...@ibiblio.org
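A strict sketch of the PascalCase and CamelCase rules from the spec, using the static INSTANCE fields from the original proposal. It rests on several assumptions that the thread does not specify:

* "alpha" means ASCII letters only.
* Any other character makes parse() throw IllegalArgumentException.
* format() normalizes each token by uppercasing its first character and lowercasing the rest.

The isValidFirstChar and adjustFirstToken hooks are hypothetical names chosen for illustration. They are not part of any existing API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: names mirror the thread, not a published commons-text API.
interface Case {
    String format(Iterable<String> tokens);

    List<String> parse(String string);
}

class PascalCase implements Case {
    static final PascalCase INSTANCE = new PascalCase();

    // Assumption: "alpha" means ASCII letters only.
    static boolean isUpper(char c) { return c >= 'A' && c <= 'Z'; }
    static boolean isLower(char c) { return c >= 'a' && c <= 'z'; }
    static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    // Rule for the first character of the whole string; CamelCase overrides it.
    boolean isValidFirstChar(char c) { return isUpper(c); }

    // Hook applied to the first formatted token; CamelCase overrides it.
    String adjustFirstToken(String token) { return token; }

    @Override
    public List<String> parse(String string) {
        if (string.isEmpty() || !isValidFirstChar(string.charAt(0))) {
            throw new IllegalArgumentException("Invalid first character in \"" + string + "\"");
        }
        List<String> tokens = new ArrayList<>();
        int start = 0;
        for (int i = 1; i < string.length(); i++) {
            char c = string.charAt(i);
            if (isUpper(c)) {
                // An uppercase letter closes the current token and starts the next one.
                tokens.add(string.substring(start, i));
                start = i;
            } else if (!isLower(c) && !isDigit(c)) {
                // Strict parsing (assumed): any other character is unpermitted.
                throw new IllegalArgumentException("Unpermitted character '" + c + "' in \"" + string + "\"");
            }
        }
        tokens.add(string.substring(start));
        return tokens;
    }

    @Override
    public String format(Iterable<String> tokens) {
        StringBuilder sb = new StringBuilder();
        boolean first = true;
        for (String token : tokens) {
            if (token.isEmpty()) {
                // Zero-length tokens would break the parse()/format() round trip.
                throw new IllegalArgumentException("Zero-length tokens are not allowed");
            }
            // Assumed normalization: uppercase the first character, lowercase the rest.
            String t = token.substring(0, 1).toUpperCase(Locale.ROOT)
                    + token.substring(1).toLowerCase(Locale.ROOT);
            for (int i = 0; i < t.length(); i++) {
                char c = t.charAt(i);
                boolean ok = (i == 0) ? isUpper(c) : (isLower(c) || isDigit(c));
                if (!ok) {
                    throw new IllegalArgumentException("Unpermitted character '" + c + "' in token \"" + token + "\"");
                }
            }
            sb.append(first ? adjustFirstToken(t) : t);
            first = false;
        }
        return sb.toString();
    }
}

class CamelCase extends PascalCase {
    // Hides PascalCase.INSTANCE, so CamelCase.INSTANCE is a CamelCase.
    static final CamelCase INSTANCE = new CamelCase();

    @Override
    boolean isValidFirstChar(char c) { return isLower(c); }

    @Override
    String adjustFirstToken(String token) {
        return token.substring(0, 1).toLowerCase(Locale.ROOT) + token.substring(1);
    }
}
```

Under these assumptions:

* CamelCase.INSTANCE.format(PascalCase.INSTANCE.parse("MyPascalString")) returns "myPascalString".
* PascalCase.INSTANCE.parse("Version2Beta") returns [Version2, Beta], which is one way to handle digits mixed with letters.
* PascalCase.INSTANCE.parse("MyPascal-String") throws. That is one possible answer to the hyphen question in the original proposal.
* CamelCase.INSTANCE.parse("MyString") also throws, because the sketch has no loose mode.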