What happens when a token contains an unpermitted character?
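
To make the question concrete, here is a minimal Java sketch of one possible answer for the delimited cases: format() rejects any token that contains the reserved delimiter. Everything in it is an assumption for illustration. The class name DelimitedCaseSketch and the choice to throw IllegalArgumentException are hypothetical and are not taken from the proposed code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch only; not the DelimitedCase class from the proposal. */
final class DelimitedCaseSketch {
    private final char delimiter;

    DelimitedCaseSketch(char delimiter) {
        this.delimiter = delimiter;
    }

    /** Splits on every delimiter and keeps zero-length tokens, as the spec describes. */
    List<String> parse(String string) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < string.length(); i++) {
            if (string.charAt(i) == delimiter) {
                tokens.add(string.substring(start, i));
                start = i + 1;
            }
        }
        tokens.add(string.substring(start));
        return tokens;
    }

    /** Assumed strict behavior: fail if a token contains the reserved delimiter. */
    String format(Iterable<String> tokens) {
        StringBuilder sb = new StringBuilder();
        boolean first = true;
        for (String token : tokens) {
            if (token.indexOf(delimiter) >= 0) {
                throw new IllegalArgumentException(
                        "Token contains reserved character '" + delimiter + "': " + token);
            }
            if (!first) {
                sb.append(delimiter);
            }
            sb.append(token);
            first = false;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        DelimitedCaseSketch kebab = new DelimitedCaseSketch('-');
        // Round trip on a valid string.
        System.out.println(kebab.format(kebab.parse("my-kebab-string")));
        try {
            // The second token contains the reserved hyphen.
            kebab.format(Arrays.asList("My", "Pascal-String"));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

If format() let the hyphen through instead, the tokens "My" and "Pascal-String" would become "My-Pascal-String", which parses back into three tokens rather than two. Whichever behavior is chosen, the spec needs to say so explicitly.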

On Wed, Aug 9, 2023 at 8:30 PM Daniel Watson <dcwatso...@gmail.com> wrote:
>
> Here's my stab at a spec. Wanted to clarify some parts of the Case
> interface first before jumping into the implementations. Wondering what a
> good package name for this stuff is, given that "case" is a reserved word?
>
> Case (interface)
> The Case interface defines two methods:
> * String format(Iterable<String> tokens)
> The format method accepts an Iterable of String tokens and returns a single
> String formatted according to the implementation. The format method is
> intended to handle transforming between cases, thus tokens passed to the
> format() method need not be properly formatted for the given Case instance,
> though they must still respect any reserved character restrictions.
> * List<String> parse(String string)
> The parse method accepts a single string and returns a List of string
> tokens that abide by the Case implementation.
> Note: the format() and parse() methods must be fully reciprocal, i.e. on a
> single Case instance, calling parse() with a valid string and passing the
> resulting tokens into format() should return a matching string.
>
> DelimitedCase (base class for kebab and snake)
> Defines a Case where all tokens are separated by a single character
> delimiter. The delimiter is considered a reserved character and is not
> allowed to appear within tokens when formatting. No further restrictions
> are placed on token contents by this base implementation. Tokens can
> contain any valid Java String character. DelimitedCases can support
> zero-length tokens, which can occur if there are no characters between two
> instances of the delimiter or if the parsed string begins or ends with the
> delimiter.
> Note: Other Case implementations may not support zero-length tokens, and
> attempts to call format(...) with empty tokens may fail.
>
> KebabCase
> Extends DelimitedCase and initializes the delimiter as the hyphen '-'
> character. This case allows only alphanumeric characters within tokens.
>
> SnakeCase
> Extends DelimitedCase and initializes the delimiter as the underscore '_'
> character. This case allows only alphanumeric characters within tokens.
>
> PascalCase
> Defines a Case where tokens begin with an uppercase alpha character. All
> subsequent token characters must be lowercase alpha or numeric characters.
> Whenever an uppercase alpha character is encountered, the previous token is
> considered complete and a new token begins, with the uppercase character
> being the first character of the new token. PascalCase does not allow
> zero-length tokens when formatting, as it would violate the reciprocal
> contract of format() and parse().
>
> CamelCase
> Extends PascalCase and adds one restriction: the first character of the
> first token (i.e. the first character of the full string) must be a
> lowercase alpha character, rather than the uppercase character that
> PascalCase requires. All other restrictions of PascalCase apply.
>
>
> On Tue, Aug 8, 2023 at 8:55 PM Daniel Watson <dcwatso...@gmail.com> wrote:
>
> > Kebab case is extremely common for web identifiers, e.g. HTML element ids,
> > classes, attributes, etc.
> >
> > With regard to PascalCase, I agree that most people won't understand the
> > reasoning behind the name, but it is nevertheless a widely accepted term
> > for that case style. If an alternative is deemed necessary then
> > "ProperCase" might work - since that is also how English proper nouns are
> > cased. Understanding that name just depends on your knowledge of English
> > grammar.
> >
> > A spec can definitely be written for the 4 provided concrete
> > implementations. And... I may eat these words but... the spec should not be
> > all that complex. I will take a stab at it.
> >
> > Thanks for the feedback!
> > Any other thoughts or comments are welcome!
> >
> > Dan
> >
> >
> > On Tue, Aug 8, 2023, 7:45 PM Elliotte Rusty Harold <elh...@ibiblio.org>
> > wrote:
> >
> >> This is a good idea and seems like useful functionality. In order to
> >> accept it into commons, it needs solid documentation and excellent
> >> test coverage. I've worked on code like this in another language (not
> >> Java) and the production bugs were bad. E.g. what happens when a
> >> string contains numbers as well as letters?
> >>
> >> I'd like to see a full spec that unambiguously defines how every
> >> Unicode string is converted into camel/snake/kebab case. The spec
> >> should be independent of the code. That's not easy to write but it's
> >> essential.
> >>
> >> I don't want any loose/strict modes. It should all be strict according to
> >> spec.
> >>
> >> I've never heard of kebab case before. Is that a common name? I'd
> >> also like to rename Pascal case. How many programmers under 40 have
> >> even heard of Pascal, much less are familiar with its case
> >> conventions?
> >>
> >> Long story short - a PR is premature until there's an agreed upon spec.
> >>
> >> On Tue, Aug 8, 2023 at 8:04 PM Daniel Watson <dcwatso...@gmail.com>
> >> wrote:
> >> >
> >> > I have a bit of code that adds the ability to parse and format strings
> >> > into various case patterns. Wanted to check if it's of worth and
> >> > in-scope for commons-text...
> >> >
> >> > It's a bit broader than the existing CaseUtils.toCamelCase(...) method.
> >> > Rather than simply formatting tokens into the case, this API adds the
> >> > additional goal of being able to transform one case to another, e.g.
> >> >
> >> > SnakeCase.format(PascalCase.parse("MyPascalString")); // returns My_Pascal_String
> >> > CamelCase.format(SnakeCase.parse("my_snake_string")); // returns mySnakeString
> >> > KebabCase.format(CamelCase.parse("myCamelString")); // returns my-Camel-String
> >> > // Note that kebab and snake do not alter the alphabetic case of the
> >> > // tokens, since in this implementation they are essentially
> >> > // case-agnostic joins. End users can override this.
> >> >
> >> > The API has one core interface: Case, which has format and parse
> >> > methods. There is a single abstract implementation of it,
> >> > AbstractConfigurableCase, which is a configuration-driven way to create
> >> > a case pattern. It has enough options to accommodate the 4 popular
> >> > cases, so the subclasses just have to configure these options rather
> >> > than implement them directly. Any further extensions can override or
> >> > extend the API as necessary.
> >> >
> >> > There are five core concrete implementations:
> >> >
> >> > PascalCase
> >> > CamelCase (extends PascalCase)
> >> > DelimitedCase
> >> > KebabCase (extends DelimitedCase)
> >> > SnakeCase (extends DelimitedCase)
> >> >
> >> > Each has a static INSTANCE field to avoid redundant instantiation.
> >> >
> >> > Some of my reasoning / concerns...
> >> >
> >> > * I considered bundling all of this logic into static methods, similar
> >> > to CaseUtils, but that prevents the user from truly customizing or
> >> > extending the code for odd cases. This approach is, in my opinion, far
> >> > easier to understand, extend, and debug.
> >> > * I believe the parsing side should potentially have a loose / strict
> >> > mode, in that the logic can ignore non-critical rules on the parsing
> >> > side, e.g. the command CamelCase.parse("MyString") should work, even
> >> > though the input is not strictly camel case. Strict parsing would
> >> > ensure (if possible) that the input abides by all elements of the
> >> > format.
> >> > * I'm still unsure about how best to handle reserved characters when
> >> > translating, e.g. how should
> >> > KebabCase.format(PascalCase.parse("MyPascal-String")) handle the hyphen?
> >> > Should the kebab case strip the reserved character from the token
> >> > values?
> >> >
> >> > Long story short - is this worth pursuing in the form of a pull request
> >> > for review? Or is it out of scope for commons-text?
> >> >
> >> > Dan
> >>
> >>
> >>
> >> --
> >> Elliotte Rusty Harold
> >> elh...@ibiblio.org
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> >> For additional commands, e-mail: dev-h...@commons.apache.org
> >>
> >>



-- 
Elliotte Rusty Harold
elh...@ibiblio.org

