> On 19 Nov 2016, at 15:38, Rob Tompkins <chtom...@gmail.com> wrote:
> 
> 
>> On Nov 19, 2016, at 6:33 AM, Benedikt Ritter <brit...@apache.org> wrote:
>> 
>> Hello Gary,
>> 
>> Gary Gregory <garydgreg...@gmail.com> schrieb am Sa., 19. Nov. 2016 um
>> 01:07 Uhr:
>> 
>>> Just a thought:
>>> 
>>> Does all the current (and future) string escaping code (XML, HTML, ...)
>>> really belong in [lang]? Would it be more natural to have it in [text]?
>>> 
>> 
>> My current view on the whole thing is that we put stuff that is related
>> to strings in Lang. Code that works on texts should go to Text. To me a
>> text is more than just a string. A text contains words, which make up
>> sentences, which in turn build paragraphs.
>> 
>> Using this description, I'd argue that escaping belongs in lang and not
>> in text, because it works on individual characters rather than on texts.
> 
> I think this is a difficult distinction to draw because fundamentally 
> anything that does sufficient text processing necessarily operates on a 
> character-by-character basis. Below I propose a distinction more along the 
> lines of potential usage.
> 
>> 
>> But this would also raise the question of whether the various edit distance
>> algorithms work on texts or on strings. So maybe my distinction is not
>> good at all.
>> 
>> Do we need to better specify the scope of text?
> 
> I definitely agree with the sentiment that we should find a clear line of 
> distinction between lang and text with regard to strings. The thoughts that 
> spring to mind are more in terms of how the algorithms are to be used. 
> 
> So let’s consider the two extremes of the spectrum of string/word/text 
> algorithms. On one hand, we have utilities like “StringUtils.isBlank(String 
> s)”, which is used ubiquitously in day-to-day code and is a foundational 
> extension of Java. On the other hand, we have algorithms like natural 
> language processing or statistical processing of words for analysis of 
> biological sequences (two chapters in M. Lothaire’s “Applied Combinatorics on 
> Words”). The first extreme points toward day-to-day usage in any variety of 
> Java applications, whereas the other points to an application designed 
> specifically for string/word/text processing. I don’t see folks in everyday 
> usage wanting to find the edit distance between two strings unless they’re 
> writing something that specifically does text processing or something of 
> that nature.
> 
> Now clearly the problem with this distinction is the amount of grey area that 
> it leaves in figuring out what goes where, so I don’t know if it’s the right 
> way to go. It was just the thought that came to mind.
> 
> Any thoughts out there?

I think you're on the right track here. Lang is supposed to plug the gaps in 
Java's core packages. A certain amount of text manipulation is expected in many 
applications, but once we get into the realms of statistical analysis or fuzzy 
comparison methods then we've moved beyond that. 

Perhaps a tongue-in-cheek definition would be: "if you had to consult a book 
to write that, it belongs in Text". 
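
As a rough illustration of the two ends of that spectrum, here is a minimal
Java sketch. The class name ScopeSketch and both method bodies are
hypothetical, written only for this comparison; they are not the actual
[lang] or [text] implementations. The first method is the kind of gap-plugging
helper most applications need, while the second (a textbook Levenshtein edit
distance) is the kind of thing one would normally look up before writing.

```java
final class ScopeSketch {

    // Everyday helper (hypothetical re-implementation, not the [lang] source):
    // true if the string is null, empty, or contains only whitespace.
    public static boolean isBlank(String s) {
        if (s == null) {
            return true;
        }
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isWhitespace(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // "Consult a book" algorithm: Levenshtein edit distance computed with
    // the standard dynamic-programming recurrence, keeping only two rows.
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // deleting the first i characters of a
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // Reuse the two arrays instead of allocating a new row each time.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(isBlank("   "));
        System.out.println(levenshtein("kitten", "sitting"));
    }
}
```

The first method needs nothing beyond the JDK and a loop; the second encodes a
specific recurrence, which is roughly where the "consult a book" line falls.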

Duncan

> 
> Cheers,
> -Rob
> 
>> 
>> Benedikt
>> 
>> 
>>> 
>>> Gary
>>> 
>>> --
>>> E-Mail: garydgreg...@gmail.com | ggreg...@apache.org
>>> Blog: http://garygregory.wordpress.com
>>> Home: http://garygregory.com/
>>> Tweet! http://twitter.com/GaryGregory
>>> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
> 
