On Tue, May 02, 2017 at 11:45:35PM +1000, Nick Coghlan wrote:

> Attempting to align the terminology with existing string methods and
> other stdlib APIs:
[...]
> 1. we don't have any current APIs or documentation that use "chunk" in
> combination with any kind of delimiter
> 2. we don't have any current APIs or documentation that use "chunk" as
> a verb - they all use it as a noun

English has a long and glorious tradition of verbing nouns, and nouning
verbs. Group can mean the action of putting things into a group; join
likewise refers to both the action of attaching two things and the seam
or joint where they have been joined. Likewise for chunking:

https://duckduckgo.com/html/?q=chunking

"Chunk" has been used as a verb since at least 1890 (albeit with a different
meaning). None of my dictionaries give a date for the use of chunking to 
mean dividing something up into chunks, so that could be quite recent, 
but it's well-established in education (chunking as a technique for 
doing long division), psychology, linguistics and more. I remember using 
"chunking" as a verb to describe Hyperscript's text handling back in the 
mid 1980s, e.g. "word 2 of line 6 of text".

The nltk library handles chunk as both a noun and verb in a similar 
sense:

http://www.nltk.org/howto/chunk.html


> So if we went with this approach, then Carl Smith's suggestion of
> "str.delimit()" likely makes sense.

The problem with "delimit" is that in many contexts it refers to 
marking both the start and end boundaries, e.g. people often refer to 
string delimiters '...' and list delimiters [...]. That doesn't apply 
here, where we're adding separators between chunks/groups.

The term delimiter can be used in various ways, and some of them do not 
match the behaviour we want here:

http://stackoverflow.com/questions/9118769/when-to-use-the-terms-delimiter-terminator-and-separator

In this case, we are not adding delimiters, we're adding separators. 
We're chunking (or grouping) characters by counting them, then 
separating the groups. The test here is: what happens if the string is
shorter than the group size?

"xyz".chunk(5, '*')

If we're delimiting the boundaries of the group, then I expect that we 
should get "*xyz*", but if we're separating groups, I expect that we 
should get "xyz" unchanged.
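
To make the difference concrete, here is a rough sketch in Python. The
names chunk_separate and chunk_delimit are made up for illustration;
neither is an existing or proposed str method, and the "delimiting"
version is only one possible reading of what marking group boundaries
could mean (here, adjacent groups share a boundary mark).

```python
def _groups(s, size):
    """Split s into consecutive slices of at most `size` characters."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [s[i:i + size] for i in range(0, len(s), size)]


def chunk_separate(s, size, sep):
    """Hypothetical helper: put sep *between* groups only."""
    return sep.join(_groups(s, size))


def chunk_delimit(s, size, mark):
    """Hypothetical helper: mark the start and end boundary of every
    group, with neighbouring groups sharing a single mark."""
    return mark + mark.join(_groups(s, size)) + mark


print(chunk_separate("xyz", 5, "*"))       # xyz
print(chunk_delimit("xyz", 5, "*"))        # *xyz*
print(chunk_separate("abcdefgh", 3, "*"))  # abc*def*gh
print(chunk_delimit("abcdefgh", 3, "*"))   # *abc*def*gh*
```

With separator semantics a string shorter than the group size comes
back untouched, which is the behaviour I would expect from the method
under discussion.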


> However, the other question worth asking is whether we might want a
> "string slice splitting" operation rather than a string delimiting
> option: once you have the slices, then combining them again with
> str.join is straightforward, but extracting the slices in the first
> place is currently a little fiddly (especially for the reversed case):

Let me think about that :-)



-- 
Steve
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
