On 2 May 2017 at 21:31, Steven D'Aprano <st...@pearwood.info> wrote:
> On Mon, May 01, 2017 at 11:38:20PM +1000, Nick Coghlan wrote:
>> However, a much simpler alternative would be to just support two
>> keyword arguments to hex(): "delimiter" (as you suggest) and
>> "chunk_size" (defaulting to 1, so you get per-byte chunking by
>> default)
>
> I disagree with this approach. There's nothing special about bytes.hex()
> here, perhaps we want to format the output of hex() or bin() or oct(),
> or for that matter "%x" and any of the other string templates?
>
> In fact, this is a string operation that could apply to any character
> string, including decimal digits.
>
> Rather than duplicate the API and logic everywhere, I suggest we add a
> new string method. My suggestion is str.chunk(size, delimiter=' ') and
> str.rchunk() with the same arguments:
>
> "1234ABCDEF".chunk(4)
> => returns "1234 ABCD EF"
>
> rchunk will be useful for money or other situations where we group from
> the right rather than from the left:
>
> "$" + str(10**6).rchunk(3, ',')
> => returns "$1,000,000"
Nice. That proposal also addresses one of the problems I raised in the
issue tracker: the decimal equivalent of hex/oct/bin is just str, so
anything based on keyword arguments to the display functions is hard to
apply to ordinary decimal numbers.

Attempting to align the terminology with existing string methods and
other stdlib APIs:

1. the programming FAQ uses "chunks" as the accumulation variable prior
   to calling str.join():
   https://docs.python.org/3/faq/programming.html#what-is-the-most-efficient-way-to-concatenate-many-strings-together
2. the most analogous itertools recipe is the "grouper" recipe, which
   describes its purpose as "Collect data into fixed-length chunks or
   blocks"
3. there's a top level "chunk" module for working with audio file
   formats (today-I-learned...)
4. multiprocessing uses "chunksize" to manage the dispatching of work
   to worker processes
5. various networking, IO and serialisation libraries use "chunk" to
   describe data blocks for incremental reads and writes

I think a couple of key problems are illustrated by that survey:

1. we don't have any current APIs or documentation that use "chunk" in
   combination with any kind of delimiter
2. we don't have any current APIs or documentation that use "chunk" as
   a verb - they all use it as a noun

So if we went with this approach, then Carl Smith's suggestion of
"str.delimit()" likely makes sense.
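For the sake of discussion, a pure-Python sketch of what that hypothetical
str.delimit() could do (the name comes from Carl's suggestion; the
signature and default separator here are my assumptions, not an existing
API):

```python
def delimit(text, size, sep=" "):
    """Join fixed-size chunks of text with sep, grouping from the left.

    Hypothetical stand-in for the proposed str.delimit() method.
    """
    return sep.join(text[i:i + size] for i in range(0, len(text), size))

print(delimit("1234ABCDEF", 4))  # 1234 ABCD EF
```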
However, the other question worth asking is whether we might want a
"string slice splitting" operation rather than a string delimiting
option: once you have the slices, combining them again with str.join is
straightforward, but extracting the slices in the first place is
currently a little fiddly (especially for the reversed case):

    def splitslices(self, size):
        return [self[start:start+size]
                for start in range(0, len(self), size)]

    def rsplitslices(self, size):
        # Length of the leftmost (possibly short) block
        head = len(self) % size or size
        blocks = [self[:head]]
        blocks.extend(self[start:start+size]
                      for start in range(head, len(self), size))
        return blocks

Given those methods, the split-and-rejoin use case that started the
thread would look like:

    " ".join("1234ABCDEF".splitslices(4))
    => "1234 ABCD EF"

    "$" + ",".join(str(10**6).rsplitslices(3))
    => "$1,000,000"

Which is the same pattern that can be used to change a delimiter with
str.split() and str.splitlines().

Cheers,
Nick.

-- 
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
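For anyone who wants to experiment without patching str, here is a
runnable module-level adaptation of those two sketches (taking the
string as an explicit first argument is my change; these are not
existing methods):

```python
def splitslices(s, size):
    """Slice s into size-character blocks, grouping from the left."""
    return [s[start:start + size] for start in range(0, len(s), size)]


def rsplitslices(s, size):
    """Slice s into size-character blocks, grouping from the right,
    so any short block ends up at the front."""
    head = len(s) % size or size  # length of the leftmost block
    return [s[:head]] + [s[start:start + size]
                         for start in range(head, len(s), size)]


print(" ".join(splitslices("1234ABCDEF", 4)))       # 1234 ABCD EF
print("$" + ",".join(rsplitslices(str(10**6), 3)))  # $1,000,000
```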