Re: [Python-ideas] Add an option for delimiters in bytes.hex()

2017-05-03 Thread Erik

On 04/05/17 01:24, Steven D'Aprano wrote:

> On Thu, May 04, 2017 at 12:13:25AM +0100, Erik wrote:
>
>> I had a use-case where splitting an iterable into a sequence of
>> same-sized chunks efficiently improved the performance of my code
>
> [...]
>
>> So I didn't propose it. I have no idea now what I spent my saved hours
>> doing, but I imagine that it was fun
>
>> Summary: I didn't present the argument because I'm not a masochist
>
> I'm not sure what the point of that anecdote was, unless it was "I wrote
> some useful code, and you missed out".


Then you have misunderstood me. Paul suggested that my use-case 
(chunking could be faster) was perhaps enough to propose that my patch 
may be considered. I responded with historical/empirical evidence that 
perhaps that would actually not be the case.


I was responding, honestly, to the questions raised by Paul's email.


> Your comments come across as a passive-aggressive chastisement of the
> core devs and the Python-Ideas community for being too quick to reject
> useful code: we missed out on something good, because you don't have the
> time or energy to deal with our negativity and knee-jerk rejection of
> everything good. That's the way your series of posts come across to me.


I apologise if my words or my turn of phrase do not appeal to you. I am 
trying to be constructive with everything I post.


If you choose to interpret my messages in a different way then I'm not 
sure what I can do about that.


Back to the important stuff though:


> - you could have offered it to the moreitertools project;


A more efficient version of more_itertools.chunked() is what we're
talking about.



> - you could have published it on PyPy;


Does PyPy support C extension modules? If so, that's a possibility.


> - you could have proposed it on Python-Ideas with an explicit statement


I may well do that - my current patch (because of when I did it) is 
against a Py2 codebase, but I could port it to Py3. I still have a 
nagging doubt that I'd be wasting my time though ;)




> If you care so little that you can't be bothered even to propose it,
> why do you care if it is rejected?


You are mistaking not caring enough about the functionality with not 
caring enough to enter into an argument about including that 
functionality ...


I didn't propose it at the time for the reasons I mentioned. But when I
saw something being discussed yet again for which I already had a
general solution written, I thought I'd mention it in case it was
useful. As I said, I'm _trying_ to be constructive.


E.
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Add an option for delimiters in bytes.hex()

2017-05-03 Thread Steven D'Aprano
On Thu, May 04, 2017 at 12:13:25AM +0100, Erik wrote:
> I had a use-case where splitting an iterable into a sequence of 
> same-sized chunks efficiently improved the performance of my code 
[...]
> So I didn't propose it. I have no idea now what I spent my saved hours 
> doing, but I imagine that it was fun

> Summary: I didn't present the argument because I'm not a masochist

I'm not sure what the point of that anecdote was, unless it was "I wrote 
some useful code, and you missed out".

Your comments come across as a passive-aggressive chastisement of the 
core devs and the Python-Ideas community for being too quick to reject 
useful code: we missed out on something good, because you don't have the 
time or energy to deal with our negativity and knee-jerk rejection of 
everything good. That's the way your series of posts come across to me.

Not every piece of useful code has to go into the std lib, and even if 
it should, it doesn't necessarily have to go into it from day 1. If you 
wanted to give back to the community, there are a number of options 
apart from "std lib or nothing":

- you could have offered it to the moreitertools project;

- you could have published it on PyPy;

- you could have proposed it on Python-Ideas with an explicit statement 
that you didn't have the time or energy to get into a debate about 
including the function, "here's my implementation and an appropriate 
licence for you to use it: use it yourself, or if somebody else wants 
to champion putting it into the std lib, go right ahead, but I won't";

and possibly more. 

I'm not suggesting that you have any obligation to do any of these 
things, but you don't *have* to get into a long-winded, energy-sapping 
debate over inclusion unless you *really* care about having it added. If 
you care so little that you can't be bothered even to propose it, why do 
you care if it is rejected?



-- 
Steve


Re: [Python-ideas] Add an option for delimiters in bytes.hex()

2017-05-03 Thread Erik

Hi Paul,

On 03/05/17 08:57, Paul Moore wrote:
> On 3 May 2017 at 02:48, Erik  wrote:
>> Anyway, I know you can't stop anyone from *proposing* something like this,
>> but as soon as they do you may decide to quote the recipe from
>> "https://docs.python.org/3/library/functions.html#zip" and try to block
>> their proposition. There are already threads on fora that do that.
>>
>> That was my sticking point at the time when I implemented a general
>> solution. Why bother to propose something that (although it made my code
>> significantly faster) had already been blocked as being something that
>> should be a python-level operation and not something to be included in a
>> built-in?
>
> It sounds like you have a reasonable response to the suggestion of
> using zip - that you have a use case where performance matters, and
> your proposed solution is of value in that case.

I don't think so, though.

I had a use-case where splitting an iterable into a sequence of 
same-sized chunks efficiently improved the performance of my code 
significantly (processing a LOT of 24-bit, multi-channel - 16 to 32 - 
PCM streams from a WAV file).


Having thought "I need to split this stream by a fixed number of bytes" 
and then found more_itertools.chunked() (and the 
zip_longest(*([iter(foo)] * num)) trick) it turned out they were not 
quick enough so I implemented itertools.chunked() in C.
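A minimal sketch of the two approaches Erik mentions, applied to a byte buffer. It assumes interleaved 24-bit PCM, so one frame is 3 bytes per channel; the function names and the frame layout are illustrative assumptions, and nothing here measures speed. Note the difference in what comes out: the zip_longest trick yields tuples of ints (padded with None), while slicing a memoryview yields zero-copy views of the original bytes.

```python
from itertools import zip_longest

BYTES_PER_SAMPLE = 3  # assumption: 24-bit PCM samples


def frames_via_zip(data: bytes, frame_size: int):
    """The zip_longest(*([iter(data)] * n)) trick.

    All n arguments share one iterator, so each output tuple takes the
    next n items. For bytes the items are ints, and a short final frame
    is padded with None.
    """
    return zip_longest(*([iter(data)] * frame_size))


def frames_via_slicing(data: bytes, frame_size: int):
    """Slice a memoryview so each frame is a zero-copy view of the buffer.

    A short final frame is simply shorter; nothing is padded.
    """
    view = memoryview(data)
    return [view[i:i + frame_size] for i in range(0, len(view), frame_size)]


channels = 2
frame_size = BYTES_PER_SAMPLE * channels  # 6 bytes per frame
data = bytes(range(12))                   # two frames of dummy sample bytes
print(list(frames_via_zip(data, frame_size)))
print([bytes(f) for f in frames_via_slicing(data, frame_size)])
```

With two channels the frame size is 6, so both functions return two frames for 12 bytes of input.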


That worked well for me, so when I was done I did a search in case it 
was worth proposing as an enhancement to feed it back to the community. 
Then I came across things such as the following:


http://bugs.python.org/issue6021

I am specifically referring to the "It has been rejected before" 
comment, also mentioned here:


https://mail.python.org/pipermail/python-dev/2012-July/120885.html

See this entire thread, too:

https://mail.python.org/pipermail/python-ideas/2012-July/015671.html

This is the reason why I really just didn't care enough to go through 
the process of proposing it in the end (even though the 
more_itertools.chunked function was one of the first 3 implemented in 
V1.0 and seems to _still_ be cropping up all the time in different 
guises - so is perhaps more fundamental than people recognise).


The strong implication of the discussions linked to above is that, because 
it had been proposed before, it would be immediately rejected, and that 
position was supported by several members of the community in good standing.


So I didn't propose it. I have no idea now what I spent my saved hours 
doing, but I imagine that it was fun


> Whether it's a
> *sufficient* response remains to be seen, but unless you present the
> argument we won't know.

Summary: I didn't present the argument because I'm not a masochist

Regards, E.



Re: [Python-ideas] Add an option for delimiters in bytes.hex()

2017-05-03 Thread Nick Coghlan
On 3 May 2017 at 08:10, Greg Ewing  wrote:
> For a name, I think "group" would be better than "chunk".
> We talk about grouping the digits of a number, not chunking
> them.

As soon as I added an intermediate variable to my example, I came to
the same conclusion:

>>> digit_groups = b'\xb9\x01\xef'.hex().splitgroups(2)
>>> ' '.join(digit_groups)
'b9 01 ef'

(from http://bugs.python.org/issue22385#msg292900)

And for David's telephone number examples:

>>> digit_groups = str(4135559414).rsplitgroups(4,3)
>>> '-'.join(digit_groups)
'413-555-9414'

>>> digit_groups = "0113225551212".rsplitgroups(2,2,3,1,2,3)
>>> '-'.join(digit_groups)
'011-32-2-555-12-12'

Another example would be generating numeric literals with underscores:

>>> digit_groups = str(int(1e6)).rsplitgroups(3)
>>> '_'.join(digit_groups)
'1_000_000'
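Since methods can't be added to str from Python code, here is a sketch of Nick's proposed splitgroups() and rsplitgroups() as plain functions. The names come from the message above, but the rule that the last size repeats for whatever remains is an assumption (it is what makes rsplitgroups(4,3) and rsplitgroups(3) produce the outputs shown).

```python
from itertools import chain, repeat


def _sizes(sizes):
    # The given sizes in order, then the last one repeated indefinitely.
    if not sizes or any(size < 1 for size in sizes):
        raise ValueError("group sizes must be positive integers")
    return chain(sizes[:-1], repeat(sizes[-1]))


def splitgroups(s, *sizes):
    """Split s into groups from the left (hypothetical str.splitgroups)."""
    groups, start = [], 0
    for size in _sizes(sizes):
        if start >= len(s):
            break
        groups.append(s[start:start + size])
        start += size
    return groups


def rsplitgroups(s, *sizes):
    """Split s into groups from the right (hypothetical str.rsplitgroups)."""
    groups, end = [], len(s)
    for size in _sizes(sizes):
        if end <= 0:
            break
        groups.append(s[max(end - size, 0):end])
        end -= size
    groups.reverse()  # collected right to left; restore reading order
    return groups
```

With these, '-'.join(rsplitgroups(str(4135559414), 4, 3)) gives '413-555-9414', and the other examples work the same way with a function call in place of the method.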

While a generalised reversed version wouldn't be possible, a
corresponding "itertools.itergroups" function could be used to produce
groups of defined lengths as islice iterators, similar to the way
itertools.groupby works (i.e. producing subiterators of variable
length rather than a fixed length tuple the way the grouper() recipe
in the docs does).
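One way the itertools.itergroups idea might look, as a sketch: the name is Nick's, but the signature (sizes as positional arguments, last one repeating) is an assumption. Each group is a lazy islice over a shared iterator, so, as with itertools.groupby, a group's items are only available until the next group is requested.

```python
from collections import deque
from itertools import chain, islice, repeat


def itergroups(iterable, *sizes):
    """Yield successive groups of iterable as lazy sub-iterators.

    Hypothetical API: the sizes apply in order and the last one repeats.
    """
    if not sizes or any(size < 1 for size in sizes):
        raise ValueError("group sizes must be positive integers")
    it = iter(iterable)
    for size in chain(sizes[:-1], repeat(sizes[-1])):
        try:
            first = next(it)  # stop cleanly once the input is exhausted
        except StopIteration:
            return
        group = chain((first,), islice(it, size - 1))
        yield group
        # Discard whatever the caller left unconsumed, as groupby does.
        deque(group, maxlen=0)
```

For example, [list(g) for g in itergroups("abcdefg", 2, 3)] gives [['a', 'b'], ['c', 'd', 'e'], ['f', 'g']].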

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-ideas] Add an option for delimiters in bytes.hex()

2017-05-03 Thread Steven D'Aprano
On Wed, May 03, 2017 at 02:48:03AM +0100, Erik wrote:
> On 03/05/17 01:43, Steven D'Aprano wrote:

> >I'm not stopping anyone from proposing a generalisation of this that
> >works with other sequence types. As somebody did :-)
> 
> Who? I didn't spot that in the thread - please give a reference. Thanks.

https://mail.python.org/pipermail/python-ideas/2017-May/045568.html

[...]
> Knowing which sequence classes have a "chunk" method and which don't is 
> a higher barrier than knowing that all sequences can be "chunked" by a 
> single imported function.

At the moment, we're only talking about strings. That's the only actual 
use-case that has been presented so far. Everything else is at best Nice 
To Have, if not YAGNI.

Let's not kill this idea by over-generalising it. We can always extend 
the idea in the future once it is proven. Or for those who really want a 
general purpose group-any-iterable function, it can start life as a 
third party module, and we can discuss adding it to the language when 
it is mature and the kinks are ironed out.



-- 
Steve


Re: [Python-ideas] Add an option for delimiters in bytes.hex()

2017-05-03 Thread Steven D'Aprano
On Tue, May 02, 2017 at 09:07:41PM -0400, Juancarlo Añez wrote:
> On Tue, May 2, 2017 at 8:43 PM, Steven D'Aprano  wrote:
> 
> > String methods should return strings.
> >
> 
> >>> "A-B-C".split("-")
> ['A', 'B', 'C']

Yes, thank you. And don't forget:

py> 'abcd'.index('c')
2


But in context, I was responding to the question of why this proposed 
chunk()/group() method returns a string rather than an iterator. I 
worded my answer badly, but the intention was clear, at least in my own 
head *wink*

Given that we were discussing a method that both groups the characters 
of a string and inserts the separators, it makes sense to return a 
string, like other string methods:

'foo'.upper() returns 'FOO', not iter(['F', 'O', 'O'])

'cheese and green eggs'.replace('green', 'red') returns a string, 
not iter(['cheese and ', 'red', ' eggs'])

'xyz'.zfill(5) returns '00xyz' not iter(['00', 'xyz'])

etc, and likewise:

'abcdef'.chunk(2, sep='-') should return 'ab-cd-ef' rather than
iter(['ab', '-', 'cd', '-', 'ef'])
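A minimal sketch of the string-returning behaviour described here, written as plain functions because str can't be extended from Python. It assumes a single integer size and puts separators only between groups; chunk and rchunk are the proposal's hypothetical names, not existing APIs.

```python
def chunk(s: str, size: int, sep: str = " ") -> str:
    """Group s from the left and join the groups with sep (hypothetical)."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    return sep.join(s[i:i + size] for i in range(0, len(s), size))


def rchunk(s: str, size: int, sep: str = " ") -> str:
    """Group s from the right, so any short group ends up at the front."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    head = len(s) % size  # length of the leading short group (0 if none)
    groups = [s[:head]] if head else []
    groups += [s[i:i + size] for i in range(head, len(s), size)]
    return sep.join(groups)
```

Here chunk('abcdef', 2, '-') returns 'ab-cd-ef', and a string shorter than the group size comes back unchanged.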

If we're talking about a different API, one where only the grouping is 
done and inserting separators is left for join(), then my answer will be 
different. In that case, then it is a matter of taste whether to return 
a list (like split()) or an iterator. I lean slightly towards returning 
a list, but I can see arguments for and against both.


-- 
Steve


Re: [Python-ideas] Add an option for delimiters in bytes.hex()

2017-05-03 Thread Paul Moore
On 3 May 2017 at 02:48, Erik  wrote:
> Anyway, I know you can't stop anyone from *proposing* something like this,
> but as soon as they do you may decide to quote the recipe from
> "https://docs.python.org/3/library/functions.html#zip" and try to block
> their proposition. There are already threads on fora that do that.
>
> That was my sticking point at the time when I implemented a general
> solution. Why bother to propose something that (although it made my code
> significantly faster) had already been blocked as being something that
> should be a python-level operation and not something to be included in a
> built-in?

It sounds like you have a reasonable response to the suggestion of
using zip - that you have a use case where performance matters, and
your proposed solution is of value in that case. Whether it's a
*sufficient* response remains to be seen, but unless you present the
argument we won't know.

IMO, the idea behind itertools being building blocks is not to deter
proposals for new tools, but to make sure that people focus on
providing important low-level tools, and not on high level operations
that can just as easily be written using those tools - essentially the
guideline "not every 3-line function needs to be a builtin". So it's
to make people think, not to block innovation.

Hope this clarifies,
Paul


Re: [Python-ideas] Add an option for delimiters in bytes.hex()

2017-05-03 Thread Greg Ewing

Steven D'Aprano wrote:
> I've also been thinking about generalisations such as grouping lines 
> into paragraphs, words into lines, etc. 


You're probably going to want considerably more complicated
algorithms for that kind of thing, though. Let's keep it
simple.

--
Greg


Re: [Python-ideas] Add an option for delimiters in bytes.hex()

2017-05-02 Thread Juancarlo Añez
On Tue, May 2, 2017 at 8:43 PM, Steven D'Aprano  wrote:

> String methods should return strings.
>

>>> "A-B-C".split("-")
['A', 'B', 'C']


If chunk() worked for all iterables:

>>> " ".join("1234ABCDEF".chunk(4))
"1234 ABCD EF"


Cheers,


-- 
Juancarlo *Añez*


Re: [Python-ideas] Add an option for delimiters in bytes.hex()

2017-05-02 Thread Steven D'Aprano
On Tue, May 02, 2017 at 11:39:48PM +0100, Erik wrote:
> On 02/05/17 12:31, Steven D'Aprano wrote:

> >Rather than duplicate the API and logic everywhere, I suggest we add a
> >new string method. My suggestion is str.chunk(size, delimiter=' ') and
> >str.rchunk() with the same arguments:

For the record, I now think the second argument should be called "sep", 
for separator, and I'm okay with Greg's suggestion we call the method
"group". 


> >"1234ABCDEF".chunk(4)
> >=> returns "1234 ABCD EF"
[...]

> Why do you want to limit it to strings?

I'm not stopping anyone from proposing a generalisation of this that 
works with other sequence types. As somebody did :-)

I've also been thinking about generalisations such as grouping lines 
into paragraphs, words into lines, etc. In text processing, chunking can 
refer to more than just characters.

But here we have a specific, concrete use-case that involves strings. 
Anything else is YAGNI until a need is demonstrated :-)

> Isn't something like this 
> potentially useful for all sequences (where the result is a tuple of 
> objects that are the same as the source sequence - be that strings or 
> lists or lazy ranges or whatever?). Why aren't the chunks returned via 
> an iterator?

String methods should return strings.

That's not to argue against a generic iterator solution, but the barrier 
to use of an iterator solution is higher than just calling a method. You 
have to learn about importing, you need to know there is an itertools 
module (or a third party module to install first!), you have to know how 
to convert the iterator back to a string...
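For comparison, this is roughly what the iterator route involves today for the "1234ABCDEF" example, using the zip_longest grouping idiom (one possible choice among several): an import, an iterator trick, and two joins to get back to a string.

```python
from itertools import zip_longest

s = "1234ABCDEF"
# The same iterator repeated 4 times: zip_longest pulls 4 characters per tuple,
# padding the last tuple with empty strings.
groups = zip_longest(*([iter(s)] * 4), fillvalue="")
# Each group is a tuple of characters, so it must be joined back into a string.
result = " ".join("".join(group) for group in groups)
print(result)  # 1234 ABCD EF
```

On Python 3.12 and later, itertools.batched(s, 4) produces the groups without needing a fill value, but the final joins are still required.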


-- 
Steve


Re: [Python-ideas] Add an option for delimiters in bytes.hex()

2017-05-02 Thread Steven D'Aprano
On Tue, May 02, 2017 at 10:48:08AM -0700, David Mertz wrote:

> Maybe your API is for any length tuple, with the final element repeated.
> So I guess maybe this example could be:
> 
> "0113225551212".rchunk((2,2,3,1,2,3),'-')

That's what I meant.


-- 
Steve


Re: [Python-ideas] Add an option for delimiters in bytes.hex()

2017-05-02 Thread Erik

On 02/05/17 12:31, Steven D'Aprano wrote:

> I disagree with this approach. There's nothing special about bytes.hex()
> here, perhaps we want to format the output of hex() or bin() or oct(),
> or for that matter "%x" and any of the other string templates?
>
> In fact, this is a string operation that could apply to any character
> string, including decimal digits.
>
> Rather than duplicate the API and logic everywhere, I suggest we add a
> new string method. My suggestion is str.chunk(size, delimiter=' ') and
> str.rchunk() with the same arguments:
>
> "1234ABCDEF".chunk(4)
> => returns "1234 ABCD EF"


FWIW, I implemented a version of something similar as a fixed-length 
"chunk" method in itertoolsmodule.c (it was similar to izip_longest - it 
had a "fill" keyword to pad the final chunk). It was ~100 LOC including 
the structure definitions. The chunk method was an iterator (so it 
returned a sequence of "chunks" as defined by the API).
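Erik's C code isn't included in the thread, so the following is only a pure-Python sketch of the behaviour he describes: an iterator over fixed-length chunks with a "fill" keyword for padding the final one. The sentinel default (no padding unless fill is given) is an assumption.

```python
from itertools import islice

_NO_FILL = object()  # sentinel: distinguishes "no fill given" from fill=None


def chunk(iterable, size, *, fill=_NO_FILL):
    """Yield tuples of `size` items from iterable (hypothetical API).

    If fill is given, the final short tuple is padded with it, as
    izip_longest would; otherwise the final tuple is simply shorter.
    """
    if size < 1:
        raise ValueError("size must be a positive integer")
    it = iter(iterable)
    while True:
        block = tuple(islice(it, size))
        if not block:
            return
        if fill is not _NO_FILL and len(block) < size:
            block += (fill,) * (size - len(block))
        yield block
```

So list(chunk(range(7), 3)) ends with (6,), while passing fill=None ends with (6, None, None).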


Then I read that "itertools" should consist of primitives only and that 
we should defer to "moreitertools" for anything that is of a higher 
level (which this is - it can be done in terms of itertools functions). 
So I didn't propose it, although the processing of my WAV files (in 
which the sample data are groups of bytes - frames - of a fixed length) 
was significantly faster with it :(


I also looked at implementing itertools.chunk as a function that would 
make use of a "__chunk__" method on the source object if it existed 
(which allowed a class to support an even more efficient version of 
chunking - things like range() etc).
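The __chunk__ dispatch idea can be sketched like this. The __chunk__ name comes from Erik's description; its signature, the toy Span class, and the tuple-based fallback are assumptions for illustration. The lookup goes through the type, as it does for Python's own implicit special-method calls.

```python
from itertools import islice


def chunk(obj, size):
    """Hypothetical itertools.chunk that prefers an object's own __chunk__."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    special = getattr(type(obj), "__chunk__", None)
    if special is not None:
        return special(obj, size)  # the type knows a cheaper way
    return _generic_chunk(obj, size)


def _generic_chunk(iterable, size):
    # Fallback: pull `size` items at a time into tuples.
    it = iter(iterable)
    while block := tuple(islice(it, size)):
        yield block


class Span:
    """A toy range-like type whose chunks are computed, not materialised."""

    def __init__(self, start, stop):
        self.start, self.stop = start, stop

    def __iter__(self):
        return iter(range(self.start, self.stop))

    def __chunk__(self, size):
        # Each chunk is itself a lazy range, so no items are copied.
        return (range(i, min(i + size, self.stop))
                for i in range(self.start, self.stop, size))
```

Here list(chunk(Span(0, 7), 3)) yields three ranges, while a plain list falls back to tuples.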



> I don't see any advantage to adding this to bytes.hex(), hex(), oct(),
> bin(), and I really don't think it is helpful to be grouping the
> characters by the number of bits. It's a string formatting operation, not
> a bit operation.


Why do you want to limit it to strings? Isn't something like this 
potentially useful for all sequences (where the result is a tuple of 
objects that are the same as the source sequence - be that strings or 
lists or lazy ranges or whatever?). Why aren't the chunks returned via 
an iterator?


E.


Re: [Python-ideas] Add an option for delimiters in bytes.hex()

2017-05-02 Thread Greg Ewing

For a name, I think "group" would be better than "chunk".
We talk about grouping the digits of a number, not chunking
them.

--
Greg



Re: [Python-ideas] Add an option for delimiters in bytes.hex()

2017-05-02 Thread Carl Smith
The main reason for naming it `delimit` was to be consistent with the kwarg
`delimiter`, so `str.delimit(index, delimiter)`. You could call it `chop` I
guess, but I'm just bikeshedding, so will leave it while you guys figure
out the important stuff.

-- Carl Smith
carl.in...@gmail.com

On 2 May 2017 at 18:48, David Mertz  wrote:

> On Tue, May 2, 2017 at 4:31 AM, Steven D'Aprano 
> wrote:
>
>> Rather than duplicate the API and logic everywhere, I suggest we add a
>> new string method. My suggestion is str.chunk(size, delimiter=' ') and
>> str.rchunk() with the same arguments:
>>
>> "1234ABCDEF".chunk(4)
>> => returns "1234 ABCD EF"
>>
>> rchunk will be useful for money or other situations where we group from
>> the right rather than from the left:
>>
>> "$" + str(10**6).rchunk(3, ',')
>> => returns "$1,000,000"
>>
>> # Format mobile phone number in the Australian style
>> "0412345678".rchunk((4, 3))
>> => returns "0412 345 678"
>>
>> # Format an integer in the Indian style
>> str(123456789).rchunk((3, 2), ",")
>> => returns "12,34,56,789"
>>
>
> I like this general idea very much.  Dealing with lakh and crore is a very
> nice feature (and one that the `.format()` mini-language sadly fails to
> handle; it assumes numeric delimiters can only be commas, and only ever
> three positions).
>
> But I'm not sure the semantics you propose are flexible enough.  I take it
> that the tuple means (first group, remaining groups) from your
> examples.  But I don't think that suffices for every common format.  It
> would be fine to get a USA phone number like:
>
> str(4135559414).rchunk((4,3),'-')  # -> 413-555-9414
>
> But for example, looking somewhat at random at an international call (
> https://en.wikipedia.org/wiki/Telephone_numbers_in_Belgium)
>
> Dialing from New York to Brussels: 011-32-2-555-12-12 - omitting the
> leading "0".
>
> Maybe your API is for any length tuple, with the final element repeated.
> So I guess maybe this example could be:
>
> "0113225551212".rchunk((2,2,3,1,2,3),'-')
>
> I don't care about this method being called .chunk() vs. .delimit() vs.
> something else.
>
> --
> Keeping medicines from the bloodstreams of the sick; food
> from the bellies of the hungry; books from the hands of the
> uneducated; technology from the underdeveloped; and putting
> advocates of freedom in prisons.  Intellectual property is
> to the 21st century what the slave trade was to the 16th.
>


Re: [Python-ideas] Add an option for delimiters in bytes.hex()

2017-05-02 Thread David Mertz
On Tue, May 2, 2017 at 4:31 AM, Steven D'Aprano  wrote:

> Rather than duplicate the API and logic everywhere, I suggest we add a
> new string method. My suggestion is str.chunk(size, delimiter=' ') and
> str.rchunk() with the same arguments:
>
> "1234ABCDEF".chunk(4)
> => returns "1234 ABCD EF"
>
> rchunk will be useful for money or other situations where we group from
> the right rather than from the left:
>
> "$" + str(10**6).rchunk(3, ',')
> => returns "$1,000,000"
>
> # Format mobile phone number in the Australian style
> "0412345678".rchunk((4, 3))
> => returns "0412 345 678"
>
> # Format an integer in the Indian style
> str(123456789).rchunk((3, 2), ",")
> => returns "12,34,56,789"
>

I like this general idea very much.  Dealing with lakh and crore is a very
nice feature (and one that the `.format()` mini-language sadly fails to
handle; it assumes numeric delimiters can only be commas, and only ever
three positions).
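To illustrate the limitation: the format mini-language offers ',' and '_' grouping, always in threes for decimal output (and in fours for '_' with the binary, octal and hex presentation types), so Indian-style grouping needs custom code. group_indian below is a hypothetical helper, assuming a non-negative integer.

```python
n = 123456789

# The format mini-language only offers fixed groups of three for decimals...
print(f"{n:,}")  # 123,456,789
print(f"{n:_}")  # 123_456_789
# ...and groups of four for '_' with the b, o, x and X presentation types.
print(f"{0xDEADBEEF:_x}")  # dead_beef


def group_indian(n: int) -> str:
    """Lakh/crore grouping: the last three digits, then pairs of digits."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    s = str(n)
    head, tail = s[:-3], s[-3:]
    pairs = []
    while head:
        pairs.insert(0, head[-2:])  # take pairs from the right
        head = head[:-2]
    return ",".join(pairs + [tail])


print(group_indian(n))  # 12,34,56,789
```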

But I'm not sure the semantics you propose are flexible enough.  I take it
that the tuple means (first group, remaining groups) from your
examples.  But I don't think that suffices for every common format.  It
would be fine to get a USA phone number like:

str(4135559414).rchunk((4,3),'-')  # -> 413-555-9414

But for example, looking somewhat at random at an international call (
https://en.wikipedia.org/wiki/Telephone_numbers_in_Belgium)

Dialing from New York to Brussels: 011-32-2-555-12-12 - omitting the
leading "0".

Maybe your API is for any length tuple, with the final element repeated.
So I guess maybe this example could be:

"0113225551212".rchunk((2,2,3,1,2,3),'-')
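A sketch of the reading proposed here, with sizes counted from the right and the final element repeated. The rchunk name and its int-or-tuple argument follow the examples in the thread; everything else is an assumption. Under this reading the US, Belgian and Indian examples come out as shown, but Steven's Australian example would need (3, 3, 4) rather than (4, 3) to produce "0412 345 678".

```python
from itertools import chain, repeat


def rchunk(s: str, sizes, sep: str = " ") -> str:
    """Group s from the right and join with sep (hypothetical API).

    sizes is an int or a tuple of group sizes counted from the right;
    the last size in the tuple repeats for the rest of the string.
    """
    if isinstance(sizes, int):
        sizes = (sizes,)
    if not sizes or any(size < 1 for size in sizes):
        raise ValueError("group sizes must be positive integers")
    groups, end = [], len(s)
    for size in chain(sizes[:-1], repeat(sizes[-1])):
        if end <= 0:
            break
        groups.append(s[max(end - size, 0):end])
        end -= size
    return sep.join(reversed(groups))  # collected right to left
```

For instance, rchunk("0412345678", (4, 3)) gives "041 234 5678" under this reading.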

I don't care about this method being called .chunk() vs. .delimit() vs.
something else.

-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.


Re: [Python-ideas] Add an option for delimiters in bytes.hex()

2017-05-02 Thread Steven D'Aprano
On Tue, May 02, 2017 at 11:45:35PM +1000, Nick Coghlan wrote:

> Attempting to align the terminology with existing string methods and
> other stdlib APIs:
[...]
> 1. we don't have any current APIs or documentation that use "chunk" in
> combination with any kind of delimiter
> 2. we don't have any current APIs or documentation that use "chunk" as
> a verb - they all use it as a noun

English has a long and glorious tradition of verbing nouns, and nouning 
verbs. Group can mean the action of putting things into a group, join 
likewise refers to both the action of attaching two things and the seam 
or joint where they have been joined. Likewise for chunking:

https://duckduckgo.com/html/?q=chunking

"Chunk" has been used as a verb since at least 1890 (albeit with a different 
meaning). None of my dictionaries give a date for the use of chunking to 
mean dividing something up into chunks, so that could be quite recent, 
but it's well-established in education (chunking as a technique for 
doing long division), psychology, linguistics and more. I remember using 
"chunking" as a verb to describe Hyperscript's text handling back in the 
mid 1980s, e.g. "word 2 of line 6 of text".

The nltk library handles chunk as both a noun and verb in a similar 
sense:

http://www.nltk.org/howto/chunk.html


> So if we went with this approach, then Carl Smith's suggestion of
> "str.delimit()" likely makes sense.

The problem with "delimit" is that in many contexts it refers to 
marking both the start and end boundaries, e.g. people often refer to 
string delimiters '...' and list delimiters [...]. That doesn't apply 
here, where we're adding separators between chunks/groups.

The term delimiter can be used in various ways, and some of them do not 
match the behaviour we want here:

http://stackoverflow.com/questions/9118769/when-to-use-the-terms-delimiter-terminator-and-separator

In this case, we are not adding delimiters, we're adding separators. 
We're chunking (or grouping) characters by counting them, then 
separating the groups. The test here is what happens if the string is 
shorter than the group size?

"xyz".chunk(5, '*')

If we're delimiting the boundaries of the group, then I expect that we 
should get "*xyz*", but if we're separating groups, I expect that we 
should get "xyz" unchanged.
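A small sketch of the distinction drawn here. separate and delimit are illustrative names only, and the way delimit treats adjacent groups (one shared mark per boundary) is an assumption.

```python
def separate(s: str, size: int, sep: str) -> str:
    """Put sep *between* groups: a single group is returned unchanged."""
    groups = [s[i:i + size] for i in range(0, len(s), size)]
    return sep.join(groups)


def delimit(s: str, size: int, mark: str) -> str:
    """Mark the start and end of every group, sharing marks between groups."""
    groups = [s[i:i + size] for i in range(0, len(s), size)]
    return mark + mark.join(groups) + mark if groups else s


print(separate("xyz", 5, "*"))  # xyz
print(delimit("xyz", 5, "*"))   # *xyz*
```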


> However, the other question worth asking is whether we might want a
> "string slice splitting" operation rather than a string delimiting
> option: once you have the slices, then combining them again with
> str.join is straightforward, but extracting the slices in the first
> place is currently a little fiddly (especially for the reversed case):

Let me think about that :-)



-- 
Steve


Re: [Python-ideas] Add an option for delimiters in bytes.hex()

2017-05-02 Thread Nick Coghlan
On 2 May 2017 at 21:31, Steven D'Aprano  wrote:
> On Mon, May 01, 2017 at 11:38:20PM +1000, Nick Coghlan wrote:
>> However, a much simpler alternative would be to just support two
>> keyword arguments to hex(): "delimiter" (as you suggest) and
>> "chunk_size" (defaulting to 1, so you get per-byte chunking by
>> default)
>
> I disagree with this approach. There's nothing special about bytes.hex()
> here, perhaps we want to format the output of hex() or bin() or oct(),
> or for that matter "%x" and any of the other string templates?
>
> In fact, this is a string operation that could apply to any character
> string, including decimal digits.
>
> Rather than duplicate the API and logic everywhere, I suggest we add a
> new string method. My suggestion is str.chunk(size, delimiter=' ') and
> str.rchunk() with the same arguments:
>
> "1234ABCDEF".chunk(4)
> => returns "1234 ABCD EF"
>
> rchunk will be useful for money or other situations where we group from
> the right rather than from the left:
>
> "$" + str(10**6).rchunk(3, ',')
> => returns "$1,000,000"

Nice. That proposal also addresses one of the problems I raised in the
issue tracker, which is that the decimal equivalent to hex/oct/bin is
just str, so anything based on keyword arguments to the display
functions is hard to apply to ordinary decimal numbers.

Attempting to align the terminology with existing string methods and
other stdlib APIs:

1. the programming FAQ uses "chunks" as the accumulation variable
prior to calling str.join():
https://docs.python.org/3/faq/programming.html#what-is-the-most-efficient-way-to-concatenate-many-strings-together
2. the most analogous itertools recipe is the "grouper" recipe, which
describes its purpose as "Collect data into fixed-length chunks or
blocks"
3. there's a top level "chunk" module for working with audio file
formats (today-I-learned...)
4. multiprocessing uses "chunksize" to manage the dispatching of work
to worker processes
5. various networking, IO and serialisation libraries use "chunk" to
describe data blocks for incremental reads and writes

I think a couple of key problems are illustrated by that survey:

1. we don't have any current APIs or documentation that use "chunk" in
combination with any kind of delimiter
2. we don't have any current APIs or documentation that use "chunk" as
a verb - they all use it as a noun

So if we went with this approach, then Carl Smith's suggestion of
"str.delimit()" likely makes sense.

However, the other question worth asking is whether we might want a
"string slice splitting" operation rather than a string delimiting
option: once you have the slices, then combining them again with
str.join is straightforward, but extracting the slices in the first
place is currently a little fiddly (especially for the reversed case):

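def splitslices(self, size):
    # Left to right: only the final slice may be shorter than size.
    return [self[start:start+size] for start in range(0, len(self), size)]

def rsplitslices(self, size):
    # Right to left, so only the leftmost slice may be shorter than size;
    # reverse at the end to restore left-to-right order.
    blocks = [self[max(start - size, 0):start]
              for start in range(len(self), 0, -size)]
    blocks.reverse()
    return blocks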

Given those methods, the split-and-rejoin use case that started the
thread would look like:

  " ".join("1234ABCDEF".splitslices(4))
=> "1234 ABCD EF"

  "$" + ",".join(str(10**6).rsplitslices(3))
=> "$1,000,000"

Which is the same pattern that can be used to change a delimiter with
str.split() and str.splitlines().

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-ideas] Add an option for delimiters in bytes.hex()

2017-05-02 Thread Carl Smith
On the block size arg, couldn't it just be named `index`?

On Tue, 2 May 2017 13:12 Carl Smith,  wrote:

> Sorry. I meant to be terse, but wasn't clear enough. I meant the method
> name. If it takes a `delimiter` kwarg, it would be consistent to call the
> operation `delimit`.
>
> On Tue, 2 May 2017 13:06 Carl Smith,  wrote:
>
>> Couldn't it just be named `str.delimit`? I totally agree with Steve for
>> what it's worth. Thanks for everything guys. Best,
>>
>> On Tue, 2 May 2017 13:02 Joao S. O. Bueno,  wrote:
>>
>>> On 1 May 2017 at 11:04, Juancarlo Añez  wrote:
>>> >
>>> > On Mon, May 1, 2017 at 9:38 AM, Nick Coghlan wrote:
>>> >>
>>> >> just support two
>>> >> keyword arguments to hex(): "delimiter" (as you suggest) and
>>> >> "chunk_size" (defaulting to 1, so you get per-byte chunking by
>>> >> default)
>>> >
>>> >
>>> > I'd expect "chunk_size" to mean the number of hex digits (not bytes)
>>> > per chunk.
>>> So do I. Moreover, if "1" is for two digits, there is no way to
>>> specify single digits, though we can perceive little use for that.
>>>
>>> Maybe it does not need to be named "chunk_size" - "digits_per_block"
>>> is too long, but it is precise.
>>>
>>> Also, whatever we think is good for "hex" could also be done to "bin".
>>>
>>> >
>>> > Cheers,
>>> >
>>> >
>>> > --
>>> > Juancarlo Añez
>>> >


Re: [Python-ideas] Add an option for delimiters in bytes.hex()

2017-05-02 Thread Carl Smith
Sorry. I meant to be terse, but wasn't clear enough. I meant the method
name. If it takes a `delimiter` kwarg, it would be consistent to call the
operation `delimit`.

On Tue, 2 May 2017 13:06 Carl Smith,  wrote:

> Couldn't it just be named `str.delimit`? I totally agree with Steve for
> what it's worth. Thanks for everything guys. Best,
>
> On Tue, 2 May 2017 13:02 Joao S. O. Bueno,  wrote:
>
>> On 1 May 2017 at 11:04, Juancarlo Añez  wrote:
>> >
>> > On Mon, May 1, 2017 at 9:38 AM, Nick Coghlan wrote:
>> >>
>> >> just support two
>> >> keyword arguments to hex(): "delimiter" (as you suggest) and
>> >> "chunk_size" (defaulting to 1, so you get per-byte chunking by
>> >> default)
>> >
>> >
>> > I'd expect "chunk_size" to mean the number of hex digits (not bytes)
>> > per chunk.
>> So do I. Moreover, if "1" is for two digits, there is no way to
>> specify single digits, though we can perceive little use for that.
>>
>> Maybe it does not need to be named "chunk_size" - "digits_per_block"
>> is too long, but it is precise.
>>
>> Also, whatever we think is good for "hex" could also be done to "bin".
>>
>> >
>> > Cheers,
>> >
>> >
>> > --
>> > Juancarlo Añez
>> >


Re: [Python-ideas] Add an option for delimiters in bytes.hex()

2017-05-02 Thread Carl Smith
Couldn't it just be named `str.delimit`? I totally agree with Steve for
what it's worth. Thanks for everything guys. Best,

On Tue, 2 May 2017 13:02 Joao S. O. Bueno,  wrote:

> On 1 May 2017 at 11:04, Juancarlo Añez  wrote:
> >
> > On Mon, May 1, 2017 at 9:38 AM, Nick Coghlan  wrote:
> >>
> >> just support two
> >> keyword arguments to hex(): "delimiter" (as you suggest) and
> >> "chunk_size" (defaulting to 1, so you get per-byte chunking by
> >> default)
> >
> >
> > I'd expect "chunk_size"  to mean the number of hex digits (not bytes) per
> > chunk.
> So do I. Moreover, if "1" is for two digits, there is no way to
> specify single digits, though we can perceive little use for that.
>
> Maybe it does not need to be named "chunk_size" - "digits_per_block"
> is too long, but it is precise.
>
> Also, whatever we think is good for "hex" could also be done to "bin".
>
> >
> > Cheers,
> >
> >
> > --
> > Juancarlo Añez
> >


Re: [Python-ideas] Add an option for delimiters in bytes.hex()

2017-05-02 Thread Joao S. O. Bueno
On 1 May 2017 at 11:04, Juancarlo Añez  wrote:
>
> On Mon, May 1, 2017 at 9:38 AM, Nick Coghlan  wrote:
>>
>> just support two
>> keyword arguments to hex(): "delimiter" (as you suggest) and
>> "chunk_size" (defaulting to 1, so you get per-byte chunking by
>> default)
>
>
> I'd expect "chunk_size"  to mean the number of hex digits (not bytes) per
> chunk.
So do I. Moreover, if "1" is for two digits, there is no way to
specify single digits, though we can perceive little use for that.

Maybe it does not need to be named "chunk_size" - "digits_per_block"
is too long, but it is precise.

Also, whatever we think is good for "hex" could also be done to "bin".
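A sketch of the digit-counting interpretation, where the block size is measured in output characters rather than bytes. That makes single-digit groups expressible, and the same helper works unchanged on binary digit strings. group_digits() is a hypothetical name, not an existing API:

```python
def group_digits(digits, digits_per_block, delimiter=" "):
    """Insert delimiter every digits_per_block characters, counting from the left."""
    if digits_per_block < 1:
        raise ValueError("digits_per_block must be at least 1")
    return delimiter.join(
        digits[i:i + digits_per_block] for i in range(0, len(digits), digits_per_block)
    )


print(group_digits(b"abc".hex(), 2))          # 61 62 63 (one byte per block)
print(group_digits(b"abc".hex(), 1))          # 6 1 6 2 6 3 (single hex digits)
# format(n, "b") gives binary digits without bin()'s '0b' prefix
print(group_digits(format(0xA5F0, "b"), 4))   # 1010 0101 1111 0000
```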

>
> Cheers,
>
>
> --
> Juancarlo Añez
>


Re: [Python-ideas] Add an option for delimiters in bytes.hex()

2017-05-02 Thread Steven D'Aprano
On Mon, May 01, 2017 at 11:38:20PM +1000, Nick Coghlan wrote:

> We're definitely open to offering better formatting options for bytes.hex().
> 
> My proposal in https://bugs.python.org/issue22385 was to define a new
> formatting mini-language (akin to the way strftime works, but with a
> much simpler formatting mini-language):
> http://bugs.python.org/issue22385#msg292663
> 
> However, a much simpler alternative would be to just support two
> keyword arguments to hex(): "delimiter" (as you suggest) and
> "chunk_size" (defaulting to 1, so you get per-byte chunking by
> default)

I disagree with this approach. There's nothing special about bytes.hex()
here: perhaps we also want to format the output of hex(), bin() or oct(),
or for that matter "%x" and any of the other string templates?

In fact, this is a string operation that could apply to any character 
string, including decimal digits.

Rather than duplicate the API and logic everywhere, I suggest we add a 
new string method. My suggestion is str.chunk(size, delimiter=' ') and 
str.rchunk() with the same arguments:

"1234ABCDEF".chunk(4)
=> returns "1234 ABCD EF"

rchunk will be useful for money or other situations where we group from 
the right rather than from the left:

"$" + str(10**6).rchunk(3, ',')
=> returns "$1,000,000"


And if we want to add bells and whistles, we could accept a tuple for 
the size argument:

# Format mobile phone number in the Australian style
"0412345678".rchunk((4, 3))
=> returns "0412 345 678"

# Format an integer in the Indian style
str(123456789).rchunk((3, 2), ",")
=> returns "12,34,56,789"


In the OP's use-case:

bytes("abcde", "ascii").hex().chunk(2)
=> returns '61 62 63 64 65'

bytes("abcde", "ascii").hex().chunk(4)
=> returns '6162 6364 65'
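A rough sketch of the examples above as plain functions, since built-in str cannot be given new methods from Python code. chunk() and rchunk() are hypothetical stand-ins for the proposed methods, and the tuple semantics are an assumption: sizes are consumed starting from the grouping edge and the last size repeats, which reproduces the Indian-style example:

```python
from itertools import chain, repeat


def _sizes(size):
    """Return an iterator of block sizes; the last (or only) size repeats forever."""
    sizes = (size,) if isinstance(size, int) else tuple(size)
    if not sizes or any(n < 1 for n in sizes):
        raise ValueError("block sizes must be positive integers")
    return chain(sizes[:-1], repeat(sizes[-1]))


def chunk(s, size, delimiter=" "):
    """Group s into blocks starting from the left and join them with delimiter."""
    blocks, start = [], 0
    for n in _sizes(size):
        if start >= len(s):
            break
        blocks.append(s[start:start + n])
        start += n
    return delimiter.join(blocks)


def rchunk(s, size, delimiter=" "):
    """Group s into blocks starting from the right and join them with delimiter."""
    blocks, end = [], len(s)
    for n in _sizes(size):
        if end <= 0:
            break
        blocks.append(s[max(0, end - n):end])
        end -= n
    # Blocks were collected right-to-left, so restore reading order
    return delimiter.join(reversed(blocks))


print(chunk("1234ABCDEF", 4))                # 1234 ABCD EF
print("$" + rchunk(str(10**6), 3, ","))      # $1,000,000
print(rchunk(str(123456789), (3, 2), ","))   # 12,34,56,789
print(chunk(b"abcde".hex(), 2))              # 61 62 63 64 65
```

Because the helpers only see a str, the same calls apply equally to the output of hex(), oct(), bin() or "%x" formatting.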


I don't see any advantage to adding this to bytes.hex(), hex(), oct() and
bin(), and I really don't think it is helpful to be grouping the
characters by the number of bits. It's a string formatting operation, not
a bit operation.


-- 
Steve


Re: [Python-ideas] Add an option for delimiters in bytes.hex()

2017-05-01 Thread Terry Reedy

On 5/1/2017 1:41 PM, Alexandre Brault wrote:
> On 2017-05-01 01:34 PM, Ethan Furman wrote:
>> On 05/01/2017 07:04 AM, Juancarlo Añez wrote:
>>> On Mon, May 1, 2017 at 9:38 AM, Nick Coghlan wrote:
>>>>
>>>> just support two
>>>> keyword arguments to hex(): "delimiter" (as you suggest) and
>>>> "chunk_size" (defaulting to 1, so you get per-byte chunking by
>>>> default)
>>>
>>> I'd expect "chunk_size" to mean the number of hex digits (not bytes)
>>> per chunk.
>>
>> I was also surprised by that.  Also, should Python be used on a
>> machine with, say, 24-bit words then a chunk size of three makes more
>> sense than one of 1.5.  ;)
>>
>> --
>> ~Ethan~
>
> A hex digit is 4 bits long. To separate into words, the 24-bit word
> Python would use 3 (counting in bytes as initially proposed), or 6
> (counting in hex digits). Neither option would result in a 1.5
> chunk_size for 24-bit chunks.
>
> Counting chunk_size either in nibbles or bytes seems equally intuitive to
> me (as long as it's documented).

Call the parameter 'octets' and it should be clear that it means 8-bit
chunks.  Do any machines now use anything else?


--
Terry Jan Reedy




Re: [Python-ideas] Add an option for delimiters in bytes.hex()

2017-05-01 Thread Alexandre Brault
On 2017-05-01 01:41 PM, Alexandre Brault wrote:
> On 2017-05-01 01:34 PM, Ethan Furman wrote:
>> On 05/01/2017 07:04 AM, Juancarlo Añez wrote:
>>> On Mon, May 1, 2017 at 9:38 AM, Nick Coghlan wrote:
>>>
>>>> just support two
>>>> keyword arguments to hex(): "delimiter" (as you suggest) and
>>>> "chunk_size" (defaulting to 1, so you get per-byte chunking by
>>>> default)
>>> I'd expect "chunk_size"  to mean the number of hex digits (not bytes)
>>> per chunk.
>> I was also surprised by that.  Also, should Python be used on a
>> machine with, say, 24-bit words then a chunk size of three makes more
>> sense than one of 1.5.  ;)
>>
>> -- 
>> ~Ethan~
> A hex digit is 4 bits long. To separate into words, the 24-bit word
> Python would use 3 (counting in bytes as initially proposed), or 6
> (counting in hex digits). Neither option would result in a 1.5
> chunk_size for 24-bit chunks.
>
> Counting chunk_size either in nibbles or bytes seems equally intuitive to
> me (as long as it's documented).
And I only just realised your main concern was about the 12-bit byte of
that 24-bit word architecture. Carry on.


Re: [Python-ideas] Add an option for delimiters in bytes.hex()

2017-05-01 Thread Alexandre Brault
On 2017-05-01 01:34 PM, Ethan Furman wrote:
> On 05/01/2017 07:04 AM, Juancarlo Añez wrote:
>> On Mon, May 1, 2017 at 9:38 AM, Nick Coghlan wrote:
>>
>>> just support two
>>> keyword arguments to hex(): "delimiter" (as you suggest) and
>>> "chunk_size" (defaulting to 1, so you get per-byte chunking by
>>> default)
>>
>> I'd expect "chunk_size"  to mean the number of hex digits (not bytes)
>> per chunk.
>
> I was also surprised by that.  Also, should Python be used on a
> machine with, say, 24-bit words then a chunk size of three makes more
> sense than one of 1.5.  ;)
>
> -- 
> ~Ethan~
A hex digit is 4 bits long. To separate into words, the 24-bit word
Python would use 3 (counting in bytes as initially proposed), or 6
(counting in hex digits). Neither option would result in a 1.5
chunk_size for 24-bit chunks.

Counting chunk_size either in nibbles or bytes seems equally intuitive to
me (as long as it's documented).
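The arithmetic behind this, as a quick sketch: with 8-bit bytes and 4-bit hex digits, a word of N bits is N/8 bytes or N/4 hex digits, so a 24-bit word gives 3 or 6 and never 1.5. word_chunk_sizes() is a hypothetical helper, and it assumes 8-bit bytes, which is what bytes.hex() encodes:

```python
BITS_PER_BYTE = 8       # bytes.hex() encodes 8-bit bytes
BITS_PER_HEX_DIGIT = 4  # one hex digit encodes one nibble


def word_chunk_sizes(word_bits):
    """Return (chunk size in bytes, chunk size in hex digits) for a word width."""
    if word_bits % BITS_PER_BYTE:
        raise ValueError("word width must be a whole number of 8-bit bytes")
    return word_bits // BITS_PER_BYTE, word_bits // BITS_PER_HEX_DIGIT


n_bytes, n_digits = word_chunk_sizes(24)   # (3, 6)
digits = bytes(range(6)).hex()             # '000102030405'
words = [digits[i:i + n_digits] for i in range(0, len(digits), n_digits)]
print(" ".join(words))                     # 000102 030405
```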


Re: [Python-ideas] Add an option for delimiters in bytes.hex()

2017-05-01 Thread Ethan Furman

On 05/01/2017 07:04 AM, Juancarlo Añez wrote:
> On Mon, May 1, 2017 at 9:38 AM, Nick Coghlan wrote:
>
>> just support two
>> keyword arguments to hex(): "delimiter" (as you suggest) and
>> "chunk_size" (defaulting to 1, so you get per-byte chunking by
>> default)
>
> I'd expect "chunk_size" to mean the number of hex digits (not bytes) per chunk.

I was also surprised by that.  Also, should Python be used on a machine
with, say, 24-bit words then a chunk size of three makes more sense than
one of 1.5.  ;)


--
~Ethan~



Re: [Python-ideas] Add an option for delimiters in bytes.hex()

2017-05-01 Thread Nick Coghlan
On 1 May 2017 at 17:19,   wrote:
> The bytes.hex() function is the inverse function of bytes.fromhex().
>
> But fromhex() can process spaces (spaced hex is much more readable), while hex()
> provides no way to include spaces.
>
> My proposal would be to add an optional delimiter, which specifies a
> string to be inserted between the digit pairs of consecutive bytes:
>
> def hex(self, delimiter=''): ...

We're definitely open to offering better formatting options for bytes.hex().

My proposal in https://bugs.python.org/issue22385 was to define a new
formatting mini-language (akin to the way strftime works, but with a
much simpler formatting mini-language):
http://bugs.python.org/issue22385#msg292663

However, a much simpler alternative would be to just support two
keyword arguments to hex(): "delimiter" (as you suggest) and
"chunk_size" (defaulting to 1, so you get per-byte chunking by
default)
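For illustration, a minimal sketch of the two-keyword-argument variant written as a standalone function. hex_chunked() is a hypothetical name, and chunk_size counts bytes here, as in the suggestion above:

```python
def hex_chunked(data, delimiter="", chunk_size=1):
    """Hex-encode data, inserting delimiter between groups of chunk_size bytes."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    # Slicing bytes gives bytes, so each group can use the existing .hex()
    return delimiter.join(
        data[i:i + chunk_size].hex() for i in range(0, len(data), chunk_size)
    )


print(hex_chunked(b"abcde"))          # 6162636465 (same as b"abcde".hex())
print(hex_chunked(b"abcde", " "))     # 61 62 63 64 65
print(hex_chunked(b"abcde", " ", 2))  # 6162 6364 65
```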

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


[Python-ideas] Add an option for delimiters in bytes.hex()

2017-05-01 Thread robert.hoelzl
The bytes.hex() function is the inverse function of bytes.fromhex().
But fromhex() can process spaces (spaced hex is much more readable), while hex()
provides no way to include spaces.

My proposal would be to add an optional delimiter, which specifies a
string to be inserted between the digit pairs of consecutive bytes:

def hex(self, delimiter=''): ...

This would allow writing:

assert b'abc'.hex(' ') == '61 62 63'
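A minimal sketch of the proposed behaviour as a standalone function (hex_with_delimiter() is a hypothetical name), including the round trip back through bytes.fromhex(), which already skips whitespace:

```python
def hex_with_delimiter(data, delimiter=""):
    """Hex-encode data with delimiter between the digit pairs of each byte."""
    # Iterating over bytes yields ints; format each as two lowercase hex digits
    return delimiter.join(f"{byte:02x}" for byte in data)


encoded = hex_with_delimiter(b"abc", " ")
assert encoded == "61 62 63"
# bytes.fromhex() ignores the spaces, so the spaced form round-trips
assert bytes.fromhex(encoded) == b"abc"
```

For comparison, Python 3.8 and later accept an optional separator in bytes.hex() itself, so b'abc'.hex(' ') returns '61 62 63' there.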
