[issue30346] Odd behavior when unpacking `itertools.groupby`

2017-05-12 Thread Matthew Gilson

Matthew Gilson added the comment:

Tracking which group the grouper _should_ be on using an incrementing integer 
seems to work pretty well.
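A minimal sketch of the incrementing-integer idea, applied to a pure-Python groupby modelled on the "roughly equivalent" code in the itertools docs. This is an illustration, not the attached patch, and the `generation` counter is a name chosen here. Each sub-iterator remembers the generation it was created in and stops as soon as the parent has handed out a newer group:

```python
_marker = object()


class groupby:
    """Pure-Python groupby sketch: sub-iterators die once the parent advances."""

    def __init__(self, iterable, key=None):
        self.keyfunc = key if key is not None else (lambda x: x)
        self.it = iter(iterable)
        self.tgtkey = self.currkey = self.currvalue = _marker
        self.generation = 0  # hypothetical counter, bumped for every new group

    def __iter__(self):
        return self

    def __next__(self):
        self.generation += 1  # invalidates every group handed out so far
        # Skip whatever is left of the current group.
        while self.currkey == self.tgtkey:
            self.currvalue = next(self.it)  # StopIteration ends the groupby
            self.currkey = self.keyfunc(self.currvalue)
        self.tgtkey = self.currkey
        return (self.currkey, self._grouper(self.tgtkey, self.generation))

    def _grouper(self, tgtkey, generation):
        while self.generation == generation and self.currkey == tgtkey:
            yield self.currvalue
            if self.generation != generation:
                return  # parent advanced while this group was suspended
            try:
                self.currvalue = next(self.it)
            except StopIteration:
                return
            self.currkey = self.keyfunc(self.currvalue)


(_, a), (_, b) = groupby('AB')
print(list(a), list(b))  # [] []
```

The second generation check, right after `yield`, matters: without it, resuming an old, half-consumed group would pull a value from the underlying iterator that belongs to the current group.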

In addition to the tests in `test_itertools.py`, I've gotten the following 
tests to pass:

    import unittest
    from itertools import groupby


    class TestGroupBy(unittest.TestCase):
        def test_unpacking(self):
            iterable = 'AB'
            (_, a), (_, b) = groupby(iterable)
            self.assertEqual(list(a), [])
            self.assertEqual(list(b), [])

        def test_weird_iterating(self):
            g = groupby('ABA')  # three groups: 'A', 'B', 'A'
            _, a = next(g)
            _, b = next(g)
            _, aa = next(g)
            self.assertEqual(list(a), [])
            self.assertEqual(list(b), [])
            self.assertEqual(list(aa), list('A'))

If I were to submit this as a PR:

1. Where would I want to add these tests?
2. Should I update the documentation for the "equivalent" Python version to 
match exactly?

--
keywords: +patch
Added file: http://bugs.python.org/file46860/groupby-fix.patch

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue30346>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30346] Odd behavior when unpacking `itertools.groupby`

2017-05-12 Thread Matthew Gilson

Matthew Gilson added the comment:

Oops.  You don't need to pass `self.currvalue` to `_grouper`. I didn't end up 
using it in there...

--

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue30346>
___



[issue30346] Odd behavior when unpacking `itertools.groupby`

2017-05-12 Thread Matthew Gilson

Matthew Gilson added the comment:

I think that works to solve the problem that I pointed out.  In my Stack 
Overflow question (http://stackoverflow.com/a/43926058/748858) it has been 
pointed out that there are other opportunities for weirdness here.

Specifically, suppose I skip processing two groups and then process a third 
group whose key is the same as the first:


    from itertools import groupby
    from operator import itemgetter

    inputs = [(x > 5, x) for x in range(10)]
    inputs += [(False, 10), (True, 11)]

    g = groupby(inputs, key=itemgetter(0))
    _, a = next(g)
    _, b = next(g)
    _, c = next(g)

    print(list(a))
    print(list(b))

Both `a` and `b` should probably be empty at this point, but they aren't.  

What if you kept track of the last group that was handed out and just consumed 
it whenever `next` is called?  I think you would then also need to keep track 
of whether or not the input iterable has been completely consumed, but that's 
not too bad either:

    _sentinel = object()

    class groupby:
        # [k for k, g in groupby('BBBCCDAABBB')] --> B C D A B
        # [list(g) for k, g in groupby('BBBCCD')] --> BBB CC D
        def __init__(self, iterable, key=None):
            if key is None:
                key = lambda x: x
            self.keyfunc = key
            self.it = iter(iterable)
            self.last_group = self.currkey = self.currvalue = _sentinel
            self.empty = False  # True once the input iterator is exhausted

        def __iter__(self):
            return self

        def __next__(self):
            # Drain the previously returned group so that it can no longer
            # yield values that belong to a later group.
            if self.last_group is not _sentinel:
                for _ in self.last_group:
                    pass
            if self.empty:
                raise StopIteration

            # Only true on the very first call: fetch the first value.
            if self.currvalue is _sentinel:
                try:
                    self.currvalue = next(self.it)
                except StopIteration:
                    self.empty = True
                    raise
            self.currkey = self.keyfunc(self.currvalue)
            self.last_group = self._grouper(self.currkey, self.currvalue)
            return (self.currkey, self.last_group)

        def _grouper(self, tgtkey, currvalue):
            # Yield values for as long as the key matches tgtkey.
            while self.currkey == tgtkey:
                yield self.currvalue
                try:
                    self.currvalue = next(self.it)
                except StopIteration:
                    self.empty = True
                    return
                self.currkey = self.keyfunc(self.currvalue)

I haven't tested this to make sure it passes the test suite -- I also don't 
know if this would have major performance implications or anything.  If it did 
have severe performance implications, then it probably isn't worthwhile...

--
nosy: +mgilson

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue30346>
___



[issue27470] -3 commandline option documented differently via man

2016-07-08 Thread Matthew Gilson

New submission from Matthew Gilson:

The man page for python says:

> Warn about Python 3.x incompatibilities that 2to3 cannot trivially fix.

The official documentation 
(https://docs.python.org/2/using/cmdline.html#cmdoption-3) does not mention 
2to3 at all:

> Warn about Python 3.x possible incompatibilities by emitting a 
> DeprecationWarning for features that are removed or significantly changed in 
> Python 3.

This seems like a pretty big oversight when the following code issues no 
warnings (presumably because 2to3 can trivially handle this change):

```
from __future__ import print_function

class test(object):
    def __nonzero__(self):
        return False

t = test()
if t:
    print('Hello')
```

--
assignee: docs@python
components: Documentation
messages: 269994
nosy: docs@python, mgilson
priority: normal
severity: normal
status: open
title: -3 commandline option documented differently via man
versions: Python 2.7

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue27470>
___



[issue21746] urlparse.BaseResult no longer exists

2014-06-13 Thread Matthew Gilson

New submission from Matthew Gilson:

The BaseResult has been replaced by namedtuple.  This also opens up all of the 
documented methods on namedtuple, which would be nice to have as part of the 
API.  I've taken a stab at rewriting the docs here.  Feel free to use it (or 
not)...
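For illustration, the namedtuple methods in question include `_replace`, which is handy for changing a single URL component. A minimal sketch, written against Python 3's `urllib.parse` (in Python 2.7 the same function lives in the `urlparse` module); the example URL is made up:

```python
from urllib.parse import urlparse

parts = urlparse('http://example.com/path?q=1')
# ParseResult is a namedtuple subclass, so _replace() returns a modified copy.
new_parts = parts._replace(path='/new')
print(new_parts.geturl())  # http://example.com/new?q=1
print(parts.path)          # /path (the original is unchanged)
```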

--
files: python_doc_patch.patch
keywords: patch
messages: 220425
nosy: mgilson
priority: normal
severity: normal
status: open
title: urlparse.BaseResult no longer exists
Added file: http://bugs.python.org/file35612/python_doc_patch.patch

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue21746>
___



[issue21746] urlparse.BaseResult no longer exists

2014-06-13 Thread Matthew Gilson

Changes by Matthew Gilson <m.gils...@gmail.com>:


--
assignee:  - docs@python
components: +Documentation
nosy: +docs@python
versions: +Python 2.7

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue21746>
___



[issue21746] urlparse.BaseResult no longer exists

2014-06-13 Thread Matthew Gilson

Matthew Gilson added the comment:

Sorry, forgot to remove the mention of BaseResult ...

--
Added file: http://bugs.python.org/file35613/python_doc_patch.patch

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue21746>
___



[issue21746] urlparse.BaseResult no longer exists

2014-06-13 Thread Matthew Gilson

Matthew Gilson added the comment:

This originally came up on stackoverflow:

http://stackoverflow.com/questions/24200988/modify-url-components-in-python-2/24201020?noredirect=1#24201020

Would it be helpful if I also added a simple example to the docs, like the 
one there?

--

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue21746>
___



[issue19934] collections.Counter.most_common does not document `None` as acceptable input.

2013-12-08 Thread Matthew Gilson

New submission from Matthew Gilson:

Reading the source for collections.Counter.most_common, the docstring mentions 
that `n` can be `None` or omitted, but the online documentation does not 
mention that `n` can be `None`.
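A quick check of the behaviour the docstring describes: passing `None` explicitly gives the same result as omitting `n`, namely every element ordered from most to least common. The sample string is arbitrary:

```python
from collections import Counter

c = Counter('abracadabra')
print(c.most_common(2))     # [('a', 5), ('b', 2)]
print(c.most_common(None))  # all five elements, most common first
assert c.most_common(None) == c.most_common()
```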

--
assignee: docs@python
components: Documentation
messages: 205648
nosy: docs@python, mgilson
priority: normal
severity: normal
status: open
title: collections.Counter.most_common does not document `None` as acceptable 
input.

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue19934>
___



[issue19934] collections.Counter.most_common does not document `None` as acceptable input.

2013-12-08 Thread Matthew Gilson

Matthew Gilson added the comment:

This is a very simple patch which addresses the issue.  I am still curious 
whether the reported function signature should be changed from:

.. method:: most_common([n])

to:

.. method:: most_common(n=None)

Any thoughts?

Also, while I was in there, I changed a few *None* to ``None`` for consistency 
with the rest of the documentation.

--
keywords: +patch
Added file: http://bugs.python.org/file33050/mywork.patch

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue19934>
___



Non-identifiers in dictionary keys for **expression syntax

2013-05-23 Thread Matthew Gilson
This is a question regarding the documentation around dictionary 
unpacking.  The documentation for the call syntax 
(http://docs.python.org/3/reference/expressions.html#grammar-token-call) 
says:


If the syntax **expression appears in the function call, expression 
must evaluate to a mapping, the contents of which are treated as 
additional keyword arguments.


That's fine, but what is a keyword argument?  According to the glossary 
(http://docs.python.org/3.3/glossary.html):


/keyword argument/: an argument preceded by an identifier (e.g. name=) 
in a function call or passed as a value in a dictionary preceded by **.


As far as I'm concerned, this leads to some ambiguity in whether the 
keys of the mapping need to be valid identifiers or not.


Using CPython, we can do the following:

    def func(**kwargs):
        print(kwargs)

    d = {'foo bar baz': 3}
    func(**d)

So that might lead us to believe that the keys of the mapping do not 
need to be valid identifiers.  However, the previous function does not 
work with the following dictionary:

    d = {1: 3}

because not all the keys are strings.  Is there a way to petition to get 
this more rigorously defined?


Thanks,
~Matt



-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Non-identifiers in dictionary keys for **expression syntax

2013-05-23 Thread Matthew Gilson


On 05/23/2013 03:20 PM, Neil Cerutti wrote:
> On 2013-05-23, Matthew Gilson <m.gils...@gmail.com> wrote:
>> That's fine, but what is a keyword argument?  According to the glossary
>> (http://docs.python.org/3.3/glossary.html):
>>
>> /keyword argument/: an argument preceded by an identifier (e.g. name=)
>> in a function call or passed as a value in a dictionary preceded by **.
>>
>> As far as I'm concerned, this leads to some ambiguity in
>> whether the keys of the mapping need to be valid identifiers or
>> not.
>
> I don't see any ambiguity. A keyword argument is an argument
> preceded by an identifier according to the definition. Where are
> you perceiving wiggle room?

The wiggle room comes from the "or passed as a value in a dictionary" 
clause.  We sort of get caught in an infinite loop there because the 
stuff that can be passed in a dictionary is a keyword argument, which is an 
identifier=expression or something passed as a value in a dictionary ...


Also the fact that:

 func(**{'foo bar baz':1})

works even though `foo bar baz` isn't a valid identifier, but:

 func(**{1:3})

doesn't work.
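A small sketch of the CPython behaviour described above (checked against CPython 3; other implementations may differ, and the exact `TypeError` message varies between versions, so it is not shown). String keys are accepted even when they are not identifiers, while non-string keys are rejected; `str.isidentifier()` is the kind of check an implementation would have to add to enforce identifier keys:

```python
def func(**kwargs):
    return kwargs

# A string key that is not a valid identifier is accepted by CPython...
print(func(**{'foo bar baz': 3}))    # {'foo bar baz': 3}
print('foo bar baz'.isidentifier())  # False

# ...but a non-string key raises TypeError.
try:
    func(**{1: 3})
except TypeError:
    print('non-string keys are rejected')
```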


Re: Non-identifiers in dictionary keys for **expression syntax

2013-05-23 Thread Matthew Gilson


On 05/23/2013 04:52 PM, Terry Jan Reedy wrote:
> On 5/23/2013 2:52 PM, Matthew Gilson wrote:
>> This is a question regarding the documentation around dictionary
>> unpacking.  The documentation for the call syntax
>> (http://docs.python.org/3/reference/expressions.html#grammar-token-call)
>> says:
>>
>> If the syntax **expression appears in the function call, expression
>> must evaluate to a mapping, the contents of which are treated as
>> additional keyword arguments.
>>
>> That's fine, but what is a keyword argument?  According to the glossary
>> (http://docs.python.org/3.3/glossary.html):
>>
>> /keyword argument/: an argument preceded by an identifier (e.g. name=)
>> in a function call or passed as a value in a dictionary preceded by **.
>
> It appears that the requirement has been relaxed (in the previous
> quote), so that 'dictionary' should also be changed to 'mapping'. It
> might not hurt to add 'The key for the value should be an identifier.'
>
>> As far as I'm concerned, this leads to some ambiguity in whether the
>> keys of the mapping need to be valid identifiers or not.
>
> I think you are being too lawyerly. The pretty clear and sensible
> implication is that the key for the value should be a string with a
> valid identifier. If it is anything else, you are on your own and
> deserve any joy or pain that results ;=)
>
>> Using CPython, we can do the following:
>>
>>     def func(**kwargs):
>>         print kwargs
>>
>>     d = {'foo bar baz':3}
>>
>> So that might lead us to believe that the keys of the mapping do not
>> need to be valid identifiers.
>
> There are two ways to pass args to func to be gathered into kwargs;
> explicit key=val pairs and **mapping, or both.
>
> func(a=1, b='hi', **{'foo bar baz':3})
> #
> {'foo bar baz': 3, 'a': 1, 'b': 'hi'}
>
> So func should not expect anything other than identifier strings.
>
>> However, the previous function does not
>> work with the following dictionary:
>>
>>     d = {1:3}
>>
>> because not all the keys are strings.
>
> So CPython checks that keys are strings, because that is cheap, but
> not that the strings are identifiers, because that would be more
> expensive. Just because an implementation allows something (omits a
> check) for efficiency does not mean you should do it.
>
> globals()[1] = 1
> works, but is not normally very sensible or useful.
>
>> Is there a way to petition to get this more rigorously defined?
>
> bugs.python.org
> The problem is that mandating a rigorous check by implementations
> makes Python slower to the detriment of sensible programmers.


To be clear, you're saying that

 func(**{'foo bar baz':3})

is not supported (officially), but it works in CPython because checking 
that every string in the dict is a valid identifier would be costly.  Of 
course that is sensible and I don't think the behaviour should be 
changed to the detriment of sensible programmers.  However, it would 
be nice if it were documented somewhere that the above function call is 
something that a non-sensible programmer would do, perhaps in a 
"CPython implementation detail" block.



Re: Pythonic way to count sequences

2013-04-25 Thread Matthew Gilson


A Counter is definitely the way to go about this.  Just a little more 
information: the example below can be simplified:

    from collections import Counter
    count = Counter(mylist)

With the other example, you could have achieved the same thing (and been 
backward compatible to Python 2.5) with

    from collections import defaultdict
    count = defaultdict(int)
    for k in mylist:
        count[k] += 1
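To illustrate that the two spellings agree, using the sample data from the original question:

```python
from collections import Counter, defaultdict

mylist = [(2, 4), (2, 4), (3, 4), (4, 5), (2, 1)]

count = Counter(mylist)    # one-step tally

manual = defaultdict(int)  # works on Python 2.5+, before Counter existed
for k in mylist:
    manual[k] += 1

print(dict(count) == dict(manual))  # True
print(count.most_common(1))         # [((2, 4), 2)]
```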



On 4/25/13 9:16 PM, Modulok wrote:
> On 4/25/13, Denis McMahon <denismfmcma...@gmail.com> wrote:
>> On Wed, 24 Apr 2013 22:05:52 -0700, CM wrote:
>>> I have to count the number of various two-digit sequences in a list such
>>> as this:
>>>
>>> mylist = [(2,4), (2,4), (3,4), (4,5), (2,1)]  # (Here the (2,4) sequence
>>> appears 2 times.)
>>>
>>> and tally up the results, assigning each to a variable.
> ...
>
> Consider using the ``collections`` module::
>
>     from collections import Counter
>
>     mylist = [(2,4), (2,4), (3,4), (4,5), (2,1)]
>     count = Counter()
>     for k in mylist:
>         count[k] += 1
>
>     print(count)
>
>     # Output looks like this:
>     # Counter({(2, 4): 2, (4, 5): 1, (3, 4): 1, (2, 1): 1})
>
> You then have access to methods to return the most common items, etc. See more
> examples here:
>
> http://docs.python.org/3.3/library/collections.html#collections.Counter
>
> Good luck!
> -Modulok-




Feature Request: `operator.not_in`

2013-04-19 Thread Matthew Gilson
I believe that I read somewhere that this is the place to start 
discussions on feature requests, etc.  Please let me know if this isn't 
the appropriate venue (and what the appropriate venue would be if you know).


This request has 2 related parts, but I think they can be considered 
separately:


1) It seems to me that the operator module should have a `not_in` or 
`not_contains` function.  It seems asymmetric that there exists an 
`is_not` function which implements `x is not y` but there isn't a 
function to represent `x not in y`.
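Until something like it exists, a hypothetical `not_in` helper (not part of the `operator` module) can be built from `operator.contains`. Note that `operator.contains(a, b)` tests `b in a`, so the arguments have to be swapped to read like `x not in y`:

```python
import operator
from itertools import starmap


def not_in(x, y):
    """Hypothetical helper equivalent to `x not in y`."""
    return not operator.contains(y, x)


pairs = [(1, [1, 2]), (3, [1, 2]), ('a', 'abc')]
print(list(starmap(not_in, pairs)))  # [False, True, False]
```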


2) I suspect this one might be a little more controversial, but it seems 
to me that there should be a separate magic method bound to the `not in` 
operator.  Currently, when inspecting the bytecode, it appears to me 
that `not x in y` is translated to `x not in y` (this supports item 1 
slightly).  However, I don't believe this should be the case.  In 
Python, `x < y` does not imply `not x >= y` because a custom object can 
do whatever it wants with `__ge__` and `__lt__` -- they don't have to 
fit the normal mathematical definitions.  I don't see any reason why 
containment should behave differently.  `x in y` shouldn't necessarily 
imply `not (x not in y)`.  I'm not sure if `object` could have a default 
`__not_contains__` method (or whatever name seems most appropriate) 
implemented equivalently to:


    def __not_contains__(self, other):
        return not self.__contains__(other)

If not, it could probably be provided by something like 
`functools.total_ordering`.  Anyway, it's food for thought and I'm 
interested to see if anyone else feels the same way that I do.
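For comparison, here is how current CPython treats the two cases (the class is a made-up example): `<` and `>=` dispatch to independent methods, so both can be true at once, whereas `in` and `not in` both go through `__contains__`, whose result is coerced to a bool and, for `not in`, negated:

```python
class Odd:
    """Made-up class whose special methods return deliberately odd results."""

    def __lt__(self, other):
        return True

    def __ge__(self, other):
        return True  # nothing forces this to be `not (self < other)`

    def __contains__(self, item):
        return 'yes'  # any truthy object; `in` converts it to True


o = Odd()
print(o < 1, o >= 1)       # True True
print(1 in o, 1 not in o)  # True False
```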


Thanks,
~Matt


Re: Feature Request: `operator.not_in`

2013-04-19 Thread Matthew Gilson


On 4/19/13 2:27 PM, Terry Jan Reedy wrote:
> On 4/19/2013 10:27 AM, Matthew Gilson wrote:
>> 1) It seems to me that the operator module should have a `not_in` or
>> `not_contains` function.  It seems asymmetric that there exists a
>> `is_not` function which implements `x is not y` but there isn't a
>> function to represent `x not in y`.
>
> There is also no operator.in.

True.  I'm not arguing that there should be ...

> There is operator.contains and operator.__contains__.

Thankfully :-)

> There is no operator.not_contains because there is no __not_contains__
> special method. (Your point two, which I disagree with.)


But there's also no special method `__is_not__`, yet there is a 
corresponding `is_not` in the operator module, so I don't find that 
argument convincing.  It's the functionality that I'm thinking about in the 
first part:

    itertools.starmap(operator.not_in, zip(x, y))

vs.

    itertools.starmap(lambda a, b: a not in b, zip(x, y))

Pretty much every other operator in Python (that I can think of) has an 
analogous function in the operator module.



>> 2) I suspect this one might be a little more controversial, but it seems
>> to me that there should be a separate magic method bound to the `not in`
>> operator.
>
> The reference manual disagrees.
> "The operator not in is defined to have the inverse true value of in."


I would still leave that as the default behavior.  It's by far the most 
useful and commonly expected.  And I suppose if you *can't* have default 
behavior like that because that is a special case in itself, then that 
makes this second part of the request dead in the water at the outset 
(and I can live with that explanation).



>> Currently, when inspecting the bytecode, it appears to me
>> that `not x in y` is translated to `x not in y` (this supports item 1
>> slightly).  However, I don't believe this should be the case. In
>> python, `x < y` does not imply `not x >= y` because a custom object can
>> do whatever it wants with `__ge__` and `__lt__` -- They don't have to
>> fit the normal mathematical definitions.
>
> The reason for this is that the rich comparisons do not have to return
> boolean values, and do not for numarray arrays which, I believe,
> implement the operators itemwise.


Yes, you're correct about numpy arrays behaving that way.  It can be 
very useful for indexing them.


It would also be fine for a special method `__not_contains__` to be 
expected to return a boolean value as well.  It could still be very 
useful.  Consider a finite square well from quantum mechanics.  I could 
define `in` for my particle in the square well to return `True` if there 
is a 70% chance that it is located in the well (it's a wave function, so 
it doesn't have a well-defined position -- the particle could be anywhere, 
but 7 out of 10 measurements will tell me it's in the well).  It might 
be nice if I could define `not in` to be `True` if there is only a 30% 
chance that it is in the well.  Of course, this leaves us with a 
no-man's land around the 50% mark.  Is it in the well or not?  There's 
no telling.  I'm sure you could argue that this sort of thing *could* be 
done with rich comparisons, but I would consider that a deflection from 
the point at hand.  It seems it should be up to the user to design the 
API most suited for their task.  Or what about a `Fraternity` class?  
Are the new pledges in the fraternity or not?  Maybe they should be 
considered neither in nor out until pledge season is over.



>> I don't see any reason why containment should behave differently.
>
> 'Design by analogy' is tricky because analogies often leave out
> important details. __contains__ *is* expected to return true/false.
>
>     object.__contains__(self, item)
>     Called to implement membership test operators. Should return true
>     if item is in self, false otherwise.
>
> --
> Terry Jan Reedy



