[issue30346] Odd behavior when unpacking `itertools.groupby`
Matthew Gilson added the comment:

Tracking which group the grouper _should_ be on using an incrementing integer seems to work pretty well. In addition to the tests in `test_itertools.py`, I've gotten the following tests to pass:

    class TestGroupBy(unittest.TestCase):
        def test_unpacking(self):
            iterable = 'AB'
            (_, a), (_, b) = groupby(iterable)
            self.assertEqual(list(a), [])
            self.assertEqual(list(b), [])

        def test_weird_iterating(self):
            # Three groups are needed for three next() calls.
            g = groupby('ABA')
            _, a = next(g)
            _, b = next(g)
            _, aa = next(g)
            self.assertEqual(list(a), [])
            self.assertEqual(list(b), [])
            self.assertEqual(list(aa), list('A'))

If I was to submit this as a PR,

1. where would I want to add these tests?
2. should I update the documentation for the "equivalent" python version to match exactly?

--
keywords: +patch
Added file: http://bugs.python.org/file46860/groupby-fix.patch

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue30346>
___
___
Python-bugs-list mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue30346] Odd behavior when unpacking `itertools.groupby`
Matthew Gilson added the comment:

Oops. You don't need to pass `self.currvalue` to `_grouper`. I didn't end up using it in there...
[issue30346] Odd behavior when unpacking `itertools.groupby`
Matthew Gilson added the comment:

I think that works to solve the problem that I pointed out. In my Stack Overflow question (http://stackoverflow.com/a/43926058/748858) it has been pointed out that there are other opportunities for weirdness here. Specifically, if I skip processing 2 groups and then I process a third group whose key is the same as the first:

    from itertools import groupby
    from operator import itemgetter

    inputs = [(x > 5, x) for x in range(10)]
    inputs += [(False, 10), (True, 11)]

    g = groupby(inputs, key=itemgetter(0))
    _, a = next(g)
    _, b = next(g)
    _, c = next(g)

    print(list(a))
    print(list(b))

Both `a` and `b` should probably be empty at this point, but they aren't.

What if you kept track of the last iterable group and just consumed it whenever `next` is called? I think then you also need to keep track of whether or not the input iterable has been completely consumed, but that's not too bad either:

    _sentinel = object()

    class groupby:
        # [k for k, g in groupby('AAAABBBCCDAABBB')] --> A B C D A B
        # [list(g) for k, g in groupby('AAAABBBCCD')] --> AAAA BBB CC D
        def __init__(self, iterable, key=None):
            if key is None:
                key = lambda x: x
            self.keyfunc = key
            self.it = iter(iterable)
            self.last_group = self.currkey = self.currvalue = _sentinel
            self.empty = False

        def __iter__(self):
            return self

        def __next__(self):
            # Drain the previously returned group so it cannot leak
            # items that belong to later groups.
            if self.last_group is not _sentinel:
                for _ in self.last_group:
                    pass
            if self.empty:
                raise StopIteration

            if self.currvalue is _sentinel:
                try:
                    self.currvalue = next(self.it)
                except StopIteration:
                    self.empty = True
                    raise
                self.currkey = self.keyfunc(self.currvalue)
            self.last_group = self._grouper(self.currkey, self.currvalue)
            return (self.currkey, self.last_group)

        def _grouper(self, tgtkey, currvalue):
            while self.currkey == tgtkey:
                yield self.currvalue
                try:
                    self.currvalue = next(self.it)
                except StopIteration:
                    self.empty = True
                    return
                self.currkey = self.keyfunc(self.currvalue)

I haven't tested this to make sure it passes the test suite. I also don't know whether it would have major performance implications. If it did have severe performance implications, then it probably isn't worthwhile...

--
nosy: +mgilson
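For readers who only want to avoid the surprise rather than change `groupby` itself, here is a minimal sketch of the usual workaround: copy each group into a list while it is still the current group, which is what the `itertools` documentation recommends when the data is needed later. The variable names are illustrative and not part of the patch discussed above.

```python
from itertools import groupby
from operator import itemgetter

inputs = [(x > 5, x) for x in range(10)]
inputs += [(False, 10), (True, 11)]

# Materialize each group before the groupby object advances; the stored
# lists stay valid no matter how far the iterator is consumed afterwards.
groups = [(key, [x for _, x in group])
          for key, group in groupby(inputs, key=itemgetter(0))]

for key, values in groups:
    print(key, values)
```

This trades memory for predictability, so it suits inputs whose groups comfortably fit in memory.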
[issue27470] -3 commandline option documented differently via man
New submission from Matthew Gilson:

The man page for python says:

> Warn about Python 3.x incompatibilities that 2to3 cannot trivially fix.

The official documentation (https://docs.python.org/2/using/cmdline.html#cmdoption-3) does not mention 2to3 at all:

> Warn about Python 3.x possible incompatibilities by emitting a
> DeprecationWarning for features that are removed or significantly changed in
> Python 3.

This seems like a pretty big oversight when the following code issues no warnings (presumably because 2to3 can trivially handle this change):

```
from __future__ import print_function

class test(object):
    def __nonzero__(self):
        return False

t = test()
if t:
    print('Hello')
```

--
assignee: docs@python
components: Documentation
messages: 269994
nosy: docs@python, mgilson
priority: normal
severity: normal
status: open
title: -3 commandline option documented differently via man
versions: Python 2.7

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue27470>
___
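To make the incompatibility concrete: Python 3 consults `__bool__` where Python 2 consulted `__nonzero__`, so the class in the report is always truthy when run unchanged on Python 3. A minimal sketch of a class that behaves the same on both versions follows; the aliasing trick is a common idiom and is not something the report itself proposes.

```python
class Test(object):
    def __bool__(self):
        # Python 3 calls this for truth testing.
        return False

    # Python 2 looks up __nonzero__ instead, so point it at the same method.
    __nonzero__ = __bool__


t = Test()
message = 'Hello' if t else 'skipped'
print(message)
```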
[issue21746] urlparse.BaseResult no longer exists
New submission from Matthew Gilson:

The BaseResult has been replaced by namedtuple. This also opens up all of the documented methods on namedtuple, which would be nice to have as part of the API. I've taken a stab at rewriting the docs here. Feel free to use it (or not)...

--
files: python_doc_patch.patch
keywords: patch
messages: 220425
nosy: mgilson
priority: normal
severity: normal
status: open
title: urlparse.BaseResult no longer exists
Added file: http://bugs.python.org/file35612/python_doc_patch.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21746
___
[issue21746] urlparse.BaseResult no longer exists
Changes by Matthew Gilson m.gils...@gmail.com:

--
assignee: -> docs@python
components: +Documentation
nosy: +docs@python
versions: +Python 2.7
[issue21746] urlparse.BaseResult no longer exists
Matthew Gilson added the comment:

Sorry, forgot to remove the mention of BaseResult ...

--
Added file: http://bugs.python.org/file35613/python_doc_patch.patch
[issue21746] urlparse.BaseResult no longer exists
Matthew Gilson added the comment:

This originally came up on Stack Overflow: http://stackoverflow.com/questions/24200988/modify-url-components-in-python-2/24201020?noredirect=1#24201020

Would it be helpful if I also added a simple example to the docs, as in the example there?
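As an illustration of the kind of example being offered, here is a minimal sketch of modifying URL components through the namedtuple API. It assumes Python 3's `urllib.parse` (on Python 2.7 the same function lives in the `urlparse` module), and the URL is made up for the demonstration.

```python
from urllib.parse import urlparse  # Python 2.7: from urlparse import urlparse

parts = urlparse('http://www.example.com/path?q=1')

# The parse result is a namedtuple subclass, so _replace returns a modified
# copy and leaves the original untouched.
updated = parts._replace(netloc='example.org', query='q=2')

print(updated.geturl())
```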
[issue19934] collections.Counter.most_common does not document `None` as acceptable input.
New submission from Matthew Gilson:

Reading the source for collections.Counter.most_common, the docstring mentions that `n` can be `None` or omitted, but the online documentation does not mention that `n` can be `None`.

--
assignee: docs@python
components: Documentation
messages: 205648
nosy: docs@python, mgilson
priority: normal
severity: normal
status: open
title: collections.Counter.most_common does not document `None` as acceptable input.

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19934
___
[issue19934] collections.Counter.most_common does not document `None` as acceptable input.
Matthew Gilson added the comment:

This is a very simple patch which addresses the issue. I am still curious whether the reported function signature should be changed from:

    .. method:: most_common([n])

to:

    .. method:: most_common(n=None)

Any thoughts?

Also, while I was in there, I changed a few *None* to ``None`` for consistency with the rest of the documentation.

--
keywords: +patch
Added file: http://bugs.python.org/file33050/mywork.patch
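The behavior the docstring describes can be checked with a short sketch: passing `None` explicitly gives the same result as omitting `n`. The sample string is arbitrary.

```python
from collections import Counter

counts = Counter('abracadabra')

everything = counts.most_common()            # n omitted
also_everything = counts.most_common(None)   # n passed explicitly as None
top_two = counts.most_common(2)              # only the two most common

print(everything)
print(top_two)
```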
Non-identifiers in dictionary keys for **expression syntax
This is a question regarding the documentation around dictionary unpacking. The documentation for the call syntax (http://docs.python.org/3/reference/expressions.html#grammar-token-call) says:

> If the syntax **expression appears in the function call, expression must
> evaluate to a mapping, the contents of which are treated as additional
> keyword arguments.

That's fine, but what is a keyword argument? According to the glossary (http://docs.python.org/3.3/glossary.html):

> /keyword argument/: an argument preceded by an identifier (e.g. name=) in a
> function call or passed as a value in a dictionary preceded by **.

As far as I'm concerned, this leads to some ambiguity in whether the keys of the mapping need to be valid identifiers or not.

Using CPython, we can do the following:

    def func(**kwargs):
        print kwargs

    d = {'foo bar baz':3}

So that might lead us to believe that the keys of the mapping do not need to be valid identifiers. However, the previous function does not work with the following dictionary:

    d = {1:3}

because not all the keys are strings. Is there a way to petition to get this more rigorously defined?

Thanks,
~Matt

-- http://mail.python.org/mailman/listinfo/python-list
Re: Non-identifiers in dictionary keys for **expression syntax
On 05/23/2013 03:20 PM, Neil Cerutti wrote:
> On 2013-05-23, Matthew Gilson m.gils...@gmail.com wrote:
>> That's fine, but what is a keyword argument? According to the glossary
>> (http://docs.python.org/3.3/glossary.html):
>>
>> /keyword argument/: an argument preceded by an identifier (e.g. name=) in a
>> function call or passed as a value in a dictionary preceded by **.
>>
>> As far as I'm concerned, this leads to some ambiguity in whether the keys of
>> the mapping need to be valid identifiers or not.
>
> I don't see any ambiguity. A keyword argument is an argument preceded by an
> identifier according to the definition. Where are you perceiving wiggle room?

The wiggle room comes from the "or passed as a value in a dictionary" clause. We sort of get caught in an infinite loop there, because the stuff that can be passed in a dictionary is a keyword, which is an identifier=expression or something passed as a value in a dictionary ...

Also the fact that:

    func(**{'foo bar baz':1})

works even though `foo bar baz` isn't a valid identifier, but:

    func(**{1:3})

doesn't work.
Re: Non-identifiers in dictionary keys for **expression syntax
On 05/23/2013 04:52 PM, Terry Jan Reedy wrote:
> On 5/23/2013 2:52 PM, Matthew Gilson wrote:
>> This is a question regarding the documentation around dictionary unpacking.
>> The documentation for the call syntax
>> (http://docs.python.org/3/reference/expressions.html#grammar-token-call) says:
>>
>> If the syntax **expression appears in the function call, expression must
>> evaluate to a mapping, the contents of which are treated as additional
>> keyword arguments.
>>
>> That's fine, but what is a keyword argument? According to the glossary
>> (http://docs.python.org/3.3/glossary.html):
>>
>> /keyword argument/: an argument preceded by an identifier (e.g. name=) in a
>> function call or passed as a value in a dictionary preceded by **.
>
> It appears that the requirement has been relaxed (in the previous quote), so
> that 'dictionary' should also be changed to 'mapping'. It might not hurt to
> add 'The key for the value should be an identifier.'
>
>> As far as I'm concerned, this leads to some ambiguity in whether the keys of
>> the mapping need to be valid identifiers or not.
>
> I think you are being too lawyerly. The pretty clear and sensible implication
> is that the key for the value should be a string with a valid identifier. If
> it is anything else, you are on your own and deserve any joy or pain that
> results ;=)
>
>> Using Cpython, we can do the following:
>>
>>     def func(**kwargs):
>>         print kwargs
>>
>>     d = {'foo bar baz':3}
>>
>> So that might lead us to believe that the keys of the mapping do not need to
>> be valid identifiers.
>
> There are two ways to pass args to func to be gathered into kwargs; explicit
> key=val pairs and **mapping, or both.
>
>     func(a=1, b='hi', **{'foo bar baz':3})
>     # {'foo bar baz': 3, 'a': 1, 'b': 'hi'}
>
> So func should not expect anything other than identifier strings.
>
>> However, the previous function does not work with the following dictionary:
>>
>>     d = {1:3}
>>
>> because not all the keys are strings.
>
> So CPython checks that keys are strings, because that is cheap, but not that
> the strings are identifiers, because that would be more expensive. Just
> because an implementation allows something (omits a check) for efficiency
> does not mean you should do it.
>
>     globals()[1] = 1
>
> works, but is not normally very sensible or useful.
>
>> Is there a way to petition to get this more rigorously defined?
>
> bugs.python.org
>
> The problem is that mandating a rigorous check by implementations makes
> Python slower to the detriment of sensible programmers

To be clear, you're saying that

    func(**{'foo bar baz':3})

is not supported (officially), but it works in CPython because checking that every string in the dict is a valid identifier would be costly. Of course that is sensible, and I don't think the behaviour should be changed to the detriment of sensible programmers. However, it would be nice if it was documented somewhere that the above function call is something that a non-sensible programmer would do. Perhaps with a "CPython implementation detail" type of block.
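The distinction discussed in this thread can be reproduced with a short sketch. It is written for Python 3 (the thread's own snippets use the Python 2 `print` statement), and what it shows is CPython behavior rather than a language guarantee: string keys are accepted whether or not they are identifiers, while non-string keys raise `TypeError`.

```python
def func(**kwargs):
    return kwargs


# A string key that is not a valid identifier passes CPython's check.
accepted = func(a=1, **{'foo bar baz': 3})
print(accepted)

# A non-string key fails, because CPython does check that keys are strings.
try:
    func(**{1: 3})
except TypeError as exc:
    rejected = str(exc)
else:
    rejected = None
print(rejected)

# Callers who want the stricter rule can test keys themselves.
print('foo bar baz'.isidentifier())
```

Code that has to run on other implementations is safer restricting `**` mappings to keys for which `str.isidentifier()` is true.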
Re: Pythonic way to count sequences
A Counter is definitely the way to go about this. Just as a little more information, the example below can be simplified:

    from collections import Counter
    count = Counter(mylist)

With the other example, you could have achieved the same thing (and been backward compatible to Python 2.5) with:

    from collections import defaultdict
    count = defaultdict(int)
    for k in mylist:
        count[k] += 1

On 4/25/13 9:16 PM, Modulok wrote:
> On 4/25/13, Denis McMahon denismfmcma...@gmail.com wrote:
>> On Wed, 24 Apr 2013 22:05:52 -0700, CM wrote:
>>> I have to count the number of various two-digit sequences in a list such
>>> as this:
>>>
>>>     mylist = [(2,4), (2,4), (3,4), (4,5), (2,1)]
>>>     # (Here the (2,4) sequence appears 2 times.)
>>>
>>> and tally up the results, assigning each to a variable.
>
> ...
>
> Consider using the ``collections`` module::
>
>     from collections import Counter
>
>     mylist = [(2,4), (2,4), (3,4), (4,5), (2,1)]
>     count = Counter()
>     for k in mylist:
>         count[k] += 1
>
>     print(count)
>
>     # Output looks like this:
>     # Counter({(2, 4): 2, (4, 5): 1, (3, 4): 1, (2, 1): 1})
>
> You then have access to methods to return the most common items, etc. See
> more examples here:
>
> http://docs.python.org/3.3/library/collections.html#collections.Counter
>
> Good luck!
> -Modulok-
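As a quick check that the approaches in this thread agree, the following sketch builds the tally three ways and compares the results. The names `direct`, `looped` and `fallback` are illustrative only.

```python
from collections import Counter, defaultdict

mylist = [(2, 4), (2, 4), (3, 4), (4, 5), (2, 1)]

# 1. Let Counter consume the iterable directly.
direct = Counter(mylist)

# 2. Increment a Counter by hand, as in the quoted example.
looped = Counter()
for k in mylist:
    looped[k] += 1

# 3. The defaultdict fallback for interpreters without Counter.
fallback = defaultdict(int)
for k in mylist:
    fallback[k] += 1

print(dict(direct))
```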
Feature Request: `operator.not_in`
I believe that I read somewhere that this is the place to start discussions on feature requests, etc. Please let me know if this isn't the appropriate venue (and what the appropriate venue would be if you know).

This request has 2 related parts, but I think they can be considered separately:

1) It seems to me that the operator module should have a `not_in` or `not_contains` function. It seems asymmetric that there exists an `is_not` function which implements `x is not y` but there isn't a function to represent `x not in y`.

2) I suspect this one might be a little more controversial, but it seems to me that there should be a separate magic method bound to the `not in` operator. Currently, when inspecting the bytecode, it appears to me that `not x in y` is translated to `x not in y` (this supports item 1 slightly). However, I don't believe this should be the case. In python, `x < y` does not imply `not x >= y` because a custom object can do whatever it wants with `__ge__` and `__lt__` -- They don't have to fit the normal mathematical definitions. I don't see any reason why containment should behave differently. `x in y` shouldn't necessarily imply `not x not in y`. I'm not sure if `object` could have a default `__not_contains__` method (or whatever name seems most appropriate) implemented equivalently to:

    def __not_contains__(self, other):
        return not self.__contains__(other)

If not, it could probably be provided by something like `functools.total_ordering`. Anyway, it's food for thought and I'm interested to see if anyone else feels the same way that I do.

Thanks,
~Matt
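For part 1 of the request, a minimal sketch of what such a function could look like when written by hand. The name `not_in` is hypothetical (the `operator` module has no such function) and the argument order, item first and container second, is an assumption chosen to mirror the `x not in y` expression. Note that the existing `operator.contains(a, b)` takes the container first.

```python
import operator
from itertools import starmap


def not_in(a, b):
    """Hypothetical helper: equivalent to ``a not in b``."""
    # operator.contains(b, a) is ``a in b``: container first, item second.
    return not operator.contains(b, a)


pairs = [(1, [1, 2]), (3, [1, 2]), ('x', 'abc')]

# starmap unpacks each (item, container) pair into the two arguments.
results = list(starmap(not_in, pairs))
print(results)
```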
Re: Feature Request: `operator.not_in`
On 4/19/13 2:27 PM, Terry Jan Reedy wrote:
> On 4/19/2013 10:27 AM, Matthew Gilson wrote:
>> 1) It seems to me that the operator module should have a `not_in` or
>> `not_contains` function. It seems asymmetric that there exists a `is_not`
>> function which implements `x is not y` but there isn't a function to
>> represent `x not in y`.
>
> There is also no operator.in.

True. I'm not arguing that there should be ...

> There is operator.contains and operator.__contains__.

Thankfully :-)

> There is no operator.not_contains because there is no __not_contains__
> special method. (Your point two, which I disagree with.)

But there's also no special method `__is_not__`, yet there's a corresponding `is_not` in the operator module, so I don't really see that argument. It's a matter of functionality that I'm thinking about in the first part.

    itertools.starmap(operator.not_in, zip(x, y))

vs.

    itertools.starmap(lambda a, b: a not in b, zip(x, y))

Pretty much every other operator in python (that I can think of) has an analogous function in the operator module.

>> 2) I suspect this one might be a little more controversial, but it seems to
>> me that there should be a separate magic method bound to the `not in`
>> operator.
>
> The reference manual disagrees. The operator not in is defined to have the
> inverse true value of in.

I would still leave that as the default behavior. It's by far the most useful and commonly expected. And I suppose if you *can't* have default behavior like that because that is a special case in itself, then that makes this second part of the request dead in the water at the outset (and I can live with that explanation).

>> Currently, when inspecting the bytecode, it appears to me that `not x in y`
>> is translated to `x not in y` (this supports item 1 slightly). However, I
>> don't believe this should be the case. In python, `x < y` does not imply
>> `not x >= y` because a custom object can do whatever it wants with `__ge__`
>> and `__lt__` -- They don't have to fit the normal mathematical definitions.
>
> The reason for this is that the rich comparisons do not have to return
> boolean values, and do not for numarray arrays which, I believe, implement
> the operators itemwise.

Yes, you're correct about numpy arrays behaving that way. It can be very useful for indexing them.

It would also be fine for a special method `__not_contains__` to be expected to return a boolean value as well. It could still be very useful. Consider a finite square well from quantum mechanics. I could define `in` for my particle in the square well to return `True` if there is a 70% chance that it is located in the well (it's a wave-function, so it doesn't have a well-defined position -- the particle could be anywhere, but 7 out of 10 measurements will tell me it's in the well). It might be nice if I could define `not in` to be `True` if there is only a 30% chance that it is in the well. Of course, this leaves us with a no-man's land around the 50% mark. Is it in the well or not? There's no telling. I'm sure you could argue that this sort of thing *could* be done with rich comparisons, but I would consider that a deflection from the point at hand. It seems it should be up to the user to design the API most suited for their task. Or what about a `Fraternity` class. Are the new pledges in the fraternity or not? Maybe they should be considered neither in, nor out until pledge season is over.

>> I don't see any reason why containment should behave differently.
>
> 'Design by analogy' is tricky because analogies often leave out important
> details. __contains__ *is* expected to return true/false.
>
> object.__contains__(self, item)
> Called to implement membership test operators. Should return true if item
> is in self, false otherwise
>
> -- Terry Jan Reedy
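The point about `__contains__` can be seen in a small sketch: whatever the method returns, the `in` operator reduces it to `True` or `False`, and `not in` is always the exact inverse. The `Well` class and its 0.7 value are made up here, loosely following the square-well example above.

```python
class Well:
    def __contains__(self, item):
        # Return a "probability" instead of a bool; the ``in`` operator
        # still coerces the result with the usual truth test.
        return 0.7


w = Well()
inside = 'particle' in w        # truthy 0.7 becomes True
outside = 'particle' not in w   # always the negation of the line above

print(inside, outside)
print(w.__contains__('particle'))  # the raw value is only visible this way
```

Unlike the rich comparison operators, membership tests offer no way for an object to hand a non-boolean result back to the expression.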