Re: [Python-ideas] More user-friendly version for string.translate()
On Wed, Nov 2, 2016 at 12:02 PM, Mikhail V wrote:

> Actually even with ASCII (read for python 2.7) I would also be happy
> to have such function: say I just want to keep only digits so I write:
>
> digits = "0123456789"
> newstring = somestring.keep(digits)

well, with ascii, it's not too hard to make a translation table:

    digits = "0123456789"
    table = [(o if chr(o) in digits else None) for o in range(256)]
    s = "some stuff and some 456 23 numbers 888"

    s.translate(table)
    '45623888'

but then there is the defaultdict way:

    s.translate(defaultdict(lambda: None, {ord(c): c for c in digits}.items()))
    '45623888'

wasn't that easy? Granted, if you need to do this, you'd wrap it in a function like Chris A. suggested. But this really isn't easy or discoverable -- it took me a fair bit of fiddling to get right, and I knew I was looking for a defaultdict implementation.

Also:

    In [43]: table
    Out[43]: defaultdict(<function <lambda> at 0x...>, {48: '0', 49: '1', 50: '2', 51: '3', 52: '4', 53: '5', 54: '6', 55: '7', 56: '8', 57: '9'})

    In [44]: s.translate(table)
    Out[44]: '45623888'

    In [45]: table
    Out[45]: defaultdict(<function <lambda> at 0x...>, {32: None, 48: '0', 49: '1', 50: '2', 51: '3', 52: '4', 53: '5', 54: '6', 55: '7', 56: '8', 57: '9', 97: None, 98: None, 100: None, 101: None, 102: None, 109: None, 110: None, 111: None, 114: None, 115: None, 116: None, 117: None})

defaultdict puts an entry in for every ordinal checked -- this could get big -- granted, probably not a big deal with modern computer memory, but still...

it might even be worth making a NoneDict for this:

    class NoneDict(dict):
        """
        Dictionary implementation that always returns None when a key is not in
        the dict, rather than raising a KeyError
        """
        def __getitem__(self, key):
            try:
                val = dict.__getitem__(self, key)
            except KeyError:
                val = None
            return val

(see enclosed -- it works fine with translate)

(OK, that was fun, but no, not really that useful)

> Despite I can do it other way, this would be much simpler and clearer
> way to do it. And I suppose it is quite common task not only for me.

That's the key question -- is this a common task? If so, then while there are ways to do it, they're neither easy nor discoverable.

And while some of the guiding principles of this list are:

"not every two line function needs to be in the standard lib"

and

"put it up on PyPI, and see if a lot of people find it useful"

It's actually kind of silly to put a single function up as a PyPI package -- and I doubt many people would find it if you did.

-CHB

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR              (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA 98115        (206) 526-6317   main reception

chris.bar...@noaa.gov

    class NoneDict(dict):
        """
        Dictionary implementation that always returns None when a key is not in
        the dict, rather than raising a KeyError
        """
        def __getitem__(self, key):
            try:
                val = dict.__getitem__(self, key)
            except KeyError:
                val = None
            return val


    def test_basic():
        '''
        some simple tests
        '''
        test_dict = {'a': 23, 'b': 45, }
        arbitrary_keys = [23, 'that', (1, 2, 3)]

        d = NoneDict(test_dict)

        # do the ones in there work:
        for key, val in test_dict.items():
            print("trying:", key, val)
            print(d[key])
            assert d[key] == val

        for key in arbitrary_keys:
            print("trying:", key)
            assert d[key] is None

        # did the dict length change?
        assert len(d) == len(test_dict)

        print("all tests pass")


    def test_with_translate():
        """
        tests using a NoneDict with str.translate()
        """
        digits = "0123456789"
        # make a translate table:
        table = NoneDict({ord(c): c for c in digits})

        s = "some stuff and some 456 23 numbers 888"
        final = s.translate(table)

        for c in final:
            assert c in digits
        print("it works with translate")


    if __name__ == "__main__":
        test_basic()
        test_with_translate()

___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
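The growth shown in Out[45] above comes from defaultdict storing a value for every missing key it is asked about. A minimal sketch of an alternative, assuming only the documented `dict.__missing__` hook: a dict subclass whose `__missing__` returns None without storing anything. The class name `KeepTable` is hypothetical; it is neither from this thread nor from the standard library.

```python
class KeepTable(dict):
    """Translation table that keeps the given characters and drops the rest.

    Hypothetical helper for illustration only.
    """

    def __init__(self, chars):
        # Map the ordinal of every kept character to the character itself.
        super().__init__((ord(c), c) for c in chars)

    def __missing__(self, key):
        # dict.__getitem__ calls this for absent keys. Returning None tells
        # str.translate() to delete the character; nothing is stored.
        return None


table = KeepTable("0123456789")
s = "some stuff and some 456 23 numbers 888"
result = s.translate(table)
print(result)
print(len(table))  # still one entry per kept character
```

Compared with NoneDict, this leaves `__getitem__` alone and only hooks the missing-key path, which is the case the table actually needs to change.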
Re: [Python-ideas] More user-friendly version for string.translate()
On 27 October 2016 at 00:17, Chris Barker wrote:

> 1) an easy way to spell "remove all the characters other than these"
>
> I think that's a good idea. What with unicode having an enormous number
> of code points, it really does make sense to have a way to specify
> only what you want, rather than what you don't want.
>
> Back in the good old days of 1-byte chars, it wasn't hard to build up
> a full 256 element translate table -- not so much anymore.
> And one of the whole points of str.translate() is good performance.

Actually even with ASCII (read for python 2.7) I would also be happy to have such function: say I just want to keep only digits so I write:

    digits = "0123456789"
    newstring = somestring.keep(digits)

Despite I can do it other way, this would be much simpler and clearer way to do it. And I suppose it is quite common task not only for me.

Currently 99% of my programs are in python 2.7. And I started to use python 3 only for tasks when I want to process unicode strings (ironically only to get rid of unicode).

Mikhail
Re: [Python-ideas] More user-friendly version for string.translate()
On Tue, Nov 1, 2016 at 12:15 AM, Stephen J. Turnbull <turnbull.stephen...@u.tsukuba.ac.jp> wrote:

> > pretty slick -- but any hope of it being as fast as a C implemented
> > method?
>
> I would expect not in CPython, but if "fast" matters, why are you
> using CPython rather than PyPy or Cython?

oh come on!

> If it matters *that* much,
> you can afford to write your own C implementation.

This is about a possible addition to the stdlib -- me writing my own C implementation has nothing to do with it.

> But I doubt that
> fast matters "that much" often enough to be worth maintaining yet
> another string method in Python.

This could be said about every string method in Python -- I understand that every addition is more code to maintain. But somehow we are adding all kinds of stuff like yet another string formatting method, talking about null coalescing operators and who knows what -- those are all a MUCH larger burden -- not just for maintaining the interpreter, but for everyone using python having more to remember and understand.

On the other hand, powerful and performant string methods are a major plus for Python -- a good reason to use it over Perl :-)

So a new one that provides, as I wrote before:

> > 1) single method call to do a common thing
> >
> > 2) nice fast, pure C performance

would fit right in to Python, and indeed, would be a similar implementation to existing methods -- so the maintenance burden would be a small addition (i.e. if the internal representation for strings changed, all those methods would need re-visiting and similar changes)

So the only key question is -- is this a common enough use case?

> > so I think a "keep these" method would help with both of these
> > goals.
>
> Sure, but the translate method already gives you that, and a lot more.

yes but only with the fairly esoteric use of defaultdict.

which brings me back to the above:

1) single method call to do a common thing

the nice thing about a single method call is discoverability -- no newbie is going to figure out the .translate + defaultdict approach.

> Note that when you're talking about working with Unicode characters,
> no natural language activity I can imagine (not even translating
> Buddhist texts, which involves a couple of Indian scripts as well as
> Han ideographs) uses more than a fraction of defined characters.

which is why you may want to remove all the others :-)

> So really translate with defaultdict is a specialized loop that
> marries an algorithmic body (which could do things like look up the
> original script or other character properties to decide on the
> replacement for the generic case) with a (usually "small") table of
> exceptions. That seems like inspired design to me.

indeed -- .translate() itself is remarkably flexible -- you could even pass in a custom class that does all sorts of logic. and adding the defaultdict is an easy way to add a useful feature. But again, advanced usage and not very discoverable. Maybe that means we need some more docs and/or perhaps recipes instead.

Anyway, I joined this thread to clarify what might be on the table -- but while I think it's a good idea, I don't have the bandwidth to move it through the process -- so unless someone steps up that does, we're done.

-CHB
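The remark above that a custom class can be passed to .translate() can be sketched as follows. This is a minimal illustration, assuming only that str.translate() accepts any object with a `__getitem__` that takes an ordinal. The names `FunctionTable` and `upper_letters_only` are hypothetical and not part of the thread or of Python. The cache means the function is called once per distinct code point, not once per character.

```python
class FunctionTable:
    """Adapt a per-character function to the str.translate() table protocol.

    Hypothetical helper for illustration only.
    """

    def __init__(self, func):
        self._func = func
        self._cache = {}  # ordinal -> replacement (str or None)

    def __getitem__(self, codepoint):
        # str.translate() looks characters up by ordinal.
        try:
            return self._cache[codepoint]
        except KeyError:
            result = self._cache[codepoint] = self._func(chr(codepoint))
            return result


def upper_letters_only(c):
    # Upper-case letters; returning None asks translate() to drop the rest.
    return c.upper() if c.isalpha() else None


table = FunctionTable(upper_letters_only)
result = "abc 123 def".translate(table)
print(result)
```

The arbitrary logic lives in an ordinary function, and the table object stays a thin adapter around it.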
Re: [Python-ideas] More user-friendly version for string.translate()
On Fri, Oct 28, 2016 at 7:28 AM, Terry Reedy wrote:

> >>> s = 'kjskljkxcvnalsfjaweirKJZknzsnlkjsvnskjszsdscccjasfdjf'
> >>> s2 = ''.join(c for c in s if c in set('abc'))

pretty slick -- but any hope of it being as fast as a C implemented method?

for example, with a 1000 char string:

    In [59]: %timeit string.translate(table)
    10 loops, best of 3: 3.62 µs per loop

    In [60]: %timeit ''.join(c for c in string if c in set(letters))
    1000 loops, best of 3: 1.14 ms per loop

so the translate() method is about 300 times faster in this case. (and it used a defaultdict with a None factory, which is probably a bit slower than a pure C implementation might be.)

I've always figured that Python's rich string methods provided two things:

1) single method call to do common things

2) nice fast, pure C performance

so I think a "keep these" method would help with both of these goals.

-CHB
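A rough way to rerun the comparison above outside IPython, using the standard `timeit` module. This is only a sketch: the test string, the names `with_translate` and `with_join`, and the iteration count are assumptions, and absolute timings will differ by machine and Python version. Unlike the `In [60]` line, the set is built once outside the generator expression, which removes one avoidable cost from the pure-Python side.

```python
import timeit
from collections import defaultdict

letters = "abc"
# About 1000 characters of sample text (assumed input, not from the thread).
string = "kjskljkxcvnalsfjaweirKJZknzsnlkjsvnskjszsdscccjasfdjf" * 20

# Keep the listed letters; the None default makes translate() drop the rest.
table = defaultdict(lambda: None, {ord(c): c for c in letters})
keep_set = set(letters)  # built once, not once per character


def with_translate():
    return string.translate(table)


def with_join():
    return "".join(c for c in string if c in keep_set)


same = with_translate() == with_join()
t_translate = timeit.timeit(with_translate, number=1000)
t_join = timeit.timeit(with_join, number=1000)
print("results equal:", same)
print("translate: %.4f s, join: %.4f s" % (t_translate, t_join))
```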
Re: [Python-ideas] More user-friendly version for string.translate()
On Sat, Oct 29, 2016 at 1:28 AM, Terry Reedy wrote:

> If one has a translation dictionary d, use that twice in the genexp.
>
> >>> d = {'a': '1', 'b': '3x', 'c': 'fum'}
> >>> ''.join(d[c] for c in s if c in d.keys())
> 'fum11fumfumfum1'

Trivial change:

    >>> ''.join(d[c] for c in s if c in d)
    'fum11fumfumfum1'

ChrisA
Re: [Python-ideas] More user-friendly version for string.translate()
On 10/26/2016 6:17 PM, Chris Barker wrote:

> I've lost track of what (if anything) is actually being proposed here...
> so I'm going to try a quick summary:
>
> 1) an easy way to spell "remove all the characters other than these"

In other words, 'only keep these'. We already have easy ways to create filtered strings.

    >>> s = 'kjskljkxcvnalsfjaweirKJZknzsnlkjsvnskjszsdscccjasfdjf'
    >>> s2 = ''.join(c for c in s if c in set('abc'))
    >>> s2
    'caaccca'
    >>> s3 = ''.join(filter(lambda c: c in set('abc'), s))
    >>> s3
    'caaccca'

I expect the first to be a bit faster. Either can be wrapped in a keep() function. If one has a translation dictionary d, use that twice in the genexp.

    >>> d = {'a': '1', 'b': '3x', 'c': 'fum'}
    >>> ''.join(d[c] for c in s if c in d.keys())
    'fum11fumfumfum1'

--
Terry Jan Reedy
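The keep() wrapper mentioned above might look like the sketch below. The function names `keep` and `translate_keep` are illustrative; the thread names keep() but does not define it. Building the set once, outside the generator expression, avoids recreating it for every character.

```python
def keep(s, chars):
    """Return s with every character not in chars removed."""
    allowed = set(chars)  # built once for fast membership tests
    return "".join(c for c in s if c in allowed)


def translate_keep(s, mapping):
    """Replace characters found in mapping and drop all the others."""
    return "".join(mapping[c] for c in s if c in mapping)


s = "kjskljkxcvnalsfjaweirKJZknzsnlkjsvnskjszsdscccjasfdjf"
d = {"a": "1", "b": "3x", "c": "fum"}
kept = keep(s, "abc")
translated = translate_keep(s, d)
print(kept)
print(translated)
```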
Re: [Python-ideas] More user-friendly version for string.translate()
> > return string.translate(collections.defaultdict(lambda: None, **table))

Nice! I forgot about defaultdict -- so this just needs a recipe somewhere -- maybe even in the docs for str.translate.

BTW, great use case for defaultdict -- I had been wondering what the point was, given that a regular dict has .setdefault

-CHB
Re: [Python-ideas] More user-friendly version for string.translate()
On Thu, Oct 27, 2016 at 8:48 AM, Mikhail V wrote:

> On 26 October 2016 at 20:58, Stephen J. Turnbull wrote:
> > import collections
> > def translate_or_drop(string, table):
> >     """
> >     string: a string to process
> >     table: a dict as accepted by str.translate
> >     """
> >     return string.translate(collections.defaultdict(lambda: None, **table))
> >
> > All OK now?
>
> Not really. I tried with a simple example
>
>     intab = "ae"
>     outtab = "XM"
>     table = string.maketrans(intab, outtab)
>     collections.defaultdict(lambda: None, **table)
>
> and this gives me
> TypeError: type object argument after ** must be a mapping, not str
>
> But probably I misunderstood the idea.

You're 99% of the way to understanding it. Try the exercise again in Python 3. You don't have string.maketrans (which creates a 256-byte translation mapping) - instead, you use a dictionary.

ChrisA
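Repeating the exercise in Python 3 might look like the sketch below. One caveat: `str.maketrans` returns a dict keyed by integer ordinals, and `**table` cannot unpack integer keys because keyword names must be strings. The sketch therefore passes the table positionally. That change is an assumption about the intent of the quoted function, not something stated in the thread.

```python
import collections


def translate_or_drop(string, table):
    """Translate characters found in table and drop every other character.

    string: a string to process
    table: a dict as accepted by str.translate (ordinal -> replacement)
    """
    # Pass the mapping positionally; ** would fail on the int keys.
    return string.translate(collections.defaultdict(lambda: None, table))


intab = "ae"
outtab = "XM"
table = str.maketrans(intab, outtab)  # {97: 88, 101: 77}
result = translate_or_drop("apple extra", table)
print(result)
```

Only "a" and "e" are in the table, so every other character, including the space, is dropped.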
Re: [Python-ideas] More user-friendly version for string.translate()
On Wed, Oct 26, 2016 at 5:32 PM, Mikhail V wrote:

> > (b) has the advantage of adding translation and removal in one fell
> > swoop -- but if you only want to remove, then you have to make a
> > translation table of 1:1 mappings = not hard, but annoying:
>
> Exactly that is the proposal. And for same exact reason that you point out,
> I also can't give a comment what would be better. It would be indeed
> quite strange from syntactical POV if I just want to remove "all except"
> and must call translate(). So ideally both should exist I think.

That kind of violates OWTDI though. Probably one's enough.

and in fact with the use-cases I can think of, and the one you mentioned, they are really two steps: there are the characters you want to translate, and the ones you want to keep, but the ones you want to keep are a superset of the ones you want to translate.

so if we added the "remove" option to .translate(), then you would need to add all the "keep" characters to your translate table.

I'm thinking they really are different operations, give them a different method.

-CHB
Re: [Python-ideas] More user-friendly version for string.translate()
On Wed, Oct 26, 2016 at 3:48 PM, MRAB wrote:

> > str.replace( ("aaa", "a", "b"), ("b", "bbb", "a") )
> >
> > and all sort of other complications!
>
> 2) Check from the longest to the shortest.
>
> If you're going to pick choice 2, does it have to be 2 tuples/lists? Why
> not a dict instead?

then we have a string.translate() that accepts a table of string replacements, rather than individual character replacements -- maybe a good idea!

-CHB
Re: [Python-ideas] More user-friendly version for string.translate()
On 27 October 2016 at 00:17, Chris Barker wrote:

> I've lost track of what (if anything) is actually being proposed here... so
> I'm going to try a quick summary:
>
> 1) an easy way to spell "remove all the characters other than these"
>
> I think that's a good idea. What with unicode having an enormous number of
> code points, it really does make sense to have a way to specify only what
> you want, rather than what you don't want.
>
> Back in the good old days of 1-byte chars, it wasn't hard to build up a full
> 256 element translate table -- not so much anymore. And one of the whole
> points of str.translate() is good performance.
>
> a) a new method:
>
>     str.remove_all_but(sequence_of_chars)
>
> (naming TBD)
>
> b) a new flag in translate (kind of like the decode keywords)
>
>     str.translate(table, missing='ignore'|'remove')
>
> (b) has the advantage of adding translation and removal in one fell swoop --
> but if you only want to remove, then you have to make a translation table of
> 1:1 mappings = not hard, but annoying:

Exactly that is the proposal. And for same exact reason that you point out, I also can't give a comment what would be better. It would be indeed quite strange from syntactical POV if I just want to remove "all except" and must call translate(). So ideally both should exist I think.

Mikhail
Re: [Python-ideas] More user-friendly version for string.translate()
On 2016-10-26 23:17, Chris Barker wrote:

> I've lost track of what (if anything) is actually being proposed here...
> so I'm going to try a quick summary:
>
> 1) an easy way to spell "remove all the characters other than these"
>
> I think that's a good idea. What with unicode having an enormous number
> of code points, it really does make sense to have a way to specify only
> what you want, rather than what you don't want.
>
> Back in the good old days of 1-byte chars, it wasn't hard to build up a
> full 256 element translate table -- not so much anymore. And one of the
> whole points of str.translate() is good performance.
>
> a) a new method:
>
>     str.remove_all_but(sequence_of_chars)
>
> (naming TBD)
>
> b) a new flag in translate (kind of like the decode keywords)
>
>     str.translate(table, missing='ignore'|'remove')

c) pass a function that returns the replacement:

    def replace(c):
        return c.upper() if c.isalpha() else ''

    str.translate(replace)

The replacement function could be called only on distinct codepoints.

> (b) has the advantage of adding translation and removal in one fell
> swoop -- but if you only want to remove, then you have to make a
> translation table of 1:1 mappings = not hard, but annoying:
>
>     table = {c: c for c in sequence_of_chars}
>
> I'm on the fence about what I personally prefer.
>
> 2) (in another thread, but similar enough) being able to pass in more
> than one string to replace:
>
>     str.replace( old=seq_of_strings, new=seq_of_strings )
>
> I know I've wanted this a lot, and certainly from a performance
> perspective, it could be a nice bonus.
>
> But: It overlaps a lot with str.translate -- at least for single
> character replacements. so really why? so it would really only make
> sense if it supported multi-char strings:
>
>     str.replace(old=("aword", "another_word"),
>                 new=("something", "something else"))
>
> However: a string IS a sequence of strings, so we'd have confusion
> about that:
>
>     str.replace("this", "four")
>
> Does the user want the word "this" replaced with the word "four" -- or
> do they want each character replaced?
>
> Maybe we'd need a .replace_many() method? ugh!
>
> There are also other issues with what to do with repeated / overlapping
> characters:
>
>     str.replace( ("aaa", "a", "b"), ("b", "bbb", "a") )
>
> and all sort of other complications!

Possible choices are:

1) Use the given order.

2) Check from the longest to the shortest.

If you're going to pick choice 2, does it have to be 2 tuples/lists? Why not a dict instead?

> THAT I think could be nailed down by defining the "order of operations".
> Does it loop through the entire string for each item? or through each
> item for each point in the string? note that if you loop through the
> entire string for each item, you might as well have written the loop
> yourself:
>
>     for old, new in zip(old_list, new_list):
>         s = s.replace(old, new)
>
> and at least if the length of the string is long-ish, and the number of
> replacements short-ish -- performance would be fine.
>
> ***
>
> So the question is -- is there support for these enhancements? If so,
> then it would be worth hashing out the details.
>
> But the next question is -- does anyone care enough to manage that
> process -- it'll be a lot of work!
>
> NOTE: there has also been a fair bit of discussion in this thread about
> ordinals vs characters, and unicode itself -- I don't think any of that
> resulted in any possible proposals...

[snip]
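Choice 2 with a dict, as suggested above, can be approximated today with the `re` module. This is a minimal sketch, not a proposed implementation: the name `replace_many` is borrowed from the hypothetical method mentioned in the quoted text, and the sketch assumes every key is a non-empty string. Longer keys are tried first at each position, and the text is scanned once, so replaced text is never replaced again.

```python
import re


def replace_many(text, replacements):
    """Replace every key of replacements found in text, in a single pass.

    Hypothetical helper; assumes all keys are non-empty strings.
    """
    if not replacements:
        return text
    # Longest keys first, so "aaa" wins over "a" at the same position.
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: replacements[m.group(0)], text)


mapping = {"aaa": "b", "a": "bbb", "b": "a"}
single_pass = replace_many("aaa a b", mapping)

# For contrast: the hand-written loop re-replaces earlier results.
sequential = "aaa a b"
for old, new in mapping.items():
    sequential = sequential.replace(old, new)

print(single_pass)
print(sequential)
```

The two results differ, which illustrates why the "order of operations" question in the quoted text matters.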
Re: [Python-ideas] More user-friendly version for string.translate()
On 26 October 2016 at 20:58, Stephen J. Turnbull wrote:

> import collections
> def translate_or_drop(string, table):
>     """
>     string: a string to process
>     table: a dict as accepted by str.translate
>     """
>     return string.translate(collections.defaultdict(lambda: None, **table))
>
> All OK now?

Not really. I tried with a simple example

    intab = "ae"
    outtab = "XM"
    table = string.maketrans(intab, outtab)
    collections.defaultdict(lambda: None, **table)

and this gives me
TypeError: type object argument after ** must be a mapping, not str

But probably I misunderstood the idea. Anyway this code does not make much sense to me, I would never in life understand what is meant here. And in my not so big, but not so small, Python experience I *never* had an occasion using collections or lambda.

> sets as a single, universal character set. As it happens, although
> there are differences of opinion over how to handle Unicode in Python,
> there is consensus that Python does have to handle Unicode flexibly,
> effectively and efficiently.

I was merely talking about syntax and source files standard, not about unicode strings. No doubt one needs some way to store different glyph sets.

So I was talking about that if one defines a syntax and has good intentions for readability in mind, there is not so many rationale to adapt the syntax to the current "hybrid" system: 7-bit and/or multibyte paradigm. Again this is a too far going discussion, but one should probably not look much ahead on those. The situation is not so good in this sense that most standard software is attached to this strange paradigm (even those which do not have anything to do with multi-lingual typography). So IMO something has gone wrong with those standard characters.

> If you insist on bucking it, you'll
> have to do it pretty much alone, perhaps even maintaining your own
> fork of Python.

As for me I would take the path of developing my own IDE which will enable typographic quality rendering and of course all useful glyphs, such as curly quotes, bullets, etc, which all is fundamental to any possible improvements of cognitive qualities of code. And I'll stay in 8-bit boundaries, that's for sure. So if Python will take the path of "unicode" code input (e.g. for some punctuation characters) this would only add a minor issue for generating valid Python source files in this case.

Mikhail
Re: [Python-ideas] More user-friendly version for string.translate()
Mikhail V writes:

> > That said, multiple methods is a valid option for the API.
>
> Certainly I like the look of distinct functions more.
> It allows me to visually parse the code effectively,
> so e.g. for str.remove() I would not need to look
> in docs to understand what the function does.

OK, as I said, you're in accord with Guido on that. His rationale is somewhat different, but that's OK.

> Just in some cases I need to convert them to numpy arrays back and
> forth, so this unicode vanity worries me a bit.

I think you're borrowing trouble you actually don't have. Either way, the rest of the world *needs* Unicode to do their work, and it's not going to go away.

On the positive side, turning a string into a list of codepoints is trivial:

    [ord(c) for c in string]

> So I am just not the one who believes in these maximalistical "we
> need over 9000 glyphs" talks.

But you don't need to believe in it. What you do need to believe is that the rest of us believe that we need the union of our character sets as a single, universal character set. As it happens, although there are differences of opinion over how to handle Unicode in Python, there is consensus that Python does have to handle Unicode flexibly, effectively and efficiently.

Believe me, it *is* a consensus. If you insist on bucking it, you'll have to do it pretty much alone, perhaps even maintaining your own fork of Python.
Re: [Python-ideas] More user-friendly version for string.translate()
Mikhail V writes:

> I need translate() which drops non-defined chars. Please :)

    import collections
    def translate_or_drop(string, table):
        """
        string: a string to process
        table: a dict as accepted by str.translate
        """
        return string.translate(collections.defaultdict(lambda: None, **table))

All OK now?
Re: [Python-ideas] More user-friendly version for string.translate()
On Wed, Oct 26, 2016 at 04:29:13AM +0200, Mikhail V wrote:

> I need translate() which drops non-defined chars. Please :)
> No optimisation, no new syntax. deal?

I still wonder whether this might be worth introducing as a new string method, or an option to translate.

But the earliest that will happen is Python 3.7, so in the meantime, something like this should be enough:

    # untested
    keep = "abcdßαβπд∞"
    text = "..."
    # Find all the characters in text that are not in keep:
    delchars = set(text) - set(keep)
    delchars = ''.join(delchars)
    text = text.translate(str.maketrans("", "", delchars))

--
Steve
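The snippet above is marked untested. Wrapped in a function, the same approach can be checked directly. The function name `keep_only` and the sample text are assumptions made for this sketch. It relies on the three-argument form of `str.maketrans`, whose third argument lists characters to delete.

```python
def keep_only(text, keep):
    """Return text with every character not listed in keep removed."""
    # Characters that occur in text but are not wanted:
    delchars = "".join(set(text) - set(keep))
    # The third argument of str.maketrans lists characters to delete.
    return text.translate(str.maketrans("", "", delchars))


keep = "abcdßαβπд∞"
result = keep_only("a∞xyzπ!b", keep)
print(result)
```

The table holds only the characters that actually occur in the text, so its size is bounded by the input and not by the size of Unicode.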
Re: [Python-ideas] More user-friendly version for string.translate()
On 26 October 2016 at 03:40, Steven D'Aprano wrote:

> in a "table.txt" file, and typing:
>
> {
> 123: 456,
> 124: 457,
> 125: 458,
> # two hundred more lines
> }
>
> in a "table.py" file? The difference is insignificant. And the Python
> version can be cleaned up:

Ok, you have opened my eyes here. Thank you, you're good.

> [...]
>> Motivation is that those can be optimised for speed
>
> That's not a motivation. Why are you talking about "optimizing for
> speed" functions that we have not yet established are needed?
>
> That reminds me of a story I once heard of somebody who was driving
> across the desert in the US once. One of his passengers noticed the
> highway signs and said "Wait, aren't we going the wrong way?" The driver
> replied "Who cares, we're making fantastic time!"
>
> Optimizing a function you don't need is not an optimization. It is a
> waste of time.

Making good time is important indeed!

I need translate() which drops non-defined chars. Please :)
No optimisation, no new syntax. deal?

Mikhail
Re: [Python-ideas] More user-friendly version for string.translate()
On Tue, Oct 25, 2016 at 05:15:58PM +0200, Mikhail V wrote:

[...]
> >Or it can take a mapping (usually a dict) that maps either characters or
> >ordinal numbers to a new string (not just a single character, but an
> >arbitrary string) or ordinal numbers.
> >
> >str.maketrans({'a': 'A', 98: 66, 0x63: 0x43})
> >
> >(or None, to delete them). Note the flexibility: you don't need to
>
> Good. But of course if I do it with big tables, I would anyway
> need to parse them from some table file. Typing all values
> direct in code is not a comfortable way.

Why not? What is the difference between typing

    123: 456
    124: 457
    125: 458
    # two hundred more lines

in a "table.txt" file, and typing:

    {
    123: 456,
    124: 457,
    125: 458,
    # two hundred more lines
    }

in a "table.py" file? The difference is insignificant. And the Python version can be cleaned up:

    for i in range(123, 333):
        table[i] = 456 - 123 + i

Not all data should be written as code, especially if you expect unskilled users to edit it, but generating data directly in code is a very powerful technique, and the strict syntax of the programming language helps prevent some errors.

[...]
> Motivation is that those can be optimised for speed

That's not a motivation. Why are you talking about "optimizing for speed" functions that we have not yet established are needed?

That reminds me of a story I once heard of somebody who was driving across the desert in the US once. One of his passengers noticed the highway signs and said "Wait, aren't we going the wrong way?" The driver replied "Who cares, we're making fantastic time!"

Optimizing a function you don't need is not an optimization. It is a waste of time.

--
Steve
Re: [Python-ideas] More user-friendly version for string.translate()
On 25 October 2016 at 19:10, Stephen J. Turnbull wrote:

> So my previous thought on it was, that there could be set of such functions:
>
> str.translate_keep(table) - this is current translate, namely keeps
> non-defined chars untouched
> str.translate_drop(table) - all the same, but dropping non-defined chars
>
> Probably also a pair of functions without translation:
> str.remove(chars) - removes given chars
> str.keep(chars) - removes all, except chars
>
> Motivation is that those can be optimised for speed and I suppose those
> can work faster than re.sub().

> That said, multiple methods is a valid option for the API. Eg, Guido
> generally prefers that distinctions that can't be made on type of
> arguments (such as translate_keep vs translate_drop) be done by giving
> different names rather than a flag argument. Do you *like* this API,
> or was this motivated primarily by the possibilities you see for
> optimization?

Certainly I like the look of distinct functions more. It allows me to visually parse the code effectively, so e.g. for str.remove() I would not need to look in docs to understand what the function does. It has its downside of course, since new definitions can accidentally be similar to current ones, so the more names, the higher the probability that no good names are left. Speed is not so important for majority of cases, at least for my current tasks. However if I'll need to process very large texts (seems like I will), speed will be more important.

> The width is constant for any given string. However, I don't see at
> this point that you'll need more than the functions available in
> Python already, plus one or more wrappers to marshal the information
> your API accepts to the data that str.translate wants.

Just in some cases I need to convert them to numpy arrays back and forth, so this unicode vanity worries me a bit. But I cannot clearly explain why exactly I need this.

> >> but as said I don't like very much the idea and would be OK for me to
> >> use numeric values only.
>
> Yeah I am strange. This however gives you guarantee for any environment that you
> can see and input them and save the work in ASCII.

> This is not going to be a problem if you're running Python and can
> enter the program and digits. In any case, the API is going to have
> to be convenient for all the people who expect that they will never
> again be reduced to a hex keypad and 7-segment display

Here I will dare to make a lyrical digression again. It could have made an impression that I am stuck in nineties or something. But that is not the case. In nineties I used the PC mostly to play Duke Nukem (yeh big times!). And all the more I hadn't any idea what is efficiency of information representation and readability. Now I kind of realize it. So I am just not the one who believes in these maximalistical "we need over 9000 glyphs" talks. And, somewhat prophetic view on this: with the coming of the cyber era this all will be flushed so fast, that all this diligence around unicode could look funny actually. And a hex keypad will not sound "retro" but "brand new".

In other words: I feel really strongly that nothing besides standard characters must appear in code sources. If one wants to process unicode, then parse them as resources. So please, at least out of respect to the rationally minded, don't make code look like a christmas-tree.

BTW, I use VIM to code actually so anyway I will not see them in my code.

Mikhail
Re: [Python-ideas] More user-friendly version for string.translate()
On Mon, Oct 24, 2016 at 07:39:16PM +0200, Mikhail V wrote:

> Hello all,
>
> I would be happy to see a somewhat more general and user friendly
> version of string.translate function.
> It could work this way:
> string.newtranslate(file_with_table, Drop=True, Dec=True)

That's an interesting concept for "user friendly". Apart from functions that are actually designed to read files of a particular format, can you think of any built-in functions that take a file as argument?

This is how you would use this "user friendly version of translate":

    path = '/tmp/table'  # hope no other program is using it...
    with open(path, 'w') as f:
        f.write('97{65}\n')
        f.write('98{66}\n')
        f.write('99{67}\n')

    with open(path, 'r') as f:
        new_string = old_string.newtranslate(f, False, True)

Compared to the existing solution:

    new_string = old_string.translate(str.maketrans('abc', 'ABC'))

Mikhail, I appreciate that you have many ideas and want to share them, but try to think about how those ideas would work. The Python standard library is full of really well-designed programming interfaces. You can learn a lot by thinking "what existing function is this like? how does that existing function work?".

str.translate and str.maketrans already exist. Look at how maketrans builds a translation table: it can take two equal length strings, and map characters in one to the equivalent character in the other:

    str.maketrans('abc', 'ABC')

Or it can take a mapping (usually a dict) that maps either characters or ordinal numbers to a new string (not just a single character, but an arbitrary string) or ordinal numbers (or None, to delete them):

    str.maketrans({'a': 'A', 98: 66, 0x63: 0x43})

Note the flexibility: you don't need to specify ahead of time whether you are specifying the ordinal value as a decimal, hex, octal or binary value. Any expression that evaluates to a string or an int within the legal range is valid.

That's a good programming interface. Could it be better? Perhaps.
I've suggested that maybe translate could automatically call maketrans if given more than one argument. Maybe there's an easier way to just delete unwanted characters. Perhaps there could be a way to say "any character not in the translation table should be dropped". These are interesting questions.

> Further thoughts: for 8-bit strings this should be simple to implement
> I think.

I doubt that these new features will be added to bytes as well as strings. For 8-bit byte strings, it is easy enough to generate your own translation and deletion tables -- there are only 256 values to consider.

> For 16-bit of course
> there is issue of memory usage for lookup tables, but the gurus could
> probably optimise it.

There are no 16-bit strings. Unicode is a 21-bit encoding, usually encoded as either a fixed-width sequence of 4-byte code units (UTF-32) or a variable-width sequence of 2-byte (UTF-16) or 1-byte (UTF-8) code units. But it absolutely is not a "16-bit string".

[...]

> but as said I don't like very much the idea and would be OK for me to
> use numeric values only.

I think you are very possibly the only Python programmer in the world who thinks that writing decimal ordinal values is more user-friendly than writing the actual character itself. I know I would much rather see $, π or ╔ than 36, 960 or 9556.

--
Steve
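As a small illustration of the maketrans forms described in this message, here is a sketch using only str.maketrans and str.translate as they exist in Python 3. The sample strings are made up for the example.

```python
# Two equal-length strings: each character maps to its counterpart.
table = str.maketrans("abc", "ABC")
print("aabbcc xyz".translate(table))  # AABBCC xyz

# A mapping: keys are single characters or ordinals; values are
# ordinals, strings of any length, or None to delete the character.
table = str.maketrans({"a": "A", 98: 66, 0x63: None})
print("abcabc".translate(table))      # ABAB

# Three-argument form: the third string lists characters to delete.
table = str.maketrans("", "", "0123456789")
print("room 101".translate(table))    # prints "room " (digits removed)
```

In every form, characters that are absent from the table pass through unchanged, which is exactly the behaviour the thread is asking for an alternative to.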
Re: [Python-ideas] More user-friendly version for string.translate()
> Just a pair of usage cases which I was facing in my practice:

> So I just define a table like:
> {
> 1072: 97
> 1073: 98
> 1074: 99
> ...
> [which localizes Cyrillic into ASCII]
> ...
> 97:97
> 98:98
> 99:99
> ...
> [those chars that are OK, leave them]
> }
>
> Then I use os.walk() and os.rename() and voila! the file system
> regains its virginity
> in one simple script.

This sounds like a perfect use case for str.translate() as it is.

> 2. Say I have a multi-lingual file or whatever, I want to filter out
> some unwanted
> characters so I can do it similarly.

Filtering out is different -- but I would think that you would want to replace, rather than remove.

If you wanted names to all comply with a given encoding (ascii or Latin-1, or...), then encoding/decoding (with errors set to replace) would do nicely.

-CHB
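A minimal sketch of the encode/decode round trip mentioned above, assuming the target is plain ASCII. The sample name is invented; the Cyrillic letters are written as escapes only so the example does not depend on the encoding of this message.

```python
name = "\u0444\u0430\u0439\u043b.txt"   # four Cyrillic letters + ".txt"

# errors="replace" substitutes "?" for every character the codec
# cannot represent; errors="ignore" drops those characters instead.
replaced = name.encode("ascii", errors="replace").decode("ascii")
dropped = name.encode("ascii", errors="ignore").decode("ascii")

print(replaced)  # ????.txt
print(dropped)   # .txt
```

Note that this only replaces or removes; it cannot map a character to a chosen ASCII equivalent the way a translate() table can.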
Re: [Python-ideas] More user-friendly version for string.translate()
>> >>> re.sub('[^0-9]', '', 'ab0c2m3g5')
>> '0235'
>>
>> Possibly because there's a lot of good Python builtins that allow you
>> to avoid the re module when *not* needed, it's easy to forget it in
>> the cases where it does pretty much exactly what you want,

There is a LOT of overhead to figuring out how to use the re module. I've always thought it had its place, but it sure seems like overkill for something this seemingly simple.

If (a big if) removing "all but these" was a common use case, it would be nice to have a way to do it with string methods.

This is a classic case of: Put it on PyPI, and see how much interest it garners.

-CHB
Re: [Python-ideas] More user-friendly version for string.translate()
On 24 October 2016 at 23:10, Paul Moore wrote:
> On 24 October 2016 at 21:54, Chris Barker wrote:
>> I don't know a way to do "remove every character except these", but somehow
>> I expect there is a way to do that efficiently with Python strings.
>
> It's easy enough with the re module:
>
> >>> re.sub('[^0-9]', '', 'ab0c2m3g5')
> '0235'
>
> Possibly because there's a lot of good Python builtins that allow you
> to avoid the re module when *not* needed, it's easy to forget it in
> the cases where it does pretty much exactly what you want, or can be
> persuaded to do so with much less difficulty than rolling your own
> solution (I know I'm guilty of that...).
>
> Paul

Thanks, this would solve the task of course. However, in the case from my last example (filenames) this would require:

- Writing a function to construct the expression for "all except given" characters from my table. This could be easy, I believe, but it is still another task.

Then:
1. Apply translate() with my table to the string.
2. Apply re.sub() to the string.

I usually start using RE when I want to find/replace words or patterns, but not to translate/filter the characters directly.

So since there is already an "inclusive" translate(), having an "exclusive" one is probably not a bad idea. I believe it would be very similar in implementation: instead of appending the next character when it is not in the table, it simply does nothing.

Mikhail
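The "construct the expression from my table" step described above could look roughly like the sketch below. The function name `drop_then_translate` and the tiny table are hypothetical, and the sketch filters first and translates second (the reverse of the order listed in the message), so that the character class can be built directly from the table's keys.

```python
import re


def drop_then_translate(s, table):
    """Drop every character whose ordinal is not a key of table, then translate."""
    if not table:
        return ""  # nothing is allowed, so nothing survives
    # re.escape keeps characters such as "]", "^" or "-" literal inside [...].
    allowed = "".join(re.escape(chr(k)) for k in table)
    return re.sub("[^" + allowed + "]", "", s).translate(table)


# Tiny excerpt of a Cyrillic-to-ASCII table, plus three letters kept as is.
table = {1072: 97, 1073: 98, 1074: 99, 97: 97, 98: 98, 99: 99}
print(drop_then_translate("\u0430\u0431\u0432 abc xyz!", table))  # abcabc
```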
Re: [Python-ideas] More user-friendly version for string.translate()
On 24 October 2016 at 22:54, Chris Barker wrote:
> On Mon, Oct 24, 2016 at 1:30 PM, Mikhail V wrote:
>>
>> But how would you with current translate function drop all characters
>> that are not in the table?
>
> that is another question altogether, and one for a different list, actually.
>
> I don't know a way to do "remove every character except these", but somehow
> I expect there is a way to do that efficiently with Python strings.
>
> you could probably (ab)use the codecs module, though.
>
> If there really is no way to do it, then you might have a feature worth
> pursuing, but be prepared with use-cases!
>
> The only use-case I've had for that sort of thing is when I want only ASCII
> -- but I can use the ascii codec for that :-)
>
>> This for example
>> is needed for filtering out all non-standard characters from paths, etc.
>
> You'd usually want to replace those with something, rather than remove them
> entirely, yes?

Just a pair of use cases which I was facing in my practice:

1. Imagine I perform some admin tasks in a company with very different users who also tend to name the files as they wish. So only God knows what can be there in the filenames. And I know, for example, that there can be Cyrillic besides ASCII there. So I just define a table like:

{
1072: 97
1073: 98
1074: 99
...
[which localizes Cyrillic into ASCII]
...
97:97
98:98
99:99
...
[those chars that are OK, leave them]
}

Then I use os.walk() and os.rename() and voila! The file system regains its virginity in one simple script.

2. Say I have a multi-lingual file or whatever, and I want to filter out some unwanted characters, so I can do it similarly.

Mikhail
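The first use case above might be sketched as follows with the existing str.translate(). The names `TABLE`, `clean_name` and `rename_tree` are invented for the example, the table is a three-letter excerpt, and there is no handling of name collisions, so treat it as an outline rather than a safe tool. The identity entries (97: 97 and so on) are left out because translate() already leaves unmapped characters alone.

```python
import os

# Excerpt only: Cyrillic a, b, v mapped to ASCII a, b, c as in the message.
TABLE = {1072: 97, 1073: 98, 1074: 99}


def clean_name(name, table=TABLE):
    """Return name with every character listed in table translated."""
    return name.translate(table)


def rename_tree(root, table=TABLE):
    """Rename files and directories under root according to table."""
    # topdown=False visits a directory's contents before the directory
    # itself is renamed from its parent, so paths stay valid.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames + dirnames:
            new = clean_name(name, table)
            if new != name:
                os.rename(os.path.join(dirpath, name),
                          os.path.join(dirpath, new))


print(clean_name("\u0430\u0431\u0432.txt"))  # abc.txt
```

Dropping everything that is not in the table, the part the proposal is really about, is not covered here.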
Re: [Python-ideas] More user-friendly version for string.translate()
On 24 October 2016 at 20:02, Chris Barker wrote:
> On Mon, Oct 24, 2016 at 10:50 AM, Ryan Birmingham wrote:
>>
>> I also believe that using a text file would not be the best solution;
>> using a dictionary,
>
> actually, now that you mention it -- .translate() already takes a dict, so
> if you want to put your translation table in a text file, you can use a dict
> literal to do it:
>
> # contents of file:
>
> {
> 32: 95,
> 105: 64,
> 115: 36,
> }
>
> then use it:
>
> s.translate(ast.literal_eval(open("trans_table.txt").read()))
>
> now all you need is a tiny little utility function:
>
> def translate_from_file(s, filename):
>     return s.translate(ast.literal_eval(open(filename).read()))
>
> :-)
>
> -Chris

Yes, making a special file format is not a good option, I agree. Also of course it does not make sense to read it every time if translate is called in a loop with the same table. So it was merely a sketch of the behaviour.

But how would you, with the current translate function, drop all characters that are not in the table? I can pass [deletechars] to the function, but this seems not very convenient to me -- very often I want to drop them *all*, excluding some particular values. This for example is needed for filtering out all non-standard characters from paths, etc.

So in other words, there should be an option to control this behavior. Probably I am missing something here, but I didn't find such a solution for translate(), and that is the main point of the proposal actually. It is all the same as translate(), but with this extension it can cover many more use cases.

Mikhail
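One possible answer to the question above, with str.translate() as it exists in Python 3, is a dict subclass whose missing keys come back as None. The class name `DropMissing` is made up for this sketch; it relies on the documented `__missing__` hook for dict subclasses and, unlike a defaultdict, does not store an entry for every character it is asked about.

```python
class DropMissing(dict):
    """Translation table that deletes every character it has no entry for."""

    def __missing__(self, key):
        # str.translate treats None as "delete this character".
        return None


digits = "0123456789"
table = DropMissing({ord(c): ord(c) for c in digits})

s = "some stuff and some 456 23 numbers 888"
print(s.translate(table))  # 45623888
print(len(table))          # 10, the table did not grow
```

The same table can of course map the kept characters to something else, which gives "translate, and drop the rest" in a single call.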
Re: [Python-ideas] More user-friendly version for string.translate()
On 24 October 2016 at 18:39, Mikhail V wrote:
> I would be happy to see a somewhat more general and user friendly
> version of string.translate function.
> It could work this way:
> string.newtranslate(file_with_table, Drop=True, Dec=True)

Using a text file seems very odd. But regardless, this could *easily* be published on PyPI, and then if it gained enough users be proposed for the stdlib. I don't think there's anything like sufficient value to warrant "fast-tracking" something like this direct to the stdlib. And real-world use via PyPI would very quickly establish whether the unusual "pass a file with a translation table in it" design was acceptable to users.

Paul
Re: [Python-ideas] More user-friendly version for string.translate()
On Mon, Oct 24, 2016 at 10:50 AM, Ryan Birmingham wrote:

> I also believe that using a text file would not be the best solution;
> using a dictionary,

actually, now that you mention it -- .translate() already takes a dict, so if you want to put your translation table in a text file, you can use a dict literal to do it:

# contents of file:

{
32: 95,
105: 64,
115: 36,
}

then use it:

    s.translate(ast.literal_eval(open("trans_table.txt").read()))

now all you need is a tiny little utility function:

    import ast

    def translate_from_file(s, filename):
        return s.translate(ast.literal_eval(open(filename).read()))

:-)

-Chris

> other data structure, or anonymous function would make more sense than
> having a specially formatted file.
>
> On Oct 24, 2016 13:45, "Chris Barker" wrote:
>
>> my thought on this:
>>
>> If you need translate() you probably can write the code to parse a text
>> file, and then you can use whatever format you want.
>>
>> This seems a very special case to build into the stdlib.
>>
>> -CHB
>>
>> On Mon, Oct 24, 2016 at 10:39 AM, Mikhail V wrote:
>>
>>> Hello all,
>>>
>>> I would be happy to see a somewhat more general and user friendly
>>> version of string.translate function.
>>> It could work this way:
>>> string.newtranslate(file_with_table, Drop=True, Dec=True)
>>>
>>> [...]

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR              (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA 98115        (206) 526-6317   main reception

chris.bar...@noaa.gov
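The translate_from_file() idea above, written out as a slightly more defensive sketch. The function name and the dict-literal file format come from the message; the explicit encoding, the `with` block and the type check are additions made for this example and are not part of the original suggestion.

```python
import ast


def translate_from_file(s, filename):
    """Translate s using a dict literal stored in a text file."""
    # The with block closes the file even if parsing fails.
    with open(filename, encoding="utf-8") as f:
        table = ast.literal_eval(f.read())
    if not isinstance(table, dict):
        raise ValueError("expected a dict literal in %r" % filename)
    return s.translate(table)
```

With a file containing `{32: 95, 105: 64, 115: 36}`, calling `translate_from_file("this is it", path)` gives `th@$_@$_@t`. ast.literal_eval only accepts Python literals, so the file cannot run arbitrary code.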
Re: [Python-ideas] More user-friendly version for string.translate()
my thought on this:

If you need translate() you probably can write the code to parse a text file, and then you can use whatever format you want.

This seems a very special case to build into the stdlib.

-CHB

On Mon, Oct 24, 2016 at 10:39 AM, Mikhail V wrote:

> Hello all,
>
> I would be happy to see a somewhat more general and user friendly
> version of string.translate function.
> It could work this way:
> string.newtranslate(file_with_table, Drop=True, Dec=True)
>
> So the parameters:
>
> 1. "file_with_table" : a text file with table in following format:
>
> #[In][Out]
>
> 97{65}
> 98{66}
> 99{67}
> 100{}
> ...
> 110{110}
>
> Notes:
> All values are decimal or hex (to switch between parsing format use
> Dec parameter)
> As it turned out from my last discussion, majority prefers hex notation,
> so I am not in mainstream with my decimal notation here, but both
> should be supported.
> Empty [Out] value {} means that the character will be deleted.
>
> 2. "Drop = True" this will set the default behavior for those values
> which are NOT in the table.
>
> For Drop = True: all values not defined in table set to [out] = {},
> and be deleted.
>
> For Drop=False: all values not defined in table set [out] = [in], so
> those remain as is.
>
> 3. Dec= True : parsing format Decimal/hex. I use decimal everywhere.
>
> Further thoughts: for 8-bit strings this should be simple to implement
> I think. For 16-bit of course
> there is issue of memory usage for lookup tables, but the gurus could
> probably optimise it.
> E.g. at the parsing stage it is not necessary to build the lookup
> table for whole 16-bit range of course,
> but take only values till the largest ordinal present in the table file.
>
> About the format of table file: I suppose many users would want also
> to define characters directly, I am not sure
> if it is really needed, but if so, additional brackets or escape char
> could be used, like this for example:
>
> a{A}
> \98{\66}
> \99{\67}
>
> but as said I don't like very much the idea and would be OK for me to
> use numeric values only.
>
> So approximately I see it.
> Feel free to share thoughts or criticise.
>
> Mikhail

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR              (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA 98115        (206) 526-6317   main reception

chris.bar...@noaa.gov
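As suggested above, parsing the proposed table format takes only a few lines. The sketch below is one reading of the quoted proposal, not an existing API: `parse_table` and `newtranslate` are plain functions written for this example, the numeric-only `In{Out}` format is assumed, and the table is taken as an iterable of lines (an open file works too) rather than as a file path.

```python
def parse_table(lines, dec=True):
    """Parse 'In{Out}' lines into a dict usable with str.translate()."""
    base = 10 if dec else 16
    table = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the '#[In][Out]' header
        key, _, rest = line.partition("{")
        value = rest.rstrip("}").strip()
        # An empty {} means "delete this character".
        table[int(key, base)] = int(value, base) if value else None
    return table


def newtranslate(s, table, drop=True):
    """Translate s; with drop=True, characters not in table are removed."""
    if drop:
        return "".join(c.translate(table) for c in s if ord(c) in table)
    return s.translate(table)


table = parse_table(["#[In][Out]", "", "97{65}", "98{66}", "99{67}",
                     "100{}", "110{110}"])
print(newtranslate("abcdnx", table, drop=True))   # ABCn
print(newtranslate("abcdnx", table, drop=False))  # ABCnx
```

The literal "..." lines from the example table are not valid input for this sketch, and the character-based variant (`a{A}`, `\98{\66}`) is not handled.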