Re: [Python-ideas] More user-friendly version for string.translate()

2016-11-02 Thread Chris Barker
On Wed, Nov 2, 2016 at 12:02 PM, Mikhail V  wrote:

> Actually even with ASCII (read for python 2.7) I would also be happy
> to have such function: say I just want to keep only digits so I write:
> digits = "0123456789"
> newstring = somestring.keep(digits)

well, with ascii, it's not too hard to make a translation table:

digits = "0123456789"

table = [(o if chr(o) in digits else None) for o in range(256)]

s = "some stuff and some 456 23 numbers 888"



but then there is the defaultdict way:

s.translate(defaultdict(lambda: None, {ord(c): c for c in digits}))

wasn't that easy? Granted, if you need to do this, you'd wrap it in a
function like Chris A. suggested. But this really isn't easy or
discoverable -- it took me a fair bit of fiddling to get right, and I knew
I was looking for a defaultdict implementation.
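
A minimal sketch of the kind of wrapper meant here. The name `keep_chars` is made up for illustration; it is not an existing str method or stdlib function.

```python
from collections import defaultdict


def keep_chars(text, chars):
    """Return text with every character not in chars removed.

    keep_chars is a hypothetical helper name, not part of the stdlib.
    """
    # Map each wanted character's ordinal to itself; any other ordinal
    # falls through to the default factory, which returns None (delete).
    table = defaultdict(lambda: None, {ord(c): c for c in chars})
    return text.translate(table)


print(keep_chars("some stuff and some 456 23 numbers 888", "0123456789"))
```

The table is rebuilt on every call, which keeps the helper simple; a caller filtering many strings with the same character set would probably build the table once and reuse it.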


In [43]: table



{48: '0',
 49: '1',
 50: '2',
 51: '3',
 52: '4',
 53: '5',
 54: '6',
 55: '7',
 56: '8',
 57: '9'})

In [44]: s.translate(table)

Out[44]: '45623888'

In [45]: table



{32: None,
 48: '0',
 49: '1',
 50: '2',
 51: '3',
 52: '4',
 53: '5',
 54: '6',
 55: '7',
 56: '8',
 57: '9',
 97: None,
 98: None,
 100: None,
 101: None,
 102: None,
 109: None,
 110: None,
 111: None,
 114: None,
 115: None,
 116: None,
 117: None})

defaultdict puts an entry in for every ordinal checked -- this could get
big -- granted, probably not a big deal with modern computer memory, but

it might even be worth making a NoneDict for this:

class NoneDict(dict):
    """
    Dictionary implementation that always returns None when a key is not in
    the dict, rather than raising a KeyError
    """
    def __getitem__(self, key):
        try:
            val = dict.__getitem__(self, key)
        except KeyError:
            val = None
        return val

(see enclosed -- it works fine with translate)

(OK, that was fun, but no, not really that useful)
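
A shorter variant, offered as a sketch rather than as the enclosed code: dict subclasses may define `__missing__`, which dict's own lookup calls for absent keys. `DropMissing` is a made-up name.

```python
class DropMissing(dict):
    """dict that reports None for absent keys without storing them."""

    def __missing__(self, key):
        # Called by dict.__getitem__ only when the key is absent;
        # nothing is inserted, so the table never grows.
        return None


digits = "0123456789"
table = DropMissing({ord(c): c for c in digits})
s = "some stuff and some 456 23 numbers 888"
result = s.translate(table)
print(result, len(table))
```

Unlike the defaultdict table, this one still has only its ten original entries after the call.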

> Despite I can do it other way, this would be much simpler and clearer
> way to do it. And I suppose it is quite common task not only for me.

That's the key question -- is this a common task? If so, then while there
are ways to do it, they're neither easy nor discoverable.

And while some of the guiding principles of this list are:

"not every two line function needs to be in the standard lib"


"put it up on PYPi, and see if a lot of people find it useful"

It's actually kind of silly to put a single function up as a PyPI package
-- and I doubt many people would find it if you did.



class NoneDict(dict):
    """
    Dictionary implementation that always returns None when a key is not in the dict,
    rather than raising a KeyError
    """
    def __getitem__(self, key):
        try:
            val = dict.__getitem__(self, key)
        except KeyError:
            val = None
        return val


def test_basic():
    """some simple tests"""
    test_dict = {'a': 23,
                 'b': 45,
                 }
    arbitrary_keys = [23, 'that', (1, 2, 3)]
    d = NoneDict(test_dict)

    # do the ones in there work:
    for key, val in test_dict.items():
        print("trying:", key, val)
        assert d[key] == val

    for key in arbitrary_keys:
        print("trying:", key)
        assert d[key] is None

    # did the dict length change?
    assert len(d) == len(test_dict)

    print("all tests pass")


def test_with_translate():
    """tests using a NoneDict with str.translate()"""
    digits = "0123456789"
    # make a translate table:
    table = NoneDict({ord(c): c for c in digits})

    s = "some stuff and some 456 23 numbers 888"

    final = s.translate(table)

    for c in final:
        assert c in digits
    print("it works with translate")


if __name__ == "__main__":
    test_basic()
    test_with_translate()

Re: [Python-ideas] More user-friendly version for string.translate()

2016-11-02 Thread Mikhail V
On 27 October 2016 at 00:17, Chris Barker  wrote:
> 1) an easy way to spell "remove all the characters other than these"
> I think that's a good idea. What with unicode having an enormous number
> of code points, it really does make sense to have a way to specify
> only what you want, rather than what you don't want.
> Back in the good old days of 1-byte chars, it wasn't hard to build up
> a full 256 element translate table -- not so much anymore.
> And one of the whole points of str.translate() is good performance.

Actually even with ASCII (read for python 2.7) I would also be happy
to have such function: say I just want to keep only digits so I write:

digits = "0123456789"
newstring = somestring.keep(digits)

Despite I can do it other way, this would be much simpler and clearer
way to do it. And I suppose it is quite common task not only for me.
Currently 99% of my programs are in python 2.7. And I started to use python 3
only for tasks when I want to process unicode strings (ironically only
to get rid of unicode).

Re: [Python-ideas] More user-friendly version for string.translate()

2016-11-02 Thread Chris Barker
On Tue, Nov 1, 2016 at 12:15 AM, Stephen J. Turnbull wrote:

>  > pretty slick -- but any hope of it being as fast as a C implemented
> method?
> I would expect not in CPython, but if "fast" matters, why are you
> using CPython rather than PyPy or Cython?

oh come on!

>  If it matters *that* much,
> you can afford to write your own C implementation.

This is about a possible addition to the stdlib -- me writing my own C
implementation has nothing to do with it.

> But I doubt that
> fast matters "that much" often enough to be worth maintaining yet
> another string method in Python.

This could be said about every string method in Python -- I understand that
every addition is more code to maintain. But somehow we are adding all
kinds of stuff like yet another string formatting method, talking about
null coalescing operators and who knows what -- those are all a MUCH larger
burden -- not just for maintaining the interpreter, but for everyone using
python having more to remember and understand.

On the other hand, powerful and performant string methods are a major plus
for Python -- a good reason to use it over Perl :-)

So a new one that provides, as I wrote before:

 > 1) single method call to do a common thing
>  >
>  > 2) nice fast, pure C performance

would fit right into Python, and indeed, would be a similar
implementation to existing methods -- so the maintenance burden would be a
small addition (i.e., if the internal representation for strings changed, all
those methods would need re-visiting and similar changes).

So the only key question is -- is this a common enough use case?

 > so I think a "keep these" method would help with both of these
>  > goals.
> Sure, but the translate method already gives you that, and a lot more.

yes but only with the fairly esoteric use of defaultdict. which brings me
back to the above:

1) single method call to do a common thing

the nice thing about a single method call is discoverability -- no newbie
is going to figure out the .translate + defaultdict approach.

> Note that when you're talking about working with Unicode characters,
> no natural language activity I can imagine (not even translating
> Buddhist texts, which involves a couple of Indian scripts as well as
> Han ideographs) uses more than a fraction of defined characters.

which is why you may want to remove all the others :-)

> So really translate with defaultdict is a specialized loop that
> marries an algorithmic body (which could do things like look up the
> original script or other character properties to decide on the
> replacement for the generic case) with a (usually "small") table of
> exceptions.  That seems like inspired design to me.

indeed -- .translate() itself is remarkably flexible -- you could even pass
in a custom class that does all sorts of logic. and adding the defaultdict
is an easy way to add a useful feature. But again, advanced usage and not
very discoverable.

Maybe that means we need some more docs and/or perhaps recipes instead.

Anyway, I joined this thread to clarify what might be on the table -- but
while I think it's a good idea, I don't have the bandwidth to move it
through the process -- so unless someone steps up who does, we're done.



Re: [Python-ideas] More user-friendly version for string.translate()

2016-10-31 Thread Chris Barker
On Fri, Oct 28, 2016 at 7:28 AM, Terry Reedy  wrote:

> >>> s = 'kjskljkxcvnalsfjaweirKJZknzsnlkjsvnskjszsdscccjasfdjf'
> >>> s2 = ''.join(c for c in s if c in set('abc'))

pretty slick -- but any hope of it being as fast as a C implemented method?

for example, with a 1000 char string:

In [59]: %timeit string.translate(table)
10 loops, best of 3: 3.62 µs per loop

In [60]: %timeit ''.join(c for c in string if c in set(letters))
1000 loops, best of 3: 1.14 ms per loop

so the translate() method is about 300 times faster in this case. (and it
used a defaultdict with a None factory, which is probably a bit slower than
a pure C implementation might be.)
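
For anyone wanting to repeat the comparison outside IPython, here is a rough sketch using the stdlib `timeit` module. The sample string and the `letters` value are arbitrary assumptions, not the data behind the numbers above, and the set is built once rather than once per character, so the ratio will differ from the one quoted.

```python
import timeit
from collections import defaultdict

letters = "abc"
string = "abcdefghij" * 100  # arbitrary 1000-character sample
table = defaultdict(lambda: None, {ord(c): c for c in letters})


def with_translate():
    return string.translate(table)


def with_genexp():
    keep = set(letters)  # build the set once, not per character
    return "".join(c for c in string if c in keep)


# Both approaches must agree before their speed is worth comparing.
assert with_translate() == with_genexp()

for func in (with_translate, with_genexp):
    seconds = timeit.timeit(func, number=1000)
    print(f"{func.__name__}: {seconds / 1000 * 1e6:.2f} µs per call")
```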

I've always figured that Python's rich string methods provided two things:

1) single method call to do common things

2) nice fast, pure C performance

so I think a "keep these" method would help with both of these goals.



Re: [Python-ideas] More user-friendly version for string.translate()

2016-10-28 Thread Chris Angelico
On Sat, Oct 29, 2016 at 1:28 AM, Terry Reedy  wrote:
> If one has a translation dictionary d, use that twice in the genexp.
> >>> d = {'a': '1', 'b': '3x', 'c': 'fum'}
> >>> ''.join(d[c] for c in s if c in d.keys())
> 'fum11fumfumfum1'

Trivial change:

>>> ''.join(d[c] for c in s if c in d)

Re: [Python-ideas] More user-friendly version for string.translate()

2016-10-28 Thread Terry Reedy

On 10/26/2016 6:17 PM, Chris Barker wrote:

I've lost track of what (If anything) is actually being proposed here...
so I'm going to try a quick summary:

1) an easy way to spell "remove all the characters other than these"

In other words, 'only keep these'.
We already have easy ways to create filtered strings.

>>> s = 'kjskljkxcvnalsfjaweirKJZknzsnlkjsvnskjszsdscccjasfdjf'
>>> s2 = ''.join(c for c in s if c in set('abc'))
>>> s2
>>> s3 = ''.join(filter(lambda c: c in set('abc'), s))
>>> s3

I expect the first to be a bit faster.  Either can be wrapped in a 
keep() function.  If one has a translation dictionary d, use that 
twice in the genexp.

>>> d = {'a': '1', 'b': '3x', 'c': 'fum'}
>>> ''.join(d[c] for c in s if c in d.keys())

Terry Jan Reedy

Re: [Python-ideas] More user-friendly version for string.translate()

2016-10-27 Thread Chris Barker
>>return string.translate(collections.defaultdict(lambda: None,

Nice! I forgot about defaultdict -- so this just needs a recipe somewhere --
maybe even in the docs for str.translate.

BTW, great use case for defaultdict -- I had been wondering what the point
was, given that a regular dict has .setdefault



Re: [Python-ideas] More user-friendly version for string.translate()

2016-10-26 Thread Chris Angelico
On Thu, Oct 27, 2016 at 8:48 AM, Mikhail V  wrote:
> On 26 October 2016 at 20:58, Stephen J. Turnbull
>  wrote:
>>import collections
>>def translate_or_drop(string, table):
>>string: a string to process
>>table: a dict as accepted by str.translate
>>return string.translate(collections.defaultdict(lambda: None, **table))
>>All OK now?
> Not really. I tried with a simple example
> intab = "ae"
> outtab = "XM"
> table = string.maketrans(intab, outtab)
> collections.defaultdict(lambda: None, **table)
> and this gives me
> TypeError: type object argument after ** must be a mapping, not str
> But I probably misunderstood the idea.

You're 99% of the way to understanding it. Try the exercise again in
Python 3. You don't have string.maketrans (which creates a 256-byte
translation mapping) - instead, you use a dictionary.

Re: [Python-ideas] More user-friendly version for string.translate()

2016-10-26 Thread Chris Barker
On Wed, Oct 26, 2016 at 5:32 PM, Mikhail V  wrote:

> > (b) has the advantage of adding translation and removal in one fell
> swoop --
> > but if you only want to remove, then you have to make a translation
> table of
> > 1:1 mappings = not hard, but a annoying:
> Exactly that is the proposal. And for same exact reason that you point out,
> I also can't give a comment what would be better. It would be indeed
> quite strange from syntactical POV if I just want to remove "all except"
> and must call translate(). So ideally both should exist I think.

That kind of violates OWTDI though. Probably one's enough.

and in fact with the use-cases I can think of, and the one you mentioned,
they are really two steps: there are the characters you want to translate,
and the ones you want to keep, but the ones you want to keep are a superset
of the ones you want to translate. so if we added the "remove" option to
.translate(), then you would need to add all the "keep" characters to your
translate table.

I'm thinking they really are different operations, give them a different



Re: [Python-ideas] More user-friendly version for string.translate()

2016-10-26 Thread Chris Barker
On Wed, Oct 26, 2016 at 3:48 PM, MRAB  wrote:

> str.replace( ("aaa", "a", "b"), ("b", "bbb", "a") )
>> and all sort of other complications!
> 2) Check from the longest to the shortest.
> If you're going to pick choice 2, does it have to be 2 tuples/lists? Why
> not a dict instead?

then we have a string.translate() that accepts a table of string
replacements, rather than individual character replacements -- maybe a good



Re: [Python-ideas] More user-friendly version for string.translate()

2016-10-26 Thread Mikhail V
On 27 October 2016 at 00:17, Chris Barker  wrote:
> I've lost track of what (If anything) is actually being proposed here... so
> I'm going to try a quick summary:
> 1) an easy way to spell "remove all the characters other than these"
> I think that's a good idea. What with unicode having an enormous number of
> code points, it really does make sense to have a way to specify only what
> you want, rather than what you don't want.
> Back in the good old days of 1-byte chars, it wasn't hard to build up a full
> 256 element translate table -- not so much anymore. And one of the whole
> points of str.translate() is good performance.
>  a) a new method:
>   (naming TBD)
> b) a new flag in translate (Kind of like the decode keywords)
>   str.translate(table, missing='ignore'|'remove')
> (b) has the advantage of adding translation and removal in one fell swoop --
> but if you only want to remove, then you have to make a translation table of
> 1:1 mappings = not hard, but a annoying:

Exactly that is the proposal. And for same exact reason that you point out,
I also can't give a comment what would be better. It would be indeed
quite strange from syntactical POV if I just want to remove "all except"
and must call translate(). So ideally both should exist I think.

Re: [Python-ideas] More user-friendly version for string.translate()

2016-10-26 Thread MRAB

On 2016-10-26 23:17, Chris Barker wrote:

I've lost track of what (If anything) is actually being proposed here...
so I'm going to try a quick summary:

1) an easy way to spell "remove all the characters other than these"

I think that's a good idea. What with unicode having an enormous number
of code points, it really does make sense to have a way to specify only
what you want, rather than what you don't want.

Back in the good old days of 1-byte chars, it wasn't hard to build up a
full 256 element translate table -- not so much anymore. And one of the
whole points of str.translate() is good performance.

 a) a new method:

  (naming TBD)

b) a new flag in translate (Kind of like the decode keywords)

  str.translate(table, missing='ignore'|'remove')

c) pass a function that returns the replacement:

def replace(c):
    return c.upper() if c.isalpha() else ''


The replacement function could be called only on distinct codepoints.
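
Something close to option (c) can already be approximated with a dict subclass, sketched here under assumptions: `FuncTable` is a made-up name, and the function is taken to accept a one-character string and return its replacement. Results are cached, so the function runs once per distinct code point.

```python
class FuncTable(dict):
    """Translation table that computes replacements on demand.

    FuncTable is a hypothetical name; func receives a one-character
    string and returns its replacement string.
    """

    def __init__(self, func):
        super().__init__()
        self.func = func

    def __missing__(self, codepoint):
        # Runs once per distinct code point; the result is cached so
        # later occurrences are plain dict lookups.
        value = self[codepoint] = self.func(chr(codepoint))
        return value


def replace(c):
    return c.upper() if c.isalpha() else ''


table = FuncTable(replace)
result = "ab0c2m3g5 ab".translate(table)
print(result, len(table))
```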

(b) has the advantage of adding translation and removal in one fell
swoop -- but if you only want to remove, then you have to make a
translation table of 1:1 mappings = not hard, but a annoying:

table = {c:c for c in sequence_of_chars}

I'm on the fence about what I personally prefer.

2) (in another thread, but similar enough) being able to pass in more
than one string to replace:

str.replace( old=seq_of_strings, new=seq_of_strings )

I know I've wanted this a lot, and certainly from a performance
perspective, it could be a nice bonus.

But: It overlaps a lot with str.translate -- at least for single
character replacements. so really why? so it would really only make
sense if supported multi-char strings:

str.replace(old = ("aword", "another_word"), ("something", "something

However: a string IS a sequence of strings, so we'd have confusion about

str.replace("this", "four")

Does the user want the word "this" replaced with the word "four" -- or
do they want each character replaced?

Maybe we'd need a .replace_many() method? ugh!

There are also other issues with what to do with repeated / overlapping

str.replace( ("aaa", "a", "b"), ("b", "bbb", "a") )

and all sort of other complications!

Possible choices are:

1) Use the given order.

2) Check from the longest to the shortest.

If you're going to pick choice 2, does it have to be 2 tuples/lists? Why 
not a dict instead?

THAT I think could be nailed down by defining the "order of operations".
Does it loop through the entire string for each item? or through each
item for each point in the string? note that if you loop through the
entire string for each item, you might as well have written the loop

for old, new in zip(old_list, new_list):
    s = s.replace(old, new)

and at least if the length of the string is long-ish, and the number of
replacements short-ish -- performance would be fine.
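
The two orders of operation give different answers for overlapping replacements. The sketch below contrasts them; `replace_many` and `replace_sequentially` are hypothetical helpers, and the single-pass version uses the `re` module with a longest-first rule, which is only one of the possible choices discussed here.

```python
import re


def replace_many(s, mapping):
    """Replace every key of mapping found in s with its value, in one pass.

    replace_many is a hypothetical helper; longer keys are tried first.
    """
    if not mapping:
        return s
    # Longest-first alternation, so "aaa" wins over "a" at the same spot.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(map(re.escape, keys)))
    return pattern.sub(lambda m: mapping[m.group(0)], s)


def replace_sequentially(s, old_list, new_list):
    """The plain loop: each replacement sees the previous one's output."""
    for old, new in zip(old_list, new_list):
        s = s.replace(old, new)
    return s


olds, news = ("aaa", "a", "b"), ("b", "bbb", "a")
print(replace_many("aaab", dict(zip(olds, news))))   # single pass
print(replace_sequentially("aaab", olds, news))      # one pass per item
```

On the input "aaab" the single pass yields "ba", while the sequential loop yields "aa" because the "b" produced by the first replacement is itself replaced later.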

*** So the question is -- is there support for these enhancements? If
so, then it would be worth hashing out the details.

But the next question is -- does anyone care enough to manage that
process -- it'll be a lot of work!

NOTE: there has also been a fair bit of discussion in this thread about
ordinals vs characters, and unicode itself -- I don't think any of that
resulted in any possible proposals...


Re: [Python-ideas] More user-friendly version for string.translate()

2016-10-26 Thread Mikhail V
On 26 October 2016 at 20:58, Stephen J. Turnbull
>import collections
>def translate_or_drop(string, table):
>string: a string to process
>table: a dict as accepted by str.translate
>return string.translate(collections.defaultdict(lambda: None, **table))

>All OK now?

Not really. I tried with a simple example
intab = "ae"
outtab = "XM"
table = string.maketrans(intab, outtab)
collections.defaultdict(lambda: None, **table)

and this gives me
TypeError: type object argument after ** must be a mapping, not str

But I probably misunderstood the idea. Anyway this code does not make
much sense to me, I would never in life understand what is meant here.
And in my not so big, but not so small, Python experience I *never* had
an occasion using collections or lambda.

>sets as a single, universal character set.  As it happens, although
>there are differences of opinion over how to handle Unicode in Python,
>there is consensus that Python does have to handle Unicode flexibly,
>effectively and efficiently.

I was merely talking about syntax and sources files standard, not about unicode
strings. No doubt one needs some way to store different glyph sets.

So I was talking about that if one defines a syntax and has good intentions
for readability in mind, there is not so many rationale to adopt the syntax
to current "hybrid" system: 7-bit and/or multibyte paradigm.
Again this a too far going discussion, but one should not probably much
look ahead on those. The situation is not so good in this sense that most
standard software is  attached to this strange paradigm
(even those which does not have anything
to do with multi-lingual typography).
So IMO something gone wrong with those standard characters.

>If you insist on bucking it, you'll
>have to do it pretty much alone, perhaps even maintaining your own
>fork of Python.

As for me I would take the path of developing my own IDE which will enable
typographic quality rendering and of course all useful glyphs, such as
curly quotes, bullets, etc, which all is fundamental to any possible
improvements of cognitive qualities of code. And I'll stay in 8-bit
boundaries, that's for sure. So if Python will take the path of "unicode"
code input (e.g. for some punctuation characters) this would only add a
minor issue for generating valid Python source files in this case.

Re: [Python-ideas] More user-friendly version for string.translate()

2016-10-26 Thread Stephen J. Turnbull
Mikhail V writes:

 > >That said, multiple methods is a valid option for the API.
 > Certainly I like the look of distinct functions more.
 > It allows me to visually parse the code effectively,
 > so e.g. for str.remove() I would not need to look
 > in docs to understand what the function does.

OK, as I said, you're in accord with Guido on that.  His rationale is
somewhat different, but that's OK.

 > Just in some cases I need to convert them to numpy arrays back and
 > forth, so this unicode vanity worries me a bit.

I think you're borrowing trouble you actually don't have.  Either way,
the rest of the world *needs* Unicode to do their work, and it's not
going to go away.  On the positive side, turning a string into a list
of codepoints is trivial:

[ord(c) for c in string]

 > So I am just not the one who believes in these maximalistical "we
 > need over 9000 glyphs" talks.

But you don't need to believe in it.  What you do need to believe is
that the rest of us believe that we need the union of our character
sets as a single, universal character set.  As it happens, although
there are differences of opinion over how to handle Unicode in Python,
there is consensus that Python does have to handle Unicode flexibly,
effectively and efficiently.

Believe me, it *is* a consensus.  If you insist on bucking it, you'll
have to do it pretty much alone, perhaps even maintaining your own
fork of Python.

Re: [Python-ideas] More user-friendly version for string.translate()

2016-10-26 Thread Stephen J. Turnbull
Mikhail V writes:

 > I need translate() which drops non-defined chars. Please :)

import collections
def translate_or_drop(string, table):
    """
    string: a string to process
    table: a dict as accepted by str.translate
    """
    return string.translate(collections.defaultdict(lambda: None, **table))

All OK now?
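
One detail worth checking in the function above: `**table` unpacks the table as keyword arguments, which only works for string keys, while `str.translate` looks entries up by integer ordinal. A variant that passes the dict positionally, sketched under that reading and tried with a Python 3 `str.maketrans` table:

```python
import collections


def translate_or_drop(string, table):
    """Translate string with table, dropping characters table lacks.

    string: a string to process
    table: a dict as accepted by str.translate (ordinal keys)
    """
    # Pass the dict positionally: ** would require str keys.
    return string.translate(collections.defaultdict(lambda: None, table))


table = str.maketrans("ae", "XM")  # {97: 88, 101: 77}
print(translate_or_drop("a piece of cake", table))
```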

Re: [Python-ideas] More user-friendly version for string.translate()

2016-10-26 Thread Steven D'Aprano
On Wed, Oct 26, 2016 at 04:29:13AM +0200, Mikhail V wrote:

> I need translate() which drops non-defined chars. Please :)
> No optimisation, no new syntax. deal?

I still wonder whether this might be worth introducing as a new 
string method, or an option to translate. But the earliest that will 
happen is Python 3.7, so in the meantime, something like this should 
be enough:

# untested
keep = "abcdßαβπд∞"
text = "..."
# Find all the characters in text that are not in keep:
delchars = set(text) - set(keep)
delchars = ''.join(delchars)
text = text.translate(str.maketrans("", "", delchars))
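
Wrapped as a function, the same idea might look like this. `keep_only` is a hypothetical name, and the sample text is made up.

```python
def keep_only(text, keep):
    """Return text without any character that is not in keep.

    keep_only is a hypothetical name for illustration.
    """
    # Only characters actually present in text need deleting, so the
    # table stays small no matter how large Unicode is.
    delchars = ''.join(set(text) - set(keep))
    return text.translate(str.maketrans("", "", delchars))


print(keep_only("αβγ abc 123 ∞", "abcdßαβπд∞"))
```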

Re: [Python-ideas] More user-friendly version for string.translate()

2016-10-25 Thread Mikhail V
On 26 October 2016 at 03:40, Steven D'Aprano  wrote:

> in a "table.txt" file, and typing:
> {
> 123: 456,
> 124: 457,
> 125: 458,
> # two hundred more lines
> }
> in a "" file? The difference is insignificant. And the Python
> version can be cleaned up:

Ok, you have opened my eyes here. Thank you,
you're good.

> [...]
>> Motivation is that those can be optimised for speed
> That's not a motivation. Why are you talking about "optimizing for
> speed" functions that we have not yet established are needed?
> That reminds me of a story I once heard of somebody who was driving
> across the desert in the US once. One of his passengers noticed the
> highway signs and said "Wait, aren't we going the wrong way?" The driver
> replied "Who cares, we're making fantastic time!"
> Optimizing a function you don't need is not an optimization. It is a
> waste of time.

Making good time is important indeed!
I need translate() which drops non-defined chars. Please :)
No optimisation, no new syntax. deal?

Re: [Python-ideas] More user-friendly version for string.translate()

2016-10-25 Thread Steven D'Aprano
On Tue, Oct 25, 2016 at 05:15:58PM +0200, Mikhail V wrote:

> >Or it can take a mapping (usually a dict) that maps either characters or
> >ordinal numbers to a new string (not just a single character, but an
> >arbitrary string) or ordinal numbers.
> >
> >str.maketrans({'a': 'A', 98: 66, 0x63: 0x43})
> >(or None, to delete them). Note the flexibility: you don't need to
> Good. But of course if I do it with big tables, I would anyway
> need to parse them from some table file. Typing all values
> direct in code is not a comfortable way.

Why not? What is the difference between typing

123: 456
124: 457
125: 458
# two hundred more lines

in a "table.txt" file, and typing:

123: 456,
124: 457,
125: 458,
# two hundred more lines

in a "" file? The difference is insignificant. And the Python 
version can be cleaned up:

for i in range(123, 333):
    table[i] = 456 - 123 + i

Not all data should be written as code, especially if you expect 
unskilled users to edit it, but generating data directly in code is a 
very powerful technique, and the strict syntax of the programming 
language helps prevent some errors.
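
Both routes can be sketched briefly. `parse_table` and the "key: value" line format are assumptions made for illustration, modelled on the table.txt example above.

```python
# Generated in code: the 210 entries from a single comprehension.
table = {i: 456 - 123 + i for i in range(123, 333)}
print(len(table), table[123], table[125], table[332])


def parse_table(lines):
    """Parse 'key: value' lines into a dict of ints (hypothetical format)."""
    parsed = {}
    for line in lines:
        line = line.split('#', 1)[0].strip()  # drop comments and blanks
        if line:
            key, value = line.split(':')
            parsed[int(key)] = int(value)
    return parsed


print(parse_table(["123: 456", "124: 457", "# two hundred more lines"]))
```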

> Motivation is that those can be optimised for speed 

That's not a motivation. Why are you talking about "optimizing for 
speed" functions that we have not yet established are needed?

That reminds me of a story I once heard of somebody who was driving 
across the desert in the US once. One of his passengers noticed the 
highway signs and said "Wait, aren't we going the wrong way?" The driver 
replied "Who cares, we're making fantastic time!"

Optimizing a function you don't need is not an optimization. It is a 
waste of time.

Re: [Python-ideas] More user-friendly version for string.translate()

2016-10-25 Thread Mikhail V
On 25 October 2016 at 19:10, Stephen J. Turnbull

 > So my previous thought on it was, that there could be set of such functions:
 > str.translate_keep(table) - this is current translate, namely keeps
 > non-defined chars untouched
 > str.translate_drop(table) - all the same, but dropping non-defined chars
 > Probably also a pair of functions without translation:
 > str.remove(chars) - removes given chars
 > str.keep(chars) - removes all, except chars
 > Motivation is that those can be optimised for speed and I suppose those
 > can work faster than re.sub().

>That said, multiple methods is a valid option for the API.  Eg, Guido
>generally prefers that distinctions that can't be made on type of
>arguments (such as translate_keep vs translate_drop) be done by giving
>different names rather than a flag argument.  Do you *like* this API,
>or was this motivated primarily by the possibilities you see for

Certainly I like the look of distinct functions more.
It allows me to visually parse the code effectively,
so e.g. for str.remove() I would not need to look
in docs to understand what the function does.
It has its downside of course, since new definitions
can accidentally be similar to current ones, so more
names, more the probability that no good names are left.
Speed is not so important for majority of cases, at least
for my current tasks. However if I'll need to process very large
texts (seems like I will), speed will be more important.

>The width is constant for any given string.  However, I don't see at
>this point that you'll need more than the functions available in
>Python already, plus one or more wrappers to marshal the information
>your API accepts to the data that str.translate wants.

Just in some cases I need to convert them to numpy arrays back and forth,
so this unicode vanity worries me a bit. But I cannot clearly explain
why exactly I need this.

 > >> but as said I don't like very much the idea and would be OK for me to
 > >> use numeric values only.
 > Yeah I am strange. This however gives you guarantee for any environment
 > that you can see and input them and save the work in ASCII.

>This is not going to be a problem if you're running Python and can
>enter the program and digits.  In any case, the API is going to have
>to be convenient for all the people who expect that they will never
>again be reduced to a hex keypad and 7-segment display

Here I will dare to make a lyrical digression again.
It could have made an impression that I am stuck in nineties or
something. But that is not the case. In nineties
I used the PC mostly to play Duke Nukem (yeh big times!).
And all the more I hadn't any idea what is efficiency
of information representation and readability.
Now I kind of realize it.
So I am just not the one who believes in these
maximalistical "we need over 9000 glyphs" talks.
And, somewhat prophetic view on this:
with the come of cyber era this all be flushed
so fast, that all this diligences around unicode
could look funny actually. And a hex keypad
will not sound "retro" but "brand new".

In other words: I feel really strong that nothing
besides standard characters must appear in code sources.
If one wants to process unicode, then parse them
as resources.
So please, at least out of respect to rationally
minded, don't make a code look like a christmas-tree.
BTW, I use VIM to code actually so anyway I will not
see them in my code.

Re: [Python-ideas] More user-friendly version for string.translate()

2016-10-24 Thread Steven D'Aprano
On Mon, Oct 24, 2016 at 07:39:16PM +0200, Mikhail V wrote:
> Hello all,
> I would be happy to see a somewhat more general and user friendly
> version of string.translate function.
> It could work this way:
> string.newtranslate(file_with_table, Drop=True, Dec=True)

That's an interesting concept for "user friendly". Apart from functions 
that are actually designed to read files of a particular format, can 
you think of any built-in functions that take a file as argument?

This is how you would use this "user friendly version of translate":

path = '/tmp/table'  # hope no other program is using it...
with open(path, 'w') as f:

with open(path, 'r') as f:
    new_string = old_string.newtranslate(f, False, True)

Compared to the existing solution:

new_string = old_string.translate(str.maketrans('abc', 'ABC'))

Mikhail, I appreciate that you have many ideas and want to share them, 
but try to think about how those ideas would work. The Python standard 
library is full of really well-designed programming interfaces. You can 
learn a lot by thinking "what existing function is this like? how does 
that existing function work?".

str.translate and str.maketrans already exist. Look at how maketrans 
builds a translation table: it can take either two equal length strings, 
and maps characters in one to the equivalent character in the other:

str.maketrans('abc', 'ABC')

Or it can take a mapping (usually a dict) that maps either characters or 
ordinal numbers to a new string (not just a single character, but an 
arbitrary string) or ordinal numbers. 

str.maketrans({'a': 'A', 98: 66, 0x63: 0x43})

(or None, to delete them). Note the flexibility: you don't need to 
specify ahead of time whether you are specifying the ordinal 
value as a decimal, hex, octal or binary value. Any expression that 
evaluates to a string or an int within the legal range is valid.

That's a good programming interface.
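
A short demonstration of the two forms described above. The sample mapping values ('see', and None for 'd') are arbitrary choices for illustration.

```python
# Two equal-length strings: characters map position by position.
t1 = str.maketrans('abc', 'ABC')

# A mapping: keys may be characters or ordinals; values may be
# strings of any length, ordinals, or None (delete).
t2 = str.maketrans({'a': 'A', 98: 66, 0x63: 'see', 'd': None})

print('abcd'.translate(t1))
print('abcd'.translate(t2))
```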

Could it be better? Perhaps. I've suggested that maybe translate could 
automatically call maketrans if given more than one argument. Maybe 
there's an easier way to just delete unwanted characters. Perhaps there 
could be a way to say "any character not in the translation table should 
be dropped". These are interesting questions.

> Further thoughts: for 8-bit strings this should be simple to implement
> I think.

I doubt that these new features will be added to bytes as well as 
strings. For 8-bits byte strings, it is easy enough to generate your own 
translation and deletion tables -- there are only 256 values to 

> For 16-bit of course
> there is issue of memory usage for lookup tables, but the gurus could
> probably optimise it.

There are no 16-bit strings.

Unicode is a 21-bit encoding, usually encoded as either a fixed-width 
sequence of 4-byte code units (UTF-32) or a variable-width sequence of 
2-byte (UTF-16) or 1-byte (UTF-8) code units. But it absolutely is not a 
"16-bit string".

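The sizes are easy to check from the interpreter. In this sketch,
code_unit_sizes is a helper name made up for the example, and the sample
characters are arbitrary.

```python
import sys


def code_unit_sizes(ch):
    """Size in bytes of one character encoded as (UTF-8, UTF-16, UTF-32)."""
    return tuple(len(ch.encode(enc)) for enc in ('utf-8', 'utf-16-le', 'utf-32-le'))


# The highest code point needs 21 bits.
print(hex(sys.maxunicode), sys.maxunicode.bit_length())  # 0x10ffff 21

for ch in ('$', 'π', '╔', '\U0001F600'):
    print(f'U+{ord(ch):04X}', code_unit_sizes(ch))
```

The last character lies outside the 16-bit range, which is why UTF-16 needs
two 2-byte code units for it.
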
> but as said I don't like very much the idea and would be OK for me to
> use numeric values only.

I think you are very possibly the only Python programmer in the world 
who thinks that writing decimal ordinal values is more user-friendly 
than writing the actual character itself. I know I would much rather 
see $, π or ╔ than 36, 960 or 9556.

Re: [Python-ideas] More user-friendly version for string.translate()

2016-10-24 Thread Chris Barker - NOAA Federal
> Just a pair of usage cases which I was facing in my practice:

> So I just define a table like:
> {
> 1072: 97
> 1073: 98
> 1074: 99
> ...
> [which localizes Cyrillic into ASCII]
> ...
> 97:97
> 98:98
> 99:99
> ...
> [those chars that are OK, leave them]
> }
> Then I use os.walk() and os.rename() and voila! the file system
> regains it virginity
> in one simple script.

This sounds like a perfect use case for str.translate() as it is.
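
Something along these lines, as a sketch only: the three-letter mapping
mirrors the quoted table and is nowhere near a real transliteration scheme.

```python
# Writing the characters themselves; maketrans produces the ordinal table.
table = str.maketrans('абв', 'abc')
print(table)                          # {1072: 97, 1073: 98, 1074: 99}

# Characters that are not in the table are left alone.
print('баба.txt'.translate(table))    # baba.txt
```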

> 2. Say I have a multi-lingual file or whatever, I want to filter out
> some unwanted
> characters so I can do it similarly.

Filtering out is different-- but I would think that you would want
replace, rather than remove.

If you wanted names to all comply with a given encoding (ascii or
Latin-1, or...), then encoding/decoding (with error set to replace)
would do nicely.
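
A minimal sketch of that round trip; force_encoding is a name invented
here, and the file name is just sample data.

```python
def force_encoding(name, encoding='ascii', errors='replace'):
    """Round-trip a string through an encoding so it only holds encodable characters."""
    return name.encode(encoding, errors=errors).decode(encoding)


name = 'na\u00efve \u0444\u0430\u0439\u043b.txt'   # 'naïve файл.txt'
print(force_encoding(name))                        # na?ve ????.txt
print(force_encoding(name, 'latin-1'))             # naïve ????.txt
print(force_encoding(name, errors='ignore'))       # nave .txt
```

With errors='replace' each unencodable character becomes '?'; with
errors='ignore' it is removed instead.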


> Mikhail
Re: [Python-ideas] More user-friendly version for string.translate()

2016-10-24 Thread Chris Barker - NOAA Federal
>> >>> re.sub('[^0-9]', '', 'ab0c2m3g5')
>> '0235'
>> Possibly because there's a lot of good Python builtins that allow you
>> to avoid the re module when *not* needed, it's easy to forget it in
>> the cases where it does pretty much exactly what you want,

There is a LOT of overhead to figuring out how to use the re module.
I've always thought it had its place, but it sure seems like
overkill for something this seemingly simple.

If (a big if) removing "all but these" was a common use case, it would
be nice to have a way to do it with string methods.
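
For comparison, here are two ways to spell "all but these" today, as a
sketch: keep_join and keep_re are names invented for this example. The re
version has to escape the characters before putting them in a character
class, which is part of the overhead mentioned above.

```python
import re


def keep_join(s, keep):
    """Keep only the characters of s that appear in `keep` (no re needed)."""
    allowed = set(keep)
    return ''.join(c for c in s if c in allowed)


def keep_re(s, keep):
    """Same result via a negated character class."""
    if not keep:
        return ''  # '[^]' is not a valid pattern, so handle the empty case
    return re.sub('[^' + re.escape(keep) + ']', '', s)


print(keep_join('ab0c2m3g5', '0123456789'))  # 0235
print(keep_re('ab0c2m3g5', '0123456789'))    # 0235
```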

This is a classic case of:

Put it on PyPI, and see how much interest it garners.

Re: [Python-ideas] More user-friendly version for string.translate()

2016-10-24 Thread Mikhail V
On 24 October 2016 at 23:10, Paul Moore  wrote:
> On 24 October 2016 at 21:54, Chris Barker  wrote:
>> I don't know a way to do "remove every character except these", but someone
>> I expect there is a way to do that efficiently with Python strings.
> It's easy enough with the re module:
> >>> re.sub('[^0-9]', '', 'ab0c2m3g5')
> '0235'
> Possibly because there's a lot of good Python builtins that allow you
> to avoid the re module when *not* needed, it's easy to forget it in
> the cases where it does pretty much exactly what you want, or can be
> persuaded to do so with much less difficulty than rolling your own
> solution (I know I'm guilty of that...).
> Paul

Thanks, this would solve the task of course.
However for example in the case in my last example (filenames)
this would require:

- Write a function to construct the expression for "all except given"
characters from my table. This could be easy I believe, but still another task.

1. Apply translate() with my table to the string.
2. Apply re.sub() to the string.

I usually start using RE when I want to find/replace words or patterns,
but not translate/filter the characters directly. So since there is
already an "inclusive"
translate() then probably having an "exclusive" one is not a bad idea.
I believe it is something very similar in implementation, so instead
of appending next character which is not in the table, it simply does nothing.
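
In pure Python the idea could be sketched roughly like this.
translate_exclusive is only a hypothetical name, and the table uses the same
ordinal keys that translate() takes.

```python
def translate_exclusive(s, table):
    """Like str.translate, but characters missing from the table are dropped."""
    out = []
    for ch in s:
        try:
            new = table[ord(ch)]
        except KeyError:
            continue            # not in the table: append nothing
        if new is None:
            continue            # explicitly deleted
        out.append(chr(new) if isinstance(new, int) else new)
    return ''.join(out)


table = {1072: 97, 1073: 98, 1074: 99, 97: 97, 98: 98, 99: 99}
print(translate_exclusive('абв abc?!', table))   # abcabc
```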

Re: [Python-ideas] More user-friendly version for string.translate()

2016-10-24 Thread Mikhail V
On 24 October 2016 at 22:54, Chris Barker  wrote:
> On Mon, Oct 24, 2016 at 1:30 PM, Mikhail V  wrote:
>> But how would you with current translate function drop all characters
>> that are not in the table?
> that is another question altogether, and one for a different list, actually.
> I don't know a way to do "remove every character except these", but someone
> I expect there is a way to do that efficiently with Python strings.
> you could probably (ab)use the codecs module, though.
> If there really is no way to do it, then you might have feature worth
> pursuing, but be prepared with use-cases!
> The only use-case I've had for that sort of this is when I want only ASCII
> -- but I can uses the ascii codec for that :-)
>> This for example
>> is needed for filtering out all non-standard characters from paths, etc.
> You'd usually want to replace those with something, rather than remove them
> entirely, yes?

Just a pair of usage cases which I was facing in my practice:
1. Imagine I perform some admin tasks in a company with very different users
who also tend to name the files as they wish. So only God knows what can
be there in filenames. And I know for example that there can be Cyrillic besides
ASCII there. So I just define a table like:
{
1072: 97
1073: 98
1074: 99
...
[which localizes Cyrillic into ASCII]
...
97:97
98:98
99:99
...
[those chars that are OK, leave them]
}

Then I use os.walk() and os.rename() and voila! the file system
regains its virginity in one simple script.
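
Such a script could look roughly like the sketch below. TABLE and
rename_tree are made-up names, the table is a tiny illustrative sample, and
name collisions are simply skipped, which a real script would need to handle
more carefully.

```python
import os

# Illustrative table only: spaces to underscores plus three Cyrillic letters.
TABLE = str.maketrans({' ': '_', 'а': 'a', 'б': 'b', 'в': 'c'})


def rename_tree(root, table=TABLE):
    """Rename every file and directory under root through the table."""
    renamed = []
    # Walk bottom-up so a directory is renamed only after its contents.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames + dirnames:
            new = name.translate(table)
            if new == name:
                continue
            src = os.path.join(dirpath, name)
            dst = os.path.join(dirpath, new)
            if os.path.exists(dst):
                continue        # do not overwrite an existing entry
            os.rename(src, dst)
            renamed.append((src, dst))
    return renamed
```

Calling rename_tree('/some/dir') returns the list of (old, new) paths that
were actually renamed.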

2. Say I have a multi-lingual file or whatever, I want to filter out
some unwanted
characters so I can do it similarly.

Re: [Python-ideas] More user-friendly version for string.translate()

2016-10-24 Thread Mikhail V
On 24 October 2016 at 20:02, Chris Barker  wrote:
> On Mon, Oct 24, 2016 at 10:50 AM, Ryan Birmingham 
> wrote:
>> I also believe that using a text file would not be the best solution;
>> using a dictionary,
> actually, now that you mention it -- .translate() already takes a dict, so
> if youw ant to put your translation table in a text file, you can use a dict
> literal to do it:
> # contents of file:
> {
> 32: 95,
> 105: 64,
> 115: 36,
> }
> then use it:
> s.translate(ast.literal_eval(open("trans_table.txt").read()))
> now all you need is a tiny little utility function:
> def translate_from_file(s, filename):
> return s.translate(ast.literal_eval(open(filename).read()))
> :-)
> -Chris

Yes, making a special file format is not a good option, I agree.
Also of course it does not make sense to read it every time if translate
is called in a loop with the same table. So it was merely a sketch of

But how would you with current translate function drop all characters
that are not in the table? so I can pass [deletechars] to the function but
this seems not very convenient to me -- very often I want to
drop them *all*, excluding some particular values.  This for example
is needed for filtering out all non-standard characters from paths, etc.
So in other words, there should be an option to control this behavior.
Probably I am missing something here, but I didn't find such solution
for translate() and that is main point of proposal actually.
It is all the same as translate() but with this extension it can cover
much more usage cases.

Re: [Python-ideas] More user-friendly version for string.translate()

2016-10-24 Thread Paul Moore
On 24 October 2016 at 18:39, Mikhail V  wrote:
> I would be happy to see a somewhat more general and user friendly
> version of string.translate function.
> It could work this way:
> string.newtranslate(file_with_table, Drop=True, Dec=True)

Using a text file seems very odd. But regardless, this could *easily*
be published on PyPI, and then if it gained enough users be proposed
for the stdlib. I don't think there's anything like sufficient value
to warrant "fast-tracking" something like this direct to the stdlib.
And real-world use via PyPI would very quickly establish whether the
unusual "pass a file with a translation table in it" design was
acceptable to users.

Re: [Python-ideas] More user-friendly version for string.translate()

2016-10-24 Thread Chris Barker
On Mon, Oct 24, 2016 at 10:50 AM, Ryan Birmingham 

> I also believe that using a text file would not be the best solution;
> using a dictionary,

actually, now that you mention it -- .translate() already takes a dict, so
if you want to put your translation table in a text file, you can use a
dict literal to do it:

# contents of file:
{
32: 95,
105: 64,
115: 36,
}

then use it:

s.translate(ast.literal_eval(open("trans_table.txt").read()))

now all you need is a tiny little utility function:

def translate_from_file(s, filename):
    return s.translate(ast.literal_eval(open(filename).read()))
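
A slightly more defensive variant of the same idea, as a sketch: load_table
is a name added here, and it assumes the file holds nothing but a dict
literal in UTF-8.

```python
import ast


def load_table(filename):
    """Read a dict literal such as {32: 95, 105: 64} from a text file."""
    with open(filename, encoding='utf-8') as f:
        table = ast.literal_eval(f.read())
    if not isinstance(table, dict):
        raise ValueError('expected a dict literal in %r' % filename)
    return table


def translate_from_file(s, filename):
    """Translate s using the table stored in filename."""
    return s.translate(load_table(filename))
```

Loading the table once with load_table and reusing it avoids re-reading the
file when translating many strings.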



> other data structure, or anonomyous function would make more sense than
> having a specially formatted file.
> On Oct 24, 2016 13:45, "Chris Barker"  wrote:
>> my thought on this:
>> If you need translate() you probably can write the code to parse a text
>> file, and then you can use whatever format you want.
>> This seems a very special case to build into the stdlib.
>> -CHB
>> On Mon, Oct 24, 2016 at 10:39 AM, Mikhail V  wrote:
>>> Hello all,
>>> I would be happy to see a somewhat more general and user friendly
>>> version of string.translate function.
>>> It could work this way:
>>> string.newtranslate(file_with_table, Drop=True, Dec=True)
>>> So the parameters:
>>> 1. "file_with_table" : a text file with table in following format:
>>> #[In][Out]
>>> 97{65}
>>> 98{66}
>>> 99{67}
>>> 100{}
>>> ...
>>> 110{110}
>>> Notes:
>>> All values are decimal or hex (to switch between parsing format use
>>> Dec parameter)
>>> As it turned out from my last discussion, majority prefers hex notation,
>>> so I am not in mainstream with my decimal notation here, but both
>>> should be supported.
>>> Empty [Out] value {} means that the character will be deleted.
>>> 2. "Drop = True" this will set the default behavior for those values
>>> which are NOT in the table.
>>> For Drop = True: all values not defined in table set to [out] = {},
>>> and be deleted.
>>> For Drop=False: all values not defined in table set [out] = [in], so
>>> those remain as is.
>>> 3. Dec= True : parsing format Decimal/hex. I use decimal everywhere.
>>> Further thoughts: for 8-bit strings this should be simple to implement
>>> I think. For 16-bit of course
>>> there is issue of memory usage for lookup tables, but the gurus could
>>> probably optimise it.
>>> E.g. at the parsing stage it is not necessary to build the lookup
>>> table  for whole 16-bit range of course,
>>> but take only values till the largest ordinal present in the table file.
>>> About the format of table file: I suppose many users would want also
>>> to define characters directly, I am not sure
>>> if it is really needed, but if so, additional brackets or escape char
>>> could be used, like this for example:
>>> a{A}
>>> \98{\66}
>>> \99{\67}
>>> but as said I don't like very much the idea and would be OK for me to
>>> use numeric values only.
>>> So approximately I see it.
>>> Feel free to share thoughts or criticise.
>>> Mikhail
>> --
>> Christopher Barker, Ph.D.
>> Oceanographer
>> Emergency Response Division
>> NOAA/NOS/OR(206) 526-6959   voice
>> 7600 Sand Point Way NE   (206) 526-6329   fax
>> Seattle, WA  98115   (206) 526-6317   main reception


Christopher Barker, Ph.D.

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception
Re: [Python-ideas] More user-friendly version for string.translate()

2016-10-24 Thread Chris Barker
my thought on this:

If you need translate() you probably can write the code to parse a text
file, and then you can use whatever format you want.

This seems a very special case to build into the stdlib.
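
To give an idea of how little code that is, here is a sketch of a parser for
the format proposed below. parse_table and _DropMissing are names made up
here, the drop and dec parameters mirror the proposed Drop and Dec flags,
and the parser accepts only lines of the form in{out}, so the "..."
placeholder lines in the proposal are not supported.

```python
import re

_LINE = re.compile(r'^([0-9A-Fa-f]+)\{([0-9A-Fa-f]*)\}$')


class _DropMissing(dict):
    """Table that deletes any character it has no entry for."""

    def __missing__(self, key):
        return None


def parse_table(text, drop=False, dec=True):
    """Build a str.translate table from lines such as '97{65}' or '100{}'."""
    base = 10 if dec else 16
    table = _DropMissing() if drop else {}
    for lineno, line in enumerate(text.splitlines(), 1):
        line = line.strip()
        if not line or line.startswith('#'):
            continue                      # blank line or comment
        m = _LINE.match(line)
        if m is None:
            raise ValueError('line %d: cannot parse %r' % (lineno, line))
        src, dst = m.groups()
        # An empty {} means the character is deleted.
        table[int(src, base)] = int(dst, base) if dst else None
    return table


text = '#[In][Out]\n97{65}\n98{66}\n99{67}\n100{}\n110{110}\n'
print('abcdnz'.translate(parse_table(text)))             # ABCnz
print('abcdnz'.translate(parse_table(text, drop=True)))  # ABCn
```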


On Mon, Oct 24, 2016 at 10:39 AM, Mikhail V  wrote:

> Hello all,
> I would be happy to see a somewhat more general and user friendly
> version of string.translate function.
> It could work this way:
> string.newtranslate(file_with_table, Drop=True, Dec=True)
> So the parameters:
> 1. "file_with_table" : a text file with table in following format:
> #[In][Out]
> 97{65}
> 98{66}
> 99{67}
> 100{}
> ...
> 110{110}
> Notes:
> All values are decimal or hex (to switch between parsing format use
> Dec parameter)
> As it turned out from my last discussion, majority prefers hex notation,
> so I am not in mainstream with my decimal notation here, but both
> should be supported.
> Empty [Out] value {} means that the character will be deleted.
> 2. "Drop = True" this will set the default behavior for those values
> which are NOT in the table.
> For Drop = True: all values not defined in table set to [out] = {},
> and be deleted.
> For Drop=False: all values not defined in table set [out] = [in], so
> those remain as is.
> 3. Dec= True : parsing format Decimal/hex. I use decimal everywhere.
> Further thoughts: for 8-bit strings this should be simple to implement
> I think. For 16-bit of course
> there is issue of memory usage for lookup tables, but the gurus could
> probably optimise it.
> E.g. at the parsing stage it is not necessary to build the lookup
> table  for whole 16-bit range of course,
> but take only values till the largest ordinal present in the table file.
> About the format of table file: I suppose many users would want also
> to define characters directly, I am not sure
> if it is really needed, but if so, additional brackets or escape char
> could be used, like this for example:
> a{A}
> \98{\66}
> \99{\67}
> but as said I don't like very much the idea and would be OK for me to
> use numeric values only.
> So approximately I see it.
> Feel free to share thoughts or criticise.
> Mikhail


