Re: string storage [was: Re: imaplib: is this really so unwieldy?]

2021-05-26 Thread Cameron Simpson
On 26May2021 12:11, Jon Ribbens  wrote:
>On 2021-05-26, Alan Gauld  wrote:
>> I confess I had just assumed the unicode strings were stored
>> in native unicode UTF8 format.
>
>If you do that then indexing and slicing strings becomes very slow.

True, but that isn't necessarily a show stopper. My impression, on 
reflection, is that most slicing is close to the beginning or end of a 
string, and that _most_ strings are small. (Alan has exceptions at least 
to the latter.) In those circumstances, the cost of slicing a variable 
width encoding is greatly mitigated.

Indexing is probably more general (in my subjective hand waving 
guesstimation). But... how common is indexing into large strings?  
Versus, say, iteration over a large string?

I was surprised when getting introduced to Golang a few years ago that 
it stores all Strings as UTF8 byte sequences. And when writing Go code, 
I found very few circumstances where that would actually bring 
performance issues, which I attribute in part to my suggestions above 
about when, in practical terms, we slice and index strings.

If the internal storage is UTF8, then in an ecosystem where all, or 
most, text files are themselves UTF8 then reading a text file has zero 
decoding cost - you can just read the bytes and store them! And to write 
a String out to a UTF8 file, you just copy the bytes - zero encoding!

Also, UTF8 is a funny thing - it is deliberately designed so that you 
can just jump into the middle of an arbitrary stream of UTF8 bytes and 
find the character boundaries. That doesn't solve slicing/indexing in 
general, but it does avoid any risk of producing mojibake just by 
starting your decode at a random place.
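
As a rough illustration of that self-synchronising property, here is a minimal Python sketch. The helper name next_char_boundary is invented for this example; it relies only on the fact that UTF-8 continuation bytes always have the bit pattern 10xxxxxx.

```python
def next_char_boundary(data: bytes, pos: int) -> int:
    """Return the first index >= pos where a UTF-8 character starts."""
    # Continuation bytes look like 0b10xxxxxx; skip them to reach a lead byte.
    while pos < len(data) and (data[pos] & 0xC0) == 0x80:
        pos += 1
    return pos


data = "Stra\u00dfe \u2026 \U0001F921".encode("utf-8")

# Offset 5 lands on the second byte of the two-byte sequence for U+00DF.
start = next_char_boundary(data, 5)
tail = data[start:].decode("utf-8")  # decodes cleanly from the boundary
```

This only finds where a character starts; it says nothing about which character number that is, so it does not make indexing cheap.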

Cheers,
Cameron Simpson 
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: string storage [was: Re: imaplib: is this really so unwieldy?]

2021-05-26 Thread Alan Gauld via Python-list
On 26/05/2021 22:15, Tim Chase wrote:

> If you don't decode it upon reading it in, it should still be 100MB
> because it's a stream of encoded bytes.  

I usually convert them to utf8.

> You don't specify what you then do with this humongous string, 

Mainly I search for regex patterns which can span multiple lines.
I could chunk it up if memory was an issue but a single read is
just more convenient. Up until now it hasn't been an issue and
to be honest I don't often hit multi-byte characters so mostly
it will be single byte character strings.
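
A small sketch of why a single read is convenient for this kind of search: with the whole file in one string, a pattern can match across line breaks. The sample text and the pattern are invented for illustration.

```python
import re

# Stand-in for the result of open(path, encoding="utf-8").read()
text = "Title: Notes\nAbstract:\nfirst line\nsecond line\nEnd\nBody...\n"

# re.DOTALL lets "." match newlines, so the group can span several lines.
pattern = re.compile(r"Abstract:\n(.*?)\nEnd", re.DOTALL)

match = pattern.search(text)
abstract = match.group(1) if match else None
```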

They are mostly research papers and such from my university days
written on a Commodore PET and various early DOS computers with
weird long-lost word processors. Over the years they've been
exported/converted/reimported and then re-exported several times.
A very few have embedded text or "graphics"/equations which might
have some unicode characters but it's not a big issue for me in practice.
I was more just thinking of the kinds of scenario where big strings
might become a problem if suddenly consuming 4x the storage
you expect.

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos




Re: Turtle module

2021-05-26 Thread Greg Ewing

On 27/05/21 4:17 am, Chris Angelico wrote:
> Worst case, it
> is technically available as the ._fullcircle member, but I would
> advise against using that if you can help it!

If you're worried about that, you could create your own
turtle subclass that tracks the state how you want.
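
One possible shape for such a subclass, as a sketch only: a mixin that records the last angle unit requested through the documented degrees() and radians() methods. The names AngleModeMixin, TrackingTurtle and angle_fullcircle are invented here, and the stand-in base class exists only so the snippet runs without opening a Tk window; with the real module the mixin would be combined with turtle.Turtle instead.

```python
import math


class AngleModeMixin:
    """Remember the full-circle value set via degrees() / radians()."""

    def degrees(self, fullcircle=360.0):
        self.angle_fullcircle = fullcircle   # our own public record
        super().degrees(fullcircle)

    def radians(self):
        self.angle_fullcircle = math.tau
        super().radians()


class _StandInTurtle:
    """Hypothetical placeholder for turtle.Turtle (no window needed)."""

    def __init__(self):
        self.degrees()          # assumption: the base class starts in degrees

    def degrees(self, fullcircle=360.0):
        pass

    def radians(self):
        pass


# With the real module: class TrackingTurtle(AngleModeMixin, turtle.Turtle)
class TrackingTurtle(AngleModeMixin, _StandInTurtle):
    pass


t = TrackingTurtle()
saved = t.angle_fullcircle      # capture the caller's units on entry
t.radians()                     # work in radians inside the function
in_radians = t.angle_fullcircle
t.degrees(saved)                # restore on return
```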

--
Greg


Re: Turtle module

2021-05-26 Thread boB Stepp
On Wed, May 26, 2021 at 10:59 AM Michael F. Stemper  wrote:

> In order to turn the turtle, I need to select a way to represent
> angles. I could use either degrees or radians (or, I suppose,
> grads). However, for my functions to work, I need to set the
> turtle to that mode. This means that I could possibly mess up
> things for the caller. What I would like to do is capture the
> angle-representation mode on entry and restore it on return.
> However, looking through the methods of turtle.Turtle(), I
> can't find any means of capturing this information

In the "Tell Turtle's state" section
(https://docs.python.org/3/library/turtle.html#tell-turtle-s-state) I
would think that you could roll your own function by using your
knowledge of the coordinates of where the turtle is at and the
turtle.towards(0, 0).  By computing your own angle (knowing what units
you are using) and comparing with what is returned you should be able
to determine what angular units are currently set.

HTH!
boB Stepp


Re: string storage [was: Re: imaplib: is this really so unwieldy?]

2021-05-26 Thread Tim Chase
On 2021-05-26 18:43, Alan Gauld via Python-list wrote:
> On 26/05/2021 14:09, Tim Chase wrote:
>>> If so, doesn't that introduce a pretty big storage overhead for
>>> large strings?  
>> 
>> Yes.  Though such large strings tend to be more rare, largely
>> because they become unwieldy for other reasons.  
> 
> I do have some scripts that work on large strings - mainly produced
> by reading an entire text file into a string using file.read().
> Some of these are several MB long so potentially now 4x bigger than
> I thought. But you are right, even a 100MB string should still be
> OK on a modern PC with 8GB+ RAM!...

If you don't decode it upon reading it in, it should still be 100MB
because it's a stream of encoded bytes.  It would only 2x or 4x in
size if you decoded that (either as a parameter of how you opened it,
or if you later took that string and decoded it explicitly, though
now you have the original 100MB byte-string **plus** the 100/200/400MB
decoded unicode string).
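
To put rough numbers on this, here is a small sketch comparing an undecoded byte string with its decoded form. The sample data is invented, and sys.getsizeof reports CPython-specific figures, so treat the sizes as indicative only.

```python
import sys

# Stand-in for file contents: mostly ASCII plus one character outside the BMP.
raw = ("a" * 1000 + "\U0001F921").encode("utf-8")   # like open(path, "rb").read()
text = raw.decode("utf-8")                          # like reading in text mode

raw_len = len(raw)                 # number of encoded bytes
text_len = len(text)               # number of code points
raw_size = sys.getsizeof(raw)      # roughly one byte per encoded byte
text_size = sys.getsizeof(text)    # four bytes per code point for this string
```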

You don't specify what you then do with this humongous string, but
for most of my large files like this, I end up iterating over them
piecewise rather than f.read()'ing them all in at once. Or even if
the whole file does end up in memory, it's usually chunked and split
into useful pieces.  That could mean that each line is its own
string, almost all of which are one-byte-per-char with a couple
strings at sporadic positions in the list-of-strings where they are
2/4 bytes per char.

-tkc




Re: Turtle module

2021-05-26 Thread Chris Angelico
On Thu, May 27, 2021 at 6:51 AM Michael F. Stemper  wrote:
>
> On 26/05/2021 11.17, Chris Angelico wrote:
> > On Thu, May 27, 2021 at 1:59 AM Michael F. Stemper  
> > wrote:
>
>
> >>What I would like to do is capture the
> >> angle-representation mode on entry and restore it on return.
> >> However, looking through the methods of turtle.Turtle(), I
> >> can't find any means of capturing this information.
> >>
> >> Similarly, I'd like to return with the pen in the same state
> >> (up or down) as when my functions were called. Again, I can't
> >> find any way to query the pen state.
> >>
> >> How can my function get this information? Or do I need to be
> >> sloppy and let the callers beware?
> >
> > For the most part, this is what you want:
> >
> > https://docs.python.org/3/library/turtle.html#turtle.pen
>
> Just found the same thing myself and came back to report it. But,
> immediately after your link is:
> 
> which seems even better for up/down.

Yeah, if that's the only thing you need.

> > It doesn't seem to include the fullcircle state (which is what
> > .degrees() and .radians() set), and I'm not sure why. Worst case, it
> > is technically available as the ._fullcircle member, but I would
> > advise against using that if you can help it!
>
> Well, I'll take your advice and not use it. Seems really sloppy to
> change an attribute of an object as a side-effect, but I guess that
> I'll have to be sloppy.
>

If this matters to you, here's what I'd recommend: Propose that the
fullcircle state be added to what pen() returns and can update, and
also possibly propose a context manager, so you can do something like
this:

def f():
    with turt.local():
        turt.pendown()
        ...

    # once we get here, the turtle's pen state will have been restored

I'm not the person to ask about these, as I don't use the turtle
module, but those would seem fairly plausible enhancements.
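
Until something like that exists, a caller-side approximation can be built from documented methods. The sketch below restores only the pen up/down state, using isdown(), penup() and pendown(); the name preserved_pen and the stand-in class are invented for illustration, and the angle mode is not covered because there is no public query for it.

```python
from contextlib import contextmanager


@contextmanager
def preserved_pen(t):
    """Restore the pen's up/down state when the block exits."""
    was_down = t.isdown()
    try:
        yield t
    finally:
        if was_down:
            t.pendown()
        else:
            t.penup()


class _StandInTurtle:
    """Hypothetical placeholder for turtle.Turtle so this runs without Tk."""

    def __init__(self):
        self._down = True

    def isdown(self):
        return self._down

    def pendown(self):
        self._down = True

    def penup(self):
        self._down = False


turt = _StandInTurtle()
turt.penup()
with preserved_pen(turt):
    turt.pendown()
    inside = turt.isdown()
after = turt.isdown()
```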

ChrisA


Re: string storage [was: Re: imaplib: is this really so unwieldy?]

2021-05-26 Thread Alan Gauld via Python-list
On 26/05/2021 14:09, Tim Chase wrote:

>> If so, doesn't that introduce a pretty big storage overhead for
>> large strings?
> 
> Yes.  Though such large strings tend to be more rare, largely because
> they become unwieldy for other reasons.

I do have some scripts that work on large strings - mainly produced by
reading an entire text file into a string using file.read(). Some of
these are several MB long so potentially now 4x bigger than I thought.
But you are right, even a 100MB string should still be OK on a
modern PC with 8GB+ RAM!...

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos




Re: Turtle module

2021-05-26 Thread Michael F. Stemper

On 26/05/2021 13.24, Stefan Ram wrote:
> "Michael F. Stemper"  writes:
>>   What I would like to do is capture the
>> angle-representation mode on entry and restore it on return.
>
>   another one:
>
> def f( turtle_ ):
>  my_turtle = turtle_.clone()
>  # now work with my_turtle only

Oooh, I like this. It's proof against other types of bullets that
might come along.

>   All of the above ideas have *not* been tested.

My first test found me with two arrows (turtle icons?) upon return.
I immediately realized what caused it, and put
   t.hideturtle()
at the end of my testbed function.

Thanks,
--
Michael F. Stemper
Economists have correctly predicted seven of the last three recessions.


Re: Turtle module

2021-05-26 Thread Michael F. Stemper

On 26/05/2021 11.17, Chris Angelico wrote:
> On Thu, May 27, 2021 at 1:59 AM Michael F. Stemper  wrote:

>>   What I would like to do is capture the
>> angle-representation mode on entry and restore it on return.
>> However, looking through the methods of turtle.Turtle(), I
>> can't find any means of capturing this information.
>>
>> Similarly, I'd like to return with the pen in the same state
>> (up or down) as when my functions were called. Again, I can't
>> find any way to query the pen state.
>>
>> How can my function get this information? Or do I need to be
>> sloppy and let the callers beware?
>
> For the most part, this is what you want:
>
> https://docs.python.org/3/library/turtle.html#turtle.pen

Just found the same thing myself and came back to report it. But,
immediately after your link is:

which seems even better for up/down.

> It doesn't seem to include the fullcircle state (which is what
> .degrees() and .radians() set), and I'm not sure why. Worst case, it
> is technically available as the ._fullcircle member, but I would
> advise against using that if you can help it!

Well, I'll take your advice and not use it. Seems really sloppy to
change an attribute of an object as a side-effect, but I guess that
I'll have to be sloppy.

Thanks
--
Michael F. Stemper
Deuteronomy 10:18-19


Re: string storage [was: Re: imaplib: is this really so unwieldy?]

2021-05-26 Thread Terry Reedy

On 5/26/2021 12:07 PM, Chris Angelico wrote:
> On Thu, May 27, 2021 at 1:59 AM Jon Ribbens via Python-list
>  wrote:
>>
>> On 2021-05-26, Alan Gauld  wrote:
>>> On 25/05/2021 23:23, Terry Reedy wrote:
>>>> In CPython's Flexible String Representation all characters in a string
>>>> are stored with the same number of bytes, depending on the largest
>>>> codepoint.
>>>
>>> I'm learning lots of new things in this thread!
>>>
>>> Does that mean that if I give Python a UTF8 string that is mostly single
>>> byte characters but contains one 4-byte character that Python will store
>>> the string as all 4-byte characters?

Note that while unix uses utf-8, Windows uses utf-16.

>>> If so, doesn't that introduce a pretty big storage overhead for
>>> large strings?
>>
>> Memory is cheap ;-)
>
> This is true, but sometimes memory translates into time - either
> direction. When the Flexible String Representation came in, it was
> actually an alternative to using four bytes per character on ALL
> strings (not just those that contain non-BMP characters),


Except on Windows, where CPython used 2 bytes/char + surrogates for 
non-BMP char.  This meant that indexing did not quite work on Windows 
and that applications that allowed astral chars and wanted to work on 
all systems had to have separate code for Windows and unix-based systems.
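
To make the surrogate problem concrete, here is a small sketch that counts UTF-16 code units, which is roughly what len() and indexing operated on in those older 2-bytes-per-char builds. The helper name utf16_units is invented for the example.

```python
def utf16_units(s: str) -> int:
    """Number of 16-bit code units needed to store s as UTF-16."""
    return len(s.encode("utf-16-le")) // 2


bmp = "Stra\u00dfe"          # every character fits in one code unit
astral = "a\U0001F921"       # U+1F921 needs a surrogate pair

bmp_units = utf16_units(bmp)
astral_units = utf16_units(astral)
```

For the second string the code-unit count is one more than the number of characters, which is why character positions and storage positions could disagree.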



> and it
> actually improved performance quite notably, despite some additional
> complications.

And it made CPython text manipulation code work on all CPython systems.

> Performance optimization is a funny science :)



--
Terry Jan Reedy



Re: Turtle module

2021-05-26 Thread Chris Angelico
On Thu, May 27, 2021 at 1:59 AM Michael F. Stemper  wrote:
> In order to turn the turtle, I need to select a way to represent
> angles. I could use either degrees or radians (or, I suppose,
> grads). However, for my functions to work, I need to set the
> turtle to that mode. This means that I could possibly mess up
> things for the caller. What I would like to do is capture the
> angle-representation mode on entry and restore it on return.
> However, looking through the methods of turtle.Turtle(), I
> can't find any means of capturing this information.
>
> Similarly, I'd like to return with the pen in the same state
> (up or down) as when my functions were called. Again, I can't
> find any way to query the pen state.
>
> How can my function get this information? Or do I need to be
> sloppy and let the callers beware?

For the most part, this is what you want:

https://docs.python.org/3/library/turtle.html#turtle.pen

It doesn't seem to include the fullcircle state (which is what
.degrees() and .radians() set), and I'm not sure why. Worst case, it
is technically available as the ._fullcircle member, but I would
advise against using that if you can help it!

ChrisA


Re: string storage [was: Re: imaplib: is this really so unwieldy?]

2021-05-26 Thread Chris Angelico
On Thu, May 27, 2021 at 1:59 AM Jon Ribbens via Python-list
 wrote:
>
> On 2021-05-26, Alan Gauld  wrote:
> > On 25/05/2021 23:23, Terry Reedy wrote:
> >> In CPython's Flexible String Representation all characters in a string
> >> are stored with the same number of bytes, depending on the largest
> >> codepoint.
> >
> > I'm learning lots of new things in this thread!
> >
> > Does that mean that if I give Python a UTF8 string that is mostly single
> > byte characters but contains one 4-byte character that Python will store
> > the string as all 4-byte characters?
> >
> > If so, doesn't that introduce a pretty big storage overhead for
> > large strings?
>
> Memory is cheap ;-)
>

This is true, but sometimes memory translates into time - either
direction. When the Flexible String Representation came in, it was
actually an alternative to using four bytes per character on ALL
strings (not just those that contain non-BMP characters), and it
actually improved performance quite notably, despite some additional
complications.

Performance optimization is a funny science :)

ChrisA


Pandas: How does df.apply(lambda work to create a result

2021-05-26 Thread Veek M
t = pd.DataFrame([[4,9],]*3, columns=['a', 'b'])
   a  b
0  4  9
1  4  9
2  4  9

t.apply(lambda x: [x]) gives
a    [[1, 2, 2]]
b    [[1, 2, 2]]
How?? When you type 't' in the console the entire data frame is dumped, but how are 
the individual elements passed into .apply()? I can't do lambda x,y: [x,y] 
because only 1 arg is passed, so how does [4] generate [[ ]]?

Also - this:
 t.apply(lambda x: [x], axis=1)
0    [[139835521287488, 139835521287488]]
1    [[139835521287488, 139835521287488]]
2    [[139835521287488, 139835521287488]]
very weird - what just happened??

In addition, how do I filter data, e.g.: t[x] = t[x].apply(lambda x: x*72.72)? I'd 
like to remove the numeric -1 values contained in the column output of t[x]. 'filter' only 
works with labels of indices; I can't do t[ t[x] != -1 ] because that will then 
generate all the rows and I have no idea how that will be translated to 
within a .apply(lambda x... (hence my Q on what's going on internally)

Could someone clarify please.

(could someone also tell me briefly the best way to use NNTP and filter
out the SPAM - 'pan' and 'tin' don't work anymore afaik
[eternal-september]  and I'm using slrn currently - the SLang regex is 
weird within the kill file - couldn't get it to work - wound up killing
everything when I did
Subject: [A-Z][A-Z][A-Z]+
)


Turtle module

2021-05-26 Thread Michael F. Stemper

I recently discovered the turtle module and have been playing
around with it for the last few days. I've started writing some
functions for turtles, and would like to make them a bit more
robust than what I currently have.

In particular, I'd like the state of the turtle to be more or
less the same after the function return as it was before the
function invocation.

In order to turn the turtle, I need to select a way to represent
angles. I could use either degrees or radians (or, I suppose,
grads). However, for my functions to work, I need to set the
turtle to that mode. This means that I could possibly mess up
things for the caller. What I would like to do is capture the
angle-representation mode on entry and restore it on return.
However, looking through the methods of turtle.Turtle(), I
can't find any means of capturing this information.

Similarly, I'd like to return with the pen in the same state
(up or down) as when my functions were called. Again, I can't
find any way to query the pen state.

How can my function get this information? Or do I need to be
sloppy and let the callers beware?

--
Michael F. Stemper
2 Chronicles 19:7


Re: string storage [was: Re: imaplib: is this really so unwieldy?]

2021-05-26 Thread Jon Ribbens via Python-list
On 2021-05-26, Alan Gauld  wrote:
> On 25/05/2021 23:23, Terry Reedy wrote:
>> In CPython's Flexible String Representation all characters in a string 
>> are stored with the same number of bytes, depending on the largest 
>> codepoint.
>
> I'm learning lots of new things in this thread!
>
> Does that mean that if I give Python a UTF8 string that is mostly single
> byte characters but contains one 4-byte character that Python will store
> the string as all 4-byte characters?
>
> If so, doesn't that introduce a pretty big storage overhead for
> large strings?

Memory is cheap ;-)

> I confess I had just assumed the unicode strings were stored
> in native unicode UTF8 format.

If you do that then indexing and slicing strings becomes very slow.


exit() builtin, was Re: imaplib: is this really so unwieldy?

2021-05-26 Thread Peter Otten

On 26/05/2021 01:02, Cameron Simpson wrote:
> On 25May2021 15:53, Dennis Lee Bieber  wrote:
>> On Tue, 25 May 2021 19:21:39 +0200, hw  declaimed the
>> following:
>>> Oh ok, it seemed to be fine.  Would it be the right way to do it with
>>> sys.exit()?  Having to import another library just to end a program
>>> might not be ideal.
>>
>> I've never had to use sys. for exit...
>>
>> C:\Users\Wulfraed>python
>> Python ActivePython 3.8.2 (ActiveState Software Inc.) based on
>> on win32
>> Type "help", "copyright", "credits" or "license" for more information.
>> >>> exit()
>
> I have learned a new thing today.
>
> Regardless, hw didn't call it, just named it :-)


exit() is inserted into the built-ins by site.py. This means it may not 
be available:


PS D:\> py -c "exit('bye ')"
bye
PS D:\> py -S -c "exit('bye ')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
NameError: name 'exit' is not defined

I have no idea if this is of any practical relevance...
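
For anyone who wants to reproduce this from a script rather than a shell, here is a minimal sketch that runs a child interpreter with -S and compares the site-provided exit() with sys.exit(). The helper name run is invented; everything else is standard library.

```python
import subprocess
import sys


def run(*args):
    """Run a child interpreter and return (exit status, stderr text)."""
    proc = subprocess.run([sys.executable, *args], capture_output=True, text=True)
    return proc.returncode, proc.stderr


# With -S the site module is skipped, so the exit() helper is never installed.
status_builtin, err_builtin = run("-S", "-c", "exit('bye')")

# sys.exit() lives in the sys module and does not depend on site.
status_sys, err_sys = run("-S", "-c", "import sys; sys.exit('bye')")
```

The first call should fail with a NameError in the child, while the second exits normally with the message on stderr, which is one reason scripts tend to prefer sys.exit().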



Re: string storage [was: Re: imaplib: is this really so unwieldy?]

2021-05-26 Thread Tim Chase
On 2021-05-26 08:18, Alan Gauld via Python-list wrote:
> Does that mean that if I give Python a UTF8 string that is mostly
> single byte characters but contains one 4-byte character that
> Python will store the string as all 4-byte characters?

As best I understand it, yes:  the cost of each "character" in a
string is the same for the entire string, so even one lone 4-byte
character in an otherwise 1-byte-character string is enough to push
the whole string to 4-byte characters.  Doesn't affect other strings
though (so if you had a pure 7-bit string and a unicode string, the
former would still be 1-byte-per-char…it's not a global aspect)

If you encode these to a UTF8 byte-string, you'll get the space
savings you seek, but at the cost of sensible O(1) indexing.
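
A short sketch of what losing O(1) indexing means in practice: indexing the encoded bytes gives a byte, and finding the n-th character means walking the data from the start. The helper utf8_char_at is invented for illustration and is not how any particular implementation does it.

```python
def utf8_char_at(data: bytes, index: int) -> str:
    """Return the index-th character of UTF-8 data by scanning from the start."""
    if index < 0:
        raise IndexError("negative indexes are not supported in this sketch")
    count = -1
    start = None
    for pos, byte in enumerate(data):
        if (byte & 0xC0) != 0x80:        # a lead byte starts a new character
            count += 1
            if count == index:
                start = pos
            elif count == index + 1:
                return data[start:pos].decode("utf-8")
    if start is None:
        raise IndexError("character index out of range")
    return data[start:].decode("utf-8")


text = "Stra\u00dfe"
data = text.encode("utf-8")

by_char = text[4]                 # direct lookup in the str
by_byte = data[4]                 # a raw byte value, not a character
by_scan = utf8_char_at(data, 4)   # linear walk over the bytes
```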

Both are a trade-off, and if your data consists mostly of 7-bit ASCII
characters, or lots of small strings, the overhead is less pronounced
than if you have one single large blob of text as a string.

> If so, doesn't that introduce a pretty big storage overhead for
> large strings?

Yes.  Though such large strings tend to be more rare, largely because
they become unwieldy for other reasons.

-tkc




Re: string storage [was: Re: imaplib: is this really so unwieldy?]

2021-05-26 Thread Chris Angelico
On Wed, May 26, 2021 at 10:04 PM Alan Gauld via Python-list
 wrote:
>
> On 25/05/2021 23:23, Terry Reedy wrote:
>
> > In CPython's Flexible String Representation all characters in a string
> > are stored with the same number of bytes, depending on the largest
> > codepoint.
>
> I'm learning lots of new things in this thread!
>
> Does that mean that if I give Python a UTF8 string that is mostly single
> byte characters but contains one 4-byte character that Python will store
> the string as all 4-byte characters?

Nitpick: It won't be "a UTF-8 string"; it will be "a Unicode string".
UTF-8 is a scheme for representing Unicode as a series of bytes, so if
something is UTF-8, it'll be like b'Stra\xc3\x9fe' (with two bytes
representing one non-ASCII character), whereas the corresponding
Unicode string is 'Stra\xdfe' with a single character. Or, if it were
beyond the first 256 characters, '\u2026' is an ellipsis,
b'\xe2\x80\xa6' is a UTF-8 representation of that same character. And
if it's beyond the BMP, then '\U0001F921' is one of the few non-ASCII
characters that you can legitimately write off as a "funny character",
and b'\xf0\x9f\xa4\xa1' is the UTF-8 byte sequence that would carry
that.
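
The pairs above can be checked directly; this short sketch encodes the three sample strings and records their lengths as characters and as UTF-8 bytes.

```python
samples = ["Stra\u00dfe", "\u2026", "\U0001F921"]

# str.encode produces the bytes object; bytes.decode reverses it.
encoded = [s.encode("utf-8") for s in samples]
decoded = [b.decode("utf-8") for b in encoded]

# (number of characters, number of UTF-8 bytes) for each sample
lengths = [(len(s), len(b)) for s, b in zip(samples, encoded)]
```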

So. Yes, if you give Python a large ASCII string with a single non-BMP
character, the entire string *will* be stored as four-byte characters.

(Or, to nitpick against myself: CPython will do this. Other Python
implementations are free to do differently, and for instance, uPy
actually uses UTF-8 like you were predicting. For the rest of this
post, when I say "Python", I actually mean "CPython 3.3 or later".)

> If so, doesn't that introduce a pretty big storage overhead for
> large strings?
>
> >
> >  >>> sys.getsizeof('\U0001')
> > 80
> >  >>> sys.getsizeof('\U0001'*2)
> > 84
> >  >>> sys.getsizeof('a\U0001')
> > 84

Correct. Each additional character is going to cost you four bytes.
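
A sketch of how to see the per-character cost for yourself. The function storage_width encodes the 1/2/4-byte rule as I understand it from PEP 393, and measured_width checks it against sys.getsizeof; both names are invented here, and the measurement is specific to CPython 3.3 or later.

```python
import sys


def storage_width(s: str) -> int:
    """Bytes per character CPython 3.3+ is expected to use for s."""
    largest = max(map(ord, s), default=0)
    if largest < 0x100:
        return 1
    if largest < 0x10000:
        return 2
    return 4


def measured_width(s: str) -> int:
    """Per-character growth observed via sys.getsizeof (CPython-specific)."""
    return (sys.getsizeof(s * 2) - sys.getsizeof(s)) // len(s)


samples = {
    "ascii": "a" * 100,
    "bmp": "a" * 99 + "\u2026",
    "astral": "a" * 99 + "\U0001F921",
}
widths = {name: (storage_width(s), measured_width(s)) for name, s in samples.items()}
```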

> Which is what this seems to be saying.
>
> I confess I had just assumed the unicode strings were stored
> in native unicode UTF8 format.
>

UTF-8 isn't native any more than any other encoding. It's a good
compact format for transmission, but it's quite inefficient for
manipulation. Python opts to spend some memory in order to improve
time, because that's usually the correct tradeoff to make - it means
that indexing in a large string is fast, slicing a large string is
fast, etc, etc, etc.

Also, the truth is that, *in practice*, very few strings will pay this
sort of penalty. If you have a whole lot of (say) Chinese text,
there's going to be a small proportion of ASCII text, but most of the
text is going to be wider characters. Working with most European
languages will require the use of the BMP (which means 16-bit text),
but not anything beyond. And if someone's going to use one emoji from
the supplemental planes (which would require 32-bit text), it's fairly
likely that they'll use multiple.

And if you look at all strings in the Python interpreter, the vast
majority of them will be ASCII-only, getting optimized all the way
down to a single byte. Remember, every module-level variable is stored
in that module's dictionary, keyed by its name - and *most* variable
names in Python are ASCII.

So while it's true that, in theory, a single wide character can cost
you a lot of memory... in practice, this is still a lot more compact,
overall, than storing all strings in UCS-2.

ChrisA


Re: learning python ...

2021-05-26 Thread Greg Ewing

On 26/05/21 7:15 pm, hw wrote:
> it could at least figure out which, the
> type or the variable, to use as parameter for a function that should
> specify a variable type when the name is given.  Obviously, python knows
> what's expected, so why not choose that.


It knows, but not until *after* the function is called. Not
when the parameters are being evaluated.

Note that not even C does what you're asking for. If you use
a type name where a variable name is expected or vice versa
in C, you get a compile-time error. It doesn't try to
automagically figure out what you mean. The only difference
is that Python gives you the error at run time.

Also, you seem to be thinking only of the case where the type
name appears directly in the call. But what about this:

int = 42
type = int
if isinstance(something, type):
    ...

Would you expect Python to auto-correct this as well?
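
A small runnable sketch of the timing point: the arguments are evaluated before isinstance() ever sees them, so by the time the function can complain, the name int has already been looked up and found to be 42. The function name check is invented, and the rebinding is kept local so the builtin is untouched.

```python
import builtins


def check(value):
    int = 42                      # rebinds the name inside this function only
    try:
        # Both arguments are evaluated first, so isinstance receives 42 here.
        return isinstance(value, int)
    except TypeError as exc:
        return f"TypeError: {exc}"


result = check(7)

# The real type is still reachable through the builtins module.
still_works = isinstance(7, builtins.int)
```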

> Maybe a special
> syntax for declaring type names which then can not become ambiguous so
> easily would be a good thing?


The fact that types are treated as a form of data is a useful
feature. It means you can use the full power of the language on
them. Dividing names into two kinds, types and non-types, would
severely diminish that power.
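
As one illustration of types being ordinary data, this sketch stores types in a dict and a tuple and uses them like any other values. The field names and the convert helper are made up for the example.

```python
# Types are objects, so they can be dictionary values like anything else.
converters = {"count": int, "ratio": float, "label": str}


def convert(field, raw):
    """Look up the type for a field and call it like any other function."""
    return converters[field](raw)


row = {"count": "42", "ratio": "0.5", "label": "x"}
parsed = {field: convert(field, raw) for field, raw in row.items()}

# A tuple of types can be built at run time and handed to isinstance().
numeric = (int, float)
numeric_fields = sorted(f for f, v in parsed.items() if isinstance(v, numeric))
```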

> I already wondered that the scope of variables is not limited to the
> context they are declared within:
>
> for number in something:
>      # do stuff


This has been a subject of debate for quite a long time.
Many people feel that a for-loop variable should be local to
the loop, but Guido deliberately didn't make it that way.
He felt it was useful to be able to refer to the loop
variable after the loop finishes. That way you can do things
like search for something in a loop, break out early when it's
found, and the loop variable then contains the thing you found.
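
The search idiom described above might look like this; the function name and sample data are invented for illustration.

```python
def first_negative(numbers):
    """Return the first negative number, or None if there is none."""
    for number in numbers:
        if number < 0:
            break          # leave early; number keeps the value we found
    else:
        return None        # loop finished without break: nothing found
    return number          # the loop variable is still bound after the loop


found = first_negative([3, 1, -4, 2, -9])
missing = first_negative([1, 2])
```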

It also has the benefit of being simple and consistent with
the way scoping and assignments work in the rest of the language.

There are other ways to code a search, of course, but it's been
the way it is from the beginning, and changing it now would be
massively disruptive.

--
Greg


string storage [was: Re: imaplib: is this really so unwieldy?]

2021-05-26 Thread Alan Gauld via Python-list
On 25/05/2021 23:23, Terry Reedy wrote:

> In CPython's Flexible String Representation all characters in a string 
> are stored with the same number of bytes, depending on the largest 
> codepoint.

I'm learning lots of new things in this thread!

Does that mean that if I give Python a UTF8 string that is mostly single
byte characters but contains one 4-byte character that Python will store
the string as all 4-byte characters?

If so, doesn't that introduce a pretty big storage overhead for
large strings?

> 
>  >>> sys.getsizeof('\U0001')
> 80
>  >>> sys.getsizeof('\U0001'*2)
> 84
>  >>> sys.getsizeof('a\U0001')
> 84

Which is what this seems to be saying.

I confess I had just assumed the unicode strings were stored
in native unicode UTF8 format.

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos




Re: learning python ...

2021-05-26 Thread Chris Angelico
On Wed, May 26, 2021 at 5:49 PM hw  wrote:
>
> On 5/25/21 10:32 AM, Chris Angelico wrote:
> > On Tue, May 25, 2021 at 1:00 PM hw  wrote:
> >>
> >> On 5/24/21 3:54 PM, Chris Angelico wrote:
> >>> You keep using that word "unfinished". I do not think it means what
> >>> you think it does.
> >>
> >> What do you think I think it means?
> >
> > I think it means that the language is half way through development,
> > doesn't have enough features to be usable, isn't reliable enough for
> > production, and might at some point in the future become ready to use.
>
> Right, that is what it seemed.

And that's where you're insulting (a) the Python devs, (b) the Python
user community, and (c) everyone who dares to use such a language in
production.

Without evidence.

> > None of which is even slightly supported by evidence.
>
> That's not true.  Remember the change from version 2 to 3 and ask
> yourself how likely it is that breaking things like that happens again.
>   You may find a statement from developers about a policy that changes
> are to be announced a year before they go in, and that is evidence enough.
>
> What you make of this policy and what it means to you is for you to
> decide, but the evidence is clearly there.

Ah yes. A breaking change *a decade ago*, and which has been clearly
stated as not being repeated, is cause for you to be scared. Can you
quit fudding please?

> > You can complain about whether it's likeable or not, but all you're
> > doing is demonstrating the Blub Paradox.
>
> And you remain unable to show how python making it easy to mess up type
> names is a likeable feature.

Exactly what I'm saying about Blub. You assume that every language has
to treat type names as keywords, because that's the only model that
fits inside your brain.

> It looked unfinished to me because it doesn't even give an error message
> when I assign something to a type name as if it was a variable.

That's because a type name IS a variable, yet you can't understand
that this is a good thing.

> > Yes, because C uses keywords for types. That's the only difference
> > you're seeing here. You keep getting caught up on this one thing, one
> > phenomenon that comes about because of YOUR expectations that Python
> > and C should behave the same way. If you weren't doing isinstance
> > checks, you wouldn't even have noticed this! It is *NOT* a fundamental
> > difference.
>
> When it doesn't make a difference, then why doesn't python just use
> keywords for types and avoids all this ambiguity?

Because Python has a lot of types, and lets you create your own.
Fundamentally it *cannot* use keywords for every type name that you
might create.
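
A brief sketch of the distinction, using the standard keyword module. The class name Celsius is invented; the point is only that a user-defined type gets an ordinary name, just as int does.

```python
import keyword


class Celsius(float):
    """A user-defined type; its name is an ordinary variable."""


none_is_keyword = keyword.iskeyword("None")        # reserved by the grammar
int_is_keyword = keyword.iskeyword("int")          # just a builtin name
celsius_is_keyword = keyword.iskeyword("Celsius")  # just a name we created

Temperature = Celsius            # a second name for the same type object
reading = Temperature(21.5)
```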

> > Also, you keep arguing against the language, instead of just using it
> > the way it is. It really sounds to me like you'd do better to just
> > give up on Python and go use some language that fits your brain
> > better. If you won't learn how a language works, it's not going to
> > work well for you.
>
> I'm not arguing against the language but discussing it with people who
> are trying to defend it.  I tried to use it yesterday and failed because
> the documentation of a library I wanted to use it with is bad, and
> because maybe it wasn't such a good library.  I would use it with
> another library for another purpose, but there is no documentation for
> that library at all, so I can't use it.  So currently, it's not looking
> good for learning python.

Yes, you are arguing against the language.

I'm done arguing with you. Clearly you do not want to use Python - you
want to warp it to fit inside your own brain. Go use something else
that actually works for you, and let the rest of us be productive with
languages that are more powerful than you dare imagine.

ChrisA


Re: learning python ...

2021-05-26 Thread hw

On 5/25/21 10:32 AM, Chris Angelico wrote:

> On Tue, May 25, 2021 at 1:00 PM hw wrote:
> > On 5/24/21 3:54 PM, Chris Angelico wrote:
> > > You keep using that word "unfinished". I do not think it means what
> > > you think it does.
> >
> > What do you think I think it means?
>
> I think it means that the language is half way through development,
> doesn't have enough features to be usable, isn't reliable enough for
> production, and might at some point in the future become ready to use.


Right, that is what it seemed.


> None of which is even slightly supported by evidence.


That's not true.  Remember the change from version 2 to 3 and ask 
yourself how likely it is that breaking things like that happens again. 
You may find a statement from developers about a policy that changes
are to be announced a year before they go in, and that is evidence enough.


What you make of this policy and what it means to you is for you to 
decide, but the evidence is clearly there.



> > > Python has keywords. C has keywords. In Python, "None" is a keyword,
> > > so you can't assign to it; in C, "int" is a keyword, so you can't
> > > assign to it. There is no fundamental difference here, and the two
> > > languages even have roughly the same number of keywords (35 in current
> > > versions of Python; about 40 or 50 in C, depending on the exact
> > > specification you're targeting). The only difference is that, in
> > > Python, type names aren't keywords. You're getting caught up on a
> > > trivial difference that has happened to break one particular test that
> > > you did, and that's all.
> >
> > Then what is 'float' in the case of isinstance() as the second
> > parameter, and why can't python figure out what 'float' refers to in
> > this case?  Perhaps type names should be keywords to avoid confusion.
>
> It's a name. In Python, any name reference is just a name reference.
> There's no magic about the language "knowing" that the isinstance()
> function should take a keyword, especially since there's no keywords
> for these things.
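
The keyword/name distinction discussed above can be checked directly. This is only a sketch using the standard `keyword` module and the built-in `compile()`; the helper `assignment_compiles` is a made-up name for the example.

```python
import keyword


def assignment_compiles(name):
    """Return True if the statement `<name> = 1` is accepted by the compiler."""
    try:
        compile(f"{name} = 1", "<example>", "exec")
        return True
    except SyntaxError:
        return False


# "None" is a keyword, so assigning to it is rejected at compile time.
print(keyword.iskeyword("None"), assignment_compiles("None"))

# "float" is an ordinary name, so assigning to it is legal.
print(keyword.iskeyword("float"), assignment_compiles("float"))

# The number of keywords differs between Python versions.
print(len(keyword.kwlist))
```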


When I look at the error message, it seems to indicate that python knows 
very well what kind of parameter is expected.



> > Maybe you can show how this is a likeable feature.  I already understood
> > that you can somehow re-define functions in python and I can see how
> > that can be useful.  You can do things like that in elisp as well.  But
> > easily messing up built-in variable types like that is something else.
> > Why would I want that in a programming language, and why would I want to
> > use one that allows it?
>
> Because all you did was mess with the *name* of the type. It's not
> breaking the type system at all.
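
A small sketch of that distinction (the function name `shadow_and_inspect` is invented for illustration): rebinding the name `float` changes what the name refers to in that scope, while the type object itself stays reachable through the `builtins` module and values keep their real type.

```python
import builtins


def shadow_and_inspect():
    # Rebind the name "float" locally; this is an ordinary assignment.
    float = "oops"
    value = 1.5
    # The value still knows its real type; only the name lookup changed.
    same_type = type(value) is builtins.float
    still_instance = isinstance(value, builtins.float)
    return same_type, still_instance, float


same_type, still_instance, shadow = shadow_and_inspect()
print(same_type, still_instance, shadow)

# Outside the function the name "float" is untouched.
print(isinstance(1.5, float))
```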


And how is it a likeable feature?


> You can complain about whether it's likeable or not, but all you're
> doing is demonstrating the Blub Paradox.


And you remain unable to show how python making it easy to mess up type 
names is a likeable feature.



> > > The C language never says that Python is "unfinished". I'm not making
> > > assumptions, I'm reading your posts.
> >
> > I never said it is unfinished, I said it /seems/ unfinished.  In any
> > case, there is nothing insulting about it.  Python is still being worked
> > on (which is probably a good thing), and the switch from version 2 to
> > version 3 has broken a lot of software, which doesn't help in making it
> > appear as finished or mature.
>
> It's been around for thirty years. Quit fudding. You're getting very
> close to my killfile.


I take it you can't stand it when someone thinks differently than you 
do.  Good for you that you have a kill file to protect you from 
different thoughts and ideas.



> Python 3 has been around since 2009. Are you really telling me that
> Python looks unfinished because of a breaking change more than a
> decade ago?


It looked unfinished to me because it doesn't even give an error message 
when I assign something to a type name as if it was a variable.


Breaking things like that doesn't help, and it doesn't matter that it 
happened a while ago because I still have to deal with the issues it caused.


That something has been around for a while doesn't have anything to do 
with how finished or unfinished it may appear.



> The Go language didn't even *exist* before Python 3 - does
> that mean that Go is also unfinished?


Go is one of the games I never learned to play, though it seems kinda 
interesting.



> > Just look at what the compiler says when you try to compile these
> > examples.  In the first example, you can't defeat a built-in data type
> > by assigning something to it, and in the second one, you declare
> > something as an instance of a built-in data type and then try to use it
> > as a function.  That is so because the language is designed as it is.
>
> Yes, because C uses keywords for types. That's the only difference
> you're seeing here. You keep getting caught up on this one thing, one
> phenomenon that comes about because of YOUR expectations that Python
> and C should behave the same way. If you weren't doing isinstance
> checks, you wouldn't even have noticed this! It is *NOT* a fundamental
> difference.


When it doesn't make a difference, then why d

Re: learning python ...

2021-05-26 Thread hw

On 5/25/21 9:42 AM, Greg Ewing wrote:

> On 25/05/21 2:59 pm, hw wrote:
> > Then what is 'float' in the case of isinstance() as the second
> > parameter, and why can't python figure out what 'float' refers to in
> > this case?
>
> You seem to be asking for names to be interpreted differently
> when they are used as parameters to certain functions.


Sure, why not.  Since python allows me to use the name of a variable 
type as a name for a variable, it could at least figure out which, the 
type or the variable, to use as parameter for a function that should 
specify a variable type when the name is given.  Obviously, python knows
what's expected, so why not choose that.



> Python doesn't do that sort of thing. The way it evaluates
> expressions is very simple and consistent, and that's a good
> thing. It means there aren't any special cases to learn and
> remember.
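
To make the evaluation rule concrete, here is a hedged sketch with a plain function standing in for `isinstance()` (`looks_like_instance` is a hypothetical name, not anything in the standard library). Arguments are evaluated before the call, so the function only ever receives objects and never sees which name the caller wrote.

```python
def looks_like_instance(obj, expected_type):
    """A simplified, hypothetical stand-in for isinstance()."""
    # By the time this body runs, `expected_type` is just whatever object
    # the caller's expression evaluated to.
    if not isinstance(expected_type, type):
        raise TypeError(f"expected a type, got {expected_type!r}")
    return expected_type in type(obj).__mro__


def call_with_shadowed_name():
    float = 3.0  # the caller rebinds the name...
    try:
        return looks_like_instance(1.5, float)  # ...so 3.0 is what gets passed
    except TypeError as exc:
        return str(exc)


print(looks_like_instance(1.5, float))
message = call_with_shadowed_name()
print(message)
```

The error is raised because of the object that arrived, which is also how a function can report what it expected without knowing anything about the caller's names.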


Ok, that's certainly a valid point.  It could be a general case that 
python picks for functions what is expected of their parameters when 
their parameters have ambiguous names.



> Maybe you're not aware that isinstance is just a function,
> and not any kind of special syntax?
>
> > Perhaps type names should be keywords to avoid confusion.
>
> Python has quite a lot of built-in types, some of them in
> the builtin namespace, some elsewhere. Making them all keywords
> would be impractical, even if it were desirable.
>
> And what about user-defined types? Why should they be treated
> differently to built-in types? Or are you suggesting there
> should be a special syntax for declaring type names?


I don't know; how many different types does it have?  Maybe a special 
syntax for declaring type names which then cannot become ambiguous so
easily would be a good thing?


I was already surprised that the scope of variables is not limited to the
context they are declared within:



for number in something:
    pass  # do stuff

# do more stuff or not ...
print(number)


When you do something like that in perl, the variable you declared is 
out of scope outside the loop whereas python surprises me by keeping it 
around.  Somehow, the more I learn about python, the harder it becomes 
for me to read.


Maybe that changes over time, yet keeping variables around like that is 
not something I would prefer.
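
For reference, a runnable sketch of the behaviour described above (the function names are invented for the example). In Python a `for` loop does not open a new scope; the loop variable belongs to the enclosing function or module and keeps the last value it was given. If the loop body never runs, the name is never bound at all.

```python
def last_seen(items):
    for number in items:
        pass  # do stuff
    # The loop variable is still bound here, to the last item iterated,
    # provided the loop ran at least once.
    return number


def last_seen_or_none(items):
    try:
        return last_seen(items)
    except NameError:
        # With an empty iterable `number` was never assigned.
        # (UnboundLocalError is a subclass of NameError.)
        return None


print(last_seen([1, 2, 3]))
print(last_seen_or_none([]))
```

Wrapping a loop in a function is the usual way to keep its variables from lingering in the surrounding namespace.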
