from:"Eli the Bearded"

Re: Printing UTF-8 mail to terminal

2024-11-02 Thread Eli the Bearded via Python-list

In comp.lang.python, Gilmeh Serda   wrote:
> Python 3.12.6 (main, Sep  8 2024, 13:18:56) [GCC 14.2.1 20240805] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> help('modules')
> 
> Please wait a moment while I gather a list of all available modules...
> 
> AssemblyApp         apparmor            io                  pyzipper
> AssemblyGui         appdirs             ipaddress           qrtools
> CAMSimulator        application_utility isodate             queue
> Cheetah             apprise             isort               quopri
> [...]
> """
> 
> Put it in a list, unmangle it, sort it and you should have an alphabetical 
> list of all modules on your system.

As someone who has done a lot of work with email in other languages,
"quopri" is not a name I'd expect or look for first pass for dealing
with MIME quoted-printable encoding. (Me, being me, I'd probably just
write it for myself if I didn't quickly find it while working with
email.)
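
For reference, the standard library module being discussed can be exercised in a few lines. This is only a minimal sketch; the sample bytes are made up for illustration.

```python
import quopri

# Made-up sample: "café au lait" as UTF-8 in quoted-printable form.
encoded = b"caf=C3=A9 au lait"

# decodestring() works on bytes; decode to str afterwards.
decoded = quopri.decodestring(encoded).decode("utf-8")
print(decoded)

# Round trip back to quoted-printable.
original = "café au lait".encode("utf-8")
reencoded = quopri.encodestring(original)
print(reencoded)
```

The email package normally does this decoding itself when parsing a message, so calling quopri directly is mostly useful for one-off strings.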

Elijah
--
MIME: multipurpose Internet mail extensions
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Find the path of a shell command

2022-10-13 Thread Eli the Bearded

In comp.lang.python, jkn   wrote:
> On Wednesday, October 12, 2022 at 6:12:23 AM UTC+1, jak wrote:
>> I'm afraid you will have to look for the command in every path listed in 
>> the PATH environment variable.
> erm, or try 'which rm' ?

It is so hilarious seeing the responses to this thread. Hint: what do
you think the `which` program does?

$ touch /var/tmp/rm
$ chmod 755 /var/tmp/rm
$ env -i PATH=/etc:/usr:/lib:/var/tmp:/tmp /usr/bin/which rm
/var/tmp/rm
$ mv /var/tmp/rm /tmp/ 
$ env -i PATH=/etc:/usr:/lib:/var/tmp:/tmp /usr/bin/which rm
/tmp/rm
$ 
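
The same PATH walk can be sketched in Python. The helper name find_command is hypothetical, and throwaway temporary directories stand in for /var/tmp and /tmp; shutil.which() from the standard library does the equivalent search.

```python
import os
import shutil
import tempfile

def find_command(name, path):
    """Return the first executable called `name` on `path`, like which(1)."""
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None

# Two scratch directories; only the second one holds a fake "rm".
with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
    fake = os.path.join(second, "rm")
    with open(fake, "w") as handle:
        handle.write("#!/bin/sh\n")
    os.chmod(fake, 0o755)

    search_path = os.pathsep.join([first, second])
    by_hand = find_command("rm", search_path)
    by_stdlib = shutil.which("rm", path=search_path)
    print(by_hand, by_stdlib)
```

Either way the answer depends entirely on the search path handed in, which is the point of the shell demonstration above.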

Elijah
--
/usr/bin/busybox rm /tmp/rm

-ffast-math

2022-09-07 Thread Eli the Bearded

https://twitter.com/moyix/status/1567167774039973888

Brendan Dolan-Gavitt @moyix

New blog post is live! In which I download 4 TB of Python packages
containing native x86-64 libraries and see how many of them use
-ffast-math, potentially altering floating point behavior in any
program unlucky enough to load them!


https://moyix.blogspot.com/2022/09/someones-been-messing-with-my-subnormals.html

8:08 AM - Sep 6, 2022

It's quite an adventure, a longish read but fun.
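
A quick way to see whether subnormals still behave in a running interpreter is to create one and check that it is not zero. This is a minimal sketch, assuming the usual IEEE 754 double precision floats; the variable names are only illustrative.

```python
import sys

# Smallest positive *normal* double; anything smaller is subnormal.
smallest_normal = sys.float_info.min

# Halving it gives a subnormal result under default IEEE 754 behaviour.
# If something has switched the CPU to flush-to-zero mode (the effect the
# linked post attributes to libraries built with -ffast-math), the result
# is 0.0 instead.
half = smallest_normal / 2

if half > 0.0:
    print("subnormals intact:", half)
else:
    print("subnormals are being flushed to zero")
```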

Elijah
--
TL;DR: two dependencies using same misguided makefile cause most issues

news to me today: RIP Aahz

2021-11-18 Thread Eli the Bearded

Aahz, co-author of Python for Dummies with Stef Maruch, recently passed
away.

Tiny death notice (with name typo) from the wilds of the Internet:

http://file770.com/pixel-scroll-10-15-21-i-know-what-pixel-you-scrolled-last-summer/

(12) AAHZ MARUCH (1967-2021). [Item by James Davis Nicoll.] Python
programmer, whose fannish activities date back at least as far as
classic USENET (alt.poly and other groups), died October 14
following several years of ill health. Survived by partner Steph
Maruch.

Editor's postscript: Alan Prince Winston earlier this year described
him as "an unstoppable-seeming guy" who "became a contra and square
dance caller and choreographer despite really severe hearing
impairment."

I met Aahz once. He always wanted to be a mononym person, and used his
partner's surname only reluctantly.

Elijah
--
Aahz's rule6 website seems to be held by a squatter now


Re: The task is to invent names for things

2021-10-27 Thread Eli the Bearded

In comp.lang.python, Peter J. Holzer  wrote:
 ^^

> On 2021-10-27 12:41:56 +0200, Karsten Hilbert wrote:
>> In that situation, is it preferable to choose a nonsensical
>> name over a mediocre one ?
> I don't know. A mediocre name conveys at least some information, and
> that seems to be better than none. On the other hand it might be just
> enough to lead the reader astray which wouldn't happen with a
> non-sensical name.

C is named as a pun on the earlier B, which was derived from BCPL, the
Basic Combined Programming Language. Unix is a pun on Multics, Linux
a personalization of Unix. sed is a portmanteau of "stream editor".
AWK is named for the authors' initials. Perl started out as an
initialism (Practical Extraction and Report Language, I think), which
the manpage used to lead with. Ruby is named as a play on Perl, it's a
different four letter gem, and Ruby has obvious Perl influence.

But "Python"? What's Python named for? Hint, it's a pretty "non-sensical
name" for a computer language.

> But since perfect names are hard to find, using nonsensical instead of
> mediocre names would mean choosing nonsensical names most of the time.
> So I'll stick with mediocre names if in doubt.

The choice of a non-sensical name is perfectly fine _when_ it's a major
component. Kafka, Python, Java, Rust. Those are all non-sensically named,
in that the name doesn't fit what it is, by pun, initials, or reference.
Someone just liked the name and applied it to the thing being built. The
designer of Kafka liked the author. Guido liked Monty Python. Java is
named for coffee. Rust is named for a fungus.

Those all work. But if you are writing a new web framework and you name
your method to log stuff to a remote server "Britney" because you were
listening to the singer, that's not perfectly fine, even if you want to make
"Oops, I did it again" jokes about your logged errors.

Where naming has a great importance to understanding, it needs to be
done carefully. Mediocre names work, but can be confusing. For the
remote logging example, there's probably not a lot of difficulty with
mediocre. If you're doing something with a multiple of things, do you
call it a "pod", "cluster", "group", "set", etc? You can pick one but
then when you have multiples of multiples, you'll want to pick another
and if you do it wrong it will confuse people. A Kubernetes "cluster"
will run replica "sets" of "pods" (each of which has one or more
"containers", but "containers" is a word that predates Kubernetes).

If your framework runs "sets" of "clusters" that reversal of hierarchy
ends up being more likely to confuse. Or look at the mess that AWS has
for Elasticache Redis: you can have a "cluster" that provides redundancy
in case something fails. Or you can run in "cluster mode" which shards
the data across multiple independent nodes. If you want redundancy
in "cluster mode" you can have groups of replicas.

Redis no replication or sharding:   Node

Redis non-cluster mode cluster: Node Replica-Node1 ...

Redis cluster mode cluster no replication:
Shard-1-Primary-Node
Shard-2-Primary-Node
...

Redis cluster mode cluster with replicas:
Shard-1-Primary-Node Shard-1-Replica-Node-1 ...
Shard-2-Primary-Node Shard-2-Replica-Node-1 ...
...

Maybe this is Redis's fault or maybe it's AWS's, I don't know the
history of these names, I've only used it on AWS. But whoever did the
naming did a "mediocre" job to come up with something that's called a
"Redis cluster-mode enabled cluster".

https://aws.amazon.com/blogs/database/work-with-cluster-mode-on-amazon-elasticache-for-redis/

Elijah
--
naming is hard, unless it's easy

Re: XML Considered Harmful

2021-09-25 Thread Eli the Bearded

In comp.lang.python, Chris Angelico   wrote:
> Eli the Bearded <*@eli.users.panix.com> wrote:
>> I'd use one of the netpbm formats instead of JPEG. PBM for one bit
>> bitmaps, PGM for one channel (typically grayscale), PPM for three
>> channel RGB, and PAM for anything else (two channel gray plus alpha,
>> CMYK, RGBA, HSV, YCbCr, and more exotic formats). JPEG is tricky to
>> map to CSV since it is a three channel format (YCbCr), where the
>> channels are typically not at the same resolution. Usually Y is full
>> size and the Cb and Cr channels are one quarter size ("4:2:0 chroma
>> subsampling"). The unequal size of the channels does not lend itself
>> to CSV, but I can't say it's impossible.
> Examine prior art, and I truly do mean art, from Matt Parker:
> https://www.youtube.com/watch?v=UBX2QQHlQ_I

His spreadsheet is a PPM file, not a JPEG. You can tell because all of
the cells are the same size.

He also ignores vector graphics when considering digital images. Often
they are rendered in what he calls "spreadsheets" but not always. I have
a Vectrex, for example.

Elijah
--
then there's typewriter art with non-square "pixels"

Re: XML Considered Harmful

2021-09-23 Thread Eli the Bearded

In comp.lang.python, Christian Gollwitzer   wrote:
> Am 22.09.21 um 16:52 schrieb Michael F. Stemper:
>> On 21/09/2021 19.30, Eli the Bearded wrote:
>>> Yes, CSV files can model that. But it would not be my first choice of
>>> data format. (Neither would JSON.) I'd probably use XML.
>> Okay. 'Go not to the elves for counsel, for they will say both no
>> and yes.' (I'm not actually surprised to find differences of opinion.)

Well, I have a recommendation with my answer.

> It's the same as saying "CSV supports images". Of course it doesn't, its 
> a textfile, but you could encode a JPEG as base64 and then put this 
> string into the cell of a CSV table. That definitely isn't what a sane 
> person would understand as "support".

I'd use one of the netpbm formats instead of JPEG. PBM for one bit
bitmaps, PGM for one channel (typically grayscale), PPM for three
channel RGB, and PAM for anything else (two channel gray plus alpha,
CMYK, RGBA, HSV, YCbCr, and more exotic formats). JPEG is tricky to
map to CSV since it is a three channel format (YCbCr), where the
channels are typically not at the same resolution. Usually Y is full
size and the Cb and Cr channels are one quarter size ("4:2:0 chroma
subsampling"). The unequal size of the channels does not lend itself
to CSV, but I can't say it's impossible.
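
To illustrate why the netpbm family maps onto CSV so easily: a plain (ASCII, "P2") PGM file is already a header followed by a whitespace-separated grid of numbers. The following is a minimal sketch; the function name and the tiny sample image are made up, and only the plain P2 variant is handled.

```python
import csv
import io

def plain_pgm_to_csv(pgm_text):
    """Convert a plain (ASCII, "P2") PGM image to CSV, one image row per line."""
    # Drop "#" comments, then split on whitespace: the format is a token stream.
    tokens = []
    for line in pgm_text.splitlines():
        tokens.extend(line.split("#", 1)[0].split())
    if not tokens or tokens[0] != "P2":
        raise ValueError("not a plain PGM file")
    width, height, maxval = (int(t) for t in tokens[1:4])
    samples = [int(t) for t in tokens[4:]]
    if len(samples) != width * height:
        raise ValueError("sample count does not match header")
    if any(not 0 <= s <= maxval for s in samples):
        raise ValueError("sample outside 0..maxval")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in range(height):
        writer.writerow(samples[row * width:(row + 1) * width])
    return out.getvalue()

# Made-up 3x2 grayscale image.
example = """P2
# tiny test image
3 2
255
0 128 255
255 128 0
"""
print(plain_pgm_to_csv(example))
```

Every cell is the same size, one sample per cell. A subsampled JPEG has no such single grid, because the chroma channels hold fewer samples than the luma channel.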

But maybe you meant the whole JFIF or Exif JPEG file format base64
encoded with no attempt to understand the image. That sort of thing
is common in JSON, and I've seen it in YAML, too. It wouldn't surprise
me if people do that in CSV or XML, but I have so far avoided seeing
that. I used that method for sticking a tiny PNG in a CSS file just
earlier this month. The whole PNG was smaller than the typical headers
of an HTTP/1.1 request and response, so I figured "don't make it a
separate file".
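
The inlining trick described above can be sketched as follows. The helper name css_data_uri is hypothetical, and the input here is only the 8-byte PNG signature standing in for a real image file.

```python
import base64

def css_data_uri(image_bytes, mime_type="image/png"):
    """Return a CSS url(...) value embedding the bytes as a base64 data URI."""
    payload = base64.b64encode(image_bytes).decode("ascii")
    return f'url("data:{mime_type};base64,{payload}")'

# Only the PNG signature, standing in for a real (tiny) PNG file.
png_signature = b"\x89PNG\r\n\x1a\n"
print(css_data_uri(png_signature))
```

Because every PNG starts with those same eight bytes, every base64 encoded PNG starts with "iVBORw0KGgo", which is one of the "magic numbers" that becomes recognizable after a while.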

Elijah
--
can at this point recognize a bunch of "magic numbers" in base64


Re: XML Considered Harmful

2021-09-21 Thread Eli the Bearded

In comp.lang.python, Michael F. Stemper  wrote:
> I've heard of JSON, but never done anything with it.

You probably have used it inadvertently on a regular basis over the
past few years. Websites live on it.

> How does CSV handle hierarchical data? For instance, I have
> generators[1], each of which has a name, a fuel and one or more
> incremental heat rate curves. Each fuel has a name, UOM, heat content,
> and price. Each incremental cost curve has a name, and a series of
> ordered pairs (representing a piecewise linear curve).
> 
> Can CSV files model this sort of situation?

Can a string of ones and zeros encode the sounds of Bach, the images
of his sheet music, the details to reproduce his bust in melted plastic
extruded from a nozzle under the control of machines?

Yes, CSV files can model that. But it would not be my first choice of
data format. (Neither would JSON.) I'd probably use XML.
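
For what it is worth, the usual way to force a hierarchy into CSV is to flatten it: one row per leaf, with the parent keys repeated on every row. Below is a minimal sketch of that approach; the generator data, field names, and the flatten helper are all invented for illustration.

```python
import csv
import io

# Made-up sample data in the shape described in the question.
generators = [
    {
        "name": "Unit 1",
        "fuel": "coal",
        "curves": [
            {"name": "summer", "points": [(100, 9.5), (200, 10.1)]},
            {"name": "winter", "points": [(100, 9.2), (200, 9.9)]},
        ],
    },
]

def flatten(generators):
    """Yield one flat row per curve point, repeating the parent keys."""
    for gen in generators:
        for curve in gen["curves"]:
            for order, (x, y) in enumerate(curve["points"]):
                yield [gen["name"], gen["fuel"], curve["name"], order, x, y]

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["generator", "fuel", "curve", "point", "x", "y"])
writer.writerows(flatten(generators))
print(out.getvalue())
```

The repetition and the explicit ordering column are the price of flattening. Fuel details such as UOM and price would go in a second CSV file keyed by fuel name, which is where a nested format starts to look more comfortable.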

I rather suspect that all (many) of those genomes that end up in
Microsoft Excel files get there via a CSV export from a command line
tool. Once you can model life in CSV, everything seems possible.

> [1] The kind made of tons of iron and copper, filled with oil, and
> rotating at 1800 rpm.

Those are rather hard to model in CSV, too, but I'm sure it could be
done.

Elijah
--
for bonus round, use punched holes in paper to encode the ones and zeros

Re: basic auth request

2021-08-25 Thread Eli the Bearded

In comp.lang.python, Barry wrote:
> It is possible to sign an ip address in a certificate, but that is not
> often done.

It's bad practice. I've never seen one in the wild.

> Getting to reuse the IP address that example.com was using will not help
> the attacker unless they can make a cert that signs the dns name.
> And that means they hacked the CA which is a big problem.

You misunderstand the attack. Some web searching suggests the term is
"dangling DNS record".

Big co Acme Example, with example.com, has a website for the regular
public on www.example.com, gets mail at mail.example.com, serves
DNS from ns1., ns2. and ns3.example.com. The IT staff watch those
domains very carefully.

One day marketing says, "We've got a big CES show this year, let's
make a website for the press at ces.example.com." They tell the execs,
the execs tell the IT guys, and the IT guys say "Okay, what does it point
to?" and Marketing gives them the IP address of the blog site they
just rented. IT sets up an A record. IT does not watch _that_
carefully. Two years later Marketing stops paying the bill on the
blog site, and ces.example.com has a "dangling" DNS record, it
exists but no longer points to a valid resource.

Attacker gets the IP address that it points to (maybe they churn
through a bunch of temporary accounts until they do) and now with
the right IP to match ces.example.com they go off to get an SSL
cert for that.
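
On the defensive side, the scenario above suggests a simple audit: list every A record in the zone and flag the ones pointing at addresses outside the networks the company actually controls. This is only a sketch under that assumption; the function name, the records, and the networks are invented, and the addresses come from the documentation ranges.

```python
import ipaddress

def dangling_candidates(a_records, owned_networks):
    """Return hostnames whose A record points outside the networks we control."""
    networks = [ipaddress.ip_network(n) for n in owned_networks]
    flagged = []
    for hostname, address in sorted(a_records.items()):
        ip = ipaddress.ip_address(address)
        if not any(ip in network for network in networks):
            flagged.append(hostname)
    return flagged

# All values invented for illustration.
records = {
    "www.example.com": "192.0.2.10",
    "mail.example.com": "192.0.2.25",
    "ces.example.com": "203.0.113.77",   # the rented blog host, long gone
}
print(dangling_candidates(records, ["192.0.2.0/24"]))
```

A flagged name is not necessarily dangling, since it may point at a rented host that is still paid for, but it is exactly the set of records that nobody is watching carefully.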

$500 bug bounty write up here for someone who found a dangling
record, but didn't churn for the record to exploit it:

https://gist.github.com/TheBinitGhimire/9ebcd27086a11df1d7ec925e5f604e03

Another variant of this, which probably doesn't get you an SSL
cert, is a dangling CNAME. These can be easier to get. If
ces.example.com was a CNAME to cesdemosite2017.com then when
cesdemosite2017.com expires, it's trivial to re-register it and
squat "as" ces.example.com.

The most insidious version is a DNS delegation. If ces.example.com is an
NS record (unlikely for a marketing site, but plausible for some other
scenarios) and it goes to ns1.parternership.net, when parternership.net
expires the attacker can grab that, create a new ns1.parternership.net
and give themselves finan.ces.example.com then start spitting out bogus
bills with it.

The CAA record adds a smidgen more protection against those attacks.
(I don't think that's what it is designed for, but a good defense
works against more than just the original attack method.)

I also found this in my search, which is exactly the sort of threat
CAA was meant to handle:

https://en.wikipedia.org/wiki/Comodo_Cybersecurity#Dangling_markup_injection_vulnerability
On 25 July 2016, Matthew Bryant showed that Comodo's website is
vulnerable to dangling markup injection attacks and can send emails
to system administrators from Comodo's servers to approve a wildcard
certificate issue request which can be used to issue arbitrary
wildcard certificates via Comodo's 30-Day PositiveSSL product.

Bugs in automated systems that give out arbitrary certs are not
common, but very very nasty.

Elijah
--
DNS: the cause of, and solution to, all our Internet problems

Re: basic auth request

2021-08-25 Thread Eli the Bearded

In comp.lang.python, Jon Ribbens   wrote:
> On 2021-08-25, Eli the Bearded <*@eli.users.panix.com> wrote:
>> $COMPANY puts out a lot of things on different IP addresses from
>> a shared public(ish) pool like AWS and assigns different names
>> to them. Later $COMPANY discontinues one or more of those things,
>> terminates the host, and lets the IP address rejoin the public(ish)
>> pool.
>>
>> $ATTACKER notices the domain name pointing to an unused IP address
>> and works to acquire it for their own server. $ATTACKER then gets
>> a cert for that domain, since they can easily prove ownership of
>> the server through http content challenges. $ATTACKER now has a
>> host in $COMPANY's name to launch phishing attacks.
> How does CAA help with this? Unless the domain owner knows in advance
> that they're going to forget about the hostname and prepares for it
> by setting a CAA record that denies all CAs, the attacker will simply
> get a certificate from one of the permitted CAs - since, as you point
> out, they genuinely own and control the relevant IP address.

I believe the way it helps is by limiting to a CA that will insist
all cert requests come through the right channel, not some random
one off somewhere. This doesn't prevent issues, but does raise the
complexity on an already niche attack.

It does aid in knocking out the easy random one-offs from Let's Encrypt.

Elijah
--
using LE for all his personal sites these days

Re: basic auth request

2021-08-25 Thread Eli the Bearded

In comp.lang.python, Jon Ribbens   wrote:
> Another attempt at combatting this problem is DNS CAA records,
> which are a way of politely asking all CAs in the world except the
> ones you choose "please don't issue a certificate for my domain".
> By definition someone who had hacked a CA would pay no attention
> to that request, of course.

Yeah, but it works for the case of forgotten hostnames, a rare but
real attack. Basically it works like this:

$COMPANY puts out a lot of things on different IP addresses from
a shared public(ish) pool like AWS and assigns different names
to them. Later $COMPANY discontinues one or more of those things,
terminates the host, and lets the IP address rejoin the public(ish)
pool.

$ATTACKER notices the domain name pointing to an unused IP address
and works to acquire it for their own server. $ATTACKER then gets
a cert for that domain, since they can easily prove ownership of
the server through http content challenges. $ATTACKER now has a
host in $COMPANY's name to launch phishing attacks.

This probably has some clever infosec name that I don't know.

Elijah
--
or a clever infosec name now forgotten


Re: python text://protocol client?

2021-04-09 Thread Eli the Bearded

In comp.lang.python, Petite Abeille   wrote:
> Would you know of any python text://protocol client? Or server?

The whole thing was started (judging by git commits) about a month ago.

https://github.com/textprotocol?tab=overview&from=2021-03-01&to=2021-03-31

March 7: First repository

I suspect no one has cared to reimplement it any language since then.

> Thanks in advance.
> 
> [1] https://textprotocol.org

I have read that and I don't understand what one does with this
protocol or why.

> [2] https://github.com/textprotocol/public
> [3] https://github.com/textprotocol/publictext

The Lua code is not long, under 2k LOC. Why don't you just study it and
create your own python version if you care?

Elijah
--
has not been given enough reason to care to read the code

Re: A 35mm film camera represented in Python object

2021-04-01 Thread Eli the Bearded

In comp.lang.python, Richard Damon   wrote:
> On 4/1/21 6:41 PM, 2qdxy4rzwzuui...@potatochowder.com wrote:
>> Richard Damon  wrote:
>>> If you keep track of the positions as a floating point number, the
>>> precision will be more than you could actually measure it.
>> I won't disagree with Richard, although I will give you a general
>> warning about floating point rounding issues:  if you do, in fact, end
>> up with your first solution and lots and lots (millions?  billions?
>> more?) of discrete calculations, be aware that what looks like a lot of
>> precision in the beginning may not be all that precise (or accurate) in
>> the end.

I'm having a hard time figuring where the floating point rounding issues
enter in. My reading is that instead of N discrete steps (lever 12 is
1% moved, lever 12 is 2% moved, lever 12 is 3% moved and makes contact
to cam 3, lever 12 is 4% moved and cam 3 is 5% moved; or what not) using
floating point, lever 12 could move from 0.0 to 1, and cam 3 start moving
at lever 12 >= 0.04.

>> Also, doesn't the overall motion of the camera as a whole also depend on
>> external factors, such as whether/how it's mounted or handheld, the
>> nature of the "ground" (e.g., soft wet sand vs. hard concrete
>> vs. someone standing on a boat in the water), an experienced
>> photographer "squeezing" the shutter release vs. a newbie "pressing the
>> button"?  I can think of plenty of variables; I guess it depends on what
>> you're trying to model and how accurate you intend to be (or not to be).

I suspect very little of the motion of parts *inside* the camera sees
meaningful changes from that. The motion of the camera relative to the
scene is meaningful for how motion blurred a shot will be. But the
springs and levers that move as the shutter release button is pushed
will be basically only moving relative to the camera, and not much
changed by tripod or handheld.

> Actually, I would contend that due to all the factors that you can't
> take into account accurately makes the use of floating point more
> applicable. Yes, you need to realize that just because the value has
> many digits, you KNOW that is only an approximation, and you process
> accordingly, knowing you just have an approximation.

All of the parts of the camera were built with some engineering
tolerance. Using a precision in code that exceeds that tolerance fails
to accurately model the camera.

> The real question comes, what is the purpose of the simulation? You can
> NEVER simulate everything, and some parts of 'simulating' requires
> actual hardware to interact with. Sometimes the real thing is the best
> simulation available.

The purpose was stated in the original post: model a particular camera in
software from the point of view of someone who has done repair work. If
lever 12 is bent, and only touches cam 3 after 15% (or >= 0.15) motion
that changes the way the camera works. I believe those sorts of things
are meant to be visible in this model.
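
The lever and cam relationship described in this thread can be sketched with plain floats. The function below is hypothetical and is not taken from the camera model under discussion; in particular, the linear motion after contact is an assumption made only to keep the example short.

```python
def cam_position(lever_position, contact_at=0.04):
    """Cam travel (0.0 to 1.0) for a given lever travel (0.0 to 1.0).

    The cam stays put until the lever reaches `contact_at`, then is
    assumed (for illustration only) to move linearly over the rest of
    the lever's travel.
    """
    if not 0.0 <= lever_position <= 1.0:
        raise ValueError("lever position must be within 0.0 and 1.0")
    if lever_position < contact_at:
        return 0.0
    return (lever_position - contact_at) / (1.0 - contact_at)

for lever in (0.0, 0.04, 0.15, 0.5, 1.0):
    healthy = cam_position(lever)                  # contact at 4% travel
    bent = cam_position(lever, contact_at=0.15)    # bent lever: contact at 15%
    print(f"lever {lever:.2f}: cam {healthy:.3f} (healthy) {bent:.3f} (bent)")
```

A bent lever is then just a different contact_at value, and the difference in behaviour shows up directly in the output.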

Elijah
--
would use floating point, or fixed point as if floating

Re: A 35mm film camera represented in Python object

2021-03-17 Thread Eli the Bearded

In comp.lang.python,
D.M. Procida  wrote:
> Eli the Bearded <*@eli.users.panix.com> wrote:
>> I see you don't even attempt to tackle ISO outside of
>> supported range (and I have no idea how the camera itself deals with
>> that). Is the camera sensing the ISO from the film roll (so won't work
>> with hand rolled film cartridges)? Is there a setting on the camera to
>> manually specify that? (I don't think so.)
> The camera's film speed setting (it's old enough that it's ASA rather
> than ISO) is set manually. If you try to set an illegal value, there's a
> setter decorator that raises a NonExistentFilmSpeed exception.

I can see what the code does, I'm asking what the camera does and do you
plan to work that into your code? Maybe it only works for ISO 1600 in
manual mode, but works.

> I have to add a button and winder lever to the camera object itself, I'm
> doing those things bit by bit.

Gotcha.

> Yes, it would be fun to allow it to "take a picture" of an image file,
> and process the result. Or ultimately built into a web application using
> something like https://anvil.works and have it take a real picture with a
> user's webcam.

Yes, that sounds like good future work.

Elijah
--
bring light into the dark box

Re: A 35mm film camera represented in Python object

2021-03-17 Thread Eli the Bearded

In comp.lang.python,
D.M. Procida  wrote:
> Hi everyone, I've created  -
> a representation of a Canonet G-III QL17 in Python.
> 
> There's also documentation: .

This is interesting. Some feedback.

> It's a pure Python model of the physical sub-systems of a camera and
> their interactions. So far it's at a fairly high level - I haven't yet
> got down to the level of individual springs and levers yet.

There's a wealth of specifics for that camera above individual springs
and levers. Notably how the light meter performs with different
batteries and how it works with other films. This much is clear from
just a few minutes research on the Canonet G-III QL17 (what a mouthful
of a name).

I'm guessing you plan to deal with light meter quirks because of the battery
voltage setting. I see you don't even attempt to tackle ISO outside of
supported range (and I have no idea how the camera itself deals with
that). Is the camera sensing the ISO from the film roll (so won't work
with hand rolled film cartridges)? Is there a setting on the camera to
manually specify that? (I don't think so.)

> You can do things like advance the film, release the shutter, meter the
> scene with the built-in light meter (if the camera has a battery of
> course) and even spoil your film if you make the mistake of opening the
> back in daylight.

Film spoilage isn't boolean in real life. If I rewind most, but not all
of the way, before I open the back, I've only ruined a few frames. If I
open it in a lightproof camera bag, I can take the roll out without
rewinding.

(I've done such things with pin hole cameras.)

> But you can also do things that you shouldn't do, like opening the back
> of the camera in daylight with a partially-exposed roll of film inside -
> which will spoil the film::
> 
> >>> c.back.open()
> Opening back
> Resetting frame counter to 0
> 'Film is ruined'

If luminosity is set to zero, that could emulate the lightproof bag.
Frame by frame "film is ruined" might be a better choice for boolean.
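
A frame-by-frame version of the idea might look like the sketch below. The FilmRoll class and its methods are hypothetical and are not part of the camera model being reviewed. It assumes that only frames currently outside the cartridge get fogged when the back is opened in light.

```python
class FilmRoll:
    """Track fogging per frame instead of one flag for the whole roll."""

    def __init__(self, frames=36):
        self.frames = frames
        self.outside = 0                  # frames currently out of the cartridge
        self.ruined = [False] * frames

    def advance(self, count=1):
        self.outside = min(self.frames, self.outside + count)

    def rewind(self, count):
        # The most recently exposed frames go back into the cartridge first.
        self.outside = max(0, self.outside - count)

    def open_back(self, luminosity):
        """Fog every frame outside the cartridge, unless it is dark."""
        if luminosity > 0:
            for index in range(self.outside):
                self.ruined[index] = True
        return sum(self.ruined)

# Rewind most of the way, then open the back in daylight.
roll = FilmRoll()
roll.advance(24)
roll.rewind(20)
partly_rewound_loss = roll.open_back(luminosity=100)

# Open the back inside a lightproof bag without rewinding.
bagged = FilmRoll()
bagged.advance(24)
dark_bag_loss = bagged.open_back(luminosity=0)

print(partly_rewound_loss, dark_bag_loss)
```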

On this camera, there's no manual double exposure setting, right? So
partial rewind would be the way to do that. But I can make double
exposures with this code:

>>> c.shutter.trip()
Shutter openening for 1/128 seconds
Shutter closes
Shutter uncocked
'Tripped'
>>> c.shutter.cock()
Cocking shutter
Cocked
'Cocked'
>>> c.shutter_speed = 1/512
>>> c.shutter.trip()
Shutter openening for 1/512 seconds
Shutter closes
Shutter uncocked
'Tripped'
>>>

In general, I never used simple cameras with light meters. Advanced SLR
or dumb cameras. My personal favorite film camera is a Universal Mercury
II, a half frame 35mm from the mid 1940s with hot and cold shoes (intended
for flash and film meter attachments), bulb to 1/1000 shutter range,
mechanical exposure calculator on the back, and a dial for reminding you
what film you have in it.

Does a camera like the one you have modelled actively stop you from
using an ISO/shutter speed/F-stop that will vastly over- or under-expose
things? Or is it just a warning light in the viewfinder?

Certainly my c.shutter.trip() calls give me no advice from the meter.

A useful thing your camera-as-code model could provide, but doesn't, is
some measure of how exposed each frame is. This will be a function of
film speed, iris setting, cumulative exposure time from zero or more
shutter openings, and scene luminosity. (You could extend this to
include opened back over exposure conditions.)
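
As a rough sketch of such a measure: the light reaching the film scales with scene luminosity and total open time, and inversely with the square of the f-number, while film speed scales how strongly the film responds. The function name, the unitless scaling, and the sample values below are assumptions for illustration, not part of the existing model.

```python
def relative_exposure(luminosity, f_number, film_speed, shutter_times):
    """Unitless exposure estimate for one frame.

    `shutter_times` holds the duration in seconds of each shutter opening
    on this frame, so an empty list means an unexposed frame and two
    entries mean a double exposure.
    """
    total_time = sum(shutter_times)
    return luminosity * total_time * film_speed / f_number ** 2

single = relative_exposure(1000, f_number=8, film_speed=100,
                           shutter_times=[1 / 128])
double = relative_exposure(1000, f_number=8, film_speed=100,
                           shutter_times=[1 / 128, 1 / 512])
unexposed = relative_exposure(1000, f_number=8, film_speed=100,
                              shutter_times=[])
print(single, double, unexposed)
```

Comparing the result against a target value for correct exposure would give the over- or under-exposure figure per frame, and an opened back could be treated as one more, very long, opening.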

Elijah
--
can see how this might get integrated with an image generation tool

Re: Exploring terminfo

2021-01-19 Thread Eli the Bearded

In comp.lang.python, Greg Ewing   wrote:
> On 18/01/21 3:34 am, Alan Gauld wrote:
>> The problem is terminfo is not really part of curses.
>> Curses is built on top of terminfo.
> As far as I can tell from the man pages, terminfo itself
> is just a file format. The only programmatic interfaces I
> can find for it *are* part of curses:

Putting my Unix hat on, curses is a "friendly" library around creating
text-windowed applications. Programs like mutt use curses rather than
raw terminal operations, programs like vi use raw terminal operations.
Either curses or raw terminal operations will (should) consult a
terminal capabilities database to figure out what can be done and how to
do it. The two competing database formats for that are termcap and
terminfo, where terminfo is the newer, better one.

Termcap used a single large text file for all terminals types.
Terminfo uses a directory tree full of small files, one per type.

I'm pretty sure both include the ability to say something along the
lines of "start with this one terminal, and then change these bits".
So that starts to get complicated without a library. Or maybe I'm wrong,
and vi uses curses. I'm not really sure how vi reads the term info files.

Okay, checking the source to the only vi I have lying around[*], it uses
a few curses calls, apparently only these:

int tgetent(char *bp, const char *name);
int tgetflag(char *id);
int tgetnum(char *id);
char *tgetstr(char *id, char **area);
char *tgoto(const char *cap, int col, int row);
int tputs(const char *str, int affcnt, int (*putc)(int));

My local manpage calls this set the "direct curses interface to the
terminfo capability database" whereas things I think of as "curses"
programs use calls like:

WINDOW *initscr(void);
int cbreak(void);
int start_color(void);
int noecho(void);
int move(int y, int x);
int attr_set(attr_t attrs, short pair, void *opts);
int getch(void);
int addch(const chtype ch);
int printw(const char *fmt, ...);

The vi method relies on the programmer knowing what attributes are
wanted and how to use them, and how to use alternatives when the
first choices aren't provided. The curses method relies on the programmer
knowing which of a hundred different library functions to use for any
given output. :^)
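
Python's curses module exposes the low-level side too, under the terminfo names (setupterm, tigetflag, tigetnum, tigetstr, tparm, putp) rather than the termcap-style names listed above. The following is a minimal sketch; the capability helper is hypothetical, and it returns empty bytes when curses or the terminal description is not available.

```python
try:
    import curses
except ImportError:          # the module is not available on every platform
    curses = None

def capability(name, term=None):
    """Return the raw bytes of a terminfo string capability, or b"" if unavailable."""
    if curses is None:
        return b""
    try:
        # fd=1 is stdout; term=None means "use $TERM".
        curses.setupterm(term=term, fd=1)
    except curses.error:
        return b""
    # tigetstr() returns None when the terminal lacks the capability.
    return curses.tigetstr(name) or b""

bold = capability("bold")
reset = capability("sgr0")
clear = capability("clear")
print(repr(bold), repr(reset), repr(clear))
```

No window is ever initialized here, so ordinary print() calls can be mixed freely with the returned escape sequences.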

[*] It's ex-1.1, an "heirloom" source package that has possibly been
brushed up just enough to compile on a modern system. ex-1.1 is by
most reckoning, the first vi. It won't start in vi mode though, you
need to run ex, then begin the visual mode. It is recognizable as vi
to me, but a somewhat different experience.

Elijah
--
has modified C curses programs, but not written one from scratch

Re: why sqrt is not a built-in function?

2021-01-15 Thread Eli the Bearded

In comp.lang.python, Chris Angelico   wrote:
> Michael F. Stemper  wrote:
>> On 15/01/2021 14.01, Stefan Ram wrote:
>>> __import__( "math" ).sqrt( 4 )
>> I had no idea that syntax existed, and find it completely at odds
>> with The Zen of Python. I'm torn between forgetting that I ever saw
>> it and using it for some evilly-obfuscated code.
> I recommend option #2. It is incredibly good fun. For added bonus
> obscurity, don't import a module directly; import it from some
> unrelated module, such as "from ast import sys" or "from base64 import
> re".

Is there an Obfuscated Python contest, like there is with C? I know the
C ones are often quite interesting, like the 2020 entry that implements
tic-tac-toe in the "Turing complete" sub-language of printf() format strings.
(Carlini, http://www.ioccc.org/years.html#2020 )

Elijah
--
with the complexity of fitting the 200k format into a 2k source file

Re: why sqrt is not a built-in function?

2021-01-14 Thread Eli the Bearded

In comp.lang.python, Ethan Furman   wrote:
> On 1/14/21 11:06 AM, Eli the Bearded wrote:
>> "There should be one-- and preferably only one --obvious way to do it."
>> Plus the ** operation ("root = x ** 0.5"), that's now three ways.
> Yes, but which of those is obvious?

If it's up to me, the ** one.
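
For comparison, here are three spellings side by side. Only the ** form is named in the quoted text; treating math.sqrt() and the built-in pow() as the other two is an assumption.

```python
import math

x = 9.0
via_math = math.sqrt(x)      # function from the math module
via_pow = pow(x, 0.5)        # built-in pow()
via_operator = x ** 0.5      # the ** operator

print(via_math, via_pow, via_operator)
```

They are not fully interchangeable: math.sqrt() raises ValueError for a negative argument, while the ** operator returns a complex number in Python 3.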

Elijah
--
using a syntax with "^" instead of "**" would be okay, too

Re: Exploring terminfo

2021-01-14 Thread Eli the Bearded

In comp.lang.python, Barry Scott   wrote:
> Alan Gauld via Python-list  wrote:
>> I've written a short program that is supposed to
>> - *clear the screen*,
>> - read some input
>> - display the result in a message *highlighted in bold*.
>> - get input to end the program
> It seems that curses does not allow you to mix raw stdin/stdout with its 
> calls.

This sounds very plausible. In C, in curses one uses printw() not
printf().

> If all you want is simple things like bold and clear I'd just use the
> ANSI escape sequences directly.
> 
> Are there any terminals that do not understand ANSI escape sequences
> these days?

Probably, I hear tales of people using odd set-ups from time to time.
But that could just be the circles I hang out in.

When I've wanted to do simple things like bold and clear, I've used the
tput(1) tool. You can capture stdout from the tool and use the output
over and over. Typically I've done this in shell scripts:

#!/bin/sh
bold=$(tput smso)   # set mode stand out
nobold=$(tput rmso) # remove mode stand out
clear=$(tput clear) # clear screen
home=$(tput home)   # home, without clear

for word in Ten Nine Eight Seven Six Five Four Three Two One; do
   echo "${clear}${bold}${word}${nobold} ..."
   sleep 1
done
echo "${home}Nothing happens."
exit
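
A rough Python adaptation of the script above, as a sketch only. It
assumes tput(1) is on the PATH and TERM is set; the helper names cap()
and countdown() are made up for illustration. If tput fails, the
capability is treated as an empty string.

```python
import subprocess
import sys
import time


def cap(name):
    """Return the escape sequence tput prints for a capability, or "" on failure."""
    try:
        result = subprocess.run(["tput", name], stdout=subprocess.PIPE,
                                stderr=subprocess.DEVNULL, check=False)
    except OSError:          # tput is not installed
        return ""
    if result.returncode != 0:
        return ""
    return result.stdout.decode("ascii", "replace")


def countdown(words, delay=1.0, out=sys.stdout):
    # Capture each sequence once and reuse it, as in the shell version.
    bold, nobold = cap("smso"), cap("rmso")
    clear, home = cap("clear"), cap("home")
    for word in words:
        out.write(f"{clear}{bold}{word}{nobold} ...\n")
        out.flush()
        time.sleep(delay)
    out.write(f"{home}Nothing happens.\n")
```

Calling countdown("Ten Nine Eight Seven Six Five Four Three Two One".split())
should behave much like the shell loop.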

Elijah
--
adapting to python left as an exercise for the reader

Re: why sqrt is not a built-in function?

2021-01-14 Thread Eli the Bearded

In comp.lang.python, Skip Montanaro   wrote:
> Finally, should have never considered it, I think you might want to
> study the output of
> 
> import this
> 
> Think on the second and last lines in particular.

   >>> import this
   The Zen of Python, by Tim Peters

   Beautiful is better than ugly.
   Explicit is better than implicit.
   Simple is better than complex.
   Complex is better than complicated.
   Flat is better than nested.
   Sparse is better than dense.
   Readability counts.
   Special cases aren't special enough to break the rules.
   Although practicality beats purity.
   Errors should never pass silently.
   Unless explicitly silenced.
   In the face of ambiguity, refuse the temptation to guess.
   There should be one-- and preferably only one --obvious way to do it.
   Although that way may not be obvious at first unless you're Dutch.
   Now is better than never.
   Although never is often better than *right* now.
   If the implementation is hard to explain, it's a bad idea.
   If the implementation is easy to explain, it may be a good idea.
   Namespaces are one honking great idea -- let's do more of those!
   >>> 

"There should be one-- and preferably only one --obvious way to do it."

Meanwhile, Alan Gauld pointed out:

  AG> because pow() is a builtin function and
  AG> root = pow(x,0.5)
  AG> is the same as
  AG> root = math.sqrt(x)

Plus the ** operation ("root = x ** 0.5"), that's now three ways.
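
For the curious, a small sketch putting the three spellings side by
side. The value of x is arbitrary, and the negative-input case is my
own addition to show that the spellings are not fully interchangeable.

```python
import math

x = 2.0
roots = {
    "math.sqrt(x)": math.sqrt(x),
    "pow(x, 0.5)": pow(x, 0.5),
    "x ** 0.5": x ** 0.5,
}
for label, value in roots.items():
    print(f"{label:>14} = {value!r}")

# For negative input the spellings diverge: math.sqrt raises ValueError,
# while ** (and pow) quietly return a complex number in Python 3.
negative_root = (-4.0) ** 0.5
```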

Elijah
--
python user, not python advocate

Re: Application window geometry specifier

2021-01-12 Thread Eli the Bearded

In comp.lang.python, Rich Shepard   wrote:
>> Keep in mind that if you target Linux, the "modern" window server
>> (Wayland) will not allow user code to decide the positioning and size of
> I suspect that Slackware will continue with X11.

Even with traditional X11, geometry is "preferred" size and position[*],
not a requirement. Window managers have always been able to override
that if desired, and tiling window managers make that override a
feature. There might not be a tiling window manager in Slack's standard
packages, but there sure are in Slackbuilds.

Elijah
--
[*] See GEOMETRY SPECIFICATIONS in "man X"

Re: dayofyear is not great when going into a new year

2021-01-05 Thread Eli the Bearded

In comp.lang.python, Mats Wichmann   wrote:
> "workweeks" has always been fun, ISO standard or not, there's been a 
> variation for ages since people don't seem to always follow ISO for 
> that.  I spent over a decade at a place that lived and died by their 
> WorkWeek references ("due WW22" or the like would appear in every status 
> report ever written, and there were zillions of those) - and it didn't 
> agree with ISO on whether WW1 was the week that contained Jan 1 or 
> whether it was the week that followed the previous year's last workweek. 
> After all, those few days can't actually belong to two different 
> workweeks, now can they?  :)

I think the ISO standard was to try to unify a bunch of inconsistent
locally defined things like that. In GNU date(1), there are THREE
different, and sometimes the same and sometimes not, week of year
codes:

   %U week number of year, with Sunday as first day of week (00..53)

   %V ISO week number, with Monday as first day of week (01..53)

   %W week number of year, with Monday as first day of week (00..53)

I don't think that is an exhaustive list of methods used, either.

(excuse the vi command ugliness; % is special to : commands in vi)

:r! date +"U: \%U; V: \%V; W: \%W"
U: 01; V: 01; W: 01

Today they all match. But not always.

:r! date --date="Jan 02 2005" +"U: \%U; V: \%V; W: \%W"
U: 01; V: 53; W: 00
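
The same comparison can be sketched in Python. This assumes strftime's
%U and %W behave as in C, and it takes the ISO week from isocalendar()
rather than %V, since %V support in strftime depends on the platform.
The function name week_numbers() is made up for illustration.

```python
from datetime import date


def week_numbers(d):
    """Return the (%U, %V, %W) week numbers for a date as integers."""
    sunday_based = int(d.strftime("%U"))   # weeks start on Sunday, 00..53
    iso_week = d.isocalendar()[1]          # ISO 8601 week, 01..53
    monday_based = int(d.strftime("%W"))   # weeks start on Monday, 00..53
    return sunday_based, iso_week, monday_based


for d in (date(2021, 1, 5), date(2005, 1, 2), date(2022, 1, 2)):
    u, v, w = week_numbers(d)
    print(f"{d}: U: {u:02d}; V: {v:02d}; W: {w:02d}")
```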

> (that was not a good memory you guys brought back :) )

Oh what a tangled web we weave, when we first begin to [measure time].

Elijah
--
they all disagree for Jan 02 2022, too, but slightly differently

Re: dayofyear is not great when going into a new year

2021-01-05 Thread Eli the Bearded

In comp.lang.python, Chris Angelico   wrote:
> There are multiple definitions for "day of year", depending on how you
> want to handle certain oddities. The simplest is to identify Jan 1st
> as 1, Jan 2nd as 2, etc, to Dec 31st as either 365 or 366; but some
> libraries will define the year as starting with the week that contains
> the Thursday, or something, and then will define days of year
> accordingly.

That sounds like some weird off-shoot of the ISO-8601 calendar. That
document primarily concerns itself with weeks. Week 1 of a year is the
first week with a Thursday in it. The last week of a year will be either
52 or 53, and you can have things like days in January belonging to the
week of the previous year. Wikipedia gives examples:

https://en.wikipedia.org/wiki/ISO_week_date

If you are operating on that, then it might indeed make sense to number
the days from -W1-1. I can't say I've ever encountered that. Since
W1-1 is always a Monday on the ISO calendar, it would have the neat
property that you could always turn day of year into day of week with a
mod operation. It would have the quirk that years are either 364 or 371
days, neither of which most people would answer when asked "How many
days are there in a year?"
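
A sketch of what such a numbering could look like, assuming day 1 is
the Monday of ISO week 1. I know of no standard library function that
does this; iso_day_of_year() and iso_year_length() are hypothetical
helpers built on isocalendar().

```python
from datetime import date


def iso_day_of_year(d):
    """Return (ISO year, day number), counting the Monday of ISO week 1 as day 1."""
    iso_year, iso_week, iso_weekday = d.isocalendar()[:3]
    return iso_year, (iso_week - 1) * 7 + iso_weekday


def iso_year_length(year):
    # December 28 always falls in the last ISO week of its year.
    return date(year, 12, 28).isocalendar()[1] * 7


year, n = iso_day_of_year(date(2021, 1, 3))   # a Sunday, still in ISO year 2020
weekday = (n - 1) % 7 + 1                     # 1 = Monday ... 7 = Sunday
```

The mod operation recovers the weekday directly, and iso_year_length()
only ever returns 364 or 371.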

I've only used ISO dates for by-week graphs, because they have the nice
property of "all weeks are seven days", so you don't get oddball weeks
screwing up your plots.

> If you want an easy way to graph day-by-day data and the exact day
> numbers are irrelevant, what I'd recommend is: Convert the date into
> Unix time, divide by 86400, floor it. That'll give you a Julian-style
> date number where Jan 1st 1970 is 0, Jan 2nd is 1, etc, and at the end
> of a year, it'll just keep on incrementing. That would get you past
> the 2020/2021 boundary pretty smoothly.

That works well. The previous suggestion using January 1st 2020 as an
epoch start is also good.
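
Both ideas fit in a few lines. This is a sketch that works on plain
dates (no time zones), so the Unix-time division reduces to date
subtraction. The names epoch_day() and days_since() are mine.

```python
from datetime import date

EPOCH = date(1970, 1, 1)


def epoch_day(d):
    """Whole days since 1970-01-01; same as floor(unix_time / 86400) in UTC."""
    return (d - EPOCH).days


def days_since(d, start=date(2020, 1, 1)):
    # The other suggestion: count from a project-specific epoch instead.
    return (d - start).days
```

Either number keeps incrementing across the 2020/2021 boundary.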

Elijah
--
also finds "week starts on Monday" to be oddball about ISO-8601

Re: Changing strings in files

2020-11-10 Thread Eli the Bearded

In comp.lang.python, Chris Angelico  wrote:
> Eli the Bearded <*@eli.users.panix.com> wrote:
>> Read first N lines of a file. If all parse as valid UTF-8, consider it text.
>> That's probably the rough method file(1) and Perl's -T use. (In
>> particular allow no nulls. Maybe allow ISO-8859-1.)
> ISO-8859-1 is basically "allow any byte values", so all you'd be doing
> is checking for a lack of NUL bytes.

ISO-8859-1, unlike similar Windows "charset"s, does not use octets
128-159. Charsets like Windows CP-1252 are nastier, because they do
use that range. Usage of 1-31 will be pretty restricted in either,
probably not more than tab, linefeed, and carriage return.
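
A quick way to see the difference from Python, using the codec names
"latin-1" and "cp1252" from the standard library. The octet 0x93 (147)
is just one sample from the disputed range.

```python
import unicodedata

raw = bytes([0x93])                 # one sample octet from the high range

as_latin1 = raw.decode("latin-1")   # U+0093, a C1 control character
as_cp1252 = raw.decode("cp1252")    # U+201C, a left curly quote

for label, ch in (("ISO-8859-1", as_latin1), ("CP-1252", as_cp1252)):
    print(label, f"U+{ord(ch):04X}", unicodedata.category(ch))
```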

> I'd definitely recommend
> mandating UTF-8, as that's a very good way of recognizing valid text,
> but if you can't do that then the simple NUL check is all you really
> need.

Dealing with all UTF-8 is my preference, too.

> And let's be honest here, there aren't THAT many binary files that
> manage to contain a total of zero NULs, so you won't get many false
> hits :)

There's always the issue of how much to read before deciding.

Elijah
--
ASCII with embedded escapes? could be a VT100 animation

Re: Changing strings in files

2020-11-10 Thread Eli the Bearded

In comp.lang.python, Loris Bennett  wrote:
> Manfred Lotz  writes:
> > My idea was to do
> >
> > - os.scandir and for each file
> >- check if a file is a text file
 ^^
> >- if it is not a text file skip that file
> >- change the string as often as it occurs in that file
> >
> > What is the best way to check if a file is a text file? In a script I
^^^
> > could use the `file` command which is not ideal as I have to grep the
> > result. In Perl I could do  -T file.
> If you are on Linux and more interested in the result than the
> programming exercise, I would suggest the following non-Python solution:
> 
>find . -type -f -exec sed -i 's/foo/bar/g' {} \;

That 100% fails the "check if a text file" part.

> Having said that, I would be interested to know what the most compact
> way of doing the same thing in Python might be.

Read first N lines of a file. If all parse as valid UTF-8, consider it text.
That's probably the rough method file(1) and Perl's -T use. (In
particular allow no nulls. Maybe allow ISO-8859-1.)
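
A minimal sketch of that heuristic, with two stated differences: it
reads the first N bytes rather than the first N lines, and it does not
try the ISO-8859-1 fallback. It is not a reimplementation of what
file(1) or Perl's -T actually do. The name looks_like_text() and the
4096 default are arbitrary.

```python
import codecs


def looks_like_text(path, nbytes=4096):
    """Guess whether a file is text: no NUL bytes and valid UTF-8 in the first nbytes."""
    with open(path, "rb") as fh:
        chunk = fh.read(nbytes)
    if b"\x00" in chunk:
        return False
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False tolerates a multi-byte character cut off at the chunk end.
        decoder.decode(chunk, final=False)
    except UnicodeDecodeError:
        return False
    return True
```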

Elijah
--
pretty no nulls is file(1) check

Re: why no camelCase in PEP 8?

2020-05-18 Thread Eli the Bearded

In comp.lang.python, Paul Rubin   wrote:
> Eli the Bearded <*@eli.users.panix.com> writes:
>> One of those is easier to "grep" for than the other.
> grep -i might help.

Or might not, if I want case sensitivity in the rest of my RE.

Elijah
--
can, but doesn't want to, build REs that are flexible about partial sensitivity

Re: why no camelCase in PEP 8?

2020-05-18 Thread Eli the Bearded

In comp.lang.python, Paul Rubin   wrote:
> I don't know if this was the explicit motivation for PEP 8, but it
> has always seemed valid to me:
> 
> https://en.wikipedia.org/wiki/Camel_case#Readability_studies

There are three things cited there. One is a NYTimes story from 2009
"Against Camel Case" starting out with criticism of "iPhone" which
the author describes but won't use as it is too disfiguring. That's not a
programmer talking about program identifiers.

The other two are more relevant, two studies one from 2009 and one from
2010, each of which seems to reach a conclusion at odds with the other.
The 2009 one finds camelCase easier to read than snake_case, and the
2010 one finds people recognize snake_case identifiers faster than
camelCase ones. I don't think that Wikipedia page helps your case.

I personally abhor the use of inappropriate mid-word caps in English,
which fits the NYT piece, but am only mildly against them in code. I had
some bad experiences with code that forced use of capital letters in
college and that has tainted me against excess capitals ever since. This
is a highly personal reason that I don't expect anyone else to share.

Here's a simple argument against camel case: when it becomes necessary
to join identifiers, camel case requires modification of the original
unit while snake case just adds stuff to beginning and/or end. One
noteworthy example is when a negated version is needed.

camelCase   ->   noCamelCase
snake_case  ->   no_snake_case

One of those is easier to "grep" for than the other.
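
To make the grep point concrete, a toy illustration with re.search
standing in for a case-sensitive grep. The identifier list is invented.

```python
import re

identifiers = ["camelCase", "noCamelCase", "snake_case", "no_snake_case"]


def grep(pattern, names):
    """Case-sensitive search, like plain grep without -i."""
    return [name for name in names if re.search(pattern, name)]


camel_hits = grep(r"camelCase", identifiers)     # misses noCamelCase
snake_hits = grep(r"snake_case", identifiers)    # finds both spellings
```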

Elijah
--
grep-ability of code should be on everyone's mind

news.bbs.nz is spewing duplicates to comp.lang.python

2020-04-21 Thread Eli the Bearded

This just arrived at my newserver:

Path: 
reader2.panix.com!panix!goblin2!goblin.stu.neva.ru!news.unit0.net!2.eu.feeder.erje.net!4.us.feeder.erje.net!feeder.erje.net!xmission!csiph.com!news.bbs.nz!.POSTED.agency.bbs.nz!not-for-mail
From: Eli the Bearded <*@eli.users.panix.com> (Eli the Bearded)
Newsgroups: comp.lang.python
Subject: Re: Getting a 401 from requests.get, but not when logging in via 
the br
Date: Mon, 20 Apr 2020 19:18:48 +1200
Organization: fsxNet Usenet Gateway | bbs.nz/#fsxNet
Message-ID: <3057175...@f38.n261.z1.binkp.net>
References: <1492048...@f38.n261.z1.binkp.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Injection-Info: news.bbs.nz; 
posting-host="8IWYKlztXHa0+IViEdY46zrq8kpk7dC9fTbT74JiSDQ";
logging-data="12595"; mail-complaints-to="ab...@news.bbs.nz"
User-Agent: VSoup v1.2.9.47Beta [95/NT]
X-Comment-To: dcwhatthe
X-MailConverter: SoupGate-Win32 v1.05
Lines: 36

I find that very curious because the post is mine, but I sent it
out with these headers:

Path: reader2.panix.com!panix!qz!not-for-mail
From: Eli the Bearded <*@eli.users.panix.com>
Newsgroups: comp.lang.python
Subject: Re: Getting a 401 from requests.get, but not when logging in via 
the browser.
Date: Mon, 20 Apr 2020 19:18:48 + (UTC)
Organization: Some absurd concept
Lines: 37
Message-ID: 
References: <48a2e19c-0a52-4cfa-b498-15e3d15b6...@googlegroups.com>
NNTP-Posting-Host: panix5.panix.com
X-Trace: reader2.panix.com 1587410328 23363 166.84.1.5 (20 Apr 2020 
19:18:48 GMT)
X-Complaints-To: ab...@panix.com
NNTP-Posting-Date: Mon, 20 Apr 2020 19:18:48 + (UTC)
X-Liz: It's actually happened, the entire Internet is a massive game of 
Redcode
X-Motto: "Erosion of rights never seems to reverse itself." -- kenny@panix
X-US-Congress: Moronic Fucks.
X-Attribution: EtB
XFrom: is a real address
Encrypted: double rot-13
User-Agent: Vectrex rn 2.1 (beta)

The timezone on the date header has changed, the subject has been
truncated, the Path and injection info is all different, and most
crucially, the MESSAGE-ID and REFERENCES are completely bogus.
News servers rely on Message-ID to tell if a message is unique. Change
it, and it is now a new message. Screwing up References: header while
changing the Subject: is going to break news threading.
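
A toy sketch of why a rewritten Message-ID defeats duplicate detection.
This is not how any particular news server is implemented; SeenFilter
is a made-up class that only mimics the idea of a history keyed on
Message-ID.

```python
from email import message_from_string


class SeenFilter:
    """Toy duplicate filter keyed on Message-ID, like a news server's history."""

    def __init__(self):
        self.seen = set()

    def accept(self, raw_article):
        msg_id = message_from_string(raw_article)["Message-ID"]
        if msg_id is None or msg_id in self.seen:
            return False          # malformed, or already have it
        self.seen.add(msg_id)
        return True
```

Feed it the same body twice under two different Message-IDs and both
copies are accepted as new articles.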

The newsgroup is almost not worth the effort with all the breakage that
happens here.

Elijah
--
other breakage happens with References: headers due to mailing list stuff

Re: Getting a 401 from requests.get, but not when logging in via the browser.

2020-04-20 Thread Eli the Bearded

In comp.lang.python,   wrote:
> On Monday, April 20, 2020 at 5:02:23 PM UTC-4, Eli the Bearded wrote:
> > For an example, back to telnet again.
> > 
> > $ telnet example.com 80
> > Trying 255.11.22.123...
> > Connected to example.com
> > Escape character is '^]'.
> > GET /digest/ HTTP/1.1
> > Host: example.com

FYI, I type in the lines "GET /digest/ HTTP/1.1" and "Host: example.com"
(which should match the name of the site you wish to connect to), then
a blank line. After that the server first begins to "speak".
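
The same conversation can be sketched with a plain socket in Python.
Assumptions: the function names are made up, and the "Connection:
close" header is my addition so the server hangs up after one reply.

```python
import socket


def build_request(host, path="/"):
    """The same lines typed into telnet: request line, Host: header, blank line."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}", "Connection: close", "", ""]
    return "\r\n".join(lines).encode("ascii")


def fetch_headers(host, path="/", port=80, timeout=10):
    """Send the request over a plain TCP socket and return the response header block."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(host, path))
        data = b""
        while b"\r\n\r\n" not in data:
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    return data.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
```

Something like print(fetch_headers("example.com", "/digest/")) would
show the status line and headers, the WWW-Authenticate: line included
when the server sends one.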

> > If you don't understand what the site is asking for, it may be very
> > difficult for you to satisfy it.
> Been years since I used Telnet.  I didn't even think that Windows had it
> anymore.

I use it frequently in Linux. But nc or the like works as well for the
way I use it: not for actual "telnet" protocol stuff.

> I tried telneting the landing page, i.e. without the specific node that
> requires the login.  So e.g.
> 
> Telnet thissite.oh.gov 80
> 
> , but it returns a 400 Bad Request.  Before that, the Telnet screen is
> completely blank ; I have to press a key before it returns the Bad
> Request.

It sounds like you are entering a blank line without the GET request
and Host: header.

> Roger on knowing what the site is asking for.  But I don't know how to
> determine that.

Developer tools in your browser may be helpful. As I mentioned, I'm much
better at this at low levels. I haven't kept up with the changes in the
tools over the past decade or so.

If you have an HTTPS site you want to check, the openssl tool can
create an interactive tunnel for you. Command line is more complicated
and output is more verbose, but the gist is similar:

$ openssl s_client -connect example.com:443
[lots of output]
GET /digest/ HTTP/1.1
Host: example.com

[lots of output]
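
A Python sketch of the same tunnel, using the standard ssl module in
place of openssl s_client. The function name is made up and the
"Connection: close" header is my addition. Unlike a bare s_client run,
the default context here refuses servers whose certificate does not
verify.

```python
import socket
import ssl


def fetch_headers_tls(host, path="/", port=443, timeout=10):
    """Send a GET over TLS and return the response header block as text."""
    request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
    context = ssl.create_default_context()      # verifies the server certificate
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(request.encode("ascii"))
            data = b""
            while b"\r\n\r\n" not in data:
                chunk = tls.recv(4096)
                if not chunk:
                    break
                data += chunk
    return data.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
```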

Elijah
--
likes knowing how the pieces fit together

Re: Getting a 401 from requests.get, but not when logging in via the browser.

2020-04-20 Thread Eli the Bearded

In comp.lang.python,   wrote, in reply to me:
> "What do you think it is doing?"
> I thought the timeout was waiting for a successful connection.

A successful *connection* and a successful *authentication* are
different things. 

$ telnet example.com 80
Trying 255.11.22.123...
Connected to example.com
Escape character is '^]'.

[...]

There's a connection. No authentication, however.

> "Are you sure the site is using HTTPBasicAuth()? Because if it's not,
> that would explain how the same credentials can fail. (It could also
> be something else, like a site returning "401 Unauthorized" because
> it doesn't like your User-Agent.)"
> 
> Yes, that's what I'm getting.
> 
> No, I don't know if it's using Basic Authentication.  If I log in
> through the browser, then it pops up for an id and password.
> 
> How do I find out what type of Authentication is applicable?  

Look at the WWW-Authenticate: header.

For an example, back to telnet again.

$ telnet example.com 80
Trying 255.11.22.123...
Connected to example.com
Escape character is '^]'.
GET /digest/ HTTP/1.1
Host: example.com

HTTP/1.1 401 Unauthorized
Date: Mon, 20 Apr 2020 20:42:25 GMT
Server: Apache/2.4.41 (Unix) OpenSSL/1.0.2k
WWW-Authenticate: Digest realm="File Resources", 
nonce="RyTO776jBQA=5fe3887c65536842f2ebb8ad6cf39bb6b5ec9b66", algorithm=MD5, 
domain="/digest/", qop="auth"
Content-Length: 381
Connection: close
Content-Type: text/html; charset=iso-8859-1
...



$ telnet example.com 80
Trying 255.11.22.123...
Connected to example.com
Escape character is '^]'.
GET /basic/ HTTP/1.1
Host: example.com

HTTP/1.1 401 Unauthorized
Date: Mon, 20 Apr 2020 20:45:22 GMT
Server: Apache/2.4.41 (Unix) OpenSSL/1.0.2k
WWW-Authenticate: Basic realm="Restricted Resources"
Content-Length: 381
Connection: close
Content-Type: text/html; charset=iso-8859-1
...


There are other ways to authenticate besides those two, but those
are the ones I've used that operate on the HTTP level and in browsers.

http://www.iana.org/assignments/http-authschemes/http-authschemes.xhtml

That list is supposedly all of the auth schemes, I don't know how many
are widely implemented. Certainly some of them, like "Bearer" I've
seen for APIs, but not using a browser password GUI. Bearer is a very
common way to authenticate for APIs.
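
If you already have the raw response head as text, picking out the
scheme is a one-header job. A sketch, assuming a single, unwrapped
WWW-Authenticate header (servers may send several; this returns only
the first). The helper name auth_scheme() is made up.

```python
from email import message_from_string


def auth_scheme(raw_response_head):
    """Return the scheme named in WWW-Authenticate (e.g. "Basic"), or None."""
    # Drop the status line; the rest parses like mail headers.
    _status, _, header_block = raw_response_head.partition("\n")
    value = message_from_string(header_block)["WWW-Authenticate"]
    if not value:
        return None
    return value.split(None, 1)[0]
```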

If you don't understand what the site is asking for, it may be very
difficult for you to satisfy it.

Elijah
--
understands all of this at a low level and not well at a library level

Re: Getting a 401 from requests.get, but not when logging in via the browser.

2020-04-20 Thread Eli the Bearded

In comp.lang.python,   wrote:
> However, one of them immediately returns a 401.  I'm using the exact
> same credentials to check this site, as when loggin in.
> 
> Also, interestingly, it returns the 401 right away.  I tried setting the
> timeout value for a ridiculously long time, but it passes the 401 return
> immediately.

The timeout presumably is how long to wait for a reply. When the
site replies 401 immediately, it's never even bumping up to the
timeout.

> Am I misunderstanding the meaning of the timeout parameter?

What do you think it is doing?

> The line in question is 
> request = requests.get(ip_s,timeout=5000, verify = False, auth
> =HTTPBasicAuth( user_id_s, pw_s))

Are you sure the site is using HTTPBasicAuth()? Because if it's not,
that would explain how the same credentials can fail. (It could also
be something else, like a site returning "401 Unauthorized" because
it doesn't like your User-Agent.)

I use "AuthType Digest" on some of my websites. It's not great, but
it's TONS better than basic auth, which sends passwords basically
in the clear.

https://en.wikipedia.org/wiki/Digest_access_authentication

In my browser, Digest authentication looks the same GUI-wise as Basic
authentication. The differences are all under the hood.
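
Two small sketches of those under-the-hood differences, standard
library only. basic_header() shows that Basic credentials are merely
base64-encoded; digest_opener() builds a urllib opener that answers
Digest challenges. Both function names are made up, and the URL and
credentials passed in are whatever your site needs.

```python
import base64
import urllib.request


def basic_header(user, password):
    """What Basic auth puts on the wire: base64, which is encoding, not encryption."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Authorization: Basic {token}"


def digest_opener(url, user, password):
    """Build a urllib opener that answers Digest challenges for url."""
    manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    manager.add_password(None, url, user, password)
    handler = urllib.request.HTTPDigestAuthHandler(manager)
    return urllib.request.build_opener(handler)
```

With the requests library the analogous change should be passing
requests.auth.HTTPDigestAuth(user, password) as auth= instead of
HTTPBasicAuth.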

Elijah
--
digest auth is not as well supported by clients or servers

Re: on sorting things

2019-12-19 Thread Eli the Bearded

In comp.lang.python, Peter Otten  <__pete...@web.de> wrote:
> Eli the Bearded wrote:
>> But what caught my eye most, as someone relatively new to Python but
>> with long experience in C in Perl, is sorting doesn't take a

s/C in /C and/

Ugh.

>> *comparison* function, it takes a *key generator* function, and that
>> function is supposed to transform the thing into something that the
>> native comparison knows how to compare.
>> 
>> This seems a strange choice, and I'm wondering if someone can explain
>> the benefits of doing it that way to me.
> 
> Python 2 started with a comparison function and then grew a key function.
> With a key function you still have to compare items, you are just breaking 
> the comparison into two steps:

[snip]

Thanks for that good explanation. The benchmark comparison makes it
very thorough.

In my mind I gravitate towards the complicated sorts of sort that can be
quickly compared for some sorts of keys and not as quickly for others.

Consider a sort that first compares file size and if the same number of
bytes, then compares file checksum. Any decently scaled real world
implementation would memoize the checksum for speed, but only work it out
for files that do not have a unique file size. The key method requires
it to be worked out in advance for everything.

But I see the key method handles the memoization under the hood for you,
so those simpler, more common sorts of sort get an easy to see benefit.
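
For the size-then-checksum case, one possible sketch: keep a comparison
function, adapt it with functools.cmp_to_key, and memoize the expensive
parts by hand. The function names and the choice of SHA-256 are mine.

```python
import functools
import hashlib
import os


@functools.lru_cache(maxsize=None)
def size(path):
    return os.path.getsize(path)


@functools.lru_cache(maxsize=None)
def checksum(path):
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()


def compare(a, b):
    """Order by size; fall back to checksum only when the sizes tie."""
    if size(a) != size(b):
        return -1 if size(a) < size(b) else 1
    ca, cb = checksum(a), checksum(b)
    return (ca > cb) - (ca < cb)


def sort_files(paths):
    return sorted(paths, key=functools.cmp_to_key(compare))
```

Files with a unique size never get read, since checksum() is only
called from the tie-breaking branch.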

Elijah
--
even memoizing the stat() calls would help for large lists

on sorting things

2019-12-18 Thread Eli the Bearded

I recently saw a link to an old post on a blog and then started looking
at the newer posts. This one:

https://leancrew.com/all-this/2019/11/the-key-to-sorting-in-python/

discusses ways to deal with useful sorting of movie / television show
titles. Some initial words should be re-ordered for sorting purposes
(_Adventures of Huckleberry Finn, The_), roman numbers should sort like
regular numbers (_Rocky V_ comes before _Rocky IX_), and something needs
to be done about sorting accented vowels. 

But what caught my eye most, as someone relatively new to Python but
with long experience in C in Perl, is sorting doesn't take a
*comparison* function, it takes a *key generator* function, and that
function is supposed to transform the thing into something that the
native comparison knows how to compare.
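
For anyone who hasn't seen the key style, a minimal sketch covering
only the leading-article rule (no roman numerals, no accents). The name
title_key() and the sample titles are mine, not the blog post's.

```python
def title_key(title):
    """Sort key: case-folded title with a leading article moved out of the way."""
    lowered = title.casefold()
    for article in ("the ", "an ", "a "):
        if lowered.startswith(article):
            return lowered[len(article):]
    return lowered


titles = ["The Matrix", "Alien", "A Bug's Life"]
ordered = sorted(titles, key=title_key)
```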

This seems a strange choice, and I'm wondering if someone can explain
the benefits of doing it that way to me.

Elijah
--
imagines it could make porting some code in or out of Python trickier

Re: Artifact repository?

2019-10-31 Thread Eli the Bearded

In comp.lang.python, Paul Rubin   wrote:
> Dan Stromberg  writes:
>> By an Artifact Repository, I mean something that can version largish
>> binaries that are mostly produced by a build process.
> I'm not familiar with the term "artifact repository" and hadn't heard of
> the ones you mentioned, but have you looked at git-annex ?

Git-annex solves a different problem.

Use git-annex for the problem of "revision control with git for binary
files not suitable for normal git storage".

Use Artifactory for the problem of "store the binary product of source
code at a particular revision point".

They are kinda related, but: git doesn't magically know that when you
update foo.c that lib/libfoo.a linked into bin/projectfoo are now
obsolete. Artifactory, doesn't either, but it doesn't slide files
forward to new revisions the way git would unless you specifically
replace or delete them.

After you `make` your code, you can `make archive` (or whatever) to
copy the compiled results to your artifact repository and your deploy
code elsewhere can look to the artifact repository to get "latest" or
a specific revision.

git-annex is good for things like images used in a project that you do
want to automatically persist into the next revision. Say if you have
screenshots in your documentation and want the next `make pdfs` to
have access to them. Or if you have a blog in source code control.

Elijah
--
the cheapest artifact repository is a webserver with zips / tars

Re: How to decode UTF strings?

2019-10-26 Thread Eli the Bearded

In comp.lang.python, DFS   wrote:
> On 10/25/2019 10:57 PM, MRAB wrote:
>> Here's a simple example, based in your code:
>> 
>> from email.header import decode_header
>> 
>> def test(header, default_encoding='utf-8'):
>>   parts = []
>> 
>>   for data, encoding in decode_header(header):
>>   if isinstance(data, str):
>>  parts.append(data)
>>   else:
>>  parts.append(data.decode(encoding or default_encoding))
>> 
>>   print(''.join(parts))
>> 
>> test('=?iso-8859-9?b?T/B1eg==?= ')
>> test('=?utf-8?Q?=EB=AF=B8?= ')
>> test('=?GBK?B?0Pu66A==?= ')
>> test('=?UTF-8?B?zp3Or866zr/PgiDOks6tz4HOs86/z4I=?= 
>> ')
> I don't think it's working:

It's close. Just ''.join should be ' '.join.

> $ python decode_utf.py
> O≡uz
> δ»╕
> ╨√║Φ
> ╬¥╬»╬║╬┐╧é ╬Æ╬¡╧ü╬│╬┐╧é

Is your terminal UTF-8? I think not.
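
A quick check of that guess: take the bytes behind the second test
header and decode them the way a CP437 console would. The CP437 choice
is my assumption about that terminal; sys.stdout.encoding shows what
Python thinks it is writing to.

```python
import sys

utf8_bytes = bytes.fromhex("ebafb8")      # the bytes behind =?utf-8?Q?=EB=AF=B8?=
as_utf8 = utf8_bytes.decode("utf-8")      # the intended character, U+BBF8
as_cp437 = utf8_bytes.decode("cp437")     # what a CP437 console shows for them

print("stdout encoding:", sys.stdout.encoding)
```

The value of as_cp437 matches the second line of the garbled output
quoted above.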

Elijah
--
answered with C code to do this in comp.lang.c

Re: question about making an App for Android

2019-10-11 Thread Eli the Bearded

In comp.lang.python, Dennis Lee Bieber   wrote:
> pyotr filipivich  declaimed the following:
>> "A simple program" to divide the amount of "today's" daylight into 12
>> even '"hours", so that Dawn begins the First hour, the third hour is
>> mid-morning, noon is the middle of the day, the ninth hour mid after
>> noon, and the twelfth hour ends at sunset.  Is simple, no?  {no.}

How antique.

>   Even ignoring "phone" this is anything but simple. It relies upon
> knowing one's latitude and date to allow computing the angle of the sun.

That sounds like one "hard" input, one easy input, and a bunch of
already solved math.
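
Once sunrise and sunset are known, the hour-splitting part is short. A
sketch that takes both as given datetimes (computing them from latitude
and date is the solved math, not shown here). The function names are
made up.

```python
from datetime import datetime, timedelta


def solar_hours(sunrise, sunset):
    """Split daylight into 12 equal 'hours'; returns 13 boundary datetimes."""
    step = (sunset - sunrise) / 12
    return [sunrise + i * step for i in range(13)]


def solar_hour_of(now, sunrise, sunset):
    """Which of the 12 daylight hours 'now' falls in (1..12), or None at night."""
    if not sunrise <= now < sunset:
        return None
    return int((now - sunrise) / ((sunset - sunrise) / 12)) + 1
```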

>> But getting from the development environment (the desktop) to the
>> phone is something I am clueless about.
>   Getting anything that is not written in Java onto an Android phone is
> likely going to be a pain. You will most likely need an environment that
> runs on ARM architecture. And I have no idea what iOS requires.

Running Python on Android is trivial. Install Termux:

https://f-droid.org/en/packages/com.termux/

(Or via Google's app store.) Run Termux then type: 

  pkg up
  pkg install python

To read GPS directly from Termux, you'll need Termux:Api which has an
app part and a package part:

https://f-droid.org/en/packages/com.termux.api/

  pkg install termux-api

Then use the shell command `termux-location -p gps -f last`  (or `-f
once`) to get location.

To get the python onto the phone you may wish to install curl or wget or
git and use them to fetch over the network.

For output you could show text in termux or compose a PNG and use
`termux-wallpaper -f`.

I, myself, have just gotten "cron" like functionality in Termux with
Tasker and the Termux:Task plugin. Every five minutes Tasker runs a
shell script that collects data (`termux-sensor`), simplifies the JSON
output (`|jq`) and writes it to a file on my phone.

https://f-droid.org/en/packages/com.termux.tasker/
https://tasker.joaoapps.com/

This happens completely in the background. Using a script like that to
change wallpaper to show solar hours would, I think, be pretty cool.

Elijah
--
has also compiled code (clang) on Termux


Re: Generate simple image on a standalone Raspberry Pi

2019-09-27 Thread Eli the Bearded

In comp.lang.python, Roy Hann   wrote:
> I am designing a mobile application to run on a Raspberry Pi 3 model B.
> It will not have any Internet access. I need to generate a static image
> consisting of a simple arc representing (say) a speedometer or a
> pressure gauge. The image will need to be regenerated every 5 seconds.
> The image must be displayed in a web browser (served by gunicorn
> running on the Pi). I prefer it to be a PNG but that is only a
> preference.

The browser can probably display SVG. So generating that may be easier.
Is the browser running on the same system? You say no "Internet access",
but that's not the same as "no network access". A browser running on a
different local system might have more CPU oomph to handle the "SVG to
bitmap for display" step.
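
To show how little code the SVG route needs, a sketch that builds a
half-circle gauge as a string with no libraries. The sizes, colors and
the name gauge_svg() are arbitrary choices of mine.

```python
import math


def gauge_svg(fraction, size=200):
    """Return an SVG document with a half-circle gauge filled to fraction (0..1)."""
    fraction = min(max(fraction, 0.0), 1.0)
    cx = cy = size / 2
    r = size * 0.4
    angle = math.pi * (1 - fraction)            # pi at empty, 0 at full
    x = cx + r * math.cos(angle)
    y = cy - r * math.sin(angle)                # SVG y axis points down
    track = f"M {cx - r:.1f} {cy:.1f} A {r:.1f} {r:.1f} 0 0 1 {cx + r:.1f} {cy:.1f}"
    value = f"M {cx - r:.1f} {cy:.1f} A {r:.1f} {r:.1f} 0 0 1 {x:.1f} {y:.1f}"
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}">'
        f'<path d="{track}" fill="none" stroke="#ccc" stroke-width="12"/>'
        f'<path d="{value}" fill="none" stroke="#c00" stroke-width="12"/>'
        "</svg>"
    )
```

The web app would return this string with Content-Type image/svg+xml,
regenerating it whenever the reading changes.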

I've used libpng from C and chunky_png from ruby and not found PNG
generation too onerous. That makes me think pypng wouldn't be bad
either. For the case of a speedometer / pressure gauge you can likely
have a blank gauge file you read in each time through the loop and then
only write the changes needed (eg, the needle).

Alternatively the gauge could be a static image in the browser with an
overlaid dynamic needle image using a transparent background. The PNG
format will handle that easily.

Elijah
--
thinks there are a lot of unstated parts to this problem

Re: python3 subprocess run sudo cmd in remote failed

2019-09-17 Thread Eli the Bearded

In comp.lang.python, lampahome   wrote:
> what I tried many times like enter password, but it failed.
> I just want to use ps.stdin.write(password) to send password, but it always
> jump password prompt immediately.

Passwords are frequently read from stderr, not stdin, so that tools can
get a human answered password from inside a pipeline providing stdin
to something downstream.

> How to solve this

Use some sort of out-of-band authentication. Jenkins (when I looked at
it) used ssh-agent. Ansible used sshpass. And you've already had
key-pairs suggested in a different reply.

Elijah
--
likes ssh-agent and key-pairs

Re: Unicode UCS2, UCS4 and ... UCS1

2019-09-17 Thread Eli the Bearded

In comp.lang.python, moi   wrote:
> I hope, one day, for those who are interested in Unicode,
> they find a book, publication, ... which will explain
> what is UCS1.

There isn't anything called UCS1. There is a UTF-1, but don't use it.
UTF-8 is better in every way.
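
For a feel of how the encodings that do exist differ, a tiny sketch
comparing encoded sizes in Python. The sample characters are arbitrary;
the "-le" codec variants are used so no byte-order mark is counted.

```python
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]   # A, e-acute, euro sign, an emoji

for ch in samples:
    sizes = {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}
    print(f"U+{ord(ch):04X}", sizes)
```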

https://en.wikipedia.org/wiki/Universal_Coded_Character_Set

If you want it in book form, look for the "Create a book" link in the
side bar. I'd suggest 

https://en.wikipedia.org/wiki/Unicode
https://en.wikipedia.org/wiki/Comparison_of_Unicode_encodings
https://en.wikipedia.org/wiki/UTF-8
https://en.wikipedia.org/wiki/UTF-16
https://en.wikipedia.org/wiki/UTF-32

As other things to include in your book.

Elijah
--
doesn't think there is a character encoding newsgroup

Re: [OT(?)] Ubuntu 18 vim now defaults to 4-space tabs

2019-09-10 Thread Eli the Bearded

In comp.lang.python, Tobiah   wrote:
>> Your subject missed a critical word: vim.
> It's there!

I added it.

> > Run vim. Then ':set' to see what's set different than default. Then,
> > if it is tabstop you want to know about, ':verbose set tabstop?' will
> > tell you where that setting was last altered.
> 
> Nothing that seems to point to space indent:
> 
> 
>background=dark hlsearch    ruler       smartcase      ttyfast          wildmenu
>helplang=C.     ignorecase  scroll=36   smartindent    ttymouse=xterm2  nowrap
>hidden          modified    showcmd     nostartofline  wildcharm=^Z

Since it does not appear to have "filetype=python" in there, maybe I
should have specified "Run vim with a .py filename".

I tried vim on a new account on a NetBSD machine and saw much different
settings today.

:set
--- Options ---
  autoindent         langnoremap        suffixesadd=.py
  comments=b:#,fb:-  nolangremap        syntax=python
  display=truncate   nrformats=bin,hex  ttimeout
  expandtab          ruler              ttimeoutlen=100
  filetype=python    scroll=11          ttyfast
  helplang=en        scrolloff=5        ttymouse=xterm
  history=200        shiftwidth=4       wildignore=*.pyc
  incsearch          showcmd            wildmenu
  keywordprg=pydoc   softtabstop=4
[... long options omitted ...]

:verbose set shiftwidth? softtabstop? expandtab?
  shiftwidth=4
   Last set from /[...]/vim/vim81/ftplugin/python.vim line 118
  softtabstop=4
   Last set from /[...]/vim/vim81/ftplugin/python.vim line 118
  expandtab
   Last set from /[...]/vim/vim81/ftplugin/python.vim line 118

Looks like the culprit there.

> I'll check with a vim specific group.  Thanks!

I see you posted to the vim-users list. For those following along, they
suggested it's the newer vim "no .vimrc file" defaults kicking in. That's
something I usually manage to avoid by having a vimrc, which is possibly
why I was unable to duplicate it on my regular accounts.

Elijah
--
really dislikes vim's mouse handling

Re: [OT(?)] Ubuntu 18 vim now defaults to 4-space tabs

2019-09-09 Thread Eli the Bearded

In comp.lang.python, Tobiah   wrote:
> We upgraded a server to 18.04 and now when I start typing

Your subject missed a critical word: vim. There are a lot of editors in
Ubuntu, and probably they don't all do that.

> This is more of a vim question perhaps, but I'm already
> subscribed here and I figured someone would know what
> to do.

Run vim. Then ':set' to see what's set different than default. Then,
if it is tabstop you want to know about, ':verbose set tabstop?' will
tell you where that setting was last altered.

I'm not seeing tabstops changed on my Ubuntu 18.04, but I may have vim
installed with different packages. I prefer vim configured in a closer to
vi-compatible way than defaults.

Elijah
--
expects `ed` and `nano` still work the same

Re: Proper shebang for python3

2019-07-25 Thread Eli the Bearded

In comp.lang.python, Thomas 'PointedEars' Lahn   wrote:
> Michael Torrie wrote:
>> On 7/24/19 4:20 PM, Cameron Simpson wrote:
>>> That is some progress, hooray. Then there's just sbin -> bin to go.
>> I suppose in the olden days sbin was for static binaries, […]
> No, “sbin” is short for “*system* binaries” which in general only the 
> superuser should be able to execute.

I think Michael is confusing "sbin" with the statically linked utilities
some systems (particularly older ones, but also FreeBSD in /rescue/)
have for repairing the system when things start to go bad. You'd want
a shell (sh is great), a basic editor (eg, ed), and a smattering of
other tools, akin to the ones listed as "must be in /sbin" in your
linuxfoundation link.

But more than a few utilities in /sbin are useful for non-superusers.
Eg ip or ifconfig for informational purposes like identifying current
IP address and getting MAC.

> Which is why the above is a Very Bad Idea[tm].

Why? Programs that can *only* be usefully run by a privileged user
or in a system context (eg halt or getty) already *must* prevent
non-privileged use. So why would it be a Very Bad Idea[tm] to have them in
a common directory like /bin/?

(Feel free to crosspost and set follow-ups to another group if you like.
But I would suggest *not* a Linux group, since this is something general
to all Unix-likes.)

> 

Elijah
--
uses both netbsd and linux regularly

Re: Hermetic environments

2019-07-24 Thread Eli the Bearded

In comp.lang.python, DL Neil   wrote:
> Is Python going 'the right way' with virtual environments?
...
> Am I 'getting away with it', perhaps because my work-pattern doesn't 
> touch some 'gotcha' or show-stopper?
> 
> Why, if so much of 'the rest of the world' is utilising "containers", 
> both for mobility and for growth, is the Python eco-system following its 
> own path?

I'm going to speculate that even inside containers, some people will use
multiple virtual environments. It could be that the app and the
monitoring for that app are developed by different branches of the
company and have different requirements.

But I think a lot of the use of virtual environments is in dev
environments where a developer wants to have multiple closed settings
for doing work. On the dev branch, newer versions of things can be
tested, but a production environment can be retained for hotfixes to
deployed code.

Or because the different microservices being used are each at different
update levels and need their own environments.

> Is there something about dev (and ops) using Python venvs which is a 
> significant advantage over a language-independent (even better: an 
> OpSys-independent) container?

I'm not a big fan of language-dependent virtual environments because
they only capture the needs of a particular language. Very often code
works with things that are outside of that language, even if it is only
system libraries.

Elijah
--
interested in hearing other voices on this

Re: Proper shebang for python3

2019-07-22 Thread Eli the Bearded

In comp.lang.python, Tim Daneliuk   wrote:
> On 7/20/19 1:20 PM, Chris Angelico wrote:
> > On Sun, Jul 21, 2019 at 4:13 AM Michael Speer  wrote:
> >> You may want to use `#!/usr/bin/env python3` instead.

I no longer have one to verify, but I recall Solaris boxen used /bin/env
not /usr/bin/env.

> So, no, do NOT encode the hard location - ever.  Always use env to
> discover the one that the user has specified. 

But wait, you just hard coded the location of env...

>The only exception is
> /bin/sh which - for a variety of reasons - can reliably counted upon.

B! Fully half of my work porting trn4 to my cellphone was fixing all
the places that ancient build system believed /bin/sh was the name of
sh. In that environment (Termux shell on an Android phone) the location
is /data/data/com.termux/files/usr/bin/sh (and env is also in
/data/data/com.termux/files/usr/bin hahaha).

Even on more traditional environments -cough-Solaris-cough- /bin/sh may
exist but be so ancient as to break things that work elsewhere. "^" as
a synonym for "|" is a noteworthy gotcha.

Figuring out where things are on the user's path is a laudable goal, but
do it only at install time, not run time, for consistent runs.
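
A minimal sketch of what I mean, assuming an installer written in
Python: look the interpreter up on PATH once, then write the absolute
path into the script's first line. Both function names are made up for
this example.

```python
import shutil

def resolve_interpreter(name="python3"):
    """Look up name on PATH once, at install time."""
    path = shutil.which(name)
    if path is None:
        raise FileNotFoundError(f"{name} not found on PATH")
    return path

def pin_shebang(source, interpreter):
    """Return source text with its shebang replaced by a fixed path."""
    lines = source.splitlines(keepends=True)
    shebang = f"#!{interpreter}\n"
    if lines and lines[0].startswith("#!"):
        lines[0] = shebang          # replace the existing shebang
    else:
        lines.insert(0, shebang)    # no shebang yet: add one
    return "".join(lines)
```

An installer would call pin_shebang(text, resolve_interpreter()) and
write the result out, so later runs do not depend on whatever PATH
happens to be at the time.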

Elijah
--
pathological edge cases -r- us

Re: convert .py to Android ?

2019-05-14 Thread Eli the Bearded

In comp.lang.python, Ben Finney   wrote:
> "Steve"  writes:
>> I have a working .py program
>> that I want to get into my Android Moto G phone.
> To my knowledge, an Android app must be implemented, at some level, in
> Java and specifically linked to Android Java libraries. That's a hard
> limitation of the Android platform.
> 
> That implies that any Python program will need to be written at least
> with partial awareness that it is not going to run in a native Python
> VM, but instead get compiled to somehow run in a Java Android environment.

If it doesn't have a GUI, the easy solution is put Termux on the phone,
run "pkg install python" (or python2), then install 'curl' or 'ssh' (for
scp) to transfer the program over. I use Termux scripts on my phone for
image preprocessing (namely strip exif and downscale) prior to using my
photos on the open web.

Elijah
--
also uses Termux as an ssh client

Re: Conway's game of Life, just because.

2019-05-07 Thread Eli the Bearded

In comp.lang.python, MRAB   wrote:
> I've never seen a version of Conway's Game of Life where the board 
> doesn't wrap around.

The one I wrote in vi macros doesn't. It's a design choice you can make.
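
To make the choice concrete, here is a sketch in Python (not the vi
macro version) of a neighbor function that can do either. The function
name and the "wrap" flag are my own invention for this example.

```python
def neighbors(cell, width, height, wrap=False):
    """Yield the neighbors of cell on a width x height board."""
    x, y = cell
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue                        # skip the cell itself
            nx, ny = x + dx, y + dy
            if wrap:
                yield nx % width, ny % height   # torus: edges join up
            elif 0 <= nx < width and 0 <= ny < height:
                yield nx, ny                    # bounded: drop off-board cells
```

On a bounded board a corner cell has only three neighbors; with
wrapping, every cell has eight (on boards at least 3x3).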

(Thanks for the explanations everyone.)

Elijah
--
the vi macro one is included in the vim macros directory

Re: Conway's game of Life, just because.

2019-05-07 Thread Eli the Bearded

In comp.lang.python, Paul Rubin   wrote:

Thanks for posting this. I'm learning python and am very familiar with
this "game".

> #!/usr/bin/python3
> from itertools import chain
> 
> def adjacents(cell):        # generate coordinates of cell neighbors
>     x, y = cell             # a cell is just an x,y coordinate pair
>     return ((x+i,y+j) for i in [-1,0,1] for j in [-1,0,1] if i or j)

This line confuses me. How do you expect "if i or j" to work there?

>>> for pair in adjacents((0,0)):
...     print(pair)
...
(-1, -1)
(-1, 0)
(-1, 1)
(0, -1)
(0, 1)
(1, -1)
(1, 0)
(1, 1)
>>> def neighboring(cell):
...     x, y = cell
...     return ((x+i,y+j) for i in [-1,0,1] for j in [-1,0,1])
...
>>>
>>> for pair in neighboring((0,0)):
...     print(pair)
...
(-1, -1)
(-1, 0)
(-1, 1)
(0, -1)
(0, 0)
(0, 1)
(1, -1)
(1, 0)
(1, 1)
>>>

Elijah
--
is the torus game board unintentional?

Re: Generating generations of files

2019-04-29 Thread Eli the Bearded

In comp.lang.python, DL Neil   wrote:
> On 30/04/19 10:59 AM, Chris Angelico wrote:
>>> bet a FAT filesystem would produce a different error
>> Probably it'd raise BadFileSystemError or something. Which is a
> Fortunately, it runs on a Linux 'compute server'.

I mount FAT under Linux all the time.

Elijah
--
but generally for sneakernet reasons

Re: Generating generations of files

2019-04-29 Thread Eli the Bearded

In comp.lang.python, Peter J. Holzer  wrote:
> On 2019-04-29 20:12:28 -, Grant Edwards wrote:
>> Well, the FILES-11 filesystem on VAX/VMS did that automatically, but
>> that's probably not too helpful.
> Until this is finished you could use something like this:
> 
> #!/usr/bin/python3
> 
> import os
> 
> def open11(file, mode, **kwargs):
>     if "w" in mode:
>         try:
>             oldfile = os.readlink(file)
>             basename, version = oldfile.split(";")
>         except FileNotFoundError:
>             basename = os.path.basename(file)
>             version = 0
>         newfile = basename + ";" + str(int(version) + 1)
>         os.unlink(file)
>         os.symlink(newfile, file)
>     return open(file, mode, **kwargs)
> 
> 
> if __name__ == "__main__":
>     with open11("foo", "w") as f:
>         f.write("test1")
> 
>     with open11("foo", "w") as f:
>         f.write("test2")
> 
>     with open11("foo", "w") as f:
>         f.write("test3")
> 
> :-)

Noted.

> (WARNING: I haven't really tested this)

No foo:

Traceback (most recent call last):
  File "versioned-open", line 21, in <module>
    with open11("foo", "w") as f:
  File "versioned-open", line 15, in open11
    os.unlink(file)
FileNotFoundError: [Errno 2] No such file or directory: 'foo'

There is a foo, but it's not a symlink:

Traceback (most recent call last):
  File "versioned-open", line 21, in <module>
    with open11("foo", "w") as f:
  File "versioned-open", line 9, in open11
    oldfile = os.readlink(file)
OSError: [Errno 22] Invalid argument: 'foo'

Parse error on version string:

Traceback (most recent call last):
  File "versioned-open", line 21, in <module>
    with open11("foo", "w") as f:
  File "versioned-open", line 10, in open11
    basename, version = oldfile.split(";")
ValueError: not enough values to unpack (expected 2, got 1)

etc (there are several possible parse errors: no semicolon, multiple
semicolons, invalid literal for int()).

That said, I do think it makes a reasonable suggestion for the stated
versioning requirement.
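
For what it's worth, here is one way the three failures above might be
handled. This is a sketch of mine, not Peter's code: it keeps his
"symlink points at name;N" scheme, starts at version 1 when nothing
exists, and refuses to touch a plain file or a target it can't parse.
The helper name next_version_name is hypothetical.

```python
import os

def next_version_name(file):
    """Return the symlink target to use for the next version of file."""
    base = os.path.basename(file)
    try:
        old = os.readlink(file)
    except FileNotFoundError:
        return base + ";1"              # nothing there yet: first version
    except OSError:
        # Exists but is not a symlink; do not clobber it.
        raise ValueError(f"{file!r} exists and is not a symlink") from None
    name, sep, version = old.rpartition(";")
    if not sep or not version.isdecimal():
        raise ValueError(f"cannot parse a version from {old!r}")
    return f"{name};{int(version) + 1}"

def open11(file, mode="r", **kwargs):
    """Open file; in write mode, point the symlink at a new version first."""
    if "w" in mode:
        newfile = next_version_name(file)
        try:
            os.unlink(file)             # drop the old symlink, if any
        except FileNotFoundError:
            pass
        os.symlink(newfile, file)
    return open(file, mode, **kwargs)
```

Using rpartition(";") means a name with extra semicolons still splits on
the last one. It is still not safe against two writers racing each
other.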

Elijah
--
bet a FAT filesystem would produce a different error

Re: need help understanding: converting text to binary

2019-04-23 Thread Eli the Bearded

In comp.lang.python, Cameron Simpson   wrote:
> On 23Apr2019 20:35, Eli the Bearded <*@eli.users.panix.com> wrote:
>> That feels entirely wrong. I don't know what b'\x9A' means without
>> knowing the character set and character encoding. If the encoding is a
>> multibyte one, b'\x9A' doesn't mean anything on its own. That's why I
>> want to treat it as binary.
> If you don't know the encoding then you don't know you're looking at a 
> hex digit. OTOH, if the binary data contain ASCII data then you do know 
> the encoding: it is ASCII.

Hmmm. Maybe I'm not making myself clear. ASCII "=9a" should decode to
b'\x9A', and it is that binary byte whose meaning I don't know. That is
why I don't want to use "text internally" as suggested upthread.

> If that is mixed with other data then you need to know where it 
> starts/stops in order to pull it out to be decoded. The overall data may 
> be a mix, but the bit you're pulling out is encoded text, which you 
> could decode.

I do want to decode it, and possibly compare it for an exact match. And
because there are different possible encodings of the same source data
(consider the trivial case of "=9A" versus "=9a"), I don't want to just
keep it in raw form.
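
A sketch of the comparison I have in mind, leaning on the standard
library rather than my own decoder: reduce each payload to raw bytes
first, then compare the bytes. The function name "canonical" is made up
here.

```python
import base64
import binascii

def canonical(payload: bytes, encoding: str) -> bytes:
    """Decode a Q or B encoded-word payload to raw bytes."""
    if encoding.upper() == "Q":
        # header=True also maps "_" to a space, as MIME words require
        return binascii.a2b_qp(payload, header=True)
    if encoding.upper() == "B":
        return base64.b64decode(payload)
    raise ValueError(f"unknown transfer encoding {encoding!r}")

# Three spellings of the same single byte compare equal once decoded.
assert canonical(b"=9A", "Q") == canonical(b"=9a", "q") == canonical(b"mg==", "B")
```

No character set is involved at any point; the result is just bytes.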

Elijah
--
not to mention QP versus b64

Re: need help understanding: converting text to binary

2019-04-23 Thread Eli the Bearded

In comp.lang.python, Paul Rubin   wrote:
> Eli the Bearded <*@eli.users.panix.com> writes:
>> # decode a single hex digit
>> def hord(c): ...
> 
>def hord(c): return int(c, 16)

That's a good method, thanks.

> > # decode quoted printable, specifically the MIME-encoded words
> > # variant which is slightly different than the body text variant
> > def decodeqp(v): ...
> 
> I think this is supposed to mean:
> 
> from itertools import islice
> 
> def getbytes(v):
>     cs = iter(bytes(v,'ascii'))
>     for c in cs:
>         if c == ord('='):
>             h1,h2 = islice(cs,2)
>             yield int(chr(h1)+chr(h2), 16)
>         else: yield c
> 
> def decodeqp(v):
>     return bytes(getbytes(v))
> 
> print (decodeqp('=21_yes'))  # prints "b'!_yes'"

But that's not the output my sample produced.

  def getbytes(v):
      cs = iter(bytes(v,'ascii'))
      for c in cs:
          if c == ord('='):
              h1,h2 = islice(cs,2)
              yield int(chr(h1)+chr(h2), 16)
          elif c == ord('_'):
              yield ord(' ')
          else: yield c

That's certainly a lot cleaner than what I had.

Elijah
--
and shorter than the one in /usr/lib/python3.5/quopri.py

Re: need help understanding: converting text to binary

2019-04-23 Thread Eli the Bearded

In comp.lang.python, Chris Angelico   wrote:
> Have you checked to see if Python can already do this? You mention

I'm sure there's a library already. I'm trying to mix library usage with
my own code to get practice writing in python. In this case, I want code
to deal with MIME encoding in email headers. I turned to a library for
the base64 part and translated C code I once wrote for the
quoted-printable part. I was struck by how complicated it seems to be to
generate binary blobs in python, which makes me think I'm not getting
something.

>> Is there a more python-esque way to convert what should be plain ascii
> What does "plain ASCII" actually mean, though?

ASCII encoded binary data. ASCII is code points that fit in 7-bits
comprising the characters found on a typical 1970s US oriented
typewriter plus a few control characters.

>> into a binary "bytes" object? In the use case I'm working towards the
>> charset will not be ascii or UTF-8 all of the time, and the charset
>> isn't the responsibility of the python code. Think "decode this if
>> charset matches user-specified value, then output in that same charset;
>> otherwise do nothing."
> I'm not sure what this means,

If the terminal expects this encoding, then decode the ASCII transport
encoding and show the raw stream. If the terminal doesn't expect this
encoding, do not decode. Python should be treating it as a binary
stream, and doesn't need to understand the encoding itself.
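
Roughly, in code (a sketch only, using library decoders instead of my
own, and assuming the =?charset?encoding?payload?= layout of MIME
encoded words; the function name and regex are mine):

```python
import base64
import binascii
import re

# Encoded-word layout: =?charset?encoding?payload?=
ENCODED_WORD = re.compile(rb"=\?([^?]+)\?([QqBb])\?([^?]*)\?=")

def decode_if_charset(header: bytes, wanted: bytes) -> bytes:
    """Decode encoded words whose charset matches wanted; leave the rest."""
    def replace(match):
        charset, encoding, payload = match.groups()
        if charset.lower() != wanted.lower():
            return match.group(0)       # not the terminal's charset: do nothing
        if encoding in (b"Q", b"q"):
            return binascii.a2b_qp(payload, header=True)
        return base64.b64decode(payload)
    return ENCODED_WORD.sub(replace, header)
```

Everything stays bytes from input to output; the only thing compared as
text is the charset label.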

> but I would strongly recommend just
> encoding and decoding regardless. Use text internally and bytes at the
> outside.

That feels entirely wrong. I don't know what b'\x9A' means without
knowing the character set and character encoding. If the encoding is a
multibyte one, b'\x9A' doesn't mean anything on its own. That's why I
want to treat it as binary.

Elijah
--
thinking of an array of "unsigned char" not of characters

need help understanding: converting text to binary

2019-04-22 Thread Eli the Bearded

Here's some code I wrote today:

-- cut here 8< --
HEXCHARS = (b'0', b'1', b'2', b'3', b'4', b'5', b'6', b'7', b'8', b'9',
            b'A', b'B', b'C', b'D', b'E', b'F',
            b'a', b'b', b'c', b'd', b'e', b'f')


# decode a single hex digit
def hord(c):
    c = ord(c)
    if c >= ord(b'a'):
        return c - ord(b'a') + 10
    elif c >= ord(b'A'):
        return c - ord(b'A') + 10   # uppercase digits are offset from 'A'
    else:
        return c - ord(b'0')


# decode quoted printable, specifically the MIME-encoded words
# variant which is slightly different than the body text variant
def decodeqp(v):
    out = b''
    state = '' # used for =XY decoding
    for c in list(bytes(v,'ascii')):
        c = bytes((c,))

        if c == b'=':
            if state == '':
                state = '='
            else:
                raise ValueError
            continue

        if c == b'_':   # underscore is space only for MIME words
            if state == '':
                out += b' '
            else:
                raise ValueError
            continue

        if c in HEXCHARS:
            if state == '':
                out += c
            elif state == '=':
                state = hord(c)
            else:
                state *= 16
                state += hord(c)
                out += bytes((state,))
                state = ''
            continue

        if state == '':
            out += c
        else:
            raise ValueError
        continue

    if state != '':
        raise ValueError

    return out
-- >8 cut here --

It works, in the sense that

 print(decodeqp("=21_yes"))

will output

 b'! yes'

But the bytes() thing is really confusing me. Most of this is translated
from C code I wrote some time ago. I'm new to python and did spend some
time reading:

https://docs.python.org/3/library/stdtypes.html#bytes-objects

Why does "bytes((integertype,))" work? I'll freely admit to stealing
that trick from /usr/lib/python3.5/quopri.py on my system. (Why am I not
using quopri? Well, (a) I want to learn, (b) it decodes to a file
not a variable, (c) I want different error handling.)
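
For reference, a few lines showing what the bytes constructor does with
different argument types, as far as I can tell from the docs linked
above. Nothing here is specific to my decoder.

```python
# An iterable of ints (each 0..255) gives one byte per int.
assert bytes((0x21,)) == b'!'
assert bytes([72, 105]) == b'Hi'

# A bare int is a length, not a value: it gives that many zero bytes.
assert bytes(3) == b'\x00\x00\x00'

# Going the other way, iterating over bytes yields ints.
assert list(b'!') == [0x21]

# Hex text can also be converted directly.
assert bytes.fromhex('21 9a') == b'!\x9a'
```

So the one-element tuple is there to stop the integer being read as a
length.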

Is there a more python-esque way to convert what should be plain ascii
into a binary "bytes" object? In the use case I'm working towards the
charset will not be ascii or UTF-8 all of the time, and the charset
isn't the responsibility of the python code. Think "decode this if
charset matches user-specified value, then output in that same charset;
otherwise do nothing."

Elijah
--
has yet to warm up to this language

Re: question about speed of sequential string replacement vs regex or

2011-09-28 Thread Eli the Bearded

In comp.lang.perl.misc, Willem   wrote:
> In Perl, it would be applicable.  You see, in Perl, you can call a function
> in the replacement of the regex substitution, which can then look up the
> html entity and return the wanted unicode literal.

A function? I'd use a hash.

> I think you can do that in some other languages as well.

Hash / array type substitutions indexed by $1 (or language equivalent)
are probably easy to implement in many languages.
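
Since this is crossposted to the Python group, a sketch of the same idea
there: a dict indexed by the first capture group. The table is the
standard library's html.entities.name2codepoint; the function name is
made up, and real code would probably just call html.unescape().

```python
import re
from html.entities import name2codepoint

ENTITY = re.compile(r"&(\w+);")

def unescape_named(text):
    """Replace &name; with its character, via a dict keyed by group 1."""
    def lookup(match):
        name = match.group(1)
        if name in name2codepoint:
            return chr(name2codepoint[name])
        return match.group(0)       # unknown entity: leave it alone
    return ENTITY.sub(lookup, text)
```

One pass over the string, one dict lookup per match.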

Elijah
--
for really fast, write code to generate a C lexer to do it
