{NOTE, after some diversion, this long message does revert a bit to the topic.}

Ah, Chris, the games we played when we were young and relatively immature!

Has anyone else played with embedding "escape sequences" or other gimmicks in 
unexpected places like filenames so that on the right terminals, such as the 
VT100, it displayed oddly or was hard to open or delete as what showed was not 
quite the underlying representation?

The main reason for the restrictions in olden days was cost. Everything was
costly, including storage/memory and CPU time. Clearly it was a lot easier to
have fixed-length filenames that fit into, say, 16 bytes, or to store multiple
flags about file permissions as single bits, even if it meant lots of
bit-twiddling or using masks to retrieve their values. Today we think nothing
of creating structures that have many others embedded in them as attributes, or
function calls that allow a hundred optional arguments, so that the function
spends much of its time just figuring out what was set before doing whatever
calculation is required to fulfill the request.
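
To make the bit-twiddling concrete, here is a minimal sketch using the masks in Python's standard stat module. The mode value 0o644 is just an assumed example, not anything taken from a real file.

```python
import stat

# Assumed example: a typical "rw-r--r--" permission word, written in octal.
mode = 0o644

# Each permission is a single bit; a mask picks it out of the packed integer.
owner_can_write = bool(mode & stat.S_IWUSR)
others_can_write = bool(mode & stat.S_IWOTH)

# Setting a flag is an OR with its mask.
mode_with_exec = mode | stat.S_IXUSR

print(owner_can_write, others_can_write, oct(mode_with_exec))
```

Nine permission flags fit in nine bits this way, at the price of a mask operation on every read or write of a flag.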

I was reading a novel recently (Jack Reacher Series) where the main character 
is noticing how much technology has changed as they have been ignoring it for a 
decade or so. Everything seems to be coming up faster. My view was that if 
something seems ten times as fast as it was, it also probably is doing a 
hundred or ten thousand times as much to get that result.  The real speed 
changes are often counterbalanced by expecting to do more. A web page may
display just a screen of text, but transmitting it may involve not just lots of
padding in the HTML but all kinds of code, such as Java or JavaScript, or lots
of back and forth with the server to keep something like a displayed graph
refreshed ...

So back to filenames, the concept of having to search for long filenames that 
may not even be stored sequentially in large blocks that can be read (ahead) 
efficiently, may have seemed to be so illogical as not to be considered. So 
given that the shorter ones were not allowed to have embedded spaces, it made 
sense to treat them like tokens that could be broken up at whitespace. As 
mentioned, languages (or other programs) would often parse a command line and
create something like this for the main program in C, with suitable versions in
Python and other languages:

   int main(int argc, char *argv[])

The code variations on the above do suppose that something has already parsed 
the command line that invoked them and partitioned it properly into individual 
strings placed in an array of such strings and also returned how many arguments 
it saw. Users invoking the program needed to be careful, for example by putting
double quotes around anything with embedded spaces, where allowed.
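
As a rough illustration of that splitting step, Python's standard shlex module tokenizes a string using POSIX-shell-like rules. The command line below is a made-up example, and real shells and the Windows command processor differ in their details.

```python
import shlex

# Hypothetical command line, as a user might type it.
command_line = 'prog "my file.txt" other.txt'

# Whitespace separates tokens; double quotes keep embedded spaces
# inside a single token.
argv = shlex.split(command_line)
argc = len(argv)

print(argc, argv)
```

Without the quotes, the name would have arrived as two separate arguments, "my" and "file.txt".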

But like many paradigms, there can be a shift. Consider the fact that languages 
like Python are constantly parsing things like names. Can you create a variable 
name like "me first" with an embedded space or even other symbols normally 
reserved such as parentheses? Most languages do not like such things. It makes 
it hard to parse if not quoted in some unique way. Yet languages like R happily 
allow such constructs if placed in back quotes (grave accents?) as in `me & 
you` as a variable name or the name of a function. Of course, you then never 
use the darn name without the extra quotes.
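
Python has no back-quote escape for names, but a loosely similar effect can be sketched by going around the parser. This is only an illustration using a SimpleNamespace as a stand-in container; the attribute name is the one from the R example above.

```python
from types import SimpleNamespace

name = "me & you"

# The parser would never accept this as a plain name.
is_plain_name = name.isidentifier()

# Yet it can be stored and fetched as an attribute, as long as every
# access goes through the string form rather than dotted syntax.
ns = SimpleNamespace()
setattr(ns, name, 42)
value = getattr(ns, name)

print(is_plain_name, value)
```

As with R's back quotes, the price is that the name can never be used bare.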

Similarly, when you make an object like a DataFrame, can you include spaces and
other things in the names of columns (or sometimes rows)? If so, can they be
accessed in only some ways and not others?

The answer often is not simple. As Chris repeatedly highlights, making a 
language consistent as you play with features can be VERY hard and sometimes 
not quite possible without relaxing some rules or making exceptions. Sometimes 
the answer varies. In base R a data.frame can be given a column name like "me + 
you" which it then stores as "me...you" leading to odd results. But it happily 
returns that result if you ask for mydf$me using auto-completion. Spell it out 
fully and it won't find it! An add-on package provides modified data.frame
objects called tibbles, which do not autocomplete but do store the name as
given and let you access it. So mydf$me fails, and mydf$"me + you" or
mydf$`me + you` works. Oddly, though, an alternative format like
mydf[, "me + you"] works while the similar mydf[, `me + you`] fails!

My point is not about R but a more general one. I can rant about many other 
languages, LOL! Allowing spaces or other characters in what used to be a more 
easily viewable name that can be parsed easier, can lead to having to find 
every place such things are used and seeing if they can be made to work 
consistently. I show an example above where it is not consistent, in my view. 

But when humans view things and their perceptions differ, you are inviting 
disagreements about whatever you implement. You may end up having to make 
people do more than they would prefer such as quoting all variable names even 
if they do not "need" it. Wouldn't it be nice to be able to name a variable 
1time@least if you felt like it? Many people have passwords like that. I think 
the answer is NO, not if it meant quoting every variable because there was no 
longer any reasonable way to parse programs.

The issue that started the discussion was different but in a sense similar. If 
you want to extend the functionality of a "for" loop in one of many possible 
ways, how do you design a way to specify it so it can both be unambiguously 
parsed and implemented while at the same time making sense to humans reading it 
and interpreting it using their human language skills?

I happen to think this is even harder for some who speak in languages other 
than English and have to program in languages loosely based on English. I am 
happy that I seem to think in English now but I was seven when I first 
encountered it after thinking in others. Not all people who program speak
English, and some are more fluent in other languages. They may be used to other
word orders, for example. They may move verbs to the end of a sentence or place
adjectives or other modifiers after versus before a word and forget about all 
the other games played where the same word means something completely 
different. To them ELSE may either mean nothing or the phrase IF ... ELSE may 
be said differently or adding a clause after the construct is not seen as 
natural.

So was this way of doing FOR ... ELSE the only or even best way, is what some 
of this debate is about.
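
For anyone joining late, here is a small reminder of what the construct under debate does in current Python: the else suite runs only when the loop finishes without hitting break. The function name and data are made up for the example.

```python
def find_first_negative(numbers):
    for n in numbers:
        if n < 0:
            result = n
            break
    else:
        # Reached only when the loop ran to the end without a break,
        # which includes the case where numbers is empty.
        result = None
    return result


print(find_first_negative([3, -1, -2]))
print(find_first_negative([1, 2]))
print(find_first_negative([]))
```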

I am thinking of a function in some languages that lets you specify what should 
happen in a later circumstance. In a language like R, you can specify one or 
more instances of on.exit(...) that are not run immediately. Each one can 
replace the commands in the previous one or add on to it. When the function
they are defined in exits for any reason, it first runs any such commands that
have not been cleared. Clearly this works better if a language has a way to
defer evaluation of code so there are no side effects.
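
Python has no on.exit() as such, but a roughly comparable effect can be sketched with contextlib.ExitStack from the standard library. This is an analogy rather than an equivalent: ExitStack runs its callbacks in reverse order of registration, and that ordering is not meant to mirror R's.

```python
from contextlib import ExitStack

log = []


def work():
    with ExitStack() as stack:
        # Registered now, run later, when the with block is left
        # for any reason (return, exception, or falling off the end).
        stack.callback(log.append, "first registered")
        stack.callback(log.append, "second registered")
        log.append("body")
        return "done"


result = work()
print(result, log)
```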

So consider the suggestion of code that should be run if you have a loop and 
you break out of it. Could you design an alternate way to handle that other 
than adding an ELSE clause after the loop?

Clearly you could simply add a function called on.break() that can be used as 
described but only within the body of that loop. It might be something that can 
be set and unset as needed and when the loop is exited, the program implicitly 
checks to see if any code has been dynamically set and executes it. This 
clearly is not necessarily a good way or better way, but is an example of how 
you can implement something without using any key words. No need to think about 
forcing the use of ELSE versus a new keyword that may conflict with existing 
code. Yes, the name on.break may conflict but that is trivially handled in 
Python by invoking it with a full name that includes what module it is in or by 
creating an alias. 
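
Here is one way such an idea might be faked in today's Python with no new keywords. Everything in it is hypothetical: BreakAware and its on_break method are names invented for this sketch, and it cannot tell a break from a return inside the loop.

```python
class BreakAware:
    """Hypothetical helper: wraps an iterable and notices an early exit."""

    def __init__(self, iterable):
        self._iterable = iterable
        self._exhausted = False
        self._handler = None

    def on_break(self, handler):
        # May be set (or replaced) from inside the loop body.
        self._handler = handler

    def __iter__(self):
        for item in self._iterable:
            yield item
        # Only reached when the loop consumed every item.
        self._exhausted = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Run the handler for an early exit, but not for an exception.
        if exc_type is None and not self._exhausted and self._handler:
            self._handler()
        return False


events = []

with BreakAware(range(5)) as loop:
    for i in loop:
        loop.on_break(lambda: events.append("broke out"))
        if i == 2:
            break

with BreakAware(range(3)) as loop:
    for i in loop:
        loop.on_break(lambda: events.append("never runs"))

print(events)
```

The extra with block is the price of not touching the grammar, which rather makes the case for why a keyword was used.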
 
So what about considering an alternate approach that does handle a for loop 
that does nothing? Would it create huge incompatibilities for something like:

for eye in range(0), on.empty=... :
    pass

In some languages, arbitrary additional arguments are allowed, and if not 
understood, are ignored. Python does not allow anything like the above. And in 
this case, the entire body of the for loop is never evaluated so no gimmicks 
inside the body are possible. A gimmick before it might work and I even wonder 
if there is room here for a decorator concept like:

@on.empty(...)
for eye in range(0):
    pass
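
Since neither form above is legal Python, the closest working approximation is probably a plain function that takes the loop body as a callable. The name for_each and its on_empty parameter are invented for this sketch.

```python
def for_each(iterable, body, on_empty=None):
    """Hypothetical helper: call body(item) per item, on_empty() if none."""
    saw_item = False
    for item in iterable:
        saw_item = True
        body(item)
    if not saw_item and on_empty is not None:
        on_empty()


seen = []
for_each(range(0), seen.append, on_empty=lambda: seen.append("empty"))
for_each(range(2), seen.append, on_empty=lambda: seen.append("empty"))
print(seen)
```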

I am ending with a reminder. NOTHING I am writing here is meant to be taken 
seriously but merely as part of a well-intentioned debate to share ideas and 
not to win or lose but learn. Python is more than a language but also has 
aspects of a culture and we sometimes talk about whether something has a 
pythonic flavor or is pythonic versus translating it literally from a language 
like C rather than using the ideas common in Python. The method chosen to
implement the ELSE clause here may well be Pythonic and some of my attempts to 
show other ways may well not be. I am not one of those that find the current 
implementation to be the wrong one and will happily use it when I have code 
that can be done well that way. I am just discussing the issue and wider ones. 
Languages have an amazing variety of designs that fascinate me.



-----Original Message-----
From: Chris Angelico <ros...@gmail.com>
To: python-list@python.org
Sent: Fri, Mar 4, 2022 12:46 pm
Subject: Re: Behavior of the for-else construct


 On Sat, 5 Mar 2022 at 02:02, Tim Chase <python.l...@tim.thechases.com> wrote:
>
> On 2022-03-04 11:55, Chris Angelico wrote:
> > In MS-DOS, it was perfectly possible to have spaces in file names
>
> DOS didn't allow space (0x20) in filenames unless you hacked it by
> hex-editing your filesystem (which I may have done a couple times).
> However it did allow you to use 0xFF in filenames which *appeared* as
> a space in most character-sets.

Hmm, I'm not sure which APIs worked which way, but I do believe that I
messed something up at one point and made a file with an included
space (not FF, an actual 20) in it. Maybe it's something to do with
the (ancient) FCB-based calls. It was tricky to get rid of that file,
though I think it turned out that it could be removed by globbing,
putting a question mark where the space was.

(Of course, internally, MS-DOS considered that the base name was
padded to eight with spaces, and the extension padded to three with
spaces, so "READ.ME" would be "READ\x20\x20\x20\x20ME\x20", but that
doesn't count, since anything that enumerates the contents of a
directory would translate that into the way humans think of it.)
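
The padding described above can be mimicked in a couple of lines. This is only a sketch of the layout: the function name is made up, and it makes no attempt at the upper-casing or truncation a real implementation would need.

```python
def fcb_style_name(filename):
    """Pad the base name to 8 and the extension to 3 characters with spaces."""
    base, _, ext = filename.partition(".")
    return base.ljust(8) + ext.ljust(3)


stored = fcb_style_name("READ.ME")
print(repr(stored))
```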

> I may have caused a mild bit of consternation in school computer labs
> doing this. ;-)

Nice :)

> > Windows forbade a bunch of characters in file names
>
> Both DOS and Windows also had certain reserved filenames
>
> https://www.howtogeek.com/fyi/windows-10-still-wont-let-you-use-these-file-names-reserved-in-1974/
>
> that could cause issues if passed to programs.

Yup. All because, way back in the day, they didn't want to demand the
colon. If you actually *want* to use the printer device, for instance,
you could get a hard copy of a directory listing like this:

DIR >LPT1:

and it's perfectly clear that you don't want to create a file called
"LPT1", you want to send it to the printer. But noooooo it had to be
that you could just write "LPT1" and it would go to the printer.

> To this day, if you poke around on microsoft.com and change random
> bits of URLs to include one of those reserved filenames in the GET
> path, you'll often trigger a 5xx error rather than a 404 that you
> receive with random gibberish in the same place.
>
>   https://microsoft.com/…/asdfjkl → 404
>   https://microsoft.com/…/lpt1 → 5xx
>   https://microsoft.com/…/asdfjkl/some/path → 404
>   https://microsoft.com/…/lpt1/some/path → 5xx
>
> Just in case you aspire to stir up some trouble.
>

In theory, file system based URLs could be parsed such that, if you
ever hit one of those, it returns "Directory not found". In
practice... apparently they didn't do that.

As a side point, I've been increasingly avoiding any sort of system
whereby I take anything from the user and hand it to the file system.
The logic is usually more like:

If path matches "/static/%s":
1) Get a full directory listing of the declared static-files directory
2) Search that for the token given
3) If not found, return 404
4) Return the contents of the file, with cache markers
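
The steps above might be sketched like this in Python. It is an illustration of the idea only, not the actual server code; resolve_static is a made-up name, and the temporary directory stands in for the declared static-files directory.

```python
import os
import tempfile


def resolve_static(static_dir, token):
    """Return a path for token only if it appears in the directory listing."""
    if token in os.listdir(static_dir):
        return os.path.join(static_dir, token)
    return None  # the caller would answer 404


with tempfile.TemporaryDirectory() as static_dir:
    with open(os.path.join(static_dir, "pie.gif"), "wb") as f:
        f.write(b"GIF89a")

    found_name = os.path.basename(resolve_static(static_dir, "pie.gif"))
    missing = resolve_static(static_dir, "lpt1")
    traversal = resolve_static(static_dir, "../etc/passwd")

print(found_name, missing, traversal)
```

Because the token is only ever compared against names the listing returned, nothing the user typed is handed to open() directly.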

Since Windows will never return "lpt1" in that directory listing, I
would simply never find it, never even try to open it. This MIGHT be
an issue with something that accepts file *uploads*, but I've been
getting paranoid about those too, so, uhh... my file upload system now
creates URLs that look like this:

https://sikorsky.rosuav.com/static/upload-49497888-6bede802d13c8d2f7b92ca9fac7c

That was uploaded as "pie.gif" but stored on the file system as
~/stillebot/httpstatic/uploads/49497888-6bede802d13c8d2f7b92ca9fac7c
with some metadata stored elsewhere about the user-specified file
name. So hey, if you were to try to upload a file that had an NTFS
invalid character in it, I wouldn't even notice.

Maybe I'm *too* paranoid, but at least I don't have to worry about
file system attacks.


ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list

