On Wed, Apr 22, 2020 at 6:30 AM Barry Scott <ba...@barrys-emacs.org> wrote:
>
> > On 21 Apr 2020, at 20:47, dcwhat...@gmail.com wrote:
> >
> > On Tuesday, April 21, 2020 at 3:16:51 PM UTC-4, Barry Scott wrote:
> >>> On 21 Apr 2020, at 18:11, dc wrote:
> >>>
> >>> On Tuesday, April 21, 2020 at 12:40:25 PM UTC-4, Dieter Maurer wrote:
> >>>> dc wrote at 2020-4-20 14:48 -0700:
> >>>>> ...
> >>>>> I tried telnetting the landing page, i.e. without the specific node that
> >>>>> requires the login. So e.g.
> >>>>>
> >>>>> Telnet thissite.oh.gov 80
> >>>>>
> >>>>> , but it returns a 400 Bad Request. Before that, the Telnet screen is
> >>>>> completely blank; I have to press a key before it returns the Bad
> >>>>> Request.
> >>>>>
> >>>>> Roger on knowing what the site is asking for. But I don't know how to
> >>>>> determine that.
> >>>>
> >>>> I use `wget -S` to learn about server responses.
> >>>> It has the advantage (over `telnet`) of knowing the HTTP protocol.
> >>>
> >>> Sure enough, wget DOES return a lot of information. In fact, although an
> >>> initial response of 401 is returned, it waits for the response and
> >>> finally returns a 200.
> >>>
> >>> So, I guess the question finally comes down to: How do we make
> >>> requests.get() wait for a response? The timeout value isn't the same
> >>> thing that I thought it was. So how do we tell .get() to wait 20 or 30
> >>> seconds for an OK response?
> >>
> >> The way the HTTP protocol works is that you send a request and get a
> >> response. 1 in, 1 out.
> >> The response can tell you that you need to do more work, like add
> >> authentication data.
> >>
> >> The only use of the timeout is to allow you to give up if a response does
> >> not come back before you get bored waiting.
> >>
> >> In the case of the 401 you can read what it means here:
> >> https://httpstatuses.com/401
> >>
> >> It is then up to your code to issue a new request with the required
> >> authentication headers.
> >> The headers you got back in the first response will tell you what type of
> >> authentication is required: basic, digest etc.
> >>
> >> The library you are using should be able to handle this if you provide
> >> what the library requires from you to authenticate.
> >>
> >> Personally I debug stuff using the curl command. curl -v <url> shows you
> >> the request and the response.
> >> You can then add curl options to provide authentication data
> >> (username/password) and how to use it, --basic and --digest for example.
> >>
> >> Oh, and the other status that needs handling is a 302 redirect. This allows
> >> a web site to move a page and tell you the new location. Again you have to
> >> allow your library to do this for you.
> >>
> >> Barry
> >>
> >>>
> >>> --
> >>> https://mail.python.org/mailman/listinfo/python-list
> >>>
> >
> > Barry, Thanks. I'm starting to get a bigger picture, now.
> >
> > So I really do need to raise the status in order to get the headers. I had
> > put this in originally, but then thought it wasn't necessary.
>
> In a response you always get a status line, headers and a body. In the case
> of a response that is not a 200 there is often important information in the
> headers. The body is usually for showing to humans when the program does not
> know how to handle the status code.
>
> > So in the case of this particular site, if I understand correctly, I would
> > be using the NTLM to decide which type of Authentication to follow up with
> > (I think).
> >
> > Content-Length: 1293
> > Content-Type: text/html
> > WWW-Authenticate: Negotiate, NTLM
>
> Yep, that is right. The site wants you to use NTLM to authenticate with it.
> NTLM is not always supported; you will need to check your library docs to see
> if it supports NTLM.
>
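
As a rough illustration of reading that header, here is a minimal Python sketch that lists the scheme names a server offers in its WWW-Authenticate header(s). The function name `offered_auth_schemes` is made up for this example, and the parsing is deliberately naive: it assumes there are no commas inside quoted parameter values and no spaces around `=`.

```python
def offered_auth_schemes(www_authenticate_values):
    """Return the auth scheme names found in WWW-Authenticate header values.

    Hypothetical helper for illustration only. `www_authenticate_values` is a
    list of raw header values, e.g. ["Negotiate, NTLM"].
    """
    schemes = []
    for value in www_authenticate_values:
        for part in value.split(","):
            part = part.strip()
            if not part:
                continue
            first_token = part.split(None, 1)[0]
            # A piece such as nonce="abc" is a parameter belonging to the
            # previous scheme, not a new scheme name, so skip it.
            if "=" in first_token:
                continue
            if first_token not in schemes:
                schemes.append(first_token)
    return schemes
```

For the header quoted above, `offered_auth_schemes(["Negotiate, NTLM"])` returns `['Negotiate', 'NTLM']`, so the follow-up request would need to use one of those two schemes.
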
I believe the 'requests' library supports NTLM, although I haven't
personally used it so I can't check.

ChrisA
--
https://mail.python.org/mailman/listinfo/python-list
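
To make the "one request in, one response out" point from the quoted text concrete, here is a small standard-library sketch. It does not talk to the site from the thread, and it uses Basic rather than NTLM authentication purely to stay self-contained: the local demo server, the `demo`/`secret` credentials, and the `fetch` and `run_demo` names are all made up for illustration.

```python
import base64
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical credentials, valid only for the demo server below.
USER, PASSWORD = "demo", "secret"
EXPECTED = "Basic " + base64.b64encode(f"{USER}:{PASSWORD}".encode()).decode()


class DemoHandler(BaseHTTPRequestHandler):
    """Answers 401 until the request carries the expected Authorization header."""

    def do_GET(self):
        if self.headers.get("Authorization") == EXPECTED:
            body = b"welcome"
            self.send_response(200)
        else:
            body = b"login required"
            self.send_response(401)
            self.send_header("WWW-Authenticate", 'Basic realm="demo"')
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch(url, authorization=None, timeout=5):
    """Send one request and return (status, headers, body) for its one response.

    The timeout only limits how long we wait for that single response; it does
    not make the client wait for a later, "better" status.
    """
    # Ignore any proxy settings so the demo talks straight to the local server.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    request = urllib.request.Request(url)
    if authorization:
        request.add_header("Authorization", authorization)
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status, response.headers, response.read()
    except urllib.error.HTTPError as error:
        # Non-2xx responses still carry a status, headers and a body.
        return error.code, error.headers, error.read()


def run_demo():
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_port}/"
    try:
        first = fetch(url)                           # no credentials yet
        second = fetch(url, authorization=EXPECTED)  # new request, with credentials
        return first[0], first[1]["WWW-Authenticate"], second[0], second[2]
    finally:
        server.shutdown()
        server.server_close()
```

Calling `run_demo()` issues two separate requests: the first comes back as a 401 whose headers name the scheme, and only the second, which carries the credentials, gets the 200. NTLM involves more round trips than this, but each one is still a request answered by exactly one response.
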