[issue24954] No way to generate or parse timezone as produced by datetime.isoformat()

2017-07-27 Thread John Nagle

John Nagle added the comment:

As the original author of the predecessor bug report (issue 15873) in 2012, I 
would suggest that there's too much bikeshedding here. I filed this bug because 
there was no usable ISO 8601 date parser available.  PyPI contained four 
slightly different buggy ones, and three more versions were found later.  

I suggested following RFC3339, "Date and Time on the Internet: Timestamps", 
section 5.6, which specifies a clear subset of ISO8601.  Five years later, I 
suggest just going with that. Fancier variations belong in non-standard 
libraries.

Date parsing should not be platform-dependent.  Using an available C library 
was convenient, but not portable. 

Let's get this done.
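[Editor's note: not part of the original message. A minimal sketch of what "just going with RFC 3339" looks like in practice, using only `datetime.strptime`; the `%z` handling of offsets with a colon requires Python 3.7+, and the function name is illustrative.]

```python
from datetime import datetime, timedelta

def parse_rfc3339(stamp: str) -> datetime:
    """Parse an RFC 3339 timestamp (the ISO 8601 subset of section 5.6)."""
    if stamp.endswith(('Z', 'z')):          # 'Z' means UTC; map to a numeric offset
        stamp = stamp[:-1] + '+00:00'
    # Accept the timestamp with or without fractional seconds.
    for fmt in ('%Y-%m-%dT%H:%M:%S.%f%z', '%Y-%m-%dT%H:%M:%S%z'):
        try:
            return datetime.strptime(stamp, fmt)
        except ValueError:
            pass
    raise ValueError('not an RFC 3339 timestamp: %r' % stamp)
```

(Python 3.7 later added `datetime.fromisoformat`, and 3.11 extended it to accept the trailing `Z`, which covers most of this.)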

--

___
Python tracker 
<http://bugs.python.org/issue24954>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28756] robotfileparser always uses default Python user-agent

2016-11-20 Thread John Nagle

John Nagle added the comment:

(That's from a subclass I wrote.  As a change to RobotFileParser, __init__ 
should start like this.)

def __init__(self, url='', user_agent=None):
    self.user_agent = user_agent        # save user agent
    ...

--

___
Python tracker 
<http://bugs.python.org/issue28756>
___



[issue28756] robotfileparser always uses default Python user-agent

2016-11-20 Thread John Nagle

John Nagle added the comment:

Suggest adding a user_agent optional parameter, as shown here:

def __init__(self, url='', user_agent=None):
    urllib.robotparser.RobotFileParser.__init__(self, url)  # init parent
    self.user_agent = user_agent                            # save user agent

def read(self):
    """
    Reads the robots.txt URL and feeds it to the parser.
    Overrides parent read function.
    """
    try:
        req = urllib.request.Request(self.url, data=None)   # request with user agent specified
        if self.user_agent is not None:                     # if overriding user agent
            req.add_header("User-Agent", self.user_agent)
        f = urllib.request.urlopen(req)                     # open connection
    except urllib.error.HTTPError as err:
        if err.code in (401, 403):
            self.disallow_all = True
        elif err.code >= 400 and err.code < 500:
            self.allow_all = True
    else:
        raw = f.read()
        self.parse(raw.decode("utf-8").splitlines())

--

___
Python tracker 
<http://bugs.python.org/issue28756>
___



[issue28756] robotfileparser always uses default Python user-agent

2016-11-20 Thread John Nagle

New submission from John Nagle:

urllib.robotparser.RobotFileParser always uses the default Python user agent. 
This agent is now blacklisted by many sites, and it's not possible to read the 
robots.txt file at all.

--
components: Library (Lib)
messages: 281314
nosy: nagle
priority: normal
severity: normal
status: open
title: robotfileparser always uses default Python user-agent
type: enhancement
versions: Python 2.7, Python 3.3, Python 3.4, Python 3.5, Python 3.6, Python 3.7

___
Python tracker 
<http://bugs.python.org/issue28756>
___



[issue27065] robotparser user agent considered hostile by mod_security rules.

2016-05-19 Thread John Nagle

New submission from John Nagle:

"robotparser" uses the default Python user agent when reading the "robots.txt" 
file, and there's no parameter for changing that.

Unfortunately, the "mod_security" add-on for the Apache web server, when used 
with the standard OWASP rule set, blacklists the default Python USER-AGENT in 
Rule 990002, User Agent Identification. It doesn't like certain HTTP USER-AGENT 
values. One of them is "python-httplib2". So any program in Python which 
accesses the web site will trigger this rule and be blocked from access.  

For regular HTTP accesses, it's possible to put a user agent string in the 
Request object and work around this. But "robotparser" has no such option. 

Worse, if "robotparser" has its read of "robots.txt" rejected, it interprets 
that as a "deny all" robots.txt file, and returns False for all "can_fetch()" 
requests.
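[Editor's note: not part of the original message. Until the stdlib gains a parameter, one workaround is to fetch robots.txt yourself with a chosen User-Agent and hand the text to `RobotFileParser.parse()`, bypassing `read()`. A sketch; the function name and default agent string are made up, and the error handling mirrors what `read()` does.]

```python
import urllib.error
import urllib.request
import urllib.robotparser

def robots_for(url, user_agent="MyCrawler/1.0"):
    """Fetch robots.txt with an explicit User-Agent and return a parser."""
    rp = urllib.robotparser.RobotFileParser(url)
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req) as f:
            rp.parse(f.read().decode("utf-8").splitlines())
    except urllib.error.HTTPError as err:
        if err.code in (401, 403):
            rp.disallow_all = True      # mirror read()'s behavior
        elif 400 <= err.code < 500:
            rp.allow_all = True
    return rp
```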

--
components: Library (Lib)
messages: 265900
nosy: nagle
priority: normal
severity: normal
status: open
title: robotparser user agent considered hostile by mod_security rules.
type: behavior
versions: Python 2.7, Python 3.2, Python 3.3, Python 3.4, Python 3.5, Python 3.6

___
Python tracker 
<http://bugs.python.org/issue27065>
___



[issue24985] Python install test fails - OpenSSL - "dh key too small"

2015-09-02 Thread John Nagle

New submission from John Nagle:

Installing Python 3.4.3 on a new CentOS Linux release 7.1.1503 server.
Started with source tarball, did usual ./configure; make; make test
SSL test fails with "dh key too small".  See below.

OpenSSL has recently been modified to reject short keys, due to a security 
vulnerability. See
  http://www.ubuntu.com/usn/usn-2639-1/
and see here for an analysis of the issue on a Python install:
  http://www.alexrhino.net/jekyll/update/2015/07/14/dh-params-test-fail.html

Apparently the "dh512.pem" file in the test suite is now obsolete, because the 
minimum length dh key is now 768.

The question is, does this break anything else?  Google for "dh key too small" 
and various other projects report problems. 


==
ERROR: test_dh_params (test.test_ssl.ThreadedTests)
--
Traceback (most recent call last):
  File "/home/sitetruth/private/downloads/python/Python-3.4.3/Lib/test/test_ssl.py", line 2728, in test_dh_params
    chatty=True, connectionchatty=True)
  File "/home/sitetruth/private/downloads/python/Python-3.4.3/Lib/test/test_ssl.py", line 1866, in server_params_test
    s.connect((HOST, server.port))
  File "/home/sitetruth/private/downloads/python/Python-3.4.3/Lib/ssl.py", line 846, in connect
    self._real_connect(addr, False)
  File "/home/sitetruth/private/downloads/python/Python-3.4.3/Lib/ssl.py", line 837, in _real_connect
    self.do_handshake()
  File "/home/sitetruth/private/downloads/python/Python-3.4.3/Lib/ssl.py", line 810, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLError: [SSL: SSL_NEGATIVE_LENGTH] dh key too small (_ssl.c:600)

--
Ran 99 tests in 12.012s

FAILED (errors=1, skipped=4)
test test_ssl failed
make: *** [test] Error 1

==

--
components: Installation
messages: 249566
nosy: nagle
priority: normal
severity: normal
status: open
title: Python install test fails - OpenSSL - "dh key too small"
versions: Python 3.4

___
Python tracker 
<http://bugs.python.org/issue24985>
___



[issue23843] ssl.wrap_socket doesn't handle virtual TLS hosts

2015-04-02 Thread John Nagle

John Nagle added the comment:

I'm using wrap_socket because I want to read the details of a server's SSL 
certificate.  

"Starting from Python 3.2, it can be more flexible to use 
SSLContext.wrap_socket() instead" does not convey that ssl.wrap_socket() will 
fail to connect to some servers because it will silently check the wrong 
certificate.

--

___
Python tracker 
<http://bugs.python.org/issue23843>
___



[issue23588] Errno conflicts in ssl.SSLError

2015-04-01 Thread John Nagle

John Nagle added the comment:

If SSL error reporting is getting some attention, something should be done to 
provide better text messages for the SSL errors.  All certificate verify 
exceptions return the string "certificate verify failed (_ssl.c:581)". The line 
number in _ssl.c is not particularly useful to end users. OpenSSL has more 
specific messages, but they're not making it to the Python level.

'python ssl "certificate verify failed"' has 17,000 hits in Google, which 
indicates users need more help dealing with this class of error.

--
nosy: +nagle

___
Python tracker 
<http://bugs.python.org/issue23588>
___



[issue23843] ssl.wrap_socket doesn't handle virtual TLS hosts

2015-04-01 Thread John Nagle

New submission from John Nagle:

ssl.wrap_socket() always uses the SSL certificate associated with the raw IP 
address, rather than using the server_host feature of TLS. Even when 
wrap_socket is used before calling "connect(port, host)", the "host" parameter 
isn't used by TLS.

To get proper TLS behavior (which only works in recent Python versions), it's 
necessary to create an SSLContext, then use

context.wrap_socket(sock, server_hostname="example.com")

This behavior is backwards-compatible (the SSL module didn't talk TLS until 
very recently) but confusing.  The documentation does not reflect this 
difference.  There's a lot of old code and online advice which suggests using 
ssl.wrap_socket().  It works until you hit a virtual host with TLS support. 
Then you get the wrong server cert and an unexpected "wrong host" SSL error.

Possible fixes:

1. Deprecate ssl.wrap_socket(), and modify the documentation to tell users to 
always use context.wrap_socket().

2. Add a "server_hostname" parameter to ssl.wrap_socket().  It doesn't accept 
that parameter; only context.wrap_socket() does.  Modify documentation 
accordingly.

--
assignee: docs@python
components: Documentation, Library (Lib)
messages: 239834
nosy: docs@python, nagle
priority: normal
severity: normal
status: open
title: ssl.wrap_socket doesn't handle virtual TLS hosts
versions: Python 3.4

___
Python tracker 
<http://bugs.python.org/issue23843>
___



[issue23736] "make test" on clean py3 install on CentOS 6.2 - 2 tests fail

2015-03-21 Thread John Nagle

New submission from John Nagle:

Installing Python 3.4.2 on CentOS 6.  Clean install.  Using procedure in README 
file:

./configure
make
make test

2 tests fail in "make test" The first one is because the FTP client
test is trying to test against a site that is long gone, the Digital Equipment 
Corporation Systems Research Center in Palo Alto:

ERROR: test_ftp (test.test_urllib2net.OtherNetworkTests) (url='ftp://gatekeeper.research.compaq.com/pub/DEC/SRC/research-reports/00README-Legal-Rules-Regs')
--
Traceback (most recent call last):
  File "/home/staging/local/python/Python-3.4.3/Lib/urllib/request.py", line 1399, in ftp_open
    fw = self.connect_ftp(user, passwd, host, port, dirs, req.timeout)
  File "/home/staging/local/python/Python-3.4.3/Lib/urllib/request.py", line 1445, in connect_ftp
    dirs, timeout)
  File "/home/staging/local/python/Python-3.4.3/Lib/urllib/request.py", line 2243, in __init__
    self.init()
  File "/home/staging/local/python/Python-3.4.3/Lib/urllib/request.py", line 2249, in init
    self.ftp.connect(self.host, self.port, self.timeout)
  File "/home/staging/local/python/Python-3.4.3/Lib/ftplib.py", line 153, in connect
    source_address=self.source_address)
  File "/home/staging/local/python/Python-3.4.3/Lib/socket.py", line 512, in create_connection
    raise err
  File "/home/staging/local/python/Python-3.4.3/Lib/socket.py", line 503, in create_connection
    sock.connect(sa)
TimeoutError: [Errno 110] Connection timed out


The second one is failing because "readline" (probably GNU readline) didn't 
behave as expected. The installed GCC is
"gcc (GCC) 4.4.6 20110731 (Red Hat 4.4.6-3)", which came with
"CentOS release 6.2 (Final)".  This is a long-running production server.
Is that too old?


FAIL: test_init (test.test_readline.TestReadline)
--
Traceback (most recent call last):
  File "/home/staging/local/python/Python-3.4.3/Lib/test/test_readline.py", line 57, in test_init
    self.assertEqual(stdout, b'')
AssertionError: b'\x1b[?1034h' != b''

--
components: Installation
messages: 238869
nosy: nagle
priority: normal
severity: normal
status: open
title: "make test" on clean py3 install on CentOS 6.2 - 2 tests fail
versions: Python 3.4

___
Python tracker 
<http://bugs.python.org/issue23736>
___



[issue23655] Memory corruption using pickle over pipe to subprocess

2015-03-15 Thread John Nagle

John Nagle added the comment:

More info: the problem is on the "unpickle" side.  If I use _Unpickle and 
Pickle, so the unpickle side is in Python, but the pickle side is in C, no 
problem. If I use Unpickle and _Pickle, so the unpickle side is C, crashes.

--

___
Python tracker 
<http://bugs.python.org/issue23655>
___



[issue23655] Memory corruption using pickle over pipe to subprocess

2015-03-13 Thread John Nagle

John Nagle added the comment:

"minimize your data" - that's a big job here. Where are the tests for "pickle"?  
Is there one that talks to a subprocess over a pipe? Maybe I can adapt that.

--

___
Python tracker 
<http://bugs.python.org/issue23655>
___



[issue23655] Memory corruption using pickle over pipe to subprocess

2015-03-13 Thread John Nagle

John Nagle added the comment:

> Or just use pickle._Pickler instead of pickle.Pickler and like 
> (implementation detail!).

Tried that.  Changed my own code as follows:

25a26
> 
71,72c72,73
< self.reader = pickle.Unpickler(self.proc.stdout)# set up reader
< self.writer = pickle.Pickler(self.proc.stdin,kpickleprotocolversion)
---
> self.reader = pickle._Unpickler(self.proc.stdout)# set up reader
> self.writer = pickle._Pickler(self.proc.stdin,kpickleprotocolversion)
125,126c126,127
< self.reader = pickle.Unpickler(self.datain) # set up reader
< self.writer = pickle.Pickler(self.dataout,kpickleprotocolversion)   
---
> self.reader = pickle._Unpickler(self.datain) # set up reader
> self.writer = pickle._Pickler(self.dataout,kpickleprotocolversion)  

Program runs after those changes.

So it looks like CPickle has a serious memory corruption problem.

--

___
Python tracker 
<http://bugs.python.org/issue23655>
___



[issue23655] Memory corruption using pickle over pipe to subprocess

2015-03-12 Thread John Nagle

New submission from John Nagle:

I'm porting a large, working system from Python 2 to Python 3, using "six", so 
the same code works with both. One part of the system works a lot like the 
multiprocessing module, but predates it. It launches child processes with 
"Popen" and talks to them using "pickle" over stdin/stdout as pipes.  Works 
fine under Python 2, and has been working in production for years.

Under Python 3, I'm getting errors that indicate memory corruption:

Fatal Python error: GC object already tracked

Current thread 0x1a14 (most recent call first):
  File "C:\python34\lib\site-packages\pymysql\connections.py", line 411 in description
  File "C:\python34\lib\site-packages\pymysql\connections.py", line 1248 in _get_descriptions
  File "C:\python34\lib\site-packages\pymysql\connections.py", line 1182 in _read_result_packet
  File "C:\python34\lib\site-packages\pymysql\connections.py", line 1132 in read
  File "C:\python34\lib\site-packages\pymysql\connections.py", line 929 in _read_query_result
  File "C:\python34\lib\site-packages\pymysql\connections.py", line 768 in query
  File "C:\python34\lib\site-packages\pymysql\cursors.py", line 282 in _query
  File "C:\python34\lib\site-packages\pymysql\cursors.py", line 134 in execute
  File "C:\projects\sitetruth\domaincacheitem.py", line 128 in select
  File "C:\projects\sitetruth\domaincache.py", line 30 in search
  File "C:\projects\sitetruth\ratesite.py", line 31 in ratedomain
  File "C:\projects\sitetruth\RatingProcess.py", line 68 in call
  File "C:\projects\sitetruth\subprocesscall.py", line 140 in docall
  File "C:\projects\sitetruth\subprocesscall.py", line 158 in run
  File "C:\projects\sitetruth\RatingProcess.py", line 89 in main
  File "C:\projects\sitetruth\RatingProcess.py", line 95 in 

That's clear memory corruption.

Also,

  File "C:\projects\sitetruth\InfoSiteRating.py", line 200, in scansite
if len(self.badbusinessinfo) > 0 :  # if bad stuff
NameError: name 'len' is not defined

There are others, but those two should be impossible to cause from Python 
source. 

I've done the obvious stuff - deleted all .pyc files and Python cache 
directories.  All my code is in Python. Every library module came in via "pip", 
into a clean Python 3.4.3 (32 bit) installation on Win7/x86-64.

Currently installed packages (via "pip list")

beautifulsoup4 (4.3.2)
dnspython3 (1.12.0)
html5lib (0.999)
pip (6.0.8)
PyMySQL (0.6.6)
pyparsing (2.0.3)
setuptools (12.0.5)
six (1.9.0)

Nothing exotic there.  The project has zero local C code; any C code came 
from the Python installation or the above packages, most of which are pure 
Python.

It all works fine with Python 2.7.9.  Everything else in the program seems
to be working fine under both 2.7.9 and 3.4.3, until subprocesses are involved.

What's being pickled is very simple; no custom objects, although Exception 
types are sometimes pickled if the subprocess raises an exception.  

Pickler and Unpickler instances are being reused here.  A message is pickled, 
piped to the subprocess, unpickled, work is done, and a response comes back 
later via the return pipe.  A send looks like:

self.writer.dump(args)  # send data
self.dataout.flush()# finish output
self.writer.clear_memo()# no memory from cycle to cycle

and a receive looks like:

result = self.reader.load() # read and return from child
self.reader.memo = {}   # no memory from cycle to cycle

Those were the recommended way to reset "pickle" for new traffic years ago.
(You have to clear the receive side as well as the send side, or the dictionary
of saved objects grows forever.) My guess is that there's something about 
reusing "pickle" instances that botches memory uses in CPython 3's C code 
for "cpickle".  That should work, though; the "multiprocessing" module works
by sending pickled data over pipes.
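[Editor's note: not part of the original message. The reuse pattern described above can be exercised standalone; a sketch with made-up messages and an arbitrary protocol number, using a BytesIO buffer in place of the pipe.]

```python
import io
import pickle

buf = io.BytesIO()                       # stands in for the stdin/stdout pipe
writer = pickle.Pickler(buf, protocol=2)
messages = [{'cmd': 'rate', 'seq': 1}, {'cmd': 'rate', 'seq': 2}]
for msg in messages:
    writer.dump(msg)                     # send data
    writer.clear_memo()                  # no memory from cycle to cycle
buf.seek(0)
reader = pickle.Unpickler(buf)
received = [reader.load() for _ in messages]
```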

The only code difference between Python 2 and 3 is that under Python 3 I have 
to use "sys.stdin.buffer" and "sys.stdout.buffer" as arguments to Pickler and 
Unpickler. Otherwise they complain that they're getting type "str".

Unfortunately, I don't have an easy way to reproduce this bug yet. 

Is there some way to force the use of the pure Python pickle module under 
Python 3? That would help isolate the problem.

John Nagle
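[Editor's note: not part of the original message. To answer the final question: CPython's own test suite forces the pure-Python pickle by reimporting the module with the `_pickle` C accelerator blocked, via the `import_fresh_module` test helper (the helper moved to `test.support.import_helper` in Python 3.10; both spellings shown).]

```python
import io

try:                                     # Python 3.10 and later
    from test.support.import_helper import import_fresh_module
except ImportError:                      # older releases
    from test.support import import_fresh_module

# Reimport pickle with the _pickle C extension blocked, so Pickler and
# Unpickler fall back to the pure-Python implementations.
py_pickle = import_fresh_module('pickle', blocked=['_pickle'])

buf = io.BytesIO()
py_pickle.Pickler(buf).dump({'answer': 42})
buf.seek(0)
result = py_pickle.Unpickler(buf).load()
```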

--
components: Library (Lib)
messages: 238009
nosy: nagle
priority: normal
severity: normal
status: open
title: Memory corruption using pickle over pipe to subprocess
versions: Python 3.4

___
Python tracker 
<http://bugs.python.org/issue23655>
___



[issue9679] unicode DNS names in urllib, urlopen

2015-03-06 Thread John Nagle

John Nagle added the comment:

Three years later, I'm converting to Python 3. Did this get fixed in Python 3?

--

___
Python tracker 
<http://bugs.python.org/issue9679>
___



[issue23476] SSL cert verify fail for "www.verisign.com"

2015-03-05 Thread John Nagle

John Nagle added the comment:

Will this be applied to the Python 2.7.9 library as well?

--

___
Python tracker 
<http://bugs.python.org/issue23476>
___



[issue23476] SSL cert verify fail for "www.verisign.com"

2015-02-20 Thread John Nagle

John Nagle added the comment:

The "fix" in Ubuntu was to the Ubuntu certificate store, which is a directory 
tree with one cert per file, with lots of symbolic links with names based on 
hashes to express dependencies. Python's SSL isn't using that.  Python is 
taking in one big text file of SSL certs, with no link structure, and feeding 
it to OpenSSL.  

This is an option at

 SSLContext.load_verify_locations(cafile=None, capath=None, cadata=None)

I've been testing with "cafile".  "capath" is a path to a set of preprocessed 
certs laid out like the Ubuntu certificate store.  It may be that the directory 
parameter works but the single-file parameter does not.  It's possible to 
create such a directory from a single .pem file by splitting the big file into 
smaller files (the suggested tool is an "awk" script) and then running 
"c_rehash", which comes with OpenSSL.  See 
"https://www.openssl.org/docs/apps/c_rehash.html".  

So I tried a workaround, using Python 3.4.0 and Ubuntu 14.04 LTS.  I broke up 
"cacert.pem" into one file per cert with the suggested "awk" script, and used 
"c_rehash" to build all the links, creating a directory suitable for "capath". 
It didn't help.  Fails for "verisign.com", works for "python.org" and 
"google.com", just like the original single-file test. The "capath" version did 
exactly the same thing as the "cafile" version.

Python is definitely reading the cert file or directories; if I try an empty 
cert file or dir, everything fails, like it should.

Tried the same thing on Win7 x64. Same result. Tried the command line openssl 
tool using the cert directory. Same results as with the single file on both 
platforms.

So that's not it. 

A fix to OpenSSL was proposed in 2012, but no action was taken:

http://rt.openssl.org/Ticket/Display.html?id=2732 at
"Wed Jun 13 17:15:04 2012 Arne Becker - Correspondence added".

Any ideas?

--

___
Python tracker 
<http://bugs.python.org/issue23476>
___



[issue23476] SSL cert verify fail for "www.verisign.com"

2015-02-17 Thread John Nagle

John Nagle added the comment:

To try this with the OpenSSL command line client, use this shell command:

openssl s_client -connect www.verisign.com:443 -CAfile cacert.pem

This provides more detailed error messages than Python provides.

"verify error:num=20:unable to get local issuer certificate" is the OpenSSL 
error for "www.verisign.com".  The corresponding Python error is "[SSL: 
CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:581)."

--

___
Python tracker 
<http://bugs.python.org/issue23476>
___



[issue23476] SSL cert verify fail for "www.verisign.com"

2015-02-17 Thread John Nagle

John Nagle added the comment:

Add cert file for testing.  Source of this file is

http://curl.haxx.se/ca/cacert.pem

--
Added file: http://bugs.python.org/file38166/cacert.pem

___
Python tracker 
<http://bugs.python.org/issue23476>
___



[issue23476] SSL cert verify fail for "www.verisign.com"

2015-02-17 Thread John Nagle

New submission from John Nagle:

SSL certificate verification fails for "www.verisign.com" when using the cert 
list from Firefox. Other sites ("google.com", "python.org") verify fine. 

This may be related to a known, and fixed, OpenSSL bug. See:

http://rt.openssl.org/Ticket/Display.html?id=2732&user=guest&pass=guest
https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1014640 

Some versions of OpenSSL are known to be broken for cases where there are 
multiple valid certificate trees.  This happens when one root cert is being 
phased out in favor of another, and cross-signing is involved.

Python ships with its own copy of OpenSSL on Windows.  Tests for "www.verisign.com":

Win7, x64:

   Python 2.7.9 with OpenSSL 1.0.1j 15 Oct 2014. FAIL
   Python 3.4.2 with OpenSSL 1.0.1i 6 Aug 2014.  FAIL
   openssl s_client -OpenSSL 1.0.1h 5 Jun 2014   FAIL

Ubuntu 14.04 LTS, x64, using distro's versions of Python:

   Python 2.7.6 - test won't run, needs create_default_context
   Python 3.4.0 with OpenSSL 1.0.1f 6 Jan 2014.  FAIL
   openssl s_client  OpenSSL 1.0.1f 6 Jan 2014   PASS

That's with the same cert file in all cases. The OpenSSL version for Python 
programs comes from ssl.OPENSSL_VERSION. 

The Linux situation has me puzzled.  On Linux, Python is supposedly using the 
system version of OpenSSL. The versions match.  Why do Python and the OpenSSL 
command line client disagree?  Different options passed to OpenSSL by Python?

A simple test program and cert file are attached.  Please try this in your 
environment.

--
components: Library (Lib)
files: ssltest.py
messages: 236158
nosy: nagle
priority: normal
severity: normal
status: open
title: SSL cert verify fail for "www.verisign.com"
versions: Python 2.7, Python 3.4
Added file: http://bugs.python.org/file38165/ssltest.py

___
Python tracker 
<http://bugs.python.org/issue23476>
___



[issue20916] ssl.enum_certificates() will not return all certificates trusted by Windows

2015-02-11 Thread John Nagle

John Nagle added the comment:

Amusingly, I'm getting this failure on "verisign.com" on Windows 7 with Python 
2.7.9:

"HTTP error - [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed 
(_ssl.c:581)..)"  The current Verisign root cert (Class 3 public) is, indeed, 
not in the Windows 7 cert store. Verisign has a newer root cert.

That error message ought to be improved. Tell the user which cert was rejected.

"python.org", with a DigiCert certificate, works fine.

I'm going to use the Mozilla certificate store explicitly.

--
nosy: +nagle

___
Python tracker 
<http://bugs.python.org/issue20916>
___



[issue22873] Re: SSLsocket.getpeercert - return ALL the fields of the certificate.

2014-11-14 Thread John Nagle

John Nagle added the comment:

May be a duplicate of issue 20469: "ssl.getpeercert() should include 
extensions"

http://bugs.python.org/issue20469

--

___
Python tracker 
<http://bugs.python.org/issue22873>
___



[issue22873] Re: SSLsocket.getpeercert - return ALL the fields of the certificate.

2014-11-14 Thread John Nagle

New submission from John Nagle:

In each revision of "getpeercert", a few more fields are returned. Python 3.2 
added "issuer" and "notBefore".  Python 3.4 added "crlDistributionPoints", 
"caIssuers", and OCSP URLS. But some fields
still aren't returned.  I happen to need CertificatePolicies, which is how you 
distinguish DV, OV, and EV certs.

   Here's what you get now from "getpeercert()" for "bankofamerica.com":

{'OCSP': ('http://EVSecure-ocsp.verisign.com',),
 'caIssuers': ('http://EVSecure-aia.verisign.com/EVSecure2006.cer',),
 'crlDistributionPoints':
('http://EVSecure-crl.verisign.com/EVSecure2006.crl',),
 'issuer': ((('countryName', 'US'),),
(('organizationName', 'VeriSign, Inc.'),),
(('organizationalUnitName', 'VeriSign Trust Network'),),
(('organizationalUnitName',
  'Terms of use at https://www.verisign.com/rpa (c)06'),),
(('commonName', 'VeriSign Class 3 Extended Validation SSL CA'),)),
 'notAfter': 'Mar 22 23:59:59 2015 GMT',
 'notBefore': 'Feb 20 00:00:00 2014 GMT',
 'serialNumber': '69A7BC85C106DDE1CF4FA47D5ED813DC',
 'subject': ((('1.3.6.1.4.1.311.60.2.1.3', 'US'),),
 (('1.3.6.1.4.1.311.60.2.1.2', 'Delaware'),),
 (('businessCategory', 'Private Organization'),),
 (('serialNumber', '2927442'),),
 (('countryName', 'US'),),
 (('postalCode', '60603'),),
 (('stateOrProvinceName', 'Illinois'),),
 (('localityName', 'Chicago'),),
 (('streetAddress', '135 S La Salle St'),),
 (('organizationName', 'Bank of America Corporation'),),
 (('organizationalUnitName', 'Network Infrastructure'),),
 (('commonName', 'www.bankofamerica.com'),)),
 'subjectAltName': (('DNS', 'mobile.bankofamerica.com'),
('DNS', 'www.bankofamerica.com')),
 'version': 3}

Missing fields (from Firefox's view of the cert) include:

 Certificate Policies:
2.16.840.1.113733.1.7.23.6:
Extended Validation (EV) SSL Server Certificate
Certification Practice Statement pointer: https://www.verisign.com/cps
(This tells you it's a valid EV cert).

 Certificate Basic Constraints:
Is not a Certificate Authority
(which means they can't issue more certs below this one)

 Extended Key Usage:
TLS Web Server Authentication (1.3.6.1.5.5.7.3.1)
TLS Web Client Authentication (1.3.6.1.5.5.7.3.2)
(which means this cert is for web use, not email or code signing)

   How about just returning ALL the remaining fields and finishing the job, so 
this doesn't have to be fixed again?  Thanks.

--
components: Library (Lib)
messages: 231166
nosy: nagle
priority: normal
severity: normal
status: open
title: Re: SSLsocket.getpeercert - return ALL the fields of the certificate.
versions: Python 3.4

___
Python tracker 
<http://bugs.python.org/issue22873>
___



[issue18907] urllib2.open FTP open times out at 20 secs despite timeout parameter

2013-09-14 Thread John Nagle

John Nagle added the comment:

The server operator at the US Securities and Exchange Commission writes to me: 
"There was a DNS issue that affected the availability of FTP at night. We 
believe it is resolved. Please let us know if you encounter any further 
problems.  Thanks, SEC Webmaster".

So this may have been a DNS related issue, perhaps a load balancer referring 
the connection to a dead machine.  Yet, for some reason, the Windows command 
line FTP client can recover from this problem after 20 seconds? What are they 
doing right? Completely retrying the open?

--
status: pending -> open

___
Python tracker 
<http://bugs.python.org/issue18907>
___



[issue18907] urllib2.open FTP open times out at 20 secs despite timeout parameter

2013-09-02 Thread John Nagle

John Nagle added the comment:

Reproduced problem in Python 3.3 (Win32). Error message there is:

Open of ftp://ftp.sec.gov/edgar/daily-index failed after 21.08 seconds: 


So this is broken in both Python 2.7 and Python 3.3.

--
versions: +Python 3.3
Added file: http://bugs.python.org/file31559/edgartimeouttest3.py

___
Python tracker 
<http://bugs.python.org/issue18907>
___



[issue18907] urllib2.open FTP open times out at 20 secs despite timeout parameter

2013-09-02 Thread John Nagle

New submission from John Nagle:

urllib2.open for an FTP url does not obey the timeout parameter.

Attached test program times out on FTP open after 21 seconds, even though the 
specified timeout is 60 seconds.  Timing is consistent; times have ranged from 
21.03 to 21.05 seconds. Python documentation 
(http://docs.python.org/2/library/urllib2.html) says "The optional timeout 
parameter specifies a timeout in seconds for blocking operations like the 
connection attempt (if not specified, the global default timeout setting will 
be used). This actually only works for HTTP, HTTPS and FTP connections."  The 
documentation for Python 3 reads the same.

Open of ftp://ftp.sec.gov/edgar/daily-index failed after 21.05 seconds: 


This was on Windows 7, but the same result is observed on Linux.  

(The FTP server at the U.S. Securities and Exchange Commission is now imposing 
a 20-second connection delay during busy periods.  This is causing our Python 
code that retrieves new SEC filings to fail.  It may be necessary to run the 
program at different times of the day to reproduce the problem.)

--
components: Library (Lib)
files: edgartimeouttest.py
messages: 196800
nosy: nagle
priority: normal
severity: normal
status: open
title: urllib2.open FTP open times out at 20 secs despite timeout parameter
versions: Python 2.7
Added file: http://bugs.python.org/file31558/edgartimeouttest.py

___
Python tracker 
<http://bugs.python.org/issue18907>
___



[issue15873] "datetime" cannot parse ISO 8601 dates and times

2012-09-09 Thread John Nagle

John Nagle added the comment:

For what parts of ISO 8601 to accept, there's a standard: RFC3339, "Date and 
Time on the Internet: Timestamps".  See section 5.6:

   date-fullyear   = 4DIGIT
   date-month      = 2DIGIT  ; 01-12
   date-mday       = 2DIGIT  ; 01-28, 01-29, 01-30, 01-31 based on
                             ; month/year
   time-hour       = 2DIGIT  ; 00-23
   time-minute     = 2DIGIT  ; 00-59
   time-second     = 2DIGIT  ; 00-58, 00-59, 00-60 based on leap second
                             ; rules
   time-secfrac    = "." 1*DIGIT
   time-numoffset  = ("+" / "-") time-hour ":" time-minute
   time-offset     = "Z" / time-numoffset

   partial-time    = time-hour ":" time-minute ":" time-second
                     [time-secfrac]
   full-date       = date-fullyear "-" date-month "-" date-mday
   full-time       = partial-time time-offset

   date-time       = full-date "T" full-time

   NOTE: Per [ABNF] and ISO8601, the "T" and "Z" characters in this
  syntax may alternatively be lower case "t" or "z" respectively.

  ISO 8601 defines date and time separated by "T".
  Applications using this syntax may choose, for the sake of
  readability, to specify a full-date and full-time separated by
  (say) a space character.

That's straightforward, and can be expressed as a regular expression.
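(As an illustration, not part of the original message: the section 5.6 grammar above does translate almost line for line into a regular expression. The sketch below uses the modern datetime.timezone class; the function name "parse_rfc3339" is my own, not an existing stdlib API.)

```python
# Sketch only: an RFC 3339 (section 5.6) parser built from the grammar above.
import re
from datetime import datetime, timedelta, timezone

_RFC3339 = re.compile(
    r'(?P<y>\d{4})-(?P<mo>\d{2})-(?P<d>\d{2})'
    r'[Tt ]'                                   # RFC 3339 allows T/t or a space
    r'(?P<h>\d{2}):(?P<mi>\d{2}):(?P<s>\d{2})(?P<frac>\.\d+)?'
    r'(?P<off>[Zz]|[+-]\d{2}:\d{2})')

def parse_rfc3339(text):
    m = _RFC3339.fullmatch(text)
    if m is None:
        raise ValueError('not an RFC 3339 timestamp: %r' % text)
    micro = int(round(float(m.group('frac') or '0') * 1000000))
    off = m.group('off')
    if off in ('Z', 'z'):
        tz = timezone.utc
    else:
        sign = 1 if off[0] == '+' else -1
        tz = timezone(sign * timedelta(hours=int(off[1:3]),
                                       minutes=int(off[4:6])))
    return datetime(int(m.group('y')), int(m.group('mo')), int(m.group('d')),
                    int(m.group('h')), int(m.group('mi')), int(m.group('s')),
                    micro, tzinfo=tz)
```

A timestamp parsed this way round-trips through .isoformat() with its offset intact.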

--

___
Python tracker 
<http://bugs.python.org/issue15873>
___



[issue15873] "datetime" cannot parse ISO 8601 dates and times

2012-09-06 Thread John Nagle

John Nagle added the comment:

Re: "%z format is supported".

That's platform-specific; the actual parsing is delegated to the C library.  
It's not in Python 2.7 / Win32:

  ValueError: 'z' is a bad directive in format '%Y-%m-%dT%H:%M:%S%z'

It really shouldn't be platform-specific; the underlying platform is irrelevant 
to this task.  That's more of a documentation error; the features not common to 
all supported Python platforms should not be mentioned in the documentation.  

Re: "I would very much like such promiscuous parser to be implemented in 
datetime.__new__. "

For string input, it's probably better to do this conversion in a specific 
class-level function.  Full ISO 8601 dates/times generally come from 
computer-generated data via a file or API.  If invalid text shows up, it should 
be detected as an error, not be heuristically interpreted as a date.  There's 
already "fromtimestamp" and "fromordinal", 
and "isoformat" as an instance method, so "fromisoformat" seems reasonable.

I'd also suggest providing a standard subclass of tzinfo in datetime for fixed 
offsets.  That's needed to express the time zone information in an ISO 8601 
date. The new "fromisoformat" would convert an ISO 8601 date/time to a 
time-zone "aware" datetime object.  If converted back to an ISO 8601 string 
with .isoformat(), the round trip should preserve the original data, 
including the time zone offset.
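(Illustration, not part of the original message: such a fixed-offset tzinfo subclass might look like the sketch below. The class name "FixedOffset" is mine; Python 3.2 later added datetime.timezone for exactly this purpose.)

```python
from datetime import datetime, timedelta, tzinfo

class FixedOffset(tzinfo):
    """A tzinfo with a constant UTC offset, e.g. -420 minutes for -07:00."""
    def __init__(self, minutes, name=None):
        sign = '-' if minutes < 0 else '+'
        hours, mins = divmod(abs(minutes), 60)
        self._offset = timedelta(minutes=minutes)
        self._name = name or 'UTC%s%02d:%02d' % (sign, hours, mins)

    def utcoffset(self, dt):
        return self._offset

    def dst(self, dt):
        return timedelta(0)     # a fixed offset carries no DST information

    def tzname(self, dt):
        return self._name
```

With such a class, datetime(2012, 9, 9, 18, 0, 0, tzinfo=FixedOffset(-420)).isoformat() yields "2012-09-09T18:00:00-07:00", preserving the offset on the round trip.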

(Several more implementations of this conversion have turned up.  In addition 
to the four already mentioned, there was one in xml.util, and one in 
feedparser. There are probably more yet to be found.)

--

___
Python tracker 
<http://bugs.python.org/issue15873>
___



[issue15873] "datetime" cannot parse ISO 8601 dates and times

2012-09-06 Thread John Nagle

New submission from John Nagle:

The datetime module has support for output to a string of dates and times in 
ISO 8601 format ("2012-09-09T18:00:00-07:00"), with the object method 
"isoformat([sep])".  But there's no support for parsing such strings.  A string 
to datetime class method should be provided, one capable of parsing at least 
the RFC 3339 subset of ISO 8601.

The problem is parsing time zone information correctly.
The allowed formats for time zone are
   empty   - no TZ, date/time is "naive" in the datetime sense
   Z   - zero, or Zulu time, i.e. UTC.
   [+-]nn:nn   - offset from UTC
   
"strptime" does not understand timezone offsets. The "datetime" documentation 
suggests that the "z" format directive handles time zone info, but that's not 
actually implemented for input.  

Pypi has four modules for parsing ISO 8601 dates. Each has at least one major
problem in time zone handling:

iso8601 0.1.4   
   Abandonware.  Mishandles time zone when time zone is "Z" and
   the default time zone is specified. 
iso8601.py 0.1dev   
   Always returns a "naive" datetime object, even if zone specified.
iso8601plus 0.1.6   
   Fork of abandonware version above.  Same bug.
zc.iso8601 0.2.0
   Zope version.  Imports the pytz module with the full Olson time zone
   database, but doesn't actually use that database.

Thus, nothing in Pypi provides a good alternative. 

It would be appropriate to handle this in the datetime module.  One small, 
correct, tested function would be better than the existing five bad 
alternatives.

--
components: Library (Lib)
messages: 169941
nosy: nagle
priority: normal
severity: normal
status: open
title: "datetime" cannot parse ISO 8601 dates and times
type: enhancement
versions: Python 2.6, Python 2.7, Python 3.1, Python 3.2, Python 3.3, Python 3.4

___
Python tracker 
<http://bugs.python.org/issue15873>
___



[issue9679] unicode DNS names in urllib, urlopen

2012-06-13 Thread John Nagle

John Nagle  added the comment:

The current convention is that domains go into DNS lookup as punycode, and the 
path, query, and fragment fields of the URL are encoded with percent-escapes.  
See

http://lists.w3.org/Archives/Public/ietf-http-wg/2011OctDec/0155.html

Python needs to get with the program here.

--

___
Python tracker 
<http://bugs.python.org/issue9679>
___



[issue9679] unicode DNS names in urllib, urlopen

2012-06-13 Thread John Nagle

John Nagle  added the comment:

An "IRI library" is not needed to fix this problem.  It's already fixed in the 
sockets library and the http library.  We just need consistency in urllib2.  

urllib2 functions which take a "url" parameter should apply 
"encodings.idna.ToASCII" to each label of the domain name.  

urllib2 function which return a "url" value (such as "geturl()") should apply 
"encodings.idna.ToUnicode" to each label of the domain name.

Note that in both cases, the conversion function must be applied to each label 
(field between "."s) of the domain name only.  Applying it to the entire domain 
name or the entire URL will not work. 
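(A sketch of that per-label conversion, for illustration; "domain_to_ascii" is a hypothetical helper name, not proposed API:)

```python
import encodings.idna

def domain_to_ascii(host):
    # ToASCII accepts a single label only, so split on ".", convert each
    # label, and rejoin; applying it to the whole dotted name would fail.
    return b'.'.join(encodings.idna.ToASCII(label) for label in host.split('.'))
```

Plain-ASCII labels pass through unchanged, while non-ASCII labels come out as punycode.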

If there are future changes to domain syntax, those should go into 
"encodings.idna", which is the proper library for domain syntax issues.

--
nosy: +nagle

___
Python tracker 
<http://bugs.python.org/issue9679>
___



[issue11900] 2.7.1 unicode subclasses not calling __str__() for print statement

2011-12-15 Thread John Nagle

John Nagle  added the comment:

This has nothing to do with Python 3.  There's a difference in __str__ handling 
between Python 2.6 and Python 2.7.2.  It's enough to crash BeautifulSoup:

[Thread-8] Unexpected EXCEPTION while processing page 
"http://www.verisign.com": global name '__str__' is not defined
[Thread-8] Traceback (most recent call last):
...
[Thread-8]   File "C:\projects\sitetruth\BeautifulSoup.py", line 646, in 
prettify
[Thread-8] return self.__str__(encoding, True)
[Thread-8]   File "C:\projects\sitetruth\BeautifulSoup.py", line 621, in __str__
[Thread-8] contents = self.renderContents(encoding, prettyPrint, 
indentContents)
[Thread-8]   File "C:\projects\sitetruth\BeautifulSoup.py", line 656, in 
renderContents
[Thread-8] text = c.__str__(encoding)
[Thread-8]   File "C:\projects\sitetruth\BeautifulSoup.py", line 415, in __str__
[Thread-8] return "" % NavigableString.__str__(self, encoding)
[Thread-8]   File "C:\projects\sitetruth\BeautifulSoup.py", line 393, in 
__unicode__
[Thread-8] return __str__(self, None)
[Thread-8] NameError: global name '__str__' is not defined

The class method that's failing is simply

class NavigableString(unicode, PageElement):
    ...
    def __unicode__(self):
        return __str__(self, None)    # <== EXCEPTION RAISED HERE

    def __str__(self, encoding=DEFAULT_OUTPUT_ENCODING):
        if encoding:
            return self.encode(encoding)
        else:
            return self

Using __str__ in the global namespace is probably wrong, and in a later version 
of BeautifulSoup, that code is changed to

    def __unicode__(self):
        return str(self).decode(DEFAULT_OUTPUT_ENCODING)

which seems to work.  However, it is a real change from 2.6 to 2.7 that breaks 
code.

--
nosy: +nagle

___
Python tracker 
<http://bugs.python.org/issue11900>
___



[issue13288] SSL module doesn't allow access to cert issuer information

2011-10-28 Thread John Nagle

New submission from John Nagle :

The SSL module still doesn't return much information from the
certificate.  SSLSocket.getpeercert only returns a few basic items
about the certificate subject.  You can't retrieve issuer information,
and you can't get the extensions needed to check if a cert is an EV cert.

With the latest flaps about phony cert issuers (another CA compromise hit the 
news today), it's worth having issuer info available.
It was available in the old M2Crypto module, but not in the current Python SSL 
module.

    John Nagle

--
components: Library (Lib)
messages: 146579
nosy: nagle
priority: normal
severity: normal
status: open
title: SSL module doesn't allow access to cert issuer information
versions: Python 2.6, Python 2.7, Python 3.1, Python 3.2, Python 3.3, Python 3.4

___
Python tracker 
<http://bugs.python.org/issue13288>
___



[issue10202] ftplib doesn't check close status after sending file

2010-10-26 Thread John Nagle

John Nagle  added the comment:

Proper behavior for ftplib when sending is to send all desired data, then call 
"sock.shutdown(socket.SHUT_RDWR)".  This indicates that no more data will be 
sent, and blocks until the receiver has acknowledged all their data. 

"socketmodule.c" handles this right.  "shutdown" is called on the socket, and 
the return value is checked.  If the return value is negative, an error handler 
is returned.  Compare the handling in "close".  

FTP send is one of the few situations where this matters, because FTP uses the 
close of the data connection to indicate EOF.
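(A sketch of the send-then-shutdown pattern described above; the helper name is mine:)

```python
import socket

def send_and_confirm(sock, data):
    """Send all data, then shut the socket down so that delivery errors
    are raised here instead of being silently discarded on close."""
    sock.sendall(data)
    sock.shutdown(socket.SHUT_RDWR)   # signals EOF; raises OSError on failure
    sock.close()
```

On the receiving side, the shutdown shows up as an EOF (an empty read) once all queued data has been consumed.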

--

___
Python tracker 
<http://bugs.python.org/issue10202>
___



[issue10202] ftplib doesn't check close status after sending file

2010-10-26 Thread John Nagle

New submission from John Nagle :

"ftplib" doesn't check the status on socket close after writing.  This can lead 
to silently truncated files when sending files with "ftplib".

A report of truncated files on comp.lang.python led me to check the source 
code. 

The "ftplib" module does sending by calling sock_sendall in "socketmodule.c". 
That does an OS-level "send", and once everything has been sent, returns.

But an OS-level socket send returns when the data is queued for sending, not 
when it is delivered.  Only when the socket is closed, and the close status 
checked, do you know if the data was delivered. There's a final TCP close 
handshake that occurs when close has been called at both ends, and only when it 
completes successfully do you know that the data has been delivered.

At the socket level, this is performed by "shutdown" (which closes the 
connection and returns the proper network status information), or by "close".

Look at sock_close in "socketmodule.c".  Note that it ignores the return status 
on close, always returns None, and never raises an exception.  As the Linux 
manual page for "close" says:
"Not checking the return value of close() is a common but nevertheless serious 
programming error. It is quite possible that errors on a previous write(2) 
operation are first reported at the final close(). Not checking the return 
value when closing the file may lead to silent loss of data."

"ftplib", in "storlines" and "storbinary", calls "close" without checking the 
status or calling "shutdown" first.  So if the other end disconnects after all 
data has been queued locally but not sent, received and acknowledged, the 
sender will never know.

--
components: Library (Lib)
messages: 119638
nosy: nagle
priority: normal
severity: normal
status: open
title: ftplib doesn't check close status after sending file
type: behavior
versions: Python 2.5, Python 2.6, Python 2.7, Python 3.1, Python 3.2, Python 3.3

___
Python tracker 
<http://bugs.python.org/issue10202>
___



[issue7558] Python 3.1.1 installer botches upgrade when installation is not on C drive.

2009-12-21 Thread John Nagle

John Nagle  added the comment:

Cancel bug report.

It was my error. The installer says it is replacing the existing
installation, but by default installs it in C:.  That can be
overridden in the directory entry field below the big empty white entry box.

--

___
Python tracker 
<http://bugs.python.org/issue7558>
___



[issue7558] Python 3.1.1 installer botches upgrade when installation is not on C drive.

2009-12-21 Thread John Nagle

New submission from John Nagle :

I just installed "python3.1.1.msi" on a system that had "python3.1.msi"
installed in "D:/python31".  In this situation, the installer does not
ask the user for a destination directory.  The installer found the old
installation in "D:/python31", removed most but not all of the files
there, and then installed the new version in "C:/python31". 

I uninstalled the failed install, and reinstalled.

On a new install, the installer prompts for the destination dir, and
that works. Upgrade installs, though, are botched.

John Nagle

--
components: Installation
messages: 96768
nosy: nagle
severity: normal
status: open
title: Python 3.1.1 installer botches upgrade when installation is not on C 
drive.
type: behavior
versions: Python 3.1

___
Python tracker 
<http://bugs.python.org/issue7558>
___



[issue1712522] urllib.quote throws exception on Unicode URL

2009-04-22 Thread John Nagle

John Nagle  added the comment:

Note that the problem can't be solved by telling end users to call a
different "quote" function.  The problem is down inside a library
module. "robotparser" is calling "urllib.quote". One of those two
library modules needs to be fixed.

--

___
Python tracker 
<http://bugs.python.org/issue1712522>
___



[issue1637] urlparse.urlparse misparses URLs with query but no path

2007-12-19 Thread John Nagle

John Nagle added the comment:

I tried downloading the latest rev of urlparse.py (59480) and it flunked
its own unit test, "urlparse.test()".  Two test cases fail, so I don't
want to try to fix the module until the last people to change it fix
their unit test problems.
The fix I provided should fix the problem I reported, but I'm not sure
if there's anything else wrong, since it flunks its unit test.

__
Tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue1637>
__



[issue1637] urlparse.urlparse misparses URLs with query but no path

2007-12-16 Thread John Nagle

New submission from John Nagle:

urlparse.urlparse will mis-parse URLs which have a "/" after a "?".
>>
>> sa1 = 'http://example.com?blahblah=/foo'
>> sa2 = 'http://example.com?blahblah=foo'
>> print urlparse.urlparse(sa1)
>> ('http', 'example.com?blahblah=', '/foo', '', '', '') # WRONG
>> print urlparse.urlparse(sa2)
>> ('http', 'example.com', '', '', 'blahblah=foo', '') # RIGHT

That's wrong. RFC 3986 ("Uniform Resource Identifier (URI): Generic
Syntax"), page 23 says

"The characters slash ("/") and question mark ("?") may represent data
within the query component.  Beware that some older, erroneous
implementations may not handle such data correctly when it is used as
the base URI for relative references (Section 5.1), apparently
because they fail to distinguish query data from path data when
looking for hierarchical separators."

 So "urlparse" is an "older, erroneous implementation".  Looking
 at the code for "urlparse", it references RFC1808 (1995), which
 was a long time ago, three revisions back.
>>
>> Here's the bad code:
>>
>> def _splitnetloc(url, start=0):
>>     for c in '/?#': # the order is important!
>>         delim = url.find(c, start)
>>         if delim >= 0:
>>             break
>>     else:
>>         delim = len(url)
>>     return url[start:delim], url[delim:]
>>
>> That's just wrong.  The domain ends at the first appearance of
>> any character in '/?#', but that code returns the text before the
>> first '/' even if there's an earlier '?'.  A URL/URI doesn't
>> have to have a path, even when it has query parameters. 

OK, here's a fix to "urlparse", replacing _splitnetloc.  I didn't use
a regular expression because "urlparse" doesn't import "re", and I
didn't want to change that.

def _splitnetloc(url, start=0):
    delim = len(url)                     # position of end of domain part of url, default is end
    for c in '/?#':                      # look for delimiters; the order is NOT important
        wdelim = url.find(c, start)      # find first of this delim
        if wdelim >= 0:                  # if found
            delim = min(delim, wdelim)   # use earliest delim position
    return url[start:delim], url[delim:] # return (domain, rest)

__
Tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue1637>
__



[issue1637] urlparse.urlparse misparses URLs with query but no path

2007-12-16 Thread John Nagle

Changes by John Nagle:


--
components: Library (Lib)
nosy: nagle
severity: normal
status: open
title: urlparse.urlparse misparses URLs with query but no path
type: behavior
versions: Python 2.4, Python 2.5

__
Tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue1637>
__