Bug#471970: wget -N and space in the path (HTML encoding)
Package: wget
Version: 1.10.2-0bpo1
Severity: normal

wget -N does not work when the filename contains a space.

Steps to reproduce:

$ echo http://www.medical.philips.com/us/company/connectivity/assets/docs/dicomcs/ENsphere%2520DICOM%25203%2520Conformance%2520Statement.pdf > dummy.txt
$ wget -N -i dummy.txt
$ wget -N -i dummy.txt

The second time, the file should not have been downloaded.

I suspect the % used in the HTML URL encoding is not being decoded properly for use with the -N option.

Thanks!

-- System Information:
Debian Release: 3.1
Architecture: i386 (i686)
Kernel: Linux 2.6.18-4-686-bigmem
Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968)

Versions of packages wget depends on:
ii  libc6        2.5-9+b1        GNU C Library: Shared libraries
ii  libssl0.9.7  0.9.7e-3sarge5  SSL shared libraries

-- no debconf information

-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]
Bug#471970: closed by Micah Cowan [EMAIL PROTECTED] (Re: Bug#471970: wget -N and space in the path (HTML encoding))
I still do not believe this has anything to do with the server. If you have a couple of seconds, please try these files instead:

http://www.medical.philips.com/us/company/connectivity/assets/docs/dicomcs/ENsphere%2520DICOM%25203%2520Conformance%2520Statement.pdf
http://www.medical.philips.com/us/company/connectivity/assets/docs/dicomcs/usit15l3_final.pdf

You'll see that both files are stored at the exact same location, but wget reports two different things (*). I *seriously* doubt the server has a per-file configuration...

Thanks for your time anyway,
-Mathieu

(*)
--19:53:33--  http://www.medical.philips.com/us/company/connectivity/assets/docs/dicomcs/ENsphere%2520DICOM%25203%2520Conformance%2520Statement.pdf
           => `ENsphere%20DICOM%203%20Conformance%20Statement.pdf'
Resolving www.medical.philips.com... 161.88.247.197
Connecting to www.medical.philips.com|161.88.247.197|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1,936 (1.9K) [text/html]
Last-modified header missing -- time-stamps turned off.
--19:53:33--  http://www.medical.philips.com/us/company/connectivity/assets/docs/dicomcs/ENsphere%2520DICOM%25203%2520Conformance%2520Statement.pdf
           => `ENsphere%20DICOM%203%20Conformance%20Statement.pdf'
Reusing existing connection to www.medical.philips.com:80.
HTTP request sent, awaiting response... 200 OK
Length: 1,936 (1.9K) [text/html]

    0K .                                                     100%  694.97 KB/s

19:53:33 (694.97 KB/s) - `ENsphere%20DICOM%203%20Conformance%20Statement.pdf' saved [1936/1936]

--19:53:33--  http://www.medical.philips.com/us/company/connectivity/assets/docs/dicomcs/usit15l3_final.pdf
           => `usit15l3_final.pdf'
Reusing existing connection to www.medical.philips.com:80.
HTTP request sent, awaiting response... 200 OK
Length: 217,998 (213K) [application/octet-stream]
Server file no newer than local file `usit15l3_final.pdf' -- not retrieving.

FINISHED --19:53:33--
Downloaded: 1,936 bytes in 1 files

On Fri, Mar 21, 2008 at 7:00 PM, Debian Bug Tracking System [EMAIL PROTECTED] wrote:
> This is an automatic notification regarding your Bug report
> which was filed against the wget package:
>
> #471970: wget -N and space in the path (HTML encoding)
>
> It has been closed by Micah Cowan [EMAIL PROTECTED].
>
> Their explanation is attached below along with your original report.
> If this explanation is unsatisfactory and you have not received a
> better one in a separate message then please contact Micah Cowan
> [EMAIL PROTECTED] by replying to this email.
>
> --
> 471970: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=471970
> Debian Bug Tracking System
> Contact [EMAIL PROTECTED] with problems
>
> -- Forwarded message --
> From: Micah Cowan [EMAIL PROTECTED]
> To: Mathieu Malaterre [EMAIL PROTECTED], [EMAIL PROTECTED]
> Date: Fri, 21 Mar 2008 10:56:28 -0700
> Subject: Re: Bug#471970: wget -N and space in the path (HTML encoding)
>
> Mathieu Malaterre wrote:
> > Package: wget
> > Version: 1.10.2-0bpo1
> > Severity: normal
> >
> > wget -N does not work when filename has a space in the filename.
> >
> > Steps to reproduce:
> >
> > $ echo http://www.medical.philips.com/us/company/connectivity/assets/docs/dicomcs/ENsphere%2520DICOM%25203%2520Conformance%2520Statement.pdf > dummy.txt
> > $ wget -N -i dummy.txt
> > $ wget -N -i dummy.txt
> >
> > the second time, the file should not have been downloaded.
>
> In the log that Wget issues while downloading that file is the line:
>
>   Last-modified header missing -- time-stamps turned off.
>
> Your issue has nothing to do with spaces in the filename (at least, on
> Wget's end), and everything to do with the server not telling wget when
> the file was last modified. Therefore, wget cannot determine whether the
> file on the server is newer or older than the local copy.
>
> --
> Micah J. Cowan
> Programmer, musician, typesetting enthusiast, gamer...
> http://micah.cowan.name/
Bug#471970: closed by Micah Cowan [EMAIL PROTECTED] (Re: Bug#471970: wget -N and space in the path (HTML encoding))
Ra! OK, I finally found the issue, AND I WAS RIGHT! Sorry :(

Try this:

$ cat dummy.txt
<a href="http://www.medical.philips.com/us/company/connectivity/assets/docs/dicomcs/ENsphere DICOM 3 Conformance Statement.pdf">dummy</a>

Then:

$ wget -N --force-html -i dummy.txt
--20:03:04--  http://www.medical.philips.com/us/company/connectivity/assets/docs/dicomcs/ENsphere%20DICOM%203%20Conformance%20Statement.pdf
           => `ENsphere DICOM 3 Conformance Statement.pdf'
Resolving www.medical.philips.com... 161.88.247.197
Connecting to www.medical.philips.com|161.88.247.197|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 334,708 (327K) [application/octet-stream]
Server file no newer than local file `ENsphere DICOM 3 Conformance Statement.pdf' -- not retrieving.

So please reopen the bug report, as I really believe wget -i with a space in the path is not working.

Thank you
-Mathieu
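The difference between the two input files in this thread comes down to how many times the filename was percent-encoded. A minimal Python sketch of that step follows. It is only an illustration using the standard library's urllib.parse.quote and is not Wget's own code; the variable names are made up for the example.

```python
from urllib.parse import quote

# The real filename on the server, with literal spaces.
filename = "ENsphere DICOM 3 Conformance Statement.pdf"

# Percent-encode once: each space becomes %20.
# This matches the URL Wget requested in the --force-html run.
encoded_once = quote(filename)

# Encoding the already-encoded name a second time escapes the "%"
# itself as %25, producing the %2520 sequences from the first dummy.txt.
encoded_twice = quote(encoded_once)

print(encoded_once)
print(encoded_twice)
```

The second value is the name that appeared in the original echo command, which suggests the URL had been encoded twice before it was pasted into dummy.txt.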
Bug#471970: closed by Micah Cowan [EMAIL PROTECTED] (Re: Bug#471970: wget -N and space in the path (HTML encoding))
Mathieu Malaterre wrote:
> $ wget -N --force-html -i dummy.txt
> --20:03:04--  http://www.medical.philips.com/us/company/connectivity/assets/docs/dicomcs/ENsphere%20DICOM%203%20Conformance%20Statement.pdf
>            => `ENsphere DICOM 3 Conformance Statement.pdf'
> Resolving www.medical.philips.com... 161.88.247.197
> Connecting to www.medical.philips.com|161.88.247.197|:80... connected.
> HTTP request sent, awaiting response... 200 OK
> Length: 334,708 (327K) [application/octet-stream]
> Server file no newer than local file `ENsphere DICOM 3 Conformance Statement.pdf' -- not retrieving.
>
> So please reopen the bug report, as I really believe wget -i and space
> in the path is not working.

I'm at a loss as to how the above demonstrates a problem. It decided not to download the file, because it was no newer than the local copy. Isn't that the behavior you were asking for? That's certainly what -N is intended for.

-- 
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/
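The -N decision described in this thread can be sketched roughly as follows. This is a simplified illustration in Python, not Wget's actual implementation: the function name should_download and its parameters are hypothetical, and other checks a real client may perform (such as comparing file sizes) are left out.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def should_download(last_modified, local_mtime):
    """Simplified -N style decision (hypothetical helper, not Wget's code).

    last_modified: value of the Last-Modified response header, or None
    local_mtime:   POSIX timestamp of the local copy, or None if absent
    """
    if local_mtime is None:
        # No local copy yet, so there is nothing to compare against.
        return True
    if last_modified is None:
        # "Last-modified header missing -- time-stamps turned off."
        # Without a server timestamp the file is fetched again.
        return True
    remote = parsedate_to_datetime(last_modified)
    if remote.tzinfo is None:
        # Treat a timestamp without zone information as UTC.
        remote = remote.replace(tzinfo=timezone.utc)
    # Download only when the server copy is strictly newer.
    return remote.timestamp() > local_mtime
```

With a header present and a local copy of the same age, the function returns False, which corresponds to the "Server file no newer than local file" message in the log. With the header missing it returns True, which corresponds to the repeated download of the "Page not found" response.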
Bug#471970: closed by Micah Cowan [EMAIL PROTECTED] (Re: Bug#471970: wget -N and space in the path (HTML encoding))
Mathieu Malaterre wrote:
> I still do not believe this has anything to do with the server, if you
> have a couple of seconds please try this file instead:
>
> http://www.medical.philips.com/us/company/connectivity/assets/docs/dicomcs/ENsphere%2520DICOM%25203%2520Conformance%2520Statement.pdf
> http://www.medical.philips.com/us/company/connectivity/assets/docs/dicomcs/usit15l3_final.pdf

The first one is not a good link (note that it doesn't contain spaces; it contains %2520, which decodes to %20, literally). It goes to a "Page not found" page (which lacks a modification timestamp, as you can see from the logs). However, the server issues a 200 HTTP status code for that page, so Wget can't know that it's not a good file (it does, however, notice that it received HTML and not PDF or application/octet-stream).

In your other example, you corrected the percent-encoding, and lo! it worked.

FWIW, it's helpful to run wget with --debug so you can see the headers involved, in addition to other helpful information.

-- 
HAND,
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/
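The point about %2520 decoding to a literal %20 can be checked with a short Python sketch. It uses urllib.parse.unquote from the standard library; the helper decode_levels is a hypothetical name introduced only for this illustration.

```python
from urllib.parse import unquote


def decode_levels(path):
    """Percent-decode repeatedly until nothing changes; return every stage."""
    stages = [path]
    while True:
        decoded = unquote(stages[-1])
        if decoded == stages[-1]:
            return stages
        stages.append(decoded)


stages = decode_levels(
    "ENsphere%2520DICOM%25203%2520Conformance%2520Statement.pdf"
)
for stage in stages:
    print(stage)
```

A server decodes the request path once, so it ends up looking for the second stage, a file whose name literally contains "%20". Only a second decoding pass would yield the name with spaces. More than two stages in the result is a reasonable hint that a URL was encoded twice.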
Bug#471970: closed by Micah Cowan [EMAIL PROTECTED] (Re: Bug#471970: wget -N and space in the path (HTML encoding))
On Fri, Mar 21, 2008 at 8:41 PM, Micah Cowan [EMAIL PROTECTED] wrote:
> Mathieu Malaterre wrote:
> > I still do not believe this has anything to do with the server, if you
> > have a couple of seconds please try this file instead:
> >
> > http://www.medical.philips.com/us/company/connectivity/assets/docs/dicomcs/ENsphere%2520DICOM%25203%2520Conformance%2520Statement.pdf
> > http://www.medical.philips.com/us/company/connectivity/assets/docs/dicomcs/usit15l3_final.pdf
>
> The first one is not a good link (note that it doesn't contain spaces;
> it contains %2520, which decodes to %20, literally). It goes to a "Page
> not found" page (which lacks a modification timestamp, as you can see
> from the logs). However, the server issues a 200 HTTP status code for
> that page, so Wget can't know that it's not a good file (it does,
> however, notice that it received HTML and not PDF or
> application/octet-stream).
>
> In your other example, you corrected the percent-encoding, and lo! it
> worked.

Thanks. You can close the bug.

Sorry for the noise.

-- 
Mathieu
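A script that wraps a downloader could flag this kind of "200 OK but really an error page" response with a simple heuristic. The Python sketch below is an assumption-laden illustration, not something Wget does: the function name looks_like_soft_404 and the list of HTML extensions are invented for the example, and the check relies only on the status code, the Content-Type header and the extension in the URL.

```python
import os


def looks_like_soft_404(url_path, status, content_type):
    """Heuristic: 200 OK with an HTML body for a URL naming a non-HTML file."""
    extension = os.path.splitext(url_path)[1].lower()
    # Drop parameters such as "; charset=utf-8" from the header value.
    media_type = content_type.split(";")[0].strip().lower()
    return (
        status == 200
        and media_type == "text/html"
        and extension not in ("", ".html", ".htm")
    )


# The two responses from the log in this thread.
bad = looks_like_soft_404(
    "ENsphere%2520DICOM%25203%2520Conformance%2520Statement.pdf",
    200,
    "text/html",
)
good = looks_like_soft_404(
    "usit15l3_final.pdf", 200, "application/octet-stream"
)
print(bad, good)
```

The check is deliberately narrow. A server may legitimately serve HTML under an unusual extension, so a positive result is a hint worth inspecting with --debug rather than proof of a broken link.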
Bug#471970: closed by Micah Cowan [EMAIL PROTECTED] (Re: Bug#471970: wget -N and space in the path (HTML encoding))
Mathieu Malaterre wrote:
> Thanks. You can close the bug.
>
> Sorry for the noise.

No worries. :)

-- 
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/