Bug#471970: wget -N and space in the path (HTML encoding)

2008-03-21 Thread Mathieu Malaterre
Package: wget
Version: 1.10.2-0bpo1
Severity: normal


wget -N does not work when the filename contains a space.

Steps to reproduce:

$ echo "http://www.medical.philips.com/us/company/connectivity/assets/docs/dicomcs/ENsphere%2520DICOM%25203%2520Conformance%2520Statement.pdf" > dummy.txt
$ wget -N -i dummy.txt
$ wget -N -i dummy.txt

The second time, the file should not have been downloaded. I suspect the
percent-encoding (%) in the URL is not being decoded properly for use
with the -N option.

Thanks !

-- System Information:
Debian Release: 3.1
Architecture: i386 (i686)
Kernel: Linux 2.6.18-4-686-bigmem
Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968)

Versions of packages wget depends on:
ii  libc6 2.5-9+b1   GNU C Library: Shared libraries
ii  libssl0.9.7   0.9.7e-3sarge5 SSL shared libraries

-- no debconf information



-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Bug#471970: closed by Micah Cowan [EMAIL PROTECTED] (Re: Bug#471970: wget -N and space in the path (HTML encoding))

2008-03-21 Thread Mathieu Malaterre
I still do not believe this has anything to do with the server. If you
have a couple of seconds, please try these two URLs instead:

http://www.medical.philips.com/us/company/connectivity/assets/docs/dicomcs/ENsphere%2520DICOM%25203%2520Conformance%2520Statement.pdf
http://www.medical.philips.com/us/company/connectivity/assets/docs/dicomcs/usit15l3_final.pdf

You'll see that both files are stored at the exact same location, but
wget reports two different things (*). I *seriously* doubt the server
has a per-file configuration...

Thanks for your time anyway,
-Mathieu

(*)
--19:53:33--  http://www.medical.philips.com/us/company/connectivity/assets/docs/dicomcs/ENsphere%2520DICOM%25203%2520Conformance%2520Statement.pdf
           => `ENsphere%20DICOM%203%20Conformance%20Statement.pdf'
Resolving www.medical.philips.com... 161.88.247.197
Connecting to www.medical.philips.com|161.88.247.197|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1,936 (1.9K) [text/html]
Last-modified header missing -- time-stamps turned off.
--19:53:33--  http://www.medical.philips.com/us/company/connectivity/assets/docs/dicomcs/ENsphere%2520DICOM%25203%2520Conformance%2520Statement.pdf
           => `ENsphere%20DICOM%203%20Conformance%20Statement.pdf'
Reusing existing connection to www.medical.philips.com:80.
HTTP request sent, awaiting response... 200 OK
Length: 1,936 (1.9K) [text/html]

0K . 100%  694.97 KB/s

19:53:33 (694.97 KB/s) - `ENsphere%20DICOM%203%20Conformance%20Statement.pdf' saved [1936/1936]

--19:53:33--  http://www.medical.philips.com/us/company/connectivity/assets/docs/dicomcs/usit15l3_final.pdf
           => `usit15l3_final.pdf'
Reusing existing connection to www.medical.philips.com:80.
HTTP request sent, awaiting response... 200 OK
Length: 217,998 (213K) [application/octet-stream]
Server file no newer than local file `usit15l3_final.pdf' -- not retrieving.


FINISHED --19:53:33--
Downloaded: 1,936 bytes in 1 files


On Fri, Mar 21, 2008 at 7:00 PM, Debian Bug Tracking System
[EMAIL PROTECTED] wrote:

  This is an automatic notification regarding your Bug report
  which was filed against the wget package:

  #471970: wget -N and space in the path (HTML encoding)

  It has been closed by Micah Cowan [EMAIL PROTECTED].

  Their explanation is attached below along with your original report.
  If this explanation is unsatisfactory and you have not received a
  better one in a separate message then please contact Micah Cowan
  [EMAIL PROTECTED] by replying to this email.


  --
  471970: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=471970
  Debian Bug Tracking System
  Contact [EMAIL PROTECTED] with problems


 -- Forwarded message --
 From: Micah Cowan [EMAIL PROTECTED]
 To: Mathieu Malaterre [EMAIL PROTECTED], [EMAIL PROTECTED]
 Date: Fri, 21 Mar 2008 10:56:28 -0700
 Subject: Re: Bug#471970: wget -N and space in the path (HTML encoding)
 -BEGIN PGP SIGNED MESSAGE-
  Hash: SHA1

  Mathieu Malaterre wrote:
   Package: wget
   Version: 1.10.2-0bpo1
   Severity: normal
  
  
   wget -N does not work when filename has a space in the filename.
  
   Steps to reproduce:
  
   $ echo "http://www.medical.philips.com/us/company/connectivity/assets/docs/dicomcs/ENsphere%2520DICOM%25203%2520Conformance%2520Statement.pdf" > dummy.txt
   $ wget -N -i dummy.txt
   $ wget -N -i dummy.txt
  
   the second time, the file should not have been downloaded.

  In the log that Wget issues while downloading that file, is the line:

   Last-modified header missing -- time-stamps turned off.

  Your issue has nothing to do with spaces in the filename (at least, on
  Wget's end), and everything to do with the server not telling wget when
  it was last modified. Therefore, wget cannot determine whether the file
  on the server is newer or older than the local copy.

  - --
  Micah J. Cowan
  Programmer, musician, typesetting enthusiast, gamer...
  http://micah.cowan.name/
  -BEGIN PGP SIGNATURE-
  Version: GnuPG v1.4.6 (GNU/Linux)
  Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

  iD8DBQFH4/bM7M8hyUobTrERApCdAJsFlWyubh1pnVY8qwgatoZPRWDXBgCdFyVn
  yjgZ+itvfDouqQ40WL3C4BE=
  =Mn8U
  -END PGP SIGNATURE-




Bug#471970: closed by Micah Cowan [EMAIL PROTECTED] (Re: Bug#471970: wget -N and space in the path (HTML encoding))

2008-03-21 Thread Mathieu Malaterre
Ra !

Ok I finally found the issue, AND I WAS RIGHT ! Sorry :(

Try this:

$ cat dummy.txt
<a href="http://www.medical.philips.com/us/company/connectivity/assets/docs/dicomcs/ENsphere DICOM 3 Conformance Statement.pdf">dummy</a>

Then
$ wget -N  --force-html -i dummy.txt

--20:03:04--  http://www.medical.philips.com/us/company/connectivity/assets/docs/dicomcs/ENsphere%20DICOM%203%20Conformance%20Statement.pdf
           => `ENsphere DICOM 3 Conformance Statement.pdf'
Resolving www.medical.philips.com... 161.88.247.197
Connecting to www.medical.philips.com|161.88.247.197|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 334,708 (327K) [application/octet-stream]
Server file no newer than local file `ENsphere DICOM 3 Conformance Statement.pdf' -- not retrieving.


So please reopen the bug report, as I really believe wget -i with a space
in the path is not working.

Thank you
-Mathieu


Bug#471970: closed by Micah Cowan [EMAIL PROTECTED] (Re: Bug#471970: wget -N and space in the path (HTML encoding))

2008-03-21 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Mathieu Malaterre wrote:
 $ wget -N  --force-html -i dummy.txt
 
 --20:03:04--  http://www.medical.philips.com/us/company/connectivity/assets/docs/dicomcs/ENsphere%20DICOM%203%20Conformance%20Statement.pdf
            => `ENsphere DICOM 3 Conformance Statement.pdf'
 Resolving www.medical.philips.com... 161.88.247.197
 Connecting to www.medical.philips.com|161.88.247.197|:80... connected.
 HTTP request sent, awaiting response... 200 OK
 Length: 334,708 (327K) [application/octet-stream]
 Server file no newer than local file `ENsphere DICOM 3 Conformance Statement.pdf' -- not retrieving.
 
 So please reopen the bug report, as I really believe wget -i with a space
 in the path is not working.

I'm at a loss as to how the above demonstrates a problem. It decided not
to download the file, because it was no newer than the local copy. Isn't
that the behavior you were asking for? That's certainly what -N is
intended for.
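
For readers following along, the decision described above can be modeled
roughly as follows. This is only a simplified sketch in Python, not Wget's
actual code: the function name should_download and its parameters are
made up for illustration, and it ignores the file-size comparison that
the Wget manual also describes for time-stamping.

```python
from email.utils import parsedate_to_datetime
from typing import Optional


def should_download(last_modified: Optional[str],
                    local_mtime: Optional[float]) -> bool:
    """Simplified model of a -N style decision (hypothetical helper).

    last_modified: value of the Last-Modified response header, or None
                   if the server did not send one.
    local_mtime:   modification time of the local copy as a Unix
                   timestamp, or None if there is no local copy.
    """
    if local_mtime is None:
        # Nothing on disk yet, so there is nothing to compare against.
        return True
    if last_modified is None:
        # "Last-modified header missing -- time-stamps turned off."
        # Without a server timestamp the comparison is impossible.
        return True
    remote_mtime = parsedate_to_datetime(last_modified).timestamp()
    # Retrieve only when the server copy is strictly newer.
    return remote_mtime > local_mtime


if __name__ == "__main__":
    stamp = "Fri, 21 Mar 2008 10:56:28 GMT"  # illustrative value only
    local = parsedate_to_datetime(stamp).timestamp()
    print(should_download(stamp, local))  # server copy is not newer
    print(should_download(None, local))   # header missing
```

Under this model, a response without Last-Modified always leads to a
fresh download, which matches the first log in this thread.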

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFH5Az87M8hyUobTrERAoqbAJ9Ka/lE01OmC1cCWYMEVQqDfKV0iwCdEzOe
qPqHaLYwnXPrT6AMnKOLkiQ=
=NKzV
-END PGP SIGNATURE-






Bug#471970: closed by Micah Cowan [EMAIL PROTECTED] (Re: Bug#471970: wget -N and space in the path (HTML encoding))

2008-03-21 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Mathieu Malaterre wrote:
 I still do not believe this has anything to do with the server, if you
 have a couple of seconds please try this file instead:
 
 http://www.medical.philips.com/us/company/connectivity/assets/docs/dicomcs/ENsphere%2520DICOM%25203%2520Conformance%2520Statement.pdf
 http://www.medical.philips.com/us/company/connectivity/assets/docs/dicomcs/usit15l3_final.pdf

The first one is not a good link (note that it doesn't contain spaces;
it contains %2520, which decodes to a literal %20). It goes to a
"Page not found" page (which lacks a modification timestamp, as you can
see from the logs). However, the server issues a 200 HTTP status code
for that page, so Wget can't know that it's not a good file (it does,
however, notice that it received HTML and not PDF or
application/octet-stream).
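
The double encoding can be checked offline. The following is a small
sketch using Python's standard urllib.parse module; the variable names
are illustrative, and only the file name from the URLs in this thread is
used.

```python
from urllib.parse import quote, unquote

# File name as it appears in the problematic URL.
double_encoded = "ENsphere%2520DICOM%25203%2520Conformance%2520Statement.pdf"

# One round of decoding turns each %25 into a literal '%', leaving %20.
once = unquote(double_encoded)
# A second round finally yields real spaces.
twice = unquote(once)

print(once)   # ENsphere%20DICOM%203%20Conformance%20Statement.pdf
print(twice)  # ENsphere DICOM 3 Conformance Statement.pdf

# Going the other way: encoding the real name once gives the working URL
# component, and encoding it a second time reproduces the broken one.
real_name = "ENsphere DICOM 3 Conformance Statement.pdf"
print(quote(real_name) == once)                  # True
print(quote(quote(real_name)) == double_encoded)  # True
```

The string decoded once is exactly the local file name shown in the
first log, which suggests the URL was percent-encoded one time too many
before it was written to dummy.txt.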

In your other example, you corrected the percent-encoding, and lo! it
worked.

FWIW, it's helpful to run wget with --debug so you can see the headers
involved, in addition to other helpful information.
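
Once the headers are visible, the symptoms described above (a 200 status,
an HTML body where a PDF was expected, and no Last-Modified header) can
be checked mechanically. The sketch below is only a heuristic written for
illustration: the helper names parse_headers and looks_like_soft_404 are
hypothetical, the header blocks are modeled on the logs in this thread,
and the Last-Modified date and example.invalid URLs are invented
placeholders.

```python
from email.parser import HeaderParser
from typing import Dict


def parse_headers(raw: str) -> Dict[str, str]:
    """Parse a raw block of 'Name: value' lines into a lower-cased dict."""
    message = HeaderParser().parsestr(raw)
    return {name.lower(): value for name, value in message.items()}


def looks_like_soft_404(status: int, headers: Dict[str, str], url: str) -> bool:
    """Heuristic: an error page served with a 200 status for a PDF URL."""
    wants_pdf = url.lower().endswith(".pdf")
    content_type = headers.get("content-type", "").split(";")[0].strip().lower()
    return (
        status == 200
        and wants_pdf
        and content_type == "text/html"
        and "last-modified" not in headers
    )


if __name__ == "__main__":
    # Modeled on the first log: HTML body, no Last-Modified header.
    bad = parse_headers("Content-Type: text/html\nContent-Length: 1936\n")
    # Modeled on the second log; the date is a made-up placeholder.
    good = parse_headers(
        "Content-Type: application/octet-stream\n"
        "Content-Length: 217998\n"
        "Last-Modified: Mon, 01 Jan 2007 00:00:00 GMT\n"
    )
    print(looks_like_soft_404(200, bad, "http://example.invalid/a%2520b.pdf"))
    print(looks_like_soft_404(200, good, "http://example.invalid/usit15l3_final.pdf"))
```

A check like this is no substitute for reading the --debug output, but it
shows which two headers distinguish the failing request from the working
one.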

- --
HAND,
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFH5A9o7M8hyUobTrERAq7sAJ9MdOa5y+bTi2P2rDC2xywmPOu9/gCggwnr
ALk51+oYcQ9lrUjuDDzREVE=
=/E8u
-END PGP SIGNATURE-






Bug#471970: closed by Micah Cowan [EMAIL PROTECTED] (Re: Bug#471970: wget -N and space in the path (HTML encoding))

2008-03-21 Thread Mathieu Malaterre
Thanks. You can close the bug.

Sorry for the noise.
-- 
Mathieu






Bug#471970: closed by Micah Cowan [EMAIL PROTECTED] (Re: Bug#471970: wget -N and space in the path (HTML encoding))

2008-03-21 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Mathieu Malaterre wrote:
 Thanks. You can close the bug.
 
 Sorry for the noise.

No worries. :)

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFH5BTM7M8hyUobTrERApb0AKCO3+6R+icl6HXdsS5OnjJZIh5aTQCeNOrN
W6PD1av933wZhtfvofGjYYk=
=hKMz
-END PGP SIGNATURE-


