[Bug-wget] wget and srcset tag

2017-06-11 Thread chris
Hi,

I'm just wondering if I've possibly found a bug, unless I'm just doing
something incorrectly (which I assume is more likely).

I grab my webpage using 'wget -T1 -t1 -E -k -H -nd -N -p -P site_output
https://www.anfractuosity.com/projects/ultrasound-networking/ > note1 2>
note2'

But I notice that the srcset attributes in the downloaded output index.html
come out as 'srcset="fsk.png.html 533w, fsk-266x300.png 266w"
sizes="(max-width: 533px) 100vw, 533px" />'.

On the actual webpage it looks like
'srcset="https://www.anfractuosity.com/wp-content/uploads/2014/02/fft.png 762w,'
with no .html extension on the .png.

Cheers
Chris


Re: [Bug-wget] wget and srcset tag

2017-06-12 Thread Tim Rühsen
Hi Chris,


On 06/11/2017 05:24 PM, chris wrote:
> Hi,
> 
> I'm just wondering if I've possibly found a bug, unless I'm just doing
> something incorrectly (which I assume is more likely).
> 
> I grab my webpage using 'wget -T1 -t1 -E -k -H -nd -N -p -P site_output
> https://www.anfractuosity.com/projects/ultrasound-networking/ > note1 2>
> note2'
> 
> But i notice the srcset tags in the resulting downloaded files produce
> 'srcset="fsk.png.html 533w, fsk-266x300.png 266w" sizes="(max-width: 533px)
> 100vw, 533px" />' in the output index.html.
> 
> On the actual webpage it looks like "srcset="
> https://www.anfractuosity.com/wp-content/uploads/2014/02/fft.png 762w,"
> no .html extension on the .png.

You requested -E (--adjust-extension) and -k (--convert-links).
That would change the file name when the server tags the file as
content-type 'text/html'. You could see that in the debug output
(options -d or --debug).
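
As a rough illustration of the rule described above, here is a simplified
shell model of how -E picks the local file name. adjusted_name is a
hypothetical helper, not part of wget, and it only covers the HTML case
(wget's real logic handles more, for example CSS).

```shell
# Simplified model of the -E (--adjust-extension) naming rule for HTML.
# $1 = local file name, $2 = Content-Type reported by the server.
adjusted_name() {
    case "$2" in
        text/html*|application/xhtml+xml*)
            case "$1" in
                # Name already ends in .htm or .html: keep it unchanged.
                *.[Hh][Tt][Mm]|*.[Hh][Tt][Mm][Ll]) printf '%s\n' "$1" ;;
                # Otherwise append .html.
                *) printf '%s.html\n' "$1" ;;
            esac ;;
        # Any other content type: leave the name alone.
        *) printf '%s\n' "$1" ;;
    esac
}
```

With this model, 'adjusted_name fsk.png text/html' gives fsk.png.html, while
'adjusted_name fsk.png image/png' leaves the name as fsk.png.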

> 
> Cheers
> Chris
> 

With Best Regards, Tim





Re: [Bug-wget] wget and srcset tag

2017-06-12 Thread chris
Hi Tim,

Thanks for your reply, I notice the following in the debug logs:

"""
will convert url
http://www.anfractuosity.com/wp-content/uploads/2014/02/fsk.png to local
site_output/fsk.png
will convert url
https://www.anfractuosity.com/wp-content/uploads/2014/02/fsk.png to local
site_output/fsk.png.html
"""

The difference between those URLs seems to be one is https and one isn't.
When I wget those URLs though, both seem to return a .png, with 'Length:
51068 (50K) [image/png]'.

So I'm a bit confused why I get the fsk.png.html URL.
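
To pick such entries out of a longer debug log, something like the following
sketch may help. It assumes the log contains "will convert url ... to local
..." entries as quoted above (possibly line-wrapped) and that the debug
output was captured in a file; suspect_conversions is a hypothetical helper
name.

```shell
# Print conversions where the local name gained a .html suffix although the
# URL itself does not end in .htm or .html.
suspect_conversions() {
    # Put one whitespace-separated token per line so wrapped entries still match.
    tr -s '[:space:]' '\n' | awk '
        # Token following "will convert url" is the URL.
        prev3 == "will" && prev2 == "convert" && prev1 == "url" { url = $0 }
        # Token following "to local" is the local file name.
        prev2 == "to" && prev1 == "local" && url != "" {
            if ($0 ~ /\.html$/ && url !~ /\.html?$/) print url " -> " $0
            url = ""
        }
        { prev3 = prev2; prev2 = prev1; prev1 = $0 }
    '
}
```

Usage would be 'suspect_conversions < note2', assuming note2 holds the output
of a run with -d added.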

cheers
Chris



Re: [Bug-wget] wget and srcset tag

2017-06-12 Thread Tim Rühsen
On 06/12/2017 10:27 AM, chris wrote:
> Hi Tim,
> 
> Thanks for your reply, I notice the following in the debug logs:
> 
> """
> will convert url
> http://www.anfractuosity.com/wp-content/uploads/2014/02/fsk.png to local
> site_output/fsk.png
> will convert url
> https://www.anfractuosity.com/wp-content/uploads/2014/02/fsk.png to local
> site_output/fsk.png.html
> """
> 
> The difference between those URLs seems to be one is https and one isn't.
> When I wget those URLs though, both seem to return a .png, with 'Length:
> 51068 (50K) [image/png]'.
> 
> So I'm a bit confused why I get the fsk.png.html URL.

What version of wget are you using? (1.19.1 here)

I tried some combinations of srcset (with https and http) and your
original options. I suspected an issue with redirection (because a
redirect response comes with a text/html Content-Type).

Could you create a small reproducer page? E.g. like

<img srcset="https://www.anfractuosity.com/wp-content/uploads/2014/02/fsk.png 533w,
http://www.anfractuosity.com/wp-content/uploads/2014/02/fsk-266x300.png 266w">


With whatever paths you are using for the .png files.
I don't want to download tons of files (limited bandwidth here).
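
A minimal sketch of generating such a page from the shell follows.
write_reproducer is a hypothetical helper and the surrounding markup is an
assumption; only the two image URLs come from this thread.

```shell
# Write a tiny HTML page with a single srcset attribute to the file named in $1.
write_reproducer() {
    printf '%s\n' \
        '<!DOCTYPE html>' \
        '<html>' \
        '<body>' \
        '<img srcset="https://www.anfractuosity.com/wp-content/uploads/2014/02/fsk.png 533w, http://www.anfractuosity.com/wp-content/uploads/2014/02/fsk-266x300.png 266w">' \
        '</body>' \
        '</html>' > "$1"
}
```

For example, 'write_reproducer test.html' creates the page, which can then be
uploaded and fetched with the original options.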






Re: [Bug-wget] wget and srcset tag

2017-06-12 Thread Chris
Hi Tim,

I just created a test page at
https://www.anfractuosity.com/files/test2.html
where I still get the issue.

The version is 'GNU Wget 1.19.1 built on linux-gnu.'

cheers
Chris




Re: [Bug-wget] wget and srcset tag

2017-06-12 Thread Tim Rühsen
On Monday, 12 June 2017 17:07:30 CEST Chris wrote:
> Hi Tim,
> 
> I just created a test page at -
> https://www.anfractuosity.com/files/test2.html
> were I still get the issue.
> 
> The version is 'GNU Wget 1.19.1 built on linux-gnu.'

Thanks, Chris.

The issue is reproducible with latest git, thanks to your test page. 
I'll create a test case tomorrow and then we'll fix it.
It has something to do with If-Modified-Since. If you use 
--no-if-modified-since 
the links are converted correctly.
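
As a sketch of the workaround, the command from the original report could be
wrapped like this. fetch_page and the DRY_RUN switch are hypothetical
additions for illustration; the wget options are the ones from the report
plus --no-if-modified-since, which makes -N send a preliminary HEAD request
instead of an If-Modified-Since header.

```shell
# Run the original command with the workaround option added.
# $1 = URL, $2 = output directory (defaults to site_output).
# Set DRY_RUN=1 to print the command instead of running it.
fetch_page() {
    url=$1
    outdir=${2:-site_output}
    set -- wget -T1 -t1 -E -k -H -nd -N -p --no-if-modified-since -P "$outdir" "$url"
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

For example: 'fetch_page https://www.anfractuosity.com/files/test2.html'.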

The good news is: Wget2 (https://gitlab.com/gnuwget/wget2) does it
correctly :-)

With Best Regards, Tim






Re: [Bug-wget] wget and srcset tag

2017-06-13 Thread Tim Rühsen
Fixed in current master (the fix will be released in 1.19.2).

Thanks for your report and help!


With Best Regards, Tim





