URL:
<https://savannah.gnu.org/bugs/?60442>
Summary: On 416 (CONTENT_RANGE_NOT_SATISFIABLE), wget sets
the content type from the error response, not from the original URL's type.
Project: GNU Wget
Submitted by: None
Submitted on: Thu 22 Apr 2021 07:31:18 PM UTC
Category: Crash/Freeze/Infloop
Severity: 3 - Normal
Priority: 5 - Normal
Status: None
Privacy: Public
Assigned to: None
Originator Name: jake
Open/Closed: Open
Release: None
Discussion Lock: Any
Operating System: GNU/Linux
Reproducibility: Every Time
Fixed Release: None
Planned Release: None
Regression: None
Work Required: None
Patch Included: None
_______________________________________________________
Details:
This looks like it has been there for a while.
Do a recursive wget (-r) with continue (-c) set, against an HTTP site.
If there is still data to be retrieved, you get the real content type.
If there is not, the server answers 416 and you potentially get an HTML error page.
If that happens, wget takes the error page's content type and tries to parse
additional URLs out of the original, already downloaded file, which runs out of
memory (OOM) on large binaries.
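The failure comes down to which Content-Type is trusted when deciding whether to scan a file for links. A minimal sketch of the intended decision, assuming a hypothetical `effective_content_type` helper (not wget code) that ignores the Content-Type of a 416 error body and falls back to whatever type was known for the file before. The set of link-bearing media types is also an assumption; wget's real list differs (it scans CSS too, for example).

```python
from typing import Optional

# Media types this sketch treats as link-bearing (illustrative assumption,
# not wget's actual list).
LINK_BEARING_TYPES = {"text/html", "application/xhtml+xml"}


def effective_content_type(status: int,
                           response_type: Optional[str],
                           previous_type: Optional[str]) -> Optional[str]:
    """Pick the Content-Type that describes the file on disk.

    Hypothetical helper. On 416 the response body is an error page, so its
    Content-Type says nothing about the local file; reuse the earlier type.
    """
    if status == 416:
        return previous_type  # may be None if it was never recorded
    return response_type


def should_parse_links(content_type: Optional[str]) -> bool:
    """Return True only if the file should be scanned for more URLs."""
    if not content_type:
        # Unknown type: do not feed a possibly huge binary to the HTML parser.
        return False
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in LINK_BEARING_TYPES
```

With a 416 carrying `text/html` for a file previously seen as `application/octet-stream`, `should_parse_links(effective_content_type(416, "text/html", "application/octet-stream"))` is `False`, so the binary is never parsed. Returning `False` for an unknown type is a deliberately conservative choice.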
Ugly fix: on continue, request from (currentsize - 10) instead, so the server always
has bytes to send back and the Content-Type header matches the original file.
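The "ugly fix" can be sketched as two small steps: build a `Range` header that re-requests the last few bytes, then drop that overlap from the 206 body before appending. The function names and the 10-byte default below are assumptions for illustration. Checking that the overlap matches the local tail is an extra safeguard not in the report. A real client must still check the status code, since a server may ignore `Range` and reply 200 with the whole file, and a zero-length remote file would still yield 416.

```python
def overlap_range_header(local_size: int, overlap: int = 10) -> str:
    """Build a Range header value that re-requests the last `overlap` bytes.

    Even when the local copy is already complete, the server has bytes to
    return (206), so its Content-Type describes the real file.
    """
    if local_size < 0 or overlap < 0:
        raise ValueError("sizes must be non-negative")
    start = max(0, local_size - overlap)
    return f"bytes={start}-"


def bytes_to_append(local_tail: bytes, body: bytes) -> bytes:
    """Strip the re-downloaded overlap from a 206 body.

    local_tail must be the local file's bytes from the requested start
    offset to its end (at most `overlap` bytes).
    """
    if not body.startswith(local_tail):
        # Overlap differs: the remote file changed, so appending is unsafe.
        raise ValueError("overlap mismatch; remote file appears to have changed")
    return body[len(local_tail):]
```

For a complete 1000-byte local file, the request carries `Range: bytes=990-`, the server returns those 10 bytes with the file's real Content-Type, and `bytes_to_append` yields `b""`, so nothing is appended.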
Proper fix: maybe send a HEAD request for the URL and take the content type from that?
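The "proper fix" amounts to asking the server for the resource's headers without its body. A minimal sketch using Python's standard `urllib.request` (wget itself is written in C, so this only illustrates the HTTP exchange, not a patch). `build_head_request`, `media_type` and `head_content_type` are hypothetical names introduced here.

```python
import urllib.request
from typing import Optional


def build_head_request(url: str) -> urllib.request.Request:
    """A HEAD request returns the same headers as GET, without the body."""
    return urllib.request.Request(url, method="HEAD")


def media_type(header_value: Optional[str]) -> Optional[str]:
    """Reduce 'text/html; charset=UTF-8' to 'text/html'; None if absent."""
    if not header_value:
        return None
    return header_value.split(";", 1)[0].strip().lower() or None


def head_content_type(url: str, timeout: float = 30.0) -> Optional[str]:
    """Fetch the URL's Content-Type via HEAD.

    urlopen raises urllib.error.HTTPError for 4xx/5xx answers; the caller
    decides how to handle those.
    """
    with urllib.request.urlopen(build_head_request(url), timeout=timeout) as resp:
        return media_type(resp.headers.get("Content-Type"))
```

The trade-off is one extra round trip per continued file, and some servers answer HEAD differently from GET, so the overlap trick above may be more robust in practice.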
hth.
Jake
_______________________________________________________
Reply to this item at:
<https://savannah.gnu.org/bugs/?60442>
_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/