Hi,
I am having some trouble with a wget task that is performing an HTTPS
download of a large file (a 2.9GB 7Zip archive) every night. It is
scheduled to run as part of an SSIS package, using the SSIS Execute
Process task. The wget arguments passed are as follows (I've replaced
sensitive information with the "x" character):
--http-user=SMP\xxxxxxxx --http-passwd=xxxxxxxx
--output-document=C:\DB_Downloads\xxxxxxxx.7z
--output-file=C:\DB_Downloads\log.txt --continue --timeout=300
--tries=20 --no-check-certificate
https://download.xxxxxxxx.net/xxxxxxxx/xxxxxxxx.7z
In a typical run, the SSIS package starts and the wget task begins
logging information about the progress of the download. After
some random interval (there is no discernible pattern in terms of time),
the progress of the download will stop. For example, the download may
start at 01:00 and stop at 01:07, as indicated by the modified date/time
of the log file. Despite the timeout switch telling it to give up and
try again after 5 minutes of no data received, this never happens. The
wget task has been observed to run for hours (e.g. until after 9am)
without progressing the download by a single byte and also not timing
out or reporting any kind of error. The only way I can force a restart
is to put a timeout setting on the SSIS Execute Process task and use SQL
Server Agent to perform a TASKKILL on any active wget processes and
restart the package. Invariably, when a new instance of the SSIS package
starts, the download continues from where it left off and carries on,
perhaps to completion, perhaps not.
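
For illustration, the kill-and-restart workaround described above could be folded into a single watchdog script instead of several SQL Server Agent job steps. The following is only a rough Python sketch and not something I am running: the function names, the polling interval and the idea of watching the size of the output file are all assumptions made for the example. It relies on wget being started with --continue so that each restart resumes the partial file.

```python
import os
import subprocess
import time


def is_stalled(last_change, now, stall_seconds):
    """Return True when no growth has been seen for at least stall_seconds."""
    return (now - last_change) >= stall_seconds


def file_size(path):
    """Size of path in bytes, or 0 if the file does not exist yet."""
    try:
        return os.path.getsize(path)
    except OSError:
        return 0


def run_with_watchdog(cmd, watched_path, stall_seconds=300,
                      max_restarts=20, poll_seconds=5):
    """Run cmd, killing and restarting it whenever watched_path stops growing.

    Returns the exit code of the run that finished on its own,
    or None if every attempt had to be killed.
    """
    for _attempt in range(max_restarts + 1):
        proc = subprocess.Popen(cmd)
        last_size = file_size(watched_path)
        last_change = time.monotonic()
        while proc.poll() is None:
            time.sleep(poll_seconds)
            size = file_size(watched_path)
            now = time.monotonic()
            if size != last_size:
                # The file grew, so the download is still making progress.
                last_size, last_change = size, now
            elif is_stalled(last_change, now, stall_seconds):
                # Equivalent of TASKKILL, limited to this one process.
                proc.kill()
                proc.wait()
                break
        else:
            # The process exited without being killed.
            return proc.returncode
    return None


# Hypothetical usage, with the same arguments as in the command above:
#   run_with_watchdog(["wget.exe", "--continue", ...], r"C:\DB_Downloads\...")
```

The stall check is kept as a separate function so that the threshold logic can be tested without starting a real download.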
I have run a number of ping tests over recent nights, which reveal that
there is no significant loss of connectivity with the HTTPS server.
I am able to achieve a successful download of the file if I configure
enough SQL Server Agent job steps to kill wget and launch another
instance of the SSIS package. Sometimes the file will download
successfully with one or two runs of the SSIS package, sometimes it
needs up to five attempts. SSIS is currently configured to allow 90
minutes for the download and the client site has a 10Mbps leased line to
the Internet. There's no way to predict how much of the 90 minutes will
be spent actively progressing the download and how much will be spent
idle. For example, download attempt 1 may last for 30 minutes then
become idle, download attempt 2 may last for the full 90 minutes,
download attempt 3 may last for just 1 minute, then download attempt 4
may successfully complete the download within a further 20 minutes.
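
To put the 90 minute window in context, here is a quick best-case calculation from the figures quoted above. It is only a lower bound: it assumes 2.9GB means 2.9 x 10^9 bytes and that the full 10Mbps is available to the download with no protocol overhead, neither of which I have verified.

```python
def min_transfer_minutes(size_bytes, line_bits_per_second):
    """Best-case transfer time in minutes, ignoring all protocol overhead."""
    return size_bytes * 8 / line_bits_per_second / 60


size_bytes = 2.9e9   # assumption: decimal gigabytes
line_rate = 10e6     # 10Mbps leased line, assumed fully available

best_case = min_transfer_minutes(size_bytes, line_rate)
print(f"{best_case:.1f} minutes")  # prints "38.7 minutes"
```

On those assumptions, a single uninterrupted run needs a little under 40 minutes, so the 90 minute allowance only fails when the download sits idle for a large part of the window.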
I am using the Win32 wget.exe from SourceForge and it's been working
fine for the best part of two years. I think something has changed on
the side of the download server but this is the domain of a third party.
I've wondered if they have recently put in some kind of IPS that is
randomly blocking the data packets but this is just speculation and the
third party involved claim nothing has changed on their side. My
question is: why does the timeout switch not help in this scenario and
is this therefore a bug?
I have log files (with debug output) available, as well as the wgetrc
file if this is useful. Any help or advice will be much appreciated.
Thanks,
Ian
--
- [Bug-wget] Timeout switch not working Ian Bradshaw