This is my first attempt to fix 40426. I'm expecting to continue to work on it
until it's fixed, unless someone has objections.
Source of the problem:
When -r and -O - are specified together, Wget hangs. Well, it really doesn't
hang, it's actually waiting for input. When Wget downloads a file it saves it
to the disk first and then reads it again from there to parse it and get more
URLs, if any. But when '-O -' was specified, Wget reads from stdin. This is
done in wget_read_file(). As a funny game, when this happens, type some HTML,
and then hit Ctrl-D or any other key sequence equivalent to EOF, and you'll
effectively trick Wget into inserting arbitrary HTML.
Solution:
A patch was already proposed that simply avoided '-r' and '-O -' to be set together. But that would
void a clause in the documentation that states that "wget -O file http://foo" is intended to
work like "wget -O - http://foo > file". So, my approach is to maintain that.
Initially I thought of redirecting stdout to stdin, but that would be an ugly
hack that would probably make things difficult in the future. What's more,
there are options that rely on stdin, such as '--input-file=-'.
So, what I did is to write to a regular file in /tmp/.wget.stdout instead of in
stdout when '-O -' is passed, and dump everything in the end. This, apart from
fixing the bug, maintains the documented behavior.
Although this is a good workaround for the short term, IMO the best approach is
to keep everything in memory and write stuff out in the end. Something similar
was already reported at 20714.
I don't expect this patch to be definite. This is just a follow-up to my work.
I'm looking forward to your feedback as I sure haven't taken into account all
the consequences.
So far, the following are still to be done:
- /tmp/.wget.stdout won't work on Windows, so port it.
- FOPEN_OPT_ARGS is not taken into account.
- I only took into account HTTP. The same workaround should be applied to FTP
too.
Here goes:
diff --git a/src/http.c b/src/http.c
index b7020ef..bc7c1e9 100644
--- a/src/http.c
+++ b/src/http.c
@@ -3080,7 +3080,7 @@ http_loop (struct url *u, struct url *original_url, char
**newloc,
/* Set LOCAL_FILE parameter. */
if (local_file && opt.output_document)
- *local_file = HYPHENP (opt.output_document) ? NULL : xstrdup
(opt.output_document);
+ *local_file = HYPHENP (opt.output_document) ? xstrdup (TMP_OUTFILE) :
xstrdup (opt.output_document);
/* Reset NEWLOC parameter. */
*newloc = NULL;
@@ -3101,7 +3101,7 @@ http_loop (struct url *u, struct url *original_url, char
**newloc,
if (opt.output_document)
{
- hstat.local_file = xstrdup (opt.output_document);
+ hstat.local_file = HYPHENP (opt.output_document) ? xstrdup (TMP_OUTFILE)
: xstrdup (opt.output_document);
got_name = true;
}
else if (!opt.content_disposition)
diff --git a/src/main.c b/src/main.c
index b23967b..1bf7565 100644
--- a/src/main.c
+++ b/src/main.c
@@ -1582,7 +1582,17 @@ for details.\n\n"));
#ifdef WINDOWS
_setmode (_fileno (stdout), _O_BINARY);
#endif
- output_stream = stdout;
+ // TODO We should take care of FOPEN_OPT_ARGS
+ output_stream = fopen (TMP_OUTFILE, "wb+");
+ if (output_stream == NULL)
+ {
+ perror (TMP_OUTFILE);
+ exit (WGET_EXIT_GENERIC_ERROR);
+ }
+ /*
+ * We know it's a regular file. No need to check.
+ */
+ output_stream_regular = true;
}
else
{
@@ -1762,8 +1772,24 @@ outputting to a regular file.\n"));
if (opt.convert_links && !opt.delete_after)
convert_all_links ();
+ /* If output file is stdout (`-O -' was specified), obey.
+ Print stuff out. */
+ if (HYPHENP (opt.output_document) && output_stream && (total_downloaded_bytes
> 0))
+ {
+ struct file_memory *fm = wget_read_file (TMP_OUTFILE);
+ if (fm)
+ {
+ write (fileno (stdout), fm->content, fm->length);
+ wget_read_file_free (fm);
+ }
+ }
+
cleanup ();
+ /* Delete the temporary output file */
+ if (HYPHENP (opt.output_document) && unlink (TMP_OUTFILE))
+ logprintf (LOG_NOTQUIET, "Temporary output file was not deleted. Should be done
by hand. %s\n", strerror (errno));
+
exit (get_exit_status ());
}
diff --git a/src/wget.h b/src/wget.h
index 8d2b0f1..9d40255 100644
--- a/src/wget.h
+++ b/src/wget.h
@@ -126,6 +126,12 @@ as that of the covered work. */
#define DEBUGP(args) do { IF_DEBUG { debug_logprintf args; } } while (0)
+/* If we're outputting to stdout ('-O -' was specified), we'd rather
+ write to a temporary file and output everything in the end, since
+ Wget still needs the downloaded files to be regular in order to
+ parse them. */
+#define TMP_OUTFILE "/tmp/.wget.stdout"
+
/* Pick an integer type large enough for file sizes, content lengths,
and such. Because today's files can be very large, it should be a
signed integer at least 64 bits wide. This can't be typedeffed to
--
Regards,
- AJ