[Bug-wget] file download using Wget

2012-03-30 Thread Bidwell, Eugene x23787
Hi, I need to download a text file from a site
(http://www.theocc.com/webapps/position-limits) that uses a button which
runs a JavaScript routine to open a dialog box for downloading the file
to my PC over HTTP. Will the Wget utility let me retrieve that data from
a program?

Thanks




Re: [Bug-wget] Concurrency and wget

2012-03-30 Thread Giuseppe Scrivano
Micah Cowan  writes:

> http://wget.addictivecode.org/Concurrency

Lovely!

> Giuseppe, when you have the chance, please also update the information
> on the ideas page with this link.

Just done. I had also forgotten to change the mentor; it is fixed now :-)

Thanks,
Giuseppe



[Bug-wget] [patch] warc_uuid_str() implementation depend on HAVE_LIBUUID

2012-03-30 Thread Tim Ruehsen
Hi,

here is a patch for conditionally compiling warc_uuid_str depending on 
HAVE_LIBUUID.

Tim
=== modified file 'src/ChangeLog'
--- src/ChangeLog	2012-03-25 15:49:55 +
+++ src/ChangeLog	2012-03-30 15:40:57 +
@@ -1,3 +1,7 @@
+2012-03-30  Tim Ruehsen  
+
+	* warc.c: make warc_uuid_str() implementation depend on HAVE_LIBUUID
+
 2012-03-25  Giuseppe Scrivano  

 	* utils.c: Include .

=== modified file 'src/warc.c'
--- src/warc.c	2012-02-25 10:58:21 +
+++ src/warc.c	2012-03-30 15:41:06 +
@@ -580,15 +580,32 @@
   strftime (timestamp, 21, "%Y-%m-%dT%H:%M:%SZ", timeinfo);
 }

-/* Fills uuid_str with a UUID based on random numbers.
+#ifdef HAVE_LIBUUID
+/* Fills urn_str with a UUID in the format required
+   for the WARC-Record-Id header.
+   The string will be 47 characters long. */
+void
+warc_uuid_str (char *urn_str)
+{
+  char uuid_str[37];
+
+  uuid_t record_id;
+  uuid_generate (record_id);
+  uuid_unparse (record_id, uuid_str);
+
+  sprintf (urn_str, "<urn:uuid:%s>", uuid_str);
+}
+#else
+/* Fills urn_str with a UUID based on random numbers in the format
+   required for the WARC-Record-Id header.
    (See RFC 4122, UUID version 4.)

    Note: this is a fallback method, it is much better to use the
    methods provided by libuuid.

-   The uuid_str will be 36 characters long. */
-static void
-warc_uuid_random (char *uuid_str)
+   The string will be 47 characters long. */
+void
+warc_uuid_str (char *urn_str)
 {
   // RFC 4122, a version 4 UUID with only random numbers

@@ -605,32 +622,14 @@
   // clock_seq_hi_and_reserved to zero and one, respectively.
   uuid_data[8] = (uuid_data[8] & 0xBF) | 0x80;

-  sprintf (uuid_str,
-"%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-%02x%02x%02x%02x%02x%02x",
+  sprintf (urn_str,
+"<urn:uuid:%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-%02x%02x%02x%02x%02x%02x>",
 uuid_data[0], uuid_data[1], uuid_data[2], uuid_data[3], uuid_data[4],
 uuid_data[5], uuid_data[6], uuid_data[7], uuid_data[8], uuid_data[9],
 uuid_data[10], uuid_data[11], uuid_data[12], uuid_data[13], uuid_data[14],
 uuid_data[15]);
 }
-
-/* Fills urn_str with a UUID in the format required
-   for the WARC-Record-Id header.
-   The string will be 47 characters long. */
-void
-warc_uuid_str (char *urn_str)
-{
-  char uuid_str[37];
-
-# ifdef HAVE_LIBUUID
-  uuid_t record_id;
-  uuid_generate (record_id);
-  uuid_unparse (record_id, uuid_str);
-# else
-  warc_uuid_random (uuid_str);
-# endif
-
-  sprintf (urn_str, "<urn:uuid:%s>", uuid_str);
-}
+#endif

 /* Write a warcinfo record to the current file.
    Updates warc_current_warcinfo_uuid_str. */
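The 47-character length in the comments is the 10-character "<urn:uuid:" prefix, the 36-character textual UUID and the closing ">". As a minimal sketch of what the non-libuuid branch computes, assuming the caller already has 16 random bytes (how wget gathers them is not shown in this excerpt), the hypothetical helper format_warc_record_id() below sets the RFC 4122 version and variant bits and formats the result:

```c
#include <stdio.h>
#include <string.h>

/* "<urn:uuid:" (10) + 36-character UUID + ">" (1) */
#define WARC_RECORD_ID_LEN 47

/* Hypothetical helper, not part of wget: turns 16 random bytes into an
   RFC 4122 version 4 UUID wrapped as "<urn:uuid:...>", mirroring the
   fallback branch of warc_uuid_str().  out must hold at least
   WARC_RECORD_ID_LEN + 1 bytes. */
static void
format_warc_record_id (const unsigned char random_bytes[16], char *out)
{
  unsigned char u[16];
  memcpy (u, random_bytes, sizeof u);

  /* Version 4: the four most significant bits of
     time_hi_and_version become 0100. */
  u[6] = (u[6] & 0x0F) | 0x40;
  /* Variant: the two most significant bits of
     clock_seq_hi_and_reserved become 1 and 0. */
  u[8] = (u[8] & 0xBF) | 0x80;

  snprintf (out, WARC_RECORD_ID_LEN + 1,
            "<urn:uuid:%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
            "%02x%02x%02x%02x%02x%02x>",
            u[0], u[1], u[2], u[3], u[4], u[5], u[6], u[7],
            u[8], u[9], u[10], u[11], u[12], u[13], u[14], u[15]);
}
```

With all-zero input it produces <urn:uuid:00000000-0000-4000-8000-000000000000>, which shows where the fixed version (4) and variant (8) nibbles land regardless of the random data.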


Re: [Bug-wget] (Patch) Bug on latest wget (1.13.4)

2012-03-30 Thread Tim Rühsen
Hello Alejandro,

here is a patch that fixes the issue with empty HTTP queries.

But the website has two files that can't be loaded (404 Not Found). These
files won't be translated to local filenames. This is correct behaviour,
since these files do not exist locally.

Giuseppe, I put you in CC since I am not sure whether you read all
discussions on the list.

Tim

On Thursday 29 March 2012, Alejandro Supu wrote:
> Hi,
> 
> I have found a bug in the latest version of the HTTP client, wget 1.13.4
> 
> This is how to reproduce it:
> 
> If we save the page
> http://accionistaseinversores.bbva.com/TLBB/tlbb/bbvair/esp/index.jsp with
> the following command: wget -k -p
> http://accionistaseinversores.bbva.com/TLBB/tlbb/bbvair/esp/index.jsp
> 
> In the saved "main.css" file
> (\accionistaseinversores.bbva.com\TLBB\fbinir\css), there are references
> that point to the remote files instead of the saved ones! For example,
> lines 57, 68 and 79 point to
> http://accionistaseinversores.bbva.com/TLBB/fbinir/mult/stagsans-light-webfont.eot
> instead of ../mult/stagsans-book-webfont.eot, even though this file was
> saved locally. Other files show the same behaviour.
> 
> If you search for the string "http" in the CSS file, you will find all the
> references that point to remote files instead of the locally SAVED ones.
> 
> Please let me know anything related to this bug, or when it will be fixed.
> 
> THANKS!
-- 
OMS Open Media System GmbH
Holzdamm 40
20099 Hamburg
Fon +49-40-238878-40
Fax +49-40-238878-99
Email tim.rueh...@openmediasystem.de
Sitz und Registergericht Hamburg
HRB 57616
=== modified file 'src/ChangeLog'
--- src/ChangeLog	2012-03-25 15:49:55 +
+++ src/ChangeLog	2012-03-30 09:18:54 +
@@ -1,3 +1,7 @@
+2012-03-30  Tim Ruehsen  
+
+	* url.c: use empty query in local filenames
+
 2012-03-25  Giuseppe Scrivano  
 
 	* utils.c: Include .

=== modified file 'src/url.c'
--- src/url.c	2011-01-01 12:19:37 +
+++ src/url.c	2012-03-30 09:14:56 +
@@ -1502,7 +1502,7 @@
 {
   struct growable fnres;/* stands for "file name result" */
 
-  const char *u_file, *u_query;
+  const char *u_file;
   char *fname, *unique;
   char *index_filename = "index.html"; /* The default index file is index.html */
 
@@ -1561,12 +1561,11 @@
   u_file = *u->file ? u->file : index_filename;
   append_uri_pathel (u_file, u_file + strlen (u_file), false, &fnres);
 
-  /* Append "?query" to the file name. */
-  u_query = u->query && *u->query ? u->query : NULL;
-  if (u_query)
+  /* Append "?query" to the file name, even if empty */
+  if (u->query)
 	{
 	  append_char (FN_QUERY_SEP, &fnres);
-	  append_uri_pathel (u_query, u_query + strlen (u_query),
+	  append_uri_pathel (u->query, u->query + strlen (u->query),
 			 true, &fnres);
 	}
 }
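The patch changes how an empty query is treated: before, the condition u->query && *u->query handled a URL ending in a bare "?" exactly like one with no query at all; now any non-NULL query appends the separator. A minimal sketch of that distinction, using a hypothetical local_name_sketch() that assumes "?" as the separator and skips the escaping that append_uri_pathel() performs:

```c
#include <stdlib.h>
#include <string.h>

/* Assumption: wget uses FN_QUERY_SEP, which is '?' on Unix-like
   systems; Windows builds use a different character. */
#define QUERY_SEP '?'

/* Hypothetical helper mirroring the patched logic in url.c: a NULL
   query means the URL had no '?', while an empty string means it
   ended in '?'.  Returns a malloc'd string the caller must free,
   or NULL if allocation fails. */
static char *
local_name_sketch (const char *file, const char *query)
{
  size_t flen = strlen (file);
  size_t qlen = query ? strlen (query) : 0;
  /* file + optional separator and query + terminating NUL */
  char *name = malloc (flen + (query ? 1 + qlen : 0) + 1);
  if (!name)
    return NULL;

  memcpy (name, file, flen);
  if (query)  /* before the patch: query && *query */
    {
      name[flen] = QUERY_SEP;
      memcpy (name + flen + 1, query, qlen);
      flen += 1 + qlen;
    }
  name[flen] = '\0';
  return name;
}
```

For example, local_name_sketch ("main.css", "") yields "main.css?", whereas the pre-patch condition would have produced plain "main.css".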



[Bug-wget] Concurrency and wget

2012-03-30 Thread Micah Cowan
I had promised to write up some information about what is needed to
support concurrent downloads in Wget, with a particular focus on GSoC
project possibilities.

I did not manage to do this in a timely manner; I was in Utah for a
week, where I'd expected to have more time on my hands than I actually
did. I've written up the basic needs now, however, and this information
may be found at

http://wget.addictivecode.org/Concurrency

Giuseppe, when you have the chance, please also update the information
on the ideas page with this link.

-mjc



Re: [Bug-wget] patch to fix some types of warnings

2012-03-30 Thread Tim Ruehsen
Hello Giuseppe,

On Thursday 29 March 2012, Giuseppe Scrivano wrote:
> Hello Tim,
> 
> Tim Ruehsen  writes:
> > function declaration isn't a prototype [-Wstrict-prototypes]
> > no previous prototype for 'convert_links_in_hashtable' [-Wmissing-prototypes]
> > suggest braces around empty body in an 'else' statement [-Wempty-body]
> > 
> > please apply it to the repository.
> 
> please provide a ChangeLog entry for these changes.  Look at other
> entries in the ChangeLog file to see how it should be done.

Sorry, added now.
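The warnings quoted above are GCC's -Wstrict-prototypes, -Wmissing-prototypes and -Wempty-body. A minimal sketch of the two prototype fixes listed in the ChangeLog, with hypothetical function names (feature_init, double_it) standing in for functions such as ssl_init() and convert_links_in_hashtable():

```c
/* -Wstrict-prototypes: before C23, "int feature_init ();" leaves the
   parameter list unspecified, so calls are not type-checked.  Writing
   "(void)" makes it a real prototype, the same kind of fix the
   ChangeLog lists for ssl_init().  In a real project this declaration
   would live in a header. */
int feature_init (void);

int
feature_init (void)
{
  return 1;
}

/* -Wmissing-prototypes: GCC warns when a global (non-static) function
   is defined without an earlier prototype.  A helper used only inside
   its own file can be made static instead, as the patch does for
   convert_links_in_hashtable(); static functions are not checked. */
static int
double_it (int x)
{
  return 2 * x;
}
```

The other route for -Wmissing-prototypes, used in the patch for get_urls_css() and convert_to_bits(), is to declare the function in a header that the defining file also includes.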

> 
> >    else
> > +    {
> >        /* Error in expiration spec.  Assume default (cookie doesn't
> >           expire, but valid only for this session.)  */
> > -      ;
> > +    }
> >      }
> >    else if (TOKEN_IS (name, "max-age"))
> >      {
> > @@ -434,8 +435,9 @@
> >          cookie->secure = 1;
> >        }
> >    else
> > +    {
> >        /* Ignore unrecognized attribute. */
> > -      ;
> > +    }
> 
> I would rather move these comments near the if and explain what happens
> in the particular case.  An empty branch is quite ugly.

You are right. I removed both empty branches.
In the first case, I moved/changed the comment near the if.
For the second case, I left the comment at the bottom, since one would expect
it there. The new comment looks like /* else: Ignore... */
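A minimal sketch of the restructured expiry check, using a hypothetical, trimmed-down cookie struct rather than wget's real struct cookie: the comment moves above the if, and the empty else branch that triggered -Wempty-body disappears without changing behaviour.

```c
#include <time.h>

/* Hypothetical, trimmed-down cookie: only the fields this sketch needs. */
struct cookie_sketch
{
  int permanent;        /* nonzero if the cookie outlives the session */
  time_t expiry_time;   /* meaningful only when permanent is set */
};

/* Applies a parsed "expires" value.  (time_t) -1 is the failure value
   the cookies.c hunk checks for after calling http_atotm(). */
static void
apply_expires (struct cookie_sketch *cookie, time_t expires)
{
  /* Check if the expiration spec is valid.  If not, keep the default:
     the cookie doesn't expire, but is valid only for this session. */
  if (expires != (time_t) -1)
    {
      cookie->permanent = 1;
      cookie->expiry_time = expires;
    }
  /* No "else ;" branch: nothing needs to happen for an invalid spec,
     so there is nothing for -Wempty-body to flag. */
}
```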

> 
> > +# ifndef HAVE_LIBUUID
> >  /* Fills uuid_str with a UUID based on random numbers.
> >     (See RFC 4122, UUID version 4.)
> > @@ -612,6 +613,7 @@
> >      uuid_data[10], uuid_data[11], uuid_data[12], uuid_data[13],
> >      uuid_data[14], uuid_data[15]);
> >  }
> > +#endif
> 
> Please provide it as a separate patch.

I separated it and will provide it later.

Tim
=== modified file 'src/ChangeLog'
--- src/ChangeLog	2012-03-25 15:49:55 +
+++ src/ChangeLog	2012-03-30 08:12:28 +
@@ -1,3 +1,24 @@
+2012-03-30  Tim Ruehsen  
+
+	* convert.c: made convert_links_in_hashtable() static
+	* cookies.c: removed empty else branches
+	* css-url.c: include css-url.h, made get_uri_string() static
+	* css-url.h: added prototype for get_urls_css()
+	* gnutls.c: prototyped declaration of ssl_init()
+	* html-parse.c: made tagstack_push(),tagstack_pop(),tagstack_find() static
+	* html-url.c: made cleanup_html_url() static
+	* progress.c: made count_cols(),get_eta() static
+	* retr.h: removed prototype convert_to_bits() (moving to util.h)
+	* util.h: added prototype convert_to_bits()
+	* spider.c: made spider_cleanup() static
+	* warc.c: prototyped declaration of warc_write_start_record(),
+	warc_write_end_record(),warc_start_cdx_file(),warc_init(),
+	warc_load_cdx_dedup_file(),warc_write_metadata(),warc_close(),
+	warc_tempfile().
+	made warc_write_warcinfo_record(),warc_load_cdx_dedup_file(),
+	warc_write_metadata() static
+	* warc.h: fixed prototypes for warc_init(),warc_close(),warc_tempfile()
+
 2012-03-25  Giuseppe Scrivano  

 	* utils.c: Include .

=== modified file 'src/convert.c'
--- src/convert.c	2011-01-01 12:19:37 +
+++ src/convert.c	2012-03-27 15:11:05 +
@@ -58,7 +58,7 @@
 static void convert_links (const char *, struct urlpos *);


-void
+static void
 convert_links_in_hashtable (struct hash_table *downloaded_set,
 int is_css,
 int *file_count)

=== modified file 'src/cookies.c'
--- src/cookies.c	2011-08-02 20:58:38 +
+++ src/cookies.c	2012-03-30 07:54:14 +
@@ -391,6 +391,8 @@
 goto error;
   BOUNDED_TO_ALLOCA (value.b, value.e, value_copy);

+  /* Check if expiration spec is valid.
+ If not, assume default (cookie doesn't expire, but valid only for this session.) */
   expires = http_atotm (value_copy);
   if (expires != (time_t) -1)
 {
@@ -402,10 +404,6 @@
   if (cookie->expiry_time < cookies_now)
 cookie->discard_requested = 1;
 }
-  else
-/* Error in expiration spec.  Assume default (cookie doesn't
-   expire, but valid only for this session.)  */
-;
 }
   else if (TOKEN_IS (name, "max-age"))
 {
@@ -433,9 +431,7 @@
   /* ignore value completely */
   cookie->secure = 1;
 }
-  else
-/* Ignore unrecognized attribute. */
-;
+  /* else: Ignore unrecognized attribute. */
 }
   if (*ptr)
 /* extract_param has encountered a syntax error */

=== modified file 'src/css-url.c'
--- src/css-url.c	2011-01-01 12:19:37 +
+++ src/css-url.c	2012-03-27 15:17:02 +
@@ -55,6 +55,7 @@
 #include "convert.h"
 #include "html-url.h"
 #include "css-tokens.h"
+#include "css-url.h"

 /* from lex.yy.c */
 extern char *yytext;
@@ -107,7 +108,7 @@
   whitespace after the opening parenthesis and before the