Re: [Bug-wget] General Testsuite issue

2014-01-24 Thread Darshit Shah
On Thu, Jan 23, 2014 at 2:47 PM, Tim Ruehsen tim.rueh...@gmx.de wrote:

 On Thursday 23 January 2014 09:03:36 Darshit Shah wrote:
Do you have any ideas that we
could use to implement this?
  
   Well, yes. Read everything in a straight order, beginning at
  
   SYSCONFDIRwgetrc
  
   following 'chooseconfig' directives within ('include' would be a more
   intuitive name).
  
   Now parse the command line, whenever a --config is found, read that
 file
   (again following 'chooseconfig' directives).
   A --reset-config would allow the user to reset all configuration
 variables
   to
   the built-in default at any time.
  
   The 'chooseconfig' directive and the --config CLI option should allow
   wildcards (globbing with tilde, if available).
   A max. recursion level prevents infinite loops (e.g. 'a' includes 'b'
 and
   'b'
   includes 'a').
  
   This would allow a /etc/wgetrc with
   ...
   chooseconfig /etc/wgetrc.d/*
   chooseconfig ~/.wgetrc
   ...
  
   At least that would make the work of some distribution package
 maintainers
   easier and is quite clear and flexible for any user.
  
   A user could set up his project-related and distribution independant
   config
   files, e.g. with
  
   resetconfig
   ... [special project settings]
  
   For the test suite we simply use --reset-config as the first CLI
 option.
  
   I like this idea. It may require a few changes to the Wgetrc files,
 but I
 
  see how this may be a huge boon distribution packagers. Any packagers
  around here want to chime in?
 
  include or includeconfig could be used in place of chooseconfig.
 
  My only issue with this idea is that it breaks backwards-compatibility
 with
  users' personal wgetrc files. Existing users are used to calling --config
  FILE_NAME and assuming that these settings will be applied on top of the
  compiled defaults and not the settings already existing in
  SYSCONFDIR/wgetrc.
  This requires all existing configuration files to change and add a new
  resetconfig option on top.

 This shouts for a 'version' string. If there is none, Wget behaves as
 always.
 Distribution maintainers could add this version at the beginning of
 SYSCONFDIR/wgetrc and explicitly tell about it (e.g. Debian has this
 mechanism
 when you upgrade a package that breaks compatibility).

Personally, I loathe the idea of version strings. Deprecating old usage of
the conf files becomes very difficult and we need to maintain multiple
parsers for different versions.

However, I don't currently see a better option. Once implemented, this
patch should speed up the invocation of Wget and we'll have a much cleaner
main function.


 If this version string can occur several times, something like the
 following
 adds value and keeps backwards compatibility to users (SYSCONFDIR/wgetrc):
 configversion=1.16 # 'chooseconfig' / --config adds up
 chooseconfig /etc/wgetrc.d/*
 configversion=1.15 # back to old behaviour

 ~/.wgetrc should be read after SYSCONFDIR/wgetrc, as it is now.

 If a user wants to switch to the new behaviour, she just adds
 configversion=1.16 # 'chooseconfig' / --config adds up
 to
 ~/.wgetrc

 I guess that supporting wildcards won't break backward compatibility or at
 least it is very unlikely that some uses shell wildcards within config file
 names.

  I'm not so sure about how wildcards and globbing could be implemented in
  the config files. Can we somehow pass on the strings to the the shell and
  ask it to expand those for us?

 I did it this way (see line 731...):
 https://github.com/rockdaboot/mget/blob/master/src/options.c
 (I am willing to contribute any of the code to the GNU/Wget project, if it
 comes to that point).

 I see that you have the recursion level management and globbing expansion
both written for mget. We could probably use the same logic here again.


I am planning on starting work for this set of patches on a local
repository. I'll be pushing my code to GitHub till we have a tested and
debugged rewrite, which can later be pushed to master. The in-work patches
can be found here:
https://github.com/darnir/wget/tree/rewrite_input
Any patches to this branch are most welcome!

Currently, I'm a little busy with my undergrad thesis and will not be able
to devote much time to this right now. However I will continue working on
it whenever time allows.


-- 
Thanking You,
Darshit Shah


Re: [Bug-wget] General Testsuite issue

2014-01-23 Thread Tim Ruehsen
On Thursday 23 January 2014 09:03:36 Darshit Shah wrote:
   Do you have any ideas that we
   could use to implement this?
  
  Well, yes. Read everything in a straight order, beginning at
  
  SYSCONFDIRwgetrc
  
  following 'chooseconfig' directives within ('include' would be a more
  intuitive name).
  
  Now parse the command line, whenever a --config is found, read that file
  (again following 'chooseconfig' directives).
  A --reset-config would allow the user to reset all configuration variables
  to
  the built-in default at any time.
  
  The 'chooseconfig' directive and the --config CLI option should allow
  wildcards (globbing with tilde, if available).
  A max. recursion level prevents infinite loops (e.g. 'a' includes 'b' and
  'b'
  includes 'a').
  
  This would allow a /etc/wgetrc with
  ...
  chooseconfig /etc/wgetrc.d/*
  chooseconfig ~/.wgetrc
  ...
  
  At least that would make the work of some distribution package maintainers
  easier and is quite clear and flexible for any user.
  
  A user could set up his project-related and distribution independant
  config
  files, e.g. with
  
  resetconfig
  ... [special project settings]
  
  For the test suite we simply use --reset-config as the first CLI option.
  
  I like this idea. It may require a few changes to the Wgetrc files, but I
 
 see how this may be a huge boon distribution packagers. Any packagers
 around here want to chime in?
 
 include or includeconfig could be used in place of chooseconfig.
 
 My only issue with this idea is that it breaks backwards-compatibility with
 users' personal wgetrc files. Existing users are used to calling --config
 FILE_NAME and assuming that these settings will be applied on top of the
 compiled defaults and not the settings already existing in
 SYSCONFDIR/wgetrc.
 This requires all existing configuration files to change and add a new
 resetconfig option on top.

This shouts for a 'version' string. If there is none, Wget behaves as always.
Distribution maintainers could add this version at the beginning of 
SYSCONFDIR/wgetrc and explicitly tell about it (e.g. Debian has this mechanism 
when you upgrade a package that breaks compatibility).

If this version string can occur several times, something like the following 
adds value and keeps backwards compatibility to users (SYSCONFDIR/wgetrc):
configversion=1.16 # 'chooseconfig' / --config adds up
chooseconfig /etc/wgetrc.d/*
configversion=1.15 # back to old behaviour

~/.wgetrc should be read after SYSCONFDIR/wgetrc, as it is now.

If a user wants to switch to the new behaviour, she just adds
configversion=1.16 # 'chooseconfig' / --config adds up
to
~/.wgetrc

I guess that supporting wildcards won't break backward compatibility or at 
least it is very unlikely that some uses shell wildcards within config file 
names.

 I'm not so sure about how wildcards and globbing could be implemented in
 the config files. Can we somehow pass on the strings to the the shell and
 ask it to expand those for us?

I did it this way (see line 731...):
https://github.com/rockdaboot/mget/blob/master/src/options.c
(I am willing to contribute any of the code to the GNU/Wget project, if it 
comes to that point).

Tim




Re: [Bug-wget] General Testsuite issue

2014-01-22 Thread Tim Ruehsen
On Tuesday 21 January 2014 08:17:45 Darshit Shah wrote:
   The --config option is detected just before the other options by running
   the same loop a little earlier. However, to same CPU cycles, we break
   out
   of the loop as soon as --config is identified. I have extended that loop

What do you think, how many CPU cycles can be saved here ;-)

  
  so
  
   that it detects --no-config too and breaks out the moment either one of
   these is seen. Hence, only the first instance is acted upon, while the
   others are silently discarded.
  
  Thanks for making this point clear.
  
  I don't understand the reason for the --config exception and I would
  definitely implement it in a different way... but changing it will break
  compatibility.
 
 The reason for having the separate loop is as follows:
 
 In case the user wants to use a config file that is not /etc/wgetrc and
 ~/.wgetrc, then they may use the --config option. However, We tend to
 initialize the settings from these files before beginning to read the
 command line arguments. This would cause wrong settings to be read from the
 default config files. Hence, we run a loop to simply check for the --config
 option and then parse the given wgetrc file before parsing the rest of the
 options.
 
 I have simply added another check to that first loop. I too would like to
 implement this in a different manner. Do you have any ideas that we could
 use to implement this?

Well, yes. Read everything in a straight order, beginning at
SYSCONFDIRwgetrc
following 'chooseconfig' directives within ('include' would be a more 
intuitive name).

Now parse the command line, whenever a --config is found, read that file 
(again following 'chooseconfig' directives).
A --reset-config would allow the user to reset all configuration variables to 
the built-in default at any time.

The 'chooseconfig' directive and the --config CLI option should allow 
wildcards (globbing with tilde, if available).
A max. recursion level prevents infinite loops (e.g. 'a' includes 'b' and 'b' 
includes 'a'). 

This would allow a /etc/wgetrc with
...
chooseconfig /etc/wgetrc.d/*
chooseconfig ~/.wgetrc
...

At least that would make the work of some distribution package maintainers 
easier and is quite clear and flexible for any user.

A user could set up his project-related and distribution independant config 
files, e.g. with
resetconfig
... [special project settings]

For the test suite we simply use --reset-config as the first CLI option.

Regards, Tim




Re: [Bug-wget] General Testsuite issue

2014-01-22 Thread Giuseppe Scrivano
Darshit Shah dar...@gmail.com writes:

 Till we decide on a restructure, here is the patch for --no-config.

 This patch is made against master, but is required in parallel-wget too.
 @Giuseppe: Maybe you could cherry pick this? Or merge master into
 parallel-wget, especially since we've just released a new version.

I have just done it.  I've applied it to master and merged it into
parallel-wget.

Cheers,
Giuseppe



Re: [Bug-wget] General Testsuite issue

2014-01-22 Thread Darshit Shah
   The --config option is detected just before the other options by
 running
the same loop a little earlier. However, to same CPU cycles, we break
out
of the loop as soon as --config is identified. I have extended that
 loop

 What do you think, how many CPU cycles can be saved here ;-)

 Well, this was always there. I simply extended it. I hated the fact that
when neither of --config and --no-config were passed, this loop would parse
all the arguments to Wget and do nothing about it. It is high time that
this be changed.

Agreed, the way I took wasn't saving too many cycles, but preventing the
exception would have required a little more logic and it didn't seem worth
the effort.

 I have simply added another check to that first loop. I too would like to
  implement this in a different manner. Do you have any ideas that we could
  use to implement this?

 Well, yes. Read everything in a straight order, beginning at
 SYSCONFDIRwgetrc
 following 'chooseconfig' directives within ('include' would be a more
 intuitive name).

 Now parse the command line, whenever a --config is found, read that file
 (again following 'chooseconfig' directives).
 A --reset-config would allow the user to reset all configuration variables
 to
 the built-in default at any time.

 The 'chooseconfig' directive and the --config CLI option should allow
 wildcards (globbing with tilde, if available).
 A max. recursion level prevents infinite loops (e.g. 'a' includes 'b' and
 'b'
 includes 'a').

 This would allow a /etc/wgetrc with
 ...
 chooseconfig /etc/wgetrc.d/*
 chooseconfig ~/.wgetrc
 ...

 At least that would make the work of some distribution package maintainers
 easier and is quite clear and flexible for any user.

 A user could set up his project-related and distribution independant config
 files, e.g. with
 resetconfig
 ... [special project settings]

 For the test suite we simply use --reset-config as the first CLI option.

 I like this idea. It may require a few changes to the Wgetrc files, but I
see how this may be a huge boon distribution packagers. Any packagers
around here want to chime in?

include or includeconfig could be used in place of chooseconfig.

My only issue with this idea is that it breaks backwards-compatibility with
users' personal wgetrc files. Existing users are used to calling --config
FILE_NAME and assuming that these settings will be applied on top of the
compiled defaults and not the settings already existing in
SYSCONFDIR/wgetrc.
This requires all existing configuration files to change and add a new
resetconfig option on top.

I'm not so sure about how wildcards and globbing could be implemented in
the config files. Can we somehow pass on the strings to the the shell and
ask it to expand those for us?

In http.c we already have the logic for handling cyclic dependencies.
Simply need to extend it to handle cyclic dependencies in the config files.
I like the direction in which this is going expect for the slight breakage
in wgetrc compatibility. Maybe we can engineer something to fix that?

-- 
Thanking You,
Darshit Shah


Re: [Bug-wget] General Testsuite issue

2014-01-21 Thread Darshit Shah
Till we decide on a restructure, here is the patch for --no-config.

This patch is made against master, but is required in parallel-wget too.
@Giuseppe: Maybe you could cherry pick this? Or merge master into
parallel-wget, especially since we've just released a new version.


On Tue, Jan 21, 2014 at 8:17 AM, Darshit Shah dar...@gmail.com wrote:

  The --config option is detected just before the other options by running
  the same loop a little earlier. However, to same CPU cycles, we break
 out
  of the loop as soon as --config is identified. I have extended that
 loop so
  that it detects --no-config too and breaks out the moment either one of
  these is seen. Hence, only the first instance is acted upon, while the
  others are silently discarded.

 Thanks for making this point clear.

 I don't understand the reason for the --config exception and I would
 definitely implement it in a different way... but changing it will break
 compatibility.

 The reason for having the separate loop is as follows:

 In case the user wants to use a config file that is not /etc/wgetrc and
 ~/.wgetrc, then they may use the --config option. However, We tend to
 initialize the settings from these files before beginning to read the
 command line arguments. This would cause wrong settings to be read from the
 default config files. Hence, we run a loop to simply check for the --config
 option and then parse the given wgetrc file before parsing the rest of the
 options.

 I have simply added another check to that first loop. I too would like to
 implement this in a different manner. Do you have any ideas that we could
 use to implement this?

 There's a couple of implementation ideas I currently have, but I'll have
 to go back to the code and check its feasibility first. If it looks
 possible, I'll send a new RFC with the precise details of how I intend to
 implement this.


 --
 Thanking You,
 Darshit Shah




-- 
Thanking You,
Darshit Shah


Re: [Bug-wget] General Testsuite issue

2014-01-20 Thread Tim Ruehsen
On Saturday 18 January 2014 23:51:07 Darshit Shah wrote:
 On Sat, Jan 18, 2014 at 2:05 AM, Tim Rühsen tim.rueh...@gmx.de wrote:
  Am Freitag, 17. Januar 2014, 11:42:41 schrieb Tony Lewis:
   Darshit Shah wrote:
In case both the --config and --no-config commands are issued, the one
   
   that
   
appears first on the command will be considered and the other ignored.
   
   Given my memory of the way the parsing loop works, I would expect that
   it
   would use the last one that appears. How do GNU commands usually handle
   multiple instances of a command option?
  
  Wget, as most tools, parse the argument from left to right, the second
  overwriting the first. Else (e.g. if arguments 'sum up'),it should be
  explicitly mentioned in the docs.
  
  True. While Wget's command parser loop would usually accept the last
 
 instance of the option, this is an exception. One of the chief reasons I
 RFC'ed this patch.

OK, i missed that.

 The --config option is detected just before the other options by running
 the same loop a little earlier. However, to same CPU cycles, we break out
 of the loop as soon as --config is identified. I have extended that loop so
 that it detects --no-config too and breaks out the moment either one of
 these is seen. Hence, only the first instance is acted upon, while the
 others are silently discarded.

Thanks for making this point clear.

I don't understand the reason for the --config exception and I would 
definitely implement it in a different way... but changing it will break 
compatibility.

So, the way you did it seems straight forward and correct.

Tim




Re: [Bug-wget] General Testsuite issue

2014-01-20 Thread Darshit Shah

  The --config option is detected just before the other options by running
  the same loop a little earlier. However, to same CPU cycles, we break out
  of the loop as soon as --config is identified. I have extended that loop
 so
  that it detects --no-config too and breaks out the moment either one of
  these is seen. Hence, only the first instance is acted upon, while the
  others are silently discarded.

 Thanks for making this point clear.

 I don't understand the reason for the --config exception and I would
 definitely implement it in a different way... but changing it will break
 compatibility.

The reason for having the separate loop is as follows:

In case the user wants to use a config file that is not /etc/wgetrc and
~/.wgetrc, then they may use the --config option. However, We tend to
initialize the settings from these files before beginning to read the
command line arguments. This would cause wrong settings to be read from the
default config files. Hence, we run a loop to simply check for the --config
option and then parse the given wgetrc file before parsing the rest of the
options.

I have simply added another check to that first loop. I too would like to
implement this in a different manner. Do you have any ideas that we could
use to implement this?

There's a couple of implementation ideas I currently have, but I'll have to
go back to the code and check its feasibility first. If it looks possible,
I'll send a new RFC with the precise details of how I intend to implement
this.


-- 
Thanking You,
Darshit Shah


Re: [Bug-wget] General Testsuite issue

2014-01-17 Thread Darshit Shah
Hi Tim,

Yes, that is an eventuality I did not consider when writing this test. A
user configured option that may override the settings intended for the
test. Maybe we should have a --no-config option to ensure that the tests
are run on the compiled defaults only.

Having a uniform environment for tests is a must. I've attached a patch
that introduces --no-config. However this is not yet complete and ready for
merging. Most importantly, if accepted, I need to change a few comments and
edit wget.texi explaining the new option.

In case both the --config and --no-config commands are issued, the one that
appears first on the command will be considered and the other ignored.


On Fri, Jan 17, 2014 at 1:51 PM, Tim Ruehsen tim.rueh...@gmx.de wrote:

 Hi,

 I just discovered that Test--spider-r.py fails here (parallel-wget branch).

 The reason is 'robots=off' in ~./wgetrc.

 But that leads the the question, how can we prevent any user (and global
 /etc/wgetrc) settings dropping in, possibly making the tests fail.

 An option like --no-config could solve this issue with just a few lines of
 code.

 What do you think ?

 Tim





-- 
Thanking You,
Darshit Shah
From 6ebf2a751635a83e6c1f8b7ce07b536bf0485e9c Mon Sep 17 00:00:00 2001
From: Darshit Shah dar...@gmail.com
Date: Fri, 17 Jan 2014 15:16:38 +0530
Subject: [PATCH] Introduce --no-config. The wgetrc files will not be read

In case of a conflict between --config and --no-config, the one
that appears first will be considered and the other ignored.
---
 src/ChangeLog |  8 
 src/init.c|  3 ++-
 src/main.c| 15 ---
 src/options.h |  1 +
 4 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/src/ChangeLog b/src/ChangeLog
index 0ac1ac6..6b1b9b1 100644
--- a/src/ChangeLog
+++ b/src/ChangeLog
@@ -1,3 +1,11 @@
+2014-01-17  Darshit Shah  dar...@gmail.com
+
+	* init.c (commands[]): Add --no-config
+	* options.h: Same
+	* main.c (option_data[]): Same
+	(print_help): Same
+	(main): If --no-config is set, then do not read the wgetrc files
+
 2014-01-05  Håkon Vågsether hauk...@gmail.com (tiny change)
 
 	* http.c (http_loop): Fix checking the URL length when filename is
diff --git a/src/init.c b/src/init.c
index a64dabe..43d5ae9 100644
--- a/src/init.c
+++ b/src/init.c
@@ -160,7 +160,7 @@ static const struct {
 #ifdef ENABLE_DEBUG
   { debug,opt.debug, cmd_boolean },
 #endif
-  { defaultpage, 	opt.default_page,  cmd_string },
+  { defaultpage,  opt.default_page,  cmd_string },
   { deleteafter,  opt.delete_after,  cmd_boolean },
   { dirprefix,opt.dir_prefix,cmd_directory },
   { dirstruct,NULL,   cmd_spec_dirstruct },
@@ -220,6 +220,7 @@ static const struct {
   { mirror,   NULL,   cmd_spec_mirror },
   { netrc,opt.netrc, cmd_boolean },
   { noclobber,opt.noclobber, cmd_boolean },
+  { noconfig, opt.noconfig,  cmd_boolean },
   { noparent, opt.no_parent, cmd_boolean },
   { noproxy,  opt.no_proxy,  cmd_vector },
   { numtries, opt.ntry,  cmd_number_inf },/* deprecated*/
diff --git a/src/main.c b/src/main.c
index 19d7253..2aa961d 100644
--- a/src/main.c
+++ b/src/main.c
@@ -238,6 +238,7 @@ static struct cmdline_option option_data[] =
 { mirror, 'm', OPT_BOOLEAN, mirror, -1 },
 { no, 'n', OPT__NO, NULL, required_argument },
 { no-clobber, 0, OPT_BOOLEAN, noclobber, -1 },
+{ no-config, 0, OPT_BOOLEAN, noconfig, -1},
 { no-parent, 0, OPT_BOOLEAN, noparent, -1 },
 { output-document, 'O', OPT_VALUE, outputdocument, -1 },
 { output-file, 'o', OPT_VALUE, logfile, -1 },
@@ -473,7 +474,9 @@ Logging and input file:\n),
   -B,  --base=URLresolves HTML input-file links (-i -F)\n\
  relative to URL.\n),
 N_(\
-   --config=FILE Specify config file to use.\n), 
+   --config=FILE Specify config file to use.\n),
+N_(\
+   --no-config   Do not read any config file.\n),
 \n,
 
 N_(\
@@ -1045,6 +1048,7 @@ main (int argc, char **argv)
   longindex = -1;
   int retconf;
   bool use_userconfig = false;
+  bool noconfig = false;
 
   while ((retconf = getopt_long (argc, argv,
 short_options, long_options, longindex)) != -1)
@@ -1057,7 +1061,12 @@ main (int argc, char **argv)
 {
   confval = long_options[longindex].val;
   config_opt = option_data[confval  ~BOOLEAN_NEG_MARKER];
-  if (strcmp (config_opt-long_name, config) == 0)
+  if (strcmp (config_opt-long_name, no-config) == 0)
+{
+  noconfig = true;
+  break;
+}
+  else if (strcmp (config_opt-long_name, config) == 0)
 {
   bool userrc_ret = true;
   userrc_ret = run_wgetrc (optarg);
@@ -1074,7 +1083,7 @@ main (int 

Re: [Bug-wget] General Testsuite issue

2014-01-17 Thread Tony Lewis
Darshit Shah wrote:

 In case both the --config and --no-config commands are issued, the one
that
 appears first on the command will be considered and the other ignored.

Given my memory of the way the parsing loop works, I would expect that it
would use the last one that appears. How do GNU commands usually handle
multiple instances of a command option?

Tony




Re: [Bug-wget] General Testsuite issue

2014-01-17 Thread Tim Rühsen
Am Freitag, 17. Januar 2014, 11:42:41 schrieb Tony Lewis:
 Darshit Shah wrote:
  In case both the --config and --no-config commands are issued, the one
 
 that
 
  appears first on the command will be considered and the other ignored.
 
 Given my memory of the way the parsing loop works, I would expect that it
 would use the last one that appears. How do GNU commands usually handle
 multiple instances of a command option?

Wget, as most tools, parse the argument from left to right, the second 
overwriting the first. Else (e.g. if arguments 'sum up'),it should be 
explicitly mentioned in the docs.

Tim

signature.asc
Description: This is a digitally signed message part.