Re: [Bug-wget] General Testsuite issue
On Thu, Jan 23, 2014 at 2:47 PM, Tim Ruehsen tim.rueh...@gmx.de wrote: On Thursday 23 January 2014 09:03:36 Darshit Shah wrote: Do you have any ideas that we could use to implement this? Well, yes. Read everything in a straight order, beginning at SYSCONFDIRwgetrc following 'chooseconfig' directives within ('include' would be a more intuitive name). Now parse the command line, whenever a --config is found, read that file (again following 'chooseconfig' directives). A --reset-config would allow the user to reset all configuration variables to the built-in default at any time. The 'chooseconfig' directive and the --config CLI option should allow wildcards (globbing with tilde, if available). A max. recursion level prevents infinite loops (e.g. 'a' includes 'b' and 'b' includes 'a'). This would allow a /etc/wgetrc with ... chooseconfig /etc/wgetrc.d/* chooseconfig ~/.wgetrc ... At least that would make the work of some distribution package maintainers easier and is quite clear and flexible for any user. A user could set up his project-related and distribution independant config files, e.g. with resetconfig ... [special project settings] For the test suite we simply use --reset-config as the first CLI option. I like this idea. It may require a few changes to the Wgetrc files, but I see how this may be a huge boon distribution packagers. Any packagers around here want to chime in? include or includeconfig could be used in place of chooseconfig. My only issue with this idea is that it breaks backwards-compatibility with users' personal wgetrc files. Existing users are used to calling --config FILE_NAME and assuming that these settings will be applied on top of the compiled defaults and not the settings already existing in SYSCONFDIR/wgetrc. This requires all existing configuration files to change and add a new resetconfig option on top. This shouts for a 'version' string. If there is none, Wget behaves as always. Distribution maintainers could add this version at the beginning of SYSCONFDIR/wgetrc and explicitly tell about it (e.g. Debian has this mechanism when you upgrade a package that breaks compatibility). Personally, I loathe the idea of version strings. Deprecating old usage of the conf files becomes very difficult and we need to maintain multiple parsers for different versions. However, I don't currently see a better option. Once implemented, this patch should speed up the invocation of Wget and we'll have a much cleaner main function. If this version string can occur several times, something like the following adds value and keeps backwards compatibility to users (SYSCONFDIR/wgetrc): configversion=1.16 # 'chooseconfig' / --config adds up chooseconfig /etc/wgetrc.d/* configversion=1.15 # back to old behaviour ~/.wgetrc should be read after SYSCONFDIR/wgetrc, as it is now. If a user wants to switch to the new behaviour, she just adds configversion=1.16 # 'chooseconfig' / --config adds up to ~/.wgetrc I guess that supporting wildcards won't break backward compatibility or at least it is very unlikely that some uses shell wildcards within config file names. I'm not so sure about how wildcards and globbing could be implemented in the config files. Can we somehow pass on the strings to the the shell and ask it to expand those for us? I did it this way (see line 731...): https://github.com/rockdaboot/mget/blob/master/src/options.c (I am willing to contribute any of the code to the GNU/Wget project, if it comes to that point). I see that you have the recursion level management and globbing expansion both written for mget. We could probably use the same logic here again. I am planning on starting work for this set of patches on a local repository. I'll be pushing my code to GitHub till we have a tested and debugged rewrite, which can later be pushed to master. The in-work patches can be found here: https://github.com/darnir/wget/tree/rewrite_input Any patches to this branch are most welcome! Currently, I'm a little busy with my undergrad thesis and will not be able to devote much time to this right now. However I will continue working on it whenever time allows. -- Thanking You, Darshit Shah
Re: [Bug-wget] General Testsuite issue
On Thursday 23 January 2014 09:03:36 Darshit Shah wrote: Do you have any ideas that we could use to implement this? Well, yes. Read everything in a straight order, beginning at SYSCONFDIRwgetrc following 'chooseconfig' directives within ('include' would be a more intuitive name). Now parse the command line, whenever a --config is found, read that file (again following 'chooseconfig' directives). A --reset-config would allow the user to reset all configuration variables to the built-in default at any time. The 'chooseconfig' directive and the --config CLI option should allow wildcards (globbing with tilde, if available). A max. recursion level prevents infinite loops (e.g. 'a' includes 'b' and 'b' includes 'a'). This would allow a /etc/wgetrc with ... chooseconfig /etc/wgetrc.d/* chooseconfig ~/.wgetrc ... At least that would make the work of some distribution package maintainers easier and is quite clear and flexible for any user. A user could set up his project-related and distribution independant config files, e.g. with resetconfig ... [special project settings] For the test suite we simply use --reset-config as the first CLI option. I like this idea. It may require a few changes to the Wgetrc files, but I see how this may be a huge boon distribution packagers. Any packagers around here want to chime in? include or includeconfig could be used in place of chooseconfig. My only issue with this idea is that it breaks backwards-compatibility with users' personal wgetrc files. Existing users are used to calling --config FILE_NAME and assuming that these settings will be applied on top of the compiled defaults and not the settings already existing in SYSCONFDIR/wgetrc. This requires all existing configuration files to change and add a new resetconfig option on top. This shouts for a 'version' string. If there is none, Wget behaves as always. Distribution maintainers could add this version at the beginning of SYSCONFDIR/wgetrc and explicitly tell about it (e.g. Debian has this mechanism when you upgrade a package that breaks compatibility). If this version string can occur several times, something like the following adds value and keeps backwards compatibility to users (SYSCONFDIR/wgetrc): configversion=1.16 # 'chooseconfig' / --config adds up chooseconfig /etc/wgetrc.d/* configversion=1.15 # back to old behaviour ~/.wgetrc should be read after SYSCONFDIR/wgetrc, as it is now. If a user wants to switch to the new behaviour, she just adds configversion=1.16 # 'chooseconfig' / --config adds up to ~/.wgetrc I guess that supporting wildcards won't break backward compatibility or at least it is very unlikely that some uses shell wildcards within config file names. I'm not so sure about how wildcards and globbing could be implemented in the config files. Can we somehow pass on the strings to the the shell and ask it to expand those for us? I did it this way (see line 731...): https://github.com/rockdaboot/mget/blob/master/src/options.c (I am willing to contribute any of the code to the GNU/Wget project, if it comes to that point). Tim
Re: [Bug-wget] General Testsuite issue
On Tuesday 21 January 2014 08:17:45 Darshit Shah wrote: The --config option is detected just before the other options by running the same loop a little earlier. However, to same CPU cycles, we break out of the loop as soon as --config is identified. I have extended that loop What do you think, how many CPU cycles can be saved here ;-) so that it detects --no-config too and breaks out the moment either one of these is seen. Hence, only the first instance is acted upon, while the others are silently discarded. Thanks for making this point clear. I don't understand the reason for the --config exception and I would definitely implement it in a different way... but changing it will break compatibility. The reason for having the separate loop is as follows: In case the user wants to use a config file that is not /etc/wgetrc and ~/.wgetrc, then they may use the --config option. However, We tend to initialize the settings from these files before beginning to read the command line arguments. This would cause wrong settings to be read from the default config files. Hence, we run a loop to simply check for the --config option and then parse the given wgetrc file before parsing the rest of the options. I have simply added another check to that first loop. I too would like to implement this in a different manner. Do you have any ideas that we could use to implement this? Well, yes. Read everything in a straight order, beginning at SYSCONFDIRwgetrc following 'chooseconfig' directives within ('include' would be a more intuitive name). Now parse the command line, whenever a --config is found, read that file (again following 'chooseconfig' directives). A --reset-config would allow the user to reset all configuration variables to the built-in default at any time. The 'chooseconfig' directive and the --config CLI option should allow wildcards (globbing with tilde, if available). A max. recursion level prevents infinite loops (e.g. 'a' includes 'b' and 'b' includes 'a'). This would allow a /etc/wgetrc with ... chooseconfig /etc/wgetrc.d/* chooseconfig ~/.wgetrc ... At least that would make the work of some distribution package maintainers easier and is quite clear and flexible for any user. A user could set up his project-related and distribution independant config files, e.g. with resetconfig ... [special project settings] For the test suite we simply use --reset-config as the first CLI option. Regards, Tim
Re: [Bug-wget] General Testsuite issue
Darshit Shah dar...@gmail.com writes: Till we decide on a restructure, here is the patch for --no-config. This patch is made against master, but is required in parallel-wget too. @Giuseppe: Maybe you could cherry pick this? Or merge master into parallel-wget, especially since we've just released a new version. I have just done it. I've applied it to master and merged it into parallel-wget. Cheers, Giuseppe
Re: [Bug-wget] General Testsuite issue
The --config option is detected just before the other options by running the same loop a little earlier. However, to same CPU cycles, we break out of the loop as soon as --config is identified. I have extended that loop What do you think, how many CPU cycles can be saved here ;-) Well, this was always there. I simply extended it. I hated the fact that when neither of --config and --no-config were passed, this loop would parse all the arguments to Wget and do nothing about it. It is high time that this be changed. Agreed, the way I took wasn't saving too many cycles, but preventing the exception would have required a little more logic and it didn't seem worth the effort. I have simply added another check to that first loop. I too would like to implement this in a different manner. Do you have any ideas that we could use to implement this? Well, yes. Read everything in a straight order, beginning at SYSCONFDIRwgetrc following 'chooseconfig' directives within ('include' would be a more intuitive name). Now parse the command line, whenever a --config is found, read that file (again following 'chooseconfig' directives). A --reset-config would allow the user to reset all configuration variables to the built-in default at any time. The 'chooseconfig' directive and the --config CLI option should allow wildcards (globbing with tilde, if available). A max. recursion level prevents infinite loops (e.g. 'a' includes 'b' and 'b' includes 'a'). This would allow a /etc/wgetrc with ... chooseconfig /etc/wgetrc.d/* chooseconfig ~/.wgetrc ... At least that would make the work of some distribution package maintainers easier and is quite clear and flexible for any user. A user could set up his project-related and distribution independant config files, e.g. with resetconfig ... [special project settings] For the test suite we simply use --reset-config as the first CLI option. I like this idea. It may require a few changes to the Wgetrc files, but I see how this may be a huge boon distribution packagers. Any packagers around here want to chime in? include or includeconfig could be used in place of chooseconfig. My only issue with this idea is that it breaks backwards-compatibility with users' personal wgetrc files. Existing users are used to calling --config FILE_NAME and assuming that these settings will be applied on top of the compiled defaults and not the settings already existing in SYSCONFDIR/wgetrc. This requires all existing configuration files to change and add a new resetconfig option on top. I'm not so sure about how wildcards and globbing could be implemented in the config files. Can we somehow pass on the strings to the the shell and ask it to expand those for us? In http.c we already have the logic for handling cyclic dependencies. Simply need to extend it to handle cyclic dependencies in the config files. I like the direction in which this is going expect for the slight breakage in wgetrc compatibility. Maybe we can engineer something to fix that? -- Thanking You, Darshit Shah
Re: [Bug-wget] General Testsuite issue
Till we decide on a restructure, here is the patch for --no-config. This patch is made against master, but is required in parallel-wget too. @Giuseppe: Maybe you could cherry pick this? Or merge master into parallel-wget, especially since we've just released a new version. On Tue, Jan 21, 2014 at 8:17 AM, Darshit Shah dar...@gmail.com wrote: The --config option is detected just before the other options by running the same loop a little earlier. However, to same CPU cycles, we break out of the loop as soon as --config is identified. I have extended that loop so that it detects --no-config too and breaks out the moment either one of these is seen. Hence, only the first instance is acted upon, while the others are silently discarded. Thanks for making this point clear. I don't understand the reason for the --config exception and I would definitely implement it in a different way... but changing it will break compatibility. The reason for having the separate loop is as follows: In case the user wants to use a config file that is not /etc/wgetrc and ~/.wgetrc, then they may use the --config option. However, We tend to initialize the settings from these files before beginning to read the command line arguments. This would cause wrong settings to be read from the default config files. Hence, we run a loop to simply check for the --config option and then parse the given wgetrc file before parsing the rest of the options. I have simply added another check to that first loop. I too would like to implement this in a different manner. Do you have any ideas that we could use to implement this? There's a couple of implementation ideas I currently have, but I'll have to go back to the code and check its feasibility first. If it looks possible, I'll send a new RFC with the precise details of how I intend to implement this. -- Thanking You, Darshit Shah -- Thanking You, Darshit Shah
Re: [Bug-wget] General Testsuite issue
On Saturday 18 January 2014 23:51:07 Darshit Shah wrote: On Sat, Jan 18, 2014 at 2:05 AM, Tim Rühsen tim.rueh...@gmx.de wrote: Am Freitag, 17. Januar 2014, 11:42:41 schrieb Tony Lewis: Darshit Shah wrote: In case both the --config and --no-config commands are issued, the one that appears first on the command will be considered and the other ignored. Given my memory of the way the parsing loop works, I would expect that it would use the last one that appears. How do GNU commands usually handle multiple instances of a command option? Wget, as most tools, parse the argument from left to right, the second overwriting the first. Else (e.g. if arguments 'sum up'),it should be explicitly mentioned in the docs. True. While Wget's command parser loop would usually accept the last instance of the option, this is an exception. One of the chief reasons I RFC'ed this patch. OK, i missed that. The --config option is detected just before the other options by running the same loop a little earlier. However, to same CPU cycles, we break out of the loop as soon as --config is identified. I have extended that loop so that it detects --no-config too and breaks out the moment either one of these is seen. Hence, only the first instance is acted upon, while the others are silently discarded. Thanks for making this point clear. I don't understand the reason for the --config exception and I would definitely implement it in a different way... but changing it will break compatibility. So, the way you did it seems straight forward and correct. Tim
Re: [Bug-wget] General Testsuite issue
The --config option is detected just before the other options by running the same loop a little earlier. However, to same CPU cycles, we break out of the loop as soon as --config is identified. I have extended that loop so that it detects --no-config too and breaks out the moment either one of these is seen. Hence, only the first instance is acted upon, while the others are silently discarded. Thanks for making this point clear. I don't understand the reason for the --config exception and I would definitely implement it in a different way... but changing it will break compatibility. The reason for having the separate loop is as follows: In case the user wants to use a config file that is not /etc/wgetrc and ~/.wgetrc, then they may use the --config option. However, We tend to initialize the settings from these files before beginning to read the command line arguments. This would cause wrong settings to be read from the default config files. Hence, we run a loop to simply check for the --config option and then parse the given wgetrc file before parsing the rest of the options. I have simply added another check to that first loop. I too would like to implement this in a different manner. Do you have any ideas that we could use to implement this? There's a couple of implementation ideas I currently have, but I'll have to go back to the code and check its feasibility first. If it looks possible, I'll send a new RFC with the precise details of how I intend to implement this. -- Thanking You, Darshit Shah
Re: [Bug-wget] General Testsuite issue
Hi Tim, Yes, that is an eventuality I did not consider when writing this test. A user configured option that may override the settings intended for the test. Maybe we should have a --no-config option to ensure that the tests are run on the compiled defaults only. Having a uniform environment for tests is a must. I've attached a patch that introduces --no-config. However this is not yet complete and ready for merging. Most importantly, if accepted, I need to change a few comments and edit wget.texi explaining the new option. In case both the --config and --no-config commands are issued, the one that appears first on the command will be considered and the other ignored. On Fri, Jan 17, 2014 at 1:51 PM, Tim Ruehsen tim.rueh...@gmx.de wrote: Hi, I just discovered that Test--spider-r.py fails here (parallel-wget branch). The reason is 'robots=off' in ~./wgetrc. But that leads the the question, how can we prevent any user (and global /etc/wgetrc) settings dropping in, possibly making the tests fail. An option like --no-config could solve this issue with just a few lines of code. What do you think ? Tim -- Thanking You, Darshit Shah From 6ebf2a751635a83e6c1f8b7ce07b536bf0485e9c Mon Sep 17 00:00:00 2001 From: Darshit Shah dar...@gmail.com Date: Fri, 17 Jan 2014 15:16:38 +0530 Subject: [PATCH] Introduce --no-config. The wgetrc files will not be read In case of a conflict between --config and --no-config, the one that appears first will be considered and the other ignored. --- src/ChangeLog | 8 src/init.c| 3 ++- src/main.c| 15 --- src/options.h | 1 + 4 files changed, 23 insertions(+), 4 deletions(-) diff --git a/src/ChangeLog b/src/ChangeLog index 0ac1ac6..6b1b9b1 100644 --- a/src/ChangeLog +++ b/src/ChangeLog @@ -1,3 +1,11 @@ +2014-01-17 Darshit Shah dar...@gmail.com + + * init.c (commands[]): Add --no-config + * options.h: Same + * main.c (option_data[]): Same + (print_help): Same + (main): If --no-config is set, then do not read the wgetrc files + 2014-01-05 Håkon Vågsether hauk...@gmail.com (tiny change) * http.c (http_loop): Fix checking the URL length when filename is diff --git a/src/init.c b/src/init.c index a64dabe..43d5ae9 100644 --- a/src/init.c +++ b/src/init.c @@ -160,7 +160,7 @@ static const struct { #ifdef ENABLE_DEBUG { debug,opt.debug, cmd_boolean }, #endif - { defaultpage, opt.default_page, cmd_string }, + { defaultpage, opt.default_page, cmd_string }, { deleteafter, opt.delete_after, cmd_boolean }, { dirprefix,opt.dir_prefix,cmd_directory }, { dirstruct,NULL, cmd_spec_dirstruct }, @@ -220,6 +220,7 @@ static const struct { { mirror, NULL, cmd_spec_mirror }, { netrc,opt.netrc, cmd_boolean }, { noclobber,opt.noclobber, cmd_boolean }, + { noconfig, opt.noconfig, cmd_boolean }, { noparent, opt.no_parent, cmd_boolean }, { noproxy, opt.no_proxy, cmd_vector }, { numtries, opt.ntry, cmd_number_inf },/* deprecated*/ diff --git a/src/main.c b/src/main.c index 19d7253..2aa961d 100644 --- a/src/main.c +++ b/src/main.c @@ -238,6 +238,7 @@ static struct cmdline_option option_data[] = { mirror, 'm', OPT_BOOLEAN, mirror, -1 }, { no, 'n', OPT__NO, NULL, required_argument }, { no-clobber, 0, OPT_BOOLEAN, noclobber, -1 }, +{ no-config, 0, OPT_BOOLEAN, noconfig, -1}, { no-parent, 0, OPT_BOOLEAN, noparent, -1 }, { output-document, 'O', OPT_VALUE, outputdocument, -1 }, { output-file, 'o', OPT_VALUE, logfile, -1 }, @@ -473,7 +474,9 @@ Logging and input file:\n), -B, --base=URLresolves HTML input-file links (-i -F)\n\ relative to URL.\n), N_(\ - --config=FILE Specify config file to use.\n), + --config=FILE Specify config file to use.\n), +N_(\ + --no-config Do not read any config file.\n), \n, N_(\ @@ -1045,6 +1048,7 @@ main (int argc, char **argv) longindex = -1; int retconf; bool use_userconfig = false; + bool noconfig = false; while ((retconf = getopt_long (argc, argv, short_options, long_options, longindex)) != -1) @@ -1057,7 +1061,12 @@ main (int argc, char **argv) { confval = long_options[longindex].val; config_opt = option_data[confval ~BOOLEAN_NEG_MARKER]; - if (strcmp (config_opt-long_name, config) == 0) + if (strcmp (config_opt-long_name, no-config) == 0) +{ + noconfig = true; + break; +} + else if (strcmp (config_opt-long_name, config) == 0) { bool userrc_ret = true; userrc_ret = run_wgetrc (optarg); @@ -1074,7 +1083,7 @@ main (int
Re: [Bug-wget] General Testsuite issue
Darshit Shah wrote: In case both the --config and --no-config commands are issued, the one that appears first on the command will be considered and the other ignored. Given my memory of the way the parsing loop works, I would expect that it would use the last one that appears. How do GNU commands usually handle multiple instances of a command option? Tony
Re: [Bug-wget] General Testsuite issue
Am Freitag, 17. Januar 2014, 11:42:41 schrieb Tony Lewis: Darshit Shah wrote: In case both the --config and --no-config commands are issued, the one that appears first on the command will be considered and the other ignored. Given my memory of the way the parsing loop works, I would expect that it would use the last one that appears. How do GNU commands usually handle multiple instances of a command option? Wget, as most tools, parse the argument from left to right, the second overwriting the first. Else (e.g. if arguments 'sum up'),it should be explicitly mentioned in the docs. Tim signature.asc Description: This is a digitally signed message part.