accept/reject rules based on query string

2008-10-21 Thread Gustavo Ayala
Any idea when this option (or an acceptable workaround) will be
implemented?

I need to include/exclude URLs based on the query string (with regular
expressions, of course). Matching on the file name is not enough.
 
Thanks.


Re: accept/reject rules based on query string

2008-10-21 Thread Micah Cowan
Gustavo Ayala wrote:
> Any idea when this option (or an acceptable workaround) will be
> implemented?
>
> I need to include/exclude URLs based on the query string (with regular
> expressions, of course). Matching on the file name is not enough.

I consider it an important feature, and currently expect to implement it
for 1.12.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/


Re: A/R matching against query strings

2008-10-21 Thread Micah Cowan
I sent the following last month but didn't get any feedback. I'm trying
one more time. :)
-M

Micah Cowan wrote:
> On expanding current URI acc/rej matches to allow matching against query
> strings, I've been considering how we might enable/disable this
> functionality, with an eye toward backwards compatibility.
>
> It seems to me that one usable approach would be to require the "?" of
> the query string to be an explicit part of the rule, if the rule is
> expected to match against query strings. So -A .htm,.gif,*Action=edit*
> would all result in matches against the filename portion only, but -A
> '\?*Action=edit*' would look for Action=edit within the query-string
> portion. (The '\?' is necessary because otherwise '?' is a wildcard
> character; [?] would also work.)
>
> The disadvantage of that technique is that it's harder to specify that a
> given string should be checked _anywhere_, regardless of whether it
> falls in the filename or query-string portion; but I can't think offhand
> of any realistic cases where that's actually useful. We could also
> supply a --match-queries option to turn on matching of wildcard rules
> anywhere in the URL (non-wildcard suffix rules should still match only
> at the end of the filename portion).
>
> Another option is to use a separate -A-like option that does what -A
> does for filenames, but matches against query strings. I like this idea
> somewhat less.
>
> Thoughts?
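
A minimal sketch of the explicit-'?' rule described above, in Python rather
than Wget's C, and not Wget's actual implementation. The helper names
split_rule and rule_matches are hypothetical. Python's fnmatch stands in
for Wget's wildcard matcher; it does not treat '\?' as an escape, so the
sketch strips the leading '\?' or '[?]' itself and matches the remainder
against the query string.

```python
import fnmatch
from urllib.parse import urlsplit

WILDCARDS = set("*?[]")

def split_rule(rule):
    """Return (target, pattern); target is 'query' or 'filename'."""
    for prefix in ("\\?", "[?]"):
        if rule.startswith(prefix):
            return "query", rule[len(prefix):]
    return "filename", rule

def rule_matches(rule, url):
    parts = urlsplit(url)
    filename = parts.path.rsplit("/", 1)[-1]
    target, pattern = split_rule(rule)
    if target == "query":
        # Simplification: only URLs with a non-empty query string can match.
        return bool(parts.query) and fnmatch.fnmatchcase(parts.query, pattern)
    if WILDCARDS.isdisjoint(pattern):
        # Non-wildcard rules are suffixes of the filename portion.
        return filename.endswith(pattern)
    return fnmatch.fnmatchcase(filename, pattern)

url = "http://example.com/wiki/index.php?title=Foo&Action=edit"
print(rule_matches("\\?*Action=edit*", url))  # matched against the query string
print(rule_matches("*Action=edit*", url))     # matched against "index.php" only
```

Under this scheme existing rules keep their filename-only meaning, which is
the backwards-compatibility property the proposal is after.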


RE: A/R matching against query strings

2008-10-21 Thread Tony Lewis
Micah Cowan wrote:

> On expanding current URI acc/rej matches to allow matching against query
> strings, I've been considering how we might enable/disable this
> functionality, with an eye toward backwards compatibility.

What about something like --match-type=TYPE (with accepted values of "all",
"hash", "path", "search")?

For the URL http://www.domain.com/path/to/name.html?a=true#content

"all" would match against the entire string
"hash" would match against "content"
"path" would match against "path/to/name.html"
"search" would match against "a=true"

For backward compatibility the default should be --match-type=path.

I thought about having "host" as an option, but that duplicates another
option.
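
The mapping above can be sketched in Python with the standard urllib.parse
module. The function name match_target is hypothetical, and --match-type is
only a proposal in this thread, not an existing Wget option.

```python
from urllib.parse import urlsplit

def match_target(url, match_type="path"):
    """Return the part of the URL a rule would be matched against."""
    parts = urlsplit(url)
    targets = {
        "all": url,
        "hash": parts.fragment,
        "path": parts.path.lstrip("/"),
        "search": parts.query,
    }
    if match_type not in targets:
        raise ValueError("unknown match type: " + match_type)
    return targets[match_type]

url = "http://www.domain.com/path/to/name.html?a=true#content"
for kind in ("all", "hash", "path", "search"):
    print(kind, "->", match_target(url, kind))
```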

Tony



Re: A/R matching against query strings

2008-10-21 Thread Micah Cowan
Tony Lewis wrote:
> Micah Cowan wrote:
>
>> On expanding current URI acc/rej matches to allow matching against query
>> strings, I've been considering how we might enable/disable this
>> functionality, with an eye toward backwards compatibility.
>
> What about something like --match-type=TYPE (with accepted values of "all",
> "hash", "path", "search")?
>
> For the URL http://www.domain.com/path/to/name.html?a=true#content
>
> "all" would match against the entire string
> "hash" would match against "content"
> "path" would match against "path/to/name.html"
> "search" would match against "a=true"
>
> For backward compatibility the default should be --match-type=path.
>
> I thought about having "host" as an option, but that duplicates another
> option.

As does "path" (up to the final /).

Would "hash" really be useful, ever? It's never part of the request to
the server, so it's really more context for the URL than a real part of
the URL, as far as requests go. Perhaps that sort of thing could best
wait until we allow custom URL parsers/filters.

Also, I don't much like the name "search", as that's a very limited
description of the much more general use of query strings.

But differentiating between three or more different match types tilts me
much more strongly toward some sort of shorthand, like the explicit need
for \?; with three types, perhaps we'd just use some special prefix on
patterns to indicate which sort of match we want (:q: for query strings,
:a: for all, or whatever), to save on prefixing each different type of
match with --match-type (or just using "all" for everything).
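
As a rough illustration of the prefix shorthand, the sketch below splits a
pattern into a match target and the remaining pattern. It is in Python for
brevity; the ':q:' and ':a:' prefixes come from the paragraph above, while
the function name parse_pattern and the target names are assumptions.

```python
# Hypothetical prefix table: ':q:' selects the query string, ':a:' the whole URL.
PREFIXES = {":q:": "query", ":a:": "all"}

def parse_pattern(pattern, default="filename"):
    """Return (target, pattern) with any recognised prefix removed."""
    for prefix, target in PREFIXES.items():
        if pattern.startswith(prefix):
            return target, pattern[len(prefix):]
    # Unprefixed patterns keep today's filename-only behaviour.
    return default, pattern

# A comma-separated -A style list can then mix match types freely.
for item in ".htm,.gif,:q:*Action=edit*".split(","):
    print(parse_pattern(item))
```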

OTOH, regex support is easy enough to add to Wget, now that we're using
gnulib; we could just leave wildcards the way they are, and introduce
regexes that match everything. Then query strings are '\?.*foo=bar' (or,
for the really pedantic, '\?([^?]*)?foo=bar([^?]*)?$')

That last one, though, highlights how cumbersome it is to do proper
matching against typical HTML form-generated query strings (it's not
really even possible with wildcards). Perhaps a more appropriate
pattern-matcher specifically for query strings would be a good idea.
It's probably enough to do something like --query-='action=Edit', where
there's an implied '\?([^?]*)?' before, and '([^?]*)?$' after.
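
To illustrate why the looser regex is not enough, here is a small Python
sketch (Python's re module standing in for the gnulib regex code). The
helper name query_rule is hypothetical. Note one assumption: the wrapper
below puts '&' separators inside the implied prefix and suffix so that a
whole parameter must match; the patterns as archived above show no '&', so
this is a guess at the intent rather than a quotation.

```python
import re

def query_rule(fragment):
    """Wrap a 'name=value' regex fragment in an implied prefix and suffix."""
    # Assumed form: optional "...&" before, optional "&..." after, no second '?'.
    return re.compile(r"\?([^?]*&)?(?:" + fragment + r")(&[^?]*)?$")

loose = re.compile(r"\?.*action=Edit")
strict = query_rule("action=Edit")

urls = [
    "http://example.com/w/index.php?title=Foo&action=Edit",
    "http://example.com/w/index.php?action=Edit&title=Foo",
    "http://example.com/w/index.php?transaction=Editor",
]
for url in urls:
    print(url, bool(loose.search(url)), bool(strict.search(url)))
```

The third URL is the interesting case: the loose pattern accepts it because
"action=Edit" appears inside another parameter, while the wrapped pattern
only accepts a complete action=Edit parameter.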

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/