At 11:16 PM -0500 3/18/2000, Geoff Hutchison wrote:
>On Sat, 18 Mar 2000, Vincent Bumgarner wrote:
>
> > I would really like regex capabilities in a number of places,
> > particularly in the restrict argument. Reading through the archives,
> > I see that this is sought after by many users. I also see that this
> > is in the 3.2 release. Is the 3.2 beta stable enough for general use?
>
>I certainly wouldn't recommend it for production use on an entire ISP of
>public sites. You're free to try out one of the latest snapshots (one will
>be rolling in a few hours--it's done automatically every Sunday at
>midnight Pacific time). <http://www.htdig.org/files/snapshots/>
>
>My current suggestion is to set it up on a test machine and try it out. If
>you do, please submit feedback, bug reports, etc.
I will do so.
>How do you need regex? Even in previous versions, the search forms allow
>OR'ing together patterns, which can provide a workaround.
At this point, I'm tying a whole bunch of restrict statements
together, which lets a user search several sites at once (an "or").
I also want to let the user limit results to particular file types,
which is an "and" on top of that "or".
For instance:
restrict=www.uu.net&restrict=www.us.uu.net&restrict=www.uk.uu.net
which is what initially comes in from the client, then gets handed around as
restrict=www.uu.net|www.us.uu.net|www.uk.uu.net
This is fine most of the time.
It would be nice to be able to write
restrict=www\.(uu|us\.uu|uk\.uu)\.net.*\.pdf
or, at the worst
restrict=www.uu.net&restrict=www.us.uu.net&restrict=www.uk.uu.net
URL encoded, of course.
This isn't that big of a deal at this point, but it might be soon.
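
To make the difference concrete, here is a rough Python sketch of the two behaviours. It is only an illustration, not how htsearch handles restrict internally: the function names are made up, and it assumes the pattern would be applied as an unanchored regex against the full URL.

```python
import re
from urllib.parse import parse_qs, quote, unquote

def join_restricts(query: str) -> str:
    """Collapse repeated restrict= parameters into one '|'-joined pattern.

    Hypothetical helper: mirrors the hand-off described above, not htsearch code.
    """
    values = parse_qs(query).get("restrict", [])
    return "|".join(values)

# The pattern from the message: one of three hosts AND a .pdf file type.
SITE_AND_TYPE = re.compile(r"www\.(uu|us\.uu|uk\.uu)\.net.*\.pdf")

def matches(url: str) -> bool:
    """True if the URL satisfies both the site and the file-type condition."""
    return SITE_AND_TYPE.search(url) is not None

# What comes in from the client, and what gets handed around.
joined = join_restricts(
    "restrict=www.uu.net&restrict=www.us.uu.net&restrict=www.uk.uu.net"
)

# The regex has to be URL encoded before it goes into a query string.
encoded = quote(SITE_AND_TYPE.pattern, safe="")
```

With the plain "or" form there is no way to say "these hosts and only PDFs" in one restrict value, while the regex form covers both conditions at once.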
> > In the near future, I will need 2-byte character support. Is this in
> > the 3.2 release? If not, is it planned for inclusion in the near
> > future?
>
>It is not in the 3.2 codebase. It's certainly high on the priority list,
>but right now I don't know of anyone working on it. I've seen a few people
>ask about it and then disappear. Since there are a few decent
>Unicode/UTF-8 libraries out there, it should be fairly straight-forward.
>(No, I have my hands full and I'm not volunteering to do it.)
>
>It might be an interesting proposal to post on cosource.com or someplace
>similar.
Well, it doesn't sound like it could be implemented very quickly. It
doesn't sound like it would make sense to implement this in the 3.1
code, and it doesn't sound like the 3.2 code is coming very soon.
I'll put a request on cosource.com and see what happens.
I'll go ahead with implementation of 3.1.5, and we'll see how long it
is before the lack of Unicode becomes a problem.
I'm also noticing horrible inconsistencies between the way the
different webmasters around the world implement meta tags. Some use
the same tags throughout the site, which is fairly useless. It would
be nice to be able to specify a lot of the settings by site, kind of
like the <VirtualHost> sections in Apache's config file.
For instance:
<SITE>
start_url: http://www.uu.net/
exclude_urls: /cgi-bin/ .cgi ?
server_aliases: www.uunet.net:80=www.uu.net:80
keywords_meta_tag_names: description keywords
use_meta_description: true
</SITE>
<SITE>
start_url: http://www.ca.uu.net/
exclude_urls: /cgi-bin/ .cgi ?
server_aliases: www.uunet.ca:80=www.ca.uu.net:80
keywords_meta_tag_names: description keywords
use_meta_description: true
</SITE>
<SITE>
start_url: http://www.ch.uu.net/
exclude_urls: /cgi-bin/ .cgi
keywords_meta_tag_names: none
use_meta_description: false
</SITE>
Please let me know if I'm missing a way to do this now.
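
In case it helps explain what I mean, here is a small Python sketch of how such sections could be read and applied. The <SITE> syntax is only my proposal above, not something ht://Dig supports, and the function names and the "defaults" fallback are assumptions for illustration.

```python
import re
from urllib.parse import urlsplit

# Proposed (hypothetical) per-site sections, as in the example above.
SAMPLE = """
<SITE>
start_url: http://www.uu.net/
exclude_urls: /cgi-bin/ .cgi ?
keywords_meta_tag_names: description keywords
use_meta_description: true
</SITE>
<SITE>
start_url: http://www.ch.uu.net/
exclude_urls: /cgi-bin/ .cgi
keywords_meta_tag_names: none
use_meta_description: false
</SITE>
"""

SITE_BLOCK = re.compile(r"<SITE>(.*?)</SITE>", re.DOTALL)

def parse_sites(text):
    """Return {start_url: {attribute: value}} for each <SITE> block."""
    sites = {}
    for body in SITE_BLOCK.findall(text):
        settings = {}
        for line in body.splitlines():
            line = line.strip()
            if not line:
                continue
            # Split on the first colon only, so URLs and host:port survive.
            key, _, value = line.partition(":")
            settings[key.strip()] = value.strip()
        start_url = settings.get("start_url")
        if start_url:  # skip blocks that do not name a site
            sites[start_url] = settings
    return sites

def settings_for(url, sites, defaults):
    """Pick the section whose start_url host matches, over global defaults."""
    host = urlsplit(url).netloc
    for start_url, settings in sites.items():
        if urlsplit(start_url).netloc == host:
            return {**defaults, **settings}
    return dict(defaults)

sites = parse_sites(SAMPLE)
defaults = {"use_meta_description": "false"}
ch = settings_for("http://www.ch.uu.net/products/", sites, defaults)
```

The idea is that anything not set inside a section falls back to the global value, so only the settings that differ need to be listed per site.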
Thanks again,
########
Vincent Bumgarner #
UUNET Global Marketing #
Webmaster, www.uu.net #
[EMAIL PROTECTED] #
v. 703-886-6460 #
##########