On Tue, 29 Jan 2002, Geoff Hutchison wrote:
> I'm not sure why you'd want htmerge and not htpurge in this case (not
> to mention htdump and htload). I would think that in a library,
> you're less likely to need to merge whole databases together and more
> likely to want to delete URLs, etc.
This is the first pass, thought I'd throw it out for comment. The
merging of databases was useful to me, before I realized that htdig
could do the incremental adding/merging on its own. ;-)
Definitely purge, dump, load and the other utilities are useful
and I'll add them ASAP.
> > sprintf(htdig_params.configFile, "/etc/htdig/htdig.conf");
> > strcpy(htdig_params.credentials,"");
> > strcpy(htdig_params.max_hops, ""); //9 digit limit
> > strcpy(htdig_params.minimalFile, "");
> > strcpy(htdig_params.URL, ""); //stdin HTTP addrs
>
> Maybe this is just me, but I don't see how this is any more useful
> than using brackets (e.g. in the defaults.cc code) or simply:
> config["authorization"] = "";
> config["max_hops"] = "-1";
> etc.
Nope, I just wrote something very quick that had a direct mapping
to the command line parameters available to htmerge & htdig. More than
one way to skin a cat.
I like libraries that have one main API header &
documentation... lower initial startup time.
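To make the two styles concrete, here is a minimal sketch of how a flat parameter struct could be translated into keyed config entries behind a single header. Everything in it is an assumption for illustration: the `ParamsSketch` type, the buffer sizes, the `set_field` and `to_config` helpers, and the mapping of `credentials` onto the `authorization` key. None of it is the actual libhtdig code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <map>
#include <string>

// Hypothetical stand-in for the parameter struct; buffer sizes are assumptions.
struct ParamsSketch {
    char configFile[256];
    char credentials[256];
    char max_hops[10];  // 9 digits plus the terminating NUL
};

// Bounded copy into a fixed-size field; returns false if the value was truncated.
template <std::size_t N>
bool set_field(char (&dest)[N], const char* value) {
    const int written = std::snprintf(dest, N, "%s", value);
    return written >= 0 && static_cast<std::size_t>(written) < N;
}

// Translate the flat struct into keyed entries, skipping fields left empty.
// The key names follow the quoted example; the field-to-key mapping is assumed.
std::map<std::string, std::string> to_config(const ParamsSketch& p) {
    std::map<std::string, std::string> config;
    if (p.credentials[0] != '\0') config["authorization"] = p.credentials;
    if (p.max_hops[0] != '\0') config["max_hops"] = p.max_hops;
    return config;
}

// Small self-check showing the intended use.
void demo_params() {
    ParamsSketch p = {};  // zero-initialized, so every field starts as ""
    assert(set_field(p.configFile, "/etc/htdig/htdig.conf"));
    assert(set_field(p.max_hops, "3"));
    const std::map<std::string, std::string> config = to_config(p);
    assert(config.size() == 1);
    assert(config.at("max_hops") == "3");
}
```

The bounded copy is the main difference from plain strcpy: an over-long value is reported to the caller instead of overrunning the buffer.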
> >to mix and match them. Currently the parser classes receive a Retriever
> >object as a parameter and issue callback-style calls to the Retriever
> >object.
>
> This seems a little strange to me. Clearly you could do this, but
> since the Parser and Retriever classes are tied tightly in any model,
> if you would have to write completely new Parsers for any new
> Retriever you wrote.
Point taken. The virtual base class would be more of a skeleton
for future developers: a kind of spec for what methods need to be
defined (virtual void functions, etc.). The current Retriever is heavily
built around the idea of web pages and HTTP (of course), but you could write
a retriever to get docs directly out of a database, from a file, scp, POP,
via parameters, etc. Here are some new parser ideas:
Unix-style mail files,
XML files (given a spec; see XSLT),
flat-file databases,
Usenet files,
Netscape/IE/other bookmark files,
other document formats (via a separate library with StarOffice 6.0 importing code)
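A rough sketch of what such a skeleton might look like, with the parser making callback-style calls on whatever retriever it is handed. The class names (`RetrieverBase`, `ParserBase`, `PlainTextParser`, `CollectingRetriever`) and the callback signatures are hypothetical and are not taken from the existing htdig classes.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical skeleton: the callbacks any retriever would have to define.
class RetrieverBase {
public:
    virtual ~RetrieverBase() = default;
    virtual void got_title(const std::string& title) = 0;
    virtual void got_word(const std::string& word, int location) = 0;
};

// Hypothetical skeleton: a parser only sees the abstract retriever.
class ParserBase {
public:
    virtual ~ParserBase() = default;
    virtual void parse(const std::string& contents, RetrieverBase& retriever) = 0;
};

// Toy parser: first line is the title, the rest is whitespace-separated words.
class PlainTextParser : public ParserBase {
public:
    void parse(const std::string& contents, RetrieverBase& retriever) override {
        std::istringstream input(contents);
        std::string title;
        std::getline(input, title);
        retriever.got_title(title);

        std::string word;
        int location = 0;
        while (input >> word) {
            retriever.got_word(word, location++);
        }
    }
};

// Toy retriever: records what the parser reports instead of indexing it.
class CollectingRetriever : public RetrieverBase {
public:
    std::string title;
    std::vector<std::string> words;

    void got_title(const std::string& t) override { title = t; }
    void got_word(const std::string& w, int /*location*/) override {
        words.push_back(w);
    }
};

// Small self-check showing the parser and retriever working together.
void demo_parse() {
    PlainTextParser parser;
    CollectingRetriever sink;
    parser.parse("Subject line\nhello wide world\n", sink);
    assert(sink.title == "Subject line");
    assert(sink.words.size() == 3);
}
```

Because the parser depends only on the abstract interface, a new retriever would not force a rewrite of the parser, provided it implements the same callbacks.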
For now, I was thinking of writing an additional method for the
Parsers that accepts the alternative Retriever as a parameter.
Basically this retriever will be given a document ID, a
title, a string of 'meta data' and a string representing the
document. It's very general purpose and very similar to the existing
flow; the input just isn't an HTML document, but it does have some
structure.
htdig_index_open(&htdig_params);
//start loop
htdig_index_document(.....);
//end loop
htdig_index_close();
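As a self-contained illustration of that open / loop / close flow, here is a sketch using an in-memory stand-in. The `sketch` namespace, the `Document` struct, the `Index` class and `index_all` are all hypothetical; they mirror the four fields described above but are not the signatures of the htdig_index_* calls, whose arguments are not spelled out here.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

namespace sketch {

// Hypothetical record carrying the four pieces described above.
struct Document {
    std::string id;
    std::string title;
    std::string meta;      // free-form 'meta data' string
    std::string contents;  // the document text itself
};

// In-memory stand-in for an index; a real one would write to the databases.
class Index {
public:
    // Refuses to open twice so misuse shows up early.
    bool open() {
        if (is_open_) return false;
        is_open_ = true;
        return true;
    }

    // Accepts a document only while open and only if it has an ID.
    bool add(const Document& doc) {
        if (!is_open_ || doc.id.empty()) return false;
        docs_[doc.id] = doc;  // re-adding an ID replaces the old entry
        return true;
    }

    void close() { is_open_ = false; }

    std::size_t size() const { return docs_.size(); }

private:
    bool is_open_ = false;
    std::map<std::string, Document> docs_;
};

// The open / loop / close pattern; returns how many documents were accepted.
std::size_t index_all(Index& index, const std::vector<Document>& batch) {
    if (!index.open()) return 0;
    std::size_t accepted = 0;
    for (const Document& doc : batch) {
        if (index.add(doc)) ++accepted;
    }
    index.close();
    return accepted;
}

}  // namespace sketch
```

Keying on the document ID means feeding the same record twice updates it in place, which is one plausible way to get incremental adds without a separate merge step.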
> It would be more elegant to have htsearch code that supported such an
> API. At the moment, Torsten has done what he can do--massage the
> output from htsearch. This is one reason for a new query parser. (And
> yes, the new query parser that Quim donated could be styled to have
> different query syntax if you want.)
Great. I'll forward the wrappers as I get them written and
tested. Torsten's page/code is good, no knock on him.
We've got lots of experience writing these wrappers. It actually
ends up being a separate shared library (libhtdigphp.so) that receives
parameters and calls libhtdig.so functions. This is necessary because you
don't want to have the main libhtdig.so be dependent on PHP header files
to compile.
I haven't gotten too deep into the htsearch code, but for now
I'll just be repackaging data and breaking up/reusing existing
functions. When the new query parser is integrated I'll change the
PHP wrappers to support that.
Again, as I write and test this stuff I'll forward .tgz files with
a script to do the setup and diff-ing. Feel free to use it or pipe it to
/dev/null if all you want is a web-crawling search engine. ;-)
Thanks.
--
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site