According to Sean Downey:
> Thanks a million Gilles
> 
> if there was a patch for this - I think & hope - all the problems I've had
> getting it going would be solved
...
> -----Original Message-----
> From: Gilles Detillieux [mailto:grdetil@scrc.umanitoba.ca]
...
> comes close to what Sean is looking for.  Back in 1999, when Rajendra
> Inamdar first offered us a patch for this feature, it was for 3.1.3.
> See http://www.htdig.org/htdig-dev/1999/11/0181.html
> 
> However, at the time this was too radical a change to incorporate into
> the 3.1.x tree, so we suggested putting it into 3.2 instead.  In the end,
> Rajendra ported the patch to 3.2, and that was the version that
> made it into the CVS tree (for 3.2) and into the mail archives.
> 
> However, 3.1.x users continued to request this feature, so at some point
> it made it into the patch archive...
> 
>   ftp://ftp.ccsf.org/htdig-patches/3.1.3/htdig-3.1.3-nntp-mdb.tar.gz
> 
> The patch also adds a couple other features, and it may take some work to
> adapt it to 3.1.6, but it's a start.  If you don't want the nntp support
> in htdig, I think you can remove everything from the patch other than
> the changes to htsearch/* source files.  There may be some complications
> due to interactions with the addition of max_excerpts handling in 3.1.6,
> as well as the changes involving coded vs unencoded URLs.  There may also
> be a few other tricky bits in main() because of recent changes there.
> I'll see if I can clean up this patch a little.

OK, I cleaned it up a lot!  There were lots of changes between 3.1.3
and 3.1.6, so it took quite a bit longer than I thought it would to
get this adapted and working on 3.1.6.  But, I did it.  I also removed
some of the superfluous stuff like the "search policy" additions and the
whole collection_names attribute which seemed to cause too much confusion
(build_select_lists does a better job of generating the followup config
file list than collection_names did).  I also put in a hook to get rid
of duplicate URLs, which can happen when the same URL winds up in two or
more collections.  (I'll have to port this to 3.2 as well!)

So, the patch for 3.1.6 is below.  Apply it in your main htdig-3.1.6
source directory using "patch -p0 < this-message".  The patch is designed
to be applied after you've already run "./configure" in that directory.
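
In other words, if you save this message to a file, the build sequence
is roughly this (the saved file name is a placeholder, and configure
takes whatever options you normally use):

  cd htdig-3.1.6
  ./configure
  patch -p0 < this-message
  make

Then install the rebuilt htsearch as you normally would.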

To enable it, #define COLLECTIONS in include/htconfig.h (the patch does
that for you by default, but the definition gets undone by any later run
of ./configure).
To use it, set up a number of config files in your CONFIG_DIR, and have
them all begin with "include: htdig.conf" (or some other common config
file, in which you'll have all your attribute settings).  In each of
these config files, after the include above, override the settings of
database_dir (or database_base) and start_url, so that each config file
defines a separate database for a different collection of indexed URLs.
htsearch ends up using all of the display customization attributes from
the last selected config file, so for the sake of consistency, the config
files should share a common set of htsearch attributes.
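
For instance, a physiology.conf in your CONFIG_DIR could be as small as
this (the database_dir and start_url values are made-up examples here;
substitute your own):

  include:        htdig.conf
  database_dir:   /var/lib/htdig/physiology
  start_url:      http://www.physiology.example.edu/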

To set up your initial search form, replace the "hidden" input parameter
definition for "config" with something like:

  Collections:
  <input type="checkbox" name="config" value="scrc" checked>SCRC Pages
  <input type="checkbox" name="config" value="physiology">Physiology Pages
  <input type="checkbox" name="config" value="wcsn">Winnipeg Chapter SFN Pages
  <br>

and in the followup search forms (in header.html, wrapper.html, nomatch.html
and syntax.html), replace it with...

  Collections:
  $(CONFIG_LIST)<br>

which would be generated by these attribute settings in htdig.conf:

  build_select_lists:   CONFIG_LIST,checkbox config config_names 2 1 2 "" ""
  config_names:         scrc "SCRC Pages" physiology "Physiology Pages" \
                        wcsn "Winnipeg Chapter SFN Pages"
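
With that in place, checking more than one collection just submits
several "config" parameters in a single request, which the patched
htsearch splits into its list of collections, e.g. something along
these lines (the /cgi-bin/htsearch path and the search word are only
placeholders):

  /cgi-bin/htsearch?words=calcium&config=scrc&config=physiology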

Here's the patch...

--- acconfig.h.nocoll   Thu Jan 31 17:47:18 2002
+++ acconfig.h  Mon Nov 11 16:44:14 2002
@@ -40,6 +40,10 @@
 /*  regardless of the security problems with this. */
 #undef ALLOW_INSECURE_CGI_CONFIG
 
+/* Define this if you want to allow htsearch to use collections by taking */
+/*  multiple "config" CGI input parameters. */
+#undef COLLECTIONS
+
 /* Define to remove the word count in db and WordRef struct. */
 #undef NO_WORD_COUNT
 
--- htsearch/Collection.cc.nocoll       Mon Nov 11 14:07:43 2002
+++ htsearch/Collection.cc      Mon Nov 11 16:37:26 2002
@@ -0,0 +1,88 @@
+//
+// Collection.cc
+//
+//
+#if RELEASE
+static char RCSid[] = "$Id: Collection.cc,v 1.0 2000/03/17 18:34:23 inamdar Exp $";
+#endif
+
+#include "htsearch.h"
+#include "Collection.h"
+#include "ResultMatch.h"
+#include "WeightWord.h"
+#include "StringMatch.h"
+#include "QuotedStringList.h"
+#include "URL.h"
+#include <fstream.h>
+#include <stdio.h>
+#include <ctype.h>
+#include <syslog.h>
+#include <locale.h>
+#include "HtURLCodec.h"
+#include "HtWordType.h"
+
+#ifdef COLLECTIONS
+//*****************************************************************************
+//
+Collection::Collection(char *name, char *word_file, char *index_file, 
+    char *doc_file)
+{
+    isopen = 0;
+    collectionName = name;
+    wordFile = word_file;
+    indexFile = index_file;
+    docFile = doc_file;
+    docIndex = NULL;
+    matches = NULL;
+    searchWords = NULL;
+    searchWordsPattern = NULL;
+}
+
+Collection::~Collection()
+{
+    Close();
+}
+
+void
+Collection::Open()
+{
+    if (!isopen)
+    {
+        docIndex = Database::getDatabaseInstance();
+        docIndex->OpenRead(indexFile);
+        docDB.SetCompatibility(config.Boolean("uncoded_db_compatible", 1));
+        docDB.Read(docFile);
+    }
+    isopen = 1;
+}
+
+void
+Collection::Close()
+{
+    if (isopen)
+    {
+        docDB.Close();
+        docIndex->Close();
+        docIndex = NULL;
+    }
+    isopen = 0;
+}
+
+// Collection::operator [] (char *u) 
+
+DocumentRef *
+Collection::getDocumentRef(char *u)
+{
+    Open();
+    return docDB.FindCoded(u);
+    // return docDB[u];
+}
+
+int
+Collection::Get(char *key, String &data)
+{
+    Open();
+    return docIndex->Get(key, data);
+}
+
+#endif
--- htsearch/Collection.h.nocoll        Mon Nov 11 14:07:43 2002
+++ htsearch/Collection.h       Mon Nov 11 16:36:26 2002
@@ -0,0 +1,68 @@
+//
+// Collection.h
+//
+// $Id: Collection.h,v 1.0 2000/03/17 18:34:23 inamdar Exp $
+//
+#ifndef _Collection_h_
+#define _Collection_h_
+
+#include "Object.h"
+#include "ResultList.h"
+#include "ResultMatch.h"
+#include "TemplateList.h"
+#include "cgi.h"
+#include "StringMatch.h"
+#include "List.h"
+#include "DocumentDB.h"
+#include "Database.h"
+#include "Dictionary.h"
+
+#ifdef COLLECTIONS
+class Collection : public Object
+{
+public:
+    //
+    // Construction/Destruction
+    //
+    Collection(char *name, char *wordFile, char *indexFile, char *docFile);
+    ~Collection();
+
+    void Collection::Open();
+
+    void Collection::Close(); 
+
+    char *getWordFile() { return wordFile.get(); }
+
+    // DocumentRef         *operator [] (char *url);
+    DocumentRef         *getDocumentRef(char *url);
+    int                Get(char *key, String &data);
+
+    ResultList         *getResultList() { return matches; }
+    void               setResultList(ResultList *list) { matches = list; }
+
+    List                *getSearchWords() { return searchWords; }
+    void                setSearchWords(List *list) { searchWords = list; }
+
+    StringMatch         *getSearchWordsPattern() { return searchWordsPattern;}
+    void                setSearchWordsPattern(StringMatch *smatch)
+                            { searchWordsPattern = smatch; }
+
+protected:
+    String              collectionName;
+    String              wordFile;
+    String              indexFile;
+    String              docFile;
+    ResultList         *matches;
+    List                *searchWords;
+    StringMatch         *searchWordsPattern;
+    
+
+    DocumentDB          docDB;
+    Database            *docIndex;     
+
+    int                 isopen;
+};
+
+#endif // COLLECTIONS
+#endif // _Collection_h_
+
--- htsearch/Display.cc.nocoll  Thu Jan 31 17:47:18 2002
+++ htsearch/Display.cc Mon Nov 11 20:01:28 2002
@@ -10,6 +10,9 @@ static char RCSid[] = "$Id: Display.cc,v
 #endif
 
 #include "htsearch.h"
+#ifdef COLLECTIONS
+#include "Collection.h"
+#endif
 #include "Display.h"
 #include "ResultMatch.h"
 #include "WeightWord.h"
@@ -29,6 +32,11 @@ extern int           debug;
 
 //*****************************************************************************
 //
+#ifdef COLLECTIONS
+Display::Display(Dictionary *collections)
+{
+    active_collections = collections;
+#else
 Display::Display(char *indexFile, char *docFile)
 {
     docIndex = Database::getDatabaseInstance();
@@ -39,6 +47,7 @@ Display::Display(char *indexFile, char *
     docDB.SetCompatibility(config.Boolean("uncoded_db_compatible", 1));
 
     docDB.Read(docFile);
+#endif
 
     limitTo = 0;
     excludeFrom = 0;
@@ -87,7 +96,9 @@ Display::Display(char *indexFile, char *
 //*****************************************************************************
 Display::~Display()
 {
+#ifndef COLLECTIONS
     delete docIndex;
+#endif
 }
 
 //*****************************************************************************
@@ -179,7 +190,12 @@ Display::display(int pageNumber)
     {
        if (currentMatch >= startAt)
        {
+#ifdef COLLECTIONS
+           Collection *collection = match->getCollection();
+           match->setRef(collection->getDocumentRef(match->getURL()));
+#else
            match->setRef(docDB.FindCoded(match->getURL()));
+#endif
            DocumentRef *ref = match->getRef();
            if (!ref)
                continue;       // The document isn't present for some reason
@@ -266,7 +282,11 @@ Display::displayMatch(ResultMatch *match
     String urlanchor(url);
     if (anchor)
       urlanchor << anchor;
+#ifdef COLLECTIONS
+    vars.Add("EXCERPT", excerpt(match, urlanchor, fanchor, first));
+#else
     vars.Add("EXCERPT", excerpt(ref, urlanchor, fanchor, first));
+#endif
     //
     // anchor only relevant if an excerpt was found, i.e.,
     // the search expression matches the body of the document
@@ -699,8 +719,35 @@ Display::createURL(String &url, int page
        url << "restrict=" << encodeInput("restrict") << ';';
     if (input->exists("exclude"))
        url << "exclude=" << encodeInput("exclude") << ';';
+
+#ifdef COLLECTIONS
+    // RMI
+    // Put out all specified collections. If none selected, resort to
+    // default behaviour
+    char *config_name = collectionList[0];
+    if (config_name && config_name[0] == '\0')
+       config_name = NULL;
+
+    if (config_name)
+    {
+       for (int i=0; i<collectionList.Count(); i++)
+       {
+           config_name = collectionList[i];
+           s = config_name;
+           encodeURL(s);
+           url << "config=" << s.get() << ';';
+       }
+    }
+    else
+    {
+       if (input->exists("config"))
+           url << "config=" << encodeInput("config") << ';';
+    }
+#else
     if (input->exists("config"))
        url << "config=" << encodeInput("config") << ';';
+#endif
+
     if (input->exists("method"))
        url << "method=" << encodeInput("method") << ';';
     if (input->exists("format"))
@@ -1358,13 +1405,32 @@ Display::buildMatchList()
 
     // ... MG
 
+#ifdef COLLECTIONS
+  // RMI: deal with all collections
+  active_collections->Start_Get();
+  Collection *collection;
+  while ((collection=(Collection *)active_collections->Get_NextElement()) != 0)
+  {
+    ResultList *results = collection->getResultList();
+    if (results == NULL)
+       continue;
+#endif
+
     results->Start_Get();
     while ((id = results->Get_Next()))
     {
        //
        // Convert the ID to a URL
        //
+#ifdef COLLECTIONS
+       DocMatch *dm = results->find(id);
+       Collection *collection = NULL;
+       if (dm)
+           collection = dm->collection;
+       if (collection == NULL || collection->Get(id, coded_url) == NOTOK)
+#else
        if (docIndex->Get(id, coded_url) == NOTOK)
+#endif
        {
            continue;
        }
@@ -1382,6 +1448,9 @@ Display::buildMatchList()
        thisMatch = new ResultMatch();
        thisMatch->setURL(coded_url);
        thisMatch->setRef(NULL);
+#ifdef COLLECTIONS
+       thisMatch->setCollection(collection);
+#endif
 
        //
        // Get the actual document record into the current ResultMatch
@@ -1394,7 +1463,9 @@ Display::buildMatchList()
        // known at that time, or info about the document itself, 
        // so this still needs to be done.
        //
+#ifndef COLLECTIONS
        DocMatch        *dm = results->find(id);
+#endif
        double           score = dm->score;
 
        // We need to scale based on date relevance and backlinks
@@ -1409,7 +1480,13 @@ Display::buildMatchList()
        if (date_factor != 0.0 || backlink_factor != 0.0 || typ != SortByScore
            || timet_startdate > 0 || enddate.tm_year < endoftime->tm_year)
          {
+#ifdef COLLECTIONS
+           Collection *collection = thisMatch->getCollection();
+           DocumentRef *thisRef = collection->getDocumentRef(
+                                       thisMatch->getURL());
+#else
            DocumentRef *thisRef = docDB.FindCoded(thisMatch->getURL());
+#endif
            if (thisRef)   // We better hope it's not null!
              {
                // code added by Mike Grommet for date search ranges
@@ -1452,6 +1529,9 @@ Display::buildMatchList()
        //
        matches->Add(thisMatch);
     }
+#ifdef COLLECTIONS
+  }
+#endif
 
     //
     // The matches need to be ordered by relevance level.
@@ -1464,8 +1544,14 @@ Display::buildMatchList()
 
 //*****************************************************************************
 String *
+#ifdef COLLECTIONS
+Display::excerpt(ResultMatch *match, String urlanchor, int fanchor, int &first)
+{
+    DocumentRef        *ref = match->getRef();
+#else
 Display::excerpt(DocumentRef *ref, String urlanchor, int fanchor, int &first)
 {
+#endif
     char       *head;
     int                use_meta_description = 0;
 
@@ -1486,6 +1572,15 @@ Display::excerpt(DocumentRef *ref, Strin
     String     part;
     String     *text = new String();
 
+#ifdef COLLECTIONS
+    Collection *collection = match->getCollection();
+    StringMatch *allWordsPattern = NULL;
+    if (collection)
+       allWordsPattern = collection->getSearchWordsPattern();
+    if (!allWordsPattern)
+       return text;
+#endif
+
     // htsearch displays the description when:
     // 1) a description has been found
     // 2) the option "use_meta_description" is set to true
@@ -1544,20 +1639,32 @@ Display::excerpt(DocumentRef *ref, Strin
        if (end > temp + headLength)
        {
            end = temp + headLength;
+#ifdef COLLECTIONS
+           *text << hilight(match, start, urlanchor, fanchor);
+#else
            *text << hilight(start, urlanchor, fanchor);
+#endif
        }
        else
        {
            while (*end && HtIsStrictWordChar(*end))
                end++;
            *end = '\0';
+#ifdef COLLECTIONS
+           *text << hilight(match, start, urlanchor, fanchor);
+#else
            *text << hilight(start, urlanchor, fanchor);
+#endif
            *text << config["end_ellipses"];
        }
     }
     else
     {
+#ifdef COLLECTIONS
+      *text = buildExcerpts( match, allWordsPattern, head, urlanchor, fanchor );
+#else
       *text = buildExcerpts( head, urlanchor, fanchor );
+#endif
     }
 
     return text;
@@ -1567,7 +1674,11 @@ Display::excerpt(DocumentRef *ref, Strin
 // Handle cases where multiple document excerpts are requested.
 //
 const String
+#ifdef COLLECTIONS
+Display::buildExcerpts( ResultMatch *match, StringMatch *allWordsPattern, char *head, 
+String urlanchor, int fanchor )
+#else
 Display::buildExcerpts( char *head, String urlanchor, int fanchor )
+#endif
 {
   if ( !config.Boolean( "add_anchors_to_excerpt" ) )
   {
@@ -1630,7 +1741,11 @@ Display::buildExcerpts( char *head, Stri
     {
       end = head + headLength;
 
+#ifdef COLLECTIONS
+      text << hilight(match, start, urlanchor, fanchor);
+#else
       text << hilight( start, urlanchor, fanchor );
+#endif
     }
     else
     {
@@ -1644,7 +1759,11 @@ Display::buildExcerpts( char *head, Stri
 
       *end = '\0';
 
+#ifdef COLLECTIONS
+      text << hilight(match, start, urlanchor, fanchor);
+#else
       text << hilight(start, urlanchor, fanchor);
+#endif
       text << config["end_ellipses"];
 
       *end = endChar;
@@ -1660,7 +1779,11 @@ Display::buildExcerpts( char *head, Stri
 
 //*****************************************************************************
 char *
+#ifdef COLLECTIONS
+Display::hilight(ResultMatch *match, char *str, String urlanchor, int fanchor)
+#else
 Display::hilight(char *str, String urlanchor, int fanchor)
+#endif
 {
     static char                *start_highlight = config["start_highlight"];
     static char                *end_highlight = config["end_highlight"];
@@ -1672,6 +1795,19 @@ Display::hilight(char *str, String urlan
     int                        first = 1;
 
     result = 0;
+#ifdef COLLECTIONS
+    Collection *collection = match->getCollection();
+    StringMatch *allWordsPattern = NULL;
+    List *searchWords = NULL;
+    if (collection)
+    {
+       allWordsPattern = collection->getSearchWordsPattern();
+       searchWords = collection->getSearchWords();
+    }
+    if (!allWordsPattern || !searchWords)
+       return result;
+#endif
+
     while (allWordsPattern->hasPattern() &&
           (pos = allWordsPattern->FindFirstWord(str, which, length)) >= 0)
     {
@@ -1718,6 +1854,28 @@ Display::sort(List *matches)
          (typ == SortByTime) ? Display::compareTime :
          Display::compare);
 
+#ifdef COLLECTIONS
+    // In case there are duplicate URLs across collections, keep "best" ones
+    // after sorting them.
+    Dictionary goturl;
+    String     url;
+    char       *coded_url;
+    int                j = 0;
+    for (i = 0; i < numberOfMatches; i++)
+    {
+       coded_url = array[i]->getURL();
+       String url = HtURLCodec::instance()->decode(coded_url);
+       HtURLRewriter::instance()->Replace(url);
+       if (goturl.Exists(url))
+           delete array[i];
+       else
+       {
+           array[j++] = array[i];
+           goturl.Add(url, 0);
+       }
+    }
+    numberOfMatches = j;
+#endif
     char       *st = config["sort"];
     if (st && *st && mystrncasecmp("rev", st, 3) == 0)
     {
--- htsearch/Display.h.nocoll   Thu Jan 31 17:47:18 2002
+++ htsearch/Display.h  Mon Nov 11 17:24:46 2002
@@ -25,18 +25,26 @@ public:
     //
     // Construction/Destruction
     //
+#ifdef COLLECTIONS
+    Display(Dictionary *active_collections);
+#else
     Display(char *indexFile, char *docFile);
+#endif
     ~Display();
 
     void               setStartTemplate(char *templateName);
     void               setMatchTemplate(char *templateName);
     void               setEndTemplate(char *templateName);
        
+#ifndef COLLECTIONS
     void               setResults(ResultList *results);
     void               setSearchWords(List *searchWords);
+#endif
     void               setLimit(StringMatch *);
     void               setExclude(StringMatch *);
+#ifndef COLLECTIONS
     void               setAllWordsPattern(StringMatch *);
+#endif
     void               setLogicalWords(char *);
     void               setOriginalWords(char *);
     void               setCGI(cgi *);
@@ -59,6 +67,12 @@ public:
     SortType           sortType();
 
 protected:
+#ifdef COLLECTIONS
+    //
+    // The list of search result collections.
+    //
+    Dictionary         *active_collections;
+#else
     //
     // The list of search results.
     //
@@ -78,6 +92,7 @@ protected:
     // A list of words that we are searching for
     //
     List               *searchWords;
+#endif
 
     //
     // Pattern that all result URLs must match or exclude
@@ -88,7 +103,9 @@ protected:
     //
     // Pattern of all the words
     //
+#ifndef COLLECTIONS
     StringMatch                *allWordsPattern;
+#endif
        
     //
     // Variables for substitution into text are stored in a dictionary
@@ -159,9 +176,15 @@ protected:
     String             *readFile(char *);
     void               expandVariables(char *);
     void               outputVariable(char *);
+#ifdef COLLECTIONS
+    String             *excerpt(ResultMatch *match, String urlanchor, int fanchor, 
+int &first);
+    const String        buildExcerpts( ResultMatch *match, StringMatch 
+*allWordsPattern, char *head, String urlanchor, int fanchor );
+    char               *hilight(ResultMatch *match, char *str, String urlanchor, int 
+fanchor);
+#else
     String             *excerpt(DocumentRef *ref, String urlanchor, int fanchor, int 
&first);
     const String        buildExcerpts( char *head, String urlanchor, int fanchor );
     char               *hilight(char *str, String urlanchor, int fanchor);
+#endif
     void               setupTemplates();
     void               setupImages();
     String             *generateStars(DocumentRef *, int);
@@ -184,23 +207,29 @@ Display::setExclude(StringMatch *exclude
     excludeFrom = exclude;
 }
 
+#ifndef COLLECTIONS
 inline void
 Display::setAllWordsPattern(StringMatch *pattern)
 {
     allWordsPattern = pattern;
 }
+#endif
 
+#ifndef COLLECTIONS
 inline void
 Display::setResults(ResultList *results)
 {
     this->results = results;
 }
+#endif
 
+#ifndef COLLECTIONS
 inline void
 Display::setSearchWords(List *searchWords)
 {
     this->searchWords = searchWords;
 }
+#endif
 
 inline void
 Display::setLogicalWords(char *s)
--- htsearch/DocMatch.cc.nocoll Thu Jan 31 17:47:18 2002
+++ htsearch/DocMatch.cc        Mon Nov 11 15:44:41 2002
@@ -20,6 +20,9 @@ static char RCSid[] = "$Id: DocMatch.cc,
 //
 DocMatch::DocMatch()
 {
+#ifdef COLLECTIONS
+    collection = NULL;
+#endif
 }
 
 
--- htsearch/DocMatch.h.nocoll  Thu Jan 31 17:47:18 2002
+++ htsearch/DocMatch.h Mon Nov 11 15:46:10 2002
@@ -13,6 +13,10 @@
 
 #include <Object.h>
 
+#ifdef COLLECTIONS
+class Collection;
+#endif
+
 class DocMatch : public Object
 {
 public:
@@ -22,6 +26,9 @@ public:
        float                   score;
        int                             id;
        int                             anchor;
+#ifdef COLLECTIONS
+       Collection              *collection;
+#endif
 };
 
 #endif
--- htsearch/htsearch.cc.nocoll Thu Jan 31 17:47:18 2002
+++ htsearch/htsearch.cc        Mon Nov 11 19:18:31 2002
@@ -15,6 +15,9 @@ static char RCSid[] = "$Id: htsearch.cc,
 #include "WeightWord.h"
 #include "parser.h"
 #include "Display.h"
+#ifdef COLLECTIONS
+#include "Collection.h"
+#endif
 #include "../htfuzzy/Fuzzy.h"
 #include "cgi.h"
 #include "WordRecord.h"
@@ -35,7 +38,11 @@ static char RCSid[] = "$Id: htsearch.cc,
 
 typedef void (*SIGNAL_HANDLER) (...);
 
+#ifdef COLLECTIONS
+void htsearch(Collection *, List &, Parser *);
+#else
 ResultList *htsearch(char *, List &, Parser *);
+#endif
 
 void setupWords(char *, List &, int, Parser *, String &);
 void createLogicalWords(List &, String &, String &);
@@ -49,6 +56,9 @@ int                   debug = 0;
 int                    minimum_word_length = 3;
 StringList             boolean_keywords;
 
+#ifdef COLLECTIONS
+StringList collectionList;
+#endif
 
 //*****************************************************************************
 // int main()
@@ -59,7 +69,11 @@ main(int ac, char **av)
     int                        c;
     extern char                *optarg;
     int                        override_config=0;
+#ifdef COLLECTIONS
+    List               *searchWords = NULL;
+#else
     List               searchWords;
+#endif
     String             configFile = DEFAULT_CONFIG_FILE;
     int                        pageNumber = 1;
     StringMatch                limit_to;
@@ -67,7 +81,12 @@ main(int ac, char **av)
     String             logicalWords;
     String              origPattern;
     String              logicalPattern;
+#ifdef COLLECTIONS
+    Dictionary         active_collections;
+    StringMatch                *searchWordsPattern = NULL;
+#else
     StringMatch                searchWordsPattern;
+#endif
     StringList         requiredWords;
     int                 i;
 
@@ -116,6 +135,36 @@ main(int ac, char **av)
     int                filenameok = (debug && getenv("REQUEST_METHOD") == 0);
     String     filenamemsg;
 
+#ifdef COLLECTIONS
+    if (input.exists("config"))
+       collectionList.Create(input["config"], "\001");
+
+    if (collectionList.Count() == 0)
+       collectionList.Add("");
+
+    char *errorMessage = NULL;
+    String       originalWords;
+
+  for (int cInd=0; errorMessage == NULL && cInd < collectionList.Count(); cInd++) 
+  { // RMI
+    // Each collection is handled in an iteration of this loop.
+    // Reset the following, so that each iteration starts with a
+    // clean state.
+    logicalWords = 0;
+    origPattern = 0;
+    logicalPattern = 0;
+    requiredWords.Release();
+    // searchWords.Release();
+    searchWords = new List;
+    // if (searchWordsPattern)
+    //     delete searchWordsPattern;
+    searchWordsPattern = new StringMatch;
+
+    char *config_name = collectionList[cInd];
+    if (config_name && config_name[0] == '\0')
+       config_name = NULL;
+#endif
+
     //
     // Setup the configuration database.  First we read the compiled defaults.
     // Then we override those with defaults read in from the configuration
@@ -125,8 +174,13 @@ main(int ac, char **av)
     config.Defaults(&defaults[0]);
     // To allow . in filename while still being 'secure',
     // e.g. htdig-f.q.d.n.conf
+#ifdef COLLECTIONS
+    if (!override_config && config_name 
+       && (strstr(config_name, "./") == NULL))
+#else
     if (!override_config && input.exists("config") 
        && (strstr(input["config"], "./") == NULL))
+#endif
     {
        char    *configDir = getenv("CONFIG_DIR");
        if (configDir)
@@ -137,10 +191,17 @@ main(int ac, char **av)
        {
            configFile = CONFIG_DIR;
        }
+#ifdef COLLECTIONS
+       if (config_name == NULL || strlen(config_name) == 0)
+         configFile = DEFAULT_CONFIG_FILE;
+       else
+         configFile << '/' << config_name << ".conf";
+#else
        if (strlen(input["config"]) == 0)
          configFile = DEFAULT_CONFIG_FILE;
        else
          configFile << '/' << input["config"] << ".conf";
+#endif
     }
     if (access(configFile, R_OK) < 0)
     {
@@ -268,6 +329,28 @@ main(int ac, char **av)
     // Parse the words to search for from the argument list.
     // This will produce a list of WeightWord objects.
     //
+#ifdef COLLECTIONS
+    originalWords = input["words"];
+    originalWords.chop(" \t\r\n");
+    setupWords(originalWords, *searchWords,
+              strcmp(config["match_method"], "boolean") == 0,
+              parser, origPattern);
+
+    //
+    // Convert the list of WeightWord objects to a pattern string
+    // that we can compile.
+    //
+    createLogicalWords(*searchWords, logicalWords, logicalPattern);
+
+    // 
+    // Assemble the full pattern for excerpt matching and highlighting
+    //
+    origPattern += logicalPattern;
+    searchWordsPattern->IgnoreCase();
+    searchWordsPattern->IgnorePunct();
+    searchWordsPattern->Pattern(logicalPattern);       // this should now be enough
+    //searchWordsPattern->Pattern(origPattern);
+#else
     String      originalWords = input["words"];
     originalWords.chop(" \t\r\n");
     setupWords(originalWords, searchWords,
@@ -288,6 +371,7 @@ main(int ac, char **av)
     searchWordsPattern.IgnorePunct();
     searchWordsPattern.Pattern(logicalPattern);        // this should now be enough
     //searchWordsPattern.Pattern(origPattern);
+#endif
     //if (debug > 2)
     //  cout << "Excerpt pattern: " << origPattern << "\n";
 
@@ -298,7 +382,11 @@ main(int ac, char **av)
     //
     if (requiredWords.Count() > 0)
     {
+#ifdef COLLECTIONS
+       addRequiredWords(*searchWords, requiredWords);
+#else
        addRequiredWords(searchWords, requiredWords);
+#endif
     }
     
     //
@@ -313,7 +401,9 @@ main(int ac, char **av)
        reportError(form("Unable to read word database file%s\nDid you run htmerge?",
                         filenamemsg.get()));
     }
+#ifndef COLLECTIONS
     ResultList *results = htsearch(word_db, searchWords, parser);
+#endif
 
     String     index = config["doc_index"];
     if (access(index, R_OK) < 0)
@@ -330,7 +420,27 @@ main(int ac, char **av)
                         filenamemsg.get()));
     }
 
+#ifdef COLLECTIONS
+    Collection *collection = new Collection(configFile, word_db, index, doc_db);
+    htsearch(collection, *searchWords, parser);
+    collection->setSearchWords(searchWords);
+    collection->setSearchWordsPattern(searchWordsPattern);
+    active_collections.Add(configFile, collection);
+
+    if (parser->hadError())
+    {
+       errorMessage = parser->getErrorMessage();
+       errorMessage = strdup(errorMessage);
+    }
+
+    delete parser;
+    boolean_keywords.Destroy();
+  } // RMI
+
+    Display    display(&active_collections);
+#else
     Display    display(index, doc_db);
+#endif
     if (display.hasTemplateError())
       {
        if (filenameok) filenamemsg << " '" << config["template_name"] << "'";
@@ -339,13 +449,23 @@ main(int ac, char **av)
        return 0;
       }
     display.setOriginalWords(originalWords);
+#ifndef COLLECTIONS
     display.setResults(results);
     display.setSearchWords(&searchWords);
+#endif
     display.setLimit(&limit_to);
     display.setExclude(&exclude_these);
+#ifndef COLLECTIONS
     display.setAllWordsPattern(&searchWordsPattern);
+#endif
     display.setCGI(&input);
     display.setLogicalWords(logicalWords);
+#ifdef COLLECTIONS
+    if (errorMessage)
+       display.displaySyntaxError(errorMessage);
+    else
+       display.display(pageNumber);
+#else
     if (parser->hadError())
        display.displaySyntaxError(parser->getErrorMessage());
     else
@@ -353,6 +473,7 @@ main(int ac, char **av)
 
     delete results;
     delete parser;
+#endif
     return 0;
 }
 
@@ -702,14 +823,27 @@ convertToBoolean(List &words)
 //   This returns a dictionary indexed by document ID and containing a
 //   List of WordReference objects.
 //
+#ifdef COLLECTIONS
+void
+htsearch(Collection *collection, List &searchWords, Parser *parser)
+#else
 ResultList *
 htsearch(char *wordfile, List &searchWords, Parser *parser)
+#endif
 {
     //
     // Pick the database type we are going to use
     //
     ResultList *matches = new ResultList;
     if (searchWords.Count() > 0)
+#ifdef COLLECTIONS
+    {
+       parser->setCollection(collection);
+       parser->parse(&searchWords, *matches);
+       parser->setCollection(NULL);
+    }
+    collection->setResultList(matches);
+#else
     {
        Database        *dbf = Database::getDatabaseInstance();
 
@@ -722,6 +856,7 @@ htsearch(char *wordfile, List &searchWor
     }
        
     return matches;
+#endif
 }
 
 
--- htsearch/htsearch.h.nocoll  Thu Jan 31 17:47:18 2002
+++ htsearch/htsearch.h Mon Nov 11 16:16:17 2002
@@ -38,6 +38,9 @@ extern Database               *dbf;
 extern String          logicalWords;
 extern String          originalWords;
 
+#ifdef COLLECTIONS
+extern StringList      collectionList;
+#endif
 
 #endif
 
--- htsearch/Makefile.in.nocoll Thu Jan 31 17:47:18 2002
+++ htsearch/Makefile.in        Mon Nov 11 14:09:38 2002
@@ -9,7 +9,7 @@ include $(top_builddir)/Makefile.config
 
 OBJS=          Display.o DocMatch.o ResultList.o ResultMatch.o \
                Template.o TemplateList.o WeightWord.o htsearch.o \
-               parser.o
+               parser.o Collection.o
 
 FOBJS=         $(top_builddir)/htfuzzy/libfuzzy.a
 TARGET=                htsearch
--- htsearch/Makefile.nocoll    Fri Feb  1 16:58:46 2002
+++ htsearch/Makefile   Mon Nov 11 17:11:23 2002
@@ -9,7 +9,7 @@ include $(top_builddir)/Makefile.config
 
 OBJS=          Display.o DocMatch.o ResultList.o ResultMatch.o \
                Template.o TemplateList.o WeightWord.o htsearch.o \
-               parser.o
+               parser.o Collection.o
 
 FOBJS=         $(top_builddir)/htfuzzy/libfuzzy.a
 TARGET=                htsearch
--- htsearch/parser.cc.nocoll   Thu Jan 31 17:47:18 2002
+++ htsearch/parser.cc  Mon Nov 11 15:42:37 2002
@@ -11,6 +11,11 @@ static char RCSid[] = "$Id: parser.cc,v 
 #include "parser.h"
 #include "QuotedStringList.h"
 
+#ifdef COLLECTIONS
+#include "Collection.h"
+#include "htsearch.h"
+#endif
+
 #define        WORD    1000
 #define        DONE    1001
 
@@ -462,6 +467,9 @@ Parser::parse(List *tokenList, ResultLis
     for (int i = 0; i < elements->Count(); i++)
     {
        dm = (DocMatch *) (*elements)[i];
+#ifdef COLLECTIONS
+       dm->collection = collection;
+#endif
        resultMatches.add(dm);
     }
     elements->Release();
@@ -469,3 +477,25 @@ Parser::parse(List *tokenList, ResultLis
     delete elements;
     delete result;
 }
+
+#ifdef COLLECTIONS
+void
+Parser::setCollection(Collection *coll)
+{
+    if (coll)
+    {
+       dbf = Database::getDatabaseInstance();
+       dbf->OpenRead(coll->getWordFile());
+    }
+    else
+    {
+       if (dbf)
+       {
+           dbf->Close();
+           delete dbf;
+           dbf = NULL;
+       }
+    }
+    collection = coll;
+}
+#endif
--- htsearch/parser.h.nocoll    Thu Jan 31 17:47:18 2002
+++ htsearch/parser.h   Mon Nov 11 17:27:08 2002
@@ -23,7 +23,11 @@ public:
     int                        checkSyntax(List *);
     void               parse(List *, ResultList &);
 
+#ifdef COLLECTIONS
+    void               setCollection(Collection *collection);
+#else
     void               setDatabase(Database *db)       {dbf = db;}
+#endif
     char               *getErrorMessage()              {return error.get();}
     int                        hadError()                      {return valid == 0;}
        
@@ -46,6 +50,9 @@ protected:
     int                        valid;
     Stack              stack;
     Database           *dbf;
+#ifdef COLLECTIONS
+    Collection         *collection;
+#endif
     String             error;
 };
 
--- htsearch/ResultMatch.h.nocoll       Thu Jan 31 17:47:18 2002
+++ htsearch/ResultMatch.h      Mon Nov 11 16:19:30 2002
@@ -21,6 +21,9 @@
 #include <htString.h>
 
 class DocumentRef;
+#ifdef COLLECTIONS
+class Collection;
+#endif
 
 class ResultMatch : public Object
 {
@@ -44,12 +47,20 @@ public:
        char                    *getURL()                                       
{return url;}
        DocumentRef             *getRef()                                       
{return ref;}
 
+#ifdef COLLECTIONS
+       void            setCollection(Collection *coll) { collection = coll; }
+       Collection      *getCollection() { return collection; }
+#endif
+
 private:
        float                   score;
        int                             incomplete;
        int                             anchor;
        String                  url;
        DocumentRef             *ref;
+#ifdef COLLECTIONS
+       Collection              *collection;
+#endif
 };
 
 #endif
--- include/htconfig.h.in.nocoll        Thu Jan 31 17:47:18 2002
+++ include/htconfig.h.in       Mon Nov 11 16:46:17 2002
@@ -132,6 +132,10 @@
 /*  regardless of the security problems with this. */
 #undef ALLOW_INSECURE_CGI_CONFIG
 
+/* Define this if you want to allow htsearch to use collections by taking */
+/*  multiple "config" CGI input parameters. */
+#undef COLLECTIONS
+
 /* Define to remove the word count in db and WordRef struct. */
 #undef NO_WORD_COUNT
 
--- include/htconfig.h.nocoll   Fri Feb  1 16:58:46 2002
+++ include/htconfig.h  Mon Nov 11 16:46:28 2002
@@ -133,6 +133,10 @@
 /*  regardless of the security problems with this. */
 /* #undef ALLOW_INSECURE_CGI_CONFIG */
 
+/* Define this if you want to allow htsearch to use collections by taking */
+/*  multiple "config" CGI input parameters. */
+#define COLLECTIONS 1
+
 /* Define to remove the word count in db and WordRef struct. */
 /* #undef NO_WORD_COUNT */
 

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)

