Hi Gilles,
Thanks a million for that.
Unfortunately - so far - I haven't managed to get it going.
Am I right in doing the following steps:
1. run ./configure in the source directory
2. run the patch
- I don't know much about patching, but it seemed to give funny results.
Here is a portion of the output:
-------------------------------------------------------------------------------------------------
Patching file htsearch/DocMatch.cc using Plan A...
Hunk #1 succeeded at 20.
Hmm... The next patch looks like a unified diff to me...
The text leading up to this was:
--------------------------
|--- htsearch/DocMatch.h.nocoll Thu Jan 31 17:47:18 2002
|+++ htsearch/DocMatch.h Mon Nov 11 15:46:10 2002
--------------------------
-------------------------------------------------------------------------------------------------
I checked the include/htconfig.h file after this; COLLECTIONS was
defined, so I continued:
3. gmake
4. gmake install
5. indexing with htdig etc etc etc
You mentioned another run of ./configure - should I have done this after the
patch (2a) and then done the #define COLLECTIONS (2b)?
Or did I make a mistake in the patch?
If what I did above is correct, then I must have messed up the
htdig.conf and the other .conf files.
Here is what I think is supposed to happen:
REQUEST_METHOD=GET
QUERY_STRING="words=Ireland&format=htdig&config=i_i_00_01&config=i_i_00_10&matchesperpage=10&method=or&page=1&sort=score&startday=01&endday=12&startmonth=01&endmonth=11&startyear=1997&endyear=2002"
/usr/local/htdig-3.1.6a/bin/../cgi-bin/htsearch
Content-type: text/html
<html><head><title>htsearch error</title></head>
<body bgcolor="#ffffff">
<h1>ht://Dig error</h1>
<p>htsearch detected an error. Please report this to the
webmaster of this site. The error message is:</p>
<pre>
Unable to read configuration file
</pre>
</body></html>
Again - your help would be gratefully received.
Sean
-----Original Message-----
From: Gilles Detillieux [mailto:grdetil@scrc.umanitoba.ca]
Sent: Tuesday, November 12, 2002 3:08 AM
To: [EMAIL PROTECTED]
Cc: ht://Dig mailing list
Subject: Re: [htdig] PATCH - collections for 3.1.6 (was: HTMerge/mifluz)
According to Sean Downey:
> Thanks a million Gilles
>
> if there was a patch for this - I think & hope - all the problems I've had
> getting it going would be solved
...
> -----Original Message-----
> From: Gilles Detillieux [mailto:grdetil@scrc.umanitoba.ca]
...
> comes close to what Sean is looking for. Back in 1999, when Rajendra
> Inamdar first offered us a patch for this feature, it was for 3.1.3.
> See http://www.htdig.org/htdig-dev/1999/11/0181.html
>
> However, at the time this was too radical a change to incorporate into
> the 3.1.x tree, so we suggested putting it into 3.2 instead. In the end,
> Rajendra ended up porting the patch to 3.2, and that was the one that
> made it into the CVS tree (for 3.2) and into the mail archives.
>
> However, 3.1.x users continued to request this feature, so at some point
> it made it into the patch archive...
>
> ftp://ftp.ccsf.org/htdig-patches/3.1.3/htdig-3.1.3-nntp-mdb.tar.gz
>
> The patch also adds a couple other features, and it may take some work to
> adapt it to 3.1.6, but it's a start. If you don't want the nntp support
> in htdig, I think you can remove everything from the patch other than
> the changes to htsearch/* source files. There may be some complications
> due to interactions with the addition of max_excerpts handling in 3.1.6,
> as well as the changes involving coded vs unencoded URLs. There may also
> be a few other tricky bits in main() because of recent changes there.
> I'll see if I can clean up this patch a little.
OK, I cleaned it up a lot! There were lots of changes between 3.1.3
and 3.1.6, so it took quite a bit longer than I thought it would to
get this adapted and working on 3.1.6. But, I did it. I also removed
some of the superfluous stuff like the "search policy" additions and the
whole collection_names attribute which seemed to cause too much confusion
(build_select_lists does a better job of generating the followup config
file list than collection_names did). I also put in a hook to get rid
of duplicate URLs, which happens if the same URL winds up in two or
more collections. (I'll have to port this to 3.2 as well!)
So, here is the patch for 3.1.6 below. Apply it in your main htdig-3.1.6
source directory using "patch -p0 < this-message". The patch is designed
to be applied after you've already run "./configure" in the source.
To enable it, #define COLLECTIONS in include/htconfig.h (this patch does
that by default, but it gets undone after another run of ./configure).
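Since the configure-then-patch ordering is what tripped things up above, here is a minimal, self-contained illustration of the "patch -p0" mechanics (all file names below are invented for the sketch; this does not touch the htdig tree):

```shell
# Throwaway demonstration of how "patch -p0" resolves the relative
# paths in a diff header. None of these names come from htdig.
set -e
work=$(mktemp -d)
cd "$work"
mkdir -p htsearch
printf 'old line\n' > htsearch/demo.txt

# Like the posted patch, the old name carries a suffix (.orig here,
# .nocoll in the real patch); patch falls back to the name that exists.
# With -p0 no leading path components are stripped, so the patch must
# be applied from the directory that contains htsearch/ (i.e. the
# top of the source tree).
cat > demo.patch <<'EOF'
--- htsearch/demo.txt.orig
+++ htsearch/demo.txt
@@ -1 +1 @@
-old line
+new line
EOF

patch -p0 < demo.patch
cat htsearch/demo.txt
```

The same idea applies to the real patch: save the message to a file and run "patch -p0 < saved-message" from the htdig-3.1.6 top-level directory, after ./configure.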
To use it, set up a number of config files in your CONFIG_DIR, and have
them all begin with "include: htdig.conf" (or some other common config
file, in which you'll have all your attribute settings). In each of
these config files, after the include above, override the settings of
database_dir (or database_base) and start_url, so that each config file
defines a separate database for a different collection of indexed URLs.
htsearch ends up using all of the display customization attributes from
the last selected config file, so for the sake of consistency, the config
files should share a common set of htsearch attributes.
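As a concrete sketch of that layout (the collection name and paths below are invented, not taken from the patch), each per-collection config file in CONFIG_DIR would look something like:

```
# physiology.conf -- one collection's config file (hypothetical name)
include: htdig.conf

# Overrides come after the include, so this collection gets its own
# database and its own starting point for indexing:
database_dir: /usr/local/htdig/db.physiology
start_url: http://www.example.org/physiology/
```

Each collection would then be indexed separately (running htdig and htmerge once per config file, e.g. via their -c option), and each file's base name becomes a value for the "config" CGI input parameter.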
To set up your initial search form, replace the "hidden" input parameter
definition for "config" with something like:
Collections:
<input type="checkbox" name="config" value="scrc" checked>SCRC Pages
<input type="checkbox" name="config" value="physiology">Physiology Pages
<input type="checkbox" name="config" value="wcsn">Winnipeg Chapter SFN Pages
<br>
and in the followup search forms (in header.html, wrapper.html, nomatch.html
and syntax.html), replace it with...
Collections:
$(CONFIG_LIST)<br>
which would be generated by these attribute settings in htdig.conf:
build_select_lists: CONFIG_LIST,checkbox config config_names 2 1 2 "" ""
config_names: scrc "SCRC Pages" physiology "Physiology Pages" \
wcsn "Winnipeg Chapter SFN Pages"
Here's the patch...
--- acconfig.h.nocoll Thu Jan 31 17:47:18 2002
+++ acconfig.h Mon Nov 11 16:44:14 2002
@@ -40,6 +40,10 @@
/* regardless of the security problems with this. */
#undef ALLOW_INSECURE_CGI_CONFIG
+/* Define this if you want to allow htsearch to use collections by taking */
+/* multiple "config" CGI input parameters. */
+#undef COLLECTIONS
+
/* Define to remove the word count in db and WordRef struct. */
#undef NO_WORD_COUNT
--- htsearch/Collection.cc.nocoll Mon Nov 11 14:07:43 2002
+++ htsearch/Collection.cc Mon Nov 11 16:37:26 2002
@@ -0,0 +1,88 @@
+//
+// Collection.cc
+//
+//
+#if RELEASE
+static char RCSid[] = "$Id: Collection.cc,v 1.0 2000/03/17 18:34:23 inamdar Exp $";
+#endif
+
+#include "htsearch.h"
+#include "Collection.h"
+#include "ResultMatch.h"
+#include "WeightWord.h"
+#include "StringMatch.h"
+#include "QuotedStringList.h"
+#include "URL.h"
+#include <fstream.h>
+#include <stdio.h>
+#include <ctype.h>
+#include <syslog.h>
+#include <locale.h>
+#include "HtURLCodec.h"
+#include "HtWordType.h"
+
+#ifdef COLLECTIONS
+//*****************************************************************************
+//
+Collection::Collection(char *name, char *word_file, char *index_file,
+ char *doc_file)
+{
+ isopen = 0;
+ collectionName = name;
+ wordFile = word_file;
+ indexFile = index_file;
+ docFile = doc_file;
+ docIndex = NULL;
+ matches = NULL;
+ searchWords = NULL;
+ searchWordsPattern = NULL;
+}
+
+Collection::~Collection()
+{
+ Close();
+}
+
+void
+Collection::Open()
+{
+ if (!isopen)
+ {
+ docIndex = Database::getDatabaseInstance();
+ docIndex->OpenRead(indexFile);
+ docDB.SetCompatibility(config.Boolean("uncoded_db_compatible", 1));
+ docDB.Read(docFile);
+ }
+ isopen = 1;
+}
+
+void
+Collection::Close()
+{
+ if (isopen)
+ {
+ docDB.Close();
+ docIndex->Close();
+ docIndex = NULL;
+ }
+ isopen = 0;
+}
+
+// Collection::operator [] (char *u)
+
+DocumentRef *
+Collection::getDocumentRef(char *u)
+{
+ Open();
+ return docDB.FindCoded(u);
+ // return docDB[u];
+}
+
+int
+Collection::Get(char *key, String &data)
+{
+ Open();
+ return docIndex->Get(key, data);
+}
+
+#endif
--- htsearch/Collection.h.nocoll Mon Nov 11 14:07:43 2002
+++ htsearch/Collection.h Mon Nov 11 16:36:26 2002
@@ -0,0 +1,68 @@
+//
+// Collection.h
+//
+// $Id: Collection.h,v 1.0 2000/03/17 18:34:23 inamdar Exp $
+//
+#ifndef _Collection_h_
+#define _Collection_h_
+
+#include "Object.h"
+#include "ResultList.h"
+#include "ResultMatch.h"
+#include "TemplateList.h"
+#include "cgi.h"
+#include "StringMatch.h"
+#include "List.h"
+#include "DocumentDB.h"
+#include "Database.h"
+#include "Dictionary.h"
+
+#ifdef COLLECTIONS
+class Collection : public Object
+{
+public:
+ //
+ // Construction/Destruction
+ //
+ Collection(char *name, char *wordFile, char *indexFile, char *docFile);
+ ~Collection();
+
+ void Collection::Open();
+
+ void Collection::Close();
+
+ char *getWordFile() { return wordFile.get(); }
+
+ // DocumentRef *operator [] (char *url);
+ DocumentRef *getDocumentRef(char *url);
+ int Get(char *key, String &data);
+
+ ResultList *getResultList() { return matches; }
+ void setResultList(ResultList *list) { matches = list; }
+
+ List *getSearchWords() { return searchWords; }
+ void setSearchWords(List *list) { searchWords = list; }
+
+ StringMatch *getSearchWordsPattern() { return searchWordsPattern;}
+ void setSearchWordsPattern(StringMatch *smatch)
+ { searchWordsPattern = smatch; }
+
+protected:
+ String collectionName;
+ String wordFile;
+ String indexFile;
+ String docFile;
+ ResultList *matches;
+ List *searchWords;
+ StringMatch *searchWordsPattern;
+
+
+ DocumentDB docDB;
+ Database *docIndex;
+
+ int isopen;
+};
+
+#endif // COLLECTIONS
+#endif // _Collection_h_
+
--- htsearch/Display.cc.nocoll Thu Jan 31 17:47:18 2002
+++ htsearch/Display.cc Mon Nov 11 20:01:28 2002
@@ -10,6 +10,9 @@ static char RCSid[] = "$Id: Display.cc,v
#endif
#include "htsearch.h"
+#ifdef COLLECTIONS
+#include "Collection.h"
+#endif
#include "Display.h"
#include "ResultMatch.h"
#include "WeightWord.h"
@@ -29,6 +32,11 @@ extern int debug;
//*****************************************************************************
//
+#ifdef COLLECTIONS
+Display::Display(Dictionary *collections)
+{
+ active_collections = collections;
+#else
Display::Display(char *indexFile, char *docFile)
{
docIndex = Database::getDatabaseInstance();
@@ -39,6 +47,7 @@ Display::Display(char *indexFile, char *
docDB.SetCompatibility(config.Boolean("uncoded_db_compatible", 1));
docDB.Read(docFile);
+#endif
limitTo = 0;
excludeFrom = 0;
@@ -87,7 +96,9 @@ Display::Display(char *indexFile, char *
//*****************************************************************************
Display::~Display()
{
+#ifndef COLLECTIONS
delete docIndex;
+#endif
}
//*****************************************************************************
@@ -179,7 +190,12 @@ Display::display(int pageNumber)
{
if (currentMatch >= startAt)
{
+#ifdef COLLECTIONS
+ Collection *collection = match->getCollection();
+ match->setRef(collection->getDocumentRef(match->getURL()));
+#else
match->setRef(docDB.FindCoded(match->getURL()));
+#endif
DocumentRef *ref = match->getRef();
if (!ref)
continue; // The document isn't present for some reason
@@ -266,7 +282,11 @@ Display::displayMatch(ResultMatch *match
String urlanchor(url);
if (anchor)
urlanchor << anchor;
+#ifdef COLLECTIONS
+ vars.Add("EXCERPT", excerpt(match, urlanchor, fanchor, first));
+#else
vars.Add("EXCERPT", excerpt(ref, urlanchor, fanchor, first));
+#endif
//
// anchor only relevant if an excerpt was found, i.e.,
// the search expression matches the body of the document
@@ -699,8 +719,35 @@ Display::createURL(String &url, int page
url << "restrict=" << encodeInput("restrict") << ';';
if (input->exists("exclude"))
url << "exclude=" << encodeInput("exclude") << ';';
+
+#ifdef COLLECTIONS
+ // RMI
+ // Put out all specified collections. If none selected, resort to
+ // default behaviour
+ char *config_name = collectionList[0];
+ if (config_name && config_name[0] == '\0')
+ config_name = NULL;
+
+ if (config_name)
+ {
+ for (int i=0; i<collectionList.Count(); i++)
+ {
+ config_name = collectionList[i];
+ s = config_name;
+ encodeURL(s);
+ url << "config=" << s.get() << ';';
+ }
+ }
+ else
+ {
+ if (input->exists("config"))
+ url << "config=" << encodeInput("config") << ';';
+ }
+#else
if (input->exists("config"))
url << "config=" << encodeInput("config") << ';';
+#endif
+
if (input->exists("method"))
url << "method=" << encodeInput("method") << ';';
if (input->exists("format"))
@@ -1358,13 +1405,32 @@ Display::buildMatchList()
// ... MG
+#ifdef COLLECTIONS
+ // RMI: deal with all collections
+ active_collections->Start_Get();
+ Collection *collection;
+ while ((collection=(Collection *)active_collections->Get_NextElement()) != 0)
+ {
+ ResultList *results = collection->getResultList();
+ if (results == NULL)
+ continue;
+#endif
+
results->Start_Get();
while ((id = results->Get_Next()))
{
//
// Convert the ID to a URL
//
+#ifdef COLLECTIONS
+ DocMatch *dm = results->find(id);
+ Collection *collection = NULL;
+ if (dm)
+ collection = dm->collection;
+ if (collection == NULL || collection->Get(id, coded_url) == NOTOK)
+#else
if (docIndex->Get(id, coded_url) == NOTOK)
+#endif
{
continue;
}
@@ -1382,6 +1448,9 @@ Display::buildMatchList()
thisMatch = new ResultMatch();
thisMatch->setURL(coded_url);
thisMatch->setRef(NULL);
+#ifdef COLLECTIONS
+ thisMatch->setCollection(collection);
+#endif
//
// Get the actual document record into the current ResultMatch
@@ -1394,7 +1463,9 @@ Display::buildMatchList()
// known at that time, or info about the document itself,
// so this still needs to be done.
//
+#ifndef COLLECTIONS
DocMatch *dm = results->find(id);
+#endif
double score = dm->score;
// We need to scale based on date relevance and backlinks
@@ -1409,7 +1480,13 @@ Display::buildMatchList()
if (date_factor != 0.0 || backlink_factor != 0.0 || typ != SortByScore
|| timet_startdate > 0 || enddate.tm_year < endoftime->tm_year)
{
+#ifdef COLLECTIONS
+ Collection *collection = thisMatch->getCollection();
+ DocumentRef *thisRef = collection->getDocumentRef(
+ thisMatch->getURL());
+#else
DocumentRef *thisRef = docDB.FindCoded(thisMatch->getURL());
+#endif
if (thisRef) // We better hope it's not null!
{
// code added by Mike Grommet for date search ranges
@@ -1452,6 +1529,9 @@ Display::buildMatchList()
//
matches->Add(thisMatch);
}
+#ifdef COLLECTIONS
+ }
+#endif
//
// The matches need to be ordered by relevance level.
@@ -1464,8 +1544,14 @@ Display::buildMatchList()
//*****************************************************************************
String *
+#ifdef COLLECTIONS
+Display::excerpt(ResultMatch *match, String urlanchor, int fanchor, int &first)
+{
+ DocumentRef *ref = match->getRef();
+#else
Display::excerpt(DocumentRef *ref, String urlanchor, int fanchor, int &first)
{
+#endif
char *head;
int use_meta_description = 0;
@@ -1486,6 +1572,15 @@ Display::excerpt(DocumentRef *ref, Strin
String part;
String *text = new String();
+#ifdef COLLECTIONS
+ Collection *collection = match->getCollection();
+ StringMatch *allWordsPattern = NULL;
+ if (collection)
+ allWordsPattern = collection->getSearchWordsPattern();
+ if (!allWordsPattern)
+ return text;
+#endif
+
// htsearch displays the description when:
// 1) a description has been found
// 2) the option "use_meta_description" is set to true
@@ -1544,20 +1639,32 @@ Display::excerpt(DocumentRef *ref, Strin
if (end > temp + headLength)
{
end = temp + headLength;
+#ifdef COLLECTIONS
+ *text << hilight(match, start, urlanchor, fanchor);
+#else
*text << hilight(start, urlanchor, fanchor);
+#endif
}
else
{
while (*end && HtIsStrictWordChar(*end))
end++;
*end = '\0';
+#ifdef COLLECTIONS
+ *text << hilight(match, start, urlanchor, fanchor);
+#else
*text << hilight(start, urlanchor, fanchor);
+#endif
*text << config["end_ellipses"];
}
}
else
{
+#ifdef COLLECTIONS
+ *text = buildExcerpts( match, allWordsPattern, head, urlanchor, fanchor );
+#else
*text = buildExcerpts( head, urlanchor, fanchor );
+#endif
}
return text;
@@ -1567,7 +1674,11 @@ Display::excerpt(DocumentRef *ref, Strin
// Handle cases where multiple document excerpts are requested.
//
const String
+#ifdef COLLECTIONS
+Display::buildExcerpts( ResultMatch *match, StringMatch *allWordsPattern, char *head, String urlanchor, int fanchor )
+#else
Display::buildExcerpts( char *head, String urlanchor, int fanchor )
+#endif
{
if ( !config.Boolean( "add_anchors_to_excerpt" ) )
{
@@ -1630,7 +1741,11 @@ Display::buildExcerpts( char *head, Stri
{
end = head + headLength;
+#ifdef COLLECTIONS
+ text << hilight(match, start, urlanchor, fanchor);
+#else
text << hilight( start, urlanchor, fanchor );
+#endif
}
else
{
@@ -1644,7 +1759,11 @@ Display::buildExcerpts( char *head, Stri
*end = '\0';
+#ifdef COLLECTIONS
+ text << hilight(match, start, urlanchor, fanchor);
+#else
text << hilight(start, urlanchor, fanchor);
+#endif
text << config["end_ellipses"];
*end = endChar;
@@ -1660,7 +1779,11 @@ Display::buildExcerpts( char *head, Stri
//*****************************************************************************
char *
+#ifdef COLLECTIONS
+Display::hilight(ResultMatch *match, char *str, String urlanchor, int fanchor)
+#else
Display::hilight(char *str, String urlanchor, int fanchor)
+#endif
{
static char *start_highlight = config["start_highlight"];
static char *end_highlight = config["end_highlight"];
@@ -1672,6 +1795,19 @@ Display::hilight(char *str, String urlan
int first = 1;
result = 0;
+#ifdef COLLECTIONS
+ Collection *collection = match->getCollection();
+ StringMatch *allWordsPattern = NULL;
+ List *searchWords = NULL;
+ if (collection)
+ {
+ allWordsPattern = collection->getSearchWordsPattern();
+ searchWords = collection->getSearchWords();
+ }
+ if (!allWordsPattern || !searchWords)
+ return result;
+#endif
+
while (allWordsPattern->hasPattern() &&
(pos = allWordsPattern->FindFirstWord(str, which, length)) >= 0)
{
@@ -1718,6 +1854,28 @@ Display::sort(List *matches)
(typ == SortByTime) ? Display::compareTime :
Display::compare);
+#ifdef COLLECTIONS
+ // In case there are duplicate URLs across collections, keep "best" ones
+ // after sorting them.
+ Dictionary goturl;
+ String url;
+ char *coded_url;
+ int j = 0;
+ for (i = 0; i < numberOfMatches; i++)
+ {
+ coded_url = array[i]->getURL();
+ String url = HtURLCodec::instance()->decode(coded_url);
+ HtURLRewriter::instance()->Replace(url);
+ if (goturl.Exists(url))
+ delete array[i];
+ else
+ {
+ array[j++] = array[i];
+ goturl.Add(url, 0);
+ }
+ }
+ numberOfMatches = j;
+#endif
char *st = config["sort"];
if (st && *st && mystrncasecmp("rev", st, 3) == 0)
{
--- htsearch/Display.h.nocoll Thu Jan 31 17:47:18 2002
+++ htsearch/Display.h Mon Nov 11 17:24:46 2002
@@ -25,18 +25,26 @@ public:
//
// Construction/Destruction
//
+#ifdef COLLECTIONS
+ Display(Dictionary *active_collections);
+#else
Display(char *indexFile, char *docFile);
+#endif
~Display();
void setStartTemplate(char *templateName);
void setMatchTemplate(char *templateName);
void setEndTemplate(char *templateName);
+#ifndef COLLECTIONS
void setResults(ResultList *results);
void setSearchWords(List *searchWords);
+#endif
void setLimit(StringMatch *);
void setExclude(StringMatch *);
+#ifndef COLLECTIONS
void setAllWordsPattern(StringMatch *);
+#endif
void setLogicalWords(char *);
void setOriginalWords(char *);
void setCGI(cgi *);
@@ -59,6 +67,12 @@ public:
SortType sortType();
protected:
+#ifdef COLLECTIONS
+ //
+ // The list of search result collections.
+ //
+ Dictionary *active_collections;
+#else
//
// The list of search results.
//
@@ -78,6 +92,7 @@ protected:
// A list of words that we are searching for
//
List *searchWords;
+#endif
//
// Pattern that all result URLs must match or exclude
@@ -88,7 +103,9 @@ protected:
//
// Pattern of all the words
//
+#ifndef COLLECTIONS
StringMatch *allWordsPattern;
+#endif
//
// Variables for substitution into text are stored in a dictionary
@@ -159,9 +176,15 @@ protected:
String *readFile(char *);
void expandVariables(char *);
void outputVariable(char *);
+#ifdef COLLECTIONS
+ String *excerpt(ResultMatch *match, String urlanchor, int fanchor, int &first);
+ const String buildExcerpts( ResultMatch *match, StringMatch *allWordsPattern, char *head, String urlanchor, int fanchor );
+ char *hilight(ResultMatch *match, char *str, String urlanchor, int fanchor);
+#else
String *excerpt(DocumentRef *ref, String urlanchor, int fanchor, int &first);
const String buildExcerpts( char *head, String urlanchor, int fanchor );
char *hilight(char *str, String urlanchor, int fanchor);
+#endif
void setupTemplates();
void setupImages();
String *generateStars(DocumentRef *, int);
@@ -184,23 +207,29 @@ Display::setExclude(StringMatch *exclude
excludeFrom = exclude;
}
+#ifndef COLLECTIONS
inline void
Display::setAllWordsPattern(StringMatch *pattern)
{
allWordsPattern = pattern;
}
+#endif
+#ifndef COLLECTIONS
inline void
Display::setResults(ResultList *results)
{
this->results = results;
}
+#endif
+#ifndef COLLECTIONS
inline void
Display::setSearchWords(List *searchWords)
{
this->searchWords = searchWords;
}
+#endif
inline void
Display::setLogicalWords(char *s)
--- htsearch/DocMatch.cc.nocoll Thu Jan 31 17:47:18 2002
+++ htsearch/DocMatch.cc Mon Nov 11 15:44:41 2002
@@ -20,6 +20,9 @@ static char RCSid[] = "$Id: DocMatch.cc,
//
DocMatch::DocMatch()
{
+#ifdef COLLECTIONS
+ collection = NULL;
+#endif
}
--- htsearch/DocMatch.h.nocoll Thu Jan 31 17:47:18 2002
+++ htsearch/DocMatch.h Mon Nov 11 15:46:10 2002
@@ -13,6 +13,10 @@
#include <Object.h>
+#ifdef COLLECTIONS
+class Collection;
+#endif
+
class DocMatch : public Object
{
public:
@@ -22,6 +26,9 @@ public:
float score;
int id;
int anchor;
+#ifdef COLLECTIONS
+ Collection *collection;
+#endif
};
#endif
--- htsearch/htsearch.cc.nocoll Thu Jan 31 17:47:18 2002
+++ htsearch/htsearch.cc Mon Nov 11 19:18:31 2002
@@ -15,6 +15,9 @@ static char RCSid[] = "$Id: htsearch.cc,
#include "WeightWord.h"
#include "parser.h"
#include "Display.h"
+#ifdef COLLECTIONS
+#include "Collection.h"
+#endif
#include "../htfuzzy/Fuzzy.h"
#include "cgi.h"
#include "WordRecord.h"
@@ -35,7 +38,11 @@ static char RCSid[] = "$Id: htsearch.cc,
typedef void (*SIGNAL_HANDLER) (...);
+#ifdef COLLECTIONS
+void htsearch(Collection *, List &, Parser *);
+#else
ResultList *htsearch(char *, List &, Parser *);
+#endif
void setupWords(char *, List &, int, Parser *, String &);
void createLogicalWords(List &, String &, String &);
@@ -49,6 +56,9 @@ int debug = 0;
int minimum_word_length = 3;
StringList boolean_keywords;
+#ifdef COLLECTIONS
+StringList collectionList;
+#endif
//*****************************************************************************
// int main()
@@ -59,7 +69,11 @@ main(int ac, char **av)
int c;
extern char *optarg;
int override_config=0;
+#ifdef COLLECTIONS
+ List *searchWords = NULL;
+#else
List searchWords;
+#endif
String configFile = DEFAULT_CONFIG_FILE;
int pageNumber = 1;
StringMatch limit_to;
@@ -67,7 +81,12 @@ main(int ac, char **av)
String logicalWords;
String origPattern;
String logicalPattern;
+#ifdef COLLECTIONS
+ Dictionary active_collections;
+ StringMatch *searchWordsPattern = NULL;
+#else
StringMatch searchWordsPattern;
+#endif
StringList requiredWords;
int i;
@@ -116,6 +135,36 @@ main(int ac, char **av)
int filenameok = (debug && getenv("REQUEST_METHOD") == 0);
String filenamemsg;
+#ifdef COLLECTIONS
+ if (input.exists("config"))
+ collectionList.Create(input["config"], "\001");
+
+ if (collectionList.Count() == 0)
+ collectionList.Add("");
+
+ char *errorMessage = NULL;
+ String originalWords;
+
+ for (int cInd=0; errorMessage == NULL && cInd < collectionList.Count(); cInd++)
+ { // RMI
+ // Each collection is handled in an iteration of this loop.
+ // Reset the following, so that each iteration starts with a
+ // clean state.
+ logicalWords = 0;
+ origPattern = 0;
+ logicalPattern = 0;
+ requiredWords.Release();
+ // searchWords.Release();
+ searchWords = new List;
+ // if (searchWordsPattern)
+ // delete searchWordsPattern;
+ searchWordsPattern = new StringMatch;
+
+ char *config_name = collectionList[cInd];
+ if (config_name && config_name[0] == '\0')
+ config_name = NULL;
+#endif
+
//
// Setup the configuration database. First we read the compiled defaults.
// Then we override those with defaults read in from the configuration
@@ -125,8 +174,13 @@ main(int ac, char **av)
config.Defaults(&defaults[0]);
// To allow . in filename while still being 'secure',
// e.g. htdig-f.q.d.n.conf
+#ifdef COLLECTIONS
+ if (!override_config && config_name
+ && (strstr(config_name, "./") == NULL))
+#else
if (!override_config && input.exists("config")
&& (strstr(input["config"], "./") == NULL))
+#endif
{
char *configDir = getenv("CONFIG_DIR");
if (configDir)
@@ -137,10 +191,17 @@ main(int ac, char **av)
{
configFile = CONFIG_DIR;
}
+#ifdef COLLECTIONS
+ if (config_name == NULL || strlen(config_name) == 0)
+ configFile = DEFAULT_CONFIG_FILE;
+ else
+ configFile << '/' << config_name << ".conf";
+#else
if (strlen(input["config"]) == 0)
configFile = DEFAULT_CONFIG_FILE;
else
configFile << '/' << input["config"] << ".conf";
+#endif
}
if (access(configFile, R_OK) < 0)
{
@@ -268,6 +329,28 @@ main(int ac, char **av)
// Parse the words to search for from the argument list.
// This will produce a list of WeightWord objects.
//
+#ifdef COLLECTIONS
+ originalWords = input["words"];
+ originalWords.chop(" \t\r\n");
+ setupWords(originalWords, *searchWords,
+ strcmp(config["match_method"], "boolean") == 0,
+ parser, origPattern);
+
+ //
+ // Convert the list of WeightWord objects to a pattern string
+ // that we can compile.
+ //
+ createLogicalWords(*searchWords, logicalWords, logicalPattern);
+
+ //
+ // Assemble the full pattern for excerpt matching and highlighting
+ //
+ origPattern += logicalPattern;
+ searchWordsPattern->IgnoreCase();
+ searchWordsPattern->IgnorePunct();
+ searchWordsPattern->Pattern(logicalPattern); // this should now be enough
+ //searchWordsPattern->Pattern(origPattern);
+#else
String originalWords = input["words"];
originalWords.chop(" \t\r\n");
setupWords(originalWords, searchWords,
@@ -288,6 +371,7 @@ main(int ac, char **av)
searchWordsPattern.IgnorePunct();
searchWordsPattern.Pattern(logicalPattern); // this should now be enough
//searchWordsPattern.Pattern(origPattern);
+#endif
//if (debug > 2)
// cout << "Excerpt pattern: " << origPattern << "\n";
@@ -298,7 +382,11 @@ main(int ac, char **av)
//
if (requiredWords.Count() > 0)
{
+#ifdef COLLECTIONS
+ addRequiredWords(*searchWords, requiredWords);
+#else
addRequiredWords(searchWords, requiredWords);
+#endif
}
//
@@ -313,7 +401,9 @@ main(int ac, char **av)
reportError(form("Unable to read word database file%s\nDid you run htmerge?",
filenamemsg.get()));
}
+#ifndef COLLECTIONS
ResultList *results = htsearch(word_db, searchWords, parser);
+#endif
String index = config["doc_index"];
if (access(index, R_OK) < 0)
@@ -330,7 +420,27 @@ main(int ac, char **av)
filenamemsg.get()));
}
+#ifdef COLLECTIONS
+ Collection *collection = new Collection(configFile, word_db, index, doc_db);
+ htsearch(collection, *searchWords, parser);
+ collection->setSearchWords(searchWords);
+ collection->setSearchWordsPattern(searchWordsPattern);
+ active_collections.Add(configFile, collection);
+
+ if (parser->hadError())
+ {
+ errorMessage = parser->getErrorMessage();
+ errorMessage = strdup(errorMessage);
+ }
+
+ delete parser;
+ boolean_keywords.Destroy();
+ } // RMI
+
+ Display display(&active_collections);
+#else
Display display(index, doc_db);
+#endif
if (display.hasTemplateError())
{
if (filenameok) filenamemsg << " '" << config["template_name"] << "'";
@@ -339,13 +449,23 @@ main(int ac, char **av)
return 0;
}
display.setOriginalWords(originalWords);
+#ifndef COLLECTIONS
display.setResults(results);
display.setSearchWords(&searchWords);
+#endif
display.setLimit(&limit_to);
display.setExclude(&exclude_these);
+#ifndef COLLECTIONS
display.setAllWordsPattern(&searchWordsPattern);
+#endif
display.setCGI(&input);
display.setLogicalWords(logicalWords);
+#ifdef COLLECTIONS
+ if (errorMessage)
+ display.displaySyntaxError(errorMessage);
+ else
+ display.display(pageNumber);
+#else
if (parser->hadError())
display.displaySyntaxError(parser->getErrorMessage());
else
@@ -353,6 +473,7 @@ main(int ac, char **av)
delete results;
delete parser;
+#endif
return 0;
}
@@ -702,14 +823,27 @@ convertToBoolean(List &words)
// This returns a dictionary indexed by document ID and containing a
// List of WordReference objects.
//
+#ifdef COLLECTIONS
+void
+htsearch(Collection *collection, List &searchWords, Parser *parser)
+#else
ResultList *
htsearch(char *wordfile, List &searchWords, Parser *parser)
+#endif
{
//
// Pick the database type we are going to use
//
ResultList *matches = new ResultList;
if (searchWords.Count() > 0)
+#ifdef COLLECTIONS
+ {
+ parser->setCollection(collection);
+ parser->parse(&searchWords, *matches);
+ parser->setCollection(NULL);
+ }
+ collection->setResultList(matches);
+#else
{
Database *dbf = Database::getDatabaseInstance();
@@ -722,6 +856,7 @@ htsearch(char *wordfile, List &searchWor
}
return matches;
+#endif
}
--- htsearch/htsearch.h.nocoll Thu Jan 31 17:47:18 2002
+++ htsearch/htsearch.h Mon Nov 11 16:16:17 2002
@@ -38,6 +38,9 @@ extern Database *dbf;
extern String logicalWords;
extern String originalWords;
+#ifdef COLLECTIONS
+extern StringList collectionList;
+#endif
#endif
--- htsearch/Makefile.in.nocoll Thu Jan 31 17:47:18 2002
+++ htsearch/Makefile.in Mon Nov 11 14:09:38 2002
@@ -9,7 +9,7 @@ include $(top_builddir)/Makefile.config
OBJS= Display.o DocMatch.o ResultList.o ResultMatch.o \
Template.o TemplateList.o WeightWord.o htsearch.o \
- parser.o
+ parser.o Collection.o
FOBJS= $(top_builddir)/htfuzzy/libfuzzy.a
TARGET= htsearch
--- htsearch/Makefile.nocoll Fri Feb 1 16:58:46 2002
+++ htsearch/Makefile Mon Nov 11 17:11:23 2002
@@ -9,7 +9,7 @@ include $(top_builddir)/Makefile.config
OBJS= Display.o DocMatch.o ResultList.o ResultMatch.o \
Template.o TemplateList.o WeightWord.o htsearch.o \
- parser.o
+ parser.o Collection.o
FOBJS= $(top_builddir)/htfuzzy/libfuzzy.a
TARGET= htsearch
--- htsearch/parser.cc.nocoll Thu Jan 31 17:47:18 2002
+++ htsearch/parser.cc Mon Nov 11 15:42:37 2002
@@ -11,6 +11,11 @@ static char RCSid[] = "$Id: parser.cc,v
#include "parser.h"
#include "QuotedStringList.h"
+#ifdef COLLECTIONS
+#include "Collection.h"
+#include "htsearch.h"
+#endif
+
#define WORD 1000
#define DONE 1001
@@ -462,6 +467,9 @@ Parser::parse(List *tokenList, ResultLis
for (int i = 0; i < elements->Count(); i++)
{
dm = (DocMatch *) (*elements)[i];
+#ifdef COLLECTIONS
+ dm->collection = collection;
+#endif
resultMatches.add(dm);
}
elements->Release();
@@ -469,3 +477,25 @@ Parser::parse(List *tokenList, ResultLis
delete elements;
delete result;
}
+
+#ifdef COLLECTIONS
+void
+Parser::setCollection(Collection *coll)
+{
+ if (coll)
+ {
+ dbf = Database::getDatabaseInstance();
+ dbf->OpenRead(coll->getWordFile());
+ }
+ else
+ {
+ if (dbf)
+ {
+ dbf->Close();
+ delete dbf;
+ dbf = NULL;
+ }
+ }
+ collection = coll;
+}
+#endif
--- htsearch/parser.h.nocoll Thu Jan 31 17:47:18 2002
+++ htsearch/parser.h Mon Nov 11 17:27:08 2002
@@ -23,7 +23,11 @@ public:
int checkSyntax(List *);
void parse(List *, ResultList &);
+#ifdef COLLECTIONS
+ void setCollection(Collection *collection);
+#else
void setDatabase(Database *db) {dbf = db;}
+#endif
char *getErrorMessage() {return error.get();}
int hadError() {return valid == 0;}
@@ -46,6 +50,9 @@ protected:
int valid;
Stack stack;
Database *dbf;
+#ifdef COLLECTIONS
+ Collection *collection;
+#endif
String error;
};
--- htsearch/ResultMatch.h.nocoll Thu Jan 31 17:47:18 2002
+++ htsearch/ResultMatch.h Mon Nov 11 16:19:30 2002
@@ -21,6 +21,9 @@
#include <htString.h>
class DocumentRef;
+#ifdef COLLECTIONS
+class Collection;
+#endif
class ResultMatch : public Object
{
@@ -44,12 +47,20 @@ public:
char *getURL() {return url;}
DocumentRef *getRef() {return ref;}
+#ifdef COLLECTIONS
+ void setCollection(Collection *coll) { collection = coll; }
+ Collection *getCollection() { return collection; }
+#endif
+
private:
float score;
int incomplete;
int anchor;
String url;
DocumentRef *ref;
+#ifdef COLLECTIONS
+ Collection *collection;
+#endif
};
#endif
--- include/htconfig.h.in.nocoll Thu Jan 31 17:47:18 2002
+++ include/htconfig.h.in Mon Nov 11 16:46:17 2002
@@ -132,6 +132,10 @@
/* regardless of the security problems with this. */
#undef ALLOW_INSECURE_CGI_CONFIG
+/* Define this if you want to allow htsearch to use collections by taking */
+/* multiple "config" CGI input parameters. */
+#undef COLLECTIONS
+
/* Define to remove the word count in db and WordRef struct. */
#undef NO_WORD_COUNT
--- include/htconfig.h.nocoll Fri Feb 1 16:58:46 2002
+++ include/htconfig.h Mon Nov 11 16:46:28 2002
@@ -133,6 +133,10 @@
/* regardless of the security problems with this. */
/* #undef ALLOW_INSECURE_CGI_CONFIG */
+/* Define this if you want to allow htsearch to use collections by taking */
+/* multiple "config" CGI input parameters. */
+#define COLLECTIONS 1
+
/* Define to remove the word count in db and WordRef struct. */
/* #undef NO_WORD_COUNT */
--
Gilles R. Detillieux E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada)
_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
FAQ: http://htdig.sourceforge.net/FAQ.html