Re: [htdig] scoring questions
On Mon, 8 Jan 2001, Daniel Naber wrote:

> Date: Mon, 8 Jan 2001 21:52:28 +0100
> From: Daniel Naber [EMAIL PROTECTED]
> To: Jason Meyering [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]
> Subject: Re: [htdig] scoring questions
>
> On 2001-01-08 21:23, you wrote:
>
> > three words. Is there any way to tell htsearch to automatically score pages containing all search words higher than pages that don't have all words?
>
> Yes, there's a one-line patch in the contributions section. I think I called it multiple-boost or so.

ftp://ftp.ccsf.org/htdig-patches/3.1.5/multiboost.1

Regards,
Joe

To unsubscribe from the htdig mailing list, send a message to [EMAIL PROTECTED] You will receive a message to confirm this.
List archives: http://www.htdig.org/mail/menu.html
FAQ: http://www.htdig.org/FAQ.html
Re: [htdig] problems building htdig on cygwin
On Sun, 26 Nov 2000, Gerrit P. Haase wrote:

> Date: Sun, 26 Nov 2000 18:06:01 +0100
> From: "Gerrit P. Haase" [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Subject: [htdig] problems building htdig on cygwin
>
> Hi there,
>
> got problems building htdig on cygwin, first was: some tabs were missing in the makefile, which is dubious. second one is a missing header: nl_types.h, anyone knows, what is missing at my system? Where is this header included? There is nothing mentioned at the requirements, what i did not have.

I had the same problem building htdig on BSDI. nl_types.h was included in later snapshots of htdig-3.2.0b3; however, that header file requires another header file, features.h, which is missing on my system;( I extracted features.h from the GNU glibc distribution and placed it in my htlib directory. Unfortunately, features.h requires yet another header file, stubs.h, which is neither in the BSDI system nor in GNU glibc;( There is a chance that stubs.h would also require other header file(s), and those files require yet other files ... ad infinitum;( ;)))

In short, don't hold your breath.

Regards,
Joe
Re: [htdig] ssl patch
On Thu, 16 Nov 2000, Jeremy Lyon wrote:

> Date: Thu, 16 Nov 2000 15:15:07 -0700
> From: Jeremy Lyon [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Subject: [htdig] ssl patch
>
> Hi
>
> I just tried to patch htdig 3.1.5 with the ssl patch ftp://ftp.ccsf.org/htdig-patches/3.1.5/ssl.2 to a clean htdig. I got these errors
>
> root:/tmp/work/htdig-3.1.5# patch -p1 < ../ssl.2

Obviously you did not look at the patch;( The first lines read:

# Tabs in this patch have been converted to spaces;( In order to apply the
# patch to a clean htdig-3.1.5 please use the -l switch:
#
#gunzip -c htdig-3.1.5.tar.gz | tar xf -
#cd htdig-3.1.5
#patch -p1 -l < /path/to/ssl.2
#          ^

That ensures that the patch applies, but it does not guarantee that the package will compile; Michael Arndt [EMAIL PROTECTED] has testified about that;) If it does not compile on your system you might want to contact the author of the patch. You can find that information in the patch as well:

From [EMAIL PROTECTED] Sun Oct 29 11:11:32 2000
Date: Sun, 29 Oct 2000 15:24:01 -0500
From: Will Ballantyne [EMAIL PROTECTED]
To: "J. op den Brouw" [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: Re: [htdig] ssl patches for 3.1.5

Regards,
Joe
Re: [htdig] ssl Patch for htdig
On Wed, 15 Nov 2000, Michael Arndt wrote:

> Date: Wed, 15 Nov 2000 13:56:05 +0100
> From: Michael Arndt [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Subject: [htdig] ssl Patch for htdig
>
> Hello,
>
> i would need a SSL-Version of htdig. In the Archives i found a Thread about a SSL-Patch for htdig. Only Patchfile i found is:
> ftp://ftp.ccsf.org/htdig-patches/3.1.5/ssl.1
> and that does not apply against a "clean htdig". Only Help would be applying all patches manually.
>
> Is anyone out there who has done this already ? Or someone who can point me to a patch appliable against a clean htdig or send me patched sources ?

It was reported that the older patch:

ftp://ftp.ccsf.org/htdig-patches/3.1.5/0ld/ssl.0

applies to a "clean htdig-3.1.5" with the -l switch:

cd /path/to/htdig-3.1.5/
patch -p1 -l < /path/to/ssl.0

Regards,
Joe
Re: [htdig] ssl Patch for htdig
On Wed, 15 Nov 2000, Joe R. Jah wrote:

> Date: Wed, 15 Nov 2000 10:26:20 -0800 (PST)
> From: "Joe R. Jah" [EMAIL PROTECTED]
> To: Michael Arndt [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]
> Subject: Re: [htdig] ssl Patch for htdig
>
> On Wed, 15 Nov 2000, Michael Arndt wrote:
>
> > Date: Wed, 15 Nov 2000 13:56:05 +0100
> > From: Michael Arndt [EMAIL PROTECTED]
> > To: [EMAIL PROTECTED]
> > Subject: [htdig] ssl Patch for htdig
> >
> > Hello,
> >
> > i would need a SSL-Version of htdig. In the Archives i found a Thread about a SSL-Patch for htdig. Only Patchfile i found is:
> > ftp://ftp.ccsf.org/htdig-patches/3.1.5/ssl.1
> > and that does not apply against a "clean htdig". Only Help would be applying all patches manually.
> >
> > Is anyone out there who has done this already ? Or someone who can point me to a patch appliable against a clean htdig or send me patched sources ?
>
> It was reported that the older patch:
>
> ftp://ftp.ccsf.org/htdig-patches/3.1.5/0ld/ssl.0
>
> applies to a "clean htdig-3.1.5" with the -l switch:
>
> cd /path/to/htdig-3.1.5/
> patch -p1 -l < /path/to/ssl.0

OK, I downloaded htdig-3.1.5.tar.gz; my own htdig has been patched and re-patched;) I tested both versions of the patch and found that ssl.1 does not apply, but ssl.0, the old patch, applies with the -l switch. I added the following lines to the beginning of the patch and placed it in the archives as:

ftp://ftp.ccsf.org/htdig-patches/3.1.5/ssl.2

Tabs in this patch have been converted to spaces;( In order to apply the
patch to a clean htdig-3.1.5 please use the -l switch:

gunzip -c htdig-3.1.5.tar.gz | tar xf -
cd htdig-3.1.5
patch -p1 -l < /path/to/ssl.2

Regards,
Joe
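For readers wondering why the -l switch makes the difference, here is a minimal Python sketch. It is not how patch is implemented; it only assumes the behaviour described in the patch manual for -l (any run of blanks in the patch matches any run of blanks in the file, and trailing blanks are ignored). The function names and the sample line are made up for illustration.

```python
import re


def exact_match(patch_line: str, file_line: str) -> bool:
    """Strict comparison: a tab and eight spaces are different text."""
    return patch_line == file_line


def loose_match(patch_line: str, file_line: str) -> bool:
    """Loose comparison in the spirit of `patch -l`.

    Every run of spaces/tabs is squeezed to one space and trailing
    blanks are dropped before the two lines are compared.
    """
    def squeeze(text: str) -> str:
        return re.sub(r"[ \t]+", " ", text.rstrip(" \t"))

    return squeeze(patch_line) == squeeze(file_line)


# Hypothetical context line: the source file still has a tab,
# while the mailed patch carries spaces instead.
file_line = "\tif (port == 80)"
patch_line = "        if (port == 80)"

print(exact_match(patch_line, file_line))  # False
print(loose_match(patch_line, file_line))  # True
```

With strict matching every hunk whose context contains a converted tab is rejected, which is consistent with the long list of FAILED hunks reported for ssl.0.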
Re: [htdig] ssl Patch for htdig
On Wed, 15 Nov 2000, Joshua Gerth wrote:

> Date: Wed, 15 Nov 2000 13:38:29 -0800 (PST)
> From: Joshua Gerth [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Subject: Re: [htdig] ssl Patch for htdig
>
> Speaking of ssl patches. I also downloaded 3.1.5 and patched it with the ssl.0 patch and the -l flag. However, I then ran into the additional problem that urls of the form:
> https://myserver.com
> were being directed to port 80, and that only urls of the form:
> https://myserver.com:443
> were actually going to the encrypted port.
>
> So I hacked my copy so that any url which starts with https goes to port 443 by default but 'http' still goes to 80 by default. Of course, both can still be overridden by using the :port on the url.
>
> Did anyone else hit this? Would this patch be useful to anyone? If so I'll try to post it assuming I have the rights to do so.

I only tested to see if the patch applies to a clean 3.1.5. I am sure, however, that your patch will be useful to someone;) Go ahead and post it to the list, or just upload it to:

ftp://ftp.ccsf.org/incoming/

P.S. It would be nice to document your patch; save potential users the guesswork and digging up relevant information in the list archives;)

Regards,
Joe
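The behaviour Joshua describes (a default port chosen by scheme, overridable with an explicit :port) can be sketched in a few lines of Python. This is only an illustration of the rule, not his patch and not htdig's URL code; it assumes the standard defaults of 80 for http and 443 for https, and `effective_port` is a made-up name.

```python
from urllib.parse import urlsplit

# Standard default ports per scheme.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url: str) -> int:
    """Return the port a client should connect to for this URL.

    An explicit :port in the URL always wins; otherwise the default
    for the scheme is used. Unknown schemes raise KeyError.
    """
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS[parts.scheme]


print(effective_port("http://myserver.com"))        # 80
print(effective_port("https://myserver.com"))       # 443
print(effective_port("https://myserver.com:8443"))  # 8443
```

The bug reported above corresponds to always falling back to 80, regardless of scheme.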
Re: [htdig] Re: SSL patch for ht://Dig 3.1.5
On Fri, 3 Nov 2000, Will Ballantyne wrote:

> Date: Fri, 03 Nov 2000 18:46:19 -0500
> From: Will Ballantyne [EMAIL PROTECTED]
> To: "Joe R. Jah" [EMAIL PROTECTED]
> Cc: "Brian W. Spolarich" [EMAIL PROTECTED], "J. op den Brouw" [EMAIL PROTECTED], [EMAIL PROTECTED]
> Subject: Re: [htdig] Re: SSL patch for ht://Dig 3.1.5
>
> ok, I have uploaded the patch...

I have moved it to:

ftp://ftp.ccsf.org/htdig-patches/3.1.5/ssl.1

However, I have not tested it. I searched for tabs in it, though, and found several;)

Regards,
Joe

> "Joe R. Jah" wrote:
>
> > On Thu, 2 Nov 2000 [EMAIL PROTECTED] wrote:
> >
> > > Date: Thu, 2 Nov 2000 09:32:32 -0800
> > > From: [EMAIL PROTECTED]
> > > To: "Brian W. Spolarich" [EMAIL PROTECTED]
> > > Cc: "J. op den Brouw" [EMAIL PROTECTED], Will Ballantyne [EMAIL PROTECTED], [EMAIL PROTECTED], "Joe R. Jah" [EMAIL PROTECTED]
> > > Subject: Re: [htdig] Re: SSL patch for ht://Dig 3.1.5
> > >
> > > it looks like the process of mailing the patch converted my tabs to spaces (note you should be able to use "patch -l" to ignore whitespace issues). I am unsure what added those spaces. I am not a regular contributor. If someone lets me know where I can ftp the patch to avoid the conversion please let me know and I will try to do it this weekend (I am working at a secured site and cannot access the command line for my home server from here).
> >
> > You can upload it to:
> >
> > ftp://ftp.ccsf.org/incoming/
> >
> > Note that you won't be able to list directory in there, but you should be able to upload files.
> >
> > Regards,
> > Joe
> >
> > > At Wed, 1 Nov 2000 21:51:28 +0000 (GMT), "Brian W. Spolarich" [EMAIL PROTECTED] wrote:
> > >
> > > > On Wed, 1 Nov 2000, Joe R. Jah wrote:
> > > >
> > > > | Any way I moved the patch to
> > > > | ftp://ftp.ccsf.org/htdig-patches/3.1.5/0ld/ssl.0 because it obviously
> > > > | does not apply correctly.
> > > >
> > > > Bless you. :-)
> > > >
> > > > -bws
> > > >
> > > > --
> > > > Brian W. Spolarich - Manager, Network Systems - WALID, Inc. - [EMAIL PROTECTED]
> > > > Welcome to the Real World. - http://www.walid.com/
Re: [htdig] Re: SSL patch for ht://Dig 3.1.5
On Thu, 2 Nov 2000 [EMAIL PROTECTED] wrote:

> Date: Thu, 2 Nov 2000 09:32:32 -0800
> From: [EMAIL PROTECTED]
> To: "Brian W. Spolarich" [EMAIL PROTECTED]
> Cc: "J. op den Brouw" [EMAIL PROTECTED], Will Ballantyne [EMAIL PROTECTED], [EMAIL PROTECTED], "Joe R. Jah" [EMAIL PROTECTED]
> Subject: Re: [htdig] Re: SSL patch for ht://Dig 3.1.5
>
> it looks like the process of mailing the patch converted my tabs to spaces (note you should be able to use "patch -l" to ignore whitespace issues). I am unsure what added those spaces. I am not a regular contributor. If someone lets me know where I can ftp the patch to avoid the conversion please let me know and I will try to do it this weekend (I am working at a secured site and cannot access the command line for my home server from here).

You can upload it to:

ftp://ftp.ccsf.org/incoming/

Note that you won't be able to list the directory in there, but you should be able to upload files.

Regards,
Joe

> At Wed, 1 Nov 2000 21:51:28 +0000 (GMT), "Brian W. Spolarich" [EMAIL PROTECTED] wrote:
>
> > On Wed, 1 Nov 2000, Joe R. Jah wrote:
> >
> > | Any way I moved the patch to
> > | ftp://ftp.ccsf.org/htdig-patches/3.1.5/0ld/ssl.0 because it obviously
> > | does not apply correctly.
> >
> > Bless you. :-)
> >
> > -bws
> >
> > --
> > Brian W. Spolarich - Manager, Network Systems - WALID, Inc. - [EMAIL PROTECTED]
> > Welcome to the Real World. - http://www.walid.com/
Re: [htdig] Re: SSL patch for ht://Dig 3.1.5
On Wed, 1 Nov 2000, Brian W. Spolarich wrote:

> Date: Wed, 1 Nov 2000 11:36:16 +0000 (GMT)
> From: "Brian W. Spolarich" [EMAIL PROTECTED]
> To: "J. op den Brouw" [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]
> Subject: Re: [htdig] Re: SSL patch for ht://Dig 3.1.5
>
> Did not! :-) I saved directly from the patch archive, which was no mean feat given the fact that you have to construct the patch URL by hand.

No, of course you didn't; it was originally posted to the list with tabs already converted. I am sure Jesse used "you" as a general pronoun; he didn't mean to say that _you_, Brian, had copied it off the screen.

Anyway, I moved the patch to ftp://ftp.ccsf.org/htdig-patches/3.1.5/0ld/ssl.0 because it obviously does not apply correctly. You can apply it manually, but it would be a lot easier if Will reposted the patch correctly. It would be greatly appreciated by all the folks who wish to use it.

Regards,
Joe

> On Wed, 1 Nov 2000, J. op den Brouw wrote:
>
> | I think that's what happens when you copy off the screen ;-)
> |
> | "Brian W. Spolarich" wrote:
> | |
> | | On Tue, 31 Oct 2000, Joe R. Jah wrote:
> | |
> | | | I am forwarding your message to the patch author and htdig users
> | | | mailing list, to which the patch was originally posted. Maintainer of
> | | | the patch site does not necessarily know why a patch fails; however, I
> | | | have a pretty good idea in this case. All tab characters in the patch
> | | | have been converted to spaces;( I checked the original mailing from
> | | | Will; the tabs were converted there already.
> |
> | --Jesse
>
> --
> Brian W. Spolarich - Manager, Network Systems - WALID, Inc. - [EMAIL PROTECTED]
> Welcome to the Real World. - http://www.walid.com/
[htdig] SSL patch for ht://Dig 3.1.5
Hi Brian,

I am forwarding your message to the patch author and htdig users mailing list, to which the patch was originally posted. Maintainer of the patch site does not necessarily know why a patch fails; however, I have a pretty good idea in this case. All tab characters in the patch have been converted to spaces;( I checked the original mailing from Will; the tabs were converted there already.

Regards,
Joe

-- Forwarded message --
Date: Tue, 31 Oct 2000 14:32:36 +0000 (GMT)
From: "Brian W. Spolarich" [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: SSL patch for ht://Dig 3.1.5

I downloaded ht://Dig 3.1.5 from the htdig.org website and the SSL patch from:

ftp://sol.ccsf.cc.ca.us/htdig-patches/3.1.5/ssl.0

I attempt to run 'patch' using the supplied patchfile and all of the patches fail. Am I missing something stupid and obvious?

-bws

admin1% tar zxf htdig-3.1.5.tar.gz
admin1% ls
htdig-3.1.5  htdig-3.1.5.tar.gz  ssl.0
admin1% patch -p0 < ssl.0
patching file `htdig-3.1.5/CONFIG'
patching file `htdig-3.1.5/Makefile.config.in'
Hunk #1 FAILED at 24.
1 out of 1 hunk FAILED -- saving rejects to htdig-3.1.5/Makefile.config.in.rej
patching file `htdig-3.1.5/htcommon/DocumentDB.cc'
Hunk #1 FAILED at 217.
Hunk #2 FAILED at 284.
2 out of 2 hunks FAILED -- saving rejects to htdig-3.1.5/htcommon/DocumentDB.cc.rej
patching file `htdig-3.1.5/htcommon/defaults.cc'
Hunk #1 FAILED at 38.
1 out of 1 hunk FAILED -- saving rejects to htdig-3.1.5/htcommon/defaults.cc.rej
patching file `htdig-3.1.5/htdig/Document.cc'
Hunk #1 FAILED at 220.
Hunk #2 FAILED at 332.
2 out of 2 hunks FAILED -- saving rejects to htdig-3.1.5/htdig/Document.cc.rej
patching file `htdig-3.1.5/htdig/Images.cc'
Hunk #1 FAILED at 61.
Hunk #2 FAILED at 81.
2 out of 2 hunks FAILED -- saving rejects to htdig-3.1.5/htdig/Images.cc.rej
patching file `htdig-3.1.5/htdig/Retriever.cc'
Hunk #2 FAILED at 132.
Hunk #3 FAILED at 668.
Hunk #4 FAILED at 1232.
Hunk #5 FAILED at 1365.
4 out of 5 hunks FAILED -- saving rejects to htdig-3.1.5/htdig/Retriever.cc.rej
patching file `htdig-3.1.5/htdig/Server.cc'
Hunk #1 succeeded at 20 with fuzz 1.
Hunk #2 FAILED at 40.
1 out of 2 hunks FAILED -- saving rejects to htdig-3.1.5/htdig/Server.cc.rej
patching file `htdig-3.1.5/htdig/Server.h'
Hunk #1 FAILED at 26.
1 out of 1 hunk FAILED -- saving rejects to htdig-3.1.5/htdig/Server.h.rej
patching file `htdig-3.1.5/htlib/Connection.cc'
Hunk #1 FAILED at 39.
Hunk #4 FAILED at 119.
Hunk #5 FAILED at 174.
Hunk #7 FAILED at 281.
Hunk #9 FAILED at 469.
5 out of 9 hunks FAILED -- saving rejects to htdig-3.1.5/htlib/Connection.cc.rej
patching file `htdig-3.1.5/htlib/Connection.h'
Hunk #2 succeeded at 53 with fuzz 1.
Hunk #3 succeeded at 73 with fuzz 2.
Hunk #4 FAILED at 102.
1 out of 4 hunks FAILED -- saving rejects to htdig-3.1.5/htlib/Connection.h.rej
patching file `htdig-3.1.5/htlib/URL.cc'
Hunk #1 FAILED at 130.
Hunk #2 FAILED at 223.
Hunk #3 FAILED at 492.
Hunk #4 FAILED at 549.
4 out of 4 hunks FAILED -- saving rejects to htdig-3.1.5/htlib/URL.cc.rej
patching file `htdig-3.1.5/htlib/URL.h'
Hunk #1 FAILED at 48.
1 out of 1 hunk FAILED -- saving rejects to htdig-3.1.5/htlib/URL.h.rej

--
Brian W. Spolarich - Manager, Network Systems - WALID, Inc. - [EMAIL PROTECTED]
Welcome to the Real World. - http://www.walid.com/
[htdig] Compiling htdig 3.2.0b2 on BSDI 4.01
Hi Folks,

Has anyone made htdig 3.2.x compile under BSDI 4.01? Here is my latest unsuccessful stab at compiling 3.2.0b2 on BSDI with the following flags and results:

CFLAGS="-O2"
CXXFLAGS="-O2"
CPPFLAGS="-I/usr/include/g++"
CXX="/usr/bin/g++"
export CFLAGS CXXFLAGS CPPFLAGS CXX
./configure
[...]
gmake
Making all in db
gmake[1]: Entering directory `/tmp/htdig-3.2.0b2/db'
cd dist ; if [ -f Makefile ] ; then gmake PACKAGE=htdig all ; fi
gmake[2]: Entering directory `/tmp/htdig-3.2.0b2/db/dist'
gcc -c -O2 -Wall -I. -I./../include -I/usr/include/g++ -I/usr/include/g++/std -I/usr/local/include -I/usr/local/include ../btree/btc
In file included from /usr/include/g++/std/stddef.h:11,
                 from /usr/include/g++/std/bastring.h:35,
                 from /usr/include/g++/std/string.h:6,
                 from ../btree/bt_compare.c:56:
/usr/include/g++/_G_config.h:39: syntax error before `_G_ptrdiff_t'
/usr/include/g++/_G_config.h:39: warning: data definition has no type or storage class
/usr/include/g++/_G_config.h:46: syntax error before `_G_wchar_t'
/usr/include/g++/_G_config.h:46: warning: data definition has no type or storage class
In file included from /usr/include/g++/std/bastring.h:35,
                 from /usr/include/g++/std/string.h:6,
                 from ../btree/bt_compare.c:56:
/usr/include/g++/std/stddef.h:14: syntax error before string constant
/usr/include/g++/std/stddef.h:23: syntax error before `}'
gmake[2]: *** [bt_compare.o] Error 1

As always, any pointers are greatly appreciated.

Regards,
Joe
Re: [htdig] is there a problem with the documentation?
On Fri, 4 Aug 2000, Gilles Detillieux wrote:

> Date: Fri, 4 Aug 2000 10:28:07 -0500 (CDT)
> From: Gilles Detillieux [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Subject: [htdig] is there a problem with the documentation?
>
> Hi, folks. Geoff and I have noticed recently that there are a LOT of questions being asked on this list that are readily answered in the FAQ and/or in the Attributes documentation, both available on the web site. We'd really like to cut down on this traffic, but we need your help. If you've been RTFM'ed recently - now be honest because we know there are a lot of you out there who have - what can we do to make it easier to find answers on the web site rather than using the mailing list as a first recourse?

It may help to include that info in the message footer for those who are used to the mailing list as the first recourse;)

To unsubscribe from the htdig mailing list, send a message to [EMAIL PROTECTED] You will receive a message to confirm this.
List archives: http://www.htdig.org/mail/menu.html
FAQ: http://www.htdig.org/FAQ.html

Regards,
Joe
[htdig] (3.1.5) Missing [Next] in the 10th page
Hi Folks,

When I search for keywords that result in many matches, I observe the following; try searching for "htdig" on www.htdig.org;)

Documents 1 - 10 of 15666 matches. More *'s indicate a better match.

At the bottom of the page there are icons 1 to 10 and a right arrow [Next]. I continue to the next page and see:

Documents 11 - 20 of 15666 matches. More *'s indicate a better match.

At the bottom of the page there are a left arrow [Previous], icons 1 to 10, and a right arrow [Next].

...

When I reach the tenth page I see:

Documents 91 - 100 of 15666 matches. More *'s indicate a better match.

At the bottom of the page there are a left arrow [Previous] and icons 1 to 10. The icon that points to the next page disappears;( I cannot see any more pages unless I increase the number of available icons in the config file. I can, of course, increase the number of matches per page to see more results in the same number of pages, but the problem just shifts farther out; it does not disappear.

Is it possible to have the [Next] icon appear on the page of the last numbered icon and, beyond that, perhaps only the [Previous] icon, some other icon indicating that one has passed beyond 10, and the [Next] icon; like so:

Left arrow [Previous], Beyond Last icon, and right arrow [Next]

I appreciate any pointers.

Regards,
Joe
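The navigation requested above can be written down as a small Python sketch. It is only an illustration of the proposed behaviour, not htsearch code; the function name, the parameter names, and the returned dictionary are all hypothetical, and the defaults of 10 matches per page and 10 numbered icons are taken from the example in the message.

```python
import math


def page_links(current: int, total_matches: int,
               per_page: int = 10, max_icons: int = 10) -> dict:
    """Describe the navigation bar for result page `current` (1-based).

    [Next] is offered whenever a later page of results exists, even
    when that page is past the last numbered icon.
    """
    last_page = max(1, math.ceil(total_matches / per_page))
    return {
        "previous": current - 1 if current > 1 else None,
        "icons": list(range(1, min(last_page, max_icons) + 1)),
        # True once the reader has moved past the numbered icons.
        "beyond_last_icon": current > max_icons,
        "next": current + 1 if current < last_page else None,
    }


# Page 10 of 15666 matches: [Next] is still offered.
print(page_links(10, 15666)["next"])              # 11
# Page 11: past the numbered icons, with both arrows available.
print(page_links(11, 15666)["beyond_last_icon"])  # True
```

The only change from the behaviour reported in the message is that the [Next] decision depends on the total number of result pages rather than on the number of icons.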
Re: [htdig] Re: 3.2.0b2 - problem with either no stars, or infinite loop writing out (PR#846)
On Mon, 15 May 2000, Gilles Detillieux wrote:

> Date: Mon, 15 May 2000 14:12:00 -0500 (CDT)
> From: Gilles Detillieux [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]
> Subject: [htdig] Re: 3.2.0b2 - problem with either no stars, or infinite loop writing out (PR#846)
>
> Yes, Terry Luedtke reported this problem and posted a patch for it, to [EMAIL PROTECTED], back on May 3rd. It didn't seem to make it into Joe's patch archive, so I'll repost it here for those who missed it. The patch fixes a few bugs in the score calculation which cause the problems in star generation.

I do not know how I missed it, but it is now in there:

ftp://ftp.ccsf.org/htdig-patches/3.2.0b2/noStars.0

Regards,
Joe
Re: [htdig] problems with the accent patch
On Thu, 2 Mar 2000, Eric van der Vlist wrote:

> Date: Thu, 02 Mar 2000 22:12:34 +0100
> From: Eric van der Vlist [EMAIL PROTECTED]
> To: Gilles Detillieux [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]
> Subject: Re: [htdig] problems with the "accent" patch
>
> Hi,
>
> I have applied this patch as well and noticed that it's working for most of the words, but not for others...
>
> Looking at the output of "htfuzzy -vv accents", I have noticed that all the words are truncated to 12 characters and that the words which are truncated are those for which there is a problem.
>
> For instance searching for "enchere" (not truncated) will return the matches for the correctly spelled word (with è) while searching for "specification", truncated to "specificatio", will not match specification with an é.
>
> If I search for "specificatio", I do get the matches for the accented word...
>
> I am trying to find where this truncation happens, but if anyone more familiar with the code can shed some light, it would help !

Set the maximum_word_length attribute in the htdig.conf file. It is 12 by default.

Regards,
Joe
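The effect of the 12-character limit on Eric's example can be reproduced with a few lines of Python. This is a sketch of the symptom only; the actual htdig and htfuzzy code paths are not shown in this thread and may differ, and `index_key` is a made-up helper.

```python
MAXIMUM_WORD_LENGTH = 12  # the default value cited in the reply above


def index_key(word: str, max_len: int = MAXIMUM_WORD_LENGTH) -> str:
    """Return the word as it would be stored if cut at max_len characters."""
    return word.lower()[:max_len]


# A short word survives intact, so the lookup finds it.
print(index_key("enchere"))        # enchere

# A 13-letter word is stored without its last letter, so a lookup on
# the full, untruncated word finds nothing.
print(index_key("specification"))  # specificatio

# With a larger limit the full word is kept.
print(index_key("specification", max_len=32))  # specification
```

This matches the report that searching for "specificatio" succeeds while searching for the full word does not.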
[htdig] Someone's forging email addresses of htdig members
On Tue, 16 Nov 1999 [EMAIL PROTECTED] wrote:

> Date: Tue, 16 Nov 1999 21:05:40 +1100
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Subject: [htdig] hey wassup HtDIG ;)
>
> here is the site you wanted... XXX it's the one that gives you free membership access (all hacked) to abotu 300 membership based sex sites. k bye...
>
> ps: why r u using htdig.org now? it doens't make sence, anyway *bye*...

I don't believe Geoff sent that message. I received another forged message masquerading as having come from [EMAIL PROTECTED] I also received an undeliverable message, which _I_ supposedly had sent; meaning they have forged my address too;(

Any ideas?

Regards,
Joe

To unsubscribe from the htdig mailing list, send a message to [EMAIL PROTECTED] containing the single word unsubscribe in the SUBJECT of the message.
Re: [htdig] htsearch on BSDI 4.0.1
On Thu, 28 Oct 1999, Markus Mohr wrote:

> Date: Thu, 28 Oct 1999 00:56:23 +0200
> From: Markus Mohr [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Subject: [htdig] htsearch on BSDI 4.0.1
>
> Hi!
>
> I've compiled and configured htdig 3.1.3 on BSDI. The dig ran fine, but htsearch simply segfaults. I've got a htsearch.core, but what can I do now?

 . Make clean.
 . Remove references to regex.o from htlib/Makefile.
 . Remove htlib/regex.h.
 . Remove references to htlib/regex.h in htfuzzy/Makefile, if you have done make depend.
 . Make.
 . Voila;)

Regards,
Joe
Re: [htdig] rundig locks
On Mon, 27 Sep 1999, Andy Malato wrote:

> Date: Mon, 27 Sep 1999 09:38:22 -0400
> From: Andy Malato [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Subject: [htdig] rundig locks
>
> Hello,
>
> I've just compiled and installed htdig 3.1.3 on my BSDI 3.1 machine, and when I attempt to run rundig to create the sample search, nothing happens, I then have to use ctrl-c to break out of it. Did I do something wrong?

Search the archives at

http://www.mail-archive.com/htdig%40htdig.org/

for "Memory fault (core dumped)" and/or "Segmentation Fault V3.1.2"

Regards,
Joe
Re: [htdig] Scripts that use GDBM_File
On Sun, 26 Sep 1999 [EMAIL PROTECTED] wrote:

> Date: Sun, 26 Sep 1999 14:14:13 +0200 (MEST)
> From: [EMAIL PROTECTED]
> To: "Joe R. Jah" [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]
> Subject: [htdig] Scripts that use GDBM_File
>
> Joe R. Jah writes:
>
> > Hi Folks,
> >
> > I have been trying to take advantage of some of the scripts in contrib folder; yesterday I noticed that five of them, changehost/changehost.pl, doclist/doclist.pl, doclist/listafter.pl, urlindex/urlindex.pl, and wordfreq/wordfreq.pl, have an identical problem: they all die on the following line:
> >
> > tie(%docdb, GDBM_File, $dbfile, GDBM_READER, 0) || die ...
>
> Well, since htdig uses Berkeley DB files and not GDBM files you could just change the tie call to DB_File. I'm not sure if the internal structure of the data stored in the file is still compatible, though :-}

Thanks Loic; I replaced GDBM_File with DB_File in the tie call, but now I get:

Can't locate object method "TIEHASH" via package "DB_File" at doclist.pl line 13.
Can't locate object method "TIEHASH" via package "DB_File" at listafter.pl line 27.
..

Regards,
Joe
[htdig] Scripts that use GDBM_File
Hi Folks,

I have been trying to take advantage of some of the scripts in the contrib folder; yesterday I noticed that five of them, changehost/changehost.pl, doclist/doclist.pl, doclist/listafter.pl, urlindex/urlindex.pl, and wordfreq/wordfreq.pl, have an identical problem: they all die on the following line:

tie(%docdb, GDBM_File, $dbfile, GDBM_READER, 0) || die ...

All these scripts use GDBM_File. Am I missing something, or are they all incompatible with the htdig 3.1.x db?

I appreciate any pointers.

Regards,
Joe
Re: [htdig] htdig and symbolic links
On Fri, 10 Sep 1999, Nick O'Brien wrote:

> Date: Fri, 10 Sep 1999 15:13:20 +0100 (GMT Daylight Time)
> From: Nick O'Brien [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Subject: [htdig] htdig and symbolic links
>
> Hi,
>
> We are implementing htdig (v3.1.2 + the patch kit on Solaris 2.6) on our main web server. One comment we have had is that there are a lot of duplicate search results pointing to the same web pages. This is usually caused by having several different Unix symbolic links pointing to the same directory/file in the web document tree.
>
> Is there any way we can prevent the indexing of these duplicates? I see from the mailing list archives that for previous versions of htdig there were patches to fix this issue but they are not available for the current version. I see from the bug database the latest advice is to eliminate symbolic links - however for many practical reasons it is not possible for us to do this.
>
> Is it for example possible to configure htdig to index our URLs via the filesystem instead of HTTP (i.e. using local_urls) and to ignore the symbolic links?
>
> How are people on the list working round this problem? Or is this an unresolved bug I will need to (re)log with the htdig developers?

Our site is in the same boat as yours; I use the same old patch for version 3.0.8b2, but I apply it manually at every new release. You can get it from:

ftp://sol.ccsf.cc.ca.us/htdig-patches/3.0.8b2/Retriever.cc.0

Then, with an ugly, extensive set of local_urls for each and every symbolic link in the site:( I manage to suppress duplicates, quadruplicates, and multuplicates;) Boy, do I look forward to 3.2, which is promised to take care of the menace of duplicates.

Regards,
Joe
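For anyone auditing a document tree for this kind of duplication, the underlying idea (several paths, one real file) can be sketched in Python. This is not what the Retriever.cc patch does and it is not an htdig feature; it only assumes that the web server's files are reachable on the local filesystem, and `unique_by_realpath` is a made-up helper.

```python
import os


def unique_by_realpath(paths):
    """Keep one path per underlying file.

    os.path.realpath() resolves symbolic links, so every alias of the
    same file or directory collapses onto one canonical path. The first
    alias seen for each canonical path is the one that is kept.
    """
    first_seen = {}
    for path in paths:
        canonical = os.path.realpath(path)
        first_seen.setdefault(canonical, path)
    return list(first_seen.values())
```

Given a list of candidate document paths, the function returns the subset worth indexing once; the discarded aliases are the ones that would otherwise need a local_urls entry or show up as duplicate results.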
[htdig] Meta keywords abuse;(
Hi Folks,

I run htdig for a college; several departments maintain their respective sites on a few servers. Each department has a webmaster, sort of, who sets up their pages; some departments take advantage of the enthusiasm of their students and leave it to them to set up the site. Sometimes students become over-enthusiastic;)

Some of the over-enthusiastic students, or webmasters, get the idea of using, I'd say abusing, the meta keywords. They create a meta keywords tag with over a hundred very common words and replicate it in two dozen files in their site. When anyone searches for any of those common words, the first two dozen results are from that site.

My solution, for now, is to exclude their site from the dig; however, I would like to find a less drastic measure. I would like to dig their site, but incapacitate _their_ meta tag keywords. I'd like to leave meta tags for judicious use, a few descriptive keywords, not a hundred!!

Secondly, and more importantly, I'd like to find a way to discover such abuses. I stumbled upon one of these sites by sheer luck the day after they set it up;) I'd like to systematically search for and find any site that abuses meta tags like this, SD;)

I appreciate any pointers,

TIA,

Joe
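One way to look for this kind of abuse systematically, outside of htdig, is to count the words in each page's meta keywords tag and flag pages over a threshold. The following is only a sketch under assumptions: the names count_meta_keywords and flag_abusers and the limit of 25 words are invented for illustration, and the pages are assumed to have been fetched already into a URL-to-HTML mapping.

```python
import re
from html.parser import HTMLParser


class MetaKeywordCounter(HTMLParser):
    """Collect the words found in <meta name="keywords" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "keywords":
            # Split on commas and whitespace, so "search engine" counts as two words.
            words = re.split(r"[,\s]+", attr.get("content", ""))
            self.keywords.extend(w for w in words if w)


def count_meta_keywords(html_text):
    """Return the number of meta keyword words in one HTML document."""
    parser = MetaKeywordCounter()
    parser.feed(html_text)
    return len(parser.keywords)


def flag_abusers(pages, limit=25):
    """Return the URLs whose meta keyword count exceeds limit.

    pages maps URL -> HTML text; limit is an arbitrary cut-off chosen
    for this sketch, not an htdig setting.
    """
    return sorted(url for url, html in pages.items()
                  if count_meta_keywords(html) > limit)
```

Running flag_abusers over a nightly crawl would give a short list of pages to review by hand, rather than waiting to stumble on them in search results.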
Re: [htdig] BSDI installation
On Sat, 4 Sep 1999, Biranit Goren wrote:

> Date: Sat, 04 Sep 1999 01:54:03 +0200
> From: Biranit Goren [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Subject: [htdig] BSDI installation
>
> Hello, I have had htdig on our website, at Atlas F1, for quite some time now and have grown totally reliant on its wonderful search service. Our current OS is Linux RedHat 6.0. However, at the end of the month we are moving to a new, dedicated server, where the OS will be BSDI 4.0. I understand that htdig cannot be installed on this OS, but I also saw in the TODO page that this will be changed (or has changed, I am not sure what you regard as a bullet and what you regard as a circle there!!!).
>
> I am totally depressed by the thought of not being able to use htdig - please, please tell me there is a solution? We have 2.5 million visitors a month to our website, and the search service is imperative for us. The thought of having to find a new search engine is utterly annoying, so please tell me how I can install htdig on BSDI after all? Please?

Don't be depressed; thanks to Gilles and Geoff I solved this problem last July.

3.1.2 doesn't use the regex code in the C library, but rather it bundles the GNU regex code in the package, and puts it in htlib/libht.a. This GNU regex.c code causes a conflict with the BSDI C/C++ library: BSDI already has a regex.h in the /usr/include directory, and regex functions in the C library. To get around this problem, do the following:

 . Remove references to regex.o from htlib/Makefile.
 . Remove htlib/regex.h.
 . Remove references to htlib/regex.h in htfuzzy/Makefile.

Voila;)

Joe
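The Makefile edits above are easy to do by hand in an editor. For anyone who has to repeat them at every release, here is a rough Python sketch of the "remove references" step. It assumes the references appear as whitespace-delimited words in the Makefile text, which may not hold for every htdig version; drop_token is a hypothetical helper, not part of htdig.

```python
import re


def drop_token(makefile_text, token):
    """Remove every whitespace-delimited occurrence of token from the text.

    Words that merely contain the token (for example "myregex.o" or
    "../htlib/regex.h" when token is "regex.h") are left untouched, so the
    token must be given exactly as it is written in the Makefile.
    """
    pattern = re.compile(r"(?<!\S)" + re.escape(token) + r"(?!\S)[ \t]*")
    return pattern.sub("", makefile_text)
```

For example, drop_token(text, "regex.o") applied to the contents of htlib/Makefile removes the object file from any list it appears in; the result should be reviewed before it is written back.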
Re: [htdig] Memory fault (core dumped)
On Wed, 28 Jul 1999, Gilles Detillieux wrote:

> Date: Wed, 28 Jul 1999 17:50:23 -0500 (CDT)
> From: Gilles Detillieux [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]
> Subject: Re: [htdig] Memory fault (core dumped)
>
> OK, upon closer examination, it appears I was mistaken about one point. 3.1.2 doesn't use the regex code in the C library, but rather it bundles the GNU regex code in the package, and puts it in htlib/libht.a. The extern "C" construct above doesn't work because htlib/regex.h already includes this construct - that's why it was removed from htfuzzy/Endings.cc.
>
> So, perhaps this GNU regex.c code is causing a conflict with your C or C++ library. If you already have a regex.h in your /usr/include directory, and regex functions in your C library, you might want to try using these instead of the ones in htlib. To do this, I think you'd need to remove references to regex.o from htlib/Makefile, remove regex.o from

Bingo;) I removed references to regex.o from htlib/Makefile.

> htlib/libht.a, and probably also remove htlib/regex.h. If this works,

I ran make clean and removed htlib/regex.h; however, I also had to remove references to htlib/regex.h in htfuzzy/Makefile.

It worked like a charm;)

Thanks a million Gilles and Geoff.

Joe
Re: [htdig] Memory fault (core dumped)
Hi BSDI Folks,

A question has come up on the htdig mailing list to which you might know the answer: is there a bug in /usr/lib/libg++.so in BSDI? I appreciate any pointers and/or comments.

On Wed, 28 Jul 1999, Geoff Hutchison wrote:

> Date: Wed, 28 Jul 1999 16:21:04 -0400
> From: Geoff Hutchison [EMAIL PROTECTED]
> To: "Joe R. Jah" [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]
> Subject: Re: [htdig] Memory fault (core dumped)
>
> "Joe R. Jah" wrote:
> > Yes, I grepped all the files in the source directory for "Regex"; it's only in htdoc/TODO.html and htlib/regex.c as a comment. I found many instances of it, though, in /usr/lib/libg++.so. Does that mean anything at all?
>
> It may mean a bug in libg++ or a mismatch between libg++ and your C library. What compiler (version) did you use to compile? Have you tried compiling with a more recent compiler (and/or libstdc++)?

Gcc version 2.7.2.1; I haven't tried with any other.

> As Gilles pointed out, one of the changes in 3.1.2 was to use the system regex.h, which seems almost uniformly much faster than the rx code that we had been using. This is the likely culprit, but I don't know why it's not working for you.

On Wed, 28 Jul 1999, Gilles Detillieux wrote:

> Date: Wed, 28 Jul 1999 15:13:02 -0500 (CDT)
> From: Gilles Detillieux [EMAIL PROTECTED]
> To: "Joe R. Jah" [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]
> Subject: Re: [htdig] Memory fault (core dumped)
>
> According to Joe R. Jah:
> > On Wed, 28 Jul 1999, Geoff Hutchison wrote:
> > > > #2 0x180dcf29 in Regex::Regex ()
> > > > #3 0x180dd138 in global constructors keyed to Regex::Regex ()
> > >
> > > This still confuses me. There isn't a 'Regex' class in 3.1.2. There's a Regex class (a fuzzy) in the 3.2 development source. But that shouldn't be in 3.1.2, so 'Regex::Regex()' shouldn't be called.
> >
> > Yes, I grepped all the files in the source directory for "Regex"; it's only in htdoc/TODO.html and htlib/regex.c as a comment. I found many instances of it, though, in /usr/lib/libg++.so. Does that mean anything at all?
>
> Yes, I suspect that the Regex class might be right in libg++ on your system. The first stack backtrace you sent seemed to suggest that all the chain of function calls was for some internal initialization sequence (none of the symbols seemed to be within ht://Dig's code). I suspect the problem is somehow related to the switch in 3.1.2 from the old, slow regex code bundled in with ht://Dig, to the faster C library regex code. One of the differences I could see is that in the old code in htfuzzy/Endings.cc, it did this to include the bundled regex header file:
>
>     extern "C" {
>     # include <rxposix.h>
>     }
>
> but now it does this:
>
>     #include <regex.h>
>
> Just a wild guess, but maybe surrounding that line with an extern "C" construct would help - it might allow inclusion of C library stuff, but avoid unwanted C++ library stuff, which appears to be the source of

I did it like so:

    extern "C" {
    #include <regex.h>
    }

I got a compile error:

    regex.c:210: syntax error before string constant
    regex.c:213: syntax error before `}'
    gmake[1]: *** [regex.o] Error 1
    gmake[1]: Leaving directory `/usr/home/jjah/tmp/htdig-3.1.2/htdig-3.1.2/htlib'
    gmake: *** [all] Error 1

> grief here. The other possibility (more wild speculation here) is that on BSDI systems, the regex code in the C library somehow conflicts with the regex code in the C++ library, so it would need to be avoided on BSDI (or you'd need a different C++ library).

I am CCing this message to the BSDI-Users mailing list for that question.

> It's odd, though, that the problem arises in htsearch but not htfuzzy (or does it?). Does htfuzzy do anything differently in its initialization to avoid this problem? (rhetorical question, but hey, if anyone has the answer I'd like to hear it)

Yes, the problem also arises in htfuzzy.

Joe
htdig: Patch: for keyword(s)_factor typo in the docs
Hi Folks, The following patch is against a virgin 3.1.0b4:) Begin Patch___ *** htdoc/attrs.html.orig Tue Dec 22 17:53:13 1998 --- htdoc/attrs.htmlSat Jan 9 11:37:59 1999 *** *** 1694,1701 hr dl dt ! stronga name="keyword_factor" ! keyword_factor/a/strong /dt dd dl --- 1694,1701 hr dl dt ! stronga name="keywords_factor" ! keywords_factor/a/strong /dt dd dl *** *** 1731,1737 emexample:/em /dt dd ! keyword_factor: 12 /dd /dl /dd --- 1731,1737 emexample:/em /dt dd ! keywords_factor: 12 /dd /dl /dd *** *** 4331,4338 to be ignored. The number may be a floating point number. See also the a href="#heading_factor" heading_factor_[1-6]/a, a href="#title_factor" ! title_factor/a, and a href="#keyword_factor" ! keyword_factor/a attributes. /dd dt emexample:/em --- 4331,4338 to be ignored. The number may be a floating point number. See also the a href="#heading_factor" heading_factor_[1-6]/a, a href="#title_factor" ! title_factor/a, and a href="#keywords_factor" ! keywords_factor/a attributes. /dd dt emexample:/em *** htdoc/cf_byname.html.orig Tue Dec 22 17:53:13 1998 --- htdoc/cf_byname.htmlSat Jan 9 11:36:34 1999 *** *** 112,118 /font br bK/b font face="helvetica,arial" size="2"br img src="dot.gif" alt="*" a target="body" href= ! "attrs.html#keyword_factor"keyword_factor/abr img src="dot.gif" alt="*" a target="body" href= "attrs.html#keywords_meta_tag_names" keywords_meta_tag_names/abr --- 112,118 /font br bK/b font face="helvetica,arial" size="2"br img src="dot.gif" alt="*" a target="body" href= ! "attrs.html#keywords_factor"keywords_factor/abr img src="dot.gif" alt="*" a target="body" href= "attrs.html#keywords_meta_tag_names" keywords_meta_tag_names/abr *** htdoc/cf_byprog.html.orig Tue Dec 22 17:53:13 1998 --- htdoc/cf_byprog.htmlSat Jan 9 11:35:45 1999 *** *** 67,73 img src="dot.gif" alt="*" a target="body" href= "attrs.html#image_list"image_list/abr img src="dot.gif" alt="*" a target="body" href= ! 
"attrs.html#keyword_factor"keyword_factor/abr img src="dot.gif" alt="*" a target="body" href= "attrs.html#limit_normalized"limit_normalized/abr img src="dot.gif" alt="*" a target="body" href= --- 67,73 img src="dot.gif" alt="*" a target="body" href= "attrs.html#image_list"image_list/abr img src="dot.gif" alt="*" a target="body" href= ! "attrs.html#keywords_factor"keywords_factor/abr img src="dot.gif" alt="*" a target="body" href= "attrs.html#limit_normalized"limit_normalized/abr img src="dot.gif" alt="*" a target="body" href= _End Patch Joe _/ _/_/_/ _/ __o _/ _/ _/ _/ __ _-\,_ _/ _/ _/_/_/ _/ _/ ..(_)/ (_) _/_/ oe _/ _/. _/_/ ah[EMAIL PROTECTED] -- To unsubscribe from the htdig mailing list, send a message to [EMAIL PROTECTED] containing the single word "unsubscribe" in the body of the message.
htdig: Patch: for limit_normalized alphabetical rank
Hi Folks, This patch is against keyword(s)_factor patched 3.1.0b4, ftp://sol.ccsf.cc.ca.us/htdig-patches/3.1.0b4/attrs-cf_byname-prog.html.0 ___Begin Patch_ *** htdoc/cf_byprog.html.patchedSat Jan 9 11:35:45 1999 --- htdoc/cf_byprog.htmlSat Jan 9 12:45:01 1999 *** *** 69,78 img src="dot.gif" alt="*" a target="body" href= "attrs.html#keywords_factor"keywords_factor/abr img src="dot.gif" alt="*" a target="body" href= - "attrs.html#limit_normalized"limit_normalized/abr - img src="dot.gif" alt="*" a target="body" href= "attrs.html#keywords_meta_tag_names" keywords_meta_tag_names/abr img src="dot.gif" alt="*" a target="body" href= "attrs.html#limit_urls_to"limit_urls_to/abr img src="dot.gif" alt="*" a target="body" href= --- 69,78 img src="dot.gif" alt="*" a target="body" href= "attrs.html#keywords_factor"keywords_factor/abr img src="dot.gif" alt="*" a target="body" href= "attrs.html#keywords_meta_tag_names" keywords_meta_tag_names/abr + img src="dot.gif" alt="*" a target="body" href= + "attrs.html#limit_normalized"limit_normalized/abr img src="dot.gif" alt="*" a target="body" href= "attrs.html#limit_urls_to"limit_urls_to/abr img src="dot.gif" alt="*" a target="body" href= End Patch__ Joe _/ _/_/_/ _/ __o _/ _/ _/ _/ __ _-\,_ _/ _/ _/_/_/ _/ _/ ..(_)/ (_) _/_/ oe _/ _/. _/_/ ah[EMAIL PROTECTED] -- To unsubscribe from the htdig mailing list, send a message to [EMAIL PROTECTED] containing the single word "unsubscribe" in the body of the message.
htdig: 3.1.b2 - 3.1.b3 performance degradation +
Hi Geoff,

Yesterday I installed htdig 3.1.b3 on my machine. I compiled it on a BSDI box; everything was left as it was except Retriever.cc, which was patched with the old 3.0.8b2 patch,

ftp://sol.ccsf.cc.ca.us/htdig-patches/3.0.8b2/Retriever.cc.0

to exclude local duplicates, same as my 3.1.b2. The results:

1. It takes considerably longer to search (10 to 20 times) than 3.1.b2.
2. Many of the pages present in the 3.1.b2 results are absent from the 3.1.b3 results.
3. I cannot explain the size changes of the db.wordlist and db.words.db files.

3.1.b2 DB files:

-rw-r--r-- 1 jjah www 11360256 Dec 16 02:35 db.docdb
-rw-r--r-- 1 jjah www   385024 Dec 16 02:35 db.docs.index
-rw-r--r-- 1 jjah www 19231896 Dec 16 02:34 db.wordlist
-rw-r--r-- 1 jjah www 16835584 Dec 16 02:34 db.words.db

3.1.b3 DB files:

-rw-r--r-- 1 jjah www 11515904 Dec 17 02:37 db.docdb
-rw-r--r-- 1 jjah www   372736 Dec 17 02:37 db.docs.index
-rw-r--r-- 1 jjah www 17188189 Dec 17 02:36 db.wordlist
-rw-r--r-- 1 jjah www 17328128 Dec 17 02:36 db.words.db

I appreciate any pointers.

TIA,

Joe
Re: htdig: Problems with using htdig -a
On Mon, 21 Sep 1998, Geoff Hutchison wrote:

> Date: Mon, 21 Sep 1998 23:13:12 -0400
> From: Geoff Hutchison [EMAIL PROTECTED]
> To: "Joe R. Jah" [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]
> Subject: Re: htdig: Problems with using htdig -a
>
> At 1:23 AM -0400 9/18/98, Joe R. Jah wrote:
> > I assume this increase in size of db files and the increase in the reported number of documents will be cumulative over time if one uses this workaround; It will probably increase the actual search time as well;(
>
> I'm not sure what's going on here. Perhaps you could export the ASCII database for the db with and without this behavior. I'd be interested to see if documents are being duplicated. Do you use "remove_bad_urls"?

Yes, documents are being duplicated, triplicated, and ... That's why I use the old "Excluding directories and duplicate URLs patch."

Yes, I have the line

remove_bad_urls:true

in my htdig.conf file.

Joe
Re: htdig: Problems with using htdig -a
On Thu, 17 Sep 1998, Geoff Hutchison wrote:

> Date: Thu, 17 Sep 1998 23:47:47 -0400
> From: Geoff Hutchison [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Subject: htdig: Problems with using htdig -a
>
> Hi, I consider the following a bug, since it's not documented. Fortunately there's an easy workaround.
>
> I normally run the dig with the switch -a to use alternate files (allowing others to search as I'm digging). Usually I don't use the switch -i, so it should do an "update" dig and index only the changed or new files (which should be a small subset of the 50,000 pages). Then the script moves the files into place at the end of the run.
>
> However, when using "-a" I wasn't seeing an update of the database. Essentially htdig looks at the db.docs.work file and found it empty. So it updates the empty db by doing a full initial dig. :-(
>
> Here's an example solution: (yes, you might want to ignore the first cp commands and change the first two mv commands to cp)
>
>     BASEDIR=/opt/htdig
>     cp $BASEDIR/db/db.wordlist $BASEDIR/db/db.wordlist.work
>     cp $BASEDIR/db/db.docdb $BASEDIR/db/db.docdb.work
>     $BASEDIR/bin/htdig -a -s
>     $BASEDIR/bin/htmerge -a -s
>     mv $BASEDIR/db/db.wordlist.work $BASEDIR/db/db.wordlist
>     mv $BASEDIR/db/db.docdb.work $BASEDIR/db/db.docdb
>     mv $BASEDIR/db/db.docs.index.work $BASEDIR/db/db.docs.index
>     mv $BASEDIR/db/db.words.db.work $BASEDIR/db/db.words.db
>
> This changed a 1 hr. 30 min. dig into a 15 min dig, even counting the shuffling of files. Faster is better. :-)

I have 2809 documents on a local server; I also use the -a switch; it normally takes about 12 minutes to run rundig. I tried your easy workaround and got the following results:

According to the report I have 3128 documents; it took about 14 minutes to run rundig. The size of my db files increased by about 30%:

-rw-r--r-- 1 jjah www 13281280 Sep 17 21:36 db.docdb
-rw-r--r-- 1 jjah www 10482688 Sep 17 02:33 db.docdb.old
-rw-r--r-- 1 jjah www   398336 Sep 17 21:35 db.docs.index
-rw-r--r-- 1 jjah www   343040 Sep 17 02:33 db.docs.index.old
-rw-r--r-- 1 jjah www 22928417 Sep 17 21:36 db.wordlist
-rw-r--r-- 1 jjah www 17329728 Sep 17 02:32 db.wordlist.old
-rw-r--r-- 1 jjah www 19543040 Sep 17 21:34 db.words.db
-rw-r--r-- 1 jjah www 15352832 Sep 17 02:32 db.words.db.old

I assume this increase in the size of the db files and the increase in the reported number of documents will be cumulative over time if one uses this workaround; it will probably increase the actual search time as well;(

Joe
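As a check on the "about 30%" figure, the growth can be worked out directly from the file sizes in the listing above. The short Python sketch below does only that arithmetic; the names SIZES, growth_percent, and total_growth_percent are chosen for this illustration.

```python
# (old size, new size) in bytes, copied from the directory listing above.
SIZES = {
    "db.docdb": (10482688, 13281280),
    "db.docs.index": (343040, 398336),
    "db.wordlist": (17329728, 22928417),
    "db.words.db": (15352832, 19543040),
}


def growth_percent(old, new):
    """Return the relative growth from old to new, in percent."""
    return (new - old) / old * 100.0


def total_growth_percent(sizes):
    """Return the growth of the combined size of all files, in percent."""
    old_total = sum(old for old, _ in sizes.values())
    new_total = sum(new for _, new in sizes.values())
    return growth_percent(old_total, new_total)


for name, (old, new) in SIZES.items():
    print(f"{name}: {growth_percent(old, new):.1f}%")
print(f"total: {total_growth_percent(SIZES):.1f}%")
```

The individual files grow by between roughly 16% (db.docs.index) and 32% (db.wordlist), and the four files together by roughly 29%, which is consistent with the estimate in the message.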
Re: htdig: ht3.1.0b1 and PDF
On Sun, 13 Sep 1998, Geoff Hutchison wrote: Date: Sun, 13 Sep 1998 09:08:55 -0400 From: Geoff Hutchison [EMAIL PROTECTED] To: "Joe R. Jah" [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Subject: Re: htdig: ht3.1.0b1 and PDF [snip] This will be in 3.1.0b2. But it's not really supported in 3.1.0b1. If someone wants to beat me to a patch, great! I applied the following patch to PDF.cc: ___ $ diff -c PDF.cc.old PDF.cc *** PDF.cc.old Tue Sep 15 00:40:51 1998 --- PDF.cc Tue Sep 15 00:52:03 1998 *** *** 140,147 return; } ! // Use acroread as a filter to convert to PostScript. ! acroread " -toPostScript " pdfName " /tmp 21"; system(acroread); FILE* psFile = fopen(psName, "r"); --- 140,147 return; } ! // Use pdftops as a filter to convert to PostScript. ! acroread " " pdfName " 21"; system(acroread); FILE* psFile = fopen(psName, "r"); ___ Now when I run htdig I get the following errors: Error (0): PDF file is damaged - attempting to reconstruct xref table... Error: Couldn't find trailer dictionary Error: Couldn't read xref table Is it the patch or some damaged PDF file? Joe _/ _/_/_/ _/ __o _/ _/ _/ _/ __ _-\,_ _/ _/ _/_/_/ _/ _/ ..(_)/ (_) _/_/ oe _/ _/. _/_/ ah[EMAIL PROTECTED] -- To unsubscribe from the htdig mailing list, send a message to [EMAIL PROTECTED] containing the single word "unsubscribe" in the body of the message.
htdig: Re: Virtual memory exceeded in `new'
On Mon, 14 Sep 1998, Geoff Hutchison wrote: Date: Mon, 14 Sep 1998 11:20:28 -0400 From: Geoff Hutchison [EMAIL PROTECTED] To: "Joe R. Jah" [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Subject: Re: Virtual memory exceeded in `new' Is there anything I can do to make htnotify get enough VM? Sure. I'd guess there's a memory leak. So if anyone finds it (I really haven't looked at the code), you won't have to worry about "enough" VM. Thanks to a tip from Theodore Hope from [EMAIL PROTECTED] mailing list, I solved the problem by prepending the htnotify command line in rundig with "unlimit;" Joe _/ _/_/_/ _/ __o _/ _/ _/ _/ __ _-\,_ _/ _/ _/_/_/ _/ _/ ..(_)/ (_) _/_/ oe _/ _/. _/_/ ah[EMAIL PROTECTED] -- To unsubscribe from the htdig mailing list, send a message to [EMAIL PROTECTED] containing the single word "unsubscribe" in the body of the message.
Re: htdig: ht3.1.0b1 and PDF
On Fri, 11 Sep 1998, Geoff Hutchison wrote:

> Date: Fri, 11 Sep 1998 13:28:47 -0400
> From: Geoff Hutchison [EMAIL PROTECTED]
> To: Chris Brown [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]
> Subject: Re: htdig: ht3.1.0b1 and PDF
>
> At 12:51 PM -0400 9/11/98, Chris Brown wrote:
> > I recently installed htdig 3.1b1 and it works fine except now it won't find my acroread to convert .pdf's. I still have the acroread: parameter set in the htdig.conf file as it was in my old one. Is there a new param to set this?
>
> Yup. The param is "pdf_parser" since there was a lot of discussion about using other programs to parse PDF files. I don't think anyone has tested using other programs, but I figured it would be better to name it "pdf_parser" than "acroread" anyway.

I run htdig on a BSDI 3.1 box; acroread does not have a port for it, but I have pdftops. When I set the param up in my htdig.config file as follows:

pdf_parser: /usr/contrib/bin/pdftops

I get the following result:

Usage: pdftops [-f int] [-l int] [-h] [-help] PDF-file [PS-file]
  -f : first page to print
  -l : last page to print
  -h : print usage information
  -help: print usage information
PDF::parse: cannot open acroread output

When I set it up like:

pdf_parser: /usr/contrib/bin/pdftops %src %dest

I get the following result:

PDF::parse: cannot find acroread

What am I doing wrong?

TIA,

Joe
htdig: Excluding directories and duplicate URLs patch
Hi Geoff,

Thank you very much for carrying this great software forward. I compiled/installed ht://Dig 3.1.0b1 a few hours ago on a BSDI 3.1 box. When I ran the rundig script I realized that the sizes of the files in the db directory had increased dramatically, by about 70%. I searched several local file systems and found that I had many duplicate and triplicate indexed files. I immediately checked Retriever.cc and realized that the patch

ftp://sol.ccsf.cc.ca.us/htdig-patches/3.0.8b2/Retriever.cc.0

has not been applied to ht://Dig 3.1.0b1; I applied it manually, recompiled htdig, and reran rundig. My databases shrank to their normal size; no more duplicates;-)

Please include this patch in your next release.

Joe
htdig: Virtual memory exceeded in `new'
Hi Geoff,

I have had this message ever since the first release of ht://Dig. My solution has been to comment out the htnotify line in rundig. I uncommented it today to find out if it would work in 3.1.0b1, but unfortunately I got the same message. I have 96 Megs of RAM and my total swap space is about 370 Megs:

$ pstat -s
Device name  1K-blocks  Type
sd0b             49148  Interleaved
sd1a             61436  Sequential
sd2a             12284  Sequential
sd2b            255996  Sequential
0 (1K-blocks) allocated out of 378864 (1K-blocks) total, 0% in use

Is there anything I can do to make htnotify get enough VM?

TIA,

Joe
htdig: PDF parser
Hi Folks,

Has anyone installed a PDF parser on BSDI? Acrobat Reader has versions for Linux, AIX, Solaris, SunOS, IRIX, HP-UX, and Digital, but none for BSDI;(

I appreciate any pointers.

Joe
Re: htdig: patch site
On Thu, 27 Aug 1998, Gordon Hopper wrote: Date: Thu, 27 Aug 1998 10:09:45 -0600 From: Gordon Hopper [EMAIL PROTECTED] To: [EMAIL PROTECTED] Subject: Re: htdig: patch site [snip] I have a clean version that I downloaded from http://htdig.sdsu.edu/. If there are other places to download files, please let me know where they are. Gordon ftp://sol.ccsf.cc.ca.us/htdig-patches/ Or http://sol.ccsf.cc.ca.us/ftp/htdig-patches/ Joe _/ _/_/_/ _/ __o _/ _/ _/ _/ __ _-\,_ _/ _/ _/_/_/ _/ _/ ..(_)/ (_) _/_/ oe _/ _/. _/_/ ah[EMAIL PROTECTED] -- To unsubscribe from the htdig mailing list, send a message to [EMAIL PROTECTED] containing the single word "unsubscribe" in the body of the message.
Re: htdig: PDF parsing
On Tue, 21 Jul 1998, Colin Viebrock wrote: Date: Tue, 21 Jul 1998 11:42:41 -0400 From: Colin Viebrock [EMAIL PROTECTED] To: [EMAIL PROTECTED] Cc: "[EMAIL PROTECTED]" [EMAIL PROTECTED] Subject: Re: htdig: PDF parsing [snip] send this file to you seperately, so as not to clutter the list. (Is there a location for this file on the htdig site yet?) Don't compile it yet Yes you can put it in ftp://sol.ccsf.cc.ca.us/incoming directory. It is an unreadable, but writable directory. I will place it in ftp://sol.ccsf.cc.ca.us/htdig-patches/3.0.8b2/, the unofficial htdig patches site. Joe _/ _/_/_/ _/ __o _/ _/ _/ _/ __ _-\,_ _/ _/ _/_/_/ _/ _/ ..(_)/ (_) _/_/ oe _/ _/. _/_/ ah[EMAIL PROTECTED] -- To unsubscribe from the htdig mailing list, send a message to [EMAIL PROTECTED] containing the single word "unsubscribe" in the body of the message.
Re: htdig: What am I doing wrong
On Wed, 10 Jun 1998, Peter Burden wrote:

> Date: Wed, 10 Jun 1998 21:33:38 +0100
> From: Peter Burden [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Subject: htdig: What am I doing wrong
>
> Hello, We've been running htdig on a medium site (some 18000 pages) for some time and it's been quite OK (apart from the odd time the database build broke the disc partition). Recent analysis of results has identified one or two problems. Are these configuration issues? Are there patches available?
>
> 1. Duplicate URLs
>
> htdig doesn't seem too good at spotting multiple different URLs pointing to the same page. Host name duplication

You can apply the following patches:

http://sol.ccsf.cc.ca.us/ftp/htdig-patches/3.0.8b1/Docu-def-Retr-Serv.0
http://sol.ccsf.cc.ca.us/ftp/htdig-patches/3.0.8b1/Document.cc.0
http://sol.ccsf.cc.ca.us/ftp/htdig-patches/3.0.8b1/Retriever-def.0
http://sol.ccsf.cc.ca.us/ftp/htdig-patches/3.0.8b2/Retriever.cc.0

Joe
Re: htdig: Patch for b2 for excludes/restricts
On Tue, 12 May 1998, Alex Block wrote:

> Date: Tue, 12 May 1998 15:10:17 -0400
> From: Alex Block [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Subject: htdig: Patch for b2 for excludes/restricts
>
> I understand that there is a patch available for release b2 that addresses issues with respect to inclusion of the "exclude" or "restrict" in search forms? Can anyone advise where to find this patch?

Try: ftp://sol.ccsf.cc.ca.us/htdig-patches/ (FTP clients)
Or:  http://sol.ccsf.cc.ca.us/ftp/htdig-patches/ (Web browsers)

And read the 00INDEX and README files.

Joe
Re: htdig: Dig on One Machine (FreeBSD), Search on Another (Linux) --- Big Problems
On Tue, 21 Apr 1998, Rene' Seindal wrote:

> Date: Tue, 21 Apr 1998 20:36:10 +0200
> From: Rene' Seindal [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]
> Subject: Re: htdig: Dig on One Machine (FreeBSD), Search on Another (Linux) --- Big Problems
>
> [snip]
>
> This might be the problem I fixed about a year ago. Get the patch from ftp://webadm.kb.dk/pub/htdig.patch and ftp://webadm.kb.dk/pub/htdig.patch2
>
> --
> René Seindal

I read the two patches mentioned above. Unfortunately there was no explanation or comment on what they are supposed to achieve. I'd like to place them in ftp://sol.ccsf.cc.ca.us/htdig-patches/3.08b/. If you don't mind them being placed in a central patch site for all interested to use, please add a couple of lines of comment at the top of each patch stating the name of the author and explaining what they are supposed to fix and/or what feature(s) they are supposed to add.

IMHO, it would help everyone who runs htdig to have a central site to look for all old and new patches. If someone has a better idea, please share it with the rest of us.

Please check ftp://sol.ccsf.cc.ca.us/htdig-patches/3.08b/ for patches; if you know of any patches that are not there, please email them, or their present location, to [EMAIL PROTECTED]. Thank you.

Joe
Re: htdig: patch for Retriever.cc
On Fri, 17 Apr 1998, Steve Scott wrote:

> Date: Fri, 17 Apr 1998 08:44:47 -0400 (EDT)
> From: Steve Scott [EMAIL PROTECTED]
> To: "Joe R. Jah" [EMAIL PROTECTED]
> Cc: Steve Scott [EMAIL PROTECTED]
> Subject: Re: htdig: patch for Retriever.cc
>
> Joe, Thanks for the information on the patches. I looked at them and will apply them. When you said that they needed to be applied in order, do you mean apply one, compile, then apply the second one? Can't I just make all the changes at once and then recompile? I am not real familiar with C code, but it looks like a few hours of

Yes, you can make all the changes and then compile.

> cutting and pasting the code into place. Why weren't the 3.07 patches included in the 3.08 code? This does not make sense to me. Are there any other patches that I should apply? I have the 3.08 patches already and I was only going to apply the one that checks the inodes for duplicate references.

I do not know why they weren't included in 3.08b. Unfortunately the line numbers may not match, so you should do it the hard way and patch the changes manually. Some of the memory leak patches have already been applied; you'd see them in the code. There are at least two of Pasi's patches you should apply; they both relate to local file systems.

> On another note, I thought that I read that there is a way for htdig to only dig for pages that have changed since the last dig, and then append this to the database. Do you know anything about this? Or am I mistaken? Currently it takes 6-10 hours for our customer to dig all the web sites that we want. It would be nice to dig out only the pages that have changed.

I refer this question to the list;-)

Joe
Re: htdig: patch for Retriever.cc
On Thu, 16 Apr 1998, Steve Scott wrote: Date: Thu, 16 Apr 1998 16:34:27 -0400 (EDT) From: Steve Scott [EMAIL PROTECTED] To: [EMAIL PROTECTED] Cc: Steve Scott [EMAIL PROTECTED] Subject: htdig: patch for Retriever.cc I tried to apply the patch for Retriever.cc and I get an error when compiling. Here is the error message: Retriever.cc:427: Undefined symbol _IsLocal referenced from text segment make[1]: *** [htdig] Error 1 I saw another person with the same problem who posted a message in March. Is there a fix to the patch? Thank you. Steve Scott This patch requires Pasi Eronen's patches for local home directories for 3.07, which were not carried over to 3.08b: ___ Date: Tue, 5 Aug 1997 16:53:35 +0300 (EEST) From: Pasi Eronen [EMAIL PROTECTED] To: HTDig mailing list [EMAIL PROTECTED] Subject: htdig: Patches for local home directories Hi! A few days ago, I posted patches for local filesystem access to HTDig. Since then, I've received a request for supporting user home directory URLs (like http://www.my.com/~user/foo.html), and made a patch for that, too. The syntax of the new configuration file option is: local_user_urls: prefix1=[path1],dir1 ... If you leave the "path" part out, it looks up the user's home directory in /etc/passwd (or NIS or whatever). For example, to map "http://www.my.org/~joe/foo/bar.html" to "/home/joe/www/foo/bar.html" you would say: local_user_urls: http://www.my.org/=/home/,/www/ The default behaviour of many WWW servers is approximately: local_user_urls: http://www.my.org/=,/public_html/ (NOTE: All the slashes in these examples are REQUIRED!) This patch is a bit large, so I won't post it here. Instead, it's available from http://www.iki.fi/pe/htdig/. I'm not indexing any home directories myself, so comments are very welcome. (Tip: you can see what local filename it's trying to use if you specify options '-v -v' to htdig.) Thanks to Geoff Hutchison for testing this patch.
Best regards, Pasi --- Pasi Eronen [EMAIL PROTECTED], +358-50-5123499 _ Joe
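The mapping that Pasi describes for local_user_urls can be illustrated with a small sketch. This is not the patch's code (the patch itself is C++ inside htdig); it is only a Python rendering of the rule as stated in the message, and the function name map_user_url and the home_lookup parameter are hypothetical names introduced here. It assumes one rule of the form prefix=[path],dir and a URL of the form prefix~user/rest.

```python
import os


def map_user_url(url, rule, home_lookup=None):
    """Map a ~user URL to a local filename, following the rule
    'prefix=[path],dir' as described in the message above.

    Returns None when the URL does not match the rule.
    home_lookup is a hypothetical hook (not part of htdig) that returns
    a user's home directory; by default the system's user database is
    consulted through os.path.expanduser.
    """
    prefix, _, target = rule.partition("=")
    path, _, subdir = target.partition(",")

    # The URL must continue with "~user/" right after the prefix.
    if not url.startswith(prefix + "~"):
        return None
    user, slash, rest = url[len(prefix) + 1:].partition("/")
    if not user or not slash:
        return None

    if path:
        # Explicit path given: the home directory is path + user.
        base = path + user
    else:
        # Path left out: look up the user's home directory.
        if home_lookup is None:
            base = os.path.expanduser("~" + user)
            if base.startswith("~"):
                return None  # unknown user, nothing to map to
        else:
            base = home_lookup(user)

    return base + subdir + rest


if __name__ == "__main__":
    print(map_user_url("http://www.my.org/~joe/foo/bar.html",
                       "http://www.my.org/=/home/,/www/"))
```

With the first rule from the message, the URL for user joe maps to /home/joe/www/foo/bar.html, which matches the example given there. The required slashes matter in this sketch too, because the pieces are simply concatenated.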
Re: htdig: Virtual Memory
On Sun, 12 Apr 1998, System Administrator wrote: Date: Sun, 12 Apr 1998 10:45:52 -0500 (CDT) From: System Administrator [EMAIL PROTECTED] To: [EMAIL PROTECTED] Subject: htdig: Virtual Memory I've been trying to run htdig -a -v -s to index about 5500 sites, but I get the error "Virtual Memory exceeded in `New'". Has anyone had this problem before? How can I fix it? Frank I had the same problem indexing just one site; I have 64 MB of RAM and about 450 MB of swap. Our site has only about 2,000 documents. I solved the problem by commenting out the htnotify line in rundig. Joe