Re: [htdig] scoring questions

2001-01-08 Thread Joe R. Jah

On Mon, 8 Jan 2001, Daniel Naber wrote:

 Date: Mon, 8 Jan 2001 21:52:28 +0100
 From: Daniel Naber [EMAIL PROTECTED]
 To: Jason Meyering [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED]
 Subject: Re: [htdig] scoring questions
 
 On 2001-01-08 21:23, you wrote:
 
  three words.  Is there any way to tell htsearch to automatically score
  pages containing all search words higher than pages that don't have all
  words?  
 
 Yes, there's a one-line patch in the contributions section. I think I 
 called it multiple-boost or so.

ftp://ftp.ccsf.org/htdig-patches/3.1.5/multiboost.1
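
The patch itself changes htsearch's source and is not reproduced here. Purely as an illustration of the idea being asked about (ranking pages that contain every search word above pages that contain only some of them), here is a minimal sketch in Python. The function name, the per-word score dictionaries, and the boost factor of 2.0 are all assumptions made for this example; none of them come from htsearch or from the multiboost patch.

```python
def rank(docs, query_words, boost=2.0):
    """Rank documents, boosting those that contain every query word.

    docs maps a document name to {word: score}; this layout is hypothetical.
    """
    wanted = {w.lower() for w in query_words}
    ranked = []
    for name, word_scores in docs.items():
        matched = wanted & set(word_scores)
        if not matched:
            continue  # document contains none of the search words
        score = sum(word_scores[w] for w in matched)
        if matched == wanted:
            score *= boost  # all search words present: apply the boost
        ranked.append((name, score))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


docs = {
    "a.html": {"red": 5.0},
    "b.html": {"red": 1.0, "green": 1.0, "blue": 1.0},
}
for name, score in rank(docs, ["red", "green", "blue"]):
    print(name, score)
```

With the boost, b.html (which has all three words) outranks a.html even though a.html scores higher on a single word; with boost=1.0 the order reverses.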

Regards,

Joe
-- 
 _/   _/_/_/   _/  __o
 _/   _/   _/  _/ __ _-\,_
 _/  _/   _/_/_/   _/  _/ ..(_)/ (_)
  _/_/ oe _/   _/.  _/_/ ah[EMAIL PROTECTED]



To unsubscribe from the htdig mailing list, send a message to
[EMAIL PROTECTED]
You will receive a message to confirm this.
List archives:  http://www.htdig.org/mail/menu.html
FAQ:            http://www.htdig.org/FAQ.html




Re: [htdig] problems building htdig on cygwin

2000-11-26 Thread Joe R. Jah

On Sun, 26 Nov 2000, Gerrit P. Haase wrote:

 Date: Sun, 26 Nov 2000 18:06:01 +0100
 From: "Gerrit P. Haase" [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Subject: [htdig] problems building htdig on cygwin
 
 Hi there,
 
 I ran into problems building htdig on cygwin.
 First, some tabs were missing in the makefile, which is dubious.
 Second, a header is missing: nl_types.h. Does anyone know what is
 missing on my system? Where is this header included?
 Nothing in the requirements mentions anything I don't have.

I had the same problem building htdig on BSDI.  nl_types.h was included in
later snapshots of htdig-3.2.0b3; however, that header file requires
another header file, features.h, which is missing on my system;(

I have extracted features.h from the GNU glibc distribution and placed it in
my htlib directory.  Unfortunately, features.h requires yet another
header file, stubs.h, which is neither in the BSDI system nor in GNU glibc;(

There is a chance that stubs.h would also require other header file(s),
and those files require yet other files ... ad infinitum;( ;)))

In short, don't hold your breath.
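
One way to see how deep such a chain goes before copying headers by hand is to follow the #include lines and list whatever cannot be found. The sketch below is a hypothetical helper, not part of htdig: it takes the header contents as an in-memory dictionary (an assumption that keeps the example self-contained) and the one-line header bodies are invented for illustration.

```python
import re

# Matches lines such as:  #include <features.h>   or   #include "stubs.h"
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]', re.MULTILINE)


def missing_headers(start, available):
    """Return the set of headers reachable from `start` that are not available.

    `available` maps a header name to its text (hypothetical layout).
    """
    missing, seen, stack = set(), set(), [start]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        if name not in available:
            missing.add(name)
            continue
        # Queue every header this one includes
        stack.extend(INCLUDE_RE.findall(available[name]))
    return missing


available = {
    "nl_types.h": "#include <features.h>\n",
    "features.h": "#include <stubs.h>\n",
}
print(missing_headers("nl_types.h", available))
```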

Regards,

Joe




Re: [htdig] ssl patch

2000-11-16 Thread Joe R. Jah

On Thu, 16 Nov 2000, Jeremy Lyon wrote:

 Date: Thu, 16 Nov 2000 15:15:07 -0700
 From: Jeremy Lyon [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Subject: [htdig] ssl patch
 
 Hi
 
 I just tried to patch htdig 3.1.5 with the ssl patch
 ftp://ftp.ccsf.org/htdig-patches/3.1.5/ssl.2 to a clean htdig.  I got
 these errors
 
 root:/tmp/work/htdig-3.1.5# patch -p1 < ../ssl.2

Obviously you did not look at the patch;(  The first lines read:

# Tabs in this patch have been converted to spaces;(  In order to apply the
# patch to a clean htdig-3.1.5 please use the -l switch:
#
#gunzip -c htdig-3.1.5.tar.gz | tar xf -
#cd htdig-3.1.5
#patch -p1 -l < /path/to/ssl.2
           ^^
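
To illustrate why the -l switch helps here: GNU patch documents -l (--ignore-whitespace) as matching loosely, so that any run of blanks in the patch matches any run of blanks in the file. The Python sketch below imitates that comparison for a single context line; it is an approximation written for this note, not patch's actual code, and the sample source line is invented.

```python
import re


def loose(line):
    """Collapse each run of spaces/tabs to one space and drop trailing blanks."""
    return re.sub(r"[ \t]+", " ", line).rstrip(" ")


def context_matches(patch_line, file_line, ignore_whitespace=False):
    """Compare a patch context line with the corresponding file line."""
    if ignore_whitespace:
        return loose(patch_line) == loose(file_line)
    return patch_line == file_line


file_line = "\tint port = 80;"          # source file still has a tab
patch_line = "        int port = 80;"   # mailed patch had the tab expanded

print(context_matches(patch_line, file_line))                          # strict
print(context_matches(patch_line, file_line, ignore_whitespace=True))  # like -l
```

The strict comparison fails because a tab is not eight spaces; the loose one succeeds, which is the behaviour the patch header asks for.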

That ensures that the patch applies, but it does not guarantee that the
package will compile; Michael Arndt [EMAIL PROTECTED] has testified
to that;)  If it does not compile on your system, you might want to
contact the author of the patch.  You can find that information in the
patch as well:
_
From [EMAIL PROTECTED] Sun Oct 29 11:11:32 2000
Date: Sun, 29 Oct 2000 15:24:01 -0500
From: Will Ballantyne [EMAIL PROTECTED]
To: "J. op den Brouw" [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: Re: [htdig] ssl patches for 3.1.5
_


Regards,

Joe




Re: [htdig] ssl Patch for htdig

2000-11-15 Thread Joe R. Jah

On Wed, 15 Nov 2000, Michael Arndt wrote:

 Date: Wed, 15 Nov 2000 13:56:05 +0100
 From: Michael Arndt [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Subject: [htdig] ssl Patch for htdig
 
 Hello,
 
 I need an SSL version of htdig. In the archives I found a thread
 about an SSL patch
 for htdig.
 The only patch file I found is:
 ftp://ftp.ccsf.org/htdig-patches/3.1.5/ssl.1
 
 and that does not apply against a "clean htdig".
 The only remedy would be applying all the patches manually.
 
 Is there anyone out there who has done this already?
 Or someone who can point me to a patch that applies against
 a clean htdig, or send me patched sources?

It was reported that the older patch:

ftp://ftp.ccsf.org/htdig-patches/3.1.5/0ld/ssl.0

applies to a "clean htdig-3.1.5" with the -l switch: 

cd /path/to/htdig-3.1.5/
patch -p1 -l < /path/to/ssl.0

Regards,

Joe




Re: [htdig] ssl Patch for htdig

2000-11-15 Thread Joe R. Jah

On Wed, 15 Nov 2000, Joe R. Jah wrote:

 Date: Wed, 15 Nov 2000 10:26:20 -0800 (PST)
 From: "Joe R. Jah" [EMAIL PROTECTED]
 To: Michael Arndt [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED]
 Subject: Re: [htdig] ssl Patch for htdig
 
 On Wed, 15 Nov 2000, Michael Arndt wrote:
 
  Date: Wed, 15 Nov 2000 13:56:05 +0100
  From: Michael Arndt [EMAIL PROTECTED]
  To: [EMAIL PROTECTED]
  Subject: [htdig] ssl Patch for htdig
  
  Hello,
  
  I need an SSL version of htdig. In the archives I found a thread
  about an SSL patch
  for htdig.
  The only patch file I found is:
  ftp://ftp.ccsf.org/htdig-patches/3.1.5/ssl.1
  
  and that does not apply against a "clean htdig".
  The only remedy would be applying all the patches manually.
  
  Is there anyone out there who has done this already?
  Or someone who can point me to a patch that applies against
  a clean htdig, or send me patched sources?
 
 It was reported that the older patch:
 
   ftp://ftp.ccsf.org/htdig-patches/3.1.5/0ld/ssl.0
 
 applies to a "clean htdig-3.1.5" with the -l switch: 
 
   cd /path/to/htdig-3.1.5/
   patch -p1 -l < /path/to/ssl.0

OK, I downloaded htdig-3.1.5.tar.gz, since my own htdig has been patched and
re-patched;)  I tested both versions of the patch, and found that ssl.1
does not apply, but ssl.0, the old patch, applies with the -l switch.  I added
the following lines to the beginning of the patch and placed it in the
archives as: 

ftp://ftp.ccsf.org/htdig-patches/3.1.5/ssl.2
_
Tabs in this patch have been converted to spaces;(  In order to apply the
patch to a clean htdig-3.1.5 please use the -l switch: 

gunzip -c htdig-3.1.5.tar.gz | tar xf -
cd htdig-3.1.5
patch -p1 -l < /path/to/ssl.2
_


Regards,

Joe




Re: [htdig] ssl Patch for htdig

2000-11-15 Thread Joe R. Jah

On Wed, 15 Nov 2000, Joshua Gerth wrote:

 Date: Wed, 15 Nov 2000 13:38:29 -0800 (PST)
 From: Joshua Gerth [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Subject: Re: [htdig] ssl Patch for htdig
 
 
 Speaking of ssl patches.  I also downloaded 3.1.5 and patched it with the
 ssl.0 patch and the -l flag.  However, I then ran into the additional
 problem that urls of the form:
   https://myserver.com
 
 were being directed to port 80, and that only urls of the form:
   https://myserver.com:443
 
 were actually going to the encrypted port.  So I hacked my copy so that
 any url which starts with 
   https
 
 goes to port 443 by default but 'http' still goes to 80 by default.  Of
 course, both can still be overridden by using the :port on the url.
 
 Did anyone else hit this?  Would this patch be useful to anyone?  If so
 I'll try to post it assuming I have the rights to do so.
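
The behaviour described above (a per-scheme default port that an explicit :port in the URL overrides) can be sketched in a few lines of Python. This is only an illustration of the rule, not the change made to htdig's URL code; the function name is made up, and it uses 443, the standard HTTPS port.

```python
from urllib.parse import urlsplit

# Default ports per scheme; 443 is the standard port for https.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url):
    """Return the explicit port if the URL has one, else the scheme default."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port  # an explicit :port always wins
    return DEFAULT_PORTS[parts.scheme]


for url in ("http://myserver.com",
            "https://myserver.com",
            "https://myserver.com:8443"):
    print(url, effective_port(url))
```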

I only tested to see if the patch applies to a clean 3.1.5.  I am sure,
however, that your patch will be useful to someone;)  Go ahead and post it
to the list, or just upload it to: 

ftp://ftp.ccsf.org/incoming/

P.S.  It would be nice to document your patch; save potential users the
  guesswork and digging up relevant information in the list archives;)

Regards,

Joe




Re: [htdig] Re: SSL patch for ht://Dig 3.1.5

2000-11-03 Thread Joe R. Jah

On Fri, 3 Nov 2000, Will Ballantyne wrote:

 Date: Fri, 03 Nov 2000 18:46:19 -0500
 From: Will Ballantyne [EMAIL PROTECTED]
 To: "Joe R. Jah" [EMAIL PROTECTED]
 Cc: "Brian W. Spolarich" [EMAIL PROTECTED],
 "J. op den Brouw" [EMAIL PROTECTED], [EMAIL PROTECTED]
 Subject: Re: [htdig] Re: SSL patch for ht://Dig 3.1.5
 
 ok, I have uploaded the patch...

I have moved it to:

ftp://ftp.ccsf.org/htdig-patches/3.1.5/ssl.1

However, I have not tested it.  I searched for tabs in it, though, and
found several;)
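
A check of this kind is easy to script. The sketch below is a hypothetical helper (not the command actually used) that counts how many lines of a patch still contain a tab character; the sample patch text is invented for the example.

```python
def tab_report(patch_text):
    """Return (lines_containing_a_tab, total_lines) for the text of a patch."""
    lines = patch_text.splitlines()
    return sum(1 for line in lines if "\t" in line), len(lines)


sample = (
    "--- a/Makefile\n"
    "+++ b/Makefile\n"
    "@@ -1,2 +1,2 @@\n"
    " all:\n"
    "-\tcc foo.c\n"
    "+\tcc -O2 foo.c\n"
)
with_tabs, total = tab_report(sample)
print(f"{with_tabs} of {total} lines contain tabs")
```

A count of zero for a patch against tab-indented sources would suggest that the tabs were converted somewhere along the way.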

Regards,

Joe


 "Joe R. Jah" wrote:
 
  On Thu, 2 Nov 2000 [EMAIL PROTECTED] wrote:
 
   Date: Thu, 2 Nov 2000 09:32:32 -0800
   From: [EMAIL PROTECTED]
   To: "Brian W. Spolarich" [EMAIL PROTECTED]
   Cc: "J. op den Brouw" [EMAIL PROTECTED], Will Ballantyne [EMAIL PROTECTED],
   [EMAIL PROTECTED], "Joe R. Jah" [EMAIL PROTECTED]
   Subject: Re: [htdig] Re: SSL patch for ht://Dig 3.1.5
  
   it looks like the process of mailing the patch converted my tabs to spaces
  (note you should be able to use "patch -l" to ignore whitespace issues).
  I am unsure what added those spaces.  I am not a regular contributor.  If
  someone lets me know where I can ftp the patch to avoid the conversion
  please let me know and I will try to do it this weekend (I am working at
  a secured site and cannot access the command line for my home server from here).
 
  You can upload it to:
 
  ftp://ftp.ccsf.org/incoming/
 
  Note that you won't be able to list directory in there, but you should be
  able to upload files.
 
  Regards,
 
  Joe
 
  At Wed, 1 Nov 2000 21:51:28 + (GMT) , "Brian W. Spolarich" [EMAIL PROTECTED] 
wrote:
 
  On Wed, 1 Nov 2000, Joe R. Jah wrote:
  
  | Any way I moved the patch to
  | ftp://ftp.ccsf.org/htdig-patches/3.1.5/0ld/ssl.0 because it obviously
  | does not apply correctly.
  
Bless you. :-)
  
-bws
  
  --
  Brian W. Spolarich - Manager, Network Systems - WALID, Inc. - [EMAIL PROTECTED]
Welcome to the Real World.  - http://www.walid.com/







Re: [htdig] Re: SSL patch for ht://Dig 3.1.5

2000-11-02 Thread Joe R. Jah

On Thu, 2 Nov 2000 [EMAIL PROTECTED] wrote:

 Date: Thu, 2 Nov 2000 09:32:32 -0800
 From: [EMAIL PROTECTED]
 To: "Brian W. Spolarich" [EMAIL PROTECTED]
 Cc: "J. op den Brouw" [EMAIL PROTECTED], Will Ballantyne [EMAIL PROTECTED],
 [EMAIL PROTECTED], "Joe R. Jah" [EMAIL PROTECTED]
 Subject: Re: [htdig] Re: SSL patch for ht://Dig 3.1.5
 
 it looks like the process of mailing the patch converted my tabs to spaces  
(note you should be able to use "patch -l" to ignore whitespace issues). 
I am unsure what added those spaces.  I am not a regular contributor.  If 
someone lets me know where I can ftp the patch to avoid the conversion
please let me know and I will try to do it this weekend (I am working at
a secured site and cannot access the command line for my home server from here).

You can upload it to:

ftp://ftp.ccsf.org/incoming/

Note that you won't be able to list directory in there, but you should be
able to upload files.

Regards,

Joe

At Wed, 1 Nov 2000 21:51:28 + (GMT) , "Brian W. Spolarich" [EMAIL PROTECTED] 
wrote: 

On Wed, 1 Nov 2000, Joe R. Jah wrote:

| Any way I moved the patch to
| ftp://ftp.ccsf.org/htdig-patches/3.1.5/0ld/ssl.0 because it obviously
| does not apply correctly.

  Bless you. :-)

  -bws

-- 
Brian W. Spolarich - Manager, Network Systems - WALID, Inc. - [EMAIL PROTECTED]
  Welcome to the Real World.  - http://www.walid.com/









Re: [htdig] Re: SSL patch for ht://Dig 3.1.5

2000-11-01 Thread Joe R. Jah

On Wed, 1 Nov 2000, Brian W. Spolarich wrote:

 Date: Wed, 1 Nov 2000 11:36:16 + (GMT)
 From: "Brian W. Spolarich" [EMAIL PROTECTED]
 To: "J. op den Brouw" [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED]
 Subject: Re: [htdig] Re: SSL patch for ht://Dig 3.1.5
 
 
   Did not! :-)  I saved directly from the patch archive, which was no mean
 feat given the fact that you have to construct the patch URL by hand.

No, of course you didn't; it was originally posted to the list with tabs
already converted.  I am sure Jesse used "you" as a general pronoun; he
didn't mean to say that _you_, Brian, had copied it off the screen.  Anyway,
I moved the patch to ftp://ftp.ccsf.org/htdig-patches/3.1.5/0ld/ssl.0
because it obviously does not apply correctly.

You can apply it manually, but it would be a lot easier if Will reposted
the patch correctly.  That would be greatly appreciated by all the folks who
wish to use it.

Regards,

Joe

 On Wed, 1 Nov 2000, J. op den Brouw wrote:
 
 | I think that's what happens when you copy off the screen ;-)
 | 
 | "Brian W. Spolarich" wrote:
 |  
 |  On Tue, 31 Oct 2000, Joe R. Jah wrote:
 |  
 |  | I am forwarding your message to the patch author and htdig users
 |  | mailing list, to which the patch was originally posted.  Maintainer of
 |  | the patch site does not necessarily know why a patch fails; however, I
 |  | have a pretty good idea in this case.  All tab characters in the patch
 |  | have been converted to spaces;( I checked the original mailing from
 |  | Will; the tabs were converted there already.
 | 
 | --Jesse
 | 
 
 -- 
 Brian W. Spolarich - Manager, Network Systems - WALID, Inc. - [EMAIL PROTECTED]
   Welcome to the Real World.  - http://www.walid.com/
 
 
 




[htdig] SSL patch for ht://Dig 3.1.5

2000-10-31 Thread Joe R. Jah

Hi Brian,

I am forwarding your message to the patch author and the htdig users mailing
list, to which the patch was originally posted.  The maintainer of the patch
site does not necessarily know why a patch fails; however, I have a pretty
good idea in this case.  All tab characters in the patch have been
converted to spaces;(  I checked the original mailing from Will; the tabs
were already converted there.

Regards,

Joe

-- Forwarded message --
Date: Tue, 31 Oct 2000 14:32:36 + (GMT)
From: "Brian W. Spolarich" [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: SSL patch for ht://Dig 3.1.5


  I downloaded ht://Dig 3.1.5 from the htdig.org website and the SSL
patch from:

  ftp://sol.ccsf.cc.ca.us/htdig-patches/3.1.5/ssl.0

  I attempt to run 'patch' using the supplied patchfile and all of the
patches fail.  Am I missing something stupid and obvious?

  -bws

admin1% tar zxf htdig-3.1.5.tar.gz 
admin1% ls  
htdig-3.1.5  htdig-3.1.5.tar.gz  ssl.0
admin1% patch -p0 < ssl.0 
patching file `htdig-3.1.5/CONFIG'
patching file `htdig-3.1.5/Makefile.config.in'
Hunk #1 FAILED at 24.
1 out of 1 hunk FAILED -- saving rejects to
htdig-3.1.5/Makefile.config.in.rej
patching file `htdig-3.1.5/htcommon/DocumentDB.cc'
Hunk #1 FAILED at 217.
Hunk #2 FAILED at 284.
2 out of 2 hunks FAILED -- saving rejects to
htdig-3.1.5/htcommon/DocumentDB.cc.rej
patching file `htdig-3.1.5/htcommon/defaults.cc'
Hunk #1 FAILED at 38.
1 out of 1 hunk FAILED -- saving rejects to
htdig-3.1.5/htcommon/defaults.cc.rej
patching file `htdig-3.1.5/htdig/Document.cc'
Hunk #1 FAILED at 220.
Hunk #2 FAILED at 332.
2 out of 2 hunks FAILED -- saving rejects to
htdig-3.1.5/htdig/Document.cc.rej
patching file `htdig-3.1.5/htdig/Images.cc'
Hunk #1 FAILED at 61.
Hunk #2 FAILED at 81.
2 out of 2 hunks FAILED -- saving rejects to
htdig-3.1.5/htdig/Images.cc.rej
patching file `htdig-3.1.5/htdig/Retriever.cc'
Hunk #2 FAILED at 132.
Hunk #3 FAILED at 668.
Hunk #4 FAILED at 1232.
Hunk #5 FAILED at 1365.
4 out of 5 hunks FAILED -- saving rejects to
htdig-3.1.5/htdig/Retriever.cc.rej
patching file `htdig-3.1.5/htdig/Server.cc'
Hunk #1 succeeded at 20 with fuzz 1.
Hunk #2 FAILED at 40.
1 out of 2 hunks FAILED -- saving rejects to
htdig-3.1.5/htdig/Server.cc.rej
patching file `htdig-3.1.5/htdig/Server.h'
Hunk #1 FAILED at 26.
1 out of 1 hunk FAILED -- saving rejects to htdig-3.1.5/htdig/Server.h.rej
patching file `htdig-3.1.5/htlib/Connection.cc'
Hunk #1 FAILED at 39.
Hunk #4 FAILED at 119.
Hunk #5 FAILED at 174.
Hunk #7 FAILED at 281.
Hunk #9 FAILED at 469.
5 out of 9 hunks FAILED -- saving rejects to
htdig-3.1.5/htlib/Connection.cc.rej
patching file `htdig-3.1.5/htlib/Connection.h'
Hunk #2 succeeded at 53 with fuzz 1.
Hunk #3 succeeded at 73 with fuzz 2.
Hunk #4 FAILED at 102.
1 out of 4 hunks FAILED -- saving rejects to
htdig-3.1.5/htlib/Connection.h.rej
patching file `htdig-3.1.5/htlib/URL.cc'
Hunk #1 FAILED at 130.
Hunk #2 FAILED at 223.
Hunk #3 FAILED at 492.
Hunk #4 FAILED at 549.
4 out of 4 hunks FAILED -- saving rejects to htdig-3.1.5/htlib/URL.cc.rej
patching file `htdig-3.1.5/htlib/URL.h'
Hunk #1 FAILED at 48.
1 out of 1 hunk FAILED -- saving rejects to htdig-3.1.5/htlib/URL.h.rej
 
-- 
Brian W. Spolarich - Manager, Network Systems - WALID, Inc. - [EMAIL PROTECTED]
  Welcome to the Real World.  - http://www.walid.com/
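
A long transcript like the one above is easier to read once it is reduced to one line per file. The following Python sketch summarizes failed hunks from patch output in the format shown in this message; the function name is hypothetical and the parsing assumes exactly that output format (file names quoted with a backtick and an apostrophe, and an "N out of M hunks FAILED" summary line).

```python
import re

FILE_RE = re.compile(r"^patching file `(.+)'$")
FAILED_RE = re.compile(r"^(\d+) out of (\d+) hunks? FAILED")


def failed_hunks(log_text):
    """Map each patched file to (failed, total); (0, None) if nothing failed."""
    result = {}
    current = None
    for line in log_text.splitlines():
        line = line.strip()
        m = FILE_RE.match(line)
        if m:
            current = m.group(1)
            result[current] = (0, None)  # no failure reported so far
            continue
        m = FAILED_RE.match(line)
        if m and current is not None:
            result[current] = (int(m.group(1)), int(m.group(2)))
    return result


log = """patching file `htdig-3.1.5/CONFIG'
patching file `htdig-3.1.5/Makefile.config.in'
Hunk #1 FAILED at 24.
1 out of 1 hunk FAILED -- saving rejects to
htdig-3.1.5/Makefile.config.in.rej
patching file `htdig-3.1.5/htdig/Retriever.cc'
Hunk #2 FAILED at 132.
4 out of 5 hunks FAILED -- saving rejects to
htdig-3.1.5/htdig/Retriever.cc.rej
"""
for name, (failed, total) in failed_hunks(log).items():
    if failed:
        print(f"{name}: {failed} of {total} hunks failed")
```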







[htdig] Compiling htdig 3.20b2 on BSDI 4.01

2000-10-08 Thread Joe R. Jah

Hi Folks,

Has anyone made htdig 3.2.x compile under BSDI 4.01?

Here is my latest unsuccessful stab at compiling 3.2.0b2 on BSDI with the
following flags and results: 

CFLAGS="-O2"
CXXFLAGS="-O2"
CPPFLAGS="-I/usr/include/g++"
CXX="/usr/bin/g++"
export CFLAGS CXXFLAGS CPPFLAGS CXX
./configure

[...]

gmake
Making all in db
gmake[1]: Entering directory `/tmp/htdig-3.2.0b2/db'
cd dist ; if [ -f Makefile ] ; then gmake PACKAGE=htdig all ; fi
gmake[2]: Entering directory `/tmp/htdig-3.2.0b2/db/dist'
gcc -c -O2 -Wall -I. -I./../include -I/usr/include/g++
-I/usr/include/g++/std -I/usr/local/include -I/usr/local/include
../btree/btc
In file included from /usr/include/g++/std/stddef.h:11,
 from /usr/include/g++/std/bastring.h:35,
 from /usr/include/g++/std/string.h:6,
 from ../btree/bt_compare.c:56:
/usr/include/g++/_G_config.h:39: syntax error before `_G_ptrdiff_t'
/usr/include/g++/_G_config.h:39: warning: data definition has no type or
storage class
/usr/include/g++/_G_config.h:46: syntax error before `_G_wchar_t'
/usr/include/g++/_G_config.h:46: warning: data definition has no type or
storage class
In file included from /usr/include/g++/std/bastring.h:35,
 from /usr/include/g++/std/string.h:6,
 from ../btree/bt_compare.c:56:
/usr/include/g++/std/stddef.h:14: syntax error before string constant
/usr/include/g++/std/stddef.h:23: syntax error before `}'
gmake[2]: *** [bt_compare.o] Error 1

As always, any pointers are greatly appreciated.

Regards,

Joe




Re: [htdig] is there a problem with the documentation?

2000-08-04 Thread Joe R. Jah

On Fri, 4 Aug 2000, Gilles Detillieux wrote:

 Date: Fri, 4 Aug 2000 10:28:07 -0500 (CDT)
 From: Gilles Detillieux [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Subject: [htdig] is there a problem with the documentation?
 
 Hi, folks.  Geoff and I have noticed recently that there are a LOT of
 questions being asked on this list that are readily answered in the FAQ
 and/or in the Attributes documentation, both available on the web site.
 We'd really like to cut down on this traffic, but we need your help.
 
 If you've been RTFM'ed recently - now be honest because we know there are
 a lot of you out there who have - what can we do to make it easier to
 find answers on the web site rather than using the mailing list as a
 first recourse?

It may help to include that info in the message footer for those who are
used to the mailing list as the first recourse;)
  _ 
  To unsubscribe from the htdig mailing list, send a message to
  [EMAIL PROTECTED] 
  You will receive a message to confirm this. 
  List archives:http://www.htdig.org/mail/menu.html
  FAQ:  http://www.htdig.org/FAQ.html
  _

Regards,

Joe




[htdig] (3.1.5) Missing [Next] in the 10th page

2000-05-26 Thread Joe R. Jah

Hi Folks,

When I search for keywords that result in many matches I observe a
situation like when I search for "htdig" in www.htdig.org;)

 Documents 1 - 10 of 15666 matches. More *'s indicate a better match.

 At the bottom of the page there are icons 1 to 10 and a right arrow
 [Next]

I continue to the next page and see:

 Documents 11 - 20 of 15666 matches. More *'s indicate a better match.

 At the bottom of the page there are a left arrow [Previous], icons 1 to
 10, and a right arrow [Next]

...

When I reach the tenth page I see:

 Documents 91 - 100 of 15666 matches. More *'s indicate a better match.

 At the bottom of the page there are a left arrow [Previous], icons 1 to
 10.

The icon that points to the next page disappears;(

I won't be able to see any more pages unless I increase the number of
available icons in the config file.  I can, of course, increase the number
of matches per page to see more results in the same number of pages, but
that just shifts the problem farther out; it does not make it disappear.

Is it possible to have the [Next] icon appear on the page of the last
numbered icon and, beyond that, to show perhaps only the [Previous] icon,
the [Next] icon, and possibly some other icon indicating that one has
passed beyond 10; like so:

 Left arrow [Previous], Beyond Last icon, and right arrow [Next]
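
The requested behaviour can be written down as a small function. This is a sketch of the proposal only, not of how htsearch builds its page list; the function name, the parameter names, and the "[Beyond N]" label are assumptions made for the example.

```python
import math


def nav_links(total_matches, per_page, current_page, max_icons=10):
    """Build the navigation row for one result page under the proposed rule."""
    last_page = math.ceil(total_matches / per_page)
    links = []
    if current_page > 1:
        links.append("[Previous]")
    if current_page <= max_icons:
        # Within the numbered range: show the page icons as today
        links.extend(str(n) for n in range(1, min(last_page, max_icons) + 1))
    else:
        # Past the last numbered icon: show a single placeholder instead
        links.append("[Beyond %d]" % max_icons)
    if current_page < last_page:
        links.append("[Next]")  # shown whenever more results remain
    return links


print(nav_links(15666, 10, 10))
print(nav_links(15666, 10, 11))
```

On page 10 the row still ends with [Next], and on page 11 it reduces to [Previous], the placeholder, and [Next].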

I appreciate any pointers.

Regards,

Joe




Re: [htdig] Re: 3.2.0b2 - problem with either no stars, or infinite loop writing out (PR#846)

2000-05-15 Thread Joe R. Jah

On Mon, 15 May 2000, Gilles Detillieux wrote:

 Date: Mon, 15 May 2000 14:12:00 -0500 (CDT)
 From: Gilles Detillieux [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]
 Subject: [htdig] Re: 3.2.0b2 - problem with either no stars, or infinite loop 
writing out (PR#846)
 
 Yes, Terry Luedtke reported this problem and posted a patch for it, to
 [EMAIL PROTECTED], back on May 3rd.  It didn't seem to make it into
 Joe's patch archive, so I'll repost it here for those who missed it.
 The patch fixes a few bugs in the score calculation which cause the
 problems in star generation.

I do not know how I missed it, but it is now in there:

ftp://ftp.ccsf.org/htdig-patches/3.2.0b2/noStars.0

Regards,

Joe




Re: [htdig] problems with the accent patch

2000-03-02 Thread Joe R. Jah

On Thu, 2 Mar 2000, Eric van der Vlist wrote:

 Date: Thu, 02 Mar 2000 22:12:34 +0100
 From: Eric van der Vlist [EMAIL PROTECTED]
 To: Gilles Detillieux [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED], [EMAIL PROTECTED],
 [EMAIL PROTECTED]
 Subject: Re: [htdig] problems with the "accent" patch
 
 Hi,
 
 I have applied this patch as well and noticed that it's working for most
 of the words, but not for others...
 
 Looking at the output of "htfuzzy -vv accents", I have noticed that all
 the words are truncated to 12 characters and that the words which are
 truncated are those for which there is a problem.
 
 For instance searching for "enchere" (not truncated) will return the
 matches for the correctly spelled word (with &egrave;) while searching
 for "specification" truncated to "specificatio" will not match
 specification with an &eacute;.
 
 If I search for "specificatio", I do get the matches for the
 accented word...
 
 I am trying to find where this truncation happens, but if anyone more
 familiar with the code can shed some light, it would help !

Set the maximum_word_length attribute in your htdig.conf file.  Its
default is 12.
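
The effect of that limit on the two words from the report can be shown with a few lines of Python. This only imitates the truncation for illustration; the function is hypothetical and is not htdig code, and 32 is an arbitrary example of a larger limit.

```python
def truncate(word, max_len=12):
    """Cut a word to at most max_len characters, as a length limit would."""
    return word[:max_len]


for word in ("enchere", "specification"):
    print(word, "->", truncate(word), "| with limit 32:", truncate(word, 32))
```

"enchere" (7 letters) is unaffected, while "specification" (13 letters) loses its final letter at the default limit. In htdig.conf the attribute would be written in the usual name: value form, for example maximum_word_length: 32; presumably the databases have to be rebuilt afterwards for the new length to take effect.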

Regards,

Joe



[htdig] Someone's forging email addresses of htdig members

1999-11-16 Thread Joe R. Jah


On Tue, 16 Nov 1999 [EMAIL PROTECTED] wrote:

 Date: Tue, 16 Nov 1999 21:05:40 +1100
 From: [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Subject: [htdig] hey wassup HtDIG ;)
 
 
 here is the site you wanted... XXX it's the
 one that gives you free membership access (all hacked) to abotu 300
 membership based sex sites. k bye...  ps: why r u using htdig.org now?
 it doens't make sence, anyway *bye*... 
 
 
 To unsubscribe from the htdig mailing list, send a message to
 [EMAIL PROTECTED] containing the single word unsubscribe in
 the SUBJECT of the message.

I don't believe Geoff sent that message.  I received another forged
message purporting to come from [EMAIL PROTECTED]  I also
received an undeliverable message, which _I_ supposedly had sent; meaning
they have forged my address too;(

Any ideas?

Regards,

Joe




To unsubscribe from the htdig mailing list, send a message to
[EMAIL PROTECTED] containing the single word unsubscribe in
the SUBJECT of the message.



Re: [htdig] htsearch on BSDI 4.0.1

1999-10-27 Thread Joe R. Jah


On Thu, 28 Oct 1999, Markus Mohr wrote:

 Date: Thu, 28 Oct 1999 00:56:23 +0200
 From: Markus Mohr [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Subject: [htdig] htsearch on BSDI 4.0.1
 
 
 Hi!
 
 I've compiled and configured htdig 3.1.3 on BSDI. The dig ran fine, but
 htsearch simply segfaults.
 I've got a htsearch.core, but what can I do now?

 . Make clean.
 . Remove references to regex.o from htlib/Makefile.
 . Remove htlib/regex.h.
 . Remove references to htlib/regex.h in htfuzzy/Makefile, if you have
   done make depend.
 . Make.
 . Voila;)

Regards,

Joe



Re: [htdig] rundig locks

1999-09-27 Thread Joe R. Jah


On Mon, 27 Sep 1999, Andy Malato wrote:

 Date: Mon, 27 Sep 1999 09:38:22 -0400
 From: Andy Malato [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Subject: [htdig] rundig locks 
 
 
 Hello, I've just compiled and installed htdig 3.1.3 on my BSDI 3.1
 machine, and when I attempt to run rundig to create the sample search,
 nothing happens; I then have to use Ctrl-C to break out of it.  Did I
 do something wrong?

Search the archives at http://www.mail-archive.com/htdig%40htdig.org/ for
"Memory fault (core dumped)" and/or "Segmentation Fault V3.1.2"

Regards,

Joe



Re: [htdig] Scripts that use GDBM_File

1999-09-26 Thread Joe R. Jah


On Sun, 26 Sep 1999 [EMAIL PROTECTED] wrote:

 Date: Sun, 26 Sep 1999 14:14:13 +0200 (MEST)
 From: [EMAIL PROTECTED]
 To: "Joe R. Jah" [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED]
 Subject: [htdig] Scripts that use GDBM_File
 
 Joe R. Jah writes:
   
   Hi Folks,
   
   I have been trying to take advantage of some of the scripts in contrib
   folder; yesterday I noticed that five of them, changehost/changehost.pl,
   doclist/doclist.pl, doclist/listafter.pl, urlindex/urlindex.pl, and
   wordfreq/wordfreq.pl, have an identical problem: they all die on the
   following line:
   
   tie(%docdb, GDBM_File, $dbfile, GDBM_READER, 0) || die ...
   
 
  Well, since htdig uses Berkeley DB files and not GDBM files you could
 just change the tie call to DB_File. I'm not sure if the internal structure
 of the data stored in the file is still compatible, though :-}

Thanks Loic; I replaced GDBM_File with DB_File in the tie call, but now I
get:

Can't locate object method "TIEHASH" via package "DB_File" at doclist.pl line 13.
Can't locate object method "TIEHASH" via package "DB_File" at listafter.pl line 27.
..

Regards,

Joe



[htdig] Scripts that use GDBM_File

1999-09-25 Thread Joe R. Jah


Hi Folks,

I have been trying to take advantage of some of the scripts in contrib
folder; yesterday I noticed that five of them, changehost/changehost.pl,
doclist/doclist.pl, doclist/listafter.pl, urlindex/urlindex.pl, and
wordfreq/wordfreq.pl, have an identical problem: they all die on the
following line:

tie(%docdb, GDBM_File, $dbfile, GDBM_READER, 0) || die ...

All these scripts use GDBM_File.  Am I missing something, or are they all
incompatible with the htdig 3.1.x db?

I appreciate any pointers.

Regards,

Joe
-- 
 _/   _/_/_/   _/  __o
 _/   _/   _/  _/ __ _-\,_
 _/  _/   _/_/_/   _/  _/ ..(_)/ (_)
  _/_/ oe _/   _/.  _/_/ ah[EMAIL PROTECTED]






Re: [htdig] htdig and symbolic links

1999-09-10 Thread Joe R. Jah


On Fri, 10 Sep 1999, Nick O'Brien wrote:

 Date: Fri, 10 Sep 1999 15:13:20 +0100 (GMT Daylight Time)
 From: Nick O'Brien [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Subject: [htdig] htdig and symbolic links
 
 
 Hi,
 
 We are implementing htdig (v3.1.2 + the patch kit on Solaris 2.6) on our 
 main web server. One comment we have had is that there are a lot of 
 duplicate search results pointing to the same web pages. This is usually 
 caused by having several different Unix symbolic links pointing to the 
 same directory/file in the web document tree.
 
 Is there any way we can prevent the indexing of these duplicates? I see 
 from the mailing list archives that for previous versions of htdig there 
 were patches to fix this issue but they are not available for the current 
 version.
 
 I see from the bug database the latest advice is to eliminate symbolic 
 links - however for many practical reasons it is not possible for us to 
 do this.
 
 
 Is it for example possible to configure htdig to index our URLs via the 
 filesystem instead of HTTP (i.e using local_urls) and to ignore the 
 symbolic links?
 
 How are people on the list working round this problem? Or is this an 
 unresolved bug I will need to (re)log with the htdig developers?

Our site is in the same boat that your site is in; I use the same old
patch for version 3.0.8b2, but I apply it manually at every new release.
You can get it from:

ftp://sol.ccsf.cc.ca.us/htdig-patches/3.0.8b2/Retriever.cc.0

Then, with an ugly, extensive set of local_urls for each and every symbolic
link in the site:( I manage to suppress duplicates, quadruplicates, and
multiplicates;)
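
Building that set of local_urls starts with knowing where the symbolic links are. A rough sketch of how they could be listed, assuming a find and readlink are available; the helper name list_symlinks and the example path are placeholders, not part of htdig:

```shell
#!/bin/sh
# Hypothetical helper: print "link -> target" for every symbolic link under a directory.
list_symlinks() {
    find "$1" -type l | while IFS= read -r link; do
        printf '%s -> %s\n' "$link" "$(readlink "$link")"
    done
}

# Example (the document root path is an assumption):
# list_symlinks /usr/local/www/htdocs
```

The output only identifies the links; turning each one into a local_urls entry is still manual work.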

Boy, do I look forward to 3.2, which is promised to take care of the
menace of duplicates. 

Regards,

Joe
-- 
 _/   _/_/_/   _/  __o
 _/   _/   _/  _/ __ _-\,_
 _/  _/   _/_/_/   _/  _/ ..(_)/ (_)
  _/_/ oe _/   _/.  _/_/ ah[EMAIL PROTECTED]






[htdig] Meta keywords abuse;(

1999-09-08 Thread Joe R. Jah


Hi Folks,

I run htdig for a college; several departments maintain their respective
sites on a few servers.  Each department has a webmaster, sort of, who
sets up their pages; some departments take advantage of the enthusiasm of
their students and leave it to them to set up the site.  Sometimes
students become over-enthusiastic;) 

Some of the over-enthusiastic students, or webmasters, get the idea of
using, I'd say abusing, the meta keywords.  They create a meta tag
keywords set of over a hundred very common words and replicate it in two
dozen files in their site.  When anyone searches for any of those common
words, the first two dozen results would be from that site. 

My solution, for now, is to exclude their site from the dig; however, I
would like to find a less drastic measure.  I would like to dig their
site, but incapacitate _their_ meta tag keywords.  I'd like to leave meta
tags for judicious use, a few descriptive keywords, not a hundred!!

Secondly, and more importantly, I'd like to find a way to discover such
abuses.  I stumbled upon one of these sites, by sheer luck the day after
they set it up;)  I'd like to systematically search for and find any site that
abuses meta tags like this, SD;) 
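
One way such abuse might be spotted outside htdig is to count the words in each page's meta keywords tag and flag the pages above a threshold. The sketch below assumes the pages are reachable on a local filesystem, that the tag is written on one line with double-quoted attributes, and that grep supports -o; both function names are invented for the example:

```shell
#!/bin/sh
# Hypothetical helper: count the words in the first keywords meta tag of one file.
count_meta_keywords() {
    grep -i -o '<meta[^>]*name="keywords"[^>]*>' "$1" | head -n 1 |
        sed -n 's/.*[Cc][Oo][Nn][Tt][Ee][Nn][Tt]="\([^"]*\)".*/\1/p' |
        tr ', \t' '\n\n\n' | grep -c . || true
}

# Hypothetical helper: list "count<TAB>file" for HTML files above a threshold.
flag_keyword_abuse() {
    find "$1" -type f \( -name '*.html' -o -name '*.htm' \) |
    while IFS= read -r f; do
        n=$(count_meta_keywords "$f")
        if [ "$n" -gt "$2" ]; then
            printf '%s\t%s\n' "$n" "$f"
        fi
    done
}

# Example (path and threshold are assumptions):
# flag_keyword_abuse /usr/local/www/htdocs 50
```

This only helps with discovery; it does not stop htdig from indexing the keywords once a page is found.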

I appreciate any pointers,

TIA,

Joe
-- 
 _/   _/_/_/   _/  __o
 _/   _/   _/  _/ __ _-\,_
 _/  _/   _/_/_/   _/  _/ ..(_)/ (_)
  _/_/ oe _/   _/.  _/_/ ah[EMAIL PROTECTED]






Re: [htdig] BSDI installation

1999-09-04 Thread Joe R. Jah


On Sat, 4 Sep 1999, Biranit Goren wrote:

 Date: Sat, 04 Sep 1999 01:54:03 +0200
 From: Biranit Goren [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Subject: [htdig] BSDI installation
 
 
 Hello,
 
 I have had htdig on our website, at Atlas F1, for quite some time now and
 have grown totally reliant on its wonderful search service. Our current OS
 is Linux RedHat 6.0.
 
 However, at the end of the month we are moving to a new, dedicated server,
 where the OS will be BSDI 4.0. I understand that htdig cannot be installed
 on this OS, but I also saw in the TODO page that this will be changed (or
 has changed, I am not sure what you regard as a bullet and what you regard
 as a circle there!!!). 
 
 I am totally depressed by the thought of not being able to use htdig -
 please, please tell me there is a solution? We have 2.5 million visitors a
 month to our website, and the search service is imperative for us. The
 thought of having to find a new search engine is utterly annoying, so
 please tell me how I can install htdig on BSDI after all? Please?

Don't be depressed; thanks to Gilles and Geoff I solved this problem last
July. 

3.1.2 doesn't use the regex code in the C library, but rather it bundles
the GNU regex code in the package, and puts it in htlib/libht.a.  This GNU
regex.c code is causing a conflict with BSDI C/C++ library.  BSDI already
has a regex.h in /usr/include directory, and regex functions in C library,

To go around this problem do the following:

 . Remove references to regex.o from htlib/Makefile.
 . Remove htlib/regex.h.
 . Remove references to htlib/regex.h in htfuzzy/Makefile.
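
Before editing the Makefiles by hand, it can help to see exactly which lines mention the bundled regex files. A small sketch, assuming it is run against an unpacked htdig-3.1.2 source tree; the helper name find_regex_refs is made up here:

```shell
#!/bin/sh
# Hypothetical helper: list Makefile lines that still mention regex.o or regex.h.
# Pass the top of the source tree, or run it from there with no argument.
find_regex_refs() {
    src=${1:-.}
    grep -n 'regex\.[oh]' "$src/htlib/Makefile" "$src/htfuzzy/Makefile"
}

# Example:
# find_regex_refs /path/to/htdig-3.1.2
```

Each line it reports is a candidate for the manual edits listed above; once it prints nothing, the references are gone.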

Voila;)

Joe
-- 
 _/   _/_/_/   _/  __o
 _/   _/   _/  _/ __ _-\,_
 _/  _/   _/_/_/   _/  _/ ..(_)/ (_)
  _/_/ oe _/   _/.  _/_/ ah[EMAIL PROTECTED]







Re: [htdig] Memory fault (core dumped)

1999-07-29 Thread Joe R. Jah


On Wed, 28 Jul 1999, Gilles Detillieux wrote:

 Date: Wed, 28 Jul 1999 17:50:23 -0500 (CDT)
 From: Gilles Detillieux [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]
 Subject: Re: [htdig] Memory fault (core dumped)
 
 OK, upon closer examination, it appears I was mistaken about one point.
 3.1.2 doesn't use the regex code in the C library, but rather it bundles
 the GNU regex code in the package, and puts it in htlib/libht.a.
 The extern "C" construct above doesn't work because htlib/regex.h
 already includes this construct - that's why it was removed from
 htfuzzy/Endings.cc.
 
 So, perhaps this GNU regex.c code is causing a conflict with your C
 or C++ library.  If you already have a regex.h in your /usr/include
 directory, and regex functions in your C library, you might want to try
 using these instead of the ones in htlib.  To do this, I think you'd need
 to remove references to regex.o from htlib/Makefile, remove regex.o from

Bingo;)  I removed references to regex.o from htlib/Makefile.

 htlib/libht.a, and probably also remove htlib/regex.h.  If this works,

I ran make clean and removed htlib/regex.h; however, I had to also remove
references to htlib/regex.h in htfuzzy/Makefile.  It worked like a charm;)

Thanks a million Gilles and Geoff.

Joe
-- 
 _/   _/_/_/   _/  __o
 _/   _/   _/  _/ __ _-\,_
 _/  _/   _/_/_/   _/  _/ ..(_)/ (_)
  _/_/ oe _/   _/.  _/_/ ah[EMAIL PROTECTED]






Re: [htdig] Memory fault (core dumped)

1999-07-28 Thread Joe R. Jah


Hi BSDI Folks,

A question has come up on the htdig mailing list to which you might know
the answer: is there a bug in /usr/lib/libg++.so on BSDI?  I appreciate any
pointers and/or comments. 

On Wed, 28 Jul 1999, Geoff Hutchison wrote:

 Date: Wed, 28 Jul 1999 16:21:04 -0400
 From: Geoff Hutchison [EMAIL PROTECTED]
 To: "Joe R. Jah" [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED]
 Subject: Re: [htdig] Memory fault (core dumped)

 "Joe R. Jah" wrote:
  Yes, I grepped all the files in the source directory for "Regex"; it's
  only in htdoc/TODO.html and htlib/regex.c as a comment.  I found many
  instances of it, though, in /usr/lib/libg++.so.  Does that mean anything
  at all?

 It may mean a bug in libg++ or a mismatch between libg++ and your C
 library.
 What compiler (version) did you use to compile? Have you tried compiling
 with a more recent compiler (and/or libstdc++)?

Gcc version 2.7.2.1; I haven't tried with any other.

 As Gilles pointed out, one of the changes in 3.1.2 was to use the system
 regex.h, which seems almost uniformly much faster than the rx code that
 we had been using. This is the likely culprit, but I don't know why it's
 not working for you.

On Wed, 28 Jul 1999, Gilles Detillieux wrote:

 Date: Wed, 28 Jul 1999 15:13:02 -0500 (CDT)
 From: Gilles Detillieux [EMAIL PROTECTED]
 To: "Joe R. Jah" [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED]
 Subject: Re: [htdig] Memory fault (core dumped)
 
 According to Joe R. Jah:
  On Wed, 28 Jul 1999, Geoff Hutchison wrote:
#2  0x180dcf29 in Regex::Regex ()
#3  0x180dd138 in global constructors keyed to Regex::Regex ()
  
   This still confuses me. There isn't a 'Regex' class in 3.1.2. There's a
   Regex class (a fuzzy) in the 3.2 development source. But that shouldn't
   be in 3.1.2, so 'Regex::Regex()' shouldn't be called.
  
  Yes, I grepped all the files in the source directory for "Regex"; it's
  only in htdoc/TODO.html and htlib/regex.c as a comment.  I found many
  instances of it, though, in /usr/lib/libg++.so.  Does that mean anything
  at all?
 
 Yes, I suspect that the Regex class might be right in libg++ on your
 system.  The first stack backtrace you sent seemed to suggest that all
 the chain of function calls was for some internal initialization sequence
 (none of the symbols seemed to be within ht://Dig's code).
 
 I suspect the problem is somehow related to the switch in 3.1.2 from the
 old, slow regex code bundled in with ht://Dig, to the faster C library
 regex code.
 
 One of the differences I could see is that in the old code in
 htfuzzy/Endings.cc, it did this to include the bundled regex header file:
 
 extern "C"
 {
 # include <rxposix.h>
 }
 
 but now it does this:
 
 #include <regex.h>
 
 Just a wild guess, but maybe surrounding that line with an extern "C"
 construct would help - it might allow inclusion of C library stuff,
 but avoid unwanted C++ library stuff, which appears to be the source of

I did it like so:
__
extern "C"
{
#include <regex.h>
}
__

I got a compile error:
__
regex.c:210: syntax error before string constant
regex.c:213: syntax error before `}'
gmake[1]: *** [regex.o] Error 1
gmake[1]: Leaving directory
`/usr/home/jjah/tmp/htdig-3.1.2/htdig-3.1.2/htlib'
gmake: *** [all] Error 1
__


 grief here.  The other possibility (more wild speculation here) is that
 on BSDI systems, the regex code in the C library somehow conflicts with
 the regex code in the C++ library, so it would need to be avoided on BSDI
 (or you'd need a different C++ library).

I CC this message to BSDI-Users mailing list for that question.   

 It's odd, though, that the problem arises in htsearch but not htfuzzy
 (or does it?).  Does htfuzzy do anything differently in its initialization
 to avoid this problem? (rhetorical question, but hey, if anyone has the answer
 I'd like to hear it)

Yes the problem also arises in htfuzzy.

Joe
-- 
 _/   _/_/_/   _/  __o
 _/   _/   _/  _/ __ _-\,_
 _/  _/   _/_/_/   _/  _/ ..(_)/ (_)
  _/_/ oe _/   _/.  _/_/ ah[EMAIL PROTECTED]







htdig: Patch: for keyword(s)_factor typo in the docs

1999-01-09 Thread Joe R. Jah

Hi Folks,

The following patch is against a virgin 3.1.0b4:) 

Begin Patch___
*** htdoc/attrs.html.orig   Tue Dec 22 17:53:13 1998
--- htdoc/attrs.html   Sat Jan  9 11:37:59 1999
***************
*** 1694,1701 ****
  <hr>
  <dl>
  <dt>
! <strong><a name="keyword_factor">
! keyword_factor</a></strong>
  </dt>
  <dd>
  <dl>
--- 1694,1701 ----
  <hr>
  <dl>
  <dt>
! <strong><a name="keywords_factor">
! keywords_factor</a></strong>
  </dt>
  <dd>
  <dl>
***************
*** 1731,1737 ****
  <em>example:</em>
  </dt>
  <dd>
! keyword_factor: 12
  </dd>
  </dl>
  </dd>
--- 1731,1737 ----
  <em>example:</em>
  </dt>
  <dd>
! keywords_factor: 12
  </dd>
  </dl>
  </dd>
***************
*** 4331,4338 ****
  to be ignored. The number may be a floating point
  number. See also the <a href="#heading_factor">
  heading_factor_[1-6]</a>, <a href="#title_factor">
! title_factor</a>, and <a href="#keyword_factor">
! keyword_factor</a> attributes.
  </dd>
  <dt>
  <em>example:</em>
--- 4331,4338 ----
  to be ignored. The number may be a floating point
  number. See also the <a href="#heading_factor">
  heading_factor_[1-6]</a>, <a href="#title_factor">
! title_factor</a>, and <a href="#keywords_factor">
! keywords_factor</a> attributes.
  </dd>
  <dt>
  <em>example:</em>
*** htdoc/cf_byname.html.orig   Tue Dec 22 17:53:13 1998
--- htdoc/cf_byname.html   Sat Jan  9 11:36:34 1999
***************
*** 112,118 ****
  </font> <br>
   <b>K</b> <font face="helvetica,arial" size="2"><br>
   <img src="dot.gif" alt="*"> <a target="body" href=
! "attrs.html#keyword_factor">keyword_factor</a><br>
   <img src="dot.gif" alt="*"> <a target="body" href=
  "attrs.html#keywords_meta_tag_names">
  keywords_meta_tag_names</a><br>
--- 112,118 ----
  </font> <br>
   <b>K</b> <font face="helvetica,arial" size="2"><br>
   <img src="dot.gif" alt="*"> <a target="body" href=
! "attrs.html#keywords_factor">keywords_factor</a><br>
   <img src="dot.gif" alt="*"> <a target="body" href=
  "attrs.html#keywords_meta_tag_names">
  keywords_meta_tag_names</a><br>
*** htdoc/cf_byprog.html.orig   Tue Dec 22 17:53:13 1998
--- htdoc/cf_byprog.html   Sat Jan  9 11:35:45 1999
***************
*** 67,73 ****
   <img src="dot.gif" alt="*"> <a target="body" href=
  "attrs.html#image_list">image_list</a><br>
   <img src="dot.gif" alt="*"> <a target="body" href=
! "attrs.html#keyword_factor">keyword_factor</a><br>
   <img src="dot.gif" alt="*"> <a target="body" href=
  "attrs.html#limit_normalized">limit_normalized</a><br>
   <img src="dot.gif" alt="*"> <a target="body" href=
--- 67,73 ----
   <img src="dot.gif" alt="*"> <a target="body" href=
  "attrs.html#image_list">image_list</a><br>
   <img src="dot.gif" alt="*"> <a target="body" href=
! "attrs.html#keywords_factor">keywords_factor</a><br>
   <img src="dot.gif" alt="*"> <a target="body" href=
  "attrs.html#limit_normalized">limit_normalized</a><br>
   <img src="dot.gif" alt="*"> <a target="body" href=
_End Patch
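
After applying a rename patch like this one, a quick search can confirm that no stale spelling is left behind. A minimal sketch, assuming the three htdoc files named in the patch; the helper name stale_keyword_factor is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: list lines that still use the old attribute name.
# The pattern 'keyword_factor' does not match 'keywords_factor', so lines
# that already carry the corrected name are not reported.
stale_keyword_factor() {
    grep -n 'keyword_factor' "$@"
}

# Example (run from the top of the source tree):
# stale_keyword_factor htdoc/attrs.html htdoc/cf_byname.html htdoc/cf_byprog.html
```

No output means every occurrence has been switched to keywords_factor.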

Joe

 _/   _/_/_/   _/  __o
 _/   _/   _/  _/ __ _-\,_
 _/  _/   _/_/_/   _/  _/ ..(_)/ (_)
  _/_/ oe _/   _/.  _/_/ ah[EMAIL PROTECTED]


--
To unsubscribe from the htdig mailing list, send a message to
[EMAIL PROTECTED] containing the single word "unsubscribe" in
the body of the message.



htdig: Patch: for limit_normalized alphabetical rank

1999-01-09 Thread Joe R. Jah

Hi Folks,

This patch is against keyword(s)_factor patched 3.1.0b4,
ftp://sol.ccsf.cc.ca.us/htdig-patches/3.1.0b4/attrs-cf_byname-prog.html.0

___Begin Patch_
*** htdoc/cf_byprog.html.patched   Sat Jan  9 11:35:45 1999
--- htdoc/cf_byprog.html   Sat Jan  9 12:45:01 1999
***************
*** 69,78 ****
   <img src="dot.gif" alt="*"> <a target="body" href=
  "attrs.html#keywords_factor">keywords_factor</a><br>
   <img src="dot.gif" alt="*"> <a target="body" href=
- "attrs.html#limit_normalized">limit_normalized</a><br>
-  <img src="dot.gif" alt="*"> <a target="body" href=
  "attrs.html#keywords_meta_tag_names">
  keywords_meta_tag_names</a><br>
   <img src="dot.gif" alt="*"> <a target="body" href=
  "attrs.html#limit_urls_to">limit_urls_to</a><br>
   <img src="dot.gif" alt="*"> <a target="body" href=
--- 69,78 ----
   <img src="dot.gif" alt="*"> <a target="body" href=
  "attrs.html#keywords_factor">keywords_factor</a><br>
   <img src="dot.gif" alt="*"> <a target="body" href=
  "attrs.html#keywords_meta_tag_names">
  keywords_meta_tag_names</a><br>
+  <img src="dot.gif" alt="*"> <a target="body" href=
+ "attrs.html#limit_normalized">limit_normalized</a><br>
   <img src="dot.gif" alt="*"> <a target="body" href=
  "attrs.html#limit_urls_to">limit_urls_to</a><br>
   <img src="dot.gif" alt="*"> <a target="body" href=
End Patch__

Joe

 _/   _/_/_/   _/  __o
 _/   _/   _/  _/ __ _-\,_
 _/  _/   _/_/_/   _/  _/ ..(_)/ (_)
  _/_/ oe _/   _/.  _/_/ ah[EMAIL PROTECTED]




htdig: 3.1.b2 - 3.1.b3 performance degradation +

1998-12-17 Thread Joe R. Jah

Hi Geoff,

Yesterday I installed htdig 3.1.b3 on my machine.  I compiled it on a BSDI
box, everything was left as was except Retriever.cc, which was patched
with the old 3.0.8b2 patch,
ftp://sol.ccsf.cc.ca.us/htdig-patches/3.0.8b2/Retriever.cc.0
to exclude local duplicates, same as my 3.1.b2. 

The results:

1. It takes considerably longer to search ( 10 to 20 times) than
   3.1.b2
2. Many of the pages present in 3.1.b2 results, are absent in
   3.1.b3 results.
3. I can not explain the size changes of the db.wordlist and db.words.db 
   files.

   3.1.b2 DB files:

   -rw-r--r--  1 jjah  www  11360256 Dec 16 02:35 db.docdb
   -rw-r--r--  1 jjah  www    385024 Dec 16 02:35 db.docs.index
   -rw-r--r--  1 jjah  www  19231896 Dec 16 02:34 db.wordlist
   -rw-r--r--  1 jjah  www  16835584 Dec 16 02:34 db.words.db

   3.1.b3 DB files:

   -rw-r--r--  1 jjah  www  11515904 Dec 17 02:37 db.docdb
   -rw-r--r--  1 jjah  www    372736 Dec 17 02:37 db.docs.index
   -rw-r--r--  1 jjah  www  17188189 Dec 17 02:36 db.wordlist
   -rw-r--r--  1 jjah  www  17328128 Dec 17 02:36 db.words.db

I appreciate any pointer.

TIA,

Joe

 _/   _/_/_/   _/  __o
 _/   _/   _/  _/ __ _-\,_
 _/  _/   _/_/_/   _/  _/ ..(_)/ (_)
  _/_/ oe _/   _/.  _/_/ ah[EMAIL PROTECTED]




Re: htdig: Problems with using htdig -a

1998-09-22 Thread Joe R. Jah

On Mon, 21 Sep 1998, Geoff Hutchison wrote:

 Date: Mon, 21 Sep 1998 23:13:12 -0400
 From: Geoff Hutchison [EMAIL PROTECTED]
 To: "Joe R. Jah" [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED]
 Subject: Re: htdig: Problems with using htdig -a
 
 At 1:23 AM -0400 9/18/98, Joe R. Jah wrote:
 I assume this increase in size of db files and the increase in the reported
 number of documents will be cumulative over time if one uses this
 workaround; It will probably increase the actual search time as well;(
 
 I'm not sure what's going on here. Perhaps you could export the ASCII
 database for the db with and without this behavior. I'd be interested to
 see if documents are being duplicated. Do you use "remove_bad_urls"?

Yes documents are being duplicated, triplicated, and ...  That's why I use
the old "Excluding directories and duplicate URLs patch."

Yes I have the line 

remove_bad_urls:true

in my htdig.conf file.

Joe

 _/   _/_/_/   _/  __o
 _/   _/   _/  _/ __ _-\,_
 _/  _/   _/_/_/   _/  _/ ..(_)/ (_)
  _/_/ oe _/   _/.  _/_/ ah[EMAIL PROTECTED]





Re: htdig: Problems with using htdig -a

1998-09-18 Thread Joe R. Jah

On Thu, 17 Sep 1998, Geoff Hutchison wrote:

 Date: Thu, 17 Sep 1998 23:47:47 -0400
 From: Geoff Hutchison [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Subject: htdig: Problems with using htdig -a
 
 Hi,
 
 I consider the following a bug, since it's not documented. Fortunately
 there's an easy workaround.
 
 I normally run the dig with the switch -a to use alternate files (allowing
 others to search as I'm digging). Usually I don't use the switch -i, so it
 should do an "update" dig and index only the changed or new files (which
 should be a small subset of the 50,000 pages). Then the script moves the
 files into place at the end of the run.
 
 However, when using "-a" I wasn't seeing an update of the database.
 Essentially htdig looks at the db.docs.work file and found it empty. So it
 updates the empty db by doing a full initial dig. :-(
 
 Here's an example solution: (yes, you might want to ignore the first cp
 commands and change the first two mv commands to cp)
 
 BASEDIR=/opt/htdig
 cp $BASEDIR/db/db.wordlist $BASEDIR/db/db.wordlist.work
 cp $BASEDIR/db/db.docdb $BASEDIR/db/db.docdb.work
 $BASEDIR/bin/htdig -a -s
 $BASEDIR/bin/htmerge -a -s
 mv $BASEDIR/db/db.wordlist.work $BASEDIR/db/db.wordlist
 mv $BASEDIR/db/db.docdb.work $BASEDIR/db/db.docdb
 mv $BASEDIR/db/db.docs.index.work $BASEDIR/db/db.docs.index
 mv $BASEDIR/db/db.words.db.work $BASEDIR/db/db.words.db
 
 This changed a 1 hr. 30 min. dig into a 15 min dig, even counting the
 shuffling of files. Faster is better. :-)

I have 2809 documents on a local server; I also use the -a switch; it
normally takes about 12 minutes to rundig.  I tried your easy workaround
and got the following results: 

According to the report I have 3128 documents; it took about 14 minutes to
rundig.  The size of my db files increased by about 30%:

-rw-r--r--  1 jjah  www  13281280 Sep 17 21:36 db.docdb
-rw-r--r--  1 jjah  www  10482688 Sep 17 02:33 db.docdb.old
-rw-r--r--  1 jjah  www    398336 Sep 17 21:35 db.docs.index
-rw-r--r--  1 jjah  www    343040 Sep 17 02:33 db.docs.index.old
-rw-r--r--  1 jjah  www  22928417 Sep 17 21:36 db.wordlist
-rw-r--r--  1 jjah  www  17329728 Sep 17 02:32 db.wordlist.old
-rw-r--r--  1 jjah  www  19543040 Sep 17 21:34 db.words.db
-rw-r--r--  1 jjah  www  15352832 Sep 17 02:32 db.words.db.old

I assume this increase in the size of the db files and the increase in the
reported number of documents will be cumulative over time if one uses this
workaround; it will probably increase the actual search time as well;(
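
The "about 30%" figure can be checked per file from the listing above. A small sketch using awk, with the byte counts copied from that listing; the helper name pct_growth is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: percentage growth from an old size to a new size.
pct_growth() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (new - old) * 100 / old }'
}

# Old and new sizes in bytes, taken from the listing above.
pct_growth 10482688 13281280   # db.docdb
pct_growth 343040 398336       # db.docs.index
pct_growth 17329728 22928417   # db.wordlist
pct_growth 15352832 19543040   # db.words.db
```

The three large files come out at roughly 27 to 32 percent, while db.docs.index grows by noticeably less.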

Joe

 _/   _/_/_/   _/  __o
 _/   _/   _/  _/ __ _-\,_
 _/  _/   _/_/_/   _/  _/ ..(_)/ (_)
  _/_/ oe _/   _/.  _/_/ ah[EMAIL PROTECTED]




Re: htdig: ht3.1.0b1 and PDF

1998-09-15 Thread Joe R. Jah

On Sun, 13 Sep 1998, Geoff Hutchison wrote:

 Date: Sun, 13 Sep 1998 09:08:55 -0400
 From: Geoff Hutchison [EMAIL PROTECTED]
 To: "Joe R. Jah" [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED]
 Subject: Re: htdig: ht3.1.0b1 and PDF

[snip]

 This will be in 3.1.0b2. But it's not really supported in 3.1.0b1. If
 someone wants to beat me to a patch, great!

I applied the following patch to PDF.cc:
___
$ diff -c PDF.cc.old PDF.cc
*** PDF.cc.old  Tue Sep 15 00:40:51 1998
--- PDF.cc  Tue Sep 15 00:52:03 1998
***************
*** 140,147 ****
  return;
  }

! // Use acroread as a filter to convert to PostScript.
! acroread << " -toPostScript " << pdfName << " /tmp 2>&1";

  system(acroread);
  FILE* psFile = fopen(psName, "r");
--- 140,147 ----
  return;
  }

! // Use pdftops as a filter to convert to PostScript.
! acroread << " " << pdfName << " 2>&1";

  system(acroread);
  FILE* psFile = fopen(psName, "r");
___


Now when I run htdig I get the following errors:

Error (0): PDF file is damaged - attempting to reconstruct xref table...
Error: Couldn't find trailer dictionary
Error: Couldn't read xref table


Is it the patch or some damaged PDF file?

Joe

 _/   _/_/_/   _/  __o
 _/   _/   _/  _/ __ _-\,_
 _/  _/   _/_/_/   _/  _/ ..(_)/ (_)
  _/_/ oe _/   _/.  _/_/ ah[EMAIL PROTECTED]




htdig: Re: Virtual memory exceeded in `new'

1998-09-14 Thread Joe R. Jah

On Mon, 14 Sep 1998, Geoff Hutchison wrote:

 Date: Mon, 14 Sep 1998 11:20:28 -0400
 From: Geoff Hutchison [EMAIL PROTECTED]
 To: "Joe R. Jah" [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED]
 Subject: Re: Virtual memory exceeded in `new'
 
 Is there anything I can do to make htnotify get enough VM?
 
 Sure. I'd guess there's a memory leak. So if anyone finds it (I really
 haven't looked at the code), you won't have to worry about "enough" VM.

Thanks to a tip from Theodore Hope from [EMAIL PROTECTED] mailing list,
I solved the problem by prepending the htnotify command line in rundig
with "unlimit;"
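
Note that unlimit is a csh-style builtin. In a Bourne-style shell the rough counterpart is ulimit. The sketch below raises the soft data-segment limit to the hard limit before running a command; it assumes the data-segment limit is the one being hit and that the shell's ulimit accepts -d (bash does, though POSIX only requires -f). The helper name and the commented htnotify path are placeholders:

```shell
#!/bin/sh
# Hypothetical helper: raise the soft data-segment limit to the hard limit
# for this shell and for the commands it starts afterwards.
raise_data_limit() {
    hard=$(ulimit -H -d) || return 1
    ulimit -S -d "$hard"
}

raise_data_limit
# /opt/htdig/bin/htnotify   # path is an assumption; run htnotify after raising the limit
```

Raising the limit only hides the symptom; if htnotify does leak memory, as suggested above, the leak remains.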

Joe

 _/   _/_/_/   _/  __o
 _/   _/   _/  _/ __ _-\,_
 _/  _/   _/_/_/   _/  _/ ..(_)/ (_)
  _/_/ oe _/   _/.  _/_/ ah[EMAIL PROTECTED]






Re: htdig: ht3.1.0b1 and PDF

1998-09-13 Thread Joe R. Jah

On Fri, 11 Sep 1998, Geoff Hutchison wrote:

 Date: Fri, 11 Sep 1998 13:28:47 -0400
 From: Geoff Hutchison [EMAIL PROTECTED]
 To: Chris Brown [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED]
 Subject: Re: htdig: ht3.1.0b1 and PDF
 
 At 12:51 PM -0400 9/11/98, Chris Brown wrote:
 I recently installed htdig 3.1b1 and it works fine except now it won't
 find my acroread to convert .pdf's.  I still have the acroread: parameter
 set in the htdig.conf file as it was in my old one.
 
 Is there a new param to set this?
 
 Yup. The param is "pdf_parser" since there was a lot of discussion about
 using other programs to parse PDF files. I don't think anyone has tested
 using other programs, but I figured it would be better to name it
 "pdf_parser" than "acroread" anyway.

I run htdig on a BSDI 3.1 box; acroread does not have a port for it, but I
have pdftops. 

When I set the param up in my htdig.conf file as follows:

pdf_parser: /usr/contrib/bin/pdftops

I get the following result:

Usage: pdftops [-f int] [-l int] [-h] [-help] PDF-file [PS-file]
  -f   : first page to print
  -l   : last page to print
  -h   : print usage information
  -help: print usage information
PDF::parse: cannot open acroread output


when I set it up like:

pdf_parser: /usr/contrib/bin/pdftops %src %dest 

I get the following result:

PDF::parse: cannot find acroread


What am I doing wrong?

TIA,

Joe

 _/   _/_/_/   _/  __o
 _/   _/   _/  _/ __ _-\,_
 _/  _/   _/_/_/   _/  _/ ..(_)/ (_)
  _/_/ oe _/   _/.  _/_/ ah[EMAIL PROTECTED]




htdig: Excluding directories and duplicate URLs patch

1998-09-13 Thread Joe R. Jah

Hi Geoff,

Thank you very much for carrying this great software forward.

I compiled/installed ht://Dig 3.1.0b1 a few hours ago on a BSDI 3.1 box. 
When I ran the rundig script I realized that the sizes of files in db
directory were dramatically increased, about 70%.  I searched several
local file systems and found out that I had many duplicate and triplicate
indexed files.  I immediately checked Retriever.cc and realized that the
patch

   ftp://sol.ccsf.cc.ca.us/htdig-patches/3.0.8b2/Retriever.cc.0 

has not been applied to ht://Dig 3.1.0b1; I applied it manually and
recompiled htdig and reran rundig.  My databases shrank to their normal
size; no more duplicates;-)  Please include this patch in your next
release. 

Joe

 _/   _/_/_/   _/  __o
 _/   _/   _/  _/ __ _-\,_
 _/  _/   _/_/_/   _/  _/ ..(_)/ (_)
  _/_/ oe _/   _/.  _/_/ ah[EMAIL PROTECTED]





htdig: Virtual memory exceeded in `new'

1998-09-13 Thread Joe R. Jah

Hi Geoff,

I have had this message ever since the first release of ht://Dig.  My
solution has been to comment out the htnotify line in rundig.  I
uncommented it today to find out if it would work in 3.1.0b1, but
unfortunately I got the same message.

I have 96 Megs of RAM and my total swap space is about 370 Megs:
__
$ pstat -s
Device name  1K-blocks  Type
sd0b             49148  Interleaved
sd1a             61436  Sequential
sd2a             12284  Sequential
sd2b            255996  Sequential
0 (1K-blocks) allocated out of 378864 (1K-blocks) total, 0% in use
__
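
The swap total quoted above can be cross-checked by adding up the per-device figures. A small sketch with awk, fed the device lines from the pstat output above; the helper name sum_swap_kb is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: add up the 1K-block column of `pstat -s`-style
# device lines read from standard input (lines without a numeric second
# field, such as the header and the summary, are skipped).
sum_swap_kb() {
    awk '$2 ~ /^[0-9]+$/ { total += $2 }
         END { printf "%d KB (about %.0f MB)\n", total, total / 1024 }'
}

sum_swap_kb <<'EOF'
Device name  1K-blocks  Type
sd0b             49148  Interleaved
sd1a             61436  Sequential
sd2a             12284  Sequential
sd2b            255996  Sequential
EOF
# prints: 378864 KB (about 370 MB)
```

The sum matches the 378864 total that pstat reports, which is where the "about 370 Megs" figure comes from.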


Is there anything I can do to make htnotify get enough VM?

TIA,

Joe

 _/   _/_/_/   _/  __o
 _/   _/   _/  _/ __ _-\,_
 _/  _/   _/_/_/   _/  _/ ..(_)/ (_)
  _/_/ oe _/   _/.  _/_/ ah[EMAIL PROTECTED]




htdig: PDF parser

1998-09-11 Thread Joe R. Jah

Hi Folks,

Has anyone installed a PDF parser on BSDI?  Acrobat reader has versions
for Linux, AIX, Solaris, SunOS, IRIX, HP-UX, and Digital, but none for
BSDI;(

I appreciate any pointers.

Joe

 _/   _/_/_/   _/  __o
 _/   _/   _/  _/ __ _-\,_
 _/  _/   _/_/_/   _/  _/ ..(_)/ (_)
  _/_/ oe _/   _/.  _/_/ ah[EMAIL PROTECTED]








Re: htdig: patch site

1998-08-29 Thread Joe R. Jah

On Thu, 27 Aug 1998, Gordon Hopper wrote:

 Date: Thu, 27 Aug 1998 10:09:45 -0600
 From: Gordon Hopper [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Subject: Re: htdig: patch site

[snip]

 I have a clean version that I downloaded from http://htdig.sdsu.edu/. 
 If there are other places to download files, please let me know where
 they are.
 
 Gordon

ftp://sol.ccsf.cc.ca.us/htdig-patches/
Or
http://sol.ccsf.cc.ca.us/ftp/htdig-patches/

Joe

 _/   _/_/_/   _/  __o
 _/   _/   _/  _/ __ _-\,_
 _/  _/   _/_/_/   _/  _/ ..(_)/ (_)
  _/_/ oe _/   _/.  _/_/ ah[EMAIL PROTECTED]







Re: htdig: PDF parsing

1998-07-21 Thread Joe R. Jah

On Tue, 21 Jul 1998, Colin Viebrock wrote:

 Date: Tue, 21 Jul 1998 11:42:41 -0400
 From: Colin Viebrock [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Cc: "[EMAIL PROTECTED]" [EMAIL PROTECTED]
 Subject: Re: htdig: PDF parsing

[snip]

 send this file to you separately, so as not to clutter the list.  (Is there
 a location for this file on the htdig site yet?)  Don't compile it yet

Yes you can put it in ftp://sol.ccsf.cc.ca.us/incoming directory.  It is
an unreadable, but writable directory.  I will place it in
ftp://sol.ccsf.cc.ca.us/htdig-patches/3.0.8b2/, the unofficial htdig
patches site.

Joe

 _/   _/_/_/   _/  __o
 _/   _/   _/  _/ __ _-\,_
 _/  _/   _/_/_/   _/  _/ ..(_)/ (_)
  _/_/ oe _/   _/.  _/_/ ah[EMAIL PROTECTED]




Re: htdig: What am I doing wrong

1998-06-11 Thread Joe R. Jah

On Wed, 10 Jun 1998, Peter Burden wrote:

 Date: Wed, 10 Jun 1998 21:33:38 +0100
 From: Peter Burden [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Subject: htdig: What am I doing wrong
 
 Hello,
   We've been running htdig on a medium site (some 18000 pages)
 for some time and it's been quite OK (apart from the odd time the
 database build broke the disc partition). Recent analysis of results
 has identified one or two problems. Are these configuration issues ?
 Are there patches available ?
 
 1.Duplicate URLs
 
   htdig doesn't seem too good at spotting multiple different
   URLs pointing to the same page. Host name duplication

You can apply the following patches:

  http://sol.ccsf.cc.ca.us/ftp/htdig-patches/3.0.8b1/Docu-def-Retr-Serv.0
  http://sol.ccsf.cc.ca.us/ftp/htdig-patches/3.0.8b1/Document.cc.0
  http://sol.ccsf.cc.ca.us/ftp/htdig-patches/3.0.8b1/Retriever-def.0
  http://sol.ccsf.cc.ca.us/ftp/htdig-patches/3.0.8b2/Retriever.cc.0

Joe

 _/   _/_/_/   _/  __o
 _/   _/   _/  _/ __ _-\,_
 _/  _/   _/_/_/   _/  _/ ..(_)/ (_)
  _/_/ oe _/   _/.  _/_/ ah[EMAIL PROTECTED]





Re: htdig: Patch for b2 for excludes/restricts

1998-05-12 Thread Joe R. Jah

On Tue, 12 May 1998, Alex Block wrote:

 Date: Tue, 12 May 1998 15:10:17 -0400
 From: Alex Block [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Subject: htdig: Patch for b2 for excludes/restricts
 
 I understand that there is a patch available for release b2 that addresses
 issues with respect to inclusion of the "exclude" or "restrict" in search
 forms?
 
 Can anyone advise where to find this patch?

Try: ftp://sol.ccsf.cc.ca.us/htdig-patches/         (FTP clients)
Or:  http://sol.ccsf.cc.ca.us/ftp/Ahtdig-patches/   (Web browsers)

And read the 00INDEX and README files.

Joe

 _/   _/_/_/   _/  __o
 _/   _/   _/  _/ __ _-\,_
 _/  _/   _/_/_/   _/  _/ ..(_)/ (_)
  _/_/ oe _/   _/.  _/_/ ah[EMAIL PROTECTED]




Re: htdig: Dig on One Machine (FreeBSD), Search on Another (Linux) --- Big Problems

1998-04-21 Thread Joe R. Jah

On Tue, 21 Apr 1998, Rene' Seindal wrote:

 Date: Tue, 21 Apr 1998 20:36:10 +0200
 From: Rene' Seindal [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED]
 Subject: Re: htdig: Dig on One Machine (FreeBSD), Search on Another (Linux) --- Big Problems

[snip]

 This might be the problem I fixed about a year ago.
 
 Get the patch from ftp://webadm.kb.dk/pub/htdig.patch and
 ftp://webadm.kb.dk/pub/htdig.patch2
 
 -- 
 René Seindal

I read the two patches mentioned above.  Unfortunately there was no
explanation or comment on what they are supposed to achieve.  I'd like
to place them in ftp://sol.ccsf.cc.ca.us/htdig-patches/3.08b/.  If you
don't mind them being placed on a central patch site for all interested to
use, please add a couple of lines of comment at the top of each patch stating
the name of the author and explaining what they are supposed to fix and/or
what feature(s) they are supposed to add.

IMHO, it would help everyone running htdig to have a central site to look
for all old and new patches.  If someone has a better idea please share it
with the rest of us.

Please check ftp://sol.ccsf.cc.ca.us/htdig-patches/3.08b/ for patches;
if you know of any patches that are not there, please email them, or their
present location, to [EMAIL PROTECTED].

Thank you.

Joe
 _/   _/_/_/   _/  __o
 _/   _/   _/  _/ __ _-\,_
 _/  _/   _/_/_/   _/  _/ ..(_)/ (_)
  _/_/ oe _/   _/.  _/_/ ah[EMAIL PROTECTED]




Re: htdig: patch for Retriever.cc

1998-04-17 Thread Joe R. Jah

On Fri, 17 Apr 1998, Steve Scott wrote:

 Date: Fri, 17 Apr 1998 08:44:47 -0400 (EDT)
 From: Steve Scott [EMAIL PROTECTED]
 To: "Joe R. Jah" [EMAIL PROTECTED]
 Cc: Steve Scott [EMAIL PROTECTED]
 Subject: Re: htdig: patch for Retriever.cc
 
 Joe,
 Thanks for the information on the patches.  I looked at them and will apply them.
 When you said that they needed to be applied in order, do you mean apply one,
 compile, then apply the second one?  Can't I just make all the changes at once and
 then recompile?  I am not real familiar with C code, but it looks like a few hours of

Yes, you can make all the changes and then compile.

 cutting and pasting the code into place.  Why weren't the 3.07 patches included in
 the 3.08 code?  This does not make sense to me.  Are there any other patches that I
 should apply?  I have the 3.08 patches already and I was only going to apply the one
 that checks the inodes for duplicate references.

I do not know why they weren't included in 3.08b.  Unfortunately the
line numbers may not match, so you will have to do it the hard way and
apply the changes manually.  Some of the memory leak patches have already
been applied; you will see them in the code.  There are at least two of
Pasi's patches you should apply; they both relate to local file systems.
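
For anyone unfamiliar with the idea behind the inode check mentioned above,
here is a minimal Python sketch.  It is not the code from the patch (htdig is
C++); it only illustrates the general technique, assuming the documents are
reachable as local files.  The function names are made up for this example.

```python
import os


def file_identity(path):
    """Return (device, inode) for a local file; os.stat follows symlinks."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def find_duplicates(paths):
    """Return (duplicate, first_seen) pairs for paths naming the same file."""
    seen = {}          # (device, inode) -> first path seen with that identity
    duplicates = []
    for path in paths:
        key = file_identity(path)
        if key in seen:
            duplicates.append((path, seen[key]))
        else:
            seen[key] = path
    return duplicates
```

Two paths that are hard links or symlinks to the same file share a device and
inode number, so only the first one would need to be indexed.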

 On another note, I thought that I read that there is a way for htdig to only dig
 for pages that have changed since the last dig, and then append this to the
 database.  Do you know anything about this?  Or am I mistaken?  Currently it takes
 6-10 hours for our customer to dig all the web sites that we want.  It would be
 nice to dig out only the pages that have changed.

I refer this question to the list;-)

Joe

 _/   _/_/_/   _/  __o
 _/   _/   _/  _/ __ _-\,_
 _/  _/   _/_/_/   _/  _/ ..(_)/ (_)
  _/_/ oe _/   _/.  _/_/ ah[EMAIL PROTECTED]





Re: htdig: patch for Retriever.cc

1998-04-16 Thread Joe R. Jah

On Thu, 16 Apr 1998, Steve Scott wrote:

 Date: Thu, 16 Apr 1998 16:34:27 -0400 (EDT)
 From: Steve Scott [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Cc: Steve Scott [EMAIL PROTECTED]
 Subject: htdig: patch for Retriever.cc
 
  I tried to apply the patch for Retriever.cc and I get a compile error.  Here is
 the error message:
 Retriever.cc:427: Undefined symbol _IsLocal referenced from text segment
 make[1]: *** [htdig] Error 1
 I saw another person with the same problem who posted a message in March.  Is there
 a fix to the patch?
 Thank you.
 Steve Scott

This patch requires Pasi Eronen's patches for local home directories
for 3.07, which were not carried over to 3.08b:
___
Date: Tue, 5 Aug 1997 16:53:35 +0300 (EEST)
From: Pasi Eronen [EMAIL PROTECTED]
To: HTDig mailing list [EMAIL PROTECTED]
Subject: htdig: Patches for local home directories


Hi!

A few days ago, I posted patches for local filesystem access
to HTDig. Since then, I've received a request for supporting
user home directory URLs (like http://www.my.com/~user/foo.html),
and made a patch for that, too.

The syntax of the new configuration file option is:

  local_user_urls: prefix1=[path1],dir1 ...

If you leave the "path" part out, it looks up the user's home
directory in /etc/passwd (or NIS or whatever). For example,
to map  "http://www.my.org/~joe/foo/bar.html" to
"/home/joe/www/foo/bar.html" you would say:

  local_user_urls: http://www.my.org/=/home/,/www/

The default behaviour of many WWW servers is approximately:

  local_user_urls: http://www.my.org/=,/public_html/

(NOTE: All the slashes in these examples are REQUIRED!)

This patch is a bit large, so I won't post it here. Instead,
it's available from http://www.iki.fi/pe/htdig/. I'm not
indexing any home directories myself, so comments are very
welcome.  (Tip: you can see what local filename it's trying
to use if you specify options '-v -v' to htdig.)

Thanks to Geoff Hutchison for testing this patch.

Best regards,
Pasi

---
Pasi Eronen [EMAIL PROTECTED], +358-50-5123499
_
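
To make the mapping rule in Pasi's message concrete, here is a small Python
sketch of how a single "prefix=[path],dir" entry could translate a ~user URL
into a local filename.  It is only an illustration of the rule as described
above, not the patch's actual code; the function names and the home_lookup
parameter are invented for this example, and it assumes the prefix itself
contains no "=" character.

```python
def parse_rule(rule):
    """Split one 'prefix=[path],dir' entry into (prefix, path, dir)."""
    prefix, _, rest = rule.partition("=")
    path, _, subdir = rest.partition(",")
    return prefix, path, subdir


def map_user_url(url, rule, home_lookup=None):
    """Map a ~user URL to a local filename, or return None if no match."""
    prefix, path, subdir = parse_rule(rule)
    marker = prefix + "~"
    if not url.startswith(marker):
        return None
    user, slash, rest = url[len(marker):].partition("/")
    if not user or not slash:
        return None
    if path:
        # Explicit path given: home directory is path + user name.
        base = path + user
    else:
        # Path left out: look the home directory up in the passwd database.
        if home_lookup is None:
            import pwd  # Unix-only
            home_lookup = lambda name: pwd.getpwnam(name).pw_dir
        base = home_lookup(user)
    return base + subdir + rest
```

With the rule "http://www.my.org/=/home/,/www/", the URL
"http://www.my.org/~joe/foo/bar.html" comes out as
"/home/joe/www/foo/bar.html", matching the example in the message.  Because
the pieces are joined by plain concatenation, every slash in the rule is
needed, as the note in the message says.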

Joe

 _/   _/_/_/   _/  __o
 _/   _/   _/  _/ __ _-\,_
 _/  _/   _/_/_/   _/  _/ ..(_)/ (_)
  _/_/ oe _/   _/.  _/_/ ah[EMAIL PROTECTED]





Re: htdig: Virtual Memory

1998-04-13 Thread Joe R. Jah

On Sun, 12 Apr 1998, System Administrator wrote:

 Date: Sun, 12 Apr 1998 10:45:52 -0500 (CDT)
 From: System Administrator [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Subject: htdig: Virtual Memory
 
 I've been trying to run htdig -a -v -s, to index about 5500 sites, but I
 get an error 'Virtual Memory exceeded in `New'. Has anyone had this
 problem before? How can I fix it?
 
 Frank 

I had the same problem indexing just one site; I have 64 MB of RAM and
about 450 MB of swap.  Our site has only about 2,000 documents.  I solved
the problem by commenting out the htnotify line in rundig.

Joe

 _/   _/_/_/   _/  __o
 _/   _/   _/  _/ __ _-\,_
 _/  _/   _/_/_/   _/  _/ ..(_)/ (_)
  _/_/ oe _/   _/.  _/_/ ah[EMAIL PROTECTED]

