Re: [htdig] No Excerpt Error

2000-07-18 Thread Paul Watters

Hi Gilles,

Thanks for your message. The PDF files were exported from QuarkXPress, I
believe.

I've extracted the postscript, and all the appropriate strings seem to be
there:

(geophysics, German, human geography, modern history, palaeontology,)Tj
-0.00279 Tc
0.06298 Tw
1.16668 TL
T*
(palaeobiology, politics, resource and environmental management, Slavonic\
 languages)Tj
-0.00329 Tc
0.07339 Tw
1.08329 TL
T*

etc. etc.
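
A quick way to list just the text strings in an extracted stream like the one above is to pull out the operands of the Tj (show text) operator. What follows is a rough sketch, not a PDF parser. The function name extract_tj_strings is made up for illustration, and it assumes one string per Tj line with no escaped parentheses or octal escapes. The only special case it handles is the backslash line continuation seen above.

```sh
# Hypothetical helper: print the string operand of each "(...)Tj" line
# read from standard input.
extract_tj_strings() {
    # First sed: join a line ending in a backslash with the next line
    # (a backslash-newline inside a PDF literal string is a continuation).
    # Second sed: print only what sits between "(" and ")Tj".
    sed -e ':a' -e '/\\$/{' -e 'N' -e 's/\\\n//' -e 'ba' -e '}' |
        sed -n 's/^(\(.*\))Tj$/\1/p'
}

# Example (page.txt is a placeholder for the extracted stream):
#   extract_tj_strings < page.txt
```

If the expected words show up here but not in the search excerpts, the text is present in the file and the problem is more likely in the conversion step.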

I'll try using xpdf!

Cheers,
Paul



To unsubscribe from the htdig mailing list, send a message to
[EMAIL PROTECTED]
You will receive a message to confirm this.




[htdig] Error in ./configure

2000-07-18 Thread Sigfus Oddsson

Hi,

When doing a ./configure on the 3.1.5 version of Ht://Dig I constantly get
this error:

sed: file conftest.s1 line 3: Unterminated `s' command

Which results in no Makefile and CONFIG being generated so I can't 'make'
the binary.

I suspect that I'm missing some programs that are required and thus the
error, but I have no idea what programs. I also cannot find the conftest.s1
file mentioned so I have no clue how to correct this error.

Please help me with this as I find Ht://Dig very useful and enjoyable and
intend to use it on my website (www.hmmm.is)

Best regards and thanks in advance for your help,

S. R. Oddsson
Reykjavik, Iceland







[htdig] Fw: Error in ./configure

2000-07-18 Thread Sigfus Oddsson

I'm sorry to have to send another e-mail but I figured I should include the
output of ./configure in my mail. The errors are not included as they are
printed to the console.
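
For reference, messages printed to the console can be captured together with the normal output by redirecting both streams to one file. A minimal sketch, assuming a Bourne-style shell; run_logged and the log file name are illustrative and not part of ht://Dig:

```sh
# Run a command and save everything it prints, both standard output
# and standard error, to a single log file.
#   $1    log file to write
#   $2... command to run, plus any arguments
run_logged() {
    log=$1; shift
    "$@" > "$log" 2>&1
}

# Example (configure.log is just an example name):
#   run_logged configure.log ./configure
```

The resulting file then holds the sed error in context, next to the configure check that triggered it.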

Regards,

S. R. Oddsson
----- Original Message -----
From: "Sigfus Oddsson" [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Sunday, July 16, 2000 6:07 PM
Subject: Error in ./configure


 Hi,

 When doing a ./configure on the 3.1.5 version of Ht://Dig I constantly get
 this error:

 sed: file conftest.s1 line 3: Unterminated `s' command

 Which results in no Makefile and CONFIG being generated so I can't 'make'
 the binary.

 I suspect that I'm missing some programs that are required and thus the
 error, but I have no idea what programs. I also cannot find the conftest.s1
 file mentioned so I have no clue how to correct this error.

 Please help me with this as I find Ht://Dig very useful and enjoyable and
 intend to use it on my website (www.hmmm.is)

 Best regards and thanks in advance for your help,

 S. R. Oddsson
 Reykjavik, Iceland


 log




Re: [htdig] httpd Internal Server Error

2000-07-18 Thread Gilles Detillieux

According to Greg Lepore:
   I've isolated the error to the sort by title parameter that I pass along
 with the search terms.  When I search with sort by score the results are
 returned to the browser in the same time it takes to search by command
 line.  When I sort by title - crash-o-rama.  Searching by reverse score
 works, but not by time, reverse time, or reverse title.
   To sum up, the server will not return an error with sorting by score or
 reverse score; any other sorting causes the internal server error,
 presumably due to a timeout.

OK, that makes sense.  See http://www.htdig.org/FAQ.html#q5.10

   In researching the Premature End of Script Headers problem at the Apache
 website, it was pointed out that
 "The second most common cause of this is a result of an interaction with
 Perl's output buffering... To make Perl flush its buffers after each
 output statement...This is generally only necessary when you are calling
 external programs from your script that send output to stdout, or if there
 will be a long delay between the time the headers are sent and the actual
 content starts being emitted... If your script isn't written in Perl, do
 the equivalent thing for whatever language you are using (e.g., for C, call
 fflush() after writing the headers). "
   Might this be relevant?

No, it's not likely to be a buffering problem.  htsearch doesn't start
outputting anything until it's processed and sorted all matches, and
after that the time to output is actually quite small.  The problem is
that when there are a lot of hits, the time to process them can be quite
long, especially when htsearch must fetch the db.docdb record for each
match (rather than for just the few it actually displays).

 At 10:34 AM 7/13/00 , Gilles Detillieux wrote:
 According to Greg Lepore:
 I need to work on my powers of estimation, the actual command line time
  for a search that returns all pages (112,000) is around 20-25 seconds.  At
  the time of the tests, there was only one install of HTDIG and therefore
  only one database and conf file. No unusual input parameters, basically
  "search everything" with the defaults.
 
 OK, but just to be sure we rule out any input parameter differences, how
 about setting the method from POST to GET in the search form, so you can
 see the query string, and then calling htsearch from the command line
 with the QUERY_STRING environment variable set to the same query string
 you saw in the URL in your browser, and the REQUEST_METHOD environment
 variable set to GET.  Perhaps try the query that actually works from the
 browser and returns the largest number of hits, and compare its timing to
 the time it takes from the command line.
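
 The command-line test described above could look roughly like this in a
 Bourne-style shell. The htsearch path and the query string are placeholders:
 paste the query string exactly as it appears after the "?" in the browser's
 URL. The helper name run_as_cgi is made up for this sketch.

```sh
# Run a program with the two CGI environment variables mentioned above
# set for a GET request.
#   $1    query string copied from the browser URL (the part after "?")
#   $2... program to run, plus any arguments
run_as_cgi() {
    query=$1; shift
    env QUERY_STRING="$query" REQUEST_METHOD=GET "$@"
}

# Example (path and query are assumptions, adjust for your install):
#   run_as_cgi 'words=test&sort=title' /path/to/cgi-bin/htsearch > /dev/null
```

 Timing that call and comparing it with the browser's response time shows
 whether the delay comes from htsearch itself or from the web server side.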
 
 Unless you're running on a busy server and the CGI scripts run at a much
 higher nice level, I'm at a loss to explain why htsearch takes so much
 longer when run as a CGI.
 
  I am trying to install 3.2b2 but the
  dig time is outrageous, still running after 16 hours versus 5 hours for a
  complete dig with 3.1.3.
 However, searches that return 20,000 hits and over are still giving the
  crash.  I was hoping that upgrading would give me some speed benefits.  Of
  course, I will also install another 128MB of RAM and cross my fingers...
 
 The 3.2 series is supposed to give some speed benefits for searches, at the
 cost of longer digging time.


-- 
Gilles R. Detillieux  E-mail: [EMAIL PROTECTED]
Spinal Cord Research Centre   WWW:http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:(204)789-3930






Re: [htdig] Htmerge: Deleted, invalid

2000-07-18 Thread Gilles Detillieux

According to [EMAIL PROTECTED]:
   I think there is a bug in htmerge 3.1.5 which causes it to declare
   some pages as "invalid" in some cases.
  
  That may be, but I want to be sure we've ruled out every other possibility
  first.  I've never seen a bug report like this, so it would be very
  unusual if it is indeed a bug showing up in your case, but not for
  other users.  If you can find a consistent test case that fails on
  an initial dig, please provide details on your OS, version, config,
  etc. so that we can look into this further.
  
 
 IRIX 6.5, Htdig 3.1.5
 
 One of the symptoms is that there is no consistency.  Today's re-index
 reported 84 pages to be invalid.  Of these only one was from the
 http://www.tregalic.co.uk/sacred-heart/ site, and this time it was
 churchpage7.html.  And that page is *NOT* found by any search on my index,
 though I can follow links to it from other pages and browse it.
 
 I don't see how you can investigate this yet, but unless people put in
 reports like mine you will always be able to claim the "no-one else
 is having this problem". 
 
 I will continue to look for a pattern which might give a clue. 

I'm inclined to think this is a platform-specific problem.  Most of
the trouble reports we've seen about IRIX systems are from users who
can't even get htdig compiled, let alone running, so I don't think the
package has had a thorough workout under IRIX.  Which compiler did you
use to build it?

-- 
Gilles R. Detillieux  E-mail: [EMAIL PROTECTED]
Spinal Cord Research Centre   WWW:http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:(204)789-3930






[htdig] start/stop/restart

2000-07-18 Thread Franck Horlaville

Hi !

I'd like to set htdig to crawl only at night.

Right now I know how to start it at specified times with cron and 
tell it to save context before quitting so it restarts where it left 
off.

Current operation chain :

general "do it forever and log" script
   slightly modified rundig.sh
 htdig

I would like to include cron somewhere in there to turn crawling on and off.

Any suggestions ?

TIA
-- 
Franck Horlaville

Athena Online
+212 7 68 28 08
http://www.athena.online.co.ma/
mailto:[EMAIL PROTECTED]
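
One possible way to fit cron into that chain, sketched under assumptions: cron starts the wrapper script in the evening, and the wrapper checks the hour before each pass and stops looping once the night window has closed. The window (22:00 to 06:00), the function name, and the crontab line are all illustrative. How htdig saves its context is left to the existing rundig.sh.

```sh
# Succeeds when the given hour (0-23) falls inside the crawl window,
# here assumed to run from 22:00 up to, but not including, 06:00.
in_crawl_window() {
    hour=$1
    [ "$hour" -ge 22 ] || [ "$hour" -lt 6 ]
}

# In the "do it forever" script, the loop could become something like:
#   while in_crawl_window "$(date +%H)"; do ./rundig.sh; done
# with a crontab entry such as
#   0 22 * * * /path/to/wrapper.sh
# to start it each evening (both paths are placeholders).
```

Note that this only stops between passes. A pass that is still running at 06:00 is not interrupted, so stopping exactly on time would still need the existing save-and-quit mechanism.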






Re: [htdig] httpd Internal Server Error

2000-07-18 Thread Greg Lepore

Gilles et al,

I've isolated the error to the sort by title parameter that I pass along
with the search terms.  When I search with sort by score the results are
returned to the browser in the same time it takes to search by command
line.  When I sort by title - crash-o-rama.  Searching by reverse score
works, but not by time, reverse time, or reverse title.
To sum up, the server will not return an error with sorting by score or
reverse score; any other sorting causes the internal server error,
presumably due to a timeout.
In researching the Premature End of Script Headers problem at the Apache
website, it was pointed out that
"The second most common cause of this is a result of an interaction with
Perl's output buffering... To make Perl flush its buffers after each
output statement...This is generally only necessary when you are calling
external programs from your script that send output to stdout, or if there
will be a long delay between the time the headers are sent and the actual
content starts being emitted... If your script isn't written in Perl, do
the equivalent thing for whatever language you are using (e.g., for C, call
fflush() after writing the headers). "
Might this be relevant?




At 10:34 AM 7/13/00 , Gilles Detillieux wrote:
According to Greg Lepore:
  I need to work on my powers of estimation, the actual command line time
 for a search that returns all pages (112,000) is around 20-25 seconds.  At
 the time of the tests, there was only one install of HTDIG and therefore
 only one database and conf file. No unusual input parameters, basically
 "search everything" with the defaults.

OK, but just to be sure we rule out any input parameter differences, how
about setting the method from POST to GET in the search form, so you can
see the query string, and then calling htsearch from the command line
with the QUERY_STRING environment variable set to the same query string
you saw in the URL in your browser, and the REQUEST_METHOD environment
variable set to GET.  Perhaps try the query that actually works from the
browser and returns the largest number of hits, and compare its timing to
the time it takes from the command line.

Unless you're running on a busy server and the CGI scripts run at a much
higher nice level, I'm at a loss to explain why htsearch takes so much
longer when run as a CGI.

 I am trying to install 3.2b2 but the
 dig time is outrageous, still running after 16 hours versus 5 hours for a
 complete dig with 3.1.3.
  However, searches that return 20,000 hits and over are still giving the
 crash.  I was hoping that upgrading would give me some speed benefits.  Of
 course, I will also install another 128MB of RAM and cross my fingers...

The 3.2 series is supposed to give some speed benefits for searches, at the
cost of longer digging time.

-- 
Gilles R. Detillieux  E-mail: [EMAIL PROTECTED]
Spinal Cord Research Centre   WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:(204)789-3930

Gregory Lepore
Maryland Electronic Capital Webmaster
410-260-6425
[EMAIL PROTECTED]






Re: [htdig] installing htdig for NT

2000-07-18 Thread Anthony Peacock

 According to Jim Kerslake:
  I've been trying the NT binaries supplied by Stephane at
  http://www.htdig.org/files/binaries/ and have fallen at the final small
  hurdle:
  
   April 3rd, 2000
   I had some feedbacks about the fact that htdig contains a path to
  "sort.exe"
   hardly compiled with the binary. So, I added sort.exe with cygwin1.dll.
   Sort.exe should be located in d:\cygnus\cygwin-b20\H-i586-cygwin32\bin\ .
  
  I can't partition my server's disk, so can't put sort.exe into the above
  location.
  So htmerge just dies.
  
  Does this mean that I have to get hold of Cygwin and compile my own NT
  binaries... or is there any quicker work-around ??
 
 It's a shame Stephane built the package with non-standard paths like that
 built-in.  Is there no way in NT of assigning a subdirectory to a particular
 drive letter without repartitioning?  If not, then rebuilding may be your only
 option.

Use the SUBST command.

Open a Command window  (MS/DOS) and type:  HELP SUBST

The command allows you to map a drive letter to a directory.

SUBST d: c:\path\to\sort

Of course, if you want this to be valid every time you log in, you will need to
include the command in your startup files.

---
Anthony Peacock   
CHIME, Royal Free & University College Medical School
WWW:http://www.chime.ucl.ac.uk/~rmhiajp/
"If you needed a personal life, we would have issued you with one."
"Some days it is just not worth gnawing through the restraints."






Re: [htdig] installing htdig for NT

2000-07-18 Thread alan

I have tried editing htmerge.exe so that it points to
c:\cygnus\cygwin-b20\H-i586-cygwin32\bin\ but that doesn't seem to work for
some reason. I made all the necessary directories, but sorting still fails.


-alan



Keith Christian wrote:

 Jim,

 You may want to look at Disk Administrator to see about re-assigning a drive
 letter to a device.  PartitionMagic is able to non-destructively repartition
 drives under most versions of Windows.  One of those two might solve the
 problem.

 Or, use a hex editor and change the drive letter in the executable (*.EXE or
 *.DLL) files.  Do this with care: as usual, make a backup beforehand of any
 files you use a hex editor on.

 =Keith








Re: [htdig] Htmerge: Deleted, invalid

2000-07-18 Thread David Adams

Quoting Gilles Detillieux [EMAIL PROTECTED]:

  
  IRIX 6.5, Htdig 3.1.5
  
  One of the symptoms is that there is no consistency.  Today's re-index
  reported 84 pages to be invalid.  Of these only one was from the
  http://www.tregalic.co.uk/sacred-heart/ site, and this time it was
  churchpage7.html.  And that page is *NOT* found by any search on my index,
  though I can follow links to it from other pages and browse it.
  
  I don't see how you can investigate this yet, but unless people put in
  reports like mine you will always be able to claim the "no-one else
  is having this problem". 
  
  I will continue to look for a pattern which might give a clue. 
 
 I'm inclined to think this is a platform-specific problem.  Most of
 the trouble reports we've seen about IRIX systems are from users who
 can't even get htdig compiled, let alone running, so I don't think the
 package has had a thorough workout under IRIX.  Which compiler did you
 use to build it?
 
 -- 
 Gilles R. Detillieux  E-mail: [EMAIL PROTECTED]
 Spinal Cord Research Centre   WWW:    http://www.scrc.umanitoba.ca/~grdetil
 Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
 Winnipeg, MB  R3E 3J7  (Canada)   Fax:(204)789-3930
 

That is a possibility worth pursuing.

I use the standard MIPSpro compiler.  The script I use (thanks to my former
colleague James Hammick) to set up the Makefile is:

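#!/bin/sh
# Compiler flags for MIPSpro: n32 ABI, MIPS IV instruction set, and
# "-woff all" to suppress most warning messages.
CFLAGS="-woff all -O2 -mips4 -n32 -DHAVE_ALLOCA_H" ; export CFLAGS
CPPFLAGS="-woff all -O2 -mips4 -n32 -DHAVE_ALLOCA_H" ; export CPPFLAGS
# -rpath lets the binaries find the compression library at run time.
LDFLAGS="-mips4 -L/usr/lib32 -rpath /opt/local/htdig-3.1.5/lib" ; export LDFLAGS
# Site-specific install locations.
./configure --prefix=/opt/local/htdig-3.1.5 \
  --with-cgi-bin-dir=/opt/local/htdig-3.1.5/cgi-bin \
  --with-image-dir=/opt/local/htdig-3.1.5/graphics \
  --with-search-dir=/opt/local/htdig-3.1.5/htdocs/sample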

A lot of that is site-specific, and the "-rpath directory" option is only
needed because the compression library is not in a standard place on the 
machine on which htdig is run.

The "-woff all" option suppresses most warning messages.  I will remove it,
recompile htdig and send the result directly to Gilles, it might contain a clue.

--
David Adams
[EMAIL PROTECTED]
Computing Services
Southampton University






[htdig] Solaris Compile Problems

2000-07-18 Thread Chad Phillips

I just upgraded my OS from Solaris 2.6 to 8, and now I can't compile HtDig.
When I run configure it starts fine but then gives me this error:

checking for fstream.h... no
configure: error: To compile ht://Dig, you will need a C++ library. Try installing libstdc++.

I installed packages for gcc and libstdc++:
libstdc++-2.8.1.1-sol7-sparc-local
GCC281

Any ideas would be greatly appreciated.

thanks
chad







Re: [htdig] installing htdig for NT

2000-07-18 Thread Jim Kerslake

Many thanks for all of the help and advice -
I feel like I've been on a bizarre odyssey these past 4 days, trying to
persuade htdig to cooperate with my various Win32 systems.  The mailing-list
documentation has been very helpful - but very scattered (particularly when
you're reading the cached versions from Google, since I seem to have picked
a time when www.htdig.org was temporarily unavailable !!)
Perhaps a generic "guide to htdig on NT" would be useful?

Here are my experiences anyway - I'm 95% there, with one problem left - any NT
gurus out there might be able to help...?


Firstly - I bit the bullet in the end and installed Cygwin.
Glad I did really - I hadn't appreciated how nice it was!!

So, on my home Win98 / PWS system, I compiled htdig under Cygwin (having
edited the appropriate file paths into the Makefiles) and got it running
perfectly.
The PDF parser also works fine - I installed the Cygwin version of perl5.6,
and used it to run conv_doc.pl  - which in turn points to the Win32 ports of
the xpdf utilities.
All works perfectly.

Inspired by that, I repeated the process exactly at work, on my NT
workstation.
Compiled htdig under Cygwin, and got it working fine.
The PDF parser was more troublesome here - it generated multiple errors
saying that the PDF files were corrupted.
I took note of the previous mailing-list advice to edit
/htdig/ExternalParser.cc  adding a "b" thus:
FILE *fl = fopen(path, "wb");
and that reduced the errors a bit, but did not eliminate them by any means.
Unfortunately, the workload of parsing these big PDFs just brought my NT
machine to a near-standstill (funny how that didn't happen in Win98), so I
gave up on the PDF aspect at this point.
Still, htdig and htsort worked fine.

Then, I tried moving the system onto the WinNT server, which is where I most
need them.  Installing Cygwin on this box is probably not an option, so I
just copied those .exe's  which I had compiled on NT Workstation  (plus the
two necessary Cygwin .dll's ) into the same locations.

Trouble at this point:
htdig.exe works fine, as does the htsearch CGI end of things.
But htmerge.exe just bombs out with the error: "Word sort failed".

sort.exe was there, of course, in the correct location, as compiled in to
/htmerge/Makefile :
LOCAL_DEFINES= -DSORT_PROG=\"C:/htdig/bin/sort.exe\"

Reading the FAQ pointed the finger of suspicion at the use of temporary
filespace for the word-sort.
So I attempted various tricks (picked up from archives of this mailing list)
to specify a temp location:

(i) tried to specify an environment variable in a batch file:
SET TMPDIR=C:\Temp
(ii) tried to hard-code the location of the temp space, in
/htmerge/words.cc:
String tmpdir = "c:/Temp";
(iii) tried removing the -T flag in /htmerge/words.cc  so that:
command << " -T " << tmpdir;    became    command << tmpdir;
(iv) tried compiling the GNU sort (from textutils) under Cygwin, and using
this as my sort.exe  (rather than the original Cygwin one)

Having tried various permutations of these (and re-making htmerge about a
dozen times), I never once got htmerge to work on the NT server (though it
usually did on my NT workstation).
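
Since htmerge hands the word list to an external sort with a -T temporary directory, one way to narrow this down is to run sort by hand against the same directory, outside htmerge. A small sketch for a Cygwin or Unix shell follows. The function name is illustrative, and it assumes a sort that accepts -T (GNU sort does).

```sh
# Check that a directory can serve as sort's temporary space:
# it must exist, be writable, and "sort -T" must run cleanly against it.
check_sort_tmpdir() {
    dir=$1
    if [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
        echo "not a writable directory: $dir" >&2
        return 1
    fi
    printf 'b\na\nc\n' | sort -T "$dir" > /dev/null
}

# Example: check_sort_tmpdir "$TMPDIR" && echo "sort runs with this temp dir"
```

A tiny input like this will not force sort to spill to disk, so a pass here only rules out the basics (missing directory, permissions, sort.exe failing to start). Feeding it a large file would exercise the temporary space properly.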

So I'm wondering if I'm missing anything blindingly obvious about NTserver's
handling of temp filespace? I'm totally clueless about NT and its filesystem
/ permissions structure - do I have to do anything else to persuade NTserver
to allow sort.exe to run successfully?  I'm doing all this logged in as the
server administrator (what a joke!).

Cheers,
Jim







Re: [htdig] installing htdig for NT

2000-07-18 Thread alan

If anyone has gotten ht://Dig installed and configured properly on NT, then
please let me know. Thanks.


-alan


