Re: [htdig] No Excerpt Error
Hi Gilles,

Thanks for your message. The PDF files were exported from QuarkXPress, I believe. I've extracted the PostScript, and all the appropriate strings seem to be there:

    (geophysics, German, human geography, modern history, palaeontology,)Tj
    -0.00279 Tc 0.06298 Tw 1.16668 TL T*
    (palaeobiology, politics, resource and environmental management, Slavonic\ languages)Tj
    -0.00329 Tc 0.07339 Tw 1.08329 TL T*

etc. etc. I'll try using xpdf!

Cheers,
Paul

To unsubscribe from the htdig mailing list, send a message to [EMAIL PROTECTED] You will receive a message to confirm this.
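For anyone who wants to list such strings rather than eyeball them, here is a minimal sketch in Python. It assumes uncompressed text in the form quoted above, where each string in parentheses is followed by Tj. The function name is made up for illustration, and the pattern is deliberately simple: it leaves escape sequences untouched and ignores TJ arrays, hex strings and nested parentheses.

```python
import re

# Matches a literal string operand followed by the Tj (show text) operator.
# Escaped characters (backslash + any character) are allowed inside it.
TJ_STRING = re.compile(r"\(((?:\\.|[^\\()])*)\)\s*Tj", re.DOTALL)


def shown_strings(content):
    """Return the raw literal strings passed to Tj in `content`.

    Hypothetical helper: escape sequences are returned exactly as written,
    and TJ arrays, hex strings and nested parentheses are not handled.
    """
    return TJ_STRING.findall(content)
```

Calling shown_strings() on the extracted text gives one list entry per Tj string, which makes it easier to check whether the words missing from the excerpts are present in the file at all.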
[htdig] Error in ./configure
Hi,

When doing a ./configure on the 3.1.5 version of Ht://Dig I constantly get this error:

    sed: file conftest.s1 line 3: Unterminated `s' command

Which results in no Makefile and CONFIG being generated so I can't 'make' the binary. I suspect that I'm missing some programs that are required and thus the error, but I have no idea what programs. I also cannot find the conftest.s1 file mentioned so I have no clue how to correct this error.

Please help me with this as I find Ht://Dig very useful and enjoyable and intend to use it on my website (www.hmmm.is)

Best regards and thanks in advance for your help,
S. R. Oddsson
Reykjavik, Iceland
[htdig] Fw: Error in ./configure
I'm sorry to have to send another e-mail but I figured I should include the output of ./configure in my mail. The errors are not included as they are printed to the console.

Regards,
S. R. Oddsson

----- Original Message -----
From: "Sigfus Oddsson" [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Sunday, July 16, 2000 6:07 PM
Subject: Error in ./configure

Hi,

When doing a ./configure on the 3.1.5 version of Ht://Dig I constantly get this error:

    sed: file conftest.s1 line 3: Unterminated `s' command

Which results in no Makefile and CONFIG being generated so I can't 'make' the binary. I suspect that I'm missing some programs that are required and thus the error, but I have no idea what programs. I also cannot find the conftest.s1 file mentioned so I have no clue how to correct this error.

Please help me with this as I find Ht://Dig very useful and enjoyable and intend to use it on my website (www.hmmm.is)

Best regards and thanks in advance for your help,
S. R. Oddsson
Reykjavik, Iceland

log
Re: [htdig] httpd Internal Server Error
According to Greg Lepore:
> I've isolated the error to the sort by title parameter that I pass along with the search terms. When I search with sort by score the results are returned to the browser in the same time it takes to search by command line. When I sort by title - crash-o-rama. Searching by reverse score works, but not by time, reverse time, or reverse title. To sum up, the server will not return an error with sorting by score or reverse score; any other sorting causes the internal server error, presumably due to a timeout.

OK, that makes sense. See http://www.htdig.org/FAQ.html#q5.10

> In researching the Premature End of Script Headers problem at the Apache website, it was pointed out that "The second most common cause of this is a result of an interaction with Perl's output buffering. To make Perl flush its buffers after each output statement... This is generally only necessary when you are calling external programs from your script that send output to stdout, or if there will be a long delay between the time the headers are sent and the actual content starts being emitted... If your script isn't written in Perl, do the equivalent thing for whatever language you are using (e.g., for C, call fflush() after writing the headers)." Might this be relevant?

No, it's not likely to be a buffering problem. htsearch doesn't start outputting anything until it's processed and sorted all matches, and after that the time to output is actually quite small. The problem is that when there are a lot of hits, the time to process them can be quite long, especially when htsearch must fetch the db.docdb record for each match (rather than for just the few it actually displays).

> At 10:34 AM 7/13/00, Gilles Detillieux wrote:
> > According to Greg Lepore:
> > > I need to work on my powers of estimation, the actual command line time for a search that returns all pages (112,000) is around 20-25 seconds. At the time of the tests, there was only one install of HTDIG and therefore only one database and conf file. No unusual input parameters, basically "search everything" with the defaults.
> >
> > OK, but just to be sure we rule out any input parameter differences, how about setting the method from POST to GET in the search form, so you can see the query string, and then calling htsearch from the command line with the QUERY_STRING environment variable set to the same query string you saw in the URL in your browser, and the REQUEST_METHOD environment variable set to GET. Perhaps try the query that actually works from the browser and returns the largest number of hits, and compare its timing to the time it takes from the command line. Unless you're running on a busy server and the CGI scripts run at a much higher nice level, I'm at a loss to explain why htsearch takes so much longer when run as a CGI.
> >
> > > I am trying to install 3.2b2 but the dig time is outrageous, still running after 16 hours versus 5 hours for a complete dig with 3.1.3. However, searches that return 20,000 hits and over are still giving the crash. I was hoping that upgrading would give me some speed benefits. Of course, I will also install another 128MB of RAM and cross my fingers...
> >
> > The 3.2 series is supposed to give some speed benefits for searches, at the cost of longer digging time.

--
Gilles R. Detillieux              E-mail: [EMAIL PROTECTED]
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
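The command-line comparison described in this message (htsearch run by hand with QUERY_STRING and REQUEST_METHOD set) can be scripted so that the same query is timed repeatedly. This is only a sketch, assuming Python 3 is available on the server. The function names are made up for illustration, and the htsearch path and query string in the usage note are placeholders, not values from this thread.

```python
import os
import subprocess
import time


def cgi_env(query_string):
    """Return a copy of the current environment with the two variables a
    CGI program reads for a GET request: REQUEST_METHOD and QUERY_STRING."""
    env = dict(os.environ)
    env["REQUEST_METHOD"] = "GET"
    env["QUERY_STRING"] = query_string
    return env


def time_cgi(command, query_string):
    """Run `command` (a list of arguments) with cgi_env(query_string).

    Returns (elapsed_seconds, exit_code, bytes_of_output). The output is
    captured rather than printed, so only its size is reported.
    """
    start = time.perf_counter()
    result = subprocess.run(command, env=cgi_env(query_string),
                            stdout=subprocess.PIPE, check=False)
    elapsed = time.perf_counter() - start
    return elapsed, result.returncode, len(result.stdout)
```

A call might look like time_cgi(["/path/to/cgi-bin/htsearch"], "words=example"), with the path replaced by the real location of htsearch and the query string copied from the browser's address bar after switching the form to GET. Comparing the elapsed time for a sort-by-score query against a sort-by-title query should show whether the difference is in htsearch itself or in how the web server runs it.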
Re: [htdig] Htmerge: Deleted, invalid
According to [EMAIL PROTECTED]:
> > > I think there is a bug in htmerge 3.1.5 which causes it to declare some pages as "invalid" in some cases.
> >
> > That may be, but I want to be sure we've ruled out every other possibility first. I've never seen a bug report like this, so it would be very unusual if it is indeed a bug showing up in your case, but not for other users. If you can find a consistent test case that fails on an initial dig, please provide details on your OS, version, config, etc. so that we can look into this further.
>
> IRIX 6.5, Htdig 3.1.5
>
> One of the symptoms is that there is no consistency. Today's re-index reported 84 pages to be invalid. Of these only one was from the http://www.tregalic.co.uk/sacred-heart/ site, and this time it was churchpage7.html. And that page is *NOT* found by any search on my index, though I can follow links to it from other pages and browse it.
>
> I don't see how you can investigate this yet, but unless people put in reports like mine you will always be able to claim that "no-one else is having this problem". I will continue to look for a pattern which might give a clue.

I'm inclined to think this is a platform-specific problem. Most of the trouble reports we've seen about IRIX systems are from users who can't even get htdig compiled, let alone running, so I don't think the package has had a thorough workout under IRIX. Which compiler did you use to build it?

--
Gilles R. Detillieux              E-mail: [EMAIL PROTECTED]
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
[htdig] start/stop/restart
Hi !

I'd like to set htdig to crawl only at night. Right now I know how to start it at specified times with cron and tell it to save context before quitting so it restarts where it left off.

Current operation chain:
general "do it forever and log" script -> slightly modified rundig.sh -> htdig

I would like to include cron somewhere in there to turn crawling on and off. Any suggestions?

TIA
--
Franck Horlaville
Athena Online
+212 7 68 28 08
http://www.athena.online.co.ma/
mailto:[EMAIL PROTECTED]
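One way to approach the question above is to let the outer "do it forever" script decide, before each pass, whether it is currently night. A minimal sketch in Python follows, under stated assumptions: the function names and the 22:00 to 06:00 window are invented for illustration, and the check only prevents a new pass from starting; it does not interrupt a dig that is already running.

```python
from datetime import datetime, time


def in_window(now, start, end):
    """Return True if the time-of-day `now` falls inside [start, end).

    A window whose start is later than its end wraps past midnight,
    e.g. 22:00 to 06:00.
    """
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def may_crawl(moment=None, start=time(22, 0), end=time(6, 0)):
    """Decide whether a new crawl pass may start at `moment` (default: now).

    The default window is an assumed example, not a recommendation.
    """
    moment = moment or datetime.now()
    return in_window(moment.time(), start, end)
```

The wrapper script could call may_crawl() before launching rundig.sh and sleep for a while when it returns False. Alternatively, cron could start the wrapper in the evening and the wrapper could exit on its own once may_crawl() turns False.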
Re: [htdig] httpd Internal Server Error
Gilles et al,

I've isolated the error to the sort by title parameter that I pass along with the search terms. When I search with sort by score the results are returned to the browser in the same time it takes to search by command line. When I sort by title - crash-o-rama. Searching by reverse score works, but not by time, reverse time, or reverse title. To sum up, the server will not return an error with sorting by score or reverse score; any other sorting causes the internal server error, presumably due to a timeout.

In researching the Premature End of Script Headers problem at the Apache website, it was pointed out that "The second most common cause of this is a result of an interaction with Perl's output buffering. To make Perl flush its buffers after each output statement... This is generally only necessary when you are calling external programs from your script that send output to stdout, or if there will be a long delay between the time the headers are sent and the actual content starts being emitted... If your script isn't written in Perl, do the equivalent thing for whatever language you are using (e.g., for C, call fflush() after writing the headers)." Might this be relevant?

At 10:34 AM 7/13/00, Gilles Detillieux wrote:
> According to Greg Lepore:
> > I need to work on my powers of estimation, the actual command line time for a search that returns all pages (112,000) is around 20-25 seconds. At the time of the tests, there was only one install of HTDIG and therefore only one database and conf file. No unusual input parameters, basically "search everything" with the defaults.
>
> OK, but just to be sure we rule out any input parameter differences, how about setting the method from POST to GET in the search form, so you can see the query string, and then calling htsearch from the command line with the QUERY_STRING environment variable set to the same query string you saw in the URL in your browser, and the REQUEST_METHOD environment variable set to GET. Perhaps try the query that actually works from the browser and returns the largest number of hits, and compare its timing to the time it takes from the command line. Unless you're running on a busy server and the CGI scripts run at a much higher nice level, I'm at a loss to explain why htsearch takes so much longer when run as a CGI.
>
> > I am trying to install 3.2b2 but the dig time is outrageous, still running after 16 hours versus 5 hours for a complete dig with 3.1.3. However, searches that return 20,000 hits and over are still giving the crash. I was hoping that upgrading would give me some speed benefits. Of course, I will also install another 128MB of RAM and cross my fingers...
>
> The 3.2 series is supposed to give some speed benefits for searches, at the cost of longer digging time.
>
> --
> Gilles R. Detillieux              E-mail: [EMAIL PROTECTED]
> Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
> Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
> Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

Gregory Lepore
Maryland Electronic Capital Webmaster
410-260-6425
[EMAIL PROTECTED]
Re: [htdig] installing htdig for NT
> According to Jim Kerslake:
> > I've been trying the NT binaries supplied by Stephane at http://www.htdig.org/files/binaries/ and have fallen at the final small hurdle:
> >
> >     April 3rd, 2000 I had some feedbacks about the fact that htdig contains a path to "sort.exe" hardly compiled with the binary. So, I added sort.exe with cygwin1.dll. Sort.exe should be located in d:\cygnus\cygwin-b20\H-i586-cygwin32\bin\ .
> >
> > I can't partition my server's disk, so can't put sort.exe into the above location. So htmerge just dies. Does this mean that I have to get hold of Cygwin and compile my own NT binaries... or is there any quicker work-around ??
>
> It's a shame Stephane built the package with non-standard paths like that built-in. Is there no way in NT of assigning a subdirectory to a particular drive letter without repartitioning? If not, then rebuilding may be your only option.

Use the SUBST command. Open a Command window (MS/DOS) and type:

    HELP SUBST

The command allows you to map a drive letter to a directory:

    SUBST d: c:\path\to\sort

The directory you map must then contain the cygnus\cygwin-b20\H-i586-cygwin32\bin\ subtree, so that sort.exe ends up at the path htmerge expects. Of course, if you want this to be valid every time you log in, you will need to include the command in your startup files.

---
Anthony Peacock
CHIME, Royal Free University College Medical School
WWW: http://www.chime.ucl.ac.uk/~rmhiajp/
"If you needed a personal life, we would have issued you with one."
"Some days it is just not worth gnawing through the restraints."
Re: [htdig] installing htdig for NT
I have tried editing htmerge.exe so that it points to c:\cygnus\cygwin-b20\H-i586-cygwin32\bin\ but that doesn't seem to work for some reason... I made all the necessary directories but sorting still fails...

-alan

Keith Christian wrote:
> Jim,
>
> You may want to look at Disk Administrator to see about re-assigning a drive letter to a device. PartitionMagic is able to non-destructively repartition drives under most versions of Windows. One of those two might solve the problem.
>
> Or, use a hex editor and change the drive letter in the executable (*.EXE or *.DLL) files (do this with care - as usual, make a backup beforehand of any files you use a hex editor on.)
>
> =Keith
Re: [htdig] Htmerge: Deleted, invalid
Quoting Gilles Detillieux [EMAIL PROTECTED]:
> > IRIX 6.5, Htdig 3.1.5
> >
> > One of the symptoms is that there is no consistency. Today's re-index reported 84 pages to be invalid. Of these only one was from the http://www.tregalic.co.uk/sacred-heart/ site, and this time it was churchpage7.html. And that page is *NOT* found by any search on my index, though I can follow links to it from other pages and browse it.
> >
> > I don't see how you can investigate this yet, but unless people put in reports like mine you will always be able to claim that "no-one else is having this problem". I will continue to look for a pattern which might give a clue.
>
> I'm inclined to think this is a platform-specific problem. Most of the trouble reports we've seen about IRIX systems are from users who can't even get htdig compiled, let alone running, so I don't think the package has had a thorough workout under IRIX. Which compiler did you use to build it?
>
> --
> Gilles R. Detillieux              E-mail: [EMAIL PROTECTED]
> Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
> Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
> Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

That is a possibility worth pursuing. I use the standard MIPSpro compiler. The script I use (thanks to my former colleague James Hammick) to set up the Makefile is:

    #!/bin/sh
    CFLAGS="-woff all -O2 -mips4 -n32 -DHAVE_ALLOCA_H" ; export CFLAGS
    CPPFLAGS="-woff all -O2 -mips4 -n32 -DHAVE_ALLOCA_H" ; export CPPFLAGS
    LDFLAGS="-mips4 -L/usr/lib32 -rpath /opt/local/htdig-3.1.5/lib"; export LDFLAGS
    ./configure --prefix=/opt/local/htdig-3.1.5 \
        --with-cgi-bin-dir=/opt/local/htdig-3.1.5/cgi-bin \
        --with-image-dir=/opt/local/htdig-3.1.5/graphics \
        --with-search-dir=/opt/local/htdig-3.1.5/htdocs/sample

A lot of that is site-specific, and the "-rpath directory" option is only needed because the compression library is not in a standard place on the machine on which htdig is run. The "-woff all" option suppresses most warning messages. I will remove that option, recompile htdig and send the result directly to Gilles; it might contain a clue.

--
David Adams [EMAIL PROTECTED]
Computing Services
Southampton University
[htdig] Solaris Compile Problems
I just upgraded my OS from Solaris 2.6 to 8, and now I can't compile HtDig. When I run configure it starts fine but then gives me this error:

    checking for fstream.h... no
    configure: error: To compile ht://Dig, you will need a C++ library. Try installing libstdc++.

I installed packages for gcc and libstdc++:

    libstdc++-2.8.1.1-sol7-sparc-local
    GCC281

Any ideas would be greatly appreciated.

thanks
chad
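Since configure reports that it cannot see fstream.h, a first diagnostic step is to find out whether the installed packages put that header anywhere on the machine. This is a minimal sketch in Python, not a fix: the function name is made up, and the directories in the usage note are guesses at where locally installed packages might live, not paths taken from this message.

```python
import os


def find_header(name, roots):
    """Walk each directory in `roots` and return the full path of every
    file called `name`. Directories that do not exist or cannot be read
    are skipped silently."""
    matches = []
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            if name in filenames:
                matches.append(os.path.join(dirpath, name))
    return matches
```

For example, find_header("fstream.h", ["/usr/local", "/opt"]) lists every copy found under those two trees. An empty list suggests the header was never installed. A non-empty list suggests the compiler that configure picks up is simply not searching that directory, in which case it may help to check which compiler comes first in PATH.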
Re: [htdig] installing htdig for NT
Many thanks for all of the help and advice - I feel like I've been on a bizarre odyssey these past 4 days, trying to persuade htdig to cooperate with my various Win32 systems. The mailing-list documentation has been very helpful - but very scattered (particularly when you're reading the cached versions from Google, since I seem to have picked a time when www.htdig.org was temporarily unavailable !!) Perhaps a generic "guide to htdig on NT" would be useful?

Here are my experiences anyway - I'm 95% there, with one problem left - any NT gurus out there might be able to help...?

Firstly - I bit the bullet in the end and installed Cygwin. Glad I did really - I hadn't appreciated how nice it was!! So, on my home Win98 / PWS system, I compiled htdig under Cygwin (having edited the appropriate file paths into the Makefiles) and got it running perfectly. The PDF parser also works fine - I installed the Cygwin version of perl5.6, and used it to run conv_doc.pl - which in turn points to the Win32 ports of the xpdf utilities. All works perfectly.

Inspired by that, I repeated the process exactly at work, on my NT workstation. Compiled htdig under Cygwin, and got it working fine. The PDF parser was more troublesome here - it generated multiple errors saying that the PDF files were corrupted. I took note of the previous mailing-list advice to edit /htdig/ExternalParser.cc adding a "b" thus:

    FILE *fl = fopen(path, "wb");

and that reduced the errors a bit, but did not eliminate them by any means. Unfortunately, the workload of parsing these big PDFs just brought my NT machine to a near-standstill (funny how that didn't happen in Win98), so I gave up on the PDF aspect at this point. Still, htdig and htsort worked fine.

Then, I tried moving the system onto the WinNT server, which is where I most need them. Installing Cygwin on this box is probably not an option, so I just copied those .exe's which I had compiled on NT Workstation (plus the two necessary Cygwin .dll's) into the same locations.

Trouble at this point: htdig.exe works fine, as does the htsearch CGI end of things. But htmerge.exe just bombs out with the error: "Word sort failed". sort.exe was there, of course, in the correct location, as compiled in to /htmerge/Makefile:

    LOCAL_DEFINES= -DSORT_PROG=\"C:/htdig/bin/sort.exe\"

Reading the FAQ pointed the finger of suspicion at the use of temporary filespace for the word-sort. So I attempted various tricks (picked up from archives of this mailing list) to specify a temp location:

(i) tried to specify an environment variable in a batch file:
    SET TMPDIR=C:\Temp
(ii) tried to hard-code the location of the temp space, in /htmerge/words.cc:
    String tmpdir = "c:/Temp";
(iii) tried removing the -T flag in /htmerge/words.cc so that:
    command << " -T " << tmpdir;
became
    command << tmpdir;
(iv) tried compiling the GNU sort (from textutils) under Cygwin, and using this as my sort.exe (rather than the original Cygwin one)

Having tried various permutations of these (and re-making htmerge about a dozen times), I never once got htmerge to work on the NT server (though it usually did on my NT workstation).

So I'm wondering if I'm missing anything blindingly obvious about NT server's handling of temp filespace? I'm totally clueless about NT and its filesystem / permissions structure - do I have to do anything else to persuade NT server to allow sort.exe to run successfully? I'm doing all this logged in as the server administrator (what a joke!).

Cheers,
Jim
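The temp-space suspicion above can be tested outside htmerge by checking two things separately: whether the candidate directory is writable by the current user, and whether sort.exe runs at all when handed that directory with -T. The following is a rough sketch in Python, assuming a Python interpreter is available on the machine being tested; both function names are invented for illustration and neither is part of ht://Dig.

```python
import os
import subprocess
import tempfile


def temp_dir_usable(path):
    """Return True if `path` is an existing directory in which the current
    user can create, write and remove a file."""
    if not os.path.isdir(path):
        return False
    try:
        with tempfile.NamedTemporaryFile(dir=path) as probe:
            probe.write(b"probe")
            probe.flush()
    except OSError:
        return False
    return True


def try_sort(sort_prog, temp_dir, lines):
    """Run `sort_prog -T temp_dir` on `lines` (a list of str) and return
    the sorted lines. Raises CalledProcessError if sort exits with an
    error, with sort's own message available in the exception's stderr."""
    result = subprocess.run([sort_prog, "-T", temp_dir],
                            input="\n".join(lines) + "\n",
                            capture_output=True, text=True, check=True)
    return result.stdout.splitlines()
```

With the paths mentioned in this message, the calls would be temp_dir_usable("c:/Temp") and try_sort("C:/htdig/bin/sort.exe", "c:/Temp", ["b", "a"]). If the first returns False, the problem is with the directory or its permissions. If the second raises an error, sort.exe itself (or a missing Cygwin DLL) is the more likely culprit.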
Re: [htdig] installing htdig for NT
If anyone has gotten ht://Dig installed and configured properly on NT then please let me know... thx

-alan