Dear list, I recently had my hosting provider install the Htdig search engine for our server which runs the Apache web server on Linux. They installed the search engine as root and I had to get them to provide permissions for all the files.
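When files are installed as root and permissions are adjusted afterwards, it can help to check access from the point of view of an unrelated account, since a CGI program usually does not run as the account that owns the files. The following is a minimal sketch, not part of ht://Dig: the function name `world_access_problems` and the path in the usage line are hypothetical, and it assumes the web server account falls into the "other" permission class (neither owner nor group member).

```python
import os
import stat


def world_access_problems(path):
    """Return reasons why an unrelated user could not read `path`.

    Only the "other" permission bits are inspected, so the owner or a
    member of the file's group may still have access.
    """
    problems = []
    path = os.path.abspath(path)

    # Collect every ancestor directory, from the file's parent up to the root.
    parent = os.path.dirname(path)
    ancestors = []
    while True:
        ancestors.append(parent)
        up = os.path.dirname(parent)
        if up == parent:
            break
        parent = up

    # Each ancestor needs the "other" execute (search) bit to be traversed.
    for directory in reversed(ancestors):
        try:
            mode = os.stat(directory).st_mode
        except OSError as exc:
            problems.append(f"{directory}: {exc.strerror}")
            return problems
        if not mode & stat.S_IXOTH:
            problems.append(f"{directory}: not searchable by other users")

    # The file itself needs the "other" read bit.
    try:
        mode = os.stat(path).st_mode
    except OSError as exc:
        problems.append(f"{path}: {exc.strerror}")
        return problems
    if not mode & stat.S_IROTH:
        problems.append(f"{path}: not readable by other users")
    return problems


if __name__ == "__main__":
    # Hypothetical location; substitute the real path of the file to check.
    for line in world_access_problems("/opt/www/htdig/conf/htdig.conf"):
        print(line)
```

An empty result only means the permission bits look open; it does not prove that the program is looking for the file at that path in the first place.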
I ran rundig and it appeared to create the db files, but when I use the form in my browser to pull up results, I get:

------------------------------------------
ht://Dig error

htsearch detected an error. Please report this to the webmaster of this site. The error message is:

Unable to read configuration file
------------------------------------------

I assumed that the configuration file is "htdig.conf", but that file looks fine: I can read it and its permissions are already set for my account. Is there another configuration file that it is not reading?

Please note that I had to create a symlink in my cgi-bin to the htsearch binary at /opt/www/cgi-bin/htsearch. All the permissions are fine. The form to do a search is at http://uahc.org/htdig.html.

Also, are there any consultants on this list who can help troubleshoot further problems like the one above?

Jonathan Lam
<http://uahc.org>

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, March 13, 2002 4:35 AM
To: [EMAIL PROTECTED]
Subject: htdig-general digest, Vol 1 #551 - 6 msgs

Send htdig-general mailing list submissions to [EMAIL PROTECTED]

To subscribe or unsubscribe via the World Wide Web, visit https://lists.sourceforge.net/lists/listinfo/htdig-general or, via email, send a message with subject or body 'help' to [EMAIL PROTECTED]

You can reach the person managing the list at [EMAIL PROTECTED]

When replying, please edit your Subject line so it is more specific than "Re: Contents of htdig-general digest..."

Today's Topics:

1. Re: Help building a select list (Gilles Detillieux)
2. Re: recreate url list (Gilles Detillieux)
3. Re: Problem with Foreign Chars (Swedish) (Gilles Detillieux)
4. No Navegue Mas!!! (Luis)
5. RE: Deleted, no excerpt with pdf files (Steve Marshall)
6. Re: Deleted, no excerpt with pdf files (David Adams)

--__--__--

Message: 1
From: Gilles Detillieux <[EMAIL PROTECTED]>
Subject: Re: [htdig] Help building a select list
To: [EMAIL PROTECTED]
Date: Tue, 12 Mar 2002 17:51:41 -0600 (CST)
Cc: [EMAIL PROTECTED] (ht://Dig mailing list)

According to [EMAIL PROTECTED]:
> Thank you Gilles for your response. I made the changes you suggested and it
> still does not work. I wonder if it is the combination of the two lists I
> built that is the problem.
>
> build_select_lists: RESTRICT_LIST restrict restrict_names 2 1 2 restrict "" \
>                     EXCLUDE_LIST,checkbox exclude exclude_names 2 1 2 exclude ""
>
> restrict_names: "" "Austin City Connection" \
>                 "http://www.ci.austin.tx.us/budget/" "Budget" \
>                 "http://www.ci.austin.tx.us/council/" "Council" \
>                 "http://www.ci.austin.tx.us/library/" "Library" \
>                 "http://www.ci.austin.tx.us/minutes/" "Minutes" \
>                 "http://www.ci.austin.tx.us/news/" "News" \
>                 "http://www.ci.austin.tx.us/police/" "Police" \
>                 "http://www.ci.austin.tx.us/sws/" "Solid Waste Services" \
>                 "http://www.ci.austin.tx.us/watershed/" "Watershed Protection"
>
> exclude_names: "http://www.ci.austin.tx.us/agenda/" "Council Agenda" \
>                "http://www.ci.austin.tx.us/council/" "Council Transcripts" \
>                "http://www.ci.austin.tx.us/minutes/" "Council Minutes" \
>                "http://www.ci.austin.tx.us/news/" "News"
>
> My form for the html is as follows:
>
> <form method="get" action="$(CGI)"><input type="hidden" name="config" value="$&(CONFIG)">
> <table bgcolor="#BCB6A0" cellpadding="2" cellspacing="1">
> <tr>
> <td bgcolor="#EAE8DC" width="96"><p>Search for:</p></td>
> <td bgcolor="#EAE8DC" width="208">
> <input type="text" size="25" name="words" value="$(WORDS)"></td>
> <td colspan="2" align="center" valign="middle"><p><input type="submit" value="Search">
> </p></td>
> </tr>
> <tr>
> <td bgcolor="#EAE8DC"><p>Search in:</p></td>
> <td bgcolor="#EAE8DC" colspan="3">$(RESTRICT_LIST)</td>
> </tr>
> <tr>
> <td bgcolor="#EAE8DC"><p>Match:</p></td>
> <td bgcolor="#EAE8DC">$(METHOD)</td>
> <td bgcolor="#EAE8DC" width="96"><p>Format:</p></td>
> <td bgcolor="#EAE8DC" width="207">$(FORMAT)</td>
> </tr>
> <tr>
> <td colspan="4" bgcolor="#EAE8DC"><p>
> If you elect to search all of Austin City Connection, you may want
> to exclude one or more of the following directories. These large directories
> make it difficult to target documents.</p></td>
> </tr>
> <tr>
> <td bgcolor="#EAE8DC" valign="top"><p>Exclude:</p></td>
> <td bgcolor="#EAE8DC" valign="top" colspan="3">$(EXCLUDE_LIST)</form></td>
> </tr>
>
> The Restrict list works, the Exclude list does not.
>
> Thanks again for your help. I have the search engine so close to working
> like I want it.

All the above looks fine to me. I have to ask, though, are you sure you're running version 3.1.6 of htsearch? 3.1.5 doesn't support the extensions to build_select_lists for checkboxes, radio buttons and select multiple without a patch.

--
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

--__--__--

Message: 2
From: Gilles Detillieux <[EMAIL PROTECTED]>
Subject: Re: [htdig] recreate url list
To: [EMAIL PROTECTED] (Gabriele Bartolini)
Date: Tue, 12 Mar 2002 18:05:59 -0600 (CST)
Cc: [EMAIL PROTECTED], [EMAIL PROTECTED] (htdig-general)

According to Gabriele Bartolini:
> At 13.44 12/03/2002 +0200, Greg wrote:
> >I am running htdig 3.1.4 and want to do a re-index of the existing URL list
> >within the current database. The conf file no longer contains the original
> >URL list. Is there a way to redig the existing URL list without the
> >start_url list and, if possible, place the new db into a separate set of
> >files so that general access is not affected? If not, how do I extract the
> >existing url list? Please can you be very specific as I am not at all
> >familiar with htdig command syntax.
>
> Ciao,
>
> please guys correct me if I am wrong, but I think that you Greg should
> probably switch to the 3.1.6 version if you can. It should be almost
> painless. It's just a consideration I am doing now ... :-)
>
> Geoff, Gilles & co, is the 3.1.4 database compatible with the 3.1.6
> version?

Yes, 3.1.6 should be able to handle a 3.1.4 database without any difficulty. 3.1.6 also includes an htdump utility to extract the whole document database as an ASCII file. You could probably fairly easily extract the list of URLs from the db.docs file produced by htdump, using an awk/sed/perl script.

See http://www.htdig.org/ for all ht://Dig documentation, including syntax for individual commands. See the documentation for awk, sed or Perl for information about how you could use one of these to strip out the URLs from db.docs. Something like "sed -n 's/^.* u:\([^ ]*\) .*/\1/p' db.docs" would probably do it, where the spaces in the s/// command are actually tab characters.

--
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

--__--__--

Message: 3
From: Gilles Detillieux <[EMAIL PROTECTED]>
Subject: Re: [htdig] Problem with Foreign Chars (Swedish)
To: [EMAIL PROTECTED] (Stefan Wold)
Date: Tue, 12 Mar 2002 18:07:46 -0600 (CST)
Cc: [EMAIL PROTECTED] (htdig-general)

According to Stefan Wold:
> I'm running htdig 3.1.6 on Linux. When I use rundig to create the
> database for a website it indexes it correctly except that it doesn't take
> ANY foreign chars (Swedish) at all; neither åäö nor ÅÄÖ can be found in the
> db.wordlist. It seems to skip the whole word if it contains a Swedish
> char. I have tried with different locale settings before running rundig
> without any luck.
>
> Anyone had this kind of problem?

Lots of people do! See http://www.htdig.org/FAQ.html#q5.8
I added a couple paragraphs to it this morning.

--
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

--__--__--

Message: 5
Reply-To: <[EMAIL PROTECTED]>
From: "Steve Marshall" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>, "'David Adams'" <[EMAIL PROTECTED]>
Subject: RE: [htdig] Deleted, no excerpt with pdf files
Date: Wed, 13 Mar 2002 08:49:50 -0000
Organization: Atelier Ten

David, thanks for your suggestion.

The trouble is, htdig's output looks fine to me: it seems to get the Content-Type correct and the length looks sensible at 29122 bytes; it just doesn't put anything it finds into its database scratch files. It lists the text from the pdf when in -vvvv mode, so it's not one of those pdf-image issues.

Output is listed below.

Any other thoughts?

Steve

title: Atelier Ten Web Graphics
image: http://192.168.1.2/pdfs/TSB_Exterior_thumb.gif
href: http://192.168.1.2/pdfs/phoenix.pdf (support images)
resolving 'http://192.168.1.2/pdfs/phoenix.pdf'
 pushing http://192.168.1.2/pdfs/phoenix.pdf
+ size = 1186
pick: 192.168.1.2, # servers = 1
1:1:1:http://192.168.1.2/pdfs/phoenix.pdf: Retrieval command for http://192.168.1.2/pdfs/phoenix.pdf: GET /pdfs/phoenix.pdf HTTP/1.0
User-Agent: htdig/3.1.6 ([EMAIL PROTECTED])
Referer: http://192.168.1.2/
Host: 192.168.1.2

Header line: HTTP/1.1 200 OK
Header line: Date: Tue, 12 Mar 2002 20:00:42 GMT
Header line: Server: Apache/1.3.20 (Linux/SuSE) PHP/4.0.6
Header line: Last-Modified: Thu, 14 Jun 2001 08:59:02 GMT
Converted Thu, 14 Jun 2001 08:59:02 GMT to Thu, 14 Jun 2001 08:59:02
Header line: ETag: "9813c-71c2-3b287cd6"
Header line: Accept-Ranges: bytes
Header line: Content-Length: 29122
Header line: Connection: close
Header line: Content-Type: application/pdf
Header line:
returnStatus = 0
Read 8192 from document
Read 8192 from document
Read 8192 from document
Read 4546 from document
Read a total of 29122 bytes
PDF::setContents(29122 bytes)
PDF::parse(http://192.168.1.2/pdfs/phoenix.pdf)
PDF::parse: 19272 lines parsed
PDF::parse ends normally
 size = 29122
pick: 192.168.1.2, # servers = 1

________________________________________________________________________
This e-mail has been scanned for all viruses by Star Internet. The service is powered by MessageLabs. For more information on a proactive anti-virus service working around the clock, around the globe, visit: http://www.star.net.uk
________________________________________________________________________

--__--__--

Message: 6
From: "David Adams" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>, <[EMAIL PROTECTED]>
Subject: Re: [htdig] Deleted, no excerpt with pdf files
Date: Wed, 13 Mar 2002 09:31:03 -0000

Steve,

It looks as though there must be a problem with your configuration file.
The lines:

PDF::setContents(29122 bytes)
PDF::parse(http://192.168.1.2/pdfs/phoenix.pdf)
PDF::parse: 19272 lines parsed
PDF::parse ends normally
 size = 29122

are definitely NOT what I would expect from doc2html.pl, pdf2html.pl or pdftotext. Some other parser is being used.

--
David Adams
Computing Services
Southampton University

--__--__--

_______________________________________________
htdig-general list digest <[EMAIL PROTECTED]>
Information: https://lists.sourceforge.net/lists/listinfo/htdig-general
FAQ: http://htdig.sourceforge.net/FAQ.html

End of htdig-general Digest

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
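
For the URL-extraction step suggested in Message 2 of the digest above, the same idea can be sketched in Python for anyone who prefers it to sed. This is only a sketch under the assumptions implied by that sed expression: one record per line in db.docs, fields separated by tab characters, and the URL held in a field prefixed with "u:" that has a tab on both sides. The function name `extract_urls` is made up for illustration, and the actual layout of htdump output should be checked before relying on it.

```python
import re

# Same pattern as the sed expression in Message 2, with the tab
# characters written explicitly as \t instead of literal tabs.
URL_FIELD = re.compile(r"^.*\tu:([^\t]*)\t.*")


def extract_urls(lines):
    """Yield the contents of the u: field from each line that has one.

    Lines without a tab-delimited u: field are skipped, which mirrors
    the behaviour of sed -n with the p flag.
    """
    for line in lines:
        match = URL_FIELD.match(line.rstrip("\n"))
        if match:
            yield match.group(1)
```

To use it, open db.docs, pass the file object to `extract_urls`, and print each value it yields; the resulting list could then be pasted into start_url or written to a separate file.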

