Mike,

I figured it out.
It was a simple mistake on my side: I should have looked at doc2html and pdf2html. Also, some of the scripts run inside a jail and were calling other scripts outside of it, so htdig couldn't index .doc and .pdf files correctly.
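
For anyone hitting the same jail problem, a rough way to spot it is to check that every helper the converter scripts call actually exists below the jail root. This is only a sketch: the function name, the jail root and the helper paths in the example call are hypothetical and were not part of the setup described in this thread.

```shell
#!/bin/sh
# Sketch only: the jail root and helper paths used with this function are
# hypothetical examples, not taken from the setup discussed in this thread.

# Print every helper path that is not an executable file below the given
# jail root. Returns 1 if at least one helper is missing, 0 otherwise.
missing_in_jail() {
    jail_root=$1
    shift
    missing=0
    for helper in "$@"; do
        if [ ! -x "${jail_root}${helper}" ]; then
            printf 'missing inside jail: %s\n' "$helper"
            missing=1
        fi
    done
    return "$missing"
}

# Example call (all paths are assumptions; adjust to your installation):
# missing_in_jail /var/jail /usr/local/bin/doc2html.pl /usr/local/bin/pdf2html.pl /usr/bin/catdoc
```

Any path it prints is one that a script running inside the jail would fail to find.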

I appreciate your time and help!


From: "CHUN KI SHIN" <[EMAIL PROTECTED]>
To: [email protected]
Subject: Re: [htdig] doc2html - indexed but no hits
Date: Thu, 10 May 2007 13:31:43 -0500

Mike,

It looks like you are right. I reindexed the docs with the -i -s -v options and got the following:

https://devserverxxx.com/library/ADJA/docs/portlet-1_0-fr-spec.pdf: size=438107

No plus or minus signs at all. The same applies to .doc files.

So, after this, I put a print statement in doc2html.pl, and my custom message was never echoed. Do you have any idea how to make sure rundig.sh calls doc2html.pl?

Oh, by the way, I thought simply clicking "Reply" would reply to the mailing list, but it looks like it doesn't. Thanks for letting me know.

Thanks,


From: <[EMAIL PROTECTED]>
To: <[email protected]>
Subject: Re: [htdig] doc2html - indexed but no hits
Date: Thu, 10 May 2007 16:21:17 +0100

In this case I can be fairly sure they were not called!
Note the line that says 'not changed'. I'm not sure how extensive your
indexes are, or whether you are in production, but you may want to add
the -i flag to index from scratch. From memory, the -s flag turns on a
set of summary statistics, which may include useful info. During a
normal run at the correct verbosity level, you should see a line like
++++---++-
for each file that you index. www.htdig.org explains what these symbols
mean. I can't remember offhand, but they help to indicate what is
actually found inside a document. Check also that htmerge is running at
a similar verbosity setting.

On my system, doc2html etc. is called from an intermediate DOS batch
file, which is an easy place to put in an extra bit of logging.
Alternatively, you may be brave enough to put a debug line into doc2html
itself; it is just a bit of Perl, if I remember correctly.
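
On a Unix system, the intermediate-script idea could look roughly like the shell wrapper below. It is a sketch under assumptions: the log file location, the doc2html.pl path and the two environment variable names are hypothetical. It also assumes the converter is configured through the external_parsers attribute in htdig.conf, so that entry would point at the wrapper instead of at doc2html.pl. The wrapper logs to a file rather than to standard output, because standard output is where the converted HTML goes.

```shell
#!/bin/sh
# Hypothetical logging wrapper around doc2html.pl.
# LOG_FILE and CONVERTER defaults are assumptions; adjust to your system.
LOG_FILE="${DOC2HTML_LOG:-/tmp/doc2html-wrapper.log}"
CONVERTER="${DOC2HTML_CMD:-/usr/local/bin/doc2html.pl}"

# Record the arguments, run the real converter with the same arguments,
# then record its exit status. The converter's output on stdout is left
# untouched.
log_and_run() {
    printf '%s called with: %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$LOG_FILE"
    status=0
    "$CONVERTER" "$@" || status=$?
    printf '%s exit status: %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$status" >> "$LOG_FILE"
    return "$status"
}

# Act as the converter only when invoked with arguments.
if [ "$#" -gt 0 ]; then
    log_and_run "$@"
fi
```

If the log file stays empty after a run with -i, the converter is not being called at all, which points at the configuration rather than at doc2html itself.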

Mike
NB, I have copied this back to the list - not sure if you meant to send
this direct, I get that wrong all the time!

> -----Original Message-----
> From: CHUN KI SHIN [mailto:[EMAIL PROTECTED]
> Sent: Thursday, May 10, 2007 4:12 PM
> To: Brockington,MJ,Michael,JPGA4X R
> Subject: Re: [htdig] doc2html - indexed but no hits
>
> Thanks for the quick response, Mike.
>
> Ok, I ran the script with -vv, but I don't know what I'm looking for in
> the index log. The only thing I can see is the following:
>
> pick: devserverxxx.com, # servers = 1
> 234:31:2:https://devserverxxx.com/library/ADJA/docs/portlet-1_0-fr-spec.pdf: not changed
>
> and the same for .doc.
>
> Could you tell me how to make sure doc2html is being called?
>
> Also, what do you mean by 'statistics' in htdig?
>
> Thanks for your time and help!
>
> >From: <[EMAIL PROTECTED]>
> >To: <[email protected]>
> >Subject: Re: [htdig] doc2html - indexed but no hits
> >Date: Thu, 10 May 2007 14:14:59 +0100
> >
> >Can you tell if doc2html is actually being called by htdig? Just
> >because htdig is downloading the document does not guarantee that it
> >is being passed on for conversion to an indexable format.
> >It might be worth decreasing the number of v's you are using by one or
> >two so that you can see what is being found in each document. Not sure
> >if you have the 'statistics' turned on?
> >
> >Regards,
> >Mike
> >
> > > -----Original Message-----
> > > From: [EMAIL PROTECTED]
> > > [mailto:[EMAIL PROTECTED] On
> > > Behalf Of CHUN KI SHIN
> > > Sent: Thursday, May 10, 2007 1:43 PM
> > > To: [email protected]
> > > Subject: [htdig] doc2html - indexed but no hits
> > >
> > > I've been trying to index .pdf and .doc documents in v. 3.2.0b with
> > > doc2html/catdoc/pdf2html.
> > > I can see both types indexed fine (though I'm not sure why the log
> > > doesn't tell which words and tags have been indexed). See below:
> > >

_______________________________________________
ht://Dig general mailing list: <[email protected]>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general
