Mike,
It looks like you are right. I reindexed the docs with the -i -s -v options
and got the following:
https://devserverxxx.com/library/ADJA/docs/portlet-1_0-fr-spec.pdf:
size=438107
No plus or minus at all. This applies to .doc files, too.
So, after this, I put a print statement in doc2html.pl, and no custom
message was echoed. Do you have any idea how to make sure rundig.sh calls
doc2html.pl?
Oh, by the way, I thought simply clicking "Reply to" would reply to the
mailing list, but apparently it doesn't. Thanks for letting me know.
Thanks,
From: <[EMAIL PROTECTED]>
To: <[email protected]>
Subject: Re: [htdig] doc2html - indexed but no hits
Date: Thu, 10 May 2007 16:21:17 +0100
In this case I can be fairly sure they were not called!
Note the line that says 'not changed'. I'm not sure how extensive your
indexes are, or whether you are in production, but you may want to
add the -i flag to do an index from scratch. From memory, the -s flag
turns on a set of summary statistics, which may include useful info.
During a normal run at the correct level, you should see a line like
++++---++-
for each file that you index. The documentation at www.htdig.org explains
what these symbols mean. I can't remember offhand, but they indicate
what is actually found inside a document. Check also that htmerge is
running at a similar verbosity setting.
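As a rough sketch of how to check a saved log for skipped documents: the helper below counts the lines containing 'not changed', the phrase from your log excerpt. The function name and the log file name are only assumptions for illustration, and the htdig command in the comment uses a placeholder config path.

```shell
#!/bin/sh
# Assumed workflow: save a verbose run to a file first, for example
#   htdig -i -s -v -c /path/to/htdig.conf > htdig.log 2>&1
# (the config path and log name are placeholders).

# Print how many lines of the given log file contain "not changed".
# A non-zero count after a run with -i would suggest the documents
# are still being skipped rather than re-fetched and converted.
count_not_changed() {
    grep -c 'not changed' "$1"
}
```

Calling `count_not_changed htdig.log` prints the count for that file.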
On my system, doc2html etc. is called from an intermediate DOS batch
file, which is an easy place to put in an extra bit of logging.
Alternatively, you may be brave enough to put a debug line into doc2html
itself - it is just a bit of Perl, if I remember correctly.
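On a Unix system the same intermediate-script idea might look something like the sketch below. It is only an illustration: the log location, the function name and the converter path are assumptions, not part of ht://Dig or doc2html. The wrapper writes its trace to a file rather than to standard output, so that the converter's output is left untouched.

```shell
#!/bin/sh
# Hypothetical wrapper: configure htdig to call this script instead of
# doc2html.pl, so that every invocation leaves a trace in a log file.

# Assumed log location; adjust for your installation.
LOGFILE="${LOGFILE:-/tmp/doc2html-wrapper.log}"

# Append one line per invocation (timestamp plus all arguments),
# then run the real converter with the same arguments and return
# its exit status.
log_and_run() {
    converter="$1"
    shift
    printf '%s called: %s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$converter" "$*" >> "$LOGFILE"
    "$converter" "$@"
}

# In the real wrapper the last line would be something like:
#   log_and_run /path/to/doc2html.pl "$@"
```

If the log file stays empty after an indexing run, the converter was never called for those documents.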
Mike
NB, I have copied this back to the list - not sure if you meant to send
this direct, I get that wrong all the time!
> -----Original Message-----
> From: CHUN KI SHIN [mailto:[EMAIL PROTECTED]
> Sent: Thursday, May 10, 2007 4:12 PM
> To: Brockington,MJ,Michael,JPGA4X R
> Subject: Re: [htdig] doc2html - indexed but no hits
>
> Thanks for the quick response, Mike.
>
> Ok, I ran the script with -vv, and I don't know what I'm looking for in
> the index log. The only thing I can see is the following:
>
> pick: devserverxxx.com, # servers = 1
> 234:31:2:https://devserverxxx.com/library/ADJA/docs/portlet-1_0-fr-spec.pdf:
> not changed
>
> and the same for .doc.
>
> Could you tell me how to make sure doc2html is being called?
>
> Also, what do you mean by 'statistics' in htdig?
>
> Thanks for your time and help!
>
> >From: <[EMAIL PROTECTED]>
> >To: <[email protected]>
> >Subject: Re: [htdig] doc2html - indexed but no hits
> >Date: Thu, 10 May 2007 14:14:59 +0100
> >
> >Can you tell if doc2html is actually being called by htdig? Just
> >because htdig is downloading the document, it does not guarantee that it
> >is being passed over for conversion to an indexable format.
> >It might be worth decreasing the number of v's you are using by one or
> >two so that you can see what is being found in each document. Not sure
> >if you have the 'statistics' turned on?
> >
> >Regards,
> >Mike
> >
> > > -----Original Message-----
> > > From: [EMAIL PROTECTED]
> > > [mailto:[EMAIL PROTECTED] On
> > > Behalf Of CHUN KI SHIN
> > > Sent: Thursday, May 10, 2007 1:43 PM
> > > To: [email protected]
> > > Subject: [htdig] doc2html - indexed but no hits
> > >
> > > I've been trying to index .pdf and .doc documents in v. 3.2.0b with
> > > doc2html/catdoc/pdf2html.
> > > I can see both types indexed fine (though I'm not sure why the
> > > log doesn't tell which words and tags have been indexed). See below:
> > >
> >
> >-------------------------------------------------------------------------
> >This SF.net email is sponsored by DB2 Express
> >Download DB2 Express C - the FREE version of DB2 express and take
> >control of your XML. No limits. Just data. Click to get it now.
> >http://sourceforge.net/powerbar/db2/
> >_______________________________________________
> >ht://Dig general mailing list: <[email protected]>
> >ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
> >List information (subscribe/unsubscribe, etc.)
> >https://lists.sourceforge.net/lists/listinfo/htdig-general