Mike,
I figured it out.
It was a simple mistake on my part. I should have checked doc2html and
pdf2html. Also, because some scripts were running inside a jail but called
other scripts located outside it, htdig couldn't index the .doc and .pdf
files properly.
I appreciate your time and help!
From: "CHUN KI SHIN" <[EMAIL PROTECTED]>
To: [email protected]
Subject: Re: [htdig] doc2html - indexed but no hits
Date: Thu, 10 May 2007 13:31:43 -0500
Mike,
It looks like you are right. I reindexed the docs with the -i -s -v options
and got the following:
https://devserverxxx.com/library/ADJA/docs/portlet-1_0-fr-spec.pdf:
size=438107
No plus or minus at all. The same applies to the .doc files.
So, after this, I put a print statement in doc2html.pl, and my custom
message was never printed. Do you have any idea how to make sure
rundig.sh calls doc2html.pl?
Oh, by the way, I thought simply clicking "Reply to" would reply to the
mailing list, but it looks like it doesn't. Thanks for letting me know.
Thanks,
From: <[EMAIL PROTECTED]>
To: <[email protected]>
Subject: Re: [htdig] doc2html - indexed but no hits
Date: Thu, 10 May 2007 16:21:17 +0100
In this case I can be fairly sure they were not called!
Note the line that says 'not changed'. I'm not sure how extensive your
indexes are, or whether you are in production yet, but you may want to
add the -i flag to build the index from scratch. From memory, the -s flag
turns on a set of summary statistics, which may include useful info.
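To see at a glance which documents htdig skipped, the verbose log can be scanned for 'not changed' entries. Below is a minimal Python sketch. It assumes each entry follows the layout of the excerpt quoted later in this thread: `<n>:<n>:<n>:<url>:` followed by a status, either on the same line or on the next one. htdig's verbose output may be laid out differently in other versions, and the script and function names are hypothetical.

```python
import re
import sys
from typing import Dict, List

# Assumed entry layout, based on the excerpt in this thread:
#   <n>:<n>:<n>:<url>:
#   not changed
# The status may also appear on the same line, after the URL's trailing colon.
ENTRY = re.compile(r"^\s*\d+:\d+:\d+:(\S+):\s*(.*)$")


def statuses_by_url(log_text: str) -> Dict[str, str]:
    """Map each URL in a verbose htdig log to the status text logged for it."""
    lines = log_text.splitlines()
    result: Dict[str, str] = {}
    for i, line in enumerate(lines):
        match = ENTRY.match(line)
        if not match:
            continue
        url, status = match.group(1), match.group(2).strip()
        # Status wrapped onto the next line: pick it up from there.
        if not status and i + 1 < len(lines) and not ENTRY.match(lines[i + 1]):
            status = lines[i + 1].strip()
        result[url] = status
    return result


def skipped_urls(log_text: str) -> List[str]:
    """URLs reported as 'not changed', i.e. skipped as unchanged since the last dig."""
    return [url for url, status in statuses_by_url(log_text).items()
            if status.startswith("not changed")]


if __name__ == "__main__":
    # Usage (hypothetical file names): python3 find_skipped.py < htdig-verbose.log
    for url in skipped_urls(sys.stdin.read()):
        print(url)
```

Compare the output of an ordinary incremental dig with the output of a run using -i. Because -i starts from an empty database, that run should list no document as 'not changed'.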
During a normal run at the correct verbosity level, you should see a line like
++++---++-
for each file that you index. www.htdig.org can tell you what these
symbols mean; I can't remember offhand, but they help to indicate
what is actually found inside a document. Also check that htmerge is
running at a similar verbosity setting.
On my system, doc2html etc. is called from an intermediate DOS batch
file, which is an easy place to add an extra bit of logging.
Alternatively, you may be brave enough to put a debug line into doc2html
itself; it is just a bit of Perl if I remember correctly.
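Mike's idea of an intermediate wrapper can be sketched in Python. You point the htdig configuration at this script instead of doc2html.pl. In ht://Dig 3.2, converters are normally registered through the `external_parsers` attribute, but check the documentation for your version. On every call, the script appends a timestamped line to a log file and then passes its arguments on to the real converter. The paths and the script itself are hypothetical, and the sketch assumes `perl` is on the PATH. Given the jail problem described at the top of this thread, every path must also be valid inside the environment in which htdig actually runs.

```python
import os
import sys
import time
from typing import Sequence

# Hypothetical paths: adjust them for your installation. If htdig runs in a
# chroot jail, both must resolve inside the jail.
REAL_CONVERTER = "/usr/local/bin/doc2html.pl"
LOG_FILE = "/tmp/doc2html-calls.log"


def format_log_line(argv: Sequence[str], when: float) -> str:
    """Build one log line: local timestamp followed by the arguments received."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(when))
    return stamp + " " + " ".join(argv) + "\n"


def main() -> None:
    # Record the call first, so even a failing conversion leaves a trace.
    with open(LOG_FILE, "a", encoding="utf-8") as log:
        log.write(format_log_line(sys.argv, time.time()))
    # Replace this process with the real converter, passing the arguments
    # through unchanged so htdig reads its output exactly as before.
    os.execvp("perl", ["perl", REAL_CONVERTER] + sys.argv[1:])


if __name__ == "__main__":
    main()
```

The log goes to a file rather than to standard output because htdig reads the converter's standard output as the converted document. If the log stays empty after a dig, htdig never invoked the converter.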
Mike
NB: I have copied this back to the list. Not sure if you meant to send
this directly; I get that wrong all the time!
> -----Original Message-----
> From: CHUN KI SHIN [mailto:[EMAIL PROTECTED]
> Sent: Thursday, May 10, 2007 4:12 PM
> To: Brockington,MJ,Michael,JPGA4X R
> Subject: Re: [htdig] doc2html - indexed but no hits
>
> Thanks for the quick response, Mike.
>
> OK, I ran the script with -vv, and I don't know what I'm looking for in
> the index log. The only thing I can see is the following:
>
> pick: devserverxxx.com, # servers = 1
> 234:31:2:https://devserverxxx.com/library/ADJA/docs/portlet-1_0-fr-spec.pdf:
> not changed
>
> and the same for .doc.
>
> Could you tell me how to make sure doc2html is being called?
>
> Also, what do you mean by 'statistics' in htdig?
>
> Thanks for your time and help!
>
> >From: <[EMAIL PROTECTED]>
> >To: <[email protected]>
> >Subject: Re: [htdig] doc2html - indexed but no hits
> >Date: Thu, 10 May 2007 14:14:59 +0100
> >
> >Can you tell if doc2html is actually being called by htdig? Just
> >because htdig is downloading the document does not guarantee that it
> >is being passed on for conversion to an indexable format.
> >It might be worth decreasing the number of v's you are using by one or
> >two so that you can see what is being found in each document. Not sure
> >if you have the 'statistics' turned on?
> >
> >Regards,
> >Mike
> >
> > > -----Original Message-----
> > > From: [EMAIL PROTECTED]
> > > [mailto:[EMAIL PROTECTED] On
> > > Behalf Of CHUN KI SHIN
> > > Sent: Thursday, May 10, 2007 1:43 PM
> > > To: [email protected]
> > > Subject: [htdig] doc2html - indexed but no hits
> > >
> > > I've been trying to index .pdf and .doc documents in v. 3.2.0b with
> > > doc2html/catdoc/pdf2html.
> > > I can see both types indexed fine (though I'm not sure why the
> > > log doesn't tell which words and tags have been indexed). See below:
> > >
> >
> >_______________________________________________
> >ht://Dig general mailing list: <[email protected]>
> >ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
> >List information (subscribe/unsubscribe, etc.)
> >https://lists.sourceforge.net/lists/listinfo/htdig-general