[htdig] 3.2b3snapshot , Was: Re: [htdig] Problems compiling 3.20b2

2001-01-17 Thread Richard van Drimmelen

Yep, --disable-shared works.

Is the rundig -s option gone now?

Gilles Detillieux wrote:
 
 According to richard:
  compiling went fine after I installed zlib-1.1.3.
  But running rundig -c my.conf:
 
  Arithmetic Exception - core dumped
 
  core file from htfuzzy.
 
 Oh, right.  On Solaris, you must use the --disable-shared option on
 ./configure to avoid this problem.  We still haven't gotten to the
 bottom of this, but C++ objects in shared libraries don't seem to get
 initialized properly on Solaris, causing this error.  Avoiding shared
 libraries for ht://Dig's C++ classes avoids this problem.
 
 --
 Gilles R. Detillieux  E-mail: [EMAIL PROTECTED]
 Spinal Cord Research Centre   WWW:http://www.scrc.umanitoba.ca/~grdetil
 Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
 Winnipeg, MB  R3E 3J7  (Canada)   Fax:(204)789-3930
 
 
 To unsubscribe from the htdig mailing list, send a message to
 [EMAIL PROTECTED]
 You will receive a message to confirm this.
 List archives:  http://www.htdig.org/mail/menu.html
 FAQ:            http://www.htdig.org/FAQ.html

-- 
Richard van Drimmelen   | email: [EMAIL PROTECTED]
Facility Management | phone: +31 20 5928080
SARA Computing Services | fax:   +31 20 6683167






[htdig] excluding page section? sorting output?

2001-01-17 Thread Bernhard Krickl

Hi!

My boss keeps asking me about features he wants with htdig.
Recently he came up with the following:

Is there a way to exclude a section of an HTML page from
indexing? That's because navigational elements often produce hits
when the content doesn't match much. (Frames are not an option!)

Is there a way to sort the output by category?

And here's another one:
Is there a possibility to index Shockwave Flash files?
Let me guess: Yes, if I have an external parser.
In this case: Where do I find one?

Thanks.
-- 
bernhard krickl

begin:vcard 
n:Krickl;Bernhard
tel;fax:0211 311655 33
tel;work:0211 311655 0
x-mozilla-html:FALSE
org:tro new media gmbh
adr:;;zimmerstr. 19;duesseldorf;;40215;
version:2.1
email;internet:[EMAIL PROTECTED]
x-mozilla-cpt:;0
fn:Bernhard Krickl
end:vcard






[htdig] Problems with the page dates.

2001-01-17 Thread Nick Heuser

Hello!

We have problems with htdig 3.1.5 on FreeBSD 3.1. Htdig works fine, but the
$(MODIFIED) field stays empty unless you specify sorting by date. Even
then, the date is sometimes missing.
I index the site using one large (script-created) HTML file that links to all
the other pages. This page itself is not for browsing, just for indexing.
Any help is welcome. Thank you in advance.

Nick

P.S.: If I can fix this by next week, htdig will become the new search engine
for bigbrother.de.
-- 

 Syngery: 11:37am  up  1:31,  1 user,  load average: 0.05, 0.13, 0.09

AME Aigner Media & Entertainment GmbH - Multimedia - Radio & TV
Nick Heuser, System- & Network Engineer
Bavariaring 8, D-80336 Muenchen
Tel: [+49] 089-427 05 #332    eMail: [EMAIL PROTECTED]
Fax: [+49] 089-427 05 #400    http://ame.de
www.NetRadio.de - www.MuniCam.de - www.netNewsLetter.de






Re: [htdig] excluding page section? sorting output?

2001-01-17 Thread Torsten Neuer

Bernhard Krickl wrote:
 
 Hi!
 
 My boss keeps asking me about features he wants with htdig.
 Recently he came up with the following:
 
 Is there a way to exclude a section of an HTML page from
 indexing? That's because navigational elements often produce hits
 when the content doesn't match much. (Frames are not an option!)

There is a way.  Please see the Ht://Dig documentation:
http://www.htdig.org/attrs.html#noindex_start
http://www.htdig.org/attrs.html#noindex_end
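
To make the effect of these two attributes concrete, here is a minimal Python sketch of what "skip everything between a start marker and an end marker" means. It is not ht://Dig's parser. The marker strings are assumed example values (check noindex_start and noindex_end in your own configuration), and strip_noindex is a hypothetical helper name.

```python
import re

# Assumed marker values for illustration; use whatever your
# noindex_start / noindex_end attributes are actually set to.
NOINDEX_START = "<!--htdig_noindex-->"
NOINDEX_END = "<!--/htdig_noindex-->"


def strip_noindex(html, start=NOINDEX_START, end=NOINDEX_END):
    """Return html with every start..end region removed, markers included."""
    pattern = re.escape(start) + r".*?" + re.escape(end)
    return re.sub(pattern, "", html, flags=re.DOTALL)


page = (
    "<body>"
    "<!--htdig_noindex--><a href='/'>Home</a> <a href='/news'>News</a>"
    "<!--/htdig_noindex-->"
    "<p>Actual content about widgets.</p>"
    "</body>"
)

# Only the text outside the marked navigation block is left to be indexed.
print(strip_noindex(page))
```

Wrapping the navigation block of each page template in the two markers is usually enough; the page content itself does not have to change.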

 
 Is there a way to sort the output by category?

Yes and no ;)

This highly depends upon how you define "category".

Basically, you can sort the output by score, time and title.
If you structure your Web-Site in a way that you can automagically
use the document titles for categories, that's the way it goes...
For more information, please see:
http://www.htdig.org/attrs.html#sort

 
 And here's another one:
 Is there a possibility to index Shockwave Flash files?
 Let me guess: Yes, if I have an external parser.
Yep ;)
 In this case: Where do I find one?

This is a bit harder.  I searched the web for an existing parser but only
found some more-or-less useful docs and one generic parser.

This generic parser (see attachment) can easily be used within a wrapper
script to at least extract links from a flash menu, which in my opinion is
the most requested feature.


hth,

  Torsten

-- 
InWise - Wirtschaftlich-Wissenschaftlicher Internet Service GmbH
Waldhofstraße 14             Tel: +49-4101-403605
D-25474 Ellerbek             Fax: +49-4101-403606
E-Mail: [EMAIL PROTECTED]    Internet: http://www.inwise.de
 swfparser.tar.gz




Re: [htdig] excluding page section? sorting output?

2001-01-17 Thread Torsten Neuer

Bernhard Krickl wrote:
 
   Is there a way to sort the output by category?
  Basically, you can sort the output by score, time and title.
  If you structure your Web-Site in a way that you can automagically
  use the document titles for categories, that's the way it goes...
  For more information, please see:
  http://www.htdig.org/attrs.html#sort
 
 This does not help. I'm thinking about self-defined categories,
 maybe defined by some Meta-tag or meta-keywords.
 Doc-titles might be out of question, but I'll check it.
 
 Any more ideas?

Categories could also be implemented via URL structures.  In this case
you could either patch the CGI program to add a sort-by-url method or
run the (complete) search output through an additional wrapper script.
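
As a rough illustration of the wrapper-script idea, the Python sketch below groups results by the first directory in the URL path and orders them by score within each group. It assumes the search output has already been parsed into (url, score) pairs; that parsing step, the example.com URLs, and the function names are hypothetical and not part of ht://Dig.

```python
from urllib.parse import urlsplit


def category_of(url):
    """Use the first path segment as the category; '' for top-level pages."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    # A lone segment is a file at the site root, not a category directory.
    return segments[0] if len(segments) > 1 else ""


def sort_by_category(results):
    """Sort (url, score) pairs by category, then by descending score."""
    return sorted(results, key=lambda r: (category_of(r[0]), -r[1]))


hits = [
    ("http://www.example.com/products/a.html", 40),
    ("http://www.example.com/news/x.html", 90),
    ("http://www.example.com/products/b.html", 75),
    ("http://www.example.com/index.html", 10),
]

for url, score in sort_by_category(hits):
    print(category_of(url) or "(top level)", score, url)
```

This only works if the whole result set is available to the wrapper, which is why the complete search output has to be passed through it rather than a single page of matches.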

If you have categories based upon META tags, you'll need to change the
database in order to support this special information.


   Is there a possibility to index Shockwave Flash files?
  This is a bit harder.  I searched the web for an existing parser but only
  found some more-or-less useful docs and one generic parser.
 
  This generic parser (see attachment) can easily be used within a wrapper
  script to at least extract links from a flash menu, which in my opinion is
  the most requested feature.
 
 Thanx for this one, but I'll need a bit more time to check it.
 Anyway, extracting links is not enough, I think. Keywords or a full-text
 index are needed.

Well, a full-text index should also be possible, but it requires some more
work on the parser.  The attached one is just a very generic one which
dumps all the different record entries of a flash file.  It is not
designed to be an external parser for Ht://Dig, but it works well with
the shell wrapper to extract links from flash menus.  With some
additional work it should be possible to produce a fully fledged external
parser out of it (I haven't found the time yet, nor have I had any
projects depending on that).


hth,

  Torsten

-- 
InWise - Wirtschaftlich-Wissenschaftlicher Internet Service GmbH
Waldhofstraße 14             Tel: +49-4101-403605
D-25474 Ellerbek             Fax: +49-4101-403606
E-Mail: [EMAIL PROTECTED]    Internet: http://www.inwise.de






Re: [htdig] excluding page section? sorting output?

2001-01-17 Thread Bernhard Krickl

Is there a way to sort the output by category?
   Basically, you can sort the output by score, time and title.
  This does not help. I'm thinking about self-defined categories,
 Categories could also be implemented via URL structures.
 If you have categories based upon META tags, you'll need to change the
 database in order to support this special information.
Ok, we'll check that.

Is there a possibility to index Shockwave Flash files?
   This is a bit harder.  I searched the web for an existing parser 
   This generic parser (see attachment) can easily be used within a wrapper
   script to at least extract links from a flash menu
  Thanx for this one, but I'll need a bit more time to check it.
  Anyway, extracting links is not enough, I think. Keywords or a full-text
  index are needed.
 Well, full text index should also be possible, but requires some more
 work on the parser.  The attached one is just a very generic one which
 dumps all the different record entries of a flash file.
I'll check that later on.


Thank you very much!
This should do it for the time being.

-- 
bernhard krickl







[htdig] config-questions: special characters and search restriction

2001-01-17 Thread Juergen Peus


Hi all,

I have two questions:

(1) Whenever I try to find something for "miles & more" I get no
matches. If I search for "miles AND more" I get a lot of matches, with
several of them containing "miles & more" as a string. It seems that
the "&" is not recognized by htsearch...
I have tried this with 3.1.5 and 3.2.0b2/b3. Any ideas what I'm doing
wrong?

(2) I'd like to offer different types of searches, so that you can
choose whether to search only in the documents' titles, in the
documents' headings, or in the documents' text. For this purpose I've
set up three different databases/configurations with different values
of title_factor, heading_factor* and text_factor.
In the "only in headings" example, I've set title_factor and
text_factor to zero, but I still find documents with the words in the
title or text and none in the headings (with $PERCENT = 1)...
So what I need is a way to restrict htsearch to search only
within the headings/titles/text and nowhere else...

TIA

--Juergen









RE: [htdig] Unable to contact server-revisisted

2001-01-17 Thread Roger Weiss

The solution has been found.
I finally found out that they were stopping and starting the (Apache) server
every hour or 2 due to some problems that aren't worth going into here.
One interesting question is, why, after the server was restarted, htdig
didn't start connecting again? It seemed that once it could not connect,
there was no going back.

Anyway, the servers have stabilized and now htdig is running fine again.

Thanks,
Roger

Roger Weiss
[EMAIL PROTECTED]
(978) 318-7301
http://www.trellix.com


-----Original Message-----
From: Gilles Detillieux [mailto:[EMAIL PROTECTED]]
Sent: Thursday, January 11, 2001 11:40 AM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: [htdig] Unable to contact server-revisisted


According to Roger Weiss:
 I'm running htdig v3.1.5 and my digging seems to be running out of steam
 after it runs for anywhere from 20 minutes to an hour or so. The initial
 msg was "Unable to connect to server". So, I ran it again with -v v v to get
 the error message below.
 
 pick: ponderingjudd.xxx.com, # servers = 550
 3213:3622:2:http://ponderingjudd.xxx.com/ponderingjudd/id6.html: Unable to build connection with ponderingjudd.xxx.com:80
  no server running
 
 I've replaced part of the URL with xxx to protect the innocent. The server
 certainly is running and I had no trouble finding the mentioned url. Is
 there some parm I need to set or limit I need to raise?
 We're running an apache server with startservers =25 and minspace=10.

I guess the next question, if you're sure the server is running, is can
you access it from a client?  More specifically, can you access it using a
different web client on the same system as the one on which you're running
htdig (e.g. from lynx, Netscape, kfm, or some other Linux/Unix-based web
browser)?  If you can, then the problem will be to figure out why htdig
can't build the connection while other programs on the same system can.
If you can't access the server from any client program on the same
system, then the problem isn't with htdig, but with your network setup
(e.g. firewall, packet filtering, or a bad connection from that system).

-- 
Gilles R. Detillieux  E-mail: [EMAIL PROTECTED]
Spinal Cord Research Centre   WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:(204)789-3930






[htdig] problem indexing a site

2001-01-17 Thread Elsa Chan

Hi,
I am having a problem indexing with htdig. I am trying to index one 
site. I am running with -vvv and - output but the output does 
not indicate any errors. It looks like as follows: 
1:0:http://www.site.net/
New server: www.site.net, 80
It just sits there for a long while.

Thanks in advance.








Re: [htdig] solaris 2.6 and htdig 3.1.5

2001-01-17 Thread Gilles Detillieux

According to Ronald Edward Petty:
 I think it's 3.1.5 (whatever the latest stable is).  Anyway, I emailed
 yesterday about this
 ares:/export/netapp/user/rpy/htdig-3.1.5/htfuzzy/ make
 c++ -o htfuzzy -L../htlib -L../htcommon -L../db/dist -L/usr/lib Endings.o
 EndingsDB.o Exact.o Fuzzy.o Metaphone.o Soundex.o SuffixEntry.o Synonym.o
 htfuzzy.o Substring.o Prefix.o ../htcommon/libcommon.a ../htlib/libht.a
 ../db/dist/libdb.a -lz -lnsl -lsocket
 /usr/local/lib/gcc-lib/sparc-sun-solaris2.6/2.95.2/libgcc.a: could not
 read symbols: Bad value
 collect2: ld returned 1 exit status
 make: *** [htfuzzy] Error 1
 
 and now I was wondering: everywhere I search on the net I get the
 impression that gcc is calling the wrong linker.  I type as -version and
 it's the GNU assembler in my path, and the same for ld.  So I am assuming that
 there is a version of the Solaris as or ld that is messing me up.  2
 questions:
 
 1) Is it possible there is another problem that could be generating this?  I
 ask so I don't have to manually link all this. I have never done that
 before, so maybe I should, to learn... gee
 2) If no one thinks it is another problem... how can I "watch" the makefile
 call the linker, etc.?  If I use top it doesn't show.  I do where ld and get
 5 choices, and if I do /asdf/asdf/asdf/ld -version on 2 of them I get GNU,
 and on the other 3 I get invalid option. Could these maybe be the Solaris
 versions? I can't tell; there is no option listed to tell.
 
 HELP (whiny voice)
 Thanks
 Ron Petty

Try compiling some different C++ code on your system.  I'm almost certain
that this is a problem with the setup of GNU C++ on your particular machine,
and not an ht://Dig problem.  If so, then this is not the best place to get
help, and you'd probably have better luck on a GNU C++ related mailing list
or newsgroup.

-- 
Gilles R. Detillieux  E-mail: [EMAIL PROTECTED]
Spinal Cord Research Centre   WWW:http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:(204)789-3930






[htdig] Re: Reindex

2001-01-17 Thread Gilles Detillieux

According to Elsa Chan:
 We just launched a new site, but the search engine is indexing pages that
 don't exist anymore. I think I just need to restart htdig, except I don't
 know how. I tried searching for info on the htdig web site but I couldn't
 find anything. Would you be able to help me?

Running the standard "rundig" script will rebuild your database from scratch.
You can also manually run "htdig -i" and "htmerge" to do this.
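
For anyone scripting the manual route, here is a small Python sketch that builds and runs the two steps in order. It assumes htdig and htmerge are on the PATH and that both accept -c to name the configuration file. The configuration path in the usage note and the function names are placeholders, not something from this thread.

```python
import subprocess


def reindex_commands(config_path):
    """Commands for a from-scratch rebuild: initial dig, then merge."""
    return [
        ["htdig", "-i", "-c", config_path],   # -i: start from an empty database
        ["htmerge", "-c", config_path],       # build the final databases
    ]


def reindex(config_path, run=subprocess.run):
    """Run both steps in order; check=True stops if a step fails."""
    for cmd in reindex_commands(config_path):
        run(cmd, check=True)
```

Calling reindex("/path/to/htdig.conf") with your real configuration file runs both steps. The run parameter exists only so the function can be exercised without the binaries installed.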

-- 
Gilles R. Detillieux  E-mail: [EMAIL PROTECTED]
Spinal Cord Research Centre   WWW:http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:(204)789-3930






Re: [htdig] Problems with the page dates.

2001-01-17 Thread Geoff Hutchison

At 11:48 AM +0100 1/17/01, Nick Heuser wrote:
We have problems with htdig 3.1.5 on FreeBSD 3.1. Htdig works fine, but the
$(MODIFIED) field stays empty unless you specify sorting by date. But even
then the date is sometimes missing.

The dates will be "missing" if the server does not return a 
Last-Modified: header for the URL. This frequently happens with CGIs 
or other server-generated content (e.g. SSI, ASP, JSP...). Try using 
the modification_time_is_now attribute, which uses the current time 
for the date.

http://www.htdig.org/attrs.html#modification_time_is_now
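
The fallback idea can be sketched in a few lines of Python. This is only an illustration of the logic, not ht://Dig's code: document_date and its use_now_if_missing flag are hypothetical names, and the headers are assumed to arrive as a plain dictionary with the exact key "Last-Modified".

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def document_date(headers, use_now_if_missing=False):
    """Return the date from a Last-Modified header, or None if it is absent.

    use_now_if_missing mirrors the idea behind modification_time_is_now:
    fall back to the current time when the server sent no date.
    """
    value = headers.get("Last-Modified")
    if value:
        return parsedate_to_datetime(value)
    return datetime.now(timezone.utc) if use_now_if_missing else None


static_page = {"Last-Modified": "Wed, 17 Jan 2001 10:37:31 GMT"}
cgi_page = {}  # typical for CGI or SSI output: no Last-Modified at all

print(document_date(static_page))
print(document_date(cgi_page))
print(document_date(cgi_page, use_now_if_missing=True))
```

To see which of your pages are affected, look at the response headers the server actually sends for a page whose date shows up empty.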

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/






Re: [htdig] /usr/include/netinet/in.h:837: syntax error before `struct' on

2001-01-17 Thread Gilles Detillieux

According to Ramesh Veema:
 While doing a make on my application ported to Solaris 8,
 in the middle of the make I get the following error
 when a C file tries to include netinet/in.h. I have doubts
 about it, since this header file supports IPv6 as well, so I am
 not clear how to correct this error. Please help me if anyone
 has come across this error.
 
 
 /usr/include/netinet/in.h:837: syntax error before `struct'
 /usr/include/netinet/in.h:838: syntax error before `struct'

What application are you porting here?  This is a mailing list for the
ht://Dig search engine only.  If that's the application in question,
please send in the complete output from ./configure and make, as well
as information on which version you're compiling, and what patches,
if any, you've made.  If you're talking about some other application,
I'm afraid you have the wrong list.

In any case, it sounds like something is overriding a header file
definition in a way that's incompatible with what /usr/include/netinet/in.h
expects.

-- 
Gilles R. Detillieux  E-mail: [EMAIL PROTECTED]
Spinal Cord Research Centre   WWW:http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:(204)789-3930






RE: [htdig] Unable to contact server-revisisted

2001-01-17 Thread Geoff Hutchison

On Wed, 17 Jan 2001, Roger Weiss wrote:

 One interesting question is, why, after the server was restarted, htdig
 didn't start connecting again? It seemed that once it could not connect,
 there was no going back.

Previous versions of htdig would keep trying a server for every URL even
if it went down in the middle of a run. But of course in almost all cases,
this just means that htdig will have to wait until the connection times
out for each URL before continuing.

Now (i.e. 3.1.5 and the current 3.2 code), htdig will mark the server as
dead and stop trying to connect. This is usually what you want to do,
though perhaps a config attribute is needed for this.
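
The behaviour described here can be pictured with a short Python sketch. It is a simplified model under stated assumptions, not htdig's implementation: crawl, fake_fetch and the example.com hosts are made up for the illustration, and a failed connection is represented by ConnectionError.

```python
from urllib.parse import urlsplit


def crawl(urls, fetch):
    """Fetch each URL, skipping hosts that already failed to connect.

    fetch(url) is a caller-supplied function that raises ConnectionError
    when the server cannot be reached.
    """
    dead_servers = set()
    fetched, skipped = [], []
    for url in urls:
        host = urlsplit(url).netloc
        if host in dead_servers:
            skipped.append(url)      # no retry, so no further timeout wait
            continue
        try:
            fetch(url)
            fetched.append(url)
        except ConnectionError:
            dead_servers.add(host)   # remembered for the rest of this run
            skipped.append(url)
    return fetched, skipped


attempts = []


def fake_fetch(url):
    """Stand-in for a real HTTP request; one host is pretended to be down."""
    attempts.append(url)
    if "down.example.com" in url:
        raise ConnectionError("no server running")


urls = [
    "http://down.example.com/a",
    "http://up.example.com/a",
    "http://down.example.com/b",
]
fetched, skipped = crawl(urls, fake_fetch)
print(fetched, skipped, len(attempts))
```

The trade-off is visible in the model: the second URL on the dead host costs no time, but it is also never retried, even if the server comes back during the same run.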

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/








Re: [htdig] problem indexing a site

2001-01-17 Thread Geoff Hutchison

On Wed, 17 Jan 2001, Elsa Chan wrote:

 not indicate any errors. It looks like as follows: 
 1:0:http://www.site.net/
 New server: www.site.net, 80
 It just sits there for a long while.

The first thing you should check is if you can contact this site with
another browser, e.g. lynx, Netscape, etc. The first thing htdig must
do is to retrieve the robots.txt file from the server. So if you cannot
connect to the server using other means, htdig will not be able to either
and you will have to look at networking issues.

That said, it should not just "hang", since there is a timeout set in the
connection code and the 3.1.5 version should be good about killing
connections if they time out. How long is "a long while"?

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/







Re: [htdig] config-questions: special characters and search restriction

2001-01-17 Thread Geoff Hutchison

On Wed, 17 Jan 2001, Juergen Peus wrote:

 the "&" gets not recognized by htsearch...

This is correct. At the moment, only and, or, and not operators are
recognized.

 So what I need is a way to restrict htsearch to search only
 within the headings/titles/text and nowhere else...

This is what we usually term "field-restricted" searching. In the 3.1.x
code (and before), this is not possible except by the means you described.
In the 3.2 code, the scoring factors can be set on-the-fly for htsearch,
which minimizes the pain somewhat. But while the 3.2 code keeps track of
where words were found, at the moment there isn't any htsearch code to
restrict searches in this manner.
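
The difference between weighting a field and restricting a search to it can be shown with a toy model in Python. Everything here is hypothetical (the INDEX layout, the function names, the scoring by simple addition); it is not htsearch's data structure or scoring formula, only a picture of why a zero factor does not remove a match.

```python
# Hypothetical index: word -> list of (document, field) occurrences.
INDEX = {
    "miles": [("doc1", "title"), ("doc2", "text"), ("doc3", "heading")],
}


def weighted_search(word, factors):
    """Score every matching document; a zero factor keeps the match."""
    scores = {}
    for doc, field in INDEX.get(word, []):
        scores[doc] = scores.get(doc, 0) + factors.get(field, 0)
    return scores


def field_restricted_search(word, field):
    """Return only the documents where the word occurs in the given field."""
    return sorted({doc for doc, f in INDEX.get(word, []) if f == field})


# Factors alone: doc1 and doc2 still match, just with a score of zero.
print(weighted_search("miles", {"title": 0, "heading": 5, "text": 0}))

# A real field restriction drops them from the result set entirely.
print(field_restricted_search("miles", "heading"))
```

In this model the second function is the part that needs to know where each word was found, which is the information the 3.2 code is said to record.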

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/







Re: [htdig] Can't exclude a directory in results

2001-01-17 Thread Gilles Detillieux

Last I heard, Mindspring was still using a rather ancient beta release of
htdig-3.0.8b2, which had numerous bugs.  The exclude parameter handling
didn't work correctly until version 3.1.0b2.  Also, we had problems
with the StringMatch class used to implement the restrict and exclude
parameters to htsearch, among other things, until 3.1.0b4.  That may be
the cause of these problems.  Also, you must make absolutely sure that you
only have one definition of the input parameters "restrict" and "exclude"
in your search form, as versions before 3.1.0b4 didn't handle multiple
parameter definitions for these.  The current stable release is 3.1.5,
which has been out for almost a year now, and fixes these and many, many
other bugs.
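
For reference, the intended effect of the two parameters can be modelled as plain substring tests on the URL. The Python sketch below is that model only, with a single pattern per parameter; url_allowed is a hypothetical name and this is not the StringMatch code mentioned above.

```python
def url_allowed(url, restrict="", exclude=""):
    """Keep a URL if it contains restrict (when set) and not exclude (when set)."""
    if restrict and restrict not in url:
        return False
    if exclude and exclude in url:
        return False
    return True


urls = [
    "http://www.example.com/98study/intro.html",
    "http://www.example.com/news/today.html",
]

# restrict="/98study/" keeps only the study pages.
print([u for u in urls if url_allowed(u, restrict="/98study/")])

# exclude="/98study/" keeps everything else.
print([u for u in urls if url_allowed(u, exclude="/98study/")])
```

An empty value means "no constraint" in this model, so the two forms quoted below should be mirror images of each other on a release where both parameters work.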

According to Dudley Jane:
 TKO,
 
 This isn't exactly what you're doing, but, we have a form to restrict,
 but we couldn't get it to work until we said:
 
 <input type=hidden name=config value=htdig>
 <input type=hidden name=restrict value="www.co.henrico.va.us/hr">
 
 It didn't work if we just said value="/hr" - we had to add the www.etc.
 in front.
 
 JD
 
 
 Carrot-Top Creative wrote:
 I've searched and double checked the instructions on how to exclude a
 directory from results and can't get it to work.  I'm using htdig on
 MindSpring which means I can't do any custom configuration to the server or
 have more than one htdig install.  I need to have 2 different search pages
 that return different results.  I've successfully created a search form that
 uses restrict:
 
 <input type=hidden name=config value="www57080">
 <input type=hidden name=restrict value="/98study/">
 <input type=hidden name=exclude value="">
 
 But I can't get exclude to work using this code to not include a particular
 directory.
 
 <input type=hidden name=config value="www57080">
 <input type=hidden name=restrict value="">
 <input type=hidden name=exclude value="/98study/">
 
 Help what can I do?  tko


-- 
Gilles R. Detillieux  E-mail: [EMAIL PROTECTED]
Spinal Cord Research Centre   WWW:http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:(204)789-3930






Re: [htdig] config-questions: special characters and search restriction

2001-01-17 Thread Juergen Peus

On Wed, 17 Jan 2001, Geoff Hutchison wrote:

Hi Geoff,

 Date: Wed, 17 Jan 2001 10:37:31 -0500 (EST)
 
  the "&" gets not recognized by htsearch...
 
 This is correct. At the moment, only and, or, and not operators are
 recognized.

Hmmm... but I don't want '&' to be recognized as an operator; I just want
it to be treated as just another word... :-(

Thanks!!
Juergen

---
Juergen Peus   paderLinx - Neue Informationsmedien GmbH
Geschaeftsfuehrer  Cheruskerstrasse 2b, 33102 Paderborn
[EMAIL PROTECTED] Fon: +49 5251 8994 - 11  Fax: -20
---







Re: [htdig] solaris 2.6 and htdig 3.1.5

2001-01-17 Thread Ronald Edward Petty


 Try compiling some different C++ code on your system.  I'm almost certain
 that this is a problem with the setup of GNU C++ on your particular machine,
 and not an ht://Dig problem.  If so, then this is not the best place to get
 help, and you'd probably have better luck on a GNU C++ related mailing list
 or newsgroup.

I did, and it works fine.  Which version of libstdc++ should be used?
I believe mine is using 2.8.1.  Should it be using 2.95.2, or whatever it
is? I do have a Unix question: is there a way to find out which linker and
assembler gcc calls?  I heard gcc -v will, but it just says which library
you're using, not which linker, assembler, etc.  Are there any commands to
see what it calls, since the internal path comes before your own path?

Thanks
Ron







Re: [htdig] hidden keywords

2001-01-17 Thread Gilles Detillieux

According to Stephen L Arnold:
 I'm trying to achieve some requested behavior with htdig; ie, I've 
 setup some selections in the search page (using restrict and 
 exclude) and the desired behavior is to have the user enter nothing 
 in the search field and have htdig serve up a list of all documents 
 in the directories specified by restrict/exclude.
 
 All documents are Word docs (at the moment) and previously I had 
 added the keyword "doc" to the bottom of the search form (in the 
 hidden keyword field) and it worked.  However, that was when Apache 
 was configured to allow directory indexing (and the indexes would 
 show up in the search results, along with the documents).  I turned 
 off the Apache auto-index stuff, and built a single html file with 
 URLs for all the documents for htdig to do the actual dig.
 
 However, and here's the rub, now I get a boolean search error when 
 I submit a search with no keywords, even if I put more hidden 
 keywords in the search form (that are guaranteed to be in the 
 documents).
 
 The only thing that changed was the Apache auto-index stuff; is 
 there anything I can do to get the behavior I want back again?

I'm not sure how you had it working in the first place, if this was
with 3.1.5.  You must have had some value in the "words" input parameter,
because htsearch 3.1.5 (and earlier) doesn't like it when you have
"keywords" but no "words".  Here's a patch that fixes this:

ftp://ftp.ccsf.org/htdig-patches/3.1.5/any_keywords.0

The Apache auto-index stuff made the keyword "doc" match any .doc file,
because the index uses the file names as link description text for the
link to the file, and with a non-zero description_factor, these words
have weight in the search.

-- 
Gilles R. Detillieux  E-mail: [EMAIL PROTECTED]
Spinal Cord Research Centre   WWW:http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:(204)789-3930






Re: [htdig] solaris 2.6 and htdig 3.1.5

2001-01-17 Thread Geoff Hutchison

On Wed, 17 Jan 2001, Ronald Edward Petty wrote:

 I did use it and it works fine.  What library of libstdc++ should be used?

Gilles mentioned that this is not particularly the proper place for these
questions. We just don't stay up-to-date on various compiler issues, while
not surprisingly, the gcc mailing list does.

 I believe mine is using 2.8.1.  Should it be using 2.95.2 or whatever it

You should most definitely be using libstdc++ 2.95.2 because that's the
version that comes with the compiler you say you're using.

I'll echo it one more time for good measure--these and other gcc
configuration/installation questions are better off asked on the gcc
mailing list or newsgroups. See for example, http://gcc.gnu.org/

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/







[htdig] Re: indexing Flash (was: excluding page section...)

2001-01-17 Thread nets

Theoretically, Flash is supposed to put links and text into the HTML 
file if you check those options.  Unfortunately, it sticks them in 
comment fields.  I've had inconsistent behavior with getting it to do 
even that!

Macromedia did publish a Flash file access API or something, but it's 
not open source as far as I know.

I'm working on a report on indexing Flash, so if anyone has a 
text-heavy example, I'd love to see it!

Avi

At 1:04 PM +0100 1/17/01, Torsten Neuer wrote:
 Is there a possibility to index Shockwave Flash files?
   This is a bit harder.  I searched the web for an existing parser but
   only found some more-or-less useful docs and one generic parser.
  
   This generic parser (see attachment) can easily be used within a wrapper
   script to at least extract links from a flash menu, which in my opinion
   is the most requested feature.

  Thanks for this one, but I'll need a bit more time to check it.
  Anyway, extracting links is not enough, I think. Keywords or a full-text
  index are needed.

Well, a full-text index should also be possible, but requires some more
work on the parser.  The attached one is just a very generic one which
dumps all the different record entries of a flash file.  It is not
designed to be an external parser for ht://Dig, but it works well with
the shell wrapper to extract links from flash menus.  With some
additional work it should be possible to produce a fully fledged external
parser out of it (I haven't found the time yet, nor have I had any
projects depending on that).


-- 
_
Complete Guide to Search Engines for Web Sites, Intranets, 
   and Portals: http://www.searchtools.com






RE: [htdig] problem indexing a site

2001-01-17 Thread Geoff Hutchison


If you aren't using port 80, you will need to set this in the start_url,
e.g.:

start_url: http://www.foo.com:81/

Cheers,
--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/

On Wed, 17 Jan 2001, Elsa Chan wrote:

 It just hangs for 10 to 15 minutes.
 
 If port 80 is not what we use, do I go and change this in the robots.txt
 file?
 
 Where is this file?
 
 Thanks







[htdig] Re: Reindex

2001-01-17 Thread Gilles Detillieux

According to Elsa Chan:
 I tried doing that, but only one file gets updated by htdig.
 
 /usr/local/htdig/db/db.docdb is the only file that gets updated.
 
 db.docs.index is still old, and db.wordlist.new is created but it has 0 bytes.
 
 When I try to run htmerge it gives me 
 
 htmerge: Unable to open word list file '/usr/local/htdig/db/db.wordlist'

As FAQ 5.16 explains, this happens because htdig didn't index any documents.

 I also tried running htdig -vvv, but I get this:
 
 1:0:http://www.site.net/
 New server: www.site.net, 80
 
 
 I specified a different port in the config file and put the URL in
 quotes, but it doesn't seem to work properly.
 
 Any ideas?

You can't use quotes in the start_url, because htdig doesn't parse it as
a quoted string list.  See http://www.htdig.org/attrs.html

The port number should be tacked right on to the end of the URL with a
colon, e.g.  start_url: http://www.site.net:8001
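
To illustrate why the log shows "New server: www.site.net, 80" when no port is given, here is a minimal Python sketch of the usual URL convention: the port is whatever follows the colon after the host name, and 80 is assumed otherwise. This uses Python's standard library and is not htdig's own URL parser; the function name host_and_port is made up for the example.

```python
from urllib.parse import urlsplit

def host_and_port(url, default_port=80):
    """Return (host, port) for an http URL, falling back to port 80."""
    parts = urlsplit(url)
    return parts.hostname, parts.port or default_port

# Without an explicit port the default is used; with ":8001" it is picked up.
print(host_and_port("http://www.site.net/"))
print(host_and_port("http://www.site.net:8001"))
```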

As for figuring out why it's hanging, and what constitutes a long while,
please see Geoff's response.

 -Original Message-
 From: Gilles Detillieux [mailto:[EMAIL PROTECTED]]
 Sent: Wednesday, January 17, 2001 10:18 AM
 To: [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED]
 Subject: Re: Reindex
 
 
 According to Elsa Chan:
  We just launched a new site, but the search engine is indexing pages that
  don't exist anymore. I think I just need to restart htdig, except I don't
  know how. I tried searching for info on the htdig web site but I couldn't
  find anything. Would you be able to help me?
 
 Running the standard "rundig" script will rebuild your database from
 scratch.  You can also manually run "htdig -i" and "htmerge" to do this.


-- 
Gilles R. Detillieux  E-mail: [EMAIL PROTECTED]
Spinal Cord Research Centre   WWW:http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:(204)789-3930






[htdig] Re: Problem with exclude_url

2001-01-17 Thread Gilles Detillieux

According to [EMAIL PROTECTED]:
 we have htdig 3.15.
 we wanted to index a big directory of the SAP-documentation
 the structure is as follows:
 
 directory1
  directory2
   directory3
    directory4
     content.html
     frameset.html
 
 directory1
  directory2
   directory3
    other_directory4
     content.html
     frameset.html
 
 and so on...
 
 We want to exclude all (!) files named frameset.htm in all directories.
 When I set exclude_url: frameset.htm, nothing happened.
 I think you have to give the fully qualified path, but there are so many
 different paths in this case.
 
 I need something like:
 exclude_url: /directory1/directory2/directory3/*/frameset.htm  (the asterisk
 is important)
 Is this possible?

First of all, please see http://www.htdig.org/FAQ.html#q1.16
Such questions should go to the list, not to me personally.  This isn't a
one-man show.

Secondly, could you elaborate on what you mean by "nothing happened"?
Do you mean that htdig didn't index anything, or that the frameset.htm
or frameset.html files were not excluded?  Also, is the above
a typo, or did you really omit the "s" from exclude_urls?  See
http://www.htdig.org/attrs.html for correct spellings of attribute names.

Thirdly, there is no wildcard support for exclude_urls.  In version 3.2,
we're adding support for regular expressions to exclude_urls and other
attributes, which will be like wildcards only more powerful, but with
a somewhat more complicated syntax.  This is still a work in progress,
however.

You shouldn't need wildcards for this case, though, because it's a
pretty simple exclusion you're trying to do here.  However, if the only
links to some of your files, such as the content.html files, are in the
frameset.html, then you may not want to exclude them, or you'll end up
missing a whole lot more besides.  This is why I asked what you mean by
"nothing happened".  If none of the files were indexed, this may be why.
Remember that htdig only follows HTML links from one document to the
next.
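
A rough Python sketch of why no wildcard is needed here, assuming exclude_urls does a plain substring test against each URL (which is how the attribute documentation describes it). The function name is_excluded and the host www.example.com are made up for illustration; this is not htdig's code.

```python
def is_excluded(url, exclude_patterns):
    """True if any pattern occurs anywhere in the URL (plain substring test)."""
    return any(pattern in url for pattern in exclude_patterns)

# One short pattern covers frameset.htm and frameset.html in every directory.
patterns = ["/frameset.htm"]
base = "http://www.example.com/directory1/directory2/directory3/"
print(is_excluded(base + "directory4/frameset.html", patterns))
print(is_excluded(base + "other_directory4/content.html", patterns))
```

Under that assumption, a single entry such as /frameset.htm would match the frameset files at any depth, while the content.html files stay eligible as long as something other than the framesets links to them.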

-- 
Gilles R. Detillieux  E-mail: [EMAIL PROTECTED]
Spinal Cord Research Centre   WWW:http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:(204)789-3930






Re: [htdig] how do you index local pages in 3.1.5?

2001-01-17 Thread Gilles Detillieux

According to Jon Beyer:
 This is probably a really easy thing, but I can't get
 htdig to index HTML from my hard drive.  I tried
 setting start_url to file:/, but that didn't work
 and I played around with local_urls_only and
 local_urls but couldn't get it to work.  Any advice is
 greatly appreciated.  Thanks.

htdig 3.1.5 doesn't handle file:/ URLs, only http://... URLs.  You can
make local_urls work with this style of URL, if the documents are on the
same system as the one on which you run htdig, using a syntax similar to
this example from my system:

start_url:        http://www.scrc.umanitoba.ca/
local_urls:       http://www.scrc.umanitoba.ca/=/home/httpd/html/
local_user_urls:  http://www.scrc.umanitoba.ca/=/home/,/public_html/

where /home/httpd/html corresponds to my Apache DocumentRoot setting.

Note that local_urls only indexes a certain limited set of file types,
determined by file extension.  For any other file type, or for directory
URLs where there's no index.html, it falls back to the HTTP server.

See http://www.htdig.org/attrs.html#local_urls
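
As a sketch of the idea behind the local_urls line above, the following Python maps a URL to a local path by swapping the URL prefix for the directory after the "=" sign. It only illustrates the prefix substitution; it does not model the file-extension limits or the fallback to the HTTP server, and the function name map_local_url and the docs/page.html path are made up for the example.

```python
def map_local_url(url, local_urls):
    """Translate a URL to a local file path using 'prefix=path' rules, or None."""
    for rule in local_urls:
        prefix, _, path = rule.partition("=")
        if url.startswith(prefix):
            # Replace the matching URL prefix with the local directory.
            return path + url[len(prefix):]
    return None  # no rule matched; the document would be fetched over HTTP

rules = ["http://www.scrc.umanitoba.ca/=/home/httpd/html/"]
print(map_local_url("http://www.scrc.umanitoba.ca/docs/page.html", rules))
```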

-- 
Gilles R. Detillieux  E-mail: [EMAIL PROTECTED]
Spinal Cord Research Centre   WWW:http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:(204)789-3930






[htdig] Changing sites list midway

2001-01-17 Thread htdighelp

Hi,

I am using an external url list when running my digs.
I often update these files, as well as the conf file, while the dig is taking place.
Is there some way to restart htdig so that it re-reads both the conf and the url list
and just continues on?

Mike








Re: [htdig] solaris 2.6 and htdig 3.1.5

2001-01-17 Thread Ronald Edward Petty


Sorry for the questions. I just thought someone said that there is a
shared memory problem on Solaris with htdig, and that you should
use a certain linker (namely GNU) instead of the Solaris one.  However, from
the Makefile that htdig comes with I cannot really figure out if there is a
certain linker to use; this is an htdig thing, not gcc.  If I use the proper
compiler and linker etc., then it's a GNU problem. That is the question,
and I have not found an answer on the htdig site about what the
proper setup of compiler, linker, and assembler is.  I will ask the GNU people
and see if this makes any sense to them. Thanks for the help.

Ron

On Wed, 17 Jan 2001, Geoff Hutchison wrote:

 On Wed, 17 Jan 2001, Ronald Edward Petty wrote:

  I did use it and it works fine.  What library of libstdc++ should be used?

 Gilles mentioned that this is not particularly the proper place for these
 questions. We just don't stay up-to-date on various compiler issues, while
 not surprisingly, the gcc mailing list does.

  I believe mine is using 2.8.1.  Should it be using 2.95.2 or whatever it

 You should most definitely be using libstdc++ 2.95.2 because that's the
 version that comes with the compiler you say you're using.

 I'll echo it one more time for good measure--these and other gcc
 configuration/installation questions are better off asked on the gcc
 mailing list or newsgroups. See for example, http://gcc.gnu.org/

 --
 -Geoff Hutchison
 Williams Students Online
 http://wso.williams.edu/









Re: [htdig] solaris 2.6 and htdig 3.1.5

2001-01-17 Thread Gilles Detillieux

According to Ronald Edward Petty:
 Sorry for the questions. I just thought someone said that there is a
 shared memory problem on Solaris with htdig, and that you should
 use a certain linker (namely GNU) instead of the Solaris one.

I think you're confusing two unrelated responses to other problems.

There's a problem with shared library support, not shared memory,
and it affects Solaris systems running the 3.2.0 betas of htdig only.
It is resolved by using the --disable-shared option to ./configure.
This doesn't affect 3.1.5, because it's not a problem with the standard
C++ libraries, but only the libraries built in the htdig package.
3.1.5 doesn't build any shared libraries.

I doubt anyone recommended using a GNU linker.  We often recommend
using GNU make if there are problems with some of htdig's Makefiles
on some platforms.  If GNU makes a linker for Solaris, it's the first
I hear about it.  Usually, the GNU compilers will use the linker that
comes with the target operating system, if I understand things correctly.

  However, from
 the Makefile that htdig comes with I cannot really figure out if there is a
 certain linker to use; this is an htdig thing, not gcc.  If I use the proper
 compiler and linker etc., then it's a GNU problem. That is the question,
 and I have not found an answer on the htdig site about what the
 proper setup of compiler, linker, and assembler is.  I will ask the GNU people
 and see if this makes any sense to them. Thanks for the help.

It's very, very rare to call the linker or assembler directly from a
Makefile for standard C or C++ code.  The htdig Makefiles certainly
don't attempt to do this!  Generally, for linking C programs, it's
the C or C++ front-end (e.g., cc, gcc, c++, g++) that gets called,
and this front-end is preconfigured to call the correct linker and
pass it all the required libraries.  Problems such as you reported
are a symptom of a mis-configured front-end to the C or C++ compilers,
or incompatible libraries, or both.  So, this is indeed a gcc thing.

The same goes for the assembler: gcc will typically call the first and
second stage back-end compilers for a .c file, to create a temporary .s
file, and then call the assembler to assemble it into a .o file.

If you can compile and link other C++ programs, it may be a problem with
the libraries your htdig Makefiles are telling the compiler to use,
but it could also be that your other programs are simple enough that
they don't run into similar compatibility issues.  In any case, the
htdig Makefiles don't call the linker directly.  If they call the wrong
front-end compiler, or use the wrong libraries or library directories,
you may need to change that in your Makefile.config, but this would
be a problem specific to your installation.  Lots of users have built
htdig successfully on Solaris, with nothing like the sort of errors you
reported occurring.

It might help to compare the options to gcc or g++, or whatever front-end
is used for linking the .o's and .a's in htdig, to the options used
for C++ programs you were able to link successfully.  Especially the -l
and -L options.  That might point the way to the problem you're having.
E.g., if you have different versions of libstdc++ or libg++ in different
directories, don't point g++ to an incompatible version with one of your
-L options!

-- 
Gilles R. Detillieux  E-mail: [EMAIL PROTECTED]
Spinal Cord Research Centre   WWW:http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:(204)789-3930






RE: [htdig] problem indexing a site

2001-01-17 Thread Elsa Chan

I tried that but I still get the same message.

1:0:http://www.site.com
New Server www.site.com , 80
And it hangs there. I also tried putting the URL in quotes in the
config file.

Thanks




-Original Message-
From: Geoff Hutchison [EMAIL PROTECTED]
To: Elsa Chan [EMAIL PROTECTED]
CC: [EMAIL PROTECTED] [EMAIL PROTECTED]
Sent: Wed Jan 17 11:42:00 2001
Subject: RE: [htdig] problem indexing a site


If you aren't using port 80, you will need to set this in the start_url,
e.g.:

start_url: http://www.foo.com:81/

Cheers,
--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/

On Wed, 17 Jan 2001, Elsa Chan wrote:

 It just hangs for 10 to 15 minutes.
 
 If port 80 is not what we use, do I go and change this in the robots.txt
 file?
 
 Where is this file?
 
 Thanks






[htdig] List of search words

2001-01-17 Thread Ben Bardill

Does ht://dig index the words that people search for? I would like to get a list of 
frequently entered words so that I can build a better site map.
All I need is a log file of words, and I can organize them with uniq and sort. 

Thanks,
Ben Bardill






Re: [htdig] List of search words

2001-01-17 Thread Geoff Hutchison

On Wed, 17 Jan 2001, Ben Bardill wrote:

 Does ht://dig index the words that people search for? I would like to
 get a list of frequently entered words so that I can build a better
 site map. All I need is a log file of words, and I can organize them
 with uniq and sort.

See http://www.htdig.org/attrs.html#logging which will log queries via
syslog.
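
Once the query words have been pulled out of the syslog lines, tallying them is straightforward. The Python sketch below assumes that extraction has already been done and that each query is available as a plain string; the exact layout of the log line is described with the logging attribute and is not reproduced here. The function name count_search_words and the sample queries are made up for the example.

```python
from collections import Counter

def count_search_words(queries):
    """Count how often each word appears across a list of query strings."""
    counts = Counter()
    for query in queries:
        # Lower-case so "Solaris" and "solaris" are counted together.
        counts.update(query.lower().split())
    return counts

# Hypothetical queries, as they might look once extracted from the log.
sample = ["solaris compile", "Solaris", "flash index"]
for word, n in count_search_words(sample).most_common():
    print(n, word)
```

This does the same job as piping the words through sort and uniq, with the most frequent words listed first.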

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/







RE: [htdig] problem indexing a site

2001-01-17 Thread Geoff Hutchison

On Wed, 17 Jan 2001, Elsa Chan wrote:

 1:0:http://www.site.com
 New Server www.site.com , 80

I think we need to see your config file--if you did change your
htdig.conf, then you have done it in a manner that htdig does not
recognize.

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/







[htdig] basic script

2001-01-17 Thread Ronald Edward Petty


Thanks for the help, everyone. It was a lack of knowledge and maybe
a little lack of documentation (maybe I just didn't see it).  You should
say that gcc should also point to the latest 2.95.2, not just g++.  I did
not know that libstdc++ was for both; I assumed only g++. Maybe you should
mention that gcc and g++ should both have the latest libstdc++.

Now that I have it, I assume you run htdig.  I saw somewhere a script or
something that runs it on the htdig site, but where is the link or
email? Does anyone know what I'm talking about, or am I going crazy? :)

One more thing: when you compile, can the CONFIG file you adjust be
changed later, or is it internally added somehow?  I had to compile on
a machine that is not going to be running it, so I'm wondering if this is
possible.
Thanks,
Ron







[htdig] environment problems

2001-01-17 Thread Ronald Edward Petty


OK, I yield; my ignorance has shown. It is our environment that is messed
up.  For example:

ajax:/export/netapp/user/rpy/htdig-3.1.5/htdig/ htdig
ld.so.1: htdig: fatal: libstdc++.so.2.10.0: open failed: No such file or
directory
Killed
ajax:/export/netapp/user/rpy/htdig-3.1.5/htdig/

This machine is not the one that it was compiled on, so that could be part
of it.  I assumed that moving everything that got built would be OK; I'm
wrong, apparently.  So now I ask: we have a production server, which is
the server where this compiled fine.  However, my managers want to
know, if you run something like htdig and it creates database files etc.,
where they go, what they are, and how many there will be.  Can someone PLEASE give me
a detailed list of what happens and where for a first run?  I don't know
how to do this, since I can't run it on the production server unless we're
sure that it won't destroy it.

Thanks again
ron







[htdig] Memory requirements

2001-01-17 Thread Pat Lennon

I have a Linux box with approx. 1 gig of HTML and PDF books. I want to
use htdig for the search engine. I don't want to assume too much,
but will 1 additional gig of hard disk cover the size of the index
database? I figure double may be a safe starting point. Also, what
memory requirements should I consider at a minimum? The hardware is a
Cyrix 150 with 64 MB RAM, running Red Hat 6.2 and the Apache web server. I know this is a vague
question; I would just like some reasonable starting points.



Thanks much

Pat







Re: [htdig] environment problems

2001-01-17 Thread Geoff Hutchison

At 5:51 PM -0500 1/17/01, Ronald Edward Petty wrote:
This machine is not the one that it was compiled on, so that could be part
of it.

Solaris has some somewhat strange ways of loading shared libraries 
and you're seeing the error messages from this. If you don't have 
libstdc++ (and any other libraries you linked in), you'll need to 
install them on this server too. You'll also probably need to set the 
environment variable LD_LIBRARY_PATH to include these libraries if 
they're in different locations than on the compiling machine.
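
A quick way to check whether the library named in the error message is present in the directories being searched is sketched below in Python. It is only a rough check and does not reproduce how the Solaris runtime linker actually resolves libraries; the function name find_library is made up for the example, and /usr/lib is added as a commonly used default directory.

```python
import os

def find_library(name, search_dirs):
    """Return the first existing path for `name` in search_dirs, else None."""
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.path.exists(candidate):
            return candidate
    return None

# Directories from LD_LIBRARY_PATH (if set), followed by /usr/lib.
dirs = os.environ.get("LD_LIBRARY_PATH", "").split(os.pathsep) + ["/usr/lib"]
print(find_library("libstdc++.so.2.10.0", [d for d in dirs if d]))
```

If this prints None on the machine where htdig fails to start, the library either is not installed there or lives in a directory that LD_LIBRARY_PATH does not list.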

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/






Re: [htdig] basic script

2001-01-17 Thread Geoff Hutchison

At 5:43 PM -0500 1/17/01, Ronald Edward Petty wrote:
Now that I have it, I assume you run htdig.  I saw somewhere a script or
something that runs it on the htdig site, but where is the link or
email? Does anyone know what I'm talking about, or am I going crazy? :)

There's the "rundig" script that is installed with the htdig, 
htmerge, htfuzzy, etc. binaries. There are also a variety of scripts 
you can use to drive things, e.g. http://www.htdig.org/contrib/

One more thing: when you compile, can the CONFIG file you adjust be
changed later, or is it internally added somehow?

The only variable which cannot be changed in the configuration files 
is the default config directory in htsearch. It must have this to 
prevent the CGI from reading arbitrary directories on the server.

Everything else can be set--see the attribute documentation, esp. 
those ending in _dir:
http://www.htdig.org/confindex.html

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/






Re: AW: [htdig] exclude_urls vs. url_part_aliases

2001-01-17 Thread SMantscheff

I rebuilt the whole database, so corruption shouldn't be the cause of the
problem. The problem is still reproducible. The workaround for
the time being is grabbing the search result with the original URLs and
replacing them in a PHP script.
s.m.


Your message of Wednesday 17 January 2001 16:15:
 Corruption would be my first guess too.  If it's not that, there could be
 something funky happening because of the URL rewriting.  I've seen this
 happen when you merge two databases together with incompatible settings
 of url_part_aliases, but not strictly in htsearch.  Is htsearch's config
 file the only one where you set url_part_aliases?  Does the problem go
 away if you take that definition out?

 According to Reich, Stefan:
  Looks like a Database corruption. Before trying something else, you
  should reindex your database with -i option or delete the database files
  before reindexing.
 
  Cheers
 
Stefan
 
  -Original Message-
  From: SMantscheff [mailto:[EMAIL PROTECTED]]
  Sent: Tuesday, 16 January 2001 08:49
  To: [EMAIL PROTECTED]
  Subject: [htdig] exclude_urls vs. url_part_aliases
 
 
  I exclude URLs like
  exclude_urls: our.server.de/F1 \
  our.server.de/F2 \
  our.server.de/F3 \
  our.server.de/F4 \
  our.server.de/F5
 
  Instead, I index a database with URLs like
  db.server/db/F1 \
  db.server/db/F2 \
  db.server/db/F3 \
  db.server/db/F4 \
  db.server/db/F5
 
  Then I rewrite URLs with
  url_part_aliases our.server.de/F db.server/db/F
 
  This works. But the results from the DB URLs are not displayed.
  By the number of pages I know that all matching documents from the
  database are found. But no document excerpts are shown. Is this a bug, a
  feature, or am I missing something?
 
  s.m.







Re: [htdig] environment problems

2001-01-17 Thread Carlos Ramirez

If you successfully built it on your production machine, and it runs the same OS
as your test machine, try copying the missing library files from your production
machine to your test machine. Either copy them to /usr/lib (the standard lib
directory on Solaris), or copy them to another directory and set your
LD_LIBRARY_PATH to that directory (as Geoff states).

This usually works for me.

Good luck.

-Carlos

Ronald Edward Petty wrote:

 Oh, they are on the server; my test machine is all messed up.  I
 compiled it just fine on the server, but moved it back to my test machine
 to see if I could just learn how to use it.  In other words, I want to see
 how many files are made, used, etc., and where, before I use it on our
 main servers.  But my test machine won't run htdig or anything because it
 can't find ld.o.1 or whatever it is (I'm home now and forgot the lib it was
 looking for). Maybe I can use this as an excuse to get a new machine :)
 Ron

 On Wed, 17 Jan 2001, Geoff Hutchison wrote:

  At 5:51 PM -0500 1/17/01, Ronald Edward Petty wrote:
  This machine is not the one that it was compiled on, so that could be part
  of it.
 
  Solaris has some somewhat strange ways of loading shared libraries
  and you're seeing the error messages from this. If you don't have
  libstdc++ (and any other libraries you linked in), you'll need to
  install them on this server too. You'll also probably need to set the
  environment variable LD_LIBRARY_PATH to include these libraries if
  they're in different locations than on the compiling machine.
 
  --
  -Geoff Hutchison
  Williams Students Online
  http://wso.williams.edu/
 

 


