[htdig] 3.2v?

2001-01-12 Thread Emilio Bueso


When will the 3.2 version of ht://Dig be available?


Regards,

Emilio Bueso

Institut Joan Lluís Vives
Edifici Rectorat i Serveis Centrals
Universitat Jaume I
Campus del Riu Sec
12071 - Castelló de la Plana
Tel. +34 964 72 89 93
Fax. +34 964 72 89 92
[EMAIL PROTECTED]
http://www.vives.org



To unsubscribe from the htdig mailing list, send a message to
[EMAIL PROTECTED]
You will receive a message to confirm this.
List archives:  
FAQ:




[htdig] prepare a search index for a different URL

2001-01-12 Thread Matthias Kleine

Hi there,

We aim to deliver a large documentation set to a customer and 
want to include a search engine in the package.

Installation would be easiest if the search index for the
documentation could be created without knowing the final URL.
Example: I create a search index for 

http://www.mydocumentation.com/projectfolder 

and all subdirectories. Afterwards, the customer will use the
documentation under 

http://www.customersdomain.com/projectfolder

It should then be possible to install the htdig files and to
tell the search index that the first part of all URLs has
changed to a different value (while everything else stays 
the same). 

Is this possible, or does the search index have to be rebuilt
from scratch each time the base URL changes (i.e. is it impossible
to use a variable for the base URL)?

Thanks for any hints,
Matthias Kleine
-- 
Matthias Kleine
Patzschke + Rasp Software AG
Bierstadter Straße 7
D-65189 Wiesbaden
Phone: ++49-(0)6 11-17 31-624
Fax:   ++49-(0)6 11-17 31-31
mailto:[EMAIL PROTECTED]
Web Site: http://www.prs.de/






Re: [htdig] Unable to contact server-revisisted

2001-01-12 Thread David Adams

Is this server in your local network or remote?  It might be worth trying to
index it via a proxy cache.  I found that this cured the problem for us,
though it hasn't helped everybody.

Take a look at the http_proxy configuration file attribute.

--
David Adams
Computing Services
Southampton University


- Original Message -
From: "Roger Weiss" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, January 11, 2001 4:18 PM
Subject: [htdig] Unable to contact server-revisisted


> Hi,
>
> I'm running htdig v3.1.5 and my digging seems to be running out of steam
> after it runs for anywhere from 20 minutes to an hour or so. The initial msg
> was "Unable to connect to server". So, I ran it again with -v v v   to get
> the error message below.
>
> pick: ponderingjudd.xxx.com, # servers = 550
> 3213:3622:2:http://ponderingjudd.xxx.com/ponderingjudd/id6.html: Unable to build
>  connection with ponderingjudd.xxx.com:80
>  no server running
>
> I've replaced part of the URL with xxx to protect the innocent. The server
> certainly is running and I had no trouble finding the mentioned url. Is
> there some parm I need to set or limit I need to raise?
> We're running an apache server with startservers =25 and minspace=10.
>
> Thanks for your help,
> Roger
>
> Roger Weiss
> [EMAIL PROTECTED]
> (978) 318-7301
> http://www.trellix.com
>
>
>
>







Re: [htdig] Problem with PDF files....

2001-01-12 Thread Elijah Kagan

Gilles,

1. I run htdig with an explicit -c option, so it uses the correct conf
file.
2. I rewrote the external_parsers so it includes only one line...
3. ..and it is the first line in the file

Results are the same! It is still looking for an acroread!

Please, help. I am getting desperate...

-- elijah


On Thu, 11 Jan 2001, Gilles Detillieux wrote:

> According to Elijah Kagan:
> > 
> > Dear Everyone
> > 
> > Hope this is the correct list to send such questions. If not, accept my
> > apologies.
> > 
> > When I run htdig on my files I get the following message when it comes to
> > a PDF document:
> > 
> > 41:41:3:http://myserver/~elijah/document.pdf: PDF::parse: cannot find pdf
> > parser /usr/local/bin/acroread  size = 1965732 
> > 
> > For some reason htdig looks for an Acrobat while its config file clearly
> > states:
> > 
> > external_parsers: application/msword->text/html /usr/local/bin/conv_doc.pl \
> >   application/postscript->text/html /usr/local/bin/conv_doc.pl \
> >   application/pdf->text/html /usr/local/bin/conv_doc.pl
> > 
> > The conv_doc.pl exists and working and the content type received from the
> > server is application/pdf.
> > 
> > Any ideas?
> ...
> > P.S.  I am running htdig 3.1.5 on a Debian system.
> 
> There are a few possibilities:
> 
> 1) htdig isn't looking at this config file, but another one, without
> the external_parsers definition;
> 2) there's a typo in the external_parsers definition that isn't showing up 
> in the text you e-mailed above, e.g. a misspelled word or a space after
> one of the backslashes at the end of the first two lines; or
> 3) there's a definition right above your external_parsers definition that
> mistakenly ends with a backslash at the end of the line, causing your
> external_parsers definition to be swallowed up by the previous line.
> 
> That htdig is attempting to invoke acroread confirms two things:  a)
> the PDF file is correctly being tagged by the server as application/pdf,
> and b) htdig is not seeing a usable definition of an external parser
> for that content-type, for any of the reasons outlined above.
> 
> -- 
> Gilles R. Detillieux  E-mail: <[EMAIL PROTECTED]>
> Spinal Cord Research Centre   WWW:http://www.scrc.umanitoba.ca/~grdetil
> Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
> Winnipeg, MB  R3E 3J7  (Canada)   Fax:(204)789-3930
> 
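
Possibilities 2 and 3 in the quoted list (a space after a trailing backslash, or a stray backslash on the line above the definition) are easy to miss by eye. The following is a minimal Python sketch of a mechanical check. It is not part of ht://Dig: the function name, the messages and the attribute-matching pattern are assumptions made for illustration, and the check is only a heuristic.

```python
import re

# Heuristic: a line that starts a new attribute looks like "name:" (but not "http://").
ATTRIBUTE_START = re.compile(r"^\s*[A-Za-z_][A-Za-z0-9_]*\s*:(?!//)")

SPACE_AFTER_BACKSLASH = "whitespace after trailing backslash"
SWALLOWED_ATTRIBUTE = "attribute follows a line ending in a backslash"


def find_continuation_problems(lines):
    """Return (line_number, message) pairs for suspicious continuation lines."""
    problems = []
    previous_continues = False
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        stripped = text.rstrip()
        # A new attribute right after a continued line gets swallowed by it.
        if previous_continues and ATTRIBUTE_START.match(text):
            problems.append((number, SWALLOWED_ATTRIBUTE))
        # A backslash followed by blanks no longer ends the line.
        if stripped.endswith("\\") and stripped != text:
            problems.append((number, SPACE_AFTER_BACKSLASH))
        previous_continues = text.endswith("\\")
    return problems
```

It could be run over the lines of the config file that is passed to htdig with -c, for example the result of readlines() on that file.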







Re: [htdig] 3.2v?

2001-01-12 Thread Geoff Hutchison

At 9:47 AM +0100 1/12/01, Emilio Bueso wrote:
>When will the 3.2 version of ht://Dig be available?

Betas are available now--the 3.2.0b3 release will be made fairly soon.

When will 3.2.0 be ready? When it's finished.

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/






Re: [htdig] prepare a search index for a different URL

2001-01-12 Thread Geoff Hutchison

At 11:42 AM +0100 1/12/01, Matthias Kleine wrote:
>http://www.customersdomain.com/projectfolder
>
>It should now be possible to install the htdig-files and to
>tell the search index that the first part of all URL's has
>changed to a different value (while everything else stayed
>the same).

What you'll want to do is to use the url_part_aliases attribute:


Index the site using something like this in your config file:
url_part_aliases: http://www.mydocumentation.com/ *1

Then when you install the documentation, you'll want to edit the 
config file to read:
url_part_aliases: http://www.customersdomain.com/ *1

This will encode the URLs when indexing and make sure a different 
pattern is there for decoding on the customer's end.
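
As a rough illustration of the encode/decode idea described above, here is a minimal Python sketch. It is not ht://Dig code: the function names are hypothetical, the page path is made up, and it only handles an alias at the start of a URL, which is a simplification of whatever the real implementation does.

```python
def encode_url(url, aliases):
    """Replace a known URL part with its short code, as done at indexing time."""
    for part, code in aliases:
        if url.startswith(part):
            return code + url[len(part):]
    return url


def decode_url(stored, aliases):
    """Replace a short code with the URL part configured at search time."""
    for part, code in aliases:
        if stored.startswith(code):
            return part + stored[len(code):]
    return stored


# Aliases as they would appear in the two versions of the config file.
dig_aliases = [("http://www.mydocumentation.com/", "*1")]
search_aliases = [("http://www.customersdomain.com/", "*1")]

# Hypothetical page below projectfolder, used only for the example.
stored = encode_url("http://www.mydocumentation.com/projectfolder/page.html", dig_aliases)
shown = decode_url(stored, search_aliases)
```

Because only the short code is stored, changing the URL part in the search-time config is enough to make results point at the customer's host without reindexing.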

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/






AW: [htdig] prepare a search index for a different URL

2001-01-12 Thread Reich, Stefan

You can use the url_part_aliases feature to get this done. You need two
config files, one for digging, one for searching.

In the dig config, you set
url_part_aliases: http://original_url.com/ replace#1
In the search config, you set
url_part_aliases: http://new_url.com/ replace#1

replace#1 may be any string that is not a common word in your documents (I
always use replace#1, replace#2 and so on for such "rename" tasks).

See http://www.htdig.org/attrs.html#url_part_aliases

Bye

  Stefan

-Original Message-
From: Matthias Kleine [mailto:[EMAIL PROTECTED]]
Sent: Friday, 12 January 2001 11:43
To: [EMAIL PROTECTED]
Subject: [htdig] prepare a search index for a different URL







Re: [htdig] 3.2v?

2001-01-12 Thread Michael Schulz

Geoff Hutchison wrote:
> When will 3.2.0 be ready? When it's finished.

Really? :-)))

When do you think (approx.) 3.2.0 will be ready?

Mike






Re: [htdig] Problem with PDF files....

2001-01-12 Thread Gilles Detillieux

According to Elijah Kagan:
> 1. I run htdig with an explicit -c option, so it uses the correct conf
> file.
> 2. I rewrote the external_parsers so it includes only one line...
> 3. ..and it is the first line in the file
> 
> Results are the same! It is still looking for an acroread!
> 
> Please, help. I am getting desperate...

Hmm.  You're sure you're running version 3.1.5 of htdig, and you
don't have a pre-3.1.4 binary of htdig kicking around that you might be
unknowingly running instead?  External converter support was added to the
external_parsers attribute only in version 3.1.4 and above.  If you're
sure this isn't the problem either, please send me a copy of your conf
file as it stands now (preferably uuencoded right on your htdig box to
prevent e-mail mangling of it), and I'll have a look and try a test or two.

Oh, another thing.  You mentioned this was on a Debian system.  Did you
compile htdig yourself, or did you use a pre-compiled binary?  If the
latter, which one?

-- 
Gilles R. Detillieux  E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre   WWW:http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:(204)789-3930






Re: [htdig] 3.1.3 engine on 3.1.5 db

2001-01-12 Thread Gilles Detillieux

According to Dave Salisbury:
> > If
> > you created your database with htdig 3.1.5, and want to search it with
> > htsearch 3.1.3, that's a bad idea.  The most glaring bug in releases
> > before 3.1.5 is in htsearch, so you really should upgrade it.
> 
> I take it one of the worst things is the security hole which allows
> a user to view any file with read permissions ( ouch! )

That's the one!

> Is there any way to correct for this with a wrapper around htsearch?
> Reading the indices using 3.1.3 that were created by a 3.1.5 engine
> seems to work just fine.

There would be, but it might be a tad tricky.  The idea is to use a
backslash to quote any left quote (`), dollar sign ($) or backslash
(\) in the query string that is part of an input parameter value that
will get added to the config object as an internal attribute setting.
The lines in htsearch/htsearch.cc that do this are (from a grep):

config.Add("match_method", input["method"]);
config.Add("template_name", input["format"]);
config.Add("matches_per_page", input["matchesperpage"]);
config.Add("config", input["config"]);
config.Add("restrict", input["restrict"]);
config.Add("exclude", input["exclude"]);
config.Add("keywords", input["keywords"]);
config.Add("sort", input["sort"]);
config.Add(form_vars[i], input[form_vars[i]]);

The last one above is the tricky one, as it can be any input parameter
name that you use in allow_in_form.  Rather than limiting the backslash
escaping of special characters to only the values of these parameters,
it might be better to do the whole query string, but exclude a few
parameters where this might be undesirable.  I'd recommend NOT doing
this for the "words" input parameter, for instance, but I can't think
of any others right off-hand where you would not want to do this.
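
The escaping idea described above could be sketched roughly as follows in Python. This is an illustration only, not a tested fix for the hole: the function names are hypothetical, it escapes every parameter except "words" as suggested above, and it assumes a wrapper that can rewrite the query string before htsearch sees it. Upgrading htsearch to 3.1.5 remains the real fix.

```python
from urllib.parse import parse_qsl, urlencode

# Characters to quote with a backslash: backslash, left quote, dollar sign.
SPECIAL_CHARACTERS = "\\`$"

# Parameters left untouched; "words" is the one exception suggested above.
UNESCAPED_PARAMETERS = {"words"}


def escape_value(value):
    """Put a backslash in front of each special character in value."""
    return "".join("\\" + ch if ch in SPECIAL_CHARACTERS else ch for ch in value)


def sanitize_query(query_string):
    """Return the query string with all parameter values escaped, except the skipped ones."""
    pairs = parse_qsl(query_string, keep_blank_values=True)
    cleaned = [
        (name, value if name in UNESCAPED_PARAMETERS else escape_value(value))
        for name, value in pairs
    ]
    return urlencode(cleaned)
```

A wrapper script would apply sanitize_query to the incoming query string and hand the result to the real htsearch binary; how that hand-off is done depends on the web server setup.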

> Anyone out there want to bash Glimpse before I look into it.  
> I'm hoping to get it at least to compile on an SGI.

I won't do any bashing, but if htdig is your preference, I'd suggest not
giving up on it too quickly.  Did you have a look at David Adams' recent
post about an "IRIX compile fix"?  In it, he forwarded a message from
Bob MacCallum that explains a workaround to some problems on IRIX 6.5,
using cc, not gcc.  If you haven't already, you ought to try that before
abandoning htdig.

> > On the other hand, if you have an existing database built with version
> > 3.1.3, and want to use it with the latest htsearch, that should work
> > without any difficulty.  However, you'll lose out on several benefits
> > in the latest htdig (better parsing of meta tags, parsing img alt text,
> > fixed parsing of URL parameters, etc.), 
> 
> Couldn't find what "fixed parsing of URL parameters" means.
> The query string is part of what's indexed??

The query string isn't indexed, but it's part of the URL.  3.1.3 mangled
bare ampersands (&) in the query string in an URL, and versions before
that didn't decode sequences like é within an URL.  I think the
ChangeLog explains it better than the release notes.

Tue Nov 23 19:52:27 1999  Gilles Detillieux  <[EMAIL PROTECTED]>

* htdig/HTML.cc(transSGML), htdig/SGMLEntities.cc(translateAndUpdate):
Fix the infamous problem in htdig 3.1.3 of mangling URL parameters that
contain bare ampersands (&), and not converting &amp; entities in URLs.
...
Wed Sep  1 15:39:41 1999  Gilles Detillieux  <[EMAIL PROTECTED]>

* htdig/HTML.h, htdig/HTML.cc(do_tag, transSGML): Fix the HTML parser
to decode SGML entities within tag attributes.
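
To make the two cases in the ChangeLog entries concrete, here is a small Python sketch of the intended behaviour. It uses Python's standard html.unescape rather than anything from ht://Dig, and the function name and sample URLs are made up for the example.

```python
from html import unescape


def href_to_url(attribute_value):
    """Decode SGML/HTML entities in a tag attribute to get the URL to fetch."""
    return unescape(attribute_value)


# An &amp; entity in the attribute becomes a plain ampersand in the URL.
from_entity = href_to_url("content.php3?a=1&amp;b=2")

# A bare ampersand is passed through unchanged rather than mangled.
from_bare = href_to_url("content.php3?a=1&b=2")
```

Both spellings of the link end up as the same URL, which is the behaviour the fixes above aim for.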

> > which you'll only get if you
> > reindex with htdig 3.1.5.  Maybe none of these matter for your site,
> > though.  See the release notes and ChangeLog for details.
> 
> I don't think they're essential.

Except for the URL parameter mangling fix, of course.

-- 
Gilles R. Detillieux  E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre   WWW:http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:(204)789-3930






RE: [htdig] 3.2v?

2001-01-12 Thread Geoff Hutchison

On Fri, 12 Jan 2001, Emilio Bueso wrote:

> I DO need phrase search. Will 3.2.0b3 support phrase searching?

Yes. As do the current development snapshots.

> Could I use indexes from 3.1.5 with 3.2.0b3?

Alas, no. The pre-3.2 code doesn't index word positions, etc. So you have
to reindex again. But there will be ways to update indexes after this
point. Read the htdoc/RELEASE.html and htdoc/upgrade.html files for
instructions.

> > When will 3.2.0 be ready?
> > When it's finished.
> 
> A long way to go?:)

I'd guess we need at least one more beta before aiming at 3.2.0.

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/







Re: [htdig] 3.2v?

2001-01-12 Thread Geoff Hutchison

On Fri, 12 Jan 2001, Michael Schulz wrote:

> What do you think (aprox.), when the 3.2.0 will be ready?

Haven't the faintest. I'd hope that 3.2.0b3 will be out soon and we'll
probably need at least another beta (so add a few months before 3.2.0b4)
before aiming at 3.2.0.

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/







[htdig] Re: need help with htdig...

2001-01-12 Thread Gilles Detillieux

According to Ran Kenig:
> I spotted your email on one of the answers in the htdig.org site and I
> thought maybe you could help me.

The archives you were most likely looking at were those of the
[EMAIL PROTECTED] mailing list.  That is where questions of this sort
should be asked.  See http://www.htdig.org/FAQ.html#q1.16

> I'm creating this simple site search engine using htdig.
> My problem is that the description that return with the page is only its
> title.
> All our titles are the same so the list of files are all the same, so I
> would like to show the meta description instead.
> Maybe I'm doing something wrong but I've added these to the config file:
>  - use_meta_description: true
>  - keywords_meta_tag_names: keywords description title
>  - title_factor: 0
> 
> I recreate the DB's and nothing changed.
> 
> you can see the search page in: http://www.lsvl.net/home_search.html
> Can you help me ???

The use_meta_description attribute gets htsearch to use the meta
description in place of the document excerpt, not in place of the title.
Also, the keywords_meta_tag_names is for specifying which meta tags will
be parsed as keywords, and added to the word database according to the
keyword_factor.  As meta description text is already added to the word
database according to the meta_description_factor, there's really no
point in listing "description" in keywords_meta_tag_names.  Finally,
setting title_factor to 0 will prevent the words in the title from
bearing any weight in the search, but it won't prevent the titles from
showing up on the results page, because the templates still use them.
So, none of the 3 attribute settings is particularly appropriate for
what you're trying to accomplish.

I tried a search at your search page above and got this error:
  Not Found

  The requested URL /cgi-bin/hierArrays.js was not found on this server.

So, there's apparently another configuration problem you need to deal
with as well.  You shouldn't use relative file name references in your
htsearch templates, because they're interpreted relative to the directory
in which htsearch is found, not the one in which the templates are found.
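
The way a browser resolves such a reference can be reproduced with Python's standard urljoin. This is only an illustration: the htsearch URL below is an assumption based on the /cgi-bin/ path in the error message, and the server-relative path in the second call is hypothetical.

```python
from urllib.parse import urljoin

# Assumed location of the htsearch CGI on the site in question.
SEARCH_URL = "http://www.lsvl.net/cgi-bin/htsearch"


def resolve_reference(reference, page_url=SEARCH_URL):
    """Resolve a reference found in the results page against the page's URL."""
    return urljoin(page_url, reference)


# A relative reference lands next to htsearch, which matches the 404 above.
relative = resolve_reference("hierArrays.js")

# A server-relative reference does not depend on where htsearch lives.
absolute = resolve_reference("/hierArrays.js")
```

Since the results page is served from the htsearch URL, references in the templates are safer when written with a full path or a full URL.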

If you want the meta descriptions to replace the titles, first of all
you should set use_meta_description back to false, because you don't
want the descriptions to appear as both the title and as the text.
Next, you should apply this patch to htsearch/Display.cc, so you can
use the meta description in the result templates:

--- htsearch/Display.cc.orig    Thu Feb 17 10:43:28 2000
+++ htsearch/Display.cc Fri Jan 12 11:12:13 2001
@@ -273,6 +273,7 @@ Display::displayMatch(ResultMatch *match
        vars.Remove("ANCHOR");
     }
 
+    vars.Add("METADESCRIPTION", new String(ref->DocMetaDsc()));
     vars.Add("SCORE", new String(form("%d", match->getScore())));
     vars.Add("CURRENT", new String(form("%d", current)));
     char   *title = ref->DocTitle();

Then, you'll need to uncomment the template_map definition from your
htdig.conf file, and edit the common/long.html and common/short.html
template files to use $&(METADESCRIPTION) instead of $&(TITLE).

(All of this assumes you're running version 3.1.5 of ht://Dig.)

Of course, none of this will do much good unless you consistently
have meaningful meta description tags in all the pages on your site.
As I was browsing through almost a half-dozen pages there, I didn't
come across any that did have a meta description.  If you're going to
change all those pages, it seems to me it would make more sense to
put the meaningful descriptions within the <TITLE>...</TITLE> tags,
rather than in a <META NAME="description" CONTENT="..."> tag, and
avoid all the configuration changes.

-- 
Gilles R. Detillieux  E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre   WWW:http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:(204)789-3930






[htdig] Performance problems with htdig 3.2.0b2

2001-01-12 Thread Mathias Rohland

Hello to all on this list,

I have a problem with the performance of htdig 3.2.0b2. I'm currently
indexing more than 25,500 HTML docs, and it takes several (8+) hours
to index them on a machine that's not too busy with other tasks (PII 233
w/ 512K cache and 128MB RAM).
All documents are dynamic content coming from a MySQL database and are
created by a PHP script.
The machine is running SuSE Linux 7.0, Apache 1.3.x (no fancy_indexing)
w/ php4 and MySQL 3.22.x.

As all docs are created using one php-script I created a three-level 
index stage where the 1st level contains only links to the 2nd that
contains up to 500 links to the script that actually creates the 
content.

index.php3
   |
   +---index2.php3?START_ID=x&END_ID=x+499
   |  |
   |  +---content.php3?x (x being the docid - primary key in db)
   |  | 
   |  +---content.php3?x+1
   |  |
   |  ...
   |  |
   |  +---content.php3?x+499
   |
   +---index2.php3?START_ID=x+500&END_ID=x+999
   |
   ...

The documents are not that big. The maximum size of one document is
definitely below 32KB.

Unfortunately I also have to index numbers, as docids and several other
pieces of information in this database consist of numbers only.

I need to use htdig 3.2.0b2 as we need phrase searching and a second
machine in another location that runs with solaris won't like 3.2.0b3.

The htdig.conf I'm using can be seen below:

<-- cut here ->
#
# Example config file for ht://Dig.
#
# This configuration file is used by all the programs that make up ht://Dig.
# Please refer to the attribute reference manual for more details on what
# can be put into this file.  (http://www.htdig.org/confindex.html)
# Note that most attributes have very reasonable default values so you
# really only have to add attributes here if you want to change the defaults.
#
# What follows are some of the common attributes you might want to change.
#

#
# Specify where the database files need to go.  Make sure that there is
# plenty of free disk space available for the databases.  They can get
# pretty big.
#
database_dir:   /local/htdig/db

#
# This specifies the URL where the robot (htdig) will start.  You can specify
# multiple URLs here.  Just separate them by some whitespace.
# The example here will cause the ht://Dig homepage and related pages to be
# indexed.
# You could also index all the URLs in a file like so:
# start_url:   `${common_dir}/start.url`
#
start_url:  http://192.168.1.27/volltext/index.php3

#
# This attribute limits the scope of the indexing process.  The default is to
# set it to the same as the start_url above.  This way only pages that are on
# the sites specified in the start_url attribute will be indexed and it will
# reject any URLs that go outside of those sites.
#
# Keep in mind that the value for this attribute is just a list of string
# patterns. As long as URLs contain at least one of the patterns it will be
# seen as part of the scope of the index.
#
#limit_urls_to: ${start_url}
limit_urls_to:  index.php3 index2.php3 content.php3

#
# If there are particular pages that you definitely do NOT want to index, you
# can use the exclude_urls attribute.  The value is a list of string patterns.
# If a URL matches any of the patterns, it will NOT be indexed.  This is
# useful to exclude things like virtual web trees or database accesses.  By
# default, all CGI URLs will be excluded.  (Note that the /cgi-bin/ convention
# may not work on your web server.  Check the path prefix used on your web
# server.)
#
exclude_urls:   /cgi-bin/ .cgi search.php3

#
# Since ht://Dig does not (and cannot) parse every document type, this
# attribute is a list of strings (extensions) that will be ignored during
# indexing. These are *only* checked at the end of a URL, whereas
# exclude_url patterns are matched anywhere.
#
# Also keep in mind that while other attributes allow regex, these must be
# actual strings.
#
bad_extensions: .wav .gz .z .sit .au .zip .tar .hqx .exe .com .gif \
.jpg .jpeg .aiff .class .map .ram .tgz .bin .rpm .mpg .mov .avi

#
# The string htdig will send in every request to identify the robot.  Change
# this to your email address.
#
maintainer: webmaster

#
# The excerpts that are displayed in long results rely on stored information
# in the index databases.  The compiled default only stores 512 characters of
# text from each document (this excludes any HTML markup...)  If you plan on
# using the excerpts you probably want to make this larger.  The only concern
# here is that more disk space is going to be needed to store the additional
# information.  Since disk space is cheap (! :-)) you might want to set this
# to a value so that a large percentage of the documents that you are going
# to be indexing are stored completely in the database.  At SDSU we found
# that by setting this value to abou





Re: [htdig] Performance problems with htdig 3.2.0b2

2001-01-12 Thread Gilles Detillieux

According to Mathias Rohland:
> I have a problem with the performance of htdig 3.2.0b2. I'm indexing 
> about +25.500 HTML-docs at the moment and it takes several (+8) hours 
> to index them on a machine that's not to busy with outher tasks (PII 233
> w/ 512K Cache and 128MB RAM).
...
> I need to use htdig 3.2.0b2 as we need phrase searching and a second
> machine in another location that runs with solaris won't like 3.2.0b3.

3.2.0b3 is still a work in progress, but it already fixes a large number
of bugs in 3.2.0b2.  Try the latest snapshot of b3, and if you still
can't compile it on Solaris, please e-mail us at this list the output
of the configure and make runs.  I don't think it makes sense for us to
take time debugging an old beta version when the real problem here is
you can't build the latest beta pre-release.

-- 
Gilles R. Detillieux  E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre   WWW:http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:(204)789-3930






[htdig] htdig ignores *.doc file extension

2001-01-12 Thread Evelio Martinez


 
Hello!

I have installed htdig under RH 6.2 and followed the instructions in the
README files.

1) conf/htdig.conf does not contain anything related to *.doc or *.pdf
   documents in bad_extensions:
2) external_parsers: application/msword->text/html /opt/www/htdig/scripts/doc2html.pl \
                     application/pdf->text/html /opt/www/htdig/scripts/doc2html.pl
3) The variables in doc2html point to the correct places:
   $CATDOC = "/usr/local/bin/catdoc";
   $CATPDF = "/usr/local/bin/pdftotext";
   $PDFINFO = "/usr/local/bin/pdfinfo";

htdig is ignoring the files with pdf and doc extensions.
Did I miss something?
Any suggestions?
Thanks in advance
-- 
Evelio Martínez
Testanet. Dept. desarrollo software.
Av. Reino de Valencia, 15 - 5
46005 Valencia (Spain)
Tel: +34 96 395 90 00
Fax: +34 96 316 23 19
 


[htdig] how to set the $(PERCENT)? -it always show 1%

2001-01-12 Thread Edward Lu



Hi Htdig,

I am trying to set the $(PERCENT) on my result pages, but it always shows 1%
for any result I get. Is there any configuration I need to set?

By the way, Htdig is very good. Better than the AltaVista search engine. I
tried both.

Looking forward to your reply.

Thanks!
-Ed

Edward Lu
Consultant
Fort Point Partners Inc.
Builders of Internet Solutions that Sell Harder
111 Sutter St, 22nd Floor, San Francisco, CA 94104
tel (415) 762-3751
fax (415) 395-4783
[EMAIL PROTECTED]
http://www.fortpoint.com



Re: [htdig] how to set the $(PERCENT)? -it always show 1%

2001-01-12 Thread Geoff Hutchison

On Fri, 12 Jan 2001, Edward Lu wrote:

> I am trying to set the $(PERCENT) on my result pages. But it always shows 1%
> any result I got. 

I'm going to guess that you're using 3.2.0b2. I strongly suggest using
either the production version 3.1.5 or one of the latest development
snapshots:


--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/







Re: [htdig] htdig ignores *.doc file extension

2001-01-12 Thread Geoff Hutchison

On Fri, 12 Jan 2001, Evelio Martinez wrote:

> htdig is ignoring the files with pdf and doc extension.

By this, I assume you mean they're not indexed.

Try running htdig -vvv and take a look at what happens when it encounters
a link to a PDF file. Does it reject the link? Or does it get to the link
and try to index it later?

If it's the former, then one of your limits is set incorrectly. (e.g.
bad_extensions, valid_extensions, exclude_urls, limit_urls_to ...)

If it's the latter, then make sure you can run a .doc or a .pdf through
the external converter itself and get reasonable-looking output.

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/







RE: [htdig] how to set the $(PERCENT)? -it always show 1%

2001-01-12 Thread Geoff Hutchison

On Fri, 12 Jan 2001, Edward Lu wrote:

> Thanks! Geoff. I think that is the problem. 
> But I am not sure where to find the production version 3.1.5. Apparently it
> is not on www.htdig.org web site. 

Really? Try  -- looks pretty good to
me...

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/







RE: [htdig] how to set the $(PERCENT)? -it always show 1%

2001-01-12 Thread Edward Lu

Geoff,
What is the security hole in version 3.1.5?
It sounds scary. 

-Ed






Re: [htdig] security hole (was: how to set the $(PERCENT)? -it always show 1%)

2001-01-12 Thread Gilles Detillieux

According to Edward Lu:
> Geoff,
> What is the security hole in version 3.1.5?
> It sounds scary. 

The security hole is in versions BEFORE 3.1.5, and is fixed in 3.1.5.  It
allowed a user to snoop through any file on your web server's file system,
as long as it was readable by the user ID under which the web server process
runs, just by passing it a special query string in the htsearch URL.

-- 
Gilles R. Detillieux  E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre   WWW:http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:(204)789-3930






[htdig] any suggestions for using 3.1.5 or 3.2.0b2?

2001-01-12 Thread Edward Lu

According to the release notes for htdig-3.2.0b2, it added more functionality
and fixed all known bugs after 3.1.5.
But apparently it still has the relevance ($(PERCENT)) bug and is not stable
enough.
I am asking for suggestions about which version (3.1.5 or 3.2.0b2)
should be used for our company web site.
Does anyone have experience with the advantages and disadvantages of each version?

Any suggestions will be greatly appreciated.

-Edward







[htdig] Grouping words

2001-01-12 Thread Jason Meyering

Is it possible to group words in a search via quotes or anything?  For
example, I tried searching for an answer on htdig.org and a search for
"group words" (including the quotes) performed a search for "group and
words" but not the string "group words".  The same thing happens on my test
server--when I try to group words in quotes, htsearch always splits them up
into individual words.  Is there any way to change this behaviour or any
characters I can put in the search string that will allow grouping?  Thanks
in advance!

J






[htdig] Re: any suggestions for using 3.1.5 or 3.2.0b2?

2001-01-12 Thread Gilles Detillieux

According to Edward Lu:
> According to the release notes for htdig-3.2.0b2, it added more functionality
> and fixed all known bugs after 3.1.5.
> But apparently it still has the relevance ($(PERCENT)) bug and is not stable
> enough.
> I am asking for suggestions about which version (3.1.5 or 3.2.0b2)
> should be used for our company web site.
> Does anyone have experience with the advantages and disadvantages of each version?
> 
> Any suggestions will be greatly appreciated.
> 
> -Edward

It's correct that 3.2.0b2 fixed many known bugs in 3.1.5, but none of
these were earth-shattering problems.  There were many limitations,
though, in the 3.1.x series that required a pretty radical redesign of
many components.

While 3.2.0b2 did fix some bugs, it introduced a whole lot of new ones because of
the large number of redesigned/rewritten components.  That's why 3.2 is
still in beta.  The latest 3.2.0b3 pre-release source snapshot fixes a
lot of the 3.2.0b2 bugs, but there are still some that remain.

If you need the features of 3.2, then use the b3 snapshots, not the
b2 release.  If you don't need these features, and the limitations of
3.1 aren't a problem for you, then you'd be wise to stick to 3.1.5 for
a production system until 3.2 gets a bit more of a shakeout.

-- 
Gilles R. Detillieux  E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre   WWW:http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:(204)789-3930






Re: [htdig] Grouping words

2001-01-12 Thread Geoff Hutchison

On Fri, 12 Jan 2001, Jason Meyering wrote:

> Is it possible to group words in a search via quotes or anything?  For

See the FAQ: 

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/







Re: [htdig] Grouping words

2001-01-12 Thread Gilles Detillieux

According to Jason Meyering:
> Is it possible to group words in a search via quotes or anything?  For
> example, I tried searching for an answer on htdig.org and a search for
> "group words" (including the quotes) performed a search for "group and
> words" but not the string "group words".  The same thing happens on my test
> server--when I try to group words in quotes, htsearch always splits them up
> into individual words.  Is there any way to change this behaviour or any
> characters I can put in the search string that will allow grouping?  Thanks
> in advance!

See http://www.htdig.org/FAQ.html#q1.9

-- 
Gilles R. Detillieux  E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre   WWW:http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:(204)789-3930

