Re: [htdig] Question about search engine

2000-11-20 Thread Doug Barton

[EMAIL PROTECTED] wrote:
> 
> At 11:41 AM -0800 11/20/2000, Doug Barton wrote:
> >Dmitry Lesov wrote:
> >>
> >>  Dear Sir/Madam,
> >>  I have a specific question about the search engine. I am looking for a
> >>  search engine smart enough to target HTML pages back into their original
> >>  frameset. Does your search engine have this capability? Please
> >>  reply ASAP. Thank you.
> >
> >No search engine I know of does; it's actually a very difficult problem to
> >solve. However, there are some javascript tricks you can use. Take a look
> >at http://www.zdnet.com/devhead/stories/articles/0,4413,2438662,00.html
> 
> Actually, MondoSearch does this rather nicely
> <http://www.mondosearch.com>.  The drawback is that they have to
> reindex the entire site to do an update.

I should have clarified that I meant "freely available, open source"
search engine. :)

Doug
-- 
 Any sufficiently advanced technology is indistinguishable from magic.
 -- Arthur C. Clarke



To unsubscribe from the htdig mailing list, send a message to
[EMAIL PROTECTED]
You will receive a message to confirm this.
List archives:  <http://www.htdig.org/mail/menu.html>
FAQ:<http://www.htdig.org/FAQ.html>




Re: [htdig] Question about search engine

2000-11-20 Thread Doug Barton

Dmitry Lesov wrote:
> 
> Dear Sir/Madam,
> I have a specific question about the search engine. I am looking for a
> search engine smart enough to target HTML pages back into their original
> frameset. Does your search engine have this capability? Please
> reply ASAP. Thank you.

No search engine I know of does; it's actually a very difficult problem to
solve. However, there are some javascript tricks you can use. Take a look
at http://www.zdnet.com/devhead/stories/articles/0,4413,2438662,00.html

Good luck,

Doug
-- 
 Any sufficiently advanced technology is indistinguishable from magic.
 -- Arthur C. Clarke





Re: [htdig] not showing child frames?

2000-11-15 Thread Doug Barton

I just found a really good article on this topic at zdnet's aforementioned
site. http://www.zdnet.com/devhead/stories/articles/0,4413,2438662,00.html

Is this worth a FAQ entry? I could probably write something up.

Enjoy,

Doug
PS: please don't hold the fact that it's a FrontPage article against me. :)
-- 
 Any sufficiently advanced technology is indistinguishable from magic.
 -- Arthur C. Clarke





Re: [htdig] not showing child frames?

2000-11-15 Thread Doug Barton

Geoff Hutchison wrote:
> 
> On Wed, 15 Nov 2000, Gilles Detillieux wrote:
> 
> > If you do define each page as its own unique frameset, and you're 100%
> > consistent in the naming convention you use for this, then it would
> > probably be pretty easy to map the target page URLs back to their
> > corresponding frameset URLs.  However, I have yet to see a frames-based
> > web site where this would be possible.
> 
> Someone once told me that you could do this with JavaScript. (I guess it
> detected that it was the only frame and assembled the frameset.)
> 
> I have not seen a working example of this. Perhaps some JavaScript gurus
> out there could provide one.

I am not a javascript guru, but I do play one on TV. :)


...





In this example, if someone clicks on a link in my search results page that
actually references my table of contents frame, it uses a fake 302
(location) header to cause that "page" to load up in the proper frameset by
going to the site's home, which then loads the frameset, loads the TOC,
etc. Obviously you should replace "tocframe" with the real name of the
frame that your page should be in. This works really well, as long as your
search form itself is not in a frameset. You can extend this of course with
a little imagination. 

A really good site for Javascript (and other) tips and tricks is
http://www.zdnet.com/devhead/, although I don't recall offhand where I
picked this one up.

HTH,

Doug
-- 
 Any sufficiently advanced technology is indistinguishable from magic.
 -- Arthur C. Clarke





Re: [htdig] 3.2b2 -- include:, config_dir

2000-11-15 Thread Doug Barton

[EMAIL PROTECTED] wrote:
> 
> Documentation of these is less than entirely clear.
> 
> As to "include:", what is the implied path:

What do you mean by "implied path"?

I tried using an include: statement in an htdig.conf file that I'm
developing. I first tried "include: $config_dir/includefile", which worked
with rundig but not with htsearch, because htsearch couldn't seem to
determine the value of $config_dir. I don't remember whether I tried it with
an absolute path to the include file or not...


Doug
-- 
 Any sufficiently advanced technology is indistinguishable from magic.
 -- Arthur C. Clarke





Re: [htdig] 3.20b2 -- subsequent-page-locate HTML.

2000-11-09 Thread Doug Barton

Geoff Hutchison wrote:
> 
> On Thu, 9 Nov 2000, Gilles Detillieux wrote:
> 
> > in 3.2.0b2.  In that sense, it's probably ready for release already,
> > but I think there were still some known bugs that Geoff wanted to nail
> > down first.
> 
> Yes, there are a few known bugs I'd like to squash. I'm also working on
> integrating the new htsearch query parser--it wouldn't be hard if I had a
> day or two where I could sit down and do it, but that just hasn't happened
> yet. I can't commit to any sort of date for 3.2.0b3 until I can do this.
> 
> My general policy on the beta releases is that as they get further and
> further along, I'm more likely to suggest the snapshots since they're much
> more likely to have the bugs fixed in a timely manner.

Well, if you're looking for user input, I'd like to put a vote in for
making a beta release with the current feature set cleaned up to a
reasonable degree and put off adding new features for the next beta. Having
done this kind of development myself I am very familiar with the "one more
feature" feeling, but at some point you just have to lay down the tag. :) 

Of course, I'm speaking from a selfish perspective since I have a new
project that I'd like to use htdig on that needs at least a semblance of
phrase searching capability, and b2 still has some quirks that I'm not
comfortable with. I'd also like to put a nice stable beta in the FreeBSD
ports tree, which I think will help speed acceptance of it and get you a
bunch more vict^H^H^H^H testers involved in the project. 

Doug
-- 
 Any sufficiently advanced technology is indistinguishable from magic.
 -- Arthur C. Clarke





Re: [htdig] cookies and session-ids

2000-03-23 Thread Doug Barton

Remi Fasol wrote:
> 
> --- Geoff Hutchison <[EMAIL PROTECTED]> wrote:
> 
> > But this raises another question--why are you
> > spitting back different
> > session IDs?
> 
> Apache::ASP has a feature where the session-id can be
> automatically added to urls when the user agent
> doesn't accept cookies. 

How does it determine that the UA "accepted" the cookie? If you can find
this out, it may be easy enough to have htdig fake the appropriate response.

Doug
-- 
  "Too much of a good thing is WONDERFUL."
   -- Mae West






Re: [htdig] Problems with CRONTAB

2000-03-18 Thread Doug Barton

Chris Tubutis wrote:
> 
> > # Schedule updates for the ht://Dig search data base
> >
> > # MIN HOUR DAY MONTH DAY_OF_WEEK COMMAND
> >
> > # Run rundig at 3:00 AM daily
> > * 3 * * * /home/fanac/www/htdig/bin/rundig
> 
> It looks to me like you're running it at 3:00, at 3:01, at 3:02, at
> 3:03, ... 3:59 each day.

Right-o. You want:

0 3 * * * /home/fanac/www/htdig/bin/rundig
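
To see why the first field matters, here is a minimal Python sketch (not part of cron or ht://Dig) that expands the minute and hour fields of a crontab entry and counts how many times a day the job would fire. The helper names expand_field and runs_per_day are made up for illustration, and only the simplest field forms are handled: "*", a single number, A-B ranges, and comma lists.

```python
def expand_field(field, low, high):
    """Expand a simple cron field into the sorted list of values it matches.

    Supports '*', a single number, 'A-B' ranges and comma-separated lists.
    Step values such as '*/5' are deliberately left out of this sketch.
    """
    values = set()
    for part in field.split(","):
        if part == "*":
            start, end = low, high
        elif "-" in part:
            first, last = part.split("-", 1)
            start, end = int(first), int(last)
        else:
            start = end = int(part)
        values.update(range(start, end + 1))
    return sorted(values)


def runs_per_day(minute_field, hour_field):
    """Number of (hour, minute) combinations matched in one day."""
    minutes = expand_field(minute_field, 0, 59)
    hours = expand_field(hour_field, 0, 23)
    return len(minutes) * len(hours)


# "* 3 * * *" matches every minute of the 3 o'clock hour.
print(runs_per_day("*", "3"))  # 60
# "0 3 * * *" matches 3:00 only.
print(runs_per_day("0", "3"))  # 1
```

With "*" in the minute field rundig is started sixty times between 3:00 and 3:59; with "0" it is started once.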

Doug
-- 
"While the future's there for anyone to change, still you know it seems, 
 it would be easier sometimes to change the past"

 - Jackson Browne, "Fountain of Sorrow"





Re: [htdig] Indexing Images using external parsers

2000-03-05 Thread Doug Barton

"Rzepa, Henry" wrote:

> We have in fact modified the  htdig source code to do this, invoking an
> external parser for the purpose.

Don't tease us! Send patches. :)

Doug
-- 
"Welcome to the desert of the real." 

- Laurence Fishburne as Morpheus, "The Matrix"





Re: [htdig] Quick question

2000-03-05 Thread Doug Barton

"Glenn J. Rowe" wrote:
> 
> Pardon me.  I just started using htdig and just now joined this mailing
> list.  I have a question which I am sure someone will be able to answer.
> 
> I have specified a rather small list of sites that should be indexed.
> htdig does only index those sites; however, when indexing it follows
> links to sites that aren't in the list.  This poses a problem because a
> few sites have a large amount of external links on them and htdig
> follows everyone of those links.  It doesn't index them but it follows
> them thus making the indexing process take FOREVER. Is there a way to
> stop that?

What is limit_urls_to set to in your htdig.conf?

Doug
-- 
"Welcome to the desert of the real." 

- Laurence Fishburne as Morpheus, "The Matrix"





Re: [htdig] serious htdig 3.1.5 problems

2000-02-29 Thread Doug Barton

One thing you might want to look at is the free space on the drive. There
have been several reports that 3.1.5 indexes a lot more than previous
versions and that the databases are considerably larger. I'd say that you
probably want about 3 times the size of the old databases free before you
start the indexing, just to eliminate that as a possibility.
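
As a rough way to check that rule of thumb before a dig, the following Python sketch adds up the size of the existing database files and compares three times that figure with the free space on the same filesystem. The function names, the factor-of-three default, and the "." stand-in directory are illustrative assumptions; point it at wherever your database_dir actually lives.

```python
import os
import shutil


def database_bytes(db_dir):
    """Total size in bytes of the regular files directly inside db_dir."""
    total = 0
    for name in os.listdir(db_dir):
        path = os.path.join(db_dir, name)
        if os.path.isfile(path):
            total += os.path.getsize(path)
    return total


def required_free_bytes(old_db_bytes, factor=3):
    """Free space suggested by the rule of thumb: factor times the old size."""
    return factor * old_db_bytes


def has_headroom(db_dir, factor=3):
    """Return (ok, needed, free) for the filesystem holding db_dir."""
    needed = required_free_bytes(database_bytes(db_dir), factor)
    free = shutil.disk_usage(db_dir).free
    return free >= needed, needed, free


if __name__ == "__main__":
    # "." is only a stand-in; use your own database directory instead.
    ok, needed, free = has_headroom(".")
    verdict = "OK" if ok else "too tight"
    print(f"need about {needed} bytes, {free} bytes free: {verdict}")
```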

Doug
-- 
"It's kind of fun to do the impossible."
-- Walt Disney





[htdig] FreeBSD port update to 3.1.5

2000-02-26 Thread Doug Barton

Howdy folks,

Due to the importance of the 3.1.5 upgrade I've gone ahead and updated
the FreeBSD port. I've submitted the patch in the URL below to the port
maintainer and it should be updated soon. Meanwhile, you can get the
patch at
http://freebsd.simplenet.com/htdig-FreeBSD-port-3_1_3-3_1_5.patch

To update your port with this patch you should update your ports
collection to the latest via whatever method you use (cvs, cvsup, etc.).
Then cd to /usr/ports/textproc/htdig and execute the following command:

patch < <path>/htdig-FreeBSD-port-3_1_3-3_1_5.patch

where <path> should be replaced by the actual path where you downloaded
the patch. After you patch your port you should be able to build it and
install it as you normally do. Before you deinstall the previous port
make sure to back up your htdig.conf file.

I have made one small improvement to the port that the htDig developers
might be interested in. The way the htDig Makefile currently works it
installs a new htdig.conf file if one does not exist in the $(CONFDIR)
directory. On one hand it's good that the default htDig install doesn't
nuke an existing conf file. On the other though, users might be
interested in seeing changes/updates/etc. to the new conf file. My
change installs htdig.conf.sample instead, and prevents the FreeBSD port
from deleting the htdig.conf file if you choose to deinstall or
pkg_delete it. 

I've also created a package, available at
http://freebsd.simplenet.com/htdig-3.1.5.tgz. This package was compiled
on a Pentium 90 system running a recently updated (2/19) FreeBSD
3.4-Stable system. If you are using an older system I can't guarantee
that it will work for you. 

For those that aren't aware, the FreeBSD ports/packages system is
similar to the Linux RPM system. A port is similar to a source RPM, and
a package is similar to a Linux binary RPM or a Solaris package. 

Disclaimer: I am not a representative of FreeBSD, and I provide all of
the above with no warranty whatsoever. If you use any of the above it is
at your own risk. I encourage you to carefully examine any software you
download from the internet.

Enjoy,

Doug
-- 
"Welcome to the desert of the real." 

- Laurence Fishburne as Morpheus, "The Matrix"





Re: [htdig] plain text output

2000-02-16 Thread Doug Barton

Mikhail Teterin wrote:
> 
> Hello!
> 
> I'd like to  use the ht://Dig's powerful syntax and  databases to simply
> get  a  list of  files  (or  URLs) for  further  processing  -- no  HTML
> creation.

You should be able to do this easily using custom templates. Have you
looked at any of the documentation?

Good luck,

Doug
-- 
"It's kind of fun to do the impossible."
-- Walt Disney





Re: [htdig] Star bug

2000-02-08 Thread Doug Barton

"Bernard T. Higonnet" wrote:
> 
> Hello,
> 
> I just noticed that if one turns off star ratings for retrieved items
> with "use_star_image: no" you still get the line "Documents 1 - 2 of 2
> matches. More [star]'s indicate a better match."

I'm not sure this is a bug. Once you start down the road of customizing
the appearance of your output, these are the kinds of things that come
up. Although it would be possible to make that word a variable in the
template that gets replaced, I'm not sure it would be worth the effort.
If you turn off the star image feature, what would you like the word to
default to?

only speaking for myself,

Doug





Re: [htdig] Compile errors of FreeBSD 3.3

2000-01-24 Thread Doug Barton

Paul Wolstenholme wrote:
> 
> Greetings,
> 
> I've just compiled HtDig 3.1.4 on FreeBSD 3.3.  It appears to have
> compiled but during the make there were a lot of error messages like:
> 
> gcc -o db_load  db_load.o err.o getlong.o libdb.a -lc_r

The biggest part of the problem here is the use of -lc_r instead of
-pthread, which is what the patch in the port does for you. 

> /usr/lib/libc.so: warning: this program uses gets(), which is unsafe.

Use of gets() _is_ unsafe, but not the end of the world. When was the
last time anyone did a security audit of the htdig source? Or is that
part of the 3.2 series already? 

> /usr/lib/libc.so: WARNING!  setkey(3) not present in the system!
> /usr/lib/libc.so: WARNING!  des_setkey(3) not present in the system!
> /usr/lib/libc.so: WARNING!  encrypt(3) not present in the system!
> /usr/lib/libc.so: WARNING!  des_cipher(3) not present in the system!

These errors look like you tried to enable DES for something, and you
don't have DES installed on your system. What options did you give
configure?

> Anyone else have a similar experience? How can I fix it.  The current
> port at the FreeBSD site is 3.1.3.  I've also sent a message to the
> maintainer asking him if there were plans to upgrade the port.

Fortunately the upgrade to this port is easy. I did the last
modifications to the port, and Bill (the maintainer) was kind enough to
commit my changes. I just tested compiling htdig with the following
changes, although I did not test whether it runs or not since I'm not at
home and I don't want to bring my webserver down. 

You can make the following changes to the port and it will compile:

vi /usr/ports/textproc/htdig/Makefile
Change:
DISTNAME=   htdig-3.1.3
to
DISTNAME=   htdig-3.1.4

Delete the line that says:
PATCHFILES=htdig-3.1.3-urlparmbug.patch

make fetch
make makesum
make install

Then you're all set. 
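
The two Makefile edits above can also be scripted. The following Python sketch applies them to the text of a port Makefile. The function name and the sample Makefile text are made up for illustration, and a real port Makefile contains more lines than shown here.

```python
def bump_htdig_port(makefile_text, old="htdig-3.1.3", new="htdig-3.1.4"):
    """Return makefile_text with DISTNAME bumped and the old PATCHFILES line removed."""
    kept = []
    for line in makefile_text.splitlines():
        name, sep, value = line.partition("=")
        key = name.strip()
        # Drop the patch that was only needed for the old version.
        if key == "PATCHFILES" and value.strip() == old + "-urlparmbug.patch":
            continue
        # Bump the distribution name, keeping the original spacing.
        if key == "DISTNAME" and value.strip() == old:
            line = name + sep + value.replace(old, new)
        kept.append(line)
    return "\n".join(kept) + "\n"


# Hypothetical excerpt; only the two relevant lines come from the steps above.
sample = (
    "DISTNAME=   htdig-3.1.3\n"
    "PATCHFILES=htdig-3.1.3-urlparmbug.patch\n"
    "# rest of the port Makefile is left untouched\n"
)
print(bump_htdig_port(sample), end="")
```

After writing the result back to the Makefile, the make fetch, make makesum and make install steps are run as described.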

Good luck,

Doug





Re: [htdig] indexing

2000-01-07 Thread Doug Barton

Rickard Lundgren wrote:
> 
> Hi
> 
> I'm having a problem when I'm indexing my site. When I start htdig it goes
> smoothly at first, but later (after a few hours) it takes up more and more
> memory and finally the server hangs. Even though I set the priority of htdig
> to 20 it still does the same thing. But when I run htmerge on the database
> that htdig has created so far it works just fine.
> 
> I run htdig on MacOS X Server on a G3 with 384 MB of physical memory.
> When htdig hangs the server the database is about 800 MB. Is it
> possible that the site, and hence the database, is too large for htdig
> and the hardware to handle?

I would say that you have answered your own question. :)

Good luck,

Doug
-- 
"It's kind of fun to do the impossible."
-- Walt Disney





Re: [htdig] One solution for slow dig on Linux.

1999-12-21 Thread Doug Barton

Premier Hosting Administrator wrote:
> 
> I have a theory question now.. is there any limitations to the size or
> number of records that HtDig can perform?  Is there an OS limit in
> FreeBSD for example?

FreeBSD is an excellent platform for high-volume, high-availability web
server apps. I use htdig on FreeBSD and it performs brilliantly. You can
save yourself a small amount of aggravation on the install by using the
ports collection version of htdig (similar to RPM's for you linu-philes)
but it's not strictly necessary. 

Enjoy,

Doug
-- 
"It's kind of fun to do the impossible."
-- Walt Disney





Re: AW: [htdig] irrelevant pages in search

1999-11-25 Thread Doug Barton

Hartmut Steffin wrote:
> 
> Thanks for the answer,
> 
> > > htmerge does not seem to honour the TMPDIR variable which
> > IS properly set
> this seems to be an individual problem on my machine. there is even a
> difference in running rundig from commandline (ok) and via cron/batch
> (erroneous)

It's not a plot against you, honest. :) If you get different results from
the command line and from cron it simply means that cron's environment is
different from the shell's. You might try setting the TMPDIR environment
variable explicitly in the crontab file and see if that improves things.
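
One way to confirm this is to capture the output of env once from your login shell and once from a cron job, then compare the two. The Python sketch below does the comparison. The function names are hypothetical, the two sample environments are invented purely to show the shape of the result, and values that span several lines are not handled.

```python
def parse_env(text):
    """Turn the output of `env` (NAME=value per line) into a dict."""
    env = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep:  # skip lines that are not NAME=value
            env[name] = value
    return env


def env_differences(shell_text, cron_text):
    """Map each variable that differs to its (shell value, cron value) pair.

    A value of None means the variable is not set on that side.
    """
    shell_env = parse_env(shell_text)
    cron_env = parse_env(cron_text)
    report = {}
    for name in sorted(set(shell_env) | set(cron_env)):
        in_shell = shell_env.get(name)
        in_cron = cron_env.get(name)
        if in_shell != in_cron:
            report[name] = (in_shell, in_cron)
    return report


# Invented sample data; substitute the real captured output of `env`.
shell_sample = "PATH=/usr/local/bin:/usr/bin:/bin\nTMPDIR=/var/tmp\n"
cron_sample = "PATH=/usr/bin:/bin\n"

for name, (in_shell, in_cron) in env_differences(shell_sample, cron_sample).items():
    print(f"{name}: shell={in_shell!r} cron={in_cron!r}")
```

If TMPDIR shows up only on the shell side, that is the variable to set for the cron job. Many cron implementations (Vixie cron, for example) accept a NAME=value line above the job entries in the crontab, but check the crontab(5) manual page on your system.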

Good luck,

Doug
-- 
"Welcome to the desert of the real." 

- Laurence Fishburne as Morpheus, "The Matrix"





Re: [htdig] rundig does not finish

1999-11-11 Thread Doug Barton


Howard Ha wrote:
> 
> Hi there,
> 
> I am experiencing a weird problem and I'm not sure what the cause might
> be.  If I am digging a relatively small site everything works fine.  I can
> dig pretty large numbers of pages without problems, but when I try to add
> more sites to dig I get this problem where everything just stops. 

Have you tried digging them one at a time? I suspect that you're
probably running into a robots.txt problem.

Good luck,

Doug





[htdig] FreeBSD patch

1999-10-27 Thread Doug Barton


Greetings,

The following patch updates the version of the thread library needed to
compile htdig on FreeBSD versions 3.x and 4.x. This patch is included in
the FreeBSD port/packages system (roughly equivalent to source/binary
rpm's in linux, respectively) and I've confirmed that htdig compiled
with this patch works properly. Please include it in the source
distribution for future releases. 

Thanks,

Doug


diff -ur ../htdig-3.1.3.Dist/db/dist/configure ./db/dist/configure
--- ../htdig-3.1.3.Dist/db/dist/configure   Wed Sep 22 09:18:15 1999
+++ ./db/dist/configure Tue Oct 26 18:06:57 1999
@@ -3056,7 +3056,7 @@
 
 case "$host_os" in
 freebsd*) CPPFLAGS="-D_THREAD_SAFE $CPPFLAGS"
- LIBS="-lc_r";;
+ LIBS="-pthread";;
 irix*)CPPFLAGS="-D_SGI_MP_SOURCE $CPPFLAGS";;
 osf*)CPPFLAGS="-D_REENTRANT $CPPFLAGS";;
 solaris*) CPPFLAGS="-D_REENTRANT $CPPFLAGS"
diff -ur ../htdig-3.1.3.Dist/db/dist/configure.in ./db/dist/configure.in
--- ../htdig-3.1.3.Dist/db/dist/configure.in   Wed Sep 22 09:18:15 1999
+++ ./db/dist/configure.in  Tue Oct 26 18:06:46 1999
@@ -405,7 +405,7 @@
 dnl libraries for threaded applications
 case "$host_os" in
 freebsd*) CPPFLAGS="-D_THREAD_SAFE $CPPFLAGS"
- LIBS="-lc_r";;
+ LIBS="-pthread";;
 irix*)CPPFLAGS="-D_SGI_MP_SOURCE $CPPFLAGS";;
 osf*)CPPFLAGS="-D_REENTRANT $CPPFLAGS";;
 solaris*) CPPFLAGS="-D_REENTRANT $CPPFLAGS"

