RE: [htdig] Hi, need help with searching database.

2000-12-14 Thread Akshay Guleria

I didn't try this out, but the problem got fixed when I ran rundig in verbose
mode. In fact, the problem was with the .htaccess in my server root.

I think I should have checked that before bothering the list.


-Original Message-
From: Geoff Hutchison [mailto:[EMAIL PROTECTED]]
Sent: Friday, December 15, 2000 10:10 AM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: [htdig] Hi, need help with searching database.


At 6:03 PM +0530 12/13/00, Akshay Guleria wrote:
>Now, I ran rundig and it increased the file sizes in /var/lib/htdig. So, I
>presume the database was created. And then I ran htmerge. But I still get
>the "No matches found .." page.

There are a number of possible causes. My first question is whether
you get results if you run the htsearch CGI from the command-line.
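One way to run that test from a script is to give htsearch the environment a web server would: a GET request with the search in QUERY_STRING. This is a minimal Python sketch. It assumes htsearch reads the standard CGI variables REQUEST_METHOD and QUERY_STRING. `words` and `config` are htsearch form fields. The binary path in the usage note is hypothetical.

```python
import os
import subprocess
from urllib.parse import urlencode


def htsearch_cgi_env(words, config=None, base_env=None):
    """Return an environment that makes htsearch see a CGI GET request."""
    params = {"words": words}
    if config is not None:
        params["config"] = config  # e.g. "htdig" selects htdig.conf
    env = dict(os.environ if base_env is None else base_env)
    env["REQUEST_METHOD"] = "GET"
    env["QUERY_STRING"] = urlencode(params)  # spaces are encoded as '+'
    return env


def run_htsearch(htsearch_path, words, config=None):
    """Run htsearch directly and return what it writes to stdout
    (CGI headers followed by the results page)."""
    result = subprocess.run(
        [htsearch_path],
        env=htsearch_cgi_env(words, config),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        check=False,
    )
    return result.stdout
```

For example, `print(run_htsearch("/usr/lib/cgi-bin/htsearch", "rogues"))`. If this output lists matches but the browser still shows "No matches found", the databases are probably fine. The problem is then more likely on the web-server side, such as CGI setup, permissions, or .htaccess.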

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/



To unsubscribe from the htdig mailing list, send a message to
[EMAIL PROTECTED]
You will receive a message to confirm this.
List archives:  
FAQ:




Re: [htdig] PDF Problem

2000-12-14 Thread Geoff Hutchison

At 6:32 PM +0100 12/14/00, Reich, Stefan wrote:
>On Indexing PDF Documents xpdf-0.90 and xpdf-0.92 generates:
>
>  Error: Uknown Type 0 character set: Adobe-Identity

Sounds like an xpdf bug, or a new change from Adobe. I'd suggest 
filing an xpdf bug report, along with the file itself (if that's OK 
with you).

The author, of course, is Derek Noonburg <[EMAIL PROTECTED]>

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/






Re: [htdig] Hi, I really need your help!

2000-12-14 Thread Geoff Hutchison

At 7:51 PM +0800 12/13/00, Sean Harris wrote:
>Even i succeeded in searching by using English words,  such as "good".
>BUT, if i use a chinese word to search for, IT FAILS!

There is no way to index Chinese (or other multibyte character languages).

Help in this direction would be appreciated.

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/






Re: [htdig] htmerge - tons of output

2000-12-14 Thread Geoff Hutchison

At 2:15 PM -0600 12/13/00, Dennis Director wrote:
>Running htdig-3.2.0b3-112600, when I try to merge two database, I get
>a huge amout of output to (stdout?), I stopped it before it completed with
>CNTRL-C, and it left a core file.

Haven't the faintest. I'll take a look this weekend. Seems like there 
are still some bugs lurking in the mifluz code, though unfortunately 
Loic and I haven't had the time to update things recently.

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/






Re: [htdig] ht://Dig version 3.1.5 with Weblogic 5.1

2000-12-14 Thread Geoff Hutchison

At 11:33 AM -0800 12/14/00, Huynh, Andy wrote:
>Read 5788 from document
>Read a total of 5788 bytes
>  retrieved but not changed
>pick: localhost, # servers = 1
>
>I was able to run htdig on IIS but having no luck with Weblogic.  Any
>comments or suggestions would be greatly appreciated.

I'm not sure what you mean. The "retrieved but not changed" message 
is saying that it has a document that is the same age or newer than 
the Last-modified date it received from the server. So it assumes 
that the site hasn't changed.
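The check is a date comparison between the copy htdig already has and the server's Last-modified header. This is a simplified Python sketch, not htdig's own code. The function name is made up.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retrieved_but_not_changed(stored_time: datetime, last_modified: str) -> bool:
    """True when the stored copy is the same age as, or newer than, the
    server's Last-Modified date, so the page is treated as unchanged.

    stored_time must be timezone-aware (HTTP dates are in GMT).
    """
    server_time = parsedate_to_datetime(last_modified)
    return stored_time >= server_time


if __name__ == "__main__":
    stored = datetime(2000, 10, 26, 1, 55, 34, tzinfo=timezone.utc)
    print(retrieved_but_not_changed(stored, "Thu, 26 Oct 2000 01:55:34 GMT"))  # True
```

Suppose a server keeps reporting an old Last-modified date for content that has changed. That page is skipped until the database is rebuilt or the date moves forward.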

In general if you're going to make drastic changes to the server, 
it's a good idea to either reindex from scratch or to update the 
timestamps on the web files. (The latter is good because it clears 
caching servers elsewhere.)
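Updating the timestamps amounts to running `touch` on every file under the document root. This is a Python sketch. The root path and the list of suffixes are assumptions to adjust for your site.

```python
import os
import time


def touch_tree(root, suffixes=(".html", ".htm")):
    """Set the access and modification times of matching files under root
    to the current time, like running `touch` on each one.

    Returns the number of files updated.
    """
    now = time.time()
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(suffixes):
                os.utime(os.path.join(dirpath, name), (now, now))
                count += 1
    return count
```

For example, `touch_tree("/usr/local/apache/htdocs")` (hypothetical path). This only helps for static files whose Last-modified date the server takes from the file's modification time.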

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/






Re: [htdig] Installation Problem under HPUX 10.20

2000-12-14 Thread Geoff Hutchison

At 5:37 AM -0800 12/13/00, Intekhab Choudhury wrote:
>I have installed gcc using swinstall utility that
>comes with HPUX
>
>% swinstall -s /tmp/gcc-2.95.2-sd-10.20.depot

OK, fair enough. (I asked, because GNU/FSF does not distribute binaries.)

Good to hear it compiles some simple programs for you. Now the 
question: what do you see in the config.log file? It should show you 
the program it tried and the error message it got.

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/






Re: [htdig] keywords in meta tags?

2000-12-14 Thread Geoff Hutchison

At 2:57 PM -0500 12/14/00, Matthes, Fred wrote:
>I've added the keywords_meta-tagnames.
>I've tried changing the weighting.  I've also tried the meta htdig-keywords
>- to no avail.
>
>I think that I've tried all the keyword attributes in the config file.
[...]
>'toothslayer'.  I then run htdig and then htmerge.  I then attempt to search
>for these words and they are not found.

Hmm. I've never had a problem with this, and you said you tried the attributes.

My first debugging suggestion would be to run htdig and then take a 
look at the ASCII db.wordlist file. Does this contain the words?

If not, it would help to see your config file--just so we can check 
if there's something you're missing. (Eliminate the impossible...)
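To check db.wordlist for one word, `grep` is enough. To check several words at once, here is a small Python sketch. It assumes, without verification, that each line of the ASCII db.wordlist begins with the indexed word followed by whitespace, so look at a few lines of your own file first. Words are compared in lowercase.

```python
def words_in_wordlist(wordlist_path, wanted):
    """Return the subset of 'wanted' words found in an ASCII wordlist.

    Assumes each line starts with the indexed word, then whitespace.
    """
    targets = {w.lower() for w in wanted}
    found = set()
    # latin-1 decodes any byte, so odd characters in the file won't raise
    with open(wordlist_path, encoding="latin-1") as fh:
        for line in fh:
            fields = line.split(None, 1)
            if fields and fields[0].lower() in targets:
                found.add(fields[0].lower())
    return found
```

For example, `words_in_wordlist("/var/lib/htdig/db.wordlist", ["moondoggie", "toothslayer"])` (hypothetical path). An empty result means the words never reached the index.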

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/






RE: [htdig] Words and files not being found or indexed

2000-12-14 Thread Geoff Hutchison

On Thu, 14 Dec 2000, crosstar wrote:

> I have never run htmerge, so I thought maybe I would try
> that (possibly to solve the problem of my files not being indexed).

You must run htmerge to get usable databases. The rundig script runs
htdig, htmerge and moves files around (look at the file itself for more
details).

> However, I received the following error:
> 
> /var:  warning, user disk quota exceeded

My guess is that the /tmp directory is actually on /var. When merging,
temporary files are needed (quite a lot, actually). These files are put
into the directory specified by the TMPDIR environment variable, or /tmp
if TMPDIR is not defined.

The rundig script sets TMPDIR to the same directory as your databases
before running. Alternatively, you can point it to someplace where there
won't be any sort of quota.
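The guess can be confirmed, and the workaround applied, from a script. Compare the device numbers of /tmp and /var: equal numbers mean the same filesystem, and so the same quota. Then run htmerge with TMPDIR pointed elsewhere. This is a Python sketch. The htmerge and temporary-directory paths in the usage note are hypothetical.

```python
import os
import subprocess


def same_filesystem(path_a, path_b):
    """True if both paths live on the same mounted filesystem (same st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def env_with_tmpdir(tmpdir, base_env=None):
    """Copy of the environment with TMPDIR pointing at 'tmpdir'."""
    env = dict(os.environ if base_env is None else base_env)
    env["TMPDIR"] = tmpdir
    return env


def run_htmerge(htmerge_path, tmpdir):
    """Run htmerge with its temporary files placed under 'tmpdir'."""
    result = subprocess.run([htmerge_path], env=env_with_tmpdir(tmpdir), check=False)
    return result.returncode
```

For example, if `same_filesystem("/tmp", "/var")` is True, the temporary files count against /var. In that case something like `run_htmerge("/usr/local/bin/htmerge", "/bigdisk/htdig-tmp")` keeps them off it.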

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/







RE: [htdig] Words and files not being found or indexed

2000-12-14 Thread Geoff Hutchison

On Thu, 14 Dec 2000, crosstar wrote:

> The last message I got from Fred indicated that htdig would not index
> and find files unless they are listed in some upper-level file which
> includes an [...] We do have some links by way of a java applet directory; however, Fred
> indicates that that will not work in this application.

That won't work for any search engine--there's no way to know that that
particular applet is going to give links. (Beyond that, it won't work for
people who don't turn on Java.)
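The reason is what a spider actually extracts from a page: ordinary `<a href>` links. Below is a simplified Python link extractor, not htdig's own HTML parser. The applet markup in the test is a made-up example modeled on the menu entry quoted in this thread. The extractor finds nothing inside the applet's parameters.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href targets of <a> tags, as a simple spider would."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href:
            self.links.append(href)


def extract_links(html):
    """Return the <a href> targets found in an HTML string, in order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

A URL that appears only as text inside an applet `<param>` is just an attribute value to the parser, so it is never followed.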

> I have tried listing the absolute URL path-statements to where these
> files are located, as a test (listing these in the htdig.conf file).  
> But this has not been successful.

You can list as many URLs as you want in the start_url attribute, or you
can include the contents of a file in htdig.conf, e.g.:

start_url: `/path/to/urls.txt`
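A site with thousands of unlinked files can generate that file from the document tree. This Python sketch maps files under a document root to URLs and writes one per line. The document root, base URL and suffixes are hypothetical. It assumes URL paths mirror the directory layout, with no aliases or rewrites.

```python
import os
import posixpath


def urls_for_tree(doc_root, base_url, suffixes=(".html", ".htm")):
    """List a URL for every matching file under doc_root.

    Assumes the URL path mirrors the directory path (no aliases or rewrites).
    """
    urls = []
    for dirpath, dirnames, filenames in os.walk(doc_root):
        dirnames.sort()  # walk subdirectories in a stable, sorted order
        rel_dir = os.path.relpath(dirpath, doc_root)
        parts = [] if rel_dir == os.curdir else rel_dir.split(os.sep)
        for name in sorted(filenames):
            if name.lower().endswith(suffixes):
                urls.append(base_url.rstrip("/") + "/" + posixpath.join(*parts, name))
    return urls


def write_url_file(urls, path):
    """Write one URL per line, for use as start_url: `path`."""
    with open(path, "w") as fh:
        fh.write("\n".join(urls) + "\n")
```

For example, `write_url_file(urls_for_tree("/var/www/html", "http://www.example.com/"), "/path/to/urls.txt")`. Real HTML links to the files remain the cleaner fix, since they also help visitors.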

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/







RE: [htdig] Words and files not being found or indexed

2000-12-14 Thread crosstar

In the last message from Fred, I notice that he said
that he ran rundig and htmerge.

I have never run htmerge, so I thought maybe I would try
that (possibly to solve the problem of my files not being indexed).

However, I received the following error:

/var:  warning, user disk quota exceeded

Our administrator says that htdig may be trying to access
the /var directory where e-mail is stored (and which is excluded
from access).

He says that we are nowhere near exceeding any quotas, so he does not
understand the error.

So, should we attempt to place a restriction in htdig.conf?  Or something
else?  Or, do we not need to run htmerge at all, maybe?

Thanks.

 
-
The Nationalist Movement
PO Box 2000
Learned MS 39154
(601) 885-2288
Clinic: http://www.nationalist.org/board/html/index.php
Crosstarlist: http://www.nationalist.org/docs/resources/list.html
E-mail: mailto:[EMAIL PROTECTED]
Forum: http://www.nationalist.org/forum/index.php
Home Page: http://www.nationalist.org
ICQ: 5429992
Newsgroup: alt.national
Views not necessarily those of The Nationalist Movement
© 2000 by The Nationalist Movement
-

END







RE: [htdig] Words and files not being found or indexed

2000-12-14 Thread crosstar

This is a message for Gilles or anyone who is "senior" enough with the
program to answer.

I had written to Gilles, earlier, and he had said to post the questions here.

The last message I got from Fred indicated that htdig would not index and find
files unless they are listed in some upper-level file which includes
an [...]

-Original Message-
From: crosstar [mailto:[EMAIL PROTECTED]]
Sent: Thursday, December 14, 2000 8:33 AM
To: [EMAIL PROTECTED]
Cc: Matthes, Fred
Subject: RE: [htdig] Words and files not being found or indexed


Hi, Fred:

Well, I tried adding URLs and path statements, and no luck.

Let me give a quick example.

I have a file called rogues.html

The URL is:

/news/archives/2000/rogues.html

In this file is information about "rogues."  The title, even, is
"Rogues Gallery."

However, when I do a keyword search with htdig, the file or text
does not come up.  And this appears to be the same with most files
on the site.

Now, there is a file called archives.html

located at:

/news/archives/

which lists rogues.html in a java applet (as well as all files in that
particular sub-directory):

Rogues' Gallery;right_frame;/news/archives/2000/rogues.html|

That is the only "link" to the file, however.

Does this help you?

Is there any solution... or do I need to abandon the project
(I would hate to after coming so far!).

I am embarrassed I have to ask all these questions, but
thanks for bearing with me.


 


Hi, thanks, Fred!

I'm fighting a splitting headache over this, so pardon if I am a bit dense.

OK, I think I see the THEORY you are speaking of.  It's still a bit fuzzy in
my mind.

Nonetheless, the question now is:  What is the practical SOLUTION (if any)?

In other words, what can I do to have my files indexed and found?

Meanwhile, I am attempting to enter more path and URL statements in
the htdig.conf file and running rundig, once again.

I'll then test to see if more files and text were picked up that way.

If you could kindly afford me a specific solution (such as "type this"
or "add this") that would be a big help.  Or, possibly refer me to some
example.
That is about all I can do, insofar as I am not very technical.

We do have links from an "index" of sorts in many sub-directories,
but it is contained in a java applet and I have doubts that it would be
picked up by the "spider" (although I am not sure and am testing now
to determine this).

Thanks, again.  I really would like to use this thing!




Let me try to help here.  This is my first time writing to this type of mailing
list, so please forgive me.

Jason (I hope that I get these names correct), you are thinking in terms of
directory structures: starting at the directory where your home page is and
perusing files.

Dennis is talking about web structures, which I think is how htdig works.  Imagine
a spider crawling around a web perusing the files (links) that it discovers.
It 'recurses' down each link it discovers until it has followed all possible links.
The only difference between this and the directory tree walk is that you
have to keep track of the links you've visited, since many pages might have a
link back to your home page, for example.  If you blindly followed all links,
you'd just go round and round.

Now there are many webs out there, just as there are many directories on
your disks.  You start somewhere in a directory tree by supplying a
directory to a program that walks through (recurses) the tree visiting all
the leaves below that directory.
 
You start a spider on a particular web by supplying a url.  The spider
begins to walk a web branch (recurses) when it discovers a link on the
current page.  When that branch is exhausted, it then continues until it
discovers another link (url).  It does this until it has walked all links in
that particular web.

Now, just because you have files in the directory that you use for your web
files does not necessarily mean that those files are on your web.  Yes, you
can access them with a browser.  But can you go to one page and, just by
clicking on hyperlinks, visit all of these files?  As soon as you have to
type in a URL, you have discovered a file that htdig will not find.

These files do not have to be listed on the home page.  But they do need to
be accessible via links on your web site.  They have to be in a URL in one
of the pages that the spider crawls through.  I define a web site as
all of those files connected to some page.  Usually, I think most people
think of that page as the home page, but it doesn't have to be.  So the
files that you want htdig to find must appear somewhere on the web in a
link.
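The spider walk and the orphan-file problem can be sketched together. Follow links from a start page with a visited set, so links back to the home page don't loop forever. Then subtract what was reached from the files that exist. This is a toy Python version over an in-memory link map. The page names are examples and no HTTP is involved. It walks depth-first with an explicit stack instead of literal recursion, which reaches the same pages.

```python
def reachable_pages(start, links):
    """Pages reachable from 'start' by following links.

    'links' maps a page to the list of pages it links to. The 'seen' set
    remembers visited pages, so cycles (links back home) terminate.
    """
    seen = set()
    stack = [start]
    while stack:
        page = stack.pop()
        if page in seen:
            continue
        seen.add(page)
        stack.extend(links.get(page, []))
    return seen


def orphan_pages(all_pages, start, links):
    """Pages that exist but can only be reached by typing their URL."""
    return sorted(set(all_pages) - reachable_pages(start, links))
```

In the test case, rogues.html comes out as an orphan because its only "link" lives inside an applet, which contributes nothing to the link map.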

I hope that this helps.

-Original Message-
From: crosstar [mailto:[EMAIL PROTECTED]]
Sent: Thursday, December 14, 2000 7:39 AM
To: [EMAIL PROTECTED]
Cc: Dennis Director
Subject: Re: [htdig] Words and files not being found or indexed


Thanks, Dennis:

Let me make this quick, then.

We have approximately 3,000 files on our site.

Are you saying that we must list all 3,000 by name, path and
directory on our starting page in 

[htdig] keywords in meta tags?

2000-12-14 Thread Matthes, Fred


I'm running htdig 3.1.5 on Tru64 Unix 4.0F with Apache 1.3.0.

I have added a meta tag that includes some keywords.  I'm using some keywords
you might not find on a technical intranet, such as 'moondoggie' and
'toothslayer'.  I then run htdig and then htmerge.  I then attempt to search
for these words and they are not found.

What do I need to do?

What am I doing wrong?

I've added the keywords_meta-tagnames.
I've tried changing the weighting.  I've also tried the meta htdig-keywords
- to no avail.

I think that I've tried all the keyword attributes in the config file.  

Anyone have this working for them ??

many thanks,
Fred






RE: [htdig] Words and files not being found or indexed

2000-12-14 Thread crosstar

Hi, Fred:

Would this be a possible solution to get all the files and text indexed?

What if I list all URLs down to the deepest level in the config file
at:

start_url:  

such as:

/news/archives/2000/dec/

Or, should I possibly add the index file to it, as well, in htdig.conf,
such as:

/news/archives/2000/dec/archives.html

I am just at my wits' end at this point.

Thanks.

PS: We have 25 years on file, with thousands of files and hundreds of
sub-directories, so I am wondering if this is really going to work for us.



 


RE: [htdig] Words and files not being found or indexed

2000-12-14 Thread Matthes, Fred


Well, I'm just digging into htdig to solve my own problem.  I seriously
doubt that any spider is going to discover a URL by executing a Java applet.
I think it has to be in the form of [...]

RE: [htdig] Words and files not being found or indexed

2000-12-14 Thread crosstar

Hi, Fred:

Well, I tried adding URLs and path statements, and no luck.

Let me give a quick example.

I have a file called rogues.html

The URL is:

/news/archives/2000/rogues.html

In this file is information about "rogues."  The title, even, is
"Rogues Gallery."

However, when I do a keyword search with htdig, the file or text
does not come up.  And this appears to be the same with most files
on the site.

Now, there is a file called archives.html

located at:

/news/archives/

which lists rogues.html in a java applet (as well as all files in that
particular sub-directory):

Rogues' Gallery;right_frame;/news/archives/2000/rogues.html|

That is the only "link" to the file, however.

Does this help you?

Is there any solution... or do I need to abandon the project
(I would hate to after coming so far!).

I am embarrassed I have to ask all these questions, but
thanks for bearing with me.


 



-Original Message-
From: crosstar [mailto:[EMAIL PROTECTED]]
Sent: Thursday, December 14, 2000 7:39 AM
To: [EMAIL PROTECTED]
Cc: Dennis Director
Subject: Re: [htdig] Words and files not being found or indexed


Thanks, Dennis:

Let me make this quick, then.

We have approximately 3,000 files on our site.

Are you saying that we must list all 3,000 by name, path and
directory on our starting page in order for them to be indexed?

If so, where and how on the "starting page?"  Would this refer
to our "index.html" on our site (which is our default home page)?

Or, do you mean to list them all in the htdig.conf file?

If I misunderstand, kindly advise me (at your convenience).
Sorry it is unclear, at this point.  But I've never seen another
search engine operate in this manner.  Usually, this is all done 
automatically.

Thanks, again.

 

At 12:44 PM 12/14/00 -0600, you wrote:
>I'm not sur

[htdig] ht://Dig version 3.1.5 with Weblogic 5.1

2000-12-14 Thread Huynh, Andy

I'm getting the following errors while trying to run htdig:

0:0:http://localhost:7001/index.htm
New server: localhost, 7001
Retrieval command for http://localhost:7001/robots.txt: GET /robots.txt
HTTP/1.0
User-Agent: htdig/3.1.5 ([EMAIL PROTECTED])
Host: localhost

Header line: HTTP/1.1 404 Not Found
Header line: Content-Length: 1613
Header line: Content-Type: text/html
Header line: Connection: Close
Header line: 
returnStatus = 1
 pushed
1:0:http://localhost:7001/index.htm skipped
pick: localhost, # servers = 1
0:0:0:http://localhost:7001/index.htm: Retrieval command for http://localhost:7001/index.htm: GET /index.htm HTTP/1.0
User-Agent: htdig/3.1.5 ([EMAIL PROTECTED])
If-Modified-Since: Thu, 26  2000  GMT
Host: localhost

Header line: HTTP/1.1 200 OK
Header line: Server: WebLogic 5.1.0 Service Pack 5 08/17/2000 07:21:55 #79895
Header line: Content-Length: 5788
Header line: Content-Type: text/html
Header line: Last-Modified: Thu, 26 Oct 2000 01:55:34 GMT
Translated Thu, 26 Oct 2000 01:55:34 GMT to 2000-10-26 01:55:34 (100)
And converted to Thu, 26 Oct 2000 
Header line: Connection: Close
Header line: 
returnStatus = 0
Read 5788 from document
Read a total of 5788 bytes
 retrieved but not changed
pick: localhost, # servers = 1

I was able to run htdig on IIS but am having no luck with WebLogic.  Any
comments or suggestions would be greatly appreciated.

Andy Huynh
Software Engineer
[EMAIL PROTECTED]
(949) 609-4726






RE: [htdig] Words and files not being found or indexed

2000-12-14 Thread crosstar

Hi, thanks, Fred!

I'm fighting a splitting headache over this, so pardon if I am a bit dense.

OK, I think I see the THEORY you are speaking of.  It's still a bit fuzzy in 
my mind.

Nonetheless, the question now is:  What is the practical SOLUTION (if any)?

In other words, what can I do to have my files indexed and found?

Meanwhile, I am attempting to enter more path and URL statements in
the htdig.conf file and running rundig, once again.

I'll then test to see if more files and text were picked up that way.

If you could kindly offer me a specific solution (such as "type this"
or "add this"), that would be a big help.  Or, possibly refer me to some example.
That is about all I can do, insofar as I am not very technical.

We do have links from an "index" of sorts in many sub-directories,
but it is contained in a Java applet and I have doubts that it would be
picked up by the "spider" (although I am not sure and am testing now
to determine this).

Thanks, again.  I really would like to use this thing!




-
The Nationalist Movement
PO Box 2000
Learned MS 39154
(601) 885-2288
Clinic: http://www.nationalist.org/board/html/index.php
Crosstarlist: http://www.nationalist.org/docs/resources/list.html
E-mail: mailto:[EMAIL PROTECTED]
Forum: http://www.nationalist.org/forum/index.php
Home Page: http://www.nationalist.org
ICQ: 5429992
Newsgroup: alt.national
Views not necessarily those of The Nationalist Movement
© 2000 by The Nationalist Movement
-

END




RE: [htdig] Words and files not being found or indexed

2000-12-14 Thread Matthes, Fred

If I can try to help here: this is my first time writing to this type of
mailing list, so please forgive me.

Jason (I hope that I get these names correct), you are talking and thinking
directory structures: starting at the directory where your home page is and
perusing files.

Dennis is talking web structures, which I think is how htdig works.  Imagine
a spider crawling around a web, perusing the files (links) that it discovers.
It 'recurses' down each link it discovers until it has followed all possible
links.  The only difference between this and the directory tree walk is that
you have to keep track of the links you've visited, since many pages might
have a link back to your home page, for example.  If you blindly followed all
links, you'd just go round and round.

Now there are many webs out there, just as there are many directories on
your disks.  You start somewhere in a directory tree by supplying a
directory to a program that walks through (recurses) the tree visiting all
the leaves below that directory.
 
You start a spider on a particular web by supplying a url.  The spider
begins to walk a web branch (recurses) when it discovers a link on the
current page.  When that branch is exhausted, it then continues until it
discovers another link (url).  It does this until it has walked all links in
that particular web.
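The spider walk described above can be sketched in a few lines of Python. This is a minimal illustration, not ht://Dig's code: the page names and the `crawl` helper are hypothetical, the site is an in-memory map from each page to the pages it links to, and the walk uses a queue (breadth-first) rather than recursion, which reaches the same set of pages. The `visited` set is the bookkeeping described above; without it, the links back to the home page would send the walk round and round.

```python
from collections import deque


def crawl(links, start):
    """Return pages in the order a spider starting at `start` would reach them.

    `links` maps each page to the pages it links to. Only pages reachable
    through some chain of links from `start` are ever visited.
    """
    visited = {start}            # every page already queued
    queue = deque([start])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)
        for target in links.get(page, []):
            if target not in visited:   # skipping seen pages stops "back to
                visited.add(target)     # home" links from looping forever
                queue.append(target)
    return order


# Hypothetical site: every page links back to the home page, and
# orphan.html exists but no page links to it.
site = {
    "index.html": ["about.html", "articles.html"],
    "about.html": ["index.html"],
    "articles.html": ["index.html", "article1.html"],
    "article1.html": ["index.html", "articles.html"],
    "orphan.html": ["index.html"],
}

print(crawl(site, "index.html"))
# ['index.html', 'about.html', 'articles.html', 'article1.html']
```

`orphan.html` never appears in the result: it exists on this hypothetical site, but no page links to it, so the walk never sees it.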

Now just because you have files in the directory that you use for your web
files does not necessarily mean that those files are on your web.  Yes, you
can access them with a browser.  But can you go to one page and, just by
clicking on hyperlinks, visit all of these files?  As soon as you have to
type in a url, you have discovered a file that htdig will not find.

These files do not have to be listed on the home page.  But they do need to
be accessible via links on your web site: each one has to appear as a url on
one of the pages the spider crawls through.  I define a web site as all of
the files connected to some page.  I think most people think of that page as
the home page, but it doesn't have to be.  So the files that you want htdig
to find must be linked from somewhere on that web.

I hope that this helps.





Re: [htdig] Words and files not being found or indexed

2000-12-14 Thread crosstar

Thanks, Dennis.

I'll make this quick (because I know you are busy).

At 01:02 PM 12/14/00 -0600, you wrote:

>The big question is how is a viewer ever going to get to those 3,000
>pages?  

The hope is that they can reach these pages by using a good keyword
search engine.  We previously used Webglimpse and it worked just fine.
All you did was type in any word and it brought up any text from anywhere
on the site.

>How do they know about them???  

You do not know about them.  That is why we need the search engine.

>Can they click their way to every page starting at your home page?  

Nope, they cannot.

We do have a Java applet which they can click on, which directs them to
about a dozen pages which give the viewer a general idea of the site.
But, of course, not the specific data they seek.  Our site is
like a large reference library.

If this program simply does not index all the files or
cannot be configured to do so, just tell me.  I have already spent
14 hours on this, getting nowhere, and can go back to Webglimpse, if
need be.

>If the answer is yes, they should get indexed by htdig.  

The answer is no.

>If the answer is no, then how do you visitors ever find
>those pages ?

Many find them because they are picked up by various web search engines,
such as Yahoo, AltaVista, Excite, etc., or by other sites
which place links to them.  But we need a good search engine
which will index all files on our site, so that searches are specific
to our site.  Htdig looked nice, at first.

If there is any way to use it and if it can be configured to
index all of our files and pages, I'd like to do so.

I can send a file with our opening page, if that would help.

Thanks.




>If you have more questions, you'll have to go back to the list.
>Hope I was some help.




Re: [htdig] HTTPS Indexing

2000-12-14 Thread Joshua Gerth


Hi Jason,

>   Here are all of the active lines (non-comments) in my config file.
> 
> database_dir:   /wwwsys/src/htdig/db
> start_url:  https://www.myurl.com/pub/en/index.html
> limit_urls_to:  ${start_url}
> bad_extensions: .wav .gz .z .sit .au .zip .tar .hqx .exe .com .gif \
>                 .jpg .jpeg .aiff .class .map .ram .tgz .bin .rpm .mpg .mov .avi
> maintainer: [EMAIL PROTECTED]
> max_head_length:1
> max_doc_size:   20
> no_excerpt_show_top:true
> search_algorithm:   exact:1 synonyms:0.5 endings:0.1
> .
> . (page layout stuff)
> .
> 
> Also, I tried running the rundig script but I get the same "Unable to
> build connection" error as before.  Let me know if there's anything else
> I can do to help.

It looks like everything is correct.  Did you rebuild htdig after getting
openssl working?  If so then I think I am out of guesses as to what the
problem is.  Perhaps someone else on this list can offer some ideas.

Sorry I can't be of more help.

Joshua







Re: [htdig] Words and files not being found or indexed

2000-12-14 Thread crosstar

Thanks, Dennis:

Let me make this quick, then.

We have approximately 3,000 files on our site.

Are you saying that we must list all 3,000 by name, path and
directory on our starting page in order for them to be indexed?

If so, where and how on the "starting page?"  Would this refer
to our "index.html" on our site (which is our default home page)?

Or, do you mean to list them all in the htdig.conf file?

If I misunderstand, kindly advise me (at your convenience).
Sorry, it is still unclear to me at this point.  But I've never seen another
search engine operate in this manner.  Usually, this is all done 
automatically.

Thanks, again.

 

At 12:44 PM 12/14/00 -0600, you wrote:
>I'm not sure I'm making myself clear, and unfortunately, I am swamped
>with work.  I can't spend any more time on this right now.
>
>You don't have to have files called index.html, you just have to have 
>a path of references from the start page, to any other page that you 
>want indexed.





[htdig] PDF Problem

2000-12-14 Thread Reich, Stefan

Hi, this is not really an HTDIG problem, but at least it's HTDIG-related ;-)

When indexing PDF documents, xpdf-0.90 and xpdf-0.92 generate:

 Error: Uknown Type 0 character set: Adobe-Identity

Acrobat Reader shows the document OK:
(http://www.gesundheitsgespraech.de/common/koch.pdf)

Any help or hints would be great.

  Tnx

 Stefan








Re: [htdig] Words and files not being found or indexed

2000-12-14 Thread crosstar

Hi, one additional point regarding "index" files, if I understand your point
correctly.

Are you saying that I must have files called "index.html" in
order for the program to index the files under a given subdirectory?

Here is a brief explanation.

At present, each sub-directory contains a file named after the
sub-directory itself, which serves as its index.

For instance, if the sub-directory is named "dogs" I have a file 
in it called "dogs.html" which then lists the various files in
that subdirectory, with links to them (such as "fleas.html," 
"doghouse.html," etc.).

Now, are you saying that I must change "dogs.html" to
"index.html" in order for htdig to index the site completely?
Or, am I misreading what you said?

This could be done, if necessary, I suppose, but remember, we have
thousands of files and hundreds of sub-directories.  The task would
be tantamount to a Florida recount!

Please advise.  Thanks.










Re: [htdig] Words and files not being found or indexed

2000-12-14 Thread crosstar

OK, thanks, Dennis.  This explanation is a lot clearer.

I was under the impression that the program simply
indexed the entire site, minus any files or sub-directories
which were specifically excluded.

At least, that appeared to be the implication.

OK, well, I'm glad to know that I was wrong on this.

So, the question now is, how to make it operate to index and
find ALL files? (All of my files are html files, by the way).

Per your example, I do not have many links to other pages on
the main page.  In fact, most pages (there are thousands of
them) on my site do not have any links on them at all.

So, per your explanation, these files would not be indexed.  
It appears that many are indexed, however, at present,
somehow, which throws me off, a bit.

Anyhow...

My main (start-up) page is contained in an "index.html"
page, but that is the only one which has an "index.html" file.
The others just have names like you suggested, cat.html, dog.html, etc.
And, they are all under many, many subdirectories.

So... as to solution...

Are you saying that I should expressly list ALL of my subdirectories in 
htdig.conf at:

start_url:

I want to be sure on this because I have HUNDREDS of subdirectories.

If the answer is "yes," then I will proceed to list the entire directory and
sub-directory structure there...

but...

One more point.

Can I get by with listing only the MAIN sub-directories, or do I need to list
all the sub-sub-sub-directories (some of them going quite deep)?

Hope I'm getting close to a solution.

Thanks for the tips and patience.








Re: [htdig] Words and files not being found or indexed

2000-12-14 Thread Dennis Director

On Thu, 14 Dec 2000, crosstar wrote:

> Thanks for taking the time to reply.  :)
> 
> I might sound a bit dense (sorry), but bear with me, as I hope
> to get this thing operable.
> 
> What do you mean by:  
> 
> "Path of links on pages?"
> 
> "Starting at the start page?"
> 
..

> 
> I am trying to understand your analysis, but could you,
> perhaps, simply tell me what exactly to do (such as,
> "type this," "cut and paste" that, or something practical
> (rather than just theoretical).
> 

Well, I can't give you cut and paste steps, but let me try again ..

If your start page is  http://www.nationalist.org/  and on that page
is a link to say  http://www.nationalist.org/gerbils.htm, then those 2 
pages get indexed.

If the start page has a link to
http://www.nationalist.org/pets, and the pets subdirectory has an
index.html page which includes links to dogs.html and cats.html, then
you will index  http://www.nationalist.org/pets/dogs.html  and
http://www.nationalist.org/pets/cats.html.

BUT ! if in the pets subdirectory is a page called fish.html and there is
no link to fish.html in the pets/index.html or the
http://www.nationalist.org/, then fish.html will not be indexed, because
htdig never saw it.

In other words, htdig gets the start page, looks inside for links, gets
those pages, looks inside for links, gets those pages, etc.
Just because a page is accessible on your server is not enough.

That's the best I can do..
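The rule above (a page is indexed only if some chain of links from the start page reaches it) can be checked against the files actually on the server. The following Python sketch is a hypothetical diagnostic, not part of ht://Dig: it walks a local document root, follows `<a href>` links from the front page the way a spider would, and lists the HTML files that no chain of links reaches. It assumes each URL path maps directly onto a file under the document root and that a link to a directory means that directory's `index.html`. Only plain `<a href>` links are followed, so links that live inside a Java applet or script are not seen here (and a spider may not see them either).

```python
import os
from html.parser import HTMLParser
from urllib.parse import unquote, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href values of <a> tags in one HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(v for k, v in attrs if k == "href" and v)


def html_files(docroot):
    """Every .html/.htm file under docroot, as a site path like '/pets/dogs.html'."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(docroot):
        for name in filenames:
            if name.lower().endswith((".html", ".htm")):
                rel = os.path.relpath(os.path.join(dirpath, name), docroot)
                found.add("/" + rel.replace(os.sep, "/"))
    return found


def reachable(docroot, site, index_name="index.html"):
    """Site paths reached by following links from the front page.

    Walks depth-first; the order does not change which pages are reached.
    """
    site = site.rstrip("/")
    host = urlparse(site).netloc
    reached, queued, stack = set(), {"/"}, ["/"]
    while stack:
        path = stack.pop()
        local = os.path.join(docroot, path.lstrip("/"))
        if os.path.isdir(local):          # "/" or "/pets" -> that directory's index page
            path = path.rstrip("/") + "/" + index_name
            local = os.path.join(local, index_name)
        if path in reached or not os.path.isfile(local):
            continue                      # already read, or a dead link
        reached.add(path)
        parser = LinkExtractor()
        with open(local, encoding="utf-8", errors="replace") as fh:
            parser.feed(fh.read())
        for href in parser.hrefs:
            url = urlparse(urljoin(site + path, href))
            if url.netloc != host:        # mailto:, other sites, etc.
                continue
            target = unquote(url.path) or "/"
            if target not in queued:
                queued.add(target)
                stack.append(target)
    return reached


def orphans(docroot, site):
    """HTML files on disk that no chain of links from the front page reaches."""
    return sorted(html_files(docroot) - reachable(docroot, site))
```

With the gerbils/pets example above laid out on disk, `orphans(...)` would report only `/pets/fish.html`. On a real server, something like `print(orphans("/var/www/html", "http://www.example.org"))` (both arguments hypothetical) would list the pages that need a link from somewhere, or their own entry in `start_url`, before htdig can find them.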







Re: [htdig] Words and files not being found or indexed

2000-12-14 Thread crosstar

Thanks for taking the time to reply.  :)

I might sound a bit dense (sorry), but bear with me, as I hope
to get this thing operable.

What do you mean by:  

"Path of links on pages?"

"Starting at the start page?"

For example, in htdig.conf, at present, there is:

# This specifies the URL where the robot (htdig) will start.  You can specify
# multiple URLs here.  Just separate them by some whitespace.
# The example here will cause the ht://Dig homepage and related pages to be
# indexed.
# You could also index all the URLs in a file like so:
# start_url:  `${common_dir}/start.url`
#
start_url: http://www.nationalist.org/
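The commented-out `start_url` line in the configuration above points to one workable approach for a site whose pages are not all linked from the front page: list every URL in a file and let `start_url` read that file through backquotes. A minimal Python sketch for producing such a list from the document root follows. The function name, document root and site address are hypothetical, and it assumes every file under the document root is served at the same relative path (no aliases, rewrites or password-protected areas).

```python
import os
from urllib.parse import quote


def start_urls(docroot, site, extensions=(".html", ".htm")):
    """One URL per HTML file under docroot, for a file read by start_url."""
    site = site.rstrip("/")
    urls = []
    for dirpath, dirnames, filenames in os.walk(docroot):
        dirnames.sort()                       # visit sub-directories in a stable order
        for name in sorted(filenames):
            if name.lower().endswith(extensions):
                rel = os.path.relpath(os.path.join(dirpath, name), docroot)
                # quote() escapes spaces and the like but keeps "/" separators
                urls.append(site + "/" + quote(rel.replace(os.sep, "/")))
    return urls
```

Writing `"\n".join(start_urls("/var/www/html", "http://www.example.org"))` to `${common_dir}/start.url` and using the backquoted `start_url` line in place of the single URL should let htdig start from every page, whether or not it is linked. As far as I know, `limit_urls_to` defaults to `${start_url}`; with a file-based `start_url` it may be simpler to set it to the site root explicitly. That is worth checking against the ht://Dig attribute documentation.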

Are you saying that this is incorrect?  Do I need to add
something here?

If so, what?

Or, are you stating that something must be added to each page
in the site?  There are thousands of pages and this would be
a rather difficult proposition (hope that wouldn't be necessary).

If you are saying that something has to be added on each page,
what -- exactly -- needs to be added, please.

What do you mean by:

"leave the nav bar indexable, at least on the front page"

The reason I am in the dark is that about 40% of the site appears
to be indexing all right.  But the other 60% is lost.  

I am trying to understand your analysis, but could you,
perhaps, simply tell me what exactly to do (such as,
"type this," "cut and paste" that, or something practical
(rather than just theoretical).

As I said, I am not very technical.

Much appreciated.












Re: [htdig] Words and files not being found or indexed

2000-12-14 Thread Dennis Director


On Wed, 13 Dec 2000, crosstar wrote:

> I am not too technical, so I hope this sounds clear.
> 
> I have htdig installed.  But, although it works fine with no
> errors, many files and words are being left out of the search and
> indexing.

I'm not any kind of htdig expert, but ..
on my sites I discovered (the slow way) that htdig recurses down from
the start page that you give it in the config file.  If there is not a
path of links on pages, starting at the start page, that gets you to the
subdirectory you mention, then those pages will not be fetched and
indexed.  This happened to me when I set  tags around my
navigation bar on all pages.  I needed to leave the nav bar indexable, at
least on the front page, so that there were references to all sections.
Hope this helps!


