Only indexing first part of a page (possible bug)

2001-07-26 Thread nick

Is there a word count limit per page implicitly defined in the
mnogosearch indexer?

While debugging why some links aren't picked up in our site indexing, I
found that for a single page of about 850 words (only about 9k), only
roughly the first quarter is being stored.  I have no idea why.

This is evidenced by both words and links from the latter part of this
single page missing from the dict and url sql tables.  I can almost
spot the exact point in the page at which words stop going into the
index database by checking in the dict table.
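
One way to pin down that cutoff point is to compare the page's words, in
order, against the set of words the indexer stored for that URL.  The sketch
below is only an illustration: `first_missing_word` is a made-up helper, the
`indexed_words` set is assumed to have been fetched from the dict table by
whatever query fits your schema, and `min_len` is a guess at the indexer's
minimum word length, not a documented value.

```python
import re


def first_missing_word(page_text, indexed_words, min_len=3):
    """Return (position, word, total_words) for the first page word that is
    absent from the index, or None if every checked word was indexed.

    indexed_words: set of lowercase words stored for this URL (assumed to be
    loaded from the dict table beforehand).
    min_len: words shorter than this are skipped, on the assumption that the
    indexer may ignore very short words.
    """
    words = re.findall(r"[a-z0-9]+", page_text.lower())
    for pos, word in enumerate(words):
        if len(word) >= min_len and word not in indexed_words:
            return pos, word, len(words)
    return None


if __name__ == "__main__":
    page = "alpha beta gamma delta epsilon zeta"
    stored = {"alpha", "beta"}
    print(first_missing_word(page, stored))  # (2, 'gamma', 6)
```

The first missing word is only an upper bound on where indexing stopped, and
stop words or other filtered words can produce false hits, so it is worth
checking a few words after the reported position as well.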

Does anyone have a clue why this is happening?  I'm using mnogosearch on a
Red Hat 7.1 machine with PostgreSQL to index an SSL intranet.

Thanks, nick
___
If you want to unsubscribe send unsubscribe general
to [EMAIL PROTECTED]




Re: Only indexing first part of a page (possible bug)

2001-07-26 Thread Sergey Kartashoff

Hi!

Thursday, July 26, 2001, 5:45:07 PM, you wrote:

> Is there a word count limit per page implicitly defined in the
> mnogosearch indexer?

> While debugging why some links aren't picked up in our site indexing, I
> found that for a single page of about 850 words (only about 9k), only
> roughly the first quarter is being stored.  I have no idea why.

> This is evidenced by both words and links from the latter part of this
> single page missing from the dict and url sql tables.  I can almost
> spot the exact point in the page at which words stop going into the
> index database by checking in the dict table.

> Does anyone have a clue why this is happening?  I'm using mnogosearch on a
> Red Hat 7.1 machine with PostgreSQL to index an SSL intranet.

Please specify the details of your configuration, such as:
mnogosearch version,
database type,
database storage type.

We also need your indexer.conf.

-- 
Regards, Sergey aka gluke.





RE: Only indexing first part of a page (possible bug)

2001-07-26 Thread Briggs, Gary

Hi,

There's a limit on the amount of content it'll download and index for each
individual page. IIRC, it's set in the indexer.conf file, but it may be
compiled in...
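
Since the exact directive isn't named here, a quick way to look for it is to
scan indexer.conf for active (non-comment) lines whose directive name hints at
a size limit.  This is only a rough sketch: `find_limit_directives` and its
keyword list are made up for illustration, and the `ExampleMaxSize` line in
the sample is a placeholder, not a real mnogosearch directive.

```python
def find_limit_directives(conf_text, keywords=("size", "max", "limit")):
    """Return (line_number, line) pairs for active config lines whose first
    token contains one of the keywords (case-insensitive).

    Blank lines and lines starting with '#' are skipped.
    """
    hits = []
    for line_no, raw in enumerate(conf_text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        directive = line.split()[0]
        if any(key in directive.lower() for key in keywords):
            hits.append((line_no, line))
    return hits


if __name__ == "__main__":
    # Illustrative content only; ExampleMaxSize is a hypothetical name.
    sample = (
        "# sample config\n"
        "Server https://intranet.example/\n"
        "ExampleMaxSize 2048\n"
    )
    print(find_limit_directives(sample))  # [(3, 'ExampleMaxSize 2048')]
```

If nothing turns up, the limit may be one of the compiled-in defaults, in
which case the source or the documentation for the installed version is the
place to look.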

Gary (-;


