> Hmmm ... how many chunks of "about 50 pages" do you do before hitting this?
> Roughly how many docs are in the index when it happens?

Oh, gosh, not sure.  I'm guessing it's about half done.

> Can you describe the docs/fields you're adding?

I've got 1735 documents, 18969 pages -- average page size 10.9, max
page size 1235 (a physics textbook), 578 one-page documents.  These
are Web pages, PDFs, articles, photos, scanned stuff, technical
papers, etc.  I index six documents at a time, so I guess I'm
averaging about 65 pages per chunk.  For each document, I index the
whole text of the document as a Lucene Document, and I index the text
of each page separately as a Document.  I use the "contents" fields
and "pagecontents" fields for those two uses.  I also add metadata
information to each: "title", multiple "author" fields, "date",
"abstract", etc.

Bill

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to