Yes, I see. I originally missed Terry’s response, which is probably the source of the confusion.
So to clarify: I already know the size of the source document. As you say, this bears little resemblance to what actually gets written when indexed. It is this latter figure I was hoping to get.

Thanks everyone.

Chris

> On 5 Jul 2018, at 03:31, Erick Erickson <erickerick...@gmail.com> wrote:
>
> I think we're not talking about the same thing.
>
> You asked "How can I calculate the total size of a Lucene Document"...
>
> I was responding to the Terry's comment "In the document types I
> usually index (.pdf, .docx/.doc, .eml), there exists a metadata field
> called "stream_size" that contains the size of the document on disk. "
>
> Two totally different beasts. One is the source document, the other is
> what you choose to put into the index from that document. Not to even
> mention that you could, for instance, choose to index only the title
> and throw everything else away so the size of the raw document on disk
> doesn't seem useful for your case.
>
> Best,
> Erick
>
>> On Wed, Jul 4, 2018 at 9:24 AM, Chris Bamford <ch...@bammers.net> wrote:
>> Hi Erick
>>
>> Yes, size on disk is what I’m after as it will feed into an eventual
>> calculation regarding actual bytes written (not interested in the source
>> data document size, just real disk usage).
>> Thanks
>>
>> Chris
>>
>> Sent from my iPhone
>>
>>> On 4 Jul 2018, at 17:08, Erick Erickson <erickerick...@gmail.com> wrote:
>>>
>>> But does size on disk help? If the doc has a zillion
>>> images in it, those aren't part of the resulting index
>>> (I'm excluding stored data here)....
>>>
>>>> On Wed, Jul 4, 2018 at 7:49 AM, Terry Steichen <te...@net-frame.com> wrote:
>>>> In the document types I usually index (.pdf, .docx/.doc, .eml), there
>>>> exists a metadata field called "stream_size" that contains the size of
>>>> the document on disk. You don't have to compute it. Thus, when you
>>>> retrieve each document you can pull out the contents of this field and,
>>>> if you like, include it in each hitlist entry.
>>>>
>>>>> On 07/04/2018 05:26 AM, Chris and Helen Bamford wrote:
>>>>> Hi there,
>>>>>
>>>>> How can I calculate the total size of a Lucene Document that I'm about
>>>>> to write to an index so I know how many bytes I am writing please? I
>>>>> need it for some external metrics collection.
>>>>>
>>>>> Thanks
>>>>>
>>>>> - Chris
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
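
The thread does not settle on a way to get the "bytes actually written" figure. One rough approach, offered here only as a sketch and not as anything the posters proposed, is to measure the size of the index directory before and after a commit and take the difference. The class and method names below (IndexDirSize, sizeOnDisk) are hypothetical, and the code uses only java.nio.file rather than any Lucene API. The result is approximate at best: segment merges and compression mean that bytes on disk do not map cleanly onto individual documents.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper, not part of Lucene: sums file sizes under a directory.
final class IndexDirSize {

    /** Returns the total size in bytes of all regular files under dir (recursive). */
    static long sizeOnDisk(Path dir) throws IOException {
        try (Stream<Path> files = Files.walk(dir)) {
            return files.filter(Files::isRegularFile)
                        .mapToLong(p -> {
                            try {
                                return Files.size(p);
                            } catch (IOException e) {
                                // Assumption: a file that disappears mid-scan
                                // (for example after a merge) counts as 0 bytes.
                                return 0L;
                            }
                        })
                        .sum();
        }
    }

    public static void main(String[] args) throws IOException {
        // The index directory path is an assumption; pass your own as the first argument.
        Path indexDir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(sizeOnDisk(indexDir) + " bytes");
    }
}
```

Calling sizeOnDisk once before adding documents and once after the commit gives a delta for the whole batch. Dividing that by the number of documents yields an average, not a true per-document size.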