Yes, I see. I originally missed Terry’s response, which is probably the source of
the confusion.

So to clarify: I already know the size of the source document. As you say, this 
bears little resemblance to what actually gets written when indexed. It is this 
latter figure I was hoping to get.
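
For the archive, here is a rough sketch of one way to approximate that figure. It is not a Lucene API, just a measurement around it. It assumes the index lives in a plain filesystem directory, and it sums the file sizes there before and after a commit. The names IndexDirSize and sizeOnDisk are made up for illustration. The delta only means something after a commit. Segment merges and compression can make it differ a lot from any per-document estimate, so it is probably more useful per batch than per document.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper, not part of Lucene: measures real disk usage of an
// index directory so that growth can be computed around a commit.
final class IndexDirSize {
    private IndexDirSize() {}

    /** Sums the sizes, in bytes, of all regular files directly inside indexDir. */
    static long sizeOnDisk(Path indexDir) throws IOException {
        try (Stream<Path> files = Files.list(indexDir)) {
            return files.filter(Files::isRegularFile)
                        .mapToLong(IndexDirSize::sizeOf)
                        .sum();
        }
    }

    // Assumption: no files disappear while we are listing. A concurrent merge
    // can delete segment files, in which case this throws.
    private static long sizeOf(Path file) {
        try {
            return Files.size(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Usage would be along the lines of: call sizeOnDisk(indexPath) before adding documents, add them, commit the writer, call sizeOnDisk(indexPath) again, and report the difference to the metrics system.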

Thanks everyone.

Chris



> On 5 Jul 2018, at 03:31, Erick Erickson <erickerick...@gmail.com> wrote:
> 
> I think we're not talking about the same thing.
> 
> You asked "How can I calculate the total size of a Lucene Document"...
> 
> I was responding to Terry's comment "In the document types I
> usually index (.pdf, .docx/.doc, .eml), there exists a metadata field
> called "stream_size" that contains the size of the document on disk. "
> 
> Two totally different beasts. One is the source document, the other is
> what you choose to put into the index from that document. Not to
> mention that you could, for instance, choose to index only the title
> and throw everything else away, so the size of the raw document on disk
> doesn't seem useful for your case.
> 
> Best,
> Erick
> 
>> On Wed, Jul 4, 2018 at 9:24 AM, Chris Bamford <ch...@bammers.net> wrote:
>> Hi Erick
>> 
>> Yes, size on disk is what I’m after as it will feed into an eventual 
>> calculation regarding actual bytes written (not interested in the source 
>> data document size, just real disk usage).
>> Thanks
>> 
>> Chris
>> 
>>> On 4 Jul 2018, at 17:08, Erick Erickson <erickerick...@gmail.com> wrote:
>>> 
>>> But does size on disk help? If the doc has a zillion
>>> images in it, those aren't part of the resulting index
>>> (I'm excluding stored data here)....
>>> 
>>>> On Wed, Jul 4, 2018 at 7:49 AM, Terry Steichen <te...@net-frame.com> wrote:
>>>> In the document types I usually index (.pdf, .docx/.doc, .eml), there
>>>> exists a metadata field called "stream_size" that contains the size of
>>>> the document on disk.  You don't have to compute it.  Thus, when you
>>>> retrieve each document you can pull out the contents of this field and,
>>>> if you like, include it in each hitlist entry.
>>>> 
>>>> 
>>>>> On 07/04/2018 05:26 AM, Chris and Helen Bamford wrote:
>>>>> Hi there,
>>>>> 
>>>>> How can I calculate the total size of a Lucene Document that I'm about
>>>>> to write to an index so I know how many bytes I am writing please?  I
>>>>> need it for some external metrics collection.
>>>>> 
>>>>> Thanks
>>>>> 
>>>>> - Chris
>>>>> 
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>>> 
>>>>> 
