Hi Adrien,
I cannot tell whether something like this would make it less or more robust,
just thinking aloud :)

I am thinking of it as a way to postpone the byte[] -> type conversion until
the moment it is really needed. Simply put, keep the byte[] around as long as
possible.
*Theoretically*, this should improve GC behavior and memory footprint for some
types of downstream processing. It all depends on how easy something like that
would be.
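
To make the idea concrete, here is a minimal sketch in plain Java. It is only
an illustration: LazyStoredValue and its methods are made-up names, not Lucene
classes, and it assumes the stored bytes are UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical holder (not a Lucene class): keeps the serialized bytes and decodes on demand. */
final class LazyStoredValue {
    private final byte[] utf8;   // raw serialized form, assumed to be UTF-8
    private String decoded;      // filled in on the first stringValue() call

    LazyStoredValue(byte[] utf8) {
        this.utf8 = utf8;
    }

    /** Identity check on the raw bytes; no String is allocated. */
    boolean sameBytes(LazyStoredValue other) {
        return Arrays.equals(utf8, other.utf8);
    }

    /** The byte[] -> String conversion happens here, only when a caller asks for it. */
    String stringValue() {
        if (decoded == null) {
            decoded = new String(utf8, StandardCharsets.UTF_8);
        }
        return decoded;
    }

    /** True once stringValue() has been called at least once. */
    boolean isDecoded() {
        return decoded != null;
    }
}

final class LazyStoredValueDemo {
    public static void main(String[] args) {
        LazyStoredValue a = new LazyStoredValue("lucene".getBytes(StandardCharsets.UTF_8));
        LazyStoredValue b = new LazyStoredValue("lucene".getBytes(StandardCharsets.UTF_8));
        System.out.println(a.sameBytes(b));   // compares raw bytes only
        System.out.println(a.isDecoded());    // nothing has been decoded yet
        System.out.println(a.stringValue());  // decoding happens on this call
    }
}
```

Downstream code that only compares or forwards values would never pay for the
conversion; only callers of stringValue() would.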

There is already a way to achieve this by using the binary field type, …  hmmm,
maybe some lucene.expert hack to make Lucene think every field is binary would
be simple and robust enough?
e.g. Visitor.transportOnlySerializedValuesWithoutTypeConversion()

---------

By the way, the trick with TimSort in Sorter worked great. For 1.1 million
short documents, the time to sort an unsorted index on a handful of stored
fields went from 490 seconds to 380.
Congrats and thanks for it! It also improved compression by 12% (very small, 4k
chunk size)

On Mar 17, 2013, at 5:26 PM, Adrien Grand <[email protected]> wrote:

> Hi,
> 
> On Sun, Mar 17, 2013 at 2:58 PM, eksdev <[email protected]> wrote:
>> sure, there is a way to make anything -> byte[] ;)
>> 
>> it looks like this byte[]->type conversion is done deep-down and this
>> visitor user-api gets already correct types  …
>> 
>> Maybe an idea would be to delay byte[] -> type conversion to field access
>> time, i do not know what mines would be on the road to do it.
>> 
>> use cases that require identity checks, or non-locale-specific sorting and
>> co would benefit from having raw, serialised representations without type
>> conversion…. anyhow, I could switch over to byte[] fields completely to do
>> it…
> 
> I understand that it is frustrating to perform a String -> byte[]
> conversion if Lucene just did the opposite. But because it needs to
> perform one random seek per document (on a file which is often large),
> the stored fields API is much slower than a String -> UTF-8 bytes
> conversion, so I think we should keep the API robust rather than
> allowing for these kinds of optimizations?
> 
> -- 
> Adrien
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
> 


