[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

Mark Miller (JIRA) Fri, 17 Apr 2009 17:26:40 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700391#action_12700391 ]


Mark Miller commented on LUCENE-831:
------------------------------------

I've got a bit of the same feeling. My list was more or less cherry-picked from 
all of the above comments, and my initial feeling was that there was not enough 
motivation as well. But the more I thought about it, the uglier FieldCache 
looks. And we would want to stop exposing Parser so that CSF can be a 
seamless backing. That makes FieldCache even uglier for a while. Clickless thus 
far here too, but I think we still have a good base to work with.

{quote}Honestly these reasons are not net/net compelling enough to warrant a
whole new API? They are fairly minor. And I agree: LUCENE-1483 has
already achieved the biggest step forward here.{quote}

Not only that, but almost all of those reasons can be handled by allowing a 
custom FieldCache to be used, rather than just hard coding to the default 
singleton.

A couple responses:

{quote}We need source pluggability for when CSF arrives (but, admittedly,
we could wait until CSF actually does arrive){quote}
We have it? Just pass the CSFValueSource at IndexReader creation?

{quote}
Allowing values to change, just like we can call
IndexReader.setNorm/deleteDoc to change norms/deletes. We'd need a
copy-on-write approach, like norms & deleted docs.{quote}
Good point. We need a way to update, one that can throw UnsupportedOperationException?

{quote}
How would norms be folded into this? Ideally, each field could
choose to pull its norms from any source. Document level norms
was discussed somewhere, and should easily "fit" as another norms
source. We'd need to relax how per-doc-field boosting is computed
at runtime to pull from such "arbitrary" sources.{quote}
Good point again. Getting norms under this API will add a bit more meat to this 
issue.

{quote}
Deleted docs could also be represented as a ValueSource? Just one
bit per doc. This way one could swap in whatever source for
"deleted docs" one wanted.{quote}
You've got me here at the moment. I don't know the delete code very well, but I 
will in time :)

{quote}
      Allowing for docs that have more than one value. (We'd also need
      to extend sorting to be able to compare multiple values).
{quote}
This is an interesting one, because I wonder if we can do it and stick with 
arrays? A multi dimensional array seems a bit much...

{quote}
An mmap implementation (like Lucy/KS) - should feel just like CSF
or uninversion (ie, "just another impl").{quote}
This is already fairly independent I think...

{quote}
Good impls for the enum case (all strings could be considered
enums), eg if there are only 100 unique strings in that field, you
only need 7 bits per ord derefing into the char[] values.
{quote}
+1. Yes.
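
To make the arithmetic in the quote concrete, here is a rough sketch of the enum idea: sort the unique values, give each one an ord, and pack the per-document ords using only as many bits as the unique count requires. Everything here is hypothetical and is not an existing Lucene class: the name PackedOrdsSketch, its methods, the use of a String[] instead of a shared char[], and the assumption that every document has exactly one non-null value.

```java
import java.util.Arrays;
import java.util.TreeSet;

/** Hypothetical sketch of packed ords for an enum-like field; not a Lucene API. */
final class PackedOrdsSketch {
    final String[] values;  // sorted unique values; a value's ord is its index here
    final long[] packed;    // per-doc ords, bitsPerOrd bits each, packed into longs
    final int bitsPerOrd;

    /** Assumes perDocValues[doc] is the single, non-null value of that doc. */
    PackedOrdsSketch(String[] perDocValues) {
        values = new TreeSet<>(Arrays.asList(perDocValues)).toArray(new String[0]);
        bitsPerOrd = bitsRequired(values.length);
        long totalBits = (long) perDocValues.length * bitsPerOrd;
        packed = new long[(int) ((totalBits + 63) / 64)];
        for (int doc = 0; doc < perDocValues.length; doc++) {
            set(doc, Arrays.binarySearch(values, perDocValues[doc]));
        }
    }

    /** Smallest bit width that can hold ords 0..uniqueCount-1 (uniqueCount >= 1). */
    static int bitsRequired(int uniqueCount) {
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(uniqueCount - 1));
    }

    // Writes an ord into a zero-initialized slot; a slot may straddle two longs.
    private void set(int doc, long ord) {
        long bitPos = (long) doc * bitsPerOrd;
        int word = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        packed[word] |= ord << shift;
        if (shift + bitsPerOrd > 64) {
            packed[word + 1] |= ord >>> (64 - shift);
        }
    }

    /** Reads back the ord stored for a document. */
    int ord(int doc) {
        long bitPos = (long) doc * bitsPerOrd;
        int word = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long bits = packed[word] >>> shift;
        if (shift + bitsPerOrd > 64) {
            bits |= packed[word + 1] << (64 - shift);
        }
        return (int) (bits & ((1L << bitsPerOrd) - 1));
    }

    /** Dereferences a document's ord into its value. */
    String value(int doc) {
        return values[ord(doc)];
    }
}
```

With 100 unique strings, bitsRequired(100) is 7, which matches the figure in the quote; the cost per document is then 7 bits plus the shared table of values.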

{quote}
Possible future when Lucene computes sort cache (for text fields)
and stores in the index{quote}
I'm not familiar with that idea, so not sure what effect this has...

{quote}
Allowing field sort to use an entirely external source of values
{quote}
I think both options allow that now - if you pass the ValueSource from the 
reader, it can get its values from anywhere. If you override the reader 
ValueSource with the SortField ValueSource, it too can load from anywhere. I am 
just not sure both options are really needed. I am kind of liking Uwe's idea of 
assigning ValueSources per field, though that could probably get messy. Perhaps 
a default, and then per field overrides? 
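
A rough sketch of what "a default, and then per field overrides" might look like. This is only an illustration under assumptions: the ValueSource interface below is a hypothetical one-method stand-in, not the interface from any of the attached patches, and PerFieldSourcesSketch with its override/sourceFor methods is invented for this example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: one default source plus optional per-field overrides. */
final class PerFieldSourcesSketch {

    /** Hypothetical stand-in for the ValueSource discussed in this issue. */
    interface ValueSource {
        int[] getInts(String field);
    }

    private final ValueSource defaultSource;
    private final Map<String, ValueSource> overrides = new HashMap<>();

    PerFieldSourcesSketch(ValueSource defaultSource) {
        this.defaultSource = defaultSource;
    }

    /** Registers a source for one field; returns this so calls can be chained. */
    PerFieldSourcesSketch override(String field, ValueSource source) {
        overrides.put(field, source);
        return this;
    }

    /** Returns the field's override if one was registered, else the default. */
    ValueSource sourceFor(String field) {
        return overrides.getOrDefault(field, defaultSource);
    }
}
```

Sorting code would then ask sourceFor(field) and never care whether the values come from uninversion, CSF, or something external; only fields that need special handling get an entry in the override map.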

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
> fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
> LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff, 
> LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, 
> LUCENE-831.patch
>
>
> Motivation:
> 1) Completely overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completely independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundant caching as client code
>     migrates to new API.

