[jira] Commented: (LUCENE-2649) FieldCache should include a BitSet for matching docs

Ryan McKinley (JIRA) Wed, 22 Sep 2010 17:03:00 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913857#action_12913857
 ]


Ryan McKinley commented on LUCENE-2649:
---------------------------------------

bq. Is the one-time-per-IndexReader-lifecycle cost of multiplying the cache load 
time by some factor < 2.0 ... really so terrible 

It can be... on a big index, just iterating all the terms/docs takes a long time. 
Try the LukeRequestHandler on an index with a million+ docs!

-------------

Here is a different variation. It changes *lots*, but if we are talking about 
changing Parser from an interface to a class, then I guess the cat is already 
out of the bag.

What about something like: 
{code:java|title=FieldCache.java}
  ...

  
  public class EntryConfig implements Serializable 
  {
    public Parser getParser() {
      return null;
    }
    public boolean cacheValidBits() {
      return false;
    }
    public boolean cacheValues() {
      return true;
    }
    
    /**
     * The hashCode is used as part of the cache key (along with the field name).
     * To allow multiple calls with different parameters, make sure the hashCode
     * does not include the specific instance and parameters.
     */
    public int hashCode()
    {
      return EntryConfig.class.hashCode();
    }
  }
  
  
  public abstract class CachePopulator 
  {
    public abstract void fillValidBits(  CachedArray vals, IndexReader reader, String field, EntryConfig creator ) throws IOException;
    public abstract void fillByteValues( CachedArray vals, IndexReader reader, String field, EntryConfig creator ) throws IOException;
    ...
  }

  public abstract CachePopulator getCachePopulator();

...

  public ByteValues getByteValues(IndexReader reader, String field, EntryConfig creator )

...

{code}
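
To make the cache-key comment concrete, here is a rough, self-contained sketch of a config subclass that asks for the valid bits only. The classes are simplified stand-ins, not real Lucene code. `ValidBitsOnlyConfig`, `EntryConfigDemo` and the `equals` override are my assumptions: the proposal above only mentions `hashCode`, but a `HashMap`-style key would presumably need `equals` to agree as well before two configs can share one entry.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the proposed EntryConfig (the Parser is omitted).
class EntryConfig implements Serializable {
  public boolean cacheValidBits() { return false; }
  public boolean cacheValues() { return true; }

  // As in the proposal: the hash ignores the instance and its parameters.
  @Override
  public int hashCode() { return EntryConfig.class.hashCode(); }

  // Assumption, not part of the proposal: equal hashes alone do not make
  // two keys interchangeable, so equals is relaxed in the same way.
  @Override
  public boolean equals(Object o) { return o instanceof EntryConfig; }
}

// Hypothetical config that requests only the valid-docs bits.
class ValidBitsOnlyConfig extends EntryConfig {
  @Override public boolean cacheValidBits() { return true; }
  @Override public boolean cacheValues() { return false; }
}

class EntryConfigDemo {
  // Puts two differently parameterized configs into a map and reports
  // how many distinct keys the map ends up with.
  static int distinctKeys() {
    Map<EntryConfig, String> cache = new HashMap<>();
    cache.put(new EntryConfig(), "values");
    cache.put(new ValidBitsOnlyConfig(), "bits");
    return cache.size();
  }

  public static void main(String[] args) {
    System.out.println(distinctKeys()); // prints 1
  }
}
```

With both methods relaxed, the two configs land on the same key, which is what lets a later call with different parameters find and extend the existing entry.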


The field cache implementation would make sure what you asked for is filled in 
before passing it back (though I think this has some concurrency issues):
{code:java}

  public ByteValues getByteValues(IndexReader reader, String field, EntryConfig config) throws IOException
  {
    ByteValues vals = (ByteValues) caches.get(Byte.TYPE).get(reader, new Entry(field, config));
    if( vals.values == null && config.cacheValues() ) {
      populator.fillByteValues(vals, reader, field, config);
    }
    if( vals.valid == null && config.cacheValidBits() ) {
      populator.fillValidBits(vals, reader, field, config);
    }
    return vals;
  }
{code}
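
On the concurrency worry: one possible way around it (just a sketch, not something from the patch) is to do the null check and the fill while holding a lock on the cached object, so two threads asking at the same time cannot both populate it. `ByteValues` here is a simplified stand-in, and `LazyFillDemo`, `ensureValues` and the fill counter are hypothetical names used only for illustration.

```java
import java.util.BitSet;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified stand-in for the cached object; both parts start out unfilled.
class ByteValues {
  byte[] values;
  BitSet valid;
}

class LazyFillDemo {
  // Fills vals.values at most once, even when called from several threads.
  static ByteValues ensureValues(ByteValues vals, int maxDoc, AtomicInteger fills) {
    synchronized (vals) {
      if (vals.values == null) {
        fills.incrementAndGet();        // counts how often the fill really runs
        vals.values = new byte[maxDoc]; // placeholder for the real term iteration
      }
    }
    return vals;
  }

  // Starts threadCount threads that all request the same values and
  // returns how many times the fill was actually executed.
  static int runConcurrently(int threadCount) {
    final ByteValues vals = new ByteValues();
    final AtomicInteger fills = new AtomicInteger();
    Thread[] threads = new Thread[threadCount];
    for (int i = 0; i < threadCount; i++) {
      threads[i] = new Thread(() -> ensureValues(vals, 16, fills));
      threads[i].start();
    }
    for (Thread t : threads) {
      try {
        t.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new RuntimeException(e);
      }
    }
    return fills.get();
  }

  public static void main(String[] args) {
    System.out.println(runConcurrently(8)); // prints 1
  }
}
```

Locking on the cached object keeps the expensive fill from running twice, at the price of making concurrent callers for the same field wait for the first one.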

The Cache would then delegate the creation to the populator:
{code:java}

    @Override
    protected final ByteValues createValue(IndexReader reader, Entry entry, CachePopulator populator) throws IOException {
      String field = entry.field;
      EntryConfig config = (EntryConfig)entry.custom;
      if (config == null) {
        return wrapper.getByteValues(reader, field, new EntryConfig() );
      }
      ByteValues vals = new ByteValues();
      if( config.cacheValues() ) {
        populator.fillByteValues(vals, reader, field, config);
      }
      else if( config.cacheValidBits() ) {
        populator.fillValidBits(vals, reader, field, config);
      }
      else {
        throw new RuntimeException( "the config must cache values and/or bits" );
      }
      return vals;
    }
{code}

fillByteValues would be the same code as always, but I think the 
CachedArray should make sure the same parser is used every time:
{code:java}

    @Override
    public void fillByteValues( CachedArray vals, IndexReader reader, String field, EntryConfig config ) throws IOException
    {
      ByteParser parser = (ByteParser) config.getParser();
      if( parser == null ) {
        parser = FieldCache.DEFAULT_BYTE_PARSER;
      }
      // Make sure it is the same parser
      int parserHashCode = parser.hashCode();
      if( vals.parserHashCode != null && vals.parserHashCode != parserHashCode ) {
        throw new RuntimeException( "Subsequent calls with different parser!" );
      }
      vals.parserHashCode = parserHashCode;
     ...
{code}

This is different from the current code, where asking for the cached values with 
two different parsers (that return different hashcodes) creates two entries 
in the cache.

This approach would let us:
* cache values and bits independently or together
* handle subsequent calls with different parameters reasonably
* make some other issues easier, if CachePopulator is pluggable/extendable
* use CachePopulator outside of the cache context (perhaps useful)
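
As a rough illustration of the first and last points, here is a toy populator used with no cache at all, filling bits and values independently. It is only a sketch: `CachedBytes` and `StandalonePopulator` are hypothetical names, and a plain `String[]` (with `null` meaning "this doc has no value") stands in for iterating the terms of a real IndexReader.

```java
import java.util.BitSet;

// Simplified stand-in for CachedArray/ByteValues.
class CachedBytes {
  byte[] values; // null until fillByteValues runs
  BitSet valid;  // null until fillValidBits runs
}

class StandalonePopulator {
  // raw[doc] == null means the doc has no value for the field.
  static void fillValidBits(CachedBytes vals, String[] raw) {
    vals.valid = new BitSet(raw.length);
    for (int doc = 0; doc < raw.length; doc++) {
      if (raw[doc] != null) {
        vals.valid.set(doc);
      }
    }
  }

  static void fillByteValues(CachedBytes vals, String[] raw) {
    vals.values = new byte[raw.length];
    for (int doc = 0; doc < raw.length; doc++) {
      if (raw[doc] != null) {
        vals.values[doc] = Byte.parseByte(raw[doc]);
      }
    }
  }

  public static void main(String[] args) {
    String[] raw = { "0", null, "7" };
    CachedBytes vals = new CachedBytes();

    fillValidBits(vals, raw);  // bits only; the values array is still null
    fillByteValues(vals, raw); // values added later, independently

    // Docs 0 and 1 both hold the byte 0; only the bits tell them apart.
    System.out.println(vals.values[0] + " valid=" + vals.valid.get(0));
    System.out.println(vals.values[1] + " valid=" + vals.valid.get(1));
  }
}
```

Doc 0 really has the value 0 while doc 1 has nothing, and the values array alone cannot show the difference, which is the gap the valid bits are meant to close.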





> FieldCache should include a BitSet for matching docs
> ----------------------------------------------------
>
>                 Key: LUCENE-2649
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2649
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Ryan McKinley
>             Fix For: 4.0
>
>         Attachments: LUCENE-2649-FieldCacheWithBitSet.patch, 
> LUCENE-2649-FieldCacheWithBitSet.patch, 
> LUCENE-2649-FieldCacheWithBitSet.patch, 
> LUCENE-2649-FieldCacheWithBitSet.patch, LUCENE-2649-FieldCacheWithBitSet.patch
>
>
> The FieldCache returns an array representing the values for each doc.  
> However there is no way to know if the doc actually has a value.
> This should be changed to return an object representing the values *and* a 
> BitSet for all valid docs.

