Re: any good idea for loading fields into memory?

Li Li Thu, 21 Jun 2012 00:48:20 -0700

I can't use 4.0 because it hasn't been released yet; our company requires us
to use a stable version.

So I decided to wrap an IndexSearcher together with the field values held in
memory, as shown below. I also copied all the code of
org.apache.lucene.search.SearcherManager and replaced IndexSearcher with
my IndexSearcherWithFields.

Any suggestions for this solution?

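import java.io.IOException;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

import org.apache.log4j.Logger; // assuming log4j 1.x, based on the Logger calls used here
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;

/** Wraps an IndexSearcher together with selected stored fields held in memory. */
public class IndexSearcherWithFields {
        protected static final Logger logger = Logger.getLogger(IndexSearcherWithFields.class);

        private final Collection<String> inMemoryFields;
        private final Collection<String> inMemoryMultiValueFields;
        // field name -> array indexed by docID; a slot holds a String for
        // single-valued fields and a String[] for multi-valued fields
        private final Map<String,Object[]> fieldsValues = new HashMap<String,Object[]>();
        private final IndexSearcher searcher;

        public IndexSearcherWithFields(IndexSearcher searcher,
                        Collection<String> inMemoryFields,
                        Collection<String> inMemoryMultiValueFields) throws IOException {
                this.searcher = searcher;
                this.inMemoryFields = inMemoryFields;
                this.inMemoryMultiValueFields = inMemoryMultiValueFields;
                this.warmup();
        }

        public IndexReader getIndexReader() {
                return searcher.getIndexReader();
        }

        public final IndexSearcher getSearcher() {
                return searcher;
        }

        /** Returns the per-document values of a field, or null if it was not loaded. */
        public Object[] getField(String fn) {
                return fieldsValues.get(fn);
        }

        // Reads every live document once and copies the requested fields into arrays.
        private void warmup() throws IOException {
                long start = System.currentTimeMillis();
                IndexReader reader = searcher.getIndexReader();
                int docSize = reader.maxDoc();
                for (String f : inMemoryFields) {
                        fieldsValues.put(f, new Object[docSize]);
                }
                for (String f : inMemoryMultiValueFields) {
                        fieldsValues.put(f, new Object[docSize]);
                }

                for (int i = 0; i < docSize; i++) {
                        // In Lucene 3.x, document(i) fails for deleted documents,
                        // so skip them; their slots stay null.
                        if (reader.isDeleted(i)) {
                                continue;
                        }
                        Document doc = reader.document(i);
                        for (String f : inMemoryFields) {
                                fieldsValues.get(f)[i] = doc.get(f);
                        }
                        for (String f : inMemoryMultiValueFields) {
                                fieldsValues.get(f)[i] = doc.getValues(f);
                        }
                }
                logger.debug("warm up fields time: " + (System.currentTimeMillis() - start) + " ms.");
        }
}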
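
One thing to weigh against the wrapper above: warmup() rereads every document
each time a new searcher is created. The per-segment idea raised earlier in
this thread (keep one array per segment, and on reopen load only the new
segments) can be sketched in plain Java without any Lucene classes. This is
only an illustration under stated assumptions: SegmentedFieldValues,
withNewSegment, segmentOf and get are hypothetical names, not Lucene APIs, and
the per-segment arrays are assumed to be supplied in segment order and never
modified afterwards.

```java
import java.util.Arrays;

/** Hypothetical sketch: per-segment value arrays addressed by a global doc ID. */
class SegmentedFieldValues {
    private final int[] docBases;    // first global doc ID of each segment, ascending
    private final String[][] values; // values[s][localDoc] for segment s
    private final int maxDoc;        // total number of document slots

    SegmentedFieldValues(String[][] perSegmentValues) {
        this.values = perSegmentValues;
        this.docBases = new int[perSegmentValues.length];
        int base = 0;
        for (int s = 0; s < perSegmentValues.length; s++) {
            docBases[s] = base;
            base += perSegmentValues[s].length;
        }
        this.maxDoc = base;
    }

    /** New snapshot that shares the old segment arrays and adds one more. */
    SegmentedFieldValues withNewSegment(String[] newSegment) {
        String[][] next = Arrays.copyOf(values, values.length + 1);
        next[values.length] = newSegment;
        return new SegmentedFieldValues(next);
    }

    int maxDoc() {
        return maxDoc;
    }

    /** Index of the segment that owns the given global doc ID. */
    int segmentOf(int globalDoc) {
        if (globalDoc < 0 || globalDoc >= maxDoc) {
            throw new IndexOutOfBoundsException("doc " + globalDoc);
        }
        int pos = Arrays.binarySearch(docBases, globalDoc);
        // On a miss, binarySearch returns -(insertionPoint) - 1; the owning
        // segment is the one just before the insertion point.
        int s = pos >= 0 ? pos : -pos - 2;
        // Step past empty segments that share the same base.
        while (s + 1 < docBases.length && docBases[s + 1] <= globalDoc) {
            s++;
        }
        return s;
    }

    /** Value stored for a global doc ID. */
    String get(int globalDoc) {
        int s = segmentOf(globalDoc);
        return values[s][globalDoc - docBases[s]];
    }
}
```

Because each instance is immutable, a searcher could hold one as its snapshot
and share it between threads, and a reopen would only need to build the array
for the newly added segment. Segment merges and deletions are not handled in
this sketch.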


On Wed, Jun 20, 2012 at 11:37 PM, Michael McCandless
<[email protected]> wrote:
> Right, the field must have a single token for FieldCache.
>
> But if you are on 4.x you can use DocTermOrds
> (FieldCache.getDocTermOrds) which allows for multiple tokens per
> field.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Wed, Jun 20, 2012 at 9:47 AM, Li Li <[email protected]> wrote:
>> But as far as I can remember, in 2.9.x FieldCache can only be applied to
>> fields that are indexed but not analyzed.
>> On 2012-6-20 at 8:59 PM, "Danil ŢORIN" <[email protected]> wrote:
>>
>>> I think you are looking for FieldCache.
>>>
>>> I'm not sure of the current status in 4.x, but it worked in 2.9/3.x.
>>> Basically it's an array, so access is quite straightforward, and the
>>> best part is that IndexReader manages those for you, so on reopen only
>>> new segments are read.
>>>
>>> A small catch is that FieldCaches are per segment, so you need to be
>>> careful if you want to retrieve data using global document IDs.
>>> However, if you are building the result set in your own Collector, using
>>> FieldCache is quite straightforward.
>>>
>>>
>>> On Wed, Jun 20, 2012 at 3:49 PM, Li Li <[email protected]> wrote:
>>> > Hi all,
>>> >    I need to return certain fields of all matched documents quickly.
>>> > I am now using Document.get(field), but the performance is not good
>>> > enough. Originally I used a HashMap to store these fields. It's much
>>> > faster, but I have to maintain two storage systems. Now I am
>>> > restructuring this project and want to store everything in Lucene.
>>> >    When I use an IndexSearcher to perform a search, I should be able to
>>> > get the related fields by docID. It must be thread safe and, like the
>>> > IndexReader, it should be a snapshot of the index.
>>> >    Here are some solutions I can come up with:
>>> >    1. StringIndex
>>> >       I have considered StringIndex, but some fields need to be tokenized.
>>> > Maybe I can use two fields: one tokenized for searching, and another
>>> > indexed but not analyzed, the latter used only for StringIndex.
>>> > If there is no better solution, I may have to use this one.
>>> >    2. Associating a Map with each IndexReader
>>> >       When the IndexReader is opened or reopened, I need to iterate
>>> > through every document of this reader and put everything into a map.
>>> > The problem is that it's slower, and I don't know whether it's
>>> > problematic with NRT.
>>> >
>>> >    Is there any other, better solution? Thanks.
>>> >
>>> > ---------------------------------------------------------------------
>>> > To unsubscribe, e-mail: [email protected]
>>> > For additional commands, e-mail: [email protected]
>>> >
>>>
>
