Re: The most efficient way to get un-inverted view of the index?
In case this helps someone, here is a solution (probably already reasonably efficient, but I didn't profile it). It can deal with fields that have DocValues and with fields that don't, which it un-inverts the way FieldCache used to:

    private void unInvertedTheDamnThing(SolrIndexSearcher searcher, List<String> fields,
        KVSetter setter) throws IOException {
      LeafReader reader = searcher.getLeafReader();
      IndexSchema schema = searcher.getCore().getLatestSchema();
      List<LeafReaderContext> leaves = reader.getContext().leaves();
      Bits liveDocs;
      LeafReader lr;
      Transformer transformer;
      for (LeafReaderContext leaf : leaves) {
        int docBase = leaf.docBase;
        liveDocs = leaf.reader().getLiveDocs();
        lr = leaf.reader();
        FieldInfos fInfo = lr.getFieldInfos();
        for (String field : fields) {
          FieldInfo fi = fInfo.fieldInfo(field);
          if (fi == null) {
            continue; // the field does not occur in this segment
          }
          SchemaField fSchema = schema.getField(field);
          DocValuesType fType = fi.getDocValuesType();
          Map<String, Type> mapping = new HashMap<>();
          final LeafReader unReader;
          if (fType.equals(DocValuesType.NONE)) {
            // No DocValues: choose how to un-invert from the Solr field type
            Class<?> c = fSchema.getType().getClass();
            if (TextField.class.isAssignableFrom(c) || StrField.class.isAssignableFrom(c)) {
              if (fSchema.multiValued()) {
                mapping.put(field, Type.SORTED_SET_BINARY);
              } else {
                mapping.put(field, Type.SORTED);
              }
            } else if (TrieIntField.class.isAssignableFrom(c)) {
              if (fSchema.multiValued()) {
                mapping.put(field, Type.SORTED_SET_INTEGER);
              } else {
                mapping.put(field, Type.LEGACY_INTEGER);
              }
            } else {
              continue;
            }
            unReader = new UninvertingReader(lr, mapping);
            // UninvertingReader reports the DocValues type it emulates for the field
            fType = unReader.getFieldInfos().fieldInfo(field).getDocValuesType();
          } else {
            unReader = lr;
          }
          switch (fType) {
            case NUMERIC:
              transformer = new Transformer() {
                NumericDocValues dv = unReader.getNumericDocValues(field);
                @Override
                public void process(int docBase, int docId) {
                  int v = (int) dv.get(docId);
                  setter.set(docBase, docId, v);
                }
              };
              break;
            case SORTED_NUMERIC:
              transformer = new Transformer() {
                SortedNumericDocValues dv = unReader.getSortedNumericDocValues(field);
                @Override
                public void process(int docBase, int docId) {
                  dv.setDocument(docId);
                  int max = dv.count();
                  int v;
                  for (int i=0; i 5) return;
                  dv.setDocument(docId);
                  for (long ord = dv.nextOrd(); ord != SortedSetDocValues.NO_MORE_ORDS; ord = dv.nextOrd()) {
                    final BytesRef value = dv.lookupOrd(ord);
                    setter.set(docBase, docId, value.utf8ToString());
                  }
                }
              };
              break;
            case SORTED:
              transformer = new Transformer() {
                SortedDocValues dv = unReader.getSortedDocValues(field);
                @Override
                public void process(int docBase, int docId) {
                  BytesRef v = dv.get(docId);
                  if (v.length == 0) return; // document has no value
                  setter.set(docBase, docId, v.utf8ToString());
                }
              };
              break;
            default:
              throw new IllegalArgumentException("The field " + field
                  + " has a type that cannot be un-inverted");
          }
          for (int i = 0; i < lr.maxDoc(); i++) {
            if (liveDocs != null && !(i < liveDocs.length() && liveDocs.get(i))) {
              continue; // skip deleted documents
            }
            transformer.process(docBase, i);
          }
        }
      }
    }

On Wed, Aug 17, 2016 at 1:22 PM, Roman Chyla wrote:
> Joel, thanks, but which of them? I've counted at least 4, if not more,
> different ways of how to get DocValues. [...]
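The final loop in the solution at the top of this message walks each segment's local doc ids, skips deleted documents via getLiveDocs(), and hands docBase to the setter so it can compute a top-level id. Here is a minimal pure-Java sketch of just that bookkeeping, with no Lucene dependency: Segment is a hypothetical stand-in for a LeafReaderContext plus one value column, java.util.BitSet plays the role of Lucene's Bits, and the assumption is that the setter's job amounts to docBase + docId.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Hypothetical stand-in for one Lucene segment (leaf); not a Lucene class.
final class Segment {
    final int docBase;     // offset of this segment's doc ids in the top-level reader
    final int maxDoc;      // number of local doc ids, deleted ones included
    final BitSet liveDocs; // null means "no deletions", like getLiveDocs() returning null
    final int[] values;    // one value per local doc id

    Segment(int docBase, int maxDoc, BitSet liveDocs, int[] values) {
        this.docBase = docBase;
        this.maxDoc = maxDoc;
        this.liveDocs = liveDocs;
        this.values = values;
    }
}

final class SegmentWalk {
    // Returns {globalDocId, value} pairs for every live document, segment by segment.
    static List<int[]> collect(List<Segment> segments) {
        List<int[]> out = new ArrayList<>();
        for (Segment seg : segments) {
            for (int docId = 0; docId < seg.maxDoc; docId++) {
                // Skip deleted documents; a null bitset means every doc is live.
                if (seg.liveDocs != null && !seg.liveDocs.get(docId)) {
                    continue;
                }
                // Local id + docBase gives the id in the top-level reader.
                out.add(new int[] {seg.docBase + docId, seg.values[docId]});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        BitSet live = new BitSet();
        live.set(0);
        live.set(2); // local doc 1 of the second segment is deleted
        List<Segment> segs = Arrays.asList(
            new Segment(0, 2, null, new int[] {10, 11}),
            new Segment(2, 3, live, new int[] {20, 21, 22}));
        for (int[] pair : collect(segs)) {
            System.out.println(pair[0] + " -> " + pair[1]);
        }
    }
}
```

Running main prints 0 -> 10, 1 -> 11, 2 -> 20 and 4 -> 22: global id 3 is absent because local doc 1 of the second segment is marked deleted.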
Re: The most efficient way to get un-inverted view of the index?
Joel, thanks, but which of them? I've counted at least 4, if not more, different ways to get DocValues. Are there many functionally equivalent approaches just because devs can't agree on one API? Or is there a deeper reason?

Btw, the FieldCache is still there, both in Lucene (to be deprecated) and in Solr, but it has become package-private.

This is what removed the FieldCache: https://issues.apache.org/jira/browse/LUCENE-5666
This is what followed: https://issues.apache.org/jira/browse/SOLR-8096

And there is still code which un-inverts data from an index if no DocValues are available.

--roman

On Tue, Aug 16, 2016 at 9:54 PM, Joel Bernstein wrote:
> You'll want to use org.apache.lucene.index.DocValues. The DocValues api has
> replaced the field cache.
> [...]
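Un-inverting, which FieldCache did and which UninvertingReader still does for fields without DocValues, turns the inverted postings (term to documents) into a per-document view. Below is a minimal pure-Java sketch of the idea for a single-valued string field, assuming the postings are handed over as a sorted map. It does not use Lucene and is not how UninvertingReader is implemented internally; it only shows the shape of the result: one ordinal per document, a sorted term dictionary, and -1 for documents without a value.

```java
import java.util.Arrays;
import java.util.TreeMap;

final class Uninvert {
    // Inverted input: term -> ids of the documents containing it.
    // Forward output: for each doc id, the ordinal (sorted position) of its term,
    // or -1 when the document has no term in this field.
    // Assumes a single-valued field: if a doc appeared under two terms,
    // the later (larger) term would overwrite the earlier one.
    static int[] toOrds(TreeMap<String, int[]> postings, int maxDoc) {
        int[] ords = new int[maxDoc];
        Arrays.fill(ords, -1);
        int ord = 0;
        // TreeMap iterates keys in ascending String order. Lucene orders terms by
        // their UTF-8 bytes; the two agree for the ASCII terms used here.
        for (int[] docs : postings.values()) {
            for (int doc : docs) {
                ords[doc] = ord;
            }
            ord++;
        }
        return ords;
    }

    // The term dictionary: index = ordinal, element = term.
    static String[] dictionary(TreeMap<String, int[]> postings) {
        return postings.keySet().toArray(new String[0]);
    }

    public static void main(String[] args) {
        TreeMap<String, int[]> postings = new TreeMap<>();
        postings.put("red", new int[] {0, 3});
        postings.put("blue", new int[] {1});
        int[] ords = toOrds(postings, 4);
        String[] dict = dictionary(postings);
        for (int doc = 0; doc < ords.length; doc++) {
            System.out.println(doc + ": " + (ords[doc] < 0 ? "(missing)" : dict[ords[doc]]));
        }
    }
}
```

Here "blue" gets ordinal 0 and "red" ordinal 1, so the per-document ordinals are [1, 0, -1, 1]. The cost of building this view is one pass over every term and posting of the field, which is why it is worth doing only once per segment.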
Re: The most efficient way to get un-inverted view of the index?
You'll want to use org.apache.lucene.index.DocValues. The DocValues API has replaced the field cache.

Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Aug 16, 2016 at 8:18 PM, Roman Chyla wrote:
> I need to read data from the index in order to build a special cache.
> Previously, in SOLR4, this was accomplished with FieldCache or
> DocTermOrds
> [...]
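For a multi-valued string field, what DocValues expose (and what UninvertingReader emulates for fields without them) is a column: a sorted term dictionary plus, per document, ordinals into it, read with a cursor until a sentinel. The pure-Java model below only imitates the Lucene 6.x SortedSetDocValues access pattern (setDocument, nextOrd, lookupOrd, NO_MORE_ORDS), the same loop used in the solution posted in this thread; the SortedSetColumn class itself is hypothetical and not part of any library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of a multi-valued string column:
// a sorted term dictionary plus ascending ordinals per document.
final class SortedSetColumn {
    // Same sentinel value as SortedSetDocValues.NO_MORE_ORDS in Lucene 6.x.
    static final long NO_MORE_ORDS = -1;

    private final String[] dictionary; // unique terms in sorted order
    private final long[][] docOrds;    // docOrds[docId] = ascending ordinals
    private long[] current = new long[0];
    private int pos;

    SortedSetColumn(String[] dictionary, long[][] docOrds) {
        this.dictionary = dictionary;
        this.docOrds = docOrds;
    }

    // Positions the cursor on a document.
    void setDocument(int docId) {
        current = docOrds[docId];
        pos = 0;
    }

    // Next ordinal of the current document, or NO_MORE_ORDS when exhausted.
    long nextOrd() {
        return pos < current.length ? current[pos++] : NO_MORE_ORDS;
    }

    // Resolves an ordinal back to its term.
    String lookupOrd(long ord) {
        return dictionary[(int) ord];
    }

    // The access loop: position on the doc, then pull ordinals until the sentinel.
    static List<String> valuesOf(SortedSetColumn col, int docId) {
        List<String> out = new ArrayList<>();
        col.setDocument(docId);
        for (long ord = col.nextOrd(); ord != NO_MORE_ORDS; ord = col.nextOrd()) {
            out.add(col.lookupOrd(ord));
        }
        return out;
    }

    public static void main(String[] args) {
        SortedSetColumn col = new SortedSetColumn(
            new String[] {"apache", "lucene", "solr"},
            new long[][] {{1, 2}, {}, {0, 2}});
        for (int doc = 0; doc < 3; doc++) {
            System.out.println(doc + ": " + valuesOf(col, doc));
        }
    }
}
```

Keeping ordinals per document and terms only once in the dictionary is what makes this layout compact; a caller that only needs to group or compare values can work on the ordinals and call lookupOrd only when it needs the actual text.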
The most efficient way to get un-inverted view of the index?
I need to read data from the index in order to build a special cache. Previously, in Solr 4, this was accomplished with FieldCache or DocTermOrds.

Now I'm struggling to see which API to use; there are many of them:

on the Lucene level:

UninvertingReader.getNumericDocValues (and others)
.getNumericValues()
MultiDocValues.getNumericValues()
MultiFields.getTerms()

on the Solr level:

reader.getNumericValues()
UninvertingReader.getNumericDocValues()
and extensions to FilterLeafReader, e.g. the very interesting but undocumented facet accumulators (ex: NumericAcc)

I need this for Solr, and ideally I would re-use the existing cache [i.e. the special cache uses other fields, so in the old Solr those got loaded only once and reused, which is a win-win situation].

If I use reader.getValues() or FilterLeafReader, will I be reading data every time the object is created? What would be the best way to read data only once?

Thanks,

--roman
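On reading data only once: one common pattern, sketched here under the assumption that the special cache can be built per segment, is to key the computed arrays by segment and build each entry lazily the first time it is requested. The PerSegmentCache class below is hypothetical, uses only java.util.concurrent, and leaves out the Lucene wiring. In Lucene 6.x the key would plausibly be LeafReader.getCoreCacheKey(), with a listener registered via addCoreClosedListener calling evict, but verify both against the version in use.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical read-once cache: the expensive load runs at most once per key.
// With Lucene the key would identify a segment's core; here it is any object.
final class PerSegmentCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;
    private final AtomicInteger loads = new AtomicInteger();

    PerSegmentCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // computeIfAbsent runs the loader only if no value is present yet, and
    // ConcurrentHashMap applies it at most once per key even under contention.
    V get(K key) {
        return entries.computeIfAbsent(key, k -> {
            loads.incrementAndGet();
            return loader.apply(k);
        });
    }

    // Call when a segment goes away (e.g. after a merge) so its entry can be freed.
    void evict(K key) {
        entries.remove(key);
    }

    // How many times the loader actually ran.
    int loadCount() {
        return loads.get();
    }

    public static void main(String[] args) {
        PerSegmentCache<String, int[]> cache =
            new PerSegmentCache<>(seg -> new int[] {seg.length()});
        cache.get("_0");
        cache.get("_0"); // served from the cache, no second load
        cache.get("_1");
        System.out.println("loads: " + cache.loadCount());
    }
}
```

Running main prints "loads: 2". Because entries are per segment, a reopened searcher that still shares most segments would only trigger loads for the new ones, which is the property that made per-segment caching cheap to keep across commits.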