Re: Opening up FieldCacheImpl
Separately from this, I'm playing with an ExternalDocValuesFilterReader that takes a list of abstract ExternalDocValuesProviders, as a kind of generalisation of FileFloatSource. It's a bit rough at the moment, and it's for a Lucene application rather than for Solr, but it could work as a replacement for ExternalFileField with appropriate factories. I'll open a JIRA and put up a patch once it does anything useful.

Alan Woodward
www.flax.co.uk

On 26 Mar 2013, at 10:02, Alan Woodward wrote:

> I've opened https://issues.apache.org/jira/browse/LUCENE-4883 as a start.
Re: Opening up FieldCacheImpl
I've opened https://issues.apache.org/jira/browse/LUCENE-4883 as a start.

Alan Woodward
www.flax.co.uk

On 26 Mar 2013, at 00:51, Robert Muir wrote:

> I don't think codec would be where you'd plugin for a filterreader that
> exposes external data as fake fields. That's because its all about what
> encoding indexwriter uses to write. I think solr has an indexreaderfactory if
> you want to e.g. wrap readers with filteratomicreaders.
Re: Opening up FieldCacheImpl
I don't think codec would be where you'd plug in for a filter reader that exposes external data as fake fields. That's because it's all about what encoding IndexWriter uses to write. I think Solr has an IndexReaderFactory if you want to e.g. wrap readers with FilterAtomicReaders.

On Mar 25, 2013 2:30 PM, "David Smiley (@MITRE.org)" wrote:

> Interesting conversation. So if hypothetically Solr's FileFloatSource /
> ExternalFileField didn't yet exist and we were instead talking about how to
> implement such a thing on the latest 4.x code, then how basically might it
> work? I can see how to implement a Solr CodecFactory ( a SchemaAware one) ,
> then a DocValuesProducer. The CodecFactory implements
> NamedInitializedPlugin and can thus get its config info that way. That's
> one approach. But it's not clear to me where one would wrap AtomicReader
> with FilterAtomicReader to use that approach.
>
> ~ David
Re: Opening up FieldCacheImpl
Interesting conversation. So if hypothetically Solr's FileFloatSource / ExternalFileField didn't yet exist and we were instead talking about how to implement such a thing on the latest 4.x code, then how basically might it work?

I can see how to implement a Solr CodecFactory (a SchemaAware one), then a DocValuesProducer. The CodecFactory implements NamedInitializedPlugin and can thus get its config info that way. That's one approach. But it's not clear to me where one would wrap AtomicReader with FilterAtomicReader to use that approach.

~ David

-
Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: http://lucene.472066.n3.nabble.com/Opening-up-FieldCacheImpl-tp4050537p4051217.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Opening up FieldCacheImpl
On Sat, Mar 23, 2013 at 7:25 AM, Alan Woodward wrote:

>> I think instead FieldCache should actually be completely package
>> private and hidden behind a UninvertingFilterReader and accessible via
>> the existing AtomicReader docValues methods.
>
> Aha, right, because SegmentCoreReaders already caches XXXDocValues instances
> (without using WeakReferences or anything like that).
>
> I should explain my motivation here. I want to store various scoring factors
> externally to Lucene, but make them available via a ValueSource to
> CustomScoreQueries - essentially a generalisation of FileFloatSource to any
> external data source. FFS already has a bunch of code copied from
> FieldCache, which was why my first thought was to open it up a bit and extend
> it, rather than copy and paste again.
>
> But it sounds as though a nicer way of doing this would be to create a new
> DocValuesProducer that talks to the external data source, and then access it
> through the AR docValues methods. Does that sound plausible? Is SPI going
> to make it difficult to pass parameters to a custom DVProducer (data
> location, host/port, other DV fields to use as primary key lookups, etc)?

SPI is not involved if you implement via FilterAtomicReader. It's only involved for reading things that are actually written into the index.
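A minimal sketch of the wrapping idea, assuming hypothetical stand-in interfaces rather than Lucene's real classes: `Reader`, `NumericValues`, `ExternalProvider`, and `ExternalFilterReader` below are all invented for illustration, loosely modelled on the idea of a filter reader that answers doc-values lookups for "fake" fields from an external source and delegates everything else to the wrapped reader. Because the provider objects are handed to the constructor directly, no SPI lookup is needed to configure them.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ExternalValuesReaderSketch {

    /** Hypothetical per-document numeric lookup (not a Lucene type). */
    interface NumericValues { long get(int docId); }

    /** Hypothetical reader exposing per-field numeric values (not a Lucene type). */
    interface Reader { NumericValues getNumericDocValues(String field); }

    /** Hypothetical source of values held outside the index. */
    interface ExternalProvider {
        String field();
        NumericValues values();
    }

    /** Wraps a reader and serves the providers' fields; all other fields are delegated. */
    static final class ExternalFilterReader implements Reader {
        private final Reader in;
        private final Map<String, ExternalProvider> providers = new LinkedHashMap<>();

        ExternalFilterReader(Reader in, List<ExternalProvider> providerList) {
            this.in = in;
            for (ExternalProvider p : providerList) {
                providers.put(p.field(), p);
            }
        }

        @Override
        public NumericValues getNumericDocValues(String field) {
            ExternalProvider p = providers.get(field);
            return p != null ? p.values() : in.getNumericDocValues(field);
        }
    }

    /** Toy provider backed by an in-memory array indexed by document id. */
    static ExternalProvider arrayProvider(final String fieldName, final long[] perDoc) {
        return new ExternalProvider() {
            @Override public String field() { return fieldName; }
            @Override public NumericValues values() { return docId -> perDoc[docId]; }
        };
    }

    public static void main(String[] args) {
        // Stand-in for the real index: only knows a "price" field.
        Reader index = field -> {
            if ("price".equals(field)) {
                return docId -> 100L + docId;
            }
            return null;
        };
        Reader wrapped = new ExternalFilterReader(
                index, Arrays.asList(arrayProvider("boost", new long[] {7L, 9L})));
        System.out.println(wrapped.getNumericDocValues("boost").get(1));
        System.out.println(wrapped.getNumericDocValues("price").get(1));
    }
}
```

In a real application the provider would presumably talk to a file or remote store and map external keys to document ids; that part is left out here.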
Re: Opening up FieldCacheImpl
> I think instead FieldCache should actually be completely package
> private and hidden behind a UninvertingFilterReader and accessible via
> the existing AtomicReader docValues methods.

Aha, right, because SegmentCoreReaders already caches XXXDocValues instances (without using WeakReferences or anything like that).

I should explain my motivation here. I want to store various scoring factors externally to Lucene, but make them available via a ValueSource to CustomScoreQueries - essentially a generalisation of FileFloatSource to any external data source. FFS already has a bunch of code copied from FieldCache, which was why my first thought was to open it up a bit and extend it, rather than copy and paste again.

But it sounds as though a nicer way of doing this would be to create a new DocValuesProducer that talks to the external data source, and then access it through the AR docValues methods. Does that sound plausible? Is SPI going to make it difficult to pass parameters to a custom DVProducer (data location, host/port, other DV fields to use as primary key lookups, etc)?
Re: Opening up FieldCacheImpl
On Fri, Mar 22, 2013 at 6:26 PM, Alan Woodward wrote:

> Actually this would be really nice, wouldn't it. Add a getFieldCache(String
> field) method to AtomicReader. You'd have to be able to determine what to
> return depending on the field though - uninverted field, or docvalues, or
> another cached source.

But the cache isn't even on the reader, it's on the SegmentCoreReaders.

> FieldCache and DocValues seem like they ought to have a common API, really.

They already do. I think instead FieldCache should actually be completely package private and hidden behind a UninvertingFilterReader and accessible via the existing AtomicReader docValues methods.

Uninverting is a really crazy solution vs. indexing fields the way they will be used.
Re: Opening up FieldCacheImpl
Actually this would be really nice, wouldn't it. Add a getFieldCache(String field) method to AtomicReader. You'd have to be able to determine what to return depending on the field though - uninverted field, or docvalues, or another cached source.

FieldCache and DocValues seem like they ought to have a common API, really. And ValueSource in the function queries package as well. But that's another issue...

Alan Woodward
www.flax.co.uk

On 22 Mar 2013, at 20:48, Yonik Seeley wrote:

> The ability to cache stuff w/o resorting to weak references would be even
> nicer!
> Caches right on the segment readers?
>
> -Yonik
> http://lucidworks.com
Re: Opening up FieldCacheImpl
Note that what FieldCache does is not special: it just has a map and calls the public SegmentReader.addCoreClosedListener method so that it gets notifications when something is no longer needed.

I'm not sure we should make FieldCacheImpl public if that's the real logic you want to reuse.

On Fri, Mar 22, 2013 at 1:36 PM, Alan Woodward wrote:

> I'm looking at exposing data held externally to an index via a ValueSource,
> and it would be nice to reuse the machinery in FieldCacheImpl to cache the
> data per-segment. However, it's package-private at the moment, which means
> I can't extend it nicely. Is there a reason for this? Or should I put up a
> JIRA to make it public?
>
> Alan Woodward
> www.flax.co.uk
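The "map plus close notification" pattern described above can be sketched roughly as follows. This is an illustration under stated assumptions, not FieldCacheImpl's actual code: `SegmentCore`, `addClosedListener`, and `Cache` are hypothetical stand-ins written in plain Java, with `SegmentCore` playing the role of whatever object the reader lets you register a core-closed listener on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

final class PerSegmentCacheSketch {

    /** Hypothetical stand-in for a segment core that notifies listeners when it closes. */
    static final class SegmentCore {
        private final String name;
        private final List<Runnable> closedListeners = new ArrayList<>();

        SegmentCore(String name) { this.name = name; }

        String name() { return name; }

        void addClosedListener(Runnable listener) { closedListeners.add(listener); }

        void close() {
            for (Runnable listener : closedListeners) {
                listener.run();
            }
        }
    }

    /** Caches one value per segment core and evicts it when that core is closed. */
    static final class Cache<V> {
        private final Map<SegmentCore, V> values = new ConcurrentHashMap<>();
        private final Function<SegmentCore, V> loader;

        Cache(Function<SegmentCore, V> loader) { this.loader = loader; }

        V get(SegmentCore core) {
            return values.computeIfAbsent(core, c -> {
                // Register for eviction only on first load; the entry is held
                // by a strong reference, so no WeakReferences are needed.
                c.addClosedListener(() -> values.remove(c));
                return loader.apply(c);
            });
        }

        int size() { return values.size(); }
    }

    public static void main(String[] args) {
        Cache<float[]> cache = new Cache<>(core -> new float[] {1.0f, 2.0f});
        SegmentCore core = new SegmentCore("_0");
        float[] first = cache.get(core);
        float[] second = cache.get(core);
        System.out.println(first == second); // same cached instance
        core.close();
        System.out.println(cache.size());
    }
}
```

The point of the sketch is that the reusable logic is small enough to write directly against the public listener hook, which is the argument against opening up the package-private class.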
Re: Opening up FieldCacheImpl
The ability to cache stuff w/o resorting to weak references would be even nicer! Caches right on the segment readers?

-Yonik
http://lucidworks.com
Re: Opening up FieldCacheImpl
That would be nice! There is similar machinery in Solr's ExternalFileField. In the spatial module I'd like to cache data per-segment; its current cache sucks, to say the least. My current plans are to use BinaryDocValues so I might not use this proposed machinery after all, but nonetheless I think it's useful.

~ David

Alan Woodward-2 wrote
> I'm looking at exposing data held externally to an index via a
> ValueSource, and it would be nice to reuse the machinery in FieldCacheImpl
> to cache the data per-segment. However, it's package-private at the
> moment, which means I can't extend it nicely. Is there a reason for this?
> Or should I put up a JIRA to make it public?
>
> Alan Woodward
> www.flax.co.uk

--
View this message in context: http://lucene.472066.n3.nabble.com/Opening-up-FieldCacheImpl-tp4050537p4050579.html