Re: Opening up FieldCacheImpl

2013-03-26 Thread Alan Woodward
Separately from this, I'm playing with an ExternalDocValuesFilterReader that 
takes a list of abstract ExternalDocValuesProviders, as a kind of 
generalisation of FileFloatSource.  It's a bit rough at the moment, and it's 
for a Lucene application rather than for Solr, but it could work as a 
replacement for ExternalFileField with appropriate factories - I'll open a JIRA 
and put up a patch once it does anything useful.
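
The idea of a reader that takes a list of abstract providers might be sketched roughly as below. This is only an illustration: the patch itself is not shown in this thread, so every name here (ExternalValuesProvider, ProviderChain, ArrayProvider, supports, valueFor) is hypothetical, and the classes are plain Java stand-ins rather than Lucene types. The sketch shows one possible rule, where the first provider in the list that claims a field supplies its values.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical provider abstraction; not taken from any Lucene or Solr API.
abstract class ExternalValuesProvider {
    // True if this provider can supply values for the given field name.
    abstract boolean supports(String field);

    // Value for one document; only meaningful when supports(field) is true.
    abstract long valueFor(String field, int docId);
}

// Simplest possible provider: values for a single field held in an array.
final class ArrayProvider extends ExternalValuesProvider {
    private final String field;
    private final long[] values;

    ArrayProvider(String field, long[] values) {
        this.field = field;
        this.values = values.clone();
    }

    @Override
    boolean supports(String f) {
        return field.equals(f);
    }

    @Override
    long valueFor(String f, int docId) {
        return values[docId];
    }
}

// Holds the provider list and picks the first provider that claims a field.
final class ProviderChain {
    private final List<ExternalValuesProvider> providers;

    ProviderChain(List<ExternalValuesProvider> providers) {
        this.providers = new ArrayList<>(providers);
    }

    // Returns the first matching provider, or null if no provider claims the field.
    ExternalValuesProvider resolve(String field) {
        for (ExternalValuesProvider p : providers) {
            if (p.supports(field)) {
                return p;
            }
        }
        return null;
    }
}
```

A wrapping reader could consult such a chain first and fall back to the index when resolve returns null.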

Alan Woodward
www.flax.co.uk


On 26 Mar 2013, at 10:02, Alan Woodward wrote:

> I've opened https://issues.apache.org/jira/browse/LUCENE-4883 as a start.
> 
> Alan Woodward
> www.flax.co.uk
> 
> 
> On 26 Mar 2013, at 00:51, Robert Muir wrote:
> 
>> I don't think the codec would be where you'd plug in for a filter reader that 
>> exposes external data as fake fields. That's because it's all about what 
>> encoding IndexWriter uses to write. I think Solr has an IndexReaderFactory 
>> if you want to e.g. wrap readers with FilterAtomicReaders.
>> 
>> On Mar 25, 2013 2:30 PM, "David Smiley (@MITRE.org)"  
>> wrote:
>> Interesting conversation. So if hypothetically Solr's FileFloatSource /
>> ExternalFileField didn't yet exist and we were instead talking about how to
>> implement such a thing on the latest 4.x code, then how basically might it
>> work?  I can see how to implement a Solr CodecFactory (a SchemaAware one),
>> then a DocValuesProducer.  The CodecFactory implements
>> NamedInitializedPlugin and can thus get its config info that way.  That's
>> one approach.  But it's not clear to me where one would wrap AtomicReader
>> with FilterAtomicReader to use that approach.
>> 
>> ~ David
>> 
>> 
>> Robert Muir wrote
>> > On Sat, Mar 23, 2013 at 7:25 AM, Alan Woodward wrote:
>> >>> I think instead FieldCache should actually be completely package
>> >>> private and hidden behind a UninvertingFilterReader and accessible via
>> >>> the existing AtomicReader docValues methods.
>> >>
>> >> Aha, right, because SegmentCoreReaders already caches XXXDocValues
>> >> instances (without using WeakReferences or anything like that).
>> >>
>> >> I should explain my motivation here.  I want to store various scoring
>> >> factors externally to Lucene, but make them available via a ValueSource
>> >> to CustomScoreQueries - essentially a generalisation of FileFloatSource
>> >> to any external data source.  FFS already has a bunch of code copied from
>> >> FieldCache, which was why my first thought was to open it up a bit and
>> >> extend it, rather than copy and paste again.
>> >>
>> >> But it sounds as though a nicer way of doing this would be to create a
>> >> new DocValuesProducer that talks to the external data source, and then
>> >> access it through the AR docValues methods.  Does that sound plausible?
>> >> Is SPI going to make it difficult to pass parameters to a custom
>> >> DVProducer (data location, host/port, other DV fields to use as primary
>> >> key lookups, etc)?
>> >>
>> >
>> > It's not involved if you implement via FilterAtomicReader. It's only
>> > involved for reading things that are actually written into the index.
>> >
>> -
>>  Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
>> --
>> View this message in context: 
>> http://lucene.472066.n3.nabble.com/Opening-up-FieldCacheImpl-tp4050537p4051217.html
>> Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
>> 
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>> 
> 



Re: Opening up FieldCacheImpl

2013-03-26 Thread Alan Woodward
I've opened https://issues.apache.org/jira/browse/LUCENE-4883 as a start.

Alan Woodward
www.flax.co.uk


On 26 Mar 2013, at 00:51, Robert Muir wrote:

> I don't think the codec would be where you'd plug in for a filter reader that 
> exposes external data as fake fields. That's because it's all about what 
> encoding IndexWriter uses to write. I think Solr has an IndexReaderFactory if 
> you want to e.g. wrap readers with FilterAtomicReaders.
> 
> On Mar 25, 2013 2:30 PM, "David Smiley (@MITRE.org)"  
> wrote:
> Interesting conversation. So if hypothetically Solr's FileFloatSource /
> ExternalFileField didn't yet exist and we were instead talking about how to
> implement such a thing on the latest 4.x code, then how basically might it
> work?  I can see how to implement a Solr CodecFactory (a SchemaAware one),
> then a DocValuesProducer.  The CodecFactory implements
> NamedInitializedPlugin and can thus get its config info that way.  That's
> one approach.  But it's not clear to me where one would wrap AtomicReader
> with FilterAtomicReader to use that approach.
> 
> ~ David
> 
> 
> Robert Muir wrote
> > On Sat, Mar 23, 2013 at 7:25 AM, Alan Woodward wrote:
> >>> I think instead FieldCache should actually be completely package
> >>> private and hidden behind a UninvertingFilterReader and accessible via
> >>> the existing AtomicReader docValues methods.
> >>
> >> Aha, right, because SegmentCoreReaders already caches XXXDocValues
> >> instances (without using WeakReferences or anything like that).
> >>
> >> I should explain my motivation here.  I want to store various scoring
> >> factors externally to Lucene, but make them available via a ValueSource
> >> to CustomScoreQueries - essentially a generalisation of FileFloatSource
> >> to any external data source.  FFS already has a bunch of code copied from
> >> FieldCache, which was why my first thought was to open it up a bit and
> >> extend it, rather than copy and paste again.
> >>
> >> But it sounds as though a nicer way of doing this would be to create a
> >> new DocValuesProducer that talks to the external data source, and then
> >> access it through the AR docValues methods.  Does that sound plausible?
> >> Is SPI going to make it difficult to pass parameters to a custom
> >> DVProducer (data location, host/port, other DV fields to use as primary
> >> key lookups, etc)?
> >>
> >
> > It's not involved if you implement via FilterAtomicReader. It's only
> > involved for reading things that are actually written into the index.
> >



Re: Opening up FieldCacheImpl

2013-03-25 Thread Robert Muir
I don't think the codec would be where you'd plug in for a filter reader that
exposes external data as fake fields. That's because it's all about what
encoding IndexWriter uses to write. I think Solr has an IndexReaderFactory
if you want to e.g. wrap readers with FilterAtomicReaders.
On Mar 25, 2013 2:30 PM, "David Smiley (@MITRE.org)" 
wrote:

> Interesting conversation. So if hypothetically Solr's FileFloatSource /
> ExternalFileField didn't yet exist and we were instead talking about how to
> implement such a thing on the latest 4.x code, then how basically might it
> work?  I can see how to implement a Solr CodecFactory (a SchemaAware one),
> then a DocValuesProducer.  The CodecFactory implements
> NamedInitializedPlugin and can thus get its config info that way.  That's
> one approach.  But it's not clear to me where one would wrap AtomicReader
> with FilterAtomicReader to use that approach.
>
> ~ David
>
>
> Robert Muir wrote
> > On Sat, Mar 23, 2013 at 7:25 AM, Alan Woodward wrote:
> >>> I think instead FieldCache should actually be completely package
> >>> private and hidden behind a UninvertingFilterReader and accessible via
> >>> the existing AtomicReader docValues methods.
> >>
> >> Aha, right, because SegmentCoreReaders already caches XXXDocValues
> >> instances (without using WeakReferences or anything like that).
> >>
> >> I should explain my motivation here.  I want to store various scoring
> >> factors externally to Lucene, but make them available via a ValueSource
> >> to CustomScoreQueries - essentially a generalisation of FileFloatSource
> >> to any external data source.  FFS already has a bunch of code copied from
> >> FieldCache, which was why my first thought was to open it up a bit and
> >> extend it, rather than copy and paste again.
> >>
> >> But it sounds as though a nicer way of doing this would be to create a
> >> new DocValuesProducer that talks to the external data source, and then
> >> access it through the AR docValues methods.  Does that sound plausible?
> >> Is SPI going to make it difficult to pass parameters to a custom
> >> DVProducer (data location, host/port, other DV fields to use as primary
> >> key lookups, etc)?
> >>
> >
> > It's not involved if you implement via FilterAtomicReader. It's only
> > involved for reading things that are actually written into the index.
> >


Re: Opening up FieldCacheImpl

2013-03-25 Thread David Smiley (@MITRE.org)
Interesting conversation. So if hypothetically Solr's FileFloatSource /
ExternalFileField didn't yet exist and we were instead talking about how to
implement such a thing on the latest 4.x code, then how basically might it
work?  I can see how to implement a Solr CodecFactory (a SchemaAware one),
then a DocValuesProducer.  The CodecFactory implements
NamedInitializedPlugin and can thus get its config info that way.  That's
one approach.  But it's not clear to me where one would wrap AtomicReader
with FilterAtomicReader to use that approach.

~ David


Robert Muir wrote
> On Sat, Mar 23, 2013 at 7:25 AM, Alan Woodward wrote:
>>> I think instead FieldCache should actually be completely package
>>> private and hidden behind a UninvertingFilterReader and accessible via
>>> the existing AtomicReader docValues methods.
>>
>> Aha, right, because SegmentCoreReaders already caches XXXDocValues
>> instances (without using WeakReferences or anything like that).
>>
>> I should explain my motivation here.  I want to store various scoring
>> factors externally to Lucene, but make them available via a ValueSource
>> to CustomScoreQueries - essentially a generalisation of FileFloatSource
>> to any external data source.  FFS already has a bunch of code copied from
>> FieldCache, which was why my first thought was to open it up a bit and
>> extend it, rather than copy and paste again.
>>
>> But it sounds as though a nicer way of doing this would be to create a
>> new DocValuesProducer that talks to the external data source, and then
>> access it through the AR docValues methods.  Does that sound plausible? 
>> Is SPI going to make it difficult to pass parameters to a custom
>> DVProducer (data location, host/port, other DV fields to use as primary
>> key lookups, etc)?
>>
> 
> It's not involved if you implement via FilterAtomicReader. It's only
> involved for reading things that are actually written into the index.
> 



Re: Opening up FieldCacheImpl

2013-03-23 Thread Robert Muir
On Sat, Mar 23, 2013 at 7:25 AM, Alan Woodward  wrote:
>> I think instead FieldCache should actually be completely package
>> private and hidden behind a UninvertingFilterReader and accessible via
>> the existing AtomicReader docValues methods.
>
> Aha, right, because SegmentCoreReaders already caches XXXDocValues instances 
> (without using WeakReferences or anything like that).
>
> I should explain my motivation here.  I want to store various scoring factors 
> externally to Lucene, but make them available via a ValueSource to 
> CustomScoreQueries - essentially a generalisation of FileFloatSource to any 
> external data source.  FFS already has a bunch of code copied from 
> FieldCache, which was why my first thought was to open it up a bit and extend 
> it, rather than copy and paste again.
>
> But it sounds as though a nicer way of doing this would be to create a new 
> DocValuesProducer that talks to the external data source, and then access it 
> through the AR docValues methods.  Does that sound plausible?  Is SPI going 
> to make it difficult to pass parameters to a custom DVProducer (data 
> location, host/port, other DV fields to use as primary key lookups, etc)?
>

It's not involved if you implement via FilterAtomicReader. It's only
involved for reading things that are actually written into the index.
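
A minimal sketch of the filter-reader approach, under loud assumptions: the interfaces below (NumericValues, SimpleReader) are simplified, hypothetical stand-ins written for this example, not Lucene's AtomicReader or NumericDocValues, and ExternalValuesReader is an invented name. The point it tries to show is that a wrapping reader is constructed by application code, so settings such as a data location or host/port can be passed as ordinary constructor arguments, with no SPI lookup involved.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a per-document numeric lookup.
interface NumericValues {
    long get(int docId);
}

// Hypothetical stand-in for a segment-level reader (not Lucene's AtomicReader).
interface SimpleReader {
    // Returns null when the field has no numeric values.
    NumericValues getNumericValues(String field);
}

// Wraps another reader and answers selected fields from external data
// instead of from the index, so they look like ordinary ("fake") fields.
final class ExternalValuesReader implements SimpleReader {
    private final SimpleReader in;
    private final Map<String, NumericValues> external;

    // Configuration is passed directly by the caller that builds the wrapper.
    ExternalValuesReader(SimpleReader in, Map<String, NumericValues> external) {
        this.in = in;
        this.external = new HashMap<>(external);
    }

    @Override
    public NumericValues getNumericValues(String field) {
        NumericValues fake = external.get(field);
        // Fall through to the wrapped reader for everything not supplied externally.
        return fake != null ? fake : in.getNumericValues(field);
    }
}
```

In this sketch, fields stored in the index still come from the wrapped reader; only the externally supplied names are overridden.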




Re: Opening up FieldCacheImpl

2013-03-23 Thread Alan Woodward
> I think instead FieldCache should actually be completely package
> private and hidden behind a UninvertingFilterReader and accessible via
> the existing AtomicReader docValues methods.

Aha, right, because SegmentCoreReaders already caches XXXDocValues instances 
(without using WeakReferences or anything like that).

I should explain my motivation here.  I want to store various scoring factors 
externally to Lucene, but make them available via a ValueSource to 
CustomScoreQueries - essentially a generalisation of FileFloatSource to any 
external data source.  FFS already has a bunch of code copied from FieldCache, 
which was why my first thought was to open it up a bit and extend it, rather 
than copy and paste again.

But it sounds as though a nicer way of doing this would be to create a new 
DocValuesProducer that talks to the external data source, and then access it 
through the AR docValues methods.  Does that sound plausible?  Is SPI going to 
make it difficult to pass parameters to a custom DVProducer (data location, 
host/port, other DV fields to use as primary key lookups, etc)?





Re: Opening up FieldCacheImpl

2013-03-22 Thread Robert Muir
On Fri, Mar 22, 2013 at 6:26 PM, Alan Woodward  wrote:
> Actually this would be really nice, wouldn't it.  Add a getFieldCache(String
> field) method to AtomicReader.  You'd have to be able to determine what to
> return depending on the field though - uninverted field, or docvalues, or
> another cached source.

But the cache isn't even on the reader, it's on the SegmentCoreReaders.

>
> FieldCache and DocValues seem like they ought to have a common API, really.

They already do.

I think instead FieldCache should actually be completely package
private and hidden behind a UninvertingFilterReader and accessible via
the existing AtomicReader docValues methods.

Uninverting is a really crazy solution vs. indexing fields the way
they will be used.




Re: Opening up FieldCacheImpl

2013-03-22 Thread Alan Woodward
Actually this would be really nice, wouldn't it.  Add a getFieldCache(String 
field) method to AtomicReader.  You'd have to be able to determine what to 
return depending on the field though - uninverted field, or docvalues, or 
another cached source.  

FieldCache and DocValues seem like they ought to have a common API, really.  
And ValueSource in the function queries package as well.  But that's another 
issue...

Alan Woodward
www.flax.co.uk


On 22 Mar 2013, at 20:48, Yonik Seeley wrote:

> The ability to cache stuff w/o resorting to weak references would be even 
> nicer!
> Caches right on the segment readers?
> 
> -Yonik
> http://lucidworks.com
> 
> 



Re: Opening up FieldCacheImpl

2013-03-22 Thread Robert Muir
Note that what FieldCache does is not special: it just has a map and
calls the public SegmentReader.addCoreClosedListener method so that it
gets notifications when something is no longer needed.

I'm not sure we should make FieldCacheImpl public if that's the real
logic you want to reuse.
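
The "map plus close notification" pattern could be sketched roughly as follows. This is not FieldCacheImpl's code: FakeSegmentCore, ClosedListener, and PerSegmentCache are hypothetical names made up for this example, and FakeSegmentCore merely imitates the role that SegmentReader.addCoreClosedListener plays in Lucene. The sketch uses Java 8 syntax for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical stand-in for a segment core that can notify listeners on close.
final class FakeSegmentCore {
    interface ClosedListener {
        void onClose(FakeSegmentCore core);
    }

    private final List<ClosedListener> listeners = new ArrayList<>();

    void addClosedListener(ClosedListener listener) {
        listeners.add(listener);
    }

    void close() {
        for (ClosedListener listener : listeners) {
            listener.onClose(this);
        }
    }
}

// A plain map keyed by segment core. Entries are evicted by the close
// notification, so no weak references are needed.
final class PerSegmentCache<V> {
    private final Map<FakeSegmentCore, V> entries = new ConcurrentHashMap<>();
    private final Function<FakeSegmentCore, V> loader;

    PerSegmentCache(Function<FakeSegmentCore, V> loader) {
        this.loader = loader;
    }

    V get(FakeSegmentCore core) {
        return entries.computeIfAbsent(core, c -> {
            // Register for eviction once, at the time the value is first loaded.
            c.addClosedListener(closed -> entries.remove(closed));
            return loader.apply(c);
        });
    }

    int size() {
        return entries.size();
    }
}
```

Code that wants to cache external data per segment could follow the same shape without FieldCacheImpl being made public.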

On Fri, Mar 22, 2013 at 1:36 PM, Alan Woodward  wrote:
> I'm looking at exposing data held externally to an index via a ValueSource,
> and it would be nice to reuse the machinery in FieldCacheImpl to cache the
> data per-segment.  However, it's package-private at the moment, which means
> I can't extend it nicely.  Is there a reason for this?  Or should I put up a
> JIRA to make it public?
>
> Alan Woodward
> www.flax.co.uk
>
>




Re: Opening up FieldCacheImpl

2013-03-22 Thread Yonik Seeley
The ability to cache stuff w/o resorting to weak references would be even nicer!
Caches right on the segment readers?

-Yonik
http://lucidworks.com




Re: Opening up FieldCacheImpl

2013-03-22 Thread David Smiley (@MITRE.org)
That would be nice!  There is similar machinery in Solr's ExternalFileField. 
In the spatial module I'd like to cache data per-segment; its current cache
sucks, to say the least.  My current plans are to use BinaryDocValues so I
might not use this proposed machinery after all, but nonetheless I think it's
useful.

~ David


Alan Woodward-2 wrote
> I'm looking at exposing data held externally to an index via a
> ValueSource, and it would be nice to reuse the machinery in FieldCacheImpl
> to cache the data per-segment.  However, it's package-private at the
> moment, which means I can't extend it nicely.  Is there a reason for this? 
> Or should I put up a JIRA to make it public?
> 
> Alan Woodward
> www.flax.co.uk




