Re: Opening up FieldCacheImpl

2013-03-26 Thread Alan Woodward
I've opened https://issues.apache.org/jira/browse/LUCENE-4883 as a start.

Alan Woodward
www.flax.co.uk


On 26 Mar 2013, at 00:51, Robert Muir wrote:

 I don't think the codec would be where you'd plug in for a filter reader that
 exposes external data as fake fields. That's because it's all about what
 encoding IndexWriter uses to write. I think Solr has an IndexReaderFactory if
 you want to e.g. wrap readers with FilterAtomicReaders.
 



Re: Opening up FieldCacheImpl

2013-03-26 Thread Alan Woodward
Separately from this, I'm playing with an ExternalDocValuesFilterReader that
takes a list of abstract ExternalDocValuesProviders, as a kind of
generalisation of FileFloatSource.  It's a bit rough at the moment, and it's
for a Lucene application rather than for Solr, but it could work as a
replacement for ExternalFileField with appropriate factories.  I'll open a JIRA
and put up a patch once it does anything useful.
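
A minimal sketch of the idea, for illustration only. The patch itself is not shown in this thread, so everything except the two class names is an assumption: NumericValues and SegmentReaderLike are stand-ins rather than Lucene types, and the field() and load() methods on the provider are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in for a per-document numeric lookup; not a Lucene type.
interface NumericValues {
    long get(int docId);
}

// Stand-in for the small part of a segment reader this sketch needs.
interface SegmentReaderLike {
    NumericValues getNumericDocValues(String field);
}

// Hypothetical provider shape: names one fake field and loads its values.
abstract class ExternalDocValuesProvider {
    abstract String field();
    abstract NumericValues load(SegmentReaderLike segment);
}

// Wraps a reader, answers provider fields itself, delegates everything else.
final class ExternalDocValuesFilterReader implements SegmentReaderLike {
    private final SegmentReaderLike in;
    private final Map<String, ExternalDocValuesProvider> providers = new HashMap<>();
    private final Map<String, NumericValues> loaded = new HashMap<>();

    ExternalDocValuesFilterReader(SegmentReaderLike in, List<ExternalDocValuesProvider> list) {
        this.in = in;
        for (ExternalDocValuesProvider provider : list) {
            providers.put(provider.field(), provider);
        }
    }

    @Override
    public synchronized NumericValues getNumericDocValues(String field) {
        ExternalDocValuesProvider provider = providers.get(field);
        if (provider == null) {
            return in.getNumericDocValues(field); // a real index field
        }
        NumericValues values = loaded.get(field);
        if (values == null) {
            values = provider.load(in); // loaded once per wrapped segment
            loaded.put(field, values);
        }
        return values;
    }
}
```

Because the caller constructs the wrapper directly, a provider's settings (data location, host/port and so on) can be ordinary constructor arguments, which fits the earlier remark that SPI is not involved on this route.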

Alan Woodward
www.flax.co.uk


On 26 Mar 2013, at 10:02, Alan Woodward wrote:

 I've opened https://issues.apache.org/jira/browse/LUCENE-4883 as a start.
 
 Alan Woodward
 www.flax.co.uk
 
 



Re: Opening up FieldCacheImpl

2013-03-25 Thread David Smiley (@MITRE.org)
Interesting conversation. So if, hypothetically, Solr's FileFloatSource /
ExternalFileField didn't yet exist and we were instead talking about how to
implement such a thing on the latest 4.x code, then how, basically, might it
work?  I can see how to implement a Solr CodecFactory (a SchemaAware one),
then a DocValuesProducer.  The CodecFactory implements
NamedListInitializedPlugin and can thus get its config info that way.  That's
one approach.  But it's not clear to me where one would wrap AtomicReader
with FilterAtomicReader to use that approach.

~ David


Robert Muir wrote
 On Sat, Mar 23, 2013 at 7:25 AM, Alan Woodward <alan@.co> wrote:
 I think instead FieldCache should actually be completely package
 private and hidden behind an UninvertingFilterReader and accessible via
 the existing AtomicReader docValues methods.

 Aha, right, because SegmentCoreReaders already caches XXXDocValues
 instances (without using WeakReferences or anything like that).

 I should explain my motivation here.  I want to store various scoring
 factors externally to Lucene, but make them available via a ValueSource
 to CustomScoreQueries - essentially a generalisation of FileFloatSource
 to any external data source.  FFS already has a bunch of code copied from
 FieldCache, which was why my first thought was to open it up a bit and
 extend it, rather than copy and paste again.

 But it sounds as though a nicer way of doing this would be to create a
 new DocValuesProducer that talks to the external data source, and then
 access it through the AR docValues methods.  Does that sound plausible? 
 Is SPI going to make it difficult to pass parameters to a custom
 DVProducer (data location, host/port, other DV fields to use as primary
 key lookups, etc)?

 
 It's not involved if you implement via FilterAtomicReader. It's only
 involved for reading things that are actually written into the index.
 

-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Opening-up-FieldCacheImpl-tp4050537p4051217.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Opening up FieldCacheImpl

2013-03-25 Thread Robert Muir
I don't think the codec would be where you'd plug in for a filter reader that
exposes external data as fake fields. That's because it's all about what
encoding IndexWriter uses to write. I think Solr has an IndexReaderFactory
if you want to e.g. wrap readers with FilterAtomicReaders.

On Mar 25, 2013 2:30 PM, David Smiley (@MITRE.org) dsmi...@mitre.org wrote:

 Interesting conversation. So if, hypothetically, Solr's FileFloatSource /
 ExternalFileField didn't yet exist and we were instead talking about how to
 implement such a thing on the latest 4.x code, then how, basically, might it
 work?  I can see how to implement a Solr CodecFactory (a SchemaAware one),
 then a DocValuesProducer.  The CodecFactory implements
 NamedListInitializedPlugin and can thus get its config info that way.  That's
 one approach.  But it's not clear to me where one would wrap AtomicReader
 with FilterAtomicReader to use that approach.

 ~ David





Re: Opening up FieldCacheImpl

2013-03-23 Thread Alan Woodward
 I think instead FieldCache should actually be completely package
 private and hidden behind an UninvertingFilterReader and accessible via
 the existing AtomicReader docValues methods.

Aha, right, because SegmentCoreReaders already caches XXXDocValues instances 
(without using WeakReferences or anything like that).

I should explain my motivation here.  I want to store various scoring factors 
externally to Lucene, but make them available via a ValueSource to 
CustomScoreQueries - essentially a generalisation of FileFloatSource to any 
external data source.  FFS already has a bunch of code copied from FieldCache, 
which was why my first thought was to open it up a bit and extend it, rather 
than copy and paste again.

But it sounds as though a nicer way of doing this would be to create a new 
DocValuesProducer that talks to the external data source, and then access it 
through the AR docValues methods.  Does that sound plausible?  Is SPI going to 
make it difficult to pass parameters to a custom DVProducer (data location, 
host/port, other DV fields to use as primary key lookups, etc)?





Re: Opening up FieldCacheImpl

2013-03-23 Thread Robert Muir
On Sat, Mar 23, 2013 at 7:25 AM, Alan Woodward a...@flax.co.uk wrote:
 I think instead FieldCache should actually be completely package
 private and hidden behind an UninvertingFilterReader and accessible via
 the existing AtomicReader docValues methods.

 Aha, right, because SegmentCoreReaders already caches XXXDocValues instances 
 (without using WeakReferences or anything like that).

 I should explain my motivation here.  I want to store various scoring factors 
 externally to Lucene, but make them available via a ValueSource to 
 CustomScoreQueries - essentially a generalisation of FileFloatSource to any 
 external data source.  FFS already has a bunch of code copied from 
 FieldCache, which was why my first thought was to open it up a bit and extend 
 it, rather than copy and paste again.

 But it sounds as though a nicer way of doing this would be to create a new 
 DocValuesProducer that talks to the external data source, and then access it 
 through the AR docValues methods.  Does that sound plausible?  Is SPI going 
 to make it difficult to pass parameters to a custom DVProducer (data 
 location, host/port, other DV fields to use as primary key lookups, etc)?


It's not involved if you implement via FilterAtomicReader. It's only
involved for reading things that are actually written into the index.




Re: Opening up FieldCacheImpl

2013-03-22 Thread David Smiley (@MITRE.org)
That would be nice!  There is similar machinery in Solr's ExternalFileField.
In the spatial module I'd like to cache data per-segment; its current cache
sucks, to say the least.  My current plans are to use BinaryDocValues, so I
might not use this proposed machinery after all, but nonetheless I think it's
useful.

~ David


Alan Woodward-2 wrote
 I'm looking at exposing data held externally to an index via a
 ValueSource, and it would be nice to reuse the machinery in FieldCacheImpl
 to cache the data per-segment.  However, it's package-private at the
 moment, which means I can't extend it nicely.  Is there a reason for this? 
 Or should I put up a JIRA to make it public?
 
 Alan Woodward
 www.flax.co.uk





-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Opening-up-FieldCacheImpl-tp4050537p4050579.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.




Re: Opening up FieldCacheImpl

2013-03-22 Thread Yonik Seeley
The ability to cache stuff w/o resorting to weak references would be even nicer!
Caches right on the segment readers?

-Yonik
http://lucidworks.com




Re: Opening up FieldCacheImpl

2013-03-22 Thread Robert Muir
Note that what FieldCache does is not special: it just has a map and
calls the public SegmentReader.addCoreClosedListener method so that it
gets notifications when something is no longer needed.

I'm not sure we should make FieldCacheImpl public if that's the real
logic you want to reuse.
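
As a rough illustration of the "map plus close listener" pattern described above, here is a self-contained sketch. It does not use Lucene classes: SegmentCore and its ClosedListener are hypothetical stand-ins for a segment core and for the notification that SegmentReader.addCoreClosedListener provides, and PerSegmentCache is an invented name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Stand-in for a segment's shared core; not a Lucene class.
final class SegmentCore {
    interface ClosedListener {
        void onClose(Object coreKey);
    }

    private final Object coreKey = new Object();
    private final List<ClosedListener> listeners = new ArrayList<>();

    Object getCoreKey() {
        return coreKey;
    }

    void addClosedListener(ClosedListener listener) {
        listeners.add(listener);
    }

    // Notifies every listener that this core's data is no longer needed.
    void close() {
        for (ClosedListener listener : listeners) {
            listener.onClose(coreKey);
        }
    }
}

// Holds strong references and evicts only when the core reports it closed.
final class PerSegmentCache<V> {
    private final Map<Object, V> entries = new HashMap<>();

    synchronized V get(SegmentCore core, Function<SegmentCore, V> loader) {
        Object key = core.getCoreKey();
        V value = entries.get(key);
        if (value == null) {
            value = loader.apply(core);          // load once per segment core
            entries.put(key, value);
            core.addClosedListener(this::evict); // register eviction callback
        }
        return value;
    }

    private synchronized void evict(Object coreKey) {
        entries.remove(coreKey);
    }

    synchronized int size() {
        return entries.size();
    }
}
```

Nothing here depends on weak references: an entry lives exactly as long as its core stays open, and the listener list in the stand-in core is not thread-safe, so treat this as a single-threaded sketch.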

On Fri, Mar 22, 2013 at 1:36 PM, Alan Woodward a...@flax.co.uk wrote:
 I'm looking at exposing data held externally to an index via a ValueSource,
 and it would be nice to reuse the machinery in FieldCacheImpl to cache the
 data per-segment.  However, it's package-private at the moment, which means
 I can't extend it nicely.  Is there a reason for this?  Or should I put up a
 JIRA to make it public?

 Alan Woodward
 www.flax.co.uk






Re: Opening up FieldCacheImpl

2013-03-22 Thread Alan Woodward
Actually this would be really nice, wouldn't it?  Add a getFieldCache(String
field) method to AtomicReader.  You'd have to be able to determine what to
return depending on the field though - uninverted field, or DocValues, or
another cached source.

FieldCache and DocValues seem like they ought to have a common API, really.  
And ValueSource in the function queries package as well.  But that's another 
issue...

Alan Woodward
www.flax.co.uk


On 22 Mar 2013, at 20:48, Yonik Seeley wrote:

 The ability to cache stuff w/o resorting to weak references would be even 
 nicer!
 Caches right on the segment readers?
 
 -Yonik
 http://lucidworks.com
 
 



Re: Opening up FieldCacheImpl

2013-03-22 Thread Robert Muir
On Fri, Mar 22, 2013 at 6:26 PM, Alan Woodward a...@flax.co.uk wrote:
 Actually this would be really nice, wouldn't it.  Add a getFieldCache(String
 field) method to AtomicReader.  You'd have to be able to determine what to
 return depending on the field though - uninverted field, or docvalues, or
 another cached source.

But the cache isn't even on the reader; it's on the SegmentCoreReaders.


 FieldCache and DocValues seem like they ought to have a common API, really.

They already do.

I think instead FieldCache should actually be completely package
private and hidden behind an UninvertingFilterReader and accessible via
the existing AtomicReader docValues methods.

Uninverting is a really crazy solution vs. indexing fields the way
they will be used.
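
For readers unfamiliar with the term, a toy model of what "uninverting" means. This is not FieldCacheImpl's code: the postings map, the UninvertSketch class and the single-valued-field assumption are all made up for illustration.

```java
import java.util.Map;

final class UninvertSketch {
    // Rebuilds a doc -> term array by walking every term's posting list.
    static String[] uninvert(Map<String, int[]> postings, int maxDoc) {
        String[] perDoc = new String[maxDoc];
        for (Map.Entry<String, int[]> entry : postings.entrySet()) {
            for (int docId : entry.getValue()) {
                perDoc[docId] = entry.getKey(); // single-valued field assumed
            }
        }
        return perDoc; // documents without the field stay null
    }
}
```

The whole term dictionary and every posting list has to be visited at search time to build the array, whereas a field indexed as doc values is already stored per document.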
