Re: Understanding fieldCache SUBREADER insanity

2012-10-02 Thread Aaron Daubman
Hi Yonik,

I've been attempting to fix the SUBREADER insanity in our custom
component, and have made perhaps some progress (or is this worse?) -
I've gone from SUBREADER to VALUEMISMATCH insanity:
---snip---
entries_count : 12
entry#0 : 
'MMapIndexInput(path=/io01/p/solr/playlist/c/playlist/index/_c2.frq)'='f_normalizedTotalHotttnesss',class
org.apache.lucene.search.FieldCacheImpl$DocsWithFieldCache,null=org.apache.lucene.util.FixedBitSet#1387502754
entry#1 : 
'MMapIndexInput(path=/io01/p/solr/playlist/c/playlist/index/_c2.frq)'='i_track_count',class
org.apache.lucene.search.FieldCacheImpl$DocsWithFieldCache,null=org.apache.lucene.util.Bits$MatchAllBits#233863705
entry#2 : 
'MMapIndexInput(path=/io01/p/solr/playlist/c/playlist/index/_c2.frq)'='s_artistID',class
org.apache.lucene.search.FieldCache$StringIndex,null=org.apache.lucene.search.FieldCache$StringIndex#652215925
entry#3 : 
'MMapIndexInput(path=/io01/p/solr/playlist/c/playlist/index/_c2.frq)'='s_artistID',class
java.lang.String,null=[Ljava.lang.String;#1036517187
entry#4 : 
'MMapIndexInput(path=/io01/p/solr/playlist/c/playlist/index/_c2.frq)'='thingID',class
java.lang.String,null=[Ljava.lang.String;#357017445
entry#5 : 
'MMapIndexInput(path=/io01/p/solr/playlist/c/playlist/index/_c2.frq)'='f_normalizedTotalHotttnesss',float,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_FLOAT_PARSER=[F#322888397
entry#6 : 
'MMapIndexInput(path=/io01/p/solr/playlist/c/playlist/index/_c2.frq)'='f_normalizedTotalHotttnesss',float,org.apache.lucene.search.FieldCache.DEFAULT_FLOAT_PARSER=org.apache.lucene.search.FieldCache$CreationPlaceholder#1229311421
entry#7 : 
'MMapIndexInput(path=/io01/p/solr/playlist/c/playlist/index/_c2.frq)'='f_normalizedTotalHotttnesss',float,null=[F#322888397
entry#8 : 
'MMapIndexInput(path=/io01/p/solr/playlist/c/playlist/index/_c2.frq)'='i_collapse',int,org.apache.lucene.search.FieldCache.DEFAULT_INT_PARSER=org.apache.lucene.search.FieldCache$CreationPlaceholder#92920526
entry#9 : 
'MMapIndexInput(path=/io01/p/solr/playlist/c/playlist/index/_c2.frq)'='i_collapse',int,null=[I#494669113
entry#10 : 
'MMapIndexInput(path=/io01/p/solr/playlist/c/playlist/index/_c2.frq)'='i_collapse',int,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_INT_PARSER=[I#494669113
entry#11 : 
'MMapIndexInput(path=/io01/p/solr/playlist/c/playlist/index/_c2.frq)'='i_track_count',int,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_INT_PARSER=[I#994584654
insanity_count : 1
insanity#0 : VALUEMISMATCH: Multiple distinct value objects for
MMapIndexInput(path=/io01/p/solr/playlist/c/playlist/index/_c2.frq)+s_artistID
'MMapIndexInput(path=/io01/p/solr/playlist/c/playlist/index/_c2.frq)'='s_artistID',class
org.apache.lucene.search.FieldCache$StringIndex,null=org.apache.lucene.search.FieldCache$StringIndex#652215925
'MMapIndexInput(path=/io01/p/solr/playlist/c/playlist/index/_c2.frq)'='s_artistID',class
java.lang.String,null=[Ljava.lang.String;#1036517187
---snip---

Any suggestions on what causes this VALUEMISMATCH, whether it is the
normal case, or how to fix it?

For anybody else with SUBREADER insanity issues, this is the change I
made to get this far (get the first leafReader, since we are using a
merged/optimized index):
---snip---
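// Use the first (and, for an optimized index, only) leaf reader so that
// cache entries are keyed on the segment rather than the top-level reader.
SolrIndexReader reader = searcher.getReader().getLeafReaders()[0];
collapseIDs = FieldCache.DEFAULT.getInts(reader, COLLAPSE_KEY_NAME);
hotnessValues = FieldCache.DEFAULT.getFloats(reader, HOTNESS_KEY_NAME);
artistIDs = FieldCache.DEFAULT.getStrings(reader, ARTIST_KEY_NAME);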
---snip---
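
The VALUEMISMATCH report lists two different value objects for the same reader and field: a StringIndex and a String[] for s_artistID. By contrast, the two float entries that share [F#322888397 are not flagged. The sketch below is a toy model of that rule, not Lucene's FieldCacheSanityChecker: the class, method, and key names are hypothetical, and the idea that the StringIndex comes from some other consumer (for example a sort on s_artistID) is an assumption, not a diagnosis.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: these classes are hypothetical and are not Lucene's FieldCache API.
class ValueMismatchSketch {
    // One cache entry: the reader and field it belongs to, how it was requested, and the cached object.
    static final class Entry {
        final String readerKey;
        final String field;
        final String cacheType;
        final Object value;

        Entry(String readerKey, String field, String cacheType, Object value) {
            this.readerKey = readerKey;
            this.field = field;
            this.cacheType = cacheType;
            this.value = value;
        }
    }

    // Returns the "reader+field" keys that hold more than one distinct value object.
    static List<String> findValueMismatches(List<Entry> entries) {
        Map<String, Set<Object>> valuesByReaderField = new LinkedHashMap<String, Set<Object>>();
        for (Entry e : entries) {
            String key = e.readerKey + "+" + e.field;
            Set<Object> values = valuesByReaderField.get(key);
            if (values == null) {
                // Identity-based set: two entries match only if they hold the very same object.
                values = Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
                valuesByReaderField.put(key, values);
            }
            values.add(e.value);
        }
        List<String> mismatches = new ArrayList<String>();
        for (Map.Entry<String, Set<Object>> kv : valuesByReaderField.entrySet()) {
            if (kv.getValue().size() > 1) {
                mismatches.add(kv.getKey());
            }
        }
        return mismatches;
    }

    static List<String> demo() {
        float[] hotness = new float[4];    // one array cached under two parser keys
        Object stringIndex = new Object(); // stands in for a StringIndex
        String[] strings = new String[4];  // stands in for the result of getStrings
        List<Entry> entries = Arrays.asList(
            new Entry("_c2", "f_normalizedTotalHotttnesss", "float,NUMERIC_UTILS_FLOAT_PARSER", hotness),
            new Entry("_c2", "f_normalizedTotalHotttnesss", "float,null", hotness),
            new Entry("_c2", "s_artistID", "StringIndex", stringIndex),
            new Entry("_c2", "s_artistID", "String[]", strings));
        return findValueMismatches(entries);
    }
}
```

In this model only s_artistID is reported, because it is the only reader+field pair backed by two separate objects. If that reading is right, having every consumer of the field ask for the same representation would leave a single value object per reader+field.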

Thanks,
 Aaron

On Wed, Sep 19, 2012 at 4:54 PM, Yonik Seeley yo...@lucidworks.com wrote:
>> already-optimized, single-segment index
>
> That part is interesting... if true, then the type of insanity you
> saw should be impossible, and either the insanity detection or
> something else is broken.
>
> -Yonik
> http://lucidworks.com


Re: Understanding fieldCache SUBREADER insanity

2012-09-21 Thread Aaron Daubman
Yonik, et al.

I believe I found the section of code pushing me into 'insanity' status:
---snip---
int[] collapseIDs = null;
float[] hotnessValues = null;
String[] artistIDs = null;
try {
    collapseIDs = FieldCache.DEFAULT.getInts(searcher.getIndexReader(), COLLAPSE_KEY_NAME);
    hotnessValues = FieldCache.DEFAULT.getFloats(searcher.getIndexReader(), HOTNESS_KEY_NAME);
    artistIDs = FieldCache.DEFAULT.getStrings(searcher.getIndexReader(), ARTIST_KEY_NAME);
} ...
---snip---

Since it seems like this code is using the 'old-style' pre-Lucene 2.9
top-level indexReaders, is there any example code you can point me to
that could show how to convert to using the leaf level segmentReaders?
If the limited information I've been able to find is correct, this
could explain some of the significant memory usage I am seeing...

Thanks again,
 Aaron
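
A minimal sketch of the per-segment pattern asked about above. It assumes the component keeps one value array per segment plus that segment's starting document number (docBase) and resolves a global docID against them. Segment and the helper names are hypothetical stand-ins, not Lucene classes. In Lucene 3.x the sub-readers would presumably come from IndexReader.getSequentialSubReaders(), with FieldCache.DEFAULT.getInts called on each one, but that wiring is not shown and has not been checked against this component.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: Segment and its fields are hypothetical stand-ins, not Lucene classes.
class PerSegmentLookupSketch {
    static final class Segment {
        final int docBase;       // global docID of this segment's first document
        final int[] collapseIds; // per-segment values, indexed by segment-local docID

        Segment(int docBase, int[] collapseIds) {
            this.docBase = docBase;
            this.collapseIds = collapseIds;
        }
    }

    // Assigns each segment its docBase by accumulating the sizes of the segments before it.
    static List<Segment> buildSegments(int[][] perSegmentValues) {
        List<Segment> segments = new ArrayList<Segment>();
        int docBase = 0;
        for (int[] values : perSegmentValues) {
            segments.add(new Segment(docBase, values));
            docBase += values.length;
        }
        return segments;
    }

    // Resolves a global docID to the owning segment and reads the segment-local value.
    static int collapseIdFor(List<Segment> segments, int globalDoc) {
        for (Segment s : segments) {
            int local = globalDoc - s.docBase;
            if (local >= 0 && local < s.collapseIds.length) {
                return s.collapseIds[local];
            }
        }
        throw new IndexOutOfBoundsException("no segment holds doc " + globalDoc);
    }
}
```

With a single-segment index the list has one element whose docBase is 0, so global and local docIDs coincide. The same lookup keeps working if the index later gains more segments.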


Understanding fieldCache SUBREADER insanity

2012-09-19 Thread Aaron Daubman
Hi all,

In reviewing a solr instance with somewhat variable performance, I
noticed that its fieldCache stats show an insanity_count of 1 with the
insanity type SUBREADER:

---snip---
insanity_count : 1
insanity#0 : SUBREADER: Found caches for descendants of
ReadOnlyDirectoryReader(segments_k
_6h9(3.3):C17198463)+tf_normalizedTotalHotttnesss
'ReadOnlyDirectoryReader(segments_k
_6h9(3.3):C17198463)'='tf_normalizedTotalHotttnesss',float,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_FLOAT_PARSER=[F#1965982057
'ReadOnlyDirectoryReader(segments_k
_6h9(3.3):C17198463)'='tf_normalizedTotalHotttnesss',float,null=[F#1965982057
'MMapIndexInput(path=/io01/p/solr/playlist/a/playlist/index/_6h9.frq)'='tf_normalizedTotalHotttnesss',float,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_FLOAT_PARSER=[F#1308116426
---snip---

How can I decipher what this means and what, if anything, I should do
to fix/improve the insanity?

Thanks,
 Aaron
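
One way to read the report: the value after '#' appears to be an identity hash, so [F#1965982057 (keyed on the top-level reader) and [F#1308116426 (keyed on the segment) would be two separate float arrays for the same field. The arithmetic below estimates what that costs. It assumes that C17198463 in the reader name is the segment's document count and that each cached float takes 4 bytes, and it ignores array headers. The class and method names are hypothetical.

```java
// Back-of-the-envelope estimate; reading "C17198463" as a document count is an assumption.
class FieldCacheFootprintSketch {
    // Bytes needed for one float[] holding a value per document (array header ignored).
    static long floatArrayBytes(long docCount) {
        return docCount * 4L; // a Java float is 4 bytes
    }

    public static void main(String[] args) {
        long docCount = 17198463L;   // taken from the reader name in the report
        int distinctArrays = 2;      // the two different '#' values reported for the field
        long perCopy = floatArrayBytes(docCount);
        long total = perCopy * distinctArrays;
        System.out.println("bytes per copy: " + perCopy);
        System.out.println("bytes for " + distinctArrays + " copies: " + total);
    }
}
```

Under those assumptions each copy is roughly 69 MB, so the duplicate entry costs about that much extra heap for this one field.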


Re: Understanding fieldCache SUBREADER insanity

2012-09-19 Thread Tomás Fernández Löbbe
Hi Aaron, here is some information about the insanity count:
http://wiki.apache.org/solr/SolrCaching#The_Lucene_FieldCache

As for the SUBREADER type, the javadocs say:
Indicates an overlap in cache usage on a given field in sub/super readers.

This probably means that you are using the same field
(tf_normalizedTotalHotttnesss) for both faceting and sorting. Sorting
uses the segment-level cache, while faceting by default uses the global
(top-level) field cache. This can be a problem because the field is then
cached twice and uses twice the memory.

One way to solve this would be to change the faceting method on that field
to 'fcs', which uses the segment-level cache (but may be a little slower).

Tomás
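
The duplication described above can be illustrated with a toy cache keyed on reader plus field. This is a sketch under stated assumptions, not Lucene's FieldCache: DuplicateCacheSketch, its methods, and the reader key strings are hypothetical. It only shows why asking for the same field through a top-level reader and through a segment reader leaves two arrays in memory.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: a hypothetical cache keyed on "reader+field", not Lucene's FieldCache.
class DuplicateCacheSketch {
    private final Map<String, float[]> cache = new HashMap<String, float[]>();

    // Returns the cached array for (readerKey, field), allocating it on first use.
    float[] getFloats(String readerKey, String field, int maxDoc) {
        String key = readerKey + "+" + field;
        float[] values = cache.get(key);
        if (values == null) {
            values = new float[maxDoc];
            cache.put(key, values);
        }
        return values;
    }

    // Total number of float values currently held across all cache entries.
    long cachedValues() {
        long total = 0;
        for (float[] values : cache.values()) {
            total += values.length;
        }
        return total;
    }

    public static void main(String[] args) {
        DuplicateCacheSketch cache = new DuplicateCacheSketch();
        cache.getFloats("segment", "tf_normalizedTotalHotttnesss", 1000);
        cache.getFloats("top-level", "tf_normalizedTotalHotttnesss", 1000);
        System.out.println("cached values: " + cache.cachedValues());
    }
}
```

Repeated requests through the same reader reuse one array, while a request through a different reader for the same field allocates a second one. That is the reason for making every consumer use the same level of reader.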


Re: Understanding fieldCache SUBREADER insanity

2012-09-19 Thread Aaron Daubman
Hi Tomás,

> This probably means that you are using the same field for faceting and for
> sorting (tf_normalizedTotalHotttnesss), sorting uses the segment level
> cache and faceting uses by default the global field cache. This can be a
> problem because the field is duplicated in cache, and then it uses twice
> the memory.
>
> One way to solve this would be to change the faceting method on that field
> to 'fcs', which uses segment level cache (but may be a little bit slower).

Thanks for explaining what the sparse wiki and javadoc mean - I had
read them but had no idea what the implications were ;-)

We are not doing any explicit faceting, and this index is also
supposed to be a read-only, already-optimized, single-segment index.
To me (and I know very little about this), both facts suggest that
something else is wrong: what am I doing to cause this, given that I
should not need segment-level anything (there should be a single
segment, if I understand optimization and read-only indices
correctly) and I am not using faceting?

Any pointers on where else to look for what might be causing this (one
issue I am currently troubleshooting is too-many-pauses caused by
too-frequent GC, so preventing this double-allocation could help)?

Thanks again,
 Aaron


Re: Understanding fieldCache SUBREADER insanity

2012-09-19 Thread Yonik Seeley
The other thing to realize is that it's only insanity if it's
unexpected or not-by-design (so the term is rather mis-named).
It's more for core developers - if you are just using Solr without
custom plugins, don't worry about it.

-Yonik
http://lucidworks.com


Re: Understanding fieldCache SUBREADER insanity

2012-09-19 Thread Tomás Fernández Löbbe
Some function queries also use the field cache. I *think* those usually use
the segment level cache, but I'm not sure.


Re: Understanding fieldCache SUBREADER insanity

2012-09-19 Thread Yonik Seeley
> already-optimized, single-segment index

That part is interesting... if true, then the type of insanity you
saw should be impossible, and either the insanity detection or
something else is broken.

-Yonik
http://lucidworks.com