Re: Understanding fieldCache SUBREADER insanity
Hi Yonik,

I've been attempting to fix the SUBREADER insanity in our custom component, and have perhaps made some progress (or is this worse?) - I've gone from SUBREADER to VALUEMISMATCH insanity:

---snip---
entries_count : 12
entry#0 : 'MMapIndexInput(path=/io01/p/solr/playlist/c/playlist/index/_c2.frq)'='f_normalizedTotalHotttnesss',class org.apache.lucene.search.FieldCacheImpl$DocsWithFieldCache,null=org.apache.lucene.util.FixedBitSet#1387502754
entry#1 : 'MMapIndexInput(path=/io01/p/solr/playlist/c/playlist/index/_c2.frq)'='i_track_count',class org.apache.lucene.search.FieldCacheImpl$DocsWithFieldCache,null=org.apache.lucene.util.Bits$MatchAllBits#233863705
entry#2 : 'MMapIndexInput(path=/io01/p/solr/playlist/c/playlist/index/_c2.frq)'='s_artistID',class org.apache.lucene.search.FieldCache$StringIndex,null=org.apache.lucene.search.FieldCache$StringIndex#652215925
entry#3 : 'MMapIndexInput(path=/io01/p/solr/playlist/c/playlist/index/_c2.frq)'='s_artistID',class java.lang.String,null=[Ljava.lang.String;#1036517187
entry#4 : 'MMapIndexInput(path=/io01/p/solr/playlist/c/playlist/index/_c2.frq)'='thingID',class java.lang.String,null=[Ljava.lang.String;#357017445
entry#5 : 'MMapIndexInput(path=/io01/p/solr/playlist/c/playlist/index/_c2.frq)'='f_normalizedTotalHotttnesss',float,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_FLOAT_PARSER=[F#322888397
entry#6 : 'MMapIndexInput(path=/io01/p/solr/playlist/c/playlist/index/_c2.frq)'='f_normalizedTotalHotttnesss',float,org.apache.lucene.search.FieldCache.DEFAULT_FLOAT_PARSER=org.apache.lucene.search.FieldCache$CreationPlaceholder#1229311421
entry#7 : 'MMapIndexInput(path=/io01/p/solr/playlist/c/playlist/index/_c2.frq)'='f_normalizedTotalHotttnesss',float,null=[F#322888397
entry#8 : 'MMapIndexInput(path=/io01/p/solr/playlist/c/playlist/index/_c2.frq)'='i_collapse',int,org.apache.lucene.search.FieldCache.DEFAULT_INT_PARSER=org.apache.lucene.search.FieldCache$CreationPlaceholder#92920526
entry#9 : 'MMapIndexInput(path=/io01/p/solr/playlist/c/playlist/index/_c2.frq)'='i_collapse',int,null=[I#494669113
entry#10 : 'MMapIndexInput(path=/io01/p/solr/playlist/c/playlist/index/_c2.frq)'='i_collapse',int,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_INT_PARSER=[I#494669113
entry#11 : 'MMapIndexInput(path=/io01/p/solr/playlist/c/playlist/index/_c2.frq)'='i_track_count',int,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_INT_PARSER=[I#994584654
insanity_count : 1
insanity#0 : VALUEMISMATCH: Multiple distinct value objects for MMapIndexInput(path=/io01/p/solr/playlist/c/playlist/index/_c2.frq)+s_artistID
  'MMapIndexInput(path=/io01/p/solr/playlist/c/playlist/index/_c2.frq)'='s_artistID',class org.apache.lucene.search.FieldCache$StringIndex,null=org.apache.lucene.search.FieldCache$StringIndex#652215925
  'MMapIndexInput(path=/io01/p/solr/playlist/c/playlist/index/_c2.frq)'='s_artistID',class java.lang.String,null=[Ljava.lang.String;#1036517187
---snip---

Any suggestions on what the cause of this VALUEMISMATCH is, whether it is the normal case, or how to fix it?

For anybody else with SUBREADER insanity issues, this is the change I made to get this far (get the first leafReader, since we are using a merged/optimized index):

---snip---
SolrIndexReader reader = searcher.getReader().getLeafReaders()[0];
collapseIDs = FieldCache.DEFAULT.getInts(reader, COLLAPSE_KEY_NAME);
hotnessValues = FieldCache.DEFAULT.getFloats(reader, HOTNESS_KEY_NAME);
artistIDs = FieldCache.DEFAULT.getStrings(reader, ARTIST_KEY_NAME);
---snip---

Thanks,
Aaron

On Wed, Sep 19, 2012 at 4:54 PM, Yonik Seeley yo...@lucidworks.com wrote:
>> already-optimized, single-segment index
> That part is interesting... if true, then the type of insanity you saw should be impossible, and either the insanity detection or something else is broken.
>
> -Yonik
> http://lucidworks.com
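The VALUEMISMATCH report above flags s_artistID because the same reader and field appear with two different cached objects: a FieldCache$StringIndex and a String[]. A minimal sketch of that kind of check follows. It is a toy model in plain Java, not the Lucene sanity checker: `ValueMismatchSketch`, `Entry`, and `findValueMismatches` are hypothetical names, and the model simply reports every reader+field pair that maps to more than one distinct object.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of a field cache listing; hypothetical names, no Lucene classes. */
final class ValueMismatchSketch {

    /** One cache entry: the reader it belongs to, the field, and the cached object. */
    static final class Entry {
        final String readerKey;
        final String field;
        final Object value;

        Entry(String readerKey, String field, Object value) {
            this.readerKey = readerKey;
            this.field = field;
            this.value = value;
        }
    }

    /** Returns "reader+field" for every pair cached under more than one distinct object. */
    static List<String> findValueMismatches(List<Entry> entries) {
        // Group cached objects by reader+field, comparing objects by identity.
        Map<String, Set<Object>> valuesByReaderAndField = new LinkedHashMap<>();
        for (Entry e : entries) {
            valuesByReaderAndField
                .computeIfAbsent(e.readerKey + "+" + e.field,
                    k -> Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>()))
                .add(e.value);
        }
        List<String> mismatches = new ArrayList<>();
        for (Map.Entry<String, Set<Object>> group : valuesByReaderAndField.entrySet()) {
            if (group.getValue().size() > 1) {
                mismatches.add(group.getKey());
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        Object stringIndex = new Object();   // stand-in for a StringIndex
        String[] strings = new String[0];    // stand-in for a String[] of values
        int[] collapseIds = new int[0];      // one int[] registered under two parsers

        List<Entry> entries = new ArrayList<>();
        entries.add(new Entry("_c2", "s_artistID", stringIndex));
        entries.add(new Entry("_c2", "s_artistID", strings));
        entries.add(new Entry("_c2", "i_collapse", collapseIds));
        entries.add(new Entry("_c2", "i_collapse", collapseIds));

        System.out.println(findValueMismatches(entries));
    }
}
```

In this model the two i_collapse entries share one array, so only s_artistID is reported, which matches the shape of the listing above where both i_collapse arrays carry the same identity (`[I#494669113`).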
Re: Understanding fieldCache SUBREADER insanity
Yonik, et al.

I believe I found the section of code pushing me into 'insanity' status:

---snip---
int[] collapseIDs = null;
float[] hotnessValues = null;
String[] artistIDs = null;
try {
    collapseIDs = FieldCache.DEFAULT.getInts(searcher.getIndexReader(), COLLAPSE_KEY_NAME);
    hotnessValues = FieldCache.DEFAULT.getFloats(searcher.getIndexReader(), HOTNESS_KEY_NAME);
    artistIDs = FieldCache.DEFAULT.getStrings(searcher.getIndexReader(), ARTIST_KEY_NAME);
} ...
---snip---

Since it seems like this code is using the 'old-style' pre-Lucene 2.9 top-level indexReaders, is there any example code you can point me to that could show how to convert to using the leaf level segmentReaders? If the limited information I've been able to find is correct, this could explain some of the significant memory usage I am seeing...

Thanks again,
Aaron

On Wed, Sep 19, 2012 at 4:54 PM, Yonik Seeley yo...@lucidworks.com wrote:
>> already-optimized, single-segment index
> That part is interesting... if true, then the type of insanity you saw should be impossible, and either the insanity detection or something else is broken.
>
> -Yonik
> http://lucidworks.com
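On the question of moving from a top-level reader to per-segment readers: a top-level document ID is a segment's starting offset (its doc base) plus the document's ID within that segment. A minimal sketch of that lookup follows. It is plain Java and uses no Lucene classes; `SegmentLookupSketch`, `segmentFor`, and `valueFor` are hypothetical names, the per-segment arrays stand in for whatever a per-segment cache lookup would return, and every segment is assumed to hold at least one document.

```java
import java.util.Arrays;

/** Plain-Java model of per-segment value arrays; hypothetical names, no Lucene classes. */
final class SegmentLookupSketch {
    private final int[] docBases;             // first top-level doc ID of each segment
    private final float[][] perSegmentValues; // one value array per segment
    private final int maxDoc;                 // total number of documents

    SegmentLookupSketch(float[][] perSegmentValues) {
        this.perSegmentValues = perSegmentValues;
        this.docBases = new int[perSegmentValues.length];
        int base = 0;
        for (int i = 0; i < perSegmentValues.length; i++) {
            docBases[i] = base;                   // segment i starts where the previous ones end
            base += perSegmentValues[i].length;   // ASSUMPTION: each segment is non-empty
        }
        this.maxDoc = base;
    }

    /** Index of the segment that contains the given top-level doc ID. */
    int segmentFor(int globalDoc) {
        if (globalDoc < 0 || globalDoc >= maxDoc) {
            throw new IndexOutOfBoundsException("doc " + globalDoc);
        }
        int pos = Arrays.binarySearch(docBases, globalDoc);
        // Exact hit: globalDoc is the first doc of that segment.
        // Otherwise pos == -(insertionPoint) - 1, and the segment is insertionPoint - 1.
        return pos >= 0 ? pos : -pos - 2;
    }

    /** Value for a top-level doc ID, read from the owning segment's array. */
    float valueFor(int globalDoc) {
        int segment = segmentFor(globalDoc);
        return perSegmentValues[segment][globalDoc - docBases[segment]];
    }

    public static void main(String[] args) {
        SegmentLookupSketch lookup =
            new SegmentLookupSketch(new float[][] { {1f, 2f, 3f}, {4f, 5f} });
        for (int doc = 0; doc < 5; doc++) {
            System.out.println("doc " + doc + " -> segment " + lookup.segmentFor(doc)
                + ", value " + lookup.valueFor(doc));
        }
    }
}
```

With a single-segment index there is only one doc base (0), so top-level and segment-local IDs coincide; the point of keying lookups on the segment is that the values are then cached once, under the segment, rather than a second time under the top-level reader.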
Understanding fieldCache SUBREADER insanity
Hi all,

In reviewing a solr instance with somewhat variable performance, I noticed that its fieldCache stats show an insanity_count of 1 with the insanity type SUBREADER:

---snip---
insanity_count : 1
insanity#0 : SUBREADER: Found caches for descendants of ReadOnlyDirectoryReader(segments_k _6h9(3.3):C17198463)+tf_normalizedTotalHotttnesss
  'ReadOnlyDirectoryReader(segments_k _6h9(3.3):C17198463)'='tf_normalizedTotalHotttnesss',float,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_FLOAT_PARSER=[F#1965982057
  'ReadOnlyDirectoryReader(segments_k _6h9(3.3):C17198463)'='tf_normalizedTotalHotttnesss',float,null=[F#1965982057
  'MMapIndexInput(path=/io01/p/solr/playlist/a/playlist/index/_6h9.frq)'='tf_normalizedTotalHotttnesss',float,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_FLOAT_PARSER=[F#1308116426
---snip---

How can I decipher what this means and what, if anything, I should do to fix/improve the insanity?

Thanks,
Aaron
Re: Understanding fieldCache SUBREADER insanity
Hi Aaron,

Here is some information about the insanity count: http://wiki.apache.org/solr/SolrCaching#The_Lucene_FieldCache

As for the SUBREADER type, the javadocs say: "Indicates an overlap in cache usage on a given field in sub/super readers."

This probably means that you are using the same field (tf_normalizedTotalHotttnesss) for both faceting and sorting: sorting uses the segment-level cache, while faceting uses the global field cache by default. This can be a problem because the field is then cached twice and uses twice the memory. One way to solve this would be to change the faceting method on that field to 'fcs', which uses the segment-level cache (but may be a little bit slower).

Tomás

On Wed, Sep 19, 2012 at 3:16 PM, Aaron Daubman daub...@gmail.com wrote:
> Hi all,
>
> In reviewing a solr instance with somewhat variable performance, I noticed that its fieldCache stats show an insanity_count of 1 with the insanity type SUBREADER:
>
> ---snip---
> insanity_count : 1
> insanity#0 : SUBREADER: Found caches for descendants of ReadOnlyDirectoryReader(segments_k _6h9(3.3):C17198463)+tf_normalizedTotalHotttnesss
>   'ReadOnlyDirectoryReader(segments_k _6h9(3.3):C17198463)'='tf_normalizedTotalHotttnesss',float,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_FLOAT_PARSER=[F#1965982057
>   'ReadOnlyDirectoryReader(segments_k _6h9(3.3):C17198463)'='tf_normalizedTotalHotttnesss',float,null=[F#1965982057
>   'MMapIndexInput(path=/io01/p/solr/playlist/a/playlist/index/_6h9.frq)'='tf_normalizedTotalHotttnesss',float,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_FLOAT_PARSER=[F#1308116426
> ---snip---
>
> How can I decipher what this means and what, if anything, I should do to fix/improve the insanity?
>
> Thanks,
> Aaron
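To put a rough number on "twice the memory": the SUBREADER report lists two distinct float arrays for the field, one under the top-level reader (`[F#1965982057`) and one under the segment (`[F#1308116426`). The back-of-the-envelope sketch below assumes that the `C17198463` in the reader string is the segment's document count and that each array holds one 4-byte float per document; array headers and JVM alignment are ignored, and `FieldCacheMemorySketch` and `floatArrayBytes` are hypothetical names.

```java
/** Back-of-the-envelope estimate only; hypothetical names, no Lucene classes. */
final class FieldCacheMemorySketch {

    /** Bytes needed to hold one float per document (array header ignored). */
    static long floatArrayBytes(long docCount) {
        return docCount * Float.BYTES;
    }

    public static void main(String[] args) {
        // ASSUMPTION: document count read from "C17198463" in the reader string.
        long docCount = 17_198_463L;
        long oneCopy = floatArrayBytes(docCount);
        long twoCopies = 2 * oneCopy;
        System.out.printf("one copy:   %d bytes (%.1f MiB)%n",
            oneCopy, oneCopy / (1024.0 * 1024.0));
        System.out.printf("two copies: %d bytes (%.1f MiB)%n",
            twoCopies, twoCopies / (1024.0 * 1024.0));
    }
}
```

Under those assumptions one copy is about 69 MB, so the duplicate entry costs roughly that much extra heap for this one field.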
Re: Understanding fieldCache SUBREADER insanity
Hi Tomás,

> This probably means that you are using the same field for faceting and for sorting (tf_normalizedTotalHotttnesss), sorting uses the segment level cache and faceting uses by default the global field cache. This can be a problem because the field is duplicated in cache, and then it uses twice the memory. One way to solve this would be to change the faceting method on that field to 'fcs', which uses segment level cache (but may be a little bit slower).

Thanks for explaining what the sparse wiki and javadoc mean - I had read them but had no idea what the implications were ;-)

We are not doing any explicit faceting, and this index is also supposed to be a read-only, already-optimized, single-segment index. Both of these seem to indicate to me (very unknowledgeable about this) that this could be more of a problem: what am I doing to cause this, since I don't think I need to be using segment-level anything (there should be a single segment, if I understand optimization and RO indices) and I am not leveraging faceting?

Any pointers on where else to look for what might be causing this? (One issue I am currently troubleshooting is too many pauses caused by too-frequent GC, so preventing this double allocation could help.)

Thanks again,
Aaron
Re: Understanding fieldCache SUBREADER insanity
The other thing to realize is that it's only insanity if it's unexpected or not-by-design (so the term is rather mis-named). It's more for core developers - if you are just using Solr without custom plugins, don't worry about it.

-Yonik
http://lucidworks.com

On Wed, Sep 19, 2012 at 3:27 PM, Tomás Fernández Löbbe tomasflo...@gmail.com wrote:
> Hi Aaron, here there is some information about the insanity count: http://wiki.apache.org/solr/SolrCaching#The_Lucene_FieldCache
>
> As for the SUBREADER type, the javadocs say: Indicates an overlap in cache usage on a given field in sub/super readers.
>
> This probably means that you are using the same field for faceting and for sorting (tf_normalizedTotalHotttnesss), sorting uses the segment level cache and faceting uses by default the global field cache. This can be a problem because the field is duplicated in cache, and then it uses twice the memory. One way to solve this would be to change the faceting method on that field to 'fcs', which uses segment level cache (but may be a little bit slower).
Re: Understanding fieldCache SUBREADER insanity
Some function queries also use the field cache. I *think* those usually use the segment level cache, but I'm not sure.

On Wed, Sep 19, 2012 at 4:36 PM, Yonik Seeley yo...@lucidworks.com wrote:
> The other thing to realize is that it's only insanity if it's unexpected or not-by-design (so the term is rather mis-named). It's more for core developers - if you are just using Solr without custom plugins, don't worry about it.
>
> -Yonik
> http://lucidworks.com
>
> On Wed, Sep 19, 2012 at 3:27 PM, Tomás Fernández Löbbe tomasflo...@gmail.com wrote:
>> Hi Aaron, here there is some information about the insanity count: http://wiki.apache.org/solr/SolrCaching#The_Lucene_FieldCache
>>
>> As for the SUBREADER type, the javadocs say: Indicates an overlap in cache usage on a given field in sub/super readers.
>>
>> This probably means that you are using the same field for faceting and for sorting (tf_normalizedTotalHotttnesss), sorting uses the segment level cache and faceting uses by default the global field cache. This can be a problem because the field is duplicated in cache, and then it uses twice the memory. One way to solve this would be to change the faceting method on that field to 'fcs', which uses segment level cache (but may be a little bit slower).
Re: Understanding fieldCache SUBREADER insanity
> already-optimized, single-segment index

That part is interesting... if true, then the type of insanity you saw should be impossible, and either the insanity detection or something else is broken.

-Yonik
http://lucidworks.com