Sure, There is only one stack trace (that seems to be how the output for this tool works) for java.lang.String.intern:
TRACE 300165: java.lang.String.intern(<Unknown Source>:Unknown line) org.apache.lucene.util.SimpleStringInterner.intern(SimpleStringInterner.java:74) org.apache.lucene.util.StringHelper.intern(StringHelper.java:34) org.apache.lucene.index.FieldInfos.read(FieldInfos.java:353) org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:71) org.apache.lucene.index.SegmentReader$CoreReaders.<init>(SegmentReader.java:116) org.apache.lucene.index.SegmentReader.get(SegmentReader.java:576) org.apache.lucene.index.SegmentReader.get(SegmentReader.java:554) org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:105) org.apache.lucene.index.ReadOnlyDirectoryReader.<init>(ReadOnlyDirectoryReader.java:27) org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:75) org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:677) org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:69) org.apache.lucene.index.IndexReader.open(IndexReader.java:316) org.apache.lucene.index.IndexReader.open(IndexReader.java:188) com.smartsheet.Main.main(<Unknown Source>:Unknown line) But there are several for org.apache.lucene.util.SimpleStringInterner.intern and org.apache.lucene.util.StringHelper.intern: TRACE 300252: org.apache.lucene.util.SimpleStringInterner.intern(SimpleStringInterner.java:61) org.apache.lucene.util.StringHelper.intern(StringHelper.java:34) org.apache.lucene.index.FieldInfos.read(FieldInfos.java:353) org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:71) org.apache.lucene.index.SegmentReader$CoreReaders.<init>(SegmentReader.java:116) org.apache.lucene.index.SegmentReader.get(SegmentReader.java:576) org.apache.lucene.index.SegmentReader.get(SegmentReader.java:554) org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:105) org.apache.lucene.index.ReadOnlyDirectoryReader.<init>(ReadOnlyDirectoryReader.java:27) org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:75) org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:677) org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:69) org.apache.lucene.index.IndexReader.open(IndexReader.java:316) org.apache.lucene.index.IndexReader.open(IndexReader.java:188) com.smartsheet.Main.main(<Unknown Source>:Unknown line) TRACE 300339: org.apache.lucene.util.SimpleStringInterner.intern(SimpleStringInterner.java:75) org.apache.lucene.util.StringHelper.intern(StringHelper.java:34) org.apache.lucene.index.FieldInfos.read(FieldInfos.java:353) org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:71) org.apache.lucene.index.SegmentReader$CoreReaders.<init>(SegmentReader.java:116) org.apache.lucene.index.SegmentReader.get(SegmentReader.java:576) org.apache.lucene.index.SegmentReader.get(SegmentReader.java:554) org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:105) org.apache.lucene.index.ReadOnlyDirectoryReader.<init>(ReadOnlyDirectoryReader.java:27) org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:75) org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:677) org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:69) org.apache.lucene.index.IndexReader.open(IndexReader.java:316) org.apache.lucene.index.IndexReader.open(IndexReader.java:188) com.smartsheet.Main.main(<Unknown Source>:Unknown line) TRACE 300262: org.apache.lucene.util.StringHelper.intern(StringHelper.java:34) org.apache.lucene.index.FieldInfos.addInternal(FieldInfos.java:249) org.apache.lucene.index.FieldInfos.read(FieldInfos.java:363) org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:71) org.apache.lucene.index.SegmentReader$CoreReaders.<init>(SegmentReader.java:116) org.apache.lucene.index.SegmentReader.get(SegmentReader.java:576) org.apache.lucene.index.SegmentReader.get(SegmentReader.java:554) org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:105) org.apache.lucene.index.ReadOnlyDirectoryReader.<init>(ReadOnlyDirectoryReader.java:27) org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:75) org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:677) org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:69) org.apache.lucene.index.IndexReader.open(IndexReader.java:316) org.apache.lucene.index.IndexReader.open(IndexReader.java:188) com.smartsheet.Main.main(<Unknown Source>:Unknown line) TRACE 300344: org.apache.lucene.util.SimpleStringInterner.intern(SimpleStringInterner.java:78) org.apache.lucene.util.StringHelper.intern(StringHelper.java:34) org.apache.lucene.index.FieldInfos.read(FieldInfos.java:353) org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:71) org.apache.lucene.index.SegmentReader$CoreReaders.<init>(SegmentReader.java:116) org.apache.lucene.index.SegmentReader.get(SegmentReader.java:576) org.apache.lucene.index.SegmentReader.get(SegmentReader.java:554) org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:105) org.apache.lucene.index.ReadOnlyDirectoryReader.<init>(ReadOnlyDirectoryReader.java:27) org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:75) org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:677) org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:69) org.apache.lucene.index.IndexReader.open(IndexReader.java:316) org.apache.lucene.index.IndexReader.open(IndexReader.java:188) com.smartsheet.Main.main(<Unknown Source>:Unknown line) TRACE 300206: org.apache.lucene.util.SimpleStringInterner.intern(SimpleStringInterner.java:54) org.apache.lucene.util.StringHelper.intern(StringHelper.java:34) org.apache.lucene.index.FieldInfos.read(FieldInfos.java:353) org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:71) org.apache.lucene.index.SegmentReader$CoreReaders.<init>(SegmentReader.java:116) org.apache.lucene.index.SegmentReader.get(SegmentReader.java:576) org.apache.lucene.index.SegmentReader.get(SegmentReader.java:554) org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:105) org.apache.lucene.index.ReadOnlyDirectoryReader.<init>(ReadOnlyDirectoryReader.java:27) org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:75) org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:677) org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:69) org.apache.lucene.index.IndexReader.open(IndexReader.java:316) org.apache.lucene.index.IndexReader.open(IndexReader.java:188) com.smartsheet.Main.main(<Unknown Source>:Unknown line) TRACE 300211: org.apache.lucene.util.SimpleStringInterner.intern(SimpleStringInterner.java:51) org.apache.lucene.util.StringHelper.intern(StringHelper.java:34) org.apache.lucene.index.FieldInfos.read(FieldInfos.java:353) org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:71) org.apache.lucene.index.SegmentReader$CoreReaders.<init>(SegmentReader.java:116) org.apache.lucene.index.SegmentReader.get(SegmentReader.java:576) org.apache.lucene.index.SegmentReader.get(SegmentReader.java:554) org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:105) org.apache.lucene.index.ReadOnlyDirectoryReader.<init>(ReadOnlyDirectoryReader.java:27) org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:75) org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:677) org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:69) org.apache.lucene.index.IndexReader.open(IndexReader.java:316) org.apache.lucene.index.IndexReader.open(IndexReader.java:188) com.smartsheet.Main.main(<Unknown Source>:Unknown line) -Mark On Nov 17, 2010, at 1:51 PM, Michael McCandless wrote: > Lucene interns field names... since you have a truly enormous number > of unique fields it's expected intern will be called alot. > > But that said it's odd that it's this costly. > > Can you post the stack traces that call intern? > > Mike > > On Fri, Nov 5, 2010 at 1:53 PM, Michael McCandless > <luc...@mikemccandless.com> wrote: >> Hmm... >> >> So, I was going on this output from your CheckIndex: >> >> test: field norms.........OK [296713 fields] >> >> But in fact I just looked and that number is bogus -- it's always >> equal to total number of fields, not number of fields with norms >> enabled. I'll open an issue to fix this, but in the meantime can you >> apply this patch to your CheckIndex and run it again? >> >> Index: src/java/org/apache/lucene/index/CheckIndex.java >> =================================================================== >> --- src/java/org/apache/lucene/index/CheckIndex.java (revision 1031678) >> +++ src/java/org/apache/lucene/index/CheckIndex.java (working copy) >> @@ -570,8 +570,10 @@ >> } >> final byte[] b = new byte[reader.maxDoc()]; >> for (final String fieldName : fieldNames) { >> - reader.norms(fieldName, b, 0); >> - ++status.totFields; >> + if (reader.hasNorms(fieldName)) { >> + reader.norms(fieldName, b, 0); >> + ++status.totFields; >> + } >> } >> >> msg("OK [" + status.totFields + " fields]"); >> >> So if in fact you have already disabled norms then something else is >> the source of the sudden slowness. Though, such a huge number of >> unique field names is not an area of Lucene that's very well tested... >> perhaps there's something silly somewhere. Maybe you can try >> profiling just the init of your IndexReader? (Eg, run java with >> -agentlib:hprof=cpu=samples,depth=16,interval=1). >> >> Yes, both Index.NOT_ANALYZED_NO_NORMS and Index.NO will disable norms >> as long as no document in the index ever had norms on (yes it does >> "infect" heh). >> >> Mike >> >> On Fri, Nov 5, 2010 at 1:37 PM, Mark Kristensson >> <mark.kristens...@smartsheet.com> wrote: >>> While most of our Lucene indexes are used for more traditional searching, >>> this index in particular is used more like a reporting repository. Thus, we >>> really do need to have that many fields indexed and they do need to be >>> broken out into separate fields. There may be another way to structure the >>> index to reduce the number of fields, but I'm hoping we can optimize the >>> current design and avoid (yet another) index redesign. >>> >>> I'll look into the tweaking the merge policy, but I'm more interested in >>> disabling norms because scoring really doesn't matter for us. Basically, we >>> need nothing more than a binary answer from Lucene: either a record meets >>> the provided criteria (which can be a rather complex boolean query with >>> many subqueries) or it doesn't. If the record does match, then we get the >>> IDs from lucene and run off to get the live data from our primary data >>> store and sort it (in Java) based upon criteria provided by the user, not >>> by score. >>> >>> After our initial design mushroomed in size, we redesigned and now (I >>> thought) do not have norms on any of the fields in this index. So, I'm >>> wondering if there was something in the results from the CheckIndex that I >>> provided which indicates to you that we may have norms still enabled? I >>> know that if you have norms on any one document's field, then any other >>> document with that same field will get "infected" with norms as well. >>> >>> My understanding is that any field that uses the constants >>> Index.NOT_ANALYZED_NO_NORMS or Index.NO will not have norms on it, >>> regardless of whether or not the field is stored. Is that not correct? >>> >>> Thanks, >>> Mark >>> >>> >>> >>> On Nov 4, 2010, at 2:56 AM, Michael McCandless wrote: >>> >>>> Likely what happened is you had a bunch of smaller segments, and then >>>> suddenly they got merged into that one big segment (_aiaz) in your >>>> index. >>>> >>>> The representation for norms in particular is not sparse, so this >>>> means the size of the norms file for a given segment will be >>>> number-of-unique-indexed-fields X number-of-documents. >>>> >>>> So this count grows quadratically on merge. >>>> >>>> Do these fields really need to be indexed? If so, it'd be better to >>>> use a single field for all users for the indexable text if you can. >>>> >>>> Failing that, a simple workaround is to set the maxMergeMB/Docs on the >>>> merge policy; this'd prevent big segments from being produced. >>>> Disabling norms should also workaround this, though that will affect >>>> hit scores... >>>> >>>> Mike >>>> >>>> On Wed, Nov 3, 2010 at 7:37 PM, Mark Kristensson >>>> <mark.kristens...@smartsheet.com> wrote: >>>>> Yes, we do have a large number of unique field names in that index, >>>>> because they are driven by user named fields in our application (with >>>>> some cleaning to remove illegal chars). >>>>> >>>>> This slowness problem has appeared very suddenly in the last couple of >>>>> weeks and the number of unique field names has not spiked in the last few >>>>> weeks. Have we crept over some threshold with our linear growth in the >>>>> number of unique field names? Perhaps there is a limit driven by the >>>>> amount of RAM in the machine that we are violating? Are there any >>>>> guidelines for the maximum number, or suggested number, of unique fields >>>>> names in an index or segment? Any suggestions for potentially mitigating >>>>> the problem? >>>>> >>>>> Thanks, >>>>> Mark >>>>> >>>>> >>>>> On Nov 3, 2010, at 2:02 PM, Michael McCandless wrote: >>>>> >>>>>> On Wed, Nov 3, 2010 at 4:27 PM, Mark Kristensson >>>>>> <mark.kristens...@smartsheet.com> wrote: >>>>>>> >>>>>>> I've run checkIndex against the index and the results are below. That >>>>>>> net is that it's telling me nothing is wrong with the index. >>>>>> >>>>>> Thanks. >>>>>> >>>>>>> I did not have any instrumentation around the opening of the >>>>>>> IndexSearcher (we don't use an IndexReader), just around the actual >>>>>>> query execution so I had to add some additional logging. What I found >>>>>>> surprised me, opening a search against this index takes the same 6 to 8 >>>>>>> seconds that closing the indexWriter takes. >>>>>> >>>>>> IndexWriter opens a SegmentReader for each segment in the index, to >>>>>> apply deletions, so I think this is the source of the slowness. >>>>>> >>>>>> From the CheckIndex output, it looks like you have many (296,713) >>>>>> unique fields names on that one large segment -- does that sound >>>>>> right? I suspect such a very high field count is the source of the >>>>>> slowness... >>>>>> >>>>>> Mike >>>>>> >>>>>> --------------------------------------------------------------------- >>>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >>>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org >>>>>> >>>>> >>>>> >>>>> --------------------------------------------------------------------- >>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org >>>>> >>>>> >>>> >>>> --------------------------------------------------------------------- >>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >>>> For additional commands, e-mail: java-user-h...@lucene.apache.org >>>> >>> >>> >> > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org