After a week away, I'm back and still working to get to the bottom of this
issue. We run Lucene from the binaries, so making changes to the source code is
not something we are really set up to do right now.
I have, however, created a trivial Java app that just opens an IndexReader for
our problematic index and then closes it:
try {
    IndexReader indexReader = IndexReader.open(getIndexDirectory(indexPath));
    System.out.println("Successfully opened index at " + indexPath);
    indexReader.close();
    System.out.println("Successfully closed index at " + indexPath);
} catch (Exception ex) {
    System.out.println("Exception while opening index: " + ex.getMessage());
}
I've run this simple app with the hprof options suggested below, and it appears
that the vast majority of the CPU time (about 80% of the samples) is spent in
java.lang.String.intern. Below is the summary from the end of java.hprof.txt.
I'm happy to attach the whole file, but I wasn't sure whether that was
appropriate for this mailing list.
Thanks,
Mark
CPU SAMPLES BEGIN (total = 5295) Wed Nov 17 11:54:15 2010
rank self accum count trace method
1 80.40% 80.40% 4257 300165 java.lang.String.intern
2 1.83% 82.23% 97 300189 sun.nio.ch.FileDispatcher.pread0
3 0.83% 83.06% 44 300232 java.util.HashMap.transfer
4 0.72% 83.78% 38 300201 sun.nio.ch.FileDispatcher.pread0
5 0.70% 84.48% 37 300252 org.apache.lucene.util.SimpleStringInterner.intern
6 0.60% 85.08% 32 300191 java.lang.StringCoding$StringDecoder.decode
7 0.59% 85.67% 31 300202 java.lang.System.arraycopy
8 0.38% 86.04% 20 300098 java.util.zip.ZipFile.read
9 0.36% 86.40% 19 300203 java.util.Arrays.copyOfRange
10 0.36% 86.76% 19 300224 sun.nio.ch.FileDispatcher.pread0
11 0.32% 87.08% 17 300089 java.lang.Class.forName0
12 0.32% 87.40% 17 300237 java.lang.Thread.currentThread
13 0.28% 87.69% 15 300049 java.lang.ClassLoader.findBootstrapClass
14 0.28% 87.97% 15 300102 java.util.zip.ZipFile.read
15 0.26% 88.23% 14 300180 java.util.zip.ZipFile.read
16 0.26% 88.50% 14 300255 java.lang.Thread.currentThread
17 0.26% 88.76% 14 300335 sun.nio.ch.FileDispatcher.pread0
18 0.25% 89.01% 13 300164 java.lang.System.arraycopy
19 0.25% 89.25% 13 300286 sun.nio.ch.NativeThread.current
20 0.23% 89.48% 12 300240 sun.nio.ch.FileDispatcher.pread0
21 0.23% 89.71% 12 300242 java.lang.System.arraycopy
22 0.21% 89.92% 11 300207 java.lang.Thread.currentThread
23 0.21% 90.12% 11 300231 java.lang.System.getSecurityManager
24 0.19% 90.31% 10 300155 java.util.zip.ZipFile.read
25 0.19% 90.50% 10 300216 java.lang.ClassLoader.findBootstrapClass
26 0.19% 90.69% 10 300239 java.nio.Bits.copyToByteArray
27 0.19% 90.88% 10 300350 java.util.HashMap.values
28 0.17% 91.05% 9 300034 sun.net.www.protocol.file.Handler.createFileURLConnection
29 0.17% 91.22% 9 300283 sun.nio.ch.FileDispatcher.pread0
30 0.15% 91.37% 8 300006 java.util.jar.JarFile.getBytes
31 0.15% 91.52% 8 300008 java.util.zip.ZipFile.getInputStream
32 0.15% 91.67% 8 300166 java.util.zip.ZipFile.read
33 0.15% 91.82% 8 300179 java.lang.ClassLoader.findBootstrapClass
34 0.15% 91.97% 8 300209 sun.nio.ch.FileDispatcher.pread0
35 0.13% 92.11% 7 300123 java.lang.ClassLoader$NativeLibrary.load
36 0.13% 92.24% 7 300140 sun.nio.ch.FileDispatcher.pread0
37 0.13% 92.37% 7 300225 sun.nio.ch.FileDispatcher.pread0
38 0.13% 92.50% 7 300246 java.nio.Bits.copyToByteArray
39 0.11% 92.62% 6 300031 java.util.zip.ZipFile.read
40 0.11% 92.73% 6 300059 java.io.FileInputStream.readBytes
41 0.11% 92.84% 6 300101 java.lang.ClassLoader.findBootstrapClass
42 0.11% 92.96% 6 300138 java.lang.ClassLoader.findBootstrapClass
43 0.11% 93.07% 6 300241 sun.nio.ch.FileDispatcher.pread0
44 0.11% 93.18% 6 300282 java.lang.Thread.currentThread
45 0.11% 93.30% 6 300290 org.apache.lucene.index.TermInfosReader.<init>
46 0.11% 93.41% 6 300311 org.apache.lucene.util.UnicodeUtil.UTF8toUTF16
47 0.09% 93.50% 5 300047 java.util.zip.ZipFile.read
48 0.09% 93.60% 5 300057 java.io.UnixFileSystem.getBooleanAttributes0
49 0.09% 93.69% 5 300064 sun.security.jca.Providers.<clinit>
50 0.09% 93.79% 5 300254 sun.nio.ch.NativeThread.current
51 0.09% 93.88% 5 300324 org.apache.lucene.index.SegmentTermEnum.next
52 0.09% 93.98% 5 300340 java.util.HashMap.put
53 0.08% 94.05% 4 300007 java.util.zip.ZipFile.getInputStream
54 0.08% 94.13% 4 300009 java.util.zip.ZipFile.getInflater
55 0.08% 94.20% 4 300010 java.util.jar.JarFile.getManifestFromReference
56 0.08% 94.28% 4 300051 java.lang.ClassLoader.findBootstrapClass
57 0.08% 94.35% 4 300054 java.lang.ClassLoader.findBootstrapClass
58 0.08% 94.43% 4 300083 java.util.HashMap.entrySet0
59 0.08% 94.50% 4 300108 java.util.zip.ZipFile.read
60 0.08% 94.58% 4 300135 java.util.zip.ZipFile.read
61 0.08% 94.66% 4 300142 java.util.zip.ZipFile.read
62 0.08% 94.73% 4 300238 java.lang.Thread.currentThread
63 0.08% 94.81% 4 300247 sun.nio.ch.FileDispatcher.pread0
64 0.08% 94.88% 4 300253 java.lang.Thread.currentThread
65 0.08% 94.96% 4 300257 java.util.HashMap.resize
66 0.08% 95.03% 4 300275 sun.nio.ch.FileDispatcher.pread0
67 0.08% 95.11% 4 300295 org.apache.lucene.index.TermBuffer.read
68 0.08% 95.18% 4 300299 org.apache.lucene.index.SegmentTermEnum.next
69 0.06% 95.24% 3 300004 java.util.zip.ZipFile.getEntry
70 0.06% 95.30% 3 300021 sun.misc.URLClassPath$3.run
71 0.06% 95.35% 3 300050 java.util.zip.ZipFile.read
72 0.06% 95.41% 3 300055 java.security.MessageDigest.getInstance
73 0.06% 95.47% 3 300124 java.lang.ClassLoader$NativeLibrary.load
74 0.06% 95.52% 3 300249 java.util.HashMap.getEntry
75 0.06% 95.58% 3 300250 java.util.HashMap.getEntry
76 0.06% 95.64% 3 300261 java.lang.System.arraycopy
77 0.06% 95.69% 3 300267 java.util.Arrays.copyOf
78 0.06% 95.75% 3 300276 org.apache.lucene.index.TermInfosReader.<init>
79 0.06% 95.81% 3 300277 org.apache.lucene.index.SegmentTermEnum.next
80 0.06% 95.86% 3 300300 org.apache.lucene.index.TermInfosReader.<init>
81 0.06% 95.92% 3 300304 org.apache.lucene.store.IndexInput.readVLong
82 0.06% 95.98% 3 300318 sun.nio.ch.NativeThread.current
83 0.06% 96.03% 3 300338 sun.nio.cs.UTF_8.updatePositions
84 0.06% 96.09% 3 300339 org.apache.lucene.util.SimpleStringInterner.intern
85 0.04% 96.13% 2 300001 java.lang.ClassLoader.findBootstrapClass
86 0.04% 96.17% 2 300079 java.lang.Math.floor
87 0.04% 96.20% 2 300085 java.security.Provider.parseLegacyPut
88 0.04% 96.24% 2 300107 org.apache.lucene.index.IndexReader.open
89 0.04% 96.28% 2 300119 java.io.RandomAccessFile.getChannel
90 0.04% 96.32% 2 300190 java.lang.System.arraycopy
91 0.04% 96.36% 2 300197 java.nio.ByteBuffer.hasArray
92 0.04% 96.39% 2 300198 java.util.HashMap.put
93 0.04% 96.43% 2 300199 java.util.HashMap.hash
94 0.04% 96.47% 2 300210 java.util.HashMap.addEntry
95 0.04% 96.51% 2 300217 java.util.HashMap.hash
96 0.04% 96.54% 2 300220 java.nio.Buffer.position
97 0.04% 96.58% 2 300222 org.apache.lucene.index.FieldInfos.read
98 0.04% 96.62% 2 300235 java.util.Arrays.copyOf
99 0.04% 96.66% 2 300243 java.lang.System.arraycopy
100 0.04% 96.69% 2 300248 org.apache.lucene.index.FieldInfos.hasVectors
101 0.04% 96.73% 2 300258 java.lang.Thread.currentThread
102 0.04% 96.77% 2 300262 org.apache.lucene.util.StringHelper.intern
103 0.04% 96.81% 2 300264 java.lang.Thread.isInterrupted
104 0.04% 96.85% 2 300292 org.apache.lucene.index.TermInfosReader.<init>
105 0.04% 96.88% 2 300297 org.apache.lucene.index.CompoundFileReader$CSIndexInput.readInternal
106 0.04% 96.92% 2 300309 sun.nio.ch.FileDispatcher.pread0
107 0.04% 96.96% 2 300319 org.apache.lucene.store.IndexInput.readVLong
108 0.04% 97.00% 2 300321 org.apache.lucene.store.IndexInput.readVLong
109 0.04% 97.03% 2 300323 org.apache.lucene.store.BufferedIndexInput.refill
110 0.04% 97.07% 2 300326 sun.nio.ch.NativeThread.current
111 0.04% 97.11% 2 300332 org.apache.lucene.index.SegmentReader$CoreReaders.openDocStores
112 0.04% 97.15% 2 300334 org.apache.lucene.index.SegmentReader.get
113 0.04% 97.19% 2 300341 java.util.HashMap.put
114 0.04% 97.22% 2 300344 org.apache.lucene.util.SimpleStringInterner.intern
115 0.04% 97.26% 2 300346 org.apache.lucene.store.IndexInput.readString
116 0.02% 97.28% 1 300003 java.util.zip.ZipFile.open
117 0.02% 97.30% 1 300014 java.util.jar.Attributes.putValue
118 0.02% 97.32% 1 300019 sun.misc.URLClassPath.getLoader
119 0.02% 97.34% 1 300023 sun.misc.URLClassPath$JarLoader.ensureOpen
120 0.02% 97.36% 1 300027 sun.misc.URLClassPath$JarLoader.checkResource
121 0.02% 97.37% 1 300029 sun.security.util.ManifestEntryVerifier.<init>
122 0.02% 97.39% 1 300036 sun.net.www.URLConnection.<init>
123 0.02% 97.41% 1 300039 java.io.FilePermission$1.run
124 0.02% 97.43% 1 300060 java.util.Properties$LineReader.readLine
125 0.02% 97.45% 1 300066 sun.security.jca.ProviderList.<clinit>
126 0.02% 97.47% 1 300071 java.security.Provider.<init>
127 0.02% 97.49% 1 300075 sun.security.jca.ProviderConfig.getLock
128 0.02% 97.51% 1 300081 sun.security.provider.NativePRNG.initIO
129 0.02% 97.53% 1 300086 java.lang.Character.toUpperCaseEx
130 0.02% 97.54% 1 300088 java.util.HashMap.put
131 0.02% 97.56% 1 300093 org.apache.lucene.store.FSDirectory.<clinit>
132 0.02% 97.58% 1 300095 java.lang.ClassLoader.defineClass1
133 0.02% 97.60% 1 300096 java.util.zip.Inflater.inflateBytes
134 0.02% 97.62% 1 300097 sun.security.provider.MD5.implDigest
135 0.02% 97.64% 1 300099 java.util.zip.Inflater.inflateBytes
136 0.02% 97.66% 1 300103 java.lang.ClassLoader.defineClass1
137 0.02% 97.68% 1 300104 java.util.Arrays.copyOf
138 0.02% 97.70% 1 300105 java.lang.String.indexOf
139 0.02% 97.71% 1 300106 java.util.zip.InflaterInputStream.<init>
140 0.02% 97.73% 1 300109 java.util.zip.ZipFile.read
141 0.02% 97.75% 1 300111 java.util.Arrays.copyOfRange
142 0.02% 97.77% 1 300113 java.util.zip.Inflater.inflateBytes
143 0.02% 97.79% 1 300115 java.util.Arrays.copyOf
144 0.02% 97.81% 1 300116 java.lang.ClassLoader.findBootstrapClass
145 0.02% 97.83% 1 300117 java.lang.String.lastIndexOf
146 0.02% 97.85% 1 300122 sun.security.action.LoadLibraryAction.<init>
147 0.02% 97.87% 1 300127 sun.nio.ch.FileChannelImpl.<init>
148 0.02% 97.88% 1 300132 java.nio.DirectByteBuffer.<init>
149 0.02% 97.90% 1 300136 java.util.Arrays.copyOfRange
150 0.02% 97.92% 1 300141 java.lang.ref.SoftReference.get
151 0.02% 97.94% 1 300143 java.util.zip.Inflater.inflateBytes
152 0.02% 97.96% 1 300145 org.apache.lucene.index.SegmentInfos.read
153 0.02% 97.98% 1 300146 java.util.zip.CRC32.update
154 0.02% 98.00% 1 300147 java.nio.CharBuffer.hasArray
155 0.02% 98.02% 1 300148 java.lang.ClassLoader.defineClass1
156 0.02% 98.04% 1 300149 org.apache.lucene.index.DirectoryReader.<init>
157 0.02% 98.05% 1 300150 java.lang.ClassLoader.defineClass1
158 0.02% 98.07% 1 300151 java.lang.AbstractStringBuilder.<init>
159 0.02% 98.09% 1 300154 java.lang.ClassLoader.defineClass1
160 0.02% 98.11% 1 300157 org.apache.lucene.index.SegmentReader$CoreReaders.<init>
161 0.02% 98.13% 1 300159 org.apache.lucene.util.StringHelper.<clinit>
162 0.02% 98.15% 1 300161 org.apache.lucene.util.SimpleStringInterner.<init>
163 0.02% 98.17% 1 300167 java.lang.ClassLoader.defineClass1
164 0.02% 98.19% 1 300168 org.apache.lucene.index.SegmentReader$CoreReaders.<init>
165 0.02% 98.21% 1 300170 org.apache.lucene.index.SegmentTermEnum.<init>
166 0.02% 98.22% 1 300174 java.util.zip.Inflater.inflateBytes
167 0.02% 98.24% 1 300177 java.security.AccessController.doPrivileged
168 0.02% 98.26% 1 300181 java.util.zip.Inflater.inflateBytes
169 0.02% 98.28% 1 300182 java.util.Arrays.copyOf
170 0.02% 98.30% 1 300183 java.util.Arrays.copyOf
171 0.02% 98.32% 1 300184 java.lang.ClassLoader.defineClass1
172 0.02% 98.34% 1 300185 java.lang.ref.SoftReference.get
173 0.02% 98.36% 1 300186 java.lang.String.replace
174 0.02% 98.38% 1 300187 org.apache.lucene.index.SegmentReader.openNorms
175 0.02% 98.39% 1 300188 java.nio.charset.CharsetDecoder.flush
176 0.02% 98.41% 1 300192 sun.nio.cs.UTF_8$Decoder.decodeLoop
177 0.02% 98.43% 1 300193 java.lang.StringCoding.decode
178 0.02% 98.45% 1 300194 java.lang.System.arraycopy
179 0.02% 98.47% 1 300195 sun.nio.cs.UTF_8$Decoder.decodeArrayLoop
180 0.02% 98.49% 1 300196 java.util.HashMap.hash
181 0.02% 98.51% 1 300200 sun.nio.cs.UTF_8$Decoder.isMalformed2
182 0.02% 98.53% 1 300204 java.lang.System.arraycopy
183 0.02% 98.55% 1 300205 java.io.RandomAccessFile.open
184 0.02% 98.56% 1 300206 org.apache.lucene.util.SimpleStringInterner.intern
185 0.02% 98.58% 1 300208 java.nio.charset.CoderResult.isUnderflow
186 0.02% 98.60% 1 300211 org.apache.lucene.util.SimpleStringInterner.intern
187 0.02% 98.62% 1 300212 java.util.HashMap.addEntry
188 0.02% 98.64% 1 300213 org.apache.lucene.store.IndexInput.readVInt
189 0.02% 98.66% 1 300214 java.util.ArrayList.RangeCheck
190 0.02% 98.68% 1 300218 org.apache.lucene.index.FieldInfos.read
191 0.02% 98.70% 1 300219 java.lang.Thread.currentThread
192 0.02% 98.72% 1 300221 java.util.HashMap.transfer
193 0.02% 98.73% 1 300223 java.lang.System.arraycopy
194 0.02% 98.75% 1 300226 org.apache.lucene.store.IndexInput.readString
195 0.02% 98.77% 1 300227 org.apache.lucene.store.BufferedIndexInput.readBytes
196 0.02% 98.79% 1 300228 java.util.ArrayList.size
197 0.02% 98.81% 1 300229 java.lang.StringCoding.access$100
198 0.02% 98.83% 1 300230 java.lang.Thread.currentThread
199 0.02% 98.85% 1 300233 java.lang.Throwable.fillInStackTrace
200 0.02% 98.87% 1 300236 java.lang.StringCoding.decode
201 0.02% 98.89% 1 300244 sun.nio.ch.FileDispatcher.pread0
202 0.02% 98.90% 1 300245 sun.nio.ch.NativeThread.current
203 0.02% 98.92% 1 300251 java.util.HashMap.getEntry
204 0.02% 98.94% 1 300259 java.nio.Bits.copyToByteArray
205 0.02% 98.96% 1 300260 java.nio.DirectByteBuffer.get
206 0.02% 98.98% 1 300263 java.lang.Thread.currentThread
207 0.02% 99.00% 1 300265 sun.nio.ch.FileChannelImpl.read
208 0.02% 99.02% 1 300266 java.nio.channels.spi.AbstractInterruptibleChannel.begin
209 0.02% 99.04% 1 300268 sun.nio.ch.NativeThread.current
210 0.02% 99.06% 1 300269 sun.nio.ch.FileChannelImpl.read
211 0.02% 99.07% 1 300270 java.nio.Bits.copyToByteArray
212 0.02% 99.09% 1 300271 java.lang.Thread.currentThread
213 0.02% 99.11% 1 300272 java.lang.Object.clone
214 0.02% 99.13% 1 300273 org.apache.lucene.index.TermInfosReader.<init>
215 0.02% 99.15% 1 300274 org.apache.lucene.index.TermInfosReader.<init>
216 0.02% 99.17% 1 300278 org.apache.lucene.index.SegmentTermEnum.next
217 0.02% 99.19% 1 300279 java.nio.channels.spi.AbstractInterruptibleChannel.isOpen
218 0.02% 99.21% 1 300280 java.lang.Thread.isInterrupted
219 0.02% 99.23% 1 300281 java.lang.Thread.currentThread
220 0.02% 99.24% 1 300284 java.lang.System.arraycopy
221 0.02% 99.26% 1 300285 java.lang.System.arraycopy
222 0.02% 99.28% 1 300287 java.lang.Thread.currentThread
223 0.02% 99.30% 1 300288 java.nio.Bits.copyToByteArray
224 0.02% 99.32% 1 300289 org.apache.lucene.store.BufferedIndexInput.readBytes
225 0.02% 99.34% 1 300291 org.apache.lucene.index.CompoundFileReader$CSIndexInput.readInternal
226 0.02% 99.36% 1 300293 java.lang.Thread.isInterrupted
227 0.02% 99.38% 1 300294 org.apache.lucene.index.CompoundFileReader$CSIndexInput.readInternal
228 0.02% 99.40% 1 300296 org.apache.lucene.store.BufferedIndexInput.readByte
229 0.02% 99.41% 1 300298 sun.nio.ch.FileChannelImpl.read
230 0.02% 99.43% 1 300301 java.lang.Thread.currentThread
231 0.02% 99.45% 1 300302 sun.nio.ch.FileChannelImpl.ensureOpen
232 0.02% 99.47% 1 300303 sun.nio.ch.FileDispatcher.pread0
233 0.02% 99.49% 1 300305 sun.nio.ch.FileDispatcher.pread0
234 0.02% 99.51% 1 300306 sun.nio.ch.NativeThread.current
235 0.02% 99.53% 1 300307 org.apache.lucene.index.TermBuffer.toTerm
236 0.02% 99.55% 1 300308 org.apache.lucene.store.BufferedIndexInput.refill
237 0.02% 99.57% 1 300310 sun.nio.ch.FileChannelImpl.read
238 0.02% 99.58% 1 300312 sun.nio.ch.NativeThread.current
239 0.02% 99.60% 1 300313 sun.nio.ch.FileChannelImpl.read
240 0.02% 99.62% 1 300314 sun.nio.ch.NativeThread.current
241 0.02% 99.64% 1 300315 org.apache.lucene.store.BufferedIndexInput.refill
242 0.02% 99.66% 1 300316 org.apache.lucene.store.BufferedIndexInput.refill
243 0.02% 99.68% 1 300317 sun.nio.ch.FileChannelImpl.read
244 0.02% 99.70% 1 300320 org.apache.lucene.index.TermBuffer.read
245 0.02% 99.72% 1 300322 org.apache.lucene.store.BufferedIndexInput.refill
246 0.02% 99.74% 1 300325 org.apache.lucene.index.CompoundFileReader$CSIndexInput.readInternal
247 0.02% 99.75% 1 300327 org.apache.lucene.store.IndexInput.readVInt
248 0.02% 99.77% 1 300328 org.apache.lucene.store.BufferedIndexInput.refill
249 0.02% 99.79% 1 300329 java.nio.Bits.copyToByteArray
250 0.02% 99.81% 1 300330 org.apache.lucene.store.BufferedIndexInput.refill
251 0.02% 99.83% 1 300331 sun.nio.ch.NativeThread.current
252 0.02% 99.85% 1 300333 sun.misc.Unsafe.setMemory
253 0.02% 99.87% 1 300336 org.apache.lucene.index.TermInfosReader.<init>
254 0.02% 99.89% 1 300337 java.lang.Thread.currentThread
255 0.02% 99.91% 1 300342 org.apache.lucene.index.FieldInfos.read
256 0.02% 99.92% 1 300343 org.apache.lucene.store.IndexInput.readString
257 0.02% 99.94% 1 300345 sun.nio.ch.IOUtil.read
258 0.02% 99.96% 1 300347 java.nio.channels.spi.AbstractInterruptibleChannel.begin
259 0.02% 99.98% 1 300348 org.apache.lucene.index.FieldInfos.addInternal
260 0.02% 100.00% 1 300349 java.nio.channels.spi.AbstractInterruptibleChannel.begin
CPU SAMPLES END
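
To check whether String.intern alone could account for a profile like this, independent of Lucene, here is a minimal standalone sketch. It assumes the cost comes from interning a large number of distinct strings, roughly one per unique field name (296,713 per the CheckIndex output quoted below). The class name, the "field_" prefix, and the helper methods are made up for illustration; this is not how Lucene itself calls intern.

```java
import java.util.ArrayList;
import java.util.List;

public class InternTiming {
    // Builds n distinct strings at runtime. The "field_" prefix is a
    // hypothetical stand-in for real field names.
    static List<String> uniqueNames(int n) {
        List<String> names = new ArrayList<String>(n);
        for (int i = 0; i < n; i++) {
            names.add("field_" + i);
        }
        return names;
    }

    // Interns every name and returns the elapsed wall-clock time in nanoseconds.
    static long timeIntern(List<String> names) {
        long start = System.nanoTime();
        for (String name : names) {
            name.intern();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Default count mirrors the unique field count reported in this thread.
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 296713;
        List<String> names = uniqueNames(n);
        long nanos = timeIntern(names);
        System.out.println("Interned " + n + " unique strings in "
                + (nanos / 1000000L) + " ms");
    }
}
```

Running it with a few different counts on the same JVM that opens the index would show whether intern time grows faster than linearly there. The result will depend on the JVM version and settings, so treat it as a rough probe rather than a benchmark.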
On Nov 5, 2010, at 10:53 AM, Michael McCandless wrote:
> Hmm...
>
> So, I was going on this output from your CheckIndex:
>
> test: field norms.........OK [296713 fields]
>
> But in fact I just looked and that number is bogus -- it's always
> equal to total number of fields, not number of fields with norms
> enabled. I'll open an issue to fix this, but in the meantime can you
> apply this patch to your CheckIndex and run it again?
>
> Index: src/java/org/apache/lucene/index/CheckIndex.java
> ===================================================================
> --- src/java/org/apache/lucene/index/CheckIndex.java (revision 1031678)
> +++ src/java/org/apache/lucene/index/CheckIndex.java (working copy)
> @@ -570,8 +570,10 @@
> }
> final byte[] b = new byte[reader.maxDoc()];
> for (final String fieldName : fieldNames) {
> - reader.norms(fieldName, b, 0);
> - ++status.totFields;
> + if (reader.hasNorms(fieldName)) {
> + reader.norms(fieldName, b, 0);
> + ++status.totFields;
> + }
> }
>
> msg("OK [" + status.totFields + " fields]");
>
> So if in fact you have already disabled norms then something else is
> the source of the sudden slowness. Though, such a huge number of
> unique field names is not an area of Lucene that's very well tested...
> perhaps there's something silly somewhere. Maybe you can try
> profiling just the init of your IndexReader? (Eg, run java with
> -agentlib:hprof=cpu=samples,depth=16,interval=1).
>
> Yes, both Index.NOT_ANALYZED_NO_NORMS and Index.NO will disable norms
> as long as no document in the index ever had norms on (yes it does
> "infect" heh).
>
> Mike
>
> On Fri, Nov 5, 2010 at 1:37 PM, Mark Kristensson
> <[email protected]> wrote:
>> While most of our Lucene indexes are used for more traditional searching,
>> this index in particular is used more like a reporting repository. Thus, we
>> really do need to have that many fields indexed and they do need to be
>> broken out into separate fields. There may be another way to structure the
>> index to reduce the number of fields, but I'm hoping we can optimize the
>> current design and avoid (yet another) index redesign.
>>
>> I'll look into tweaking the merge policy, but I'm more interested in
>> disabling norms because scoring really doesn't matter for us. Basically, we
>> need nothing more than a binary answer from Lucene: either a record meets
>> the provided criteria (which can be a rather complex boolean query with many
>> subqueries) or it doesn't. If the record does match, then we get the IDs
>> from Lucene and run off to get the live data from our primary data store and
>> sort it (in Java) based upon criteria provided by the user, not by score.
>>
>> After our initial design mushroomed in size, we redesigned and now (I
>> thought) do not have norms on any of the fields in this index. So, I'm
>> wondering if there was something in the results from the CheckIndex that I
>> provided which indicates to you that we may have norms still enabled? I know
>> that if you have norms on any one document's field, then any other document
>> with that same field will get "infected" with norms as well.
>>
>> My understanding is that any field that uses the constants
>> Index.NOT_ANALYZED_NO_NORMS or Index.NO will not have norms on it,
>> regardless of whether or not the field is stored. Is that not correct?
>>
>> Thanks,
>> Mark
>>
>>
>>
>> On Nov 4, 2010, at 2:56 AM, Michael McCandless wrote:
>>
>>> Likely what happened is you had a bunch of smaller segments, and then
>>> suddenly they got merged into that one big segment (_aiaz) in your
>>> index.
>>>
>>> The representation for norms in particular is not sparse, so this
>>> means the size of the norms file for a given segment will be
>>> number-of-unique-indexed-fields X number-of-documents.
>>>
>>> So this count grows quadratically on merge.
>>>
>>> Do these fields really need to be indexed? If so, it'd be better to
>>> use a single field for all users for the indexable text if you can.
>>>
>>> Failing that, a simple workaround is to set the maxMergeMB/Docs on the
>>> merge policy; this'd prevent big segments from being produced.
>>> Disabling norms should also work around this, though that will affect
>>> hit scores...
>>>
>>> Mike
>>>
>>> On Wed, Nov 3, 2010 at 7:37 PM, Mark Kristensson
>>> <[email protected]> wrote:
>>>> Yes, we do have a large number of unique field names in that index,
>>>> because they are driven by user named fields in our application (with some
>>>> cleaning to remove illegal chars).
>>>>
>>>> This slowness problem has appeared very suddenly in the last couple of
>>>> weeks and the number of unique field names has not spiked in the last few
>>>> weeks. Have we crept over some threshold with our linear growth in the
>>>> number of unique field names? Perhaps there is a limit driven by the
>>>> amount of RAM in the machine that we are violating? Are there any
>>>> guidelines for the maximum number, or suggested number, of unique field
>>>> names in an index or segment? Any suggestions for potentially mitigating
>>>> the problem?
>>>>
>>>> Thanks,
>>>> Mark
>>>>
>>>>
>>>> On Nov 3, 2010, at 2:02 PM, Michael McCandless wrote:
>>>>
>>>>> On Wed, Nov 3, 2010 at 4:27 PM, Mark Kristensson
>>>>> <[email protected]> wrote:
>>>>>>
>>>>>> I've run CheckIndex against the index and the results are below. The
>>>>>> net is that it's telling me nothing is wrong with the index.
>>>>>
>>>>> Thanks.
>>>>>
>>>>>> I did not have any instrumentation around the opening of the
>>>>>> IndexSearcher (we don't use an IndexReader), just around the actual
>>>>>> query execution so I had to add some additional logging. What I found
>>>>>> surprised me: opening a searcher against this index takes the same 6 to 8
>>>>>> seconds that closing the IndexWriter takes.
>>>>>
>>>>> IndexWriter opens a SegmentReader for each segment in the index, to
>>>>> apply deletions, so I think this is the source of the slowness.
>>>>>
>>>>> From the CheckIndex output, it looks like you have many (296,713)
>>>>> unique field names on that one large segment -- does that sound
>>>>> right? I suspect such a very high field count is the source of the
>>>>> slowness...
>>>>>
>>>>> Mike
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: [email protected]
>>>>> For additional commands, e-mail: [email protected]
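
The norms sizing rule quoted in this thread (number-of-unique-indexed-fields X number-of-documents) can be made concrete with a small calculation. This is only a sketch: it assumes one norm byte per document for each field that has norms, which matches the byte[reader.maxDoc()] buffer in the CheckIndex patch above. The field count is the one reported by CheckIndex; the document count is a hypothetical value, since the thread does not give one.

```java
public class NormsEstimate {
    // Estimated norms size for one segment, assuming one byte per document
    // for every indexed field that has norms enabled.
    static long normsBytes(long fieldsWithNorms, long docs) {
        return fieldsWithNorms * docs;
    }

    public static void main(String[] args) {
        long fields = 296713L;  // unique field count reported by CheckIndex in this thread
        long docs = 1000000L;   // hypothetical document count, not from the thread
        long bytes = normsBytes(fields, docs);
        System.out.println(fields + " fields x " + docs + " docs = " + bytes + " bytes");
    }
}
```

If norms really are disabled on every field, fieldsWithNorms is zero and this estimate drops out entirely, which is consistent with looking elsewhere for the slowdown.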