Re: DiskDocValuesFormat

Wei Wang Sun, 14 Apr 2013 16:48:51 -0700

Unfortunately, I ran into another problem. My index has 9 segments (9 dvdd
files) with a total size of about 22GB. The merging step eventually failed
with this error message:


Exception in thread "main" java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot complete forceMerge
    at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1664)
    at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1610)
    at com.ea.eadp.data.aem.audience.indexer.tools.IndexingTool.mergeIndex(IndexingTool.java:196)
    at com.ea.eadp.data.aem.audience.indexer.tools.AudienceIndexer.main(AudienceIndexer.java:46)
Exception in thread "Lucene Merge Thread #0" org.apache.lucene.index.MergePolicy$MergeException: java.lang.OutOfMemoryError: Java heap space
    at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:541)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:514)

I configured the JVM with "-Xmx4096m", but that still does not seem to be
enough memory. I thought DiskDocValuesFormat keeps most of the data on disk,
so there should not be that much memory consumption. But that does not seem
to be the case.
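
A quick sanity check before digging into the merge itself is to confirm that
the "-Xmx4096m" setting actually reached the JVM that runs forceMerge. The
sketch below uses only the standard java.lang.Runtime API. The class name
HeapCheck and the helper toMiB are hypothetical names introduced for
illustration, and the sketch says nothing about where the merge allocates
its memory.

```java
public class HeapCheck {
    /** Converts a byte count to whole mebibytes (hypothetical helper). */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() is the upper bound the JVM will try to use for the heap;
        // with -Xmx4096m it should report roughly 4096 MiB (the exact figure
        // depends on the JVM and garbage collector).
        System.out.println("max heap (MiB):   " + toMiB(rt.maxMemory()));
        // totalMemory() is what the JVM has currently reserved for the heap.
        System.out.println("total heap (MiB): " + toMiB(rt.totalMemory()));
        // freeMemory() is the unused part of the currently reserved heap.
        System.out.println("free heap (MiB):  " + toMiB(rt.freeMemory()));
    }
}
```

Running this with the same launcher and options as the indexing tool shows
whether the heap limit is the one that was intended.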

On Sun, Apr 14, 2013 at 4:13 PM, Wei Wang <[email protected]> wrote:

> That makes sense.
>
> BTW, I checked the jar file. Exactly as you pointed out, the services
> files only contain entries from lucene-core, without the codecs from
> lucene-codecs. After adding the Maven plugin, it now runs.
>
> Thanks!
>
>
> On Sun, Apr 14, 2013 at 3:26 PM, Uwe Schindler <[email protected]> wrote:
>
>> Hi,
>>
>> > Thanks for the hint. I will double check the jar file.
>> >
>> > I am just a bit puzzled: if the indexing step recognizes the 'Disk' codec
>> > and creates the index properly, the merge step that immediately follows
>> > indexing should also recognize the 'Disk' codec.
>>
>> This is easy to explain: By creating the custom Lucene42 Codec as a
>> class, you just define the disk format on the initial write (when *new*
>> segments are written with new documents). While merging (or force-merging),
>> Lucene uses the metadata that’s already on disk for the segments to merge.
>> The metadata on disk contains the names of all codec components used. This
>> metadata is also used when opening IndexReaders. Lucene will then use SPI and
>> META-INF/services files to look up the class that is responsible for e.g.
>> the "Disk" docvalues format. Without the META-INF data, Lucene cannot
>> look up the segment codecs.
>>
>> Uwe
>>
>> > On Sun, Apr 14, 2013 at 3:03 PM, Uwe Schindler <[email protected]> wrote:
>> >
>> > > Are you sure that you use the ServicesResourceTransformer in your
>> > > shade config?
>> > >
>> > >
>> > > http://maven.apache.org/plugins/maven-shade-plugin/examples/resource-transformers.html#ServicesResourceTransformer
>> > >
>> > > The problem is: lucene-core.jar and lucene-codecs.jar both contain
>> > > codec components and their classes are listed in META-INF/services. If
>> > > those files are not correctly merged through this resource
>> > > transformer, the resulting JAR file will miss some codecs.
>> > >
>> > > You can check correctness by opening the final JAR file with a ZIP
>> > > program and checking that all files in META-INF/services contain all
>> > > entries merged from all Lucene JARs.
>> > >
>> > > Uwe
>> > >
>> > > -----
>> > > Uwe Schindler
>> > > H.-H.-Meier-Allee 63, D-28213 Bremen
>> > > http://www.thetaphi.de
>> > > eMail: [email protected]
>> > >
>> > >
>> > > > -----Original Message-----
>> > > > From: Wei Wang [mailto:[email protected]]
>> > > > Sent: Sunday, April 14, 2013 11:49 PM
>> > > > To: [email protected]
>> > > > Subject: Re: DiskDocValuesFormat
>> > > >
>> > > > Yes, I used Maven Shade plugin, but still have this problem. Here is
>> > > > the Maven output during packaging:
>> > > >
>> > > > [INFO] --- maven-shade-plugin:2.0:shade (default) @ audience-profile-indexer ---
>> > > > [INFO] Including commons-collections:commons-collections:jar:3.2.1 in the shaded jar.
>> > > > [INFO] Including org.mockito:mockito-core:jar:1.9.5 in the shaded jar.
>> > > > [INFO] Including org.hamcrest:hamcrest-core:jar:1.1 in the shaded jar.
>> > > > [INFO] Including org.objenesis:objenesis:jar:1.0 in the shaded jar.
>> > > > [INFO] Including junit:junit:jar:4.11 in the shaded jar.
>> > > > [INFO] Including log4j:log4j:jar:1.2.17 in the shaded jar.
>> > > > [INFO] Including org.apache.lucene:lucene-core:jar:4.2.1 in the shaded jar.
>> > > > [INFO] Including org.apache.lucene:lucene-queries:jar:4.2.1 in the shaded jar.
>> > > > [INFO] Including org.apache.lucene:lucene-queryparser:jar:4.2.1 in the shaded jar.
>> > > > [INFO] Including org.apache.lucene:lucene-sandbox:jar:4.2.1 in the shaded jar.
>> > > > [INFO] Including jakarta-regexp:jakarta-regexp:jar:1.4 in the shaded jar.
>> > > > [INFO] Including org.apache.lucene:lucene-analyzers-common:jar:4.2.1 in the shaded jar.
>> > > > [INFO] Including org.apache.lucene:lucene-codecs:jar:4.2.1 in the shaded jar.
>> > > > [INFO] Including commons-lang:commons-lang:jar:2.6 in the shaded jar.
>> > > > [INFO] Including commons-logging:commons-logging:jar:1.1.1 in the shaded jar.
>> > > > [INFO] Including commons-io:commons-io:jar:2.4 in the shaded jar.
>> > > > [INFO] Replacing original artifact with shaded artifact.
>> > > >
>> > > > On Sun, Apr 14, 2013 at 2:40 PM, Uwe Schindler <[email protected]> wrote:
>> > > >
>> > > > > If you create a single JAR file out of multiple Lucene JAR files,
>> > > > > use a tool like the Maven Shade plugin; otherwise, the required
>> > > > > metadata files (META-INF/services) in the JAR files are not
>> > > > > correctly merged together.
>> > > > >
>> > > > > > -----Original Message-----
>> > > > > > From: Wei Wang [mailto:[email protected]]
>> > > > > > Sent: Sunday, April 14, 2013 11:30 PM
>> > > > > > To: [email protected]
>> > > > > > Subject: Re: DiskDocValuesFormat
>> > > > > >
>> > > > > > Hi Adrien,
>> > > > > >
>> > > > > > The Lucene42Codec works well to generate the index with
>> > > > > > DiskDocValuesFormat. But when I tried to merge the index segments
>> > > > > > by calling:
>> > > > > >
>> > > > > > IndexWriter iw = new IndexWriter(directory, iw_config); ...
>> > > > > > iw.forceMerge(1);
>> > > > > >
>> > > > > > I got the following error message:
>> > > > > >
>> > > > > > Caused by: java.lang.IllegalArgumentException: A SPI class of type
>> > > > > > org.apache.lucene.codecs.DocValuesFormat with name 'Disk' does not exist.
>> > > > > > You need to add the corresponding JAR file supporting this SPI to
>> > > > > > your classpath.The current classpath supports the following names:
>> > > > > > [Lucene42]
>> > > > > >
>> > > > > > Any hint on this classpath problem? I have created a single jar file
>> > > > > > that has all necessary dependencies, such as lucene-codecs-4.2.0.jar.
>> > > > > > And I assume the indexing step works well, so Lucene already knows
>> > > > > > the format with name 'Disk'.
>> > > > > >
>> > > > > > Thanks.
>> > > > > >
>> > > > > > On Sat, Apr 13, 2013 at 4:25 AM, Adrien Grand <[email protected]> wrote:
>> > > > > >
>> > > > > > > Hi Wei,
>> > > > > > >
>> > > > > > > On Sat, Apr 13, 2013 at 7:44 AM, Wei Wang <[email protected]> wrote:
>> > > > > > > > I am trying to use DiskDocValuesFormat for a particular
>> > > > > > > > BinaryDocValuesField. There seem to be no good examples showing
>> > > > > > > > how to do this. The only hint I got from various docs and forums
>> > > > > > > > is to set some codec in IndexWriter. Could someone give a few
>> > > > > > > > lines of code showing how to set DiskDocValuesFormat?
>> > > > > > >
>> > > > > > > Lucene42Codec can be extended to specify the doc values format to
>> > > > > > > use on a per-field basis. For example:
>> > > > > > >
>> > > > > > > final Codec codec = new Lucene42Codec() {
>> > > > > > >   final Lucene42DocValuesFormat memoryDVFormat = new Lucene42DocValuesFormat();
>> > > > > > >   final DiskDocValuesFormat diskDVFormat = new DiskDocValuesFormat();
>> > > > > > >   @Override
>> > > > > > >   public DocValuesFormat getDocValuesFormatForField(String field) {
>> > > > > > >     if ("dv_mem".equals(field)) {
>> > > > > > >       // use Lucene42 for "dv_mem"
>> > > > > > >       return memoryDVFormat;
>> > > > > > >     } else {
>> > > > > > >       // use Disk otherwise
>> > > > > > >       return diskDVFormat;
>> > > > > > >     }
>> > > > > > >   }
>> > > > > > > };
>> > > > > > >
>> > > > > > > Then just pass this Codec instance to your IndexWriterConfig.
>> > > > > > >
>> > > > > > > --
>> > > > > > > Adrien
>> > > > > > >
>> > > > > > >
>> > > > > > > ---------------------------------------------------------------------
>> > > > > > > To unsubscribe, e-mail: [email protected]
>> > > > > > > For additional commands, e-mail: java-user-[email protected]
>> > > > > > >
>> > > > > > >
>
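
The JAR check suggested in the quoted thread (opening the final JAR and
confirming that the files in META-INF/services contain the entries from all
Lucene JARs) can also be scripted. The following is a minimal sketch using
only the JDK's java.util.jar API. The class name ServicesLister and the
method listServices are hypothetical names introduced here; they are not
part of Lucene or Maven.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class ServicesLister {
    static final String PREFIX = "META-INF/services/";

    /** Maps each service file name in the JAR to the class names it lists. */
    static Map<String, List<String>> listServices(String jarPath) throws IOException {
        Map<String, List<String>> result = new TreeMap<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                String name = entry.getName();
                if (entry.isDirectory() || !name.startsWith(PREFIX)) {
                    continue; // only service provider files are of interest
                }
                List<String> impls = new ArrayList<>();
                try (BufferedReader in = new BufferedReader(new InputStreamReader(
                        jar.getInputStream(entry), StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = in.readLine()) != null) {
                        // Strip '#' comments and blank lines, as ServiceLoader does.
                        int hash = line.indexOf('#');
                        if (hash >= 0) {
                            line = line.substring(0, hash);
                        }
                        line = line.trim();
                        if (!line.isEmpty()) {
                            impls.add(line);
                        }
                    }
                }
                result.put(name.substring(PREFIX.length()), impls);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ServicesLister <path-to-jar>");
            return;
        }
        for (Map.Entry<String, List<String>> e : listServices(args[0]).entrySet()) {
            System.out.println(e.getKey());
            for (String impl : e.getValue()) {
                System.out.println("  " + impl);
            }
        }
    }
}
```

Run against the shaded JAR, the entry for
org.apache.lucene.codecs.DocValuesFormat should list implementations from
both lucene-core and lucene-codecs. If only the lucene-core ones appear, the
service files were overwritten rather than merged.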
