Re: Solr4 - no examples of postingsFormat in schema.xml
On 10/15/2012 12:38 PM, Alan Woodward wrote:
> See discussion on https://issues.apache.org/jira/browse/SOLR-3843, this was apparently intentional.
>
> That also links to the following: http://wiki.apache.org/solr/SolrConfigXml#codecFactory, which suggests you need to use solr.SchemaCodecFactory for per-field codecs - this might solve your postingsFormat exception.

I already added this to my solrconfig.xml as a top-level element:

Once I added this, I tried Bloom, but I had an incorrect name. That resulted in this error, showing that the codecFactory config element gave me more choices than Lucene40 and Lucene41:

SEVERE: java.lang.IllegalArgumentException: A SPI class of type org.apache.lucene.codecs.PostingsFormat with name 'Bloom' does not exist. You need to add the corresponding JAR file supporting this SPI to your classpath.The current classpath supports the following names: [Lucene40, Lucene41, Pulsing41, SimpleText, Memory, BloomFilter, Direct]

Once I got that, I knew I had made some progress, so I changed it to BloomFilter and got the error in the previous message. Repasting here without the full stacktrace:

SEVERE: null:java.lang.UnsupportedOperationException: Error - org.apache.lucene.codecs.bloom.BloomFilteringPostingsFormat has been constructed without a choice of PostingsFormat

Based on that error message, along with something I remember reading during my Google travels, I suspect that not all codecs (BloomFilter being a prime example) have whatever corresponding Solr bits are required.

Thanks,
Shawn
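The element pasted into the message above did not survive in this archive. Going by the wiki page Alan cites, which names solr.SchemaCodecFactory as the factory for per-field codecs, it was presumably something like the sketch below. The surrounding `<config>` wrapper and the comment are only there to show placement and are an assumption about the rest of the file.

```xml
<config>
  <!-- ... other solrconfig.xml settings ... -->

  <!-- Top-level element (direct child of <config>): lets schema.xml
       choose a postings format per fieldType. Presumed content of the
       stripped snippet, not a verbatim copy of it. -->
  <codecFactory class="solr.SchemaCodecFactory"/>
</config>
```

With this in place, the postingsFormat value in schema.xml has to match one of the SPI names reported in the error message exactly (for example BloomFilter rather than Bloom).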
Re: Solr4 - no examples of postingsFormat in schema.xml
See discussion on https://issues.apache.org/jira/browse/SOLR-3843, this was apparently intentional.

That also links to the following: http://wiki.apache.org/solr/SolrConfigXml#codecFactory, which suggests you need to use solr.SchemaCodecFactory for per-field codecs - this might solve your postingsFormat exception.

On 15 Oct 2012, at 18:41, Alan Woodward wrote:

>> This should not be required, because I am building from source. I compiled Solr from lucene-solr source checked out from branch_4x. I grepped the entire tree for lucene-codec and found nothing.
>>
>> It turns out that running 'ant generate-maven-artifacts' created the jar file -- along with a huge number of other jars that I don't need. It took an extremely long time to run, for a jar that's a little over 300KB.
>>
>> I would argue that the codecs jar should be created by compiling a dist target for Solr. Someone else should determine whether it's appropriate to put it in the .war file, but I think it's important enough to make available without compiling everything in the Lucene universe.
>
> I agree - it looks as though the codecs module wasn't added to the solr build when it was split off. I've created a JIRA ticket (https://issues.apache.org/jira/browse/SOLR-3947) and added a patch.
>
> On the error below, I'll have to defer to someone who knows how this actually works...
>
>> I put this jar in my lib, and now I get a new error when I try the BloomFilter postingsFormat:
>>
>> SEVERE: null:java.lang.UnsupportedOperationException: Error - org.apache.lucene.codecs.bloom.BloomFilteringPostingsFormat has been constructed without a choice of PostingsFormat
>>     at org.apache.lucene.codecs.bloom.BloomFilteringPostingsFormat.fieldsConsumer(BloomFilteringPostingsFormat.java:139)
>>     at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.addField(PerFieldPostingsFormat.java:130)
>>     at org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:335)
>>     at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85)
>>     at org.apache.lucene.index.TermsHash.flush(TermsHash.java:117)
>>     at org.apache.lucene.index.DocInverter.flush(DocInverter.java:53)
>>     at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:82)
>>     at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:483)
>>     at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:422)
>>     at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:559)
>>     at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2656)
>>     at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2792)
>>     at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2772)
>>     at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:525)
>>     at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:87)
>>     at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
>>     at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1007)
>>     at org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
>>     at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
>>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1750)
>>     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455)
>>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
Re: Solr4 - no examples of postingsFormat in schema.xml
> This should not be required, because I am building from source. I compiled Solr from lucene-solr source checked out from branch_4x. I grepped the entire tree for lucene-codec and found nothing.
>
> It turns out that running 'ant generate-maven-artifacts' created the jar file -- along with a huge number of other jars that I don't need. It took an extremely long time to run, for a jar that's a little over 300KB.
>
> I would argue that the codecs jar should be created by compiling a dist target for Solr. Someone else should determine whether it's appropriate to put it in the .war file, but I think it's important enough to make available without compiling everything in the Lucene universe.

I agree - it looks as though the codecs module wasn't added to the solr build when it was split off. I've created a JIRA ticket (https://issues.apache.org/jira/browse/SOLR-3947) and added a patch.

On the error below, I'll have to defer to someone who knows how this actually works...

> I put this jar in my lib, and now I get a new error when I try the BloomFilter postingsFormat:
>
> SEVERE: null:java.lang.UnsupportedOperationException: Error - org.apache.lucene.codecs.bloom.BloomFilteringPostingsFormat has been constructed without a choice of PostingsFormat
>     at org.apache.lucene.codecs.bloom.BloomFilteringPostingsFormat.fieldsConsumer(BloomFilteringPostingsFormat.java:139)
>     at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.addField(PerFieldPostingsFormat.java:130)
>     at org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:335)
>     at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85)
>     at org.apache.lucene.index.TermsHash.flush(TermsHash.java:117)
>     at org.apache.lucene.index.DocInverter.flush(DocInverter.java:53)
>     at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:82)
>     at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:483)
>     at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:422)
>     at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:559)
>     at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2656)
>     at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2792)
>     at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2772)
>     at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:525)
>     at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:87)
>     at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
>     at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1007)
>     at org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
>     at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1750)
>     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
Re: Solr4 - no examples of postingsFormat in schema.xml
On 10/15/2012 2:47 AM, Alan Woodward wrote:
> The extra codecs are supplied in a separate jar file now (lucene-codecs-4.0.0.jar) - I guess this isn't being packaged into solr.war by default? You should be able to download it here: http://search.maven.org/remotecontent?filepath=org/apache/lucene/lucene-codecs/4.0.0/lucene-codecs-4.0.0-javadoc.jar and drop it into the lib/ directory.

This should not be required, because I am building from source. I compiled Solr from lucene-solr source checked out from branch_4x. I grepped the entire tree for lucene-codec and found nothing.

It turns out that running 'ant generate-maven-artifacts' created the jar file -- along with a huge number of other jars that I don't need. It took an extremely long time to run, for a jar that's a little over 300KB.

I would argue that the codecs jar should be created by compiling a dist target for Solr. Someone else should determine whether it's appropriate to put it in the .war file, but I think it's important enough to make available without compiling everything in the Lucene universe.

ncindex@bigindy5 /index/src/branch_4x $ find . | grep "\.jar$" | grep codec
./solr/core/lib/commons-codec-1.7.jar
./dist/maven/org/apache/lucene/lucene-codecs/4.1-SNAPSHOT/lucene-codecs-4.1-20121015.165734-1.jar
./dist/maven/org/apache/lucene/lucene-codecs/4.1-SNAPSHOT/lucene-codecs-4.1-20121015.165734-1-javadoc.jar
./dist/maven/org/apache/lucene/lucene-codecs/4.1-SNAPSHOT/lucene-codecs-4.1-20121015.165734-1-sources.jar
./lucene/analysis/phonetic/lib/commons-codec-1.7.jar
./lucene/build/codecs/lucene-codecs-4.1-SNAPSHOT.jar
./lucene/build/codecs/lucene-codecs-4.1-SNAPSHOT-javadoc.jar
./lucene/build/codecs/lucene-codecs-4.1-SNAPSHOT-src.jar

I put this jar in my lib, and now I get a new error when I try the BloomFilter postingsFormat:

SEVERE: null:java.lang.UnsupportedOperationException: Error - org.apache.lucene.codecs.bloom.BloomFilteringPostingsFormat has been constructed without a choice of PostingsFormat
    at org.apache.lucene.codecs.bloom.BloomFilteringPostingsFormat.fieldsConsumer(BloomFilteringPostingsFormat.java:139)
    at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.addField(PerFieldPostingsFormat.java:130)
    at org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:335)
    at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85)
    at org.apache.lucene.index.TermsHash.flush(TermsHash.java:117)
    at org.apache.lucene.index.DocInverter.flush(DocInverter.java:53)
    at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:82)
    at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:483)
    at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:422)
    at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:559)
    at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2656)
    at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2792)
    at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2772)
    at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:525)
    at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:87)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1007)
    at org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1750)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
Re: Solr4 - no examples of postingsFormat in schema.xml
The extra codecs are supplied in a separate jar file now (lucene-codecs-4.0.0.jar) - I guess this isn't being packaged into solr.war by default? You should be able to download it here: http://search.maven.org/remotecontent?filepath=org/apache/lucene/lucene-codecs/4.0.0/lucene-codecs-4.0.0-javadoc.jar and drop it into the lib/ directory.

On 15 Oct 2012, at 00:49, Shawn Heisey wrote:

> On 10/14/2012 3:21 PM, Rafał Kuć wrote:
>> Hello!
>>
>> Try adding the following to solrconfig.xml:
>>
>
> I did this and got a little further, but still no go. From what it's saying now, I don't think it will be possible in the current state of branch_4x to use anything but the default.
>
> SEVERE: null:java.lang.IllegalArgumentException: A SPI class of type org.apache.lucene.codecs.PostingsFormat with name 'Block' does not exist. You need to add the corresponding JAR file supporting this SPI to your classpath.The current classpath supports the following names: [Lucene40]
>
> I saw that LUCENE-4446 was applied to branch_4x a few hours ago. I did 'svn up' and rebuilt Solr. Trying again, it appears to be using Lucene41, which I believe is the Block format. But when I tried to change the format for my unique key fields to Bloom, that still didn't work. Is this something I should file an issue on?
>
> SEVERE: null:java.lang.IllegalArgumentException: A SPI class of type org.apache.lucene.codecs.PostingsFormat with name 'Bloom' does not exist. You need to add the corresponding JAR file supporting this SPI to your classpath.The current classpath supports the following names: [Lucene40, Lucene41]
>
> Thanks,
> Shawn
Re: Solr4 - no examples of postingsFormat in schema.xml
On 10/14/2012 3:21 PM, Rafał Kuć wrote:
> Hello!
>
> Try adding the following to solrconfig.xml:

I did this and got a little further, but still no go. From what it's saying now, I don't think it will be possible in the current state of branch_4x to use anything but the default.

SEVERE: null:java.lang.IllegalArgumentException: A SPI class of type org.apache.lucene.codecs.PostingsFormat with name 'Block' does not exist. You need to add the corresponding JAR file supporting this SPI to your classpath.The current classpath supports the following names: [Lucene40]

I saw that LUCENE-4446 was applied to branch_4x a few hours ago. I did 'svn up' and rebuilt Solr. Trying again, it appears to be using Lucene41, which I believe is the Block format. But when I tried to change the format for my unique key fields to Bloom, that still didn't work. Is this something I should file an issue on?

SEVERE: null:java.lang.IllegalArgumentException: A SPI class of type org.apache.lucene.codecs.PostingsFormat with name 'Bloom' does not exist. You need to add the corresponding JAR file supporting this SPI to your classpath.The current classpath supports the following names: [Lucene40, Lucene41]

Thanks,
Shawn
Re: Solr4 - no examples of postingsFormat in schema.xml
Hello!

Try adding the following to solrconfig.xml:

--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

> On 10/14/2012 12:19 AM, Walter Underwood wrote:
>> There is a bit more info in this post, look for "alternative codecs":
>>
>> http://searchhub.org/dev/2012/10/12/apache-solr-and-lucene-4-0-0-released/
>
> I'm running on branch_4x checked out yesterday at 13:59 MDT. I tried postingsFormat="Block" and "BlockPostingsFormat" and I get the following error in Solr's log:
>
> SEVERE: Unable to create core: ncmain
> org.apache.solr.common.SolrException: FieldType 'sourceText' is configured with a postings format, but the codec does not support it: class org.apache.solr.core.SolrCore$3
>     at org.apache.solr.core.SolrCore.initCodec(SolrCore.java:773)
>     at org.apache.solr.core.SolrCore.<init>(SolrCore.java:643)
>     at org.apache.solr.core.SolrCore.<init>(SolrCore.java:573)
>     at org.apache.solr.core.CoreContainer.create(CoreContainer.java:850)
>     at org.apache.solr.core.CoreContainer.load(CoreContainer.java:534)
>     at org.apache.solr.core.CoreContainer.load(CoreContainer.java:356)
>     at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:308)
>     at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:107)
>
> There is more to the stacktrace, but they are all jetty, sun, and java classes, so I doubt they are very useful.
>
> sortMissingLast="true" omitNorms="true" positionIncrementGap="0"
> postingsFormat="Block">
>     pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$"
>     replacement="$2"
>     allowempty="false"
> />
>     splitOnCaseChange="0"
>     splitOnNumerics="0"
>     stemEnglishPossessive="0"
>     generateWordParts="1"
>     generateNumberParts="1"
>     catenateWords="0"
>     catenateNumbers="0"
>     catenateAll="0"
>     preserveOriginal="1"
> />
>
> Thanks,
> Shawn
Re: Solr4 - no examples of postingsFormat in schema.xml
On 10/14/2012 12:19 AM, Walter Underwood wrote:
> There is a bit more info in this post, look for "alternative codecs":
>
> http://searchhub.org/dev/2012/10/12/apache-solr-and-lucene-4-0-0-released/

I'm running on branch_4x checked out yesterday at 13:59 MDT. I tried postingsFormat="Block" and "BlockPostingsFormat" and I get the following error in Solr's log:

SEVERE: Unable to create core: ncmain
org.apache.solr.common.SolrException: FieldType 'sourceText' is configured with a postings format, but the codec does not support it: class org.apache.solr.core.SolrCore$3
    at org.apache.solr.core.SolrCore.initCodec(SolrCore.java:773)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:643)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:573)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:850)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:534)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:356)
    at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:308)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:107)

There is more to the stacktrace, but they are all jetty, sun, and java classes, so I doubt they are very useful.

sortMissingLast="true" omitNorms="true" positionIncrementGap="0" postingsFormat="Block">

Thanks,
Shawn
Re: Solr4 - no examples of postingsFormat in schema.xml
On 10/14/2012 12:19 AM, Walter Underwood wrote:
> There is a bit more info in this post, look for "alternative codecs":
>
> http://searchhub.org/dev/2012/10/12/apache-solr-and-lucene-4-0-0-released/

If I were to add this to the Solr wiki as potential options under postingsFormat, would it be correct?

“Appending” works with append-only filesystems (such as Hadoop DFS)
“Memory” writes the entire terms+postings as an FST read into RAM
“Pulsing” inlines the postings for low-frequency terms into the term dictionary
“SimpleText” writes all files in plain-text for easy debugging/transparency
“Bloom” uses a bloom filter to sometimes avoid disk seeks when looking up terms
“Direct” holds all postings as simple byte[] and int[] for very fast performance at the cost of very high RAM consumption
“Block” uses a new index layout and compression scheme for improved performance

Thanks,
Shawn
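For readers of this archive, a rough sketch of how such values would sit in schema.xml. It rests on several assumptions: the postingsFormat values are the SPI names that a later error message in this thread reports as available (Lucene40, Lucene41, Pulsing41, SimpleText, Memory, BloomFilter, Direct), not the blog post's labels above; the fieldType names are hypothetical; and solrconfig.xml is assumed to declare solr.SchemaCodecFactory as its codecFactory. BloomFilter is left out because the thread reports a write-time error for it, and "Appending" is left out because it does not appear in that SPI list.

```xml
<!-- schema.xml sketch; fieldType names are made up for illustration -->
<types>
  <!-- "Memory": terms and postings held as an FST in RAM -->
  <fieldType name="text_memory" class="solr.TextField" postingsFormat="Memory">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    </analyzer>
  </fieldType>

  <!-- "Pulsing41": postings for low-frequency terms inlined into the term dictionary -->
  <fieldType name="string_pulsing" class="solr.StrField" postingsFormat="Pulsing41"/>

  <!-- "SimpleText": plain-text index files, for debugging only -->
  <fieldType name="string_simpletext" class="solr.StrField" postingsFormat="SimpleText"/>
</types>
```

A field then picks up the format through its type attribute as usual; whether a given name is accepted depends on which codec jars are on the classpath of the Solr build in use.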
Re: Solr4 - no examples of postingsFormat in schema.xml
There is a bit more info in this post, look for "alternative codecs":

http://searchhub.org/dev/2012/10/12/apache-solr-and-lucene-4-0-0-released/

wunder

On Oct 13, 2012, at 10:46 PM, Shawn Heisey wrote:

> The wiki page for schema.xml shows the syntax for postingsFormat in a schema fieldType definition, but it doesn't tell you what to use for the value. Also, the example configs in the solr download do not include this parameter.
>
> http://wiki.apache.org/solr/SchemaXml#Data_Types
>
> For some fields, I am interested in using the BlockPostingsFormat that will become the default in 4.1. For others, specifically fields that consist entirely of unique values, I would like to try one of the other formats, like the Bloom filter.
>
> https://issues.apache.org/jira/browse/LUCENE-4069
>
> Can someone with the appropriate knowledge update the wiki with some values that would cover the majority of normal use cases, and what kinds of problems each one is designed to solve?
>
> Thanks,
> Shawn