Re: Solr4 - no examples of postingsFormat in schema.xml

2012-10-15 Thread Shawn Heisey

On 10/15/2012 12:38 PM, Alan Woodward wrote:

> See discussion on https://issues.apache.org/jira/browse/SOLR-3843, this was
> apparently intentional.
>
> That also links to the following:
> http://wiki.apache.org/solr/SolrConfigXml#codecFactory, which suggests you need
> to use solr.SchemaCodecFactory for per-field codecs - this might solve your
> postingsFormat exception.


I already added this to my solrconfig.xml as a top-level element:

<codecFactory class="solr.SchemaCodecFactory"/>

Once I added this, I tried Bloom, but I had an incorrect name.  That 
resulted in this error, showing that the codecFactory config element 
gave me more choices than Lucene40 and Lucene41:


SEVERE: java.lang.IllegalArgumentException: A SPI class of type 
org.apache.lucene.codecs.PostingsFormat with name 'Bloom' does not 
exist. You need to add the corresponding JAR file supporting this SPI to 
your classpath.The current classpath supports the following names: 
[Lucene40, Lucene41, Pulsing41, SimpleText, Memory, BloomFilter, Direct]


Once I got that, I knew I had made some progress, so I changed it to 
BloomFilter and got the error in the previous message.  Repasting here 
without the full stacktrace:


SEVERE: null:java.lang.UnsupportedOperationException: Error - 
org.apache.lucene.codecs.bloom.BloomFilteringPostingsFormat has been 
constructed without a choice of PostingsFormat


Based on that error message, along with something I remember reading 
during my Google travels, I suspect that not all codecs (BloomFilter 
being a prime example) have whatever corresponding Solr bits are required.
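
For the formats in that list that can be looked up by name alone, a per-field-type setting might look like the sketch below. The field type names are hypothetical, the two format names are taken from the classpath list in the error above, and the sketch assumes solr.SchemaCodecFactory is configured in solrconfig.xml. It has not been verified against this branch_4x build.

```xml
<!-- schema.xml (fragment). Field type names are hypothetical. -->
<types>
  <!-- postingsFormat takes the SPI name exactly as it appears in the
       "current classpath supports the following names" list -->
  <fieldType name="string_pulsing" class="solr.StrField"
             sortMissingLast="true" omitNorms="true"
             postingsFormat="Pulsing41"/>
  <fieldType name="string_memory" class="solr.StrField"
             sortMissingLast="true" omitNorms="true"
             postingsFormat="Memory"/>
</types>
```

BloomFilter is left out on purpose: as the exception says, it has to be constructed with a choice of PostingsFormat to wrap, and a bare name in schema.xml does not supply one.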


Thanks,
Shawn



Re: Solr4 - no examples of postingsFormat in schema.xml

2012-10-15 Thread Alan Woodward
See discussion on https://issues.apache.org/jira/browse/SOLR-3843, this was 
apparently intentional.

That also links to the following: 
http://wiki.apache.org/solr/SolrConfigXml#codecFactory, which suggests you need 
to use solr.SchemaCodecFactory for per-field codecs - this might solve your 
postingsFormat exception.

On 15 Oct 2012, at 18:41, Alan Woodward wrote:

> 
>> 
>> This should not be required, because I am building from source.  I compiled 
>> Solr from lucene-solr source checked out from branch_4x.  I grepped the 
>> entire tree for lucene-codec and found nothing.
>> 
>> It turns out that running 'ant generate-maven-artifacts' created the jar 
>> file -- along with a huge number of other jars that I don't need.  It took 
>> an extremely long time to run, for a jar that's a little over 300KB.
>> 
>> I would argue that the codecs jar should be created by compiling a dist 
>> target for Solr.  Someone else should determine whether it's appropriate to 
>> put it in the .war file, but I think it's important enough to make available 
>> without compiling everything in the Lucene universe.
> 
> I agree - it looks as though the codecs module wasn't added to the solr build 
> when it was split off.  I've created a JIRA ticket 
> (https://issues.apache.org/jira/browse/SOLR-3947) and added a patch.
> 
> On the error below, I'll have to defer to someone who knows how this actually 
> works...
> 
>> 
>> I put this jar in my lib, and now I get a new error when I try the 
>> BloomFilter postingsFormat:
>> 
>> SEVERE: null:java.lang.UnsupportedOperationException: Error - 
>> org.apache.lucene.codecs.bloom.BloomFilteringPostingsFormat has been 
>> constructed without a choice of PostingsFormat
>> 
>> 
> 



Re: Solr4 - no examples of postingsFormat in schema.xml

2012-10-15 Thread Alan Woodward

> 
> This should not be required, because I am building from source.  I compiled 
> Solr from lucene-solr source checked out from branch_4x.  I grepped the 
> entire tree for lucene-codec and found nothing.
> 
> It turns out that running 'ant generate-maven-artifacts' created the jar file 
> -- along with a huge number of other jars that I don't need.  It took an 
> extremely long time to run, for a jar that's a little over 300KB.
> 
> I would argue that the codecs jar should be created by compiling a dist 
> target for Solr.  Someone else should determine whether it's appropriate to 
> put it in the .war file, but I think it's important enough to make available 
> without compiling everything in the Lucene universe.

I agree - it looks as though the codecs module wasn't added to the solr build 
when it was split off.  I've created a JIRA ticket 
(https://issues.apache.org/jira/browse/SOLR-3947) and added a patch.

On the error below, I'll have to defer to someone who knows how this actually 
works...

> 
> I put this jar in my lib, and now I get a new error when I try the 
> BloomFilter postingsFormat:
> 
> SEVERE: null:java.lang.UnsupportedOperationException: Error - 
> org.apache.lucene.codecs.bloom.BloomFilteringPostingsFormat has been 
> constructed without a choice of PostingsFormat
> 
> 



Re: Solr4 - no examples of postingsFormat in schema.xml

2012-10-15 Thread Shawn Heisey

On 10/15/2012 2:47 AM, Alan Woodward wrote:

> The extra codecs are supplied in a separate jar file now
> (lucene-codecs-4.0.0.jar) - I guess this isn't being packaged into solr.war by
> default?  You should be able to download it here:
>
> http://search.maven.org/remotecontent?filepath=org/apache/lucene/lucene-codecs/4.0.0/lucene-codecs-4.0.0-javadoc.jar
>
> and drop it into the lib/ directory.


This should not be required, because I am building from source.  I 
compiled Solr from lucene-solr source checked out from branch_4x.  I 
grepped the entire tree for lucene-codec and found nothing.


It turns out that running 'ant generate-maven-artifacts' created the jar 
file -- along with a huge number of other jars that I don't need.  It 
took an extremely long time to run, for a jar that's a little over 300KB.


I would argue that the codecs jar should be created by compiling a dist 
target for Solr.  Someone else should determine whether it's appropriate 
to put it in the .war file, but I think it's important enough to make 
available without compiling everything in the Lucene universe.


ncindex@bigindy5 /index/src/branch_4x $ find . | grep "\.jar$" | grep codec
./solr/core/lib/commons-codec-1.7.jar
./dist/maven/org/apache/lucene/lucene-codecs/4.1-SNAPSHOT/lucene-codecs-4.1-20121015.165734-1.jar
./dist/maven/org/apache/lucene/lucene-codecs/4.1-SNAPSHOT/lucene-codecs-4.1-20121015.165734-1-javadoc.jar
./dist/maven/org/apache/lucene/lucene-codecs/4.1-SNAPSHOT/lucene-codecs-4.1-20121015.165734-1-sources.jar
./lucene/analysis/phonetic/lib/commons-codec-1.7.jar
./lucene/build/codecs/lucene-codecs-4.1-SNAPSHOT.jar
./lucene/build/codecs/lucene-codecs-4.1-SNAPSHOT-javadoc.jar
./lucene/build/codecs/lucene-codecs-4.1-SNAPSHOT-src.jar

I put this jar in my lib, and now I get a new error when I try the 
BloomFilter postingsFormat:


SEVERE: null:java.lang.UnsupportedOperationException: Error - 
org.apache.lucene.codecs.bloom.BloomFilteringPostingsFormat has been 
constructed without a choice of PostingsFormat
    at org.apache.lucene.codecs.bloom.BloomFilteringPostingsFormat.fieldsConsumer(BloomFilteringPostingsFormat.java:139)
    at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.addField(PerFieldPostingsFormat.java:130)
    at org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:335)
    at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85)
    at org.apache.lucene.index.TermsHash.flush(TermsHash.java:117)
    at org.apache.lucene.index.DocInverter.flush(DocInverter.java:53)
    at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:82)
    at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:483)
    at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:422)
    at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:559)
    at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2656)
    at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2792)
    at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2772)
    at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:525)
    at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:87)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1007)
    at org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1750)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)





Re: Solr4 - no examples of postingsFormat in schema.xml

2012-10-15 Thread Alan Woodward
The extra codecs are supplied in a separate jar file now 
(lucene-codecs-4.0.0.jar) - I guess this isn't being packaged into solr.war by 
default?  You should be able to download it here:

http://search.maven.org/remotecontent?filepath=org/apache/lucene/lucene-codecs/4.0.0/lucene-codecs-4.0.0-javadoc.jar

 and drop it into the lib/ directory.
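
As an alternative to copying the jar, solrconfig.xml can also load jars through a <lib> directive. A minimal sketch, assuming the jar has been saved to a hypothetical /opt/solr-extra-jars directory (the path and the regex are illustrative, not taken from this thread):

```xml
<!-- solrconfig.xml (fragment). The directory is a hypothetical location;
     point it at wherever the lucene-codecs jar was actually saved. -->
<lib dir="/opt/solr-extra-jars" regex="lucene-codecs-.*\.jar" />
```

Either way, the extra postings format names should only show up in the "current classpath supports" list once the jar is visible to the core's class loader.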

On 15 Oct 2012, at 00:49, Shawn Heisey wrote:

> On 10/14/2012 3:21 PM, Rafał Kuć wrote:
>> Hello!
>> 
>> Try adding the following to solrconfig.xml:
>> 
>> <codecFactory class="solr.SchemaCodecFactory"/>
> 
> I did this and got a little further, but still no go.  From what it's saying 
> now, I don't think it will be possible in the current state of branch_4x to 
> use anything but the default.
> 
> SEVERE: null:java.lang.IllegalArgumentException: A SPI class of type 
> org.apache.lucene.codecs.PostingsFormat with name 'Block' does not exist. You 
> need to add the corresponding JAR file supporting this SPI to your 
> classpath.The current classpath supports the following names: [Lucene40]
> 
> I saw that LUCENE-4446 was applied to branch_4x a few hours ago. I did 'svn 
> up' and rebuilt Solr.  Trying again, it appears to be using Lucene41, which I 
> believe is the Block format.  But when I tried to change the format for my 
> unique key fields to Bloom, that still didn't work.  Is this something I 
> should file an issue on?
> 
> SEVERE: null:java.lang.IllegalArgumentException: A SPI class of type 
> org.apache.lucene.codecs.PostingsFormat with name 'Bloom' does not exist. You 
> need to add the corresponding JAR file supporting this SPI to your 
> classpath.The current classpath supports the following names: [Lucene40, 
> Lucene41]
> 
> Thanks,
> Shawn
> 



Re: Solr4 - no examples of postingsFormat in schema.xml

2012-10-14 Thread Shawn Heisey

On 10/14/2012 3:21 PM, Rafał Kuć wrote:

> Hello!
>
> Try adding the following to solrconfig.xml:
>
> <codecFactory class="solr.SchemaCodecFactory"/>


I did this and got a little further, but still no go.  From what it's 
saying now, I don't think it will be possible in the current state of 
branch_4x to use anything but the default.


SEVERE: null:java.lang.IllegalArgumentException: A SPI class of type 
org.apache.lucene.codecs.PostingsFormat with name 'Block' does not 
exist. You need to add the corresponding JAR file supporting this SPI to 
your classpath.The current classpath supports the following names: 
[Lucene40]


I saw that LUCENE-4446 was applied to branch_4x a few hours ago. I did 
'svn up' and rebuilt Solr.  Trying again, it appears to be using 
Lucene41, which I believe is the Block format.  But when I tried to 
change the format for my unique key fields to Bloom, that still didn't 
work.  Is this something I should file an issue on?


SEVERE: null:java.lang.IllegalArgumentException: A SPI class of type 
org.apache.lucene.codecs.PostingsFormat with name 'Bloom' does not 
exist. You need to add the corresponding JAR file supporting this SPI to 
your classpath.The current classpath supports the following names: 
[Lucene40, Lucene41]


Thanks,
Shawn



Re: Solr4 - no examples of postingsFormat in schema.xml

2012-10-14 Thread Rafał Kuć
Hello!

Try adding the following to solrconfig.xml:

<codecFactory class="solr.SchemaCodecFactory"/>

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

> On 10/14/2012 12:19 AM, Walter Underwood wrote:
>> There is a bit more info in this post, look for "alternative codecs":
>>
>> http://searchhub.org/dev/2012/10/12/apache-solr-and-lucene-4-0-0-released/

> I'm running on branch_4x checked out yesterday at 13:59 MDT.  I tried 
> postingsFormat="Block" and "BlockPostingsFormat" and I get the following
> error in Solr's log:

> SEVERE: Unable to create core: ncmain
> org.apache.solr.common.SolrException: FieldType 'sourceText' is 
> configured with a postings format, but the codec does not support it: 
> class org.apache.solr.core.SolrCore$3
>  at org.apache.solr.core.SolrCore.initCodec(SolrCore.java:773)
>  at org.apache.solr.core.SolrCore.<init>(SolrCore.java:643)
>  at org.apache.solr.core.SolrCore.<init>(SolrCore.java:573)
>  at org.apache.solr.core.CoreContainer.create(CoreContainer.java:850)
>  at org.apache.solr.core.CoreContainer.load(CoreContainer.java:534)
>  at org.apache.solr.core.CoreContainer.load(CoreContainer.java:356)
>  at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:308)
>  at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:107)

> There is more to the stacktrace, but they are all jetty, sun, and java
> classes, so I doubt they are very useful.

>   sortMissingLast="true" omitNorms="true" positionIncrementGap="0" 
> postingsFormat="Block">
>
>  
>  pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$"
>replacement="$2"
>allowempty="false"
>  />
>  splitOnCaseChange="0"
>splitOnNumerics="0"
>stemEnglishPossessive="0"
>generateWordParts="1"
>generateNumberParts="1"
>catenateWords="0"
>catenateNumbers="0"
>catenateAll="0"
>preserveOriginal="1"
>  />
>  
>  
>
>  

> Thanks,
> Shawn



Re: Solr4 - no examples of postingsFormat in schema.xml

2012-10-14 Thread Shawn Heisey

On 10/14/2012 12:19 AM, Walter Underwood wrote:

> There is a bit more info in this post, look for "alternative codecs":
>
> http://searchhub.org/dev/2012/10/12/apache-solr-and-lucene-4-0-0-released/


I'm running on branch_4x checked out yesterday at 13:59 MDT.  I tried 
postingsFormat="Block" and "BlockPostingsFormat" and I get the following 
error in Solr's log:


SEVERE: Unable to create core: ncmain
org.apache.solr.common.SolrException: FieldType 'sourceText' is 
configured with a postings format, but the codec does not support it: 
class org.apache.solr.core.SolrCore$3

    at org.apache.solr.core.SolrCore.initCodec(SolrCore.java:773)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:643)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:573)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:850)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:534)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:356)
    at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:308)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:107)


There is more to the stacktrace, but they are all jetty, sun, and java 
classes, so I doubt they are very useful.


sortMissingLast="true" omitNorms="true" positionIncrementGap="0" 
postingsFormat="Block">

  





  


Thanks,
Shawn



Re: Solr4 - no examples of postingsFormat in schema.xml

2012-10-14 Thread Shawn Heisey

On 10/14/2012 12:19 AM, Walter Underwood wrote:

> There is a bit more info in this post, look for "alternative codecs":
>
> http://searchhub.org/dev/2012/10/12/apache-solr-and-lucene-4-0-0-released/


If I were to add this to the Solr wiki as potential options under 
postingsFormat, would it be correct?


“Appending” works with append-only filesystems (such as Hadoop DFS)
“Memory” writes the entire terms+postings as an FST read into RAM
“Pulsing” inlines the postings for low-frequency terms into the term 
dictionary
“SimpleText” writes all files in plain-text for easy debugging/transparency
“Bloom” uses a bloom filter to sometimes avoid disk seeks when looking 
up terms
“Direct” holds all postings as simple byte[] and int[] for very fast 
performance at the cost of very high RAM consumption
“Block” uses a new index layout and compression scheme for improved 
performance


Thanks,
Shawn



Re: Solr4 - no examples of postingsFormat in schema.xml

2012-10-13 Thread Walter Underwood
There is a bit more info in this post, look for "alternative codecs":

http://searchhub.org/dev/2012/10/12/apache-solr-and-lucene-4-0-0-released/

wunder

On Oct 13, 2012, at 10:46 PM, Shawn Heisey wrote:

> The wiki page for schema.xml shows the syntax for postingsFormat in a schema 
> fieldType definition, but it doesn't tell you what to use for the value.  
> Also, the example configs in the solr download do not include this parameter.
> 
> http://wiki.apache.org/solr/SchemaXml#Data_Types
> 
> For some fields, I am interested in using the BlockPostingsFormat that will 
> become the default in 4.1.  For others, specifically fields that consist 
> entirely of unique values, I would like to try one of the other formats, like 
> the Bloom filter.
> 
> https://issues.apache.org/jira/browse/LUCENE-4069
> 
> Can someone with the appropriate knowledge update the wiki with some values 
> that would cover the majority of normal use cases, and what kinds of problems 
> each one is designed to solve?
> 
> Thanks,
> Shawn
>