EarlyTerminatingCollectorException

2014-11-05 Thread Dirk Högemann
Our production Solr slave cores (about 40 cores, each of moderate size,
between 10K and 90K documents) produce many exceptions of this type:

2014-11-05 15:06:06.247 [searcherExecutor-158-thread-1] ERROR
org.apache.solr.search.SolrCache: Error during auto-warming of
key:org.apache.solr.search.QueryResultKey@62340b01
:org.apache.solr.search.EarlyTerminatingCollectorException

Our relevant solrconfig is

  

  18

  

  
2


   


  

  

What exactly does the exception mean?
Thank you!

-- Dirk --


Re: EarlyTerminatingCollectorException

2014-11-06 Thread Dirk Högemann
https://issues.apache.org/jira/browse/SOLR-6710

2014-11-05 21:56 GMT+01:00 Mikhail Khludnev :

> I wondered about this too, but it seems to happen while warming the
> queryResultCache:
>
> https://github.com/apache/lucene-solr/blob/20f9303f5e2378e2238a5381291414881ddb8172/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L522
>
> At least these ERRORs break nothing; see
>
> https://github.com/apache/lucene-solr/blob/20f9303f5e2378e2238a5381291414881ddb8172/solr/core/src/java/org/apache/solr/search/FastLRUCache.java#L165
>
> Anyway, there are two usability issues here:
>  - key:org.apache.solr.search.QueryResultKey@62340b01 lacks a readable
> toString()
>  - I don't think regeneration exceptions are ERRORs; they seem like WARNs
> to me, or even lower. Also, as a courtesy, EarlyTerminatingCollectorExceptions
> in particular could be recognized, and even ignored, in SolrIndexSearcher.java#L522.
>
> Would you mind raising a ticket?
>
> On Wed, Nov 5, 2014 at 6:51 PM, Dirk Högemann  wrote:
>
> > Our production Solr slave cores (about 40 cores, each of moderate size,
> > between 10K and 90K documents) produce many exceptions of this type:
> >
> > 2014-11-05 15:06:06.247 [searcherExecutor-158-thread-1] ERROR
> > org.apache.solr.search.SolrCache: Error during auto-warming of
> > key:org.apache.solr.search.QueryResultKey@62340b01
> > :org.apache.solr.search.EarlyTerminatingCollectorException
> >
> > Our relevant solrconfig is
> >
> >   
> > 
> >   18
> > 
> >   
> >
> >   
> > 2
> >  >   class="solr.FastLRUCache"
> >   size="8192"
> >   initialSize="8192"
> >   autowarmCount="4096"/>
> >
> >
> >  >   class="solr.FastLRUCache"
> >   size="8192"
> >   initialSize="8192"
> >   autowarmCount="4096"/>
> >
> >   
> >  >   class="solr.FastLRUCache"
> >   size="8192"
> >   initialSize="8192"
> >   autowarmCount="4096"/>
> >   
> >
> > What exactly does the exception mean?
> > Thank you!
> >
> > -- Dirk --
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> <http://www.griddynamics.com>
> 
>
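Mikhail's point above (the cache warm loop catches Throwable from the regenerator, logs it, and carries on, so the ERRORs are harmless) can be sketched as follows. This is a simplified, hypothetical stand-in for illustration only, not Solr's actual classes:

```java
public class WarmSketch {
    // Hypothetical stand-in for Solr's EarlyTerminatingCollectorException.
    static class EarlyTerminatingCollectorException extends RuntimeException {}

    // Stand-in regenerator: "terminates early" once the key passes a limit,
    // the way a bounded collector stops collecting past queryResultWindowSize.
    static boolean regenerate(int key, int limit) {
        if (key >= limit) throw new EarlyTerminatingCollectorException();
        return true;
    }

    // Mirrors the warm loop's behaviour: catch Throwable, log, keep going.
    static int[] warm(int keys, int limit) {
        int warmed = 0, errors = 0;
        for (int key = 0; key < keys; key++) {
            try {
                if (regenerate(key, limit)) warmed++;
            } catch (Throwable t) {
                errors++; // SolrCache logs "Error during auto-warming" here
            }
        }
        return new int[]{warmed, errors};
    }

    public static void main(String[] args) {
        int[] r = warm(8, 4);
        // Half the keys warm, half only log an error; warming still completes.
        System.out.println("warmed=" + r[0] + " errors=" + r[1]);
    }
}
```

The key design point is that the warm loop never lets a single failed regeneration abort warming of the remaining cache entries, which is why the searcher still comes up healthy despite the log noise.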


Solr4.2 PostCommit EventListener not working on Replication-Instances

2013-07-25 Thread Dirk Högemann
Hello,

I have implemented a Solr EventListener, which should be fired after
committing.
This works fine on the Solr master instance, and it also worked in Solr 3.5
on any slave instance.
I upgraded my installation to Solr 4.2, and now the postCommit event is no
longer fired on the replication (slave) instances. This is a huge problem,
as other caches have to be invalidated when replication takes place.

This is my configuration solrconfig.xml on the slaves:

  

  1



...


  

...
  

      <str name="masterUrl">http://localhost:9101/solr/Core1</str>
      <str name="pollInterval">00:03:00</str>

  

Any hints?

Best regards
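For reference, this is roughly the shape such a listener registration takes in solrconfig.xml (the listener class name here is a placeholder, not the one from the original configuration, which was stripped by the archiver):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Placeholder class name; fired after commits on the indexing side -->
  <listener event="postCommit" class="com.example.cms.CacheInvalidatingListener"/>
</updateHandler>
```

An unconfirmed workaround worth testing on Solr 4.x slaves is a `newSearcher` event listener instead, since installing a replicated index opens a new searcher even when no local commit fires; this is an assumption to verify against your version, not something confirmed in this thread.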


Solr3.5 PatternTokenizer / Search Analyzer tokenizing always at whitespace?

2012-12-17 Thread Dirk Högemann
Hi,

I am not sure if I am missing something, or maybe I do not exactly understand
the index/search analyzer definitions and how they are executed.

I have a field definition like this:



  


  
  


  


Any field starting with cl2 should be recognized as being of type
cl2Tokenized_string:


When I try to search for a token in that sense the query is tokenized at
whitespaces:

{!q.op=AND
df=cl2Categories_NACE}cl2Categories_NACE:08 Gewinnung von Steinen und
Erden, sonstiger Bergbau+cl2Categories_NACE:08
+cl2Categories_NACE:gewinnung +cl2Categories_NACE:von
+cl2Categories_NACE:steinen +cl2Categories_NACE:und
+cl2Categories_NACE:erden, +cl2Categories_NACE:sonstiger
+cl2Categories_NACE:bergbau

I expected the query parser would also tokenize ONLY at the pattern ###,
instead of using a whitespace tokenizer here.
Is it possible to define a filter query, without using phrases, that achieves
the desired behavior?
Or are local parameters not the way to go here?

Best
Dirk


Re: Solr3.5 PatternTokenizer / Search Analyzer tokenizing always at whitespace?

2012-12-17 Thread Dirk Högemann
{!q.op=AND df=cl2Categories_NACE}08
Gewinnung von Steinen und Erden, sonstiger Bergbau+cl2Categories_NACE:08
+cl2Categories_NACE:gewinnung +cl2Categories_NACE:von
+cl2Categories_NACE:steinen +cl2Categories_NACE:und
+cl2Categories_NACE:erden, +cl2Categories_NACE:sonstiger
+cl2Categories_NACE:bergbau

That is the relevant debug Output from the query.

2012/12/17 Dirk Högemann 

> Hi,
>
> I am not sure if I am missing something, or maybe I do not exactly
> understand the index/search analyzer definitions and how they are executed.
>
> I have a field definition like this:
>
>
>  sortMissingLast="true" omitNorms="true">
>   
>  group="-1"/>
> 
>   
>   
>  group="-1"/>
> 
>   
> 
>
> Any field starting with cl2 should be recognized as being of type
> cl2Tokenized_string:
>  stored="true" />
>
> When I try to search for a token in that sense the query is tokenized at
> whitespaces:
>
> {!q.op=AND
> df=cl2Categories_NACE}cl2Categories_NACE:08 Gewinnung von Steinen und
> Erden, sonstiger Bergbau name="parsed_filter_queries">+cl2Categories_NACE:08
> +cl2Categories_NACE:gewinnung +cl2Categories_NACE:von
> +cl2Categories_NACE:steinen +cl2Categories_NACE:und
> +cl2Categories_NACE:erden, +cl2Categories_NACE:sonstiger
> +cl2Categories_NACE:bergbau
>
> I expected the query parser would also tokenize ONLY at the pattern ###,
> instead of using a whitespace tokenizer here.
> Is it possible to define a filter query, without using phrases, that achieves
> the desired behavior?
> Or are local parameters not the way to go here?
>
> Best
> Dirk
>


Re: Solr3.5 PatternTokenizer / Search Analyzer tokenizing always at whitespace?

2012-12-17 Thread Dirk Högemann
Ok, right, changed that... Nevertheless, I thought I should always use the
same analyzers for the query and the index section to get consistent
results.
Does this mean that the tokenizer in the query section will always be
ignored by the given query parsers?



2012/12/17 Jack Krupansky 

> The query parsers normally tokenize on white space and query operators,
> but you can escape any white space with backslash or put the text in quotes
> and then it will be tokenized by the analyzer rather than the query parser.
>
> Also, you have:
>
> 
>
> Change "search" to "query", but that won't change your problem since Solr
> defaults to using the "index" analyzer if it doesn't "see" a "query"
> analyzer.
>
> -- Jack Krupansky
>
> -Original Message- From: Dirk Högemann
> Sent: Monday, December 17, 2012 5:59 AM
> To: solr-user@lucene.apache.org
> Subject: Solr3.5 PatternTokenizer / Search Analyzer tokenizing always at
> whitespace?
>
>
> Hi,
>
> I am not sure if I am missing something, or maybe I do not exactly understand
> the index/search analyzer definitions and how they are executed.
>
> I have a field definition like this:
>
>
> sortMissingLast="true" omitNorms="true">
>  
> group="-1"/>
>
>  
>  
> group="-1"/>
>
>  
>
>
> Any field starting with cl2 should be recognized as being of type
> cl2Tokenized_string:
>  stored="true" />
>
> When I try to search for a token in that sense the query is tokenized at
> whitespaces:
>
> {!q.op=AND
> df=cl2Categories_NACE}cl2Categories_NACE:08 Gewinnung von Steinen und
> Erden, sonstiger Bergbau<lst name="parsed_filter_queries"><str>+cl2Categories_NACE:08
> +cl2Categories_NACE:gewinnung +cl2Categories_NACE:von
> +cl2Categories_NACE:steinen +cl2Categories_NACE:und
> +cl2Categories_NACE:erden, +cl2Categories_NACE:sonstiger
> +cl2Categories_NACE:bergbau
>
> I expected the query parser would also tokenize ONLY at the pattern ###,
> instead of using a whitespace tokenizer here.
> Is it possible to define a filter query, without using phrases, that achieves
> the desired behavior?
> Or are local parameters not the way to go here?
>
> Best
> Dirk
>


Re: Solr3.5 PatternTokenizer / Search Analyzer tokenizing always at whitespace?

2012-12-17 Thread Dirk Högemann
Ah - now I got it. My solution to this was to use phrase queries - now I
know why: Thanks!
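Jack's explanation (the query parser splits on whitespace before any per-field analyzer such as a PatternTokenizer ever runs) can be sketched in plain Java. The strings below are illustrative, not taken verbatim from the debug output:

```java
public class TokenizeSketch {
    // What the Lucene query parser does first: split the query on whitespace,
    // before the field's analyzer sees anything.
    static String[] parserSplit(String q) {
        return q.split("\\s+");
    }

    // What a PatternTokenizer configured on "###" would do if it received
    // the whole string instead of individual parser-produced terms.
    static String[] patternSplit(String q) {
        return q.split("###");
    }

    public static void main(String[] args) {
        // The parser produces one clause per whitespace-separated term...
        System.out.println(parserSplit("08 Gewinnung von Steinen und Erden").length);
        // ...while the analyzer would have kept multi-word values together.
        System.out.println(patternSplit("08 Gewinnung###sonstiger Bergbau").length);
    }
}
```

This is why quoting or escaping the whitespace (so the parser hands the analyzer one unbroken string) produces the expected single token.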
2012/12/17 Jack Krupansky 

> No, the "query" analyzer tokenizer will simply be applied to each term or
> quoted string AFTER the query parser has already parsed it. You may have
> escaped or quoted characters which will then be seen by the analyzer
> tokenizer.
>
>
> -- Jack Krupansky
>
> -Original Message- From: Dirk Högemann
> Sent: Monday, December 17, 2012 11:01 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr3.5 PatternTokenizer / Search Analyzer tokenizing always
> at whitespace?
>
>
> Ok- right, changed that... Nevertheless I thought I should always use the
> same analyzers for the query and the index section to have consistent
> results.
> Does this mean that the tokenizer in the query section will always be
> ignored by the given query parsers?
>
>
>
> 2012/12/17 Jack Krupansky 
>
>> The query parsers normally tokenize on white space and query operators,
>> but you can escape any white space with backslash or put the text in
>> quotes
>> and then it will be tokenized by the analyzer rather than the query
>> parser.
>>
>> Also, you have:
>>
>> 
>>
>> Change "search" to "query", but that won't change your problem since Solr
>> defaults to using the "index" analyzer if it doesn't "see" a "query"
>> analyzer.
>>
>> -- Jack Krupansky
>>
>> -Original Message- From: Dirk Högemann
>> Sent: Monday, December 17, 2012 5:59 AM
>> To: solr-user@lucene.apache.org
>> Subject: Solr3.5 PatternTokenizer / Search Analyzer tokenizing always at
>> whitespace?
>>
>>
>> Hi,
>>
>> I am not sure if I am missing something, or maybe I do not exactly
>> understand the index/search analyzer definitions and how they are executed.
>>
>> I have a field definition like this:
>>
>>
>>> sortMissingLast="true" omitNorms="true">
>>  
>>> group="-1"/>
>>
>>  
>>  
>>> group="-1"/>
>>
>>
>>  
>>
>>
>> Any field starting with cl2 should be recognized as being of type
>> cl2Tokenized_string:
>> > stored="true" />
>>
>> When I try to search for a token in that sense the query is tokenized at
>> whitespaces:
>>
>> {!q.op=AND
>> df=cl2Categories_NACE}cl2Categories_NACE:08 Gewinnung von Steinen und
>> Erden, sonstiger Bergbau<lst name="parsed_filter_queries"><str>+cl2Categories_NACE:08
>> +cl2Categories_NACE:gewinnung +cl2Categories_NACE:von
>> +cl2Categories_NACE:steinen +cl2Categories_NACE:und
>> +cl2Categories_NACE:erden, +cl2Categories_NACE:sonstiger
>> +cl2Categories_NACE:bergbau
>>
>>
>> I expected the query parser would also tokenize ONLY at the pattern ###,
>> instead of using a whitespace tokenizer here.
>> Is it possible to define a filter query, without using phrases, that achieves
>> the desired behavior?
>> Or are local parameters not the way to go here?
>>
>> Best
>> Dirk
>>
>>
>


Re: Bad performance while query pdf solr documents

2012-12-23 Thread Dirk Högemann
You can define the fields to be returned with the fl parameter
(fl=the,needed,fields), usually just the score and the id...

2012/12/23 uwe72 

> Hi,
>
> I am indexing PDF documents into Solr via Tika.
>
> When I do the query in the client with SolrJ, the performance is very bad:
> 40 seconds to load 100 documents.
>
> Probably because it loads all the content, which I don't need. How can
> I tell the query not to load the content?
>
> Or are there other reasons why the performance is so bad?
>
> Regards
> Uwe
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Bad-performance-while-query-pdf-solr-documents-tp4028766.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Bad performance while query pdf solr documents

2012-12-23 Thread Dirk Högemann
Do you really need them all in the response to show them in the results?
Since you are now defining them as stored=false, it seems you do not.


2012/12/23 Otis Gospodnetic 

> Hi,
>
> You can specify them in solrconfig.xml for your request handler, so you
> don't have to specify it for each query unless you want to override fl.
>
> Otis
> Solr & ElasticSearch Support
> http://sematext.com/
> On Dec 23, 2012 4:39 AM, "uwe72"  wrote:
>
> > We have more than a hundred fields... I don't want to put them all in the
> > fl parameter.
> >
> > Is there another way, like saying: return all fields except these
> > fields...?
> >
> > Anyhow, I will change the fields from stored to stored=false in the
> > schema.
> >
> >
> >
> > --
> > View this message in context:
> >
> http://lucene.472066.n3.nabble.com/Bad-performance-while-query-pdf-solr-documents-tp4028766p4028816.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>
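A sketch of what Otis describes, setting fl once as a request-handler default in solrconfig.xml (the handler name and field list are placeholders):

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Placeholder field list: return only what the result page needs;
         clients can still override fl per query. -->
    <str name="fl">id,score</str>
  </lst>
</requestHandler>
```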


Auto-Commit and failures / schema violations

2011-07-29 Thread Dirk Högemann
Hello,

we are running a large CMS with multiple customers, and we are now going to use
Solr for our search and indexing tasks.
As a lot of users work simultaneously on the CMS, we decided not to
commit our changes programmatically (we use StreamingUpdateSolrServer) on each
add. Instead we are using the autocommit functions in solrconfig.xml.

To be "reliable" we write timestamp files on each "add" of a document to the
StreamingUpdateSolrServer. (In case of a crash we could restart indexing from
that timestamp.)
Unfortunately we don't know how to be sure that the add was successful, as
(for example) schema violations seem to be detected only on commit, which is
therefore too late, as the timestamp has usually already been overwritten by then.

So: Are there any valid approaches to be sure that an add of a document has
been processed successfully?
Maybe: Is it better to collect a list of documents to add and commit these,
instead of using the auto-commit function?

Thanks in advance for any help!
Dirk Högemann


Phonetic search and matching

2012-02-06 Thread Dirk Högemann
Hi,

I have a question on phonetic search and matching in Solr.
In our application all the content of an article is written to a full-text
search field, which provides stemming and a phonetic filter (Cologne
phonetic for German).
This is the relevant part of the configuration for the index analyzer
(the search analyzer is analogous):








Unfortunately, this sometimes results in strange, but also explainable,
matches.
For example:

The content field indexes the following string: Donnerstag von 13 bis 17 Uhr.

This results in a match if we search for "puf", as the result of the
phonetic filter for this is 13.
(As a consequence, the 13 is then also highlighted.)

Does anyone have an idea how to handle this in a reasonable way, so that a
search for "puf" does not match 13 in the content?

Thanks in advance!

Dirk
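Why "puf" collides with "13" can be seen with a toy subset of the Cologne phonetic rules. This is an illustrative re-implementation covering only the letter classes involved in this example, not the real codec, which has more cases:

```java
public class ColognePuf {
    // Toy subset of the Cologne phonetic code (assumption: only the letter
    // classes needed here; the full algorithm has context-dependent rules).
    static String toyCologne(String word) {
        StringBuilder raw = new StringBuilder();
        for (char c : word.toLowerCase().toCharArray()) {
            if ("aeiouy".indexOf(c) >= 0) raw.append('0'); // vowels -> 0
            else if (c == 'b' || c == 'p') raw.append('1');
            else if (c == 'd' || c == 't') raw.append('2');
            else if (c == 'f' || c == 'v' || c == 'w') raw.append('3');
        }
        // Collapse runs and drop non-leading zeros, as the real codec does.
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (i > 0 && c == raw.charAt(i - 1)) continue;
            if (c == '0' && i > 0) continue;
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "puf" encodes to the digit string "13". With inject="true" the
        // literal token "13" from the text is also indexed, so the phonetic
        // code of "puf" and the real number "13" land on the same term.
        System.out.println(toyCologne("puf"));
    }
}
```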


Re: Phonetic search and matching

2012-02-07 Thread Dirk Högemann
Thanks Erick.
In the first place we thought of removing numbers with a pattern filter.
Setting inject to false will have the "same" effect.
If we want to be able to search for numbers in the content, this solution
will not work, but another field without phonetic filtering, and searching in
both fields, would be OK, right?

Dirk
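The two-field approach could look roughly like this in schema.xml. All field and type names here are placeholders, not taken from the original schema:

```xml
<!-- Placeholder names: one phonetic field for fuzzy matching, one plain
     field so literal tokens like "13" still match exactly. -->
<field name="content_phonetic" type="text_phonetic" indexed="true" stored="false"/>
<field name="content_exact"    type="text_general"  indexed="true" stored="false"/>
<copyField source="content" dest="content_phonetic"/>
<copyField source="content" dest="content_exact"/>
```

Queries would then search both fields (e.g. via edismax qf), letting the exact field catch numbers that the phonetic field, with inject="false", no longer matches.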
On 07.02.2012 14:01, "Erick Erickson" wrote:

> What happens if you do NOT inject? Setting inject="false"
> stores only the phonetic reduction, not the original text. In that
> case your false match on "13" would go away.
>
> Not sure what that means for the rest of your app though.
>
> Best
> Erick
>
> On Mon, Feb 6, 2012 at 5:44 AM, Dirk Högemann
>  wrote:
> > Hi,
> >
> > I have a question on phonetic search and matching in solr.
> > In our application all the content of an article is written to a
> full-text
> > search field, which provides stemming and a phonetic filter (cologne
> > phonetic for german).
> > This is the relevant part of the configuration for the index analyzer
> > (search is analogous):
> >
> >
> > > generateWordParts="1" generateNumberParts="1" catenateWords="0"
> > catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
> >
> > language="German2"
> > />
> > > encoder="ColognePhonetic" inject="true"/>
> >
> >
> > Unfortunately, this sometimes results in strange, but also explainable,
> > matches.
> > For example:
> >
> > Content field indexes the following String: Donnerstag von 13 bis 17 Uhr.
> >
> > This results in a match, if we search for "puf"  as the result of the
> > phonetic filter for this is 13.
> > (As a consequence the 13 is then also highlighted)
> >
> > Does anyone have an idea how to handle this in a reasonable way, so that a
> > search for "puf" does not match 13 in the content?
> >
> > Thanks in advance!
> >
> > Dirk
>


Solr / Tika Integration

2012-02-10 Thread Dirk Högemann
Hello,

we use Solr 3.5 and Tika to index a lot of PDFs. The content of those PDFs
is searchable via a full-text search.
Also the terms are used to make search suggestions.

Unfortunately, PDFBox seems to insert a space character when there are
soft hyphens in the content of the PDF.
Thus the extracted text is sometimes very fragmented. For example, the word
Medizin is extracted as Me di zin.
As a consequence, the suggestions are often unusable and the search does not
work as expected.

Does anyone have a suggestion on how to extract the content of PDFs containing
soft hyphens without fragmenting it?

Best
Dirk
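For illustration: if the extractor preserved the soft hyphen (U+00AD) instead of mapping it to a space, a trivial post-processing step would rejoin the fragments. This is a sketch of that assumption, not a fix for the PDFBox behaviour described above:

```java
public class SoftHyphenSketch {
    // Removing preserved soft hyphens (U+00AD) rejoins hyphenated words.
    static String stripSoftHyphens(String s) {
        return s.replace("\u00AD", "");
    }

    public static void main(String[] args) {
        String extracted = "Me\u00ADdi\u00ADzin"; // hypothetical extractor output
        System.out.println(stripSoftHyphens(extracted));
        // Once the hyphen has already become a plain space, "Me di zin" is
        // indistinguishable from three real words, which is the actual problem.
    }
}
```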


Re: Solr / Tika Integration

2012-02-10 Thread Dirk Högemann
Thanks so far. I will have a closer look at the PDF.

I tried the enableAutoSpace setting with PDFBox 1.6; it did not work:

PDFParser parser = new PDFParser();
parser.setEnableAutoSpace(false);
ContentHandler handler = new BodyContentHandler();

Output:
Va ri an te Creutz feldt-
Ja kob-Krank heit
Stel lung nah men des Ar beits krei ses Blut

Our suggest component and parts of our search are getting hard to use
because of this. Any other ideas?

Best
Dirk


2012/2/10 Jan Høydahl 

> I think you need to control the parameter "enableAutoSpace" in PDFBox.
> There's a JIRA for it, but it depends on some Tika1.1 stuff as far I can
> understand
>
> https://issues.apache.org/jira/browse/SOLR-2930
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
>
> On 10. feb. 2012, at 11:21, Dirk Högemann wrote:
>
> > Hello,
> >
> > we use Solr 3.5 and Tika to index a lot of PDFs. The content of those
> PDFs
> > is searchable via a full-text search.
> > Also the terms are used to make search suggestions.
> >
> > Unfortunately, PDFBox seems to insert a space character when there are
> > soft hyphens in the content of the PDF.
> > Thus the extracted text is sometimes very fragmented. For example, the
> > word Medizin is extracted as Me di zin.
> > As a consequence, the suggestions are often unusable and the search does
> > not work as expected.
> >
> > Does anyone have a suggestion on how to extract the content of PDFs
> > containing soft hyphens without fragmenting it?
> >
> > Best
> > Dirk
>
>


Re: Solr / Tika Integration

2012-02-10 Thread Dirk Högemann
Interestingly, the only tool I found that handles my PDF correctly
was pdftotext.


2012/2/10 Robert Muir 

> On Fri, Feb 10, 2012 at 6:18 AM, Dirk Högemann
>  wrote:
> >
> > Our suggest component and parts of our search are getting hard to use
> > because of this. Any other ideas?
> >
>
> Looks like https://issues.apache.org/jira/browse/PDFBOX-371
>
> The title of the issue is a bit confusing (I don't think it should go
> to hyphen either!), but I think it's the reason it's being mapped to a
> space.
>
> --
> lucidimagination.com
>


Auto commit exception in Solr 4.0 Beta

2012-08-21 Thread Dirk Högemann
Hello,

I am trying to make our search application ready for Solr 4.0 (Beta) and to
work out the tasks necessary to accomplish this.
When I try to reindex our documents I get the following exception:

 auto commit error...:java.lang.UnsupportedOperationException: this codec
can only be used for reading
at
org.apache.lucene.codecs.lucene3x.Lucene3xCodec$1.writeLiveDocs(Lucene3xCodec.java:74)
at
org.apache.lucene.index.ReadersAndLiveDocs.writeLiveDocs(ReadersAndLiveDocs.java:278)
at
org.apache.lucene.index.IndexWriter$ReaderPool.release(IndexWriter.java:435)
at
org.apache.lucene.index.BufferedDeletesStream.applyDeletes(BufferedDeletesStream.java:278)
at
org.apache.lucene.index.IndexWriter.applyAllDeletes(IndexWriter.java:2928)
at
org.apache.lucene.index.IndexWriter.maybeApplyDeletes(IndexWriter.java:2919)
at
org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2666)
at
org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2793)
at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2773)
at
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:531)
at org.apache.solr.update.CommitTracker.run(CommitTracker.java:214)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

Is this a known bug, or is it maybe a Classpath problem I am facing here?

Best
Dirk Hoegemann


Re: Auto commit exception in Solr 4.0 Beta

2012-08-21 Thread Dirk Högemann
Perfect. I reindexed the whole index and everything worked fine. The
exception was just a little bit confusing.
Best
Dirk
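For reference, the IndexUpgrader that Jack mentions below is typically invoked from the command line. The jar file name and index path here are placeholders for your installation:

```shell
# Assumption: lucene-core jar version and index directory are placeholders.
# Rewrites all segments into the latest 4.0 file format in place.
java -cp lucene-core-4.0.0.jar org.apache.lucene.index.IndexUpgrader -verbose /path/to/index
```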
On 21.08.2012 14:39, "Jack Krupansky" wrote:

> Did you explicitly run the IndexUpgrader before adding new documents?
>
> In theory, you don't have to do that, but... who knows for sure.
>
> While you wait for one of the hard-core Lucene guys to respond, you could
> try IndexUpgrader, if you haven't already.
>
> OTOH, if you are in fact reindexing (rather than reusing your old index),
> why not start with an empty 4.0 index?
>
> From CHANGES.TXT:
>
> - On upgrading to 4.0, if you do not fully reindex your documents,
>  Lucene will emulate the new flex API on top of the old index,
>  incurring some performance cost (up to ~10% slowdown, typically).
>  To prevent this slowdown, use oal.index.IndexUpgrader
>  to upgrade your indexes to latest file format (LUCENE-3082).
>
>  Mixed flex/pre-flex indexes are perfectly fine -- the two
>  emulation layers (flex API on pre-flex index, and pre-flex API on
>  flex index) will remap the access as required.  So on upgrading to
>  4.0 you can start indexing new documents into an existing index.
>  To get optimal performance, use oal.index.IndexUpgrader
>  to upgrade your indexes to latest file format (LUCENE-3082).
>
> -- Jack Krupansky
>
> -Original Message- From: Dirk Högemann
> Sent: Tuesday, August 21, 2012 9:17 AM
> To: solr-user@lucene.apache.org
> Subject: Auto commit exception in Solr 4.0 Beta
>
> Hello,
>
> I am trying to make our search application Solr 4.0 (Beta) ready and
> elaborate on the tasks necessary to accomplish this.
> When I try to reindex our documents I get the following exception:
>
> auto commit error...:java.lang.UnsupportedOperationException: this codec
> can only be used for reading
>    at org.apache.lucene.codecs.lucene3x.Lucene3xCodec$1.writeLiveDocs(Lucene3xCodec.java:74)
>    at org.apache.lucene.index.ReadersAndLiveDocs.writeLiveDocs(ReadersAndLiveDocs.java:278)
>    at org.apache.lucene.index.IndexWriter$ReaderPool.release(IndexWriter.java:435)
>    at org.apache.lucene.index.BufferedDeletesStream.applyDeletes(BufferedDeletesStream.java:278)
>    at org.apache.lucene.index.IndexWriter.applyAllDeletes(IndexWriter.java:2928)
>    at org.apache.lucene.index.IndexWriter.maybeApplyDeletes(IndexWriter.java:2919)
>    at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2666)
>    at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2793)
>    at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2773)
>    at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:531)
>    at org.apache.solr.update.CommitTracker.run(CommitTracker.java:214)
>    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
>    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
>    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>    at java.lang.Thread.run(Thread.java:662)
>
> Is this a known bug, or is it maybe a Classpath problem I am facing here?
>
> Best
> Dirk Hoegemann
>


solr4.0 LimitTokenCountFilterFactory NumberFormatException

2012-10-17 Thread Dirk Högemann
Hi,

I am trying to upgrade from Solr 3.5 to Solr 4.0.
I read the following in the example solrconfig:

 

I tried that as follows:

...

  







  
...

The LimitTokenCountFilterFactory configured like that crashes the startup
of the corresponding core with the following exception (without the Factory
the core startup works):


17.10.2012 17:44:19 org.apache.solr.common.SolrException log
SCHWERWIEGEND: null:org.apache.solr.common.SolrException: Plugin init
failure for [schema.xml] fieldType "textgen": Plugin init failure for
[schema.xml] analyze
r/filter: null
at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
at
org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:369)
        at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:113)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:846)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:534)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:356)
at
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:308)
at
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:107)
at
org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:277)
at
org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:258)
at
org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:382)
at
org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:103)
at
org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4638)
at
org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5294)
at
org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at
org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:895)
at
org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:871)
at
org.apache.catalina.core.StandardHost.addChild(StandardHost.java:615)
at
org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:649)
at
org.apache.catalina.startup.HostConfig$DeployDescriptor.run(HostConfig.java:1581)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.solr.common.SolrException: Plugin init failure for
[schema.xml] analyzer/filter: null
at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
at
org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:377)
at
org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:95)
at
org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:43)
at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151)
... 25 more
Caused by: java.lang.NumberFormatException: null
at java.lang.Integer.parseInt(Integer.java:417)
at java.lang.Integer.parseInt(Integer.java:499)
at
org.apache.lucene.analysis.miscellaneous.LimitTokenCountFilterFactory.init(LimitTokenCountFilterFactory.java:48)
at
org.apache.solr.schema.FieldTypePluginLoader$3.init(FieldTypePluginLoader.java:367)
at
org.apache.solr.schema.FieldTypePluginLoader$3.init(FieldTypePluginLoader.java:358)
at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:159)
... 29 more

Any ideas?

Best
Dirk


Re: solr4.0 LimitTokenCountFilterFactory NumberFormatException

2012-10-17 Thread Dirk Högemann
:-) great solution...will look funny in our production system.
Am 17.10.2012 16:12 schrieb "Jack Krupansky" :

> Anybody want to guess what's wrong with this code:
>
> String maxTokenCountArg = args.get("maxTokenCount");
> if (maxTokenCountArg == null) {
>  throw new IllegalArgumentException("maxTokenCount is mandatory.");
> }
> maxTokenCount = Integer.parseInt(args.get(maxTokenCountArg));
>
> Hmmm... try this "workaround":
>
> <filter class="solr.LimitTokenCountFilterFactory" maxTokenCount="foo" foo="1"/>
>
> -- Jack Krupansky
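Why the workaround works can be traced through the buggy lookup Jack quotes above: the code fetches the attribute value, then mistakenly uses that value as a key for a second lookup. A self-contained demonstration (the map stands in for the factory's args):

```java
import java.util.HashMap;
import java.util.Map;

public class LimitTokenCountBug {
    public static void main(String[] args) {
        // Normal configuration: maxTokenCount="10"
        Map<String, String> normal = new HashMap<>();
        normal.put("maxTokenCount", "10");
        String maxTokenCountArg = normal.get("maxTokenCount"); // "10"
        String buggy = normal.get(maxTokenCountArg);           // get("10") -> null
        System.out.println(buggy); // null, so Integer.parseInt later throws
                                   // NumberFormatException: null

        // Workaround: maxTokenCount="foo" foo="1" makes the accidental
        // double lookup land on a real entry.
        Map<String, String> workaround = new HashMap<>();
        workaround.put("maxTokenCount", "foo");
        workaround.put("foo", "1");
        System.out.println(Integer.parseInt(
                workaround.get(workaround.get("maxTokenCount")))); // 1
    }
}
```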
>
> -Original Message- From: Dirk Högemann
> Sent: Wednesday, October 17, 2012 11:50 AM
> To: solr-user@lucene.apache.org
> Subject: solr4.0 LimitTokenCountFilterFactory NumberFormatException
>
> Hi,
>
> I am trying to upgrade from Solr 3.5 to Solr 4.0.
> I read the following in the example solrconfig:
>
> 
>
> I tried that as follows:
>
> ...
>  positionIncrementGap="100">
>  
>
> maxTokenCount="10"/>
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
>
> language="German"
> />
> words="stopwords.txt" enablePositionIncrements="true" />
>
>  
> ...
>
> The LimitTokenCountFilterFactory configured like that crashes the startup
> of the corresponding core with the following exception (without the Factory
> the core startup works):
>
>
> 17.10.2012 17:44:19 org.apache.solr.common.SolrException log
> SCHWERWIEGEND: null:org.apache.solr.common.SolrException: Plugin init
> failure for [schema.xml] fieldType "textgen": Plugin init failure for
> [schema.xml] analyzer/filter: null
>    at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
>    at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:369)
>    at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:113)
>    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:846)
>    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:534)
>    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:356)
>    at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:308)
>    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:107)
>    at org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:277)
>    at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:258)
>    at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:382)
>    at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:103)
>    at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4638)
>    at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5294)
>    at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
>    at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:895)
>    at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:871)
>    at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:615)
>    at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:649)
>    at org.apache.catalina.startup.HostConfig$DeployDescriptor.run(HostConfig.java:1581)
>    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>    at java.lang.Thread.run(Thread.java:662)
> Caused by: org.apache.solr.common.SolrException: Plugin init failure for
> [schema.xml] analyzer/filter: null
>    at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
>    at org.apache.solr.schema.FieldTypePlu

Forwardslash delimiter.Solr4.0 query for path like /Customer/Content/*

2012-10-30 Thread Dirk Högemann
Hi,

I am currently upgrading from Solr 3.5 to Solr 4.0

I used to have filter-based restrictions for my search based on the paths
of documents in a content repository.
E.g. fq={!q.op=OR df=folderPath_}/customer/content/*

Unfortunately this does not work anymore, as lucene now supports
Regexpsearches - delimiting the expression with forward slashes:
http://lucene.apache.org/core/4_0_0-BETA/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Regexp_Searches

this leads to a parsed query, which is of course not what is intended:

<lst name="parsed_filter_queries"><str>RegexpQuery(folderPath_:/standardlsg/)
folderPath_:shareddocs RegexpQuery(folderPath_:/personen/)
folderPath_:*</str></lst>

Is there a possibility to make the example query above work, without
escaping the "/" with "\/"?
Otherwise I will have to parse all queries (coming from persisted
configurations in the repository) and escape the relevant parts of the
queries on that field, which is somewhat ugly...
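[Editor's note: for anyone hitting the same migration issue, here is a minimal escaping sketch, not from the thread itself. The class and method names are made up; it simply prefixes every not-yet-escaped forward slash with a backslash before the string reaches the classic query parser, so path filters like /customer/content/* stay plain terms instead of becoming regexps.]

```java
// Hypothetical helper (names invented for illustration): escape forward
// slashes that are not already preceded by a backslash.
public class SlashEscaper {
    public static String escapeSlashes(String q) {
        StringBuilder sb = new StringBuilder(q.length());
        for (int i = 0; i < q.length(); i++) {
            char c = q.charAt(i);
            // Prefix '/' with '\' unless the previous char already escapes it.
            if (c == '/' && (i == 0 || q.charAt(i - 1) != '\\')) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints \/customer\/content\/*
        System.out.println(escapeSlashes("/customer/content/*"));
    }
}
```

If I remember correctly, Lucene 4's QueryParser.escape() also escapes '/', but it escapes every special character, which may be too aggressive for filter queries that deliberately use wildcards like '*'.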

The field I search on is of type:




  
  


 

Best and thanks for any hints
Dirk


Re: Forwardslash delimiter.Solr4.0 query for path like /Customer/Content/*

2012-11-01 Thread Dirk Högemann
Ok. If there is no other way I will have some string parsing to do, but in
this case I am wondering a little bit about the chosen delimiter... as it is
central to nearly any path in directories, web resources, etc., right?
Best
Dirk
On 30.10.2012 19:16, "Jack Krupansky"  wrote:

> Maybe a custom search component that runs before the QueryComponent and
> does the escaping?
>
> -- Jack Krupansky
>
> -Original Message- From: Dirk Högemann
> Sent: Tuesday, October 30, 2012 1:07 PM
> To: solr-user@lucene.apache.org
> Subject: Forwardslash delimiter.Solr4.0 query for path like
> /Customer/Content/*
>
> Hi,
>
> I am currently upgrading from Solr 3.5 to Solr 4.0
>
> I used to have filter-based restrictions for my search based on the paths
> of documents in a content repository.
> E.g. fq={!q.op=OR df=folderPath_}/customer/content/*
>
> Unfortunately this does not work anymore, as Lucene now supports
> regexp searches, delimiting the expression with forward slashes:
> http://lucene.apache.org/core/4_0_0-BETA/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Regexp_Searches
>
> this leads to a parsed query, which is of course not what is intended:
>
> <lst name="parsed_filter_queries"><str>RegexpQuery(folderPath_:/standardlsg/)
> folderPath_:shareddocs RegexpQuery(folderPath_:/personen/)
> folderPath_:*</str></lst>
>
> Is there a possibility to make the example query above work, without
> escaping the "/" with "\/"?
> Otherwise I will have to parse all queries (coming from persisted
> configurations in the repository) and escape the relevant parts of the
> queries on that field, which is somewhat ugly...
>
> The field I search on is of type:
>
> 
>
>
>  
>  
>
>
> 
>
> Best and thanks for any hints
> Dirk
>