Re: Error "unexpected docvalues type NUMERIC for field" using rord() function query on single valued int field
Hello,

We've figured out a workaround for this, using another field that's multiValued and populated with the same values, and using that field in the rord() function query. Nevertheless, this feels like a bug to me.

Bye,

Jaco.
Re: Error "unexpected docvalues type NUMERIC for field" using rord() function query on single valued int field
Hi,

No, I reproduced the original issue, with the rord() function, on a brand new index with docValues=true, with just one doc indexed in it.

Any clues?

Thanks,

Jaco.

On 21 November 2016 at 15:06, Pushkar Raste <pushkar.ra...@gmail.com> wrote:
> Did you turn on/off docValues on a already existing field?
Re: Error "unexpected docvalues type NUMERIC for field" using rord() function query on single valued int field
Hi,

I made a typo. The Solr version number in which this error occurs is 5.5.3. I also checked 6.3.0, same problem.

Thanks, bye,

Jaco.
Error "unexpected docvalues type NUMERIC for field" using rord() function query on single valued int field
Hello Solr users,

I'm running into an error situation using Solr 5.3.3. The case is as follows. In my schema, I have a field with a definition like this:

<fieldType ... positionIncrementGap="0"/>
....
<field name="PublicationDate" ... docValues="true"/>

That field is used in function queries for boosting purposes, using the rord() function. We're coming from Solr 4, not using docValues for that field, and now moving to Solr 5, using docValues. Now, this is causing a problem. When doing this:

http://localhost:8983/solr/core1/select?q=*:*&fl=ID,recip(rord(PublicationDate),0.15,300,10)

The following error is given: "unexpected docvalues type NUMERIC for field 'PublicationDate' (expected one of [SORTED, SORTED_SET]). Use UninvertingReader or index with docvalues" (full stack trace below).

This does not happen when the field is changed to be multiValued, but I don't want to change that at this point (and I noticed that changing from single valued to multivalued, then attempting to post the document again also results in an error related to docvalues type, but that could be the topic of another mail I guess). This is now blocking our long-desired upgrade to Solr 5. We initially tried upgrading without docValues, but performance was completely killed because of our function query based ranking, so we decided to use docValues.

To me, this seems like a bug. I've tried finding something in Solr's JIRA; the exact same error is in https://issues.apache.org/jira/browse/SOLR-7495, but that is a different case.

I can create a JIRA issue for this of course, but first wanted to throw this at the mailing list to see if there are any insights that can be shared.

Thanks a lot in advance, bye,

Jaco.

unexpected docvalues type NUMERIC for field 'PublicationDate' (expected one of [SORTED, SORTED_SET]). Use UninvertingReader or index with docvalues.
java.lang.IllegalStateException: unexpected docvalues type NUMERIC for field 'PublicationDate' (expected one of [SORTED, SORTED_SET]). Use UninvertingReader or index with docvalues.
  at org.apache.lucene.index.DocValues.checkField(DocValues.java:208)
  at org.apache.lucene.index.DocValues.getSortedSet(DocValues.java:306)
  at org.apache.solr.search.function.ReverseOrdFieldSource.getValues(ReverseOrdFieldSource.java:98)
  at org.apache.lucene.queries.function.valuesource.ReciprocalFloatFunction.getValues(ReciprocalFloatFunction.java:64)
  at org.apache.solr.response.transform.ValueSourceAugmenter.transform(ValueSourceAugmenter.java:95)
  at org.apache.solr.response.DocsStreamer.next(DocsStreamer.java:160)
  at org.apache.solr.response.TextResponseWriter.writeDocuments(TextResponseWriter.java:246)
  at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:151)
  at org.apache.solr.response.XMLWriter.writeResponse(XMLWriter.java:113)
  at org.apache.solr.response.XMLResponseWriter.write(XMLResponseWriter.java:39)
  at org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:52)
  at org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:728)
  at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:469)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)
  at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
  at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
  at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
  at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
  at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
  at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
  at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
  at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
  at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
  at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
  at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
  at org.eclipse.jetty.server.Server.handle(Server.java:499)
  at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
  at org.eclipse.jetty.server.HttpConnection.onFillable(HttpCo
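For reference, the boost being computed here is recip(rord(PublicationDate),0.15,300,10), i.e. a/(m*x+b) where x is the reverse ordinal of the field value. A minimal sketch of that arithmetic (plain Python, not Solr code; rord() below is a toy reimplementation, and the year values are made up):

```python
def rord(all_values, v):
    """Toy reverse ordinal: the largest distinct value gets ordinal 1."""
    ordered = sorted(set(all_values), reverse=True)
    return ordered.index(v) + 1

def recip(x, m, a, b):
    """Solr's recip(x, m, a, b) = a / (m*x + b)."""
    return a / (m * x + b)

# Newer publication dates get a smaller reverse ordinal, hence a larger boost.
years = [1999, 2005, 2010, 2016]
boosts = {y: recip(rord(years, y), 0.15, 300, 10) for y in years}
```

The recip curve keeps the boost bounded (at most a/b) while still decaying smoothly as documents age, which is why it is a common choice for date-based ranking.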
Solr 3.5 MoreLikeThis on Date fields
Hi Everyone,

Please help out if you know what is going on. We are upgrading to Solr 3.5 (from 1.4.1) and busy with a re-index and test of our data. Everything seems OK, but date fields seem to be broken when used with the MoreLikeThis handler (I also saw the same error on date fields using the highlighter, in another forum post: "Invalid Date String for highlighting any date field match", Mon 2011/08/15 13:10).

* I deleted the index/core and only loaded a few records, and still get the error when using MoreLikeThis with docdate as part of the mlt.fl params.
* I double checked all the data that was loaded; the dates parse 100% and I can see no problems with any of the data loaded.

Type:

<fieldType name="date" class="solr.TrieDateField" omitNorms="true" precisionStep="0" positionIncrementGap="0"/>

Definition:

<field name="docdate" type="date" indexed="true" stored="true" multiValued="false"/>

A sample result:

<date name="docdate">1999-06-28T00:00:00Z</date>

THE MLT QUERY:

Jan 16, 2012 4:09:16 PM org.apache.solr.core.SolrCore execute
INFO: [legal_spring] webapp=/solr path=/select params={mlt.fl=doctitle,pld_pubtype,docdate,pld_cluster,pld_port,pld_summary,alltext,subclass&mlt.mintf=1&mlt=true&version=2.2&fl=doc_id,doctitle,docdate,prodtype&qt=mlt&mlt.boost=true&mlt.qf=doctitle^5.0+alltext^0.2&json.nl=map&wt=json&rows=50&mlt.mindf=1&mlt.count=50&start=0&q=doc_id:PLD23996} status=400 QTime=1

THE ERROR:

Jan 16, 2012 4:09:16 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Invalid Date String:'94046400'
  at org.apache.solr.schema.DateField.parseMath(DateField.java:165)
  at org.apache.solr.analysis.TrieTokenizer.reset(TrieTokenizerFactory.java:106)
  at org.apache.solr.analysis.TrieTokenizer.init(TrieTokenizerFactory.java:76)
  at org.apache.solr.analysis.TrieTokenizerFactory.create(TrieTokenizerFactory.java:51)
  at org.apache.solr.analysis.TrieTokenizerFactory.create(TrieTokenizerFactory.java:41)
  at org.apache.solr.analysis.TokenizerChain.getStream(TokenizerChain.java:68)
  at org.apache.solr.analysis.SolrAnalyzer.reusableTokenStream(SolrAnalyzer.java:75)
  at org.apache.solr.schema.IndexSchema$SolrIndexAnalyzer.reusableTokenStream(IndexSchema.java:385)
  at org.apache.lucene.search.similar.MoreLikeThis.addTermFrequencies(MoreLikeThis.java:876)
  at org.apache.lucene.search.similar.MoreLikeThis.retrieveTerms(MoreLikeThis.java:820)
  at org.apache.lucene.search.similar.MoreLikeThis.like(MoreLikeThis.java:629)
  at org.apache.solr.handler.MoreLikeThisHandler$MoreLikeThisHelper.getMoreLikeThis(MoreLikeThisHandler.java:311)
  at org.apache.solr.handler.MoreLikeThisHandler.handleRequestBody(MoreLikeThisHandler.java:149)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
  at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
  at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
  at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
  at java.lang.Thread.run(Thread.java:619)

Sincerely,

Jaco Olivier

Please note: This email and its content are subject to the disclaimer as displayed at the following link http://www.sabinet.co.za/?page=e-mail-disclaimer. Should you not have Web access, send an email to i...@sabinet.co.za and a copy will be sent to you.
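One workaround worth trying (not confirmed anywhere in this thread) is simply keeping date-typed fields out of mlt.fl, since the stack trace shows MoreLikeThis re-analyzing indexed terms and the trie-encoded date token ('94046400') is not a valid date string. A hedged sketch of building such a request (field names and types copied from the post; the field_types mapping is an assumption, not a Solr API):

```python
from urllib.parse import urlencode

# Field -> type mapping as described in the post (assumed for illustration).
field_types = {
    "doctitle": "text", "pld_pubtype": "string", "docdate": "date",
    "pld_cluster": "string", "pld_port": "string", "pld_summary": "text",
    "alltext": "text", "subclass": "string",
}

# Keep date fields out of mlt.fl so raw trie-encoded date tokens are never
# fed back through the date analyzer.
mlt_fields = [f for f, t in field_types.items() if t != "date"]

params = {
    "qt": "mlt",
    "mlt": "true",
    "mlt.fl": ",".join(mlt_fields),
    "mlt.mintf": "1",
    "mlt.mindf": "1",
    "q": "doc_id:PLD23996",
}
query_string = urlencode(params)
```

The date field can still be listed in fl for display; it is only its presence in mlt.fl (the fields used for similarity) that triggers the analysis error.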
Solr and Lucene in South Africa
Hi to all Solr/Lucene users,

Our team had a discussion today regarding the Solr/Lucene community closer to home. I am hereby putting out an SOS to all Solr/Lucene users in the South African market, and wish to organize a meet-up (or user support group) if at all possible. It would be great to share some triumphs and pitfalls that were experienced.

* Sorry for hogging the user mailing list with a non-technical question, but I think this is the easiest way to get it done :)

Jaco Olivier
Web Specialist
Re: Good literature on search basics
See http://markmail.org/thread/z5sq2jr2a6eayth4

On 12 February 2010 12:14, javaxmlsoapdev <vika...@yahoo.com> wrote:
> Does anyone know good literature (web resources, books, etc.) on the basics of search? I do have the Solr 1.4 and Lucene books but wanted to go into more detail on the basics.
>
> Thanks,
> --
> View this message in context: http://old.nabble.com/Good-literature-on-search-basics-tp27562021p27562021.html
> Sent from the Solr - User mailing list archive at Nabble.com.
RE: why no results?
Hi Regan,

I am using STRING fields only for values that in most cases will be used to FACET on. I suggest using TEXT fields as per the default examples. ALSO, remember that if you do not specify solr.LowerCaseFilterFactory, your search has just become case sensitive. I struggled with that one before, so make sure what you are indexing is what you are searching for.

* Stick to the default examples that are provided with the Solr distro and you should be fine.

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal. enablePositionIncrements=true ensures that a 'gap' is left to allow for accurate phrase queries. -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Jaco Olivier

-----Original Message-----
From: regany [mailto:re...@newzealand.co.nz]
Sent: 08 December 2009 06:15
To: solr-user@lucene.apache.org
Subject: Re: why no results?

Tom Hill-7 wrote:
> Try solr.TextField instead.

Thanks Tom, I've replaced the types section above with...

<types>
  <fieldtype name="string" class="solr.TextField" sortMissingLast="true" omitNorms="true"/>
</types>

...deleted my index, restarted Solr and re-indexed my documents - but the search still returns nothing. Do I need to change the type in the fields section as well?

regan
--
View this message in context: http://old.nabble.com/why-no-results--tp26688249p26688469.html
Sent from the Solr - User mailing list archive at Nabble.com.

Please consider the environment before printing this email. This transmission is for the intended addressee only and is confidential information. If you have received this transmission in error, please delete it and notify the sender. The content of this e-mail is the opinion of the writer only and is not endorsed by Sabinet Online Limited unless expressly stated otherwise.
RE: why no results?
Hi,

Try changing your TEXT field to type "text":

<field name="text" type="text" indexed="true" stored="false" multiValued="true"/>

That is your problem... also use the "text" type as per the default examples with the Solr distro :)

Jaco Olivier

-----Original Message-----
From: regany [mailto:re...@newzealand.co.nz]
Sent: 08 December 2009 05:44
To: solr-user@lucene.apache.org
Subject: why no results?

hi all - newbie solr question - I've indexed some documents and can search / receive results using the following schema - BUT ONLY when searching on the id field. If I try searching on the title, subtitle, body or text field I receive NO results. Very confused. :confused: Can anyone see anything obvious I'm doing wrong?

Regan.

<?xml version="1.0" ?>
<schema name="core0" version="1.1">
  <types>
    <fieldtype name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
  </types>
  <fields>
    <!-- general -->
    <field name="id" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
    <field name="title" type="string" indexed="true" stored="true" multiValued="false"/>
    <field name="subtitle" type="string" indexed="true" stored="true" multiValued="false"/>
    <field name="body" type="string" indexed="true" stored="true" multiValued="false"/>
    <field name="text" type="string" indexed="true" stored="false" multiValued="true"/>
  </fields>
  <!-- field to use to determine and enforce document uniqueness. -->
  <uniqueKey>id</uniqueKey>
  <!-- field for the QueryParser to use when an explicit fieldname is absent -->
  <defaultSearchField>text</defaultSearchField>
  <!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
  <solrQueryParser defaultOperator="OR"/>
  <!-- copyFields group fields into one single searchable indexed field for speed. -->
  <copyField source="title" dest="text"/>
  <copyField source="subtitle" dest="text"/>
  <copyField source="body" dest="text"/>
</schema>

--
View this message in context: http://old.nabble.com/why-no-results--tp26688249p26688249.html
Sent from the Solr - User mailing list archive at Nabble.com.
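The difference the replies are pointing at can be illustrated outside Solr: a StrField indexes the whole value as one untokenized term (so only an exact match hits), while a TextField analyzer tokenizes and lowercases. A toy sketch of the contrast (plain Python, not Solr code; the function names are made up):

```python
def analyze(value):
    # Roughly what a whitespace tokenizer + lowercase filter produces.
    return [t.lower() for t in value.split()]

def string_field_matches(stored, query):
    # StrField: the whole value is a single untokenized, case-sensitive term.
    return stored == query

def text_field_matches(stored, query):
    # TextField: every analyzed query token must appear among the stored tokens.
    return set(analyze(query)) <= set(analyze(stored))
```

This is why searching a string-typed title field for a single lowercase word returns nothing, while the same query against a text-typed field matches.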
RE: do copyField's need to exist as Fields?
Hi Regan,

Something I noticed on your setup... The ID field in your setup I assume to be your unique ID for the book or journal (the ISSN or something). Try making this a string, as TEXT is not the ideal field type to use for unique IDs:

<field name="id" type="string" indexed="true" stored="true" multiValued="false" required="true"/>

Congrats on figuring out the Solr fields - I suggest getting the Solr 1.4 book... it really saved me a 1000 questions on this mailing list :)

Jaco Olivier

-----Original Message-----
From: regany [mailto:re...@newzealand.co.nz]
Sent: 09 December 2009 00:48
To: solr-user@lucene.apache.org
Subject: Re: do copyField's need to exist as Fields?

regany wrote:
> Is there a different way I should be setting it up to achieve the above??

Think I figured it out. I set up the fields so they are present, but get ignored except for the text field, which gets indexed...

<field name="id" type="text" indexed="true" stored="true" multiValued="false" required="true"/>
<field name="title" stored="false" indexed="false" multiValued="true" type="text"/>
<field name="subtitle" stored="false" indexed="false" multiValued="true" type="text"/>
<field name="body" stored="false" indexed="false" multiValued="true" type="text"/>
<field name="text" type="text" indexed="true" stored="false" multiValued="true"/>

and then copyField the first 4 fields to the text field:

<copyField source="id" dest="text"/>
<copyField source="title" dest="text"/>
<copyField source="subtitle" dest="text"/>
<copyField source="body" dest="text"/>

Seems to be working!? :drunk:
--
View this message in context: http://old.nabble.com/do-copyField%27s-need-to-exist-as-Fields--tp26701706p26702224.html
Sent from the Solr - User mailing list archive at Nabble.com.
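What that copyField setup does at index time can be pictured as appending each source value into the catch-all destination field before analysis. A rough sketch of the idea (plain Python, not Solr's implementation; the helper name is made up):

```python
def apply_copy_fields(doc, rules):
    """Return a new document with (source, dest) copyField rules applied.

    The destination becomes a multi-valued list, which is why the catch-all
    'text' field must be declared multiValued in the schema.
    """
    out = {k: v for k, v in doc.items()}
    for source, dest in rules:
        if source in doc:
            out.setdefault(dest, []).append(doc[source])
    return out

rules = [("id", "text"), ("title", "text"), ("subtitle", "text"), ("body", "text")]
doc = {"id": "123", "title": "Solr", "subtitle": "Search", "body": "Lucene inside"}
indexed = apply_copy_fields(doc, rules)
```

Note that only the copy is analyzed and searched here; the sources can stay stored-but-unindexed exactly as in regan's schema.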
Nested phrases with proximity in Solr
Hello,

As far as I've been able to dig up, there is no way to use nested phrases in Solr, let alone with proximity - for instance, a query like a b c d~10. I've seen a special Surround query parser in Lucene that appears to support this. Am I missing something? Any clues, anybody?

Thanks in advance, bye,

Jaco.
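For intuition, the kind of match being asked about - one phrase occurring within some distance of another - can be sketched over token positions. This is only a toy illustration of the concept, not the Surround parser's actual semantics; the document, phrases, and slop value are all made up:

```python
def phrase_positions(tokens, phrase):
    """Start offsets where `phrase` occurs as a contiguous run in `tokens`."""
    n = len(phrase)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase]

def phrases_near(tokens, p1, p2, slop):
    """True if some occurrence of p2 starts within `slop` tokens after p1 ends."""
    return any(0 <= j - (i + len(p1)) <= slop
               for i in phrase_positions(tokens, p1)
               for j in phrase_positions(tokens, p2))

doc = "the a b gap gap c d end".split()
```

A real implementation would do this with position postings rather than scanning tokens, which is roughly what Lucene's span queries provide.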
Re: Solr Replication on Windows
Hi,

In my experience, you can just migrate to 1.4. We are using this in production without any problems, and the Java replication (http://wiki.apache.org/solr/SolrReplication) works excellently.

Bye,

Jaco.

2009/6/17 vaibhav joshi <callvaib...@hotmail.com>
> Hi,
>
> I am using the Solr 1.3 release and have different sets of machines for the query side and the indexing master. These machines are Windows boxes. The Solr replication scripts in the wiki are Unix shell scripts. Is there any script or Java version of replication available with Solr 1.3? I saw Java replication mentioned in the wiki, but it seems to be available only with Solr 1.4.
>
> Thanks,
> Vaibhav
Re: Who is running 1.4 nightly in production?
Running 1.4 nightly in production as well, also for the Java replication and for the improved facet count algorithms. No problems, all running smoothly. Bye, Jaco. 2009/5/13 Erik Hatcher e...@ehatchersolutions.com We run a not too distant trunk (1.4, probably a month or so ago) version of Solr on LucidFind at http://www.lucidimagination.com/search Erik On May 12, 2009, at 5:02 PM, Walter Underwood wrote: We're planning our move to 1.4, and want to run one of our production servers with the new code. Just to feel better about it, is anyone else running 1.4 in production? I'm building 2009-05-11 right now. wuner
Re: Dictionary lookup possibilities
Hi, Thanks for the suggestions! It looks like the MemoryIndex is worth having a detailed look at, so that's what I'll start on. Thanks again, bye, Jaco. 2009/4/17 Steven A Rowe sar...@syr.edu Hi Jaco, On 4/9/2009 at 2:58 PM, Jaco wrote: I'm struggling with some ideas, maybe somebody can help me with past experiences or tips. I have loaded a dictionary into a Solr index, using stemming and some stopwords in the analysis part of the schema. Each record holds a term from the dictionary, which can consist of multiple words. For some data analysis work, I want to send pieces of text (sentences actually) to Solr to retrieve all possible dictionary terms that could occur. Ideally, I want to construct a query that only returns those Solr records for which all individual words in that record are matched. For instance, my dictionary holds the following terms:

1 - a b c d
2 - c d e
3 - a b
4 - a e f g h

If I put the sentence [a b c d f g h] in as a query, I want to receive dictionary items 1 (matching all words a b c d) and 3 (matching words a b) as matches. I have been puzzling about how to do this. The only way I found so far was to construct an OR query with all words of the sentence in it. In this case, that would result in all dictionary items being returned. This would then require some code to go over the search results and analyse each of them (i.e. by using the highlight function) to kick out 'false' matches, but I am looking for a more efficient way. Is there a way to do this with Solr functionality, or do I need to start looking into the Lucene API? Your problem could be modeled as a set of standing queries, where your dictionary entries are the *queries* (with all words required, maybe using a PhraseQuery or a SpanNearQuery), and the sentence is the document.
Solr may not be usable in this context (extremely high volume queries), depending on your throughput requirements, but Lucene's MemoryIndex was designed for this kind of thing: http://lucene.apache.org/java/2_4_1/api/contrib-memory/org/apache/lucene/index/memory/MemoryIndex.html Steve
Dictionary lookup possibilities
Hello, I'm struggling with some ideas, maybe somebody can help me with past experiences or tips. I have loaded a dictionary into a Solr index, using stemming and some stopwords in the analysis part of the schema. Each record holds a term from the dictionary, which can consist of multiple words. For some data analysis work, I want to send pieces of text (sentences actually) to Solr to retrieve all possible dictionary terms that could occur. Ideally, I want to construct a query that only returns those Solr records for which all individual words in that record are matched. For instance, my dictionary holds the following terms:

1 - a b c d
2 - c d e
3 - a b
4 - a e f g h

If I put the sentence [a b c d f g h] in as a query, I want to receive dictionary items 1 (matching all words a b c d) and 3 (matching words a b) as matches. I have been puzzling about how to do this. The only way I found so far was to construct an OR query with all words of the sentence in it. In this case, that would result in all dictionary items being returned. This would then require some code to go over the search results and analyse each of them (i.e. by using the highlight function) to kick out 'false' matches, but I am looking for a more efficient way. Is there a way to do this with Solr functionality, or do I need to start looking into the Lucene API? Any help would be much appreciated as usual! Thanks, bye, Jaco.
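The all-words-matched requirement described above is easy to prototype outside Solr. Below is a minimal stdlib sketch (class and method names are hypothetical, and it ignores stemming and stopwords) that treats each dictionary entry as a bag of words and keeps only the entries fully covered by the sentence:

```java
import java.util.*;

public class DictionaryMatcher {
    // Returns the dictionary entries whose words all occur in the sentence.
    static List<String> matchingEntries(List<String> dictionary, String sentence) {
        Set<String> sentenceWords = new HashSet<>(Arrays.asList(sentence.split("\\s+")));
        List<String> matches = new ArrayList<>();
        for (String entry : dictionary) {
            boolean allMatch = true;
            for (String word : entry.split("\\s+")) {
                if (!sentenceWords.contains(word)) { allMatch = false; break; }
            }
            if (allMatch) matches.add(entry);
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList("a b c d", "c d e", "a b", "a e f g h");
        // The example from the question: items 1 and 3 should match.
        System.out.println(matchingEntries(dictionary, "a b c d f g h")); // [a b c d, a b]
    }
}
```

A standing-query setup (one PhraseQuery or SpanNearQuery per dictionary entry, run against the sentence as a one-document index) adds position and analysis awareness; this sketch only checks word containment.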
Re: Size of my index directory increase considerably
Hi, After installing that patch, all is running fine for me as well - the problem is no longer occurring and replication is running great! The issue https://issues.apache.org/jira/browse/SOLR-978 has already been committed, so it's also there in the 1.4 nightly builds. Bye, Jaco. 2009/3/26 sunnyfr johanna...@gmail.com Just applied this patch: http://www.nabble.com/Solr-Replication%3A-disk-space-consumed-on-slave-much-higher-than-on--master-td21579171.html#a21622876 It seems to work well now. Do I have to do something else? Do you recommend anything for my configuration? Thanks a lot -- View this message in context: http://www.nabble.com/Size-of-my-index-directory-increase-considerably-tp22718590p22722075.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Replication: disk space consumed on slave much higher than on master
Hi Noble, Great stuff, no problem, I really think the Solr development team is excellent and takes pride in delivering high quality software! And we're going into production with a brand new Solr based system in a few weeks as well, so I'm really happy that this is fixed now. Bye, Jaco. 2009/1/24 Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com hi Jaco, We owe you a big THANK YOU. We were planning to roll out this feature into production in the next week or so. Our internal testing could not find this out. --Noble On Fri, Jan 23, 2009 at 6:36 PM, Jaco jdevr...@gmail.com wrote: Hi, I have tested this as well, looking fine! Both issues are indeed fixed, and the index directory of the slaves gets cleaned up nicely. I will apply the changes to all systems I've got running and report back in this thread in case any issues are found. Thanks for the very fast help! I usually need much, much more patience with commercial software vendors.. Cheers, Jaco. 2009/1/23 Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com I have opened an issue to track this https://issues.apache.org/jira/browse/SOLR-978 On Fri, Jan 23, 2009 at 5:22 PM, Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com wrote: I tested with the patch, it has solved both the issues On Fri, Jan 23, 2009 at 5:00 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Fri, Jan 23, 2009 at 2:12 PM, Jaco jdevr...@gmail.com wrote: Hi, I applied the patch and did some more tests - also adding some LOG.info() calls in delTree to see if it actually gets invoked (LOG.info("START: delTree: " + dir.getName()); at the start of that method). I don't see any entries of this showing up in the log file at all, so it looks like delTree doesn't get invoked at all.
To be sure, explaining the issue to prevent misunderstanding: - The number of files in the index directory on the slave keeps increasing (in my very small test core, there are now 128 files in the slave's index directory, and only 73 files in the master's index directory) - The directories index.x are still there after replication, but they are empty Are there any other things I can do to check, or more info that I can provide to help fix this? The problem is that when we do a commit on the slave after replication is done, the commit does not re-open the IndexWriter. Therefore, the deletion policy does not take effect and older files are left as is. This can keep on building up. The only solution is to re-open the index writer. I think the attached patch can solve this problem. Can you try this and let us know? Thank you for your patience. -- Regards, Shalin Shekhar Mangar. -- --Noble Paul -- --Noble Paul
Re: Solr Replication: disk space consumed on slave much higher than on master
Hi, I applied the patch and did some more tests - also adding some LOG.info() calls in delTree to see if it actually gets invoked (LOG.info("START: delTree: " + dir.getName()); at the start of that method). I don't see any entries of this showing up in the log file at all, so it looks like delTree doesn't get invoked at all. To be sure, explaining the issue to prevent misunderstanding: - The number of files in the index directory on the slave keeps increasing (in my very small test core, there are now 128 files in the slave's index directory, and only 73 files in the master's index directory) - The directories index.x are still there after replication, but they are empty Are there any other things I can do to check, or more info that I can provide to help fix this? Thanks, bye, Jaco. 2009/1/22 Shalin Shekhar Mangar shalinman...@gmail.com On Fri, Jan 23, 2009 at 12:15 AM, Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com wrote: I have attached a patch which logs the names of the files which could not get deleted (which may help us diagnose the problem). If you are comfortable applying a patch you may try it out. I've committed this patch to trunk. -- Regards, Shalin Shekhar Mangar.
Re: Solr Replication: disk space consumed on slave much higher than on master
Hi, I have tested this as well, looking fine! Both issues are indeed fixed, and the index directory of the slaves gets cleaned up nicely. I will apply the changes to all systems I've got running and report back in this thread in case any issues are found. Thanks for the very fast help! I usually need much, much more patience with commercial software vendors.. Cheers, Jaco. 2009/1/23 Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com I have opened an issue to track this https://issues.apache.org/jira/browse/SOLR-978 On Fri, Jan 23, 2009 at 5:22 PM, Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com wrote: I tested with the patch, it has solved both the issues On Fri, Jan 23, 2009 at 5:00 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Fri, Jan 23, 2009 at 2:12 PM, Jaco jdevr...@gmail.com wrote: Hi, I applied the patch and did some more tests - also adding some LOG.info() calls in delTree to see if it actually gets invoked (LOG.info("START: delTree: " + dir.getName()); at the start of that method). I don't see any entries of this showing up in the log file at all, so it looks like delTree doesn't get invoked at all. To be sure, explaining the issue to prevent misunderstanding: - The number of files in the index directory on the slave keeps increasing (in my very small test core, there are now 128 files in the slave's index directory, and only 73 files in the master's index directory) - The directories index.x are still there after replication, but they are empty Are there any other things I can do to check, or more info that I can provide to help fix this? The problem is that when we do a commit on the slave after replication is done, the commit does not re-open the IndexWriter. Therefore, the deletion policy does not take effect and older files are left as is. This can keep on building up. The only solution is to re-open the index writer. I think the attached patch can solve this problem. Can you try this and let us know? Thank you for your patience.
-- Regards, Shalin Shekhar Mangar. -- --Noble Paul -- --Noble Paul
Re: Solr Replication: disk space consumed on slave much higher than on master
Hm, I don't know what to do anymore. I tried this: - Run the Tomcat service as local administrator to overcome any permissioning issues - Installed the latest nightly build (I noticed that the item I mentioned before ( http://markmail.org/message/yq2ram4f3jblermd) had been committed, which is good) - Built a small master and slave core to try it all out - With each replication, the number of files on the slave grows, and the directories index.xxx.. are not removed - I tried sending explicit commit commands to the slave, assuming it wouldn't help, which was true. - I don't see any reference to SolrDeletion in the log of the slave (it's there in the log of the master) Can anybody recommend some action to be taken? I'm building up some quite large production cores right now, and don't want the slaves to eat up all hard disk space of course.. Thanks a lot in advance, Jaco. 2009/1/21 Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com On Wed, Jan 21, 2009 at 3:42 PM, Jaco jdevr...@gmail.com wrote: Thanks for the fast replies! It appears that I made a (probably classical) error... I didn't make the change to solrconfig.xml to include the deletionPolicy when applying the upgrade. I include this now, but the slave is not cleaning up. Will this be done at some point automatically? Can I trigger this? Unfortunately, no. Lucene is supposed to clean up these old commit points automatically after each commit. Even if the deletionPolicy is not specified, the default is supposed to take effect. User access rights for the user are OK, this user is allowed to do anything in the Solr data directory (the Tomcat service is running from the SYSTEM account (Windows)). Thanks, regards, Jaco. 2009/1/21 Shalin Shekhar Mangar shalinman...@gmail.com Hi, There shouldn't be so many files on the slave. Since the empty index.x folders are not getting deleted, is it possible that the Solr process user does not have enough privileges to delete files/folders? Also, have you made any changes to the IndexDeletionPolicy configuration?
On Wed, Jan 21, 2009 at 2:15 PM, Jaco jdevr...@gmail.com wrote: Hi, I'm running Solr nightly build of 20.12.2008, with patch as discussed on http://markmail.org/message/yq2ram4f3jblermd, using Solr replication. On various systems running, I see that the disk space consumed on the slave is much higher than on the master. One example: - Master: 30 GB in 138 files - Slave: 152 GB in 3,941 files Can anybody tell me what to do to prevent this from happening, and how to clean up the slave? Also, there are quite some empty index.xxx directories sitting in the slaves data dir. Can these be safely removed? Thanks a lot in advance, bye, Jaco. -- Regards, Shalin Shekhar Mangar. -- --Noble Paul
Solr Replication: disk space consumed on slave much higher than on master
Hi, I'm running a Solr nightly build of 20.12.2008, with the patch as discussed on http://markmail.org/message/yq2ram4f3jblermd, using Solr replication. On various systems running, I see that the disk space consumed on the slave is much higher than on the master. One example: - Master: 30 GB in 138 files - Slave: 152 GB in 3,941 files Can anybody tell me what to do to prevent this from happening, and how to clean up the slave? Also, there are quite some empty index.xxx directories sitting in the slave's data dir. Can these be safely removed? Thanks a lot in advance, bye, Jaco.
Re: Solr Replication: disk space consumed on slave much higher than on master
Thanks for the fast replies! It appears that I made a (probably classical) error... I didn't make the change to solrconfig.xml to include the deletionPolicy when applying the upgrade. I include this now, but the slave is not cleaning up. Will this be done at some point automatically? Can I trigger this? User access rights for the user are OK, this user is allowed to do anything in the Solr data directory (the Tomcat service is running from the SYSTEM account (Windows)). Thanks, regards, Jaco. 2009/1/21 Shalin Shekhar Mangar shalinman...@gmail.com Hi, There shouldn't be so many files on the slave. Since the empty index.x folders are not getting deleted, is it possible that the Solr process user does not have enough privileges to delete files/folders? Also, have you made any changes to the IndexDeletionPolicy configuration? On Wed, Jan 21, 2009 at 2:15 PM, Jaco jdevr...@gmail.com wrote: Hi, I'm running a Solr nightly build of 20.12.2008, with the patch as discussed on http://markmail.org/message/yq2ram4f3jblermd, using Solr replication. On various systems running, I see that the disk space consumed on the slave is much higher than on the master. One example: - Master: 30 GB in 138 files - Slave: 152 GB in 3,941 files Can anybody tell me what to do to prevent this from happening, and how to clean up the slave? Also, there are quite some empty index.xxx directories sitting in the slave's data dir. Can these be safely removed? Thanks a lot in advance, bye, Jaco. -- Regards, Shalin Shekhar Mangar.
Unable to move index file error during replication
Hello, While testing out the new replication features, I'm running into some strange problem. On the slave, I keep getting an error like this after all files have been copied from the master to the temporary index.x directory: SEVERE: Unable to move index file from: D:\Data\solr\Slave\data\index.20081224110855\_21e.tvx to: D:\Data\Solr\Slave\data\index\_21e.tvx The replication then stops, and the index remains in its original state, so the updates are not available at the slave. This is my replication config at the master:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <!-- Replicate on 'optimize'; it can also be 'commit' -->
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml</str>
  </lst>
</requestHandler>

This is the replication config at the slave:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://hostnamemaster:8080/solr/Master/replication</str>
    <str name="pollInterval">00:10:00</str>
    <str name="zip">true</str>
  </lst>
</requestHandler>

I'm running a Solr nightly build of 21.12.2008 in Tomcat 6 on Windows 2003. Initially I thought there was some problem with disk space, but this is not the case. Replication did run fine for the initial version of the index, but after that at some point it didn't work anymore. Any ideas what could be wrong here? Thanks very much in advance, bye, Jaco.
Re: Unable to move index file error during replication
Very good! I applied the patch in the attached file, working fine now. I'll keep monitoring and post any issues found. Will this be included in some next nightly build? Thanks very much for the very quick response! Jaco. 2008/12/24 Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com James, thanks. If this is true, the place to fix this is in ReplicationHandler#getFileList(). A patch is attached. On Wed, Dec 24, 2008 at 4:04 PM, James Grant james.gr...@semantico.com wrote: I had the same problem. It turned out that the list of files from the master included duplicates. When the slave completes the download and tries to move the files into the index, it comes across a file that does not exist because it has already been moved, so it backs out the whole operation. My solution for now was to patch the copyIndexFiles method of org.apache.solr.handler.SnapPuller so that it normalises the list before moving the files. This isn't the best solution since it will still download the file twice, but it was the easiest and smallest change to make. The patch is below Regards James

--- src/java/org/apache/solr/handler/SnapPuller.java (revision 727347)
+++ src/java/org/apache/solr/handler/SnapPuller.java (working copy)
@@ -470,7 +470,7 @@
    */
   private boolean copyIndexFiles(File snapDir, File indexDir) {
     String segmentsFile = null;
-    List<String> copiedfiles = new ArrayList<String>();
+    Set<String> filesToCopy = new HashSet<String>();
     for (Map<String, Object> f : filesDownloaded) {
       String fname = (String) f.get(NAME);
       // the segments file must be copied last
@@ -482,6 +482,10 @@
         segmentsFile = fname;
         continue;
       }
+      filesToCopy.add(fname);
+    }
+    List<String> copiedfiles = new ArrayList<String>();
+    for (String fname : filesToCopy) {
       if (!copyAFile(snapDir, indexDir, fname, copiedfiles)) return false;
       copiedfiles.add(fname);
     }

Jaco wrote: Hello, While testing out the new replication features, I'm running into some strange problem. On the slave, I keep getting an error like this after all files have been copied from the master to the temporary index.x directory: SEVERE: Unable to move index file from: D:\Data\solr\Slave\data\index.20081224110855\_21e.tvx to: D:\Data\Solr\Slave\data\index\_21e.tvx The replication then stops, and the index remains in its original state, so the updates are not available at the slave. This is my replication config at the master:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <!-- Replicate on 'optimize'; it can also be 'commit' -->
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml</str>
  </lst>
</requestHandler>

This is the replication config at the slave:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://hostnamemaster:8080/solr/Master/replication</str>
    <str name="pollInterval">00:10:00</str>
    <str name="zip">true</str>
  </lst>
</requestHandler>

I'm running a Solr nightly build of 21.12.2008 in Tomcat 6 on Windows 2003. Initially I thought there was some problem with disk space, but this is not the case. Replication did run fine for the initial version of the index, but after that at some point it didn't work anymore. Any ideas what could be wrong here? Thanks very much in advance, bye, Jaco. -- --Noble Paul
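The fix James describes boils down to removing duplicate names from the downloaded-file list before moving files into the index. His patch uses a HashSet; the same normalisation can be shown order-preserving with a LinkedHashSet. A stdlib sketch (class and method names are hypothetical):

```java
import java.util.*;

public class FileListDedup {
    // Removes duplicate file names while preserving first-seen order,
    // mirroring the normalisation applied before copying index files.
    static List<String> normalise(List<String> fileNames) {
        return new ArrayList<>(new LinkedHashSet<>(fileNames));
    }

    public static void main(String[] args) {
        // A file list with a duplicate, as reported by the master in this bug.
        List<String> reported = Arrays.asList("_21e.tvx", "_21e.tvd", "_21e.tvx", "segments_2");
        System.out.println(normalise(reported)); // [_21e.tvx, _21e.tvd, segments_2]
    }
}
```

As noted in the thread, deduplicating on the slave still downloads the duplicate twice; fixing ReplicationHandler#getFileList() on the master avoids the extra transfer.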
Preferred Tomcat version on Windows 2003 (64 bits)
Hello, I am planning a brand new environment for Solr running on a Windows 2003 Server 64 bits platform. I want to use Tomcat, and was wondering whether there is any preference in general for using Tomcat 5.5 or Tomcat 6.0 with Solr. Any suggestions would be appreciated! Thanks, bye, Jaco.
Re: Preferred Tomcat version on Windows 2003 (64 bits)
Thanks for the fast reply! I've tested SOLR-561, and it is working beautifully! Excellent functionality. Cheers, Jaco. 2008/11/6 Otis Gospodnetic [EMAIL PROTECTED] I don't think there are preferences. If going with the brand new setup why not go with Tomcat 6.0. Also be aware that if you want master-slave setup Windows you will need to use post 1.3 version of Solr (nightly) that includes functionality from SOLR-561. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Jaco [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Thursday, November 6, 2008 11:32:04 AM Subject: Preferred Tomcat version on Windows 2003 (64 bits) Hello, I am planning a brand new environment for Solr running on a Windows 2003 Server 64 bits platform. I want to use Tomcat, and was wondering whether there is any preference in general for using Tomcat 5.5 or Tomcat 6.0 with Solr. Any suggestions would be appreciated! Thanks, bye, Jaco.
Distributed search, standard request handler and more like this
Hello, I'm doing some experiments with the morelikethis functionality using the standard request handler to see if it also works with distributed search (I saw that it will not yet work with the MoreLikeThis handler, https://issues.apache.org/jira/browse/SOLR-788). As far as I can see, this also does not work when using the standard request handler, i.e.: http://localhost:8080/solr/select?q=ID:*documentID*&mlt=true&mlt.fl=Text&mlt.mindf=1&mlt.mintf=1&shards=shard1,shard2 I'm not getting any moreLikeThis results back, just the document resulting from the q= query. The same query without shards= does return moreLikeThis results. Am I doing something wrong or is this not yet supported..? Thanks, bye, Jaco.
Re: Integrating external stemmer in Solr and pre-processing text
Hi, The suggested approach with a TokenFilter extending the BufferedTokenStream class works fine, and performance is OK - the external stemmer is now invoked only once for the complete search text. Also, from a functional point of view, the approach is useful, because it allows for other filtering (i.e. WordDelimiterFilter with the various useful options) to be done before stemming takes place. The code is roughly like this for the process() function of the custom Filter class:

protected Token process(Token token) {
    StringBuilder stringBuilder = new StringBuilder();
    Token nextToken;
    Integer tokenPos = 0;
    Map<Integer, Token> tokenMap = new LinkedHashMap<Integer, Token>();

    stringBuilder.append(token.term()).append(' ');
    tokenMap.put(tokenPos++, token);
    nextToken = read();
    while (nextToken != null) {
        stringBuilder.append(nextToken.term()).append(' ');
        tokenMap.put(tokenPos++, nextToken);
        nextToken = read();
    }

    String inputText = stringBuilder.toString();
    String stemmedText = stemText(inputText);
    String[] stemmedWords = stemmedText.split("\\s");

    for (Map.Entry<Integer, Token> entry : tokenMap.entrySet()) {
        Integer pos = entry.getKey();
        Token tok = entry.getValue();
        tok.setTermBuffer(stemmedWords[pos]);
        write(tok);
    }
    return null;
}

This will need some work and additional error checking, and I'll probably put a maximum on the number of tokens that is to be processed in one go to make sure things don't get too big in memory. Thanks for helping out! Bye, Jaco. 2008/9/26 Jaco [EMAIL PROTECTED] Thanks for these suggestions, will try it in the coming days and post my findings in this thread. Bye, Jaco. 2008/9/26 Grant Ingersoll [EMAIL PROTECTED] On Sep 26, 2008, at 12:05 PM, Jaco wrote: Hi Grant, In reply to your questions: 1. Are you having to restart/initialize the stemmer every time for your slow approach? Does that really need to happen? It is invoking a COM object in Windows. The object is instantiated once for a token stream, and then invoked once for each token.
The invoke always has an overhead, not much to do about that (sigh...) 2. Can the stemmer return something other than a String? Say a String array of all the stemmed words? Or maybe even some type of object that tells you the original word and the stemmed word? The stemmer can only return a String. But, I do know that the returned string always has exactly the same number of words as the input string. So logically, it would be possible to: a) first calculate the position/start/end of each token in the input string (usual tokenization by whitespace), resulting in token list 1 b) then invoke the stemmer, and tokenize that result by whitespace, resulting in token list 2 c) 'merge' the token values of token list 2 into token list 1, which is possible because each token's position is the same in both lists... d) return that 'merged' token list 2 for further processing Would this work in Solr? I think so, assuming your stemmer tokenizes on whitespace as well. I can do some Java coding to achieve that from a logical point of view, but I wouldn't know how to structure this flow into the MyTokenizerFactory, so some hints to achieve that would be great! One thought: Don't create an all-in-one Tokenizer. Instead, keep the WhitespaceTokenizer as is. Then, create a TokenFilter that buffers the whole document into memory (via the next() implementation) and also creates, using StringBuilder, a string containing the whole text. Once you've read it all in, then send the string to your stemmer, parse it back out and associate it back to your token buffer. If you are guaranteed position, you could even keep a (linked) hash, such that it is really quick to look up tokens after stemming. Pseudocode looks something like:

while (token.next != null)
    tokenMap.put(token.position, token)
    stringBuilder.append(' ').append(token.text)
stemmedText = comObj.stem(stringBuilder.toString())
correlateStemmedText(stemmedText, tokenMap)
spit out the tokens one by one...
I think this approach should be fast (but maybe not as fast as your all in one tokenizer) and will provide the correct position and offsets. You do have to be careful w/ really big documents, as that map can be big. You also want to be careful about map reuse, token reuse, etc. I believe there are a couple of buffering TokenFilters in Solr that you could examine for inspiration. I think the RemoveDuplicatesTokenFilter (or whatever it's called) does buffering. -Grant Thanks for helping out! Jaco. 2008/9/26 Grant Ingersoll [EMAIL PROTECTED] On Sep 26, 2008, at 9:40 AM, Jaco wrote: Hi, Here's some
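Grant's buffer-and-correlate approach relies only on the stemmer preserving the word count, so the correlation step itself can be exercised with plain collections. Below is a stdlib sketch with a toy stemmer standing in for the COM object (all names are hypothetical; the real filter would carry Token objects with offsets rather than strings):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StemCorrelator {
    // Toy stand-in for the external stemmer: strips a trailing "xxx" from each word.
    // The real stemmer is a COM object invoked once for the whole text.
    static String stemText(String text) {
        return text.replaceAll("xxx\\b", "");
    }

    // Buffers all tokens, stems the concatenated text once, then writes the
    // stemmed word back onto the token at the same position - valid because
    // the stemmer returns exactly as many words as it receives.
    static List<String> correlate(List<String> tokens) {
        String stemmed = stemText(String.join(" ", tokens));
        String[] stemmedWords = stemmed.split("\\s+");
        List<String> out = new ArrayList<>();
        for (int pos = 0; pos < tokens.size(); pos++) {
            out.add(stemmedWords[pos]); // same position in both lists
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(correlate(Arrays.asList("abcdxxx", "defgxxx"))); // [abcd, defg]
    }
}
```

Because only the term buffer is rewritten, each token keeps the position and offsets computed by the original WhitespaceTokenizer, which is what keeps highlighting intact.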
Integrating external stemmer in Solr and pre-processing text
Hello, I need to work with an external stemmer in Solr. This stemmer is accessible as a COM object (running Solr in Tomcat on a Windows platform). I managed to integrate this using the com4j library. I tested two scenarios: 1. Create a custom FilterFactory and Filter class for this. The external stemmer is then invoked for every token 2. Create a custom TokenizerFactory (extending BaseTokenizerFactory), that invokes the external stemmer for the entire search text, then puts the result of this into a StringReader, and finally returns new WhitespaceTokenizer(stringReader), so the stemmed text gets tokenized by the whitespace tokenizer. Looking at search results, both scenarios appear to work from a functional point of view. The first scenario however is too slow because of the overhead of calling the external COM object for each token. The second scenario is much faster, and also gives correct search results. However, this then gives problems with highlighting - sometimes, errors are reported (String out of Range), in other cases, I get incorrect highlight fragments. Without knowing all details about this stuff, this makes sense because of the change done to the text to be processed before it's tokenized. Maybe my second scenario does not make sense at all..? Any ideas on how to overcome this or any other suggestions on how to realise this? Thanks, bye, Jaco. PS I posted this message twice before but it didn't come through (spam filtering..??), so this is the 2nd try with the text changed a bit
Re: Integrating external stemmer in Solr and pre-processing text
Hi, Here's some of the code of my Tokenizer:

public class MyTokenizerFactory extends BaseTokenizerFactory {
    public WhitespaceTokenizer create(Reader input) {
        String text, normalizedText;
        try {
            text = IOUtils.toString(input);
            normalizedText = *invoke my stemmer(text)*;
        } catch (IOException ex) {
            throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, ex);
        }
        StringReader stringReader = new StringReader(normalizedText);
        return new WhitespaceTokenizer(stringReader);
    }
}

I see what's going on in the analysis tool now, and I think I understand the problem. For instance, the text: abcdxxx defgxxx. Let's assume the stemmer gets rid of xxx. I would then see this in the analysis tool after the tokenizer stage: - abcd - term position 1; start: 1; end: 3 - defg - term position 2; start: 4; end: 7 These positions are not in line with the initial search text - this must be why the highlighting goes wrong. I guess my little trick to do this was a bit too simple because it messes up the positions, basically because something different from the original source text is tokenized. Any suggestions would be very welcome... Cheers, Jaco. 2008/9/26 Grant Ingersoll [EMAIL PROTECTED] How are you creating the tokens? What are you setting for the offsets and the positions? One thing that is helpful is Solr's built-in Analysis tool via the Admin interface (http://localhost:8983/solr/admin/) From there, you can plug in verbose mode, and see what the position and offsets are for every piece of your Analyzer. -Grant On Sep 26, 2008, at 3:10 AM, Jaco wrote: Hello, I need to work with an external stemmer in Solr. This stemmer is accessible as a COM object (running Solr in Tomcat on a Windows platform). I managed to integrate this using the com4j library. I tested two scenarios: 1. Create a custom FilterFactory and Filter class for this. The external stemmer is then invoked for every token 2.
Create a custom TokenizerFactory (extending BaseTokenizerFactory), that invokes the external stemmer for the entire search text, then puts the result of this into a StringReader, and finally returns new WhitespaceTokenizer(stringReader), so the stemmed text gets tokenized by the whitespace tokenizer. Looking at search results, both scenarios appear to work from a functional point of view. The first scenario however is too slow because of the overhead of calling the external COM object for each token. The second scenario is much faster, and also gives correct search results. However, this then gives problems with highlighting - sometimes, errors are reported (String out of Range), in other cases, I get incorrect highlight fragments. Without knowing all details about this stuff, this makes sense because of the change done to the text to be processed before it's tokenized. Maybe my second scenario does not make sense at all..? Any ideas on how to overcome this or any other suggestions on how to realise this? Thanks, bye, Jaco. PS I posted this message twice before but it didn't come through (spam filtering..??), so this is the 2nd try with the text changed a bit -- Grant Ingersoll http://www.lucidimagination.com Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance http://wiki.apache.org/lucene-java/LuceneFAQ
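The highlighting breakage in scenario 2 follows directly from the offset mismatch: offsets are computed against the stemmed text, but highlighting slices the original text with them. A small stdlib sketch making that concrete (toy stemmer that drops a trailing "xxx"; names hypothetical):

```java
public class OffsetMismatch {
    public static void main(String[] args) {
        String original = "abcdxxx defgxxx";
        // Toy stemmer: strip a trailing "xxx" from each word before tokenizing.
        String stemmed = original.replaceAll("xxx\\b", ""); // "abcd defg"

        // Offset of "defg" computed against the stemmed text...
        int start = stemmed.indexOf("defg"); // 5

        // ...selects the right word in the stemmed text, but the wrong
        // characters when applied to the original text:
        System.out.println(stemmed.substring(start, start + 4));  // defg
        System.out.println(original.substring(start, start + 4)); // xx d
    }
}
```

Stemming after tokenization (a TokenFilter that only rewrites term buffers) avoids this, because offsets stay anchored to the original text.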
Re: Integrating external stemmer in Solr and pre-processing text
Hi Grant, In reply to your questions:

1. Are you having to restart/initialize the stemmer every time for your slow approach? Does that really need to happen?

It is invoking a COM object in Windows. The object is instantiated once for a token stream, and then invoked once for each token. Each invocation always has an overhead, not much to do about that (sigh...)

2. Can the stemmer return something other than a String? Say a String array of all the stemmed words? Or maybe even some type of object that tells you the original word and the stemmed word?

The stemmer can only return a String. But I do know that the returned string always has exactly the same number of words as the input string. So logically, it would be possible to:

a) first calculate the position/start/end of each token in the input string (usual whitespace tokenization), resulting in token list 1
b) then invoke the stemmer, and tokenize that result by whitespace, resulting in token list 2
c) 'merge' the token values of token list 2 into token list 1, which is possible because each token's position is the same in both lists
d) return that 'merged' token list for further processing

Would this work in Solr? I can do some Java coding to achieve that from a logical point of view, but I wouldn't know how to structure this flow into the MyTokenizerFactory, so some hints to achieve that would be great! Thanks for helping out! Jaco.
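[Editor's note] For illustration only, steps a) to d) above can be sketched in plain Java, leaving out the Lucene/Solr classes. This Token class and the method names are hypothetical stand-ins, and the external stemmer is represented by simply passing in the already-stemmed string:

```java
import java.util.ArrayList;
import java.util.List;

public class StemMerge {

    // Minimal stand-in for a Lucene Token: a term plus its start/end offsets.
    public static class Token {
        public final String term;
        public final int start;
        public final int end;
        public Token(String term, int start, int end) {
            this.term = term; this.start = start; this.end = end;
        }
    }

    // Step a): whitespace-tokenize, recording each token's offsets in the input.
    public static List<Token> tokenize(String text) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) i++;
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) i++;
            if (i > start) out.add(new Token(text.substring(start, i), start, i));
        }
        return out;
    }

    // Steps b)-d): tokenize the stemmed text too, then emit the stemmed terms
    // with the ORIGINAL offsets. Relies on the stemmer preserving the word count.
    public static List<Token> merge(String original, String stemmed) {
        List<Token> orig = tokenize(original);
        List<Token> stem = tokenize(stemmed);
        if (orig.size() != stem.size())
            throw new IllegalStateException("stemmer changed the word count");
        List<Token> merged = new ArrayList<>();
        for (int k = 0; k < orig.size(); k++)
            merged.add(new Token(stem.get(k).term, orig.get(k).start, orig.get(k).end));
        return merged;
    }

    // Convenience view as "term:start-end" strings, handy for checking offsets.
    public static List<String> describe(String original, String stemmed) {
        List<String> out = new ArrayList<>();
        for (Token t : merge(original, stemmed))
            out.add(t.term + ":" + t.start + "-" + t.end);
        return out;
    }

    public static void main(String[] args) {
        // The thread's example: a stemmer that strips a trailing "xxx".
        System.out.println(describe("abcdxxx defgxxx", "abcd defg"));
        // prints [abcd:0-7, defg:8-15]
    }
}
```

Because the stemmed terms keep the original offsets, highlighting would map back onto the source text. A real implementation would live in a TokenFilter and would also have to decide what to do if the stemmer ever drops or splits a word.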
2008/9/26 Grant Ingersoll [EMAIL PROTECTED] On Sep 26, 2008, at 9:40 AM, Jaco wrote:

Hi, Here's some of the code of my Tokenizer:

    public class MyTokenizerFactory extends BaseTokenizerFactory {
        public WhitespaceTokenizer create(Reader input) {
            String text, normalizedText;
            try {
                text = IOUtils.toString(input);
                normalizedText = *invoke my stemmer(text)*;
            } catch (IOException ex) {
                throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, ex);
            }
            StringReader stringReader = new StringReader(normalizedText);
            return new WhitespaceTokenizer(stringReader);
        }
    }

I see what's going on in the analysis tool now, and I think I understand the problem. For instance, the text: abcdxxx defgxxx. Let's assume the stemmer gets rid of xxx. I would then see this in the analysis tool after the tokenizer stage:

- abcd - term position 1; start: 1; end: 3
- defg - term position 2; start: 4; end: 7

These positions are not in line with the initial search text - this must be why the highlighting goes wrong. I guess my little trick was a bit too simple: it messes up the positions because something different from the original source text is tokenized.

Yes, this is exactly the problem. I don't know enough about com4j or your stemmer, but some things come to mind:

1. Are you having to restart/initialize the stemmer every time for your slow approach? Does that really need to happen?
2. Can the stemmer return something other than a String? Say a String array of all the stemmed words? Or maybe even some type of object that tells you the original word and the stemmed word?

-Grant
Re: Integrating external stemmer in Solr and pre-processing text
Thanks for these suggestions, will try them in the coming days and post my findings in this thread. Bye, Jaco.

2008/9/26 Grant Ingersoll [EMAIL PROTECTED] On Sep 26, 2008, at 12:05 PM, Jaco wrote:

Hi Grant, In reply to your questions:

1. Are you having to restart/initialize the stemmer every time for your slow approach? Does that really need to happen?

It is invoking a COM object in Windows. The object is instantiated once for a token stream, and then invoked once for each token. Each invocation always has an overhead, not much to do about that (sigh...)

2. Can the stemmer return something other than a String? Say a String array of all the stemmed words? Or maybe even some type of object that tells you the original word and the stemmed word?

The stemmer can only return a String. But I do know that the returned string always has exactly the same number of words as the input string. So logically, it would be possible to:

a) first calculate the position/start/end of each token in the input string (usual whitespace tokenization), resulting in token list 1
b) then invoke the stemmer, and tokenize that result by whitespace, resulting in token list 2
c) 'merge' the token values of token list 2 into token list 1, which is possible because each token's position is the same in both lists
d) return that 'merged' token list for further processing

Would this work in Solr?

I think so, assuming your stemmer tokenizes on whitespace as well.

I can do some Java coding to achieve that from a logical point of view, but I wouldn't know how to structure this flow into the MyTokenizerFactory, so some hints to achieve that would be great!

One thought: Don't create an all-in-one Tokenizer. Instead, keep the WhitespaceTokenizer as is. Then create a TokenFilter that buffers the whole document into memory (via the next() implementation) and also creates, using StringBuilder, a string containing the whole text.
Once you've read it all in, then send the string to your stemmer, parse it back out and associate it back to your token buffer. If you are guaranteed position, you could even keep a (linked) hash, such that it is really quick to look up tokens after stemming. Pseudocode looks something like:

    while (token.next != null)
        tokenMap.put(token.position, token)
        stringBuilder.append(' ').append(token.text)
    stemmedText = comObj.stem(stringBuilder.toString())
    correlateStemmedText(stemmedText, tokenMap)
    spit out the tokens one by one...

I think this approach should be fast (but maybe not as fast as your all-in-one tokenizer) and will provide the correct positions and offsets. You do have to be careful with really big documents, as that map can get big. You also want to be careful about map reuse, token reuse, etc. I believe there are a couple of buffering TokenFilters in Solr that you could examine for inspiration. I think the RemoveDuplicatesTokenFilter (or whatever it's called) does buffering. -Grant

Thanks for helping out! Jaco.

2008/9/26 Grant Ingersoll [EMAIL PROTECTED] On Sep 26, 2008, at 9:40 AM, Jaco wrote:

Hi, Here's some of the code of my Tokenizer:

    public class MyTokenizerFactory extends BaseTokenizerFactory {
        public WhitespaceTokenizer create(Reader input) {
            String text, normalizedText;
            try {
                text = IOUtils.toString(input);
                normalizedText = *invoke my stemmer(text)*;
            } catch (IOException ex) {
                throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, ex);
            }
            StringReader stringReader = new StringReader(normalizedText);
            return new WhitespaceTokenizer(stringReader);
        }
    }

I see what's going on in the analysis tool now, and I think I understand the problem. For instance, the text: abcdxxx defgxxx. Let's assume the stemmer gets rid of xxx.
I would then see this in the analysis tool after the tokenizer stage:

- abcd - term position 1; start: 1; end: 3
- defg - term position 2; start: 4; end: 7

These positions are not in line with the initial search text - this must be why the highlighting goes wrong. I guess my little trick was a bit too simple: it messes up the positions because something different from the original source text is tokenized.

Yes, this is exactly the problem. I don't know enough about com4j or your stemmer, but some things come to mind:

1. Are you having to restart/initialize the stemmer every time for your slow approach? Does that really need to happen?
2. Can the stemmer return something other than a String? Say a String array of all the stemmed words? Or maybe even some type of object that tells you the original word and the stemmed word?

-Grant

-- Grant Ingersoll http://www.lucidimagination.com Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance http://wiki.apache.org/lucene-java/LuceneFAQ
Pre-processing text in custom FilterFactory / TokenizerFactory
Hello, I need to work with an external stemmer in Solr. This stemmer is accessible as a COM object (running Solr in Tomcat on a Windows platform). I managed to integrate this using the com4j library. I tried two scenarios:

1. Create a custom FilterFactory and Filter class for this. The external stemmer is then invoked for every token.
2. Create a custom TokenizerFactory that invokes the external stemmer for the entire search text, puts the result into a StringReader, and finally returns new WhitespaceTokenizer(stringReader), so the stemmed text gets tokenized by the whitespace tokenizer.

Looking at search results, both scenarios appear to work from a functional point of view. The first scenario, however, is too slow because of the overhead of calling the external COM object for each token. The second scenario is much faster and also gives correct search results. However, it then gives problems with highlighting - sometimes errors are reported (String out of Range), in other cases I get incorrect highlight fragments. Without knowing all the details of this stuff, that makes sense because of the change made to the text before it is processed (I guess positions get messed up then). Maybe my second scenario is totally insane? Any ideas on how to overcome this, or any other suggestions on how to realise this? Cheers, Jaco. PS I posted this message yesterday, but it didn't come through, so this is the 2nd try..
Pre-processing text in custom FilterFactory / TokenizerFactory
Hello, I need to work with an external stemmer, which is accessible as a COM object. I managed to integrate this using the com4j library. I tried two scenarios:

1. Create a custom FilterFactory and Filter class for this. The external stemmer is then invoked for every token.
2. Create a custom TokenizerFactory that invokes the external stemmer for the entire search text, puts the result into a StringReader, and finally returns new WhitespaceTokenizer(stringReader), so the stemmed text gets tokenized by the whitespace tokenizer.

Both scenarios appear to work from a functional point of view. The first scenario, however, is too slow because of the overhead of calling the external COM object. The second scenario is much faster and also gives correct search results. However, it then gives problems with highlighting - sometimes errors are reported (String out of Range), in other cases I get incorrect highlight fragments. Without knowing all the details of this stuff, that makes sense because of the change made to the text before it is processed (I guess positions get messed up then). Maybe my second scenario is totally insane? Any ideas on how to overcome this? Cheers, Jaco.
Re: scoring individual values in a multivalued field
Hi, I ran into the same problem some time ago, and couldn't find any relation between the boost values on the multivalued field and the search results. Does anybody have an idea how to handle this? Thanks, Jaco.

2008/8/29 Sébastien Rainville [EMAIL PROTECTED]

Hi, I have a multivalued field that I would want to score individually for each value. Is there an easy way to do that? Here's a concrete example of what I'm trying to achieve. Let's say that I have 3 documents with a field name_t and a multivalued field caracteristic_t_mv:

    <doc>
      <field name="name_t" boost="1.0">Dog</field>
      <field name="caracteristic_t_mv" boost="0.45">Cool</field>
      <field name="caracteristic_t_mv" boost="0.2">Big</field>
      <field name="caracteristic_t_mv" boost="0.89">Dirty</field>
    </doc>
    <doc>
      <field name="name_t" boost="1.0">Cat</field>
      <field name="caracteristic_t_mv" boost="0.76">Small</field>
      <field name="caracteristic_t_mv" boost="0.32">Dirty</field>
    </doc>
    <doc>
      <field name="name_t" boost="1.0">Fish</field>
      <field name="caracteristic_t_mv" boost="0.92">Smells</field>
      <field name="caracteristic_t_mv" boost="0.55">Dirty</field>
    </doc>

If I query only the field caracteristic_t_mv for the value Dirty, I would like the documents to be sorted accordingly, i.e. get 1-3-2. It's possible to set the boost of a field when indexing, but there are 2 problems with that: 1) the effective field boost is actually the multiplication of the boost values of all the fields with the same name; 2) the norm is persisted as a byte in the index, and the precision loss hurts. Thanks in advance, Sebastien
Distributed search and facet counts using facet.limit=-1
Hello, I'm testing distributed search using the shards= parameter, looking into the facet counts (release: Solr 1.3.0 RC-2). I noticed that when using facet.limit=-1 (to get an unlimited number of facet values with counts), there are no facet counts returned at all. There is mention of this in https://issues.apache.org/jira/browse/SOLR-303, but I don't see it working OK in the release mentioned. My query looks as follows:

http://localhost:8080/solr/select?q=my query&fl=some fields&facet=true&facet.sort=true&facet.limit=-1&facet.field=my facet field&shards=localhost:8080/solr

Am I possibly doing something wrong, or is this a bug? Bye, Jaco.
Re: Beginners question: adding a plugin
That does the trick! Thanks for the quick reply (and for a great Solr product!) Bye, Jaco.

2008/8/27 Grant Ingersoll [EMAIL PROTECTED]

Instead of solr.TestStemFilterFactory, put the fully qualified classname for the TestStemFilterFactory, i.e. com.my.great.stemmer.TestStemFilterFactory. The solr.FactoryName notation is just shorthand for org.apache.solr.BlahBlahBlah. -Grant

On Aug 27, 2008, at 3:27 PM, Jaco wrote:

Hello, I'm pretty new to Solr, and not a Java expert, and trying to create my own plugin according to the instructions given in http://wiki.apache.org/solr/SolrPlugins. I want to integrate an external stemmer for the Dutch language by creating a new FilterFactory that will invoke the external stemmer for a TokenStream. First thing I want to do is just make sure I can get the plugin running. Here's what I did:

- Took a copy of DutchStemFilterFactory.java, renamed it to TestStemFilterFactory.java, and renamed the class to TestStemFilterFactory
- Successfully compiled the Java source using javac, and added the .class file to a jar file
- Put the jar file in SOLR_HOME/lib
- Put a line <filter class="solr.TestStemFilterFactory" /> in my analyzer definition in schema.xml
- Restarted Tomcat

In the Tomcat log, there is an indication that the file is found:

    27-Aug-2008 20:58:25 org.apache.solr.core.SolrResourceLoader createClassLoader
    INFO: Adding 'file:/D:/Programs/Solr/lib/Test.jar' to Solr classloader

But then I get errors reported by Tomcat further down the log file:

    SEVERE: org.apache.solr.common.SolrException: Error loading class 'solr.TestStemFilterFactory'
      at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:256)
      at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:261)
      at org.apache.solr.util.plugin.AbstractPluginLoader.create(AbstractPluginLoader.java:83)
      at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:140)
    Caused by: java.lang.ClassNotFoundException: solr.TestStemFilterFactory
      at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
      at java.security.AccessController.doPrivileged(Native Method)
      at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
      .

Probably some configuration issue somewhere, but I am in the dark here (as said: not a Java expert...). I've tried to find information in the mailing list archives on this, but no luck so far. I'm running the Solr nightly build of 20.08.2008, Tomcat 5.5.26 on Windows. Any help would be much appreciated! Cheers, Jaco.

-- Grant Ingersoll http://www.lucidimagination.com Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance http://wiki.apache.org/lucene-java/LuceneFAQ
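[Editor's note] As a concrete illustration of Grant's fix, the schema.xml line would change from the solr.* shorthand to the fully qualified class name (the com.example package here is hypothetical; use whatever package the custom class was compiled into):

```xml
<!-- Before: solr.* shorthand only resolves to org.apache.solr.* classes,
     so a custom class in another package throws ClassNotFoundException -->
<filter class="solr.TestStemFilterFactory" />

<!-- After: fully qualified name of the custom plugin class -->
<filter class="com.example.analysis.TestStemFilterFactory" />
```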
Beginners question: adding a plugin
Hello, I'm pretty new to Solr, and not a Java expert, and trying to create my own plugin according to the instructions given in http://wiki.apache.org/solr/SolrPlugins. I want to integrate an external stemmer for the Dutch language by creating a new FilterFactory that will invoke the external stemmer for a TokenStream. First thing I want to do is just make sure I can get the plugin running. Here's what I did:

- Took a copy of DutchStemFilterFactory.java, renamed it to TestStemFilterFactory.java, and renamed the class to TestStemFilterFactory
- Successfully compiled the Java source using javac, and added the .class file to a jar file
- Put the jar file in SOLR_HOME/lib
- Put a line <filter class="solr.TestStemFilterFactory" /> in my analyzer definition in schema.xml
- Restarted Tomcat

In the Tomcat log, there is an indication that the file is found:

    27-Aug-2008 20:58:25 org.apache.solr.core.SolrResourceLoader createClassLoader
    INFO: Adding 'file:/D:/Programs/Solr/lib/Test.jar' to Solr classloader

But then I get errors reported by Tomcat further down the log file:

    SEVERE: org.apache.solr.common.SolrException: Error loading class 'solr.TestStemFilterFactory'
      at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:256)
      at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:261)
      at org.apache.solr.util.plugin.AbstractPluginLoader.create(AbstractPluginLoader.java:83)
      at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:140)
    Caused by: java.lang.ClassNotFoundException: solr.TestStemFilterFactory
      at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
      at java.security.AccessController.doPrivileged(Native Method)
      at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
      .

Probably some configuration issue somewhere, but I am in the dark here (as said: not a Java expert...). I've tried to find information in the mailing list archives on this, but no luck so far. I'm running the Solr nightly build of 20.08.2008, Tomcat 5.5.26 on Windows. Any help would be much appreciated! Cheers, Jaco.