RE: Solr commits before documents are added
Any chance you are indexing to a Master, then syncing to a Slave, and you aren't seeing those last 20 on the Slave? There is an issue with syncing between Master and Slave that we've experienced. If the last commit is very small (20 sounds possible!), it can occur in the same clock second on that machine. The Master will see the commit and its index will show the data fine. However, the Slave cannot see a second commit in the same clock second, so it will be missing the last 20 after the sync between the two. It's an edge case, but we ran into it recently.

-Todd

-----Original Message-----
From: SharmilaR [mailto:sranganat...@library.rochester.edu]
Sent: Monday, October 19, 2009 1:07 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr commits before documents are added

Solr version is 1.3. I am indexing a total of 1.4 million documents. Yes, I commit (waitFlush="true" waitSearcher="true") every 100k documents and then once at the end. I have a counter next to the addDoc(SolrDocument) statement to keep track of the number of documents added. When I query Solr after the commit, the total number of documents returned does not match the number of documents added. This happens only when I index millions of documents, not when I index something like 500 documents. In this case, I know it's the last 20 documents which are not committed, because each document has a field 'RECORD_ID' which is assigned a sequential number (in the Java code). When I query Solr using the Solr admin interface, the documents with the last 20 RECORD_IDs are missing (for example, the last id is 999,980 instead of 1,000,000).

- Sharmila

Feak, Todd wrote:
> A few questions to help the troubleshooting.
>
> Solr version #?
>
> Is there just 1 commit through Solrj for the millions of documents? Or do you do it on a regular interval (every 100k documents, for example) and then one at the end to be sure?
>
> How are you observing that the last few didn't make it in? Are you looking at a slave or master?
>
> -Todd

-----Original Message-----
From: Ranganathan, Sharmila [mailto:sranganat...@library.rochester.edu]
Sent: Monday, October 19, 2009 9:19 AM
To: solr-user@lucene.apache.org
Subject: Solr commits before documents are added

Hi,

My application indexes a huge number of documents (millions). Below is a snapshot of my code: I add all the documents to Solr, and then at the end issue a commit command. I use Solrj. I find that the last few documents are not committed to Solr. Is this because adding documents to Solr took longer, and the commit command was reached even before it finished adding documents? Is there a way to ensure that Solr waits for all documents to be added and then commits? Please advise me how to solve this issue.

    for (SolrInputDocument doc : docs) {
        solrServer.add(doc);   // add document to Solr
    }
    solrServer.commit();       // commit to Solr

Thanks,
Sharmila
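For reference, the add/commit/verify cycle described in this thread can be sketched in a few lines of SolrJ. This is an illustrative reconstruction, not the actual code from the thread: the server URL, the title field, and the loop bounds are assumptions, and CommonsHttpSolrServer is the SolrJ 1.3/1.4-era client class.

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class BatchIndexer {
        public static void main(String[] args) throws Exception {
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

            long added = 0;
            for (long id = 1; id <= 1400000; id++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("RECORD_ID", id);      // sequential id, as in the thread
                doc.addField("title", "document " + id);
                server.add(doc);
                added++;
                if (added % 100000 == 0) {
                    server.commit(true, true);      // waitFlush=true, waitSearcher=true
                }
            }
            // The final commit blocks until the flush finishes and a new searcher
            // is open, so a query issued afterwards sees every added document
            // (on the machine that received the commit).
            server.commit(true, true);

            SolrQuery q = new SolrQuery("*:*");
            q.setRows(0);                           // only the count is needed
            long found = server.query(q).getResults().getNumFound();
            System.out.println("added=" + added + ", indexed=" + found);
        }
    }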
RE: Solr commits before documents are added
A few questions to help the troubleshooting.

Solr version #?

Is there just 1 commit through Solrj for the millions of documents? Or do you do it on a regular interval (every 100k documents, for example) and then one at the end to be sure?

How are you observing that the last few didn't make it in? Are you looking at a slave or master?

-Todd

-----Original Message-----
From: Ranganathan, Sharmila [mailto:sranganat...@library.rochester.edu]
Sent: Monday, October 19, 2009 9:19 AM
To: solr-user@lucene.apache.org
Subject: Solr commits before documents are added

Hi,

My application indexes a huge number of documents (millions). Below is a snapshot of my code: I add all the documents to Solr, and then at the end issue a commit command. I use Solrj. I find that the last few documents are not committed to Solr. Is this because adding documents to Solr took longer, and the commit command was reached even before it finished adding documents? Is there a way to ensure that Solr waits for all documents to be added and then commits? Please advise me how to solve this issue.

    for (SolrInputDocument doc : docs) {
        solrServer.add(doc);   // add document to Solr
    }
    solrServer.commit();       // commit to Solr

Thanks,
Sharmila
RE: Solr Timeouts
On Mon, Oct 5, 2009, Giovanni Fernandez-Kincade wrote:

I'm fairly certain that all of the indexing jobs are calling Solr with commit=false. They all construct the indexing URLs using a CLR function I wrote, which takes in a Commit parameter, which is always set to false.

Also, I don't see any calls to commit in the Tomcat logs (whereas normally when I make a commit call, I do).

This suggests that Solr is doing it automatically, but the extract handler doesn't seem to be the problem:

    <requestHandler class="org.apache.solr.handler.extraction.ExtractingRequestHandler"
                    startup="lazy">
      ... ignored_ ... fileData ...
    </requestHandler>

There is no external config file specified, and I don't see anything about commits here.

I've tried setting up more detailed indexer logging but haven't been able to get it to work:

    <infoStream file="...">true</infoStream>

I tried relative and absolute paths, but no dice so far.

Any other ideas?

-Gio.

-----Original Message-----
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Monday, October 05, 2009 12:52 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr Timeouts

> This is what one of my SOLR requests looks like:
>
> http://titans:8080/solr/update/extract/?literal.versionId=684936&literal.filingDate=1997-12-04T00:00:00Z&literal.formTypeId=95&literal.companyId=3567904&literal.sourceId=0&resource.name=684936.txt&commit=false

Have you verified that all of your indexing jobs (you said you had 4 or 5) have commit=false?

Also make sure that your extract handler doesn't have a default of something that could cause a commit - like commitWithin or something.

-Yonik
http://www.lucidimagination.com

On Mon, Oct 5, 2009 at 12:44 PM, Giovanni Fernandez-Kincade wrote:

> Is there somewhere other than solrconfig.xml that the autoCommit feature is enabled? I've looked through that file and found autoCommit to be commented out:
>
>     <!--
>     <autoCommit>
>       ...
>     </autoCommit>
>     -->

-----Original Message-----
From: Feak, Todd [mailto:todd.f...@smss.sony.com]
Sent: Monday, October 05, 2009 12:40 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr Timeouts

Actually, ignore my other response.

I believe you are committing, whether you know it or not. This is in your provided stack trace:

    org.apache.solr.handler.RequestHandlerUtils.handleCommit(UpdateRequestProcessor, SolrParams, boolean)
    org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(SolrQueryRequest, SolrQueryResponse)

I think Yonik gave you additional information for how to make it faster.

-Todd

-----Original Message-----
From: Giovanni Fernandez-Kincade [mailto:gfernandez-kinc...@capitaliq.com]
Sent: Monday, October 05, 2009 9:30 AM
To: solr-user@lucene.apache.org
Subject: RE: Solr Timeouts

I'm not committing at all actually - I'm waiting for all 6 million to be done.

-----Original Message-----
From: Feak, Todd [mailto:todd.f...@smss.sony.com]
Sent: Monday, October 05, 2009 12:10 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr Timeouts

How often are you committing?

Every time you commit, Solr will close the old index and open the new one. If you are doing this in parallel from multiple jobs (the 4-5 you mention), then eventually the server gets behind and you start to pile up commit requests. Once this starts to happen, it will cascade out of control if the rate of commits isn't slowed.

-Todd
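As an aside, the raw extract URL shown above can also be issued through SolrJ with no per-request commit, leaving one explicit commit for the end. This is only a sketch: it assumes SolrJ 1.4's ContentStreamUpdateRequest, a hypothetical local copy of 684936.txt, and it reuses the literal values from Gio's URL.

    import java.io.File;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

    public class ExtractNoCommit {
        public static void main(String[] args) throws Exception {
            SolrServer server = new CommonsHttpSolrServer("http://titans:8080/solr");

            ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
            up.addFile(new File("684936.txt"));                       // assumed local path
            up.setParam("literal.versionId", "684936");
            up.setParam("literal.filingDate", "1997-12-04T00:00:00Z");
            up.setParam("literal.formTypeId", "95");
            up.setParam("resource.name", "684936.txt");
            up.setParam("commit", "false");                           // no commit per request
            server.request(up);

            // One explicit commit after all indexing jobs finish,
            // instead of one commit per extract request:
            // server.commit();
        }
    }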
RE: using regular expressions in solr query
Any particular reason for the double quotes in the 2nd and 3rd query examples, but not the 1st? Or is this just an artifact of your email?

-Todd

-----Original Message-----
From: Rakhi Khatwani [mailto:rkhatw...@gmail.com]
Sent: Tuesday, October 06, 2009 2:26 AM
To: solr-user@lucene.apache.org
Subject: using regular expressions in solr query

Hi,

I have an example in which I want to use a regular expression in my Solr query. Suppose I want to search on these samples:

    raakhi rajnish ninad goureya sheetal
    ritesh rajnish ninad goureya sheetal

where my content field is of type text.

QUERY: content:raa*
RESPONSE: raakhi rajnish ninad goureya sheetal

QUERY: content:"ra*"
RESPONSE: 0 results

Because of this I am facing problems with the next query:

QUERY: content:"r* rajnish"
RESPONSE: 0 results

which should ideally return both results. Any pointers?

Regards,
Raakhi
RE: cleanup old index directories on slaves
We use the snapcleaner script:

http://wiki.apache.org/solr/SolrCollectionDistributionScripts#snapcleaner

Will that do the job?

-Todd

-----Original Message-----
From: solr jay [mailto:solr...@gmail.com]
Sent: Monday, October 05, 2009 1:58 PM
To: solr-user@lucene.apache.org
Subject: cleanup old index directories on slaves

Is there a reliable way to safely clean up index directories? This is needed mainly on the slave side: in several situations an old index directory is replaced with a new one, and I'd like to remove those that are no longer in use.

Thanks,

-- J
RE: About SolrJ for XML
It looks like you have some confusion about queries vs. facets. You may want to look at the Solr wiki regarding facets a bit.

In the meantime, if you just want to query for that field containing "21", I would suggest that you don't set the query type, don't set any facet fields, and only set the query. Set the query to "field:21", where "field" should be replaced with the field name that has a "21" in it. For example, if the field name is foo, try this instead:

    SolrQuery query = new SolrQuery();
    query.setQuery("foo:21");
    QueryResponse qr = server.query(query);
    SolrDocumentList sdl = qr.getResults();

To delve into more detail: what your original code did was query for "21" in the default field (check your schema.xml to see which field is the default). It then faceted the query results by the "id" and "weight" fields. Because there were no search results at all, the faceting request didn't do anything. I'm not sure why you switched the query type to DisMax, as you didn't issue a query that would leverage it.

-Todd

-----Original Message-----
From: Chaitali Gupta [mailto:chaitaligupt...@yahoo.com]
Sent: Monday, October 05, 2009 2:05 PM
To: solr-user@lucene.apache.org
Subject: About SolrJ for XML

Hi,

I am new to Solr. I am using Solr version 1.3. I would like to index XML files using the SolrJ API. I have gone through the solr mailing list's emails and have been able to index XML files. But when I try to query those files using SolrJ, I get no output. In particular, I do not get correct results for the numeric fields that I have specified in the schema.xml file in the config directory for my XML files. I have made those fields indexed and stored by using indexed="true" and stored="true".

I am using the following code in order to search for data (here, I am trying to find documents with weight value 21):

    SolrQuery query = new SolrQuery();
    query.setQueryType("dismax");
    query.setFacet(true);
    query.addFacetField("id");
    query.addFacetField("weight");
    query.setQuery("21");
    QueryResponse qr = server.query(query);
    SolrDocumentList sdl = qr.getResults();

Am I doing anything wrong? Why do I get zero results even when there is an XML file with weight 21? What are the other ways of doing numeric queries in SolrJ?

Also, I would like to know how to get the exact size of the index generated by Solr. I am using a single machine to generate and query the index. When I look at the index directory, I see that the size of the files in the index directory is much less than the size reported by the "total" column of the "ls -lh" command. Does anyone have any idea why that is the case?

Thanks in advance. Waiting for your reply soon.

Regards,
Chaitali
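Applied to the weight field from the question, Todd's suggestion looks roughly like this. A hedged sketch: the server URL and the id field are assumptions, and the range-query note only applies if weight uses a sortable numeric type in Solr 1.3.

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;

    public class WeightQuery {
        public static void main(String[] args) throws Exception {
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

            // Query the weight field directly instead of the default field.
            SolrQuery query = new SolrQuery("weight:21");
            QueryResponse qr = server.query(query);
            for (SolrDocument doc : qr.getResults()) {
                System.out.println(doc.getFieldValue("id")
                        + " weight=" + doc.getFieldValue("weight"));
            }

            // A numeric range variant; in Solr 1.3 this behaves numerically
            // only for sortable types such as sint/slong:
            // query.setQuery("weight:[20 TO 22]");
        }
    }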
RE: Solr Timeouts
Actually, ignore my other response.

I believe you are committing, whether you know it or not. This is in your provided stack trace:

    org.apache.solr.handler.RequestHandlerUtils.handleCommit(UpdateRequestProcessor, SolrParams, boolean)
    org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(SolrQueryRequest, SolrQueryResponse)

I think Yonik gave you additional information for how to make it faster.

-Todd

-----Original Message-----
From: Giovanni Fernandez-Kincade [mailto:gfernandez-kinc...@capitaliq.com]
Sent: Monday, October 05, 2009 9:30 AM
To: solr-user@lucene.apache.org
Subject: RE: Solr Timeouts

I'm not committing at all actually - I'm waiting for all 6 million to be done.

-----Original Message-----
From: Feak, Todd [mailto:todd.f...@smss.sony.com]
Sent: Monday, October 05, 2009 12:10 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr Timeouts

How often are you committing?

Every time you commit, Solr will close the old index and open the new one. If you are doing this in parallel from multiple jobs (the 4-5 you mention), then eventually the server gets behind and you start to pile up commit requests. Once this starts to happen, it will cascade out of control if the rate of commits isn't slowed.

-Todd

-----Original Message-----
From: Giovanni Fernandez-Kincade [mailto:gfernandez-kinc...@capitaliq.com]
Sent: Monday, October 05, 2009 9:04 AM
To: solr-user@lucene.apache.org
Subject: Solr Timeouts

Hi, I'm attempting to index approximately 6 million HTML/text files using SOLR 1.4/Tomcat 6 on Windows Server 2003 x64. [...]

Thanks, Gio.

[quoted stack trace snipped; the full message and trace appear below in this thread]
RE: Solr Timeouts
Ok. Guess that isn't a problem. :)

A second consideration... I could see lock contention being an issue with multiple clients indexing at once. Is there any disadvantage to serializing the clients to remove the lock contention?

-Todd

-----Original Message-----
From: Giovanni Fernandez-Kincade [mailto:gfernandez-kinc...@capitaliq.com]
Sent: Monday, October 05, 2009 9:30 AM
To: solr-user@lucene.apache.org
Subject: RE: Solr Timeouts

I'm not committing at all actually - I'm waiting for all 6 million to be done.

[rest of the quoted thread and stack trace snipped; the full message and trace appear below in this thread]
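A generic way to act on the serialization idea without rewriting each job is to funnel every update request through a single-threaded executor. Purely illustrative Java, not from the thread:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class SerializedIndexer {
        // All update requests pass through one thread, so Solr sees them
        // serially and write-lock contention between concurrent jobs disappears.
        private final ExecutorService single = Executors.newSingleThreadExecutor();

        public Future<?> submitUpdate(Runnable updateCall) {
            return single.submit(updateCall);
        }

        public void shutdown() {
            single.shutdown();
        }
    }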
RE: Solr Timeouts
How often are you committing?

Every time you commit, Solr will close the old index and open the new one. If you are doing this in parallel from multiple jobs (the 4-5 you mention), then eventually the server gets behind and you start to pile up commit requests. Once this starts to happen, it will cascade out of control if the rate of commits isn't slowed.

-Todd

-----Original Message-----
From: Giovanni Fernandez-Kincade [mailto:gfernandez-kinc...@capitaliq.com]
Sent: Monday, October 05, 2009 9:04 AM
To: solr-user@lucene.apache.org
Subject: Solr Timeouts

Hi,

I'm attempting to index approximately 6 million HTML/text files using SOLR 1.4/Tomcat 6 on Windows Server 2003 x64. I'm running 64-bit Tomcat and JVM. I've fired up 4-5 different jobs that are making indexing requests using the ExtractingRequestHandler, and everything works well for about 30-40 minutes, after which all indexing requests start timing out. I profiled the server and found that all of the threads are getting blocked by this call to flush the Lucene index to disk (see below). This leads me to a few questions:

1. Is this normal?
2. Can I reduce the frequency with which this happens somehow? I've greatly increased the indexing options in solrconfig.xml (attached here) to no avail.
3. During these flushes, resource utilization (CPU, I/O, memory consumption) is significantly down compared to when requests are being handled. Is there any way to make this index go faster? I have plenty of bandwidth on the machine.

I appreciate any insight you can provide. We're currently using MS SQL 2005 as our full-text solution and are pretty much miserable. So far SOLR has been a great experience.

Thanks,
Gio.

    http-8080-Processor21 [RUNNABLE] CPU time: 9:51
    java.io.RandomAccessFile.seek(long)
    org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.readInternal(byte[], int, int)
    org.apache.lucene.store.BufferedIndexInput.refill()
    org.apache.lucene.store.BufferedIndexInput.readByte()
    org.apache.lucene.store.IndexInput.readVInt()
    org.apache.lucene.index.TermBuffer.read(IndexInput, FieldInfos)
    org.apache.lucene.index.SegmentTermEnum.next()
    org.apache.lucene.index.SegmentTermEnum.scanTo(Term)
    org.apache.lucene.index.TermInfosReader.get(Term, boolean)
    org.apache.lucene.index.TermInfosReader.get(Term)
    org.apache.lucene.index.SegmentTermDocs.seek(Term)
    org.apache.lucene.index.DocumentsWriter.applyDeletes(IndexReader, int)
    org.apache.lucene.index.DocumentsWriter.applyDeletes(SegmentInfos)
    org.apache.lucene.index.IndexWriter.applyDeletes()
    org.apache.lucene.index.IndexWriter.doFlushInternal(boolean, boolean)
    org.apache.lucene.index.IndexWriter.doFlush(boolean, boolean)
    org.apache.lucene.index.IndexWriter.flush(boolean, boolean, boolean)
    org.apache.lucene.index.IndexWriter.closeInternal(boolean)
    org.apache.lucene.index.IndexWriter.close(boolean)
    org.apache.lucene.index.IndexWriter.close()
    org.apache.solr.update.SolrIndexWriter.close()
    org.apache.solr.update.DirectUpdateHandler2.closeWriter()
    org.apache.solr.update.DirectUpdateHandler2.commit(CommitUpdateCommand)
    org.apache.solr.update.processor.RunUpdateProcessor.processCommit(CommitUpdateCommand)
    org.apache.solr.handler.RequestHandlerUtils.handleCommit(UpdateRequestProcessor, SolrParams, boolean)
    org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(SolrQueryRequest, SolrQueryResponse)
    org.apache.solr.handler.RequestHandlerBase.handleRequest(SolrQueryRequest, SolrQueryResponse)
    org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(SolrQueryRequest, SolrQueryResponse)
    org.apache.solr.core.SolrCore.execute(SolrRequestHandler, SolrQueryRequest, SolrQueryResponse)
    org.apache.solr.servlet.SolrDispatchFilter.execute(HttpServletRequest, SolrRequestHandler, SolrQueryRequest, SolrQueryResponse)
    org.apache.solr.servlet.SolrDispatchFilter.doFilter(ServletRequest, ServletResponse, FilterChain)
    org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ServletRequest, ServletResponse)
    org.apache.catalina.core.ApplicationFilterChain.doFilter(ServletRequest, ServletResponse)
    org.apache.catalina.core.StandardWrapperValve.invoke(Request, Response)
    org.apache.catalina.core.StandardContextValve.invoke(Request, Response)
    org.apache.catalina.core.StandardHostValve.invoke(Request, Response)
    org.apache.catalina.valves.ErrorReportValve.invoke(Request, Response)
    org.apache.catalina.core.StandardEngineValve.invoke(Request, Response)
    org.apache.catalina.connector.CoyoteAdapter.service(Request, Response)
    org.apache.coyote.http11.Http11Processor.process(InputStream, OutputStream)
    org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(TcpConnection, Object[])
    org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(Socket, TcpConnection, Object[])
    org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(Object[])
    org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run()
    java.lang.Thread.run()
RE: NGramTokenFilter behaviour
My understanding of NGram tokenizing is that it helps with languages that don't necessarily use spaces as word delimiters (Japanese et al.). In that case, bi-gramming is used to find words contained within a stream of unbroken characters, and you want to find all of the bi-grams in your search query. An "OR" wouldn't work as well, as you would find tons of hits.

-Todd Feak

-----Original Message-----
From: aod...@gmail.com [mailto:aod...@gmail.com]
Sent: Wednesday, September 30, 2009 10:54 AM
To: solr-user@lucene.apache.org
Subject: NGramTokenFilter behaviour

If I index the following text:

    I live in Dublin Ireland where Guinness is brewed

and then search for "duvlin", should Solr return a match? In the admin interface, under the analysis section, Solr highlights some NGram matches. But when I enter the following query string into my browser address bar, I get 0 results:

    http://localhost:8983/solr/select/?q=duvlin&debugQuery=true

Nor do I get results for dub, dubli, ublin, or dublin (du does return a result). I also notice, when I use debugQuery=true, that the parsed query is a PhraseQuery. This doesn't make sense to me, as surely the point of the NGram is to use a boolean OR between each gram?

However, if I don't use an NGramFilterFactory at query time, I can get results for dub, ublin, and du, but not duvlin.

Can someone please clarify what the purpose of the NGram filter/tokenizer is, if not to allow for misspellings/morphological variation, and also what the correct configuration is in terms of use at index/query time. Any help appreciated!

Aodh.

Solr 1.3, JDK 1.6
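To see why query-time grams form a PhraseQuery rather than an OR, it helps to print the grams themselves. A small sketch using Lucene's contrib NGramTokenFilter (a Lucene 3.1-era API is assumed, and the bigram size is illustrative): "dublin" and "duvlin" share only some grams, so a phrase of all the query grams can never match.

    import java.io.StringReader;
    import org.apache.lucene.analysis.KeywordTokenizer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.ngram.NGramTokenFilter;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    public class GramDemo {
        static void printGrams(String s) throws Exception {
            // Bigrams (minGram=2, maxGram=2) over the whole input as one token.
            TokenStream ts = new NGramTokenFilter(
                    new KeywordTokenizer(new StringReader(s)), 2, 2);
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            StringBuilder out = new StringBuilder(s + " ->");
            while (ts.incrementToken()) {
                out.append(' ').append(term.toString());
            }
            System.out.println(out);
        }

        public static void main(String[] args) throws Exception {
            printGrams("dublin");   // du ub bl li in
            printGrams("duvlin");   // du uv vl li in - only du, li, in overlap
        }
    }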
RE: Re: WebLogic 10 Compatibility Issue - StackOverflowError
Are the issues you ran into due to non-standard code in Solr, or is there some WebLogic inconsistency?

-Todd Feak

-----Original Message-----
From: news [mailto:n...@ger.gmane.org] On Behalf Of Ilan Rabinovitch
Sent: Friday, January 30, 2009 1:11 AM
To: solr-user@lucene.apache.org
Subject: Re: WebLogic 10 Compatibility Issue - StackOverflowError

I created a wiki page shortly after posting to the list:

http://wiki.apache.org/solr/SolrWeblogic

From what we could tell, Solr itself was fully functional; it was only the admin tools that were failing.

Regards,
Ilan Rabinovitch

---
SCALE 7x: 2009 Southern California Linux Expo
Los Angeles, CA
http://www.socallinuxexpo.org

On 1/29/09 4:34 AM, Mark Miller wrote:
> We should get this on the wiki.
>
> - Mark

Ilan Rabinovitch wrote:
> We were able to deploy Solr 1.3 on WebLogic 10.0 earlier today. Doing so required two changes:
>
> 1) Creating a weblogic.xml file in solr.war's WEB-INF directory. The weblogic.xml file is required to disable Solr's filter on FORWARD. The contents of weblogic.xml should be:
>
>     <?xml version="1.0" encoding="UTF-8"?>
>     <weblogic-web-app
>         xmlns="http://www.bea.com/ns/weblogic/90"
>         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>         xsi:schemaLocation="http://www.bea.com/ns/weblogic/90
>             http://www.bea.com/ns/weblogic/90/weblogic-web-app.xsd">
>       <container-descriptor>
>         <filter-dispatched-requests-enabled>false</filter-dispatched-requests-enabled>
>       </container-descriptor>
>     </weblogic-web-app>
>
> 2) Remove the pageEncoding attribute from line 1 of solr/admin/header.jsp

On 1/17/09 2:02 PM, KSY wrote:
> I hit a major roadblock while trying to get Solr 1.3 running on WebLogic 10.0.
>
> A similar message was posted before (http://www.nabble.com/Solr-1.3-stack-overflow-when-accessing-solr-admin-page-td20157873.html), but it seems like it hasn't been resolved yet, so I'm re-posting here.
>
> I am sure I configured everything correctly, because it's working fine on Resin. Has anyone successfully run Solr 1.3 on WebLogic 10.0 or higher? Thanks.
>
> SUMMARY:
>
> When accessing the /solr/admin page, a StackOverflowError occurs due to infinite recursion in SolrDispatchFilter.
>
> ENVIRONMENT SETTING:
>
> Solr 1.3.0
> WebLogic 10.0
> JRockit JVM 1.5
>
> ERROR MESSAGE:
>
>     SEVERE: javax.servlet.ServletException: java.lang.StackOverflowError
>         at weblogic.servlet.internal.RequestDispatcherImpl.forward(RequestDispatcherImpl.java:276)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:273)
>         at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:42)
>         at weblogic.servlet.internal.RequestDispatcherImpl.invokeServlet(RequestDispatcherImpl.java:526)
>         at weblogic.servlet.internal.RequestDispatcherImpl.forward(RequestDispatcherImpl.java:261)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:273)
>         at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:42)
>         at weblogic.servlet.internal.RequestDispatcherImpl.invokeServlet(RequestDispatcherImpl.java:526)
>         ... (the same four frames repeat until the stack overflows)
RE: warmupTime : 0
This usually represents anything less than 8ms if you are on a Windows system. The granularity of timing on Windows systems is around 16ms.

-Todd Feak

-----Original Message-----
From: sunnyfr [mailto:johanna...@gmail.com]
Sent: Thursday, January 29, 2009 9:13 AM
To: solr-user@lucene.apache.org
Subject: warmupTime : 0

Hi,

Do you think it's normal to have warmupTime : 0?

    searcher
    class: org.apache.solr.search.SolrIndexSearcher
    version: 1.0
    description: index searcher
    stats:
      searcherName : searc...@6f7cf6b6 main
      caching : true
      numDocs : 8207035
      maxDoc : 8239991
      readerImpl : ReadOnlyMultiSegmentReader
      readerDir : org.apache.lucene.store.FSDirectory@/data/solr/video/data/index
      indexVersion : 1228743257996
      openedAt : Thu Jan 29 17:42:08 CET 2009
      registeredAt : Thu Jan 29 17:42:09 CET 2009
      warmupTime : 0

I have around 12M of data.

Thanks a lot,
RE: solr as the data store
Although the chance that you will need to rebuild from scratch is small, you might want to fully understand the cost of recovery if you *do* have to. If it's incredibly expensive (in time or money), you need to keep that in mind.

-Todd

-----Original Message-----
From: Ian Connor [mailto:ian.con...@gmail.com]
Sent: Wednesday, January 28, 2009 12:38 PM
To: solr
Subject: solr as the data store

Hi All,

Is anyone using Solr (and thus the Lucene index) as their database store? Up to now, we have been using a database to build Solr from. However, given that Lucene already keeps the stored data intact, and that rebuilding from Solr to Solr can be very fast, the separate database does not seem so necessary. It seems entirely possible to maintain just the Solr shards and treat them as the database (backups, redundancy, etc. are already built right in). The need to rebuild from scratch seems unlikely, and the speed boost from using Solr shards for data massaging and reindexing is very appealing.

Has anyone else thought about this, or done this and run into problems that caused them to go back to a separate database model? Is there a critical need you can think of that is missing?

--
Regards,
Ian Connor
RE: QTime in microsecond
The easiest way is to run maybe 100,000 or more queries and take an average. A single microsecond value for one query would be incredibly inaccurate.

-Todd Feak

-----Original Message-----
From: AHMET ARSLAN [mailto:iori...@yahoo.com]
Sent: Friday, January 23, 2009 1:33 AM
To: solr-user@lucene.apache.org
Subject: QTime in microsecond

Is there a way to get QTime in microseconds from Solr? I have a small collection, and my response time (QTime) is 0 or 1 milliseconds. I am running benchmark tests and I need more precise timings for comparison.

Thanks for your help.
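One way to get sub-millisecond averages is to time a large batch of queries on the client and divide. A rough SolrJ sketch, illustrative only; note that it measures wall-clock time including HTTP overhead, so it is an upper bound on the server-side time:

    import java.util.List;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;

    public class QueryBenchmark {
        // Averages wall-clock time over many runs; with enough iterations the
        // per-query cost comes out in microseconds even though QTime is in ms.
        public static double averageMicros(SolrServer server, List<SolrQuery> queries,
                                           int rounds) throws Exception {
            long totalNanos = 0;
            long count = 0;
            for (int r = 0; r < rounds; r++) {
                for (SolrQuery q : queries) {
                    long start = System.nanoTime();
                    server.query(q);
                    totalNanos += System.nanoTime() - start;
                    count++;
                }
            }
            return (totalNanos / 1000.0) / count;   // microseconds per query
        }
    }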
RE: Performance "dead-zone" due to garbage collection
Can you share your experience with the IBM JDK once you've evaluated it? You are working with a heavy load; I think many would benefit from the feedback.

-Todd Feak

-----Original Message-----
From: wojtekpia [mailto:wojte...@hotmail.com]
Sent: Thursday, January 22, 2009 3:46 PM
To: solr-user@lucene.apache.org
Subject: Re: Performance "dead-zone" due to garbage collection

I'm not sure if you suggested it, but I'd like to try the IBM JVM. Aside from setting my JRE paths, is there anything else I need to do to run inside the IBM JVM (e.g. re-compiling)?

Walter Underwood wrote:
> What JVM and garbage collector setting? We are using the IBM JVM with their concurrent generational collector. I would strongly recommend trying a similar collector on your JVM. Hint: how much memory is in use after a full GC? That is a good approximation to the working set.
RE: Performance "dead-zone" due to garbage collection
A ballpark calculation would be: collected amount (from the GC logging) / number of requests. The GC logging can tell you how much was collected each time; there's no need to try to snapshot heap sizes before and after.

However (big caveat here), this is a ballpark figure. The garbage collector is not guaranteed to collect everything every time. It can stop collecting depending on how much time it has spent, it may only collect from certain sections of memory (eden, survivor, tenured), etc. This may still be enough to make broad comparisons to see if you've decreased the overall garbage per request (via cache changes), but it will be quite a rough estimate.

-Todd

-----Original Message-----
From: wojtekpia [mailto:wojte...@hotmail.com]
Sent: Wednesday, January 21, 2009 3:08 PM
To: solr-user@lucene.apache.org
Subject: Re: Performance "dead-zone" due to garbage collection

(Thanks for the responses)

My filterCache hit rate is ~60% (so I'll try making it bigger), and I am CPU bound. How do I measure the size of my per-request garbage? Is it (total heap size before collection - total heap size after collection) / # of requests needed to cause a collection?

I'll try your suggestions and post back any useful results.
RE: Performance "dead-zone" due to garbage collection
From a high-level view, there is a certain amount of garbage collection that must occur. That garbage is generated per request through a variety of means (buffers, request, response, cache expulsion). The only thing that JVM parameters can address is *when* that collection occurs. It can occur often in small chunks, or rarely in large chunks (or anywhere in between).

If you are CPU bound (which it sounds like you may be), then you really have a decision to make. Do you want an overall drop in performance, as more time is spent garbage collecting, or do you want garbage collection spikes that are rarer but have a stronger impact? Realistically it becomes a question of one or the other. You *must* pay the cost of garbage collection at some point.

It is possible that increasing cache size will decrease overall garbage collection, as the churn caused by cache misses creates additional garbage; decreasing the churn could decrease garbage. But this really depends on your cache hit rates. If they are pretty high (>90%), then it's probably not much of a factor. However, if you are in the 50%-60% range, larger caches may help you in a number of ways.

-Todd Feak

-----Original Message-----
From: wojtekpia [mailto:wojte...@hotmail.com]
Sent: Wednesday, January 21, 2009 11:14 AM
To: solr-user@lucene.apache.org
Subject: Re: Performance "dead-zone" due to garbage collection

I'm using a recent version of Sun's JVM (6 update 7) and am using the concurrent generational collector. I've tried several other collectors; none seemed to help the situation.

I've tried reducing my heap allocation. The search performance got worse as I reduced the heap. I didn't monitor the garbage collector in those tests, but I imagine it would've gotten better. (As a side note, I do lots of faceting and sorting; I have 10M records in this index, with an approximate index file size of 10GB.)

This index is on a single machine, in a single Solr core. Would splitting it across multiple Solr cores on a single machine help? I'd like to find the limit of this machine before spreading the data to more machines.

Thanks,
Wojtek
RE: Performance "dead-zone" due to garbage collection
The large drop in old generation from ~27GB to ~6GB indicates that things are getting into your old generation prematurely. They really don't need to get there at all, and should be collected sooner (more frequently). Look into increasing the young generation sizes via JVM parameters. Also look into concurrent collection.

You could even consider decreasing your JVM max memory. Obviously you aren't using it all; decreasing it will force the JVM to do more frequent (and therefore smaller) collections. Your average collection time may go up, but the performance dips will be smaller.

Great details on memory tuning on Sun JDKs here:

http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html

There are other articles for 1.6 and 1.4 as well.

-Todd

-----Original Message-----
From: wojtekpia [mailto:wojte...@hotmail.com]
Sent: Wednesday, January 21, 2009 9:49 AM
To: solr-user@lucene.apache.org
Subject: Performance "dead-zone" due to garbage collection

I'm intermittently experiencing severe performance drops due to Java garbage collection. I'm allocating a lot of RAM to my Java process (27GB of the 32GB physically available). Under heavy load, the performance drops approximately every 10 minutes, and each drop lasts 30-40 seconds. This coincides with the size of the old generation heap dropping from ~27GB to ~6GB.

Is there a way to reduce the impact of garbage collection? A couple of ideas we've come up with (but haven't tried yet) are: increasing the minimum heap size, and more frequent (but hopefully less costly) garbage collection.

Thanks,
Wojtek
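As a concrete starting point, the knobs discussed here map to Sun JVM options roughly like the following. The sizes are illustrative only and need tuning against your own GC logs, and the start.jar target assumes the Jetty example launcher:

    # Illustrative values only - tune against your own GC logs.
    # Bigger young generation, concurrent old-gen collection, GC logging:
    java -Xms27g -Xmx27g \
         -XX:NewSize=2g -XX:MaxNewSize=2g \
         -XX:+UseConcMarkSweepGC \
         -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log \
         -jar start.jar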
RE: New to Solr/Lucene design question
Yes, that's what I was suggesting. :)

Might have to be careful with the extra underscore "_" characters. Not sure if those will cause issues with dynamic fields.

-Todd Feak

-----Original Message-----
From: Yogesh Chawla - PD [mailto:premiergenerat...@yahoo.com]
Sent: Tuesday, January 20, 2009 3:14 PM
To: solr-user@lucene.apache.org
Subject: Re: New to Solr/Lucene design question

Hi Todd,

I think I see what you are saying here. In our schema.xml we can define it like this:

    <dynamicField name="*_stash" type="..." indexed="true" stored="true"/>

and then add data like this:

    <field name="arrestee_firstname_stash">Yogesh</field>
    <field name="arrestee_lastname_stash">Chawla</field>
    <field name="arrestee_middlename_stash">myMiddleName</field>

If we need to add other types of dynamic data, we can do that at a later time by adding a different type of dynamic field. This way we are not querying a single field 'stash-content' but rather just the fields we are interested in, and there is no need to change the Java code or the schema.xml.

Are we on the same wavelength here?

Thanks a lot for the suggestion,
Yogesh

----- Original Message ----
From: "Feak, Todd"
To: solr-user@lucene.apache.org
Sent: Tuesday, January 20, 2009 4:49:56 PM
Subject: RE: New to Solr/Lucene design question

A third option - use dynamic fields.

Add a dynamic field called "*_stash". This will allow new fields for documents to be added down the road without changing schema.xml, yet still allow you to query on fields like "arresteeFirstName_stash" without extra overhead.

-Todd Feak

-----Original Message-----
From: Yogesh Chawla - PD [mailto:premiergenerat...@yahoo.com]
Sent: Tuesday, January 20, 2009 2:30 PM
To: solr-user@lucene.apache.org
Subject: New to Solr/Lucene design question

Hello All,

We are using SOLR/Lucene as the search engine for an application we are designing. [...]

I am new to SOLR and just inherited this project with approach number 1. Is this something that is going to bite us in the future?

Thanks,
Yogesh
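On the indexing side, using the dynamic field is just a naming convention when building documents. A hedged SolrJ sketch: the id field and this helper are hypothetical, and the schema is assumed to declare a *_stash dynamic field as discussed above.

    import org.apache.solr.common.SolrInputDocument;

    public class StashDocumentBuilder {
        // Each searchable value goes into its own *_stash dynamic field, so a new
        // document type needs only new field names - no schema.xml change.
        public static SolrInputDocument bookingDoc(String id, String first,
                                                   String last, String middle) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", id);                            // hypothetical unique key
            doc.addField("arrestee_firstname_stash", first);
            doc.addField("arrestee_lastname_stash", last);
            doc.addField("arrestee_middlename_stash", middle);
            return doc;
        }
    }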
RE: New to Solr/Lucene design question
A third option - use dynamic fields.

Add a dynamic field called "*_stash". This will allow new fields for documents to be added down the road without changing schema.xml, yet still allow you to query on fields like "arresteeFirstName_stash" without extra overhead.

-Todd Feak

-----Original Message-----
From: Yogesh Chawla - PD [mailto:premiergenerat...@yahoo.com]
Sent: Tuesday, January 20, 2009 2:30 PM
To: solr-user@lucene.apache.org
Subject: New to Solr/Lucene design question

Hello All,

We are using SOLR/Lucene as the search engine for an application we are designing. The application is a workflow application that can receive different types of documents. For example, we are currently working on getting booking documents but will also accept arrest documents later this year.

We have defined a custom schema that incorporates some schemas designed by federal consortiums. From those schemas we pluck out the values that we want SOLR/Lucene to index and search on, and we go from our instance document to a SOLR document. The fields in our schema.xml look like this:

    <field name="stash-content" ... />

Above, there is a field called "stash-content". The goal is to take any searchable data from any document type and put it in this field. For example, we would store data like this in XML format:

    arrestee_firstname_Yogesh
    arrestee_lastname_Chawla
    arrestee_middlename_myMiddleName

The advantage of such an approach is that we can add new document types to search on, and as long as they use the same semantics (such as arrestee_firstname), we won't need to update any code. It also makes the code simple and generic for any document type. We can search on first name like this for a starts-with query: arrestee_firstname_Y*. We had to use the underscore instead of a space so that each word would not be searched separately when a query was performed and only a single string would be searched (hope that makes sense). The con could be a performance hit.

The other approach is to add fields explicitly, like this:

    <field name="arrestee_firstname">Yogesh</field>
    <field name="arrestee_lastname">Chawla</field>
    <field name="arrestee_middlename">myMiddleName</field>

This approach seems more traditional. The pro is that it is straightforward. The con is that every time we add a new document type to search on, we have to update schema.xml and the Java code that creates SOLR documents.

The number of documents that we will eventually want to search on is about 5 million. However, this will take a while to ramp up to, and we are more immediately looking at searching about 100,000.

I am new to SOLR and just inherited this project with approach number 1. Is this something that is going to bite us in the future?

Thanks,
Yogesh
RE: How to select *actual* match from a multi-valued field
Anyone that can shed some insight?

-Todd

-----Original Message-----
From: Feak, Todd [mailto:todd.f...@smss.sony.com]
Sent: Friday, January 16, 2009 9:55 AM
To: solr-user@lucene.apache.org
Subject: How to select *actual* match from a multi-valued field

At a high level, I'm trying to do some more intelligent searching using an app that will send multiple queries to Solr. My current issue is around multi-valued fields and determining which entry actually generated the "hit" for a particular query. [...]
How to select *actual* match from a multi-valued field
At a high level, I'm trying to do some more intelligent searching using an app that will send multiple queries to Solr. My current issue is around multi-valued fields and determining which entry actually generated the "hit" for a particular query.

For example, let's say that I have a multi-valued field containing people's names, associated with the document (trying to be non-specific on purpose). In one document, I have the following names: Jane Smith, Bob Smith, Roger Smith, Jane Doe. If the user performs a search for Bob Smith, this document is returned. What I want to know is that this document was returned because of "Bob Smith", not because of Jane or Roger.

I've tried using the highlighting settings. They do provide some help, as the Jane Doe entry doesn't come back highlighted, but both Jane and Roger do. I've tried using hl.requireFieldMatch, but that seems to pertain only to fields, not entries within a multi-valued field.

Using Solr, is there a way to get the information I am looking for? Specifically, that "Bob Smith" is the value in the multi-valued field that triggered the hit?

-Todd Feak
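For reference, the highlighting experiment described above looks roughly like this in SolrJ. A hedged sketch: "names" is the hypothetical multi-valued field and the server URL is assumed; as noted above, highlighting narrows the candidates but does not fully isolate the matching value.

    import java.util.List;
    import java.util.Map;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class MatchInspector {
        public static void main(String[] args) throws Exception {
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

            SolrQuery q = new SolrQuery("names:\"Bob Smith\"");
            q.setHighlight(true);
            q.set("hl.fl", "names");
            q.set("hl.requireFieldMatch", "true");

            QueryResponse qr = server.query(q);
            // Map: doc id -> (field name -> highlighted snippets).
            // Snippets containing both query terms point at the candidate values.
            Map<String, Map<String, List<String>>> hl = qr.getHighlighting();
            System.out.println(hl);
        }
    }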
RE: Commiting index while time-consuming query is running
I believe that when you commit, a new IndexReader is created, which is warmed, etc. New incoming queries will be sent to this new IndexReader. Once all previously existing queries have been answered, the old IndexReader will shut down.

The commit doesn't wait for the query to finish, but it shouldn't impact the results of that query either. What may be impacted is overall system performance while you have two IndexReaders in play. There will always be some amount of overlap, but it may be drawn out by the long query.

-Todd Feak

-----Original Message-----
From: wojtekpia [mailto:wojte...@hotmail.com]
Sent: Tuesday, January 13, 2009 2:18 PM
To: solr-user@lucene.apache.org
Subject: Commiting index while time-consuming query is running

Once in a while my Solr instance receives a query that takes a really long time to execute (several minutes or more). What will happen if I update my index (and commit) while one of these really long queries is executing? Will Solr wait for the query to complete before it commits my update?

(On a side note, I'm re-working my UI to eliminate these queries.)

Thanks!
RE: Snapinstaller vs Solr Restart
Kind of a side note, but I think it may be worth your while: if your queryResultCache hit rate is 65%, consider putting a reverse proxy in front of Solr. It can give performance boosts over the query cache in Solr, as it doesn't have to pay the cost of reformulating the response. I've used Varnish with great results. Squid is another option.

-Todd Feak

-----Original Message-----
From: wojtekpia [mailto:wojte...@hotmail.com]
Sent: Tuesday, January 06, 2009 1:20 PM
To: solr-user@lucene.apache.org
Subject: Re: Snapinstaller vs Solr Restart

I use my warm-up queries to fill the field cache (or at least that's the idea). My filterCache hit rate is ~99% and my queryResultCache is ~65%. I update my index several times a day with no 'optimize', and performance is seamless. I also update my index once nightly with an 'optimize', and that's where I see the performance drop. I'll try turning autowarming on.

Could this have to do with file caching by the OS?

Otis Gospodnetic wrote:
> Is an autowarm count of 0 a good idea, though? If you don't want to autowarm any caches, doesn't that imply that you have a very low hit rate and therefore don't care to autowarm? And if you have a very low hit rate, then perhaps caches are not needed at all?
>
> How about this: do you optimize your index at any point?
RE: Using query functions against a "type" field
Thanks Yonik! I still may investigate the query function stuff that was discussed, as Hoss indicated it may hold value.

-Todd Feak

-----Original Message-----
From: Yonik Seeley [mailto:ysee...@gmail.com]
Sent: Tuesday, January 06, 2009 10:19 AM
To: solr-user@lucene.apache.org
Subject: Re: Using query functions against a "type" field

On Tue, Jan 6, 2009 at 1:05 PM, Feak, Todd wrote:
> I'm not sure I followed all that Yonik.
>
> Are you saying that I can achieve this effect now with a bq setting in my DisMax query instead of via a bf setting?

Yep, a "const" QParser would enable that.

    bq={!const}foo:bar

-Yonik
RE: Using query functions against a "type" field
I'm not sure I followed all that Yonik.

Are you saying that I can achieve this effect now with a bq setting in my DisMax query instead of via a bf setting?

-Todd Feak

-----Original Message-----
From: Yonik Seeley [mailto:ysee...@gmail.com]
Sent: Tuesday, January 06, 2009 9:46 AM
To: solr-user@lucene.apache.org
Subject: Re: Using query functions against a "type" field

On Tue, Jan 6, 2009 at 10:41 AM, Feak, Todd wrote:
> The boost queries are true queries, so the amount of boost can be affected by things like term frequency for the query.

Sounds like a constant-score query is a general way to do this. Possible QParser syntax:

    {!const}tag:FOO OR tag:BAR

Could be implemented via ConstantScoreQuery(QueryWrapperFilter(theQuery)). The value could be the boost, optionally set within this QParser:

    {!const v=2.0}tag:FOO OR tag:BAR

-Yonik
RE: Snapinstaller vs Solr Restart
First suspects would be the filterCache and queryResultCache settings. If they are auto-warming at all, then there is a definite difference between the first-start behavior and the post-commit behavior. This affects what's in memory, caches, etc.

-Todd Feak

-----Original Message-----
From: wojtekpia [mailto:wojte...@hotmail.com]
Sent: Tuesday, January 06, 2009 9:46 AM
To: solr-user@lucene.apache.org
Subject: Snapinstaller vs Solr Restart

I'm running load tests against my Solr instance. I find that it typically takes ~10 minutes for my Solr setup to "warm up" while I throw my test queries at it. Also, I have the same two warm-up queries specified for the firstSearcher and newSearcher event listeners.

I'm now benchmarking the effect of updating an index under load. I'm finding that after running snapinstaller, Solr takes ~1 hour to get back to the same performance numbers I was getting 10 minutes after a restart. If I can justify being offline for a few moments, it seems like I'll be better off restarting Solr rather than running snapinstaller. Any ideas why?

Thanks.
RE: Using query functions against a "type" field
: It should be fairly predictable; can you elaborate on what problems you
: have just adding boost queries for the specific types?

The boost queries are true queries, so the amount of boost can be affected by things like term frequency for the query. The functions aren't affected by this and are therefore more predictable over the life of the index. If I want to boost documents via multiple factors, their interaction is very important. If that interaction slowly changes over the life of the index, I lose that control.

: a generic Parser/ValueSource that let you specify term=>float mappings in
: its init params would certainly make a cool patch for Solr.
:
: -Hoss

I do believe I will work on this (may take me a bit). Once I nail it down, I've got a couple of other easier query functions I would like to add as well, if they hold value for the community.
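The implementation Yonik sketched earlier in the thread is plain Lucene. Wrapped up as a helper, a constant-score boost looks roughly like this (Lucene 2.4-era API, illustrative only; the field and term values come from Yonik's example):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause.Occur;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.ConstantScoreQuery;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.QueryWrapperFilter;
    import org.apache.lucene.search.TermQuery;

    public class ConstBoost {
        // Wraps a query so every matching doc contributes the same boost,
        // independent of term frequency or idf.
        public static Query constantBoost(Query inner, float boost) {
            ConstantScoreQuery q = new ConstantScoreQuery(new QueryWrapperFilter(inner));
            q.setBoost(boost);
            return q;
        }

        public static void main(String[] args) {
            BooleanQuery tags = new BooleanQuery();
            tags.add(new TermQuery(new Term("tag", "FOO")), Occur.SHOULD);
            tags.add(new TermQuery(new Term("tag", "BAR")), Occur.SHOULD);
            System.out.println(constantBoost(tags, 2.0f));
        }
    }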
RE: Ngram Repeats
To get the unique brand names, you are wandering into the Facet query territory that I mentioned. You could consider a separate index, and that will probably provide the best performance. Especially if you are hitting it on a per-keystroke basis to update that auto-complete box. Creating a separate index also allows you to scale this section of your search infrastructure separately, if necessary. You *can* put the separate index within the same Tomcat instance if you need to. The context snippets in Tomcat can be used to provide a different URL for those queries. -Todd Feak -Original Message- From: Jeff Newburn [mailto:jnewb...@zappos.com] Sent: Wednesday, December 24, 2008 2:30 PM To: solr-user@lucene.apache.org Subject: Re: Ngram Repeats You are correct on the layout. The reason we are trying to do the ngrams is we want to do a drop-down box for autocomplete. The ngrams are extremely fast and the recommended way to do this according to the user group. They work wonderfully except for this one issue. So do we basically have to do a separate index for this, or is there a dedup setting to only return unique brand names? On 12/24/08 7:51 AM, "Feak, Todd" wrote: > It sounds like you want to get a list of "brands" that start with a particular > string, out of your index. But your index is based on products, not brands. Is > that correct? > > If so, that has nothing to do with NGrams (or even tokenizing for that matter). > I think you should be doing a Facet query instead of a standard query. Take a > look at Facets on the Solr Wiki. > > http://wiki.apache.org/solr/SolrFacetingOverview > > -Todd Feak > -Original Message- > From: Jeff Newburn [mailto:jnewb...@zappos.com] > Sent: Wednesday, December 24, 2008 7:39 AM > To: solr-user@lucene.apache.org > Subject: Ngram Repeats > > I have set up an ngram filter and have run into a problem. Our index is > basically composed of products as the unique id. Each product also has a > brand name assigned to it. There are many fewer unique brand names than > products in the index. I tried to set up an ngram based on the brand name > but it is returning the same brand name over and over for each product. > Essentially if you try for the brand name starting with "as" you will get > the brand "asus" 15 times. Is there a way to make the ngram only return > unique brand names? I have attached the configuration below. > > <fieldType ... positionIncrementGap="1"> > <analyzer> > <tokenizer .../> > <filter ... minGramSize="1" maxGramSize="20"/> > </analyzer> > </fieldType> > > -Jeff
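A minimal Tomcat context fragment for such a second instance, with assumed paths; dropped into conf/Catalina/localhost/ as, say, solr-brands.xml, it serves the autocomplete index at its own URL:

  <Context docBase="/opt/solr/solr.war" debug="0" crossContext="true">
    <!-- points this webapp at its own Solr home (schema, config, index) -->
    <Environment name="solr/home" type="java.lang.String" value="/opt/solr/brands" override="true"/>
  </Context>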
RE: Ngram Repeats
It sounds like you want to get a list of "brands" that start with a particular string, out of your index. But your index is based on products, not brands. Is that correct? If so, that has nothing to do with NGrams (or even tokenizing for that matter). I think you should be doing a Facet query instead of a standard query. Take a look at Facets on the Solr Wiki. http://wiki.apache.org/solr/SolrFacetingOverview -Todd Feak -Original Message- From: Jeff Newburn [mailto:jnewb...@zappos.com] Sent: Wednesday, December 24, 2008 7:39 AM To: solr-user@lucene.apache.org Subject: Ngram Repeats I have set up an ngram filter and have run into a problem. Our index is basically composed of products as the unique id. Each product also has a brand name assigned to it. There are many fewer unique brand names than products in the index. I tried to set up an ngram based on the brand name but it is returning the same brand name over and over for each product. Essentially if you try for the brand name starting with "as" you will get the brand "asus" 15 times. Is there a way to make the ngram only return unique brand names? I have attached the configuration below. -Jeff
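As a sketch of the facet approach (the handler name and the untokenized brand field are assumptions), the per-keystroke request would pass the typed prefix as facet.prefix against a handler like:

  <requestHandler name="/autocomplete" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="q">*:*</str>
      <str name="rows">0</str>             <!-- only the facet counts are wanted -->
      <str name="facet">true</str>
      <str name="facet.field">brand</str>  <!-- an untokenized copy of the brand name -->
      <str name="facet.limit">10</str>
      <str name="facet.mincount">1</str>
    </lst>
  </requestHandler>

A request like /autocomplete?facet.prefix=as would then return each matching brand once, with a count, instead of once per product.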
RE: Using query functions against a "type" field
If I do that, how do I turn off the boosting for some queries but not others? This needs to be done at query time, I believe. -Todd Feak -Original Message- From: Walter Underwood [mailto:wunderw...@netflix.com] Sent: Monday, December 22, 2008 10:33 AM To: solr-user@lucene.apache.org Subject: Re: Using query functions against a "type" field Try document boost at index time. --wunder On 12/22/08 9:28 AM, "Feak, Todd" wrote: > I would like to use a query function to boost documents of a certain > "type". I realize that I can use a boost query for this, but in > analyzing the scoring it doesn't seem as predictable as the query > functions. > > > > So, imagine I have a field called "foo". Foo contains a value that > indicates what type of document this is. For now there are only document > types of "BAR" and "BAZ". I would like documents of type BAR to be > boosted much more strongly than documents of type BAZ. As far as I can tell, > all of the query functions seem to work with fields that contain > numbers. The only exception being the ord() functions, but those don't > provide the stability I would like, as I can always introduce a new > document type down the road and risk screwing up my results. > > > > Can this be done with function queries? > > > > As a follow up, how difficult would it be for me to write my own > function (and plug it into Solr) that allowed me to return a 1.0 or 0.0 > if a field had a particular string value in it? A function that would > look something like "fieldEq(foo,BAR)" > > > > -Todd Feak >
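One query-time way to scope it, sketched as an assumption rather than a recipe: declare two handlers in solrconfig.xml, identical except that only one carries the boost query, and have clients pick a handler per request via qt:

  <requestHandler name="dismax-plain" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">dismax</str>
      <str name="qf">text</str>
    </lst>
  </requestHandler>
  <requestHandler name="dismax-boosted" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">dismax</str>
      <str name="qf">text</str>
      <!-- boost documents whose type field matches; field and value from the thread -->
      <str name="bq">foo:BAR^5.0</str>
    </lst>
  </requestHandler>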
Using query functions against a "type" field
I would like to use a query function to boost documents of a certain "type". I realize that I can use a boost query for this, but in analyzing the scoring it doesn't seem as predictable as the query functions. So, imagine I have a field called "foo". Foo contains a value that indicates what type of document this is. For now there are only document types of "BAR" and "BAZ". I would like documents of type BAR to be boosted much more strongly than documents of type BAZ. As far as I can tell, all of the query functions seem to work with fields that contain numbers. The only exception being the ord() functions, but those don't provide the stability I would like, as I can always introduce a new document type down the road and risk screwing up my results. Can this be done with function queries? As a follow up, how difficult would it be for me to write my own function (and plug it into Solr) that allowed me to return a 1.0 or 0.0 if a field had a particular string value in it? A function that would look something like "fieldEq(foo,BAR)" -Todd Feak
RE: looking for multilanguage indexing best practice/hint
Don't forget to consider scaling concerns (if there are any). There are strong differences in the number of searches we receive for each language. We chose to create separate schema and config per language so that we can throw servers at a particular language (or set of languages) if we needed to. We see 2 orders of magnitude difference between our most popular language and our least popular. -Todd Feak -Original Message- From: Julian Davchev [mailto:j...@drun.net] Sent: Wednesday, December 17, 2008 11:31 AM To: solr-user@lucene.apache.org Subject: looking for multilanguage indexing best practice/hint Hi, From my study of solr and lucene so far, it seems that I will use a single schema. At least, I don't see a scenario where I'd need more than that. So the question is how I approach multilanguage indexing and multilanguage searching. Will it really make sense to just search by word, or should I supply a lang param to the search as well? I see there are those filters and have already been advised on them, but I guess the question is more one of best practice. solr.ISOLatin1AccentFilterFactory, solr.SnowballPorterFilterFactory So the solution I see is using copyField so that I have the same field in different languages, or something using a distinct filter. Cheers
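A schema.xml sketch of the per-language-field approach (language choices and field names are assumptions; text_en would be defined analogously with language="English"):

  <fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="German"/>
    </analyzer>
  </fieldType>
  <field name="title_de" type="text_de" indexed="true" stored="true"/>
  <field name="title_en" type="text_en" indexed="true" stored="true"/>

The application then uses the lang parameter to decide whether to query title_de or title_en.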
RE: Query Performance while updating the index
Sorry, my bad. Didn't read the entire thread. Look at your filter cache first. You are autowarming 1000, and there is exactly 1000 in there. Yet it looks like there may be tens of thousands of filter queries in your system. I would try autowarming more. Try 10,000 or 20,000 and see if it helps. Second, look at your document cache. Document caches don't use autowarm. But you can add queries to your firstSearcher and newSearcher entries in your solrconfig to pre-populate the document cache during warming. -Todd Feak -Original Message- From: oleg_gnatovskiy [mailto:oleg_gnatovs...@citysearch.com] Sent: Friday, December 12, 2008 11:19 AM To: solr-user@lucene.apache.org Subject: RE: Query Performance while updating the index The auto warm time is not an issue. We take the server off the load balancer while it is autowarming. It seems that the slowness occurs after autowarm is done. Feak, Todd wrote: > > It's spending 4-5 seconds warming up your query cache. If 4-5 seconds is > too much, you could reduce the number of queries to auto-warm with on > that cache. > > Notice that the 4-5 seconds is spent only putting about 420 queries into > the query cache. Your autowarm of 5 for the query cache seems a bit > high. If you need to reduce that autowarm time below 5 seconds, you may > have to set that value in the hundreds, as opposed to tens of thousands. > > -Todd Feak > > -Original Message- > From: oleg_gnatovskiy [mailto:oleg_gnatovs...@citysearch.com] > Sent: Friday, December 12, 2008 10:08 AM > To: solr-user@lucene.apache.org > Subject: Re: Query Performance while updating the index > > > Here's what we have on one of the data slaves for the autowarming. > > > > -- > > Dec 12, 2008 8:46:02 AM org.apache.solr.search.SolrIndexSearcher warm > > INFO: autowarming searc...@3f32ca2b main from searc...@443ad545 main > > > filterCache{lookups=351993,hits=347055,hitratio=0.98,inserts=8332,evicti > ons=0,size=8245,warmupTime=215,cumulative_lookups=2837676,cumulative_hit > s=2766551,cumulative_hitratio=0.97,cumulative_inserts=72050,cumulative_e > victions=0} > > Dec 12, 2008 8:46:02 AM org.apache.solr.search.SolrIndexSearcher warm > > INFO: autowarming result for searc...@3f32ca2b main > > > filterCache{lookups=0,hits=0,hitratio=0.00,inserts=1000,evictions=0,size > =1000,warmupTime=317,cumulative_lookups=2837676,cumulative_hits=2766551, > cumulative_hitratio=0.97,cumulative_inserts=72050,cumulative_evictions=0 > } > > Dec 12, 2008 8:46:02 AM org.apache.solr.search.SolrIndexSearcher warm > > INFO: autowarming searc...@3f32ca2b main from searc...@443ad545 main > > > queryResultCache{lookups=5309,hits=5223,hitratio=0.98,inserts=422,evicti > ons=0,size=421,warmupTime=4628,cumulative_lookups=77802,cumulative_hits= > 77216,cumulative_hitratio=0.99,cumulative_inserts=424,cumulative_evictio > ns=0} > > -- > > Dec 12, 2008 8:46:07 AM org.apache.solr.search.SolrIndexSearcher warm > > INFO: autowarming result for searc...@3f32ca2b main > > > queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=421,evictions=0, > size=421,warmupTime=5536,cumulative_lookups=77804,cumulative_hits=77218, > cumulative_hitratio=0.99,cumulative_inserts=424,cumulative_evictions=0} > > Dec 12, 2008 8:46:07 AM org.apache.solr.search.SolrIndexSearcher warm > > INFO: autowarming searc...@3f32ca2b main from searc...@443ad545 main > > > documentCache{lookups=87216,hits=86686,hitratio=0.99,inserts=570,evictio > ns=0,size=570,warmupTime=0,cumulative_lookups=1270773,cumulative_hits=12 > 
68318,cumulative_hitratio=0.99,cumulative_inserts=2455,cumulative_evicti > ons=0} > > Dec 12, 2008 8:46:07 AM org.apache.solr.search.SolrIndexSearcher warm > > INFO: autowarming result for searc...@3f32ca2b main > > > documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size= > 0,warmupTime=0,cumulative_lookups=1270773,cumulative_hits=1268318,cumula > tive_hitratio=0.99,cumulative_inserts=2455,cumulative_evictions=0} > > -- > > > > These are our current values after I've messed with them a few times trying to get better performance. > > <filterCache class="solr.LRUCache" size="3" initialSize="15000" autowarmCount="1000"/> > > <queryResultCache class="solr.LRUCache" size="6" initialSize="3" autowarmCount="5"/> > > <documentCache class="solr.LRUCache" size="20" initialSize="125000" autowarmCount="0"/> > > -- > View this message in context: http://www.nabble.com/Query-Performance-while-updating-the-index-tp20452835p20980669.html > Sent from the Solr - User mailing list archive at Nabble.com. -- View this message in context: http://www.nabble.com/Query-Performance-while-updating-the-index-tp20452835p20981862.html Sent from the Solr - User mailing list archive at Nabble.com.
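The firstSearcher/newSearcher entries mentioned above are QuerySenderListener blocks in solrconfig.xml. A sketch, where the query itself is a placeholder; a frequent query with a large rows value pulls documents into the document cache during warming:

  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst><str name="q">popular query terms</str><str name="start">0</str><str name="rows">100</str></lst>
    </arr>
  </listener>
  <listener event="firstSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst><str name="q">popular query terms</str><str name="start">0</str><str name="rows">100</str></lst>
    </arr>
  </listener>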
RE: Query Performance while updating the index
It's spending 4-5 seconds warming up your query cache. If 4-5 seconds is too much, you could reduce the number of queries to auto-warm with on that cache. Notice that the 4-5 seconds is spent only putting about 420 queries into the query cache. Your autowarm of 5 for the query cache seems a bit high. If you need to reduce that autowarm time below 5 seconds, you may have to set that value in the hundreds, as opposed to tens of thousands. -Todd Feak -Original Message- From: oleg_gnatovskiy [mailto:oleg_gnatovs...@citysearch.com] Sent: Friday, December 12, 2008 10:08 AM To: solr-user@lucene.apache.org Subject: Re: Query Performance while updating teh index Here's what we have on one of the data slaves for the autowarming. -- Dec 12, 2008 8:46:02 AM org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming searc...@3f32ca2b main from searc...@443ad545 main filterCache{lookups=351993,hits=347055,hitratio=0.98,inserts=8332,evicti ons=0,size=8245,warmupTime=215,cumulative_lookups=2837676,cumulative_hit s=2766551,cumulative_hitratio=0.97,cumulative_inserts=72050,cumulative_e victions=0} Dec 12, 2008 8:46:02 AM org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming result for searc...@3f32ca2b main filterCache{lookups=0,hits=0,hitratio=0.00,inserts=1000,evictions=0,size =1000,warmupTime=317,cumulative_lookups=2837676,cumulative_hits=2766551, cumulative_hitratio=0.97,cumulative_inserts=72050,cumulative_evictions=0 } Dec 12, 2008 8:46:02 AM org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming searc...@3f32ca2b main from searc...@443ad545 main queryResultCache{lookups=5309,hits=5223,hitratio=0.98,inserts=422,evicti ons=0,size=421,warmupTime=4628,cumulative_lookups=77802,cumulative_hits= 77216,cumulative_hitratio=0.99,cumulative_inserts=424,cumulative_evictio ns=0} -- Dec 12, 2008 8:46:07 AM org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming result for searc...@3f32ca2b main queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=421,evictions=0, size=421,warmupTime=5536,cumulative_lookups=77804,cumulative_hits=77218, cumulative_hitratio=0.99,cumulative_inserts=424,cumulative_evictions=0} Dec 12, 2008 8:46:07 AM org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming searc...@3f32ca2b main from searc...@443ad545 main documentCache{lookups=87216,hits=86686,hitratio=0.99,inserts=570,evictio ns=0,size=570,warmupTime=0,cumulative_lookups=1270773,cumulative_hits=12 68318,cumulative_hitratio=0.99,cumulative_inserts=2455,cumulative_evicti ons=0} Dec 12, 2008 8:46:07 AM org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming result for searc...@3f32ca2b main documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size= 0,warmupTime=0,cumulative_lookups=1270773,cumulative_hits=1268318,cumula tive_hitratio=0.99,cumulative_inserts=2455,cumulative_evictions=0} -- This is our current values after I've messed with them a few times trying to get better performance. -- View this message in context: http://www.nabble.com/Query-Performance-while-updating-the-index-tp20452 835p20980669.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: move /solr directory from /tomcat/bin/
You can set the home directory in your Tomcat context snippet/file. http://wiki.apache.org/solr/SolrTomcat#head-7036378fa48b79c0797cc8230a8aa0965412fb2e This controls where Solr looks for solrconfig.xml and schema.xml. The solrconfig.xml in turn specifies where to find the data directory. -Original Message- From: Marc Sturlese [mailto:marc.sturl...@gmail.com] Sent: Thursday, December 11, 2008 12:20 PM To: solr-user@lucene.apache.org Subject: move /solr directory from /tomcat/bin/ Hey there, I would like to change the default directory where solr looks for the config files and index. Let's say I would like to put: /opt/tomcat/bin/solr/data/index in /var/searchengine_data/index and /opt/tomcat/bin/solr/conf in /usr/home/searchengine_files/conf Is there any way to do it via configuration, or should I modify the SolrResourceLoader? Thanks in advance -- View this message in context: http://www.nabble.com/move--solr-directory-from--tomcat-bin--tp20963811p20963811.html Sent from the Solr - User mailing list archive at Nabble.com.
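Using the path from the question, the data directory override is a one-liner in solrconfig.xml:

  <dataDir>/var/searchengine_data</dataDir>

while the conf directory location follows from the Solr home set in the Tomcat context file.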
RE: Issue with Search when using wildcard(*) in search term.
I'm pretty sure "*" isn't supported by DisMax. From the Solr Wiki on the DisMaxRequestHandler overview http://wiki.apache.org/solr/DisMaxRequestHandler?highlight=(dismax)#head-ce5517b6c702a55af5cc14a2c284dbd9f18a18c2 "This query handler supports an extremely simplified subset of the Lucene QueryParser syntax. Quotes can be used to group phrases, and +/- can be used to denote mandatory and optional clauses ... but all other Lucene query parser special characters are escaped to simplify the user experience." -Todd Feak -Original Message- From: payalsharma [mailto:[EMAIL PROTECTED] Sent: Tuesday, December 09, 2008 4:51 AM To: solr-user@lucene.apache.org Subject: Issue with Search when using wildcard(*) in search term. Hi All, I am searching a term on Solr by using the wildcard character "*" like this: http://delpearsonwebapps:8080/apache-solr-1.3.0/core51043/select/?q=alle* Here the search term (word) is: alle* This query gives me proper results, but when I give dismaxrequest as a parameter in the query, no results are returned. The query with the dismax parameter goes like this: http://delpearsonwebapps:8080/apache-solr-1.3.0/core51043/select/?q=alle*&qt=dismaxrequest Can anybody let me know the reason behind this behavior? Also, do I need to make any changes in my SolrConfig.XML in order to make the query run with both the wildcard as well as dismaxrequest? Thanks in advance. Payal -- View this message in context: http://www.nabble.com/Issue-with-Search-when-using-wildcard%28*%29-in-search-term.-tp20914102p20914102.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Sorting on text-fields with international characters
One option is to add an additional field for sorting. Create a copy of the field you want to sort on and modify the data you insert there so that it will sort the way you want it to. -ToddFeak -Original Message- From: Joel Karlsson [mailto:[EMAIL PROTECTED] Sent: Monday, December 08, 2008 2:38 PM To: solr-user@lucene.apache.org Subject: Sorting on text-fields with international characters Hello, Is there any way to get Solr to sort properly on a text field containing international, in my case Swedish, letters? It doesn't sort å, ä and ö in the proper order. Also, is there any way to get Solr to sort, for example, á, à or â together with the "regular" a's? Thanks in advance! // Joel
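A sketch of such a sort field in schema.xml, assuming a KeywordTokenizer-based type. Note that ISOLatin1AccentFilter folds á/à/â into a, which answers the second question; proper Swedish collation of å/ä/ö (which sort after z) would still need the data normalized by the application before indexing, as suggested above:

  <fieldType name="alphaSort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
    <analyzer>
      <!-- one token per value: lowercased, accents folded -->
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.ISOLatin1AccentFilterFactory"/>
    </analyzer>
  </fieldType>
  <field name="title_sort" type="alphaSort" indexed="true" stored="false"/>
  <copyField source="title" dest="title_sort"/>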
RE: Encoded search string & qt=Dismax
Do you have a "dismaxrequest" request handler defined in your solr config xml? Or is it "dismax"? -Todd Feak -Original Message- From: tushar kapoor [mailto:[EMAIL PROTECTED] Sent: Tuesday, December 02, 2008 10:07 AM To: solr-user@lucene.apache.org Subject: Encoded search string & qt=Dismax Hi, I am facing problems while searching for some encoded text as part of the search query string. The results don't come up when I use some url encoding with qt=dismaxrequest. I am searching a Russian word by posting a URL encoded UTF8 transformation of the word. The query works fine for a normal request. However, no docs are fetched when qt=dismaxrequest is appended as part of the query string. The word being searched is - Russian Word - Предварительное UTF8 Java Encoding - \u041f\u0440\u0435\u0434\u0432\u0430\u0440\u0438\u0442\u0435\u043b\u044c\u043d\u043e\u0435 Posted query string (URL Encoded) - %5Cu041f%5Cu0440%5Cu0435%5Cu0434%5Cu0432%5Cu0430%5Cu0440%5Cu0438%5Cu0442%5Cu0435%5Cu043b%5Cu044c%5Cu043d%5Cu043e%5Cu0435 Following are the two queries and the difference in results: Query 1 - this one works fine ?q=%5Cu041f%5Cu0440%5Cu0435%5Cu0434%5Cu0432%5Cu0430%5Cu0440%5Cu0438%5Cu0442%5Cu0435%5Cu043b%5Cu044c%5Cu043d%5Cu043e%5Cu0435 Result - 0 0 \u041f\u0440\u0435\u0434\u0432\u0430\u0440\u0438\u0442\u0435\u043b\u044c\u043d\u043e\u0435 productIndex productIndex 4100018 4100018 productIndex product Предварительное K математики учебная книга 4100018 4100018 21125 91048 91047 21125 21125 91048 91047 Предварительное K математики учебная книга Предварительное K математики учебная книга product product 91048 91047 20081202T08:14:05.63Z Query 2 - qt=dismaxrequest - This doesn't work ?q=%5Cu041f%5Cu0440%5Cu0435%5Cu0434%5Cu0432%5Cu0430%5Cu0440%5Cu0438%5Cu0442%5Cu0435%5Cu043b%5Cu044c%5Cu043d%5Cu043e%5Cu0435&qt=dismaxrequest Result - 0 109 \u041f\u0440\u0435\u0434\u0432\u0430\u0440\u0438\u0442\u0435\u043b\u044c\u043d\u043e\u0435 dismaxrequest Don't know why there is a difference on appending qt=dismaxrequest. Any help would be appreciated. Regards, Tushar. -- View this message in context: http://www.nabble.com/Encoded--search-string---qt%3DDismax-tp20797703p20797703.html Sent from the Solr - User mailing list archive at Nabble.com.
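If it is missing, a handler by that name would be declared in solrconfig.xml along these lines (the qf fields are assumptions; without a qf covering the Russian text field, dismax finds nothing):

  <requestHandler name="dismaxrequest" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">dismax</str>
      <str name="qf">name^2.0 description</str>
    </lst>
  </requestHandler>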
RE: maxWarmingSearchers
The commit after each one may be hurting you. I believe that a new searcher is created after each commit. That searcher then runs through its warm up, which can be costly depending on your warming settings. Even if it's not overly costly, creating another one while the first one is running makes both of them run just a bit slower. Then creating a third exacerbates it, etc. If you are committing faster than it can warm, you will get the pile-up of searchers you are seeing. And the more that pile up, the longer it takes each one to finish up. I would suggest trying to group those 4-10 documents into a single update job and doing a single commit. That way only 1 searcher is created per 4 minute window. Also (sorry I forgot this earlier) you can see how long your searcher is spending warming up by looking at the stats page under the admin. (/admin/stats.jsp) There is timing information on how long it took for the searcher and caches to warm up. -Todd Feak -Original Message- From: dudes dudes [mailto:[EMAIL PROTECTED] Sent: Monday, December 01, 2008 1:46 PM To: solr-user@lucene.apache.org Subject: RE: maxWarmingSearchers > Subject: RE: maxWarmingSearchers > Date: Mon, 1 Dec 2008 13:35:53 -0800 > From: [EMAIL PROTECTED] > To: solr-user@lucene.apache.org > > Ok sounds reasonable. When you index/update those 4-10 documents, are > you doing a single commit? Or are you doing a commit after each one? well, commits after each one.. > How big is your index? How big are your documents? Ballpark figures are > ok. more than a couple of MBs one final piece of information: I only have 2 G of RAM on that machine( linux on VMware environment ) and increased the memory of tomcat to 1 G thanks > > -ToddFeak > > -Original Message- > From: dudes dudes [mailto:[EMAIL PROTECTED] > Sent: Monday, December 01, 2008 1:24 PM > To: solr-user@lucene.apache.org > Subject: RE: maxWarmingSearchers > > > Hi ToddFeak, > > thanks for your response... > > solr version is 1.3. Roughly about every 4 minutes there are > indexing/updating of 4 to 10 documents that is from multiple clients > to one master server... > > It is also worth mentioning that I have > > > > > > postCommit uncommented under solrconfig ... QueryCache and > FilterCache settings are left as default > > thanks > ak > > > > > > > > Subject: RE: maxWarmingSearchers > > Date: Mon, 1 Dec 2008 13:13:15 -0800 > > From: [EMAIL PROTECTED] > > To: solr-user@lucene.apache.org > > > > Probably going to need a bit more information. > > > > Such as: > > What version of Solr and a little info on doc count, index size, etc. > > How often are you sending updates to your Master? > > How often are you committing? > > What are your QueryCache and FilterCache settings for autowarm? > > Do you have queries set up for newSearcher and firstSearcher? > > > > To start looking for your problem, you usually get a pile up of > > searchers if you are committing too fast, and/or the warming of new > > searchers is taking an extraordinarily long time. If it is happening in a > > repeatable fashion, increasing the number of warming searchers > probably > > won't fix the issue, just delay it. > > > > -ToddFeak > > > > -Original Message- > > From: dudes dudes [mailto:[EMAIL PROTECTED] > > Sent: Monday, December 01, 2008 12:13 PM > > To: solr-user@lucene.apache.org > > Subject: maxWarmingSearchers > > > > > > Hello all, > > > > I'm having this issue and I hope I get some help.. :) > > > > This following happens quite often ... even though searching and > > indexing are on a safe side... 
> > > > SolrException: HTTP code=503, reason=Error opening new searcher. > > exceeded > > > > limit of maxWarmingSearchers=4, try again later. > > > > I have increased the value of maxWarmingSearchers to 8 and I still > > experience the same problem > > > > This issue is happening to the master solr server changing > > maxWarmingSearchers to higher value would help overcoming this issue > ? > > or I should consider some other points ? > > > > Another question is ? from your experience, do you think such error > > introduces server crash ? > > > > > > thanks for your time.. > > ak > > > > > > > > _ > > Get a bird's eye view of the world with Multimap > > http://clk.atdmt.com/GBL/go/115454059/direct/01/ > > _ > Get Windows Live Messenger on your Mobile > http://clk.atdmt.com/UKM/go/msnnkmgl001001ukm/direct/01/ _ Imagine a life without walls. See the possibilities. http://clk.atdmt.com/UKM/go/122465943/direct/01/
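Todd's suggestion, as a sketch of the XML posted to /update: all the documents from one window go in a single add message, followed by one commit as a separate request (field names are placeholders):

  <add>
    <doc><field name="id">1</field><field name="title">first document</field></doc>
    <doc><field name="id">2</field><field name="title">second document</field></doc>
    <!-- ... the rest of the 4-10 documents from the 4-minute window ... -->
  </add>

  <commit waitSearcher="true"/>

This way only one warming searcher is created per window instead of one per document.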
RE: maxWarmingSearchers
Ok sounds reasonable. When you index/update those 4-10 documents, are you doing a single commit? Or are you doing a commit after each one? How big is your index? How big are your documents? Ballpark figures are ok. -ToddFeak -Original Message- From: dudes dudes [mailto:[EMAIL PROTECTED] Sent: Monday, December 01, 2008 1:24 PM To: solr-user@lucene.apache.org Subject: RE: maxWarmingSearchers Hi ToddFeak, thanks for your response... solr version is 1.3. Roughly about every 4 minutes there are indexing/updating of 4 to 10 documents that is from multiple clients to one master server... It is also worth mentioning that I have postCommit uncommented under solrconfig ... QueryCache and FilterCache settings are left as default thanks ak > Subject: RE: maxWarmingSearchers > Date: Mon, 1 Dec 2008 13:13:15 -0800 > From: [EMAIL PROTECTED] > To: solr-user@lucene.apache.org > > Probably going to need a bit more information. > > Such as: > What version of Solr and a little info on doc count, index size, etc. > How often are you sending updates to your Master? > How often are you committing? > What are your QueryCache and FilterCache settings for autowarm? > Do you have queries set up for newSearcher and firstSearcher? > > To start looking for your problem, you usually get a pile up of > searchers if you are committing too fast, and/or the warming of new > searchers is taking an extraordinarily long time. If it is happening in a > repeatable fashion, increasing the number of warming searchers probably > won't fix the issue, just delay it. > > -ToddFeak > > -Original Message- > From: dudes dudes [mailto:[EMAIL PROTECTED] > Sent: Monday, December 01, 2008 12:13 PM > To: solr-user@lucene.apache.org > Subject: maxWarmingSearchers > > > Hello all, > > I'm having this issue and I hope I get some help.. :) > > This following happens quite often ... even though searching and > indexing are on a safe side... > > SolrException: HTTP code=503, reason=Error opening new searcher. > exceeded > > limit of maxWarmingSearchers=4, try again later. > > I have increased the value of maxWarmingSearchers to 8 and I still > experience the same problem > > This issue is happening to the master solr server changing > maxWarmingSearchers to higher value would help overcoming this issue ? > or I should consider some other points ? > > Another question is ? from your experience, do you think such error > introduces server crash ? > > > thanks for your time.. > ak > > > > _ > Get a bird's eye view of the world with Multimap > http://clk.atdmt.com/GBL/go/115454059/direct/01/ _ Get Windows Live Messenger on your Mobile http://clk.atdmt.com/UKM/go/msnnkmgl001001ukm/direct/01/
RE: maxWarmingSearchers
Probably going to need a bit more information. Such as: What version of Solr and a little info on doc count, index size, etc. How often are you sending updates to your Master? How often are you committing? What are your QueryCache and FilterCache settings for autowarm? Do you have queries set up for newSearcher and firstSearcher? To start looking for your problem, you usually get a pile up of searchers if you are committing too fast, and/or the warming of new searchers is taking an extraordinarily long time. If it is happening in a repeatable fashion, increasing the number of warming searchers probably won't fix the issue, just delay it. -ToddFeak -Original Message- From: dudes dudes [mailto:[EMAIL PROTECTED] Sent: Monday, December 01, 2008 12:13 PM To: solr-user@lucene.apache.org Subject: maxWarmingSearchers Hello all, I'm having this issue and I hope I get some help.. :) This following happens quite often ... even though searching and indexing are on a safe side... SolrException: HTTP code=503, reason=Error opening new searcher. exceeded limit of maxWarmingSearchers=4, try again later. I have increased the value of maxWarmingSearchers to 8 and I still experience the same problem This issue is happening to the master solr server changing maxWarmingSearchers to higher value would help overcoming this issue ? or I should consider some other points ? Another question is ? from your experience, do you think such error introduces server crash ? thanks for your time.. ak _ Get a bird's eye view of the world with Multimap http://clk.atdmt.com/GBL/go/115454059/direct/01/
RE: WordDelimiterFilter and its Factory: access to charTypeTable
I've found that creating a custom filter and filter factory isn't too burdensome when the filter doesn't "quite" do what I need. You could grab the source and create your own version. -Todd Feak -Original Message- From: Jerven Bolleman [mailto:[EMAIL PROTECTED] Sent: Thursday, November 20, 2008 1:56 AM To: solr-user@lucene.apache.org Subject: WordDelimiterFilter and its Factory: access to charTypeTable Hi Solr Community, I was wondering if it is possible to access and modify the charTypeTable of the WordDelimiterFilter. The use case is that I do not want to split on a '*' char, which the filter currently does. If I could modify the charTypeTable I could change the behaviour of the filter. Or am I barking up the wrong tree and should I use a different approach? Thanks, Jerven Bolleman
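Once the custom filter and factory are compiled and dropped into Solr's lib directory, the factory plugs into schema.xml like any built-in one. A sketch, where the class name is hypothetical:

  <fieldType name="text_star" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <!-- hypothetical copy of WordDelimiterFilterFactory whose char type table treats '*' as ALPHA -->
      <filter class="com.example.StarSafeWordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"/>
    </analyzer>
  </fieldType>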
RE: Newbie: For stopword query - All objects being returned
Could you provide your schema and the exact query that you issued? Things to consider... If you just searched for "the", it used the default search field, which is declared in your schema. The filters associated with that default field are what determine whether or not the stopword list is invoked during the query (and/or indexing time). -Todd Feak -Original Message- From: Sanjay Suri [mailto:[EMAIL PROTECTED] Sent: Thursday, November 20, 2008 12:31 AM To: solr-user@lucene.apache.org Subject: Newbie: For stopword query - All objects being returned Hi, I realize this might be too simple - Can someone tell me where to look? I'm new to solr and have to fix this for a demo ASAP. If my search query is "the", all 91 objects are returned as search results. I expect 0 results. -- Sanjay Suri Videocrux Inc. http://videocrux.com +91 99102 66626
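The two schema.xml pieces to check, sketched with assumed names: which field is the default, and whether its analyzer chain applies the stopword list:

  <defaultSearchField>text</defaultSearchField>

  <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <!-- without this filter, a query for "the" matches every document that contains it -->
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>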
RE: Searchable/indexable newsgroups
Can Nutch crawl newsgroups? Anyone? -Todd Feak -Original Message- From: John Martyniak [mailto:[EMAIL PROTECTED] Sent: Wednesday, November 19, 2008 3:06 PM To: solr-user@lucene.apache.org Subject: Searchable/indexable newsgroups Does anybody know of a good way to index newsgroups using SOLR? Basically would like to build a searchable list of newsgroup content. Any help would be greatly appreciated. -John
RE: Solr security
I see value in this in the form of protecting the client from itself. For example, our Solr isn't accessible from the Internet. It's all behind firewalls. But, the client applications can make programming mistakes. I would love the ability to lock them down to a certain number of rows, just in case someone typos and puts in 1000 instead of 100, or the like. Admittedly, testing and QA should catch these things, but sometimes it's nice to put in a few safeguards to stop the obvious mistakes from occurring. -Todd Feak -Original Message- From: Matthias Epheser [mailto:[EMAIL PROTECTED] Sent: Monday, November 17, 2008 9:07 AM To: solr-user@lucene.apache.org Subject: Re: Solr security Ryan McKinley wrote: however I have found that in any site where > stability/load and uptime are a serious concern, this is better handled > in a tier in front of java -- typically the loadbalancer / haproxy / > whatever -- and managed by people more cautious than me. Full ack. What do you think about the only solr related thing "left", the parameter filtering/blocking (e.g. rows<1000)? Is it suitable to do this in a Filter delivered by Solr? Of course as an optional alternative. > > ryan > >
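Request handlers do offer one hook for this today: an invariants block overrides whatever the client sends. Note that it pins the value rather than capping it, so it is a blunt safeguard; the value here is illustrative:

  <requestHandler name="standard" class="solr.SearchHandler" default="true">
    <lst name="invariants">
      <!-- the client's rows parameter is ignored; every request gets this value -->
      <int name="rows">100</int>
    </lst>
  </requestHandler>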
RE: maxCodeLen in the doublemetaphone solr analyzer
There's a patch submitted to do exactly that as a separate filter. See https://issues.apache.org/jira/browse/SOLR-813 You could just take the patch. It's the full filter and factory. -Todd Feak -Original Message- From: Brian Whitman [mailto:[EMAIL PROTECTED] Sent: Thursday, November 13, 2008 12:31 PM To: solr-user@lucene.apache.org Subject: maxCodeLen in the doublemetaphone solr analyzer I want to change the maxCodeLen param that is in Solr 1.3's doublemetaphone plugin. Doc is here: http://commons.apache.org/codec/apidocs/org/apache/commons/codec/language/DoubleMetaphone.html Is this something I can do in solrconfig or do I need to change it and recompile?
RE: solr 1.3 Modification field in schema.xml
I believe (someone correct me if I'm wrong) that the only fields you need to store are those fields which you wish returned from the query. In other words, if you will never put the field on the list of fields (fl) to return, there is no need to store it. It would be advantageous not to store more than you have to. It reduces disk access, index size, memory usage, etc. However, you have to balance this against future needs. If re-indexing is costly just to start storing 1 more field, it may be worth it to just leave it in. -Todd Feak -Original Message- From: sunnyfr [mailto:[EMAIL PROTECTED] Sent: Thursday, November 13, 2008 9:13 AM To: solr-user@lucene.apache.org Subject: solr 1.3 Modification field in schema.xml Hi everybody, I don't really get when I have to re-index data and when I don't. I did a full import but realised I stored too many fields which I don't need, so I have to change some indexed fields from stored to not stored. I don't know if I have to re-index my data for that, or in which cases I really have to re-index. Another question: I would like to know which fields must be stored. I thought it was fields used by functions for boosting, but I just tried to boost an indexed field that is not stored and it worked. Thanks a lot for shedding some light on my questions. -- View this message in context: http://www.nabble.com/solr-1.3--Modification-field-in-schema.xml-tp20483691p20483691.html Sent from the Solr - User mailing list archive at Nabble.com.
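In schema.xml terms, the distinction looks like this (field names are examples):

  <!-- searchable and returnable via fl -->
  <field name="title" type="text" indexed="true" stored="true"/>
  <!-- searchable only: never listed in fl, so no need to store it -->
  <field name="body" type="text" indexed="true" stored="false"/>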
RE: NIO not working yet
Is support for setting the FSDirectory this way built into the 1.3.0 release? Or is it necessary to grab a trunk build? -Todd Feak -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley Sent: Wednesday, November 12, 2008 11:59 AM To: solr-user@lucene.apache.org Subject: NIO not working yet NIO support in the latest Solr development versions does not work yet (I previously advised that some people with possible lock contention problems try it out). We'll let you know when it's fixed, but in the meantime you can always set the system property "org.apache.lucene.FSDirectory.class" to "org.apache.lucene.store.NIOFSDirectory" to try it out. for example: java -Dorg.apache.lucene.FSDirectory.class=org.apache.lucene.store.NIOFSDirectory -jar start.jar -Yonik
RE: Throughput Optimization
Yonik said something about the FastLRUCache giving the most gain for high hit-rates and the LRUCache being faster for low hit-rates. It's in his Nov 1 comment on SOLR-667. I'm not sure if anything changed since then, as it's an active issue, but you may want to try the LRUCache for your query cache. It sounds like you are memory bound already, but you may want to investigate the tradeoffs of your filter cache vs. document cache. High document hit-rate was a big performance boost for us, as document garbage collection is a lot of overhead. I believe that would show up as CPU usage though, so it may not be your bottleneck. This also brings up an interesting question. 3% hit rate on your query cache seems low to me. Are you sure your load test is mimicking realistic query patterns from your user base? I realize this probably isn't part of your bottleneck, just curious. -Todd Feak -Original Message- From: wojtekpia [mailto:[EMAIL PROTECTED] Sent: Wednesday, November 05, 2008 11:08 AM To: solr-user@lucene.apache.org Subject: RE: Throughput Optimization My documentCache hit rate is ~.7, and my queryCache is ~.03. I'm using FastLRUCache on all 3 of the caches. Feak, Todd wrote: > > What are your other cache hit rates looking like? > Which caches are you using the FastLRUCache on? > > -Todd Feak > > -Original Message- > From: wojtekpia [mailto:[EMAIL PROTECTED] > Sent: Wednesday, November 05, 2008 8:15 AM > To: solr-user@lucene.apache.org > Subject: Re: Throughput Optimization > > > Yes, I am seeing evictions. I've tried setting my filterCache higher, > but > then I start getting Out Of Memory exceptions. My filterCache hit ratio > is > > .99. It looks like I've hit a RAM bound here. > > I ran a test without faceting. The response times / throughput were both > significantly higher, there were no evictions from the filter cache, but > I > still wasn't getting > 50% CPU utilization. Any thoughts on what > physical > bound I've hit in this case? > > > > Erik Hatcher wrote: >> >> One quick question are you seeing any evictions from your >> filterCache? If so, it isn't set large enough to handle the faceting > >> you're doing. >> >> Erik >> >> >> On Nov 4, 2008, at 8:01 PM, wojtekpia wrote: >> >>> >>> I've been running load tests over the past week or 2, and I can't >>> figure out >>> my system's bottle neck that prevents me from increasing throughput. > >>> First >>> I'll describe my Solr setup, then what I've tried to optimize the >>> system. >>> >>> I have 10 million records and 59 fields (all are indexed, 37 are >>> stored, 17 >>> have termVectors, 33 are multi-valued) which takes about 15GB of >>> disk space. >>> Most field values are very short (single word or number), and >>> usually about >>> half the fields have any data at all. I'm running on an 8-core, 64- >>> bit, 32GB >>> RAM Redhat box. I allocate about 24GB of memory to the java process, > >>> and my >>> filterCache size is 700,000. I'm using a version of Solr between 1.3 > >>> and the >>> current trunk (including the latest SOLR-667 (FastLRUCache) patch), >>> and >>> Tomcat 6.0. >>> >>> I'm running a ramp-test, increasing the number of users every few >>> minutes. I >>> measure the maximum number of requests that Solr can handle per >>> second with >>> a fixed response time, and call that my throughput. I'd like to see >>> a single >>> physical resource be maxed out at some point during my test so I >>> know it is >>> my bottle neck. I generated random queries for my dataset >>> representing a >>> more or less realistic scenario. 
The queries include faceting by up >>> to 6 >>> fields, and quering by up to 8 fields. >>> >>> I ran a baseline on the un-optimized setup, and saw peak CPU usage >>> of about >>> 50%, IO usage around 5%, and negligible network traffic. >>> Interestingly, the >>> CPU peaked when I had 8 concurrent users, and actually dropped down >>> to about >>> 40% when I increased the users beyond 8. Is that because I have 8 >>> cores? >>> >>> I changed a few settings and observed the effect on throughput: >>> >>> 1. Increased filterCache size, and throughput increased by about >>> 50%, but it >>> seems to peak. >>> 2. Put
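Trying the plain LRUCache on the low-hit-rate query cache, per the advice above, is a one-attribute change in solrconfig.xml (the sizes here are placeholders):

  <queryResultCache class="solr.LRUCache" size="60000" initialSize="30000" autowarmCount="500"/>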
RE: Throughput Optimization
What are your other cache hit rates looking like? Which caches are you using the FastLRUCache on? -Todd Feak -Original Message- From: wojtekpia [mailto:[EMAIL PROTECTED] Sent: Wednesday, November 05, 2008 8:15 AM To: solr-user@lucene.apache.org Subject: Re: Throughput Optimization Yes, I am seeing evictions. I've tried setting my filterCache higher, but then I start getting Out Of Memory exceptions. My filterCache hit ratio is > .99. It looks like I've hit a RAM bound here. I ran a test without faceting. The response times / throughput were both significantly higher, there were no evictions from the filter cache, but I still wasn't getting > 50% CPU utilization. Any thoughts on what physical bound I've hit in this case? Erik Hatcher wrote: > > One quick question are you seeing any evictions from your > filterCache? If so, it isn't set large enough to handle the faceting > you're doing. > > Erik > > > On Nov 4, 2008, at 8:01 PM, wojtekpia wrote: > >> >> I've been running load tests over the past week or 2, and I can't >> figure out >> my system's bottle neck that prevents me from increasing throughput. >> First >> I'll describe my Solr setup, then what I've tried to optimize the >> system. >> >> I have 10 million records and 59 fields (all are indexed, 37 are >> stored, 17 >> have termVectors, 33 are multi-valued) which takes about 15GB of >> disk space. >> Most field values are very short (single word or number), and >> usually about >> half the fields have any data at all. I'm running on an 8-core, 64- >> bit, 32GB >> RAM Redhat box. I allocate about 24GB of memory to the java process, >> and my >> filterCache size is 700,000. I'm using a version of Solr between 1.3 >> and the >> current trunk (including the latest SOLR-667 (FastLRUCache) patch), >> and >> Tomcat 6.0. >> >> I'm running a ramp-test, increasing the number of users every few >> minutes. I >> measure the maximum number of requests that Solr can handle per >> second with >> a fixed response time, and call that my throughput. I'd like to see >> a single >> physical resource be maxed out at some point during my test so I >> know it is >> my bottle neck. I generated random queries for my dataset >> representing a >> more or less realistic scenario. The queries include faceting by up >> to 6 >> fields, and quering by up to 8 fields. >> >> I ran a baseline on the un-optimized setup, and saw peak CPU usage >> of about >> 50%, IO usage around 5%, and negligible network traffic. >> Interestingly, the >> CPU peaked when I had 8 concurrent users, and actually dropped down >> to about >> 40% when I increased the users beyond 8. Is that because I have 8 >> cores? >> >> I changed a few settings and observed the effect on throughput: >> >> 1. Increased filterCache size, and throughput increased by about >> 50%, but it >> seems to peak. >> 2. Put the entire index on a RAM disk, and significantly reduced the >> average >> response time, but my throughput didn't change (i.e. even though my >> response >> time was 10X faster, the maximum number of requests I could make per >> second >> didn't increase). This makes no sense to me, unless there is another >> bottle >> neck somewhere. >> 3. Reduced the number of records in my index. The throughput >> increased, but >> the shape of all my graphs stayed the same, and my CPU usage was >> identical. >> >> I have a few questions: >> 1. Can I get more than 50% CPU utilization? >> 2. Why does CPU utilization fall when I make more than 8 concurrent >> requests? >> 3. 
Is there an obvious bottleneck that I'm missing? >> 4. Does Tomcat have any settings that affect Solr performance? >> >> Any input is greatly appreciated. >> >> -- >> View this message in context: >> http://www.nabble.com/Throughput-Optimization-tp20335132p20335132.html >> Sent from the Solr - User mailing list archive at Nabble.com. > > > -- View this message in context: http://www.nabble.com/Throughput-Optimization-tp20335132p20343425.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Throughput Optimization
If you are seeing < 90% CPU usage and are not IO (File or Network) bound, then you are most probably bound by lock contention. If your CPU usage goes down as you throw more threads at the box, that's an even bigger indication that that is the issue. A good profiling tool should help you locate this. I'm not endorsing it in any way, but I've used YourKit locally and have been able to see where the actual contention was coming from. That led to my interest in the SOLR-667 cache fixes which provided enormous benefit. -Todd -Original Message- From: wojtekpia [mailto:[EMAIL PROTECTED] Sent: Wednesday, November 05, 2008 8:15 AM To: solr-user@lucene.apache.org Subject: Re: Throughput Optimization Yes, I am seeing evictions. I've tried setting my filterCache higher, but then I start getting Out Of Memory exceptions. My filterCache hit ratio is > .99. It looks like I've hit a RAM bound here. I ran a test without faceting. The response times / throughput were both significantly higher, there were no evictions from the filter cache, but I still wasn't getting > 50% CPU utilization. Any thoughts on what physical bound I've hit in this case? Erik Hatcher wrote: > > One quick question are you seeing any evictions from your > filterCache? If so, it isn't set large enough to handle the faceting > you're doing. > > Erik > > > On Nov 4, 2008, at 8:01 PM, wojtekpia wrote: > >> >> I've been running load tests over the past week or 2, and I can't >> figure out >> my system's bottle neck that prevents me from increasing throughput. >> First >> I'll describe my Solr setup, then what I've tried to optimize the >> system. >> >> I have 10 million records and 59 fields (all are indexed, 37 are >> stored, 17 >> have termVectors, 33 are multi-valued) which takes about 15GB of >> disk space. >> Most field values are very short (single word or number), and >> usually about >> half the fields have any data at all. I'm running on an 8-core, 64- >> bit, 32GB >> RAM Redhat box. I allocate about 24GB of memory to the java process, >> and my >> filterCache size is 700,000. I'm using a version of Solr between 1.3 >> and the >> current trunk (including the latest SOLR-667 (FastLRUCache) patch), >> and >> Tomcat 6.0. >> >> I'm running a ramp-test, increasing the number of users every few >> minutes. I >> measure the maximum number of requests that Solr can handle per >> second with >> a fixed response time, and call that my throughput. I'd like to see >> a single >> physical resource be maxed out at some point during my test so I >> know it is >> my bottle neck. I generated random queries for my dataset >> representing a >> more or less realistic scenario. The queries include faceting by up >> to 6 >> fields, and quering by up to 8 fields. >> >> I ran a baseline on the un-optimized setup, and saw peak CPU usage >> of about >> 50%, IO usage around 5%, and negligible network traffic. >> Interestingly, the >> CPU peaked when I had 8 concurrent users, and actually dropped down >> to about >> 40% when I increased the users beyond 8. Is that because I have 8 >> cores? >> >> I changed a few settings and observed the effect on throughput: >> >> 1. Increased filterCache size, and throughput increased by about >> 50%, but it >> seems to peak. >> 2. Put the entire index on a RAM disk, and significantly reduced the >> average >> response time, but my throughput didn't change (i.e. even though my >> response >> time was 10X faster, the maximum number of requests I could make per >> second >> didn't increase). 
This makes no sense to me, unless there is another >> bottle >> neck somewhere. >> 3. Reduced the number of records in my index. The throughput >> increased, but >> the shape of all my graphs stayed the same, and my CPU usage was >> identical. >> >> I have a few questions: >> 1. Can I get more than 50% CPU utilization? >> 2. Why does CPU utilization fall when I make more than 8 concurrent >> requests? >> 3. Is there an obvious bottleneck that I'm missing? >> 4. Does Tomcat have any settings that affect Solr performance? >> >> Any input is greatly appreciated. >> >> -- >> View this message in context: >> http://www.nabble.com/Throughput-Optimization-tp20335132p20335132.html >> Sent from the Solr - User mailing list archive at Nabble.com. > > > -- View this message in context: http://www.nabble.com/Throughput-Optimization-tp20335132p20343425.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: SOLR Performance
Most desktops nowadays have at least a dual-core and 1GB, you may be able to get a semi-realistic feel for performance on a local desktop. If you have access to something meaty in a desktop, you may not have to spend a dime to find out what it's going to take in a server. -T -Original Message- From: Mike Klaas [mailto:[EMAIL PROTECTED] Sent: Monday, November 03, 2008 4:25 PM To: solr-user@lucene.apache.org Subject: Re: SOLR Performance If you never execute any queries, a gig should be more than enough. Of course, I've never played around with a .8 billion doc corpus on one machine. -Mike On 3-Nov-08, at 2:16 PM, Alok Dhir wrote: > in terms of RAM -- how to size that on the indexer? > > --- > Alok K. Dhir > Symplicity Corporation > www.symplicity.com > (703) 351-0200 x 8080 > [EMAIL PROTECTED] > > On Nov 3, 2008, at 4:07 PM, Walter Underwood wrote: > >> The indexing box can be much smaller, especially in terms of CPU. >> It just needs one fast thread and enough disk. >> >> wunder >> >> On 11/3/08 2:58 PM, "Alok Dhir" <[EMAIL PROTECTED]> wrote: >> >>> I was afraid of that. Was hoping not to need another big fat box >>> like >>> this one... >>> >>> --- >>> Alok K. Dhir >>> Symplicity Corporation >>> www.symplicity.com >>> (703) 351-0200 x 8080 >>> [EMAIL PROTECTED] >>> >>> On Nov 3, 2008, at 4:53 PM, Feak, Todd wrote: >>> >>>> I believe this is one of the reasons that a master/slave >>>> configuration >>>> comes in handy. Commits to the Master don't slow down queries on >>>> the >>>> Slave. >>>> >>>> -Todd >>>> >>>> -Original Message- >>>> From: Alok Dhir [mailto:[EMAIL PROTECTED] >>>> Sent: Monday, November 03, 2008 1:47 PM >>>> To: solr-user@lucene.apache.org >>>> Subject: SOLR Performance >>>> >>>> We've moved past this issue by reducing date precision -- thanks to >>>> all for the help. Now we're at another problem. >>>> >>>> There is relatively constant updating of the index -- new log >>>> entries >>>> are pumped in from several applications continuously. Obviously, >>>> new >>>> entries do not appear in searches until after a commit occurs. >>>> >>>> The problem is, issuing a commit causes searches to come to a >>>> screeching halt for up to 2 minutes. We're up to around 80M docs. >>>> Index size is 27G. The number of docs will soon be 800M, which >>>> doesn't bode well for these "pauses" in search performance. >>>> >>>> I'd appreciate any suggestions. >>>> >>>> --- >>>> Alok K. Dhir >>>> Symplicity Corporation >>>> www.symplicity.com >>>> (703) 351-0200 x 8080 >>>> [EMAIL PROTECTED] >>>> >>>> On Oct 29, 2008, at 4:30 PM, Alok Dhir wrote: >>>> >>>>> Hi -- using solr 1.3 -- roughly 11M docs on a 64 gig 8 core >>>>> machine. >>>>> >>>>> Fairly simple schema -- no large text fields, standard request >>>>> handler. 4 small facet fields. >>>>> >>>>> The index is an event log -- a primary search/retrieval >>>>> requirement >>>>> is date range queries. >>>>> >>>>> A simple query without a date range subquery is ridiculously >>>>> fast - >>>>> 2ms. The same query with a date range takes up to 30s (30,000ms). >>>>> >>>>> Concrete example, this query just look 18s: >>>>> >>>>> instance:client\-csm.symplicity.com AND dt:[2008-10-01T04:00:00Z >>>> TO >>>>> 2008-10-30T03:59:59Z] AND label_facet:"Added to Position" >>>>> >>>>> The exact same query without the date range took 2ms. >>>>> >>>>> I saw a thread from Apr 2008 which explains the problem being >>>>> due to >>>>> too much precision on the DateField type, and the range expansion >>>>> leading to far too many elements being checked. 
Proposed solution >>>>> appears to be a hack where you index date fields as strings and >>>>> hacking together date functions to generate proper queries/format >>>>> results. >>>>> >>>>> Does this remain the recommended solution to this issue? >>>>> >>>>> Thanks >>>>> >>>>> --- >>>>> Alok K. Dhir >>>>> Symplicity Corporation >>>>> www.symplicity.com >>>>> (703) 351-0200 x 8080 >>>>> [EMAIL PROTECTED] >>>>> >>>> >>>> >>> >> >
RE: SOLR Performance
I believe this is one of the reasons that a master/slave configuration comes in handy. Commits to the Master don't slow down queries on the Slave. -Todd -Original Message- From: Alok Dhir [mailto:[EMAIL PROTECTED] Sent: Monday, November 03, 2008 1:47 PM To: solr-user@lucene.apache.org Subject: SOLR Performance We've moved past this issue by reducing date precision -- thanks to all for the help. Now we're at another problem. There is relatively constant updating of the index -- new log entries are pumped in from several applications continuously. Obviously, new entries do not appear in searches until after a commit occurs. The problem is, issuing a commit causes searches to come to a screeching halt for up to 2 minutes. We're up to around 80M docs. Index size is 27G. The number of docs will soon be 800M, which doesn't bode well for these "pauses" in search performance. I'd appreciate any suggestions. --- Alok K. Dhir Symplicity Corporation www.symplicity.com (703) 351-0200 x 8080 [EMAIL PROTECTED] On Oct 29, 2008, at 4:30 PM, Alok Dhir wrote: > Hi -- using solr 1.3 -- roughly 11M docs on a 64 gig 8 core machine. > > Fairly simple schema -- no large text fields, standard request > handler. 4 small facet fields. > > The index is an event log -- a primary search/retrieval requirement > is date range queries. > > A simple query without a date range subquery is ridiculously fast - > 2ms. The same query with a date range takes up to 30s (30,000ms). > > Concrete example, this query just look 18s: > > instance:client\-csm.symplicity.com AND dt:[2008-10-01T04:00:00Z TO > 2008-10-30T03:59:59Z] AND label_facet:"Added to Position" > > The exact same query without the date range took 2ms. > > I saw a thread from Apr 2008 which explains the problem being due to > too much precision on the DateField type, and the range expansion > leading to far too many elements being checked. Proposed solution > appears to be a hack where you index date fields as strings and > hacking together date functions to generate proper queries/format > results. > > Does this remain the recommended solution to this issue? > > Thanks > > --- > Alok K. Dhir > Symplicity Corporation > www.symplicity.com > (703) 351-0200 x 8080 > [EMAIL PROTECTED] >
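In the script-based replication of this era, the hook that decouples the two is a postCommit listener on the master, which snapshots the index for slaves to pull on their own schedule (paths as in the stock example solrconfig.xml):

  <listener event="postCommit" class="solr.RunExecutableListener">
    <str name="exe">snapshooter</str>
    <str name="dir">solr/bin</str>
    <bool name="wait">true</bool>
  </listener>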
RE: Custom sort (score + custom value)
Have you looked into the "bf" and "bq" arguments on the DisMaxRequestHandler? http://wiki.apache.org/solr/DisMaxRequestHandler?highlight=(dismax)#head-6862070cf279d9a09bdab971309135c7aea22fb3 -Todd -Original Message- From: George [mailto:[EMAIL PROTECTED] Sent: Monday, November 03, 2008 9:38 AM To: solr-user@lucene.apache.org Subject: Re: Custom sort (score + custom value) Ok Yonik, thank you. I've tried to execute the following query: "{!boost b=log(myrank) defType=dismax}q" and it works great. Do you know if I can do the same (combine a DisjunctionMaxQuery with a BoostedQuery) in solrconfig.xml? George On Sun, Nov 2, 2008 at 3:01 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote: > On Sun, Nov 2, 2008 at 5:09 AM, George <[EMAIL PROTECTED]> wrote: > > I want to implement a custom sort in Solr based on a combination of > > relevance (Solr gives me it yet => score) and a custom value I've > calculated > > previously for each document. I see two options: > > > > 1. Use a function query (I'm using a DisMaxRequestHandler). > > 2. Create a component that set SortSpec with a sort that has a custom > > ComparatorSource (similar to QueryElevationComponent). > > > > The first option has the problem: While the relevance value changes for > > every query, my custom value is constant for each doc. > > Yes, that can be an issue when adding unrelated scores. > Multiplying them might give you better results: > > http://lucene.apache.org/solr/api/org/apache/solr/search/BoostQParserPlugin.html > > -Yonik >
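A sketch of the solrconfig.xml equivalent (qf is an assumption). One caveat: bf adds log(myrank) to the score, whereas the {!boost} form multiplies by it, so the two are not identical:

  <requestHandler name="dismax" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">dismax</str>
      <str name="qf">text</str>
      <str name="bf">log(myrank)</str>
    </lst>
  </requestHandler>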
RE: Performance Lucene / Solr
I realize you said caching won't help because the searches are different, but what about Document caching? Is every document returned different? What's your hit rate on the Document cache? Can you throw memory at the problem by increasing Document cache size? I ask all this, as the Document cache was the biggest win for my application when it came to increasing performance. Hit rates of 50% resulted in 30% GC time. Hit rates > 95% had GC rates below 2%. -Todd -Original Message- From: Kraus, Ralf | pixelhouse GmbH [mailto:[EMAIL PROTECTED] Sent: Thursday, October 30, 2008 6:18 AM To: solr-user@lucene.apache.org Subject: Re: Performance Lucene / Solr Grant Ingersoll wrote: > Have you gone through > http://wiki.apache.org/solr/SolrPerformanceFactors ? > > Can you explain a little more about your testcase, maybe even share > code? I only know a little PHP, but maybe someone else who is better > versed might spot something. I just rewrote my JSP script to use solrj instead; performance is much, much better now! Greets -Ralf-
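The cache Todd is tuning is declared in the query section of solrconfig.xml. A sketch with illustrative sizes; the stock config's rule of thumb is that the documentCache should hold at least (max results per query) x (max concurrent queries):

<query>
  <!-- the documentCache cannot be autowarmed: internal doc ids
       change whenever a new searcher is opened -->
  <documentCache class="solr.LRUCache" size="131072" initialSize="32768" autowarmCount="0"/>
</query>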
RE: date range query performance
It strikes me that removing just the seconds could very well reduce overhead to 1/60 of the original. A 30 second query turns into a 500ms query. Just a swag (rough guess), though. -Todd -Original Message- From: Alok Dhir [mailto:[EMAIL PROTECTED] Sent: Wednesday, October 29, 2008 1:48 PM To: solr-user@lucene.apache.org Subject: Re: date range query performance Well, no - we don't care so much about the seconds, but hours & minutes are indeed crucial. --- Alok K. Dhir Symplicity Corporation www.symplicity.com (703) 351-0200 x 8080 [EMAIL PROTECTED] On Oct 29, 2008, at 4:41 PM, Chris Harris wrote: > Do you need to search down to the minutes and seconds level? If > searching by > date provides sufficient granularity, for instance, you can > normalize all > the time-of-day portions of the timestamps to midnight while > indexing. (So > index any event happening on Oct 01, 2008 as 2008-10-01T00:00:00Z.) > That > would give Solr many fewer unique timestamp values to go through. > > On Wed, Oct 29, 2008 at 1:30 PM, Alok Dhir <[EMAIL PROTECTED]> > wrote: > >> Hi -- using solr 1.3 -- roughly 11M docs on a 64 gig 8 core machine. >> >> Fairly simple schema -- no large text fields, standard request >> handler. 4 >> small facet fields. >> >> The index is an event log -- a primary search/retrieval requirement >> is date >> range queries. >> >> A simple query without a date range subquery is ridiculously fast - >> 2ms. >> The same query with a date range takes up to 30s (30,000ms). >> >> Concrete example, this query just took 18s: >> >> instance:client\-csm.symplicity.com AND dt: >> [2008-10-01T04:00:00Z TO >> 2008-10-30T03:59:59Z] AND label_facet:"Added to Position" >> >> The exact same query without the date range took 2ms. >> >> I saw a thread from Apr 2008 which explains the problem as being due >> to too >> much precision on the DateField type, and the range expansion >> leading to far >> too many elements being checked. The proposed solution appears to be a >> hack >> where you index date fields as strings and hack together date >> functions >> to generate proper queries/format results. >> >> Does this remain the recommended solution to this issue? >> >> Thanks >> >> --- >> Alok K. Dhir >> Symplicity Corporation >> www.symplicity.com >> (703) 351-0200 x 8080 >> [EMAIL PROTECTED] >> >>
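The truncation happens on the client before the document is posted; if only the seconds are dropped, the update message would carry timestamps rounded down to the minute. A sketch (the field name dt comes from the thread, the value is illustrative):

<add>
  <doc>
    <!-- original event time 2008-10-29T13:48:37Z, seconds zeroed out -->
    <field name="dt">2008-10-29T13:48:00Z</field>
  </doc>
</add>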
RE: exceeded limit of maxWarmingSearchers
Have you looked at how long your warm up is taking? If it's taking longer to warm up a searcher than it takes you to do an update, you will be behind the curve and eventually run into this no matter how big that number is. -Original Message- From: news [mailto:[EMAIL PROTECTED] On Behalf Of Jon Drukman Sent: Wednesday, October 29, 2008 11:56 AM To: solr-user@lucene.apache.org Subject: exceeded limit of maxWarmingSearchers I am getting this error quite frequently on my Solr installation: SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=8, try again later. I've done some googling but the common explanation of it being related to autocommit doesn't apply. Our server is not even in public use yet, it's serving maybe one query every second, or less. I don't understand what could be causing this. We do a commit on every update, but updates are very infrequent. One every few minutes, and it's a very small update as well. -jsd-
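The limit itself is set in solrconfig.xml. Raising it only hides the problem Todd describes, since every concurrently warming searcher holds its own caches in memory:

<!-- solrconfig.xml: how many searchers may be warming at once
     before further commits are rejected -->
<maxWarmingSearchers>2</maxWarmingSearchers>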
RE: Question about textTight
You may want to take a very close look at what the WordDelimiterFilter is doing. I believe the underscore is dropped entirely during indexing AND searching as it's not alphanumeric. Wiki doco here http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters?highlight=(tokenizer)#head-1c9b83870ca7890cd73b193cefed83c283339089 The admin analysis page and query debug will help a lot to see what's going on. -Todd -Original Message- From: Stephen Weiss [mailto:[EMAIL PROTECTED] Sent: Monday, October 27, 2008 10:32 PM To: solr-user@lucene.apache.org Subject: Question about textTight Hi, So I've been using the textTight field to hold filenames, and I've run into a weird problem. Basically, people want to search by part of a filename (say, the filename is stm0810m_ws_001ftws and they want to find everything starting with stm0810m_, i.e. stm0810m_*). I'm hoping someone might have done this before (I bet someone has). Lots of things work - you can search for stm0810m_ws_001ftws and get a result, or (stm 0810 m*), or various other combinations. What does not work is searching for (stm0810m_*) or (stm 0810 m_*) or anything like that - a problem, because often they don't want things with ma_ or mx_, but just m_. It's almost like underscores just break everything; escaping them does nothing. Here's the field definition (it should be what came with my solr): and usage: Now, I thought textTight would be good because it's the one best suited for SKU's, but I guess I'm wrong. What should I be using for this? Would changing any of these "generateWordParts" or "catenateAll" options help? I can't seem to find any documentation so I'm really not sure what it would do, but reindexing this whole thing will take quite some time so I'd rather know what will actually work before I just start changing things. Thanks so much for any insight! -- Steve
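For reference, the textTight type in the 1.3 example schema is roughly the following (trimmed, from memory; check your own schema.xml for the exact filters and attribute values). With generateWordParts="0" the underscore-separated pieces are not indexed as separate tokens, and the underscore itself never survives as a searchable character, which is why stm0810m_* matches nothing:

<fieldType name="textTight" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- splits on underscores and other delimiters, then glues
         the word/number parts back together -->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>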
RE: One document inserted but nothing showing up ? SOLR 1.3
Unless "q=ALL" is a special query I don't know about, the only reason you would get results is if "ALL" showed up in the default field of the single document that was inserted/updated. You could try a query of "*:*" instead. Don't forget to URL encode if you are doing this via URL. -Todd -Original Message- From: sunnyfr [mailto:[EMAIL PROTECTED] Sent: Thursday, October 23, 2008 9:17 AM To: solr-user@lucene.apache.org Subject: One document inserted but nothing showing up ? SOLR 1.3 Hi Can somebody help me ? How can I see all my documents, I just did a full import : Indexing completed. Added/Updated: 1 documents. Deleted 0 documents. and when I do :8180/solr/video/select/?q=ALL, I've no result ? − 0 0 − ALL Thanks a lot, -- View this message in context: http://www.nabble.com/One-document-inserted-but-nothing-showing-up---SOLR-1.3-tp20134357p20134357.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Re[4]: Question about copyField
My bad. I misunderstood what you wanted. The example I gave was for the searching side of things, not the data representation in the document. -Todd -Original Message- From: Aleksey Gogolev [mailto:[EMAIL PROTECTED] Sent: Wednesday, October 22, 2008 11:14 AM To: Feak, Todd Subject: Re[4]: Question about copyField FT> I would suggest doing this in your schema, then starting up Solr and FT> using the analysis admin page to see if it will index and search the way FT> you want. That way you don't have to pay the cost of actually indexing FT> the data to find out. Thanks. I did it exactly like you said. I created a fieldType "ex" (short for experiment), defined a corresponding field, and tried it on the analysis page. Here is what I got (I uploaded the page, so you can see it): http://tut-i-tam.com.ua/static/analysis.jsp.htm I want the final token "samsung spinpoint p spn hard drive gb ata" to be the actual "ex" value. So I expected a response like this: samsung spinpoint p spn hard drive gb ata SP2514N Samsung SpinPoint12 P120 SP2514N - hard drive - 250 GB - ATA-133 But when I search this doc, I get this: Samsung SpinPoint12 P120 SP2514N - hard drive - 250 GB - ATA-133 SP2514N Samsung SpinPoint12 P120 SP2514N - hard drive - 250 GB - ATA-133 As you can see, the "description" and "ex" fields are identical. The result of the filter chain wasn't actually stored in the "ex" field :( Anyway, thank you :) FT> -Todd FT> -Original Message- FT> From: Aleksey Gogolev [mailto:[EMAIL PROTECTED] FT> Sent: Wednesday, October 22, 2008 9:24 AM FT> To: Feak, Todd FT> Subject: Re[2]: Question about copyField FT> Thanks for the reply. I want to make your point more exact, cause I'm not FT> sure that I correctly understood you :) FT> As far as I know (correct me please, if I'm wrong) the type defines the way FT> in which the field is indexed and queried. But I don't want to index FT> or query the "suggestion" field in a different way, I want the "suggestion" field FT> to store a different value (like in the example I wrote in my first mail). FT> So you are saying that I can tell Solr (using fieldType) how it FT> should process the string before saving it? Yes? FT>> The filters and tokenizer that are applied to the copy field are FT>> determined by its type in the schema. Simply create a new field FT> type in FT>> your schema with the filters you would like, and use that type for FT> your FT>> copy field. So, the field description would have its old type, but FT> the FT>> field suggestion would get a new type. FT>> -Todd Feak FT>> -Original Message- FT>> From: Aleksey Gogolev [mailto:[EMAIL PROTECTED] FT>> Sent: Wednesday, October 22, 2008 8:28 AM FT>> To: solr-user@lucene.apache.org FT>> Subject: Question about copyField FT>> Hello. FT>> I have a field "description" in my schema. And I want to make a field FT>> "suggestion" with the same content. So I added the following line to my FT>> schema.xml: FT>> FT>> But I also want to modify the "description" string before copying it to FT>> the "suggestion" field. I want to remove all commas, dots and slashes. FT> Here FT>> is an example of such a transformation: FT>> "TvPL/st, SAMSUNG, SML200" => "TvPL st SAMSUNG SML200" FT>> And so, as a result, I want to have a doc like this: FT>> FT>> 8asydauf9nbcngfaad FT>> TvPL/st, SAMSUNG, SML200 FT>> TvPL st SAMSUNG SML200 FT>> FT>> I think it would be nice to use solr.PatternReplaceFilterFactory for FT>> this purpose. So the question is: Can I use solr filters for FT>> processing the "description" string before copying it to the "suggestion" FT>> field? FT>> Thank you for your attention. 
-- Aleksey Gogolev developer, dev.co.ua Aleksey mailto:[EMAIL PROTECTED]
RE: Re[2]: Question about copyField
Yes, using fieldType, you can have Solr run the PatternReplaceFilter for you. So, for example, you can declare a new fieldType whose analyzer chain includes the PatternReplaceFilter, at least for indexing, maybe for query as well (see the sketch after this message). I would suggest doing this in your schema, then starting up Solr and using the analysis admin page to see if it will index and search the way you want. That way you don't have to pay the cost of actually indexing the data to find out. -Todd -Original Message- From: Aleksey Gogolev [mailto:[EMAIL PROTECTED] Sent: Wednesday, October 22, 2008 9:24 AM To: Feak, Todd Subject: Re[2]: Question about copyField Thanks for the reply. I want to make your point more exact, cause I'm not sure that I correctly understood you :) As far as I know (correct me please, if I'm wrong) the type defines the way in which the field is indexed and queried. But I don't want to index or query the "suggestion" field in a different way, I want the "suggestion" field to store a different value (like in the example I wrote in my first mail). So you are saying that I can tell Solr (using fieldType) how it should process the string before saving it? Yes? FT> The filters and tokenizer that are applied to the copy field are FT> determined by its type in the schema. Simply create a new field type in FT> your schema with the filters you would like, and use that type for your FT> copy field. So, the field description would have its old type, but the FT> field suggestion would get a new type. FT> -Todd Feak FT> -Original Message- FT> From: Aleksey Gogolev [mailto:[EMAIL PROTECTED] FT> Sent: Wednesday, October 22, 2008 8:28 AM FT> To: solr-user@lucene.apache.org FT> Subject: Question about copyField FT> Hello. FT> I have a field "description" in my schema. And I want to make a field FT> "suggestion" with the same content. So I added the following line to my FT> schema.xml: FT> FT> But I also want to modify the "description" string before copying it to FT> the "suggestion" field. I want to remove all commas, dots and slashes. Here FT> is an example of such a transformation: FT> "TvPL/st, SAMSUNG, SML200" => "TvPL st SAMSUNG SML200" FT> And so, as a result, I want to have a doc like this: FT> FT> 8asydauf9nbcngfaad FT> TvPL/st, SAMSUNG, SML200 FT> TvPL st SAMSUNG SML200 FT> FT> I think it would be nice to use solr.PatternReplaceFilterFactory for FT> this purpose. So the question is: Can I use solr filters for FT> processing the "description" string before copying it to the "suggestion" FT> field? FT> Thank you for your attention. -- Aleksey Gogolev developer, dev.co.ua Aleksey mailto:[EMAIL PROTECTED]
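A sketch of what that declaration might look like. The type name is hypothetical and the pattern implements the comma/dot/slash stripping from Aleksey's example. Note that this changes only the indexed tokens; the stored value (what comes back in the response) stays verbatim, which is exactly what Aleksey observed upthread:

<fieldType name="suggestionText" class="solr.TextField">
  <analyzer>
    <!-- keep the whole string as one token so the filter sees it intact -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- "TvPL/st, SAMSUNG, SML200" -> "TvPL st SAMSUNG SML200" -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="\s*[,./]+\s*" replacement=" " replace="all"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>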
RE: Question about copyField
The filters and tokenizer that are applied to the copy field are determined by its type in the schema. Simply create a new field type in your schema with the filters you would like, and use that type for your copy field. So, the field description would have its old type, but the field suggestion would get a new type. -Todd Feak -Original Message- From: Aleksey Gogolev [mailto:[EMAIL PROTECTED] Sent: Wednesday, October 22, 2008 8:28 AM To: solr-user@lucene.apache.org Subject: Question about copyField Hello. I have a field "description" in my schema. And I want to make a field "suggestion" with the same content. So I added the following line to my schema.xml: But I also want to modify the "description" string before copying it to the "suggestion" field. I want to remove all commas, dots and slashes. Here is an example of such a transformation: "TvPL/st, SAMSUNG, SML200" => "TvPL st SAMSUNG SML200" And so, as a result, I want to have a doc like this: 8asydauf9nbcngfaad TvPL/st, SAMSUNG, SML200 TvPL st SAMSUNG SML200 I think it would be nice to use solr.PatternReplaceFilterFactory for this purpose. So the question is: Can I use solr filters for processing the "description" string before copying it to the "suggestion" field? Thank you for your attention. -- Aleksey Gogolev developer, dev.co.ua Aleksey
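For reference, the schema.xml line Aleksey refers to above would be the standard copyField declaration:

<copyField source="description" dest="suggestion"/>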
RE: solr1.3 / tomcat55 / MySql but character_set_client && character_set_connection LATIN1
Any chance this is a MySql server configuration issue? http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html -Todd -Original Message- From: sunnyfr [mailto:[EMAIL PROTECTED] Sent: Tuesday, October 21, 2008 1:09 PM To: solr-user@lucene.apache.org Subject: Re: solr1.3 / tomcat55 / MySql but character_set_client && character_set_connection LATIN1 Any idea? What can I do? sunnyfr wrote: > > Hi, > > How can I manage this?
> 
> character_set_client     | latin1
> character_set_connection | latin1
> character_set_database   | utf8
> character_set_filesystem | binary
> character_set_results    | latin1
> character_set_server     | utf8
> character_set_system     | utf8
> character_sets_dir       | /usr/local/mysql-5.0.51b-sphinx/share/mysql/charsets/
> collation_connection     | latin1_swedish_ci
> collation_database       | utf8_general_ci
> collation_server         | utf8_general_ci
> 
> Thanks a lot, > > -- View this message in context: http://www.nabble.com/solr1.3---tomcat55---MySql-but-character_set_clientcharacter_set_connection---LATIN1-tp20090455p20098329.html Sent from the Solr - User mailing list archive at Nabble.com.
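If the importer connects through MySQL Connector/J, the client character set can also be forced from the JDBC URL, overriding the latin1 client/connection defaults shown above. A sketch with illustrative host and database names:

jdbc:mysql://localhost:3306/mydb?useUnicode=true&characterEncoding=UTF-8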
RE: Problem implementing a BinaryQueryResponseWriter
I do have that in my config. Its existence doesn't seem to affect this particular issue. I've tried it with and without. -Todd -Original Message- From: Ryan McKinley [mailto:[EMAIL PROTECTED] Sent: Monday, October 20, 2008 4:36 PM To: solr-user@lucene.apache.org Subject: Re: Problem implementing a BinaryQueryResponseWriter do you have handleSelect set to true in solrconfig? ... if not, it would use a Servlet that is now deprecated On Oct 20, 2008, at 4:52 PM, Feak, Todd wrote: > I found out what's going on. > > The test queries I'm replaying from an existing (pre-1.3.0) Solr > install have *2* "select"s in the URL: http://host:port/select/select?q=foo . Not sure > why, but that's a separate issue. The result is that it is following a > codepath that bypasses this decision point, and it falls back on > something that assumes it will *not* be a BinaryQueryResponseWriter, > even though it does correctly locate and use my new writer. > > The solution was to map /select/select to a new handler. > > Not sure if this raises another issue or not, but for me it solves the > problem. Thanks for the help. > > -Todd > > -Original Message- > From: Grant Ingersoll [mailto:[EMAIL PROTECTED] > Sent: Monday, October 20, 2008 1:09 PM > To: solr-user@lucene.apache.org > Subject: Re: Problem implementing a BinaryQueryResponseWriter > > I'd start by having a look at SolrDispatchFilter and put in a debug > breakpoint at:
> 
> QueryResponseWriter responseWriter = core.getQueryResponseWriter(solrReq);
> response.setContentType(responseWriter.getContentType(solrReq, solrRsp));
> if (Method.HEAD != reqMethod) {
>   if (responseWriter instanceof BinaryQueryResponseWriter) {
>     BinaryQueryResponseWriter binWriter = (BinaryQueryResponseWriter) responseWriter;
>     binWriter.write(response.getOutputStream(), solrReq, solrRsp);
>   } else {
>     PrintWriter out = response.getWriter();
>     responseWriter.write(out, solrReq, solrRsp);
>   }
> }
> 
> On Oct 20, 2008, at 3:59 PM, Feak, Todd wrote: >> Yes. >> >> I've gotten it to the point where my class is called, but the wrong >> method on it is called. >> >> -Todd >> >> -Original Message- >> From: Shalin Shekhar Mangar [mailto:[EMAIL PROTECTED] >> Sent: Monday, October 20, 2008 12:19 PM >> To: solr-user@lucene.apache.org >> Subject: Re: Problem implementing a BinaryQueryResponseWriter >> >> Hi Todd, >> >> Did you add your response writer in solrconfig.xml? >> >> <queryResponseWriter name="xml" >> class="org.apache.solr.request.XMLResponseWriter" default="true"/> >> >> On Mon, Oct 20, 2008 at 9:35 PM, Feak, Todd <[EMAIL PROTECTED]> >> wrote: >> >>> I switched from the dev group for this specific question, in case other >>> users have a similar issue. >>> >>> >>> >>> I'm implementing my own BinaryQueryResponseWriter. I've implemented >> the >>> interface and successfully plugged it into the Solr configuration. >>> However, the application always calls the Writer method on the >> interface >>> instead of the OutputStream method. >>> >>> >>> >>> So, how does Solr determine *which* one to call? Is there a setting >>> somewhere I am missing maybe? >>> >>> >>> >>> For troubleshooting purposes, I am using the 1.3.0 release version. If I >> try >>> using the BinaryResponseWriter (javabin) as the wt, I get the >> exception >>> indicating that Solr is doing the same thing with that writer as >>> well. >>> This leads me to believe I am somehow misconfigured, OR this isn't >>> supported with the 1.3.0 release. >>> >>> >>> >>> -Todd >>> >>> >> >> >> -- >> Regards, >> Shalin Shekhar Mangar. 
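The setting Ryan mentions lives in solrconfig.xml; in a stock config it looks roughly like this (child elements omitted):

<requestDispatcher handleSelect="true">
  <!-- requestParsers, httpCaching, etc. -->
</requestDispatcher>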
> > -- > Grant Ingersoll > Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US New Orleans. > http://www.lucenebootcamp.com > > > Lucene Helpful Hints: > http://wiki.apache.org/lucene-java/BasicsOfPerformance > http://wiki.apache.org/lucene-java/LuceneFAQ > > > > > > > > > >
RE: Problem implementing a BinaryQueryResponseWriter
I found out what's going on. The test queries I'm replaying from an existing (pre-1.3.0) Solr install have *2* "select"s in the URL: http://host:port/select/select?q=foo . Not sure why, but that's a separate issue. The result is that it is following a codepath that bypasses this decision point, and it falls back on something that assumes it will *not* be a BinaryQueryResponseWriter, even though it does correctly locate and use my new writer. The solution was to map /select/select to a new handler. Not sure if this raises another issue or not, but for me it solves the problem. Thanks for the help. -Todd -Original Message- From: Grant Ingersoll [mailto:[EMAIL PROTECTED] Sent: Monday, October 20, 2008 1:09 PM To: solr-user@lucene.apache.org Subject: Re: Problem implementing a BinaryQueryResponseWriter I'd start by having a look at SolrDispatchFilter and put in a debug breakpoint at:

QueryResponseWriter responseWriter = core.getQueryResponseWriter(solrReq);
response.setContentType(responseWriter.getContentType(solrReq, solrRsp));
if (Method.HEAD != reqMethod) {
  if (responseWriter instanceof BinaryQueryResponseWriter) {
    BinaryQueryResponseWriter binWriter = (BinaryQueryResponseWriter) responseWriter;
    binWriter.write(response.getOutputStream(), solrReq, solrRsp);
  } else {
    PrintWriter out = response.getWriter();
    responseWriter.write(out, solrReq, solrRsp);
  }
}

On Oct 20, 2008, at 3:59 PM, Feak, Todd wrote: > Yes. > > I've gotten it to the point where my class is called, but the wrong > method on it is called. > > -Todd > > -Original Message- > From: Shalin Shekhar Mangar [mailto:[EMAIL PROTECTED] > Sent: Monday, October 20, 2008 12:19 PM > To: solr-user@lucene.apache.org > Subject: Re: Problem implementing a BinaryQueryResponseWriter > > Hi Todd, > > Did you add your response writer in solrconfig.xml? > > <queryResponseWriter name="xml" > class="org.apache.solr.request.XMLResponseWriter" default="true"/> > > On Mon, Oct 20, 2008 at 9:35 PM, Feak, Todd <[EMAIL PROTECTED]> wrote: > >> I switched from the dev group for this specific question, in case other >> users have a similar issue. >> >> >> >> I'm implementing my own BinaryQueryResponseWriter. I've implemented > the >> interface and successfully plugged it into the Solr configuration. >> However, the application always calls the Writer method on the > interface >> instead of the OutputStream method. >> >> >> >> So, how does Solr determine *which* one to call? Is there a setting >> somewhere I am missing maybe? >> >> >> >> For troubleshooting purposes, I am using the 1.3.0 release version. If I > try >> using the BinaryResponseWriter (javabin) as the wt, I get the > exception >> indicating that Solr is doing the same thing with that writer as >> well. >> This leads me to believe I am somehow misconfigured, OR this isn't >> supported with the 1.3.0 release. >> >> >> >> -Todd >> >> > > > -- > Regards, > Shalin Shekhar Mangar. -- Grant Ingersoll Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US New Orleans. http://www.lucenebootcamp.com Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance http://wiki.apache.org/lucene-java/LuceneFAQ
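The workaround Todd describes, mapping the doubled path to its own handler, would look roughly like this in solrconfig.xml (handler class per a stock 1.3 config; this is a sketch of the workaround, not a recommended setup):

<requestHandler name="/select/select" class="solr.SearchHandler"/>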
RE: Problem implementing a BinaryQueryResponseWriter
Yes. I've gotten it to the point where my class is called, but the wrong method on it is called. -Todd -Original Message- From: Shalin Shekhar Mangar [mailto:[EMAIL PROTECTED] Sent: Monday, October 20, 2008 12:19 PM To: solr-user@lucene.apache.org Subject: Re: Problem implementing a BinaryQueryResponseWriter Hi Todd, Did you add your response writer in solrconfig.xml? <queryResponseWriter name="xml" class="org.apache.solr.request.XMLResponseWriter" default="true"/> On Mon, Oct 20, 2008 at 9:35 PM, Feak, Todd <[EMAIL PROTECTED]> wrote: > I switched from the dev group for this specific question, in case other > users have a similar issue. > > > > I'm implementing my own BinaryQueryResponseWriter. I've implemented the > interface and successfully plugged it into the Solr configuration. > However, the application always calls the Writer method on the interface > instead of the OutputStream method. > > > > So, how does Solr determine *which* one to call? Is there a setting > somewhere I am missing maybe? > > > > For troubleshooting purposes, I am using the 1.3.0 release version. If I try > using the BinaryResponseWriter (javabin) as the wt, I get the exception > indicating that Solr is doing the same thing with that writer as well. > This leads me to believe I am somehow misconfigured, OR this isn't > supported with the 1.3.0 release. > > > > -Todd > > -- Regards, Shalin Shekhar Mangar.
RE: Japanese language doesn't seem to work on Solr 1.3
I would look really closely at the data between MySQL and Solr. I don't know how it got from the database to the index, but I would try and get a debugger running and look at the actual data as it's moving along. Possible suspects include the JDBC driver, JDBC driver settings, and the HTTP client (whatever sends the data to Solr). Also, you could play around with the Admin analysis page to make sure it's not cropping up in one of the Tokenizers or Analyzers. But I saw you are using CJK, which most probably doesn't have this issue. -Todd -Original Message- From: sunnyfr [mailto:[EMAIL PROTECTED] Sent: Monday, October 20, 2008 9:40 AM To: solr-user@lucene.apache.org Subject: RE: Japanese language doesn't seem to work on Solr 1.3 So maybe when I import my data from MySQL I lose it? sunnyfr wrote: > > I did re-create my index. I'm using MySQL; when I query for Japanese videos I get results correctly. > And yes, I did try to index the data again; it takes one minute so it's not a problem, but now I don't know what I can do. > > -- View this message in context: http://www.nabble.com/Japan-language-seems-to-don%27t-work-on-solr-1.3-tp20070938p20073767.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Japanese language doesn't seem to work on Solr 1.3
That looks like the data in the index is incorrectly encoded. If the inserts into your index came in via HTTP GET and your Tomcat wasn't configured for UTF-8 at the time, I could see it going into the index corrupted. But I'm not sure if that's even possible (depends on Update). Is it hard to re-create your index after that configuration change? If it's a quick thing to do, it may be worth doing again to eliminate it as a possibility. -Todd Feak -Original Message- From: sunnyfr [mailto:[EMAIL PROTECTED] Sent: Monday, October 20, 2008 9:05 AM To: solr-user@lucene.apache.org Subject: RE: Japanese language doesn't seem to work on Solr 1.3 Hi Todd, It does definitely work better; it was the server.xml file, sorry, I should have checked. But I still have a dodgy problem: it's like it doesn't encode it the right way. Because if I search directly in the URL ... :8180/solr/video/select/?q=豐田真奈美 My result is : 0 0 豐田真奈美 And if I look for : :8180/solr/video/select/?q=ALL My result is : 0 0 ALL 2006-10-10T05:29:32Z All Japan Women's Pro-wrestling WWWA Champion Title Match è±ç”°çœŸå¥ˆç¾Ž VS 井上京å 813343 JA 40 Toyota Manami VS Inoue Kyoko 1421 false false 2008-10-20T15:57:27.197Z Toyota Manami VS Inoue Kyoko This : è±ç”°çœŸå¥ˆç¾Ž VS 井上京å Should be 豐田真奈美 Any idea? Thanks a lot :) -- View this message in context: http://www.nabble.com/Japan-language-seems-to-don%27t-work-on-solr-1.3-tp20070938p20073108.html Sent from the Solr - User mailing list archive at Nabble.com.
Problem implementing a BinaryQueryResponseWriter
I switched from the dev group for this specific question, in case other users have a similar issue. I'm implementing my own BinaryQueryResponseWriter. I've implemented the interface and successfully plugged it into the Solr configuration. However, the application always calls the Writer method on the interface instead of the OutputStream method. So, how does Solr determine *which* one to call? Is there a setting somewhere I am missing maybe? For troubleshooting purposes, I am using the 1.3.0 release version. If I try using the BinaryResponseWriter (javabin) as the wt, I get the exception indicating that Solr is doing the same thing with that writer as well. This leads me to believe I am somehow misconfigured, OR this isn't supported with the 1.3.0 release. -Todd
RE: Japanese language doesn't seem to work on Solr 1.3
Two potential issues I see there. 1. Shouldn't your query string on the URL be encoded? 2. Are you using Tomcat, and did you set it up to use UTF-8 encoding? If not, your connector node in Tomcat needs to have the URIEncoding set to UTF-8. Documentation here http://struts.apache.org/2.0.11.2/docs/how-to-support-utf-8-uriencoding-with-tomcat.html -Todd Feak -Original Message- From: sunnyfr [mailto:[EMAIL PROTECTED] Sent: Monday, October 20, 2008 8:06 AM To: solr-user@lucene.apache.org Subject: Japanese language doesn't seem to work on Solr 1.3 Hi, I don't get what I am doing wrong, but when I request: .com:8180/solr/video/select/?q=初恋+-+村下孝蔵&version=2.2&start=0&rows=10&indent=on my result is : − 0 0 − on 0 åæ - æä¸åèµ 10 2.2 − − 2006-09-05T11:20:52Z 612530 JA 150 − PUSHIM, RHYMESTER, MABOROSHI, May J. 21049 false false 2008-10-20T14:58:30.799Z My schema is : ... ... -- View this message in context: http://www.nabble.com/Japonish-language-seems-to-don%27t-work-on-solr-1.3-tp20070938p20070938.html Sent from the Solr - User mailing list archive at Nabble.com.
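The connector change Todd describes would look roughly like this in Tomcat 5.5's server.xml (port taken from the thread, other attributes omitted):

<Connector port="8180" URIEncoding="UTF-8"/>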
RE: Lucene 2.4 released
The current Subversion trunk has the new Lucene 2.4.0 libraries committed. So, it's definitely under way. -Todd -Original Message- From: Julio Castillo [mailto:[EMAIL PROTECTED] Sent: Wednesday, October 15, 2008 9:48 AM To: solr-user@lucene.apache.org Subject: Lucene 2.4 released Any ideas when solr 1.3 can be patched to use the official release of Lucene (rather than a Lucene snapshot)? Should I submit a JIRA request? thanks Julio Castillo Edgenuity Inc.
RE: Practical number of Solr instances per machine
Sorry Yonik, I hope this didn't come off as criticism. Far from it. We are very happy with the performance we are getting. I just happen to be the performance junkie trying to get every little bit out. That being said, I'm happy to hear it's going to get even better! -Todd -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley Sent: Tuesday, October 14, 2008 1:38 PM To: solr-user@lucene.apache.org Subject: Re: Practical number of Solr instances per machine On Tue, Oct 14, 2008 at 4:29 PM, Feak, Todd <[EMAIL PROTECTED]> wrote: > In our load testing, the limit for utilizing all of the processor time > on a box was locking (synchronize, mutex, monitor, pick one). There were > a couple of locking points that we saw. > > 1. Lucene's locking on the index for simultaneous read/write protection. > 2. Solr's locking on the LRUCaches for update protection. Luckily, both of these are very close to being improved: 1. Lucene 2.4 has NIO support (lockless) except for Windows, and there is already a Solr patch to add support for that. 2. Solr already has a patch (soon to be committed) for an LRUCache based on ConcurrentHashMap that should work better with multiple CPUs. -Yonik
RE: Practical number of Solr instances per machine
In our load testing, the limit for utilizing all of the processor time on a box was locking (synchronize, mutex, monitor, pick one). There were a couple of locking points that we saw. 1. Lucene's locking on the index for simultaneous read/write protection. 2. Solr's locking on the LRUCaches for update protection. If you've gotten Solr configured to the point where *most* of your work is done in memory, then multiple instances of Solr would essentially distribute this locking and create less contention, enabling you to utilize more of the CPU. This assumes that the creation of another JVM won't hinder your in-memory caching. Please note, this was only for *our* Solr configuration. It doesn't necessarily reflect anyone else's configuration. It does, however, provide at least one scenario where multiple instances could increase performance. -Todd -Original Message- From: Phillip Farber [mailto:[EMAIL PROTECTED] Sent: Tuesday, October 14, 2008 12:44 PM To: solr-user@lucene.apache.org Subject: Re: Practical number of Solr instances per machine Otis, you have a good memory :-) I guess the main thing that prompted my question was Mike Klass' statement that he runs 2 instances per machine to "squeeze" performance out of the box. That raised the question in my mind as to just how this could benefit performance over a single instance in one box. Phil Otis Gospodnetic wrote: > Hi, > > Did you not ask this question a while back? I may be mixing things... (hah, no, just checked) > In short, it depends on a number of factors, such as index sizes, query rates, complexity of queries, amount of RAM, your target query latency, etc. etc. So there is no super clear cut answer. If you have some concrete numbers, that will be easier to answer :) > > > Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > - Original Message >> From: Phillip Farber <[EMAIL PROTECTED]> >> To: solr-user@lucene.apache.org >> Sent: Wednesday, October 8, 2008 5:34:58 PM >> Subject: Practical number of Solr instances per machine >> >> >> Hello everyone, >> >> What is the generally accepted number of solr instances it makes sense >> to run on a single machine given solr/lucene threading? Servers now >> commonly have 4 or 8 cpus. Obviously, the more instances you run, the >> bigger your JVM heap needs to be, and that takes away from OS cache. Is >> the sweet spot just one instance per machine? What is the right way to >> think about this issue? >> >> Thanks, >> >> Phil >