RE: Results driving me nuts!
-----Original Message-----
From: Ahmet Arslan [mailto:iori...@yahoo.com]
Sent: Sunday, March 13, 2011 6:25 PM
To: solr-user@lucene.apache.org; andy.ne...@gmail.com
Subject: Re: Results driving me nuts!

--- On Sun, 3/13/11, Andy Newby <andy.ne...@gmail.com> wrote:
From: Andy Newby <andy.ne...@gmail.com>
Subject: Results driving me nuts!
To: solr-user@lucene.apache.org
Date: Sunday, March 13, 2011, 10:38 PM

Hi,

Ok, I'm really, really trying to get my head around this, but I just can't :/ Here are 2 example records, both using the query "st patricks" to search on (matches for the keywords are wrapped in **stars** like so, to make a point of what SHOULD be matching):

keywords: animations mini alphabets **st** **patricks** animated 1 clover animations mini alphabets **st** **patricks**
description: animated 1 clover

124966: 209.23984 = (MATCH) product of:
  418.47968 = (MATCH) sum of:
    418.47968 = (MATCH) sum of:
      212.91336 = (MATCH) weight(keywords:st in 5697), product of:
        0.41379675 = queryWeight(keywords:st), product of:
          7.5798326 = idf(docFreq=233, maxDocs=168578)
          0.05459181 = queryNorm
        514.5361 = (MATCH) fieldWeight(keywords:st in 5697), product of:
          1.4142135 = tf(termFreq(keywords:st)=2)
          7.5798326 = idf(docFreq=233, maxDocs=168578)
          48.0 = fieldNorm(field=keywords, doc=5697)
      205.56633 = (MATCH) weight(keywords:patricks in 5697), product of:
        0.4065946 = queryWeight(keywords:patricks), product of:
          7.447905 = idf(docFreq=266, maxDocs=168578)
          0.05459181 = queryNorm
        505.58057 = (MATCH) fieldWeight(keywords:patricks in 5697), product of:
          1.4142135 = tf(termFreq(keywords:patricks)=2)
          7.447905 = idf(docFreq=266, maxDocs=168578)
          48.0 = fieldNorm(field=keywords, doc=5697)
  0.5 = coord(1/2)

The other one (5 matches):

desc: a black and white mug of beer with a three leaf clover in it
keywords: saint **patricks** day green irish beer spel132_bw clip art holidays **st** **patricks** day handle drink celebrate clip art holidays **st** **patricks** day

145351: 193.61652 = (MATCH) product of:
  387.23303 = (MATCH) sum of:
    387.23303 = (MATCH) sum of:
      177.4278 = (MATCH) weight(keywords:st in 25380), product of:
        0.41379675 = queryWeight(keywords:st), product of:
          7.5798326 = idf(docFreq=233, maxDocs=168578)
          0.05459181 = queryNorm
        428.78006 = (MATCH) fieldWeight(keywords:st in 25380), product of:
          1.4142135 = tf(termFreq(keywords:st)=2)
          7.5798326 = idf(docFreq=233, maxDocs=168578)
          40.0 = fieldNorm(field=keywords, doc=25380)
      209.80525 = (MATCH) weight(keywords:patricks in 25380), product of:
        0.4065946 = queryWeight(keywords:patricks), product of:
          7.447905 = idf(docFreq=266, maxDocs=168578)
          0.05459181 = queryNorm
        516.006 = (MATCH) fieldWeight(keywords:patricks in 25380), product of:
          1.7320508 = tf(termFreq(keywords:patricks)=3)
          7.447905 = idf(docFreq=266, maxDocs=168578)
          40.0 = fieldNorm(field=keywords, doc=25380)
  0.5 = coord(1/2)

Now the thing that's getting me is that the record which has 5 occurrences of "st patricks" scores so differently: 209.23984 vs 193.61652 (these should be the other way around). Can anyone try to explain what's going on with this? BTW, the queries are matched against a normal whitespace-tokenized index, nothing special. The actual query being used is as follows:

(keywords:st AND keywords:patricks) OR (description:st AND description:patricks)

TIA - I'm hoping someone can save my sanity ;)

[Ahmet's reply:]

Their fieldNorm values are different. The norm consists of the index-time boost and length normalization:
http://lucene.apache.org/java/2_9_1/api/core/org/apache/lucene/search/Similarity.html#formula_norm

I can see that the one with 5 matches is longer than the other. Shorter documents are favored in Solr/Lucene by the length normalization factor. Also, the term frequency for "patricks" is different in each document: for the 1st doc termFreq(keywords:patricks)=2 and for the 2nd doc termFreq(keywords:patricks)=3.
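The arithmetic behind those two explain outputs can be reproduced directly from the printed factors. The sketch below (plain Python, with the queryWeight/tf/idf/fieldNorm values copied from the explains above) shows that the fieldNorm gap (48 vs 40, i.e. the second document's keywords field is longer) outweighs the extra occurrence of "patricks":

```python
import math

def explain_score(entries, coord):
    # entries: list of (query_weight, tf, idf, field_norm) per matching term;
    # each term contributes queryWeight * fieldWeight, where
    # fieldWeight = tf * idf * fieldNorm, then the sum is scaled by coord.
    total = sum(qw * (tf * idf * norm) for qw, tf, idf, norm in entries)
    return total * coord

# Doc 124966: both terms have termFreq=2, fieldNorm=48
doc_124966 = explain_score([
    (0.41379675, math.sqrt(2), 7.5798326, 48.0),  # keywords:st
    (0.4065946,  math.sqrt(2), 7.447905,  48.0),  # keywords:patricks
], coord=0.5)

# Doc 145351: patricks has termFreq=3, but fieldNorm is only 40
doc_145351 = explain_score([
    (0.41379675, math.sqrt(2), 7.5798326, 40.0),  # keywords:st
    (0.4065946,  math.sqrt(3), 7.447905,  40.0),  # keywords:patricks
], coord=0.5)

print(doc_124966, doc_145351)  # ~209.24 vs ~193.62, matching the explains
```

The tf gain from a third "patricks" is only sqrt(3)/sqrt(2) ≈ 1.22, while the norm penalty 40/48 ≈ 0.83 applies to both terms, so the shorter document wins.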
RE: Filter Query, Filter Cache and Hit Ratio
Hi,

You've used NOW in the range query, which will give a date/time accurate to the millisecond; try using NOW\DAY

Colin.

-----Original Message-----
From: Renaud Delbru [mailto:renaud.del...@deri.org]
Sent: Friday, January 28, 2011 2:22 PM
To: solr-user@lucene.apache.org
Subject: Filter Query, Filter Cache and Hit Ratio

Hi,

I am looking for some more information on how the filter cache works and how the hits are incremented. We are using filter queries for certain predefined values, such as timestamp:[2011-01-21T00:00:00Z+TO+NOW] (which is the current day). From what I understand from the documentation: "the filter cache stores the results of any filter queries (fq parameters) that Solr is explicitly asked to execute. (Each filter is executed and cached separately. When it's time to use them to limit the number of results returned by a query, this is done using set intersections.)"

So we were imagining that if two consecutive queries (such as the ones below) used the same timestamp filter query, the second query would take advantage of the filter cache, and we would see the number of hits increase (a hit on the cached timestamp filter query). However, this is not the case: the number of hits on the filter cache does not increase and stays very low. Is this normal?

INFO: [] webapp=/siren path=/select params={wt=javabin&rows=0&version=2&fl=id,score&start=0&q=*:*&isShard=true&fq=timestamp:[2011-01-21T00:00:00Z+TO+NOW]&fq=domain:my.wordpress.com&fsv=true} hits=0 status=0 QTime=139
INFO: [] webapp=/siren path=/select params={wt=javabin&rows=0&version=2&fl=id,score&start=0&q=*:*&isShard=true&fq=timestamp:[2011-01-21T00:00:00Z+TO+NOW]&fq=domain:syours.wordpress.com&fsv=true} hits=0 status=0 QTime=138

--
Renaud Delbru
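One way to see why NOW defeats the cache: the filter cache is keyed on the parsed filter, and NOW resolves to the current instant at parse time, so two textually identical fq strings parse to different range queries. The toy Python sketch below (the cache class and the millisecond timestamps are made up; only the keying behaviour is the point) mimics that, and shows how rounding with NOW/DAY restores cache hits:

```python
class FilterCache:
    """Toy stand-in for Solr's filterCache: keyed on the parsed filter."""
    def __init__(self):
        self.cache, self.hits, self.lookups = {}, 0, 0

    def get(self, key):
        self.lookups += 1
        if key in self.cache:
            self.hits += 1
        else:
            self.cache[key] = object()  # stand-in for a computed DocSet
        return self.cache[key]

def parse_fq(now_ms, rounding=None):
    # NOW resolves at parse time; the *resolved* range is the cache key.
    # NOW/DAY floors the instant to midnight (86,400,000 ms per day).
    upper = now_ms if rounding is None else now_ms - (now_ms % 86_400_000)
    return ("timestamp", "2011-01-21T00:00:00Z", upper)

cache = FilterCache()
t0 = 1_296_300_000_123  # hypothetical time of the first query, in ms

cache.get(parse_fq(t0))
cache.get(parse_fq(t0 + 5))         # 5 ms later: different upper bound -> miss
print(cache.hits)                    # still 0: NOW never hits the cache

cache.get(parse_fq(t0, "DAY"))
cache.get(parse_fq(t0 + 5, "DAY"))  # same day floor -> same key -> hit
print(cache.hits)
```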
RE: Filter Query, Filter Cache and Hit Ratio
Ooops, I meant NOW/DAY

-----Original Message-----
From: cbenn...@job.com [mailto:cbenn...@job.com]
Sent: Friday, January 28, 2011 3:37 PM
To: solr-user@lucene.apache.org
Subject: RE: Filter Query, Filter Cache and Hit Ratio

Hi,

You've used NOW in the range query, which will give a date/time accurate to the millisecond; try using NOW\DAY

Colin.
RE: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?
Where do you get your Lucene/Solr downloads from?

[x] ASF Mirrors (linked in our release announcements or via the Lucene website)
[ ] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
[x] I/we build them from source via an SVN/Git checkout.
[ ] Other (someone in your company mirrors them internally or via a downstream project)
RE: Query question
Another option is to override the default operator in the query:

{!lucene q.op=OR}city:Chicago^10 +Romantic +View

Colin.

-----Original Message-----
From: Mike Sokolov [mailto:soko...@ifactory.com]
Sent: Wednesday, November 03, 2010 9:42 AM
To: solr-user@lucene.apache.org
Cc: kenf_nc
Subject: Re: Query question

Another alternative (prettier to my eye) would be:

(city:Chicago AND Romantic AND View)^10 OR (Romantic AND View)

-Mike

On 11/03/2010 09:28 AM, kenf_nc wrote:

Unfortunately the default operator is set to AND and I can't change that at this time. If I do (city:Chicago^10 OR Romantic OR View) it returns way too many unwanted results. If I do (city:Chicago^10 OR (Romantic AND View)) it returns fewer unwanted results, but still a lot. iorixxx's solution of (Romantic AND View AND (city:Chicago^10 OR (*:* -city:Chicago))) does seem to work: Chicago results are at the top, and the remaining results seem to fit the other search parameters. It's an ugly query, but it does seem to do the trick for now until I master dismax. Thanks all!
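A note for anyone wiring Colin's local-params version into a request by hand: with q.op=OR the boosted city clause becomes optional, while the explicit + prefixes keep Romantic and View required, and the whole q value (braces, bang, caret, plus signs) needs URL encoding. A small Python sketch of building the parameter string (the handler and extra parameters are illustrative, not from the thread):

```python
from urllib.parse import urlencode

# The local-params prefix {!lucene q.op=OR} overrides the default operator
# for this one query only; '+' still marks Romantic and View as required.
q = "{!lucene q.op=OR}city:Chicago^10 +Romantic +View"
params = urlencode({"q": q, "rows": 10})
print(params)  # safe to append after /select?
```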
RE: Tomcat startup script
The following should work on centos/redhat; don't forget to edit the paths, user, and Java options for your environment. You can use chkconfig to add it to your startup. Note: this script assumes that the Solr webapp is configured using JNDI in a Tomcat context fragment. If not, you will need to add something like -Dsolr.solr.home=/solr/home to the JAVA_OPTS line.

Colin.

#!/bin/sh
# chkconfig: 345 99 1
# description: Tomcat6 service
# processname: java

. /etc/init.d/functions

my_log_message() {
    ACTION=$1
    shift
    case $ACTION in
        success) echo -n "$*"; success "$*"; echo ;;
        failure) echo -n "$*"; failure "$*"; echo ;;
        warning) echo -n "$*"; warning "$*"; echo ;;
        *) ;;
    esac
}

log_success_msg() { my_log_message success "$*"; }
log_failure_msg() { my_log_message failure "$*"; }
log_warning_msg() { my_log_message warning "$*"; }

export JAVA_HOME=/usr/java/default
export TOMCAT_USER=solr
export CATALINA_HOME=/opt/solr/production/tomcat6
export CATALINA_PID=$CATALINA_HOME/bin/tomcat6.pid
JAVA_OPTS="-server -Xms6G -Xmx6G -XX:+UseConcMarkSweepGC"
export JAVA_OPTS

[ -d "$CATALINA_HOME" ] || { echo "Tomcat requires $CATALINA_HOME."; exit 1; }

case "$1" in
    start|stop|run)
        if su $TOMCAT_USER bash -c "$CATALINA_HOME/bin/catalina.sh $1"; then
            log_success_msg "Tomcat $1 successful"
            [ "$1" == "stop" ] && rm -f "$CATALINA_PID"
        else
            log_failure_msg "Error in Tomcat $1: $?"
        fi
        ;;
    restart)
        $0 stop
        $0 start
        ;;
    status)
        if [ -f "$CATALINA_PID" ]; then
            read kpid < "$CATALINA_PID"
            if ps --pid "$kpid" 2>&1 >/dev/null; then
                echo "$0 is already running at ${kpid}"
            else
                echo "$CATALINA_PID found, but $kpid is not running"
            fi
            unset kpid
        else
            echo "$0 is stopped"
        fi
        ;;
esac
exit 0

-----Original Message-----
From: Sixten Otto [mailto:six...@sfko.com]
Sent: Tuesday, June 08, 2010 3:49 PM
To: solr-user@lucene.apache.org
Subject: Re: Tomcat startup script

On Tue, Jun 8, 2010 at 11:00 AM, K Wong <wongo...@gmail.com> wrote:
Okay. I've been running multicore Solr 1.4 on Tomcat 5.5/OpenJDK 6 straight out of the centos repo and I've not had any issues. We're not doing anything wild and crazy with it though.

It's nice to know that the wiki's advice might be out of date. That doesn't really help me with my immediate problem (lacking the script the wiki is trying to provide), though, unless I want to rip out what I've got and start over. :-/

Sixten
RE: DIH, Full-Import, DB and Performance.
The settings and defaults will depend on which version of SQL Server you are using and which version of the JDBC driver. The default for responseBuffering was changed to adaptive after version 1.2, so unless you are using 1.2 or earlier you don't need to set it to adaptive. Also, if I remember correctly, the batchSize will only take effect if you are using cursors; the default is for all data to be sent to the client (selectMethod is direct).

Using the default settings for the MS sqljdbc driver caused locking issues in our database. As soon as the full import started, shared locks would be set on all rows and wouldn't be removed until all the data had been sent, which for us would be around 30 minutes. During that time no updates could get an exclusive lock, which of course led to huge problems. Setting selectMethod=cursor solved the problem for us, although it does slow down the full import.

Another option that worked for us was to not set the selectMethod and to set readOnly=true, but be sure you understand the implications. This causes all data to be sent to the client (which is the default), giving maximum performance, and causes no locks to be set, which resolves the other issues. However, it sets the transaction isolation to TRANSACTION_READ_UNCOMMITTED, which causes the select statement to ignore any locks when getting data, so the consistency of the data cannot be guaranteed; this may or may not be an issue depending on your particular situation.

Colin.

-----Original Message-----
From: stockii [mailto:st...@shopgate.com]
Sent: Tuesday, June 01, 2010 7:44 AM
To: solr-user@lucene.apache.org
Subject: Re: DIH, Full-Import, DB and Performance.

do you think that the option responseBuffering=adaptive should solve my problem? From the DIH FAQ: "I'm using DataImportHandler with an MS SQL Server database with the sqljdbc driver. DataImportHandler is going out of memory. I tried adjusting the batchSize values but they don't seem to make any difference. How do I fix this? There's a connection property called responseBuffering in the sqljdbc driver whose default value is 'full', which causes the entire result set to be fetched. See http://msdn.microsoft.com/en-us/library/ms378988.aspx for more details. You can set this property to 'adaptive' to keep the driver from getting everything into memory. Connection properties like this can be set as an attribute (responseBuffering=adaptive) in the dataSource configuration OR directly in the jdbc url specified in DataImportHandler's dataSource configuration."

--
View this message in context: http://lucene.472066.n3.nabble.com/DIH-Full-Import-DB-and-Performance-tp861068p861134.html
Sent from the Solr - User mailing list archive at Nabble.com.
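Putting the FAQ's advice together with the cursor discussion above, a dataSource definition might look like the fragment below. This is only a sketch: the hostname, database name, credentials, and batchSize value are placeholders, and the connection properties shown are the ones discussed in this thread (set on the JDBC URL; they could equally be set as attributes on the element):

```
<dataSource type="JdbcDataSource"
            driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
            url="jdbc:sqlserver://dbhost;databaseName=mydb;responseBuffering=adaptive;selectMethod=cursor"
            user="solr"
            password="..."
            batchSize="500"/>
```

Remember Colin's caveat: cursor mode avoids the shared-lock problem but slows the full import, so tune batchSize against your server load.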
RE: DIH, Full-Import, DB and Performance.
Performance is dependent on your server/data and the batchSize. To reduce the server load, experiment with different batchSize settings. The higher the batch size, the faster the import and the higher your SQL Server load will be. Try starting with a small batch and then gradually increasing it.

Colin.

-----Original Message-----
From: stockii [mailto:st...@shopgate.com]
Sent: Tuesday, June 01, 2010 12:31 PM
To: solr-user@lucene.apache.org
Subject: RE: DIH, Full-Import, DB and Performance.

thx for the reply =) i tried out selectMethod=cursor but the load of the server gets bigger and bigger during an import =( selectMethod=cursor only solves the problem with the locking, right?

--
View this message in context: http://lucene.472066.n3.nabble.com/DIH-Full-Import-DB-and-Performance-tp861068p862043.html
Sent from the Solr - User mailing list archive at Nabble.com.
RE: Commit takes 1 to 2 minutes, CPU usage affects other apps
Hi,

This could also be caused by performing an optimize after the commit, or by auto-warming the caches, or a combination of both. If you are using the Data Import Handler, the default for a delta import is commit and optimize, which caused us a similar problem, except we were optimizing a 7 million document, 23GB index with every delta import, which was taking over 10 minutes. As soon as we added optimize=false to the command, updates took a few seconds. You can always add separate calls to perform the optimize when it's convenient for you.

To see if the problem is auto-warming, take a look at the warm-up time for the searcher. If this is the cause, you will need to consider lowering the autowarmCount for your caches.

Colin.

-----Original Message-----
From: Markus Fischer [mailto:mar...@fischer.name]
Sent: Tuesday, May 04, 2010 6:22 AM
To: solr-user@lucene.apache.org
Subject: Re: Commit takes 1 to 2 minutes, CPU usage affects other apps

On 04.05.2010 11:01, Peter Sturge wrote:
It might be worth checking the VMWare environment - if you're using the VMWare scsi vmdk and it's shared across multiple VMs and there's a lot of disk contention (i.e. multiple VMs are all busy reading/writing to/from the same disk channel), this can really slow down I/O operations.

Ok, thanks, I'll try to get the information from my hoster. I noticed that the committing seems to be constant in time: it doesn't matter whether I'm updating only one document or 50 (usually it won't be more). Maybe these numbers are too low to cause any real impact...

- Markus
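For reference, optimize=false is just an extra parameter on the DIH command URL. A quick Python sketch of building that URL (the host, port, and handler path are illustrative; adjust them for your deployment):

```python
from urllib.parse import urlencode, urlunsplit

# Build a delta-import command that skips the expensive optimize step.
# Host and handler path are hypothetical examples, not from the thread.
params = urlencode({"command": "delta-import", "optimize": "false"})
url = urlunsplit(("http", "localhost:8983", "/solr/dataimport", params, ""))
print(url)
```

A separate request with command=full-import (or a standalone optimize call) can then be scheduled for off-peak hours when the optimize cost is acceptable.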
RE: Problem with DIH delta-import on JDBC
Hi,

It looks like the deltaImportQuery needs to be changed: you are using dataimporter.delta.id, which is not correct. You are selecting objectid in the deltaQuery, so the deltaImportQuery should be using dataimporter.delta.objectid. So try this:

<entity name="test" pk="objectid"
        query="select * from table"
        deltaImportQuery="select * from table where objectid='${dataimporter.delta.objectid}'"
        deltaQuery="select objectid from table where lastupdate > '${dataimporter.last_index_time}'"/>

Colin.

-----Original Message-----
From: safl [mailto:s...@salamin.net]
Sent: Wednesday, April 28, 2010 3:05 PM
To: solr-user@lucene.apache.org
Subject: Problem with DIH delta-import on JDBC

Hello,

I'm new on the list. I searched the list a lot, but I didn't find an answer to my question. I'm using Solr 1.4 on Windows with an Oracle 10g database. I am able to do a full-import without any problem, but I'm not able to get delta-import working. I have the following in the data-config.xml:

...
<entity name="test" pk="objectid"
        query="select * from table"
        deltaImportQuery="select * from table where objectid='${dataimporter.delta.id}'"
        deltaQuery="select objectid from table where lastupdate > '${dataimporter.last_index_time}'"/>
...

I update some records in the table and then try to run a delta-import. I track the SQL queries on the DB with P6Spy, and I always see a query like:

select * from table where objectid=''

Of course, with such an SQL query, nothing is updated in my index. It behaves the same if I replace ${dataimporter.delta.id} with ${dataimporter.delta.objectid}. Can someone tell me what is wrong with it?

Thanks a lot,
Florian
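To see why the literal objectid='' shows up in P6Spy: DIH substitutes ${...} placeholders from the delta row it fetched, and a variable that was never populated resolves to an empty string rather than raising an error. A toy Python sketch of that substitution behaviour (the regex-based substitution is our illustration, not DIH's actual code; the row values are made up):

```python
import re

def substitute(template, variables):
    # Replace each ${name} with its value, or '' when the name is unknown --
    # mirroring how an unpopulated DIH variable silently becomes empty.
    return re.sub(r"\$\{([^}]+)\}",
                  lambda m: str(variables.get(m.group(1), "")), template)

# The delta row only carries the pk column selected by deltaQuery: objectid.
delta_row = {"dataimporter.delta.objectid": 42}

good = "select * from table where objectid='${dataimporter.delta.objectid}'"
bad = "select * from table where objectid='${dataimporter.delta.id}'"

print(substitute(good, delta_row))  # the objectid is filled in
print(substitute(bad, delta_row))   # the empty-string query from the post
```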
RE: Solr and Garbage Collection
Hi,

Have you looked at tuning the garbage collection? Take a look at the following articles:

http://www.lucidimagination.com/blog/2009/09/19/java-garbage-collection-boot-camp-draft/
http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html

Changing to the concurrent or throughput collector should help with the long pauses.

Colin.

-----Original Message-----
From: Jonathan Ariel [mailto:ionat...@gmail.com]
Sent: Friday, September 25, 2009 11:37 AM
To: solr-user@lucene.apache.org; yo...@lucidimagination.com
Subject: Re: Solr and Garbage Collection

Right now I'm giving it 12GB of heap memory. If I give it less (10GB) it throws the following exception:

Sep 5, 2009 7:18:32 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.OutOfMemoryError: Java heap space
    at org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:361)
    at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
    at org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:352)
    at org.apache.solr.request.SimpleFacets.getFieldCacheCounts(SimpleFacets.java:267)
    at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:185)
    at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:207)
    at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:104)
    at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:70)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:169)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
    at org.mortbay.jetty.Server.handle(Server.java:285)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
    at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
    at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

On Fri, Sep 25, 2009 at 10:55 AM, Yonik Seeley <yo...@lucidimagination.com> wrote:

On Fri, Sep 25, 2009 at 9:30 AM, Jonathan Ariel <ionat...@gmail.com> wrote:

Hi to all! Lately my Solr servers seem to stop responding once in a while. I'm using Solr 1.3. Of course I'm having more traffic on the servers, so I logged the garbage collection activity to check if it's because of that. It seems like 11% of the time the application runs, it is stopped because of GC. And sometimes the GC takes up to 10 seconds! Is this normal? My instances run on 16GB RAM, dual quad-core Intel Xeon servers. My index is around 10GB and I'm giving the instances 10GB of RAM.

Bigger heaps lead to bigger GC pauses in general. Do you mean that you are giving the JVM a 10GB heap? Were you getting OOM exceptions with a smaller heap?

-Yonik
http://www.lucidimagination.com
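For the record, the collector switch Colin suggests is a JVM flag change, not a Solr change. A hypothetical JAVA_OPTS line for a 10GB heap using the concurrent (CMS) collector of that JVM era, with GC logging enabled so pause times can be measured (the exact sizes and flags are placeholders that need tuning per the guides linked above):

```
JAVA_OPTS="-server -Xms10g -Xmx10g \
  -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log"
```

CMS trades some throughput for much shorter stop-the-world pauses, which is usually the right trade for a latency-sensitive search server; the GC log will show whether the 10-second pauses go away.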