parsing many documents takes too long
Hi, My Solr query returns about 982 documents, and I use JAXB to parse them into Java objects. That takes about 469 ms, which is over my 150-200 ms threshold. Is there a way around this? Could I store the serialized Java objects in the index, return them in the Solr response, and deserialize them back into Java objects? Would that take less time? Any other ideas? Thanks, Tri
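One thing worth checking before changing the index, since it is a common JAXB pitfall: JAXBContext.newInstance() is expensive, and rebuilding it per request (or per document) can dominate parse time. A minimal sketch of caching it, where Doc is a hypothetical stand-in for the real JAXB-annotated response class:

    import java.io.StringReader;
    import javax.xml.bind.JAXBContext;
    import javax.xml.bind.JAXBException;
    import javax.xml.bind.Unmarshaller;
    import javax.xml.bind.annotation.XmlRootElement;

    public class DocParser {
        @XmlRootElement
        static class Doc {} // hypothetical stand-in for the real JAXB-annotated class

        // JAXBContext is thread-safe and costly to build, so create it once and reuse it.
        private static final JAXBContext CONTEXT;
        static {
            try {
                CONTEXT = JAXBContext.newInstance(Doc.class);
            } catch (JAXBException e) {
                throw new ExceptionInInitializerError(e);
            }
        }

        // Unmarshaller is cheap but not thread-safe; create one per call (or per thread).
        public static Doc parse(String xml) throws JAXBException {
            Unmarshaller u = CONTEXT.createUnmarshaller();
            return (Doc) u.unmarshal(new StringReader(xml));
        }
    }

If the context is already cached and 982 documents still take ~470 ms, storing a pre-serialized form in a stored field and deserializing that instead of re-walking XML is a reasonable next experiment.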
sorting distance in solr 1.4.1
Hi, We are using Solr 1.4.1 and we need to sort our results by distance. We have a lat/lon for each document in the response, plus our reference point. Is this possible? I read about the spatial plugin, but it only does range searching: http://blog.jayway.com/2010/10/27/geo-search-with-spatial-solr-plugin/ It doesn't cover sorting the results by distance (as supported in Solr 3.1). Tri
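Since the lat/lons and the reference point are all available client-side, one 1.4.1-era workaround (a sketch, not a Solr feature) is to sort the returned results by a haversine distance computed in the client; Result here is a hypothetical holder for a document's coordinates:

    import java.util.Collections;
    import java.util.Comparator;
    import java.util.List;

    public class DistanceSort {
        static final double EARTH_RADIUS_KM = 6371.0;

        static class Result { double lat, lon; } // hypothetical: filled from each Solr doc

        // Great-circle distance between two points given in decimal degrees.
        static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
            double dLat = Math.toRadians(lat2 - lat1);
            double dLon = Math.toRadians(lon2 - lon1);
            double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                     + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                       * Math.sin(dLon / 2) * Math.sin(dLon / 2);
            return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
        }

        // Sort results in place by distance from the reference point.
        static void sortByDistance(List<Result> results, final double refLat, final double refLon) {
            Collections.sort(results, new Comparator<Result>() {
                public int compare(Result a, Result b) {
                    return Double.compare(haversineKm(refLat, refLon, a.lat, a.lon),
                                          haversineKm(refLat, refLon, b.lat, b.lon));
                }
            });
        }
    }

This only orders the documents you got back, of course, so it works best when the candidate set is already narrowed by the range filter.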
class not found
Hi, I wrote my own parser plugin and I'm getting a NoClassDefFoundError. Any ideas why?

Apr 7, 2011 1:12:43 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.NoClassDefFoundError: Could not initialize class org.apache.solr.search.QParserPlugin
    at org.apache.solr.core.SolrCore.initQParsers(SolrCore.java:1444)
    at org.apache.solr.core.SolrCore.init(SolrCore.java:548)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:428)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:278)
    at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:117)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
    at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275)
    at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
    at org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:108)
    at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3800)
    at org.apache.catalina.core.StandardContext.start(StandardContext.java:4450)
    at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791)
    at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:771)
    at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:526)
    at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:850)
    at org.apache.catalina.startup.HostConfig.deployWARs(HostConfig.java:724)
    at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:493)
    at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1206)
    at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:314)
    at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
    at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053)
    at org.apache.catalina.core.StandardHost.start(StandardHost.java:722)
    at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045)
    at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443)
    at org.apache.catalina.core.StandardService.start(StandardService.java:516)
    at org.apache.catalina.core.StandardServer.start(StandardServer.java:710)
    at org.apache.catalina.startup.Catalina.start(Catalina.java:583)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:288)
    at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:413)

Tri
Re: class not found
Yes.

From: Ahmet Arslan iori...@yahoo.com
To: solr-user@lucene.apache.org
Sent: Thu, April 7, 2011 3:23:56 PM
Subject: Re: class not found

> I wrote my own parser plugin. I'm getting a NoClassDefFoundError. Any ideas why?

Did you put the jar file that contains your custom code into the /lib directory? http://wiki.apache.org/solr/SolrPlugins
Re: class not found
The jar containing the class is here for my setup: /usr/local/apache-tomcat-6.0.20/webapps/solr/WEB-INF/lib

Tri

From: Erick Erickson erickerick...@gmail.com
To: solr-user@lucene.apache.org
Sent: Thu, April 7, 2011 3:24:14 PM
Subject: Re: class not found

Can you give us some more details? I suspect the jar file containing your plugin isn't in the Solr lib directory, and/or you don't have a lib directive in your solrconfig.xml file pointing to where your jar is. But that's a guess, since you haven't provided any information about what you did to try to use your plugin, like how you deployed it, how you compiled it, how...

Best
Erick

On Thu, Apr 7, 2011 at 4:43 PM, Tri Nguyen tringuye...@yahoo.com wrote:

Hi, I wrote my own parser plugin. I'm getting a NoClassDefFoundError. Any ideas why?

Apr 7, 2011 1:12:43 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.NoClassDefFoundError: Could not initialize class org.apache.solr.search.QParserPlugin
    at org.apache.solr.core.SolrCore.initQParsers(SolrCore.java:1444)
    at org.apache.solr.core.SolrCore.init(SolrCore.java:548)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:428)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:278)
    at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:117)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
    at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275)
    at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
    at org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:108)
    at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3800)
    at org.apache.catalina.core.StandardContext.start(StandardContext.java:4450)
    at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791)
    at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:771)
    at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:526)
    at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:850)
    at org.apache.catalina.startup.HostConfig.deployWARs(HostConfig.java:724)
    at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:493)
    at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1206)
    at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:314)
    at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
    at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053)
    at org.apache.catalina.core.StandardHost.start(StandardHost.java:722)
    at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045)
    at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443)
    at org.apache.catalina.core.StandardService.start(StandardService.java:516)
    at org.apache.catalina.core.StandardServer.start(StandardServer.java:710)
    at org.apache.catalina.startup.Catalina.start(Catalina.java:583)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:288)
    at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:413)

Tri
Re: adding a TimerTask
It seems one way is to write a servlet whose init() method creates a TimerTask.

From: Tri Nguyen tringuye...@yahoo.com
To: solr user solr-user@lucene.apache.org
Sent: Fri, February 18, 2011 6:02:44 PM
Subject: adding a TimerTask

Hi, How can I add a TimerTask to Solr? Tri
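A minimal sketch of that approach, assuming the servlet is declared in web.xml alongside Solr with load-on-startup; the hourly interval and the body of run() are placeholders:

    import java.util.Timer;
    import java.util.TimerTask;
    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServlet;

    public class IndexTimerServlet extends HttpServlet {
        private Timer timer;

        @Override
        public void init() throws ServletException {
            timer = new Timer(true); // daemon thread, so it won't block container shutdown
            timer.schedule(new TimerTask() {
                @Override
                public void run() {
                    // placeholder: e.g. request /dataimport?command=full-import here
                }
            }, 0L, 60L * 60L * 1000L); // run immediately, then hourly
        }

        @Override
        public void destroy() {
            timer.cancel(); // stop the background thread when the webapp is undeployed
        }
    }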
Re: slave out of sync
There is an HTTP API where I can look at the latest replication and check for an ERROR keyword. If it is present, the latest replication failed.

From: Otis Gospodnetic otis_gospodne...@yahoo.com
To: solr-user@lucene.apache.org
Sent: Wed, February 16, 2011 11:31:26 AM
Subject: Re: slave out of sync

Hi Tri,

You could look at the stats page for each slave and compare the number of docs in them. The one(s) that differ from the rest/majority are out of sync.

Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

- Original Message
From: Tri Nguyen tringuye...@yahoo.com
To: solr-user@lucene.apache.org
Sent: Mon, February 14, 2011 7:19:58 PM
Subject: slave out of sync

Hi, We're thinking of having a master-slave configuration where there are multiple slaves. Let's say during replication, one of the slaves does not replicate properly. How will we detect that one slave is out of sync? Tri
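A rough sketch of that check, assuming the ReplicationHandler is mounted at /replication and that scanning the details response for "ERROR" is an acceptable heuristic; the slave URL is a placeholder:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;

    public class ReplicationCheck {
        // e.g. slaveBaseUrl = "http://slave1:8983/solr" (hypothetical host)
        public static boolean lastReplicationFailed(String slaveBaseUrl) throws Exception {
            URL url = new URL(slaveBaseUrl + "/replication?command=details");
            BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"));
            try {
                StringBuilder body = new StringBuilder();
                String line;
                while ((line = in.readLine()) != null) {
                    body.append(line).append('\n');
                }
                // The details response includes the status of the most recent
                // replication attempt; a crude string scan is enough for monitoring.
                return body.indexOf("ERROR") >= 0;
            } finally {
                in.close();
            }
        }
    }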
adding a TimerTask
Hi, How can I add a TimerTask to Solr? Tri
Re: rollback to other versions of index
Hi, I wanted to explain my situation in more detail. I have a master which never adds or deletes documents incrementally; I just run the dataimport with autocommit. It seems I'll need to make a custom DeletionPolicy to keep more than one index around. I'm accessing indices from Solr, so how do I tell Solr to use a particular index? Thanks, Tri

From: Michael McCandless luc...@mikemccandless.com
To: solr-user@lucene.apache.org
Sent: Tue, February 15, 2011 5:36:49 AM
Subject: Re: rollback to other versions of index

Lucene is able to do this, if you make a custom DeletionPolicy (which controls when commit points are deleted). By default Lucene only saves the most recent commit (KeepOnlyLastCommitDeletionPolicy), but if your policy keeps more around, then you can open an IndexReader or IndexWriter on any IndexCommit. Any changes (including optimize, and even opening a new IW with create=true) are safe within a commit; Lucene is fully transactional.

For example, I use this for benchmarking: I save 4 commit points in a single index. First is a multi-segment index, second is the same index with 5% deletions, third is an optimized index, and fourth is the optimized index with 5% deletions. This gives me a single index with 4 different commit points, so I can then benchmark searching against any of those 4.

Mike

On Tue, Feb 15, 2011 at 4:43 AM, Jan Høydahl jan@cominvent.com wrote:

Yes and no. The index grows like an onion, adding new segments for each commit. There is no API to remove the newly added segments, but I guess you could hack something. The other problem is that as soon as you trigger an optimize(), all history is gone, as the segments are merged into one. Optimize normally happens automatically behind the scenes. You could turn off merging, but that will badly hurt your performance after some time and ultimately crash your OS. Since you only need a few versions back, you COULD write your own custom MergePolicy, always preserving at least N versions. But beware that a version may be ONE document or many documents, depending on how you commit and whether autoCommit is active, so if you go this route you also need strict control over your commits. Perhaps the best option is to handle this on the feeding client side, where you keep a buffer of the N last docs. Then you can freely roll back or re-index as you choose, based on time, number of docs, etc.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 15. feb. 2011, at 01.21, Tri Nguyen wrote:

Hi, Does Solr version each index build? We'd like to be able to roll back to not just the previous version but maybe a few versions before the current one. Thanks, Tri
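A bare-bones sketch of such a policy, against the Lucene 2.9/3.x-era API that Solr 1.4 builds on; keeping the newest N commits is the only logic, and N is arbitrary here:

    import java.util.List;
    import org.apache.lucene.index.IndexCommit;
    import org.apache.lucene.index.IndexDeletionPolicy;

    // Keeps the N most recent commit points instead of only the last one.
    public class KeepLastNDeletionPolicy implements IndexDeletionPolicy {
        private final int numToKeep;

        public KeepLastNDeletionPolicy(int numToKeep) {
            this.numToKeep = numToKeep;
        }

        public void onInit(List<? extends IndexCommit> commits) {
            deleteOldCommits(commits);
        }

        public void onCommit(List<? extends IndexCommit> commits) {
            deleteOldCommits(commits);
        }

        // Commits arrive oldest-first; drop everything except the newest N.
        private void deleteOldCommits(List<? extends IndexCommit> commits) {
            for (int i = 0; i < commits.size() - numToKeep; i++) {
                commits.get(i).delete();
            }
        }
    }

Note that Solr's stock SolrDeletionPolicy (the deletionPolicy element in solrconfig.xml) exposes a maxCommitsToKeep setting, so it is worth checking whether that already covers the case before writing a custom policy.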
slave out of sync
Hi, We're thinking of having a master-slave configuration where there are multiple slaves. Let's say during replication, one of the slaves does not replicate properly. How will we detect that one slave is out of sync? Tri
rollback to other versions of index
Hi, Does Solr version each index build? We'd like to be able to roll back to not just the previous version but maybe a few versions before the current one. Thanks, Tri
running optimize on master
Hi, I've read that running optimize is similar to running defrag on a hard disk: deleted docs are removed and segments are reorganized for faster searching. I have a couple of questions. Is optimize necessary if I never delete documents? I build the index every hour, and we don't delete between builds. Secondly, what kind of segment reorganization is done to make searches faster? Thanks, Tri
Re: running optimize on master
Does optimize merge all segments into one segment on the master after the build? Or is there only one segment after the build anyway?

thanks, Tri

From: Erick Erickson erickerick...@gmail.com
To: solr-user@lucene.apache.org
Sent: Thu, February 10, 2011 5:08:44 PM
Subject: Re: running optimize on master

Optimizing isn't necessary in your scenario, as you don't delete documents and you rebuild the whole thing each time anyway. As for faster searches, this has largely been made obsolete by recent changes in how indexes are built in the first place. Especially as you can build your index in an hour, it's likely not big enough to benefit from optimizing even under the old scenario.

So, unless you have some evidence that your queries are performing poorly, I would just leave the optimize step off.

Best
Erick

On Thu, Feb 10, 2011 at 7:09 PM, Tri Nguyen tringuye...@yahoo.com wrote:

Hi, I've read that running optimize is similar to running defrag on a hard disk: deleted docs are removed and segments are reorganized for faster searching. I have a couple of questions. Is optimize necessary if I never delete documents? I build the index every hour, and we don't delete between builds. Secondly, what kind of segment reorganization is done to make searches faster? Thanks, Tri
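On the question itself: a build typically flushes multiple segments rather than one, so only an explicit optimize leaves a single segment. For reference, if you do decide to optimize after a build, a minimal SolrJ sketch (1.4-era API; the URL is a placeholder):

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    public class OptimizeAfterBuild {
        public static void main(String[] args) throws Exception {
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8080/solr"); // placeholder
            // Merges the index down to a single segment; I/O-heavy on large indexes,
            // so schedule it away from query peaks if you run it at all.
            server.optimize();
        }
    }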
solr current working directory or reading config files
Hi, I have a class (in a jar) that reads from properties (text) files. I have these files in the same jar file as the class. However, when my class reads those properties files, the files cannot be found, since Solr resolves relative paths against Tomcat's bin directory. I don't really want to put the config files in Tomcat's bin directory. How do I reconcile this? Tri
pre and post processing when building index
Hi, I'm scheduling Solr to build every hour or so, and I'd like to do some pre- and post-processing for each index build. The preprocessing would do some checks and perhaps skip the build. For post-processing, I would do some checks and either commit or roll back the build. Can I write a class and plug it into Solr for this? Thanks, Tri
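If the builds run through the DataImportHandler, one hook worth looking at (sketched below; check that your DIH version supports it) is its EventListener interface, wired up through the onImportStart/onImportEnd attributes of the document element in data-config.xml:

    import org.apache.solr.handler.dataimport.Context;
    import org.apache.solr.handler.dataimport.EventListener;

    // Hypothetical listener class, referenced from data-config.xml as
    // <document onImportStart="com.example.BuildChecks" onImportEnd="com.example.BuildChecks">
    public class BuildChecks implements EventListener {
        public void onEvent(Context ctx) {
            // run pre- or post-build checks here; the Context gives access
            // to the running import's state and variables
        }
    }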
communication between entity processor and solr DataImporter
Hi, I'd like to communicate errors between my entity processor and the DataImporter. Should there be an error in my entity processor, I'd like the index build to roll back. How can I do this? I want to throw an exception of some sort. The only thing I can think of is to throw a runtime exception in nextRow() of the entity processor, since runtime exceptions are unchecked and don't have to be declared in the nextRow() method signature. How can I request that the nextRow() signature be updated to throw Exception? Would that even make sense? Tri
Re: solr current workding directory or reading config files
I wanted to add some more details to my problem. I have many jars, each with its own config files, so I'd have to copy files for every jar. Can Solr read from the classpath (jar files)? Yes, my war is always deployed to the same location under webapps. I already have solr/home defined in web.xml. I'll try copying my files into there, but I would have to extract every jar file and do this manually.

From: Wilkes, Chris cwil...@gmail.com
To: solr-user@lucene.apache.org
Sent: Wed, February 9, 2011 3:44:03 PM
Subject: Re: solr current working directory or reading config files

Is your war always deployed to the same location, i.e. /usr/mycomp/myapplication/webapps/myapp.war? If so, then on startup copy the files out of your directory and put them under CATALINA_BASE/solr (/usr/mycomp/myapplication/solr), and in your war file have the META-INF/context.xml JNDI setting point to that:

<Context>
  <Environment name="solr/home" type="java.lang.String" value="/usr/mycomp/myapplication/solr" override="true" />
</Context>

If you know of a way to reference CATALINA_BASE in the context.xml, that would make it easier.

On Feb 9, 2011, at 12:00 PM, Tri Nguyen wrote:

Hi, I have a class (in a jar) that reads from properties (text) files. I have these files in the same jar file as the class. However, when my class reads those properties files, the files cannot be found, since Solr resolves relative paths against Tomcat's bin directory. I don't really want to put the config files in Tomcat's bin directory. How do I reconcile this? Tri
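On the classpath question: yes, in the sense that a class can read its own resources through the classloader instead of the filesystem, which sidesteps the working-directory problem and requires no extraction of jars. A minimal sketch (the resource name is hypothetical):

    import java.io.IOException;
    import java.io.InputStream;
    import java.util.Properties;

    public class JarConfigLoader {
        // Resolves against the jar's classpath, not the process working directory.
        public static Properties load(String resource) throws IOException {
            InputStream in = JarConfigLoader.class.getResourceAsStream(resource);
            if (in == null) {
                throw new IOException("Resource not found on classpath: " + resource);
            }
            try {
                Properties props = new Properties();
                props.load(in);
                return props;
            } finally {
                in.close();
            }
        }
    }

Usage would be something like Properties p = JarConfigLoader.load("/myconfig.properties"); with the properties file packaged at the root of the same jar.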
Re: communication between entity processor and solr DataImporter
I can throw a DataImportHandlerException (a runtime exception) from my entity processor, which forces a rollback. Tri

From: Tri Nguyen tringuye...@yahoo.com
To: solr-user@lucene.apache.org
Sent: Wed, February 9, 2011 3:50:05 PM
Subject: communication between entity processor and solr DataImporter

Hi, I'd like to communicate errors between my entity processor and the DataImporter. Should there be an error in my entity processor, I'd like the index build to roll back. How can I do this? I want to throw an exception of some sort. The only thing I can think of is to throw a runtime exception in nextRow() of the entity processor, since runtime exceptions are unchecked and don't have to be declared in the nextRow() method signature. How can I request that the nextRow() signature be updated to throw Exception? Would that even make sense? Tri
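A sketch of that pattern, assuming a custom EntityProcessorBase subclass; the SEVERE error code aborts the import, and fetchNextRow() is a hypothetical stand-in for the real data access:

    import java.util.Map;
    import org.apache.solr.handler.dataimport.DataImportHandlerException;
    import org.apache.solr.handler.dataimport.EntityProcessorBase;

    public class MyEntityProcessor extends EntityProcessorBase {
        @Override
        public Map<String, Object> nextRow() {
            try {
                return fetchNextRow(); // hypothetical: pull the next record from the source
            } catch (Exception e) {
                // SEVERE aborts the whole import, which triggers the rollback.
                throw new DataImportHandlerException(DataImportHandlerException.SEVERE,
                        "Failed to fetch row", e);
            }
        }

        private Map<String, Object> fetchNextRow() throws Exception {
            return null; // placeholder: returning null tells DIH the entity is exhausted
        }
    }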
response when using my own QParserPlugin
Hi, I wrote a QParserPlugin. When I hit Solr using this QParserPlugin, the response does not have the column (field) names associated with the data, e.g.:

0 29 0 {!tnav} faketn1 CA city san francisco US 10 - - 495,496,497 500,657,498,499 us:ca:san francisco faketn,fakeregression 037.74 -122.49 faketn1 faketn1 faketn1 faketn1 faketn1 99902837 +3774-12250|+3774-12250@1|+3772-12252@2 94116:us 495,496,497 fakecs,fakeatti,fakevenable 500,657,498,499 San Francisco 667 US 37.742369 -122.491240 <bold>Main Dishes</bold> <bold>Pancakes</bold> faketn1 2.99 Enjoy best chinese food. faketn1 1;0:0:0:0:8:20% off.0:0:0:3:0.0 4158281775 94116 ACTION_MODEL TN CA 2350 Taraval St Enjoy best chinese food 40233 - 5;10:ACTION_MAP0:3:0.315:ACTION_DRIVE_TO0:3:0.517:ACTION_IMPRESSION0:6:0.005014:ACTION_PROFILE0:3:0.111:ACTION_CALL0:3:0.3 2027 -

How do I get the data associated with the index columns, so I can parse it and know the context of each value (this one is the business name, this one is the address, etc.)?

I was hoping it would return something like this, or some similar structure:

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
    <lst name="params">
      <str name="indent">on</str>
      <str name="start">0</str>
      <str name="q">I_NAME_EXACT:faketn1</str>
      <str name="rows">10</str>
      <str name="version">2.2</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <arr name="I_BASE_ID">
        <str>-</str>
        <str>-</str>
      </arr>
      <str name="I_BLOCK_CATEGORY_ID">495,496,497</str>
      <str name="I_CATEGORY_ID">500,657,498,499</str>
      <str name="I_CITY_DISTRICT">us:ca:san francisco</str>
      <str name="I_KEYWORD">faketn,fakeregression</str>
      <str name="I_LAT_RANGE">037.74</str>
      <str name="I_LON_RANGE">-122.49</str>
      <str name="I_NAME_AS_KEYWORD">faketn1</str>
      <str name="I_NAME_ENUM">faketn1</str>
      <str name="I_NAME_EXACT">faketn1</str>
      <str name="I_NAME_NGRAM">faketn1</str>
      <str name="I_NAME_PACK">faketn1</str>
      <str name="I_POI_ID">99902837</str>
      <str name="I_SPATIAL_BLOCK">+3774-12250|+3774-12250@1|+3772-12252@2</str>
      <str name="I_ZIP_DISTRICT">94116:us</str>
      <str name="S_BLOCK_CATEGORY_ID">495,496,497</str>
      <str name="S_BLOCK_KEYWORDS">fakecs,fakeatti,fakevenable</str>
      <str name="S_CATEGORY_ID">500,657,498,499</str>
      <str name="S_CITY">San Francisco</str>
      <str name="S_COMPAIGN_ID">667</str>
      <str name="S_COUNTRY">US</str>
      <str name="S_FAX"/>
      <str name="S_LATITUDE">37.742369</str>
      <str name="S_LONGTITUDE">-122.491240</str>
      <str name="S_MENU"><bold>Main Dishes</bold> <bold>Pancakes</bold> faketn1 2.99</str>
      <str name="S_MERCHANT_CONTENT">Enjoy best chinese food.</str>
      <str name="S_NAME">faketn1</str>
      <str name="S_OFFERS">1;0:0:0:0:8:20% off.0:0:0:3:0.0</str>
      <str name="S_PHONE_NUMBER">4158281775</str>
      <str name="S_POSTALCODE">94116</str>
      <str name="S_PRICEMODE">ACTION_MODEL</str>
      <str name="S_SOURCE_NAME">TN</str>
      <str name="S_SPONSOREDTEXT"/>
      <str name="S_STATE">CA</str>
      <str name="S_STREET">2350 Taraval St</str>
      <str name="S_STREET2"/>
      <str name="S_SUIT"/>
      <str name="S_TAGLINE">Enjoy best chinese food</str>
      <str name="S_TARGET_DISTANCE_IN_METER">40233</str>
      <str name="S_TA_ID">-</str>
      <str name="S_USER_ACTIONS">5;10:ACTION_MAP0:3:0.315:ACTION_DRIVE_TO0:3:0.517:ACTION_IMPRESSION0:6:0.005014:ACTION_PROFILE0:3:0.111:ACTION_CALL0:3:0.3</str>
      <str name="S_VENDOR_ID">2027</str>
      <str name="S_WEBURL"/>
      <str name="S_YPC_ID">-</str>
    </doc>
  </result>
</response>

Tri
performance during index switch
Hi, Are there performance issues during the index switch? And as the index gets bigger, does response time slow down? Are there any studies on this? Thanks, Tri
Re: performance during index switch
Yes, during a commit. I'm planning to do as you suggested, having a master do the indexing and replicating the index to a slave, which leads to my next question: while the slave replicates the index files from the master, how does that impact performance on the slave? Tri

--- On Wed, 1/19/11, Jonathan Rochkind rochk...@jhu.edu wrote:

From: Jonathan Rochkind rochk...@jhu.edu
Subject: Re: performance during index switch
To: solr-user@lucene.apache.org solr-user@lucene.apache.org
Date: Wednesday, January 19, 2011, 11:30 AM

During commit? A commit (and especially an optimize) can be expensive in terms of both CPU and RAM as your index grows larger, leaving less CPU for querying, and possibly less RAM, which can cause Java GC slowdowns in some cases.

A common suggestion is to use Solr replication to separate out a Solr index that you index to, and then replicate to a slave index that actually serves your queries. This should minimize any performance problems on your 'live' Solr while indexing, although there's still something that has to be done for the actual replication, of course. Haven't tried it yet myself. Plan to -- my plan is actually to put them both on the same server (I've only got one), but in separate JVMs, and on a server with enough CPU cores that hopefully the indexing won't steal CPU the querying needs.

On 1/19/2011 2:23 PM, Tri Nguyen wrote:

Hi, Are there performance issues during the index switch? As the index gets bigger, does response time slow down? Are there any studies on this? Thanks, Tri
Re: HTTP Status 400 - org.apache.lucene.queryParser.ParseException
What's the alternative?

--- On Tue, 1/18/11, Erick Erickson erickerick...@gmail.com wrote:

From: Erick Erickson erickerick...@gmail.com
Subject: Re: HTTP Status 400 - org.apache.lucene.queryParser.ParseException
To: solr-user@lucene.apache.org
Date: Tuesday, January 18, 2011, 5:24 AM

Why do you want to do this? toString() has never been guaranteed to be re-parsable, even in Lucene, so it's not surprising that taking a Lucene toString() clause and submitting it to Solr doesn't work.

Best
Erick

On Tue, Jan 18, 2011 at 4:49 AM, kun xiong xiongku...@gmail.com wrote:

Hi all,

I got a ParseException when I query Solr with a Lucene BooleanQuery expression (toString()). I use the default parser, LuceneQParserPlugin, which should support the whole Lucene syntax, right?

Java code:

BooleanQuery bq = new BooleanQuery();
Query q1 = new TermQuery(new Term("I_NAME_ENUM", "KFC"));
Query q2 = new TermQuery(new Term("I_NAME_ENUM", "MCD"));
bq.add(q1, Occur.SHOULD);
bq.add(q2, Occur.SHOULD);
bq.setMinimumNumberShouldMatch(1);
String solrQuery = bq.toString();

The query string is: q=(I_NAME_ENUM:kfc I_NAME_ENUM:best western)~1

Exception:

org.apache.lucene.queryParser.ParseException: Cannot parse '(I_NAME_ENUM:kfc I_NAME_ENUM:best western)~1': Encountered FUZZY_SLOP ~1 at line 1, column 42. Was expecting one of: EOF, AND ..., OR ..., NOT ..., + ..., - ..., ( ..., * ..., ^ ..., QUOTED ..., TERM ..., PREFIXTERM ..., WILDTERM ..., [ ..., { ..., NUMBER ...

(The error page's description repeats the same message: the request sent by the client was syntactically incorrect.)

Anyone could help? Thanks

Kun
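One alternative (a sketch, not an official mapping): skip BooleanQuery.toString() entirely and build the query string in Solr-supported syntax, expressing "at least one must match" with OR, e.g. with SolrJ:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class QueryExample {
        public static void main(String[] args) throws Exception {
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8080/solr"); // placeholder URL
            // OR replaces BooleanQuery's minimum-should-match ~1 suffix,
            // which Solr's Lucene query parser rejects.
            SolrQuery q = new SolrQuery("I_NAME_ENUM:kfc OR I_NAME_ENUM:mcd");
            QueryResponse rsp = server.query(q);
            System.out.println("hits: " + rsp.getResults().getNumFound());
        }
    }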
using dismax
Hi, Maybe I'm missing something obvious. I'm trying to use the dismax parser, and it doesn't seem like I'm using it properly. When I do this: http://localhost:8080/solr/cs/select?q=(poi_id:3) I get a row returned. When I incorporate dismax and set mm=1, no results are returned: http://localhost:8080/solr/cs/select?q=(poi_id:3)&defType=dismax&mm=1 What I wanted mm=1 to mean is that at least one query clause must match. What am I missing? Thanks, Tri
abort data import on errors
Hi, Is there a way to specify that the data import be aborted (rolled back) should there be an error/exception, and committed if everything runs smoothly? Thanks, Tri
Re: abort data import on errors
I didn't want to issue the rollback command myself; I want Solr to automatically detect exceptions and roll back when they occur. There is probably an attribute I can configure to tell Solr this. Tri

--- On Tue, 1/4/11, Markus Jelsma markus.jel...@openindex.io wrote:

From: Markus Jelsma markus.jel...@openindex.io
Subject: Re: abort data import on errors
To: solr-user@lucene.apache.org
Date: Tuesday, January 4, 2011, 4:57 PM

http://wiki.apache.org/solr/UpdateXmlMessages#A.22rollback.22

> Hi, Is there a way to specify that the data import be aborted (rolled back) should there be an error/exception? If everything runs smoothly, commit the data import. Thanks, Tri
solr benchmarks
Hi, I remember going through a page that had graphs of Solr response times versus index size. Does anyone know of such pages? Internally we have response-time requirements, and I'm trying to figure out when to shard the index. Thanks, Tri
exception obtaining write lock on startup
Hi, I'm getting this exception when I have two cores as masters. It seems one of the cores obtains a lock (file) and then the other tries to obtain the same one; however, the first lock is never released. How do I fix this?

Dec 30, 2010 4:34:48 PM org.apache.solr.handler.ReplicationHandler inform
WARNING: Unable to get IndexCommit on startup
org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@..\webapps\solr\tnsolr\data\index\lucene-fe3fc928a4bbfeb55082e49b32a70c10-write.lock
    at org.apache.lucene.store.Lock.obtain(Lock.java:85)
    at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1565)
    at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1421)
    at org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:191)
    at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:98)
    at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:173)
    at org.apache.solr.update.DirectUpdateHandler2.forceOpenWriter(DirectUpdateHandler2.java:376)
    at org.apache.solr.handler.ReplicationHandler.inform(ReplicationHandler.

Tri
Re: shard versus core
Hi Erick, Thanks for the explanation. At which point does the index get so big that sharding becomes appropriate, i.e. where size affects performance? Tri

--- On Sun, 12/19/10, Erick Erickson erickerick...@gmail.com wrote:

From: Erick Erickson erickerick...@gmail.com
Subject: Re: shard versus core
To: solr-user@lucene.apache.org
Date: Sunday, December 19, 2010, 7:36 AM

Well, they can be different beasts. First of all, different cores can have different schemas, which is not true of shards. Also, shards are almost assumed to be running on different machines as a scaling technique, whereas multiple cores are run on a single Solr instance.

So using multiple cores is very similar to running multiple virtual Solr servers on a single machine, each independent of the others. This can make sense if, for instance, you wanted to have a bunch of small indexes all on one machine: you could use multiple cores rather than multiple instances of Solr. These indexes may or may not have anything to do with each other.

Sharding, on the other hand, is almost always used to split a single logical index up amongst multiple machines in order to improve performance. The assumption usually is that the index is too big to give satisfactory performance on a single machine, so you split it into parts. That assumption really implies that it makes no sense to put multiple shards on the *same* machine.

So really, the answer to your question is that you choose the right technique for the problem you're trying to solve. They aren't really different solutions to the same problem...

Hope this helps,
Erick

On Sun, Dec 19, 2010 at 4:07 AM, Tri Nguyen tringuye...@yahoo.com wrote:

Hi, I was wondering about the pros and cons of using sharding versus cores. An index can be split up into multiple cores or multiple shards, so why one over the other? Thanks, tri
Re: shard versus core
I thought about it some more, and did some reading. I suppose the answer depends on what kind of response time is considered good enough. I can do some stress testing to see whether disk I/O becomes the bottleneck as the index grows, and I can also look into optimizing/configuring Solr parameters to help performance. One thing I've read is that my disk should be at least twice the size of the index.

--- On Mon, 12/20/10, Tri Nguyen tringuye...@yahoo.com wrote:

From: Tri Nguyen tringuye...@yahoo.com
Subject: Re: shard versus core
To: solr-user@lucene.apache.org
Date: Monday, December 20, 2010, 4:04 AM

Hi Erick, Thanks for the explanation. At which point does the index get so big that sharding becomes appropriate, i.e. where size affects performance? Tri

--- On Sun, 12/19/10, Erick Erickson erickerick...@gmail.com wrote:

From: Erick Erickson erickerick...@gmail.com
Subject: Re: shard versus core
To: solr-user@lucene.apache.org
Date: Sunday, December 19, 2010, 7:36 AM

Well, they can be different beasts. First of all, different cores can have different schemas, which is not true of shards. Also, shards are almost assumed to be running on different machines as a scaling technique, whereas multiple cores are run on a single Solr instance.

So using multiple cores is very similar to running multiple virtual Solr servers on a single machine, each independent of the others. This can make sense if, for instance, you wanted to have a bunch of small indexes all on one machine: you could use multiple cores rather than multiple instances of Solr. These indexes may or may not have anything to do with each other.

Sharding, on the other hand, is almost always used to split a single logical index up amongst multiple machines in order to improve performance. The assumption usually is that the index is too big to give satisfactory performance on a single machine, so you split it into parts. That assumption really implies that it makes no sense to put multiple shards on the *same* machine.

So really, the answer to your question is that you choose the right technique for the problem you're trying to solve. They aren't really different solutions to the same problem...

Hope this helps,
Erick

On Sun, Dec 19, 2010 at 4:07 AM, Tri Nguyen tringuye...@yahoo.com wrote:

Hi, I was wondering about the pros and cons of using sharding versus cores. An index can be split up into multiple cores or multiple shards, so why one over the other? Thanks, tri
master master, repeaters
Hi, In the master-slave configuration, I'm trying to figure out how to set up the system for master failover. Does Solr support a master-master setup? From my reading, it does not. I've read about repeaters as well, where a slave can act as a master. When the main master goes down, do the other slaves switch to the repeater? Barring better solutions, I'm thinking about putting two masters behind a load balancer. If this is not implemented already, perhaps Solr could be updated to support a list of masters for fault tolerance. Tri
shard versus core
Hi, I was wondering about the pros and cons of using sharding versus cores. An index can be split up into multiple cores or multiple shards, so why one over the other? Thanks, tri
Re: master master, repeaters
How do we tell the slaves to point to the new master without modifying the config files? Can we do this while the slave is up, by issuing a command to it? Thanks, Tri

--- On Sun, 12/19/10, Upayavira u...@odoko.co.uk wrote:

From: Upayavira u...@odoko.co.uk
Subject: Re: master master, repeaters
To: solr-user@lucene.apache.org
Date: Sunday, December 19, 2010, 10:13 AM

We had a (short) thread on this late last week. Solr doesn't support automatic failover of the master, at least in 1.4.1. I've been discussing with my colleague (Tommaso) ways to achieve this. There are ways we could 'fake it', scripting the following:

* set up a 'backup' master, as a replica of the actual master
* monitor the master for 'up-ness'
* if it fails:
  * tell the master to start indexing to the backup instead
  * tell the slave(s) to connect to a different master (the backup)
* then, when the master is back:
  * wipe its index (backing up the dir first?)
  * configure it to be a backup of the new master
  * make it pull a fresh index over

But Jan Høydahl suggested using SolrCloud. I'm going to follow up on how that might work in that thread.

Upayavira

On Sun, 19 Dec 2010 00:20 -0800, Tri Nguyen tringuye...@yahoo.com wrote:

Hi, In the master-slave configuration, I'm trying to figure out how to set up the system for master failover. Does Solr support a master-master setup? From my reading, it does not. I've read about repeaters as well, where a slave can act as a master. When the main master goes down, do the other slaves switch to the repeater? Barring better solutions, I'm thinking about putting two masters behind a load balancer. If this is not implemented already, perhaps Solr could be updated to support a list of masters for fault tolerance. Tri
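On the "without modifying the config files" question: the replication handler's fetchindex command accepts a masterUrl parameter, so a running slave can be pointed at a different master per request. A sketch (host names are hypothetical):

    import java.io.InputStream;
    import java.net.URL;
    import java.net.URLEncoder;

    public class SwitchMaster {
        public static void main(String[] args) throws Exception {
            String slave = "http://slave1:8983/solr";                        // hypothetical slave
            String newMaster = "http://backup-master:8983/solr/replication"; // hypothetical backup master
            // One-time pull from the new master, overriding the configured masterUrl.
            URL url = new URL(slave + "/replication?command=fetchindex&masterUrl="
                    + URLEncoder.encode(newMaster, "UTF-8"));
            InputStream in = url.openStream(); // fire the command; the response is a small XML ack
            in.close();
        }
    }

This overrides the master per command rather than permanently, so a scripted failover would keep issuing it (or eventually rewrite the config) until the original master returns.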
solr immediate response on data import
Hi, I do a data import with commit=false. I get a response back saying it's idle, with "Total number of rows skipped = -1" and "Total number of rows processed = -1". This happens the very first time after I start Solr; subsequent imports don't return -1 but the rows actually read from the datasource. Why does it return -1, and how should I interpret this? Did the dataimport fail? thanks, Tri
custom ping response
Can I have a custom XML response for the ping request? thanks, Tri
Re: custom ping response
I need to return this:

<?xml version="1.0" encoding="UTF-8"?>
<admin>
  <status>
    <name>Server</name>
    <value>ok</value>
  </status>
</admin>

From: Markus Jelsma markus.jel...@openindex.io
To: solr-user@lucene.apache.org
Cc: Tri Nguyen tringuye...@yahoo.com
Sent: Tue, December 7, 2010 4:27:32 PM
Subject: Re: custom ping response

Of course! The ping request handler behaves like any other request handler and accepts at least the wt parameter [1]. Use XSLT [2] to transform the output to any desirable form, or use other response writers [3]. Why, anyway: is it a load balancer that only wants an OK output or something?

[1]: http://wiki.apache.org/solr/CoreQueryParameters
[2]: http://wiki.apache.org/solr/XsltResponseWriter
[3]: http://wiki.apache.org/solr/QueryResponseWriter

> Can I have a custom XML response for the ping request? thanks, Tri
Re: custom ping response
Hi, I'm reading the wiki. What does q=apache mean in the URL?

http://localhost:8983/solr/select/?stylesheet=&q=apache&wt=xslt&tr=example.xsl

thanks, tri

From: Markus Jelsma markus.jel...@openindex.io
To: Tri Nguyen tringuye...@yahoo.com
Cc: solr-user@lucene.apache.org
Sent: Tue, December 7, 2010 4:35:28 PM
Subject: Re: custom ping response

Well, you can go a long way with XSLT, but I wouldn't know how to embed the server name in the response, as Solr simply doesn't return that information. You'd have to patch the response Solr is giving, or put a small script in front that can embed the server name.

> I need to return this:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <admin>
>   <status>
>     <name>Server</name>
>     <value>ok</value>
>   </status>
> </admin>

From: Markus Jelsma markus.jel...@openindex.io
To: solr-user@lucene.apache.org
Cc: Tri Nguyen tringuye...@yahoo.com
Sent: Tue, December 7, 2010 4:27:32 PM
Subject: Re: custom ping response

Of course! The ping request handler behaves like any other request handler and accepts at least the wt parameter [1]. Use XSLT [2] to transform the output to any desirable form, or use other response writers [3]. Why, anyway: is it a load balancer that only wants an OK output or something?

[1]: http://wiki.apache.org/solr/CoreQueryParameters
[2]: http://wiki.apache.org/solr/XsltResponseWriter
[3]: http://wiki.apache.org/solr/QueryResponseWriter

> Can I have a custom XML response for the ping request? thanks, Tri
dataimport response returns before done?
Hi, After issuing a dataimport, I've noticed Solr returns a response before the import finishes. Is this correct? Is there any way I can make Solr not return until it finishes? If not, how do I poll for whether it has finished? thanks, tri
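DIH does run imports asynchronously, so polling is the usual approach. A rough sketch, assuming the status command's XML reports "idle" once the import is done; the handler URL is a placeholder:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;

    public class WaitForImport {
        // e.g. dihUrl = "http://localhost:8080/solr/dataimport" (placeholder)
        public static void waitUntilIdle(String dihUrl) throws Exception {
            while (true) {
                URL url = new URL(dihUrl + "?command=status");
                BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"));
                StringBuilder body = new StringBuilder();
                String line;
                while ((line = in.readLine()) != null) {
                    body.append(line);
                }
                in.close();
                // The status response carries <str name="status">idle</str> when no import is running.
                if (body.indexOf("idle") >= 0) {
                    return;
                }
                Thread.sleep(5000); // poll every 5 seconds
            }
        }
    }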
solr response xsd
Hi, I'm trying to find the Solr response XSD. Is this it: https://issues.apache.org/jira/browse/SOLR-17 ?

I basically want to know whether the data import passed or failed. I can get the XML string and search for "completed", but I was wondering whether I could use an XSD to parse the response, or whether there is another way. Here's the response I get, and I don't see the lst element for statusMessages in that XSD:

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">15</int>
  </lst>
  <lst name="initArgs">
    <lst name="defaults">
      <str name="config">data-config.xml</str>
    </lst>
  </lst>
  <str name="command">full-import</str>
  <str name="status">idle</str>
  <str name="importResponse"/>
  <lst name="statusMessages">
    <str name="Total Requests made to DataSource">0</str>
    <str name="Total Rows Fetched">0</str>
    <str name="Total Documents Skipped">0</str>
    <str name="Full Dump Started">2010-11-22 17:20:42</str>
    <str name="">Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.</str>
    <str name="Committed">2010-11-22 17:20:43</str>
    <str name="Optimized">2010-11-22 17:20:43</str>
    <str name="Total Documents Processed">0</str>
    <str name="Time taken">0:0:0.375</str>
  </lst>
  <str name="WARNING">This response format is experimental. It is likely to change in the future.</str>
</response>

Thanks, Tri
data import scheduling
Hi, Has anyone gotten Solr to schedule data imports at a certain time interval through configuration alone? I tried setting interval=1, which should mean import every minute, but I don't see it happening. I'm trying to avoid cron jobs. Thanks, Tri
importing from java
Hi, I'm restricted as follows in regards to importing. I have access to a list (Iterator) of Java objects that I need to import into Solr. Can I import the Java objects as part of Solr's data import interface, so that whenever an HTTP request tells Solr to do a dataimport, it calls my Java class to get the objects? Before, I had direct read-only access to the db, specified the column mappings, and things were fine with the data import. But now I am restricted to using a .jar file that has an API to get the records in the database, and I need to publish those records to the index. I do see SolrJ, but SolrJ is separate from the Solr webapp. Can I write my own DataImportHandler? Thanks, Tri
Re: importing from java
Another question: can I write my own DataImportHandler class? thanks, Tri

From: Tri Nguyen tringuye...@yahoo.com
To: solr user solr-user@lucene.apache.org
Sent: Thu, November 11, 2010 7:01:25 PM
Subject: importing from java

Hi, I'm restricted as follows in regards to importing. I have access to a list (Iterator) of Java objects that I need to import into Solr. Can I import the Java objects as part of Solr's data import interface, so that whenever an HTTP request tells Solr to do a dataimport, it calls my Java class to get the objects? Before, I had direct read-only access to the db, specified the column mappings, and things were fine with the data import. But now I am restricted to using a .jar file that has an API to get the records in the database, and I need to publish those records to the index. I do see SolrJ, but SolrJ is separate from the Solr webapp. Can I write my own DataImportHandler? Thanks, Tri
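In practice the extension point is usually a custom EntityProcessor (or DataSource) rather than DataImportHandler itself. A rough sketch, where ObjectApi and MyRecord are hypothetical stand-ins for the restricted jar's API:

    import java.util.HashMap;
    import java.util.Iterator;
    import java.util.Map;
    import org.apache.solr.handler.dataimport.Context;
    import org.apache.solr.handler.dataimport.EntityProcessorBase;

    public class JavaObjectEntityProcessor extends EntityProcessorBase {
        private Iterator<MyRecord> records;

        @Override
        public void init(Context context) {
            super.init(context);
            records = new ObjectApi().getRecords(); // hypothetical API call into the jar
        }

        @Override
        public Map<String, Object> nextRow() {
            if (records == null || !records.hasNext()) {
                return null; // null tells DIH this entity is exhausted
            }
            MyRecord r = records.next();
            Map<String, Object> row = new HashMap<String, Object>();
            row.put("id", r.getId());     // hypothetical getters; the keys are
            row.put("name", r.getName()); // mapped to schema fields in data-config.xml
            return row;
        }

        // Hypothetical stand-ins for the restricted jar's API:
        static class MyRecord { String getId() { return null; } String getName() { return null; } }
        static class ObjectApi {
            Iterator<MyRecord> getRecords() {
                return java.util.Collections.<MyRecord>emptyList().iterator();
            }
        }
    }

The processor is then referenced from data-config.xml via the entity's processor attribute, and a dataimport request drives it like any other source.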
Re: scheduling imports and heartbeats
I'm looking for a solution other than a cron job. Can I configure Solr to schedule imports?

From: Ranveer Kumar ranveer.s...@gmail.com
To: solr-user@lucene.apache.org
Sent: Tue, November 9, 2010 8:13:03 PM
Subject: Re: scheduling imports and heartbeats

You should use cron for that.

On 10 Nov 2010 08:47, Tri Nguyen tringuye...@yahoo.com wrote:

Hi, Can I configure Solr to schedule imports at a specified time (say once a day, once an hour, etc.)? Also, does Solr have some sort of heartbeat mechanism? Thanks, Tri
Re: scheduling imports and heartbeats
Thanks for the tip, Ken. I tried that, but I don't see the import happening when I check the status. Below is what's in my dataimport.properties:

#Wed Nov 10 11:36:28 PST 2010
metadataObject.last_index_time=2010-09-20 11\:12\:47
interval=1
port=8080
server=localhost
params=/select?qt\=/dataimport&command\=full-import&clean\=true&commit\=true
webapp=solr
id.last_index_time=2010-11-10 11\:36\:27
syncEnabled=1
last_index_time=2010-11-10 11\:36\:27

From: Ken Stanley doh...@gmail.com
To: solr-user@lucene.apache.org
Sent: Wed, November 10, 2010 4:41:17 AM
Subject: Re: scheduling imports and heartbeats

On Tue, Nov 9, 2010 at 10:16 PM, Tri Nguyen tringuye...@yahoo.com wrote:

> Hi, Can I configure Solr to schedule imports at a specified time (say once a day, once an hour, etc.)? Also, does Solr have some sort of heartbeat mechanism? Thanks, Tri

Tri,

If you use the DataImportHandler (DIH), you can set up a dataimport.properties file that can be configured to import on intervals: http://wiki.apache.org/solr/DataImportHandler#dataimport.properties_example

As for heartbeat, you can use the ping handler (default is /admin/ping) to check the status of the servlet.

- Ken
scheduling imports and heartbeats
Hi, Can I configure solr to schedule imports at a specified time (say once a day, once an hour, etc)? Also, does solr have some sort of heartbeat mechanism? Thanks, Tri
searching while importing
Hi, Can I perform searches against the index while it is being imported? Does importing add one document at a time, or will Solr make a temporary index and switch to it when indexing is done? Thanks, Tri
Re: searching while importing
Hi, As long as I can search on the current (older) index while importing, I'm good. I've tested this, and I can search the older index while data-importing the newer one. So you can search the older index during your 5-hour wait?

Thanks, Tri

From: Shawn Heisey s...@elyograg.org
To: solr-user@lucene.apache.org
Sent: Wed, October 13, 2010 3:38:48 PM
Subject: Re: searching while importing

If you are using the DataImportHandler, you will not be able to search new data until the full-import or delta-import is complete and the update is committed. When I do a full reindex, it takes about 5 hours, and until it is finished, I cannot search it. I have not tried issuing a manual commit in the middle of an import to see whether that makes data inserted up to that point searchable, but I would not expect it to work. If you need this kind of functionality, you may need to change your build system so that a full import clears the index manually and then does a series of delta-import batches.

On 10/13/2010 3:51 PM, Tri Nguyen wrote:

Hi, Can I perform searches against the index while it is being imported? Does importing add one document at a time, or will Solr make a temporary index and switch to it when indexing is done? Thanks, Tri