Re: Custom update handler?
You need to refer to your chain in a RequestHandler config. Search for /update, duplicate that, and change the chain it points to.

Upayavira

On Mon, Mar 11, 2013, at 05:22 AM, Jack Park wrote:

With 4.1, not in cloud configuration, I have a custom update request processor chain which injects an additional processor for studying the documents as they come in. But when I do partial updates on those documents, I don't want them to be studied again, so I created another version of the same chain, but without my added feature. I named it /partial. When I create an instance of SolrJ for the URL server/solr/partial, I get back this error message:

  Server at http://localhost:8983/solr/partial returned non ok status:404, message:Not Found {locator=2146fd50-fac9-47d5-85c0-47aaeafe177f, tuples={set=99edfffe-b65c-4b5e-9436-67085ce49c9c}}

Here is the configuration for that:

  <updateRequestProcessorChain name="/partial" default="false">
    <processor class="solr.RunUpdateProcessorFactory"/>
    <processor class="solr.LogUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

The normal handler chain is this:

  <updateRequestProcessorChain name="harvest" default="true">
    <processor class="solr.RunUpdateProcessorFactory"/>
    <processor class="org.apache.solr.update.TopicQuestsDocumentProcessFactory">
      <str name="inputField">hello</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

which runs on a SolrJ set for http://localhost:8983/solr/

What might I be missing?

Many thanks
Jack
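For reference, a minimal solrconfig.xml sketch of what Upayavira describes: a second update handler that points at the other chain (the handler and chain names here simply mirror the ones in this thread):

  <requestHandler name="/partial" class="solr.UpdateRequestHandler">
    <lst name="defaults">
      <str name="update.chain">/partial</str>
    </lst>
  </requestHandler>

A processor chain by itself has no URL endpoint; the request handler is what makes it reachable, which is why addressing /partial without such a handler produces a 404.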
abc.def@gmail* not retrieved but without double quotes retrieved
I have the following field type:

  <fieldtype name="email_type" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
              catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
    </analyzer>
  </fieldtype>

and the following field:

  <field name="email" type="email_type" indexed="true" stored="true"/>

I add the value abc@gmail.com to this email field. When I search:

1. "abc.def@gmail*" (in double quotes) - I get the result.
2. abc.def@gmail* (without double quotes) - I don't get the result.

Am I missing something regarding wildcards and exact phrase searches? thanks.
Zookeeper and DataImportHandler properties
I realize this is not a ZooKeeper-specific mailing list, but I am wondering if anybody has a simple process for updating ZooKeeper files other than restarting a Solr instance? Specifically the data-import.properties value, which doesn't appear to be written to disk, but rather only exists in ZooKeeper itself. How can I edit this value? I am unfamiliar with zkCli.sh and am not sure how to include new lines in manually entered set commands.

Regards,
Nate
--
CTO Zenlok株式会社
Re: Zookeeper and DataImportHandler properties
I use the ZooKeeper Eclipse plugin: http://www.massedynamic.org/mediawiki/index.php?title=Eclipse_Plug-in_for_ZooKeeper
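For the command line, a hedged sketch (paths are illustrative; check which commands your ZooKeeper and Solr versions actually ship):

  # plain ZooKeeper client: inspect the current value
  ./zkCli.sh -server localhost:2181
  get /configs/myconf/dataimport.properties

  # Solr's cloud-scripts/zkcli.sh, if your version has a putfile command,
  # uploads a whole local file and so avoids the embedded-newline problem
  # of a manually typed "set":
  ./zkcli.sh -zkhost localhost:2181 -cmd putfile /configs/myconf/dataimport.properties dataimport.properties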
Re: SolrCloud: port out of range:-1
In the end I want 3 servers; this was only a test. I know that a majority of servers is needed to provide service. I read some tutorials about ZooKeeper and looked at the wiki. I installed ZooKeeper separately on the servers and connected them with each other (zoo.cfg). In the logs I see the ZooKeepers know each other. When I start Solr, I used the -DzkHost parameter to declare the ZooKeepers of the servers: -DzkHost=ip:2181,ip:2181,ip:2181. It works great :)

ps. With embedded ZooKeepers I can't get it working. With a second server in the zkHost it returns an error. Strange, but for me the separate ZooKeepers are a great solution: separate logs and easy to reuse with other ZooKeeper servers (in the future I want to separate into 3 Solr instances and 5 ZooKeeper instances).

THANKS
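For reference, a minimal sketch of that setup (host names and server IDs are illustrative; each zoo.cfg also needs its usual dataDir/clientPort settings):

  # zoo.cfg on each ZooKeeper host
  server.1=zk1:2888:3888
  server.2=zk2:2888:3888
  server.3=zk3:2888:3888

  # start each Solr node against the external ensemble
  java -DzkHost=zk1:2181,zk2:2181,zk3:2181 -jar start.jar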
How to set Configuration setting for search
Hi Team,

In Solr, how do I set up free text search configuration? Is there any regular expression setting that I can configure to obtain search results?

With Warm Regards
Deepshikha Raghav
IBM, Gurgaon
Mobile: +91-8800140037
Highlighting problems
Hi all,

I have problems with the highlighting mechanism. The query is:

  http://127.0.0.1:8983/solr/mpiwgweb/select?facet=true&facet.field=description&facet.field=lang&facet.field=main_content&start=0&q=meier+AND+%28description:member+OR+description:project%29

After that, in the field main_content, which is the default search field, meier as well as member and project are highlighted, although I'm searching for member and project only in the field description. The search results are OK, as far as I can see.

My settings:

  <requestHandler name="/select" class="solr.SearchHandler">
    <!-- default values for query parameters can be specified, these
         will be overridden by parameters in the request -->
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <int name="rows">10</int>
      <str name="facet.limit">300</str>
      <str name="hl">on</str>
      <str name="hl.fl">main_content</str>
      <str name="hl.encoder">html</str>
      <str name="hl.simple.pre"><![CDATA[<em class="webSearch_hl">]]></str>
      <str name="hl.fragsize">200</str>
      <str name="hl.snippets">2</str>
      <str name="hl.usePhraseHighlighter">true</str>
    </lst>
    <arr name="last-components">
      <str>tvComponent</str>
    </arr>
  </requestHandler>

Cheers
Dirk
How to Integrate Solr With Hbase
I have crawled data into HBase with Nutch. How can I use Solr to index the data in HBase? (If there is any solution from the Nutch side, you are welcome to suggest it.)

PS: I am new to these kinds of technologies, and I run Solr from under the example folder via start.jar.
AW: Highlighting problems
Hi Dirk,

please check http://wiki.apache.org/solr/HighlightingParameters#hl.requireFieldMatch - this may help you.

Regards,
André
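Concretely, that means adding hl.requireFieldMatch=true to the request, or to the handler defaults shown earlier in the thread:

  <str name="hl.requireFieldMatch">true</str>

With it enabled, terms are only highlighted in a field if the query actually matched in that field.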
Re: AW: Highlighting problems
Hi Andre,

thanks, this did the job. I also had to enable edismax and set the default parameter there - otherwise no highlighting at all.

Best
Dirk
Boosting based on filter query
I want to be able to boost results where the filetype is a pdf. Here is some pseudo code so I don't misrepresent/misinterpret via a URL:

  search(foobar)
  foreach result (where filetype == pdf) { boost^10 }

Is there a way to do this? Thanks in advance!
Re: Boosting based on filter query
Definitely can do this, but how depends on the query parser you're using. With dismax/edismax you can use bq=filetype:pdf^10 (where filetype:pdf is a valid Lucene query parser expression for your documents).

Erik
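Put together as a request, that suggestion would look something like this (the filetype field comes from the pseudo code above; host and core name are illustrative):

  http://localhost:8983/solr/mycore/select?defType=edismax&q=foobar&bq=filetype:pdf^10

Documents whose filetype field matches pdf get an additive score boost, so they rank higher without filtering out everything else.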
Re: abc.def@gmail* not retrieved but without double quotes retrieved
The simple rule is that a wildcard suppresses any analysis steps that are not multi-term aware. Unfortunately, the word delimiter filter is not multi-term aware (the lower case filter is). So, the query tries to find abc.def@gmail as a single (wildcard) term, which it won't find, since the index-time analysis will have indexed that same text as the three terms abc, def, gmail.

Your query with double quotes works because the asterisk is treated as a simple punctuation character that the word delimiter filter ignores, so your query is equivalent to "abc def gmail", exactly as those terms were indexed.

-- Jack Krupansky
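In other words, the two queries parse into quite different things. A sketch of the contrast Jack describes:

  q=email:"abc.def@gmail*"   ->  phrase query on the analyzed terms [abc, def, gmail]; matches the indexed terms
  q=email:abc.def@gmail*     ->  wildcard query for the single term abc.def@gmail*; no such single term exists in the index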
RE: Boosting based on filter query
Thank you!
Re: Boost maximum match in a field
I'm curious if the default ranking doesn't already return these in 3, 2, 1 order. Doc 3 should get an implicit boost with norms enabled for your title field, so make sure the title field has omitNorms=false, i.e. in schema.xml:

  <field name="title" ... omitNorms="false"/>

Tim

On Mon, Mar 11, 2013 at 8:02 AM, Nicholas Ding nicholas...@gmail.com wrote:

  Hello, I was wondering how to boost a maximum match in a field. For example, you have a few documents with different lengths of title.

  Doc 1: Title: Ford Car Body Parts
  Doc 2: Title: 2012 Ford Car
  Doc 3: Title: Ford Car

  If a user is searching for "Ford Car", how do I make Doc 3 get the highest score?

  Thanks
  Nicholas
Re: Boost maximum match in a field
The length normalization factor is a very coarse value, so it may not be fine-grained enough to distinguish these particular field lengths. Normally, it is a short vs. long distinction rather than actual length.

In any case, add debugQuery=true to your query and look at the explain section to see how the norm is either different or the same for these three documents in the results. The norm may in fact be fine, but maybe some other factors overwhelm the overall score.

-- Jack Krupansky
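For example (field name as in the thread; host and handler are illustrative):

  http://localhost:8983/solr/select?q=title:(Ford+Car)&debugQuery=true

The response then carries an explain block per document showing the fieldNorm factor alongside tf and idf, which makes it easy to compare the three titles directly.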
SolrCloud index timeout
Hi, I have the following issue: I have a collection with a leader and a replica, both synchronized. When I try to index data to this collection I get a timeout error (the output is from Python):

  (<class 'requests.exceptions.Timeout'>, Timeout(TimeoutError(HTTPConnectionPool(host='192.168.20.50', port=8983): Request timed out. (timeout=60.0),),), <traceback object at 0x7f64c033b908>)

Now, I can't index any document to this collection because I always get the timeout error. In Tomcat I have about 100 threads stuck:

  S 11393624 ms 0 KB 30 KB 192.168.20.47 192.168.20.50 POST /solr/ST-4A46DF1563_0612/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.20.48%3A8983%2Fsolr%2FST-4A46DF1563_0612%2F&wt=javabin&version=2 HTTP/1.1

Does anyone have any idea what can be happening and why I can't index any document to the collection?

Best regards
Some nodes have all the load
I was doing some rolling updates of my cluster (12 cores, 4 servers) and I ended up in a situation where one node was elected leader by all the cores. This seemed very taxing to that one node. It was also still trying to serve query requests, so it slowed everything down. I'm trying to do a lot of frequent atomic updates along with some periodic DIH syncs.

My solution to this situation was to try to take the supreme leader out of the cluster and let the leader election start. This was not easy, as there was so much load on it that I couldn't take it out gracefully. Some of my cores became unreachable for a while.

This was all under simulated load, but it made me nervous about a high-load production situation. I'm sure there are several things I'm doing wrong in all this, so I thought I'd see what you guys think.

Jim
writing doc to another collection from UpdateRequestProcessor
What's the best approach to writing the current doc inside an UpdateRequestProcessor to another collection? Would I just call up CloudSolrServer and process it as I normally would in SolrJ?

Thanks
msj
Re: SolrCloud index timeout
What Solr version? Are you mixing deletes and adds? Do you have more than one shard for a collection per machine, i.e. are you oversharding? Can you post the stack traces (using jstack, or jconsole, or visualvm, or…)?

- Mark
Re: writing doc to another collection from UpdateRequestProcessor
Sure, seems reasonable.

- Mark
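A minimal SolrJ sketch of that approach, assuming a processor that mirrors each added document into a second collection (class name, collection name, and ZooKeeper address are illustrative; error handling is kept crude):

  import java.io.IOException;
  import org.apache.solr.client.solrj.SolrServerException;
  import org.apache.solr.client.solrj.impl.CloudSolrServer;
  import org.apache.solr.common.SolrInputDocument;
  import org.apache.solr.update.AddUpdateCommand;
  import org.apache.solr.update.processor.UpdateRequestProcessor;

  public class MirrorToCollectionProcessor extends UpdateRequestProcessor {
    // e.g. new CloudSolrServer("zk1:2181") with setDefaultCollection("otherCollection")
    private final CloudSolrServer mirror;

    public MirrorToCollectionProcessor(CloudSolrServer mirror, UpdateRequestProcessor next) {
      super(next);
      this.mirror = mirror;
    }

    @Override
    public void processAdd(AddUpdateCommand cmd) throws IOException {
      SolrInputDocument doc = cmd.getSolrInputDocument();
      try {
        mirror.add(doc);        // send the same document to the other collection
      } catch (SolrServerException e) {
        throw new IOException(e);
      }
      super.processAdd(cmd);    // continue with the normal chain
    }
  }

The factory that creates the processor would hold one shared CloudSolrServer instance rather than creating one per request.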
Re: Some nodes have all the load
There is an open JIRA issue about trying to spread the leader load during elections. I was waiting to get reports that it was really a problem for someone, though.

How much load were you putting on? How long were the nodes unresponsive? Unresponsive to everything? Just updates? Searches? What version of Solr? How many shards do you have? Collections?

- Mark
Re: [Beginner] wants to contribute in open source project
On Mar 11, 2013, at 11:14 AM, chandresh pancholi chandreshpancholi...@gmail.com wrote:

  I am a beginner in this field. It would be great if you could help me out. I love to code in Java. Can you guys share some links so that I can start contributing to the solr/lucene project?

This article I wrote about getting started contributing to projects may give you some ideas: http://blog.smartbear.com/software-quality/bid/167051/14-Ways-to-Contribute-to-Open-Source-without-Being-a-Programming-Genius-or-a-Rock-Star

I don't have tasks specifically for the Solr project (does Solr have such a list for newcomers to help on?) but I hope that you'll get some ideas.

xoa

--
Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance
Re: [Beginner] wants to contribute in open source project
You can also take a look at http://wiki.apache.org/solr/HowToContribute

Tomás
Re: Memory Guidance
On 3/10/2013 8:00 PM, jimtronic wrote:

  I'm having trouble finding some problems while load testing my setup. If you saw these numbers on your dashboard, would they worry you?

  Physical Memory: 97.6% (14.64 GB of 15.01 GB)
  File Descriptor Count: 19.1% (196 of 1024)
  JVM-Memory: 95% (1.67 GB dark gray / 1.76 GB med gray / 1.76 GB)

What OS? If it's a unix/linux environment, the full output of the 'free' command will be important. Generally speaking, it's normal for any computer (client or server, regardless of OS) to use all available memory when under load.

Thanks,
Shawn
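On Linux, that would be, for example:

  free -m

where (on the free variants of that era) the "-/+ buffers/cache" row is the one to watch, since memory used by the OS disk cache is handed back to applications on demand.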
Re: Memory Guidance
On 3/11/2013 11:14 AM, Shawn Heisey wrote:

  What OS? If it's a unix/linux environment, the full output of the 'free' command will be important. ...

Replying to myself. The cold must be getting to me. :)

If nothing else is running on this server except for Solr, and your index is less than 15GB in size, these numbers would not worry me at all. If your index is less than 30GB in size, you might still be OK, but at that point your index would exceed available RAM. Chances are that you would be able to cache enough of it for good performance, depending on your schema. The reason that I say this is that you have about 2GB of RAM given to Solr, leaving about 13-14GB for OS disk caching.

If the server is shared with other things, particularly a busy database or busy web server, then the above paragraph might not apply - you may not have enough resources for Solr to work effectively.

Thanks,
Shawn
Re: SolrCloud index timeout
Hi,

The version is 4.1. I'm not mixing deletes and adds; they are only adds. I have 4 nodes on 2 physical machines, with 2 instances of Tomcat on each machine. In this case the leader is located on a different physical machine than the replica. The collection has all shards in different nodes; I am not oversharding. As for the stack traces, I need to install VisualVM and try to get them.

I created the collection using the core admin API:

LEADER:
  curl "http://192.168.20.48:8983/solr/admin/cores?action=CREATE&name=ST-0112&collection=ST-0112&shard=00&collection.configName=statisticsBucket-regular"

REPLICA:
  curl "http://192.168.20.50:8983/solr/admin/cores?action=CREATE&name=ST-0112&collection=ST-0112&shard=00&collection.configName=statisticsBucket-regular"

The data folders have this content:

LEADER:
  drwxr-xr-x 2 root root  4096 Jan 30 17:40 index
  drwxr-xr-x 2 root root 12288 Feb  5 13:28 index.20130130174052236
  drwxr-xr-x 2 root root 36864 Mar 11 15:20 index.20130220001204140
  -rw-r--r-- 1 root root    78 Feb 20 00:13 index.properties
  -rw-r--r-- 1 root root   251 Feb 20 00:13 replication.properties
  drwxr-xr-x 2 root root  4096 Mar 11 15:19 tlog

REPLICA:
  drwxr-xr-x 2 root root  4096 Mar 11 15:59 index.20130228105843631
  -rw-r--r-- 1 root root    78 Feb 28 10:59 index.properties
  -rw-r--r-- 1 root root   208 Feb 28 10:59 replication.properties
  drwxr-xr-x 2 root root  4096 Mar 11 12:17 tlog

Best regards
Solr replication takes long time
Hi guys,

I have a problem with Solr replication. I have 2 Solr servers (Solr 4.0.0), 1 master and 1 slave (8 processors, 16GB RAM, Ubuntu 11, ext3, each). On every server there are 2 independent instances of Solr running (I tried also a multicore config, but having independent instances gives me better performance), every instance having a different collection. So we have 2 masters on server 1, and 2 slaves on server 2.

The index size is currently (for the biggest collection) around 17 million documents, with a total size near 12 GB. The files transferred every replication cycle are typically not more than 100, with a total size not bigger than 50MB. The other collection is not that big, just around 1 million docs and not bigger than 2 GB, and without a high update ratio. The big collection has a load around 200 queries per second (MoreLikeThis, RealTimeGetHandler, TermVectorComponent mainly), and for the small one it is below 50 queries per second.

Replication had been working for a long time without any problem, but in the last weeks the replication cycles started to take longer and longer for the big collection, more than 2 minutes, sometimes even more. During that time, slaves are so overloaded that many queries are timing out, despite the timeout in my clients being 30 seconds. The servers are on the same LAN, gigabit ethernet, so the bandwidth should not be the bottleneck.

Since the index is receiving frequent updates and deletes (the update handler receives more than 200 requests per second for the big collection, but not more than 5 per second for the small one), I tried to use the maxCommitsToKeep attribute, to ensure that no file was deleted during replication, but it has no effect.

My solrconfig.xml in the big collection is like this:

  <?xml version="1.0" encoding="UTF-8" ?>
  <config>
    <luceneMatchVersion>LUCENE_40</luceneMatchVersion>
    <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>
    <indexConfig>
      <mergeFactor>3</mergeFactor>
      <deletionPolicy class="solr.SolrDeletionPolicy">
        <str name="maxCommitsToKeep">10</str>
        <str name="maxOptimizedCommitsToKeep">1</str>
        <str name="maxCommitAge">6HOUR</str>
      </deletionPolicy>
    </indexConfig>
    <jmx/>
    <updateHandler class="solr.DirectUpdateHandler2">
      <autoCommit>
        <maxDocs>2000</maxDocs>
        <maxTime>3</maxTime>
      </autoCommit>
      <autoSoftCommit>
        <maxTime>500</maxTime>
      </autoSoftCommit>
      <updateLog>
        <str name="dir">${solr.data.dir:}</str>
      </updateLog>
    </updateHandler>
    <query>
      <maxBooleanClauses>2048</maxBooleanClauses>
      <filterCache class="solr.FastLRUCache" size="2048" initialSize="1024" autowarmCount="1024"/>
      <queryResultCache class="solr.LRUCache" size="2048" initialSize="1024" autowarmCount="1024"/>
      <documentCache class="solr.LRUCache" size="2048" initialSize="1024" autowarmCount="1024"/>
      <enableLazyFieldLoading>true</enableLazyFieldLoading>
      <queryResultWindowSize>50</queryResultWindowSize>
      <queryResultMaxDocsCached>50</queryResultMaxDocsCached>
      <listener event="newSearcher" class="solr.QuerySenderListener">
        <arr name="queries">
          <lst>
            <str name="q">*:*</str>
            <str name="fq">date:[NOW/DAY-7DAY TO NOW/DAY+1DAY]</str>
            <str name="rows">1000</str>
          </lst>
        </arr>
      </listener>
      <listener event="firstSearcher" class="solr.QuerySenderListener">
        <arr name="queries">
          <lst>
            <str name="q">*:*</str>
            <str name="fq">date:[NOW/DAY-7DAY TO NOW/DAY+1DAY]</str>
            <str name="rows">1000</str>
          </lst>
        </arr>
      </listener>
      <useColdSearcher>true</useColdSearcher>
      <maxWarmingSearchers>4</maxWarmingSearchers>
    </query>
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst
Re: Solr replication takes long time
Are you using Solr 4.1?

- Mark
Upgrade Solr3.5 to Solr4.1 - Index Reformat ?
Hello,

We are planning to upgrade our Solr servers from version 3.5 to 4.1. We have a master/slave configuration and the index size is quite big (i.e. around 14 GB).

1. Do we really need to re-format the whole index when we upgrade to 4.1?
2. What will be the consequences if we do not re-format and simply upgrade the war file and config files (solrconfig.xml, schema.xml) on all slaves and the master together (shut down all masters and slaves, then upgrade and start up)?
3. If re-formatting is necessary, then what is the best tool to achieve it? (How long does it usually take to re-format an index of size around 14GB?)

Thanks,
Feroz
Re: Solr replication takes long time
no, Solr 4.0.0. I wanted to update to Solr 4.1 but I read that there was an issue with the replication, so I decided not to try it for now.
Re: Solr replication takes long time
Okay - yes, 4.0 is a better choice for replication than 4.1. It almost sounds like you may be replicating the full index rather than just changes or something. 4.0 had a couple issues as well - a couple of things that were discovered while writing stronger tests for 4.2. 4.2 is spreading onto mirrors now.

- Mark
Re: Dynamic schema design: feedback requested
On Wed, Mar 6, 2013 at 7:50 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

  2) If you wish to use the /schema REST API for read and write operations, then schema information will be persisted under the covers in a data store whose format is an implementation detail just like the index file format.

This really needs to be driven by costs and benefits... There are clear benefits to having a simple human readable / editable file for the schema (whether it's on the local filesystem or on ZK). The ability to say "my schema is a config file and I own it" should always exist (remove it over my dead body).

There are clear benefits to this being the persistence mechanism for the REST API. Even if the REST API persisted its data in some binary format, for example, then there would still need to be import/export mechanisms for the human readable/editable config file that should always exist. Why would we want any other intermediate format (i.e. data that is not human readable)? Seems like we should only introduce that extra complexity if the benefits are great enough.

Actually, I just realized we already have this intermediate representation - it's the in-memory IndexSchema object.

-Yonik
http://lucidworks.com
Re: Dynamic schema design: feedback requested
: 2) If you wish to use the /schema REST API for read and write operations,
: then schema information will be persisted under the covers in a data store
: whose format is an implementation detail just like the index file format.
:
: This really needs to be driven by costs and benefits...
: There are clear benefits to having a simple human readable / editable
: file for the schema (whether it's on the local filesystem or on ZK).

The cost is the user complexity of understanding what changes are respected and when, and the implementation complexity of dealing with changes coming from multiple code paths (both files changed on disk and REST-based request changes).

In the current model, the config file on disk is the authority; it is read in its entirety on core init/reload, and users have total ownership of that file -- changes are funneled through the user, into the config, and Solr is a read-only participant. Since Solr knows the only way schema information will ever change is when it reads that file, it can make internal assumptions about the consistency of that data.

In a model where a public REST API might be modifying Solr's in-memory state, Solr can't necessarily make those same assumptions, and the system becomes a lot simpler if Solr is the authority over the information about the schema, and we don't have to worry about what happens if conflicts arise, e.g.: someone modifies the schema on disk, but hasn't (yet?) done a core reload, when a new REST request comes in to modify the schema data in some other way.

-Hoss
Re: Dynamic schema design: feedback requested
To revisit sarowe's comment about how/when to decide if we are using the "config file" version of schema info (and the API is read only) vs the "internal managed state data" version of schema info (and the API is read/write)...

On Wed, 6 Mar 2013, Steve Rowe wrote:
: Two possible approaches:
:
: a. When schema.xml is present, ...
...
: b. Alternatively, the reverse: ...
...
: I like option a. better, since it provides a stable situation for users
: who don't want the new dynamic schema modification feature, and who want
: to continue to hand edit schema.xml. Users who want the new feature
: would use a command-line tool to convert their schema.xml to
: schema.json, then remove schema.xml from conf/.

The more I think about it, the less I like either a or b, because both are completely implicit. I think practically speaking, from a support standpoint, we should require a more explicit configuration of what *type* of schema management should be used, and then have code that sanity checks that and warns/fails if the configuration setting doesn't match what is found in the ./conf dir.

The situation I worry about is when a novice Solr user takes over maintenance of an existing setup that is using REST-based schema management, and therefore has no schema.xml file. The novice is reading docs/tutorials talking about how to achieve some goal, which make reference to "editing the schema.xml" or "adding XXX to the schema.xml" or, even worse in the cases of some CMSs: "To upgrade to FooCMS vX.Y, replace your schema.xml with this file..." but they have no schema.xml, nor any clear and obvious indication, looking at the configs they do have, of *why* there is no schema.xml, so maybe they try to add one.

I think it would be better to add some new option in solrconfig.xml that requires the user to be explicit about what type of management they want to use, defaulting to schema.xml for back compat...

  <schema type="conf" [maybe an optional file="path/to/schema.xml"?] />

...vs...

  <schema type="managed" [this is where the mutable="true|false" sarowe mentioned could live] />

Then on core load:

1) if the configured schema type is "conf" but there is no schema.xml file, ERROR loudly and fail fast.
2) if we see that the configured schema type is "conf" but we detected the existence of managed internal schema info (schema.json, zk nodes, whatever) then we should WARN that the managed internal data is being ignored.
3) if the configured schema type is "managed" but there is no managed internal schema info (schema.json, zk nodes, whatever) then ERROR loudly and fail fast (or maybe we create an empty schema for them?)
4) if we see that the configured schema type is "managed" but we also detected the existence of a schema.xml config file, then we should WARN that the schema.xml is being ignored.

...although I could easily be convinced that all of those WARN situations should really be hard failures to reduce confusion -- depends on how easy we can make it to let users delete all internally managed schema info before switching to a type="conf" schema.xml approach.

-Hoss
Re: Upgrade Solr3.5 to Solr4.1 - Index Reformat ?
On 3/11/2013 11:56 AM, feroz_kh wrote:

  We are planning to upgrade our solr servers from version 3.5 to 4.1. ...

If you are replicating from 3.5 to 4.1, then your index will be in the 3.5 format. If you upgrade both the master where you index and the slave(s), existing index files will be in the old format, and new index segments will be in the new format. If you were to optimize your index after upgrading, it would completely rewrite it in the new format.

For me, on a fast I/O subsystem (six 1TB SATA drives in RAID10), it takes about ten minutes to optimize a 22GB index on Solr 3.5. Solr 4.1 needs to compress stored fields, which means extra CPU time, but less time actually writing to disk, so it would be about the same or possibly less.

Thanks,
Shawn
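A hedged example of the optimize Shawn mentions, issued against the master once everything is on 4.1 (host and core name are illustrative):

  curl "http://localhost:8983/solr/mycore/update?optimize=true"

This rewrites every segment in the current (4.1) index format, so the old-format files disappear; it is also why you would want a backup first if there is any chance of rolling back.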
Re: Upgrade Solr3.5 to Solr4.1 - Index Reformat ?
Hi Feroz,

due to Lucene's backward compatibility policy ( http://wiki.apache.org/lucene-java/BackwardsCompatibility ), a Solr 4.1 instance should be able to read an index generated by a Solr 3.5 instance. This would not be true if you need to change the schema. Also, be careful, because Solr 4.1 could and will change the index files and will make them unreadable by Solr 3.5 (so you should make a backup in case you need to revert to 3.5 for some reason). This means that if you can't shut down your whole application all together, you could update the slaves first, and then the masters. Replacing all servers together will also work.

That said, you should not use 4.1 if you are using Master/Slave; there are some known bugs in that specific feature in 4.1 that were fixed for 4.2.

Tomás
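For the backup Tomás recommends, the ReplicationHandler's backup command is one option (a sketch; host and core name are illustrative, and the handler must be enabled in your solrconfig.xml):

  curl "http://localhost:8983/solr/mycore/replication?command=backup"

This snapshots the current index into a snapshot directory next to the index dir without taking the core offline.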
Re: Dynamic schema design: feedback requested
On Mon, Mar 11, 2013 at 2:50 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

: The cost is the user complexity of understanding what changes are
: respected and when

There is going to be a cost to understanding any feature. This doesn't deal with the answer to the question "are we better off with or without this feature".

: , and the implementation complexity of dealing with changes coming from
: multiple code paths (both files changed on disk and REST based request
: changes)

Right - and these should be quantifiable going forward. In ZK mode, we need concurrency control anyway, so depending on the design, there may be really no cost at all. In local FS mode, it might be a very low cost (simply check the timestamp on the file, for example). Code to re-read the schema and merge changes needs to be there anyway for cloud mode, it seems.

*If* we needed to, we could just assert that the schema file is the persistence mechanism, as opposed to the "system of record", hence if you hand edit it and then use the API to change it, your hand edit may be lost. Or we may decide to do away with local FS mode altogether. I guess my main point is, we shouldn't decide a priori that using the API means you can no longer hand edit.

My thoughts on this are probably heavily influenced by how I initially envisioned the implementation working in cloud mode (which I thought about first, since it's harder). A human readable file on ZK that represents the system of record for the schema seemed to be the best. I never even considered making it non-human readable (and thus non-editable by hand).

-Yonik
http://lucidworks.com
Re: Solr replication takes long time
Thanks for your answer, Mark. I think I'll try to update to 4.2. I'll keep you updated.

Anyway, I'd not say that the full index is replicated; I've been monitoring the replication process in the Solr admin console, and there I see that usually not more than 50-100 files are transferred, and the total size is rarely greater than 50MB. Is this info trustworthy?

Victor
RE: Need help with delta import
This is absolutely a syntax error; I had the same problem, and with dih.delta.id it solved all my problems. Thanks to God and the special person who posted the answer on this page. You have to revise the syntax in your queries for delta import and watch the catalina (I use Tomcat) log file for any errors.

Regards,
question about syntax for multiple terms in filter query
hello everyone,

I have a question on the filter query syntax for multiple terms, after reading this: http://wiki.apache.org/solr/CommonQueryParameters#fq

I see from the above that two (2) syntax constructs are supported:

  fq=term1:foo&fq=term2:bar

and

  fq=+term1:foo +term2:bar

Is there a reason why I would want to use one syntax over the other? Does the first syntax support the AND operand as well as the &&?

thx
mark
PostingsHighlighter and analysis
debug=timing has told me for a very long time that 99% of my query time for slow queries is in the highlighting component, so I've been eagerly awaiting the PostingsHighlighter for quite some time. Mean query times are 50ms or less, with certain queries able to generate 30s worth of highlighting. Now that it's here, I've been somewhat disappointed, since I can't use it: so many common analyzers emit tokens out of order, which, apparently, is not compatible with storeOffsetsWithPositions. The only analyzer on the bad list according to LUCENE-4641 that is really critical to our searches is the WordDelimiter filter. My current index-time filter config (which I believe has been unchanged for me for 5+ years):

  <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1" generateWordParts="1"
          generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>

Does anyone have any suggestions to deal with this? Perhaps limiting certain options will always produce tokens in order?

Thanks

Trey Hyde
Director of Engineering
Email th...@centraldesktop.com
Central Desktop
Re: question about syntax for multiple terms in filter query
Hello Mark, I think fq=+term1:foo +term2:bar doesn't actually result in 2 filters being created/used, while fq=term1:foo&fq=term2:bar does. Otis -- Solr & ElasticSearch Support http://sematext.com/ On Mon, Mar 11, 2013 at 4:41 PM, geeky2 gee...@hotmail.com wrote: hello everyone, i have a question on the filter query syntax for multiple terms, after reading this: http://wiki.apache.org/solr/CommonQueryParameters#fq i see from the above that two (2) syntax constructs are supported: fq=term1:foo&fq=term2:bar and fq=+term1:foo +term2:bar is there a reason why i would want to use one syntax over the other? does the first syntax support the AND operand as well as the &&? thx mark -- View this message in context: http://lucene.472066.n3.nabble.com/question-about-syntax-for-multiple-terms-in-filter-query-tp4046442.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: question about syntax for multiple terms in filter query
Point number 3 from that wiki says it all: "3. The document sets from each filter query are cached independently. Thus, concerning the previous examples: use a single fq containing two mandatory clauses if those clauses appear together often, and use two separate fq params if they are relatively independent." FWIW, there is no "&" operator in Lucene/Solr query syntax. There is the "&&" operator, which is equivalent to AND, but each of the ampersands must be URL-encoded as %26 to use them in a query in a URL. So, yes, you can use the AND operator, as: fq=term1:foo AND fq=term2:bar or fq=term1:foo %26%26 fq=term2:bar Note that this is not valid in a URL: fq=term1:foo && fq=term2:bar It must be written as: fq=term1:foo&fq=term2:bar The "&" marks the start of a new query parameter - but that is "query" in the sense of the URL query, not a Solr query. The "&" must be immediately followed by the parameter name and an "=". -- Jack Krupansky -Original Message- From: geeky2 Sent: Monday, March 11, 2013 4:41 PM To: solr-user@lucene.apache.org Subject: question about syntax for multiple terms in filter query hello everyone, i have a question on the filter query syntax for multiple terms, after reading this: http://wiki.apache.org/solr/CommonQueryParameters#fq i see from the above that two (2) syntax constructs are supported: fq=term1:foo&fq=term2:bar and fq=+term1:foo +term2:bar is there a reason why i would want to use one syntax over the other? does the first syntax support the AND operand as well as the &&? thx mark -- View this message in context: http://lucene.472066.n3.nabble.com/question-about-syntax-for-multiple-terms-in-filter-query-tp4046442.html Sent from the Solr - User mailing list archive at Nabble.com.
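A concrete illustration of the encoding point, as curl commands; the host, core, and q value are placeholders, not details from the thread:

    # two separate filters, each cached independently; the bare & separates
    # URL parameters
    curl "http://localhost:8983/solr/select?q=*:*&fq=term1:foo&fq=term2:bar"

    # one filter with two mandatory clauses; '+' must be encoded as %2B and
    # the space as %20, or they will be eaten by URL decoding
    curl "http://localhost:8983/solr/select?q=*:*&fq=%2Bterm1:foo%20%2Bterm2:bar"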
Re: How to set Configuration setting for search
Hello Deepshikha, No need for regular expressions - once you index some data, try using keywords... like Google. :) Otis -- Solr & ElasticSearch Support http://sematext.com/ On Mon, Mar 11, 2013 at 6:05 AM, Deepshikha Raghav raghavd...@in.ibm.com wrote: Hi Team, In Solr, how do we set FREE TEXT SEARCH configuration? Is there any regular expression setting that I can configure to obtain search results? With Warm Regards Deepshikha Raghav IBM, Gurgaon --- Mobile: +91-8800140037
Re: Upgrade Solr3.5 to Solr4.1 - Index Reformat ?
Thanks Shawn. So if we have new segments in 4.1 format and all old files in 3.5 format at the same time, will it cause any performance degradation on the slaves while reading index files (which will contain both 3.5-formatted and 4.1-formatted files)? -- View this message in context: http://lucene.472066.n3.nabble.com/Upgrade-Solr3-5-to-Solr4-1-Index-Reformat-tp4046391p4046469.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Upgrade Solr3.5 to Solr4.1 - Index Reformat ?
Thanks Tomas! I see the latest available version is 4.1, but you have suggested 4.2 - where can I grab the 4.2 version from? -- View this message in context: http://lucene.472066.n3.nabble.com/Upgrade-Solr3-5-to-Solr4-1-Index-Reformat-tp4046391p4046471.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Dynamic schema design: feedback requested
: we needed to, we could just assert that the schema file is the : persistence mechanism, as opposed to the system of record, hence if : you hand edit it and then use the API to change it, your hand edit may : be lost. Or we may decide to do away with local FS mode altogether.

presuming that it's just a persistence mechanism, but also assuming that the user may edit it directly, still creates burdens/complexity around when solr reads/writes that file -- even if we say that user edits to that file might be overridden (ie: does solr guarantee if/when the file will be written to if you use the REST api to modify things? -- that's going to be important if we let people read/edit that file)

: I guess my main point is, we shouldn't decide a priori that using the : API means you can no longer hand edit.

and my point is that when we build a feature where solr has the ability to read/write some piece of information, we should start with the assumption that it's OK for us to decide that a priori, and not walk into things assuming we have to support a lot of much more complicated use cases. if at some point during the implementation we find that supporting a more lax "it's ok, you can edit this by hand" approach won't be a burden, then so be it -- we can relax that a priori assertion.

: My thoughts on this are probably heavily influenced on how I initially

my thoughts on this are based directly on: A) the observations of the confusion & implementation complexity in the dual nature of solr.xml over the years. B) having spent a lot of time maintaining code that did programmatic reading/writing of solr schema.xml files while also trying to treat them as config files that users were allowed to hand edit -- it's a pain in the ass.

: envisioned implementation working in cloud mode (which I thought about : first since it's harder). A human readable file on ZK that represents : the system of record for the schema seemed to be the best. I never

1) i never said the data couldn't/shouldn't be human readable -- i said it should be an implementation detail (ie: subject to change automatically on upgrade just like the index format), and that end users shouldn't be allowed to edit it arbitrarily

2) cloud mode, as i understand it, is actually much *easier* (if you want to allow arbitrary user edits to these files) because you can set ZK watches on those nodes, so any code that is maintaining internal state based on them (ie: REST API round trip serialization code that just read the file in to modify the DOM before writing it back out) can be notified if the file has changed. I also believe i was told that writes to files in ZK are atomic, which also means you never have to worry about reading partial data in the middle of someone else's write. in the general situation of config files on disk we can't even try to enforce a lock file type approach, because we shouldn't assume a user will remember to obey our locks before editing the file.

If you & sarowe & others feel that: 1) it's important to allow arbitrary user editing of schema.xml files in zk mode even when REST read/writes are enabled 2) allowing arbitrary user edits w/o risk of conflict or complexity in the REST read/write code is easy to implement in ZK mode 3) it's reasonable to require ZK mode in order to support read/write mode in the REST API ...then that would certainly resolve my concerns stemming from (B) above. i'm still worried about (A), but perhaps the ZK nature of things and the watches & atomicity provided there will reduce confusion.

But as long as we are talking about this REST api supporting reads & writes to schema info even when running in single node mode with files on disk -- i think it is a *HUGE* fucking mistake to start with the assumption that the serialization mechanism of the REST api needs to be able to play nicely with arbitrary user editing of schema.xml. -Hoss
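The ZK-watch mechanism referred to above can be sketched with the plain ZooKeeper client API. This is a minimal illustration; the connection string and znode path are assumptions, not anything Solr-specific:

    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    public class SchemaWatch {
        public static void main(String[] args) throws Exception {
            // Connection string is a placeholder.
            ZooKeeper zk = new ZooKeeper("localhost:2181", 10000, new Watcher() {
                public void process(WatchedEvent event) { /* session events */ }
            });
            Watcher schemaWatch = new Watcher() {
                public void process(WatchedEvent event) {
                    // Fires once when the znode's data changes; a real client
                    // would re-read the node and re-register the watch here.
                    System.out.println("schema node changed: " + event.getType());
                }
            };
            byte[] data = zk.getData("/configs/collection1/schema.xml", schemaWatch, null);
            System.out.println("read " + data.length + " bytes");
        }
    }

The write-atomicity point is the other half: a zk.setData() call replaces a node's contents in one step, so a reader never observes a half-written file the way it can with a file on local disk.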
Re: Some nodes have all the load
The load test was fairly heavy (ie lots of users) and designed to mimic a fully operational system with lots of users doing normal things. There were two things I gleaned from the logs: "PERFORMANCE WARNING: Overlapping onDeckSearchers=2" appeared for several of my more active cores, and the non-leaders were throwing errors saying that the leader was not responding while trying to forward updates. (sorry, can't find that specific error now) My best guess is that it has something to do with the commits. a. frequent user-generated writes using /update?commitWithin=500&waitFlush=false&waitSearcher=false b. softCommit set to 3000 c. autoCommit set to 300,000 and openSearcher false d. I'm also doing frequent periodic DIH updates. I guess this is commit=true by default. Should I omit commitWithin and set DIH to commit=false and just let soft and autocommit do their jobs? Cheers, Jim -- View this message in context: http://lucene.472066.n3.nabble.com/Some-nodes-have-all-the-load-tp4046349p4046476.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Upgrade Solr3.5 to Solr4.1 - Index Reformat ?
On 3/11/2013 3:39 PM, feroz_kh wrote: Thanks Shawn. So if we have new segments in 4.1 format and all old files in 3.5 format at the same time, then will it cause any performance degradation on slaves while reading index files ( which will contain both 3.5 formatted and 4.1 formatted files)? There should be no performance degradation. Solr 4.1 should perform at least as well as 3.5 and in many cases it will perform better. Your index on disk will get smaller when converted to 4.1 format, and may become faster. Thanks, Shawn
Re: Upgrade Solr3.5 to Solr4.1 - Index Reformat ?
On 3/11/2013 3:43 PM, feroz_kh wrote: Thanks Tomas! I see the latest available version is 4.1 - but you have suggested a 4.2 version, where can i grab 4.2 version from? It is already accessible from many mirrors. Because it is not yet accessible from a large enough percentage of mirrors, the URL hasn't been updated on the main website yet. Here is the URL: http://www.apache.org/dyn/closer.cgi/lucene/solr/4.2.0 If the mirror that gets chosen for you automatically does not yet have it, just try another mirror. There is no information on the download list about where each mirror is, so you'll just have to guess, or look them up to see where they are. Thanks, Shawn
Re: [Beginner] wants to contribute in open source project
: This article I wrote about getting started contributing to projects may give you some ideas. : : http://blog.smartbear.com/software-quality/bid/167051/14-Ways-to-Contribute-to-Open-Source-without-Being-a-Programming-Genius-or-a-Rock-Star Or perhaps even the followup I did to Andy's article, layering his advice directly onto Solr... http://searchhub.org/2012/03/26/14-ways-to-contribute-to-solr/ -Hoss
Re: How to Integrate Solr With Hbase
We have the same kind of scenario in our application. The way we achieve it is with a batch process that reads the data from HBase using the HBase API and writes it to Solr using the SolrJ API. Thanks Bharat On Mon, Mar 11, 2013 at 5:38 AM, kamaci furkankam...@gmail.com wrote: I have crawled data into HBase with Nutch. How can I use Solr to index the data in HBase? (If there is any solution from the Nutch side, that is welcome too.) PS: I am new to these kinds of technologies, and I run Solr from under the example folder via start.jar -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-Integrate-Solr-With-Hbase-tp4046297.html Sent from the Solr - User mailing list archive at Nabble.com.
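A minimal sketch of such a batch process, using the HBase 0.9x-era client and SolrJ 4.x APIs. The table name, column family/qualifier, Solr URL, and field names are assumptions, not details from the thread:

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class HBaseToSolr {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "webpage");   // table name is an assumption
            HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr");

            ResultScanner scanner = table.getScanner(new Scan());
            List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
            for (Result row : scanner) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", Bytes.toString(row.getRow()));
                // column family "f", qualifier "content" are assumptions
                doc.addField("content",
                    Bytes.toString(row.getValue(Bytes.toBytes("f"), Bytes.toBytes("content"))));
                batch.add(doc);
                // send in batches to avoid one giant request
                if (batch.size() == 1000) { solr.add(batch); batch.clear(); }
            }
            if (!batch.isEmpty()) solr.add(batch);
            solr.commit();
            scanner.close();
            table.close();
        }
    }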
Re: Some nodes have all the load
On 3/11/2013 3:52 PM, jimtronic wrote: The load test was fairly heavy (ie lots of users) and designed to mimic a fully operational system with lots of users doing normal things. There were two things I gleaned from the logs: "PERFORMANCE WARNING: Overlapping onDeckSearchers=2" appeared for several of my more active cores, and the non-leaders were throwing errors saying that the leader was not responding while trying to forward updates. (sorry, can't find that specific error now) My best guess is that it has something to do with the commits. a. frequent user-generated writes using /update?commitWithin=500&waitFlush=false&waitSearcher=false b. softCommit set to 3000 c. autoCommit set to 300,000 and openSearcher false d. I'm also doing frequent periodic DIH updates. I guess this is commit=true by default. Should I omit commitWithin and set DIH to commit=false and just let soft and autocommit do their jobs?

I've just located a previous message on this list from Mark Miller saying that in Solr 4, commitWithin is a soft commit. You should definitely wait for Mark or another committer to verify what I'm saying in the small novel I am writing below.

My personal opinion is that you should have frequent soft commits (auto, manual, commitWithin, or some combination) along with less frequent (but not infrequent) autoCommit with openSearcher=false. The autoCommit (which is a hard commit) does two things - ensures that the transaction logs do not grow out of control, and persists changes to disk. If you have auto soft commits and updateLog is enabled, I would say that you are pretty safe using commit=false on your DIH updates.

If Mark agrees with what I have said, and your config/schema checks out OK with expected norms, you may be running into bugs. It might also be a case of not enough CPU/RAM resources for the system load. You never responded in another thread with the output of the 'free' command, or the size of your indexes. Putting 13 busy Solr cores onto one box is overkill, unless the machine has 16-32 CPU cores *and* plenty of fast RAM to cache all your indexes in the OS disk cache. Based on what you're saying here and in the other thread, you probably need a java heap size of 4GB or 8GB, heavily tuned JVM garbage collection options, and depending on the size of your indexes, 16GB may not be enough total system RAM.

IMHO, you should not use trunk (5.0) for anything that you plan to one day run in production. Trunk is very volatile; large-scale changes sometimes get committed with only minimal testing. The dev branch named branch_4x (currently 4.3) is kept reasonably stable almost all of the time. Version 4.2 has just been released - it is already available on the faster mirrors and there should be a release announcement within a day from now. If this is not being set up in anticipation of a production deployment, then trunk would be fine, but bugs are to be expected. If the same problems do not happen in 4.2 or branch_4x, then I would move the discussion to the dev list. Thanks, Shawn
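As a concrete rendering of Shawn's suggestion, a solrconfig.xml sketch using the intervals Jim already listed (3000ms soft, 300,000ms hard); the structure is standard Solr 4.x, but treat the numbers as starting points rather than recommendations:

    <updateHandler class="solr.DirectUpdateHandler2">
      <!-- hard commit: flushes to disk and truncates the transaction log,
           but does not open a new searcher -->
      <autoCommit>
        <maxTime>300000</maxTime>
        <openSearcher>false</openSearcher>
      </autoCommit>
      <!-- soft commit: makes recent documents visible to searches -->
      <autoSoftCommit>
        <maxTime>3000</maxTime>
      </autoSoftCommit>
      <updateLog>
        <str name="dir">${solr.data.dir:}</str>
      </updateLog>
    </updateHandler>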
Re: Upgrade Solr3.5 to Solr4.1 - Index Reformat ?
Thanks Tomas/Shawn! One more question related to backward compatibility. Previously we upgraded our Solr master/slaves from version 1.4 to version 3.5 - we didn't reformat the whole index then, so I believe there will be some files in 1.4 format present in our index. Now when we upgrade from 3.5 to 4.1 or 4.2 - can we expect a Solr 4.x slave to read both 1.4- and 3.5-formatted indices without any issues? Thanks, Feroz -- View this message in context: http://lucene.472066.n3.nabble.com/Upgrade-Solr3-5-to-Solr4-1-Index-Reformat-tp4046391p4046500.html Sent from the Solr - User mailing list archive at Nabble.com.
[ANNOUNCE] Apache Solr 4.2 released
March 2013, Apache Solr™ 4.2 available The Lucene PMC is pleased to announce the release of Apache Solr 4.2. Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly scalable, providing fault tolerant distributed search and indexing, and powers the search and navigation features of many of the world's largest internet sites. Solr 4.2 is available for immediate download at: http://lucene.apache.org/solr/mirrors-solr-latest-redir.html See the CHANGES.txt file included with the release for a full list of details.

Solr 4.2 Release Highlights:

* A read-side REST API for the schema. Always wanted to introspect the schema over http? Now you can. Looks like the write side will be coming next.

* DocValues have been integrated into Solr. DocValues can be loaded up a lot faster than the field cache and can also use different compression algorithms as well as in-RAM or on-disk representations. Faceting, sorting, and function queries all get to benefit. How about the OS handling faceting and sorting caches off heap? No more tuning 60 gigabyte heaps? How about a snappy new per-segment DocValues faceting method? Improved numeric faceting? Sweet.

* Collection Aliasing. Got time-based data? Want to re-index in a temporary collection and then swap it into production? Done. Stay tuned for Shard Aliasing.

* Collection API responses. The collections API was still very new in 4.0, and while it improved a fair bit in 4.1, responses were certainly needed, but missed the cut-off. Initially, we made the decision to make the Collection API super fault tolerant, which made responses tougher to do. No one wants to hunt through log files to see how things turned out. Done in 4.2.

* Interact with any collection on any node. Until 4.2, you could only interact with a node in your cluster if it hosted at least one replica of the collection you wanted to query/update. No longer - query any node, whether it has a piece of your intended collection or not, and get a proxied response.

* Allow custom shard names so that new host addresses can take over for retired shards. Working on Amazon without elastic IPs? This is for you.

* Lucene 4.2 optimizations such as compressed term vectors.

Solr 4.2 also includes many other new features as well as numerous optimizations and bugfixes. Please report any feedback to the mailing lists (http://lucene.apache.org/solr/discussion.html) Note: The Apache Software Foundation uses an extensive mirroring network for distributing releases. It is possible that the mirror you are using may not have replicated the release yet. If that is the case, please try another mirror. This also goes for Maven access. Happy searching, Lucene/Solr developers
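The read-side schema API from the first highlight is plain HTTP; a quick sketch against the stock example core. The host and core name are placeholders, the field name is hypothetical, and the /schema/fields paths follow the 4.2 documentation as I recall it:

    # list all declared fields
    curl "http://localhost:8983/solr/collection1/schema/fields"

    # inspect a single field
    curl "http://localhost:8983/solr/collection1/schema/fields/price"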
Re: Dynamic schema design: feedback requested
On Mon, Mar 11, 2013 at 5:51 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : I guess my main point is, we shouldn't decide a priori that using the : API means you can no longer hand edit. and my point is we should build a feature where solr has the ability to read/write some piece of information, we should start with the asumption that it's OK for us to decide that a priori, and not walk into things assuming we have to support a lot of much more complicated uses cases. if at some point during the implementation we find that supporting a more lax it's ok, you can edit this by hand approach won't be a burden, then so be it -- we can relax that a priori assertion. I guess I like a more breadth-first method (or at least that's what it feels like to me). You keep both options in mind as you proceed, and don't start off with a hard assertion either way. It would be nice to support editing by hand... but if it becomes too burdensome, c'est la vie. If the persistence format we're going to use is nicely human readable, then I'm good. We can disagree on philosophies, but I'm not sure that it amounts to much in the way of concrete differences at this point. What concerned me was talk of starting to treat this as more of a black box. -Yonik http://lucidworks.com
RE: DataDirectory: relative path doesn't work
Thanks for fixing the wiki page http://wiki.apache.org/solr/SolrConfigXml - now it says this: 'If this directory is not absolute, then it is relative to the directory you're in when you start SOLR.' It would be nice if you drop me a line here after you make the change to the document ... -Original Message- From: Patrick Mi [mailto:patrick...@touchpointgroup.com] Sent: Tuesday, 26 February 2013 5:49 p.m. To: solr-user@lucene.apache.org Subject: DataDirectory: relative path doesn't work I am running Solr4.0/Tomcat 7 on Centos6. According to this page http://wiki.apache.org/solr/SolrConfigXml if dataDir is not absolute, then it is relative to the instanceDir of the SolrCore. However, the index directory is always created under the directory where I start Tomcat (startup.sh) rather than under the instanceDir of the SolrCore. Am I doing something wrong in configuration? Regards, Patrick
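For anyone who wants to sidestep the working-directory dependence entirely, an absolute path in solrconfig.xml does that. A sketch - the path itself is an arbitrary example:

    <!-- absolute, so it no longer depends on where Tomcat was started -->
    <dataDir>/var/solr/collection1/data</dataDir>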
Re: Upgrade Solr3.5 to Solr4.1 - Index Reformat ?
On 3/11/2013 5:59 PM, feroz_kh wrote: One more question related to backward compatibilty. Previously we had upgraded our solr master/slaves from 1.4 version to 3.5 version - We didn't reformat the whole index then. So i believe there will be some files with 1.4 format present in our index. Now when we upgrade from 3.5 to 4.1/or4.2 - Can we expect solr slave version 4.x to read both 1.4 and 3.5 formatted indices, without any issues ? If you think that you've got index files from 1.4 still hanging around, you should optimize the indexes in 3.5 before upgrading further, to convert the index. The new version will NOT read index segments that old. Thanks, Shawn
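A sketch of forcing that conversion on a 3.5 master before the upgrade, by sending an explicit optimize command (host and core are placeholders):

    curl "http://localhost:8983/solr/update" -H "Content-Type: text/xml" \
         --data-binary "<optimize/>"

After the optimize, the index consists of freshly written segments in the 3.5 format, which Solr 4.x can still read.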
Re: Some nodes have all the load
On Mar 11, 2013, at 5:52 PM, jimtronic jimtro...@gmail.com wrote: Should I omit commitWithin and set DIH to commit=false and just let soft and autocommit do their jobs? Yeah, that's one valid option. You def are not able to keep up with the current commit / open searcher level. It looks like DIH will do a hard commit which will likely open a new searcher as well - that's not good - you should stick to soft commits and the infrequent hard commit. Then the commitWithin is fairly aggressive at 500ms. Whether or not you can keep up with this varies with a lot of factors and features and settings - clearly you are not currently able to keep up. - Mark
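In practice that means invoking DIH with the explicit commit turned off and letting the autocommit settings handle visibility; a sketch, with host and core as placeholders:

    curl "http://localhost:8983/solr/dataimport?command=delta-import&commit=false"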
Re: Some nodes have all the load
On Mar 11, 2013, at 7:47 PM, Shawn Heisey s...@elyograg.org wrote:

I've just located a previous message on this list from Mark Miller saying that in Solr 4, commitWithin is a soft commit.

Yes, that's true.

You should definitely wait for Mark or another committer to verify what I'm saying in the small novel I am writing below. My personal opinion is that you should have frequent soft commits (auto, manual, commitWithin, or some combination) along with less frequent (but not infrequent) autoCommit with openSearcher=false. The autoCommit (which is a hard commit) does two things - ensures that the transaction logs do not grow out of control, and persists changes to disk. If you have auto soft commits and updateLog is enabled, I would say that you are pretty safe using commit=false on your DIH updates.

Right.

If Mark agrees with what I have said, and your config/schema checks out OK with expected norms, you may be running into bugs. It might also be a case of not enough CPU/RAM resources for the system load. You never responded in another thread with the output of the 'free' command, or the size of your indexes. Putting 13 busy Solr cores onto one box is overkill, unless the machine has 16-32 CPU cores *and* plenty of fast RAM to cache all your indexes in the OS disk cache. Based on what you're saying here and in the other thread, you probably need a java heap size of 4GB or 8GB, heavily tuned JVM garbage collection options, and depending on the size of your indexes, 16GB may not be enough total system RAM. IMHO, you should not use trunk (5.0) for anything that you plan to one day run in production. Trunk is very volatile, large-scale changes sometimes get committed with only minimal testing. The dev branch named branch_4x (currently 4.3) is kept reasonably stable almost all of the time. Version 4.2 has just been released - it is already available on the faster mirrors and there should be a release announcement within a day from now. If this is not being set up in anticipation for a production deployment, then trunk would be fine, but bugs are to be expected. If the same problems do not happen in 4.2 or branch_4x, then I would move the discussion to the dev list. Thanks, Shawn
SolrException: Error opening new searcher
I am running into issues where my Solr instance is behaving weirdly. After I get the SolrException "Error opening new searcher", my Solr instance fails to handle even the simplest of update requests. http://lucene.472066.n3.nabble.com/exceeded-limit-of-maxWarmingSearchers-td494732.html I have found some suggestions indicating that I am making more Solr commit requests than my instance can handle, though I am unsure of the way forward. What is really annoying is that I seem to have to restart my Solr instance (service tomcat7 restart) to get things working again. I am very concerned about this behaviour, as it seems that if I were to get a spike in demand, the whole instance could fall down. Any suggestions on the way forward?

    14:30:00 SEVERE SolrCore org.apache.solr.common.SolrException: Error opening new searcher
    14:30:00 SEVERE SolrDispatchFilter null:org.apache.solr.common.SolrException: Error opening new searcher

    -- On Tomcat Solr restart --

    14:31:19 WARNING UpdateLog Starting log replay tlog{file=/opt/solr/instances/solr/collection1/data/tlog/tlog.0017502 refcount=2} active=false starting pos=0
    14:31:20 WARNING UpdateLog Log replay finished. recoveryInfo=RecoveryInfo{adds=2 deletes=0 deleteByQuery=0 errors=0 positionOfStart=0}

-- View this message in context: http://lucene.472066.n3.nabble.com/SolrException-Error-opening-new-searcher-tp4046543.html Sent from the Solr - User mailing list archive at Nabble.com.
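One common way out of the "exceeded limit of maxWarmingSearchers" spiral is to stop issuing an explicit commit per update and let commitWithin (or autocommit) space the searcher openings out. A SolrJ sketch of that pattern; the URL and field names are placeholders:

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class GentleIndexer {
        public static void main(String[] args) throws Exception {
            SolrServer solr = new HttpSolrServer("http://localhost:8080/solr/collection1");
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-1");
            doc.addField("title", "example");
            // Ask Solr to make the doc searchable within 10 seconds rather
            // than calling solr.commit() per request; overlapping explicit
            // commits are what pile up warming searchers.
            solr.add(doc, 10000); // commitWithin, in milliseconds
        }
    }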
Solr _docid_ parameter
In Solr, I noticed that I can sort by the internal Lucene _docid_: http://wiki.apache.org/solr/CommonQueryParameters "You can sort by index id using sort=_docid_ asc or sort=_docid_ desc" I have also read that the docid is represented by a sequential number: http://lucene.472066.n3.nabble.com/Get-DocID-after-Document-insert-td556278.html "Your document IDs may change, and in fact *will* change if you delete a document and then optimize. Say you index 100 docs, delete number 50 and optimize. Documents that originally had IDs 51-100 will now have IDs 50-99 and your hierarchy will be messed up." http://www.garethfreeman.com/2011/11/sorting-results-by-order-indexed-in.html "Just a quick one. If you are looking to sort your Solr results by the order they were indexed, you can use sort=_docid_ asc or sort=_docid_ desc as your sorting query parameter." So there is a slight chance that the _docid_ might represent document creation order. Does anyone have knowledge and experience with the internals of Solr/Lucene 4.x and the _docid_ field to clarify this? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-docid-parameter-tp4046544.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Custom update handler?
Many thanks. Let me record here what I have tried. I have viewed: http://wiki.apache.org/solr/UpdateXmlMessages and this github project, which is suggestive: https://github.com/industria/solrprocessors

I now have two updateRequestProcessorChains:

    <updateRequestProcessorChain name="harvest" default="true">
      <processor class="solr.RunUpdateProcessorFactory"/>
      <processor class="org.apache.solr.update.TopicQuestsDocumentProcessFactory">
        <str name="inputField">hello</str>
      </processor>
      <processor class="solr.LogUpdateProcessorFactory"/>
    </updateRequestProcessorChain>

and the new one (which is harvest without the TopicQuestsDocumentProcessFactory):

    <updateRequestProcessorChain name="partial" default="false">
      <processor class="solr.RunUpdateProcessorFactory"/>
      <processor class="solr.LogUpdateProcessorFactory"/>
    </updateRequestProcessorChain>

Before I added partial to <requestHandler name="/update" class="solr.XmlUpdateRequestHandler">, harvest always ran using http://localhost:8983/solr as the base URL. A goal was to use harvest only for updates and use partial for partial updates. I am now feeding partial with this code:

    UpdateRequest ur = new UpdateRequest();
    ur.add(document);
    ur.setCommitWithin(1000);
    UpdateResponse response = ur.process(updateServer);

where updateServer is a second SolrJ server set to http://localhost:8983/solr/update

But what is now happening, after I made this addition:

    <requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
      <lst name="defaults">
        <str name="update.chain">partial</str>
      </lst>
    </requestHandler>

dropping partial into /update where nothing was there before, is that just partial runs from the base URL and harvest is never called, which means that I never see partial updates to validate that part of the code. At issue is this: I have two update pathways - one for when I am adding new documents, and one for when I am performing partial updates. May I ask how I can configure my system to use harvest for new documents and partial when partial updates are sent in? Many thanks Jack

On Mon, Mar 11, 2013 at 12:23 AM, Upayavira u...@odoko.co.uk wrote: You need to refer to your chain in a RequestHandler config. Search for /update, duplicate that, and change the chain it points to. ...
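Reading Upayavira's earlier advice together with this config, the piece that seems to be missing is a second request handler: with only one /update handler whose default chain is partial, every client posting to /update gets the partial chain. A sketch of the two-handler arrangement; the second handler's name and the routing are assumptions, not the thread's confirmed solution:

    <!-- normal adds go through the harvest chain -->
    <requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
      <lst name="defaults">
        <str name="update.chain">harvest</str>
      </lst>
    </requestHandler>

    <!-- partial updates go through the partial chain -->
    <requestHandler name="/update/partial" class="solr.XmlUpdateRequestHandler">
      <lst name="defaults">
        <str name="update.chain">partial</str>
      </lst>
    </requestHandler>

On the SolrJ side, rather than building a second server whose base URL names a handler (the base URL should name the core, which is why handler-shaped URLs come back 404), the request itself can be pointed at the second handler:

    UpdateRequest ur = new UpdateRequest();
    ur.setPath("/update/partial"); // route to the second handler
    ur.add(document);
    ur.setCommitWithin(1000);
    UpdateResponse response = ur.process(server); // server on http://localhost:8983/solr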
Re: question about syntax for multiple terms in filter query
otis and jack - thank you VERY much for the feedback - jack - "use a single fq containing two mandatory clauses if those clauses appear together often" - this is the use case i have to account for - eg, right now i have this in my request handler:

    <requestHandler name="partItemNoSearch" class="solr.SearchHandler" default="false">
      ...
      <str name="fq">itemType:1</str>
      ...
    </requestHandler>

which says - i only want parts. but i need to augment the filter so only parts that have a price >= 1.0 are returned from the request handler, so i believe i need to have this in the RH:

    <requestHandler name="partItemNoSearch" class="solr.SearchHandler" default="false">
      ...
      <str name="fq">+itemType:1 +sellingPrice:[1 TO *]</str>
      ...
    </requestHandler>

thx mark -- View this message in context: http://lucene.472066.n3.nabble.com/question-about-syntax-for-multiple-terms-in-filter-query-tp4046442p4046548.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Custom update handler? Some progress, new issue
Further progress is now hampered by configuring an update log. When I follow instructions found around the web, I get this:

    SEVERE: Unable to create core: collection1
    Caused by: java.lang.NullPointerException
        at org.apache.solr.common.params.SolrParams.toSolrParams(SolrParams.java:295)

The updateLog is configured thus:

    <requestHandler name="/update/partial" class="solr.BinaryUpdateRequestHandler">
      <lst name="defaults">
        <str name="update.chain">partial</str>
      </lst>
      <updateLog class="solr.FSUpdateLog">
        <str name="dir">${solr.data.dir:}</str>
      </updateLog>
    </requestHandler>

I think the issue lies with solr.data.dir. The wikis just say to drop that into the request handler chain, without any explanation of where solr.data.dir comes from. In any case, I might have successfully settled on how to choose which update chain, but now I am deep into the bowels of update logs. What am I missing? Many thanks Jack

On Mon, Mar 11, 2013 at 9:45 PM, Jack Park jackp...@topicquests.org wrote: Many thanks. Let me record here what I have tried. ...
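For what it's worth on the update-log question: in the stock Solr 4.x example solrconfig.xml, updateLog is a child of updateHandler, not of any requestHandler, and solr.data.dir is simply a system property that falls back to the core's data directory when unset. A sketch of the conventional placement, following the stock example layout rather than anything confirmed in this thread:

    <updateHandler class="solr.DirectUpdateHandler2">
      <updateLog>
        <!-- ${solr.data.dir:} resolves the solr.data.dir system property,
             falling back to the core's own data dir when it is unset -->
        <str name="dir">${solr.data.dir:}</str>
      </updateLog>
    </updateHandler>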