Re: Replicating Between Solr Clouds
Unless Solr is your system of record, aren't you already replicating your source data across the WAN? If so, could you load Solr in colo B from your colo B data source? You may be duplicating some indexing work, but at least your colo B Solr would be more closely in sync with your colo B data.

Toby
Sent via BlackBerry by AT&T

-----Original Message-----
From: Tim Potter tim.pot...@lucidworks.com
Date: Wed, 5 Mar 2014 02:51:21
To: solr-user@lucene.apache.org
Reply-To: solr-user@lucene.apache.org
Subject: RE: Replicating Between Solr Clouds

Unfortunately, there is no out-of-the-box solution for this at the moment. In the past, I solved this using a couple of different approaches, which weren't all that elegant but served the purpose and were simple enough to allow the ops folks to set up monitors and alerts if things didn't work.

1) Use DIH's SolrEntityProcessor to pull data from one Solr to another; see http://wiki.apache.org/solr/DataImportHandler#SolrEntityProcessor. This only works if you store all fields, which in my use case was OK because I also did lots of partial document updates, which likewise required me to store all fields.

2) Use the replication handler's snapshot support to create snapshots on a regular basis and then move the files over the network. This works, but it required read and write aliases and two collections in the remote (slave) data center, so that I could rebuild my write collection from the snapshots and then update the aliases to point reads at the updated collection.

Work on an automated backup/restore solution is planned (see https://issues.apache.org/jira/browse/SOLR-5750), but if you need something sooner, you can write a backup driver using SolrJ that uses CloudSolrServer to get the addresses of all the shard leaders, initiates the backup command on each leader, polls the replication details handler for snapshot completion on each shard, and then ships the files across the network.
Obviously, this isn't a solution for NRT multi-homing ;-) Lastly, these aren't the only ways to go about this; I just wanted to share some high-level details about what has worked.

Timothy Potter
Sr. Software Engineer, LucidWorks
www.lucidworks.com

From: perdurabo robert_par...@volusion.com
Sent: Tuesday, March 04, 2014 1:04 PM
To: solr-user@lucene.apache.org
Subject: Replicating Between Solr Clouds

We are looking to set up a highly available failover site across a WAN for our SolrCloud instance. The main production instance is at colo center A and consists of a 3-node ZooKeeper ensemble managing configs for a 4-node SolrCloud running Solr 4.6.1. We have only one collection among the 4 cores, and there are two shards in the collection, with one leader node and one replica node for each shard. Our search and indexing services address the SolrCloud through a load balancer VIP, not a compound API call.

Anyway, the Solr wiki explains fairly well how to replicate single-node Solr collections, but I do not see an obvious way to replicate a SolrCloud's indices over a WAN to another SolrCloud. I need a SolrCloud in another data center to be able to replicate both shards of the collection from the other data center over a WAN. It needs to be able to replicate from a load balancer VIP, not a single named server of the SolrCloud, which round-robins across all four nodes/2 shards for high availability. I've searched high and low for a white paper or some discussion of how to do this and haven't found anything. Any ideas? Thanks in advance.

--
View this message in context: http://lucene.472066.n3.nabble.com/Replicating-Between-Solr-Clouds-tp4121196.html
Sent from the Solr - User mailing list archive at Nabble.com.
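The backup driver Tim outlines could look roughly like the sketch below, written against the Solr 4.x SolrJ API. This is an untested illustration, not his actual code: the ZooKeeper hosts and collection name are made up, and the polling/file-shipping steps are left as comments.

```java
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.cloud.Replica;
import org.apache.solr.common.cloud.Slice;
import org.apache.solr.common.cloud.ZkStateReader;
import org.apache.solr.common.params.ModifiableSolrParams;

public class SnapshotDriver {
    public static void main(String[] args) throws Exception {
        String collection = "collection1";  // hypothetical collection name
        // Made-up ZooKeeper connect string for the source cluster.
        CloudSolrServer cloud = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        cloud.connect();
        ZkStateReader zk = cloud.getZkStateReader();
        for (Slice slice : zk.getClusterState().getActiveSlices(collection)) {
            // Find each shard leader and build its core URL from cluster state.
            Replica leader = slice.getLeader();
            String coreUrl = leader.getStr(ZkStateReader.BASE_URL_PROP)
                    + "/" + leader.getStr(ZkStateReader.CORE_NAME_PROP);
            HttpSolrServer core = new HttpSolrServer(coreUrl);
            // Ask the leader's replication handler to snapshot its index.
            ModifiableSolrParams params = new ModifiableSolrParams();
            params.set("command", "backup");
            QueryRequest req = new QueryRequest(params);
            req.setPath("/replication");
            core.request(req);
            // TODO: poll /replication?command=details until the snapshot shows
            // as complete, then ship the snapshot.* directory across the WAN
            // and restore it in the remote data center.
            core.shutdown();
        }
        cloud.shutdown();
    }
}
```

This only orchestrates the snapshot step; monitoring completion and moving the files is the part Tim suggests wiring into ops alerts.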
Re: to reduce indexing time
I believe SolrJ uses XML under the covers. If so, I don't think you would improve performance by switching to SolrJ, since the client would convert it to XML before sending it on the wire.

Toby

***
Toby Lazar
Capital Technology Group
Email: tla...@capitaltg.com
Mobile: 646-469-5865
***

On Wed, Mar 5, 2014 at 3:25 PM, Ahmet Arslan iori...@yahoo.com wrote:

Hi,

One thing to consider: I think SolrNet uses XML updates, and there is XML-parsing overhead with them. Switching to SolrJ or CSV could bring an additional gain. http://wiki.apache.org/lucene-java/ImproveIndexingSpeed

Ahmet

On Wednesday, March 5, 2014 10:13 PM, sweety sweetyshind...@yahoo.com wrote:

I will surely read about JVM garbage collection. Thanks a lot, all of you. But is the time required for my indexing good enough? I don't know about the ideal timings. I think that my indexing is taking more time.

--
View this message in context: http://lucene.472066.n3.nabble.com/to-reduce-indexing-time-tp4121391p4121483.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: to reduce indexing time
Thanks, Ahmet, for the correction. I used Wireshark to capture an UpdateRequest to Solr and saw this XML:

<add><doc boost="1.0"><field name="caseID">123</field><field name="caseName">blah</field></doc></add>

and figured that javabin was only for the responses. Does wt apply to how SolrJ sends requests to Solr? Could this HTTP content be in javabin format?

Toby

On Wed, Mar 5, 2014 at 4:34 PM, Ahmet Arslan iori...@yahoo.com wrote:

Hi Toby,

SolrJ uses javabin by default.

Ahmet

On Wednesday, March 5, 2014 11:31 PM, Toby Lazar tla...@capitaltg.com wrote:

I believe SolrJ uses XML under the covers. If so, I don't think you would improve performance by switching to SolrJ, since the client would convert it to XML before sending it on the wire.

Toby

***
Toby Lazar
Capital Technology Group
Email: tla...@capitaltg.com
Mobile: 646-469-5865
***

On Wed, Mar 5, 2014 at 3:25 PM, Ahmet Arslan iori...@yahoo.com wrote:

Hi,

One thing to consider: I think SolrNet uses XML updates, and there is XML-parsing overhead with them. Switching to SolrJ or CSV could bring an additional gain. http://wiki.apache.org/lucene-java/ImproveIndexingSpeed

Ahmet

On Wednesday, March 5, 2014 10:13 PM, sweety sweetyshind...@yahoo.com wrote:

I will surely read about JVM garbage collection. Thanks a lot, all of you. But is the time required for my indexing good enough? I don't know about the ideal timings. I think that my indexing is taking more time.

--
View this message in context: http://lucene.472066.n3.nabble.com/to-reduce-indexing-time-tp4121391p4121483.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: to reduce indexing time
OK, I was using HttpSolrServer since I haven't yet migrated to CloudSolrServer. I added the line

solrServer.setRequestWriter(new BinaryRequestWriter());

after creating the server object, and now I see the difference through Wireshark. Is it fair to assume that this usage is multi-thread safe?

Thank you, Shawn and Ahmet,
Toby

***
Toby Lazar
Capital Technology Group
Email: tla...@capitaltg.com
Mobile: 646-469-5865
***

On Wed, Mar 5, 2014 at 4:46 PM, Shawn Heisey s...@elyograg.org wrote:

On 3/5/2014 2:31 PM, Toby Lazar wrote:

I believe SolrJ uses XML under the covers. If so, I don't think you would improve performance by switching to SolrJ, since the client would convert it to XML before sending it on the wire.

Until recently, SolrJ always used XML by default for requests and javabin for responses. That is moving to javabin for both; this is already the case in the newest versions for CloudSolrServer. HttpSolrServer still uses the XML RequestWriter by default, but you can change this very easily to BinaryRequestWriter. If you plan to use SolrJ, it's a change I would highly recommend.

Thanks,
Shawn
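Put together, the switch Shawn describes is a one-line change right after constructing the server. A minimal sketch against the Solr 4.x SolrJ API follows; the core URL is invented, and the field names are borrowed from the Wireshark capture earlier in the thread:

```java
import org.apache.solr.client.solrj.impl.BinaryRequestWriter;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BinaryIndexer {
    public static void main(String[] args) throws Exception {
        // Made-up core URL; point this at your own core.
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        // Send updates as javabin instead of XML. Set the writer once, right
        // after construction; the configured instance can then be shared
        // across indexing threads.
        server.setRequestWriter(new BinaryRequestWriter());

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("caseID", "123");
        doc.addField("caseName", "blah");
        server.add(doc);
        server.commit();
        server.shutdown();
    }
}
```

The intent is to configure the writer before the instance is published to other threads, which matches the multi-thread usage Toby asks about.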
Re: Facet field query on subset of documents
Luis (or anyone else),

Did you ever find a solution for this problem? If not, is querying twice the way to go? I'm looking to do the same with no luck yet.

Thanks,
Toby

***
Toby Lazar
Capital Technology Group
Email: tla...@capitaltg.com
Mobile: 646-469-5865
***

On Thu, Nov 21, 2013 at 5:44 PM, Luis Lebolo luis.leb...@gmail.com wrote:

Hi Erick,

Thanks for the reply, and sorry, my fault, I wasn't clear enough. I was wondering if there was a way to remove terms that would always be zero (because the term came from a document that didn't match the filter query). Here's an example. I have a bunch of documents with the fields 'manufacturer' and 'location'. If I set my filter query to manufacturer = Sony and none of the Sony documents had a value of 'Florida' for location, then I want 'Florida' NOT to show up in my facet field results. Instead, it shows up with a count of zero (and it'll always be zero because of my filter query). Using mincount = 1 doesn't solve my problem because I don't want it to hide zeroes that came from documents that actually pass my filter query. Does that make more sense?

On Thu, Nov 21, 2013 at 4:36 PM, Erick Erickson erickerick...@gmail.com wrote:

That's what faceting does. The facets are only tabulated for documents that satisfy the query, including all of the filter queries and any other criteria. Otherwise, facet counts would be the same no matter what the query was. Or I'm completely misunderstanding your question...

Best,
Erick

On Thu, Nov 21, 2013 at 4:22 PM, Luis Lebolo luis.leb...@gmail.com wrote:

Hi All,

Is it possible to perform a facet field query on a subset of documents (the subset being defined via a filter query, for instance)? I understand that facet pivoting might work, but it would require that the subset be defined by some field hierarchy, e.g. manufacturer - price (then only look at the results for the manufacturer I'm interested in). What if I wanted to define a more complex subset (where the name starts with A but ends with Z, some other field is greater than 5, yet another field is not 'x', etc.)? Ideally I would then define a facet-field-constraining query to include only terms from documents that pass this query.

Thanks,
Luis
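Erick's point — that facet counts are tabulated only over the documents that survive the query and filter queries — can be modeled in a few lines of plain Java. No Solr is involved and the documents are invented; the sketch just shows why a term like 'Florida' that occurs only in filtered-out documents ends up with a zero count:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class FacetToy {
    record Doc(String manufacturer, String location) {}

    // Count 'location' terms only over documents passing the filter,
    // the way Solr tabulates facet counts after applying q and fq.
    static Map<String, Long> facetLocation(List<Doc> docs, String manufacturerFilter) {
        return docs.stream()
                .filter(d -> d.manufacturer().equals(manufacturerFilter))
                .collect(Collectors.groupingBy(Doc::location, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Doc> index = List.of(
                new Doc("Sony", "Texas"),
                new Doc("Sony", "Texas"),
                new Doc("Acme", "Florida"));
        // 'Florida' occurs only in an Acme document, so under
        // fq=manufacturer:Sony it contributes nothing; Solr still lists the
        // term with a count of 0 unless facet.mincount=1 hides it, which is
        // exactly the behavior Luis is asking about.
        System.out.println(facetLocation(index, "Sony")); // {Texas=2}
    }
}
```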
Re: How to get similarity score between 0 and 1 not relative score
I think you are looking for something like this, though you can omit the fq section:

http://localhost:8983/solr/collection/select?abc=text:bob&q={!func}scale(product(query($abc),1),0,1)&fq={!frange l=0.9}$q

Also, I don't understand all the fuss about normalized scores. In the linked example, I can see an interest in searching for "apple banana", "zzz yyy xxx qqq kkk ttt rrr 111", etc. and wanting only close matches as of that point in time. Would this be a good use for this approach? I understand that the results can change if the documents in the index change.

Thanks,
Toby

On Thu, Oct 31, 2013 at 12:56 AM, Anshum Gupta ans...@anshumgupta.net wrote:

Hi Susheel,

Have a look at this: http://wiki.apache.org/lucene-java/ScoresAsPercentages

You may really want to reconsider doing that.

On Thu, Oct 31, 2013 at 9:41 AM, sushil sharma sushil2...@yahoo.co.in wrote:

Hi,

We have a requirement where the user would like to see a score (between 0 and 1) which can tell how close the input search string is to the result string. So if the input was very close but not an exact match, the score could be 0.90, etc. I do understand that we can divide the score from Solr by the highest score, but that will always show 1 for the top result even if the match was not exact.

Regards,
Susheel

--
Anshum Gupta
http://www.anshumgupta.net
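The "fuss" the ScoresAsPercentages page describes comes from scale() being a relative, per-query normalization: it linearly remaps the wrapped function's values so the lowest becomes 0 and the highest 1, which means the best match always lands at 1.0 whether or not it is objectively close. A toy model of that rescaling in plain Java (the raw score values are invented):

```java
import java.util.Arrays;

public class ScaleDemo {
    // Roughly what scale(f,0,1) does: a linear min/max rescaling of the
    // values f produces for this query. The top value always maps to 1.0.
    static double[] scaleTo01(double[] scores) {
        double min = Arrays.stream(scores).min().orElse(0);
        double max = Arrays.stream(scores).max().orElse(1);
        double[] out = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            out[i] = (max == min) ? 1.0 : (scores[i] - min) / (max - min);
        }
        return out;
    }

    public static void main(String[] args) {
        // Invented raw scores from two hypothetical index states: the same
        // document's "normalized" score changes as other documents change,
        // so 0.9 today is not comparable to 0.9 tomorrow.
        System.out.println(Arrays.toString(scaleTo01(new double[]{2.0, 5.0, 8.0})));  // [0.0, 0.5, 1.0]
        System.out.println(Arrays.toString(scaleTo01(new double[]{2.0, 5.0, 20.0}))); // middle value drops to ~0.17
    }
}
```

This is why the frange cutoff in the URL above filters on relative closeness within one result set, not on any absolute match quality.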
Re: pivot range faceting
Thanks for confirming my fears. I saw some presentations where I thought this feature was used, but perhaps it was done by performing multiple range queries. Any chance there is a way for copyField to copy a function of a field instead of the original itself? Or must this be done by the application? Thank you again for your help.

Toby

***
Toby Lazar
Capital Technology Group
Email: tla...@capitaltg.com
Mobile: 646-469-5865
***

On Sun, Oct 20, 2013 at 2:39 PM, Upayavira u...@odoko.co.uk wrote:

On Sun, Oct 20, 2013, at 04:04 AM, Toby Lazar wrote:

Is it possible to get pivot info on a range-faceted query? For example, if I want to query the number of orders placed in January, February, etc., I know I can use a simple range search. If I want to get the number of orders by category, I can do that easily by faceting on category. I'm wondering if I can get the number of all orders by month, and also broken down by category. Is that possible in a single query?

You can't yet include a range facet within a pivot. The way to achieve this is to store a version of your date field rounded to the nearest month; then you will be able to use that field in a pivot facet. Obviously, this requires index-time effort, which is less than ideal. I guess this is a feature just waiting for someone to implement it.

Upayavira
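Since copyField cannot apply a function, the rounding Upayavira describes has to happen in the application (or in an update processor) before indexing. A sketch of the application-side computation in plain Java, assuming a hypothetical order_month field that the pivot facet (e.g. order_month,category) would then use; this is an illustration, not code from the thread:

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

public class MonthRounder {
    // Round an order timestamp down to the first instant of its month (UTC),
    // producing a value for a separate, pivot-friendly "order_month" field.
    static String toMonthField(String isoInstant) {
        ZonedDateTime t = ZonedDateTime.parse(isoInstant)
                .withZoneSameInstant(ZoneOffset.UTC)
                .withDayOfMonth(1)
                .truncatedTo(ChronoUnit.DAYS);
        return t.format(DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss'Z'"));
    }

    public static void main(String[] args) {
        // Every January order collapses onto the same month value.
        System.out.println(toMonthField("2014-01-17T09:30:00Z")); // 2014-01-01T00:00:00Z
    }
}
```

Each document would get order_month set to this value alongside its full-precision date field, after which facet.pivot=order_month,category gives the per-month, per-category breakdown in one query.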
pivot range faceting
Is it possible to get pivot info on a range-faceted query? For example, if I want to query the number of orders placed in January, February, etc., I know I can use a simple range search. If I want to get the number of orders by category, I can do that easily by faceting on category. I'm wondering if I can get the number of all orders by month, and also broken down by category. Is that possible in a single query?

Thanks,
Toby