RE: SolrCloud deduplication
Hi,

SOLR-2822 seems to work just fine as long as the SignatureProcessor precedes the DistributedProcessor in the update chain.

Thanks,
Markus

-----Original message-----
From: Mark Miller markrmil...@gmail.com
Sent: Fri 18-May-2012 16:05
To: solr-user@lucene.apache.org; Markus Jelsma markus.jel...@openindex.io
Subject: Re: SolrCloud deduplication

Hey Markus -

When I ran into a similar issue with another update proc, I created https://issues.apache.org/jira/browse/SOLR-3215 so that I could order things to avoid this. I have not committed this yet though, in favor of waiting for https://issues.apache.org/jira/browse/SOLR-2822

Go vote? :)

On May 18, 2012, at 7:49 AM, Markus Jelsma wrote:

Hi,

Deduplication on SolrCloud through the SignatureUpdateRequestProcessor is not functional anymore. The problem is that documents are passed multiple times through the URP and the digest field is added as if it is a multi-valued field. If the field is not multi-valued you'll get this typical error. Changing the order of URPs in the chain does not solve the problem.

Any hints on how to resolve the issue? Is this a problem in the SignatureUpdateRequestProcessor and does it need to be updated to work with SolrCloud?

Thanks,
Markus

- Mark Miller
lucidimagination.com
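
For readers following along, the ordering described in this message might look roughly like the fragment below. This is only a sketch, not the poster's actual configuration: the chain name "dedupe", the source field "content", and the choice of Lookup3Signature are assumptions, while "digest" is the signature field named in the thread. Listing the distributed processor explicitly in the chain assumes a build that includes the SOLR-2822 change. Later messages in this thread report that duplicates are still not overwritten with this ordering (see SOLR-3473), so treat it as an illustration of the setup under discussion rather than a confirmed fix.

```xml
<!-- solrconfig.xml (sketch). Chain name and field names are assumptions. -->
<updateRequestProcessorChain name="dedupe">
  <!-- Signature is computed first, before the update is distributed -->
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <!-- "digest" is the field mentioned in the thread; it is single-valued there -->
    <str name="signatureField">digest</str>
    <bool name="overwriteDupes">true</bool>
    <!-- hypothetical source field used to compute the signature -->
    <str name="fields">content</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <!-- Explicit placement of the distributed processor after the signature step -->
  <processor class="solr.DistributedUpdateProcessorFactory" />
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

With the processors in the opposite order, the thread reports that the digest field receives multiple values and the update fails when the field is not multi-valued.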
RE: SolrCloud deduplication
Hi again,

It seemed to work fine but in the end duplicates are not overwritten. We first run the SignatureProcessor and then the DistributedProcessor. If we do it the other way around the digest field receives multiple values and throws errors.

Is there anything else we can do or another patch to try?

Thanks
Markus

-----Original message-----
From: Markus Jelsma markus.jel...@openindex.io
Sent: Mon 21-May-2012 15:58
To: solr-user@lucene.apache.org; Mark Miller markrmil...@gmail.com
Subject: RE: SolrCloud deduplication

Hi,

SOLR-2822 seems to work just fine as long as the SignatureProcessor precedes the DistributedProcessor in the update chain.

Thanks,
Markus
RE: SolrCloud deduplication
https://issues.apache.org/jira/browse/SOLR-3473

-----Original message-----
From: Mark Miller markrmil...@gmail.com
Sent: Mon 21-May-2012 18:11
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud deduplication

Looking again at the SignatureUpdateProcessor code, I think that indeed this won't currently work with distrib updates. Could you file a JIRA issue for that? The problem is that we convert update commands into solr documents - and that can cause a loss of info if an update proc modifies the update command.

I think the reason that you see a multiple values error when you try the other order is because of the lack of a document clone (the other issue I mentioned a few emails back). Addressing that won't solve your issue though - we have to come up with a way to propagate the currently lost info on the update command.

- Mark

On May 21, 2012, at 10:39 AM, Markus Jelsma wrote:

Hi again,

It seemed to work fine but in the end duplicates are not overwritten. We first run the SignatureProcessor and then the DistributedProcessor. If we do it the other way around the digest field receives multiple values and throws errors.

Is there anything else we can do or another patch to try?

Thanks
Markus

- Mark Miller
lucidimagination.com
Re: SolrCloud deduplication
Hey Markus -

When I ran into a similar issue with another update proc, I created https://issues.apache.org/jira/browse/SOLR-3215 so that I could order things to avoid this. I have not committed this yet though, in favor of waiting for https://issues.apache.org/jira/browse/SOLR-2822

Go vote? :)

On May 18, 2012, at 7:49 AM, Markus Jelsma wrote:

Hi,

Deduplication on SolrCloud through the SignatureUpdateRequestProcessor is not functional anymore. The problem is that documents are passed multiple times through the URP and the digest field is added as if it is a multi-valued field. If the field is not multi-valued you'll get this typical error. Changing the order of URPs in the chain does not solve the problem.

Any hints on how to resolve the issue? Is this a problem in the SignatureUpdateRequestProcessor and does it need to be updated to work with SolrCloud?

Thanks,
Markus

- Mark Miller
lucidimagination.com
RE: SolrCloud deduplication
Hi,

Interesting! I'm watching the issues and will test as soon as they are committed.

Thanks!

-----Original message-----
From: Mark Miller markrmil...@gmail.com
Sent: Fri 18-May-2012 16:05
To: solr-user@lucene.apache.org; Markus Jelsma markus.jel...@openindex.io
Subject: Re: SolrCloud deduplication

Hey Markus -

When I ran into a similar issue with another update proc, I created https://issues.apache.org/jira/browse/SOLR-3215 so that I could order things to avoid this. I have not committed this yet though, in favor of waiting for https://issues.apache.org/jira/browse/SOLR-2822

Go vote? :)

- Mark Miller
lucidimagination.com
RE: SolrCloud deduplication
: Interesting! I'm watching the issues and will test as soon as they are committed.

FWIW: it's a chicken and egg problem -- if you could test out the patch in SOLR-2822 with your real world use case / configs, and comment on its effectiveness, that would go a long way towards my confidence in it.

-Hoss
RE: SolrCloud deduplication
You're right. I'll test the patch as soon as possible.

Thanks!

-----Original message-----
From: Chris Hostetter hossman_luc...@fucit.org
Sent: Fri 18-May-2012 18:20
To: solr-user@lucene.apache.org
Subject: RE: SolrCloud deduplication

: Interesting! I'm watching the issues and will test as soon as they are committed.

FWIW: it's a chicken and egg problem -- if you could test out the patch in SOLR-2822 with your real world use case / configs, and comment on its effectiveness, that would go a long way towards my confidence in it.

-Hoss