RE: SolrCloud deduplication

2012-05-21 Thread Markus Jelsma
Hi,

SOLR-2822 seems to work just fine as long as the SignatureProcessor precedes 
the DistributedProcessor in the update chain. 

Thanks,
Markus

 
 
-Original message-
 From:Mark Miller markrmil...@gmail.com
 Sent: Fri 18-May-2012 16:05
 To: solr-user@lucene.apache.org; Markus Jelsma markus.jel...@openindex.io
 Subject: Re: SolrCloud deduplication
 
 Hey Markus -
 
 When I ran into a similar issue with another update proc, I created 
 https://issues.apache.org/jira/browse/SOLR-3215 so that I could order things 
 to avoid this. I have not committed this yet though, in favor of waiting for 
 https://issues.apache.org/jira/browse/SOLR-2822
 
 Go vote? :)
 
 On May 18, 2012, at 7:49 AM, Markus Jelsma wrote:
 
  Hi,
  
  Deduplication on SolrCloud through the SignatureUpdateRequestProcessor is 
  no longer functional. The problem is that documents are passed multiple 
  times through the URP and the digest field is added as if it is a 
  multivalued field. If the field is not multivalued you'll get the typical 
  multiple-values error. Changing the order of URPs in the chain does not 
  solve the problem.
  
  Any hints on how to resolve the issue? Is this a problem in the 
  SignatureUpdateRequestProcessor and does it need to be updated to work with 
  SolrCloud? 
  
  Thanks,
  Markus
 
 - Mark Miller
 lucidimagination.com


RE: SolrCloud deduplication

2012-05-21 Thread Markus Jelsma
Hi again,

It seemed to work fine but in the end duplicates are not overwritten. We first 
run the SignatureProcessor and then the DistributedProcessor. If we do it the 
other way around the digest field receives multiple values and throws errors. 
Is there anything else we can do or another patch to try?

Thanks
Markus
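
To illustrate the multiple-values symptom described above, here is a minimal Python sketch. It is a toy model and not Solr's code: the helper names (add_field, set_field, signature_processor) and the field name "digest" are assumptions made for illustration only. It shows that a processor which appends its digest ends up with two values when the same document passes through it twice, while one that replaces the value stays single-valued.

```python
import hashlib


def add_field(doc, name, value):
    """Append a value to a field (hypothetical helper, multi-valued style)."""
    doc.setdefault(name, []).append(value)


def set_field(doc, name, value):
    """Replace whatever the field held with a single value (hypothetical helper)."""
    doc[name] = [value]


def signature(doc, fields):
    """MD5 over the selected fields; a stand-in for the real digest."""
    joined = "\x00".join(str(v) for f in fields for v in doc.get(f, []))
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def signature_processor(doc, fields, digest_field="digest", write=add_field):
    """Toy update processor: compute the digest and write it to the doc."""
    write(doc, digest_field, signature(doc, fields))
    return doc


# The same document passes through the processor twice, as it might when it
# is processed on the receiving node and again after being forwarded.
doc = {"id": ["1"], "content": ["hello world"]}
signature_processor(doc, ["content"])
signature_processor(doc, ["content"])
appended = len(doc["digest"])

# Same two passes, but the processor overwrites instead of appending.
doc2 = {"id": ["1"], "content": ["hello world"]}
signature_processor(doc2, ["content"], write=set_field)
signature_processor(doc2, ["content"], write=set_field)
replaced = len(doc2["digest"])

print(appended, replaced)
```

In this model the appending variant leaves two identical digests on the document, which a single-valued field would reject. Whether Solr's processor behaves exactly this way is not confirmed by this thread; the sketch only mirrors the symptom reported.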
 
 


RE: SolrCloud deduplication

2012-05-21 Thread Markus Jelsma
https://issues.apache.org/jira/browse/SOLR-3473

-Original message-
 From:Mark Miller markrmil...@gmail.com
 Sent: Mon 21-May-2012 18:11
 To: solr-user@lucene.apache.org
 Subject: Re: SolrCloud deduplication
 
 Looking again at the SignatureUpdateProcessor code, I think that indeed this 
 won't currently work with distrib updates. Could you file a JIRA issue for 
 that? The problem is that we convert update commands into solr documents - 
 and that can cause a loss of info if an update proc modifies the update 
 command.
 
 I think the reason that you see a multiple values error when you try the 
 other order is because of the lack of a document clone (the other issue I 
 mentioned a few emails back). Addressing that won't solve your issue though - 
 we have to come up with a way to propagate the currently lost info on the 
 update command.
 
 - Mark
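
The loss of info described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Solr's implementation: AddCommand, update_term, sign, forward and apply are hypothetical names, and the sketch assumes the signature processor records its dedup key on the update command rather than in the document itself. If only the document is forwarded, the receiving node never sees that key and falls back to overwriting by id.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class AddCommand:
    """Toy update command: a document plus optional command-level metadata."""
    doc: dict
    update_term: Optional[Tuple[str, str]] = None  # hypothetical dedup key


def sign(cmd, digest):
    """Toy signature step: store the digest and remember it on the command."""
    cmd.doc["digest"] = digest
    cmd.update_term = ("digest", digest)
    return cmd


def forward(cmd):
    """Toy distributed step: only the document travels, metadata is dropped."""
    return AddCommand(doc=dict(cmd.doc))


def apply(index, cmd):
    """Overwrite by update_term when present, otherwise by id."""
    if cmd.update_term is not None:
        field, value = cmd.update_term
        index[:] = [d for d in index if d.get(field) != value]
    else:
        index[:] = [d for d in index if d["id"] != cmd.doc["id"]]
    index.append(cmd.doc)


# Two documents with different ids but the same digest (i.e. duplicates).
a = sign(AddCommand({"id": "a"}), "abc")
b = sign(AddCommand({"id": "b"}), "abc")

local_index = []
apply(local_index, a)
apply(local_index, b)           # dedup key present: b replaces a

remote_index = []
apply(remote_index, forward(a))
apply(remote_index, forward(b))  # dedup key lost: both documents remain

print(len(local_index), len(remote_index))
```

In the model, the local index ends up with one document and the remote index with two, which matches the "duplicates are not overwritten" symptom. A real fix would need some way to carry the command-level info along with the forwarded document.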
 


Re: SolrCloud deduplication

2012-05-18 Thread Mark Miller
Hey Markus -

When I ran into a similar issue with another update proc, I created 
https://issues.apache.org/jira/browse/SOLR-3215 so that I could order things to 
avoid this. I have not committed this yet though, in favor of waiting for 
https://issues.apache.org/jira/browse/SOLR-2822

Go vote? :)


- Mark Miller
lucidimagination.com


RE: SolrCloud deduplication

2012-05-18 Thread Markus Jelsma
Hi,

Interesting! I'm watching the issues and will test as soon as they are 
committed.

Thanks!

 
 


RE: SolrCloud deduplication

2012-05-18 Thread Chris Hostetter

: Interesting! I'm watching the issues and will test as soon as they are
: committed.

FWIW: it's a chicken-and-egg problem -- if you could test out the patch in 
SOLR-2822 with your real-world use case / configs, and comment on its 
effectiveness, that would go a long way towards my confidence in it.


-Hoss


RE: SolrCloud deduplication

2012-05-18 Thread Markus Jelsma
You're right. I'll test the patch as soon as possible.
Thanks!

 
 