deleting a doc inside a custom UpdateRequestProcessor

2013-11-18 Thread Peyman Faratin
Hi

I am building a custom UpdateRequestProcessor to intercept any doc heading to 
the index. Basically, what I want to do is check whether the current index has 
a doc with the same title (I am using IDs as the unique key, so I can't use 
that, and besides, the checking logic is a little more complicated). If the 
incoming doc has a duplicate and some other conditions hold, then one of 2 
things can happen:

1- we don't index the incoming document
2- we index the incoming and delete the duplicate currently in the index

I think (1) can be done by simply not passing the call down the chain (not 
calling super.processAdd(cmd)). However, I don't know how to implement the 
second condition, deleting the duplicate document, inside a custom 
UpdateRequestProcessor. This thread is the closest to my goal:
http://lucene.472066.n3.nabble.com/SOLR-4-3-0-Migration-How-to-use-DeleteUpdateCommand-td4062454.html

However, I am not clear on how to proceed. Code snippets below.

thank you in advance for your help

class isDuplicate extends UpdateRequestProcessor
{
    public isDuplicate(UpdateRequestProcessor next) {
        super(next);
    }

    @Override
    public void processAdd(AddUpdateCommand cmd) throws IOException
    {
        try
        {
            // only pass the doc down the chain if it should be indexed
            boolean indexIncomingDoc = checkIfIsDuplicate(cmd);
            if (indexIncomingDoc)
                super.processAdd(cmd);
        } catch (SolrServerException e) { e.printStackTrace(); }
        catch (ParseException e) { e.printStackTrace(); }
    }

    public boolean checkIfIsDuplicate(AddUpdateCommand cmd) ...{
        SolrInputDocument incomingDoc = cmd.getSolrInputDocument();
        if (incomingDoc == null) return false;
        String title = (String) incomingDoc.getFieldValue("title");
        SolrIndexSearcher searcher = cmd.getReq().getSearcher();
        boolean addIncomingDoc = true;
        // getFirstMatch returns -1 when no doc contains the term
        int idOfDuplicate = searcher.getFirstMatch(new Term("title", title));
        if (idOfDuplicate != -1)
        {
            addIncomingDoc =
                compareDocs(searcher, incomingDoc, idOfDuplicate, title, addIncomingDoc);
        }
        return addIncomingDoc;
    }

    private boolean compareDocs(.){
        if( condition 1 )
        {
            // -- DELETE DUPLICATE DOC in INDEX --
            addIncomingDoc = true;
        }
        return addIncomingDoc;
    }
}

Re: deleting a doc inside a custom UpdateRequestProcessor

2013-11-18 Thread Liu Bo
hi,

You can try this in your checkIfIsDuplicate(): build a query based on
your title and set it on a delete command:

// build your query accordingly; this depends on how your title is
// indexed, e.g. analyzed or not. Be careful with it and do some tests.
// (named deleteCmd so it does not clash with the AddUpdateCommand cmd)
DeleteUpdateCommand deleteCmd = new DeleteUpdateCommand(cmd.getReq());
deleteCmd.commitWithin = commitWithin;
deleteCmd.setQuery(query);
processDelete(deleteCmd);
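
For the query string itself, here is a minimal sketch of one way to build it. It assumes the field is named title and that an exact phrase match is what you want; both are assumptions, so adjust them to your schema. TitleQuery and phrase() are hypothetical names, not part of Solr:

```java
// Hypothetical helper (not a Solr class): builds a phrase query string
// such as title:"Moby Dick" for use as a delete-by-query.
final class TitleQuery {
    private TitleQuery() {}

    // Backslashes and double quotes are escaped so the value cannot
    // terminate the quoted phrase early.
    static String phrase(String field, String value) {
        String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"");
        return field + ":\"" + escaped + "\"";
    }
}
```

The returned string is what would be passed to setQuery(). On an analyzed field the phrase goes through that field's query analyzer, so it is worth testing against real data first.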

Processors are normally chained, so you should make sure that your
processor comes first so that it can control what comes next based
on your logic.
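
To make the chaining idea concrete, here is a toy model of it. These are stand-in classes written for illustration, not the real Solr types: Processor only mimics the way UpdateRequestProcessor forwards processAdd and processDelete to the next link, and ToyIndex, DedupProcessor and the replaceExisting flag (standing in for "condition 1") are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in for a link in the update chain; by default every call is forwarded.
abstract class Processor {
    private final Processor next;
    Processor(Processor next) { this.next = next; }
    void processAdd(String id, String title) {
        if (next != null) next.processAdd(id, title);
    }
    void processDelete(String title) {
        if (next != null) next.processDelete(title);
    }
}

// Stand-in for the end of the chain: a toy "index" mapping id -> title.
class ToyIndex extends Processor {
    final Map<String, String> docs = new LinkedHashMap<>();
    ToyIndex() { super(null); }
    @Override void processAdd(String id, String title) { docs.put(id, title); }
    @Override void processDelete(String title) { docs.values().removeIf(title::equals); }
}

// Sits first in the chain and decides what the rest of the chain sees.
class DedupProcessor extends Processor {
    private final ToyIndex index;          // stands in for the searcher lookup
    private final boolean replaceExisting; // stands in for "condition 1"

    DedupProcessor(ToyIndex index, boolean replaceExisting) {
        super(index);
        this.index = index;
        this.replaceExisting = replaceExisting;
    }

    @Override void processAdd(String id, String title) {
        if (index.docs.containsValue(title)) {
            if (!replaceExisting) return; // case 1: drop the incoming doc
            processDelete(title);         // case 2: delete the duplicate first
        }
        super.processAdd(id, title);      // then let the incoming doc through
    }
}
```

In this model, returning without calling super.processAdd() is what keeps the incoming doc out of the index, and sending the delete down the same chain before the add is what replaces the duplicate.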

You can also try writing your own update request handler instead of a
customized processor.

You can do a set of operations in your handler method:
@Override
public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp)
throws Exception {}

Get your processor chain in this function and pass a delete command
to it, such as:

SolrParams params = req.getParams();
checkParameter(params);
UpdateRequestProcessorChain processorChain =
    req.getCore().getUpdateProcessingChain(params.get(UpdateParams.UPDATE_CHAIN));
UpdateRequestProcessor processor = processorChain.createProcessor(req, rsp);

DeleteUpdateCommand cmd = new DeleteUpdateCommand(req);
cmd.commitWithin = commitWithin;
cmd.setQuery(query);
processor.processDelete(cmd);

This is what I do when customizing an update request handler: I try
not to touch the original processor chain, but tell Solr what to do through
commands.


-- 
All the best

Liu Bo