Hi,
You can try this in your checkIfIsDuplicate(): build a query based on
your title and set it on a delete command:
// Build your query accordingly; this depends on how your title is
// indexed (e.g. analyzed or not). Be careful with it and do some tests.
DeleteUpdateCommand cmd = new DeleteUpdateCommand(req);
cmd.commitWithin = commitWithin;
cmd.setQuery(query);
processDelete(cmd);
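As a rough illustration of building the query string that goes into setQuery() above, here is a minimal sketch. TitleQuery and build() are hypothetical names of mine, not part of Solr, and it assumes the field is called "title" and that you want a phrase query. It only escapes backslashes and double quotes, which are the characters that would break a quoted phrase.

```java
// Hypothetical helper, not part of Solr: builds a phrase query on one field.
final class TitleQuery {
    private TitleQuery() {}

    /** Returns field:"value", escaping backslashes and double quotes in value. */
    static String build(String field, String value) {
        StringBuilder sb = new StringBuilder(field).append(":\"");
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '\\' || c == '"') {
                sb.append('\\'); // escape characters that would end or corrupt the phrase
            }
            sb.append(c);
        }
        return sb.append('"').toString();
    }
}
```

You would then pass something like TitleQuery.build("title", title) to cmd.setQuery(). If the title field is analyzed, the phrase is analyzed too, so it may also match titles that differ only in case or punctuation; that is the part worth testing before you delete anything.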
Processors are normally chained, so you should make sure that your
processor comes first; that way it can control what comes next based
on your logic.
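To show why the position in the chain matters, here is a toy model in plain Java. None of these types are Solr classes: Processor, RecordingProcessor and DedupProcessor are hypothetical stand-ins of mine, and the "index" is just a set of titles. The point is only that the first processor decides whether to drop the add, or to send a delete down the chain before forwarding the add.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy stand-in for a chained processor; NOT the Solr UpdateRequestProcessor class.
abstract class Processor {
    protected final Processor next;
    Processor(Processor next) { this.next = next; }
    // Default behaviour: hand the command to the next processor, if any.
    void processAdd(String title) { if (next != null) next.processAdd(title); }
    void processDelete(String query) { if (next != null) next.processDelete(query); }
}

// Last link in the chain: records the commands that actually get through.
final class RecordingProcessor extends Processor {
    final List<String> log = new ArrayList<>();
    RecordingProcessor() { super(null); }
    @Override void processAdd(String title) { log.add("add " + title); }
    @Override void processDelete(String query) { log.add("delete " + query); }
}

// First link in the chain: decides what the rest of the chain gets to see.
final class DedupProcessor extends Processor {
    private final Set<String> existingTitles; // assumed stand-in for an index lookup
    private final boolean replaceExisting;

    DedupProcessor(Processor next, Set<String> existingTitles, boolean replaceExisting) {
        super(next);
        this.existingTitles = existingTitles;
        this.replaceExisting = replaceExisting;
    }

    @Override void processAdd(String title) {
        if (existingTitles.contains(title)) {
            if (!replaceExisting) {
                return; // case 1: do not index the incoming document
            }
            // case 2: delete the old document first, then let the add through
            super.processDelete("title:\"" + title + "\"");
        }
        super.processAdd(title);
    }
}
```

With a DedupProcessor in front of a RecordingProcessor, adding an existing title with replaceExisting set to true sends a delete followed by an add down the chain, while with false nothing reaches the later processors at all.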
You can also try writing your own update request handler instead of a
customized processor.
You can do a set of operations in your handler method:
@Override
public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp)
throws Exception {}
Get your processor chain in this function and pass a delete command
to it, such as:
SolrParams params = req.getParams();
checkParameter(params);
UpdateRequestProcessorChain processorChain =
    req.getCore().getUpdateProcessingChain(params.get(UpdateParams.UPDATE_CHAIN));
UpdateRequestProcessor processor =
    processorChain.createProcessor(req, rsp);
DeleteUpdateCommand cmd = new DeleteUpdateCommand(req);
cmd.commitWithin = commitWithin;
cmd.setQuery(query);
processor.processDelete(cmd);
This is what I do when customizing an update request handler: I try
not to touch the original processor chain, but tell Solr what to do through
commands.
On 19 November 2013 10:01, Peyman Faratin pey...@robustlinks.com wrote:
Hi
I am building a custom UpdateRequestProcessor to intercept any doc heading
to the index. Basically what I want to do is to check if the current index
has a doc with the same title (I am using IDs as the unique keys, so I can't use
that, and besides the logic of checking is a little more complicated). If
the incoming doc has a duplicate and some other conditions hold then one of
2 things can happen:
1- we don't index the incoming document
2- we index the incoming and delete the duplicate currently in the
index
I think (1) can be done by simply not passing the call up the chain (not
calling super.processAdd(cmd)). However, I don't know how to implement the
second condition, deleting the duplicate document, inside a custom
UpdateRequestProcessor. This thread is the closest to my goal
http://lucene.472066.n3.nabble.com/SOLR-4-3-0-Migration-How-to-use-DeleteUpdateCommand-td4062454.html
however I am not clear how to proceed. Code snippets below.
thank you in advance for your help
class isDuplicate extends UpdateRequestProcessor
{
public isDuplicate( UpdateRequestProcessor next) {
super( next );
}
@Override
public void processAdd(AddUpdateCommand cmd) throws
IOException {
try
{
boolean indexIncomingDoc =
checkIfIsDuplicate(cmd);
if(indexIncomingDoc)
super.processAdd(cmd);
} catch (SolrServerException e)
{e.printStackTrace();}
catch (ParseException e) {e.printStackTrace();}
}
public boolean checkIfIsDuplicate(AddUpdateCommand cmd)
...{
SolrInputDocument incomingDoc =
cmd.getSolrInputDocument();
if(incomingDoc == null) return false;
String title = (String) incomingDoc.getFieldValue(
"title" );
SolrIndexSearcher searcher =
cmd.getReq().getSearcher();
boolean addIncomingDoc = true;
Integer idOfDuplicate = searcher.getFirstMatch(new
Term("title",title));
if(idOfDuplicate != -1)
{
addIncomingDoc =
compareDocs(searcher,incomingDoc,idOfDuplicate,title,addIncomingDoc);
}
return addIncomingDoc;
}
private boolean compareDocs(.){
if( condition 1 )
{
-- DELETE DUPLICATE DOC in INDEX --
addIncomingDoc = true;
}
return addIncomingDoc;
}
--
All the best
Liu Bo