Hi all,

I need to extract and analyse crawled URLs, because they may contain
parameters such as "?redirectURL=" that point to new documents that should
also be fetched and indexed.
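
To make the extraction step concrete, here is a minimal sketch using only the
JDK (Java 10 or later). The class and method names (RedirectParamSketch,
extractRedirectTarget) are made up for illustration and are not part of
ManifoldCF. It only pulls the parameter value out of a URL and does not queue
anything for fetching.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical helper, not a ManifoldCF class.
final class RedirectParamSketch {

    // Name of the query parameter mentioned above.
    private static final String PARAM_NAME = "redirectURL";

    // Returns the decoded value of the first "redirectURL" query parameter,
    // or an empty Optional if the URL has none or cannot be parsed.
    static Optional<String> extractRedirectTarget(String crawledUrl) {
        try {
            String rawQuery = URI.create(crawledUrl).getRawQuery();
            if (rawQuery == null) {
                return Optional.empty();
            }
            for (String pair : rawQuery.split("&")) {
                int eq = pair.indexOf('=');
                if (eq <= 0) {
                    continue; // no name, or no '=' at all
                }
                if (pair.substring(0, eq).equals(PARAM_NAME)) {
                    String value = URLDecoder.decode(
                            pair.substring(eq + 1), StandardCharsets.UTF_8);
                    return value.isEmpty() ? Optional.empty() : Optional.of(value);
                }
            }
            return Optional.empty();
        } catch (IllegalArgumentException e) {
            // Malformed URL or malformed percent-encoding: treat as "no target".
            return Optional.empty();
        }
    }
}
```

Calling RedirectParamSketch.extractRedirectTarget(url) on each crawled URL
would give me the candidate target; what I am missing is how to hand that
target back to the crawler.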

My first attempt was a transformation connector, declared as

public class RedirectExtractor extends
org.apache.manifoldcf.agents.transformation.BaseTransformationConnector

which I added as a "RedirectExtractor" transformation step to the job
pipeline in ManifoldCF. However, a transformation step only lets me modify
the current document; it cannot trigger a new fetch for the extracted
parameter.

Looking through the ManifoldCF source code, I found something that may
come in handy:

activities.recordActivity(null,ACTIVITY_FETCH,
                null,urlValue,Integer.toString(-2),"Robots exclusion",null);

from the IProcessActivity interface, which is used by the connectors. I
would rather not write a new connector, since that is fairly complex. Do you
see an alternative, or is this the only way?

Thanks in advance.
