Hi,

I have a job, myuniversity_intranet, which crawls data from an intranet site, and the data has been indexed into an index. My question is: does ManifoldCF have any functionality to test, before indexing, whether a URL exists? For example, my index (say, index name: abc) contains an indexed URL, https:myuniversity/reaserch/info (an intranet URL). This URL existed earlier but no longer exists, and it now returns status 404.
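To make the check concrete, here is a minimal Python sketch of the kind of test I mean. It is not ManifoldCF code and does not use any ManifoldCF API; the function names (fetch_status, should_index, filter_existing) are hypothetical, and treating only 2xx responses as "exists" is my own assumption.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10.0):
    """Return the HTTP status code for url, or None if no response was received."""
    # HEAD avoids downloading the page body; some servers may reject it.
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses (including 404) are raised as HTTPError.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar.
        return None


def should_index(status):
    """Assumption: only 2xx responses count as an existing page."""
    return status is not None and 200 <= status < 300


def filter_existing(urls, status_of=fetch_status):
    """Keep only the URLs whose status says the page exists."""
    return [url for url in urls if should_index(status_of(url))]
```

The status_of parameter is only there so the filter can be tried without network access, for example by passing a dictionary lookup instead of fetch_status.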
My question is: can ManifoldCF check, before indexing, that a URL's status is not 404 (that is, that the URL exists), index it only if it really exists, and skip it otherwise? Can this be configured as a setting on the ManifoldCF job, or do I have to handle it manually in code?

Kind regards,
Priya

On Mon, Sep 2, 2019 at 8:19 PM Karl Wright <daddy...@gmail.com> wrote:

> Hi,
> You aren't giving me enough information to know why your job isn't
> rechecking URLs. Please tell me how your job is configured, specifically
> whether it's continuous or not. Thanks.
>
> Karl
>
>
> On Mon, Sep 2, 2019 at 4:47 AM Priya Arora <pr...@smartshore.nl> wrote:
>
> > Hi,
> >
> > I have a query regarding ManifoldCF. Does it have some kind of
> > functionality to check whether the URL it is crawling actually exists or
> > returns page not found (404)?
> >
> > I have a requirement in which I am crawling data for a university, and the
> > job is continuously running. After some period, it was found that certain
> > URLs have been removed from the university site but are still getting
> > indexed.
> >
> > Some pages have been marked as status 404.
> > How can ManifoldCF be automated to check this, so that if a URL
> > corresponds to a 404 (does not exist anymore), it is not indexed?
> >
> > Thanks
> > Priya.