Yes, if ManifoldCF receives a 404 response, it will delete the document from
the index.
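
As a rough illustration of that behavior (not ManifoldCF's own code), the
rule can be sketched in Python. The names fetch_status and apply_status and
the dict standing in for the index are hypothetical, introduced only for this
example.

```python
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url, using a HEAD request."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx responses; the code is on the exception.
        return err.code


def apply_status(index: dict, url: str, status: int) -> str:
    """Hypothetical helper: drop url from the index on a 404, otherwise keep it."""
    if status == 404:
        # pop with a default so a URL that was never indexed is not an error.
        index.pop(url, None)
        return "deleted"
    return "kept"
```

A caller would combine the two as apply_status(index, url, fetch_status(url)).
Treating only 404 as "gone" follows the statement above; how other status
codes are handled is left open here.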

Continuous crawling, though, means the document may not be retried for a long
time, because exponential backoff is used.
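
To show why the wait can get long, here is a minimal sketch of an
exponentially growing recheck interval. The doubling factor, the one-hour
starting interval, and the 30-day cap are assumptions chosen for
illustration, not ManifoldCF's actual values, and the function names are
hypothetical.

```python
def next_recheck_interval(previous: float, factor: float = 2.0,
                          maximum: float = 30 * 24 * 3600.0) -> float:
    """Grow the interval (in seconds) by a constant factor, up to a cap."""
    return min(previous * factor, maximum)


def schedule(initial: float, checks: int) -> list:
    """Return the intervals used for the first `checks` rechecks of a document."""
    intervals = []
    current = initial
    for _ in range(checks):
        intervals.append(current)
        current = next_recheck_interval(current)
    return intervals
```

With these assumed values, schedule(3600.0, 5) gives intervals of 1, 2, 4, 8,
and 16 hours, so a page that starts returning 404 shortly after a recheck
might not be visited again until the next, longer interval has elapsed.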

Karl

On Tue, Sep 3, 2019, 1:36 AM Priya Arora <pr...@smartshore.nl> wrote:

> Yes, it's a continuous job.
>
> On Tue, Sep 3, 2019 at 11:05 AM Priya Arora <pr...@smartshore.nl> wrote:
>
> > Hi,
> > I have a job, myuniversity_intranet, which crawls data from an intranet
> > site, and the data has been indexed in an index.
> > My question is: does ManifoldCF have some functionality to test, before
> > indexing, whether a URL exists?
> > For example, my index (say, index name: abc) contains an indexed URL,
> > https:myuniversity/reaserch/info (an intranet URL). This URL existed
> > earlier but no longer exists, and its status is now 404.
> >
> > The question is: can ManifoldCF check before indexing that the status is
> > not 404 (meaning the URL exists), index the URL only if it really exists,
> > and otherwise skip it?
> > Can this be set up while configuring the ManifoldCF job, or do I have to
> > handle it manually in code?
> >
> >
> > Kind regards
> > Priya
> >
> > On Mon, Sep 2, 2019 at 8:19 PM Karl Wright <daddy...@gmail.com> wrote:
> >
> >> Hi,
> >> You aren't giving me enough information to know why your job isn't
> >> rechecking URLs.  Please tell me how your job is configured,
> >> specifically whether it's continuous or not.  Thanks.
> >>
> >> Karl
> >>
> >>
> >> On Mon, Sep 2, 2019 at 4:47 AM Priya Arora <pr...@smartshore.nl> wrote:
> >>
> >> > Hi,
> >> >
> >> > I have a question regarding ManifoldCF. Does it have some kind of
> >> > functionality to check whether the URL it is crawling actually exists
> >> > or returns page not found (404)?
> >> >
> >> > I have a requirement in which I am crawling data for a university, and
> >> > the job is running continuously. After some time we found that certain
> >> > URLs have been removed from the university site but are still being
> >> > indexed.
> >> >
> >> > Some pages have been marked as status 404.
> >> > How can ManifoldCF be automated to check this, so that if a URL
> >> > returns 404 (does not exist anymore), it is not indexed?
> >> >
> >> > Thanks
> >> > Priya.
> >> >
> >>
> >
>
