Hi,
I have a job, myuniversity_intranet, which crawls data from an intranet
site, and the data has been indexed into an index.
My question is: does ManifoldCF have some functionality to test a URL
before indexing, to check whether the URL still exists?
For example, my index (say, index name: abc) contains an indexed URL:
https:myuniversity/reaserch/info (an intranet URL). This URL existed
earlier but no longer exists, and requesting it now returns status 404.

My question is: can ManifoldCF check before indexing whether the status is
not equal to 404 (which means the page exists)? If the URL really exists,
it should be indexed; otherwise that URL should be skipped.
Can this be set up while configuring the ManifoldCF job, or do I have to
handle it manually in code?
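If this does have to be handled in code, a minimal sketch of the check I have in mind follows. It is not a ManifoldCF feature or API; it assumes a plain HTTP HEAD request via java.net.HttpURLConnection, and the class and method names (UrlCheck, fetchStatus, shouldIndex) are hypothetical. Following the rule above, only a 404 is treated as "does not exist".

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;

// Hypothetical helper, not part of ManifoldCF: checks a URL's HTTP status
// and decides whether the page should be indexed.
public class UrlCheck {

    // Index unless the server reports 404 (Not Found).
    static boolean shouldIndex(int status) {
        return status != HttpURLConnection.HTTP_NOT_FOUND;
    }

    // Send a HEAD request (no body is downloaded) and return the status code.
    static int fetchStatus(String url) throws IOException {
        URL u = URI.create(url).toURL();
        HttpURLConnection conn = (HttpURLConnection) u.openConnection();
        conn.setRequestMethod("HEAD");
        conn.setConnectTimeout(5000); // milliseconds
        conn.setReadTimeout(5000);    // milliseconds
        conn.setInstanceFollowRedirects(true);
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    // Usage: java UrlCheck <url> [<url> ...]
    public static void main(String[] args) throws IOException {
        for (String url : args) {
            int status = fetchStatus(url);
            String decision = shouldIndex(status) ? "index" : "skip";
            System.out.println(url + " -> " + status + " (" + decision + ")");
        }
    }
}
```

Some servers reject HEAD requests (for example with 405), in which case a GET would be needed instead.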


Kind regards
Priya

On Mon, Sep 2, 2019 at 8:19 PM Karl Wright <daddy...@gmail.com> wrote:

> Hi,
> You aren't giving me enough information to know why your job isn't
> rechecking URLs.  Please tell me how your job is configured, specifically
> whether it's continuous or not.  Thanks.
>
> Karl
>
>
> On Mon, Sep 2, 2019 at 4:47 AM Priya Arora <pr...@smartshore.nl> wrote:
>
> > Hi,
> >
> > I have a question regarding ManifoldCF. Does it have some kind of
> > functionality to check whether the URL it is crawling actually exists
> > or returns page not found (404)?
> >
> > For example, I have a requirement in which I am crawling data for a
> > university, and the job is running continuously. After some time, I
> > found that certain URLs have been removed from the university site, but
> > they are still indexed.
> >
> > Some pages now return status 404.
> > How can ManifoldCF be automated to check this, so that if a URL
> > returns 404 (does not exist anymore), it is not indexed?
> > Thanks
> > Priya.
> >
>
