[ 
https://issues.apache.org/jira/browse/CONNECTORS-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16714692#comment-16714692
 ] 

Tim Steenbeke commented on CONNECTORS-1562:
-------------------------------------------

 # I created a job with a Null output connector,
 # put 30 URLs in as seeds,
 # set the hop filter to 0 so no links or redirects will be checked,
 # ran the job.

Check Simple History: all the documents get fetched and processed (if: 
RESPONSECODENOTINDEXABLE)
 # I edit the job,
 # delete all but 3 URLs, so the seeds are now just 3 URLs,
 # run the job.

Check Simple History: all documents get fetched even though they aren't in the 
seeds anymore. No document gets deleted, and the job ends.
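
A minimal sketch of the outcome expected from the second run, assuming that with a hop filter of 0 only the seed URLs themselves are in scope. This is plain Python for illustration, not ManifoldCF code: the function name `expected_rerun` and the example.com URLs are hypothetical.

```python
def expected_rerun(previous_seeds, current_seeds):
    """Return which URLs a rerun should fetch and which it should delete.

    Assumption: with hop filter 0, a document is reachable only if it is
    itself a seed, so anything dropped from the seed list becomes orphaned.
    """
    previous = set(previous_seeds)
    current = set(current_seeds)
    return {
        "fetch": sorted(current),              # still seeded: fetch again
        "delete": sorted(previous - current),  # no longer seeded: remove
    }


# First run: 30 seeds. Second run: the same job trimmed to 3 seeds.
old_seeds = [f"https://example.com/page{i}" for i in range(1, 31)]
new_seeds = old_seeds[:3]

plan = expected_rerun(old_seeds, new_seeds)
print(len(plan["fetch"]), len(plan["delete"]))  # 3 27
```

The behavior reported above is the opposite: all 30 URLs are fetched again and nothing is deleted.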

 

!30URLSeeds.png!

!3URLSeed.png!

!Screenshot from 2018-12-10 14-07-46.png!

 

> Document removal Elastic
> ------------------------
>
>                 Key: CONNECTORS-1562
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1562
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Elastic Search connector, Web connector
>    Affects Versions: ManifoldCF 2.11
>         Environment: ManifoldCF 2.11
> Elasticsearch 6.3.2
> Web input connector
> Elastic output connector
> Job crawls website input and outputs content to Elastic
>            Reporter: Tim Steenbeke
>            Assignee: Karl Wright
>            Priority: Critical
>              Labels: starter
>         Attachments: 30URLSeeds.png, 3URLSeed.png, Screenshot from 2018-12-05 
> 09-01-46.png, Screenshot from 2018-12-10 14-07-46.png
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> My documents aren't removed from the ElasticSearch index after rerunning the 
> job with changed seeds.
> I update my job to change the seed map and rerun it, or use the scheduler to 
> keep it running even after updating it.
> After the rerun, the unreachable documents don't get deleted.
> It only adds documents when they can be reached.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
