[ https://issues.apache.org/jira/browse/CONNECTORS-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16714692#comment-16714692 ]
Tim Steenbeke commented on CONNECTORS-1562:
-------------------------------------------

# I created a job with a Null output connector.
# Put 30 URLs as seeds.
# Set the hop filter to 0 so no links or redirects will be checked.
# Run the job.
Check Simple History: all the documents get fetched and processed (if: RESPONSECODENOTINDEXABLE).
# I edit the job.
# Delete all but 3 URLs; the seeds are now just 3 URLs.
# Run the job.
Check Simple History: all documents get fetched even though they aren't in the seeds anymore, no document gets deleted, and the job ends.

!30URLSeeds.png!

!3URLSeed.png!

!Screenshot from 2018-12-10 14-07-46.png!

> Document removal Elastic
> ------------------------
>
>                 Key: CONNECTORS-1562
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1562
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Elastic Search connector, Web connector
>    Affects Versions: ManifoldCF 2.11
>         Environment: ManifoldCF 2.11
> Elasticsearch 6.3.2
> Web input connector
> Elastic output connector
> Job crawls website input and outputs content to Elastic
>            Reporter: Tim Steenbeke
>            Assignee: Karl Wright
>            Priority: Critical
>              Labels: starter
>         Attachments: 30URLSeeds.png, 3URLSeed.png, Screenshot from 2018-12-05 09-01-46.png, Screenshot from 2018-12-10 14-07-46.png
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> My documents aren't removed from the ElasticSearch index after rerunning the job with changed seeds.
> I update my job to change the seed map and rerun it, or use the scheduler to keep it running even after updating it.
> After the rerun, the unreachable documents don't get deleted.
> It only adds documents when they can be reached.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)