Re: Repository connector for source with delta API

2019-05-27 Thread Raman Gupta
On Mon, May 27, 2019 at 5:58 PM Karl Wright  wrote:
>
> (1) There should be no new tables needed for any of this.  Your seed list
> can be stored in the job specification information.  See the rss connector
> for a simple example of how this might be done.

Are you assuming the seed list is static? The RSS connector only
changes the job specification in the `processSpecificationPost`
method, and I assume that the job spec is read-only in
`addSeedDocuments`.

As I thought I made clear in previous messages, my seed list is
dynamic, which is why I switched to MODEL_ALL -- on each call to
addSeedDocuments, I can dynamically determine which seeds are
relevant, and only provide that list. To provide additional context,
the dynamic seed list is based on regular expression matches, where
the underlying seeds/roots can come and go based on which ones match
the regexes, and the regexes are present in the document spec.
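
To make that concrete, a rough sketch of the seeding logic (the "rootregexp"
node type and the listAllRoots() helper are made-up placeholders for my job
spec layout and the upstream API; the rest is the standard connector API):

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import org.apache.manifoldcf.agents.interfaces.ServiceInterruption;
import org.apache.manifoldcf.core.interfaces.*;
import org.apache.manifoldcf.crawler.interfaces.ISeedingActivity;

// Inside the repository connector class:
@Override
public String addSeedDocuments(ISeedingActivity activities, Specification spec,
    String lastSeedVersion, long seedTime, int jobMode)
  throws ManifoldCFException, ServiceInterruption
{
  // Collect the regexes stored in the job's document specification.
  List<Pattern> patterns = new ArrayList<>();
  for (int i = 0; i < spec.getChildCount(); i++)
  {
    SpecificationNode node = spec.getChild(i);
    if (node.getType().equals("rootregexp"))          // placeholder node type
      patterns.add(Pattern.compile(node.getAttributeValue("value")));
  }

  // Ask the upstream system which roots exist right now, and seed only the
  // ones that match a configured regex.
  for (String root : listAllRoots())                  // placeholder upstream call
  {
    for (Pattern p : patterns)
    {
      if (p.matcher(root).matches())
      {
        activities.addSeedDocument(root);
        break;
      }
    }
  }
  return "";
}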

> (2) If you have switched to MODEL_ALL then all you need to do is provide a
> mechanism for any given document for determining which seed it comes from,
> and simply look for that in the job specification.  If not there, call
> activities.removeDocument().

See above.
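
For reference, this is roughly what suggestion (2) would look like adapted to
a dynamic seed list -- rootOf() and rootStillMatches() are made-up helpers
that recover a document's root from its identifier and re-evaluate the
regexes in the spec; the removeDocument() call is the one you mention:

@Override
public void processDocuments(String[] documentIdentifiers, IExistingVersions statuses,
    Specification spec, IProcessActivity activities, int jobMode,
    boolean usesDefaultAuthority)
  throws ManifoldCFException, ServiceInterruption
{
  for (String docId : documentIdentifiers)
  {
    // Made-up helper: derive the seed/root this document belongs to.
    String root = rootOf(docId);
    if (!rootStillMatches(root, spec))
    {
      // The root is gone, or no longer matches any configured regex, so ask
      // the framework to drop the document from the index.
      activities.removeDocument(docId);
      continue;
    }
    // ... normal fetch-and-ingest path for documents whose root still exists ...
  }
}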

Regards,
Raman

> Karl
>
>
> On Mon, May 27, 2019 at 5:16 PM Raman Gupta  wrote:
>
> > One seed per job is an interesting approach but in the interests of
> > fully understanding the alternatives let me consider choice #2.
> >
> > >  you might want to combine this all into one job, but then you would
> > > need to link your documents somehow to the seed they came from, so that
> > > if the seed was no longer part of the job specification, it could always
> > > be detected as a deletion.
> >
> > There are good reasons for me to prefer a single job, so how would I
> > accomplish this? Should my connector create its own tables and manage
> > this state there? Or is there another more light-weight approach?
> >
> > > Unfortunately, this is inconsistent with MODEL_ADD_CHANGE_DELETE,
> > > because under that scheme you'd need to *detect* the deletion, because
> > > you wouldn't be told by the repository that somebody had changed the
> > > configuration.
> >
> > That is fine and I understand completely -- I forgot to mention in my
> > previous message that I've already switched to MODEL_ALL, and am
> > detecting and providing the list of currently active seeds on every
> > call to addSeedDocuments.
> >
> > Regards,
> > Raman
> >
> > On Mon, May 27, 2019 at 4:55 PM Karl Wright  wrote:
> > >
> > > This is very different from the design you originally told me you were
> > > going to do.
> > >
> > > Generally, using hopcounts for managing your documents is a bad practice;
> > > this is expensive to do and almost always yields unexpected results.
> > > You could have one job per seed, which means all you have to do to make
> > > the seed go away is delete the job corresponding to it.  If you have way
> > > too many seeds for that, you might want to combine this all into one job,
> > > but then you would need to link your documents somehow to the seed they
> > > came from, so that if the seed was no longer part of the job
> > > specification, it could always be detected as a deletion.  Unfortunately,
> > > this is inconsistent with MODEL_ADD_CHANGE_DELETE, because under that
> > > scheme you'd need to *detect* the deletion, because you wouldn't be told
> > > by the repository that somebody had changed the configuration.
> > >
> > > So two choices: (1) Exactly one seed per job, or (2) don't use
> > > MODEL_ADD_CHANGE_DELETE.
> > >
> > > Karl
> > >
> > >
> > > On Mon, May 27, 2019 at 4:38 PM Raman Gupta wrote:
> > >
> > > > Thanks for your help Karl. So I think I'm converging on a design.
> > > > First of all, per your recommendation, I've switched to scheduled
> > > > crawl and it executes as expected every minute with the "schedule
> > > > window anytime" setting.
> > > >
> > > > My next problem is dealing with seed deletion. My upstream source
> > > > actually has multiple "roots" i.e. each root has its own set of
> > > > documents, and the delta API must be called once for each "root". To
> > > > deal with this, I'm specifying each "root" as a "seed document", and
> > > > each such root/seed creates "contained_in" documents. It is also
> > > > possible for a "root" to be deleted by a user of the upstream system.
> > > >
> > > > My job is defined with an accurate hopcount as follows:
> > > >
> > > > "job": {
> > > >   ... snip naming, scheduling, output connectors, doc spec
> > > >   "hopcount_mode" to "accurate"
> > > >   "hopcount" to json {
> > > > "link_type" to "contained_in"
> > > > "count" to 1
> > > >   },
> > > >
> > > > For each seed, in processDocuments I am doing:
> > > >
> > > > activities.addDocumentReference("... doc identifier ...",
> > > > seedDocumentIdentifier, "contained_in");
> > > >
> > > > and then this triggers processDocuments for each of those documents,
> > > > as expected.
> > > >
> > > > How do I code the connector such that I can now remove the documents
> > > > that are now unreachable due to the deleted seed? I don't see any
> > > > calls to `processDocuments` via the framework that would allow me to
> > > > do this.

Re: Repository connector for source with delta API

2019-05-27 Thread Raman Gupta
One seed per job is an interesting approach but in the interests of
fully understanding the alternatives let me consider choice #2.

>  you might want to combine this all into one job, but then you would need to 
> link your documents somehow to the seed they came from, so that if the seed 
> was no longer part of the job specification, it could always be detected as a 
> deletion.

There are good reasons for me to prefer a single job, so how would I
accomplish this? Should my connector create its own tables and manage
this state there? Or is there another more light-weight approach?

> Unfortunately, this is inconsistent with MODEL_ADD_CHANGE_DELETE, because 
> under that scheme you'd need to *detect* the deletion, because you wouldn't 
> be told by the repository that somebody had changed the configuration.

That is fine and I understand completely -- I forgot to mention in my
previous message that I've already switched to MODEL_ALL, and am
detecting and providing the list of currently active seeds on every
call to addSeedDocuments.

Regards,
Raman

On Mon, May 27, 2019 at 4:55 PM Karl Wright  wrote:
>
> This is very different from the design you originally told me you were
> going to do.
>
> Generally, using hopcounts for managing your documents is a bad practice;
> this is expensive to do and almost always yields unexpected results.
> You could have one job per seed, which means all you have to do to make the
> seed go away is delete the job corresponding to it.  If you have way too
> many seeds for that, you might want to combine this all into one job, but
> then you would need to link your documents somehow to the seed they came
> from, so that if the seed was no longer part of the job specification, it
> could always be detected as a deletion.  Unfortunately, this is
> inconsistent with MODEL_ADD_CHANGE_DELETE, because under that scheme you'd
> need to *detect* the deletion, because you wouldn't be told by the
> repository that somebody had changed the configuration.
>
> So two choices: (1) Exactly one seed per job, or (2) don't use
> MODEL_ADD_CHANGE_DELETE.
>
> Karl
>
>
> On Mon, May 27, 2019 at 4:38 PM Raman Gupta  wrote:
>
> > Thanks for your help Karl. So I think I'm converging on a design.
> > First of all, per your recommendation, I've switched to scheduled
> > crawl and it executes as expected every minute with the "schedule
> > window anytime" setting.
> >
> > My next problem is dealing with seed deletion. My upstream source
> > actually has multiple "roots" i.e. each root has its own set of
> > documents, and the delta API must be called once for each "root". To
> > deal with this, I'm specifying each "root" as a "seed document", and
> > each such root/seed creates "contained_in" documents. It is also
> > possible for a "root" to be deleted by a user of the upstream system.
> >
> > My job is defined with an accurate hopcount as follows:
> >
> > "job": {
> >   ... snip naming, scheduling, output connectors, doc spec
> >   "hopcount_mode" to "accurate"
> >   "hopcount" to json {
> > "link_type" to "contained_in"
> > "count" to 1
> >   },
> >
> > For each seed, in processDocuments I am doing:
> >
> > activities.addDocumentReference("... doc identifier ...",
> > seedDocumentIdentifier, "contained_in");
> >
> > and then this triggers processDocuments for each of those documents,
> > as expected.
> >
> > How do I code the connector such that I can now remove the documents
> > that are now unreachable due to the deleted seed? I don't see any
> > calls to `processDocuments` via the framework that would allow me to
> > do this.
> >
> > Regards,
> > Raman
> >
> >
> > On Fri, May 24, 2019 at 7:29 PM Karl Wright  wrote:
> > >
> > > Hi Raman,
> > >
> > > (1) Continuous crawl is not a good model for you.  It's meant for
> > > crawling large web domains, not the kind of task you are doing.
> > > (2) Scheduled crawl will work fine for you if you simply tell it "start
> > > within schedule window" and make sure your schedule completely covers
> > > 7x24 times.  So you can do this with one record, which triggers on every
> > > day of the week, that has a schedule window of 24 hours.
> > >
> > > Karl
> > >
> > >
> > > On Fri, May 24, 2019 at 7:12 PM Raman Gupta wrote:
> > >
> > > > Yes, we are indeed running it in continuous crawl mode. Scheduled mode
> > > > works, but given we have a delta API, we thought this is what makes
> > > > sense, as the delta API is efficient and we don't need to wait an
> > > > entire day for a scheduled job to run. I see that if I change recrawl
> > > > interval and max recrawl interval also to 1 minute, then my documents
> > > > do get processed each time. However, now we have the opposite problem:
> > > > now the documents are reprocessed every minute, regardless of whether
> > > > they were reseeded or not, which makes no sense to me. If I am using
> > > > MODEL_ADD_CHANGE_DELETE and not returning anything in my seed method,
> > > > then why are the same documents being reprocessed over and over? I
> > > > have sent the output to the NullOutput using
> > > > `ingestDocumentWithException` and the status shows OK, and yet the
> > > > same documents are repeatedly sent to processDocuments.

Re: Repository connector for source with delta API

2019-05-27 Thread Raman Gupta
Thanks for your help Karl. So I think I'm converging on a design.
First of all, per your recommendation, I've switched to scheduled
crawl and it executes as expected every minute with the "schedule
window anytime" setting.

My next problem is dealing with seed deletion. My upstream source
actually has multiple "roots" i.e. each root has its own set of
documents, and the delta API must be called once for each "root". To
deal with this, I'm specifying each "root" as a "seed document", and
each such root/seed creates "contained_in" documents. It is also
possible for a "root" to be deleted by a user of the upstream system.

My job is defined with an accurate hopcount as follows:

"job": {
  ... snip naming, scheduling, output connectors, doc spec
  "hopcount_mode" to "accurate"
  "hopcount" to json {
"link_type" to "contained_in"
"count" to 1
  },

For each seed, in processDocuments I am doing:

activities.addDocumentReference("... doc identifier ...",
seedDocumentIdentifier, "contained_in");

and then this triggers processDocuments for each of those documents,
as expected.
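
Concretely, the per-seed branch of processDocuments looks roughly like this
(isRoot() and deltaChangesFor() are made-up stand-ins for my identifier
scheme and the upstream delta API):

for (String docId : documentIdentifiers)
{
  if (isRoot(docId))
  {
    // One delta call per root; each returned child becomes a "contained_in"
    // reference, which the framework later hands back to processDocuments.
    for (String childId : deltaChangesFor(docId))
      activities.addDocumentReference(childId, docId, "contained_in");
    // The root itself has nothing to index.
    activities.noDocument(docId, "");
  }
  else
  {
    // ... fetch and ingest the child document ...
  }
}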

How do I code the connector such that I can now remove the documents
that are now unreachable due to the deleted seed? I don't see any
calls to `processDocuments` via the framework that would allow me to
do this.

Regards,
Raman


On Fri, May 24, 2019 at 7:29 PM Karl Wright  wrote:
>
> Hi Raman,
>
> (1) Continuous crawl is not a good model for you.  It's meant for crawling
> large web domains, not the kind of task you are doing.
> (2) Scheduled crawl will work fine for you if you simply tell it "start
> within schedule window" and make sure your schedule completely covers 7x24
> times.  So you can do this with one record, which triggers on every day of
> the week, that has a schedule window of 24 hours.
>
> Karl
>
>
> On Fri, May 24, 2019 at 7:12 PM Raman Gupta  wrote:
>
> > Yes, we are indeed running it in continuous crawl mode. Scheduled mode
> > works, but given we have a delta API, we thought this is what makes
> > sense, as the delta API is efficient and we don't need to wait an
> > entire day for a scheduled job to run. I see that if I change recrawl
> > interval and max recrawl interval also to 1 minute, then my documents
> > do get processed each time. However, now we have the opposite problem:
> > now the documents are reprocessed every minute, regardless of whether
> > they were reseeded or not, which makes no sense to me. If I am using
> > MODEL_ADD_CHANGE_DELETE and not returning anything in my seed method,
> > then why are the same documents being reprocessed over and over? I
> > have sent the output to the NullOutput using
> > `ingestDocumentWithException` and the status shows OK, and yet the
> > same documents are repeatedly sent to processDocuments.
> >
> > I just want to process the particular documents I specify on each
> > iteration every 60 seconds -- no more, no less, and yet I seem unable
> > to build a connector that does this.
> >
> > If I move to a non-continuous mode, do I really have to create 1440
> > schedule objects, one for each minute of each day? The way the
> > schedule seems to be put together, I don't see a way to just schedule
> > every minute with one schedule. I would have expected schedules to
> > just use cron expressions.
> >
> > If I move to design #2 in my OP and have one "virtual document" to
> > just avoid the seeding stage altogether, then is there some place
> > where I can store the delta token state? Or does my connector have to
> > create its own db table to store this?
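> >
> > One possibility I can see, sketched below: since the String returned by
> > addSeedDocuments is handed back as lastSeedVersion on the next seeding
> > pass, the delta token could ride along there instead of in a
> > connector-owned table (callDeltaApi() and DeltaResult are made-up
> > stand-ins for my upstream client):
> >
> > @Override
> > public String addSeedDocuments(ISeedingActivity activities, Specification spec,
> >     String lastSeedVersion, long seedTime, int jobMode)
> >   throws ManifoldCFException, ServiceInterruption
> > {
> >   // lastSeedVersion is whatever this method returned on the previous pass.
> >   String deltaToken =
> >       (lastSeedVersion == null || lastSeedVersion.length() == 0) ? null : lastSeedVersion;
> >
> >   // Made-up upstream call: documents changed since deltaToken, plus a new token.
> >   DeltaResult result = callDeltaApi(deltaToken);
> >   for (String docId : result.changedDocumentIds)
> >     activities.addSeedDocument(docId);
> >
> >   // The framework stores this and passes it back as lastSeedVersion next time.
> >   return result.newDeltaToken;
> > }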
> >
> > Regards,
> > Raman
> >
> > On Fri, May 24, 2019 at 6:18 PM Karl Wright  wrote:
> > >
> > > So MODEL_ADD_CHANGE does not work for you, eh?
> > >
> > > You were saying that addSeedDocuments is being called every minute,
> > > correct?  It sounds to me like you are running this job in continuous
> > > crawl mode.  Can you try running the job in non-continuous mode, and
> > > just repeating the job run once it completes?
> > >
> > > The reason I ask is that continuous crawling has its own particular way
> > > of dealing with documents it has crawled.  It uses "exponential backoff"
> > > to schedule the next crawl of each document, and that is probably why
> > > you see the documents in the queue but not being processed; you simply
> > > haven't waited long enough.
> > >
> > > Karl
> > >
> > >
> > > On Fri, May 24, 2019 at 5:36 PM Raman Gupta wrote:
> > >
> > > > Here are my addSeedDocuments and processDocuments methods, simplified
> > > > down to the minimum necessary to show what is happening:
> > > >
> > > > @Override
> > > > public String addSeedDocuments(ISeedingActivity activities, Specification spec,
> > > >     String lastSeedVersion, long seedTime, int jobMode)
> > > >   throws ManifoldCFException, ServiceInterruption
> > > > {
> > > >   // return the same 3 docs every time, simulating an initial load, and
> > > 

[jira] [Updated] (CONNECTORS-1609) SharePoint connector ignore 503 errors

2019-05-27 Thread Karl Wright (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright updated CONNECTORS-1609:

Attachment: CONNECTORS-1609.diff

> SharePoint connector ignore 503 errors
> --
>
> Key: CONNECTORS-1609
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1609
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: SharePoint connector
>Affects Versions: ManifoldCF 2.12
>Reporter: Julien Massiera
>Assignee: Karl Wright
>Priority: Critical
> Attachments: CONNECTORS-1609.diff, CONNECTORS-1609.diff
>
>
> During a job, it may happen, for reasons related to the SharePoint server
> configuration, that some resources of an SP site are not available (for
> instance, when credentials are required to open a resource). In that case,
> the SP connector gets a 403 or a 503 response code from SharePoint. The
> problem is that whenever it gets this kind of response code, the job is
> aborted with an error.
> Since the response codes are clearly identified (403 and 503), it would be
> better if, at least for a 503 error, the connector ignored it, continued the
> job, and logged something into the repo history instead of aborting the job.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (CONNECTORS-1609) SharePoint connector ignore 503 errors

2019-05-27 Thread Julien Massiera (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16849069#comment-16849069
 ] 

Julien Massiera edited comment on CONNECTORS-1609 at 5/27/19 4:30 PM:
--

[~kwri...@metacarta.com], no, the patch does not address the problem. It does
not cover the code lines in question:

Error fetching document xx': 503

    at org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.fetchAndIndexFile(SharePointRepository.java:1778) ~[?:?]

    at org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.processDocuments(SharePointRepository.java:1542) ~[?:?]

    at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) [mcf-pull-agent.jar:?]


was (Author: julienfl):
[~kwri...@metacarta.com], no the patch does not addresses the problem. It does 
not focus on the concerned code lines  : 

Error fetching document xx': 503

    at org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.fetchAndIndexFile(SharePointRepository.java:1778) ~[?:?]

    at org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.processDocuments(SharePointRepository.java:1542) ~[?:?]

    at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) [mcf-pull-agent.jar:?]

> SharePoint connector ignore 503 errors
> --
>
> Key: CONNECTORS-1609
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1609
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: SharePoint connector
>Affects Versions: ManifoldCF 2.12
>Reporter: Julien Massiera
>Assignee: Karl Wright
>Priority: Critical
> Attachments: CONNECTORS-1609.diff
>
>
> During a job, it may happen, for reasons related to the SharePoint server
> configuration, that some resources of an SP site are not available (for
> instance, when credentials are required to open a resource). In that case,
> the SP connector gets a 403 or a 503 response code from SharePoint. The
> problem is that whenever it gets this kind of response code, the job is
> aborted with an error.
> Since the response codes are clearly identified (403 and 503), it would be
> better if, at least for a 503 error, the connector ignored it, continued the
> job, and logged something into the repo history instead of aborting the job.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CONNECTORS-1609) SharePoint connector ignore 503 errors

2019-05-27 Thread Julien Massiera (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16849069#comment-16849069
 ] 

Julien Massiera commented on CONNECTORS-1609:
-

[~kwri...@metacarta.com], no, the patch does not address the problem. It does
not cover the code lines in question:

Error fetching document xx': 503

    at org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.fetchAndIndexFile(SharePointRepository.java:1778) ~[?:?]

    at org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.processDocuments(SharePointRepository.java:1542) ~[?:?]

    at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) [mcf-pull-agent.jar:?]

> SharePoint connector ignore 503 errors
> --
>
> Key: CONNECTORS-1609
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1609
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: SharePoint connector
>Affects Versions: ManifoldCF 2.12
>Reporter: Julien Massiera
>Assignee: Karl Wright
>Priority: Critical
> Attachments: CONNECTORS-1609.diff
>
>
> During a job, it may happen, for reasons related to the SharePoint server
> configuration, that some resources of an SP site are not available (for
> instance, when credentials are required to open a resource). In that case,
> the SP connector gets a 403 or a 503 response code from SharePoint. The
> problem is that whenever it gets this kind of response code, the job is
> aborted with an error.
> Since the response codes are clearly identified (403 and 503), it would be
> better if, at least for a 503 error, the connector ignored it, continued the
> job, and logged something into the repo history instead of aborting the job.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (CONNECTORS-1609) SharePoint connector ignore 503 errors

2019-05-27 Thread Karl Wright (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright updated CONNECTORS-1609:

Attachment: CONNECTORS-1609.diff

> SharePoint connector ignore 503 errors
> --
>
> Key: CONNECTORS-1609
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1609
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: SharePoint connector
>Affects Versions: ManifoldCF 2.12
>Reporter: Julien Massiera
>Assignee: Karl Wright
>Priority: Critical
> Attachments: CONNECTORS-1609.diff
>
>
> During a job, it may happen, for reasons related to the SharePoint server
> configuration, that some resources of an SP site are not available (for
> instance, when credentials are required to open a resource). In that case,
> the SP connector gets a 403 or a 503 response code from SharePoint. The
> problem is that whenever it gets this kind of response code, the job is
> aborted with an error.
> Since the response codes are clearly identified (403 and 503), it would be
> better if, at least for a 503 error, the connector ignored it, continued the
> job, and logged something into the repo history instead of aborting the job.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CONNECTORS-1609) SharePoint connector ignore 503 errors

2019-05-27 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16848719#comment-16848719
 ] 

Karl Wright commented on CONNECTORS-1609:
-

Patch attached.  Please try it and tell me whether it addresses your problem.


> SharePoint connector ignore 503 errors
> --
>
> Key: CONNECTORS-1609
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1609
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: SharePoint connector
>Affects Versions: ManifoldCF 2.12
>Reporter: Julien Massiera
>Assignee: Karl Wright
>Priority: Critical
> Attachments: CONNECTORS-1609.diff
>
>
> During a job, it may happen, for reasons related to the SharePoint server
> configuration, that some resources of an SP site are not available (for
> instance, when credentials are required to open a resource). In that case,
> the SP connector gets a 403 or a 503 response code from SharePoint. The
> problem is that whenever it gets this kind of response code, the job is
> aborted with an error.
> Since the response codes are clearly identified (403 and 503), it would be
> better if, at least for a 503 error, the connector ignored it, continued the
> job, and logged something into the repo history instead of aborting the job.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (CONNECTORS-1609) SharePoint connector ignore 503 errors

2019-05-27 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16848713#comment-16848713
 ] 

Karl Wright edited comment on CONNECTORS-1609 at 5/27/19 8:20 AM:
--

As discussed in email, 403 actually means something already: "permission 
denied".  As such it is returned when the credentials provided are incorrect.  
It would be a bad idea to make the connector just keep going when it receives 
this error code, in my opinion.



was (Author: kwri...@metacarta.com):
As discussed in email, 403 actually means something already: "permission 
denied".  As such it is returned when the credentials provided are incorrect.  
It would be a bad idea to make the connector just keep going when it receives 
these error codes, in my opinion.


> SharePoint connector ignore 503 errors
> --
>
> Key: CONNECTORS-1609
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1609
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: SharePoint connector
>Affects Versions: ManifoldCF 2.12
>Reporter: Julien Massiera
>Assignee: Karl Wright
>Priority: Critical
>
> During a job, it may happen, for reasons related to the SharePoint server
> configuration, that some resources of an SP site are not available (for
> instance, when credentials are required to open a resource). In that case,
> the SP connector gets a 403 or a 503 response code from SharePoint. The
> problem is that whenever it gets this kind of response code, the job is
> aborted with an error.
> Since the response codes are clearly identified (403 and 503), it would be
> better if, at least for a 503 error, the connector ignored it, continued the
> job, and logged something into the repo history instead of aborting the job.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CONNECTORS-1609) SharePoint connector ignore 503 errors

2019-05-27 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16848713#comment-16848713
 ] 

Karl Wright commented on CONNECTORS-1609:
-

As discussed in email, 403 actually means something already: "permission 
denied".  As such it is returned when the credentials provided are incorrect.  
It would be a bad idea to make the connector just keep going when it receives 
these error codes, in my opinion.


> SharePoint connector ignore 503 errors
> --
>
> Key: CONNECTORS-1609
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1609
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: SharePoint connector
>Affects Versions: ManifoldCF 2.12
>Reporter: Julien Massiera
>Assignee: Karl Wright
>Priority: Critical
>
> During a job, it may happen, for reasons related to the SharePoint server
> configuration, that some resources of an SP site are not available (for
> instance, when credentials are required to open a resource). In that case,
> the SP connector gets a 403 or a 503 response code from SharePoint. The
> problem is that whenever it gets this kind of response code, the job is
> aborted with an error.
> Since the response codes are clearly identified (403 and 503), it would be
> better if, at least for a 503 error, the connector ignored it, continued the
> job, and logged something into the repo history instead of aborting the job.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (CONNECTORS-1609) SharePoint connector ignore 503 errors

2019-05-27 Thread Karl Wright (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright reassigned CONNECTORS-1609:
---

Assignee: Karl Wright

> SharePoint connector ignore 503 errors
> --
>
> Key: CONNECTORS-1609
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1609
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: SharePoint connector
>Affects Versions: ManifoldCF 2.12
>Reporter: Julien Massiera
>Assignee: Karl Wright
>Priority: Critical
>
> During a job, it may happen, for reasons related to the SharePoint server
> configuration, that some resources of an SP site are not available (for
> instance, when credentials are required to open a resource). In that case,
> the SP connector gets a 403 or a 503 response code from SharePoint. The
> problem is that whenever it gets this kind of response code, the job is
> aborted with an error.
> Since the response codes are clearly identified (403 and 503), it would be
> better if, at least for a 503 error, the connector ignored it, continued the
> job, and logged something into the repo history instead of aborting the job.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (CONNECTORS-1609) SharePoint connector ignore 503 errors

2019-05-27 Thread Julien Massiera (JIRA)
Julien Massiera created CONNECTORS-1609:
---

 Summary: SharePoint connector ignore 503 errors
 Key: CONNECTORS-1609
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1609
 Project: ManifoldCF
  Issue Type: Improvement
  Components: SharePoint connector
Affects Versions: ManifoldCF 2.12
Reporter: Julien Massiera


During a job, it may happen, for reasons related to the SharePoint server
configuration, that some resources of an SP site are not available (for
instance, when credentials are required to open a resource). In that case, the
SP connector gets a 403 or a 503 response code from SharePoint. The problem is
that whenever it gets this kind of response code, the job is aborted with an
error.

Since the response codes are clearly identified (403 and 503), it would be
better if, at least for a 503 error, the connector ignored it, continued the
job, and logged something into the repo history instead of aborting the job.
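
A rough sketch of the requested behavior (this is not the attached patch; the
ACTIVITY_FETCH constant and the httpStatus/versionString variables are
assumptions about the connector's fetch path):

if (httpStatus == 503)
{
  // Record the failure in the repository history and skip this document
  // instead of aborting the whole job.
  activities.recordActivity(System.currentTimeMillis(), ACTIVITY_FETCH, null,
      documentIdentifier, "SERVICEUNAVAILABLE",
      "SharePoint returned 503 for this resource; skipping", null);
  activities.noDocument(documentIdentifier, versionString);
  return;
}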



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)