I've attached a patch to the CONNECTORS-1242 ticket.

Karl

On Thu, Sep 17, 2015 at 12:52 PM, Karl Wright <[email protected]> wrote:

> I was able to reproduce this; CONNECTORS-1242.
>
> Karl
>
>
> On Thu, Sep 17, 2015 at 12:45 PM, Karl Wright <[email protected]> wrote:
>
>> I'm interested in the time it is supposed to be processed, actually.
>>
>> I'm trying to recreate your example here to see if I can get more
>> information.
>>
>> Karl
>>
>>
>> On Thu, Sep 17, 2015 at 12:36 PM, Colreavy, Niall <[email protected]> wrote:
>>
>>> The document is in a state of 'Processed' and the status is 'Ready for
>>> processing'
>>>
>>> -----Original Message-----
>>> From: Karl Wright [mailto:[email protected]]
>>> Sent: 17 September 2015 5:28
>>> To: dev
>>> Subject: Re: Potential Issue with pausing jobs
>>>
>>> When it is in the state after the job has resumed, can you do a Document
>>> Status report and tell me what that says for your document?
>>>
>>> Thanks,
>>> Karl
>>>
>>>
>>> On Thu, Sep 17, 2015 at 12:16 PM, Colreavy, Niall <[email protected]> wrote:
>>>
>>> > Hi Karl,
>>> >
>>> > Thanks for that. I think the problem might be more fundamental. When I
>>> > start my job and monitor the simple job history I can see the job doing
>>> > things like:
>>> >
>>> > Run the seed query
>>> > Run the data query
>>> > Run the seed query
>>> > Run the data query
>>> >
>>> > Etc.
>>> >
>>> > It continues to do this indefinitely from what I have observed. As soon as
>>> > I pause and resume the job, all I can see in the simple job history is:
>>> >
>>> > Run the seed query
>>> > Run the seed query
>>> > Run the seed query
>>> >
>>> > It's like it's never going to run the data query again?
>>> >
>>> > Kind Regards,
>>> >
>>> > Niall
>>> >
>>> > -----Original Message-----
>>> > From: Karl Wright [mailto:[email protected]]
>>> > Sent: 17 September 2015 4:53
>>> > To: dev
>>> > Subject: Re: Potential Issue with pausing jobs
>>> >
>>> > Hi Niall,
>>> >
>>> > A continuous job reseeds on a schedule, which you set as part of the job
>>> > setup. For a continuous job, if the document has been crawled, it will be
>>> > recrawled again at a specific time in the future, and if at that time it
>>> > hasn't changed, it will be scheduled for checking again even further out,
>>> > up to a certain limit (also settable within the job).
>>> >
>>> > You can look at the document's schedule, by the way, using the "Document
>>> > Status" report, and it should be pretty clear from that what should happen
>>> > and when.
>>> >
>>> > When you abort the job and restart it, everything is reset, so the document
>>> > will be checked immediately at that point, and relatively frequently for a
>>> > while until the system figures out that the document isn't changing very
>>> > rapidly.
>>> >
>>> > Thanks,
>>> > Karl
>>> >
>>> >
>>> > On Thu, Sep 17, 2015 at 11:38 AM, Colreavy, Niall <[email protected]> wrote:
>>> >
>>> > > Hi Karl,
>>> > >
>>> > > You'll have to forgive me if my answer is a bit uncertain but I am very
>>> > > new to MCF. Just to clarify, I have a very simple job. For the JDBC
>>> > > connector, I am literally just selecting 1 for the id, 'myurl' for the url
>>> > > and 'mydata' for the data. So there is only ever 1 document being
>>> > > processed.
>>> > >
>>> > > So to answer the questions:
>>> > >
>>> > > 1. There are 0 active documents on the queue.
>>> > > 2. Single process
>>> > > 3. Yes, this is a continuous crawl.
>>> > >
>>> > > Kind Regards,
>>> > >
>>> > > Niall
>>> > >
>>> > > -----Original Message-----
>>> > > From: Karl Wright [mailto:[email protected]]
>>> > > Sent: 17 September 2015 4:27
>>> > > To: dev
>>> > > Subject: Re: Potential Issue with pausing jobs
>>> > >
>>> > > Hi Niall,
>>> > >
>>> > > Pausing and resuming a job should have no effects *other* than
>>> > > reprioritization of the active documents on the queue, which if there are a
>>> > > lot of them, may take some time.
>>> > >
>>> > > So let's ask some basic questions. (1) How many active documents on your
>>> > > queue? (2) What kind of synchronization are you using? Is this single
>>> > > process, or multiprocess? (3) Is this a continuous crawl?
>>> > >
>>> > > >>>>>>
>>> > > And on a side note, what is the difference between pausing a job and
>>> > > aborting a job?
>>> > > <<<<<<
>>> > >
>>> > > I can't fully answer that unless I know the characteristics of your job,
>>> > > especially continuous crawl vs. crawl to completion.
>>> > >
>>> > > Karl
>>> > >
>>> > >
>>> > > On Thu, Sep 17, 2015 at 11:07 AM, Colreavy, Niall <[email protected]> wrote:
>>> > >
>>> > > > Hi,
>>> > > >
>>> > > > I am experimenting with pausing a job. The job has a simple JDBC
>>> > > > connection and a null output connection. I was experimenting with pausing
>>> > > > the job and I notice that when I resume the job, and monitor its progress
>>> > > > in the simple history report, the job never seems to run the data query
>>> > > > any more. I can see that it runs the seed query but it doesn't progress to
>>> > > > the data query. If I abort the job and restart it, it does seem to start
>>> > > > running the data query again.
>>> > > >
>>> > > > Can anyone explain this behaviour? And on a side note, what is the
>>> > > > difference between pausing a job and aborting a job?
>>> > > >
>>> > > > Thanks,
>>> > > >
>>> > > > Niall
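
As a rough illustration of the continuous-crawl rescheduling described in the quoted thread (recheck a crawled document at a future time, push the next check further out while it stays unchanged, up to a job-level limit, and start over after an abort and restart), here is a minimal Python sketch. It is not ManifoldCF code: the class name, the doubling factor, and the interval values are assumptions made up for illustration, and the real scheduler may well work differently.

```python
from dataclasses import dataclass


@dataclass
class RecrawlSchedule:
    """Hypothetical per-document schedule; not ManifoldCF's real data model."""

    min_interval: float  # assumed starting gap between checks, in seconds
    max_interval: float  # assumed upper limit, standing in for the job setting
    growth: float = 2.0  # assumed back-off factor for unchanged documents
    interval: float = 0.0
    next_check: float = 0.0

    def __post_init__(self):
        # A freshly started job checks the document straight away.
        self.reset(now=0.0)

    def reset(self, now):
        """Model an abort and restart: forget the back-off, check immediately."""
        self.interval = self.min_interval
        self.next_check = now

    def record_check(self, now, changed):
        """Record a check at time `now` and return the next check time."""
        if changed:
            # The document changed, so go back to frequent checks.
            self.interval = self.min_interval
        else:
            # Unchanged: look again further out, but never past the limit.
            self.interval = min(self.interval * self.growth, self.max_interval)
        self.next_check = now + self.interval
        return self.next_check
```

With `RecrawlSchedule(min_interval=60, max_interval=200)`, successive unchanged checks move the next check time further out until the gap reaches the 200-second cap, while `reset()` makes the document due again at once. The Document Status report mentioned in the thread is the place to see the schedule the real system has actually assigned.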
