[ 
https://issues.apache.org/jira/browse/SQOOP-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243794#comment-14243794
 ] 

Veena Basavaraj commented on SQOOP-1878:
----------------------------------------

After an offline discussion with Gwen, I got some insights into the cache. To be
precise, the current JIRA description did not propose any cache, for the simple
reason that it seems like overkill to me.

The change suggested here is very simple and straightforward:

1. Correct the implementation of the GET API to be true to its semantic meaning.
Do NOT do updates in a GET call.
2. If the requirement is a more real-time status update, make it an action API
on the submission (POST / PUT): the user invokes the update on the submission,
and in turn we make the MR call to get the RunningJob details:
{code}
RunningJob runningJob = jobClient.getJob(JobID.forName(externalJobId));
{code}

PS: Another optimization to fix: do not make this MR call once per accessor, as
the code below does. Each API call (status, error, externalLink, then progress
or counters) makes its own MR call:
{code}
String externalJobId = submission.getExternalJobId();
// Each submissionEngine call below makes its own MR call
SubmissionStatus newStatus = submissionEngine.status(externalJobId);
SubmissionError error = submissionEngine.error(externalJobId);
String externalLink = submissionEngine.externalLink(externalJobId);

if (newStatus.isRunning()) {
  progress = submissionEngine.progress(externalJobId);
} else {
  counters = submissionEngine.counters(externalJobId);
}
{code}
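A minimal sketch of that optimization, assuming the engine could expose a single call that returns everything update() needs from one MR lookup (in the real code, one jobClient.getJob(...) result). JobSnapshot, SnapshotEngine, CountingEngine and Submission are hypothetical names, not existing Sqoop classes; the fake engine only counts remote calls so the saving is visible.

{code:java}
import java.util.Map;

public class SingleFetchSketch {

    // Hypothetical snapshot of everything update() needs, filled by ONE remote lookup.
    record JobSnapshot(String status, String error, String externalLink,
                       double progress, Map<String, Long> counters) {
        boolean isRunning() {
            return "RUNNING".equals(status);
        }
    }

    // Hypothetical engine API: one method, one remote call per update.
    interface SnapshotEngine {
        JobSnapshot snapshot(String externalJobId);
    }

    // Fake engine that counts remote calls so the saving is observable.
    static class CountingEngine implements SnapshotEngine {
        int remoteCalls = 0;

        @Override
        public JobSnapshot snapshot(String externalJobId) {
            remoteCalls++;
            return new JobSnapshot("RUNNING", null,
                    "http://jobtracker.example/" + externalJobId, 0.42, Map.of());
        }
    }

    // Hypothetical stand-in for the submission fields that update() writes.
    static class Submission {
        final String externalJobId;
        String status;
        String error;
        String externalLink;
        double progress = -1;
        Map<String, Long> counters;

        Submission(String externalJobId) {
            this.externalJobId = externalJobId;
        }
    }

    // Same branching as the excerpt above, but every value comes from one snapshot.
    static void update(Submission submission, SnapshotEngine engine) {
        JobSnapshot snap = engine.snapshot(submission.externalJobId);
        submission.status = snap.status();
        submission.error = snap.error();
        submission.externalLink = snap.externalLink();
        if (snap.isRunning()) {
            submission.progress = snap.progress();
        } else {
            submission.counters = snap.counters();
        }
    }

    public static void main(String[] args) {
        CountingEngine engine = new CountingEngine();
        Submission s = new Submission("job_0001");
        update(s, engine);
        System.out.println(s.status + " " + s.progress + " calls=" + engine.remoteCalls);
    }
}
{code}

Running main prints "RUNNING 0.42 calls=1": the whole update costs a single remote lookup instead of one per accessor.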

3. If we do not need both a GET and an update, that is also reasonable: let's
fix the API to be a POST / PUT, which is a one-line change, or a few more lines
of change to add the updateAndGetSubmission API.

4. I do not have stats on the best interval for the UpdateThread to poll. All I
said is that it can be made configurable from sqoop.properties if we want, but
that seems orthogonal to my proposal here and is not directly related to this
change.
{code}
private class UpdateThread extends Thread {
  public UpdateThread() {
    super("UpdateThread");
  }

  public void run() {
    LOG.info("Starting submission manager update thread");

    // running and updateSleep are fields of the enclosing manager class
    while (running) {
      try {
        LOG.debug("Updating running submissions");

        // Let's get all running submissions from repository to check them out
        List<MSubmission> unfinishedSubmissions =
          RepositoryManager.getInstance().getRepository()
            .findUnfinishedSubmissions();

        for (MSubmission submission : unfinishedSubmissions) {
          update(submission);
        }

        // updateSleep is the poll interval discussed above
        Thread.sleep(updateSleep);
      } catch (InterruptedException e) {
        LOG.debug("Update thread interrupted", e);
      }
    }

    LOG.info("Ending submission manager update thread");
  }
}
{code}
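A sketch of what making the interval configurable could look like, assuming the value is read from a java.util.Properties loaded from sqoop.properties. The key org.apache.sqoop.example.update.sleep is a placeholder, not the real Sqoop property name, and the loop below is a simplified stand-in for UpdateThread in which the per-submission work is passed in as a Runnable.

{code:java}
import java.util.Properties;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ConfigurableUpdateThreadSketch {

    // Hypothetical property key; the real sqoop.properties key may differ.
    static final String UPDATE_SLEEP_KEY = "org.apache.sqoop.example.update.sleep";
    // Five minutes, the interval mentioned in the issue description.
    static final long DEFAULT_UPDATE_SLEEP_MS = TimeUnit.MINUTES.toMillis(5);

    // Reads the poll interval in milliseconds, falling back to the default
    // when the key is missing, unparsable, or not positive.
    static long readUpdateSleep(Properties props) {
        String raw = props.getProperty(UPDATE_SLEEP_KEY);
        if (raw == null) {
            return DEFAULT_UPDATE_SLEEP_MS;
        }
        try {
            long value = Long.parseLong(raw.trim());
            return value > 0 ? value : DEFAULT_UPDATE_SLEEP_MS;
        } catch (NumberFormatException e) {
            return DEFAULT_UPDATE_SLEEP_MS;
        }
    }

    // Polling loop in the shape of UpdateThread, with the interval injected.
    static class UpdateThread extends Thread {
        private final long updateSleep;
        private final Runnable updateAll; // stands in for "update every unfinished submission"
        private volatile boolean running = true;

        UpdateThread(long updateSleep, Runnable updateAll) {
            super("UpdateThread");
            this.updateSleep = updateSleep;
            this.updateAll = updateAll;
        }

        // Stops the loop and wakes the thread if it is sleeping.
        void shutdown() {
            running = false;
            interrupt();
        }

        @Override
        public void run() {
            while (running) {
                updateAll.run();
                try {
                    Thread.sleep(updateSleep);
                } catch (InterruptedException e) {
                    // Interrupted (e.g. by shutdown()): restore the flag and exit the loop.
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.setProperty(UPDATE_SLEEP_KEY, "50"); // poll every 50 ms for the demo
        AtomicInteger passes = new AtomicInteger();
        UpdateThread thread = new UpdateThread(readUpdateSleep(props), passes::incrementAndGet);
        thread.start();
        Thread.sleep(200);
        thread.shutdown();
        thread.join();
        System.out.println("update passes: " + passes.get());
    }
}
{code}

Falling back to the 5-minute value on a missing or bad setting keeps today's behaviour unless an operator opts in, and shutdown() interrupts the sleep so the thread stops promptly instead of waiting out a full interval.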

I hope this makes it clear.

For the record, a cache is not something I proposed, so a +1 on a proposal I
did not even make threw me off.

> JobManager status method is a GET call and should not perform update
> operations
> --------------------------------------------------------------------------------
>
>                 Key: SQOOP-1878
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1878
>             Project: Sqoop
>          Issue Type: Sub-task
>            Reporter: Veena Basavaraj
>            Assignee: Veena Basavaraj
>             Fix For: 1.99.5
>
>         Attachments: SQOOP-1878.patch
>
>
> JobManager status method is a GET call and should not perform update
> operations
> The status() method on JobManager serves a GET request from the REST API. It
> should never do an update.
> If we need faster updates of the job, then it is best to create a new action
> that can do it, or to reduce the updateSleep parameter to less than 5 minutes.
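The split described above (a read-only status() behind GET, and a separate action that refreshes) can be sketched as follows. JobService, LiveStatusSource and both method names are hypothetical, loosely modeled on the updateAndGetSubmission idea; the in-memory map stands in for the repository and the lambda stands in for the MR call.

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class StatusVsRefreshSketch {

    // Hypothetical stand-in for the stored submission state.
    static final class Submission {
        final long jobId;
        String status;

        Submission(long jobId, String status) {
            this.jobId = jobId;
            this.status = status;
        }
    }

    // Hypothetical live lookup; in Sqoop this is where the MR call would go.
    interface LiveStatusSource {
        String fetchStatus(long jobId);
    }

    static final class JobService {
        private final Map<Long, Submission> repository = new HashMap<>();
        private final LiveStatusSource live;

        JobService(LiveStatusSource live) {
            this.live = live;
        }

        void save(Submission s) {
            repository.put(s.jobId, s);
        }

        // GET semantics: read the stored state only; no remote call, no write.
        String getSubmissionStatus(long jobId) {
            return find(jobId).status;
        }

        // Action semantics (POST/PUT): refresh from the live source, persist, return.
        String updateAndGetSubmissionStatus(long jobId) {
            Submission s = find(jobId);
            s.status = live.fetchStatus(jobId);
            save(s);
            return s.status;
        }

        private Submission find(long jobId) {
            Submission s = repository.get(jobId);
            if (s == null) {
                throw new IllegalArgumentException("Unknown job " + jobId);
            }
            return s;
        }

        // Maps HTTP verbs onto the two methods so only POST/PUT can trigger an update.
        String handle(String httpMethod, long jobId) {
            switch (httpMethod) {
                case "GET":
                    return getSubmissionStatus(jobId);
                case "POST":
                case "PUT":
                    return updateAndGetSubmissionStatus(jobId);
                default:
                    throw new IllegalArgumentException("Unsupported method " + httpMethod);
            }
        }
    }

    public static void main(String[] args) {
        AtomicInteger liveCalls = new AtomicInteger();
        JobService service = new JobService(jobId -> {
            liveCalls.incrementAndGet();
            return "SUCCEEDED";
        });
        service.save(new Submission(7L, "RUNNING"));

        System.out.println(service.handle("GET", 7L));  // stored value, no live call
        System.out.println(service.handle("POST", 7L)); // explicit refresh, one live call
        System.out.println(service.handle("GET", 7L));  // refreshed stored value
        System.out.println("live calls: " + liveCalls.get());
    }
}
{code}

Running main prints RUNNING, SUCCEEDED, SUCCEEDED and then "live calls: 1": the GETs never reach the live source, only the POST does, while the background update thread can keep refreshing on its own schedule.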



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
