[ 
https://issues.apache.org/jira/browse/SQOOP-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363551#comment-14363551
 ] 

Veena Basavaraj edited comment on SQOOP-1803 at 3/16/15 9:56 PM:
-----------------------------------------------------------------

To clarify the earlier point I made a few days ago, which did not seem to catch 
your attention [~jarcec], here are the details.

MutableContext today is not persisted, and it only allows certain types such as 
int/long/boolean/String. My question was whether we should allow even a list/map, 
or any object, to be stored in here. The key/value pairs are already uniquely 
identified, so any config is underneath a key/value pair, and we can keep this 
interface to update or overwrite any of these config values. I do not see a need 
for a special API for doing this.
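For illustration, a minimal sketch (with hypothetical names, not the actual Sqoop API) of how richer types can ride on the same uniquely keyed key/value interface without any special API:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: every config value lives under a unique key, so richer
// types (lists, maps, arbitrary objects) fit the same interface that already
// serves int/long/boolean/String.
public class SketchContext {
    private final Map<String, Object> values = new HashMap<>();

    public void setString(String key, String value) { values.put(key, value); }
    public void setLong(String key, long value) { values.put(key, value); }

    // The only addition needed for richer types: store the object as-is.
    public void setObject(String key, Object value) { values.put(key, value); }

    public String getString(String key) { return (String) values.get(key); }
    public long getLong(String key) { return (long) values.get(key); }

    @SuppressWarnings("unchecked")
    public <T> T getObject(String key) { return (T) values.get(key); }
}
```

The point is that `setObject`/`getObject` reuse the same map and the same update/overwrite semantics the typed setters already have.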

The only additional change is to look up this context map that the initializer 
has already set and then persist it. We can add a new property to indicate 
whether a context value is "transient or persistent", so we don't end up shoving 
everything in this object into the repository. Makes sense?
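A rough sketch of that transient/persistent flag idea (again with hypothetical names): each entry carries a flag, and only the flagged subset would be written to the repository.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: tag each context entry as transient or persistent, so a
// repository-persist step stores only the persistent subset instead of shoving
// the whole object into the repository.
public class FlaggedContext {
    static final class Entry {
        final Object value;
        final boolean persistent;
        Entry(Object value, boolean persistent) {
            this.value = value;
            this.persistent = persistent;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    public void put(String key, Object value, boolean persistent) {
        entries.put(key, new Entry(value, persistent));
    }

    // The view a persist step would write back; transient entries are dropped.
    public Map<String, Object> persistentView() {
        Map<String, Object> out = new HashMap<>();
        for (Map.Entry<String, Entry> e : entries.entrySet()) {
            if (e.getValue().persistent) {
                out.put(e.getKey(), e.getValue().value);
            }
        }
        return out;
    }
}
```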

Second, and most important: the code I posted above in the JobManager class 
happens only when the job has completed successfully, so there is no need to 
worry about any synchronization issues at this point.
{code}
RepositoryManager.getInstance().getRepository().updateJobConfig(...)
{code}

A few more details after thinking this through some more. My thought when I 
first used the distributed cache was to do this update in the "output 
committer", since it is guaranteed to be called once. Similar to how the current 
SqoopDestroyerExecutor is invoked, we need a MutableContextPersistExecutor, or 
something along those lines, that will invoke the code to persist the context 
into the repository. This is probably the only point in the job flow where we 
are guaranteed to run exactly once. The advantage of storing the state in 
HDFS/cache files would be that we have access to this context/state in the 
JobContext object, but it should not be too hard to pass the "MutableContext" 
object in at the beginning of the job.
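To make the executor idea concrete, here is a sketch of what such a persist step could look like. The `Repository` interface below is a stand-in for illustration, not the real Sqoop repository API, and the class name is my proposal from above:

```java
import java.util.Map;

// Hypothetical sketch of a persist executor invoked once after a successful
// job, analogous to how SqoopDestroyerExecutor invokes the destroyer. The
// Repository interface is a stand-in, not the actual Sqoop API.
public class MutableContextPersistExecutor {
    interface Repository {
        void updateJobConfig(long jobId, Map<String, Object> values);
    }

    // Called from the single run-once point in the job flow, e.g. after the
    // output committer reports success.
    public static void persist(Repository repo, long jobId,
                               Map<String, Object> persistentValues) {
        if (persistentValues.isEmpty()) {
            return; // nothing to write back to the repository
        }
        repo.updateJobConfig(jobId, persistentValues);
    }
}
```

Because this runs only at the run-once point after success, it needs no extra synchronization, matching the JobManager argument above.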



> JobManager and Execution Engine changes: Support for a injecting and pulling 
> out configs and job output in connectors 
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: SQOOP-1803
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1803
>             Project: Sqoop
>          Issue Type: Sub-task
>            Reporter: Veena Basavaraj
>            Assignee: Veena Basavaraj
>             Fix For: 1.99.6
>
>
> The details are in the design wiki, as the implementation happens more 
> discussions can happen here.
> https://cwiki.apache.org/confluence/display/SQOOP/Delta+Fetch+And+Merge+Design#DeltaFetchAndMergeDesign-Howtogetoutputfromconnectortosqoop?
> The goal is to dynamically inject a IncrementalConfig instance into the 
> FromJobConfiguration. The current MFromConfig and MToConfig can already hold 
> a list of configs, and a strong sentiment was expressed to keep it as a list, 
> why not for the first time actually make use of it and group the incremental 
> related configs in one config object
> This task will prepare the FromJobConfiguration from the job config data, 
> ExtractorContext with the relevant values from the prev job run 
> This task will prepare the ToJobConfiguration from the job config data, 
> LoaderContext with the relevant values from the prev job run if any
> We will use DistributedCache to get State information from the Extractor and 
> Loader out and finally persist it into the sqoop repository depending on 
> SQOOP-1804 once the outputcommitter commit is called



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
