[ 
https://issues.apache.org/jira/browse/SQOOP-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Veena Basavaraj reassigned SQOOP-1719:
--------------------------------------

    Assignee:     (was: Veena Basavaraj)

> Schema Validation Rules between From and To Schema
> --------------------------------------------------
>
>                 Key: SQOOP-1719
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1719
>             Project: Sqoop
>          Issue Type: Sub-task
>          Components: sqoop2-framework
>            Reporter: Veena Basavaraj
>             Fix For: 2.0.0
>
>
> Today we have Matcher code that checks for the existence of at least one schema.
> {code}
> public Matcher(Schema fromSchema, Schema toSchema) {
>     if (fromSchema.isEmpty() && toSchema.isEmpty()) {
>       throw new SqoopException(MatcherError.MATCHER_0000, "Neither a FROM or TO schemas been provided.");
>     } else if (toSchema.isEmpty()) {
>       this.fromSchema = fromSchema;
>       this.toSchema = fromSchema;
>     } else if (fromSchema.isEmpty()) {
>       this.fromSchema = toSchema;
>       this.toSchema = toSchema;
>     } else {
>       this.fromSchema = fromSchema;
>       this.toSchema = toSchema;
>     }
>   }
> {code}
> If both exist, then in addition to this we need to validate that they are 
> compatible.
> Today we have some logic that chooses a matcher based on the presence or 
> absence of the FROM and TO schemas:
> {code}
> public class MatcherFactory {
>   public static Matcher getMatcher(Schema fromSchema, Schema toSchema) {
>     if (toSchema.isEmpty() || fromSchema.isEmpty()) {
>       return new LocationMatcher(fromSchema, toSchema);
>     } else {
>       return new NameMatcher(fromSchema, toSchema);
>     }
>   }
> }
> {code}
> But the above can be extended to further elaborate the rules and the order in 
> which these rules will and should be applied. Having this in Sqoop internals 
> means we had better have a good story on how schema matching works.
> For instance, if we have a FROM schema with one column of type String and a 
> TO schema with one column of type INTEGER, then we should warn or fail before 
> even starting the job, since the transfer might not be recommended. These 
> validation rules are not documented in Sqoop and, if implemented, should be 
> configurable externally per job if possible.
> Second, such validation should happen before the job is submitted. But for 
> that we need to get the schemas, so it may not be possible to avoid 
> starting the job.
> NOTE: In 1.99.5 we do not yet support transformations, hence the schemas for 
> the FROM and TO are static, i.e. there is no way during the job execution for 
> the TO source to tell that it would like to store the varchar data as binary. 
> If the FROM source has the "varchar" type, we will validate that the TO 
> source stores it as varchar or a suitably compatible type as well.
> If we allow a transformation layer in between after 1.99.5, then the schema 
> can potentially change during the job execution phase, i.e. we can have 
> dynamic schemas created per job.
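
To make the String-to-INTEGER example in the description concrete, here is a rough sketch of what a per-job compatibility check might look like. It is only an illustration under stated assumptions: SchemaCompatibilitySketch, Type, Policy, acceptedBy, findProblems and validate are hypothetical names, not existing Sqoop classes, and the rule table is invented for the example. Columns are compared by position, which is also an assumption.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: none of these types are Sqoop classes.
final class SchemaCompatibilitySketch {

  // Simplified stand-in for a column type; the names are illustrative.
  enum Type { TEXT, INTEGER, BINARY }

  // What to do when an incompatibility is found; assumed to be set per job.
  enum Policy { WARN, FAIL }

  // Assumed rule table: the TO types that may store a given FROM type.
  static Set<Type> acceptedBy(Type from) {
    switch (from) {
      case INTEGER: return EnumSet.of(Type.INTEGER, Type.TEXT);
      case TEXT:    return EnumSet.of(Type.TEXT);
      default:      return EnumSet.of(Type.BINARY);
    }
  }

  // Compares columns by position and returns one message per problem.
  static List<String> findProblems(List<Type> from, List<Type> to) {
    List<String> problems = new ArrayList<>();
    if (from.size() != to.size()) {
      problems.add("column count differs: " + from.size() + " vs " + to.size());
      return problems;
    }
    for (int i = 0; i < from.size(); i++) {
      if (!acceptedBy(from.get(i)).contains(to.get(i))) {
        problems.add("column " + i + ": " + from.get(i) + " -> " + to.get(i));
      }
    }
    return problems;
  }

  // Applies the policy: FAIL throws, WARN only reports the problems.
  static List<String> validate(List<Type> from, List<Type> to, Policy policy) {
    List<String> problems = findProblems(from, to);
    if (!problems.isEmpty() && policy == Policy.FAIL) {
      throw new IllegalStateException("Incompatible schemas: " + problems);
    }
    return problems;
  }
}
```

With a single TEXT column on the FROM side and a single INTEGER column on the TO side, validate with Policy.WARN returns one problem message, while Policy.FAIL throws before the job would start. Where the schemas come from, and whether they can be fetched before submission, is left open here just as it is in the description.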



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
