[ https://issues.apache.org/jira/browse/SQOOP-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Veena Basavaraj reassigned SQOOP-1719:
--------------------------------------

    Assignee:     (was: Veena Basavaraj)

> Schema Validation Rules between From and To Schema
> --------------------------------------------------
>
>                 Key: SQOOP-1719
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1719
>             Project: Sqoop
>          Issue Type: Sub-task
>          Components: sqoop2-framework
>            Reporter: Veena Basavaraj
>             Fix For: 2.0.0
>
>
> Today we have Matcher code that checks for the existence of at least one
> schema.
> {code}
> public Matcher(Schema fromSchema, Schema toSchema) {
>   if (fromSchema.isEmpty() && toSchema.isEmpty()) {
>     // Neither side supplied a schema, so there is nothing to match on.
>     throw new SqoopException(MatcherError.MATCHER_0000, "Neither a FROM or TO schemas been provided.");
>   } else if (toSchema.isEmpty()) {
>     // Only the FROM schema exists: use it for both sides.
>     this.fromSchema = fromSchema;
>     this.toSchema = fromSchema;
>   } else if (fromSchema.isEmpty()) {
>     // Only the TO schema exists: use it for both sides.
>     this.fromSchema = toSchema;
>     this.toSchema = toSchema;
>   } else {
>     this.fromSchema = fromSchema;
>     this.toSchema = toSchema;
>   }
> }
> {code}
> If both exist, then in addition to this we need to validate that they are
> compatible.
> Today we have some logic that chooses a matcher based on the presence or
> absence of the FROM and TO schemas:
> {code}
> public class MatcherFactory {
>   public static Matcher getMatcher(Schema fromSchema, Schema toSchema) {
>     if (toSchema.isEmpty() || fromSchema.isEmpty()) {
>       return new LocationMatcher(fromSchema, toSchema);
>     } else {
>       return new NameMatcher(fromSchema, toSchema);
>     }
>   }
> }
> {code}
> But the above can be extended to further elaborate the rules and the order in
> which these rules will and should be applied. Having this in Sqoop internals
> means we had better have a good story on how schema matching works.
> For instance, if we have a FROM schema with one column of type String and a
> TO schema with one column of type INTEGER, then we should warn or fail before
> even starting the job, since that conversion might not be recommended.
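To make the String-to-INTEGER case concrete, here is a minimal sketch of what such a check could look like. Everything in it is an assumption for illustration: `SchemaCompatibilitySketch`, its simplified `ColumnType` enum, the `Policy` enum, and the contents of the rule table are hypothetical and are not part of the Sqoop API. Columns are compared by position only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: none of these types belong to the Sqoop API.
final class SchemaCompatibilitySketch {

  // Simplified stand-in for column types.
  enum ColumnType { TEXT, FIXED_POINT, BINARY }

  // What to do when a mismatch is found; imagined as a per-job setting.
  enum Policy { WARN, FAIL }

  // Assumed rule table: for each FROM type, the TO types treated as safe.
  private static final Map<ColumnType, Set<ColumnType>> SAFE_TARGETS =
      new EnumMap<>(ColumnType.class);
  static {
    SAFE_TARGETS.put(ColumnType.TEXT, EnumSet.of(ColumnType.TEXT));
    SAFE_TARGETS.put(ColumnType.FIXED_POINT,
        EnumSet.of(ColumnType.FIXED_POINT, ColumnType.TEXT));
    SAFE_TARGETS.put(ColumnType.BINARY, EnumSet.of(ColumnType.BINARY));
  }

  static boolean isCompatible(ColumnType from, ColumnType to) {
    return SAFE_TARGETS.get(from).contains(to);
  }

  // Compares columns by position and describes every mismatch found.
  static List<String> findMismatches(List<ColumnType> from, List<ColumnType> to) {
    List<String> problems = new ArrayList<>();
    if (from.size() != to.size()) {
      problems.add("column count differs: " + from.size() + " vs " + to.size());
      return problems;
    }
    for (int i = 0; i < from.size(); i++) {
      if (!isCompatible(from.get(i), to.get(i))) {
        problems.add("column " + i + ": " + from.get(i) + " -> " + to.get(i));
      }
    }
    return problems;
  }

  // Under FAIL, throws on any mismatch; under WARN, returns the mismatches
  // so the caller can log them and still start the job.
  static List<String> validate(List<ColumnType> from, List<ColumnType> to, Policy policy) {
    List<String> problems = findMismatches(from, to);
    if (!problems.isEmpty() && policy == Policy.FAIL) {
      throw new IllegalStateException("Incompatible schemas: " + problems);
    }
    return problems;
  }

  public static void main(String[] args) {
    List<ColumnType> from = Arrays.asList(ColumnType.TEXT);
    List<ColumnType> to = Arrays.asList(ColumnType.FIXED_POINT);
    System.out.println(validate(from, to, Policy.WARN));
  }
}
```

Keeping the rules in a table rather than in nested conditionals is one way to make them easy to document and, later, to override per job.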
> These validation rules are not documented in Sqoop and, if implemented,
> should be configurable externally per job if possible.
> Second, such validation should happen before the job is submitted. But for
> that we need to get the schemas, so it may not be possible to avoid starting
> the job.
> NOTE: In 1.99.5 we do not yet support transformations, hence the schemas for
> the FROM and TO are static, i.e. there is no way during job execution for
> the TO source to say that it would like to store the varchar data as binary.
> If the FROM source has the "varchar" type, we will validate that the TO data
> source stores it as varchar or a suitable compatible type.
> If we allow a transformation layer in between after 1.99.5, then the schema
> could potentially change during the job execution phase, i.e. dynamic schemas
> could be created per job.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)