[
https://issues.apache.org/jira/browse/SQOOP-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16269004#comment-16269004
]
ASF GitHub Bot commented on SQOOP-3264:
---------------------------------------
GitHub user michalklempa opened a pull request:
https://github.com/apache/sqoop/pull/40
SQOOP-3264 Hive types for JDBC timestamp, date and time types are
timestamp, timestamp and bigint respectively
This resolves https://issues.apache.org/jira/browse/SQOOP-3264, although
further testing on different databases is needed (help welcome).
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/triviadata/sqoop SQOOP-3264_date_to_hive_timestamp
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/sqoop/pull/40.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #40
----
commit ceefc33a6c5632f1300bddc73c20f7819959f10f
Author: Michal Klempa <[email protected]>
Date: 2017-11-28T16:30:28Z
SQOOP-3264 Hive types for JDBC timestamp, date and time types are
timestamp, timestamp and bigint respectively
----
> Import JDBC SQL date,time,timestamp to Hive as TIMESTAMP, BIGINT and TIMESTAMP
> ------------------------------------------------------------------------------
>
> Key: SQOOP-3264
> URL: https://issues.apache.org/jira/browse/SQOOP-3264
> Project: Sqoop
> Issue Type: Improvement
> Components: hive-integration
> Affects Versions: 1.4.6
> Reporter: Michal Klempa
> Priority: Minor
> Fix For: 1.4.7
>
>
> When importing JDBC SQL Types:
> {code}
> public final static int DATE = 91;
> public final static int TIME = 92;
> public final static int TIMESTAMP = 93;
> {code}
> Sqoop currently uses the org.apache.sqoop.hive.HiveTypes.toHiveType method,
> which maps all of these types to the STRING type.
> Given that the JDBC value returned is in fact of type Long, I propose
> we output the Hive types as:
> {code}
> DATE -> TIMESTAMP
> TIME -> BIGINT
> TIMESTAMP -> TIMESTAMP
> {code}
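> For illustration, a minimal sketch of what this mapping could look like in
> org.apache.sqoop.hive.HiveTypes.toHiveType (the actual patch in the pull
> request may differ; the class name below and the elided remaining cases are
> placeholders for the sketch only):
> {code}
> import java.sql.Types;
>
> public final class HiveTypesSketch {
>   // Sketch of the proposed mapping; today DATE, TIME and TIMESTAMP
>   // all end up as STRING.
>   public static String toHiveType(int sqlType) {
>     switch (sqlType) {
>     case Types.DATE:
>     case Types.TIMESTAMP:
>       return "TIMESTAMP";   // Hive TIMESTAMP is timezone-less
>     case Types.TIME:
>       return "BIGINT";      // time of day carried as a long value
>     // ... existing mappings for the remaining java.sql.Types ...
>     default:
>       return null;
>     }
>   }
> }
> {code}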
> This is also in line with org.apache.sqoop.manager.ConnManager.toAvroType,
> where these types map to:
> {code}
> case Types.DATE:
> case Types.TIME:
> case Types.TIMESTAMP:
>   return Type.LONG;
> {code}
> Some of the connectors override the toJavaType:
> {code}
> org.apache.sqoop.manager.SQLServerManager
> org.apache.sqoop.manager.oracle.OraOopConnManager
> {code}
> which may indicate different handling.
> The SQLServerManager uses Java String as the output type because of
> timezones. The same holds true for OraOopConnManager, although it has a
> separate boolean configuration property, 'oraoop.timestamp.string', which
> controls whether the import keeps timezones and converts the date types to
> Java String, or drops timezones and behaves the 'sqoop way'.
> Both of these connectors already handle these types as String by default,
> so the proposed change would not affect them.
> Other connectors still need to be checked.
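> For illustration only, a hedged sketch of the kind of toJavaType override
> described above - the class name is made up and this is not the actual
> SQLServerManager or OraOopConnManager code:
> {code}
> import java.sql.Types;
>
> // Hypothetical helper illustrating the behaviour described above: a
> // timezone-aware connector keeps date/time columns as Java String, so the
> // proposed Hive type change does not apply to it.
> public final class TimezoneAwareTypeMapping {
>
>   /** Returns the Java type name such a connector would use. */
>   public static String toJavaType(int sqlType) {
>     switch (sqlType) {
>     case Types.DATE:
>     case Types.TIME:
>     case Types.TIMESTAMP:
>       return "String";   // preserve timezone information textually
>     default:
>       return null;       // fall back to the default Sqoop mapping
>     }
>   }
> }
> {code}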
> Some of the connectors override the toHiveType:
> {code}
> org.apache.sqoop.manager.oracle.OraOopConnManager
> {code}
> This connector uses the 'sqoop way':
> {code}
> String hiveType = super.toHiveType(sqlType);
> {code}
> and only when the type is not resolved there does it decide on its own:
> {code}
> if (hiveType == null) {
>   // http://wiki.apache.org/hadoop/Hive/Tutorial#Primitive_Types
>   if (sqlType == OraOopOracleQueries.getOracleType("BFILE")
>       || sqlType == OraOopOracleQueries.getOracleType("INTERVALYM")
>       || sqlType == OraOopOracleQueries.getOracleType("INTERVALDS")
>       || sqlType == OraOopOracleQueries.getOracleType("NCLOB")
>       || sqlType == OraOopOracleQueries.getOracleType("NCHAR")
>       || sqlType == OraOopOracleQueries.getOracleType("NVARCHAR")
>       || sqlType == OraOopOracleQueries.getOracleType("OTHER")
>       || sqlType == OraOopOracleQueries.getOracleType("ROWID")
>       || sqlType == OraOopOracleQueries.getOracleType("TIMESTAMPTZ")
>       || sqlType == OraOopOracleQueries.getOracleType("TIMESTAMPLTZ")
>       || sqlType == OraOopOracleQueries.getOracleType("STRUCT")) {
>     hiveType = "STRING";
>   }
>   if (sqlType == OraOopOracleQueries.getOracleType("BINARY_FLOAT")) {
>     hiveType = "FLOAT";
>   }
>   if (sqlType == OraOopOracleQueries.getOracleType("BINARY_DOUBLE")) {
>     hiveType = "DOUBLE";
>   }
> }
> {code}
> This code is affected by the proposed change. As the Hive TIMESTAMP is
> timezone-less, we have to change the handling in this method to respect the
> property 'oraoop.timestamp.string': if true, output the STRING Hive type;
> if false, go with the 'sqoop way'.
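> For illustration, a sketch of the adjusted decision (not the actual
> OraOopConnManager patch; the Configuration lookup below stands in for
> whatever helper OraOop already uses to read 'oraoop.timestamp.string', and
> the default value shown is an assumption):
> {code}
> import org.apache.hadoop.conf.Configuration;
>
> public final class OraOopHiveTypeSketch {
>
>   // sqoopWayHiveType is whatever super.toHiveType(sqlType) returned, which
>   // with the proposed change would be TIMESTAMP for DATE and TIMESTAMP.
>   static String respectTimestampStringProperty(String sqoopWayHiveType,
>       Configuration conf) {
>     boolean timestampAsString =
>         conf.getBoolean("oraoop.timestamp.string", true);
>     if (timestampAsString && "TIMESTAMP".equals(sqoopWayHiveType)) {
>       // Values are emitted as Strings carrying timezone information,
>       // so the Hive column has to stay STRING as well.
>       return "STRING";
>     }
>     return sqoopWayHiveType; // otherwise keep the 'sqoop way' type
>   }
> }
> {code}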
> The Hive type is only used when generating the table DDL (CREATE statement),
> and Hive can properly recognize the JDBC-compliant java.sql.Timestamp format
> "YYYY-MM-DD HH:MM:SS.fffffffff", so no connector should be affected in a way
> that Hive could not read the resulting column values.
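> For example, a minimal illustration of that textual form as produced by
> java.sql.Timestamp.toString() (the sample value is made up):
> {code}
> import java.sql.Timestamp;
>
> public class TimestampFormatExample {
>   public static void main(String[] args) {
>     Timestamp ts = Timestamp.valueOf("2017-11-28 16:30:28.123456789");
>     // Prints 2017-11-28 16:30:28.123456789, which Hive's TIMESTAMP
>     // column type can read back from the imported text data.
>     System.out.println(ts);
>   }
> }
> {code}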
> However, thorough testing should be done on all connectors before releasing
> any column type behavior changes.