[ https://issues.apache.org/jira/browse/SPARK-7270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen updated SPARK-7270: ----------------------------- Assignee: Liang-Chi Hsieh > StringType dynamic partition cast to DecimalType in Spark Sql Hive > ------------------------------------------------------------------- > > Key: SPARK-7270 > URL: https://issues.apache.org/jira/browse/SPARK-7270 > Project: Spark > Issue Type: Bug > Components: SQL > Reporter: Feixiang Yan > Assignee: Liang-Chi Hsieh > Fix For: 1.4.0 > > > Create a hive table with two partitons,the first type is bigint and the > second type is string.When insert overwrite the table with one static > partiton and one dynamic partiton, the second StringType dynamic partition > will be cast to DecimalType. > {noformat} > desc test; > OK > a string None > b bigint None > c string None > > # Partition Information > # col_name data_type comment > > b bigint None > c string NoneĀ· > {noformat} > when run following hive sql in HiveContext > {noformat}sqlContext.sql("insert overwrite table test partition (b=1,c) > select 'a','c' from ptest"){noformat} > get the result of partition is > {noformat}test[1,__HIVE_DEFAULT_PARTITION__]{noformat} > spark log > {noformat}15/04/30 10:38:09 WARN HiveConf: DEPRECATED: > hive.metastore.ds.retry.* no longer has any effect. Use > hive.hmshandler.retry.* instead > 15/04/30 10:38:09 INFO ParseDriver: Parsing command: insert overwrite table > test partition (b=1,c) select 'a','c' from ptest > 15/04/30 10:38:09 INFO ParseDriver: Parse Completed > 15/04/30 10:38:09 WARN HiveConf: DEPRECATED: hive.metastore.ds.retry.* no > longer has any effect. Use hive.hmshandler.retry.* instead > 15/04/30 10:38:10 INFO HiveMetaStore: 0: Opening raw store with implemenation > class:org.apache.hadoop.hive.metastore.ObjectStore > 15/04/30 10:38:10 INFO ObjectStore: ObjectStore, initialize called > 15/04/30 10:38:10 INFO Persistence: Property datanucleus.cache.level2 unknown > - will be ignored > 15/04/30 10:38:10 INFO Persistence: Property > hive.metastore.integral.jdo.pushdown unknown - will be ignored > 15/04/30 10:38:10 WARN Connection: BoneCP specified but not present in > CLASSPATH (or one of dependencies) > 15/04/30 10:38:10 WARN Connection: BoneCP specified but not present in > CLASSPATH (or one of dependencies) > 15/04/30 10:38:11 WARN HiveConf: DEPRECATED: hive.metastore.ds.retry.* no > longer has any effect. Use hive.hmshandler.retry.* instead > 15/04/30 10:38:11 INFO ObjectStore: Setting MetaStore object pin classes with > hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order" > 15/04/30 10:38:11 INFO Datastore: The class > "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as > "embedded-only" so does not have its own datastore table. > 15/04/30 10:38:11 INFO Datastore: The class > "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" > so does not have its own datastore table. > 15/04/30 10:38:12 INFO Datastore: The class > "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as > "embedded-only" so does not have its own datastore table. > 15/04/30 10:38:12 INFO Datastore: The class > "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" > so does not have its own datastore table. > 15/04/30 10:38:12 INFO Query: Reading in results for query > "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is > closing > 15/04/30 10:38:12 INFO ObjectStore: Initialized ObjectStore > 15/04/30 10:38:12 INFO HiveMetaStore: Added admin role in metastore > 15/04/30 10:38:12 INFO HiveMetaStore: Added public role in metastore > 15/04/30 10:38:12 INFO HiveMetaStore: No user is added in admin role, since > config is empty > 15/04/30 10:38:12 INFO SessionState: No Tez session required at this point. > hive.execution.engine=mr. > 15/04/30 10:38:13 INFO HiveMetaStore: 0: get_table : db=default tbl=test > 15/04/30 10:38:13 INFO audit: ugi=root ip=unknown-ip-addr > cmd=get_table : db=default tbl=test > 15/04/30 10:38:13 INFO HiveMetaStore: 0: get_partitions : db=default tbl=test > 15/04/30 10:38:13 INFO audit: ugi=root ip=unknown-ip-addr > cmd=get_partitions : db=default tbl=test > 15/04/30 10:38:13 INFO HiveMetaStore: 0: get_table : db=default tbl=ptest > 15/04/30 10:38:13 INFO audit: ugi=root ip=unknown-ip-addr > cmd=get_table : db=default tbl=ptest > 15/04/30 10:38:13 INFO HiveMetaStore: 0: get_partitions : db=default tbl=ptest > 15/04/30 10:38:13 INFO audit: ugi=root ip=unknown-ip-addr > cmd=get_partitions : db=default tbl=ptest > 15/04/30 10:38:13 INFO deprecation: mapred.map.tasks is deprecated. Instead, > use mapreduce.job.maps > 15/04/30 10:38:13 INFO MemoryStore: ensureFreeSpace(451930) called with > curMem=0, maxMem=2291041566 > 15/04/30 10:38:13 INFO MemoryStore: Block broadcast_0 stored as values in > memory (estimated size 441.3 KB, free 2.1 GB) > 15/04/30 10:38:13 INFO MemoryStore: ensureFreeSpace(71321) called with > curMem=451930, maxMem=2291041566 > 15/04/30 10:38:13 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes > in memory (estimated size 69.6 KB, free 2.1 GB) > 15/04/30 10:38:13 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory > on 10.134.72.169:45859 (size: 69.6 KB, free: 2.1 GB) > 15/04/30 10:38:13 INFO BlockManagerMaster: Updated info of block > broadcast_0_piece0 > 15/04/30 10:38:13 INFO SparkContext: Created broadcast 0 from broadcast at > TableReader.scala:68 > 15/04/30 10:38:13 INFO deprecation: mapred.output.compress is deprecated. > Instead, use mapreduce.output.fileoutputformat.compress > 15/04/30 10:38:13 INFO deprecation: mapred.output.compression.codec is > deprecated. Instead, use mapreduce.output.fileoutputformat.compress.codec > 15/04/30 10:38:13 INFO deprecation: mapred.output.compression.type is > deprecated. Instead, use mapreduce.output.fileoutputformat.compress.type > 15/04/30 10:38:14 INFO deprecation: mapred.job.id is deprecated. Instead, use > mapreduce.job.id > 15/04/30 10:38:14 INFO deprecation: mapred.tip.id is deprecated. Instead, use > mapreduce.task.id > 15/04/30 10:38:14 INFO deprecation: mapred.task.id is deprecated. Instead, > use mapreduce.task.attempt.id > 15/04/30 10:38:14 INFO deprecation: mapred.task.is.map is deprecated. > Instead, use mapreduce.task.ismap > 15/04/30 10:38:14 INFO deprecation: mapred.task.partition is deprecated. > Instead, use mapreduce.task.partition > 15/04/30 10:38:14 INFO GPLNativeCodeLoader: Loaded native gpl library > 15/04/30 10:38:14 INFO LzoCodec: Successfully loaded & initialized native-lzo > library [hadoop-lzo rev 7041408c0d57cb3b6f51d004772ccf5073ecc95e] > 15/04/30 10:38:14 INFO FileInputFormat: Total input paths to process : 1 > 15/04/30 10:38:14 INFO SparkContext: Starting job: runJob at > InsertIntoHiveTable.scala:93 > 15/04/30 10:38:14 INFO DAGScheduler: Got job 0 (runJob at > InsertIntoHiveTable.scala:93) with 1 output partitions (allowLocal=false) > 15/04/30 10:38:14 INFO DAGScheduler: Final stage: Stage 0(runJob at > InsertIntoHiveTable.scala:93) > 15/04/30 10:38:14 INFO DAGScheduler: Parents of final stage: List() > 15/04/30 10:38:14 INFO DAGScheduler: Missing parents: List() > 15/04/30 10:38:14 INFO DAGScheduler: Submitting Stage 0 (MapPartitionsRDD[5] > at mapPartitions at basicOperators.scala:43), which has no missing parents > 15/04/30 10:38:14 INFO MemoryStore: ensureFreeSpace(125560) called with > curMem=523251, maxMem=2291041566 > 15/04/30 10:38:14 INFO MemoryStore: Block broadcast_1 stored as values in > memory (estimated size 122.6 KB, free 2.1 GB) > 15/04/30 10:38:14 INFO MemoryStore: ensureFreeSpace(82648) called with > curMem=648811, maxMem=2291041566 > 15/04/30 10:38:14 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes > in memory (estimated size 80.7 KB, free 2.1 GB) > 15/04/30 10:38:14 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory > on 10.134.72.169:45859 (size: 80.7 KB, free: 2.1 GB) > 15/04/30 10:38:14 INFO BlockManagerMaster: Updated info of block > broadcast_1_piece0 > 15/04/30 10:38:14 INFO SparkContext: Created broadcast 1 from broadcast at > DAGScheduler.scala:838 > 15/04/30 10:38:14 INFO DAGScheduler: Submitting 1 missing tasks from Stage 0 > (MapPartitionsRDD[5] at mapPartitions at basicOperators.scala:43) > 15/04/30 10:38:14 INFO YarnClientClusterScheduler: Adding task set 0.0 with 1 > tasks > 15/04/30 10:38:14 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, > rsync.slave006.yarn.hadoop.sjs.sogou-op.org, NODE_LOCAL, 1794 bytes) > 15/04/30 10:38:14 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory > on rsync.slave006.yarn.hadoop.sjs.sogou-op.org:55678 (size: 80.7 KB, free: > 5.3 GB) > 15/04/30 10:38:16 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory > on rsync.slave006.yarn.hadoop.sjs.sogou-op.org:55678 (size: 69.6 KB, free: > 5.3 GB) > 15/04/30 10:38:17 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) > in 3152 ms on rsync.slave006.yarn.hadoop.sjs.sogou-op.org (1/1) > 15/04/30 10:38:17 INFO DAGScheduler: Stage 0 (runJob at > InsertIntoHiveTable.scala:93) finished in 3.162 s > 15/04/30 10:38:17 INFO YarnClientClusterScheduler: Removed TaskSet 0.0, whose > tasks have all completed, from pool > 15/04/30 10:38:17 INFO DAGScheduler: Job 0 finished: runJob at > InsertIntoHiveTable.scala:93, took 3.369777 s > 15/04/30 10:38:17 INFO HiveMetaStore: 0: partition_name_has_valid_characters > 15/04/30 10:38:17 INFO audit: ugi=root ip=unknown-ip-addr > cmd=partition_name_has_valid_characters > 15/04/30 10:38:17 INFO HiveMetaStore: 0: partition_name_has_valid_characters > 15/04/30 10:38:17 INFO audit: ugi=root ip=unknown-ip-addr > cmd=partition_name_has_valid_characters > 15/04/30 10:38:17 WARN UserGroupInformation: No groups available for user root > 15/04/30 10:38:17 WARN UserGroupInformation: No groups available for user root > 15/04/30 10:38:17 WARN HiveConf: DEPRECATED: hive.metastore.ds.retry.* no > longer has any effect. Use hive.hmshandler.retry.* instead > 15/04/30 10:38:17 INFO HiveMetaStore: 0: get_table : db=default tbl=test > 15/04/30 10:38:17 INFO audit: ugi=root ip=unknown-ip-addr > cmd=get_table : db=default tbl=test > 15/04/30 10:38:17 INFO HiveMetaStore: 0: get_partition_with_auth : db=default > tbl=test[1,__HIVE_DEFAULT_PARTITION__] > 15/04/30 10:38:17 INFO audit: ugi=root ip=unknown-ip-addr > cmd=get_partition_with_auth : db=default > tbl=test[1,__HIVE_DEFAULT_PARTITION__] > 15/04/30 10:38:17 INFO Hive: Replacing > src:hdfs://yarncluster/tmp/hive-root/hive_2015-04-30_10-38-13_846_3096248751564356035-1/-ext-10000/c=__HIVE_DEFAULT_PARTITION__;dest: > > hdfs://yarncluster/user/root/hive/warehouse/test/b=1/c=__HIVE_DEFAULT_PARTITION__;Status:true > 15/04/30 10:38:17 INFO HiveMetaStore: 0: get_partition_with_auth : db=default > tbl=test[1,__HIVE_DEFAULT_PARTITION__] > 15/04/30 10:38:17 INFO audit: ugi=root ip=unknown-ip-addr > cmd=get_partition_with_auth : db=default > tbl=test[1,__HIVE_DEFAULT_PARTITION__] > 15/04/30 10:38:17 INFO HiveMetaStore: 0: append_partition : db=default > tbl=test[1,__HIVE_DEFAULT_PARTITION__] > 15/04/30 10:38:17 INFO audit: ugi=root ip=unknown-ip-addr > cmd=append_partition : db=default tbl=test[1,__HIVE_DEFAULT_PARTITION__] > > 15/04/30 10:38:17 WARN log: Updating partition stats fast for: test > 15/04/30 10:38:17 WARN log: Updated size to 10 > 15/04/30 10:38:17 INFO Hive: New loading path = > hdfs://yarncluster/tmp/hive-root/hive_2015-04-30_10-38-13_846_3096248751564356035-1/-ext-10000/c=__HIVE_DEFAULT_PARTITION__ > with partSpec {b=1, c=__HIVE_DEFAULT_PARTITION__} > res0: org.apache.spark.sql.SchemaRDD = > SchemaRDD[0] at RDD at SchemaRDD.scala:108 > == Query Plan == > == Physical Plan == > InsertIntoHiveTable (MetastoreRelation default, test, None), Map(b -> > Some(1), c -> None), true > Project [a AS _c0#0,CAST(CAST(c AS _c1#1, DecimalType()), LongType) AS > _c1#8L] > HiveTableScan [], (MetastoreRelation default, ptest, None), None > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org