[ 
https://issues.apache.org/jira/browse/HIVE-27065?focusedWorklogId=844842&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-844842
 ]

ASF GitHub Bot logged work on HIVE-27065:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 10/Feb/23 17:03
            Start Date: 10/Feb/23 17:03
    Worklog Time Spent: 10m 
      Work Description: asolimando commented on PR #4050:
URL: https://github.com/apache/hive/pull/4050#issuecomment-1426092206

   > > @VenuReddy2103 thanks for your PR, I have a couple of questions: did you 
try with other backend DBs as well? Did you encounter a similar error? I am 
curious because this part of the code does not seem to be DB specific, so I'd 
rather confirm quickly if that's an issue only with MSSQL or if it's more 
general, and that we don't regress on other databases by using `setBytes()` in 
place of `setObject()`.
   > > Have you tried the same code when histogram is actually computed? Did it 
fail? I am asking because it might be an artifact of the uninitialized 
`histogram` field.
   > > Since the `bitvector` field is totally analogous (same data type, `byte 
array`, and written with `setObject()`), apart from the fact that it is always 
computed, I am wondering if that's not the real difference, as it's 
successfully written to MSSQL with the `.setObject()` call.
   > > Can you update here with the actual return value of 
`mPartitionColumnStatistics.getHistogram()` and of 
`mPartitionColumnStatistics.getBitVector()` at the time of the call by putting 
a breakpoint in the `insertIntoPartColStatTable` function? That would help shed 
some light on what's actually going on there.
   > > Thanks!
   > 
   > @asolimando Thanks for reviewing the PR. Please find my replies to your queries below.
   > 
   > 1. I've tried with MySQL, PostgreSQL, Oracle, and Derby as well; this issue 
is not observed with them. It is observed only with MSSQL. When histogram 
statistics is disabled, the `histogram` field in `MPartitionColumnStatistics` is 
initialized by default to an empty byte array, and 
`mPartitionColumnStatistics.getHistogram()` returns null if `histogram.length` 
is 0. With the `preparedStatement.setObject(18, 
mPartitionColumnStatistics.getHistogram())` invocation, since the object is 
null, `SQLServerPreparedStatement.setObject()` inferred the JDBC type as 
`JDBCType.CHAR`, hence this issue. With `preparedStatement.setBytes()`, even 
though null is passed, the driver doesn't try to infer the type again. I have 
verified this change with MSSQL, MySQL, PostgreSQL, Oracle, and Derby.
   >    I found a similar fix here - 
https://github.com/trinodb/trino/pull/4846/files#:~:text=statement.setBytes(index%2C%20null)%3B
   > 2. When the histogram is computed, it didn't fail because the value is not 
null and the driver could infer the type correctly.
   > 3. Yes, this issue is not seen with `bitvector` because it is always computed.
   > 4. Debugger screenshots for `MPartitionColumnStatistics.getBitVector()` 
and `MPartitionColumnStatistics.getHistogram()`:
   > 
   > <img alt="image" width="974" src="https://user-images.githubusercontent.com/35334869/218059642-40495bcf-a547-48cb-bc5d-0a21697e3c07.png">
   > 
   > <img alt="image" width="800" src="https://user-images.githubusercontent.com/35334869/218060043-960bcad7-0578-4b9b-8183-068f9672eaf7.png">
   
   @VenuReddy2103, thanks a lot for checking, it all makes sense to me.
   One last thing: could you verify with the other metastore backend databases 
that we are still OK with the current fix, both for an empty (null) and a 
non-empty histogram array?
   
   If you can confirm that, then we are good to go.
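   The behavior discussed above can be illustrated with a small standalone sketch (the class and method names here are illustrative, not Hive's actual code). The getter mimics the reported behavior of `MPartitionColumnStatistics.getHistogram()`, which returns null for an empty array; the two binding methods show the failing `setObject()` pattern, where the MSSQL driver is left to infer a JDBC type for the null value, versus the `setBytes()` fix, which declares a binary parameter type even when the value is null.

   ```java
   import java.sql.PreparedStatement;
   import java.sql.SQLException;

   public class HistogramBindingSketch {

       // Mimics the getter behavior described in the comment: an empty
       // histogram byte array is reported as null.
       static byte[] getHistogram(byte[] stored) {
           return (stored == null || stored.length == 0) ? null : stored;
       }

       // Problematic binding: for a null value, setObject() leaves JDBC type
       // inference to the driver; SQLServerPreparedStatement reportedly picks
       // JDBCType.CHAR, which SQL Server later rejects with "Implicit
       // conversion from data type varchar to varbinary(max) is not allowed".
       static void bindWithSetObject(PreparedStatement ps, byte[] histogram)
               throws SQLException {
           ps.setObject(18, histogram);
       }

       // Fixed binding: setBytes() always binds the parameter as a binary
       // type, so no inference happens even when histogram is null.
       static void bindWithSetBytes(PreparedStatement ps, byte[] histogram)
               throws SQLException {
           ps.setBytes(18, histogram);
       }
   }
   ```

   The parameter index 18 matches the `setObject(18, ...)` call quoted in the reply; everything else is a hedged mock for illustration only.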




Issue Time Tracking
-------------------

    Worklog Id:     (was: 844842)
    Time Spent: 1h  (was: 50m)

> Exception in partition column statistics update with SQL Server db when 
> histogram statistics is not enabled
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-27065
>                 URL: https://issues.apache.org/jira/browse/HIVE-27065
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Venugopal Reddy K
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> *[Description]* 
> A java.sql.BatchUpdateException is thrown from insertIntoPartColStatTable() 
> with a SQL Server db when histogram statistics is not enabled.
> *java.sql.BatchUpdateException: Implicit conversion from data type varchar to 
> varbinary(max) is not allowed. Use the CONVERT function to run this query.*
>  
> *[Steps to reproduce]* 
> Create stage table, load data into stage table, create partition table and 
> load data into the table from the stage table.
> {code:java}
> 0: jdbc:hive2://localhost:10000> create database mydb;
> 0: jdbc:hive2://localhost:10000> use mydb;
>  
> 0: jdbc:hive2://localhost:10000> create table stage(sr int, st string, name 
> string) row format delimited fields terminated by '\t' stored as textfile;
>  
> 0: jdbc:hive2://localhost:10000> load data local inpath 'partdata' into table 
> stage;
>  
> 0: jdbc:hive2://localhost:10000> create table dynpart(num int, name string) 
> partitioned by (category string) row format delimited fields terminated by 
> '\t' stored as textfile;
>  
> 0: jdbc:hive2://localhost:10000> insert into dynpart select * from stage; 
> {code}
>  
> *[Exception Stack]*
> {code:java}
> 2023-02-10T05:16:42,921 ERROR [HiveServer2-Background-Pool: Thread-112] 
> metastore.DirectSqlUpdateStat: Unable to update Column stats for  dynpart
> java.sql.BatchUpdateException: Implicit conversion from data type varchar to 
> varbinary(max) is not allowed. Use the CONVERT function to run this query.
>     at 
> com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeBatch(SQLServerPreparedStatement.java:2303)
>  ~[mssql-jdbc-6.2.1.jre8.jar:?]
>     at 
> org.apache.hive.com.zaxxer.hikari.pool.ProxyStatement.executeBatch(ProxyStatement.java:127)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hive.com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeBatch(HikariProxyPreparedStatement.java)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdateStat.insertIntoPartColStatTable(DirectSqlUpdateStat.java:281)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdateStat.updatePartitionColumnStatistics(DirectSqlUpdateStat.java:612)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.updatePartitionColumnStatisticsBatch(MetaStoreDirectSql.java:3063)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.metastore.ObjectStore.updatePartitionColumnStatisticsInBatch(ObjectStore.java:9943)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_292]
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_292]
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_292]
>     at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_292]
>     at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at com.sun.proxy.$Proxy29.updatePartitionColumnStatisticsInBatch(Unknown 
> Source) ~[?:?]
>     at 
> org.apache.hadoop.hive.metastore.HMSHandler.updatePartitionColStatsForOneBatch(HMSHandler.java:7068)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.metastore.HMSHandler.updatePartitionColStatsInBatch(HMSHandler.java:7121)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.metastore.HMSHandler.updatePartColumnStatsWithMerge(HMSHandler.java:9247)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.metastore.HMSHandler.set_aggr_stats_for(HMSHandler.java:9149)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_292]
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_292]
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_292]
>     at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_292]
>     at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:146)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at com.sun.proxy.$Proxy31.set_aggr_stats_for(Unknown Source) ~[?:?]
>     at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.setPartitionColumnStatistics(HiveMetaStoreClient.java:3307)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.setPartitionColumnStatistics(SessionHiveMetaStoreClient.java:566)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_292]
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_292]
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_292]
>     at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_292]
>     at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:218)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at com.sun.proxy.$Proxy32.setPartitionColumnStatistics(Unknown Source) 
> ~[?:?]
>     at 
> org.apache.hadoop.hive.ql.metadata.Hive.setPartitionColumnStatistics(Hive.java:5677)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.ql.stats.ColStatsProcessor.persistColumnStats(ColStatsProcessor.java:221)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.ql.stats.ColStatsProcessor.process(ColStatsProcessor.java:94)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at org.apache.hadoop.hive.ql.exec.StatsTask.execute(StatsTask.java:107) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:214) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:354) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:327) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:244) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:105) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:370) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:205) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:154) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:149) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:185) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:236)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hive.service.cli.operation.SQLOperation.access$500(SQLOperation.java:90)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:340)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at java.security.AccessController.doPrivileged(Native Method) 
> ~[?:1.8.0_292]
>     at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_292]
>     at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>  ~[hadoop-common-3.1.0.jar:?]
>     at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:360)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[?:1.8.0_292]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_292]
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  ~[?:1.8.0_292]
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  ~[?:1.8.0_292]
>     at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_292] {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
