[ https://issues.apache.org/jira/browse/HIVE-26350?focusedWorklogId=794261&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794261 ]

ASF GitHub Bot logged work on HIVE-26350:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 22/Jul/22 15:05
            Start Date: 22/Jul/22 15:05
    Worklog Time Spent: 10m 
      Work Description: zabetak opened a new pull request, #3470:
URL: https://github.com/apache/hive/pull/3470

   ### What changes were proposed in this pull request and why?
   1. Introduce a new API, `DatabaseAccessor#getColumnTypes`, to: i) allow fetching column types from the database; ii) align with the code using `DatabaseAccessor#getColumnNames`.
   2. Use the new API to find the type of the partition column in `JdbcInputFormat`, since the information is not propagated correctly to `LIST_COLUMN_TYPES`, which leads to an IOBE.
   3. Refactor `GenericJdbcDatabaseAccessor` to avoid the code duplication that the new API would otherwise introduce.
   4. Add a test reproducing the IOBE problem, and tests for the new API.
   5. Adapt the existing accessor and JDBC input format tests to these changes.
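The idea behind items 1 and 2 can be illustrated with a minimal, hypothetical sketch: when column names and column types come from the same source (here, an in-memory model of the Postgres `country` table), their positions stay aligned, so the partition column's type can safely be found by index. The names `ColumnMetadataSource`, `FixedColumnMetadata`, and `partitionColumnType` are illustrative assumptions, not Hive's actual `DatabaseAccessor` API, and types are modelled as plain strings rather than Hive type objects.

```java
import java.util.Arrays;
import java.util.List;

public class PartitionColumnTypeSketch {

  /** Hypothetical stand-in for an accessor: names and types come from one source, so they stay aligned. */
  interface ColumnMetadataSource {
    List<String> getColumnNames();
    List<String> getColumnTypes();
  }

  /** Hypothetical in-memory source modelling the table country(id int, name varchar(20)). */
  static final class FixedColumnMetadata implements ColumnMetadataSource {
    private final List<String> names;
    private final List<String> types;

    FixedColumnMetadata(List<String> names, List<String> types) {
      // Enforce the alignment invariant up front instead of failing later with an IOBE.
      if (names.size() != types.size()) {
        throw new IllegalArgumentException("names and types must have the same length");
      }
      this.names = names;
      this.types = types;
    }

    @Override public List<String> getColumnNames() { return names; }
    @Override public List<String> getColumnTypes() { return types; }
  }

  /** Finds the partition column's type by its position in the aligned name/type lists. */
  static String partitionColumnType(ColumnMetadataSource source, String partitionColumn) {
    int index = source.getColumnNames().indexOf(partitionColumn);
    if (index < 0) {
      throw new IllegalArgumentException("Unknown partition column: " + partitionColumn);
    }
    return source.getColumnTypes().get(index);
  }

  public static void main(String[] args) {
    ColumnMetadataSource country = new FixedColumnMetadata(
        Arrays.asList("id", "name"), Arrays.asList("int", "varchar(20)"));
    System.out.println(partitionColumnType(country, "name")); // prints varchar(20)
  }
}
```

The design point is that both lists are produced by one component, so a projection applied elsewhere in the plan cannot shrink one list without the other.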
   
   ### Does this PR introduce _any_ user-facing change?
   Solves the IOBE problem described in HIVE-26350.
   
   ### How was this patch tested?
   ```
   mvn test -pl itests/qtest -Pitests -Dtest=TestMiniLlapLocalCliDriver -Dqfile=jdbc_partition_table_pruned_pcolumn.q,external_jdbc_table_partition.q,external_jdbc_table_typeconversion.q
   mvn test -pl jdbc-handler -Dtest=TestJdbcInputFormat,TestGenericJdbcDatabaseAccessor
   ```




Issue Time Tracking
-------------------

    Worklog Id:     (was: 794261)
    Time Spent: 20m  (was: 10m)

> IndexOutOfBoundsException when generating splits for external JDBC table with partition columns
> -----------------------------------------------------------------------------------------------
>
>                 Key: HIVE-26350
>                 URL: https://issues.apache.org/jira/browse/HIVE-26350
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO, JDBC storage handler
>            Reporter: Stamatis Zampetakis
>            Assignee: Soumyakanti Das
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: cbo_plan.txt, explain_plan.txt, jdbc_join_with_partition_table.q
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Create the following table in some JDBC database (e.g., Postgres).
> {code:sql}
> CREATE TABLE country
> (
>     id   int,
>     name varchar(20)
> );
> {code}
> Create the following tables in Hive, ensuring that the external JDBC table has the {{hive.sql.partitionColumn}} table property set.
> {code:sql}
> CREATE TABLE city (id int);
> CREATE EXTERNAL TABLE country
> (
>     id int,
>     name varchar(20)
> )
> STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'
> TBLPROPERTIES (
>     "hive.sql.database.type" = "POSTGRES",
>     "hive.sql.jdbc.driver" = "org.postgresql.Driver",
>     "hive.sql.jdbc.url" = "jdbc:postgresql://localhost:5432/qtestDB",
>     "hive.sql.dbcp.username" = "qtestuser",
>     "hive.sql.dbcp.password" = "qtestpassword",
>     "hive.sql.table" = "country",
>     "hive.sql.partitionColumn" = "name",
>     "hive.sql.numPartitions" = "2"
> );
> {code}
> The query below fails with an IndexOutOfBoundsException when Hive generates the splits for the scan of the JDBC table by exploiting the partitioning column.
> {code:sql}
> select country.id from country cross join city;
> {code}
> The full stack trace is given below.
> {noformat}
> java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
>         at java.util.ArrayList.rangeCheck(ArrayList.java:659) ~[?:1.8.0_261]
>         at java.util.ArrayList.get(ArrayList.java:435) ~[?:1.8.0_261]
>         at org.apache.hive.storage.jdbc.JdbcInputFormat.getSplits(JdbcInputFormat.java:102) [hive-jdbc-handler-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:564) [hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:858) [hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:263) [hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
>         at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:281) [tez-dag-0.10.1.jar:0.10.1]
>         at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:272) [tez-dag-0.10.1.jar:0.10.1]
>         at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_261]
>         at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_261]
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682) [hadoop-common-3.1.0.jar:?]
>         at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:272) [tez-dag-0.10.1.jar:0.10.1]
>         at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:256) [tez-dag-0.10.1.jar:0.10.1]
>         at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108) [guava-19.0.jar:?]
>         at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41) [guava-19.0.jar:?]
>         at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77) [guava-19.0.jar:?]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_261]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_261]
>         at java.lang.Thread.run(Thread.java:748) [?:1.8.0_261]
> {noformat}
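The `Index: 1, Size: 1` message is consistent with a lookup in which the partition column's position is taken from the full column list (`id`, `name`) while the list of column types holds only one entry. The following is a hypothetical, simplified model of that mismatch, not the actual `JdbcInputFormat` code; the method name `lookupType` and the assumption that only the projected column `id` remains in the type list are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitTypeLookupFailure {

  /**
   * Hypothetical model of the failing lookup: the index comes from the full column-name list,
   * but the type list may describe only the projected column(s).
   */
  static String lookupType(List<String> allColumnNames, List<String> columnTypes, String partitionColumn) {
    int index = allColumnNames.indexOf(partitionColumn);
    // No check that index is within columnTypes, so a shorter type list triggers an IOBE.
    return columnTypes.get(index);
  }

  public static void main(String[] args) {
    List<String> allColumnNames = Arrays.asList("id", "name");
    // Assumption: only the projected column "id" survives in the type list.
    List<String> projectedTypes = new ArrayList<>(Arrays.asList("int"));
    try {
      lookupType(allColumnNames, projectedTypes, "name");
    } catch (IndexOutOfBoundsException e) {
      // The message wording depends on the JDK version.
      System.out.println("IOBE: " + e.getMessage());
    }
  }
}
```

On JDK 8, `java.util.ArrayList#get` reports this failure as `Index: 1, Size: 1`, as in the stack trace above; newer JDKs word the message differently.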



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
