[jira] [Created] (HIVE-26188) Query level cache and HMS local cache doesn't work locally and with Explain statements.
Soumyakanti Das created HIVE-26188:

Summary: Query level cache and HMS local cache doesn't work locally and with Explain statements.
Key: HIVE-26188
URL: https://issues.apache.org/jira/browse/HIVE-26188
Project: Hive
Issue Type: Bug
Reporter: Soumyakanti Das
Assignee: Soumyakanti Das

{{ExplainSemanticAnalyzer}} should override the {{startAnalysis()}} method so that the query-level cache is also created for EXPLAIN statements. This matters because, after https://issues.apache.org/jira/browse/HIVE-25918, the HMS local cache only works if the query-level cache has been initialized.

Also, the HMS cache properties in {{data/conf/llap/hive-site.xml}} are incorrect. They should be fixed so that the cache is enabled during qtests.

-- This message was sent by Atlassian Jira (v8.20.7#820007)
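The dependency described above can be illustrated with a minimal, hypothetical Java sketch. The classes here ({{SketchQueryState}}, {{SketchAnalyzer}} and its subclasses, {{SketchCompiler}}) are illustrative stand-ins and are not Hive's actual {{ExplainSemanticAnalyzer}} or {{QueryState}} code. The sketch assumes only what the issue states: {{startAnalysis()}} is the hook that creates the query-level cache, and the HMS local cache is usable only once that cache exists. The call order in {{SketchCompiler}} is also an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-query state; queryCache stays null until something creates it.
final class SketchQueryState {
    Map<String, Object> queryCache;

    void createQueryCache() {
        queryCache = new HashMap<>();
    }
}

// Hypothetical base analyzer: startAnalysis() is a no-op hook unless overridden.
abstract class SketchAnalyzer {
    protected final SketchQueryState state;

    SketchAnalyzer(SketchQueryState state) {
        this.state = state;
    }

    void startAnalysis() {
        // Default: do nothing, so no query-level cache is created.
    }

    // Per the issue, the HMS local cache only works once the query-level cache exists.
    boolean localCacheUsable() {
        return state.queryCache != null;
    }
}

// A regular query analyzer that creates the cache in its hook.
final class SketchSelectAnalyzer extends SketchAnalyzer {
    SketchSelectAnalyzer(SketchQueryState state) { super(state); }

    @Override
    void startAnalysis() { state.createQueryCache(); }
}

// EXPLAIN analyzer before the fix: inherits the no-op hook.
final class SketchExplainAnalyzerBefore extends SketchAnalyzer {
    SketchExplainAnalyzerBefore(SketchQueryState state) { super(state); }
}

// EXPLAIN analyzer after the fix: overrides the hook like the query analyzer does.
final class SketchExplainAnalyzerAfter extends SketchAnalyzer {
    SketchExplainAnalyzerAfter(SketchQueryState state) { super(state); }

    @Override
    void startAnalysis() { state.createQueryCache(); }
}

// Hypothetical compile step: runs the hook, then reports whether the local cache is usable.
final class SketchCompiler {
    static boolean compile(SketchAnalyzer analyzer) {
        analyzer.startAnalysis();
        // ... the analysis proper would happen here ...
        return analyzer.localCacheUsable();
    }
}
```

When the analyzer does not override the hook, the inherited no-op leaves the cache uncreated. This mirrors the EXPLAIN behavior reported in the issue.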
[jira] [Created] (HIVE-26187) Set operations and time travel is not working
Zoltán Borók-Nagy created HIVE-26187:

Summary: Set operations and time travel is not working
Key: HIVE-26187
URL: https://issues.apache.org/jira/browse/HIVE-26187
Project: Hive
Issue Type: Bug
Reporter: Zoltán Borók-Nagy

Set operations don't work correctly with time travel queries.

Repro:
{noformat}
select * from t FOR SYSTEM_VERSION AS OF <snapshot_id_1> MINUS select * from t FOR SYSTEM_VERSION AS OF <snapshot_id_2>;
{noformat}

This returns 0 rows because both SELECTs use the same snapshot ID instead of snapshot_id_1 and snapshot_id_2 respectively. Other queries that reference the same table multiple times with different snapshot IDs are probably affected as well.
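The issue does not identify the root cause. The Java sketch below is a purely hypothetical illustration of how this class of bug can arise. It resolves table references through a per-query cache. If the cache is keyed only by table name, the second reference to {{t}} silently reuses the first reference's snapshot. If the key includes the snapshot ID, each reference keeps its own snapshot. {{TableRef}} and {{SnapshotResolver}} are invented names and are not Hive or Iceberg APIs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical table reference: a table name plus the snapshot id from FOR SYSTEM_VERSION AS OF.
final class TableRef {
    final String table;
    final long snapshotId;

    TableRef(String table, long snapshotId) {
        this.table = table;
        this.snapshotId = snapshotId;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof TableRef)) {
            return false;
        }
        TableRef other = (TableRef) o;
        return table.equals(other.table) && snapshotId == other.snapshotId;
    }

    @Override
    public int hashCode() {
        return Objects.hash(table, snapshotId);
    }
}

// Hypothetical per-query resolver that remembers which snapshot each table reference reads.
final class SnapshotResolver {
    private final Map<String, Long> byTableName = new HashMap<>();
    private final Map<TableRef, Long> byReference = new HashMap<>();

    // Buggy variant: the first reference to a table fixes the snapshot for all later ones.
    long resolveByTableName(TableRef ref) {
        return byTableName.computeIfAbsent(ref.table, name -> ref.snapshotId);
    }

    // Fixed variant: the snapshot id is part of the cache key.
    long resolveByReference(TableRef ref) {
        return byReference.computeIfAbsent(ref, r -> r.snapshotId);
    }
}
```

In the buggy variant, both sides of the MINUS read the same snapshot, so the set difference is empty. That matches the 0-row result in the report.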
[jira] [Created] (HIVE-26186) Resultset returned by getTables does not order data per JDBC specification
N Campbell created HIVE-26186:

Summary: Resultset returned by getTables does not order data per JDBC specification
Key: HIVE-26186
URL: https://issues.apache.org/jira/browse/HIVE-26186
Project: Hive
Issue Type: Bug
Components: JDBC
Affects Versions: 3.1.3
Environment: !HiveMeta.png!
Reporter: N Campbell
Attachments: HiveMeta.png

The JDBC specification states that the rows returned by {{DatabaseMetaData.getTables}} must be ordered by TABLE_TYPE, TABLE_CAT, TABLE_SCHEM and TABLE_NAME. A simple Java program calls {{ResultSet rs = dbMeta.getTables(null, "cert", "%", null);}}, and the returned ResultSet is not ordered as the JDBC specification requires: [https://docs.oracle.com/javase/8/docs/api/java/sql/DatabaseMetaData.html#getTables-java.lang.String-java.lang.String-java.lang.String-java.lang.String:A-]

This happens with various releases, including hive-jdbc-3.1.3000.7.1.7.0-551 and hive-jdbc-3.1.3000.7.1.6.0-297.
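The ordering documented in the {{getTables}} Javadoc (TABLE_TYPE, TABLE_CAT, TABLE_SCHEM, TABLE_NAME) can be expressed as a Java {{Comparator}}. Such a comparator can check a driver's output, and a client could also use it as a stopgap. This is only a sketch. {{TableRow}} and {{GetTablesOrder}} are hypothetical helpers. Sorting nulls first and comparing strings case-sensitively are assumptions, because the Javadoc does not specify either.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// One getTables() row, reduced to the four columns the ordering uses.
final class TableRow {
    final String tableCat;   // TABLE_CAT (may be null)
    final String tableSchem; // TABLE_SCHEM (may be null)
    final String tableName;  // TABLE_NAME
    final String tableType;  // TABLE_TYPE

    TableRow(String tableCat, String tableSchem, String tableName, String tableType) {
        this.tableCat = tableCat;
        this.tableSchem = tableSchem;
        this.tableName = tableName;
        this.tableType = tableType;
    }
}

final class GetTablesOrder {
    // Nulls first and case-sensitive comparison are assumptions of this sketch.
    private static final Comparator<String> NULLS_FIRST =
            Comparator.nullsFirst(Comparator.<String>naturalOrder());

    // Order documented for DatabaseMetaData.getTables:
    // TABLE_TYPE, TABLE_CAT, TABLE_SCHEM, TABLE_NAME.
    static final Comparator<TableRow> SPEC_ORDER =
            Comparator.comparing((TableRow r) -> r.tableType, NULLS_FIRST)
                    .thenComparing(r -> r.tableCat, NULLS_FIRST)
                    .thenComparing(r -> r.tableSchem, NULLS_FIRST)
                    .thenComparing(r -> r.tableName, NULLS_FIRST);

    // True if every adjacent pair of rows is in spec order.
    static boolean isOrdered(List<TableRow> rows) {
        for (int i = 1; i < rows.size(); i++) {
            if (SPEC_ORDER.compare(rows.get(i - 1), rows.get(i)) > 0) {
                return false;
            }
        }
        return true;
    }

    // Returns a sorted copy, leaving the input untouched.
    static List<TableRow> sorted(List<TableRow> rows) {
        List<TableRow> copy = new ArrayList<>(rows);
        copy.sort(SPEC_ORDER);
        return copy;
    }
}
```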
[jira] [Created] (HIVE-26185) Need support for metadataonly operations with iceberg (e.g select distinct on partition column)
Rajesh Balamohan created HIVE-26185:

Summary: Need support for metadataonly operations with iceberg (e.g select distinct on partition column)
Key: HIVE-26185
URL: https://issues.apache.org/jira/browse/HIVE-26185
Project: Hive
Issue Type: Bug
Components: HiveServer2
Reporter: Rajesh Balamohan

{noformat}
select distinct ss_sold_date_sk from store_sales
{noformat}

On a Hive ACID table, this query scans only 1,800+ rows. However, the compilation phase takes a long time (Compile Query: 55.47s in the log below) because of NullScanOptimiser (https://issues.apache.org/jira/browse/HIVE-24262).

{noformat}
Hive ACID
INFO : Executing command(queryId=hive_20220427233926_282bc9d8-220c-4a09-928d-411601c2ef14): select distinct ss_sold_date_sk from store_sales
INFO : Compute 'ndembla-test2' is active.
INFO : Query ID = hive_20220427233926_282bc9d8-220c-4a09-928d-411601c2ef14
INFO : Total jobs = 1
INFO : Launching Job 1 out of 1
INFO : Starting task [Stage-1:MAPRED] in serial mode
INFO : Subscribed to counters: [] for queryId: hive_20220427233926_282bc9d8-220c-4a09-928d-411601c2ef14
INFO : Tez session hasn't been created yet. Opening session
INFO : Dag name: select distinct ss_sold_date_s...store_sales (Stage-1)
INFO : Status: Running (Executing on YARN cluster with App id application_1651102345385_)
INFO : Status: DAG finished successfully in 1.81 seconds
INFO : DAG ID: dag_1651102345385__5
INFO :
INFO : Query Execution Summary
INFO : ----------------------------------------------------------------------------------------------
INFO : OPERATION                            DURATION
INFO : ----------------------------------------------------------------------------------------------
INFO : Compile Query                         55.47s
INFO : Prepare Plan                           2.32s
INFO : Get Query Coordinator (AM)             0.13s
INFO : Submit Plan                            0.03s
INFO : Start DAG                              0.09s
INFO : Run DAG                                1.80s
INFO : ----------------------------------------------------------------------------------------------
INFO :
INFO : Task Execution Summary
INFO : ----------------------------------------------------------------------------------------------
INFO :   VERTICES      DURATION(ms)   CPU_TIME(ms)    GC_TIME(ms)   INPUT_RECORDS   OUTPUT_RECORDS
INFO : ----------------------------------------------------------------------------------------------
INFO :      Map 1           1009.00              0              0           1,824            1,824
INFO :  Reducer 2              0.00              0              0           1,824                0
INFO : ----------------------------------------------------------------------------------------------
INFO :
{noformat}

However, the same query scans *2.8 billion records* on the Iceberg table. This can be fixed.
{noformat}
INFO : Executing command(queryId=hive_20220427233519_cddc6dd1-95a3-4f0e-afa5-e11e9dc5fa72): select distinct ss_sold_date_sk from store_sales
INFO : Compute 'ndembla-test2' is active.
INFO : Query ID = hive_20220427233519_cddc6dd1-95a3-4f0e-afa5-e11e9dc5fa72
INFO : Total jobs = 1
INFO : Launching Job 1 out of 1
INFO : Starting task [Stage-1:MAPRED] in serial mode
INFO : Subscribed to counters: [] for queryId: hive_20220427233519_cddc6dd1-95a3-4f0e-afa5-e11e9dc5fa72
INFO : Tez session hasn't been created yet. Opening session
INFO : Dag name: select distinct ss_sold_date_s...store_sales (Stage-1)
INFO : Status: Running (Executing on YARN cluster with App id application_1651102345385_)
----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 ..              llap     SUCCEEDED   7141       7141        0        0       0       0
Reducer 2 ..          llap     SUCCEEDED      2          2        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 02/02  [==>>] 100%  ELAPSED TIME: 18.48 s
----------------------------------------------------------------------------------------------
INFO : Status: DAG finished successfully in 17.97 seconds
INFO : DAG ID: dag_1651102345385__4
INFO :
INFO : Query Execution Summary
INFO : ----------------------------------------------------------------------------------------------
INFO : OPERATION                            DURATION
INFO : ----------------------------------------------------------------------------------------------
INFO : Compile Query                          1.81s
INFO : Prepare Plan                           0.04s
INFO : Get Query Coordinator
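For an identity-partitioned column, {{select distinct}} can in principle be answered from file-level metadata alone. This is consistent with the ACID plan reading only 1,824 records. The hypothetical Java sketch below contrasts the metadata-only path with a full scan. It assumes that each data file's partition value and record count are known up front, as they are recorded in Iceberg manifests. {{DataFileInfo}} and {{MetadataOnlyDistinct}} are illustrative names and are not Iceberg or Hive classes. The shortcut does not apply to non-identity partition transforms such as bucket or truncate, because there the partition value is not the column value.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical per-file metadata: the identity-partition value of a data file
// and the number of rows the file holds.
final class DataFileInfo {
    final long partitionValue;
    final long recordCount;

    DataFileInfo(long partitionValue, long recordCount) {
        this.partitionValue = partitionValue;
        this.recordCount = recordCount;
    }
}

final class MetadataOnlyDistinct {
    // Metadata-only path: one look at each file's partition value, no rows read.
    static Set<Long> distinctPartitionValues(List<DataFileInfo> files) {
        Set<Long> values = new TreeSet<>();
        for (DataFileInfo file : files) {
            values.add(file.partitionValue);
        }
        return values;
    }

    // Full-scan path: every row of every file has to be read to answer the same query.
    static long rowsReadByFullScan(List<DataFileInfo> files) {
        long total = 0;
        for (DataFileInfo file : files) {
            total += file.recordCount;
        }
        return total;
    }
}
```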
[jira] [Created] (HIVE-26184) COLLECT_SET with GROUP BY is very slow when some keys are highly skewed
okumin created HIVE-26184:

Summary: COLLECT_SET with GROUP BY is very slow when some keys are highly skewed
Key: HIVE-26184
URL: https://issues.apache.org/jira/browse/HIVE-26184
Project: Hive
Issue Type: Bug
Components: Hive
Affects Versions: 3.1.3, 2.3.8
Reporter: okumin
Assignee: okumin

I observed that some reducers spend 98% of their CPU time in `java.util.HashMap#clear`. Looking into the details, I found that COLLECT_SET reuses a LinkedHashSet, and its `clear` can be quite expensive when a relation has a small number of highly skewed keys.

To reproduce the issue, first create rows with a skewed key:
{code:sql}
INSERT INTO test_collect_set SELECT '----' AS key, CAST(UUID() AS VARCHAR) AS value FROM table_with_many_rows LIMIT 10;
{code}
Then create many non-skewed rows:
{code:sql}
INSERT INTO test_collect_set SELECT UUID() AS key, UUID() AS value FROM sample_datasets.nasdaq LIMIT 500;
{code}
The issue shows up when aggregating values by `key`:
{code:sql}
SELECT key, COLLECT_SET(value) FROM group_by_skew GROUP BY key
{code}
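The cost comes from how the JDK implements {{HashMap#clear}}. {{LinkedHashSet}} uses it through its backing {{LinkedHashMap}}. The method nulls out every slot of the backing table, and that table only ever grows. After one huge skewed group, every later {{clear()}} for a tiny group still walks the large table. The Java sketch below shows one possible mitigation: once the set has grown past a threshold, reallocate it instead of clearing it. {{CollectSetBuffer}} and the threshold value are hypothetical and are not Hive's actual fix.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical COLLECT_SET-style buffer that is reused across GROUP BY keys.
final class CollectSetBuffer<T> {
    // Groups larger than this trigger reallocation on reset(); the value is an assumption.
    private final int shrinkThreshold;
    private Set<T> values = new LinkedHashSet<>();
    // Largest size reached since the set was last reset.
    private int highWaterMark = 0;

    CollectSetBuffer(int shrinkThreshold) {
        this.shrinkThreshold = shrinkThreshold;
    }

    void add(T value) {
        values.add(value);
        highWaterMark = Math.max(highWaterMark, values.size());
    }

    Set<T> values() {
        return values;
    }

    // Called between groups. clear() keeps the (possibly huge) backing table,
    // so after a big group we drop the set and start with a default-sized one.
    void reset() {
        if (highWaterMark > shrinkThreshold) {
            values = new LinkedHashSet<>();
        } else {
            values.clear();
        }
        highWaterMark = 0;
    }
}
```

With the threshold, small groups stay on the cheap {{clear()}} path. A new allocation is needed only after an unusually large group.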