[jira] [Created] (HIVE-27117) Fix compaction related flaky tests

2023-03-02 Thread Jira
László Végh created HIVE-27117:
--

 Summary: Fix compaction related flaky tests
 Key: HIVE-27117
 URL: https://issues.apache.org/jira/browse/HIVE-27117
 Project: Hive
  Issue Type: Task
Reporter: László Végh


The following tests turned out to be flaky recently:
 * org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testDropTableAndCompactionConcurrent
 * org.apache.hadoop.hive.ql.txn.compactor.TestCompactionMetrics.testInitiatorFailuresCountedCorrectly
 * org.apache.hadoop.hive.ql.txn.compactor.TestCrudCompactorOnTez.testMajorCompactionNotPartitionedWithoutBuckets



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27118) implement array_intersect UDF in Hive

2023-03-02 Thread Taraka Rama Rao Lethavadla (Jira)
Taraka Rama Rao Lethavadla created HIVE-27118:
-

 Summary: implement array_intersect UDF in Hive
 Key: HIVE-27118
 URL: https://issues.apache.org/jira/browse/HIVE-27118
 Project: Hive
  Issue Type: Sub-task
  Components: Hive
Reporter: Taraka Rama Rao Lethavadla
Assignee: Taraka Rama Rao Lethavadla


*array_intersect(array1, array2)*
Returns an array of the elements in the intersection of {{array1}} and 
{{array2}}, without duplicates.

 
{noformat}
> SELECT array_intersect(array(1, 2, 2, 3), array(1, 1, 3, 5));
[1,3]
{noformat}
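
The intended semantics can be sketched outside Hive as follows. This is only an illustration in Python, not the proposed UDF implementation: the function name simply mirrors the UDF, and keeping elements in the order of their first appearance in array1 is an assumption, since the description above does not specify ordering or NULL handling.

```python
def array_intersect(array1, array2):
    """Return the elements present in both inputs, each emitted once.

    Assumption: results follow the order of first appearance in array1.
    """
    lookup = set(array2)   # membership test against the second array
    seen = set()           # elements already emitted, to drop duplicates
    result = []
    for item in array1:
        if item in lookup and item not in seen:
            seen.add(item)
            result.append(item)
    return result


# Same inputs as the SELECT example in this ticket.
print(array_intersect([1, 2, 2, 3], [1, 1, 3, 5]))  # prints [1, 3]
```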
 





[jira] [Created] (HIVE-27119) Iceberg: Delete from table generates lot of files

2023-03-02 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HIVE-27119:
---

 Summary: Iceberg: Delete from table generates lot of files
 Key: HIVE-27119
 URL: https://issues.apache.org/jira/browse/HIVE-27119
 Project: Hive
  Issue Type: Improvement
  Components: Iceberg integration
Reporter: Rajesh Balamohan


A "delete" generates a lot of files because of the way data is sent to the 
reducers: the number of files per partition depends on the number of reduce 
tasks.

One workaround is to explicitly control the number of reducers; this ticket 
is being created to track a long-term fix.
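
To illustrate why the file count tracks the number of reduce tasks, here is a toy model in Python. It is a sketch under stated assumptions, not Hive's or Iceberg's actual code path: rows are spread over reducers by a hash of (partition, file path, row position), and every reducer writes one file for each partition it receives rows for. The function name, the sample data and the hashing scheme are all hypothetical.

```python
import zlib


def count_delete_files(rows, num_reducers):
    """Count distinct (reducer, partition) pairs in the toy model.

    Assumption: one output file per reducer per partition it touches.
    """
    outputs = set()
    for partition, file_path, row_position in rows:
        key = f"{partition}|{file_path}|{row_position}".encode()
        # Deterministic hash, so repeated runs assign rows identically.
        reducer = zlib.crc32(key) % num_reducers
        outputs.add((reducer, partition))
    return len(outputs)


# Hypothetical data: 4 partitions, 3 data files each, 200 deleted rows per file.
rows = [
    (p, f"file-{p}-{f}.parquet", pos)
    for p in range(4)
    for f in range(3)
    for pos in range(200)
]

for n in (1, 4, 16):
    print(n, "reducers ->", count_delete_files(rows, n), "files")
```

In this model the count is bounded by partitions x reducers, so capping the reducer count caps the number of files, which is the workaround mentioned above.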
 
{noformat}
 explain delete from store_Sales where ss_customer_sk % 10 = 0;
INFO  : Compiling command(queryId=hive_20230303021031_855dd644-8f67-482d-98d7-e9f70b56ae0b): explain delete from store_Sales where ss_customer_sk % 10 = 0
INFO  : No Stats for tpcds_1000_iceberg_mor_v4@store_sales, Columns: ss_sold_time_sk, ss_cdemo_sk, ss_promo_sk, ss_ext_discount_amt, ss_ext_sales_price, ss_net_profit, ss_addr_sk, ss_ticket_number, ss_wholesale_cost, ss_item_sk, ss_ext_list_price, ss_sold_date_sk, ss_store_sk, ss_coupon_amt, ss_quantity, ss_list_price, ss_sales_price, ss_customer_sk, ss_ext_wholesale_cost, ss_net_paid, ss_ext_tax, ss_hdemo_sk, ss_net_paid_inc_tax
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:Explain, type:string, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20230303021031_855dd644-8f67-482d-98d7-e9f70b56ae0b); Time taken: 0.704 seconds
INFO  : Executing command(queryId=hive_20230303021031_855dd644-8f67-482d-98d7-e9f70b56ae0b): explain delete from store_Sales where ss_customer_sk % 10 = 0
INFO  : Starting task [Stage-4:EXPLAIN] in serial mode
INFO  : Completed executing command(queryId=hive_20230303021031_855dd644-8f67-482d-98d7-e9f70b56ae0b); Time taken: 0.005 seconds
INFO  : OK
Explain
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 depends on stages: Stage-1
  Stage-0 depends on stages: Stage-2
  Stage-3 depends on stages: Stage-0

STAGE PLANS:
  Stage: Stage-1
Tez
  DagId: hive_20230303021031_855dd644-8f67-482d-98d7-e9f70b56ae0b:377
  Edges:
Reducer 2 <- Map 1 (SIMPLE_EDGE)
  DagName: hive_20230303021031_855dd644-8f67-482d-98d7-e9f70b56ae0b:377
  Vertices:
Map 1
Map Operator Tree:
TableScan
  alias: store_sales
  filterExpr: ((ss_customer_sk % 10) = 0) (type: boolean)
  Statistics: Num rows: 2755519629 Data size: 3643899155232 
Basic stats: COMPLETE Column stats: NONE
  Filter Operator
predicate: ((ss_customer_sk % 10) = 0) (type: boolean)
Statistics: Num rows: 1377759814 Data size: 1821949576954 
Basic stats: COMPLETE Column stats: NONE
Select Operator
  expressions: PARTITION__SPEC__ID (type: int), 
PARTITION__HASH (type: bigint), FILE__PATH (type: string), ROW__POSITION (type: 
bigint), ss_sold_time_sk (type: int), ss_item_sk (type: int), ss_customer_sk 
(type: int), ss_cdemo_sk (type: int), ss_hdemo_sk (type: int), ss_addr_sk 
(type: int), ss_store_sk (type: int), ss_promo_sk (type: int), ss_ticket_number 
(type: bigint), ss_quantity (type: int), ss_wholesale_cost (type: 
decimal(7,2)), ss_list_price (type: decimal(7,2)), ss_sales_price (type: 
decimal(7,2)), ss_ext_discount_amt (type: decimal(7,2)), ss_ext_sales_price 
(type: decimal(7,2)), ss_ext_wholesale_cost (type: decimal(7,2)), 
ss_ext_list_price (type: decimal(7,2)), ss_ext_tax (type: decimal(7,2)), 
ss_coupon_amt (type: decimal(7,2)), ss_net_paid (type: decimal(7,2)), 
ss_net_paid_inc_tax (type: decimal(7,2)), ss_net_profit (type: decimal(7,2)), 
ss_sold_date_sk (type: int)
  outputColumnNames: _col0, _col1, _col2, _col3, _col4, 
_col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, 
_col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, 
_col25, _col26
  Statistics: Num rows: 1377759814 Data size: 1821949576954 
Basic stats: COMPLETE Column stats: NONE
  Reduce Output Operator
key expressions: _col0 (type: int), _col1 (type: 
bigint), _col2 (type: string), _col3 (type: bigint)
null sort order: 
sort order: 
Statistics: Num rows: 1377759814 Data size: 
1821949576954 Basic stats: COMPLETE Column stats: NONE
value expressions: _col4 (type: int), _col5 (type: 
int), _col6 (type: int), _col7 (type: int), _col8 (type: int), _col9 (type: 
int), _col10 (type: int), _col11 (type: int), _col12 (type: bigint), _col13 
(type: int), _col14 (type: decimal(7,2)), _col15 (type: decimal(7,2)), _col16 
(type: decimal(7,2)), _col17 (type