[jira] [Commented] (HIVE-27951) hcatalog dynamic partitioning fails with partition already exist error when exist parent partitions path

2024-01-04 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-27951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17803385#comment-17803385
 ] 

László Bodor commented on HIVE-27951:
-

this change has broken precommit testing as described here: 
https://github.com/apache/hive/pull/4937#issuecomment-1878050097
reverted

> hcatalog dynamic partitioning fails with partition already exist error when 
> exist parent partitions path
> 
>
> Key: HIVE-27951
> URL: https://issues.apache.org/jira/browse/HIVE-27951
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 4.0.0-beta-1
>Reporter: Yi Zhang
>Assignee: Yi Zhang
>Priority: Critical
>  Labels: pull-request-available
>
> If a table has multiple partition columns and the partition (part1=x1, 
> part2=y1) already exists, inserting into a new partition (part1=x1, part2=y2) 
> makes the HCatalog FileOutputCommitterContainer throw a "path already exists" error.
>  
> reproduce:
> create table source(id int, part1 string, part2 string);
> create table target(id int) partitioned by (part1 string, part2 string);
> insert into table source values (1, "x1", "y1"), (2, "x1", "y2");
>  
> pig -useHCatalog
> A = load 'source' using org.apache.hive.hcatalog.pig.HCatLoader();
> B = filter A by (part2 == 'y1');
> // following succeeds
> store B into 'target' USING org.apache.hive.hcatalog.pig.HCatStorer();
> //following fails with duplicate publishing error
> C = filter A by (part2 == 'y2');
> store C into 'target' USING org.apache.hive.hcatalog.pig.HCatStorer();
>  
>  
> ```
> Partition already present with given partition key values : Data already 
> exists in /user/hive/warehouse/target_data/part1=x1, duplicate publish not 
> possible.
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.commitJob(PigOutputCommitter.java:243)
> at 
> org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:286)
>  
> Caused by: org.apache.hive.hcatalog.common.HCatException : 2002 : Partition 
> already present with given partition key values : Data already exists in 
> /user/hive/warehouse/target_data/part1=x1, duplicate publish not possible.
> at 
> org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.moveTaskOutputs(FileOutputCommitterContainer.java:564)
> at 
> org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.registerPartitions(FileOutputCommitterContainer.java:949)
> at 
> org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.commitJob(FileOutputCommitterContainer.java:273)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.commitJob(PigOutputCommitter.java:241)
> ```



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Reopened] (HIVE-27916) Increase tez.am.resource.memory.mb for TestIcebergCliDrver

2024-01-04 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-27916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor reopened HIVE-27916:
-

> Increase tez.am.resource.memory.mb for TestIcebergCliDrver
> --
>
> Key: HIVE-27916
> URL: https://issues.apache.org/jira/browse/HIVE-27916
> Project: Hive
>  Issue Type: Bug
>  Components: Test
>Affects Versions: 4.0.0-beta-1
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Fix For: Not Applicable
>
>
> This is the same change as HIVE-27695, for the other Tez drivers.





[jira] [Updated] (HIVE-27980) Hive Iceberg Compaction: add support for OPTIMIZE TABLE syntax

2024-01-04 Thread Dmitriy Fingerman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Fingerman updated HIVE-27980:
-
Description: 
Presently Hive Iceberg supports Major compaction using HIVE ACID syntax below.
{code:java}
ALTER TABLE name COMPACT MAJOR [AND WAIT] {code}
Add support for OPTIMIZE TABLE syntax. Example:
{code:java}
OPTIMIZE TABLE name
REWRITE DATA [USING BIN_PACK]
[ ( { FILE_SIZE_THRESHOLD | MIN_INPUT_FILES } =  [, ... ] ) ]
WHERE category = 'c1' {code}
This syntax will be in line with Impala.

Also, OPTIMIZE command is not limited to compaction, but also supports other 
table maintenance operations.

 

  was:
Presently Hive Iceberg supports Major compaction using HIVE ACID syntax below.
{code:java}
ALTER TABLE name COMPACT MAJOR [AND WAIT] {code}
Add support for OPTIMIZE TABLE syntax. Example:
{code:java}
OPTIMIZE TABLE name
REWRITE DATA [USING BIN_PACK]
[ ( { FILE_SIZE_THRESHOLD | MIN_INPUT_FILES } =  [, ... ] ) ]
WHERE category = 'c1' {code}
This syntax will be in line with Impala.

Also, OPTIMIZE command is not limited to compaction.

 


> Hive Iceberg Compaction: add support for OPTIMIZE TABLE syntax
> --
>
> Key: HIVE-27980
> URL: https://issues.apache.org/jira/browse/HIVE-27980
> Project: Hive
>  Issue Type: New Feature
>Reporter: Dmitriy Fingerman
>Priority: Major
>
> Presently Hive Iceberg supports Major compaction using HIVE ACID syntax below.
> {code:java}
> ALTER TABLE name COMPACT MAJOR [AND WAIT] {code}
> Add support for OPTIMIZE TABLE syntax. Example:
> {code:java}
> OPTIMIZE TABLE name
> REWRITE DATA [USING BIN_PACK]
> [ ( { FILE_SIZE_THRESHOLD | MIN_INPUT_FILES } =  [, ... ] ) ]
> WHERE category = 'c1' {code}
> This syntax will be in line with Impala.
> Also, OPTIMIZE command is not limited to compaction, but also supports other 
> table maintenance operations.
>  





[jira] [Commented] (HIVE-27980) Hive Iceberg Compaction: add support for OPTIMIZE TABLE syntax

2024-01-04 Thread Dmitriy Fingerman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17803231#comment-17803231
 ] 

Dmitriy Fingerman commented on HIVE-27980:
--

FYI [~aturoczy], [~dkuzmenko] 

> Hive Iceberg Compaction: add support for OPTIMIZE TABLE syntax
> --
>
> Key: HIVE-27980
> URL: https://issues.apache.org/jira/browse/HIVE-27980
> Project: Hive
>  Issue Type: New Feature
>Reporter: Dmitriy Fingerman
>Priority: Major
>
> Presently Hive Iceberg supports Major compaction using HIVE ACID syntax below.
> {code:java}
> ALTER TABLE name COMPACT MAJOR [AND WAIT] {code}
> Add support for OPTIMIZE TABLE syntax. Example:
> {code:java}
> OPTIMIZE TABLE name
> REWRITE DATA [USING BIN_PACK]
> [ ( { FILE_SIZE_THRESHOLD | MIN_INPUT_FILES } =  [, ... ] ) ]
> WHERE category = 'c1' {code}
> This syntax will be in line with Impala.
> Also, OPTIMIZE command supports more syntax than only compaction.
>  





[jira] [Updated] (HIVE-27980) Hive Iceberg Compaction: add support for OPTIMIZE TABLE syntax

2024-01-04 Thread Dmitriy Fingerman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Fingerman updated HIVE-27980:
-
Description: 
Presently Hive Iceberg supports Major compaction using HIVE ACID syntax below.
{code:java}
ALTER TABLE name COMPACT MAJOR [AND WAIT] {code}
Add support for OPTIMIZE TABLE syntax. Example:
{code:java}
OPTIMIZE TABLE name
REWRITE DATA [USING BIN_PACK]
[ ( { FILE_SIZE_THRESHOLD | MIN_INPUT_FILES } =  [, ... ] ) ]
WHERE category = 'c1' {code}
This syntax will be in line with Impala.

Also, OPTIMIZE command is not limited to compaction.

 

  was:
Presently Hive Iceberg supports Major compaction using HIVE ACID syntax below.
{code:java}
ALTER TABLE name COMPACT MAJOR [AND WAIT] {code}
Add support for OPTIMIZE TABLE syntax. Example:
{code:java}
OPTIMIZE TABLE name
REWRITE DATA [USING BIN_PACK]
[ ( { FILE_SIZE_THRESHOLD | MIN_INPUT_FILES } =  [, ... ] ) ]
WHERE category = 'c1' {code}
This syntax will be in line with Impala.

Also, OPTIMIZE command supports more syntax than only compaction.

 


> Hive Iceberg Compaction: add support for OPTIMIZE TABLE syntax
> --
>
> Key: HIVE-27980
> URL: https://issues.apache.org/jira/browse/HIVE-27980
> Project: Hive
>  Issue Type: New Feature
>Reporter: Dmitriy Fingerman
>Priority: Major
>
> Presently Hive Iceberg supports Major compaction using HIVE ACID syntax below.
> {code:java}
> ALTER TABLE name COMPACT MAJOR [AND WAIT] {code}
> Add support for OPTIMIZE TABLE syntax. Example:
> {code:java}
> OPTIMIZE TABLE name
> REWRITE DATA [USING BIN_PACK]
> [ ( { FILE_SIZE_THRESHOLD | MIN_INPUT_FILES } =  [, ... ] ) ]
> WHERE category = 'c1' {code}
> This syntax will be in line with Impala.
> Also, OPTIMIZE command is not limited to compaction.
>  





[jira] [Created] (HIVE-27980) Hive Iceberg Compaction: add support for OPTIMIZE TABLE syntax

2024-01-04 Thread Dmitriy Fingerman (Jira)
Dmitriy Fingerman created HIVE-27980:


 Summary: Hive Iceberg Compaction: add support for OPTIMIZE TABLE 
syntax
 Key: HIVE-27980
 URL: https://issues.apache.org/jira/browse/HIVE-27980
 Project: Hive
  Issue Type: New Feature
Reporter: Dmitriy Fingerman


Presently Hive Iceberg supports Major compaction using HIVE ACID syntax below.
{code:java}
ALTER TABLE name COMPACT MAJOR [AND WAIT] {code}
Add support for OPTIMIZE TABLE syntax. Example:
{code:java}
OPTIMIZE TABLE name
REWRITE DATA [USING BIN_PACK]
[ ( { FILE_SIZE_THRESHOLD | MIN_INPUT_FILES } =  [, ... ] ) ]
WHERE category = 'c1' {code}
This syntax will be in line with Impala.

Also, OPTIMIZE command supports more syntax than only compaction.

 





[jira] [Work started] (HIVE-24515) Analyze table job can be skipped when stats populated are already accurate

2024-01-04 Thread Dmitriy Fingerman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-24515 started by Dmitriy Fingerman.

> Analyze table job can be skipped when stats populated are already accurate
> --
>
> Key: HIVE-24515
> URL: https://issues.apache.org/jira/browse/HIVE-24515
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> For non-partitioned tables, stats detail should be present at the table level,
> e.g.
> {noformat}
> COLUMN_STATS_ACCURATE={"BASIC_STATS":"true","COLUMN_STATS":{"d_current_day":"true"...
>  }}
>   {noformat}
> For partitioned tables, stats detail should be present at the partition level, e.g.
> {noformat}
> store_sales(ss_sold_date_sk=2451819)
> {totalSize=0, numRows=0, rawDataSize=0, 
> COLUMN_STATS_ACCURATE={"BASIC_STATS":"true","COLUMN_STATS":{"ss_addr_sk":"true"}}
>  
>  {noformat}
> When stats populated are already accurate, {{analyze table tn compute 
> statistics for columns}} should skip launching the job.
>  
> For ACID tables, stats are computed automatically, so computing them again 
> can be skipped when the stats are already accurate.
>  
>  





[jira] [Updated] (HIVE-27979) HMS alter_partitions log adds table name

2024-01-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-27979:
--
Labels: pull-request-available  (was: )

> HMS alter_partitions log adds table name
> 
>
> Key: HIVE-27979
> URL: https://issues.apache.org/jira/browse/HIVE-27979
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: dzcxzl
>Priority: Minor
>  Labels: pull-request-available
>






[jira] [Updated] (HIVE-27978) Tests in hive-unit module are not running again

2024-01-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-27978:
--
Labels: pull-request-available  (was: )

> Tests in hive-unit module are not running again
> ---
>
> Key: HIVE-27978
> URL: https://issues.apache.org/jira/browse/HIVE-27978
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>
> Fixed in HIVE-27846, went bad in an addendum of HIVE-27797:
> https://github.com/apache/hive/commit/5022b85b5f50#diff-2f651f99c3a3a2dd091abda120ae33f028ba3bdfa749cc5c3aa36ebba15379e3R498-R503
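> Currently, the tests only run if I manually remove this dependency:
> {code}
> <dependency>
>   <groupId>org.junit.jupiter</groupId>
>   <artifactId>junit-jupiter</artifactId>
>   <version>${junit.jupiter.version}</version>
>   <scope>test</scope>
> </dependency>
> {code}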





[jira] [Updated] (HIVE-27979) HMS alter_partitions log adds table name

2024-01-04 Thread dzcxzl (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated HIVE-27979:
--
Component/s: Standalone Metastore

> HMS alter_partitions log adds table name
> 
>
> Key: HIVE-27979
> URL: https://issues.apache.org/jira/browse/HIVE-27979
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: dzcxzl
>Priority: Minor
>






[jira] [Created] (HIVE-27979) HMS alter_partitions log adds table name

2024-01-04 Thread dzcxzl (Jira)
dzcxzl created HIVE-27979:
-

 Summary: HMS alter_partitions log adds table name
 Key: HIVE-27979
 URL: https://issues.apache.org/jira/browse/HIVE-27979
 Project: Hive
  Issue Type: Improvement
Reporter: dzcxzl








[jira] [Assigned] (HIVE-27978) Tests in hive-unit module are not running again

2024-01-04 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-27978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor reassigned HIVE-27978:
---

Assignee: László Bodor

> Tests in hive-unit module are not running again
> ---
>
> Key: HIVE-27978
> URL: https://issues.apache.org/jira/browse/HIVE-27978
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>
> Fixed in HIVE-27846, went bad in an addendum of HIVE-27797:
> https://github.com/apache/hive/commit/5022b85b5f50#diff-2f651f99c3a3a2dd091abda120ae33f028ba3bdfa749cc5c3aa36ebba15379e3R498-R503
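> Currently, the tests only run if I manually remove this dependency:
> {code}
> <dependency>
>   <groupId>org.junit.jupiter</groupId>
>   <artifactId>junit-jupiter</artifactId>
>   <version>${junit.jupiter.version}</version>
>   <scope>test</scope>
> </dependency>
> {code}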





[jira] [Work started] (HIVE-27978) Tests in hive-unit module are not running again

2024-01-04 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-27978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-27978 started by László Bodor.
---
> Tests in hive-unit module are not running again
> ---
>
> Key: HIVE-27978
> URL: https://issues.apache.org/jira/browse/HIVE-27978
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>
> Fixed in HIVE-27846, went bad in an addendum of HIVE-27797:
> https://github.com/apache/hive/commit/5022b85b5f50#diff-2f651f99c3a3a2dd091abda120ae33f028ba3bdfa749cc5c3aa36ebba15379e3R498-R503
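> Currently, the tests only run if I manually remove this dependency:
> {code}
> <dependency>
>   <groupId>org.junit.jupiter</groupId>
>   <artifactId>junit-jupiter</artifactId>
>   <version>${junit.jupiter.version}</version>
>   <scope>test</scope>
> </dependency>
> {code}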





[jira] [Updated] (HIVE-27978) Tests in hive-unit module are not running again

2024-01-04 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-27978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-27978:

Description: 
Fixed in HIVE-27846, went bad in an addendum of HIVE-27797:
https://github.com/apache/hive/commit/5022b85b5f50#diff-2f651f99c3a3a2dd091abda120ae33f028ba3bdfa749cc5c3aa36ebba15379e3R498-R503

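Currently, the tests only run if I manually remove this dependency:
{code}
<dependency>
  <groupId>org.junit.jupiter</groupId>
  <artifactId>junit-jupiter</artifactId>
  <version>${junit.jupiter.version}</version>
  <scope>test</scope>
</dependency>
{code}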

> Tests in hive-unit module are not running again
> ---
>
> Key: HIVE-27978
> URL: https://issues.apache.org/jira/browse/HIVE-27978
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Priority: Major
>
> Fixed in HIVE-27846, went bad in an addendum of HIVE-27797:
> https://github.com/apache/hive/commit/5022b85b5f50#diff-2f651f99c3a3a2dd091abda120ae33f028ba3bdfa749cc5c3aa36ebba15379e3R498-R503
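> Currently, the tests only run if I manually remove this dependency:
> {code}
> <dependency>
>   <groupId>org.junit.jupiter</groupId>
>   <artifactId>junit-jupiter</artifactId>
>   <version>${junit.jupiter.version}</version>
>   <scope>test</scope>
> </dependency>
> {code}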





[jira] [Created] (HIVE-27978) Tests in hive-unit module are not running again

2024-01-04 Thread Jira
László Bodor created HIVE-27978:
---

 Summary: Tests in hive-unit module are not running again
 Key: HIVE-27978
 URL: https://issues.apache.org/jira/browse/HIVE-27978
 Project: Hive
  Issue Type: Improvement
Reporter: László Bodor








[jira] [Updated] (HIVE-27977) Fix ordering flakiness in TestHplSqlViaBeeLine

2024-01-04 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-27977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-27977:

Description: 
like:
{code}
Output: '++
|  _c0   |
++
| Hello Smith!   |
| Hello Sachin!  |
++
' should match Hello Sachin!.*Hello Smith!
{code}

I found this flakiness after backporting a related patch (HIVE-24730) to 
downstream repos. I'm not sure why it isn't flaky upstream; however, selecting 
records without an ORDER BY is not deterministic by design, so it's worth 
taking care of this.
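
To illustrate why the assertion above depends on row order, here is a minimal, self-contained Java sketch. It is not the code of TestHplSqlViaBeeLine; the class and method names (OrderInsensitiveMatch, matchesInOrder, containsAll) are hypothetical and only the expected pattern and the sample rows come from the output above. It contrasts an order-dependent regex check with an order-independent one.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class OrderInsensitiveMatch {

    // Order-dependent check: succeeds only if "Hello Sachin!" appears
    // somewhere before "Hello Smith!" in the output (DOTALL lets ".*"
    // span line breaks).
    static boolean matchesInOrder(String output) {
        return Pattern.compile("Hello Sachin!.*Hello Smith!", Pattern.DOTALL)
                .matcher(output)
                .find();
    }

    // Order-independent check: every expected row must appear somewhere
    // in the output, in any order.
    static boolean containsAll(String output, List<String> expectedRows) {
        return expectedRows.stream().allMatch(output::contains);
    }

    public static void main(String[] args) {
        // Rows returned in the "wrong" order, as in the failure above.
        String output = "| Hello Smith!   |\n| Hello Sachin!  |\n";
        List<String> expected = Arrays.asList("Hello Sachin!", "Hello Smith!");

        System.out.println(matchesInOrder(output));        // false
        System.out.println(containsAll(output, expected)); // true
    }
}
```

An alternative that keeps the original pattern is to add an ORDER BY to the query under test, so the rows always arrive in the order the pattern expects.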

  was:
like:
{code}
Output: '++
|  _c0   |
++
| Hello Smith!   |
| Hello Sachin!  |
++
' should match Hello Sachin!.*Hello Smith!
{code}


> Fix ordering flakiness in TestHplSqlViaBeeLine
> --
>
> Key: HIVE-27977
> URL: https://issues.apache.org/jira/browse/HIVE-27977
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>
> like:
> {code}
> Output: '++
> |  _c0   |
> ++
> | Hello Smith!   |
> | Hello Sachin!  |
> ++
> ' should match Hello Sachin!.*Hello Smith!
> {code}
> I found this flakiness after backporting a related patch (HIVE-24730) to 
> downstream repos. I'm not sure why it isn't flaky upstream; however, selecting 
> records without an ORDER BY is not deterministic by design, so it's worth 
> taking care of this.





[jira] [Work started] (HIVE-27977) Fix ordering flakiness in TestHplSqlViaBeeLine

2024-01-04 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-27977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-27977 started by László Bodor.
---
> Fix ordering flakiness in TestHplSqlViaBeeLine
> --
>
> Key: HIVE-27977
> URL: https://issues.apache.org/jira/browse/HIVE-27977
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>
> like:
> {code}
> Output: '++
> |  _c0   |
> ++
> | Hello Smith!   |
> | Hello Sachin!  |
> ++
> ' should match Hello Sachin!.*Hello Smith!
> {code}





[jira] [Commented] (HIVE-27969) Add verbose logging for schematool and metastore service for Docker container

2024-01-04 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17803160#comment-17803160
 ] 

Zhihua Deng commented on HIVE-27969:


Fix has been merged into master. Thank you for the PR [~akshatm]!

> Add verbose logging for schematool and metastore service for Docker container
> -
>
> Key: HIVE-27969
> URL: https://issues.apache.org/jira/browse/HIVE-27969
> Project: Hive
>  Issue Type: Improvement
>Reporter: Akshat Mathur
>Assignee: Akshat Mathur
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Adding capability to print verbose logs for schematool and metastore service 
> inside docker container.
>  
> Note: hiveserver2 doesn't support the verbose option.





[jira] [Updated] (HIVE-27977) Fix ordering flakiness in TestHplSqlViaBeeLine

2024-01-04 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-27977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-27977:

Description: 
like:
{code}
Output: '++
|  _c0   |
++
| Hello Smith!   |
| Hello Sachin!  |
++
' should match Hello Sachin!.*Hello Smith!
{code}

> Fix ordering flakiness in TestHplSqlViaBeeLine
> --
>
> Key: HIVE-27977
> URL: https://issues.apache.org/jira/browse/HIVE-27977
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>
> like:
> {code}
> Output: '++
> |  _c0   |
> ++
> | Hello Smith!   |
> | Hello Sachin!  |
> ++
> ' should match Hello Sachin!.*Hello Smith!
> {code}





[jira] [Assigned] (HIVE-27977) Fix ordering flakiness in TestHplSqlViaBeeLine

2024-01-04 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-27977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor reassigned HIVE-27977:
---

Assignee: László Bodor

> Fix ordering flakiness in TestHplSqlViaBeeLine
> --
>
> Key: HIVE-27977
> URL: https://issues.apache.org/jira/browse/HIVE-27977
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>






[jira] [Created] (HIVE-27977) Fix ordering flakiness in TestHplSqlViaBeeLine

2024-01-04 Thread Jira
László Bodor created HIVE-27977:
---

 Summary: Fix ordering flakiness in TestHplSqlViaBeeLine
 Key: HIVE-27977
 URL: https://issues.apache.org/jira/browse/HIVE-27977
 Project: Hive
  Issue Type: Improvement
Reporter: László Bodor








[jira] [Resolved] (HIVE-27969) Add verbose logging for schematool and metastore service for Docker container

2024-01-04 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng resolved HIVE-27969.

Fix Version/s: 4.0.0
   Resolution: Fixed

> Add verbose logging for schematool and metastore service for Docker container
> -
>
> Key: HIVE-27969
> URL: https://issues.apache.org/jira/browse/HIVE-27969
> Project: Hive
>  Issue Type: Improvement
>Reporter: Akshat Mathur
>Assignee: Akshat Mathur
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Adding capability to print verbose logs for schematool and metastore service 
> inside docker container.
>  
> Note: hiveserver2 doesn't support the verbose option.





[jira] [Commented] (HIVE-26768) HPLSQL UDF is not working if it is applied on a column of type varchar/char/decimal in a table.

2024-01-04 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802897#comment-17802897
 ] 

Ayush Saxena commented on HIVE-26768:
-

[~abstractdog] has fixed it, he will create a ticket to fix it upstream as 
well, sorry for the noise :-) 

> HPLSQL UDF is not working if it is applied on a column of type 
> varchar/char/decimal in a table.
> ---
>
> Key: HIVE-26768
> URL: https://issues.apache.org/jira/browse/HIVE-26768
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Dayakar M
>Assignee: Dayakar M
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-beta-1
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> HPLSQL UDF is not working if it is applied on a column of type 
> varchar/char/decimal in a table.
> For example:
> {code:java}
> CREATE TABLE result (s varchar(20));
> INSERT INTO result VALUES('alice');
> INSERT INTO result VALUES('bob');
> CREATE FUNCTION hello(p string)
>  RETURNS STRING
> BEGIN
>  RETURN 'Hello, ' || p;
> END;
> SELECT hello(s) FROM result; {code}
>  
> --> It should return below
> {code:java}
> ++
> |      _c0       |
> ++
> | Hello, alice  |
> | Hello, bob  |
> ++
> {code}
>  
> But actual result is 
> {code:java}
> ++
> |      _c0       |
> ++
> | Hello,   |
> | Hello,   |
> ++
> {code}
>  





[jira] [Commented] (HIVE-26768) HPLSQL UDF is not working if it is applied on a column of type varchar/char/decimal in a table.

2024-01-04 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802818#comment-17802818
 ] 

Ayush Saxena commented on HIVE-26768:
-

Hi [~Dayakar] we are seeing some failures downstream with the test introduced 
in this PR.
Something like

{noformat}
Output: '++
|  _c0   |
++
| Hello Smith!   |
| Hello Sachin!  |
++
' should match Hello Sachin!.*Hello Smith!
{noformat}

I think in your query you should have an Order by clause, so that the entries 
maintain the order before you assert them, else this test would fail whenever 
the order of returned values changes.

Can you raise an Addendum PR to check & fix the tests?

> HPLSQL UDF is not working if it is applied on a column of type 
> varchar/char/decimal in a table.
> ---
>
> Key: HIVE-26768
> URL: https://issues.apache.org/jira/browse/HIVE-26768
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Dayakar M
>Assignee: Dayakar M
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-beta-1
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> HPLSQL UDF is not working if it is applied on a column of type 
> varchar/char/decimal in a table.
> For example:
> {code:java}
> CREATE TABLE result (s varchar(20));
> INSERT INTO result VALUES('alice');
> INSERT INTO result VALUES('bob');
> CREATE FUNCTION hello(p string)
>  RETURNS STRING
> BEGIN
>  RETURN 'Hello, ' || p;
> END;
> SELECT hello(s) FROM result; {code}
>  
> --> It should return below
> {code:java}
> ++
> |      _c0       |
> ++
> | Hello, alice  |
> | Hello, bob  |
> ++
> {code}
>  
> But actual result is 
> {code:java}
> ++
> |      _c0       |
> ++
> | Hello,   |
> | Hello,   |
> ++
> {code}
>  





[jira] [Updated] (HIVE-27976) Improve logic/query to clean COMPLETED_TXN_COMPONENTS table

2024-01-04 Thread Taraka Rama Rao Lethavadla (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Taraka Rama Rao Lethavadla updated HIVE-27976:
--
Description: 
removeDuplicateCompletedTxnComponents seems to take more time in busy clusters, 
where the number of entries in COMPLETED_TXN_COMPONENTS grows at a higher rate.

Copying discussion from 
[HIVE-27022|https://github.com/apache/hive/pull/4970#issuecomment-1875219288]

What about considering another approach? What if, instead of running the 
cleanup in one large transaction, we run multiple small ones?

For example, with MSSQL we measured this in the past and found that deleting a 
large number of records is much faster in batches of fewer than 5000 rows 
(we usually used 4000).

The reason it was faster is the locking mechanism of the database: for a 
large number of records it takes exclusive locks, so other processes cannot 
use the table, which can cause performance issues.

The cleanup in this case is a little complicated: it can be slow because of 
the time it takes to delete the records, or because of the time it takes to 
collect the records that we want to delete.

For these kinds of scenarios I would recommend two parameters for the 
cleanup:
 * Batch size
 * Number of iterations
And I would still keep a 1 minute interval as the default.

That way, the parameters are easy to fine-tune for customers: if 
there are too many records to delete, just increase the number of iterations. 
If it takes too long to collect what to delete, increase the time 
window and/or the batch size.
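
The two parameters proposed above can be sketched as a simple loop. This is only an illustration under stated assumptions, not the metastore implementation: BatchedCleanup, cleanup and the deleteBatch callback are hypothetical names, and the callback stands in for "delete at most this many rows in its own transaction and report how many were deleted". The database is simulated with an in-memory counter so the sketch runs on its own.

```java
import java.util.function.IntUnaryOperator;

public class BatchedCleanup {

    /**
     * Runs up to maxIterations small deletes instead of one large one.
     * deleteBatch is a hypothetical callback: given a batch size, it deletes
     * at most that many rows in its own transaction and returns the count.
     */
    static long cleanup(IntUnaryOperator deleteBatch, int batchSize, int maxIterations) {
        long total = 0;
        for (int i = 0; i < maxIterations; i++) {
            int deleted = deleteBatch.applyAsInt(batchSize);
            total += deleted;
            if (deleted < batchSize) {
                break; // less than a full batch was left, so we are done
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the table: 10,500 rows eligible for deletion.
        int[] remaining = {10_500};
        IntUnaryOperator fakeDelete = size -> {
            int n = Math.min(size, remaining[0]);
            remaining[0] -= n;
            return n;
        };

        // One cleanup run: batch size 4000, at most 2 iterations.
        long deleted = cleanup(fakeDelete, 4000, 2);
        System.out.println(deleted + " deleted, " + remaining[0] + " left");
        // prints "8000 deleted, 2500 left"
    }
}
```

With this shape, a backlog is handled either by raising the number of iterations per run or by letting the next scheduled run pick up the remaining rows.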

Another thought:

How about modifying the query like this:
{noformat}
DELETE FROM "completed_txn_components" "tc"
WHERE rowid in (
  SELECT :"SYS_B_0"
  FROM "completed_txn_components"
  WHERE "ctc_database" = "tc"."ctc_database"
    AND "ctc_table" = "tc"."ctc_table"
    AND ( "ctc_partition" = "tc"."ctc_partition"
          OR ( "ctc_partition" IS NULL AND "tc"."ctc_partition" IS NULL ) )
    AND ( "tc"."ctc_update_delete" = :"SYS_B_1"
          OR "tc"."ctc_update_delete" = :"SYS_B_2"
          AND "ctc_update_delete" = :"SYS_B_3" )
    AND "tc"."ctc_writeid" < "ctc_writeid")
{noformat}
Or

How about running this cleanup query for the entries related to a 
table/partition as part of the Cleaner itself, so that the overall load on the 
housekeeper is reduced?

  was:
removeDuplicateCompletedTxnComponents seems to take more time in busy clusters, 
where the number of entries in COMPLETED_TXN_COMPONENTS grows at a higher rate.

Copying discussion from 
[HIVE-27022|https://github.com/apache/hive/pull/4970#issuecomment-1875219288]

What about considering another approach? What if, instead of running the 
cleanup in one large transaction, we run multiple small ones?

For example, with MSSQL we measured this in the past and found that deleting a 
large number of records is much faster in batches of fewer than 5000 rows 
(we usually used 4000).

The reason it was faster is the locking mechanism of the database: for a 
large number of records it takes exclusive locks, so other processes cannot 
use the table, which can cause performance issues.

The cleanup in this case is a little complicated: it can be slow because of 
the time it takes to delete the records, or because of the time it takes to 
collect the records that we want to delete.

For these kinds of scenarios I would recommend two parameters for the 
cleanup:
 * Batch size
 * Number of iterations
And I would still keep a 1 minute interval as the default.

That way, the parameters are easy to fine-tune for customers: if 
there are too many records to delete, just increase the number of iterations. 
If it takes too long to collect what to delete, increase the time 
window and/or the batch size.

Another thought:

How about modifying the query like this:
{noformat}
DELETE FROM "completed_txn_components" "tc"
WHERE rowid in (
  SELECT :"SYS_B_0"
  FROM "completed_txn_components"
  WHERE "ctc_database" = "tc"."ctc_database"
    AND "ctc_table" = "tc"."ctc_table"
    AND ( "ctc_partition" = "tc"."ctc_partition"
          OR ( "ctc_partition" IS NULL AND "tc"."ctc_partition" IS NULL ) )
    AND ( "tc"."ctc_update_delete" = :"SYS_B_1"
          OR "tc"."ctc_update_delete" = :"SYS_B_2"
          AND "ctc_update_delete" = :"SYS_B_3" )
    AND "tc"."ctc_writeid" < "ctc_writeid")
{noformat}


> Improve logic/query to clean COMPLETED_TXN_COMPONENTS table
> 

[jira] [Created] (HIVE-27976) Improve logic/query to clean COMPLETED_TXN_COMPONENTS table

2024-01-04 Thread Taraka Rama Rao Lethavadla (Jira)
Taraka Rama Rao Lethavadla created HIVE-27976:
-

 Summary: Improve logic/query to clean COMPLETED_TXN_COMPONENTS 
table
 Key: HIVE-27976
 URL: https://issues.apache.org/jira/browse/HIVE-27976
 Project: Hive
  Issue Type: Improvement
  Components: Hive
Reporter: Taraka Rama Rao Lethavadla


removeDuplicateCompletedTxnComponents seems to take more time in busy clusters, 
where the number of entries in COMPLETED_TXN_COMPONENTS grows at a higher rate.

Copying discussion from 
[HIVE-27022|https://github.com/apache/hive/pull/4970#issuecomment-1875219288]

What about considering another approach? What if, instead of running the 
cleanup in one large transaction, we run multiple small ones?

For example, with MSSQL we measured this in the past and found that deleting a 
large number of records is much faster in batches of fewer than 5000 rows 
(we usually used 4000).

The reason it was faster is the locking mechanism of the database: for a 
large number of records it takes exclusive locks, so other processes cannot 
use the table, which can cause performance issues.

The cleanup in this case is a little complicated: it can be slow because of 
the time it takes to delete the records, or because of the time it takes to 
collect the records that we want to delete.

For these kinds of scenarios I would recommend two parameters for the 
cleanup:
 * Batch size
 * Number of iterations
And I would still keep a 1 minute interval as the default.

That way, the parameters are easy to fine-tune for customers: if 
there are too many records to delete, just increase the number of iterations. 
If it takes too long to collect what to delete, increase the time 
window and/or the batch size.

Another thought:

How about modifying the query like this:
{noformat}
DELETE FROM "completed_txn_components" "tc"
WHERE rowid in (
  SELECT :"SYS_B_0"
  FROM "completed_txn_components"
  WHERE "ctc_database" = "tc"."ctc_database"
    AND "ctc_table" = "tc"."ctc_table"
    AND ( "ctc_partition" = "tc"."ctc_partition"
          OR ( "ctc_partition" IS NULL AND "tc"."ctc_partition" IS NULL ) )
    AND ( "tc"."ctc_update_delete" = :"SYS_B_1"
          OR "tc"."ctc_update_delete" = :"SYS_B_2"
          AND "ctc_update_delete" = :"SYS_B_3" )
    AND "tc"."ctc_writeid" < "ctc_writeid")
{noformat}


