[jira] [Commented] (HIVE-27951) hcatalog dynamic partitioning fails with partition already exist error when exist parent partitions path
[ https://issues.apache.org/jira/browse/HIVE-27951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17803385#comment-17803385 ]

László Bodor commented on HIVE-27951:
-

This change broke precommit testing, as described here:
https://github.com/apache/hive/pull/4937#issuecomment-1878050097
Reverted.

> hcatalog dynamic partitioning fails with partition already exist error when exist parent partitions path
>
>                 Key: HIVE-27951
>                 URL: https://issues.apache.org/jira/browse/HIVE-27951
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog
>    Affects Versions: 4.0.0-beta-1
>            Reporter: Yi Zhang
>            Assignee: Yi Zhang
>            Priority: Critical
>              Labels: pull-request-available
>
> If a table already has a partition (part1=x1, part2=y1), inserting into a
> new partition (part1=x1, part2=y2) makes the HCatalog
> FileOutputCommitterContainer throw a "path already exists" error.
>
> To reproduce:
> create table source(id int, part1 string, part2 string);
> create table target(id int) partitioned by (part1 string, part2 string);
> insert into table source values (1, "x1", "y1"), (2, "x1", "y2");
>
> pig -useHcatalog
> A = load 'source' using org.apache.hive.hcatalog.pig.HCatLoader();
> B = filter A by (part2 == 'y1');
> // the following succeeds
> store B into 'target' USING org.apache.hive.hcatalog.pig.HCatStorer();
> // the following fails with a duplicate publishing error
> C = filter A by (part2 == 'y2');
> store C into 'target' USING org.apache.hive.hcatalog.pig.HCatStorer();
>
> ```
> Partition already present with given partition key values : Data already exists in /user/hive/warehouse/target_data/part1=x1, duplicate publish not possible.
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.commitJob(PigOutputCommitter.java:243)
>     at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:286)
>
> Caused by: org.apache.hive.hcatalog.common.HCatException : 2002 : Partition already present with given partition key values : Data already exists in /user/hive/warehouse/target_data/part1=x1, duplicate publish not possible.
>     at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.moveTaskOutputs(FileOutputCommitterContainer.java:564)
>     at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.registerPartitions(FileOutputCommitterContainer.java:949)
>     at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.commitJob(FileOutputCommitterContainer.java:273)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.commitJob(PigOutputCommitter.java:241)
> ```

--
This message was sent by Atlassian Jira (v8.20.10#820010)
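One plausible reading of the HIVE-27951 report above is that the duplicate-publish check fires on the shared parent directory (part1=x1) rather than on the leaf partition directory. The sketch below only illustrates that difference on a local temporary directory. It is not the HCatalog code; the function names `partition_path`, `is_duplicate_publish` and `is_duplicate_publish_parent_check` are hypothetical, and the real FileOutputCommitterContainer logic may differ.

```python
import os
import tempfile


def partition_path(table_root, spec):
    """Build the directory for an ordered partition spec, e.g. part1=x1/part2=y2."""
    return os.path.join(table_root, *[f"{key}={value}" for key, value in spec])


def is_duplicate_publish(table_root, spec):
    """Leaf check: only an existing leaf partition directory counts as a duplicate."""
    return os.path.isdir(partition_path(table_root, spec))


def is_duplicate_publish_parent_check(table_root, spec):
    """Hypothetical faulty variant: flags a duplicate once the first-level directory exists."""
    return os.path.isdir(partition_path(table_root, spec[:1]))


with tempfile.TemporaryDirectory() as root:
    # Simulate the already published partition (part1=x1, part2=y1).
    existing = [("part1", "x1"), ("part2", "y1")]
    os.makedirs(partition_path(root, existing))

    # The new partition shares the parent directory part1=x1.
    new = [("part1", "x1"), ("part2", "y2")]
    leaf_says_duplicate = is_duplicate_publish(root, new)
    parent_says_duplicate = is_duplicate_publish_parent_check(root, new)
    existing_is_duplicate = is_duplicate_publish(root, existing)
```

With the leaf check, the second store in the reproduction would not be treated as a duplicate, while a genuine re-publish of (part1=x1, part2=y1) still would be.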
[jira] [Reopened] (HIVE-27916) Increase tez.am.resource.memory.mb for TestIcebergCliDrver
[ https://issues.apache.org/jira/browse/HIVE-27916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor reopened HIVE-27916: - > Increase tez.am.resource.memory.mb for TestIcebergCliDrver > -- > > Key: HIVE-27916 > URL: https://issues.apache.org/jira/browse/HIVE-27916 > Project: Hive > Issue Type: Bug > Components: Test >Affects Versions: 4.0.0-beta-1 >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Fix For: Not Applicable > > > this is HIVE-27695 for another tez drivers -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27980) Hive Iceberg Compaction: add support for OPTIMIZE TABLE syntax
[ https://issues.apache.org/jira/browse/HIVE-27980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Fingerman updated HIVE-27980: - Description: Presently Hive Iceberg supports Major compaction using HIVE ACID syntax below. {code:java} ALTER TABLE name COMPACT MAJOR [AND WAIT] {code} Add support for OPTIMIZE TABLE syntax. Example: {code:java} OPTIMIZE TABLE name REWRITE DATA [USING BIN_PACK] [ ( { FILE_SIZE_THRESHOLD | MIN_INPUT_FILES } = [, ... ] ) ] WHERE category = 'c1' {code} This syntax will be inline with Impala. Also, OPTIMIZE command is not limited to compaction, but also supports other table maintenance operations. was: Presently Hive Iceberg supports Major compaction using HIVE ACID syntax below. {code:java} ALTER TABLE name COMPACT MAJOR [AND WAIT] {code} Add support for OPTIMIZE TABLE syntax. Example: {code:java} OPTIMIZE TABLE name REWRITE DATA [USING BIN_PACK] [ ( { FILE_SIZE_THRESHOLD | MIN_INPUT_FILES } = [, ... ] ) ] WHERE category = 'c1' {code} This syntax will be inline with Impala. Also, OPTIMIZE command is not limited to compaction. > Hive Iceberg Compaction: add support for OPTIMIZE TABLE syntax > -- > > Key: HIVE-27980 > URL: https://issues.apache.org/jira/browse/HIVE-27980 > Project: Hive > Issue Type: New Feature >Reporter: Dmitriy Fingerman >Priority: Major > > Presently Hive Iceberg supports Major compaction using HIVE ACID syntax below. > {code:java} > ALTER TABLE name COMPACT MAJOR [AND WAIT] {code} > Add support for OPTIMIZE TABLE syntax. Example: > {code:java} > OPTIMIZE TABLE name > REWRITE DATA [USING BIN_PACK] > [ ( { FILE_SIZE_THRESHOLD | MIN_INPUT_FILES } = [, ... ] ) ] > WHERE category = 'c1' {code} > This syntax will be inline with Impala. > Also, OPTIMIZE command is not limited to compaction, but also supports other > table maintenance operations. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-27980) Hive Iceberg Compaction: add support for OPTIMIZE TABLE syntax
[ https://issues.apache.org/jira/browse/HIVE-27980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17803231#comment-17803231 ] Dmitriy Fingerman commented on HIVE-27980: -- FYI [~aturoczy], [~dkuzmenko] > Hive Iceberg Compaction: add support for OPTIMIZE TABLE syntax > -- > > Key: HIVE-27980 > URL: https://issues.apache.org/jira/browse/HIVE-27980 > Project: Hive > Issue Type: New Feature >Reporter: Dmitriy Fingerman >Priority: Major > > Presently Hive Iceberg supports Major compaction using HIVE ACID syntax below. > {code:java} > ALTER TABLE name COMPACT MAJOR [AND WAIT] {code} > Add support for OPTIMIZE TABLE syntax. Example: > {code:java} > OPTIMIZE TABLE name > REWRITE DATA [USING BIN_PACK] > [ ( { FILE_SIZE_THRESHOLD | MIN_INPUT_FILES } = [, ... ] ) ] > WHERE category = 'c1' {code} > This syntax will be inline with Impala. > Also, OPTIMIZE command supports more syntax than only compaction. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27980) Hive Iceberg Compaction: add support for OPTIMIZE TABLE syntax
[ https://issues.apache.org/jira/browse/HIVE-27980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Fingerman updated HIVE-27980: - Description: Presently Hive Iceberg supports Major compaction using HIVE ACID syntax below. {code:java} ALTER TABLE name COMPACT MAJOR [AND WAIT] {code} Add support for OPTIMIZE TABLE syntax. Example: {code:java} OPTIMIZE TABLE name REWRITE DATA [USING BIN_PACK] [ ( { FILE_SIZE_THRESHOLD | MIN_INPUT_FILES } = [, ... ] ) ] WHERE category = 'c1' {code} This syntax will be inline with Impala. Also, OPTIMIZE command is not limited to compaction. was: Presently Hive Iceberg supports Major compaction using HIVE ACID syntax below. {code:java} ALTER TABLE name COMPACT MAJOR [AND WAIT] {code} Add support for OPTIMIZE TABLE syntax. Example: {code:java} OPTIMIZE TABLE name REWRITE DATA [USING BIN_PACK] [ ( { FILE_SIZE_THRESHOLD | MIN_INPUT_FILES } = [, ... ] ) ] WHERE category = 'c1' {code} This syntax will be inline with Impala. Also, OPTIMIZE command supports more syntax than only compaction. > Hive Iceberg Compaction: add support for OPTIMIZE TABLE syntax > -- > > Key: HIVE-27980 > URL: https://issues.apache.org/jira/browse/HIVE-27980 > Project: Hive > Issue Type: New Feature >Reporter: Dmitriy Fingerman >Priority: Major > > Presently Hive Iceberg supports Major compaction using HIVE ACID syntax below. > {code:java} > ALTER TABLE name COMPACT MAJOR [AND WAIT] {code} > Add support for OPTIMIZE TABLE syntax. Example: > {code:java} > OPTIMIZE TABLE name > REWRITE DATA [USING BIN_PACK] > [ ( { FILE_SIZE_THRESHOLD | MIN_INPUT_FILES } = [, ... ] ) ] > WHERE category = 'c1' {code} > This syntax will be inline with Impala. > Also, OPTIMIZE command is not limited to compaction. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27980) Hive Iceberg Compaction: add support for OPTIMIZE TABLE syntax
Dmitriy Fingerman created HIVE-27980:

             Summary: Hive Iceberg Compaction: add support for OPTIMIZE TABLE syntax
                 Key: HIVE-27980
                 URL: https://issues.apache.org/jira/browse/HIVE-27980
             Project: Hive
          Issue Type: New Feature
            Reporter: Dmitriy Fingerman

Presently, Hive Iceberg supports major compaction using the Hive ACID syntax below.
{code:java}
ALTER TABLE name COMPACT MAJOR [AND WAIT] {code}
Add support for the OPTIMIZE TABLE syntax. Example:
{code:java}
OPTIMIZE TABLE name
REWRITE DATA [USING BIN_PACK]
[ ( { FILE_SIZE_THRESHOLD | MIN_INPUT_FILES } = [, ... ] ) ]
WHERE category = 'c1' {code}
This syntax will be in line with Impala.
Also, the OPTIMIZE command supports more syntax than only compaction.
[jira] [Work started] (HIVE-24515) Analyze table job can be skipped when stats populated are already accurate
[ https://issues.apache.org/jira/browse/HIVE-24515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-24515 started by Dmitriy Fingerman. > Analyze table job can be skipped when stats populated are already accurate > -- > > Key: HIVE-24515 > URL: https://issues.apache.org/jira/browse/HIVE-24515 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: Dmitriy Fingerman >Priority: Major > Labels: pull-request-available > Time Spent: 4h > Remaining Estimate: 0h > > For non-partitioned tables, stats detail should be present in table level, > e.g > {noformat} > COLUMN_STATS_ACCURATE={"BASIC_STATS":"true","COLUMN_STATS":{"d_current_day":"true"... > }} > {noformat} > For partitioned tables, stats detail should be present in partition level, > {noformat} > store_sales(ss_sold_date_sk=2451819) > {totalSize=0, numRows=0, rawDataSize=0, > COLUMN_STATS_ACCURATE={"BASIC_STATS":"true","COLUMN_STATS":{"ss_addr_sk":"true"}} > > {noformat} > When stats populated are already accurate, {{analyze table tn compute > statistics for columns}} should skip launching the job. > > For ACID tables, stats are auto computed and it can skip computing stats > again when stats are accurate. > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
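The skip condition described in HIVE-24515 above could be sketched roughly as follows. This is a minimal illustration in Python, not Hive's implementation: the function name `can_skip_analyze` and the shape of the `params` dictionary are assumptions; only the `COLUMN_STATS_ACCURATE` JSON layout is taken from the issue text.

```python
import json


def can_skip_analyze(params, columns):
    """Return True when basic stats and stats for every requested column are marked accurate.

    `params` stands in for table-level or partition-level parameters (hypothetical shape).
    """
    raw = params.get("COLUMN_STATS_ACCURATE")
    if not raw:
        return False
    try:
        state = json.loads(raw)
    except ValueError:
        # Malformed marker: be conservative and recompute.
        return False
    if not isinstance(state, dict):
        return False
    if str(state.get("BASIC_STATS", "")).lower() != "true":
        return False
    column_stats = state.get("COLUMN_STATS") or {}
    return all(str(column_stats.get(col, "")).lower() == "true" for col in columns)


# Partition-level parameters modeled on the store_sales example in the issue.
partition_params = {
    "totalSize": "0",
    "numRows": "0",
    "rawDataSize": "0",
    "COLUMN_STATS_ACCURATE": '{"BASIC_STATS":"true","COLUMN_STATS":{"ss_addr_sk":"true"}}',
}

skip_known_column = can_skip_analyze(partition_params, ["ss_addr_sk"])
skip_missing_column = can_skip_analyze(partition_params, ["ss_addr_sk", "ss_item_sk"])
skip_without_marker = can_skip_analyze({"numRows": "0"}, ["ss_addr_sk"])
```

For a partitioned table the check would have to hold for every partition involved before the job could be skipped; that loop is left out here.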
[jira] [Updated] (HIVE-27979) HMS alter_partitions log adds table name
[ https://issues.apache.org/jira/browse/HIVE-27979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-27979: -- Labels: pull-request-available (was: ) > HMS alter_partitions log adds table name > > > Key: HIVE-27979 > URL: https://issues.apache.org/jira/browse/HIVE-27979 > Project: Hive > Issue Type: Improvement > Components: Standalone Metastore >Reporter: dzcxzl >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27978) Tests in hive-unit module are not running again
[ https://issues.apache.org/jira/browse/HIVE-27978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-27978:
--
    Labels: pull-request-available  (was: )

> Tests in hive-unit module are not running again
> ---
>
>                 Key: HIVE-27978
>                 URL: https://issues.apache.org/jira/browse/HIVE-27978
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>              Labels: pull-request-available
>
> Fixed in HIVE-27846, went bad in an addendum of HIVE-27797:
> https://github.com/apache/hive/commit/5022b85b5f50#diff-2f651f99c3a3a2dd091abda120ae33f028ba3bdfa749cc5c3aa36ebba15379e3R498-R503
> Currently, the tests only run if I manually remove this dependency:
> {code}
> <dependency>
>   <groupId>org.junit.jupiter</groupId>
>   <artifactId>junit-jupiter</artifactId>
>   <version>${junit.jupiter.version}</version>
>   <scope>test</scope>
> </dependency>
> {code}
[jira] [Updated] (HIVE-27979) HMS alter_partitions log adds table name
[ https://issues.apache.org/jira/browse/HIVE-27979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dzcxzl updated HIVE-27979: -- Component/s: Standalone Metastore > HMS alter_partitions log adds table name > > > Key: HIVE-27979 > URL: https://issues.apache.org/jira/browse/HIVE-27979 > Project: Hive > Issue Type: Improvement > Components: Standalone Metastore >Reporter: dzcxzl >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27979) HMS alter_partitions log adds table name
dzcxzl created HIVE-27979: - Summary: HMS alter_partitions log adds table name Key: HIVE-27979 URL: https://issues.apache.org/jira/browse/HIVE-27979 Project: Hive Issue Type: Improvement Reporter: dzcxzl -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-27978) Tests in hive-unit module are not running again
[ https://issues.apache.org/jira/browse/HIVE-27978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

László Bodor reassigned HIVE-27978:
---
    Assignee: László Bodor

> Tests in hive-unit module are not running again
> ---
>
>                 Key: HIVE-27978
>                 URL: https://issues.apache.org/jira/browse/HIVE-27978
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>
> Fixed in HIVE-27846, went bad in an addendum of HIVE-27797:
> https://github.com/apache/hive/commit/5022b85b5f50#diff-2f651f99c3a3a2dd091abda120ae33f028ba3bdfa749cc5c3aa36ebba15379e3R498-R503
> Currently, the tests only run if I manually remove this dependency:
> {code}
> <dependency>
>   <groupId>org.junit.jupiter</groupId>
>   <artifactId>junit-jupiter</artifactId>
>   <version>${junit.jupiter.version}</version>
>   <scope>test</scope>
> </dependency>
> {code}
[jira] [Work started] (HIVE-27978) Tests in hive-unit module are not running again
[ https://issues.apache.org/jira/browse/HIVE-27978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HIVE-27978 started by László Bodor.
---

> Tests in hive-unit module are not running again
> ---
>
>                 Key: HIVE-27978
>                 URL: https://issues.apache.org/jira/browse/HIVE-27978
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>
> Fixed in HIVE-27846, went bad in an addendum of HIVE-27797:
> https://github.com/apache/hive/commit/5022b85b5f50#diff-2f651f99c3a3a2dd091abda120ae33f028ba3bdfa749cc5c3aa36ebba15379e3R498-R503
> Currently, the tests only run if I manually remove this dependency:
> {code}
> <dependency>
>   <groupId>org.junit.jupiter</groupId>
>   <artifactId>junit-jupiter</artifactId>
>   <version>${junit.jupiter.version}</version>
>   <scope>test</scope>
> </dependency>
> {code}
[jira] [Updated] (HIVE-27978) Tests in hive-unit module are not running again
[ https://issues.apache.org/jira/browse/HIVE-27978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

László Bodor updated HIVE-27978:

    Description:
Fixed in HIVE-27846, went bad in an addendum of HIVE-27797:
https://github.com/apache/hive/commit/5022b85b5f50#diff-2f651f99c3a3a2dd091abda120ae33f028ba3bdfa749cc5c3aa36ebba15379e3R498-R503
Currently, the tests only run if I manually remove this dependency:
{code}
<dependency>
  <groupId>org.junit.jupiter</groupId>
  <artifactId>junit-jupiter</artifactId>
  <version>${junit.jupiter.version}</version>
  <scope>test</scope>
</dependency>
{code}

> Tests in hive-unit module are not running again
> ---
>
>                 Key: HIVE-27978
>                 URL: https://issues.apache.org/jira/browse/HIVE-27978
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: László Bodor
>            Priority: Major
>
> Fixed in HIVE-27846, went bad in an addendum of HIVE-27797:
> https://github.com/apache/hive/commit/5022b85b5f50#diff-2f651f99c3a3a2dd091abda120ae33f028ba3bdfa749cc5c3aa36ebba15379e3R498-R503
> Currently, the tests only run if I manually remove this dependency:
> {code}
> <dependency>
>   <groupId>org.junit.jupiter</groupId>
>   <artifactId>junit-jupiter</artifactId>
>   <version>${junit.jupiter.version}</version>
>   <scope>test</scope>
> </dependency>
> {code}
[jira] [Created] (HIVE-27978) Tests in hive-unit module are not running again
László Bodor created HIVE-27978: --- Summary: Tests in hive-unit module are not running again Key: HIVE-27978 URL: https://issues.apache.org/jira/browse/HIVE-27978 Project: Hive Issue Type: Improvement Reporter: László Bodor -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27977) Fix ordering flakiness in TestHplSqlViaBeeLine
[ https://issues.apache.org/jira/browse/HIVE-27977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated HIVE-27977: Description: like: {code} Output: '++ | _c0 | ++ | Hello Smith! | | Hello Sachin! | ++ ' should match Hello Sachin!.*Hello Smith! {code} I found this flakiness after backporting a related patch to downstream repos (HIVE-24730) not sure why it isn't flaky upstream, however, select records without order is not deterministic by design, so it's worth taking care of this was: like: {code} Output: '++ | _c0 | ++ | Hello Smith! | | Hello Sachin! | ++ ' should match Hello Sachin!.*Hello Smith! {code} > Fix ordering flakiness in TestHplSqlViaBeeLine > -- > > Key: HIVE-27977 > URL: https://issues.apache.org/jira/browse/HIVE-27977 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > > like: > {code} > Output: '++ > | _c0 | > ++ > | Hello Smith! | > | Hello Sachin! | > ++ > ' should match Hello Sachin!.*Hello Smith! > {code} > I found this flakiness after backporting a related patch to downstream repos > (HIVE-24730) > not sure why it isn't flaky upstream, however, select records without order > is not deterministic by design, so it's worth taking care of this -- This message was sent by Atlassian Jira (v8.20.10#820010)
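The ordering problem described in HIVE-27977 above can be reproduced outside Hive. The sketch below is an illustration in Python, not the actual TestHplSqlViaBeeLine code: the helper `extract_rows` and the sample table borders are hypothetical. It shows why a regex that encodes a row order fails when the rows come back in the other order, and how an order-insensitive comparison avoids that. Making the query itself deterministic (for example with an ORDER BY clause) is the other option.

```python
import re


def extract_rows(output):
    """Return the data cells of a Beeline-style table, skipping borders and the header row."""
    cells = []
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("|") and line.endswith("|"):
            cells.append(line.strip("|").strip())
    return cells[1:]  # the first cell line is the column header (_c0)


# Order-sensitive expectation, as quoted in the failing assertion.
ORDER_SENSITIVE = re.compile(r"Hello Sachin!.*Hello Smith!", re.DOTALL)

# Sample output with the rows in the "other" order (borders are made up for the example).
output = "\n".join([
    "+----------------+",
    "|      _c0       |",
    "+----------------+",
    "| Hello Smith!   |",
    "| Hello Sachin!  |",
    "+----------------+",
])

regex_matches = ORDER_SENSITIVE.search(output) is not None
rows_match = sorted(extract_rows(output)) == sorted(["Hello Sachin!", "Hello Smith!"])
```

Here the regex check fails although both expected rows are present, while the sorted comparison passes regardless of the order in which the rows are returned.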
[jira] [Work started] (HIVE-27977) Fix ordering flakiness in TestHplSqlViaBeeLine
[ https://issues.apache.org/jira/browse/HIVE-27977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-27977 started by László Bodor. --- > Fix ordering flakiness in TestHplSqlViaBeeLine > -- > > Key: HIVE-27977 > URL: https://issues.apache.org/jira/browse/HIVE-27977 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > > like: > {code} > Output: '++ > | _c0 | > ++ > | Hello Smith! | > | Hello Sachin! | > ++ > ' should match Hello Sachin!.*Hello Smith! > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-27969) Add verbose logging for schematool and metastore service for Docker container
[ https://issues.apache.org/jira/browse/HIVE-27969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17803160#comment-17803160 ]

Zhihua Deng commented on HIVE-27969:

Fix has been merged into master. Thank you for the PR [~akshatm]!

> Add verbose logging for schematool and metastore service for Docker container
> -
>
>                 Key: HIVE-27969
>                 URL: https://issues.apache.org/jira/browse/HIVE-27969
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Akshat Mathur
>            Assignee: Akshat Mathur
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
> Add the capability to print verbose logs for schematool and the metastore
> service inside the Docker container.
>
> Note: hiveserver2 doesn't support the verbose option.
[jira] [Updated] (HIVE-27977) Fix ordering flakiness in TestHplSqlViaBeeLine
[ https://issues.apache.org/jira/browse/HIVE-27977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor updated HIVE-27977: Description: like: {code} Output: '++ | _c0 | ++ | Hello Smith! | | Hello Sachin! | ++ ' should match Hello Sachin!.*Hello Smith! {code} > Fix ordering flakiness in TestHplSqlViaBeeLine > -- > > Key: HIVE-27977 > URL: https://issues.apache.org/jira/browse/HIVE-27977 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > > like: > {code} > Output: '++ > | _c0 | > ++ > | Hello Smith! | > | Hello Sachin! | > ++ > ' should match Hello Sachin!.*Hello Smith! > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-27977) Fix ordering flakiness in TestHplSqlViaBeeLine
[ https://issues.apache.org/jira/browse/HIVE-27977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor reassigned HIVE-27977: --- Assignee: László Bodor > Fix ordering flakiness in TestHplSqlViaBeeLine > -- > > Key: HIVE-27977 > URL: https://issues.apache.org/jira/browse/HIVE-27977 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27977) Fix ordering flakiness in TestHplSqlViaBeeLine
László Bodor created HIVE-27977: --- Summary: Fix ordering flakiness in TestHplSqlViaBeeLine Key: HIVE-27977 URL: https://issues.apache.org/jira/browse/HIVE-27977 Project: Hive Issue Type: Improvement Reporter: László Bodor -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-27969) Add verbose logging for schematool and metastore service for Docker container
[ https://issues.apache.org/jira/browse/HIVE-27969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihua Deng resolved HIVE-27969.
    Fix Version/s: 4.0.0
       Resolution: Fixed

> Add verbose logging for schematool and metastore service for Docker container
> -
>
>                 Key: HIVE-27969
>                 URL: https://issues.apache.org/jira/browse/HIVE-27969
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Akshat Mathur
>            Assignee: Akshat Mathur
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
> Add the capability to print verbose logs for schematool and the metastore
> service inside the Docker container.
>
> Note: hiveserver2 doesn't support the verbose option.
[jira] [Commented] (HIVE-26768) HPLSQL UDF is not working if it is applied on a column of type varchar/char/decimal in a table.
[ https://issues.apache.org/jira/browse/HIVE-26768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802897#comment-17802897 ] Ayush Saxena commented on HIVE-26768: - [~abstractdog] has fixed it, he will create a ticket to fix it upstream as well, sorry for the noise :-) > HPLSQL UDF is not working if it is applied on a column of type > varchar/char/decimal in a table. > --- > > Key: HIVE-26768 > URL: https://issues.apache.org/jira/browse/HIVE-26768 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Dayakar M >Assignee: Dayakar M >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0-beta-1 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > HPLSQL UDF is not working if it is applied on a column of type > varchar/char/decimal in a table. > For example: > {code:java} > CREATE TABLE result (s varchar(20)); > INSERT INTO result VALUES('alice'); > INSERT INTO result VALUES('bob'); > CREATE FUNCTION hello(p string) > RETURNS STRING > BEGIN > RETURN 'Hello, ' || p; > END; > SELECT hello(s) FROM result; {code} > > --> It should return below > {code:java} > ++ > | _c0 | > ++ > | Hello, alice | > | Hello, bob | > ++ > {code} > > But actual result is > {code:java} > ++ > | _c0 | > ++ > | Hello, | > | Hello, | > ++ > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-26768) HPLSQL UDF is not working if it is applied on a column of type varchar/char/decimal in a table.
[ https://issues.apache.org/jira/browse/HIVE-26768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802818#comment-17802818 ] Ayush Saxena commented on HIVE-26768: - Hi [~Dayakar] we are seeing some failures downstream with the test introduced in this PR. Something like {noformat} Output: '++ | _c0 | ++ | Hello Smith! | | Hello Sachin! | ++ ' should match Hello Sachin!.*Hello Smith! {noformat} I think in your query you should have an Order by clause, so that the entries maintain the order before you assert them, else this test would fail whenever the order of returned values changes. Can you raise an Addendum PR to check & fix the tests? > HPLSQL UDF is not working if it is applied on a column of type > varchar/char/decimal in a table. > --- > > Key: HIVE-26768 > URL: https://issues.apache.org/jira/browse/HIVE-26768 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Dayakar M >Assignee: Dayakar M >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0-beta-1 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > HPLSQL UDF is not working if it is applied on a column of type > varchar/char/decimal in a table. > For example: > {code:java} > CREATE TABLE result (s varchar(20)); > INSERT INTO result VALUES('alice'); > INSERT INTO result VALUES('bob'); > CREATE FUNCTION hello(p string) > RETURNS STRING > BEGIN > RETURN 'Hello, ' || p; > END; > SELECT hello(s) FROM result; {code} > > --> It should return below > {code:java} > ++ > | _c0 | > ++ > | Hello, alice | > | Hello, bob | > ++ > {code} > > But actual result is > {code:java} > ++ > | _c0 | > ++ > | Hello, | > | Hello, | > ++ > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27976) Improve logic/query to clean COMPLETED_TXN_COMPONENTS table
[ https://issues.apache.org/jira/browse/HIVE-27976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Taraka Rama Rao Lethavadla updated HIVE-27976:
--
    Description:
removeDuplicateCompletedTxnComponents seems to take longer in busy clusters, where the number of entries in COMPLETED_TXN_COMPONENTS grows at a higher rate.

Copying the discussion from [HIVE-27022|https://github.com/apache/hive/pull/4970#issuecomment-1875219288]:

What about considering another approach? What if, instead of running the cleanup in one large transaction, we run multiple small ones? For example, with MSSQL we measured this in the past and found that deleting a large number of records is much faster in batches of fewer than 5000 rows (we usually used 4000). The reason was the locking mechanism of the database: for a large number of records it takes exclusive locks, so other processes cannot use the table, which can cause performance issues.

The cleanup in this case is a little more complicated: it can be slow because of the time it takes to delete the records, or because of the time it takes to collect the records that we want to delete. For these kinds of scenarios I would recommend two parameters for the cleanup:
* Batch size
* Number of iterations

I would still keep a 1 minute interval as the default. That way, the parameters are easy to fine-tune for customers: if there are too many records to delete, just increase the number of iterations. If it takes too long to collect what to delete, increase the time window and/or the batch size.

Another thought: how about modifying the query like this?
{noformat}
DELETE FROM "completed_txn_components" "tc"
WHERE rowid in (SELECT :"SYS_B_0"
                FROM "completed_txn_components"
                WHERE "ctc_database" = "tc"."ctc_database"
                  AND "ctc_table" = "tc"."ctc_table"
                  AND ( "ctc_partition" = "tc"."ctc_partition"
                        OR ( "ctc_partition" IS NULL AND "tc"."ctc_partition" IS NULL ) )
                  AND ( "tc"."ctc_update_delete" = :"SYS_B_1"
                        OR "tc"."ctc_update_delete" = :"SYS_B_2"
                           AND "ctc_update_delete" = :"SYS_B_3" )
                  AND "tc"."ctc_writeid" < "ctc_writeid")
{noformat}
Or how about running this cleanup query for the entries related to a table/partition as part of the Cleaner itself, so that the overall load on the housekeeper is reduced?

  was:
removeDuplicateCompletedTxnComponents seems to take longer in busy clusters, where the number of entries in COMPLETED_TXN_COMPONENTS grows at a higher rate.

Copying the discussion from [HIVE-27022|https://github.com/apache/hive/pull/4970#issuecomment-1875219288]:

What about considering another approach? What if, instead of running the cleanup in one large transaction, we run multiple small ones? For example, with MSSQL we measured this in the past and found that deleting a large number of records is much faster in batches of fewer than 5000 rows (we usually used 4000). The reason was the locking mechanism of the database: for a large number of records it takes exclusive locks, so other processes cannot use the table, which can cause performance issues.

The cleanup in this case is a little more complicated: it can be slow because of the time it takes to delete the records, or because of the time it takes to collect the records that we want to delete. For these kinds of scenarios I would recommend two parameters for the cleanup:
* Batch size
* Number of iterations

I would still keep a 1 minute interval as the default. That way, the parameters are easy to fine-tune for customers: if there are too many records to delete, just increase the number of iterations. If it takes too long to collect what to delete, increase the time window and/or the batch size.

Another thought: how about modifying the query like this?
{noformat}
DELETE FROM "completed_txn_components" "tc"
WHERE rowid in (SELECT :"SYS_B_0"
                FROM "completed_txn_components"
                WHERE "ctc_database" = "tc"."ctc_database"
                  AND "ctc_table" = "tc"."ctc_table"
                  AND ( "ctc_partition" = "tc"."ctc_partition"
                        OR ( "ctc_partition" IS NULL AND "tc"."ctc_partition" IS NULL ) )
                  AND ( "tc"."ctc_update_delete" = :"SYS_B_1"
                        OR "tc"."ctc_update_delete" = :"SYS_B_2"
                           AND "ctc_update_delete" = :"SYS_B_3" )
                  AND "tc"."ctc_writeid" < "ctc_writeid")
{noformat}

> Improve logic/query to clean COMPLETED_TXN_COMPONENTS table
>
[jira] [Created] (HIVE-27976) Improve logic/query to clean COMPLETED_TXN_COMPONENTS table
Taraka Rama Rao Lethavadla created HIVE-27976:
-

             Summary: Improve logic/query to clean COMPLETED_TXN_COMPONENTS table
                 Key: HIVE-27976
                 URL: https://issues.apache.org/jira/browse/HIVE-27976
             Project: Hive
          Issue Type: Improvement
          Components: Hive
            Reporter: Taraka Rama Rao Lethavadla

removeDuplicateCompletedTxnComponents seems to take longer in busy clusters, where the number of entries in COMPLETED_TXN_COMPONENTS grows at a higher rate.

Copying the discussion from [HIVE-27022|https://github.com/apache/hive/pull/4970#issuecomment-1875219288]:

What about considering another approach? What if, instead of running the cleanup in one large transaction, we run multiple small ones? For example, with MSSQL we measured this in the past and found that deleting a large number of records is much faster in batches of fewer than 5000 rows (we usually used 4000). The reason was the locking mechanism of the database: for a large number of records it takes exclusive locks, so other processes cannot use the table, which can cause performance issues.

The cleanup in this case is a little more complicated: it can be slow because of the time it takes to delete the records, or because of the time it takes to collect the records that we want to delete. For these kinds of scenarios I would recommend two parameters for the cleanup:
* Batch size
* Number of iterations

I would still keep a 1 minute interval as the default. That way, the parameters are easy to fine-tune for customers: if there are too many records to delete, just increase the number of iterations. If it takes too long to collect what to delete, increase the time window and/or the batch size.

Another thought: how about modifying the query like this?
{noformat}
DELETE FROM "completed_txn_components" "tc"
WHERE rowid in (SELECT :"SYS_B_0"
                FROM "completed_txn_components"
                WHERE "ctc_database" = "tc"."ctc_database"
                  AND "ctc_table" = "tc"."ctc_table"
                  AND ( "ctc_partition" = "tc"."ctc_partition"
                        OR ( "ctc_partition" IS NULL AND "tc"."ctc_partition" IS NULL ) )
                  AND ( "tc"."ctc_update_delete" = :"SYS_B_1"
                        OR "tc"."ctc_update_delete" = :"SYS_B_2"
                           AND "ctc_update_delete" = :"SYS_B_3" )
                  AND "tc"."ctc_writeid" < "ctc_writeid")
{noformat}
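The batching idea in HIVE-27976 above (many small transactions bounded by a batch size and a number of iterations) could look roughly like the sketch below. It uses Python with an in-memory SQLite database purely for illustration. The table `demo_components`, its `obsolete` flag and the function `delete_in_batches` are hypothetical stand-ins, not the metastore schema or the Hive housekeeping code, and real backends need their own row-limiting syntax.

```python
import sqlite3


def delete_in_batches(conn, batch_size, max_iterations):
    """Delete flagged rows in small transactions; return the total number of rows removed."""
    total = 0
    for _ in range(max_iterations):
        with conn:  # each batch commits on its own, so locks are held only briefly
            cursor = conn.execute(
                "DELETE FROM demo_components WHERE rowid IN ("
                " SELECT rowid FROM demo_components WHERE obsolete = 1 LIMIT ?)",
                (batch_size,),
            )
        if cursor.rowcount == 0:
            break  # nothing left to clean up
        total += cursor.rowcount
    return total


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_components (id INTEGER PRIMARY KEY, obsolete INTEGER)")
# Ten rows, of which the five with an even id are marked as obsolete.
conn.executemany(
    "INSERT INTO demo_components (id, obsolete) VALUES (?, ?)",
    [(i, 1 if i % 2 == 0 else 0) for i in range(10)],
)
conn.commit()

# First run: at most 2 iterations of 2 rows each, so one obsolete row is left behind.
removed_first = delete_in_batches(conn, batch_size=2, max_iterations=2)
remaining_obsolete = conn.execute(
    "SELECT COUNT(*) FROM demo_components WHERE obsolete = 1"
).fetchone()[0]
# Next run picks up the leftover row and then stops early.
removed_second = delete_in_batches(conn, batch_size=2, max_iterations=5)
```

Capping the iterations means a single run may leave work for the next scheduled run, which is the trade-off the two proposed parameters are meant to expose.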