[jira] [Work started] (HIVE-25403) from_unixtime() does not consider leap seconds
[ https://issues.apache.org/jira/browse/HIVE-25403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-25403 started by Sruthi Mooriyathvariam.

> from_unixtime() does not consider leap seconds
> ---
>
> Key: HIVE-25403
> URL: https://issues.apache.org/jira/browse/HIVE-25403
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: Sruthi Mooriyathvariam
> Assignee: Sruthi Mooriyathvariam
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.1.2
>
> Attachments: image-2021-07-29-14-42-49-806.png
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> unix_timestamp() considers leap seconds while from_unixtime() does not,
> which results in wrong results, as below:
> !image-2021-07-29-14-42-49-806.png!

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25403) from_unixtime() does not consider leap seconds
[ https://issues.apache.org/jira/browse/HIVE-25403?focusedWorklogId=631476&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-631476 ] ASF GitHub Bot logged work on HIVE-25403: Author: ASF GitHub Bot Created on: 30/Jul/21 05:49 Worklog Time Spent: 10m

Work Description: ashish-kumar-sharma commented on pull request #2550: URL: https://github.com/apache/hive/pull/2550#issuecomment-889645738

@warriersruthi Please refer - https://github.com/apache/hive/blob/10c8278e18942819f2a16c546d5ee1170937e64b/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFDateDiff.java and refactor the class. As the code is redundant and not issue to review

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking: Worklog Id: (was: 631476) Time Spent: 40m (was: 0.5h)
[jira] [Work logged] (HIVE-25403) from_unixtime() does not consider leap seconds
[ https://issues.apache.org/jira/browse/HIVE-25403?focusedWorklogId=631473&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-631473 ] ASF GitHub Bot logged work on HIVE-25403: Author: ASF GitHub Bot Created on: 30/Jul/21 05:44 Worklog Time Spent: 10m

Work Description: ashish-kumar-sharma commented on pull request #2550: URL: https://github.com/apache/hive/pull/2550#issuecomment-889644002

@warriersruthi divide the PR description as follows:
* What changes were proposed in this pull request?
* Why are the changes needed?
* Does this PR introduce any user-facing change?
* How was this patch tested?

Issue Time Tracking: Worklog Id: (was: 631473) Time Spent: 0.5h (was: 20m)
[jira] [Work logged] (HIVE-25403) from_unixtime() does not consider leap seconds
[ https://issues.apache.org/jira/browse/HIVE-25403?focusedWorklogId=631470&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-631470 ] ASF GitHub Bot logged work on HIVE-25403: Author: ASF GitHub Bot Created on: 30/Jul/21 05:41 Worklog Time Spent: 10m

Work Description: adesh-rao commented on a change in pull request #2550: URL: https://github.com/apache/hive/pull/2550#discussion_r679662298

## File path: ql/src/test/queries/clientpositive/udf5.q ##
@@ -13,6 +13,8 @@
 SELECT from_unixtime(unix_timestamp('2010-01-13 11:57:40', 'yyyy-MM-dd HH:mm:ss'), 'MM/dd/yy HH:mm:ss'), from_unixtime(unix_timestamp('2010-01-13 11:57:40')) from dest1_n14;
+SELECT from_unixtime(unix_timestamp(cast('2010-01-13' as date)));

Review comment: Run the same query in a different timezone too.

## File path: ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFFromUnixTime.java ##
@@ -60,7 +62,7 @@
 private transient SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");

Review comment: Remove this variable if it is not being used anywhere?

## File path: ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFFromUnixTime.java ##
@@ -60,7 +62,7 @@
 private transient SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
 private transient String lastFormat = null;
+ private transient DateTimeFormatter FORMATTER = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

Review comment: Update this variable in the configure method too.

Issue Time Tracking: Worklog Id: (was: 631470) Time Spent: 20m (was: 10m)
[jira] [Commented] (HIVE-25407) [Hive] Investigate why advancing the Write ID not working for some DDLs and fix it, if appropriate
[ https://issues.apache.org/jira/browse/HIVE-25407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390292#comment-17390292 ] Peter Vary commented on HIVE-25407:

We probably don't want to advance the writeId, especially for compactions, where there are no changes in the data. Don't forget that materialised view handling depends on the writeId to decide whether the view data should be used or refreshed. If we add unnecessary writeIds, we start to cause performance issues for materialised views above the table. CC: [~klcopp]

> [Hive] Investigate why advancing the Write ID not working for some DDLs and
> fix it, if appropriate
> ---
>
> Key: HIVE-25407
> URL: https://issues.apache.org/jira/browse/HIVE-25407
> Project: Hive
> Issue Type: Sub-task
> Reporter: Kishen Das
> Priority: Major
>
> The DDLs below should be investigated separately to find out why advancing
> the write ID is not working for transactional tables, even after adding the
> logic to advance the write ID:
> * ALTER TABLE SET PARTITION SPEC
> * ALTER TABLE UNSET SERDEPROPERTIES
> * ALTER TABLE COMPACT
> * ALTER TABLE SKEWED BY
> * ALTER TABLE SET SKEWED LOCATION
[jira] [Updated] (HIVE-25403) from_unixtime() does not consider leap seconds
[ https://issues.apache.org/jira/browse/HIVE-25403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25403: Labels: pull-request-available (was: )
[jira] [Work logged] (HIVE-25403) from_unixtime() does not consider leap seconds
[ https://issues.apache.org/jira/browse/HIVE-25403?focusedWorklogId=631462&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-631462 ] ASF GitHub Bot logged work on HIVE-25403: Author: ASF GitHub Bot Created on: 30/Jul/21 05:26 Worklog Time Spent: 10m

Work Description: warriersruthi opened a new pull request #2550: URL: https://github.com/apache/hive/pull/2550

The query SELECT from_unixtime(unix_timestamp(cast('1400-01-01' as date))); was giving wrong results, because the from_unixtime() function was not considering leap seconds when representing the timestamp (it was using java.util.Date). unix_timestamp() already accounted for this, so to get correct results for the above query, from_unixtime() had to convert the epoch time through ZonedDateTime. The required conversion was done with the help of the Instant class, which represents a moment for a given epoch time.

Issue Time Tracking: Worklog Id: (was: 631462) Remaining Estimate: 0h Time Spent: 10m
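The conversion described in the PR can be sketched as follows. This is a hypothetical standalone helper, not the actual GenericUDFFromUnixTime code: it shows epoch seconds being formatted through Instant and ZonedDateTime rather than java.util.Date, so both directions of the conversion use the same java.time machinery.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

class FromUnixTimeSketch {
    // Hypothetical helper (not the UDF's real signature): format epoch seconds
    // by going through Instant -> ZonedDateTime instead of java.util.Date.
    static String fromUnixTime(long epochSeconds, String pattern, ZoneId zone) {
        ZonedDateTime zdt = Instant.ofEpochSecond(epochSeconds).atZone(zone);
        return DateTimeFormatter.ofPattern(pattern).format(zdt);
    }

    public static void main(String[] args) {
        // Epoch second 0 in UTC is the Unix epoch instant.
        System.out.println(fromUnixTime(0L, "yyyy-MM-dd HH:mm:ss", ZoneId.of("UTC")));
        // prints 1970-01-01 00:00:00
    }
}
```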
[jira] [Assigned] (HIVE-25408) AlterTableSetOwnerAnalyzer should send Hive Privilege Objects for Authorization.
[ https://issues.apache.org/jira/browse/HIVE-25408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sai Hemanth Gantasala reassigned HIVE-25408:

> AlterTableSetOwnerAnalyzer should send Hive Privilege Objects for
> Authorization.
> ---
>
> Key: HIVE-25408
> URL: https://issues.apache.org/jira/browse/HIVE-25408
> Project: Hive
> Issue Type: Bug
> Affects Versions: 4.0.0
> Reporter: Sai Hemanth Gantasala
> Assignee: Sai Hemanth Gantasala
> Priority: Major
>
> Currently, Hive sends an empty list in the Hive Privilege Objects for
> authorization when a user runs the following operation: alter table foo set
> owner user user_name;
> We should instead send the objects related to the table in the Hive
> Privilege Objects for authorization.
[jira] [Resolved] (HIVE-25400) Move the offset updating in BytesColumnVector to setValPreallocated.
[ https://issues.apache.org/jira/browse/HIVE-25400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley resolved HIVE-25400. Hadoop Flags: Reviewed Resolution: Fixed

Thanks for the review, Panos!

> Move the offset updating in BytesColumnVector to setValPreallocated.
> ---
>
> Key: HIVE-25400
> URL: https://issues.apache.org/jira/browse/HIVE-25400
> Project: Hive
> Issue Type: Bug
> Reporter: Owen O'Malley
> Assignee: Owen O'Malley
> Priority: Major
> Labels: pull-request-available
> Fix For: storage-2.7.3, storage-2.8.1, storage-2.9.0
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> HIVE-25190 changed the semantics of BytesColumnVector so that
> ensureValPreallocated reserved the room, which interacted badly with ORC's
> redact mask code. The redact mask code needs to be able to increase the
> allocation as it goes, so it can call ensureValPreallocated multiple times.
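The contract described in the issue can be illustrated with a minimal, self-contained sketch. This is not the real BytesColumnVector (whose fields and methods differ); it only shows the shape of the fix: a capacity call that can be repeated safely, and a commit call that is the single place where the shared offset advances.

```java
import java.util.Arrays;

// Simplified stand-in for BytesColumnVector: ensureValPreallocated only
// guarantees capacity (and may be called repeatedly to grow a reservation),
// while setValPreallocated commits the value and advances the offset.
class BytesColumnSketch {
    byte[] buffer = new byte[16];
    int nextOffset = 0;              // advanced only on commit
    int[] start = new int[8];
    int[] length = new int[8];

    byte[] ensureValPreallocated(int size) {
        // Grow until there is room for `size` bytes after the current offset.
        while (buffer.length - nextOffset < size) {
            buffer = Arrays.copyOf(buffer, buffer.length * 2);
        }
        return buffer;               // capacity reserved, offset untouched
    }

    void setValPreallocated(int row, int size) {
        start[row] = nextOffset;
        length[row] = size;
        nextOffset += size;          // the offset moves only here
    }
}
```

With this split, a caller like the redact mask can call ensureValPreallocated several times with growing sizes before committing once, without corrupting the offsets of previously written rows.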
[jira] [Commented] (HIVE-25400) Move the offset updating in BytesColumnVector to setValPreallocated.
[ https://issues.apache.org/jira/browse/HIVE-25400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390170#comment-17390170 ] Dongjoon Hyun commented on HIVE-25400:

Since the PR is merged, could you resolve this issue?
[jira] [Work logged] (HIVE-25400) Move the offset updating in BytesColumnVector to setValPreallocated.
[ https://issues.apache.org/jira/browse/HIVE-25400?focusedWorklogId=631367&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-631367 ] ASF GitHub Bot logged work on HIVE-25400: Author: ASF GitHub Bot Created on: 29/Jul/21 21:41 Worklog Time Spent: 10m

Work Description: dongjoon-hyun commented on pull request #2543: URL: https://github.com/apache/hive/pull/2543#issuecomment-889479884

Thank you all!

Issue Time Tracking: Worklog Id: (was: 631367) Time Spent: 40m (was: 0.5h)
[jira] [Work logged] (HIVE-25400) Move the offset updating in BytesColumnVector to setValPreallocated.
[ https://issues.apache.org/jira/browse/HIVE-25400?focusedWorklogId=631256&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-631256 ] ASF GitHub Bot logged work on HIVE-25400: Author: ASF GitHub Bot Created on: 29/Jul/21 18:06 Worklog Time Spent: 10m

Work Description: omalley closed pull request #2543: URL: https://github.com/apache/hive/pull/2543

Issue Time Tracking: Worklog Id: (was: 631256) Time Spent: 0.5h (was: 20m)
[jira] [Updated] (HIVE-25407) [Hive] Investigate why advancing the Write ID not working for some DDLs and fix it, if appropriate
[ https://issues.apache.org/jira/browse/HIVE-25407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kishen Das updated HIVE-25407: Summary: [Hive] Investigate why advancing the Write ID not working for some DDLs and fix it, if appropriate (was: Advance Write ID for remaining DDLs)
[jira] [Updated] (HIVE-25381) Hive impersonation Failed when load data of managed tables set as hive
[ https://issues.apache.org/jira/browse/HIVE-25381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HIVE-25381: Fix Version/s: (was: 4.0.0) (was: 3.1.0)

> Hive impersonation Failed when load data of managed tables set as hive
> ---
>
> Key: HIVE-25381
> URL: https://issues.apache.org/jira/browse/HIVE-25381
> Project: Hive
> Issue Type: Bug
> Reporter: Ranith Sardar
> Assignee: Ranith Sardar
> Priority: Minor
>
> When hive.server2.enable.doAs = true and hive is set as the default value
> for the hive.load.data.owner property, the logic in Hive.java's needToCopy()
> will always fail, because the framework validates the owner of the file
> against the value set in hive.load.data.owner.
[jira] [Assigned] (HIVE-25275) OOM during query planning due to HiveJoinPushTransitivePredicatesRule matching infinitely
[ https://issues.apache.org/jira/browse/HIVE-25275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis reassigned HIVE-25275: Assignee: Stamatis Zampetakis

> OOM during query planning due to HiveJoinPushTransitivePredicatesRule
> matching infinitely
> ---
>
> Key: HIVE-25275
> URL: https://issues.apache.org/jira/browse/HIVE-25275
> Project: Hive
> Issue Type: Bug
> Reporter: László Pintér
> Assignee: Stamatis Zampetakis
> Priority: Major
>
> While running the following query, OOM is raised during the planning phase:
> {code:sql}
> CREATE TABLE A (`value_date` date) STORED AS ORC;
> CREATE TABLE B (`business_date` date) STORED AS ORC;
> SELECT A.VALUE_DATE
> FROM A, B
> WHERE A.VALUE_DATE = BUSINESS_DATE
> AND A.VALUE_DATE = TRUNC(BUSINESS_DATE, 'MONTH');
> {code}
[jira] [Work logged] (HIVE-25356) JDBCSplitFilterAboveJoinRule's onMatch method throws exception
[ https://issues.apache.org/jira/browse/HIVE-25356?focusedWorklogId=631120&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-631120 ] ASF GitHub Bot logged work on HIVE-25356: Author: ASF GitHub Bot Created on: 29/Jul/21 12:48 Worklog Time Spent: 10m

Work Description: zabetak commented on a change in pull request #2504: URL: https://github.com/apache/hive/pull/2504#discussion_r679118159

## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/jdbc/JDBCAbstractSplitFilterRule.java ##
@@ -172,13 +172,14 @@ public boolean matches(RelOptRuleCall call) {
   final HiveJdbcConverter conv = call.rel(2);
   RexNode joinCond = join.getCondition();
+  SqlDialect dialect = conv.getJdbcDialect();
-  return super.matches(call) && JDBCRexCallValidator.isValidJdbcOperation(joinCond, conv.getJdbcDialect());
+  return super.matches(call, dialect) && JDBCRexCallValidator.isValidJdbcOperation(joinCond, dialect);

Review comment: I think the original author meant to call `super.matches(call, dialect)` and mistakenly called `super.matches(call)`. The signature of `matches(call, dialect)` is a source of confusion, so to avoid similar problems in the future I would suggest removing this method entirely and calling `canSplitFilter` directly. Moreover, it seems that `canSplitFilter` already calls `JDBCRexCallValidator.isValidJdbcOperation` internally, so we can possibly remove that additional call from here.

Issue Time Tracking: Worklog Id: (was: 631120) Time Spent: 1h (was: 50m)

> JDBCSplitFilterAboveJoinRule's onMatch method throws exception
> ---
>
> Key: HIVE-25356
> URL: https://issues.apache.org/jira/browse/HIVE-25356
> Project: Hive
> Issue Type: Bug
> Reporter: Soumyakanti Das
> Assignee: Soumyakanti Das
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h
> Remaining Estimate: 0h
>
> The stack trace is produced by [JDBCAbstractSplitFilterRule.java#L181|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/jdbc/JDBCAbstractSplitFilterRule.java#L181].
> In the onMatch method, a HiveFilter is being cast to HiveJdbcConverter.
> {code:java}
> java.lang.ClassCastException: org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveFilter cannot be cast to org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.jdbc.HiveJdbcConverter
> at org.apache.hadoop.hive.ql.optimizer.calcite.rules.jdbc.JDBCAbstractSplitFilterRule$JDBCSplitFilterAboveJoinRule.onMatch(JDBCAbstractSplitFilterRule.java:181)
> at org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:333)
> at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:542)
> at org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:407)
> at org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:271)
> at org.apache.calcite.plan.hep.HepInstruction$RuleCollection.execute(HepInstruction.java:74)
> at org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:202)
> at org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:189)
> at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.executeProgram(CalcitePlanner.java:2440)
> at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.executeProgram(CalcitePlanner.java:2406)
> at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyPostJoinOrderingTransform(CalcitePlanner.java:2326)
> at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1735)
> at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1588)
> at org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:131)
> at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:914)
> at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:180)
> at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:126)
> at org.apa
[jira] [Work logged] (HIVE-25406) Fetch writeId from insert-only transactional tables
[ https://issues.apache.org/jira/browse/HIVE-25406?focusedWorklogId=631107&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-631107 ] ASF GitHub Bot logged work on HIVE-25406: Author: ASF GitHub Bot Created on: 29/Jul/21 12:10 Worklog Time Spent: 10m

Work Description: kasakrisz opened a new pull request #2549: URL: https://github.com/apache/hive/pull/2549

### What changes were proposed in this pull request?
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?

Issue Time Tracking: Worklog Id: (was: 631107) Remaining Estimate: 0h Time Spent: 10m

> Fetch writeId from insert-only transactional tables
> ---
>
> Key: HIVE-25406
> URL: https://issues.apache.org/jira/browse/HIVE-25406
> Project: Hive
> Issue Type: Improvement
> Components: ORC, Parquet, Reader, Vectorization
> Reporter: Krisztian Kasa
> Assignee: Krisztian Kasa
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> When generating the plan for an incremental materialized view rebuild, a
> filter operator is inserted on top of each source table scan. The predicates
> contain a filter on writeId, since we only want the rows inserted/deleted in
> the source tables since the last rebuild.
> WriteId is part of the ROW__ID virtual column and is only available for
> fully-ACID ORC tables.
> The goal of this jira is to populate a writeId when fetching from insert-only
> transactional tables.
> {code:java}
> create table t1(a int, b int) clustered by (a) into 2 buckets stored as orc
> TBLPROPERTIES ('transactional'='true',
> 'transactional_properties'='insert_only');
> ...
> SELECT t1.ROW__ID.writeId, a, b FROM t1;
> {code}
[jira] [Updated] (HIVE-25406) Fetch writeId from insert-only transactional tables
[ https://issues.apache.org/jira/browse/HIVE-25406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25406: Labels: pull-request-available (was: )
[jira] [Updated] (HIVE-25406) Fetch writeId from insert-only transactional tables
[ https://issues.apache.org/jira/browse/HIVE-25406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Kasa updated HIVE-25406: Summary: Fetch writeId from insert-only transactional tables (was: Fetch writeId from insert-only tables)
[jira] [Comment Edited] (HIVE-24706) Spark SQL access hive on HBase table access exception
[ https://issues.apache.org/jira/browse/HIVE-24706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17389816#comment-17389816 ] Paul Lysak edited comment on HIVE-24706 at 7/29/21, 12:08 PM: -- The problem is that `HiveHBaseTableInputFormat` doesn't properly implement `org.apache.hadoop.mapreduce.InputFormat`. We also see the exception happening - and it appears that due to this bug it's not possible to read any HBase-backed Hive tables in Spark 3.x. The issue was originally described here: https://issues.apache.org/jira/browse/SPARK-34210 . A bit of analysis: `org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat` implements `org.apache.hadoop.mapreduce.InputFormat` but it doesn't override `getSplits(JobContext context)` (unlike `getSplits(final JobConf jobConf, final int numSplits)` from the old interface `org.apache.hadoop.mapred.InputFormat`), so it gets delegated to the superclass which doesn't initialize the table properly. Prior to version 3.0, Spark's class `HadoopRDD` was using the old interface `org.apache.hadoop.mapred.InputFormat` which has correct implementation in `HiveHBaseTableInputFormat`. Spark 3.0 has introduced `NewHadoopRDD` which relies on the new interface `org.apache.hadoop.mapreduce.InputFormat` for getting the splits, and its implementation in `HiveHBaseTableInputFormat` is broken - it doesn't initialize the table properly. Here's the excerpt of the exception stacktrace we're getting: {code:java} at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2621) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2610) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) Caused by: java.lang.IllegalStateException: The input format instance has not been properly initialized. 
Ensure you call initializeTable either in your constructor or initialize method at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getTable(TableInputFormatBase.java:557) at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:248) ... 37 more 21/07/28 10:04:16 ERROR ApplicationMaster: User class threw exception: java.io.IOException: Cannot create a record reader because of a previous error. Please look at the previous logs lines from the task's full log for more details. java.io.IOException: Cannot create a record reader because of a previous error. Please look at the previous logs lines from the task's full log for more details. at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:253) at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:131) at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:300) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.rdd.RDD.partitions(RDD.scala:296) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:300) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.rdd.RDD.partitions(RDD.scala:296){code} was (Author: lysak): The problem is that `HiveHBaseTableInputFormat` doesn't properly implement `org.apache.hadoop.mapreduce.InputFormat`. We also see the exception happening - and it appears that due to this bug it's not possible to read any HBase-backed Hive tables in Spark 3.x. The issue was originally described here: https://issues.apache.org/jira/browse/SPARK-26630. 
A bit of analysis: `org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat` implements `org.apache.hadoop.mapreduce.InputFormat` but it doesn't override `getSplits(JobContext context)` (unlike `getSplits(final JobConf jobConf, final int numSplits)` from the old interface `org.apache.hadoop.mapred.InputFormat`), so it gets delegated to the superclass which doesn't initialize the table properly. Prior to version 3.0, Spark's class `HadoopRDD` was using the old interface `org.apache.hadoop.mapred.InputFormat` which has correct implementation in `HiveHBaseTableInputFormat`. Spark 3.0 has introduced `NewHadoopRDD` which relies on the new interface `org.apache.hadoop.mapreduce.InputFormat` for getting the splits, and its implementation in `HiveHBaseTableInputFormat` is broken - it doesn't initialize the table properly. Here's the excerpt of the exception stacktrace we're getting: {code:java} at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2621) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2610) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) Caused by: java.lang.IllegalStateException: The input format instance has not been properly initialized. Ensure you call initializeTable either in your constructor or initialize method at org.a
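The delegation problem described in the comment — the class implements the old-API getSplits but inherits the new-API getSplits from a superclass that was never given a table — can be shown in miniature. This is a plain-Python toy model, not the actual Hive/HBase code; the class and method names merely echo the real ones:

```python
class TableInputFormatBase:
    # Stand-in for org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.
    def __init__(self):
        self.table = None

    def get_splits_new(self, context):
        # Models the new-API getSplits(JobContext): fails unless the table
        # was initialized beforehand.
        if self.table is None:
            raise RuntimeError(
                "The input format instance has not been properly initialized.")
        return ["split-for-" + self.table]


class HiveHBaseTableInputFormat(TableInputFormatBase):
    # Only the old-API path initializes the table; the new-API path is
    # inherited unchanged from the superclass -- the bug's shape.
    def get_splits_old(self, job_conf, num_splits=1):
        self.table = job_conf["table"]
        return ["split-for-" + self.table]


fmt = HiveHBaseTableInputFormat()
print(fmt.get_splits_old({"table": "t"}))   # old API (HadoopRDD path) works

try:
    HiveHBaseTableInputFormat().get_splits_new(None)  # NewHadoopRDD path
except RuntimeError as e:
    print(e)  # superclass never saw an initialized table
```

The suggested fix in the discussion amounts to overriding the new-API method so that it also performs table initialization before delegating.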
[jira] [Assigned] (HIVE-25406) Fetch writeId from insert-only tables
[ https://issues.apache.org/jira/browse/HIVE-25406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Kasa reassigned HIVE-25406: - > Fetch writeId from insert-only tables > - > > Key: HIVE-25406 > URL: https://issues.apache.org/jira/browse/HIVE-25406 > Project: Hive > Issue Type: Improvement > Components: ORC, Parquet, Reader, Vectorization >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > > When generating plan for incremental materialized view rebuild a filter > operator is inserted on top of each source table scans. The predicates > contain a filter for writeId since we want to get all the rows > inserted/deleted from the source tables since the last rebuild only. > WriteId is part of the ROW_ID virtual column and only available for > fully-ACID ORC tables. > The goal of this jira is to populate a writeId when fetching from insert-only > transactional tables. > {code:java} > create table t1(a int, b int) clustered by (a) into 2 buckets stored as orc > TBLPROPERTIES ('transactional'='true', > 'transactional_properties'='insert_only'); > ... > SELECT t1.ROW__ID.writeId, a, b FROM t1; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25346) cleanTxnToWriteIdTable breaks SNAPSHOT isolation
[ https://issues.apache.org/jira/browse/HIVE-25346?focusedWorklogId=631094&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-631094 ] ASF GitHub Bot logged work on HIVE-25346: - Author: ASF GitHub Bot Created on: 29/Jul/21 11:31 Start Date: 29/Jul/21 11:31 Worklog Time Spent: 10m Work Description: zchovan opened a new pull request #2547: URL: https://github.com/apache/hive/pull/2547 initial test fixes Change-Id: Ieb4f922d1e1957538cbeda2d410a167d18993724 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 631094) Time Spent: 1h 20m (was: 1h 10m) > cleanTxnToWriteIdTable breaks SNAPSHOT isolation > > > Key: HIVE-25346 > URL: https://issues.apache.org/jira/browse/HIVE-25346 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Chovan >Assignee: Zoltan Chovan >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25405) Implement Connector Provider for Amazon Redshift
[ https://issues.apache.org/jira/browse/HIVE-25405?focusedWorklogId=631091&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-631091 ] ASF GitHub Bot logged work on HIVE-25405: - Author: ASF GitHub Bot Created on: 29/Jul/21 11:26 Start Date: 29/Jul/21 11:26 Worklog Time Spent: 10m Work Description: vnhive opened a new pull request #2546: URL: https://github.com/apache/hive/pull/2546 This PR proposes the addition of a data connector implementation to support Amazon Redshift. The data connector enables connecting to and seamlessly working with a Redshift database. 0: jdbc:hive2://> CREATE CONNECTOR IF NOT EXISTS redshift_test_7 . . . . . . . . > TYPE 'redshift' . . . . . . . . > URL '' . . . . . . . . > COMMENT 'test redshift connector' . . . . . . . . > WITH DCPROPERTIES ( . . . . . . . . > "hive.sql.dbcp.username"="**", . . . . . . . . > "hive.sql.dbcp.password"="**"); No rows affected (0.015 seconds) 0: jdbc:hive2://> CREATE REMOTE DATABASE db_sample_7 USING redshift_test_7 with DBPROPERTIES("connector.remoteDbName"="dbname"); 21/07/29 16:40:06 [HiveServer2-Background-Pool: Thread-217]: WARN exec.DDLTask: metastore.warehouse.external.dir is not set, falling back to metastore.warehouse.dir. This could cause external tables to use to managed tablespace. No rows affected (0.02 seconds) 0: jdbc:hive2://> use db_sample_7; No rows affected (0.014 seconds) 0: jdbc:hive2://> show tables; +-+ |tab_name | +-+ | accommodations | | category| | date| | event | | listing | | sales | | sample | | test_time | | test_time_2 | | test_timestamp | | users | | venue | | zipcode | +-+ 13 rows selected (8.578 seconds) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 631091) Remaining Estimate: 0h Time Spent: 10m > Implement Connector Provider for Amazon Redshift > > > Key: HIVE-25405 > URL: https://issues.apache.org/jira/browse/HIVE-25405 > Project: Hive > Issue Type: Sub-task >Reporter: Narayanan Venkateswaran >Assignee: Narayanan Venkateswaran >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25405) Implement Connector Provider for Amazon Redshift
[ https://issues.apache.org/jira/browse/HIVE-25405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25405: -- Labels: pull-request-available (was: ) > Implement Connector Provider for Amazon Redshift > > > Key: HIVE-25405 > URL: https://issues.apache.org/jira/browse/HIVE-25405 > Project: Hive > Issue Type: Sub-task >Reporter: Narayanan Venkateswaran >Assignee: Narayanan Venkateswaran >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25405) Implement Connector Provider for Amazon Redshift
[ https://issues.apache.org/jira/browse/HIVE-25405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Narayanan Venkateswaran reassigned HIVE-25405: -- > Implement Connector Provider for Amazon Redshift > > > Key: HIVE-25405 > URL: https://issues.apache.org/jira/browse/HIVE-25405 > Project: Hive > Issue Type: Sub-task >Reporter: Narayanan Venkateswaran >Assignee: Narayanan Venkateswaran >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24706) Spark SQL access hive on HBase table access exception
[ https://issues.apache.org/jira/browse/HIVE-24706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17389816#comment-17389816 ] Paul Lysak commented on HIVE-24706: --- The problem is that `HiveHBaseTableInputFormat` doesn't properly implement `org.apache.hadoop.mapreduce.InputFormat`. We also see the exception happening - and it appears that due to this bug it's not possible to read any HBase-backed Hive tables in Spark 3.x. The issue was originally described here: https://issues.apache.org/jira/browse/SPARK-26630. A bit of analysis: `org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat` implements `org.apache.hadoop.mapreduce.InputFormat` but it doesn't override `getSplits(JobContext context)` (unlike `getSplits(final JobConf jobConf, final int numSplits)` from the old interface `org.apache.hadoop.mapred.InputFormat`), so it gets delegated to the superclass which doesn't initialize the table properly. Prior to version 3.0, Spark's class `HadoopRDD` was using the old interface `org.apache.hadoop.mapred.InputFormat` which has correct implementation in `HiveHBaseTableInputFormat`. Spark 3.0 has introduced `NewHadoopRDD` which relies on the new interface `org.apache.hadoop.mapreduce.InputFormat` for getting the splits, and its implementation in `HiveHBaseTableInputFormat` is broken - it doesn't initialize the table properly. Here's the excerpt of the exception stacktrace we're getting: {code:java} at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2621) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2610) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) Caused by: java.lang.IllegalStateException: The input format instance has not been properly initialized. 
Ensure you call initializeTable either in your constructor or initialize method at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getTable(TableInputFormatBase.java:557) at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:248) ... 37 more 21/07/28 10:04:16 ERROR ApplicationMaster: User class threw exception: java.io.IOException: Cannot create a record reader because of a previous error. Please look at the previous logs lines from the task's full log for more details. java.io.IOException: Cannot create a record reader because of a previous error. Please look at the previous logs lines from the task's full log for more details. at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:253) at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:131) at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:300) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.rdd.RDD.partitions(RDD.scala:296) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:300) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.rdd.RDD.partitions(RDD.scala:296){code} > Spark SQL access hive on HBase table access exception > - > > Key: HIVE-24706 > URL: https://issues.apache.org/jira/browse/HIVE-24706 > Project: Hive > Issue Type: Bug > Components: HBase Handler >Reporter: zhangzhanchang >Priority: Major > Attachments: image-2021-01-30-15-51-58-665.png > > > HiveHBaseTableInputFormat relies on two versions of InputFormat: one is > org.apache.hadoop.mapred.InputFormat, the other is > org.apache.hadoop.mapreduce.InputFormat. This causes > Spark 3.0 (https://github.com/apache/spark/pull/31302) to find both conditions to be > true: > # classOf[oldInputClass[_, _]].isAssignableFrom(inputFormatClazz) is true > # classOf[newInputClass[_, _]].isAssignableFrom(inputFormatClazz) is true > 
!image-2021-01-30-15-51-58-665.png|width=430,height=137! > Should HiveHBaseTableInputFormat be changed to rely on > org.apache.hadoop.mapreduce or on org.apache.hadoop.mapred? > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-25404) Inserts inside merge statements are rewritten incorrectly for partitioned tables
[ https://issues.apache.org/jira/browse/HIVE-25404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17389811#comment-17389811 ] Zoltan Haindrich commented on HIVE-25404: - We could fix the rewrite to be correct: {code} #1) INSERT INTO `default`.`t` partition (`id`) (`value`)-- insert clause #2) INSERT INTO `default`.`t` partition (`id`) ()-- insert clause {code} however, in case #2 we will run into the fact that we don't support empty column lists; alternatively, we could probably rely on HIVE-? and proceed without the partition keyword: {code} #1) INSERT INTO `default`.`t` (`id`,`value`)-- insert clause #2) INSERT INTO `default`.`t` (`id`)-- insert clause {code} > Inserts inside merge statements are rewritten incorrectly for partitioned > tables > > > Key: HIVE-25404 > URL: https://issues.apache.org/jira/browse/HIVE-25404 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Priority: Major > > {code} > drop table u;drop table t; > create table t(value string default 'def') partitioned by (id integer); > create table u(id integer); > {code} > #1 id&value specified > rewritten > {code} > FROM > `default`.`t` > RIGHT OUTER JOIN > `default`.`u` > ON `t`.`id`=`u`.`id` > INSERT INTO `default`.`t` (`id`,`value`) partition (`id`)-- insert clause > SELECT `u`.`id`,'x' >WHERE `t`.`id` IS NULL > {code} > #2 when values is not specified > {code} > merge into t using u on t.id=u.id when not matched then insert (id) values > (u.id); > {code} > rewritten query: > {code} > FROM > `default`.`t` > RIGHT OUTER JOIN > `default`.`u` > ON `t`.`id`=`u`.`id` > INSERT INTO `default`.`t` (`id`) partition (`id`)-- insert clause > SELECT `u`.`id` >WHERE `t`.`id` IS NULL > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
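The two rewrite strategies discussed in the comment can be modeled with a small helper that builds the INSERT clause from the target columns, either keeping the `partition` keyword (which hits the empty-column-list problem in case #2) or dropping it and leaving the dynamic-partition columns in the list. This is a sketch of the idea only, not Hive's actual rewrite code; the function and its parameters are hypothetical:

```python
def insert_clause(table, cols, partition_cols, use_partition_keyword):
    """Build the INSERT clause of a rewritten MERGE statement."""
    quote = lambda c: "`%s`" % c
    if use_partition_keyword:
        # Variant with "partition (...)": partition columns move into the
        # spec, the remaining columns form the column list -- which can
        # legitimately end up empty (the "()" problem from case #2).
        rest = [c for c in cols if c not in partition_cols]
        part = "partition (" + ", ".join(map(quote, partition_cols)) + ")"
        col_list = "(" + ", ".join(map(quote, rest)) + ")"
        return "INSERT INTO %s %s %s" % (table, part, col_list)
    # Variant without the keyword: all columns, including dynamic-partition
    # ones, stay in a single non-empty column list.
    return "INSERT INTO %s (%s)" % (table, ", ".join(map(quote, cols)))

print(insert_clause("`default`.`t`", ["id", "value"], ["id"], True))
print(insert_clause("`default`.`t`", ["id"], ["id"], True))   # ends in "()"
print(insert_clause("`default`.`t`", ["id"], ["id"], False))  # always non-empty
```

The second variant never produces an empty column list, which is why the comment leans toward proceeding without the partition keyword.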
[jira] [Updated] (HIVE-25404) Inserts inside merge statements are rewritten incorrectly for partitioned tables
[ https://issues.apache.org/jira/browse/HIVE-25404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich updated HIVE-25404: Description: {code} drop table u;drop table t; create table t(value string default 'def') partitioned by (id integer); create table u(id integer); {code} #1 id&value specified rewritten {code} FROM `default`.`t` RIGHT OUTER JOIN `default`.`u` ON `t`.`id`=`u`.`id` INSERT INTO `default`.`t` (`id`,`value`) partition (`id`)-- insert clause SELECT `u`.`id`,'x' WHERE `t`.`id` IS NULL {code} #2 when values is not specified {code} merge into t using u on t.id=u.id when not matched then insert (id) values (u.id); {code} rewritten query: {code} FROM `default`.`t` RIGHT OUTER JOIN `default`.`u` ON `t`.`id`=`u`.`id` INSERT INTO `default`.`t` (`id`) partition (`id`)-- insert clause SELECT `u`.`id` WHERE `t`.`id` IS NULL {code} was: {code} drop table u;drop table t; create table t(value string default 'def') partitioned by (id integer); create table u(id integer); {code} #1 id&value specified rewritten {code} FROM `default`.`t` RIGHT OUTER JOIN `default`.`u` ON `t`.`id`=`u`.`id` INSERT INTO `default`.`t` (`id`,`value`) partition (`id`)-- insert clause SELECT `u`.`id`,'x' WHERE `t`.`id` IS NULL {code} it should be {code} [...] INSERT INTO `default`.`t` partition (`id`) (`value`)-- insert clause [...] {code} #2 when values is not specified {code} merge into t using u on t.id=u.id when not matched then insert (id) values (u.id); {code} rewritten query: {code} FROM `default`.`t` RIGHT OUTER JOIN `default`.`u` ON `t`.`id`=`u`.`id` INSERT INTO `default`.`t` (`id`) partition (`id`)-- insert clause SELECT `u`.`id` WHERE `t`.`id` IS NULL {code} it should be {code} [...] INSERT INTO `default`.`t` partition (`id`) ()-- insert clause [...] 
{code} however we don't accept empty column lists > Inserts inside merge statements are rewritten incorrectly for partitioned > tables > > > Key: HIVE-25404 > URL: https://issues.apache.org/jira/browse/HIVE-25404 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Priority: Major > > {code} > drop table u;drop table t; > create table t(value string default 'def') partitioned by (id integer); > create table u(id integer); > {code} > #1 id&value specified > rewritten > {code} > FROM > `default`.`t` > RIGHT OUTER JOIN > `default`.`u` > ON `t`.`id`=`u`.`id` > INSERT INTO `default`.`t` (`id`,`value`) partition (`id`)-- insert clause > SELECT `u`.`id`,'x' >WHERE `t`.`id` IS NULL > {code} > #2 when values is not specified > {code} > merge into t using u on t.id=u.id when not matched then insert (id) values > (u.id); > {code} > rewritten query: > {code} > FROM > `default`.`t` > RIGHT OUTER JOIN > `default`.`u` > ON `t`.`id`=`u`.`id` > INSERT INTO `default`.`t` (`id`) partition (`id`)-- insert clause > SELECT `u`.`id` >WHERE `t`.`id` IS NULL > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24946) Handle failover case during Repl Load
[ https://issues.apache.org/jira/browse/HIVE-24946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haymant Mangla updated HIVE-24946: -- Description: To handle: # Introduce two states of failover db property to denote the nature of the database at the time failover was initiated. # If failover start config is enabled and the dump directory contains the failover marker file, then in incremental load as a preAckTask, we should ## Remove repl.target.for from the target db. ## Set repl.failover.endpoint = "TARGET" ## Update the replication metrics to say that the target cluster is failover ready # In the first dump operation in the reverse direction, presence of the failover ready marker and repl.failover.endpoint = "TARGET" will be used as the indicator for bootstrap iteration. # In any dump operation except the first dump operation in the reverse direction, if repl.failover.endpoint is set for the db and failover start config is set to false, then remove this property. # In incremental load, if the failover start config is disabled, then add repl.target.for and remove repl.failover.endpoint if present. was: * Update metric during load to capture the readiness for failover * Remove repl.target.for property on target cluster * Prepare the dump directory to be used during failover first dump operation > Handle failover case during Repl Load > - > > Key: HIVE-24946 > URL: https://issues.apache.org/jira/browse/HIVE-24946 > Project: Hive > Issue Type: New Feature >Reporter: Haymant Mangla >Assignee: Haymant Mangla >Priority: Major > Labels: pull-request-available > Time Spent: 2h 20m > Remaining Estimate: 0h > > To handle: > # Introduce two states of failover db property to denote nature of database > at the time failover was initiated. > # If failover start config is enabled and dump directory contains failover > marker file, then in incremental load as a preAckTask, we should > ## Remove repl.target.for from target db. 
> ## Set repl.failover.endpoint = "TARGET" > ## Update the replication metrics to say that the target cluster is failover > ready > # In the first dump operation in the reverse direction, presence of the failover > ready marker and repl.failover.endpoint = "TARGET" will be used as the indicator > for bootstrap iteration. > # In any dump operation except the first dump operation in the reverse direction, if > repl.failover.endpoint is set for the db and failover start config is set to > false, then remove this property. > # In incremental load, if the failover start config is disabled, then add > repl.target.for and remove repl.failover.endpoint if present. -- This message was sent by Atlassian Jira (v8.3.4#803005)
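The property transitions listed in the description can be sketched as a single function over the database properties map. Plain Python, purely illustrative: the property names come from the description, while the function name, control flags, and the "true" value are assumptions:

```python
def apply_incremental_load(db_props, failover_start_enabled, failover_marker_present):
    """Model the repl-load property transitions from HIVE-24946."""
    props = dict(db_props)  # never mutate the caller's view of the db
    if failover_start_enabled and failover_marker_present:
        # preAckTask path: mark the target as failover ready.
        props.pop("repl.target.for", None)
        props["repl.failover.endpoint"] = "TARGET"
    elif not failover_start_enabled:
        # Failover start config disabled: restore normal target state.
        props["repl.target.for"] = "true"
        props.pop("repl.failover.endpoint", None)
    return props

print(apply_incremental_load({"repl.target.for": "true"}, True, True))
print(apply_incremental_load({"repl.failover.endpoint": "TARGET"}, False, False))
```

The round trip above mirrors items 2 and 6 of the description: entering failover-ready state, then undoing it once the failover start config is disabled.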
[jira] [Commented] (HIVE-25140) Hive Distributed Tracing -- Part 1: Disabled
[ https://issues.apache.org/jira/browse/HIVE-25140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17389774#comment-17389774 ] Zoltan Haindrich commented on HIVE-25140: - Using aspect-oriented programming doesn't mean you can't also use the API directly - but it will be less disruptive (and easier to add). You say it doesn't affect a lot of code - yet the latest patch is 734K long, and it only touches VectorMapOperator for some exception. I think this approach is simply bad - it will just miss things here and there... and the patch is just getting bigger and bigger... bq. The very nature of manually instrumenting code like Hive to do tracing is to start at the top of execution (e.g. BeeLine's SQL Statement) and judicially look for large areas of execution that would provide us benefit from a Span. I think this approach is usable when you are chasing a concrete problem, not when developing a profiling tool for the system - and for Hive we should be doing the latter; we don't know what issues we will encounter in the future. The patch also introduces "traceable" classes which will be painful to maintain. I think the annotation aspect with an online feature switch, plus a big compile-time feature disable toggle, would be best; it wouldn't affect much - people could even run the whole system without the tracing code in the binaries. > Hive Distributed Tracing -- Part 1: Disabled > > > Key: HIVE-25140 > URL: https://issues.apache.org/jira/browse/HIVE-25140 > Project: Hive > Issue Type: Sub-task >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Major > Attachments: HIVE-25140.01.patch, HIVE-25140.02.patch, > HIVE-25140.03.patch > > > Infrastructure except exporters to Jaeger or OpenTelemetry (OTel) due to > Thrift and protobuf version conflicts. A logging-only exporter is used. > There are Spans for BeeLine and HiveServer2. 
The code was developed on > branch-3.1 and porting Spans to the Hive MetaStore on master is taking more > time due to major metastore code refactoring. -- This message was sent by Atlassian Jira (v8.3.4#803005)
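The "annotation aspect with an online feature switch" that the comment argues for can be sketched as a decorator whose body collapses to a plain call when tracing is disabled. Plain Python with no OpenTelemetry dependency; every name here is illustrative, not Hive's actual design:

```python
import functools

TRACING_ENABLED = False  # online feature switch; a build flag could also
SPANS = []               # compile the tracing path out entirely


def traced(fn):
    """Annotation-style tracing: wrap a function in a span when enabled."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        if not TRACING_ENABLED:
            return fn(*args, **kwargs)  # zero bookkeeping on the disabled path
        SPANS.append(fn.__name__)       # span start
        try:
            return fn(*args, **kwargs)
        finally:
            SPANS.append(fn.__name__ + ":end")  # span end, even on error
    return wrapper


@traced
def compile_query(sql):
    return "plan(" + sql + ")"


compile_query("select 1")
print(SPANS)            # still empty: switch is off
TRACING_ENABLED = True
compile_query("select 1")
print(SPANS)            # one start/end span pair recorded
```

The point of the sketch is the comment's maintenance argument: instrumentation lives at the annotation, so adding a span is one line, and disabling tracing costs only a flag check rather than a patch touching every call site.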
[jira] [Work logged] (HIVE-25346) cleanTxnToWriteIdTable breaks SNAPSHOT isolation
[ https://issues.apache.org/jira/browse/HIVE-25346?focusedWorklogId=631057&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-631057 ] ASF GitHub Bot logged work on HIVE-25346: - Author: ASF GitHub Bot Created on: 29/Jul/21 09:23 Start Date: 29/Jul/21 09:23 Worklog Time Spent: 10m Work Description: zchovan closed pull request #2494: URL: https://github.com/apache/hive/pull/2494 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 631057) Time Spent: 1h 10m (was: 1h) > cleanTxnToWriteIdTable breaks SNAPSHOT isolation > > > Key: HIVE-25346 > URL: https://issues.apache.org/jira/browse/HIVE-25346 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Chovan >Assignee: Zoltan Chovan >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25403) from_unixtime() does not consider leap seconds
[ https://issues.apache.org/jira/browse/HIVE-25403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sruthi Mooriyathvariam reassigned HIVE-25403: - > from_unixtime() does not consider leap seconds > --- > > Key: HIVE-25403 > URL: https://issues.apache.org/jira/browse/HIVE-25403 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Sruthi Mooriyathvariam >Assignee: Sruthi Mooriyathvariam >Priority: Major > Fix For: 3.1.2 > > Attachments: image-2021-07-29-14-42-49-806.png > > > unix_timestamp() considers "leap seconds" while from_unixtime() does not, > which results in a wrong result, as below: > !image-2021-07-29-14-42-49-806.png! -- This message was sent by Atlassian Jira (v8.3.4#803005)
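The invariant this bug breaks is that unix_timestamp() and from_unixtime() should be inverses, i.e. both sides must use the same time model. A minimal round-trip check in plain Python — both toy functions below use POSIX epoch seconds, which by definition exclude leap seconds; this mirrors the expected behavior, not Hive's implementation:

```python
from datetime import datetime, timezone

FMT = "%Y-%m-%d %H:%M:%S"

def from_unixtime(epoch_seconds):
    # Format epoch seconds as a UTC timestamp string (POSIX time model).
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).strftime(FMT)

def unix_timestamp(text):
    # Parse a UTC timestamp string back into epoch seconds (same model).
    return int(datetime.strptime(text, FMT).replace(tzinfo=timezone.utc).timestamp())

# Round-trip must be the identity. If one side silently accounted for leap
# seconds and the other did not, these would disagree by the leap-second count.
epoch = 1627550000
assert unix_timestamp(from_unixtime(epoch)) == epoch
print(from_unixtime(epoch))
```

The Jira describes exactly this asymmetry: one Hive function adjusting for leap seconds while its inverse does not, so converting back and forth shifts the value.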
[jira] [Commented] (HIVE-6679) HiveServer2 should support configurable the server side socket timeout and keepalive for various transports types where applicable
[ https://issues.apache.org/jira/browse/HIVE-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17389759#comment-17389759 ] Oleksiy Sayankin commented on HIVE-6679: [~kgyrtkirk] Could you please review the PR? > HiveServer2 should support configurable the server side socket timeout and > keepalive for various transports types where applicable > -- > > Key: HIVE-6679 > URL: https://issues.apache.org/jira/browse/HIVE-6679 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.1.0, 1.2.0 >Reporter: Prasad Suresh Mujumdar >Assignee: Oleksiy Sayankin >Priority: Major > Labels: TODOC1.0, TODOC15, pull-request-available > Fix For: 1.3.0 > > Attachments: HIVE-6679.1.patch.txt, HIVE-6679.2.patch.txt, > HIVE-6679.3.patch, HIVE-6679.4.patch, HIVE-6679.5.patch, HIVE-6679.6.patch > > Time Spent: 10m > Remaining Estimate: 0h > > HiveServer2 should support a configurable server-side socket read timeout > and TCP keep-alive option. The metastore server already supports this (and so > does the old Hive server). > We now have multiple client connectivity options like Kerberos, Delegation > Token (Digest-MD5), Plain SASL, Plain SASL with SSL and raw sockets. The > configuration should be applicable to all types (if possible). -- This message was sent by Atlassian Jira (v8.3.4#803005)
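The two server-side options requested here map directly onto standard socket settings. A minimal sketch of what a configurable listener would apply to each accepted connection — plain Python and illustrative default values, not HiveServer2 code:

```python
import socket

def configure_server_socket(sock, read_timeout_s=60.0, keepalive=True):
    # Server-side read timeout: a blocked recv() fails after this many
    # seconds instead of pinning a worker thread forever on a dead client.
    sock.settimeout(read_timeout_s)
    # TCP keep-alive: lets the kernel detect peers that vanished without
    # closing the connection (e.g. a crashed client behind a NAT).
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE,
                    1 if keepalive else 0)
    return sock

s = configure_server_socket(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
print(s.gettimeout(), s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))
s.close()
```

The Jira's harder part is plumbing these two values through every transport variant (Kerberos, SASL, SSL, raw), since each wraps the underlying socket differently.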
[jira] [Assigned] (HIVE-24467) ConditionalTask remove tasks that not selected exists thread safety problem
[ https://issues.apache.org/jira/browse/HIVE-24467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary reassigned HIVE-24467: - Assignee: Xi Chen (was: guojh) > ConditionalTask remove tasks that not selected exists thread safety problem > --- > > Key: HIVE-24467 > URL: https://issues.apache.org/jira/browse/HIVE-24467 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.1.0, 2.3.4, 3.1.2 >Reporter: guojh >Assignee: Xi Chen >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > When Hive executes jobs in parallel (controlled by the “hive.exec.parallel” > parameter), ConditionalTasks remove the tasks that were not selected in parallel; > because of thread-safety issues, some tasks may not be removed from the > dependent task tree. This is a very serious bug, which causes some stage tasks > not to trigger execution. > In our production cluster, a query ran three conditional tasks in parallel; > after applying the patch of HIVE-21638, we found Stage-3 was missed and not submitted > to the runnable list because its parent Stage-31 was not done. But Stage-31 should > have been removed since it was not selected. 
> Stage dependencies is below: > {code:java} > STAGE DEPENDENCIES: > Stage-41 is a root stage > Stage-26 depends on stages: Stage-41 > Stage-25 depends on stages: Stage-26 , consists of Stage-39, Stage-40, > Stage-2 > Stage-39 has a backup stage: Stage-2 > Stage-23 depends on stages: Stage-39 > Stage-3 depends on stages: Stage-2, Stage-12, Stage-16, Stage-20, Stage-23, > Stage-24, Stage-27, Stage-28, Stage-31, Stage-32, Stage-35, Stage-36 > Stage-8 depends on stages: Stage-3 , consists of Stage-5, Stage-4, Stage-6 > Stage-5 > Stage-0 depends on stages: Stage-5, Stage-4, Stage-7 > Stage-51 depends on stages: Stage-0 > Stage-4 > Stage-6 > Stage-7 depends on stages: Stage-6 > Stage-40 has a backup stage: Stage-2 > Stage-24 depends on stages: Stage-40 > Stage-2 > Stage-44 is a root stage > Stage-30 depends on stages: Stage-44 > Stage-29 depends on stages: Stage-30 , consists of Stage-42, Stage-43, > Stage-12 > Stage-42 has a backup stage: Stage-12 > Stage-27 depends on stages: Stage-42 > Stage-43 has a backup stage: Stage-12 > Stage-28 depends on stages: Stage-43 > Stage-12 > Stage-47 is a root stage > Stage-34 depends on stages: Stage-47 > Stage-33 depends on stages: Stage-34 , consists of Stage-45, Stage-46, > Stage-16 > Stage-45 has a backup stage: Stage-16 > Stage-31 depends on stages: Stage-45 > Stage-46 has a backup stage: Stage-16 > Stage-32 depends on stages: Stage-46 > Stage-16 > Stage-50 is a root stage > Stage-38 depends on stages: Stage-50 > Stage-37 depends on stages: Stage-38 , consists of Stage-48, Stage-49, > Stage-20 > Stage-48 has a backup stage: Stage-20 > Stage-35 depends on stages: Stage-48 > Stage-49 has a backup stage: Stage-20 > Stage-36 depends on stages: Stage-49 > Stage-20 > {code} > Stage tasks execute log is below, we can see Stage-33 is conditional task and > it consists of Stage-45, Stage-46, Stage-16, Stage-16 is launched, Stage-45 > and Stage-46 should remove from the dependent tree, Stage-31 is child of > Stage-45 parent of Stage-3, 
So Stage-31 should be removed too. As seen in the > log below, we find Stage-31 is still in the parent list of Stage-3, which > should not happen. > {code:java} > 2020-12-03T01:09:50,939 INFO [HiveServer2-Background-Pool: Thread-87372] > ql.Driver: Launching Job 1 out of 17 > 2020-12-03T01:09:50,940 INFO [HiveServer2-Background-Pool: Thread-87372] > ql.Driver: Starting task [Stage-26:MAPRED] in parallel > 2020-12-03T01:09:50,941 INFO [HiveServer2-Background-Pool: Thread-87372] > ql.Driver: Launching Job 2 out of 17 > 2020-12-03T01:09:50,943 INFO [HiveServer2-Background-Pool: Thread-87372] > ql.Driver: Starting task [Stage-30:MAPRED] in parallel > 2020-12-03T01:09:50,943 INFO [HiveServer2-Background-Pool: Thread-87372] > ql.Driver: Launching Job 3 out of 17 > 2020-12-03T01:09:50,943 INFO [HiveServer2-Background-Pool: Thread-87372] > ql.Driver: Starting task [Stage-34:MAPRED] in parallel > 2020-12-03T01:09:50,944 INFO [HiveServer2-Background-Pool: Thread-87372] > ql.Driver: Launching Job 4 out of 17 > 2020-12-03T01:09:50,944 INFO [HiveServer2-Background-Pool: Thread-87372] > ql.Driver: Starting task [Stage-38:MAPRED] in parallel > 2020-12-03T01:10:32,946 INFO [HiveServer2-Background-Pool: Thread-87372] > ql.Driver: Starting task [Stage-29:CONDITIONAL] in parallel > 2020-12-03T01:10:32,946 INFO [HiveServer2-Background-Pool: Thread-87372] > ql.Driver: Starting task [Stage-33:CONDITIONAL] in parallel > 2020-12-03T01:10:32,946
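The race described above — multiple ConditionalTasks concurrently pruning not-selected branches from a shared dependency graph — is the classic unsynchronized shared-collection problem. A sketch of the locked removal, in plain Python with toy stage names; the real fix for HIVE-24467 is in Hive's Java task code:

```python
import threading

class TaskNode:
    """Toy model of a stage task with a shared parent list."""
    def __init__(self, name):
        self.name = name
        self.parents = []
        self._lock = threading.Lock()

    def remove_parent(self, parent):
        # Without the lock, two threads pruning branches at once can
        # interleave the membership check and the removal, leaving a stale
        # parent behind -- the child then waits on it forever, exactly the
        # "Stage-31 still in the parent list of Stage-3" symptom.
        with self._lock:
            if parent in self.parents:
                self.parents.remove(parent)

stage3 = TaskNode("Stage-3")
stage31 = TaskNode("Stage-31")
stage3.parents = [stage31, TaskNode("Stage-2")]

# Several conditional resolutions try to prune the same branch concurrently.
threads = [threading.Thread(target=stage3.remove_parent, args=(stage31,))
           for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print([p.name for p in stage3.parents])  # only Stage-2 remains
```

The check-then-remove pair must be atomic; whether that is done with a lock, as here, or a concurrent collection is an implementation choice.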
[jira] [Resolved] (HIVE-24467) ConditionalTask remove tasks that not selected exists thread safety problem
[ https://issues.apache.org/jira/browse/HIVE-24467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary resolved HIVE-24467. --- Fix Version/s: 4.0.0 Resolution: Fixed Pushed to master. Thanks for the PR [~jshmchenxi]! > ConditionalTask remove tasks that not selected exists thread safety problem > --- > > Key: HIVE-24467 > URL: https://issues.apache.org/jira/browse/HIVE-24467 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.1.0, 2.3.4, 3.1.2 >Reporter: guojh >Assignee: guojh >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > When Hive executes jobs in parallel (controlled by the “hive.exec.parallel” > parameter), ConditionalTasks remove the tasks that were not selected in parallel; > because of thread-safety issues, some tasks may not be removed from the > dependent task tree. This is a very serious bug, which causes some stage tasks > not to trigger execution. > In our production cluster, a query ran three conditional tasks in parallel; > after applying the patch of HIVE-21638, we found Stage-3 was missed and not submitted > to the runnable list because its parent Stage-31 was not done. But Stage-31 should > have been removed since it was not selected. 
> The stage dependencies are below: > {code:java} > STAGE DEPENDENCIES: > Stage-41 is a root stage > Stage-26 depends on stages: Stage-41 > Stage-25 depends on stages: Stage-26 , consists of Stage-39, Stage-40, > Stage-2 > Stage-39 has a backup stage: Stage-2 > Stage-23 depends on stages: Stage-39 > Stage-3 depends on stages: Stage-2, Stage-12, Stage-16, Stage-20, Stage-23, > Stage-24, Stage-27, Stage-28, Stage-31, Stage-32, Stage-35, Stage-36 > Stage-8 depends on stages: Stage-3 , consists of Stage-5, Stage-4, Stage-6 > Stage-5 > Stage-0 depends on stages: Stage-5, Stage-4, Stage-7 > Stage-51 depends on stages: Stage-0 > Stage-4 > Stage-6 > Stage-7 depends on stages: Stage-6 > Stage-40 has a backup stage: Stage-2 > Stage-24 depends on stages: Stage-40 > Stage-2 > Stage-44 is a root stage > Stage-30 depends on stages: Stage-44 > Stage-29 depends on stages: Stage-30 , consists of Stage-42, Stage-43, > Stage-12 > Stage-42 has a backup stage: Stage-12 > Stage-27 depends on stages: Stage-42 > Stage-43 has a backup stage: Stage-12 > Stage-28 depends on stages: Stage-43 > Stage-12 > Stage-47 is a root stage > Stage-34 depends on stages: Stage-47 > Stage-33 depends on stages: Stage-34 , consists of Stage-45, Stage-46, > Stage-16 > Stage-45 has a backup stage: Stage-16 > Stage-31 depends on stages: Stage-45 > Stage-46 has a backup stage: Stage-16 > Stage-32 depends on stages: Stage-46 > Stage-16 > Stage-50 is a root stage > Stage-38 depends on stages: Stage-50 > Stage-37 depends on stages: Stage-38 , consists of Stage-48, Stage-49, > Stage-20 > Stage-48 has a backup stage: Stage-20 > Stage-35 depends on stages: Stage-48 > Stage-49 has a backup stage: Stage-20 > Stage-36 depends on stages: Stage-49 > Stage-20 > {code} > The stage task execution log is below. We can see that Stage-33 is a > conditional task consisting of Stage-45, Stage-46, and Stage-16; Stage-16 is > launched, so Stage-45 and Stage-46 should be removed from the dependency > tree. Stage-31 is a child of Stage-45 and a parent of Stage-3, 
so Stage-31 should be removed too. But as seen in the > log below, Stage-31 is still in the parent list of Stage-3, which should not > happen. > {code:java} > 2020-12-03T01:09:50,939 INFO [HiveServer2-Background-Pool: Thread-87372] > ql.Driver: Launching Job 1 out of 17 > 2020-12-03T01:09:50,940 INFO [HiveServer2-Background-Pool: Thread-87372] > ql.Driver: Starting task [Stage-26:MAPRED] in parallel > 2020-12-03T01:09:50,941 INFO [HiveServer2-Background-Pool: Thread-87372] > ql.Driver: Launching Job 2 out of 17 > 2020-12-03T01:09:50,943 INFO [HiveServer2-Background-Pool: Thread-87372] > ql.Driver: Starting task [Stage-30:MAPRED] in parallel > 2020-12-03T01:09:50,943 INFO [HiveServer2-Background-Pool: Thread-87372] > ql.Driver: Launching Job 3 out of 17 > 2020-12-03T01:09:50,943 INFO [HiveServer2-Background-Pool: Thread-87372] > ql.Driver: Starting task [Stage-34:MAPRED] in parallel > 2020-12-03T01:09:50,944 INFO [HiveServer2-Background-Pool: Thread-87372] > ql.Driver: Launching Job 4 out of 17 > 2020-12-03T01:09:50,944 INFO [HiveServer2-Background-Pool: Thread-87372] > ql.Driver: Starting task [Stage-38:MAPRED] in parallel > 2020-12-03T01:10:32,946 INFO [HiveServer2-Background-Pool: Thread-87372] > ql.Driver: Starting task [Stage-29:CONDITIONAL] in parallel > 2020-12-03T01:10:32,946 INFO [HiveServer2-Background-Pool: Thread-87372] > ql.Driver: Starting task [
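The pruning race described above can be sketched as follows. This is a minimal illustration with hypothetical names, not Hive's actual ConditionalTask code: several conditional tasks prune their non-selected branches from a shared dependency list in parallel, and the sketch shows the thread-safe pattern — wrapping the shared list with Collections.synchronizedList() so each removal is atomic and no update is lost.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ConditionalPruneSketch {
    // Shared parent list of a downstream stage (think Stage-3's parents).
    // synchronizedList makes each remove() atomic across threads.
    public static final List<String> parents =
        Collections.synchronizedList(new ArrayList<>());

    public static void pruneInParallel(List<String> notSelected) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (String stage : notSelected) {
            // Each conditional task removes its own non-selected branch.
            pool.submit(() -> parents.remove(stage));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

With a plain unsynchronized ArrayList in place of the synchronized wrapper, concurrent remove() calls can corrupt the list or silently skip an element — which matches the symptom above, where a non-selected stage survived in a parent list.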
[jira] [Resolved] (HIVE-24904) CVE-2019-10172,CVE-2019-10202 vulnerabilities in jackson-mapper-asl-1.9.13.jar
[ https://issues.apache.org/jira/browse/HIVE-24904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich resolved HIVE-24904. - Fix Version/s: 4.0.0 Resolution: Duplicate I've fixed this in HIVE-20071 by migrating and banning that old dependency > CVE-2019-10172,CVE-2019-10202 vulnerabilities in jackson-mapper-asl-1.9.13.jar > -- > > Key: HIVE-24904 > URL: https://issues.apache.org/jira/browse/HIVE-24904 > Project: Hive > Issue Type: Bug > Components: Security >Reporter: Oleksiy Sayankin >Assignee: Zoltan Haindrich >Priority: Critical > Labels: CVE > Fix For: 4.0.0 > > > CVE list: CVE-2019-10172,CVE-2019-10202 > CVSS score: High > {code} > ./packaging/target/apache-hive-4.0.0-SNAPSHOT-bin/apache-hive-4.0.0-SNAPSHOT-bin/lib/jackson-mapper-asl-1.9.13.jar > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25402) When Hive client has multiple statements without close. queryIdOperation in OperationManager class will exist object that cannot be released
[ https://issues.apache.org/jira/browse/HIVE-25402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lvyankui updated HIVE-25402: Attachment: HIVE-25402.patch > When Hive client has multiple statements without close. queryIdOperation in > OperationManager class will exist object that cannot be released > > > Key: HIVE-25402 > URL: https://issues.apache.org/jira/browse/HIVE-25402 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: All Versions >Reporter: lvyankui >Priority: Major > Attachments: HIVE-25402.patch > > Original Estimate: 24h > Remaining Estimate: 24h > > The Hive client code has multiple statements that are never closed: > connect = DriverManager.getConnection(jdbcUrl, user, password); > PrintWriter pw = new PrintWriter("/tmp/hive.result" ); > Statement stmt = connect.createStatement(); > Statement stmt1 = connect.createStatement(); > Statement stmt2 = connect.createStatement(); > String sql = "select * from test"; > runSQL(stmt, sql, pw); > runSQL(stmt1, sql, pw); > runSQL(stmt2, sql, pw); > > OperationManager removeOperation method > private Operation removeOperation(OperationHandle opHandle) { > Operation operation = handleToOperation.remove(opHandle); > if (operation == null) { > throw new RuntimeException("Operation does not exist: " + opHandle); > } > String queryId = getQueryId(operation); > *queryIdOperation.remove(queryId);* > > The key of queryIdOperation is the queryId, which is obtained from HiveConf. > A new queryId is generated whenever a new query plan is generated, and it is > set into HiveConf. If the Hive client has multiple statements that are never > closed, then when the SQL statements complete, queryIdOperation can only > release the object whose queryId was generated last; the other objects cannot > be released. -- This message was sent by Atlassian Jira (v8.3.4#803005)
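The leak mechanism can be reproduced with a minimal sketch (hypothetical names, not Hive's actual OperationManager): because the queryId used for removal is read from a shared conf-like field that each new plan overwrites, the buggy removal path can only ever free the most recently generated queryId; earlier entries stay in the map forever. Removing by the operation's own remembered queryId fixes it.

```java
import java.util.HashMap;
import java.util.Map;

public class QueryIdLeakSketch {
    // Stands in for queryIdOperation: queryId -> operation handle.
    public final Map<String, String> queryIdOperation = new HashMap<>();
    // Stands in for the queryId currently stored in the shared HiveConf.
    public String confQueryId;

    public void open(String opHandle, String queryId) {
        queryIdOperation.put(queryId, opHandle);
        confQueryId = queryId; // each new query plan overwrites the conf value
    }

    // Buggy removal: derives the key from the shared conf, so it can only
    // ever remove the queryId that was generated last.
    public void removeByConf(String opHandle) {
        queryIdOperation.remove(confQueryId);
    }

    // Fixed removal: each operation remembers and removes its own queryId.
    public void removeByOwnId(String queryId) {
        queryIdOperation.remove(queryId);
    }
}
```

Opening three statements (queryIds q1, q2, q3) and then closing all three through the conf-derived path removes only q3; q1 and q2 remain in the map, which is exactly the unreleasable-object symptom described above.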
[jira] [Work logged] (HIVE-24546) Avoid unwanted cloud storage call during dynamic partition load
[ https://issues.apache.org/jira/browse/HIVE-24546?focusedWorklogId=631021&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-631021 ] ASF GitHub Bot logged work on HIVE-24546: - Author: ASF GitHub Bot Created on: 29/Jul/21 07:09 Start Date: 29/Jul/21 07:09 Worklog Time Spent: 10m Work Description: rbalamohan opened a new pull request #2545: URL: https://github.com/apache/hive/pull/2545 ### What changes were proposed in this pull request? https://issues.apache.org/jira/browse/HIVE-24546 Fix FS usage ### Why are the changes needed? Optimised FS usage for objectstores; especially during dynamic partition loads. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? small internal cluster. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 631021) Remaining Estimate: 0h Time Spent: 10m > Avoid unwanted cloud storage call during dynamic partition load > --- > > Key: HIVE-24546 > URL: https://issues.apache.org/jira/browse/HIVE-24546 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Priority: Major > Attachments: simple_test.sql > > Time Spent: 10m > Remaining Estimate: 0h > > {code:java} > private void createDpDirCheckSrc(final Path dpStagingPath, final Path > dpFinalPath) throws IOException { > if (!fs.exists(dpStagingPath) && !fs.exists(dpFinalPath)) { > fs.mkdirs(dpStagingPath); > // move task will create dp final path > if (reporter != null) { > reporter.incrCounter(counterGroup, > Operator.HIVE_COUNTER_CREATED_DYNAMIC_PARTITIONS, 1); > } > } > } > {code} > > > {noformat} > at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:370) > at > 
org.apache.hadoop.fs.s3a.S3AFileSystem.listObjects(S3AFileSystem.java:1960) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3164) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3031) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2899) > at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1723) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.exists(S3AFileSystem.java:4157) > at > org.apache.hadoop.hive.ql.exec.FileSinkOperator.createDpDir(FileSinkOperator.java:948) > at > org.apache.hadoop.hive.ql.exec.FileSinkOperator.updateDPCounters(FileSinkOperator.java:916) > at > org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketForFileIdx(FileSinkOperator.java:849) > at > org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:814) > at > org.apache.hadoop.hive.ql.exec.FileSinkOperator.createNewPaths(FileSinkOperator.java:1200) > at > org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynOutPaths(FileSinkOperator.java:1324) > at > org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1036) > at > org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.process(VectorFileSinkOperator.java:111) > at > org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:969) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
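The cost of the `fs.exists()` probes in the stack trace above can be sketched with a stub filesystem (this is an illustration of the optimization idea, not the actual Hive patch): on an object store like S3, every `exists()` is a remote HEAD/LIST request, while `mkdirs()` is idempotent — a no-op when the directory already exists — so creating the directory directly saves a round trip per dynamic-partition path.

```java
import java.util.HashSet;
import java.util.Set;

public class DpDirSketch {
    // Stub filesystem that counts remote calls; not the Hadoop FileSystem API.
    public static class StubFs {
        final Set<String> dirs = new HashSet<>();
        public int remoteCalls = 0;

        public boolean exists(String path) {
            remoteCalls++; // one HEAD/LIST per probe on an object store
            return dirs.contains(path);
        }

        public boolean mkdirs(String path) {
            remoteCalls++; // one call; no-op if the directory already exists
            return dirs.add(path);
        }
    }

    // Original pattern: probe before creating (extra call when dir exists).
    public static void createChecked(StubFs fs, String path) {
        if (!fs.exists(path)) {
            fs.mkdirs(path);
        }
    }

    // Cheaper pattern: mkdirs() is idempotent, so skip the probe.
    public static void createDirect(StubFs fs, String path) {
        fs.mkdirs(path);
    }
}
```

Touching the same partition directory twice costs three remote calls with the probe-first pattern but only two with the direct pattern; the gap widens with every bucket file written into an existing partition.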
[jira] [Updated] (HIVE-24546) Avoid unwanted cloud storage call during dynamic partition load
[ https://issues.apache.org/jira/browse/HIVE-24546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24546: -- Labels: pull-request-available (was: ) > Avoid unwanted cloud storage call during dynamic partition load > --- > > Key: HIVE-24546 > URL: https://issues.apache.org/jira/browse/HIVE-24546 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Priority: Major > Labels: pull-request-available > Attachments: simple_test.sql > > Time Spent: 10m > Remaining Estimate: 0h > > {code:java} > private void createDpDirCheckSrc(final Path dpStagingPath, final Path > dpFinalPath) throws IOException { > if (!fs.exists(dpStagingPath) && !fs.exists(dpFinalPath)) { > fs.mkdirs(dpStagingPath); > // move task will create dp final path > if (reporter != null) { > reporter.incrCounter(counterGroup, > Operator.HIVE_COUNTER_CREATED_DYNAMIC_PARTITIONS, 1); > } > } > } > {code} > > > {noformat} > at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:370) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.listObjects(S3AFileSystem.java:1960) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3164) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3031) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2899) > at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1723) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.exists(S3AFileSystem.java:4157) > at > org.apache.hadoop.hive.ql.exec.FileSinkOperator.createDpDir(FileSinkOperator.java:948) > at > org.apache.hadoop.hive.ql.exec.FileSinkOperator.updateDPCounters(FileSinkOperator.java:916) > at > org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketForFileIdx(FileSinkOperator.java:849) > at > org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:814) > at > 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.createNewPaths(FileSinkOperator.java:1200) > at > org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynOutPaths(FileSinkOperator.java:1324) > at > org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1036) > at > org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.process(VectorFileSinkOperator.java:111) > at > org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:969) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25401) Insert overwrite a table which location is on other cluster fail in kerberos cluster
[ https://issues.apache.org/jira/browse/HIVE-25401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Xie updated HIVE-25401: Attachment: (was: HIVE-25401.patch) > Insert overwrite a table which location is on other cluster fail in > kerberos cluster > -- > > Key: HIVE-25401 > URL: https://issues.apache.org/jira/browse/HIVE-25401 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.3.0, 3.1.2 > Environment: hive 2.3 > hadoop3 cluster with kerberos >Reporter: Max Xie >Assignee: Max Xie >Priority: Minor > Labels: pull-request-available > Attachments: HIVE-25401.patch, image-2021-07-29-14-25-23-418.png > > Time Spent: 20m > Remaining Estimate: 0h > > We have two HDFS clusters with Kerberos security, which means that MapReduce > tasks need delegation tokens to authenticate to the NameNode when Hive runs > on MapReduce. > An INSERT OVERWRITE into a table whose location is on another cluster fails > in a Kerberos cluster. For example, > # yarn cluster's default fs is hdfs://cluster1 > # tb1's location is hdfs://cluster1/tb1 > # tb2's location is hdfs://cluster2/tb2 > # sql `INSERT OVERWRITE TABLE tb2 SELECT * from tb1` run on yarn cluster > will fail > > reduce task error log: > !image-2021-07-29-14-25-23-418.png! > How to fix: > After digging into it, we found that the MapReduce job only obtains > delegation tokens for the input files in FileInputFormat. But the Hive > context gets an external scratchDir based on the table's location; if the > table's location is on another cluster, the delegation token will not be > obtained. > So we need to obtain delegation tokens for the Hive scratchDirs before Hive > submits the MapReduce job. > > How to test: > no test > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25401) Insert overwrite a table which location is on other cluster fail in kerberos cluster
[ https://issues.apache.org/jira/browse/HIVE-25401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Xie updated HIVE-25401: Attachment: HIVE-25401.patch Assignee: Max Xie Status: Patch Available (was: Open) > Insert overwrite a table which location is on other cluster fail in > kerberos cluster > -- > > Key: HIVE-25401 > URL: https://issues.apache.org/jira/browse/HIVE-25401 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 3.1.2, 2.3.0 > Environment: hive 2.3 > hadoop3 cluster with kerberos >Reporter: Max Xie >Assignee: Max Xie >Priority: Minor > Labels: pull-request-available > Attachments: HIVE-25401.patch, image-2021-07-29-14-25-23-418.png > > Time Spent: 20m > Remaining Estimate: 0h > > We have two HDFS clusters with Kerberos security, which means that MapReduce > tasks need delegation tokens to authenticate to the NameNode when Hive runs > on MapReduce. > An INSERT OVERWRITE into a table whose location is on another cluster fails > in a Kerberos cluster. For example, > # yarn cluster's default fs is hdfs://cluster1 > # tb1's location is hdfs://cluster1/tb1 > # tb2's location is hdfs://cluster2/tb2 > # sql `INSERT OVERWRITE TABLE tb2 SELECT * from tb1` run on yarn cluster > will fail > > reduce task error log: > !image-2021-07-29-14-25-23-418.png! > How to fix: > After digging into it, we found that the MapReduce job only obtains > delegation tokens for the input files in FileInputFormat. But the Hive > context gets an external scratchDir based on the table's location; if the > table's location is on another cluster, the delegation token will not be > obtained. > So we need to obtain delegation tokens for the Hive scratchDirs before Hive > submits the MapReduce job. > > How to test: > no test > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
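The fix described above amounts to enumerating every distinct filesystem the job touches — input paths and scratch dirs alike — so tokens can be requested from each NameNode, not just the one behind the input files. A minimal sketch of that enumeration step (hypothetical helper, not the actual patch; it only needs java.net.URI):

```java
import java.net.URI;
import java.util.LinkedHashSet;
import java.util.Set;

public class TokenTargetsSketch {
    // Collect the distinct filesystem URIs (scheme + authority) behind a set
    // of paths. Each distinct URI here is a cluster whose NameNode must hand
    // out a delegation token before job submission.
    public static Set<String> filesystemsNeedingTokens(Iterable<String> paths) {
        Set<String> fsUris = new LinkedHashSet<>();
        for (String p : paths) {
            URI u = URI.create(p);
            if (u.getScheme() != null && u.getAuthority() != null) {
                fsUris.add(u.getScheme() + "://" + u.getAuthority());
            }
        }
        return fsUris;
    }
}
```

For the example above, feeding in tb1's location, tb2's location, and a scratch dir on cluster1 yields two filesystems, hdfs://cluster1 and hdfs://cluster2; in a real Hadoop client these paths would then be passed to something like `TokenCache.obtainTokensForNamenodes(credentials, paths, conf)` so the second cluster's token is also fetched.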