[jira] [Work started] (HIVE-25403) from_unixtime() does not consider leap seconds

2021-07-29 Thread Sruthi Mooriyathvariam (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-25403 started by Sruthi Mooriyathvariam.
-
>  from_unixtime() does not consider leap seconds
> ---
>
> Key: HIVE-25403
> URL: https://issues.apache.org/jira/browse/HIVE-25403
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Sruthi Mooriyathvariam
>Assignee: Sruthi Mooriyathvariam
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.2
>
> Attachments: image-2021-07-29-14-42-49-806.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Unix_timestamp() considers leap seconds while from_unixtime() does not, 
> which produces wrong results, as shown below:
> !image-2021-07-29-14-42-49-806.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25403) from_unixtime() does not consider leap seconds

2021-07-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25403?focusedWorklogId=631476&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-631476
 ]

ASF GitHub Bot logged work on HIVE-25403:
-

Author: ASF GitHub Bot
Created on: 30/Jul/21 05:49
Start Date: 30/Jul/21 05:49
Worklog Time Spent: 10m 
  Work Description: ashish-kumar-sharma commented on pull request #2550:
URL: https://github.com/apache/hive/pull/2550#issuecomment-889645738


   @warriersruthi  Please refer to 
https://github.com/apache/hive/blob/10c8278e18942819f2a16c546d5ee1170937e64b/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFDateDiff.java
 and refactor the class, as the code is redundant and not easy to review. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 631476)
Time Spent: 40m  (was: 0.5h)

>  from_unixtime() does not consider leap seconds
> ---
>
> Key: HIVE-25403
> URL: https://issues.apache.org/jira/browse/HIVE-25403
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Sruthi Mooriyathvariam
>Assignee: Sruthi Mooriyathvariam
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.2
>
> Attachments: image-2021-07-29-14-42-49-806.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Unix_timestamp() considers leap seconds while from_unixtime() does not, 
> which produces wrong results, as shown below:
> !image-2021-07-29-14-42-49-806.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25403) from_unixtime() does not consider leap seconds

2021-07-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25403?focusedWorklogId=631473&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-631473
 ]

ASF GitHub Bot logged work on HIVE-25403:
-

Author: ASF GitHub Bot
Created on: 30/Jul/21 05:44
Start Date: 30/Jul/21 05:44
Worklog Time Spent: 10m 
  Work Description: ashish-kumar-sharma commented on pull request #2550:
URL: https://github.com/apache/hive/pull/2550#issuecomment-889644002


   @warriersruthi divide the PR description as follows - 
   
   What changes were proposed in this pull request?
   Why are the changes needed?
   Does this PR introduce any user-facing change?
   How was this patch tested?
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 631473)
Time Spent: 0.5h  (was: 20m)

>  from_unixtime() does not consider leap seconds
> ---
>
> Key: HIVE-25403
> URL: https://issues.apache.org/jira/browse/HIVE-25403
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Sruthi Mooriyathvariam
>Assignee: Sruthi Mooriyathvariam
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.2
>
> Attachments: image-2021-07-29-14-42-49-806.png
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Unix_timestamp() considers leap seconds while from_unixtime() does not, 
> which produces wrong results, as shown below:
> !image-2021-07-29-14-42-49-806.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25403) from_unixtime() does not consider leap seconds

2021-07-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25403?focusedWorklogId=631470&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-631470
 ]

ASF GitHub Bot logged work on HIVE-25403:
-

Author: ASF GitHub Bot
Created on: 30/Jul/21 05:41
Start Date: 30/Jul/21 05:41
Worklog Time Spent: 10m 
  Work Description: adesh-rao commented on a change in pull request #2550:
URL: https://github.com/apache/hive/pull/2550#discussion_r679662298



##
File path: ql/src/test/queries/clientpositive/udf5.q
##
@@ -13,6 +13,8 @@ SELECT from_unixtime(unix_timestamp('2010-01-13 11:57:40', 
'yyyy-MM-dd HH:mm:ss'
 
 SELECT from_unixtime(unix_timestamp('2010-01-13 11:57:40', 'yyyy-MM-dd 
HH:mm:ss'), 'MM/dd/yy HH:mm:ss'), from_unixtime(unix_timestamp('2010-01-13 
11:57:40')) from dest1_n14;
 
+SELECT from_unixtime(unix_timestamp(cast('2010-01-13' as date)));

Review comment:
   Run the same query in different timezone too.

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFFromUnixTime.java
##
@@ -60,7 +62,7 @@
 
  private transient SimpleDateFormat formatter = new 
SimpleDateFormat("yyyy-MM-dd HH:mm:ss");

Review comment:
   Remove this variable if not being used anywhere?

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFFromUnixTime.java
##
@@ -60,7 +62,7 @@
 
  private transient SimpleDateFormat formatter = new 
SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
   private transient String lastFormat = null;
-
+  private transient DateTimeFormatter FORMATTER = 
DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

Review comment:
   Update this variable in the configure method too.
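
   For context, a minimal sketch of what that could look like, assuming the 
GenericUDF configure(MapredContext) hook and the hive.local.time.zone setting; 
the names here are illustrative, not the actual patch:

{code:java}
@Override
public void configure(MapredContext context) {
  if (context != null) {
    // Assumption for illustration: rebuild FORMATTER so its pattern and time
    // zone follow the session configuration (real code would also need to
    // handle the "LOCAL" default value, which ZoneId.of() rejects).
    String zone = HiveConf.getVar(context.getJobConf(),
        HiveConf.ConfVars.HIVE_LOCAL_TIME_ZONE);
    FORMATTER = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
        .withZone(ZoneId.of(zone));
  }
}
{code}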




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 631470)
Time Spent: 20m  (was: 10m)

>  from_unixtime() does not consider leap seconds
> ---
>
> Key: HIVE-25403
> URL: https://issues.apache.org/jira/browse/HIVE-25403
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Sruthi Mooriyathvariam
>Assignee: Sruthi Mooriyathvariam
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.2
>
> Attachments: image-2021-07-29-14-42-49-806.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Unix_timestamp() considers leap seconds while from_unixtime() does not, 
> which produces wrong results, as shown below:
> !image-2021-07-29-14-42-49-806.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25407) [Hive] Investigate why advancing the Write ID not working for some DDLs and fix it, if appropriate

2021-07-29 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390292#comment-17390292
 ] 

Peter Vary commented on HIVE-25407:
---

We probably don't want to advance the writeId, especially for compactions, 
where there are no changes in the data. Don't forget that the materialised view 
handling depends on the writeId to decide whether the view data should be used 
or refreshed. If we add unnecessary writeIds, we start to cause performance 
issues with materialised views on top of the table.

CC: [~klcopp]

> [Hive] Investigate why advancing the Write ID not working for some DDLs and 
> fix it, if appropriate
> --
>
> Key: HIVE-25407
> URL: https://issues.apache.org/jira/browse/HIVE-25407
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Kishen Das
>Priority: Major
>
> The DDLs below should be investigated separately to determine why advancing 
> the write ID is not working for transactional tables, even after adding the 
> logic to advance the write ID. 
>  * ALTER TABLE SET PARTITION SPEC 
>  * ALTER TABLE  UNSET SERDEPROPERTIES 
>  * ALTER TABLE COMPACT 
>  * ALTER TABLE SKEWED BY
>  * ALTER TABLE SET SKEWED LOCATION



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25403) from_unixtime() does not consider leap seconds

2021-07-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25403:
--
Labels: pull-request-available  (was: )

>  from_unixtime() does not consider leap seconds
> ---
>
> Key: HIVE-25403
> URL: https://issues.apache.org/jira/browse/HIVE-25403
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Sruthi Mooriyathvariam
>Assignee: Sruthi Mooriyathvariam
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.2
>
> Attachments: image-2021-07-29-14-42-49-806.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Unix_timestamp() considers leap seconds while from_unixtime() does not, 
> which produces wrong results, as shown below:
> !image-2021-07-29-14-42-49-806.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25403) from_unixtime() does not consider leap seconds

2021-07-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25403?focusedWorklogId=631462&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-631462
 ]

ASF GitHub Bot logged work on HIVE-25403:
-

Author: ASF GitHub Bot
Created on: 30/Jul/21 05:26
Start Date: 30/Jul/21 05:26
Worklog Time Spent: 10m 
  Work Description: warriersruthi opened a new pull request #2550:
URL: https://github.com/apache/hive/pull/2550


   The query 
   SELECT from_unixtime(unix_timestamp(cast('1400-01-01' as date)));
   was giving wrong results because the from_unixtime() function was not 
considering leap seconds when representing the Timestamp (it was using 
java.util.Date). 
   Unix_timestamp() already considered this, so to get correct results for the 
above query, the epoch time in from_unixtime() had to be converted to a 
ZonedDateTime. The conversion was done with the help of the Instant class, 
which represents the moment for a given epoch time. 
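
   A minimal standalone sketch of that conversion (the UTC zone and the 
example date handling here are assumptions for illustration, not the patch 
itself):

{code:java}
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

public class FromUnixTimeSketch {
  public static void main(String[] args) {
    // Epoch seconds for 1400-01-01 00:00:00 UTC in the proleptic Gregorian
    // calendar, i.e. what unix_timestamp(cast('1400-01-01' as date)) targets.
    long epochSeconds =
        LocalDate.of(1400, 1, 1).atStartOfDay(ZoneOffset.UTC).toEpochSecond();
    // Instant represents the moment for the given epoch seconds; formatting
    // goes through ZonedDateTime instead of the old java.util.Date path.
    ZonedDateTime zdt = Instant.ofEpochSecond(epochSeconds).atZone(ZoneOffset.UTC);
    System.out.println(zdt.format(DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")));
    // prints: 1400-01-01 00:00:00
  }
}
{code}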


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 631462)
Remaining Estimate: 0h
Time Spent: 10m

>  from_unixtime() does not consider leap seconds
> ---
>
> Key: HIVE-25403
> URL: https://issues.apache.org/jira/browse/HIVE-25403
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Sruthi Mooriyathvariam
>Assignee: Sruthi Mooriyathvariam
>Priority: Major
> Fix For: 3.1.2
>
> Attachments: image-2021-07-29-14-42-49-806.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Unix_timestamp() considers leap seconds while from_unixtime() does not, 
> which produces wrong results, as shown below:
> !image-2021-07-29-14-42-49-806.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25408) AlterTableSetOwnerAnalyzer should send Hive Privilege Objects for Authorization.

2021-07-29 Thread Sai Hemanth Gantasala (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sai Hemanth Gantasala reassigned HIVE-25408:



> AlterTableSetOwnerAnalyzer should send Hive Privilege Objects for 
> Authorization. 
> -
>
> Key: HIVE-25408
> URL: https://issues.apache.org/jira/browse/HIVE-25408
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>
> Currently, Hive sends an empty list of Hive Privilege Objects for 
> authorization when a user runs the following operation: alter table foo set 
> owner user user_name;
> We should be sending the input/output objects related to the table in the 
> Hive privilege objects for authorization.
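
A minimal sketch of the usual analyzer pattern this would follow, using 
ReadEntity/WriteEntity from org.apache.hadoop.hive.ql.hooks; this is an 
illustration of the proposed change, not the actual patch:

{code:java}
// Inside the analyzer, instead of leaving the privilege objects empty.
// getTable(), inputs, and outputs are the usual BaseSemanticAnalyzer
// members; their exact use here is an assumption for illustration.
Table table = getTable(tableName);
inputs.add(new ReadEntity(table));
outputs.add(new WriteEntity(table, WriteEntity.WriteType.DDL_NO_LOCK));
{code}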



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25400) Move the offset updating in BytesColumnVector to setValPreallocated.

2021-07-29 Thread Owen O'Malley (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley resolved HIVE-25400.
--
Hadoop Flags: Reviewed
  Resolution: Fixed

Thanks for the review, Panos!

> Move the offset updating in BytesColumnVector to setValPreallocated.
> 
>
> Key: HIVE-25400
> URL: https://issues.apache.org/jira/browse/HIVE-25400
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Major
>  Labels: pull-request-available
> Fix For: storage-2.7.3, storage-2.8.1, storage-2.9.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> HIVE-25190 changed the semantics of BytesColumnVector so that 
> ensureValPreallocated reserved the room, which interacted badly with ORC's 
> redact mask code. The redact mask code needs to be able to increase the 
> allocation as it goes so it can call the ensureValPreallocated multiple times.
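
A minimal sketch of the interaction described above, using the 
BytesColumnVector methods from hive-storage-api (the helper method and the 
sizes are assumptions for illustration):

{code:java}
import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;

public class RedactMaskSketch {
  // The redact mask may grow its size estimate as it goes, so it calls
  // ensureValPreallocated more than once before committing the value.
  static void writeMasked(BytesColumnVector vector, int row, byte[] masked) {
    vector.ensureValPreallocated(masked.length / 2); // first, smaller estimate
    vector.ensureValPreallocated(masked.length);     // grown on a later call
    System.arraycopy(masked, 0, vector.getValPreallocatedBytes(),
        vector.getValPreallocatedStart(), masked.length);
    // With the fix, the buffer offset advances only here, not in
    // ensureValPreallocated, so repeated ensure calls don't leak space.
    vector.setValPreallocated(row, masked.length);
  }
}
{code}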



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25400) Move the offset updating in BytesColumnVector to setValPreallocated.

2021-07-29 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390170#comment-17390170
 ] 

Dongjoon Hyun commented on HIVE-25400:
--

Since the PR is merged, could you resolve this issue?

> Move the offset updating in BytesColumnVector to setValPreallocated.
> 
>
> Key: HIVE-25400
> URL: https://issues.apache.org/jira/browse/HIVE-25400
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Major
>  Labels: pull-request-available
> Fix For: storage-2.7.3, storage-2.8.1, storage-2.9.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> HIVE-25190 changed the semantics of BytesColumnVector so that 
> ensureValPreallocated reserved the room, which interacted badly with ORC's 
> redact mask code. The redact mask code needs to be able to increase the 
> allocation as it goes so it can call the ensureValPreallocated multiple times.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25400) Move the offset updating in BytesColumnVector to setValPreallocated.

2021-07-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25400?focusedWorklogId=631367&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-631367
 ]

ASF GitHub Bot logged work on HIVE-25400:
-

Author: ASF GitHub Bot
Created on: 29/Jul/21 21:41
Start Date: 29/Jul/21 21:41
Worklog Time Spent: 10m 
  Work Description: dongjoon-hyun commented on pull request #2543:
URL: https://github.com/apache/hive/pull/2543#issuecomment-889479884


   Thank you all!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 631367)
Time Spent: 40m  (was: 0.5h)

> Move the offset updating in BytesColumnVector to setValPreallocated.
> 
>
> Key: HIVE-25400
> URL: https://issues.apache.org/jira/browse/HIVE-25400
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Major
>  Labels: pull-request-available
> Fix For: storage-2.7.3, storage-2.8.1, storage-2.9.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> HIVE-25190 changed the semantics of BytesColumnVector so that 
> ensureValPreallocated reserved the room, which interacted badly with ORC's 
> redact mask code. The redact mask code needs to be able to increase the 
> allocation as it goes so it can call the ensureValPreallocated multiple times.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25400) Move the offset updating in BytesColumnVector to setValPreallocated.

2021-07-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25400?focusedWorklogId=631256&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-631256
 ]

ASF GitHub Bot logged work on HIVE-25400:
-

Author: ASF GitHub Bot
Created on: 29/Jul/21 18:06
Start Date: 29/Jul/21 18:06
Worklog Time Spent: 10m 
  Work Description: omalley closed pull request #2543:
URL: https://github.com/apache/hive/pull/2543


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 631256)
Time Spent: 0.5h  (was: 20m)

> Move the offset updating in BytesColumnVector to setValPreallocated.
> 
>
> Key: HIVE-25400
> URL: https://issues.apache.org/jira/browse/HIVE-25400
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Major
>  Labels: pull-request-available
> Fix For: storage-2.7.3, storage-2.8.1, storage-2.9.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> HIVE-25190 changed the semantics of BytesColumnVector so that 
> ensureValPreallocated reserved the room, which interacted badly with ORC's 
> redact mask code. The redact mask code needs to be able to increase the 
> allocation as it goes so it can call the ensureValPreallocated multiple times.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25407) [Hive] Investigate why advancing the Write ID not working for some DDLs and fix it, if appropriate

2021-07-29 Thread Kishen Das (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kishen Das updated HIVE-25407:
--
Summary: [Hive] Investigate why advancing the Write ID not working for some 
DDLs and fix it, if appropriate  (was: Advance Write ID for remaining DDLs)

> [Hive] Investigate why advancing the Write ID not working for some DDLs and 
> fix it, if appropriate
> --
>
> Key: HIVE-25407
> URL: https://issues.apache.org/jira/browse/HIVE-25407
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Kishen Das
>Priority: Major
>
> The DDLs below should be investigated separately to determine why advancing 
> the write ID is not working for transactional tables, even after adding the 
> logic to advance the write ID. 
>  * ALTER TABLE SET PARTITION SPEC 
>  * ALTER TABLE  UNSET SERDEPROPERTIES 
>  * ALTER TABLE COMPACT 
>  * ALTER TABLE SKEWED BY
>  * ALTER TABLE SET SKEWED LOCATION



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25381) Hive impersonation Failed when load data of managed tables set as hive

2021-07-29 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated HIVE-25381:

Fix Version/s: (was: 4.0.0)
   (was: 3.1.0)

> Hive impersonation Failed when load data of managed tables set as hive
> --
>
> Key: HIVE-25381
> URL: https://issues.apache.org/jira/browse/HIVE-25381
> Project: Hive
>  Issue Type: Bug
>Reporter: Ranith Sardar
>Assignee: Ranith Sardar
>Priority: Minor
>
> When hive.server2.enable.doAs = true and hive is set as the value of the 
> "hive.load.data.owner" property, the logic below (in Hive.java's 
> needToCopy()) will always fail, because the framework validates the owner of 
> the file against the value set in the hive.load.data.owner property.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25275) OOM during query planning due to HiveJoinPushTransitivePredicatesRule matching infinitely

2021-07-29 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis reassigned HIVE-25275:
--

Assignee: Stamatis Zampetakis

> OOM during query planning due to HiveJoinPushTransitivePredicatesRule 
> matching infinitely
> -
>
> Key: HIVE-25275
> URL: https://issues.apache.org/jira/browse/HIVE-25275
> Project: Hive
>  Issue Type: Bug
>Reporter: László Pintér
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> While running the following query OOM is raised during the planning phase
> {code:sql}
> CREATE TABLE A (`value_date` date) STORED AS ORC;
> CREATE TABLE B (`business_date` date) STORED AS ORC;
> SELECT A.VALUE_DATE
> FROM A, B
> WHERE A.VALUE_DATE = BUSINESS_DATE
>   AND A.VALUE_DATE = TRUNC(BUSINESS_DATE, 'MONTH');
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25356) JDBCSplitFilterAboveJoinRule's onMatch method throws exception

2021-07-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25356?focusedWorklogId=631120&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-631120
 ]

ASF GitHub Bot logged work on HIVE-25356:
-

Author: ASF GitHub Bot
Created on: 29/Jul/21 12:48
Start Date: 29/Jul/21 12:48
Worklog Time Spent: 10m 
  Work Description: zabetak commented on a change in pull request #2504:
URL: https://github.com/apache/hive/pull/2504#discussion_r679118159



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/jdbc/JDBCAbstractSplitFilterRule.java
##
@@ -172,13 +172,14 @@ public boolean matches(RelOptRuleCall call) {
   final HiveJdbcConverter conv = call.rel(2);
 
   RexNode joinCond = join.getCondition();
+  SqlDialect dialect = conv.getJdbcDialect();
 
-  return super.matches(call) && 
JDBCRexCallValidator.isValidJdbcOperation(joinCond, conv.getJdbcDialect());
+  return super.matches(call, dialect) && 
JDBCRexCallValidator.isValidJdbcOperation(joinCond, dialect);

Review comment:
   I think the original author meant to call `super.matches(call, dialect)` 
and mistakenly called `super.matches(call)`. The signature of `matches(call, 
dialect)` is a source of confusion, so to avoid similar problems in the future 
I would suggest removing this method entirely and calling `canSplitFilter` 
directly. 
   
   Moreover, it seems that `canSplitFilter` already calls 
`JDBCRexCallValidator.isValidJdbcOperation` internally, so we can possibly 
remove this additional call from here.
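
   For illustration, a rough sketch of the suggested shape (the exact 
signature of `canSplitFilter` and the rel indexes are assumptions taken from 
this discussion, not verified against master):

{code:java}
@Override
public boolean matches(RelOptRuleCall call) {
  final HiveJoin join = call.rel(1);
  final HiveJdbcConverter conv = call.rel(2);
  // canSplitFilter already runs JDBCRexCallValidator.isValidJdbcOperation
  // internally, so no separate validation call is needed here.
  return canSplitFilter(join.getCondition(), conv.getJdbcDialect());
}
{code}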




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 631120)
Time Spent: 1h  (was: 50m)

> JDBCSplitFilterAboveJoinRule's onMatch method throws exception 
> ---
>
> Key: HIVE-25356
> URL: https://issues.apache.org/jira/browse/HIVE-25356
> Project: Hive
>  Issue Type: Bug
>Reporter: Soumyakanti Das
>Assignee: Soumyakanti Das
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
>  
>  The stack trace is produced by [JDBCAbstractSplitFilterRule.java#L181 
> |https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/jdbc/JDBCAbstractSplitFilterRule.java#L181].
>  In the onMatch method, a HiveFilter is being cast to HiveJdbcConverter.
> {code:java}
> java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveFilter cannot be 
> cast to 
> org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.jdbc.HiveJdbcConverter
>  java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveFilter cannot be 
> cast to 
> org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.jdbc.HiveJdbcConverter
>  at 
> org.apache.hadoop.hive.ql.optimizer.calcite.rules.jdbc.JDBCAbstractSplitFilterRule$JDBCSplitFilterAboveJoinRule.onMatch(JDBCAbstractSplitFilterRule.java:181)
>  at 
> org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:333)
>  at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:542) at 
> org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:407) at 
> org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:271)
>  at 
> org.apache.calcite.plan.hep.HepInstruction$RuleCollection.execute(HepInstruction.java:74)
>  at 
> org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:202) at 
> org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:189) at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.executeProgram(CalcitePlanner.java:2440)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.executeProgram(CalcitePlanner.java:2406)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyPostJoinOrderingTransform(CalcitePlanner.java:2326)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1735)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1588)
>  at 
> org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:131) 
> at 
> org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:914)
>  at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:180) at 
> org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:126) at 
> org.apa

[jira] [Work logged] (HIVE-25406) Fetch writeId from insert-only transactional tables

2021-07-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25406?focusedWorklogId=631107&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-631107
 ]

ASF GitHub Bot logged work on HIVE-25406:
-

Author: ASF GitHub Bot
Created on: 29/Jul/21 12:10
Start Date: 29/Jul/21 12:10
Worklog Time Spent: 10m 
  Work Description: kasakrisz opened a new pull request #2549:
URL: https://github.com/apache/hive/pull/2549


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 631107)
Remaining Estimate: 0h
Time Spent: 10m

> Fetch writeId from insert-only transactional tables
> ---
>
> Key: HIVE-25406
> URL: https://issues.apache.org/jira/browse/HIVE-25406
> Project: Hive
>  Issue Type: Improvement
>  Components: ORC, Parquet, Reader, Vectorization
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When generating the plan for incremental materialized view rebuild, a filter 
> operator is inserted on top of each source table scan. The predicates 
> contain a filter on writeId, since we want to get only the rows 
> inserted/deleted in the source tables since the last rebuild.
> WriteId is part of the ROW__ID virtual column and is only available for 
> fully-ACID ORC tables.
> The goal of this jira is to populate a writeId when fetching from insert-only 
> transactional tables.
> {code:java}
> create table t1(a int, b int) clustered by (a) into 2 buckets stored as orc 
> TBLPROPERTIES ('transactional'='true', 
> 'transactional_properties'='insert_only');
> ...
> SELECT t1.ROW__ID.writeId, a, b FROM t1;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25406) Fetch writeId from insert-only transactional tables

2021-07-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25406:
--
Labels: pull-request-available  (was: )

> Fetch writeId from insert-only transactional tables
> ---
>
> Key: HIVE-25406
> URL: https://issues.apache.org/jira/browse/HIVE-25406
> Project: Hive
>  Issue Type: Improvement
>  Components: ORC, Parquet, Reader, Vectorization
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When generating the plan for incremental materialized view rebuild, a filter 
> operator is inserted on top of each source table scan. The predicates 
> contain a filter on writeId, since we want to get only the rows 
> inserted/deleted in the source tables since the last rebuild.
> WriteId is part of the ROW__ID virtual column and is only available for 
> fully-ACID ORC tables.
> The goal of this jira is to populate a writeId when fetching from insert-only 
> transactional tables.
> {code:java}
> create table t1(a int, b int) clustered by (a) into 2 buckets stored as orc 
> TBLPROPERTIES ('transactional'='true', 
> 'transactional_properties'='insert_only');
> ...
> SELECT t1.ROW__ID.writeId, a, b FROM t1;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25406) Fetch writeId from insert-only transactional tables

2021-07-29 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-25406:
--
Summary: Fetch writeId from insert-only transactional tables  (was: Fetch 
writeId from insert-only tables)

> Fetch writeId from insert-only transactional tables
> ---
>
> Key: HIVE-25406
> URL: https://issues.apache.org/jira/browse/HIVE-25406
> Project: Hive
>  Issue Type: Improvement
>  Components: ORC, Parquet, Reader, Vectorization
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>
> When generating the plan for incremental materialized view rebuild, a filter 
> operator is inserted on top of each source table scan. The predicates 
> contain a filter on writeId, since we want to get only the rows 
> inserted/deleted in the source tables since the last rebuild.
> WriteId is part of the ROW__ID virtual column and is only available for 
> fully-ACID ORC tables.
> The goal of this jira is to populate a writeId when fetching from insert-only 
> transactional tables.
> {code:java}
> create table t1(a int, b int) clustered by (a) into 2 buckets stored as orc 
> TBLPROPERTIES ('transactional'='true', 
> 'transactional_properties'='insert_only');
> ...
> SELECT t1.ROW__ID.writeId, a, b FROM t1;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-24706) Spark SQL access hive on HBase table access exception

2021-07-29 Thread Paul Lysak (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17389816#comment-17389816
 ] 

Paul Lysak edited comment on HIVE-24706 at 7/29/21, 12:08 PM:
--

The problem is that `HiveHBaseTableInputFormat` doesn't properly implement 
`org.apache.hadoop.mapreduce.InputFormat`.
 We also see the exception happening - and it appears that due to this bug it's 
not possible to read any HBase-backed Hive tables in Spark 3.x. 
 The issue was originally described here: 
https://issues.apache.org/jira/browse/SPARK-34210 . 

A bit of analysis: `org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat` 
implements `org.apache.hadoop.mapreduce.InputFormat`, but it doesn't override 
`getSplits(JobContext context)` (unlike `getSplits(final JobConf jobConf, 
final int numSplits)` from the old interface 
`org.apache.hadoop.mapred.InputFormat`), so the call gets delegated to the 
superclass, which doesn't initialize the table properly.
 Prior to version 3.0, Spark's `HadoopRDD` class was using the old interface 
`org.apache.hadoop.mapred.InputFormat`, which has a correct implementation in 
`HiveHBaseTableInputFormat`.
 Spark 3.0 introduced `NewHadoopRDD`, which relies on the new interface 
`org.apache.hadoop.mapreduce.InputFormat` for getting the splits, and its 
implementation in `HiveHBaseTableInputFormat` is broken - it doesn't 
initialize the table properly.
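
A hypothetical sketch of the kind of override that is missing (illustrative 
only, assuming the HBase TableInputFormatBase hooks; this is not the actual 
Hive fix):

{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.mapreduce.TableInputFormatBase;
import org.apache.hadoop.mapreduce.JobContext;

public class PatchedHiveHBaseTableInputFormat extends TableInputFormatBase {
  @Override
  protected void initialize(JobContext context) throws IOException {
    // The new-API getSplits(JobContext) path relies on this hook; without an
    // override the Table handle is never created and getTable() throws the
    // IllegalStateException seen in the stack trace below.
    Configuration conf = HBaseConfiguration.create(context.getConfiguration());
    Connection connection = ConnectionFactory.createConnection(conf);
    TableName tableName = TableName.valueOf(conf.get(TableInputFormat.INPUT_TABLE));
    initializeTable(connection, tableName);
  }
}
{code}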

Here's the excerpt of the exception stacktrace we're getting:
{code:java}
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2621)
 at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2610)
 at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
 Caused by: java.lang.IllegalStateException: The input format instance has not 
been properly initialized. Ensure you call initializeTable either in your 
constructor or initialize method
 at 
org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getTable(TableInputFormatBase.java:557)
 at 
org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:248)
 ... 37 more
 21/07/28 10:04:16 ERROR ApplicationMaster: User class threw exception: 
java.io.IOException: Cannot create a record reader because of a previous 
error. Please look at the previous logs lines from the task's full log for 
more details.
 java.io.IOException: Cannot create a record reader because of a previous 
error. Please look at the previous logs lines from the task's full log for 
more details.
 at 
org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:253)
 at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:131)
 at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:300)
 at scala.Option.getOrElse(Option.scala:189)
 at org.apache.spark.rdd.RDD.partitions(RDD.scala:296)
 at 
org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
 at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:300)
 at scala.Option.getOrElse(Option.scala:189)
 at org.apache.spark.rdd.RDD.partitions(RDD.scala:296){code}
 


was (Author: lysak):
The problem is that `HiveHBaseTableInputFormat` doesn't properly implement 
`org.apache.hadoop.mapreduce.InputFormat`.
 We also see the exception happening - and it appears that due to this bug it's 
not possible to read any HBase-backed Hive tables in Spark 3.x. 
 The issue was originally described here: 
https://issues.apache.org/jira/browse/SPARK-26630.

A bit of analysis: `org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat` 
implements `org.apache.hadoop.mapreduce.InputFormat`
 but it doesn't override `getSplits(JobContext context)` (unlike 
`getSplits(final JobConf jobConf, final int numSplits)` from the old interface 
`org.apache.hadoop.mapred.InputFormat`), 
 so it gets delegated to the superclass which doesn't initialize the table 
properly.
 Prior to version 3.0, Spark's class `HadoopRDD` was using the old interface 
`org.apache.hadoop.mapred.InputFormat` which has correct implementation in 
`HiveHBaseTableInputFormat`.
 Spark 3.0 has introduced `NewHadoopRDD` which relies on the new interface 
`org.apache.hadoop.mapreduce.InputFormat` for getting the splits, and its 
implementation in `HiveHBaseTableInputFormat`
 is broken - it doesn't initialize the table properly.

Here's the excerpt of the exception stacktrace we're getting:
{code:java}
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2621)
 at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2610)
 at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
 Caused by: java.lang.IllegalStateException: The input format instance has not 
been properly initialized. Ensure you call initializeTable either in your 
constructor or initialize method
 at 
org.a

[jira] [Assigned] (HIVE-25406) Fetch writeId from insert-only tables

2021-07-29 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa reassigned HIVE-25406:
-


> Fetch writeId from insert-only tables
> -
>
> Key: HIVE-25406
> URL: https://issues.apache.org/jira/browse/HIVE-25406
> Project: Hive
>  Issue Type: Improvement
>  Components: ORC, Parquet, Reader, Vectorization
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>
> When generating the plan for incremental materialized view rebuild, a filter 
> operator is inserted on top of each source table scan. The predicates 
> contain a filter on writeId, since we want to get only the rows 
> inserted/deleted in the source tables since the last rebuild.
> WriteId is part of the ROW__ID virtual column and is only available for 
> fully-ACID ORC tables.
> The goal of this jira is to populate a writeId when fetching from insert-only 
> transactional tables.
> {code:java}
> create table t1(a int, b int) clustered by (a) into 2 buckets stored as orc 
> TBLPROPERTIES ('transactional'='true', 
> 'transactional_properties'='insert_only');
> ...
> SELECT t1.ROW__ID.writeId, a, b FROM t1;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25346) cleanTxnToWriteIdTable breaks SNAPSHOT isolation

2021-07-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25346?focusedWorklogId=631094&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-631094
 ]

ASF GitHub Bot logged work on HIVE-25346:
-

Author: ASF GitHub Bot
Created on: 29/Jul/21 11:31
Start Date: 29/Jul/21 11:31
Worklog Time Spent: 10m 
  Work Description: zchovan opened a new pull request #2547:
URL: https://github.com/apache/hive/pull/2547


   initial test fixes
   Change-Id: Ieb4f922d1e1957538cbeda2d410a167d18993724
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 631094)
Time Spent: 1h 20m  (was: 1h 10m)

> cleanTxnToWriteIdTable breaks SNAPSHOT isolation
> 
>
> Key: HIVE-25346
> URL: https://issues.apache.org/jira/browse/HIVE-25346
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Chovan
>Assignee: Zoltan Chovan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25405) Implement Connector Provider for Amazon Redshift

2021-07-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25405?focusedWorklogId=631091&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-631091
 ]

ASF GitHub Bot logged work on HIVE-25405:
-

Author: ASF GitHub Bot
Created on: 29/Jul/21 11:26
Start Date: 29/Jul/21 11:26
Worklog Time Spent: 10m 
  Work Description: vnhive opened a new pull request #2546:
URL: https://github.com/apache/hive/pull/2546


   This PR proposes the addition of a data connector implementation to support 
Amazon Redshift.
   
   The data connector enables connecting to and seamlessly working with a 
Redshift database.
   
   0: jdbc:hive2://> CREATE CONNECTOR IF NOT EXISTS redshift_test_7
   . . . . . . . . > TYPE 'redshift'
   . . . . . . . . > URL ''
   . . . . . . . . > COMMENT 'test redshift connector'
   . . . . . . . . > WITH DCPROPERTIES (
   . . . . . . . . > "hive.sql.dbcp.username"="**",
   . . . . . . . . > "hive.sql.dbcp.password"="**");
   No rows affected (0.015 seconds)
   
   0: jdbc:hive2://> CREATE REMOTE DATABASE db_sample_7 USING redshift_test_7 
with DBPROPERTIES("connector.remoteDbName"="dbname");
   21/07/29 16:40:06 [HiveServer2-Background-Pool: Thread-217]: WARN 
exec.DDLTask: metastore.warehouse.external.dir is not set, falling back to 
metastore.warehouse.dir. This could cause external tables to use to managed 
tablespace.
   No rows affected (0.02 seconds)
   
   0: jdbc:hive2://> use db_sample_7;
   No rows affected (0.014 seconds)
   
   0: jdbc:hive2://> show tables;
   +-----------------+
   |    tab_name     |
   +-----------------+
   | accommodations  |
   | category        |
   | date            |
   | event           |
   | listing         |
   | sales           |
   | sample          |
   | test_time       |
   | test_time_2     |
   | test_timestamp  |
   | users           |
   | venue           |
   | zipcode         |
   +-----------------+
   13 rows selected (8.578 seconds)
   
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 631091)
Remaining Estimate: 0h
Time Spent: 10m

> Implement Connector Provider for Amazon Redshift
> 
>
> Key: HIVE-25405
> URL: https://issues.apache.org/jira/browse/HIVE-25405
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Narayanan Venkateswaran
>Assignee: Narayanan Venkateswaran
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25405) Implement Connector Provider for Amazon Redshift

2021-07-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25405:
--
Labels: pull-request-available  (was: )

> Implement Connector Provider for Amazon Redshift
> 
>
> Key: HIVE-25405
> URL: https://issues.apache.org/jira/browse/HIVE-25405
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Narayanan Venkateswaran
>Assignee: Narayanan Venkateswaran
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25405) Implement Connector Provider for Amazon Redshift

2021-07-29 Thread Narayanan Venkateswaran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Narayanan Venkateswaran reassigned HIVE-25405:
--


> Implement Connector Provider for Amazon Redshift
> 
>
> Key: HIVE-25405
> URL: https://issues.apache.org/jira/browse/HIVE-25405
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Narayanan Venkateswaran
>Assignee: Narayanan Venkateswaran
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24706) Spark SQL access hive on HBase table access exception

2021-07-29 Thread Paul Lysak (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17389816#comment-17389816
 ] 

Paul Lysak commented on HIVE-24706:
---

The problem is that `HiveHBaseTableInputFormat` doesn't properly implement 
`org.apache.hadoop.mapreduce.InputFormat`.
 We also see the exception happening - and it appears that due to this bug it's 
not possible to read any HBase-backed Hive tables in Spark 3.x. 
 The issue was originally described here: 
https://issues.apache.org/jira/browse/SPARK-26630.

A bit of analysis: `org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat` 
implements `org.apache.hadoop.mapreduce.InputFormat`, but it doesn't override 
`getSplits(JobContext context)` (unlike `getSplits(final JobConf jobConf, 
final int numSplits)` from the old interface 
`org.apache.hadoop.mapred.InputFormat`), so the call gets delegated to the 
superclass, which doesn't initialize the table properly.
 Prior to version 3.0, Spark's `HadoopRDD` class was using the old interface 
`org.apache.hadoop.mapred.InputFormat`, which has a correct implementation in 
`HiveHBaseTableInputFormat`.
 Spark 3.0 introduced `NewHadoopRDD`, which relies on the new interface 
`org.apache.hadoop.mapreduce.InputFormat` for getting the splits, and its 
implementation in `HiveHBaseTableInputFormat` is broken - it doesn't 
initialize the table properly.

Here's the excerpt of the exception stacktrace we're getting:
{code:java}
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2621)
 at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2610)
 at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
 Caused by: java.lang.IllegalStateException: The input format instance has not 
been properly initialized. Ensure you call initializeTable either in your 
constructor or initialize method
 at 
org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getTable(TableInputFormatBase.java:557)
 at 
org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:248)
 ... 37 more
 21/07/28 10:04:16 ERROR ApplicationMaster: User class threw exception: 
java.io.IOException: Cannot create a record reader because of a previous 
error. Please look at the previous logs lines from the task's full log for 
more details.
 java.io.IOException: Cannot create a record reader because of a previous 
error. Please look at the previous logs lines from the task's full log for 
more details.
 at 
org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:253)
 at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:131)
 at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:300)
 at scala.Option.getOrElse(Option.scala:189)
 at org.apache.spark.rdd.RDD.partitions(RDD.scala:296)
 at 
org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
 at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:300)
 at scala.Option.getOrElse(Option.scala:189)
 at org.apache.spark.rdd.RDD.partitions(RDD.scala:296){code}
 

> Spark SQL access hive on HBase table access exception
> -
>
> Key: HIVE-24706
> URL: https://issues.apache.org/jira/browse/HIVE-24706
> Project: Hive
>  Issue Type: Bug
>  Components: HBase Handler
>Reporter: zhangzhanchang
>Priority: Major
> Attachments: image-2021-01-30-15-51-58-665.png
>
>
> HiveHBaseTableInputFormat relies on two versions of InputFormat: one is 
> org.apache.hadoop.mapred.InputFormat, the other is 
> org.apache.hadoop.mapreduce.InputFormat. This causes both conditions in 
> Spark 3.0 (https://github.com/apache/spark/pull/31302) to be true:
>  # classOf[oldInputClass[_, _]].isAssignableFrom(inputFormatClazz) is true
>  # classOf[newInputClass[_, _]].isAssignableFrom(inputFormatClazz) is true
> !image-2021-01-30-15-51-58-665.png|width=430,height=137!
> Should the InputFormat that HiveHBaseTableInputFormat relies on be changed 
> to org.apache.hadoop.mapreduce or org.apache.hadoop.mapred?
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25404) Inserts inside merge statements are rewritten incorrectly for partitioned tables

2021-07-29 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17389811#comment-17389811
 ] 

Zoltan Haindrich commented on HIVE-25404:
-

we could fix the rewrite to be correct
{code}
#1) INSERT INTO `default`.`t` partition (`id`) (`value`)-- insert clause
#2) INSERT INTO `default`.`t` partition (`id`) ()-- insert clause
{code}
however, in case #2 we will run into the fact that we don't support empty 
column lists

or we could probably rely on HIVE-? and proceed without the partition keyword
{code}
#1) INSERT INTO `default`.`t` (`id`,`value`)-- insert clause
#2) INSERT INTO `default`.`t` (`id`)-- insert clause
{code}

> Inserts inside merge statements are rewritten incorrectly for partitioned 
> tables
> 
>
> Key: HIVE-25404
> URL: https://issues.apache.org/jira/browse/HIVE-25404
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Priority: Major
>
> {code}
> drop table u;drop table t;
> create table t(value string default 'def') partitioned by (id integer);
> create table u(id integer);
> {code}
> #1 id&value specified
> rewritten
> {code}
> FROM
>   `default`.`t`
>   RIGHT OUTER JOIN
>   `default`.`u`
>   ON `t`.`id`=`u`.`id`
> INSERT INTO `default`.`t` (`id`,`value`) partition (`id`)-- insert clause
>   SELECT `u`.`id`,'x'
>WHERE `t`.`id` IS NULL
> {code}
> #2 when values is not specified
> {code}
> merge into t using u on t.id=u.id when not matched then insert (id) values 
> (u.id);
> {code}
> rewritten query:
> {code}
> FROM
>   `default`.`t`
>   RIGHT OUTER JOIN
>   `default`.`u`
>   ON `t`.`id`=`u`.`id`
> INSERT INTO `default`.`t` (`id`) partition (`id`)-- insert clause
>   SELECT `u`.`id`
>WHERE `t`.`id` IS NULL
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25404) Inserts inside merge statements are rewritten incorrectly for partitioned tables

2021-07-29 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-25404:

Description: 
{code}
drop table u;drop table t;

create table t(value string default 'def') partitioned by (id integer);
create table u(id integer);
{code}

#1 id&value specified
rewritten
{code}
FROM
  `default`.`t`
  RIGHT OUTER JOIN
  `default`.`u`
  ON `t`.`id`=`u`.`id`
INSERT INTO `default`.`t` (`id`,`value`) partition (`id`)-- insert clause
  SELECT `u`.`id`,'x'
   WHERE `t`.`id` IS NULL
{code}

#2 when values is not specified

{code}
merge into t using u on t.id=u.id when not matched then insert (id) values 
(u.id);
{code}

rewritten query:
{code}
FROM
  `default`.`t`
  RIGHT OUTER JOIN
  `default`.`u`
  ON `t`.`id`=`u`.`id`
INSERT INTO `default`.`t` (`id`) partition (`id`)-- insert clause
  SELECT `u`.`id`
   WHERE `t`.`id` IS NULL
{code}



  was:
{code}
drop table u;drop table t;

create table t(value string default 'def') partitioned by (id integer);
create table u(id integer);
{code}

#1 id&value specified
rewritten
{code}
FROM
  `default`.`t`
  RIGHT OUTER JOIN
  `default`.`u`
  ON `t`.`id`=`u`.`id`
INSERT INTO `default`.`t` (`id`,`value`) partition (`id`)-- insert clause
  SELECT `u`.`id`,'x'
   WHERE `t`.`id` IS NULL
{code}
it should be
{code}
[...]
INSERT INTO `default`.`t` partition (`id`) (`value`)-- insert clause
[...]
{code}

#2 when values is not specified

{code}
merge into t using u on t.id=u.id when not matched then insert (id) values 
(u.id);
{code}

rewritten query:
{code}
FROM
  `default`.`t`
  RIGHT OUTER JOIN
  `default`.`u`
  ON `t`.`id`=`u`.`id`
INSERT INTO `default`.`t` (`id`) partition (`id`)-- insert clause
  SELECT `u`.`id`
   WHERE `t`.`id` IS NULL
{code}

it should be
{code}
[...]
INSERT INTO `default`.`t` partition (`id`) ()-- insert clause
[...]
{code}

however we don't accept empty column lists


> Inserts inside merge statements are rewritten incorrectly for partitioned 
> tables
> 
>
> Key: HIVE-25404
> URL: https://issues.apache.org/jira/browse/HIVE-25404
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Priority: Major
>
> {code}
> drop table u;drop table t;
> create table t(value string default 'def') partitioned by (id integer);
> create table u(id integer);
> {code}
> #1 id&value specified
> rewritten
> {code}
> FROM
>   `default`.`t`
>   RIGHT OUTER JOIN
>   `default`.`u`
>   ON `t`.`id`=`u`.`id`
> INSERT INTO `default`.`t` (`id`,`value`) partition (`id`)-- insert clause
>   SELECT `u`.`id`,'x'
>WHERE `t`.`id` IS NULL
> {code}
> #2 when values is not specified
> {code}
> merge into t using u on t.id=u.id when not matched then insert (id) values 
> (u.id);
> {code}
> rewritten query:
> {code}
> FROM
>   `default`.`t`
>   RIGHT OUTER JOIN
>   `default`.`u`
>   ON `t`.`id`=`u`.`id`
> INSERT INTO `default`.`t` (`id`) partition (`id`)-- insert clause
>   SELECT `u`.`id`
>WHERE `t`.`id` IS NULL
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24946) Handle failover case during Repl Load

2021-07-29 Thread Haymant Mangla (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haymant Mangla updated HIVE-24946:
--
Description: 
To handle:
 # Introduce two states of the failover db property to denote the nature of 
the database at the time failover was initiated.
 # If the failover start config is enabled and the dump directory contains the 
failover marker file, then in incremental load, as a preAckTask, we should:
 ## Remove repl.target.for from the target db.
 ## Set repl.failover.endpoint = "TARGET"
 ## Update the replication metrics to say that the target cluster is failover 
ready.
 # In the first dump operation in the reverse direction, the presence of the 
failover ready marker and repl.failover.endpoint = "TARGET" will be used as 
the indicator for a bootstrap iteration.
 # In any dump operation except the first dump operation in the reverse 
direction, if repl.failover.endpoint is set for the db and the failover start 
config is set to false, then remove this property.
 # In incremental load, if the failover start config is disabled, then add 
repl.target.for and remove repl.failover.endpoint if present.

  was:
* Update metric during load to capture the readiness for failover
 * Remove repl.target.for property on target cluster
 * Prepare the dump directory to be used during failover first dump operation


> Handle failover case during Repl Load
> -
>
> Key: HIVE-24946
> URL: https://issues.apache.org/jira/browse/HIVE-24946
> Project: Hive
>  Issue Type: New Feature
>Reporter: Haymant Mangla
>Assignee: Haymant Mangla
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> To handle:
>  # Introduce two states of the failover db property to denote the nature of 
> the database at the time failover was initiated.
>  # If the failover start config is enabled and the dump directory contains 
> the failover marker file, then in incremental load, as a preAckTask, we 
> should:
>  ## Remove repl.target.for from the target db.
>  ## Set repl.failover.endpoint = "TARGET"
>  ## Update the replication metrics to say that the target cluster is 
> failover ready.
>  # In the first dump operation in the reverse direction, the presence of the 
> failover ready marker and repl.failover.endpoint = "TARGET" will be used as 
> the indicator for a bootstrap iteration.
>  # In any dump operation except the first dump operation in the reverse 
> direction, if repl.failover.endpoint is set for the db and the failover 
> start config is set to false, then remove this property.
>  # In incremental load, if the failover start config is disabled, then add 
> repl.target.for and remove repl.failover.endpoint if present.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25140) Hive Distributed Tracing -- Part 1: Disabled

2021-07-29 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17389774#comment-17389774
 ] 

Zoltan Haindrich commented on HIVE-25140:
-

Using aspect-oriented programming doesn't mean you can't also use the api 
directly - but it will be less disruptive (and easier to add).

You say it doesn't affect a lot of code - yet the latest patch is 734K long, 
and it only touches VectorMapOperator for some exception. I think this approach 
is simply bad - it will just miss things here and there...and the patch is just 
getting bigger and bigger...

bq.  The very nature of manually instrumenting code like Hive to do tracing is 
to start at the top of execution (e.g. BeeLine's SQL Statement) and judiciously 
look for large areas of execution that would provide us benefit from a Span.

I think that approach is usable when you are chasing a concrete problem, not 
when you are developing a profiling tool for the system - and for Hive we 
should be doing the latter; we don't know what issues we will encounter in the 
future.

The patch also introduces "traceable" classes, which will be painful to 
maintain.

I think an annotation-driven aspect with an online feature switch, plus a big 
compile-time toggle to disable the feature, would be best; it wouldn't affect 
many things - people could even run the whole system without the tracing code 
in the binaries.
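
A rough illustration of that idea (the flag name and the span handling are 
stand-ins, not OpenTelemetry's or Hive's actual API; a real exporter would emit 
spans rather than log lines):

{code:java}
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Traced {}

@Aspect
class TracingAspect {
  // Compile-time kill switch: with this false, the traced branch is dead code,
  // and a build profile could skip weaving the aspect into the binaries at all.
  private static final boolean TRACING_COMPILED_IN = true;

  @Around("@annotation(Traced)")
  public Object traceSpan(ProceedingJoinPoint pjp) throws Throwable {
    if (!TRACING_COMPILED_IN || !tracingEnabled()) {
      return pjp.proceed();  // online switch off: near-zero overhead path
    }
    long start = System.nanoTime();
    try {
      return pjp.proceed();
    } finally {
      // stand-in for a span exporter
      System.out.printf("span %s took %d ns%n",
          pjp.getSignature(), System.nanoTime() - start);
    }
  }

  private static boolean tracingEnabled() {
    return Boolean.getBoolean("hive.tracing.enabled");  // illustrative flag
  }
}
{code}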

> Hive Distributed Tracing -- Part 1: Disabled
> 
>
> Key: HIVE-25140
> URL: https://issues.apache.org/jira/browse/HIVE-25140
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Major
> Attachments: HIVE-25140.01.patch, HIVE-25140.02.patch, 
> HIVE-25140.03.patch
>
>
> Infrastructure, except exporters to Jaeger or OpenTelemetry (OTel), due to 
> Thrift and protobuf version conflicts; a logging-only exporter is used.
> There are Spans for BeeLine and HiveServer2. The code was developed on 
> branch-3.1, and porting Spans to the Hive MetaStore on master is taking more 
> time due to major metastore code refactoring.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25346) cleanTxnToWriteIdTable breaks SNAPSHOT isolation

2021-07-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25346?focusedWorklogId=631057&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-631057
 ]

ASF GitHub Bot logged work on HIVE-25346:
-

Author: ASF GitHub Bot
Created on: 29/Jul/21 09:23
Start Date: 29/Jul/21 09:23
Worklog Time Spent: 10m 
  Work Description: zchovan closed pull request #2494:
URL: https://github.com/apache/hive/pull/2494


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 631057)
Time Spent: 1h 10m  (was: 1h)

> cleanTxnToWriteIdTable breaks SNAPSHOT isolation
> 
>
> Key: HIVE-25346
> URL: https://issues.apache.org/jira/browse/HIVE-25346
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Chovan
>Assignee: Zoltan Chovan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25403) from_unixtime() does not consider leap seconds

2021-07-29 Thread Sruthi Mooriyathvariam (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sruthi Mooriyathvariam reassigned HIVE-25403:
-


>  from_unixtime() does not consider leap seconds
> ---
>
> Key: HIVE-25403
> URL: https://issues.apache.org/jira/browse/HIVE-25403
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Sruthi Mooriyathvariam
>Assignee: Sruthi Mooriyathvariam
>Priority: Major
> Fix For: 3.1.2
>
> Attachments: image-2021-07-29-14-42-49-806.png
>
>
> The unix_timestamp() function considers "leap seconds" while from_unixtime() 
> does not, which results in a wrong result as below:
> !image-2021-07-29-14-42-49-806.png!
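
For reference, a round trip like from_unixtime(unix_timestamp(...)) is only 
stable when both directions use the same time scale. A minimal java.time 
sketch of the stable case (illustrative values, not Hive's actual UDF code; 
java.time uses a leap-second-free scale throughout):

{code:java}
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class RoundTripCheck {
  public static void main(String[] args) {
    long epoch = 1627551600L;  // illustrative epoch seconds
    DateTimeFormatter f = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // epoch seconds -> text on a leap-second-free scale
    String text = LocalDateTime.ofEpochSecond(epoch, 0, ZoneOffset.UTC).format(f);

    // text -> epoch seconds on the same scale, so the round trip holds
    long back = LocalDateTime.parse(text, f).toEpochSecond(ZoneOffset.UTC);

    System.out.println(text + " -> " + back + ", stable: " + (back == epoch));
  }
}
{code}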



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-6679) HiveServer2 should support configurable the server side socket timeout and keepalive for various transports types where applicable

2021-07-29 Thread Oleksiy Sayankin (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17389759#comment-17389759
 ] 

Oleksiy Sayankin commented on HIVE-6679:


[~kgyrtkirk]

Could you please review the PR?

> HiveServer2 should support configurable the server side socket timeout and 
> keepalive for various transports types where applicable
> --
>
> Key: HIVE-6679
> URL: https://issues.apache.org/jira/browse/HIVE-6679
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.1.0, 1.2.0
>Reporter: Prasad Suresh Mujumdar
>Assignee: Oleksiy Sayankin
>Priority: Major
>  Labels: TODOC1.0, TODOC15, pull-request-available
> Fix For: 1.3.0
>
> Attachments: HIVE-6679.1.patch.txt, HIVE-6679.2.patch.txt, 
> HIVE-6679.3.patch, HIVE-6679.4.patch, HIVE-6679.5.patch, HIVE-6679.6.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
>  HiveServer2 should support a configurable server-side socket read timeout 
> and TCP keep-alive option. The metastore server already supports this (and so 
> does the old hive server). 
> We now have multiple client connectivity options like Kerberos, Delegation 
> Token (Digest-MD5), Plain SASL, Plain SASL with SSL and raw sockets. The 
> configuration should be applicable to all types (if possible).
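
At the raw socket level, the two options in question look like this - a 
standalone sketch, not HiveServer2's actual transport wiring:

{code:java}
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

public class SocketOptionsDemo {
  public static void main(String[] args) throws IOException {
    try (ServerSocket server = new ServerSocket(10000)) {
      try (Socket client = server.accept()) {
        client.setSoTimeout(60_000);  // server-side read timeout (ms): blocked
                                      // reads throw SocketTimeoutException
        client.setKeepAlive(true);    // TCP keep-alive probes, so dead peers
                                      // are eventually detected
      }
    }
  }
}
{code}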



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24467) ConditionalTask remove tasks that not selected exists thread safety problem

2021-07-29 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary reassigned HIVE-24467:
-

Assignee: Xi Chen  (was: guojh)

> ConditionalTask remove tasks that not selected exists thread safety problem
> ---
>
> Key: HIVE-24467
> URL: https://issues.apache.org/jira/browse/HIVE-24467
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.1.0, 2.3.4, 3.1.2
>Reporter: guojh
>Assignee: Xi Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> When Hive executes jobs in parallel (controlled by the “hive.exec.parallel” 
> parameter), ConditionalTasks remove the tasks that were not selected in 
> parallel; because of thread safety issues, some tasks may not be removed from 
> the dependent task tree. This is a very serious bug, which causes some stage 
> tasks to never trigger execution.
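
A simplified sketch of the hazard and one way out (the task type is a 
stand-in, not Hive's Task class): pruning a shared dependency graph from 
several threads needs thread-safe collections or a common lock:

{code:java}
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Stand-in for Hive's task dependency graph: several conditional-task threads
// may unlink non-selected branches concurrently, so the parent/child lists
// must tolerate concurrent mutation (here via CopyOnWriteArrayList).
final class TaskNode {
  final String name;
  final List<TaskNode> parents = new CopyOnWriteArrayList<>();
  final List<TaskNode> children = new CopyOnWriteArrayList<>();

  TaskNode(String name) { this.name = name; }

  // Called when a conditional branch is not selected: detach this node so
  // its children no longer wait on it before becoming runnable.
  void unlink() {
    for (TaskNode p : parents) p.children.remove(this);
    for (TaskNode c : children) c.parents.remove(this);
    parents.clear();
    children.clear();
  }
}
{code}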
> In our production cluster, a query ran three conditional tasks in parallel. 
> After applying the patch from HIVE-21638, we found Stage-3 was missing and 
> not submitted to the runnable list because its parent Stage-31 was not done. 
> But Stage-31 should have been removed, because it was not selected.
> The stage dependencies are below:
> {code:java}
> STAGE DEPENDENCIES:
>   Stage-41 is a root stage
>   Stage-26 depends on stages: Stage-41
>   Stage-25 depends on stages: Stage-26 , consists of Stage-39, Stage-40, 
> Stage-2
>   Stage-39 has a backup stage: Stage-2
>   Stage-23 depends on stages: Stage-39
>   Stage-3 depends on stages: Stage-2, Stage-12, Stage-16, Stage-20, Stage-23, 
> Stage-24, Stage-27, Stage-28, Stage-31, Stage-32, Stage-35, Stage-36
>   Stage-8 depends on stages: Stage-3 , consists of Stage-5, Stage-4, Stage-6
>   Stage-5
>   Stage-0 depends on stages: Stage-5, Stage-4, Stage-7
>   Stage-51 depends on stages: Stage-0
>   Stage-4
>   Stage-6
>   Stage-7 depends on stages: Stage-6
>   Stage-40 has a backup stage: Stage-2
>   Stage-24 depends on stages: Stage-40
>   Stage-2
>   Stage-44 is a root stage
>   Stage-30 depends on stages: Stage-44
>   Stage-29 depends on stages: Stage-30 , consists of Stage-42, Stage-43, 
> Stage-12
>   Stage-42 has a backup stage: Stage-12
>   Stage-27 depends on stages: Stage-42
>   Stage-43 has a backup stage: Stage-12
>   Stage-28 depends on stages: Stage-43
>   Stage-12
>   Stage-47 is a root stage
>   Stage-34 depends on stages: Stage-47
>   Stage-33 depends on stages: Stage-34 , consists of Stage-45, Stage-46, 
> Stage-16
>   Stage-45 has a backup stage: Stage-16
>   Stage-31 depends on stages: Stage-45
>   Stage-46 has a backup stage: Stage-16
>   Stage-32 depends on stages: Stage-46
>   Stage-16
>   Stage-50 is a root stage
>   Stage-38 depends on stages: Stage-50
>   Stage-37 depends on stages: Stage-38 , consists of Stage-48, Stage-49, 
> Stage-20
>   Stage-48 has a backup stage: Stage-20
>   Stage-35 depends on stages: Stage-48
>   Stage-49 has a backup stage: Stage-20
>   Stage-36 depends on stages: Stage-49
>   Stage-20
> {code}
> The stage task execution log is below. We can see Stage-33 is a conditional 
> task consisting of Stage-45, Stage-46, and Stage-16; Stage-16 is launched, so 
> Stage-45 and Stage-46 should be removed from the dependent tree. Stage-31 is 
> a child of Stage-45 and a parent of Stage-3, so Stage-31 should be removed 
> too. As seen in the log below, Stage-31 is still in the parent list of 
> Stage-3, which should not happen.
> {code:java}
> 2020-12-03T01:09:50,939  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Launching Job 1 out of 17
> 2020-12-03T01:09:50,940  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Starting task [Stage-26:MAPRED] in parallel
> 2020-12-03T01:09:50,941  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Launching Job 2 out of 17
> 2020-12-03T01:09:50,943  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Starting task [Stage-30:MAPRED] in parallel
> 2020-12-03T01:09:50,943  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Launching Job 3 out of 17
> 2020-12-03T01:09:50,943  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Starting task [Stage-34:MAPRED] in parallel
> 2020-12-03T01:09:50,944  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Launching Job 4 out of 17
> 2020-12-03T01:09:50,944  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Starting task [Stage-38:MAPRED] in parallel
> 2020-12-03T01:10:32,946  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Starting task [Stage-29:CONDITIONAL] in parallel
> 2020-12-03T01:10:32,946  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Starting task [Stage-33:CONDITIONAL] in parallel
> 2020-12-03T01:10:32,946

[jira] [Resolved] (HIVE-24467) ConditionalTask remove tasks that not selected exists thread safety problem

2021-07-29 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-24467.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master.
Thanks for the PR [~jshmchenxi]!

> ConditionalTask remove tasks that not selected exists thread safety problem
> ---
>
> Key: HIVE-24467
> URL: https://issues.apache.org/jira/browse/HIVE-24467
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.1.0, 2.3.4, 3.1.2
>Reporter: guojh
>Assignee: guojh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> When Hive executes jobs in parallel (controlled by the “hive.exec.parallel” 
> parameter), ConditionalTasks remove the tasks that were not selected in 
> parallel; because of thread safety issues, some tasks may not be removed from 
> the dependent task tree. This is a very serious bug, which causes some stage 
> tasks to never trigger execution.
> In our production cluster, a query ran three conditional tasks in parallel. 
> After applying the patch from HIVE-21638, we found Stage-3 was missing and 
> not submitted to the runnable list because its parent Stage-31 was not done. 
> But Stage-31 should have been removed, because it was not selected.
> The stage dependencies are below:
> {code:java}
> STAGE DEPENDENCIES:
>   Stage-41 is a root stage
>   Stage-26 depends on stages: Stage-41
>   Stage-25 depends on stages: Stage-26 , consists of Stage-39, Stage-40, 
> Stage-2
>   Stage-39 has a backup stage: Stage-2
>   Stage-23 depends on stages: Stage-39
>   Stage-3 depends on stages: Stage-2, Stage-12, Stage-16, Stage-20, Stage-23, 
> Stage-24, Stage-27, Stage-28, Stage-31, Stage-32, Stage-35, Stage-36
>   Stage-8 depends on stages: Stage-3 , consists of Stage-5, Stage-4, Stage-6
>   Stage-5
>   Stage-0 depends on stages: Stage-5, Stage-4, Stage-7
>   Stage-51 depends on stages: Stage-0
>   Stage-4
>   Stage-6
>   Stage-7 depends on stages: Stage-6
>   Stage-40 has a backup stage: Stage-2
>   Stage-24 depends on stages: Stage-40
>   Stage-2
>   Stage-44 is a root stage
>   Stage-30 depends on stages: Stage-44
>   Stage-29 depends on stages: Stage-30 , consists of Stage-42, Stage-43, 
> Stage-12
>   Stage-42 has a backup stage: Stage-12
>   Stage-27 depends on stages: Stage-42
>   Stage-43 has a backup stage: Stage-12
>   Stage-28 depends on stages: Stage-43
>   Stage-12
>   Stage-47 is a root stage
>   Stage-34 depends on stages: Stage-47
>   Stage-33 depends on stages: Stage-34 , consists of Stage-45, Stage-46, 
> Stage-16
>   Stage-45 has a backup stage: Stage-16
>   Stage-31 depends on stages: Stage-45
>   Stage-46 has a backup stage: Stage-16
>   Stage-32 depends on stages: Stage-46
>   Stage-16
>   Stage-50 is a root stage
>   Stage-38 depends on stages: Stage-50
>   Stage-37 depends on stages: Stage-38 , consists of Stage-48, Stage-49, 
> Stage-20
>   Stage-48 has a backup stage: Stage-20
>   Stage-35 depends on stages: Stage-48
>   Stage-49 has a backup stage: Stage-20
>   Stage-36 depends on stages: Stage-49
>   Stage-20
> {code}
> The stage task execution log is below. We can see Stage-33 is a conditional 
> task consisting of Stage-45, Stage-46, and Stage-16; Stage-16 is launched, so 
> Stage-45 and Stage-46 should be removed from the dependent tree. Stage-31 is 
> a child of Stage-45 and a parent of Stage-3, so Stage-31 should be removed 
> too. As seen in the log below, Stage-31 is still in the parent list of 
> Stage-3, which should not happen.
> {code:java}
> 2020-12-03T01:09:50,939  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Launching Job 1 out of 17
> 2020-12-03T01:09:50,940  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Starting task [Stage-26:MAPRED] in parallel
> 2020-12-03T01:09:50,941  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Launching Job 2 out of 17
> 2020-12-03T01:09:50,943  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Starting task [Stage-30:MAPRED] in parallel
> 2020-12-03T01:09:50,943  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Launching Job 3 out of 17
> 2020-12-03T01:09:50,943  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Starting task [Stage-34:MAPRED] in parallel
> 2020-12-03T01:09:50,944  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Launching Job 4 out of 17
> 2020-12-03T01:09:50,944  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Starting task [Stage-38:MAPRED] in parallel
> 2020-12-03T01:10:32,946  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Starting task [Stage-29:CONDITIONAL] in parallel
> 2020-12-03T01:10:32,946  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Starting task [

[jira] [Resolved] (HIVE-24904) CVE-2019-10172,CVE-2019-10202 vulnerabilities in jackson-mapper-asl-1.9.13.jar

2021-07-29 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-24904.
-
Fix Version/s: 4.0.0
   Resolution: Duplicate

I've fixed this in HIVE-20071 by migrating off that old dependency and banning it

> CVE-2019-10172,CVE-2019-10202 vulnerabilities in jackson-mapper-asl-1.9.13.jar
> --
>
> Key: HIVE-24904
> URL: https://issues.apache.org/jira/browse/HIVE-24904
> Project: Hive
>  Issue Type: Bug
>  Components: Security
>Reporter: Oleksiy Sayankin
>Assignee: Zoltan Haindrich
>Priority: Critical
>  Labels: CVE
> Fix For: 4.0.0
>
>
> CVE list: CVE-2019-10172,CVE-2019-10202
> CVSS score: High
> {code}
> ./packaging/target/apache-hive-4.0.0-SNAPSHOT-bin/apache-hive-4.0.0-SNAPSHOT-bin/lib/jackson-mapper-asl-1.9.13.jar
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25402) When Hive client has multiple statements without close. queryIdOperation in OperationManager class will exist object that cannot be released

2021-07-29 Thread lvyankui (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lvyankui updated HIVE-25402:

Attachment: HIVE-25402.patch

> When Hive client has multiple statements without close. queryIdOperation in 
> OperationManager class will exist object that cannot be released
> 
>
> Key: HIVE-25402
> URL: https://issues.apache.org/jira/browse/HIVE-25402
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: All Versions
>Reporter: lvyankui
>Priority: Major
> Attachments: HIVE-25402.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Hive client code has multiple statements without close:
> {code:java}
> connect = DriverManager.getConnection(jdbcUrl, user, password);
> PrintWriter pw = new PrintWriter("/tmp/hive.result");
> Statement stmt = connect.createStatement();
> Statement stmt1 = connect.createStatement();
> Statement stmt2 = connect.createStatement();
> String sql = "select * from test";
> runSQL(stmt, sql, pw);
> runSQL(stmt1, sql, pw);
> runSQL(stmt2, sql, pw);
> {code}
>  
> The OperationManager removeOperation method:
> {code:java}
> private Operation removeOperation(OperationHandle opHandle) {
>   Operation operation = handleToOperation.remove(opHandle);
>   if (operation == null) {
>     throw new RuntimeException("Operation does not exist: " + opHandle);
>   }
>   String queryId = getQueryId(operation);
>   queryIdOperation.remove(queryId); // removes only the conf's current queryId
> {code}
>  
> The key of queryIdOperation is the queryId, which is obtained from HiveConf. 
> A new queryId is generated whenever a new query plan is generated, and it is 
> set into HiveConf. If the Hive client code has multiple statements without 
> close, then when the SQL statements finish executing, queryIdOperation can 
> only release the object whose queryId was generated last; the other objects 
> cannot be released. (A minimal demonstration follows.)
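
A self-contained demonstration of that leak pattern, with plain maps and 
strings standing in for OperationManager, Operation, and HiveConf:

{code:java}
import java.util.HashMap;
import java.util.Map;

public final class QueryIdLeakDemo {
  public static void main(String[] args) {
    Map<String, String> queryIdOperation = new HashMap<>();
    String confQueryId = null;  // plays the role of the queryId in HiveConf

    for (int i = 1; i <= 3; i++) {   // three statements, none closed
      confQueryId = "query-" + i;    // each new plan overwrites the conf's id
      queryIdOperation.put(confQueryId, "op-" + i);
    }

    // close path: getQueryId(operation) reads the conf, which only holds the
    // id generated last, so the earlier entries are never removed
    queryIdOperation.remove(confQueryId);
    System.out.println(queryIdOperation);  // two entries remain: the leak
  }
}
{code}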



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24546) Avoid unwanted cloud storage call during dynamic partition load

2021-07-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24546?focusedWorklogId=631021&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-631021
 ]

ASF GitHub Bot logged work on HIVE-24546:
-

Author: ASF GitHub Bot
Created on: 29/Jul/21 07:09
Start Date: 29/Jul/21 07:09
Worklog Time Spent: 10m 
  Work Description: rbalamohan opened a new pull request #2545:
URL: https://github.com/apache/hive/pull/2545


   ### What changes were proposed in this pull request?
   https://issues.apache.org/jira/browse/HIVE-24546
   Fix FS usage
   
   ### Why are the changes needed?
   Optimised FS usage for object stores, especially during dynamic partition 
loads.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Tested on a small internal cluster.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 631021)
Remaining Estimate: 0h
Time Spent: 10m

> Avoid unwanted cloud storage call during dynamic partition load
> ---
>
> Key: HIVE-24546
> URL: https://issues.apache.org/jira/browse/HIVE-24546
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Major
> Attachments: simple_test.sql
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code:java}
>  private void createDpDirCheckSrc(final Path dpStagingPath, final Path 
> dpFinalPath) throws IOException {
> if (!fs.exists(dpStagingPath) && !fs.exists(dpFinalPath)) {
>   fs.mkdirs(dpStagingPath);
>   // move task will create dp final path
>   if (reporter != null) {
> reporter.incrCounter(counterGroup, 
> Operator.HIVE_COUNTER_CREATED_DYNAMIC_PARTITIONS, 1);
>   }
> }
>   }
>  {code}
>  
>  
> {noformat}
> at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:370)
>   at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.listObjects(S3AFileSystem.java:1960)
>   at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3164)
>   at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3031)
>   at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2899)
>   at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1723)
>   at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.exists(S3AFileSystem.java:4157)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.createDpDir(FileSinkOperator.java:948)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.updateDPCounters(FileSinkOperator.java:916)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketForFileIdx(FileSinkOperator.java:849)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:814)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.createNewPaths(FileSinkOperator.java:1200)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynOutPaths(FileSinkOperator.java:1324)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1036)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.process(VectorFileSinkOperator.java:111)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:969)
>  {noformat}
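
One possible shape of the optimization - a sketch under the assumption that a 
single getFileStatus() probe replaces the two exists() calls; this is not 
necessarily what the actual patch does:

{code:java}
import java.io.FileNotFoundException;
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

final class DpDirCreator {
  // Returns true when the staging dir had to be created. One metadata probe
  // instead of two exists() calls keeps object-store round trips down.
  static boolean createDpDirIfMissing(FileSystem fs, Path dpStagingPath)
      throws IOException {
    try {
      fs.getFileStatus(dpStagingPath);  // already present, nothing to do
      return false;
    } catch (FileNotFoundException e) {
      fs.mkdirs(dpStagingPath);         // move task will create dp final path
      return true;
    }
  }
}
{code}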



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24546) Avoid unwanted cloud storage call during dynamic partition load

2021-07-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24546:
--
Labels: pull-request-available  (was: )

> Avoid unwanted cloud storage call during dynamic partition load
> ---
>
> Key: HIVE-24546
> URL: https://issues.apache.org/jira/browse/HIVE-24546
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Major
>  Labels: pull-request-available
> Attachments: simple_test.sql
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code:java}
>  private void createDpDirCheckSrc(final Path dpStagingPath, final Path 
> dpFinalPath) throws IOException {
> if (!fs.exists(dpStagingPath) && !fs.exists(dpFinalPath)) {
>   fs.mkdirs(dpStagingPath);
>   // move task will create dp final path
>   if (reporter != null) {
> reporter.incrCounter(counterGroup, 
> Operator.HIVE_COUNTER_CREATED_DYNAMIC_PARTITIONS, 1);
>   }
> }
>   }
>  {code}
>  
>  
> {noformat}
> at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:370)
>   at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.listObjects(S3AFileSystem.java:1960)
>   at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3164)
>   at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3031)
>   at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2899)
>   at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1723)
>   at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.exists(S3AFileSystem.java:4157)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.createDpDir(FileSinkOperator.java:948)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.updateDPCounters(FileSinkOperator.java:916)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketForFileIdx(FileSinkOperator.java:849)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:814)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.createNewPaths(FileSinkOperator.java:1200)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynOutPaths(FileSinkOperator.java:1324)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1036)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.process(VectorFileSinkOperator.java:111)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:969)
>  {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25401) Insert overwrite a table which location is on other cluster fail in kerberos cluster

2021-07-29 Thread Max Xie (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max  Xie updated HIVE-25401:

Attachment: (was: HIVE-25401.patch)

> Insert overwrite  a table which location is on other cluster fail  in 
> kerberos cluster
> --
>
> Key: HIVE-25401
> URL: https://issues.apache.org/jira/browse/HIVE-25401
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.3.0, 3.1.2
> Environment: hive 2.3 
> hadoop3 cluster with kerberos 
>Reporter: Max  Xie
>Assignee: Max  Xie
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HIVE-25401.patch, image-2021-07-29-14-25-23-418.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We have two HDFS clusters with Kerberos security, which means that MapReduce 
> tasks need delegation tokens to authenticate with the NameNode when Hive on 
> MapReduce runs.
> Insert overwrite into a table whose location is on the other cluster fails in 
> a Kerberos cluster. For example, 
>  # yarn cluster's default fs is hdfs://cluster1
>  # tb1's location is hdfs://cluster1/tb1
>  # tb2's location is hdfs://cluster2/tb2 
>  # the SQL `INSERT OVERWRITE TABLE tb2 SELECT * FROM tb1` run on the yarn 
> cluster will fail
>  
> reduce task error log:
> !image-2021-07-29-14-25-23-418.png!
> How to fix:
> After digging into it, we found that the MapReduce job only obtains 
> delegation tokens for the input files in FileInputFormat. But the Hive 
> context gets an external scratchDir based on the table's location, and if the 
> table's location is on the other cluster, the delegation token will not be 
> obtained. 
> So we need to obtain delegation tokens for the Hive scratchDirs before Hive 
> submits the MapReduce job (see the sketch below).
>  
> How to test:
> no test
>  
>  
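
A minimal sketch of that fix direction using Hadoop's TokenCache; the helper 
class and method names are illustrative, and the real change would sit in 
Hive's job-submission path:

{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.security.TokenCache;

public final class ScratchDirTokens {
  private ScratchDirTokens() {}

  // Resolve each scratch dir's filesystem (which may be a remote cluster,
  // e.g. hdfs://cluster2) and fetch its NameNode delegation token into the
  // job's credentials before the job is submitted.
  public static void obtainForScratchDirs(Job job, Path... scratchDirs)
      throws IOException {
    Configuration conf = job.getConfiguration();
    TokenCache.obtainTokensForNamenodes(job.getCredentials(), scratchDirs, conf);
  }
}
{code}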



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25401) Insert overwrite a table which location is on other cluster fail in kerberos cluster

2021-07-29 Thread Max Xie (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max  Xie updated HIVE-25401:

Attachment: HIVE-25401.patch
  Assignee: Max  Xie
Status: Patch Available  (was: Open)

> Insert overwrite  a table which location is on other cluster fail  in 
> kerberos cluster
> --
>
> Key: HIVE-25401
> URL: https://issues.apache.org/jira/browse/HIVE-25401
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.2, 2.3.0
> Environment: hive 2.3 
> hadoop3 cluster with kerberos 
>Reporter: Max  Xie
>Assignee: Max  Xie
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HIVE-25401.patch, image-2021-07-29-14-25-23-418.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We have two HDFS clusters with Kerberos security, which means that MapReduce 
> tasks need delegation tokens to authenticate with the NameNode when Hive on 
> MapReduce runs.
> Insert overwrite into a table whose location is on the other cluster fails in 
> a Kerberos cluster. For example, 
>  # yarn cluster's default fs is hdfs://cluster1
>  # tb1's location is hdfs://cluster1/tb1
>  # tb2's location is hdfs://cluster2/tb2 
>  # the SQL `INSERT OVERWRITE TABLE tb2 SELECT * FROM tb1` run on the yarn 
> cluster will fail
>  
> reduce task error log:
> !image-2021-07-29-14-25-23-418.png!
> How to fix:
> After digging into it, we found that the MapReduce job only obtains 
> delegation tokens for the input files in FileInputFormat. But the Hive 
> context gets an external scratchDir based on the table's location, and if the 
> table's location is on the other cluster, the delegation token will not be 
> obtained. 
> So we need to obtain delegation tokens for the Hive scratchDirs before Hive 
> submits the MapReduce job.
>  
> How to test:
> no test
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)