[jira] [Comment Edited] (HIVE-24596) Explain ddl for debugging

2021-03-06 Thread Harshit Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17293868#comment-17293868
 ] 

Harshit Gupta edited comment on HIVE-24596 at 3/7/21, 3:45 AM:
---

Yeah Sure!!

Let's assume the table definitions in [^table_definitions] and the query in 
[^query]. The explain ddl output for that query will look like this: [^output]

 

 


was (Author: harshit.gupta):
Yeah Sure!!

Let's assume the following [^table_definitions] and the following [^query]. The 
explain ddl output for the query will look like [^explain_ddl_output]

 

 

> Explain ddl for debugging
> -
>
> Key: HIVE-24596
> URL: https://issues.apache.org/jira/browse/HIVE-24596
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Harshit Gupta
>Priority: Major
>  Labels: pull-request-available
> Attachments: output, query, table_definitions
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> For debugging query issues, basic details such as table schema, statistics, 
> partition details, and query plans are needed.
> It would be good to have "explain ddl" support that generates these 
> details. This would help in recreating schema and planner issues without 
> sample data.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24596) Explain ddl for debugging

2021-03-06 Thread Harshit Gupta (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harshit Gupta updated HIVE-24596:
-
Attachment: (was: explain_ddl_output)



[jira] [Updated] (HIVE-24596) Explain ddl for debugging

2021-03-06 Thread Harshit Gupta (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harshit Gupta updated HIVE-24596:
-
Attachment: output



[jira] [Work logged] (HIVE-24471) Add support for combiner in hash mode group aggregation

2021-03-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24471?focusedWorklogId=561887=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-561887
 ]

ASF GitHub Bot logged work on HIVE-24471:
-

Author: ASF GitHub Bot
Created on: 07/Mar/21 00:52
Start Date: 07/Mar/21 00:52
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #1736:
URL: https://github.com/apache/hive/pull/1736#issuecomment-792135255


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 561887)
Time Spent: 2h  (was: 1h 50m)

> Add support for combiner in hash mode group aggregation 
> 
>
> Key: HIVE-24471
> URL: https://issues.apache.org/jira/browse/HIVE-24471
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> In map-side group aggregation, a partial grouped aggregation is computed to 
> reduce the data written to disk by the map task. In hash aggregation, 
> where the input data is not sorted, a hash table is used (with sorting also 
> performed before flushing). If the hash table grows beyond a 
> configurable limit, the data is flushed to disk and a new hash table is created. 
> If the reduction achieved by the hash table is less than the minimum hash 
> aggregation reduction calculated at compile time, the map-side aggregation is 
> converted to streaming mode. So if the first few batches of records do not 
> yield a significant reduction, the mode is switched to streaming. This may 
> hurt performance if the subsequent batches of records have fewer 
> distinct values. 
> To improve performance in both hash and streaming mode, a combiner can be 
> added to the map task after the keys are sorted. This ensures that 
> aggregation is done where possible and reduces the data written to disk.
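
The behaviour described above can be illustrated with a minimal Python sketch. It is not Hive's implementation: the function names, the parameters (max_entries, check_after, min_reduction) and the single reduction check are assumptions made for illustration only. It shows a bounded hash table that falls back to streaming when the first batch reduces poorly, and a combiner that still merges equal keys once the output is sorted.

```python
from itertools import groupby
from operator import itemgetter


def map_side_aggregate(records, max_entries=16, check_after=8, min_reduction=0.5):
    """Sum values per key with a bounded hash table (hypothetical sketch).

    Returns (emitted, mode): the partial (key, sum) pairs in emission order
    and the mode the operator finished in ("hash" or "streaming").
    """
    table = {}
    emitted = []
    rows_seen = 0
    entries_created = 0
    mode = "hash"
    for key, value in records:
        if mode == "streaming":
            # Streaming mode: rows pass through without aggregation.
            emitted.append((key, value))
            continue
        rows_seen += 1
        if key not in table:
            entries_created += 1
        table[key] = table.get(key, 0) + value
        too_big = len(table) > max_entries
        # Assumed check: entries per row above the threshold means poor reduction.
        poor_reduction = (
            rows_seen == check_after
            and entries_created / rows_seen > min_reduction
        )
        if poor_reduction:
            mode = "streaming"
        if too_big or poor_reduction:
            # Flush the table in sorted key order and start over.
            emitted.extend(sorted(table.items()))
            table.clear()
    emitted.extend(sorted(table.items()))
    return emitted, mode


def combine(pairs):
    """Combiner: sort by key, then merge adjacent pairs that share a key."""
    ordered = sorted(pairs, key=itemgetter(0))
    return [
        (key, sum(value for _, value in group))
        for key, group in groupby(ordered, key=itemgetter(0))
    ]


# First batch is all distinct, later rows repeat only two keys.
records = [(k, 1) for k in "abcdefgh"] + [("a", 1)] * 4 + [("b", 1)] * 4
emitted, mode = map_side_aggregate(records)
combined = combine(emitted)
```

In this run the operator switches to streaming after the first eight distinct keys, so the later repeated keys are emitted unaggregated; the combiner step then collapses them.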





[jira] [Work logged] (HIVE-24526) Get grouped locations of external table data using metatool.

2021-03-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24526?focusedWorklogId=561886=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-561886
 ]

ASF GitHub Bot logged work on HIVE-24526:
-

Author: ASF GitHub Bot
Created on: 07/Mar/21 00:52
Start Date: 07/Mar/21 00:52
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #1768:
URL: https://github.com/apache/hive/pull/1768#issuecomment-792135250


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.





Issue Time Tracking
---

Worklog Id: (was: 561886)
Time Spent: 40m  (was: 0.5h)

> Get grouped locations of external table data using metatool.
> 
>
> Key: HIVE-24526
> URL: https://issues.apache.org/jira/browse/HIVE-24526
> Project: Hive
>  Issue Type: Task
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24526.01.patch, HIVE-24526.02.patch, 
> HIVE-24526.03.patch, HIVE-24526.04.patch, HIVE-24526.05.patch, 
> HIVE-24526.06.patch, HIVE-24526.07.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> This task adds two new options to metatool.
> The first option, -listExtTblLocs, generates a JSON file containing a set of 
> locations that cover all external-table data locations for a 
> user-specified database.
> The second option, -diffExtTblLocs, creates a diff from two JSON files 
> generated using the first option.
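
As a rough illustration of the diff idea only, the Python sketch below compares two location listings. The JSON layout used here (a flat array of path strings), the example paths and the function name diff_locations are all assumptions; the issue text does not describe the file format metatool actually writes.

```python
import json


def diff_locations(old_json, new_json):
    """Return locations added to and removed from a listing.

    Assumes each input is a JSON array of location strings (hypothetical
    layout, not the format produced by metatool).
    """
    old = set(json.loads(old_json))
    new = set(json.loads(new_json))
    return {"added": sorted(new - old), "removed": sorted(old - new)}


# Hypothetical example listings.
before = '["hdfs://nn/warehouse/ext/sales", "hdfs://nn/data/logs"]'
after = '["hdfs://nn/warehouse/ext/sales", "hdfs://nn/data/events"]'
result = diff_locations(before, after)
```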





[jira] [Work logged] (HIVE-24529) Metastore truncates milliseconds while storing timestamp column stats

2021-03-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24529?focusedWorklogId=561832=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-561832
 ]

ASF GitHub Bot logged work on HIVE-24529:
-

Author: ASF GitHub Bot
Created on: 06/Mar/21 19:26
Start Date: 06/Mar/21 19:26
Worklog Time Spent: 10m 
  Work Description: ashish-kumar-sharma opened a new pull request #2041:
URL: https://github.com/apache/hive/pull/2041


   …column stats
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   





Issue Time Tracking
---

Worklog Id: (was: 561832)
Remaining Estimate: 0h
Time Spent: 10m

> Metastore truncates milliseconds while storing timestamp column stats
> -
>
> Key: HIVE-24529
> URL: https://issues.apache.org/jira/browse/HIVE-24529
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: Nikhil Gupta
>Assignee: Nikhil Gupta
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Steps to reproduce the issue:
> create table tnikhil (t timestamp);
> insert into tnikhil values ('2019-01-01 23:12:45.123456');
> analyze table tnikhil compute statistics for columns;
> select * from tnikhil;
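> {noformat}
> +-----------------------------+
> |          tnikhil.t          |
> +-----------------------------+
> | 2019-01-01 23:12:45.123456  |
> +-----------------------------+
> {noformat}
> desc formatted tnikhil t; 
> {noformat}
> +------------+-------------+
> | col_name   | data_type   |
> +------------+-------------+
> | col_name   | t           |
> | data_type  | timestamp   |
> | min        | 1546384365  |
> | max        | 1546384365  |
> +------------+-------------+
> {noformat}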
>  
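
The loss shown in the min/max values can be reproduced outside Hive with a small Python sketch. It assumes the inserted value is interpreted as UTC, which is consistent with the reported 1546384365; the variable names are illustrative and nothing here reflects the metastore's actual code path.

```python
from datetime import datetime, timezone

# The value inserted in the reproduction steps, taken as UTC (assumption).
ts = datetime(2019, 1, 1, 23, 12, 45, 123456, tzinfo=timezone.utc)

# Storing whole epoch seconds drops the fractional part.
epoch_seconds = int(ts.timestamp())

# Reading the stored value back cannot recover the fraction.
restored = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
lost_microseconds = ts.microsecond - restored.microsecond
```

Here epoch_seconds matches the min and max reported by desc formatted, and the .123456 fraction is gone after the round trip.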





[jira] [Updated] (HIVE-24529) Metastore truncates milliseconds while storing timestamp column stats

2021-03-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24529:
--
Labels: pull-request-available  (was: )



