[ 
https://issues.apache.org/jira/browse/HIVE-15396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15937547#comment-15937547
 ] 

Sahil Takiar commented on HIVE-15396:
-------------------------------------

The attached patch is pretty simple. It changes some logic in {{CreateTableDesc}} 
where the stats entries for a table are initialized to 0. Originally, the logic 
would only initialize basic stats for managed tables without a {{LOCATION}} 
specified. This patch changes that logic so that basic stats are initialized for 
all managed tables, and are now skipped only for {{EXTERNAL}} tables (I 
believe that may have been the original intention?). Once stats are properly 
initialized for a table, their collection proceeds successfully.
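
For readers who want the change in code form, here is a rough sketch of the decision being adjusted. It is not the actual {{CreateTableDesc}} code: the class name, method names, and boolean parameters are hypothetical, and the "before" condition is an assumption inferred from the issue description (managed tables created with a {{LOCATION}} clause got no initial stats).

```java
// Hypothetical sketch only; names and signatures are illustrative and do not
// come from the Hive code base.
public class StatsInitSketch {

    // Assumed behavior before the patch: stats entries are zero-initialized
    // only for managed tables created without a LOCATION clause.
    static boolean initStatsBefore(boolean isExternal, boolean hasLocation) {
        return !isExternal && !hasLocation;
    }

    // Behavior described for the patch: every managed table gets its stats
    // entries initialized; only EXTERNAL tables are skipped.
    static boolean initStatsAfter(boolean isExternal, boolean hasLocation) {
        return !isExternal;
    }

    public static void main(String[] args) {
        // The case from the issue: a managed table created with LOCATION.
        boolean isExternal = false;
        boolean hasLocation = true;
        System.out.println("before patch: " + initStatsBefore(isExternal, hasLocation));
        System.out.println("after patch:  " + initStatsAfter(isExternal, hasLocation));
    }
}
```

The only case whose outcome changes between the two methods is the managed table with a {{LOCATION}}, which is the case reported in this issue.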

I'm guessing there will be some qtest failures, so I'll fix those before adding 
any additional tests.

[~pxiong] I believe this patch modifies some of the code from HIVE-13341 - 
could you take a look at this patch and let me know what you think?

> Basic Stats are not collected for managed tables with LOCATION specified
> ------------------------------------------------------------------------
>
>                 Key: HIVE-15396
>                 URL: https://issues.apache.org/jira/browse/HIVE-15396
>             Project: Hive
>          Issue Type: Bug
>          Components: Statistics
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>         Attachments: HIVE-15396.1.patch
>
>
> Basic stats are not collected when a managed table is created with a 
> {{LOCATION}} clause.
> {code}
> 0: jdbc:hive2://localhost:10000> create table hdfs_1 (col int);
> 0: jdbc:hive2://localhost:10000> describe formatted hdfs_1;
> +-------------------------------+------------------------------------------------------------+-----------------------------+
> |           col_name            |                         data_type                          |           comment           |
> +-------------------------------+------------------------------------------------------------+-----------------------------+
> | # col_name                    | data_type                                                  | comment                     |
> |                               | NULL                                                       | NULL                        |
> | col                           | int                                                        |                             |
> |                               | NULL                                                       | NULL                        |
> | # Detailed Table Information  | NULL                                                       | NULL                        |
> | Database:                     | default                                                    | NULL                        |
> | Owner:                        | anonymous                                                  | NULL                        |
> | CreateTime:                   | Wed Mar 22 18:09:19 PDT 2017                               | NULL                        |
> | LastAccessTime:               | UNKNOWN                                                    | NULL                        |
> | Retention:                    | 0                                                          | NULL                        |
> | Location:                     | file:/warehouse/hdfs_1                                     | NULL                        |
> | Table Type:                   | MANAGED_TABLE                                              | NULL                        |
> | Table Parameters:             | NULL                                                       | NULL                        |
> |                               | COLUMN_STATS_ACCURATE                                      | {\"BASIC_STATS\":\"true\"}  |
> |                               | numFiles                                                   | 0                           |
> |                               | numRows                                                    | 0                           |
> |                               | rawDataSize                                                | 0                           |
> |                               | totalSize                                                  | 0                           |
> |                               | transient_lastDdlTime                                      | 1490231359                  |
> |                               | NULL                                                       | NULL                        |
> | # Storage Information         | NULL                                                       | NULL                        |
> | SerDe Library:                | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe         | NULL                        |
> | InputFormat:                  | org.apache.hadoop.mapred.TextInputFormat                   | NULL                        |
> | OutputFormat:                 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | NULL                        |
> | Compressed:                   | No                                                         | NULL                        |
> | Num Buckets:                  | -1                                                         | NULL                        |
> | Bucket Columns:               | []                                                         | NULL                        |
> | Sort Columns:                 | []                                                         | NULL                        |
> | Storage Desc Params:          | NULL                                                       | NULL                        |
> |                               | serialization.format                                       | 1                           |
> +-------------------------------+------------------------------------------------------------+-----------------------------+
> 0: jdbc:hive2://localhost:10000> create table s3_1 (col int) location 's3a://[bucket]/test-tables/s3-1';
> 0: jdbc:hive2://localhost:10000> describe formatted s3_1;
> +-------------------------------+------------------------------------------------------------+-----------------------+
> |           col_name            |                         data_type                          |        comment        |
> +-------------------------------+------------------------------------------------------------+-----------------------+
> | # col_name                    | data_type                                                  | comment               |
> |                               | NULL                                                       | NULL                  |
> | col                           | int                                                        |                       |
> |                               | NULL                                                       | NULL                  |
> | # Detailed Table Information  | NULL                                                       | NULL                  |
> | Database:                     | default                                                    | NULL                  |
> | Owner:                        | anonymous                                                  | NULL                  |
> | CreateTime:                   | Wed Mar 22 18:10:01 PDT 2017                               | NULL                  |
> | LastAccessTime:               | UNKNOWN                                                    | NULL                  |
> | Retention:                    | 0                                                          | NULL                  |
> | Location:                     | s3a://[bucket]/test-tables/s3-1                            | NULL                  |
> | Table Type:                   | MANAGED_TABLE                                              | NULL                  |
> | Table Parameters:             | NULL                                                       | NULL                  |
> |                               | transient_lastDdlTime                                      | 1490231401            |
> |                               | NULL                                                       | NULL                  |
> | # Storage Information         | NULL                                                       | NULL                  |
> | SerDe Library:                | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe         | NULL                  |
> | InputFormat:                  | org.apache.hadoop.mapred.TextInputFormat                   | NULL                  |
> | OutputFormat:                 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | NULL                  |
> | Compressed:                   | No                                                         | NULL                  |
> | Num Buckets:                  | -1                                                         | NULL                  |
> | Bucket Columns:               | []                                                         | NULL                  |
> | Sort Columns:                 | []                                                         | NULL                  |
> | Storage Desc Params:          | NULL                                                       | NULL                  |
> |                               | serialization.format                                       | 1                     |
> +-------------------------------+------------------------------------------------------------+-----------------------+
> {code}
> The {{describe formatted}} output for the s3 table shows no stats at all. 
> Furthermore, inserting into the s3 table does not collect the {{numRows}} 
> stat for it.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
