[ 
https://issues.apache.org/jira/browse/HIVE-15396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15949511#comment-15949511
 ] 

Sahil Takiar commented on HIVE-15396:
-------------------------------------

Good point. How about the approach in my 3rd patch? It checks whether the data 
location is empty. If it is empty, all stats are collected; if it isn't, only 
basic stats are added. I'll remove the check for {{isExternal()}}.
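
To make the idea concrete, here is a minimal sketch of that decision. It is not the code from the patch: the class, enum, and method names are hypothetical, it uses {{java.nio.file}} on a local path instead of the Hadoop {{FileSystem}} API, and treating a missing location as empty is an assumption.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch; none of these names come from the Hive code base. */
class StatsModeSketch {

    /** Which statistics to record when the table is created. */
    enum StatsMode { ALL_STATS, BASIC_STATS_ONLY }

    /**
     * Returns true if the location has no entries.
     * Assumption: a location that does not exist yet counts as empty.
     */
    static boolean isLocationEmpty(Path location) {
        if (Files.notExists(location)) {
            return true;
        }
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(location)) {
            return !entries.iterator().hasNext();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Empty location: collect all stats. Existing data: basic stats only. */
    static StatsMode chooseStatsMode(boolean locationIsEmpty) {
        return locationIsEmpty ? StatsMode.ALL_STATS : StatsMode.BASIC_STATS_ONLY;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("stats-sketch");
        // Freshly created directory: no files yet.
        System.out.println(chooseStatsMode(isLocationEmpty(dir)));
        // After a data file appears, only basic stats would be added.
        Files.createFile(dir.resolve("part-00000"));
        System.out.println(chooseStatsMode(isLocationEmpty(dir)));
    }
}
```

Note that the decision depends only on the contents of the location, not on whether the table is external, which is why the {{isExternal()}} check would no longer be needed.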

> Basic Stats are not collected for managed tables with LOCATION specified
> ------------------------------------------------------------------------
>
>                 Key: HIVE-15396
>                 URL: https://issues.apache.org/jira/browse/HIVE-15396
>             Project: Hive
>          Issue Type: Bug
>          Components: Statistics
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>         Attachments: HIVE-15396.1.patch, HIVE-15396.2.patch, 
> HIVE-15396.3.patch, HIVE-15396.4.patch
>
>
> Basic stats are not collected when a managed table is created with a 
> specified {{LOCATION}} clause.
> {code}
> 0: jdbc:hive2://localhost:10000> create table hdfs_1 (col int);
> 0: jdbc:hive2://localhost:10000> describe formatted hdfs_1;
> +-------------------------------+----------------------------------------------------+-----------------------------+
> |           col_name            |                     data_type                      |           comment           |
> +-------------------------------+----------------------------------------------------+-----------------------------+
> | # col_name                    | data_type                                          | comment                     |
> |                               | NULL                                               | NULL                        |
> | col                           | int                                                |                             |
> |                               | NULL                                               | NULL                        |
> | # Detailed Table Information  | NULL                                               | NULL                        |
> | Database:                     | default                                            | NULL                        |
> | Owner:                        | anonymous                                          | NULL                        |
> | CreateTime:                   | Wed Mar 22 18:09:19 PDT 2017                       | NULL                        |
> | LastAccessTime:               | UNKNOWN                                            | NULL                        |
> | Retention:                    | 0                                                  | NULL                        |
> | Location:                     | file:/warehouse/hdfs_1                             | NULL                        |
> | Table Type:                   | MANAGED_TABLE                                      | NULL                        |
> | Table Parameters:             | NULL                                               | NULL                        |
> |                               | COLUMN_STATS_ACCURATE                              | {\"BASIC_STATS\":\"true\"}  |
> |                               | numFiles                                           | 0                           |
> |                               | numRows                                            | 0                           |
> |                               | rawDataSize                                        | 0                           |
> |                               | totalSize                                          | 0                           |
> |                               | transient_lastDdlTime                              | 1490231359                  |
> |                               | NULL                                               | NULL                        |
> | # Storage Information         | NULL                                               | NULL                        |
> | SerDe Library:                | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL                        |
> | InputFormat:                  | org.apache.hadoop.mapred.TextInputFormat           | NULL                        |
> | OutputFormat:                 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | NULL                        |
> | Compressed:                   | No                                                 | NULL                        |
> | Num Buckets:                  | -1                                                 | NULL                        |
> | Bucket Columns:               | []                                                 | NULL                        |
> | Sort Columns:                 | []                                                 | NULL                        |
> | Storage Desc Params:          | NULL                                               | NULL                        |
> |                               | serialization.format                               | 1                           |
> +-------------------------------+----------------------------------------------------+-----------------------------+
> 0: jdbc:hive2://localhost:10000> create table s3_1 (col int) location 's3a://[bucket]/test-tables/s3-1';
> 0: jdbc:hive2://localhost:10000> describe formatted s3_1;
> +-------------------------------+----------------------------------------------------+-----------------------+
> |           col_name            |                     data_type                      |        comment        |
> +-------------------------------+----------------------------------------------------+-----------------------+
> | # col_name                    | data_type                                          | comment               |
> |                               | NULL                                               | NULL                  |
> | col                           | int                                                |                       |
> |                               | NULL                                               | NULL                  |
> | # Detailed Table Information  | NULL                                               | NULL                  |
> | Database:                     | default                                            | NULL                  |
> | Owner:                        | anonymous                                          | NULL                  |
> | CreateTime:                   | Wed Mar 22 18:10:01 PDT 2017                       | NULL                  |
> | LastAccessTime:               | UNKNOWN                                            | NULL                  |
> | Retention:                    | 0                                                  | NULL                  |
> | Location:                     | s3a://[bucket]/test-tables/s3-1                    | NULL                  |
> | Table Type:                   | MANAGED_TABLE                                      | NULL                  |
> | Table Parameters:             | NULL                                               | NULL                  |
> |                               | transient_lastDdlTime                              | 1490231401            |
> |                               | NULL                                               | NULL                  |
> | # Storage Information         | NULL                                               | NULL                  |
> | SerDe Library:                | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL                  |
> | InputFormat:                  | org.apache.hadoop.mapred.TextInputFormat           | NULL                  |
> | OutputFormat:                 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | NULL                  |
> | Compressed:                   | No                                                 | NULL                  |
> | Num Buckets:                  | -1                                                 | NULL                  |
> | Bucket Columns:               | []                                                 | NULL                  |
> | Sort Columns:                 | []                                                 | NULL                  |
> | Storage Desc Params:          | NULL                                               | NULL                  |
> |                               | serialization.format                               | 1                     |
> +-------------------------------+----------------------------------------------------+-----------------------+
> {code}
> The {{describe formatted}} output for the s3 table shows no stats at all. 
> Furthermore, inserting into the s3 table does not collect the {{numRows}} 
> stat.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
