[ 
https://issues.apache.org/jira/browse/HIVE-15396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15939454#comment-15939454
 ] 

Sahil Takiar commented on HIVE-15396:
-------------------------------------

[~pxiong] can't we take the location, create a {{FileSystem}} object, and then 
run {{fs.exists()}}? If the location exists, don't set up stats; if it doesn't 
exist, set up full stats.

There is no guarantee that other processes won't write data into the 
location, but then again there is no guarantee that other processes won't write 
into {{hive.metastore.warehouse.dir}} either.
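
A rough sketch of the check I have in mind, not actual Hive code. To keep it runnable without Hadoop on the classpath it uses {{java.nio.file.Files.exists}} as a stand-in; in Hive the same test would presumably go through {{Path.getFileSystem(conf)}} and {{FileSystem.exists(Path)}}. The class and method names are hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;

public class CreateTableStatsCheck {

    // Hypothetical helper: decide whether a newly created managed table can
    // start with full basic stats (numFiles=0, numRows=0, ...).
    // If the location already exists it may hold data, so stats are skipped.
    static boolean canInitFullStats(Path location) {
        // Stand-in for Hadoop's FileSystem.exists(Path).
        return !Files.exists(location);
    }

    public static void main(String[] args) throws Exception {
        // A location that already exists: do not set up stats.
        Path existing = Files.createTempDirectory("tbl-location");
        // A location that has not been created yet: set up full stats.
        Path missing = existing.resolve("not-created-yet");

        System.out.println(canInitFullStats(existing)); // false
        System.out.println(canInitFullStats(missing));  // true
    }
}
```

As with the warehouse directory, this is only a point-in-time check; another process could still write into the location after the table is created.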

> Basic Stats are not collected when for managed tables with LOCATION specified
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-15396
>                 URL: https://issues.apache.org/jira/browse/HIVE-15396
>             Project: Hive
>          Issue Type: Bug
>          Components: Statistics
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>         Attachments: HIVE-15396.1.patch
>
>
> Basic stats are not collected when a managed table is created with a 
> specified {{LOCATION}} clause.
> {code}
> 0: jdbc:hive2://localhost:10000> create table hdfs_1 (col int);
> 0: jdbc:hive2://localhost:10000> describe formatted hdfs_1;
> +-------------------------------+----------------------------------------------------+-----------------------------+
> |           col_name            |                     data_type                      |           comment           |
> +-------------------------------+----------------------------------------------------+-----------------------------+
> | # col_name                    | data_type                                          | comment                     |
> |                               | NULL                                               | NULL                        |
> | col                           | int                                                |                             |
> |                               | NULL                                               | NULL                        |
> | # Detailed Table Information  | NULL                                               | NULL                        |
> | Database:                     | default                                            | NULL                        |
> | Owner:                        | anonymous                                          | NULL                        |
> | CreateTime:                   | Wed Mar 22 18:09:19 PDT 2017                       | NULL                        |
> | LastAccessTime:               | UNKNOWN                                            | NULL                        |
> | Retention:                    | 0                                                  | NULL                        |
> | Location:                     | file:/warehouse/hdfs_1                             | NULL                        |
> | Table Type:                   | MANAGED_TABLE                                      | NULL                        |
> | Table Parameters:             | NULL                                               | NULL                        |
> |                               | COLUMN_STATS_ACCURATE                              | {\"BASIC_STATS\":\"true\"}  |
> |                               | numFiles                                           | 0                           |
> |                               | numRows                                            | 0                           |
> |                               | rawDataSize                                        | 0                           |
> |                               | totalSize                                          | 0                           |
> |                               | transient_lastDdlTime                              | 1490231359                  |
> |                               | NULL                                               | NULL                        |
> | # Storage Information         | NULL                                               | NULL                        |
> | SerDe Library:                | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL                        |
> | InputFormat:                  | org.apache.hadoop.mapred.TextInputFormat           | NULL                        |
> | OutputFormat:                 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | NULL                        |
> | Compressed:                   | No                                                 | NULL                        |
> | Num Buckets:                  | -1                                                 | NULL                        |
> | Bucket Columns:               | []                                                 | NULL                        |
> | Sort Columns:                 | []                                                 | NULL                        |
> | Storage Desc Params:          | NULL                                               | NULL                        |
> |                               | serialization.format                               | 1                           |
> +-------------------------------+----------------------------------------------------+-----------------------------+
> 0: jdbc:hive2://localhost:10000> create table s3_1 (col int) location 's3a://[bucket]/test-tables/s3-1';
> 0: jdbc:hive2://localhost:10000> describe formatted s3_1;
> +-------------------------------+----------------------------------------------------+-----------------------+
> |           col_name            |                     data_type                      |        comment        |
> +-------------------------------+----------------------------------------------------+-----------------------+
> | # col_name                    | data_type                                          | comment               |
> |                               | NULL                                               | NULL                  |
> | col                           | int                                                |                       |
> |                               | NULL                                               | NULL                  |
> | # Detailed Table Information  | NULL                                               | NULL                  |
> | Database:                     | default                                            | NULL                  |
> | Owner:                        | anonymous                                          | NULL                  |
> | CreateTime:                   | Wed Mar 22 18:10:01 PDT 2017                       | NULL                  |
> | LastAccessTime:               | UNKNOWN                                            | NULL                  |
> | Retention:                    | 0                                                  | NULL                  |
> | Location:                     | s3a://[bucket]/test-tables/s3-1                    | NULL                  |
> | Table Type:                   | MANAGED_TABLE                                      | NULL                  |
> | Table Parameters:             | NULL                                               | NULL                  |
> |                               | transient_lastDdlTime                              | 1490231401            |
> |                               | NULL                                               | NULL                  |
> | # Storage Information         | NULL                                               | NULL                  |
> | SerDe Library:                | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL                  |
> | InputFormat:                  | org.apache.hadoop.mapred.TextInputFormat           | NULL                  |
> | OutputFormat:                 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | NULL                  |
> | Compressed:                   | No                                                 | NULL                  |
> | Num Buckets:                  | -1                                                 | NULL                  |
> | Bucket Columns:               | []                                                 | NULL                  |
> | Sort Columns:                 | []                                                 | NULL                  |
> | Storage Desc Params:          | NULL                                               | NULL                  |
> |                               | serialization.format                               | 1                     |
> +-------------------------------+----------------------------------------------------+-----------------------+
> {code}
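> The {{describe formatted}} output for the s3 table shows no basic stats: 
> {{COLUMN_STATS_ACCURATE}}, {{numFiles}}, {{numRows}}, {{rawDataSize}} and 
> {{totalSize}} are all missing. Furthermore, after inserting into the s3 
> table, the {{numRows}} stat is still not collected.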



