[jira] [Updated] (SPARK-28930) Spark DESC FORMATTED TABLENAME information display issues

jobit mathew (Jira) Mon, 02 Sep 2019 22:56:19 -0700


     [ 
https://issues.apache.org/jira/browse/SPARK-28930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


jobit mathew updated SPARK-28930:
---------------------------------
    Description: 
Spark's DESC FORMATTED TABLENAME output has several display issues: it shows an
incorrect *Last Access time*, and some other parts of the output could be
presented more clearly.

Test steps:
 1. Open spark-sql.
 2. Create a partitioned table:
 CREATE EXTERNAL TABLE IF NOT EXISTS employees_info_extended (
   id INT, name STRING, usd_flag STRING, salary DOUBLE,
   deductions MAP<STRING, DOUBLE>, address STRING )
 PARTITIONED BY (entrytime STRING) STORED AS TEXTFILE
 LOCATION 'hdfs://hacluster/user/sparkhive/warehouse';
 3. From spark-sql, check the table description:
 desc formatted tablename;
 4. From the Scala shell, check the table description:
 sql("desc formatted tablename").show()

*Issue 1:*
 When a column has no comment, the Spark Scala shell shows *"null" in
lowercase*, while Hive beeline, Spark beeline, and spark-sql all show it in
*uppercase "NULL"*. It would be better to show the same value everywhere.

 
{code:java}
scala> sql("desc formatted employees_info_extended").show(false);
+----------------------------+----------------------------+-------+
|col_name                    |data_type                   |comment|
+----------------------------+----------------------------+-------+
|id                          |int                         |null   |
|name                        |string                      |null   |
|usd_flag                    |string                      |null   |
|salary                      |double                      |null   |
|deductions                  |map<string,double>          |null   |
|address                     |string                      |null   |
|entrytime                   |string                      |null   |
|# Partition Information     |                            |       |
|# col_name                  |data_type                   |comment|
|entrytime                   |string                      |null   |
|                            |                            |       |
|# Detailed Table Information|                            |       |
|Database                    |sparkdb__                   |       |
|Table                       |employees_info_extended     |       |
|Owner                       |root                        |       |
|Created Time                |Tue Aug 20 13:42:06 CST 2019|       |
|Last Access                 |Thu Jan 01 08:00:00 CST 1970|       |
|Created By                  |Spark 2.4.3                 |       |
|Type                        |EXTERNAL                    |       |
|Provider                    |hive                        |       |
+----------------------------+----------------------------+-------+
only showing top 20 rows

scala>
{code}
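One way to get the same rendering everywhere is to map a missing comment to a single display token before printing. The following is only a minimal Python sketch of that idea, not Spark's actual code; `render_comment` and `NULL_TOKEN` are hypothetical names introduced for illustration.

```python
from typing import Optional

# Hypothetical display token. Per the report, Hive beeline, Spark beeline
# and spark-sql print "NULL", so the sketch uses the uppercase form.
NULL_TOKEN = "NULL"


def render_comment(comment: Optional[str]) -> str:
    """Return the text to display for a column comment.

    A missing comment (None) is rendered with one shared token; an empty
    string is treated as a real, empty comment and returned unchanged.
    """
    return NULL_TOKEN if comment is None else comment


# Sample rows: (col_name, data_type, comment)
rows = [("id", "int", None), ("name", "string", "Employee name")]
rendered = [(col, dtype, render_comment(c)) for col, dtype, c in rows]
for row in rendered:
    print("\t".join(row))
```

Keeping the token in one place means every client that goes through the same helper prints the same text.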
*Issue 2:*
 In spark-sql, "desc formatted tablename" does not show the header
(# col_name, data_type, comment) at the top of the query result, although the
header is shown above the partition description. For better readability, show
the header at the top of the query result as well. Unlike spark-sql,
spark-beeline and Hive beeline do show this header.
{code:java}
spark-sql> desc formatted employees_info_extended1;
id int NULL
name string NULL
usd_flag string NULL
salary double NULL
deductions map<string,double> NULL
address string NULL
entrytime string NULL
# Partition Information
# col_name data_type comment
entrytime string NULL

# Detailed Table Information
Database sparkdb__
Table employees_info_extended1
Owner spark
Created Time Tue Aug 20 14:50:37 CST 2019
Last Access Thu Jan 01 08:00:00 CST 1970
Created By Spark 2.3.2.0201
Type EXTERNAL
Provider hive
Table Properties [transient_lastDdlTime=1566286655]
Location hdfs://hacluster/user/sparkhive/warehouse
Serde Library org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat org.apache.hadoop.mapred.TextInputFormat
OutputFormat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Storage Properties [serialization.format=1]
Partition Provider Catalog
Time taken: 0.477 seconds, Fetched 27 row(s)
spark-sql>

Output from spark-beeline, which does show the header:
0: jdbc:hive2://10.186.60.158:23040/default> desc formatted employees;
+-------------------------------+---------------------------------------------------------------------------------+------------------+--+
|           col_name            |                                    data_type                                    |     comment      |
+-------------------------------+---------------------------------------------------------------------------------+------------------+--+
| name                          | string                                                                          | Employee name    |
| salary                        | float                                                                           | Employee salary  |
|                               |                                                                                 |                  |
| # Detailed Table Information  |                                                                                 |                  |
| Database                      | sparkdb__                                                                       |                  |
| Table                         | employees                                                                       |                  |
| Owner                         | spark                                                                           |                  |
| Created Time                  | Mon Aug 26 15:25:01 CST 2019                                                    |                  |
| Last Access                   | Thu Jan 01 08:00:00 CST 1970                                                    |                  |
| Created By                    | Spark 2.3.2.0201                                                                |                  |
| Type                          | MANAGED                                                                         |                  |
| Provider                      | hive                                                                            |                  |
| Comment                       | Description of the table                                                        |                  |
| Table Properties              | [transient_lastDdlTime=1566804669, creator=me, created_at=2012-01-02 10:00:00]  |                  |
| Statistics                    | 34 bytes                                                                        |                  |
| Location                      | hdfs://hacluster/user/sparkhive/warehouse/sparkdb__.db/employees                |                  |
| Serde Library                 | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe                              |                  |
| InputFormat                   | org.apache.hadoop.mapred.TextInputFormat                                        |                  |
| OutputFormat                  | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat                      |                  |
| Storage Properties            | [serialization.format=1]                                                        |                  |
| Partition Provider            | Catalog                                                                         |                  |
+-------------------------------+---------------------------------------------------------------------------------+------------------+--+
21 rows selected (0.257 seconds)
0: jdbc:hive2://10.186.60.158:23040/default>


{code}
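The requested behaviour amounts to putting one header row in front of the column rows before they are printed. The following is a rough Python sketch of that idea under stated assumptions: `HEADER` and `with_header` are hypothetical names, and the rows are plain tuples rather than anything taken from Spark's code.

```python
from typing import List, Tuple

Row = Tuple[str, str, str]

# Header text as it already appears above the partition description.
HEADER: Row = ("# col_name", "data_type", "comment")


def with_header(rows: List[Row]) -> List[Row]:
    """Return the rows with the header first, without adding it twice."""
    if rows and rows[0] == HEADER:
        return list(rows)
    return [HEADER] + list(rows)


# Sample column rows as spark-sql prints them today (no leading header).
column_rows = [("id", "int", "NULL"), ("name", "string", "NULL")]
for col, dtype, comment in with_header(column_rows):
    print(f"{col}\t{dtype}\t{comment}")
```

The check on the first row keeps the helper safe to call on output that already starts with the header.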
 

*Issue 3:*
 I created the table on Aug 20, and the Created Time is shown correctly. *But
Last Access shows Jan 01 1970*, which is earlier than the Created Time. That
value is the Unix epoch (timestamp 0) rendered in UTC+8, which suggests the
last access time was never actually recorded. It would be better to show the
correct date and time, or else UNKNOWN.
 *[Created Time,Tue Aug 20 13:42:06 CST 2019,]*
 *[Last Access,Thu Jan 01 08:00:00 CST 1970,]*
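The date in the Last Access row is what timestamp 0 looks like in a UTC+8 zone, and the "show UNKNOWN" suggestion can be expressed as a small guard before formatting. The sketch below is illustrative Python, not Spark's implementation; `format_last_access` is a hypothetical helper, and treating CST as a fixed UTC+8 offset is an assumption based on the 08:00:00 shown in the output.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the reporter's "CST" is a fixed UTC+8 offset.
CST = timezone(timedelta(hours=8), "CST")
DISPLAY_FORMAT = "%a %b %d %H:%M:%S %Z %Y"


def format_last_access(epoch_millis: int) -> str:
    """Hypothetical formatter: non-positive timestamps mean 'never recorded'."""
    if epoch_millis <= 0:
        return "UNKNOWN"
    moment = datetime.fromtimestamp(epoch_millis / 1000, tz=CST)
    return moment.strftime(DISPLAY_FORMAT)


# Timestamp 0 formatted without the guard reproduces the reported value.
epoch_zero = datetime.fromtimestamp(0, tz=CST).strftime(DISPLAY_FORMAT)
print(epoch_zero)             # Thu Jan 01 08:00:00 CST 1970
print(format_last_access(0))  # UNKNOWN
```

With the guard in place an unset value no longer appears as a date earlier than the Created Time.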

  was:
Spark's DESC FORMATTED TABLENAME output has several display issues: it shows an
incorrect *Last Access time*, and some other parts of the output could be
presented more clearly.

Test steps:
 1. Open spark-sql.
 2. Create a partitioned table:
 CREATE EXTERNAL TABLE IF NOT EXISTS employees_info_extended (
   id INT, name STRING, usd_flag STRING, salary DOUBLE,
   deductions MAP<STRING, DOUBLE>, address STRING )
 PARTITIONED BY (entrytime STRING) STORED AS TEXTFILE
 LOCATION 'hdfs://hacluster/user/sparkhive/warehouse';
 3. From spark-sql, check the table description:
 desc formatted tablename;
 4. From the Scala shell, check the table description:
 sql("desc formatted tablename").show()

*Issue 1:*
 When a column has no comment, the Spark Scala shell shows *"null" in
lowercase*, while Hive beeline, Spark beeline, and spark-sql all show it in
*uppercase "NULL"*. It would be better to show the same value everywhere.

 
{code}
scala> sql("desc formatted employees_info_extended").show(false);
+----------------------------+----------------------------+-------+
|col_name                    |data_type                   |comment|
+----------------------------+----------------------------+-------+
|id                          |int                         |null   |
|name                        |string                      |null   |
|usd_flag                    |string                      |null   |
|salary                      |double                      |null   |
|deductions                  |map<string,double>          |null   |
|address                     |string                      |null   |
|entrytime                   |string                      |null   |
|# Partition Information     |                            |       |
|# col_name                  |data_type                   |comment|
|entrytime                   |string                      |null   |
|                            |                            |       |
|# Detailed Table Information|                            |       |
|Database                    |sparkdb__                   |       |
|Table                       |employees_info_extended     |       |
|Owner                       |root                        |       |
|Created Time                |Tue Aug 20 13:42:06 CST 2019|       |
|Last Access                 |Thu Jan 01 08:00:00 CST 1970|       |
|Created By                  |Spark 2.4.3                 |       |
|Type                        |EXTERNAL                    |       |
|Provider                    |hive                        |       |
+----------------------------+----------------------------+-------+
only showing top 20 rows

scala>
{code}

*Issue 2:*
 In spark-sql, "desc formatted tablename" does not show the header
(# col_name, data_type, comment) at the top of the query result, although the
header is shown above the partition description. For better readability, show
the header at the top of the query result as well.

{code}
spark-sql> desc formatted employees_info_extended1;
id int NULL
name string NULL
usd_flag string NULL
salary double NULL
deductions map<string,double> NULL
address string NULL
entrytime string NULL
# Partition Information
# col_name data_type comment
entrytime string NULL

# Detailed Table Information
Database sparkdb__
Table employees_info_extended1
Owner spark
Created Time Tue Aug 20 14:50:37 CST 2019
Last Access Thu Jan 01 08:00:00 CST 1970
Created By Spark 2.3.2.0201
Type EXTERNAL
Provider hive
Table Properties [transient_lastDdlTime=1566286655]
Location hdfs://hacluster/user/sparkhive/warehouse
Serde Library org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat org.apache.hadoop.mapred.TextInputFormat
OutputFormat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Storage Properties [serialization.format=1]
Partition Provider Catalog
Time taken: 0.477 seconds, Fetched 27 row(s)
spark-sql>
{code}
 

*Issue 3:*
 I created the table on Aug 20, and the Created Time is shown correctly. *But
Last Access shows Jan 01 1970*, which is earlier than the Created Time. It
would be better to show the correct date and time, or else UNKNOWN.
 *[Created Time,Tue Aug 20 13:42:06 CST 2019,]*
 *[Last Access,Thu Jan 01 08:00:00 CST 1970,]*


> Spark DESC FORMATTED TABLENAME information display issues
> ---------------------------------------------------------
>
>                 Key: SPARK-28930
>                 URL: https://issues.apache.org/jira/browse/SPARK-28930
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.3
>            Reporter: jobit mathew
>            Priority: Minor
>



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

