[ 
https://issues.apache.org/jira/browse/SPARK-27403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sujith Chacko updated SPARK-27403:
----------------------------------
    Description: 
The system shall update the table statistics automatically if the user sets 
spark.sql.statistics.size.autoUpdate.enabled to true; currently this property 
has no effect regardless of whether it is enabled or disabled. This feature is 
similar to Hive's auto-gather feature, where statistics are automatically 
computed by default when the feature is enabled.
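
A minimal spark-shell sketch of how the expected behaviour can be checked (the 
explicit conf.set call and the filter on the Statistics row are illustrative 
additions, not taken from the original steps):

scala> // Enable automatic size update of table statistics; set this before the insert.
scala> spark.conf.set("spark.sql.statistics.size.autoUpdate.enabled", "true")

scala> // After an insert into table1, a 'Statistics' row (size in bytes) is expected
scala> // in the detailed table information if the flag takes effect.
scala> spark.sql("desc extended table1").filter("col_name = 'Statistics'").show(false)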

Reference:

[https://cwiki.apache.org/confluence/display/Hive/StatsDev]

Reproducing steps:

scala> spark.sql("create table table1 (name string,age int) stored as 
parquet")scala> spark.sql("insert into table1 select 'a',29")
 res2: org.apache.spark.sql.DataFrame = []

scala> spark.sql("desc extended table1").show(false)
 
+----------------------------+--------------------------------------------------------------+-------+
|col_name                    |data_type                                                     |comment|
+----------------------------+--------------------------------------------------------------+-------+
|name                        |string                                                        |null   |
|age                         |int                                                           |null   |
|                            |                                                              |       |
|# Detailed Table Information|                                                              |       |
|Database                    |default                                                       |       |
|Table                       |table1                                                        |       |
|Owner                       |Administrator                                                 |       |
|Created Time                |Sun Apr 07 23:41:56 IST 2019                                  |       |
|Last Access                 |Thu Jan 01 05:30:00 IST 1970                                  |       |
|Created By                  |Spark 2.4.1                                                   |       |
|Type                        |MANAGED                                                       |       |
|Provider                    |hive                                                          |       |
|Table Properties            |[transient_lastDdlTime=1554660716]                            |       |
|Location                    |file:/D:/spark-2.4.1-bin-hadoop2.7/bin/spark-warehouse/table1 |       |
|Serde Library               |org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe   |       |
|InputFormat                 |org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat |       |
|OutputFormat                |org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat|       |
|Storage Properties          |[serialization.format=1]                                      |       |
|Partition Provider          |Catalog                                                       |       |
+----------------------------+--------------------------------------------------------------+-------+
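
No Statistics entry appears in the detailed table information above, i.e. the 
table size was not updated by the insert. For comparison, a manual size-only 
statistics collection (standard Spark SQL, shown here only as an illustrative 
workaround) is expected to populate that entry:

scala> // Compute size-only statistics manually (NOSCAN avoids a full table scan).
scala> spark.sql("analyze table table1 compute statistics noscan")

scala> spark.sql("desc extended table1").filter("col_name = 'Statistics'").show(false)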

  was:
scala> spark.sql("insert into table1 select 'a',29")
res2: org.apache.spark.sql.DataFrame = []

scala> spark.sql("desc extended table1").show(false)
+----------------------------+--------------------------------------------------------------+-------+
|col_name                    |data_type                                                     |comment|
+----------------------------+--------------------------------------------------------------+-------+
|name                        |string                                                        |null   |
|age                         |int                                                           |null   |
|                            |                                                              |       |
|# Detailed Table Information|                                                              |       |
|Database                    |default                                                       |       |
|Table                       |table1                                                        |       |
|Owner                       |Administrator                                                 |       |
|Created Time                |Sun Apr 07 23:41:56 IST 2019                                  |       |
|Last Access                 |Thu Jan 01 05:30:00 IST 1970                                  |       |
|Created By                  |Spark 2.4.1                                                   |       |
|Type                        |MANAGED                                                       |       |
|Provider                    |hive                                                          |       |
|Table Properties            |[transient_lastDdlTime=1554660716]                            |       |
|Location                    |file:/D:/spark-2.4.1-bin-hadoop2.7/bin/spark-warehouse/table1 |       |
|Serde Library               |org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe   |       |
|InputFormat                 |org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat |       |
|OutputFormat                |org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat|       |
|Storage Properties          |[serialization.format=1]                                      |       |
|Partition Provider          |Catalog                                                       |       |
+----------------------------+--------------------------------------------------------------+-------+


> Failed to update the table size automatically even though 
> spark.sql.statistics.size.autoUpdate.enabled is set to true
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-27403
>                 URL: https://issues.apache.org/jira/browse/SPARK-27403
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.1
>            Reporter: Sujith Chacko
>            Priority: Major
>
> The system shall update the table statistics automatically if the user sets 
> spark.sql.statistics.size.autoUpdate.enabled to true; currently this property 
> has no effect regardless of whether it is enabled or disabled. This feature 
> is similar to Hive's auto-gather feature, where statistics are automatically 
> computed by default when the feature is enabled.
> Reference:
> [https://cwiki.apache.org/confluence/display/Hive/StatsDev]
> Reproducing steps:
> scala> spark.sql("create table table1 (name string,age int) stored as 
> parquet")scala> spark.sql("insert into table1 select 'a',29")
>  res2: org.apache.spark.sql.DataFrame = []
> scala> spark.sql("desc extended table1").show(false)
>  
> +----------------------------+--------------------------------------------------------------+-------+
> |col_name                    |data_type                                                     |comment|
> +----------------------------+--------------------------------------------------------------+-------+
> |name                        |string                                                        |null   |
> |age                         |int                                                           |null   |
> |                            |                                                              |       |
> |# Detailed Table Information|                                                              |       |
> |Database                    |default                                                       |       |
> |Table                       |table1                                                        |       |
> |Owner                       |Administrator                                                 |       |
> |Created Time                |Sun Apr 07 23:41:56 IST 2019                                  |       |
> |Last Access                 |Thu Jan 01 05:30:00 IST 1970                                  |       |
> |Created By                  |Spark 2.4.1                                                   |       |
> |Type                        |MANAGED                                                       |       |
> |Provider                    |hive                                                          |       |
> |Table Properties            |[transient_lastDdlTime=1554660716]                            |       |
> |Location                    |file:/D:/spark-2.4.1-bin-hadoop2.7/bin/spark-warehouse/table1 |       |
> |Serde Library               |org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe   |       |
> |InputFormat                 |org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat |       |
> |OutputFormat                |org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat|       |
> |Storage Properties          |[serialization.format=1]                                      |       |
> |Partition Provider          |Catalog                                                       |       |
> +----------------------------+--------------------------------------------------------------+-------+



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
