Re: [PR] [SPARK-47444][SQL] Validate numeric table stats in ALTER TABLE SET TBLPROPERTIES [spark]

via GitHub Wed, 29 Apr 2026 22:34:30 -0700


sarutak commented on PR #55550:
URL: https://github.com/apache/spark/pull/55550#issuecomment-4349946083


   Hi @shrirangmhalgi, `numRows`, `totalSize`, and `rawDataSize` are Hive 
Metastore's internal statistics properties, populated by `ANALYZE TABLE` and 
not intended to be set by users via `SET TBLPROPERTIES`. Also, Spark manages 
its own statistics under the `spark.sql.statistics.*` prefix (users are blocked 
from setting `spark.sql.*` keys by 
`HiveExternalCatalog.verifyTableProperties()`). The prefix-less Hive keys 
(`numRows` etc.) live in a separate namespace, which Spark's optimizer does not 
consult when Spark's own statistics are present.
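   
   To illustrate the namespace split, here is a minimal Python sketch of a 
prefix check. It is not the Spark implementation: `reserved_keys` and 
`RESERVED_PREFIX` are hypothetical names, and the only assumption taken from 
the text is that keys starting with `spark.sql.` are rejected.

```python
from typing import Iterable, List

# Assumption for illustration: the reserved prefix described in the comment.
RESERVED_PREFIX = "spark.sql."


def reserved_keys(keys: Iterable[str]) -> List[str]:
    """Return the property keys that fall under the reserved prefix."""
    return [k for k in keys if k.startswith(RESERVED_PREFIX)]


# Prefix-less Hive keys pass a check of this kind untouched,
# while a key under the reserved prefix is flagged.
user_props = ["numRows", "totalSize", "spark.sql.statistics.numRows"]
flagged = reserved_keys(user_props)
```

   A check like this only guards the `spark.sql.*` namespace, so it says 
nothing about values stored under `numRows`, `totalSize`, or `rawDataSize`.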
   
   SPARK-30262 (cited as motivation) was a read-side issue caused by Hive 
Metastore's internal behavior, not by users writing invalid values. The 
`.filter(_.nonEmpty)` fix was the appropriate approach.
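   
   As a rough Python analogue of that read-side approach (a sketch only, 
assuming the fix amounts to treating an empty string as an absent statistic; 
`read_numeric_stat` is a hypothetical helper, not Spark code):

```python
from typing import Mapping, Optional


def read_numeric_stat(props: Mapping[str, str], key: str) -> Optional[int]:
    """Return the stat as an int, or None when the key is missing or empty."""
    raw = props.get(key)
    if not raw:
        # Missing key or empty string: treat as "no statistic available",
        # mirroring the effect of filtering out empty values before parsing.
        return None
    # Non-numeric text still raises ValueError; this sketch only covers
    # the empty-value case.
    return int(raw)
```

   The point of handling this on the read side is that the reader tolerates 
whatever the Metastore stored, regardless of how the value got there.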
   
   The JIRA argues that Hive validates these properties, but the context is 
different. In Hive, `SET TBLPROPERTIES` with stats keys is a *specified 
operation*. This means validation is paired with a `STATS_GENERATED = USER` 
marker that tells the Metastore to treat the update as a user-initiated 
statistics change and update `COLUMN_STATS_ACCURATE` accordingly (see 
`AbstractAlterTablePropertiesAnalyzer` and `AlterTableSetPropertiesOperation` 
in Hive). Spark has none of this machinery, so adding validation alone would 
check input for an operation that Spark does not actually support as a stats 
update mechanism.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

