[ https://issues.apache.org/jira/browse/IMPALA-12918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18055226#comment-18055226 ]
ASF subversion and git services commented on IMPALA-12918:
----------------------------------------------------------
Commit f2904b1627e3e735eca3ccccd2658780360ea4c5 in impala's branch
refs/heads/master from Kunal Siyag
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=f2904b162 ]
IMPALA-12918: Validate numeric values for table stats properties
Added validation to ensure that table stats properties (numRows, totalSize,
rawDataSize) contain valid numeric values in ALTER TABLE SET TBLPROPERTIES
statements. Empty strings and non-numeric values will now cause an
AnalysisException to be thrown.
Testing:
- Added tests in AnalyzeDDLTest.java to verify validation logic
Change-Id: I5e8f2a9784edc86838a375d373e2095dd674d63d
Reviewed-on: http://gerrit.cloudera.org:8080/23857
Reviewed-by: Noemi Pap-Takacs
Tested-by: Impala Public Jenkins
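The validation described in the commit message can be sketched in Java as shown below. This is a minimal sketch, not Impala's implementation. It assumes the SET TBLPROPERTIES clause has already been parsed into a Map<String, String>. The class TableStatsPropertyValidator, its validate method, and the nested AnalysisException are hypothetical stand-ins, not Impala's analyzer classes. It also assumes "numeric" means "parses as a Java long", so negative values are accepted and surrounding whitespace is rejected. Those are design choices the commit message does not specify.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the stats-property check; not Impala's actual code.
public class TableStatsPropertyValidator {
  // Hypothetical stand-in for Impala's AnalysisException.
  public static class AnalysisException extends Exception {
    public AnalysisException(String msg) {
      super(msg);
    }
  }

  // Stats properties that must hold numeric values (names from the commit message).
  private static final Set<String> NUMERIC_STATS_KEYS =
      Set.of("numRows", "totalSize", "rawDataSize");

  // Rejects empty or non-numeric values for the stats keys; other keys pass through.
  // Assumption: "numeric" means "parses as a Java long" (negative values accepted).
  public static void validate(Map<String, String> tblProperties)
      throws AnalysisException {
    for (Map.Entry<String, String> e : tblProperties.entrySet()) {
      String key = e.getKey();
      if (!NUMERIC_STATS_KEYS.contains(key)) {
        continue;
      }
      String value = e.getValue();
      if (value == null || value.isEmpty()) {
        throw new AnalysisException(
            "Table property '" + key + "' must not be empty");
      }
      try {
        Long.parseLong(value);
      } catch (NumberFormatException ex) {
        throw new AnalysisException("Table property '" + key
            + "' must be a numeric value, got: '" + value + "'");
      }
    }
  }

  // Prints whether a property map would be accepted or rejected.
  private static void check(Map<String, String> props) {
    try {
      validate(props);
      System.out.println("accepted: " + props);
    } catch (AnalysisException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }

  public static void main(String[] args) {
    // Mirrors the problematic statement from the issue:
    // tblproperties('numRows'='', 'STATS_GENERATED_VIA_STATS_TASK'='true')
    Map<String, String> broken = new LinkedHashMap<>();
    broken.put("numRows", "");
    broken.put("STATS_GENERATED_VIA_STATS_TASK", "true");
    check(broken);

    Map<String, String> valid = new LinkedHashMap<>();
    valid.put("numRows", "1000");
    valid.put("totalSize", "2048");
    check(valid);
  }
}
```

The sketch checks keys against a fixed set. Unrelated properties such as STATS_GENERATED_VIA_STATS_TASK (value "true") therefore pass through unchanged, and only the three stats keys are constrained.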
> Do not allow non-numeric values in Hive table stats during an alter table
> -------------------------------------------------------------------------
>
> Key: IMPALA-12918
> URL: https://issues.apache.org/jira/browse/IMPALA-12918
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog
> Affects Versions: Impala 4.0.0
> Reporter: Miklos Szurap
> Assignee: Kunal Siyag
> Priority: Major
> Labels: alter, alter-table, catalog-2024, newbie, ramp-up, stats, validation
> Time Spent: 360.5h
> Remaining Estimate: 335h 40m
>
> Hive table properties are strings by nature; however, some of them have a
> special meaning and should hold numeric values, such as "totalSize",
> "numRows", and "rawDataSize".
> Impala currently allows these to be set to non-numeric values (including
> the empty string).
> Some applications (such as Spark) then fail with quite obscure
> "NumberFormatException" errors when accessing such broken tables
> (see SPARK-47444).
> Impala should also validate "alter table" statements and reject
> non-numeric values in the "totalSize", "numRows", and "rawDataSize" table
> properties.
> For example, the following statement can break the table (afterwards it
> cannot be read from Spark):
> {code}
> [impalacoordinator:21000] default> alter table t1p set
> tblproperties('numRows'='', 'STATS_GENERATED_VIA_STATS_TASK'='true');
> {code}
> Note: beeline/Hive already validates "numRows" and "rawDataSize" in alter
> table statements; "totalSize" still needs validation there too.
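The issue explains that a stored empty "numRows" later surfaces as a NumberFormatException in readers such as Spark. The hypothetical Java snippet below illustrates why. It parses candidate property values with Long.parseLong, the standard JVM parser. It does not reproduce Spark's actual parsing code (see SPARK-47444 for that). It only shows that a plain numeric parse rejects both the empty string and non-numeric text. The helper parsesAsLong is an illustrative name, not an existing API.

```java
// Hypothetical illustration; not Spark's or Impala's actual parsing code.
public class StatsValueParseDemo {
  // Returns true if 'value' parses as a Java long; a writer could run this
  // kind of pre-check before storing a stats property.
  static boolean parsesAsLong(String value) {
    try {
      Long.parseLong(value);
      return true;
    } catch (NumberFormatException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // "" is the value stored by tblproperties('numRows'='') in the issue.
    String[] candidates = {"1000", "", "abc"};
    for (String v : candidates) {
      try {
        long n = Long.parseLong(v);
        System.out.println("'" + v + "' parses as " + n);
      } catch (NumberFormatException e) {
        // The kind of error a numeric reader surfaces for a broken table.
        System.out.println("'" + v + "' fails: " + e);
      }
    }
  }
}
```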
--
This message was sent by Atlassian Jira
(v8.20.10#820010)