[ https://issues.apache.org/jira/browse/IMPALA-9175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Armstrong updated IMPALA-9175:
----------------------------------
    Priority: Major  (was: Blocker)

> Revisit the error handling logics in ORC scanner
> ------------------------------------------------
>
>                 Key: IMPALA-9175
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9175
>             Project: IMPALA
>          Issue Type: Task
>            Reporter: Quanlong Huang
>            Assignee: Norbert Luksa
>            Priority: Major
>
> This is a task to revisit all of the error handling logic in the ORC scanner
> and compare it with the Parquet scanner. For each kind of error handled by
> the Parquet scanner, make sure the ORC scanner already handles it; otherwise,
> create separate JIRAs for the missing cases.
> We also need to check whether the exposed error messages are sufficient for
> debugging. For instance, one frequently encountered error when Impala has
> stale metadata for an ORC file is:
> {code:java}
> Encountered parse error in tail of ORC file
> hdfs://hadoop2cluster/user/hive-0.13.1/warehouse/bi_ucar.db/alliance_driver_stat_hour_api/dt=2019-08-09/part-00006:
> Invalid ORC postscript length
> {code}
> It'd be better to also print the postscript length we read and the file size,
> so users can tell whether the file is corrupt (and the data needs to be
> regenerated) or the metadata is stale (and a refresh is needed). This may
> require changes in the ORC library.
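A minimal sketch of the richer error message suggested in this issue, written in Python for brevity. The function names are hypothetical and are not part of Impala or the ORC library. The only format fact it relies on is that the last byte of an ORC file stores the PostScript length; the bound check is a simplified assumption, not the ORC library's exact validation. Comparing the file size Impala has cached with the size actually on storage is what separates the two cases: a mismatch points to stale metadata, while matching sizes point to a corrupt file.

{code:python}
from typing import Optional

# Hypothetical helpers for illustration only; not Impala or ORC library APIs.


def postscript_length(tail: bytes) -> int:
    """Return the PostScript length stored in the last byte of an ORC file."""
    if not tail:
        raise ValueError("cannot read PostScript length from an empty buffer")
    return tail[-1]


def is_plausible_postscript_length(ps_len: int, file_size: int) -> bool:
    """Simplified bound (an assumption): the PostScript plus its one-byte
    length field must fit inside the file."""
    return ps_len + 1 <= file_size


def describe_postscript_error(path: str, ps_len: int, metadata_size: int,
                              actual_size: Optional[int] = None) -> str:
    """Build an error message that includes the values needed to tell a
    corrupt file apart from stale metadata."""
    msg = (f"Invalid ORC postscript length in {path}: read {ps_len}, "
           f"file size from metadata {metadata_size}")
    if actual_size is None:
        # Without the real size we can only report what was read.
        return msg
    if actual_size != metadata_size:
        # The file changed after Impala cached its size.
        return (msg + f", actual file size {actual_size}; "
                "metadata is likely stale, try REFRESH")
    # Sizes agree, so the tail bytes themselves are suspect.
    return msg + "; sizes match, so the file itself may be corrupt"


if __name__ == "__main__":
    ps_len = postscript_length(b"ORC\x00\xc8")  # last byte 0xc8 is 200
    if not is_plausible_postscript_length(ps_len, 100):
        print(describe_postscript_error("part-00006", ps_len, 100,
                                        actual_size=4096))
{code}

Reporting both sizes, rather than only the verdict, keeps the message useful even when the heuristic guesses wrong.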