[ https://issues.apache.org/jira/browse/HUDI-5276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexey Kudinkin reassigned HUDI-5276:
-------------------------------------

    Assignee: Ethan Guo  (was: Alexey Kudinkin)

> Hudi getAllQueryPartitionPaths use regular match caused Invalid input path add
> -------------------------------------------------------------------------------
>
>                 Key: HUDI-5276
>                 URL: https://issues.apache.org/jira/browse/HUDI-5276
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: yuehanwang
>            Assignee: Ethan Guo
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.13.0
>
>
> When we run a query in Hive such as:
> select mainwaybillno,
>        zonecode,
>        accountantcode,
>        baroprcode,
>        opcode,
>        row_number() over(PARTITION BY mainwaybillno, zonecode, opcode ORDER BY barscantm) sn
> from dm_kafka_rdmp_dw.fvp_core_fact_route_op_hudi_op_new_rt
> WHERE opcode IN ('50') and inc_day='20221120' limit 10;
> the MapReduce job is configured with
> mapreduce.input.fileinputformat.inputdir=hdfs://dw/hive/warehouse/dm/dm_kafka_rdmp_dw/fvp_core_fact_route_op_hudi_op_new/inc_day=20221120/opcode=50
> However, this file split was also added to the job:
> hdfs://dw/hive/warehouse/dm/dm_kafka_rdmp_dw/fvp_core_fact_route_op_hudi_op_new/inc_day=20221120/opcode=5000
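
A query for opcode=50 picking up a split under opcode=5000 is what one would expect if partition paths are compared with an unanchored prefix or pattern match. The report does not show the matching code inside getAllQueryPartitionPaths, so the following Java is only a minimal sketch under that assumption. The class PartitionPathMatch and all of its methods are hypothetical names for illustration, not Hudi APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not taken from the Hudi code base.
public class PartitionPathMatch {

    // Loose check (assumed failure mode): true whenever the candidate
    // merely starts with the query path string.
    public static boolean loosePrefixMatch(String queryPath, String candidate) {
        return candidate.startsWith(queryPath);
    }

    // Stricter check: the candidate must equal the query path, or continue
    // past it with a '/' separator, so "opcode=50" cannot match "opcode=5000".
    public static boolean componentBoundaryMatch(String queryPath, String candidate) {
        String base = stripTrailingSlash(queryPath);
        String cand = stripTrailingSlash(candidate);
        return cand.equals(base) || cand.startsWith(base + "/");
    }

    private static String stripTrailingSlash(String p) {
        return p.endsWith("/") ? p.substring(0, p.length() - 1) : p;
    }

    // Returns the partitions selected for queryPath using either check.
    public static List<String> select(String queryPath, List<String> partitions, boolean strict) {
        List<String> out = new ArrayList<>();
        for (String p : partitions) {
            boolean hit = strict
                ? componentBoundaryMatch(queryPath, p)
                : loosePrefixMatch(queryPath, p);
            if (hit) {
                out.add(p);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String table = "hdfs://dw/hive/warehouse/dm/dm_kafka_rdmp_dw/fvp_core_fact_route_op_hudi_op_new";
        String query = table + "/inc_day=20221120/opcode=50";
        List<String> partitions = Arrays.asList(
            table + "/inc_day=20221120/opcode=50",
            table + "/inc_day=20221120/opcode=5000");

        System.out.println("loose:  " + select(query, partitions, false));
        System.out.println("strict: " + select(query, partitions, true));
    }
}
```

With the two partition paths from this report, the loose check selects both directories, while the boundary-aware check selects only opcode=50. Whether the eventual fix takes this shape is not stated in the issue.
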
> The job failed and threw this exception:
> 2022-11-21 18:11:33,895 INFO [IPC Server handler 1 on 45077] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from attempt_1668750926041_1011874_m_000110_0: Error: java.lang.RuntimeException: java.lang.IllegalStateException: Invalid input path hdfs://dw/hive/warehouse/dm/dm_kafka_rdmp_dw/fvp_core_fact_route_op_hudi_op_new/inc_day=20221120/opcode=501/.00000006-2d6e-4d26-93ea-1026632abb67_20221119235956333.log.1_44-150-2
>     at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:169)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.lang.IllegalStateException: Invalid input path hdfs://dw/hive/warehouse/dm/dm_kafka_rdmp_dw/fvp_core_fact_route_op_hudi_op_new/inc_day=20221120/opcode=501/.00000006-2d6e-4d26-93ea-1026632abb67_20221119235956333.log.1_44-150-2
>     at org.apache.hadoop.hive.ql.exec.AbstractMapOperator.getNominalPath(AbstractMapOperator.java:119)
>     at org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp(MapOperator.java:452)
>     at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1106)
>     at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:482)
>     at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:160)
>     ... 8 more
> 2022-11-21 18:11:33,897 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1668750926041_1011874_m_000110_0: Error: java.lang.RuntimeException: java.lang.IllegalStateException: Invalid input path hdfs://dw/hive/warehouse/dm/dm_kafka_rdmp_dw/fvp_core_fact_route_op_hudi_op_new/inc_day=20221120/opcode=501/.00000006-2d6e-4d26-93ea-1026632abb67_20221119235956333.log.1_44-150-2
>     at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:169)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.lang.IllegalStateException: Invalid input path hdfs://dw/hive/warehouse/dm/dm_kafka_rdmp_dw/fvp_core_fact_route_op_hudi_op_new/inc_day=20221120/opcode=501/.00000006-2d6e-4d26-93ea-1026632abb67_20221119235956333.log.1_44-150-2
>     at org.apache.hadoop.hive.ql.exec.AbstractMapOperator.getNominalPath(AbstractMapOperator.java:119)
>     at org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp(MapOperator.java:452)
>     at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1106)
>     at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:482)
>     at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:160)
>     ... 8 more

--
This message was sent by Atlassian Jira (v8.20.10#820010)