[ 
https://issues.apache.org/jira/browse/HUDI-5276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Kudinkin updated HUDI-5276:
----------------------------------
    Description: 
 

When we query sql in hive like:

select mainwaybillno,
zonecode,
accountantcode,
baroprcode,
opcode,
row_number() over(PARTITION BY mainwaybillno, zonecode, opcode ORDER BY 
barscantm) sn from dm_kafka_rdmp_dw.fvp_core_fact_route_op_hudi_op_new_rt WHERE 
opcode IN ('50') and inc_day='20221120' limit 10;

In MapReduce Job the config 
mapreduce.input.fileinputformat.inputdir=hdfs://dw/hive/warehouse/dm/dm_kafka_rdmp_dw/fvp_core_fact_route_op_hudi_op_new/inc_day=20221120/opcode=50

But this file split 
hdfs://dw/hive/warehouse/dm/dm_kafka_rdmp_dw/fvp_core_fact_route_op_hudi_op_new/inc_day=20221120/opcode=5000
 was added to the job.

This job was failed and throw exception :
2022-11-21 18:11:33,895 INFO [IPC Server handler 1 on 45077] 
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from 
attempt_1668750926041_1011874_m_000110_0: Error: java.lang.RuntimeException: 
java.lang.IllegalStateException: Invalid input path 
hdfs://dw/hive/warehouse/dm/dm_kafka_rdmp_dw/fvp_core_fact_route_op_hudi_op_new/inc_day=20221120/opcode=501/.00000006-2d6e-4d26-93ea-1026632abb67_20221119235956333.log.1_44-150-2
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:169)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.IllegalStateException: Invalid input path 
hdfs://dw/hive/warehouse/dm/dm_kafka_rdmp_dw/fvp_core_fact_route_op_hudi_op_new/inc_day=20221120/opcode=501/.00000006-2d6e-4d26-93ea-1026632abb67_20221119235956333.log.1_44-150-2
at 
org.apache.hadoop.hive.ql.exec.AbstractMapOperator.getNominalPath(AbstractMapOperator.java:119)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp(MapOperator.java:452)
at 
org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1106)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:482)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:160)
... 8 more

2022-11-21 18:11:33,897 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report 
from attempt_1668750926041_1011874_m_000110_0: Error: 
java.lang.RuntimeException: java.lang.IllegalStateException: Invalid input path 
hdfs://dw/hive/warehouse/dm/dm_kafka_rdmp_dw/fvp_core_fact_route_op_hudi_op_new/inc_day=20221120/opcode=501/.00000006-2d6e-4d26-93ea-1026632abb67_20221119235956333.log.1_44-150-2
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:169)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.IllegalStateException: Invalid input path 
hdfs://dw/hive/warehouse/dm/dm_kafka_rdmp_dw/fvp_core_fact_route_op_hudi_op_new/inc_day=20221120/opcode=501/.00000006-2d6e-4d26-93ea-1026632abb67_20221119235956333.log.1_44-150-2
at 
org.apache.hadoop.hive.ql.exec.AbstractMapOperator.getNominalPath(AbstractMapOperator.java:119)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp(MapOperator.java:452)
at 
org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1106)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:482)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:160)
... 8 more

  was:
When we query sql in hive like:

select mainwaybillno,
zonecode,
accountantcode,
baroprcode,
opcode,
row_number() over(PARTITION BY mainwaybillno, zonecode, opcode ORDER BY 
barscantm) sn from dm_kafka_rdmp_dw.fvp_core_fact_route_op_hudi_op_new_rt WHERE 
opcode IN ('50') and inc_day='20221120' limit 10;

In MapReduce Job the config 
mapreduce.input.fileinputformat.inputdir=hdfs://dw/hive/warehouse/dm/dm_kafka_rdmp_dw/fvp_core_fact_route_op_hudi_op_new/inc_day=20221120/opcode=50

But this file split 
hdfs://dw/hive/warehouse/dm/dm_kafka_rdmp_dw/fvp_core_fact_route_op_hudi_op_new/inc_day=20221120/opcode=5000
 was added to the job.

This job was failed and throw exception :
2022-11-21 18:11:33,895 INFO [IPC Server handler 1 on 45077] 
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from 
attempt_1668750926041_1011874_m_000110_0: Error: java.lang.RuntimeException: 
java.lang.IllegalStateException: Invalid input path 
hdfs://dw/hive/warehouse/dm/dm_kafka_rdmp_dw/fvp_core_fact_route_op_hudi_op_new/inc_day=20221120/opcode=501/.00000006-2d6e-4d26-93ea-1026632abb67_20221119235956333.log.1_44-150-2
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:169)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.IllegalStateException: Invalid input path 
hdfs://dw/hive/warehouse/dm/dm_kafka_rdmp_dw/fvp_core_fact_route_op_hudi_op_new/inc_day=20221120/opcode=501/.00000006-2d6e-4d26-93ea-1026632abb67_20221119235956333.log.1_44-150-2
at 
org.apache.hadoop.hive.ql.exec.AbstractMapOperator.getNominalPath(AbstractMapOperator.java:119)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp(MapOperator.java:452)
at 
org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1106)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:482)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:160)
... 8 more

2022-11-21 18:11:33,897 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report 
from attempt_1668750926041_1011874_m_000110_0: Error: 
java.lang.RuntimeException: java.lang.IllegalStateException: Invalid input path 
hdfs://dw/hive/warehouse/dm/dm_kafka_rdmp_dw/fvp_core_fact_route_op_hudi_op_new/inc_day=20221120/opcode=501/.00000006-2d6e-4d26-93ea-1026632abb67_20221119235956333.log.1_44-150-2
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:169)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.IllegalStateException: Invalid input path 
hdfs://dw/hive/warehouse/dm/dm_kafka_rdmp_dw/fvp_core_fact_route_op_hudi_op_new/inc_day=20221120/opcode=501/.00000006-2d6e-4d26-93ea-1026632abb67_20221119235956333.log.1_44-150-2
at 
org.apache.hadoop.hive.ql.exec.AbstractMapOperator.getNominalPath(AbstractMapOperator.java:119)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp(MapOperator.java:452)
at 
org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1106)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:482)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:160)
... 8 more


> Hudi getAllQueryPartitionPaths use regular match caused Invalid input path 
> add 
> -------------------------------------------------------------------------------
>
>                 Key: HUDI-5276
>                 URL: https://issues.apache.org/jira/browse/HUDI-5276
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: yuehanwang
>            Assignee: Alexey Kudinkin
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.13.0
>
>
>  
> When we query sql in hive like:
> select mainwaybillno,
> zonecode,
> accountantcode,
> baroprcode,
> opcode,
> row_number() over(PARTITION BY mainwaybillno, zonecode, opcode ORDER BY 
> barscantm) sn from dm_kafka_rdmp_dw.fvp_core_fact_route_op_hudi_op_new_rt 
> WHERE opcode IN ('50') and inc_day='20221120' limit 10;
> In MapReduce Job the config 
> mapreduce.input.fileinputformat.inputdir=hdfs://dw/hive/warehouse/dm/dm_kafka_rdmp_dw/fvp_core_fact_route_op_hudi_op_new/inc_day=20221120/opcode=50
> But this file split 
> hdfs://dw/hive/warehouse/dm/dm_kafka_rdmp_dw/fvp_core_fact_route_op_hudi_op_new/inc_day=20221120/opcode=5000
>  was added to the job.
> This job was failed and throw exception :
> 2022-11-21 18:11:33,895 INFO [IPC Server handler 1 on 45077] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from 
> attempt_1668750926041_1011874_m_000110_0: Error: java.lang.RuntimeException: 
> java.lang.IllegalStateException: Invalid input path 
> hdfs://dw/hive/warehouse/dm/dm_kafka_rdmp_dw/fvp_core_fact_route_op_hudi_op_new/inc_day=20221120/opcode=501/.00000006-2d6e-4d26-93ea-1026632abb67_20221119235956333.log.1_44-150-2
> at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:169)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.lang.IllegalStateException: Invalid input path 
> hdfs://dw/hive/warehouse/dm/dm_kafka_rdmp_dw/fvp_core_fact_route_op_hudi_op_new/inc_day=20221120/opcode=501/.00000006-2d6e-4d26-93ea-1026632abb67_20221119235956333.log.1_44-150-2
> at 
> org.apache.hadoop.hive.ql.exec.AbstractMapOperator.getNominalPath(AbstractMapOperator.java:119)
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp(MapOperator.java:452)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1106)
> at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:482)
> at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:160)
> ... 8 more
> 2022-11-21 18:11:33,897 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics 
> report from attempt_1668750926041_1011874_m_000110_0: Error: 
> java.lang.RuntimeException: java.lang.IllegalStateException: Invalid input 
> path 
> hdfs://dw/hive/warehouse/dm/dm_kafka_rdmp_dw/fvp_core_fact_route_op_hudi_op_new/inc_day=20221120/opcode=501/.00000006-2d6e-4d26-93ea-1026632abb67_20221119235956333.log.1_44-150-2
> at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:169)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.lang.IllegalStateException: Invalid input path 
> hdfs://dw/hive/warehouse/dm/dm_kafka_rdmp_dw/fvp_core_fact_route_op_hudi_op_new/inc_day=20221120/opcode=501/.00000006-2d6e-4d26-93ea-1026632abb67_20221119235956333.log.1_44-150-2
> at 
> org.apache.hadoop.hive.ql.exec.AbstractMapOperator.getNominalPath(AbstractMapOperator.java:119)
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp(MapOperator.java:452)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1106)
> at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:482)
> at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:160)
> ... 8 more



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to