[jira] [Updated] (HUDI-4542) Flink streaming query fails with ClassNotFoundException
[ https://issues.apache.org/jira/browse/HUDI-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prashant Wason updated HUDI-4542: - Fix Version/s: 0.14.1 (was: 0.14.0) > Flink streaming query fails with ClassNotFoundException > --- > > Key: HUDI-4542 > URL: https://issues.apache.org/jira/browse/HUDI-4542 > Project: Apache Hudi > Issue Type: Bug > Components: flink-sql >Reporter: Ethan Guo >Priority: Critical > Fix For: 0.14.1 > > Attachments: Screen Shot 2022-08-04 at 17.17.42.png > > > Environment: EMR 6.7.0 Flink 1.14.2 > Reproducible steps: Build Hudi Flink bundle from master > {code:java} > mvn clean package -DskipTests -pl :hudi-flink1.14-bundle -am {code} > Copy to EMR master node /lib/flink/lib > Launch Flink SQL client: > {code:java} > cd /lib/flink && ./bin/yarn-session.sh --detached > ./bin/sql-client.sh {code} > Write a Hudi table with a few commits with metadata table enabled (no column > stats). Then, run the following for the streaming query > {code:java} > CREATE TABLE t2( > uuid VARCHAR(20) PRIMARY KEY NOT ENFORCED, > name VARCHAR(10), > age INT, > ts TIMESTAMP(3), > `partition` VARCHAR(20) > ) > PARTITIONED BY (`partition`) > WITH ( > 'connector' = 'hudi', > 'path' = 's3a://', > 'table.type' = 'MERGE_ON_READ', > 'read.streaming.enabled' = 'true', -- this option enable the streaming > read > 'read.start-commit' = '20220803165232362', -- specifies the start commit > instant time > 'read.streaming.check-interval' = '4' -- specifies the check interval for > finding new source commits, default 60s. > ); {code} > {code:java} > select * from t2; {code} > {code:java} > Flink SQL> select * from t2; > 2022-08-05 00:12:43,635 INFO org.apache.hadoop.metrics2.impl.MetricsConfig > [] - Loaded properties from hadoop-metrics2.properties > 2022-08-05 00:12:43,650 INFO > org.apache.hadoop.metrics2.impl.MetricsSystemImpl [] - Scheduled > Metric snapshot period at 300 second(s). > 2022-08-05 00:12:43,650 INFO > org.apache.hadoop.metrics2.impl.MetricsSystemImpl [] - > s3a-file-system metrics system started > 2022-08-05 00:12:47,722 INFO org.apache.hadoop.fs.s3a.S3AInputStream > [] - Switching to Random IO seek policy > 2022-08-05 00:12:47,941 INFO org.apache.hadoop.yarn.client.RMProxy > [] - Connecting to ResourceManager at > ip-172-31-9-157.us-east-2.compute.internal/172.31.9.157:8032 > 2022-08-05 00:12:47,942 INFO org.apache.hadoop.yarn.client.AHSProxy > [] - Connecting to Application History server at > ip-172-31-9-157.us-east-2.compute.internal/172.31.9.157:10200 > 2022-08-05 00:12:47,942 INFO org.apache.flink.yarn.YarnClusterDescriptor > [] - No path for the flink jar passed. Using the location of > class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar > 2022-08-05 00:12:47,942 WARN org.apache.flink.yarn.YarnClusterDescriptor > [] - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR > environment variable is set.The Flink YARN Client needs one of these to be > set to properly load the Hadoop configuration for accessing YARN. > 2022-08-05 00:12:47,959 INFO org.apache.flink.yarn.YarnClusterDescriptor > [] - Found Web Interface > ip-172-31-3-92.us-east-2.compute.internal:39605 of application > 'application_1659656614768_0001'. > [ERROR] Could not execute SQL statement. Reason: > java.lang.ClassNotFoundException: > org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat{code} > {code:java} > 2022-08-04 17:12:59 > org.apache.flink.runtime.JobException: Recovery is suppressed by > NoRestartBackoffTimeStrategy > at > org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:138) > at > org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:82) > at > org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:228) > at > org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:218) > at > org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:209) > at > org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:679) > at > org.apache.flink.runtime.scheduler.SchedulerNG.updateTaskExecutionState(SchedulerNG.java:79) > at > org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:444) > at sun.reflect.GeneratedMethodAccessor35.invoke(Unknown Source) > at >
[jira] [Updated] (HUDI-4542) Flink streaming query fails with ClassNotFoundException
[ https://issues.apache.org/jira/browse/HUDI-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yue Zhang updated HUDI-4542: Fix Version/s: 0.14.0 (was: 0.13.1) > Flink streaming query fails with ClassNotFoundException > --- > > Key: HUDI-4542 > URL: https://issues.apache.org/jira/browse/HUDI-4542 > Project: Apache Hudi > Issue Type: Bug > Components: flink-sql >Reporter: Ethan Guo >Priority: Critical > Fix For: 0.14.0 > > Attachments: Screen Shot 2022-08-04 at 17.17.42.png > > > Environment: EMR 6.7.0 Flink 1.14.2 > Reproducible steps: Build Hudi Flink bundle from master > {code:java} > mvn clean package -DskipTests -pl :hudi-flink1.14-bundle -am {code} > Copy to EMR master node /lib/flink/lib > Launch Flink SQL client: > {code:java} > cd /lib/flink && ./bin/yarn-session.sh --detached > ./bin/sql-client.sh {code} > Write a Hudi table with a few commits with metadata table enabled (no column > stats). Then, run the following for the streaming query > {code:java} > CREATE TABLE t2( > uuid VARCHAR(20) PRIMARY KEY NOT ENFORCED, > name VARCHAR(10), > age INT, > ts TIMESTAMP(3), > `partition` VARCHAR(20) > ) > PARTITIONED BY (`partition`) > WITH ( > 'connector' = 'hudi', > 'path' = 's3a://', > 'table.type' = 'MERGE_ON_READ', > 'read.streaming.enabled' = 'true', -- this option enable the streaming > read > 'read.start-commit' = '20220803165232362', -- specifies the start commit > instant time > 'read.streaming.check-interval' = '4' -- specifies the check interval for > finding new source commits, default 60s. > ); {code} > {code:java} > select * from t2; {code} > {code:java} > Flink SQL> select * from t2; > 2022-08-05 00:12:43,635 INFO org.apache.hadoop.metrics2.impl.MetricsConfig > [] - Loaded properties from hadoop-metrics2.properties > 2022-08-05 00:12:43,650 INFO > org.apache.hadoop.metrics2.impl.MetricsSystemImpl [] - Scheduled > Metric snapshot period at 300 second(s). > 2022-08-05 00:12:43,650 INFO > org.apache.hadoop.metrics2.impl.MetricsSystemImpl [] - > s3a-file-system metrics system started > 2022-08-05 00:12:47,722 INFO org.apache.hadoop.fs.s3a.S3AInputStream > [] - Switching to Random IO seek policy > 2022-08-05 00:12:47,941 INFO org.apache.hadoop.yarn.client.RMProxy > [] - Connecting to ResourceManager at > ip-172-31-9-157.us-east-2.compute.internal/172.31.9.157:8032 > 2022-08-05 00:12:47,942 INFO org.apache.hadoop.yarn.client.AHSProxy > [] - Connecting to Application History server at > ip-172-31-9-157.us-east-2.compute.internal/172.31.9.157:10200 > 2022-08-05 00:12:47,942 INFO org.apache.flink.yarn.YarnClusterDescriptor > [] - No path for the flink jar passed. Using the location of > class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar > 2022-08-05 00:12:47,942 WARN org.apache.flink.yarn.YarnClusterDescriptor > [] - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR > environment variable is set.The Flink YARN Client needs one of these to be > set to properly load the Hadoop configuration for accessing YARN. > 2022-08-05 00:12:47,959 INFO org.apache.flink.yarn.YarnClusterDescriptor > [] - Found Web Interface > ip-172-31-3-92.us-east-2.compute.internal:39605 of application > 'application_1659656614768_0001'. > [ERROR] Could not execute SQL statement. Reason: > java.lang.ClassNotFoundException: > org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat{code} > {code:java} > 2022-08-04 17:12:59 > org.apache.flink.runtime.JobException: Recovery is suppressed by > NoRestartBackoffTimeStrategy > at > org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:138) > at > org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:82) > at > org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:228) > at > org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:218) > at > org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:209) > at > org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:679) > at > org.apache.flink.runtime.scheduler.SchedulerNG.updateTaskExecutionState(SchedulerNG.java:79) > at > org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:444) > at sun.reflect.GeneratedMethodAccessor35.invoke(Unknown Source) > at >
[jira] [Updated] (HUDI-4542) Flink streaming query fails with ClassNotFoundException
[ https://issues.apache.org/jira/browse/HUDI-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaojing Yu updated HUDI-4542: -- Fix Version/s: 0.13.0 (was: 0.12.1) > Flink streaming query fails with ClassNotFoundException > --- > > Key: HUDI-4542 > URL: https://issues.apache.org/jira/browse/HUDI-4542 > Project: Apache Hudi > Issue Type: Bug > Components: flink-sql >Reporter: Ethan Guo >Priority: Critical > Fix For: 0.13.0 > > Attachments: Screen Shot 2022-08-04 at 17.17.42.png > > > Environment: EMR 6.7.0 Flink 1.14.2 > Reproducible steps: Build Hudi Flink bundle from master > {code:java} > mvn clean package -DskipTests -pl :hudi-flink1.14-bundle -am {code} > Copy to EMR master node /lib/flink/lib > Launch Flink SQL client: > {code:java} > cd /lib/flink && ./bin/yarn-session.sh --detached > ./bin/sql-client.sh {code} > Write a Hudi table with a few commits with metadata table enabled (no column > stats). Then, run the following for the streaming query > {code:java} > CREATE TABLE t2( > uuid VARCHAR(20) PRIMARY KEY NOT ENFORCED, > name VARCHAR(10), > age INT, > ts TIMESTAMP(3), > `partition` VARCHAR(20) > ) > PARTITIONED BY (`partition`) > WITH ( > 'connector' = 'hudi', > 'path' = 's3a://', > 'table.type' = 'MERGE_ON_READ', > 'read.streaming.enabled' = 'true', -- this option enable the streaming > read > 'read.start-commit' = '20220803165232362', -- specifies the start commit > instant time > 'read.streaming.check-interval' = '4' -- specifies the check interval for > finding new source commits, default 60s. > ); {code} > {code:java} > select * from t2; {code} > {code:java} > Flink SQL> select * from t2; > 2022-08-05 00:12:43,635 INFO org.apache.hadoop.metrics2.impl.MetricsConfig > [] - Loaded properties from hadoop-metrics2.properties > 2022-08-05 00:12:43,650 INFO > org.apache.hadoop.metrics2.impl.MetricsSystemImpl [] - Scheduled > Metric snapshot period at 300 second(s). > 2022-08-05 00:12:43,650 INFO > org.apache.hadoop.metrics2.impl.MetricsSystemImpl [] - > s3a-file-system metrics system started > 2022-08-05 00:12:47,722 INFO org.apache.hadoop.fs.s3a.S3AInputStream > [] - Switching to Random IO seek policy > 2022-08-05 00:12:47,941 INFO org.apache.hadoop.yarn.client.RMProxy > [] - Connecting to ResourceManager at > ip-172-31-9-157.us-east-2.compute.internal/172.31.9.157:8032 > 2022-08-05 00:12:47,942 INFO org.apache.hadoop.yarn.client.AHSProxy > [] - Connecting to Application History server at > ip-172-31-9-157.us-east-2.compute.internal/172.31.9.157:10200 > 2022-08-05 00:12:47,942 INFO org.apache.flink.yarn.YarnClusterDescriptor > [] - No path for the flink jar passed. Using the location of > class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar > 2022-08-05 00:12:47,942 WARN org.apache.flink.yarn.YarnClusterDescriptor > [] - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR > environment variable is set.The Flink YARN Client needs one of these to be > set to properly load the Hadoop configuration for accessing YARN. > 2022-08-05 00:12:47,959 INFO org.apache.flink.yarn.YarnClusterDescriptor > [] - Found Web Interface > ip-172-31-3-92.us-east-2.compute.internal:39605 of application > 'application_1659656614768_0001'. > [ERROR] Could not execute SQL statement. Reason: > java.lang.ClassNotFoundException: > org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat{code} > {code:java} > 2022-08-04 17:12:59 > org.apache.flink.runtime.JobException: Recovery is suppressed by > NoRestartBackoffTimeStrategy > at > org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:138) > at > org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:82) > at > org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:228) > at > org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:218) > at > org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:209) > at > org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:679) > at > org.apache.flink.runtime.scheduler.SchedulerNG.updateTaskExecutionState(SchedulerNG.java:79) > at > org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:444) > at sun.reflect.GeneratedMethodAccessor35.invoke(Unknown Source) > at >
[jira] [Updated] (HUDI-4542) Flink streaming query fails with ClassNotFoundException
[ https://issues.apache.org/jira/browse/HUDI-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-4542: - Sprint: (was: 2022/09/05) > Flink streaming query fails with ClassNotFoundException > --- > > Key: HUDI-4542 > URL: https://issues.apache.org/jira/browse/HUDI-4542 > Project: Apache Hudi > Issue Type: Bug > Components: flink-sql >Reporter: Ethan Guo >Priority: Critical > Fix For: 0.12.1 > > Attachments: Screen Shot 2022-08-04 at 17.17.42.png > > > Environment: EMR 6.7.0 Flink 1.14.2 > Reproducible steps: Build Hudi Flink bundle from master > {code:java} > mvn clean package -DskipTests -pl :hudi-flink1.14-bundle -am {code} > Copy to EMR master node /lib/flink/lib > Launch Flink SQL client: > {code:java} > cd /lib/flink && ./bin/yarn-session.sh --detached > ./bin/sql-client.sh {code} > Write a Hudi table with a few commits with metadata table enabled (no column > stats). Then, run the following for the streaming query > {code:java} > CREATE TABLE t2( > uuid VARCHAR(20) PRIMARY KEY NOT ENFORCED, > name VARCHAR(10), > age INT, > ts TIMESTAMP(3), > `partition` VARCHAR(20) > ) > PARTITIONED BY (`partition`) > WITH ( > 'connector' = 'hudi', > 'path' = 's3a://', > 'table.type' = 'MERGE_ON_READ', > 'read.streaming.enabled' = 'true', -- this option enable the streaming > read > 'read.start-commit' = '20220803165232362', -- specifies the start commit > instant time > 'read.streaming.check-interval' = '4' -- specifies the check interval for > finding new source commits, default 60s. > ); {code} > {code:java} > select * from t2; {code} > {code:java} > Flink SQL> select * from t2; > 2022-08-05 00:12:43,635 INFO org.apache.hadoop.metrics2.impl.MetricsConfig > [] - Loaded properties from hadoop-metrics2.properties > 2022-08-05 00:12:43,650 INFO > org.apache.hadoop.metrics2.impl.MetricsSystemImpl [] - Scheduled > Metric snapshot period at 300 second(s). > 2022-08-05 00:12:43,650 INFO > org.apache.hadoop.metrics2.impl.MetricsSystemImpl [] - > s3a-file-system metrics system started > 2022-08-05 00:12:47,722 INFO org.apache.hadoop.fs.s3a.S3AInputStream > [] - Switching to Random IO seek policy > 2022-08-05 00:12:47,941 INFO org.apache.hadoop.yarn.client.RMProxy > [] - Connecting to ResourceManager at > ip-172-31-9-157.us-east-2.compute.internal/172.31.9.157:8032 > 2022-08-05 00:12:47,942 INFO org.apache.hadoop.yarn.client.AHSProxy > [] - Connecting to Application History server at > ip-172-31-9-157.us-east-2.compute.internal/172.31.9.157:10200 > 2022-08-05 00:12:47,942 INFO org.apache.flink.yarn.YarnClusterDescriptor > [] - No path for the flink jar passed. Using the location of > class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar > 2022-08-05 00:12:47,942 WARN org.apache.flink.yarn.YarnClusterDescriptor > [] - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR > environment variable is set.The Flink YARN Client needs one of these to be > set to properly load the Hadoop configuration for accessing YARN. > 2022-08-05 00:12:47,959 INFO org.apache.flink.yarn.YarnClusterDescriptor > [] - Found Web Interface > ip-172-31-3-92.us-east-2.compute.internal:39605 of application > 'application_1659656614768_0001'. > [ERROR] Could not execute SQL statement. Reason: > java.lang.ClassNotFoundException: > org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat{code} > {code:java} > 2022-08-04 17:12:59 > org.apache.flink.runtime.JobException: Recovery is suppressed by > NoRestartBackoffTimeStrategy > at > org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:138) > at > org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:82) > at > org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:228) > at > org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:218) > at > org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:209) > at > org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:679) > at > org.apache.flink.runtime.scheduler.SchedulerNG.updateTaskExecutionState(SchedulerNG.java:79) > at > org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:444) > at sun.reflect.GeneratedMethodAccessor35.invoke(Unknown Source) > at >
[jira] [Updated] (HUDI-4542) Flink streaming query fails with ClassNotFoundException
[ https://issues.apache.org/jira/browse/HUDI-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-4542: - Sprint: 2022/09/05 (was: 2022/08/22) > Flink streaming query fails with ClassNotFoundException > --- > > Key: HUDI-4542 > URL: https://issues.apache.org/jira/browse/HUDI-4542 > Project: Apache Hudi > Issue Type: Bug > Components: flink-sql >Reporter: Ethan Guo >Priority: Critical > Fix For: 0.12.1 > > Attachments: Screen Shot 2022-08-04 at 17.17.42.png > > > Environment: EMR 6.7.0 Flink 1.14.2 > Reproducible steps: Build Hudi Flink bundle from master > {code:java} > mvn clean package -DskipTests -pl :hudi-flink1.14-bundle -am {code} > Copy to EMR master node /lib/flink/lib > Launch Flink SQL client: > {code:java} > cd /lib/flink && ./bin/yarn-session.sh --detached > ./bin/sql-client.sh {code} > Write a Hudi table with a few commits with metadata table enabled (no column > stats). Then, run the following for the streaming query > {code:java} > CREATE TABLE t2( > uuid VARCHAR(20) PRIMARY KEY NOT ENFORCED, > name VARCHAR(10), > age INT, > ts TIMESTAMP(3), > `partition` VARCHAR(20) > ) > PARTITIONED BY (`partition`) > WITH ( > 'connector' = 'hudi', > 'path' = 's3a://', > 'table.type' = 'MERGE_ON_READ', > 'read.streaming.enabled' = 'true', -- this option enable the streaming > read > 'read.start-commit' = '20220803165232362', -- specifies the start commit > instant time > 'read.streaming.check-interval' = '4' -- specifies the check interval for > finding new source commits, default 60s. > ); {code} > {code:java} > select * from t2; {code} > {code:java} > Flink SQL> select * from t2; > 2022-08-05 00:12:43,635 INFO org.apache.hadoop.metrics2.impl.MetricsConfig > [] - Loaded properties from hadoop-metrics2.properties > 2022-08-05 00:12:43,650 INFO > org.apache.hadoop.metrics2.impl.MetricsSystemImpl [] - Scheduled > Metric snapshot period at 300 second(s). > 2022-08-05 00:12:43,650 INFO > org.apache.hadoop.metrics2.impl.MetricsSystemImpl [] - > s3a-file-system metrics system started > 2022-08-05 00:12:47,722 INFO org.apache.hadoop.fs.s3a.S3AInputStream > [] - Switching to Random IO seek policy > 2022-08-05 00:12:47,941 INFO org.apache.hadoop.yarn.client.RMProxy > [] - Connecting to ResourceManager at > ip-172-31-9-157.us-east-2.compute.internal/172.31.9.157:8032 > 2022-08-05 00:12:47,942 INFO org.apache.hadoop.yarn.client.AHSProxy > [] - Connecting to Application History server at > ip-172-31-9-157.us-east-2.compute.internal/172.31.9.157:10200 > 2022-08-05 00:12:47,942 INFO org.apache.flink.yarn.YarnClusterDescriptor > [] - No path for the flink jar passed. Using the location of > class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar > 2022-08-05 00:12:47,942 WARN org.apache.flink.yarn.YarnClusterDescriptor > [] - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR > environment variable is set.The Flink YARN Client needs one of these to be > set to properly load the Hadoop configuration for accessing YARN. > 2022-08-05 00:12:47,959 INFO org.apache.flink.yarn.YarnClusterDescriptor > [] - Found Web Interface > ip-172-31-3-92.us-east-2.compute.internal:39605 of application > 'application_1659656614768_0001'. > [ERROR] Could not execute SQL statement. Reason: > java.lang.ClassNotFoundException: > org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat{code} > {code:java} > 2022-08-04 17:12:59 > org.apache.flink.runtime.JobException: Recovery is suppressed by > NoRestartBackoffTimeStrategy > at > org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:138) > at > org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:82) > at > org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:228) > at > org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:218) > at > org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:209) > at > org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:679) > at > org.apache.flink.runtime.scheduler.SchedulerNG.updateTaskExecutionState(SchedulerNG.java:79) > at > org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:444) > at sun.reflect.GeneratedMethodAccessor35.invoke(Unknown Source) > at >
[jira] [Updated] (HUDI-4542) Flink streaming query fails with ClassNotFoundException
[ https://issues.apache.org/jira/browse/HUDI-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-4542: - Story Points: 2 > Flink streaming query fails with ClassNotFoundException > --- > > Key: HUDI-4542 > URL: https://issues.apache.org/jira/browse/HUDI-4542 > Project: Apache Hudi > Issue Type: Bug > Components: flink-sql >Reporter: Ethan Guo >Priority: Critical > Fix For: 0.12.1 > > Attachments: Screen Shot 2022-08-04 at 17.17.42.png > > > Environment: EMR 6.7.0 Flink 1.14.2 > Reproducible steps: Build Hudi Flink bundle from master > {code:java} > mvn clean package -DskipTests -pl :hudi-flink1.14-bundle -am {code} > Copy to EMR master node /lib/flink/lib > Launch Flink SQL client: > {code:java} > cd /lib/flink && ./bin/yarn-session.sh --detached > ./bin/sql-client.sh {code} > Write a Hudi table with a few commits with metadata table enabled (no column > stats). Then, run the following for the streaming query > {code:java} > CREATE TABLE t2( > uuid VARCHAR(20) PRIMARY KEY NOT ENFORCED, > name VARCHAR(10), > age INT, > ts TIMESTAMP(3), > `partition` VARCHAR(20) > ) > PARTITIONED BY (`partition`) > WITH ( > 'connector' = 'hudi', > 'path' = 's3a://', > 'table.type' = 'MERGE_ON_READ', > 'read.streaming.enabled' = 'true', -- this option enable the streaming > read > 'read.start-commit' = '20220803165232362', -- specifies the start commit > instant time > 'read.streaming.check-interval' = '4' -- specifies the check interval for > finding new source commits, default 60s. > ); {code} > {code:java} > select * from t2; {code} > {code:java} > Flink SQL> select * from t2; > 2022-08-05 00:12:43,635 INFO org.apache.hadoop.metrics2.impl.MetricsConfig > [] - Loaded properties from hadoop-metrics2.properties > 2022-08-05 00:12:43,650 INFO > org.apache.hadoop.metrics2.impl.MetricsSystemImpl [] - Scheduled > Metric snapshot period at 300 second(s). > 2022-08-05 00:12:43,650 INFO > org.apache.hadoop.metrics2.impl.MetricsSystemImpl [] - > s3a-file-system metrics system started > 2022-08-05 00:12:47,722 INFO org.apache.hadoop.fs.s3a.S3AInputStream > [] - Switching to Random IO seek policy > 2022-08-05 00:12:47,941 INFO org.apache.hadoop.yarn.client.RMProxy > [] - Connecting to ResourceManager at > ip-172-31-9-157.us-east-2.compute.internal/172.31.9.157:8032 > 2022-08-05 00:12:47,942 INFO org.apache.hadoop.yarn.client.AHSProxy > [] - Connecting to Application History server at > ip-172-31-9-157.us-east-2.compute.internal/172.31.9.157:10200 > 2022-08-05 00:12:47,942 INFO org.apache.flink.yarn.YarnClusterDescriptor > [] - No path for the flink jar passed. Using the location of > class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar > 2022-08-05 00:12:47,942 WARN org.apache.flink.yarn.YarnClusterDescriptor > [] - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR > environment variable is set.The Flink YARN Client needs one of these to be > set to properly load the Hadoop configuration for accessing YARN. > 2022-08-05 00:12:47,959 INFO org.apache.flink.yarn.YarnClusterDescriptor > [] - Found Web Interface > ip-172-31-3-92.us-east-2.compute.internal:39605 of application > 'application_1659656614768_0001'. > [ERROR] Could not execute SQL statement. Reason: > java.lang.ClassNotFoundException: > org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat{code} > {code:java} > 2022-08-04 17:12:59 > org.apache.flink.runtime.JobException: Recovery is suppressed by > NoRestartBackoffTimeStrategy > at > org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:138) > at > org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:82) > at > org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:228) > at > org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:218) > at > org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:209) > at > org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:679) > at > org.apache.flink.runtime.scheduler.SchedulerNG.updateTaskExecutionState(SchedulerNG.java:79) > at > org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:444) > at sun.reflect.GeneratedMethodAccessor35.invoke(Unknown Source) > at >
[jira] [Updated] (HUDI-4542) Flink streaming query fails with ClassNotFoundException
[ https://issues.apache.org/jira/browse/HUDI-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-4542: - Priority: Critical (was: Major) > Flink streaming query fails with ClassNotFoundException > --- > > Key: HUDI-4542 > URL: https://issues.apache.org/jira/browse/HUDI-4542 > Project: Apache Hudi > Issue Type: Bug > Components: flink-sql >Reporter: Ethan Guo >Priority: Critical > Fix For: 0.12.1 > > Attachments: Screen Shot 2022-08-04 at 17.17.42.png > > > Environment: EMR 6.7.0 Flink 1.14.2 > Reproducible steps: Build Hudi Flink bundle from master > {code:java} > mvn clean package -DskipTests -pl :hudi-flink1.14-bundle -am {code} > Copy to EMR master node /lib/flink/lib > Launch Flink SQL client: > {code:java} > cd /lib/flink && ./bin/yarn-session.sh --detached > ./bin/sql-client.sh {code} > Write a Hudi table with a few commits with metadata table enabled (no column > stats). Then, run the following for the streaming query > {code:java} > CREATE TABLE t2( > uuid VARCHAR(20) PRIMARY KEY NOT ENFORCED, > name VARCHAR(10), > age INT, > ts TIMESTAMP(3), > `partition` VARCHAR(20) > ) > PARTITIONED BY (`partition`) > WITH ( > 'connector' = 'hudi', > 'path' = 's3a://', > 'table.type' = 'MERGE_ON_READ', > 'read.streaming.enabled' = 'true', -- this option enable the streaming > read > 'read.start-commit' = '20220803165232362', -- specifies the start commit > instant time > 'read.streaming.check-interval' = '4' -- specifies the check interval for > finding new source commits, default 60s. > ); {code} > {code:java} > select * from t2; {code} > {code:java} > Flink SQL> select * from t2; > 2022-08-05 00:12:43,635 INFO org.apache.hadoop.metrics2.impl.MetricsConfig > [] - Loaded properties from hadoop-metrics2.properties > 2022-08-05 00:12:43,650 INFO > org.apache.hadoop.metrics2.impl.MetricsSystemImpl [] - Scheduled > Metric snapshot period at 300 second(s). > 2022-08-05 00:12:43,650 INFO > org.apache.hadoop.metrics2.impl.MetricsSystemImpl [] - > s3a-file-system metrics system started > 2022-08-05 00:12:47,722 INFO org.apache.hadoop.fs.s3a.S3AInputStream > [] - Switching to Random IO seek policy > 2022-08-05 00:12:47,941 INFO org.apache.hadoop.yarn.client.RMProxy > [] - Connecting to ResourceManager at > ip-172-31-9-157.us-east-2.compute.internal/172.31.9.157:8032 > 2022-08-05 00:12:47,942 INFO org.apache.hadoop.yarn.client.AHSProxy > [] - Connecting to Application History server at > ip-172-31-9-157.us-east-2.compute.internal/172.31.9.157:10200 > 2022-08-05 00:12:47,942 INFO org.apache.flink.yarn.YarnClusterDescriptor > [] - No path for the flink jar passed. Using the location of > class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar > 2022-08-05 00:12:47,942 WARN org.apache.flink.yarn.YarnClusterDescriptor > [] - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR > environment variable is set.The Flink YARN Client needs one of these to be > set to properly load the Hadoop configuration for accessing YARN. > 2022-08-05 00:12:47,959 INFO org.apache.flink.yarn.YarnClusterDescriptor > [] - Found Web Interface > ip-172-31-3-92.us-east-2.compute.internal:39605 of application > 'application_1659656614768_0001'. > [ERROR] Could not execute SQL statement. Reason: > java.lang.ClassNotFoundException: > org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat{code} > {code:java} > 2022-08-04 17:12:59 > org.apache.flink.runtime.JobException: Recovery is suppressed by > NoRestartBackoffTimeStrategy > at > org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:138) > at > org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:82) > at > org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:228) > at > org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:218) > at > org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:209) > at > org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:679) > at > org.apache.flink.runtime.scheduler.SchedulerNG.updateTaskExecutionState(SchedulerNG.java:79) > at > org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:444) > at sun.reflect.GeneratedMethodAccessor35.invoke(Unknown Source) > at >
[jira] [Updated] (HUDI-4542) Flink streaming query fails with ClassNotFoundException
[ https://issues.apache.org/jira/browse/HUDI-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit updated HUDI-4542: -- Sprint: 2022/08/22 > Flink streaming query fails with ClassNotFoundException > --- > > Key: HUDI-4542 > URL: https://issues.apache.org/jira/browse/HUDI-4542 > Project: Apache Hudi > Issue Type: Bug > Components: flink-sql >Reporter: Ethan Guo >Priority: Major > Fix For: 0.12.1 > > Attachments: Screen Shot 2022-08-04 at 17.17.42.png > > > Environment: EMR 6.7.0 Flink 1.14.2 > Reproducible steps: Build Hudi Flink bundle from master > {code:java} > mvn clean package -DskipTests -pl :hudi-flink1.14-bundle -am {code} > Copy to EMR master node /lib/flink/lib > Launch Flink SQL client: > {code:java} > cd /lib/flink && ./bin/yarn-session.sh --detached > ./bin/sql-client.sh {code} > Write a Hudi table with a few commits with metadata table enabled (no column > stats). Then, run the following for the streaming query > {code:java} > CREATE TABLE t2( > uuid VARCHAR(20) PRIMARY KEY NOT ENFORCED, > name VARCHAR(10), > age INT, > ts TIMESTAMP(3), > `partition` VARCHAR(20) > ) > PARTITIONED BY (`partition`) > WITH ( > 'connector' = 'hudi', > 'path' = 's3a://', > 'table.type' = 'MERGE_ON_READ', > 'read.streaming.enabled' = 'true', -- this option enable the streaming > read > 'read.start-commit' = '20220803165232362', -- specifies the start commit > instant time > 'read.streaming.check-interval' = '4' -- specifies the check interval for > finding new source commits, default 60s. > ); {code} > {code:java} > select * from t2; {code} > {code:java} > Flink SQL> select * from t2; > 2022-08-05 00:12:43,635 INFO org.apache.hadoop.metrics2.impl.MetricsConfig > [] - Loaded properties from hadoop-metrics2.properties > 2022-08-05 00:12:43,650 INFO > org.apache.hadoop.metrics2.impl.MetricsSystemImpl [] - Scheduled > Metric snapshot period at 300 second(s). > 2022-08-05 00:12:43,650 INFO > org.apache.hadoop.metrics2.impl.MetricsSystemImpl [] - > s3a-file-system metrics system started > 2022-08-05 00:12:47,722 INFO org.apache.hadoop.fs.s3a.S3AInputStream > [] - Switching to Random IO seek policy > 2022-08-05 00:12:47,941 INFO org.apache.hadoop.yarn.client.RMProxy > [] - Connecting to ResourceManager at > ip-172-31-9-157.us-east-2.compute.internal/172.31.9.157:8032 > 2022-08-05 00:12:47,942 INFO org.apache.hadoop.yarn.client.AHSProxy > [] - Connecting to Application History server at > ip-172-31-9-157.us-east-2.compute.internal/172.31.9.157:10200 > 2022-08-05 00:12:47,942 INFO org.apache.flink.yarn.YarnClusterDescriptor > [] - No path for the flink jar passed. Using the location of > class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar > 2022-08-05 00:12:47,942 WARN org.apache.flink.yarn.YarnClusterDescriptor > [] - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR > environment variable is set.The Flink YARN Client needs one of these to be > set to properly load the Hadoop configuration for accessing YARN. > 2022-08-05 00:12:47,959 INFO org.apache.flink.yarn.YarnClusterDescriptor > [] - Found Web Interface > ip-172-31-3-92.us-east-2.compute.internal:39605 of application > 'application_1659656614768_0001'. > [ERROR] Could not execute SQL statement. Reason: > java.lang.ClassNotFoundException: > org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat{code} > {code:java} > 2022-08-04 17:12:59 > org.apache.flink.runtime.JobException: Recovery is suppressed by > NoRestartBackoffTimeStrategy > at > org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:138) > at > org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:82) > at > org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:228) > at > org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:218) > at > org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:209) > at > org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:679) > at > org.apache.flink.runtime.scheduler.SchedulerNG.updateTaskExecutionState(SchedulerNG.java:79) > at > org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:444) > at sun.reflect.GeneratedMethodAccessor35.invoke(Unknown Source) > at >
[jira] [Updated] (HUDI-4542) Flink streaming query fails with ClassNotFoundException
[ https://issues.apache.org/jira/browse/HUDI-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit updated HUDI-4542: -- Fix Version/s: 0.12.1 (was: 0.12.0) > Flink streaming query fails with ClassNotFoundException > --- > > Key: HUDI-4542 > URL: https://issues.apache.org/jira/browse/HUDI-4542 > Project: Apache Hudi > Issue Type: Bug > Components: flink-sql >Reporter: Ethan Guo >Priority: Major > Fix For: 0.12.1 > > Attachments: Screen Shot 2022-08-04 at 17.17.42.png > > > Environment: EMR 6.7.0 Flink 1.14.2 > Reproducible steps: Build Hudi Flink bundle from master > {code:java} > mvn clean package -DskipTests -pl :hudi-flink1.14-bundle -am {code} > Copy to EMR master node /lib/flink/lib > Launch Flink SQL client: > {code:java} > cd /lib/flink && ./bin/yarn-session.sh --detached > ./bin/sql-client.sh {code} > Write a Hudi table with a few commits with metadata table enabled (no column > stats). Then, run the following for the streaming query > {code:java} > CREATE TABLE t2( > uuid VARCHAR(20) PRIMARY KEY NOT ENFORCED, > name VARCHAR(10), > age INT, > ts TIMESTAMP(3), > `partition` VARCHAR(20) > ) > PARTITIONED BY (`partition`) > WITH ( > 'connector' = 'hudi', > 'path' = 's3a://', > 'table.type' = 'MERGE_ON_READ', > 'read.streaming.enabled' = 'true', -- this option enable the streaming > read > 'read.start-commit' = '20220803165232362', -- specifies the start commit > instant time > 'read.streaming.check-interval' = '4' -- specifies the check interval for > finding new source commits, default 60s. > ); {code} > {code:java} > select * from t2; {code} > {code:java} > Flink SQL> select * from t2; > 2022-08-05 00:12:43,635 INFO org.apache.hadoop.metrics2.impl.MetricsConfig > [] - Loaded properties from hadoop-metrics2.properties > 2022-08-05 00:12:43,650 INFO > org.apache.hadoop.metrics2.impl.MetricsSystemImpl [] - Scheduled > Metric snapshot period at 300 second(s). > 2022-08-05 00:12:43,650 INFO > org.apache.hadoop.metrics2.impl.MetricsSystemImpl [] - > s3a-file-system metrics system started > 2022-08-05 00:12:47,722 INFO org.apache.hadoop.fs.s3a.S3AInputStream > [] - Switching to Random IO seek policy > 2022-08-05 00:12:47,941 INFO org.apache.hadoop.yarn.client.RMProxy > [] - Connecting to ResourceManager at > ip-172-31-9-157.us-east-2.compute.internal/172.31.9.157:8032 > 2022-08-05 00:12:47,942 INFO org.apache.hadoop.yarn.client.AHSProxy > [] - Connecting to Application History server at > ip-172-31-9-157.us-east-2.compute.internal/172.31.9.157:10200 > 2022-08-05 00:12:47,942 INFO org.apache.flink.yarn.YarnClusterDescriptor > [] - No path for the flink jar passed. Using the location of > class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar > 2022-08-05 00:12:47,942 WARN org.apache.flink.yarn.YarnClusterDescriptor > [] - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR > environment variable is set.The Flink YARN Client needs one of these to be > set to properly load the Hadoop configuration for accessing YARN. > 2022-08-05 00:12:47,959 INFO org.apache.flink.yarn.YarnClusterDescriptor > [] - Found Web Interface > ip-172-31-3-92.us-east-2.compute.internal:39605 of application > 'application_1659656614768_0001'. > [ERROR] Could not execute SQL statement. Reason: > java.lang.ClassNotFoundException: > org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat{code} > {code:java} > 2022-08-04 17:12:59 > org.apache.flink.runtime.JobException: Recovery is suppressed by > NoRestartBackoffTimeStrategy > at > org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:138) > at > org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:82) > at > org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:228) > at > org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:218) > at > org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:209) > at > org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:679) > at > org.apache.flink.runtime.scheduler.SchedulerNG.updateTaskExecutionState(SchedulerNG.java:79) > at > org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:444) > at sun.reflect.GeneratedMethodAccessor35.invoke(Unknown Source) > at >
[jira] [Updated] (HUDI-4542) Flink streaming query fails with ClassNotFoundException
[ https://issues.apache.org/jira/browse/HUDI-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-4542: Description: Environment: EMR 6.7.0 Flink 1.14.2 Reproducible steps: Build Hudi Flink bundle from master {code:java} mvn clean package -DskipTests -pl :hudi-flink1.14-bundle -am {code} Copy to EMR master node /lib/flink/lib Launch Flink SQL client: {code:java} cd /lib/flink && ./bin/yarn-session.sh --detached ./bin/sql-client.sh {code} Write a Hudi table with a few commits with metadata table enabled (no column stats). Then, run the following for the streaming query {code:java} CREATE TABLE t2( uuid VARCHAR(20) PRIMARY KEY NOT ENFORCED, name VARCHAR(10), age INT, ts TIMESTAMP(3), `partition` VARCHAR(20) ) PARTITIONED BY (`partition`) WITH ( 'connector' = 'hudi', 'path' = 's3a://', 'table.type' = 'MERGE_ON_READ', 'read.streaming.enabled' = 'true', -- this option enable the streaming read 'read.start-commit' = '20220803165232362', -- specifies the start commit instant time 'read.streaming.check-interval' = '4' -- specifies the check interval for finding new source commits, default 60s. ); {code} {code:java} select * from t2; {code} {code:java} Flink SQL> select * from t2; 2022-08-05 00:12:43,635 INFO org.apache.hadoop.metrics2.impl.MetricsConfig [] - Loaded properties from hadoop-metrics2.properties 2022-08-05 00:12:43,650 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl [] - Scheduled Metric snapshot period at 300 second(s). 2022-08-05 00:12:43,650 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl [] - s3a-file-system metrics system started 2022-08-05 00:12:47,722 INFO org.apache.hadoop.fs.s3a.S3AInputStream [] - Switching to Random IO seek policy 2022-08-05 00:12:47,941 INFO org.apache.hadoop.yarn.client.RMProxy [] - Connecting to ResourceManager at ip-172-31-9-157.us-east-2.compute.internal/172.31.9.157:8032 2022-08-05 00:12:47,942 INFO org.apache.hadoop.yarn.client.AHSProxy [] - Connecting to Application History server at ip-172-31-9-157.us-east-2.compute.internal/172.31.9.157:10200 2022-08-05 00:12:47,942 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar 2022-08-05 00:12:47,942 WARN org.apache.flink.yarn.YarnClusterDescriptor [] - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR environment variable is set.The Flink YARN Client needs one of these to be set to properly load the Hadoop configuration for accessing YARN. 2022-08-05 00:12:47,959 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Found Web Interface ip-172-31-3-92.us-east-2.compute.internal:39605 of application 'application_1659656614768_0001'. [ERROR] Could not execute SQL statement. Reason: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat{code} {code:java} 2022-08-04 17:12:59 org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategy at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:138) at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:82) at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:228) at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:218) at org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:209) at org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:679) at org.apache.flink.runtime.scheduler.SchedulerNG.updateTaskExecutionState(SchedulerNG.java:79) at org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:444) at sun.reflect.GeneratedMethodAccessor35.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$handleRpcInvocation$1(AkkaRpcActor.java:316) at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:83) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:314) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:217) at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:78) at
[jira] [Updated] (HUDI-4542) Flink streaming query fails with ClassNotFoundException
[ https://issues.apache.org/jira/browse/HUDI-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-4542: Component/s: flink-sql > Flink streaming query fails with ClassNotFoundException > --- > > Key: HUDI-4542 > URL: https://issues.apache.org/jira/browse/HUDI-4542 > Project: Apache Hudi > Issue Type: Bug > Components: flink-sql >Reporter: Ethan Guo >Priority: Major > Fix For: 0.12.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4542) Flink streaming query fails with ClassNotFoundException
[ https://issues.apache.org/jira/browse/HUDI-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-4542: Description: Environment: EMR 6.7.0 Flink 1.14.2 Reproducible steps: Build Hudi Flink bundle from master {code:java} mvn clean package -DskipTests -pl :hudi-flink1.14-bundle -am {code} Copy to EMR master node /lib/flink/lib Launch Flink SQL client: {code:java} cd /lib/flink && ./bin/yarn-session.sh --detached ./bin/sql-client.sh {code} Write a Hudi table with a few commits with metadata table enabled (no column stats). Then, run the following for the streaming query {code:java} CREATE TABLE t2( uuid VARCHAR(20) PRIMARY KEY NOT ENFORCED, name VARCHAR(10), age INT, ts TIMESTAMP(3), `partition` VARCHAR(20) ) PARTITIONED BY (`partition`) WITH ( 'connector' = 'hudi', 'path' = 's3a://', 'table.type' = 'MERGE_ON_READ', 'read.streaming.enabled' = 'true', -- this option enable the streaming read 'read.start-commit' = '20220803165232362', -- specifies the start commit instant time 'read.streaming.check-interval' = '4' -- specifies the check interval for finding new source commits, default 60s. ); {code} {code:java} select * from t2; {code} {code:java} Flink SQL> select * from t2; 2022-08-05 00:12:43,635 INFO org.apache.hadoop.metrics2.impl.MetricsConfig [] - Loaded properties from hadoop-metrics2.properties 2022-08-05 00:12:43,650 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl [] - Scheduled Metric snapshot period at 300 second(s). 2022-08-05 00:12:43,650 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl [] - s3a-file-system metrics system started 2022-08-05 00:12:47,722 INFO org.apache.hadoop.fs.s3a.S3AInputStream [] - Switching to Random IO seek policy 2022-08-05 00:12:47,941 INFO org.apache.hadoop.yarn.client.RMProxy [] - Connecting to ResourceManager at ip-172-31-9-157.us-east-2.compute.internal/172.31.9.157:8032 2022-08-05 00:12:47,942 INFO org.apache.hadoop.yarn.client.AHSProxy [] - Connecting to Application History server at ip-172-31-9-157.us-east-2.compute.internal/172.31.9.157:10200 2022-08-05 00:12:47,942 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar 2022-08-05 00:12:47,942 WARN org.apache.flink.yarn.YarnClusterDescriptor [] - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR environment variable is set.The Flink YARN Client needs one of these to be set to properly load the Hadoop configuration for accessing YARN. 2022-08-05 00:12:47,959 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Found Web Interface ip-172-31-3-92.us-east-2.compute.internal:39605 of application 'application_1659656614768_0001'. [ERROR] Could not execute SQL statement. Reason: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat{code} {code:java} 2022-08-04 17:12:59org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategyat org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:138) at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:82) at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:228) at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:218) at org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:209) at org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:679) at org.apache.flink.runtime.scheduler.SchedulerNG.updateTaskExecutionState(SchedulerNG.java:79) at org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:444) at sun.reflect.GeneratedMethodAccessor35.invoke(Unknown Source)at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498)at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$handleRpcInvocation$1(AkkaRpcActor.java:316) at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:83) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:314) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:217) at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:78) at
[jira] [Updated] (HUDI-4542) Flink streaming query fails with ClassNotFoundException
[ https://issues.apache.org/jira/browse/HUDI-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-4542: Attachment: Screen Shot 2022-08-04 at 17.17.42.png > Flink streaming query fails with ClassNotFoundException > --- > > Key: HUDI-4542 > URL: https://issues.apache.org/jira/browse/HUDI-4542 > Project: Apache Hudi > Issue Type: Bug > Components: flink-sql >Reporter: Ethan Guo >Priority: Major > Fix For: 0.12.0 > > Attachments: Screen Shot 2022-08-04 at 17.17.42.png > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4542) Flink streaming query fails with ClassNotFoundException
[ https://issues.apache.org/jira/browse/HUDI-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-4542: Fix Version/s: 0.12.0 > Flink streaming query fails with ClassNotFoundException > --- > > Key: HUDI-4542 > URL: https://issues.apache.org/jira/browse/HUDI-4542 > Project: Apache Hudi > Issue Type: Bug >Reporter: Ethan Guo >Priority: Major > Fix For: 0.12.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-4542) Flink streaming query fails with ClassNotFoundException
[ https://issues.apache.org/jira/browse/HUDI-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-4542: Issue Type: Bug (was: Improvement) > Flink streaming query fails with ClassNotFoundException > --- > > Key: HUDI-4542 > URL: https://issues.apache.org/jira/browse/HUDI-4542 > Project: Apache Hudi > Issue Type: Bug >Reporter: Ethan Guo >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)