刘珍 created IOTDB-5467:
-------------------------

             Summary: Execute query : ERROR o.a.i.d.m.e.e.RegionWriteExecutor:88 - Fetch Schema failed
                 Key: IOTDB-5467
                 URL: https://issues.apache.org/jira/browse/IOTDB-5467
             Project: Apache IoTDB
          Issue Type: Bug
          Components: mpp-cluster
    Affects Versions: 1.0.1
            Reporter: 刘珍
            Assignee: Minghui Liu
         Attachments: image-2023-02-03-14-39-21-109.png, image-2023-02-03-14-39-33-195.png

Test version: rc/1.0.1 20230129 573097a

Problem description:
Start a 3-replica cluster with 3 ConfigNodes and 21 DataNodes (3C21D).
At 2023-01-31 16:40:00, one Benchmark instance was started against ip2 (operation interval OP_INTERVAL=1000) to run reads and writes, and the TTL was set to 1 hour (script attached).
At 2023-02-02 02:21:08, the ConfigNode leader (ip23) reported an error: it could not connect to the DataNode on ip7 (Unknown).

IP2 datanode log:
2023-02-02 02:51:04,625 [pool-26-IoTDB-DataNodeInternalRPC-Processor-9]{color:red}* ERROR o.a.i.d.m.e.e.RegionWriteExecutor:88 - Fetch Schema failed.
java.lang.RuntimeException: Fetch Schema failed.*{color}
    at org.apache.iotdb.db.mpp.plan.analyze.ClusterSchemaFetcher.executeSchemaFetchQuery(ClusterSchemaFetcher.java:202)
    at org.apache.iotdb.db.mpp.plan.analyze.ClusterSchemaFetcher.fetchSchema(ClusterSchemaFetcher.java:156)
    at org.apache.iotdb.db.mpp.plan.analyze.ClusterSchemaFetcher.fetchSchema(ClusterSchemaFetcher.java:98)
    at org.apache.iotdb.db.mpp.plan.analyze.ClusterSchemaFetcher.fetchSchemaWithAutoCreate(ClusterSchemaFetcher.java:265)
    at org.apache.iotdb.db.mpp.plan.analyze.SchemaValidator.validate(SchemaValidator.java:56)
    at org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.executeDataInsert(RegionWriteExecutor.java:202)
    at org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.visitInsertTablet(RegionWriteExecutor.java:174)
    at org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.visitInsertTablet(RegionWriteExecutor.java:128)
    at org.apache.iotdb.db.mpp.plan.planner.plan.node.write.InsertTabletNode.accept(InsertTabletNode.java:1086)
    at org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor.execute(RegionWriteExecutor.java:86)
    at org.apache.iotdb.db.service.thrift.impl.DataNodeInternalRPCServiceImpl.sendPlanNode(DataNodeInternalRPCServiceImpl.java:288)
    at org.apache.iotdb.mpp.rpc.thrift.IDataNodeRPCService$Processor$sendPlanNode.getResult(IDataNodeRPCService.java:3607)
    at org.apache.iotdb.mpp.rpc.thrift.IDataNodeRPCService$Processor$sendPlanNode.getResult(IDataNodeRPCService.java:3587)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:248)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.apache.iotdb.commons.exception.IoTDBException: org.apache.iotdb.db.mpp.execution.fragment.FragmentInstanceFailureInfo$FailureException
    at org.apache.iotdb.db.mpp.plan.execution.QueryExecution.dealWithException(QueryExecution.java:428)
    at org.apache.iotdb.db.mpp.plan.execution.QueryExecution.getResult(QueryExecution.java:411)
    at org.apache.iotdb.db.mpp.plan.execution.QueryExecution.getBatchResult(QueryExecution.java:437)
    at org.apache.iotdb.db.mpp.plan.analyze.ClusterSchemaFetcher.executeSchemaFetchQuery(ClusterSchemaFetcher.java:200)
    ... 18 common frames omitted
Caused by: org.apache.iotdb.db.mpp.execution.fragment.FragmentInstanceFailureInfo$FailureException: null
    at org.apache.iotdb.db.mpp.execution.fragment.FragmentInstanceManager.lambda$cancelTimeoutFlushingInstances$8(FragmentInstanceManager.java:288)
    at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
    at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
    at java.base/java.util.concurrent.ConcurrentHashMap$EntrySpliterator.forEachRemaining(ConcurrentHashMap.java:3645)
    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
    at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
    at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
    at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
    at org.apache.iotdb.db.mpp.execution.fragment.FragmentInstanceManager.cancelTimeoutFlushingInstances(FragmentInstanceManager.java:288)
    at org.apache.iotdb.commons.concurrent.threadpool.ScheduledExecutorUtil.lambda$scheduleWithFixedDelay$1(ScheduledExecutorUtil.java:177)
    at org.apache.iotdb.commons.concurrent.WrappedRunnable$1.runMayThrow(WrappedRunnable.java:44)
    at org.apache.iotdb.commons.concurrent.WrappedRunnable.run(WrappedRunnable.java:29)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)

2023-02-02 04:00:52,056 [pool-26-IoTDB-DataNodeInternalRPC-Processor-16] ERROR o.a.i.d.m.e.e.RegionWriteExecutor:88 - cannot fetch schema, status is: 301, msg is: {color:red}*There is not enough memory to execute current fragment instance, current remaining free memory is 10409, estimated memory usage for current fragment instance is 131072*{color}
java.lang.RuntimeException: cannot fetch schema, status is: 301, msg is: There is not enough memory to execute current fragment instance, current remaining free memory is 10409, estimated memory usage for current fragment instance is 131072
    at org.apache.iotdb.db.mpp.plan.analyze.ClusterSchemaFetcher.executeSchemaFetchQuery(ClusterSchemaFetcher.java:188)
    at org.apache.iotdb.db.mpp.plan.analyze.ClusterSchemaFetcher.fetchSchema(ClusterSchemaFetcher.java:156)
    at org.apache.iotdb.db.mpp.plan.analyze.ClusterSchemaFetcher.fetchSchema(ClusterSchemaFetcher.java:98)
    at org.apache.iotdb.db.mpp.plan.analyze.ClusterSchemaFetcher.fetchSchemaWithAutoCreate(ClusterSchemaFetcher.java:265)
    at org.apache.iotdb.db.mpp.plan.analyze.SchemaValidator.validate(SchemaValidator.java:56)
    at org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.executeDataInsert(RegionWriteExecutor.java:202)
    at org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.visitInsertTablet(RegionWriteExecutor.java:174)
    at org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.visitInsertTablet(RegionWriteExecutor.java:128)
    at org.apache.iotdb.db.mpp.plan.planner.plan.node.write.InsertTabletNode.accept(InsertTabletNode.java:1086)
    at org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor.execute(RegionWriteExecutor.java:86)
    at org.apache.iotdb.db.service.thrift.impl.DataNodeInternalRPCServiceImpl.sendPlanNode(DataNodeInternalRPCServiceImpl.java:288)
    at org.apache.iotdb.mpp.rpc.thrift.IDataNodeRPCService$Processor$sendPlanNode.getResult(IDataNodeRPCService.java:3607)
    at org.apache.iotdb.mpp.rpc.thrift.IDataNodeRPCService$Processor$sendPlanNode.getResult(IDataNodeRPCService.java:3587)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:248)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)

Test details:
1. Start the 3C21D cluster
3C: 172.16.2.23/24/25
/data/iotdb/r_0129_573097a/logs
21D: 172.16.2.2 ~ 172.16.2.22
/data1/iotdb/r_0129_573097a

Configuration parameters:
ConfigNode configuration:
MAX_HEAP_SIZE="20G"
MAX_DIRECT_MEMORY_SIZE="6G"
cn_target_config_node_list=172.16.2.23:10710

DataNode configuration:
MAX_HEAP_SIZE="20G"
MAX_DIRECT_MEMORY_SIZE="6G"
dn_target_config_node_list=172.16.2.23:10710,172.16.2.24:10710,172.16.2.25:10710

Common configuration:
schema_replication_factor=3
data_replication_factor=3

2. Start the Benchmark (configuration attached). The main parameters are:
DEVICE_NUMBER=4200
SENSOR_NUMBER=600
CLIENT_NUMBER=210
GROUP_NUMBER=1
OPERATION_PROPORTION=91:1:1:1:1:0:1:1:1:1:1

3. After the Benchmark starts writing data, start the script that sets the TTL (attached).

4. Check the logs
At 2023-02-02 02:21:08, the ConfigNode leader (ip23) reported an error: it could not connect to the DataNode on ip7 (Unknown), and ip7 could not be pinged.
For the DataNode error logs, see the problem description above.

Cluster status:
!image-2023-02-03-14-39-21-109.png!

Region status:
!image-2023-02-03-14-39-33-195.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)