[ https://issues.apache.org/jira/browse/IOTDB-5467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
刘珍 reopened IOTDB-5467:
-----------------------

However, this exception needs to be analyzed:
java.lang.RuntimeException: cannot fetch schema, status is: 301, msg is: There is not enough memory to execute current fragment instance, current remaining free memory is 10409, estimated memory usage for current fragment instance is 131072

> Execute query : ERROR o.a.i.d.m.e.e.RegionWriteExecutor:88 - Fetch Schema failed
> --------------------------------------------------------------------------------
>
> Key: IOTDB-5467
> URL: https://issues.apache.org/jira/browse/IOTDB-5467
> Project: Apache IoTDB
> Issue Type: Bug
> Components: mpp-cluster
> Affects Versions: 1.0.1-SNAPSHOT
> Reporter: 刘珍
> Assignee: Minghui Liu
> Priority: Major
> Attachments: auto_set_ttl_per_sg.sh, confignode_ip23_leader_logs.tar.gz, datanode_ip2_fetch_schema_failed.tar.gz, image-2023-02-03-14-39-21-109.png, image-2023-02-03-14-39-33-195.png, lt.conf
>
>
> Test version: rc/1.0.1 20230129 573097a
> Problem description:
> Started a 3-replica 3C21D cluster (3 ConfigNodes, 21 DataNodes).
> 2023-01-31 16:40:00: started one Benchmark instance connected to ip2 (operation interval OP_INTERVAL=1000) performing reads and writes,
> and set the TTL to 1 hour (see the attached script).
> 2023-02-02 02:21:08: the ConfigNode leader (ip23) reported errors because it could not connect to the DataNode on ip7 (status: Unknown).
> DataNode log on ip2:
> 2023-02-02 02:51:04,625 [pool-26-IoTDB-DataNodeInternalRPC-Processor-9] ERROR o.a.i.d.m.e.e.RegionWriteExecutor:88 - Fetch Schema failed.
> java.lang.RuntimeException: Fetch Schema failed.
>     at org.apache.iotdb.db.mpp.plan.analyze.ClusterSchemaFetcher.executeSchemaFetchQuery(ClusterSchemaFetcher.java:202)
>     at org.apache.iotdb.db.mpp.plan.analyze.ClusterSchemaFetcher.fetchSchema(ClusterSchemaFetcher.java:156)
>     at org.apache.iotdb.db.mpp.plan.analyze.ClusterSchemaFetcher.fetchSchema(ClusterSchemaFetcher.java:98)
>     at org.apache.iotdb.db.mpp.plan.analyze.ClusterSchemaFetcher.fetchSchemaWithAutoCreate(ClusterSchemaFetcher.java:265)
>     at org.apache.iotdb.db.mpp.plan.analyze.SchemaValidator.validate(SchemaValidator.java:56)
>     at org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.executeDataInsert(RegionWriteExecutor.java:202)
>     at org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.visitInsertTablet(RegionWriteExecutor.java:174)
>     at org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.visitInsertTablet(RegionWriteExecutor.java:128)
>     at org.apache.iotdb.db.mpp.plan.planner.plan.node.write.InsertTabletNode.accept(InsertTabletNode.java:1086)
>     at org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor.execute(RegionWriteExecutor.java:86)
>     at org.apache.iotdb.db.service.thrift.impl.DataNodeInternalRPCServiceImpl.sendPlanNode(DataNodeInternalRPCServiceImpl.java:288)
>     at org.apache.iotdb.mpp.rpc.thrift.IDataNodeRPCService$Processor$sendPlanNode.getResult(IDataNodeRPCService.java:3607)
>     at org.apache.iotdb.mpp.rpc.thrift.IDataNodeRPCService$Processor$sendPlanNode.getResult(IDataNodeRPCService.java:3587)
>     at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
>     at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
>     at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:248)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: org.apache.iotdb.commons.exception.IoTDBException: org.apache.iotdb.db.mpp.execution.fragment.FragmentInstanceFailureInfo$FailureException
>     at org.apache.iotdb.db.mpp.plan.execution.QueryExecution.dealWithException(QueryExecution.java:428)
>     at org.apache.iotdb.db.mpp.plan.execution.QueryExecution.getResult(QueryExecution.java:411)
>     at org.apache.iotdb.db.mpp.plan.execution.QueryExecution.getBatchResult(QueryExecution.java:437)
>     at org.apache.iotdb.db.mpp.plan.analyze.ClusterSchemaFetcher.executeSchemaFetchQuery(ClusterSchemaFetcher.java:200)
>     ... 18 common frames omitted
> Caused by: org.apache.iotdb.db.mpp.execution.fragment.FragmentInstanceFailureInfo$FailureException: null
>     at org.apache.iotdb.db.mpp.execution.fragment.FragmentInstanceManager.lambda$cancelTimeoutFlushingInstances$8(FragmentInstanceManager.java:288)
>     at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
>     at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
>     at java.base/java.util.concurrent.ConcurrentHashMap$EntrySpliterator.forEachRemaining(ConcurrentHashMap.java:3645)
>     at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
>     at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
>     at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
>     at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
>     at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>     at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
>     at org.apache.iotdb.db.mpp.execution.fragment.FragmentInstanceManager.cancelTimeoutFlushingInstances(FragmentInstanceManager.java:288)
>     at org.apache.iotdb.commons.concurrent.threadpool.ScheduledExecutorUtil.lambda$scheduleWithFixedDelay$1(ScheduledExecutorUtil.java:177)
>     at org.apache.iotdb.commons.concurrent.WrappedRunnable$1.runMayThrow(WrappedRunnable.java:44)
>     at org.apache.iotdb.commons.concurrent.WrappedRunnable.run(WrappedRunnable.java:29)
>     at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>     at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
>     at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at java.base/java.lang.Thread.run(Thread.java:834)
> 2023-02-02 04:00:52,056 [pool-26-IoTDB-DataNodeInternalRPC-Processor-16] ERROR o.a.i.d.m.e.e.RegionWriteExecutor:88 - cannot fetch schema, status is: 301, msg is: There is not enough memory to execute current fragment instance, current remaining free memory is 10409, estimated memory usage for current fragment instance is 131072
> java.lang.RuntimeException: cannot fetch schema, status is: 301, msg is: There is not enough memory to execute current fragment instance, current remaining free memory is 10409, estimated memory usage for current fragment instance is 131072
>     at org.apache.iotdb.db.mpp.plan.analyze.ClusterSchemaFetcher.executeSchemaFetchQuery(ClusterSchemaFetcher.java:188)
>     at org.apache.iotdb.db.mpp.plan.analyze.ClusterSchemaFetcher.fetchSchema(ClusterSchemaFetcher.java:156)
>     at org.apache.iotdb.db.mpp.plan.analyze.ClusterSchemaFetcher.fetchSchema(ClusterSchemaFetcher.java:98)
>     at org.apache.iotdb.db.mpp.plan.analyze.ClusterSchemaFetcher.fetchSchemaWithAutoCreate(ClusterSchemaFetcher.java:265)
>     at org.apache.iotdb.db.mpp.plan.analyze.SchemaValidator.validate(SchemaValidator.java:56)
>     at org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.executeDataInsert(RegionWriteExecutor.java:202)
>     at org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.visitInsertTablet(RegionWriteExecutor.java:174)
>     at org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor$WritePlanNodeExecutionVisitor.visitInsertTablet(RegionWriteExecutor.java:128)
>     at org.apache.iotdb.db.mpp.plan.planner.plan.node.write.InsertTabletNode.accept(InsertTabletNode.java:1086)
>     at org.apache.iotdb.db.mpp.execution.executor.RegionWriteExecutor.execute(RegionWriteExecutor.java:86)
>     at org.apache.iotdb.db.service.thrift.impl.DataNodeInternalRPCServiceImpl.sendPlanNode(DataNodeInternalRPCServiceImpl.java:288)
>     at org.apache.iotdb.mpp.rpc.thrift.IDataNodeRPCService$Processor$sendPlanNode.getResult(IDataNodeRPCService.java:3607)
>     at org.apache.iotdb.mpp.rpc.thrift.IDataNodeRPCService$Processor$sendPlanNode.getResult(IDataNodeRPCService.java:3587)
>     at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
>     at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
>     at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:248)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at java.base/java.lang.Thread.run(Thread.java:834)
> Test details:
> 1. Start the 3C21D cluster
> 3C: 172.16.2.23/24/25 /data/iotdb/r_0129_573097a/logs
> 21D: 172.16.2.2 ~ 172.16.2.22 /data1/iotdb/r_0129_573097a
> Configuration parameters:
> ConfigNode configuration:
> MAX_HEAP_SIZE="20G"
> MAX_DIRECT_MEMORY_SIZE="6G"
> cn_target_config_node_list=172.16.2.23:10710
> DataNode configuration:
> MAX_HEAP_SIZE="20G"
> MAX_DIRECT_MEMORY_SIZE="6G"
> dn_target_config_node_list=172.16.2.23:10710,172.16.2.24:10710,172.16.2.25:10710
> Common configuration:
> schema_replication_factor=3
> data_replication_factor=3
> 2. Start the Benchmark (configuration attached); main parameters:
> DEVICE_NUMBER=4200
> SENSOR_NUMBER=600
> CLIENT_NUMBER=210
> GROUP_NUMBER=1
> OPERATION_PROPORTION=91:1:1:1:1:0:1:1:1:1:1
> 3. After the Benchmark starts writing data, start the TTL-setting script (attached).
> The script is on 172.16.2.205 in /data/iotdb/deploy_mpp_scripts_0110
> 4. Check the logs.
> 2023-02-02 02:21:08: the ConfigNode leader (ip23) reported errors because it could not connect to the DataNode on ip7 (status: Unknown); ip7 could not be pinged.
> For the DataNode error logs, see the problem description.
> Cluster status:
> !image-2023-02-03-14-39-21-109.png!
> Region status:
> !image-2023-02-03-14-39-33-195.png!
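The 301 rejection in the ip2 DataNode log comes from a memory admission check: the schema-fetch fragment instance was estimated to need 131072 (2^17, presumably bytes, i.e. 128 KiB) while only 10409 remained free, so it was refused before it ran. The Java sketch below models that check with a hypothetical `FragmentMemoryPool`; the class, its methods, and the 200/301 status constants are illustrative assumptions, not IoTDB's actual memory-management API. It reserves memory with a compare-and-set loop and rejects any request larger than what is currently free.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical admission pool for fragment instances (a sketch, not IoTDB's implementation). */
final class FragmentMemoryPool {
    static final int OK_STATUS = 200;       // illustrative "success" code
    static final int REJECTED_STATUS = 301; // code seen in the log; reused here for illustration only

    /** Outcome of one admission attempt. */
    static final class Admission {
        final int status;
        final String message;

        Admission(int status, String message) {
            this.status = status;
            this.message = message;
        }
    }

    private final AtomicLong freeBytes;

    FragmentMemoryPool(long capacityBytes) {
        this.freeBytes = new AtomicLong(capacityBytes);
    }

    /** Atomically reserve the estimate, or reject if it exceeds the free amount. */
    Admission tryReserve(long estimatedBytes) {
        while (true) {
            long free = freeBytes.get();
            if (estimatedBytes > free) {
                // Nothing is reserved on rejection, so free memory is unchanged.
                return new Admission(REJECTED_STATUS, String.format(
                        "There is not enough memory to execute current fragment instance, "
                                + "current remaining free memory is %d, "
                                + "estimated memory usage for current fragment instance is %d",
                        free, estimatedBytes));
            }
            // Retry if another thread changed the free amount between get() and CAS.
            if (freeBytes.compareAndSet(free, free - estimatedBytes)) {
                return new Admission(OK_STATUS, "OK");
            }
        }
    }

    /** Give memory back when an instance finishes, fails, or is cancelled. */
    void release(long bytes) {
        freeBytes.addAndGet(bytes);
    }

    long free() {
        return freeBytes.get();
    }

    public static void main(String[] args) {
        FragmentMemoryPool pool = new FragmentMemoryPool(10_409);
        Admission a = pool.tryReserve(131_072);
        System.out.println(a.status + ": " + a.message);
    }
}
```

Admission only succeeds while earlier reservations have been released. If reservations are not returned (for example, by instances that fail or are cancelled), free memory only shrinks, until even a 128 KiB request is rejected; whether that happens here is a hypothesis to check, not something the logs confirm.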
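The cause chain of the first error ends in `FragmentInstanceFailureInfo$FailureException: null`, raised from `FragmentInstanceManager.cancelTimeoutFlushingInstances` on a fixed-delay scheduler thread. Judging by the method name, a fragment instance that stayed in a flushing state longer than some timeout was failed, and that failure surfaced upstream as `Fetch Schema failed`. The sketch below is a hypothetical `FlushingInstanceTracker` (not IoTDB's code) showing the general pattern: scan a `ConcurrentHashMap` of instances and fail those that have been flushing longer than a timeout. The clock is passed in as a parameter so the behavior is deterministic; the timeout values are arbitrary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical tracker of fragment instances that are flushing results (a sketch). */
final class FlushingInstanceTracker {
    enum State { RUNNING, FLUSHING, FINISHED, FAILED }

    static final class Instance {
        final String id;
        volatile State state = State.RUNNING;
        volatile long flushingSinceMillis = -1;
        volatile String failureReason; // stays null unless a reason is recorded

        Instance(String id) {
            this.id = id;
        }
    }

    private final Map<String, Instance> instances = new ConcurrentHashMap<>();

    /** Registers an instance, or returns the existing one with this id. */
    Instance register(String id) {
        return instances.computeIfAbsent(id, Instance::new);
    }

    void markFlushing(String id, long nowMillis) {
        Instance inst = instances.get(id);
        if (inst != null) {
            inst.state = State.FLUSHING;
            inst.flushingSinceMillis = nowMillis;
        }
    }

    /**
     * Fails every instance that has been FLUSHING for longer than timeoutMillis and
     * returns the cancelled ids. The check-then-set transition is not atomic, which is
     * acceptable for a single scheduler thread but not for concurrent writers.
     */
    List<String> cancelTimeoutFlushingInstances(long nowMillis, long timeoutMillis) {
        List<String> cancelled = new ArrayList<>();
        instances.values().forEach(inst -> {
            if (inst.state == State.FLUSHING
                    && nowMillis - inst.flushingSinceMillis > timeoutMillis) {
                inst.state = State.FAILED;
                inst.failureReason = "flushing timed out after " + timeoutMillis + " ms";
                cancelled.add(inst.id);
            }
        });
        return cancelled;
    }

    public static void main(String[] args) {
        FlushingInstanceTracker tracker = new FlushingInstanceTracker();
        tracker.register("fi-1");
        tracker.markFlushing("fi-1", 0L);
        System.out.println(tracker.cancelTimeoutFlushingInstances(60_000L, 30_000L));
    }
}
```

In a server, `cancelTimeoutFlushingInstances(System.currentTimeMillis(), timeout)` would be called periodically through `ScheduledExecutorService.scheduleWithFixedDelay`, which is consistent with the `ScheduledThreadPoolExecutor` and `FutureTask.runAndReset` frames in the trace. The sketch records an explicit failure reason; the `null` in the log suggests the real failure carried no message, which makes the resulting schema-fetch error harder to diagnose.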
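The benchmark parameters imply a large schema for the write path shown in the traces, where each insert goes through `SchemaValidator.validate` and a schema fetch. 4200 devices × 600 sensors gives 2,520,000 time series, each stored with `schema_replication_factor=3`; 210 clients share the 4200 devices, 20 each; and the 11 entries of `OPERATION_PROPORTION` sum to 100. Assuming, as in iot-benchmark, that the first entry is the write (ingestion) share, about 91% of operations are writes. The small Java helper below just does this arithmetic; the class and method names are hypothetical.

```java
import java.util.Arrays;

/** Back-of-the-envelope sizing for the benchmark parameters in this report. */
final class WorkloadSizing {
    static long totalSeries(long devices, long sensorsPerDevice) {
        return devices * sensorsPerDevice;
    }

    static long devicesPerClient(long devices, long clients) {
        return devices / clients; // integer division; 4200 / 210 divides evenly
    }

    /** Fraction (0..1) of the first entry in a colon-separated proportion string. */
    static double firstShare(String proportion) {
        int[] parts = Arrays.stream(proportion.split(":"))
                .mapToInt(Integer::parseInt)
                .toArray();
        int sum = Arrays.stream(parts).sum();
        return (double) parts[0] / sum;
    }

    public static void main(String[] args) {
        long series = totalSeries(4200, 600);
        System.out.println("time series: " + series);
        System.out.println("schema replicas (factor 3): " + series * 3);
        System.out.println("devices per client: " + devicesPerClient(4200, 210));
        System.out.println("first-entry share: " + firstShare("91:1:1:1:1:0:1:1:1:1:1"));
    }
}
```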
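The test sets a 1-hour TTL while the benchmark is writing. With IoTDB's default millisecond timestamp precision, TTL is given in milliseconds, so one hour is 3,600,000, and data older than the current time minus the TTL is treated as expired. The sketch below computes that value, checks expiry for a timestamp, and builds a `SET TTL TO <path> <millis>` statement. The statement form is assumed to follow IoTDB's `SET TTL` syntax; the storage-group name `root.test.g_0` is hypothetical, the attached `auto_set_ttl_per_sg.sh` may issue different statements, and treating a timestamp exactly at the cutoff as not expired is also an assumption.

```java
import java.time.Duration;

/** Hypothetical helpers illustrating a 1-hour TTL in milliseconds (a sketch). */
final class TtlSketch {
    static long ttlMillis(Duration ttl) {
        return ttl.toMillis();
    }

    /** Assumed rule: a point is expired once it is strictly older than now - ttl. */
    static boolean isExpired(long timestampMillis, long nowMillis, long ttlMillis) {
        return timestampMillis < nowMillis - ttlMillis;
    }

    /** Builds a SET TTL statement; syntax and path are assumptions for illustration. */
    static String setTtlStatement(String storageGroup, long ttlMillis) {
        return "SET TTL TO " + storageGroup + " " + ttlMillis;
    }

    public static void main(String[] args) {
        long ttl = ttlMillis(Duration.ofHours(1));
        System.out.println(setTtlStatement("root.test.g_0", ttl));
    }
}
```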