Re: [PR] [AINode] Adding scheduler to support concurrent inference [iotdb]

via GitHub Fri, 25 Jul 2025 22:16:25 -0700


yunbow30944 commented on code in PR #16005:
URL: https://github.com/apache/iotdb/pull/16005#discussion_r2232580192



##########
iotdb-core/ainode/ainode/core/inference/scheduler/basic_scheduler.py:
##########
@@ -0,0 +1,69 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+import torch
+
+from ainode.core.inference.inference_request import InferenceRequest
+from ainode.core.inference.scheduler.abstract_scheduler import AbstractScheduler
+from ainode.core.log import Logger
+
+logger = Logger()
+
+
+class BasicScheduler(AbstractScheduler):
+
+    def __init__(
+        self,
+        waiting_queue,
+        running_queue,
+        finished_queue,
+        max_memory_bytes=1 << 30,
+        max_activate_size=10,
+        max_step_size=10,
+    ):
+        super().__init__(waiting_queue, running_queue, finished_queue)
+        self.max_memory_bytes = max_memory_bytes
+        self.max_activate_size = max_activate_size
+        self.max_step_size = max_step_size
+
+    def memory_is_available(self):
+        used = torch.cuda.memory_allocated()  # memory allocated to tensors

Review Comment:
   `torch.cuda.memory_allocated()` reports the memory allocated by this
process (i.e., the inference request pool), but it only counts memory
occupied by tensors.
   The total GPU memory reserved by the PyTorch caching allocator for this
process, including cached blocks held for reuse, can be obtained via
`torch.cuda.memory_reserved()`.
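A minimal sketch of how `memory_is_available` might budget against reserved rather than allocated memory, assuming `max_memory_bytes` is meant to cover everything the caching allocator holds. The free-function form, the helper `gpu_memory_in_use`, and the `used_fn` hook (added only so the check can be exercised without a GPU) are hypothetical and not part of the PR:

```python
from typing import Callable, Optional


def gpu_memory_in_use(include_cache: bool = True) -> int:
    """Hypothetical helper: bytes of GPU memory this process holds on the current device.

    include_cache=True  -> torch.cuda.memory_reserved(): everything the caching
                           allocator holds, including freed blocks kept for reuse.
    include_cache=False -> torch.cuda.memory_allocated(): only memory currently
                           occupied by tensors.
    """
    import torch  # imported lazily so memory_is_available can be tested without torch

    if not torch.cuda.is_available():
        return 0  # assumption: treat "no CUDA" as "nothing in use"
    if include_cache:
        return torch.cuda.memory_reserved()
    return torch.cuda.memory_allocated()


def memory_is_available(
    max_memory_bytes: int,
    used_fn: Optional[Callable[[], int]] = None,
) -> bool:
    """Return True while the process is still below its memory budget.

    used_fn is a hypothetical injection point; by default the reserved
    (cached + allocated) figure is used.
    """
    used = used_fn() if used_fn is not None else gpu_memory_in_use(include_cache=True)
    return used < max_memory_bytes
```

Counting `memory_reserved()` is the more conservative budget: freed blocks stay counted until `torch.cuda.empty_cache()` releases them, so the check tracks what the allocator actually holds on the device, which `memory_allocated()` can understate.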



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

