pan3793 commented on code in PR #7176:
URL: https://github.com/apache/kyuubi/pull/7176#discussion_r2306146265
##########
kyuubi-server/src/main/scala/org/apache/kyuubi/server/metadata/jdbc/JdbcDatabaseDialect.scala:
##########
@@ -31,6 +32,10 @@ class GenericDatabaseDialect extends JdbcDatabaseDialect {
s"LIMIT $limit OFFSET $offset"
}
+ override def deleteFromLimitClause(limit: Int): String = {
+ "" // Generic dialect does not support LIMIT in DELETE statements
Review Comment:
let's add a warning log here. it won't bother the user since generally, it
won't fall back to `GenericDatabaseDialect`
##########
kyuubi-server/src/main/scala/org/apache/kyuubi/server/metadata/MetadataManager.scala:
##########
@@ -253,6 +256,31 @@ class MetadataManager extends
AbstractService("MetadataManager") {
TimeUnit.MILLISECONDS)
}
+ @VisibleForTesting
+ private[metadata] def cleanupMetadata(maxAge: Long, batchSize: Int,
batchInterval: Long): Unit = {
+ var needToCleanMetadata = true
+ var needToCleanKubernetesInfo = true
+
+ while (needToCleanMetadata || needToCleanKubernetesInfo) {
Review Comment:
let's have an enforced loop round threshold(maybe `100`, no need to be
configurableļ¼ hardcode is fine) to guard this while loop.
##########
kyuubi-common/src/main/scala/org/apache/kyuubi/config/KyuubiConf.scala:
##########
@@ -2077,6 +2077,24 @@ object KyuubiConf {
.timeConf
.createWithDefault(Duration.ofMinutes(30).toMillis)
+ val METADATA_CLEANER_BATCH_SIZE: ConfigEntry[Int] =
+ buildConf("kyuubi.metadata.cleaner.batch.size")
+ .serverOnly
+ .doc("The batch size for cleaning expired metadata. " +
+ "This is used to avoid holding the delete lock for a long time " +
+ "when there are too many expired metadata to be cleaned.")
+ .version("1.11.0")
+ .intConf
+ .createWithDefault(Int.MaxValue)
Review Comment:
I expect this is a general perf optimization, should we give a more
reasonable default value? e.g. 1000 or 3000
##########
kyuubi-common/src/main/scala/org/apache/kyuubi/config/KyuubiConf.scala:
##########
@@ -2077,6 +2077,24 @@ object KyuubiConf {
.timeConf
.createWithDefault(Duration.ofMinutes(30).toMillis)
+ val METADATA_CLEANER_BATCH_SIZE: ConfigEntry[Int] =
+ buildConf("kyuubi.metadata.cleaner.batch.size")
+ .serverOnly
+ .doc("The batch size for cleaning expired metadata. " +
+ "This is used to avoid holding the delete lock for a long time " +
+ "when there are too many expired metadata to be cleaned.")
+ .version("1.11.0")
+ .intConf
+ .createWithDefault(Int.MaxValue)
+
+ val METADATA_CLEANER_BATCH_INTERVAL: ConfigEntry[Long] =
+ buildConf("kyuubi.metadata.cleaner.batch.interval")
+ .serverOnly
+ .internal
+ .doc("The interval to wait before next batch of cleaning expired
metadata.")
+ .timeConf
+ .createWithDefault(Duration.ofMillis(100).toMillis)
Review Comment:
this is a lower-priority work, let's have at least 1s interval to avoid
impact for batch scheduling
##########
kyuubi-server/src/main/scala/org/apache/kyuubi/server/metadata/MetadataManager.scala:
##########
@@ -253,6 +256,31 @@ class MetadataManager extends
AbstractService("MetadataManager") {
TimeUnit.MILLISECONDS)
}
+ @VisibleForTesting
+ private[metadata] def cleanupMetadata(maxAge: Long, batchSize: Int,
batchInterval: Long): Unit = {
+ var needToCleanMetadata = true
+ var needToCleanKubernetesInfo = true
+
+ while (needToCleanMetadata || needToCleanKubernetesInfo) {
+ if (needToCleanMetadata) {
+ needToCleanMetadata =
+ withMetadataRequestMetrics(_metadataStore.cleanupMetadataByAge(
+ maxAge,
+ batchSize)) >= batchSize
+ }
+ if (needToCleanKubernetesInfo) {
+ needToCleanKubernetesInfo =
+
withMetadataRequestMetrics(_metadataStore.cleanupKubernetesEngineInfoByAge(
+ maxAge,
+ batchSize)) >= batchSize
+ }
+ if (needToCleanMetadata || needToCleanKubernetesInfo) {
+ info("Sleep for " + batchInterval + "ms before next metadata cleanup
batch")
Review Comment:
why not use an interpreted string?
##########
kyuubi-server/src/main/scala/org/apache/kyuubi/server/metadata/MetadataStore.scala:
##########
@@ -102,8 +102,13 @@ trait MetadataStore extends Closeable {
/**
* Check and cleanup the terminated batches information with maxAge
limitation.
* @param maxAge the batch state info maximum age.
+ * @param limit the maximum number of metadata to be cleaned up.
*/
- def cleanupMetadataByAge(maxAge: Long): Unit
+ def cleanupMetadataByAge(maxAge: Long, limit: Int): Int
+
+ def cleanupMetadataByAge(maxAge: Long): Int = {
+ cleanupMetadataByAge(maxAge, Int.MaxValue)
Review Comment:
it worth nothing to keep this API, since the return value changed
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]