Copilot commented on code in PR #5085:
URL: https://github.com/apache/texera/pull/5085#discussion_r3251682742
##########
common/workflow-core/src/main/scala/org/apache/texera/amber/core/storage/DocumentFactory.scala:
##########
@@ -103,6 +103,40 @@ object DocumentFactory {
}
}
+ /**
+ * Check whether a document exists at the given URI without opening it.
+ *
+ * Returns true iff the underlying storage already has an entry for this
+ * URI (e.g., an iceberg table at the resolved namespace + storage key).
+ * Useful for "create only if absent" flows that would otherwise have to
+ * call `openDocument` inside a try/catch to test existence.
+ */
+ def documentExists(uri: URI): Boolean = {
+ uri.getScheme match {
+ case VFS_FILE_URI_SCHEME =>
+ val (_, _, _, resourceType) = decodeURI(uri)
+ val storageKey = sanitizeURIPath(uri)
+
+ val namespace = resourceType match {
+ case RESULT => StorageConfig.icebergTableResultNamespace
+ case CONSOLE_MESSAGES => StorageConfig.icebergTableConsoleMessagesNamespace
+ case RUNTIME_STATISTICS => StorageConfig.icebergTableRuntimeStatisticsNamespace
+ case STATE => StorageConfig.icebergTableStateNamespace
+ case _ =>
+ throw new IllegalArgumentException(s"Resource type $resourceType is not supported")
+ }
+
+ IcebergUtil
+ .loadTableMetadata(IcebergCatalogInstance.getInstance(), namespace, storageKey)
+ .isDefined
+
Review Comment:
`documentExists` currently checks existence by calling
`IcebergUtil.loadTableMetadata(...).isDefined`. `loadTableMetadata` catches all
exceptions and returns `None`, so this can produce false negatives (returning
`false` when the table exists but the catalog load fails transiently). It also
loads the full table metadata, which is heavier than necessary. Consider using
the Iceberg catalog's `tableExists` (or a dedicated utility), treating only an
actual "not found" as `false` and surfacing unexpected errors.
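A minimal sketch of the "only not-found means `false`" pattern, written against
hypothetical stand-ins rather than the real Iceberg or Texera classes:
`TableCatalog`, `TableNotFound`, `InMemoryCatalog`, and `ExistenceCheck` are all
names assumed for illustration. In the actual code the same shape would
presumably wrap the Iceberg catalog call instead.

```scala
// Hypothetical stand-ins for illustration only; not Iceberg or Texera APIs.
final class TableNotFound(name: String) extends RuntimeException(s"Table $name not found")

trait TableCatalog {
  // Returns the table name on success, throws TableNotFound if the table is
  // absent, and may throw any other exception on catalog failure.
  def load(name: String): String
}

object ExistenceCheck {
  // Only a genuine "not found" maps to false; every other failure propagates
  // to the caller instead of being reported as "does not exist".
  def tableExists(catalog: TableCatalog, name: String): Boolean =
    try {
      catalog.load(name)
      true
    } catch {
      case _: TableNotFound => false
    }
}

// In-memory catalog that can simulate a transient outage via `failing`.
final class InMemoryCatalog(tables: Set[String], failing: Boolean = false) extends TableCatalog {
  def load(name: String): String = {
    if (failing) throw new IllegalStateException("catalog unavailable")
    if (tables.contains(name)) name else throw new TableNotFound(name)
  }
}
```

With this shape, a transient catalog failure reaches the caller as an exception
rather than being silently reported as a missing table.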
##########
common/workflow-core/src/main/scala/org/apache/texera/amber/core/storage/DocumentFactory.scala:
##########
@@ -103,6 +103,40 @@ object DocumentFactory {
}
}
+ /**
+ * Check whether a document exists at the given URI without opening it.
+ *
+ * Returns true iff the underlying storage already has an entry for this
+ * URI (e.g., an iceberg table at the resolved namespace + storage key).
+ * Useful for "create only if absent" flows that would otherwise have to
+ * call `openDocument` inside a try/catch to test existence.
+ */
+ def documentExists(uri: URI): Boolean = {
+ uri.getScheme match {
+ case VFS_FILE_URI_SCHEME =>
+ val (_, _, _, resourceType) = decodeURI(uri)
+ val storageKey = sanitizeURIPath(uri)
+
+ val namespace = resourceType match {
+ case RESULT => StorageConfig.icebergTableResultNamespace
+ case CONSOLE_MESSAGES => StorageConfig.icebergTableConsoleMessagesNamespace
+ case RUNTIME_STATISTICS => StorageConfig.icebergTableRuntimeStatisticsNamespace
+ case STATE => StorageConfig.icebergTableStateNamespace
+ case _ =>
+ throw new IllegalArgumentException(s"Resource type $resourceType is not supported")
+ }
+
+ IcebergUtil
+ .loadTableMetadata(IcebergCatalogInstance.getInstance(), namespace, storageKey)
+ .isDefined
+
+ case unsupportedScheme =>
+ throw new UnsupportedOperationException(
+ s"Unsupported URI scheme: $unsupportedScheme for checking the document"
Review Comment:
The exception message for unsupported schemes says "for checking the
document", which is ambiguous (this method isn't opening/creating a document).
Consider clarifying it to "for checking document existence" (or referencing
`documentExists`) to make debugging easier.
##########
common/workflow-core/src/main/scala/org/apache/texera/amber/core/storage/DocumentFactory.scala:
##########
@@ -103,6 +103,40 @@ object DocumentFactory {
}
}
+ /**
+ * Check whether a document exists at the given URI without opening it.
+ *
+ * Returns true iff the underlying storage already has an entry for this
+ * URI (e.g., an iceberg table at the resolved namespace + storage key).
+ * Useful for "create only if absent" flows that would otherwise have to
+ * call `openDocument` inside a try/catch to test existence.
+ */
+ def documentExists(uri: URI): Boolean = {
+ uri.getScheme match {
+ case VFS_FILE_URI_SCHEME =>
+ val (_, _, _, resourceType) = decodeURI(uri)
+ val storageKey = sanitizeURIPath(uri)
+
+ val namespace = resourceType match {
+ case RESULT => StorageConfig.icebergTableResultNamespace
+ case CONSOLE_MESSAGES => StorageConfig.icebergTableConsoleMessagesNamespace
+ case RUNTIME_STATISTICS => StorageConfig.icebergTableRuntimeStatisticsNamespace
+ case STATE => StorageConfig.icebergTableStateNamespace
+ case _ =>
+ throw new IllegalArgumentException(s"Resource type $resourceType is not supported")
+ }
Review Comment:
The `resourceType` -> namespace mapping is duplicated across
`createDocument`, `openDocument`, and now `documentExists`. This duplication
risks the methods diverging when new `VFSResourceType`s/namespaces are added.
Consider extracting a shared helper (e.g., `resolveNamespace(resourceType)`) to
keep the behavior consistent.
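One possible shape for such a helper, as a self-contained sketch. The
`ResourceType` hierarchy, the `OTHER` case, `NamespaceConfig`, and the namespace
strings below are stand-ins assumed for illustration; the real code would use
`VFSResourceType` and the `StorageConfig` values shown in the diff.

```scala
// Stand-ins for VFSResourceType; OTHER represents any unsupported type.
sealed trait ResourceType
case object RESULT extends ResourceType
case object CONSOLE_MESSAGES extends ResourceType
case object RUNTIME_STATISTICS extends ResourceType
case object STATE extends ResourceType
case object OTHER extends ResourceType

// Stand-in for StorageConfig; the namespace values are made up.
object NamespaceConfig {
  val resultNamespace = "result-ns"
  val consoleMessagesNamespace = "console-messages-ns"
  val runtimeStatisticsNamespace = "runtime-statistics-ns"
  val stateNamespace = "state-ns"
}

object NamespaceResolver {
  // Single place that maps a resource type to its namespace, so that the
  // create, open, and exists paths cannot drift apart.
  def resolveNamespace(resourceType: ResourceType): String =
    resourceType match {
      case RESULT             => NamespaceConfig.resultNamespace
      case CONSOLE_MESSAGES   => NamespaceConfig.consoleMessagesNamespace
      case RUNTIME_STATISTICS => NamespaceConfig.runtimeStatisticsNamespace
      case STATE              => NamespaceConfig.stateNamespace
      case other =>
        throw new IllegalArgumentException(s"Resource type $other is not supported")
    }
}
```

Each of the three methods would then call `resolveNamespace(resourceType)`
instead of repeating the `match`, so adding a resource type becomes a one-line
change.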
##########
common/workflow-core/src/main/scala/org/apache/texera/amber/core/storage/DocumentFactory.scala:
##########
@@ -103,6 +103,40 @@ object DocumentFactory {
}
}
+ /**
+ * Check whether a document exists at the given URI without opening it.
+ *
+ * Returns true iff the underlying storage already has an entry for this
+ * URI (e.g., an iceberg table at the resolved namespace + storage key).
+ * Useful for "create only if absent" flows that would otherwise have to
+ * call `openDocument` inside a try/catch to test existence.
+ */
+ def documentExists(uri: URI): Boolean = {
+ uri.getScheme match {
+ case VFS_FILE_URI_SCHEME =>
Review Comment:
`documentExists` adds new URI-handling behavior (supported scheme +
supported resource types + exception cases) but there are no unit tests
covering it. Since `workflow-core` already has Iceberg-backed tests using
`DocumentFactory.createDocument/openDocument`, it would be valuable to add a
spec that asserts `documentExists` returns true after `createDocument`, false
for a fresh URI, and throws on unsupported schemes/resource types.
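A rough sketch of the three checks, run against a hypothetical in-memory
stand-in instead of the Iceberg-backed `DocumentFactory`.
`InMemoryDocumentFactory`, `DocumentExistsChecks`, the `"vfs"` scheme value, and
the example URIs are all assumptions for illustration; a real spec would use the
project's test framework and the actual factory.

```scala
import java.net.URI
import scala.collection.mutable
import scala.util.Try

// Hypothetical in-memory stand-in for DocumentFactory; "vfs" is an assumed scheme.
object InMemoryDocumentFactory {
  private val SupportedScheme = "vfs"
  private val documents = mutable.Set.empty[URI]

  private def requireSupported(uri: URI): Unit =
    if (uri.getScheme != SupportedScheme)
      throw new UnsupportedOperationException(s"Unsupported URI scheme: ${uri.getScheme}")

  def createDocument(uri: URI): Unit = {
    requireSupported(uri)
    documents += uri
  }

  def documentExists(uri: URI): Boolean = {
    requireSupported(uri)
    documents.contains(uri)
  }
}

object DocumentExistsChecks {
  def run(): Unit = {
    val created = new URI("vfs:///wid/1/result")
    val fresh = new URI("vfs:///wid/2/result")
    InMemoryDocumentFactory.createDocument(created)

    // True after createDocument.
    assert(InMemoryDocumentFactory.documentExists(created))
    // False for a URI that was never created.
    assert(!InMemoryDocumentFactory.documentExists(fresh))
    // Throws on an unsupported scheme.
    val unsupported = Try(InMemoryDocumentFactory.documentExists(new URI("http://example.com/doc")))
    assert(unsupported.failed.toOption.exists(_.isInstanceOf[UnsupportedOperationException]))
  }
}
```

A fourth case for unsupported resource types would follow the same `Try`
pattern, asserting an `IllegalArgumentException`.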