dimas-b commented on code in PR #1686:
URL: https://github.com/apache/polaris/pull/1686#discussion_r2124454587
##########
extension/persistence/relational-jdbc/src/test/java/org/apache/polaris/extension/persistence/relational/jdbc/QueryGeneratorTest.java:
##########
@@ -222,4 +222,29 @@ void testGenerateWhereClause_emptyMap() {
Map<String, Object> whereClause = Collections.emptyMap();
assertEquals("", QueryGenerator.generateWhereClause(whereClause));
}
+
+ @Test
+ void testGenerateOverlapQuery() {
+ String realmId = "realmId";
+ int parentId = "polaris".hashCode();
+
+ assertEquals(
+ "SELECT entity_version, to_purge_timestamp, internal_properties, "
+ + "catalog_id, purge_timestamp, sub_type_code, create_timestamp,
last_update_timestamp, "
+ + "parent_id, name, location, id, drop_timestamp, properties,
grant_records_version, "
+ + "type_code FROM POLARIS_SCHEMA.ENTITIES WHERE realm_id =
'realmId' AND parent_id = "
+ + "-398224152 AND (1 = 2 OR location = '/' OR location = '/tmp/'
OR location = '/tmp/location/' "
+ + "OR location LIKE '/tmp/location/%')",
+ QueryGenerator.generateOverlapQuery(realmId, parentId,
"/tmp/location/"));
+
+ assertEquals(
+ "SELECT entity_version, to_purge_timestamp, internal_properties,
catalog_id, "
+ + "purge_timestamp, sub_type_code, create_timestamp,
last_update_timestamp, parent_id, "
+ + "name, location, id, drop_timestamp, properties,
grant_records_version, type_code "
+ + "FROM POLARIS_SCHEMA.ENTITIES WHERE realm_id = 'realmId' AND
parent_id = -398224152 "
+ + "AND (1 = 2 OR location = 's3:/' OR location = 's3://' OR
location = 's3://bucket/' OR "
+ + "location = 's3://bucket/tmp/' OR location =
's3://bucket/tmp/location/' OR location "
Review Comment:
> Do you mean what if a storage technology does that?
If we later support a storage technology with case-insensitive file paths,
the existing implementation, which is grounded in a database-level string
comparison feature, will have a hard time supporting it (initially it will not
work correctly, as far as I can tell).
Conversely, if the admin user for some reason (accidentally?) configures the
database to perform case-insensitive comparisons, it will break Polaris
Management API contracts (behaviour) for the current set of storage
technologies.
All in all, current approach appear to introduce too many obscure
dependencies between API-level behaviours and backend behaviours (IMHO).
I support the idea of leveraging database capabilities to facilitate finding
overlapping catalogs. However, I think it should be done in a way and any
oddities at the database level do not affect the correctness of user-visible
Polaris behaviour. In other words, if something is not aligned at the database
level, it would be acceptable for Polaris performance to degrade as long as
correctness is not affected.
In that regard, the presence of any entity with a location, but without an
indexed location, should probably invalidate the optimized lookup result. If
this requires admin user intervention, we have to make it explicit.
Also, the parsing of the URI into path components (prefixes) should probably
be generalized outside the Persistence layer (for reuse by all Persistence
implementations).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]