szehon-ho commented on code in PR #6698:
URL: https://github.com/apache/iceberg/pull/6698#discussion_r1131801430
##########
hive-metastore/src/main/java/org/apache/iceberg/hive/CachedClientPool.java:
##########
@@ -53,12 +70,14 @@ public class CachedClientPool implements
ClientPool<IMetaStoreClient, TException
properties,
CatalogProperties.CLIENT_POOL_CACHE_EVICTION_INTERVAL_MS,
CatalogProperties.CLIENT_POOL_CACHE_EVICTION_INTERVAL_MS_DEFAULT);
+ this.keySuppliers =
Review Comment:
While I like the keySupplier idea, wouldn't it be much simpler if we just
use a static map of suppliers, like:
```
static final Map<String, Supplier> keySuppliers = ImmutableMap.of(
"uri", () -> {
try {
return UserGroupInformation.getCurrentUser();
} catch (IOException e) {
throw new UncheckedIOException(e);
}
}),
"user_name" -> {
() -> {
try {
String userName =
UserGroupInformation.getCurrentUser().getUserName();
return UserNameElement.of(userName);
} catch (IOException e) {
throw new UncheckedIOException(e);
}
});
}
```
(as i dont see any need to generate this on the fly.) And then here in
CachedClientPool CTOR, we can just use key to use the conf to make a key?
##########
core/src/main/java/org/apache/iceberg/CatalogProperties.java:
##########
@@ -119,6 +119,26 @@ private CatalogProperties() {}
"client.pool.cache.eviction-interval-ms";
public static final long CLIENT_POOL_CACHE_EVICTION_INTERVAL_MS_DEFAULT =
TimeUnit.MINUTES.toMillis(5);
+ /**
+ * A comma separated list of elements that are used to compose the key of
the client pool cache.
+ *
+ * <p>The following elements are supported:
+ *
+ * <ul>
+ * <li>URI - as specified by {@link CatalogProperties#URI}. URI will be
the only element when
Review Comment:
Actually , I am now thinking I don't see any use-case where you would turn
off any of these from the key. What do you think? From user point of view,
probably simpler is better and we should have reasonable defaults.
Maybe we could have a configurable key but just limit it to to add
additional hive conf if we missed something, as a safety valve. In any case,
without dynamic loading, the user cant add additional code suppliers like ugi,
and thus is limited to just setting config values here.
cc @pvary @RussellSpitzer @flyrain for thoughts.
##########
hive-metastore/src/main/java/org/apache/iceberg/hive/CachedClientPool.java:
##########
@@ -87,4 +106,125 @@ public <R> R run(Action<R, IMetaStoreClient, TException>
action, boolean retry)
throws TException, InterruptedException {
return clientPool().run(action, retry);
}
+
+ @VisibleForTesting
+ static Key toKey(List<Supplier<Object>> suppliers) {
+ return
Key.of(suppliers.stream().map(Supplier::get).collect(Collectors.toList()));
+ }
+
+ @VisibleForTesting
+ static List<Supplier<Object>> extractKeySuppliers(String cacheKeys,
Configuration conf) {
+ URIElement uri =
URIElement.of(conf.get(HiveConf.ConfVars.METASTOREURIS.varname, ""));
+ if (cacheKeys == null || cacheKeys.isEmpty()) {
+ return Collections.singletonList(() -> uri);
+ }
+
+ // generate key elements in a certain order, so that the Key instances are
comparable
+ Set<KeyElementType> types =
Sets.newTreeSet(Comparator.comparingInt(Enum::ordinal));
+ Map<String, String> confElements = Maps.newTreeMap();
+ for (String element : cacheKeys.split(",", -1)) {
+ String trimmed = element.trim();
+ if (trimmed.toLowerCase(Locale.ROOT).startsWith(CONF_ELEMENT_PREFIX)) {
+ String key = trimmed.substring(CONF_ELEMENT_PREFIX.length());
+ ValidationException.check(
+ !confElements.containsKey(key), "Conf key element %s already
specified", key);
+ confElements.put(key, conf.get(key));
+ } else {
+ KeyElementType type = KeyElementType.valueOf(trimmed.toUpperCase());
+ switch (type) {
+ case URI:
+ case UGI:
+ case USER_NAME:
+ ValidationException.check(
+ types.add(type), "%s key element already specified",
type.name());
+ break;
+ default:
+ throw new ValidationException("Unknown key element %s", trimmed);
+ }
+ }
+ }
+ ImmutableList.Builder<Supplier<Object>> suppliers =
ImmutableList.builder();
+ for (KeyElementType type : types) {
+ switch (type) {
+ case URI:
+ suppliers.add(() -> uri);
+ break;
+ case UGI:
+ suppliers.add(
+ () -> {
+ try {
+ return UserGroupInformation.getCurrentUser();
+ } catch (IOException e) {
+ throw new UncheckedIOException(e);
+ }
+ });
+ break;
+ case USER_NAME:
+ suppliers.add(
+ () -> {
+ try {
+ String userName =
UserGroupInformation.getCurrentUser().getUserName();
+ return UserNameElement.of(userName);
+ } catch (IOException e) {
+ throw new UncheckedIOException(e);
+ }
+ });
+ break;
+ default:
+ throw new RuntimeException("Unexpected key element " + type.name());
+ }
+ }
+ for (String key : confElements.keySet()) {
+ ConfElement element = ConfElement.of(key, confElements.get(key));
+ suppliers.add(() -> element);
+ }
+ return suppliers.build();
+ }
+
+ @Value.Immutable
Review Comment:
I'm strugling to see the value of adding these immutable value, although I
may be missing something. Why not just use String?
##########
core/src/main/java/org/apache/iceberg/CatalogProperties.java:
##########
@@ -119,6 +119,26 @@ private CatalogProperties() {}
"client.pool.cache.eviction-interval-ms";
public static final long CLIENT_POOL_CACHE_EVICTION_INTERVAL_MS_DEFAULT =
TimeUnit.MINUTES.toMillis(5);
+ /**
+ * A comma separated list of elements that are used to compose the key of
the client pool cache.
+ *
+ * <p>The following elements are supported:
+ *
+ * <ul>
+ * <li>URI - as specified by {@link CatalogProperties#URI}. URI will be
the only element when
Review Comment:
I think URI should be removed here as it shouldn't be configurable, as it is
always there.
Maybe we could mention in the javadoc that uri is always used , no matter
what configurable keys the user passes in? Something like:
```
A comma separated list of elements used, in addition to the hive metastore
uri, to compose the key of the client pool cache.
```
##########
core/src/main/java/org/apache/iceberg/CatalogProperties.java:
##########
@@ -119,6 +119,26 @@ private CatalogProperties() {}
"client.pool.cache.eviction-interval-ms";
public static final long CLIENT_POOL_CACHE_EVICTION_INTERVAL_MS_DEFAULT =
TimeUnit.MINUTES.toMillis(5);
+ /**
+ * A comma separated list of elements that are used to compose the key of
the client pool cache.
+ *
+ * <p>The following elements are supported:
+ *
+ * <ul>
+ * <li>URI - as specified by {@link CatalogProperties#URI}. URI will be
the only element when
+ * this property is not set.
+ * <li>UGI - the Hadoop UserGroupInformation instance that represents the
current user using the
+ * cache.
+ * <li>USER_NAME - similar to UGI but only includes the user's name
determined by
+ * UserGroupInformation#getUserName.
+ * <li>CONF - name of an arbitrary configuration. The value of the
configuration will be
Review Comment:
Can we also add a example what the user can pass in here?
Also, what do you think about make the list here show lower-case, to match
what you wrote (ie, there is CONF but then you mention conf.a.b.c)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]