[
https://issues.apache.org/jira/browse/TIKA-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18045611#comment-18045611
]
ASF GitHub Bot commented on TIKA-4576:
--------------------------------------
Copilot commented on code in PR #2460:
URL: https://github.com/apache/tika/pull/2460#discussion_r2624517466
##########
tika-pipes/tika-pipes-core/src/main/java/org/apache/tika/pipes/core/config/InMemoryConfigStore.java:
##########
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.tika.pipes.core.config;
+
+import java.util.Set;
+import java.util.concurrent.ConcurrentHashMap;
+
+import org.apache.tika.plugins.ExtensionConfig;
+
+/**
+ * Default in-memory implementation of {@link ConfigStore} using a {@link
ConcurrentHashMap}.
+ * Thread-safe and suitable for single-instance deployments.
+ */
+public class InMemoryConfigStore implements ConfigStore {
+
+ private final ConcurrentHashMap<String, ExtensionConfig> store = new
ConcurrentHashMap<>();
+
+ @Override
+ public void put(String id, ExtensionConfig config) {
+ store.put(id, config);
+ }
+
+ @Override
+ public ExtensionConfig get(String id) {
+ return store.get(id);
+ }
+
+ @Override
+ public boolean containsKey(String id) {
+ return store.containsKey(id);
+ }
+
+ @Override
+ public Set<String> keySet() {
+ return store.keySet();
Review Comment:
The keySet() method returns the live keyset from ConcurrentHashMap, which
can throw ConcurrentModificationException during iteration if the map is
modified concurrently. The LoggingConfigStore implementation returns an
immutable copy using Set.copyOf() at line 65, which is the safer approach.
Consider returning Set.copyOf(store.keySet()) to ensure consistency across
implementations and prevent potential ConcurrentModificationExceptions when the
returned set is iterated in AbstractComponentManager.getComponent() at line 328.
```suggestion
return Set.copyOf(store.keySet());
```
##########
tika-pipes/tika-pipes-core/src/test/java/org/apache/tika/pipes/core/config/LoggingConfigStore.java:
##########
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.tika.pipes.core.config;
+
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.tika.plugins.ExtensionConfig;
+
+/**
+ * Example custom ConfigStore implementation for demonstration purposes.
+ * This implementation logs all operations and could be extended to add
+ * persistence, caching, or other custom behavior.
+ */
+public class LoggingConfigStore implements ConfigStore {
+
+ private final Map<String, ExtensionConfig> store = new HashMap<>();
+
+ @Override
+ public void put(String id, ExtensionConfig config) {
+ System.out.println("ConfigStore: Storing config with id=" + id);
+ synchronized (store) {
+ store.put(id, config);
+ }
+ }
+
+ @Override
+ public ExtensionConfig get(String id) {
+ synchronized (store) {
+ ExtensionConfig config = store.get(id);
+ if (config != null) {
+ System.out.println("ConfigStore: Retrieved config with id=" +
id);
+ } else {
+ System.out.println("ConfigStore: Config not found for id=" +
id);
Review Comment:
Using System.out.println for logging is not a best practice in production
code, even in test utilities. Consider using SLF4J Logger instead (already
available in the project) to maintain consistency with the rest of the codebase
and allow for proper log level configuration. This would make the logging
functionality more useful for demonstration and debugging purposes.
##########
tika-pipes/tika-pipes-core/src/main/java/org/apache/tika/pipes/core/config/ConfigStore.java:
##########
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.tika.pipes.core.config;
+
+import java.util.Set;
+
+import org.apache.tika.plugins.ExtensionConfig;
+
+/**
+ * Interface for storing and retrieving component configurations.
+ * Implementations can provide different storage backends (in-memory,
database, distributed cache, etc.).
+ */
+public interface ConfigStore {
+
+ /**
+ * Stores a configuration.
+ *
+ * @param id the configuration ID
+ * @param config the configuration to store
+ */
+ void put(String id, ExtensionConfig config);
+
+ /**
+ * Retrieves a configuration by ID.
+ *
+ * @param id the configuration ID
+ * @return the configuration, or null if not found
+ */
+ ExtensionConfig get(String id);
+
+ /**
+ * Checks if a configuration exists.
+ *
+ * @param id the configuration ID
+ * @return true if the configuration exists
Review Comment:
The ConfigStore interface should document the expected behavior when null
values are passed to put() or get() methods. Should put(id, null) be allowed?
What should get(null) return? Clear documentation of these edge cases helps
ensure consistent behavior across implementations.
```suggestion
* @param id the configuration ID. Implementations should specify
whether {@code null} is allowed.
* By default, passing {@code null} for {@code id} is not
supported and may result in a
* {@link NullPointerException}.
* @param config the configuration to store. Implementations should
specify whether {@code null}
* is allowed. By default, passing {@code null} for {@code
config} is not supported
* and may result in a {@link NullPointerException}.
* @throws NullPointerException if {@code id} or {@code config} is
{@code null} and nulls are not supported
*/
void put(String id, ExtensionConfig config);
/**
* Retrieves a configuration by ID.
*
* @param id the configuration ID. Implementations should specify
whether {@code null} is allowed.
* By default, passing {@code null} for {@code id} is not
supported and may result in a
* {@link NullPointerException}.
* @return the configuration, or {@code null} if not found
* @throws NullPointerException if {@code id} is {@code null} and nulls
are not supported
*/
ExtensionConfig get(String id);
/**
* Checks if a configuration exists.
*
* @param id the configuration ID. Implementations should specify
whether {@code null} is allowed.
* By default, passing {@code null} for {@code id} is not
supported and may result in a
* {@link NullPointerException}.
* @return true if the configuration exists
* @throws NullPointerException if {@code id} is {@code null} and nulls
are not supported
```
##########
tika-pipes/tika-pipes-core/src/test/java/org/apache/tika/pipes/core/config/ConfigStoreTest.java:
##########
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.tika.pipes.core.config;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertFalse;
+import static org.junit.jupiter.api.Assertions.assertNotNull;
+import static org.junit.jupiter.api.Assertions.assertNull;
+import static org.junit.jupiter.api.Assertions.assertTrue;
+
+import org.junit.jupiter.api.Test;
+
+import org.apache.tika.plugins.ExtensionConfig;
+
+public class ConfigStoreTest {
+
+ @Test
+ public void testInMemoryConfigStore() {
+ ConfigStore store = new InMemoryConfigStore();
+
+ ExtensionConfig config1 = new ExtensionConfig("id1", "type1",
"{\"key\":\"value\"}");
+ ExtensionConfig config2 = new ExtensionConfig("id2", "type2",
"{\"key2\":\"value2\"}");
+
+ // Test put and get
+ store.put("id1", config1);
+ store.put("id2", config2);
+
+ assertNotNull(store.get("id1"));
+ assertEquals("id1", store.get("id1").id());
+ assertEquals("type1", store.get("id1").name());
+
+ assertNotNull(store.get("id2"));
+ assertEquals("id2", store.get("id2").id());
+
+ // Test containsKey
+ assertTrue(store.containsKey("id1"));
+ assertTrue(store.containsKey("id2"));
+ assertFalse(store.containsKey("id3"));
+
+ // Test size
+ assertEquals(2, store.size());
+
+ // Test keySet
+ assertEquals(2, store.keySet().size());
+ assertTrue(store.keySet().contains("id1"));
+ assertTrue(store.keySet().contains("id2"));
+
+ // Test get non-existent
+ assertNull(store.get("nonexistent"));
+ }
+
+ @Test
+ public void testConfigStoreThreadSafety() throws InterruptedException {
+ ConfigStore store = new InMemoryConfigStore();
+ int numThreads = 10;
+ int numOperationsPerThread = 100;
+
+ Thread[] threads = new Thread[numThreads];
+ for (int i = 0; i < numThreads; i++) {
+ final int threadId = i;
+ threads[i] = new Thread(() -> {
+ for (int j = 0; j < numOperationsPerThread; j++) {
+ String id = "thread" + threadId + "_config" + j;
+ ExtensionConfig config = new ExtensionConfig(id, "type",
"{}");
+ store.put(id, config);
+ assertNotNull(store.get(id));
+ }
+ });
+ threads[i].start();
+ }
+
+ for (Thread thread : threads) {
+ thread.join();
+ }
+
+ assertEquals(numThreads * numOperationsPerThread, store.size());
+ }
+}
Review Comment:
The test class LoggingConfigStore is never actually tested or used. Consider
adding a test method in ConfigStoreTest that verifies LoggingConfigStore works
correctly, to ensure custom implementations of ConfigStore follow the expected
contract. This would also serve as an example for users implementing their own
ConfigStore.
##########
tika-pipes/tika-pipes-core/src/main/java/org/apache/tika/pipes/core/AbstractComponentManager.java:
##########
@@ -194,11 +212,11 @@ public T getComponent(String id) throws IOException,
TikaException {
}
// Check if config exists
- ExtensionConfig config = componentConfigs.get(id);
+ ExtensionConfig config = configStore.get(id);
if (config == null) {
throw createNotFoundException(
"Can't find " + getComponentName() + " for id=" + id +
- ". Available: " + componentConfigs.keySet());
+ ". Available: " + configStore.keySet());
Review Comment:
The call to configStore.keySet() in an error message could be inefficient if
a custom ConfigStore implementation performs expensive operations for keySet().
Consider caching the keySet result in a local variable before constructing the
error message, or documenting that ConfigStore.keySet() should be an
inexpensive operation.
##########
tika-pipes/tika-pipes-core/src/main/java/org/apache/tika/pipes/core/config/ConfigStore.java:
##########
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.tika.pipes.core.config;
+
+import java.util.Set;
+
+import org.apache.tika.plugins.ExtensionConfig;
+
+/**
+ * Interface for storing and retrieving component configurations.
+ * Implementations can provide different storage backends (in-memory,
database, distributed cache, etc.).
Review Comment:
The ConfigStore interface documentation should specify thread-safety
requirements. Since InMemoryConfigStore is documented as "Thread-safe" and uses
ConcurrentHashMap, the interface should clarify whether implementations must be
thread-safe or if it's optional. This is important for users implementing
custom ConfigStore backends.
```suggestion
* Implementations can provide different storage backends (in-memory,
database, distributed cache, etc.).
* <p>
* <b>Thread-safety:</b> Implementations of this interface <i>may</i> or
<i>may not</i> be thread-safe.
* If an implementation is thread-safe, it should be clearly documented as
such.
* Callers must not assume thread-safety unless it is explicitly documented
by the implementation.
* The default in-memory implementation ({@code InMemoryConfigStore}) is
thread-safe.
```
> Create pluggable storage interface for Fetcher components with in-memory
> implementation
> ----------------------------------------------------------------------------------------
>
> Key: TIKA-4576
> URL: https://issues.apache.org/jira/browse/TIKA-4576
> Project: Tika
> Issue Type: Sub-task
> Reporter: Nicholas DiPiazza
> Priority: Major
>
> when fetchers are stored, right now they are stored in memory within a sync
> hashmap.
> i would like to create an interface for storage of fetchers. then
> implement the interface with an InMemoryStore.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)