[
https://issues.apache.org/jira/browse/TIKA-4579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nicholas DiPiazza updated TIKA-4579:
------------------------------------
Description:
The fetcher and emitter managers need the ability to save/update configurations
at runtime.
h2. Background
The TikaGrpcServer previously used reflection hacks to update fetcher
configurations because the FetcherManager.saveFetcher() method threw an
exception when asked to save a fetcher with an ID that already existed.
h2. Use Case
A practical scenario for this functionality:
# tika-grpc server starts with no fetcher configs in the tika-config (blank
slate)
# Users call the saveFetcher gRPC method to create new fetcher configurations
# Users can then use those fetchers
# Users may need to update/modify existing fetcher configurations
h2. Solution Implemented
Modified the AbstractComponentManager.saveComponent() method to support both
creating new and updating existing component configurations.
h3. Changes Made:
*AbstractComponentManager.saveComponent()* - Changed behavior from throwing an
exception on duplicate IDs to supporting updates:
* Removed the duplicate ID check that threw TikaConfigException
* When updating an existing component, the cached instance is cleared to force
re-instantiation
* Added logging to distinguish between creating new configs vs updating
existing ones
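As a rough illustration of the create-or-update behavior listed above, here is a
minimal sketch. It is not the Tika source: the class name UpsertSketch, the
String-typed config, the boolean return value, and the use of java.util.logging
are all assumptions made to keep the example self-contained.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.logging.Logger;

/** Hypothetical, simplified stand-in for a component manager (not the Tika class). */
class UpsertSketch {
    private static final Logger LOG = Logger.getLogger(UpsertSketch.class.getName());

    // id -> raw configuration (a plain String here for brevity)
    private final Map<String, String> configStore = new ConcurrentHashMap<>();
    // id -> instantiated component; nothing fills it in this sketch, it only
    // shows where the stale instance would be dropped on update
    private final Map<String, Object> componentCache = new ConcurrentHashMap<>();

    /** Adds or updates a config; returns true if an existing entry was replaced. */
    boolean saveComponent(String id, String config) {
        // The old behavior rejected a duplicate id with an exception at this point.
        String previous = configStore.put(id, config);
        if (previous != null) {
            // Drop the stale instance so the next lookup rebuilds it.
            componentCache.remove(id);
            LOG.info("Updating existing component config: " + id);
            return true;
        }
        LOG.info("Creating new component config: " + id);
        return false;
    }

    int configCount() {
        return configStore.size();
    }

    public static void main(String[] args) {
        UpsertSketch manager = new UpsertSketch();
        manager.saveComponent("fs", "basePath=/data/v1"); // logs "Creating new ..."
        manager.saveComponent("fs", "basePath=/data/v2"); // logs "Updating existing ..."
        System.out.println(manager.configCount());
    }
}
```

Returning a flag is only a convenience for the demo; the description does not
say what the real method returns.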
*TikaGrpcServerImpl.saveFetcher()* - Removed reflection hack:
* Deleted the reflection-based code that was forcibly clearing the cache
* Now simply calls fetcherManager.saveFetcher() which handles updates properly
*Updated JavaDocs* - Modified documentation for
AbstractComponentManager.saveComponent(), FetcherManager.saveFetcher(), and
EmitterManager.saveEmitter():
* Changed from "adds a component" to "adds or updates a component"
* Removed mentions of exceptions for duplicate IDs
*Updated Tests* - Modified FetcherManagerTest:
* Changed test from expecting TikaConfigException to verifying update behavior
* Verifies that updating a fetcher clears the cache and creates a new instance
* Ensures the config store contains only one fetcher after update
h2. Security Note
This "save" functionality stores configurations in-memory only. Since tika-grpc
is secured via mutual TLS, only authorized users can modify configurations at
runtime.
h2. Technical Details
* Component configurations are stored in a Map (configStore)
* Component instances are cached in a separate Map (componentCache)
* When updating an existing config, the config store entry is overwritten
rather than removed, and only the cached instance is cleared
* The component is re-instantiated lazily from the new configuration on next
use via getComponent()
* Runtime modifications require allowRuntimeModifications=true when loading the
manager
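The interplay of the two maps, lazy instantiation, and the runtime-modification
flag can be sketched as follows. This is an illustration under stated
assumptions, not the Tika implementation: LazyManagerSketch, DemoFetcher, the
factory function, and the IllegalStateException thrown when modifications are
disabled are hypothetical; only the names configStore, componentCache,
getComponent, saveComponent, and allowRuntimeModifications come from the
description.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical component type standing in for a fetcher or emitter. */
class DemoFetcher {
    final String basePath;

    DemoFetcher(String basePath) {
        this.basePath = basePath;
    }
}

/** Hypothetical, simplified manager (not the Tika class). */
class LazyManagerSketch<T> {
    private final Map<String, String> configStore = new ConcurrentHashMap<>();
    private final Map<String, T> componentCache = new ConcurrentHashMap<>();
    private final Function<String, T> factory; // builds a component from its config
    private final boolean allowRuntimeModifications;

    LazyManagerSketch(Function<String, T> factory, boolean allowRuntimeModifications) {
        this.factory = factory;
        this.allowRuntimeModifications = allowRuntimeModifications;
    }

    void saveComponent(String id, String config) {
        if (!allowRuntimeModifications) {
            // Assumption: the exception type here is illustrative only.
            throw new IllegalStateException("Runtime modifications are disabled");
        }
        configStore.put(id, config);  // overwrite (or create) the stored config
        componentCache.remove(id);    // clear any stale instance; no-op for new ids
    }

    T getComponent(String id) {
        // Instantiate on first use after a save, then serve from the cache.
        return componentCache.computeIfAbsent(id, key -> {
            String config = configStore.get(key);
            if (config == null) {
                throw new IllegalArgumentException("Unknown component id: " + key);
            }
            return factory.apply(config);
        });
    }

    int configCount() {
        return configStore.size();
    }

    public static void main(String[] args) {
        LazyManagerSketch<DemoFetcher> manager =
                new LazyManagerSketch<>(DemoFetcher::new, true);
        manager.saveComponent("fs", "/data/v1");
        DemoFetcher first = manager.getComponent("fs");
        manager.saveComponent("fs", "/data/v2");
        DemoFetcher second = manager.getComponent("fs");
        System.out.println(first != second);       // true: update forced a new instance
        System.out.println(second.basePath);       // /data/v2
        System.out.println(manager.configCount()); // 1
    }
}
```

The sketch does not make the save-then-clear sequence atomic with respect to
concurrent getComponent() calls; how the real manager handles that is not
stated in the description.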
was:
The fetcher and emitter managers need the ability to save/update configurations
at runtime.
## Background
The TikaGrpcServer currently uses reflection hacks to update fetcher
configurations because the FetcherManager.saveFetcher() method threw an
exception when trying to save a fetcher with an ID that already exists.
## Use Case
A practical scenario for this functionality:
1. tika-grpc server starts with no fetcher configs in the tika-config (blank
slate)
2. Users call the saveFetcher gRPC method to create new fetcher configurations
3. Users can then use those fetchers
4. Users may need to update/modify existing fetcher configurations
## Solution Implemented
Modified the AbstractComponentManager.saveComponent() method to support both
creating new and updating existing component configurations:
### Changes Made:
1. **AbstractComponentManager.saveComponent()** - Changed behavior from
throwing exception on duplicate IDs to supporting updates:
- Removed the duplicate ID check that threw TikaConfigException
- When updating an existing component, the cached instance is cleared to
force re-instantiation
- Added logging to distinguish between creating new configs vs updating
existing ones
2. **TikaGrpcServerImpl.saveFetcher()** - Removed reflection hack:
- Deleted the reflection-based code that was forcibly clearing the cache
- Now simply calls fetcherManager.saveFetcher() which handles updates
properly
3. **Updated JavaDocs** - Modified documentation for:
- AbstractComponentManager.saveComponent()
- FetcherManager.saveFetcher()
- EmitterManager.saveEmitter()
- Changed from "adds a component" to "adds or updates a component"
- Removed mentions of exceptions for duplicate IDs
4. **Updated Tests** - Modified FetcherManagerTest:
- Changed test from expecting TikaConfigException to verifying update
behavior
- Verifies that updating a fetcher clears the cache and creates a new
instance
- Ensures the config store contains only one fetcher after update
## Security Note
This "save" functionality stores configurations in-memory only. Since tika-grpc
is secured via mutual TLS, only authorized users can modify configurations at
runtime.
## Technical Details
- Component configurations are stored in a Map (configStore)
- Component instances are cached in a separate Map (componentCache)
- When updating an existing config, only the cache is cleared, not the config
store entry
- The new configuration will be instantiated lazily on next use via
getComponent()
- Runtime modifications require allowRuntimeModifications=true when loading the
manager
> Add the ability to save pipes configs
> -------------------------------------
>
> Key: TIKA-4579
> URL: https://issues.apache.org/jira/browse/TIKA-4579
> Project: Tika
> Issue Type: Sub-task
> Reporter: Nicholas DiPiazza
> Priority: Major