[ 
https://issues.apache.org/jira/browse/TIKA-4579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas DiPiazza updated TIKA-4579:
------------------------------------
    Description: 
The fetcher and emitter managers need the ability to save/update configurations 
at runtime.

## Background
The TikaGrpcServer previously relied on reflection hacks to update fetcher 
configurations, because FetcherManager.saveFetcher() threw an exception when 
asked to save a fetcher whose ID already existed.

## Use Case
A practical scenario for this functionality:
1. tika-grpc server starts with no fetcher configs in the tika-config (blank 
slate)
2. Users call the saveFetcher gRPC method to create new fetcher configurations
3. Users can then use those fetchers
4. Users may need to update/modify existing fetcher configurations

## Solution Implemented
Modified the AbstractComponentManager.saveComponent() method to support both 
creating new and updating existing component configurations:

### Changes Made:
1. **AbstractComponentManager.saveComponent()** - Changed behavior from 
throwing exception on duplicate IDs to supporting updates:
   - Removed the duplicate ID check that threw TikaConfigException
   - When updating an existing component, the cached instance is cleared to 
force re-instantiation
   - Added logging to distinguish between creating new configs vs updating 
existing ones

2. **TikaGrpcServerImpl.saveFetcher()** - Removed reflection hack:
   - Deleted the reflection-based code that was forcibly clearing the cache
   - Now simply calls fetcherManager.saveFetcher() which handles updates 
properly

3. **Updated JavaDocs** - Modified documentation for:
   - AbstractComponentManager.saveComponent()
   - FetcherManager.saveFetcher()
   - EmitterManager.saveEmitter()
   - Changed from "adds a component" to "adds or updates a component"
   - Removed mentions of exceptions for duplicate IDs

4. **Updated Tests** - Modified FetcherManagerTest:
   - Changed test from expecting TikaConfigException to verifying update 
behavior
   - Verifies that updating a fetcher clears the cache and creates a new 
instance
   - Ensures the config store contains only one fetcher after update
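
For readers unfamiliar with the kind of workaround removed in item 2, a reflection-based cache eviction generally looks like the sketch below. This is not the deleted Tika code: `CachedManager`, `ReflectionHackSketch`, and `evictViaReflection` are hypothetical names, and only the idea of a private `componentCache` map is taken from this description.

```java
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

// Hypothetical manager with a private cache, standing in for the real class.
class CachedManager {
    private final Map<String, Object> componentCache = new HashMap<>();

    // Returns the cached instance for the ID, creating it on first use.
    Object get(String id) {
        return componentCache.computeIfAbsent(id, k -> new Object());
    }
}

class ReflectionHackSketch {
    // Reaches into the private field by name to evict one cached entry.
    // Fragile: it breaks silently if the field is renamed or its type changes.
    static void evictViaReflection(CachedManager manager, String id) {
        try {
            Field field = CachedManager.class.getDeclaredField("componentCache");
            field.setAccessible(true);
            @SuppressWarnings("unchecked")
            Map<String, Object> cache = (Map<String, Object>) field.get(manager);
            cache.remove(id);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not clear cache reflectively", e);
        }
    }

    public static void main(String[] args) {
        CachedManager manager = new CachedManager();
        Object before = manager.get("fs");
        evictViaReflection(manager, "fs");
        // After eviction the manager has to build a fresh instance.
        System.out.println(before != manager.get("fs"));
    }
}
```

Moving the eviction into the manager's own save method removes the dependency on a private field name, which is the motivation for the change described in item 2.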

## Security Note
This "save" functionality stores configurations in-memory only. Since tika-grpc 
is secured via mutual TLS, only authorized users can modify configurations at 
runtime.

## Technical Details
- Component configurations are stored in a Map (configStore)
- Component instances are cached in a separate Map (componentCache)
- When updating an existing config, only the cached instance is cleared; the 
config store entry is replaced, not removed
- The new configuration will be instantiated lazily on next use via 
getComponent()
- Runtime modifications require allowRuntimeModifications=true when loading the 
manager
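
The bookkeeping described in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual Tika implementation: the class name `SketchComponentManager`, the `Function<String, T>` factory, the use of a plain `String` as the configuration, the boolean return value, and the exception types are all invented for the example. Only the names configStore, componentCache, saveComponent, getComponent, and allowRuntimeModifications come from this description.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical stand-in for AbstractComponentManager; not the real Tika class.
class SketchComponentManager<T> {
    private final Map<String, String> configStore = new ConcurrentHashMap<>();
    private final Map<String, T> componentCache = new ConcurrentHashMap<>();
    private final boolean allowRuntimeModifications;
    private final Function<String, T> factory; // assumed: builds a component from its config

    SketchComponentManager(boolean allowRuntimeModifications, Function<String, T> factory) {
        this.allowRuntimeModifications = allowRuntimeModifications;
        this.factory = factory;
    }

    // Adds or updates a config; returns true if an existing ID was updated.
    synchronized boolean saveComponent(String id, String config) {
        if (!allowRuntimeModifications) {
            // Assumed failure mode; the real exception type is not stated here.
            throw new IllegalStateException("runtime modifications are disabled");
        }
        boolean updated = configStore.put(id, config) != null;
        if (updated) {
            componentCache.remove(id); // force re-instantiation on next use
        }
        return updated;
    }

    // Lazily instantiates and caches the component for the given ID.
    synchronized T getComponent(String id) {
        String config = configStore.get(id);
        if (config == null) {
            throw new IllegalArgumentException("unknown component id: " + id);
        }
        return componentCache.computeIfAbsent(id, k -> factory.apply(config));
    }

    synchronized int configCount() {
        return configStore.size();
    }

    public static void main(String[] args) {
        SketchComponentManager<StringBuilder> manager =
                new SketchComponentManager<>(true, cfg -> new StringBuilder(cfg));
        manager.saveComponent("fs", "basePath=/data");
        StringBuilder first = manager.getComponent("fs");
        manager.saveComponent("fs", "basePath=/other"); // same ID: treated as an update
        StringBuilder second = manager.getComponent("fs");
        System.out.println("new instance after update: " + (first != second));
        System.out.println("configs stored: " + manager.configCount());
    }
}
```

Because the update only evicts the cached instance, the cost of building the replacement component is paid on the next getComponent() call rather than inside the save call.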

  was:
the fetcher interface needs the ability to save

although we will only expose this on tika-grpc which is secured via mutual TLS


> Add the ability to save pipes configs
> -------------------------------------
>
>                 Key: TIKA-4579
>                 URL: https://issues.apache.org/jira/browse/TIKA-4579
>             Project: Tika
>          Issue Type: Sub-task
>            Reporter: Nicholas DiPiazza
>            Priority: Major


