adityamparikh opened a new pull request, #87:
URL: https://github.com/apache/solr-mcp/pull/87

   ## Summary
   
   Adds a new MCP tool `index-file-document` that lets users upload a file (PDF, Word, Excel, PowerPoint, etc.) through their AI chat client and have its text content indexed into Solr for full-text search. Any format works as long as the chat client can extract text from it.
   
   Closes https://github.com/apache/solr-mcp/issues/69
   
   ## How it works
   
   When a user uploads a file in an AI chat client like Claude Desktop, the 
**client** handles text extraction — not the MCP server. Here's the flow:
   
   1. **User uploads a file** (e.g., `report.pdf`) in Claude Desktop
   2. **Claude Desktop extracts text** from the PDF before Claude ever sees it 
— Claude receives the readable text content, not the raw binary bytes
   3. **Claude calls the `index-file-document` tool**, passing the extracted 
text as `content` and the original filename (`report.pdf`) as `filename`
   4. **The MCP server indexes** a `SolrInputDocument` with `id` (auto-generated UUID), `content` (the full text), and `filename` (for filtering/display)
   5. **User can now search** over the indexed content using existing search 
tools
   
   This design means no binary parsing library (Tika, Docling, etc.) is needed 
on the server side — the AI chat client already does the heavy lifting of text 
extraction before invoking MCP tools. This keeps the server lightweight and 
avoids ~100MB of transitive dependencies.
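
As a rough illustration of step 4, the sketch below builds the three fields described above. It is not the PR's `FileDocumentCreator`: the class and method names are hypothetical, and a plain `Map` stands in for `SolrInputDocument` so the example runs without SolrJ on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

public class FileDocumentSketch {

    // Builds the three fields the PR describes: id, content, filename.
    // Assumption: a Map is used here in place of SolrInputDocument.
    static Map<String, Object> createDocument(String content, String filename) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", UUID.randomUUID().toString()); // auto-generated UUID
        doc.put("content", content);                 // text already extracted by the chat client
        doc.put("filename", filename);               // kept for filtering and display
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = createDocument("Quarterly revenue grew.", "report.pdf");
        System.out.println(doc.keySet());        // [id, content, filename]
        System.out.println(doc.get("filename")); // report.pdf
    }
}
```

Because the server only ever receives text, nothing in this path needs to know whether the original file was a PDF or a spreadsheet.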
   
   ### Tool signature
   
   ```
   index-file-document(collection, content, filename)
   ```
   
   | Parameter | Description |
   |-----------|-------------|
   | `collection` | Solr collection to index into |
   | `content` | Text content extracted from the file by the chat client |
   | `filename` | Original filename with extension (e.g. `report.pdf`); stored as a searchable field |
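
For illustration, a client invoking this tool would send an MCP `tools/call` request carrying the three parameters as `arguments`. The sketch below assembles such a payload by hand. The collection name `docs`, the request id, and the helper names are hypothetical, and a real client would use a JSON library rather than string concatenation.

```java
public class ToolCallSketch {

    // Minimal JSON string escaping, sufficient for plain extracted text.
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n");  break;
                case '\r': out.append("\\r");  break;
                case '\t': out.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        out.append(String.format("\\u%04x", (int) c)); // other control characters
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    // Builds a JSON-RPC 2.0 "tools/call" request for the index-file-document tool.
    static String toolCall(int requestId, String collection, String content, String filename) {
        return "{\"jsonrpc\":\"2.0\",\"id\":" + requestId
                + ",\"method\":\"tools/call\",\"params\":{\"name\":\"index-file-document\","
                + "\"arguments\":{\"collection\":\"" + escape(collection)
                + "\",\"content\":\"" + escape(content)
                + "\",\"filename\":\"" + escape(filename) + "\"}}}";
    }

    public static void main(String[] args) {
        // "docs" is a made-up collection name used only for this example.
        System.out.println(toolCall(1, "docs", "line one\nline two", "report.pdf"));
    }
}
```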
   
   ## Changes
   
   - **`FileDocumentCreator`** (new) — `@Component` that creates a 
`SolrInputDocument` with `id`, `content`, and `filename` fields. Does not 
implement `SolrDocumentCreator` because it requires a filename parameter in 
addition to content.
   - **`IndexingDocumentCreator`** — Added `FileDocumentCreator` dependency and 
`createSchemalessDocumentsFromFile()` delegation method
   - **`IndexingService`** — New `indexFileDocument()` MCP tool with 
`@PreAuthorize("isAuthenticated()")`
   - **`AGENTS.md`** — Updated architecture docs
   
   ## Test plan
   
   - [x] `FileDocumentCreatorTest` — 9 unit tests: valid input, 
null/empty/blank content, null/empty filename, oversized content, unique IDs, 
multiline content
   - [x] `FileIndexingTest` — Spring Boot integration test through 
`IndexingDocumentCreator`
   - [x] `IndexingServiceTest` — 2 new Testcontainers integration tests 
verifying index-then-search round-trip (search by content, search by filename)
   - [x] `IndexingServiceTest.UnitTests` — 2 new mocked unit tests for the MCP 
tool method
   - [x] Existing test constructors updated for new `FileDocumentCreator` 
parameter
   - [x] `./gradlew build` passes with all tests green
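
The input cases listed for `FileDocumentCreatorTest` suggest validation along the following lines. This is only a sketch under stated assumptions: the PR text does not say how invalid input is reported or what the size limit is, so the exception type, the `MAX_CONTENT_CHARS` value, and all names here are hypothetical.

```java
public class FileInputValidationSketch {

    // Hypothetical limit; the actual threshold is not stated in the PR description.
    static final int MAX_CONTENT_CHARS = 1_000_000;

    // Rejects the input cases named in the test plan by throwing.
    static void validate(String content, String filename) {
        if (content == null || content.isBlank()) {
            throw new IllegalArgumentException("content must not be null, empty, or blank");
        }
        if (filename == null || filename.isEmpty()) {
            throw new IllegalArgumentException("filename must not be null or empty");
        }
        if (content.length() > MAX_CONTENT_CHARS) {
            throw new IllegalArgumentException("content exceeds " + MAX_CONTENT_CHARS + " characters");
        }
    }

    // Convenience wrapper: true when validate() throws for the given input.
    static boolean rejects(String content, String filename) {
        try {
            validate(content, filename);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(rejects("line one\nline two", "report.pdf")); // false
        System.out.println(rejects("   ", "report.pdf"));                // true
        System.out.println(rejects("some text", ""));                    // true
    }
}
```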
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

