[ 
https://issues.apache.org/jira/browse/TIKA-4665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18059504#comment-18059504
 ] 

Hudson commented on TIKA-4665:
------------------------------

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk17 #1215 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk17/1215/])
TIKA-4665-inference-module (#2613) (github: 
[https://github.com/apache/tika/commit/315ed1ebef6e8f9469de328f7285653221f9af10])
* (add) tika-parsers/tika-parsers-ml/tika-inference/pom.xml
* (add) docs/modules/ROOT/pages/migration-to-4x/inference-handler-requirements.adoc
* (add) tika-parsers/tika-parsers-ml/tika-inference/src/main/java/org/apache/tika/inference/MarkdownChunker.java
* (add) tika-parsers/tika-parsers-ml/tika-inference/src/main/java/org/apache/tika/inference/ChunkSerializer.java
* (edit) docs/modules/ROOT/nav.adoc
* (add) tika-parsers/tika-parsers-ml/tika-inference/src/main/java/org/apache/tika/inference/AbstractEmbeddingFilter.java
* (add) tika-parsers/tika-parsers-ml/tika-inference/src/main/java/org/apache/tika/inference/locator/PaginatedLocator.java
* (edit) docs/modules/ROOT/pages/migration-to-4x/design-notes-4x.adoc
* (add) tika-parsers/tika-parsers-ml/tika-inference/src/test/java/org/apache/tika/inference/ChunkSerializerTest.java
* (add) tika-parsers/tika-parsers-ml/tika-inference/src/test/java/org/apache/tika/inference/VectorSerializerTest.java
* (add) tika-parsers/tika-parsers-ml/tika-inference/src/main/java/org/apache/tika/inference/locator/TextLocator.java
* (edit) tika-parsers/tika-parsers-ml/pom.xml
* (add) tika-parsers/tika-parsers-ml/tika-inference/src/main/java/org/apache/tika/inference/OpenAIEmbeddingFilter.java
* (add) tika-parsers/tika-parsers-ml/tika-inference/src/test/java/org/apache/tika/inference/OpenAIEmbeddingFilterTest.java
* (add) tika-parsers/tika-parsers-ml/tika-inference/src/main/java/org/apache/tika/inference/locator/Locators.java
* (add) tika-parsers/tika-parsers-ml/tika-inference/src/test/java/org/apache/tika/inference/MarkdownChunkerTest.java
* (add) tika-parsers/tika-parsers-ml/tika-inference/src/main/java/org/apache/tika/inference/ImageEmbeddingConfig.java
* (add) tika-parsers/tika-parsers-ml/tika-inference/src/main/java/org/apache/tika/inference/locator/SpatialLocator.java
* (add) tika-parsers/tika-parsers-ml/tika-inference/src/main/java/org/apache/tika/inference/Chunk.java
* (add) docs/modules/ROOT/pages/migration-to-4x/chunk-strategies.adoc
* (add) tika-parsers/tika-parsers-ml/tika-inference/src/main/java/org/apache/tika/inference/locator/TemporalLocator.java
* (add) tika-parsers/tika-parsers-ml/tika-inference/src/main/java/org/apache/tika/inference/OpenAIImageEmbeddingParser.java
* (add) tika-parsers/tika-parsers-ml/tika-inference/src/test/java/org/apache/tika/inference/OpenAIImageEmbeddingParserTest.java
* (add) tika-parsers/tika-parsers-ml/tika-inference/src/main/java/org/apache/tika/inference/VectorSerializer.java
* (add) tika-parsers/tika-parsers-ml/tika-inference/src/main/java/org/apache/tika/inference/InferenceConfig.java


> Add chunking and inference handling poc in 4.x
> ----------------------------------------------
>
>                 Key: TIKA-4665
>                 URL: https://issues.apache.org/jira/browse/TIKA-4665
>             Project: Tika
>          Issue Type: New Feature
>            Reporter: Tim Allison
>            Priority: Major
>
> We should offer basic chunking (based on Markdown) and basic integration with 
> the OpenAI spec for inference so that we can do all the work and then emit 
> the parsed text+metadata+chunks+vectors.
> In some ways, this modernizes the deeplearning4j modules that we no longer 
> have in 4.x. Obviously, the capability is entirely different, but I think we should 
> leave room for these types of PoC integrations. This integration at least 
> will be exceedingly light because it relies on external inference services. 
> We will not be downloading gigs of model files. :D



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
