Tim Allison created TIKA-4665:
---------------------------------

             Summary: Add chunking and inference handling poc in 4.x
                 Key: TIKA-4665
                 URL: https://issues.apache.org/jira/browse/TIKA-4665
             Project: Tika
          Issue Type: Task
            Reporter: Tim Allison


We should offer basic chunking (based on markdown) and basic integration with 
the OpenAI API spec for inference, so that we can do all of the work in one 
place and then emit the parsed text+metadata+chunks+vectors.
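
To make "basic chunking (based on markdown)" concrete, here is a minimal sketch 
that starts a new chunk at each markdown heading. It is only an illustration 
under stated assumptions: the class name MarkdownChunker and the method chunk 
are hypothetical and are not part of Tika, and heading boundaries are just one 
possible splitting rule. Each resulting chunk would then be sent to an external 
inference service to get its vector.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not an existing Tika class: splits markdown text
// into chunks, starting a new chunk at every ATX heading ("#" to "######").
final class MarkdownChunker {

    static List<String> chunk(String markdown) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inFence = false;

        // "\\R" matches any line break, so \n and \r\n input both work.
        for (String line : markdown.split("\\R", -1)) {
            // Track fenced code blocks so a "# comment" inside code
            // is not mistaken for a heading.
            if (line.startsWith("```")) {
                inFence = !inFence;
            }
            boolean isHeading = !inFence && line.matches("#{1,6}\\s.*");
            if (isHeading) {
                flush(current, chunks);
            }
            current.append(line).append('\n');
        }
        flush(current, chunks);
        return chunks;
    }

    // Adds the accumulated text as a chunk (if non-blank) and resets the buffer.
    private static void flush(StringBuilder current, List<String> chunks) {
        String text = current.toString().trim();
        if (!text.isEmpty()) {
            chunks.add(text);
        }
        current.setLength(0);
    }
}
```

Calling MarkdownChunker.chunk(...) on parsed markdown returns one string per 
heading-delimited section; text before the first heading becomes its own chunk. 
A real implementation would probably also cap chunk size, which this sketch 
does not do.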

In some ways, this modernizes the deeplearning4j modules that we no longer have 
in 4.x. Obviously, the capability is entirely different, but I think we should 
leave room for these types of PoC integrations. This integration, at least, will 
be exceedingly light because it relies on external inference services. We will 
not be downloading gigs of model files. :D



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
