Tim Allison created TIKA-4665:
---------------------------------
Summary: Add chunking and inference handling poc in 4.x
Key: TIKA-4665
URL: https://issues.apache.org/jira/browse/TIKA-4665
Project: Tika
Issue Type: Task
Reporter: Tim Allison
We should offer basic chunking (based on Markdown) and basic integration with
the OpenAI API spec for inference, so that Tika can do all the work and then
emit the parsed text, metadata, chunks, and vectors.
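The ticket gives no design details, so the following is only a sketch under stated assumptions, not Tika code. It splits Markdown into heading-scoped chunks and builds a request for an OpenAI-compatible embeddings endpoint (`POST /v1/embeddings` with `model` and `input` fields, as in OpenAI's embeddings API). The class `MarkdownChunker`, its methods, the `maxChars` cap, and the endpoint URL and API key passed to it are hypothetical. It assumes Java 17 and uses only the JDK.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not a Tika API): split Markdown into heading-scoped
 * chunks, then build an OpenAI-style embeddings request for those chunks.
 */
class MarkdownChunker {

    /** One chunk of text plus the nearest preceding ATX heading ("" if none). */
    record Chunk(String heading, String text) {}

    /** Splits on ATX headings (# to ######), then caps each chunk at maxChars characters. */
    static List<Chunk> chunk(String markdown, int maxChars) {
        List<Chunk> chunks = new ArrayList<>();
        String heading = "";
        StringBuilder section = new StringBuilder();
        for (String line : markdown.split("\\R")) {
            // Naive: a '#' line inside a fenced code block is also treated as a heading
            if (line.matches("#{1,6}\\s.*")) {
                flush(chunks, heading, section, maxChars);
                heading = line.replaceFirst("^#{1,6}\\s+", "").trim();
            } else {
                section.append(line).append('\n');
            }
        }
        flush(chunks, heading, section, maxChars);
        return chunks;
    }

    /** Emits the buffered section as one or more fixed-size, character-based windows. */
    private static void flush(List<Chunk> out, String heading, StringBuilder section, int maxChars) {
        String text = section.toString().trim();
        section.setLength(0);
        for (int start = 0; start < text.length(); start += maxChars) {
            out.add(new Chunk(heading, text.substring(start, Math.min(text.length(), start + maxChars))));
        }
    }

    /** Builds {"model": ..., "input": [...]} for an OpenAI-compatible /v1/embeddings endpoint. */
    static String embeddingsRequestBody(String model, List<Chunk> chunks) {
        StringBuilder sb = new StringBuilder("{\"model\":").append(jsonString(model)).append(",\"input\":[");
        for (int i = 0; i < chunks.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append(jsonString(chunks.get(i).text()));
        }
        return sb.append("]}").toString();
    }

    /** Wraps the body in a POST; endpoint and apiKey are placeholders supplied by the caller. */
    static HttpRequest buildRequest(URI endpoint, String apiKey, String body) {
        HttpRequest.Builder b = HttpRequest.newBuilder(endpoint)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body));
        if (apiKey != null && !apiKey.isEmpty()) {
            b.header("Authorization", "Bearer " + apiKey);
        }
        return b.build();
    }

    /** Minimal JSON string escaping (quotes, backslashes, control characters). */
    private static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
                }
            }
        }
        return sb.append('"').toString();
    }
}
```

Splitting on headings first keeps each chunk inside one section, which is the main benefit of chunking the Markdown output rather than flat text; the fixed character cap is a stand-in for a token-based limit. Sending the request (for example with `java.net.http.HttpClient`) and reading each vector from `data[i].embedding` in the response are left out.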
In some ways, this modernizes the deeplearning4j modules that we no longer have
in 4.x. Obviously, the capability is entirely different, but I think we should
leave room for these types of PoC integrations. This integration, at least, will
be exceedingly lightweight because it relies on external inference services. We
will not be downloading gigs of model files. :D