[ https://issues.apache.org/jira/browse/TIKA-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison updated TIKA-2720:
------------------------------
Fix Version/s: 2.0.0-BETA (was: 2.0.0)
> A parser to output universal sentence encodings to text
> -------------------------------------------------------
>
> Key: TIKA-2720
> URL: https://issues.apache.org/jira/browse/TIKA-2720
> Project: Tika
> Issue Type: New Feature
> Components: tika-dl
> Reporter: Thejan Wijesinghe
> Priority: Major
> Fix For: 2.0.0-BETA
>
>
> This parser encodes text into high-dimensional vectors that can be used for
> text classification, semantic similarity, clustering, and other natural
> language tasks. The model is trained and optimized for greater-than-word
> length text, such as sentences, phrases, or short paragraphs. It is trained on
> a variety of data sources and tasks, with the aim of dynamically
> accommodating a wide range of natural language understanding tasks. The
> input is variable-length English text and the output is a 512-dimensional
> vector.
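
For illustration only, here is a minimal sketch of how two such 512-dimensional
vectors might be compared for semantic similarity using cosine similarity. It
is not taken from the proposed parser: the class name EmbeddingSimilarity, the
method cosine, and the toy vectors are all hypothetical, and the sketch assumes
the embeddings are already available as float arrays.

```java
// Hypothetical helper, not part of Tika: compares two sentence embeddings.
final class EmbeddingSimilarity {

    // Assumed embedding size, taken from the issue description (512 dimensions).
    static final int DIMENSIONS = 512;

    // Cosine similarity: dot(a, b) / (|a| * |b|), in the range [-1, 1].
    // Returns 0.0 if either vector has zero length, to avoid dividing by zero.
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Toy stand-ins for real embeddings; a real encoder would produce these.
        float[] first = new float[DIMENSIONS];
        float[] second = new float[DIMENSIONS];
        first[0] = 1.0f;
        second[0] = 0.5f;
        second[1] = 0.5f;
        System.out.println("cosine similarity = " + cosine(first, second));
    }
}
```

Scores closer to 1 indicate that two texts point in nearly the same direction
in the embedding space, which is how such vectors are typically used for
similarity ranking or clustering.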
--
This message was sent by Atlassian Jira
(v8.3.4#803005)