Hisoka-X commented on code in PR #9862:
URL: https://github.com/apache/seatunnel/pull/9862#discussion_r2368674895

##########
docs/en/transform-v2/tikadocument.md:
##########
@@ -0,0 +1,257 @@
+# TikaDocument
+
+> TikaDocument Transform Plugin
+
+## Description
+
+The `TikaDocument` transform plugin uses Apache Tika to extract text content and metadata from various document formats including PDF, Microsoft Office documents (Word, Excel, PowerPoint), plain text, HTML, XML, and many other file formats. This transform converts binary document data into structured text content and metadata fields.
+
+The plugin supports comprehensive error handling, content processing options, and can handle both binary data and Base64-encoded document content.

Review Comment:
   Let's link to the Tika docs.
