Hi All,

Right now Tika can help with reading from files in many formats.

Would it make sense to consider a new project that would help users write to a format of their choice, using a builder API?

I realize different formats have different capabilities, but I reckon a minimal API could be created that works for many mainstream formats, with some of its methods being optional for formats that cannot support them.

For example:

TikaWriter writer = TikaWriter.newInstance(Formats.PDF);
// start
writer.header("someheader").tableOfContents(new TableOfContents(...));
// body
writer.asTable(new String[][]{...});
writer.attachment(new Image());
...

I guess this can become quite complicated, but maybe one can start with a simple Writer API that maps easily to PDF and similar formats, and then take it from there, slowly adding more methods.
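
To make the "small core plus optional methods" idea a bit more concrete, here is a rough, self-contained Java sketch. Everything in it is an assumption for illustration only: SimpleWriter, SimpleWriters, OutputFormat, PlainTextWriter and MinimalWriter are hypothetical names, none of them exist in Tika, and a plain-text backend stands in for PDF so that the sketch needs no extra libraries. Optional methods are modelled as default methods that throw unless a backend overrides them.

```java
// Hypothetical sketch: none of these types exist in Tika today.
enum OutputFormat { PLAIN_TEXT, MINIMAL }

interface SimpleWriter {
    // Core methods that every backend is expected to implement.
    SimpleWriter header(String text);
    SimpleWriter paragraph(String text);

    // Optional method: backends that cannot render tables keep this default.
    default SimpleWriter table(String[][] rows) {
        throw new UnsupportedOperationException("tables are not supported by this format");
    }

    // Returns the finished document; a real API would more likely write to an OutputStream.
    String build();
}

// Backend that supports the core methods and the optional table method.
class PlainTextWriter implements SimpleWriter {
    private final StringBuilder out = new StringBuilder();

    @Override
    public SimpleWriter header(String text) {
        out.append("# ").append(text).append("\n\n");
        return this;
    }

    @Override
    public SimpleWriter paragraph(String text) {
        out.append(text).append("\n\n");
        return this;
    }

    @Override
    public SimpleWriter table(String[][] rows) {
        // One line per row, cells separated by " | ".
        for (String[] row : rows) {
            out.append(String.join(" | ", row)).append('\n');
        }
        out.append('\n');
        return this;
    }

    @Override
    public String build() {
        return out.toString();
    }
}

// Backend that implements only the core methods; table() falls back to the throwing default.
class MinimalWriter implements SimpleWriter {
    private final StringBuilder out = new StringBuilder();

    @Override
    public SimpleWriter header(String text) {
        out.append(text).append('\n');
        return this;
    }

    @Override
    public SimpleWriter paragraph(String text) {
        out.append(text).append('\n');
        return this;
    }

    @Override
    public String build() {
        return out.toString();
    }
}

// Factory in the spirit of TikaWriter.newInstance(Formats.PDF) from the example above.
final class SimpleWriters {
    private SimpleWriters() {}

    static SimpleWriter newInstance(OutputFormat format) {
        switch (format) {
            case PLAIN_TEXT: return new PlainTextWriter();
            case MINIMAL:    return new MinimalWriter();
            default: throw new IllegalArgumentException("unknown format: " + format);
        }
    }
}

class SimpleWriterDemo {
    public static void main(String[] args) {
        String doc = SimpleWriters.newInstance(OutputFormat.PLAIN_TEXT)
                .header("Report")
                .paragraph("Hello")
                .table(new String[][] {{"a", "b"}, {"c", "d"}})
                .build();
        System.out.print(doc);
    }
}
```

Throwing from the default is only one way to handle optional methods. A capability check such as a supports(...) query, or silently skipping unsupported elements, would be equally plausible choices.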

Just one idea for 2018 and the new 2.0 master :-)

Thanks, Sergey

P.S. I saw a related discussion on the Beam dev list about writing to different formats, and thought that maybe something like that could be done for Tika. It might be of help in general, and in time it could also complement the Beam TikaIO (which can only read at the moment).

