Hi All,
Right now Tika helps with reading files in many formats.
Would it make sense to consider a new project that would help users
write into a format of their choice, using a builder API?
I realize different formats have different capabilities, but I reckon
a minimal API could be created that works for many mainstream
formats, with some of the API methods being optional for other
formats.
For example:
TikaWriter writer = TikaWriter.newInstance(Formats.PDF);
// start
writer.header("someheader").tableOfContents(new TableOfContents(...));
// body
writer.asTable(new String[][]{...});
writer.attachment(new Image());
...
I guess this can become quite complicated, but maybe one could start with
a simple Writer API that maps easily to PDF etc., and then take
it from there, gradually adding more methods.
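To make the "start small" idea a bit more concrete, here is a rough sketch of what such a minimal builder could look like. Tika has no writer API today, so everything here is hypothetical: TikaWriter, Format, HtmlWriter, paragraph() and the attachment(String, byte[]) signature are names I made up for illustration. It targets HTML rather than PDF only because HTML needs no third-party library; a PDF writer would map the same calls onto a PDF library instead.

```java
// Hypothetical sketch only: none of these types exist in Apache Tika.

// Output formats the writer might target; only HTML is implemented here.
enum Format { HTML, PDF }

// Minimal builder-style writer API. Core methods every format must support;
// optional ones have default implementations that reject the call, so a
// format-specific writer only overrides what it can actually express.
interface TikaWriter {
    TikaWriter header(String text);
    TikaWriter paragraph(String text);
    TikaWriter asTable(String[][] rows);

    // Optional capability (hypothetical signature): not every format can embed attachments.
    default TikaWriter attachment(String name, byte[] data) {
        throw new UnsupportedOperationException("attachments not supported by this format");
    }

    // Returns the finished document as a string.
    String build();

    // Factory picking a format-specific writer; only HTML exists in this sketch.
    static TikaWriter newInstance(Format format) {
        if (format == Format.HTML) {
            return new HtmlWriter();
        }
        throw new IllegalArgumentException("no writer for " + format);
    }
}

// Toy HTML-backed implementation that accumulates markup in a StringBuilder.
final class HtmlWriter implements TikaWriter {
    private final StringBuilder body = new StringBuilder();

    @Override
    public TikaWriter header(String text) {
        body.append("<h1>").append(escape(text)).append("</h1>\n");
        return this;
    }

    @Override
    public TikaWriter paragraph(String text) {
        body.append("<p>").append(escape(text)).append("</p>\n");
        return this;
    }

    @Override
    public TikaWriter asTable(String[][] rows) {
        body.append("<table>\n");
        for (String[] row : rows) {
            body.append("<tr>");
            for (String cell : row) {
                body.append("<td>").append(escape(cell)).append("</td>");
            }
            body.append("</tr>\n");
        }
        body.append("</table>\n");
        return this;
    }

    @Override
    public String build() {
        return "<html><body>\n" + body + "</body></html>\n";
    }

    // Escapes the characters that would otherwise break the HTML markup.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

Usage would be along the lines of TikaWriter.newInstance(Format.HTML).header("Report").asTable(rows).build(). Throwing UnsupportedOperationException from optional methods follows the same convention java.util.Collection uses for its optional operations, so callers of a format that lacks, say, attachments get a clear failure rather than silently lost content.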
Just one idea for 2018 and the new 2.0 master :-)
Thanks, Sergey
P.S. I saw a related discussion on the Beam dev list about writing to
different formats, and thought maybe something like that could be done
for Tika. It might be of help in general, and in time it could also
complement the Beam TikaIO (which can only read at the moment).