Hello PoDoFo devs and users,

I have a quick update on the pdfmm/PoDoFo-next merge status:
- I checked in the text extraction API, with a small test[1];
- I checked in a full review/revamp of the IO subsystem.

The latter review turned out to be more complex than I initially
thought, but it was eventually successful. The aim of the IO system
review was to clean it up, making it simpler and more powerful at the
same time. One issue with the previous hierarchy, for example, was
that PdfInputDevice and PdfOutputDevice didn't inherit from
PdfInputStream and PdfOutputStream respectively, so adapter classes
were needed to interchange instances. The naming choices were also
sometimes weak: PdfOutputDevice, for example, actually had a
read-write contract. The new
hierarchy takes inspiration from the C++ and .NET hierarchies and
tries to take the best of all worlds: non-overlapping Read/Write
contracts/interfaces exist, but most implementers of these just
inherit a merged Read/Write StreamDevice[2] class, similar to the
Stream class in .NET[3]. This is a major cleanup/simplification, as it
makes the hierarchy simpler and more balanced: there is no requirement
to have specialized implementations for all the Read/Write/ReadWrite
combinations, many of which were missing previously, making the API
look incomplete. This trades a bit of type enforcement (which wasn't
fully enforced before anyway) for fewer implementations to maintain.
Specialized read-only or write-only implementations are still
possible, and a few notable examples exist (e.g. PdfCanvasInputDevice).
Attached is the UML diagram of the new hierarchy: the naming choices
have been weighed very carefully so that the class names are not
excessively long (the "Pdf" prefix has been sacrificed for these
classes only, also because they are very generic in use). Even though
I have quite some experience with big refactors, it's always
surprising how long it takes to accomplish them: I reached the current
model after more than 8 iterations/reversals.
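
To make the shape of the hierarchy described above more concrete, here
is a minimal sketch. It is not the actual pdfmm code: the InputStream
and OutputStream names, the CanRead/CanWrite members, and the
MemoryDevice and StringInputStream classes are hypothetical
simplifications made up for illustration. Only the general idea (two
narrow contracts plus a merged StreamDevice base) comes from the
description above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// NOTE: hypothetical, simplified declarations; not the real pdfmm API.

// Read-only contract.
class InputStream {
public:
    virtual ~InputStream() = default;
    // Reads up to `size` bytes into `buffer`; returns the bytes actually read.
    virtual std::size_t Read(char* buffer, std::size_t size) = 0;
};

// Write-only contract, with no overlap with InputStream.
class OutputStream {
public:
    virtual ~OutputStream() = default;
    virtual void Write(const char* buffer, std::size_t size) = 0;
};

// Merged read/write base that most concrete devices would derive from.
// Access is reported at runtime, which is the "bit of type enforcement"
// being traded away for fewer classes.
class StreamDevice : public InputStream, public OutputStream {
public:
    virtual bool CanRead() const = 0;
    virtual bool CanWrite() const = 0;
};

// A read/write in-memory device (assumed example implementation).
class MemoryDevice final : public StreamDevice {
public:
    bool CanRead() const override { return true; }
    bool CanWrite() const override { return true; }

    std::size_t Read(char* buffer, std::size_t size) override {
        std::size_t count = std::min(size, m_data.size() - m_pos);
        std::memcpy(buffer, m_data.data() + m_pos, count);
        m_pos += count;
        return count;
    }

    void Write(const char* buffer, std::size_t size) override {
        m_data.append(buffer, size);
    }

private:
    std::string m_data;
    std::size_t m_pos = 0;
};

// A specialized read-only implementation is still possible: it derives
// from the narrow contract only and has no Write() at all.
class StringInputStream final : public InputStream {
public:
    explicit StringInputStream(std::string data) : m_data(std::move(data)) {}

    std::size_t Read(char* buffer, std::size_t size) override {
        std::size_t count = std::min(size, m_data.size() - m_pos);
        std::memcpy(buffer, m_data.data() + m_pos, count);
        m_pos += count;
        return count;
    }

private:
    std::string m_data;
    std::size_t m_pos = 0;
};

// Consumers depend only on the narrow contract they need.
std::string ReadAll(InputStream& input) {
    std::string result;
    char chunk[4];
    std::size_t count;
    while ((count = input.Read(chunk, sizeof(chunk))) != 0)
        result.append(chunk, count);
    return result;
}

void WriteText(OutputStream& output, const std::string& text) {
    output.Write(text.data(), text.size());
}

// The same device instance is passed as an OutputStream and then as an
// InputStream, with no adapter class in between.
std::string RoundTrip(const std::string& text) {
    MemoryDevice device;
    assert(device.CanRead() && device.CanWrite());
    WriteText(device, text);
    return ReadAll(device);
}
```

In this sketch a device can be handed directly to any function taking
an InputStream or an OutputStream, which is the interchange that
previously needed adapter classes.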

With this news, my list of TODOs is shrinking quickly[4]. A couple of
medium-sized API reviews plus the porting of the tools and I will be
ready to integrate into PoDoFo-next. The plan is still this summer,
hopefully before the end of August.

Cheers,
Francesco

[1] https://github.com/pdfmm/pdfmm/blob/f2be85e365a186f51fd13147cc6a0f1bc6ce0aa6/test/unit/TextExtraction.cpp#L15
[2] https://github.com/pdfmm/pdfmm/blob/675c03a872c0d8969ae5e123940c88712107a03b/src/pdfmm/base/PdfStreamDevice.h#L27
[3] https://docs.microsoft.com/en-us/dotnet/api/system.io.stream?view=net-6.0
[4] https://github.com/pdfmm/pdfmm/blob/master/TODO.md