On Tue, 29 Oct 2019 at 16:49, Alistiar <alist...@seznam.cz> wrote:
>
> Hello,
>
> I was looking at your .pdf search tool (library) that allows data extraction
> from .pdf documents and I’d like to ask about its features. My intention is
> to use your library (APIs) in C++, and my requirement is the following:
> to search for and count keywords in multiple .pdf files at the same time, as
> well as to count all words. I’d also like to ask whether it’s possible to
> exclude prepositions and conjunctions so that they are not part of
> the final word count. I suppose this function won’t be available directly in
> any API, so my question is whether it is possible to extend the APIs
> or write my own function/method to eliminate particular
> words/sentences, or to extend the PoDoFo API functionality in any way?
>

The tools/podofotxtextract tool you mention is a starting point, but it
is currently more of a sample project because its functionality is
very limited. To achieve what you need, you basically have to
understand and implement the parts of the PDF specification that cover
content streams (the PostScript-like page description operators),
because PDF is a very raw presentation format and is not required to
encode semantic concepts such as word boundaries: your "word" could
end up being represented by separate characters, as in "w-o-r-d", each
character at different coordinates and with a different font
specified, and believe me, this happens extremely often in the real
world. Nevertheless, it is doable, and good results can be achieved in
a two-week project (depending on how comfortable you are working with
a C++ library, how familiar you are with the PDF specification, and
the variety of PDFs you have to support). I know because I did
something similar (closed source, sorry).
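
To make the "w-o-r-d" problem concrete, here is a minimal C++ sketch of
the two steps that come after you have pulled positioned characters out
of a page: joining characters into words by their coordinates, and
counting words while skipping a stop list (prepositions, conjunctions).
Everything in it is an assumption for illustration: Glyph,
groupIntoWords, countWords and the gap/tolerance values are
hypothetical names and numbers, not PoDoFo APIs, and it only handles
ASCII text on horizontal baselines. Getting the glyph positions out of
the content stream is the hard part and is not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cmath>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical record for one character recovered from a page;
// this is NOT a PoDoFo type.
struct Glyph {
    char ch;       // decoded character (ASCII only in this sketch)
    double x;      // left edge on the page
    double y;      // baseline position
    double width;  // horizontal advance
};

// Join glyphs into words. A new word starts at a whitespace glyph, at a
// change of line, or when the gap to the previous glyph exceeds maxGap.
std::vector<std::string> groupIntoWords(std::vector<Glyph> glyphs,
                                        double maxGap = 1.0,
                                        double lineTolerance = 2.0) {
    assert(maxGap >= 0.0 && lineTolerance > 0.0);
    // Bucket baselines into lines so the sort order is well defined.
    auto lineOf = [lineTolerance](const Glyph& g) {
        return std::llround(g.y / lineTolerance);
    };
    // Top line first (PDF y grows upward), then left to right.
    std::stable_sort(glyphs.begin(), glyphs.end(),
                     [&](const Glyph& a, const Glyph& b) {
                         const long long la = lineOf(a), lb = lineOf(b);
                         if (la != lb) return la > lb;
                         return a.x < b.x;
                     });

    std::vector<std::string> words;
    std::string current;
    const Glyph* prev = nullptr;
    for (const Glyph& g : glyphs) {
        const bool isSpace = std::isspace(static_cast<unsigned char>(g.ch)) != 0;
        const bool gapBreak =
            prev != nullptr &&
            (lineOf(*prev) != lineOf(g) ||
             g.x - (prev->x + prev->width) > maxGap);
        if ((isSpace || gapBreak) && !current.empty()) {
            words.push_back(current);
            current.clear();
        }
        if (!isSpace) current.push_back(g.ch);
        prev = &g;
    }
    if (!current.empty()) words.push_back(current);
    return words;
}

struct WordStats {
    std::size_t total = 0;                       // words counted, stop words excluded
    std::map<std::string, std::size_t> perWord;  // count per normalized word
};

// Lowercase a word and drop anything that is not a letter or digit.
std::string normalize(const std::string& word) {
    std::string out;
    for (unsigned char c : word) {
        if (std::isalnum(c)) out.push_back(static_cast<char>(std::tolower(c)));
    }
    return out;
}

// Count words, ignoring those whose normalized form is in stopWords.
WordStats countWords(const std::vector<std::string>& words,
                     const std::set<std::string>& stopWords) {
    WordStats stats;
    for (const std::string& w : words) {
        const std::string key = normalize(w);
        if (key.empty() || stopWords.count(key) != 0) continue;
        ++stats.total;
        ++stats.perWord[key];
    }
    return stats;
}
```

With this shape, the keyword search is a lookup in perWord, and running
it over several files is a loop that merges the per-file WordStats. The
thresholds would need tuning per document, since sensible gaps depend
on font size.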

>
> Just to check: I suppose that the library is fully compatible with Windows 10 
> and that it should be fully supported (using APIs and such) in C++ as it was 
> written in C++ (I’ve read that on your website, but I just want to make sure 
> that I haven’t missed anything)?
>

It compiles perfectly fine with Visual Studio 2017.

> Also, is your SW freeware even for a commercial use?
>

It's LGPL 2.0 [1] for the library part, which allows use in commercial
products as long as you respect the terms of the license. If you intend
not to disclose the source of your product, be careful not to derive
your work from the source of the tools (including podofotxtextract),
because they are GPL 2.0 [2].

Cheers,
Francesco

[1] https://sourceforge.net/p/podofo/code/HEAD/tree/podofo/trunk/COPYING.LIB
[2] https://tldrlegal.com/license/gnu-general-public-license-v2


