Albretch,

This will likely be a two-step process:

1. Extract page images from the PDF using a tool such as the one you indicate.
2. Extract subpage components from separate image files, using a tool such
as: https://opensource.com/article/18/5/getting-started-luminoth

A Linux terminal makes such processing rather easy.
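
As a rough sketch of step 1 only, and assuming poppler's pdfimages is
installed and on your PATH, something like the following Python wrapper
could pull the page images for a page range. The function names and the
"page" output prefix are my own illustrative choices, and I have not run
this against your document. Step 2 would then feed each resulting file to
whichever detection tool you settle on.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdfimages_cmd(pdf_path, out_prefix, first_page=None, last_page=None):
    """Build a pdfimages command line that writes embedded images as PNG."""
    cmd = ["pdfimages", "-png"]
    if first_page is not None:
        cmd += ["-f", str(first_page)]  # first page to scan
    if last_page is not None:
        cmd += ["-l", str(last_page)]   # last page to scan
    cmd += [str(pdf_path), str(out_prefix)]
    return cmd


def extract_page_images(pdf_path, out_dir, first_page=None, last_page=None):
    """Step 1: run pdfimages and return the PNG files it produced.

    Assumes pdfimages (poppler-utils) is installed; raises otherwise.
    """
    if shutil.which("pdfimages") is None:
        raise RuntimeError("pdfimages (poppler-utils) not found on PATH")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # "page" is a hypothetical prefix; pdfimages appends -NNN.png to it.
    cmd = build_pdfimages_cmd(pdf_path, out_dir / "page", first_page, last_page)
    subprocess.run(cmd, check=True)
    return sorted(out_dir.glob("page-*.png"))
```

For the exam PDF you mention, a call such as
extract_page_images("20000126exam.pdf", "pages", 6, 7) would limit the
extraction to the two pages with the cartoons.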

Note that I have not used Luminoth myself, so I cannot comment on its
accuracy, but its website suggests it may be able to do what you want
(judging by the example document you provided and the examples shown on
its site).

Good luck!

Guy Stalnaker
jimmyg...@gmail.com


On Tue, Sep 3, 2019 at 10:53 AM Albretch Mueller via gimp-user-list <
gimp-user-list@gnome.org> wrote:

>  The output of pdfimages would be a whole-page image if the input is a
> non-searchable, image-based PDF file. Take for example:
>
>  https://www.nysedregents.org/ushistorygov/Archive/20000126exam.pdf
>
>  Which utility would detect the cartoons on pages 6 and 7?
>
>  lbrtchx
> gimp-user-list@gnome.org
> _______________________________________________
> gimp-user-list mailing list
> List address:    gimp-user-list@gnome.org
> List membership: https://mail.gnome.org/mailman/listinfo/gimp-user-list
> List archives:   https://mail.gnome.org/archives/gimp-user-list
>