> - an idea which came up some years ago was to implement a GUI to
> bundle some/all/future tools/features of PDFBox, like printing, rendering,
> preflight, split, merge etc.

The AWT/Swing PDF viewer could do with rewriting. But does anyone want that? 
Maybe support for JavaFX?

> - a high-level API to create PDFs

I've been thinking about this recently and have come to the conclusion that 
it's really hard to do well.

> - an advanced text extractor with table/column support

The table stuff sounds a lot like Tabula? Do we really not have column support? 
We need that!

I'll throw in some ideas too:

- an interface for OCR engines to plug into the text extraction API. It could 
provide access to extracted images or allow badly encoded fonts to be passed to 
OCR one character or text run at a time.

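A rough sketch of what the OCR plug-in point might look like, in Java. Everything here is hypothetical: `OcrEngine`, `OcrAwareExtractor` and `extractRun` are names made up for illustration and are not part of PDFBox. It assumes the extractor can tell when a text run could not be decoded and can hand over a rendered image of that run instead.

```java
import java.awt.image.BufferedImage;

/** Hypothetical plug-in point for OCR engines; not an existing PDFBox type. */
interface OcrEngine {
    /** Returns the recognised text for one rendered character, text run, or image. */
    String recognize(BufferedImage image);
}

/** Hypothetical extractor that falls back to OCR when decoding fails. */
class OcrAwareExtractor {
    private final OcrEngine engine;

    OcrAwareExtractor(OcrEngine engine) {
        this.engine = engine;
    }

    /**
     * Returns the decoded text if there is any, otherwise asks the OCR engine.
     * Assumption: decodedText is null when the font's encoding is unusable.
     */
    String extractRun(String decodedText, BufferedImage renderedRun) {
        if (decodedText != null) {
            return decodedText;
        }
        return engine.recognize(renderedRun);
    }
}
```

An engine would be wired in with something like `new OcrAwareExtractor(myEngine)`, where `myEngine` wraps whatever OCR library is available. Keeping the interface to a single method means any engine can be adapted with a few lines.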

-- John

> On 29 Jan 2014, at 03:20, Andreas Lehmkühler <[email protected]> wrote:
> 
> Hi,
> 
>> On 29 January 2014 at 10:44, Maruan Sahyoun <[email protected]> wrote:
>> 
>> 
>> Hi
>> 
>> shall we try to participate in GSoC? Needs a mentor though.
> That idea already came up from time to time and it didn't work for different
> reasons.
> 
> So, to participate we need a mentor and of course at least one good idea to be
> proposed.
> 
> I won't act as mentor for different reasons but I'll try to help in the normal
> manner.
> 
> IMO an appropriate idea should not deal with PDF-specific low-level features,
> like linearization support, as I doubt that any prospective student is familiar
> with the PDF spec.
> 
> So possible ideas could be:
> 
> - an idea which came up some years ago was to implement a GUI to
> bundle some/all/future tools/features of PDFBox, like printing, rendering,
> preflight, split, merge etc.
> - a high-level API to create PDFs
> - an advanced text extractor with table/column support
> 
> 
>> BR
>> 
>> Maruan Sahyoun
> 
> BR
> Andreas Lehmkühler
