I think the idea needs to be considerably more exciting to attract students - nobody wants to fix the bugs that even we don’t want to fix!
There are some interesting users of PDFBox, see http://pdfliberation.wordpress.com/ for some possible ideas… lots of people using OCR there too.

> PDFBOX-1594 Add support for AES256 Encryption

Seems like a reasonable project.

-- John

On 29 Jan 2014, at 17:28, Fred Hansen <zweibie...@yahoo.com> wrote:

>
> IMHO a task for GSoC should be non-critical, localized, and not a user
> interface. A "non-critical" task is one where PDFBOX development can continue
> without relying on the project result. A "localized" project is one that can
> be incorporated into the code base with few changes to the base. This will
> limit the effort required to learn about the system into which the effort
> will fit. A "user-interface" project implements an interactive window or an
> API. I have low expectations of the capabilities of students for doing good
> designs in these areas.
>
> So I looked through JIRA for open projects meeting the above. Since I am not
> all that familiar with PDFBOX, some of my suggestions may be laughable and
> surely I have missed some.
> Nonetheless, here's what I found:
>
> PDFBOX-553 writing pdf file in Japanese, garbled
> PDFBOX-570 Windings font recognition + spacing issue
> PDFBOX-605 Better support for Type0 fonts
> PDFBOX-678 Support missing Text Rendering Modes when rendering a PDF
> PDFBOX-870 PDF-To-IMAGE output is not anti-aliased
> PDFBOX-1094 Pattern colorspace support
> PDFBOX-1594 Add support for AES256 Encryption
>   (see also PDFBOX-1450 document how to encrypt with AES 256 )
> PDFBOX-1734 ImageIoUtil.WriteImage doesn't work with tiff images
> PDFBOX-1843 Find a way to test PDFToImage
>
>> ________________________________
>> From: John Hewson <j...@jahewson.com>
>> To: "dev@pdfbox.apache.org" <dev@pdfbox.apache.org>
>> Sent: Wednesday, January 29, 2014 6:38 PM
>> Subject: Re: [DISCUSS] GSoC Participation
>>
>>> - an idea which came up some years ago, was to implement a gui-interface to
>>> bundle some/all/future tools/features of pdfbox, like printing, rendering,
>>> preflight, split, merge etc.
>>
>> The AWT/Swing PDF viewer could do with rewriting. But does anyone want that?
>> Maybe support for JavaFX?
>>
>>> - a high-level api to create pdfs
>>
>> I've been thinking about this recently and have come to the conclusion that
>> it's really hard to do well.
>>
>>> - an advanced text extractor with table/column support
>>
>> The table stuff sounds a lot like Tabula? Do we really not have column
>> support? We need that!
>>
>> I'll throw in some ideas too:
>>
>> - an interface for OCR engines to plug into the text extraction API. It
>> could provide access to extracted images or allow badly encoded fonts to be
>> passed to OCR one character or text run at a time.
>>
>> -
>>
>> -- John
>>
>>
>>> On 29 Jan 2014, at 03:20, Andreas Lehmkühler <andr...@lehmi.de> wrote:
>>>
>>> Hi,
>>>
>>>> Maruan Sahyoun <sahy...@fileaffairs.de> wrote on 29 January 2014 at 10:44:
>>>>
>>>>
>>>> Hi
>>>>
>>>> shall we try to participate in GSoC? Needs a mentor though.
>>> That idea already came up from time to time and it didn't work for different
>>> reasons.
>>>
>>> So, to participate we need a mentor and of course at least one good idea to
>>> be proposed.
>>>
>>> I won't act as mentor for different reasons but I'll try to help in the
>>> normal manner.
>>>
>>> IMO an appropriate idea shall not deal with pdf-specific low-level features,
>>> like linearization support, as I doubt that any possible student is familiar
>>> with the pdf-spec.
>>>
>>> So possible ideas could be:
>>>
>>> - an idea which came up some years ago, was to implement a gui-interface to
>>> bundle some/all/future tools/features of pdfbox, like printing, rendering,
>>> preflight, split, merge etc.
>>> - a high-level api to create pdfs
>>> - an advanced text extractor with table/column support
>>>
>>>
>>>> BR
>>>>
>>>> Maruan Sahyoun
>>>
>>> BR
>>> Andreas Lehmkühler
>>