from:"Maruan Sahyoun"

Re: Website not working correctly when accessed via HTTPS

2014-01-13 Thread Maruan Sahyoun

Hi Arlo,

thanks for your report. But it works fine for me. Which browser are you using?

BR
Maruan Sahyoun

Am 13.01.2014 um 11:47 schrieb Arlo O'Keeffe :

> Hi everyone,
> 
> since the bug tracker is used for only bugs relating to PDFBox itself I am
> posting this here.
> 
> When accessing the project website via HTTPS (https://pdfbox.apache.org/)
> jQuery is not loaded correctly making the cookbook dropdown not work.
> 
> A solution would be to use a protocol-relative URL like
> "//code.jquery.com/jquery-latest.js".
> 
> Hopefully this can be fixed easily.
> 
> Greetings,
> 
> Arlo
> 
> -- 
> Arlo O'Keeffe

Re: Website not working correctly when accessed via HTTPS

2014-01-13 Thread Maruan Sahyoun

Hi Arlo,

I fixed the issue in revision 893921 - could you verify if it now works for you?

With kind regards and many thanks for bringing this to our attention.

Maruan Sahyoun


Am 13.01.2014 um 13:09 schrieb Arlo O'Keeffe :

> Hi Maruan,
> 
> I tested this in Chrome Version 32.0.1700.72 m, Firefox 26.0 and Internet
> Explorer 10 on Windows Server 2008 R2 Standard.
> 
> Greetings,
> 
> Arlo
> 
> 
> 
> 
> On Mon, Jan 13, 2014 at 11:58 AM, Maruan Sahyoun 
> wrote:
> 
>> Hi Arlo,
>> 
>> thanks for your report. But it works fine for me. Which browser are you
>> using?
>> 
>> BR
>> Maruan Sahyoun
>> 
>> Am 13.01.2014 um 11:47 schrieb Arlo O'Keeffe :
>> 
>>> Hi everyone,
>>> 
>>> since the bug tracker is used for only bugs relating to PDFBox itself I am
>>> posting this here.
>>> 
>>> When accessing the project website via HTTPS (https://pdfbox.apache.org/)
>>> jQuery is not loaded correctly making the cookbook dropdown not work.
>>> 
>>> A solution would be to use a protocol-relative URL like
>>> "//code.jquery.com/jquery-latest.js".
>>> 
>>> Hopefully this can be fixed easily.
>>> 
>>> Greetings,
>>> 
>>> Arlo
>>> 
>>> --
>>> Arlo O'Keeffe
>> 
>> 
> 
> 
> -- 
> Arlo O'Keeffe | Skype: arlolok

Re: [VOTE] Release Apache PDFBox 1.8.4

2014-01-28 Thread Maruan Sahyoun

+1 - thanks for preparing the release. I’ll take care of the docs as soon as 
the vote passes.

BR
Maruan Sahyoun

Am 27.01.2014 um 19:18 schrieb Andreas Lehmkuehler :

> Hi,
> 
> a candidate for the PDFBox 1.8.4 release is available at:
> 
>http://people.apache.org/~lehmi/pdfbox/1.8.4/
> 
> The release candidate is a zip archive of the sources in:
> 
>http://svn.apache.org/repos/asf/pdfbox/tags/1.8.4/
> 
> The SHA1 checksum of the archive is 40ad44d5d7948fb0f4e5eeea038a0caf29488bce.
> 
> Please vote on releasing this package as Apache PDFBox 1.8.4.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 PDFBox PMC votes are cast.
> 
>[ ] +1 Release this package as Apache PDFBox 1.8.4
>[ ] -1 Do not release this package because...
> 
> 
> Here is my +1
> 
> BR
> Andreas Lehmkühler

[DISCUSS] GSoC Participation

2014-01-29 Thread Maruan Sahyoun

Hi

shall we try to participate in GSoC? We would need a mentor, though.

BR

Maruan Sahyoun

Re: Test PDF for PDJpeg#replaceHeader

2014-02-03 Thread Maruan Sahyoun

+1 to change the class as suggested.

BR

Maruan Sahyoun

> Am 03.02.2014 um 23:55 schrieb John Hewson :
> 
> Looking at the code in replaceHeader, I see that it overwrites the image's 
> width, height, number of components, and sampling factors with its own 
> hardcoded values. Won’t that just break most JPEG files?
> 
> I’d like to remove this code because it doesn’t really seem like an 
> appropriate solution to the problem and we don’t have any test PDFs. If 
> somebody encounters this issue again out in the real world we’ll at least get 
> a test PDF when they open a new issue. I’m incredibly doubtful that this code 
> is being executed at all out in the wild (excluding the one file it was 
> written for). It’s making it difficult for me to refactor PDJpeg.
> 
> -- John
> 
>> On 3 Feb 2014, at 01:34, Timo Boehme  wrote:
>> 
>> Hi,
>> 
>> Am 01.02.2014 22:39, schrieb John Hewson:
>>> Hi All
>>> 
>>> Does anyone have a PDF file which triggers the call to
>>> PDJpeg#replaceHeader? The comment in the code claims that it fixes
>>> JPEGs with malformed “Adobe” headers, but I can’t find anything on
>>> Google about such images. Is this a real issue or a historic ImageIO
>>> bug?
>> 
>> While I do not have such a PDF I've found a discussion about this topic at 
>> stackoverflow:
>> http://stackoverflow.com/questions/7676701/java-jpeg-converter-for-odd-image-types
>> 
>> I don't know if ImageIO was changed to work with strange/malformed JPEG 
>> headers. But I don't think so.
>> Maybe replacing/'fixing' the header should at least trigger a warning 
>> message since it won't be clear if the resulting image is ok, thus one gets 
>> a hint what the reason for a wrong image could have been.
>> 
>> 
>> Best,
>> Timo
>> 
>> 
>> -- 
>> 
>> Timo Boehme
>> OntoChem GmbH
>> H.-Damerow-Str. 4
>> 06120 Halle/Saale
>> T: +49 345 4780474
>> F: +49 345 4780471
>> timo.boe...@ontochem.com
>> 
>> _
>> 
>> OntoChem GmbH
>> Geschäftsführer: Dr. Lutz Weber
>> Sitz: Halle / Saale
>> Registergericht: Stendal
>> Registernummer: HRB 215461
>> _
>

API docs for 1.8.4

2014-02-08 Thread Maruan Sahyoun

Hi,

I’ve added the API docs to the website. Please review these at 
http://pdfbox.staging.apache.org/apidocs/ and let me know if I can publish 
these.

BR
Maruan Sahyoun

Re: Committers using Netbeans

2014-02-08 Thread Maruan Sahyoun

Hi Tilman,

although I’m using a different setup than yours, I commit to svn.apache.org 
without any issues. So I think your assumption that committing to the US server 
might resolve the issues you are facing is correct.

BR
Maruan Sahyoun

Am 08.02.2014 um 16:46 schrieb Tilman Hausherr :

> Am 08.02.2014 15:53, schrieb Andreas Lehmkuehler:
>>> I'm wondering whether my problem is a configuration problem, an apache 
>>> server
>>> problem, or something else (e.g. a fault in the secret NSA gateway).
>> This is just a guess, but maybe it's a problem with the svn mirroring? The 
>> ASF has a US-located and an EU-located svn. Maybe the US mirror is used for 
>> the commit and the following update runs on the EU mirror, which doesn't 
>> know the version yet due to the mirror delay.
>> 
>> Give svn.eu.apache.org instead of svn.apache.org a try, maybe this will ease 
>> your issue ... but maybe I'm simply wrong
> 
> That makes a lot of sense. Plus, after getting a different problem (certs), I 
> found this:
> 
> https://www.apache.org/dev/version-control.html#out-of-sync
> "This may be because of a short lag in the synchronization between Subversion 
> mirrors, and can occur if multiple commits are run immediately after each 
> other. This error will usually only happen if you are located in Europe, or 
> explicitly using the European mirror.
> 
> Waiting for 10 seconds and repeating the command should succeed."
> 
> Sadly it didn't help. And reading the text again, I think the better solution 
> would be to try my next commit on the US server.
> 
> Tilman

Re: API docs for 1.8.4

2014-02-08 Thread Maruan Sahyoun

OK - the API docs went live on revision 896904 of the cmssite production build. 

There are several warnings about errors in the javadoc tags. I’ll address these 
with PDFBox 2.0 and 1.8.5. Will open an issue for that.

BR 
Maruan Sahyoun

Am 08.02.2014 um 16:50 schrieb Andreas Lehmkuehler :

> Hi,
> 
> Am 08.02.2014 16:45, schrieb Maruan Sahyoun:
>> Hi,
>> 
>> I’ve added the API docs to the website. Please review these at 
>> http://pdfbox.staging.apache.org/apidocs/ and let me know if I can publish 
>> these.
> Looks good to me. Thanks for the effort!
> 
>> BR
>> Maruan Sahyoun
> 
> BR
> Andreas Lehmkühler
> 
>

PDFBox documentation

2014-02-08 Thread Maruan Sahyoun

Hi,

could we add "Documentation" to the Components in PDFBox Jira in order to track 
documentation-related issues?

BR
Maruan Sahyoun

PDFBox Website

2014-02-09 Thread Maruan Sahyoun

Hi ..

I made the instructions on how to get help a little more prominent on the home 
page, so as to direct people to the mailing list and not Jira. Please take a 
look at http://pdfbox.staging.apache.org/index.html and let me know what you think.

BR

Maruan Sahyoun

Re: best way for contribution

2014-02-10 Thread Maruan Sahyoun

Hi,

if possible create a ticket and attach a patch to it. There are some hints on 
coding conventions at http://pdfbox.apache.org/codingconventions.html. If it’s 
possible for you we’d appreciate if you could follow these as it makes it 
easier for us to include the patches.

SVN is the VC system we are using as you already found out.

As we are improving our documentation, maybe you could give us some hints 
about what’s missing to get these questions answered quickly, so we can add 
that for your and others' benefit.

Thank you for your interest in making PDFBox better. 

BR
Maruan Sahyoun

Am 10.02.2014 um 14:42 schrieb Jens Kapitza :

> i found pdfbox on github https://github.com/apache/pdfbox
> i can remember that there was a discussion on how to contribute to the 
> projects (commons or pdfbox)
> 
> 
> i am missing the contribution or source link at the website and a hint if it 
> is better to create tickets (how i did it, 
> http://pdfbox.apache.org/building.html  it seems svn is the default VC system 
> ) or to create a pull request at github. (sources in sync?)
>

[DISCUSS] PDFBox and Exception handling

2014-02-13 Thread Maruan Sahyoun

Hi,

what do you think of having exception handling in PDFBox where people could 
define their own handlers? Something similar to

https://camel.apache.org/exception-clause.html

The benefit would be that we could pass the context, e.g. during PDF parsing, 
and the handler could return something which is then taken as the input. In 
addition to that, maybe we can think about having some additional types of 
exceptions, instead of mostly IOException, to support that.

BR
Maruan Sahyoun

Re: [DISCUSS] GSoC Participation

2014-02-13 Thread Maruan Sahyoun

There were several ideas floating around without a real consensus. From my 
perspective PDFBox would benefit most if missing pieces could be implemented:

- shading types as Tilman suggested
- signature algorithms
- support for different character sets during PDF generation
- PDF optimization e.g. remove duplicate resources when merging PDFs or no 
longer needed ones during splitting a PDF
- PoC work could also be feasible of how to implement different PDF levels and 
standards in a similar, extendable manner.

I’d rather see us completing PDF core features than adding new functionality 
like table recognition, high level PDF creation API or OCR interface although 
these would be very beneficial functionalities.

In addition working on the documentation might be something although not for a 
‚core‘ developer. 

One question which needs to be answered is who would act as a mentor?

BR
Maruan Sahyoun


Am 13.02.2014 um 09:19 schrieb Andreas Lehmkühler :

> Hi,
> 
> for those who are still interested in GSoC, [1] has some information on how
> to participate. According to the mail it's maybe too late, but I would give
> it a try. I've shared a private link, only available to PMC-members, as some
> of the information seems to be private.
> 
> BR
> Andreas Lehmkühler
> 
> [1]
> https://mail-search.apache.org/pmc/private-arch/pdfbox-private/201401.mbox/%3c22705bfe-be29-492a-be12-0749317fb...@apache.org%3E
> 
> 
>> John Hewson  hat am 11. Februar 2014 um 23:52 geschrieben:
>> 
>> 
>> The ideas are supposed to be starting points for students to make their own
>> proposal, so give them some ideas for expanding/reducing the scope and they
>> can choose themselves.
>> 
>> -- John
>> 
>> On 11 Feb 2014, at 13:47, Tilman Hausherr  wrote:
>> 
>>> It's unclear what the "size" of a participation must be. What I'd like to
>>> have is someone to implement shading types 6 and 7, and I think it would be
>>> 1-2 weeks of work. This would be perfect for a math student, or a computer
>>> science student who is specializing in graphics. My own math is from school
>>> 30 years ago and we never did Bézier curves, tensor-products and Bernstein
>>> polynomials so I can't do it without learning the math first.
>>> 
>>> Tilman
>>

Re: [DISCUSS] PDFBox and Exception handling

2014-02-13 Thread Maruan Sahyoun

Hi John,

currently PDFBox mostly throws IOExceptions in situations where the user of the 
library can't do anything about them.

Some of these exceptions occur because a file was not found etc., so that’s OK. 
Others occur because objects are not at the expected position. There are 
workarounds for some of these in PDFBox, e.g. if %%EOF is not the last entry in 
a PDF. Users thus depend on us putting in workarounds to handle such situations.

Now let’s assume there is a situation where an object is not at a certain 
location, or a specific string is missing. What if we throw an exception for 
which one could register a handler? We pass some kind of context (e.g. lexer, 
file position, token) and the user can handle the exception and „enrich“ the 
content or pass the correct information. The exception is then resolved and the 
process can continue.

In addition to that we are able to extend from a strictly conformant parsing to 
a relaxed parsing by using the same mechanism thus having the workarounds not 
in the ‚core‘ parser.
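
To make the idea above more concrete, here is a minimal sketch of what such a 
registered handler could look like. It is only an illustration under 
assumptions: ParseContext, ParseErrorHandler, SketchParser and all of their 
members are hypothetical names and do not exist in PDFBox. The toy parser only 
checks for a trailing %%EOF token.

```java
import java.io.IOException;

/** Hypothetical context passed to a handler; not a PDFBox class. */
final class ParseContext {
    final long filePosition;
    final String expectedToken;
    final String actualToken;

    ParseContext(long filePosition, String expectedToken, String actualToken) {
        this.filePosition = filePosition;
        this.expectedToken = expectedToken;
        this.actualToken = actualToken;
    }
}

/** Hypothetical callback: return a replacement token, or null to give up. */
interface ParseErrorHandler {
    String recover(ParseContext context);
}

/** Toy parser that only knows that the last token should be %%EOF. */
final class SketchParser {
    private ParseErrorHandler handler; // null means strict parsing

    void setErrorHandler(ParseErrorHandler handler) {
        this.handler = handler;
    }

    /** Returns the (possibly repaired) end marker or throws if nobody can repair it. */
    String readEndMarker(String lastToken, long position) throws IOException {
        if ("%%EOF".equals(lastToken)) {
            return lastToken;
        }
        if (handler != null) {
            String replacement =
                    handler.recover(new ParseContext(position, "%%EOF", lastToken));
            if (replacement != null) {
                return replacement; // the handler "enriched" the input, parsing continues
            }
        }
        throw new IOException("Expected %%EOF at offset " + position
                + " but found '" + lastToken + "'");
    }
}

final class HandlerDemo {
    public static void main(String[] args) throws IOException {
        SketchParser parser = new SketchParser();
        // Relaxed behaviour: pretend the expected token was present.
        parser.setErrorHandler(context -> context.expectedToken);
        System.out.println(parser.readEndMarker("garbage", 1234L));
    }
}
```

Without a registered handler the same call throws the IOException, which 
corresponds to strictly conformant parsing.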

BR
Maruan Sahyoun

Am 13.02.2014 um 09:44 schrieb John Hewson :

> I'm not sure I understand what you mean, the Camel examples are very complex 
> indeed. A quick concrete example of what you're after would help greatly.
> 
> -- John
> 
>> On 13 Feb 2014, at 00:20, Maruan Sahyoun  wrote:
>> 
>> Hi,
>> 
>> what do you think of having an exception handling in pdfbox where people 
>> could define their own handlers. Something similar to
>> 
>> https://camel.apache.org/exception-clause.html
>> 
>> The benefit would be that we could pass the context e.g. during PDF parsing 
>> and the handler could return something which is than taken as the input. In 
>> addition to that maybe we can think about having some additional types of 
>> exceptions instead of mostly IOException to support that.  
>> 
>> BR
>> Maruan Sahyoun
>>

Re: [DISCUSS] PDFBox and Exception handling

2014-02-13 Thread Maruan Sahyoun

John

Am 13.02.2014 um 18:50 schrieb John Hewson :

> Maruan,
> 
>> Now let’s assume there is a situation where an object is not at a certain 
>> location, or a specific string is missing …. what if we throw an exception 
>> where one could register a handler. We pass some kind of context e.g. lexer, 
>> file position, token …. and the user can handle the exception and „enrich“ 
>> the content or pass the correct information.
> 
> The idea sounds reasonable in theory, but the more I reflect on it the more I 
> think that we should assume that the user is making use of PDFBox because 
> they don’t want to have to parse the PDF file themselves. I can’t think of an 
> example where the knowledge of how to correct some invalid PDF wouldn’t be 
> better off existing within PDFBox itself, rather than in user code.

Of course they don’t want to parse it themselves. They can expect that PDFBox 
can handle a valid PDF file. But in case a file is invalid for whatever reason, 
the only options are to either wait until we include a workaround or put it in 
themselves. The idea is to have an entry point. What’s the benefit of an 
exception when one can’t do anything about it? And if you don’t want to write 
your own handler you are not forced to do so. 
 
> 
> From a technical standpoint, exposing the internal parser context to the user 
> seems particularly problematic: the internal implementation details which are 
> part of the context now become part of PDFBox’s public API which needs to be 
> kept stable between major releases. How is the user to resolve a non-trivial 
> exception and allow parsing to continue in a manner which leaves the 
> internals of the parser in a consistent state? If we don’t know how users are 
> resolving exceptions out in the real world, how can we be sure that changes 
> we make to the parser later won’t break their code?

One can only assume that a documented API is stable. As long as this is the 
case why should it break their code. Of course if a different file is causing a 
similar exception which will be dealt with by the exception handler and the 
code is not able to deal with it ...

> 
>> In addition to that we are able to extend from a strictly conformant parsing 
>> to a relaxed parsing by using the same mechanism thus having the workarounds 
>> not in the ‚core‘ parser.
> 
> 
> My suggestion would be to either subclass the core parser or pass it a 
> “conformance level” argument, e.g. PDF_1_5 or PDF_X. I don’t think any 
> external error handling/recovery mechanism is going to work in practice, 
> especially if that means generating thousands of exceptions when given a bad 
> content stream.
> 

It’s not about supporting different standards - that’s a different thing 
(currently PDFBox doesn’t have a concept of applying standards or versions - 
functions are either available or not, regardless of when they became part of 
the PDF spec). It’s about having a core which handles conformant files and an 
extension which handles workarounds for nonconformant files. Currently that’s 
all within the code - sometimes marked, sometimes not - which makes it 
difficult to rewrite the parser. As you already found out sometimes a fix was 
made to handle a single occurrence of a file and the file itself might no 
longer exist.


> -- John
> 
> On 13 Feb 2014, at 03:24, Maruan Sahyoun  wrote:
> 
>> Hi John,
>> 
>> currently pdfbox mostly throws IOExceptions where the user of the lib is not 
>> able to do something about it. 
>> 
>> Some of these exceptions could occur because a file was not found etc. So 
>> that’s ok. Others might occur because objects are not at a certain position. 
>> There are workarounds for some of these in pdfbox e.g. if %%EOF is not the 
>> last entry in a PDF. Thus users are dependent on us putting in the 
>> workarounds to handle such situations. 
>> 
>> Now let’s assume there is a situation where an object is not at a certain 
>> location, or a specific string is missing …. what if we throw an exception 
>> where one could register a handler. We pass some kind of context e.g. lexer, 
>> file position, token …. and the user can handle the exception and „enrich“ 
>> the content or pass the correct information. The exception is then resolved 
>> and the process can continue.
>> 
>> In addition to that we are able to extend from a strictly conformant parsing 
>> to a relaxed parsing by using the same mechanism thus having the workarounds 
>> not in the ‚core‘ parser.
>> 
>> BR
>> Maruan Sahyoun
>> 
>> Am 13.02.2014 um 09:44 schrieb John Hewson :
>> 
>>> I'm not sure I understand what you mean, the Camel examples are very 
>>> complex indeed.

Re: [DISCUSS] PDFBox and Exception handling

2014-02-14 Thread Maruan Sahyoun

That would mean that every improvement or 
bugfix which changes the result breaks the contract. E.g. let’s say that we 
extract additional text, or we no longer extract text that should not have been 
extracted or we render a PDF differently …. 

So I do get and understand your point - I don’t share your view though. 


> 
>> It’s not about supporting different standards […] It’s about having a core 
>> which handles conformant files and an extension which handles workarounds 
>> for nonconformant files. 
> 
> A commonly used approach to parsing programming languages is to have a core 
> language which is small, easily parsed and with an AST which is easy to 
> manipulate. On top of that is another parser which handles all of the 
> syntactic sugar of the language, transforming a complex concrete AST into a 
> simple core AST. Perhaps PDFBox could take a similar approach with 
> ConformingParser having a NonConformingParser subclass which is capable of 
> pre-processing bad PDF files before they reach the core parser. The actual 
> implementation may be more subtle than this, perhaps with some back-and-forth 
> between the conforming and non-conforming parsers, so that when the 
> conforming parser encounters an error it can call a protected method which in 
> ConformingParser would throw an error but in NonConformingParser would 
> perform a recovery, as you proposed. But by using protected methods we avoid 
> the maintainability problem caused by making the error recovery mechanism 
> public.
> 
> What do you think?

This is a good and valid approach, but doesn’t address the intention I had.
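
For reference, the subclassing approach John describes in the quoted text 
could be sketched roughly as follows. The class names ConformingParser and 
NonConformingParser are taken from his mail; the methods parseHeader and 
recoverHeader, and the toy header check itself, are assumptions made up for 
this illustration and are not PDFBox API.

```java
import java.io.IOException;

/** Strict parser: any deviation from the expected syntax is an error. */
class ConformingParser {
    private static final String MARKER = "%PDF-";

    /** Parses a toy header line and returns the version string, e.g. "1.4". */
    public String parseHeader(String line) throws IOException {
        if (line.startsWith(MARKER)) {
            return line.substring(MARKER.length());
        }
        // Hand over to the protected extension point instead of failing directly.
        return recoverHeader(line);
    }

    /** Extension point; the strict parser refuses to recover. */
    protected String recoverHeader(String line) throws IOException {
        throw new IOException("Missing " + MARKER + " header: '" + line + "'");
    }
}

/** Lenient parser: workarounds live here, not in the core class. */
class NonConformingParser extends ConformingParser {
    @Override
    protected String recoverHeader(String line) throws IOException {
        int index = line.indexOf("%PDF-");
        if (index >= 0) {
            // Workaround: tolerate junk in front of the header.
            return line.substring(index + "%PDF-".length());
        }
        return super.recoverHeader(line);
    }
}

final class ParserDemo {
    public static void main(String[] args) throws IOException {
        System.out.println(new NonConformingParser().parseHeader("junk%PDF-1.4"));
    }
}
```

Because recoverHeader is protected, the recovery hook is available to 
subclasses but does not become part of the public API, which is the 
maintainability point made in the quoted mail.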


> 
> -- John
> 
> On 13 Feb 2014, at 10:57, Maruan Sahyoun  wrote:
> 
>> John
>> 
>> Am 13.02.2014 um 18:50 schrieb John Hewson :
>> 
>>> Maruan,
>>> 
>>>> Now let’s assume there is a situation where an object is not at a certain 
>>>> location, or a specific string is missing …. what if we throw an exception 
>>>> where one could register a handler. We pass some kind of context e.g. 
>>>> lexer, file position, token …. and the user can handle the exception and 
>>>> „enrich“ the content or pass the correct information.
>>> 
>>> The idea sounds reasonable in theory, but the more I reflect on in the more 
>>> I think that we should assume that the user is making use of PDFBox because 
>>> they don’t want to have to parse the PDF file themselves. I can’t think of 
>>> an example where the knowledge of how to correct some invalid PDF would’t 
>>> be better off existing within PDFBox itself, rather than in user code.
>> 
>> Of course they don’t want to parse it themselves. They can expect that 
>> PDFBox can handle a valid PDF file. But in case a file is invalid for 
>> whatever reason the only options are to either wait until we include a 
>> workaround or put it in themselves. The idea is to have an entry point. 
>> What’s the benefit of an exception when one can’t do anything about it.  And 
>> if you don’t want to write your handler you are not enforced to do so. 
>> 
>>> 
>>> From a technical standpoint, exposing the internal parser context to the 
>>> user seems particularly problematic: the internal implementation details 
>>> which are part of the context now become part of PDFBox’s public API which 
>>> needs to be kept stable between major releases. How is the user to resolve 
>>> a non-trivial exception and allow parsing to continue in a manner which 
>>> leaves the internals of the parser in a consistent state? If we don’t know 
>>> how users are resolving exceptions out in the real world, how can we be 
>>> sure that changes we make to the parser later won’t break their code?
>> 
>> One can only assume that a documented API is stable. As long as this is the 
>> case why should it break their code. Of course if a different file is 
>> causing a similar exception which will be dealt with by the exception 
>> handler and the code is not able to deal with it ...
>> 
>>> 
>>>> In addition to that we are able to extend from a strictly conformant 
>>>> parsing to a relaxed parsing by using the same mechanism thus having the 
>>>> workarounds not in the ‚core‘ parser.
>>> 
>>> 
>>> My suggestion would be to either subclass the core parser or pass it a 
>>> “conformance level” argument, e.g. PDF_1_5 or PDF_X. I don’t think any 
>>> external error handling/recovery mechanism is going to work in practice, 
>>> especially if that means generating thousands of exceptions when given a 
>>> bad content stream.
>>

Re: [DISCUSS] PDFBox and Exception handling

2014-02-16 Thread Maruan Sahyoun

Hi Fred,

unfortunately the attachment didn't make it through due to restrictions of the 
mailing list - could you make it available somewhere on a public site?

BR

Maruan Sahyoun

> Am 16.02.2014 um 01:04 schrieb Fred Hansen :
> 
> 
> Just in case you're not tired of exceptions, I've written the attached. It 
> concludes that the right-thing-to-do is to examine individually each throw 
> statement.
>

Re: [DISCUSS] PDFBox and Exception handling

2014-02-16 Thread Maruan Sahyoun

Hi Fred,

thank you for putting down your thoughts, very helpful.

BR
Maruan Sahyoun

> Am 16.02.2014 um 23:36 schrieb Fred Hansen :
> 
> I've converted the attachment to a web page:
>http://physpics.com/Java/Notes/ExceptionHandling.php
> 
> 
> 
> 
>> ____________
>> From: Maruan Sahyoun 
>> To: "dev@pdfbox.apache.org"  
>> Sent: Sunday, February 16, 2014 3:36 AM
>> Subject: Re: [DISCUSS] PDFBox and Exception handling
>> 
>> 
>> Hi Fred,
>> 
>> unfortunately the attachment didn't make it through due to restrictions of 
>> the mailing list - could you make it available somewhere on a public site?
>> 
>> BR
>> 
>> Maruan Sahyoun
>> 
>> 
>>> Am 16.02.2014 um 01:04 schrieb Fred Hansen :
>>> 
>>> 
>>> Just in case you're not tired of exceptions, I've written the attached. It 
>>> concludes that the right-thing-to-do is to examine individually each throw 
>>> statement.
>>

PDFBox and GitHub

2014-02-17 Thread Maruan Sahyoun

Hi,

according to Infra there is a better GitHub integration available as an opt-in 
feature

https://blogs.apache.org/infra/entry/improved_integration_between_apache_and

Shall we use it?

Maruan Sahyoun

pdfbox.io - which should I use

2014-02-18 Thread Maruan Sahyoun

Hi,

there are currently a number of different options to use as a base for a 
potential new parser/lexer. The ones currently in use are

BaseParser: 
import org.apache.pdfbox.io.PushBackInputStream;
import org.apache.pdfbox.io.RandomAccess;

PDFParser (additional):
import org.apache.pdfbox.io.RandomAccess;

NonSequentialParser:
import org.apache.pdfbox.io.PushBackInputStream;
import org.apache.pdfbox.io.RandomAccess;
import org.apache.pdfbox.io.RandomAccessBuffer;
import org.apache.pdfbox.io.RandomAccessBufferedFileInputStream;

There are some additional Classes/Interfaces in the io package e.g. 
RandomAccessBufferedFileInputStream implementing RandomAccessRead

Any preferences or ideas for consolidating this? 

Currently I’m using RandomAccessBufferedFileInputStream (with some additional 
implementations of RandomAccessRead to support reading from a ByteArray for 
testing purposes)

BR

Maruan Sahyoun

Re: PDFBox and GitHub

2014-02-18 Thread Maruan Sahyoun

Hi,

Am 18.02.2014 um 13:00 schrieb Andreas Lehmkühler :

> Hi,
> 
>> Maruan Sahyoun  hat am 17. Februar 2014 um 09:16
>> geschrieben:
>> 
>> 
>> Hi,
>> 
>> according to Infra there is a better GitHub integration available on as an 
>> opt
>> in feature
>> 
>> https://blogs.apache.org/infra/entry/improved_integration_between_apache_and
>> 
>> Shall we use it?
> I'm not sure if I got the point. Is your idea to do the switch from svn to git
> or to use those
> opt in features with our readonly git mirror (is that possible)?

If I understood correctly that’s possible with the current setup. So I’m not 
proposing to switch to git from svn as part of that question.

> 
>> Maruan Sahyoun
> 
> BR
> Andreas Lehmkühler

BR
Maruan

Re: pdfbox.io - which should I use

2014-02-18 Thread Maruan Sahyoun

Yes, we could use RandomAccessRead as a base and subclasses to wrap NIO and 
others. 

Then the parsers would use RandomAccessRead
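
A rough sketch of that layering, under assumptions: SeekableSource below is a 
stand-in written for this example and is not PDFBox's actual RandomAccessRead 
interface; ByteArraySource, ByteBufferSource and HeaderSniffer are likewise 
hypothetical names. The point is only that the parser side depends on one 
small interface while the implementations wrap a byte array or an NIO buffer.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Stand-in for a random-access read abstraction; not PDFBox's RandomAccessRead. */
interface SeekableSource {
    int read() throws IOException;               // next byte as 0..255, or -1 at end
    void seek(long position) throws IOException; // absolute position
    long length() throws IOException;
}

/** In-memory implementation, handy for unit tests. */
final class ByteArraySource implements SeekableSource {
    private final byte[] data;
    private int position;

    ByteArraySource(byte[] data) {
        this.data = data.clone();
    }

    public int read() {
        return position < data.length ? data[position++] & 0xFF : -1;
    }

    public void seek(long newPosition) throws IOException {
        if (newPosition < 0 || newPosition > data.length) {
            throw new IOException("Seek out of range: " + newPosition);
        }
        position = (int) newPosition;
    }

    public long length() {
        return data.length;
    }
}

/** Wraps any ByteBuffer, including a MappedByteBuffer returned by FileChannel.map. */
final class ByteBufferSource implements SeekableSource {
    private final ByteBuffer buffer;

    ByteBufferSource(ByteBuffer buffer) {
        this.buffer = buffer.duplicate(); // independent position, shared content
    }

    public int read() {
        return buffer.hasRemaining() ? buffer.get() & 0xFF : -1;
    }

    public void seek(long newPosition) throws IOException {
        if (newPosition < 0 || newPosition > buffer.limit()) {
            throw new IOException("Seek out of range: " + newPosition);
        }
        buffer.position((int) newPosition);
    }

    public long length() {
        return buffer.limit();
    }
}

/** Toy parser-side helper that depends only on the interface. */
final class HeaderSniffer {
    static boolean startsWithPdfHeader(SeekableSource source) throws IOException {
        source.seek(0);
        byte[] expected = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        for (byte b : expected) {
            if (source.read() != b) {
                return false;
            }
        }
        return true;
    }
}
```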

WDYT

Maruan Sahyoun

> Am 18.02.2014 um 21:42 schrieb John Hewson :
> 
> The streams used by BaseParser and PDFParser are sequential, so you can 
> ignore them.
> Use of PushBackInputStream in the non-sequential parser seems a little odd. 
> 
> We might want to think about getting rid of the classes in 
> org.apache.pdfbox.io and replacing
> them with classes from java.nio.channels. It looks like the PDFBox classes 
> pre-date NIO.
> With NIO we could use memory mapped files, which for large PDFFiles will 
> perform better
> than an InputStream.
> 
> -- John
> 
>> On 18 Feb 2014, at 03:53, Maruan Sahyoun  wrote:
>> 
>> Hi,
>> 
>> there are currently a number of different options to use as a base for a 
>> potential new parser/lexer. The ones currently in use are
>> 
>> BaseParser: 
>> import org.apache.pdfbox.io.PushBackInputStream;
>> import org.apache.pdfbox.io.RandomAccess;
>> 
>> PDFParser (additional):
>> import org.apache.pdfbox.io.RandomAccess;
>> 
>> NonSequentialParser:
>> import org.apache.pdfbox.io.PushBackInputStream;
>> import org.apache.pdfbox.io.RandomAccess;
>> import org.apache.pdfbox.io.RandomAccessBuffer;
>> import org.apache.pdfbox.io.RandomAccessBufferedFileInputStream;
>> 
>> There are some additional Classes/Interfaces in the io package e.g. 
>> RandomAccessBufferedFileInputStream implementing RandomAccessRead
>> 
>> Any preferences, ideas of consolidating this? 
>> 
>> Currently I’m using RandomAccessBufferedFileInputStream with some additional 
>> implementations of RandomAccessRead to support reading from a ByteArray for 
>> testing purposes)
>> 
>> BR
>> 
>> Maruan Sahyoun
>

Re: pdfbox.io - which should I use

2014-02-18 Thread Maruan Sahyoun

Hi John,

I'd think that we would still need pdfbox.io, as e.g. SeekableByteChannel 
doesn't give us an easy way of reading a single char (needed for parsing). That 
would be a small wrapper, though, so we don't need to handle it inside the 
parsers. The reason is that data is read into a ByteBuffer, which is a chunk of data.
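
Such a wrapper could look roughly like the sketch below. SingleByteReader is a 
hypothetical name used for illustration only; it is neither part of PDFBox nor 
of the JDK. It assumes a blocking channel and uses only 
SeekableByteChannel.read(ByteBuffer) and position(long), which require Java 1.7.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;

/** Hypothetical wrapper offering single-byte reads on top of a SeekableByteChannel. */
final class SingleByteReader {
    private final SeekableByteChannel channel;
    private final ByteBuffer buffer;

    SingleByteReader(SeekableByteChannel channel, int bufferSize) {
        this.channel = channel;
        this.buffer = ByteBuffer.allocate(bufferSize);
        this.buffer.limit(0); // start empty so that the first read() fills the buffer
    }

    /** Returns the next byte as 0..255, or -1 at the end of the channel. */
    int read() throws IOException {
        if (!buffer.hasRemaining()) {
            buffer.clear();
            int count;
            do {
                count = channel.read(buffer); // reads a whole chunk at once
            } while (count == 0);
            if (count < 0) {
                buffer.limit(0); // keep the buffer empty at end of data
                return -1;
            }
            buffer.flip();
        }
        return buffer.get() & 0xFF;
    }

    /** Moves to an absolute position and discards any buffered bytes. */
    void seek(long position) throws IOException {
        channel.position(position);
        buffer.limit(0);
    }
}
```

A parser would then call read() and seek() and never see the ByteBuffer 
chunks; a channel for a file can be obtained with 
java.nio.file.Files.newByteChannel.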

Maruan Sahyoun

> Am 19.02.2014 um 04:45 schrieb John Hewson :
> 
> RandomAccessRead looks like it could be replaced with 
> java.nio.channels.SeekableByteChannel as implemented by 
> java.nio.channels.FileChannel.
> 
> -- John
> 
>> On 18 Feb 2014, at 12:50, Maruan Sahyoun  wrote:
>> 
>> Yes, we could use RandomAccessRead as a base and subclasses to wrap NIO and 
>> others. 
>> 
>> Then the parsers would use RandomAccessRead
>> 
>> WDYT
>> 
>> Maruan Sahyoun
>> 
>>> Am 18.02.2014 um 21:42 schrieb John Hewson :
>>> 
>>> The streams used by BaseParser and PDFParser are sequential, so you can 
>>> ignore them.
>>> Use of PushBackInputStream in the non-sequential parser seems a little odd. 
>>> 
>>> We might want to think about getting rid of the classes in 
>>> org.apache.pdfbox.io and replacing
>>> them with classes from java.nio.channels. It looks like the PDFBox classes 
>>> pre-date NIO.
>>> With NIO we could use memory mapped files, which for large PDFFiles will 
>>> perform better
>>> than an InputStream.
>>> 
>>> -- John
>>> 
>>>> On 18 Feb 2014, at 03:53, Maruan Sahyoun  wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> there are currently a number of different options to use as a base for a 
>>>> potential new parser/lexer. The ones currently in use are
>>>> 
>>>> BaseParser: 
>>>> import org.apache.pdfbox.io.PushBackInputStream;
>>>> import org.apache.pdfbox.io.RandomAccess;
>>>> 
>>>> PDFParser (additional):
>>>> import org.apache.pdfbox.io.RandomAccess;
>>>> 
>>>> NonSequentialParser:
>>>> import org.apache.pdfbox.io.PushBackInputStream;
>>>> import org.apache.pdfbox.io.RandomAccess;
>>>> import org.apache.pdfbox.io.RandomAccessBuffer;
>>>> import org.apache.pdfbox.io.RandomAccessBufferedFileInputStream;
>>>> 
>>>> There are some additional Classes/Interfaces in the io package e.g. 
>>>> RandomAccessBufferedFileInputStream implementing RandomAccessRead
>>>> 
>>>> Any preferences, ideas of consolidating this? 
>>>> 
>>>> Currently I’m using RandomAccessBufferedFileInputStream with some 
>>>> additional implementations of RandomAccessRead to support reading from a 
>>>> ByteArray for testing purposes)
>>>> 
>>>> BR
>>>> 
>>>> Maruan Sahyoun
>

Re: pdfbox.io - which should I use

2014-02-18 Thread Maruan Sahyoun

Hi John,

I forgot: SeekableByteChannel is only available as of Java 1.7

BR
Maruan Sahyoun

Am 19.02.2014 um 04:45 schrieb John Hewson :

> RandomAccessRead looks like it could be replaced with 
> java.nio.channels.SeekableByteChannel as implemented by 
> java.nio.channels.FileChannel.
> 
> -- John
> 
> On 18 Feb 2014, at 12:50, Maruan Sahyoun  wrote:
> 
>> Yes, we could use RandomAccessRead as a base and subclasses to wrap NIO and 
>> others. 
>> 
>> Then the parsers would use RandomAccessRead
>> 
>> WDYT
>> 
>> Maruan Sahyoun
>> 
>>> Am 18.02.2014 um 21:42 schrieb John Hewson :
>>> 
>>> The streams used by BaseParser and PDFParser are sequential, so you can 
>>> ignore them.
>>> Use of PushBackInputStream in the non-sequential parser seems a little odd. 
>>> 
>>> We might want to think about getting rid of the classes in 
>>> org.apache.pdfbox.io and replacing
>>> them with classes from java.nio.channels. It looks like the PDFBox classes 
>>> pre-date NIO.
>>> With NIO we could use memory-mapped files, which for large PDF files will 
>>> perform better
>>> than an InputStream.
>>> 
>>> -- John
>>> 
>>>> On 18 Feb 2014, at 03:53, Maruan Sahyoun  wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> there are currently a number of different options to use as a base for a 
>>>> potential new parser/lexer. The ones currently in use are
>>>> 
>>>> BaseParser: 
>>>> import org.apache.pdfbox.io.PushBackInputStream;
>>>> import org.apache.pdfbox.io.RandomAccess;
>>>> 
>>>> PDFParser (additional):
>>>> import org.apache.pdfbox.io.RandomAccess;
>>>> 
>>>> NonSequentialParser:
>>>> import org.apache.pdfbox.io.PushBackInputStream;
>>>> import org.apache.pdfbox.io.RandomAccess;
>>>> import org.apache.pdfbox.io.RandomAccessBuffer;
>>>> import org.apache.pdfbox.io.RandomAccessBufferedFileInputStream;
>>>> 
>>>> There are some additional Classes/Interfaces in the io package e.g. 
>>>> RandomAccessBufferedFileInputStream implementing RandomAccessRead
>>>> 
>>>> Any preferences, ideas of consolidating this? 
>>>> 
>>>> Currently I’m using RandomAccessBufferedFileInputStream with some 
>>>> additional implementations of RandomAccessRead (to support reading from a 
>>>> byte array for testing purposes).
>>>> 
>>>> BR
>>>> 
>>>> Maruan Sahyoun
>>> 
>
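
As a rough illustration of the NIO idea discussed in this thread, the sketch below reads bytes at arbitrary offsets through a memory-mapped FileChannel. The class MappedPdfReader and its methods are hypothetical and not part of PDFBox. FileChannel.open and try-with-resources need Java 7, the same constraint noted above for SeekableByteChannel.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical random-access reader backed by a memory-mapped file (not a PDFBox class).
final class MappedPdfReader implements AutoCloseable {
    private final FileChannel channel;
    private final MappedByteBuffer buffer;

    MappedPdfReader(Path file) throws IOException {
        channel = FileChannel.open(file, StandardOpenOption.READ);
        // Maps the whole file read-only; a single mapping cannot exceed Integer.MAX_VALUE bytes.
        buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
    }

    // Number of bytes available through the mapping.
    long length() {
        return buffer.capacity();
    }

    // Absolute read: returns the unsigned byte at the given offset, or -1 if out of range.
    int read(long offset) {
        if (offset < 0 || offset >= buffer.capacity()) {
            return -1;
        }
        return buffer.get((int) offset) & 0xFF;
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

A parser written against an interface such as RandomAccessRead could wrap a class like this, which is roughly the "subclasses to wrap NIO" idea from the thread.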

Re: Color Space Refactoring

2014-02-20 Thread Maruan Sahyoun

Hi John,

that's no doubt a great enhancement and a huge step forward.

BR

Maruan Sahyoun

> On 20.02.2014 at 11:17, John Hewson wrote:
> 
> Hi All
> 
> I have just committed a significant refactoring of color spaces to trunk. The 
> main purpose of the change is to encapsulate all color space handling code 
> within PDColorSpace and its subclasses. Until now there was color handling 
> code in many different places, including separate code for each image format. 
> Due to the close link between images, color, and performance it has been 
> necessary to rewrite much of the image reading code.
> 
> Here's a summary of the changes:
> 
> - PDCcitt has been removed, its reading capability has moved to 
> CCITTFaxFilter and writing capability has moved to CCITTFactory.
> 
> - PDJpeg has been removed. JPEG reading is now done by new code in DCTFilter 
> which correctly handles CMYK/YCCK color. This fixes various files where 
> images appeared like negatives. JPEG writing is done by new code in 
> JPEGFactory.
> 
> - cleaned up JBIG2Filter
> 
> - cleaned up JPXFilter, in particular calling decode() caused the stream 
> dictionary to be updated, which was unsafe. I've also added a special 
> JPXColorSpace which wraps the embedded AWT color space of a JPX 
> BufferedImage, this replaces the need for the awkward mapping of ColorSpace 
> to PDColorSpace.
> 
> - Added better error messages for missing JAI plugins (JPX, JBIG2). A special 
> exception, MissingImageReaderException is now thrown.
> 
> - PDXObjectForm has been renamed to PDFormXObject to match the PDF spec.
> - PDXObjectImage has been renamed in the same manner.
> - PDInlinedImage has been renamed to PDInlineImage for the same reason.
> - CCITTFaxDecodeFilter has been renamed to CCITTFaxFilter for consistency 
> with the other filters.
> 
> - ImageParameters has been removed, it was used to represent inline image 
> parameters which are now simply members of PDInlineImage.
> 
> - added PDColor which represents a color value, including patterns, it is 
> immutable for ease of use.
> 
> - removed PDColorState which was a container for both a color and a color 
> space, in almost every case it was used to represent a color and so has been 
> replaced by PDColor and occasionally PDColorSpace.
> 
> - moved most of the functionality of PDXObject into its subclasses
> 
> - rewrote almost all color handling code in all PDColorSpace subclasses, 
> including fixing the calculations for L*a*b*, DeviceN, and indexed color 
> spaces. 
> 
> - all color spaces now implement a toRGB(float[]) function for color 
> conversion, so external consumers of color spaces no longer have to know 
> about internals such as tint transforms.
> 
> - image color conversion is now performed in one operation, using 
> ColorConvertOp, rather than pixel-by-pixel, this speeds up ICC transforms by 
> many orders of magnitude. Color spaces now expose a special method 
> toImageRGB(Raster) for this purpose. This fixes some known performance issues 
> with certain files.
> 
> - updated Type1, Axial, Radial, and Gouraud shading contexts to call the new 
> toRGB functions. This is an interim measure, for better performance the color 
> conversion should instead be done using toImageRGB after the entire gradient 
> is drawn to the raster.
> 
> - creation of AWT Paint has been moved inside color spaces, hiding the 
> details from the caller. It is no longer possible to get an AWT Color from a 
> color space, only a Paint may be obtained.
> 
> - removed PDColorSpaceFactory and moved its functionality into PDColorSpace.
> 
> - moved some of the new shading and tiling pattern code to PDPattern so that 
> toPaint() is encapsulated in the color space.
> 
> - new PDImage interface which is implemented by both PDInlineImage and 
> PDImageXObject
> 
> - Image XObject image reading, masking and stencilling code has been 
> rewritten, resulting in the removal of CompositeImage.
> 
> - new SampledImageReader performs image reading for all formats, including 
> JPEG and CCITT. The format itself is simply a filter, as is the case in the 
> PDF spec. New image reading handles decode arrays, interpolation, and 
> conversion of all image types to efficient 8bpp rasters. This replaces 
> PDPixelMap as well as reading code from PDJpeg and PDCcitt. Handling of decode 
> arrays fixes various issues where images were inverted, especially inline 
> images in Type 3 fonts.
> 
> - removed SetNonStrokingICCBasedColor, SetNonStrokingIndexed, 
> SetNonStrokingPattern, SetNonStrokingSeparation, SetStrokingICCBasedColor, 
> SetStrokingIndexed, SetStrokingPattern, SetStrokingSeparation, and replaced 
> them with SetColor.
> 
> There will no doubt be some regressions, please post a comment on PDFBOX-1893 
> to let me know.
> 
> Thanks
> 
> -- John
> 
>
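
To illustrate the point above about converting a whole image in one operation rather than pixel by pixel, here is a minimal sketch using the standard java.awt.image.ColorConvertOp. The class WholeImageConversion is hypothetical. It is not the PDFBox toImageRGB implementation, which works on a Raster and a PDF color space.

```java
import java.awt.image.BufferedImage;
import java.awt.image.ColorConvertOp;

// Hypothetical helper showing single-operation color conversion (not PDFBox code).
final class WholeImageConversion {
    // Converts an image to sRGB in one operation instead of looping over pixels.
    static BufferedImage toRgb(BufferedImage source) {
        BufferedImage target = new BufferedImage(
                source.getWidth(), source.getHeight(), BufferedImage.TYPE_INT_RGB);
        // With null rendering hints, the source and target color spaces are
        // taken from the two images themselves.
        new ColorConvertOp(null).filter(source, target);
        return target;
    }
}
```

The conversion is handed to the color management module as a single call, so the per-pixel overhead of repeatedly setting up a transform is avoided.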

Re: covert to Image is very slow

2014-02-25 Thread Maruan Sahyoun

Antonio,

in addition to John’s comment:

is the 4 to 5 seconds the time for the conversion alone (page.convertToImage) 
or for the complete run? Could you time each portion separately?

BR 
Maruan Sahyoun

On 25.02.2014 at 19:20, John Hewson wrote:

> Antonio
> 
> For complex pages or pages with many images 4-5 seconds is to be expected.
> If the page in question is very simple there may be something PDFBox can fix
> to speed things up. If so, open an issue on the PDFBox JIRA and attach the PDF
> file via More > Attach Files.
> 
> Before doing so, please try the latest 2.0.0 trunk snapshot, we have recently 
> made
> a number of performance improvements.
> 
> Some general speed tips: use TYPE_INT_RGB or TYPE_INT_ARGB buffers,
> not *_BGR and try rendering at a lower resolution, if possible.
> 
> -- John
> 
> On 25 Feb 2014, at 06:15, Antonio González  wrote:
> 
>> Hi
>> 
>> When I convert a PDF file to an image it is very slow: 4 or 5 seconds.
>> 
>> My code is:
>> 
>> 
>> String fichero = "C:\\guiaalfresco.pdf";
>> PDDocument pdfDocument = null;
>> try {
>>     File file = new File(fichero);
>>     pdfDocument = PDDocument.load(file);
>>     List pages = pdfDocument.getDocumentCatalog().getAllPages();
>>     if (pages.size() > 0) {
>>         // Grab the first page of the PDF
>>         PDPage page = (PDPage) pages.get(0);
>>         // Convert the PDF page to an image
>>         BufferedImage image = page.convertToImage(BufferedImage.TYPE_INT_BGR, 200);
>>         pdfDocument.close();
>>         File outputfile = new File("c:\\saved.png");
>>         BufferedImage imagen = resizeImage(image, 200);
>>         ImageIO.write(imagen, "png", outputfile);
>>     }
>> } catch (IOException e) {
>>     e.printStackTrace();
>> }
>
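
One way to time the portions separately, as asked above, is a small helper that records the wall-clock duration of each step. StepTimer is a hypothetical class written for this illustration. The lambda syntax in the usage comment assumes Java 8.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;

// Hypothetical helper for timing individual steps (not part of PDFBox).
final class StepTimer {
    private final Map<String, Long> millis = new LinkedHashMap<String, Long>();

    // Runs the step, records its duration in milliseconds under the label,
    // and returns the step's result. The duration is recorded even on failure.
    <T> T time(String label, Callable<T> step) throws Exception {
        long start = System.nanoTime();
        try {
            return step.call();
        } finally {
            millis.put(label, (System.nanoTime() - start) / 1000000L);
        }
    }

    // Durations in the order the steps were run.
    Map<String, Long> results() {
        return millis;
    }
}

// Usage with the calls from the quoted code (Java 8 lambdas assumed):
//   StepTimer timer = new StepTimer();
//   PDDocument doc = timer.time("load", () -> PDDocument.load(file));
//   BufferedImage img = timer.time("render",
//           () -> page.convertToImage(BufferedImage.TYPE_INT_RGB, 200));
//   System.out.println(timer.results());
```

Timing load, render, resize and ImageIO.write separately should show whether the 4 to 5 seconds are spent in PDFBox rendering or elsewhere.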

Re: Remove AWT Fonts

2014-03-04 Thread Maruan Sahyoun

Hi John,

what about just using the platform fonts? If not, LaTeX uses the URW++ fonts, 
which were made available under the http://www.latex-project.org/lppl license 
(the same fonts are used by Ghostscript). We could check whether that license 
is compatible with ours.

BR
Maruan Sahyoun

On 03.03.2014 at 21:20, John Hewson wrote:

> Hi All
> 
> I wanted to bring PDFBOX-1959 to the attention of the mailing list. PDFBox is 
> ready to leave AWT font rendering behind as the JDK's rendering has proven to 
> be buggy and we now have our own renderers for all font types in 2.0.0.
> 
> Before we can do this we need to ship a set of standard 14 fonts with PDFBox 
> as currently the system fonts are being used via AWT. We also need to provide 
> a mechanism for the user to supply their own external fonts for cases where 
> embedded fonts are missing. 
> 
> The main question is, what fonts should we ship? Some of the "free" fonts 
> I've seen render very poorly, any suggestions? Furthermore, are there fonts 
> under more restrictive licenses which we could ship? Apache does allow for 
> such files to be part of a project under certain conditions.
> 
> Also: Adobe has some font packs, e.g. Japanese, which we could point users 
> towards.
> 
> Cheers
> 
> -- John

Re: Remove AWT Fonts

2014-03-04 Thread Maruan Sahyoun

Hi John,

what I was having in mind is something similar to Apache FOP’s auto detect 
feature for fonts.

doc: https://xmlgraphics.apache.org/fop/1.1/fonts.html
code: 
http://svn.apache.org/viewvc/xmlgraphics/fop/trunk/src/java/org/apache/fop/fonts/autodetect/

For inclusion, these are some additional candidates:

https://fedorahosted.org/liberation-fonts/ (SIL licensed 
http://scripts.sil.org/cms/scripts/page.php?item_id=OFL-FAQ_web&_sc=1#68092c0f)
http://dejavu-fonts.org/ (http://dejavu-fonts.org/wiki/License)
Croscore fonts https://fedoraproject.org/wiki/I18N/Liberation_vs_Croscore_fonts


I think that avoiding bundling a set of fonts, and instead using OS fonts and/or 
allowing people to use their own, will help us in the long run: if the quality 
of bundled fonts is not in line with the ones used by Adobe Reader, there will 
be additional questions/issues/bug reports we are not able to resolve.

BR

Maruan Sahyoun

On 04.03.2014 at 19:34, John Hewson wrote:

> Hi Maruan
> 
> Java provides access to platform fonts via AWT and does not reveal the paths 
> to the fonts
> which it finds, so it is not practical to use platform fonts without using 
> AWT. There have also
> been a number of problems with some unix platforms which lack some of the 
> standard 14
> fonts or which ship with poor quality substitutes. Ideally, PDFBox should 
> produce the same
> result irrespective of which platform it is running on, much like Adobe 
> Reader (excluding any
> missing embedded fonts, of course).
> 
> I’ve had poor experiences in the past with the Nimbus family of fonts from 
> URW++ but there
> are numerous factors (kerning, hinting, metrics, TTF vs Type 1) which may 
> have changed since
> then. We should check out how well these fonts compare with the standard 14 
> used by Adobe,
> in particular whether or not the metrics actually match (I know that it is 
> claimed that they do).
> 
> -- John
> 
> On 4 Mar 2014, at 05:48, Maruan Sahyoun  wrote:
> 
>> Hi John,
>> 
>> what about just using the platform fonts? If not then Latex uses the URW++ 
>> fonts which were made available under the http://www.latex-project.org/lppl 
>> license. (same fonts are used by Ghostscript). Could check if the license is 
>> fine with ours.
>> 
>> BR
>> Maruan Sahyoun
>> 
>> On 03.03.2014 at 21:20, John Hewson wrote:
>> 
>>> Hi All
>>> 
>>> I wanted to bring PDFBOX-1959 to the attention of the mailing list. PDFBox 
>>> is ready to leave AWT font rendering behind as the JDK's rendering has 
>>> proven to be buggy and we now have our own renderers for all font types in 
>>> 2.0.0.
>>> 
>>> Before we can do this we need to ship a set of standard 14 fonts with 
>>> PDFBox as currently the system fonts are being used via AWT. We also need 
>>> to provide a mechanism for the user to supply their own external fonts for 
>>> cases where embedded fonts are missing. 
>>> 
>>> The main question is, what fonts should we ship? Some of the "free" fonts 
>>> I've seen render very poorly, any suggestions? Furthermore, are there fonts 
>>> under more restrictive licenses which we could ship? Apache does allow for 
>>> such files to be part of a project under certain conditions.
>>> 
>>> Also: Adobe has some font packs, e.g. Japanese, which we could point users 
>>> towards.
>>> 
>>> Cheers
>>> 
>>> -- John
>> 
>
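
A font auto-detect mechanism of the kind mentioned above could start by scanning known directories for font files, without going through AWT. The sketch below is a simplified illustration. FontFileFinder is a hypothetical class and is not taken from FOP or PDFBox. Which directories to scan (for example /usr/share/fonts, /Library/Fonts or C:\Windows\Fonts) is an assumption left to the caller.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical font finder (not a PDFBox or FOP class).
final class FontFileFinder {
    // Recursively collects .ttf, .otf and .pfb files below the given directories.
    // Directories that do not exist are skipped.
    static List<Path> find(List<Path> directories) throws IOException {
        List<Path> fonts = new ArrayList<Path>();
        for (Path dir : directories) {
            if (Files.isDirectory(dir)) {
                collect(dir, fonts);
            }
        }
        return fonts;
    }

    private static void collect(Path dir, List<Path> fonts) throws IOException {
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                // Symbolic links to directories are not followed, to avoid cycles.
                if (Files.isDirectory(entry, LinkOption.NOFOLLOW_LINKS)) {
                    collect(entry, fonts);
                } else if (isFontFile(entry)) {
                    fonts.add(entry);
                }
            }
        }
    }

    // Matches on the file extension only; the file contents are not inspected.
    static boolean isFontFile(Path file) {
        String name = file.getFileName().toString().toLowerCase(Locale.ROOT);
        return name.endsWith(".ttf") || name.endsWith(".otf") || name.endsWith(".pfb");
    }
}
```

A real implementation would also need to open each file and read the font names, so that a font requested by a PDF can be matched to a file.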

Re: Remove AWT Fonts

2014-03-04 Thread Maruan Sahyoun

John,

I don’t understand why we have to ship fonts. We didn’t ship fonts until now 
but depended on platform fonts through AWT, so the situation won’t change. 

For legal reasons we won’t be able to use the fonts Adobe uses, and I doubt that 
there are open source fonts which provide the same results (rendering quality, 
number of glyphs, ...). So I think a mechanism to use platform fonts and to let 
users register new ones, similar to our current font aliases, is a better and 
more reliable option. 

BR
Maruan Sahyoun

On 04.03.2014 at 21:28, John Hewson wrote:

> Maruan
> 
>> what I was having in mind is something similar to Apache FOP’s auto detect 
>> feature for fonts.
> 
> Yeah, this looks good, we could use this for finding missing embedded fonts.
> 
>> For inclusion these are some additional candidates
>> 
>> https://fedorahosted.org/liberation-fonts/ (SIL licensed 
>> http://scripts.sil.org/cms/scripts/page.php?item_id=OFL-FAQ_web&_sc=1#68092c0f)
>> http://dejavu-fonts.org/ (http://dejavu-fonts.org/wiki/License)
>> Croscore fonts 
>> https://fedoraproject.org/wiki/I18N/Liberation_vs_Croscore_fonts
> 
> Great, I’ll take a look.
> 
>> I’d think if we can avoid bundling a set of fonts but use OS fonts and/or 
>> allow people to use their own will help us in the long run as if the quality 
>> is not inline with the ones used by Adobe Reader there will be additional 
>> questions/issues/bug reports we are not able to resolve.
> 
> We still need to ship a set of standard 14 fonts to solve the problems with 
> platforms which don’t
> have these fonts or have poor quality substitutes. The ideal solution is to 
> bundle our own high
> quality fonts and not depend on proprietary, platform-specific fonts. If we 
> can’t do this for some
> reason (e.g. quality), then we can reluctantly make use of platform fonts.
> 
> -- John
> 
> On 4 Mar 2014, at 11:45, Maruan Sahyoun  wrote:
> 
>> Hi John,
>> 
>> what I was having in mind is something similar to Apache FOP’s auto detect 
>> feature for fonts.
>> 
>> doc: https://xmlgraphics.apache.org/fop/1.1/fonts.html
>> code: 
>> http://svn.apache.org/viewvc/xmlgraphics/fop/trunk/src/java/org/apache/fop/fonts/autodetect/
>> 
>> For inclusion, these are some additional candidates:
>> 
>> https://fedorahosted.org/liberation-fonts/ (SIL licensed 
>> http://scripts.sil.org/cms/scripts/page.php?item_id=OFL-FAQ_web&_sc=1#68092c0f)
>> http://dejavu-fonts.org/ (http://dejavu-fonts.org/wiki/License)
>> Croscore fonts 
>> https://fedoraproject.org/wiki/I18N/Liberation_vs_Croscore_fonts
>> 
>> 
>> I’d think if we can avoid bundling a set of fonts but use OS fonts and/or 
>> allow people to use their own will help us in the long run as if the quality 
>> is not inline with the ones used by Adobe Reader there will be additional 
>> questions/issues/bug reports we are not able to resolve.
>> 
>> BR
>> 
>> Maruan Sahyoun
>> 
>> On 04.03.2014 at 19:34, John Hewson wrote:
>> 
>>> Hi Maruan
>>> 
>>> Java provides access to platform fonts via AWT and does not reveal the 
>>> paths to the fonts
>>> which it finds, so it is not practical to use platform fonts without using 
>>> AWT. There have also
>>> been a number of problems with some unix platforms which lack some of the 
>>> standard 14
>>> fonts or which ship with poor quality substitutes. Ideally, PDFBox should 
>>> produce the same
>>> result irrespective of which platform it is running on, much like Adobe 
>>> Reader (excluding any
>>> missing embedded fonts, of course).
>>> 
>>> I’ve had poor experiences in the past with the Nimbus family of fonts from 
>>> URW++ but there
>>> are numerous factors (kerning, hinting, metrics, TTF vs Type 1) which may 
>>> have changed since
>>> then. We should check out how well these fonts compare with the standard 14 
>>> used by Adobe,
>>> in particular whether or not the metrics actually match (I know that it is 
>>> claimed that they do).
>>> 
>>> -- John
>>> 
>>> On 4 Mar 2014, at 05:48, Maruan Sahyoun  wrote:
>>> 
>>>> Hi John,
>>>> 
>>>> what about just using the platform fonts? If not then Latex uses the URW++ 
>>>> fonts which were made available under the 
>>>> http://www.latex-project.org/lppl license. (same fonts are used by 
>>>> Ghostscript). Could check if the license is fine with ours.
>>>> 
>>>> BR
>>>> Maru

Re: IOException when merging PDF after increasing pushBackSize

2014-03-05 Thread Maruan Sahyoun

Hi James,

a) the file didn’t make it to the mailing list because of restrictions. Could 
you upload it to a public location?
b) try opening the document with PDDocument.loadNonSeq() in a simple test case 
- will it give errors?

BR
Maruan Sahyoun

On 05.03.2014 at 15:21, James Carter wrote:

> When attempting to merge the attached PDF with several other documents, PDFBox 
> throws the following exception: Could not push back 328764 bytes in order to 
> reparse stream. Try increasing push back buffer using system property 
> org.apache.pdfbox.baseParser.pushBackSize
> 
> The discussion on the JIRA ticket (PDFBOX-1920) mentioned that the PDF is not 
> well formed. Upon increasing the pushBackSize, the following error is seen:
> 
> Exception in thread "main" java.io.IOException: expected='endstream' 
> actual='' org.apache.pdfbox.io.PushBackInputStream@45cb0cdc
> at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:609)
> at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:605)
> at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:194)
> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1219)
> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1186)
> at 
> org.apache.pdfbox.util.PDFMergerUtility.mergeDocuments(PDFMergerUtility.java:196)
> at com.acme.MergePDF.runSmartService(MergePDF.java:52)
> at com.acme.MergePDF.main(MergePDF.java:68)
> 
> Is this reasonably something that PDFBox could handle, or does the ill formed 
> nature of the PDF leave this outside of what PDFBox would support?
> 
> Thanks,
> James
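
For reference, the system property named in the error message above can be set on the command line with -D or programmatically. The sketch below shows the programmatic variant. PushBackConfig is a hypothetical helper, and the assumption that the property must be set before the PDFBox parser classes are loaded is mine, not something confirmed in this thread.

```java
// Hypothetical helper for the push-back buffer setting (not a PDFBox class).
final class PushBackConfig {
    // Property name quoted in the exception text above.
    static final String PROPERTY = "org.apache.pdfbox.baseParser.pushBackSize";

    // Sets the property. Assumption: this has to happen before the PDFBox
    // parser classes read it, so call it early in main().
    static void setPushBackSize(int bytes) {
        if (bytes <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        System.setProperty(PROPERTY, Integer.toString(bytes));
    }

    // Reads the configured value back, or the fallback if unset or not a number.
    static int configuredSize(int fallback) {
        return Integer.getInteger(PROPERTY, fallback);
    }
}
```

The command-line equivalent is to pass the same property name with -D when starting the JVM. Increasing the buffer only helps when the stream length is merely larger than expected. For a malformed file, the non-sequential parser suggested above is the more promising test.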

PDFBox Documentation - Rendering

2014-03-10 Thread Maruan Sahyoun

Hi,

I’m currently enhancing the documentation for PDFBox with some more samples, 
code snippets etc. 

For the developer section, would it be possible for someone (maybe John or 
Tilman, as they are most familiar with the rendering code) to write up a small 
introductory article about how rendering works in PDFBox? Only a quick overview 
is needed.

BR
Maruan Sahyoun

[DISCUSS] PDFBox and support for PDF versions, PDF standards

2014-03-10 Thread Maruan Sahyoun

Hi,

as I’m currently looking at the parsing part of PDFBox one question came to my 
mind which is a more formal support for PDF versions and PDF standards such as 
PDF/A, PDF/UA …

As of today PDFBox has no formal support for specific PDF versions in a way 
that a specific version can be enforced, validated ... The PDFBox PDF/A 
validation does a good job for PDF/A 1b but it can not be easily extended to 
other standards.

Do you think that there is a need for more formal support of such standards 
and versions? This would influence some of the design decisions for the parser 
and affect the base objects.

BR
Maruan Sahyoun

Re: [DISCUSS] PDFBox and support for PDF versions, PDF standards

2014-03-10 Thread Maruan Sahyoun

Hi John,

it’s not only about PDF versions but about PDF versions and standards.

The base syntax has not changed. But the elements described by the base have.

BR
Maruan Sahyoun

On 10.03.2014 at 09:20, John Hewson wrote:

> Hi Maruan
> 
>> As of today PDFBox has no formal support for specific PDF versions in a way 
>> that a specific version can be enforced, validated ...
> 
> Perhaps that is because there is not much demand for this? Nowadays everyone 
> has instant access to the latest version of Adobe Reader so checking that a 
> PDF can be opened with a specific version of Adobe Reader is not that useful 
> anymore. There might be some niche cases, but I can’t think what they would 
> be. For cases where it’s important that a PDF file is valid then a format 
> such as PDF/A or PDF/X must be used instead as “vanilla" PDF is ambiguous.
> 
>> The PDFBox PDF/A validation does a good job for PDF/A 1b but it can not be 
>> easily extended to other standards.
> 
> Yes, PDF/A is carefully validated because it is for archival purposes, unlike 
> regular PDF files.
> 
>> Do you think that there is a need for a more formal support of such 
>> standards and versions? This would influence some of the design decisions for 
>> the parser and affect the base objects.
> 
> 
> I can’t think of a reason why someone would want to parse a specific PDF 
> version, so my answer is no, I don’t think there is such a need. Has the 
> syntax of PDF even changed that much over the different versions?
> 
> — John
>

Re: [DISCUSS] PDFBox and support for PDF versions, PDF standards

2014-03-10 Thread Maruan Sahyoun

I think we are talking about two different things here: the parsing process to 
get the tokens, and the parsing process to follow the PDF file layout and to 
form and follow the higher level structures such as Xref. Tokens didn’t change. 
File layout and higher level structures did, e.g. Linearization or Xref 
Streams. Depending on the PDF standard, some are permitted and some are not. 

BR
Maruan

On 10.03.2014 at 10:06, John Hewson wrote:

>> The base syntax has not changed. But the elements described by the base have.
> 
> 
> If the syntax hasn’t changed then there can’t be anything in the parser which 
> is version-specific.
> 
> -- John
> 
> On 10 Mar 2014, at 01:43, Maruan Sahyoun  wrote:
> 
>> Hi John,
>> 
>> it’s not about PDF versions but PDF versions and standards.
>> 
>> The base syntax has not changed. But the elements described by the base have.
>> 
>> BR
>> Maruan Sahyoun
>> 
>> On 10.03.2014 at 09:20, John Hewson wrote:
>> 
>>> Hi Maruan
>>> 
>>>> As of today PDFBox has no formal support for specific PDF versions in a 
>>>> way that a specific version can be enforced, validated ...
>>> 
>>> Perhaps that is because there is not much demand for this? Nowadays 
>>> everyone has instant access to the latest version of Adobe Reader so 
>>> checking that a PDF can be opened with a specific version of Adobe Reader 
>>> is not that useful anymore. There might be some niche cases, but I can’t 
>>> think what they would be. For cases where it’s important that a PDF file is 
>>> valid then a format such as PDF/A or PDF/X must be used instead as 
>>> “vanilla" PDF is ambiguous.
>>> 
>>>> The PDFBox PDF/A validation does a good job for PDF/A 1b but it can not be 
>>>> easily extended to other standards.
>>> 
>>> Yes, PDF/A is carefully validated because it is for archival purposes, 
>>> unlike regular PDF files.
>>> 
>>>> Do you think that there is a need for a more formal support of such 
>>>> standards and versions? This would influence some of the design decisions 
>>>> for the parser and affect the base objects.
>>> 
>>> 
>>> I can’t think of a reason why someone would want to parse a specific PDF 
>>> version, so my answer is no, I don’t think there is such a need. Has the 
>>> syntax of PDF even changed that much over the different versions?
>>> 
>>> — John
>>> 
>> 
>

Re: [DISCUSS] PDFBox and support for PDF versions, PDF standards

2014-03-10 Thread Maruan Sahyoun

OK - wasn’t precise enough - token types didn’t change but there are newer 
tokens introduced. 

As the syntax has changed do we need version and standards support in the 
parsing phase then? Other way would be to parse what’s in there and do 
validation etc. purely on the parsing result (COS model, PD model). Need to do 
that anyway.

What about writing?

BR
Maruan Sahyoun

On 10.03.2014 at 11:43, John Hewson wrote:

>>> If the syntax hasn’t changed then there can’t be anything in the parser 
>>> which is version-specific.
>> 
>> I think we are talking about two different things here. The parsing process 
>> to get the tokens and the parsing process to follow the PDF file layout and 
>> to form and follow the higher level structures such as Xref.
> 
> Yes, there are two phases, tokenizing and parsing; sometimes both are called 
> parsing.
> 
>> Tokens didn’t change. File layout and higher level structures did like - 
>> Linearization or Xref Streams. Dependent on the PDF standard some are 
>> permitted some are not. 
> 
> That’s not right. The tokens have changed: “xref” is a keyword and therefore 
> a token. Also, as I said originally, the syntax has changed, because what you 
> call "higher level structures” is actually the syntax.
> 
> -- John
> 
> On 10 Mar 2014, at 02:32, Maruan Sahyoun  wrote:
> 
>> I think we are talking about two different things here. The parsing process 
>> to get the tokens, and the parsing process to follow the PDF file layout and 
>> to form and follow the higher level structures such as Xref. Tokens didn’t 
>> change. File layout and higher level structures did like - Linearization or 
>> Xref Streams. Dependent on the PDF standard some are permitted some are not. 
>> 
>> BR
>> Maruan
>> 
>> On 10.03.2014 at 10:06, John Hewson wrote:
>> 
>>>> The base syntax has not changed. But the elements described by the base 
>>>> have.
>>> 
>>> 
>>> If the syntax hasn’t changed then there can’t be anything in the parser 
>>> which is version-specific.
>>> 
>>> -- John
>>> 
>>> On 10 Mar 2014, at 01:43, Maruan Sahyoun  wrote:
>>> 
>>>> Hi John,
>>>> 
>>>> it’s not about PDF versions but PDF versions and standards.
>>>> 
>>>> The base syntax has not changed. But the elements described by the base 
>>>> have.
>>>> 
>>>> BR
>>>> Maruan Sahyoun
>>>> 
>>>> On 10.03.2014 at 09:20, John Hewson wrote:
>>>> 
>>>>> Hi Maruan
>>>>> 
>>>>>> As of today PDFBox has no formal support for specific PDF versions in a 
>>>>>> way that a specific version can be enforced, validated ...
>>>>> 
>>>>> Perhaps that is because there is not much demand for this? Nowadays 
>>>>> everyone has instant access to the latest version of Adobe Reader so 
>>>>> checking that a PDF can be opened with a specific version of Adobe Reader 
>>>>> is not that useful anymore. There might be some niche cases, but I can’t 
>>>>> think what they would be. For cases where it’s important that a PDF file 
>>>>> is valid then a format such as PDF/A or PDF/X must be used instead as 
>>>>> “vanilla" PDF is ambiguous.
>>>>> 
>>>>>> The PDFBox PDF/A validation does a good job for PDF/A 1b but it can not 
>>>>>> be easily extended to other standards.
>>>>> 
>>>>> Yes, PDF/A is carefully validated because it is for archival purposes, 
>>>>> unlike regular PDF files.
>>>>> 
>>>>>> Do you think that there is a need for a more formal support of such 
>>>>>> standards and versions? This would influence some of the design decisions 
>>>>>> for the parser and affect the base objects.
>>>>> 
>>>>> 
>>>>> I can’t think of a reason why someone would want to parse a specific PDF 
>>>>> version, so my answer is no, I don’t think there is such a need. Has the 
>>>>> syntax of PDF even changed that much over the different versions?
>>>>> 
>>>>> — John
>>>>> 
>>>> 
>>> 
>> 
>
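
The distinction drawn above between tokenizing and parsing can be made concrete with a deliberately tiny tokenizer. MiniTokenizer is a hypothetical class for illustration only. It ignores strings, comments and stream data, so it is far from a usable PDF lexer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, heavily simplified tokenizer (not PDFBox code).
// It handles names, numbers, keywords, "<<", ">>", "[" and "]" only.
final class MiniTokenizer {
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<String>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (input.startsWith("<<", i) || input.startsWith(">>", i)) {
                tokens.add(input.substring(i, i + 2));
                i += 2;
            } else if (c == '[' || c == ']') {
                tokens.add(String.valueOf(c));
                i++;
            } else {
                // Name, number or keyword: runs until whitespace or a delimiter.
                int start = i++;
                while (i < input.length() && !isBoundary(input.charAt(i))) {
                    i++;
                }
                tokens.add(input.substring(start, i));
            }
        }
        return tokens;
    }

    private static boolean isBoundary(char c) {
        return Character.isWhitespace(c)
                || c == '<' || c == '>' || c == '[' || c == ']' || c == '/';
    }
}
```

The tokenizer only splits the input. Deciding that "trailer" starts a trailer dictionary, or that the number after "startxref" is a file offset, is the job of the second phase, which is where the file layout and structures such as cross-reference streams come in.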

Re: [DISCUSS] PDFBox and support for PDF versions, PDF standards

2014-03-11 Thread Maruan Sahyoun


> 
>> OK - wasn’t precise enough - token types didn’t change but there are newer 
>> tokens introduced. 
> 
> Yes.
> 
>> As the syntax has changed do we need version and standards support in the 
>> parsing phase then?
> 
> I don’t think so, no. I don’t know what the use-case would be. You’d have to 
> go back and read all seven versions of the PDF Reference and make sure that 
> the parser implements the correct handling for each version, that’s an awful 
> lot of work.

OK - so the parser should concentrate on parsing according to the spec (which 
is mostly the case with NonSequentialParser today). We should also have both a 
standards-compliant and a relaxed way of parsing files where the base syntax is 
not correct: standards-compliant parsing (which we don’t have in core but in 
the PDF/A project) needs to catch such errors, whereas relaxed parsing would 
ignore them if they can be corrected. 

> 
>> Other way would be to parse what’s in there and do validation etc. purely on 
>> the parsing result (COS model, PD model). Need to do that anyway.
> 
> Yes, I prefer this approach, you can always write a tool which inspects a 
> PDDocument and determines whether or not it uses features available in a 
> given PDF version. It seems better to do this as a separate feature than to 
> try and build it into the parser or the PD model directly.

Fine for me - would be something like a ‚profile' per standard which could be 
used for validation as well as writing.

To get that completed we need to revisit the PD model as not all features of 
PDF are reflected in the matching PD model. That could be done when 
implementing the profiles.

> 
>> What about writing?
> 
> Yes, we want versions for writing, because a user may want to generate e.g. a 
> PDF 1.6 file. This is going to be even more important in the near future 
> because the PDF 2.0 standard is supposed to be introduced in 2014.

There are some base features missing in writing a PDF today but I think Andreas 
has something in the works. The ‚profile‘ mentioned above could be used for 
writing too e.g. to check if PD model keys are permitted for a certain 
standard/version or not.

> 
> -- John

Re: [DISCUSS] PDFBox and support for PDF versions, PDF standards

2014-03-11 Thread Maruan Sahyoun


> Great. One more thing...
> 
>> To get that completed we need to revisit the PD model as not all features of 
>> PDF are reflected in the matching PD model. That could be done when 
>> implementing the profiles.
> 
> All the PD classes provide access to the underlying COS model, so there’s no 
> need to expose low-level details in the PD model.

Yes, I know. Working on the PD model would make the ‚profile‘ easier to build 
and understand, but thinking about it: since one can also work at the COS level, 
that is the level which needs to be checked. WDYT?

Maruan


> 
> -- John
> 
> On 11 Mar 2014, at 00:24, Maruan Sahyoun  wrote:
> 
>> 
>>> 
>>>> OK - wasn’t precise enough - token types didn’t change but there are newer 
>>>> tokens introduced. 
>>> 
>>> Yes.
>>> 
>>>> As the syntax has changed do we need version and standards support in the 
>>>> parsing phase then?
>>> 
>>> I don’t think so, no. I don’t know what the use-case would be. You’d have 
>>> to go back and read all seven versions of the PDF Reference and make sure 
>>> that the parser implements the correct handling for each version, that’s an 
>>> awful lot of work.
>> 
>> OK - so the parser should concentrate on getting the parsing done according 
>> to the spec (which is mostly the case with NonSequentialParser today) and we 
>> also have a way that there is some standards/relaxed way of parsing for 
>> files where the base syntax is not correct as we need to catch such 
>> circumstances for standards compliant parsing (which we don’t have in core 
>> but in the PDF/A project) but would ignore such errors if they can be 
>> corrected for relaxed parsing. 
>> 
>>> 
>>>> Other way would be to parse what’s in there and do validation etc. purely 
>>>> on the parsing result (COS model, PD model). Need to do that anyway.
>>> 
>>> Yes, I prefer this approach, you can always write a tool which inspects a 
>>> PDDocument and determines whether or not it uses features available in a 
>>> given PDF version. It seems better to do this as a separate feature than to 
>>> try and build it into the parser or the PD model directly.
>> 
>> Fine for me - would be something like a ‚profile' per standard which could 
>> be used for validation as well as writing.
>> 
>> To get that completed we need to revisit the PD model as not all features of 
>> PDF are reflected in the matching PD model. That could be done when 
>> implementing the profiles.
>> 
>>> 
>>>> What about writing?
>>> 
>>> Yes, we want versions for writing, because a user may want to generate e.g 
>>> a PDF 1.6 file. This is going to be even more important in the near future 
>>> because the PDF 2.0 standard is supposed to be introduced in 2014.
>> 
>> There are some base features missing in writing a PDF today but I think 
>> Andreas has something in the works. The ‚profile‘ mentioned above could be 
>> used for writing too e.g. to check if PD model keys are permitted for a 
>> certain standard/version or not.
>> 
>>> 
>>> -- John
>> 
>

Re: Need JBIG2 test image

2014-03-12 Thread Maruan Sahyoun

Hi Tilman,

I can make one up tomorrow if no one else is faster. Will be done from scratch 
with no real world data in it.

BR

Maruan Sahyoun

Am 12.03.2014 um 18:43 schrieb Tilman Hausherr :

> No, the file would of course be public.
> 
> I can still have a look at whether PDFBOX can now handle these files, 
> however I suspect that this would get you into trouble with the law even if I 
> promise you all you want.
> 
> PDFBOX does support JBIG2, you need the levigo plugin.
> 
> Tilman
> 
> Am 12.03.2014 18:33, schrieb Alin Mazilu:
>> I have scanned police accident reports that have people's names, addresses
>> and phone numbers in them. I had a problem printing these files with pdfbox
>> and I had to improvise by using a command prompt print utility as a
>> Process. I could maybe give you one if you agree not to release it to the
>> public.
>> 
>> Alin
>> 
>> 
>> On Wed, Mar 12, 2014 at 1:19 PM, Tilman Hausherr 
>> wrote:
>> 
>>> Hello all,
>>> 
>>> I'd need a PDF with JBIG2 encoding that can be distributed. So it should
>>> not have anything on it that is copyrighted, i.e. artwork or a real text.
>>> Just some random lines or a lorem ipsum text. The image should be black &
>>> white, i.e. not have other elements in it that have a color like a
>>> watermark. Some unserviced Xerox copiers might produce such images, or some
>>> software from Adobe, IRIS etc. If you have such a file, send it to me,
>>> tilman at snafu dot de, not to the list.
>>> 
>>> I want to use this PDF for a unit test that checks whether the PDF is
>>> decoded with the JBIG2 plugin. A fail would be an empty image. This way we
>>> check that the JBIG2 plugin is properly attached.
>>> 
>>> Tilman
>>> 
>>> 
>

Re: Problem With MergeUtility

2014-03-13 Thread Maruan Sahyoun

Hi,

not a direct answer to your question but could you try PDDocument.loadNonSeq 
instead?

BR
Maruan Sahyoun
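
In case the push back buffer has to be raised in the meantime, here is a minimal sketch of setting the system property named in the exception message quoted below. Only the property name comes from that message. The class name, the helper ensurePushBackSize and the value 131072 are made up for illustration, and the sketch assumes the property is set before PDFBox starts parsing.

```java
final class PushBackSizeExample {
    // Property name as printed in the exception message.
    static final String PUSH_BACK_PROPERTY = "org.apache.pdfbox.baseParser.pushBackSize";

    /**
     * Raises the property to at least minBytes, keeping a larger value that is
     * already set. Returns the value in effect afterwards.
     */
    static int ensurePushBackSize(int minBytes) {
        int current = Integer.getInteger(PUSH_BACK_PROPERTY, 0);
        int effective = Math.max(current, minBytes);
        System.setProperty(PUSH_BACK_PROPERTY, Integer.toString(effective));
        return effective;
    }

    public static void main(String[] args) {
        // 72940 bytes could not be pushed back; 131072 is an arbitrary value above that.
        System.out.println(ensurePushBackSize(131072));
    }
}
```

The same value can be passed on the command line with -Dorg.apache.pdfbox.baseParser.pushBackSize=131072.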

> Am 13.03.2014 um 16:16 schrieb Alin Mazilu :
> 
> Hello guys,
> 
> 
> Has anyone had any problem with this? Any idea why it happens? What would
> be a good value for pushBackSize so this does not happen? Thanks!
> 
> 
> Partial stack trace:
> 
> 
> org.apache.pdfbox.exceptions.WrappedIOException: Could not push back 72940
> bytes in order to reparse stream. Try increasing push back buffer using
> system property org.apache.pdfbox.baseParser.pushBackSize
>     at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:546)
>     at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:566)
>     at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:187)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1071)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1038)
>     at org.apache.pdfbox.util.PDFMergerUtility.mergeDocuments(PDFMergerUtility.java:186)

Re: Problem With MergeUtility

2014-03-13 Thread Maruan Sahyoun

this issue is logged at PDFBOX-1964 with a potential patch attached.


BR 
Maruan Sahyoun

Am 13.03.2014 um 17:52 schrieb Timo Boehme :

> Hi,
> 
> as far as I remember PDFMergerUtility is one of the last utilities not 
> supporting loadNonSeq currently.
> 
> As a workaround get the source of PDFMergerUtility, change PDDocument.load to 
> PDDocument.loadNonSeq (you may provide null as buffer parameter).
> 
> 
> Best,
> Timo
> 
> 
> Am 13.03.2014 16:46, schrieb Alin Mazilu:
>> Where? Here's the code that causes that:
>> 
>> PDFMergerUtility util = new PDFMergerUtility();
>> 
>> for (File file : set) {
>>     try {
>>         if (file.exists()) {
>>             util.addSource(file);
>>         }
>>     } catch (Exception e) {
>>         // log e
>>     }
>> }
>> util.setDestinationFileName(...);
>> 
>> util.mergeDocuments();
>> 
>> 
> 
> 
> -- 
> 
> Timo Boehme
> OntoChem GmbH
> H.-Damerow-Str. 4
> 06120 Halle/Saale
> T: +49 345 4780474
> F: +49 345 4780471
> timo.boe...@ontochem.com
> 
> _
> 
> OntoChem GmbH
> Geschäftsführer: Dr. Lutz Weber
> Sitz: Halle / Saale
> Registergericht: Stendal
> Registernummer: HRB 215461
> _
>

Re: Removing processStream and processSubStream

2014-03-19 Thread Maruan Sahyoun

Hi,

in general I think that this is a valid change. As I understand rendering in 
PDF, Form, Text, Image and Pattern each maintain their own matrix to map to 
user space, which is then transformed by the CTM to device space, so handling 
them specifically is fine and in line with the spec. I’d suggest that we make 
sure that the different ‚spaces‘ are defined properly within the code and refer 
to the PDF spec so that the code is easier to read, if this is not already the 
case. With so many changes it’s a good opportunity to enhance the documentation 
within the source code. Some of the old code has very little documentation.

I wouldn’t remove processStream and processSubStream but deprecate them and 
remove them in the next major release, so as to keep the changes to a 
minimum. There are a number of very important changes in 2.0. The easier we can 
get people to use that version without too many changes to their own code, the 
better.

For 2.0 removing the deprecated stuff of 1.x is fine. Removing non-deprecated 
stuff should be avoided if possible.

For the rendering what might have been missed is taking the UserUnit entry in 
the page dictionary into account which might change the default user space. 
This was introduced in PDF 1.6. A good opportunity to read that entry and make 
sure that we handle it appropriately.
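
To make the UserUnit point concrete: as far as I read the spec, the entry gives the size of one default user space unit in multiples of 1/72 inch, with 1.0 as the default. A small sketch of the arithmetic follows. The class and method names are made up and nothing here is PDFBox API.

```java
final class UserUnitExample {
    /**
     * Converts a length in default user space units to inches.
     * userUnit is the page's UserUnit value (1.0 when the entry is absent).
     */
    static double toInches(double userSpaceUnits, double userUnit) {
        if (userUnit <= 0) {
            throw new IllegalArgumentException("UserUnit must be positive");
        }
        return userSpaceUnits * userUnit / 72.0;
    }

    public static void main(String[] args) {
        // A page that is 612 units wide, first with the default UserUnit, then with 2.0.
        System.out.println(toInches(612, 1.0));
        System.out.println(toInches(612, 2.0));
    }
}
```

A renderer that ignores the entry would draw the second page at half its intended physical size.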

BR
Maruan Sahyoun

Am 18.03.2014 um 20:46 schrieb John Hewson :

> Hi All
> 
> I’m still working on getting Tiling Patterns to render correctly, and need to 
> make some changes to core PDFBox functionality in order to proceed. My problem 
> is that tiling patterns are defined in their parent stream’s initial coordinate 
> space, rather than the coordinate space defined by the CTM. However, in PDFBox 
> there is no way to access the parent stream, so I can’t find out what its 
> initial matrix is. The manner in which the initial coordinate space is 
> determined is different for pages, forms, and patterns.
> 
> What this means is that the parent stream’s initial coordinate space needs to 
> be passed to processStream and processSubStream in PDFStreamEngine. This will 
> necessarily be a breaking change, and it will affect all downstream subclasses 
> of PDFStreamEngine.
> 
> Because this has to be a breaking change, I propose that we go all the way and 
> make the new API bulletproof, 1) so that we won’t have to introduce breaking 
> changes in the future if we encounter similar issues, 2) so that the caller of 
> the method can’t pass the wrong data in the parameters. We would remove the two 
> generic methods:
> 
> public void processStream(PDResources resources, COSStream cosStream, 
> PDRectangle drawingSize, int rotation)
> public void processSubStream(PDResources resources, COSStream cosStream)
> 
> and replace them with four specific methods:
> 
> public void processPage(PDPage page)
> public void processForm(PDFormXObject form)
> public void processTilingPattern(PDTilingPattern pattern)
> public void processType3Font(PDType3Font font)
> 
> This would mean that the various “process” methods have access to their parent 
> stream, and can read any of its public fields in the future without 
> introducing breaking changes by altering the method’s parameters.
> 
> What do you think?
> 
> -- John
>

Re: Removing processStream and processSubStream

2014-03-19 Thread Maruan Sahyoun

John,

Am 19.03.2014 um 18:15 schrieb John Hewson :

> Maruan
> 
>> From how I understand the rendering in PDF Form, Text, Image and Pattern 
>> maintain their own matrix to map to user space which is then transformed by 
>> the CTM to device space so handling them specifically is fine and inline 
>> with the spec.
> 
> No, that’s not right, what I said was:
> 
>>> My problem is that tiling patterns are defined in their parent stream’s 
>>> initial coordinate space, rather than the
>>> coordinate space defined by the CTM.
> 
> So patterns should *not* be using the CTM, which is what I’m trying to 
> achieve.
> 

I think you misunderstood what I wrote - patterns have their own matrix - so I 
think we are on the same page here. IMHO according to the spec CTM transforms 
from user space to device space. So it’s pattern space -> user space -> device 
space.
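
To make the two readings in this thread concrete, here is a small sketch using java.awt.geom.AffineTransform. It compares concatenating a pattern matrix with the parent stream's initial matrix against concatenating it with the current CTM. All matrices are made-up values and patternToDevice is a hypothetical helper, not PDFBox code.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

final class PatternSpaceExample {
    /**
     * Returns base x patternMatrix: a point is mapped by the pattern matrix
     * first and by the base matrix afterwards.
     */
    static AffineTransform patternToDevice(AffineTransform base, AffineTransform patternMatrix) {
        AffineTransform result = new AffineTransform(base);
        result.concatenate(patternMatrix);
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical initial matrix of the parent content stream.
        AffineTransform initial = AffineTransform.getScaleInstance(2, 2);
        // The CTM after the content stream has applied a further translation.
        AffineTransform ctm = new AffineTransform(initial);
        ctm.translate(10, 0);
        // Hypothetical pattern matrix.
        AffineTransform pattern = AffineTransform.getTranslateInstance(5, 5);

        Point2D origin = new Point2D.Double(0, 0);
        System.out.println(patternToDevice(initial, pattern).transform(origin, null));
        System.out.println(patternToDevice(ctm, pattern).transform(origin, null));
    }
}
```

With these values the origin of pattern space lands at different device positions, which is why it matters which base matrix is used.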


>> I’d suggest that we make sure that the different ‚spaces‘ are defined 
>> properly within the code and refer to the PDF spec so that the code is 
>> easier to read if this is not already the case. With so many changes it’s a 
>> good opportunity to enhance the documentation within the source code. Some 
>> of the old code enjoys very little documentation.
> 
> 
> I disagree, in general I don’t think that references to the PDF spec are a 
> good form of documentation (there are some exceptions). References to the 
> spec are meaningless to the reader unless they take the time to look them up 
> in a 700 page PDF document. I would argue that by just linking back to the 
> spec, we have *failed* to document PDFBox, not succeeded.
> 
> References to the PDF spec have another major flaw: they go out-of-date. For 
> example a Pattern Colour Space will always be called “Pattern Colour Space” 
> in future versions of the PDF spec but it may not be described in paragraph 
> 8.6.6.2 or on page 156. The existing code contains many references to the PDF 
> 1.6 and 1.7 specs as well as the ISO PDF32000 spec, which means that I need 
> three 700 page PDF files open at all times in order to look up PDFBox 
> references. With the new version of the PDF spec due this year, this 
> situation is going to get worse.
> 

I didn’t mean to only reference the spec but to use the same terms as the 
spec does. Adding references to the spec is an add-on, not a replacement.

> I agree that some of the existing code needs more documentation, and I often 
> add documentation to old files which I’m working on. However, my approach is 
> to just paste in a sentence or two from the PDF spec (fair use). That way the 
> reader does not ever need to look at the PDF spec. Because we use the same 
> terminology in PDFBox as in the spec, if someone really wants to look 
> something up, it’s as simple as Ctrl+F, no reference needed, and it’s 
> guaranteed not to go out-of-date.
> 
>> I wouldn’t remove processStream and processSubStream but deprecate them and 
>> remove them in the next major release though as to keep the changes to a 
>> minimum.
> 
> This isn’t possible, as I said it "will necessarily be a breaking change”. 
> This is because in 2.0 PDFStreamEngine needs to know the parent of each 
> stream, but processStream and processSubStream do not provide this 
> information. That’s why I’m discussing this on the mailing list.

I don’t understand why this shouldn’t be possible. It’s more effort, agreed, 
but beneficial.

> 
>> For the rendering what might have been missed is taking the UserUnit entry 
>> in the page dictionary into account which might change the default user 
>> space. This was introduced in PDF 1.6. A good opportunity to read that entry 
>> and make sure that we handle it appropriately.
> 
> Yes, I have this as a “todo” in my working copy, however, if we put the 
> UserUnit in the matrix then we should also put the page Rotation into the 
> matrix, but that’s a significant change.
> 
> -- John

Re: Removing processStream and processSubStream

2014-03-19 Thread Maruan Sahyoun

as an added note - initially you suggested

public void processTilingPattern(PDTilingPattern pattern) 

but as Patterns in general have their own matrix I think it applies to all 
patterns, that’s why I wrote „… Form, Text, Image and Pattern maintain …“

BR
Maruan


Re: Removing processStream and processSubStream

2014-03-19 Thread Maruan Sahyoun

John

Am 19.03.2014 um 19:10 schrieb John Hewson :

> Maruan,
> 
>>>> From how I understand the rendering in PDF Form, Text, Image and Pattern 
>>>> maintain their own matrix to map to user space which is then transformed 
>>>> by the CTM to device space so handling them specifically is fine and 
>>>> inline with the spec.
>>> 
>>> No, that’s not right, what I said was:
>>> 
>>>>> My problem is that tiling patterns are defined in their parent stream’s 
>>>>> initial coordinate space, rather than the
>>>>> coordinate space defined by the CTM.
>>> 
>>> So patterns should *not* be using the CTM, which is what I’m trying to 
>>> achieve.
>>> 
>> 
>> I think you misunderstood what I wrote - patterns have their own matrix - so 
>> I think we are on the same page here. IMHO according to the spec CTM 
>> transforms from user space to device space. So it’s pattern space -> user 
>> space -> device space.
> 
> Nope, as I said, that’s what PDFBox currently does and it’s wrong. As you say 
> the CTM transforms from user space to device space, but it’s not the only way 
> to do so, and it is not used by patterns.

As the processing is defined in the spec this is a good reference so no need to 
discuss that further. Of course different people might come to different 
conclusions by reading and interpreting the spec. 

> 
>> Didn’t mean to only reference to the spec but to use the same terms as 
>> described by the spec. Adding references to the spec is an add-on not a 
>> replacement.
> 
> I don’t see what value this adds, given that the references will just go 
> out-of-date when the next spec is released. We already use the same 
> terminology as the PDF spec, so Ctrl+F can be used for quick look-ups that 
> won’t go out-of-date.

You are not required to add the information.

> 
>>> This isn’t possible, as I said it "will necessarily be a breaking change”. 
>>> This is because in 2.0 PDFStreamEngine needs to know the parent of each 
>>> stream, but processStream and processSubStream do not provide this 
>>> information. That’s why I’m discussing this on the mailing list.
>> 
>> I don’t understand why this is shouldn’t be possible. It’s more effort, 
>> agreed, but beneficial.
> 
> 
> What’s not to understand? PDFStreamEngine *needs* to know the parent of each 
> stream, and the old methods don’t provide this, passing a null parent will 
> not work because we need that information later in order to correctly process 
> the stream. If we allowed a null parent to be passed, the result would be 
> silently broken rendering - there’s no value in providing a 
> backwards-compatible API if it can only produce broken results.

We won’t reach the same conclusion here (nor, I think, on the other topics 
above).

> 
> -- John
> 

Re: Apache PDFBox April 2014 board report due

2014-04-01 Thread Maruan Sahyoun

Hi Andreas,

+1 with the additions from John and Tilman

BR
Maruan

Am 30.03.2014 um 16:29 schrieb Andreas Lehmkuehler :

> Hi,
> 
> find attached a quick draft of the board report we're expected to submit this
> month.
> 
> @John, @Tilman
> Please add something about the GSoC status.
> 
> 
> Any further comments, objections or additions?
> 
> 
> 
> 
> The Apache PDFBox library is an open source Java tool for working with PDF
> documents.
> 
> 
> General Comments
> 
> 
> There are no issues that require Board attention.
> 
> 
> Community
> -
> 
> There is a steady stream of contributions and bug reports from the community.
> 
> John Hewson and Tilman Hausherr were added as committers and PMC members to 
> our ranks in February 2014.
> 
> Eric Leleu stepped back and went emeritus per his own request in March 2014.
> 
> 452 (429 last report) subscribers on the user@ list
> 157 (164 last report) subscribers on the dev@ list
> 
> Releases
> 
> 
> Version 1.8.4 was released on 31st of January 2014
> 
> 1.8.4 is an incremental bugfix release based on PDFBox 1.8.x.
> 
> GSoC
> 
> 
> TODO
> 
> Development:
> 
> 
> Most likely the next bugfix version 1.8.5 will be released in the second 
> quarter.
> 
> The work on our next major release is an ongoing effort. The main topics are:
> 
> - switch to java 1.6
> - modularization
> - replace/enhance the parser
> - refactor the underlying COS model
> - code cleanup
> - enhance rendering
> 
> 
> 
> BR
> Andreas Lehmkühler

Re: Apache PDFBox April 2014 board report due

2014-04-02 Thread Maruan Sahyoun

Hi,

to unsubscribe please follow the information at 
http://pdfbox.apache.org/mailinglists.html

BR
Maruan Sahyoun

Am 02.04.2014 um 10:02 schrieb Somnath Jadhav :

> Hello ,
> 
> How can I unsubscribe from these alerts?
> 
> I no longer need them and I can't see any option to
> unsubscribe. Please help.
> 
> Regards,
> Somnath Jadhav,
> +91-9270153230
> www.somnathjadhav.com
> 
> 
> On 2 April 2014 12:58, Timo Boehme  wrote:
> 
>> +1 with the GSoC additions.
>> 
>> 
>> Best,
>> Timo
>> 
>> 
>> 

xmpbox vs. jempbox - which is the one moving forward

2014-04-09 Thread Maruan Sahyoun

Hi,

did we decide whether xmpbox or jempbox is the one to use for XMP metadata 
moving forward? There is a discussion in PDFBOX-1187 about cutting the 
dependency on jempbox, and preflight uses xmpbox.

BR
Maruan

Re: possible memory leak PDFBox 2.0.0

2014-04-10 Thread Maruan Sahyoun

Hi Joseph,

the attachments didn’t make it to the mailing list. Could you upload them to a 
public location? Is the behavior reproducible with any PDF or only with some? 
Could you upload a sample PDF too?

BR
Maruan Sahyoun

Am 10.04.2014 um 13:50 schrieb Joseph Siddal :

> Hi,
> 
> I've found a memory leak that is caused when doing high volumes of printing.
> 
> The code that reproduces the bug is attached. The code just continuously 
> sends the same printjob to the default printer. The pdf I'm using is 
> available here. The memory leak is evident after 6mins of running the code. 
> The sun.print.CustomMediaTray has 2 static ArrayList fields which are 
> continuously growing in size going from size 29000 to 1+ after 6 minutes 
> and continuing to climb.
> 
> This is using OSX Mavericks, JDK 1.8.0.
> 
> Any help would be appreciated.
> 
> Regards
> Joseph

Re: svn commit: r950554 - in /websites/staging/pdfbox/trunk/content: ./ 1.8/ 1.8/cookbook/ 2.0/ FontAwesome/ docs/1.8.2/ errors/

2015-05-08 Thread Maruan Sahyoun

webfont for icons - I'm removing the stuff that's not needed

BR
Maruan


> Am 08.05.2015 um 17:38 schrieb Tilman Hausherr :
> 
> Am 08.05.2015 um 10:28 schrieb build...@apache.org:
>> Modified: websites/staging/pdfbox/trunk/content/FontAwesome/README.html
> 
> What is this
> 
> https://pdfbox.apache.org/FontAwesome/
> https://pdfbox.apache.org/FontAwesome/docs/
> 
> 
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
> For additional commands, e-mail: dev-h...@pdfbox.apache.org
> 


-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org

Re: svn commit: r950554 - in /websites/staging/pdfbox/trunk/content: ./ 1.8/ 1.8/cookbook/ 2.0/ FontAwesome/ docs/1.8.2/ errors/

2015-05-08 Thread Maruan Sahyoun

Hi,

> Am 08.05.2015 um 17:38 schrieb Tilman Hausherr :
> 
> Am 08.05.2015 um 10:28 schrieb build...@apache.org:
>> Modified: websites/staging/pdfbox/trunk/content/FontAwesome/README.html
> 
> What is this
> 
> https://pdfbox.apache.org/FontAwesome/
> https://pdfbox.apache.org/FontAwesome/docs/
> 

I removed the unneeded files and disabled directory listing in 950602

BR
Maruan

> 
> 

PDRectangle and Matrix Transformations

2015-05-13 Thread Maruan Sahyoun

Hi,

in order to support the AP generation for rotated form fields I need to rotate 
and translate PDRectangles. Unfortunately the current transformation returns a 
GeneralPath from which the transformed PDRectangle needs to be created to be 
set ….

I'd rather add some methods to Matrix to transform and return PDRectangle, 
likely without an AWT dependency for these methods. The other option would be 
to add that to the interactive.form package.

WDYT?

BR
Maruan

Re: PDRectangle and Matrix Transformations

2015-05-13 Thread Maruan Sahyoun

Hi,

> Am 13.05.2015 um 22:42 schrieb John Hewson :
> 
> 
>> On 13 May 2015, at 11:10, Maruan Sahyoun  wrote:
>> 
>> Hi,
>> 
>> in order to support the AP generation for rotated form fields I need to 
>> rotate and translate PDRectangles. Unfortunately the current transformation 
>> returns a GeneralPath from which the transformed PDRectangle needs to be 
>> created to be set ….
>> 
>> I'd rather add some methods to Matrix to transform and return PDRectangle, 
>> likely without an AWT dependency for these methods. The other option would 
>> be to add that to the interactive.form package.
>> 
>> WDYT?
> 
> It's not meaningful to rotate a PDRectangle, because the end result is no 
> longer a rectangle, 

fields are only rotated in 90 degree steps.
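
To illustrate why 90 degree steps keep the result a rectangle, here is a rough sketch without any AWT dependency. Rect is a made-up stand-in for PDRectangle (lower-left corner plus size) and rotate is a hypothetical helper, not an existing Matrix method. It rotates counter-clockwise about the origin.

```java
final class RectRotationExample {
    /** Hypothetical stand-in for PDRectangle: lower-left corner plus size. */
    static final class Rect {
        final float x, y, width, height;

        Rect(float x, float y, float width, float height) {
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }
    }

    /** Rotates r counter-clockwise about the origin by a multiple of 90 degrees. */
    static Rect rotate(Rect r, int degrees) {
        switch (Math.floorMod(degrees, 360)) {
            case 0:
                return r;
            case 90: // (x, y) -> (-y, x)
                return new Rect(-(r.y + r.height), r.x, r.height, r.width);
            case 180: // (x, y) -> (-x, -y)
                return new Rect(-(r.x + r.width), -(r.y + r.height), r.width, r.height);
            case 270: // (x, y) -> (y, -x)
                return new Rect(r.y, -(r.x + r.width), r.height, r.width);
            default:
                throw new IllegalArgumentException("only multiples of 90 degrees are supported");
        }
    }

    public static void main(String[] args) {
        Rect rotated = rotate(new Rect(10f, 20f, 100f, 50f), 90);
        System.out.println(rotated.x + " " + rotated.y + " " + rotated.width + " " + rotated.height);
    }
}
```

A translation afterwards only shifts x and y, so the result can be stored as a rectangle again.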

BR
Maruan 

> 
> 
>> BR
>> Maruan

Re: Dear. Developers

2015-05-19 Thread Maruan Sahyoun

Hi,

yes, there were some changes regarding such files (incremental save). Please 
use the latest version [1] and load your documents using 
PDDocument.loadNonSeq() [2]. You can use null for the scratchFile parameter.
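
Each incremental save appends a new body, cross-reference section and "%%EOF" marker to the file, so more than one marker hints at such a file. A rough JDK-only check, not part of PDFBox (the class and method names are made up for illustration, and the byte sequence could in theory also occur inside stream data):

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not a PDFBox class: counts "%%EOF" markers in raw PDF bytes. */
final class RevisionCounter {
    static int countEofMarkers(byte[] pdfBytes) {
        byte[] marker = "%%EOF".getBytes(StandardCharsets.US_ASCII);
        int count = 0;
        for (int i = 0; i + marker.length <= pdfBytes.length; i++) {
            boolean match = true;
            for (int j = 0; j < marker.length; j++) {
                if (pdfBytes[i + j] != marker[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                count++;
                i += marker.length - 1; // continue searching after this marker
            }
        }
        return count;
    }
}
```

The bytes can be read with java.nio.file.Files.readAllBytes(); a count above one suggests the file was saved incrementally.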

BR
Maruan

[1] http://pdfbox.apache.org/download.cgi
[2] http://pdfbox.apache.org/docs/1.8.9/javadocs/org/apache/pdfbox/pdmodel/PDDocument.html#loadNonSeq(java.io.File,%20org.apache.pdfbox.io.RandomAccess)

> On 20.05.2015 at 06:18, 정승연 wrote:
> 
> Dear Developers,
> My name is Park, Joon Ho. I am a programmer and I work for Scourt in R.O.K.
> I use PDFBox version 0.8.0.
> I get the number of pages of a PDF file from the "pdfdoc.getNumberOfPages();" function.
> But I get a wrong page count for some special PDF files.
> They contain many "%%EOF" and "startxref" entries (more than two of them).
> For example, the actual number of pages is 20, but it returns only 12 
> pages.
> So how can I get the correct number of pages?
> If you have any idea for this case, can you please tell me what to do?
> I am using JDK 1.4 and PDFBox 0.8.0.
> Would upgrading the PDFBox version solve this problem?
> Finally, which PDFBox version supports JDK 1.6?
> I am looking forward to your reply. Thank you in advance.

Re: Trunk JavaDoc is stale

2015-06-25 Thread Maruan Sahyoun

I'm currently updating the JavaDoc - it should be available shortly (as soon as 
the CMS build server has completed).

The process is 

- generate the javadoc
- remove the old one either via svn or through the CMS web ui
- upload a tar archive of the new one using the CMS web ui

While I was writing this the CMS build failed - so I'm looking into that.

BR
Maruan



> On 25.06.2015 at 20:35, John Hewson wrote:
> 
> What’s the process for updating the trunk JavaDoc? It’s currently weeks (if 
> not months) old...
> 
> — John

Re: Trunk JavaDoc is stale

2015-06-25 Thread Maruan Sahyoun

Hi,

> On 25.06.2015 at 21:04, Maruan Sahyoun wrote:
> 
> I'm currently updating the JavaDoc - should be available shortly (as soon as 
> the CMS build server completed)
> 
> The process is 
> 
> - generate the javadoc
> - remove the old one either via svn or through the CMS web ui
> - upload a tar archive of the new one using the CMS web ui
> 
> While writing the CMS build failed - so looking into that.

pdfbox.staging.apache.org <http://pdfbox.staging.apache.org/> is updated. 
pdfbox.apache.org <http://pdfbox.apache.org/> should follow as soon as the 
publishing process has finished.

BR
Maruan


Re: Trunk JavaDoc is stale

2015-06-25 Thread Maruan Sahyoun

Hi,

> On 25.06.2015 at 21:19, Maruan Sahyoun wrote:
> 
> pdfbox.staging.apache.org <http://pdfbox.staging.apache.org/> is updated. 
> pdfbox.apache.org <http://pdfbox.apache.org/> should be as soon as the 
> publishing process finished.

unfortunately the changes didn't make it from staging to production - needs 
investigation. Normally hitting publish in the CMS web UI should do the job.

BR
Maruan


Re: Trunk JavaDoc is stale

2015-06-25 Thread Maruan Sahyoun


> On 26.06.2015 at 08:24, Maruan Sahyoun wrote:
> 
> unfortunately the changes didn't make it from staging to production - needs 
> investigation. Normally hitting publish in the CMS web UI should do the job.

works now.
BR
Maruan


Re: Apache PDFBox July 2015 board report due

2015-07-01 Thread Maruan Sahyoun

Hi Andreas,

fine for me - nothing special I'm currently looking into for 1.8.10.

BR
Maruan

> On 01.07.2015 at 20:03, Andreas Lehmkuehler wrote:
> 
> Hi,
> 
> find attached a quick draft of the board report we're expected to submit this
> month. It's based upon the report template which can be found at [1]
> 
> @Tilman
> Please provide a status about GSoC 2015
> 
> 
> Any further comments, objections or additions?
> 
> 
> 
> 
> Report from the Apache PDFBox project [Andreas Lehmkühler]
> 
> ## Description:
>   The Apache PDFBox library is an open source Java tool for working with
>   PDF documents.
> 
> ## Activity:
> 
> - the work on our next major release 2.0.0 is an ongoing effort
> - our plan to cut a first release candidate in April didn't come true
> - we are down to round about 25 tickets for 2.0.0
> 
> ## Issues:
> 
> - there are no issues requiring board attention at this time
> 
> ## PMC/Committership changes:
> 
> - Currently 16 committers and 16 PMC members in the project.
> - No new PMC members added in the last 3 months
> - Last PMC addition was John Hewson at Tue Feb 11 2014
> - No new committers added in the last 3 months
> - Last committer addition was John Hewson at Fri Feb 07 2014
> 
> ## Releases:
> 
> - Last release was 1.8.9 on Sat Mar 28 2015
> 
> ## Mailing list activity:
> 
> - us...@pdfbox.apache.org:
>    - 496 subscribers (up 15 in the last 3 months):
>    - 579 emails sent to list (572 in previous quarter)
> 
> - dev@pdfbox.apache.org:
>    - 148 subscribers (down -2 in the last 3 months):
>    - 2609 emails sent to list (3650 in previous quarter)
> 
> 
> ## JIRA activity:
> 
> - 110 JIRA tickets created in the last 3 months
> - 97 JIRA tickets closed/resolved in the last 3 months
> 
> 
> 
> BR
> Andreas Lehmkühler
> 
> [1] https://reporter.apache.org/?pdfbox

Re: PDFBox 2.0.0 release

2015-07-06 Thread Maruan Sahyoun

Hi Andreas,

I'd also go for a RC to be able to settle down on some additional 
changes/feedback before releasing.

BR
Maruan


> On 06.07.2015 at 11:55, Andreas Lehmkühler wrote:
> 
> Hi,
> 
> 
> I'd like to do a 2.0.0 release sooner rather than later and I guess I'm not
> the only one ;-)
> 
> We are down to 24 issues marked with "Fix Version 2.0.0".
> 
> @Assignees: please have a look at "your" issues and verify whether we really
> should wait for them to be resolved first or whether they could be moved to a
> later release (2.1.0 or 3.0.0).
> 
> Starting with a release candidate would be another option, but I'd prefer to
> release 2.0.0.
> 
> WDYT?
> 
> BR
> Andreas Lehmkühler

Re: PDFBox 2.0.0 release

2015-07-06 Thread Maruan Sahyoun

Hi,
> On 06.07.2015 at 19:08, Tilman Hausherr wrote:
> 
> Yes it would be great that the 2.0 version be released. Before the opening of 
> the new Berlin airport.
> 
> IMO only the following issues are important for 2.0:
> - PDFBOX-2301 - RandomAccessBuffer consumes too much memory - isn't that one 
> done?
> - PDFBOX-2370 - Move caching outside of PDResources - I assume John has a 
> concept in his head, but hasn't implemented it
> - PDFBOX-2423 - Page tree handling needs rewriting - only the page tree 
> issues are important (if any remain), the transparency problems can be done 
> later.
> - PDFBOX-2400 - Add insertPage() method - this is related to PDFBOX-2423

This could move to 2.1 as it is not API-breaking.

> - PDFBOX-2705 - Add IKVM support to Maven build - when this was created, I 
> thought this would be done quickly, but then nothing happened :-(
> - PDFBOX-2340 - documentation - I suggest a wiki for the migration issues.

I'd prefer the regular documentation (CMS currently) so as to avoid having yet 
another location for documentation.

BR
Maruan

> 
> A release candidate is a good idea, hopefully the people who use 2.0 already 
> without updating after every new commit can test their own applications.
> 
> Tilman

Re: PDFBox 2.0.0 release

2015-07-06 Thread Maruan Sahyoun

Hi,

> On 06.07.2015 at 20:31, Ruhong Cai wrote:
> 
> Hi,
> 
> The following code could prove that there is a bug in getting the number of 
> the pages 
> 
> PDDocument pdf1 = PDDocument.load("C:\\ms12_TIM.pdf");

Could you upload the file to a public location so that we can take a look?

BR
Maruan


> 
> int count = pdf1.getNumberOfPages();
> 
> 
> count returns “0”, but the file has one page.
> 
> Thanks!
> 
> 
> Ruhong

Re: 2.0.0. RC was Re: PDFBox 2.0.0 release

2015-07-07 Thread Maruan Sahyoun

Hi,

> On 07.07.2015 at 09:34, Andreas Lehmkühler wrote:
> 
> Hi,
> 
> As there seems to be a majority supporting a release candidate I'd like to
> find out what exactly a possible RC would be so that we are all on the same
> page:
> 
> - is it feature complete? IMHO, yes

Yes

> - is the api stable? IMHO, yes

A RC should give us the ability to do minor tweaks - if we can't then there is 
no need for a RC

> - do we create a branch or just release from a tag? IMHO, we should branch,
> especially if the api is meant to be stable
> - we won't push the RC to maven central but would provide a possibility to
> download the RC. This is a common approach in other apache projects
> 
> 
> How long do we wait until releasing the final 2.0? We might define some
> rule/goal for that.

Depending on the issues that come up because of the RC (as opposed to new 
issues), a month or two should be fine.

> 
> What exactly will be the difference between the RC and the final release?
> (There are not that many open tickets left, so I presume it won't be that big.)

changes to the API because of feedback. If we declare the API to be final we 
don't need a RC IMHO but could follow up with a bug fix release shortly.

> 
> I'm in favour of a final release without an RC. Our release process is quite
> lean so that it wouldn't hurt too much to release a 2.0.x bugfix release.
> 
> BR
> Andreas

Re: calling protect() after setAllSecurityToBeRemoved(true)

2015-07-11 Thread Maruan Sahyoun

Hi

> On 11.07.2015 at 19:44, Tilman Hausherr wrote:
> 
> Yesterday user Roberto had a problem where a file wasn't saved with 
> encryption. The cause turned out to be that he had called
> 
> setAllSecurityToBeRemoved(true)
> 
> and then
> 
> protect(...)
> 
> I didn't find it by looking at his code, only after debugging in save().
> 
> Although the javadoc of both calls is clear, I see a risk that this happens 
> again, e.g. when people combine existing code.
> 
> What should we do? Options:
> 
> 1. nothing
> 2. mention the risk in javadoc
> 3. if allSecurityToBeRemoved is true in protect(), call LOG.warn("call 
> setAllSecurityToBeRemoved(false) before saving or file won't be encrypted");
> 4. if allSecurityToBeRemoved is true in protect(), throw an 
> IllegalStateException
> 5. set allSecurityToBeRemoved to false when protect() is called
> 
> I'm for options 3 or 4.

I'd go for option 5 together with a warning, as the call to protect() shows the 
intention, and document that in the javadocs.
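
Roughly what I have in mind, as a simplified sketch: DocumentModel is a made-up stand-in for the document class, the policy parameter is just an Object here, and willBeEncryptedOnSave() is a hypothetical helper for illustration, so this is not the actual PDDocument code:

```java
import java.util.logging.Logger;

/** Hypothetical, simplified model of the two calls discussed; not PDFBox's PDDocument. */
final class DocumentModel {
    private static final Logger LOG = Logger.getLogger(DocumentModel.class.getName());

    private boolean allSecurityToBeRemoved;
    private Object protectionPolicy; // placeholder for a real protection policy object

    void setAllSecurityToBeRemoved(boolean remove) {
        this.allSecurityToBeRemoved = remove;
    }

    boolean isAllSecurityToBeRemoved() {
        return allSecurityToBeRemoved;
    }

    /** Option 5 plus a warning: asking for protection overrides an earlier "remove security". */
    void protect(Object policy) {
        if (allSecurityToBeRemoved) {
            LOG.warning("protect() was called after setAllSecurityToBeRemoved(true); "
                    + "resetting the flag so that the document gets encrypted on save");
            allSecurityToBeRemoved = false;
        }
        this.protectionPolicy = policy;
    }

    /** Illustration only: whether a save would encrypt under this simplified model. */
    boolean willBeEncryptedOnSave() {
        return protectionPolicy != null && !allSecurityToBeRemoved;
    }
}
```

The last call wins in this model, so calling setAllSecurityToBeRemoved(true) after protect() would still remove the encryption.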

BR
Maruan


> 
> Tilman

[DISCUSS] Inconsistencies between annotation and form package

2015-07-12 Thread Maruan Sahyoun

Hi,

there are some inconsistencies between the annotation package and the form 
package in how static final fields are handled. The classes in the annotation 
package have these public. The classes in the form package have these private 
or package private. I'd propose to make them public in the form package. 

Furthermore there are some minor differences between method names, such as 
setReadOnly() in annotation and setReadonly() in form, which I'd like to make 
consistent prior to 2.0.0.

WDYT?

BR
Maruan

Re: [DISCUSS] Inconsistencies between annotation and form package

2015-07-12 Thread Maruan Sahyoun


> On 13.07.2015 at 03:18, John Hewson wrote:
> 
> 
>> On 12 Jul 2015, at 15:17, Maruan Sahyoun  wrote:
>> 
>> Hi,
>> 
>> there are some inconsistencies between the annotation package and the form 
>> package how static final fields are handled. The classes in the annotation 
>> package have these public. The classes in the form package have these 
>> private or package private. I'd propose to make them public in the form 
>> package. 
> 
> I thought the goal for 2.0 was to move from using integer constants to enums?

Good point. Why not change the visibility first and if time permits move to 
enums? ATM the AP issues for fields are more important. 
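
For reference, the enum variant could look roughly like the sketch below. FieldFlag and its methods are hypothetical names, not existing PDFBox API; the bit positions are the ones the PDF specification defines for the flags common to all fields:

```java
import java.util.EnumSet;

/** Hypothetical enum for the common field flags; not an existing PDFBox type. */
enum FieldFlag {
    READ_ONLY(1),       // bit position 1
    REQUIRED(1 << 1),   // bit position 2
    NO_EXPORT(1 << 2);  // bit position 3

    private final int mask;

    FieldFlag(int mask) {
        this.mask = mask;
    }

    int mask() {
        return mask;
    }

    /** Combines a set of flags into the integer stored in the field dictionary. */
    static int toBits(EnumSet<FieldFlag> flags) {
        int bits = 0;
        for (FieldFlag flag : flags) {
            bits |= flag.mask;
        }
        return bits;
    }

    /** Decodes the stored integer back into a set of flags. */
    static EnumSet<FieldFlag> fromBits(int bits) {
        EnumSet<FieldFlag> flags = EnumSet.noneOf(FieldFlag.class);
        for (FieldFlag flag : values()) {
            if ((bits & flag.mask) != 0) {
                flags.add(flag);
            }
        }
        return flags;
    }
}
```

Making the int constants public first would not prevent adding something like this later, but the constants would then have to stay for compatibility.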

> 
> — John
> 
>> Furthermore there are some minor differences between method names such as 
>> setReadOnly() in annotation and setReadonly() in form which I'd like to make 
>> consistent prior to 2.0.0
> 
> Yes, that definitely wants to be setReadOnly().

Will change it.

BR
Maruan


Re: [DISCUSS] Inconsistencies between annotation and form package

2015-07-13 Thread Maruan Sahyoun


> On 13.07.2015 at 08:55, John Hewson wrote:
> 
> One reason not to change would be that once it’s public there’s no going back 
> if we don’t move to enums before 2.0.

yes, otoh it would leave us with the inconsistency between annotation and form 
which is also not good

Maruan


Re: [DISCUSS] Inconsistencies between annotation and form package

2015-07-13 Thread Maruan Sahyoun


> On 13.07.2015 at 09:42, John Hewson wrote:
> 
> True, I suppose the question then is, should these fields be public at all? 
> Given that we have PD accessors for the individual flags which they set, I 
> think the answer is actually “no”. These are COS-level values which PD should 
> hide.

The accessors are fine and are the primary API that should be used. If there is 
a setter/getter missing we should add that. 

If - for whatever reason - a user decides to use the COS level objects it's 
beneficial to have these values so one could use dictionary.setFlag. So the 
(only) use case is to allow using the COS level and have the values at hand 
with a readable name instead of numeric or string values.
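
A small sketch of that use case, with FlagDictionary as a simplified, hypothetical stand-in for a COS level dictionary (string keys instead of names, constant names chosen for illustration):

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a COS level dictionary holding integer flag entries. */
final class FlagDictionary {
    // Named constants instead of bare numbers; the values follow the PDF field flag bits.
    static final int FLAG_READ_ONLY = 1;
    static final int FLAG_REQUIRED = 1 << 1;

    private final Map<String, Integer> entries = new HashMap<>();

    int getInt(String key) {
        Integer value = entries.get(key);
        return value == null ? 0 : value;
    }

    /** Sets or clears a single bit without touching the other flags. */
    void setFlag(String key, int flag, boolean value) {
        int current = getInt(key);
        entries.put(key, value ? (current | flag) : (current & ~flag));
    }

    boolean getFlag(String key, int flag) {
        return (getInt(key) & flag) == flag;
    }
}
```

So dictionary.setFlag("Ff", FlagDictionary.FLAG_READ_ONLY, true) reads better than passing a bare 1, which is all the public constants are meant to provide.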

Maruan




Re: [VOTE] Release Apache PDFBox 1.8.10

2015-07-19 Thread Maruan Sahyoun

+1
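
For anyone checking the candidate before voting: the SHA1 checksum quoted below can be compared against the downloaded archive with a few lines of plain JDK code. A minimal sketch, where Sha1Check and its method names are made up for illustration:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper for comparing a SHA-1 checksum; not part of PDFBox. */
final class Sha1Check {
    /** Returns the SHA-1 digest of the data as a lower-case hex string. */
    static String sha1Hex(byte[] data) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a standard algorithm name every Java platform has to support
            throw new IllegalStateException(e);
        }
    }

    /** Compares the computed digest with the published one, ignoring case and whitespace. */
    static boolean matches(byte[] data, String expectedHex) {
        return sha1Hex(data).equalsIgnoreCase(expectedHex.trim());
    }
}
```

The archive bytes can be read with java.nio.file.Files.readAllBytes() and passed to matches() together with the checksum from the vote mail.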

BR Maruan
> On 18.07.2015 at 18:16, Andreas Lehmkuehler wrote:
> 
> Hi,
> 
> a candidate for the PDFBox 1.8.10 release is available at:
> 
>    https://dist.apache.org/repos/dist/dev/pdfbox/1.8.10/
> 
> The release candidate is a zip archive of the sources in:
> 
>    http://svn.apache.org/repos/asf/pdfbox/tags/1.8.10/
> 
> The SHA1 checksum of the archive is 0413f40458a33720e693ba02017ac6a514e856de.
> 
> Please vote on releasing this package as Apache PDFBox 1.8.10.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 PDFBox PMC votes are cast.
> 
>    [ ] +1 Release this package as Apache PDFBox 1.8.10
>    [ ] -1 Do not release this package because...
> 
> Here is my +1
> 
> BR
> Andreas Lehmkühler

Re: PDFReader vs PDFDebugger

2015-07-26 Thread Maruan Sahyoun

Hi,

> On 26.07.2015 at 22:00, Andreas Lehmkuehler wrote:
> 
> Hi,
> 
> I like all those new features which were added to PDFDebugger lately, as far 
> as I've already found them ;-)
> 
> I'm thinking about removing PDFReader as PDFDebugger is now able to render 
> single pages as well and it is doing it even better.
> 

I'm not using PDFReader but the use case is different to the PDFDebugger. 
Shouldn't we be able to share the rendering between both and keep it?

BR
Maruan

> 
> WDYT?
> 
> BR
> Andreas

Re: PDFReader vs PDFDebugger

2015-07-26 Thread Maruan Sahyoun


> On 27.07.2015 at 07:59, John Hewson wrote:
> 
> There isn't really any rendering code in either of these projects, it's 
> simply a difference in how the image is presented via Swing components. These 
> aren't compatible across the two projects due to differing GUI approaches 
> being taken (e.g. PDFReader is a total mess).

i.e. we'd need to rewrite the PDFReader?

BR
Maruan


> 
> PDFDebugger's viewer is likely to gain further features specific to 
> debugging, so one size does not fit all.

Re: PDF Technical Conference 2015

2015-08-15 Thread Maruan Sahyoun

Hi,

> On 16.08.2015 at 05:23, Leonard Rosenthol wrote:
> 
> Great!  Look forward to seeing you there.
> 
> (note: we already did the European one this year, and I believe Maruan was 
> there again).

Yes, as I was the year before. And it has been a very successful conference 
for us as we have been able to get in touch with a lot of knowledgeable people 
sharing our thoughts about PDF - the benefits and the challenges. Many vendors 
have also been very helpful answering some technical questions we had which 
enabled us to solve some issues in PDFBox. I think that this shows the health 
of the industry that people - although being competitors - work together making 
PDF and the PDF software offerings better over time.

I hope you find the time to have a quick chat with John. He is very involved 
with rendering and fonts (and 'loves' all this pattern space, matrix, shading 
…. stuff).

And thank you for being so helpful in the past answering some of the questions 
we had.

BR
Maruan 

> 
> Leonard
> 
> 
> 
> 
> On 8/15/15, 1:19 AM, "John Hewson"  wrote:
> 
>> Hi All,
>> 
>> For those who are interested, I’ll be attending the PDF Technical Conference 
>> 2015 in San Jose. It’ll be my first time at such an event. Maybe I’ll see 
>> some of you fellow dev list subscribers there (though probably not too many 
>> from Europe).
>> 
>> http://www.pdfa.org/event/pdf-technical-conference-2015/
>> 
>> — John
>> 

Re: ClassCastException during rendering pdf

2015-09-21 Thread Maruan Sahyoun

Hi Manfred,

how was that document created? Has it been processed afterwards?

The reason for the issue - if I'm not mistaken - is that the document contains a 
FileAttachment annotation, but it is contained in the /Contents of the page, 
where there should be either a single stream or an array of streams with the 
page's graphical content. A file attachment annotation should be in the /Annots 
entry of the page (which is empty in your case).
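
To make the failure mode concrete: a stream is a dictionary with data attached, so a plain dictionary found where a stream is expected cannot be cast. A defensive check could look like the sketch below; Dict and Stream are simplified stand-ins (not the COS classes) and describe() is a hypothetical helper:

```java
import java.util.List;

/** Hypothetical check of what a page's /Contents entry holds; not PDFBox code. */
final class ContentsCheck {
    /** Stand-in for a plain dictionary, e.g. an annotation dictionary. */
    static class Dict { }

    /** Stand-in for a stream, which is a dictionary plus data. */
    static final class Stream extends Dict { }

    static String describe(Object contents) {
        if (contents == null) {
            return "no content";
        }
        if (contents instanceof Stream) {
            return "single stream";
        }
        if (contents instanceof List) {
            for (Object entry : (List<?>) contents) {
                if (!(entry instanceof Stream)) {
                    return "invalid: array entry is not a stream";
                }
            }
            return "array of streams";
        }
        // e.g. an annotation dictionary that ended up in /Contents
        return "invalid: " + contents.getClass().getSimpleName() + " is not a stream";
    }
}
```

Checking the type before casting would allow such a page to be reported or skipped instead of failing with a ClassCastException.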

BR
Maruan

> On 21.09.2015 at 10:50, Manfred Pock wrote:
> 
> 
> Sorry, it affects just one PDF. I have removed the one that works; the 
> exception occurs with
> 
> http://cloud.directupload.net/2pYC
> 
> thanks, Manfred
> 
>  Forwarded Message 
> Subject:  ClassCastException during rendering pdf
> Date:     Mon, 21 Sep 2015 10:44:48 +0200
> From:     Manfred Pock 
> To:       dev@pdfbox.apache.org
> 
> 
> 
> Hi,
> 
> I get a ClassCastException in the current 2.0 trunk version, Java 1.8,
> Windows 7, while rendering two PDFs. Is there any solution?
> 
> The Stacktrace is:
> 
> Exception in thread "AWT-EventQueue-0" java.lang.ClassCastException:
> org.apache.pdfbox.cos.COSDictionary cannot be cast to
> org.apache.pdfbox.cos.COSStream
>     at org.apache.pdfbox.pdmodel.PDPage.getContents(PDPage.java:157)
>     at org.apache.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:92)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:450)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:437)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:148)
>     at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:179)
>     at org.apache.pdfbox.rendering.PDFRenderer.renderPage(PDFRenderer.java:208)
>     at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:139)
>     at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:94)
> 
> and you can download the PDFs at
> 
> http://cloud.directupload.net/2pYB
> http://cloud.directupload.net/2pYC
> 
> Best regards, Manfred

Re: Apache PDFBox October 2015 board report due

2015-10-05 Thread Maruan Sahyoun

Hi,

+ 1 

One thing we might want to address is the large number of emails sent to dev 
because of commit notifications etc.

Maruan

> On 05.10.2015 at 19:47, Andreas Lehmkuehler wrote:
> 
> Hi,
> 
> find attached a quick draft of the board report we're expected to submit this
> month. It's based upon the report template which can be found at [1]
> 
> 
> Any further comments, objections or additions?
> 
> 
> 
> 
> Report from the Apache PDFBox committee [Andreas Lehmkühler]
> 
> ## Description:
>   The Apache PDFBox library is an open source Java tool for working with
>   PDF documents.
> 
> ## Activity:
> - after a long time of hard work we decided to cut a release candidate for
>   2.0.0 this October. As we are down to 6 open tickets I'm quite optimistic
>   that it'll really come true
> - we joined forces with Tim Allison from Apache TIKA to run some bulk tests
>   from time to time to avoid regressions
> 
> ## Health report:
> - there is a steady stream of contributions, bug reports and questions on
>   the mailing lists
> - the core team consists of 4 - 5 active developers
> - we expect to attract more people once our new major release is out of the
>   door
> 
> ## Issues:
> - there are no issues requiring board attention at this time
> 
> ## PMC changes:
> 
> - Currently 16 PMC members.
> - No new PMC members added in the last 3 months
> - Last PMC addition was John Hewson at Thu Feb 06 2014
> 
> ## LDAP changes:
> 
> - Currently 16 committers and 16 committee group members.
> - No new committee group members added in the last 3 months
> - No new committers added in the last 3 months
> - Last committer addition was John Hewson at Fri Feb 07 2014
> 
> ## Releases:
> 
> - 1.8.10 was released on Wed Jul 22 2015
> 
> ## Mailing list activity:
> 
> - us...@pdfbox.apache.org:
>    - 497 subscribers (up 6 in the last 3 months):
>    - 519 emails sent to list (578 in previous quarter)
> 
> - dev@pdfbox.apache.org:
>    - 145 subscribers (down -4 in the last 3 months):
>    - 2932 emails sent to list (2594 in previous quarter)
> 
> 
> ## JIRA activity:
> 
> - 151 JIRA tickets created in the last 3 months
> - 143 JIRA tickets closed/resolved in the last 3 months
> 
> 
> 
> 
> BR
> Andreas Lehmkühler
> 
> [1] https://reporter.apache.org/?pdfbox
> 
> 
> 

Re: Apache PDFBox October 2015 board report due

2015-10-05 Thread Maruan Sahyoun

Hi,
> On 06.10.2015 at 08:07, Andreas Lehmkuehler wrote:
> 
> On 05.10.2015 at 23:02, Maruan Sahyoun wrote:
>> Hi,
>> 
>> + 1
>> 
>> One thing we might want to address is the large number of emails to dev
>> caused by commit messages and the like.
> Hmmm, I'm not sure that I've got your point. Do you want to explain the high 
> number of mails on dev@ compared to users@?

Yes. With the commit messages removed from the dev count, the traffic on users
is higher (which is good).

Maruan


Re: PDFBox 2.0.0 release

2015-10-09 Thread Maruan Sahyoun

Hi,

> On 09.10.2015 at 12:51, Andreas Lehmkühler wrote:
> 
> Hi,
> 
> in a quick discussion on private@pdfbox we agreed to cut a new release next
> week on October 15th. There are some questions which should be answered before:
> 
> - Are we still in the same boat? Are there any concerns that would make us
> postpone the release?

No concerns

> - RC or "real" release?
> 

I'd do an RC: given the vast number of changes, some tweaks to the API may
still be necessary, and we couldn't make those after a "real" release.


> If we do a RC the following is done/expected
> 
> - we expect the API to be stable, but there may be some changes anyway if necessary

Yes, with some minor changes possible depending on user feedback

> - I won't create a 2.0.0 branch

+1

> - I'll deploy the RC to maven central and we'll provide the RC for download
> through our website

+1

> - I won't close any jira tickets
> 

+1

> Did I forget anything else?
> 

Maybe we should announce a date when we would like to go from RC to final,
say the 15th of November?

BR
Maruan

> 
> 
> BR
> Andreas
> 

[DISCUSS] Create XFA in other package

2015-10-12 Thread Maruan Sahyoun

Hi,

I'd like to create XFA handling in a different package to be able to extend it 
further (which may or may not happen) without cluttering the interactive.forms 
package.

Reasons from my perspective
- there were a number of questions around XFA lately
- we do not currently support filling out XFA forms (hybrid or dynamic) in a
way that XFA-aware applications (Acrobat Reader and such) can work with
- doing it properly needs a lot of additional functionality with a
lot of XML handling
- to abstract the XML there might be a number of new XFA related classes such 
as XFATemplate, XFADatasets, XFAField, XFASubform … (names are only for 
illustration)
- people wanting to deal with XFA forms are encouraged to use the new package.
- there is no COS or PD comparable to classic PDF objects in XFA
- PDF 2.0 deprecates XFA
- the low level PDXFAResource will be kept

My preference would be to create something like o.a.p.services.XFA. Initially
there will not be more than some base classes with the same low-level handling
as currently done.
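
To make the idea of "abstracting the XML" a bit more concrete, here is a
minimal sketch of what a thin wrapper could look like. It is purely
illustrative: the class name XfaDatasetsSketch, its methods and the simplified
XML are assumptions for this example, not an existing or planned PDFBox API,
and the sample data is not a complete XFA datasets packet.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical wrapper that hides DOM handling behind field-level accessors.
final class XfaDatasetsSketch {
    private final Document xml;

    XfaDatasetsSketch(String xmlText) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        xml = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xmlText.getBytes(StandardCharsets.UTF_8)));
    }

    // Returns the text of the first element with the given name, or null if absent.
    String getValue(String fieldName) {
        NodeList nodes = xml.getElementsByTagName(fieldName);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
    }

    // Overwrites the text of the first matching element; returns false if absent.
    boolean setValue(String fieldName, String value) {
        NodeList nodes = xml.getElementsByTagName(fieldName);
        if (nodes.getLength() == 0) {
            return false;
        }
        nodes.item(0).setTextContent(value);
        return true;
    }

    public static void main(String[] args) throws Exception {
        // Simplified stand-in data for illustration only.
        String data = "<data><form1><Name>Alice</Name></form1></data>";
        XfaDatasetsSketch datasets = new XfaDatasetsSketch(data);
        datasets.setValue("Name", "Bob");
        System.out.println(datasets.getValue("Name"));
    }
}
```

The point is only that callers would work with field names and values while
the XML handling stays inside the new package.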

WDYT?

BR Maruan

Re: PDFBox 2.0.0 release

2015-10-12 Thread Maruan Sahyoun

Hi,

> On 12.10.2015 at 12:32, Andreas Lehmkühler wrote:
> 
> Hi,
> 
>> On 9 October 2015 at 13:01, Maruan Sahyoun wrote:
>> 
>> 
>> Hi,
>> 
> 
> SNIP
> 
>>> Did I forget anything else?
>>> 
>> 
>> Maybe we should announce a date when we would like to go from RC to final -
>> 15th of November?
> I'm ok with that, but I can't confirm the supposed date as I won't be
> available as release manager. How about the 18th or 19th of November?

fine for me - the date will depend on the feedback anyway.

BR Maruan

> 
>> BR
>> Maruan
>> 
> 
> BR
> Andreas



PDFBox 1.2.1 Javadoc

2015-10-12 Thread Maruan Sahyoun

Hi,

we still have the old 1.2.1 javadoc available - can we delete it?

BR Maruan


Re: PDFBox 2.0.0 release

2015-10-14 Thread Maruan Sahyoun

Hi,

> On 14.10.2015 at 19:09, Andreas Lehmkuehler wrote:
> 
> Hi,
> 
> 
> On 09.10.2015 at 12:51, Andreas Lehmkühler wrote:
>> Hi,
>> 
>> in a quick discussion on private@pdfbox we agreed to cut a new release next
>> week on October 15th. There are some questions which should be answered before:
> Just a friendly reminder. I'm going to cut the release in round about 24 
> hours from now.

I've updated the javadoc to match the latest changes.

BR
Maruan

> 
> BR
> Andreas
> 

Documentation and Subprojects

2015-10-15 Thread Maruan Sahyoun

Moving forward with the documentation I thought about including FontBox, XMPBox 
and Preflight on the website with at least a link to their javadoc. 

WDYT?

BR
Maruan

Re: PDFBox 2.0.0 release

2015-10-15 Thread Maruan Sahyoun

I've added an initial migration guide which can be reviewed at 
http://pdfbox.staging.apache.org/2.0/migration.html 


Changes etc at https://issues.apache.org/jira/browse/PDFBOX-3030 


BR
Maruan

> On 15.10.2015 at 20:12, Andreas Lehmkuehler wrote:
> 
> Hi,
> 
> On 15.10.2015 at 19:29, Tilman Hausherr wrote:
>> On 15.10.2015 at 19:27, Tilman Hausherr wrote:
>>> On 14.10.2015 at 19:09, Andreas Lehmkuehler wrote:
> 
>>>> Just a friendly reminder. I'm going to cut the release in round about 24
>>>> hours from now.
>>> 
>>> I've aggressively set most 2.0 issues to resolved or removed them from 2.0,
>>> where possible. We're down to 3 left:
>>> - PDFBOX-2893 - this is just about the names now.
>>> - PDFBOX-2930 - hasn't been done
>>> - PDFBOX-2943 - hasn't been done. However, this is unimportant; it is about
>>> what value should be returned for a type 3 font space where the width and
>>> the stream are missing.
>>> 
>>> Tilman
>>> 
>> 
>> I meant to say I'm done for today, i.e. as far as I'm concerned, you can 
>> start
>> immediately.
> Thanks for the heads up, I'm starting asap.
> 
> BR
> Andreas
> 
>> 
>> Tilman
>> 

Re: [VOTE] Release Apache PDFBox 2.0.0-RC1

2015-10-16 Thread Maruan Sahyoun

Hi,

thanks for preparing the RC1

+1

BR
Maruan

> On 15.10.2015 at 22:17, Andreas Lehmkuehler wrote:
> 
> Hi,
> 
> a candidate for the PDFBox 2.0.0-RC1 release is available at:
> 
>https://dist.apache.org/repos/dist/dev/pdfbox/2.0.0-RC1/
> 
> The release candidate is a zip archive of the sources in:
> 
>http://svn.apache.org/repos/asf/pdfbox/tags/2.0.0-RC1/
> 
> The SHA1 checksum of the archive is d9b3f098e849c2c710107fb3be8c29c48dd2eb30.
> 
> Please vote on releasing this package as Apache PDFBox 2.0.0-RC1.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 PDFBox PMC votes are cast.
> 
>[ ] +1 Release this package as Apache PDFBox 2.0.0-RC1
>[ ] -1 Do not release this package because...
> 
> Here is my +1
> 
> BR
> Andreas Lehmkühler
> 
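
For anyone who wants to check the SHA1 checksum from the vote mail above
against the downloaded archive programmatically, here is a minimal sketch using
only the JDK's MessageDigest. The class name Sha1Check and its method names are
made up for this example; the archive path and the expected checksum are passed
in as arguments.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Sha1Check {
    // Computes the SHA-1 digest of a file and returns it as lowercase hex.
    static String sha1Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-1");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Compares the computed digest with the published one, ignoring case.
    static boolean matches(Path file, String expectedHex)
            throws IOException, NoSuchAlgorithmException {
        return sha1Hex(file).equalsIgnoreCase(expectedHex.trim());
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to the downloaded archive, args[1]: checksum from the vote mail
        if (args.length != 2) {
            System.err.println("usage: Sha1Check <archive> <expected-sha1>");
            return;
        }
        boolean ok = matches(Paths.get(args[0]), args[1]);
        System.out.println(ok ? "checksum OK" : "checksum MISMATCH");
    }
}
```

This only checks integrity of the download; it says nothing about the release
signatures.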

Re: ttc collections as map

2015-10-18 Thread Maruan Sahyoun

Hi,

> On 18.10.2015 at 15:33, Tilman Hausherr wrote:
> 
> Should we offer some sort of map of name -> ttf-font? Currently a user needs 
> to go through a list of fonts
> 
> trueTypeCollection.getFonts()
> 
> until hitting the correct one. Shouldn't we have something like
> 
> trueTypeCollection.getFontByNameMap()
> 
> so a user who knows the fonts on the system could have code like
> 
> PDType0Font.load(document,
>     new TrueTypeCollection(new File("c:/windows/fonts/MSGothic.ttc")).getFontByName().get("MS-Gothic"),
>     true);
> 

good idea - maybe a little shorter such as 

PDType0Font.load(document,
    new TrueTypeCollection(new File("c:/windows/fonts/MSGothic.ttc")).getFontByName("MS-Gothic"),
    true);
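
To illustrate the difference between walking the list and a direct lookup, here
is a small self-contained sketch. FontCollectionSketch and its nested Font
class are hypothetical stand-ins, not the FontBox TrueTypeCollection; only the
method names getFonts() and getFontByName() are taken from the discussion
above, and the font names are just sample data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FontCollectionSketch {
    // Stand-in for a font inside a collection; only the name matters here.
    static final class Font {
        final String name;

        Font(String name) {
            this.name = name;
        }
    }

    private final List<Font> fonts;
    private final Map<String, Font> byName = new LinkedHashMap<>();

    FontCollectionSketch(List<Font> fonts) {
        this.fonts = new ArrayList<>(fonts);
        // Build the name index once; the first font wins if a name repeats.
        for (Font font : this.fonts) {
            byName.putIfAbsent(font.name, font);
        }
    }

    // The existing style of access: the caller iterates until the name matches.
    List<Font> getFonts() {
        return Collections.unmodifiableList(fonts);
    }

    // The shorter variant: direct lookup, null if the name is unknown.
    Font getFontByName(String name) {
        return byName.get(name);
    }

    public static void main(String[] args) {
        FontCollectionSketch collection = new FontCollectionSketch(Arrays.asList(
                new Font("MS-Gothic"), new Font("MS-PGothic")));
        Font font = collection.getFontByName("MS-PGothic");
        System.out.println(font == null ? "not found" : font.name);
    }
}
```

Returning the font directly keeps the call site short; exposing the map would
additionally let callers list the available names.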

BR
Maruan


News/Blog on PDFBox website

2015-10-19 Thread Maruan Sahyoun

Hi,

I would like to start a news/blog section on the PDFBox website so we can give
regular updates more quickly. As that is not so easily done using the Apache
CMS, I'd like to discuss moving to a local build of the website, publishing to
the Apache CMS the same way we do now for the Javadoc with the maven
scm-publish plugin (or using svnpubsub/gitpubsub at a later stage).

As a base for the local build I'd propose to use jekyll [http://jekyllrb.com],
a static site generator.

WDYT?

Maruan



Re: News/Blog on PDFBox website

2015-10-20 Thread Maruan Sahyoun

Hi,

> On 20.10.2015 at 11:45, Andreas Lehmkühler wrote:
> 
> Hi,
> 
>> On 19 October 2015 at 10:52, Maruan Sahyoun wrote:
>> 
>> 
>> Hi,
>> 
>> I would like to start a news/blog section on the PDFBox website so we can 
>> give
>> regular updates more quickly. As that is not so easily done using the Apache
>> CMS I'd like to discuss moving to a local build of the website publishing to
>> the Apache CMS the same way we are doing it now for the Javadoc with the 
>> maven
>> scm-publish plugin (or using svnpubsub/gitpubsub at a later stage).
>> 
>> As a base for the local build I'd propose to use jekyll [http://jekyllrb.com]
>> a static site generator.
>> 
>> WDYT?
> I'm ok with that as long as 
> - the new environment is available for all common platforms (OSX, Windows and
> Linux)

that's the case with jekyll, but I'm open to other suggestions with similar
capabilities. The reasons I proposed jekyll:
- it's very active
- several Apache projects such as Drill, Wicket, JClouds are already using it
- it supports markdown, so the current content can remain (with some minor
changes to the file header)
- supports prebuilt content which remains as is
- it was one of the suggestions in an earlier discussion about deprecating the 
Apache CMS

> - it's not too complicated to install a local build environment

http://jekyllrb.com/docs/installation/
http://jekyllrb.com/docs/windows/

Shall I wait for more feedback or move forward converting to jekyll?

BR
Maruan

> 
>> Maruan
> 
> BR 
> Andreas
> 

Re: Error in migration documentation for 2.0 RC1

2015-10-20 Thread Maruan Sahyoun

Hi Daniel,

> On 20.10.2015 at 13:15, Daniel Wilson wrote:
> 
> Gentlemen,
> 
> I've found a mistake in the documentation that simply does not match the
> object model.
> 
> http://pdfbox.staging.apache.org/2.0/migration.html#pdf-rendering
> 
> First, the current version, then a version that I believe corrects that
> "renderImageWithDPI" line.
> 
> PDDocument document = PDDocument.load(new File(pdfFilename));
> PDFRenderer pdfRenderer = new PDFRenderer(document);
> int pageCounter = 0;
> for (PDPage page : document.getPages())
> {
>     pdfRenderer.renderImageWithDPI(page, 300, ImageType.RGB);
> 
>     // suffix in filename will be used as the file format
>     ImageIOUtil.writeImage(bim, pdfFilename + "-" + (pageCounter++) + ".png", 300);
> }
> document.close();
> 
> PDDocument document = PDDocument.load(new File(pdfFilename));
> PDFRenderer pdfRenderer = new PDFRenderer(document);
> int pageCounter = 0;
> for (PDPage page : document.getPages())
> {
>     BufferedImage bim = pdfRenderer.renderImageWithDPI(pageCounter, 300, ImageType.RGB);
> 
>     // suffix in filename will be used as the file format
>     ImageIOUtil.writeImage(bim, pdfFilename + "-" + (pageCounter++) + ".png", 300);
> }
> document.close();

done - thanks for the hint

Maruan



Re: News/Blog on PDFBox website

2015-10-20 Thread Maruan Sahyoun

Hi,

> On 20.10.2015 at 16:20, John Hewson wrote:
> 
> 
> 
>> On 19 Oct 2015, at 01:52, Maruan Sahyoun  wrote:
>> 
>> Hi,
>> 
>> I would like to start a news/blog section on the PDFBox website so we can 
>> give regular updates more quickly. As that is not so easily done using the 
>> Apache CMS I'd like to discuss moving to a local build of the website 
>> publishing to the Apache CMS the same way we are doing it now for the 
>> Javadoc with the maven scm-publish plugin (or using svnpubsub/gitpubsub at a
>> later stage).
> 
> A news page would be good. Maybe we could have two or three “headlines” on 
> the homepage which link to the latest items too.

The idea I have is:

- a 'breaking news' section just below the current intro ("Apache
PDFBox - A Java Library …") showing a news entry which is tagged that way
- a News section showing the latest item prominently, plus earlier ones (up to
3 or 5) with little text
- a News archive

> 
>> 
>> As a base for the local build I'd propose to use jekyll 
>> [http://jekyllrb.com] a static site generator.
>> 
>> WDYT?
> 
> I’ve used jekyll, no complaints.

me too.

I'm currently thinking about the best transition. Maybe we could use the
pdfbox-docs repo as a new base for the content, generate from master, push to
the asf-site branch and also push to the current Apache CMS based repo. If that
works, we ask infra to switch to gitpubsub and the pdfbox-docs repo, as the
asf-site branch will then be served automatically.

That would allow us to do the transition in parallel without affecting the
current site. The only drawback is that the content exists twice for a little
while, but as the markdown files can be used for both that's a copy&paste.
Not ideal, I know.

Open for other suggestions of course.

WDYT?  

> 
> — John
> 
>> Maruan
>> 
>> 

Re: News/Blog on PDFBox website

2015-10-20 Thread Maruan Sahyoun

Hi,

> On 20.10.2015 at 16:55, John Hewson wrote:
> 
> 
>> On 20 Oct 2015, at 07:31, Maruan Sahyoun  wrote:
>> 
>> Hi,
>> 
>>> On 20.10.2015 at 16:20, John Hewson wrote:
>>> 
>>> 
>>> 
>>>> On 19 Oct 2015, at 01:52, Maruan Sahyoun  wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> I would like to start a news/blog section on the PDFBox website so we can 
>>>> give regular updates more quickly. As that is not so easily done using the 
>>>> Apache CMS I'd like to discuss moving to a local build of the website 
>>>> publishing to the Apache CMS the same way we are doing it now for the 
>>>> Javadoc with the maven scm-publish plugin (or using svnpubsub/gitpubsub at
>>>> a later stage).
>>> 
>>> A news page would be good. Maybe we could have two or three “headlines” on 
>>> the homepage which link to the latest items too.
>> 
>> idea I have is
>> 
>> - 'breaking news' section which will be just below the current intro 
>> ("Apache PDFBox - A Java Library …") showing a news entry which is tagged 
>> that way
>> - News section with the latest prominent and earlier ones (up to 3 or 5) 
>> with little text
>> - News archive
> 
> Sounds good, maybe keep those homepage headlines + text short though - think 
> of it as DRY writing!
> 
>>> 
>>>> 
>>>> As a base for the local build I'd propose to use jekyll 
>>>> [http://jekyllrb.com] a static site generator.
>>>> 
>>>> WDYT?
>>> 
>>> I’ve used jekyll, no complaints.
>> 
>> me too.
>> 
>> I'm currently thinking about the best transition. Maybe we could use the 
>> pdfbox-docs repo as a new base for the content, generate from master, push 
>> to asf-site branch and also push to the current Apache CMS based repo. If 
>> that works ask infra to switch to gitpubsub and the pdfbox-docs repo as then 
>> the asf-site branch will be served automatically.
>> 
>> Would allow us to do the transition in parallel wo affecting the current 
>> site. Only drawback is that the content is there twice for a little while 
>> but as the markdown files can be used for both that's a copy&paste. Not 
>> ideal I know.
>> 
>> Open for other suggestions of course.
>> 
>> WDYT?  
> 
> Based on the experience of last time we tried changing the website toolchain 
> I’d advocate changing either the source control, the static generator, or 
> starting over with new docs from pdfbox-docs but not all three at the same 
> time. I wouldn’t even do two at the same time. If we start by just moving the 
> existing content to jekyll then we could branch the existing CMS site on SVN 
> and port it to Jekyll in that separate branch. If we do that quickly then the 
> content won’t get out-of-date.

Good idea - so we create a 'jekyll-site' (or whatever name we choose) branch on
SVN, generate from there and push to the current CMS production tree. Once this
is live we move to pdfbox-docs and start using gitpubsub. Correct?

BR
Maruan

> 
> — John
> 
>> 
>>> 
>>> — John
>>> 
>>>> Maruan
>>>> 
>>>> 

Re: comparison of 1.8.10 and 2.0 trunk

2015-10-25 Thread Maruan Sahyoun

Hi Tim,

I've created https://issues.apache.org/jira/browse/PDFBOX-3058
to track our part of fixing issues as part of the test (and later ones to come)
and added you and Tilman as watchers.

BR
Maruan


> On 23.10.2015 at 21:36, Allison, Timothy B. wrote:
> 
> All,
> 
>  Apologies for the delay.  I finally finished the comparison of text 
> extracted from 100k pdfs with 1.8.10 and 2.0 trunk 
> (pdfbox-2.0.0-20151022.051152-1783).
> The reports are available here [0].  I botched the commit message...
> 
>  I haven't had a chance to review the results.  The eval code is still in 
> development and there might be bugs! To view the docs, prepend: h t t p : 
> slash slash one six two . two four two . two two eight . one seven four/docs/ 
>  ... just don't let any of the scrapers read that. ;)  The docs include all 
> those within our corpus that had a rtl word (when extracted with 1.8.10 :)) 
> and then I took a random selection to fill out ~100k pdfs from common crawl 
> and govdocs1.
> 
>  Let me know if you have any questions.
> 
>  Cheers,
> 
> Tim
> 
> 
> [0] 
> https://github.com/tballison/share/blob/master/pdfbox_comparisons/pdfbox_1_8_10V2_0_20151023.zip
>

PDFBox Website [PDFBOX-3040]

2015-10-29 Thread Maruan Sahyoun

Hi,

I've done all the changes to be able to build locally using jekyll. Shall we 
merge the branch jekyll-migration back to trunk or what are the proposed next 
steps?

We could also use the jekyll-migration branch as a base for pdfbox-docs (git), 
build from there and after that remove the cmssite svn repo completely. Would 
be my preference.

After having done that I'll do some more changes to the website but wanted to 
complete the infrastructure related topics first.

BR
Maruan

Re: PDFBox Website [PDFBOX-3040]

2015-10-29 Thread Maruan Sahyoun


> On 29.10.2015 at 19:44, John Hewson wrote:
> 
> 
>> On 29 Oct 2015, at 02:47, Maruan Sahyoun  wrote:
>> 
>> Hi,
>> 
>> I've done all the changes to be able to build locally using jekyll. Shall we 
>> merge the branch jekyll-migration back to trunk or what are the proposed 
>> next steps?
> 
> I’m confused, why would we want jekyll-migration to be in trunk? That’s where 
> our source code lives, not our website. Did you mean something else, e.g. 
> merging back into the cmssite branch?

for clarity

merge

https://svn.apache.org/repos/asf/pdfbox/cmssite/branches/jekyll-migration/

into

https://svn.apache.org/repos/asf/pdfbox/cmssite/trunk/

BR
Maruan

> 
>> We could also use the jekyll-migration branch as a base for pdfbox-docs 
>> (git), build from there and after that remove the cmssite svn repo 
>> completely. Would be my preference.
> 
> That depends on what you mean above… I’ll wait for the explanation. Also 
> cmssite is a branch, not a repo. 
> 
>> After having done that I'll do some more changes to the website but wanted 
>> to complete the infrastructure related topics first.
>> 
>> BR
>> Maruan

Re: PDFBox Website [PDFBOX-3040]

2015-10-29 Thread Maruan Sahyoun


> On 29.10.2015 at 20:20, John Hewson wrote:
> 
> 
>> On 29 Oct 2015, at 12:00, Maruan Sahyoun  wrote:
>> 
>>> 
>>> On 29.10.2015 at 19:44, John Hewson wrote:
>>> 
>>> 
>>>> On 29 Oct 2015, at 02:47, Maruan Sahyoun wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> I've done all the changes to be able to build locally using jekyll. Shall 
>>>> we merge the branch jekyll-migration back to trunk or what are the 
>>>> proposed next steps?
>>> 
>>> I’m confused, why would we want jekyll-migration to be in trunk? That’s 
>>> where our source code lives, not our website. Did you mean something else, 
>>> e.g. merging back into the cmssite branch?
>> 
>> for clarity
>> 
>> merge
>> 
>> https://svn.apache.org/repos/asf/pdfbox/cmssite/branches/jekyll-migration/
>> 
>> into
>> 
>> https://svn.apache.org/repos/asf/pdfbox/cmssite/trunk/
> 
> Ok, great, that makes sense. Yes I’d like to see that get merged in.

We'd need to first remove the Apache CMS build from the …/cmssite/trunk/ as a 
merge would trigger a rebuild which is not compliant.

> 
>> BR
>> Maruan
>> 
>>> 
>>>> We could also use the jekyll-migration branch as a base for pdfbox-docs 
>>>> (git), build from there and after that remove the cmssite svn repo 
>>>> completely. Would be my preference.
>>> 
>>> That depends on what you mean above… I’ll wait for the explanation. Also 
>>> cmssite is a branch, not a repo. 
> 
> Building from SVN is far simpler. But I’d be interested in removing ApacheCMS 
> from the equation.

Building from SVN is far simpler because … ?

BR
Maruan


> 
> — John
> 
>>>> After having done that I'll do some more changes to the website but wanted 
>>>> to complete the infrastructure related topics first.
>>>> 
>>>> BR
>>>> Maruan

Re: PDFBox Website [PDFBOX-3040]

2015-10-30 Thread Maruan Sahyoun


> On 29.10.2015 at 20:27, Maruan Sahyoun wrote:
> 
> 
>> On 29.10.2015 at 20:20, John Hewson wrote:
>> 
>> 
>>> On 29 Oct 2015, at 12:00, Maruan Sahyoun  wrote:
>>> 
>>>> 
>>>> On 29.10.2015 at 19:44, John Hewson wrote:
>>>> 
>>>> 
>>>>> On 29 Oct 2015, at 02:47, Maruan Sahyoun wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> I've done all the changes to be able to build locally using jekyll. Shall 
>>>>> we merge the branch jekyll-migration back to trunk or what are the 
>>>>> proposed next steps?
>>>> 
>>>> I’m confused, why would we want jekyll-migration to be in trunk? That’s 
>>>> where our source code lives, not our website. Did you mean something else, 
>>>> e.g. merging back into the cmssite branch?
>>> 
>>> for clarity
>>> 
>>> merge
>>> 
>>> https://svn.apache.org/repos/asf/pdfbox/cmssite/branches/jekyll-migration/
>>> 
>>> into
>>> 
>>> https://svn.apache.org/repos/asf/pdfbox/cmssite/trunk/
>> 
>> Ok, great, that makes sense. Yes I’d like to see that get merged in.
> 
> We'd need to first remove the Apache CMS build from the …/cmssite/trunk/ as a 
> merge would trigger a rebuild which is not compliant.

Given that we agreed in the past to use pdfbox-docs, and that a merge will 
interfere with the Apache CMS, I'll update pdfbox-docs with the current 
jekyll-migration content so we can start building from there.
The ../cmssite/trunk content can remain as a backup for the moment, until the 
change has been made and the build process is documented.

> 
>> 
>>> BR
>>> Maruan
>>> 
>>>> 
>>>>> We could also use the jekyll-migration branch as a base for pdfbox-docs 
>>>>> (git), build from there and after that remove the cmssite svn repo 
>>>>> completely. Would be my preference.
>>>> 
>>>> That depends on what you mean above… I’ll wait for the explanation. Also 
>>>> cmssite is a branch, not a repo. 
>> 
>> Building from SVN is far simpler. But I’d be interested in removing 
>> ApacheCMS from the equation.

The Apache CMS is already no longer used with the current local jekyll build. 
Instead, the content is pushed directly to the production site (same as we do 
for the javadoc).

BR
Maruan 

> 
> Building from SVN is far simpler because … ?
> 
> BR
> Maruan
> 
> 
>> 
>> — John
>> 
>>>>> After having done that I'll do some more changes to the website but 
>>>>> wanted to complete the infrastructure related topics first.
>>>>> 
>>>>> BR
>>>>> Maruan

Re: PDFBox Website [PDFBOX-3040]

2015-10-30 Thread Maruan Sahyoun

Hi,
> On 30.10.2015 at 19:39, John Hewson wrote:
> 
> 
>> On 30 Oct 2015, at 03:33, Maruan Sahyoun wrote:
>> 
>>> 
>>> On 29.10.2015 at 20:27, Maruan Sahyoun wrote:
>>> 
>>> 
>>>> On 29.10.2015 at 20:20, John Hewson wrote:
>>>> 
>>>> 
>>>>> On 29 Oct 2015, at 12:00, Maruan Sahyoun wrote:
>>>>> 
>>>>>> 
>>>>>> On 29.10.2015 at 19:44, John Hewson <mailto:j...@jahewson.com> wrote:
>>>>>> 
>>>>>> 
>>>>>>> On 29 Oct 2015, at 02:47, Maruan Sahyoun <mailto:sahy...@fileaffairs.de> wrote:
>>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> I've done all the changes to be able to build locally using jekyll. 
>>>>>>> Shall we merge the branch jekyll-migration back to trunk or what are 
>>>>>>> the proposed next steps?
>>>>>> 
>>>>>> I’m confused, why would we want jekyll-migration to be in trunk? That’s 
>>>>>> where our source code lives, not our website. Did you mean something 
>>>>>> else, e.g. merging back into the cmssite branch?
>>>>> 
>>>>> for clarity
>>>>> 
>>>>> merge
>>>>> 
>>>>> https://svn.apache.org/repos/asf/pdfbox/cmssite/branches/jekyll-migration/
>>>>> 
>>>>> into
>>>>> 
>>>>> https://svn.apache.org/repos/asf/pdfbox/cmssite/trunk/
>>>> 
>>>> Ok, great, that makes sense. Yes I’d like to see that get merged in.
>>> 
>>> We'd need to first remove the Apache CMS build from the …/cmssite/trunk/ as 
>>> a merge would trigger a rebuild which is not compliant.
>> 
>> given that we agreed to use pdfbox-docs in the past and a merge will 
>> interfere with the Apache CMS I'll be updating pdfbox-docs with the current 
>> jekyll-migration content so we can start building from there.
>> The ../cmssite/trunk content can remain as a backup for the moment until the 
>> change has been done and the build process is documented.
> 
> The time for doing this has long since passed. The RC phase is not the time 
> to try changing version control systems. The current situation is not good - 
> we need people beyond yourself to be able to use and contribute to the docs 
> in the manner which they expect.

It has been possible to contribute in the past (before the Apache CMS and with 
the Apache CMS), and it still is with the new system. There never was a 
dependency on a single person. Assuming that by "they" you are including 
yourself, what exactly doesn't meet your requirements?

> 
> If you have no way of moving to jekyll without also abandoning SVN then IMHO 
> we should not be going down that road. Especially not now.
> 

Moving from the Apache CMS to jekyll is IMHO a more breaking change than moving 
from SVN to git (assuming that most people know the basic commands for 
both or have a decent GUI anyway). Again, if you have specific concerns, 
maybe these can be addressed. 

"… not now …" Why not? Everything is still available, and it's not as if a lot of 
people have contributed content to the website lately. 

BR
Maruan


> — John
> 
>>> 
>>>> 
>>>>> BR
>>>>> Maruan
>>>>> 
>>>>>> 
>>>>>>> We could also use the jekyll-migration branch as a base for pdfbox-docs 
>>>>>>> (git), build from there and after that remove the cmssite svn repo 
>>>>>>> completely. Would be my preference.
>>>>>> 
>>>>>> That depends on what you mean above… I’ll wait for the explanation. Also 
>>>>>> cmssite is a branch, not a repo. 
>>>> 
>>>> Building from SVN is far simpler. But I’d be interested in removing 
>>>> ApacheCMS from the equation.
>> 
>> the Apache CMS is already no longer used with the current jekyll local 
>> build. Instead the content is pushed directly into the production site (same 
>> as we do for the javadoc).
>> 
>> BR
>> Maruan 
>> 
>>> 
>>> Bu

Re: PDFBox Website [PDFBOX-3040]

2015-10-30 Thread Maruan Sahyoun


> On 30.10.2015 at 20:06, John Hewson wrote:
> 
> 
>> On 30 Oct 2015, at 11:58, Maruan Sahyoun wrote:
>> 
>> Hi,
>>> On 30.10.2015 at 19:39, John Hewson wrote:
>>> 
>>> 
>>>> On 30 Oct 2015, at 03:33, Maruan Sahyoun wrote:
>>>> 
>>>>> 
>>>>> On 29.10.2015 at 20:27, Maruan Sahyoun wrote:
>>>>> 
>>>>> 
>>>>>> On 29.10.2015 at 20:20, John Hewson wrote:
>>>>>> 
>>>>>> 
>>>>>>> On 29 Oct 2015, at 12:00, Maruan Sahyoun wrote:
>>>>>>> 
>>>>>>>> 
>>>>>>>> On 29.10.2015 at 19:44, John Hewson <mailto:j...@jahewson.com> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On 29 Oct 2015, at 02:47, Maruan Sahyoun <mailto:sahy...@fileaffairs.de> wrote:
>>>>>>>>> 
>>>>>>>>> Hi,
>>>>>>>>> 
>>>>>>>>> I've done all the changes to be able to build locally using jekyll. 
>>>>>>>>> Shall we merge the branch jekyll-migration back to trunk or what are 
>>>>>>>>> the proposed next steps?
>>>>>>>> 
>>>>>>>> I’m confused, why would we want jekyll-migration to be in trunk? 
>>>>>>>> That’s where our source code lives, not our website. Did you mean 
>>>>>>>> something else, e.g. merging back into the cmssite branch?
>>>>>>> 
>>>>>>> for clarity
>>>>>>> 
>>>>>>> merge
>>>>>>> 
>>>>>>> https://svn.apache.org/repos/asf/pdfbox/cmssite/branches/jekyll-migration/
>>>>>>> 
>>>>>>> into
>>>>>>> 
>>>>>>> https://svn.apache.org/repos/asf/pdfbox/cmssite/trunk/
>>>>>> 
>>>>>> Ok, great, that makes sense. Yes I’d like to see that get merged in.
>>>>> 
>>>>> We'd need to first remove the Apache CMS build from the …/cmssite/trunk/ 
>>>>> as a merge would trigger a rebuild which is not compliant.
>>>> 
>>>> given that we agreed to use pdfbox-docs in the past and a merge will 
>>>> interfere with the Apache CMS I'll be updating pdfbox-docs with the 
>>>> current jekyll-migration content so we can start building from there.
>>>> The ../cmssite/trunk content can remain as a backup for the moment until 
>>>> the change has been done and the build process is documented.
>>> 
>>> The time for doing this has long since passed. The RC phase is not the time 
>>> to try changing version control systems. The current situation is not good 
>>> - we need people beyond yourself to be able to use and contribute to the 
>>> docs in the manner which they expect.
>> 
>> it has been possible in the past to contribute (before the Apache CMS, with 
>> the Apache CMS) and still is with the new system. There never was a 
>> dependency on a single person. Assuming with "they" you are including 
>> yourself what exactly doesn't meet your requirements?
>> 
>>> 
>>> If you have no way of moving to jekyll without also abandoning SVN then 
>>> IMHO we should not be going down that road. Especially not now.
>>> 
>> 
>> moving from the Apache CMS to jekyll IMHO is a more breaking change than 
>> moving from SVN to git (assuming that most people should know the basic 
>> commands for both of have a decent GUI anyway). Again if there are specific 
>> concerns you have maybe these can be addressed. 
> 
> My concern is that there is no technical reason to move away from SVN. 
> Literally having to reply to another email on this topic is why it’s such a 
> waste of time… can we just stick with what works and not have everyone 
> re-learning what used to work fine yesterday? And can we stop wasting time 
> discussing it?

We had planned to move the docs to git a while ago, which didn't work out 
because the Apache CMS prevented us from pulling the content. With the local 
build this limitation is gone.  No need t

Convert README.txt to README.md and markdown

2015-11-02 Thread Maruan Sahyoun

Hi,

I would like to take the current README.txt in the project's top-level folder, 
rename it to README.md and use markdown for formatting. In addition, I would 
like to add some more content as a kind of quick start: how to get help, file 
issues… The benefit would be that a) the README provides some basic information 
for those looking at the source code and b) renaming to .md and using markdown 
will provide a better look&feel on GitHub, as that will be the initial document 
visible at the bottom of the PDFBox GitHub repo. 

WDYT?

Maruan

Re: Pdfbox trunk performance

2015-11-02 Thread Maruan Sahyoun

Hello Manfred,

> On 02.11.2015 at 17:09, Manfred Pock wrote:
> 
> Hello!
> 
> Currently we use a version of pdfbox built from the trunk as of 2015-03-09 
> to render PDFs. It is not really fast, but it is ok.
> 
> From time to time I try the current trunk version to check for any 
> improvements on the performance side, but there is no enhancement; it seems 
> the rendering process is getting slower and slower.
> 
> For example, the pdf at http://cloud.directupload.net/N4t needs about 5 sec 
> on the may-version to render the first page; the current version needs 
> at least 10 sec.

Thanks for the pointer - do you get any (new) log messages while rendering? 
Which platform are you using?

BR
Maruan

> 
> I have tried it with different memory settings, with no real difference. It 
> would be great if you could make some improvements to the rendering 
> performance before the 2.0 final version is released.
> 
> Best regards, Manfred
> 

Re: Pdfbox trunk performance

2015-11-02 Thread Maruan Sahyoun

Hi,

> On 02.11.2015 at 17:20, Manfred Pock wrote:
> 
> Windows 7 pro, jdk 8, intel core i5, 8Gb Ram.
> 
> Java-Parameter:
> -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=30 -XX:+UseParNewGC 
> -XX:+UseConcMarkSweepGC -Xmx512m -Xms192m
> 
> Log messages: no, there are no new log messages.

Is it a standalone app? Running in an app server …

One thing we do know about is the FontCache, which has a performance impact. 
But there should be a log message about building/rebuilding the font cache 
("Building font cache, this may take a while")
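
To check whether a one-time cost such as the font cache build explains the slowdown, it can help to time the first render separately from later ones within the same JVM. A minimal sketch using only the JDK; `RenderTiming`, `timeRuns` and the `Runnable` standing in for the actual page rendering are hypothetical names for illustration and not part of PDFBox:

```java
import java.util.concurrent.TimeUnit;

public class RenderTiming {

    /**
     * Runs the task the given number of times and returns the elapsed
     * wall-clock time of each run in milliseconds, in call order.
     */
    public static long[] timeRuns(Runnable task, int runs) {
        long[] elapsedMillis = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            elapsedMillis[i] = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        }
        return elapsedMillis;
    }

    public static void main(String[] args) {
        // Placeholder: replace the sleep with the real call that renders the first page.
        Runnable renderFirstPage = () -> {
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        long[] times = timeRuns(renderFirstPage, 3);
        for (int i = 0; i < times.length; i++) {
            System.out.println("run " + (i + 1) + ": " + times[i] + " ms");
        }
    }
}
```

If only the first run is slow, one-time initialisation is the likely cause; if every run is slow, the rendering itself is worth a closer look.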

BR
Maruan

> 

> BR, Manfred
> 
> On 02.11.2015 at 17:12, Maruan Sahyoun wrote:
>> Hello Manfred,
>> 
>>> On 02.11.2015 at 17:09, Manfred Pock wrote:
>>> 
>>> Hello!
>>> 
>>> Currently we use a version of pdfbox built from the trunk as of 
>>> 2015-03-09 to render PDFs. It is not really fast, but it is ok.
>>> 
>>> From time to time I try the current trunk version to check for any 
>>> improvements on the performance side, but there is no enhancement; it 
>>> seems the rendering process is getting slower and slower.
>>> 
>>> For example, the pdf at http://cloud.directupload.net/N4t needs about 
>>> 5 sec on the may-version to render the first page; the current version 
>>> needs at least 10 sec.
>> thanks for the pointer - do you get any (new) log messages while rendering? 
>> Which platform are you using?
>> 
>> BR
>> Maruan
>> 
>>> I have tried it with different memory settings, with no real difference. 
>>> It would be great if you could make some improvements to the rendering 
>>> performance before the 2.0 final version is released.
>>> 
>>> Best regards, Manfred
>>> 

Re: Pdfbox trunk performance

2015-11-02 Thread Maruan Sahyoun

Hi,

> On 02.11.2015 at 17:20, Manfred Pock wrote:
> 
> Windows 7 pro, jdk 8, intel core i5, 8Gb Ram.
> 
> Java-Parameter:
> -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=30 -XX:+UseParNewGC 
> -XX:+UseConcMarkSweepGC -Xmx512m -Xms192m
> 
> Log messages: no, there are no new log messages.
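
When comparing timings, it may be worth confirming that the flags quoted above are actually in effect in the process doing the rendering, for example when it runs inside an app server with its own startup script. A small sketch using only the JDK; `JvmSettings` and its method names are made up for illustration:

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class JvmSettings {

    /** Maximum heap the JVM will attempt to use, in megabytes. */
    public static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    /** Arguments passed to the JVM itself, e.g. -Xmx512m (not the program arguments). */
    public static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("max heap (MB): " + maxHeapMegabytes());
        for (String arg : jvmArguments()) {
            System.out.println("jvm arg: " + arg);
        }
    }
}
```

The reported maximum heap can differ somewhat from the `-Xmx` value, so it is best read as a rough check rather than an exact figure.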

btw - how do you do the rendering? Would you have a code snippet?

BR
Maruan


> 
> BR, Manfred
> 
> On 02.11.2015 at 17:12, Maruan Sahyoun wrote:
>> Hello Manfred,
>> 
>>> On 02.11.2015 at 17:09, Manfred Pock wrote:
>>> 
>>> Hello!
>>> 
>>> Currently we use a version of pdfbox built from the trunk as of 
>>> 2015-03-09 to render PDFs. It is not really fast, but it is ok.
>>> 
>>> From time to time I try the current trunk version to check for any 
>>> improvements on the performance side, but there is no enhancement; it 
>>> seems the rendering process is getting slower and slower.
>>> 
>>> For example, the pdf at http://cloud.directupload.net/N4t needs about 
>>> 5 sec on the may-version to render the first page; the current version 
>>> needs at least 10 sec.
>> thanks for the pointer - do you get any (new) log messages while rendering? 
>> Which platform are you using?
>> 
>> BR
>> Maruan
>> 
>>> I have tried it with different memory settings, with no real difference. 
>>> It would be great if you could make some improvements to the rendering 
>>> performance before the 2.0 final version is released.
>>> 
>>> Best regards, Manfred
>>> 

Re: 2.0.0-RC2

2015-11-02 Thread Maruan Sahyoun

Hi,

> On 02.11.2015 at 12:29, Andreas Lehmkühler wrote:
> 
> Hi,
> 
> do we need another release candidate before releasing the final version?
> 

I'd think another RC would be good but that should potentially include an 
enhanced font cache.

BR
Maruan

> I would have some cycles to cut a RC2 this week only (on Wednesday?).
> 
> WDYT?
> 
> BR
> Andreas
> 
