Hi Jeremias,

Sorry for the confusion. Ken did most of the patch and then had to work on some other projects, so I did some final touches and submitted it. We already have a CCLA on file.

thanks,
brian


On Mar 2, 2009, at 3:30 AM, Jeremias Maerki wrote:

Thanks for speaking up, Ken. It's a great thing you're contributing to
PDFBox. But we actually do have legal issues to worry about here.

The way this happened, we don't have a legal trail to make sure that
your contributions are actually intended for inclusion and under what
license. Only Brian (hopefully) knows your intentions. When you attach a
patch to a Jira issue, you have to tick a checkbox indicating that you
intend this for inclusion:

"[ ] Grant license to ASF for inclusion in ASF works (as per the Apache
License §5)
Contributions intended for inclusion in ASF products (eg. patches, code)
must be licensed to ASF under the terms of the Apache License. Other
attachments (eg. log dumps, test cases) need not be."

With §5 of the ALv2 you explicitly give the ASF the same license for
your changes as the ASF gives to its users. That is enough for smaller
patches (bugfixes, small improvements). As soon as you contribute
considerable new functionality or new files which have a certain
"artistic" aspect, §5 is considered insufficient, at which point
committers are expected to ask for a Contributor License Agreement to
be filed with the ASF. Also, regular contributors should send in a CLA
as it is also a precondition to becoming a committer. For even larger
contributions (like whole new subsystems), a contribution may even have
to go through IP clearance with an explicit separate license grant on
the code submitted. So there are various levels. The lines are probably
not always very clearly drawn. But the intent is to protect the users
and the contributors (i.e. you) from legal harm [1]. That can only
happen if we have a clean legal trail.

[1] http://apache.org/foundation/how-it-works.html#what
(see especially the third point in the list)

I only noticed after this started that you and Justin LeFebvre are from
the same company. Both of you have written more than one patch. So I
would like to suggest that both of you send in an ICLA [2]. Please also
check if the work contracts in your company make it necessary to send in
a CCLA [2] in addition to the ICLAs.

[2] http://apache.org/licenses/#clas

A committer can always ask the PMC chair or an ASF member to check if a
particular ICLA has been recorded yet.

Ken, can I ask you to attach the two (original) patches that were
processed via Brian to the JIRA issues associated with them, so the gaps
are filled, even if that happens after the two patches were processed? I
think that should be enough to correct the situation. In the future,
please attach your patches to a new JIRA issue and take it from there.

There are other points also: by directly working with Brian, there is no
discussion (if necessary) around this if anyone has any issues. Other
committers can only react after everything has already happened. You're
also not taking part in the community whose building is the most
important task of PDFBox being in the Apache Incubator. And you're not
getting the same visibility you'd get if you take part in discussions
here. Only that way does the existing team have a chance to get to know
you and to eventually vote you in as a committer if you turn out to be a
regular contributor. The fact that two employees of your company
contribute to PDFBox means that it is important to you. So it is all the
more important that you participate in the project and jointly help
evolve it in directions that help you.

Everybody (especially Brian), don't feel bad about this! The Incubation
phase is here for everybody to learn how we do things inside the Apache
Software Foundation. There are a few rules that make the ASF so
different from the ordinary SourceForge project. I know it's a lot of
new stuff, especially for new committers to learn. Hopefully, we mentors
can help clear things up if there are questions or problems.

Thank you for your understanding!

On 01.03.2009 19:31:14 Ken Glidden wrote:
I am said Ken Glidden.
I'm VP of Engineering at Basis Technology and am working directly with Brian on this.
No legal issues to worry about.
Cheers.

-----Original Message-----
From: Jeremias Maerki [mailto:[email protected]]
Sent: Saturday, February 28, 2009 12:26 PM
To: [email protected]
Subject: Re: [jira] Resolved: (PDFBOX-430) Incorrect diacritic placement in text extraction

Brian,

you state here that you've applied a patch by one Ken Glidden. I cannot
find any post or submission from a person with that name on the PDFBox
mailing lists. So I'm concerned about the legal trail here. Can you
explain that, please? Thank you.

On 18.02.2009 22:36:01 Brian Carrier (JIRA) wrote:

[ https://issues.apache.org/jira/browse/PDFBOX-430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brian Carrier resolved PDFBOX-430.
----------------------------------

    Resolution: Fixed

Fixed with patch by Ken Glidden that merges a single diacritic text chunk into the previous text chunk if they overlap. Note that this will not solve problems where the diacritic comes much after the text chunk it overlays, but we have not observed PDF files like that.

Sending        trunk/src/main/java/org/apache/pdfbox/util/PDFTextStripper.java
Sending        trunk/src/main/java/org/apache/pdfbox/util/TextPosition.java
Sending        trunk/test/input/Acrobat9.pdf-sorted.txt
Sending        trunk/test/input/Acrobat9.pdf.txt
Transmitting file data ....
Committed revision 745665.



Incorrect diacritic placement in text extraction
------------------------------------------------

                Key: PDFBOX-430
                URL: https://issues.apache.org/jira/browse/PDFBOX-430
            Project: PDFBox
         Issue Type: Bug
           Reporter: Brian Carrier

Some PDF files store diacritics (accents over characters) as separate text elements. The PDF files essentially have a chunk of text and then back up and place the diacritic over one of the characters in the chunk of text. With text extraction, the current design does not allow the diacritic to be placed over a character in the chunk; instead, it is placed after the chunk.
The debug-diac2.pdf file in PDFBOX-429 shows this problem.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.




Jeremias Maerki




