Re: Merge Request - Temp_ComplexScripts into Trunk

Vincent Hennebert Thu, 20 Oct 2011 06:54:30 -0700

The Complex Scripts feature is obviously a great enhancement and we
would all love to have it implemented in FOP. However, that should not
come at the expense of maintainability and the implementation of other
new features.


When I look at the code in the Temp_ComplexScripts branch, I have
serious concerns regarding the latter two points. I would oppose merging
the branch back to Trunk until those are resolved.

Here are the sizes of some new files:
1075 src/java/org/apache/fop/fonts/GlyphSequence.java
1089 src/java/org/apache/fop/fonts/GlyphProcessingState.java
1269 src/codegen/unicode/java/org/apache/fop/text/bidi/GenerateBidiTestData.java
2034 src/java/org/apache/fop/layoutmgr/BidiUtil.java
3449 test/java/org/apache/fop/complexscripts/util/TTXFile.java

This latter one contains more than 50 field
declarations, and the Handler.startElement method alone is more than
1800 lines long.

Also, the o.a.f.fonts.truetype.TTFFile class has now grown to
5502 lines. That’s 3 times its original size which was already too big.
I regularly find myself looking at bits of this class, and I would be
unable to do so on a 5500 line class.

I don’t think it needs to be explained why big classes are undesirable?

Also, most files disable Checkstyle checks, the most important ones
being line length and white space. Many files have too long lines which
makes it a pain to read through, having to horizontally scroll all the
time. We agreed on a certain coding style in the project and it would be
good if new code could adhere to it.

Speaking of variable names, here is a method picked from
o.a.f.fonts.GlyphSequence:
    /**
     * Merge overlapping and abutting sub-intervals.
     */
    private static int[] mergeIntervals ( int[] ia ) {
        int ni = ia.length;
        int i, n, nm, is, ie;
        // count merged sub-intervals
        for ( i = 0, n = ni, nm = 0, is = ie = -1; i < n; i += 2 ) {
            int s = ia [ i + 0 ];
            int e = ia [ i + 1 ];
            if ( ( ie < 0 ) || ( s > ie ) ) {
                is = s;
                ie = e;
                nm++;
            } else if ( s >= is ) {
                if ( e > ie ) {
                    ie = e;
                }
            }
        }
        int[] mi = new int [ nm * 2 ];
        // populate merged sub-intervals
        for ( i = 0, n = ni, nm = 0, is = ie = -1; i < n; i += 2 ) {
            int s = ia [ i + 0 ];
            int e = ia [ i + 1 ];
            int k = nm * 2;
            if ( ( ie < 0 ) || ( s > ie ) ) {
                is = s;
                ie = e;
                mi [ k + 0 ] = is;
                mi [ k + 1 ] = ie;
                nm++;
            } else if ( s >= is ) {
                if ( e > ie ) {
                    ie = e;
                }
                mi [ k - 1 ] = ie;
            }
        }
        return mi;
    }

Now I fully appreciate that one has to have some knowledge of an area to
understand code relating to that area, but no level of expertise,
however high, will help me to quickly understand this code. This is just
too easy to mistake one variable for another one when they differ by
only one letter.

Combined, these would make the code very hard to maintain by anyone
other than the original author, and this is why I’m opposed to merging
it to Trunk in its current form. When I commit code I do my very best to
make it as easy to read and understand by other people, and I would
really appreciate it I could have the same in return.


Thanks,
Vincent


On 19/10/11 19:32, Simon Pepping wrote:
> Over the past ten years computing has pervaded life in all its facets,
> and spread over the world. As a consequence computing is now used in
> all languages and all scripts.
> 
> When I open my devanagari test file in emacs, it just works. When I
> open it in firefox, it just works. The same when I open it in
> LibreOffice Writer. I am sure that, if I would open it in *the* *Word*
> processor, it would just work. When I process the file with FOP, it
> does not work. With the complex scripts functionality, it works,
> dependent on the use of supported or otherwise suitable fonts. (That
> is true for all above applications, but apparently those come
> configured with my system.)
> 
> So what does a person do who believes in the XML stack to maintain his
> documentation, and wants to send his documents in Hindi to his
> customers? See that XSL-FO with FOP is not a suitable solution for him
> because Hindi uses a complex script?
> 
> FOP needs the complex scripts functionality to remain a player in the
> global playing field.
> 
> This is for me the single overarching consideration to want this
> functionality in FOP's trunk code, and in, say, half a year in a
> release. All other considerations are minor, unless one wants to claim
> that this code will block FOP's further development and maintenance in
> the coming years.
> 
> Of course, not everybody needs this functionality, and there is a fear
> of increased maintenance overhead. But the question is: For whom do we
> develop FOP? Also for the large part of the world that uses complex
> scripts?
> 
> With the development of the complex scripts functionality, Glenn Adams
> and his sponsor Basis Technologies have created a new reality, which
> is not going to go away. If this functionality does not end up in FOP,
> it will probably live on elsewhere. If the FOP team is fine with that,
> say no to the merge request, and feel comfortable with a trusted niche
> application.
> 
> Simon Pepping
> 
> On Wed, Oct 19, 2011 at 09:50:24AM +0100, Chris Bowditch wrote:
>> On 18/10/2011 19:55, Simon Pepping wrote:
>>> I merged the ComplexScripts branch into trunk. Result:
>>
>> Hi Simon,
>>
>> As well of the question of how to do the merge there is also the
>> question should we do the merge? Of course this is a valuable
>> feature to the community, and Glenn has invested a lot of time in
>> its development but is it truely production ready? I asked Vincent
>> to take a look at the branch earlier in the year as it's a feature
>> we also need, but he had several concerns that have not be
>> adequately answered. Take a look at comment #30;
>> https://issues.apache.org/bugzilla/show_bug.cgi?id=49687#c30
>>
>> I'm not sure why Vincent describes it as a "brief look" because he
>> spent several days on it. I also asked Peter to take a look and he
>> had similar concerns. 2 or 3 letter variable names are a barrier for
>> any committer wanting to maintain this code and I don't think it is
>> a sufficient argument to say that a pre-requisite to maintaining
>> this code is to be a domain expert. I would hope that any
>> experienced committer with a debugger should be able to solve some
>> bugs. Obviously certain problems will require domain expertise, but
>> the variables names are a key barrier to being able to maintain this
>> code.
>>
>> I realise my comments might be a little controversial and I don't
>> mean any disrespect to Glenn or his work (which is largely
>> excellent), but we should at least discuss these topics before the
>> merge is completed.

Re: Merge Request - Temp_ComplexScripts into Trunk

Reply via email to