Hello Jim and Everyone on Java2D Group

Good day to you.

A quick follow-up to Review Request on bug:
      Bug          : JDK-8015070
      Bug Link : https://bugs.openjdk.java.net/browse/JDK-8015070

Thank you Jim for the detailed feedback. 

It takes a lot of time not only in performing the review, but also in getting 
the feedback with clear words.
In each review, the solution definitely gets better & better. I 'm happy about 
it...! :)

Quick Inferences from previous feedback:

Incorporating the fast path into current logic:
       1.  I agree with you on this point and I noticed this when we modelled 
the solution as per the mask fill.
       2.  I ignored it initially for two reasons, 
                     a.  The statement - if (resA != MaxValFor4ByteArgb)...  
follows srcColor pre-multiplication step and this will ensure to skip most of 
the blending calculations pertaining to destination.
                     b.  Any addition / tweaks to current logic, might deviate 
from the SRCOVER_MASKFILL reference model.
                              Many a time, managing code with similar logic 
across implementation helps a lot.
      3.   As you said, including the fast path will avoid few multiplications 
and if checks too.
            The changes are available in the current webrev.
                     Link: 
http://cr.openjdk.java.net/~psadhukhan/prahlad/8015070/webrev.03/

Profiling method, and Metrics:
      1.  The profiling method that was done yesterday was mere execution of 
the regression test (available in the webrev) and time measured with 
System.currentTimeMillis API
                      Since only one iteration was run, the time soared into 
milli seconds. This could be due to internal glyph caching mechanism.
      2.  Today, I evolved the regression test, into a benchmark that does the 
following:
                a.  Select Font style : {Plain, Bold, Italic, Bold Italic}
                b.  Select Font size : {20, 40, 60, 80}
                c.  Trigger drawstring once before benchmark is run. This is to 
ensure, the font subsystem is done with glyph caching internally.
                d.  Use famous string that has all characters-" The quick brown 
fox jumps over the lazy dog. 0123456789. "
                e.  For every style-size combination - run the test for 20 
iterations and take the average. (Note: Font is fixed to 'verdana' )
                f.  Modify the precision from System.currentTimeMillis to 
System.nanoTime() and reduce elapsedTime to micro seconds.

      3.  With the above setup in code, my observation on windows is as follows:
                a.  In many cases, The optimized logic is either equal-to (or) 
better in performance than the un-optimized logic.
                             The difference is very minimal - few tens to few 
hundreds of micro-seconds.
                b.  The optimized algorithm improves performance for 
Pre-multiplied alpha based destination buffer.
                c.   There are occasional huge deviations where optimized logic 
seems to take longer time.
                             These could be due to external factors like- 
stalls for memory, bus io etc., 
                             Since, the deviation is in micro seconds, I 
believe, it may not be a concern.
                d.  The complete list of time measurement taken up on windows 
x86_64 release build is as-follows-
                             I 'm not sure, how the data appears in the final 
webrev-email.
                             Kindly excuse, if the data is un-readable.

Platform : Windows x86_64 Release Build
Algorithm : Unoptimized. webrev.00
``````````````````````````````````````````````````````````````````````````
Executing Bench For Image Type: IntArgb
 -Font Style: Plain /Font Size: 20 /Time consumed (us): 84.742
 -Font Style: Plain /Font Size: 40 /Time consumed (us): 318.395
 -Font Style: Plain /Font Size: 60 /Time consumed (us): 657.474
 -Font Style: Plain /Font Size: 80 /Time consumed (us): 813.079
 -Font Style: Bold /Font Size: 20 /Time consumed (us): 386.380
 -Font Style: Bold /Font Size: 40 /Time consumed (us): 339.301
 -Font Style: Bold /Font Size: 60 /Time consumed (us): 492.631
 -Font Style: Bold /Font Size: 80 /Time consumed (us): 625.812
 -Font Style: Italic /Font Size: 20 /Time consumed (us): 235.059
 -Font Style: Italic /Font Size: 40 /Time consumed (us): 320.180
 -Font Style: Italic /Font Size: 60 /Time consumed (us): 474.558
 -Font Style: Italic /Font Size: 80 /Time consumed (us): 1188.169
 -Font Style: Bold-Italic /Font Size: 20 /Time consumed (us): 426.988
 -Font Style: Bold-Italic /Font Size: 40 /Time consumed (us): 374.064
 -Font Style: Bold-Italic /Font Size: 60 /Time consumed (us): 732.375
 -Font Style: Bold-Italic /Font Size: 80 /Time consumed (us): 864.68

Executing Bench For Image Type: IntArgb_Pre
 -Font Style: Plain /Font Size: 20 /Time consumed (us): 129.768
 -Font Style: Plain /Font Size: 40 /Time consumed (us): 206.299
 -Font Style: Plain /Font Size: 60 /Time consumed (us): 249.941
 -Font Style: Plain /Font Size: 80 /Time consumed (us): 362.372
 -Font Style: Bold /Font Size: 20 /Time consumed (us): 145.096
 -Font Style: Bold /Font Size: 40 /Time consumed (us): 151.589
 -Font Style: Bold /Font Size: 60 /Time consumed (us): 240.972
 -Font Style: Bold /Font Size: 80 /Time consumed (us): 331.894
 -Font Style: Italic /Font Size: 20 /Time consumed (us): 95.028
 -Font Style: Italic /Font Size: 40 /Time consumed (us): 245.300
 -Font Style: Italic /Font Size: 60 /Time consumed (us): 270.379
 -Font Style: Italic /Font Size: 80 /Time consumed (us): 398.139
 -Font Style: Bold-Italic /Font Size: 20 /Time consumed (us): 93.243
 -Font Style: Bold-Italic /Font Size: 40 /Time consumed (us): 475.406
 -Font Style: Bold-Italic /Font Size: 60 /Time consumed (us): 280.085
 -Font Style: Bold-Italic /Font Size: 80 /Time consumed (us): 357.486

Platform : Windows x86_64 Release Build
Algorithm : Optimized. webrev.03
``````````````````````````````````````````````````````````````````````````
Executing Bench For Image Type: IntArgb
 -Font Style: Plain /Font Size: 20 /Time consumed (us): 120.954
 -Font Style: Plain /Font Size: 40 /Time consumed (us): 364.871
 -Font Style: Plain /Font Size: 60 /Time consumed (us): 561.799
 -Font Style: Plain /Font Size: 80 /Time consumed (us): 653.390
 -Font Style: Bold /Font Size: 20 /Time consumed (us): 261.566
 -Font Style: Bold /Font Size: 40 /Time consumed (us): 311.054
 -Font Style: Bold /Font Size: 60 /Time consumed (us): 490.735
 -Font Style: Bold /Font Size: 80 /Time consumed (us): 656.559
 -Font Style: Italic /Font Size: 20 /Time consumed (us): 314.289
 -Font Style: Italic /Font Size: 40 /Time consumed (us): 378.750
 -Font Style: Italic /Font Size: 60 /Time consumed (us): 491.181
 -Font Style: Italic /Font Size: 80 /Time consumed (us): 770.172
 -Font Style: Bold-Italic /Font Size: 20 /Time consumed (us): 375.336
 -Font Style: Bold-Italic /Font Size: 40 /Time consumed (us): 571.371
 -Font Style: Bold-Italic /Font Size: 60 /Time consumed (us): 548.300
 -Font Style: Bold-Italic /Font Size: 80 /Time consumed (us): 714.526

Executing Bench For Image Type: IntArgb_Pre
 -Font Style: Plain /Font Size: 20 /Time consumed (us): 45.026
 -Font Style: Plain /Font Size: 40 /Time consumed (us): 219.016
 -Font Style: Plain /Font Size: 60 /Time consumed (us): 279.617
 -Font Style: Plain /Font Size: 80 /Time consumed (us): 282.829
 -Font Style: Bold /Font Size: 20 /Time consumed (us): 51.809
 -Font Style: Bold /Font Size: 40 /Time consumed (us): 117.563
 -Font Style: Bold /Font Size: 60 /Time consumed (us): 508.049
 -Font Style: Bold /Font Size: 80 /Time consumed (us): 402.802
 -Font Style: Italic /Font Size: 20 /Time consumed (us): 79.320
 -Font Style: Italic /Font Size: 40 /Time consumed (us): 227.473
 -Font Style: Italic /Font Size: 60 /Time consumed (us): 330.488
 -Font Style: Italic /Font Size: 80 /Time consumed (us): 353.782
 -Font Style: Bold-Italic /Font Size: 20 /Time consumed (us): 54.687
 -Font Style: Bold-Italic /Font Size: 40 /Time consumed (us): 235.505
 -Font Style: Bold-Italic /Font Size: 60 /Time consumed (us): 227.205
 -Font Style: Bold-Italic /Font Size: 80 /Time consumed (us): 324.308

Updated webrev with changes for the fast-path : 
http://cr.openjdk.java.net/~psadhukhan/prahlad/8015070/webrev.03/
Kindly review and provide your suggestions.

Thank you once again for detailed review and feedback
Have a good day

Prahalad N.



-----Original Message-----
From: Jim Graham 
Sent: Wednesday, March 30, 2016 2:46 AM
To: Prahalad Kumar Narayanan; 2d-dev@openjdk.java.net; Sergey Bylokhov
Subject: Re: [OpenJDK 2D-Dev] [2D-Dev] Review Request: JDK-8015070: Antialiased 
text on translucent backgrounds gets bright artifacts

Hi Prahalad,

This latest version looks like it should produce correct answers.

I'd like to see benchmark results on more platforms, particularly Windows since 
the different compilers may optimize these things differently.

Also, you didn't mention what data set you used for benchmarking.  I'd like to 
see benchmark results for small, medium and large font sizes, and possibly bold 
and italic fonts as well.  The reason is that the relative ratios of "empty 
glyph pixels" to "partial glyph pixels" to "fully covered glyph pixels" changes 
depending on the font type and size so if you made one of those faster and 
another slower then the results may be seen as a gain in one type of test if 
you only test one font type and size and it happens to match the part of the 
code that is more streamlined.  Also, for small font sizes the per-glyph 
overhead might hide per-pixel issues.  Please share which fonts and sizes you 
used for testing and consider using some different sizes, including something 
very large like 36 or 48 points (something with large areas of opaque
pixels) as well as more normal sizes that appear in GUIs.  Also, bold fonts can 
have a higher percentage of opaque pixels.

In particular...

This latest version is missing the "fast path" from the existing code for the 
case of srcA == 255 and glyphA == 255 where it just stores the FG_PIXEL values 
directly.  For large fonts/glyphs that case may be as important as detecting 
empty glyph pixels.

On the other hand, an additional "if" may cause the compiler to generate less 
efficient code as per Sergey's concern.  Since this "if" eliminates some 
multiplies and possibly some divides, I'd hope it would be a win-win.

You could add the fast path back inside the case where mixValSrc is 255 and 
just test srcA for 255 and do the Store ## DST ## PixelData() macro that used 
to be at the end of the block in the existing code, and then use "break;" to 
escape out of the do/while surrounding the whole macro so it moves on to the 
next pixel.

(Arguably, we might want to consider teaching our SRCOVER_MASKFILL to do the 
same thing.  I think that was one of the added values of having a separate 
GLYPH loop, but really both should be optimizing that case...)

I can see now that the macro switch to use the same macro set as 
SRCOVER_MASKFILL required you to compute the pixel address, as you noted in 
your summary.  It makes the new macro more cumbersome and makes it look like 
it's doing a bit more work per-pixel, but in reality I think the overall 
operations end up being the same as long as the compiler optimizes the 
deliberate multiplications the same way it did for implicit multiplications in 
the "pRas[foo]" and "pRas[foo*4]" code that was being generated previously.  
Benchmarks will tell us if we're good there...

                        ...jim


On 3/28/16 5:33 AM, Prahalad Kumar Narayanan wrote:
> Hello Everyone on Java2D Group
>
> Good day to you.
>
> This is a follow-up to Review Request on bug:
>      Bug          : JDK-8015070
>      Bug Link : https://bugs.openjdk.java.net/browse/JDK-8015070
>
> First, Thanks to Jim and Sergey for their feedback on the changes so far.
>
> Inferences from Jim 's Feedback on Loading destination colors:
>      1.  The API or Macro that I had earlier used to Load DST colors, indeed, 
> adjusted destination color components with divide-by-alpha if destination was 
> already pre-multiplied.
>                   My apologies.. I should have spotted this error at the 
> first iteration itself.
>      2.  Due to the divide-by-alpha adjustment, the remaining logic would 
> become incorrect. ( Especially, the multiplication with dstF based on 
> pre-mulitplication status )
>      3.  Correct API is being used now, and the dstColor components are 
> loaded directly without any divide-by-alpha adjustment.
>
> Inferences from Sergey's Feedback on Performance.
>      1.  To set the context for everyone, the logic present in the current 
> webrev.02 is modelled as per SRCOVER_MASKFILL.
>                   There are multiple if (...) conditions that remove 
> un-necessary blending calculations. Ideally this should improve performance.
>                   However, since some data are not readily available (as 
> present in SRCOVER_MASKFILL), few additional calculations have been added.
>                   They are: pre-multiplying srcColor with alpha and assigning 
> to res.
>                                       Finding the correct address of Pixel 
> using DST_PTR and PixelStride.
>                   Henceforth, as Sergey suggests, Observation on performance 
> will be beneficial.
>      2.  The performance of the new logic was measured with 
> linux-x86_64-normal-server-release config and compared with the logic used in 
> un-optimized code in webrev.00
>      3.  Result: The newer logic provides a fractional gain (about 15 - 20 
> ms) over the older logic.
>
> Other Subtle Changes:
>      1.  The test file has been renamed from AADrawStringArtifact.java to 
> more meaningful - AntialiasedTextArtifact.java
>      2.  The test file tests for both TYPE_INT_ARGB and TYPE_INT_ARGB_PRE 
> BufferedImage formats.
>                  The code has been well commented to explain the logic used 
> in every function.
>
> Kindly take your time to review the changes in the webrev link mentioned 
> below and provide your suggestions.
>           Webrev Link: 
> http://cr.openjdk.java.net/~psadhukhan/prahlad/8015070/webrev.02/
>
>
> Thank you for your time in review
> Have a good day
>
> Prahalad N.
>
>
> -----Original Message-----
> From: Jim Graham
> Sent: Thursday, March 24, 2016 7:57 AM
> To: Prahalad Kumar Narayanan; 2d-dev@openjdk.java.net
> Subject: Re: [OpenJDK 2D-Dev] [2D-Dev] Review Request: JDK-8015070: 
> Antialiased text on translucent backgrounds gets bright artifacts
>
> Hi Prahalad,
>
> (On a side note - ouch!  I came up with these macros in the first 
> place, but 20 years later I'm now realizing just how hard they are to 
> navigate and review.  My apologies...)
>
> The macro you are still using to load the destination data is one that loads 
> it to non-premultiplied components, which means it will divide out the alpha 
> if the destination is PRE.  The rest of the logic assumes that the components 
> were loaded without any adjustment of their premultiplication so not only is 
> that division an unnecessary operation, it makes the following math wrong.
>
> The SRCOVER_MASKFILL macro seems to use "Postload ## STRATEGY ## From ## 
> TYPE" which seems to load them into separate components without any 
> adjustment of their pre-multiplication status.  This is paired with 
> "LoadAlphaFrom ## TYPE ## For ## STRATEGY" to load just the destination alpha 
> for computing dstF...
>
>                       ...jim
>
> On 3/22/16 4:35 AM, Prahalad Kumar Narayanan wrote:
>> Hello Everyone on Java2D Group
>>
>> Good day to you.
>>
>> This is a Follow-up to Review Request on the bug:
>>       Bug          : JDK-8015070   Anti-aliased Text on Translucent 
>> background gets bright artifacts
>>       Bug Link : https://bugs.openjdk.java.net/browse/JDK-8015070
>>
>> First, Sincere thanks to Jim for his valuable feedback.
>>       1.  As Jim correctly mentioned, SRCOVER_MASKFILL has a similar logic 
>> to blend two Translucent colors based on an Alpha mask.
>>       2.  The calculations are exactly the same as the changes in previous 
>> webrev.
>>                   However the logic of SRCOVER_MASKFILL is 'Optimized' to 
>> reduce the number of computations.
>>       3.  This optimization is definitely required because, the logic is 
>> executed for every single pixel in a glyph.
>>                   Example: If a string is made up of 5 English characters 
>> with each character having 32 x 32 pixels,
>>                                      The anti-aliasing logic will be 
>> executed for 5 x 32 x 32 iterations.
>>                   Reducing computation per pixel will imply a huge benefit 
>> for complete drawString operation.
>>
>> Observation from SRCOVER_MASKFILL
>>       1.  The mask fill reduces computations by through multiple if(...) 
>> conditions.
>>                     Each if condition affirms whether the next set of 
>> computations are required.
>>       2.  Optimization 1: If mask value is 0- skip entire logic.
>>       3.  Optimization 2: If mask value is maximum, say 255, take srcA 
>> directly without multiplying with maskAlpha ( Reason: 1 * srcAlpha = 
>> srcAlpha )
>>       4.  Optimization 3: Compute pre-multiplied resColor in two steps. 
>> First with srcColor and then with dstColor.
>>                     If the resAlpha from 1st step (i.e., srcColor) is fully 
>> opaque, avoid blending dstColor altogether.
>>
>> Changes in Current Webrev.01
>>       1.  The fix for the current bug is modelled based on SRCOVER_MASKFILL.
>>       2.  The changes have been verified to work on windows, linux and mac 
>> operating systems.
>>       3.  The automated Test file- AADrawStringArtifact.java runs as expected
>>                  Identifies artifact & throws exception when run on JDK 7 
>> and 8.
>>                  With JDK9, the test file returns without error.
>>       3.  JPRT build has been run to ensure, changes build on all supported 
>> platforms.
>>                   JPRT job link :
>> http://scaaa637.us.oracle.com//archive/2016/03/2016-03-22-070604.prah
>> n
>> ara-linux.client
>>
>> Kindly review the changes in the below mentioned link and provide your views
>>       Webrev Link :
>> http://cr.openjdk.java.net/~psadhukhan/prahlad/8015070/webrev.01/
>>
>>
>> Thank you for your time in review
>> Have a good day
>>
>> Prahalad N.
>>
>>
>> -----Original Message-----
>> From: Jim Graham
>> Sent: Friday, March 18, 2016 6:07 AM
>> To: Prahalad Kumar Narayanan; 2d-dev@openjdk.java.net
>> Subject: Re: [OpenJDK 2D-Dev] [2D-Dev] Review Request: JDK-8015070:
>> Antialiased text on translucent backgrounds gets bright artifacts
>>
>> Hi Prahalad,
>>
>> This basically boils down to "alpha blending math needs to be performed in 
>> premultiplied form" and the existing code was not doing that.
>>
>> Your changes do add a pre-multiplication step to the math in two 
>> places
>> - when you mix the src alpha and the glyph alpha at the top of the macro, 
>> and again when you do the Multiply(dst, dstA, dst) step in the middle.  In 
>> that sense, the new math is correct.
>>
>> However, it is not the optimal way to implement this.  In particular, 
>> the macro used here to load the data from the destination is the one 
>> that loads it into 4 ARGB non-premultiplied values.  If the 
>> destination is non-PRE, then your added multiply step is exactly what 
>> is needed, but it could be combined with the multiply that will 
>> happen later in the blending equation, so it is an added step rather 
>> than a fixed fraction in the existing MultMultAdd parameters.  
>> Additionally, if the destination is already PRE, then the macro being 
>> used to load the dst pixel data there will have done a divide step to 
>> divide out the alpha from the destination, only to have you reverse 
>> that math with your new
>> Multiply() step.  That's a lot of math to end up with a NOP.
>>
>> The MUL8 you added for the srcA and glyph value is needed, that change is 
>> good.  Since it is common for glyph pixels to have zero alpha, you might 
>> want to test the glyph alpha for 0 and skip the pixel before you do the 
>> multiply, though.  This would add one more if, but it would be a common case.
>>
>> The trickier part is to load the destination components without 
>> un-premultiplying them.  Unfortunately there is no "Load...ToArgbPre"
>> macro to parallel the Load macro used in the function.  Perhaps there should 
>> be, but you'd still end up with an extra multiply step as I mentioned above 
>> because you can fold the premultiplication of the dst data into the 
>> MultMultAdd by carefully choosing the parameters you use in the math there.
>>
>> The good news is that the way that the SRCOVER_MASKFILL uses the various 
>> type-specific macros works around this a bit and minimizes the number of 
>> multiplies that happen.  You could check out DEFINE_SRCOVER_MASKFILL and see 
>> how it works in the case where pMask is not null (pMask is an alpha mask 
>> with values very similar to the glyphAA data).  Modeling this code on that 
>> code would correct the math and minimize it as well...
>>
>>                      ...jim
>>
>> On 3/17/16 3:00 AM, Prahalad Kumar Narayanan wrote:
>>> Hello Everyone on Java2D Group
>>>
>>> Good day to you.
>>>
>>> Herewith, I 'm sharing the webrev for two identical Java2D Bugs.
>>>
>>> Bug ID : JDK-8015070
>>>
>>>           Title     : Antialiased text on translucent backgrounds gets
>>> bright artifacts
>>>
>>>           Link      : https://bugs.openjdk.java.net/browse/JDK-8015070
>>>
>>> Bug ID : JDK-8013564 ( currently closed as duplicate )
>>>
>>>           Title     : Font rendering anti-aliasing in translucent
>>> BufferedImages broken
>>>
>>>           Link      : https://bugs.openjdk.java.net/browse/JDK-8013564
>>>
>>> Webrev Link :
>>> http://cr.openjdk.java.net/~psadhukhan/prahlad/8015070/webrev.00/
>>>
>>> Quick Summary on Bugs :
>>>
>>> ````````````````````````````````````````````````
>>>
>>> 1.  Artifacts appear on Edges of text characters when anti-aliased 
>>> text is drawn on Translucent background
>>>
>>> 2.  The issue is reproducible on all platforms - windows, linux and mac os.
>>>
>>> 3.  Besides, the issue is reproduced with the commonly used pixel
>>> format- 4ByteArgb.
>>>
>>> Root Cause & Solution :
>>>
>>> ````````````````````````````````````````````````
>>>
>>> 1.  The Macro: GlyphListAABlend4ByteArgb in File: LoopMacros.h uses 
>>> the standard blending algorithm
>>>
>>>            dstColor = [ srcColor * glyphAlpha + dstColor * (1 -
>>> glyphAlpha) ] / dstAlpha
>>>
>>> 2.  The above equation works only when the srcColor and dstColor are Opaque.
>>>
>>> 3.  When srcColor and dstColor are Translucent, their alphaComponent 
>>> will influence the visibility of the color, and visibility of the 
>>> color below.
>>>
>>> 4.  The new set of calculations for blending Translucent source and 
>>> destination colors is given as
>>>
>>>            resAlpha = 1 - ((1 - srcAlpha) * (1 - dstAlpha))
>>>
>>>            resColor = [ srcColor * srcAlpha + dstColor * dstAlpha * 
>>> (1
>>> -
>>> srcAlpha) ] / resAlpha
>>>
>>> 5.  Reference text for the equation:
>>> https://en.wikipedia.org/wiki/Alpha_compositing
>>>
>>> 6.  With the above modification to the blending logic, the artifacts 
>>> do not appear and issues are fixed.
>>>
>>> Jtreg & Jprt Results :
>>>
>>> ````````````````````````````````````````````````
>>>
>>> 1.  A simple test-file: AADrawStringArtifact.java has been created 
>>> to be a part of jtreg test cases.
>>>
>>>             The test file is well commented to explain - nature of 
>>> artifact and how the test tries to identify them.
>>>
>>>             As required, the test case fails with Jdk 7, Jdk 8 and 
>>> succeeds with Jdk 9-internal (built with changes for the bug fix)
>>>
>>> 2.  The changes to blending logic lies within 
>>> java.desktop/src/share/native/...
>>>
>>>             Henceforth, JPRT was used to ensure successful build 
>>> across all supported platforms
>>>
>>>             Jprt Job Link :
>>> http://scaaa637.us.oracle.com//archive/2016/03/2016-03-17-072001.pra
>>> h
>>> n
>>> ara-linux.client
>>>
>>>             The build succeeds on all platforms.
>>>
>>> Kindly review the webrev link and provide your views and suggestions.
>>>
>>> Webrev Link :
>>> http://cr.openjdk.java.net/~psadhukhan/prahlad/8015070/webrev.00/
>>>
>>> If the changes look good, we could take up the changes for source checkin.
>>>
>>> Thank you for your time in review
>>>
>>> Have a good day
>>>
>>> Prahalad N.
>>>

Reply via email to