Hi Prahalad, Changes are working fine.+1. In test case please make jtreg comments as multiline comments and move them below copyright and above import statements for readability before pushing.
Thanks, Jay -----Original Message----- From: Jim Graham Sent: Tuesday, May 03, 2016 1:42 AM To: Prahalad Kumar Narayanan; 2d-dev@openjdk.java.net Subject: Re: [OpenJDK 2D-Dev] [2D-Dev] Review Request: JDK-8015070: Antialiased text on translucent backgrounds gets bright artifacts Thanks Prahalad, Looks great - +1 ...jim On 5/2/2016 8:29 AM, Prahalad Kumar Narayanan wrote: > Dear Jim & Everyone on Java2D Community > > First, thanks to Jim for every word in the feedback. > > Java is used by millions of devices and users. We cannot compromise on > quality. > It 's through these reviews and iterations that we ensure a consistent > solution gets checked in. > > I 've incorporated the suggestions from last review and changes are available > in webrev. > Review Link: http://cr.openjdk.java.net/~jdv/Prahalad/8015070/webrev.06/ > > Quick Brief on Changes : > 1. The macros- Declare ## DST ## SrcOverDstBlendFactor and Store ## DST > ## SrcOverDstBlendFactor are removed. > 2. Their usage is now replaced with > DeclareAlphaVarFor4ByteArgb : For declaring the blend factor > SrcOver ## DST ## BlendFactor(dstF, dstA) : To return the > appropriate blend factor between dstF and dstA > 3. Most of the macro names contain image TYPE in the earlier part and > ForSTRATEGY at the end of the name. > (Note: This is just an observation not a Thumb rule) Hence > SrcOver ## DST ## BlendFactor macro name is being used. > 4. As with every change in the native code, Internal automated build > system was used to ensure successful compilation across all platforms. > > Kindly review the changes and provide your opinion. > > Thank you > Have a good day > > Prahalad N. > > > -----Original Message----- > From: Jim Graham > Sent: Thursday, April 28, 2016 2:05 AM > To: Prahalad Kumar Narayanan; 2d-dev@openjdk.java.net > Subject: Re: [OpenJDK 2D-Dev] [2D-Dev] Review Request: JDK-8015070: > Antialiased text on translucent backgrounds gets bright artifacts > > Thanks Prahalad, > > First, the macro design issues for all of these LoopMacros.h et al files > are pretty complex, so I apologize if this is a big learning step here > and if the feedback can look like nit-picking at times. But the system > is complicated enough that we should take care to make sure the new work > remains consistent to what is already there and to keep this complicated > system maintainable in the long run. > > The only issue I have with the new macros is that Store*() naming > convention is typically used when the macro itself does the assignment. > In the case of the new Store*BlendFactor() macros, they simply return > the value and the caller does the assignment, so the name is off. This > could be fixed either by moving "blendF" to an argument to the macro, or > renaming it to one of the suggestions I gave earlier that don't imply > assignment inside the macro. > > Another thing to consider is that the type of the blend factor is > actually determined by the alpha blend token "STRATEGY", not by the > image type. Many of the alpha blend types are dependent on the image > types being used, but it is not solely determined by the image type, > sometimes it depends on the pair of src/dst image types. In any case, > the STRATEGY used for alpha blending is not directly tied to an image > type. Other alpha loop macros take a STRATEGY as an argument and invoke > the proper alpha blending value declaration and manipulation macros with > "For ## Strategy" macro name expansions. But we are hard-coding > "For4ByteArgb" in this particular macro, which means we are hard-coding > the particular type of alpha math. Since that determination is done by > hard-coding in this macro and not by delegating to the image type, then > it is inappropriate to ask the image-type-based macro to decide how to > declare the blend factor. If anything, we should use > "DeclareAlphaVarFor4ByteArgb()" to declare the variable, not an > image-based macro. Note that the Declare() macros for the alpha math > strategies take an argument called "VAR" so then you can assume that > they've named the variable the same token that you supplied. > > My recommendation would be to delete the new Declare macros and replace > their use in LoopMacros.h with a usage of the existing > DeclareAlphaVarFor4ByteArgb() macro, and then I would move the resulting > variable into the argument list for the Store ## BlendFactor macro and > move the assignment part of the statement inside of the macro... > > ...jim > > On 4/27/16 6:37 AM, Prahalad Kumar Narayanan wrote: >> Dear Jim and Everyone on Java2D Community >> >> Good day to you. >> >> First, Thanks to Jim for his feedback. >> I 've incorporated his feedback in latest version of code and webrev link is >> shared below >> Webrev Link: >> http://cr.openjdk.java.net/~aghaisas/prahalad/8015070/webrev.05/ >> >> Quick Description on Changes : >> 1. Removing redundant code MultiplyAndStore4ByteArgbComps(res, resA, >> SRC_PREFIX) >> The redundant code within if and else blocks of if ( mixValSrc != >> 0xff ) has been moved into block that does color blending. >> As Jim rightly said, the change would execute Multiply operation >> only if blending is required >> On the other hand, If resA reaches maximum value, the fast path is >> executed. >> >> 2. Fast path execution >> The fast path that directly sets foreground color has been moved >> into else block after checking if ( resA != MaxValFor4ByteArgb ) >> Conceptually, resA would be maximum only when glyphAlpha (mixValSrc) >> and srcAlpha are maximum. >> >> 3. Isolated Declare and Store macros. >> A single macro DeclareAndInit... has been split into two macros >> Declare ## DST ## SrcOverDstBlendFactor and Store ## DST ## >> SrcOverDstBlendFactor. >> This would indeed address some compilers that do not allow >> declaration in the middle of the block. >> >> 4. Other Relevant Information: >> The changes have been verified to build on all platforms through >> JPRT auto build system. >> Java2D benchmarks were run with the changes and no degradation on >> performance was seen compared to webrev.04 >> Regression was run and no new regression failures have been >> introduced with the change. >> >> 5. To reduce reviewer 's effort in review, code has been expanded with >> comments herewith. >> >> #define GlyphListAABlend4ByteArgb(DST, GLYPH_PIXELS, PIXEL_INDEX, DST_PTR, \ >> FG_PIXEL, PREFIX, SRC_PREFIX) \ >> do { \ >> DeclareAlphaVarFor4ByteArgb(resA) \ >> DeclareCompVarsFor4ByteArgb(res) \ >> jint mixValSrc = GLYPH_PIXELS[PIXEL_INDEX]; \ >> if (mixValSrc) { \ >> /* Evaluating srcAlpha component */ \ >> if (mixValSrc != 0xff) { \ >> /* No-op. Retained to match reference as SRCOVER_MASK_FILL >> */ \ >> PromoteByteAlphaFor4ByteArgb(mixValSrc); \ >> /* Glyph mask determines visibility of the srcColor */ \ >> resA = MultiplyAlphaFor4ByteArgb(mixValSrc, SRC_PREFIX ## >> A); \ >> } else { \ >> /* This is understood easier with floating point logic. */ \ >> /* 1.0f is max value used in floating point calculation */ \ >> /* (1.0f * srcA) gives srcA. Hence we directly consider */ \ >> /* srcA without Multiply with glyph mask. */ \ >> resA = SRC_PREFIX ## A; \ >> } \ >> /* Blending srcColor and dstColor */ \ >> /* This is required only when resA(i.e., srcA) is not maximum */ >> \ >> if (resA != MaxValFor4ByteArgb) { \ >> DeclareAndInvertAlphaVarFor4ByteArgb(dstF, resA) \ >> DeclareAndClearAlphaVarFor4ByteArgb(dstA) \ >> DeclareCompVarsFor4ByteArgb(dst) \ >> DeclareCompVarsFor4ByteArgb(tmp) \ >> /* Redundant statement moved from previous if -else block */ >> \ >> MultiplyAndStore4ByteArgbComps(res, resA, SRC_PREFIX); \ >> /* Based on the pixelFormat we could reduce calculations */ \ >> /* done to load the color and alpha components */ \ >> if (!(DST ## IsPremultiplied)) { \ >> Load ## DST ## To4ByteArgb(DST_PTR, pix, PIXEL_INDEX, \ >> dstA, dstR, dstG, dstB); \ >> Store4ByteArgbCompsUsingOp(tmp, =, dst); \ >> } else { \ >> Declare ## DST ## AlphaLoadData(DstPix) \ >> jint pixelOffset = PIXEL_INDEX * (DST ## PixelStride); \ >> DST ## DataType *pixelAddress = PtrAddBytes(DST_PTR, \ >> >> pixelOffset); \ >> /* The macro's implementation loads color components */ >> \ >> /* without divide by alpha adjustment as required for */ >> \ >> /* subsequent calculations. Note: This is used only */ >> \ >> /* with preMultiplied alpha based pixel formats */ \ >> LoadAlphaFrom ## DST ## For4ByteArgb(pixelAddress, \ >> DstPix, \ >> dst); \ >> Postload4ByteArgbFrom ## DST(pixelAddress, \ >> DstPix, \ >> tmp); \ >> } \ >> /* Avoid blending operatons if dst is fully transparent */ \ >> if (dstA) { \ >> Declare ## DST ## SrcOverDstBlendFactor(blendF) \ >> /* dstA would be 0 if either of the following is true. >> */ \ >> /* 1. srcA is max. Parent if condition validates this. >> */ \ >> /* 2. dstA is zero. The current if condition validates >> */ \ >> /* Henceforth, the following Multiply need not be moved >> */ \ >> /* ahead of the if condition. This also helps to better >> */ \ >> /* performance */ \ >> dstA = MultiplyAlphaFor4ByteArgb(dstF, dstA); \ >> resA += dstA; \ >> /* Declares blendF variable and assigns appropriate */ \ >> /* alpha value. The definitions are contained in the*/ \ >> /* header files of respective pixel formats */ \ >> blendF = Store ## DST ## SrcOverDstBlendFactor(dstF, \ >> dstA); \ >> /* This is understood easier with floating point >> logic.*/ \ >> /* 1.0f is max value used in floating point >> calculation*/ \ >> /* (1.0f * tmp) gives tmp. Hence we avoid Multiply >> */ \ >> /* operation and directly add loaded color to result*/ \ >> if (blendF != MaxValFor4ByteArgb) { \ >> MultiplyAndStore4ByteArgbComps(tmp, \ >> blendF, \ >> tmp); \ >> } \ >> Store4ByteArgbCompsUsingOp(res, +=, tmp); \ >> } \ >> } else { \ >> /* resA is maximum only when srcA and glyphAlpha are max >> */ \ >> /* Hence the fast path to store foreground pixel color and >> */ \ >> /* break to the next pixel. */ \ >> Store ## DST ## PixelData(DST_PTR, PIXEL_INDEX, \ >> FG_PIXEL, PREFIX); \ >> break; \ >> } \ >> /* In the above calculations, color values are multiplied with >> */ \ >> /* with alpha. This is perfectly fine for pre-Multiplied alpha >> */ \ >> /* based pixel formats. For non pre-multiplied alpha based >> */ \ >> /* pixel formats, the alpha is removed from color components >> */ \ >> /* and then stored to the resulting color */ \ >> if (!(DST ## IsOpaque) && \ >> !(DST ## IsPremultiplied) && resA && \ >> resA < MaxValFor4ByteArgb) \ >> { \ >> DivideAndStore4ByteArgbComps(res, res, resA); \ >> } \ >> Store ## DST ## From4ByteArgbComps(DST_PTR, pix, \ >> PIXEL_INDEX, res); \ >> } \ >> } while (0); >> >> Kindly review the changes present in the webrev and provide your views. >> >> Thank you once again for your effort in review >> Have a great day >> >> Prahalad N. >> >> >> -----Original Message----- >> From: Jim Graham >> Sent: Wednesday, April 27, 2016 2:12 AM >> To: Prahalad Kumar Narayanan; 2d-dev@openjdk.java.net >> Subject: Re: [OpenJDK 2D-Dev] [2D-Dev] Review Request: JDK-8015070: >> Antialiased text on translucent backgrounds gets bright artifacts >> >> Hi Prahalad, >> >> One potential portability issue - you have a "DeclareAndInit" macro for the >> new "choose which blend factor" macro in the middle of a block. >> Some C compilers allow declaring a new variable in the middle of a block, >> but not all. Since it would be hard to move the declaration to the top of >> the block, since it depends on a value computed in the first couple of >> lines, it might be better to change it from a "DeclareAndInit" >> style macro into a regular declaration, and a macro to get the value, so >> "jint blendF;" at the top of the block and "blendF = >> SrcOverDstBlendFactorFor ## DST(...);" in the middle of the block. (Or name >> it "SrcOver ## DST ## BlendFactor(...)?" >> >> I just noticed something about this .04 version of the function. At the top >> you have a test for mixValSrc != 255 and then both clauses of the if >> statement end with essentially the same code, multiplying the source by the >> value in resA. (The version in the else clause uses SRC_PREFIX ## A, but it >> could just as easily also use resA since they are the same value at that >> point.) >> >> This means you could potentially move both "MultiplyAndStore" macros until >> after the if and use resA as the multiplier. Now, if you look, the >> immediate next statement tests if resA is maxVal. If it is maxVal, then you >> don't need to do that multiply at all, so the macro could be moved even >> further to be inside the "if (resA != maxVal)" block. >> >> At that point, you could then reinstate the "else" on that if block and move >> the Store##DST##PixelData into that else, to save you another test of resA. >> This would essentially look like this: >> >> if (mixValSrc != 255) { >> PromoteByteAlpha...; >> resA = MultiplyAlpha...; >> } else { >> resA = SRC_PREFIX ## A; >> } >> if (resA != MaxVal) { >> Declare,declare,declare,declare...; >> Multiply...Comps(res, resA, SRC_PREFIX); } else { >> Store ## DST ## PixelData...; >> } >> >> It shortens the code a little, but I'm not sure if it would really help >> performance other than not having to do the test for maxVal twice. >> Still, tests are fairly expensive in code like this so it could be worth a >> shot at testing, and it would simplify the code a bit in either case... >> >> ...jim >> >> On 4/15/16 12:25 AM, Prahalad Kumar Narayanan wrote: >>> Hello Jim & Everyone on Java2D Community >>> >>> Good day to you. >>> >>> This is a review request and a follow-up to the bug-fix for the issue >>> Bug : JDK-8015070 >>> Link : https://bugs.openjdk.java.net/browse/JDK-8015070 >>> >>> Webrev Link : >>> http://cr.openjdk.java.net/~psadhukhan/prahlad/8015070/webrev.04/ >>> >>> Quick Inferences on Changes in Current -Webrev.04 >>> >>> 1 ) Subtle changes to the blend loop- >>> . The subtle changes taken up, have helped to improve the >>> performance. >>> . With the current logic in webrev.04, Java2DBench reports better >>> performance than Un-Optimized solution that was present in webrev.00 >>> . J2DBench was run for Font Styles { Plain, Bold, Italic, Bold n >>> Italic } and Font Sizes { 12, 20, 36, 72 } >>> . My sincere thanks to Jim for all his detailed feedback through >>> the multiple reviews that has evolved the solution today. >>> >>> (Details on changes) >>> 1.a. Loading of Color components >>> . When modelled as per SRCOVER_MASK_FILL code, the logic required >>> few additional calculations to load color components. >>> . The extra calculations indeed impacted performance figures. >>> . This could be offset in two possible ways >>> a. Inspect parent macro- NAME_SOLID_DRAWGLYPHLISTAA and >>> advance by pixel address and not by pixel index. >>> The parent macro invokes >>> GlyphListAABlendFourByteArgb through this macro- GlyphListAABlend ## >>> STRATEGY(DST, pixels, x, pPix, fgpixel, solidpix, src); >>> Changing parent macro will cause spurious >>> changes across GlyphListAABlend ## other pixel formats. >>> There is additional risk of breaking the >>> stable and well optimized LoopMacros.h. >>> b. Load color components based on pre-Multiplication status >>> This has been taken up and change is limited >>> to the function being modified. >>> Thankfully J2DBench has still reported >>> improvement in performance. >>> >>> 1.b. New Macro to avoid if (DST ## IsPremultiplied) { >>> . A new macro- DeclareAndInit ## DST ## SrcOverMaskBlendFactor has >>> been introduced to choose between dstF, or dstA >>> . The implementation is available in the header files of pixel >>> formats ( Eg: IntArgb.h IntArgbPre.h ) >>> . There are 29 different pixel formats known to Java2D, and >>> . Hence, the new macro's implementation is added only to >>> those pixel formats that require the current glyph blending logic. >>> >>> 2 ) Testing across different formats >>> . The Regression test code has been modified to test anti-aliased >>> text rendering on 7 different pixel formats- >>> . IntArgb, IntArgb_Pre, FourByteAbgr, FourByteAbgr_Pre, >>> IntRGB, IntBGR, 3ByteBGR. >>> . As expected, the test fails without the fix on JDK 8 and JDK 7 >>> versions & passes with JDK 9-internal containing the fix. >>> >>> 3 ) Explanation on Code Changes : >>> . With multiple reviews and changes, today the code fixes the bug >>> and is well optimized as well. >>> . For ease of reviewer and effort in review, I 've explained the >>> logic with /* comment statements */ herewith. >>> . Note: These comments don't figure in the webrev. >>> As one cannot guarantee how /* comments */ within >>> macros would be perceived by compiler across different platforms. >>> >>> #define GlyphListAABlend4ByteArgb(DST, GLYPH_PIXELS, PIXEL_INDEX, DST_PTR, \ >>> >>> FG_PIXEL, PREFIX, SRC_PREFIX) \ >>> do { \ >>> DeclareAlphaVarFor4ByteArgb(resA) \ >>> DeclareCompVarsFor4ByteArgb(res) \ >>> jint mixValSrc = GLYPH_PIXELS[PIXEL_INDEX]; \ >>> if (mixValSrc) { \ >>> /* Evaluating srcColor components */ \ >>> if (mixValSrc != 0xff) { \ >>> /* No-op. Retained to match reference as SRCOVER_MASK_FILL >>> */ \ >>> PromoteByteAlphaFor4ByteArgb(mixValSrc); \ >>> /* Glyph mask determines visibility of the srcColor */ \ >>> resA = MultiplyAlphaFor4ByteArgb(mixValSrc, SRC_PREFIX ## >>> A); \ >>> MultiplyAndStore4ByteArgbComps(res, resA, SRC_PREFIX); \ >>> } else { \ >>> /* If mixValSrc and srcA are maximum, then result color is >>> */ \ >>> /* opaque. Hence the fast path to store foreground pixel >>> */ \ >>> /* color and return. */ \ >>> if (SRC_PREFIX ## A == MaxValFor4ByteArgb) { \ >>> Store ## DST ## PixelData(DST_PTR, PIXEL_INDEX, \ >>> FG_PIXEL, PREFIX); \ >>> break; \ >>> } \ >>> /* This is understood easier with floating point logic. */ \ >>> /* 1.0f is max value used in floating point calculation */ \ >>> /* (1.0f * srcA) gives srcA. Hence we directly consider */ \ >>> /* srcA without Multiply with glyph mask. */ \ >>> resA = SRC_PREFIX ## A; \ >>> MultiplyAndStore4ByteArgbComps(res, \ >>> SRC_PREFIX ## A, \ >>> SRC_PREFIX); \ >>> } \ >>> /* Evaluating dstColor components */ \ >>> /* This is required only when resA(i.e., srcA) is not maximum >>> */ \ >>> if (resA != MaxValFor4ByteArgb) { \ >>> DeclareAndInvertAlphaVarFor4ByteArgb(dstF, resA) \ >>> DeclareAndClearAlphaVarFor4ByteArgb(dstA) \ >>> DeclareCompVarsFor4ByteArgb(dst) \ >>> DeclareCompVarsFor4ByteArgb(tmp) \ >>> /* Based on the pixelFormat we could reduce calculations */ >>> \ >>> /* done to load the color and alpha components */ \ >>> if (!(DST ## IsPremultiplied)) { \ >>> Load ## DST ## To4ByteArgb(DST_PTR, pix, PIXEL_INDEX, \ >>> dstA, dstR, dstG, dstB); \ >>> Store4ByteArgbCompsUsingOp(tmp, =, dst); \ >>> } else { \ >>> Declare ## DST ## AlphaLoadData(DstPix) \ >>> jint pixelOffset = PIXEL_INDEX * (DST ## PixelStride); \ >>> DST ## DataType *pixelAddress = PtrAddBytes(DST_PTR, \ >>> >>> pixelOffset); \ >>> /* The macro's implementation loads color components >>> */ \ >>> /* without divide by alpha adjustment as required for >>> */ \ >>> /* subsequent calculations. Note: This is used only >>> */ \ >>> /* with preMultiplied alpha based pixel formats */ \ >>> LoadAlphaFrom ## DST ## For4ByteArgb(pixelAddress, \ >>> DstPix, \ >>> dst); \ >>> Postload4ByteArgbFrom ## DST(pixelAddress, \ >>> DstPix, \ >>> tmp); \ >>> } \ >>> /* Avoid blending operatons if dst is fully transparent */ \ >>> if (dstA) { \ >>> /* dstA would be 0 if either of the following is true. >>> */ \ >>> /* 1. srcA is max. Parent if condition validates this. >>> */ \ >>> /* 2. dstA is zero. The current if condition validates >>> */ \ >>> /* Henceforth, the following Multiply need not be >>> moved*/ \ >>> /* ahead of the if condition. This also helps to >>> better*/ \ >>> /* performance */ \ >>> dstA = MultiplyAlphaFor4ByteArgb(dstF, dstA); \ >>> resA += dstA; \ >>> /* Declares blendF variable and assigns appropriate */ \ >>> /* alpha value. The definitions are contained in the*/ \ >>> /* header files of respective pixel formats */ \ >>> DeclareAndInit ## DST ## SrcOverDstBlendFactor(dstF, \ >>> dstA, \ >>> blendF); >>> \ >>> /* This is understood easier with floating point >>> logic.*/ \ >>> /* 1.0f is max value used in floating point >>> calculation*/ \ >>> /* (1.0f * tmp) gives tmp. Hence we avoid 3 Multiply >>> */ \ >>> /* operations and add the loaded color to result >>> */ \ >>> if (blendF != MaxValFor4ByteArgb) { \ >>> MultiplyAndStore4ByteArgbComps(tmp, \ >>> blendF, \ >>> tmp); \ >>> } \ >>> Store4ByteArgbCompsUsingOp(res, +=, tmp); \ >>> } \ >>> } \ >>> /* In the above calculations, color values are multiplied with >>> */ \ >>> /* with alpha. This is perfectly fine for pre-Multiplied alpha >>> */ \ >>> /* based pixel formats. For non pre-multiplied alpha based >>> */ \ >>> /* pixel formats, the alpha is removed from color components >>> */ \ >>> /* and then stored to the resulting color. */ \ >>> if (!(DST ## IsOpaque) && \ >>> !(DST ## IsPremultiplied) && resA && \ >>> resA < MaxValFor4ByteArgb) \ >>> { \ >>> DivideAndStore4ByteArgbComps(res, res, resA); \ >>> } \ >>> Store ## DST ## From4ByteArgbComps(DST_PTR, pix, \ >>> PIXEL_INDEX, res); \ >>> } \ >>> } while (0); >>> >>> My apologies if the above code did not appear on the final webrev email. >>> ( In few instances, the newlines don't appear in plain-text format ) >>> >>> Kindly review the changes present in webrev and provide your views. >>> If the changes are good, we could take up for the code check-in. >>> >>> Thank you for your time in review >>> Have a good day >>> >>> Prahalad N. >>> >>> >>> >>> -----Original Message----- >>> From: Jim Graham >>> Sent: Tuesday, April 05, 2016 3:07 AM >>> To: Prahalad Kumar Narayanan; Sergey Bylokhov; Philip Race >>> Cc: Praveen Srivastava >>> Subject: Re: [OpenJDK 2D-Dev] [2D-Dev] Review Request: JDK-8015070: >>> Antialiased text on translucent backgrounds gets bright artifacts >>> >>> Hi Prahalad, >>> >>> Can I see the webrev diffs for the "today's experiment" code with the new >>> macro? >>> >>> Also, Did you test this with opaque destinations? The most common use >>> of text is on an opaque destination, so that would matter more. I >>> would >>> suggest: INT_RGB, THREE_BYTE_BGR, INT_ARGB, INT_ARGB_PRE in that order of >>> precedence of importance. Also, test with translucent colors, though those >>> are less important than opaque colors. >>> >>> I'm still looking at why the non-pre would be slower than the pre. >>> About the only difference is the one line "if (!PRE) { DSTF = DSTA; }". >>> One suggestion might be to move the test for transparent destination up a >>> couple of lines, and to get rid of the extra "DSTF = dstA" >>> assignement with the following structure: >>> >>> dstA = Mult...(); >>> if (dstA) { >>> resA += dstA; >>> Declare... >>> Postload... >>> if (DST ## IsPremultiplied) { >>> MultiplyAndStore(..., DSTF, ...); >>> } else { >>> MultiplyAndStore(..., dstA, ...); >>> } >>> Store... >>> } >>> >>> Basically, dstA == 0 is the actual test for "do we need to even try to >>> blend the destination in here". If it is zero then there is no need to add >>> dstA to resA and there is no need to adjust the factor we blend by >>> (MultiplyAndStore). It can be triggered by either a transparent >>> destination pixel or an opaque source pixel, but either way, dstA is the >>> right test, not DSTF. The second part, eliminating the DSTF=dstA >>> assignment, gets rid of a line that might trip up the optimizer by simply >>> having the macro expand differently for the two types. To be even more >>> streamlined, we could create a new set of macros: >>> >>> // In the header files for PRE types: >>> #define SRCOVER_DST_BLEND_FACTOR_<TYPE>(dF, dA) (dF) >>> // In the header files for non-PRE types: >>> #define SRCOVER_DST_BLEND_FACTOR_<TYPE>(dF, dA) (dA) >>> >>> Then we wouldn't need the test above for "if (DST ## Pre)", we would just >>> expand the macro with: >>> MultiplyAndStore(..., SRCOVER_DST_BLEND_FACTOR ## DST(DSTF, dstA), ...) >>> which would hopefully confuse the optimizer even less. >>> >>> If you want to really eliminate the pixel address computations, you could >>> rewrite the calling loop so that it steps along a pixel pointer rather than >>> using indexing. Have the calling function/macro set up a pRas pointer to >>> the pixel and step that along the row by TYPE##PixelStride as it iterates >>> across the glyph, then hand that into the Glyph blend macro as DST_PTR (and >>> eliminate PIXEL_INDEX as it would be "0" in this case)... >>> >>> ...jim >>> >>> On 4/4/16 4:37 AM, Prahalad Kumar Narayanan wrote: >>>> Dear Jim >>>> >>>> Good day to you. >>>> >>>> ( Just in-case, you had missed my long Friday 's email ) >>>> >>>> Quick Recap of Proceedings >>>> >>>> 1.On Friday, I had profiled two solutions that fix the bug- >>>> JDK-8015070, using Java2D Bench >>>> >>>> 2.Profiling was done for 16 test cases with different combinations of >>>> >>>> a.Font Style: Plain, Bold, Italic, Bold-Italic >>>> >>>> b.Font Size: 12, 20, 36, 72 >>>> >>>> 3.Result from Friday 's profiling experiments: >>>> >>>> a.Regular Solution (Un-optimized) : This was observed to be faster >>>> for IntArgb pixel format >>>> >>>> b.Optimized Solution (based on SrcOver_MaskFill with fast path) : >>>> This was observed to be faster for IntArgb_Pre pixel format >>>> >>>> Update from Today's Experiments >>>> >>>> 1.First, I understood that new calculations introduced (pixelAddress >>>> computation) impacted performance of optimized solution in IntArgb format. >>>> >>>> 2.Henceforth, I made the following changes, while loading destination >>>> color: >>>> >>>> a.Check if the pixel format is PreMultiplied >>>> >>>> b.If the format is preMultiplied, then > take up new calculations and >>>> use LoadAlphaFrom ## DST ## For4ByteArgb macro that does *Not* cause >>>> divide by alpha adjustment >>>> >>>> c.If the format is regular Argb, then > take up loading of colors >>>> using standard Load ## DST ## To4ByteArgb >>>> >>>> 3.Once the release build was available, Java2D Bench was re-run >>>> (using pre-saved options file) >>>> >>>> Result from Bench metrics: >>>> >>>> a.In most of the test cases, the optimized solution has higher metric : >>>> Avg characters/ second for both IntArgb and IntArgb_Pre formats >>>> >>>> b.In 6 / total-16 test cases, optimized solution was marginally lower >>>> than the metrics of Regular un-optimized algorithm (only for >>>> IntArgb_Pre) >>>> >>>> c.However, J2DAnalyzer reported that even these 6-test cases were >>>> within 10% deviation. Hence the algorithms were categorized to be >>>> 'same' in performance. >>>> >>>> Suggestion Required >>>> >>>> 1.The attached zip file, contains Algorithms.cpp - Which lists down >>>> different algorithms profiled so far. >>>> >>>> 2.I 've introduced comments within the macro to explain the change. >>>> >>>> a.Note: These comments will be removed from the final source code to >>>> be checked in. >>>> >>>> 3.Kindly review the latest algorithm (for any issues/ bugs) and >>>> provide your suggestions. >>>> >>>> a.latest algorithm can be easily traced by searching for >>>> "LoadOptimized Algorithm v3" within the file. >>>> >>>> Thank you for your time in review & detailed feedback that you get >>>> every time. >>>> >>>> Every such review improves the quality of code & the solution >>>> >>>> Prahalad N. >>>> >>>> *From:* Prahalad Kumar Narayanan >>>> *Sent:* Friday, April 01, 2016 5:07 PM >>>> *To:* Jim Graham; Sergey Bylokhov; Philip Race >>>> *Cc:* Praveen Srivastava >>>> *Subject:* RE: [OpenJDK 2D-Dev] [2D-Dev] Review Request: JDK-8015070: >>>> Antialiased text on translucent backgrounds gets bright artifacts >>>> >>>> Dear Jim >>>> >>>> Good day to you. >>>> >>>> Thanks for your suggestions in the reviews that has evolved the fix >>>> to the bug. >>>> >>>> As guided by you, I measured the performance of Text Rendering with >>>> Text Antialiasing Hint on using Java2D Bench Tool. >>>> >>>> Solutions Profiled: >>>> >>>> We have two solutions - Un optimized solution and Optimized >>>> solution modelled as per SRCOVER_MASKFILL >>>> >>>> ( Both the solutions were profiled on windows x86_64 with >>>> JDK 9-internal Release build ) >>>> >>>> Test Cases Profiled: >>>> >>>> With Font set to : Lucida sans, different combinations of >>>> >>>> Font Styles : Plain, Bold, Italic, Bold Italic && >>>> >>>> Font Sizes : 12, 20, 36, 72 points were tested. >>>> >>>> Attached herewith: JDK8015070-Profiling Data.zip >>>> >>>> The archive contains: >>>> >>>> 1. Algorithms.cpp : Just to have a quick glance of all the >>>> algorithms that we have tried so far. >>>> >>>> 2. *.txt Files : For each test, Java2D bench >>>> reports the average metrics/second. >>>> >>>> The text file >>>> contains collection of all such average metric for nearly 16 >>>> different test cases. >>>> >>>> 3. res Output : .res output from each of the test runs. >>>> >>>> Observation from J2DBench Reports >>>> >>>> 1. I could not get time to run the Analyzer tool across >>>> each of these *.res files. I shall do them on Monday. >>>> >>>> 2. From the summary text ( average chars per. Second ) that >>>> J2DBench reported, >>>> >>>> Un-optimized solution seems to be better for >>>> IntArgb pixel format and >>>> >>>> Optimized solution is better for IntArgb_Pre >>>> pixel format by significant margin. >>>> >>>> 3. I tried to improve the optimized algorithm (based on >>>> SRCOVER_MASKFILL ) further by adding a if (dstA) { ... >>>> >>>> Though there is a marginal improvement, the >>>> optimized solution could not exceed numbers of regular algorithm (for >>>> IntArgb pixel format) >>>> >>>> This could be due to the extra calculations that >>>> we do in-order to load color components separately. >>>> >>>> However, for IntArgb_Pre pixel format, the >>>> optimized solution is way-ahead and gives better performance. >>>> >>>> 4. In the summary reports, you will find >>>> CompatibleBufferedImage( Translucent ) as well. >>>> >>>> In a separate experiment, I found that the pixel >>>> format for compatible buffered image got mapped IntArgb_Pre. >>>> >>>> Thus, the performance numbers match with that of >>>> IntArgb_Pre pixel format >>>> >>>> At the present moment, I 'm confused on the solution would be better >>>> to fix the Bug. >>>> >>>> Kindly share your views and thoughts >>>> >>>> Thank you >>>> >>>> Have a good day >>>> >>>> Prahalad N. >>>> >>>> -----Original Message----- >>>> From: Jim Graham >>>> Sent: Thursday, March 31, 2016 1:46 AM >>>> To: Prahalad Kumar Narayanan; 2d-dev@openjdk.java.net >>>> <mailto:2d-dev@openjdk.java.net>; Sergey Bylokhov >>>> Subject: Re: [OpenJDK 2D-Dev] [2D-Dev] Review Request: JDK-8015070: >>>> Antialiased text on translucent backgrounds gets bright artifacts >>>> >>>> Hi Prahalad, >>>> >>>> Benchmarks must run for a significant period of time to be valid. >>>> >>>> Measuring with nanoTime() is fine, but the run times should exceed at >>>> least a couple of seconds, not a few nanoseconds. You might want to >>>> use Java2DBench instead (in src/demo/share/java2d in the java.desktop >>>> repo), which calibrates test times, does multiple runs, and includes >>>> an analyzer to compare results from multiple test runs... >>>> >>>> ...jim >>>> >>>> On 3/30/2016 4:27 AM, Prahalad Kumar Narayanan wrote: >>>> >>>>> Hello Jim and Everyone on Java2D Group >>>> >>>>> >>>> >>>>> Good day to you. >>>> >>>>> >>>> >>>>> A quick follow-up to Review Request on bug: >>>> >>>>> Bug : JDK-8015070 >>>> >>>>> Bug Link :https://bugs.openjdk.java.net/browse/JDK-8015070 >>>> >>>>> >>>> >>>>> Thank you Jim for the detailed feedback. >>>> >>>>> >>>> >>>>> It takes a lot of time not only in performing the review, but also in >>>>> getting the feedback with clear words. >>>> >>>>> In each review, the solution definitely gets better & better. I 'm >>>> >>>>> happy about it...! :) >>>> >>>>> >>>> >>>>> Quick Inferences from previous feedback: >>>> >>>>> >>>> >>>>> Incorporating the fast path into current logic: >>>> >>>>> 1. I agree with you on this point and I noticed this when we >>>>> modelled the solution as per the mask fill. >>>> >>>>> 2. I ignored it initially for two reasons, >>>> >>>>> a. The statement - if (resA != >>>>> MaxValFor4ByteArgb)... follows srcColor pre-multiplication step and this >>>>> will ensure to skip most of the blending calculations pertaining to >>>>> destination. >>>> >>>>> b. Any addition / tweaks to current logic, might >>>>> deviate from the SRCOVER_MASKFILL reference model. >>>> >>>>> Many a time, managing code with similar >>>>> logic across implementation helps a lot. >>>> >>>>> 3. As you said, including the fast path will avoid few >>>>> multiplications and if checks too. >>>> >>>>> The changes are available in the current webrev. >>>> >>>>> Link: >>>> >>>>> http://cr.openjdk.java.net/~psadhukhan/prahlad/8015070/webrev.03/ >>>> >>>>> >>>> >>>>> Profiling method, and Metrics: >>>> >>>>> 1. The profiling method that was done yesterday was mere >>>>> execution of the regression test (available in the webrev) and time >>>>> measured with System.currentTimeMillis API >>>> >>>>> Since only one iteration was run, the time soared >>>>> into milli seconds. This could be due to internal glyph caching mechanism. >>>> >>>>> 2. Today, I evolved the regression test, into a benchmark that >>>>> does the following: >>>> >>>>> a. Select Font style : {Plain, Bold, Italic, Bold >>>>> Italic} >>>> >>>>> b. Select Font size : {20, 40, 60, 80} >>>> >>>>> c. Trigger drawstring once before benchmark is run. >>>>> This is to ensure, the font subsystem is done with glyph caching >>>>> internally. >>>> >>>>> d. Use famous string that has all characters-" The >>>>> quick brown fox jumps over the lazy dog. 0123456789. " >>>> >>>>> e. For every style-size combination - run the test >>>>> for 20 iterations and take the average. (Note: Font is fixed to >>>>> 'verdana' ) >>>> >>>>> f. Modify the precision from System.currentTimeMillis >>>>> to System.nanoTime() and reduce elapsedTime to micro seconds. >>>> >>>>> >>>> >>>>> 3. With the above setup in code, my observation on windows is as >>>>> follows: >>>> >>>>> a. In many cases, The optimized logic is either >>>>> equal-to (or) better in performance than the un-optimized logic. >>>> >>>>> The difference is very minimal - few tens >>>>> to few hundreds of micro-seconds. >>>> >>>>> b. The optimized algorithm improves performance for >>>>> Pre-multiplied alpha based destination buffer. >>>> >>>>> c. There are occasional huge deviations where >>>>> optimized logic seems to take longer time. >>>> >>>>> These could be due to external factors >>>>> like- stalls for memory, bus io etc., >>>> >>>>> Since, the deviation is in micro seconds, I >>>>> believe, it may not be a concern. >>>> >>>>> d. The complete list of time measurement taken up >>>>> on windows x86_64 release build is as-follows- >>>> >>>>> I 'm not sure, how the data appears in the >>>>> final webrev-email. >>>> >>>>> Kindly excuse, if the data is un-readable. >>>> >>>>> >>>> >>>>> Platform : Windows x86_64 Release Build Algorithm : Unoptimized. >>>> >>>>> webrev.00 >>>> >>>>> ```````````````````````````````````````````````````````````````````` >>>>> ` >>>>> ` >>>> >>>>> ```` Executing Bench For Image Type: IntArgb >>>> >>>>> -Font Style: Plain /Font Size: 20 /Time consumed (us): 84.742 >>>> >>>>> -Font Style: Plain /Font Size: 40 /Time consumed (us): 318.395 >>>> >>>>> -Font Style: Plain /Font Size: 60 /Time consumed (us): 657.474 >>>> >>>>> -Font Style: Plain /Font Size: 80 /Time consumed (us): 813.079 >>>> >>>>> -Font Style: Bold /Font Size: 20 /Time consumed (us): 386.380 >>>> >>>>> -Font Style: Bold /Font Size: 40 /Time consumed (us): 339.301 >>>> >>>>> -Font Style: Bold /Font Size: 60 /Time consumed (us): 492.631 >>>> >>>>> -Font Style: Bold /Font Size: 80 /Time consumed (us): 625.812 >>>> >>>>> -Font Style: Italic /Font Size: 20 /Time consumed (us): 235.059 >>>> >>>>> -Font Style: Italic /Font Size: 40 /Time consumed (us): 320.180 >>>> >>>>> -Font Style: Italic /Font Size: 60 /Time consumed (us): 474.558 >>>> >>>>> -Font Style: Italic /Font Size: 80 /Time consumed (us): 1188.169 >>>> >>>>> -Font Style: Bold-Italic /Font Size: 20 /Time consumed (us): >>>>> 426.988 >>>> >>>>> -Font Style: Bold-Italic /Font Size: 40 /Time consumed (us): >>>>> 374.064 >>>> >>>>> -Font Style: Bold-Italic /Font Size: 60 /Time consumed (us): >>>>> 732.375 >>>> >>>>> -Font Style: Bold-Italic /Font Size: 80 /Time consumed (us): >>>>> 864.68 >>>> >>>>> >>>> >>>>> Executing Bench For Image Type: IntArgb_Pre >>>> >>>>> -Font Style: Plain /Font Size: 20 /Time consumed (us): 129.768 >>>> >>>>> -Font Style: Plain /Font Size: 40 /Time consumed (us): 206.299 >>>> >>>>> -Font Style: Plain /Font Size: 60 /Time consumed (us): 249.941 >>>> >>>>> -Font Style: Plain /Font Size: 80 /Time consumed (us): 362.372 >>>> >>>>> -Font Style: Bold /Font Size: 20 /Time consumed (us): 145.096 >>>> >>>>> -Font Style: Bold /Font Size: 40 /Time consumed (us): 151.589 >>>> >>>>> -Font Style: Bold /Font Size: 60 /Time consumed (us): 240.972 >>>> >>>>> -Font Style: Bold /Font Size: 80 /Time consumed (us): 331.894 >>>> >>>>> -Font Style: Italic /Font Size: 20 /Time consumed (us): 95.028 >>>> >>>>> -Font Style: Italic /Font Size: 40 /Time consumed (us): 245.300 >>>> >>>>> -Font Style: Italic /Font Size: 60 /Time consumed (us): 270.379 >>>> >>>>> -Font Style: Italic /Font Size: 80 /Time consumed (us): 398.139 >>>> >>>>> -Font Style: Bold-Italic /Font Size: 20 /Time consumed (us): >>>>> 93.243 >>>> >>>>> -Font Style: Bold-Italic /Font Size: 40 /Time consumed (us): >>>>> 475.406 >>>> >>>>> -Font Style: Bold-Italic /Font Size: 60 /Time consumed (us): >>>>> 280.085 >>>> >>>>> -Font Style: Bold-Italic /Font Size: 80 /Time consumed (us): >>>>> 357.486 >>>> >>>>> >>>> >>>>> Platform : Windows x86_64 Release Build Algorithm : Optimized. >>>> >>>>> webrev.03 >>>> >>>>> ```````````````````````````````````````````````````````````````````` >>>>> ` >>>>> ` >>>> >>>>> ```` Executing Bench For Image Type: IntArgb >>>> >>>>> -Font Style: Plain /Font Size: 20 /Time consumed (us): 120.954 >>>> >>>>> -Font Style: Plain /Font Size: 40 /Time consumed (us): 364.871 >>>> >>>>> -Font Style: Plain /Font Size: 60 /Time consumed (us): 561.799 >>>> >>>>> -Font Style: Plain /Font Size: 80 /Time consumed (us): 653.390 >>>> >>>>> -Font Style: Bold /Font Size: 20 /Time consumed (us): 261.566 >>>> >>>>> -Font Style: Bold /Font Size: 40 /Time consumed (us): 311.054 >>>> >>>>> -Font Style: Bold /Font Size: 60 /Time consumed (us): 490.735 >>>> >>>>> -Font Style: Bold /Font Size: 80 /Time consumed (us): 656.559 >>>> >>>>> -Font Style: Italic /Font Size: 20 /Time consumed (us): 314.289 >>>> >>>>> -Font Style: Italic /Font Size: 40 /Time consumed (us): 378.750 >>>> >>>>> -Font Style: Italic /Font Size: 60 /Time consumed (us): 491.181 >>>> >>>>> -Font Style: Italic /Font Size: 80 /Time consumed (us): 770.172 >>>> >>>>> -Font Style: Bold-Italic /Font Size: 20 /Time consumed (us): >>>>> 375.336 >>>> >>>>> -Font Style: Bold-Italic /Font Size: 40 /Time consumed (us): >>>>> 571.371 >>>> >>>>> -Font Style: Bold-Italic /Font Size: 60 /Time consumed (us): >>>>> 548.300 >>>> >>>>> -Font Style: Bold-Italic /Font Size: 80 /Time consumed (us): >>>>> 714.526 >>>> >>>>> >>>> >>>>> Executing Bench For Image Type: IntArgb_Pre >>>> >>>>> -Font Style: Plain /Font Size: 20 /Time consumed (us): 45.026 >>>> >>>>> -Font Style: Plain /Font Size: 40 /Time consumed (us): 219.016 >>>> >>>>> -Font Style: Plain /Font Size: 60 /Time consumed (us): 279.617 >>>> >>>>> -Font Style: Plain /Font Size: 80 /Time consumed (us): 282.829 >>>> >>>>> -Font Style: Bold /Font Size: 20 /Time consumed (us): 51.809 >>>> >>>>> -Font Style: Bold /Font Size: 40 /Time consumed (us): 117.563 >>>> >>>>> -Font Style: Bold /Font Size: 60 /Time consumed (us): 508.049 >>>> >>>>> -Font Style: Bold /Font Size: 80 /Time consumed (us): 402.802 >>>> >>>>> -Font Style: Italic /Font Size: 20 /Time consumed (us): 79.320 >>>> >>>>> -Font Style: Italic /Font Size: 40 /Time consumed (us): 227.473 >>>> >>>>> -Font Style: Italic /Font Size: 60 /Time consumed (us): 330.488 >>>> >>>>> -Font Style: Italic /Font Size: 80 /Time consumed (us): 353.782 >>>> >>>>> -Font Style: Bold-Italic /Font Size: 20 /Time consumed (us): >>>>> 54.687 >>>> >>>>> -Font Style: Bold-Italic /Font Size: 40 /Time consumed (us): >>>>> 235.505 >>>> >>>>> -Font Style: Bold-Italic /Font Size: 60 /Time consumed (us): >>>>> 227.205 >>>> >>>>> -Font Style: Bold-Italic /Font Size: 80 /Time consumed (us): >>>>> 324.308 >>>> >>>>> >>>> >>>>> Updated webrev with changes for the fast-path : >>>> >>>>> http://cr.openjdk.java.net/~psadhukhan/prahlad/8015070/webrev.03/ >>>> >>>>> Kindly review and provide your suggestions. >>>> >>>>> >>>> >>>>> Thank you once again for detailed review and feedback Have a good >>>>> day >>>> >>>>> >>>> >>>>> Prahalad N. >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> -----Original Message----- >>>> >>>>> From: Jim Graham >>>> >>>>> Sent: Wednesday, March 30, 2016 2:46 AM >>>> >>>>> To: Prahalad Kumar Narayanan;2d-dev@openjdk.java.net >>>>> <mailto:2d-dev@openjdk.java.net>; Sergey Bylokhov >>>> >>>>> Subject: Re: [OpenJDK 2D-Dev] [2D-Dev] Review Request: JDK-8015070: >>>> >>>>> Antialiased text on translucent backgrounds gets bright artifacts >>>> >>>>> >>>> >>>>> Hi Prahalad, >>>> >>>>> >>>> >>>>> This latest version looks like it should produce correct answers. >>>> >>>>> >>>> >>>>> I'd like to see benchmark results on more platforms, particularly Windows >>>>> since the different compilers may optimize these things differently. >>>> >>>>> >>>> >>>>> Also, you didn't mention what data set you used for benchmarking. >>>>> I'd >>>> >>>>> like to see benchmark results for small, medium and large font >>>>> sizes, >>>> >>>>> and possibly bold and italic fonts as well. The reason is that the >>>> >>>>> relative ratios of "empty glyph pixels" to "partial glyph pixels" to >>>> >>>>> "fully covered glyph pixels" changes depending on the font type and >>>> >>>>> size so if you made one of those faster and another slower then the >>>> >>>>> results may be seen as a gain in one type of test if you only test >>>>> one >>>> >>>>> font type and size and it happens to match the part of the code that >>>> >>>>> is more streamlined. Also, for small font sizes the per-glyph >>>> >>>>> overhead might hide per-pixel issues. Please share which fonts and >>>> >>>>> sizes you used for testing and consider using some different sizes, >>>> >>>>> including something very large like 36 or 48 points (something with >>>> >>>>> large areas of opaque >>>> >>>>> pixels) as well as more normal sizes that appear in GUIs. Also, bold >>>>> fonts can have a higher percentage of opaque pixels. >>>> >>>>> >>>> >>>>> In particular... >>>> >>>>> >>>> >>>>> This latest version is missing the "fast path" from the existing code for >>>>> the case of srcA == 255 and glyphA == 255 where it just stores the >>>>> FG_PIXEL values directly. For large fonts/glyphs that case may be as >>>>> important as detecting empty glyph pixels. >>>> >>>>> >>>> >>>>> On the other hand, an additional "if" may cause the compiler to generate >>>>> less efficient code as per Sergey's concern. Since this "if" eliminates >>>>> some multiplies and possibly some divides, I'd hope it would be a win-win. >>>> >>>>> >>>> >>>>> You could add the fast path back inside the case where mixValSrc is 255 >>>>> and just test srcA for 255 and do the Store ## DST ## PixelData() macro >>>>> that used to be at the end of the block in the existing code, and then >>>>> use "break;" to escape out of the do/while surrounding the whole macro so >>>>> it moves on to the next pixel. >>>> >>>>> >>>> >>>>> (Arguably, we might want to consider teaching our SRCOVER_MASKFILL >>>>> to >>>> >>>>> do the same thing. I think that was one of the added values of >>>>> having >>>> >>>>> a separate GLYPH loop, but really both should be optimizing that >>>> >>>>> case...) >>>> >>>>> >>>> >>>>> I can see now that the macro switch to use the same macro set as >>>>> SRCOVER_MASKFILL required you to compute the pixel address, as you noted >>>>> in your summary. It makes the new macro more cumbersome and makes it >>>>> look like it's doing a bit more work per-pixel, but in reality I think >>>>> the overall operations end up being the same as long as the compiler >>>>> optimizes the deliberate multiplications the same way it did for implicit >>>>> multiplications in the "pRas[foo]" and "pRas[foo*4]" code that was being >>>>> generated previously. Benchmarks will tell us if we're good there... >>>> >>>>> >>>> >>>>> ...jim >>>> >>>>> >>>> >>>>> >>>> >>>>> On 3/28/16 5:33 AM, Prahalad Kumar Narayanan wrote: >>>> >>>>>> Hello Everyone on Java2D Group >>>> >>>>>> >>>> >>>>>> Good day to you. >>>> >>>>>> >>>> >>>>>> This is a follow-up to Review Request on bug: >>>> >>>>>> Bug : JDK-8015070 >>>> >>>>>> Bug Link :https://bugs.openjdk.java.net/browse/JDK-8015070 >>>> >>>>>> >>>> >>>>>> First, Thanks to Jim and Sergey for their feedback on the changes so far. >>>> >>>>>> >>>> >>>>>> Inferences from Jim 's Feedback on Loading destination colors: >>>> >>>>>> 1. The API or Macro that I had earlier used to Load DST colors, >>>>>> indeed, adjusted destination color components with divide-by-alpha if >>>>>> destination was already pre-multiplied. >>>> >>>>>> My apologies.. I should have spotted this error at >>>>>> the first iteration itself. >>>> >>>>>> 2. Due to the divide-by-alpha adjustment, the remaining >>>>>> logic would become incorrect. ( Especially, the multiplication with >>>>>> dstF based on pre-mulitplication status ) >>>> >>>>>> 3. Correct API is being used now, and the dstColor components are >>>>>> loaded directly without any divide-by-alpha adjustment. >>>> >>>>>> >>>> >>>>>> Inferences from Sergey's Feedback on Performance. >>>> >>>>>> 1. To set the context for everyone, the logic present in the >>>>>> current webrev.02 is modelled as per SRCOVER_MASKFILL. >>>> >>>>>> There are multiple if (...) conditions that remove >>>>>> un-necessary blending calculations. Ideally this should improve >>>>>> performance. >>>> >>>>>> However, since some data are not readily available >>>>>> (as present in SRCOVER_MASKFILL), few additional calculations have been >>>>>> added. >>>> >>>>>> They are: pre-multiplying srcColor with alpha and >>>>>> assigning to res. >>>> >>>>>> Finding the correct address of >>>>>> Pixel using DST_PTR and PixelStride. >>>> >>>>>> Henceforth, as Sergey suggests, Observation on >>>>>> performance will be beneficial. >>>> >>>>>> 2. The performance of the new logic was measured with >>>>>> linux-x86_64-normal-server-release config and compared with the >>>>>> logic used in un-optimized code in webrev.00 >>>> >>>>>> 3. Result: The newer logic provides a fractional gain (about 15 - >>>>>> 20 ms) over the older logic. >>>> >>>>>> >>>> >>>>>> Other Subtle Changes: >>>> >>>>>> 1. The test file has been renamed from >>>>>> AADrawStringArtifact.java to more meaningful - >>>>>> AntialiasedTextArtifact.java >>>> >>>>>> 2. The test file tests for both TYPE_INT_ARGB and >>>>>> TYPE_INT_ARGB_PRE BufferedImage formats. >>>> >>>>>> The code has been well commented to explain the logic >>>>>> used in every function. >>>> >>>>>> >>>> >>>>>> Kindly take your time to review the changes in the webrev link mentioned >>>>>> below and provide your suggestions. >>>> >>>>>> Webrev Link: >>>> >>>>>> http://cr.openjdk.java.net/~psadhukhan/prahlad/8015070/webrev.02/ >>>> >>>>>> >>>> >>>>>> >>>> >>>>>> Thank you for your time in review >>>> >>>>>> Have a good day >>>> >>>>>> >>>> >>>>>> Prahalad N. >>>> >>>>>> >>>> >>>>>> >>>> >>>>>> -----Original Message----- >>>> >>>>>> From: Jim Graham >>>> >>>>>> Sent: Thursday, March 24, 2016 7:57 AM >>>> >>>>>> To: Prahalad Kumar Narayanan;2d-dev@openjdk.java.net >>>>>> <mailto:2d-dev@openjdk.java.net> >>>> >>>>>> Subject: Re: [OpenJDK 2D-Dev] [2D-Dev] Review Request: JDK-8015070: >>>> >>>>>> Antialiased text on translucent backgrounds gets bright artifacts >>>> >>>>>> >>>> >>>>>> Hi Prahalad, >>>> >>>>>> >>>> >>>>>> (On a side note - ouch! I came up with these macros in the first >>>> >>>>>> place, but 20 years later I'm now realizing just how hard they are >>>>>> to >>>> >>>>>> navigate and review. My apologies...) >>>> >>>>>> >>>> >>>>>> The macro you are still using to load the destination data is one that >>>>>> loads it to non-premultiplied components, which means it will divide out >>>>>> the alpha if the destination is PRE. The rest of the logic assumes that >>>>>> the components were loaded without any adjustment of their >>>>>> premultiplication so not only is that division an unnecessary operation, >>>>>> it makes the following math wrong. >>>> >>>>>> >>>> >>>>>> The SRCOVER_MASKFILL macro seems to use "Postload ## STRATEGY ## From ## >>>>>> TYPE" which seems to load them into separate components without any >>>>>> adjustment of their pre-multiplication status. This is paired with >>>>>> "LoadAlphaFrom ## TYPE ## For ## STRATEGY" to load just the destination >>>>>> alpha for computing dstF... >>>> >>>>>> >>>> >>>>>> ...jim >>>> >>>>>> >>>> >>>>>> On 3/22/16 4:35 AM, Prahalad Kumar Narayanan wrote: >>>> >>>>>>> Hello Everyone on Java2D Group >>>> >>>>>>> >>>> >>>>>>> Good day to you. >>>> >>>>>>> >>>> >>>>>>> This is a Follow-up to Review Request on the bug: >>>> >>>>>>> Bug : JDK-8015070 Anti-aliased Text on Translucent >>>>>>> background gets bright artifacts >>>> >>>>>>> Bug Link :https://bugs.openjdk.java.net/browse/JDK-8015070 >>>> >>>>>>> >>>> >>>>>>> First, Sincere thanks to Jim for his valuable feedback. >>>> >>>>>>> 1. As Jim correctly mentioned, SRCOVER_MASKFILL has a similar >>>>>>> logic to blend two Translucent colors based on an Alpha mask. >>>> >>>>>>> 2. The calculations are exactly the same as the changes in >>>>>>> previous webrev. >>>> >>>>>>> However the logic of SRCOVER_MASKFILL is 'Optimized' >>>>>>> to reduce the number of computations. >>>> >>>>>>> 3. This optimization is definitely required because, the logic >>>>>>> is executed for every single pixel in a glyph. >>>> >>>>>>> Example: If a string is made up of 5 English >>>>>>> characters with each character having 32 x 32 pixels, >>>> >>>>>>> The anti-aliasing logic will be >>>>>>> executed for 5 x 32 x 32 iterations. >>>> >>>>>>> Reducing computation per pixel will imply a huge >>>>>>> benefit for complete drawString operation. >>>> >>>>>>> >>>> >>>>>>> Observation from SRCOVER_MASKFILL >>>> >>>>>>> 1. The mask fill reduces computations by through multiple >>>>>>> if(...) conditions. >>>> >>>>>>> Each if condition affirms whether the next set of >>>>>>> computations are required. >>>> >>>>>>> 2. Optimization 1: If mask value is 0- skip entire logic. >>>> >>>>>>> 3. Optimization 2: If mask value is maximum, say 255, take >>>>>>> srcA directly without multiplying with maskAlpha ( Reason: 1 * >>>>>>> srcAlpha = srcAlpha ) >>>> >>>>>>> 4. Optimization 3: Compute pre-multiplied resColor in two >>>>>>> steps. First with srcColor and then with dstColor. >>>> >>>>>>> If the resAlpha from 1st step (i.e., srcColor) is >>>>>>> fully opaque, avoid blending dstColor altogether. >>>> >>>>>>> >>>> >>>>>>> Changes in Current Webrev.01 >>>> >>>>>>> 1. The fix for the current bug is modelled based on >>>>>>> SRCOVER_MASKFILL. >>>> >>>>>>> 2. The changes have been verified to work on windows, linux and >>>>>>> mac operating systems. >>>> >>>>>>> 3. The automated Test file- AADrawStringArtifact.java runs >>>>>>> as expected >>>> >>>>>>> Identifies artifact & throws exception when run on >>>>>>> JDK 7 and 8. >>>> >>>>>>> With JDK9, the test file returns without error. >>>> >>>>>>> 3. JPRT build has been run to ensure, changes build on all >>>>>>> supported platforms. >>>> >>>>>>> JPRT job link : >>>> >>>>>>> http://scaaa637.us.oracle.com//archive/2016/03/2016-03-22-070604.p >>>>>>> ra >>>> <http://scaaa637.us.oracle.com/archive/2016/03/2016-03-22-070604.pra> >>>> >>>>>>> h >>>> >>>>>>> n >>>> >>>>>>> ara-linux.client >>>> >>>>>>> >>>> >>>>>>> Kindly review the changes in the below mentioned link and provide >>>>>>> your views >>>> >>>>>>> Webrev Link : >>>> >>>>>>> http://cr.openjdk.java.net/~psadhukhan/prahlad/8015070/webrev.01/ >>>> >>>>>>> >>>> >>>>>>> >>>> >>>>>>> Thank you for your time in review >>>> >>>>>>> Have a good day >>>> >>>>>>> >>>> >>>>>>> Prahalad N. >>>> >>>>>>> >>>> >>>>>>> >>>> >>>>>>> -----Original Message----- >>>> >>>>>>> From: Jim Graham >>>> >>>>>>> Sent: Friday, March 18, 2016 6:07 AM >>>> >>>>>>> To: Prahalad Kumar Narayanan;2d-dev@openjdk.java.net >>>>>>> <mailto:2d-dev@openjdk.java.net> >>>> >>>>>>> Subject: Re: [OpenJDK 2D-Dev] [2D-Dev] Review Request: JDK-8015070: >>>> >>>>>>> Antialiased text on translucent backgrounds gets bright artifacts >>>> >>>>>>> >>>> >>>>>>> Hi Prahalad, >>>> >>>>>>> >>>> >>>>>>> This basically boils down to "alpha blending math needs to be performed >>>>>>> in premultiplied form" and the existing code was not doing that. >>>> >>>>>>> >>>> >>>>>>> Your changes do add a pre-multiplication step to the math in two >>>> >>>>>>> places >>>> >>>>>>> - when you mix the src alpha and the glyph alpha at the top of the >>>>>>> macro, and again when you do the Multiply(dst, dstA, dst) step in the >>>>>>> middle. In that sense, the new math is correct. >>>> >>>>>>> >>>> >>>>>>> However, it is not the optimal way to implement this. In >>>> >>>>>>> particular, the macro used here to load the data from the >>>> >>>>>>> destination is the one that loads it into 4 ARGB non-premultiplied >>>> >>>>>>> values. If the destination is non-PRE, then your added multiply >>>> >>>>>>> step is exactly what is needed, but it could be combined with the >>>> >>>>>>> multiply that will happen later in the blending equation, so it is >>>> >>>>>>> an added step rather than a fixed fraction in the existing MultMultAdd >>>>>>> parameters. >>>> >>>>>>> Additionally, if the destination is already PRE, then the macro >>>> >>>>>>> being used to load the dst pixel data there will have done a >>>>>>> divide >>>> >>>>>>> step to divide out the alpha from the destination, only to have >>>>>>> you >>>> >>>>>>> reverse that math with your new >>>> >>>>>>> Multiply() step. That's a lot of math to end up with a NOP. >>>> >>>>>>> >>>> >>>>>>> The MUL8 you added for the srcA and glyph value is needed, that change >>>>>>> is good. Since it is common for glyph pixels to have zero alpha, you >>>>>>> might want to test the glyph alpha for 0 and skip the pixel before you >>>>>>> do the multiply, though. This would add one more if, but it would be a >>>>>>> common case. >>>> >>>>>>> >>>> >>>>>>> The trickier part is to load the destination components without >>>>>>> un-premultiplying them. Unfortunately there is no "Load...ToArgbPre" >>>> >>>>>>> macro to parallel the Load macro used in the function. Perhaps there >>>>>>> should be, but you'd still end up with an extra multiply step as I >>>>>>> mentioned above because you can fold the premultiplication of the dst >>>>>>> data into the MultMultAdd by carefully choosing the parameters you use >>>>>>> in the math there. >>>> >>>>>>> >>>> >>>>>>> The good news is that the way that the SRCOVER_MASKFILL uses the >>>>>>> various type-specific macros works around this a bit and minimizes the >>>>>>> number of multiplies that happen. You could check out >>>>>>> DEFINE_SRCOVER_MASKFILL and see how it works in the case where pMask is >>>>>>> not null (pMask is an alpha mask with values very similar to the >>>>>>> glyphAA data). Modeling this code on that code would correct the math >>>>>>> and minimize it as well... >>>> >>>>>>> >>>> >>>>>>> ...jim >>>> >>>>>>> >>>> >>>>>>> On 3/17/16 3:00 AM, Prahalad Kumar Narayanan wrote: >>>> >>>>>>>> Hello Everyone on Java2D Group >>>> >>>>>>>> >>>> >>>>>>>> Good day to you. >>>> >>>>>>>> >>>> >>>>>>>> Herewith, I 'm sharing the webrev for two identical Java2D Bugs. >>>> >>>>>>>> >>>> >>>>>>>> Bug ID : JDK-8015070 >>>> >>>>>>>> >>>> >>>>>>>> Title : Antialiased text on translucent backgrounds gets >>>> >>>>>>>> bright artifacts >>>> >>>>>>>> >>>> >>>>>>>> Link :https://bugs.openjdk.java.net/browse/JDK-8015070 >>>> >>>>>>>> >>>> >>>>>>>> Bug ID : JDK-8013564 ( currently closed as duplicate ) >>>> >>>>>>>> >>>> >>>>>>>> Title : Font rendering anti-aliasing in translucent >>>> >>>>>>>> BufferedImages broken >>>> >>>>>>>> >>>> >>>>>>>> Link :https://bugs.openjdk.java.net/browse/JDK-8013564 >>>> >>>>>>>> >>>> >>>>>>>> Webrev Link : >>>> >>>>>>>> http://cr.openjdk.java.net/~psadhukhan/prahlad/8015070/webrev.00/ >>>> >>>>>>>> >>>> >>>>>>>> Quick Summary on Bugs : >>>> >>>>>>>> >>>> >>>>>>>> ```````````````````````````````````````````````` >>>> >>>>>>>> >>>> >>>>>>>> 1. Artifacts appear on Edges of text characters when >>>>>>>> anti-aliased >>>> >>>>>>>> text is drawn on Translucent background >>>> >>>>>>>> >>>> >>>>>>>> 2. The issue is reproducible on all platforms - windows, linux and >>>>>>>> mac os. >>>> >>>>>>>> >>>> >>>>>>>> 3. Besides, the issue is reproduced with the commonly used pixel >>>> >>>>>>>> format- 4ByteArgb. >>>> >>>>>>>> >>>> >>>>>>>> Root Cause & Solution : >>>> >>>>>>>> >>>> >>>>>>>> ```````````````````````````````````````````````` >>>> >>>>>>>> >>>> >>>>>>>> 1. The Macro: GlyphListAABlend4ByteArgb in File: LoopMacros.h >>>>>>>> uses >>>> >>>>>>>> the standard blending algorithm >>>> >>>>>>>> >>>> >>>>>>>> dstColor = [ srcColor * glyphAlpha + dstColor * (1 - >>>> >>>>>>>> glyphAlpha) ] / dstAlpha >>>> >>>>>>>> >>>> >>>>>>>> 2. The above equation works only when the srcColor and dstColor are >>>>>>>> Opaque. >>>> >>>>>>>> >>>> >>>>>>>> 3. When srcColor and dstColor are Translucent, their >>>> >>>>>>>> alphaComponent will influence the visibility of the color, and >>>> >>>>>>>> visibility of the color below. >>>> >>>>>>>> >>>> >>>>>>>> 4. The new set of calculations for blending Translucent source >>>>>>>> and >>>> >>>>>>>> destination colors is given as >>>> >>>>>>>> >>>> >>>>>>>> resAlpha = 1 - ((1 - srcAlpha) * (1 - dstAlpha)) >>>> >>>>>>>> >>>> >>>>>>>> resColor = [ srcColor * srcAlpha + dstColor * >>>>>>>> dstAlpha >>>> >>>>>>>> * >>>> >>>>>>>> (1 >>>> >>>>>>>> - >>>> >>>>>>>> srcAlpha) ] / resAlpha >>>> >>>>>>>> >>>> >>>>>>>> 5. Reference text for the equation: >>>> >>>>>>>> https://en.wikipedia.org/wiki/Alpha_compositing >>>> >>>>>>>> >>>> >>>>>>>> 6. With the above modification to the blending logic, the >>>> >>>>>>>> artifacts do not appear and issues are fixed. >>>> >>>>>>>> >>>> >>>>>>>> Jtreg & Jprt Results : >>>> >>>>>>>> >>>> >>>>>>>> ```````````````````````````````````````````````` >>>> >>>>>>>> >>>> >>>>>>>> 1. A simple test-file: AADrawStringArtifact.java has been >>>>>>>> created >>>> >>>>>>>> to be a part of jtreg test cases. >>>> >>>>>>>> >>>> >>>>>>>> The test file is well commented to explain - nature >>>>>>>> of >>>> >>>>>>>> artifact and how the test tries to identify them. >>>> >>>>>>>> >>>> >>>>>>>> As required, the test case fails with Jdk 7, Jdk 8 >>>>>>>> and >>>> >>>>>>>> succeeds with Jdk 9-internal (built with changes for the bug fix) >>>> >>>>>>>> >>>> >>>>>>>> 2. The changes to blending logic lies within >>>> >>>>>>>> java.desktop/src/share/native/... >>>> >>>>>>>> >>>> >>>>>>>> Henceforth, JPRT was used to ensure successful build >>>> >>>>>>>> across all supported platforms >>>> >>>>>>>> >>>> >>>>>>>> Jprt Job Link : >>>> >>>>>>>> http://scaaa637.us.oracle.com//archive/2016/03/2016-03-17-072001. >>>>>>>> pr >>>> <http://scaaa637.us.oracle.com/archive/2016/03/2016-03-17-072001.pr> >>>> >>>>>>>> a >>>> >>>>>>>> h >>>> >>>>>>>> n >>>> >>>>>>>> ara-linux.client >>>> >>>>>>>> >>>> >>>>>>>> The build succeeds on all platforms. >>>> >>>>>>>> >>>> >>>>>>>> Kindly review the webrev link and provide your views and suggestions. >>>> >>>>>>>> >>>> >>>>>>>> Webrev Link : >>>> >>>>>>>> http://cr.openjdk.java.net/~psadhukhan/prahlad/8015070/webrev.00/ >>>> >>>>>>>> >>>> >>>>>>>> If the changes look good, we could take up the changes for source >>>>>>>> checkin. >>>> >>>>>>>> >>>> >>>>>>>> Thank you for your time in review >>>> >>>>>>>> >>>> >>>>>>>> Have a good day >>>> >>>>>>>> >>>> >>>>>>>> Prahalad N. >>>> >>>>>>>> >>>>