[jira] [Updated] (PIVOT-778) Optimise DisplayHost.paintBuffered and DisplayHost.paintVolatileBuffered

2011-10-24 Thread Sandro Martini (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIVOT-778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandro Martini updated PIVOT-778:
---------------------------------

Affects Version/s: 2.0.1
Fix Version/s: (was: 2.0.1)

The latest version of the fix from Noel (already committed) probably solves this 
completely, but a critical fix like this needs more testing, so I moved it to 2.1 
(and it needs to be reverted from the committed sources).

Noel, can you attach a patch here with the related changes, so that anyone who 
wants to patch Pivot 2.0.1 can do so?

Thank you very much,
Sandro


 Optimise DisplayHost.paintBuffered and DisplayHost.paintVolatileBuffered
 

                 Key: PIVOT-778
                 URL: https://issues.apache.org/jira/browse/PIVOT-778
             Project: Pivot
          Issue Type: Improvement
          Components: wtk
    Affects Versions: 2.0, 2.0.1
            Reporter: Piotr Kołaczkowski
            Assignee: Noel Grandin
              Labels: DisplayHost, caching, gc, paint, performance, repaint
             Fix For: 2.1


 We are writing a sort of game that continually calls the Component.repaint 
 method at 60 FPS. We noticed excessive CPU usage, although the actual amount 
 of painting done by our component (in an overridden Panel.paint) is 
 very small. The profiler pointed us to the paintVolatileBuffered 
 method in DisplayHost. What you are doing there is:
 1. You obtain a fresh BufferedImage the size of the current clip region. 
 For a full-screen game this can be about 1280x1024, i.e. 1.3 Mpix 
 x 4 bytes/pixel = 5.2 MB of raw data, allocated from a probably cold memory 
 region (not in the L2 cache).
 2. You then call the actual paint on that buffered image (touching at 
 least 5.2 MB again).
 3. You then copy the result to the onscreen buffer (copying 5.2 MB a 
 third time).
 4. If the GC kicks in after steps 1 and 3, it has to move the BufferedImage in 
 memory to compact the young generation (touching 5.2 MB a fourth time).
 The whole process means allocating 5.2 MB from cold memory for each frame and 
 touching about 20 MB per frame.
 At 60 FPS this adds up to an allocation rate of ~300 MB/s and a memory 
 throughput of about 1.2 GB/s. It also puts heavy pressure on the GC.
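 The figures above can be checked with a few lines of Java. This is only a 
 back-of-the-envelope sketch: the class and method names are made up for 
 illustration, and it assumes 4 bytes per pixel and four full passes over the 
 buffer per frame (allocate, paint, copy, GC move), as in the list above.

```java
/** Back-of-the-envelope check of the per-frame memory figures (hypothetical helper). */
class FrameMemoryEstimate {
    /** Raw size in bytes of a width x height buffer at the given bytes per pixel. */
    static long bufferBytes(int width, int height, int bytesPerPixel) {
        return (long) width * height * bytesPerPixel;
    }

    /** Bytes per second when bytesPerFrame are processed at the given frame rate. */
    static long bytesPerSecond(long bytesPerFrame, int fps) {
        return bytesPerFrame * fps;
    }

    public static void main(String[] args) {
        long buffer = bufferBytes(1280, 1024, 4); // 5,242,880 bytes, about 5.2 MB
        long touched = 4 * buffer;                // allocate, paint, copy, GC move
        System.out.println("buffer per frame (bytes):    " + buffer);
        System.out.println("touched per frame (bytes):   " + touched);
        System.out.println("allocation rate (bytes/s):   " + bytesPerSecond(buffer, 60));
        System.out.println("memory throughput (bytes/s): " + bytesPerSecond(touched, 60));
    }
}
```

 The exact values come out slightly above the rounded numbers quoted in the 
 text, which is consistent with "~300 MB/s" and "about 1.2 GB/s".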
 We have found that caching the buffer between subsequent paint calls 
 improves performance a lot:
 <code>
 /** Stores the prepared offscreen buffer */
 private BufferedImage bufferedImage;

 /**
  * Attempts to paint the display using an offscreen buffer.
  *
  * @param graphics
  * The source graphics context.
  *
  * @return
  * <tt>true</tt> if the display was painted using the offscreen
  * buffer; <tt>false</tt>, otherwise.
  */
 private boolean paintBuffered(Graphics2D graphics) {
     boolean painted = false;

     // Paint the display into an offscreen buffer
     GraphicsConfiguration gc = graphics.getDeviceConfiguration();
     java.awt.Rectangle clipBounds = graphics.getClipBounds();

     // Reallocate only if there is no cached buffer or it is too small for the clip
     if (bufferedImage == null ||
         bufferedImage.getWidth() < clipBounds.width ||
         bufferedImage.getHeight() < clipBounds.height)
         bufferedImage = gc.createCompatibleImage(clipBounds.width,
             clipBounds.height,
             Transparency.OPAQUE);

     if (bufferedImage != null) {
         Graphics2D bufferedImageGraphics =
             (Graphics2D)bufferedImage.getGraphics();
         bufferedImageGraphics.setClip(0, 0, clipBounds.width,
 ...
 </code>
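 Since the snippet above is cut off, here is a self-contained sketch of the 
 same caching idea. It is not Pivot's DisplayHost code: the class 
 CachedBufferPainter, the allocation counter and paintContent are hypothetical, 
 and a plain BufferedImage of TYPE_INT_RGB stands in for 
 gc.createCompatibleImage so that the example also runs without a display.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

/** Hypothetical stand-in for a display host that reuses its offscreen buffer. */
class CachedBufferPainter {
    /** Offscreen buffer kept between paint calls. */
    private BufferedImage bufferedImage;
    /** Number of buffers allocated so far; only here to make the reuse observable. */
    private int allocations;

    int getAllocations() {
        return allocations;
    }

    /** Returns the cached buffer, reallocating only if it is missing or too small. */
    BufferedImage acquireBuffer(Rectangle clipBounds) {
        if (bufferedImage == null
            || bufferedImage.getWidth() < clipBounds.width
            || bufferedImage.getHeight() < clipBounds.height) {
            bufferedImage = new BufferedImage(clipBounds.width, clipBounds.height,
                BufferedImage.TYPE_INT_RGB);
            allocations++;
        }
        return bufferedImage;
    }

    /** Paints the clip region into the cached buffer, then copies it to the target. */
    void paintBuffered(Graphics2D target, Rectangle clipBounds) {
        BufferedImage buffer = acquireBuffer(clipBounds);
        Graphics2D g = buffer.createGraphics();
        try {
            // Restrict painting to the part of the buffer used for this clip
            g.setClip(0, 0, clipBounds.width, clipBounds.height);
            // Map display coordinates so the clip origin lands at (0, 0) in the buffer
            g.translate(-clipBounds.x, -clipBounds.y);
            // The buffer is reused, so overwrite whatever the previous frame left behind
            g.setColor(Color.BLACK);
            g.fillRect(clipBounds.x, clipBounds.y, clipBounds.width, clipBounds.height);
            paintContent(g);
        } finally {
            g.dispose();
        }
        // Copy only the clip-sized part of the (possibly larger) buffer
        target.drawImage(buffer,
            clipBounds.x, clipBounds.y,
            clipBounds.x + clipBounds.width, clipBounds.y + clipBounds.height,
            0, 0, clipBounds.width, clipBounds.height, null);
    }

    /** Placeholder for the real component painting. */
    void paintContent(Graphics2D g) {
        g.setColor(Color.BLUE);
        g.fillRect(0, 0, 64, 64);
    }

    public static void main(String[] args) {
        CachedBufferPainter painter = new CachedBufferPainter();
        BufferedImage screen = new BufferedImage(200, 200, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = screen.createGraphics();
        for (int frame = 0; frame < 60; frame++) {
            painter.paintBuffered(g, new Rectangle(0, 0, 200, 200));
        }
        g.dispose();
        System.out.println("buffers allocated: " + painter.getAllocations());
    }
}
```

 Because the buffer may be larger than the current clip and still holds the 
 previous frame, the sketch repaints the whole clip area and copies back only 
 the clip-sized region.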
 Advantages:
 1. It avoids the costly allocation of a large object from a possibly uncached 
 memory region.
 2. After a few repaints the GC moves this object to the tenured generation, 
 so the young generation collector is much more efficient (longer intervals 
 between runs).
 3. The image probably stays in the L2 or L3 cache most of the time, which 
 saves memory bandwidth and speeds up painting.
 Disadvantages:
 1. It holds on to memory that is probably not needed all the time, e.g. when 
 the app doesn't need to repaint anything large. However, this is almost 
 completely outweighed by the excessive GC overhead caused by continuously 
 recreating the offscreen buffered image.
 Anyway, we observed about a 2-4x performance increase from this simple change. 
 Now, when running at 60 FPS, painting uses only about 25% of the CPU, and 
 the rest can be used by the application logic (AI, etc.). Previously, 60 FPS 
 was probably the most we could achieve on a 2.2 GHz Core 2 Duo. Of course, this 
 change won't affect business applications that don't do animations etc.
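 One possible way to soften the memory disadvantage, which is not part of the 
 change proposed here, would be to hold the cached buffer through a 
 SoftReference so that the JVM may reclaim it under memory pressure. A minimal 
 sketch, assuming a hypothetical SoftBufferCache class and a plain 
 BufferedImage in place of the compatible image:

```java
import java.awt.image.BufferedImage;
import java.lang.ref.SoftReference;

/** Hypothetical cache that lets the GC drop the offscreen buffer when memory is tight. */
class SoftBufferCache {
    /** Softly held buffer; get() returns null once the GC has reclaimed it. */
    private SoftReference<BufferedImage> ref = new SoftReference<>(null);

    /** Returns a buffer of at least the requested size, reusing the cached one if possible. */
    BufferedImage acquire(int width, int height) {
        BufferedImage image = ref.get();
        if (image == null || image.getWidth() < width || image.getHeight() < height) {
            image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
            ref = new SoftReference<>(image);
        }
        return image;
    }

    public static void main(String[] args) {
        SoftBufferCache cache = new SoftBufferCache();
        BufferedImage first = cache.acquire(1280, 1024);
        BufferedImage second = cache.acquire(640, 480);
        System.out.println("buffer reused: " + (first == second));
    }
}
```

 Whether this is worthwhile would need measuring: how long a softly reachable 
 buffer survives depends on the JVM and its settings, so an animation-heavy 
 app might see the buffer recreated more often than with a plain field.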

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 

[jira] [Updated] (PIVOT-778) Optimise DisplayHost.paintBuffered and DisplayHost.paintVolatileBuffered

2011-10-17 Thread Sandro Martini (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIVOT-778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandro Martini updated PIVOT-778:
---------------------------------

Affects Version/s: (was: 2.0.1)
Fix Version/s: (was: 2.0.2)
   2.0.1

Reassigned to 2.0.1 to see whether we can get some enhancement into 2.0.1, and 
perhaps follow up with a long-term solution in 2.1 (if needed).


 Optimise DisplayHost.paintBuffered and DisplayHost.paintVolatileBuffered
 

                 Key: PIVOT-778
                 URL: https://issues.apache.org/jira/browse/PIVOT-778
             Project: Pivot
          Issue Type: Improvement
          Components: wtk
    Affects Versions: 2.0
            Reporter: Piotr Kołaczkowski
            Assignee: Noel Grandin
              Labels: DisplayHost, caching, gc, paint, performance, repaint
             Fix For: 2.0.1, 2.1



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (PIVOT-778) Optimise DisplayHost.paintBuffered and DisplayHost.paintVolatileBuffered

2011-08-08 Thread Sandro Martini (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIVOT-778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandro Martini updated PIVOT-778:
---------------------------------

Fix Version/s: (was: 2.0.1)
   2.1
   2.0.2
 Priority: Major  (was: Minor)
Affects Version/s: 2.0.1

 Optimise DisplayHost.paintBuffered and DisplayHost.paintVolatileBuffered
 

                 Key: PIVOT-778
                 URL: https://issues.apache.org/jira/browse/PIVOT-778
             Project: Pivot
          Issue Type: Improvement
          Components: wtk
    Affects Versions: 2.0, 2.0.1
            Reporter: Piotr Kołaczkowski
              Labels: DisplayHost, caching, gc, paint, performance, repaint
             Fix For: 2.0.2, 2.1







[jira] [Updated] (PIVOT-778) Optimise DisplayHost.paintBuffered and DisplayHost.paintVolatileBuffered

2011-07-27 Thread Sandro Martini (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIVOT-778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandro Martini updated PIVOT-778:
---------------------------------

Fix Version/s: 2.0.1

Comments ?

 Optimise DisplayHost.paintBuffered and DisplayHost.paintVolatileBuffered
 

                 Key: PIVOT-778
                 URL: https://issues.apache.org/jira/browse/PIVOT-778
             Project: Pivot
          Issue Type: Improvement
          Components: wtk
    Affects Versions: 2.0
            Reporter: Piotr Kołaczkowski
            Priority: Minor
              Labels: DisplayHost, caching, gc, paint, performance, repaint
             Fix For: 2.0.1







[jira] [Updated] (PIVOT-778) Optimise DisplayHost.paintBuffered and DisplayHost.paintVolatileBuffered

2011-07-27 Thread Sandro Martini (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIVOT-778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandro Martini updated PIVOT-778:
---------------------------------

Comment: was deleted

(was: Comments ?)

 Optimise DisplayHost.paintBuffered and DisplayHost.paintVolatileBuffered
 

                 Key: PIVOT-778
                 URL: https://issues.apache.org/jira/browse/PIVOT-778
             Project: Pivot
          Issue Type: Improvement
          Components: wtk
    Affects Versions: 2.0
            Reporter: Piotr Kołaczkowski
            Priority: Minor
              Labels: DisplayHost, caching, gc, paint, performance, repaint
             Fix For: 2.0.1


