Re: A different way to handle pulse timing

2013-08-06 Thread David Hill

On 8/6/13 Aug 6, 10:07 AM, Scott Palmer wrote:


On 2013-08-06, at 9:10 AM, Artem Ananiev  wrote:


On 8/5/2013 10:26 PM, Richard Bair wrote:

In this proposal, we also would be putting the next pulse on the end
of the queue, so it is impossible to starve input events.

Putting the next pulse on the end of the queue is a surprisingly difficult task, at least on Windows. 
If we use standard APIs provided by the platform, PostMessage() and SendMessage(), events are 
always put at the head of the queue. If we use timer events, they are of the lowest priority, so 
we'll have "paint starvation" instead of "input event starvation".

If the OS message queue is fundamentally broken (i.e. it does not behave like a 
queue), can all this be done on a proper queue that you have control over?
I.e. in the OS-specific message loop, just move the messages to a more 
reasonably behaved queue.  Posting a request to process a pulse would simply 
queue the operation on the well behaved queue and not use the OS PostMessage or 
SendMessage mechanism at all.  I admit to not knowing enough about Windows 
message processing to know if that even makes sense.
(Windows seriously doesn't have a mechanism to put something on the tail end of the 
message queue? Wow, don't they understand how a "queue" is supposed to work?)

This is what Glass/Lens does - the user event thread is all in Java. But then - 
we also don't have to deal with any pesky native window managers :-)
Lens input events are taken from what is as close as we have to a native event loop (on 
an input thread) and posted to the Java-based user event thread, just like any 
other Application.InvokeLater (a first in, first out queue). This also saves Lens 
a bit of JNI handling, as most user (non-input) events never leave Java this way.
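
For illustration, a minimal plain-Java sketch of that arrangement (invented names, not the actual Lens code): one FIFO queue, drained by a single Java thread, shared by input events and invokeLater-style runnables.

    import java.util.concurrent.LinkedBlockingQueue;

    // Sketch of a Lens-style user event thread: one plain FIFO queue drained by
    // a single Java thread. Input events (posted from the native input thread)
    // and invokeLater() runnables share the same queue, so neither can starve
    // the other. All names are illustrative.
    final class UserEventThreadSketch implements Runnable {

        private final LinkedBlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();

        // Called from the input thread, or from any thread posting a runnable.
        void invokeLater(Runnable task) {
            queue.add(task);     // strictly first in, first out
        }

        @Override
        public void run() {
            try {
                while (true) {
                    queue.take().run();   // no priorities, no reordering
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }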

Dave


Scott




--
David Hill 
Java Embedded Development

"Sometimes the questions are complicated and the answers are simple."
-- Dr. Seuss (1904 - 1991)



Re: A different way to handle pulse timing

2013-08-06 Thread Artem Ananiev


On 8/6/2013 6:07 PM, Scott Palmer wrote:


On 2013-08-06, at 9:10 AM, Artem Ananiev  wrote:


On 8/5/2013 10:26 PM, Richard Bair wrote:


In this proposal, we also would be putting the next pulse on the
end of the queue, so it is impossible to starve input events.


Putting the next pulse on the end of the queue is a surprisingly
difficult task, at least on Windows. If we use standard APIs
provided by the platform, PostMessage() and SendMessage(), events
are always put at the head of the queue. If we use timer events,
they are of the lowest priority, so we'll have "paint starvation"
instead of "input event starvation".


If the OS message queue is fundamentally broken (i.e. it does not
behave like a queue), can all this be done on a proper queue that you
have control over?


I wouldn't say it's broken :) It's implemented this way by design. BTW, 
as far as I know, Mac OS X is similar to Windows wrt event handling: all 
the selectors (corresponding to PostMessage() and SendMessage()) are 
processed before input events.



I.e. in the OS-specific message loop, just move the messages to a
more reasonably behaved queue.  Posting a request to process a pulse
would simply queue the operation on the well behaved queue and not
use the OS PostMessage or SendMessage mechanism at all.  I admit to
not knowing enough about Windows message processing to know if that
even makes sense.


What you describe is close to how JavaFX is implemented on embedded 
platforms. See the Lens code in Glass for details. We do this because on 
those platforms there is virtually no native event queue, so we have our 
own. However, on platforms like Windows or Mac OS X, we have to use 
native event queues, otherwise JavaFX apps won't be good citizens there.


This is what we have in AWT/Swing, where the native event queue is 
separate from the Java event queue. I can't count the minor 
(e.g. sluggish window resizing), major (drag and drop not working), and even 
fatal (JVM crashes) issues we fixed in AWT that were caused by this 
two-queue architecture, but believe me, the number is huge.



(Windows seriously doesn't have a mechanism to put something on the
tail end of the message queue? Wow, don't they understand how a
"queue" is supposed to work?)


Why do you think it must be a queue? :) It can be a queue, but it can be 
something more complex as well. And yes, there is no easy way to put an 
event at the tail of the queue on Windows. What we can do is put 
events on the queue with PostMessage/SendMessage but dequeue them in a 
different order. We prototyped that in the past, but didn't find it 
acceptable.


Thanks,

Artem


Scott


Re: A different way to handle pulse timing

2013-08-06 Thread Scott Palmer


On 2013-08-06, at 9:10 AM, Artem Ananiev  wrote:

> 
> On 8/5/2013 10:26 PM, Richard Bair wrote:
>> 
>> In this proposal, we also would be putting the next pulse on the end
>> of the queue, so it is impossible to starve input events.
> 
> Putting the next pulse on the end of the queue is a surprisingly difficult 
> task, at least on Windows. If we use standard APIs provided by the platform, 
> PostMessage() and SendMessage(), events are always put at the head of the 
> queue. If we use timer events, they are of the lowest priority, so we'll have 
> "paint starvation" instead of "input event starvation".

If the OS message queue is fundamentally broken (i.e. it does not behave like a 
queue), can all this be done on a proper queue that you have control over?
I.e. in the OS-specific message loop, just move the messages to a more 
reasonably behaved queue.  Posting a request to process a pulse would simply 
queue the operation on the well behaved queue and not use the OS PostMessage or 
SendMessage mechanism at all.  I admit to not knowing enough about Windows 
message processing to know if that even makes sense.
(Windows seriously doesn't have a mechanism to put something on the tail end of 
the message queue? Wow, don't they understand how a "queue" is supposed to 
work?)


Scott



Re: A different way to handle pulse timing

2013-08-06 Thread Artem Ananiev


On 8/5/2013 9:09 PM, Richard Bair wrote:

In the past we have seen situations where there are so many tasks
on the user event thread, that user response (even on desktop) was
not acceptable. Some of these items are getting better as we
improve design (ie less redundant layout operations caused by a
single change/event).


Right, but I don't see how that could still happen in this proposal?
The problem before was the pulse events were handled outside of the
event queue (as I recall) so that they got higher priority. We got
rid of the higher priority and starvation ceased. This proposal would
not reintroduce priorities, so I don't see how you could end up with
input event starvation again?


Here is that bug:

RT-20656: Pending requests from Application.invokeLater can cause input 
event starvation


It is indeed fixed, but the fix was to make sure we always have a window of 
time to dispatch input events (sometimes a very small one, but still). The higher 
priority for user/application runnables is still there.
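
In other words (my own sketch of the shape of that fix, with invented names - not the actual Glass code): runnables keep their higher priority, but the dispatcher guarantees input a slot between them.

    import java.util.ArrayDeque;
    import java.util.Queue;

    // Illustration only: application runnables keep their higher priority, but
    // before each one we drain whatever input is already queued, so input always
    // gets a (possibly very small) window and cannot be starved.
    final class InterleavingDispatcherSketch {

        private final Queue<Runnable> appRunnables = new ArrayDeque<>();
        private final Queue<Runnable> inputEvents  = new ArrayDeque<>();

        void postRunnable(Runnable r)   { appRunnables.add(r); }
        void postInputEvent(Runnable e) { inputEvents.add(e); }

        void dispatchPending() {
            while (!appRunnables.isEmpty()) {
                drainInput();                 // the guaranteed window for input
                appRunnables.poll().run();    // then the next application runnable
            }
            drainInput();
        }

        private void drainInput() {
            Runnable e;
            while ((e = inputEvents.poll()) != null) {
                e.run();
            }
        }
    }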


Thanks,

Artem


BTW - it is very easy to write a "bad" app which will demonstrate
the problem. As a thought example - if on a button click, you
calculate PI to the nth digit before updating your text field - and
you do it in the event callback - you are stalling the user event
thread. Add in enough computes and you get a very unresponsive
app. Instead of computes, you can just call sleep to see the
problem too :-)


But this is the case today as well.

Richard



Re: A different way to handle pulse timing

2013-08-06 Thread Artem Ananiev


On 8/5/2013 10:26 PM, Richard Bair wrote:

In the past we have seen situations where there are so many
tasks on the user event thread, that user response (even on
desktop) was not acceptable. Some of these items are getting
better as we improve design (ie less redundant layout
operations caused by a single change/event).

Right, but I don't see how that could still happen in this
proposal? The problem before was the pulse events were handled
outside of the event queue (as I recall) so that they got higher
priority. We got rid of the higher priority and starvation
ceased. This proposal would not reintroduce priorities, so I
don't see how you could end up with input event starvation
again?

rendering is "staged" on the event queue (layout, adding the render
job to the render thread). It has been this way for quite a while
now. As far as I remember (other than paths with "live resize"),
we have never had a mechanism that provided for event priority (at
least not on the Linux side where I tend to live).


This is how I thought it used to be done: we had (still have) a
separate glass thread which fires off once every 16ms or so. We used
to take this pulse and handle it at the next available opportunity,
which was explicit prioritization. If the pulse handling took longer
than 16ms, by the time the pulse ended we'd have another pulse ready
to be handled and would starve the queue. Today, we get this event
and add it to the event queue, so we are never starving the event
queue.

In this proposal, we also would be putting the next pulse on the end
of the queue, so it is impossible to starve input events.


Putting the next pulse on the end of the queue is a surprisingly difficult 
task, at least on Windows. If we use standard APIs provided by the 
platform, PostMessage() and SendMessage(), events are always put at the 
head of the queue. If we use timer events, they are of the lowest 
priority, so we'll have "paint starvation" instead of "input event 
starvation".


Thanks,

Artem


Thanks
Richard


Re: A different way to handle pulse timing

2013-08-05 Thread John Hendrikx

On 5/08/2013 20:46, Richard Bair wrote:

As I wrote in the previous email, it seems that we currently are not blocked waiting for 
vsync(), at least on Windows with D3D pipeline. Anyway, even if we "fix" that, 
what you propose is that sometimes both threads will be blocked (the render thread 
waiting for vsync, the event thread waiting for the render thread), which doesn't sound 
perfect.

Right. That stinks.

So today, the FX thread will block waiting for the render thread at the point 
where we synchronize state between the FX thread and render thread. The problem 
with this proposal is that we will wait here much longer waiting for vsync in 
the case that we have animations happening, which is just dead time when we 
ought to be preparing the next frame.

Richard
Reading all this I get the distinct feeling that the current way of 
doing things is 'double buffering', where you have to wait until vsync 
arrives before you can start with the next frame, while you are looking 
for 'triple buffering', which allows a new frame to be prepared in a 
separate buffer/graph/layout while the first (finished) 
buffer/graph/layout waits to be passed off to the render thread when 
vsync arrives.
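
A rough sketch of that analogy (illustrative only, not JavaFX code): with a spare buffer, the producer can always hand over a finished frame without waiting for vsync, and the consumer presents the newest one when vsync arrives.

    // Rough sketch of the triple-buffering idea in the analogy: the producer (FX
    // thread) can always publish a finished frame without waiting for vsync,
    // while the consumer (render thread) presents the newest completed frame at
    // each vsync. All names are illustrative.
    final class TripleBufferSketch<T> {

        private T pending;              // most recently completed frame
        private boolean hasPending;

        // Producer: publish a finished frame and immediately start on the next.
        synchronized void publish(T frame) {
            pending = frame;            // only the newest frame matters
            hasPending = true;
            notifyAll();
        }

        // Consumer: called once per vsync; blocks until a frame is available.
        synchronized T acquireForPresent() throws InterruptedException {
            while (!hasPending) {
                wait();
            }
            hasPending = false;
            return pending;
        }
    }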


--John



Re: A different way to handle pulse timing

2013-08-05 Thread Richard Bair
>> As I wrote in the previous email, it seems that we currently are not blocked 
>> waiting for vsync(), at least on Windows with D3D pipeline. Anyway, even if 
>> we "fix" that, what you propose is that sometimes both threads will be 
>> blocked (the render thread waiting for vsync, the event thread waiting for 
>> the render thread), which doesn't sound perfect.
> 
> Right. That stinks.

So today, the FX thread will block waiting for the render thread at the point 
where we synchronize state between the FX thread and render thread. The problem 
with this proposal is that we will wait here much longer waiting for vsync in 
the case that we have animations happening, which is just dead time when we 
ought to be preparing the next frame.

Richard

Re: A different way to handle pulse timing

2013-08-05 Thread Richard Bair
> I now see the picture.
> 
> As I wrote in the previous email, it seems that we currently are not blocked 
> waiting for vsync(), at least on Windows with D3D pipeline. Anyway, even if 
> we "fix" that, what you propose is that sometimes both threads will be 
> blocked (the render thread waiting for vsync, the event thread waiting for 
> the render thread), which doesn't sound perfect.

Right. That stinks.

> Note that on Windows and Mac OS X, input events and application runnables are 
> handled differently at the native level (either using different mechanisms, 
> or having different priorities). To implement this proposal, we'll need to 
> eliminate the difference, which may be a difficult task.

Are the application runnables at a higher or lower priority than input events?

Re: A different way to handle pulse timing

2013-08-05 Thread Richard Bair
> Right, I guess I don't have a complete picture of the threading model.
> 
> I assume that user events like mouse clicks and key presses are coming in 
> from some OS thread and queued on the "user event thread".  Meanwhile things 
> like runLater() are also queued on the user event thread. If other user 
> events from the OS happened they would naturally be interleaved with runLater 
> type operations - everything eventually gets processed no matter how busy the 
> system is, no matter what you do on the user event thread so long as 
> eventually the operation completes.  The handling of the input would either 
> complete, and then the next event is processed, or it might trigger additional 
> work via runLater.  The runLater stuff would be queued behind any 
> other OS events that have already been queued by the OS input thread, they 
> wouldn't "jump the queue".

That should basically be correct. The only wrinkle is that, since we rely on the 
OS event queues (other than on embedded), if the OS does event prioritization 
it is possible that things won't get delivered to us in exactly that order. 
But I believe that in Glass we now make sure our events are not given higher 
prioritization (to avoid event starvation), so the above understanding of the 
threading model should be correct.

Richard

Re: A different way to handle pulse timing

2013-08-05 Thread Richard Bair
>>> In the past we have seen situations where there are so many tasks on the 
>>> user event thread, that user response (even on desktop) was not acceptable. 
>>> Some of these items are getting better as we improve design (ie less 
>>> redundant layout operations caused by a single change/event).
>> Right, but I don't see how that could still happen in this proposal? The 
>> problem before was the pulse events were handled outside of the event queue 
>> (as I recall) so that they got higher priority. We got rid of the higher 
>> priority and starvation ceased. This proposal would not reintroduce 
>> priorities, so I don't see how you could end up with input event starvation 
>> again?
> rendering is "staged" on the event queue (layout, adding the render job to 
> the render thread). It has been this way for quite a while now. As far as I 
> remember (other than paths with "live resize"), we have never had a 
> mechanism that provided for event priority (at least not on the Linux side 
> where I tend to live).

This is how I thought it used to be done: we had (still have) a separate glass 
thread which fires off once every 16ms or so. We used to take this pulse and 
handle it at the next available opportunity, which was explicit prioritization. 
If the pulse handling took longer than 16ms, by the time the pulse ended we'd 
have another pulse ready to be handled and would starve the queue. Today, we 
get this event and add it to the event queue, so we are never starving the 
event queue.

In this proposal, we also would be putting the next pulse on the end of the 
queue, so it is impossible to starve input events.

Thanks
Richard

Re: A different way to handle pulse timing

2013-08-05 Thread David Hill

On 8/5/13 Aug 5, 1:40 PM, Scott Palmer wrote:

On 2013-08-05, at 12:49 PM, David Hill  wrote:


On 8/5/13 Aug 5, 12:27 PM, Scott Palmer wrote:

The idea of user event starvation has been mentioned before and has me a little 
confused…  Why aren't things handled as a simple queue, with no priorities or 
anything, so starvation is impossible?  Is this something the OS is doing?

There is a "simple" user input queue - the problem is that we dispatch those 
arriving events on the user event *thread*, and that thread is used for a lot of things 
other than user input. It is not so much the cost of handling the input, but rather the 
cost of handling the actions after input.

Right, I guess I don't have a complete picture of the threading model.

I think that there is a relatively small number of people that do - and I count 
myself as someone that has a good, but partial, understanding of it.


I assume that user events like mouse clicks and key presses are coming in from some OS thread and 
queued on the "user event thread".  Meanwhile things like runLater() are also queued on 
the user event thread. If other user events from the OS happened they would naturally be 
interleaved with runLater type operations - everything eventually gets processed no matter how busy 
the system is, no matter what you do on the user event thread so long as eventually the operation 
completes.  The handling of the input would either complete, and then the next event is 
processed, or it might trigger additional work via runLater.  The runLater stuff would be queued 
behind any other OS events that have already been queued by the OS input thread, they wouldn't 
"jump the queue".

I suspect I am oversimplifying.  If there is somewhere to go to get an idea of 
the actual threading model please point me in the right direction.

As part of our "porting guide" which will just be part of the openjfx wiki - 
this is something that I want to write up, at least in overview. Not there yet though.

I suspect that some of the details will be changing over the next while anyway. 
The repo refactoring now allows us to clean up some of the rather convoluted 
means of communicating from the API through quantum to Prism and Glass.

Dave


--
David Hill 
Java Embedded Development

"The conventional view serves to protect us from the painful job of thinking."
-- John Kenneth Galbraith (1908 - 2006)



Re: A different way to handle pulse timing

2013-08-05 Thread David Hill

On 8/5/13 Aug 5, 1:09 PM, Richard Bair wrote:

In the past we have seen situations where there are so many tasks on the user 
event thread, that user response (even on desktop) was not acceptable. Some of 
these items are getting better as we improve design (ie less redundant layout 
operations caused by a single change/event).

Right, but I don't see how that could still happen in this proposal? The 
problem before was the pulse events were handled outside of the event queue (as 
I recall) so that they got higher priority. We got rid of the higher priority 
and starvation ceased. This proposal would not reintroduce priorities, so I 
don't see how you could end up with input event starvation again?

rendering is "staged" on the event queue (layout, adding the render job to the render 
thread). It has been this way for quite a while now. As far as I remember (other than paths with 
"live resize"), we have never had a mechanism that provided for event priority (at least 
not on the Linux side where I tend to live).

BTW - it is very easy to write a "bad" app which will demonstrate the problem. 
As a thought example - if on a button click, you calculate PI to the nth digit before 
updating your text field - and you do it in the event callback - you are stalling the 
user event thread. Add in enough computes and you get a very unresponsive app. Instead 
of computes, you can just call sleep to see the problem too :-)

But this is the case today as well.

Indeed - and something we document as a "do not do that because it hurts" item. 
I used this to illustrate the problem. Just like the app writer, we need to make sure we 
use the user event queue in ways that allow us to handle events in a timely fashion. In 
some cases - that means we do a lot of work to put computes on a different thread (ie 
rendering).


Richard



--
David Hill 
Java Embedded Development

"I don't know." ("Ich weiß nicht.")
-- an unknown German philosopher



Re: A different way to handle pulse timing

2013-08-05 Thread Scott Palmer

On 2013-08-05, at 12:49 PM, David Hill  wrote:

> On 8/5/13 Aug 5, 12:27 PM, Scott Palmer wrote:
>> The idea of user event starvation has been mentioned before and has me a 
>> little confused…  Why aren't things handled as a simple queue, with no 
>> priorities or anything, so starvation is impossible?  Is this something the 
>> OS is doing?
> 
> There is a "simple" user input queue - the problem is that we dispatch those 
> arriving events on the user event *thread*, and that thread is used for a lot 
> of things other than user input. It is not so much the cost of handling the 
> input, but rather the cost of handling the actions after input.

Right, I guess I don't have a complete picture of the threading model.

I assume that user events like mouse clicks and key presses are coming in from 
some OS thread and queued on the "user event thread".  Meanwhile things like 
runLater() are also queued on the user event thread. If other user events from 
the OS happened they would naturally be interleaved with runLater type 
operations - everything eventually gets processed no matter how busy the system 
is, no matter what you do on the user event thread so long as eventually the 
operation completes.  The handling of the input would either complete, and then 
the next event is processed, or it might trigger additional work via runLater.  
The runLater stuff would be queued behind any other OS events that 
have already been queued by the OS input thread, they wouldn't "jump the queue".

I suspect I am oversimplifying.  If there is somewhere to go to get an idea of 
the actual threading model please point me in the right direction.

Regards,

Scott

Re: A different way to handle pulse timing

2013-08-05 Thread Richard Bair
> In the past we have seen situations where there are so many tasks on the user 
> event thread, that user response (even on desktop) was not acceptable. Some 
> of these items are getting better as we improve design (ie less redundant 
> layout operations caused by a single change/event).

Right, but I don't see how that could still happen in this proposal? The 
problem before was the pulse events were handled outside of the event queue (as 
I recall) so that they got higher priority. We got rid of the higher priority 
and starvation ceased. This proposal would not reintroduce priorities, so I 
don't see how you could end up with input event starvation again?

> BTW - it is very easy to write a "bad" app which will demonstrate the 
> problem. As a thought example - if on a button click, you calculate PI to the 
> nth digit before updating your text field - and you do it in the event 
> callback - you are stalling the user event thread. Add in enough computes and 
> you get a very unresponsive app. Instead of computes, you can just call 
> sleep to see the problem too :-)

But this is the case today as well.

Richard

Re: A different way to handle pulse timing

2013-08-05 Thread David Hill

On 8/5/13 Aug 5, 12:27 PM, Scott Palmer wrote:

The idea of user event starvation has been mentioned before and has me a little 
confused…  Why aren't things handled as a simple queue, with no priorities or 
anything, so starvation is impossible?  Is this something the OS is doing?


There is a "simple" user input queue - the problem is that we dispatch those 
arriving events on the user event *thread*, and that thread is used for a lot of things 
other than user input. It is not so much the cost of handling the input, but rather the 
cost of handling the actions after input.

As an example, on a mouse click, a control may change - which might cause a 
re-layout, which should cause repainting to happen.

Currently, JFX uses a separate "rendering thread" for painting. This is 
goodness, especially when you have a GPU. On the user event thread we need to queue up 
and then stage the repaint request.

Things are more complicated because many (but not all) painting/window 
management tasks need to be single threaded.

In the past we have seen situations where there are so many tasks on the user 
event thread, that user response (even on desktop) was not acceptable. Some of 
these items are getting better as we improve design (ie less redundant layout 
operations caused by a single change/event).

Those of us who have been through several iterations of this are suggesting 
caution on a rework though :-)

BTW - it is very easy to write a "bad" app which will demonstrate the problem. 
As a thought example - if on a button click, you calculate PI to the nth digit before 
updating your text field - and you do it in the event callback - you are stalling the 
user event thread. Add in enough computes and you get a very unresponsive app. Instead 
of computes, you can just call sleep to see the problem too :-)
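
For completeness, the usual way to keep such a compute off the user event thread is a background Task that only touches the control again on the FX thread - a sketch, with computePiDigits() standing in for the expensive work:

    import javafx.concurrent.Task;
    import javafx.scene.control.Button;
    import javafx.scene.control.TextField;

    // Sketch of the usual fix for the thought example above: run the expensive
    // work on a background thread and only touch the control again on the FX
    // thread. computePiDigits() is a stand-in for the long-running calculation.
    final class PiButtonSketch {

        static void wire(Button button, TextField output) {
            button.setOnAction(e -> {
                Task<String> task = new Task<String>() {
                    @Override
                    protected String call() {
                        return computePiDigits(100_000);   // off the user event thread
                    }
                };
                // Runs back on the FX thread once the computation finishes.
                task.setOnSucceeded(ev -> output.setText(task.getValue()));
                new Thread(task, "pi-worker").start();
            });
        }

        private static String computePiDigits(int digits) {
            // Placeholder for the expensive computation in the thought example.
            return String.valueOf(Math.PI);
        }
    }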


Dave



In terms of rendering fast enough that you can fit things into a vsync period.. 
that shouldn't be necessary.  If you miss one sync period then you should be 
finished by the next.. having a strict requirement to fit within a single vsync 
period is impractical.

Without access to true sync, a timer would serve the same purpose.  Having both 
a timer and sync is where things get silly.

Cheers,

Scott

On 2013-08-05, at 9:47 AM, David Hill  wrote:


On 8/1/13 Aug 1, 3:52 PM, Richard Bair wrote:

as far as I can read it, your idea is to start preparing the next frame right 
after synchronization (scenegraph to render tree) is completed for the previous 
frame. Do I get it correctly? If yes, we'll likely re-introduce the old problem 
with input events starvation. There will be no or very little window, when the 
events can be processed on the event thread, because the thread will always be 
either busy handling CSS, animations, etc., or blocked waiting for the render 
thread to finish rendering.

I think the difference is that I was going to use the vsync as the limiter. 
That is, the first time through we do a pulse, then we schedule another pulse, 
then we run that other pulse (almost immediately), then we hit the sync point 
with the render thread and have to wait for it because it is blocked on vsync. 
Meanwhile the user events are being queued up. When we get back from this, the 
next pulse is placed on the end of the queue, we process all input events, then 
the next pulse.

You are assuming several things here - most of which would not be present on 
something like the PI.
   * access to vsync
   * a fast enough rendering that you can usually fit into a vsync period.

I would be seriously concerned over user event starvation. It would not take 
much of a busy set of animations to mean we spin painting a SG that has not 
completely caught up with the bindings, and/or ignoring the incoming input 
events.

--
David Hill 
Java Embedded Development

A committee is a cul-de-sac down which ideas are lured and then quietly 
strangled.
-- Sir Barnett Cocks (1907 - 1989)




--
David Hill 
Java Embedded Development

The radical of one century is the conservative of the next. The radical invents 
the views. When he has worn them out the conservative adopts them.
-- Mark Twain



Re: A different way to handle pulse timing

2013-08-05 Thread Richard Bair
> The idea of user event starvation has been mentioned before and has me a 
> little confused…  Why aren't things handled as a simple queue, with no 
> priorities or anything, so starvation is impossible?  Is this something the 
> OS is doing?

That's what I'm wondering too.

> In terms of rendering fast enough that you can fit things into a vsync 
> period.. that shouldn't be necessary.  If you miss one sync period then you 
> should be finished by the next.. having a strict requirement to fit within a 
> single vsync period is impractical.
> 
> Without access to true sync, a timer would serve the same purpose.  Having 
> both a timer and sync is where things get silly.

Exactly what I was thinking.

Richard

Re: A different way to handle pulse timing

2013-08-05 Thread Scott Palmer
The idea of user event starvation has been mentioned before and has me a little 
confused…  Why aren't things handled as a simple queue, with no priorities or 
anything, so starvation is impossible?  Is this something the OS is doing?


In terms of rendering fast enough that you can fit things into a vsync period.. 
that shouldn't be necessary.  If you miss one sync period then you should be 
finished by the next.. having a strict requirement to fit within a single vsync 
period is impractical.

Without access to true sync, a timer would serve the same purpose.  Having both 
a timer and sync is where things get silly.

Cheers,

Scott

On 2013-08-05, at 9:47 AM, David Hill  wrote:

> On 8/1/13 Aug 1, 3:52 PM, Richard Bair wrote:
>>> as far as I can read it, your idea is to start preparing the next frame 
>>> right after synchronization (scenegraph to render tree) is completed for 
>>> the previous frame. Do I get it correctly? If yes, we'll likely 
>>> re-introduce the old problem with input events starvation. There will be no 
>>> or very little window, when the events can be processed on the event 
>>> thread, because the thread will always be either busy handling CSS, 
>>> animations, etc., or blocked waiting for the render thread to finish 
>>> rendering.
>> I think the difference is that I was going to use the vsync as the limiter. 
>> That is, the first time through we do a pulse, then we schedule another 
>> pulse, then we run that other pulse (almost immediately), then we hit the 
>> sync point with the render thread and have to wait for it because it is 
>> blocked on vsync. Meanwhile the user events are being queued up. When we get 
>> back from this, the next pulse is placed on the end of the queue, we process 
>> all input events, then the next pulse.
> You are assuming several things here - most of which would not be present on 
> something like the PI.
>   * access to vsync
>   * a fast enough rendering that you can usually fit into a vsync period.
> 
> I would be seriously concerned over user event starvation. It would not take 
> much of a busy set of animations to mean we spin painting a SG that has not 
> completely caught up with the bindings, and/or ignoring the incoming input 
> events.
> 
> -- 
> David Hill 
> Java Embedded Development
> 
> A committee is a cul-de-sac down which ideas are lured and then quietly 
> strangled.
> -- Sir Barnett Cocks (1907 - 1989)
> 



Re: A different way to handle pulse timing

2013-08-05 Thread David Hill

On 8/1/13 Aug 1, 3:52 PM, Richard Bair wrote:

as far as I can read it, your idea is to start preparing the next frame right 
after synchronization (scenegraph to render tree) is completed for the previous 
frame. Do I get it correctly? If yes, we'll likely re-introduce the old problem 
with input events starvation. There will be no or very little window, when the 
events can be processed on the event thread, because the thread will always be 
either busy handling CSS, animations, etc., or blocked waiting for the render 
thread to finish rendering.

I think the difference is that I was going to use the vsync as the limiter. 
That is, the first time through we do a pulse, then we schedule another pulse, 
then we run that other pulse (almost immediately), then we hit the sync point 
with the render thread and have to wait for it because it is blocked on vsync. 
Meanwhile the user events are being queued up. When we get back from this, the 
next pulse is placed on the end of the queue, we process all input events, then 
the next pulse.

You are assuming several things here - most of which would not be present on 
something like the PI.
   * access to vsync
   * a fast enough rendering that you can usually fit into a vsync period.

I would be seriously concerned over user event starvation. It would not take 
much of a busy set of animations to mean we spin painting a SG that has not 
completely caught up with the bindings, and/or ignoring the incoming input 
events.

--
David Hill 
Java Embedded Development

A committee is a cul-de-sac down which ideas are lured and then quietly 
strangled.
-- Sir Barnett Cocks (1907 - 1989)



Re: A different way to handle pulse timing

2013-08-02 Thread Artem Ananiev


On 8/1/2013 11:52 PM, Richard Bair wrote:

as far as I can read it, your idea is to start preparing the next
frame right after synchronization (scenegraph to render tree) is
completed for the previous frame. Do I get it correctly? If yes,
we'll likely re-introduce the old problem with input events
starvation. There will be no or very little window, when the events
can be processed on the event thread, because the thread will
always be either busy handling CSS, animations, etc., or blocked
waiting for the render thread to finish rendering.


I think the difference is that I was going to use the vsync as the
limiter. That is, the first time through we do a pulse, then we
schedule another pulse, then we run that other pulse (almost
immediately), then we hit the sync point with the render thread and
have to wait for it because it is blocked on vsync. Meanwhile the
user events are being queued up. When we get back from this, the next
pulse is placed on the end of the queue, we process all input events,
then the next pulse.


I now see the picture.

As I wrote in the previous email, it seems that we currently are not 
blocked waiting for vsync(), at least on Windows with D3D pipeline. 
Anyway, even if we "fix" that, what you propose is that sometimes both 
threads will be blocked (the render thread waiting for vsync, the event 
thread waiting for the render thread), which doesn't sound perfect.


Note that on Windows and Mac OS X, input events and application 
runnables are handled differently at the native level (either using 
different mechanisms, or having different priorities). To implement this 
proposal, we'll need to eliminate the difference, which may be a 
difficult task.


Thanks,

Artem


Whenever an animation starts, the runningAnimationCounter is
incremented. When an animation ends, it is decremented (or it could
be a Set or whatever). The pendingPulse is simply false to
start with, and is checked before we submit another pulse. Whenever a
node in the scene graph becomes dirty, or the scene is resized, or
stylesheets are changed, or in any case something happens that
requires us to draw again, we check this flag and fire a new pulse if
one is not already pending.


Scene graph is only changed on the event thread. So my guess is that "fire a new 
pulse" is just

  Platform.runLater(() -> pulse())


Right.


When a pulse occurs, we process animations first, then CSS, then
layout, then validate all the bounds, and *then we block* until the
rendering thread is available for synchronization. I believe this is
what we are doing today (it was a change Steve and I looked at with
Jasper a couple months ago IIRC).

But now for the new part. Immediately after synchronization, we check
the runningAnimationCounter. If it is > 0, then we fire off a new
pulse and leave the pendingPulse flag set to true. If
runningAnimationCounter == 0, then we flip pendingPulse to false.
Other than the pick that always happens at the end of the pulse, we
do nothing else new and, if the pick didn't cause state to change, we
are now quiescent.

Meanwhile, the render thread has run off doing its thing. The last
step of rendering is the present, where we will block until the thing
is presented, which, when we return, would put us *immediately* at
the start of the next 16.66ms cycle. Since the render thread has just
completed its duties, it goes back to waiting until the FX thread
comes around asking to sync up again.

If there is an animation going on such that a new pulse had been
fired immediately after synchronization, then that new pulse would
have been handled while the previous frame was being rendered. Most
likely, by the time the render thread completes presenting and comes
back to check with the FX thread, it will find that the FX thread is
already waiting for it with the next frame's data. It will synchronize
immediately and then carry on rendering another frame.


Given that you propose to fire a new pulse() whenever anything is changed in 
scene graph, and also right after synchronization, there is no need to have an 
external timer (QuantumToolkit.pulseTimer()) any longer.


Correct.


I think the way this would behave is that, when an animation is first
played, you will get two pulses close to each other. The first pulse
will do its business and then synchronize and then immediately fire
off another pulse. That next pulse will then also get processed and
then the FX thread will block until the previous frame finishes
rendering. During this time, additional events (either application
generated via runLater calls happening on background threads, or from
OS events) will get queued up. Between pulse #2 and pulse #3 then a
bunch of other events will get processed, essentially playing
catch-up. My guess is that this won't be a problem but you might see
a hiccup at the start of a new animation if the event queue is too
full and it can't process all that stuff in 16ms (because at this
point we're really multi-theaded between th

Re: A different way to handle pulse timing

2013-08-01 Thread Richard Bair
> as far as I can read it, your idea is to start preparing the next frame right 
> after synchronization (scenegraph to render tree) is completed for the 
> previous frame. Do I get it correctly? If yes, we'll likely re-introduce the 
> old problem with input events starvation. There will be no or very little 
> window, when the events can be processed on the event thread, because the 
> thread will always be either busy handling CSS, animations, etc., or blocked 
> waiting for the render thread to finish rendering.

I think the difference is that I was going to use the vsync as the limiter. 
That is, the first time through we do a pulse, then we schedule another pulse, 
then we run that other pulse (almost immediately), then we hit the sync point 
with the render thread and have to wait for it because it is blocked on vsync. 
Meanwhile the user events are being queued up. When we get back from this, the 
next pulse is placed on the end of the queue, we process all input events, then 
the next pulse.

>> Whenever an animation starts, the runningAnimationCounter is
>> incremented. When an animation ends, it is decremented (or it could
>> be a Set or whatever). The pendingPulse is simply false to
>> start with, and is checked before we submit another pulse. Whenever a
>> node in the scene graph becomes dirty, or the scene is resized, or
>> stylesheets are changed, or in any case something happens that
>> requires us to draw again, we check this flag and fire a new pulse if
>> one is not already pending.
> 
> Scene graph is only changed on the event thread. So my guess is that "fire a 
> new pulse" is just
> 
>  Platform.runLater(() -> pulse())

Right.

>> When a pulse occurs, we process animations first, then CSS, then
>> layout, then validate all the bounds, and *then we block* until the
>> rendering thread is available for synchronization. I believe this is
>> what we are doing today (it was a change Steve and I looked at with
>> Jasper a couple months ago IIRC).
>> 
>> But now for the new part. Immediately after synchronization, we check
>> the runningAnimationCounter. If it is > 0, then we fire off a new
>> pulse and leave the pendingPulse flag set to true. If
>> runningAnimationCounter == 0, then we flip pendingPulse to false.
>> Other than the pick that always happens at the end of the pulse, we
>> do nothing else new and, if the pick didn't cause state to change, we
>> are now quiescent.
>> 
>> Meanwhile, the render thread has run off doing its thing. The last
>> step of rendering is the present, where we will block until the thing
>> is presented, which, when we return, would put us *immediately* at
>> the start of the next 16.66ms cycle. Since the render thread has just
>> completed its duties, it goes back to waiting until the FX thread
>> comes around asking to sync up again.
>> 
>> If there is an animation going on such that a new pulse had been
>> fired immediately after synchronization, then that new pulse would
>> have been handled while the previous frame was being rendered. Most
>> likely, by the time the render thread completes presenting and comes
>> back to check with the FX thread, it will find that the FX thread is
>> already waiting for it with the next frame's data. It will synchronize
>> immediately and then carry on rendering another frame.
> 
> Given that you propose to fire a new pulse() whenever anything is changed in 
> scene graph, and also right after synchronization, there is no need to have 
> an external timer (QuantumToolkit.pulseTimer()) any longer.

Correct.

>> I think the way this would behave is that, when an animation is first
>> played, you will get two pulses close to each other. The first pulse
>> will do its business and then synchronize and then immediately fire
>> off another pulse. That next pulse will then also get processed and
>> then the FX thread will block until the previous frame finishes
>> rendering. During this time, additional events (either application
>> generated via runLater calls happening on background threads, or from
>> OS events) will get queued up. Between pulse #2 and pulse #3 then a
>> bunch of other events will get processed, essentially playing
>> catch-up. My guess is that this won't be a problem but you might see
>> a hiccup at the start of a new animation if the event queue is too
>> full and it can't process all that stuff in 16ms (because at this
>> point we're really multi-threaded between the FX and render threads
>> and have nearly 16ms for each thread to do their business, instead of
>> only 8ms which is what you'd have in a single threaded system).
>> 
>> Another question I have is around resize events and how those work.
>> If they also come in to glass on the FX thread (but at a higher
>> priority than user events like a pulse or other input events?) then
>> what will happen is that we will get a resize event and process a
>> half-a-pulse (or maybe a whole pulse? animations+css+layout or just
>> css+layout?) and then render, pretty much just as fast as we can.

Re: A different way to handle pulse timing

2013-08-01 Thread Artem Ananiev

Hi, Richard,

as far as I can read it, your idea is to start preparing the next frame 
right after synchronization (scenegraph to render tree) is completed for 
the previous frame. Do I get it correctly? If yes, we'll likely 
re-introduce the old problem with input events starvation. There will be 
no or very little window, when the events can be processed on the event 
thread, because the thread will always be either busy handling CSS, 
animations, etc., or blocked waiting for the render thread to finish 
rendering.


See more comments/questions below.

On 7/26/2013 9:22 AM, Richard Bair wrote:

Hi,

I'm probably missing something obvious and you guys on Glass / Prism
/ Quantum can help set me straight. I was thinking tonight of a
different way of initiating pulse events that would, I think,
completely smooth out the pulses such that we don't end up with
"drift" due to the timer being at a different rate than the GPU.

Suppose we have two variables in the system (and for simplicity lets
talk about a single Scene, because one problem I think this idea has
is with multiple scenes and I want to discuss that separately after
the core mechanism is understood):

- boolean pendingPulse
- int runningAnimationCounter


We already have the latter. QuantumToolkit.animationRunnable is used to 
track if there are any live animations. When the last of them is 
finished, this runnable is set to null - at least that is my understanding.



Whenever an animation starts, the runningAnimationCounter is
incremented. When an animation ends, it is decremented (or it could
be a Set or whatever). The pendingPulse is simply false to
start with, and is checked before we submit another pulse. Whenever a
node in the scene graph becomes dirty, or the scene is resized, or
stylesheets are changed, or in any case something happens that
requires us to draw again, we check this flag and fire a new pulse if
one is not already pending.


Scene graph is only changed on the event thread. So my guess is that 
"fire a new pulse" is just


  Platform.runLater(() -> pulse())

correct?


When a pulse occurs, we process animations first, then CSS, then
layout, then validate all the bounds, and *then we block* until the
rendering thread is available for synchronization. I believe this is
what we are doing today (it was a change Steve and I looked at with
Jasper a couple months ago IIRC).

But now for the new part. Immediately after synchronization, we check
the runningAnimationCounter. If it is > 0, then we fire off a new
pulse and leave the pendingPulse flag set to true. If
runningAnimationCounter == 0, then we flip pendingPulse to false.
Other than the pick that always happens at the end of the pulse, we
do nothing else new and, if the pick didn't cause state to change, we
are now quiescent.

Meanwhile, the render thread has run off doing its thing. The last
step of rendering is the present, where we will block until the thing
is presented, which, when we return, would put us *immediately* at
the start of the next 16.66ms cycle. Since the render thread has just
completed its duties, it goes back to waiting until the FX thread
comes around asking to sync up again.

If there is an animation going on such that a new pulse had been
fired immediately after synchronization, then that new pulse would
have been handled while the previous frame was being rendered. Most
likely, by the time the render thread completes presenting and comes
back to check with the FX thread, it will find that the FX thread is
already waiting for it with the next frame's data. It will synchronize
immediately and then carry on rendering another frame.


Given that you propose to fire a new pulse() whenever anything is 
changed in scene graph, and also right after synchronization, there is 
no need to have an external timer (QuantumToolkit.pulseTimer()) any longer.



I think the way this would behave is that, when an animation is first
played, you will get two pulses close to each other. The first pulse
will do its business and then synchronize and then immediately fire
off another pulse. That next pulse will then also get processed and
then the FX thread will block until the previous frame finishes
rendering. During this time, additional events (either application
generated via runLater calls happening on background threads, or from
OS events) will get queued up. Between pulse #2 and pulse #3 then a
bunch of other events will get processed, essentially playing
catch-up. My guess is that this won't be a problem but you might see
a hiccup at the start of a new animation if the event queue is too
full and it can't process all that stuff in 16ms (because at this
point we're really multi-threaded between the FX and render threads
and have nearly 16ms for each thread to do their business, instead of
only 8ms which is what you'd have in a single threaded system).

Another question I have is around resize events and how those work.
If they also come in to glass on the FX thread (but at a higher
priority than user events like a pulse or other input events?) then what
will happen is that we will get a resize event and process a half-a-pulse
(or maybe a whole pulse? animations+css+layout or just css+layout?) and then
render, pretty much just as fast as we can.

Re: A different way to handle pulse timing

2013-07-26 Thread David Hill

On 7/26/13 Jul 26, 1:22 AM, Richard Bair wrote:


As for multiple scenes, I'm actually curious how this happens today. If I have 
2 scenes, and we have just a single render thread servicing both, then when I 
go to present, it blocks? Or is there a non-blocking present method that we use 
instead? Because if we block, then having 2 scenes would cut you down to 30fps 
maximum, wouldn't it? If we are non-blocking today (is that possible?) then the 
only way this proposed solution would work is if there was a different render 
thread per stage (which actually is something I think we ought to be doing 
anyway?).


Currently we block/lock rendering using
AbstractPainter.renderLock
so you can do a "find usages" on it to see where it is used.

In general though, we use PaintCollector to stage render jobs in a pulse - one 
per dirty Scene. This operation is on the user event thread, and is blocked by 
any pending render tasks (in effect it waits for a current pulse to complete).
The PaintCollector uses "window state", similar to a shadow scene graph, so that 
the render operation is using state that is consistent (this is done using 
SceneState).
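
To illustrate the idea (invented names; see the real SceneState for details): the user event thread captures an immutable snapshot when it stages the render job, and the render job only ever reads that snapshot.

    // Sketch of the "shadow state" idea (illustrative, not the real SceneState):
    // the user event thread captures an immutable snapshot of the window/scene
    // state when it stages a render job, so the render thread never reads state
    // that a window operation might be changing underneath it.
    final class SceneStateSketch {

        final int width;
        final int height;
        final boolean visible;

        SceneStateSketch(int width, int height, boolean visible) {
            this.width = width;
            this.height = height;
            this.visible = visible;
        }

        // Built on the user event thread, executed later on the render thread;
        // the job only ever sees the captured snapshot.
        static Runnable stageRenderJob(SceneStateSketch snapshot) {
            return () -> {
                if (snapshot.visible) {
                    // render using snapshot.width x snapshot.height ...
                }
            };
        }
    }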

Each render task (scene) is executed separately on the render thread, and takes 
the renderLock while it is doing its thing. This means that there is an 
unlocked state between queued rendered scene tasks.

Rendering tends to be done via either PresentingPainter (accelerated) or 
UploadingPainter (sw).

The above is the simplified view - and shows that the user event thread can be 
happily running along doing stuff while the render thread is doing its thing - 
at least until it is blocked by needing to push another render pulse.

But... it is a touch more complicated than that, as we found that there are a number of user event 
thread operations that really can't be happening when we have a render operation going. These are 
mostly related to "window operations", like resizing or closing a window. Changing a 
windows state while rendering to it causes "unpredictable" results on many platforms. 
Because of this, there are a number of operations where we take the renderLock before calling from 
Quantum into glass. A sampling of these cases are WindowStage setScene, setVisible, close. The idea 
is that Glass should be treated as single threaded.
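
A stripped-down sketch of that locking pattern (illustrative names; the real lock is AbstractPainter.renderLock and the real call sites live in Quantum/Glass):

    import java.util.concurrent.locks.ReentrantLock;

    // Sketch of the locking pattern: one shared lock taken by the render thread
    // around each per-scene render task, and by the user event thread around
    // window operations that must not overlap with rendering. Names are
    // illustrative; the real lock is AbstractPainter.renderLock.
    final class RenderLockSketch {

        private static final ReentrantLock renderLock = new ReentrantLock();

        // Render thread: once per dirty scene per pulse; there is an unlocked
        // window between scene render tasks.
        static void renderScene(Runnable paintScene) {
            renderLock.lock();
            try {
                paintScene.run();
            } finally {
                renderLock.unlock();
            }
        }

        // User event thread: window operations such as setScene, setVisible or
        // close take the same lock before calling into Glass, so Glass is
        // effectively single threaded.
        static void windowOperation(Runnable glassCall) {
            renderLock.lock();
            try {
                glassCall.run();
            } finally {
                renderLock.unlock();
            }
        }
    }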

Note that embedded ARM behaves a touch differently - because we have a single GL graphics context 
and no "real" windows - we always paint every stage/scene from back to front. In effect, 
we are the compositor of the screen as we are the "window manager". This is pretty obvious 
in the PaintCollector class. Given that we have a single graphics context - there is no way we 
would want one render thread per scene there.


Another note - the addition of SceneState solved a problem, one that was easily 
seen with HelloWindowAbuse, which created/resized/closed windows at a frantic 
pace. There was some discussion at the time that we might have been able to 
save less state in SceneState because we already have that data in other 
places, like the SceneGraph. SceneState was a compromise solution that provided 
the quickest fix for the least amount of code reconstruction, which of course 
means that there is likely room for improvement.

So, that is my view of the elephant, and I am sure that others will have a 
different take :-)






--
David Hill 
Java Embedded Development

Education: that which reveals to the wise, and conceals from the stupid, the 
vast limits of their knowledge.
-- Mark Twain



A different way to handle pulse timing

2013-07-25 Thread Richard Bair
Hi,

I'm probably missing something obvious and you guys on Glass / Prism / Quantum 
can help set me straight. I was thinking tonight of a different way of 
initiating pulse events that would, I think, completely smooth out the pulses 
such that we don't end up with "drift" due to the timer being at a different 
rate than the GPU.

Suppose we have two variables in the system (and for simplicity lets talk about 
a single Scene, because one problem I think this idea has is with multiple 
scenes and I want to discuss that separately after the core mechanism is 
understood):

- boolean pendingPulse
- int runningAnimationCounter

Whenever an animation starts, the runningAnimationCounter is incremented. When 
an animation ends, it is decremented (or it could be a Set or 
whatever). The pendingPulse is simply false to start with, and is checked 
before we submit another pulse. Whenever a node in the scene graph becomes 
dirty, or the scene is resized, or stylesheets are changed, or in any case 
something happens that requires us to draw again, we check this flag and fire a 
new pulse if one is not already pending.

When a pulse occurs, we process animations first, then CSS, then layout, then 
validate all the bounds, and *then we block* until the rendering thread is 
available for synchronization. I believe this is what we are doing today (it 
was a change Steve and I looked at with Jasper a couple months ago IIRC).

But now for the new part. Immediately after synchronization, we check the 
runningAnimationCounter. If it is > 0, then we fire off a new pulse and leave 
the pendingPulse flag set to true. If runningAnimationCounter == 0, then we 
flip pendingPulse to false. Other than the pick that always happens at the end 
of the pulse, we do nothing else new and, if the pick didn't cause state to 
change, we are now quiescent.
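
Pulling those pieces together, a minimal FX-thread sketch of the proposal might look like this (Platform.runLater is the real JavaFX call; the flag, the counter and the phase methods are stand-ins):

    import java.util.concurrent.atomic.AtomicBoolean;
    import java.util.concurrent.atomic.AtomicInteger;

    import javafx.application.Platform;

    // Sketch of the proposed scheme: at most one pulse is pending at a time, and
    // immediately after synchronization another pulse is requested only while
    // animations are still running. Everything except Platform.runLater is a
    // stand-in for the real pulse machinery.
    final class PulseSchedulerSketch {

        private final AtomicBoolean pendingPulse = new AtomicBoolean(false);
        private final AtomicInteger runningAnimationCounter = new AtomicInteger();

        // Called whenever a node becomes dirty, the scene is resized, stylesheets
        // change, etc. - anything that requires us to draw again.
        void requestPulse() {
            if (pendingPulse.compareAndSet(false, true)) {
                Platform.runLater(this::pulse);
            }
        }

        void animationStarted() { runningAnimationCounter.incrementAndGet(); requestPulse(); }
        void animationStopped() { runningAnimationCounter.decrementAndGet(); }

        private void pulse() {
            processAnimations();
            processCss();
            processLayout();
            validateBounds();
            synchronizeWithRenderThread();   // blocks until the render thread is available

            // The new part: keep firing pulses only while animations are running.
            if (runningAnimationCounter.get() > 0) {
                Platform.runLater(this::pulse);   // pendingPulse stays true
            } else {
                pendingPulse.set(false);
            }
        }

        // Stand-ins for the real pulse phases.
        private void processAnimations() {}
        private void processCss() {}
        private void processLayout() {}
        private void validateBounds() {}
        private void synchronizeWithRenderThread() {}
    }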

Meanwhile, the render thread has run off doing its thing. The last step of 
rendering is the present, where we will block until the thing is presented, 
which, when we return, would put us *immediately* at the start of the next 
16.66ms cycle. Since the render thread has just completed its duties, it goes 
back to waiting until the FX thread comes around asking to sync up again.
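
And the render-thread side of that cycle, in the same sketchy style (invented names; present() stands in for whatever blocking present call the pipeline uses):

    import java.util.concurrent.SynchronousQueue;

    // Sketch of the render-thread side of the cycle: wait for the FX thread to
    // hand over a synchronized frame, render it, then block in present() until
    // the frame is on screen (vsync), which paces the whole loop. All names are
    // illustrative.
    final class RenderThreadSketch implements Runnable {

        private final SynchronousQueue<Object> syncPoint = new SynchronousQueue<>();

        // Called on the FX thread at the end of a pulse; blocks until the render
        // thread is ready to take the frame (the synchronization point).
        void handOff(Object frameData) throws InterruptedException {
            syncPoint.put(frameData);
        }

        @Override
        public void run() {
            try {
                while (true) {
                    Object frame = syncPoint.take();  // wait for the next synchronized frame
                    render(frame);
                    present();                        // blocks until vsync / presented
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        private void render(Object frame) { /* draw the render tree */ }
        private void present()            { /* swap buffers; returns at the start of the next 16.66ms cycle */ }
    }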

If there is an animation going on such that a new pulse had been fired 
immediately after synchronization, then that new pulse would have been handled 
while the previous frame was being rendered. Most likely, by the time the 
render thread completes presenting and comes back to check with the FX thread, 
it will find that the FX thread is already waiting for it with the next frame's 
data. It will synchronize immediately and then carry on rendering another frame.

I think the way this would behave is that, when an animation is first played, 
you will get two pulses close to each other. The first pulse will do its 
business and then synchronize and then immediately fire off another pulse. That 
next pulse will then also get processed and then the FX thread will block until 
the previous frame finishes rendering. During this time, additional events 
(either application generated via runLater calls happening on background 
threads, or from OS events) will get queued up. Between pulse #2 and pulse #3 
then a bunch of other events will get processed, essentially playing catch-up. 
My guess is that this won't be a problem but you might see a hiccup at the 
start of a new animation if the event queue is too full and it can't process 
all that stuff in 16ms (because at this point we're really multi-threaded 
between the FX and render threads and have nearly 16ms for each thread to do 
their business, instead of only 8ms which is what you'd have in a single 
threaded system).

Another question I have is around resize events and how those work. If they 
also come in to glass on the FX thread (but at a higher priority than user 
events like a pulse or other input events?) then what will happen is that we 
will get a resize event and process a half-a-pulse (or maybe a whole pulse? 
animations+css+layout or just css+layout?) and then render, pretty much just as 
fast as we can.

As for multiple scenes, I'm actually curious how this happens today. If I have 
2 scenes, and we have just a single render thread servicing both, then when I 
go to present, it blocks? Or is there a non-blocking present method that we use 
instead? Because if we block, then having 2 scenes would cut you down to 30fps 
maximum, wouldn't it? If we are non-blocking today (is that possible?) then the 
only way this proposed solution would work is if there was a different render 
thread per stage (which actually is something I think we ought to be doing 
anyway?).

Thanks
Richard