[ 
https://issues.apache.org/jira/browse/FLUME-1162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264496#comment-13264496
 ] 

Hari Shreedharan commented on FLUME-1162:
-----------------------------------------

bq. During reconfig testing, I found some possible issues. Sending to this list 
to seek clarification.
bq. 
bq. (1) Is it expected that the stop() and start() methods of a component 
should be called by a lifecycleSupervisor thread?
No, both these need not be called from the same thread. 

* The stop() method for a thread is called from the unsupervise() function in 
LifecycleSupervisor, which in turn is called by the conf poller thread. So the 
conf poller is what shuts down components.
 After the unsupervised() call returns, we then go ahead and call supervise for 
the new components. 

* The supervise call creates monitorRunnable threads which will start the 
components (via its runner.start()). So the component is actually started by a 
thread which is named lifecycleSupervisor-x-y.

bq. (2) If true for (1), then is it expected that start() and stop() should be 
called by the same lifecycleSupervisor thread (for the same component instance)?
So, no there is no such requirement. Even if (1) were true, it might still not 
be the same thread that calls start() and stop(), since these are managed 
through a thread pool of monitor threads. 
bq. (3) Is it expected that a new component instance is not created at each 
reconfig event, unless that component's instance name changes in the config 
file?
bq. 

No, components are always created. This is because during reconfiguration, we 
do not know what the old components are. I have filed a jira and am working on 
a design to make this smarter. 
bq. As an example, I have the following valid config as a bare minimal test 
config. It has no sink, but the config is still valid because this config could 
be used to stage/populate the channel:
bq. 
bq. agent.channels = c1
bq. agent.sources = r1
bq. agent.sinks = k1
bq. 
bq. agent.channels.c1.type = MEMORY
bq. 
bq. agent.sources.r1.channels = c1
bq. agent.sources.r1.type = org.apache.flume.source.custom.MyBasicSource
bq. 
bq. agent.sinks.k1.channel = c1
bq. agent.sinks.k1.type = NULL
bq. 
bq. The MyBasicSource class looks like this (I created an explicit ctor so that 
I can set a breakpoint on it, and I set breakpoints at all entry points of 
methods in AbstractSource to trace which thread calls which method of the 
source component instance):
bq. 
bq. public class MyBasicSource extends AbstractSource implements 
EventDrivenSource {
bq.     public MyBasicSource() {
bq. 
bq.     }
bq. }
bq. 
bq. After a reconfig event is triggered (I just touched the config file this 
time without modifying it), the source's setChannelProcessor() (with a new 
ChannelProcessor instance) is being called before calling stop(). So It seems 
to me that if stop() does cleanup work (light setting all fields to null, 
including a ChannelProcessor field), then the new ChannelProcessor instance 
(after reconfig) would be lost.
This is expected behavior. The new channel processor or source have not started 
yet. The start call is always after the stop call, we do create and configure 
the new component before we stop the old one. We stop the old one and only then 
we start the new one. So there is no risk. 

The case you have mentioned about the stop cleaning up is not a concern 
because, the source we are stopping is different from the new source which we 
are starting immediately after. This is true for the channel processor too, we 
do create a new channel processor in the new source object. So cleaning up an 
old component will not affect the new component which is about to be started 
since they are entirely different objects, and do not share fields. 
bq. In a similar test, I repeated the above steps but this time changed the 
instance name in the config from 'r1' to 'r2'. This time upon a reconfig, a 
call to the ctor and to setName() preceeded the call to stop(), and then after 
stop() comes start().

Again, perfectly fine. Like I mentioned above, the components are created and 
configured before we actually stop the previous component. We start the new 
components only after the stop is called. This is by design. We avoid the delay 
caused by component creation and initialization. If we create and initialize 
the components after we stop the old ones, we will see a major performance drop 
during reconfigs.
bq. 
bq. 
bq. In case that made you dizzy, here's a summary of behavior during a reconfig:
bq. 
bq. Format:
bq. method():thread (where <method>. is called by <thread>)
bq. 
bq. *****MyBasicSource():conf-file-poller-0
bq. *****setName(String):conf-file-poller-0
bq. *****setChannelProcessor(ChannelProcessor):conf-file-poller-0
bq. *****start():lifecycleSupervisor-1-2
bq. 
bq. <<conf file reconfig event occurs here.. changed the name of the source 
component instance from r1 to r2>
bq. 
bq. *****MyBasicSource():conf-file-poller-0
bq. *****setName(String):conf-file-poller-0
bq. *****setChannelProcessor(ChannelProcessor):conf-file-poller-0
bq. *****stop():conf-file-poller-0
bq. *****start():lifecycleSupervisor-1-4
bq. 
bq. 
bq. Some things to notice:
bq. 
bq. 1) The LifecycleAware method are called 3 different threads. start() is 
called by lifecycleSupervisor-1-2, stop() is called by conf-file-poller-0 (!... 
expected?), and the start() in the 2nd instance of MyBasicSource is called by 
lifecycleSupervisor-1-4. Although I can understand that different instances (r1 
and r2) can be called by different lifecycle threads, note that even the same 
instance (r1) had its start() method called by different lifecycle threads 
before and after reconfig. My concern about this relates to visibility of field 
values and happens-before relationships. If one thread calls stop() and sets 
field values to null, while another thread calls start to set field values to 
some value, then the call by the first thread might propagate to the working 
space of the 2nd thread sometime after thread 2 writes the value, which means 
that it could be possible that the new instance receives a null value sometime 
after the 2nd start() is called. JCIP.
Since the stop() and start() are actually happening on different objects and 
not on the same ones, there is no happens-before issue at all. Again, this is 
not a cause of concern, even if they were on the same object(just speculating 
for theoretical joy):

The stop() call happens before the start() call in the same thread. Therefore 
anything that happens in the start() can see what was done by stop. Now the 
start() creates runnables which are submitted to an executor service in the 
same thread, that called stop earlier. 

According to the Executor service, anything that happened before the submission 
of the runnable to the service has happened before the runnable is run, and is 
thus visible to 
it(http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/ExecutorService.html
 : Memory consistency effects: Actions in a thread prior to the submission of a 
Runnable or Callable task to an ExecutorService happen-before 
(http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/package-summary.html#MemoryVisibility)
 any actions taken by that task, which in turn happen-before the result is 
retrieved via Future.get())

Therefore stop() -> start() -> executor service runnable is the order, thus not 
violating the happens before relationship. The same is true within the executor 
service. Since we have only one runnable monitoring a component, every run of 
the runnable has happened before the next run, even if they were run in 
different threads of the thread pool. It is important to understand that every 
run of a runnable/callable is not guaranteed to be in the same thread of the 
thread pool, but it *is* guaranteed that, one run will see the effects of the 
previous run, so no need to worry :)


Also note that run() in monitor runnable is synchronized on that lifecycle 
aware, so the happens before relationship is even more explicit, so each run of 
the runnable always sees the previous run's updates. Again not really an issue 
for us since stop() and start() are called for different objects during a 
reconfiguration(but being synchronized will solve the problem even for same 
objects).

Again, none of the above is of concern to us, since the stop and start are on 
entirely different objects. :D

bq. 
bq. 2) After the reconfig, setName(..) and setChannelProcessor(..) are being 
called before stop().

As I mentioned above, perfectly fine.  We create and configure the new 
components before we actually stop the previous ones. We simply start the new 
ones after the old ones are stopped.


I hope this helps.
                
> Possible reconfig issue -- setName and setChannel being called before stop
> --------------------------------------------------------------------------
>
>                 Key: FLUME-1162
>                 URL: https://issues.apache.org/jira/browse/FLUME-1162
>             Project: Flume
>          Issue Type: Bug
>          Components: Configuration
>    Affects Versions: v1.2.0
>            Reporter: Will McQueen
>             Fix For: v1.2.0
>
>
> During reconfig testing, I found some possible issues. Sending to this list 
> to seek clarification.
> (1) Is it expected that the stop() and start() methods of a component should 
> be called by a lifecycleSupervisor thread?
> (2) If true for (1), then is it expected that start() and stop() should be 
> called by the same lifecycleSupervisor thread (for the same component 
> instance)?
> (3) Is it expected that a new component instance is not created at each 
> reconfig event, unless that component's instance name changes in the config 
> file?
> As an example, I have the following valid config as a bare minimal test 
> config. It has no sink, but the config is still valid because this config 
> could be used to stage/populate the channel:
> agent.channels = c1
> agent.sources = r1
> agent.sinks = k1
> agent.channels.c1.type = MEMORY
> agent.sources.r1.channels = c1
> agent.sources.r1.type = org.apache.flume.source.custom.MyBasicSource
> agent.sinks.k1.channel = c1
> agent.sinks.k1.type = NULL
> The MyBasicSource class looks like this (I created an explicit ctor so that I 
> can set a breakpoint on it, and I set breakpoints at all entry points of 
> methods in AbstractSource to trace which thread calls which method of the 
> source component instance):
> public class MyBasicSource extends AbstractSource implements 
> EventDrivenSource {
>     public MyBasicSource() {
>     }
> }
> After a reconfig event is triggered (I just touched the config file this time 
> without modifying it), the source's setChannelProcessor() (with a new 
> ChannelProcessor instance) is being called before calling stop(). So It seems 
> to me that if stop() does cleanup work (light setting all fields to null, 
> including a ChannelProcessor field), then the new ChannelProcessor instance 
> (after reconfig) would be lost.
> In a similar test, I repeated the above steps but this time changed the 
> instance name in the config from 'r1' to 'r2'. This time upon a reconfig, a 
> call to the ctor and to setName() preceeded the call to stop(), and then 
> after stop() comes start().
> In case that made you dizzy, here's a summary of behavior during a reconfig:
> Format:
> method():thread (where <method> is called by <thread>)
> *****MyBasicSource():conf-file-poller-0
> *****setName(String):conf-file-poller-0
> *****setChannelProcessor(ChannelProcessor):conf-file-poller-0
> *****start():lifecycleSupervisor-1-2
> <<conf file reconfig event occurs here.. changed the name of the source 
> component instance from r1 to r2>
> *****MyBasicSource():conf-file-poller-0
> *****setName(String):conf-file-poller-0
> *****setChannelProcessor(ChannelProcessor):conf-file-poller-0
> *****stop():conf-file-poller-0
> *****start():lifecycleSupervisor-1-4
> Some things to notice:
> 1) The LifecycleAware method are called 3 different threads. start() is 
> called by lifecycleSupervisor-1-2, stop() is called by conf-file-poller-0 
> (!... expected?), and the start() in the 2nd instance of MyBasicSource is 
> called by lifecycleSupervisor-1-4. Although I can understand that different 
> instances (r1 and r2) can be called by different lifecycle threads, note that 
> even the same instance (r1) had its start() method called by different 
> lifecycle threads before and after reconfig. My concern about this relates to 
> visibility of field values and happens-before relationships. If one thread 
> calls stop() and sets field values to null, while another thread calls start 
> to set field values to some value, then the call by the first thread might 
> propagate to the working space of the 2nd thread sometime after thread 2 
> writes the value, which means that it could be possible that the new instance 
> receives a null value sometime after the 2nd start() is called. JCIP.
> 2) After the reconfig, setName(..) and setChannelProcessor(..) are being 
> called before stop().

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to