As I mentioned before: "The information for an AsterixDB instance is "lazily" refreshed when a management operation is invoked (using managix set of commands) or an explicit describe command is invoked."
Above, "commands" refers to the Managix set of commands (create, start, describe, etc.) that trigger a refresh, which is why the refresh is "lazy". Currently the CC does not notify Managix. What we are discussing is an elegant way to have the CC relay information to Managix.

On Tue, Aug 25, 2015 at 4:10 AM, abdullah alamoudi <[email protected]> wrote:

> I don't think that is there yet but the intention is to have it at some point in the future.
>
> Cheers,
> Abdullah.
>
> On Tue, Aug 25, 2015 at 12:38 PM, Chris Hillery <[email protected]> wrote:
>
> > Very interesting, thank you. Can you point out a couple places in the code where some of this logic is kept? Specifically where "CC can update this information and notify Managix" sounds interesting...
> >
> > Ceej
> > aka Chris Hillery
> >
> > On Tue, Aug 25, 2015 at 12:49 AM, Raman Grover <[email protected]> wrote:
> >
> > > > , and what code is responsible for keeping it up-to-date?
> > > >
> > > Apparently, no one is :-)
> > >
> > > The information for an AsterixDB instance is "lazily" refreshed when a management operation is invoked (using managix set of commands) or an explicit describe command is invoked.
> > > Between the time t1 (when state of an AsterixDB instance changes, say due to NC failure) and t2 (when a management operation is invoked), the information about the AsterixDB instance inside Zookeeper remains stale. CC can update this information and notify Managix; this way Managix realizes the changed state as soon as it has occurred. This can be particularly useful when showing on a management console the up-to-date state of an instance in real time or having Managix respond to an event.
> > >
> > > Regards,
> > > Raman
> > >
> > > ---------- Forwarded message ----------
> > > From: abdullah alamoudi <[email protected]>
> > > Date: Tue, Aug 25, 2015 at 12:27 AM
> > > Subject: Re: The solution to the sporadic connection refused exceptions
> > > To: [email protected]
> > >
> > > On Tue, Aug 25, 2015 at 3:40 AM, Chris Hillery <[email protected]> wrote:
> > >
> > > > Perhaps an aside, but: exactly what is kept in Zookeeper
> > >
> > > A serialized instance of edu.uci.ics.asterix.event.model.AsterixInstance
> > >
> > > > , and what code is responsible for keeping it up-to-date?
> > >
> > > Apparently, no one is :-)
> > >
> > > > Ceej
> > > >
> > > > On Mon, Aug 24, 2015 at 5:28 PM, Raman Grover <[email protected]> wrote:
> > > >
> > > > > Well, the state of an instance (and metadata including configuration) is kept in a Zookeeper instance that is accessible to Managix and CC. CC should be able to set the state of the cluster in Zookeeper under the right znode, which can be viewed by Managix.
> > > > >
> > > > > There exists a communication channel for CC and Managix to share information on state etc. I am not sure if we need another channel such as RMI between Managix and CC.
> > > > >
> > > > > Regards,
> > > > > Raman
> > > > >
> > > > > On Mon, Aug 24, 2015 at 12:58 PM, abdullah alamoudi <[email protected]> wrote:
> > > > >
> > > > > > Well, it depends on your definition of the boundaries of managix. What I did is that I added an RMI object in the InstallerDriver which basically listens for state changes from the cluster controller. This means some additional logic in the CCApplicationEntryPoint where, after the CC is ready, it contacts the InstallerDriver using RMI, and at that point only, the InstallerDriver can return to managix and tell it that the startup is complete.
> > > > > >
> > > > > > Not sure if this is the right way to do it but it definitely is better than what we currently have.
> > > > > > Abdullah.
> > > > > >
> > > > > > On Mon, Aug 24, 2015 at 10:00 PM, Chris Hillery <[email protected]> wrote:
> > > > > >
> > > > > > > Hopefully the solution won't involve additional important logic inside Managix itself?
> > > > > > >
> > > > > > > Ceej
> > > > > > > aka Chris Hillery
> > > > > > >
> > > > > > > On Mon, Aug 24, 2015 at 7:26 AM, abdullah alamoudi <[email protected]> wrote:
> > > > > > >
> > > > > > > > That works but it doesn't feel right doing it this way. I am going to fix this one for good.
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Abdullah.
> > > > > > > >
> > > > > > > > On Mon, Aug 24, 2015 at 5:11 PM, Ian Maxon <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > The way I assured liveness for the YARN installer was to try running "for $x in dataset Metadata.Dataset return $x" via the API. I just polled for a reasonable amount of time (though honestly, thinking about it now, the correct parameter to use for the polling interval is the startup wait time in the parameters file :) ). It's not perfect, but it gives fewer false positives than just checking ps for processes that look like CCs/NCs.
> > > > > > > > >
> > > > > > > > > - Ian.
> > > > > > > > >
> > > > > > > > > On Mon, Aug 24, 2015 at 5:03 AM, abdullah alamoudi <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > Now that I think about it, maybe we should provide multiple ways to do this: a polling mechanism to be used at arbitrary times and a pushing mechanism on startup.
> > > > > > > > > > I am going to start implementation of this and will probably use RMI for this task both ways (CC to InstallerDriver and InstallerDriver to CC).
> > > > > > > > > >
> > > > > > > > > > Cheers,
> > > > > > > > > > Abdullah.
> > > > > > > > > >
> > > > > > > > > > On Mon, Aug 24, 2015 at 2:19 PM, abdullah alamoudi <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > > So after further investigation, it turned out our startup process just starts the CC and NC processes and then makes sure the processes are running, and if the processes were found to be running, it returns the state of the cluster to be active and the subsequent test commands can start immediately.
> > > > > > > > > > >
> > > > > > > > > > > This means that the CC could've started but is not yet ready when we try to process the next command. To address this, we need a better way to tell when the startup procedure has completed. We can do this by pushing (CC informs installer driver when the startup is complete) or polling (the installer driver needs to actually query the CC for the state of the cluster).
> > > > > > > > > > >
> > > > > > > > > > > I can do either way so let's vote. My vote goes to the pushing mechanism.
> > > > > > > > > > > Thoughts?
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Aug 24, 2015 at 10:15 AM, abdullah alamoudi <[email protected]> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > This solution turned out to be incorrect. Actually, the test cases when I build after using the join method never fail, but running an actual asterix instance never succeeds, which is quite confusing.
> > > > > > > > > > > >
> > > > > > > > > > > > I also think that the startup script has a major bug where it might return before the startup is complete. More on this later......
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Aug 24, 2015 at 7:48 AM, abdullah alamoudi <[email protected]> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > It is highly unlikely that it is related.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > Abdullah.
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, Aug 24, 2015 at 5:45 AM, Chen Li <[email protected]> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > @Abdullah: Is this issue related to https://issues.apache.org/jira/browse/ASTERIXDB-1074? Ian and I plan to look into the details on Monday.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Sun, Aug 23, 2015 at 10:08 AM, abdullah alamoudi <[email protected]> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > About 3-4 days ago, I was working on the addition of the filesystem based feed adapter and it didn't take any time to complete. However, when I wanted to build and make sure all tests pass, I kept getting ConnectionRefused errors which caused the installer tests to fail every now and then.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I knew the new change had nothing to do with this failure, yet, I couldn't direct my attention away from this bug (It just bothered me so much and I knew it needs to be resolved ASAP). After wasting countless hours, I was finally able to figure out what was happening :-)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > In the startup routine, we start three Jetty web servers (Web interface server, JSON API server, and Feed server). Some time ago, we used to end the startup call before making sure the server.isStarted() method returns true on all servers. At that time, I introduced the waitUntilServerStarts method to make sure we don't return before the servers are ready. Turned out, that was an incorrect way to handle this (We can blame stackoverflow for this one!) and it is not enough that the server isStarted() returns true. The correct way to do this is to call the server.join() method after the server.start().
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > See:
> > > > > > > > > > > > > > > http://stackoverflow.com/questions/15924874/embedded-jetty-why-to-use-join
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > This was equally satisfying as it was frustrating and you are welcome for the future time I saved each of you :)
> > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > Amoudi, Abdullah.
> > > > > > > > > > > > >
> > > > > > > > > > > > > --
> > > > > > > > > > > > > Amoudi, Abdullah.
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > Amoudi, Abdullah.
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > Amoudi, Abdullah.
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Amoudi, Abdullah.
> > > > > > > >
> > > > > > > > --
> > > > > > > > Amoudi, Abdullah.
> > > > > >
> > > > > > --
> > > > > > Amoudi, Abdullah.
> > > > >
> > > > > --
> > > > > Raman
> > >
> > > --
> > > Amoudi, Abdullah.
> > >
> > > --
> > > Raman
>
> --
> Amoudi, Abdullah.

--
Raman
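For reference, here is a minimal sketch of the push option discussed in this thread: the CC writes the new state to a shared store and anyone watching (Managix, a management console) is told right away, instead of finding out at the next management command. Everything in it is hypothetical and for illustration only: `InstanceStateStore`, `watch`, `publish` and the state strings merely stand in for Zookeeper and the CC, and none of it is existing AsterixDB or Managix code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical in-memory stand-in for the shared state kept in Zookeeper. */
class InstanceStateStore {
    private String state = "INACTIVE";
    private final List<Consumer<String>> watchers = new ArrayList<>();

    /** Lazy path: the caller only learns the state when it asks for it. */
    synchronized String read() {
        return state;
    }

    /** Push path: register a callback that is invoked on every state change. */
    synchronized void watch(Consumer<String> watcher) {
        watchers.add(watcher);
    }

    /** Called by the (hypothetical) CC side whenever the cluster state changes. */
    void publish(String newState) {
        List<Consumer<String>> toNotify;
        synchronized (this) {
            state = newState;
            toNotify = new ArrayList<>(watchers);
        }
        // Notify outside the lock so a slow watcher cannot block readers.
        for (Consumer<String> watcher : toNotify) {
            watcher.accept(newState);
        }
    }
}

class PushVersusLazyDemo {
    public static void main(String[] args) {
        InstanceStateStore store = new InstanceStateStore();
        List<String> seenByManagix = new ArrayList<>();

        // The Managix side subscribes once, then hears about each change.
        store.watch(seenByManagix::add);

        // The CC side reports state changes as they happen.
        store.publish("ACTIVE");
        store.publish("UNUSABLE");

        System.out.println("pushed: " + seenByManagix);
        System.out.println("lazy read: " + store.read());
    }
}
```

With Zookeeper itself, the same shape would presumably come from the CC updating the instance znode and Managix setting a watch on it, but that wiring is left out here because it depends on how the znodes are laid out.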

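The polling option raised earlier in the thread (the installer driver keeps asking the CC whether it is ready, up to a deadline) can be sketched as below. This is only an illustration under stated assumptions: `ReadinessPoller` and `waitUntilReady` are made-up names, and the probe is a placeholder. In practice the probe might run a trivial query such as the Metadata.Dataset one Ian mentioned, and the timeout might come from the startup wait time in the parameters file.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper: poll a readiness probe until it succeeds or time runs out. */
class ReadinessPoller {

    /**
     * Calls the probe repeatedly, sleeping between attempts.
     * Returns true as soon as the probe reports ready, or false if the
     * timeout elapses (or the thread is interrupted) first.
     */
    static boolean waitUntilReady(BooleanSupplier probe, long timeoutMillis, long intervalMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (probe.getAsBoolean()) {
                return true;
            }
            long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMillis <= 0) {
                return false;
            }
            try {
                // Never sleep past the deadline.
                Thread.sleep(Math.max(1, Math.min(intervalMillis, remainingMillis)));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        int[] attempts = {0};
        // Placeholder probe: pretends the CC becomes ready on the third attempt.
        boolean ready = waitUntilReady(() -> ++attempts[0] >= 3, 2_000, 10);
        System.out.println("ready=" + ready + " after " + attempts[0] + " probes");
    }
}
```

Compared with checking that the CC and NC processes exist, a probe like this only reports success once the CC can actually answer, which is the gap behind the sporadic connection refused errors.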