[
https://issues.apache.org/jira/browse/VCL-693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13804575#comment-13804575
]
Azade Khalaj edited comment on VCL-693 at 10/24/13 8:37 PM:
------------------------------------------------------------
During the creation of a cluster reservation, a vcld process running the
"reserved" module is created for each computer. This process changes the
state of the computer to "reserved", runs the VCL post reserve scripts, etc.,
and then stays running until the user logs into the computer. If the user
never logs into one of the computers of the cluster reservation, the vcld
process for that computer stays running. After the reservation is
reinstalled, the main vcld process does not create a child vcld process with
the "reserved" module for that computer, because it sees that a process is
still running for it (vcld, subroutine: main, line: 276). As a result, the
state of the computer is not changed to "reserved" and the post reserve
scripts are not executed. The "Pending ..." message on the "Current
Reservations" page changes to the "Connect" button only when the state of
every computer in the cluster reservation is "reserved". In this situation,
the computers that were not logged into before the reinstallation never reach
the "reserved" state, so the "Connect" button never appears after the cluster
reservation is reinstalled.
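The failure described above can be illustrated with a toy model. This is
only a sketch in Python, not the actual vcld code (which is Perl): the
function names, the "<requestid>:<reservationid>" keys, and the use of
"reloading" as the stuck state are assumptions made for illustration.

```python
def should_spawn_reserved(reservation_key, running_keys):
    # Assumed model of the check in vcld main: no child process is created
    # while a process for the same reservation is still running.
    return reservation_key not in running_keys


def connect_button_visible(states):
    # "Connect" replaces "Pending ..." only when every computer is "reserved".
    return all(state == "reserved" for state in states.values())


def simulate_reinstall(cluster, leftover_keys):
    # Computers with a leftover process never run the "reserved" module,
    # so their state is not updated.
    states = {}
    for key in cluster:
        if should_spawn_reserved(key, leftover_keys):
            states[key] = "reserved"
        else:
            states[key] = "reloading"
    return states
```

With one leftover process in a two-node cluster, one node stays out of the
"reserved" state and the button check fails for the whole reservation.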
Our solution:
At the start of the "new" module, which is executed for every computer after
reinstallation, a script is run that kills all processes still running for
that computer. These processes have <requestid>:<reservationid> in their
name (the vcld process currently running the "new" module is excluded). As a
result, once the "new" module finishes, the main vcld process processes the
reservation for every computer and runs the "reserved" module for each of
them. The state of all computers then changes to "reserved" and the "Connect"
button appears.
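A minimal sketch of the selection step in such a cleanup script, written in
Python for illustration (the real script is not shown in this comment). The
process-name layout, the function names, and the choice of SIGTERM are
assumptions. The only detail taken from the description is that the name
contains <requestid>:<reservationid> and that the current process is skipped.

```python
import os
import re
import signal


def find_stale_pids(processes, request_id, reservation_id, own_pid):
    """Return PIDs whose name contains '<request_id>:<reservation_id>'.

    `processes` is a list of (pid, name) pairs. The digit guards keep
    12:34 from matching inside 112:345.
    """
    tag = re.compile(r"(?<!\d)%d:%d(?!\d)" % (request_id, reservation_id))
    return [pid for pid, name in processes
            if pid != own_pid and tag.search(name)]


def kill_stale(processes, request_id, reservation_id, own_pid=None):
    """Send SIGTERM to every stale process and return the PIDs targeted."""
    if own_pid is None:
        own_pid = os.getpid()  # never kill the process running "new"
    stale = find_stale_pids(processes, request_id, reservation_id, own_pid)
    for pid in stale:
        try:
            os.kill(pid, signal.SIGTERM)
        except ProcessLookupError:
            pass  # the process exited on its own in the meantime
    return stale
```

Keeping the matching separate from the kill makes it possible to check which
processes would be targeted before any signal is sent.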
> VCL Cluster Reinstall Fails
> ---------------------------
>
> Key: VCL-693
> URL: https://issues.apache.org/jira/browse/VCL-693
> Project: VCL
> Issue Type: Bug
> Components: vcld (backend)
> Affects Versions: 2.3.1
> Environment: CentOS, libvirt/kvm
> Reporter: Nathaniel Sherry
> Labels: cluster, vcld
>
> It seems that when I reinstall a cluster from the "Current Reservations"
> page, the post_reserve scripts don't get run on the child nodes. I think the
> issue is that vcld isn't spawning a child process for the child nodes when
> invoking reserved.pm after finishing new.pm during the reinstallation. The
> computer state of the child is also left at "Reloading".
> This results in the "Connect" button never reappearing.
> DB - http://pastebin.com/waHeP2wd
> Log - http://pastebin.com/EsqZhbTi