[ 
https://issues.apache.org/jira/browse/VCL-693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13804575#comment-13804575
 ] 

Azade Khalaj edited comment on VCL-693 at 10/24/13 8:37 PM:
------------------------------------------------------------

When a cluster reservation is created, a vcld process running the "reserved" 
module is started for each computer. This process changes the state of the 
computer to "reserved", runs the VCL post_reserve scripts, etc., and then keeps 
running until the user logs into the computer. If the user never logs into one 
of the computers in the cluster reservation, the vcld process for that computer 
keeps running. When the reservation is reinstalled, the main vcld process does 
not create a new child vcld process with the "reserved" module for that 
computer, because it sees that a process is still running for it (vcld, 
subroutine: main, line: 276). As a result, the state of the computer is not 
changed to "reserved" and the post_reserve scripts are not executed. The 
"Pending ..." message on the "Current Reservations" page changes to the 
"Connect" button only once every computer in the cluster reservation is in the 
"reserved" state. In this situation, the computers that were not logged into 
before the reinstallation never reach the "reserved" state, so the "Connect" 
button never appears after the cluster reservation is reinstalled.
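
The failure described above can be illustrated with a toy model. This is only 
a sketch in Python, not vcld code (vcld itself is written in Perl): the 
Computer class, its fields, and the function names are hypothetical, and the 
"reloading" state is borrowed from the issue description.

```python
from dataclasses import dataclass


@dataclass
class Computer:
    name: str
    state: str
    has_running_process: bool  # hypothetical flag: a vcld child is still alive


def should_spawn_reserved(computer):
    # Models the check in the main process: skip a computer that
    # still has a process running for it.
    return not computer.has_running_process


def connect_button_visible(computers):
    # "Connect" replaces "Pending ..." only when every computer is reserved.
    return all(c.state == "reserved" for c in computers)


def reinstall(computers):
    # The "new" module runs for every computer and reloads it.
    for c in computers:
        c.state = "reloading"
    # The main process then starts the "reserved" module where allowed.
    for c in computers:
        if should_spawn_reserved(c):
            c.state = "reserved"
            c.has_running_process = True
    return computers


# The user logged into "parent" (its process exited) but never into "child".
cluster = [
    Computer("parent", "reserved", has_running_process=False),
    Computer("child", "reserved", has_running_process=True),
]
reinstall(cluster)
print([(c.name, c.state) for c in cluster], connect_button_visible(cluster))
```

In this model the child never leaves the "reloading" state because its old 
process blocks the new one, so the "Connect" condition is never met.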

Our solution:
At the start of the "new" module, which is executed for every computer after a 
reinstallation, a script kills all processes that are still running for that 
computer. These processes have <requestid>:<reservationid> in their name (the 
vcld process that is running the "new" module is excluded). As a result, once 
the "new" module finishes, the main vcld process processes the reservation for 
every computer and runs the "reserved" module on each of them. The state of all 
computers then changes to "reserved" and the "Connect" button appears.
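
The selection step of this cleanup might look roughly like the sketch below. 
It is written in Python for illustration and is not the script we actually 
use: the function names, the (pid, name) input format, and the example process 
names are assumptions. Only the <requestid>:<reservationid> tag and the 
exclusion of the current process come from the description above.

```python
import os
import re
import signal


def find_stale_pids(processes, request_id, reservation_id, own_pid):
    """Return PIDs whose name contains request_id:reservation_id.

    processes is an iterable of (pid, name) pairs; how that list is
    obtained (for example from ps output) is left out of this sketch.
    """
    # Digit boundaries keep tag 12:34 from matching inside 112:345.
    tag = re.compile(rf"(?<!\d){request_id}:{reservation_id}(?!\d)")
    return [
        pid
        for pid, name in processes
        if tag.search(name) and pid != own_pid  # never select ourselves
    ]


def kill_stale(processes, request_id, reservation_id, own_pid=None):
    """Send SIGTERM to every stale process and return the PIDs targeted."""
    if own_pid is None:
        own_pid = os.getpid()
    stale = find_stale_pids(processes, request_id, reservation_id, own_pid)
    for pid in stale:
        try:
            os.kill(pid, signal.SIGTERM)
        except ProcessLookupError:
            pass  # the process exited on its own in the meantime
    return stale
```

Keeping the selection separate from the kill step makes it possible to check 
which processes would be affected before any signal is sent.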



> VCL Cluster Reinstall Fails
> ---------------------------
>
>                 Key: VCL-693
>                 URL: https://issues.apache.org/jira/browse/VCL-693
>             Project: VCL
>          Issue Type: Bug
>          Components: vcld (backend)
>    Affects Versions: 2.3.1
>         Environment: CentOS, libvirt/kvm
>            Reporter: Nathaniel Sherry
>              Labels: cluster, vcld
>
> It seems that when I reinstall a cluster from the "Current Reservations" 
> page, the post_reserve scripts don't get run on the child nodes. I think the 
> issue is that vcld isn't spawning a child process for the child nodes when 
> invoking reserved.pm after finishing new.pm during the reinstallation. The 
> computer state of the child is also left at "Reloading". 
> This results in the "Connect" button never reappearing.
> DB - http://pastebin.com/waHeP2wd
> Log - http://pastebin.com/EsqZhbTi



--
This message was sent by Atlassian JIRA
(v6.1#6144)
