Dear Ruben, Alain,
our improved iSCSI driver set, which we proposed earlier, should solve this issue.
As mentioned in the ticket, it allows hundreds of non-persistent virtual machines
to be started simultaneously.
The TM concurrency level is 15.
You can check the details at: http://dev.opennebula.org/issues/1592
All the best,
Mark Gergely
MTA-SZTAKI LPDS
On 2012.12.06., at 20:01, "Ruben S. Montero" wrote:
> Hi Alain,
>
> You are totally right, this may be a problem when instantiating
> multiple VMs at the same time. I've filed an issue to look for the
> best way to generate the TID [1].
>
> We'd be interested in updating the tgtadm_next_tid function in
> scripts_common.sh. Also, if the tgt server is getting overloaded by
> these simultaneous deployments, there are several ways to limit the
> concurrency of the TM (e.g., the -t option in oned.conf).
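Ruben's suggestion of updating tgtadm_next_tid could take several forms. One option, sketched here as an assumption and not taken from the OpenNebula code, is to hold an exclusive lock across both steps (computing the next TID and creating the target) so that no concurrent clone can read the target list in between. The sketch assumes bash on a Linux host with util-linux flock(1). The allocate_tid function and the lock file location are hypothetical, and a temporary directory plus mkdir stand in for the tgtd target table and "tgtadm --op new" so the example runs without tgtd. In the real driver these commands run on the iSCSI target host, so the lock would presumably have to be taken there.

```bash
#!/bin/bash
# Sketch only (not the OpenNebula implementation): serialize "compute next
# TID" and "create target" under one exclusive flock(1) lock, so concurrent
# clones cannot both read the same highest TID. A temporary directory stands
# in for tgtd's target table; mkdir stands in for "tgtadm --op new".
set -u

REGISTRY=$(mktemp -d)
LOCKFILE="$REGISTRY.lock"                 # hypothetical lock path

allocate_tid() {
    (
        flock -x 9 || exit 1              # block until we own the lock on fd 9
        last=$(ls "$REGISTRY" | sort -n | tail -n 1)
        tid=$(( ${last:-0} + 1 ))         # highest existing TID + 1
        mkdir "$REGISTRY/$tid" || exit 1  # claim the TID while still locked
        echo "$tid"
    ) 9>"$LOCKFILE"
}

# Fifteen clones asking for a TID at the same time.
for i in {1..15}; do
    allocate_tid > "$REGISTRY.out.$i" &
done
wait

TIDS=$(cat "$REGISTRY".out.* | sort -n | tr '\n' ' ')
COUNT=$(ls "$REGISTRY" | wc -l)
rm -rf "$REGISTRY" "$REGISTRY".out.* "$LOCKFILE"
echo "allocated TIDs: $TIDS"
```

Because each allocation reads and claims its TID under the lock, the fifteen concurrent calls receive the distinct TIDs 1 through 15. The cost is that target creation is serialized on that host, which is cheap next to the dd copy that precedes it in the clone script.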
>
> THANKS for the feedback!
>
> Ruben
>
> [1] http://dev.opennebula.org/issues/1682
>
> On Thu, Dec 6, 2012 at 1:52 PM, Alain Pannetrat wrote:
>> Hi all,
>>
>> I'm new to OpenNebula and this mailing list, so forgive me if I
>> stumble over a topic that may have already been discussed.
>>
>> I'm currently trying out OpenNebula 3.8.1 on a simple 3-node
>> system: a control node, a compute node and a datastore node
>> (iSCSI + LVM).
>>
>> I have been testing bulk instantiation of virtual machines in
>> Sunstone, starting the creation of 8 virtual machines in parallel.
>> I have noticed that between 2 and 4 of them fail to instantiate,
>> typically with the following error message:
>>
>> 08 2012 [TM][I]: Command execution fail: /var/lib/one/remotes/tm/iscsi/clone iqn.2012-02.org.opennebula:san.vg-one.lv-one-26 compute.admin.lan:/var/lib/one//datastores/0/111/disk.0 111 101
>> Thu Dec 6 14:40:08 2012 [TM][E]: clone: Command "set -e
>> Thu Dec 6 14:40:08 2012 [TM][I]: set -x
>> Thu Dec 6 14:40:08 2012 [TM][I]:
>> Thu Dec 6 14:40:08 2012 [TM][I]: # get size
>> Thu Dec 6 14:40:08 2012 [TM][I]: SIZE=$(sudo lvs --noheadings -o lv_size "/dev/vg-one/lv-one-26")
>> Thu Dec 6 14:40:08 2012 [TM][I]:
>> Thu Dec 6 14:40:08 2012 [TM][I]: # create lv
>> Thu Dec 6 14:40:08 2012 [TM][I]: sudo lvcreate -L${SIZE} vg-one -n lv-one-26-111
>> Thu Dec 6 14:40:08 2012 [TM][I]:
>> Thu Dec 6 14:40:08 2012 [TM][I]: # clone lv with dd
>> Thu Dec 6 14:40:08 2012 [TM][I]: sudo dd if=/dev/vg-one/lv-one-26 of=/dev/vg-one/lv-one-26-111 bs=64k
>> Thu Dec 6 14:40:08 2012 [TM][I]:
>> Thu Dec 6 14:40:08 2012 [TM][I]: # new iscsi target
>> Thu Dec 6 14:40:08 2012 [TM][I]: TID=$(sudo tgtadm --lld iscsi --op show --mode target | grep "Target" | tail -n 1 | awk '{split($2,tmp,":"); print tmp[1]+1;}')
>> Thu Dec 6 14:40:08 2012 [TM][I]:
>> Thu Dec 6 14:40:08 2012 [TM][I]: sudo tgtadm --lld iscsi --op new --mode target --tid $TID --targetname iqn.2012-02.org.opennebula:san.vg-one.lv-one-26-111
>> Thu Dec 6 14:40:08 2012 [TM][I]: sudo tgtadm --lld iscsi --op bind --mode target --tid $TID -I ALL
>> Thu Dec 6 14:40:08 2012 [TM][I]: sudo tgtadm --lld iscsi --op new --mode logicalunit --tid $TID --lun 1 --backing-store /dev/vg-one/lv-one-26-111
>> Thu Dec 6 14:40:08 2012 [TM][I]: sudo tgt-admin --dump |sudo tee /etc/tgt/targets.conf > /dev/null 2>&1" failed: + sudo lvs --noheadings -o lv_size /dev/vg-one/lv-one-26
>> Thu Dec 6 14:40:08 2012 [TM][I]: 131072+0 records in
>> Thu Dec 6 14:40:08 2012 [TM][I]: 131072+0 records out
>> Thu Dec 6 14:40:08 2012 [TM][I]: 8589934592 bytes (8.6 GB) copied, 898.903 s, 9.6 MB/s
>> Thu Dec 6 14:40:08 2012 [TM][I]: tgtadm: this target already exists
>> Thu Dec 6 14:40:08 2012 [TM][E]: Error cloning compute.admin.lan:/dev/vg-one/lv-one-26-111
>> Thu Dec 6 14:40:08 2012 [TM][I]: ExitCode: 22
>> Thu Dec 6 14:40:08 2012 [TM][E]: Error executing image transfer script: Error cloning compute.admin.lan:/dev/vg-one/lv-one-26-111
>> Thu Dec 6 14:40:09 2012 [DiM][I]: New VM state is FAILED
>>
>> After adding traces to the code, I found what seems to be a race
>> condition in /var/lib/one/remotes/tm/iscsi/clone, where the following
>> commands are executed:
>>
>> TID=\$($SUDO $(tgtadm_next_tid))
>> $SUDO $(tgtadm_target_new "\$TID" "$NEW_IQN")
>>
>> These commands are typically expanded to something like this:
>>