On 12/08/16 12:08 pm, "Fam Zheng" <f...@redhat.com> wrote:





>On Wed, 08/10 18:30, Michael S. Tsirkin wrote:
>> From: Prerna Saxena <prerna.sax...@nutanix.com>
>> 
>> The set_mem_table command currently does not seek a reply. Hence, there is
>> no easy way for a remote application to notify to QEMU when it finished
>> setting up memory, or if there were errors doing so.
>> 
>> As an example:
>> (1) Qemu sends a SET_MEM_TABLE to the backend (eg, a vhost-user net
>> application). SET_MEM_TABLE does not require a reply according to the spec.
>> (2) Qemu commits the memory to the guest.
>> (3) Guest issues an I/O operation over a new memory region which was 
>> configured on (1).
>> (4) The application has not yet remapped the memory, but it sees the I/O 
>> request.
>> (5) The application cannot satisfy the request because it does not know 
>> about those GPAs.
>> 
>> While a guaranteed fix would require a protocol extension (committed 
>> separately),
>> a best-effort workaround for existing applications is to send a GET_FEATURES
>> message before completing the vhost_user_set_mem_table() call.
>> Since GET_FEATURES requires a reply, an application that processes vhost-user
>> messages synchronously would probably have completed the SET_MEM_TABLE 
>> before replying.
>> 
>> Signed-off-by: Prerna Saxena <prerna.sax...@nutanix.com>
>> Reviewed-by: Michael S. Tsirkin <m...@redhat.com>
>> Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
>
>Sporadic hangs are seen with test-vhost-user after this patch:
>
>https://travis-ci.org/qemu/qemu/builds
>
>Reverting seems to fix it for me.
>
>Is this a known problem?
>
>Fam

Hi Fam,
Thanks for reporting the sporadic hangs. I had seen ‘make check’ pass on my 
Centos 6 environment, so missed this.
I am setting up the docker test env to repro this, but I think I can guess the 
problem :

In tests/vhost-user-test.c: 

static void chr_read(void *opaque, const uint8_t *buf, int size)
{
..[snip]..

    case VHOST_USER_SET_MEM_TABLE:
       /* received the mem table */
       memcpy(&s->memory, &msg.payload.memory, sizeof(msg.payload.memory));
       s->fds_num = qemu_chr_fe_get_msgfds(chr, s->fds, G_N_ELEMENTS(s->fds));


       /* signal the test that it can continue */
       g_cond_signal(&s->data_cond);
       break;
..[snip]..
}


The test seems to be marked complete as soon as mem_table is copied. 
However, this patch 3/3 changes the behaviour of the SET_MEM_TABLE vhost 
command implementation with qemu. SET_MEM_TABLE now sends out a new message 
GET_FEATURES, and the call is only completed once it receives features from the 
remote application. (or the test framework, as is the case here.)
While the test itself can be modified (Do not signal completion until we’ve 
sent a follow-up response to GET_FEATURES), I am now wondering if this patch 
may break existing vhost applications too ? If so, reverting it possibly better.
What confuses me is why it doesn’t fail all the time, but only about 20% to 30% 
time as Fam reports. 

Thoughts : Michael, Fam, MarcAndre ?

Regards,
Prerna 

Reply via email to