Hi. I sent an email the other day about difficulties initializing DPDK with a particular NIC (Mellanox CX3Pro). Basically, I got bizarre errors when I added a dpdk port to a bridge using this card with OVS 2.5 + DPDK 16.04 (OVS 2.5 plus some commits on branch-2.5, and with a patch for the DPDK 16.04 constants), even though EAL initialization appeared to complete normally. I found that not daemonizing the vswitchd process avoided the issue, so I created a patch to vswitchd/ovs-vswitchd.c that initializes the EAL after daemonization instead of before, and this fixed it.

In the reply to my email I was asked to try 2.5.90 as well. I did, and the issue went away. I found that the commit that fixed it was bab6940, which, among other things, changed how DPDK is initialized: it is now initialized during bridge_run, which runs after the vswitchd process has daemonized. This commit can't be backported to 2.5 because it is also the commit that changes DPDK to initialize itself from the OVS database.

I then attempted to find out exactly what was causing this problem. An obvious explanation was that rte_eal_init creates threads which are then killed when the vswitchd process daemonizes afterwards. So I set a watchpoint for pthread_create and fork, ran ovs-vswitchd linked against DPDK 16.04, and found that there are actually several calls to pthread_create:

    RTE_LCORE_FOREACH_SLAVE(i) {

        /*
         * create communication pipes between master thread
         * and children
         */
        if (pipe(lcore_config[i].pipe_master2slave) < 0)
            rte_panic("Cannot create pipe\n");
        if (pipe(lcore_config[i].pipe_slave2master) < 0)
            rte_panic("Cannot create pipe\n");

        lcore_config[i].state = WAIT;

        /* create a thread for each lcore */
        ret = pthread_create(&lcore_config[i].thread_id, NULL,
                     eal_thread_loop, NULL);
        if (ret != 0)
            rte_panic("Cannot create thread\n");

        /* Set thread_name for aid in debugging. */
        snprintf(thread_name, RTE_MAX_THREAD_NAME_LEN,
            "lcore-slave-%d", i);
        ret = rte_thread_setname(lcore_config[i].thread_id,
                        thread_name);
        if (ret != 0)
            RTE_LOG(ERR, EAL,
                "Cannot set name for lcore thread\n");
    }

This happens during rte_eal_init, which is called before daemonization (if --daemonize is passed). This is true for both DPDK 2.2 and DPDK 16.04. To the best of my (incomplete) knowledge of how pthreads and Unix processes work, all of these threads die when the parent process exits as part of daemonization. Even so, this same software, without any changes, did work with the Niantic NIC. Could someone explain whether and how this is correct, or whether it needs fixing? If it does, is there a chance we can get a patch into branch-2.5 that changes the way DPDK initializes? bab6940 can't be used because it also changes the way DPDK gets its parameters.
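
To make the concern about the threads concrete, here is a minimal standalone sketch. It is not OVS or DPDK code, and every name in it (worker, ticks, worker_survives_fork, and so on) is made up for illustration. It starts one thread, forks the way a daemonizing process would, and has the child check whether that thread is still doing work in the child's copy of the process. It assumes a POSIX system and a C11 compiler.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical names throughout; none of this is OVS or DPDK code. */
static atomic_int ticks;  /* bumped by the worker while it is alive */
static atomic_int stop;   /* set by the parent to end the worker */

static void sleep_ms(long ms)
{
    struct timespec ts = { .tv_sec = ms / 1000,
                           .tv_nsec = (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Stand-in for an lcore thread created during early initialization. */
static void *worker(void *arg)
{
    (void)arg;
    while (!atomic_load(&stop)) {
        atomic_fetch_add(&ticks, 1);
        sleep_ms(1);
    }
    return NULL;
}

/* Returns 1 if the worker is still running in the forked child,
 * 0 if it is not, and -1 on error. */
int worker_survives_fork(void)
{
    pthread_t tid;
    pid_t pid;
    int status, rc;
    int start = atomic_load(&ticks);

    atomic_store(&stop, 0);
    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;
    while (atomic_load(&ticks) == start)
        sleep_ms(1);  /* wait until the worker is really running */

    pid = fork();  /* the step a daemonizing process performs */
    if (pid == 0) {
        /* Child: it has a private copy of ticks, which can only
         * change if a worker thread exists in this process. */
        int before = atomic_load(&ticks);
        sleep_ms(100);
        _exit(atomic_load(&ticks) != before ? 1 : 0);
    }

    /* Parent: stop and join the worker, then read the child's verdict. */
    atomic_store(&stop, 1);
    rc = pthread_join(tid, NULL);
    assert(rc == 0);
    (void)rc;
    if (pid < 0 || waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling worker_survives_fork() from a small main() and printing the result is enough to try it (build with -pthread). If fork behaves as POSIX describes, with the child containing only the thread that called fork, I would expect it to return 0. That is the situation I believe the lcore threads are in when rte_eal_init runs before daemonization.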


Thanks,

  John

_______________________________________________
discuss mailing list
discuss@openvswitch.org
http://openvswitch.org/mailman/listinfo/discuss
