On Wed, Oct 29, 2014 at 03:22:25PM +0000, Ramia, Kannan Babu wrote: > The problem still remains if one of the process gets abruptly killed, then > the corresponding refcount will not get decremented. There should be some > scavenging function to be implemented to check the process liveliness. > Well, abnormal termination results in abnormal consequences. You expect garbage to get left behind of a program crashes, so I wouldn't really worry about that too much. If you really wanted to you can register chained handlers for SIGSEGV/SIGBUS/etc to catch those conditions, but honestly, that seems like overkill. If a program that uses shared resources terminates abnormally, its well understood that those shared resources may not get released properly, and manual intervention is required to clean them up
Neil > regards > Kannan Babu > > -----Original Message----- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Neil Horman > Sent: Wednesday, October 29, 2014 7:58 PM > To: Richardson, Bruce > Cc: dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH] add free hugepage function > > On Wed, Oct 29, 2014 at 10:26:35AM +0000, Bruce Richardson wrote: > > On Wed, Oct 29, 2014 at 02:49:05PM +0800, Linhaifeng wrote: > > > > > > > > > On 2014/10/29 13:26, Qiu, Michael wrote: > > > > ? 10/29/2014 11:46 AM, Matthew Hall ??: > > > >> On Wed, Oct 29, 2014 at 03:27:58AM +0000, Qiu, Michael wrote: > > > >>> I just saw one return path with value '0', and no any other > > > >>> place return a negative value, so it is better to be designed > > > >>> as one non-return function, > > > >>> > > > >>> +void > > > >>> +rte_eal_hugepage_free(void) > > > >>> +{ > > > >>> + struct hugepage_file *hugepg_tbl = g_hugepage_table.hugepg_tbl; > > > >>> + unsigned i; > > > >>> + unsigned nr_hugefiles = g_hugepage_table.nr_hugefiles; > > > >>> + > > > >>> + RTE_LOG(INFO, EAL, "unlink %u hugepage files\n", > > > >>> +nr_hugefiles); > > > >>> + > > > >>> + for (i = 0; i < nr_hugefiles; i++) { > > > >>> + unlink(hugepg_tbl[i].filepath); > > > >>> + hugepg_tbl[i].orig_va = NULL; > > > >>> + } > > > >>> +} > > > >>> + > > > >>> > > > >>> Thanks, > > > >>> Michael > > > >> Actually, I don't think that's quite right. > > > >> > > > >> http://linux.die.net/man/2/unlink > > > >> > > > >> "On success, zero is returned. On error, -1 is returned, and > > > >> errno is set appropriately." So it should be returning an error, > > > >> and logging a message for a file it cannot unlink or people will be > > > >> surprised with weird failures. > > > > > > > > Really need one message for unlink failed, but I'm afraid that if > > > > it make sense for return an error code when application exit. > > > > > > > > Thanks > > > > Michael > > > >> It also had some minor typos / English in the comments but we can fix > > > >> that too. > > > >> > > > >> Matthew. > > > >> > > > > > > > > > > > > > > > Agree.May be it is not need to return error? > > > -- > > > Regards, > > > Haifeng > > > > > > > May I throw out a few extra ideas. > > > > The basic problem with DPDK and hugepages on exit, as you have already > > diagnosed, is not that the hugepages aren't released for re-use, its > > that the files inside the hugepage directory aren't unlinked. There > > are two > I agree, this patch seems rather unbalanced. Hugepages are initalized on > startup, not allocated explicitly, and so should not be freed on exit. > Instead, they should be globally refcounted and (potentially per > configuration) freed when the last application to exit drops the refcount to > zero. > > > considerations here that prevented us from coming up previously with a > > solution to this that we were fully happy with. > > 1. There is no automatic way (at least none that I'm aware of), to > > have the kernel automatically delete a file on exit. This means that > > any scheme to delete the files on application termination will have to > > be a voluntary one that won't happen if an app crashed or is forcibly > > terminated. That is why the EAL right now does the clean-up on the restart > > of the app instead. > Thats not true. The common POSIX compliant method is to use the atexit() > function to register a function to be called when a process terminates. It > can be used from rte_eth_hugepage_init so that a refcount is increased there > and decreased when the process exits. The atexit registered function can > also check the refcount for zero and, dependent on configuration, unlink the > hugepage files. That way the application can be completely unaware of a need > to do freeing/unlinking > > > 2. The other consideration is multi-process. In a multi-process > > environment, it is unclear when exactly an app is finished - just > > because one process terminates does not mean that the whole app is > > finished with the hugepage files. If the files are deleted before a > > DPDK secondary process starts up, it won't be able to start as it > > won't be able to mmap the hugepage memory it needs. > > > Refcounting fixes that. If every process takes a reference count on the > hugepage set, then you unlink it when the refcount hits zero. > > > Now, that being said, I think we could see about having automatic > > hugepage deletion controlled by an EAL parameter. That way, the > > parameter could be omitted in the multi-process case, but could be > > used for those who want to have a clean hugepage dir after app termination. > > > > What I think we could do is, instead of having the app save the list > > of hugepages it uses as the app initializes, is to have the app delete > > the hugepage files as soon as it closes the filehandle for them. If my > > understanding is correct, the kernel will actually keep the hugepage > > memory around in the app as usual from that point onwards, and will > > only release it back [automatically] on app termination. New processes > > won't be able to mmap the file, but the running process will be > > unaffected. The best thing about this approach is that the hugepage > > file entries will be deleted whether or not the application terminates > > normally or crashes. > > > That doesn't really work in a use case in which processes are created and > removed (i.e. if you have two processes, one exits, then another one is > forked, that third process won't find the hugepage file, because the second > process will have unlinked it on exit). > > > So, any thoughts? Would such a scheme work? I haven't prototyped it, > > yet, but I think it should work. > > > > Regards, > > /Bruce > > > > PS: As well as multi-process not being able to use this scheme, I'd > > also note that this would prevent the dump_cfg utility app - or any > > other DPDK analysis app like it - from being used to inspect the > > memory area of the running process. That is another reason why I think > > the flag should not be set by default. > > > >