[dpdk-dev] DPDK's vhost-user logging capability

2016-03-23 Thread shesha Sreenivasamurthy (shesha)
Hi All,

I was going over the vhost-user migration capability in DPDK in the context
of Cisco's multi-queue DPDK vhost-user application. I see that the log_base
address is implemented per virtio_net device. However, desc, addr and used
are per vhost_virtqueue. Additionally, QEMU sends one VHOST_USER_SET_LOG_BASE
per queue pair (QEMU hw/virtio/vhost.c::vhost_dev_set_log).

Does this mean we need to log dirty pages of all rings to the same location?
If so, why does QEMU send a separate VHOST_USER_SET_LOG_BASE per queue
pair?
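The question depends on how the log is indexed. As I understand the vhost-user protocol, the dirty log is a bitmap keyed by guest physical address, with one bit per 4 KiB page. That means the bit for a given page is the same no matter which virtqueue wrote it. Below is a minimal sketch under that assumption. `LOG_PAGE_SIZE`, `log_dirty_range()` and `page_is_dirty()` are hypothetical names, not DPDK or QEMU APIs, and `__atomic_fetch_or` is a GCC/Clang builtin.

```c
#include <stdint.h>

/* Assumption: one log bit per 4 KiB guest-physical page. */
#define LOG_PAGE_SIZE 4096ULL

/* Mark every page touched by [gpa, gpa + len) as dirty. The bitmap is
 * indexed only by guest physical address, so any virtqueue can write
 * into the same bitmap. The atomic OR keeps concurrent writers from
 * losing each other's bits when they hit the same byte. */
static void log_dirty_range(uint8_t *log_base, uint64_t gpa, uint64_t len)
{
    if (len == 0)
        return;
    uint64_t first = gpa / LOG_PAGE_SIZE;
    uint64_t last = (gpa + len - 1) / LOG_PAGE_SIZE;
    for (uint64_t page = first; page <= last; page++)
        __atomic_fetch_or(&log_base[page / 8],
                          (uint8_t)(1u << (page % 8)),
                          __ATOMIC_RELAXED);
}

/* Return non-zero if the page containing gpa is marked dirty. */
static int page_is_dirty(const uint8_t *log_base, uint64_t gpa)
{
    uint64_t page = gpa / LOG_PAGE_SIZE;
    return (log_base[page / 8] >> (page % 8)) & 1;
}
```

If several queue pairs serviced from different threads share the bitmap, the atomic OR matters. A plain `|=` on the same byte from two threads can lose one of the bits.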

--
- Thanks
char * (*shesha) (uint64_t cache, uint8_t F00D)
{ return 0xC0DE; }


[dpdk-dev] Reshuffling of rte_mbuf structure.

2015-11-04 Thread shesha Sreenivasamurthy (shesha)
Is there a way to just define the fields that must be present in the mbuf
structure, while leaving their position and size implementation-dependent?
The application could provide an "mbuf_impl.h" that lists the mbuf_rte
fields in the order that suits it.
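Here is a minimal sketch of what an application-provided layout and a library-side contract might look like. It assumes C11 `_Static_assert`. The struct `app_mbuf`, the macro `REQUIRE_FIELD` and `chain_len()` are all hypothetical. The idea is that library code refers to fields only by name, while a compile-time check rejects a layout that drops a required field or changes its width.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "mbuf_impl.h": the field set is fixed, but the order
 * (and so the cache-line placement) is the application's choice. */
struct app_mbuf {
    void *buf_addr;
    uint32_t pkt_len;
    uint16_t data_len;
    uint16_t nb_segs;
    struct app_mbuf *next;
    uint32_t rss_hash;
};

/* Library-side contract: each required field must exist with the
 * expected width. A missing member is a hard compile error; a wrong
 * width trips the assertion message. */
#define REQUIRE_FIELD(type, field, size) \
    _Static_assert(sizeof(((type *)0)->field) == (size), \
                   #field " missing or wrong size")

REQUIRE_FIELD(struct app_mbuf, next, sizeof(struct app_mbuf *));
REQUIRE_FIELD(struct app_mbuf, pkt_len, 4);
REQUIRE_FIELD(struct app_mbuf, nb_segs, 2);
REQUIRE_FIELD(struct app_mbuf, rss_hash, 4);

/* Library code touches fields only by name, so it compiles unchanged
 * against any layout that satisfies the contract. */
static uint32_t chain_len(const struct app_mbuf *m)
{
    uint32_t n = 0;
    for (; m != NULL; m = m->next)
        n++;
    return n;
}
```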

--
- Thanks
char * (*shesha) (uint64_t cache, uint8_t F00D)
{ return 0xC0DE; }

From: Matthew Hall <mh...@mhcomputing.net>
Date: Monday, November 2, 2015 at 4:21 PM
To: Thomas Monjalon <thomas.monjalon at 6wind.com>
Cc: Cisco Employee <shesha at cisco.com>, Arnon Warshavsky <arnon at qwilt.com>, "dev at dpdk.org" <dev at dpdk.org>
Subject: Re: [dpdk-dev] Reshuffling of rte_mbuf structure.

On Mon, Nov 02, 2015 at 11:51:23PM +0100, Thomas Monjalon wrote:
> But it is simpler to say that having an API depending on some options
> is a "no-design" which could seriously slow down DPDK adoption.

What about something similar to how Java JNI works? It needed to support
multiple Java JRE / JDK brands, implementations etc. Upon initialization, a
function pointer array is created, and specific slots are filled with pointers
to the real implementation of some native API functions you can call from
inside your library to perform operations.

In the DPDK case, we need flexible data instead of flexible function
implementations.

To do this, there would be some pointer slots in the mbuf that are filled
with pointers to metadata for required DPDK features. The data could be placed
in the following cache lines, using some reserved tailroom between the mbuf
control block and the packet data block. Then prefetching could be set up to
fetch only the parts of the tailroom in use at any given point, to prevent
unwanted slowdowns.
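Here is a rough sketch of the slot idea. It assumes a fixed tailroom reserved in each packet and features registered once at startup. All names (`struct pkt`, `meta_register()`, `pkt_init()`, the `META_*` constants) are hypothetical. The 8-byte rounding is an arbitrary alignment choice, and prefetching is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define META_MAX_SLOTS 4    /* hypothetical number of feature slots */
#define META_TAILROOM 128   /* hypothetical bytes reserved per packet */

enum meta_feature { META_RSS = 0, META_TX_OFFLOAD = 1, META_VLAN = 2 };

/* Filled once at init: offset of each feature's data in the tailroom,
 * or -1 if the feature is not enabled. */
static int16_t meta_offset[META_MAX_SLOTS] = { -1, -1, -1, -1 };
static uint16_t meta_used;

struct pkt {
    void *meta[META_MAX_SLOTS];      /* pointer slots, NULL = feature off */
    void *buf_addr;
    struct pkt *next;
    uint16_t data_len;
    _Alignas(8) uint8_t tailroom[META_TAILROOM];
};

/* Reserve tailroom for a feature (size rounded up to 8 bytes).
 * Returns 0 on success, -1 if already registered or out of space. */
static int meta_register(enum meta_feature f, uint16_t size)
{
    size = (uint16_t)((size + 7u) & ~7u);
    if ((unsigned)f >= META_MAX_SLOTS || meta_offset[f] >= 0 ||
        meta_used + size > META_TAILROOM)
        return -1;
    meta_offset[f] = (int16_t)meta_used;
    meta_used = (uint16_t)(meta_used + size);
    return 0;
}

/* Point each slot of a packet at its feature's area in the tailroom. */
static void pkt_init(struct pkt *p)
{
    for (int i = 0; i < META_MAX_SLOTS; i++)
        p->meta[i] = meta_offset[i] < 0 ? NULL
                                        : (void *)&p->tailroom[meta_offset[i]];
}
```

Disabled features cost one NULL slot each. An enabled feature's data lands right after the control block, where it could be prefetched selectively.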

Matthew.



[dpdk-dev] Reshuffling of rte_mbuf structure.

2015-11-02 Thread shesha Sreenivasamurthy (shesha)
OK, you are saying re-order the fields based on the configuration params. I
took the word "NO" in the param name to mean eliminating the field. Sure, this
does not require any change in the code that uses it. But won't it then boil
down to the same thing as having completely different layout definitions, only
messier?

For example, rather than having:

#ifdef NO_TX_OFFLOAD
struct mbuf_rte {
fieldA
field1
field2
fieldB
field4
field5
};
#endif

#ifdef NO_MULTISEG
struct mbuf_rte {
fieldA
field2
field1
fieldB
field5
field4
};
#endif

we end up with:

struct mbuf_rte {
fieldA
#ifdef NO_TX_OFFLOAD
field1
field2
#endif
#ifdef NO_MULTISEG
field2
field1
#endif
fieldB
#ifdef NO_TX_OFFLOAD
field4
field5
#endif
#ifdef NO_MULTISEG
field5
field4
#endif
};



--
- Thanks
char * (*shesha) (uint64_t cache, uint8_t F00D)
{ return 0xC0DE; }

From: Arnon Warshavsky <ar...@qwilt.com>
Date: Monday, November 2, 2015 at 10:35 AM
To: Cisco Employee <shesha at cisco.com>
Cc: Stephen Hemminger <stephen at networkplumber.org>, "dev at dpdk.org" <dev at dpdk.org>
Subject: Re: [dpdk-dev] Reshuffling of rte_mbuf structure.

If NO_TX_OFFLOAD only changes the layout in terms of the fields' relative
locations in cache lines, and does not eliminate the fields themselves,
why should the using code be affected?

On Mon, Nov 2, 2015 at 8:30 PM, shesha Sreenivasamurthy (shesha) <shesha at cisco.com> wrote:
One issue I see with optimization config options such as NO_TX_OFFLOAD,
NO_MULTISEG and NO_REFCOUNT is that it is not sufficient to have those
#ifdefs inside the mbuf structure; they would have to be sprinkled all over
the code wherever the corresponding fields are used. This may make the code
messier.

--
- Thanks
char * (*shesha) (uint64_t cache, uint8_t F00D)
{ return 0xC0DE; }

From: Stephen Hemminger <step...@networkplumber.org>
Date: Monday, November 2, 2015 at 8:24 AM
To: Arnon Warshavsky <arnon at qwilt.com>
Cc: Cisco Employee <shesha at cisco.com>, "dev at dpdk.org" <dev at dpdk.org>
Subject: Re: [dpdk-dev] Reshuffling of rte_mbuf structure.

On Sun, 1 Nov 2015 06:45:31 +0200
Arnon Warshavsky <arnon at qwilt.com> wrote:

My 2 cents,
This was brought up at the recent userspace summit, and it seems that
indeed there is no one cache-line arrangement that fits all.
OTOH, multiple compile-time options to satisfy all flavors would make the
code unpleasant to read, maintain, test and debug.
(I think there was quite a consensus in favor of reducing compile options
in general.)
Currently I manage similar deviations via our own source control, which I
admit is quite a pain.
I would prefer an option of code manipulation/generation by some script
during DPDK install, which takes the default version of rte_mbuf.h along
with an optional user file (json, xml, elvish, whatever) defining the
structure replacements, creates your custom version, and places it instead
of the installed copy of rte_mbuf.h.
Maybe the only facility required from DPDK is the ability to register
calls to such user scripts at some install stage(s), providing the means
along with the responsibility to the user.
/Arnon
On Sat, Oct 31, 2015 at 6:44 AM, shesha Sreenivasamurthy (shesha) <shesha at cisco.com> wrote:
> In Cisco, we are using DPDK for a very high speed packet processor
> application. We don't use NIC TCP offload / RSS hashing. Putting those
> fields in the first cache-line - and the obligatory mb->next datum in the
> second cache line - causes significant LSU pressure and performance
> degradation. If it does not affect other applications, I would like to
> propose reshuffling of fields so that the obligatory "next" field falls in
> first cache line and RSS hashing goes to next. If this re-shuffling indeed
> hurts other applications, another idea is to make it compile time
> configurable. Please provide feedback.
>
> --
> - Thanks
> char * (*shesha) (uint64_t cache, uint8_t F00D)
> { return 0xC0DE; }
>

Having different layouts will be a disaster for distros; they have to choose
one.
And I hate to introduce more configuration!

But we see the same issue. It would make sense if there were configuration
options for some common optimizations (NO_TX_OFFLOAD, NO_MULTISEG,
NO_REFCOUNT) and then the mbuf got optimized for those combinations. That
seems better than config options like LAYOUT1, LAYOUT2, ...

In this specific case, I think lots of drivers could check nb_segs == 1 and
avoid the next field for simple packets.
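The nb_segs check might look roughly like this in a free or transmit path. The descriptor is a hypothetical two-cache-line struct that mirrors the concern in this thread (nb_segs in line 0, next in line 1), assuming 64-byte lines. `pkt_free_one()` stands in for returning a buffer to its pool. The shortcut relies on the invariant that a packet with nb_segs == 1 has next == NULL.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor, not rte_mbuf. */
struct pkt {
    /* cache line 0 */
    void *buf_addr;
    uint16_t nb_segs;
    uint16_t data_len;
    uint32_t pkt_len;
    /* cache line 1 */
    _Alignas(64) struct pkt *next;
};

_Static_assert(offsetof(struct pkt, next) == 64, "next expected in cache line 1");

static unsigned freed_count;

/* Stand-in for returning one segment to its pool. */
static void pkt_free_one(struct pkt *p)
{
    (void)p;
    freed_count++;
}

/* Single-segment packets never read `next`, so cache line 1 is
 * not touched on the common path. */
static void pkt_free(struct pkt *m)
{
    if (m->nb_segs == 1) {
        pkt_free_one(m);
        return;
    }
    while (m != NULL) {
        struct pkt *nxt = m->next;
        pkt_free_one(m);
        m = nxt;
    }
}
```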

Long term, I think this will be a losing battle. As DPDK grows more
features, the current mbuf structure will grow; there is really nothing
stopping the bloat of metadata.




--

Arnon Warshavsky
Qwilt | work: +972-72-2221634 | mobile: +972-50-8583058 | arnon at qwilt.com


[dpdk-dev] Reshuffling of rte_mbuf structure.

2015-11-02 Thread shesha Sreenivasamurthy (shesha)
One issue I see with optimization config options such as NO_TX_OFFLOAD,
NO_MULTISEG and NO_REFCOUNT is that it is not sufficient to have those
#ifdefs inside the mbuf structure; they would have to be sprinkled all over
the code wherever the corresponding fields are used. This may make the code
messier.
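To make the concern concrete: if a field is compiled out, every direct access to it must carry the same #ifdef. One way to contain that is to route all access through inline accessors, so the conditional appears only in the header. This sketch assumes callers can tolerate a neutral default when the feature is off. `NO_RSS_HASH`, `struct pkt` and the accessor names are hypothetical.

```c
#include <stdint.h>

struct pkt {
    void *buf_addr;
    uint16_t data_len;
#ifndef NO_RSS_HASH
    uint32_t rss_hash;          /* compiled out when NO_RSS_HASH is set */
#endif
};

/* The only places that know whether the field exists. */
static inline uint32_t pkt_rss_hash(const struct pkt *p)
{
#ifndef NO_RSS_HASH
    return p->rss_hash;
#else
    (void)p;
    return 0;                   /* feature compiled out: report "no hash" */
#endif
}

static inline void pkt_set_rss_hash(struct pkt *p, uint32_t h)
{
#ifndef NO_RSS_HASH
    p->rss_hash = h;
#else
    (void)p;
    (void)h;
#endif
}

/* Example caller: compiles unchanged with or without NO_RSS_HASH. */
static unsigned pick_queue(const struct pkt *p, unsigned nb_queues)
{
    return pkt_rss_hash(p) % nb_queues;
}
```

The trade-off is that with the option set, pick_queue() silently sees a hash of 0. That may or may not be acceptable to a given caller.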

--
- Thanks
char * (*shesha) (uint64_t cache, uint8_t F00D)
{ return 0xC0DE; }

From: Stephen Hemminger <step...@networkplumber.org>
Date: Monday, November 2, 2015 at 8:24 AM
To: Arnon Warshavsky <arnon at qwilt.com>
Cc: Cisco Employee <shesha at cisco.com>, "dev at dpdk.org" <dev at dpdk.org>
Subject: Re: [dpdk-dev] Reshuffling of rte_mbuf structure.

On Sun, 1 Nov 2015 06:45:31 +0200
Arnon Warshavsky <arnon at qwilt.com> wrote:

My 2 cents,
This was brought up at the recent userspace summit, and it seems that
indeed there is no one cache-line arrangement that fits all.
OTOH, multiple compile-time options to satisfy all flavors would make the
code unpleasant to read, maintain, test and debug.
(I think there was quite a consensus in favor of reducing compile options
in general.)
Currently I manage similar deviations via our own source control, which I
admit is quite a pain.
I would prefer an option of code manipulation/generation by some script
during DPDK install, which takes the default version of rte_mbuf.h along
with an optional user file (json, xml, elvish, whatever) defining the
structure replacements, creates your custom version, and places it instead
of the installed copy of rte_mbuf.h.
Maybe the only facility required from DPDK is the ability to register
calls to such user scripts at some install stage(s), providing the means
along with the responsibility to the user.
/Arnon
On Sat, Oct 31, 2015 at 6:44 AM, shesha Sreenivasamurthy (shesha) <shesha at cisco.com> wrote:
> In Cisco, we are using DPDK for a very high speed packet processor
> application. We don't use NIC TCP offload / RSS hashing. Putting those
> fields in the first cache-line - and the obligatory mb->next datum in the
> second cache line - causes significant LSU pressure and performance
> degradation. If it does not affect other applications, I would like to
> propose reshuffling of fields so that the obligatory "next" field falls in
> first cache line and RSS hashing goes to next. If this re-shuffling indeed
> hurts other applications, another idea is to make it compile time
> configurable. Please provide feedback.
>
> --
> - Thanks
> char * (*shesha) (uint64_t cache, uint8_t F00D)
> { return 0xC0DE; }
>

Having different layouts will be a disaster for distros; they have to choose
one.
And I hate to introduce more configuration!

But we see the same issue. It would make sense if there were configuration
options for some common optimizations (NO_TX_OFFLOAD, NO_MULTISEG,
NO_REFCOUNT) and then the mbuf got optimized for those combinations. That
seems better than config options like LAYOUT1, LAYOUT2, ...

In this specific case, I think lots of drivers could check nb_segs == 1 and
avoid the next field for simple packets.

Long term, I think this will be a losing battle. As DPDK grows more
features, the current mbuf structure will grow; there is really nothing
stopping the bloat of metadata.



[dpdk-dev] Reshuffling of rte_mbuf structure.

2015-10-31 Thread shesha Sreenivasamurthy (shesha)
In Cisco, we are using DPDK for a very high-speed packet processor application.
We don't use NIC TCP offload / RSS hashing. Putting those fields in the first
cache line, and the obligatory mb->next datum in the second cache line,
causes significant LSU pressure and performance degradation. If it does not
affect other applications, I would like to propose reshuffling the fields so
that the obligatory "next" field falls in the first cache line and the RSS
hash moves to the next one. If this reshuffling does hurt other applications,
another idea is to make it compile-time configurable. Please provide feedback.
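The proposed placement can be pinned at compile time, so later edits cannot silently undo it. This sketch assumes 64-byte cache lines and uses an illustrative subset of fields, not the real rte_mbuf definition. `struct reordered_mbuf` and `LINE_OF` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64   /* assumption: 64-byte cache lines */

/* Hypothetical reordered layout: the chain pointer joins the first
 * cache line; RSS/offload metadata moves to the second. */
struct reordered_mbuf {
    /* cache line 0: fields every packet touches */
    void *buf_addr;
    uint64_t buf_physaddr;
    struct reordered_mbuf *next;
    uint16_t data_off;
    uint16_t refcnt;
    uint16_t nb_segs;
    uint16_t port;
    uint64_t ol_flags;
    uint32_t pkt_len;
    uint16_t data_len;
    /* cache line 1: metadata this application ignores */
    _Alignas(CACHE_LINE) uint32_t rss_hash;
    uint32_t packet_type;
};

/* Index of the cache line that holds the start of a field. */
#define LINE_OF(type, field) (offsetof(type, field) / CACHE_LINE)

/* Fail the build if a later edit pushes a hot field out of line 0. */
_Static_assert(LINE_OF(struct reordered_mbuf, next) == 0,
               "next must be in cache line 0");
_Static_assert(LINE_OF(struct reordered_mbuf, rss_hash) == 1,
               "rss_hash expected in cache line 1");
```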

--
- Thanks
char * (*shesha) (uint64_t cache, uint8_t F00D)
{ return 0xC0DE; }


[dpdk-dev] [PATCH v6] mem: command line option to delete hugepage backing files

2015-10-28 Thread shesha Sreenivasamurthy (shesha)
When an application using hugepages crashes or exits, the hugetlbfs
backing files are not cleaned up. This patch cleans up those files.
Some multi-process DPDK applications may benefit from those backing
files. Therefore, I have made this configurable so that applications
that do not need those backing files can remove them, without changing
the current default behavior. The application itself can clean them
up; however, the rationale for DPDK cleaning them up is that DPDK
created them, so it is better for DPDK to unlink them.
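The patch relies on standard POSIX behavior: unlinking a file removes its name, but an existing mmap() of it stays valid until it is unmapped. The sketch below walks through that sequence. It uses an ordinary temporary file in /tmp as a stand-in for a hugetlbfs backing file; that substitution is an assumption. `map_then_unlink()` is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a file, unlink it, then check that the name is gone but the
 * mapping is still usable. Returns 0 on success, -1 otherwise. */
static int map_then_unlink(size_t len)
{
    char path[] = "/tmp/fake_hugepage_XXXXXX";
    int fd = mkstemp(path);             /* stand-in for a hugetlbfs file */
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        unlink(path);
        return -1;
    }
    char *va = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                          /* the mapping keeps the file alive */
    if (va == MAP_FAILED) {
        unlink(path);
        return -1;
    }

    /* Roughly what the new option does after initialization. */
    if (unlink(path) != 0) {
        munmap(va, len);
        return -1;
    }

    struct stat st;
    int name_gone = (stat(path, &st) != 0);   /* path no longer resolves */
    memset(va, 0xAB, len);                    /* memory is still usable */
    int data_ok = ((unsigned char)va[len - 1] == 0xAB);

    munmap(va, len);                    /* last reference: storage released */
    return (name_gone && data_ok) ? 0 : -1;
}
```

This is presumably also why the option is off by default. Once the names are gone, another process can no longer open the backing files by path.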

Signed-off-by: Shesha Sreenivasamurthy 
Acked-by: Sergio Gonzalez Monroy 
---
lib/librte_eal/common/eal_common_options.c | 12 
lib/librte_eal/common/eal_internal_cfg.h   |  1 +
lib/librte_eal/common/eal_options.h        |  2 ++
lib/librte_eal/linuxapp/eal/eal_memory.c   | 30 ++
4 files changed, 45 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c
index 1f459ac..5fe6374 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -79,6 +79,7 @@ eal_long_options[] = {
{OPT_MASTER_LCORE,  1, NULL, OPT_MASTER_LCORE_NUM },
{OPT_NO_HPET,   0, NULL, OPT_NO_HPET_NUM  },
{OPT_NO_HUGE,   0, NULL, OPT_NO_HUGE_NUM  },
+   {OPT_HUGE_UNLINK,   0, NULL, OPT_HUGE_UNLINK_NUM  },
{OPT_NO_PCI,0, NULL, OPT_NO_PCI_NUM   },
{OPT_NO_SHCONF, 0, NULL, OPT_NO_SHCONF_NUM},
{OPT_PCI_BLACKLIST, 1, NULL, OPT_PCI_BLACKLIST_NUM},
@@ -722,6 +723,10 @@ eal_parse_common_option(int opt, const char *optarg,
conf->no_hugetlbfs = 1;
break;

+   case OPT_HUGE_UNLINK_NUM:
+   conf->hugepage_unlink = 1;
+   break;
+
case OPT_NO_PCI_NUM:
conf->no_pci = 1;
break;
@@ -856,6 +861,12 @@ eal_check_common_options(struct internal_config *internal_cfg)
return -1;
}

+   if (internal_cfg->no_hugetlbfs && internal_cfg->hugepage_unlink) {
+   RTE_LOG(ERR, EAL, "Option --"OPT_HUGE_UNLINK" cannot "
+   "be specified together with --"OPT_NO_HUGE"\n");
+   return -1;
+   }
+
if (rte_eal_devargs_type_count(RTE_DEVTYPE_WHITELISTED_PCI) != 0 &&
rte_eal_devargs_type_count(RTE_DEVTYPE_BLACKLISTED_PCI) != 0) {
RTE_LOG(ERR, EAL, "Options blacklist (-b) and whitelist (-w) "
@@ -906,6 +917,7 @@ eal_common_usage(void)
   "  -h, --help  This help\n"
   "\nEAL options for DEBUG use only:\n"
   "  --"OPT_NO_HUGE"   Use malloc instead of hugetlbfs\n"
+  "  --"OPT_HUGE_UNLINK"   Unlink hugepage backing file after initalization\n"
   "  --"OPT_NO_PCI"Disable PCI\n"
   "  --"OPT_NO_HPET"   Disable HPET\n"
   "  --"OPT_NO_SHCONF" No shared config (mmap'd files)\n"
diff --git a/lib/librte_eal/common/eal_internal_cfg.h b/lib/librte_eal/common/eal_internal_cfg.h
index e2ecb0d..84b075f 100644
--- a/lib/librte_eal/common/eal_internal_cfg.h
+++ b/lib/librte_eal/common/eal_internal_cfg.h
@@ -64,6 +64,7 @@ struct internal_config {
volatile unsigned force_nchannel; /**< force number of channels */
volatile unsigned force_nrank;/**< force number of ranks */
volatile unsigned no_hugetlbfs;   /**< true to disable hugetlbfs */
+   volatile unsigned hugepage_unlink; /**< true to unlink backing files */
volatile unsigned xen_dom0_support; /**< support app running on Xen Dom0*/
volatile unsigned no_pci; /**< true to disable PCI */
volatile unsigned no_hpet;/**< true to disable HPET */
diff --git a/lib/librte_eal/common/eal_options.h b/lib/librte_eal/common/eal_options.h
index f6714d9..745f38c 100644
--- a/lib/librte_eal/common/eal_options.h
+++ b/lib/librte_eal/common/eal_options.h
@@ -63,6 +63,8 @@ enum {
OPT_PROC_TYPE_NUM,
#define OPT_NO_HPET   "no-hpet"
OPT_NO_HPET_NUM,
+#define OPT_HUGE_UNLINK   "huge-unlink"
+   OPT_HUGE_UNLINK_NUM,
#define OPT_NO_HUGE   "no-huge"
OPT_NO_HUGE_NUM,
#define OPT_NO_PCI    "no-pci"
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index ac2745e..c7e2485 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -786,6 +786,28 @@ copy_hugepages_to_shared_mem(struct hugepage_file * dst, int dest_size,
return 0;
}

+static int
+unlink_hugepage_files(struct hugepage_file *hugepg_tbl,
+   unsigned num_hp_info)
+{
+   unsigned socket, size;
+   int page, nrpages = 0;
+
+   /* get total number of hugepages */
+   for (size = 0; s

[dpdk-dev] [PATCH v5] mem: command line option to delete hugepage backing files

2015-10-27 Thread shesha Sreenivasamurthy (shesha)
When an application using hugepages crashes or exits, the hugetlbfs
backing files are not cleaned up. This patch cleans up those files.
Some multi-process DPDK applications may benefit from those backing
files. Therefore, I have made this configurable so that applications
that do not need those backing files can remove them, without changing
the current default behavior. The application itself can clean them
up; however, the rationale for DPDK cleaning them up is that DPDK
created them, so it is better for DPDK to unlink them.

Signed-off-by: Shesha Sreenivasamurthy 
Acked-by: Sergio Gonzalez Monroy 
---
 lib/librte_eal/common/eal_common_options.c | 12 
 lib/librte_eal/common/eal_internal_cfg.h   |  1 +
 lib/librte_eal/common/eal_options.h        |  2 ++
 lib/librte_eal/linuxapp/eal/eal_memory.c   | 30 ++
 4 files changed, 45 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c
index 1f459ac..5fe6374 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -79,6 +79,7 @@ eal_long_options[] = {
{OPT_MASTER_LCORE,  1, NULL, OPT_MASTER_LCORE_NUM },
{OPT_NO_HPET,   0, NULL, OPT_NO_HPET_NUM  },
{OPT_NO_HUGE,   0, NULL, OPT_NO_HUGE_NUM  },
+   {OPT_HUGE_UNLINK,   0, NULL, OPT_HUGE_UNLINK_NUM  },
{OPT_NO_PCI,0, NULL, OPT_NO_PCI_NUM   },
{OPT_NO_SHCONF, 0, NULL, OPT_NO_SHCONF_NUM},
{OPT_PCI_BLACKLIST, 1, NULL, OPT_PCI_BLACKLIST_NUM},
@@ -722,6 +723,10 @@ eal_parse_common_option(int opt, const char *optarg,
conf->no_hugetlbfs = 1;
break;

+   case OPT_HUGE_UNLINK_NUM:
+   conf->hugepage_unlink = 1;
+   break;
+
case OPT_NO_PCI_NUM:
conf->no_pci = 1;
break;
@@ -856,6 +861,12 @@ eal_check_common_options(struct internal_config *internal_cfg)
return -1;
}

+   if (internal_cfg->no_hugetlbfs && internal_cfg->hugepage_unlink) {
+   RTE_LOG(ERR, EAL, "Option --"OPT_HUGE_UNLINK" cannot "
+   "be specified together with --"OPT_NO_HUGE"\n");
+   return -1;
+   }
+
if (rte_eal_devargs_type_count(RTE_DEVTYPE_WHITELISTED_PCI) != 0 &&
rte_eal_devargs_type_count(RTE_DEVTYPE_BLACKLISTED_PCI) != 0) {
RTE_LOG(ERR, EAL, "Options blacklist (-b) and whitelist (-w) "
@@ -906,6 +917,7 @@ eal_common_usage(void)
   "  -h, --help  This help\n"
   "\nEAL options for DEBUG use only:\n"
   "  --"OPT_NO_HUGE"   Use malloc instead of hugetlbfs\n"
+  "  --"OPT_HUGE_UNLINK"   Unlink hugepage backing file after initalization\n"
   "  --"OPT_NO_PCI"Disable PCI\n"
   "  --"OPT_NO_HPET"   Disable HPET\n"
   "  --"OPT_NO_SHCONF" No shared config (mmap'd files)\n"
diff --git a/lib/librte_eal/common/eal_internal_cfg.h b/lib/librte_eal/common/eal_internal_cfg.h
index e2ecb0d..84b075f 100644
--- a/lib/librte_eal/common/eal_internal_cfg.h
+++ b/lib/librte_eal/common/eal_internal_cfg.h
@@ -64,6 +64,7 @@ struct internal_config {
volatile unsigned force_nchannel; /**< force number of channels */
volatile unsigned force_nrank;/**< force number of ranks */
volatile unsigned no_hugetlbfs;   /**< true to disable hugetlbfs */
+   volatile unsigned hugepage_unlink; /**< true to unlink backing files */
volatile unsigned xen_dom0_support; /**< support app running on Xen Dom0*/
volatile unsigned no_pci; /**< true to disable PCI */
volatile unsigned no_hpet;/**< true to disable HPET */
diff --git a/lib/librte_eal/common/eal_options.h b/lib/librte_eal/common/eal_options.h
index f6714d9..745f38c 100644
--- a/lib/librte_eal/common/eal_options.h
+++ b/lib/librte_eal/common/eal_options.h
@@ -63,6 +63,8 @@ enum {
OPT_PROC_TYPE_NUM,
 #define OPT_NO_HPET   "no-hpet"
OPT_NO_HPET_NUM,
+#define OPT_HUGE_UNLINK   "huge-unlink"
+   OPT_HUGE_UNLINK_NUM,
 #define OPT_NO_HUGE   "no-huge"
OPT_NO_HUGE_NUM,
 #define OPT_NO_PCI    "no-pci"
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index ac2745e..c7e2485 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -786,6 +786,28 @@ copy_hugepages_to_shared_mem(struct hugepage_file * dst, int dest_size,
return 0;
 }

+static int
+unlink_hugepage_files(struct hugepage_file *hugepg_tbl,
+   unsigned num_hp_info)
+{
+   unsigned socket, size;
+   int page, nrpages = 0;
+
+   /* get total number of hugepages */
+   for (size = 0; size < 

[dpdk-dev] [PATCH v3] mem: command line option to delete hugepage backing files

2015-10-23 Thread shesha Sreenivasamurthy (shesha)
Understood, and thanks for the clarification. Should I re-send
patch v3, or are we good here?

--
- Thanks
char * (*shesha) (uint64_t cache, uint8_t F00D)
{ return 0xC0DE; }


-Original Message-
From: Sergio Gonzalez Monroy 
Date: Friday, October 23, 2015 at 2:57 AM
To: Cisco Employee 
Cc: "dev at dpdk.org" , Bruce Richardson

Subject: Re: [dpdk-dev] [PATCH v3] mem: command line option to delete hugepage backing files

On 22/10/2015 17:03, shesha Sreenivasamurthy (shesha) wrote:
> Sergio,
>    Your comment regarding remap_all_functions is correct and can be fixed
> by unlinking in remap_all_hugepages() too. However, regarding your comment
> that "unmap_unneeded_hugepages" will fail: in
> unmap_unneeded_hugepages() we do not unlink if final_va is equal to NULL,
> guarded by RTE_EAL_SINGLE_FILE_SEGMENTS. My testing did not catch this
> because RTE_EAL_SINGLE_FILE_SEGMENTS was set. Is there any reason why we
> should not always skip unlinking when final_va is NULL (removing the
> #ifdef RTE_EAL_SINGLE_FILE_SEGMENTS)?
The issue with unmap_unneeded_hugepages happens regardless of whether
SINGLE_FILE_SEGMENT is set or not.
The problem is that the function assumes that no file has been unlinked.
In fact, with SINGLE_FILE_SEGMENT, it might re-open files if it needs to
truncate them, so we need those files present in the file system.
> However, if you think having a separate function is better, I am all for
> it.
My initial thought was the same: do the unlinking inside an existing
function. But as you may have realized, the code is not the most
straightforward, and the resulting diff may be even bigger than with a
single function.

I think the single function in v3 works properly because we only unlink
the files left after we have done all this mapping and unmapping.

Sergio
> --
> - Thanks
> char * (*shesha) (uint64_t cache, uint8_t F00D)
> { return 0xC0DE; }
>
>
> -Original Message-
> From: Sergio Gonzalez Monroy 
> Date: Thursday, October 22, 2015 at 1:51 AM
> To: Cisco Employee 
> Cc: "dev at dpdk.org" , Bruce Richardson
> 
> Subject: Re: [dpdk-dev] [PATCH v3] mem: command line option to delete hugepage backing files
>
> On 21/10/2015 17:34, Bruce Richardson wrote:
>> On Wed, Oct 21, 2015 at 04:22:45PM +, shesha Sreenivasamurthy (shesha) wrote:
>>> When an application using hugepages crashes or exits, the hugetlbfs
>>> backing files are not cleaned up. This patch cleans up those files.
>>> Some multi-process DPDK applications may benefit from those backing
>>> files. Therefore, I have made this configurable so that applications
>>> that do not need those backing files can remove them, without changing
>>> the current default behavior. The application itself can clean them
>>> up; however, the rationale for DPDK cleaning them up is that DPDK
>>> created them, so it is better for DPDK to unlink them.
>>>
>>> Signed-off-by: Shesha Sreenivasamurthy 
>>> ---
>>>lib/librte_eal/common/eal_common_options.c | 12 
>>>lib/librte_eal/common/eal_internal_cfg.h   |  1 +
>>>lib/librte_eal/common/eal_options.h        |  2 ++
>>>lib/librte_eal/linuxapp/eal/eal_memory.c   | 30 ++
>>>4 files changed, 45 insertions(+)
>>>
>> 
>>> +static int
>>> +unlink_hugepage_files(struct hugepage_file *hugepg_tbl,
>>> +   unsigned num_hp_info)
>>> +{
>>> +   unsigned socket, size;
>>> +   int page, nrpages = 0;
>>> +
>>> +   /* get total number of hugepages */
>>> +   for (size = 0; size < num_hp_info; size++)
>>> +   for (socket = 0; socket < RTE_MAX_NUMA_NODES; socket++)
>>> +   nrpages += internal_config.hugepage_info[size].num_pages[socket];
>>> +
>>> +   for (page = 0; page < nrpages; page++) {
>>> +   struct hugepage_file *hp = &hugepg_tbl[page];
>>> +   if (hp->final_va != NULL && unlink(hp->filepath)) {
>>> +   RTE_LOG(WARNING, EAL, "%s(): Removing %s failed: %s\n",
>>> +   __func__, hp->filepath, strerror(errno));
>>> +   }
>>> +   }
>>> +   return 0;
>>> +}
>>> +
>>>/*
>>> * unmaps hugepages that are not going to be used. since we originally allocate
>>> * ALL hugepages (not just those we need), additional unmapping needs to be done.
>>> @@ -1289,6 +1311,14 @@ r

[dpdk-dev] [PATCH v3] mem: command line option to delete hugepage backing files

2015-10-22 Thread shesha Sreenivasamurthy (shesha)
Sergio,
  Your comment regarding remap_all_functions is correct and can be fixed
by unlinking in remap_all_hugepages() too. However, regarding your comment
that "unmap_unneeded_hugepages" will fail: in
unmap_unneeded_hugepages() we do not unlink if final_va is equal to NULL,
guarded by RTE_EAL_SINGLE_FILE_SEGMENTS. My testing did not catch this
because RTE_EAL_SINGLE_FILE_SEGMENTS was set. Is there any reason why we
should not always skip unlinking when final_va is NULL (removing the
#ifdef RTE_EAL_SINGLE_FILE_SEGMENTS)?

However, if you think having a separate function is better, I am all for
it.
--
- Thanks
char * (*shesha) (uint64_t cache, uint8_t F00D)
{ return 0xC0DE; }


-Original Message-
From: Sergio Gonzalez Monroy 
Date: Thursday, October 22, 2015 at 1:51 AM
To: Cisco Employee 
Cc: "dev at dpdk.org" , Bruce Richardson

Subject: Re: [dpdk-dev] [PATCH v3] mem: command line option to delete hugepage backing files

On 21/10/2015 17:34, Bruce Richardson wrote:
> On Wed, Oct 21, 2015 at 04:22:45PM +, shesha Sreenivasamurthy (shesha) wrote:
>> When an application using hugepages crashes or exits, the hugetlbfs
>> backing files are not cleaned up. This patch cleans up those files.
>> Some multi-process DPDK applications may benefit from those backing
>> files. Therefore, I have made this configurable so that applications
>> that do not need those backing files can remove them, without changing
>> the current default behavior. The application itself can clean them
>> up; however, the rationale for DPDK cleaning them up is that DPDK
>> created them, so it is better for DPDK to unlink them.
>>
>> Signed-off-by: Shesha Sreenivasamurthy 
>> ---
>>   lib/librte_eal/common/eal_common_options.c | 12 
>>   lib/librte_eal/common/eal_internal_cfg.h   |  1 +
>>   lib/librte_eal/common/eal_options.h        |  2 ++
>>   lib/librte_eal/linuxapp/eal/eal_memory.c   | 30 ++
>>   4 files changed, 45 insertions(+)
>>
> 
>> +static int
>> +unlink_hugepage_files(struct hugepage_file *hugepg_tbl,
>> +unsigned num_hp_info)
>> +{
>> +unsigned socket, size;
>> +int page, nrpages = 0;
>> +
>> +/* get total number of hugepages */
>> +for (size = 0; size < num_hp_info; size++)
>> +for (socket = 0; socket < RTE_MAX_NUMA_NODES; socket++)
>> +nrpages += internal_config.hugepage_info[size].num_pages[socket];
>> +
>> +for (page = 0; page < nrpages; page++) {
>> +struct hugepage_file *hp = &hugepg_tbl[page];
>> +if (hp->final_va != NULL && unlink(hp->filepath)) {
>> +RTE_LOG(WARNING, EAL, "%s(): Removing %s failed: %s\n",
>> +__func__, hp->filepath, strerror(errno));
>> +}
>> +}
>> +return 0;
>> +}
>> +
>>   /*
>>    * unmaps hugepages that are not going to be used. since we originally allocate
>>    * ALL hugepages (not just those we need), additional unmapping needs to be done.
>> @@ -1289,6 +1311,14 @@ rte_eal_hugepage_init(void)
>>  		goto fail;
>>  	}
>>  
>> +	/* free the hugepage backing files */
>> +	if (internal_config.hugepage_unlink &&
>> +		unlink_hugepage_files(tmp_hp,
>> +			internal_config.num_hugepage_sizes) < 0) {
>> +		RTE_LOG(ERR, EAL, "Unlinking hugepage backing files failed!\n");
>> +		goto fail;
>> +	}
>> +
> Sorry for the late comment, but...
>
> Rather than adding a whole new function to be called here, can the same effect
> not be got by adding in 2/3 lines like:
> 	if (internal_config.hugepage_unlink)
> 		unlink(hugetlb[i].filepath);
>
> at line 409 of eal_memory.c, where we have done our final mmap of the file.
> [You also need the same couple of lines for the 32-bit special case at line 351.]
> It would be a shorter diff.
>
> /Bruce
If you wanted to avoid the extra function call, it might be cleaner to
just unlink all files when doing unmap_all_hugepages_orig.
My two cents: I think it would be easier to read/debug with a function
that "unlinks files" than with files unlinked at different points in
map_all_hugepages.

Unfortunately, the proposed approach does not work for all cases:
- If we use single file segments, map_all_hugepages does not get called
  a second time; instead we call remap_all_hugepages.
- If we use the -m or --socket-mem options, unmap_unneeded_hugepages
  will fail when trying to unlink the unneeded hugepage files, because
  it does not expect the files to be unlinked already.

The current patch works because it only unlinks after
unmap_unneeded_hugepages.

Sergio
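Sergio's second point can be reproduced with plain POSIX calls: a second unlink() of a path that is already gone fails with ENOENT. As he describes it, unmap_unneeded_hugepages() treats that failure as an error, so unlinking every backing file early in map_all_hugepages() would make it fail whenever -m or --socket-mem leaves unneeded pages behind. A minimal sketch, assuming an ordinary file in /tmp stands in for a hugetlbfs backing file; try_unlink() and second_unlink_errno() are hypothetical helpers, not EAL functions:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Try to unlink a backing file, logging in the style of the EAL code.
 * Returns 0 on success or the errno value of the failure. */
static int try_unlink(const char *path)
{
	int err;

	if (unlink(path) == 0)
		return 0;
	err = errno; /* save it before fprintf() can overwrite errno */
	fprintf(stderr, "Removing %s failed: %s\n", path, strerror(err));
	return err;
}

/* Create a stand-in backing file, unlink it early (an eager cleanup),
 * then unlink it again (what a later pass over unneeded pages would do).
 * Returns the errno of the second attempt, or -1 if setup failed. */
static int second_unlink_errno(void)
{
	char path[] = "/tmp/fake_hugepage_XXXXXX";
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	close(fd);
	if (try_unlink(path) != 0)
		return -1;
	return try_unlink(path);
}
```

second_unlink_errno() should return ENOENT, which is why the patch only unlinks after unmap_unneeded_hugepages() has run.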




[dpdk-dev] [PATCH v4] mem: command line option to delete hugepage backing files

2015-10-21 Thread shesha Sreenivasamurthy (shesha)
When an application using hugepages crashes or exits, the hugetlbfs
backing files are not cleaned up. This patch cleans up those files.
Some multi-process DPDK applications may benefit from those backing
files, so I have made the cleanup configurable: an application that
does not need the backing files can remove them, and the current
default behavior is unchanged. The application could clean them up
itself; however, the rationale for having DPDK do it is that DPDK
created the files, so it is better for DPDK to unlink them.


Signed-off-by: Shesha Sreenivasamurthy 
---
 lib/librte_eal/common/eal_common_options.c | 12 
 lib/librte_eal/common/eal_internal_cfg.h   |  1 +
 lib/librte_eal/common/eal_options.h        |  2 ++
 lib/librte_eal/linuxapp/eal/eal_memory.c   | 12 
 4 files changed, 27 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c
index 1f459ac..5fe6374 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -79,6 +79,7 @@ eal_long_options[] = {
{OPT_MASTER_LCORE,  1, NULL, OPT_MASTER_LCORE_NUM },
{OPT_NO_HPET,   0, NULL, OPT_NO_HPET_NUM  },
{OPT_NO_HUGE,   0, NULL, OPT_NO_HUGE_NUM  },
+   {OPT_HUGE_UNLINK,   0, NULL, OPT_HUGE_UNLINK_NUM  },
{OPT_NO_PCI,0, NULL, OPT_NO_PCI_NUM   },
{OPT_NO_SHCONF, 0, NULL, OPT_NO_SHCONF_NUM},
{OPT_PCI_BLACKLIST, 1, NULL, OPT_PCI_BLACKLIST_NUM},
@@ -722,6 +723,10 @@ eal_parse_common_option(int opt, const char *optarg,
conf->no_hugetlbfs = 1;
break;

+   case OPT_HUGE_UNLINK_NUM:
+   conf->hugepage_unlink = 1;
+   break;
+
case OPT_NO_PCI_NUM:
conf->no_pci = 1;
break;
@@ -856,6 +861,12 @@ eal_check_common_options(struct internal_config *internal_cfg)
return -1;
}

+   if (internal_cfg->no_hugetlbfs && internal_cfg->hugepage_unlink) {
+   RTE_LOG(ERR, EAL, "Option --"OPT_HUGE_UNLINK" cannot "
+   "be specified together with --"OPT_NO_HUGE"\n");
+   return -1;
+   }
+
if (rte_eal_devargs_type_count(RTE_DEVTYPE_WHITELISTED_PCI) != 0 &&
rte_eal_devargs_type_count(RTE_DEVTYPE_BLACKLISTED_PCI) != 0) {
RTE_LOG(ERR, EAL, "Options blacklist (-b) and whitelist (-w) "
@@ -906,6 +917,7 @@ eal_common_usage(void)
   "  -h, --help  This help\n"
   "\nEAL options for DEBUG use only:\n"
   "  --"OPT_NO_HUGE"   Use malloc instead of hugetlbfs\n"
+  "  --"OPT_HUGE_UNLINK"   Unlink hugepage backing file after initialization\n"
   "  --"OPT_NO_PCI"Disable PCI\n"
   "  --"OPT_NO_HPET"   Disable HPET\n"
   "  --"OPT_NO_SHCONF" No shared config (mmap'd files)\n"
diff --git a/lib/librte_eal/common/eal_internal_cfg.h b/lib/librte_eal/common/eal_internal_cfg.h
index e2ecb0d..84b075f 100644
--- a/lib/librte_eal/common/eal_internal_cfg.h
+++ b/lib/librte_eal/common/eal_internal_cfg.h
@@ -64,6 +64,7 @@ struct internal_config {
volatile unsigned force_nchannel; /**< force number of channels */
volatile unsigned force_nrank;/**< force number of ranks */
volatile unsigned no_hugetlbfs;   /**< true to disable hugetlbfs */
+   volatile unsigned hugepage_unlink; /**< true to unlink backing files */
volatile unsigned xen_dom0_support; /**< support app running on Xen Dom0*/
volatile unsigned no_pci; /**< true to disable PCI */
volatile unsigned no_hpet;/**< true to disable HPET */
diff --git a/lib/librte_eal/common/eal_options.h b/lib/librte_eal/common/eal_options.h
index f6714d9..745f38c 100644
--- a/lib/librte_eal/common/eal_options.h
+++ b/lib/librte_eal/common/eal_options.h
@@ -63,6 +63,8 @@ enum {
OPT_PROC_TYPE_NUM,
 #define OPT_NO_HPET   "no-hpet"
OPT_NO_HPET_NUM,
+#define OPT_HUGE_UNLINK "huge-unlink"
+   OPT_HUGE_UNLINK_NUM,
 #define OPT_NO_HUGE   "no-huge"
OPT_NO_HUGE_NUM,
 #define OPT_NO_PCI    "no-pci"
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index ac2745e..c6f383b 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -348,6 +348,12 @@ map_all_hugepages(struct hugepage_file *hugepg_tbl,
|| (hugepage_sz == RTE_PGSIZE_16G)) {
hugepg_tbl[i].final_va = hugepg_tbl[i].orig_va;
hugepg_tbl[i].orig_va = NULL;
+   if (internal_config.hugepage_unlink &&
+   unlink(hugepg_tbl[i].filepath)) {
+   

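The v4 revision follows Bruce's suggestion and unlinks each backing file right after its final mmap. That is safe because, under POSIX, removing a file's name does not invalidate an existing mapping: the pages stay allocated until the last mapping or descriptor is gone. A minimal sketch of that sequence, assuming an ordinary 4 KiB file in /tmp instead of a hugetlbfs page; map_then_unlink() and DEMO_PAGE_SZ are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define DEMO_PAGE_SZ 4096 /* stand-in for a huge page size */

/* Map a stand-in backing file, unlink its name right after the final mmap
 * (the point where v4 calls unlink(hugepg_tbl[i].filepath)), and check that
 * the memory is still usable while the path is gone. Returns 1 on success. */
static int map_then_unlink(void)
{
	char path[] = "/tmp/fake_hugepage_XXXXXX";
	struct stat st;
	unsigned char *va;
	int ok, fd = mkstemp(path);

	if (fd < 0)
		return 0;
	if (ftruncate(fd, DEMO_PAGE_SZ) != 0) {
		close(fd);
		unlink(path);
		return 0;
	}
	va = mmap(NULL, DEMO_PAGE_SZ, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd); /* the mapping alone keeps the file's pages alive */
	if (va == MAP_FAILED) {
		unlink(path);
		return 0;
	}

	if (unlink(path) != 0) { /* drop the name, keep the memory */
		munmap(va, DEMO_PAGE_SZ);
		return 0;
	}

	memset(va, 0xAB, DEMO_PAGE_SZ);       /* still writable after unlink */
	ok = va[DEMO_PAGE_SZ - 1] == 0xAB &&  /* and readable */
	     stat(path, &st) != 0;            /* but the path no longer exists */

	munmap(va, DEMO_PAGE_SZ);
	return ok;
}
```

map_then_unlink() should return 1: the memory remains readable and writable after unlink(), while stat() on the old path fails. The same property is what lets the running process keep its hugepages after the backing files are removed.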
[dpdk-dev] [PATCH v3] mem: command line option to delete hugepage backing files

2015-10-21 Thread shesha Sreenivasamurthy (shesha)
When an application using hugepages crashes or exits, the hugetlbfs
backing files are not cleaned up. This patch cleans up those files.
Some multi-process DPDK applications may benefit from those backing
files, so I have made the cleanup configurable: an application that
does not need the backing files can remove them, and the current
default behavior is unchanged. The application could clean them up
itself; however, the rationale for having DPDK do it is that DPDK
created the files, so it is better for DPDK to unlink them.

Signed-off-by: Shesha Sreenivasamurthy 
---
 lib/librte_eal/common/eal_common_options.c | 12 
 lib/librte_eal/common/eal_internal_cfg.h   |  1 +
 lib/librte_eal/common/eal_options.h        |  2 ++
 lib/librte_eal/linuxapp/eal/eal_memory.c   | 30 ++
 4 files changed, 45 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c
index 1f459ac..5fe6374 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -79,6 +79,7 @@ eal_long_options[] = {
{OPT_MASTER_LCORE,  1, NULL, OPT_MASTER_LCORE_NUM },
{OPT_NO_HPET,   0, NULL, OPT_NO_HPET_NUM  },
{OPT_NO_HUGE,   0, NULL, OPT_NO_HUGE_NUM  },
+   {OPT_HUGE_UNLINK,   0, NULL, OPT_HUGE_UNLINK_NUM  },
{OPT_NO_PCI,0, NULL, OPT_NO_PCI_NUM   },
{OPT_NO_SHCONF, 0, NULL, OPT_NO_SHCONF_NUM},
{OPT_PCI_BLACKLIST, 1, NULL, OPT_PCI_BLACKLIST_NUM},
@@ -722,6 +723,10 @@ eal_parse_common_option(int opt, const char *optarg,
conf->no_hugetlbfs = 1;
break;

+   case OPT_HUGE_UNLINK_NUM:
+   conf->hugepage_unlink = 1;
+   break;
+
case OPT_NO_PCI_NUM:
conf->no_pci = 1;
break;
@@ -856,6 +861,12 @@ eal_check_common_options(struct internal_config *internal_cfg)
return -1;
}

+   if (internal_cfg->no_hugetlbfs && internal_cfg->hugepage_unlink) {
+   RTE_LOG(ERR, EAL, "Option --"OPT_HUGE_UNLINK" cannot "
+   "be specified together with --"OPT_NO_HUGE"\n");
+   return -1;
+   }
+
if (rte_eal_devargs_type_count(RTE_DEVTYPE_WHITELISTED_PCI) != 0 &&
rte_eal_devargs_type_count(RTE_DEVTYPE_BLACKLISTED_PCI) != 0) {
RTE_LOG(ERR, EAL, "Options blacklist (-b) and whitelist (-w) "
@@ -906,6 +917,7 @@ eal_common_usage(void)
   "  -h, --help  This help\n"
   "\nEAL options for DEBUG use only:\n"
   "  --"OPT_NO_HUGE"   Use malloc instead of hugetlbfs\n"
+  "  --"OPT_HUGE_UNLINK"   Unlink hugepage backing file after initialization\n"
   "  --"OPT_NO_PCI"Disable PCI\n"
   "  --"OPT_NO_HPET"   Disable HPET\n"
   "  --"OPT_NO_SHCONF" No shared config (mmap'd files)\n"
diff --git a/lib/librte_eal/common/eal_internal_cfg.h b/lib/librte_eal/common/eal_internal_cfg.h
index e2ecb0d..84b075f 100644
--- a/lib/librte_eal/common/eal_internal_cfg.h
+++ b/lib/librte_eal/common/eal_internal_cfg.h
@@ -64,6 +64,7 @@ struct internal_config {
volatile unsigned force_nchannel; /**< force number of channels */
volatile unsigned force_nrank;/**< force number of ranks */
volatile unsigned no_hugetlbfs;   /**< true to disable hugetlbfs */
+   volatile unsigned hugepage_unlink; /**< true to unlink backing files */
volatile unsigned xen_dom0_support; /**< support app running on Xen Dom0*/
volatile unsigned no_pci; /**< true to disable PCI */
volatile unsigned no_hpet;/**< true to disable HPET */
diff --git a/lib/librte_eal/common/eal_options.h b/lib/librte_eal/common/eal_options.h
index f6714d9..745f38c 100644
--- a/lib/librte_eal/common/eal_options.h
+++ b/lib/librte_eal/common/eal_options.h
@@ -63,6 +63,8 @@ enum {
OPT_PROC_TYPE_NUM,
 #define OPT_NO_HPET   "no-hpet"
OPT_NO_HPET_NUM,
+#define OPT_HUGE_UNLINK "huge-unlink"
+   OPT_HUGE_UNLINK_NUM,
 #define OPT_NO_HUGE   "no-huge"
OPT_NO_HUGE_NUM,
 #define OPT_NO_PCI    "no-pci"
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index ac2745e..c7e2485 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -786,6 +786,28 @@ copy_hugepages_to_shared_mem(struct hugepage_file * dst, int dest_size,
return 0;
 }

+static int
+unlink_hugepage_files(struct hugepage_file *hugepg_tbl,
+   unsigned num_hp_info)
+{
+   unsigned socket, size;
+   int page, nrpages = 0;
+
+   /* get total number of hugepages */
+   for (size = 0; size < num_hp_info; size++)
+

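The v3 unlink_hugepage_files() works in two passes: it first sums the page counts in internal_config.hugepage_info[size].num_pages[socket], then walks that many entries of the hugepage table and unlinks every page that is still mapped (final_va != NULL), which skips the pages already released by unmap_unneeded_hugepages(). A self-contained sketch of the same logic, where struct demo_hugepage_file, DEMO_MAX_SIZES and DEMO_MAX_SOCKETS are simplified stand-ins (assumptions, not the EAL definitions) and failures are counted rather than only logged:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define DEMO_MAX_SIZES   3 /* stand-in for the number of hugepage sizes */
#define DEMO_MAX_SOCKETS 2 /* stand-in for RTE_MAX_NUMA_NODES */

/* Simplified stand-in for struct hugepage_file. */
struct demo_hugepage_file {
	void *final_va;    /* NULL once the page was unmapped as unneeded */
	char filepath[64]; /* backing file path */
};

/* Pass 1: count all pages from the per-size, per-socket table.
 * Pass 2: unlink every page that is still mapped.
 * Returns the number of failed unlinks. */
static int demo_unlink_hugepage_files(struct demo_hugepage_file *tbl,
		unsigned num_sizes, unsigned num_pages[][DEMO_MAX_SOCKETS])
{
	unsigned size, socket;
	int page, nrpages = 0, failures = 0;

	/* get total number of hugepages */
	for (size = 0; size < num_sizes; size++)
		for (socket = 0; socket < DEMO_MAX_SOCKETS; socket++)
			nrpages += (int)num_pages[size][socket];

	for (page = 0; page < nrpages; page++) {
		struct demo_hugepage_file *hp = &tbl[page];

		if (hp->final_va != NULL && unlink(hp->filepath) != 0) {
			fprintf(stderr, "%s(): Removing %s failed: %s\n",
				__func__, hp->filepath, strerror(errno));
			failures++;
		}
	}
	return failures;
}
```

Returning a failure count is a deliberate departure from the v3 function, which always returns 0, so the `< 0` check at its call site in rte_eal_hugepage_init() can never trigger.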
[dpdk-dev] [PATCH v2] mem: command line option to delete hugepage backing files

2015-10-20 Thread shesha Sreenivasamurthy (shesha)
When an application using hugepages crashes or exits, the hugetlbfs
backing files are not cleaned up. This patch cleans up those files.
Some multi-process DPDK applications may benefit from those backing
files, so I have made the cleanup configurable: an application that
does not need the backing files can remove them, and the current
default behavior is unchanged. The application could clean them up
itself; however, the rationale for having DPDK do it is that DPDK
created the files, so it is better for DPDK to unlink them.

Signed-off-by: Shesha Sreenivasamurthy 
---
 lib/librte_eal/common/eal_common_options.c | 12 +
 lib/librte_eal/common/eal_internal_cfg.h   |  1 +
 lib/librte_eal/common/eal_options.h        |  2 ++
 lib/librte_eal/linuxapp/eal/eal_memory.c   | 39 ++
 4 files changed, 54 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c
index 1f459ac..5fe6374 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -79,6 +79,7 @@ eal_long_options[] = {
{OPT_MASTER_LCORE,  1, NULL, OPT_MASTER_LCORE_NUM },
{OPT_NO_HPET,   0, NULL, OPT_NO_HPET_NUM  },
{OPT_NO_HUGE,   0, NULL, OPT_NO_HUGE_NUM  },
+   {OPT_HUGE_UNLINK,   0, NULL, OPT_HUGE_UNLINK_NUM  },
{OPT_NO_PCI,0, NULL, OPT_NO_PCI_NUM   },
{OPT_NO_SHCONF, 0, NULL, OPT_NO_SHCONF_NUM},
{OPT_PCI_BLACKLIST, 1, NULL, OPT_PCI_BLACKLIST_NUM},
@@ -722,6 +723,10 @@ eal_parse_common_option(int opt, const char *optarg,
conf->no_hugetlbfs = 1;
break;

+   case OPT_HUGE_UNLINK_NUM:
+   conf->hugepage_unlink = 1;
+   break;
+
case OPT_NO_PCI_NUM:
conf->no_pci = 1;
break;
@@ -856,6 +861,12 @@ eal_check_common_options(struct internal_config *internal_cfg)
return -1;
}

+   if (internal_cfg->no_hugetlbfs && internal_cfg->hugepage_unlink) {
+   RTE_LOG(ERR, EAL, "Option --"OPT_HUGE_UNLINK" cannot "
+   "be specified together with --"OPT_NO_HUGE"\n");
+   return -1;
+   }
+
if (rte_eal_devargs_type_count(RTE_DEVTYPE_WHITELISTED_PCI) != 0 &&
rte_eal_devargs_type_count(RTE_DEVTYPE_BLACKLISTED_PCI) != 0) {
RTE_LOG(ERR, EAL, "Options blacklist (-b) and whitelist (-w) "
@@ -906,6 +917,7 @@ eal_common_usage(void)
   "  -h, --help  This help\n"
   "\nEAL options for DEBUG use only:\n"
   "  --"OPT_NO_HUGE"   Use malloc instead of hugetlbfs\n"
+  "  --"OPT_HUGE_UNLINK"   Unlink hugepage backing file after initialization\n"
   "  --"OPT_NO_PCI"Disable PCI\n"
   "  --"OPT_NO_HPET"   Disable HPET\n"
   "  --"OPT_NO_SHCONF" No shared config (mmap'd files)\n"
diff --git a/lib/librte_eal/common/eal_internal_cfg.h b/lib/librte_eal/common/eal_internal_cfg.h
index e2ecb0d..84b075f 100644
--- a/lib/librte_eal/common/eal_internal_cfg.h
+++ b/lib/librte_eal/common/eal_internal_cfg.h
@@ -64,6 +64,7 @@ struct internal_config {
volatile unsigned force_nchannel; /**< force number of channels */
volatile unsigned force_nrank;/**< force number of ranks */
volatile unsigned no_hugetlbfs;   /**< true to disable hugetlbfs */
+   volatile unsigned hugepage_unlink; /**< true to unlink backing files */
volatile unsigned xen_dom0_support; /**< support app running on Xen Dom0*/
volatile unsigned no_pci; /**< true to disable PCI */
volatile unsigned no_hpet;/**< true to disable HPET */
diff --git a/lib/librte_eal/common/eal_options.h b/lib/librte_eal/common/eal_options.h
index f6714d9..745f38c 100644
--- a/lib/librte_eal/common/eal_options.h
+++ b/lib/librte_eal/common/eal_options.h
@@ -63,6 +63,8 @@ enum {
OPT_PROC_TYPE_NUM,
 #define OPT_NO_HPET   "no-hpet"
OPT_NO_HPET_NUM,
+#define OPT_HUGE_UNLINK "huge-unlink"
+   OPT_HUGE_UNLINK_NUM,
 #define OPT_NO_HUGE   "no-huge"
OPT_NO_HUGE_NUM,
 #define OPT_NO_PCI    "no-pci"
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index ac2745e..2b86428 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -786,6 +786,37 @@ copy_hugepages_to_shared_mem(struct hugepage_file * dst, int dest_size,
return 0;
 }

+static int
+unlink_hugepage_files(struct hugepage_file *hugepg_tbl,
+   struct hugepage_info *hpi,
+   unsigned num_hp_info)
+{
+   unsigned socket, size;
+   int page, nrpages = 0;
+
+   /* get total number of hugepages */
+   for (size = 0; s

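The option handling is the same in every revision of this patch: a new long option in eal_long_options[], a flag set in eal_parse_common_option(), and a check in eal_check_common_options() that rejects --huge-unlink together with --no-huge. A standalone sketch of that getopt_long() pattern outside the EAL; the option strings come from the patch, while struct demo_config, the enum values and demo_parse_args() are hypothetical:

```c
#include <assert.h>
#include <getopt.h>
#include <stdio.h>

enum { DEMO_OPT_NO_HUGE_NUM = 256, DEMO_OPT_HUGE_UNLINK_NUM };

struct demo_config {
	unsigned no_hugetlbfs;    /* --no-huge given */
	unsigned hugepage_unlink; /* --huge-unlink given; default 0 keeps files */
};

/* Parse argv; returns 0 on success, -1 on an unknown option or when
 * --huge-unlink is combined with --no-huge. */
static int demo_parse_args(int argc, char **argv, struct demo_config *conf)
{
	static const struct option long_opts[] = {
		{ "no-huge",     no_argument, NULL, DEMO_OPT_NO_HUGE_NUM },
		{ "huge-unlink", no_argument, NULL, DEMO_OPT_HUGE_UNLINK_NUM },
		{ NULL, 0, NULL, 0 }
	};
	int opt;

	conf->no_hugetlbfs = 0;
	conf->hugepage_unlink = 0;
	optind = 1; /* restart scanning so the function can be called again */

	while ((opt = getopt_long(argc, argv, "", long_opts, NULL)) != -1) {
		switch (opt) {
		case DEMO_OPT_NO_HUGE_NUM:
			conf->no_hugetlbfs = 1;
			break;
		case DEMO_OPT_HUGE_UNLINK_NUM:
			conf->hugepage_unlink = 1;
			break;
		default:
			return -1;
		}
	}

	/* same consistency rule as eal_check_common_options() */
	if (conf->no_hugetlbfs && conf->hugepage_unlink) {
		fprintf(stderr, "Option --huge-unlink cannot be specified "
			"together with --no-huge\n");
		return -1;
	}
	return 0;
}
```

The flag defaults to 0, so applications that do not pass --huge-unlink keep today's behavior, as the commit message intends.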
[dpdk-dev] [PATCH v2] mem: command line option to delete hugepage backing files

2015-10-20 Thread shesha Sreenivasamurthy (shesha)
When an application using hugepages crashes or exits, the hugetlbfs backing
files are not cleaned up. This patch cleans up those files. Some multi-process
DPDK applications may benefit from those backing files, so I have made the
cleanup configurable: an application that does not need the backing files can
remove them, and the current default behavior is unchanged. The application
could clean them up itself; however, the rationale for having DPDK do it is
that DPDK created the files, so it is better for DPDK to unlink them.

Signed-off-by: Shesha Sreenivasamurthy <shesha at cisco.com>
---
lib/librte_eal/common/eal_common_options.c | 12 +
lib/librte_eal/common/eal_internal_cfg.h   |  1 +
lib/librte_eal/common/eal_options.h        |  2 ++
lib/librte_eal/linuxapp/eal/eal_memory.c   | 39 ++
4 files changed, 54 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c
index 1f459ac..5fe6374 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -79,6 +79,7 @@ eal_long_options[] = {
{OPT_MASTER_LCORE,  1, NULL, OPT_MASTER_LCORE_NUM },
{OPT_NO_HPET,   0, NULL, OPT_NO_HPET_NUM  },
{OPT_NO_HUGE,   0, NULL, OPT_NO_HUGE_NUM  },
+ {OPT_HUGE_UNLINK,   0, NULL, OPT_HUGE_UNLINK_NUM  },
{OPT_NO_PCI,0, NULL, OPT_NO_PCI_NUM   },
{OPT_NO_SHCONF, 0, NULL, OPT_NO_SHCONF_NUM},
{OPT_PCI_BLACKLIST, 1, NULL, OPT_PCI_BLACKLIST_NUM},
@@ -722,6 +723,10 @@ eal_parse_common_option(int opt, const char *optarg,
conf->no_hugetlbfs = 1;
break;
+ case OPT_HUGE_UNLINK_NUM:
+ conf->hugepage_unlink = 1;
+ break;
+
case OPT_NO_PCI_NUM:
conf->no_pci = 1;
break;
@@ -856,6 +861,12 @@ eal_check_common_options(struct internal_config *internal_cfg)
return -1;
}
+ if (internal_cfg->no_hugetlbfs && internal_cfg->hugepage_unlink) {
+ RTE_LOG(ERR, EAL, "Option --"OPT_HUGE_UNLINK" cannot "
+ "be specified together with --"OPT_NO_HUGE"\n");
+ return -1;
+ }
+
if (rte_eal_devargs_type_count(RTE_DEVTYPE_WHITELISTED_PCI) != 0 &&
rte_eal_devargs_type_count(RTE_DEVTYPE_BLACKLISTED_PCI) != 0) {
RTE_LOG(ERR, EAL, "Options blacklist (-b) and whitelist (-w) "
@@ -906,6 +917,7 @@ eal_common_usage(void)
   "  -h, --help  This help\n"
   "\nEAL options for DEBUG use only:\n"
   "  --"OPT_NO_HUGE"   Use malloc instead of hugetlbfs\n"
+"  --"OPT_HUGE_UNLINK"   Unlink hugepage backing file after initialization\n"
   "  --"OPT_NO_PCI"Disable PCI\n"
   "  --"OPT_NO_HPET"   Disable HPET\n"
   "  --"OPT_NO_SHCONF" No shared config (mmap'd files)\n"
diff --git a/lib/librte_eal/common/eal_internal_cfg.h b/lib/librte_eal/common/eal_internal_cfg.h
index e2ecb0d..84b075f 100644
--- a/lib/librte_eal/common/eal_internal_cfg.h
+++ b/lib/librte_eal/common/eal_internal_cfg.h
@@ -64,6 +64,7 @@ struct internal_config {
volatile unsigned force_nchannel; /**< force number of channels */
volatile unsigned force_nrank;/**< force number of ranks */
volatile unsigned no_hugetlbfs;   /**< true to disable hugetlbfs */
+ volatile unsigned hugepage_unlink; /**< true to unlink backing files */
volatile unsigned xen_dom0_support; /**< support app running on Xen Dom0*/
volatile unsigned no_pci; /**< true to disable PCI */
volatile unsigned no_hpet;/**< true to disable HPET */
diff --git a/lib/librte_eal/common/eal_options.h b/lib/librte_eal/common/eal_options.h
index f6714d9..745f38c 100644
--- a/lib/librte_eal/common/eal_options.h
+++ b/lib/librte_eal/common/eal_options.h
@@ -63,6 +63,8 @@ enum {
OPT_PROC_TYPE_NUM,
#define OPT_NO_HPET   "no-hpet"
OPT_NO_HPET_NUM,
+#define OPT_HUGE_UNLINK "huge-unlink"
+ OPT_HUGE_UNLINK_NUM,
#define OPT_NO_HUGE   "no-huge"
OPT_NO_HUGE_NUM,
#define OPT_NO_PCI    "no-pci"
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index ac2745e..2b86428 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -786,6 +786,37 @@ copy_hugepages_to_shared_mem(struct hugepage_file * dst, int dest_size,
return 0;
}
+static int
+unlink_hugepage_files(struct hugepage_file *hugepg_tbl,
+		struct hugepage_info *hpi,
+		unsigned num_hp_info)
+{
+	unsigned socket, size;
+	int page, nrpages = 0;
+
+	/* get total number of hugepages */
+	for (size = 0; size < num_hp_info; size++)
+		for (socket = 0; socket < RTE_MAX_NUMA_NODES; socket++)
+			nrpages += internal_config.hugepage_info[size].num_pages[socket];
+
+	for (size = 0; size < num_hp_info; size++) {
+		for (socket = 0; socket < RTE_MAX_NUMA_NODES; socket++) {
+			for (page = 0; page < nrpages; page++) {
+				struct hugepage_file *hp = &hugepg_tbl[page];
+				if ((hp->size == hpi[size].hugepage_sz) &&
+					(hp->socket_id == (int) socket) &&
+					hp->final_va != 

[dpdk-dev] [PATCH] mem: Command line option to delete hugepage backing files

2015-10-14 Thread shesha Sreenivasamurthy (shesha)
When an application using hugepages crashes or exits, the hugetlbfs backing
files are not cleaned up. This patch cleans up those files. Some multi-process
DPDK applications may benefit from those backing files, so I have made the
cleanup configurable: an application that does not need the backing files can
remove them, and the current default behavior is unchanged. The application
could clean them up itself; however, the rationale for having DPDK do it is
that DPDK created the files, so it is better for DPDK to unlink them.

---
lib/librte_eal/common/eal_common_options.c | 12 ++
lib/librte_eal/common/eal_internal_cfg.h   |  1 +
lib/librte_eal/common/eal_options.h        |  2 ++
lib/librte_eal/linuxapp/eal/eal_memory.c   | 37 ++
4 files changed, 52 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c
index 1f459ac..5fe6374 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -79,6 +79,7 @@ eal_long_options[] = {
{OPT_MASTER_LCORE,  1, NULL, OPT_MASTER_LCORE_NUM },
{OPT_NO_HPET,   0, NULL, OPT_NO_HPET_NUM  },
{OPT_NO_HUGE,   0, NULL, OPT_NO_HUGE_NUM  },
+ {OPT_HUGE_UNLINK,   0, NULL, OPT_HUGE_UNLINK_NUM  },
{OPT_NO_PCI,0, NULL, OPT_NO_PCI_NUM   },
{OPT_NO_SHCONF, 0, NULL, OPT_NO_SHCONF_NUM},
{OPT_PCI_BLACKLIST, 1, NULL, OPT_PCI_BLACKLIST_NUM},
@@ -722,6 +723,10 @@ eal_parse_common_option(int opt, const char *optarg,
conf->no_hugetlbfs = 1;
break;
+ case OPT_HUGE_UNLINK_NUM:
+ conf->hugepage_unlink = 1;
+ break;
+
case OPT_NO_PCI_NUM:
conf->no_pci = 1;
break;
@@ -856,6 +861,12 @@ eal_check_common_options(struct internal_config *internal_cfg)
return -1;
}
+ if (internal_cfg->no_hugetlbfs && internal_cfg->hugepage_unlink) {
+ RTE_LOG(ERR, EAL, "Option --"OPT_HUGE_UNLINK" cannot "
+ "be specified together with --"OPT_NO_HUGE"\n");
+ return -1;
+ }
+
if (rte_eal_devargs_type_count(RTE_DEVTYPE_WHITELISTED_PCI) != 0 &&
rte_eal_devargs_type_count(RTE_DEVTYPE_BLACKLISTED_PCI) != 0) {
RTE_LOG(ERR, EAL, "Options blacklist (-b) and whitelist (-w) "
@@ -906,6 +917,7 @@ eal_common_usage(void)
   "  -h, --help  This help\n"
   "\nEAL options for DEBUG use only:\n"
   "  --"OPT_NO_HUGE"   Use malloc instead of hugetlbfs\n"
+"  --"OPT_HUGE_UNLINK"   Unlink hugepage backing file after initialization\n"
   "  --"OPT_NO_PCI"Disable PCI\n"
   "  --"OPT_NO_HPET"   Disable HPET\n"
   "  --"OPT_NO_SHCONF" No shared config (mmap'd files)\n"
diff --git a/lib/librte_eal/common/eal_internal_cfg.h b/lib/librte_eal/common/eal_internal_cfg.h
index e2ecb0d..84b075f 100644
--- a/lib/librte_eal/common/eal_internal_cfg.h
+++ b/lib/librte_eal/common/eal_internal_cfg.h
@@ -64,6 +64,7 @@ struct internal_config {
volatile unsigned force_nchannel; /**< force number of channels */
volatile unsigned force_nrank;/**< force number of ranks */
volatile unsigned no_hugetlbfs;   /**< true to disable hugetlbfs */
+ volatile unsigned hugepage_unlink; /**< true to unlink backing files */
volatile unsigned xen_dom0_support; /**< support app running on Xen Dom0*/
volatile unsigned no_pci; /**< true to disable PCI */
volatile unsigned no_hpet;/**< true to disable HPET */
diff --git a/lib/librte_eal/common/eal_options.h b/lib/librte_eal/common/eal_options.h
index f6714d9..745f38c 100644
--- a/lib/librte_eal/common/eal_options.h
+++ b/lib/librte_eal/common/eal_options.h
@@ -63,6 +63,8 @@ enum {
OPT_PROC_TYPE_NUM,
#define OPT_NO_HPET   "no-hpet"
OPT_NO_HPET_NUM,
+#define OPT_HUGE_UNLINK "huge-unlink"
+ OPT_HUGE_UNLINK_NUM,
#define OPT_NO_HUGE   "no-huge"
OPT_NO_HUGE_NUM,
#define OPT_NO_PCI    "no-pci"
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index ac2745e..016cac6 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -786,6 +786,37 @@ copy_hugepages_to_shared_mem(struct hugepage_file * dst, int dest_size,
return 0;
}
+static int
+unlink_hugepage_bkup_files(struct hugepage_file *hugepg_tbl,
+		struct hugepage_info *hpi,
+		unsigned num_hp_info)
+{
+	unsigned socket, size;
+	int page, nrpages = 0;
+
+	/* get total number of hugepages */
+	for (size = 0; size < num_hp_info; size++)
+		for (socket = 0; socket < RTE_MAX_NUMA_NODES; socket++)
+			nrpages += internal_config.hugepage_info[size].num_pages[socket];
+
+	for (size = 0; size < num_hp_info; size++) {
+		for (socket = 0; socket < RTE_MAX_NUMA_NODES; socket++) {
+			for (page = 0; page < nrpages; page++) {
+				struct hugepage_file *hp = &hugepg_tbl[page];
+				if ((hp->size == hpi[size].hugepage_sz) &&
+					(hp->socket_id == (int) socket) &&
+					hp->final_va != NULL) {
+					if (unlink(hp->f

[dpdk-dev] [PATCH] mem: Command line option to delete hugepage backing files

2015-10-14 Thread shesha Sreenivasamurthy (shesha)
---
lib/librte_eal/common/eal_common_options.c | 12 ++
lib/librte_eal/common/eal_internal_cfg.h   |  1 +
lib/librte_eal/common/eal_options.h        |  2 ++
lib/librte_eal/linuxapp/eal/eal_memory.c   | 37 ++
4 files changed, 52 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c
index 1f459ac..5fe6374 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -79,6 +79,7 @@ eal_long_options[] = {
{OPT_MASTER_LCORE,  1, NULL, OPT_MASTER_LCORE_NUM },
{OPT_NO_HPET,   0, NULL, OPT_NO_HPET_NUM  },
{OPT_NO_HUGE,   0, NULL, OPT_NO_HUGE_NUM  },
+ {OPT_HUGE_UNLINK,   0, NULL, OPT_HUGE_UNLINK_NUM  },
{OPT_NO_PCI,0, NULL, OPT_NO_PCI_NUM   },
{OPT_NO_SHCONF, 0, NULL, OPT_NO_SHCONF_NUM},
{OPT_PCI_BLACKLIST, 1, NULL, OPT_PCI_BLACKLIST_NUM},
@@ -722,6 +723,10 @@ eal_parse_common_option(int opt, const char *optarg,
conf->no_hugetlbfs = 1;
break;
+ case OPT_HUGE_UNLINK_NUM:
+ conf->hugepage_unlink = 1;
+ break;
+
case OPT_NO_PCI_NUM:
conf->no_pci = 1;
break;
@@ -856,6 +861,12 @@ eal_check_common_options(struct internal_config *internal_cfg)
return -1;
}
+ if (internal_cfg->no_hugetlbfs && internal_cfg->hugepage_unlink) {
+ RTE_LOG(ERR, EAL, "Option --"OPT_HUGE_UNLINK" cannot "
+ "be specified together with --"OPT_NO_HUGE"\n");
+ return -1;
+ }
+
if (rte_eal_devargs_type_count(RTE_DEVTYPE_WHITELISTED_PCI) != 0 &&
rte_eal_devargs_type_count(RTE_DEVTYPE_BLACKLISTED_PCI) != 0) {
RTE_LOG(ERR, EAL, "Options blacklist (-b) and whitelist (-w) "
@@ -906,6 +917,7 @@ eal_common_usage(void)
   "  -h, --help  This help\n"
   "\nEAL options for DEBUG use only:\n"
   "  --"OPT_NO_HUGE"   Use malloc instead of hugetlbfs\n"
+"  --"OPT_HUGE_UNLINK"   Unlink hugepage backing file after initialization\n"
   "  --"OPT_NO_PCI"Disable PCI\n"
   "  --"OPT_NO_HPET"   Disable HPET\n"
   "  --"OPT_NO_SHCONF" No shared config (mmap'd files)\n"
diff --git a/lib/librte_eal/common/eal_internal_cfg.h b/lib/librte_eal/common/eal_internal_cfg.h
index e2ecb0d..84b075f 100644
--- a/lib/librte_eal/common/eal_internal_cfg.h
+++ b/lib/librte_eal/common/eal_internal_cfg.h
@@ -64,6 +64,7 @@ struct internal_config {
volatile unsigned force_nchannel; /**< force number of channels */
volatile unsigned force_nrank;/**< force number of ranks */
volatile unsigned no_hugetlbfs;   /**< true to disable hugetlbfs */
+ volatile unsigned hugepage_unlink; /**< true to unlink backing files */
volatile unsigned xen_dom0_support; /**< support app running on Xen Dom0*/
volatile unsigned no_pci; /**< true to disable PCI */
volatile unsigned no_hpet;/**< true to disable HPET */
diff --git a/lib/librte_eal/common/eal_options.h b/lib/librte_eal/common/eal_options.h
index f6714d9..745f38c 100644
--- a/lib/librte_eal/common/eal_options.h
+++ b/lib/librte_eal/common/eal_options.h
@@ -63,6 +63,8 @@ enum {
OPT_PROC_TYPE_NUM,
#define OPT_NO_HPET   "no-hpet"
OPT_NO_HPET_NUM,
+#define OPT_HUGE_UNLINK "huge-unlink"
+ OPT_HUGE_UNLINK_NUM,
#define OPT_NO_HUGE   "no-huge"
OPT_NO_HUGE_NUM,
#define OPT_NO_PCI    "no-pci"
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index ac2745e..016cac6 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -786,6 +786,37 @@ copy_hugepages_to_shared_mem(struct hugepage_file * dst, int dest_size,
return 0;
}
+static int
+unlink_hugepage_bkup_files(struct hugepage_file *hugepg_tbl,
+		struct hugepage_info *hpi,
+		unsigned num_hp_info)
+{
+	unsigned socket, size;
+	int page, nrpages = 0;
+
+	/* get total number of hugepages */
+	for (size = 0; size < num_hp_info; size++)
+		for (socket = 0; socket < RTE_MAX_NUMA_NODES; socket++)
+			nrpages += internal_config.hugepage_info[size].num_pages[socket];
+
+	for (size = 0; size < num_hp_info; size++) {
+		for (socket = 0; socket < RTE_MAX_NUMA_NODES; socket++) {
+			for (page = 0; page < nrpages; page++) {
+				struct hugepage_file *hp = &hugepg_tbl[page];
+				if ((hp->size == hpi[size].hugepage_sz) &&
+					(hp->socket_id == (int) socket) &&
+					hp->final_va != NULL) {
+					if (unlink(hp->filepath)) {
+						RTE_LOG(WARNING, EAL, "%s(): Removing %s failed: %s\n",
+							__func__, hp->filepath, strerror(errno));
+					}
+				}
+			} /* foreach page */
+		} /* foreach socket */
+	} /* foreach pagesize */
+	return 0;
+}
+
/*
  * unmaps hugepages that are not going to be used. since we originally allocate
  * ALL hugepages (not just those we need), additional unmapping needs to be done.
@@ -1290,6 +1321,12 @@ rte_eal_hugepage_init(void)
}
/* free the temporary hugepag

[dpdk-dev] Unlinking hugepage backing file after initialization

2015-09-30 Thread shesha Sreenivasamurthy (shesha)
My bad for saying it's not working; apologies.

Isn't it correct to say that single-process applications do not benefit from
having backing files? In that case, we can make this configurable with a
command-line argument that either unlinks or keeps the backing files,
defaulting to keeping them. Single-process applications that do not need
these files can pass the additional parameter to unlink them.

From: "Ananyev, Konstantin" <konstantin.anan...@intel.com>
Date: Wednesday, September 30, 2015 at 2:53 PM
To: Cisco Employee <shesha at cisco.com>, "dev at dpdk.org" <dev at dpdk.org>
Cc: "Michael S. Tsirkin" <mst at redhat.com>
Subject: RE: [dpdk-dev] Unlinking hugepage backing file after initialization



-----Original Message-----
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of shesha Sreenivasamurthy (shesha)
Sent: Wednesday, September 30, 2015 10:44 PM
To: dev at dpdk.org
Cc: Michael S. Tsirkin
Subject: Re: [dpdk-dev] Unlinking hugepage backing file after initialization
What I heard is the following: a multi-process DPDK application, working in
either master-worker or master-slave fashion, can potentially benefit from
keeping the backing files in hugetlbfs. However, this does not work today, as
the pages are cleaned up and added back when the application restarts.

Who says it is not working?
I admit that the DPDK MP model is probably a bit constrained, but it does work.
It is probably worth reading the docs:
http://dpdk.org/doc/guides/prog_guide/multi_proc_support.html
and/or looking at the code that implements MP support inside DPDK.
I think that might make things clearer.
Konstantin

On the other hand, for a single-process application there is no real benefit
in keeping the pages around.
Therefore, I was wondering if we could make this configurable with a
command-line argument that either unlinks or keeps the backing files.
From: "Michael S. Tsirkin" <mst at redhat.com>
Date: Tuesday, September 29, 2015 at 2:35 PM
To: Cisco Employee <shesha at cisco.com>
Cc: "Xie, Huawei" <huawei.xie at intel.com>, "dev at dpdk.org" <dev at dpdk.org>
Subject: Re: [dpdk-dev] Unlinking hugepage backing file after initialization
On Tue, Sep 29, 2015 at 05:50:00PM +, shesha Sreenivasamurthy (shesha) wrote:
Sure. Then, is there any real reason why the backing files should not be
unlinked?
AFAIK qemu unlinks them already.
--
MST




[dpdk-dev] Unlinking hugepage backing file after initialiation

2015-09-30 Thread shesha Sreenivasamurthy (shesha)
What I heard is the following: a multi-process DPDK application, working either 
in master-worker or master-slave fashion, can potentially benefit from keeping 
the backing files in hugetlbfs. However, this does not work today, as the pages 
are cleaned up and added back when the application restarts. On the other hand, 
for a single-process application there is actually no benefit in keeping the 
pages around.

Therefore, I was wondering if we can make this configurable by passing a 
command line argument that will either unlink or keep the backing files.

--
- Thanks
char * (*shesha) (uint64_t cache, uint8_t F00D)
{ return 0xC0DE; }
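The proposed switch could look roughly like the following minimal sketch. The option name `--unlink-hugefiles` and the function `want_unlink_backing_files()` are hypothetical and not part of DPDK's EAL; the sketch only shows the intended default (keep the files, so secondary processes can still attach) and an explicit opt-in to unlinking for single-process applications.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical option name, used here for illustration only. */
#define UNLINK_OPT "--unlink-hugefiles"

/*
 * Return true if the (hypothetical) unlink option was given. The default
 * is false: keep the backing files so secondary processes can attach.
 * Scanning stops at "--", which separates EAL options from application
 * options on a DPDK command line.
 */
static bool want_unlink_backing_files(int argc, char *const argv[])
{
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "--") == 0)
            break;
        if (strcmp(argv[i], UNLINK_OPT) == 0)
            return true;
    }
    return false;
}
```

The result would then be consulted wherever the backing files are mapped, so that only deployments that explicitly ask for it give up secondary-process support.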

From: "Michael S. Tsirkin" <m...@redhat.com>
Date: Tuesday, September 29, 2015 at 2:35 PM
To: Cisco Employee <shesha at cisco.com>
Cc: "Xie, Huawei" <huawei.xie at intel.com>, "dev at dpdk.org" <dev at dpdk.org>
Subject: Re: [dpdk-dev] Unlinking hugepage backing file after initialiation

On Tue, Sep 29, 2015 at 05:50:00PM +, shesha Sreenivasamurthy (shesha) 
wrote:
Sure. Then, is there any real reason why the backing files should not be
unlinked?

AFAIK qemu unlinks them already.

--
MST



[dpdk-dev] Unlinking hugepage backing file after initialiation

2015-09-29 Thread shesha Sreenivasamurthy (shesha)
Sure. Then, is there any real reason why the backing files should not be 
unlinked?

--
- Thanks
char * (*shesha) (uint64_t cache, uint8_t F00D)
{ return 0xC0DE; }

From: "Michael S. Tsirkin" <m...@redhat.com>
Date: Tuesday, September 29, 2015 at 9:16 AM
To: Cisco Employee <shesha at cisco.com>
Cc: "Xie, Huawei" <huawei.xie at intel.com>, "dev at dpdk.org" <dev at dpdk.org>
Subject: Re: [dpdk-dev] Unlinking hugepage backing file after initialiation

On Tue, Sep 29, 2015 at 03:48:08PM +, shesha Sreenivasamurthy (shesha) 
wrote:
If huge pages are allocated for the guest and the guest crashes, there is a
chance that the new guest may not be able to get huge pages again, because some
other guest or process on the host has used them. But I do not understand the
memory corruption you are talking about. In my opinion, if a process using a
piece of memory goes away, it should not re-attach to the same piece of memory
without running a sanity check on it.

Guest memory is allocated and freed by the hypervisor, right?
I don't think it's DPDK's job.

--
MST



[dpdk-dev] Unlinking hugepage backing file after initialiation

2015-09-29 Thread shesha Sreenivasamurthy (shesha)
If huge pages are allocated for the guest and the guest crashes, there is a 
chance that the new guest may not be able to get huge pages again, because some 
other guest or process on the host has used them. But I do not understand the 
memory corruption you are talking about. In my opinion, if a process using a 
piece of memory goes away, it should not re-attach to the same piece of memory 
without running a sanity check on it.

--
- Thanks
char * (*shesha) (uint64_t cache, uint8_t F00D)
{ return 0xC0DE; }

From: "Xie, Huawei" <huawei.xie at intel.com>
Date: Tuesday, September 29, 2015 at 8:15 AM
To: Cisco Employee <shesha at cisco.com>
Cc: "dev at dpdk.org" <dev at dpdk.org>, "ms >> Michael S. Tsirkin" <mst at redhat.com>
Subject: Re: [dpdk-dev] Unlinking hugepage backing file after initialiation

On 9/29/2015 10:38 AM, Xie, Huawei wrote:
On 9/29/2015 8:04 AM, shesha Sreenivasamurthy (shesha) wrote:
Hello,
As of DPDK 2.1, backing files are created in hugetlbfs during mapping (in 
eal_memory.c::rte_eal_hugepage_init()), and these files are not cleaned up 
(unlinked) after initialization (mmap-ing). This means that when the application 
crashes or is stopped, the memory is still consumed. Therefore, is there any 
reason not to unlink the backing files after initialization? If not, I will send 
a patch for the change.
shesha:
You remind me of the virtio unexpected-crash issue. DPDK runs in user
space, so it is quite possible for it to die unexpectedly, either by crashing
or by being killed.
When the DPDK virtio app crashes, it doesn't have a chance to notify the
host, so the host is still using its memory, which is backed by guest huge pages.
If the huge page files are still reserved in hugetlbfs, we have a chance to
recover virtio first and then unlink the huge pages.
Otherwise, if the huge pages are allocated by another process, its memory
could be corrupted by the host.

Certainly it is not implemented like that for this purpose, but I think
it serves as a temporary solution for this user-space virtio driver issue.

I realized it is not a virtio-specific issue; it applies to all user-space
drivers.
And the chance of it happening is very, very small.

Also, as Bruce/Konstantin commented, it is implemented this way to support
multiple processes.


/huawei



[dpdk-dev] Unlinking hugepage backing file after initialiation

2015-09-29 Thread shesha Sreenivasamurthy (shesha)
What do you mean by a secondary process attaching to a primary process (a 
master-slave setup?)? If the first process crashed, how can we be sure that the 
memory is not half-written?

--
- Thanks
char * (*shesha) (uint64_t cache, uint8_t F00D)
{ return 0xC0DE; }
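One way to get the sanity check discussed in this thread is an application-level header in the shared region: the writer marks the region as being written, updates the payload, stores a checksum, and only then publishes a committed state. The layout, the names, and the FNV-1a checksum below are illustrative assumptions, not a DPDK mechanism; the sketch only shows how a restarted or attaching process could refuse half-written memory.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout for an application-defined region in shared memory. */
#define REGION_MAGIC 0x48504731u /* arbitrary tag */
#define PAYLOAD_SIZE 256

enum { REGION_EMPTY = 0, REGION_WRITING = 1, REGION_COMMITTED = 2 };

struct region_hdr {
    uint32_t magic;
    _Atomic uint32_t state;
    uint32_t checksum; /* FNV-1a over the whole payload */
    unsigned char payload[PAYLOAD_SIZE];
};

/* 32-bit FNV-1a: a cheap integrity check, not a cryptographic hash. */
static uint32_t fnv1a32(const unsigned char *p, size_t n)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Writer: mark the region in progress, update it, then publish it. */
static void region_write(struct region_hdr *r, const void *data, size_t len)
{
    if (len > PAYLOAD_SIZE)
        len = PAYLOAD_SIZE;
    atomic_store(&r->state, REGION_WRITING);
    /* Full fence: in practice keeps the payload writes after the mark. */
    atomic_thread_fence(memory_order_seq_cst);
    r->magic = REGION_MAGIC;
    memcpy(r->payload, data, len);
    memset(r->payload + len, 0, PAYLOAD_SIZE - len);
    r->checksum = fnv1a32(r->payload, PAYLOAD_SIZE);
    /* Release store: the writes above are ordered before COMMITTED. */
    atomic_store_explicit(&r->state, REGION_COMMITTED, memory_order_release);
}

/*
 * Sanity check before trusting memory left behind by another (possibly
 * crashed) process. A crash mid-write normally leaves state == WRITING;
 * the checksum is a second line of defense against a payload that does
 * not match its recorded checksum.
 */
static bool region_is_consistent(struct region_hdr *r)
{
    if (atomic_load_explicit(&r->state, memory_order_acquire) != REGION_COMMITTED)
        return false;
    if (r->magic != REGION_MAGIC)
        return false;
    return fnv1a32(r->payload, PAYLOAD_SIZE) == r->checksum;
}
```

A process that finds an inconsistent region would then reinitialize it instead of re-attaching to its contents.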

From: Bruce Richardson <bruce.richard...@intel.com>
Organization: Intel Shannon Ltd.
Date: Tuesday, September 29, 2015 at 4:14 AM
To: "Ananyev, Konstantin" <konstantin.ananyev at intel.com>
Cc: Cisco Employee <shesha at cisco.com>, "dev at dpdk.org" <dev at dpdk.org>
Subject: Re: [dpdk-dev] Unlinking hugepage backing file after initialiation

On Tue, Sep 29, 2015 at 09:03:15AM +, Ananyev, Konstantin wrote:
> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of shesha 
> Sreenivasamurthy (shesha)
> Sent: Tuesday, September 29, 2015 1:04 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] Unlinking hugepage backing file after initialiation
>
> Hello,
> As of DPDK 2.1, backing files are created in hugetlbfs during mapping (in 
> eal_memory.c::rte_eal_hugepage_init()), and these files are
> not cleaned up (unlinked) after initialization (mmap-ing). This means that when 
> the application crashes or is stopped, the memory is still
> consumed. Therefore, is there any reason not to unlink the backing files after 
> initialization
For secondary process(es) to be able to open/map them too?
Konstantin

Exactly. The hugepages are kept present on the file system so that secondary
processes can use them to attach to the primary process's memory in a
multi-process setup.
What is done instead is that any old hugepage files are cleaned up when the
application starts (or restarts).

Regards,
/Bruce
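The startup cleanup described above can be sketched as follows. This is a simplified illustration, not DPDK's actual code: it assumes that every process still using a backing file holds a shared flock() on it, so a file on which a non-blocking exclusive lock succeeds is treated as stale and removed. The function name and the prefix parameter are hypothetical.

```c
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Remove stale backing files in `dir` whose names start with `prefix`.
 * Assumption: live users hold a shared flock(), so a successful
 * non-blocking exclusive lock means nobody is using the file.
 * Returns the number of files removed, or -1 if `dir` cannot be opened.
 */
static int clear_stale_hugefiles(const char *dir, const char *prefix)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    int removed = 0;
    size_t plen = strlen(prefix);
    struct dirent *de;
    char path[4096];

    while ((de = readdir(d)) != NULL) {
        if (strncmp(de->d_name, prefix, plen) != 0)
            continue;
        int n = snprintf(path, sizeof path, "%s/%s", dir, de->d_name);
        if (n < 0 || (size_t)n >= sizeof path)
            continue; /* path too long: skip rather than truncate */
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            continue;
        if (flock(fd, LOCK_EX | LOCK_NB) == 0) {
            /* No live process holds a lock: treat the file as stale. */
            if (unlink(path) == 0)
                removed++;
            flock(fd, LOCK_UN);
        }
        close(fd);
    }
    closedir(d);
    return removed;
}
```

For example, `clear_stale_hugefiles("/dev/hugepages", "rtemap_")` would target files named like DPDK's default `rtemap_N` backing files, assuming hugetlbfs is mounted at `/dev/hugepages`.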

>? If no, I will send a patch for the change.
>
> --
> - Thanks
> char * (*shesha) (uint64_t cache, uint8_t F00D)
> { return 0xC0DE; }



[dpdk-dev] Unlinking hugepage backing file after initialiation

2015-09-29 Thread shesha Sreenivasamurthy (shesha)
Additional info:

Before starting the application:
--------------------------------
cat /sys/devices/system/node/node*/meminfo  | grep HugePages_
Node 0 HugePages_Total:  2048
Node 0 HugePages_Free:   2048
Node 0 HugePages_Surp:  0
Node 1 HugePages_Total:  2048
Node 1 HugePages_Free:   2048
Node 1 HugePages_Surp:  0

While the application is running:
---------------------------------
cat /sys/devices/system/node/node*/meminfo  | grep HugePages_
Node 0 HugePages_Total:  2048
Node 0 HugePages_Free:   1536
Node 0 HugePages_Surp:  0
Node 1 HugePages_Total:  2048
Node 1 HugePages_Free:   1536
Node 1 HugePages_Surp:  0

After the application is stopped:
---------------------------------
cat /sys/devices/system/node/node*/meminfo  | grep HugePages_
Node 0 HugePages_Total:  2048
Node 0 HugePages_Free:   1536
Node 0 HugePages_Surp:  0
Node 1 HugePages_Total:  2048
Node 1 HugePages_Free:   1536
Node 1 HugePages_Surp:  0

With the backing files unlinked in eal_memory.c::rte_eal_hugepage_init(), after 
the application is stopped:

cat /sys/devices/system/node/node*/meminfo  | grep HugePages_
Node 0 HugePages_Total:  2048
Node 0 HugePages_Free:   2048
Node 0 HugePages_Surp:  0
Node 1 HugePages_Total:  2048
Node 1 HugePages_Free:   2048
Node 1 HugePages_Surp:  0

--
- Thanks
char * (*shesha) (uint64_t cache, uint8_t F00D)
{ return 0xC0DE; }
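To compare the numbers above without reading them by eye, the per-node lines can be summed. A minimal sketch, assuming the `Node N HugePages_Free: M` line format shown in the output above (the contents of the /sys/devices/system/node/node*/meminfo files); the function name is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Sum HugePages_Free over all "Node N HugePages_Free: M" lines in `text`.
 * Returns the total, or -1 if no such line is found.
 */
static long total_free_hugepages(const char *text)
{
    long total = 0;
    int found = 0;
    const char *line = text;

    while (line != NULL && *line != '\0') {
        int node;
        long free_pages;
        /* Lines such as "HugePages_Total" fail the literal match. */
        if (sscanf(line, "Node %d HugePages_Free: %ld", &node, &free_pages) == 2) {
            total += free_pages;
            found = 1;
        }
        const char *nl = strchr(line, '\n');
        line = nl ? nl + 1 : NULL;
    }
    return found ? total : -1;
}
```

With the outputs above, the total is 4096 before the run, 3072 while it runs, still 3072 after it stops with the files kept (1024 pages never returned), and 4096 again when the files are unlinked.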

From: dev <dev-bounces at dpdk.org> on behalf of Cisco Employee <she...@cisco.com>
Date: Monday, September 28, 2015 at 5:04 PM
To: "dev at dpdk.org" <dev at dpdk.org>
Subject: [dpdk-dev] Unlinking hugepage backing file after initialiation

Hello,
As of DPDK 2.1, backing files are created in hugetlbfs during mapping (in 
eal_memory.c::rte_eal_hugepage_init()), and these files are not cleaned up 
(unlinked) after initialization (mmap-ing). This means that when the application 
crashes or is stopped, the memory is still consumed. Therefore, is there any 
reason not to unlink the backing files after initialization? If not, I will send 
a patch for the change.

--
- Thanks
char * (*shesha) (uint64_t cache, uint8_t F00D)
{ return 0xC0DE; }



[dpdk-dev] Unlinking hugepage backing file after initialiation

2015-09-29 Thread shesha Sreenivasamurthy (shesha)
Hello,
As of DPDK 2.1, backing files are created in hugetlbfs during mapping (in 
eal_memory.c::rte_eal_hugepage_init()), and these files are not cleaned up 
(unlinked) after initialization (mmap-ing). This means that when the application 
crashes or is stopped, the memory is still consumed. Therefore, is there any 
reason not to unlink the backing files after initialization? If not, I will send 
a patch for the change.

--
- Thanks
char * (*shesha) (uint64_t cache, uint8_t F00D)
{ return 0xC0DE; }
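The proposed change, and the trade-off raised in the replies, can be illustrated with plain POSIX calls. This is a minimal sketch, not DPDK's rte_eal_hugepage_init(): `map_backing_file()` is a hypothetical helper, and on hugetlbfs `len` would have to be a multiple of the hugepage size. Once the file is unlinked after mmap(), the mapping keeps working and the pages are released when the last mapping goes away, but no other process can open the path any more, which is the secondary-process concern raised in the replies.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Create `path`, size it to `len` bytes and map it shared. If
 * `unlink_after` is nonzero, remove the directory entry right after
 * mmap(): the mapping stays valid, and the kernel frees the pages once
 * the last mapping is gone (e.g. on exit or crash). Returns NULL on error.
 */
static void *map_backing_file(const char *path, size_t len, int unlink_after)
{
    int fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return NULL;
    }
    void *va = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping keeps its own reference to the file */
    if (va == MAP_FAILED)
        return NULL;
    if (unlink_after)
        unlink(path); /* from now on, nobody can open this path to attach */
    return va;
}
```

With `unlink_after` set to 0, a second process can still open the same path and mmap() it with MAP_SHARED to see the same memory, which is what the multi-process model relies on.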