This is an RFC patch demonstrating automatic figure references
in the documentation. The figures in the generated
HTML and PDF docs will be automatically numbered based on
section. Requires Sphinx >= 1.3.
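
The numbering is enabled by a single Sphinx setting added to
doc/guides/conf.py (the first hunk below):

    numfig = True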

The patch makes the following changes.

* Changes image:: directives to figure:: and moves the image
  captions into the figures.

* Adds captions to figures that didn't previously have any.

* Un-templates the |image-name| substitution definitions
  into explicit figure:: directives. They weren't used more
  than once anyway, and Sphinx doesn't support substitutions
  for figure::.

* Adds a target to each image that didn't previously
  have one so that they can be cross-referenced.

* Renames existing image targets to match the image
  names for consistency.

* Replaces the Figures lists with paired :numref: and :ref:
  entries that generate the numbering and captions
  automatically.

* Replaces "Figure" references with automatic :numref:
  references.
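
For illustration, a typical conversion (this example is taken
from the intel_vf.rst hunk below) replaces:

    .. _nic_figure_1:

    **Figure 1. Virtualization for a Single Port NIC in SR-IOV Mode**

    .. image:: img/single_port_nic.*

with a labelled, captioned figure:

    .. _figure_single_port_nic:

    .. figure:: img/single_port_nic.*

       Virtualization for a Single Port NIC in SR-IOV Mode

which can then be cross-referenced in the text as
:numref:`figure_single_port_nic`, with the figure number
generated by Sphinx.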

Note: a V2 patch would be required to do the same for
      tables.
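
      A hypothetical sketch of that table conversion, following
      the same pattern (the label and caption here are
      illustrative only):

          .. _table_example:

          .. table:: Example Table Caption

             (existing table body unchanged)

      with in-text references written as :numref:`table_example`.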

Signed-off-by: John McNamara <john.mcnamara at intel.com>
---
 doc/guides/conf.py                                 |   2 +
 doc/guides/nics/index.rst                          |  18 ++-
 doc/guides/nics/intel_vf.rst                       |  37 ++---
 doc/guides/nics/virtio.rst                         |  18 ++-
 doc/guides/nics/vmxnet3.rst                        |  18 ++-
 doc/guides/prog_guide/env_abstraction_layer.rst    |   8 +-
 doc/guides/prog_guide/index.rst                    |  92 +++++++-----
 doc/guides/prog_guide/ivshmem_lib.rst              |   8 +-
 doc/guides/prog_guide/kernel_nic_interface.rst     |  40 ++---
 .../prog_guide/link_bonding_poll_mode_drv_lib.rst  |  43 ++++--
 doc/guides/prog_guide/lpm6_lib.rst                 |   8 +-
 doc/guides/prog_guide/lpm_lib.rst                  |   8 +-
 doc/guides/prog_guide/malloc_lib.rst               |   9 +-
 doc/guides/prog_guide/mbuf_lib.rst                 |  20 +--
 doc/guides/prog_guide/mempool_lib.rst              |  32 ++--
 doc/guides/prog_guide/multi_proc_support.rst       |   9 +-
 doc/guides/prog_guide/overview.rst                 |   9 +-
 doc/guides/prog_guide/packet_distrib_lib.rst       |  15 +-
 doc/guides/prog_guide/packet_framework.rst         |  81 +++++-----
 doc/guides/prog_guide/qos_framework.rst            | 163 +++++++--------------
 doc/guides/prog_guide/ring_lib.rst                 | 159 +++++++++++---------
 doc/guides/sample_app_ug/dist_app.rst              |  20 ++-
 doc/guides/sample_app_ug/exception_path.rst        |   8 +-
 doc/guides/sample_app_ug/index.rst                 |  58 ++++----
 doc/guides/sample_app_ug/intel_quickassist.rst     |  11 +-
 doc/guides/sample_app_ug/kernel_nic_interface.rst  |   9 +-
 doc/guides/sample_app_ug/l2_forward_job_stats.rst  |  23 +--
 .../sample_app_ug/l2_forward_real_virtual.rst      |  22 +--
 .../sample_app_ug/l3_forward_access_ctrl.rst       |  21 ++-
 doc/guides/sample_app_ug/load_balancer.rst         |   9 +-
 doc/guides/sample_app_ug/multi_process.rst         |  36 ++---
 doc/guides/sample_app_ug/qos_scheduler.rst         |   9 +-
 doc/guides/sample_app_ug/quota_watermark.rst       |  36 ++---
 doc/guides/sample_app_ug/test_pipeline.rst         |   9 +-
 doc/guides/sample_app_ug/vhost.rst                 |  45 ++----
 doc/guides/sample_app_ug/vm_power_management.rst   |  18 +--
 doc/guides/sample_app_ug/vmdq_dcb_forwarding.rst   |  11 +-
 doc/guides/xen/pkt_switch.rst                      |  30 ++--
 38 files changed, 539 insertions(+), 633 deletions(-)

diff --git a/doc/guides/conf.py b/doc/guides/conf.py
index b1ef323..1bc031f 100644
--- a/doc/guides/conf.py
+++ b/doc/guides/conf.py
@@ -41,6 +41,8 @@ release = version

 master_doc = 'index'

+numfig = True
+
 latex_documents = [
     ('index',
      'doc.tex',
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index aadbae3..1ee67fa 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -50,14 +50,20 @@ Network Interface Controller Drivers

 **Figures**

-:ref:`Figure 1. Virtualization for a Single Port NIC in SR-IOV Mode <nic_figure_1>`
+:numref:`figure_single_port_nic` :ref:`figure_single_port_nic`

-:ref:`Figure 2. SR-IOV Performance Benchmark Setup <nic_figure_2>`
+:numref:`figure_perf_benchmark` :ref:`figure_perf_benchmark`

-:ref:`Figure 3. Fast Host-based Packet Processing <nic_figure_3>`
+:numref:`figure_fast_pkt_proc` :ref:`figure_fast_pkt_proc`

-:ref:`Figure 4. SR-IOV Inter-VM Communication <nic_figure_4>`
+:numref:`figure_inter_vm_comms` :ref:`figure_inter_vm_comms`

-:ref:`Figure 5. Virtio Host2VM Communication Example Using KNI vhost Back End <nic_figure_5>`
+:numref:`figure_host_vm_comms` :ref:`figure_host_vm_comms`

-:ref:`Figure 6. Virtio Host2VM Communication Example Using Qemu vhost Back End <nic_figure_6>`
+:numref:`figure_host_vm_comms_qemu` :ref:`figure_host_vm_comms_qemu`
+
+:numref:`figure_vmxnet3_int` :ref:`figure_vmxnet3_int`
+
+:numref:`figure_vswitch_vm` :ref:`figure_vswitch_vm`
+
+:numref:`figure_vm_vm_comms` :ref:`figure_vm_vm_comms`
diff --git a/doc/guides/nics/intel_vf.rst b/doc/guides/nics/intel_vf.rst
index e773627..17e83a2 100644
--- a/doc/guides/nics/intel_vf.rst
+++ b/doc/guides/nics/intel_vf.rst
@@ -49,9 +49,9 @@ SR-IOV Mode Utilization in a DPDK Environment
 The DPDK uses the SR-IOV feature for hardware-based I/O sharing in IOV mode.
 Therefore, it is possible to partition SR-IOV capability on Ethernet controller NIC resources logically and
 expose them to a virtual machine as a separate PCI function called a "Virtual Function".
-Refer to Figure 10.
+Refer to :numref:`figure_single_port_nic`.

-Therefore, a NIC is logically distributed among multiple virtual machines (as shown in Figure 10),
+Therefore, a NIC is logically distributed among multiple virtual machines (as shown in :numref:`figure_single_port_nic`),
 while still having global data in common to share with the Physical Function and other Virtual Functions.
 The DPDK fm10kvf, i40evf, igbvf or ixgbevf as a Poll Mode Driver (PMD) serves for the Intel® 82576 Gigabit Ethernet Controller,
 Intel® Ethernet Controller I350 family, Intel® 82599 10 Gigabit Ethernet Controller NIC,
@@ -72,11 +72,12 @@ For more detail on SR-IOV, please refer to the following documents:

 *   `Scalable I/O Virtualized Servers <http://www.intel.com/content/www/us/en/virtualization/server-virtualization/scalable-i-o-virtualized-servers-paper.html>`_

-.. _nic_figure_1:
+.. _figure_single_port_nic:

-**Figure 1. Virtualization for a Single Port NIC in SR-IOV Mode**
+.. figure:: img/single_port_nic.*
+
+   Virtualization for a Single Port NIC in SR-IOV Mode

-.. image:: img/single_port_nic.*

 Physical and Virtual Function Infrastructure
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -548,13 +549,14 @@ The setup procedure is as follows:
         can also be used to bind and unbind devices to a virtual machine in Ubuntu.
         If this option is used, step 6 in the instructions provided will be different.

-    *   The Virtual Machine Monitor (see Figure 11) is equivalent to a Host OS with KVM installed as described in the instructions.
+    *   The Virtual Machine Monitor (see :numref:`figure_perf_benchmark`) is equivalent to a Host OS with KVM installed as described in the instructions.
+
+.. _figure_perf_benchmark:

-.. _nic_figure_2:
+.. figure:: img/perf_benchmark.*

-**Figure 2. Performance Benchmark Setup**
+   Performance Benchmark Setup

-.. image:: img/perf_benchmark.*

 DPDK SR-IOV PMD PF/VF Driver Usage Model
 ----------------------------------------
@@ -569,14 +571,15 @@ the DPDK VF PMD driver performs the same throughput result as a non-VT native en
 With such host instance fast packet processing, lots of services such as filtering, QoS,
 DPI can be offloaded on the host fast path.

-Figure 12 shows the scenario where some VMs directly communicate externally via a VFs,
+:numref:`figure_fast_pkt_proc` shows the scenario where some VMs directly communicate externally via VFs,
 while others connect to a virtual switch and share the same uplink bandwidth.

-.. _nic_figure_3:
+.. _figure_fast_pkt_proc:
+
+.. figure:: img/fast_pkt_proc.*

-**Figure 3. Fast Host-based Packet Processing**
+   Fast Host-based Packet Processing

-.. image:: img/fast_pkt_proc.*

 SR-IOV (PF/VF) Approach for Inter-VM Communication
 --------------------------------------------------
@@ -587,7 +590,7 @@ So VF-to-VF traffic within the same physical port (VM0<->VM1) have hardware acce
 However, when VF crosses physical ports (VM0<->VM2), there is no such hardware bridge.
 In this case, the DPDK PMD PF driver provides host forwarding between such VMs.

-Figure 13 shows an example.
+:numref:`figure_inter_vm_comms` shows an example.
 In this case an update of the MAC address lookup tables in both the NIC and host DPDK application is required.

 In the NIC, writing the destination of a MAC address belongs to another cross device VM to the PF specific pool.
@@ -598,8 +601,8 @@ that is, the packet is forwarded to the correct PF pool.
 The SR-IOV NIC switch forwards the packet to a specific VM according to the MAC destination address
 which belongs to the destination VF on the VM.

-.. _nic_figure_4:
+.. _figure_inter_vm_comms:

-**Figure 4. Inter-VM Communication**
+.. figure:: img/inter_vm_comms.*

-.. image:: img/inter_vm_comms.*
+   Inter-VM Communication
diff --git a/doc/guides/nics/virtio.rst b/doc/guides/nics/virtio.rst
index 073d980..9f18b3a 100644
--- a/doc/guides/nics/virtio.rst
+++ b/doc/guides/nics/virtio.rst
@@ -106,11 +106,12 @@ Virtio with kni vhost Back End

 This section demonstrates kni vhost back end example setup for Phy-VM Communication.

-.. _nic_figure_5:
+.. _figure_host_vm_comms:

-**Figure 5. Host2VM Communication Example Using kni vhost Back End**
+.. figure:: img/host_vm_comms.*
+
+   Host2VM Communication Example Using kni vhost Back End

-.. image:: img/host_vm_comms.*

 Host2VM communication example

@@ -174,7 +175,9 @@ Host2VM communication example

     We use testpmd as the forwarding application in this example.

-    .. image:: img/console.*
+    .. figure:: img/console.*
+
+       Running testpmd

 #.  Use IXIA packet generator to inject a packet stream into the KNI physical port.

@@ -185,11 +188,12 @@ Host2VM communication example
 Virtio with qemu virtio Back End
 --------------------------------

-.. _nic_figure_6:
+.. _figure_host_vm_comms_qemu:
+
+.. figure:: img/host_vm_comms_qemu.*

-**Figure 6. Host2VM Communication Example Using qemu vhost Back End**
+   Host2VM Communication Example Using qemu vhost Back End

-.. image:: img/host_vm_comms_qemu.*

 .. code-block:: console

diff --git a/doc/guides/nics/vmxnet3.rst b/doc/guides/nics/vmxnet3.rst
index 3aa5b40..fe32a41 100644
--- a/doc/guides/nics/vmxnet3.rst
+++ b/doc/guides/nics/vmxnet3.rst
@@ -121,7 +121,11 @@ The following prerequisites apply:
 *   Before starting a VM, a VMXNET3 interface to a VM through VMware vSphere Client must be assigned.
     This is shown in the figure below.

-.. image:: img/vmxnet3_int.*
+.. _figure_vmxnet3_int:
+
+.. figure:: img/vmxnet3_int.*
+
+   Assigning a VMXNET3 interface to a VM using VMware vSphere Client

 .. note::

@@ -142,7 +146,11 @@ VMXNET3 with a Native NIC Connected to a vSwitch

 This section describes an example setup for Phy-vSwitch-VM-Phy communication.

-.. image:: img/vswitch_vm.*
+.. _figure_vswitch_vm:
+
+.. figure:: img/vswitch_vm.*
+
+   VMXNET3 with a Native NIC Connected to a vSwitch

 .. note::

@@ -159,7 +167,11 @@ VMXNET3 Chaining VMs Connected to a vSwitch

 The following figure shows an example VM-to-VM communication over a Phy-VM-vSwitch-VM-Phy communication channel.

-.. image:: img/vm_vm_comms.*
+.. _figure_vm_vm_comms:
+
+.. figure:: img/vm_vm_comms.*
+
+   VMXNET3 Chaining VMs Connected to a vSwitch

 .. note::

diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index 1b531e2..3656ca6 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -85,13 +85,12 @@ A check is also performed at initialization time to ensure that the micro archit
 Then, the main() function is called. The core initialization and launch is done in rte_eal_init() (see the API documentation).
 It consist of calls to the pthread library (more specifically, pthread_self(), pthread_create(), and pthread_setaffinity_np()).

-.. _pg_figure_2:
+.. _figure_linuxapp_launch:

-**Figure 2. EAL Initialization in a Linux Application Environment**
+.. figure:: img/linuxapp_launch.*

-.. image3_png has been replaced
+   EAL Initialization in a Linux Application Environment

-|linuxapp_launch|

 .. note::

@@ -367,4 +366,3 @@ We expect only 50% of CPU spend on packet IO.
     echo  50000 > pkt_io/cpu.cfs_quota_us


-.. |linuxapp_launch| image:: img/linuxapp_launch.*
diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index a9966a0..84a657e 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -80,71 +80,97 @@ Programmer's Guide

 **Figures**

-:ref:`Figure 1. Core Components Architecture <pg_figure_1>`
+:numref:`figure_architecture-overview` :ref:`figure_architecture-overview`

-:ref:`Figure 2. EAL Initialization in a Linux Application Environment <pg_figure_2>`
+:numref:`figure_linuxapp_launch` :ref:`figure_linuxapp_launch`

-:ref:`Figure 3. Example of a malloc heap and malloc elements within the malloc library <pg_figure_3>`
+:numref:`figure_malloc_heap` :ref:`figure_malloc_heap`

-:ref:`Figure 4. Ring Structure <pg_figure_4>`
+:numref:`figure_ring1` :ref:`figure_ring1`

-:ref:`Figure 5. Two Channels and Quad-ranked DIMM Example <pg_figure_5>`
+:numref:`figure_ring-enqueue1` :ref:`figure_ring-enqueue1`

-:ref:`Figure 6. Three Channels and Two Dual-ranked DIMM Example <pg_figure_6>`
+:numref:`figure_ring-enqueue2` :ref:`figure_ring-enqueue2`

-:ref:`Figure 7. A mempool in Memory with its Associated Ring <pg_figure_7>`
+:numref:`figure_ring-enqueue3` :ref:`figure_ring-enqueue3`

-:ref:`Figure 8. An mbuf with One Segment <pg_figure_8>`
+:numref:`figure_ring-dequeue1` :ref:`figure_ring-dequeue1`

-:ref:`Figure 9. An mbuf with Three Segments <pg_figure_9>`
+:numref:`figure_ring-dequeue2` :ref:`figure_ring-dequeue2`

-:ref:`Figure 16. Memory Sharing in the Intel® DPDK Multi-process Sample Application <pg_figure_16>`
+:numref:`figure_ring-dequeue3` :ref:`figure_ring-dequeue3`

-:ref:`Figure 17. Components of an Intel® DPDK KNI Application <pg_figure_17>`
+:numref:`figure_ring-mp-enqueue1` :ref:`figure_ring-mp-enqueue1`

-:ref:`Figure 18. Packet Flow via mbufs in the Intel DPDK® KNI <pg_figure_18>`
+:numref:`figure_ring-mp-enqueue2` :ref:`figure_ring-mp-enqueue2`

-:ref:`Figure 19. vHost-net Architecture Overview <pg_figure_19>`
+:numref:`figure_ring-mp-enqueue3` :ref:`figure_ring-mp-enqueue3`

-:ref:`Figure 20. KNI Traffic Flow <pg_figure_20>`
+:numref:`figure_ring-mp-enqueue4` :ref:`figure_ring-mp-enqueue4`

-:ref:`Figure 21. Complex Packet Processing Pipeline with QoS Support <pg_figure_21>`
+:numref:`figure_ring-mp-enqueue5` :ref:`figure_ring-mp-enqueue5`

-:ref:`Figure 22. Hierarchical Scheduler Block Internal Diagram <pg_figure_22>`
+:numref:`figure_ring-modulo1` :ref:`figure_ring-modulo1`

-:ref:`Figure 23. Scheduling Hierarchy per Port <pg_figure_23>`
+:numref:`figure_ring-modulo2` :ref:`figure_ring-modulo2`

-:ref:`Figure 24. Internal Data Structures per Port <pg_figure_24>`
+:numref:`figure_memory-management` :ref:`figure_memory-management`

-:ref:`Figure 25. Prefetch Pipeline for the Hierarchical Scheduler Enqueue Operation <pg_figure_25>`
+:numref:`figure_prefetch_pipeline` :ref:`figure_prefetch_pipeline`

-:ref:`Figure 26. Pipe Prefetch State Machine for the Hierarchical Scheduler Dequeue Operation <pg_figure_26>`
+:numref:`figure_pipe_prefetch_sm` :ref:`figure_pipe_prefetch_sm`

-:ref:`Figure 27. High-level Block Diagram of the Intel® DPDK Dropper <pg_figure_27>`
+:numref:`figure_mbuf1` :ref:`figure_mbuf1`

-:ref:`Figure 28. Flow Through the Dropper <pg_figure_28>`
+:numref:`figure_mbuf2` :ref:`figure_mbuf2`

-:ref:`Figure 29. Example Data Flow Through Dropper <pg_figure_29>`
+:numref:`figure_multi_process_memory` :ref:`figure_multi_process_memory`

-:ref:`Figure 30. Packet Drop Probability for a Given RED Configuration <pg_figure_30>`
+:numref:`figure_kernel_nic_intf` :ref:`figure_kernel_nic_intf`

-:ref:`Figure 31. Initial Drop Probability (pb), Actual Drop probability (pa) Computed Using a Factor 1 (Blue Curve) and a Factor 2 (Red Curve) <pg_figure_31>`
+:numref:`figure_pkt_flow_kni` :ref:`figure_pkt_flow_kni`

-:ref:`Figure 32. Example of packet processing pipeline. The input ports 0 and 1 are connected with the output ports 0, 1 and 2 through tables 0 and 1. <pg_figure_32>`
+:numref:`figure_vhost_net_arch2` :ref:`figure_vhost_net_arch2`

-:ref:`Figure 33. Sequence of steps for hash table operations in packet processing context <pg_figure_33>`
+:numref:`figure_kni_traffic_flow` :ref:`figure_kni_traffic_flow`

-:ref:`Figure 34. Data structures for configurable key size hash tables <pg_figure_34>`

-:ref:`Figure 35. Bucket search pipeline for key lookup operation (configurable key size hash tables) <pg_figure_35>`
+:numref:`figure_pkt_proc_pipeline_qos` :ref:`figure_pkt_proc_pipeline_qos`

-:ref:`Figure 36. Pseudo-code for match, match_many and match_pos <pg_figure_36>`
+:numref:`figure_hier_sched_blk` :ref:`figure_hier_sched_blk`

-:ref:`Figure 37. Data structures for 8-byte key hash tables <pg_figure_37>`
+:numref:`figure_sched_hier_per_port` :ref:`figure_sched_hier_per_port`

-:ref:`Figure 38. Data structures for 16-byte key hash tables <pg_figure_38>`
+:numref:`figure_data_struct_per_port` :ref:`figure_data_struct_per_port`
+
+:numref:`figure_prefetch_pipeline` :ref:`figure_prefetch_pipeline`
+
+:numref:`figure_pipe_prefetch_sm` :ref:`figure_pipe_prefetch_sm`
+
+:numref:`figure_blk_diag_dropper` :ref:`figure_blk_diag_dropper`
+
+:numref:`figure_flow_tru_droppper` :ref:`figure_flow_tru_droppper`
+
+:numref:`figure_ex_data_flow_tru_dropper` :ref:`figure_ex_data_flow_tru_dropper`
+
+:numref:`figure_pkt_drop_probability` :ref:`figure_pkt_drop_probability`
+
+:numref:`figure_drop_probability_graph` :ref:`figure_drop_probability_graph`
+
+:numref:`figure_figure32` :ref:`figure_figure32`
+
+:numref:`figure_figure33` :ref:`figure_figure33`
+
+:numref:`figure_figure34` :ref:`figure_figure34`
+
+:numref:`figure_figure35` :ref:`figure_figure35`
+
+:numref:`figure_figure37` :ref:`figure_figure37`
+
+:numref:`figure_figure38` :ref:`figure_figure38`
+
+:numref:`figure_figure39` :ref:`figure_figure39`

-:ref:`Figure 39. Bucket search pipeline for key lookup operation (single key size hash tables) <pg_figure_39>`

 **Tables**

diff --git a/doc/guides/prog_guide/ivshmem_lib.rst b/doc/guides/prog_guide/ivshmem_lib.rst
index c76d2b3..af4c7a9 100644
--- a/doc/guides/prog_guide/ivshmem_lib.rst
+++ b/doc/guides/prog_guide/ivshmem_lib.rst
@@ -43,9 +43,9 @@ they are automatically recognized by the DPDK Environment Abstraction Layer (EAL

 A typical DPDK IVSHMEM use case looks like the following.

-.. image28_png has been renamed

-|ivshmem|
+.. figure:: img/ivshmem.*
+
+   Typical Ivshmem use case
+

 The same could work with several virtual machines, providing host-to-VM or VM-to-VM communication.
 The maximum number of metadata files is 32 (by default) and each metadata file can contain different (or even the same) hugepages.
@@ -154,5 +156,3 @@ It is important to note that once QEMU is started, it holds on to the hugepages
 As a result, if the user wishes to shut down or restart the IVSHMEM host application,
 it is not enough to simply shut the application down.
 The virtual machine must also be shut down (if not, it will hold onto outdated host data).
-
-.. |ivshmem| image:: img/ivshmem.*
diff --git a/doc/guides/prog_guide/kernel_nic_interface.rst b/doc/guides/prog_guide/kernel_nic_interface.rst
index bac2215..3402fd2 100644
--- a/doc/guides/prog_guide/kernel_nic_interface.rst
+++ b/doc/guides/prog_guide/kernel_nic_interface.rst
@@ -42,15 +42,14 @@ The benefits of using the DPDK KNI are:

 *   Allows an interface with the kernel network stack.

-The components of an application using the DPDK Kernel NIC Interface are shown in Figure 17.
+The components of an application using the DPDK Kernel NIC Interface are shown in :numref:`figure_kernel_nic_intf`.

-.. _pg_figure_17:
+.. _figure_kernel_nic_intf:

-**Figure 17. Components of a DPDK KNI Application**
+.. figure:: img/kernel_nic_intf.*

-.. image43_png has been renamed
+   Components of a DPDK KNI Application

-|kernel_nic_intf|

 The DPDK KNI Kernel Module
 --------------------------
@@ -114,15 +113,14 @@ To minimize the amount of DPDK code running in kernel space, the mbuf mempool is
 The kernel module will be aware of mbufs,
 but all mbuf allocation and free operations will be handled by the DPDK application only.

-Figure 18 shows a typical scenario with packets sent in both directions.
+:numref:`figure_pkt_flow_kni` shows a typical scenario with packets sent in both directions.

-.. _pg_figure_18:
+.. _figure_pkt_flow_kni:

-**Figure 18. Packet Flow via mbufs in the DPDK KNI**
+.. figure:: img/pkt_flow_kni.*

-.. image44_png has been renamed
+   Packet Flow via mbufs in the DPDK KNI

-|pkt_flow_kni|

 Use Case: Ingress
 -----------------
@@ -189,13 +187,12 @@ it naturally supports both legacy virtio -net and the DPDK PMD virtio.
 There is a little penalty that comes from the non-polling mode of vhost.
 However, it scales throughput well when using KNI in multi-thread mode.

-.. _pg_figure_19:
+.. _figure_vhost_net_arch2:

-**Figure 19. vHost-net Architecture Overview**
+.. figure:: img/vhost_net_arch.*

-.. image45_png has been renamed
+   vHost-net Architecture Overview

-|vhost_net_arch|

 Packet Flow
 ~~~~~~~~~~~
@@ -208,13 +205,12 @@ All the packet copying, irrespective of whether it is on the transmit or receive
 happens in the context of vhost kthread.
 Every vhost-net device is exposed to a front end virtio device in the guest.

-.. _pg_figure_20:
+.. _figure_kni_traffic_flow:

-**Figure 20. KNI Traffic Flow**
+.. figure:: img/kni_traffic_flow.*

-.. image46_png  has been renamed
+   KNI Traffic Flow

-|kni_traffic_flow|

 Sample Usage
 ~~~~~~~~~~~~
@@ -280,11 +276,3 @@ since the kni-vhost does not yet support those features.
 Even if the option is turned on, kni-vhost will ignore the information that the header contains.
 When working with legacy virtio on the guest, it is better to turn off unsupported offload features using ethtool -K.
 Otherwise, there may be problems such as an incorrect L4 checksum error.
-
-.. |kni_traffic_flow| image:: img/kni_traffic_flow.*
-
-.. |vhost_net_arch| image:: img/vhost_net_arch.*
-
-.. |pkt_flow_kni| image:: img/pkt_flow_kni.*
-
-.. |kernel_nic_intf| image:: img/kernel_nic_intf.*
diff --git a/doc/guides/prog_guide/link_bonding_poll_mode_drv_lib.rst b/doc/guides/prog_guide/link_bonding_poll_mode_drv_lib.rst
index 24a1a36..fd3ac5e 100644
--- a/doc/guides/prog_guide/link_bonding_poll_mode_drv_lib.rst
+++ b/doc/guides/prog_guide/link_bonding_poll_mode_drv_lib.rst
@@ -35,7 +35,10 @@ In addition to Poll Mode Drivers (PMDs) for physical and virtual hardware,
 DPDK also includes a pure-software library that
 allows physical PMD's to be bonded together to create a single logical PMD.

-|bond-overview|
+.. figure:: img/bond-overview.*
+
+   Bonded PMDs
+

 The Link Bonding PMD library(librte_pmd_bond) supports bonding of groups of
 ``rte_eth_dev`` ports of the same speed and duplex to provide
@@ -62,7 +65,10 @@ Currently the Link Bonding PMD library supports 4 modes of operation:

 *   **Round-Robin (Mode 0):**

-|bond-mode-0|
+.. figure:: img/bond-mode-0.*
+
+   Round-Robin (Mode 0)
+

     This mode provides load balancing and fault tolerance by transmission of
     packets in sequential order from the first available slave device through
@@ -72,7 +78,10 @@ Currently the Link Bonding PMD library supports 4 modes of operation:

 *   **Active Backup (Mode 1):**

-|bond-mode-1|
+.. figure:: img/bond-mode-1.*
+
+   Active Backup (Mode 1)
+

     In this mode only one slave in the bond is active at any time, a different
     slave becomes active if, and only if, the primary active slave fails,
@@ -82,7 +91,10 @@ Currently the Link Bonding PMD library supports 4 modes of operation:

 *   **Balance XOR (Mode 2):**

-|bond-mode-2|
+.. figure:: img/bond-mode-2.*
+
+   Balance XOR (Mode 2)
+

     This mode provides transmit load balancing (based on the selected
     transmission policy) and fault tolerance. The default policy (layer2) uses
@@ -101,14 +113,20 @@ Currently the Link Bonding PMD library supports 4 modes of operation:

 *   **Broadcast (Mode 3):**

-|bond-mode-3|
+.. figure:: img/bond-mode-3.*
+
+   Broadcast (Mode 3)
+

     This mode provides fault tolerance by transmission of packets on all slave
     ports.

 *   **Link Aggregation 802.3AD (Mode 4):**

-|bond-mode-4|
+.. figure:: img/bond-mode-4.*
+
+   Link Aggregation 802.3AD (Mode 4)
+

     This mode provides dynamic link aggregation according to the 802.3ad
     specification. It negotiates and monitors aggregation groups that share the
@@ -128,7 +146,10 @@ Currently the Link Bonding PMD library supports 4 modes of operation:

 *   **Transmit Load Balancing (Mode 5):**

-|bond-mode-5|
+.. figure:: img/bond-mode-5.*
+
+   Transmit Load Balancing (Mode 5)
+

     This mode provides an adaptive transmit load balancing. It dynamically
     changes the transmitting slave, according to the computed load. Statistics
@@ -433,11 +454,3 @@ Create a bonded device in balance mode with two slaves specified by their PCI ad
 .. code-block:: console

     $RTE_TARGET/app/testpmd -c '0xf' -n 4 --vdev 'eth_bond0,mode=2, slave=0000:00a:00.01,slave=0000:004:00.00,xmit_policy=l34' -- --port-topology=chained
-
-.. |bond-overview| image:: img/bond-overview.*
-.. |bond-mode-0| image:: img/bond-mode-0.*
-.. |bond-mode-1| image:: img/bond-mode-1.*
-.. |bond-mode-2| image:: img/bond-mode-2.*
-.. |bond-mode-3| image:: img/bond-mode-3.*
-.. |bond-mode-4| image:: img/bond-mode-4.*
-.. |bond-mode-5| image:: img/bond-mode-5.*
diff --git a/doc/guides/prog_guide/lpm6_lib.rst b/doc/guides/prog_guide/lpm6_lib.rst
index abc5adb..87f5066 100644
--- a/doc/guides/prog_guide/lpm6_lib.rst
+++ b/doc/guides/prog_guide/lpm6_lib.rst
@@ -108,9 +108,11 @@ This is not feasible due to resource restrictions.
 By splitting the process in different tables/levels and limiting the number of tbl8s,
 we can greatly reduce memory consumption while maintaining a very good lookup speed (one memory access per level).

-.. image40_png has been renamed

-|tbl24_tbl8_tbl8|
+.. figure:: img/tbl24_tbl8_tbl8.*
+
+   Table split into different levels
+

 An entry in a table contains the following fields:

@@ -231,5 +233,3 @@ Use Case: IPv6 Forwarding
 -------------------------

 The LPM algorithm is used to implement the Classless Inter-Domain Routing (CIDR) strategy used by routers implementing IP forwarding.
-
-.. |tbl24_tbl8_tbl8| image:: img/tbl24_tbl8_tbl8.*
diff --git a/doc/guides/prog_guide/lpm_lib.rst b/doc/guides/prog_guide/lpm_lib.rst
index 692e37f..c33e469 100644
--- a/doc/guides/prog_guide/lpm_lib.rst
+++ b/doc/guides/prog_guide/lpm_lib.rst
@@ -90,9 +90,11 @@ Instead, this approach takes advantage of the fact that rules longer than 24 bit
 By splitting the process in two different tables/levels and limiting the number of tbl8s,
 we can greatly reduce memory consumption while maintaining a very good lookup speed (one memory access, most of the times).

-.. image39 has been renamed

-|tbl24_tbl8|
+.. figure:: img/tbl24_tbl8.*
+
+   Table split into different levels
+

 An entry in tbl24 contains the following fields:

@@ -219,5 +221,3 @@ References

 *   Pankaj Gupta, Algorithms for Routing Lookups and Packet Classification, PhD Thesis, Stanford University,
     2000  (`http://klamath.stanford.edu/~pankaj/thesis/ thesis_1sided.pdf <http://klamath.stanford.edu/~pankaj/thesis/%20thesis_1sided.pdf>`_ )
-
-.. |tbl24_tbl8| image:: img/tbl24_tbl8.*
diff --git a/doc/guides/prog_guide/malloc_lib.rst b/doc/guides/prog_guide/malloc_lib.rst
index b9298f8..6418fab 100644
--- a/doc/guides/prog_guide/malloc_lib.rst
+++ b/doc/guides/prog_guide/malloc_lib.rst
@@ -117,13 +117,12 @@ The key fields of the heap structure and their function are described below (see
     since these are never touched except when they are to be freed again -
     at which point the pointer to the block is an input to the free() function.

-.. _pg_figure_3:
+.. _figure_malloc_heap:

-**Figure 3. Example of a malloc heap and malloc elements within the malloc library**
+.. figure:: img/malloc_heap.*

-.. image4_png has been renamed
+   Example of a malloc heap and malloc elements within the malloc library

-|malloc_heap|

 Structure: malloc_elem
 ^^^^^^^^^^^^^^^^^^^^^^
@@ -232,5 +231,3 @@ These next and previous elements are then checked to see if they too are free,
 and if so, they are merged with the current elements.
 This means that we can never have two free memory blocks adjacent to one another,
 they are always merged into a single block.
-
-.. |malloc_heap| image:: img/malloc_heap.*
diff --git a/doc/guides/prog_guide/mbuf_lib.rst b/doc/guides/prog_guide/mbuf_lib.rst
index 8f546e0..8845039 100644
--- a/doc/guides/prog_guide/mbuf_lib.rst
+++ b/doc/guides/prog_guide/mbuf_lib.rst
@@ -71,23 +71,21 @@ Message buffers may be used to carry control information, packets, events,
 and so on between different entities in the system.
 Message buffers may also use their buffer pointers to point to other message buffer data sections or other structures.

-Figure 8 and Figure 9 show some of these scenarios.
+:numref:`figure_mbuf1` and :numref:`figure_mbuf2` show some of these scenarios.

-.. _pg_figure_8:
+.. _figure_mbuf1:

-**Figure 8. An mbuf with One Segment**
+.. figure:: img/mbuf1.*

-.. image22_png  has been replaced
+   An mbuf with One Segment

-|mbuf1|

-.. _pg_figure_9:
+.. _figure_mbuf2:

-**Figure 9. An mbuf with Three Segments**
+.. figure:: img/mbuf2.*

-.. image23_png has been replaced
+   An mbuf with Three Segments

-|mbuf2|

 The Buffer Manager implements a fairly standard set of buffer access functions to manipulate network packets.

@@ -277,7 +275,3 @@ Use Cases
 ---------

 All networking application should use mbufs to transport network packets.
-
-.. |mbuf1| image:: img/mbuf1.*
-
-.. |mbuf2| image:: img/mbuf2.*
diff --git a/doc/guides/prog_guide/mempool_lib.rst b/doc/guides/prog_guide/mempool_lib.rst
index f9b7cfe..f0ca06f 100644
--- a/doc/guides/prog_guide/mempool_lib.rst
+++ b/doc/guides/prog_guide/mempool_lib.rst
@@ -74,28 +74,27 @@ When running an application, the EAL command line options provide the ability to

     The command line must always have the number of memory channels specified for the processor.

-Examples of alignment for different DIMM architectures are shown in Figure 5 and Figure 6.
+Examples of alignment for different DIMM architectures are shown in
+:numref:`figure_memory-management` and :numref:`figure_memory-management2`.

-.. _pg_figure_5:
+.. _figure_memory-management:

-**Figure 5. Two Channels and Quad-ranked DIMM Example**
+.. figure:: img/memory-management.*

-.. image19_png has been replaced
+   Two Channels and Quad-ranked DIMM Example

-|memory-management|

 In this case, the assumption is that a packet is 16 blocks of 64 bytes, which is not true.

 The Intel® 5520 chipset has three channels, so in most cases,
 no padding is required between objects (except for objects whose size are n x 3 x 64 bytes blocks).

-.. _pg_figure_6:
+.. _figure_memory-management2:

-**Figure 6. Three Channels and Two Dual-ranked DIMM Example**
+.. figure:: img/memory-management2.*

-.. image20_png has been replaced
+   Three Channels and Two Dual-ranked DIMM Example

-|memory-management2|

 When creating a new pool, the user can specify to use this feature or not.

@@ -119,15 +118,14 @@ This cache can be enabled or disabled at creation of the pool.

 The maximum size of the cache is static and is defined at compilation time (CONFIG_RTE_MEMPOOL_CACHE_MAX_SIZE).

-Figure 7 shows a cache in operation.
+:numref:`figure_mempool` shows a cache in operation.

-.. _pg_figure_7:
+.. _figure_mempool:

-**Figure 7. A mempool in Memory with its Associated Ring**
+.. figure:: img/mempool.*

-.. image21_png has been replaced
+   A mempool in Memory with its Associated Ring

-|mempool|

 Use Cases
 ---------
@@ -140,9 +138,3 @@ Below are some examples:
 *   :ref:`Environment Abstraction Layer <Environment_Abstraction_Layer>` , for logging service

 *   Any application that needs to allocate fixed-sized objects in the data plane and that will be continuously utilized by the system.
-
-.. |memory-management| image:: img/memory-management.*
-
-.. |memory-management2| image:: img/memory-management2.*
-
-.. |mempool| image:: img/mempool.*
diff --git a/doc/guides/prog_guide/multi_proc_support.rst b/doc/guides/prog_guide/multi_proc_support.rst
index 25a6056..6562f0d 100644
--- a/doc/guides/prog_guide/multi_proc_support.rst
+++ b/doc/guides/prog_guide/multi_proc_support.rst
@@ -83,13 +83,12 @@ and point to the same objects, in both processes.
     Refer to Section 23.3 "Multi-process Limitations" for details of
     how Linux kernel Address-Space Layout Randomization (ASLR) can affect memory sharing.

-.. _pg_figure_16:
+.. _figure_multi_process_memory:

-**Figure 16. Memory Sharing in the DPDK Multi-process Sample Application**
+.. figure:: img/multi_process_memory.*

-.. image42_png has been replaced
+   Memory Sharing in the DPDK Multi-process Sample Application

-|multi_process_memory|

 The EAL also supports an auto-detection mode (set by EAL --proc-type=auto flag ),
 whereby an DPDK process is started as a secondary instance if a primary instance is already running.
@@ -199,5 +198,3 @@ instead of the functions which do the hashing internally, such as rte_hash_add()
     which means that only the first, primary DPDK process instance can open and mmap  /dev/hpet.
     If the number of required DPDK processes exceeds that of the number of available HPET comparators,
     the TSC (which is the default timer in this release) must be used as a time source across all processes instead of the HPET.
-
-.. |multi_process_memory| image:: img/multi_process_memory.*
diff --git a/doc/guides/prog_guide/overview.rst b/doc/guides/prog_guide/overview.rst
index 062d923..cef6ca7 100644
--- a/doc/guides/prog_guide/overview.rst
+++ b/doc/guides/prog_guide/overview.rst
@@ -120,13 +120,12 @@ Core Components
 The *core components* are a set of libraries that provide all the elements needed
 for high-performance packet processing applications.

-.. _pg_figure_1:
+.. _figure_architecture-overview:

-**Figure 1. Core Components Architecture**
+.. figure:: img/architecture-overview.*

-.. image2_png has been replaced
+   Core Components Architecture

-|architecture-overview|

 Memory Manager (librte_malloc)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -203,5 +202,3 @@ librte_net
 The librte_net library is a collection of IP protocol definitions and convenience macros.
 It is based on code from the FreeBSD* IP stack and contains protocol numbers (for use in IP headers),
 IP-related macros, IPv4/IPv6 header structures and TCP, UDP and SCTP header structures.
-
-.. |architecture-overview| image:: img/architecture-overview.*
diff --git a/doc/guides/prog_guide/packet_distrib_lib.rst b/doc/guides/prog_guide/packet_distrib_lib.rst
index 767accc..b5bdabb 100644
--- a/doc/guides/prog_guide/packet_distrib_lib.rst
+++ b/doc/guides/prog_guide/packet_distrib_lib.rst
@@ -38,7 +38,10 @@ which is responsible for load balancing or distributing packets,
 and a set of worker lcores which are responsible for receiving the packets from the distributor and operating on them.
 The model of operation is shown in the diagram below.

-|packet_distributor1|
+.. figure:: img/packet_distributor1.*
+
+   Packet Distributor mode of operation
+

 Distributor Core Operation
 --------------------------
@@ -91,9 +94,11 @@ No packet ordering guarantees are made about packets which do not share a common
 Using the process and returned_pkts API, the following application workflow can be used,
 while allowing packet order within a packet flow -- identified by a tag -- to be maintained.

-.. image41_png has been renamed

-|packet_distributor2|
+.. figure:: img/packet_distributor2.*
+
+   Application workflow
+

 The flush and clear_returns API calls, mentioned previously,
 are likely of less use that the process and returned_pkts APIS, and are principally provided to aid in unit testing of the library.
@@ -110,7 +115,3 @@ Since it may be desirable to vary the number of worker cores, depending on the t
 i.e. to save power at times of lighter load,
 it is possible to have a worker stop processing packets by calling "rte_distributor_return_pkt()" to indicate that
 it has finished the current packet and does not want a new one.
-
-.. |packet_distributor1| image:: img/packet_distributor1.*
-
-.. |packet_distributor2| image:: img/packet_distributor2.*
diff --git a/doc/guides/prog_guide/packet_framework.rst b/doc/guides/prog_guide/packet_framework.rst
index 8e8e32f..42bbbaa 100644
--- a/doc/guides/prog_guide/packet_framework.rst
+++ b/doc/guides/prog_guide/packet_framework.rst
@@ -66,15 +66,15 @@ one of the table entries (on lookup hit) or the default table entry (on lookup m
 provides the set of actions to be applied on the current packet,
 as well as the next hop for the packet, which can be either another table, an output port or packet drop.

-An example of packet processing pipeline is presented in Figure 32:
+An example of packet processing pipeline is presented in :numref:`figure_figure32`:

-.. _pg_figure_32:
+.. _figure_figure32:

-**Figure 32 Example of Packet Processing Pipeline where Input Ports 0 and 1 are Connected with Output Ports 0, 1 and 2 through Tables 0 and 1**
+.. figure:: img/figure32.*

-.. Object_1_png has been renamed
+   Example of Packet Processing Pipeline where Input Ports 0 and 1
+   are Connected with Output Ports 0, 1 and 2 through Tables 0 and 1

-|figure32|

 Port Library Design
 -------------------
@@ -344,13 +344,14 @@ considering *n_bits* as the number of bits set in *bucket_mask = n_buckets - 1*,
 this means that all the keys that end up in the same hash table bucket have the lower *n_bits* of their signature identical.
 In order to reduce the number of keys in the same bucket (collisions), the number of hash table buckets needs to be increased.

-In packet processing context, the sequence of operations involved in hash table operations is described in Figure 33:
+In packet processing context, the sequence of operations involved in hash table operations is described in :numref:`figure_figure33`:

-.. _pg_figure_33:
+.. _figure_figure33:

-**Figure 33 Sequence of Steps for Hash Table Operations in a Packet Processing Context**
+.. figure:: img/figure33.*
+
+   Sequence of Steps for Hash Table Operations in a Packet Processing Context

-|figure33|


 Hash Table Use Cases
@@ -553,16 +554,15 @@ This avoids the important cost associated with flushing the CPU core execution p
 Configurable Key Size Hash Table
 """"""""""""""""""""""""""""""""

-Figure 34, Table 25 and Table 26 detail the main data structures used to implement configurable key size hash tables (either LRU or extendable bucket,
+:numref:`figure_figure34`, Table 25 and Table 26 detail the main data structures used to implement configurable key size hash tables (either LRU or extendable bucket,
 either with pre-computed signature or "do-sig").

-.. _pg_figure_34:
+.. _figure_figure34:

-**Figure 34 Data Structures for Configurable Key Size Hash Tables**
+.. figure:: img/figure34.*

-.. image65_png has been renamed
+   Data Structures for Configurable Key Size Hash Tables

-|figure34|

 .. _pg_table_25:

@@ -627,15 +627,17 @@ either with pre-computed signature or "do-sig").
 
+---+------------------+--------------------+------------------------------------------------------------------+


-Figure 35 and Table 27 detail the bucket search pipeline stages (either LRU or extendable bucket,
+:numref:`figure_figure35` and Table 27 detail the bucket search pipeline stages (either LRU or extendable bucket,
 either with pre-computed signature or "do-sig").
 For each pipeline stage, the described operations are applied to each of the two packets handled by that stage.

-.. _pg_figure_35:
+.. _figure_figure35:
+
+.. figure:: img/figure35.*

-**Figure 35 Bucket Search Pipeline for Key Lookup Operation (Configurable Key Size Hash Tables)**
+   Bucket Search Pipeline for Key Lookup Operation (Configurable Key Size Hash
+   Tables)

-|figure35|

 .. _pg_table_27:

@@ -814,11 +816,8 @@ Given the input *mask*, the values for *match*, *match_many* and *match_pos* can
 |            |                                          |                   |
 +------------+------------------------------------------+-------------------+

-The pseudo-code is displayed in Figure 36.
-
-.. _pg_figure_36:

-**Figure 36 Pseudo-code for match, match_many and match_pos**
+The pseudo-code for match, match_many and match_pos is::

     match = (0xFFFELLU >> mask) & 1;

@@ -829,24 +828,22 @@ The pseudo-code is displayed in Figure 36.
 Single Key Size Hash Tables
 """""""""""""""""""""""""""

-Figure 37, Figure 38, Table 30 and 31 detail the main data structures used to implement 8-byte and 16-byte key hash tables
+:numref:`figure_figure37`, :numref:`figure_figure38`, Table 30 and 31 detail the main data structures used to implement 8-byte and 16-byte key hash tables
 (either LRU or extendable bucket, either with pre-computed signature or "do-sig").

-.. _pg_figure_37:
+.. _figure_figure37:

-**Figure 37 Data Structures for 8-byte Key Hash Tables**
+.. figure:: img/figure37.*

-.. image66_png has been renamed
+   Data Structures for 8-byte Key Hash Tables

-|figure37|

-.. _pg_figure_38:
+.. _figure_figure38:

-**Figure 38 Data Structures for 16-byte Key Hash Tables**
+.. figure:: img/figure38.*

-.. image67_png has been renamed
+   Data Structures for 16-byte Key Hash Tables

-|figure38|

 .. _pg_table_30:

@@ -914,11 +911,13 @@ and detail the bucket search pipeline used to implement 8-byte and 16-byte key h
 either with pre-computed signature or "do-sig").
 For each pipeline stage, the described operations are applied to each of the two packets handled by that stage.

-.. _pg_figure_39:
+.. _figure_figure39:

-**Figure 39 Bucket Search Pipeline for Key Lookup Operation (Single Key Size Hash Tables)**
+.. figure:: img/figure39.*
+
+   Bucket Search Pipeline for Key Lookup Operation (Single Key Size Hash
+   Tables)

-|figure39|

 .. _pg_table_32:

@@ -1167,17 +1166,3 @@ Usually, to support a specific functional block, specific implementation of Pack
 with all the implementations sharing the same API: pure SW implementation (no acceleration), implementation using accelerator A, implementation using accelerator B, etc.
 The selection between these implementations could be done at build time or at run-time (recommended), based on which accelerators are present in the system,
 with no application changes required.
-
-.. |figure33| image:: img/figure33.*
-
-.. |figure35| image:: img/figure35.*
-
-.. |figure39| image:: img/figure39.*
-
-.. |figure34| image:: img/figure34.*
-
-.. |figure32| image:: img/figure32.*
-
-.. |figure37| image:: img/figure37.*
-
-.. |figure38| image:: img/figure38.*
diff --git a/doc/guides/prog_guide/qos_framework.rst b/doc/guides/prog_guide/qos_framework.rst
index b609841..59f7fb3 100644
--- a/doc/guides/prog_guide/qos_framework.rst
+++ b/doc/guides/prog_guide/qos_framework.rst
@@ -38,13 +38,12 @@ Packet Pipeline with QoS Support

 An example of a complex packet processing pipeline with QoS support is shown in the following figure.

-.. _pg_figure_21:
+.. _figure_pkt_proc_pipeline_qos:

-**Figure 21. Complex Packet Processing Pipeline with QoS Support**
+.. figure:: img/pkt_proc_pipeline_qos.*

-.. image47_png has been renamed
+   Complex Packet Processing Pipeline with QoS Support

-|pkt_proc_pipeline_qos|

 This pipeline can be built using reusable DPDK software libraries.
 The main blocks implementing QoS in this pipeline are: the policer, the dropper and the scheduler.
@@ -139,13 +138,12 @@ It typically acts like a buffer that is able to temporarily store a large number
 as the NIC TX is requesting more packets for transmission,
 these packets are later on removed and handed over to the NIC TX with the packet selection logic observing the predefined SLAs (dequeue operation).

-.. _pg_figure_22:
+.. _figure_hier_sched_blk:

-**Figure 22. Hierarchical Scheduler Block Internal Diagram**
+.. figure:: img/hier_sched_blk.*

-.. image48_png has been renamed
+   Hierarchical Scheduler Block Internal Diagram

-|hier_sched_blk|

 The hierarchical scheduler is optimized for a large number of packet queues.
 When only a small number of queues are needed, message passing queues should be used instead of this block.
@@ -154,7 +152,7 @@ See Section 26.2.5 "Worst Case Scenarios for Performance" for a more detailed di
 Scheduling Hierarchy
 ~~~~~~~~~~~~~~~~~~~~

-The scheduling hierarchy is shown in Figure 23.
+The scheduling hierarchy is shown in :numref:`figure_sched_hier_per_port`.
 The first level of the hierarchy is the Ethernet TX port 1/10/40 GbE,
 with subsequent hierarchy levels defined as subport, pipe, traffic class and queue.

@@ -163,13 +161,12 @@ Each traffic class is the representation of a different traffic type with specif
 delay and jitter requirements, such as voice, video or data transfers.
 Each queue hosts packets from one or multiple connections of the same type belonging to the same user.

-.. _pg_figure_23:
+.. _figure_sched_hier_per_port:

-**Figure 23. Scheduling Hierarchy per Port**
+.. figure:: img/sched_hier_per_port.*

-.. image49_png has been renamed
+   Scheduling Hierarchy per Port

-|sched_hier_per_port|

 The functionality of each hierarchical level is detailed in the following table.

@@ -293,13 +290,12 @@ Internal Data Structures per Port

 A schematic of the internal data structures in shown in with details in.

-.. _pg_figure_24:
+.. _figure_data_struct_per_port:

-**Figure 24. Internal Data Structures per Port**
+.. figure:: img/data_struct_per_port.*

-.. image50_png has been renamed
+   Internal Data Structures per Port

-|data_struct_per_port|

 .. _pg_table_4:

@@ -434,16 +430,15 @@ the processor should not attempt to access the data structure currently under pr
 The only other work available is to execute different stages of the enqueue sequence of operations on other input packets,
 thus resulting in a pipelined implementation for the enqueue operation.

-Figure 25 illustrates a pipelined implementation for the enqueue operation with 4 pipeline stages and each stage executing 2 different input packets.
+:numref:`figure_prefetch_pipeline` illustrates a pipelined implementation for the enqueue operation with 4 pipeline stages and each stage executing 2 different input packets.
 No input packet can be part of more than one pipeline stage at a given time.

-.. _pg_figure_25:
+.. _figure_prefetch_pipeline:

-**Figure 25. Prefetch Pipeline for the Hierarchical Scheduler Enqueue 
Operation**
+.. figure:: img/prefetch_pipeline.*

-.. image51 has been renamed
+   Prefetch Pipeline for the Hierarchical Scheduler Enqueue Operation

-|prefetch_pipeline|

 The congestion management scheme implemented by the enqueue pipeline described above is very basic:
 packets are enqueued until a specific queue becomes full,
@@ -478,13 +473,13 @@ The dequeue pipe state machine exploits the data presence into the processor cac
 therefore it tries to send as many packets from the same pipe TC and pipe as possible (up to the available packets and credits) before
 moving to the next active TC from the same pipe (if any) or to another active pipe.

-.. _pg_figure_26:
+.. _figure_pipe_prefetch_sm:

-**Figure 26. Pipe Prefetch State Machine for the Hierarchical Scheduler 
Dequeue Operation**
+.. figure:: img/pipe_prefetch_sm.*

-.. image52 has been renamed
+   Pipe Prefetch State Machine for the Hierarchical Scheduler Dequeue
+   Operation

-|pipe_prefetch_sm|

 Timing and Synchronization
 ^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -1173,17 +1168,16 @@ Dropper
 The purpose of the DPDK dropper is to drop packets arriving at a packet scheduler to avoid congestion.
 The dropper supports the Random Early Detection (RED),
 Weighted Random Early Detection (WRED) and tail drop algorithms.
-Figure 1 illustrates how the dropper integrates with the scheduler.
+:numref:`figure_blk_diag_dropper` illustrates how the dropper integrates with the scheduler.
 The DPDK currently does not support congestion management
 so the dropper provides the only method for congestion avoidance.

-.. _pg_figure_27:
+.. _figure_blk_diag_dropper:

-**Figure 27. High-level Block Diagram of the DPDK Dropper**
+.. figure:: img/blk_diag_dropper.*

-.. image53_png has been renamed
+   High-level Block Diagram of the DPDK Dropper

-|blk_diag_dropper|

 The dropper uses the Random Early Detection (RED) congestion avoidance algorithm as documented in the reference publication.
 The purpose of the RED algorithm is to monitor a packet queue,
@@ -1202,16 +1196,15 @@ In the case of severe congestion, the dropper resorts to tail drop.
 This occurs when a packet queue has reached maximum capacity and cannot store any more packets.
 In this situation, all arriving packets are dropped.

-The flow through the dropper is illustrated in Figure 28.
+The flow through the dropper is illustrated in :numref:`figure_flow_tru_droppper`.
 The RED/WRED algorithm is exercised first and tail drop second.

-.. _pg_figure_28:
+.. _figure_flow_tru_droppper:

-**Figure 28. Flow Through the Dropper**
+.. figure:: img/flow_tru_droppper.*

-..  image54_png has been renamed
+   Flow Through the Dropper

-|flow_tru_droppper|

 The use cases supported by the dropper are:

@@ -1270,17 +1263,16 @@ for example, a filter weight parameter value of 9 corresponds to a filter weight
 Enqueue Operation
 ~~~~~~~~~~~~~~~~~

-In the example shown in Figure 29, q (actual queue size) is the input value,
+In the example shown in :numref:`figure_ex_data_flow_tru_dropper`, q (actual queue size) is the input value,
 avg (average queue size) and count (number of packets since the last drop) are run-time values,
 decision is the output value and the remaining values are configuration parameters.

-.. _pg_figure_29:
+.. _figure_ex_data_flow_tru_dropper:

-**Figure 29. Example Data Flow Through Dropper**
+.. figure:: img/ex_data_flow_tru_dropper.*

-.. image55_png has been renamed
+   Example Data Flow Through Dropper

-|ex_data_flow_tru_dropper|

 EWMA Filter Microblock
 ^^^^^^^^^^^^^^^^^^^^^^
@@ -1298,11 +1290,7 @@ Average Queue Size Calculation when the Queue is not Empty

 The definition of the EWMA filter is given in the following equation.

-**Equation 1.**
-
-.. image56_png has been renamed
-
-|ewma_filter_eq_1|
+.. image:: img/ewma_filter_eq_1.*

 Where:

@@ -1326,11 +1314,7 @@ When the queue becomes empty, average queue size should decay gradually to zero
 or remaining stagnant at the last computed value.
 When a packet is enqueued on an empty queue, the average queue size is computed using the following formula:

-**Equation 2.**
-
-.. image57_png has been renamed
-
-|ewma_filter_eq_2|
+.. image:: img/ewma_filter_eq_2.*

 Where:

@@ -1338,9 +1322,7 @@ Where:

 In the dropper module, *m* is defined as:

-.. image58_png has been renamed
-
-|m_definition|
+.. image:: img/m_definition.*

 Where:

@@ -1374,15 +1356,13 @@ A numerical method is used to compute the factor (1-wq)^m that appears in Equati

 This method is based on the following identity:

-.. image59_png has been renamed
+.. image:: img/eq2_factor.*

-|eq2_factor|

 This allows us to express the following:

-.. image60_png has been renamed
+.. image:: img/eq2_expression.*

-|eq2_expression|

 In the dropper module, a look-up table is used to compute log2(1-wq) for each value of wq supported by the dropper module.
 The factor (1-wq)^m can then be obtained by multiplying the table value by *m* and applying shift operations.
@@ -1465,11 +1445,7 @@ Initial Packet Drop Probability

 The initial drop probability is calculated using the following equation.

-**Equation 3.**
-
-.. image61_png has been renamed
-
-|drop_probability_eq3|
+.. image:: img/drop_probability_eq3.*

 Where:

@@ -1481,19 +1457,18 @@ Where:

 *   *maxth*  = maximum threshold

-The calculation of the packet drop probability using Equation 3 is illustrated in Figure 30.
+The calculation of the packet drop probability using Equation 3 is illustrated in :numref:`figure_pkt_drop_probability`.
 If the average queue size is below the minimum threshold, an arriving packet is enqueued.
 If the average queue size is at or above the maximum threshold, an arriving packet is dropped.
 If the average queue size is between the minimum and maximum thresholds,
 a drop probability is calculated to determine if the packet should be enqueued or dropped.

-.. _pg_figure_30:
+.. _figure_pkt_drop_probability:

-**Figure 30. Packet Drop Probability for a Given RED Configuration**
+.. figure:: img/pkt_drop_probability.*

-.. image62_png has been renamed
+   Packet Drop Probability for a Given RED Configuration

-|pkt_drop_probability|

 Actual Drop Probability
 """""""""""""""""""""""
@@ -1501,11 +1476,7 @@ Actual Drop Probability
 If the average queue size is between the minimum and maximum thresholds,
 then the actual drop probability is calculated from the following equation.

-**Equation 4.**
-
-.. image63_png has been renamed
-
-|drop_probability_eq4|
+.. image:: img/drop_probability_eq4.*

 Where:

@@ -1518,7 +1489,7 @@ given in the reference document where a value of 1 is used instead.
 It should be noted that the value pa computed from can be negative or greater than 1.
 If this is the case, then a value of 1 should be used instead.

-The initial and actual drop probabilities are shown in Figure 31.
+The initial and actual drop probabilities are shown in :numref:`figure_drop_probability_graph`.
 The actual drop probability is shown for the case where
 the formula given in the reference document1 is used (blue curve)
 and also for the case where the formula implemented in the dropper module,
@@ -1528,13 +1499,13 @@ compared to the mark probability configuration parameter specified by the user.
 The choice to deviate from the reference document is simply a design decision and
 one that has been taken by other RED implementations, for example, FreeBSD* ALTQ RED.

-.. _pg_figure_31:
+.. _figure_drop_probability_graph:

-**Figure 31. Initial Drop Probability (pb), Actual Drop probability (pa) Computed Using a Factor 1 (Blue Curve) and a Factor 2 (Red Curve)**
+.. figure:: img/drop_probability_graph.*

-.. image64_png has been renamed
+   Initial Drop Probability (pb), Actual Drop probability (pa) Computed Using
+   a Factor 1 (Blue Curve) and a Factor 2 (Red Curve)

-|drop_probability_graph|

 .. _Queue_Empty_Operation:

@@ -1727,39 +1698,3 @@ For each input packet, the steps for the srTCM / trTCM algorithms are:
     the input color of the packet is also considered.
     When the output color is not red, a number of tokens equal to the length of the IP packet are
     subtracted from the C or E /P or both buckets, depending on the algorithm and the output color of the packet.
-
-.. |flow_tru_droppper| image:: img/flow_tru_droppper.*
-
-.. |drop_probability_graph| image:: img/drop_probability_graph.*
-
-.. |drop_probability_eq3| image:: img/drop_probability_eq3.*
-
-.. |eq2_expression| image:: img/eq2_expression.*
-
-.. |drop_probability_eq4| image:: img/drop_probability_eq4.*
-
-.. |pkt_drop_probability| image:: img/pkt_drop_probability.*
-
-.. |pkt_proc_pipeline_qos| image:: img/pkt_proc_pipeline_qos.*
-
-.. |ex_data_flow_tru_dropper| image:: img/ex_data_flow_tru_dropper.*
-
-.. |ewma_filter_eq_1| image:: img/ewma_filter_eq_1.*
-
-.. |ewma_filter_eq_2| image:: img/ewma_filter_eq_2.*
-
-.. |data_struct_per_port| image:: img/data_struct_per_port.*
-
-.. |prefetch_pipeline| image:: img/prefetch_pipeline.*
-
-.. |pipe_prefetch_sm| image:: img/pipe_prefetch_sm.*
-
-.. |blk_diag_dropper| image:: img/blk_diag_dropper.*
-
-.. |m_definition| image:: img/m_definition.*
-
-.. |eq2_factor| image:: img/eq2_factor.*
-
-.. |sched_hier_per_port| image:: img/sched_hier_per_port.*
-
-.. |hier_sched_blk| image:: img/hier_sched_blk.*
diff --git a/doc/guides/prog_guide/ring_lib.rst b/doc/guides/prog_guide/ring_lib.rst
index 8547b38..3b92a8f 100644
--- a/doc/guides/prog_guide/ring_lib.rst
+++ b/doc/guides/prog_guide/ring_lib.rst
@@ -72,13 +72,12 @@ The disadvantages:

 A simplified representation of a Ring is shown in the figure below, with consumer and producer head and tail pointers to objects stored in the data structure.

-.. _pg_figure_4:
+.. _figure_ring1:

-**Figure 4. Ring Structure**
+.. figure:: img/ring1.*

-.. image5_png has been replaced
+   Ring Structure

-|ring1|

 References for Ring Implementation in FreeBSD*
 ----------------------------------------------
@@ -155,9 +154,13 @@ The prod_next local variable points to the next element of the table, or several

 If there is not enough room in the ring (this is detected by checking cons_tail), it returns an error.

-.. image6_png has been replaced

-|ring-enqueue1|
+.. _figure_ring-enqueue1:
+
+.. figure:: img/ring-enqueue1.*
+
+   Enqueue first step
+

 Enqueue Second Step
 ^^^^^^^^^^^^^^^^^^^
@@ -166,9 +169,13 @@ The second step is to modify *ring->prod_head* in ring structure to point to the

 A pointer to the added object is copied in the ring (obj4).

-.. image7_png has been replaced

-|ring-enqueue2|
+.. _figure_ring-enqueue2:
+
+.. figure:: img/ring-enqueue2.*
+
+   Enqueue second step
+

 Enqueue Last Step
 ^^^^^^^^^^^^^^^^^
@@ -176,9 +183,13 @@ Enqueue Last Step
 Once the object is added in the ring, ring->prod_tail in the ring structure is modified to point to the same location as *ring->prod_head*.
 The enqueue operation is finished.

-.. image8_png has been replaced

-|ring-enqueue3|
+.. _figure_ring-enqueue3:
+
+.. figure:: img/ring-enqueue3.*
+
+   Enqueue last step
+
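
The three enqueue steps above can be summarised in C as a sketch like the
following (illustrative struct and names, not the rte_ring definitions;
single producer, one object, power-of-two ring size):

.. code-block:: c

    #include <stdint.h>

    struct ex_ring {
        volatile uint32_t prod_head, prod_tail;
        volatile uint32_t cons_head, cons_tail;
        uint32_t mask;       /* ring size - 1 */
        void *objs[];        /* object pointer storage */
    };

    static int
    sp_enqueue_one(struct ex_ring *r, void *obj)
    {
        uint32_t head = r->prod_head;                /* step 1: snapshot */
        uint32_t free_entries = r->mask + r->cons_tail - head;

        if (free_entries == 0)
            return -1;                               /* not enough room */

        r->prod_head = head + 1;                     /* step 2: move head */
        r->objs[head & r->mask] = obj;               /* copy the pointer  */

        r->prod_tail = head + 1;                     /* step 3: publish   */
        return 0;
    }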

 Single Consumer Dequeue
 ~~~~~~~~~~~~~~~~~~~~~~~
@@ -196,9 +207,13 @@ The cons_next local variable points to the next element of the table, or several

 If there are not enough objects in the ring (this is detected by checking prod_tail), it returns an error.

-.. image9_png has been replaced

-|ring-dequeue1|
+.. _figure_ring-dequeue1:
+
+.. figure:: img/ring-dequeue1.*
+
+   Dequeue first step
+

 Dequeue Second Step
 ^^^^^^^^^^^^^^^^^^^
@@ -207,9 +222,13 @@ The second step is to modify ring->cons_head in the ring structure to point to t

 The pointer to the dequeued object (obj1) is copied in the pointer given by the user.

-.. image10_png has been replaced

-|ring-dequeue2|
+.. _figure_ring-dequeue2:
+
+.. figure:: img/ring-dequeue2.*
+
+   Dequeue second step
+

 Dequeue Last Step
 ^^^^^^^^^^^^^^^^^
@@ -217,9 +236,13 @@ Dequeue Last Step
 Finally, ring->cons_tail in the ring structure is modified to point to the same location as ring->cons_head.
 The dequeue operation is finished.

-.. image11_png has been replaced

-|ring-dequeue3|
+.. _figure_ring-dequeue3:
+
+.. figure:: img/ring-dequeue3.*
+
+   Dequeue last step
+
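
The dequeue side mirrors the enqueue sketch above (same illustrative
struct ex_ring; single consumer, one object):

.. code-block:: c

    static int
    sc_dequeue_one(struct ex_ring *r, void **obj)
    {
        uint32_t head = r->cons_head;                /* step 1: snapshot */

        if (r->prod_tail - head == 0)
            return -1;                               /* ring is empty */

        r->cons_head = head + 1;                     /* step 2: move head */
        *obj = r->objs[head & r->mask];              /* copy to the user  */

        r->cons_tail = head + 1;                     /* step 3: publish   */
        return 0;
    }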

 Multiple Producers Enqueue
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -229,8 +252,8 @@ In this example, only the producer head and tail (prod_head and prod_tail) are m

 The initial state is to have a prod_head and prod_tail pointing at the same location.

-MC Enqueue First Step
-^^^^^^^^^^^^^^^^^^^^^
+Multiple Producer Enqueue First Step
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 On both cores, *ring->prod_head* and ring->cons_tail are copied in local variables.
 The prod_next local variable points to the next element of the table,
@@ -238,12 +261,16 @@ or several elements after in the case of bulk enqueue.

 If there is not enough room in the ring (this is detected by checking cons_tail), it returns an error.

-.. image12_png has been replaced

-|ring-mp-enqueue1|
+.. _figure_ring-mp-enqueue1:
+
+.. figure:: img/ring-mp-enqueue1.*
+
+   Multiple producer enqueue first step
+

-MC Enqueue Second Step
-^^^^^^^^^^^^^^^^^^^^^^
+Multiple Producer Enqueue Second Step
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 The second step is to modify ring->prod_head in the ring structure to point to the same location as prod_next.
 This operation is done using a Compare And Swap (CAS) instruction, which does the following operations atomically:
@@ -256,41 +283,57 @@ This operation is done using a Compare And Swap (CAS) instruction, which does th

 In the figure, the operation succeeded on core 1, and step one restarted on core 2.

-.. image13_png has been replaced

-|ring-mp-enqueue2|
+.. _figure_ring-mp-enqueue2:

-MC Enqueue Third Step
-^^^^^^^^^^^^^^^^^^^^^
+.. figure:: img/ring-mp-enqueue2.*
+
+   Multiple producer enqueue second step
+
+
+Multiple Producer Enqueue Third Step
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 The CAS operation is retried on core 2 with success.

 Core 1 updates one element of the ring (obj4), and core 2 updates another one (obj5).

-.. image14_png has been replaced

-|ring-mp-enqueue3|
+.. _figure_ring-mp-enqueue3:
+
+.. figure:: img/ring-mp-enqueue3.*
+
+   Multiple producer enqueue third step
+

-MC Enqueue Fourth Step
-^^^^^^^^^^^^^^^^^^^^^^
+Multiple Producer Enqueue Fourth Step
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 Each core now wants to update ring->prod_tail.
 A core can only update it if ring->prod_tail is equal to the prod_head local variable.
 This is only true on core 1. The operation is finished on core 1.

-.. image15_png has been replaced

-|ring-mp-enqueue4|
+.. _figure_ring-mp-enqueue4:

-MC Enqueue Last Step
-^^^^^^^^^^^^^^^^^^^^
+.. figure:: img/ring-mp-enqueue4.*
+
+   Multiple producer enqueue fourth step
+
+
+Multiple Producer Enqueue Last Step
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 Once ring->prod_tail is updated by core 1, core 2 is allowed to update it too.
 The operation is also finished on core 2.

-.. image16_png has been replaced

-|ring-mp-enqueue5|
+.. _figure_ring-mp-enqueue5:
+
+.. figure:: img/ring-mp-enqueue5.*
+
+   Multiple producer enqueue last step
+
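
The multi-producer steps above amount to the following sketch (again
illustrative, using the earlier struct ex_ring and GCC's
__sync_bool_compare_and_swap() builtin as the CAS):

.. code-block:: c

    static void
    mp_enqueue_one(struct ex_ring *r, void *obj)
    {
        uint32_t head, next;

        do {                     /* steps 1-3: retry the CAS on failure */
            head = r->prod_head;
            next = head + 1;
            /* room check against cons_tail omitted for brevity */
        } while (!__sync_bool_compare_and_swap(&r->prod_head, head, next));

        r->objs[head & r->mask] = obj;               /* write our slot */

        while (r->prod_tail != head)                 /* steps 4-5: wait for */
            ;                                        /* earlier producers   */
        r->prod_tail = next;                         /* then publish        */
    }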

 Modulo 32-bit Indexes
 ~~~~~~~~~~~~~~~~~~~~~
@@ -309,15 +352,23 @@ The following are two examples that help to explain how indexes are used in a ri
     In addition, the four indexes are defined as unsigned 16-bit integers,
     as opposed to unsigned 32-bit integers in the more realistic case.

-.. image17_png has been replaced

-|ring-modulo1|
+.. _figure_ring-modulo1:
+
+.. figure:: img/ring-modulo1.*
+
+   Modulo 32-bit indexes - Example 1
+

 This ring contains 11000 entries.

-.. image18_png has been replaced

-|ring-modulo2|
+.. _figure_ring-modulo2:
+
+.. figure:: img/ring-modulo2.*
+
+   Modulo 32-bit indexes - Example 2
+

 This ring contains 12536 entries.
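
The wrap-around behaviour can be checked with a few lines of C (illustrative
values only, not the exact indexes drawn in the figures):

.. code-block:: c

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* The producer index has wrapped past 2^16, the consumer
         * index has not. */
        uint16_t prod_tail = 1536;
        uint16_t cons_tail = 54536;

        /* Unsigned modulo arithmetic still gives the correct count:
         * (1536 - 54536) mod 65536 = 12536. */
        uint16_t used = prod_tail - cons_tail;

        printf("entries in ring: %u\n", used);
        return 0;
    }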

@@ -346,31 +397,3 @@ References
    *   `bufring.c in FreeBSD <http://svn.freebsd.org/viewvc/base/release/8.0.0/sys/kern/subr_bufring.c?revision=199625&view=markup>`_ (version 8)

     *   `Linux Lockless Ring Buffer Design <http://lwn.net/Articles/340400/>`_
-
-.. |ring1| image:: img/ring1.*
-
-.. |ring-enqueue1| image:: img/ring-enqueue1.*
-
-.. |ring-enqueue2| image:: img/ring-enqueue2.*
-
-.. |ring-enqueue3| image:: img/ring-enqueue3.*
-
-.. |ring-dequeue1| image:: img/ring-dequeue1.*
-
-.. |ring-dequeue2| image:: img/ring-dequeue2.*
-
-.. |ring-dequeue3| image:: img/ring-dequeue3.*
-
-.. |ring-mp-enqueue1| image:: img/ring-mp-enqueue1.*
-
-.. |ring-mp-enqueue2| image:: img/ring-mp-enqueue2.*
-
-.. |ring-mp-enqueue3| image:: img/ring-mp-enqueue3.*
-
-.. |ring-mp-enqueue4| image:: img/ring-mp-enqueue4.*
-
-.. |ring-mp-enqueue5| image:: img/ring-mp-enqueue5.*
-
-.. |ring-modulo1| image:: img/ring-modulo1.*
-
-.. |ring-modulo2| image:: img/ring-modulo2.*
diff --git a/doc/guides/sample_app_ug/dist_app.rst b/doc/guides/sample_app_ug/dist_app.rst
index bcff0dd..25844e0 100644
--- a/doc/guides/sample_app_ug/dist_app.rst
+++ b/doc/guides/sample_app_ug/dist_app.rst
@@ -47,11 +47,12 @@ into each other.
 This application can be used to benchmark performance using the traffic
 generator as shown in the figure below.

-.. _figure_22:
+.. _figure_dist_perf:

-**Figure 22. Performance Benchmarking Setup (Basic Environment)**
+.. figure:: img/dist_perf.*
+
+   Performance Benchmarking Setup (Basic Environment)

-|dist_perf|

 Compiling the Application
 -------------------------
@@ -106,7 +107,7 @@ Explanation
 The distributor application consists of three types of threads: a receive thread (lcore_rx()), a set of worker threads (lcore_worker())
 and a transmit thread (lcore_tx()). How these threads work together is shown
-in Fig2 below. The main() function launches  threads of these three types.
+in :numref:`figure_dist_app` below. The main() function launches threads of these three types.
 Each thread has a while loop which will be doing processing and which is
 terminated only upon SIGINT or ctrl+C. The receive and transmit threads
 communicate using a software ring (rte_ring structure).
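
The hand-off between these threads reduces to a sketch like the following
(illustrative names; quit stands in for the flag set by the SIGINT handler):

.. code-block:: c

    #include <rte_ring.h>

    static volatile int quit;    /* set by the SIGINT/ctrl+C handler */

    /* Transmit-side loop: drain mbufs that the rx thread enqueued. */
    static int
    tx_loop(struct rte_ring *rx_to_tx_ring)
    {
        void *m;

        while (!quit) {
            if (rte_ring_dequeue(rx_to_tx_ring, &m) == 0) {
                /* ... transmit the mbuf on the tx port ... */
            }
        }
        return 0;
    }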
@@ -136,11 +137,12 @@ Users who wish to terminate the running of the application have to press ctrl+C
 in the application will terminate all running threads gracefully and print
 final statistics to the user.

-.. _figure_23:
+.. _figure_dist_app:
+
+.. figure:: img/dist_app.*

-**Figure 23. Distributor Sample Application Layout**
+   Distributor Sample Application Layout

-|dist_app|

 Debug Logging Support
 ---------------------
@@ -171,7 +173,3 @@ Sample Application. See Section 9.4.4, "RX Queue Initialization".

 TX queue initialization is done in the same way as it is done in the L2 Forwarding
 Sample Application. See Section 9.4.5, "TX Queue Initialization".
-
-.. |dist_perf| image:: img/dist_perf.*
-
-.. |dist_app| image:: img/dist_app.*
diff --git a/doc/guides/sample_app_ug/exception_path.rst b/doc/guides/sample_app_ug/exception_path.rst
index 6c06959..3cc7cbe 100644
--- a/doc/guides/sample_app_ug/exception_path.rst
+++ b/doc/guides/sample_app_ug/exception_path.rst
@@ -46,13 +46,12 @@ The second thread reads from a TAP interface and writes the data unmodified to t

 The packet flow through the exception path application is as shown in the following figure.

-.. _figure_1:
+.. _figure_exception_path_example:

-**Figure 1. Packet Flow**
+.. figure:: img/exception_path_example.*

-.. image2_png has been replaced
+   Packet Flow

-|exception_path_example|

 To make throughput measurements, kernel bridges must be set up to forward data between the bridges appropriately.

@@ -327,4 +326,3 @@ To remove bridges and persistent TAP interfaces, the following commands are used
     brctl delbr br0
     openvpn --rmtun --dev tap_dpdk_00

-.. |exception_path_example| image:: img/exception_path_example.*
diff --git a/doc/guides/sample_app_ug/index.rst b/doc/guides/sample_app_ug/index.rst
index aaa95ef..745a7ac 100644
--- a/doc/guides/sample_app_ug/index.rst
+++ b/doc/guides/sample_app_ug/index.rst
@@ -74,57 +74,63 @@ Sample Applications User Guide

 **Figures**

-:ref:`Figure 1.Packet Flow <figure_1>`
+:numref:`figure_exception_path_example` :ref:`figure_exception_path_example`

-:ref:`Figure 2.Kernel NIC Application Packet Flow <figure_2>`
+:numref:`figure_kernel_nic` :ref:`figure_kernel_nic`

-:ref:`Figure 3.Performance Benchmark Setup (Basic Environment) <figure_3>`
+:numref:`figure_l2_fwd_benchmark_setup_jobstats` :ref:`figure_l2_fwd_benchmark_setup_jobstats`

-:ref:`Figure 4.Performance Benchmark Setup (Virtualized Environment) <figure_4>`
+:numref:`figure_l2_fwd_virtenv_benchmark_setup_jobstats` :ref:`figure_l2_fwd_virtenv_benchmark_setup_jobstats`

-:ref:`Figure 5.Load Balancer Application Architecture <figure_5>`
+:numref:`figure_l2_fwd_benchmark_setup` :ref:`figure_l2_fwd_benchmark_setup`

-:ref:`Figure 5.Example Rules File <figure_5_1>`
+:numref:`figure_l2_fwd_virtenv_benchmark_setup` :ref:`figure_l2_fwd_virtenv_benchmark_setup`

-:ref:`Figure 6.Example Data Flow in a Symmetric Multi-process Application <figure_6>`
+:numref:`figure_ipv4_acl_rule` :ref:`figure_ipv4_acl_rule`

-:ref:`Figure 7.Example Data Flow in a Client-Server Symmetric Multi-process Application <figure_7>`
+:numref:`figure_example_rules` :ref:`figure_example_rules`

-:ref:`Figure 8.Master-slave Process Workflow <figure_8>`
+:numref:`figure_load_bal_app_arch` :ref:`figure_load_bal_app_arch`

-:ref:`Figure 9.Slave Process Recovery Process Flow <figure_9>`
+:numref:`figure_sym_multi_proc_app` :ref:`figure_sym_multi_proc_app`

-:ref:`Figure 10.QoS Scheduler Application Architecture <figure_10>`
+:numref:`figure_client_svr_sym_multi_proc_app` :ref:`figure_client_svr_sym_multi_proc_app`

-:ref:`Figure 11.Intel® QuickAssist Technology Application Block Diagram <figure_11>`
+:numref:`figure_master_slave_proc` :ref:`figure_master_slave_proc`

-:ref:`Figure 12.Pipeline Overview <figure_12>`
+:numref:`figure_slave_proc_recov` :ref:`figure_slave_proc_recov`

-:ref:`Figure 13.Ring-based Processing Pipeline Performance Setup <figure_13>`
+:numref:`figure_qos_sched_app_arch` :ref:`figure_qos_sched_app_arch`

-:ref:`Figure 14.Threads and Pipelines <figure_14>`
+:numref:`figure_quickassist_block_diagram` :ref:`figure_quickassist_block_diagram`

-:ref:`Figure 15.Packet Flow Through the VMDQ and DCB Sample Application <figure_15>`
+:numref:`figure_pipeline_overview` :ref:`figure_pipeline_overview`

-:ref:`Figure 16.QEMU Virtio-net (prior to vhost-net) <figure_16>`
+:numref:`figure_ring_pipeline_perf_setup` :ref:`figure_ring_pipeline_perf_setup`

-:ref:`Figure 17.Virtio with Linux* Kernel Vhost <figure_17>`
+:numref:`figure_threads_pipelines` :ref:`figure_threads_pipelines`

-:ref:`Figure 18.Vhost-net Architectural Overview <figure_18>`
+:numref:`figure_vmdq_dcb_example` :ref:`figure_vmdq_dcb_example`

-:ref:`Figure 19.Packet Flow Through the vhost-net Sample Application <figure_19>`
+:numref:`figure_qemu_virtio_net` :ref:`figure_qemu_virtio_net`

-:ref:`Figure 20.Packet Flow on TX in DPDK-testpmd <figure_20>`
+:numref:`figure_virtio_linux_vhost` :ref:`figure_virtio_linux_vhost`

-:ref:`Figure 21.Test Pipeline Application <figure_21>`
+:numref:`figure_vhost_net_arch` :ref:`figure_vhost_net_arch`

-:ref:`Figure 22.Performance Benchmarking Setup (Basic Environment) <figure_22>`
+:numref:`figure_vhost_net_sample_app` :ref:`figure_vhost_net_sample_app`
-:ref:`Figure 23.Distributor Sample Application Layout <figure_23>`
+:numref:`figure_tx_dpdk_testpmd` :ref:`figure_tx_dpdk_testpmd`

-:ref:`Figure 24.High level Solution <figure_24>`
+:numref:`figure_test_pipeline_app` :ref:`figure_test_pipeline_app`

-:ref:`Figure 25.VM request to scale frequency <figure_25>`
+:numref:`figure_dist_perf` :ref:`figure_dist_perf`
+
+:numref:`figure_dist_app` :ref:`figure_dist_app`
+
+:numref:`figure_vm_power_mgr_highlevel` :ref:`figure_vm_power_mgr_highlevel`
+
+:numref:`figure_vm_power_mgr_vm_request_seq` :ref:`figure_vm_power_mgr_vm_request_seq`

 **Tables**

diff --git a/doc/guides/sample_app_ug/intel_quickassist.rst b/doc/guides/sample_app_ug/intel_quickassist.rst
index 7f55282..a80d4ca 100644
--- a/doc/guides/sample_app_ug/intel_quickassist.rst
+++ b/doc/guides/sample_app_ug/intel_quickassist.rst
@@ -46,17 +46,16 @@ For this sample application, there is a dependency on either of:
 Overview
 --------

-An overview of the application is provided in Figure 11.
+An overview of the application is provided in :numref:`figure_quickassist_block_diagram`.
 For simplicity, only two NIC ports and one Intel® QuickAssist Technology device are shown in this diagram,
 although the number of NIC ports and Intel® QuickAssist Technology devices can be different.

-.. _figure_11:
+.. _figure_quickassist_block_diagram:

-**Figure 11. Intel® QuickAssist Technology Application Block Diagram**
+.. figure:: img/quickassist_block_diagram.*

-.. image14_png has been renamed
+   Intel® QuickAssist Technology Application Block Diagram

-|quickassist_block_diagram|

 The application allows the configuration of the following items:

@@ -220,5 +219,3 @@ performing AES-CBC-128 encryption with AES-XCBC-MAC-96 hash, the following setti

 Refer to the *DPDK Test Report* for more examples of traffic generator setup and the application startup command lines.
 If no errors are generated in response to the startup commands, the application is running correctly.
-
-.. |quickassist_block_diagram| image:: img/quickassist_block_diagram.*
diff --git a/doc/guides/sample_app_ug/kernel_nic_interface.rst b/doc/guides/sample_app_ug/kernel_nic_interface.rst
index d6876e2..02dde59 100644
--- a/doc/guides/sample_app_ug/kernel_nic_interface.rst
+++ b/doc/guides/sample_app_ug/kernel_nic_interface.rst
@@ -71,13 +71,12 @@ it is just for performance testing, or it can work together with VMDq support in

 The packet flow through the Kernel NIC Interface application is as shown in the following figure.

-.. _figure_2:
+.. _figure_kernel_nic:

-**Figure 2. Kernel NIC Application Packet Flow**
+.. figure:: img/kernel_nic.*

-.. image3_png has been renamed to kernel_nic.*
+   Kernel NIC Application Packet Flow

-|kernel_nic|

 Compiling the Application
 -------------------------
@@ -616,5 +615,3 @@ Currently, setting a new MTU and configuring the network interface (up/ down) ar
             RTE_LOG(ERR, APP, "Failed to start port %d\n", port_id);
         return ret;
     }
-
-.. |kernel_nic| image:: img/kernel_nic.*
diff --git a/doc/guides/sample_app_ug/l2_forward_job_stats.rst b/doc/guides/sample_app_ug/l2_forward_job_stats.rst
index eafb8df..b588faa 100644
--- a/doc/guides/sample_app_ug/l2_forward_job_stats.rst
+++ b/doc/guides/sample_app_ug/l2_forward_job_stats.rst
@@ -55,27 +55,24 @@ Also, the MAC addresses are affected as follows:

 *   The destination MAC address is replaced by 02:00:00:00:00:TX_PORT_ID

-This application can be used to benchmark performance using a traffic-generator, as shown in the Figure 3.
+This application can be used to benchmark performance using a traffic-generator, as shown in :numref:`figure_l2_fwd_benchmark_setup_jobstats`.

-The application can also be used in a virtualized environment as shown in Figure 4.
+The application can also be used in a virtualized environment as shown in :numref:`figure_l2_fwd_virtenv_benchmark_setup_jobstats`.

 The L2 Forwarding application can also be used as a starting point for developing a new application based on the DPDK.

-.. _figure_3:
+.. _figure_l2_fwd_benchmark_setup_jobstats:

-**Figure 3. Performance Benchmark Setup (Basic Environment)**
+.. figure:: img/l2_fwd_benchmark_setup.*

-.. image4_png has been replaced
+   Performance Benchmark Setup (Basic Environment)

-|l2_fwd_benchmark_setup|
+.. _figure_l2_fwd_virtenv_benchmark_setup_jobstats:

-.. _figure_4:
+.. figure:: img/l2_fwd_virtenv_benchmark_setup.*

-**Figure 4. Performance Benchmark Setup (Virtualized Environment)**
+   Performance Benchmark Setup (Virtualized Environment)

-.. image5_png has been renamed
-
-|l2_fwd_virtenv_benchmark_setup|

 Virtual Function Setup Instructions
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -631,7 +628,3 @@ however it improves performance:
          * in which it was called. */
         rte_jobstats_finish(&qconf->flush_job, qconf->flush_job.target);
     }
-
-.. |l2_fwd_benchmark_setup| image:: img/l2_fwd_benchmark_setup.*
-
-.. |l2_fwd_virtenv_benchmark_setup| image:: img/l2_fwd_virtenv_benchmark_setup.*
diff --git a/doc/guides/sample_app_ug/l2_forward_real_virtual.rst b/doc/guides/sample_app_ug/l2_forward_real_virtual.rst
index 234d71d..9334e75 100644
--- a/doc/guides/sample_app_ug/l2_forward_real_virtual.rst
+++ b/doc/guides/sample_app_ug/l2_forward_real_virtual.rst
@@ -54,27 +54,25 @@ Also, the MAC addresses are affected as follows:

 *   The destination MAC address is replaced by 02:00:00:00:00:TX_PORT_ID

-This application can be used to benchmark performance using a traffic-generator, as shown in the Figure 3.
+This application can be used to benchmark performance using a traffic-generator, as shown in :numref:`figure_l2_fwd_benchmark_setup`.

-The application can also be used in a virtualized environment as shown in Figure 4.
+The application can also be used in a virtualized environment as shown in :numref:`figure_l2_fwd_virtenv_benchmark_setup`.

 The L2 Forwarding application can also be used as a starting point for developing a new application based on the DPDK.

-.. _figure_3:
+.. _figure_l2_fwd_benchmark_setup:

-**Figure 3. Performance Benchmark Setup (Basic Environment)**
+.. figure:: img/l2_fwd_benchmark_setup.*

-.. image4_png has been replaced
+   Performance Benchmark Setup (Basic Environment)

-|l2_fwd_benchmark_setup|

-.. _figure_4:
+.. _figure_l2_fwd_virtenv_benchmark_setup:

-**Figure 4. Performance Benchmark Setup (Virtualized Environment)**
+.. figure:: img/l2_fwd_virtenv_benchmark_setup.*

-.. image5_png has been renamed
+   Performance Benchmark Setup (Virtualized Environment)

-|l2_fwd_virtenv_benchmark_setup|

 Virtual Function Setup Instructions
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -526,7 +524,3 @@ however it improves performance:

         prev_tsc = cur_tsc;
     }
-
-.. |l2_fwd_benchmark_setup| image:: img/l2_fwd_benchmark_setup.*
-
-.. |l2_fwd_virtenv_benchmark_setup| image:: img/l2_fwd_virtenv_benchmark_setup.*
diff --git a/doc/guides/sample_app_ug/l3_forward_access_ctrl.rst b/doc/guides/sample_app_ug/l3_forward_access_ctrl.rst
index 73fa4df..dbf47c7 100644
--- a/doc/guides/sample_app_ug/l3_forward_access_ctrl.rst
+++ b/doc/guides/sample_app_ug/l3_forward_access_ctrl.rst
@@ -142,9 +142,13 @@ Other line types are considered invalid.

 *   A typical IPv4 ACL rule line should have a format as shown below:

-.. image6_png has been renamed

-|ipv4_acl_rule|
+.. _figure_ipv4_acl_rule:
+
+.. figure:: img/ipv4_acl_rule.*
+
+   A typical IPv4 ACL rule
+

 IPv4 addresses are specified in CIDR format as specified in RFC 4632.
 They consist of the dot notation for the address and a prefix length separated by '/'.
@@ -164,15 +168,12 @@ For example: 6/0xfe matches protocol values 6 and 7.
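
The mask semantics can be expressed in a line of C (an illustrative helper,
not the ACL library API):

.. code-block:: c

    #include <stdint.h>

    /* 0xfe clears the low bit, so both 6 and 7 match a rule value of 6. */
    static int
    proto_match(uint8_t pkt_proto, uint8_t rule_proto, uint8_t mask)
    {
        return (pkt_proto & mask) == (rule_proto & mask);
    }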
 Rules File Example
 ~~~~~~~~~~~~~~~~~~

-.. _figure_5_1:
+.. _figure_example_rules:

-Figure 5 is an example of a rules file. This file has three rules, one for ACL and two for route information.
+.. figure:: img/example_rules.*

-**Figure 5.Example Rules File**
+   Example Rules File

-.. image7_png has been renamed
-
-|example_rules|

 Each rule is explained as follows:

@@ -397,7 +398,3 @@ Finally, the application creates contexts handler from the ACL library,
 adds rules parsed from the file into the database and builds an ACL trie.
 It is important to note that the application creates an independent copy of each database for each socket CPU
 involved in the task to reduce the time for remote memory access.
-
-.. |ipv4_acl_rule| image:: img/ipv4_acl_rule.*
-
-.. |example_rules| image:: img/example_rules.*
diff --git a/doc/guides/sample_app_ug/load_balancer.rst b/doc/guides/sample_app_ug/load_balancer.rst
index 6237633..857eb8a 100644
--- a/doc/guides/sample_app_ug/load_balancer.rst
+++ b/doc/guides/sample_app_ug/load_balancer.rst
@@ -44,13 +44,12 @@ Overview

 The architecture of the Load Balancer application is presented in the following figure.

-.. _figure_5:
+.. _figure_load_bal_app_arch:

-**Figure 5. Load Balancer Application Architecture**
+.. figure:: img/load_bal_app_arch.*

-.. image8_png has been renamed
+   Load Balancer Application Architecture

-|load_bal_app_arch|

 For the sake of simplicity, the diagram illustrates a specific case of two I/O RX and two I/O TX lcores offloading the packet I/O
 overhead incurred by four NIC ports from four worker cores, with each I/O lcore handling RX/TX for two NIC ports.
@@ -241,5 +240,3 @@ are on the same or different CPU sockets, the following run-time scenarios are p
 #.  ABC: The packet is received on socket A, it is processed by an lcore on socket B,
     then it has to be transmitted out by a NIC connected to socket C.
     The performance price for crossing the CPU socket boundary is paid twice for this packet.
-
-.. |load_bal_app_arch| image:: img/load_bal_app_arch.*
diff --git a/doc/guides/sample_app_ug/multi_process.rst b/doc/guides/sample_app_ug/multi_process.rst
index 7ca71ca..f42cb9a 100644
--- a/doc/guides/sample_app_ug/multi_process.rst
+++ b/doc/guides/sample_app_ug/multi_process.rst
@@ -190,13 +190,12 @@ such as a client-server mode of operation seen in the next example,
 where different processes perform different tasks, yet co-operate to form a packet-processing system.)
 The following diagram shows the data-flow through the application, using two processes.

-.. _figure_6:
+.. _figure_sym_multi_proc_app:

-**Figure 6. Example Data Flow in a Symmetric Multi-process Application**
+.. figure:: img/sym_multi_proc_app.*

-.. image9_png has been renamed
+   Example Data Flow in a Symmetric Multi-process Application

-|sym_multi_proc_app|

 As the diagram shows, each process reads packets from each of the network ports in use.
 RSS is used to distribute incoming packets on each port to different hardware RX queues.
@@ -296,13 +295,12 @@ In this case, the client applications just perform level-2 forwarding of packets

 The following diagram shows the data-flow through the application, using two client processes.

-.. _figure_7:
+.. _figure_client_svr_sym_multi_proc_app:

-**Figure 7. Example Data Flow in a Client-Server Symmetric Multi-process Application**
+.. figure:: img/client_svr_sym_multi_proc_app.*

-.. image10_png has been renamed
+   Example Data Flow in a Client-Server Symmetric Multi-process Application

-|client_svr_sym_multi_proc_app|

 Running the Application
 ^^^^^^^^^^^^^^^^^^^^^^^
@@ -395,13 +393,12 @@ Once the master process begins to run, it tries to initialize all the resources
 memory, CPU cores, driver, ports, and so on, as the other examples do.
 Thereafter, it creates slave processes, as shown in the following figure.

-.. _figure_8:
+.. _figure_master_slave_proc:

-**Figure 8. Master-slave Process Workflow**
+.. figure:: img/master_slave_proc.*

-.. image11_png has been renamed
+   Master-slave Process Workflow

-|master_slave_proc|

 The master process calls the rte_eal_mp_remote_launch() EAL function to launch an application function for each pinned thread through the pipe.
 Then, it waits to check if any slave processes have exited.
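
The launch/wait pattern looks roughly as follows (a sketch only; the pipe
set-up and the recovery logic of the sample are omitted):

.. code-block:: c

    #include <rte_launch.h>
    #include <rte_lcore.h>

    static int
    slave_main(__attribute__((unused)) void *arg)
    {
        /* per-lcore application loop ... */
        return 0;
    }

    static void
    launch_and_watch(void)
    {
        unsigned int lcore_id;

        rte_eal_mp_remote_launch(slave_main, NULL, SKIP_MASTER);

        RTE_LCORE_FOREACH_SLAVE(lcore_id) {
            if (rte_eal_wait_lcore(lcore_id) < 0) {
                /* a slave exited abnormally: recovery would start here */
            }
        }
    }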
@@ -475,13 +472,12 @@ Therefore, to provide the capability to resume the new slave instance if the pre

 The following diagram describes slave process recovery.

-.. _figure_9:
+.. _figure_slave_proc_recov:

-**Figure 9. Slave Process Recovery Process Flow**
+.. figure:: img/slave_proc_recov.*

-.. image12_png has been renamed
+   Slave Process Recovery Process Flow

-|slave_proc_recov|

 Floating Process Support
 ^^^^^^^^^^^^^^^^^^^^^^^^
@@ -774,11 +770,3 @@ so it remaps the resource to the new core ID slot.
         }
         return 0;
     }
-
-.. |sym_multi_proc_app| image:: img/sym_multi_proc_app.*
-
-.. |client_svr_sym_multi_proc_app| image:: img/client_svr_sym_multi_proc_app.*
-
-.. |master_slave_proc| image:: img/master_slave_proc.*
-
-.. |slave_proc_recov| image:: img/slave_proc_recov.*
diff --git a/doc/guides/sample_app_ug/qos_scheduler.rst b/doc/guides/sample_app_ug/qos_scheduler.rst
index 56326df..66c261c 100644
--- a/doc/guides/sample_app_ug/qos_scheduler.rst
+++ b/doc/guides/sample_app_ug/qos_scheduler.rst
@@ -38,13 +38,12 @@ Overview

 The architecture of the QoS scheduler application is shown in the following figure.

-.. _figure_10:
+.. _figure_qos_sched_app_arch:

-**Figure 10. QoS Scheduler Application Architecture**
+.. figure:: img/qos_sched_app_arch.*

-.. image13_png has been renamed
+   QoS Scheduler Application Architecture

-|qos_sched_app_arch|

 There are two flavors of the runtime execution for this application,
 with two or three threads per each packet flow configuration being used.
@@ -347,5 +346,3 @@ This application classifies based on the QinQ double VLAN tags and the IP destin
 +----------------+-------------------------+--------------------------------------------------+----------------------------------+

 Please refer to the "QoS Scheduler" chapter in the *DPDK Programmer's Guide* for more information about these parameters.
-
-.. |qos_sched_app_arch| image:: img/qos_sched_app_arch.*
diff --git a/doc/guides/sample_app_ug/quota_watermark.rst b/doc/guides/sample_app_ug/quota_watermark.rst
index e091ad9..de9e118 100644
--- a/doc/guides/sample_app_ug/quota_watermark.rst
+++ b/doc/guides/sample_app_ug/quota_watermark.rst
@@ -54,15 +54,14 @@ and ports 2 and 3 forward into each other.
 The MAC addresses of the forwarded Ethernet frames are not affected.

 Internally, packets are pulled from the ports by the master logical core and put on a variable length processing pipeline,
-each stage of which being connected by rings, as shown in Figure 12.
+each stage of which is connected by rings, as shown in :numref:`figure_pipeline_overview`.

-.. _figure_12:
+.. _figure_pipeline_overview:

-**Figure 12. Pipeline Overview**
+.. figure:: img/pipeline_overview.*

-.. image15_png has been renamed
+   Pipeline Overview

-|pipeline_overview|

 An adjustable quota value controls how many packets are being moved through the pipeline per enqueue and dequeue.
 Adjustable watermark values associated with the rings control a back-off mechanism that
@@ -79,15 +78,14 @@ eventually lead to an Ethernet flow control frame being sent to the source.

 On top of serving as an example of quota and watermark usage,
 this application can be used to benchmark ring-based processing pipeline performance using a traffic generator,
-as shown in Figure 13.
+as shown in :numref:`figure_ring_pipeline_perf_setup`.

-.. _figure_13:
+.. _figure_ring_pipeline_perf_setup:

-**Figure 13. Ring-based Processing Pipeline Performance Setup**
+.. figure:: img/ring_pipeline_perf_setup.*

-.. image16_png has been renamed
+   Ring-based Processing Pipeline Performance Setup

-|ring_pipeline_perf_setup|

 Compiling the Application
 -------------------------
@@ -311,7 +309,7 @@ Logical Cores Assignment
 The application uses the master logical core to poll all the ports for new packets and enqueue them on a ring associated with the port.

 Each logical core except the last runs pipeline_stage() after a ring for each used port is initialized on that core.
-pipeline_stage() on core X dequeues packets from core X-1's rings and enqueue them on its own rings. See Figure 14.
+pipeline_stage() on core X dequeues packets from core X-1's rings and enqueues them on its own rings. See :numref:`figure_threads_pipelines`.

 .. code-block:: c

@@ -340,16 +338,12 @@ sending them out on the destination port setup by pair_ports().
 Receive, Process and Transmit Packets
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-.. _figure_14:
+.. _figure_threads_pipelines:

-Figure 14 shows where each thread in the pipeline is.
-It should be used as a reference while reading the rest of this section.
+.. figure:: img/threads_pipelines.*

-**Figure 14. Threads and Pipelines**
+   Threads and Pipelines

-.. image17_png has been renamed
-
-|threads_pipelines|

 In the receive_stage() function running on the master logical core,
 the main task is to read ingress packets from the RX ports and enqueue them
@@ -498,9 +492,3 @@ low_watermark from the rte_memzone previously created by qw.

         low_watermark = (unsigned int *) qw_memzone->addr + sizeof(int);
     }
-
-.. |pipeline_overview| image:: img/pipeline_overview.*
-
-.. |ring_pipeline_perf_setup| image:: img/ring_pipeline_perf_setup.*
-
-.. |threads_pipelines| image:: img/threads_pipelines.*
diff --git a/doc/guides/sample_app_ug/test_pipeline.rst b/doc/guides/sample_app_ug/test_pipeline.rst
index 0432942..fbc290c 100644
--- a/doc/guides/sample_app_ug/test_pipeline.rst
+++ b/doc/guides/sample_app_ug/test_pipeline.rst
@@ -49,13 +49,12 @@ The application uses three CPU cores:

 *   Core C ("TX core") receives traffic from core B through software queues and sends it to the NIC ports for transmission.

-.. _figure_21:
+.. _figure_test_pipeline_app:

-**Figure 21.Test Pipeline Application**
+.. figure:: img/test_pipeline_app.*

-.. image24_png has been renamed
+   Test Pipeline Application

-|test_pipeline_app|

 Compiling the Application
 -------------------------
@@ -281,5 +280,3 @@ The profile for input traffic is TCP/IPv4 packets with:
 *   destination TCP port fixed to 0

 *   source TCP port fixed to 0
-
-.. |test_pipeline_app| image:: img/test_pipeline_app.*
diff --git a/doc/guides/sample_app_ug/vhost.rst b/doc/guides/sample_app_ug/vhost.rst
index cd9b232..5c4b79d 100644
--- a/doc/guides/sample_app_ug/vhost.rst
+++ b/doc/guides/sample_app_ug/vhost.rst
@@ -48,13 +48,12 @@ between host and guest.
 It was found that virtio-net performance was poor due to context switching and packet copying between host, guest, and QEMU.
 The following figure shows the system architecture for virtio-based networking (virtio-net).

-.. _figure_16:
+.. _figure_qemu_virtio_net:

-**Figure16. QEMU Virtio-net (prior to vhost-net)**
+.. figure:: img/qemu_virtio_net.*

-.. image19_png has been renamed
+   System Architecture for Virtio-based Networking (virtio-net)

-|qemu_virtio_net|

 The Linux* Kernel vhost-net module was developed as an offload mechanism for virtio-net.
 The vhost-net module enables KVM (QEMU) to offload the servicing of virtio-net devices to the vhost-net kernel module,
@@ -76,13 +75,12 @@ This is achieved by QEMU sharing the following information with the vhost-net mo

 The following figure shows the system architecture for virtio-net networking with vhost-net offload.

-.. _figure_17:
+.. _figure_virtio_linux_vhost:

-**Figure 17. Virtio with Linux* Kernel Vhost**
+.. figure:: img/virtio_linux_vhost.*

-.. image20_png has been renamed
+   Virtio with Linux* Kernel Vhost

-|virtio_linux_vhost|

 Sample Code Overview
 --------------------
@@ -119,23 +117,21 @@ The vhost sample code application is a simple packet switching application with

 The following figure shows the architecture of the Vhost sample application based on vhost-cuse.

-.. _figure_18:
+.. _figure_vhost_net_arch:

-**Figure 18. Vhost-net Architectural Overview**
+.. figure:: img/vhost_net_arch.*

-.. image21_png has been renamed
+   Vhost-net Architectural Overview

-|vhost_net_arch|

 The following figure shows the flow of packets through the vhost-net sample application.

-.. _figure_19:
+.. _figure_vhost_net_sample_app:

-**Figure 19. Packet Flow Through the vhost-net Sample Application**
+.. figure:: img/vhost_net_sample_app.*

-.. image22_png  has been renamed
+   Packet Flow Through the vhost-net Sample Application

-|vhost_net_sample_app|

 Supported Distributions
 -----------------------
@@ -794,13 +790,12 @@ In the "wait and retry" mode if the virtqueue is found to be full, then testpmd
 The "wait and retry" algorithm is implemented in DPDK testpmd as a forwarding method called "mac_retry".
 The following sequence diagram describes the algorithm in detail.

-.. _figure_20:
+.. _figure_tx_dpdk_testpmd:

-**Figure 20. Packet Flow on TX in DPDK-testpmd**
+.. figure:: img/tx_dpdk_testpmd.*

-.. image23_png has been renamed
+   Packet Flow on TX in DPDK-testpmd

-|tx_dpdk_testpmd|
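
The retry idea itself is simple (a sketch only, not the mac_retry
implementation; rte_delay_us() provides the wait):

.. code-block:: c

    #include <rte_cycles.h>
    #include <rte_mbuf.h>

    static uint16_t
    tx_with_retry(uint16_t (*tx_burst)(struct rte_mbuf **, uint16_t),
                  struct rte_mbuf **pkts, uint16_t n,
                  unsigned int retries, unsigned int wait_us)
    {
        uint16_t sent = tx_burst(pkts, n);

        while (sent < n && retries-- > 0) {
            rte_delay_us(wait_us);                    /* wait ... */
            sent += tx_burst(pkts + sent, n - sent);  /* ... and retry */
        }
        return sent;
    }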

 Running Testpmd
 ~~~~~~~~~~~~~~~
@@ -861,13 +856,3 @@ For example:
 The above message indicates that device 0 has been registered with MAC address cc:bb:bb:bb:bb:bb and VLAN tag 1000.
 Any packets received on the NIC with these values are placed on the device's receive queue.
 When a virtio-net device transmits packets, the VLAN tag is added to the packet by the DPDK vhost sample code.
-
-.. |vhost_net_arch| image:: img/vhost_net_arch.*
-
-.. |qemu_virtio_net| image:: img/qemu_virtio_net.*
-
-.. |tx_dpdk_testpmd| image:: img/tx_dpdk_testpmd.*
-
-.. |vhost_net_sample_app| image:: img/vhost_net_sample_app.*
-
-.. |virtio_linux_vhost| image:: img/virtio_linux_vhost.*
diff --git a/doc/guides/sample_app_ug/vm_power_management.rst b/doc/guides/sample_app_ug/vm_power_management.rst
index 2a923d8..81db6ad 100644
--- a/doc/guides/sample_app_ug/vm_power_management.rst
+++ b/doc/guides/sample_app_ug/vm_power_management.rst
@@ -74,11 +74,12 @@ The solution is comprised of two high-level components:
    The l3fwd-power application will use this implementation when deployed on a VM
    (see Chapter 11 "L3 Forwarding with Power Management Application").

-.. _figure_24:
+.. _figure_vm_power_mgr_highlevel:

-**Figure 24. Highlevel Solution**
+.. figure:: img/vm_power_mgr_highlevel.*
+
+   High Level Solution

-|vm_power_mgr_highlevel|

 Overview
 --------
@@ -105,11 +106,12 @@ at runtime based on the environment.
 Upon receiving a request, the host translates the vCPU to a pCPU via
 the libvirt API before forwarding to the host librte_power.

-.. _figure_25:
+.. _figure_vm_power_mgr_vm_request_seq:
+
+.. figure:: img/vm_power_mgr_vm_request_seq.*

-**Figure 25. VM request to scale frequency**
+   VM request to scale frequency

-|vm_power_mgr_vm_request_seq|

 Performance Considerations
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -355,7 +357,3 @@ Where {core_num} is the lcore and channel to change frequency by scaling up/down
 .. code-block:: console

   set_cpu_freq {core_num} up|down|min|max
-
-.. |vm_power_mgr_highlevel| image:: img/vm_power_mgr_highlevel.*
-
-.. |vm_power_mgr_vm_request_seq| image:: img/vm_power_mgr_vm_request_seq.*
diff --git a/doc/guides/sample_app_ug/vmdq_dcb_forwarding.rst b/doc/guides/sample_app_ug/vmdq_dcb_forwarding.rst
index e5d34e1..49ec6ce 100644
--- a/doc/guides/sample_app_ug/vmdq_dcb_forwarding.rst
+++ b/doc/guides/sample_app_ug/vmdq_dcb_forwarding.rst
@@ -53,7 +53,7 @@ All traffic is read from a single incoming port (port 0) and output on port 1, w
 The traffic is split into 128 queues on input, where each thread of the application reads from multiple queues.
 For example, when run with 8 threads, that is, with the -c FF option, each thread receives and forwards packets from 16 queues.

-As supplied, the sample application configures the VMDQ feature to have 16 pools with 8 queues each as indicated in Figure 15.
+As supplied, the sample application configures the VMDQ feature to have 16 pools with 8 queues each as indicated in :numref:`figure_vmdq_dcb_example`.
 The Intel® 82599 10 Gigabit Ethernet Controller NIC also supports the splitting of traffic into 32 pools of 4 queues each and
 this can be used by changing the NUM_POOLS parameter in the supplied code.
 The NUM_POOLS parameter can be passed on the command line, after the EAL parameters:
@@ -64,13 +64,12 @@ The NUM_POOLS parameter can be passed on the command line, after the EAL paramet

 where, NP can be 16 or 32.

-.. _figure_15:
+.. _figure_vmdq_dcb_example:

-**Figure 15. Packet Flow Through the VMDQ and DCB Sample Application**
+.. figure:: img/vmdq_dcb_example.*

-.. image18_png has been replaced
+   Packet Flow Through the VMDQ and DCB Sample Application

-|vmdq_dcb_example|

 In Linux* user space, the application can display statistics with the number of packets received on each queue.
 To have the application display the statistics, send a SIGHUP signal to the running application process, as follows:
@@ -247,5 +246,3 @@ To generate the statistics output, use the following command:

 Please note that the statistics output will appear on the terminal where the vmdq_dcb_app is running,
 rather than the terminal from which the HUP signal was sent.
-
-.. |vmdq_dcb_example| image:: img/vmdq_dcb_example.*
diff --git a/doc/guides/xen/pkt_switch.rst b/doc/guides/xen/pkt_switch.rst
index a9eca52..3a6fc47 100644
--- a/doc/guides/xen/pkt_switch.rst
+++ b/doc/guides/xen/pkt_switch.rst
@@ -52,9 +52,13 @@ The switching back end maps those grant table references and creates shared ring

 The following diagram describes the functionality of the DPDK Xen Packet Switching Solution.

-.. image35_png has been renamed

-|dpdk_xen_pkt_switch|
+.. _figure_dpdk_xen_pkt_switch:
+
+.. figure:: img/dpdk_xen_pkt_switch.*
+
+   Functionality of the DPDK Xen Packet Switching Solution
+

 Note 1 The Xen hypervisor uses a mechanism called a Grant Table to share memory between domains
 (`http://wiki.xen.org/wiki/Grant Table <http://wiki.xen.org/wiki/Grant%20Table>`_).
@@ -62,9 +66,13 @@ Note 1 The Xen hypervisor uses a mechanism called a Grant Table to share memory
 A diagram of the design is shown below, where "gva" is the Guest Virtual Address,
 which is the data pointer of the mbuf, and "hva" is the Host Virtual Address:

-.. image36_png has been renamed

-|grant_table|
+.. _figure_grant_table:
+
+.. figure:: img/grant_table.*
+
+   DPDK Xen Layout
+

 In this design, a Virtio ring is used as a para-virtualized interface for better performance over a Xen private ring
 when packet switching to and from a VM.
@@ -139,9 +147,13 @@ Take idx#_mempool_gref node for example, the host maps those Grant references to
 The real Grant reference information is stored in this virtual address space,
 where (gref, pfn) pairs follow each other with -1 as the terminator.

-.. image37_pnng has been renamed

-|grant_refs|
+.. _figure_grant_refs:
+
+.. figure:: img/grant_refs.*
+
+   Mapping Grant references to a continuous virtual address space
+
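
The terminator convention can be sketched as follows (illustrative types,
not the actual structure definitions):

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    struct gref_pfn {
        int32_t gref;
        int32_t pfn;
    };

    static size_t
    count_grant_refs(const struct gref_pfn *tbl)
    {
        size_t n = 0;

        while (tbl[n].gref != -1)    /* -1 terminates the list */
            n++;
        return n;
    }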

 After all gref# IDs are retrieved, the host maps them to a continuous virtual address space.
 With the guest mempool virtual address, the host establishes 1:1 address mapping.
@@ -456,9 +468,3 @@ then sent out through hardware with destination MAC address 00:00:00:00:00:33.
 The packet flow is:

 packet generator->Virtio in guest VM1->switching backend->Virtio in guest VM2->switching backend->wire
-
-.. |grant_table| image:: img/grant_table.*
-
-.. |grant_refs| image:: img/grant_refs.*
-
-.. |dpdk_xen_pkt_switch| image:: img/dpdk_xen_pkt_switch.*
-- 
1.8.1.4
