Howdy all,

We are planning to migrate several LUNs on an Oracle box to a
new NetApp all-flash storage backend. We've run a few tests
ourselves to make sure we don't impact the box, and everything
has been successful so far. The server will remain up during the migration
and we are not planning on bringing down any services. I wanted to see if
others have had similar experience and wouldn't mind sharing. In particular,
does anyone see any steps that might cause impact, halt the box, or cause
paths to fail such that the storage itself becomes unavailable? These are our
current high-level steps:


   1. Zone the host to include the new HA NetApp pair *[no impact, no
   server changes, only SAN fabric additions (very safe)]*
   2. Create a volume on the destination HA NetApp pair *[no impact, no
   server changes (very safe)]*
   3. Validate the portset on NetApp to include the destination HA pair *[no
   impact, no changes, verification only (very safe)]*
   4. Add reporting nodes to LUN *[no impact, no server changes, NetApp
   additions only (very safe)]*
   5. LUN scan each HBA individually ($echo "- - -" >
   /sys/class/scsi_host/host3/scan && sleep 5 && echo "- - -" >
   /sys/class/scsi_host/host4/scan) *[Should not cause impact (generally
   safe)]*
   6. Validate that 8 new non-optimized paths now appear on the server
   ($multipath -ll) *[no impact, command does not make changes (very safe)]*
   7. Validate the new paths are secondary ($sanlun lun show -p -v) *[no
   impact, command does not make changes (very safe)]*
   8. Perform the NetApp LUN move *[no impact, no server changes (very
   safe)]*
   9. Remove the reporting nodes from LUN *[no impact, no server changes,
   NetApp deletion only (generally safe)]*
   10. Validate the 8 original paths are now failed ($multipath -ll) *[no
   impact, command does not make changes (very safe)]*
   11. Validate that Linux automatically sees 4 optimized paths among the 8
   new paths ($sanlun lun show -p -v) *[no impact, command does not make
   changes (very safe)]*
   12. Delete the failed paths (echo 1 > /sys/block/sdX/device/delete) *[Should
   not cause impact (generally safe)]*
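
For steps 10 and 12, here is a minimal sketch of how the failed paths
could be collected from multipath output and turned into delete commands
for review. The function names are made up for illustration, and the
parser assumes path lines of the form "H:C:T:L sdX major:minor state ..."
with the state written either as "failed" or as "[failed][...]"; the exact
layout depends on the multipath-tools version, so check it against real
output first.

```shell
# List the sd device names of paths that multipath reports as failed.
# Reads `multipath -ll` output on stdin.
# Assumption: path lines contain "H:C:T:L sdX major:minor <state> ...".
list_failed_paths() {
    awk '{
        for (i = 1; i < NF; i++) {
            # Locate the H:C:T:L field followed by the sd device name.
            if ($i ~ /^[0-9]+:[0-9]+:[0-9]+:[0-9]+$/ && $(i + 1) ~ /^sd[a-z]+$/) {
                # The state sits two fields after the device name.
                if ($(i + 3) ~ /^\[?failed/) print $(i + 1)
                break
            }
        }
    }'
}

# Print (do not run) one delete command per device name read on stdin,
# so the list can be reviewed before anything is removed.
print_delete_commands() {
    while read -r dev; do
        echo "echo 1 > /sys/block/${dev}/device/delete"
    done
}
```

Running "multipath -ll | list_failed_paths | print_delete_commands" would
only print the commands, which leaves a chance to compare the device list
against the sanlun output before deleting anything.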

My only concern is related to part of some Red Hat documentation I came
across [1] that states the following:

    "interconnect scanning is not recommended when the system is under
    memory pressure. To determine the level of memory pressure, run the
    command vmstat 1 100; interconnect scanning is not recommended if free
    memory is less than 5% of the total memory in more than 10 samples per
    100. It is also not recommended if swapping is active (non-zero si and
    so columns in the vmstat output). The command free can also display
    the total memory."
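
The rule quoted above can be checked mechanically. This is a rough sketch
rather than anything from the Red Hat guide: the function names are
hypothetical, and it assumes vmstat prints a header row starting with "r"
that names the free, si and so columns, with free reported in KiB.

```shell
# Apply the quoted rule to vmstat output read on stdin.
# $1 is total memory in KiB (the unit vmstat uses for "free" by default).
# Prints: "<samples> <low_free_samples> <swapping_samples>".
check_vmstat() {
    awk -v total="$1" '
        # Header row: remember which column holds each named field.
        $1 == "r" {
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        # Data rows start with a number; the banner line is skipped.
        $1 ~ /^[0-9]+$/ && ("free" in col) {
            n++
            # Low sample: free memory below 5% of total.
            if ($(col["free"]) * 100 < total * 5) low++
            if ($(col["si"]) > 0 || $(col["so"]) > 0) swap++
        }
        END { printf "%d %d %d\n", n, low, swap }
    '
}

# Succeeds (exit 0) when the rule advises against scanning:
# more than 10 low-memory samples per 100, or any swapping.
scan_is_discouraged() {
    read -r n low swap
    [ "$((low * 100))" -gt "$((n * 10))" ] || [ "$swap" -gt 0 ]
}
```

Usage would look something like:
vmstat 1 100 | check_vmstat "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)" | scan_is_discouraged && echo "hold off on the scan"
Note that the first vmstat sample is an average since boot, so it is
counted here like any other sample.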

These Oracle boxes typically have all their memory in use (note the 39G
cached):

    [root@oraspace01 ~]# free -g
                 total       used       free     shared    buffers     cached
    Mem:           188        187          1          0          0         39
    -/+ buffers/cache:        147         41
    Swap:           79          0         79

I'm not an Oracle DBA so I don't know a lot of specifics about its
inner workings, but from what I understand some Oracle systems/processes
can use all the memory a machine has, no matter how much you give them. I've
seen ZFS and VMware do this as well. They claim a large amount of memory,
but aren't using it until they actually need it. It's more efficient and
allows for higher throughput and processing. So the fact that free thinks
the machine is low on memory isn't really an issue for me; I'm just
concerned with the documentation shown earlier.

Does anyone know if running a scan on the SCSI bus while the system thinks
there isn't much available memory would cause issues? Has anyone done
similar types of migrations (doesn't have to be with NetApp)? In essence
all we are doing is presenting additional paths temporarily, moving the
storage, then deleting the old paths. Is there a better way to delete
paths? A rescan of the SCSI bus only adds paths (at least from what I found
and read). Does anybody have a nifty or clever step to add that makes things
easier/safer/better/faster/etc?

Thanks,
Joshua Schaeffer

[1]
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/Online_Storage_Reconfiguration_Guide/scanning-storage-interconnects.html
