Thanks, Aaron and Mike, for the detailed emails. Here is what we are trying to do.


  1.  We are trying to set up a MarkLogic cluster of 3 servers for failover.
  2.  We do not have GFS or any other clustered file system.
  3.  We are trying to find out what our best options are.

From the email chain, this is what I understand our options are.

Scenario 1:

  1.  Use a dedicated NAS for each server, for a total of 3 dedicated NAS units.
  2.  Configure a file replication service to replicate forests among all 3 NAS instances.

    Question: Is there any documentation on how to configure such a replication service for MarkLogic forest replication? (A rough sketch of the kind of job we have in mind is included below.)
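
For illustration, here is roughly the kind of job we have in mind: a script on
each node that mirrors its forest directory onto the other two NAS volumes. The
paths are assumptions (the default MarkLogic forest location on Linux plus
hypothetical mounts of the peer NAS units), and the forests would have to be
quiesced or snapshotted for the copies to be consistent.

    import subprocess

    # Assumed paths: default MarkLogic forest directory on Linux, plus
    # hypothetical local mounts of the other two NAS volumes.
    FOREST_DIR = "/var/opt/MarkLogic/Forests/"
    PEER_MOUNTS = ["/mnt/nas-node2/Forests/", "/mnt/nas-node3/Forests/"]

    def replicate_forests():
        """Mirror the local forest directory onto each peer NAS volume.

        Copying a forest that is actively being written can produce an
        inconsistent replica, so run this against a quiesced forest or a
        filesystem snapshot.
        """
        for dest in PEER_MOUNTS:
            # -a preserves ownership/permissions/timestamps;
            # --delete removes files that no longer exist on the source.
            subprocess.run(["rsync", "-a", "--delete", FOREST_DIR, dest],
                           check=True)

    if __name__ == "__main__":
        replicate_forests()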

Scenario 2:

  1.  Use local storage on all three servers.
  2.  Configure a file replication service to replicate forests among the 3 servers' local storage.

    Question: In this scenario, if the local storage reaches its capacity we cannot increase it. What are our options if local storage gets maxed out?


Suggestions are most welcome.

__________________________
Kashif Khan




On 2/9/13 6:41 PM, "Aaron Rosenbaum" <aaron.rosenb...@marklogic.com> wrote:

Yes, you can use NAS. Like SAN, the key is adequate performance. This is the 
tricky part because getting that performance is very difficult and very 
expensive. When internal policies and infrastructure dictate SAN or NAS, dedicated, high-quality NAS can often be preferable to shared, under-provisioned SAN (while being cheaper).

As Mike pointed out, can you maintain HA with your NAS setup? This depends on the particular unit.

Without a clustered file system, you won't have multiple nodes pointing at the same volume. Each node should receive dedicated pools and bandwidth.

You should not stripe across all volumes and then thin-provision out of a single pool.

No CIFS, Windows shares, or SMB. NFS has performance limitations even with 10GbE under Linux. Test, test, test (a crude smoke test is sketched below).
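
If it helps, a crude sequential-write smoke test along these lines will catch a
badly under-provisioned mount. The mount point and sizes are placeholders, and
it is no substitute for a proper benchmark tool such as fio.

    import os
    import time

    MOUNT_POINT = "/mnt/candidate-nas"      # placeholder: mount point under test
    TEST_FILE = os.path.join(MOUNT_POINT, "throughput-test.bin")
    CHUNK = os.urandom(4 * 1024 * 1024)     # 4 MB of hard-to-compress data
    TOTAL_MB = 2048                         # write 2 GB to get past caches

    def sequential_write_mb_per_sec():
        start = time.time()
        with open(TEST_FILE, "wb") as f:
            for _ in range(TOTAL_MB // 4):
                f.write(CHUNK)
            f.flush()
            os.fsync(f.fileno())            # force data to storage, not just page cache
        elapsed = time.time() - start
        os.remove(TEST_FILE)
        return TOTAL_MB / elapsed

    if __name__ == "__main__":
        print("sequential write: %.1f MB/sec" % sequential_write_mb_per_sec())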

It is often the overlay services of "fancy" NAS that kill performance: dedup, compression, site-to-site replication, and so on.

Is this a shared resource? If so, how do you ensure enough bandwidth for the MarkLogic nodes? How do you ensure you don't destroy the performance of other nodes? You should have explicit visibility and control of each volume.

An example of successful SLAs can be found in Amazon's Provisioned IOPS storage. While it is neither SAN nor NAS, it sets a standard for what you should expect/demand from shared storage:
- an explicit bandwidth guarantee to the storage pool (110 MB/sec for most high-end instances, coincidentally close to the practical throughput limit of many NFS setups).
- guaranteed IOPS at large block sizes for each volume. You need 20 MB/sec per forest. 16 forests per node, not unreasonable for a well-specified system with local storage, would need 240,000 IOPS at 4k blocks from your NAS (the arithmetic is worked through just after this list). I think you'll find local storage much more cost-effective.
- sustained SLA compliance even while maxing out all guarantees. A typical pattern is that a MarkLogic user will ask for that much bandwidth (80K 4k IOPS per node) and then get laughed at by the storage admins, because it's far outside everything they have experience with. MarkLogic can end up looking more like a video-streaming load than like Oracle. It really does use that much bandwidth, and if the total provided is less, performance can drop off a cliff.
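
To spell out the arithmetic behind those figures (assuming all 3 nodes share
the NAS, which is how the 240,000 total falls out of the roughly 80K-per-node
number):

    # Arithmetic behind the figures above.
    MB_PER_SEC_PER_FOREST = 20        # per-forest bandwidth target
    BLOCK_SIZE_KB = 4                 # 4k blocks
    FORESTS_PER_NODE = 16
    NODES = 3                         # assumption: all nodes share the NAS

    iops_per_forest = MB_PER_SEC_PER_FOREST * 1024 // BLOCK_SIZE_KB  # 5,120
    iops_per_node = iops_per_forest * FORESTS_PER_NODE               # 81,920 (~80K)
    iops_total = iops_per_node * NODES                               # 245,760 (~240K)

    print(iops_per_forest, iops_per_node, iops_total)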

We are developing guidelines for AWS storage now, but one rule of thumb is probably useful for NAS also. If you can, provision one volume per forest so you can track and allocate performance by volume/forest with less effort. It will also make reallocating load easier.

Local disk replication will move the copies of forests around for HA. Don't try 
to do that with the disk subsystem.

If you pass along more details as to planned configurations, I may be of more 
help.

Aaron Rosenbaum
Director, Product Management
aaron.rosenb...@marklogic.com


Sent from my iPhone



------------------------------
Message: 2
Date: Fri, 8 Feb 2013 14:51:30 -0800
From: Michael Blakeley <m...@blakeley.com>
Subject: Re: [MarkLogic Dev General] Marklogic Cluster Setup
To: MarkLogic Developer Discussion <general@developer.marklogic.com>

The question "which is faster?" is impossible to answer generically. It's 
possible to design local storage so that it is slower or faster than a given 
NAS. It's possible to design NAS so that it is slower or faster than given 
local storage. But in most cases it is cheaper to build out similar levels of 
performance from local disk than from NAS (or SAN).
Performance aside, I would not use a NAS as part of a failover solution. The 
whole point of failover is high availability, and relying on a NAS simply 
introduces another system that can fail. Using a NAS also implies shared 
filesystems, which are cantankerous and require their own fencing mechanisms. 
This pulls in yet more systems that can fail, and probably will.
I prefer to use local storage, with local replication of forests. This also 
avoids the strong probability that the I/O demands of the cluster will swamp 
the network link to the NAS, or the NAS controller.
So I would size the number of forests needed, then the storage capacity and I/O 
performance needed, and finally specify local disk and network to meet those 
needs.
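
As a back-of-the-envelope illustration of that order of operations (the figures
below are placeholders, with the 20 MB/sec-per-forest number borrowed from
Aaron's note rather than measured from your workload):

    # Sizing sketch: forest count first, then capacity and I/O, then hardware.
    FORESTS_PER_NODE = 6              # placeholder: derive from your data volume
    GB_PER_FOREST = 300               # placeholder working size per forest
    MB_PER_SEC_PER_FOREST = 20        # per-forest throughput figure from above

    capacity_gb_per_node = FORESTS_PER_NODE * GB_PER_FOREST            # 1,800 GB
    throughput_mb_per_node = FORESTS_PER_NODE * MB_PER_SEC_PER_FOREST  # 120 MB/sec

    print("per node: %d GB of capacity, %d MB/sec of sustained I/O" %
          (capacity_gb_per_node, throughput_mb_per_node))
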
-- Mike


On 8 Feb 2013, at 14:26, "Khan, Kashif" <kashif.k...@hmhco.com> wrote:
Hello everyone,

We are creating a MarkLogic cluster for failover. I have a couple of questions.

  *   We are planning to use NAS for data storage. Is there any performance hit if we use NAS instead of SAN?
  *   We do not have GFS set up.
      *   Is it possible to attach one NAS file store to all 3 MarkLogic servers in the cluster?
      *   Or do we have to attach an independent NAS to each MarkLogic instance and set up a cloning job to transfer data to each of the other 2 NAS instances?

From the documentation it seems we cannot attach one NAS file store to all three MarkLogic servers unless we have GFS. Any info will be greatly appreciated.

Kashif Khan
_______________________________________________
General mailing list
General@developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general
