[Dspace-tech] Load balancing / clustering

2007-03-07 Thread Ryan Ordway

I have been digging around to find information about sites using load
balancing and/or clustering with their DSpace installations. All I could
find was mention of load balancing web requests to multiple Tomcat instances
using mod_jk.

First some background, and then my question:

What I am looking to do is put my DSpace web servers behind my load balancer
to balance the HTTP requests. Each web server then load balances its Tomcat
connections via mod_jk across both hosts, with its own instance weighted
heavier so that it prefers localhost.
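
A minimal sketch of the weighting idea, assuming a 3:1 preference for the
local Tomcat. The worker names and weights are hypothetical, and this simple
rotation only illustrates the behavior; it is not mod_jk's actual balancing
algorithm.

```python
def build_rotation(weights):
    """Expand {worker: weight} into a repeating schedule, e.g. local x3, remote x1."""
    return [name for name, weight in weights.items() for _ in range(weight)]


def pick_worker(schedule, position, healthy):
    """Return the first healthy worker at or after `position`, or None if all are down."""
    for offset in range(len(schedule)):
        candidate = schedule[(position + offset) % len(schedule)]
        if candidate in healthy:
            return candidate
    return None


# Hypothetical weights as seen from host A: its own Tomcat is preferred 3:1.
weights_on_host_a = {"tomcat_a": 3, "tomcat_b": 1}
schedule = build_rotation(weights_on_host_a)

# With both Tomcats up, most requests stay local.
both_up = [pick_worker(schedule, i, {"tomcat_a", "tomcat_b"}) for i in range(4)]

# If the local Tomcat dies, Apache on host A still has somewhere to send requests.
local_down = [pick_worker(schedule, i, {"tomcat_b"}) for i in range(4)]
```

The second case is the reason for balancing behind Apache as well as in front
of it: a dead local Tomcat does not take the whole host out of service.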

For the database, for now I'm just using a single Postgres instance. I'm
hoping to get DSpace ported to MySQL to take advantage of my existing MySQL
cluster.

My question is, are there any issues to watch for? Will just rsync'ing the
assetstore between the two web/app servers suffice? Are there any issues
with running multiple handle servers?

Thanks,

Ryan

--
Ryan Ordway                    E-mail: [EMAIL PROTECTED]
Unix Systems Administrator             [EMAIL PROTECTED]
OSU Libraries, Corvallis, OR 97370     Office: Valley Library #4657



-
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
___
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech


Re: [Dspace-tech] Load balancing / clustering

2007-03-08 Thread Cory Snavely
I'm not clear on why you would want load-balancing both in *front* of
Apache *and between* Apache and Tomcat. In particular I would think if
you had the former you would not benefit from the latter. I guess you're
concerned about Tomcat failing independently of Apache. In my case, I've
just eliminated Apache from the picture.

At any rate, re: the assetstore, if you want a load-balanced
environment, I am quite sure that real-time synchronization is
necessary. Even with an hourly rsync--problematic at best with a large
repository, BTW--a deposit on one instance and a subsequent attempted
retrieval of it on the other would cause issues. There are a number of
ways to share a file system among several servers but I would think that
the most accessible would be any reasonable NAS storage backend
depending on your existing storage infrastructure.
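
A small sketch of the stale-read problem, assuming each node keeps its own
local assetstore copy and a scheduled job copies files across. The class and
identifiers are hypothetical and only model the timing issue; they are not
DSpace code.

```python
class Node:
    """One web/app server with its own local copy of the assetstore."""

    def __init__(self, name):
        self.name = name
        self.assetstore = {}

    def deposit(self, bitstream_id, data):
        self.assetstore[bitstream_id] = data

    def retrieve(self, bitstream_id):
        # Returns None when the file has not reached this node yet.
        return self.assetstore.get(bitstream_id)


def periodic_sync(source, target):
    """Stand-in for a scheduled one-way copy (no deletions)."""
    target.assetstore.update(source.assetstore)


node_a, node_b = Node("A"), Node("B")
node_a.deposit("bitstream-1", b"thesis.pdf contents")

# The load balancer sends the next request to B before the sync has run.
before_sync = node_b.retrieve("bitstream-1")

periodic_sync(node_a, node_b)
after_sync = node_b.retrieve("bitstream-1")
```

Until the sync runs, the file exists only on the node that took the deposit,
so a retrieval routed to the other node finds nothing.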

Make sure you run the indexer on only one instance.

I run two regular handle servers redundantly, not against DSpace, but
against MySQL with bidirectional MySQL replication. The folks at CNRI
helped me work through the issues involved, which mainly involved having
a shared private key between the two and making sure that the two
servers were configured as masters so they did not try to use handle
replication. I would think that redundant handle servers operating
against DSpace (that is, DSpace methods for Postgres or MySQL access)
would be about the same thing--just making sure that the handle server
configurations are identical on each server.

Cory Snavely
University of Michigan Library IT Core Services





Re: [Dspace-tech] Load balancing / clustering

2007-03-08 Thread Ryan Ordway
On 3/8/07 4:54 AM, "Cory Snavely" <[EMAIL PROTECTED]> spake:

> I'm not clear on why you would want load-balancing both in *front* of
> Apache *and between* Apache and Tomcat. In particular I would think if
> you had the former you would not benefit from the latter. I guess you're
> concerned about Tomcat failing independently of Apache. In my case, I've
> just eliminated Apache from the picture.

If only Tomcat were to go down on one of the hosts (say host A), this would
allow the Apache on host A to still serve requests and talk to Tomcat on
host B to fetch the content. Since

> At any rate, re: the assetstore, if you want a load-balanced
> environment, I am quite sure that real-time synchronization is
> necessary. Even with an hourly rsync--problematic at best with a large
> repository, BTW--a deposit on one instance and a subsequent attempted
> retrieval of it on the other would cause issues. There are a number of
> ways to share a file system among several servers but I would think that
> the most accessible would be any reasonable NAS storage backend
> depending on your existing storage infrastructure.

I am also trying to avoid single points of failure. These hosts are both
connected to a SAN, but I want both hosts to have a copy of the data.

I'm considering some form of on-demand synchronization, in addition to
scheduled synchronization. For instance, adding a new item would trigger
a synchronization that pushes the new data to the other node.
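
A rough sketch of such a push, assuming rsync over the default remote shell.
The paths and host name are hypothetical, and how the push would be triggered
when an item is added is left open; the thread does not name a DSpace hook
for it.

```python
import subprocess


def build_rsync_command(local_dir, remote_host, remote_dir):
    """Build an rsync invocation that copies the contents of local_dir to the peer."""
    # A trailing slash on the source copies the directory's contents
    # rather than the directory itself.
    source = local_dir.rstrip("/") + "/"
    destination = f"{remote_host}:{remote_dir.rstrip('/')}/"
    # -a is rsync's archive mode (recursive, preserves times and permissions).
    return ["rsync", "-a", source, destination]


def push_assetstore(local_dir, remote_host, remote_dir, run=subprocess.run):
    """Run the push; `run` is injectable so the function can be exercised without rsync."""
    command = build_rsync_command(local_dir, remote_host, remote_dir)
    return run(command, check=True)
```

Calling push_assetstore("/dspace/assetstore", "hostb", "/dspace/assetstore")
after each deposit would shorten the window in which the two copies differ,
though it does not close it.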

Rsync is quite speedy. :-)
 
> Make sure you run the indexer on only one instance.

Good to know!
 
> I run two regular handle servers redundantly, not against DSpace, but
> against MySQL with bidirectional MySQL replication. The folks at CNRI
> helped me work through the issues involved, which mainly involved having
> a shared private key between the two and making sure that the two
> servers were configured as masters so they did not try to use handle
> replication. I would think that redundant handle servers operating
> against DSpace (that is, DSpace methods for Postgres or MySQL access)
> would be about the same thing--just making sure that the handle server
> configurations are identical on each server.

What is the benefit of using the handle server with MySQL? What needs to be
done to DSpace to get it to use the MySQL data rather than using the DSpace
methods?

Thanks for the input,

Ryan
 


-- 
Ryan Ordway                    E-mail: [EMAIL PROTECTED]
Unix Systems Administrator             [EMAIL PROTECTED]
OSU Libraries, Corvallis, OR 97370     Office: Valley Library #4657





Re: [Dspace-tech] Load balancing / clustering

2007-03-08 Thread Cory Snavely
On Thu, 2007-03-08 at 11:54 -0800, Ryan Ordway wrote:
> On 3/8/07 4:54 AM, "Cory Snavely" <[EMAIL PROTECTED]> spake:
> 
> > At any rate, re: the assetstore, if you want a load-balanced
> > environment, I am quite sure that real-time synchronization is
> > necessary. Even with an hourly rsync--problematic at best with a large
> > repository, BTW--a deposit on one instance and a subsequent attempted
> > retrieval of it on the other would cause issues. There are a number of
> > ways to share a file system among several servers but I would think that
> > the most accessible would be any reasonable NAS storage backend
> > depending on your existing storage infrastructure.
> 
> I am also trying to avoid single points of failure. These hosts are both
> connected to a SAN, but want both hosts to have a copy of the data.
> 
> I'm considering some form of on-demand synchronization, in addition to
> scheduled synchronization. For instance, when a new item is added having it
> trigger a synchronization to push the new data to the other node.
> 
> Rsync is quite speedy. :-)

Well, whether your storage backend is a single point of failure depends
largely on its architecture. If you use dual pathing, dual active-active
controllers, etc., and some reasonable RAID level, I would not at all
consider it to be a single point of failure.

If you still favor the idea of two separate storage systems, I think you
are heading down the road of bi-directional, real-time replication in
order to really do it right. I am of the opinion that almost any system
reliant on crawling across large filesystems on a regular basis is
unacceptable at a large scale. I have also seen rsync require huge
amounts of memory at large scale. Lastly, the bidirectionality is itself
an issue, and it gets more complicated if you allow objects to be
removed from your repository (consider whether you would use the
--delete flag or not, and how a new submission looks to one system like
a deletion to the other).
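
The --delete dilemma can be sketched with plain sets, assuming each node's
assetstore is modeled as a set of item names and a push is a one-way mirror.
The function and item names are hypothetical; this models the semantics only
and does not invoke rsync.

```python
def mirror(source, target, delete):
    """Model a one-way push: copy everything in source; optionally drop extras in target."""
    result = set(target) | set(source)
    if delete:
        # With delete semantics, anything the source lacks is removed from the target.
        result &= set(source)
    return result


node_a = {"item-1", "item-2"}
node_b = {"item-1", "item-2"}

node_b.add("item-3")        # a new submission lands on B
node_a.discard("item-2")    # an item is removed from the repository on A

# A pushes to B with delete semantics: B's new submission looks like an extra file.
b_after_delete_push = mirror(node_a, node_b, delete=True)

# A pushes to B without deleting: the removed item lingers on B.
b_after_plain_push = mirror(node_a, node_b, delete=False)
```

With deletion enabled, the push from A wipes out the submission B just
accepted; without it, the item removed on A survives on B and would flow back
to A on the next push in the other direction.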

That said, if you rig up something to trigger a push to the other site,
you'll probably be able to get it to work...but it's really work that
could be achieved at the file system layer.
 
> > Make sure you run the indexer on only one instance.
> 
> Good to know!
>  
> > I run two regular handle servers redundantly, not against DSpace, but
> > against MySQL with bidirectional MySQL replication. The folks at CNRI
> > helped me work through the issues involved, which mainly involved having
> > a shared private key between the two and making sure that the two
> > servers were configured as masters so they did not try to use handle
> > replication. I would think that redundant handle servers operating
> > against DSpace (that is, DSpace methods for Postgres or MySQL access)
> > would be about the same thing--just making sure that the handle server
> > configurations are identical on each server.
> 
> What is the benefit to using the handle server with MySQL? What needs to be
> done to Dspace to get it to use the MySQL data rather than using the Dspace
> methods?

It won't apply here. To resolve handles in DSpace, you have to configure
the handle server to run against the DSpace metadata store through Java
methods.

My point with that was simply to say that handle servers can run in an
active-active load-balancing mode, but they need to both believe they
are masters and they need to use the same private key.
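
One way to check that two handle server directories really are identical is
to compare file digests. This is a sketch only; the file names passed in are
placeholders for whatever key and configuration files the installation
actually uses.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def mismatched_files(dir_a, dir_b, filenames):
    """Return the names whose contents differ between two copies of a server directory."""
    return [
        name
        for name in filenames
        if file_digest(Path(dir_a) / name) != file_digest(Path(dir_b) / name)
    ]
```

An empty list from mismatched_files means the listed files match byte for
byte on both servers, which covers the shared private key as well as the
configuration.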

c




Re: [Dspace-tech] Load balancing / clustering

2007-03-08 Thread James Rutherford
On Thu, Mar 08, 2007 at 11:54:38AM -0800, Ryan Ordway wrote:
> I am also trying to avoid single points of failure. These hosts are both
> connected to a SAN, but want both hosts to have a copy of the data.

If you're trying to avoid single points of failure, I'm curious as to
how you (and others) are dealing with this at the db layer (I think you
mentioned a MySQL cluster). I've started a page on the wiki:

http://wiki.dspace.org/index.php/HOWTO_Clustering

where I've gathered some information on clustering Postgres. If anyone
has anything to add here, it would be most appreciated since most
available solutions are proving troublesome thus far.

Jim

--
James Rutherford
Research Engineer
HP Labs, Bristol, UK
+44 117 312 7066
[EMAIL PROTECTED]



Re: [Dspace-tech] Load balancing / clustering (fwd)

2007-03-08 Thread Bill Jordan

We run DSpace in our load-balanced LVS web cluster.  Apache and Tomcat 
instances are on separate hosts.  Each Apache host talks to one Tomcat 
host -- if either Apache or Tomcat fails, we take that pair out of the LVS 
rotation.  We've configured the DSpace virtual services to be persistent
so a client sticks to the same real server that handled its first request.
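
The pairing rule can be sketched as follows, assuming some health check is
available for each host. The host names and the check itself are
hypothetical; this is not LVS configuration.

```python
def pairs_in_rotation(pairs, is_up):
    """Keep only the Apache/Tomcat pairs where both halves respond."""
    return [pair for pair in pairs if is_up(pair[0]) and is_up(pair[1])]


# Hypothetical pairs of (apache_host, tomcat_host).
pairs = [("apache1", "tomcat1"), ("apache2", "tomcat2")]

# Suppose tomcat2 has failed: its whole pair leaves the rotation.
down = {"tomcat2"}
active = pairs_in_rotation(pairs, lambda host: host not in down)
```

Treating the pair as one unit keeps the configuration simple, at the cost of
idling a healthy Apache whenever its Tomcat is down.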

We use NFS filesystems for the LVS cluster.  Postgres and NFS servers are 
on their own hosts, and we run both in an active/passive high-availability 
configuration.

Indexing and the handle server run on the "primary" Tomcat nodes.  We use 
mon and heartbeat to watch the service and initiate failover of the handle 
server if the primary node goes down.

This has worked fine for us in our low-volume installation.  It's overkill 
for our transaction volume, but we get high availability and I avoid 
having hosts dedicated to DSpace.

--Bill


William Jordan
Associate Dean
University of Washington Libraries
Resource Acquisition and Description/
Information Technology Services 
Box 352900, Seattle, WA 98195-2900
Voice: (206) 685-1625   Fax: (206) 543-5457



