Re: load balancing question
On Wed, Oct 09, 2002 at 02:17:30PM +0930, Richard Sharpe wrote:

> Right, a distributed file system should support distributed locking, and
> the rate of logons should not be so high that it stretches the ability of
> the DFS to lock and unlock the file.

The point is not to make that work; that could be doable. The point is to make it *REALLY* fast. All that Solaris fcntl discussion that is popping up here is about a suboptimal fcntl implementation. Samba performance completely dies when fcntl is not lightning fast.

Volker
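Volker's point about fcntl speed can be made concrete: smbd takes and releases byte-range locks constantly around its internal bookkeeping, so a slow fcntl path multiplies across every operation. A minimal sketch of that lock/unlock traffic, using Python's fcntl wrapper on a throwaway file (file name and iteration count are illustrative only, not Samba's actual code):

```python
import fcntl
import os
import tempfile
import time

# Take and release an exclusive byte-range lock many times, the way smbd
# hammers fcntl around its tdb and share-mode bookkeeping. On a platform
# with a slow fcntl implementation, a loop like this is the bottleneck.
fd = os.open(os.path.join(tempfile.mkdtemp(), "lock.tdb"),
             os.O_RDWR | os.O_CREAT)
iterations = 10000
start = time.monotonic()
for _ in range(iterations):
    fcntl.lockf(fd, fcntl.LOCK_EX, 1, 0)   # lock byte 0
    fcntl.lockf(fd, fcntl.LOCK_UN, 1, 0)   # unlock byte 0
elapsed = time.monotonic() - start
print(f"{iterations} lock/unlock pairs in {elapsed:.3f}s "
      f"({iterations / elapsed:.0f} pairs/sec)")
os.close(fd)
```

Comparing the pairs-per-second figure across platforms gives a rough feel for whether the fcntl path is "lightning fast" in the sense Volker means.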
Re: load balancing question
To clear up some confusion in this thread:

A DFS will only help distribute the load if the clients are accessing files in different directories:

  \\host\share\dir1\ would be served by host_1
  \\host\share\dir2\ would be served by host_2
  \\host\share\dir3\ would be served by host_3

If all the clients are reading the same file, say in \\host\share\dir1\, then using the DFS does not help at all. All the requests will be handled by host_1.

If you have a fail-over disk system that is common to the three hosts, then dir1, dir2, and dir3 can be moved from one host to another as needed, but the move is time consuming, and each directory is still served by one host at a time.

In a shared-access disk system, such as in an OpenVMS cluster, you have the following:

  \\host\share\dir1\ would be served by host_1, host_2, and host_3, ...
  \\host\share\dir2\ would be served by host_1, host_2, and host_3, ...
  \\host\share\dir3\ would be served by host_1, host_2, and host_3, ...

With an IP alias, clients can be connected to any host and still have access. This will load share, and DFS has no involvement in the scheme and provides no advantage.

DFS only helps if the clients are reading files out of different directories, which is not how I am interpreting the information in the post that started this thread.

DFS could be enhanced to allow a better fail-over mechanism for the platforms that do not support simultaneous access to a common disk. But that is not load balancing. All clients accessing the same directory are either all on the same host, or one host will have direct access and the rest will be taking a second, indirect path through the network.

Locking is an issue. Samba 2.0.6 for OpenVMS uses file system locking and the slower share locking. I am not sure about the 2.2.4 port.

-John
[EMAIL PROTECTED]
Personal Opinion Only
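For reference, the per-directory distribution John describes is what Samba's MSDFS support expresses in configuration. A sketch of what that looks like, with host and share names invented to match the example above (consult your Samba version's documentation for the exact syntax):

```ini
# smb.conf on the DFS root server (host and share names are examples only)
[global]
    host msdfs = yes

[share]
    path = /export/dfsroot
    msdfs root = yes

# In /export/dfsroot, each subdirectory is a symlink naming the server
# that actually holds its contents:
#   ln -s 'msdfs:host_1\share1' dir1
#   ln -s 'msdfs:host_2\share2' dir2
#   ln -s 'msdfs:host_3\share3' dir3
```

A client opening \\host\share\dir2\ is transparently referred to host_2, which is exactly why this distributes load only when clients spread across the directories.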
RE: load balancing question
Why don't we choose a mechanism to connect to the host itself, like static DNS round robin (each connection would be round-robined), or some dynamic round-robin utilities?

-----Original Message-----
From: John E. Malmberg [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, October 08, 2002 9:58 PM
To: [EMAIL PROTECTED]
Subject: Re: load balancing question

Volker Lendecke wrote:
> On Tue, Oct 08, 2002 at 07:29:44AM -0400, John E. Malmberg wrote:
>> So it all comes down to what the underlying platform supports for
>> shared simultaneous disk access.
> Even that will not help. Load balancing SMB will not work due to the
> locking stuff across connections. If you could get tdb's to work (fast!)
> across nodes, then we might have a chance.

Why would the TDBs not work if they were located on storage actively shared between all of the hosts? All the hosts would be reading and updating the information in the same tdb.

-John
[EMAIL PROTECTED]
Personal Opinion Only
Re: load balancing question
Javid Abdul-AJAVID1 wrote:
> Why don't we choose a mechanism to connect to the host itself, like
> static DNS round robin, or some dynamic round-robin utilities?

Well, you are the only one that really knows the requirements of your application. You seem to be asking more questions each time, without giving us any more detail on what you really need to be done.

So far you have not offered any insight into the data access patterns of the clients, which determine whether the load can be distributed. Without that information from you, it is not possible to make any recommendations.

Are all of the files in the same directory? Your first post implied that they were, but most of the solutions that you seem to want to look at are not compatible with that assumption.

Are the clients all accessing the same files? Same as above.

Are the clients modifying the files? And then referencing the modified files? How often is the server updating the files?

You have not indicated the platform for the servers, or anything that really allows any estimation of how much data is being moved. We do not know how fault tolerant the application needs to be, or how much downtime costs. We also do not know what you are wanting to use as a host for the SAMBA server.

Some hosts and filesystems allow you to transparently distribute the load under all conditions. Other hosts and filesystems will only allow you to distribute the load if your application meets certain requirements.

We must assume that the clients are running Microsoft Windows, because if they were running a different operating system, there are other file sharing systems that could be used.

All of these factors are important to know, and maybe a few others. And unless we have the answers to those questions, there is no way that any of us can know if the advice we are giving is applicable. We do not even know if your application would even tax a single host running SAMBA, or if SAMBA is even a good fit for what you need to do.

-John
[EMAIL PROTECTED]
Personal Opinion Only
Re: load balancing question
Richard Sharpe wrote:
> On Tue, 8 Oct 2002, René Nieuwenhuizen wrote:
>> Richard Sharpe wrote:
>>> On Mon, 7 Oct 2002, John E. Malmberg wrote:
>>>> Javid Abdul-AJAVID1 wrote:
>>>>> MSDFS is a file system, right? How will it help to load balance
>>>>> Samba connections?
>>>> MSDFS does not really load balance. MSDFS distributes the
>>>> subdirectories of a directory between multiple servers transparently
>>>> to the clients.
>> Wouldn't it be easy to run a script on the msdfs-root that monitors the
>> load on different machines and that recreates the referrals based on
>> this load?
> Sure, but it seems better to defer the re-ordering of the referrals until
> someone asks :-) That's what we plan to do, and then try to do policy
> based stuff, like if this machine has more capacity currently, hand it
> out at the top of the list, and order them by power as well, like P4s
> before PIIIs, etc.

In order for that to work, the underlying cluster file system must support simultaneous access from the multiple hosts. If you have that, then you do not need to deal with the MSDFS feature. If you do not have that, then you can not load balance between servers unless you completely replicate all of the data, and that will only help if all the access is read only.

There are only a limited number of operating systems / file systems that support simultaneous shared access from multiple hosts, like OpenVMS clusters do. And I am not aware of any of them that will support the number of hosts or the distance that OpenVMS does.

Most of the systems on UNIX use a primary / secondary relationship where only one host is ever directly accessing the filesystems, and the other hosts are using a network type interconnection to access the files. These primary / secondary systems are good for fail-over cases, but not load balancing. Any file access from a secondary is much slower and more resource intensive than access from the primary. Also, switching the file serving from the primary to a secondary is not a cost-free operation.

So redirecting clients to a secondary server usually will mean that the data must travel on the wire twice, unless the secondary server has a good caching mechanism.

So it all comes down to what the underlying platform supports for shared simultaneous disk access. Or finding out the exact requirements for the project to see what all the options are.

-John
[EMAIL PROTECTED]
Personal Opinion Only
Re: load balancing question
Volker Lendecke wrote:
> On Tue, Oct 08, 2002 at 07:29:44AM -0400, John E. Malmberg wrote:
>> So it all comes down to what the underlying platform supports for
>> shared simultaneous disk access.
> Even that will not help. Load balancing SMB will not work due to the
> locking stuff across connections. If you could get tdb's to work (fast!)
> across nodes, then we might have a chance.

Why would the TDBs not work if they were located on storage actively shared between all of the hosts? All the hosts would be reading and updating the information in the same tdb.

-John
[EMAIL PROTECTED]
Personal Opinion Only
Re: load balancing question
On Tue, 8 Oct 2002, John E. Malmberg wrote:
> Volker Lendecke wrote:
>> On Tue, Oct 08, 2002 at 07:29:44AM -0400, John E. Malmberg wrote:
>>> So it all comes down to what the underlying platform supports for
>>> shared simultaneous disk access.
>> Even that will not help. Load balancing SMB will not work due to the
>> locking stuff across connections. If you could get tdb's to work
>> (fast!) across nodes, then we might have a chance.
> Why would the TDBs not work if they were located on storage actively
> shared between all of the hosts? All the hosts would be reading and
> updating the information in the same tdb.

Right, a distributed file system should support distributed locking, and the rate of logons should not be so high that it stretches the ability of the DFS to lock and unlock the file.

Regards
- Richard Sharpe, [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], http://www.richardsharpe.com
RE: load balancing question
On Mon, 7 Oct 2002, Javid Abdul-AJAVID1 wrote:
> MSDFS is a file system, right? How will it help to load balance Samba
> connections? What criteria does it rely on to load balance (like memory,
> or number of connections, etc.)?

Well, that is up to you. It is simple enough to round-robin the entries. Doing something more sophisticated is a small matter of programming.

Regards
- Richard Sharpe, [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], http://www.richardsharpe.com
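Round-robining the referral entries really is only a few lines. A sketch of the rotation idea (server and share names are hypothetical, and this is an illustration of the technique, not Samba's actual msdfs code):

```python
from collections import deque

# Alternate referral targets for one DFS link. Each client that asks for
# a referral sees the list in a different order, so over time the
# connections spread across the servers.
referrals = deque([r"\\host_1\share", r"\\host_2\share", r"\\host_3\share"])

def next_referral_list():
    """Return the current ordering, then rotate so the next caller
    sees a different server at the head of the list."""
    ordering = list(referrals)
    referrals.rotate(-1)
    return ordering

# Three consecutive clients each get a different preferred server:
for _ in range(3):
    print(next_referral_list()[0])
```

Anything smarter (load-weighted or capacity-ordered referrals, as discussed elsewhere in this thread) only changes how `ordering` is computed; the hand-out mechanism stays the same.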
Re: load balancing question
Javid Abdul-AJAVID1 wrote:
> MSDFS is a file system, right? How will it help to load balance Samba
> connections? What criteria does it rely on to load balance (like memory,
> or number of connections, etc.)?

MSDFS does not really load balance. MSDFS distributes the subdirectories of a directory between multiple servers transparently to the clients. So if each of your clients is accessing different files from different directories, then MSDFS can improve your performance.

If all the clients are accessing the same files in the same directory, then you will need either to get a large enough single server, or a file system that supports concurrent access to the disks from multiple directly-connected hosts. This is not really a SAMBA issue, because if the underlying filesystem and hosts support this, then SAMBA will take advantage of it transparently. I have received reports of SAMBA 1.19.x being used on a shared-disk-access OpenVMS cluster.

There are also commercial LANMAN servers for some of these platforms, including those of my employer, that run as a single process instead of the multiple-process model of SAMBA. I do not know of any competitive benchmarks between the commercial LANMAN servers and SAMBA. Such benchmarks could be difficult to instrument properly, and are highly dependent on the skill of the system administrator for each system, and the quality of the compilers for that platform.

So it really depends on the specific client load as to what the best solution for you would be. It may require more detailed engineering than could be done on a mailing list. The multiple SAMBA processes may not be the bottleneck for your proposed setup. If the platform knows how to share the code segment in memory, and the disks have good caching, the overhead for the processes may not be significant.

-John
[EMAIL PROTECTED]
Personal Opinion Only
Re: load balancing question
On Mon, 7 Oct 2002, John E. Malmberg wrote:
> Javid Abdul-AJAVID1 wrote:
>> MSDFS is a file system, right? How will it help to load balance Samba
>> connections? What criteria does it rely on to load balance (like
>> memory, or number of connections, etc.)?
> MSDFS does not really load balance. MSDFS distributes the subdirectories
> of a directory between multiple servers transparently to the clients.

Well, my suggestion was that MSDFS be modified to return referrals to different machines that each provide access to the same distributed file system, and that these referrals be rotated in a round-robin fashion. At least, that is what we will be doing.

> So if each of your clients is accessing different files from different
> directories, then MSDFS can improve your performance. If all the clients
> are accessing the same files in the same directory, then you will need
> either to get a large enough single server, or a file system that
> supports concurrent access to the disks from multiple directly-connected
> hosts. This is not really a SAMBA issue, because if the underlying
> filesystem and hosts support this, then SAMBA will take advantage of it
> transparently. I have received reports of SAMBA 1.19.x being used on a
> shared-disk-access OpenVMS cluster.
>
> There are also commercial LANMAN servers for some of these platforms,
> including those of my employer, that run as a single process instead of
> the multiple-process model of SAMBA. I do not know of any competitive
> benchmarks between the commercial LANMAN servers and SAMBA. Such
> benchmarks could be difficult to instrument properly, and are highly
> dependent on the skill of the system administrator for each system, and
> the quality of the compilers for that platform.
>
> So it really depends on the specific client load as to what the best
> solution for you would be. It may require more detailed engineering than
> could be done on a mailing list. The multiple SAMBA processes may not be
> the bottleneck for your proposed setup. If the platform knows how to
> share the code segment in memory, and the disks have good caching, the
> overhead for the processes may not be significant.
>
> -John
> [EMAIL PROTECTED]
> Personal Opinion Only

--
Regards
- Richard Sharpe, [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], http://www.richardsharpe.com
RE: load balancing question
A good load balancing utility would be LSF, which is dynamic but not free, I suppose; another would be BIND static round robin. Any more suggestions? :-)))

-----Original Message-----
From: John E. Malmberg
To: [EMAIL PROTECTED]
Sent: 10/5/02 10:03 PM
Subject: Re: load balancing question

Stephan Stapel wrote:
> Dear people on the list! I hope it's ok to ask a feature question on
> this core-feature list. What I would like to know is whether there are
> some efforts on implementing load balancing features into Samba, or some
> experiences/experiments in this area. As standard Windows doesn't offer
> these features, adding them would give Samba-based systems yet another
> (very big) advantage over a standard NT server system.
> Just so you know why I'm asking for this: we have the problem of serving
> 3D scenes as well as image data to about 100 render nodes running under
> Windows NT. When starting to render, all machines ask at exactly the
> same time for exactly the same data, which might be about 500 megabytes
> per machine. Action like this results in a server load of about 30-50,
> which isn't really satisfactory...

Load balancing can be implemented without making any changes to Samba, and has been. Load balancing on TCP/IP generally requires having a metric server on each host that feeds information to a DNS that understands how to round-robin connection requests.

The next issue that you run into is simultaneous access to the disks. Since this is read-only data, you could replicate it before the rendering, but I am guessing that there is some reason that you are not replicating the data. If your platform allows simultaneous access to disks, then the load broker should be sufficient. If not, then you need to do more research.

If you do not have multiple hosts sharing simultaneous access to the disks, then there probably is not much to gain by load-sharing from multiple servers, as only one host will really be doing all of the work. But again, there is nothing in Samba that prevents using existing load-sharing techniques, if the underlying platform supports it.

Now, a server load of 30-50 is not excessive to some classes of machines, and if they are really all hitting the same data, then file system caching will help. Many of the systems my employer sells can handle that type of load easily; they also support simultaneous disk access from multiple hosts. What would actually help more is a custom protocol that used multicast packets, which would reduce the total amount of network traffic.

-John
[EMAIL PROTECTED]
Personal Opinion Only
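The "metric server feeding a DNS" arrangement John describes boils down to: each host periodically reports a load figure, and the name server answers lookups with the least-loaded address. A toy sketch of just the selection step (host names and load figures are invented for illustration):

```python
# Each file server runs a small agent that periodically reports its load
# to the load broker; the broker answers name lookups with the address of
# the least-loaded host. The figures below stand in for those reports.
reported_loads = {
    "host_1.example.com": 0.42,
    "host_2.example.com": 0.07,
    "host_3.example.com": 0.88,
}

def pick_host(loads):
    """Return the host with the lowest reported load."""
    return min(loads, key=loads.get)

print(pick_host(reported_loads))  # host_2.example.com has the lowest load
```

A static DNS round robin is the degenerate case of this where the broker ignores the load reports and simply rotates through the addresses.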
RE: load balancing question
On Sun, 6 Oct 2002, Javid Abdul-AJAVID1 wrote:
> A good load balancing utility would be LSF, which is dynamic but not
> free, I suppose; another would be BIND static round robin. Any more
> suggestions? :-)))

Hmmm, one way to do load balancing under Samba is to use DFS. I have some patches somewhere for the MSDFS code under Samba that get it to rotate the referrals that are sent back. I guess I could try to dig them up. Maybe I will put them up at my web site.

-----Original Message-----
From: John E. Malmberg
To: [EMAIL PROTECTED]
Sent: 10/5/02 10:03 PM
Subject: Re: load balancing question

[John's reply to Stephan Stapel, quoted in full; see that message above in this thread.]

--
Regards
- Richard Sharpe, [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], http://www.richardsharpe.com
load balancing question
Dear people on the list!

I hope it's ok to ask a feature question on this core-feature list. What I would like to know is whether there are some efforts on implementing load balancing features into Samba, or some experiences/experiments in this area. As standard Windows doesn't offer these features, adding them would give Samba-based systems yet another (very big) advantage over a standard NT server system.

Just so you know why I'm asking for this: we have the problem of serving 3D scenes as well as image data to about 100 render nodes running under Windows NT. When starting to render, all machines ask at exactly the same time for exactly the same data, which might be about 500 megabytes per machine. Action like this results in a server load of about 30-50, which isn't really satisfactory...

Kind regards,
Stephan
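The scale of this request is worth putting in numbers: 100 nodes each pulling ~500 MB in the same burst is about 50 GB of aggregate demand. A quick back-of-the-envelope (the single-link speed is an assumption for illustration, not something stated in the post):

```python
nodes = 100
per_node_mb = 500

# Aggregate data demanded in one render start-up burst.
total_gb = nodes * per_node_mb / 1000
print(f"aggregate demand: {total_gb:.0f} GB")  # 50 GB

# Assume a single 100 Mbit/s server link (~12.5 MB/s) for illustration.
link_mb_per_s = 12.5
seconds = nodes * per_node_mb / link_mb_per_s
print(f"serving time on one such link: {seconds / 60:.0f} minutes")
```

Numbers like these explain why the thread keeps returning to caching, replication, and multicast: the burst is dominated by raw data volume, not by per-connection server overhead.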