Using rsync or Unison (or any of the tools built on top of rsync) as a direct replacement for FTP would be a low-hanging-fruit improvement. Tunneling the transfer over an encrypted layer such as SSH improves security as well.
You're absolutely right that if you have a cluster of machines in the
DMZ, a more scalable architecture is to have them all share storage
rather than unicast-replicate to each web server individually. So your
choice is between SAN and NAS; a NAS is generally the cheaper mature
solution on a budget.

Perhaps your next-lowest-hanging fruit is to put a secure NAS in your
DMZ, then replicate to it from your internal trusted network via a
single port: rsync or Unison again, over an encrypted protocol like
SSH.

There are many possible variations: stronger or weaker security on the
replication channel depending on the exact use case and data,
event-driven replication instead of polling, or buying replication
software or NAS appliance(s) that handle the replication for you.

Your instincts are correct that it /is/ largely a solved problem!

Good luck,
-D

Flaherty, Patrick wrote:
> I've been soliciting solutions from everyone I can think of on moving
> a large number of files from inside our LAN to a DMZ on a regular
> basis.
>
> I have a cluster of machines producing 20k small files (30 kB or so)
> inside our LAN. After the files are created, they are pushed to a few
> web servers in the DMZ using FTP. The push is done by the machine that
> created the file. Ideally, the files make it out to the DMZ in less
> than 30 seconds, but there have been some issues.
>
> FTP seems to fall down when scaling out to more than a web server or
> two: many retries and transfer failures. It also adds complexity to
> the processing. What if one of the web servers is down? How many times
> do you retry? Should you notify the other hosts in the cluster? All
> that logic needs to be in the pushing script, which becomes a bit
> ungainly. There's also the issue of constantly opening new FTP
> sessions, which is a bit expensive.
>
> So I'm looking for a cleaner architecture.
> An ideal solution would be an NFS/CIFS share internal to the LAN,
> replicated read-only to an NFS/CIFS share in the DMZ. The cluster can
> write to the NFS share, the web servers can read from the NFS share.
> Everyone is happy. The big sticking point is being careful not to
> violate security by multi-homing the storage. Many solutions require
> an open network connection on many ports between the two storage
> boxes, which would be an easy way into our LAN.
>
> So far I'm poking at (with some downsides):
> FUSE + (sshfs/ftpfs): high performance hit (60% or so, from what I've
> read).
> ZFS + StorageTek: great, another operating system to train people on.
> DRBD: requires a full network connection between the LAN and DMZ
> boxes.
> DataPlow SFS + DAS box: salespeople will promise you the world.
> Software SAN replicators of too many names to mention.
>
> This is such a common problem, I'm not sure why there isn't a nice
> canned solution of two cheap pieces of hardware. Maybe I'm just an
> idiot and there is. Oh please please please tell me I'm an idiot.
>
> Anyone have any brilliant ideas?
>
> Best,
> Patrick
>
> _______________________________________________
> gnhlug-discuss mailing list
> gnhlug-discuss@mail.gnhlug.org
> http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss/