Re: Balance and sync data
On Thu, 2008-09-18 at 23:31 +0200, André Warnier wrote:
> Martin Spinassi wrote:
> [...]
>
> Martin,
>
> I re-read the thread from the beginning, and as I understand it you have
> [...]
> You don't need to change DAV in any way, you just "wrap" it in filters
> that do what you want around it.
>
> André

André, first of all, thank you very much for taking the time to re-read the thread and write such a good response; I really appreciate it.

About DAV, it looks like you really built something big there. I don't know if it applies to my case (or whether I'll get the chance to do it). The site I'm trying to build is a kind of forum/social site. People create threads or posts and add some pictures to illustrate them.

My first idea was rsync, so that if one Tomcat suddenly dies, the other one would have almost all the pictures. NFS could be a better option, with a daily replica on one Tomcat box in case the NFS server stops working.

Once again, thanks for your response and time. I don't know how to apply it here, but it surely is a must-read. I'll give the DAV documentation a try.

Cheers
Martín

- To start a new topic, e-mail: users@tomcat.apache.org
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Re: Balance and sync data
Martin Spinassi wrote:
[...]

Martin,

I re-read the thread from the beginning, and as I understand it you have
- clients that upload files, most of them images
- clients that download these same images
- and you would like a system that handles this and duplicates the images to 2 or more "synchronised" places, so as to have redundancy and backup.

Let me describe a part of an application which I designed, and see if this inspires you. This was under Apache, but it should also be possible under Tomcat.

I wanted to provide clients with a hierarchical folder structure where they could upload their documents via a simple drag and drop, but I did not want to have to scan the whole structure regularly to check if anything had been uploaded there.
Plus, I wanted to know who uploaded what and when, and wanted to do something to those files after they were uploaded.
Plus, I am lazy and not such a big-shot programmer, so if something already exists and works well, I prefer to use it rather than re-develop my own buggy version.

At the core, for allowing clients to upload the (in my case) documents, there is DAV (which is also implemented under Tomcat).
DAV allows the client to see a folder structure on the server, and drag-drop files into it, just like a remote network drive. It even works in Windows with the Explorer (not IE, the other one); it's called "web folders" there.
But once the file is dropped somewhere, you no longer know who put it there. Plus, since they can drop a file anywhere in the folder hierarchy, you have to explore the whole hierarchy regularly on the server to find the files they've dropped, if any.

Except that, at the base, DAV is just an HTTP protocol extension. It makes requests through URLs, and such requests get processed by an HTTP server. The requests just use different "command verbs" than GET and POST.
For a while, I was thinking of creating my own handlers for those verbs (PUT, MKCOL, OPTIONS, ...), or taking the DAV code and implementing my own additional functionality in it.
Then I realised that, DAV being an HTTP protocol extension, you can do HTTP authentication, and you can use filters around it. That's true in Apache, and also in Tomcat.

So let's say that when a user wants to drop a file via DAV, you intercept the HTTP request, authenticate it, and save that information somewhere.
Next, your filter gets to run. It sees where the user is going to drop the file (the URL of the PUT), and remembers it. Then it lets the request go through DAV (the actual file upload into a folder somewhere), DAV being the filtered application here. Then, when the DAV response comes back through the filter, the filter takes the uploaded file from where it is now (it knows the exact folder), and copies it to another place (or does whatever you want with it). In addition, the filter also knows who did it and other details, so it can pass this information somewhere to be saved (into a database record?).

I personally find this more elegant than
a) re-inventing the wheel: uploading/downloading files to and from an HTTP server is something for which DAV was designed, and the developers spent a lot of time making it work reliably
b) triggering external syncs in real-time
c) scanning the file structure later to sync

DAV also allows drag-and-drop downloads, and they also go through HTTP requests...

You don't need to change DAV in any way, you just "wrap" it in filters that do what you want around it.

André
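A minimal sketch of the work such a filter could do once the DAV PUT has gone through, assuming the DAV folders and the second copy both live on the local filesystem. The class `UploadMirror`, its method `afterPut`, and the tab-separated log file are hypothetical names for illustration, not part of Tomcat or of any DAV implementation. In a real servlet filter this method would presumably be called after `chain.doFilter(request, response)` returns successfully for a PUT, with the user taken from `request.getRemoteUser()` and the path derived from the request URL.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.time.Instant;

// Hypothetical helper: copies a freshly uploaded file to a second location
// and records who uploaded it. Not part of Tomcat or any DAV library.
class UploadMirror {
    private final Path davRoot;     // folder the DAV servlet writes into
    private final Path mirrorRoot;  // second copy of the same tree
    private final Path auditLog;    // stand-in for "a database record"

    UploadMirror(Path davRoot, Path mirrorRoot, Path auditLog) {
        this.davRoot = davRoot.toAbsolutePath().normalize();
        this.mirrorRoot = mirrorRoot.toAbsolutePath().normalize();
        this.auditLog = auditLog;
    }

    // relativePath is the PUT target below the DAV root, e.g. "albums/cat.png".
    Path afterPut(String user, String relativePath) throws IOException {
        Path source = davRoot.resolve(relativePath).normalize();
        // Reject paths that escape the DAV root or that are not plain files.
        if (!source.startsWith(davRoot) || !Files.isRegularFile(source)) {
            throw new IOException("not an uploaded file under the DAV root: " + relativePath);
        }
        Path target = mirrorRoot.resolve(davRoot.relativize(source));
        Files.createDirectories(target.getParent());
        Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);

        // Append "timestamp <TAB> user <TAB> path" to the audit log.
        String entry = Instant.now() + "\t" + user + "\t" + relativePath + System.lineSeparator();
        Files.write(auditLog, entry.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path dav = Files.createTempDirectory("dav");
        Path mirror = Files.createTempDirectory("mirror");
        Files.write(dav.resolve("cat.png"), new byte[] {1, 2, 3});
        UploadMirror m = new UploadMirror(dav, mirror, mirror.resolve("uploads.log"));
        System.out.println("copied to " + m.afterPut("martin", "cat.png"));
    }
}
```

The copy runs only after the upload has completed, so nothing has to scan the folder tree later. Whether a local copy, an rsync call, or a database insert is the right follow-up action depends on the setup.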
Re: Balance and sync data
--- On Wed, 9/17/08, Hassan Schroeder <[EMAIL PROTECTED]> wrote:
> From: Hassan Schroeder <[EMAIL PROTECTED]>
> Subject: Re: Balance and sync data
> To: "Tomcat Users List"
> Date: Wednesday, September 17, 2008, 6:13 PM
>
> On Wed, Sep 17, 2008 at 2:57 PM, Christopher Schultz <[EMAIL PROTECTED]> wrote:
>
> >> Why not have your upload servlet invoke rsync when a new file has
> >> been stored?
> >
> > You're not seriously suggesting that as a viable production strategy,
> > are you?
>
> [ IM IN UR DURECTRY COPYNG UR IMAGES ]
>
> Sure -- why not? It works nicely for a use case like this. And exec'ing
> a process as needed beats spawning one every minute!
>
> > NFS, baby. NFS.
>
> Um, single point of failure? :-)

What if the NFS source sits on DRBD [1]? DRBD provides an alternative to an expensive single HA system ;)

[1] http://www.drbd.org/
Re: Balance and sync data
On Thu, 2008-09-18 at 08:34 -0700, Hassan Schroeder wrote:
> On Thu, Sep 18, 2008 at 7:47 AM, Christopher Schultz <[EMAIL PROTECTED]> wrote:
>
> > I suppose it depends on the frequency of image uploads. 100 images a day
> > wouldn't be too bad. 100 images per minute would seriously suck.
>
> True, I was envisioning a relatively low-frequency operation, for
> no particular good reason :-)
>
> >> Um, single point of failure? :-)
> >
> > NFS /can/ be done robustly.
>
> OK, I haven't encountered an NFS cluster in the wild, but apparently
> they exist. So, yes, that'd be a solution, and would probably scale
> better than using rsync.

NFS was one of my first ideas (after rsync). Of course it should use IPsec to make it more secure. The problem with that option is that I don't yet know how much load the images will generate, so I can't decide whether to put it on a dedicated server (looking to the future) or add it to a Tomcat server and have the load balancer send that server less traffic than the other one.

The problem here is the single point of failure, but assuming it's just an NFS server with a bunch of images, it wouldn't take long to restore another server with the same data.

Thanks for your help Hassan, maybe NFS would be OK if I can secure it enough.

Cheers
Martín
Re: Balance and sync data
On Thu, Sep 18, 2008 at 7:47 AM, Christopher Schultz <[EMAIL PROTECTED]> wrote:

> I suppose it depends on the frequency of image uploads. 100 images a day
> wouldn't be too bad. 100 images per minute would seriously suck.

True, I was envisioning a relatively low-frequency operation, for no particular good reason :-)

>> Um, single point of failure? :-)
>
> NFS /can/ be done robustly.

OK, I haven't encountered an NFS cluster in the wild, but apparently they exist. So, yes, that'd be a solution, and would probably scale better than using rsync.

--
Hassan Schroeder [EMAIL PROTECTED]
Re: Balance and sync data
Hassan,

Hassan Schroeder wrote:
> On Wed, Sep 17, 2008 at 2:57 PM, Christopher Schultz <[EMAIL PROTECTED]> wrote:
>
>>> Why not have your upload servlet invoke rsync when a new file has
>>> been stored?
>> You're not seriously suggesting that as a viable production strategy,
>> are you?
>
> [ IM IN UR DURECTRY COPYNG UR IMAGES ]
>
> Sure -- why not? It works nicely for a use case like this. And exec'ing
> a process as needed beats spawning one every minute!

I suppose it depends on the frequency of image uploads. 100 images a day wouldn't be too bad. 100 images per minute would seriously suck.

>> NFS, baby. NFS.
>
> Um, single point of failure? :-)

NFS /can/ be done robustly.

-chris
Re: Balance and sync data
On Wed, Sep 17, 2008 at 2:57 PM, Christopher Schultz <[EMAIL PROTECTED]> wrote:

>> Why not have your upload servlet invoke rsync when a new file has
>> been stored?
>
> You're not seriously suggesting that as a viable production strategy,
> are you?

[ IM IN UR DURECTRY COPYNG UR IMAGES ]

Sure -- why not? It works nicely for a use case like this. And exec'ing a process as needed beats spawning one every minute!

> NFS, baby. NFS.

Um, single point of failure? :-)

--
Hassan Schroeder [EMAIL PROTECTED]
Re: Balance and sync data
Hassan,

Hassan Schroeder wrote:
> On Tue, Sep 16, 2008 at 6:38 AM, Martin Spinassi <[EMAIL PROTECTED]> wrote:
>
>> I don't know yet, I didn't try it yet, I was waiting to see if there is
>> a better solution than rsync them every minute.
>
> Why not have your upload servlet invoke rsync when a new file has
> been stored?

You're not seriously suggesting that as a viable production strategy, are you?

NFS, baby. NFS.

-chris
RE: Balance and sync data
On Tue, 2008-09-16 at 17:59 -0400, Paul McGurn wrote:
> [...]
> Instead, we're leveraging some clever (but very easy) DNS, and Amazon S3 to
> host the files.
> [...]
>
> Paul McGurn

Looks pretty good, but I'll have to check the price for that...

Anyway, your videos are uploaded by your own staff and then just viewed as HTML by your clients. Here, people upload images, and our server modifies those images (resizing, format conversion, removing transparency, etc.), so it would generate more traffic between our server and the S3 server.

I'll keep the idea in mind, but I don't see it as a solution we'd use soon in this case.

Thanks for the idea anyway Paul, I'll check it out.

Cheers
Martín
RE: Balance and sync data
If you're expecting the size of your image store to grow, or better yet, grow rapidly, you'd be best served to consider a strategy either with mod_proxy/mod_rewrite, or better yet, looking into a CDN (content delivery network) to host the images themselves.

For example, I'm about to launch an offering that will allow our support team to publish video tutorials on how to use our products. It makes absolutely no sense to have a copy of each video file on each front-end web server (we use Tomcat as the web server and application container), and it also isn't responsible to deliver a content offering with no redundancy in case of outage/downtime/disaster.

Instead, we're leveraging some clever (but very easy) DNS, and Amazon S3 to host the files.

By leveraging Amazon, we can link all our content by using a CNAME DNS record, like content.yourname.com, and automatically deliver that content from Amazon.

Of course, there are drawbacks. I don't think this method would work in SSL implementations, for instance.

This link is to the instructions I followed to deliver content via S3:
http://www.carltonbale.com/2007/09/how-to-alias-a-domain-name-or-sub-domain-to-amazon-s3/

Paul McGurn

-----Original Message-----
From: Martin Spinassi [mailto:[EMAIL PROTECTED]
Sent: Tuesday, September 16, 2008 5:53 PM
To: users@tomcat.apache.org
Subject: Re: Balance and sync data

[...]
Re: Balance and sync data
On Tue, 2008-09-16 at 22:20 +0200, André Warnier wrote:
[...]
> The question I've been holding back since your initial post, is why
> exactly you do want to load-balance similar requests to 2 Tomcats ?
>
> Just an idea :
>
> If it is because you have a) "image stuff" and b) "non-image stuff", and
> they each represent about 50% of the load, then maybe you do not really
> want to balance (with the problems of sharing and/or duplicating the
> images), but you could just use a front-end to split the image stuff and
> send it to Tomcat-1, and the non-image stuff and send it to Tomcat-2.
> (Apache + mod_rewrite + mod_proxy).
> This way, only Tomcat-1 would need to handle the images (up and down)
> and it would always be up-to-date.

They are all "image stuff". The idea of duplicating those images is availability, just in case one Tomcat goes down.

But, on the other hand, images (and resizing, thumbnails, etc.) consume resources, and the possibility of using just one server is still in my head. If the image load gets big enough, maybe using another server just for that could be a good option.

I have to read more about mod_proxy and see if I can apply it to resolve some load issues.

Thanks for your help André, I'll keep your idea in mind and try it before selecting the right one for production.

Cheers.
Martín
Re: Balance and sync data
Martin Spinassi wrote:
> On Tue, 2008-09-16 at 08:56 -0700, Hassan Schroeder wrote:
>> On Tue, Sep 16, 2008 at 8:17 AM, Martin Spinassi <[EMAIL PROTECTED]> wrote:
>>
>>>> Why not have your upload servlet invoke rsync when a new file has
>>>> been stored?
>>
>>> Can you give me some more details or where to get some more info?
>>
>> Runtime.exec("/usr/bin/rsync") -- though you may want to instead
>> invoke a script file containing the appropriate rsync arguments.
>>
>> This works fine. I've even used rsync's "dry-run" mode to create a list
>> of files that differed between two systems (e.g. staging and production)
>> to generate a form page and allow the user to pick which ones to sync.
>>
>> HTH,
>
> Thanks Hassan! I'll talk to developers to give it a try at our test
> environment.

The question I've been holding back since your initial post, is why exactly you do want to load-balance similar requests to 2 Tomcats ?

Just an idea :

If it is because you have a) "image stuff" and b) "non-image stuff", and they each represent about 50% of the load, then maybe you do not really want to balance (with the problems of sharing and/or duplicating the images), but you could just use a front-end to split the image stuff and send it to Tomcat-1, and the non-image stuff and send it to Tomcat-2. (Apache + mod_rewrite + mod_proxy).
This way, only Tomcat-1 would need to handle the images (up and down) and it would always be up-to-date.
Re: Balance and sync data
On Tue, 2008-09-16 at 08:56 -0700, Hassan Schroeder wrote:
> On Tue, Sep 16, 2008 at 8:17 AM, Martin Spinassi <[EMAIL PROTECTED]> wrote:
>
> >> Why not have your upload servlet invoke rsync when a new file has
> >> been stored?
>
> > Can you give me some more details or where to get some more info?
>
> Runtime.exec("/usr/bin/rsync") -- though you may want to instead
> invoke a script file containing the appropriate rsync arguments.
>
> This works fine. I've even used rsync's "dry-run" mode to create a list
> of files that differed between two systems (e.g. staging and production)
> to generate a form page and allow the user to pick which ones to sync.
>
> HTH,

Thanks Hassan! I'll ask the developers to give it a try in our test environment.

Martín
Re: Balance and sync data
On Tue, Sep 16, 2008 at 8:17 AM, Martin Spinassi <[EMAIL PROTECTED]> wrote:

>> Why not have your upload servlet invoke rsync when a new file has
>> been stored?

> Can you give me some more details or where to get some more info?

Runtime.exec("/usr/bin/rsync") -- though you may want to instead invoke a script file containing the appropriate rsync arguments.

This works fine. I've even used rsync's "dry-run" mode to create a list of files that differed between two systems (e.g. staging and production) to generate a form page and allow the user to pick which ones to sync.

HTH,
--
Hassan Schroeder [EMAIL PROTECTED]
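A rough sketch of what invoking rsync from the upload code could look like, for illustration only. It uses `ProcessBuilder` rather than `Runtime.getRuntime().exec(...)` (note that `exec` is an instance method, so the shorthand above needs `getRuntime()`). The class name `RsyncAfterUpload`, the target `tomcat2:/var/uploads/`, the 60-second timeout, and the choice of `-a` are assumptions, not anything stated in the thread. `--dry-run` and `--itemize-changes` are standard rsync options that list what would be transferred without copying.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper an upload servlet could call once a file is stored.
class RsyncAfterUpload {

    // Builds the argument list. Passing arguments as a list (not one shell
    // string) avoids quoting problems with unusual file names.
    static List<String> buildCommand(Path file, String remoteTarget, boolean dryRun) {
        List<String> cmd = new ArrayList<>();
        cmd.add("/usr/bin/rsync");
        cmd.add("-a");                     // archive mode: keep times and permissions
        if (dryRun) {
            cmd.add("--dry-run");          // report only, copy nothing
            cmd.add("--itemize-changes");  // one line per file that differs
        }
        cmd.add(file.toString());
        cmd.add(remoteTarget);             // assumed form: host:/path/
        return cmd;
    }

    // Runs rsync for one file and returns its exit code (0 means success).
    static int sync(Path file, String remoteTarget) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(buildCommand(file, remoteTarget, false))
                .inheritIO()               // let rsync's output reach the server log
                .start();
        if (!p.waitFor(60, TimeUnit.SECONDS)) {
            p.destroyForcibly();
            throw new IOException("rsync timed out for " + file);
        }
        return p.exitValue();
    }

    public static void main(String[] args) {
        // Prints the command instead of running it, so no rsync is needed here.
        List<String> cmd = buildCommand(Paths.get("cat.png"), "tomcat2:/var/uploads/", false);
        System.out.println(String.join(" ", cmd));
    }
}
```

Spawning one process per upload is fine at low upload rates. At high rates it is probably better to batch several files into one rsync call, or to call a script as suggested above.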
Re: Balance and sync data
On Tue, 2008-09-16 at 07:37 -0700, Hassan Schroeder wrote:
> On Tue, Sep 16, 2008 at 6:38 AM, Martin Spinassi <[EMAIL PROTECTED]> wrote:
>
> > I don't know yet, I didn't try it yet, I was waiting to see if there is
> > a better solution than rsync them every minute.
>
> Why not have your upload servlet invoke rsync when a new file has
> been stored?

Is that possible? I'm far from being a Java programmer, and I've had too few "fights" with Tomcat to know those kinds of "tricks".

Can you give me some more details, or tell me where to get more info? This looks like what I was searching for.

Thanks!
Martín
Re: Balance and sync data
On Tue, Sep 16, 2008 at 6:38 AM, Martin Spinassi <[EMAIL PROTECTED]> wrote:

> I don't know yet, I didn't try it yet, I was waiting to see if there is
> a better solution than rsync them every minute.

Why not have your upload servlet invoke rsync when a new file has been stored?

--
Hassan Schroeder [EMAIL PROTECTED]