Re: Storing images in PostgreSQL (was: Streaming files)
Good day Randolf Richardson,

on Tuesday, 20 August 2019 at 19:06 you wrote:

> One of the great things about Perl is that there is more than one
> way to do things, and this flows through to how to handle the storing
> and retrieving of images. (Everyone else is welcome to share their
> perspectives and experiences on this in replies to this message.)

For reasons very similar to the ones you mentioned, we currently store even user-generated files, sometimes hundreds of MiB each, in Postgres; most files are far smaller, though. It works, but it has downsides as well, especially in the long term: dumps become very large, and getting the data into and out of Postgres adds I/O overhead.

In my opinion, properly applying the RAII pattern together with newer file systems like BTRFS/ZFS is the better way to go today and in the future. RAII helps manage things cleanly, snapshots and ACLs can be used to protect the data, both file systems support checksumming for data integrity, and replication is fairly easy to implement by sending and receiving snapshots. ZFS with zrep is already used to mirror servers in our setup and works reliably.

Nevertheless, your mail is in my opinion a good collection of arguments which I could have written myself. :-)

Kind regards,

Thorsten Schöning

--
Thorsten Schöning       E-Mail: thorsten.schoen...@am-soft.de
AM-SoFT IT-Systeme      http://www.AM-SoFT.de/

Phone: 05151-9468-55   Fax: 05151-9468-88   Mobile: 0178-8 9468-04

AM-SoFT GmbH IT-Systeme, Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB 207 694 - Managing Director: Andreas Muchow
Storing images in PostgreSQL (was: Streaming files)
In answer to your question, I recognize that there are many pros and cons to storing images in PostgreSQL. I'm providing a fuller answer here and including it in the ModPerl mailing list in the hope that it will be helpful in some way to others who are considering this...

While we have sites that don't store images in a database, we're finding it's a lot more convenient to do so, and not only for backups. The advantages include:

1. transactional operations all the way through
2. easier clean-up, with cascading deletes handled by referential integrity
3. simpler multi-server replication
4. faster serving of individual images, since directories containing very large numbers of files can take longer to search than an efficiently-indexed table (Unix and Linux both seem to be very efficient in this regard in most instances, and Novell NetWare was too when we hosted web sites with it in the 1990s, but many other operating systems are not very good at this)
5. code that's easier to maintain, because we don't have to deal with data outside of the database separately

We also run a number of projects that use a mixture of large images in the filesystem and small images in the database (e.g., icons and user avatars), but we find this hybrid approach to be more of a hassle, so we're gradually moving to either fully file-system or fully database storage.

Note: We'll still store general site design images in the file system, such as the web page's logo and other graphics that make the site user-friendly for the end-user. For database storage, our primary focus is on storing user-supplied uploaded images.

From an access-control standpoint, since we need to query the database from a Perl script anyway (e.g., to determine access control, expiry policies, etc.) before serving an image that is stored outside of the web site's DocumentRoot, having the image data, length (calculated), and MIME type in the ResultSet works very well for generating the needed HTTP headers.

From a security standpoint, if an image disappears from the file system (there are many reasons this can occur), or someone accidentally runs ransomware that encrypts some or all of the images, then our dataset becomes unreliable. Instead of writing and administering a set of scripts to check that no image files are missing, I find it a lot easier to just store the image data in the database and not have to worry about this aspect at all.

In our experience, PostgreSQL actually handles this very well. The strategy we use is to store images in separate tables that were specifically created for storing one image per row, along with other information: the original filename (if the project calls for it), dimensions (width and height), MIME type, etc., plus arbitrary additional data such as a creation timestamp, the user ID of the uploader, a title, a description (which I know we can do anyway with a reference to a file), and even a view count (which can be maintained by an SQL trigger).

Storing the dimensions is also advantageous for dynamic HTML generation, by the way, since it means we can generate HTML "img" tags with the width, height, and alt values.

We take this a step further, though, by also storing a smaller version of the image (a.k.a. a thumbnail) that we serve with a different script, or with an additional character (or keyword) in the filename so that our script knows which image to send to the client.
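
To make this concrete, here is a rough sketch of the kind of table being described (untested; the table name, column names, and connection details are made up for illustration, and note that DBD::Pg wants an explicit bytea binding when inserting binary data):

    use strict;
    use warnings;
    use DBI;
    use DBD::Pg qw(:pg_types);

    my $dbh = DBI->connect('dbi:Pg:dbname=example', 'someuser', 'somepass',
                           { RaiseError => 1, AutoCommit => 1 });

    # One image per row; the thumbnail lives in the same row.
    $dbh->do(q{
        CREATE TABLE IF NOT EXISTS user_image (
            id             serial PRIMARY KEY,
            uploader_id    integer NOT NULL,
            original_name  text,
            mime_type      text NOT NULL,
            width          integer,
            height         integer,
            image_data     bytea NOT NULL,
            thumbnail_data bytea,
            title          text,
            description    text,
            view_count     integer NOT NULL DEFAULT 0,
            created_at     timestamptz NOT NULL DEFAULT now()
        )
    });

    # Read an uploaded file into memory (in a real handler this would come
    # from the upload API rather than a fixed path).
    my $image_bytes = do {
        open my $fh, '<:raw', '/tmp/upload.png' or die "open: $!";
        local $/;
        <$fh>;
    };
    my $thumbnail_bytes = $image_bytes;   # placeholder; normally a scaled-down copy

    my $sth = $dbh->prepare(
        'INSERT INTO user_image (uploader_id, original_name, mime_type,
                                 width, height, image_data, thumbnail_data)
         VALUES (?, ?, ?, ?, ?, ?, ?)'
    );
    $sth->bind_param(1, 42);              # uploader's user ID
    $sth->bind_param(2, 'cat.png');
    $sth->bind_param(3, 'image/png');
    $sth->bind_param(4, 800);
    $sth->bind_param(5, 600);
    $sth->bind_param(6, $image_bytes,     { pg_type => PG_BYTEA });
    $sth->bind_param(7, $thumbnail_bytes, { pg_type => PG_BYTEA });
    $sth->execute;
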
Both the full-size and smaller images exist in the same row in separate columns, which is also very convenient because then we have only one row to reference for both the full-size and thumbnail image.

One of the great things about Perl is that there is more than one way to do things, and this flows through to how to handle the storing and retrieving of images. (Everyone else is welcome to share their perspectives and experiences on this in replies to this message.)

> Why would you store images in a database?
>
> On 18 Aug 2019, at 23:28, Randolf Richardson wrote:
>
> > Yes, you can use a Perl handler to do that. (I do this to send
> > images stored in PostgreSQL that are retrieved with DBI instead of
> > reading from files on the file system outside of the document root.)
> >
> > Perl can read X number of bytes into a local variable, then send the
> > contents of that local variable. (Flushing the output is optional as
> > Apache HTTPd should handle this efficiently and automatically.)
> >
> > Make sure you set the "Content-type" header appropriately so that
> > your users' web browsers can handle the download appropriately:
> >
> > $r->content_type($MIME_TYPE); # --- Set MIME type
> >
> > Within a loop you'll keep reading the next portion of your data into
> > a variab
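
To round out the serving side described earlier in this message, here is a rough sketch of a mod_perl response handler that pulls a row from such a table and sends it with the appropriate headers (untested; the package name and connection details are invented, the table matches the sketch above, and the access-control checks are only hinted at):

    package My::ImageHandler;   # hypothetical name
    use strict;
    use warnings;
    use DBI;
    use Apache2::RequestRec ();   # $r->path_info, $r->content_type
    use Apache2::RequestIO  ();   # $r->print
    use Apache2::Response   ();   # $r->set_content_length
    use Apache2::Const -compile => qw(OK NOT_FOUND);

    sub handler {
        my $r = shift;

        # Assume the image id arrives as extra path info, e.g. /img/123
        my ($id) = ($r->path_info // '') =~ m{(\d+)};
        return Apache2::Const::NOT_FOUND unless $id;

        # Access-control and expiry checks would go here, using the same
        # database handle (a real handler would reuse a cached connection).
        my $dbh = DBI->connect('dbi:Pg:dbname=example', 'someuser', 'somepass',
                               { RaiseError => 1 });

        # Select thumbnail_data instead when the URL asks for the thumbnail.
        my $row = $dbh->selectrow_hashref(
            'SELECT mime_type, image_data FROM user_image WHERE id = ?',
            undef, $id
        );
        return Apache2::Const::NOT_FOUND unless $row;

        $r->content_type($row->{mime_type});                 # MIME type from the row
        $r->set_content_length(length $row->{image_data});   # calculated length
        $r->print($row->{image_data});

        return Apache2::Const::OK;
    }

    1;
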
Re: Streaming files
Yes, you can use a Perl handler to do that. (I do this to send images stored in PostgreSQL that are retrieved with DBI instead of reading from files on the file system outside of the document root.)

Perl can read X number of bytes into a local variable, then send the contents of that local variable. (Flushing the output is optional, as Apache HTTPd should handle this efficiently and automatically.)

Make sure you set the "Content-type" header appropriately so that your users' web browsers can handle the download appropriately:

    $r->content_type($MIME_TYPE); # --- Set MIME type

Within a loop you'll keep reading the next portion of your data into a variable such as $chunk, and simply send it in the standard way:

    $r->print($chunk); # --- Send data

You can also send other headers (I usually set the content-type after sending the other headers), such as "Content-Disposition" and whatnot, to communicate whatever filename you prefer, and to specify the file size so that your users can see "percent complete" download indicators in their web browsers. (If you'd like an example, feel free to ask.)

> Using a perl handler as the download link would be ideal. So, if I am
> understanding you correctly, so long as I load chunks of the file and call
> the perl print function on each chunk individually instead of loading all
> of them into a variable and THEN calling print, I should avoid having to
> load the complete file into memory?
>
> On Fri, Aug 16, 2019 at 2:36 PM Randolf Richardson wrote:
>
> > One fairly straight-forward approach would be to write a script that
> > serves as the path for downloads, then have it parse the filename to
> > use as a key in determining which content to send to the user. (The
> > AcceptPathInfo directive could be helpful for this, especially if you
> > want the script to appear as a subdirectory to the user.)
> >
> > That script could perform all the necessary security checks that you
> > need, and read portions of the file to be streamed as needed (the
> > streaming protocol implementation details would also need to be
> > handled by your script, and you might find it helpful to look at
> > what's available in CPAN for the streaming protocol you want to use).
> >
> > > In Java servlets, I can stream a file back to the browser one chunk at a
> > > time. This has 2 benefits which interest me.
> > > 1) Files can be stored outside the web root so users cannot download them
> > > unless they are logged in, even if they know the path.
> > > 2) Large files can be streamed back to the client without having the entire
> > > file loaded into memory at the same time
> > >
> > > How would you recommend achieving similar functionality in mod_perl?

Randolf Richardson - rand...@inter-corporate.com
Inter-Corporate Computer & Network Services, Inc.
Beautiful British Columbia, Canada
https://www.inter-corporate.com/
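
Since an example was offered above, here is a rough sketch of what such a chunked-download handler might look like (untested; the package name, file path, filename, MIME type, and chunk size are all made up for illustration):

    package My::DownloadHandler;   # hypothetical name
    use strict;
    use warnings;
    use Apache2::RequestRec ();   # $r->content_type, $r->headers_out
    use Apache2::RequestIO  ();   # $r->print
    use Apache2::Response   ();   # $r->set_content_length
    use APR::Table          ();   # headers_out->set
    use Apache2::Const -compile => qw(OK NOT_FOUND);

    use constant CHUNK_SIZE => 64 * 1024;   # read 64 KiB at a time

    sub handler {
        my $r = shift;

        my $path = '/srv/private/files/report.pdf';   # outside the DocumentRoot
        return Apache2::Const::NOT_FOUND unless -f $path;

        # Other headers first, then the content type.
        $r->headers_out->set('Content-Disposition',
                             'attachment; filename="report.pdf"');
        $r->set_content_length(-s $path);     # lets browsers show a progress bar
        $r->content_type('application/pdf');

        open my $fh, '<:raw', $path or return Apache2::Const::NOT_FOUND;
        while (read($fh, my $chunk, CHUNK_SIZE)) {
            $r->print($chunk);                # send this portion, then read the next
        }
        close $fh;

        return Apache2::Const::OK;
    }

    1;
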
Re: Streaming files
1. Streaming in chunks has nothing to do with security; you should have a proper authentication and authorization layer wrapping these requests.

2. Handling chunks and large files are things which Apache has perfected, and we usually let Apache do the nitty-gritty. After all the validations are complete I just do:

    return $r->lookup_file($file)->run();

lookup_file is provided by Apache2::SubRequest.

On Fri, Aug 16, 2019 at 1:37 PM Dan Ullom wrote:

> You could put the files behind Nginx and use your mod_perl web app as an
> external authentication service. I've never done it but it doesn't look too
> hard.
>
> On Fri, Aug 16, 2019, 12:25 PM John Dunlap wrote:
>
>> In Java servlets, I can stream a file back to the browser one chunk at a
>> time. This has 2 benefits which interest me.
>> 1) Files can be stored outside the web root so users cannot download them
>> unless they are logged in, even if they know the path.
>> 2) Large files can be streamed back to the client without having the
>> entire file loaded into memory at the same time
>>
>> How would you recommend achieving similar functionality in mod_perl?
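
For context, a rough (untested) sketch of that pattern in a mod_perl handler might look like the following; the package name, file path, and the authorization check are placeholders:

    package My::ProtectedFile;   # hypothetical name
    use strict;
    use warnings;
    use Apache2::RequestRec ();
    use Apache2::SubRequest ();   # provides $r->lookup_file and ->run
    use Apache2::Const -compile => qw(FORBIDDEN);

    sub handler {
        my $r = shift;

        # Whatever application-level authentication / authorization you use
        # runs here; this placeholder always allows the download.
        return Apache2::Const::FORBIDDEN unless request_is_authorized($r);

        my $file = '/srv/private/files/report.pdf';   # hypothetical path
        # Let Apache serve the file via a subrequest; it handles the
        # chunking, ranges, and so on.
        return $r->lookup_file($file)->run();
    }

    sub request_is_authorized { 1 }   # placeholder check for the sketch

    1;
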
Re: Streaming files
You could put the files behind Nginx and use your mod_perl web app as an external authentication service. I've never done it but it doesn't look too hard.

On Fri, Aug 16, 2019, 12:25 PM John Dunlap wrote:

> In Java servlets, I can stream a file back to the browser one chunk at a
> time. This has 2 benefits which interest me.
> 1) Files can be stored outside the web root so users cannot download them
> unless they are logged in, even if they know the path.
> 2) Large files can be streamed back to the client without having the
> entire file loaded into memory at the same time
>
> How would you recommend achieving similar functionality in mod_perl?
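
A rough sketch of what the mod_perl side of such a setup could look like (untested; the Nginx-side configuration that calls this endpoint before serving a file is not shown, and the package name, cookie name, and session check are invented):

    package My::AuthCheck;   # hypothetical name
    use strict;
    use warnings;
    use Apache2::RequestRec ();   # $r->headers_in
    use APR::Table          ();   # headers_in->get
    use Apache2::Const -compile => qw(OK FORBIDDEN);

    sub handler {
        my $r = shift;

        # Look for a session token in the request that Nginx forwards.
        my $cookies = $r->headers_in->get('Cookie') || '';
        my ($sid)   = $cookies =~ /session=([\w-]+)/;

        # Replace with a real session lookup (database, cache, etc.).
        return Apache2::Const::OK if $sid && session_is_valid($sid);

        # Anything other than success tells the front end not to serve the file.
        return Apache2::Const::FORBIDDEN;
    }

    sub session_is_valid { 0 }   # placeholder for the sketch

    1;
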
Re: Streaming files
One fairly straight-forward approach would be to write a script that serves as the path for downloads, then have it parse the filename to use as a key in determining which content to send to the user. (The AcceptPathInfo directive could be helpful for this, especially if you want the script to appear as a subdirectory to the user.)

That script could perform all the necessary security checks that you need, and read portions of the file to be streamed as needed (the streaming protocol implementation details would also need to be handled by your script, and you might find it helpful to look at what's available in CPAN for the streaming protocol you want to use).

> In Java servlets, I can stream a file back to the browser one chunk at a
> time. This has 2 benefits which interest me.
> 1) Files can be stored outside the web root so users cannot download them
> unless they are logged in, even if they know the path.
> 2) Large files can be streamed back to the client without having the entire
> file loaded into memory at the same time
>
> How would you recommend achieving similar functionality in mod_perl?

Randolf Richardson - rand...@inter-corporate.com
Inter-Corporate Computer & Network Services, Inc.
Beautiful British Columbia, Canada
https://www.inter-corporate.com/
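
As an untested sketch of that approach, assuming "AcceptPathInfo On" for the handler's location and with the base directory, package name, and content type invented for illustration:

    package My::PathDownload;   # hypothetical name
    use strict;
    use warnings;
    use File::Spec          ();
    use Apache2::RequestRec ();   # $r->path_info, $r->content_type
    use Apache2::RequestIO  ();   # $r->print
    use Apache2::Const -compile => qw(OK NOT_FOUND FORBIDDEN);

    my $base = '/srv/private/files';   # outside the DocumentRoot

    sub handler {
        my $r = shift;

        # With AcceptPathInfo, /download/reports/2019.pdf arrives here as
        # path info "/reports/2019.pdf", which we use as the lookup key.
        my $key = $r->path_info // '';
        $key =~ s{^/+}{};
        return Apache2::Const::NOT_FOUND unless length $key;
        return Apache2::Const::FORBIDDEN if $key =~ /\.\./;   # no escaping $base

        my $path = File::Spec->catfile($base, $key);
        return Apache2::Const::NOT_FOUND unless -f $path;

        # Security checks (login, ownership, expiry, ...) would go here,
        # then the file is streamed in chunks as in the earlier example.
        $r->content_type('application/octet-stream');
        open my $fh, '<:raw', $path or return Apache2::Const::NOT_FOUND;
        while (read($fh, my $chunk, 64 * 1024)) {
            $r->print($chunk);
        }
        close $fh;

        return Apache2::Const::OK;
    }

    1;
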
Streaming files
In Java servlets, I can stream a file back to the browser one chunk at a time. This has 2 benefits which interest me.

1) Files can be stored outside the web root so users cannot download them unless they are logged in, even if they know the path.
2) Large files can be streamed back to the client without having the entire file loaded into memory at the same time.

How would you recommend achieving similar functionality in mod_perl?

--
John Dunlap
CTO | Lariat

Direct:
j...@lariat.co

Customer Service:
877.268.6667
supp...@lariat.co