Re: [OT] Data store options
At 08:59 08.11.01 -0800, you wrote:
> My specifics are that I have a need to permanently store tens of
> thousands of smallish (5K) items. I'm currently using a simple file
> system store, one file per record, all in the same directory. Clearly,
> I need to move into a directory tree for better performance as the
> number of files increases.

A modern FS like XFS or ReiserFS will do that without a directory tree.

Joachim

-- ... a race of inventive dwarfs who can be hired for anything.
   - Bertolt Brecht, Leben des Galilei
Problem with "Build mod_perl as a DSO outside the Apache Source Tree via APXS"
Hello Gurus, please help me get mod_perl working. I have a working Apache 1.3, which came by default with my Red Hat Linux 6.2 installation. I have now installed Perl 5.6.1, and to extend the already installed Apache with mod_perl I followed the steps in http://perl.apache.org/guide/install.html#Build_mod_perl_as_a_DSO_outside_ i.e., "Build mod_perl as a DSO outside the Apache Source Tree via APXS". This builds the DSO libperl.so outside the Apache source tree with the Apache 1.3 support tool apxs, and installs it into the existing Apache hierarchy. But now I am unable to access the Apache httpd from a browser ( http://192.168.1.235/ ).

When I checked the httpd log file, there seems to be an error:

  [root@qclinux /root]# tail /etc/log/httpd/error_log
  Apache/1.3.12 (Unix) (Red Hat/Linux) PHP/3.0.15 mod_perl/1.26 configured -- resuming normal operations
  configured -- resuming normal operations
  Size magic not implemented.

When I tried from a browser, no page came up, and the log file showed:

  [root@qclinux /root]# tail /var/log/httpd/error_log
  [Fri Nov 9 07:16:44 2001] [notice] child pid 856 exit signal Segmentation fault (11)
  [Fri Nov 9 07:16:44 2001] [notice] child pid 855 exit signal Segmentation fault (11)
  [Fri Nov 9 07:16:45 2001] [notice] child pid 860 exit signal Segmentation fault (11)
  [Fri Nov 9 07:16:45 2001] [notice] child pid 859 exit signal Segmentation fault (11)

Please kindly suggest how to get mod_perl working with the existing Apache, because it has many modules already loaded, and I am facing many problems getting a new Apache + mod_perl to work. So I want to continue with the existing Apache and a new mod_perl environment. Thanks in advance.

-SubbaReddy M
[OT] Data store options
Hi,

<verbose>

I'm looking for a little discussion on selecting a data storage method, and I'm posting here because Cache::Cache is often discussed here (along with Apache::Session). And people here are smart, of course ;).

Basically, I'm trying to understand when to use Cache::Cache vs. Berkeley DB, and the locking issues involved. (Perrin, I've been curious why at etoys you used Berkeley DB over other caching options, such as Cache::Cache.) I think an RDBMS is not required, as I'm only reading/writing and not doing any kind of selects on the data -- also, I could end up doing thousands of selects for a request. So far, performance has been good with the file system store.

My specifics are that I have a need to permanently store tens of thousands of smallish (5K) items. I'm currently using a simple file system store, one file per record, all in the same directory. Clearly, I need to move into a directory tree for better performance as the number of files increases.

The data is accessed in a few ways:

1) Read/write a single record.

2) Read anywhere from a few to thousands of records in a request. This is the typical mod_perl-based request. I know the record IDs that I need to read from another source. I basically need a way to get some subset of records fast, by record ID.

3) Traverse the data store and read every record.

I don't need features to automatically expire the records. They are permanent. When reading (item 2) I have to create a Perl data structure from the data, which doesn't change. So, I want to store this in my record using Storable.pm. That can work with any data store, of course. It's not a complicated design.

My choices are something like:

1) Use Storable and write the files out myself.
2) Use Cache::FileCache and have the work done for me (but can I traverse?).
3) Use Berkeley DB (I understand the issues discussed in The Guide).

So, what kind of questions and answers would help me weigh the options?

With regard to locking, IIRC, Cache::Cache doesn't lock; rather, writes go to a temp file, then there's an atomic rename. Last in wins. If updates to a record are not based on previous content (such as a counter file), is there any reason this is not a perfectly good method -- as opposed to flock()?

Again, I'm really looking more for discussion than an answer to my specific needs. What issues would you consider when selecting a data store method, and why?

</verbose>

Thanks very much,

Bill Moseley mailto:[EMAIL PROTECTED]
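The temp-file-plus-atomic-rename scheme described above is easy to make concrete. Here is a minimal sketch (in Python rather than Perl, purely for illustration; the `atomic_write` helper and its path handling are hypothetical, not part of Cache::Cache):

```python
import os
import tempfile

def atomic_write(path, data):
    """Write bytes to `path` via a temp file plus rename, so readers
    never see a partially written record (last writer wins)."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file must live on the same filesystem for rename to be atomic.
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, path)  # atomic replace on POSIX
    except BaseException:
        os.unlink(tmp)  # clean up the temp file on any failure
        raise
```

Because rename() replaces the target in one step when source and target are on the same filesystem, readers see either the old record or the new one, never a torn write -- which is why "last in wins" is safe whenever an update doesn't depend on the record's previous contents.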
Re: [OT] Data store options
> Basically, I'm trying to understand when to use Cache::Cache vs.
> Berkeley DB, and the locking issues involved. (Perrin, I've been
> curious why at etoys you used Berkeley DB over other caching options,
> such as Cache::Cache.)

Cache::Cache didn't exist at the time. BerkeleyDB seemed easier than rolling our own file cache, but in retrospect I'm not sure it was the right way to go. We spent a fair amount of time figuring out how to work with BerkeleyDB effectively.

> 1) Use Storable and write the files out myself.
> 2) Use Cache::FileCache and have the work done for me (but can I traverse?).
> 3) Use Berkeley DB (I understand the issues discussed in The Guide).

If you do use BerkeleyDB, I suggest you just use the simple database-level lock. Otherwise, you have to think about deadlocks, and I found the deadlock daemon that comes with it kind of difficult to use.

> So, what kind of questions and answers would help me weigh the options?

If you have too many records, I suppose Cache::FileCache might eat up all your inodes. You can probably go pretty far with a modern OS before you hit a problem with this. I always wanted to benchmark Cache::FileCache against BerkeleyDB for different read/write loads and see how they do. I think FileCache would win on Linux.

> With regard to locking, IIRC, Cache::Cache doesn't lock; rather,
> writes go to a temp file, then there's an atomic rename. Last in
> wins. If updates to a record are not based on previous content (such
> as a counter file), is there any reason this is not a perfectly good
> method -- as opposed to flock()?

That should be fine.

- Perrin
Cache::* and MD5 collisions [was: [OT] Data store options]
On Thu, Nov 08, 2001 at 08:59:55AM -0800, Bill Moseley wrote:
> I'm looking for a little discussion on selecting a data storage
> method, and I'm posting here because Cache::Cache is often discussed
> here (along with Apache::Session). And people here are smart, of
> course ;). Basically, I'm trying to understand when to use
> Cache::Cache vs. Berkeley DB, and the locking issues involved.

Even a bit more OT: one thing to watch out for, especially if you plan on caching a *lot* of data, is that the Cache::* modules did not detect MD5 collisions the last time I looked. Forgive me if that's changed recently. The probability is extremely low, but collisions can and do occur in the wild (I won't bring anyone's name up here :). In other words, it's a remote possibility that one URI might serve up another's content if the two hash keys map to the same MD5 value. Given an infinite number of web monkeys using Cache::* and an infinite number of user monkeys clicking on links...

- Barrie
Re: Cache::* and MD5 collisions [was: [OT] Data store options]
On Thu, 2001-11-08 at 10:11, Barrie Slaymaker wrote:
> Even a bit more OT: one thing to watch out for, especially if you
> plan on caching a *lot* of data, is that the Cache::* modules did not
> detect MD5 collisions the last time I looked. Forgive me if that's
> changed recently. The probability is extremely low, but collisions
> can and do occur in the wild. In other words, it's a remote
> possibility that one URI might serve up another's content if the two
> hash keys map to the same MD5 value.

You could switch to SHA-1. SHA-1 collisions will be much rarer than MD5 collisions.

-jwb
Re: Cache::* and MD5 collisions [was: [OT] Data store options]
On Thu, Nov 08, 2001 at 01:11:21PM -0500, Barrie Slaymaker wrote:
> Even a bit more OT: one thing to watch out for, especially if you
> plan on caching a *lot* of data, is that the Cache::* modules did not
> detect MD5 collisions the last time I looked. Forgive me if that's
> changed recently.

I knew that a collision was technically possible, but the odds seemed astronomically low. But if security is a concern, then I agree that MD5-ing the key isn't the best way to create a unique identifier.

The thing that I liked most about MD5 (other than its convenience) is that it tends to distribute very evenly. For example, file system caches fill their directories roughly equally when their paths are created from MD5-hashed keys. Doing something simple and unique like URL-encoding the key to make a legal identifier (legal in the sense that it is a valid filename) wouldn't distribute as evenly.

Barrie raised this question privately to me earlier this year, and this is what I wrote at the time:

  Good question. They would overwrite each other. You may want to
  check out this RFC for more information about the odds of that:
  http://www.faqs.org/rfcs/rfc1321.html

  Specifically, they say that it is conjectured that the difficulty of
  coming up with two messages having the same message digest is on the
  order of 2^64 operations, and that the difficulty of coming up with
  any message having a given message digest is on the order of 2^128
  operations.

  This means you would need on the order of 18,446,744,073,709,551,616
  keys before a collision becomes likely. Assuming you were only
  willing to risk a 0.01% chance of a collision, you could still fit
  1,844,674,407,370,955 keys. Since each value stored will take at
  least 1K in memory, that number of keys would require 1,844,674
  terabytes of storage.

  In other words, it is highly unlikely that you could get a collision
  in real world usage.

Note that I'm not disagreeing with Barrie at all -- I just took the easy way out and refuted the likelihood of an infinite number of monkeys hitting my code. :)

-DeWitt
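As a footnote to the arithmetic above: the standard birthday-paradox approximation for the chance of *any* collision among n keys in a b-bit hash is 1 - exp(-n^2 / 2^(b+1)). A small sketch (the `collision_probability` helper is hypothetical; Python for illustration):

```python
import math

def collision_probability(n_keys, hash_bits=128):
    """Birthday-paradox estimate of the chance that any two of
    n_keys distinct keys share the same hash_bits-bit digest."""
    exponent = -(n_keys ** 2) / float(2 ** (hash_bits + 1))
    # -expm1(x) == 1 - e**x, but stays accurate for tiny probabilities
    return -math.expm1(exponent)

# A billion cached keys against a 128-bit digest: the collision
# chance is on the order of 1e-21, i.e. negligible in practice.
p_billion = collision_probability(10 ** 9)

# At 2**64 keys the chance is about 39%, not a certainty -- which only
# strengthens the "you'll run out of disk long before then" argument.
p_birthday = collision_probability(2 ** 64)
```

None of this changes the practical conclusion: at realistic cache sizes an accidental MD5 collision is vanishingly unlikely (deliberate ones are Barrie's separate, security-minded point).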
Re: Cache::* and MD5 collisions [was: [OT] Data store options]
Hello,

DC> For example, file system caches fill their directories roughly
DC> equally when their paths are created from MD5-hashed keys. Doing
DC> something simple and unique like URL-encoding the key to make a
DC> legal identifier (legal in the sense that it is a valid filename)
DC> wouldn't distribute as evenly.

Let me point out that if you are using MD5 hashes for directory spreading (i.e., to spread a large number of files across a tree of directories so that no one directory is filled with too many files for efficient filesystem access), the easy solution is to just append the unique key that you are hashing to the files involved.

For example, say your keys are e-mail addresses and you just want to use an MD5 hash to spread your data files over directories so that no one directory has too many files in it. Say your original key is [EMAIL PROTECTED] (an encoded MD5 hash of this is RfbmPiuRLyPGGt3oHBagt). Instead of just storing the key in the file R/Rf/Rfb/Rfbm/RfbmPiuRLyPGGt3oHBagt.dat, store it in the file [EMAIL PROTECTED] Presto... collisions are impossible.

Humbly, Andrew

--
Andrew Ho              http://www.tellme.com/    [EMAIL PROTECTED]
Engineer               [EMAIL PROTECTED]         Voice 650-930-9062
Tellme Networks, Inc.  1-800-555-TELL            Fax 650-930-9101
--
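Andrew's spread-then-append scheme can be sketched like so (Python for illustration; the `spread_path` name and the depth/width parameters are my own, not from any Cache::* module):

```python
import hashlib
from urllib.parse import quote

def spread_path(key, depth=3, width=2):
    """Map a key to a nested directory path using its MD5 hex digest
    only for spreading, with the (escaped) key itself in the filename
    so that two keys hashing alike can never share a file."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    # e.g. digest "5e8f..." with depth=3, width=2 -> "5e/8f/...".
    parts = [digest[i * width:(i + 1) * width] for i in range(depth)]
    filename = quote(key, safe="") + ".dat"  # URL-escape to a legal filename
    return "/".join(parts + [filename])
```

Two keys with colliding digests would land in the same directory but still get distinct filenames, so nothing is ever overwritten; the digest is used purely for load-spreading.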
Re: [OT] Data store options
Hello,

PH> If you do use BerkeleyDB, I suggest you just use the simple
PH> database-level lock. Otherwise, you have to think about deadlocks,
PH> and I found the deadlock daemon that comes with it kind of
PH> difficult to use.

Later versions of BerkeleyDB have a row-level lock available which works pretty transparently. However, this works using mmap(), so it won't work if your BerkeleyDB file(s) is/are mounted via NFS. flock()'s scalability over NFS using lockd under Solaris is also questionable, so many people end up implementing their own database-level lock using, say, atomic moves. It's an icky world out there.

Humbly, Andrew

--
Andrew Ho              http://www.tellme.com/    [EMAIL PROTECTED]
Engineer               [EMAIL PROTECTED]         Voice 650-930-9062
Tellme Networks, Inc.  1-800-555-TELL            Fax 650-930-9101
--
Re: Cache::* and MD5 collisions [was: [OT] Data store options]
At 10:54 AM 11/08/01 -0800, Andrew Ho wrote:
> For example, say your keys are e-mail addresses and you just want to
> use an MD5 hash to spread your data files over directories so that no
> one directory has too many files in it. Say your original key is
> [EMAIL PROTECTED] (an encoded MD5 hash of this is
> RfbmPiuRLyPGGt3oHBagt). Instead of just storing the key in the file
> R/Rf/Rfb/Rfbm/RfbmPiuRLyPGGt3oHBagt.dat, store it in the file
> [EMAIL PROTECTED] Presto... collisions are impossible.

That has the nice side effect that I can run through the directory tree and get the key for every file. I do need a way to read every key in the store. Order is not important.

Bill Moseley mailto:[EMAIL PROTECTED]
Re: Cache::* and MD5 collisions [was: [OT] Data store options]
On Thu, Nov 08, 2001 at 10:54:11AM -0800, Andrew Ho wrote:
> Let me point out that if you are using MD5 hashes for directory
> spreading (i.e., to spread a large number of files across a tree of
> directories so that no one directory is filled with too many files
> for efficient filesystem access), the easy solution is to just append
> the unique key that you are hashing to the files involved.

Neat. That adds directory levels if you do URI-to-filesystem-path mapping for the key, or runs into name-length limits if you fold "/" and, say, "%" and "." into %FF-style escape codes or base64 it. And name-length limitations are even more severe on many non-Unixish filesystems :).

I prefer the technique of storing the full text of the hash key in the stored object's metadata (i.e., in the cache) and comparing it to the requested key on retrieval. When a collision is detected, you can then do overflow processing (like most hashed dictionary data structures from CS-land) and seek the cached copy somewhere else in the cache, or just treat it like a cache miss.

MD5 is a good thing, but relying on its uniqueness for a site that needs to be reliable is a bit risky in my mind. YMMV; I just want folks to be aware of the issues.

- Barrie
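Barrie's store-the-full-key-and-compare idea, reduced to a toy (Python for illustration; the class and method names are hypothetical). A real cache would overflow to another slot or disk location; this sketch simply degrades a detected collision into a miss:

```python
import hashlib

class HashedCache:
    """Toy cache keyed by MD5 that stores the full key alongside the
    value and verifies it on retrieval, so a digest collision becomes
    a miss instead of silently serving the wrong entry."""

    def __init__(self):
        self._slots = {}

    def _digest(self, key):
        return hashlib.md5(key.encode("utf-8")).hexdigest()

    def set(self, key, value):
        # Store the original key with the value for later verification.
        self._slots[self._digest(key)] = (key, value)

    def get(self, key, default=None):
        entry = self._slots.get(self._digest(key))
        if entry is None or entry[0] != key:  # absent, or a collision
            return default
        return entry[1]
```

The cost is one string comparison per hit and a copy of each key in the store -- cheap insurance compared to one URI serving another's content.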
Re: [OT] Data store options
Bill Moseley wrote:
> I'm looking for a little discussion on selecting a data storage
> method, and I'm posting here because Cache::Cache is often discussed
> here (along with Apache::Session). ... I think an RDBMS is not
> required, as I'm only reading/writing and not doing any kind of
> selects on the data -- also, I could end up doing thousands of
> selects for a request. So far, performance has been good with the
> file system store.

Hey Bill,

I'll tell you about using MLDBM::Sync for this, of which I'm the author. MLDBM::Sync is a wrapper around MLDBM databases, which can be SDBM_File, GDBM_File, DB_File, and recently Tie::TextDir based. It provides the locking layer that you need to keep access to the database safe, without corrupting things or losing data. Depending on the OS, and whether the dbm is on something like a RAM disk, performance will vary.

MLDBM has long been a simple way to read and write complex data to a dbm file through an easy tied interface:

  $dbm{$key} = \%data;
  my $data = $dbm{$key};

What you get with MLDBM::Sync is the locking API, plus some other goodies like RAM caching and auto checksum keys if you like.

> 1) Read/write a single record.
> 2) Read anywhere from a few to thousands of records in a request.
>    This is the typical mod_perl-based request. I know the record IDs
>    that I need to read from another source. I basically need a way to
>    get some subset of records fast, by record ID.
> 3) Traverse the data store and read every record.

Regarding some of these specific issues... I wrote MLDBM::Sync specifically to handle #1 safely. For #2, there is an API that you can use like:

  tied(%hash)->Lock();  # OR tied(%hash)->ReadLock();
  ... do lots of reads/writes ...
  tied(%hash)->Unlock();

which can be used to improve the performance of multiple reads and writes between requests. You can use the locking strategy to do #3 really fast too, or more slowly without locking. I wrote this using the techniques I had long been using in Apache::ASP for $Session and $Application support, and recently bolted MLDBM::Sync in for these. I have been using MLDBM::Sync in production for something like 6 months to a year now as a stand-alone module, but only recently added support for Tie::TextDir.

> When reading (item 2) I have to create a Perl data structure from the
> data, which doesn't change. So, I want to store this in my record
> using Storable.pm. That can work with any data store, of course.

MLDBM supports this kind of thing natively, via:

  use MLDBM qw(DB_File Storable);  # use Storable for serializing

Below are some benchmarks from running bench/bench_sync.pl in the MLDBM::Sync distribution on my 2.2.14 Linux kernel on an ext2 fs. Only in my .23 dev release have I added the -n and --bundle options you see below. The bundle option in particular is the number of reads/writes per lock, which is used to improve performance. I would probably use GDBM_File in your position, as I am not sure that Tie::TextDir would scale as well past 1 files/entries.

Happy hacking!

--Josh
_____
Joshua Chamas              Chamas Enterprises Inc.
NodeWorks Founder          Huntington Beach, CA USA
http://www.nodeworks.com   1-714-625-4051

[MLDBM-Sync-0.21]# perl bench/bench_sync.pl

=== INSERT OF 50 BYTE RECORDS ===
Time for 100 writes + 100 reads for SDBM_File                0.14 seconds   12288 bytes
Time for 100 writes + 100 reads for MLDBM::Sync::SDBM_File   0.17 seconds   12288 bytes
Time for 100 writes + 100 reads for GDBM_File                3.00 seconds   18066 bytes
Time for 100 writes + 100 reads for DB_File                  4.10 seconds   20480 bytes
Time for 100 writes + 100 reads for Tie::TextDir .04         0.24 seconds    9096 bytes

=== INSERT OF 500 BYTE RECORDS ===
Time for 100 writes + 100 reads for SDBM_File                0.24 seconds 1297408 bytes
Time for 100 writes + 100 reads for MLDBM::Sync::SDBM_File   0.54 seconds  207872 bytes
Time for 100 writes + 100 reads for GDBM_File                2.98 seconds   63472 bytes
Time for 100 writes + 100 reads for DB_File                  4.29 seconds  114688 bytes
Time for 100 writes + 100 reads for Tie::TextDir .04         0.27 seconds   54096 bytes

=== INSERT OF 5000 BYTE RECORDS ===
(skipping test for SDBM_File 1024 byte limit)
Time for 100 writes + 100 reads for MLDBM::Sync::SDBM_File   1.35 seconds 1911808 bytes
Time for 100 writes + 100 reads for GDBM_File                4.11 seconds  832400 bytes
Time for 100 writes + 100 reads for
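Josh's Lock()/ReadLock()/Unlock() batching pattern -- one lock acquisition amortized over many reads and writes -- looks like this outside of MLDBM::Sync (a sketch in Python using flock() on a sidecar lock file; the `batch_lock` name is hypothetical, and this is POSIX-only):

```python
import fcntl
from contextlib import contextmanager

@contextmanager
def batch_lock(lock_path, shared=False):
    """Hold one flock() across many reads/writes, the same idea as
    MLDBM::Sync's Lock()/ReadLock()/Unlock() batching API."""
    mode = fcntl.LOCK_SH if shared else fcntl.LOCK_EX
    with open(lock_path, "a+") as fh:
        fcntl.flock(fh, mode)  # blocks until the lock is granted
        try:
            yield
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)
```

A shared lock lets concurrent readers proceed while writers take the exclusive lock. The trade-off is the one Josh describes: far fewer lock round-trips per request, at the cost of holding the lock for longer stretches.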
mod_dir DirectoryIndex vs. $r->uri() manipulation
Hello again modperlers,

I am using perl 5.6.1 / Apache 1.3.20 / mod_perl 1.26. I have an interesting problem which maybe someone could shed some light on.

I have an early-phase handler (can be done as either a PerlPostReadRequestHandler or a PerlTransHandler) which manipulates the request URI. The purpose of the handler is to strip out the first segment of the URI path and employ that as a user type string. I presently have it as a TransHandler which returns DECLINED, expecting Apache to continue URI translation based on the new URI setting.

For almost all types of URIs this functionality is working: it works for files in the DocumentRoot, it works for PerlHandlers declared in Location blocks, and it works for Alias-mapped paths. Only one thing breaks...

The one URI type which isn't resolved correctly is DirectoryIndex-resolved paths. If mod_autoindex is on, then an AutoIndex directory listing is displayed. If mod_autoindex is off, a NOT FOUND error is displayed. If I access the same directory without the URI manipulation (and without the extra user type segment), the DirectoryIndex-specified file is served as expected.

I thought to myself, maybe the mod_dir resolution wants to be before any declared PerlTransHandler (even though mod_autoindex comes after), so I tried moving my handler to a PostReadRequestHandler. Same effect though -- something is short-circuiting the DirectoryIndex function of mod_dir.

Anyone else encounter this? Any ideas how to resolve?

Thanks,
--
David Pisoni -- [EMAIL PROTECTED]
Education is a progressive discovery of our own ignorance. -Will Durant, historian (1885-1981)
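For reference, the URI surgery David describes -- peel off the first path segment as a user type, leave the remainder for normal translation -- amounts to this (a hypothetical helper sketched in Python; the real handler would do the equivalent on $r->uri):

```python
def split_user_type(uri):
    """Split '/<usertype>/rest/of/path' into ('usertype', '/rest/of/path').

    Mirrors what the trans handler described above does before
    returning DECLINED so Apache re-translates the shortened URI."""
    parts = uri.lstrip("/").split("/", 1)
    user_type = parts[0]
    rest = "/" + parts[1] if len(parts) > 1 else "/"
    return user_type, rest
```

The string manipulation is the easy part; the interesting failure mode in the post is that after the rewrite, bare-directory requests stop being resolved by mod_dir's DirectoryIndex.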
Re: [JOB] Red Hat Network Web Engineer positions open
Jiminy Crickets, Chip! Yer makin' me drool! If it weren't for the "Move to NC" part, I'd be doing my best to be your new closest buddy ;-)

--Jon Robison
Uniphied Thought, LLC
Dearborn, MI

Chip Turner wrote:

Hello all,

We have a couple of openings doing intense and interesting mod_perl work here at Red Hat. The formal description is below. Key skills are Perl, mod_perl, Apache, and DBI (especially Oracle). Must relocate to Research Triangle Park, North Carolina.

Job Title: Web Engineer

Red Hat is looking for some skilled web engineers to design, enhance, and implement applications on the Red Hat web site. You will work on a global team to create a best-of-breed web site and implement web services that support Red Hat's Open Source products and services. The ideal candidate will have the following characteristics:

1. Ability to learn. Candidates must show they can adapt to the rapidly changing technology landscape.
2. Ability to communicate. Candidates must show that they can express ideas with elegance and flair.
3. Mature technical ability. Candidates must understand how the web works from a low to a high level, and must also understand how quality software is designed and constructed. We want you to think as much as you code.

Other required skills:
- Four-year degree or equivalent industry experience.
- Previous experience developing web applications, preferably for a high-volume, revenue-based web site.
- Understanding of and experience with object-oriented design and coding.
- Experience using Perl with Apache, preferably mod_perl directly.
- Understanding of relational database concepts and SQL (Oracle preferred).
- Experience with HTML and HTML tools, as well as XML and XHTML.
- Knowledge of basic software engineering concepts: development models, testing, documentation, and revision control.

Pluses:
- Knowledge of Red Hat's products and services.
- Understanding of Apache 1.3.x configuration; knowledge of Apache internals and C module development also a plus.
- Expert Perl programming experience, including: Perl web applications under Apache and mod_perl, Apache modules, Perl modules, Perl OOP extensions.
- Languages: Java, JavaScript, Python, PHP, C, C++.
- Web technology: XML, XSLT and XML-RPC; internationalization; application servers (HTML::Mason, Embperl, Apache::ASP, Zope, Zentropa, AxKit, Cocoon, etc.); search systems and concepts; cookies and authentication mechanisms.
- Oracle database and Oracle Applications (ERP, CRM) experience.

--
Chip Turner [EMAIL PROTECTED]
Red Hat Network
Mac OSX 10.1 - mod_perl build instructions
Apparently, mod_perl wants to be built static into Apache on OS X, and yet wants to use mod_so to load any additional thingies like Apache::Request or Apache::Template. So after many hours of trying different combinations of things, I finally yelled out "Yippee Skippee" (a phrase my dad firmly installed in my brain) when I stumbled across this combination of things. So, here's the current, very nicely working config.status for Apache, and the build instructions for mod_perl with Apache:

1) Unpack mod_perl from the CPAN, and execute:

  sudo perl Makefile.PL \
    APACHE_SRC=/opt/apache/1.3.22/src/apache_1.3.22/src \
    NO_HTTPD=1 \
    USE_APACI=1 \
    PREP_HTTPD=1 \
    EVERYTHING=1
  # hit return a bunch of times
  sudo make all install

2) Then go to your Apache installation, and execute:

  sudo ./configure \
    --with-layout=GNU \
    --prefix=/opt/apache/1.3.22 \
    --activate-module=src/modules/perl/libperl.a \
    --enable-module=most \
    --enable-shared=max \
    --disable-shared=perl
  sudo make all install

fixing prefix to be wherever you want. I'm sticking all local installs under /opt/PACKAGE/VERSION/* so I can quickly see what I've added. I usually ln -s /opt/PACKAGE/VERSION/bin/* to /usr/local/bin so that I can invoke the commands, though.

And there it is... so nobody else has to tweak the same way I did.

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
[EMAIL PROTECTED] <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!