[PERFORM] Mass file imports
Hello, I'm looking for advice on loading 50,000+ files, each under 32KB, into a 7.3.2 database. The table is simple:

  create table files (
      id   int8 not null primary key,
      file text not null,
      size int8 not null,
      uid  int  not null,
      raw  oid
  );

The script (currently bash) pulls a TAR file out of a queue, unpacks it to a large ramdisk mounted with noatime, and performs a battery of tests on the files included in the TAR file. For each file in the TAR it will add the following to a SQL file:

  update files set raw=lo_import('/path/to/file/from/tar')
   where file='/path/to/file/from/tar';

This file begins with BEGIN; and ends with END; and is fed to Postgres via a "psql -f sqlfile" command. This part of the process can take anywhere from 30 to over 90 minutes depending on the number of files included in the TAR file.

System:
 - RedHat 7.3 running a current 2.4.20 RedHat kernel
 - dual PIII 1.4GHz, 2GB of memory
 - 512MB ramdisk (mounted noatime)
 - mirrored internal SCSI160 10k rpm drives for OS and swap
 - 1 PCI 66MHz 64bit QLA2300
 - 1 Gbit SAN with several RAID5 LUNs on a Hitachi 9910
 - all filesystems ext3

Any thoughts?

Greg
--
Greg Spiegelberg
Sr. Product Development Engineer
Cranel, Incorporated.
Phone: 614.318.4314
Fax:   614.431.8388
Email: [EMAIL PROTECTED]
Cranel. Technology. Integrity. Focus.

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
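For reference, the SQL-file generation step described above can be sketched as follows. The table name and the lo_import() call come from the post; the temp directory and sample files here are stand-ins for the ramdisk the TAR is unpacked to.

```shell
#!/bin/sh
# Sketch of building the batch import file. The real script would point
# UNPACK_DIR at the ramdisk where the TAR was unpacked; a temp dir with
# two fake files keeps this sketch self-contained and runnable.
UNPACK_DIR=$(mktemp -d)
SQLFILE=$(mktemp)

echo "payload one" > "$UNPACK_DIR/file1.dat"
echo "payload two" > "$UNPACK_DIR/file2.dat"

echo "BEGIN;" > "$SQLFILE"
# One lo_import() update per unpacked file; the whole batch runs in a
# single transaction, as the BEGIN;/END; wrapper implies.
find "$UNPACK_DIR" -type f | while read -r f; do
    printf "update files set raw=lo_import('%s') where file='%s';\n" \
        "$f" "$f" >> "$SQLFILE"
done
echo "END;" >> "$SQLFILE"

cat "$SQLFILE"
# The generated file is then fed to the backend in one shot:
#   psql -f "$SQLFILE" dbname
```

One transaction around the whole batch avoids a commit per file, which is why the script wraps everything in BEGIN;/END; rather than letting psql autocommit each update.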
Re: [PERFORM] opinion on RAID choice
Ron Johnson wrote:
> On Tue, 2003-09-02 at 11:14, Andrew Sullivan wrote:
>> On Thu, Aug 28, 2003 at 03:26:14PM -0600, scott.marlowe wrote:
>>> My experience has been that once you get past 6 disks, RAID5 is
>>> faster than RAID1+0.
>> Also depends on your filesystem and volume manager. As near as I can
>> tell, you do _not_ want to use RAID 5 with Veritas.
> Why should Veritas care? Or is it that Veritas has a high overhead of
> small block writes?

I agree with Scott, but only when it's hardware RAID 5, and only certain hardware implementations of it. A Sun A1000 RAID 5 is not equal to a Sun T3. Putting disk technologies aside, the A1000 array's XOR function is in software whereas the T3's is implemented in hardware. Additionally, most external hardware-based RAID systems have some form of battery backup to ensure all data is written.

Veritas Volume Manager, and even Linux, HP-UX, and AIX LVM, work just fine for slicing & dicing, but not for stitching LUNs together. IMHO, if you have the $$ for VxVM, buy a hardware-based RAID solution as well and let it do the work.

Greg
[PERFORM] Compare rows
All,

Does anyone have any suggestions on how to efficiently compare rows in the same table? This table has 637 columns to be compared and 642 total columns.

TIA,
Greg
Re: [PERFORM] Compare rows
Josh Berkus wrote:
> Greg,
>> Anyone have any suggestions on how to efficiently compare rows in the
>> same table? This table has 637 columns to be compared and 642 total
>> columns.
> 637 columns? Are you sure that's normalized? It's hard for me to
> conceive of a circumstance where that many columns would be necessary.
> If this isn't a catastrophic normalization problem (which it sounds
> like), then you will probably still need to work through procedural
> normalization code, as SQL simply doesn't offer any way around naming
> all the columns by hand.
> Perhaps you could describe the problem in more detail?

The data represents metrics at a point in time on a system for network, disk, memory, bus, controller, and so on: Rx, Tx, errors, speed, and whatever else can be gathered. We arrived at this one 642-column table after testing the whole process, from data gathering through methods of temporarily storing and then loading into the database. Initially, 37+ tables were in use, but the one big 'un has saved us over 3.4 minutes.

The reason for my initial question was this: we save changes only. In other words, if system S has row T1 for day D1, and on day D2 we have another identical row T1 (excluding our time column), we don't want to save it. That said, if the 3.4 minutes gets burned during the comparison which saves changes only, we may look at reverting to separate tables. There are only 1,700 to 3,000 rows on average per load.

Oh: PostgreSQL 7.3.3, PHP 4.3.1, RedHat 7.3, kernel 2.4.20-18.7smp, 2x1.4GHz PIII, 2GB memory, and 1 Gbit SAN w/ Hitachi 9910 LUNs.

Greg
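Since SQL offers no way around naming all 637 columns by hand, one option is to generate the comparison query instead of typing it. The sketch below is an assumption-heavy illustration, not the poster's actual code: the table name "metrics", load table "metrics_load", key column "system", and time column "sample_time" are all hypothetical, and a short stand-in column list replaces the real 637 (in practice it could be pulled from pg_attribute).

```shell
#!/bin/sh
# Sketch: generate a "save changes only" insert that skips rows
# identical (time column excluded) to the latest stored row.
TIME_COL="sample_time"

# Stand-in column list; the real one could come from the catalog, e.g.
#   psql -Atc "select attname from pg_attribute
#              where attrelid = 'metrics'::regclass and attnum > 0"
COLS="sample_time rx_bytes tx_bytes rx_errors tx_errors speed"

WHERE=""
for c in $COLS; do
    [ "$c" = "$TIME_COL" ] && continue   # time column is excluded
    # IS DISTINCT FROM treats NULLs as comparable, unlike <>.
    [ -n "$WHERE" ] && WHERE="$WHERE
    or "
    WHERE="${WHERE}new.$c is distinct from old.$c"
done

QUERY=$(cat <<EOF
insert into metrics
select new.*
  from metrics_load new
  left join metrics old
    on old.system = new.system
   and old.$TIME_COL = (select max($TIME_COL) from metrics m
                         where m.system = new.system)
 where old.system is null
    or $WHERE;
EOF
)
echo "$QUERY"
```

With only 1,700 to 3,000 rows per load, a single generated statement like this keeps the comparison inside the database instead of fetching rows out to compare in the application.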