Re: Data Archiving
On 08/04/13 20:59, Rob Owens wrote:
> On Mon, Apr 08, 2013 at 09:30:52AM -0700, Gary Roach wrote:
>> As far as OCR vs retyping vs scan and preses - still up in the air at this point. I suspect that all three methods might be used.
> There are commercial companies that will do bulk scanning and OCR. I used one in the past and I found the price to be pretty reasonable. I'm not sure if anybody has OCR software that works on hand-written documents, though...
> -Rob

If the database / digital storage space is not a problem, I'd rather suggest storing the scanned documents as they are (in a graphical format) - you never know when this may come in handy. Sometimes you want to see what the document looked like, not just read its content. Keep an OCRed version as well for searching purposes, linked with the images.

I remember there was a (commercial, MS Windows) Russian OCR program some years ago that worked really well at the time; it could probably handle handwriting to some extent: http://finereader.abbyy.com/ I never used it for handwritten text, though. Another option, if you have the funds, is to hire some teenagers or students who'd like to earn some additional money. Both ways need proofreading afterwards.

I'm not sure which database would be good for storing images, or whether MySQL can manage them efficiently; perhaps PostgreSQL or another solution is better. You could also store the images on the drive and keep links (paths) to the images in the database.

Did you try contacting other institutions which might have similar needs - what do they use?

Kind regards, Michal
--
Michal R. Hoffmann

-- To UNSUBSCRIBE, email to debian-user-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org Archive: http://lists.debian.org/5165c6c2.20...@o2.pl
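A minimal sketch of the layout Michal describes: the scans stay on disk, and the database holds only the path plus the OCRed text for searching. SQLite (from the Python standard library) stands in here for MySQL or PostgreSQL purely so the example is self-contained; the table name, column names and sample paths are hypothetical.

```python
import sqlite3

def open_archive(db_path=":memory:"):
    """Create (if needed) a small catalogue of scanned documents."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS document (
               id INTEGER PRIMARY KEY,
               title TEXT NOT NULL,
               doc_date TEXT,            -- ISO date, e.g. '1952-03-14'
               image_path TEXT NOT NULL, -- scan stays on disk; only its path is stored
               ocr_text TEXT             -- proofread OCR or retyped text, may be NULL
           )"""
    )
    return conn

def add_document(conn, title, doc_date, image_path, ocr_text=None):
    """Register one scanned document and return its row id."""
    cur = conn.execute(
        "INSERT INTO document (title, doc_date, image_path, ocr_text) "
        "VALUES (?, ?, ?, ?)",
        (title, doc_date, image_path, ocr_text),
    )
    conn.commit()
    return cur.lastrowid

def search(conn, word):
    """Return (title, image_path) for documents whose text mentions word."""
    rows = conn.execute(
        "SELECT title, image_path FROM document "
        "WHERE ocr_text LIKE ? ORDER BY doc_date",
        (f"%{word}%",),
    )
    return rows.fetchall()
```

A search hit gives back the path of the original image, so the reader can always look at the scan itself rather than trusting the OCR.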
Re: Data Archiving
On Sun, Apr 07, 2013 at 01:57:51PM -0700, Gary Roach wrote:
> Hi all,
>
> I have a records archiving problem and don't know where to start. There are 100 years of records that include hand written material, type written hard copy, photos and a lot of email. I would like to have a system based around mysql (if possible) that would allow flexible data mining. Are there any good books on the subject or existing Debian software that would do the job.
>
> Any suggestions will be sincerely appreciated. An answer of I'm an idiot for even trying is acceptable.

No answers here, but here's[1] a similar question asked on slashdot recently.

[1] http://ask.slashdot.org/story/13/04/07/1926229/ask-slashdot-open-source-for-bill-and-document-management

> Thanks in advance
> Gary R
Re: Data Archiving
On 04/07/2013 05:56 PM, Zenaan Harkness wrote:
> On 4/8/13, Gary Roach gary719_li...@verizon.net wrote:
>> Hi all,
>>
>> I have a records archiving problem and don't know where to start. There are 100 years of records that include hand written material, type written hard copy, photos and a lot of email. I would like to have a system based around mysql (if possible) that would allow flexible data mining.
> How many records? How many people will be entering these records? Are you going to OCR or type some of them, or just scan them in and record data about each record?
>
> "flexible data mining" sounds a bit sales pitch-y, but that might just be me.
>
> Is this for you, or a company, or a library? Or do you want to write some software to do this sort of thing? Or is it a University project?

Ok, the organization is the Unitarian Universalist Church of Long Beach, CA. We have been around since 1913. I recently got stuck with the job of Church Historian and am concerned about the closet full of records going back to day one. Record organization is virtually non-existent. Our minister recently tried to put together a quick history and it took her 2 weeks of digging to find anything. We also have problems like Board meeting minutes with motions that have long since been forgotten. There is a small but constant demand for information on past events (especially pictures). There are other needs too numerous to list here.

I agree that data mining is the latest buzzword out there but couldn't think of a better description for what I need. Being able to retrieve data by categories and then sub-categories would be essential.

As far as OCR vs retyping vs scan and preses - still up in the air at this point. I suspect that all three methods might be used.

We have some really sharp computer types that maintain our database, email and web site but no one is working with archiving.

I would like to be able to cut something like MySQL to our needs (some minimal experience here) but really don't know how to put such a system together. A good book on the subject or an extensive online treatise would help.

Gary R.
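One way the "categories and then sub-categories" retrieval Gary asks for could look, as a sketch only: each category row points at its parent, and a recursive query collects a category together with everything filed beneath it. SQLite is used so the example runs without a server; the schema, the category names and the record titles are invented for illustration, and recursive queries need a reasonably recent database version whichever engine is chosen.

```python
import sqlite3

SCHEMA = """
CREATE TABLE category (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    parent_id INTEGER REFERENCES category(id)  -- NULL for a top-level category
);
CREATE TABLE record (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    category_id INTEGER NOT NULL REFERENCES category(id)
);
"""

def records_under(conn, category_name):
    """Titles of records filed under a category or any of its sub-categories."""
    rows = conn.execute(
        """
        WITH RECURSIVE subtree(id) AS (
            SELECT id FROM category WHERE name = ?
            UNION ALL
            SELECT c.id FROM category c JOIN subtree s ON c.parent_id = s.id
        )
        SELECT r.title FROM record r
        WHERE r.category_id IN (SELECT id FROM subtree)
        ORDER BY r.title
        """,
        (category_name,),
    )
    return [title for (title,) in rows]

def demo():
    """Build a tiny in-memory archive with made-up sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO category VALUES (?, ?, ?)",
        [(1, "Governance", None), (2, "Board minutes", 1), (3, "Photos", None)],
    )
    conn.executemany(
        "INSERT INTO record (title, category_id) VALUES (?, ?)",
        [("Minutes 1952-03", 2), ("Bylaws 1913", 1), ("Picnic 1960", 3)],
    )
    return conn
```

Asking for "Governance" in the demo data returns both the bylaws and the board minutes, because "Board minutes" is a sub-category of it.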
Re: Data Archiving
On 4/8/13 7:30 PM, Gary Roach wrote:
> Ok, the organization is the Unitarian Universalist Church of Long Beach CA. We have been around since 1913. I recently got stuck with the job of Church Historian and am concerned about the closet full of records going back to day one.
[snip]

Is there a historical archive in the region where you might get advice? Or perhaps a library school with an archival line with students in need of projects or practical training? Helping solve this kind of problem can be very interesting for the right people, if you can find them.

Regards,
/Lars
Re: Data Archiving
On Mon, Apr 08, 2013 at 09:30:52AM -0700, Gary Roach wrote:
> As far as OCR vs retyping vs scan and preses - still up in the air at this point. I suspect that all three methods might be used.

There are commercial companies that will do bulk scanning and OCR. I used one in the past and I found the price to be pretty reasonable. I'm not sure if anybody has OCR software that works on hand-written documents, though...

-Rob
Data Archiving
Hi all,

I have a records archiving problem and don't know where to start. There are 100 years of records that include handwritten material, typewritten hard copy, photos and a lot of email. I would like to have a system based around MySQL (if possible) that would allow flexible data mining. Are there any good books on the subject or existing Debian software that would do the job?

Any suggestions will be sincerely appreciated. An answer of "I'm an idiot for even trying" is acceptable.

Thanks in advance
Gary R
Re: Data Archiving
On 4/8/13, Gary Roach gary719_li...@verizon.net wrote:
> Hi all,
>
> I have a records archiving problem and don't know where to start. There are 100 years of records that include hand written material, type written hard copy, photos and a lot of email. I would like to have a system based around mysql (if possible) that would allow flexible data mining.

How many records? How many people will be entering these records? Are you going to OCR or type some of them, or just scan them in and record data about each record?

"flexible data mining" sounds a bit sales pitch-y, but that might just be me.

Is this for you, or a company, or a library? Or do you want to write some software to do this sort of thing? Or is it a University project?
Re: [semi-OT] Data archiving (was Re: Query on adding a USB hdd)
Douglas Allan Tutty wrote:
> On Wed, May 23, 2007 at 10:49:51PM -0500, Ron Johnson wrote:
>> On 05/23/07 20:17, Douglas Allan Tutty wrote:
>>> On Wed, May 23, 2007 at 07:05:23PM -0500, Ron Johnson wrote:
>>>>> It would be very nice if there was a universal cross-platform rw + encrypt filesystem for archives. Something that you could be confident that you could decrypt and access in 10 years using whatever OS was current then.
>>>> tar is cross-platform, as is ASCII CSV. PGP/GPG is also cross-platform.
>>> I don't know if a generic tarball I make today will be readable by whatever OS in 10 years, which is why I store a current install cd. In 10 years, hopefully I can find a computer that will boot it.
>> You've got bigger problems if you think that a CD-R will keep its integrity for 10 years.
> No. I figure a CD is good for at least a year. Every year, I pull the two netinst cds from the bank, take an SHA hash and compare it with the written notes, then run something like cdck on them. So far, my Woody CDs are fine. Funny enough, so is my woody floppy set (the whole shebang set of 20 floppies) on Maxell floppies; needed for my 486 that doesn't boot from CD or run an installer after woody's.

Wow, you seem to have a lot of spare time. How long does it take you to perform all these checks?

I do backups to USB hard drives. They hold 40GB to 120GB and it takes on the order of 1 minute / GB to diff -r them. Considering lifetime and how often you're able to rewrite / reuse the media, they are cheaper per GB than CDs/DVDs.

I guess I will realize soon enough when USB and/or ext3 are phased out and replaced by a different technology, and will then move the data to new media / formats.

Johannes
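The `diff -r` check Johannes mentions compares a source tree against its backup file by file. A rough Python equivalent is sketched below; the function name is made up for this example, and unlike `diff -r` it only walks the source side, so files that exist solely in the backup are not reported.

```python
import filecmp
import os

def tree_differences(src, backup):
    """Return sorted relative paths that are missing from, or differ in, the backup."""
    problems = []
    for root, _dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        for name in files:
            rel_path = os.path.normpath(os.path.join(rel, name))
            original = os.path.join(src, rel_path)
            copy = os.path.join(backup, rel_path)
            # shallow=False forces a byte-by-byte comparison of the contents,
            # instead of trusting size and modification time alone.
            if not os.path.isfile(copy) or not filecmp.cmp(original, copy, shallow=False):
                problems.append(rel_path)
    return sorted(problems)
```

An empty list means every file under `src` has an identical copy in the backup; reading every byte is also what makes the check take on the order of minutes per gigabyte.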
Re: [semi-OT] Data archiving (was Re: Query on adding a USB hdd)
On Fri, May 25, 2007 at 10:41:18AM +0200, Johannes Wiedersich wrote:
> Douglas Allan Tutty wrote:
>> No. I figure a CD is good for at least a year. Every year, I pull the two netinst cds from the bank, take an SHA hash and compare it with the written notes, then run something like cdck on them. So far, my Woody CDs are fine. Funny enough, so is my woody floppy set (the whole shebang set of 20 floppies) on Maxell floppies; needed for my 486 that doesn't boot from CD or run an installer after woody's.
> Wow, you seem to have a lot of spare time. How long does it take you to perform all these checks?

My backup set isn't that large so only an hour or so.

> I do backups to usb hard drives. They have 40GB to 120GB and it takes on the order of 1 minute / GB to diff -r them.

So how often do you fsck -c the filesystem so that it attempts to read every block, so that in turn the drive hardware can handle fading sectors? Hard drives on a shelf aren't maintenance-free either.

> Considering lifetime and how often you're able to rewrite / reuse the media, they are cheaper per GB than CDs/DVDs.

True. However, for a small data set (under 1 GB) the need for three copies means three hard drives. Using a hard drive and rewriting over it means that you lose old archives.
Re: [semi-OT] Data archiving (was Re: Query on adding a USB hdd)
Douglas Allan Tutty wrote:
> On Fri, May 25, 2007 at 10:41:18AM +0200, Johannes Wiedersich wrote:
>> Douglas Allan Tutty wrote:
>>> No. I figure a CD is good for at least a year. Every year, I pull the two netinst cds from the bank, take an SHA hash and compare it with the written notes, then run something like cdck on them. So far, my Woody CDs are fine. Funny enough, so is my woody floppy set (the whole shebang set of 20 floppies) on Maxell floppies; needed for my 486 that doesn't boot from CD or run an installer after woody's.
>> Wow, you seem to have a lot of spare time. How long does it take you to perform all these checks?
> My backup set isn't that large so only an hour or so.
>> I do backups to usb hard drives. They have 40GB to 120GB and it takes on the order of 1 minute / GB to diff -r them.
> So how often do you fsck -c the filesystem so that it attempts to read every block, so that in turn the drive hardware can handle fading sectors? Hard drives on a shelf aren't maintenance-free either.

My point was not that they are maintenance-free. My point was that both backups and maintenance are much faster.

>> Considering lifetime and how often you're able to rewrite / reuse the media, they are cheaper per GB than CDs/DVDs.
> True. However, for a small data set (under 1 GB) the need for three copies means three hard drives. Using a hard drive and rewriting over it means that you lose old archives.

If you have 1GB of data and, say, a 40GB hard disk, that means about 40 full backups on each. With incremental backups those would last much, much longer. For your three disks you'd have 120 full backups! Of course in the case of failure you'd lose 40 of them, instead of losing one unreadable CD, but I consider checking 120 CDs for unreadable sectors etc. a nightmare.

Just my .02

Johannes

NB: My backup system started out as DVD-RAMs, since those are said to be more reliable than CDs/DVDs. I gave up on that scheme when the data to be backed up started to exceed 2 disks (and when my first DVD-RAM died - probably related to the death of my laptop's multiburner). I now use CDs, DVD-RAMs etc. for non-mission-critical stuff only.
Re: [semi-OT] Data archiving (was Re: Query on adding a USB hdd)
On 05/25/07 09:04, Johannes Wiedersich wrote:
> Douglas Allan Tutty wrote:
[snip]
>> True. However, for a small data set (under 1 GB) the need for three copies means three hard drives. Using a hard drive and rewriting over it means that you lose old archives.
> If you have 1GB of data and, say, a 40GB hard disk, that means about 40 full backups on each. With incremental backups those would last much, much longer. For your three disks you'd have 120 full backups! Of course in the case of failure you'd lose 40 of them, instead of losing one unreadable CD, but I consider checking 120 CDs for unreadable sectors etc. a nightmare.

But isn't that putting all your eggs in one basket? (Unless I'm mis-reading you.)

--
Ron Johnson, Jr.
Jefferson LA USA

Give a man a fish, and he eats for a day. Hit him with a fish, and he goes away for good!
Re: [semi-OT] Data archiving (was Re: Query on adding a USB hdd)
Ron Johnson wrote:
> On 05/25/07 09:04, Johannes Wiedersich wrote:
>> Douglas Allan Tutty wrote:
> [snip]
>>> True. However, for a small data set (under 1 GB) the need for three copies means three hard drives. Using a hard drive and rewriting over it means that you lose old archives.
>> If you have 1GB of data and, say, a 40GB hard disk, that means about 40 full backups on each. With incremental backups those would last much, much longer. For your three disks you'd have 120 full backups! Of course in the case of failure you'd lose 40 of them, instead of losing one unreadable CD, but I consider checking 120 CDs for unreadable sectors etc. a nightmare.
> But isn't that putting all your eggs in one basket? (Unless I'm mis-reading you.)

3 disks in three different locations (according to Douglas' requirements). You'd have backups 1,4,7,... on the first disk, 2,5,8,... on the second and so forth.

Johannes
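The rotation and the capacity figures in this exchange are simple arithmetic, sketched here with hypothetical helper names. The capacity figure ignores filesystem overhead and assumes every full backup is the same size.

```python
def disk_for_backup(n, disks=3):
    """1-based number of the disk that receives backup number n (n starts at 1)."""
    if n < 1:
        raise ValueError("backup numbers start at 1")
    return (n - 1) % disks + 1

def full_backups_that_fit(disk_gb, data_gb, disks=1):
    """How many full copies fit on the given disks, ignoring filesystem overhead."""
    return disks * (disk_gb // data_gb)
```

With three disks, backups 1, 4, 7 land on disk 1 and backups 2, 5, 8 on disk 2, and three 40GB disks hold 120 full copies of a 1GB data set. Losing one disk therefore costs every third backup rather than a contiguous run of history.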
Re: [semi-OT] Data archiving (was Re: Query on adding a USB hdd)
On 05/25/07 13:44, Johannes Wiedersich wrote:
> Ron Johnson wrote:
[snip]
>> But isn't that putting all your eggs in one basket? (Unless I'm mis-reading you.)
> 3 disks in three different locations (according to Douglas' requirements). You'd have backup 1,4,7,... on the first disk 2,5,8,... on the second and so forth.

Roger that!

--
Ron Johnson, Jr.
Jefferson LA USA

Give a man a fish, and he eats for a day. Hit him with a fish, and he goes away for good!
Re: [semi-OT] Data archiving (was Re: Query on adding a USB hdd)
On Wed, May 23, 2007 at 10:49:51PM -0500, Ron Johnson wrote:
> On 05/23/07 20:17, Douglas Allan Tutty wrote:
>> On Wed, May 23, 2007 at 07:05:23PM -0500, Ron Johnson wrote:
>>>> It would be very nice if there was a universal cross-platform rw + encrypt filesystem for archives. Something that you could be confident that you could decrypt and access in 10 years using whatever OS was current then.
>>> tar is cross-platform, as is ASCII CSV. PGP/GPG is also cross-platform.
>> I don't know if a generic tarball I make today will be readable by whatever OS in 10 years, which is why I store a current install cd. In 10 years, hopefully I can find a computer that will boot it.
> You've got bigger problems if you think that a CD-R will keep its integrity for 10 years.

No. I figure a CD is good for at least a year. Every year, I pull the two netinst cds from the bank, take an SHA hash and compare it with the written notes, then run something like cdck on them. So far, my Woody CDs are fine. Funny enough, so is my woody floppy set (the whole shebang set of 20 floppies) on Maxell floppies; needed for my 486 that doesn't boot from CD or run an installer after woody's.

> Tape (using tar, and a media used by large data processing shops, since they are supported for a LONG LONG LONG time, unlike that gee whiz specialized crap that NASA seems to love) or SCSI hard drives (in external enclosures so you can spin them up annually) formatted ext2 or FAT32 are what I would choose.

Would you use tar to make a tarball and put it on a hard drive formatted ext2, copy as is to ext2 (changing ctime in the process), or forgo a filesystem and write tar directly to the raw disk?

What tar format would you use: GNU or Posix?

> FAT has been around for 26+ years, and ext2 is 14 years old.

Which is more resistant to bad blocks popping up after time in storage?

>> If I gpg a tarball today with whatever algorithm is current, in 10 years that algorithm may be long cracked. Will the gpg authors keep support for it?
> Perhaps. This is FLOSS. Save the source on a separate disk with SHA512 hash codes.
>
> And text is *the* guaranteed data format. Database backups should be text format extracts and Office documents should be in ODF format which is just zipped text.

Never heard of ODF, or is it specific to *Office programmes? Personally, I save my latex as latex. The original contents are plainly visible.

>> accessible? Perhaps the time capsule would have to include a whole computer and not just the archive media.
> Yes. If by computer you also mean the whole schmeer, including many tape drives since it's common on Big Systems to backup a single database in parallel to multiple tape drives.

Computers keep getting smaller. Computer could mean a little brick that has an interface for the archive drive, the archive drive unit, and some kind of user interface. RS-232C has been around for ever; will it be around for evermore? If the backup medium was hard disks, then an interface for the hard drive plus an enclosure for the drives if the brick didn't have the physical space.

Doug.
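Doug's yearly check (hash the medium, compare against a written note) and Ron's suggestion of SHA512 hash codes both come down to the same routine. The sketch below uses Python's standard `hashlib`; the function names are made up for this example, and `path` can be any readable file such as a disc image or a tarball.

```python
import hashlib

def file_digest(path, algorithm="sha512", chunk_size=1 << 20):
    """Hash a file in chunks so large images do not have to fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path, recorded_hex, algorithm="sha512"):
    """True if the file still matches the digest that was written down earlier."""
    return file_digest(path, algorithm) == recorded_hex.strip().lower()
```

The recorded digest has to be kept somewhere other than the medium being checked, which is what the written notes in the bank achieve.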
Re: [semi-OT] Data archiving (was Re: Query on adding a USB hdd)
On 05/24/07 08:47, Douglas Allan Tutty wrote:
> On Wed, May 23, 2007 at 10:49:51PM -0500, Ron Johnson wrote:
>> On 05/23/07 20:17, Douglas Allan Tutty wrote:
>>> On Wed, May 23, 2007 at 07:05:23PM -0500, Ron Johnson wrote:
>>>>> It would be very nice if there was a universal cross-platform rw + encrypt filesystem for archives. Something that you could be confident that you could decrypt and access in 10 years using whatever OS was current then.
>>>> tar is cross-platform, as is ASCII CSV. PGP/GPG is also cross-platform.
>>> I don't know if a generic tarball I make today will be readable by whatever OS in 10 years, which is why I store a current install cd. In 10 years, hopefully I can find a computer that will boot it.
>> You've got bigger problems if you think that a CD-R will keep its integrity for 10 years.
> No. I figure a CD is good for at least a year. Every year, I pull the two netinst cds from the bank, take an SHA hash and compare it with the written notes, then run something like cdck on them. So far, my Woody CDs are fine. Funny enough, so is my woody floppy set (the whole shebang set of 20 floppies) on Maxell floppies; needed for my 486 that doesn't boot from CD or run an installer after woody's.
>> Tape (using tar, and a media used by large data processing shops, since they are supported for a LONG LONG LONG time, unlike that gee whiz specialized crap that NASA seems to love) or SCSI hard drives (in external enclosures so you can spin them up annually) formatted ext2 or FAT32 are what I would choose.
> Would you use tar to make a tarball and put it on a hard drive formatted ext2, copy as is to ext2 (changing ctime in the process), or forgo a filesystem and write tar directly to the raw disk?

Copy a (possibly compressed) tarball to an ext2 volume. That way, you could have multiple timestamped or different project tarballs on the same device.

> What tar format would you use: GNU or Posix?

Good question. GNU tar does Posix. I'd have to research the differences between the two formats.

>> FAT has been around for 26+ years, and ext2 is 14 years old.
> Which is more resistant to bad blocks popping up after time in storage?

Is either? Maybe you'd also have to create PAR2 (forward error correction) files. http://en.wikipedia.org/wiki/PAR2

>>> If I gpg a tarball today with whatever algorithm is current, in 10 years that algorithm may be long cracked. Will the gpg authors keep support for it?
>> Perhaps. This is FLOSS. Save the source on a separate disk with SHA512 hash codes.
>>
>> And text is *the* guaranteed data format. Database backups should be text format extracts and Office documents should be in ODF format which is just zipped text.
> Never heard of ODF, or is it specific to *Office programmes? Personally, I save my latex as latex. The original contents are plainly visible.

> Never heard of ODF

It's the OpenOffice.org 2.0 document format, aka OASIS Open Document Format for Office Applications, ISO/IEC 26300:2006. http://en.wikipedia.org/wiki/OpenDocument

>>> accessible? Perhaps the time capsule would have to include a whole computer and not just the archive media.
>> Yes. If by computer you also mean the whole schmeer, including many tape drives since it's common on Big Systems to backup a single database in parallel to multiple tape drives.
> Computers keep getting smaller. Computer could mean a little brick that has an interface for the archive drive, the archive drive unit, and some kind of user interface. RS-232C has been around for ever; will it be around for evermore? If the backup medium was hard disks, then an interface for the hard drive plus an enclosure for the drives if the brick didn't have the physical space.

If you did that in 1990, you'd have put in an ISA controller. Five years ago, it would have been a PATA drive.

All these issues just go to show how difficult the subject of digital archiving is.

--
Ron Johnson, Jr.
Jefferson LA USA

Give a man a fish, and he eats for a day. Hit him with a fish, and he goes away for good!
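Ron's answer (timestamped, possibly compressed tarballs sitting on an ordinary filesystem) and Doug's GNU-versus-POSIX question can both be illustrated with Python's standard `tarfile` module, which can write either format. This is only a sketch: the function name and the naming scheme are assumptions made for the example, and `dest_dir` stands for wherever the archive volume is mounted.

```python
import os
import tarfile
import time

def make_archive(source_dir, dest_dir, project="archive"):
    """Write <project>-YYYYMMDD-HHMMSS.tar.gz into dest_dir and return its path."""
    stamp = time.strftime("%Y%m%d-%H%M%S", time.gmtime())
    out_path = os.path.join(dest_dir, f"{project}-{stamp}.tar.gz")
    top_level = os.path.basename(os.path.normpath(source_dir))
    # PAX_FORMAT is the POSIX.1-2001 format; tarfile.GNU_FORMAT would
    # select the GNU variant instead.
    with tarfile.open(out_path, "w:gz", format=tarfile.PAX_FORMAT) as tar:
        tar.add(source_dir, arcname=top_level)
    return out_path
```

Because the timestamp is part of the file name, several generations or several projects can share one volume without overwriting each other.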
Re: [semi-OT] Data archiving (was Re: Query on adding a USB hdd)
On Thu, May 24, 2007 at 12:03:15PM -0500, Ron Johnson wrote:
>> Never heard of ODF, or is it specific to *Office programmes? Personally, I save my latex as latex. The original contents are plainly visible.
>> Never heard of ODF
> It's the OpenOffice.org 2.0 document format, aka OASIS Open Document Format for Office Applications, ISO/IEC 26300:2006.

I've never used open office, or any other office-type product. The last word processor I used was WordPerfect 5.0 for OS/2. Transitioned from that to Lout when I went to Linux.

>>> Yes. If by computer you also mean the whole schmeer, including many tape drives since it's common on Big Systems to backup a single database in parallel to multiple tape drives.
>> Computers keep getting smaller. Computer could mean a little brick that has an interface for the archive drive, the archive drive unit, and some kind of user interface. RS-232C has been around for ever; will it be around for evermore? If the backup medium was hard disks, then an interface for the hard drive plus an enclosure for the drives if the brick didn't have the physical space.
> If you did that in 1990, you'd have put in an ISA controller. Five years ago, it would have been a PATA drive.

I guess that's my point. Once you choose a data medium, you would need to store at least one drive and a computer to access it. If your backup media was 3.5" IDE drives, then that means storing a computer that has at least one IDE port. Best to fill it up with memory SIMMs and other spare parts so that when the time-capsule is opened, hopefully the machine still boots. Also hopefully you can still connect the computer itself to whatever computers in the future look like to get the data off. Will anyone know what UTP is by then?

There is something to be said for casting something in plain text in bronze and gold plating it.

Somewhere between the two is a Jolly Psychic, or at least a Happy Medium :)

Doug.
Re: [semi-OT] Data archiving (was Re: Query on adding a USB hdd)
On 05/24/07 16:18, Douglas Allan Tutty wrote:
> On Thu, May 24, 2007 at 12:03:15PM -0500, Ron Johnson wrote:
>>> Never heard of ODF, or is it specific to *Office programmes? Personally, I save my latex as latex. The original contents are plainly visible.
>>> Never heard of ODF
>> It's the OpenOffice.org 2.0 document format, aka OASIS Open Document Format for Office Applications, ISO/IEC 26300:2006.
> I've never used open office, or any other office-type product. The last word processor I used was WordPerfect 5.0 for OS/2. Transitioned from that to Lout when I went to Linux.

Well, at least it's text...

>>>> Yes. If by computer you also mean the whole schmeer, including many tape drives since it's common on Big Systems to backup a single database in parallel to multiple tape drives.
>>> Computers keep getting smaller. Computer could mean a little brick that has an interface for the archive drive, the archive drive unit, and some kind of user interface. RS-232C has been around for ever; will it be around for evermore? If the backup medium was hard disks, then an interface for the hard drive plus an enclosure for the drives if the brick didn't have the physical space.
>> If you did that in 1990, you'd have put in an ISA controller. Five years ago, it would have been a PATA drive.
> I guess that's my point. Once you choose a data medium, you would need to store at least one drive and a computer to access it. If your backup media was 3.5" IDE drives, then that means storing a computer that has at least one IDE port. Best to fill it up with memory SIMMs and other spare parts so that when the time-capsule is opened, hopefully the machine still boots. Also hopefully you can still connect the computer itself to whatever computers in the future look like to get the data off. Will anyone know what UTP is by then?
>
> There is something to be said for casting something in plain text in bronze and gold plating it.

Buffered lignin-free paper.
http://www.ifla.org/IV/ifla64/044-114e.htm
http://www.nmnh.si.edu/naa/copar/bulletin14.htm
http://www.amigos.org/preservation/faq/preserve.html

> Somewhere between the two is a Jolly Psychic, or at least a Happy Medium :)

--
Ron Johnson, Jr.
Jefferson LA USA

Give a man a fish, and he eats for a day. Hit him with a fish, and he goes away for good!
Re: [semi-OT] Data archiving (was Re: Query on adding a USB hdd)
On Thu, May 24, 2007 at 06:08:21PM -0500, Ron Johnson wrote:
>> There is something to be said for casting something in plain text in bronze and gold plating it.
> Buffered lignin-free paper.

Burns. Bronze melts. Pottery breaks. Acid rain eats granite.

I guess the bottom line is that information that is not used is eventually lost. It must be taught always to new generations, either people or hardware/software.

Hard drives have spare sectors and reassign when sectors become degraded; they 'teach' a new sector the information from an old sector. If the drive is on the shelf, we have to spin it up and get the drive to test all sectors. When enough sectors get bad, SMART tells us so we can teach a new drive the old drive's data.

Data is never maintenance-free.

I know, Ron, I'm preaching to the choir.

Doug.
Re: [semi-OT] Data archiving (was Re: Query on adding a USB hdd)
On 05/24/07 18:58, Douglas Allan Tutty wrote:
> On Thu, May 24, 2007 at 06:08:21PM -0500, Ron Johnson wrote:
>>> There is something to be said for casting something in plain text in bronze and gold plating it.
>> Buffered lignin-free paper.
> Burns. Bronze melts. Pottery breaks. Acid rain eats granite.
>
> I guess the bottom line is that information that is not used is eventually lost. It must be taught always to new generations, either people or hardware/software.
>
> Hard drives have spare sectors and reassign when sectors become degraded; they 'teach' a new sector the information from an old sector. If the drive is on the shelf, we have to spin it up and get the drive to test all sectors. When enough sectors get bad, SMART tells us so we can teach a new drive the old drive's data.
>
> Data is never maintenance-free.
>
> I know, Ron, I'm preaching to the choir.

A once-a-year spin-up, fsck and unzip -t (of compressed tarballs) would be darned useful.

Externally-enclosed hard drives won't need to worry about whether they are SATA or IDE, but if USB ever goes away, it's time to migrate to a modern external drive.

--
Ron Johnson, Jr.
Jefferson LA USA

Give a man a fish, and he eats for a day. Hit him with a fish, and he goes away for good!
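The yearly test of compressed tarballs that Ron suggests can be sketched as follows for gzip-compressed archives, where the command-line equivalent is `gzip -t`. The function name is made up for this example. It checks only the compression layer: decompressing to the very end makes Python's `gzip` module verify the CRC stored in the file, but it does not look at the tar structure inside.

```python
import gzip
import zlib

def gzip_stream_ok(path, chunk_size=1 << 20):
    """Decompress the whole file and discard the output, like `gzip -t`."""
    try:
        with gzip.open(path, "rb") as f:
            # Reading to end-of-stream is what triggers the CRC and length check.
            while f.read(chunk_size):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        # Covers a bad header or CRC, a truncated file, and corrupt deflate data.
        return False
```

Run once a year over every tarball on the shelf drive, this doubles as the spin-up, since every block holding archive data has to be read.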
[semi-OT] Data archiving (was Re: Query on adding a USB hdd)
On 05/23/07 20:17, Douglas Allan Tutty wrote:
> On Wed, May 23, 2007 at 07:05:23PM -0500, Ron Johnson wrote:
> > > It would be very nice if there was a universal cross-platform
> > > rw + encrypt filesystem for archives. Something that you could
> > > be confident that you could decrypt and access in 10 years using
> > > whatever OS was current then.
> >
> > tar is cross-platform, as is ASCII CSV. PGP/GPG is also
> > cross-platform. Problem solved?
>
> Perhaps. I don't know if a generic tarball I make today will be
> readable by whatever OS in 10 years, which is why I store a current
> install CD. In 10 years, hopefully I can find a computer that will
> boot it.

You've got bigger problems if you think that a CD-R will keep its
integrity for 10 years.

Tape (using tar, and a media used by large data processing shops, since
they are supported for a LONG LONG LONG time, unlike that gee whiz
specialized crap that NASA seems to love) or SCSI hard drives (in
external enclosures so you can spin them up annually) formatted ext2 or
FAT32 are what I would choose. FAT has been around for 26+ years, and
ext2 is 14 years old.

> If I gpg a tarball today with whatever algorithm is current, in 10
> years that algorithm may be long cracked. Will the gpg authors keep
> support for it?

Perhaps. This is FLOSS. Save the source on a separate disk with SHA512
hash codes.

> If one relies on legacy hardware obscurity for off-site backup, what
> happens in a disaster and all the legacy hardware is toast? What if
> you can't buy replacement ancient hardware to read those backups?

Sungard. Two days after 9/11, they rolled in a truck to our
Westchester[0] data center and rolled out a lot of kit. In a week, that
site was back up and running.

[0] http://members.cox.net/ron.l.johnson/ACS_Government_Services.kmz

> When I was using OS/2, my backups were on QIC-80 IBM tape. That drive
> is not supported under Debian. Luckily, OS/2 was usable enough to
> allow me to transfer that tape data to a spare hard drive and OS/2
> and Linux had support for a few filesystems in common.

Perfect example of what I was talking about above. And text is *the*
guaranteed data format. Database backups should be text format extracts
and Office documents should be in ODF format which is just zipped text.

> Taking archiving to the limit, what would a time-capsule for
> electronic data look like? If you assumed that the software to
> extract the archive would be unavailable, you could include the
> source but what about the compiler? How would you get the source off
> if the filesystem is not accessible? Perhaps the time capsule would
> have to include a whole computer and not just the archive media.

Yes. If by computer you also mean the whole schmeer, including many
tape drives since it's common on Big Systems to back up a single
database in parallel to multiple tape drives.

> I guess this is why banks still save paper. Anyway, this has gone
> well off the original topic.

Not really, if OP's purpose was partly to save data.

--
Ron Johnson, Jr.
Jefferson LA USA

Give a man a fish, and he eats for a day.
Hit him with a fish, and he goes away for good!
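The advice above to keep the gpg source on a separate disk "with SHA512 hash codes" could look something like the sketch below. It is an illustration only. The manifest file name `SHA512SUMS` and the helper names `sha512_of`, `write_manifest` and `verify_manifest` are assumptions, not anything named in the thread. The manifest uses the "hash, two spaces, relative path" layout that `sha512sum` prints, so `sha512sum -c` should be able to read it too, at least for ordinary file names without newlines or backslashes.

```python
import hashlib
from pathlib import Path


def sha512_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def write_manifest(root, manifest_name="SHA512SUMS"):
    """Record a SHA-512 hash for every file under root (manifest name is hypothetical)."""
    root = Path(root)
    manifest = root / manifest_name
    lines = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path != manifest:
            lines.append(f"{sha512_of(path)}  {path.relative_to(root).as_posix()}")
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest


def verify_manifest(root, manifest_name="SHA512SUMS"):
    """Return the relative paths that are missing or whose hash no longer matches."""
    root = Path(root)
    failures = []
    for line in (root / manifest_name).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # ignore blank lines, e.g. in an empty manifest
        expected, _, name = line.partition("  ")
        target = root / name
        if not target.is_file() or sha512_of(target) != expected:
            failures.append(name)
    return failures
```

Running `write_manifest()` once when the disk is written and `verify_manifest()` at each yearly spin-up would show which files have gone bad. A copy of the manifest kept somewhere other than the archive disk guards against the manifest itself being damaged.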