Re: Data Archiving

2013-04-10 Thread MRH

On 08/04/13 20:59, Rob Owens wrote:

On Mon, Apr 08, 2013 at 09:30:52AM -0700, Gary Roach wrote:

As far as OCR vs retyping vs scan and preses - still up in the air
at this point. I suspect that all three methods might be used.


There are commercial companies that will do bulk scanning and OCR.  I
used one in the past and I found the price to be pretty reasonable.  I'm
not sure if anybody has OCR software that works on hand-written
documents, though...

-Rob




If database or disk space is not a problem, I'd suggest storing the
scanned documents as they are (as images); you never know when this may
come in handy. Sometimes you want to see what the document looked like,
not just read its content. Keep an OCRed version alongside for searching,
linked to the images.


I remember there was a commercial Russian OCR package for MS Windows
some years ago that worked really well at the time; it could probably
handle handwriting to some extent:

http://finereader.abbyy.com/
I never used it for handwritten text, though.
Another option, if you have the funds, is to hire some teenagers or
students who'd like to earn some extra money. Either way, the result
needs proofreading afterwards.


I'm not sure which database would be good for storing images, or whether
MySQL can manage them efficiently; perhaps PostgreSQL or another
solution is better. You could also store the images on disk and keep
links (paths) to them in the database.


Did you try contacting other institutions which might have similar needs 
- what do they use?


Kind regards,
Michal
--
Michal R. Hoffmann


--
To UNSUBSCRIBE, email to debian-user-requ...@lists.debian.org 
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org

Archive: http://lists.debian.org/5165c6c2.20...@o2.pl



Re: Data Archiving

2013-04-08 Thread Darac Marjal
On Sun, Apr 07, 2013 at 01:57:51PM -0700, Gary Roach wrote:
 Hi all,
 
 I have a records archiving problem and don't know where to start.
 There are 100 years of records that include hand written material,
 type written hard copy, photos and a lot of email. I would like to
 have a system based around mysql (if possible) that would allow
 flexible data mining.
 
 Are there any good books on the subject or existing Debian software
 that would do the job? Any suggestions will be sincerely
 appreciated. An answer of "I'm an idiot for even trying" is
 acceptable.

No answers here, but here's[1] a similar question asked on slashdot
recently.


[1] 
http://ask.slashdot.org/story/13/04/07/1926229/ask-slashdot-open-source-for-bill-and-document-management

 
 Thanks in advance
 
 Gary R
 
 


Re: Data Archiving

2013-04-08 Thread Gary Roach

On 04/07/2013 05:56 PM, Zenaan Harkness wrote:

On 4/8/13, Gary Roach gary719_li...@verizon.net wrote:

Hi all,

I have a records archiving problem and don't know where to start. There
are 100 years of records that include hand written material, type
written hard copy, photos and a lot of email. I would like to have a
system based around mysql (if possible) that would allow flexible data
mining.

How many records?

How many people will be entering these records?

Are you going to OCR or type some of them, or just scan them in and
record data about each record?

"flexible data mining" sounds a bit sales pitch-y, but that might just be me.

Is this for you, or a company, or a library? Or do you want to write
some software to do this sort of thing? Or is it a University project?


Ok, the organization is the Unitarian Universalist Church of Long Beach 
CA. We have been around since 1913. I recently got stuck with the job of 
Church Historian and am concerned about the closet full of records going 
back to day one. Record organization is virtually non-existent. Our 
minister recently tried to put together a quick history and it took her 
2 weeks of digging to find anything. We also have problems like Board 
meeting minutes with motions that have long since been forgotten. There 
is a small but constant demand for information on past events 
(especially pictures). There are other needs too numerous to list here.


I agree that data mining is the latest buzzword out there but couldn't 
think of a better description for what I need. Being able to retrieve 
data by categories and then sub-categories would be essential.
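
One way the category / sub-category retrieval could be modelled, as a
rough sketch only: each category row points at its parent, and a
recursive query collects a category together with everything below it.
The schema, the names and the use of sqlite3 instead of MySQL are all
assumptions for illustration; a recursive query like this needs a
database version that supports WITH RECURSIVE.

```python
import sqlite3

# Hypothetical schema: categories form a tree through parent_id.
SCHEMA = """
CREATE TABLE category (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    parent_id INTEGER REFERENCES category(id)
);
CREATE TABLE record (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    record_date TEXT,
    category_id INTEGER NOT NULL REFERENCES category(id)
);
"""


def records_under(db, category_name):
    """Return titles filed under a category or any of its sub-categories."""
    rows = db.execute(
        """
        WITH RECURSIVE subtree(id) AS (
            SELECT id FROM category WHERE name = ?
            UNION ALL
            SELECT c.id FROM category c JOIN subtree s ON c.parent_id = s.id
        )
        SELECT r.title FROM record r
        JOIN subtree s ON r.category_id = s.id
        ORDER BY r.record_date, r.title
        """,
        (category_name,),
    )
    return [title for (title,) in rows]
```

Asking for a top-level category then also returns whatever is filed in
its sub-categories, oldest first.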


As far as OCR vs retyping vs scan and preses - still up in the air at 
this point. I suspect that all three methods might be used.


We have some really sharp computer types that maintain our database, 
email and web site but no one is working with archiving. I would like to 
be able to tailor something like MySQL to our needs (some minimal 
experience here) but really don't know how to put such a system 
together. A good book on the subject or an extensive online treatise 
would help.


Gary R.



Archive: http://lists.debian.org/5162f0bc.6000...@verizon.net



Re: Data Archiving

2013-04-08 Thread Lars Noodén
On 4/8/13 7:30 PM, Gary Roach wrote:
 Ok, the organization is the Unitarian Universalist Church of Long Beach
 CA. We have been around since 1913. I recently got stuck with the job of
 Church Historian and am concerned about the closet full of records going
 back to day one. 
[snip]

Is there a historical archive in the region where you might get advice?
 Or perhaps a library school with an archival line with students in need
of projects or practical training?  Helping solve this kind of problem
can be very interesting for the right people, if you can find them.

Regards,
/Lars


Archive: http://lists.debian.org/5162f19e.8010...@gmail.com



Re: Data Archiving

2013-04-08 Thread Rob Owens
On Mon, Apr 08, 2013 at 09:30:52AM -0700, Gary Roach wrote:
 As far as OCR vs retyping vs scan and preses - still up in the air
 at this point. I suspect that all three methods might be used.
 
There are commercial companies that will do bulk scanning and OCR.  I
used one in the past and I found the price to be pretty reasonable.  I'm
not sure if anybody has OCR software that works on hand-written
documents, though...

-Rob


Archive: http://lists.debian.org/20130408195911.gb27...@aurora.owens.net



Data Archiving

2013-04-07 Thread Gary Roach

Hi all,

I have a records archiving problem and don't know where to start. There 
are 100 years of records that include hand written material, type 
written hard copy, photos and a lot of email. I would like to have a 
system based around mysql (if possible) that would allow flexible data 
mining.


Are there any good books on the subject or existing Debian software that 
would do the job? Any suggestions will be sincerely appreciated. An 
answer of "I'm an idiot for even trying" is acceptable.


Thanks in advance

Gary R



Archive: http://lists.debian.org/5161ddcf.1050...@verizon.net



Re: Data Archiving

2013-04-07 Thread Zenaan Harkness
On 4/8/13, Gary Roach gary719_li...@verizon.net wrote:
 Hi all,

 I have a records archiving problem and don't know where to start. There
 are 100 years of records that include hand written material, type
 written hard copy, photos and a lot of email. I would like to have a
 system based around mysql (if possible) that would allow flexible data
 mining.

How many records?

How many people will be entering these records?

Are you going to OCR or type some of them, or just scan them in and
record data about each record?

"flexible data mining" sounds a bit sales pitch-y, but that might just be me.

Is this for you, or a company, or a library? Or do you want to write
some software to do this sort of thing? Or is it a University project?


Archive: 
http://lists.debian.org/CAOsGNSRbQBdTBxVA-7_=djfubg8drkmnnhjdnzyf08f1uyw...@mail.gmail.com



Re: [semi-OT] Data archiving (was Re: Query on adding a USB hdd)

2007-05-25 Thread Johannes Wiedersich

Douglas Allan Tutty wrote:
 On Wed, May 23, 2007 at 10:49:51PM -0500, Ron Johnson wrote:
 On 05/23/07 20:17, Douglas Allan Tutty wrote:
 On Wed, May 23, 2007 at 07:05:23PM -0500, Ron Johnson wrote:

 It would be very nice if there was a universal cross-platform rw +
 encrypt filesystem for archives.  Something that you could be confident
 that you could decrypt and access in 10 years using whatever OS was
 current then.
 tar is cross-platform, as is ASCII CSV.  PGP/GPG is also cross-platform.
 
 I don't know if a generic tarball I make today will be readable by
 whatever OS in 10 years, which is why I store a current install cd.  In
 10 years, hopefully I can find a computer that will boot it.
 You've got bigger problems if you think that a CD-R will keep its
 integrity for 10 years.
 
 No.  I figure a CD is good for at least a year.  Every year, I
 pull the two netinst cds from the bank, take an SHA hash and compare it
 with the written notes, then run something like cdck on them.  So far,
 my Woody CDs are fine.  Funny enough, so is my woody floppy set (the
 whole shebang set of 20 floppies) on Maxell floppies; needed for my 486
 that doesn't boot from CD or run an installer after woody's.

Wow, you seem to have a lot of spare time. How long does it take you to
perform all these checks?

I do backups to USB hard drives. They hold 40GB to 120GB, and it takes on
the order of 1 minute / GB to diff -r them.
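
For anyone who wants the same kind of check from a script, here is a
rough Python counterpart to diff -r, offered only as a sketch: it walks
two directory trees and lists whatever differs or exists on one side
only. The function name is made up for illustration. Files present on
both sides are re-read byte by byte, because dircmp on its own compares
only file metadata.

```python
import filecmp
import os


def differing_files(left, right):
    """Recursively list relative paths that differ or exist on one side only."""
    dc = filecmp.dircmp(left, right)
    problems = list(dc.left_only) + list(dc.right_only) + list(dc.common_funny)
    # dircmp alone compares shallowly (size and mtime); compare contents too.
    _, mismatch, errors = filecmp.cmpfiles(
        left, right, dc.common_files, shallow=False
    )
    problems += mismatch + errors
    for sub in dc.common_dirs:
        nested = differing_files(os.path.join(left, sub), os.path.join(right, sub))
        problems += [os.path.join(sub, name) for name in nested]
    return sorted(problems)
```

An empty list means the backup matches the source; anything else is a
file worth looking at.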

Considering lifetime and how often you're able to rewrite / reuse the
media, they are cheaper per GB than CDs/DVDs.

I expect I will notice soon enough when USB and/or ext3 are about to be
phased out and replaced by a different technology, and will then move the
data to new media / formats.

Johannes




Re: [semi-OT] Data archiving (was Re: Query on adding a USB hdd)

2007-05-25 Thread Douglas Allan Tutty
On Fri, May 25, 2007 at 10:41:18AM +0200, Johannes Wiedersich wrote:
 Douglas Allan Tutty wrote:
  No.  I figure a CD is good for at least a year.  Every year, I
  pull the two netinst cds from the bank, take an SHA hash and compare it
  with the written notes, then run something like cdck on them.  So far,
  my Woody CDs are fine.  Funny enough, so is my woody floppy set (the
  whole shebang set of 20 floppies) on Maxell floppies; needed for my 486
  that doesn't boot from CD or run an installer after woody's.
 
 Wow, you seem to have a lot of spare time. How long does it take you to
 perform all these checks?
 

My backup set isn't that large so only an hour or so.

 I do backups to usb hard drives. They have 40GB to 120GB and it takes on
 the order of 1 minute / GB to diff -r them.
 

So how often do you fsck -c the filesystem so that it attempts to read
every block so that in turn the drive hardware can handle fading
sectors?  Hard drives on a shelf aren't maintenance-free either.

 Considering lifetime and how often you're able to rewrite / reuse the
 media, they are cheaper per GB than CDs/DVDs.
 

True.  However, for a small data set (under 1 GB) the need for three
copies means three hard drives.  Using a hard drive and rewriting over
it means that you lose old archives.  







Re: [semi-OT] Data archiving (was Re: Query on adding a USB hdd)

2007-05-25 Thread Johannes Wiedersich

Douglas Allan Tutty wrote:
 On Fri, May 25, 2007 at 10:41:18AM +0200, Johannes Wiedersich wrote:
 Douglas Allan Tutty wrote:
 No.  I figure a CD is good for at least a year.  Every year, I
 pull the two netinst cds from the bank, take an SHA hash and compare it
 with the written notes, then run something like cdck on them.  So far,
 my Woody CDs are fine.  Funny enough, so is my woody floppy set (the
 whole shebang set of 20 floppies) on Maxell floppies; needed for my 486
 that doesn't boot from CD or run an installer after woody's.
 Wow, you seem to have a lot of spare time. How long does it take you to
 perform all these checks?

 
 My backup set isn't that large so only an hour or so.



 I do backups to usb hard drives. They have 40GB to 120GB and it takes on
 the order of 1 minute / GB to diff -r them.

 
 So how often do you fsck -c the filesystem so that it attempts to read
 every block so that in turn the drive hardware can handle fading
 sectors?  Hard drives on a shelf aren't maintenance-free either.

My point was not that they are maintenance-free. My point was that both
backups and maintenance are much faster.

 Considering lifetime and how often you're able to rewrite / reuse the
 media, they are cheaper per GB than CDs/DVDs.

 
 True.  However, for a small data set (under 1 GB) the need for three
 copies means three hard drives.  Using a hard drive and rewriting over
 it means that you lose old archives.  

If you have 1GB of data and, say, a 40GB hard disk, that means about 40
full backups on each. With incremental backups those would last much,
much longer.

For your three disks you'd have 120 full backups! Of course in the case
of failure you'd lose 40 of them, instead of losing one unreadable CD,
but I consider checking 120 CDs for unreadable sectors etc. a nightmare.

Just my .02

Johannes

NB: My backup system started out as DVD-RAMs, since those are said to be
more reliable than CDs/DVDs. I gave up on that scheme, when the data to
be backed up started to exceed 2 disks (and when my first DVD-RAM died -
probably related to the death of my laptop's multiburner). I now use
CDs, DVD-RAMs etc. for non-mission-critical stuff only.



Re: [semi-OT] Data archiving (was Re: Query on adding a USB hdd)

2007-05-25 Thread Ron Johnson

On 05/25/07 09:04, Johannes Wiedersich wrote:
 Douglas Allan Tutty wrote:
[snip]
 True.  However, for a small data set (under 1 GB) the need for three
 copies means three hard drives.  Using a hard drive and rewriting over
 it means that you lose old archives.  
 
 If you have 1GB of data and, say, a 40GB hard disk, that means about 40
 full backups on each. With incremental backups those would last much,
 much longer.
 
 For your three disks you'd have 120 full backups! Of course in the case
 of failure you'd lose 40 of them, instead of losing one unreadable CD,
 but I consider checking 120 CDs for unreadable sectors etc. a nightmare.

But isn't that putting all your eggs in one basket?  (Unless I'm
mis-reading you.)

--
Ron Johnson, Jr.
Jefferson LA  USA

Give a man a fish, and he eats for a day.
Hit him with a fish, and he goes away for good!




Re: [semi-OT] Data archiving (was Re: Query on adding a USB hdd)

2007-05-25 Thread Johannes Wiedersich

Ron Johnson wrote:
 On 05/25/07 09:04, Johannes Wiedersich wrote:
 Douglas Allan Tutty wrote:
 [snip]
 True.  However, for a small data set (under 1 GB) the need for three
 copies means three hard drives.  Using a hard drive and rewriting over
 it means that you lose old archives.  
 If you have 1GB of data and, say, a 40GB hard disk, that means about 40
 full backups on each. With incremental backups those would last much,
 much longer.
 
 For your three disks you'd have 120 full backups! Of course in the case
 of failure you'd lose 40 of them, instead of losing one unreadable CD,
 but I consider checking 120 CDs for unreadable sectors etc. a nightmare.
 
 But isn't that putting all your eggs in one basket?  (Unless I'm
 mis-reading you.)

3 disks in three different locations (according to Douglas'
requirements). You'd have backups 1,4,7,... on the first disk, 2,5,8,...
on the second, and so forth.
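
The arithmetic and the rotation can be written down in a few lines. This
is only a sketch; the function names are invented for illustration, and
the sizes are the ones from this thread (1GB of data, 40GB disks, three
disks).

```python
def full_backups_per_disk(disk_gb, backup_gb):
    """How many full backups of a given size fit on one disk."""
    return disk_gb // backup_gb


def disk_for_backup(n, disks=3):
    """Map the n-th backup (counting from 1) to a disk number (from 1)."""
    return (n - 1) % disks + 1


per_disk = full_backups_per_disk(40, 1)          # 40 full backups per disk
total = 3 * per_disk                             # 120 across three disks
first_disk = [n for n in range(1, 10) if disk_for_backup(n) == 1]  # 1, 4, 7
```

Losing one disk then costs every third backup rather than a contiguous
run of them.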

Johannes



Re: [semi-OT] Data archiving (was Re: Query on adding a USB hdd)

2007-05-25 Thread Ron Johnson

On 05/25/07 13:44, Johannes Wiedersich wrote:
 Ron Johnson wrote:
[snip]
 But isn't that putting all your eggs in one basket?  (Unless I'm
 mis-reading you.)
 
 3 disks in three different locations (according to Douglas'
 requirements). You'd have backup 1,4,7,... on the first disk 2,5,8,...
 on the second and so forth.

Roger that!

--
Ron Johnson, Jr.
Jefferson LA  USA

Give a man a fish, and he eats for a day.
Hit him with a fish, and he goes away for good!




Re: [semi-OT] Data archiving (was Re: Query on adding a USB hdd)

2007-05-24 Thread Douglas Allan Tutty
On Wed, May 23, 2007 at 10:49:51PM -0500, Ron Johnson wrote:
 On 05/23/07 20:17, Douglas Allan Tutty wrote:
  On Wed, May 23, 2007 at 07:05:23PM -0500, Ron Johnson wrote:
 
  It would be very nice if there was a universal cross-platform rw +
  encrypt filesystem for archives.  Something that you could be confident
  that you could decrypt and access in 10 years using whatever OS was
  current then.
  tar is cross-platform, as is ASCII CSV.  PGP/GPG is also cross-platform.

  I don't know if a generic tarball I make today will be readable by
  whatever OS in 10 years, which is why I store a current install cd.  In
  10 years, hopefully I can find a computer that will boot it.
 
 You've got bigger problems if you think that a CD-R will keep its
 integrity for 10 years.

No.  I figure a CD is good for at least a year.  Every year, I
pull the two netinst cds from the bank, take an SHA hash and compare it
with the written notes, then run something like cdck on them.  So far,
my Woody CDs are fine.  Funny enough, so is my woody floppy set (the
whole shebang set of 20 floppies) on Maxell floppies; needed for my 486
that doesn't boot from CD or run an installer after woody's.
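
The yearly hash check can be scripted. The sketch below hashes a file
(an ISO image, or the CD's device node) in chunks and compares the
result with the digest written down earlier. SHA-256 is an assumption,
since the message only says "an SHA hash"; the function names are made
up for illustration, and any algorithm name that hashlib knows can be
passed instead.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so large images need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_recorded(path, recorded_hex, algorithm="sha256"):
    """Compare against the digest from the written notes (case-insensitive)."""
    return file_digest(path, algorithm) == recorded_hex.strip().lower()
```

A mismatch says the medium or the copy has changed; it does not say
which sectors went bad, so a read test such as cdck is still useful.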

 
 Tape (using tar, and a media used by large data processing shops,
 since they are supported for a LONG LONG LONG time, unlike that gee
 whiz specialized crap that NASA seems to love) or SCSI hard drives
 (in external enclosures so you can spin them up annually) formatted
 ext2 or FAT32 are what I would choose.
 

Would you use tar to make a tarball and put it on a hard drive formatted
ext2, copy as is to ext2 (changing ctime in the process), or forgo a
filesystem and write tar directly to the raw disk?  What tar format
would you use: GNU or Posix?

 FAT has been around for 26+ years, and ext2 is 14 years old.
 

Which is more resistant to bad blocks popping up after time in storage?  

  If I gpg a tarball today with whatever algorithm is current, in 10 years
  that algorithm may be long cracked.  Will the gpg authors keep support
  for it?  Perhaps.
 
 This is FLOSS.  Save the source on a separate disk with SHA512 hash
 codes.
 
 And text is *the* guaranteed data format.  Database backups should
 be text format extracts and Office documents should be in ODF
 format which is just zipped text.

Never heard of ODF, or is it specific to *Office programmes?
Personally, I save my latex as latex.  The original contents are
plainly visible.



  accessible?  Perhaps the time capsule would have to include a whole
  computer and not just the archive media.
 
 Yes.  If by computer you also mean the whole schmeer, including
 many tape drives since it's common on Big Systems to backup a
 single database in parallel to multiple tape drive.

Computers keep getting smaller.  Computer could mean a little brick that
has an interface for the archive drive, the archive drive unit, and some
kind of user interface.  RS-232C has been around forever; will it be
around for evermore?  If the backup medium was hard disks, then an
interface for the hard drive plus an enclosure for the drives if the
brick didn't have the physical space.

Doug.






Re: [semi-OT] Data archiving (was Re: Query on adding a USB hdd)

2007-05-24 Thread Ron Johnson

On 05/24/07 08:47, Douglas Allan Tutty wrote:
 On Wed, May 23, 2007 at 10:49:51PM -0500, Ron Johnson wrote:
 On 05/23/07 20:17, Douglas Allan Tutty wrote:
 On Wed, May 23, 2007 at 07:05:23PM -0500, Ron Johnson wrote:

 It would be very nice if there was a universal cross-platform rw +
 encrypt filesystem for archives.  Something that you could be confident
 that you could decrypt and access in 10 years using whatever OS was
 current then.
 tar is cross-platform, as is ASCII CSV.  PGP/GPG is also cross-platform.
 
 I don't know if a generic tarball I make today will be readable by
 whatever OS in 10 years, which is why I store a current install cd.  In
 10 years, hopefully I can find a computer that will boot it.
 You've got bigger problems if you think that a CD-R will keep its
 integrity for 10 years.
 
 No.  I figure a CD is good for at least a year.  Every year, I
 pull the two netinst cds from the bank, take an SHA hash and compare it
 with the written notes, then run something like cdck on them.  So far,
 my Woody CDs are fine.  Funny enough, so is my woody floppy set (the
 whole shebang set of 20 floppies) on Maxell floppies; needed for my 486
 that doesn't boot from CD or run an installer after woody's.
 
 Tape (using tar, and a media used by large data processing shops,
 since they are supported for a LONG LONG LONG time, unlike that gee
 whiz specialized crap that NASA seems to love) or SCSI hard drives
 (in external enclosures so you can spin them up annually) formatted
 ext2 or FAT32 are what I would choose.

 
 Would you use tar to make a tarball and put it on a hard drive formatted
 ext2, copy as is to ext2 (changing ctime in the process), or forgo a
 filesystem and write tar directly to the raw disk?

Copy a (possibly compressed) tarball to an ext2 volume.  That way,
you could have multiple timestamped or different project tarballs
on the same device.
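
A small sketch of that approach: one gzip-compressed tarball per project
and per run, with a timestamp in the file name so several can sit side
by side on the same volume. The function name and the naming scheme are
assumptions for illustration. The archive is written in the POSIX (pax)
format, which is one possible answer to the GNU-or-POSIX question, not a
recommendation from this thread.

```python
import tarfile
import time
from pathlib import Path


def make_archive(source_dir, dest_dir, project):
    """Write <project>-<timestamp>.tar.gz into dest_dir and return its path."""
    stamp = time.strftime("%Y%m%dT%H%M%S")
    target = Path(dest_dir) / f"{project}-{stamp}.tar.gz"
    # PAX_FORMAT is the POSIX.1-2001 tar format; GNU_FORMAT is the alternative.
    with tarfile.open(target, "w:gz", format=tarfile.PAX_FORMAT) as tar:
        tar.add(source_dir, arcname=project)
    return target
```

Two runs within the same second would produce the same name, so a real
script would want to check for an existing file first.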

 What tar format
 would you use: GNU or Posix?

Good question.  GNU tar does Posix.  I'd have to research the
differences between the two formats.

 FAT has been around for 26+ years, and ext2 is 14 years old.

 
 Which is more resistant to bad blocks popping up after time in storage?  

Is either?

Maybe you'd also have to create PAR2 (forward error correction) files.

http://en.wikipedia.org/wiki/PAR2

 If I gpg a tarball today with whatever algorithm is current, in 10 years
 that algorithm may be long cracked.  Will the gpg authors keep support
 for it?  Perhaps.
 This is FLOSS.  Save the source on a separate disk with SHA512 hash
 codes.
  
 And text is *the* guaranteed data format.  Database backups should
 be text format extracts and Office documents should be in ODF
 format which is just zipped text.
 
 Never heard of ODF, or is it specific to *Office programmes?
 Personally, I save my latex as latex.  The original contents are
 plainly visible.

Never heard of ODF?  It's the OpenOffice.org 2.0 document format,
aka OASIS Open Document Format for Office Applications, ISO/IEC
26300:2006.

http://en.wikipedia.org/wiki/OpenDocument
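
To illustrate the "zipped text" point: an ODF file is a zip package, and
the document body sits in a member called content.xml, so it can be read
with nothing more than a zip library. A minimal sketch follows; the
function names are made up for illustration.

```python
import zipfile


def odf_members(path):
    """List the files inside an ODF package (it is an ordinary zip file)."""
    with zipfile.ZipFile(path) as package:
        return package.namelist()


def odf_content_xml(path):
    """Return the document body, which ODF stores as XML in content.xml."""
    with zipfile.ZipFile(path) as package:
        return package.read("content.xml").decode("utf-8")
```

Even without an office suite, the text is recoverable from that XML with
ordinary tools.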

 accessible?  Perhaps the time capsule would have to include a whole
 computer and not just the archive media.
 Yes.  If by computer you also mean the whole schmeer, including
 many tape drives since it's common on Big Systems to backup a
 single database in parallel to multiple tape drive.
 
 Computers keep getting smaller.  Computer could mean a little brick that
 has an interface for the archive drive, the archive drive unit, and some
 kind of user interface.  RS-232C has been around forever; will it be
 around for evermore?  If the backup medium was hard disks, then an
 interface for the hard drive plus an enclosure for the drives if the
 brick didn't have the physical space.

If you did that in 1990, you'd have put in an ISA controller.  Five
years ago, it would have been a PATA drive.

All these issues just go to show how difficult the subject of
digital archiving is.

--
Ron Johnson, Jr.
Jefferson LA  USA

Give a man a fish, and he eats for a day.
Hit him with a fish, and he goes away for good!




Re: [semi-OT] Data archiving (was Re: Query on adding a USB hdd)

2007-05-24 Thread Douglas Allan Tutty
On Thu, May 24, 2007 at 12:03:15PM -0500, Ron Johnson wrote:
  
  Never heard of ODF, or is it specific to *Office programmes?
  Personally, I save my latex as latex.  The original contents are
  plainly visible.
 
 Never heard of ODF?  It's the OpenOffice.org 2.0 document format,
 aka OASIS Open Document Format for Office Applications, ISO/IEC
 26300:2006.

I've never used open office, or any other office-type product.  The last
word processor I used was WordPerfect 5.0 for OS/2.  Transitioned from
that to Lout when I went to Linux.

  Yes.  If by computer you also mean the whole schmeer, including
  many tape drives since it's common on Big Systems to backup a
  single database in parallel to multiple tape drive.
  
  Computers keep getting smaller.  Computer could mean a little brick that
  has an interface for the archive drive, the archive drive unit, and some
  kind of user interface.  RS-232C has been around forever; will it be
  around for evermore?  If the backup medium was hard disks, then an
  interface for the hard drive plus an enclosure for the drives if the
  brick didn't have the physical space.
 
 If you did that in 1990, you'd have put in an ISA controller.  Five
 years ago, it would have been a PATA drive.

I guess that's my point.  Once you choose a data medium, you would need
to store at least one drive and a computer to access it.

If your backup media were 3.5" IDE drives, then that means storing a
computer that has at least one IDE port.  Best to fill it up with memory
SIMMs and other spare parts so that when the time capsule is opened,
hopefully the machine still boots.  Also hopefully you can still connect
the computer itself to whatever computers in the future look like to get
the data off.  Will anyone know what UTP is by then?  

There is something to be said for casting something in plain text in
bronze and gold plating it.

Somewhere between the two is a Jolly Psychic, or at least a Happy Medium
:)

Doug.





Re: [semi-OT] Data archiving (was Re: Query on adding a USB hdd)

2007-05-24 Thread Ron Johnson

On 05/24/07 16:18, Douglas Allan Tutty wrote:
 On Thu, May 24, 2007 at 12:03:15PM -0500, Ron Johnson wrote:
   
 Never heard of ODF, or is it specific to *Office programmes?
 Personally, I save my latex as latex.  The original contents are
 plainly visible.
 Never heard of ODF?  It's the OpenOffice.org 2.0 document format,
 aka OASIS Open Document Format for Office Applications, ISO/IEC
 26300:2006.
 
 I've never used open office, or any other office-type product.  The last
 word processor I used was WordPerfect 5.0 for OS/2.  Transitioned from
 that to Lout when I went to Linux.

Well, at least it's text...

 Yes.  If by computer you also mean the whole schmeer, including
 many tape drives since it's common on Big Systems to backup a
 single database in parallel to multiple tape drive.
 Computers keep getting smaller.  Computer could mean a little brick that
 has an interface for the archive drive, the archive drive unit, and some
 kind of user interface.  RS-232C has been around forever; will it be
 around for evermore?  If the backup medium was hard disks, then an
 interface for the hard drive plus an enclosure for the drives if the
 brick didn't have the physical space.
 If you did that in 1990, you'd have put in an ISA controller.  Five
 years ago, it would have been a PATA drive.
 
 I guess that's my point.  Once you choose a data medium, you would need
 to store at least one drive and a computer to access it.
 
 If your backup media were 3.5" IDE drives, then that means storing a
 computer that has at least one IDE port.  Best to fill it up with memory
 SIMMs and other spare parts so that when the time capsule is opened,
 hopefully the machine still boots.  Also hopefully you can still connect
 the computer itself to whatever computers in the future look like to get
 the data off.  Will anyone know what UTP is by then?  
 
 There is something to be said for casting something in plain text in
 bronze and gold plating it.

Buffered lignin-free paper.

http://www.ifla.org/IV/ifla64/044-114e.htm
http://www.nmnh.si.edu/naa/copar/bulletin14.htm
http://www.amigos.org/preservation/faq/preserve.html

 Somewhere between the two is a Jolly Psychic, or at least a Happy Medium
 :)

--
Ron Johnson, Jr.
Jefferson LA  USA

Give a man a fish, and he eats for a day.
Hit him with a fish, and he goes away for good!




Re: [semi-OT] Data archiving (was Re: Query on adding a USB hdd)

2007-05-24 Thread Douglas Allan Tutty
On Thu, May 24, 2007 at 06:08:21PM -0500, Ron Johnson wrote:
  
  There is something to be said for casting something in plain text in
  bronze and gold plating it.
 
 Buffered lignin-free paper.
 

Burns.

Bronze melts.

Pottery breaks.

Acid rain eats granite.

I guess the bottom line is that information that is not used is
eventually lost.  It must be taught always to new generations, either
people or hardware/software.

Hard drives have spare sectors and reassign when sectors become
degraded; they 'teach' a new sector the information from an old sector.
If the drive is on the shelf, we have to spin it up and get the drive to
test all sectors.  When enough sectors get bad, SMART tells us so we can
teach a new drive the old drive's data.

Data is never maintenance-free.

I know, Ron, I'm preaching to the choir.

Doug.








Re: [semi-OT] Data archiving (was Re: Query on adding a USB hdd)

2007-05-24 Thread Ron Johnson

On 05/24/07 18:58, Douglas Allan Tutty wrote:
 On Thu, May 24, 2007 at 06:08:21PM -0500, Ron Johnson wrote:
   
 There is something to be said for casting something in plain text in
 bronze and gold plating it.
 Buffered lignin-free paper.

 
 Burns.
 
 Bronze melts.
 
 Pottery breaks.
 
 Acid rain eats granite.
 
 I guess the bottom line is that information that is not used is
 eventually lost.  It must always be taught to new generations, whether
 of people or of hardware/software.
 
 Hard drives have spare sectors and reassign when sectors become
 degraded; they 'teach' a new sector the information from an old sector.
 If the drive is on the shelf, we have to spin it up and get the drive to
 test all sectors.  When enough sectors get bad, SMART tells us so we can
 teach a new drive the old drive's data.
 
 Data is never maintenance-free.
 
 I know, Ron, I'm preaching to the choir.

A once-a-year spin-up, fsck and gzip -t (of compressed tarballs)
would be darned useful.
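
A minimal Python sketch of the "test the compressed tarballs" part of
such a yearly check follows.  It assumes the archives are
gzip-compressed tar files; the function names are made up for
illustration.  The first pass decompresses the whole stream and throws
the output away (roughly what `gzip -t` does, so the gzip CRC gets
checked), the second walks the tar structure and reads every file to
its end.

```python
import gzip
import tarfile
import zlib

CHUNK = 1024 * 1024  # read in 1 MiB pieces so big archives don't fill memory


def gzip_stream_ok(path):
    """Decompress the whole file and discard the output.

    Reading to the end makes the gzip module verify the stored CRC,
    so truncation and bit rot both show up as exceptions here.
    """
    try:
        with gzip.open(path, "rb") as stream:
            while stream.read(CHUNK):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        return False


def tar_members_ok(path):
    """Walk the tar headers and read every regular file to its end."""
    try:
        with tarfile.open(path, mode="r:gz") as archive:
            for member in archive:
                if member.isfile():
                    data = archive.extractfile(member)
                    while data.read(CHUNK):
                        pass
        return True
    except (tarfile.TarError, OSError, EOFError, zlib.error):
        return False


def check_tarball(path):
    """True only if both the gzip layer and the tar layer read cleanly."""
    return gzip_stream_ok(path) and tar_members_ok(path)
```

Run over every archive on the drive once a year, anything for which
check_tarball() returns False is a candidate for restoring from the
other copy while that copy is still good.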

With hard drives in external enclosures you needn't worry about
whether they are SATA or IDE, but if USB ever goes away, it's time to
migrate to a modern external drive.

-- 
Ron Johnson, Jr.
Jefferson LA  USA

Give a man a fish, and he eats for a day.
Hit him with a fish, and he goes away for good!






[semi-OT] Data archiving (was Re: Query on adding a USB hdd)

2007-05-23 Thread Ron Johnson

On 05/23/07 20:17, Douglas Allan Tutty wrote:
 On Wed, May 23, 2007 at 07:05:23PM -0500, Ron Johnson wrote:

 It would be very nice if there was a universal cross-platform rw +
 encrypt filesystem for archives.  Something that you could be confident
 that you could decrypt and access in 10 years using whatever OS was
 current then.
 tar is cross-platform, as is ASCII CSV.  PGP/GPG is also cross-platform.

 Problem solved?


 Perhaps.


 I don't know if a generic tarball I make today will be readable by
 whatever OS in 10 years, which is why I store a current install cd.  In
 10 years, hopefully I can find a computer that will boot it.

You've got bigger problems if you think that a CD-R will keep its
integrity for 10 years.

Tape (written with tar, on a medium used by large data processing
shops, since those are supported for a LONG LONG LONG time, unlike that
gee-whiz specialized crap that NASA seems to love) or SCSI hard drives
(in external enclosures so you can spin them up annually) formatted
ext2 or FAT32 are what I would choose.

FAT has been around for 26+ years, and ext2 is 14 years old.

 If I gpg a tarball today with whatever algorithm is current, in 10 years
 that algorithm may be long cracked.  Will the gpg authors keep support
 for it?  Perhaps.

This is FLOSS.  Save the source on a separate disk with SHA512 hash
codes.
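
As a rough sketch of what that could look like in Python: walk a
directory of saved source, record a SHA-512 hash per file in a manifest,
and later re-check the files against it.  The function names and the
idea of a manifest file kept on the other disk are assumptions for
illustration; the lines use the same "hash, two spaces, relative path"
layout that sha512sum prints.

```python
import hashlib
from pathlib import Path


def sha512_of(path, chunk_size=1024 * 1024):
    """Return the hex SHA-512 digest of one file, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(root, manifest_path):
    """Write one 'hash  relative/path' line per file found under root."""
    root = Path(root)
    lines = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            name = path.relative_to(root).as_posix()
            lines.append(f"{sha512_of(path)}  {name}")
    Path(manifest_path).write_text("\n".join(lines) + "\n", encoding="utf-8")


def verify_manifest(root, manifest_path):
    """Return the relative paths that are missing or no longer match."""
    root = Path(root)
    bad = []
    text = Path(manifest_path).read_text(encoding="utf-8")
    for line in text.splitlines():
        if not line:
            continue  # tolerate an empty manifest or trailing blank line
        expected, name = line.split("  ", 1)
        target = root / name
        if not target.is_file() or sha512_of(target) != expected:
            bad.append(name)
    return bad
```

Keeping the manifest somewhere other than the directory it describes
means a single failing disk can't quietly take both the data and the
hashes with it.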

 If one relies on legacy hardware obscurity for off-site backup, what
 happens in a disaster and all the legacy hardware is toast?  What if you

Sungard.

Two days after 9/11, they rolled in a truck to our Westchester[0]
data center and rolled out a lot of kit.  In a week, that site was
back up and running.

[0] http://members.cox.net/ron.l.johnson/ACS_Government_Services.kmz

 can't buy replacement ancient hardware to read those backups?   When I
 was using OS/2, my backups were on QIC-80 IBM tape.  That drive is not
 supported under Debian.  Luckily, OS/2 was useable enough to allow me to
 transfer that tape data to a spare hard drive and OS/2 and Linux had
 support for a few filesystems in common.

Perfect example of what I was talking about above.

And text is *the* guaranteed data format.  Database backups should
be text-format extracts, and office documents should be in ODF,
which is just zipped XML text.
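
To make the "text format extract" idea concrete, here is a small Python
sketch that dumps a database to plain SQL text and loads it back.
SQLite is only a stand-in here because Python ships with it (an
assumption, not what anyone in this thread runs), and the function
names are invented for the example.

```python
import sqlite3


def dump_to_text(db_path, out_path):
    """Write the whole database as plain SQL statements, one per line."""
    conn = sqlite3.connect(db_path)
    try:
        with open(out_path, "w", encoding="utf-8") as out:
            # iterdump() yields the CREATE and INSERT statements
            # needed to rebuild the database from scratch.
            for statement in conn.iterdump():
                out.write(statement + "\n")
    finally:
        conn.close()


def restore_from_text(sql_path, db_path):
    """Rebuild a database by replaying a text dump into a new file."""
    conn = sqlite3.connect(db_path)
    try:
        with open(sql_path, encoding="utf-8") as src:
            conn.executescript(src.read())
    finally:
        conn.close()
```

The resulting file can be read with any text editor, which is the
point: even if nothing can open the original database file in 10 years,
the CREATE TABLE and INSERT lines are still legible.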

 Taking archiving to the limit, what would a time-capsule for electronic
 data look like?  If you assumed that the software to extract the archive
 would be unavailable, you could include the source but what about the
 compiler?  How would you get the source off if the filesystem is not
 accessible?  Perhaps the time capsule would have to include a whole
 computer and not just the archive media.

Yes.  If by computer you also mean the whole schmeer, including
many tape drives, since it's common on Big Systems to back up a
single database in parallel to multiple tape drives.

 I guess this is why banks still save paper.

 Anyway, this has gone well off the original topic.

Not really, if OP's purpose was partly to save data.

-- 
Ron Johnson, Jr.
Jefferson LA  USA

Give a man a fish, and he eats for a day.
Hit him with a fish, and he goes away for good!


