file size estimation

2006-09-15 Thread Zhao Peng

Hi,

Suppose an EBCDIC file on a tape from an IBM mainframe is read onto a Linux
server, and this EBCDIC file on the tape has 100 records, each with a length
of 13054 bytes. Is it correct to estimate that the size of the file on the
Linux server would be 1,305,400 bytes? Is block size information also needed
to calculate the size?


Please correct me if I have used any of these terms incorrectly; I also hope
this question is not too OT.


Thanks,
Zhao
___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss


MonadLUG meeting last night: Ray Cote and find, mpom, and Ted Roche on NX server

2006-09-15 Thread Ted Roche
Seven folks attended the September meeting of MonadLUG last night.
Charlie Farinella ran the meeting in Guy's absence, and we had a
brief discussion on keeping the administrative overhead to a minimum
to focus on the content of the meetings. All were in agreement, and we
dove in.


It was Ray Côté's evening to present the Man Page of the Month, and
Ray chose the find command. Armed with a two-sided handout (to be
posted to the wiki once polished up a bit), we reviewed the basic
syntax and some variants, and talked about worthwhile applications of
the command to isolate files by certain filters and execute further
commands upon them. The technique of using -exec to act on the files
inline was contrasted with the use of piping or xargs to process them,
and the -print0 argument for separating filenames was explained along
with the matching -0 argument to xargs, so that filenames with embedded
spaces are split correctly. We all picked up a trick or two.
Thanks, Ray!
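Ray's -print0 / -0 trick can be sketched in a few lines (the scratch directory and filename here are just for illustration):

```shell
# make a scratch directory containing a filename with an embedded space
mkdir -p /tmp/findtest
touch '/tmp/findtest/two words.txt'

# -print0 terminates each name with a NUL byte; xargs -0 splits on NUL,
# so the embedded space does not break the filename into two arguments
find /tmp/findtest -name '*.txt' -print0 | xargs -0 ls -l
```

Without -print0/-0, xargs would split on the space and try to process "two" and "words.txt" as separate files.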


I did the main presentation, and can't really comment on how well it
went, as I was too busy trying to make it go well. Slides for the
presentation are linked from the wiki's event pages and available on
my website for attendees who want to reference them. In two
sentences: NX gives you remote X Windows access suitable for
low-bandwidth use, with most common configuration tasks taken care of
for you. If you're looking to experiment with the technology, choose
the newly open-sourced NX 2.0 technology or the more mature freenx
project, but don't try to mix the two.


Brief announcements included the exciting Software Freedom Day
tomorrow at the Hopkinton Library (10-2), MerriLUG's data-carving
meeting next week, and Hosstraders only three weeks away. Hope to see
you at one or more of these events!


Ted Roche
Ted Roche & Associates, LLC
http://www.tedroche.com





Tapes and close to a quarter-century.

2006-09-15 Thread Jon maddog Hall

[EMAIL PROTECTED] said:
> Suppose an EBCDIC file on a tape from IBM mainframe is read onto a Linux
> server, and this EBCDIC file on the tape has 100 records with a length of
> 13054, is it correct to estimate the size of the file on Linux server would
> be 1,305,400 bytes?

Generally speaking, yes.  Tapes can have both variable-length records and
variable-length blocks, but often they used fixed-length records and
fixed-length blocks.  With fixed-length blocks, the data size is usually a
simple multiplication.
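For the fixed-length case in the original question, that multiplication is just:

```shell
# 100 fixed-length records of 13054 bytes each
echo $((100 * 13054))    # prints 1305400
```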

> Is block size information also needed to calculate the size?

A block on a magnetic tape is the size of the data between inter-record gaps.
On a start-stop tape drive this gap allowed the tape drive to come to a stop,
then gather speed to read again.  There was no useful data in the gap.
Usually the block size was an even multiple of the record size, particularly
on fixed-length-record tapes.

Typically, the larger the block, the more likely it was to have a read error
(particularly in the early days of tapes), and therefore block sizes tended
to be low multiples of the record size.  Block sizes tended to increase as
tape drive mechanisms and techniques improved.  After a while streaming tape
(tapes with no real start-stop gaps) took over.

Just because you had a record length of 13054 does not mean that each record
held 13054 useful bytes of data; it depended on what the program put into it.

On the other hand, 13054 is a rather odd record length, so it may be a real
record of information.

Usually you can use the 'dd' command to easily read an IBM EBCDIC tape and
convert it to ASCII.  Remember that most Unix systems (heck, most systems in
general) use ASCII, not EBCDIC, so you might want to convert it, assuming
that it is character data on the tape.  But if it is not character data, just
binary data, then converting it would be a mistake.

Here is a rough example of what you will need:

dd if=/dev/tape_drive of=file_name conv=ascii ibs=13054

The ibs operand stands for input block size, and you might try it both as
13054 and as the actual block size of the tape, as I do not remember which
number it relates to.  Sorry, but it has been 23 years since I last had to
do this.
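As a sketch of how dd's EBCDIC handling behaves (this is GNU dd behavior; the 80-byte record length and the file names are made up for the demo, not from the tape in question), you can round-trip a small file: conv=ebcdic pads each line out to cbs bytes and converts to EBCDIC, and conv=ascii with the same cbs converts back, strips the padding, and restores the newlines:

```shell
# ASCII lines -> fixed-length (cbs=80) EBCDIC records, and back again
printf 'HELLO\nWORLD\n' > /tmp/ascii.txt

# conv=ebcdic implies conv=block: pad each line to 80 bytes, then convert to EBCDIC
dd if=/tmp/ascii.txt  of=/tmp/ebcdic.dat cbs=80 conv=ebcdic 2>/dev/null

# conv=ascii implies conv=unblock: convert to ASCII, drop trailing pad, add newlines
dd if=/tmp/ebcdic.dat of=/tmp/back.txt   cbs=80 conv=ascii  2>/dev/null

cmp /tmp/ascii.txt /tmp/back.txt && echo 'round-trip OK'
```

For a real tape you would set cbs to the record length (e.g. 13054) so each record comes out as one newline-terminated line.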

Regards,

maddog
-- 
Jon maddog Hall
Executive Director   Linux International(R)
email: [EMAIL PROTECTED] 80 Amherst St. 
Voice: +1.603.672.4557   Amherst, N.H. 03031-3032 U.S.A.
WWW: http://www.li.org

Board Member: Uniforum Association
Board Member Emeritus: USENIX Association (2000-2006)

(R)Linux is a registered trademark of Linus Torvalds in several countries.
(R)Linux International is a registered trademark in the USA used pursuant
   to a license from Linux Mark Institute, authorized licensor of Linus
   Torvalds, owner of the Linux trademark on a worldwide basis
(R)UNIX is a registered trademark of The Open Group in the USA and other
   countries.



Re: Tapes and close to a quarter-century.

2006-09-15 Thread Jerry Feldman
On Friday 15 September 2006 3:22 pm, Jon maddog Hall wrote:
> Usually you can use the 'dd' command to easily read an ibm EBCDIC tape
> and convert it to ASCII.  Remember that most Unix systems (heck, most
> systems in general) use ASCII, not EBCDIC, so you might want to
> convert it, assuming that it is character data on the tape.
Years ago, when I worked at Burger King Corp. as a programmer, we wrote a
system to transfer data to the parent company, Pillsbury.
We had Burroughs medium-systems equipment (i.e., EBCDIC). Pillsbury had all
Honeywell (formerly GE) ASCII equipment running the GECOS OS. They also had
36-bit words.  Reading our tapes required Pillsbury to go to a service
bureau.  The only common medium was punched cards. If I recall, we couldn't
produce ASCII tapes in a format suitable for Pillsbury, so the communications
system we wrote essentially sent punched-card images into their RJE system.
Most of the code we wrote was COBOL. Since Burroughs did not have a linkage
editor at that time, every program was somewhat monolithic. Fortunately, you
could write COBOL that looked like:
PROCEDURE DIVISION.
ENTER SYMBOLIC.
lots of assembler code
ENTER COBOL.


-- 
Jerry Feldman [EMAIL PROTECTED]
Boston Linux and Unix user group
http://www.blu.org PGP key id:C5061EA9
PGP Key fingerprint:053C 73EC 3AC1 5C44 3E14 9245 FB00 3ED5 C506 1EA9



Re: Tapes and close to a quarter-century.

2006-09-15 Thread bmcculley


-------- Original message --------
From: Jon maddog Hall [EMAIL PROTECTED]

> [EMAIL PROTECTED] said:
>> Suppose an EBCDIC file on a tape from IBM mainframe is read onto a Linux
>> server, and this EBCDIC file on the tape has 100 records with a length of
>> 13054, is it correct to estimate the size of the file on Linux server
>> would be 1,305,400 bytes?
>
> Generally speaking, yes.  Tapes can be both variable length records and
> variable length blocks, but often were fixed-length records and
> fixed-length blocks.  If fixed-length blocks then data size is usually a
> simple multiplication.

Usually, but not always.  It's simple for fixed-length records in fixed-length
blocks, more complicated for variable-length records in fixed-length blocks.
Depending on the software that wrote the tape, records might or might not be
allowed to span block boundaries.  The stated situation sounds like
fixed-length records, but fixed-length records could contain variable-length
data fields, just to add another possible level of complication.  Just
something to be aware of.

>> Is block size information also needed to calculate the size?
>
> A block on a magnetic tape is the size of the data between inter-record
> gaps.  On a start-stop tape drive this gap allowed the tape drive to come
> to a stop, then gather speed to read again.  There was no useful data in
> the gap.  Usually the block size was an even multiple of the record size,
> particularly on fixed length record tapes.
>
> Typically the larger the blocks, the more likely the block was to have a
> read error (particularly in the early days of tapes) and therefore block
> sizes tended to be low multiples of record size.  Block sizes tended to
> increase as tape drive mechanisms and techniques improved.  After a while
> streaming tape (tapes with no real start-stop gaps) took over.

Streaming tapes did have inter-record gaps, possibly longer than those on
start-stop drives.  This is because a streaming tape kept the tape in motion
as the data stream was being written and did not stop immediately after the
end of a write.  If the next block was supplied in time, the tape kept moving
and the data was written following a gap of some indeterminate length.  If
more data was not supplied in time, a horrendously expensive
stop-reposition-start sequence was initiated.  Thus the time-out interval was
usually not very tight, and gaps were loose.

-Bruce McCulley
(ex-RSX11 devo)



Re: Tapes and close to a quarter-century.

2006-09-15 Thread Jon maddog Hall

[EMAIL PROTECTED] said:
> Usually but not always.

Usually usually means not always.  But you brought up a good point.  Logical
records often bridged physical records, and in the case of the start-stop
tapes the usual real physical record was the block.

Along the same lines, some tapes contained source code, and the systems wrote
them as 80-byte records (we will not go into the issues of six-bit bytes,
eight-bit bytes, nibbles, etc.) with each line padded to the full 80-character
card image.  So you often had 800- or 8000-byte blocks, with the record
length being 80 bytes.

I remember how odd it was to me that Unix simply put a newline character at
the end of every line and did not have to pad the record with blanks.  In the
days of five-megabyte disk drives (no screaming about how big that is, o.k.?)
this was a big savings.
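The savings are easy to see: the same short line stored as an 80-column card image versus newline-terminated (the word "HELLO" here is just an illustration):

```shell
# card image: the line is space-padded out to a fixed 80 bytes
printf '%-80s' 'HELLO' | wc -c   # 80 bytes

# Unix: 5 characters plus one newline
printf 'HELLO\n' | wc -c         # 6 bytes
```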

[EMAIL PROTECTED] said:
> If the next block was supplied in time the tape kept moving and the data
> was written following a gap of some indeterminate length.  If more data was
> not supplied in time, a horrendously expensive stop-reposition-start
> sequence was initiated.  Thus the time-out interval was usually not very
> tight, and gaps were loose.

I was under the impression that if actual data was not written in time, the
streaming tape would write null data (not the same as the traditional
inter-record gap) until it either received more data to write or gave up and
stopped.  When it got more data, it started the stop-reposition-start
sequence again.  When it did reposition, it repositioned before the end of
the good data and then started writing.

We overcame this in Ultrix by using a ring buffer, which managed to deliver
data to the tape drive fast enough to keep it streaming.

The inter-record (really inter-block) gap on a start-stop, 9-track IBM tape
was .75 inch.
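Those .75-inch gaps ate real capacity. As a back-of-the-envelope sketch (the 1600 bpi density and 8000-byte block size are assumed for illustration, not taken from the thread):

```shell
# fraction of tape actually holding data, for one block plus one gap
awk 'BEGIN {
  bpi = 1600; gap = 0.75; block = 8000     # bytes/inch, gap in inches, block bytes
  data = block / bpi                        # inches of tape occupied by the block
  printf "data %.2f in + gap %.2f in => %.0f%% efficiency\n",
         data, gap, 100 * data / (data + gap)
}'
```

Smaller blocks make the fixed-size gap a proportionally bigger loss, which is one reason block sizes crept upward as drives got more reliable.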

- Jon maddog Hall
(ex-TK50 junkie)



Re: file size estimation

2006-09-15 Thread Bill Ricker

> Suppose an EBCDIC file on a tape from IBM mainframe is read onto a Linux
> server, and this EBCDIC file on the tape has 100 records with a length
> of 13054, is it correct to estimate the size of the file on Linux server
> would be 1,305,400 bytes?


Maybe.

[The last time I did this, I outsourced it to a boutique conversion shop
in Cambridge ... he had a VMS system with one of every tape drive
known to man, and set up a custom conversion table, since the tapes I had
were mutant international EBCDIC from NLM. Sorry, I don't have the name
handy; this was 10 years ago.]


> Is block size information also needed to
> calculate the size?


Probably not, although it will probably be needed in order to read the tape,
depending on the utility used; e.g., dd(1) will require being told the
blocksize and lrecl.


> Please correct me if these terms are used incorrectly, also hopefully
> this question is not too OT.


Terminology seems correct.

Normally, if doing EBCDIC-to-ASCII conversion for use on Unix later, I
would also insert an NL at the end of each LRECL-sized record. This would
add 100 bytes beyond the size computed in your example.  If you really only
plan to read the file with sysread(2) in LRECL-sized chunks, you don't need
to do this, but to view it with more(1) or anything else, it's highly
desirable, even though the lrecl is rather long by Unix standards and will
overflow any old fixed 1000-byte buffers.
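With the 100 records from the original question, one trailing newline per record changes the arithmetic like so:

```shell
# 100 records, each 13054 data bytes plus one appended newline
echo $((100 * (13054 + 1)))    # prints 1305500
```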

(If by some disaster you convert it into Unicode, the size could be a
bit larger due to non-ASCII characters appearing in the EBCDIC, or
doubled if you convert it to UTF-16. I wouldn't recommend that unless
you had compelling reasons!)

--
Bill
[EMAIL PROTECTED] [EMAIL PROTECTED]