file size estimation
Hi, Suppose an EBCDIC file on a tape from an IBM mainframe is read onto a Linux server, and this EBCDIC file on the tape has 100 records with a length of 13054. Is it correct to estimate that the size of the file on the Linux server would be 1,305,400 bytes? Is block size information also needed to calculate the size? Please correct me if these terms are used incorrectly; also, hopefully this question is not too OT. Thanks, Zhao ___ gnhlug-discuss mailing list gnhlug-discuss@mail.gnhlug.org http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss
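The arithmetic behind the question can be sanity-checked in the shell (the file name in the comments is hypothetical):

```shell
# Size estimate for 100 fixed-length records of 13054 bytes each,
# assuming no block padding and no record separators are added.
records=100
lrecl=13054
expected=$((records * lrecl))
echo "$expected"    # 1305400

# Once the tape has been read to disk, the estimate can be checked
# against the actual file size (file name hypothetical, GNU stat):
# actual=$(stat -c %s converted_file)
# [ "$actual" -eq "$expected" ] && echo "size matches estimate"
```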
MonadLUG meeting last night: Ray Cote and find, mpom, and Ted Roche on NX server
Seven folks attended the September meeting of MonadLUG last night. Charlie Farinella ran the meeting in Guy's absence, and we had a brief discussion on keeping the administrative overhead to a minimum to focus on the content of the meeting. All were in agreement, and we dove in. It was Ray Côté's evening to present the man page of the month, and Ray chose the find command. Armed with a two-sided handout (to be posted to the wiki once polished up a bit), we reviewed the basic syntax and some variants, and talked about worthwhile applications of the command to isolate files by certain filters and execute further commands upon them. The technique of using -exec to run commands directly on the files inline was contrasted with the use of piping or xargs to process them, and the -print0 argument for separating filenames was explained along with the matching -0 argument to xargs, which correctly splits filenames with embedded spaces. We all picked up a trick or two. Thanks, Ray! I did the main presentation, and can't really comment on how well it went, as I was too busy trying to make it go well. Slides for the presentation are linked from the wiki's event pages and available on my website for attendees who want to reference them. In two sentences: NX gives you remote X Window access suitable for low-bandwidth use, with most common configuration tasks taken care of for you. If you're looking to experiment with the technology, choose the newly open-sourced NX 2.0 technology or the more mature FreeNX project, but don't try to mix the two. Brief announcements included the exciting Software Freedom Day tomorrow at Hopkinton Library, 10-2; MerriLUG's exciting data-carving meeting next week; and Hosstraders only three weeks away. Hope to see you at one or more of these events! Ted Roche Ted Roche Associates, LLC http://www.tedroche.com
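For readers who missed the meeting, the -exec versus xargs contrast Ray covered can be sketched like this (the /var/log path is just an illustration):

```shell
# Run a command on each matching file directly; '+' batches arguments
# so the command is invoked as few times as possible:
find /var/log -name '*.log' -exec wc -c {} +

# The same work as a pipeline. -print0 terminates each filename with a
# NUL byte and xargs -0 splits on NUL, so filenames containing spaces
# (or even newlines) are passed through intact:
find /var/log -name '*.log' -print0 | xargs -0 wc -c
```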
Tapes and close to a quarter-century.
[EMAIL PROTECTED] said: Suppose an EBCDIC file on a tape from an IBM mainframe is read onto a Linux server, and this EBCDIC file on the tape has 100 records with a length of 13054, is it correct to estimate the size of the file on the Linux server would be 1,305,400 bytes? Generally speaking, yes. Tapes can have both variable-length records and variable-length blocks, but often had fixed-length records and fixed-length blocks. If the blocks are fixed-length, then the data size is usually a simple multiplication. Is block size information also needed to calculate the size? A block on a magnetic tape is the data between inter-record gaps. On a start-stop tape drive this gap allowed the tape drive to come to a stop, then gather speed to read again. There was no useful data in the gap. Usually the block size was an even multiple of the record size, particularly on fixed-length-record tapes. Typically, the larger the block, the more likely it was to have a read error (particularly in the early days of tapes), and therefore block sizes tended to be low multiples of the record size. Block sizes tended to increase as tape drive mechanisms and techniques improved. After a while, streaming tape (tapes with no real start-stop gaps) took over. Just because you had a record length of 13054 does not mean that each record held 13054 useful bytes of data; it depended on what the program put into it. On the other hand, 13054 is a rather odd record length, so it may well be a real record of information. Usually you can use the 'dd' command to easily read an IBM EBCDIC tape and convert it to ASCII. Remember that most Unix systems (heck, most systems in general) use ASCII, not EBCDIC, so you might want to convert it, assuming that it is character data on the tape. But if it is not character data, just binary data, then converting it would be a mistake. 
Here is a rough example of what you will need: dd if=/dev/tape_drive of=file_name ibs=13054 conv=ascii The ibs in the command line stands for input block size (note that it is a separate operand, not part of the conv= list), and you might try it both as 13054 and as the actual block size of the tape, as I do not remember which number it relates to. Sorry, but it has been 23 years since I last had to do this. Regards, maddog -- Jon maddog Hall Executive Director Linux International(R) email: [EMAIL PROTECTED] 80 Amherst St. Voice: +1.603.672.4557 Amherst, N.H. 03031-3032 U.S.A. WWW: http://www.li.org Board Member: Uniforum Association Board Member Emeritus: USENIX Association (2000-2006) (R)Linux is a registered trademark of Linus Torvalds in several countries. (R)Linux International is a registered trademark in the USA used pursuant to a license from Linux Mark Institute, authorized licensor of Linus Torvalds, owner of the Linux trademark on a worldwide basis (R)UNIX is a registered trademark of The Open Group in the USA and other countries.
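A runnable variation on maddog's sketch, using dd's cbs= operand: with POSIX dd, conv=ascii combined with cbs= treats each cbs-sized EBCDIC record as one line, trimming trailing blanks and appending a newline. The demonstration below round-trips a small sample instead of a real tape; the tape device path in the final comment is hypothetical.

```shell
# Build an 8-byte fixed-length EBCDIC record from ASCII text
# (conv=ebcdic with cbs= pads each line with spaces to 8 bytes),
# then convert it back to newline-terminated ASCII:
printf 'HELLO\n' > sample.txt
dd if=sample.txt of=sample.ebc cbs=8 conv=ebcdic 2>/dev/null
dd if=sample.ebc of=sample.out cbs=8 conv=ascii  2>/dev/null
cat sample.out    # HELLO

# For a real tape the same idea applies (device path hypothetical);
# ibs= is its own operand, separate from the conv= list:
# dd if=/dev/nst0 of=converted.txt ibs=13054 cbs=13054 conv=ascii
```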
Re: Tapes and close to a quarter-century.
On Friday 15 September 2006 3:22 pm, Jon maddog Hall wrote: Usually you can use the 'dd' command to easily read an IBM EBCDIC tape and convert it to ASCII. Remember that most Unix systems (heck, most systems in general) use ASCII, not EBCDIC, so you might want to convert it, assuming that it is character data on the tape. Years ago, when I worked at Burger King Corp. as a programmer, we wrote a system to transfer data to the parent, Pillsbury. We had Burroughs medium-systems equipment (i.e., EBCDIC). Pillsbury had all Honeywell (formerly GE) ASCII equipment running the GECOS OS. They also had 36-bit words. Reading our tapes required Pillsbury to go to a service bureau. The only common medium was punched cards. If I recall, we couldn't produce ASCII tapes in a format suitable for Pillsbury. The communications system we wrote essentially sent punched-card images into their RJE system. Most of the code we wrote was COBOL. Since Burroughs did not have a linkage editor at that time, every program was somewhat monolithic. Fortunately, you could write COBOL that looked like:

PROCEDURE DIVISION.
    ENTER SYMBOLIC.
    ... lots of assembler code ...
    ENTER COBOL.

-- Jerry Feldman [EMAIL PROTECTED] Boston Linux and Unix user group http://www.blu.org PGP key id:C5061EA9 PGP Key fingerprint:053C 73EC 3AC1 5C44 3E14 9245 FB00 3ED5 C506 1EA9
Re: Tapes and close to a quarter-century.
Original message From: Jon maddog Hall [EMAIL PROTECTED] [EMAIL PROTECTED] said: Suppose an EBCDIC file on a tape from an IBM mainframe is read onto a Linux server, and this EBCDIC file on the tape has 100 records with a length of 13054, is it correct to estimate the size of the file on the Linux server would be 1,305,400 bytes? Generally speaking, yes. Tapes can have both variable-length records and variable-length blocks, but often had fixed-length records and fixed-length blocks. If the blocks are fixed-length, then the data size is usually a simple multiplication. Usually, but not always. It's simple for fixed-length records in fixed-length blocks, more complicated for variable-length records in fixed-length blocks. Depending on the software that wrote the tape, the records might or might not be allowed to span block boundaries. The stated situation sounds like fixed-length records, but fixed-length records could contain variable-length data fields, just to add another possible level of complication. Just something to be aware of. Is block size information also needed to calculate the size? A block on a magnetic tape is the data between inter-record gaps. On a start-stop tape drive this gap allowed the tape drive to come to a stop, then gather speed to read again. There was no useful data in the gap. Usually the block size was an even multiple of the record size, particularly on fixed-length-record tapes. Typically, the larger the block, the more likely it was to have a read error (particularly in the early days of tapes), and therefore block sizes tended to be low multiples of the record size. Block sizes tended to increase as tape drive mechanisms and techniques improved. After a while, streaming tape (tapes with no real start-stop gaps) took over. Streaming tapes did have inter-record gaps, possibly longer than those on start-stop drives. 
This is because a streaming tape kept the tape in motion as the data stream was being written, and did not stop immediately after the end of a write. If the next block was supplied in time, the tape kept moving and the data was written following a gap of some indeterminate length. If more data was not supplied in time, a horrendously expensive stop-reposition-start sequence was initiated. Thus the time-out interval was usually not very tight, and gaps were loose. -Bruce McCulley (ex-RSX11 devo)
Re: Tapes and close to a quarter-century.
[EMAIL PROTECTED] said: Usually but not always. Usually usually means not always. But you brought up a good point. Logical records often bridged physical records, and in the case of the start-stop tapes the usual real physical record was the block. Along the same lines, some tapes contained source code, and the systems wrote them as 80-byte records (we will not go into the issues of six-bit bytes, eight-bit bytes, nibbles, etc.) with each line padded to the full 80-character card image. So you often had 800- or 8000-byte blocks, with the record length being 80 bytes. I remember how odd it was to me that Unix simply put a newline character at the end of every line and did not have to pad the record with blanks. In the days of five-megabyte disk drives (no screaming about how big that is, o.k.?) this was a big savings. [EMAIL PROTECTED] said: If the next block was supplied in time the tape kept moving and the data was written following a gap of some indeterminate length. If more data was not supplied in time, a horrendously expensive stop-reposition-start sequence was initiated. Thus the time-out interval was usually not very tight, and gaps were loose. I was under the impression that if actual data was not written in time, the streaming tape would write null data (not the same as the traditional inter-record gap) until it either received more data to write or gave up and stopped. When it got more data, it went through the reposition-start sequence again. When it did reposition, it repositioned before the end of the good data and then started writing. We overcame this in Ultrix by using a ring buffer, which managed to deliver the data fast enough to the tape drive to keep it streaming. The inter-record (really inter-block) gap on a start-stop, 9-track IBM tape was .75 inch. - Jon maddog Hall (ex-TK50 junkie)
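The padded 80-column card images maddog describes map directly onto dd's block/unblock conversions; a small sketch (file names are illustrative):

```shell
# conv=block turns newline-terminated lines into fixed 80-byte,
# space-padded card images; conv=unblock trims the padding and
# restores the newlines.
printf 'MOVE A TO B.\nSTOP RUN.\n' > source.cbl
dd if=source.cbl of=cards.dat conv=block   cbs=80 2>/dev/null
dd if=cards.dat  of=roundtrip conv=unblock cbs=80 2>/dev/null
wc -c < cards.dat    # 160: two 80-byte card images
cmp -s source.cbl roundtrip && echo "round trip OK"
```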
Re: file size estimation
Suppose an EBCDIC file on a tape from an IBM mainframe is read onto a Linux server, and this EBCDIC file on the tape has 100 records with a length of 13054, is it correct to estimate the size of the file on the Linux server would be 1,305,400 bytes? Maybe. [Last time I did this, I out-sourced it to a boutique conversion shop in Cambridge ... he had a VMS system with one of every tape drive known to man, and set up a custom conversion table, since the tapes I had were Mutant International EBCDIC from NLM. Sorry, I don't have the name handy; this was 10 years ago.] Is block size information also needed to calculate the size? Probably not for the size calculation, although it will probably be needed in order to read the tape, depending on the utility used. E.g., dd(1) will require being told the block size and lrecl. Please correct me if these terms are used incorrectly, also hopefully this question is not too OT. Terminology seems correct. Normally, if doing an EBCDIC-to-ASCII conversion for later use on Unix, I would also do an LRECL-to-NL conversion (terminating each record with a newline). This would insert an additional 100 bytes beyond the size computed in your example. If you really only plan to read the file in LRECL-sized chunks with read(2), you don't need to do this, but to view it with more(1) or anything else it's highly desirable, even though the lrecl is rather long by Unix standards and will crush any old fixed 1000-byte buffers. (If by some disaster you convert it into Unicode, the size could be a bit larger due to non-ASCII characters appearing in the EBCDIC, or doubled if you convert it to UTF-16. I wouldn't recommend that unless you had compelling reasons!) -- Bill [EMAIL PROTECTED] [EMAIL PROTECTED]
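Bill's point about the added newline per record adjusts the original estimate by exactly one byte per record:

```shell
records=100
lrecl=13054
echo $((records * lrecl))         # 1305400: raw fixed-length records
echo $((records * (lrecl + 1)))   # 1305500: with a newline appended to each record
```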