Re: [Tutor] converting EBCIDIC to ASCII

2012-07-16 Thread Flynn, Stephen (L P - IT)
> I am trying to convert an EBCDIC file to ASCII. When the records are
> fixed length I can convert it fine, but I have some files that are coming
> in as variable-length records. Is there a way to convert the file in
> Python? I tried using no length, but then it just reads into a fixed
> buffer size and I can't seem to break the records properly.

Hi Craig,

You might find it easier to pass the records through iconv if
you're on a Linux/Unix box and convert to ISO8859 from IBM037 (or
whatever codepage your EBCDIC files are in). There are versions of this
GNU software for Windows too, if that's your platform - it's trivial to
use. Shout if you need a hand.
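
If you would rather stay inside Python, something along these lines should do roughly what iconv does for this case. It is only a sketch: the function name is made up for illustration, and it assumes the data is plain character data in code page 037 (Python's "cp037" codec) with no binary or packed-decimal fields, which a character conversion would mangle.

```python
def ebcdic_to_latin1(data, codepage="cp037"):
    """Decode EBCDIC bytes to text, then re-encode the text as ISO 8859-1."""
    # codepage is an assumption; use whichever EBCDIC code page the file is in.
    return data.decode(codepage).encode("latin-1")


# Five EBCDIC bytes spelling HELLO in code page 037.
sample = b"\xc8\xc5\xd3\xd3\xd6"
converted = ebcdic_to_latin1(sample)
```

For a real file you would read it with open(path, 'rb').read() and write the result back out in binary mode.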

That said, you'll almost certainly find that the 4-byte RDW
has been stripped from the file when it was sent to you, so you're not
being given any information to determine the length of each
variable-length record.

Quick way to check - open the EBCDIC file up in a hex editor (I
use HxD from http://mh-nexus.de/en/hxd/ as it will happily run in EBCDIC
mode). If you can't see 4 bytes at the start of each record, then you're
in trouble as you have no way of determining the record length, without
the copybook for the file on the mainframe.
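
If you don't have a hex editor to hand, a rough version of the same check can be done in Python. This is a heuristic sketch, not a definitive test. It assumes the usual layout of an unspanned RDW: a 2-byte big-endian length that includes the 4 bytes of the RDW itself, followed by 2 zero bytes. The function name is invented for the example.

```python
import struct


def looks_like_rdw(data, max_records=5):
    """Return True if the first few records chain together via plausible RDWs."""
    pos = 0
    checked = 0
    while pos < len(data) and checked < max_records:
        if pos + 4 > len(data):
            return False  # not enough bytes left for an RDW
        length, reserved = struct.unpack(">HH", data[pos:pos + 4])
        # Assumed layout: length counts the RDW itself, reserved bytes are zero.
        if reserved != 0 or length < 4 or pos + length > len(data):
            return False
        pos += length
        checked += 1
    return checked > 0
```

A file that starts with ordinary EBCDIC text will almost always fail this check, because letters and digits are not zero bytes.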

S.


This email and any attachment to it are confidential.  Unless you are the 
intended recipient, you may not use, copy or disclose either the message or any 
information contained in the message. If you are not the intended recipient, 
you should delete this email and notify the sender immediately.

Any views or opinions expressed in this email are those of the sender only, 
unless otherwise stated.  All copyright in any Capita material in this email is 
reserved.

All emails, incoming and outgoing, may be recorded by Capita and monitored for 
legitimate business purposes. 

Capita exclude all liability for any loss or damage arising or resulting from 
the receipt, use or transmission of this email to the fullest extent permitted 
by law.
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] converting EBCIDIC to ASCII

2012-07-16 Thread Flynn, Stephen (L P - IT)
> -----Original Message-----
> From: tutor-bounces+steve.flynn=capita.co...@python.org [mailto:tutor-bounces+steve.flynn=capita.co...@python.org] On Behalf Of Steven D'Aprano
> Sent: Saturday, July 14, 2012 2:42 AM
> To: tutor@python.org
> Subject: Re: [Tutor] converting EBCIDIC to ASCII
>
> Prinn, Craig wrote:
> > I am trying to convert an EBCDIC file to ASCII. When the records are
> > fixed length I can convert it fine, but I have some files that are coming
> > in as variable-length records. Is there a way to convert the file in
> > Python? I tried using no length, but then it just reads into a fixed
> > buffer size and I can't seem to break the records properly.
>
> I'm afraid that I have no idea what you mean here. What are you actually
> doing? What does "tried using no length" mean?

The conversion to ASCII from EBCDIC is only going to get Craig so far -
depending on how the sender transferred the files to him, there's a very
good chance that the 4-byte RDW (Record Descriptor Word) has been
stripped off the start of each record.

This 4-byte RDW indicates how many bytes belong to this record. Without
it, you have no way of determining how long the current record should be
and thus where the next RDW should be. This makes finding the start and
end of records tricky to say the least.
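
For the case where the RDWs did survive the transfer, reading the records in Python could look something like the sketch below. The assumptions are mine, not taken from Craig's files: each RDW is a 2-byte big-endian length that includes the RDW's own 4 bytes, followed by 2 zero bytes, there are no block descriptor words, and the payload is character data in code page 037. The function name is invented for the example.

```python
import struct


def read_rdw_records(data, codepage="cp037"):
    """Yield each record's payload as text, using the RDW to find its length."""
    pos = 0
    while pos < len(data):
        if pos + 4 > len(data):
            raise ValueError("truncated RDW at offset %d" % pos)
        # Assumed RDW layout: 2-byte big-endian length, then 2 reserved bytes.
        length, _reserved = struct.unpack(">HH", data[pos:pos + 4])
        if length < 4 or pos + length > len(data):
            raise ValueError("implausible RDW length %d at offset %d" % (length, pos))
        yield data[pos + 4:pos + length].decode(codepage)
        pos += length
```

Called as list(read_rdw_records(open(path, 'rb').read())), it gives one string per record. If the RDWs have been stripped, it will raise ValueError almost immediately.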

I've written to Craig off list with some info as it's not particularly
relevant to Python, other than letting Python do the work of iconv.

S.





[Tutor] converting EBCIDIC to ASCII

2012-07-13 Thread Prinn, Craig
I am trying to convert an EBCDIC file to ASCII. When the records are fixed 
length I can convert it fine, but I have some files that are coming in as 
variable-length records. Is there a way to convert the file in Python? I tried 
using no length, but then it just reads into a fixed buffer size and I can't 
seem to break the records properly.

Craig Prinn
Manager, Data Management
Phone: 919-767-6640
Cell: 410-320-9962
Address: Bell and Howell
  3600 Clipper Mill Road
  Suite 404
  Baltimore MD 21211



Re: [Tutor] converting EBCIDIC to ASCII

2012-07-13 Thread Marc Tompkins
On Thu, Jul 5, 2012 at 9:30 AM, Prinn, Craig craig.pr...@bhemail.com wrote:

> I am trying to convert an EBCDIC file to ASCII. When the records are
> fixed length I can convert it fine, but I have some files that are coming
> in as variable-length records. Is there a way to convert the file in
> Python? I tried using no length, but then it just reads into a fixed
> buffer size and I can't seem to break the records properly.


I know of only three varieties of variable-length-record files:
-  Delimited - i.e. there's some special character that ends the record,
and (perhaps) a special character that separates fields.  CSV is the
classic example: newlines to separate records, commas to separate fields.

-  Prefixed - there's a previously-agreed schema of record lengths, where
(for example) a record that starts with A is 25 characters long; a B
record is 136 characters long, etc.

-  Sequential - record types/lengths appear in a previously-agreed order,
such as 25 characters, 136 characters, 45 characters, etc.
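
As an illustration of the "prefixed" variety, here is a minimal sketch using the made-up lengths from the example above (A records are 25 characters, B records are 136). The table and the function name are hypothetical. A real file would need its own agreed schema.

```python
# Hypothetical schema: the first character of each record gives its type,
# and the type fixes the total record length (including that character).
RECORD_LENGTHS = {"A": 25, "B": 136}


def split_prefixed(text, lengths=RECORD_LENGTHS):
    """Split already-decoded text into records based on each record's type."""
    records = []
    pos = 0
    while pos < len(text):
        record_type = text[pos]
        if record_type not in lengths:
            raise ValueError("unknown record type %r at offset %d" % (record_type, pos))
        size = lengths[record_type]
        records.append(text[pos:pos + size])
        pos += size
    return records
```

The "sequential" variety works the same way, except that the lengths come from a fixed list taken in order rather than from a lookup on the first character.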

For each of these types, the schema may be externally-published, or it may
be encoded in a special record at the beginning of the file - to use an
example near and dear to my own experience, ANSI X12 EDI files all start
with a fixed-length ISA record, which among other things contains the
element separator, repetition separator, sub-element separator, and segment
terminator characters in positions 3, 84, 104, and 105.  To read an X12
file, therefore, you read it in - look at positions 3, 84, 104, and 105 -
and then use that information to break up the rest of the file into records
and fields.

How you handle variable-length records depends on what kind they are, and
how much you know about them going in.  Python is just a tool for applying
your specialized domain knowledge - by itself, it doesn't know any more
about your particular solution than you do.

If you have more information about the structure of your files, and need
help implementing an algorithm to deal with 'em, let us know!


Re: [Tutor] converting EBCIDIC to ASCII

2012-07-13 Thread Marc Tompkins
On Fri, Jul 13, 2012 at 1:28 PM, Prinn, Craig craig.pr...@bhemail.com wrote:

> The records are coming off of a mainframe, so there probably was a 2
> byte RDW or length indicator at one point. If there is a x0D x0A at the end
> would that work?
>
> Thanks
>
> Craig


I presume so, but (despite my bloviating about the generalities of
variable-length records) I don't actually know all that much about how
systems that use EBCDIC tend to structure their files (my big iron days
were in an HP 3000 shop, which DID use EBCDIC, but that was 22 years ago -
and at the time I was a database-only programmer and didn't need to worry
my little head about actual file I/O.)  By "at the end" do you mean 'at the
end of each record', or 'at the end of the file'?

If you meant 'at the end of each record', then my approach would be:
-  create an empty list called lines
-  read in the file (or buffer-sized chunks of it, anyway) - call it inFile
-  create recordBegin and recordEnd pointers, initialized to 0
-  search for x0D x0A (or whatever) in the stream of bytes
-  each time I find it,
   -  set the recordEnd pointer
   -  make a copy of the bytes between recordBegin and recordEnd and append
it to lines
   -  copy recordEnd to recordBegin
   -  lather, rinse, repeat
-  at the end, decode each bytestream in lines
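
The steps above could be sketched roughly like this. It reads the whole file into memory rather than buffer-sized chunks. It assumes the delimiter really is the byte pair x0D x0A, and that the data is in code page 037, which is a guess. The one departure from the outline is that recordBegin is moved past the delimiter, so the delimiter bytes don't end up in the next record.

```python
def split_on_delimiter(raw, delimiter=b"\x0d\x0a", codepage="cp037"):
    """Split raw bytes on a delimiter, then decode each record to text."""
    lines = []
    record_begin = 0
    while True:
        record_end = raw.find(delimiter, record_begin)
        if record_end == -1:
            break  # no more delimiters
        lines.append(raw[record_begin:record_end])
        # Skip over the delimiter so it is not part of the next record.
        record_begin = record_end + len(delimiter)
    if record_begin < len(raw):
        lines.append(raw[record_begin:])  # trailing bytes with no delimiter
    return [chunk.decode(codepage) for chunk in lines]
```

One caution: if the records contain binary fields, the same two bytes could turn up inside a record by accident, and this would split it in the wrong place.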

If you meant 'at the end of the file', then I'm not sure it helps, and I
don't know what you'd need to move forward.

Good luck!





Re: [Tutor] converting EBCIDIC to ASCII

2012-07-13 Thread Steven D'Aprano

Prinn, Craig wrote:

> I am trying to convert an EBCDIC file to ASCII. When the records are fixed
> length I can convert it fine, but I have some files that are coming in as
> variable-length records. Is there a way to convert the file in Python? I
> tried using no length, but then it just reads into a fixed buffer size and
> I can't seem to break the records properly.


I'm afraid that I have no idea what you mean here. What are you actually 
doing? What does "tried using no length" mean?


Converting from one encoding to another should have nothing to do with whether 
they are fixed-length records, variable-length records, or free-form text. 
First you read the file as bytes, then use the encoding to convert to text, 
then process the file however you like.


Using Python 3, I prepared an EBCDIC file. If you open it in binary mode, you 
get the raw bytes, which are a mess:


py> raw = open('/home/steve/ebcidic.text', 'rb').read()
py> print(raw)
b'\xe3\x88\x89\xa2@\x89\xa2@\\\xa2\x96\x94\x85\\@\xe3 ...

For brevity, I truncated the output.

But if you open in text mode, and set the encoding correctly, Python 
automatically converts the bytes into text according to the rules of EBCDIC:



py> text = open('/home/steve/ebcidic.text', 'r', encoding='cp500').read()
py> print(text)
This is *some* Text containing punctuation & other things(!) which
may{?} NOT be the +++same+++ when encoded into ASCII|EBCIDIC.


This is especially useful if you need to process the file line by line. Simply 
open the file with the right encoding, then loop over the file as normal.



f = open('/home/steve/ebcidic.text', 'r', encoding='cp500')
for line in f:
    print(line)


In this case, I used IBM's standard EBCDIC encoding for Western Europe. 
Python knows about some others, see the documentation for the codecs module 
for the list.


http://docs.python.org/library/codecs.html
http://docs.python.org/py3k/library/codecs.html

Once you have the text, you can then treat it as fixed width, variable width, 
or whatever else you might have.
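
For instance, cutting decoded text into fixed-width records is just slicing. This is a minimal sketch with a made-up function name. It assumes the file holds only character data in a single-byte encoding such as cp500, so that one byte in the file is one character in the text.

```python
def fixed_width_records(text, width):
    """Cut decoded text into consecutive records of `width` characters."""
    # The last record may be shorter if the text length is not a multiple.
    return [text[i:i + width] for i in range(0, len(text), width)]


# Round-trip a small sample through cp500 to mimic reading an EBCDIC file.
sample_text = "ABCDEFGH".encode("cp500").decode("cp500")
records = fixed_width_records(sample_text, 3)
```
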



Python 2 is a little trickier. You can manually decode the bytes:

# not tested
text = open('/home/steve/ebcidic.text', 'rb').read().decode('cp500')

or you can use the codecs module to get very close to the same functionality 
as Python 3:


# also untested
import codecs
f = codecs.open('/home/steve/ebcidic.text', 'r', encoding='cp500')
for line in f:
    print line



--
Steven



Re: [Tutor] Converting ebcidic to ascii

2011-06-15 Thread Prinn, Craig
Thanks Mark, that did the trick. I couldn't quite figure out the syntax before. 

Craig Prinn
Document Solutions Manager
Office Phone 919-767-6640
Cell Phone410-320-9962
Fax  410-243-0973
3600 Clipper Mill Road
Suite 404
Baltimore, MD 21211
-Original Message-
From: tutor-bounces+craig.prinn=bowebellhowell@python.org 
[mailto:tutor-bounces+craig.prinn=bowebellhowell@python.org] On Behalf Of 
tutor-requ...@python.org
Sent: Wednesday, June 15, 2011 6:00 AM
To: tutor@python.org
Subject: Tutor Digest, Vol 88, Issue 54


Today's Topics:

   1. Re: Already Initialized Object Inheritance? (WolfRage)
   2. Re: trying to translate and ebcidic file (Mark Tolonen)


--

Message: 1
Date: Tue, 14 Jun 2011 23:42:59 -0700
From: WolfRage wolfrage8...@gmail.com
To: Japhy Bartlett ja...@pearachute.com
Cc: Python Tutor tutor@python.org
Subject: Re: [Tutor] Already Initialized Object Inheritance?
Message-ID: 1308120179.1952.50.camel@wolfrage-LE1600
Content-Type: text/plain; charset=UTF-8

Unfortunately I am not able to inherit stdscr using that method. As
Python returns with an error stating that stdscr is not defined. This
error is returned at run time and by the compiler prior to actual
execution. If you would like I can write a quick example that will
generate the error message for that method.
--
Jordan
On Wed, 2011-06-15 at 02:04 -0400, Japhy Bartlett wrote:
 When you're subclassing something, you use the syntax:
 
 class Foo(Bar):
 
 It seems like you're trying to do:
 
 class Bar:
     class Foo:
 
 - Japhy
 
 On Wed, Jun 15, 2011 at 12:47 AM, WolfRage wolfrage8...@gmail.com wrote:
  I can not get this to behave in the manner that I would like. I am trying
  to have an object referred to as CursesApp.Screen become the already
  initialized object stdscr. To elaborate, I would like it to become that
  object but to also be able to define additional methods and properties,
  so more along the lines of inheriting from stdscr. Is this even possible?
  While I can make it equal to that object, I can not add additional methods
  and properties to it. Additionally, so that I learn: where has my
  thinking been too short-sighted? Thank you for your help.
  --
  Jordan
 
  CODE BELOW
 
  #!/usr/bin/python3
  """With this method I can make the class Screen become stdscr but if
  I reference any of the new methods or properties the application
  promptly fails and notifies me that the method or property does not
  exist. Another downside of this method is I can not reference
  self.Screen.* or it crashes."""
  import curses
  class CursesApp:
      def __init__(self, stdscr):
          self.Screen(stdscr) #This is the stdscr object.
          curses.init_pair(1,curses.COLOR_BLUE,curses.COLOR_YELLOW)
          #self.Screen.bkgd(' ', curses.color_pair(1))
          #self.mainLoop()

      #def mainLoop(self):
          #while 1:
              #self.Screen.refresh()
              #key=self.Screen.getch()
              #if key==ord('q'): break

      class Screen:
          def __init__(self,stdscr):
              self=stdscr
              #self.height, self.width = self.getmaxyx() # any reference to these crashes
              #self.offsety, self.offsetx = -self.height/2, -self.width/2 # any reference to these crashes
              #self.curx, self.cury = 1, 1 # any reference to these crashes
              self.clear()
              self.border(0)
              while 1:
                  self.refresh()
                  key=self.getch()
                  if key==ord('q'): break

  def main():
      cursesapp = curses.wrapper(setup)

  def setup(stdscr):
      CursesApp(stdscr)

  if __name__ == '__main__':
      main()
 
 
 
  CODE BELOW
 
  #!/usr/bin/python3
  """With this method I can make Screen become stdscr, but I
  obviously can not even define any new methods or properties. But at least
  the references can be used throughout the class without crashing."""
  import curses
  class CursesApp:
      def __init__(self, stdscr):
          self.Screen=stdscr #This is the stdscr object.
          curses.init_pair(1,curses.COLOR_BLUE,curses.COLOR_YELLOW)
          self.Screen.bkgd(' ', curses.color_pair(1))
          self.mainLoop()

      def mainLoop(self):
          while 1:
              self.Screen.refresh()
              key=self.Screen.getch()
              if key==ord('q'): break

  def main():
      cursesapp = curses.wrapper(setup)

  def setup(stdscr):
      CursesApp(stdscr)
 
  if __name__ ==