Re: [Jprogramming] Parsing EDI data and converting them into a database format

Joe Bogner Fri, 13 Nov 2015 12:08:53 -0800

George, here are some ideas to get you started:

Forgive the terrible display in email. See this:
https://gist.github.com/joebo/42c914ba332c9e5d628c



msg=: 0 : 0
ST*997*2878~AK1*HS*293328532~AK2*270*307272179~
AK3*NM1*8*L1010_0*8~AK4*0:0*66*1~AK4*0:1*66*1~AK4*0:2*
66*1~AK3*NM1*8*L1010_1*8~AK4*1:0*66*1~AK4*1:1*66*1~AK3*
NM1*8*L1010_2*8~AK4*2:0*66*1~AK5*R*5~AK9*R*1*1*0~SE*8*2878~
)

NB. from https://www.ameren.com/-/media/corporate-site/Files/BusinessPartners/CPWG/CPWGIL814E-Request.pdf
msg2 =: 0 : 0
ST*814*0001
BGN*13*2010063000001*20100630
N1*8S*UTILITY*1*006912345
N1*SJ*SUPPLIER*9*007909111IL00
N1*8R*CUSTOMER NAME
LIN*1*SH*EL*SH*CE
 ASI*7*021
 REF*11*0012345600
 REF*12*0312345624
 REF*BLT*LDC
 REF*PC*DUAL
 REF*9V*Y
SE*13*0001
)

chop=: >@: ((('*' cut ]&dlb) each each) @: ('~' cut each ]) @: (LF cut ]))
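
Read right to left, chop cuts on LF, then on ~, then on *, and finally opens the result into a table. A minimal sketch of the same steps taken one at a time, assuming the standard profile has loaded cut, dlb and each. The names sample, lines, segs, fields and table are illustrative only:

```j
NB. three lines taken from msg2 above; sample is an illustrative name
sample =: 'ST*814*0001', LF, ' REF*9V*Y', LF, 'SE*13*0001', LF

lines  =: LF cut sample                   NB. one box per line
segs   =: '~' cut each lines              NB. split each line on the segment terminator
fields =: ('*' cut dlb) each each segs    NB. drop leading blanks, split each segment on *
table  =: > fields                        NB. open into a table: one row per line
```

Inspecting lines, segs and fields in turn shows what each cut contributes before the final open.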

   chop msg
+---------------------+---------------------+-------------------+--------------+-----------+
|+--+---+----+        |+---+--+---------+   |+---+---+---------+|              |           |
||ST|997|2878|        ||AK1|HS|293328532|   ||AK2|270|307272179||              |           |
|+--+---+----+        |+---+--+---------+   |+---+---+---------+|              |           |
+---------------------+---------------------+-------------------+--------------+-----------+
|+---+---+-+-------+-+|+---+---+--+-+       |+---+---+--+-+     |+---+---+     |           |
||AK3|NM1|8|L1010_0|8|||AK4|0:0|66|1|       ||AK4|0:1|66|1|     ||AK4|0:2|     |           |
|+---+---+-+-------+-+|+---+---+--+-+       |+---+---+--+-+     |+---+---+     |           |
+---------------------+---------------------+-------------------+--------------+-----------+
|+--+-+               |+---+---+-+-------+-+|+---+---+--+-+     |+---+---+--+-+|+---+      |
||66|1|               ||AK3|NM1|8|L1010_1|8|||AK4|1:0|66|1|     ||AK4|1:1|66|1|||AK3|      |
|+--+-+               |+---+---+-+-------+-+|+---+---+--+-+     |+---+---+--+-+|+---+      |
+---------------------+---------------------+-------------------+--------------+-----------+
|+---+-+-------+-+    |+---+---+--+-+       |+---+-+-+          |+---+-+-+-+-+ |+--+-+----+|
||NM1|8|L1010_2|8|    ||AK4|2:0|66|1|       ||AK5|R|5|          ||AK9|R|1|1|0| ||SE|8|2878||
|+---+-+-------+-+    |+---+---+--+-+       |+---+-+-+          |+---+-+-+-+-+ |+--+-+----+|
+---------------------+---------------------+-------------------+--------------+-----------+
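
Because the text of msg is hard-wrapped, cutting on LF first splits two segments across cells in the table above (AK4*0:2*66*1 and AK3*NM1*8*L1010_2*8), and the shorter rows are padded with empty boxes. If the line breaks are only wrapping and ~ is the real segment terminator (an assumption about the data, not something the message itself states), a sketch that removes LF before cutting gives one box per segment. The names wrapped and chopflat are illustrative:

```j
NB. a shortened, hard-wrapped sample in the style of msg
wrapped =: 'ST*997*2878~AK4*0:2*', LF, '66*1~SE*8*2878~', LF

NB. remove line feeds, cut on ~, then cut each segment on *
chopflat =: ('*'&cut each) @: ('~'&cut) @: (-.&LF)

chopflat wrapped    NB. a list of segments, each a boxed list of elements
```

This returns a list rather than a table, so no padding cells appear. It would not suit msg2, where the line breaks themselves separate the segments.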

   chop msg2
+--------------------------------+
|+--+---+----+                   |
||ST|814|0001|                   |
|+--+---+----+                   |
+--------------------------------+
|+---+--+-------------+--------+ |
||BGN|13|2010063000001|20100630| |
|+---+--+-------------+--------+ |
+--------------------------------+
|+--+--+-------+-+---------+     |
||N1|8S|UTILITY|1|006912345|     |
|+--+--+-------+-+---------+     |
+--------------------------------+
|+--+--+--------+-+-------------+|
||N1|SJ|SUPPLIER|9|007909111IL00||
|+--+--+--------+-+-------------+|
+--------------------------------+
|+--+--+-------------+           |
||N1|8R|CUSTOMER NAME|           |
|+--+--+-------------+           |
+--------------------------------+
|+---+-+--+--+--+--+             |
||LIN|1|SH|EL|SH|CE|             |
|+---+-+--+--+--+--+             |
+--------------------------------+
|+---+-+---+                     |
||ASI|7|021|                     |
|+---+-+---+                     |
+--------------------------------+
|+---+--+----------+             |
||REF|11|0012345600|             |
|+---+--+----------+             |
+--------------------------------+
|+---+--+----------+             |
||REF|12|0312345624|             |
|+---+--+----------+             |
+--------------------------------+
|+---+---+---+                   |
||REF|BLT|LDC|                   |
|+---+---+---+                   |
+--------------------------------+
|+---+--+----+                   |
||REF|PC|DUAL|                   |
|+---+--+----+                   |
+--------------------------------+
|+---+--+-+                      |
||REF|9V|Y|                      |
|+---+--+-+                      |
+--------------------------------+
|+--+--+----+                    |
||SE|13|0001|                    |
|+--+--+----+                    |
+--------------------------------+
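
One caveat when moving this towards a database: in X12 two delimiters in a row mark an omitted element, and as far as I know the library cut discards empty pieces, so the elements after a gap would shift left. A hedged sketch of a cut that keeps empty elements, followed by selecting segments by their ID. The segment with the empty element is made up for illustration, and cutall, seg, segs, ids and refs are illustrative names:

```j
NB. hypothetical segment with an empty third element
seg =: 'REF*11**0012345600'

'*' cut seg               NB. 3 boxes: the empty element is dropped
cutall =: <;._2 @ ,~      NB. append the delimiter, then box every piece
'*' cutall seg            NB. 4 boxes: the empty element is kept

NB. pick segments by ID, using four segments from msg2 above
segs =: '*'&cutall each 'ST*814*0001' ; 'REF*BLT*LDC' ; 'REF*9V*Y' ; 'SE*13*0001'
ids  =: {.&> segs         NB. first element of every segment
refs =: (ids = <'REF') # segs
```

Keeping element positions fixed means each element can later be mapped to a column by its index within the segment.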



On Fri, Nov 13, 2015 at 2:31 PM, George Dallas wrote:

> Hi Chris, thank you for the reply. I'll start studying J's cut. It looks
> like it'll require some hard studying from what I see in the dictionary
> entry for cut (pasted below).
>
> Regards,
> George
>
> *Cut* m;.n  u;.n  _ 1/2 _
>
> x u;.0 y applies u to a rectangle or cuboid of y with one vertex at the
> point in y indexed by v=:0{x , and with the opposite vertex determined as
> follows: the dimension is |1{x , but the rectangle extends *back* from v
> along any axis j for which the index j{v is negative. Finally, the order of
> the selected items is reversed along each axis k for which k{1{x is
> negative. If x is a vector, it is treated as the matrix 0,:x .
>
>
>
> ----------------------------------------------------------------------------------------------------------------------------------
> chris burke cburke at jsoftware.com
> *Fri Nov 13 18:53:56 UTC 2015*
>
> I did this some years ago and found that J can parse any given EDI format
> very efficiently, using cut to chop up the strings. You might need
> different functions for specific EDI formats, rather than a single function
> to parse arbitrary EDI.
>
>
> On Fri, Nov 13, 2015 at 12:36 PM, George Dallas wrote:
>
> > Hi Joe, thank you for your reply. I am indeed thinking about a subset of
> > X12 messages and specifically 20 types of utility exchanges with power
> > suppliers, found on the link here:
> > https://www.ameren.com/business-partners/cpwg/illinois-edi-implementation-guide
> >
> > The x12parser you mentioned is a good and extensive project and with a
> > little work it might provide for what I need, but it's the verbosity of C#
> > used there that drives me towards thinking of a cleaner version that
> > possibly could be implemented in J.
> >
> > I'm wondering if given any specification, say the 997 you mentioned
> > below, the essence of the problem of converting an edi message to a flat
> > file in normalized form can be expressed concisely in J. If that were the
> > case, I suspect it would scale better and be a much faster implementation.
> >
> > If I were to go down this route are there any J facilities you'd
> > recommend for parsing and transforming text files?
> >
> > Thank you,
> >
> > George
> >
> >
> >
> ------------------------------------------------------------------------------------------------------
> >
> > On Fri, Nov 13, 2015 at 11:10 AM, George Dallas <george.dallas at
> > gmail.com> wrote:
> > > Hello,
> > >
> > > Has anyone had the chance to work with EDI data using J?
> >
> > Hi George, I have not, but I spent a few minutes looking into it.
> >
> > > Of course there is a huge industry out there spun to deal with this
> > > problem, but I was wondering if anyone has had to tackle the issue
> > > using J and if you think it's a doable project for J.
> >
> > I think we would need a bit more information about what you see for
> > the project. Are you interested in building a library in J capable of
> > parsing and interpreting all the various types of X12 messages or do
> > you just need to work with a subset?
> >
> > If you were working with a small subset then I would consider
> > implementing just what is necessary to parse those messages. If it's
> > many messages, then I would lean towards integrating with something
> > that has already solved the problem. The spec sounds reasonably
> > complex and to make use of the information, the definitions are
> > required.
> >
> > Here's one possible implementation to work with:
> > https://x12parser.codeplex.com/
> >
> > Here's the 997 specification out of the nearly 1000 options
> >
> > https://x12parser.codeplex.com/SourceControl/latest#trunk/src/OopFactory.X12/Specifications/Ansi-997-4010Specification.xml
> >
> >
> > On Fri, Nov 13, 2015 at 10:10 AM, George Dallas wrote:
> >
> >> Hello,
> >>
> >> Has anyone had the chance to work with EDI data using J?
> >>
> >> EDI messages are text files formatted for facilitating business to
> >> business communications. If one has a sufficiently large history of these
> >> files and manages to insert them into a database, then querying the
> >> database would give answers to many business questions regarding
> >> customers, costs etc.
> >>
> >> I found the link and text pasted below to be a concise description of
> >> the problem.
> >>
> >> Of course there is a huge industry out there spun to deal with this
> >> problem, but I was wondering if anyone has had to tackle the issue
> >> using J and if you think it's a doable project for J.
> >>
> >> Regards,
> >> George
> >>
> >>
> >>
> >> https://github.com/pstuteville/x12
> >>
> >> == The problem
> >>
> >> X12 is a set of "standards" possessing all the elegance of an elephant
> >> designed by committee, and quite literally so, see http://www.x12.org.
> >> X12 defines rough syntax for specifying text messages, but each of
> >> more than 300 specifications defines its own message structure. While
> >> messages themselves are easy to parse with a simple tokenizer, their
> >> semantics is heavily dependent on the domain. For example, this is
> >> X12/997 message conveying "Functional Acknowledgment":
> >>
> >>   ST*997*2878~AK1*HS*293328532~AK2*270*307272179~AK3*NM1*8*L1010_0*8~
> >>   AK4*0:0*66*1~AK4*0:1*66*1~AK4*0:2*66*1~AK3*NM1*8*L1010_1*8~AK4*1:0*
> >>   66*1~AK4*1:1*66*1~AK3*NM1*8*L1010_2*8~AK4*2:0*66*1~AK5*R*5~AK9*R*1*
> >>   1*0~SE*8*2878~
> >>
> >> I.e., X12 defines an alphabet and somewhat of a dictionary - not a
> >> grammar or semantics for each particular data interchange
> >> conversation. Because of many entrenched implementations and
> >> government mandates, the X12 is not going to die anytime soon,
> >> unfortunately.
> >>
> >> The message above can be easily represented in Ruby as a nested array:
> >>
> >>  m = [
> >>       ['ST', '997', '2878'],
> >>       ['AK1', 'HS', '293328532'],
> >>       ['AK2', '270', '307272179'],
> >>       ['AK3', 'NM1', '8', 'L1010_0', '8'],
> >>       ['AK4', '0:0', '66', '1'],
> >>       ['AK4', '0:1', '66', '1'],
> >>       ['AK4', '0:2', '66', '1'],
> >>       ['AK3', 'NM1', '8', 'L1010_1', '8'],
> >>       ['AK4', '1:0', '66', '1'],
> >>       ['AK4', '1:1', '66', '1'],
> >>       ['AK3', 'NM1', '8', 'L1010_2', '8'],
> >>       ['AK4', '2:0', '66', '1'],
> >>       ['AK5', 'R', '5'],
> >>       ['AK9', 'R', '1', '1', '0'],
> >>       ['SE', '8', '2878'],
> >>      ]
> >>
> >> but it will not help any since, say, segment 'AK4' is ambiguously
> >> defined and its meaning not at all obvious until the message's
> >> structure is interpreted and correct 'AK4' segment is found.
> >>
> >> == The solution
> >>
> >> === Message structure
> >>
> >> Each participant in EDI has to know the structure of the data coming
> >> across the wire - X12 or no X12. The X12 structures are defined in
> >> so-called Implementation Guides - thick books with all the data pieces
> >> spelled out. There is no other choice, but to invent a
> >> computer-readable definition language that will codify these
> >> books. For familiarity sake we'll use XML. For example, the X12/997
> >> message can be defined as
> >>
> >>   <Definition>
> >>     <Loop name="997">
> >>       <Segment name="ST" min="1" max="1"/>
> >>       <Segment name="AK1" min="1" max="1"/>
> >>       <Loop name="L1000" max="999999" required="y">
> >>         <Segment name="AK2" max="1" required="n"/>
> >>         <Loop name="L1010" max="999999" required="n">
> >>           <Segment name="AK3" max="1" required="n"/>
> >>           <Segment name="AK4" max="99" required="n"/>
> >>         </Loop>
> >>         <Segment name="AK5" max="1" required="y"/>
> >>       </Loop>
> >>       <Segment name="AK9" max="1" required="y"/>
> >>       <Segment name="SE"  max="1" required="y"/>
> >>     </Loop>
> >>   </Definition>
> >>
> >> Namely, the 997 is a 'loop' containing segments ST (only one), AK1
> >> (also only one), another loop L1000 (zero or many repeats), segments
> >> AK9 and SE. The loop L1000 can contain a segment AK2 (optional) and
> >> another loop L1010 (zero or many), and so on.
> >>
> >> The segments' structure can be further defined as, for example,
> >>
> >>   <Segment name="AK2">
> >>     <Field name="TransactionSetIdentifierCode" required="y" min="3" max="3" validation="T143"/>
> >>     <Field name="TransactionSetControlNumber"  required="y" min="4" max="9"/>
> >>   </Segment>
> >>
> >> which defines a segment AK2 as having two fields:
> >> TransactionSetIdentifierCode and TransactionSetControlNumber. The
> >> field TransactionSetIdentifierCode is defined as having a type of
> >> string (default), being required, having length of minimum 3 and
> >> maximum 3 characters, and being validated against a table T143. The
> >> validation table is defined as
> >>
> >>   <Table name="T143">
> >>     <Entry name="100" value="Insurance Plan Description"/>
> >>     <Entry name="101" value="Name and Address Lists"/>
> >>     ...
> >>     <Entry name="997" value="Functional Acknowledgment"/>
> >>     <Entry name="998" value="Set Cancellation"/>
> >>   </Table>
> >>
> >> with entries having just names and values.
> >>
> >> This message is fully fleshed out in an example 'misc/997.xml' file,
> >> copied from the ASC X12N 276/277 (004010X093) "Health Care
> >> Claim Status Request and Response" National Electronic Data
> >> Interchange Transaction Set Implementation Guide.
> >>
> >> Now expressions like
> >>
> >>   message.L1000.L1010[1].AK4.DataElementReferenceNumber
> >>
> >> start making sense of sorts, overall X12's idiocy notwithstanding - it's
> >> a field called 'DataElementReferenceNumber' of a first of possibly
> >> many segments 'AK4' found in the second repeat of the loop 'L1010'
> >> inside the enclosing loop 'L1000'. The meaning of the value '66' found
> >> in this field is still in the eye of the beholder, but, at least its
> >> location is clearly identified in the message.
> >>
> >>
> >>
> >
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
>
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
