George, here are some ideas to get you started. Forgive the terrible display in email. See this:
https://gist.github.com/joebo/42c914ba332c9e5d628c
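The chop verb defined below stacks three cuts: lines, then '~' segments, then '*' elements. As a minimal sketch of the two inner steps, here they are one at a time on a short made-up string. It assumes cut and each from the standard library are loaded (as in a default session); the names s, segs, elems and ids are only for illustration.

```j
NB. Hypothetical sample: '~' ends each segment, '*' separates elements
s     =: 'ST*997*2878~AK1*HS*293328532~'
segs  =: '~' cut s            NB. list of boxed segments
elems =: '*'&cut each segs    NB. each segment becomes a list of boxed elements
ids   =: {.&> elems           NB. first element of each segment (the segment id)
```

Typing `> elems` in the session should then show the segments as rows of a boxed table, and `ids` can be compared against a boxed name such as `<'AK1'` to pick out segments of one kind.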
msg=: 0 : 0
ST*997*2878~AK1*HS*293328532~AK2*270*307272179~
AK3*NM1*8*L1010_0*8~AK4*0:0*66*1~AK4*0:1*66*1~AK4*0:2*
66*1~AK3*NM1*8*L1010_1*8~AK4*1:0*66*1~AK4*1:1*66*1~AK3*
NM1*8*L1010_2*8~AK4*2:0*66*1~AK5*R*5~AK9*R*1*1*0~SE*8*2878~
)

NB. from https://www.ameren.com/-/media/corporate-site/Files/BusinessPartners/CPWG/CPWGIL814E-Request.pdf
msg2 =: 0 : 0
ST*814*0001
BGN*13*2010063000001*20100630
N1*8S*UTILITY*1*006912345
N1*SJ*SUPPLIER*9*007909111IL00
N1*8R*CUSTOMER NAME
LIN*1*SH*EL*SH*CE
ASI*7*021
REF*11*0012345600
REF*12*0312345624
REF*BLT*LDC
REF*PC*DUAL
REF*9V*Y
SE*13*0001
)

chop=: >@: ((('*' cut ]&dlb) each each) @: ('~' cut each ]) @: (LF cut ]))

chop msg
+---------------------+---------------------+-------------------+--------------+-----------+
|+--+---+----+        |+---+--+---------+   |+---+---+---------+|              |           |
||ST|997|2878|        ||AK1|HS|293328532|   ||AK2|270|307272179||              |           |
|+--+---+----+        |+---+--+---------+   |+---+---+---------+|              |           |
+---------------------+---------------------+-------------------+--------------+-----------+
|+---+---+-+-------+-+|+---+---+--+-+       |+---+---+--+-+     |+---+---+     |           |
||AK3|NM1|8|L1010_0|8|||AK4|0:0|66|1|       ||AK4|0:1|66|1|     ||AK4|0:2|     |           |
|+---+---+-+-------+-+|+---+---+--+-+       |+---+---+--+-+     |+---+---+     |           |
+---------------------+---------------------+-------------------+--------------+-----------+
|+--+-+               |+---+---+-+-------+-+|+---+---+--+-+     |+---+---+--+-+|+---+      |
||66|1|               ||AK3|NM1|8|L1010_1|8|||AK4|1:0|66|1|     ||AK4|1:1|66|1|||AK3|      |
|+--+-+               |+---+---+-+-------+-+|+---+---+--+-+     |+---+---+--+-+|+---+      |
+---------------------+---------------------+-------------------+--------------+-----------+
|+---+-+-------+-+    |+---+---+--+-+       |+---+-+-+          |+---+-+-+-+-+ |+--+-+----+|
||NM1|8|L1010_2|8|    ||AK4|2:0|66|1|       ||AK5|R|5|          ||AK9|R|1|1|0| ||SE|8|2878||
|+---+-+-------+-+    |+---+---+--+-+       |+---+-+-+          |+---+-+-+-+-+ |+--+-+----+|
+---------------------+---------------------+-------------------+--------------+-----------+

chop msg2
+--------------------------------+
|+--+---+----+                   |
||ST|814|0001|                   |
|+--+---+----+                   |
+--------------------------------+
|+---+--+-------------+--------+ |
||BGN|13|2010063000001|20100630| |
|+---+--+-------------+--------+ |
+--------------------------------+
|+--+--+-------+-+---------+     |
||N1|8S|UTILITY|1|006912345|     |
|+--+--+-------+-+---------+     |
+--------------------------------+
|+--+--+--------+-+-------------+|
||N1|SJ|SUPPLIER|9|007909111IL00||
|+--+--+--------+-+-------------+|
+--------------------------------+
|+--+--+-------------+           |
||N1|8R|CUSTOMER NAME|           |
|+--+--+-------------+           |
+--------------------------------+
|+---+-+--+--+--+--+             |
||LIN|1|SH|EL|SH|CE|             |
|+---+-+--+--+--+--+             |
+--------------------------------+
|+---+-+---+                     |
||ASI|7|021|                     |
|+---+-+---+                     |
+--------------------------------+
|+---+--+----------+             |
||REF|11|0012345600|             |
|+---+--+----------+             |
+--------------------------------+
|+---+--+----------+             |
||REF|12|0312345624|             |
|+---+--+----------+             |
+--------------------------------+
|+---+---+---+                   |
||REF|BLT|LDC|                   |
|+---+---+---+                   |
+--------------------------------+
|+---+--+----+                   |
||REF|PC|DUAL|                   |
|+---+--+----+                   |
+--------------------------------+
|+---+--+-+                      |
||REF|9V|Y|                      |
|+---+--+-+                      |
+--------------------------------+
|+--+--+----+                    |
||SE|13|0001|                    |
|+--+--+----+                    |
+--------------------------------+

On Fri, Nov 13, 2015 at 2:31 PM, George Dallas <[email protected]> wrote:

> Hi Chris, thank you for the reply. I'll start studying J's cut. It looks
> like it'll require some hard studying from what I see in the dictionary
> entry for cut (pasted below).
>
> Regards,
> George
>
> *Cut* m;.n u;.n _ 1/2 _
>
> x u;.0 y applies u to a rectangle or cuboid of y with one vertex at the
> point in y indexed by v=:0{x , and with the opposite vertex determined as
> follows: the dimension is |1{x , but the rectangle extends *back* from v
> along any axis j for which the index j{v is negative.
> Finally, the order of the selected items is reversed along each axis k
> for which k{1{x is negative. If x is a vector, it is treated as the
> matrix 0,:x .
>
> ----------------------------------------------------------------------
> chris burke cburke at jsoftware.com
> *Fri Nov 13 18:53:56 UTC 2015*
>
> I did this some years ago and found that J can parse any given EDI format
> very efficiently, using cut to chop up the strings. You might need
> different functions for specific EDI formats, rather than a single function
> to parse arbitrary EDI.
>
>
> On Fri, Nov 13, 2015 at 12:36 PM, George Dallas <[email protected]>
> wrote:
>
> > Hi Joe, thank you for your reply. I am indeed thinking about a subset of
> > X12 messages and specifically 20 types of utility exchanges with power
> > suppliers, found on the link here:
> > https://www.ameren.com/business-partners/cpwg/illinois-edi-implementation-guide
> >
> > The x12parser you mentioned is a good and extensive project and with a
> > little work it might provide for what I need, but it's the verbosity of C#
> > used there that drives me towards thinking of a cleaner version that
> > possibly could be implemented in J.
> >
> > I'm wondering if given any specification, say the 997 you mentioned
> > below, the essence of the problem of converting an edi message to a flat
> > file in normalized form can be expressed concisely in J. If that were the
> > case, I suspect it would scale better and be a much faster implementation.
> >
> > If I were to go down this route are there any J facilities you'd
> > recommend for parsing and transforming text files?
> >
> > Thank you,
> >
> > George
> >
> > ----------------------------------------------------------------------
> >
> > On Fri, Nov 13, 2015 at 11:10 AM, George Dallas <george.dallas at
> > gmail.com> wrote:
> > > Hello,
> > >
> > > Has anyone had the chance to work with EDI data using J?
> >
> > Hi George, I have not, but I spent a few minutes looking into it.
> >
> > > Of course there is a huge industry out there spun to deal with this
> > > problem, but I was wondering if anyone has had to tackle the issue
> > > using J and if you think it's a doable project for J.
> >
> > I think we would need a bit more information about what you see for
> > the project. Are you interested in building a library in J capable of
> > parsing and interpreting all the various types of X12 messages or do
> > you just need to work with a subset?
> >
> > If you were working with a small subset then I would consider
> > implementing just what is necessary to parse those messages. If it's
> > many messages, then I would lean towards integrating with something
> > that has already solved the problem. The spec sounds reasonably
> > complex and to make use of the information, the definitions are
> > required.
> >
> > Here's one possible implementation to work with:
> > https://x12parser.codeplex.com/
> >
> > Here's the 997 specification out of the nearly 1000 options
> > https://x12parser.codeplex.com/SourceControl/latest#trunk/src/OopFactory.X12/Specifications/Ansi-997-4010Specification.xml
> >
> >
> > On Fri, Nov 13, 2015 at 10:10 AM, George Dallas <[email protected]>
> > wrote:
> >
> >> Hello,
> >>
> >> Has anyone had the chance to work with EDI data using J?
> >>
> >> EDI messages are text files formatted for facilitating business to
> >> business communications.
> >> If one has a sufficiently large history of these files and manages to
> >> insert them into a database, then querying the database would give
> >> answers to many business questions regarding customers, costs etc.
> >>
> >> The link and text pasted below I found to be a concise description of
> >> the problem.
> >>
> >> Of course there is a huge industry out there spun to deal with this
> >> problem, but I was wondering if anyone has had to tackle the issue
> >> using J and if you think it's a doable project for J.
> >>
> >> Regards,
> >> George
> >>
> >>
> >>
> >> https://github.com/pstuteville/x12
> >>
> >> == The problem
> >>
> >> X12 is a set of "standards" possessing all the elegance of an elephant
> >> designed by committee, and quite literally so, see http://www.x12.org.
> >> X12 defines rough syntax for specifying text messages, but each of
> >> more than 300 specifications defines its own message structure. While
> >> messages themselves are easy to parse with a simple tokenizer, their
> >> semantics is heavily dependent on the domain. For example, this is
> >> X12/997 message conveying "Functional Acknowledgment":
> >>
> >> ST*997*2878~AK1*HS*293328532~AK2*270*307272179~AK3*NM1*8*L1010_0*8~
> >> AK4*0:0*66*1~AK4*0:1*66*1~AK4*0:2*66*1~AK3*NM1*8*L1010_1*8~AK4*1:0*
> >> 66*1~AK4*1:1*66*1~AK3*NM1*8*L1010_2*8~AK4*2:0*66*1~AK5*R*5~AK9*R*1*
> >> 1*0~SE*8*2878~
> >>
> >> I.e., X12 defines an alphabet and somewhat of a dictionary - not a
> >> grammar or semantics for each particular data interchange
> >> conversation. Because of many entrenched implementations and
> >> government mandates, the X12 is not going to die anytime soon,
> >> unfortunately.
> >>
> >> The message above can be easily represented in Ruby as a nested array:
> >>
> >>   m = [
> >>     ['ST', '997', '2878'],
> >>     ['AK1', 'HS', '293328532'],
> >>     ['AK2', '270', '307272179'],
> >>     ['AK3', 'NM1', '8', 'L1010_0', '8'],
> >>     ['AK4', '0:0', '66', '1'],
> >>     ['AK4', '0:1', '66', '1'],
> >>     ['AK4', '0:2', '66', '1'],
> >>     ['AK3', 'NM1', '8', 'L1010_1', '8'],
> >>     ['AK4', '1:0', '66', '1'],
> >>     ['AK4', '1:1', '66', '1'],
> >>     ['AK3', 'NM1', '8', 'L1010_2', '8'],
> >>     ['AK4', '2:0', '66', '1'],
> >>     ['AK5', 'R', '5'],
> >>     ['AK9', 'R', '1', '1', '0'],
> >>     ['SE', '8', '2878'],
> >>   ]
> >>
> >> but it will not help any since, say, segment 'AK4' is ambiguously
> >> defined and its meaning not at all obvious until the message's
> >> structure is interpreted and the correct 'AK4' segment is found.
> >>
> >> == The solution
> >>
> >> === Message structure
> >>
> >> Each participant in EDI has to know the structure of the data coming
> >> across the wire - X12 or no X12. The X12 structures are defined in
> >> so-called Implementation Guides - thick books with all the data pieces
> >> spelled out. There is no other choice, but to invent a
> >> computer-readable definition language that will codify these
> >> books. For familiarity's sake we'll use XML.
> >> For example, the X12/997 message can be defined as
> >>
> >>   <Definition>
> >>     <Loop name="997">
> >>       <Segment name="ST" min="1" max="1"/>
> >>       <Segment name="AK1" min="1" max="1"/>
> >>       <Loop name="L1000" max="999999" required="y">
> >>         <Segment name="AK2" max="1" required="n"/>
> >>         <Loop name="L1010" max="999999" required="n">
> >>           <Segment name="AK3" max="1" required="n"/>
> >>           <Segment name="AK4" max="99" required="n"/>
> >>         </Loop>
> >>         <Segment name="AK5" max="1" required="y"/>
> >>       </Loop>
> >>       <Segment name="AK9" max="1" required="y"/>
> >>       <Segment name="SE" max="1" required="y"/>
> >>     </Loop>
> >>   </Definition>
> >>
> >> Namely, the 997 is a 'loop' containing segments ST (only one), AK1
> >> (also only one), another loop L1000 (zero or many repeats), segments
> >> AK9 and SE. The loop L1000 can contain a segment AK2 (optional) and
> >> another loop L1010 (zero or many), and so on.
> >>
> >> The segments' structure can be further defined as, for example,
> >>
> >>   <Segment name="AK2">
> >>     <Field name="TransactionSetIdentifierCode" required="y" min="3" max="3" validation="T143"/>
> >>     <Field name="TransactionSetControlNumber" required="y" min="4" max="9"/>
> >>   </Segment>
> >>
> >> which defines a segment AK2 as having two fields:
> >> TransactionSetIdentifierCode and TransactionSetControlNumber. The
> >> field TransactionSetIdentifierCode is defined as having a type of
> >> string (default), being required, having length of minimum 3 and
> >> maximum 3 characters, and being validated against a table T143. The
> >> validation table is defined as
> >>
> >>   <Table name="T143">
> >>     <Entry name="100" value="Insurance Plan Description"/>
> >>     <Entry name="101" value="Name and Address Lists"/>
> >>     ...
> >>     <Entry name="997" value="Functional Acknowledgment"/>
> >>     <Entry name="998" value="Set Cancellation"/>
> >>   </Table>
> >>
> >> with entries having just names and values.
> >>
> >> This message is fully fleshed out in an example 'misc/997.xml' file,
> >> copied from the ASC X12N 276/277 (004010X093) "Health Care
> >> Claim Status Request and Response" National Electronic Data
> >> Interchange Transaction Set Implementation Guide.
> >>
> >> Now expressions like
> >>
> >>   message.L1000.L1010[1].AK4.DataElementReferenceNumber
> >>
> >> start making sense of sorts, overall X12's idiocy notwithstanding - it's
> >> a field called 'DataElementReferenceNumber' of a first of possibly
> >> many segments 'AK4' found in the second repeat of the loop 'L1010'
> >> inside the enclosing loop 'L1000'. The meaning of the value '66' found
> >> in this field is still in the eye of the beholder, but, at least its
> >> location is clearly identified in the message.
> >>
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
