I did this some years ago and found that J can parse any given EDI format very efficiently, using cut to chop up the strings. You might need different functions for specific EDI formats, rather than a single function to parse arbitrary EDI.
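To illustrate the chop-up-the-strings idea, here is a minimal sketch in Ruby (the language used in the quoted example below) rather than J; it is not the original J code. It assumes the segment terminator is `~` and the element separator is `*`, as in the quoted 997 sample. Real interchanges declare their delimiters in the ISA header, so the method name `tokenize` and its defaults are illustrative only.

```ruby
# Raw X12/997 message from the quoted post, with the line wraps removed.
RAW = 'ST*997*2878~AK1*HS*293328532~AK2*270*307272179~AK3*NM1*8*L1010_0*8~' \
      'AK4*0:0*66*1~AK4*0:1*66*1~AK4*0:2*66*1~AK3*NM1*8*L1010_1*8~AK4*1:0*' \
      '66*1~AK4*1:1*66*1~AK3*NM1*8*L1010_2*8~AK4*2:0*66*1~AK5*R*5~AK9*R*1*' \
      '1*0~SE*8*2878~'

# Hypothetical helper: cut the text into segments, then each segment into
# elements. The delimiters are assumptions taken from the sample message.
def tokenize(raw, segment_terminator: '~', element_separator: '*')
  raw.split(segment_terminator)
     .map(&:strip)                 # tolerate line breaks between segments
     .reject(&:empty?)             # ignore blank pieces
     .map { |segment| segment.split(element_separator, -1) } # -1 keeps empty trailing elements
end

segments = tokenize(RAW)
p segments.first   # the ST segment
p segments[3]      # the first AK3 segment
```

A format-specific function would then sit on top of this, since the tokens carry no meaning until they are matched against that format's structure.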
On 13 November 2015 at 08:10, George Dallas <[email protected]> wrote:
> Hello,
>
> Has anyone had the chance to work with EDI data using J?
>
> EDI messages are text files formatted for facilitating business to business
> communications. If one has a sufficiently large history of these files and
> manages to insert them into a database, then querying the database would
> give answers to many business questions regarding customers, costs etc.
>
> I found the link and text pasted below to be a concise description of
> the problem.
>
> Of course there is a huge industry out there spun to deal with this
> problem, but I was wondering if anyone has had to tackle the issue using J
> and if you think it's a doable project for J.
>
> Regards,
> George
>
> https://github.com/pstuteville/x12
>
> == The problem
>
> X12 is a set of "standards" possessing all the elegance of an elephant
> designed by committee, and quite literally so, see http://www.x12.org.
> X12 defines rough syntax for specifying text messages, but each of
> more than 300 specifications defines its own message structure. While
> messages themselves are easy to parse with a simple tokenizer, their
> semantics is heavily dependent on the domain. For example, this is
> an X12/997 message conveying "Functional Acknowledgment":
>
>   ST*997*2878~AK1*HS*293328532~AK2*270*307272179~AK3*NM1*8*L1010_0*8~
>   AK4*0:0*66*1~AK4*0:1*66*1~AK4*0:2*66*1~AK3*NM1*8*L1010_1*8~AK4*1:0*
>   66*1~AK4*1:1*66*1~AK3*NM1*8*L1010_2*8~AK4*2:0*66*1~AK5*R*5~AK9*R*1*
>   1*0~SE*8*2878~
>
> I.e., X12 defines an alphabet and somewhat of a dictionary - not a
> grammar or semantics for each particular data interchange
> conversation. Because of many entrenched implementations and
> government mandates, the X12 is not going to die anytime soon,
> unfortunately.
>
> The message above can be easily represented in Ruby as a nested array:
>
>   m = [
>     ['ST', '997', '2878'],
>     ['AK1', 'HS', '293328532'],
>     ['AK2', '270', '307272179'],
>     ['AK3', 'NM1', '8', 'L1010_0', '8'],
>     ['AK4', '0:0', '66', '1'],
>     ['AK4', '0:1', '66', '1'],
>     ['AK4', '0:2', '66', '1'],
>     ['AK3', 'NM1', '8', 'L1010_1', '8'],
>     ['AK4', '1:0', '66', '1'],
>     ['AK4', '1:1', '66', '1'],
>     ['AK3', 'NM1', '8', 'L1010_2', '8'],
>     ['AK4', '2:0', '66', '1'],
>     ['AK5', 'R', '5'],
>     ['AK9', 'R', '1', '1', '0'],
>     ['SE', '8', '2878'],
>   ]
>
> but it will not help any since, say, segment 'AK4' is ambiguously
> defined and its meaning not at all obvious until the message's
> structure is interpreted and correct 'AK4' segment is found.
>
> == The solution
>
> === Message structure
>
> Each participant in EDI has to know the structure of the data coming
> across the wire - X12 or no X12. The X12 structures are defined in
> so-called Implementation Guides - thick books with all the data pieces
> spelled out. There is no other choice, but to invent a
> computer-readable definition language that will codify these
> books. For familiarity sake we'll use XML. For example, the X12/997
> message can be defined as
>
>   <Definition>
>     <Loop name="997">
>       <Segment name="ST" min="1" max="1"/>
>       <Segment name="AK1" min="1" max="1"/>
>       <Loop name="L1000" max="999999" required="y">
>         <Segment name="AK2" max="1" required="n"/>
>         <Loop name="L1010" max="999999" required="n">
>           <Segment name="AK3" max="1" required="n"/>
>           <Segment name="AK4" max="99" required="n"/>
>         </Loop>
>         <Segment name="AK5" max="1" required="y"/>
>       </Loop>
>       <Segment name="AK9" max="1" required="y"/>
>       <Segment name="SE" max="1" required="y"/>
>     </Loop>
>   </Definition>
>
> Namely, the 997 is a 'loop' containing segments ST (only one), AK1
> (also only one), another loop L1000 (zero or many repeats), segments
> AK9 and SE.
> The loop L1000 can contain a segment AK2 (optional) and
> another loop L1010 (zero or many), and so on.
>
> The segments' structure can be further defined as, for example,
>
>   <Segment name="AK2">
>     <Field name="TransactionSetIdentifierCode" required="y" min="3"
>            max="3" validation="T143"/>
>     <Field name="TransactionSetControlNumber" required="y" min="4"
>            max="9"/>
>   </Segment>
>
> which defines a segment AK2 as having two fields:
> TransactionSetIdentifierCode and TransactionSetControlNumber. The
> field TransactionSetIdentifierCode is defined as having a type of
> string (default), being required, having length of minimum 3 and
> maximum 3 characters, and being validated against a table T143. The
> validation table is defined as
>
>   <Table name="T143">
>     <Entry name="100" value="Insurance Plan Description"/>
>     <Entry name="101" value="Name and Address Lists"/>
>     ...
>     <Entry name="997" value="Functional Acknowledgment"/>
>     <Entry name="998" value="Set Cancellation"/>
>   </Table>
>
> with entries having just names and values.
>
> This message is fully fleshed out in an example 'misc/997.xml' file,
> copied from the ASC X12N 276/277 (004010X093) "Health Care
> Claim Status Request and Response" National Electronic Data
> Interchange Transaction Set Implementation Guide.
>
> Now expressions like
>
>   message.L1000.L1010[1].AK4.DataElementReferenceNumber
>
> start making sense of sorts, overall X12's idiocy notwithstanding - it's
> a field called 'DataElementReferenceNumber' of the first of possibly
> many segments 'AK4' found in the second repeat of the loop 'L1010'
> inside the enclosing loop 'L1000'. The meaning of the value '66' found
> in this field is still in the eye of the beholder, but, at least its
> location is clearly identified in the message.
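As a rough illustration of how the quoted loop definition gives each segment a location, the sketch below groups the flat 997 segments into nested loops and then looks up the field addressed by the quoted path expression. It is not the x12 gem's API. The array-based `DEF_997` is a hand-written, hypothetical stand-in for the XML definition, and `first_segment` and `parse_loop` are illustrative names. It also simplifies heavily: min/max/required are ignored, and a loop repeat is assumed to start at the loop's first listed segment.

```ruby
# Raw X12/997 message from the quoted post, with the line wraps removed.
RAW_997 = 'ST*997*2878~AK1*HS*293328532~AK2*270*307272179~AK3*NM1*8*L1010_0*8~' \
          'AK4*0:0*66*1~AK4*0:1*66*1~AK4*0:2*66*1~AK3*NM1*8*L1010_1*8~AK4*1:0*' \
          '66*1~AK4*1:1*66*1~AK3*NM1*8*L1010_2*8~AK4*2:0*66*1~AK5*R*5~AK9*R*1*' \
          '1*0~SE*8*2878~'

# Flat tokenizing: split on the segment terminator, then the element separator.
SEGMENTS = RAW_997.split('~').map { |s| s.split('*', -1) }

# Hypothetical stand-in for the quoted XML: a loop is [name, children],
# a segment is just its name.
DEF_997 = ['997', [
  'ST', 'AK1',
  ['L1000', ['AK2', ['L1010', %w[AK3 AK4]], 'AK5']],
  'AK9', 'SE'
]].freeze

# Name of the segment that is assumed to open a loop (its first listed segment).
def first_segment(defn)
  child = defn[1].first
  child.is_a?(String) ? child : first_segment(child)
end

# Consume segments from position pos according to defn.
# Returns [node, next_pos], where node maps each child name to an array of
# segments (for a segment child) or of nested nodes (for a loop child).
def parse_loop(defn, segs, pos)
  node = {}
  defn[1].each do |child|
    if child.is_a?(String)
      # Simplification: take every consecutive segment with this name.
      while pos < segs.size && segs[pos][0] == child
        (node[child] ||= []) << segs[pos]
        pos += 1
      end
    else
      # Each occurrence of the opening segment starts a new loop repeat.
      while pos < segs.size && segs[pos][0] == first_segment(child)
        sub, pos = parse_loop(child, segs, pos)
        (node[child[0]] ||= []) << sub
      end
    end
  end
  [node, pos]
end

message, consumed = parse_loop(DEF_997, SEGMENTS, 0)

# message.L1000.L1010[1].AK4.DataElementReferenceNumber from the quoted text:
# second repeat of L1010, first AK4, second data element (array index 2).
ak4 = message['L1000'][0]['L1010'][1]['AK4'][0]
puts ak4[2]   # prints 66
```

The point of the sketch is only that the tokenizing step is generic while the grouping step depends on a per-format definition, which matches the suggestion to write format-specific functions.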
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
