Hello, Has anyone had the chance to work with EDI data using J?
EDI messages are text files formatted to facilitate business-to-business communication. If one has a sufficiently large history of these files and manages to load them into a database, then querying the database would answer many business questions regarding customers, costs, etc. I found the link and text pasted below to be a concise description of the problem. Of course there is a huge industry out there built to deal with this problem, but I was wondering whether anyone has had to tackle the issue using J, and whether you think it's a doable project for J.

Regards,
George

https://github.com/pstuteville/x12

== The problem

X12 is a set of "standards" possessing all the elegance of an elephant designed by committee, and quite literally so, see http://www.x12.org. X12 defines rough syntax for specifying text messages, but each of more than 300 specifications defines its own message structure. While messages themselves are easy to parse with a simple tokenizer, their semantics are heavily dependent on the domain. For example, this is an X12/997 message conveying "Functional Acknowledgment":

  ST*997*2878~AK1*HS*293328532~AK2*270*307272179~AK3*NM1*8*L1010_0*8~
  AK4*0:0*66*1~AK4*0:1*66*1~AK4*0:2*66*1~AK3*NM1*8*L1010_1*8~AK4*1:0*
  66*1~AK4*1:1*66*1~AK3*NM1*8*L1010_2*8~AK4*2:0*66*1~AK5*R*5~AK9*R*1*
  1*0~SE*8*2878~

I.e., X12 defines an alphabet and somewhat of a dictionary - not a grammar or semantics for each particular data interchange conversation. Because of many entrenched implementations and government mandates, X12 is not going to die anytime soon, unfortunately.
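As a rough illustration of the "simple tokenizer" mentioned above, here is a minimal Ruby sketch. The helper name `tokenize_x12` is made up for this example and is not part of the x12 library. It assumes `~` terminates segments and `*` separates fields, as in the sample; real interchanges may declare different delimiters.

```ruby
# Hypothetical helper (not from the x12 gem): split raw X12 text into
# segments, and each segment into fields.
# Assumption: '~' ends a segment and '*' separates fields, as in the sample.
def tokenize_x12(raw, segment_terminator: '~', field_separator: '*')
  raw.delete("\r\n")                       # undo line wrapping only
     .split(segment_terminator)
     .reject(&:empty?)
     .map { |segment| segment.split(field_separator, -1) } # -1 keeps trailing empty fields
end

raw = <<~X12
  ST*997*2878~AK1*HS*293328532~AK2*270*307272179~AK3*NM1*8*L1010_0*8~
  AK4*0:0*66*1~AK4*0:1*66*1~AK4*0:2*66*1~AK3*NM1*8*L1010_1*8~AK4*1:0*
  66*1~AK4*1:1*66*1~AK3*NM1*8*L1010_2*8~AK4*2:0*66*1~AK5*R*5~AK9*R*1*
  1*0~SE*8*2878~
X12

segments = tokenize_x12(raw)
p segments.length  # => 15
p segments.first   # => ["ST", "997", "2878"]
```

The result is a flat list of segments, each an array of strings; nothing in it says which loop a given segment belongs to.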
The message above can be easily represented in Ruby as a nested array:

  m = [
    ['ST', '997', '2878'],
    ['AK1', 'HS', '293328532'],
    ['AK2', '270', '307272179'],
    ['AK3', 'NM1', '8', 'L1010_0', '8'],
    ['AK4', '0:0', '66', '1'],
    ['AK4', '0:1', '66', '1'],
    ['AK4', '0:2', '66', '1'],
    ['AK3', 'NM1', '8', 'L1010_1', '8'],
    ['AK4', '1:0', '66', '1'],
    ['AK4', '1:1', '66', '1'],
    ['AK3', 'NM1', '8', 'L1010_2', '8'],
    ['AK4', '2:0', '66', '1'],
    ['AK5', 'R', '5'],
    ['AK9', 'R', '1', '1', '0'],
    ['SE', '8', '2878'],
  ]

but it will not help any since, say, segment 'AK4' is ambiguously defined and its meaning is not at all obvious until the message's structure is interpreted and the correct 'AK4' segment is found.

== The solution

=== Message structure

Each participant in EDI has to know the structure of the data coming across the wire - X12 or no X12. The X12 structures are defined in so-called Implementation Guides - thick books with all the data pieces spelled out. There is no other choice but to invent a computer-readable definition language that will codify these books. For familiarity's sake we'll use XML. For example, the X12/997 message can be defined as

  <Definition>
    <Loop name="997">
      <Segment name="ST" min="1" max="1"/>
      <Segment name="AK1" min="1" max="1"/>
      <Loop name="L1000" max="999999" required="y">
        <Segment name="AK2" max="1" required="n"/>
        <Loop name="L1010" max="999999" required="n">
          <Segment name="AK3" max="1" required="n"/>
          <Segment name="AK4" max="99" required="n"/>
        </Loop>
        <Segment name="AK5" max="1" required="y"/>
      </Loop>
      <Segment name="AK9" max="1" required="y"/>
      <Segment name="SE" max="1" required="y"/>
    </Loop>
  </Definition>

Namely, the 997 is a 'loop' containing segments ST (only one), AK1 (also only one), another loop L1000 (zero or many repeats), and segments AK9 and SE. The loop L1000 can contain a segment AK2 (optional) and another loop L1010 (zero or many), and so on.
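To make the loop definition concrete, here is a hedged Ruby sketch that groups the flat segment list into nested loops. The hash-based form of the definition and the `parse_loop` helper are assumptions made for this example, not the x12 library's API. The matching is greedy with no backtracking, which is enough for the 997 layout but may not hold for every transaction set, and the `min`/`required` constraints are not checked.

```ruby
# Hypothetical in-memory form of the 997 definition (mirrors the XML above).
definition = {
  loop: '997', max: 1, children: [
    { segment: 'ST',  max: 1 },
    { segment: 'AK1', max: 1 },
    { loop: 'L1000', max: 999_999, children: [
      { segment: 'AK2', max: 1 },
      { loop: 'L1010', max: 999_999, children: [
        { segment: 'AK3', max: 1 },
        { segment: 'AK4', max: 99 }
      ] },
      { segment: 'AK5', max: 1 }
    ] },
    { segment: 'AK9', max: 1 },
    { segment: 'SE',  max: 1 }
  ]
}

# Parse one repeat of a loop starting at pos.
# Returns [hash of child name => matched items, position after the loop].
def parse_loop(loop_def, segments, pos)
  result = {}
  loop_def[:children].each do |child|
    items = []
    while items.size < child[:max]
      if child.key?(:segment)
        break unless pos < segments.size && segments[pos][0] == child[:segment]
        items << segments[pos]
        pos += 1
      else
        parsed, next_pos = parse_loop(child, segments, pos)
        break if next_pos == pos # nested loop matched nothing; stop repeating
        items << parsed
        pos = next_pos
      end
    end
    name = child[:segment] || child[:loop]
    result[name] = items unless items.empty?
  end
  [result, pos]
end

segments = [
  %w[ST 997 2878], %w[AK1 HS 293328532], %w[AK2 270 307272179],
  %w[AK3 NM1 8 L1010_0 8], %w[AK4 0:0 66 1], %w[AK4 0:1 66 1], %w[AK4 0:2 66 1],
  %w[AK3 NM1 8 L1010_1 8], %w[AK4 1:0 66 1], %w[AK4 1:1 66 1],
  %w[AK3 NM1 8 L1010_2 8], %w[AK4 2:0 66 1],
  %w[AK5 R 5], %w[AK9 R 1 1 0], %w[SE 8 2878]
]

tree, consumed = parse_loop(definition, segments, 0)

# First AK4 in the second repeat of L1010 inside the first repeat of L1000:
p tree['L1000'][0]['L1010'][1]['AK4'][0]  # => ["AK4", "1:0", "66", "1"]
```

Here `consumed` is the number of segments matched, so a value smaller than `segments.size` would indicate segments the definition could not place.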
The segments' structure can be further defined as, for example,

  <Segment name="AK2">
    <Field name="TransactionSetIdentifierCode" required="y" min="3" max="3" validation="T143"/>
    <Field name="TransactionSetControlNumber" required="y" min="4" max="9"/>
  </Segment>

which defines a segment AK2 as having two fields: TransactionSetIdentifierCode and TransactionSetControlNumber. The field TransactionSetIdentifierCode is defined as having a type of string (default), being required, having a length of minimum 3 and maximum 3 characters, and being validated against a table T143. The validation table is defined as

  <Table name="T143">
    <Entry name="100" value="Insurance Plan Description"/>
    <Entry name="101" value="Name and Address Lists"/>
    ...
    <Entry name="997" value="Functional Acknowledgment"/>
    <Entry name="998" value="Set Cancellation"/>
  </Table>

with entries having just names and values.

This message is fully fleshed out in an example 'misc/997.xml' file, copied from the ASC X12N 276/277 (004010X093) "Health Care Claim Status Request and Response" National Electronic Data Interchange Transaction Set Implementation Guide.

Now expressions like

  message.L1000.L1010[1].AK4.DataElementReferenceNumber

start making sense of sorts, overall X12's idiocy notwithstanding - it's a field called 'DataElementReferenceNumber' of the first of possibly many segments 'AK4' found in the second repeat of the loop 'L1010' inside the enclosing loop 'L1000'. The meaning of the value '66' found in this field is still in the eye of the beholder, but at least its location is clearly identified in the message.
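The field and table definitions for AK2 can be applied to a parsed segment along these lines. This is only a sketch: the hash layout and the helpers `name_fields` and `field_errors` are invented for illustration, and the table holds just the four T143 entries quoted above, not the full code list.

```ruby
# Hypothetical Ruby form of the AK2 field definitions quoted above.
ak2_fields = [
  { name: 'TransactionSetIdentifierCode', required: true, min: 3, max: 3, validation: 'T143' },
  { name: 'TransactionSetControlNumber',  required: true, min: 4, max: 9 }
]

# Only the entries shown in the excerpt; the real table is longer.
tables = {
  'T143' => {
    '100' => 'Insurance Plan Description',
    '101' => 'Name and Address Lists',
    '997' => 'Functional Acknowledgment',
    '998' => 'Set Cancellation'
  }
}

# Pair each field definition with the value at the same position
# (segment[0] is the segment name, so values start at index 1).
def name_fields(segment, field_defs)
  values = segment.drop(1)
  field_defs.each_with_index.map { |field, i| [field[:name], values[i]] }.to_h
end

# Check the required flag and the min/max length of every field.
def field_errors(named, field_defs)
  field_defs.each_with_object([]) do |field, errors|
    value = named[field[:name]].to_s
    if value.empty?
      errors << "#{field[:name]} is required" if field[:required]
    elsif !value.length.between?(field[:min], field[:max])
      errors << "#{field[:name]} has length #{value.length}, expected #{field[:min]}..#{field[:max]}"
    end
  end
end

ak2 = name_fields(['AK2', '270', '307272179'], ak2_fields)
p ak2['TransactionSetControlNumber']  # => "307272179"
p field_errors(ak2, ak2_fields)       # => []

# Table lookup for a code that appears in the excerpt:
p tables['T143']['997']               # => "Functional Acknowledgment"
```

Once the values are keyed by field name they map naturally onto database columns, which is the kind of querying asked about at the top of this post.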
