[Jprogramming] Parsing EDI data and converting them into a database format

George Dallas Fri, 13 Nov 2015 08:11:30 -0800

Hello,

Has anyone had the chance to work with EDI data using J?


EDI messages are text files formatted to facilitate business-to-business
communications. If one has a sufficiently large history of these files and
manages to insert them into a database, then querying the database would
answer many business questions regarding customers, costs, etc.

I found the link and text pasted below to be a concise description of
the problem.

Of course there is a huge industry out there built to deal with this
problem, but I was wondering whether anyone has had to tackle the issue
using J, and whether you think it's a doable project for J.

Regards,
George



https://github.com/pstuteville/x12

== The problem

X12 is a set of "standards" possessing all the elegance of an elephant
designed by committee, and quite literally so (see http://www.x12.org).
X12 defines a rough syntax for specifying text messages, but each of
more than 300 specifications defines its own message structure. While
the messages themselves are easy to parse with a simple tokenizer, their
semantics depend heavily on the domain. For example, this is an
X12/997 message conveying a "Functional Acknowledgment":

  ST*997*2878~AK1*HS*293328532~AK2*270*307272179~AK3*NM1*8*L1010_0*8~
  AK4*0:0*66*1~AK4*0:1*66*1~AK4*0:2*66*1~AK3*NM1*8*L1010_1*8~AK4*1:0*
  66*1~AK4*1:1*66*1~AK3*NM1*8*L1010_2*8~AK4*2:0*66*1~AK5*R*5~AK9*R*1*
  1*0~SE*8*2878~

I.e., X12 defines an alphabet and something of a dictionary, not a
grammar or semantics for each particular data interchange
conversation. Because of many entrenched implementations and
government mandates, X12 is not going to die anytime soon,
unfortunately.

The message above can be easily represented in Ruby as a nested array:

 m = [
      ['ST', '997', '2878'],
      ['AK1', 'HS', '293328532'],
      ['AK2', '270', '307272179'],
      ['AK3', 'NM1', '8', 'L1010_0', '8'],
      ['AK4', '0:0', '66', '1'],
      ['AK4', '0:1', '66', '1'],
      ['AK4', '0:2', '66', '1'],
      ['AK3', 'NM1', '8', 'L1010_1', '8'],
      ['AK4', '1:0', '66', '1'],
      ['AK4', '1:1', '66', '1'],
      ['AK3', 'NM1', '8', 'L1010_2', '8'],
      ['AK4', '2:0', '66', '1'],
      ['AK5', 'R', '5'],
      ['AK9', 'R', '1', '1', '0'],
      ['SE', '8', '2878'],
     ]

but it will not help much, since, say, segment 'AK4' is ambiguously
defined and its meaning is not at all obvious until the message's
structure is interpreted and the correct 'AK4' segment is found.
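The claim that X12 text is easy to tokenize can be checked with a few lines of Ruby that turn the raw 997 text into the nested array above. This is a minimal sketch, assuming '~' terminates segments and '*' separates elements as in this example; real interchanges declare their delimiters in the ISA envelope, which the sketch ignores. The method name `tokenize` is hypothetical.

```ruby
# Minimal X12 tokenizer sketch (hypothetical helper, not the x12 gem's API).
# Assumes '~' ends a segment and '*' separates elements, as in the 997 example.
def tokenize(raw, segment_sep: '~', element_sep: '*')
  raw.delete("\r\n")                             # undo line wrapping
     .split(segment_sep)                         # one string per segment
     .reject(&:empty?)                           # drop empty pieces
     .map { |seg| seg.split(element_sep, -1) }   # -1 keeps empty trailing elements
end

raw = <<~EDI
  ST*997*2878~AK1*HS*293328532~AK2*270*307272179~AK3*NM1*8*L1010_0*8~
  AK4*0:0*66*1~AK4*0:1*66*1~AK4*0:2*66*1~AK3*NM1*8*L1010_1*8~AK4*1:0*
  66*1~AK4*1:1*66*1~AK3*NM1*8*L1010_2*8~AK4*2:0*66*1~AK5*R*5~AK9*R*1*
  1*0~SE*8*2878~
EDI

m = tokenize(raw)
p m.first   # => ["ST", "997", "2878"]
p m.length  # => 15
```

Stripping newlines before splitting is what lets segments broken across lines, such as `AK4*1:0*` / `66*1~`, come back together.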

== The solution

=== Message structure

Each participant in EDI has to know the structure of the data coming
across the wire, X12 or no X12. The X12 structures are defined in
so-called Implementation Guides, thick books with all the data pieces
spelled out. There is no choice but to invent a
computer-readable definition language that codifies these
books. For familiarity's sake we'll use XML. For example, the X12/997
message can be defined as

  <Definition>
    <Loop name="997">
      <Segment name="ST" min="1" max="1"/>
      <Segment name="AK1" min="1" max="1"/>
      <Loop name="L1000" max="999999" required="y">
        <Segment name="AK2" max="1" required="n"/>
        <Loop name="L1010" max="999999" required="n">
          <Segment name="AK3" max="1" required="n"/>
          <Segment name="AK4" max="99" required="n"/>
        </Loop>
        <Segment name="AK5" max="1" required="y"/>
      </Loop>
      <Segment name="AK9" max="1" required="y"/>
      <Segment name="SE"  max="1" required="y"/>
    </Loop>
  </Definition>

Namely, the 997 is a 'loop' containing segments ST (only one), AK1
(also only one), another loop L1000 (zero or many repeats), segments
AK9 and SE. The loop L1000 can contain a segment AK2 (optional) and
another loop L1010 (zero or many), and so on.
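Grouping the flat segment list into these loops can be sketched as a greedy recursive-descent matcher. This is one possible approach, not the gem's implementation. It assumes the XML definition has been hand-transcribed into hypothetical `SegDef`/`LoopDef` structs, that a loop repeat starts only when the loop's first segment appears (so an L1000 repeat lacking its optional AK2 would be missed), and that `min`/`required` are not checked.

```ruby
# Hypothetical definition nodes mirroring the XML <Segment>/<Loop> elements.
# Only name and max are modelled; min/required checks are omitted.
SegDef  = Struct.new(:name, :max)
LoopDef = Struct.new(:name, :max, :children)

DEF_997 = LoopDef.new("997", 1, [
  SegDef.new("ST", 1),
  SegDef.new("AK1", 1),
  LoopDef.new("L1000", 999_999, [
    SegDef.new("AK2", 1),
    LoopDef.new("L1010", 999_999, [
      SegDef.new("AK3", 1),
      SegDef.new("AK4", 99)
    ]),
    SegDef.new("AK5", 1)
  ]),
  SegDef.new("AK9", 1),
  SegDef.new("SE", 1)
])

# Segment ID that opens a node: its own name for a segment, or the
# first child's trigger for a loop (assumes the first child is present).
def trigger(node)
  node.is_a?(SegDef) ? node.name : trigger(node.children.first)
end

# Matches one repeat of loop `defn` starting at segs[pos], greedily taking
# each child in order. Returns [hash of child name => matches, next pos].
def parse_loop(defn, segs, pos)
  result = {}
  defn.children.each do |child|
    found = []
    while found.size < child.max && pos < segs.size && segs[pos][0] == trigger(child)
      if child.is_a?(SegDef)
        found << segs[pos]        # a segment is kept as its element array
        pos += 1
      else
        repeat, pos = parse_loop(child, segs, pos)
        found << repeat           # a loop repeat is a nested hash
      end
    end
    result[child.name] = found
  end
  [result, pos]
end

# The tokenized 997 message (same nested array as above).
m = [
  ['ST', '997', '2878'], ['AK1', 'HS', '293328532'], ['AK2', '270', '307272179'],
  ['AK3', 'NM1', '8', 'L1010_0', '8'],
  ['AK4', '0:0', '66', '1'], ['AK4', '0:1', '66', '1'], ['AK4', '0:2', '66', '1'],
  ['AK3', 'NM1', '8', 'L1010_1', '8'],
  ['AK4', '1:0', '66', '1'], ['AK4', '1:1', '66', '1'],
  ['AK3', 'NM1', '8', 'L1010_2', '8'],
  ['AK4', '2:0', '66', '1'],
  ['AK5', 'R', '5'], ['AK9', 'R', '1', '1', '0'], ['SE', '8', '2878']
]

tree, consumed = parse_loop(DEF_997, m, 0)
raise "unmatched segment at index #{consumed}" if consumed < m.size

p tree["L1000"][0]["L1010"].size          # => 3
p tree["L1000"][0]["L1010"][1]["AK4"][0]  # => ["AK4", "1:0", "66", "1"]
```

The ambiguous 'AK4' segments now sit under a specific L1010 repeat, which is exactly the context a flat array could not provide.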

The segments' structure can be further defined as, for example,

  <Segment name="AK2">
    <Field name="TransactionSetIdentifierCode" required="y" min="3" max="3" validation="T143"/>
    <Field name="TransactionSetControlNumber"  required="y" min="4" max="9"/>
  </Segment>

which defines a segment AK2 as having two fields:
TransactionSetIdentifierCode and TransactionSetControlNumber. The
field TransactionSetIdentifierCode is defined as having a type of
string (the default), being required, being exactly 3 characters long
(minimum 3, maximum 3), and being validated against table T143. The
validation table is defined as

  <Table name="T143">
    <Entry name="100" value="Insurance Plan Description"/>
    <Entry name="101" value="Name and Address Lists"/>
    ...
    <Entry name="997" value="Functional Acknowledgment"/>
    <Entry name="998" value="Set Cancellation"/>
  </Table>

with entries having just names and values.
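Applying such field definitions to a tokenized segment might look like the Ruby sketch below. `FieldDef` and `decode_segment` are hypothetical names. The T143 hash holds only the entries quoted above (the full table is elided), and the two AK2 segments in the usage lines are made-up inputs chosen to show one passing and one failing case.

```ruby
# Hypothetical field definition mirroring the <Field> attributes above.
FieldDef = Struct.new(:name, :required, :min, :max, :table)

# Partial T143: only the entries quoted in the text, not the full table.
T143 = {
  "100" => "Insurance Plan Description",
  "101" => "Name and Address Lists",
  "997" => "Functional Acknowledgment",
  "998" => "Set Cancellation"
}

AK2_FIELDS = [
  FieldDef.new("TransactionSetIdentifierCode", true, 3, 3, T143),
  FieldDef.new("TransactionSetControlNumber",  true, 4, 9, nil)
]

# Names each element of a segment and checks presence, length and table.
# Returns [hash of field name => value, array of error messages].
def decode_segment(segment, field_defs)
  values = segment.drop(1)  # element 0 is the segment ID
  named  = {}
  errors = []
  field_defs.each_with_index do |f, i|
    v = values[i].to_s
    if v.empty?
      errors << "#{f.name}: required" if f.required
      next
    end
    unless v.length.between?(f.min, f.max)
      errors << "#{f.name}: length #{v.length} not in #{f.min}..#{f.max}"
    end
    errors << "#{f.name}: #{v.inspect} not in table" if f.table && !f.table.key?(v)
    named[f.name] = v
  end
  [named, errors]
end

named, errors = decode_segment(["AK2", "997", "12345"], AK2_FIELDS)
p errors.empty?  # => true
_, bad = decode_segment(["AK2", "99", "12"], AK2_FIELDS)
p bad.size       # => 3
```

In the failing case, "99" breaks both the 3..3 length rule and the T143 lookup, and "12" is shorter than the 4-character minimum.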

This message is fully fleshed out in an example 'misc/997.xml' file,
copied from the ASC X12N 276/277 (004010X093) "Health Care
Claim Status Request and Response" National Electronic Data
Interchange Transaction Set Implementation Guide.

Now expressions like

  message.L1000.L1010[1].AK4.DataElementReferenceNumber

start making sense of sorts, X12's overall idiocy notwithstanding: it's
a field called 'DataElementReferenceNumber' of the first of possibly
many 'AK4' segments found in the second repeat of the loop 'L1010'
inside the enclosing loop 'L1000'. The meaning of the value '66' found
in this field is still in the eye of the beholder, but at least its
location is clearly identified in the message.
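With the message held as nested hashes and arrays, that path expression has a rough plain-Ruby counterpart using `Hash#dig`/`Array#dig`. This is a sketch under assumptions: the `message` hash is a hand-written fragment of such a nested structure, `L1000` without an index is taken to mean its first repeat, and the `AK4_FIELDS` names other than DataElementReferenceNumber, as well as the `field` helper, are hypothetical.

```ruby
# Hypothetical AK4 field order; only DataElementReferenceNumber is from the text.
AK4_FIELDS = %w[PositionInSegment DataElementReferenceNumber DataElementSyntaxErrorCode]

# Looks up a named field in a segment's element array.
def field(segment, field_names, name)
  idx = field_names.index(name) or raise KeyError, "unknown field #{name}"
  segment[idx + 1]  # +1 skips the segment ID
end

# Hand-written fragment of the 997 message as nested loops.
message = {
  "L1000" => [
    { "AK2"   => [["AK2", "270", "307272179"]],
      "L1010" => [
        { "AK3" => [["AK3", "NM1", "8", "L1010_0", "8"]],
          "AK4" => [["AK4", "0:0", "66", "1"], ["AK4", "0:1", "66", "1"], ["AK4", "0:2", "66", "1"]] },
        { "AK3" => [["AK3", "NM1", "8", "L1010_1", "8"]],
          "AK4" => [["AK4", "1:0", "66", "1"], ["AK4", "1:1", "66", "1"]] },
        { "AK3" => [["AK3", "NM1", "8", "L1010_2", "8"]],
          "AK4" => [["AK4", "2:0", "66", "1"]] }
      ],
      "AK5"   => [["AK5", "R", "5"]] }
  ]
}

# message.L1000.L1010[1].AK4.DataElementReferenceNumber, spelled with dig:
ak4 = message.dig("L1000", 0, "L1010", 1, "AK4", 0)
puts field(ak4, AK4_FIELDS, "DataElementReferenceNumber")  # prints 66
```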
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
