Thanks Raul! I would have totally missed that and I think this cut must also be what Joe used in his chop function.
Regards,
George

----------------------------------------------------------------------

Raul Miller rauldmiller at gmail.com
Fri Nov 13 20:30:03 UTC 2015

Please keep in mind that there's cut and there's
http://www.jsoftware.com/help/dictionary/d331.htm

Here's the cut which I think Chris was referring to:

   cut
' '&$: :([: -.&a: <;._2@,~)

That verb actually uses
http://www.jsoftware.com/help/dictionary/d331.htm but it's predefined
to break on a specific character (which defaults to ' ' but can be
specified as the left argument).

   cut 'this is a test'
+----+--+-+----+
|this|is|a|test|
+----+--+-+----+
   't' cut 'this is a test'
+---------+--+
|his is a |es|
+---------+--+

I hope this helps,

-- 
Raul

On Fri, Nov 13, 2015 at 3:49 PM, George Dallas wrote:
> Joe, this is amazing!! What an incredibly powerful one-line function!! This
> is not just a step in the right direction, but actually you're putting me in
> a cannon and shooting me flying towards the right direction :-)
>
> Of course there is a lot to study here for me, but enabling me to study in
> the context of the problem is extremely helpful.
>
> Thank you very much!
>
> George
>
> ----------------------------------------------------------------------
>
> Joe Bogner joebogner at gmail.com
> Fri Nov 13 20:07:31 UTC 2015
>
> George, here's some ideas to get you started:
>
> Forgive the terrible display in email. See
> this: https://gist.github.com/joebo/42c914ba332c9e5d628c
>
>
> msg=: 0 : 0
> ST*997*2878~AK1*HS*293328532~AK2*270*307272179~
> AK3*NM1*8*L1010_0*8~AK4*0:0*66*1~AK4*0:1*66*1~AK4*0:2*
> 66*1~AK3*NM1*8*L1010_1*8~AK4*1:0*66*1~AK4*1:1*66*1~AK3*
> NM1*8*L1010_2*8~AK4*2:0*66*1~AK5*R*5~AK9*R*1*1*0~SE*8*2878~
> )
>
> NB. from https://www.ameren.com/-/media/corporate-site/Files/BusinessPartners/CPWG/CPWGIL814E-Request.pdf
> msg2 =: 0 : 0
> ST*814*0001
> BGN*13*2010063000001*20100630
> N1*8S*UTILITY*1*006912345
> N1*SJ*SUPPLIER*9*007909111IL00
> N1*8R*CUSTOMER NAME
> LIN*1*SH*EL*SH*CE
> ASI*7*021
> REF*11*0012345600
> REF*12*0312345624
> REF*BLT*LDC
> REF*PC*DUAL
> REF*9V*Y
> SE*13*0001
> )
>
> chop=: >@: ((('*' cut ]&dlb) each each) @: ('~' cut each ]) @: (LF cut ]))
>
>
> On Fri, Nov 13, 2015 at 1:31 PM, George Dallas wrote:
>
>> Hi Chris, thank you for the reply. I'll start studying J's cut. It looks
>> like it'll require some hard studying from what I see in the dictionary
>> entry for cut (pasted below).
>>
>> Regards,
>> George
>>
>> *Cut* m;.n u;.n _ 1/2 _
>>
>> x u;.0 y applies u to a rectangle or cuboid of y with one vertex at the
>> point in y indexed by v=:0{x , and with the opposite vertex determined
>> as follows: the dimension is |1{x , but the rectangle extends *back* from
>> v along any axis j for which the index j{v is negative.
>> Finally, the order of the selected items is reversed along each axis k
>> for which k{1{x is negative. If x is a vector, it is treated as the
>> matrix 0,:x .
>>
>>
>>
>> ----------------------------------------------------------------------
>> chris burke cburke at jsoftware.com
>> Fri Nov 13 18:53:56 UTC 2015
>>
>> I did this some years ago and found that J can parse any given EDI format
>> very efficiently, using cut to chop up the strings. You might need
>> different functions for specific EDI formats, rather than a single function
>> to parse arbitrary EDI.
>>
>>
>> On Fri, Nov 13, 2015 at 12:36 PM, George Dallas wrote:
>>
>>> Hi Joe, thank you for your reply. I am indeed thinking about a subset of
>>> X12 messages and specifically 20 types of utility exchanges with power
>>> suppliers, found on the link here:
>>> https://www.ameren.com/business-partners/cpwg/illinois-edi-implementation-guide.
>>>
>>> The x12parser you mentioned is a good and extensive project and with a
>>> little work it might provide for what I need, but it's the verbosity of C#
>>> used there that drives me towards thinking of a cleaner version that
>>> possibly could be implemented in J.
>>>
>>> I'm wondering if given any specification, say the 997 you mentioned below,
>>> the essence of the problem of converting an edi message to a flat file in
>>> normalized form can be expressed concisely in J. If that were the case, I
>>> suspect it would scale better and be a much faster implementation.
>>>
>>> If I were to go down this route are there any J facilities you'd recommend
>>> for parsing and transforming text files?
>>>
>>> Thank you,
>>>
>>> George
>>>
>>>
>>> ----------------------------------------------------------------------
>>>
>>> On Fri, Nov 13, 2015 at 11:10 AM, George Dallas <george.dallas at gmail.com> wrote:
>>> > Hello,
>>> >
>>> > Has anyone had the chance to work with EDI data using J?
>>>
>>> Hi George, I have not, but I spent a few minutes looking into it.
>>>
>>> > Of course there is a huge industry out there spun to deal with this
>>> > problem, but I was wondering if anyone has had to tackle the issue
>>> > using J and if you think it's a doable project for J.
>>>
>>> I think we would need a bit more information about what you see for
>>> the project. Are you interested in building a library in J capable of
>>> parsing and interpreting all the various types of X12 messages or do
>>> you just need to work with a subset?
>>>
>>> If you were working with a small subset then I would consider
>>> implementing just what is necessary to parse those messages. If it's
>>> many messages, then I would lean towards integrating with something
>>> that has already solved the problem. The spec sounds reasonably
>>> complex and to make use of the information, the definitions are
>>> required.
>>>
>>> Here's one possible implementation to work with:
>>> https://x12parser.codeplex.com/
>>>
>>> Here's the 997 specification out of the nearly 1000 options
>>> https://x12parser.codeplex.com/SourceControl/latest#trunk/src/OopFactory.X12/Specifications/Ansi-997-4010Specification.xml
>>>
>>>
>>> On Fri, Nov 13, 2015 at 10:10 AM, George Dallas wrote:
>>>
>>>> Hello,
>>>>
>>>> Has anyone had the chance to work with EDI data using J?
>>>>
>>>> EDI messages are text files formatted for facilitating business to
>>>> business communications.
>>>> If one has a sufficiently large history of these files and manages to
>>>> insert them into a database, then querying the database would give
>>>> answers to many business questions regarding customers, costs etc.
>>>>
>>>> I found the link and text pasted below to be a concise description
>>>> of the problem.
>>>>
>>>> Of course there is a huge industry out there spun to deal with this
>>>> problem, but I was wondering if anyone has had to tackle the issue using J
>>>> and if you think it's a doable project for J.
>>>>
>>>> Regards,
>>>> George
>>>>
>>>>
>>>>
>>>> https://github.com/pstuteville/x12
>>>>
>>>> == The problem
>>>>
>>>> X12 is a set of "standards" possessing all the elegance of an elephant
>>>> designed by committee, and quite literally so, see http://www.x12.org.
>>>> X12 defines rough syntax for specifying text messages, but each of
>>>> more than 300 specifications defines its own message structure. While
>>>> messages themselves are easy to parse with a simple tokenizer, their
>>>> semantics is heavily dependent on the domain. For example, this is
>>>> X12/997 message conveying "Functional Acknowledgment":
>>>>
>>>>   ST*997*2878~AK1*HS*293328532~AK2*270*307272179~AK3*NM1*8*L1010_0*8~
>>>>   AK4*0:0*66*1~AK4*0:1*66*1~AK4*0:2*66*1~AK3*NM1*8*L1010_1*8~AK4*1:0*
>>>>   66*1~AK4*1:1*66*1~AK3*NM1*8*L1010_2*8~AK4*2:0*66*1~AK5*R*5~AK9*R*1*
>>>>   1*0~SE*8*2878~
>>>>
>>>> I.e., X12 defines an alphabet and somewhat of a dictionary - not a
>>>> grammar or semantics for each particular data interchange
>>>> conversation. Because of many entrenched implementations and
>>>> government mandates, the X12 is not going to die anytime soon,
>>>> unfortunately.
>>>>
>>>> The message above can be easily represented in Ruby as a nested array:
>>>>
>>>>   m = [
>>>>     ['ST', '997', '2878'],
>>>>     ['AK1', 'HS', '293328532'],
>>>>     ['AK2', '270', '307272179'],
>>>>     ['AK3', 'NM1', '8', 'L1010_0', '8'],
>>>>     ['AK4', '0:0', '66', '1'],
>>>>     ['AK4', '0:1', '66', '1'],
>>>>     ['AK4', '0:2', '66', '1'],
>>>>     ['AK3', 'NM1', '8', 'L1010_1', '8'],
>>>>     ['AK4', '1:0', '66', '1'],
>>>>     ['AK4', '1:1', '66', '1'],
>>>>     ['AK3', 'NM1', '8', 'L1010_2', '8'],
>>>>     ['AK4', '2:0', '66', '1'],
>>>>     ['AK5', 'R', '5'],
>>>>     ['AK9', 'R', '1', '1', '0'],
>>>>     ['SE', '8', '2878'],
>>>>   ]
>>>>
>>>> but it will not help any since, say, segment 'AK4' is ambiguously
>>>> defined and its meaning not at all obvious until the message's
>>>> structure is interpreted and correct 'AK4' segment is found.
>>>>
>>>> == The solution
>>>>
>>>> === Message structure
>>>>
>>>> Each participant in EDI has to know the structure of the data coming
>>>> across the wire - X12 or no X12. The X12 structures are defined in
>>>> so-called Implementation Guides - thick books with all the data pieces
>>>> spelled out. There is no other choice, but to invent a
>>>> computer-readable definition language that will codify these
>>>> books. For familiarity sake we'll use XML.
>>>> For example, the X12/997 message can be defined as
>>>>
>>>>   <Definition>
>>>>     <Loop name="997">
>>>>       <Segment name="ST" min="1" max="1"/>
>>>>       <Segment name="AK1" min="1" max="1"/>
>>>>       <Loop name="L1000" max="999999" required="y">
>>>>         <Segment name="AK2" max="1" required="n"/>
>>>>         <Loop name="L1010" max="999999" required="n">
>>>>           <Segment name="AK3" max="1" required="n"/>
>>>>           <Segment name="AK4" max="99" required="n"/>
>>>>         </Loop>
>>>>         <Segment name="AK5" max="1" required="y"/>
>>>>       </Loop>
>>>>       <Segment name="AK9" max="1" required="y"/>
>>>>       <Segment name="SE" max="1" required="y"/>
>>>>     </Loop>
>>>>   </Definition>
>>>>
>>>> Namely, the 997 is a 'loop' containing segments ST (only one), AK1
>>>> (also only one), another loop L1000 (zero or many repeats), segments
>>>> AK9 and SE. The loop L1000 can contain a segment AK2 (optional) and
>>>> another loop L1010 (zero or many), and so on.
>>>>
>>>> The segments' structure can be further defined as, for example,
>>>>
>>>>   <Segment name="AK2">
>>>>     <Field name="TransactionSetIdentifierCode" required="y" min="3"
>>>>            max="3" validation="T143"/>
>>>>     <Field name="TransactionSetControlNumber" required="y" min="4"
>>>>            max="9"/>
>>>>   </Segment>
>>>>
>>>> which defines a segment AK2 as having two fields:
>>>> TransactionSetIdentifierCode and TransactionSetControlNumber. The
>>>> field TransactionSetIdentifierCode is defined as having a type of
>>>> string (default), being required, having length of minimum 3 and
>>>> maximum 3 characters, and being validated against a table T143. The
>>>> validation table is defined as
>>>>
>>>>   <Table name="T143">
>>>>     <Entry name="100" value="Insurance Plan Description"/>
>>>>     <Entry name="101" value="Name and Address Lists"/>
>>>>     ...
>>>>     <Entry name="997" value="Functional Acknowledgment"/>
>>>>     <Entry name="998" value="Set Cancellation"/>
>>>>   </Table>
>>>>
>>>> with entries having just names and values.
>>>>
>>>> This message is fully fleshed out in an example 'misc/997.xml' file,
>>>> copied from the ASC X12N 276/277 (004010X093) "Health Care
>>>> Claim Status Request and Response" National Electronic Data
>>>> Interchange Transaction Set Implementation Guide.
>>>>
>>>> Now expressions like
>>>>
>>>>   message.L1000.L1010[1].AK4.DataElementReferenceNumber
>>>>
>>>> start making sense of sorts, overall X12's idiocy notwithstanding - it's
>>>> a field called 'DataElementReferenceNumber' of a first of possibly
>>>> many segments 'AK4' found in the second repeat of the loop 'L1010'
>>>> inside the enclosing loop 'L1000'. The meaning of the value '66' found
>>>> in this field is still in the eye of the beholder, but, at least its
>>>> location is clearly identified in the message.
>>>>
>>>>
>>>>
>>>
>>
>
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
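
The tokenizing step discussed in this thread can be sketched with nothing
more than the standard library cut quoted in Raul's message. This is only
an illustration, not Joe's chop: the sample message is shortened from the
997 example, the segment 'N1*8S**1' is made up, and the names segs,
fields, ids and splitkeep are hypothetical.

```j
NB. Shortened single-line 997 message (sample data for illustration only).
msg=: 'ST*997*2878~AK1*HS*293328532~AK5*R*5~SE*8*2878~'

NB. Split on the segment terminator '~'. cut drops empty pieces,
NB. so the trailing '~' does not produce an empty last segment.
segs=: '~' cut msg

NB. Split every segment on the element separator '*'.
NB. Result: a list of boxes, each holding a list of boxed fields.
fields=: '*'&cut each segs

NB. Segment identifiers: the first field of each segment.
ids=: {.&> fields

NB. Hypothetical helper: same split as cut, but empty pieces are kept.
NB. x splitkeep y appends the separator x to y and cuts with ;._2
splitkeep=: <;._2@,~
```

Because cut discards empty pieces, an empty element (as in the made-up
segment 'N1*8S**1') would shift the position of every field after it:
'*' cut 'N1*8S**1' gives 3 fields, while '*' splitkeep 'N1*8S**1' gives
4, with an empty box in the third position. Whether that matters depends
on the messages being parsed.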
