Out of laziness I didn't work out the entire regex, but adding -? at the 
beginning of the subexpression for the ISA control number will handle both 
cases.  The regex makes no assumption that the segment terminator is at any 
particular position. 
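
For anyone who wants the whole thing spelled out, here is a minimal sketch in 
Python rather than a finished scanner.  The element widths are my reading of 
the X12 ISA layout and should be checked against your own documentation; the 
names build_isa_regex and find_isa are made up for this illustration.  The 
element separator is captured once and reused through a back-reference, and 
-? in front of the nine control number digits accepts both the 106 and 107 
byte forms.

```python
import re

# Assumed widths of ISA01 through ISA12 (my reading of the X12 layout).
WIDTHS_ISA01_TO_ISA12 = (2, 10, 2, 10, 2, 15, 2, 15, 6, 4, 1, 5)


def build_isa_regex():
    """Build a regex for one contiguous ISA segment (hypothetical helper)."""
    sep = r"(?P=sep)"  # back-reference to the captured element separator
    parts = [r"ISA(?P<sep>.)"]  # remember whatever follows "ISA"
    parts += [r".{%d}%s" % (width, sep) for width in WIDTHS_ISA01_TO_ISA12]
    parts.append(r"(?P<control>-?\d{9})" + sep)  # ISA13, optional minus sign
    parts.append(r"." + sep)  # ISA14
    parts.append(r"." + sep)  # ISA15
    parts.append(r"(?P<component>.)(?P<terminator>.)")  # ISA16, terminator
    # DOTALL lets "." match a newline, so the terminator may be a line feed.
    return re.compile("".join(parts), re.DOTALL)


ISA_RE = build_isa_regex()


def find_isa(data):
    """Return (separator, control number, terminator) or None."""
    match = ISA_RE.search(data)
    if match is None:
        return None
    return match.group("sep"), match.group("control"), match.group("terminator")
```

The terminator is simply whatever single character follows ISA16, so nothing 
is assumed about which character it is.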



You (plural) are correct that my regex assumes that the interchange is one 
contiguous string.  Converting 80-character blocks back into that form is not 
hard to do, nor is converting one segment per line.  Then we're back in 
business.  Also, I am aware there are ways to deal with data spread out over 
multiple lines within a regex, but I've never done that myself, so I can't say 
for sure that it would be able to handle the block form.
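
As a rough sketch of that conversion in Python: the function name 
to_contiguous is hypothetical, and the sketch assumes the segment terminator 
is an ordinary character rather than the newline itself.  If the newline is 
the terminator, stripping line breaks would destroy it, so that case needs 
separate handling.  Space-padded fixed-length records are not covered either.

```python
import re


def to_contiguous(data):
    """Remove CRLF and LF line breaks so wrapped records form one string.

    Assumes line breaks are only record wrapping, not segment terminators.
    """
    return re.sub(r"\r?\n", "", data)


# Example: an interchange wrapped into 80-character records with CRLF.
original = "ISA*00*" + "X" * 200 + "~GS*PO~"
blocked = "\r\n".join(original[i:i + 80] for i in range(0, len(original), 80))
restored = to_contiguous(blocked)
```

The same call covers one segment per line when each line still ends with its 
own terminator character before the line break.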

Chris's comments on the difficulty of debugging regexes almost make sense.  
Regexes tend to look like line noise at first, until you learn them.  However, 
the code necessary in their place, in any language, would look even more 
intimidating and would present greater challenges in debugging simply by being 
longer.  In my opinion, the advantages justify taking the time to learn the 
skill.

Howard
1 Peter 4:10
PS: It's good to be back!  I'm with Werner Enterprises in Omaha now.
 



________________________________
From: Michael  Mattias/LS <[email protected]>
To: [email protected]
Sent: Thu, July 22, 2010 6:51:59 AM

Subject: Re: [EDI-L] <TECH>ISA recognition

  
> This is where regular expressions really shine.
>
> /ISA(.).{2}\1.{10}\1.{2}/
>
> which says "look for the upper case letters ISA followed by a single 
> character
> (remember it), followed by 2 characters, your remembered character, 10
> characters, that pesky character again, followed by 2 more ... and on i

One problem with anything which assumes a fixed length of 106 (or even the 
'negative control number' size of 107) bytes is, it cannot recognize 
'blocked' format; that is, 80 (or other value) characters, <newline>rest of 
segment..... OK, solved that.... now deal simultaneously with the problem 
when the 'segment terminator'  *is* <newline>.... and don't forget <newline> 
can be *either*  CRLF (PC) or LF only ('nix).... oh, and did I mention? 
There is still some data out there in which data are nominally one segment 
per record, but space-filled to some fixed record length following the 
segment's significant data.

Oh yes, I almost forgot... all these different formats may exist within the 
same input, the only constant being the format will not change WITHIN an 
Interchange.

Absent some constraints on "input format",  there's really no shortcut to 
"find valid ISA (or any other) segment"  except doing it more or less 
character by character and keeping track of what you've got so far, which is 
how I wrote my scanner back in '94 or '95 or so. Almost all the maintenance 
on it since then has been dealing with new and imaginative forms of 
invalid-ness.

Michael C. Mattias
Tal Systems Inc.
Racine WI


      
