Re: [CMS-PIPELINES] deblocking with various possible linends

2023-03-16 Thread Glenn Knickerbocker
On 3/16/2023 6:38 PM, Rob van der Heij wrote:
> KEEP to include the line end characters in the output stream? Would you
> also need a BEFORE and AFTER, or is it good enough to just have them at the
> end? You could use STRIP TRAILING ANYOF with the same set.

I picture wanting them at the end of the record--but I'm sure cases
would arise where they'd be more useful at the start.

¬R


Re: [CMS-PIPELINES] deblocking with various possible linends

2023-03-16 Thread Rob van der Heij
On Thu, 16 Mar 2023 at 23:29, Glenn Knickerbocker wrote:

> On 3/16/2023 4:46 PM, Rob van der Heij wrote:
> > alternatives (with preference for the first one)
> > - line end is any unique sequence of the specified characters, so if you
> > specify the CR and LF as candidate, then CR, LF, CR LF, and LF CR are all
> > one single end of line, but CR CR would imply a null line between (like
> > CR LF LF)
> > - the first string of characters from that set is taken as the line end
> > sequence, until eof. So when you start with a bare CR then the next CR
> > will cause LF to be the start of a new line.
>
> I like both of these alternatives for different uses--and in both cases,
> it could be useful to have a KEEP operand and/or send the linend off to
> the alternate, to be able to reconstruct the file with the original
> separators.
>

That's why asking just gives me more work...  I still like the ANYOF
keyword to specify the set of characters, and ONCE added to stick with
whatever we got first.
KEEP to include the line end characters in the output stream? Would you
also need a BEFORE and AFTER, or is it good enough to just have them at the
end? You could use STRIP TRAILING ANYOF with the same set.
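
A rough sketch of how ONCE and KEEP might behave together, written in Python rather than as a pipeline stage. Everything here is an assumption for illustration: the name deblock_once, its parameters, and in particular the choice to detect the first line end with the "no repeated character" rule, so that a leading CR LF CR LF is read as two CR LF line ends rather than one four-byte sequence. It works on data already in memory and does not address buffering.

```python
def deblock_once(data, candidates=b"\r\n", keep=False):
    """Split data on the first line-end sequence found, used until end of data.

    Hypothetical helper: the line end is the first run of distinct bytes
    from `candidates`. With keep=True each terminated record retains its
    line end, so joining the records reproduces the input.
    """
    cand = set(candidates)
    # Find the first byte that belongs to the candidate set.
    start = next((i for i, b in enumerate(data) if b in cand), None)
    if start is None:
        # No line end anywhere: the data is one record (or nothing at all).
        return [data] if data else []
    # Extend the run while the bytes are candidates and none repeats.
    end = start
    seen = set()
    while end < len(data) and data[end] in cand and data[end] not in seen:
        seen.add(data[end])
        end += 1
    linend = data[start:end]
    pieces = data.split(linend)
    last = pieces.pop()  # whatever follows the final line end, possibly empty
    records = [p + linend if keep else p for p in pieces]
    if last:
        records.append(last)  # unterminated final record
    return records
```

With KEEP the line ends stay at the end of each record, which matches the "just have them at the end" option; a consumer that does not want them could strip the trailing characters afterwards.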

Rob


Re: [CMS-PIPELINES] deblocking with various possible linends

2023-03-16 Thread Glenn Knickerbocker
On 3/16/2023 4:46 PM, Rob van der Heij wrote:
> alternatives (with preference for the first one)
> - line end is any unique sequence of the specified characters, so if you
> specify the CR and LF as candidate, then CR, LF, CR LF, and LF CR are all
> one single end of line, but CR CR would imply a null line between (like CR
> LF LF)
> - the first string of characters from that set is taken as the line end
> sequence, until eof. So when you start with a bare CR then the next CR will
> cause LF to be the start of a new line.

I like both of these alternatives for different uses--and in both cases,
it could be useful to have a KEEP operand and/or send the linend off to
the alternate, to be able to reconstruct the file with the original
separators.

¬R


Re: [CMS-PIPELINES] deblocking with various possible linends

2023-03-16 Thread Rob van der Heij
On Thu, 16 Mar 2023 at 22:33, Paul Gilmartin wrote:

>
> Otherwise, the format of the record separator might be an optional
> parameter to your program.
>

We already have the ability to specify the line-end character or string.
The discussion was about when you don't know in advance what convention is
used. Some of the RFCs around HTTP suggest to "prefer CRLF but tolerate
other conventions".

Rob


Re: [CMS-PIPELINES] deblocking with various possible linends

2023-03-16 Thread Paul Gilmartin

On 3/16/23 14:46:04, Rob van der Heij wrote:


...
Yes, I think we all realized the ambiguity. I was considering these
alternatives (with preference for the first one)
- line end is any unique sequence of the specified characters, so if you
specify the CR and LF as candidate, then CR, LF, CR LF, and LF CR are all
one single end of line, but CR CR would imply a null line between (like CR
LF LF)
- the first string of characters from that set is taken as the line end
sequence, until eof. So when you start with a bare CR then the next CR will
cause LF to be the start of a new line.



Be careful.  As I read it:
FOOBAR ...
is three records:
FOO (terminated by )
 (terminated by )
BAR ...

If that behavior satisfies your taste, amen, seasoned with a dash of GIGO.
If not, amend the rules and let the devil pose a test case.

OMVS provides for metadata specifying the line separator:



If you expect your code to run under TSO (unlikely) it should respect the
extended attributes of its input file.

Otherwise, the format of the record separator might be an optional
parameter to your program.

DWIM is likely to yield unexpected results.

--
gil


Re: [CMS-PIPELINES] deblocking with various possible linends

2023-03-16 Thread Rob van der Heij
On Thu, 16 Mar 2023 at 21:29, Glenn Knickerbocker wrote:

> Figuring out the reasonable assumptions to make to make that decision is
> the biggest part of what I meant by "getting it right."
>

Yes, I think we all realized the ambiguity. I was considering these
alternatives (with preference for the first one)
- line end is any unique sequence of the specified characters, so if you
specify the CR and LF as candidate, then CR, LF, CR LF, and LF CR are all
one single end of line, but CR CR would imply a null line between (like CR
LF LF)
- the first string of characters from that set is taken as the line end
sequence, until eof. So when you start with a bare CR then the next CR will
cause LF to be the start of a new line.
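
For illustration, the first alternative can be sketched outside of Pipelines. The Python generator below only models the rule as described above and is not an implementation of DEBLOCK; the name deblock_anyof and its parameters are made up here. It consumes the input chunk by chunk and holds only the current record, so a file with bare CR line ends does not force the whole input to be buffered.

```python
def deblock_anyof(chunks, linend=b"\r\n"):
    """Yield records from an iterable of byte chunks.

    Hypothetical sketch: a line end is a run of distinct bytes from
    `linend`. A byte that repeats within the run starts a new line end,
    which implies a null record in between.
    """
    candidates = set(linend)
    record = bytearray()
    seen = set()  # line-end bytes consumed in the current terminator
    for chunk in chunks:
        for byte in chunk:
            if byte in candidates:
                if not seen:
                    # First line-end byte: the current record is complete.
                    yield bytes(record)
                    record.clear()
                elif byte in seen:
                    # Repeated byte: the previous terminator has ended and
                    # a null record lies between it and this one.
                    yield b""
                    seen.clear()
                seen.add(byte)
            else:
                seen.clear()
                record.append(byte)
    if record:
        # Final record without a line end (for example an unterminated last line).
        yield bytes(record)
```

With this rule CR, LF, CR LF and LF CR each end one record, while CR CR and CR LF LF produce a null record. Input that mixes conventions can still be read in a way the writer did not intend, so the ambiguity discussed in this thread remains.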

Rob


Re: [CMS-PIPELINES] deblocking with various possible linends

2023-03-16 Thread Glenn Knickerbocker
On 3/13/2023 7:41 PM, Paul Gilmartin wrote:
> The problem is not well-posed.  Consider
>     Foo
>     
>     
>     Bar
> Is that two records, or three with a null record between the  and
> the .

Figuring out the reasonable assumptions to make to make that decision is
the biggest part of what I meant by "getting it right."

¬R


Re: [CMS-PIPELINES] deblocking with various possible linends

2023-03-16 Thread Glenn Knickerbocker
On 3/15/2023 1:28 AM, Rob van der Heij wrote:
> Splitting a record with no words will pass the record (see example in usage
> note :-)

Right, and in that case it was a real null line in the input ...0a0a...,
so it hasn't added any *extra* null records.

¬R


Re: [CMS-PIPELINES] deblocking with various possible linends

2023-03-14 Thread Rob van der Heij
Splitting a record with no words will pass the record (see example in usage
note :-)

On Wed, 15 Mar 2023 at 01:13, Glenn Knickerbocker wrote:

> On Tue, 14 Mar 2023 08:07:15 +0100, Rob wrote:
> >On Tue, 14 Mar 2023 at 00:44, Donald Russell wrote:
> >> —>   ... | deblock linend 0a | split 0d | ...
> >> Could that cause extra lines?
> >Yes, it does.
>
> I didn't find a case where SPLIT created any extra null records.  It will
> *lose* null lines if there's no 0x0a between them, so it has *both* of
> the problems I want to avoid in that case.
>
> (Something else that came to mind before SPLIT *did* add null records
> when both characters were present, but I forget what it was.  I was
> thinking it was some other option of DEBLOCK, but I'm not seeing one that
> doesn't take a byte stream as input.)
>
> ¬R
>


Re: [CMS-PIPELINES] deblocking with various possible linends

2023-03-14 Thread Glenn Knickerbocker
On Tue, 14 Mar 2023 08:07:15 +0100, Rob wrote:
>On Tue, 14 Mar 2023 at 00:44, Donald Russell wrote:
>> —>   ... | deblock linend 0a | split 0d | ...
>> Could that cause extra lines?
>Yes, it does. 

I didn't find a case where SPLIT created any extra null records.  It will
*lose* null lines if there's no 0x0a between them, so it has *both* of
the problems I want to avoid in that case.

(Something else that came to mind before SPLIT *did* add null records
when both characters were present, but I forget what it was.  I was
thinking it was some other option of DEBLOCK, but I'm not seeing one that
doesn't take a byte stream as input.)

¬R


Re: [CMS-PIPELINES] deblocking with various possible linends

2023-03-14 Thread Rob van der Heij
On Tue, 14 Mar 2023 at 16:59, John P. Hartmann wrote:

> On 3/14/23 16:05, Rob van der Heij wrote:
> > The repeating "range" doesn't really do that, but I just checked that
> >   strip x0d 1   will take one from either or both sides of the record.
>
> You're right.  It gives an error because it supports only ranges
> relative to the beginning of the record and from left to right.
>
> My apologies.
>

And we both know why! That's why XLATE got Usage Note 5.


Re: [CMS-PIPELINES] deblocking with various possible linends

2023-03-14 Thread John P. Hartmann

On 3/14/23 16:05, Rob van der Heij wrote:

The repeating "range" doesn't really do that, but I just checked that
  strip x0d 1   will take one from either or both sides of the record.


You're right.  It gives an error because it supports only ranges
relative to the beginning of the record and from left to right.

My apologies.


Re: [CMS-PIPELINES] deblocking with various possible linends

2023-03-14 Thread Rob van der Heij
On Tue, 14 Mar 2023 at 15:56, John P. Hartmann wrote:


> A more reliable way to remove just one cr from either end might be
> something like
>
> change (1 -1) x0d //
>

The repeating "range" doesn't really do that, but I just checked that
 strip x0d 1   will take one from either or both sides of the record.
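
The difference between stripping every CR and stripping at most one from each end can be shown in a few lines of Python. This is only a model of the behaviour described above, not of the STRIP stage itself, and strip_one is a name invented for the example.

```python
def strip_one(record, char=b"\r"):
    """Remove at most one leading and one trailing `char` from record."""
    if record.startswith(char):
        record = record[len(char):]
    if record.endswith(char):
        record = record[:-len(char)]
    return record


# bytes.strip removes every leading and trailing CR, so data that
# legitimately begins or ends with extra CRs loses them all.
strip_all = lambda record: record.strip(b"\r")
```

For a record such as CR CR abc, strip_one leaves one CR in place, whereas stripping all of them cannot tell a line-end remnant from data.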

Rob


Re: [CMS-PIPELINES] deblocking with various possible linends

2023-03-14 Thread John P. Hartmann

On 3/14/23 00:44, Donald Russell wrote:

If you know the data has crlf or lfcr or just lf but never just cr then
… deblock 0a | strip both 0d | …


A more reliable way to remove just one cr from either end might be 
something like


change (1 -1) x0d //


Re: [CMS-PIPELINES] deblocking with various possible linends

2023-03-14 Thread Rob van der Heij
On Tue, 14 Mar 2023 at 00:44, Donald Russell wrote:

> —>   ... | deblock linend 0a | split 0d | ...
>
> Could that cause extra lines?
>

Yes, it does. I agree removing a trailing x0d would have done. But the
challenge we see is that a file with just CR will buffer the input before
splitting. I've done a few things with "split after anyof x0d0a" and then
"joincont" on the leading CRLF, but it might make sense to enhance "deblock
linend" - I got frowns when I suggested to re-use the ANYOF keyword...

Rob


Re: [CMS-PIPELINES] deblocking with various possible linends

2023-03-13 Thread Donald Russell
—>   ... | deblock linend 0a | split 0d | ...

Could that cause extra lines?

If you know the data has crlf or lfcr or just lf but never just cr then
… deblock 0a | strip both 0d | …
I suspect strip is more efficient than split because split has to scan the
entire record, whereas strip starts at each end and stops on the first
mismatch, no need to continue.
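
As a rough Python analogue of the deblock-then-strip idea (an illustration only, not the pipeline stages themselves; the function name is invented, and dropping the empty piece after a final LF is an assumption about how the last record is handled):

```python
def deblock_lf_strip_cr(data):
    """Split on LF, then strip CR from both ends of each record."""
    records = data.split(b"\n")
    if records[-1] == b"":
        # The data ended with LF: there is no extra record after it.
        records.pop()
    return [r.strip(b"\r") for r in records]
```

This handles CRLF and bare LF input, but a file that uses only CR comes out as a single record with the CRs still inside it, which is the case the "never just cr" condition excludes.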


On Mon, Mar 13, 2023 at 15:43 Glenn Knickerbocker wrote:

> (Copied here from an IBM internal discussion because I should have come
> here first anyway:)
>
> Anyone have an idiom for deblocking and translating a file in ASCII that
> may have either or both of CR and LF (*), and may be split into records,
> without unnecessarily buffering the whole file?  The possibility of null
> records that I might want to preserve makes this particularly confusing
> to think through.
>
> (*) in either order--I know LFCR is rare, but I'm positive I've run into
> it on some goofy old system, probably some BBS 30 years ago
>
>   ... | deblock linend 0a | split 0d | ...
>
> was good enough for the application at hand, but I was hoping someone
> might already have worked through getting it right.
>
> ¬R
>


Re: [CMS-PIPELINES] deblocking with various possible linends

2023-03-13 Thread Paul Gilmartin

On 3/13/23 16:41:47, Glenn Knickerbocker wrote:


Anyone have an idiom for deblocking and translating a file in ASCII that
may have either or both of CR and LF (*), ...


The problem is not well-posed.  Consider
Foo


Bar
Is that two records, or three with a null record between the  and the .


... and may be split into records,
without unnecessarily buffering the whole file?  The possibility of null
records that I might want to preserve makes this particularly confusing
to think through.

(*) in either order--I know LFCR is rare, but I'm positive I've run into
it on some goofy old system, probably some BBS 30 years ago


I believe PostScript accepts  and perhaps HTML does.  I've encountered
editors which assume the first apparent line separator characterizes the file.

Some of my code has had problems with files created by Windows editors that
don't terminate the last line.


   ... | deblock linend 0a | split 0d | ...

was good enough for the application at hand, but I was hoping someone
might already have worked through getting it right.


FSVO "right".

--
gil