Re: Dropping duplicates.

2005-06-30 Thread David Andrews
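Check out the Unix uniq utility, which eliminates adjacent duplicate
lines.  It is usually run on sorted input, but it works on any file whose
duplicates are grouped together, so the input sequence is preserved.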

-- 
David Andrews
A. Duda and Sons, Inc.
[EMAIL PROTECTED]

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [EMAIL PROTECTED] with the message: GET IBM-MAIN INFO
Search the archives at http://bama.ua.edu/archives/ibm-main.html


Dropping duplicates.

2005-06-29 Thread Adrian H Auer-Hudson
Listers,

Does anyone have any thoughts on this?  I have a large sequential file.  I need
to drop duplicate records from said file.  Sort would work fine if I knew the
correct key sequence.  This information is not immediately available.  The file
needs to retain its input sequence.  Duplicates are always grouped together.

Short of writing a program, is there a quick way to fix this?

Thanks

Adrian.

Webmaster, http://www.losangelesmetro.net.
Supporter of Expo Light Rail - Enabler for the Digital Coast 
http://www.friends4expo.org.



Re: Dropping duplicates.

2005-06-29 Thread Adrian H Auer-Hudson
Thanks Frank,

No DFSORT and no ICETOOL here.  DFSORT would have been great.  Our regular sort
will not work without a SORT or MERGE statement.

The file characteristics are:

 Organization  . . . : PS 
 Record format . . . : FB 
 Record length . . . : 320
 Block size  . . . . : 27840  

The first eighty bytes look like:

077075333730D2123200506001435L062M79506   MPC   

It fills 114 cylinders.

Dups are identical records and they should not exist.
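
If it does come down to writing a program, the logic is small, because the
duplicates are identical and adjacent: compare each record with the one just
before it and write it only when it differs.  Below is a minimal Python
sketch of that idea, not a tested production job.  It assumes the data set is
available as a plain binary stream of 320-byte records (for example after a
binary transfer off the host); the names drop_adjacent_duplicates and
make_record are hypothetical and used for illustration only.

```python
import io

RECORD_LENGTH = 320  # LRECL from the post; RECFM=FB means every record is this long


def drop_adjacent_duplicates(src, dst, record_length=RECORD_LENGTH):
    """Copy fixed-length records from src to dst, skipping any record that is
    byte-for-byte identical to the record immediately before it.

    Returns a (records_read, records_written) tuple.
    """
    previous = None
    records_read = 0
    records_written = 0
    while True:
        record = src.read(record_length)
        if not record:
            break  # end of input
        if len(record) != record_length:
            raise ValueError("input ends with a partial record")
        records_read += 1
        if record != previous:
            dst.write(record)
            records_written += 1
        previous = record
    return records_read, records_written


def make_record(tag):
    """Build one made-up test record, padded with blanks to the record length."""
    return tag.ljust(RECORD_LENGTH).encode("ascii")


# Made-up sample: duplicates are grouped, and "A" recurs later in the file.
sample = b"".join(make_record(t) for t in ["A", "A", "B", "A", "A", "A", "C"])
out = io.BytesIO()
counts = drop_adjacent_duplicates(io.BytesIO(sample), out)
```

Only one record is held in memory at a time, so the size of the file (114
cylinders here) does not matter, and the output keeps the input sequence.
A record that reappears later, after other records, is kept, because only
neighbours are compared.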

Thanks again

A.







Re: Dropping duplicates.

2005-06-29 Thread Frank Yaeger
Actually I thought of a much better way to do this with DFSORT's ICETOOL
given that you say all of the duplicates are grouped together.
This version only requires one copy pass rather than two sort passes.

//S1    EXEC  PGM=ICETOOL
//TOOLMSG DD SYSOUT=*
//DFSMSG DD SYSOUT=*
//IN DD *
 01
 02
 01
 01
 02
 03
 04
 01
 02
 03
 01
/*
//OUT DD SYSOUT=*
//TOOLIN DD *
* Select first record with each key.
 SELECT FROM(IN) TO(OUT) ON(1,4,CH) FIRST USING(CTL1)
/*
//CTL1CNTL DD *
* Force copy instead of sort since dup records are
* grouped together.
   OPTION COPY
/*
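
For readers without ICETOOL at hand, the selection the job is meant to perform
can be modelled in a few lines of Python.  This is only a sketch of the
intended logic, assuming (as the post describes) that FIRST combined with
OPTION COPY keeps the first record of each run of consecutive equal keys; it
is not a statement of how DFSORT implements it.  The names SAMPLE, key_of and
first_of_each_run are hypothetical.

```python
from itertools import groupby

# The in-stream records from the IN DD above (in-stream data is 80 bytes wide).
SAMPLE = [" 01", " 02", " 01", " 01", " 02", " 03",
          " 04", " 01", " 02", " 03", " 01"]


def key_of(record, start=1, length=4):
    """Mimic ON(1,4,CH): a character key at 1-based position 1, length 4."""
    padded = record.ljust(80)  # pad to the 80-byte in-stream record width
    return padded[start - 1:start - 1 + length]


def first_of_each_run(records):
    """Keep the first record of every run of consecutive records with equal keys."""
    # groupby only groups neighbours, so no sorting is implied.
    return [next(group) for _, group in groupby(records, key=key_of)]


result = first_of_each_run(SAMPLE)
```

On the sample data this drops only the second of the two adjacent " 01"
records, leaving ten of the eleven.  Keys that recur later in the input are
kept, which is the behaviour wanted when the input sequence must be retained.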

Frank Yaeger - DFSORT Team (IBM)
 Specialties: ICETOOL, IFTHEN, OVERLAY, Symbols, Migration
 = DFSORT/MVS is on the Web at http://www.ibm.com/storage/dfsort/



Re: Dropping duplicates.

2005-06-29 Thread Arthur T.
On 29 Jun 2005 17:21:35 -0700, in bit.listserv.ibm-main 
(Message-ID:[EMAIL PROTECTED]) 
[EMAIL PROTECTED] (Adrian H Auer-Hudson) wrote:


Anyone have any thoughts on this:  I have a large
sequential file.  I need to drop duplicate records from
said file.  Sort would work fine if I knew the correct key
sequence.  This information is not immediately
available.  The file needs to retain its input
sequence.  Duplicates are always grouped together.

Short of writing a program is there a quick way to fix this?


These are statements for Syncsort.  I'm not sure if
there's an exact equivalent for DFSORT.  Also, I haven't
tried it, but this *might* work:

 SORT FIELDS=COPY
 SUM FIELDS=NONE

