Re: identify duplicates
What about comparing hashes? > Am 31.10.2017 um 21:40 schrieb David Witton via 4D_Tech > <4d_tech@lists.4d.com>: > Does anyone have a suggestion on how to proceed? Reiner Kief Software & IT-Consulting Weinbergstrasse 45 63853 Moemlingen Germany www.kief-online.de Member of the german developer network die4DWerkstatt ** 4D Internet Users Group (4D iNUG) FAQ: http://lists.4d.com/faqnug.html Archive: http://lists.4d.com/archives.html Options: http://lists.4d.com/mailman/options/4d_tech Unsub: mailto:4d_tech-unsubscr...@lists.4d.com **
Re: identify duplicates
Here is a strategy I have used since V6 to enforce a semi-informal primary key for records in an important file. Another parameter was added when our business changed a few years ago. I think you could make it work for your task: C_BOOLEAN($0;$DupExists) C_TEXT($2;$3;$Value1;$Value2) C_LONGINT($1;$CurrentKey;$4;$Value3) $CurrentKey:=$1 $Value1:=$2 $Value2:=$3 If (Count parameters>3) $Value3:=$4 Else Value3:=0 End if $DupExists:=False CREATE SET([Contracts];"Duplicates") SET QUERY DESTINATION(Into set;"Duplicates") QUERY([Table];[Table]Value1=$Value1;*) QUERY([Table]; & ;[Table]Value2=$Value2;*) if($Value3#0) QUERY([Table]; & ;[Table]CAID=$Value3;*) End if QUERY([Table]; & ;[Table]Key#$CurrentKey) //In case record we are checking has already been saved If (Records in set("Duplicates")>0) $DupExists:=True End if SET QUERY DESTINATION(Into current selection) CLEAR SET("Duplicates") $0:=$DupExists JJ > On Oct 31, 2017, at 3:12 PM, 4d_tech-requ...@lists.4d.com wrote: > > Message: 4 > Date: Tue, 31 Oct 2017 14:40:50 -0600 > From: David Witton <dhwit...@gmail.com <mailto:dhwit...@gmail.com>> > To: 4D iNug Technical <4d_tech@lists.4d.com <mailto:4d_tech@lists.4d.com>> > Subject: identify duplicates > Message-ID: > <cae-8fqy9hmzfuptgxd0--cohhuu+diumg9ne6mlignduu_x...@mail.gmail.com > <mailto:cae-8fqy9hmzfuptgxd0--cohhuu+diumg9ne6mlignduu_x...@mail.gmail.com>> > Content-Type: text/plain; charset="UTF-8" > > I'm looking for a strategy to identify duplicate records in a table - that > is, records for which 3 fields are identical across two or more records - > or in another case, where a single field is not unique. > > Does anyone have a suggestion on how to proceed? > > -- > David Witton > > ** 4D Internet Users Group (4D iNUG) FAQ: http://lists.4d.com/faqnug.html Archive: http://lists.4d.com/archives.html Options: http://lists.4d.com/mailman/options/4d_tech Unsub: mailto:4d_tech-unsubscr...@lists.4d.com **
Re: identify duplicates
One of the things others have mentioned, but have not really expounded upon... What constitutes a duplicate? For an individual field this can be complex, for multiple fields... For some data types (numerics) a duplicate is easy to define 2 = 2 = 2 = 2 (integer/long integer) For others (text) it is not simple. A = A But does A = a ? or A = ? (such as A[or a] + umlaut) Does Mike MacKenzie = MacKenzie, Mike ? If you have 3 fields which you are looking to locate duplication it becomes more of an issue (assuming each field is text). It also could be an issue depending on how the 3 fields are concatenated. ex. Name + longint ID + street address How is the name constructed? is it consistent across all data (Mike MacKenzie vs MacKenzie Mike vs Michael MacKenzie) How is the longint ID converted to text? String(ID), or String(ID;), is the padding long enough? How is the street Name constructed? Number Street Name Designation (123 East West Ave) Is the street name construct consistent across all data? (123 East West Ave vs 123 East West vs 123 East West Avenue) Assuming (a BIG assumption), that the data is 'clean', i.e. all the data is formatted similarly) a couple of approaches: - might be to use the function DISTINCT VALUES (with counting option On - if available in your version of 4D) on all 3 fields independently then use one of the arrays to loop over the data looking for duplicate values in the other fields. - if you can modify the data (add a field to the table), and the amount of data is not too large, or you have very little execution time constraint, you could add a field which would hold the concatenated 3 field values, then use DISTINCT VALUES. - I have not really worked with this aspect of 4D, so I am not sure, but I know that it is possible to build a composite index (2 [or more?] fields to make a single index. Whether you can use the composite index with DISTINCT VALUES I am not sure. I'm sure there are ways in which the approach can be made efficient, but it will definitely depend on the data, and what constitutes a duplicate. On Tue, 31 Oct 2017 14:40:50 -0600, David Witton via 4D_Tech wrote: > I'm looking for a strategy to identify duplicate records in a table - that > is, records for which 3 fields are identical across two or more records - > or in another case, where a single field is not unique. > > Does anyone have a suggestion on how to proceed? > > -- > David Witton > ** > 4D Internet Users Group (4D iNUG) > FAQ: http://lists.4d.com/faqnug.html > Archive: http://lists.4d.com/archives.html > Options: http://lists.4d.com/mailman/options/4d_tech > Unsub: mailto:4d_tech-unsubscr...@lists.4d.com > ** --- Gas is for washing parts Alcohol is for drinkin' Nitromethane is for racing ** 4D Internet Users Group (4D iNUG) FAQ: http://lists.4d.com/faqnug.html Archive: http://lists.4d.com/archives.html Options: http://lists.4d.com/mailman/options/4d_tech Unsub: mailto:4d_tech-unsubscr...@lists.4d.com **
Re: identify duplicates
> Le 31 oct. 2017 à 21:40, David Witton via 4D_Tech <4d_tech@lists.4d.com> a > écrit : > > I'm looking for a strategy to identify duplicate records in a table - that > is, records for which 3 fields are identical across two or more records - > or in another case, where a single field is not unique. > > Does anyone have a suggestion on how to proceed? I have some code to find duplicates of a combination of N fields in a given selection. The result is a sub selection of found duplicates. Is that the need? Surely not the fastest solution, BTW… -- Arnaud de Montard ** 4D Internet Users Group (4D iNUG) FAQ: http://lists.4d.com/faqnug.html Archive: http://lists.4d.com/archives.html Options: http://lists.4d.com/mailman/options/4d_tech Unsub: mailto:4d_tech-unsubscr...@lists.4d.com **
Re: identify duplicates
The third parameter of DISTINCT VALUES really saves work. Here is a strategy to find dups in a field within a selection. The set can be a listbox highlight set. // // Method: DuplicatesToSet // - // INPUT1: Pointer - field // INPUT2: Text - Set Name // OUTPUT: // $field:=$1 $set:=$2 ARRAY TEXT($aVals;0) ARRAY LONGINT($aCount;0) DISTINCT VALUES($field->;$aVals;$aCount) SORT ARRAY($aCount;$aVals;<) $pos:=Find in array($aCount;1) If ($pos#-1) ARRAY TEXT($aVals;$pos-1) End if SET QUERY DESTINATION(Into set;$set) QUERY SELECTION WITH ARRAY(;$field->;$aVals) SET QUERY DESTINATION(Into current selection) Keith - CDI > On Oct 31, 2017, at 3:40 PM, David Witton via 4D_Tech <4d_tech@lists.4d.com> > wrote: > > I'm looking for a strategy to identify duplicate records in a table - that > is, records for which 3 fields are identical across two or more records - > or in another case, where a single field is not unique. > > Does anyone have a suggestion on how to proceed? > > -- > David Witton ** 4D Internet Users Group (4D iNUG) FAQ: http://lists.4d.com/faqnug.html Archive: http://lists.4d.com/archives.html Options: http://lists.4d.com/mailman/options/4d_tech Unsub: mailto:4d_tech-unsubscr...@lists.4d.com **
RE: identify duplicates
Hi David, Perhaps using GROUP BY like Neil suggested in an SQL call in a similar fashion to the following will help: Begin SQL SELECT tbl.fld_1,tbl.fld_2,tbl.fld_3,Count(*) AS recCount FROM tbl GROUP BY tbl.fld_1,tbl.fld_2,tbl.fld_3 HAVING Count(*)>1 INTO :$arrFld1,:$arrFld2,:$arrFld3,:$arrCount; End SQL The size of the arrays show how many “DISTINCT” set up duplicates you have and the ones in the same element position show the values and the Count of duplicates found. Best Regards, -Tai B. From: David Witton [mailto:dhwit...@gmail.com] Sent: Tuesday, October 31, 2017 2:07 PM To: Tai Bui Subject: Re: identify duplicates It helps with the single field problem - now I'm trying to solve the multiple field problem without creating a new field for each combination of fields I want to examine. On Tue, Oct 31, 2017 at 2:43 PM, Tai Bui <t...@4d.com<mailto:t...@4d.com>> wrote: Hi David, Would the DISTINCT VALUES command help? http://doc.4d.com/4Dv16R4/4D/16-R4/DISTINCT-VALUES.301-3317277.en.html Best Regards, -Tai B. Tai Bui Technical Services Engineer 4D Inc 95 S. Market Street, Suite #240 San Jose,CA 95113 United States Telephone: +1-408-557-4600<tel:%2B1-408-557-4600> Fax: +1-408-271-5080<tel:%2B1-408-271-5080> Email: t...@4d.com<mailto:t...@4d.com> Web: www.4d.com<http://www.4d.com> -Original Message- From: 4D_Tech [mailto:4d_tech-boun...@lists.4d.com<mailto:4d_tech-boun...@lists.4d.com>] On Behalf Of David Witton via 4D_Tech Sent: Tuesday, October 31, 2017 1:41 PM To: 4D iNug Technical Cc: David Witton Subject: identify duplicates I'm looking for a strategy to identify duplicate records in a table - that is, records for which 3 fields are identical across two or more records - or in another case, where a single field is not unique. Does anyone have a suggestion on how to proceed? -- David Witton ** 4D Internet Users Group (4D iNUG) FAQ: http://lists.4d.com/faqnug.html Archive: http://lists.4d.com/archives.html Options: http://lists.4d.com/mailman/options/4d_tech Unsub: mailto:4d_tech-unsubscr...@lists.4d.com<mailto:4d_tech-unsubscr...@lists.4d.com> ** -- David Witton ** 4D Internet Users Group (4D iNUG) FAQ: http://lists.4d.com/faqnug.html Archive: http://lists.4d.com/archives.html Options: http://lists.4d.com/mailman/options/4d_tech Unsub: mailto:4d_tech-unsubscr...@lists.4d.com **
Re: identify duplicates
> I'm looking for a strategy to identify duplicate records in a table - that > is, records for which 3 fields are identical across two or more records - > or in another case, where a single field is not unique. If you are just looking for fields that are exactly the same, the SQL with GroupBy and Having gives a lot of flexibility Also the Distinct Values in most recent versions has a count array as a second parameter, in prior versions will just give you the list of unique values Neil ** 4D Internet Users Group (4D iNUG) FAQ: http://lists.4d.com/faqnug.html Archive: http://lists.4d.com/archives.html Options: http://lists.4d.com/mailman/options/4d_tech Unsub: mailto:4d_tech-unsubscr...@lists.4d.com **
Re: identify duplicates
As has been recommended distinct values is your friend but it really depends upon what you mean by duplicates. For example Charles and Karl are the same name but in different languages. So what do you mean by duplicates. Also is case important or do able and Able equal each other Regards Chuck Sent from my iPhone > On Oct 31, 2017, at 4:40 PM, David Witton via 4D_Tech <4d_tech@lists.4d.com> > wrote: > > I'm looking for a strategy to identify duplicate records in a table - that > is, records for which 3 fields are identical across two or more records - > or in another case, where a single field is not unique. > > Does anyone have a suggestion on how to proceed? ** 4D Internet Users Group (4D iNUG) FAQ: http://lists.4d.com/faqnug.html Archive: http://lists.4d.com/archives.html Options: http://lists.4d.com/mailman/options/4d_tech Unsub: mailto:4d_tech-unsubscr...@lists.4d.com **
RE: identify duplicates
Hi David, Would the DISTINCT VALUES command help? http://doc.4d.com/4Dv16R4/4D/16-R4/DISTINCT-VALUES.301-3317277.en.html Best Regards, -Tai B. -Original Message- From: 4D_Tech [mailto:4d_tech-boun...@lists.4d.com] On Behalf Of David Witton via 4D_Tech Sent: Tuesday, October 31, 2017 1:41 PM To: 4D iNug Technical Cc: David Witton Subject: identify duplicates I'm looking for a strategy to identify duplicate records in a table - that is, records for which 3 fields are identical across two or more records - or in another case, where a single field is not unique. Does anyone have a suggestion on how to proceed? -- David Witton ** 4D Internet Users Group (4D iNUG) FAQ: http://lists.4d.com/faqnug.html Archive: http://lists.4d.com/archives.html Options: http://lists.4d.com/mailman/options/4d_tech Unsub: mailto:4d_tech-unsubscr...@lists.4d.com ** ** 4D Internet Users Group (4D iNUG) FAQ: http://lists.4d.com/faqnug.html Archive: http://lists.4d.com/archives.html Options: http://lists.4d.com/mailman/options/4d_tech Unsub: mailto:4d_tech-unsubscr...@lists.4d.com **
identify duplicates
I'm looking for a strategy to identify duplicate records in a table - that is, records for which 3 fields are identical across two or more records - or in another case, where a single field is not unique. Does anyone have a suggestion on how to proceed? -- David Witton ** 4D Internet Users Group (4D iNUG) FAQ: http://lists.4d.com/faqnug.html Archive: http://lists.4d.com/archives.html Options: http://lists.4d.com/mailman/options/4d_tech Unsub: mailto:4d_tech-unsubscr...@lists.4d.com **