Re: identify duplicates

2017-11-01 Thread Reiner Kief via 4D_Tech
What about comparing hashes?
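
For what it is worth, a minimal sketch of that idea, assuming a hypothetical [MyTable] with three text fields and a 4D version that has Generate digest: hash the concatenation of the fields for each record and watch for repeats.

C_TEXT($hash)
C_LONGINT($pos)
ARRAY TEXT($aHash;0)  // digests seen so far
ARRAY LONGINT($aDupRecNum;0)  // record numbers of the duplicates found

ALL RECORDS([MyTable])
While (Not(End selection([MyTable])))
	// the "|" separator keeps "ab"+"c" from colliding with "a"+"bc"
	$hash:=Generate digest([MyTable]Field1+"|"+[MyTable]Field2+"|"+[MyTable]Field3;SHA1 digest)
	$pos:=Find in array($aHash;$hash)
	If ($pos>0)
		// this record repeats an earlier field combination
		APPEND TO ARRAY($aDupRecNum;Record number([MyTable]))
	Else
		APPEND TO ARRAY($aHash;$hash)
	End if
	NEXT RECORD([MyTable])
End while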

> On 31.10.2017 at 21:40, David Witton via 4D_Tech <4d_tech@lists.4d.com> wrote:
> Does anyone have a suggestion on how to proceed?


Reiner Kief
Software & IT-Consulting
Weinbergstrasse 45
63853 Moemlingen
Germany

www.kief-online.de
Member of the German developer network die4DWerkstatt










Re: identify duplicates

2017-11-01 Thread Janet Jonas via 4D_Tech
Here is a strategy I have used since V6 to enforce a semi-informal primary key 
for records in an important file. Another parameter was added when our business 
changed a few years ago. I think you could make it work for your task:

C_BOOLEAN($0;$DupExists)
C_TEXT($2;$3;$Value1;$Value2)
C_LONGINT($1;$CurrentKey;$4;$Value3)

$CurrentKey:=$1
$Value1:=$2
$Value2:=$3
If (Count parameters>3)
	$Value3:=$4
Else
	$Value3:=0
End if

$DupExists:=False

CREATE SET([Table];"Duplicates")
SET QUERY DESTINATION(Into set;"Duplicates")

QUERY([Table];[Table]Value1=$Value1;*)
QUERY([Table]; & ;[Table]Value2=$Value2;*)
If ($Value3#0)
	QUERY([Table]; & ;[Table]CAID=$Value3;*)
End if
QUERY([Table]; & ;[Table]Key#$CurrentKey)  // in case the record being checked has already been saved

If (Records in set("Duplicates")>0)
	$DupExists:=True
End if
SET QUERY DESTINATION(Into current selection)
CLEAR SET("Duplicates")

$0:=$DupExists
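
For reference, a hedged example of how a method like this might be called before saving; the method name Check_Duplicate is hypothetical, the fields are the ones used above:

If (Check_Duplicate([Table]Key;[Table]Value1;[Table]Value2))
	ALERT("A record with these values already exists.")
Else
	SAVE RECORD([Table])
End if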

JJ


> On Oct 31, 2017, at 3:12 PM, 4d_tech-requ...@lists.4d.com wrote:
> 
> Message: 4
> Date: Tue, 31 Oct 2017 14:40:50 -0600
> From: David Witton <dhwit...@gmail.com>
> To: 4D iNug Technical <4d_tech@lists.4d.com>
> Subject: identify duplicates
> 
> I'm looking for a strategy to identify duplicate records in a table - that
> is, records for which 3 fields are identical across two or more records -
> or in another case, where a single field is not unique.
> 
> Does anyone have a suggestion on how to proceed?
> 
> -- 
> David Witton
> 
> 


Re: identify duplicates

2017-11-01 Thread Chip Scheide via 4D_Tech
One of the things others have mentioned, but have not really expounded 
upon: what constitutes a duplicate?
For an individual field this can already be complex; for multiple fields, more so...

For some data types (numerics) a duplicate is easy to define:
2 = 2 = 2 = 2 (integer/long integer)

For others (text) it is not so simple.
A = A
But does 
A = a ?
or
A = Ä ? (such as A [or a] + umlaut) 

Does
Mike MacKenzie = MacKenzie, Mike ?

If you have 3 fields across which you are looking to locate duplication, it 
becomes more of an issue (assuming each field is text). It can also depend 
on how the 3 fields are concatenated,
e.g.  Name + longint ID + street address

How is the name constructed? Is it consistent across all data (Mike 
MacKenzie vs MacKenzie Mike vs Michael MacKenzie)?
How is the longint ID converted to text? String(ID), or 
String(ID;) with some format - is the padding long enough?
How is the street name constructed? Number Street Name Designation (123 
East West Ave)?
Is the street name construct consistent across all data? (123 East West 
Ave vs 123 East West vs 123 East West Avenue)

Assuming (a BIG assumption) that the data is 'clean', i.e. all the 
data is formatted similarly, a couple of approaches:
- One might be to use DISTINCT VALUES (with the counting option, if 
available in your version of 4D) on all 3 fields independently, 
then use one of the arrays to loop over the data looking for duplicate 
values in the other fields.

- If you can modify the data (add a field to the table), and the amount 
of data is not too large, or you have very little execution-time 
constraint, you could add a field which would hold the concatenated 3 
field values, then use DISTINCT VALUES (see the sketch after this list).

- I have not really worked with this aspect of 4D, so I am not sure, 
but I know that it is possible to build a composite index (2 or more 
fields making a single index). Whether you can use a composite index 
with DISTINCT VALUES I am not sure.
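
A rough sketch of the second approach above (the concatenated key field); [MyTable], its fields and the added DupKey text field are assumptions, and it assumes a 4D version where DISTINCT VALUES accepts a counts array:

ARRAY TEXT($aKeys;0)
ARRAY LONGINT($aCounts;0)

ALL RECORDS([MyTable])
While (Not(End selection([MyTable])))
	// build one normalized key per record; "|" keeps the parts from running together
	[MyTable]DupKey:=[MyTable]Name+"|"+String([MyTable]ID)+"|"+[MyTable]Street
	SAVE RECORD([MyTable])
	NEXT RECORD([MyTable])
End while

DISTINCT VALUES([MyTable]DupKey;$aKeys;$aCounts)
	// every $aKeys{$i} whose $aCounts{$i}>1 is a duplicated 3-field combination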

I'm sure there are ways in which the approach can be made efficient, 
but it will definitely depend on the data, and what constitutes a 
duplicate.

On Tue, 31 Oct 2017 14:40:50 -0600, David Witton via 4D_Tech wrote:
> I'm looking for a strategy to identify duplicate records in a table - that
> is, records for which 3 fields are identical across two or more records -
> or in another case, where a single field is not unique.
> 
> Does anyone have a suggestion on how to proceed?
> 
> -- 
> David Witton
---
Gas is for washing parts
Alcohol is for drinkin'
Nitromethane is for racing 

Re: identify duplicates

2017-11-01 Thread Arnaud de Montard via 4D_Tech

> On 31 Oct 2017, at 21:40, David Witton via 4D_Tech <4d_tech@lists.4d.com> wrote:
> 
> I'm looking for a strategy to identify duplicate records in a table - that
> is, records for which 3 fields are identical across two or more records -
> or in another case, where a single field is not unique.
> 
> Does anyone have a suggestion on how to proceed?

I have some code to find duplicates of a combination of N fields in a given 
selection. The result is a sub-selection of the duplicates found. Is that what 
you need? Surely not the fastest solution, BTW… 

-- 
Arnaud de Montard 





Re: identify duplicates

2017-10-31 Thread Keith Culotta via 4D_Tech
The third parameter of DISTINCT VALUES really saves work.  Here is a strategy 
to find dups in a field within a selection.  The set can be a listbox highlight 
set.

  // 
  // Method: DuplicatesToSet
  // - 
  // INPUT1: Pointer - field
  // INPUT2: Text - set name
  // OUTPUT: (none) - the named set receives the records whose field value is duplicated
  // 
C_POINTER($1;$field)
C_TEXT($2;$set)
C_LONGINT($pos)

$field:=$1
$set:=$2
ARRAY TEXT($aVals;0)
ARRAY LONGINT($aCount;0)

DISTINCT VALUES($field->;$aVals;$aCount)
SORT ARRAY($aCount;$aVals;>)  // descending by count, so duplicated values come first
$pos:=Find in array($aCount;1)  // position of the first value that occurs only once

If ($pos#-1)
	ARRAY TEXT($aVals;$pos-1)  // keep only the values that occur more than once
End if 

SET QUERY DESTINATION(Into set;$set)
QUERY SELECTION WITH ARRAY($field->;$aVals)
SET QUERY DESTINATION(Into current selection)
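
A hedged usage sketch, assuming the code above is saved as a project method named DuplicatesToSet and that the hypothetical [MyTable]Name is the field to check:

ALL RECORDS([MyTable])  // or any narrower selection
DuplicatesToSet(->[MyTable]Name;"Dups")
USE SET("Dups")  // the current selection is now only the records whose Name is duplicated
CLEAR SET("Dups")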


Keith - CDI

> On Oct 31, 2017, at 3:40 PM, David Witton via 4D_Tech <4d_tech@lists.4d.com> 
> wrote:
> 
> I'm looking for a strategy to identify duplicate records in a table - that
> is, records for which 3 fields are identical across two or more records -
> or in another case, where a single field is not unique.
> 
> Does anyone have a suggestion on how to proceed?
> 
> -- 
> David Witton 

RE: identify duplicates

2017-10-31 Thread Tai Bui via 4D_Tech
Hi David,

Perhaps using GROUP BY, as Neil suggested, in an SQL call along the lines of 
the following will help:

  // the receiving arrays should be declared before the SQL call (field types assumed text here)
ARRAY TEXT($arrFld1;0)
ARRAY TEXT($arrFld2;0)
ARRAY TEXT($arrFld3;0)
ARRAY LONGINT($arrCount;0)

Begin SQL
 SELECT tbl.fld_1, tbl.fld_2, tbl.fld_3, COUNT(*) AS recCount
 FROM tbl
 GROUP BY tbl.fld_1, tbl.fld_2, tbl.fld_3
 HAVING COUNT(*)>1
 INTO :$arrFld1, :$arrFld2, :$arrFld3, :$arrCount;
End SQL

The size of the arrays shows how many distinct duplicated combinations you have, 
and the elements in the same position hold the field values and the count of 
duplicates found for each combination.
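
A minimal follow-up sketch for reading the result back out (array names as in the SQL above; the ALERT is just for illustration):

C_LONGINT($i)
For ($i;1;Size of array($arrCount))
	// one line per duplicated combination, with its occurrence count
	ALERT($arrFld1{$i}+" / "+$arrFld2{$i}+" / "+$arrFld3{$i}+" occurs "+String($arrCount{$i})+" times")
End for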

Best Regards,
-Tai B.




From: David Witton [mailto:dhwit...@gmail.com]
Sent: Tuesday, October 31, 2017 2:07 PM
To: Tai Bui
Subject: Re: identify duplicates

It helps with the single field problem - now I'm trying to solve the multiple 
field problem without creating a new field for each combination of fields I 
want to examine.

On Tue, Oct 31, 2017 at 2:43 PM, Tai Bui <t...@4d.com> wrote:
Hi David,

Would the DISTINCT VALUES command help?
http://doc.4d.com/4Dv16R4/4D/16-R4/DISTINCT-VALUES.301-3317277.en.html

Best Regards,
-Tai B.




Tai Bui
Technical Services Engineer

4D Inc
95 S. Market Street, Suite #240
San Jose, CA 95113
United States

Telephone: +1-408-557-4600
Fax:   +1-408-271-5080
Email: t...@4d.com
Web:   www.4d.com

-----Original Message-----
From: 4D_Tech [mailto:4d_tech-boun...@lists.4d.com] On Behalf Of David Witton via 4D_Tech
Sent: Tuesday, October 31, 2017 1:41 PM
To: 4D iNug Technical
Cc: David Witton
Subject: identify duplicates

I'm looking for a strategy to identify duplicate records in a table - that is, 
records for which 3 fields are identical across two or more records - or in 
another case, where a single field is not unique.

Does anyone have a suggestion on how to proceed?

--
David Witton



--
David Witton

Re: identify duplicates

2017-10-31 Thread npdennis via 4D_Tech
> I'm looking for a strategy to identify duplicate records in a table - that
> is, records for which 3 fields are identical across two or more records -
> or in another case, where a single field is not unique.


If you are just looking for fields that are exactly the same, SQL with 
GROUP BY and HAVING gives a lot of flexibility.
Also, DISTINCT VALUES in the most recent versions accepts an optional count 
array parameter; prior versions just give you the list of unique values.
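
For the single-field case, a minimal sketch using that count array ([MyTable] and its Email field are hypothetical; recent 4D versions only):

ARRAY TEXT($aVals;0)
ARRAY LONGINT($aCounts;0)
ARRAY TEXT($aDups;0)
C_LONGINT($i)

ALL RECORDS([MyTable])
DISTINCT VALUES([MyTable]Email;$aVals;$aCounts)
For ($i;1;Size of array($aCounts))
	If ($aCounts{$i}>1)
		// this value appears in more than one record, i.e. it is not unique
		APPEND TO ARRAY($aDups;$aVals{$i})
	End if
End for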

Neil


Re: identify duplicates

2017-10-31 Thread Chuck Miller via 4D_Tech
As has been recommended, DISTINCT VALUES is your friend, but it really depends 
on what you mean by duplicates. For example, Charles and Karl are the same 
name but in different languages. So what do you mean by duplicates? Also, is 
case important, or do "able" and "Able" equal each other?

Regards
Chuck 

Sent from my iPhone

> On Oct 31, 2017, at 4:40 PM, David Witton via 4D_Tech <4d_tech@lists.4d.com> 
> wrote:
> 
> I'm looking for a strategy to identify duplicate records in a table - that
> is, records for which 3 fields are identical across two or more records -
> or in another case, where a single field is not unique.
> 
> Does anyone have a suggestion on how to proceed?


RE: identify duplicates

2017-10-31 Thread Tai Bui via 4D_Tech
Hi David,

Would the DISTINCT VALUES command help?
http://doc.4d.com/4Dv16R4/4D/16-R4/DISTINCT-VALUES.301-3317277.en.html

Best Regards,
-Tai B.




-----Original Message-----
From: 4D_Tech [mailto:4d_tech-boun...@lists.4d.com] On Behalf Of David Witton via 4D_Tech
Sent: Tuesday, October 31, 2017 1:41 PM
To: 4D iNug Technical
Cc: David Witton
Subject: identify duplicates

I'm looking for a strategy to identify duplicate records in a table - that is, 
records for which 3 fields are identical across two or more records - or in 
another case, where a single field is not unique.

Does anyone have a suggestion on how to proceed?

--
David Witton

identify duplicates

2017-10-31 Thread David Witton via 4D_Tech
I'm looking for a strategy to identify duplicate records in a table - that
is, records for which 3 fields are identical across two or more records -
or in another case, where a single field is not unique.

Does anyone have a suggestion on how to proceed?

-- 
David Witton