Re: Preventing Duplicate Data WAS UUID vs Longint primary key

2017-08-08 Thread Jody Bevan via 4D_Tech
Chip:

I have always tried doing at the point of data entry. Obviously no code is 
perfect, and either is the object between the keyboard and the chair. My 
experience is that few Administrators would take the time to filter through 
duplicates. Getting them to do system maintenance was almost impossible. They 
had too many other responsibilities. Typically they only worked on this if a 
problem was identified. For that we have a built in record merger.

Application: Medical

If a phone number is associated with the record for an individual - then I will 
use that. How it is implemented depends on the application (even within the 
application).

Lets say I am adding in a person. They enter in a name (which is not that good 
as people will call themselves by different names). I remember one patient that 
would present with a different name (About 6 personalities) in a walk in 
clinic. If a new record was going to be created they would try and get a phone 
number from the patient. When the phone number was put in a dialog would come 
up displaying all the names of people with the same phone number. It let the 
staff (if they ‘felt like it’) to try and filter for duplicates at that point.

For those staff that were diligent it prevented many duplicates in the medical 
data base. We had clinics that had 420,000 ‘regular’ patients. Then they had a 
walk in that had another 500,000 irregular patients. Their Administration was 
more eager to keep the duplicates down.

We also used the Health Insurance Number as another indicator. Not all 
patient’s would present with their health care card though. With ‘universal’ 
health care it is not as critical.

Application: National Order / Deliver Desk

We do the same in Canada for Postal Code and Street name

With a national order delivery system we had the addresses divided up so that 
we could look up Postal Codes (Canada) by address. This was also used for 
keeping duplicates down. This is when I learned that in Canada Postal Codes 
change weekly (nationally - there are postal codes changing somewhere).


Looking in your system for unique or even semi-unique data can be used to help 
keep duplicates down.
In the medical system we let them have a picture in the patient’s record that 
was presented (if security settings permitted) at the time of entry to the 
facility.

Those are the ones that I have used.

Jody



Jody Bevan
ARGUS Productions Inc.
Developer

Argus Productions Inc. 




> On Aug 8, 2017, at 8:25 AM, Chip Scheide via 4D_Tech <4d_tech@lists.4d.com> 
> wrote:
> 
> Jody,
> what are your normal duplicate reducing/removal technic(s)?

**
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:4d_tech-unsubscr...@lists.4d.com
**

Re: Preventing Duplicate Data WAS UUID vs Longint primary key

2017-08-08 Thread Chip Scheide via 4D_Tech
Jody,
what are your normal duplicate reducing/removal technic(s)?

On Mon, 7 Aug 2017 12:28:21 -0600, Jody Bevan via 4D_Tech wrote:
> John:
> 
> Thanks for sharing this. I had not thought of this way after all 
> these years. I will take the idea and apply where appropriate, in 
> addition to my normal code to reduce duplicates.
> 
> 
> Jody Bevan
> ARGUS Productions Inc.
> Developer
> 
> Argus Productions Inc. 
> 
> 
> 
> 
>> On Aug 7, 2017, at 11:50 AM, John Baughman via 4D_Tech 
>> <4d_tech@lists.4d.com> wrote:
>> 
>> What I have done is to have a field in the table  that contains the 
>> keys that make the record unique with all spaces, special 
>> characters, and vowels removed, as well as eliminating any 
>> consecutive consonants . See my example below. So far this has 
>> worked pretty well for me and I guess would fall under David’s 
>> category of "Carefully program your system to detect and prevent 
>> duplicate rows”
>> 
>> John
>> 
>> ―― Example 
>> $firstName:=[Contacts]firstName (John)
>> $lastName:=[Contacts]lastName (Baughman)
>> $company:=[Contacts]company (BY’te DESIGN Hawwaii)  notice I have 
>> mistakenly put more than 1 w in Hawaii.
>> $DupeCheck:= $firstName+ $lastName+$company
>> [Contacts]DupeCheck:=AlphaOnlyNoVowels ($DupeCheck;"*”)  
>> //AlphaOnlyNoVowels does the heavy lifting. The asterisk tells the 
>> method to remove consecutive consonants.
>> 
>> [Contacts]DupeCheck now contains  “JHNBGHMNBYTDSGNHW”
>> 
>> I wrap the above in a duplicate checking method for the [Contacts] 
>> table called ContactsDuplicateManager
>> 
>> Whenever a record is updated or created in the Contacts table…
>> 
>> $DupeCheck:= ContactsDuplicateManager ("isDuplcate”;[Contacts]ID)
>> 
>> The ContactsDuplicate method creates the check string as above and 
>> searches the contacts table for duplicates using the 
>> [Contacts]DupeCheck field. If no duplicates are found it returns the 
>> check string. If a duplicate is found it returns the check string 
>> with a prepended asterisk.  The contact ID if passed prevents the 
>> dupe check from finding the record being updated. If this is a new 
>> record 0 is passed for the Contact ID. So…
>> 
>> If ($DupeCheck =“*@“
>>   Handle the duplicate in context. If, for example this is a user 
>> updating or creating a contact record, warn the user of the possible 
>> duplicate with available options.
>> 
>> else
>>   [Contacts]DupeCheck:=$DupeCheck
>>   SAVE RECORD([Contacts)
>> 
>> end if
>> 
>> -
>> 
> 
> **
> 4D Internet Users Group (4D iNUG)
> FAQ:  http://lists.4d.com/faqnug.html
> Archive:  http://lists.4d.com/archives.html
> Options: http://lists.4d.com/mailman/options/4d_tech
> Unsub:  mailto:4d_tech-unsubscr...@lists.4d.com
> **
---
Gas is for washing parts
Alcohol is for drinkin'
Nitromethane is for racing 
**
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:4d_tech-unsubscr...@lists.4d.com
**

Re: ​Re: Preventing Duplicate Data WAS UUID vs Longint primary key

2017-08-07 Thread David Adams via 4D_Tech
I've found that after hours "helper" routines that fire off and run in the
> background working through data to flag dupes for admin oversight next
day is
> popular with some managers who prefer to make their own decision about
whether
> some stuff really is a dupe or not. Some duplicate data has to be
eliminated
> "right now" before the record is saved; whereas some might be interesting
to
> investigate a little more in depth. The whys and wherefores --what caused
it,
> who caused it, where did it come from, and why, is it legit or a real
mistake;
> such questions can often lead to better processes, better training, better
> form design, better import provisioning, or better import pre-cleaning,
etc.
> It all depends on the project and biz needs.

Nicely described. Human judgement is pretty important or critical with some
"duplicate" row choices. Says the guy with a name so common he's seen his
name taken of a flight manifest because they already had one David Adams on
board. Man, that guy sucks. Absolutely agree about the training benefits of
*quick* feedback on duplicates. You run a scan at night, find a list of
possible duplicates and then can go and find the person responsible. Like
you said, the perfect opportuity for better data entry tools ("Oh, the big
screen is too hard to enter when we get a phone call, so we always just add
a new customr record"), better training (see previous) and, hopefully,
better communications amongst everyone involved.
**
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:4d_tech-unsubscr...@lists.4d.com
**

​Re: Preventing Duplicate Data WAS UUID vs Longint primary key

2017-08-07 Thread steve simpson via 4D_Tech
On Mon, Aug 7, 2017 at 3:00 PM,
​
​
David Adams  wrote:

>
>
> > How do you deal with that problem (Preventing duplicate data)
>

Definitely
​
"Carefully program your system to detect and prevent duplicate rows" as
appropriate. Generally such a Dupe Check can take many forms depending on
the business needs, the data in question, and the data entry
process/environment. I've not yet found a solution that fits all. Generally
it is much easier to do this if the "data entry user" is a browser post -
where you have the time and space to do more complicated look-ups. (More
and more of my own projects are web front ends to 4D in the backend.)
 ​I've used similar constructions as John's ContactsDuplicateManager example
as well, although I steer away from storing extra data if I can. For less
immediacy data needs I've found that after hours "helper" routines that
fire off and run in the background working through data to flag dupes for
admin oversight next day is popular with some managers who prefer to make
their own decision about whether some stuff really is a dupe or not. Some
duplicate data has to be eliminated "right now" before the record is saved;
whereas some might be interesting to investigate a little more in depth.
The whys and wherefores --what caused it, who caused it, where did it come
from, and why, is it legit or a real mistake; such questions can often lead
to better processes, better training, better form design, better import
provisioning, or better import pre-cleaning, etc. It all depends on the
project and biz needs.

Steve Simpson
**
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:4d_tech-unsubscr...@lists.4d.com
**

Re: Preventing Duplicate Data WAS UUID vs Longint primary key

2017-08-07 Thread David Adams via 4D_Tech
As a variant on John's technique, I'll combine fields into a single text
block that I then run through a fast hashing algorithm that returns a
longint.

What good is that longint? It helps in two cases:

* If you're comparing two copies of the same record during an update/sync,
etc., then you can hash the new copy and see if the hash differs from the
stored original. If they match, you can figure there's been no update.
[Subject to availability, limitations apply. See 'hashing' for complete
terms and details.]

* When you don't know if the row is a duplicate or not, hash the incoming
data and see if it matches something else. If it  does not match, you've
got a new row. If it does match one or more rows, you *might* have a
duplicate, but at least you only need to gets a small # of records to find
out.

Regarding hashing, I don't use SHA1, MD5, etc. Because I don't need them
and don't want the overhead. Instead I use some hashing functions from an
old (10+ years ago) tech notes. They still work great and are a good match
for exactly this sort of application.
**
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:4d_tech-unsubscr...@lists.4d.com
**

Re: Preventing Duplicate Data WAS UUID vs Longint primary key

2017-08-07 Thread Jody Bevan via 4D_Tech
John:

Thanks for sharing this. I had not thought of this way after all these years. I 
will take the idea and apply where appropriate, in addition to my normal code 
to reduce duplicates.


Jody Bevan
ARGUS Productions Inc.
Developer

Argus Productions Inc. 




> On Aug 7, 2017, at 11:50 AM, John Baughman via 4D_Tech <4d_tech@lists.4d.com> 
> wrote:
> 
> What I have done is to have a field in the table  that contains the keys that 
> make the record unique with all spaces, special characters, and vowels 
> removed, as well as eliminating any consecutive consonants . See my example 
> below. So far this has worked pretty well for me and I guess would fall under 
> David’s category of "Carefully program your system to detect and prevent 
> duplicate rows”
> 
> John
> 
> —— Example 
> $firstName:=[Contacts]firstName (John)
> $lastName:=[Contacts]lastName (Baughman)
> $company:=[Contacts]company (BY’te DESIGN Hawwaii)  notice I have mistakenly 
> put more than 1 w in Hawaii.
> $DupeCheck:= $firstName+ $lastName+$company
> [Contacts]DupeCheck:=AlphaOnlyNoVowels ($DupeCheck;"*”)  //AlphaOnlyNoVowels 
> does the heavy lifting. The asterisk tells the method to remove consecutive 
> consonants.
> 
> [Contacts]DupeCheck now contains  “JHNBGHMNBYTDSGNHW”
> 
> I wrap the above in a duplicate checking method for the [Contacts] table 
> called ContactsDuplicateManager
> 
> Whenever a record is updated or created in the Contacts table…
> 
> $DupeCheck:= ContactsDuplicateManager ("isDuplcate”;[Contacts]ID)
> 
> The ContactsDuplicate method creates the check string as above and searches 
> the contacts table for duplicates using the [Contacts]DupeCheck field. If no 
> duplicates are found it returns the check string. If a duplicate is found it 
> returns the check string with a prepended asterisk.  The contact ID if passed 
> prevents the dupe check from finding the record being updated. If this is a 
> new record 0 is passed for the Contact ID. So…
> 
> If ($DupeCheck =“*@“
>   Handle the duplicate in context. If, for example this is a user updating or 
> creating a contact record, warn the user of the possible duplicate with 
> available options.
> 
> else
>   [Contacts]DupeCheck:=$DupeCheck
>   SAVE RECORD([Contacts)
> 
> end if
> 
> -
> 

**
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:4d_tech-unsubscr...@lists.4d.com
**

Re: Preventing Duplicate Data WAS UUID vs Longint primary key

2017-08-07 Thread John Baughman via 4D_Tech
What I have done is to have a field in the table  that contains the keys that 
make the record unique with all spaces, special characters, and vowels removed, 
as well as eliminating any consecutive consonants . See my example below. So 
far this has worked pretty well for me and I guess would fall under David’s 
category of "Carefully program your system to detect and prevent duplicate rows”

John

—— Example 
$firstName:=[Contacts]firstName (John)
$lastName:=[Contacts]lastName (Baughman)
$company:=[Contacts]company (BY’te DESIGN Hawwaii)  notice I have mistakenly 
put more than 1 w in Hawaii.
$DupeCheck:= $firstName+ $lastName+$company
[Contacts]DupeCheck:=AlphaOnlyNoVowels ($DupeCheck;"*”)  //AlphaOnlyNoVowels 
does the heavy lifting. The asterisk tells the method to remove consecutive 
consonants.

[Contacts]DupeCheck now contains  “JHNBGHMNBYTDSGNHW”

I wrap the above in a duplicate checking method for the [Contacts] table called 
ContactsDuplicateManager

Whenever a record is updated or created in the Contacts table…

$DupeCheck:= ContactsDuplicateManager ("isDuplcate”;[Contacts]ID)

The ContactsDuplicate method creates the check string as above and searches the 
contacts table for duplicates using the [Contacts]DupeCheck field. If no 
duplicates are found it returns the check string. If a duplicate is found it 
returns the check string with a prepended asterisk.  The contact ID if passed 
prevents the dupe check from finding the record being updated. If this is a new 
record 0 is passed for the Contact ID. So…

If ($DupeCheck =“*@“
   Handle the duplicate in context. If, for example this is a user updating or 
creating a contact record, warn the user of the possible duplicate with 
available options.

else
   [Contacts]DupeCheck:=$DupeCheck
   SAVE RECORD([Contacts)
  
end if

-





> On Aug 7, 2017, at 6:56 AM, Dennis, Neil via 4D_Tech <4d_tech@lists.4d.com> 
> wrote:
> 
>> How do you deal with that problem (Preventing duplicate data)
> 
> When unique data is required because of a business need, I do implement one 
> of your suggested methods: "Carefully program your system to detect and 
> prevent duplicate rows."
> 
> I would suggest not doing this in a trigger, but instead on data entry 
> (imports, user entry). The 4D command "Find in Field" works in many of these 
> cases.
> 
> Neil
> 
> 
> 
> 
> --
> 
> 
> Privacy Disclaimer: This message contains confidential information and is 
> intended only for the named addressee. If you are not the named addressee you 
> should not disseminate, distribute or copy this email. Please delete this 
> email from your system and notify the sender immediately by replying to this 
> email.  If you are not the intended recipient you are notified that 
> disclosing, copying, distributing or taking any action in reliance on the 
> contents of this information is strictly prohibited.
> 
> The Alternative Investments division of UMB Fund Services provides a full 
> range of services to hedge funds, funds of funds and private equity funds.  
> Any tax advice in this communication is not intended to be used, and cannot 
> be used, by a client or any other person or entity for the purpose of (a) 
> avoiding penalties that may be imposed on any taxpayer or (b) promoting, 
> marketing, or recommending to another party any matter addressed herein.
> **
> 4D Internet Users Group (4D iNUG)
> FAQ:  http://lists.4d.com/faqnug.html
> Archive:  http://lists.4d.com/archives.html
> Options: http://lists.4d.com/mailman/options/4d_tech
> Unsub:  mailto:4d_tech-unsubscr...@lists.4d.com
> **

John Baughman
Kailua, Hawaii
(808) 262-0328
john...@hawaii.rr.com





**
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:4d_tech-unsubscr...@lists.4d.com
**

Re: Preventing Duplicate Data WAS UUID vs Longint primary key

2017-08-07 Thread Chuck Miller via 4D_Tech
perhaps you can use sounded and store that in a hidden field.

Regards
Chuck

 Chuck Miller Voice: (617) 739-0306
 Informed Solutions, Inc. Fax: (617) 232-1064   
 mailto:cjmillerinformed-solutions.com 
 Brookline, MA 02446 USA Registered 4D Developer
   Providers of 4D and Sybase connectivity
  http://www.informed-solutions.com  

This message and any attached documents contain information which may be 
confidential, subject to privilege or exempt from disclosure under applicable 
law.  These materials are intended only for the use of the intended recipient. 
If you are not the intended recipient of this transmission, you are hereby 
notified that any distribution, disclosure, printing, copying, storage, 
modification or the taking of any action in reliance upon this transmission is 
strictly prohibited.  Delivery of this message to any person other than the 
intended recipient shall not compromise or waive such confidentiality, 
privilege or exemption from disclosure as to this communication. 

> On Aug 7, 2017, at 12:56 PM, Dennis, Neil via 4D_Tech <4d_tech@lists.4d.com> 
> wrote:
> 
> When unique data is required because of a business need, I do implement one 
> of your suggested methods: "Carefully program your system to detect and 
> prevent duplicate rows."
> 
> I would suggest not doing this in a trigger, but instead on data entry 
> (imports, user entry). The 4D command "Find in Field" works in many of these 
> cases.
> 
> Neil

**
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:4d_tech-unsubscr...@lists.4d.com
**

Preventing Duplicate Data WAS UUID vs Longint primary key

2017-08-07 Thread Dennis, Neil via 4D_Tech
> How do you deal with that problem (Preventing duplicate data)

When unique data is required because of a business need, I do implement one of 
your suggested methods: "Carefully program your system to detect and prevent 
duplicate rows."

I would suggest not doing this in a trigger, but instead on data entry 
(imports, user entry). The 4D command "Find in Field" works in many of these 
cases.

Neil




--


Privacy Disclaimer: This message contains confidential information and is 
intended only for the named addressee. If you are not the named addressee you 
should not disseminate, distribute or copy this email. Please delete this email 
from your system and notify the sender immediately by replying to this email.  
If you are not the intended recipient you are notified that disclosing, 
copying, distributing or taking any action in reliance on the contents of this 
information is strictly prohibited.

The Alternative Investments division of UMB Fund Services provides a full range 
of services to hedge funds, funds of funds and private equity funds.  Any tax 
advice in this communication is not intended to be used, and cannot be used, by 
a client or any other person or entity for the purpose of (a) avoiding 
penalties that may be imposed on any taxpayer or (b) promoting, marketing, or 
recommending to another party any matter addressed herein.
**
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:4d_tech-unsubscr...@lists.4d.com
**