Re: How to hash fields and detect changes in a record
On June 14, 2022 10:29:40 PM CDT, Mike Dewhirst wrote:
>On 14/06/2022 11:20 pm, Ryan Nowakowski wrote:
>>
>> Summing the ordinal of the characters won't catch transposition:
>>
>> >>> chars = 'ab'
>> >>> sum([ord(c) for c in chars])
>> 195
>> >>> chars = 'ba'
>> >>> sum([ord(c) for c in chars])
>> 195
>>
>> Better to use a real hash algorithm if you're trying to detect changes. My
>> note above about hashing not being required is because you don't need to
>> detect changes because you explicitly already know when changes are being
>> made.
>>
>
>Thanks Ryan.
>
>It is all working now. I append " - No longer relevant" to the note title if
>any change is detected. Otherwise the note gets deleted.
>

Good to hear! Seems like an interesting project.

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/935A6EB4-032D-4264-BD83-24508764B189%40fattuba.com.
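Ryan's transposition example can be reproduced in a few lines of standard-library Python and compared with a real digest from `hashlib`. The helper names `ord_sum` and `sha256_hex` are illustrative and do not come from the thread. The second check shows that a plain sum also collides for strings that are not transpositions at all, such as `'ad'` and `'bc'` (97 + 100 = 98 + 99 = 197).

```python
import hashlib


def ord_sum(text: str) -> int:
    """Sum of code points: the 'sum the ordinals' check from the thread."""
    return sum(ord(c) for c in text)


def sha256_hex(text: str) -> str:
    """Hex SHA-256 digest of the text encoded as UTF-8."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# A transposition leaves the ordinal sum unchanged...
print(ord_sum("ab"), ord_sum("ba"))          # 195 195
# ...and so do different characters that happen to add up to the same total.
print(ord_sum("ad") == ord_sum("bc"))        # True
# The SHA-256 digests of the transposed strings differ.
print(sha256_hex("ab") == sha256_hex("ba"))  # False
```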
Re: How to hash fields and detect changes in a record
On 14/06/2022 11:20 pm, Ryan Nowakowski wrote:
> Summing the ordinal of the characters won't catch transposition:
>
> >>> chars = 'ab'
> >>> sum([ord(c) for c in chars])
> 195
> >>> chars = 'ba'
> >>> sum([ord(c) for c in chars])
> 195
>
> Better to use a real hash algorithm if you're trying to detect changes. My note above about hashing not being required is because you don't need to detect changes because you explicitly already know when changes are being made.

Thanks Ryan.

It is all working now. I append " - No longer relevant" to the note title if any change is detected. Otherwise the note gets deleted.

Cheers

Mike

--
Signed email is an absolute defence against phishing. This email has been signed with my private key. If you import my public key you can automatically decrypt my signature and be sure it came from me. Just ask and I'll send it to you. Your email software can handle signing.

To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/75458bb4-bb4b-0e44-54c1-59768d34d403%40dewhirst.com.au.
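Mike's final behaviour can be sketched in plain Python: an unchanged auto-generated note is deleted, and an edited one gets " - No longer relevant" appended to its title. This is a sketch under assumptions, not Mike's code. The `Note` dataclass stands in for the Django model, and the `fingerprint` field and the `note_digest` and `retire_old_notes` helpers are hypothetical. It also uses a SHA-256 digest, as Ryan recommended, rather than an ordinal sum. Notes without a fingerprint are treated as user-created and left alone, matching Mike's remark that only auto-notes get a flag.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass

SUFFIX = " - No longer relevant"


def note_digest(title: str, body: str) -> str:
    """Hypothetical helper: SHA-256 over both fields.

    json.dumps keeps the boundary between the fields, so ("ab", "c")
    and ("a", "bc") hash differently.
    """
    payload = json.dumps([title, body], ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass
class Note:
    title: str
    note: str
    fingerprint: str = ""  # assumed: set only for auto-generated notes


def retire_old_notes(notes: list[Note]) -> list[Note]:
    """Return the notes to keep before a new set of notes is generated."""
    kept = []
    for n in notes:
        if not n.fingerprint:
            kept.append(n)  # user-created note: never deleted or renamed
        elif note_digest(n.title, n.note) == n.fingerprint:
            continue  # auto-note untouched since generation: drop (delete) it
        else:
            if not n.title.endswith(SUFFIX):
                n.title += SUFFIX  # edited auto-note: keep it, but mark it
            kept.append(n)
    return kept


auto = Note("Advice 1", "Generated text")
auto.fingerprint = note_digest(auto.title, auto.note)
edited = Note("Advice 2", "Generated text")
edited.fingerprint = note_digest(edited.title, edited.note)
edited.note += " Action taken."  # a user records an action on this note
remaining = retire_old_notes([auto, edited])
print([n.title for n in remaining])  # ['Advice 2 - No longer relevant']
```

In a Django model, dropping a note would be a `delete()` call and renaming it a `save()`. How those calls are wired into the regeneration step depends on the real model.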
Re: How to hash fields and detect changes in a record
On 6/12/22 11:40 PM, Mike Dewhirst wrote:
> Original message
> From: Ryan Nowakowski
> Date: 13/6/22 07:09 (GMT+10:00)
> To: django-users@googlegroups.com
> Subject: Re: How to hash fields and detect changes in a record
>
> > On Sat, Jun 11, 2022 at 12:13:16AM +1000, Mike Dewhirst wrote:
> > > On 10/06/2022 11:24 pm, Ryan Nowakowski wrote:
> > > > On Fri, Jun 10, 2022 at 05:52:48PM +1000, Mike Dewhirst wrote:
> > > > > I think the solution might be to hash note.title and note.note into a new
> > > > > field note.hash on being auto-created. On subsequent saves, compare the
> > > > > latest hash with note.hash to decide whether to delete auto-inserted notes
> > > > > prior to generating the next set. Those subsequent saves could be months or
> > > > > years later.
> > > > Hashing is useful if you want to check that something has been
> > > > unexpectedly changed. I assume the note can only be changed through
> > > > your web app so you know when a user is changing a note.
> > >
> > > These are automatically generated notes which taken together constitute
> > > advice on how to deal with the analysis. Users can edit them. For example,
> > > someone might record some action taken regarding the advice. I don't want to
> > > delete that. If nothing has been edited, it is safe to delete.
> > >
> > > So how do I know it is the same as when originally generated - and safe to
> > > delete - except by storing a hash of the interesting fields.
> >
> > Because when the user edits a note, during the form.save() (assuming
> > you're using Django forms), you'll set `altered_by_user` to True.
>
> Notes can also be altered in the Admin

You have a couple of choices then. You could alter the note details view in the admin to set the altered_by_user field. Alternatively and more generically, you could check the pk field in your model save method. If it is None, then you are creating a new note. If the pk field is not None, then you are updating an existing note so you can set altered_by_user to True.

> > > And if that is the best approach, what sort of hashing will survive Python
> > > upgrades etc?
> >
> > Pick a hash algorithm[1] (ex: sha256). The output will remain the same
> > even with Python upgrades.
>
> So the mechanism doesn't need to be a hash - as you said. I now just sum ord(char) for the title and the note and keep that in a flag field.

Summing the ordinal of the characters won't catch transposition:

>>> chars = 'ab'
>>> sum([ord(c) for c in chars])
195
>>> chars = 'ba'
>>> sum([ord(c) for c in chars])
195

Better to use a real hash algorithm if you're trying to detect changes. My note above about hashing not being required is because you don't need to detect changes because you explicitly already know when changes are being made.

To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/a0d61798-d885-dffd-bfbb-b23a63fbd820%40fattuba.com.
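Ryan's suggestion to check the pk in the model's save method can be illustrated without Django. The sketch uses a plain class whose `pk` is assigned on the first save, much as the database does for an auto-incrementing key. Everything here (`Note`, `_next_id`, `save()`) is a hypothetical stand-in for a Django model and only shows the decision logic.

```python
import itertools

_next_id = itertools.count(1)  # stands in for the database's auto-increment key


class Note:
    """Hypothetical plain-Python stand-in for a Django Note model."""

    def __init__(self, title: str, note: str) -> None:
        self.pk = None
        self.title = title
        self.note = note
        self.altered_by_user = False  # default for newly created notes

    def save(self) -> None:
        if self.pk is None:
            # No primary key yet: this save creates the note.
            self.pk = next(_next_id)
        else:
            # Primary key already set: this save updates an existing note.
            self.altered_by_user = True


note = Note("Advice 1", "Generated text")
note.save()                  # creation: flag stays False
print(note.altered_by_user)  # False
note.note += " Action taken."
note.save()                  # update: flag becomes True
print(note.altered_by_user)  # True
```

In a real Django model, the same test would sit in an overridden `save(self, *args, **kwargs)` before the call to `super().save(*args, **kwargs)`. There are two caveats:

- Any code path that re-saves an auto-generated note, for example during regeneration, would also set the flag.
- If the primary key has a default, such as a `UUIDField` with `default=uuid.uuid4`, then `pk` is already set before the first save. In that case `self._state.adding` is the more reliable check.

For the admin route, `ModelAdmin.save_model(request, obj, form, change)` receives `change=True` when an existing object is edited.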
Re: How to hash fields and detect changes in a record
--
(Unsigned mail from my phone)

Original message
From: Ryan Nowakowski
Date: 13/6/22 07:09 (GMT+10:00)
To: django-users@googlegroups.com
Subject: Re: How to hash fields and detect changes in a record

> On Sat, Jun 11, 2022 at 12:13:16AM +1000, Mike Dewhirst wrote:
> > On 10/06/2022 11:24 pm, Ryan Nowakowski wrote:
> > > On Fri, Jun 10, 2022 at 05:52:48PM +1000, Mike Dewhirst wrote:
> > > > I think the solution might be to hash note.title and note.note into a new
> > > > field note.hash on being auto-created. On subsequent saves, compare the
> > > > latest hash with note.hash to decide whether to delete auto-inserted notes
> > > > prior to generating the next set. Those subsequent saves could be months or
> > > > years later.
> > > Hashing is useful if you want to check that something has been
> > > unexpectedly changed. I assume the note can only be changed through
> > > your web app so you know when a user is changing a note.
> >
> > These are automatically generated notes which taken together constitute
> > advice on how to deal with the analysis. Users can edit them. For example,
> > someone might record some action taken regarding the advice. I don't want to
> > delete that. If nothing has been edited, it is safe to delete.
> >
> > So how do I know it is the same as when originally generated - and safe to
> > delete - except by storing a hash of the interesting fields.
>
> Because when the user edits a note, during the form.save() (assuming
> you're using Django forms), you'll set `altered_by_user` to True.

Notes can also be altered in the Admin

> > And if that is the best approach, what sort of hashing will survive Python
> > upgrades etc?
>
> Pick a hash algorithm[1] (ex: sha256). The output will remain the same
> even with Python upgrades.

So the mechanism doesn't need to be a hash - as you said. I now just sum ord(char) for the title and the note and keep that in a flag field.

Only the auto-notes get a flag because they are the only ones I would consider deleting.

> [1] https://docs.python.org/3/library/hashlib.html
>
> > > Since you're
> > > expecting users to change some of the notes and you know when they do,
> > > hashing might be overkill. Instead, add a boolean `altered_by_user`
> > > field to the note model. Initially when you automatically create the
> > > note altered_by_user would be set to False. If a user changes the note,
> > > set altered_by_user to True.
> >
> > Not sure this would work. Note creation and eventually automatic deletion is
> > all driven from model methods executed on saving.
>
> Why wouldn't this work? During note creation, altered_by_user would be
> set to False automatically because that's the default. When
> automatically deleting, do:
>
> Note.objects.filter(altered_by_user=False).delete()

To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/62a6c0f0.1c69fb81.c1d5.c26fSMTPIN_ADDED_MISSING%40gmr-mx.google.com.
Re: How to hash fields and detect changes in a record
On Sat, Jun 11, 2022 at 12:13:16AM +1000, Mike Dewhirst wrote:
> On 10/06/2022 11:24 pm, Ryan Nowakowski wrote:
> > On Fri, Jun 10, 2022 at 05:52:48PM +1000, Mike Dewhirst wrote:
> > > I think the solution might be to hash note.title and note.note into a new
> > > field note.hash on being auto-created. On subsequent saves, compare the
> > > latest hash with note.hash to decide whether to delete auto-inserted notes
> > > prior to generating the next set. Those subsequent saves could be months or
> > > years later.
> > Hashing is useful if you want to check that something has been
> > unexpectedly changed. I assume the note can only be changed through
> > your web app so you know when a user is changing a note.
>
> These are automatically generated notes which taken together constitute
> advice on how to deal with the analysis. Users can edit them. For example,
> someone might record some action taken regarding the advice. I don't want to
> delete that. If nothing has been edited, it is safe to delete.
>
> So how do I know it is the same as when originally generated - and safe to
> delete - except by storing a hash of the interesting fields.

Because when the user edits a note, during the form.save() (assuming
you're using Django forms), you'll set `altered_by_user` to True.

> And if that is the best approach, what sort of hashing will survive Python
> upgrades etc?

Pick a hash algorithm[1] (ex: sha256). The output will remain the same
even with Python upgrades.

[1] https://docs.python.org/3/library/hashlib.html

> > Since you're
> > expecting users to change some of the notes and you know when they do,
> > hashing might be overkill. Instead, add a boolean `altered_by_user`
> > field to the note model. Initially when you automatically create the
> > note altered_by_user would be set to False. If a user changes the note,
> > set altered_by_user to True.
>
> Not sure this would work. Note creation and eventually automatic deletion is
> all driven from model methods executed on saving.

Why wouldn't this work? During note creation, altered_by_user would be
set to False automatically because that's the default. When
automatically deleting, do:

Note.objects.filter(altered_by_user=False).delete()

To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/20220612210931.GA32625%40fattuba.com.
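Ryan's point that a `hashlib` digest survives Python upgrades can be checked against the well-known SHA-256 test vector for the ASCII string "abc". The value is fixed by the algorithm's specification, so any correct implementation on any Python version returns it. By contrast, the built-in `hash()` of a `str` is randomized per process unless `PYTHONHASHSEED` is set, so it should never be stored. The `stable_digest` helper is illustrative only. Encoding with a fixed codec such as UTF-8 is part of what makes the result reproducible.

```python
import hashlib

# Well-known SHA-256 test vector for the ASCII string "abc".
ABC_SHA256 = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"


def stable_digest(text: str) -> str:
    """SHA-256 of the UTF-8 bytes of `text`, as 64 lowercase hex characters.

    The result depends only on the input, not on the Python version,
    platform or process, unlike the built-in hash().
    """
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


print(stable_digest("abc") == ABC_SHA256)   # True
print(len(stable_digest("any note text")))  # 64
```

Because the hex digest is always 64 characters, a field declared as `models.CharField(max_length=64)` would be enough to hold it. That field declaration is an assumption about the model, not something from the thread.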
Re: How to hash fields and detect changes in a record
Ryan

Thanks very much - you triggered the necessary amount of thinking and I reckon you are correct - hashing is overkill. I just need a self-referential value comparison where the value is independent of outside influences.

That means I should not use a hash library from anywhere. I'll just convert all the chars in all the fields I'm interested in into integers and sum them into my "hash" field.

Should be quick and easy!

Cheers

Mike

On 11/06/2022 12:13 am, Mike Dewhirst wrote:
> On 10/06/2022 11:24 pm, Ryan Nowakowski wrote:
> > On Fri, Jun 10, 2022 at 05:52:48PM +1000, Mike Dewhirst wrote:
> > > The use case is auto-deletion of out-of-date records if they have not changed.
> > >
> > > That might sound weird but it is the solution I have come to for a particular problem. My software analyses chemical properties and writes note records containing advice, each with a FK to the chemical.
> > >
> > > When values change sufficiently on the chemical, the software would construct a set of mostly different note records. The problem is that note records still exist from the previous set of properties. These would definitely confuse the user and thereby invalidate the advice.
> >
> > You might consider versioning your chemical model objects. Then when values change sufficiently on the chemical model object, you can create a new version of the chemical object, leaving the old notes associated with the old version of the chemical object. In your web app, you could allow the users to browse old versions of the chemical including the notes which may have been altered.
>
> That's not really appropriate. The user doesn't care about older versions beyond an annual summary of the calculated analysis. As volumes (manufactured and/or imported) change, the analysis and therefore current advice changes. There is no need to keep track of out-of-date advice notes.
>
> What really matters is that *when* things change the advice needs to change and the old advice needs to be deleted.
>
> The only reason I need to avoid deleting old notes is if the user has edited the advice itself - in any of the individual notes. Probably it would be OK to delete an edited note because it is old advice BUT I feel it would be wrong for software to make that decision. As I said, I'm happy to document why.
>
> Just thinking about that, I could maybe adjust the note.title to append something like "Out of date" if I detect it has been edited.
>
> > > The workaround is for the user to delete all notes *prior* to re-saving and auto-generating a new correct set of notes. There is a proviso that you wouldn't want to delete notes altered by users. I would document that so users understand why the software skipped deleting those notes.
> > >
> > > I think the solution might be to hash note.title and note.note into a new field note.hash on being auto-created. On subsequent saves, compare the latest hash with note.hash to decide whether to delete auto-inserted notes prior to generating the next set. Those subsequent saves could be months or years later.
> >
> > Hashing is useful if you want to check that something has been unexpectedly changed. I assume the note can only be changed through your web app so you know when a user is changing a note.
>
> These are automatically generated notes which taken together constitute advice on how to deal with the analysis. Users can edit them. For example, someone might record some action taken regarding the advice. I don't want to delete that. If nothing has been edited, it is safe to delete.
>
> So how do I know it is the same as when originally generated - and safe to delete - except by storing a hash of the interesting fields.
>
> And if that is the best approach, what sort of hashing will survive Python upgrades etc?
>
> > Since you're expecting users to change some of the notes and you know when they do, hashing might be overkill. Instead, add a boolean `altered_by_user` field to the note model. Initially when you automatically create the note altered_by_user would be set to False. If a user changes the note, set altered_by_user to True.
>
> Not sure this would work. Note creation and eventually automatic deletion is all driven from model methods executed on saving.
>
> > > If unchanged, the old note is safe to delete because it is no longer relevant.
> > >
> > > I've googled around and there are lots of possible solutions but it seems the major problem might be that hashes are difficult to guarantee when the environment - such as the version of Python - changes.
> > >
> > > Also, I'm not convinced I have chosen the correct strategy.
> > >
> > > Hope I've explained the problem adequately.
Re: How to hash fields and detect changes in a record
On 10/06/2022 11:24 pm, Ryan Nowakowski wrote:
> On Fri, Jun 10, 2022 at 05:52:48PM +1000, Mike Dewhirst wrote:
> > The use case is auto-deletion of out-of-date records if they have not changed.
> >
> > That might sound weird but it is the solution I have come to for a particular problem. My software analyses chemical properties and writes note records containing advice, each with a FK to the chemical.
> >
> > When values change sufficiently on the chemical, the software would construct a set of mostly different note records. The problem is that note records still exist from the previous set of properties. These would definitely confuse the user and thereby invalidate the advice.
>
> You might consider versioning your chemical model objects. Then when values change sufficiently on the chemical model object, you can create a new version of the chemical object, leaving the old notes associated with the old version of the chemical object. In your web app, you could allow the users to browse old versions of the chemical including the notes which may have been altered.

That's not really appropriate. The user doesn't care about older versions beyond an annual summary of the calculated analysis. As volumes (manufactured and/or imported) change, the analysis and therefore current advice changes. There is no need to keep track of out-of-date advice notes.

What really matters is that *when* things change the advice needs to change and the old advice needs to be deleted.

The only reason I need to avoid deleting old notes is if the user has edited the advice itself - in any of the individual notes. Probably it would be OK to delete an edited note because it is old advice BUT I feel it would be wrong for software to make that decision. As I said, I'm happy to document why.

Just thinking about that, I could maybe adjust the note.title to append something like "Out of date" if I detect it has been edited.

> > The workaround is for the user to delete all notes *prior* to re-saving and auto-generating a new correct set of notes. There is a proviso that you wouldn't want to delete notes altered by users. I would document that so users understand why the software skipped deleting those notes.
> >
> > I think the solution might be to hash note.title and note.note into a new field note.hash on being auto-created. On subsequent saves, compare the latest hash with note.hash to decide whether to delete auto-inserted notes prior to generating the next set. Those subsequent saves could be months or years later.
>
> Hashing is useful if you want to check that something has been unexpectedly changed. I assume the note can only be changed through your web app so you know when a user is changing a note.

These are automatically generated notes which taken together constitute advice on how to deal with the analysis. Users can edit them. For example, someone might record some action taken regarding the advice. I don't want to delete that. If nothing has been edited, it is safe to delete.

So how do I know it is the same as when originally generated - and safe to delete - except by storing a hash of the interesting fields.

And if that is the best approach, what sort of hashing will survive Python upgrades etc?

> Since you're expecting users to change some of the notes and you know when they do, hashing might be overkill. Instead, add a boolean `altered_by_user` field to the note model. Initially when you automatically create the note altered_by_user would be set to False. If a user changes the note, set altered_by_user to True.

Not sure this would work. Note creation and eventually automatic deletion is all driven from model methods executed on saving.

> > If unchanged, the old note is safe to delete because it is no longer relevant.
> >
> > I've googled around and there are lots of possible solutions but it seems the major problem might be that hashes are difficult to guarantee when the environment - such as the version of Python - changes.
> >
> > Also, I'm not convinced I have chosen the correct strategy.
> >
> > Hope I've explained the problem adequately.

To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/65f8ee1b-043f-d6f6-2b63-5ce564ad888f%40dewhirst.com.au.
Re: How to hash fields and detect changes in a record
On Fri, Jun 10, 2022 at 05:52:48PM +1000, Mike Dewhirst wrote:
> The use case is auto-deletion of out-of-date records if they have not
> changed.
>
> That might sound weird but it is the solution I have come to for a
> particular problem. My software analyses chemical properties and writes note
> records containing advice, each with a FK to the chemical.
>
> When values change sufficiently on the chemical, the software would
> construct a set of mostly different note records. The problem is that note
> records still exist from the previous set of properties. These would
> definitely confuse the user and thereby invalidate the advice.

You might consider versioning your chemical model objects. Then when values change sufficiently on the chemical model object, you can create a new version of the chemical object, leaving the old notes associated with the old version of the chemical object. In your web app, you could allow the users to browse old versions of the chemical including the notes which may have been altered.

> The workaround is for the user to delete all notes *prior* to re-saving and
> auto-generating a new correct set of notes. There is a proviso that you
> wouldn't want to delete notes altered by users. I would document that so
> users understand why the software skipped deleting those notes.
>
> I think the solution might be to hash note.title and note.note into a new
> field note.hash on being auto-created. On subsequent saves, compare the
> latest hash with note.hash to decide whether to delete auto-inserted notes
> prior to generating the next set. Those subsequent saves could be months or
> years later.

Hashing is useful if you want to check that something has been unexpectedly changed. I assume the note can only be changed through your web app so you know when a user is changing a note. Since you're expecting users to change some of the notes and you know when they do, hashing might be overkill. Instead, add a boolean `altered_by_user` field to the note model. Initially when you automatically create the note altered_by_user would be set to False. If a user changes the note, set altered_by_user to True.

> If unchanged, the old note is safe to delete because it is no longer
> relevant.
>
> I've googled around and there are lots of possible solutions but it seems
> the major problem might be that hashes are difficult to guarantee when the
> environment - such as the version of Python - changes.
>
> Also, I'm not convinced I have chosen the correct strategy.
>
> Hope I've explained the problem adequately.

To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/20220610132452.GA18658%40fattuba.com.
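Ryan's `altered_by_user` flag reduces the cleanup to a filter. The plain-Python sketch below models this with a list of dataclass instances. The `chemical_id` field is an assumption standing in for the note's FK to the chemical. The sketch also limits deletion to one chemical. Ryan's one-line queryset elsewhere in the thread, `Note.objects.filter(altered_by_user=False).delete()`, would otherwise remove unaltered notes for every chemical. A Django version would therefore presumably also filter on the foreign key, e.g. `Note.objects.filter(chemical=chemical, altered_by_user=False).delete()`, where the `chemical` field name is assumed.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Note:
    chemical_id: int               # hypothetical FK to the chemical
    title: str
    altered_by_user: bool = False  # auto-generated notes start unaltered


def purge_unaltered(notes: list[Note], chemical_id: int) -> list[Note]:
    """Drop this chemical's unaltered notes; keep everything else."""
    return [
        n for n in notes
        if n.chemical_id != chemical_id or n.altered_by_user
    ]


notes = [
    Note(1, "Advice A"),
    Note(1, "Advice B", altered_by_user=True),  # a user edited this one
    Note(2, "Advice C"),                        # belongs to another chemical
]
remaining = purge_unaltered(notes, chemical_id=1)
print([n.title for n in remaining])  # ['Advice B', 'Advice C']
```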
How to hash fields and detect changes in a record
The use case is auto-deletion of out-of-date records if they have not changed.

That might sound weird but it is the solution I have come to for a particular problem. My software analyses chemical properties and writes note records containing advice, each with a FK to the chemical.

When values change sufficiently on the chemical, the software would construct a set of mostly different note records. The problem is that note records still exist from the previous set of properties. These would definitely confuse the user and thereby invalidate the advice.

The workaround is for the user to delete all notes *prior* to re-saving and auto-generating a new correct set of notes. There is a proviso that you wouldn't want to delete notes altered by users. I would document that so users understand why the software skipped deleting those notes.

I think the solution might be to hash note.title and note.note into a new field note.hash on being auto-created. On subsequent saves, compare the latest hash with note.hash to decide whether to delete auto-inserted notes prior to generating the next set. Those subsequent saves could be months or years later.

If unchanged, the old note is safe to delete because it is no longer relevant.

I've googled around and there are lots of possible solutions but it seems the major problem might be that hashes are difficult to guarantee when the environment - such as the version of Python - changes.

Also, I'm not convinced I have chosen the correct strategy.

Hope I've explained the problem adequately.

Thoughts appreciated

Cheers

Mike

To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/ce245947-54ac-150c-b295-0d489d20db40%40dewhirst.com.au.
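Mike's original proposal is to hash `note.title` and `note.note` into `note.hash` when the note is auto-created and compare later. This works with a standard digest as long as the two fields are combined unambiguously. Plain concatenation would make the title/body pair ("ab", "c") indistinguishable from ("a", "bc"). The sketch below serializes the fields with `json.dumps` before hashing; that choice is an assumption, not something proposed in the thread. `fields_digest` and `naive_digest` are illustrative names.

```python
import hashlib
import json


def naive_digest(*fields: str) -> str:
    """Concatenate, then hash: the boundary between fields is lost."""
    return hashlib.sha256("".join(fields).encode("utf-8")).hexdigest()


def fields_digest(*fields: str) -> str:
    """Hash a JSON list of the fields, so each field's boundary is kept."""
    payload = json.dumps(list(fields), ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


print(naive_digest("ab", "c") == naive_digest("a", "bc"))    # True
print(fields_digest("ab", "c") == fields_digest("a", "bc"))  # False

# At creation: store the digest (e.g. in note.hash).
stored = fields_digest("Title", "Generated advice")
# Later: recompute from the current field values and compare.
print(fields_digest("Title", "Generated advice") == stored)          # True
print(fields_digest("Title", "Generated advice, edited") == stored)  # False
```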