Re: How to hash fields and detect changes in a record

2022-06-15 Thread Ryan Nowakowski



On June 14, 2022 10:29:40 PM CDT, Mike Dewhirst  wrote:
>On 14/06/2022 11:20 pm, Ryan Nowakowski wrote:
>> 
>> Summing the ordinal of the characters won't catch transposition:
>> 
>> >>> chars = 'ab'
>> >>> sum([ord(c) for c in chars])
>> 195
>> >>> chars = 'ba'
>> >>> sum([ord(c) for c in chars])
>> 195
>> 
>> Better to use a real hash algorithm if you're trying to detect changes.  My 
>> note above about hashing not being required is because you don't need to 
>> detect changes because you explicitly already know when changes are being 
>> made.
>> 
>
>Thanks Ryan.
>
>It is all working now. I append " - No longer relevant" to the note title if 
>any change is detected. Otherwise the note gets deleted.
>

Good to hear! Seems like an interesting project.

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-users/935A6EB4-032D-4264-BD83-24508764B189%40fattuba.com.


Re: How to hash fields and detect changes in a record

2022-06-14 Thread Mike Dewhirst

On 14/06/2022 11:20 pm, Ryan Nowakowski wrote:


Summing the ordinal of the characters won't catch transposition:

>>> chars = 'ab'
>>> sum([ord(c) for c in chars])
195
>>> chars = 'ba'
>>> sum([ord(c) for c in chars])
195

Better to use a real hash algorithm if you're trying to detect 
changes.  My note above about hashing not being required is because 
you don't need to detect changes because you explicitly already know 
when changes are being made.




Thanks Ryan.

It is all working now. I append " - No longer relevant" to the note 
title if any change is detected. Otherwise the note gets deleted.
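A minimal sketch of the retire-or-delete step described above follows. It assumes the stored flag holds a SHA-256 fingerprint of the title and note, not the ordinal sum (which misses transpositions, as Ryan points out). `Note`, `flag`, `fingerprint` and `retire_old_notes` are hypothetical stand-ins, not Django or project code.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List

SUFFIX = " - No longer relevant"


def fingerprint(title: str, note: str) -> str:
    # SHA-256 over both fields, JSON-encoded so the field boundaries are kept.
    payload = json.dumps([title, note], ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass
class Note:
    # Hypothetical stand-in for an auto-generated note; `flag` holds the
    # fingerprint recorded when the note was created.
    title: str
    note: str
    flag: str = ""


def retire_old_notes(old_notes: List[Note]) -> List[Note]:
    """Return the old notes to keep; unedited ones are dropped (i.e. deleted)."""
    kept = []
    for n in old_notes:
        if fingerprint(n.title, n.note) != n.flag:
            # Edited since generation: keep it, but mark it out of date (once).
            if not n.title.endswith(SUFFIX):
                n.title += SUFFIX
            kept.append(n)
    return kept


a = Note("Storage", "Keep container closed.")
a.flag = fingerprint(a.title, a.note)
b = Note("Labelling", "Review labels.")
b.flag = fingerprint(b.title, b.note)
b.note += " Done."  # simulate a user edit
print([n.title for n in retire_old_notes([a, b])])
# ['Labelling - No longer relevant']
```

Because a retired note's title no longer matches its fingerprint, it stays "changed" on later runs. The `endswith` check keeps the suffix from being appended twice.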


Cheers

Mike




--
Signed email is an absolute defence against phishing. This email has
been signed with my private key. If you import my public key you can
automatically decrypt my signature and be sure it came from me. Just
ask and I'll send it to you. Your email software can handle signing.

--
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-users/75458bb4-bb4b-0e44-54c1-59768d34d403%40dewhirst.com.au.


OpenPGP_signature
Description: OpenPGP digital signature


Re: How to hash fields and detect changes in a record

2022-06-14 Thread Ryan Nowakowski

On 6/12/22 11:40 PM, Mike Dewhirst wrote:


 Original message 
From: Ryan Nowakowski 
Date: 13/6/22 07:09 (GMT+10:00)
To: django-users@googlegroups.com
Subject: Re: How to hash fields and detect changes in a record

On Sat, Jun 11, 2022 at 12:13:16AM +1000, Mike Dewhirst wrote:
> On 10/06/2022 11:24 pm, Ryan Nowakowski wrote:
> > On Fri, Jun 10, 2022 at 05:52:48PM +1000, Mike Dewhirst wrote:
> > > I think the solution might be to hash note.title and note.note into a new
> > > field note.hash on being auto-created. On subsequent saves, compare the
> > > latest hash with note.hash to decide whether to delete auto-inserted notes
> > > prior to generating the next set. Those subsequent saves could be months or
> > > years later.
> > Hashing is useful if you want to check that something has been
> > unexpectedly changed.  I assume the note can only be changed through
> > your web app so you know when a user is changing a note.
>
> These are automatically generated notes which taken together constitute
> advice on how to deal with the analysis. Users can edit them. For example,
> someone might record some action taken regarding the advice. I don't want to
> delete that. If nothing has been edited, it is safe to delete.
>
> So how do I know it is the same as when originally generated - and safe to
> delete - except by storing a hash of the interesting fields.

Because when the user edits a note, during the form.save() (assuming
you're using Django forms), you'll set `altered_by_user` to True.

Notes can also be altered in the Admin



You have a couple of choices then.  You could alter the note details 
view in the admin to set the altered_by_user field. Alternatively and 
more generically, you could check the pk field in your model save 
method.  If it is None, then you are creating a new note.  If the pk 
field is not None, then you are updating an existing note so you can set 
altered_by_user to True.
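The pk check described above can be sketched as follows. To keep it runnable without a Django project, this uses a hypothetical plain-Python class whose `pk` stays `None` until the first save, as an auto-incrementing Django primary key does. In a real model, the same `if self.pk is None` branch would sit in an overridden `save()` before calling `super().save(*args, **kwargs)`.

```python
from dataclasses import dataclass
from typing import ClassVar, Optional


@dataclass
class Note:
    # Hypothetical stand-in for the Django Note model. `pk` is None until
    # the first save, like an auto-incrementing primary key.
    title: str
    note: str
    pk: Optional[int] = None
    altered_by_user: bool = False

    # Class-level counter playing the part of the database's id sequence.
    _next_pk: ClassVar[int] = 1

    def save(self) -> None:
        if self.pk is None:
            # No primary key yet: this is the automatic creation, so the
            # flag keeps its default of False.
            self.pk = Note._next_pk
            Note._next_pk += 1
        else:
            # Primary key already set: this is an update to an existing
            # note, so record that it has been altered.
            self.altered_by_user = True


n = Note(title="Storage", note="Keep container closed.")
n.save()  # creation: altered_by_user stays False
n.note += " Checked."
n.save()  # update: altered_by_user becomes True
print(n.pk, n.altered_by_user)  # 1 True
```

One caveat: with this approach *every* later save of an existing note sets the flag, including saves made by the app's own code. Any automatic code path that re-saves old notes would need to avoid calling `save()` on them. Django's `QuerySet.update()` does not call `save()`, for example. If the primary key has a default (a UUID, say), `pk` is not `None` before the first save; Django's `self._state.adding` is the more reliable "is this new?" check in that case.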



> And if that is the best approach, what sort of hashing will survive Python
> upgrades etc?

Pick a hash algorithm[1] (ex: sha256).  The output will remain the same
even with Python upgrades.

So the mechanism doesn't need to be a hash - as you said. I now just 
sum ord(char) for the title and the note and keep that in a flag field.


Summing the ordinal of the characters won't catch transposition:


>>> chars = 'ab'
>>> sum([ord(c) for c in chars])
195
>>> chars = 'ba'
>>> sum([ord(c) for c in chars])
195

Better to use a real hash algorithm if you're trying to detect changes.  
My note above about hashing not being required is because you don't need 
to detect changes because you explicitly already know when changes are 
being made.
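To make the transposition point concrete, here is a small comparison of the ordinal-sum scheme and a real hash (SHA-256 from the standard `hashlib` module). The helper names are illustrative only.

```python
import hashlib


def ordinal_sum(text: str) -> int:
    # The "sum of ord(char)" scheme discussed in the thread.
    return sum(ord(c) for c in text)


def sha256_hex(text: str) -> str:
    # hashlib works on bytes, so encode the text explicitly first.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Transposed characters collide under the ordinal sum...
print(ordinal_sum("ab") == ordinal_sum("ba"))  # True
# ...but not under SHA-256.
print(sha256_hex("ab") == sha256_hex("ba"))  # False
```

The sum ignores character order entirely. Any rearrangement of the same characters, not just adjacent swaps, goes undetected.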


--
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-users/a0d61798-d885-dffd-bfbb-b23a63fbd820%40fattuba.com.


Re: How to hash fields and detect changes in a record

2022-06-12 Thread Mike Dewhirst
--
(Unsigned mail from my phone)

 Original message 
From: Ryan Nowakowski
Date: 13/6/22 07:09 (GMT+10:00)
To: django-users@googlegroups.com
Subject: Re: How to hash fields and detect changes in a record

On Sat, Jun 11, 2022 at 12:13:16AM +1000, Mike Dewhirst wrote:
> On 10/06/2022 11:24 pm, Ryan Nowakowski wrote:
> > On Fri, Jun 10, 2022 at 05:52:48PM +1000, Mike Dewhirst wrote:
> > > I think the solution might be to hash note.title and note.note into a new
> > > field note.hash on being auto-created. On subsequent saves, compare the
> > > latest hash with note.hash to decide whether to delete auto-inserted notes
> > > prior to generating the next set. Those subsequent saves could be months or
> > > years later.
> > Hashing is useful if you want to check that something has been
> > unexpectedly changed.  I assume the note can only be changed through
> > your web app so you know when a user is changing a note.
>
> These are automatically generated notes which taken together constitute
> advice on how to deal with the analysis. Users can edit them. For example,
> someone might record some action taken regarding the advice. I don't want to
> delete that. If nothing has been edited, it is safe to delete.
>
> So how do I know it is the same as when originally generated - and safe to
> delete - except by storing a hash of the interesting fields.

Because when the user edits a note, during the form.save() (assuming
you're using Django forms), you'll set `altered_by_user` to True.

Notes can also be altered in the Admin

> And if that is the best approach, what sort of hashing will survive Python
> upgrades etc?

Pick a hash algorithm[1] (ex: sha256).  The output will remain the same
even with Python upgrades.

So the mechanism doesn't need to be a hash - as you said. I now just sum
ord(char) for the title and the note and keep that in a flag field. Only the
auto-notes get a flag because they are the only ones I would consider deleting.

[1] https://docs.python.org/3/library/hashlib.html

> > Since you're
> > expecting users to change some of the notes and you know when they do,
> > hashing might be overkill.  Instead, add a boolean `altered_by_user`
> > field to the note model.  Initially when you automatically create the
> > note altered_by_user would be set to False.  If a user changes the note,
> > set altered_by_user to True.
>
> Not sure this would work. Note creation and eventually automatic deletion is
> all driven from model methods executed on saving.

Why wouldn't this work? During note creation, altered_by_user would be
set to False automatically because that's the default.  When
automatically deleting, do:

    Note.objects.filter(altered_by_user=False).delete()

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-users/62a6c0f0.1c69fb81.c1d5.c26fSMTPIN_ADDED_MISSING%40gmr-mx.google.com.


Re: How to hash fields and detect changes in a record

2022-06-12 Thread Ryan Nowakowski
On Sat, Jun 11, 2022 at 12:13:16AM +1000, Mike Dewhirst wrote:
> On 10/06/2022 11:24 pm, Ryan Nowakowski wrote:
> > On Fri, Jun 10, 2022 at 05:52:48PM +1000, Mike Dewhirst wrote:
> > > I think the solution might be to hash note.title and note.note into a new
> > > field note.hash on being auto-created. On subsequent saves, compare the
> > > latest hash with note.hash to decide whether to delete auto-inserted notes
> > > prior to generating the next set. Those subsequent saves could be months or
> > > years later.
> > Hashing is useful if you want to check that something has been
> > unexpectedly changed.  I assume the note can only be changed through
> > your web app so you know when a user is changing a note.
> 
> These are automatically generated notes which taken together constitute
> advice on how to deal with the analysis. Users can edit them. For example,
> someone might record some action taken regarding the advice. I don't want to
> delete that. If nothing has been edited, it is safe to delete.
> 
> So how do I know it is the same as when originally generated - and safe to
> delete - except by storing a hash of the interesting fields.

Because when the user edits a note, during the form.save() (assuming
you're using Django forms), you'll set `altered_by_user` to True.

> And if that is the best approach, what sort of hashing will survive Python
> upgrades etc?

Pick a hash algorithm[1] (ex: sha256).  The output will remain the same
even with Python upgrades.

[1] https://docs.python.org/3/library/hashlib.html
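The stability claim for sha256 can be checked against the published SHA-256 test vector for the input "abc". A hashlib digest depends only on the algorithm and the bytes fed to it, so this sketch pins both: SHA-256, and an explicit UTF-8 encoding of the text. The helper name `fingerprint` is illustrative.

```python
import hashlib


def fingerprint(text: str) -> str:
    # Pin the algorithm (SHA-256) and the text encoding (UTF-8); those two
    # choices fully determine the digest, whatever the Python version.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Published SHA-256 test vector for "abc": every conforming implementation,
# on any Python version or platform, must produce exactly this.
assert fingerprint("abc") == (
    "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
)

# Contrast: the built-in hash() of a str is salted per interpreter process
# (see PYTHONHASHSEED), so it is unsuitable for storing and comparing later.
print(hash("abc"))  # typically differs between runs
```

This is the distinction behind the worry about Python upgrades. The built-in `hash()` is not meant to be persisted, but hashlib digests are.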

> > Since you're
> > expecting users to change some of the notes and you know when they do,
> > hashing might be overkill.  Instead, add a boolean `altered_by_user`
> > field to the note model.  Initially when you automatically create the
> > note altered_by_user would be set to False.  If a user changes the note,
> > set altered_by_user to True.
>
> Not sure this would work. Note creation and eventually automatic deletion is
> all driven from model methods executed on saving.

Why wouldn't this work? During note creation, altered_by_user would be
set to False automatically because that's the default.  When
automatically deleting, do:

Note.objects.filter(altered_by_user=False).delete()
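As written, that query deletes every unaltered note in the table. The notes carry a FK to the chemical, so the delete should presumably be limited to the chemical being re-saved, e.g. `Note.objects.filter(chemical=chemical, altered_by_user=False).delete()`. Here `chemical` is a hypothetical name for the FK field. The sketch below shows the effect with the standard-library `sqlite3` module standing in for the Django ORM. The table and column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE note (
        id INTEGER PRIMARY KEY,
        chemical_id INTEGER NOT NULL,
        title TEXT NOT NULL,
        altered_by_user INTEGER NOT NULL DEFAULT 0
    )"""
)
# Auto-generated notes for two chemicals; the flag takes its default (0 = False).
conn.executemany(
    "INSERT INTO note (chemical_id, title) VALUES (?, ?)",
    [(1, "Advice A"), (1, "Advice B"), (2, "Advice C")],
)
# A user edits one note for chemical 1, which sets the flag.
conn.execute("UPDATE note SET altered_by_user = 1 WHERE title = 'Advice B'")


def delete_stale_notes(conn: sqlite3.Connection, chemical_id: int) -> None:
    # Roughly what filter(chemical=..., altered_by_user=False).delete() would do.
    conn.execute(
        "DELETE FROM note WHERE chemical_id = ? AND altered_by_user = 0",
        (chemical_id,),
    )


delete_stale_notes(conn, 1)
remaining = [row[0] for row in conn.execute("SELECT title FROM note ORDER BY id")]
print(remaining)  # ['Advice B', 'Advice C']
```

The edited note for chemical 1 survives. The note belonging to chemical 2 is untouched because the delete is scoped to one chemical.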

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-users/20220612210931.GA32625%40fattuba.com.


Re: How to hash fields and detect changes in a record

2022-06-10 Thread Mike Dewhirst

Ryan

Thanks very much - you triggered the necessary amount of thinking and I 
reckon you are correct - hashing is overkill.


I just need a self-referential value comparison where the value is 
independent of outside influences. That means I should not use a hash 
library from anywhere.


I'll just convert all the chars in all the fields I'm interested in into 
integers and sum them into my "hash" field.


Should be quick and easy!

Cheers

Mike

On 11/06/2022 12:13 am, Mike Dewhirst wrote:

On 10/06/2022 11:24 pm, Ryan Nowakowski wrote:

On Fri, Jun 10, 2022 at 05:52:48PM +1000, Mike Dewhirst wrote:

The use case is auto-deletion of out-of-date records if they have not
changed.

That might sound weird but it is the solution I have come to for a
particular problem. My software analyses chemical properties and writes note
records containing advice, each with a FK to the chemical.

When values change sufficiently on the chemical, the software would
construct a set of mostly different note records. The problem is that note
records still exist from the previous set of properties. These would
definitely confuse the user and thereby invalidate the advice.

You might consider versioning your chemical model objects.  Then when
values change sufficiently on the chemical model object, you can create
a new version of the chemical object, leaving the old notes associated
with the old version of the chemical object.  In your web app, you could
allow the users to browse old versions of the chemical including the
notes which may have been altered.


That's not really appropriate. The user doesn't care about older 
versions beyond an annual summary of the calculated analysis. As 
volumes (manufactured and/or imported) change, the analysis and 
therefore the current advice change. There is no need to keep track of 
out-of-date advice notes.


What really matters is that *when* things change the advice needs to 
change and the old advice needs to be deleted.


The only reason I need to avoid deleting old notes is if the user has 
edited the advice itself - in any of the individual notes. 
Probably it would be OK to delete an edited note because it is old 
advice BUT I feel it would be wrong for software to make that 
decision. As I said, I'm happy to document why.


Just thinking about that, I could maybe adjust the note.title to 
append something like "Out of date" if I detect it has been edited.




The workaround is for the user to delete all notes *prior* to re-saving and
auto-generating a new correct set of notes. There is a proviso that you
wouldn't want to delete notes altered by users. I would document that so
users understand why the software skipped deleting those notes.

I think the solution might be to hash note.title and note.note into a new
field note.hash on being auto-created. On subsequent saves, compare the
latest hash with note.hash to decide whether to delete auto-inserted notes
prior to generating the next set. Those subsequent saves could be months or
years later.

Hashing is useful if you want to check that something has been
unexpectedly changed.  I assume the note can only be changed through
your web app so you know when a user is changing a note.


These are automatically generated notes which taken together 
constitute advice on how to deal with the analysis. Users can edit 
them. For example, someone might record some action taken regarding 
the advice. I don't want to delete that. If nothing has been edited, 
it is safe to delete.


So how do I know it is the same as when originally generated - and 
safe to delete - except by storing a hash of the interesting fields.


And if that is the best approach, what sort of hashing will survive 
Python upgrades etc?



Since you're
expecting users to change some of the notes and you know when they do,
hashing might be overkill.  Instead, add a boolean `altered_by_user`
field to the note model.  Initially when you automatically create the
note altered_by_user would be set to False.  If a user changes the note,
set altered_by_user to True.


Not sure this would work. Note creation and eventually automatic 
deletion is all driven from model methods executed on saving.





If unchanged, the old note is safe to delete because it is no longer
relevant.

I've googled around and there are lots of possible solutions but it seems
the major problem might be that hashes are difficult to guarantee when the
environment - such as the version of Python - changes.

Also, I'm not convinced I have chosen the correct strategy.

Hope I've explained the problem adequately.









Re: How to hash fields and detect changes in a record

2022-06-10 Thread Mike Dewhirst

On 10/06/2022 11:24 pm, Ryan Nowakowski wrote:

On Fri, Jun 10, 2022 at 05:52:48PM +1000, Mike Dewhirst wrote:

The use case is auto-deletion of out-of-date records if they have not
changed.

That might sound weird but it is the solution I have come to for a
particular problem. My software analyses chemical properties and writes note
records containing advice, each with a FK to the chemical.

When values change sufficiently on the chemical, the software would
construct a set of mostly different note records. The problem is that note
records still exist from the previous set of properties. These would
definitely confuse the user and thereby invalidate the advice.

You might consider versioning your chemical model objects.  Then when
values change sufficiently on the chemical model object, you can create
a new version of the chemical object, leaving the old notes associated
with the old version of the chemical object.  In your web app, you could
allow the users to browse old versions of the chemical including the
notes which may have been altered.


That's not really appropriate. The user doesn't care about older 
versions beyond an annual summary of the calculated analysis. As volumes 
(manufactured and/or imported) change, the analysis and therefore the current 
advice change. There is no need to keep track of out-of-date advice notes.


What really matters is that *when* things change the advice needs to 
change and the old advice needs to be deleted.


The only reason I need to avoid deleting old notes is if the user has 
edited the advice itself - in any of the individual notes. Probably 
it would be OK to delete an edited note because it is old advice BUT I 
feel it would be wrong for software to make that decision. As I said, 
I'm happy to document why.


Just thinking about that, I could maybe adjust the note.title to append 
something like "Out of date" if I detect it has been edited.





The workaround is for the user to delete all notes *prior* to re-saving and
auto-generating a new correct set of notes. There is a proviso that you
wouldn't want to delete notes altered by users. I would document that so
users understand why the software skipped deleting those notes.

I think the solution might be to hash note.title and note.note into a new
field note.hash on being auto-created. On subsequent saves, compare the
latest hash with note.hash to decide whether to delete auto-inserted notes
prior to generating the next set. Those subsequent saves could be months or
years later.

Hashing is useful if you want to check that something has been
unexpectedly changed.  I assume the note can only be changed through
your web app so you know when a user is changing a note.


These are automatically generated notes which taken together constitute 
advice on how to deal with the analysis. Users can edit them. For 
example, someone might record some action taken regarding the advice. I 
don't want to delete that. If nothing has been edited, it is safe to delete.


So how do I know it is the same as when originally generated - and safe 
to delete - except by storing a hash of the interesting fields.


And if that is the best approach, what sort of hashing will survive 
Python upgrades etc?



Since you're
expecting users to change some of the notes and you know when they do,
hashing might be overkill.  Instead, add a boolean `altered_by_user`
field to the note model.  Initially when you automatically create the
note altered_by_user would be set to False.  If a user changes the note,
set altered_by_user to True.


Not sure this would work. Note creation and eventually automatic 
deletion is all driven from model methods executed on saving.





If unchanged, the old note is safe to delete because it is no longer
relevant.

I've googled around and there are lots of possible solutions but it seems
the major problem might be that hashes are difficult to guarantee when the
environment - such as the version of Python - changes.

Also, I'm not convinced I have chosen the correct strategy.

Hope I've explained the problem adequately.
  





--
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-users/65f8ee1b-043f-d6f6-2b63-5ce564ad888f%40dewhirst.com.au.




Re: How to hash fields and detect changes in a record

2022-06-10 Thread Ryan Nowakowski
On Fri, Jun 10, 2022 at 05:52:48PM +1000, Mike Dewhirst wrote:
> The use case is auto-deletion of out-of-date records if they have not
> changed.
> 
> That might sound weird but it is the solution I have come to for a
> particular problem. My software analyses chemical properties and writes note
> records containing advice, each with a FK to the chemical.
> 
> When values change sufficiently on the chemical, the software would
> construct a set of mostly different note records. The problem is that note
> records still exist from the previous set of properties. These would
> definitely confuse the user and thereby invalidate the advice.

You might consider versioning your chemical model objects.  Then when
values change sufficiently on the chemical model object, you can create
a new version of the chemical object, leaving the old notes associated
with the old version of the chemical object.  In your web app, you could
allow the users to browse old versions of the chemical including the
notes which may have been altered.

> The workaround is for the user to delete all notes *prior* to re-saving and
> auto-generating a new correct set of notes. There is a proviso that you
> wouldn't want to delete notes altered by users. I would document that so
> users understand why the software skipped deleting those notes.
> 
> I think the solution might be to hash note.title and note.note into a new
> field note.hash on being auto-created. On subsequent saves, compare the
> latest hash with note.hash to decide whether to delete auto-inserted notes
> prior to generating the next set. Those subsequent saves could be months or
> years later.

Hashing is useful if you want to check that something has been
unexpectedly changed.  I assume the note can only be changed through
your web app so you know when a user is changing a note.  Since you're
expecting users to change some of the notes and you know when they do,
hashing might be overkill.  Instead, add a boolean `altered_by_user`
field to the note model.  Initially when you automatically create the
note altered_by_user would be set to False.  If a user changes the note,
set altered_by_user to True.
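Below is a minimal sketch of the flag's life cycle as proposed above. It assumes the flag is set only in the code path a person uses to edit a note (a form or the admin), never in programmatic saves. Under that assumption, auto-generation driven from model methods cannot flip it by accident. The classes and functions are hypothetical plain-Python stand-ins, not Django code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Note:
    # Hypothetical stand-in for the Note model; only the fields discussed are shown.
    title: str
    note: str
    altered_by_user: bool = False  # auto-generated notes start out unaltered


def auto_generate(titles: List[str]) -> List[Note]:
    # Programmatic creation never touches the flag, so it keeps its default.
    return [Note(title=t, note=f"Generated advice: {t}") for t in titles]


def user_edit(n: Note, new_text: str) -> None:
    # The one path a person uses to change a note (in Django, e.g. a form's
    # save() or the admin); it is the only place the flag gets set.
    n.note = new_text
    n.altered_by_user = True


def regenerate(notes: List[Note], titles: List[str]) -> List[Note]:
    # Keep notes a user has touched and replace all untouched ones.
    kept = [n for n in notes if n.altered_by_user]
    return kept + auto_generate(titles)


notes = auto_generate(["Storage", "Labelling"])
user_edit(notes[1], "Action taken: labels reprinted.")
notes = regenerate(notes, ["Exposure"])
print([(n.title, n.altered_by_user) for n in notes])
# [('Labelling', True), ('Exposure', False)]
```

No hashing is needed here because the edit path itself records the fact of the edit. The trade-off is that every place users can change a note must go through that path.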

> If unchanged, the old note is safe to delete because it is no longer
> relevant.
> 
> I've googled around and there are lots of possible solutions but it seems
> the major problem might be that hashes are difficult to guarantee when the
> environment - such as the version of Python - changes.
> 
> Also, I'm not convinced I have chosen the correct strategy.
> 
> Hope I've explained the problem adequately.
 

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-users/20220610132452.GA18658%40fattuba.com.


How to hash fields and detect changes in a record

2022-06-10 Thread Mike Dewhirst
The use case is auto-deletion of out-of-date records if they have not 
changed.


That might sound weird but it is the solution I have come to for a 
particular problem. My software analyses chemical properties and writes 
note records containing advice, each with a FK to the chemical.


When values change sufficiently on the chemical, the software would 
construct a set of mostly different note records. The problem is that 
note records still exist from the previous set of properties. These 
would definitely confuse the user and thereby invalidate the advice.


The workaround is for the user to delete all notes *prior* to re-saving 
and auto-generating a new correct set of notes. There is a proviso that 
you wouldn't want to delete notes altered by users. I would document 
that so users understand why the software skipped deleting those notes.


I think the solution might be to hash note.title and note.note into a 
new field note.hash on being auto-created. On subsequent saves, compare 
the latest hash with note.hash to decide whether to delete auto-inserted 
notes prior to generating the next set. Those subsequent saves could be 
months or years later.


If unchanged, the old note is safe to delete because it is no longer 
relevant.
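The idea above (store a hash of note.title and note.note at creation, then recompute and compare later) might look like the sketch below. It assumes SHA-256 from `hashlib`, with both fields serialised as a JSON list before hashing. Plain concatenation would let ("ab", "c") and ("a", "bc") produce the same hash. The function names are hypothetical.

```python
import hashlib
import json


def note_fingerprint(title: str, note: str) -> str:
    # Serialise the fields as a JSON list so their boundaries survive:
    # plain concatenation would make ("ab", "c") and ("a", "bc") identical.
    payload = json.dumps([title, note], ensure_ascii=False)
    # Fixed algorithm (SHA-256) and fixed encoding (UTF-8) keep the result
    # reproducible across Python versions and machines.
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def is_unchanged(title: str, note: str, stored_hash: str) -> bool:
    # Recompute from the current values and compare with the hash stored
    # when the note was auto-created.
    return note_fingerprint(title, note) == stored_hash


stored = note_fingerprint("Storage", "Keep container closed.")
print(is_unchanged("Storage", "Keep container closed.", stored))        # True
print(is_unchanged("Storage", "Keep container closed. Done.", stored))  # False
print(note_fingerprint("ab", "c") == note_fingerprint("a", "bc"))       # False
```

The hex digest is always 64 characters, so a `models.CharField(max_length=64)` would be enough to hold it in the hypothetical note.hash field.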


I've googled around and there are lots of possible solutions but it 
seems the major problem might be that hashes are difficult to guarantee 
when the environment - such as the version of Python - changes.


Also, I'm not convinced I have chosen the correct strategy.

Hope I've explained the problem adequately.

Thoughts appreciated

Cheers

Mike


--
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-users/ce245947-54ac-150c-b295-0d489d20db40%40dewhirst.com.au.

