Re: [Tutor] Which is better in principle: to store (in file) calculated data or to re-calculate it upon restarting program?

2019-07-31 Thread Alan Gauld via Tutor
On 31/07/2019 03:02, boB Stepp wrote:

> preceding scores plus the current one.  If the data in the file
> somehow got mangled, it would be an extraordinary coincidence for
> every row to yield a correct total score if that total score was
> recalculated from the corrupted data.

True, but the likelihood of that happening is vanishingly small.
What is much more likely is that a couple of bits in the
entire file will be wrong. So a 5 becomes a 7, for example.
Remember that the data in the file is character based
(assuming it's a text file), not numerical. The conversion
to numbers happens when you read it. The conversion is more
likely to detect corrupted data than any calculations you perform.
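
As a rough illustration of the point about conversion, here is a minimal
Python sketch. The helper names parse_score and flip_bit are made up for
this example. It flips a single bit in the character "5": one flip yields
another digit and slips through unnoticed, the other yields a letter that
int() rejects.

```python
def parse_score(field):
    """Return the field as an int, or None if it is not a valid integer."""
    try:
        return int(field)
    except ValueError:
        return None


def flip_bit(char, bit):
    """Return char with one bit of its character code inverted."""
    return chr(ord(char) ^ (1 << bit))


silent = flip_bit("5", 1)  # '5' (0x35) becomes '7' (0x37): still a digit
noisy = flip_bit("5", 6)   # '5' (0x35) becomes 'u' (0x75): not a digit

print(parse_score(silent))  # parses without complaint, corruption unnoticed
print(parse_score(noisy))   # None: the conversion caught the damage
```

So the conversion catches only the corruption that produces something
which no longer looks like a number.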

> But the underlying question that I am trying to answer is how
> likely/unlikely is it for a file to get corrupted nowadays?  

It is still quite likely. Not as much as it was 40 years ago,
but still very much a possibility. Especially if the data
is stored/accessed over a network link. It is still very
much a real issue for anyone dealing with critical data.

> worthwhile verifying the integrity of every file in a program, or, at
> least, every data file accessed by a program every program run?  Which
> leads to your point...

Anything critical should go in a database. That will be much
less likely to get corrupted, since most RDBMSs include
data cleansing and verification as part of their function.
Also, for working with large volumes of data (where corruption
risk rises just because of the volume), a database is a more
effective way of storing data anyway.
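
A small sketch of what database-side verification can look like, using
Python's built-in sqlite3 module. SQLite is only one possible choice, and
the table and column names are assumptions made for this example. A CHECK
constraint refuses a non-integer score, and PRAGMA integrity_check asks
SQLite to examine its own storage.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE hands ("
    " hand_number INTEGER PRIMARY KEY,"
    " played_at TEXT NOT NULL,"
    " score INTEGER NOT NULL CHECK (typeof(score) = 'integer'))"
)
conn.execute(
    "INSERT INTO hands (played_at, score) VALUES (?, ?)",
    ("2019-07-31 10:15", -5),
)

# A mangled score such as "1u" cannot be stored as an integer,
# so the CHECK constraint refuses the row.
try:
    conn.execute(
        "INSERT INTO hands (played_at, score) VALUES (?, ?)",
        ("2019-07-31 10:40", "1u"),
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM hands").fetchone()[0]
integrity = conn.execute("PRAGMA integrity_check").fetchone()[0]
print(rejected, row_count, integrity)
```

Other database systems offer comparable constraints and consistency
checks, but the details differ from product to product.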

>> Checking data integrity is what checksums are for.
> 
> When should this be done in  normal programming practice?

Any time you have a critical piece of data in a text file.
If it is important to know that the data has changed (for
any reason, not just data corruption) then use a checksum.
Certainly if it's publicly available or you plan on shipping
it over a network a checksum is a good idea.
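
For a text file, a checksum can be sketched with the standard hashlib
module. The sample records and the choice of SHA-256 are assumptions for
illustration, and any strong hash would serve. The idea is to save the
digest alongside the data and compare it again on the next read.

```python
import hashlib


def checksum(data):
    """Return the SHA-256 hex digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical file contents: hand number, date, time, score.
original = b"1,2019-07-31,10:15,-5\n2,2019-07-31,10:40,12\n"
stored_digest = checksum(original)  # saved when the file was written

# Simulate one score changing from -5 to -7 after the digest was saved.
corrupted = original.replace(b"-5", b"-7")

print(checksum(original) == stored_digest)   # unchanged data matches
print(checksum(corrupted) == stored_digest)  # altered data does not match
```

For a real file, read it in binary mode (open(path, "rb")) so that the
digest covers the exact bytes on disk.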

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos


___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Which is better in principle: to store (in file) calculated data or to re-calculate it upon restarting program?

2019-07-31 Thread Chris Roy-Smith

On 31/7/19 2:21 am, boB Stepp wrote:

> I have been using various iterations of a solitaire scorekeeper
> program to explore different programming thoughts.  In my latest
> musings I am wondering about -- in general -- whether it is best to
> store calculated data values in a file and reload these values, or
> whether to recalculate such data upon each new run of a program.  In
> terms of my solitaire scorekeeper program is it better to store "Hand
> Number, Date, Time, Score, Total Score" or instead, "Hand Number,
> Date, Time, Score"?  Of course I don't really need to store hand
> number since it is easily determined by its row/record number in its
> csv file.
>
> In this trivial example I cannot imagine there is any realistic
> difference between the two approaches, but I am trying to generalize
> my thoughts for potentially much more expensive calculations, very
> large data sets, and what is the likelihood of storage errors
> occurring in files.  Any thoughts on this?
>
> TIA!

From a scientific viewpoint, you want to keep the raw data, so you can 
perform other calculations that you may not have thought of yet. But 
that's not got much to do with programming ;)
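
If only the raw "Date, Time, Score" rows are stored, the hand number and
running total can be rebuilt on each run. A minimal sketch, assuming a
three-column csv layout and made-up sample rows:

```python
import csv
import io
from itertools import accumulate

# Stand-in for the csv file on disk; the rows are invented sample data.
raw = io.StringIO(
    "2019-07-30,20:05,-5\n"
    "2019-07-30,20:31,12\n"
    "2019-07-31,19:48,3\n"
)

rows = list(csv.reader(raw))
scores = [int(row[2]) for row in rows]  # third column holds the score
totals = list(accumulate(scores))       # running total after each hand

# The hand number is just the row position, counted from 1.
for hand_number, (row, total) in enumerate(zip(rows, totals), start=1):
    print(hand_number, *row, total)

# Other figures can be derived later from the same raw data.
best_hand = max(scores)
```

For a handful of rows the recalculation costs next to nothing. Whether
that still holds for expensive calculations is something to measure
rather than assume.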
