Re: [Tutor] Which is better in principle: to store (in file) calculated data or to re-calculate it upon restarting program?
On 31/07/2019 03:02, boB Stepp wrote:

> preceding scores plus the current one. If the data in the file
> somehow got mangled, it would be an extraordinary coincidence for
> every row to yield a correct total score if that total score was
> recalculated from the corrupted data.

True, but the likelihood of that happening is vanishingly small.
What is much more likely is that a couple of bits in the entire file
will be wrong. So a 5 becomes a 7, for example.

Remember that the data in the file is character based (assuming
it's a text file), not numerical. The conversion to numbers happens
when you read it. That conversion is more likely to detect corrupted
data than any calculations you perform.

> But the underlying question that I am trying to answer is how
> likely/unlikely is it for a file to get corrupted nowadays?

It is still quite likely. Not as much as it was 40 years ago, but
still very much a possibility, especially if the data is
stored/accessed over a network link. It is still a real issue for
anyone dealing with critical data.

> worthwhile verifying the integrity of every file in a program, or, at
> least, every data file accessed by a program every program run? Which
> leads to your point...

Anything critical should go in a database. That will be much less
likely to get corrupted, since most RDBMSs include data cleansing
and verification as part of their function. Also, for working with
large volumes of data (where corruption risk rises just because of
the volumes) a database is a more effective way of storing data
anyway.

>> Checking data integrity is what checksums are for.
>
> When should this be done in normal programming practice?

Any time you have a critical piece of data in a text file. If it is
important to know that the data has changed (for any reason, not just
data corruption) then use a checksum. Certainly if it's publicly
available, or you plan on shipping it over a network, a checksum is a
good idea.
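As a rough illustration of the checksum idea, here is a minimal sketch using the standard library's hashlib. The function names and the ".sha256" sidecar file are assumptions made up for this example, not an established convention; storing the digest in a database or a manifest file would work just as well.

```python
import hashlib
from pathlib import Path


def file_sha256(path):
    """Return the SHA-256 hex digest of the file's raw bytes."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large data files need not fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksum(path):
    """Save the digest beside the data file (hypothetical '.sha256' sidecar)."""
    Path(str(path) + ".sha256").write_text(file_sha256(path))


def verify_checksum(path):
    """Return True if the data file still matches its saved digest."""
    stored = Path(str(path) + ".sha256").read_text().strip()
    return stored == file_sha256(path)
```

The program would call write_checksum() after every save and verify_checksum() before every load. Note that a digest kept next to the file catches accidental changes only; someone altering the data deliberately could simply rewrite the digest too.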
--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos

___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] Which is better in principle: to store (in file) calculated data or to re-calculate it upon restarting program?
On 31/7/19 2:21 am, boB Stepp wrote:

> I have been using various iterations of a solitaire scorekeeper
> program to explore different programming thoughts. In my latest
> musings I am wondering about -- in general -- whether it is best to
> store calculated data values in a file and reload these values, or
> whether to recalculate such data upon each new run of a program. In
> terms of my solitaire scorekeeper program is it better to store "Hand
> Number, Date, Time, Score, Total Score" or instead, "Hand Number,
> Date, Time, Score"? Of course I don't really need to store hand
> number since it is easily determined by its row/record number in its
> csv file.
>
> In this trivial example I cannot imagine there is any realistic
> difference between the two approaches, but I am trying to generalize
> my thoughts for potentially much more expensive calculations, very
> large data sets, and what is the likelihood of storage errors
> occurring in files. Any thoughts on this?
>
> TIA!

From a scientific viewpoint, you want to keep the raw data, so you can
perform other calculations that you may not have thought of yet.

But that's not got much to do with programming ;)
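For the scorekeeper case, keeping only the raw data and recomputing the totals might look something like the sketch below. The three-column "date,time,score" layout, the sample rows and the function names are assumptions for illustration, not the actual program. The hand number is just the row position, and the running total is rebuilt on every load.

```python
import csv
import io
from itertools import accumulate


def load_scores(csv_text):
    """Parse rows of 'date,time,score' and return the scores as ints."""
    scores = []
    for row_number, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        try:
            # int() rejects a mangled field such as '1x2' or an empty string.
            scores.append(int(row[2]))
        except (ValueError, IndexError) as exc:
            raise ValueError(f"row {row_number} looks corrupted: {row!r}") from exc
    return scores


def running_totals(scores):
    """Recompute the total score after each hand from the raw scores."""
    return list(accumulate(scores))


# Made-up sample data: one hand per row, no stored totals.
sample = "2019-07-30,21:15,-5\n2019-07-30,21:40,12\n2019-07-31,09:05,3\n"
print(running_totals(load_scores(sample)))  # [-5, 7, 10]
```

For sums this cheap the recalculation costs next to nothing. Caching derived values only starts to pay off when the calculation is expensive, and even then the raw rows can stay as the source of truth.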