Re: how to know if folder contents have changed
I think that without further information from the OP about the requirements, all we can do is guess. So both of our solutions are just theory after all (just my personal opinion).

2007/11/14, [EMAIL PROTECTED] [EMAIL PROTECTED]:
> On Nov 12, 11:27 am, Martin Marcher [EMAIL PROTECTED] wrote:
> > 2007/11/12, [EMAIL PROTECTED] [EMAIL PROTECTED]:
> > a) create a thread that polls all the time for changes or
>
> Given that it would only involve a check of one timestamp (the directory the files are located in), I don't think polling from time to time would be unreasonable. The modification timestamp of the directory should be sufficient given the use case. Even if it's not, tracking modification times for the files in the directory would not be unreasonable.

Not for the 400 files, but the OP asks about more files too. How about 40,000 files or 400,000 files? That could be a problem...

> > b) test every time for changes
>
> Checking a timestamp should be a very quick operation. Unless "every time" occurs *very* frequently, it's certainly not unreasonable.

See above. I think it also depends on the number of files.

> > fam informs in a notification like way.
>
> FAM would work too. However,
>
> 1) According to http://oss.sgi.com/projects/fam/faq.html#what_os_fam, FAM should be fairly easy to port to ... Unix-like operating systems. If the original poster is a user of a Unix-like operating system he/she may actually be able to use it. Regardless, it seems to me that you would lose a great deal of portability (i.e., is there a Windows port?), which may or may not be important to the poster.

I don't use Windows, so speaking about portability you are right. It may be a personal thing, but I stopped providing solutions (or trying to think about them) for Windows (another discussion, probably best placed in a forum about social interests or something).

> 2) FAM undoubtedly uses some system resources. Probably very little, but it's still an overhead that must be taken into account.

Both are true, but most Linux distributions do use FAM at some point anyway, so the overhead is actually very little. Also, I think that on most OSs there is something similar to FAM that could be used...

> 3) You still need to use another method for maintaining state across program invocations, do you not?

You need some method no matter whether your program is a long-running process or just invoked at irregular intervals.

After all, I'm pretty sure that there is something FAM-like available on most OSs. FAM probably isn't available on OS X either, but I guess they provide some mechanism.

If you want it really portable, I'd use an abstraction layer that tries to communicate with some notification daemon which is probably available on the host OS and, if all that fails, provides a fallback implementation that does naive tests. All accessible through the same abstraction interface.

> Timestamps are:
>
> 1) Portable. Can you name one OS that does not provide timestamps? Last I checked, even Windows does :-)
>
> 2) Storage efficient. I don't have to actually *store* the timestamps. I can just check to see if a file/directory was modified after the last time I checked.

Read below: a changed timestamp isn't necessarily a sign that a file has indeed changed (backups, for example).

> 3) Easy to maintain persistent state -- just store the timestamp!

Well, "I don't have to actually *store* the timestamps" and "just store the timestamp!" are a bit confusing together. I think you absolutely need to store the timestamp, since between runs you won't know what to check for anyway (new files, deleted files, changed files, if these cases are important to you).

> > Personally I'd create a hidden cache file parsable by configparser and have filename = $favorite_checksum_algo - key value pairs in it if it's not a long running process.
>
> What is your reasoning for this?

Because all I need to do to check for changes is getCache(configFile) and compare the results to getActual(os.listdir), and those two methods would give me the needed info (of course I'm just blindly guessing, as I don't know anything about the further requirements).

Of course, with a lot of files this could be a problem. I wouldn't want a configparser object with 40,000 (or even just a few thousand) entries to be alive all the time. You'd probably have to create some iterator for the file so that you can check through the entries in a memory-efficient way...

> It seems to me that it is inefficient and unreliable. First of all you have to compute the checksum (which undoubtedly would involve reading every byte of the file) -- not just once, but every time (or however often you perform the check). Secondly, it is possible for the checksum to be the same even if the file has changed. Unlikely? Perhaps (depends on checksum algorithm used). Impossible? No. So, in effect, you are using a slow algorithm that is known to give incorrect results in certain cases -- all to replace something as basic as
Re: how to know if folder contents have changed
I just found this for win32, which seems to provide the same thing as FAM:

http://tgolden.sc.sabren.com/python/win32_how_do_i/watch_directory_for_changes.html

So it's not about FAM as a definitive product to be used, but more about something nearer to the OS that is there anyway and will tell you about changes...

--
http://noneisyours.marcher.name
http://feeds.feedburner.com/NoneIsYours
--
http://mail.python.org/mailman/listinfo/python-list
Re: how to know if folder contents have changed
On Nov 12, 11:27 am, Martin Marcher [EMAIL PROTECTED] wrote:
> 2007/11/12, [EMAIL PROTECTED] [EMAIL PROTECTED]:
> > Why not use the file creation/modification timestamps?
>
> because you'd have to
>
> a) create a thread that polls all the time for changes or

Given that it would only involve a check of one timestamp (the directory the files are located in), I don't think polling from time to time would be unreasonable. The modification timestamp of the directory should be sufficient given the use case. Even if it's not, tracking modification times for the files in the directory would not be unreasonable.

> b) test every time for changes

Checking a timestamp should be a very quick operation. Unless "every time" occurs *very* frequently, it's certainly not unreasonable.

> fam informs in a notification like way.

FAM would work too. However,

1) According to http://oss.sgi.com/projects/fam/faq.html#what_os_fam, FAM should be fairly easy to port to ... Unix-like operating systems. If the original poster is a user of a Unix-like operating system he/she may actually be able to use it. Regardless, it seems to me that you would lose a great deal of portability (i.e., is there a Windows port?), which may or may not be important to the poster.

2) FAM undoubtedly uses some system resources. Probably very little, but it's still an overhead that must be taken into account.

3) You still need to use another method for maintaining state across program invocations, do you not?

Timestamps are:

1) Portable. Can you name one OS that does not provide timestamps? Last I checked, even Windows does :-)

2) Storage efficient. I don't have to actually *store* the timestamps. I can just check to see if a file/directory was modified after the last time I checked.

3) Easy to maintain persistent state -- just store the timestamp!

> Personally I'd create a hidden cache file parsable by configparser and have filename = $favorite_checksum_algo - key value pairs in it if it's not a long running process.

What is your reasoning for this? It seems to me that it is inefficient and unreliable. First of all, you have to compute the checksum (which undoubtedly would involve reading every byte of the file) -- not just once, but every time (or however often you perform the check). Secondly, it is possible for the checksum to be the same even if the file has changed. Unlikely? Perhaps (depends on the checksum algorithm used). Impossible? No. So, in effect, you are using a slow algorithm that is known to give incorrect results in certain cases -- all to replace something as basic as timestamps?

> Otherwise I'd probably go with fam (or hal i think that's the other thing that does that)
>
> hth
> martin
>
> --
> http://noneisyours.marcher.name
> http://feeds.feedburner.com/NoneIsYours

Thanks for the critique -- feel free to punch holes.

--Nathan Davis
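The directory-timestamp check argued for above could be sketched roughly as follows. This is only an illustration under assumptions, not code from the thread: the function name `directory_changed` and the JSON state file are made up for the example, and it relies on the directory's modification time being updated when entries are added, removed, or renamed (it generally does not change when an existing file is merely rewritten in place). The state file should live outside the watched folder so that creating it does not itself touch the folder's timestamp.

```python
import json
import os


def directory_changed(folder, state_file):
    """Return True if the folder's mtime differs from the one saved last time.

    Hypothetical helper: the saved value lives in a small JSON file so the
    check also works across separate program invocations.
    """
    current = os.stat(folder).st_mtime_ns
    try:
        with open(state_file) as fh:
            previous = json.load(fh)["mtime_ns"]
    except (OSError, ValueError, KeyError):
        previous = None  # no usable state yet: report a change
    # Remember the timestamp we just saw for the next check.
    with open(state_file, "w") as fh:
        json.dump({"mtime_ns": current}, fh)
    return previous != current
```

A caller would run this whenever it is about to use the cache and rebuild the cache only when it returns True. Timestamp resolution varies between filesystems, so two changes in very quick succession may not be distinguishable.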
Re: how to know if folder contents have changed
[EMAIL PROTECTED] wrote:
> can someone suggest a better way? i know it is a general programming problem..but i wish to know if a python solution exists

Use pyfam. I believe all the docs are in FAM itself, but pyfam integrates with it.
Re: how to know if folder contents have changed
On Nov 11, 11:03 pm, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
> hi i am trying to create a cache of digitized values of around 100 image files in a folder..In my program i would like to know from time to time if a new image has been added or removed from the folder..

Why not use the file creation/modification timestamps?
Re: how to know if folder contents have changed
2007/11/12, [EMAIL PROTECTED] [EMAIL PROTECTED]:
> Why not use the file creation/modification timestamps?

because you'd have to

a) create a thread that polls all the time for changes or
b) test every time for changes

fam informs in a notification-like way.

Personally, I'd create a hidden cache file parsable by configparser and have filename = $favorite_checksum_algo key-value pairs in it if it's not a long-running process. Otherwise I'd probably go with fam (or hal, I think that's the other thing that does that).

hth
martin

--
http://noneisyours.marcher.name
http://feeds.feedburner.com/NoneIsYours
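One way to read the "hidden cache file parsable by configparser" idea is sketched below, for illustration only. The section name `files`, the helper names, and the choice of SHA-256 are assumptions, not something the post specifies, and the code targets Python 3's `configparser` rather than the Python 2 module of the time. File names containing `=`, or starting with `#` or `;`, would not survive this file format, so treat it as a sketch rather than a robust implementation.

```python
import configparser
import hashlib
import os


def checksum(path, algo="sha256"):
    """Hash a file in chunks so large images are not read into memory at once."""
    digest = hashlib.new(algo)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan(folder):
    """Map each visible regular file in the folder to its checksum."""
    return {
        entry.name: checksum(entry.path)
        for entry in os.scandir(folder)
        if entry.is_file() and not entry.name.startswith(".")  # skip hidden cache
    }


def _parser():
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep file names case-sensitive
    return parser


def load_cache(cache_file):
    """Read 'filename = checksum' pairs; a missing file gives an empty dict."""
    parser = _parser()
    parser.read(cache_file)
    return dict(parser["files"]) if parser.has_section("files") else {}


def save_cache(cache_file, checksums):
    parser = _parser()
    parser["files"] = checksums
    with open(cache_file, "w") as fh:
        parser.write(fh)


def diff(old, new):
    """Return the sets of added, removed, and modified file names."""
    added = new.keys() - old.keys()
    removed = old.keys() - new.keys()
    modified = {name for name in old.keys() & new.keys() if old[name] != new[name]}
    return added, removed, modified
```

A run would call `diff(load_cache(path), scan(folder))` and then `save_cache` with the fresh scan. Every check reads every file in full, which is the cost raised elsewhere in this thread.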
how to know if folder contents have changed
hi, I am trying to create a cache of digitized values of around 100 image files in a folder. In my program I would like to know from time to time if an image has been added to or removed from the folder.

One scheme suggested was to create a string from the sorted names of the image files and use it as the cache name, i.e. if I have one.jpg, three.jpg, new.jpg, I will name the cache 'newonethree.cache', and every time I want to check whether an image was added or removed I would create a string from the contents of the folder and compare it with the cache name. This scheme is OK for a small number of files.

Can someone suggest a better way? I know it is a general programming problem, but I wish to know if a Python solution exists.
Re: how to know if folder contents have changed
On Sun, 11 Nov 2007 21:03:33 -0800, [EMAIL PROTECTED] wrote:
> one scheme suggested was to create a string from the names of sorted image files and give it as the cache name.. ie ,if i have one.jpg,three.jpg,new.jpg , i will name the cache as 'newonethree.cache' and everytime i want to check if new image added/removed i wd create a string from the contents of folder and compare it with cachename. this scheme is ok for a small number of files,..

Not really.

`xxx.jpg` -> `xxx.cache`

Now `xxx.jpg` is deleted and `x.jpg` and `xx.jpg` are created.

`x.jpg`, `xx.jpg` -> `xxx.cache`

> can someone suggest a better way? i know it is a general programming problem..but i wish to know if a python solution exists

Don't store the names in the cache file name but in the cache file. Take a look at the `set()` type for operations to easily find out the differences between two sets of names, and the `pickle` module to store Python objects in files.

Ciao,
Marc 'BlackJack' Rintsch
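The `set()` plus `pickle` suggestion might look something like this minimal sketch. The function names and the cache file location are assumptions made for the example. The cache file is best kept outside the watched folder so that it does not show up in its own listing, and pickle files should only be loaded when they come from a trusted source.

```python
import os
import pickle


def load_names(cache_file):
    """Return the set of names saved by the previous check (empty if none)."""
    try:
        with open(cache_file, "rb") as fh:
            return pickle.load(fh)
    except FileNotFoundError:
        return set()


def check_folder(folder, cache_file):
    """Compare the folder listing with the saved one.

    Returns (added, removed) as two sets of names and saves the new listing.
    """
    old = load_names(cache_file)
    new = set(os.listdir(folder))
    with open(cache_file, "wb") as fh:
        pickle.dump(new, fh)
    return new - old, old - new
```

With the `xxx.jpg` example above, replacing `xxx.jpg` by `x.jpg` and `xx.jpg` is reported as two added names and one removed name, instead of colliding on a single cache name.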