Re: How To Do It Faster?!?
[EMAIL PROTECTED] writes:

> >$ find . -type f -printf "%T@ %u %s %p\n" > /yourserverroot/files.txt
>
> That is a nice idea. I don't know very much about Unix, but I suppose that
> on a ksh I can run this command (or a similar one) in order to obtain the
> list I need. If anyone knows whether that command will also run on a plain
> ksh, could they please confirm it?

That depends on the Unix flavor you're using -- my example was for the GNU
utilities, which are heavily used on (probably all) Linux systems. The BSDs,
Solaris and other unixen have slightly different 'find' syntax. Use "man find"
to find out more about how 'find' works on your system. On all systems I
know, the invocation goes like:

    find <path> [-switches ...]

where the switches vary depending on the system. In my example, I used
-type f (f as in file) to list only files (otherwise 'find' will include
directories in the output too) and -printf to include the desired data -- in
this case owner, last modified time, size, path -- in the output (otherwise
'find' will only print the path). You should at least go through the -printf
formatting codes to see what information you're able to include in the
output (=> man find). I used %T@ to print the last modified time as Unix
time because it's as simple as it can be: an integer counting the number of
seconds since Jan 1 1970. Python's "time" module groks Unix time just like
that.

> Moreover, I could run this script in a while loop, like:

Except that, I'd imagine, constantly traversing the filesystem will
seriously degrade the performance of the file server. You want to run your
script periodically over the day, preferably at times when the server is
inactive -- say hourly between 8am-4pm and then once at night. In Unix,
there's a facility called cron to do just that: it runs scripts and commands
over and over again hourly, daily, weekly, or just whenever you want.
Consult your Unix flavor's manual or newsgroup on that.
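Since each line of files.txt then holds just "mtime owner size path", parsing
it from Python is a few string operations; a minimal sketch (the helper name
and the sample line are made up, but the column layout matches the -printf
format above):

```python
import time

def parse_line(line):
    """Split one '%T@ %u %s %p' line into (mtime, owner, size, path)."""
    mtime, owner, size, path = line.rstrip("\n").split(" ", 3)
    # %T@ may carry a fractional part on newer GNU finds; float() takes both.
    return float(mtime), owner, int(size), path

sample = "1112396400 ag12905 1048576 ./studies/field42/run01.dat\n"
mtime, owner, size, path = parse_line(sample)
print(owner, size, path)
print(time.strftime("%Y-%m-%d %H:%M", time.gmtime(mtime)))
```

The maxsplit of 3 matters: it keeps paths containing spaces intact, since
only the first three fields are whitespace-delimited.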
> copy /yourserverroot/files.txt /yourserverroot/filesbackup.txt
> always have the filesbackup.txt up-to-date, as a function of the "find"
> speed on the server.

Yes, creating a temporary file is a good approach. I'd suggest moving the
new list over the old one (mv tmpfile filelist.txt) instead of copying,
since a move is usually merely a rename operation on the filesystem and
doesn't involve actually copying any data.

br, S -- [EMAIL PROTECTED]
--
http://mail.python.org/mailman/listinfo/python-list
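The same write-to-a-temp-file-then-rename pattern is available from Python
itself, should the list ever be produced by a Python script rather than by
find; a minimal sketch (function and file names are made up; os.replace is
the modern spelling -- on the Python 2.3 of this era it would be os.rename,
which behaves the same way on POSIX):

```python
import os
import tempfile

def write_file_list(lines, target):
    """Write lines to a temp file, then rename it over target in one step."""
    dirname = os.path.dirname(target) or "."
    fd, tmppath = tempfile.mkstemp(dir=dirname)  # same fs => rename is cheap
    try:
        with os.fdopen(fd, "w") as f:
            f.writelines(lines)
        os.replace(tmppath, target)  # rename(2): no data is copied
    except Exception:
        os.unlink(tmppath)           # clean up the temp file on failure
        raise
```

Readers of the target file never see a half-written list: they get either
the old version or the new one.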
How To Do It Faster?!?
Hello Simo & NG,

> Correct me if I'm wrong but since it _seems_ that the listing doesn't
> need to be up-to-date each minute/hour as the users will be looking
> primarily for old/unused files, why not have a daily cronjob on the
> Unix server to produce an appropriate file list on e.g. the root
> directory of your file server?

You are correct. I don't need this list to be updated every minute/hour.

> $ find . -type f -printf "%T@ %u %s %p\n" > /yourserverroot/files.txt

That is a nice idea. I don't know very much about Unix, but I suppose that
on a ksh I can run this command (or a similar one) in order to obtain the
list I need. If anyone knows whether that command will also run on a plain
ksh, could they please confirm it? Moreover, I could run this script in a
while loop, something like:

    while true; do
        if [ -e /yourserverroot/filesbackup.txt ]; then
            find . -type f -printf "%T@ %u %s %p\n" > /yourserverroot/files.txt
            cp /yourserverroot/files.txt /yourserverroot/filesbackup.txt
        else
            find . -type f -printf "%T@ %u %s %p\n" > /yourserverroot/filesbackup.txt
        fi
    done

or something similar (I don't have Unix at hand now, I cannot test the
commands and, as I said, I don't know Unix very well...). In this way, I
always have filesbackup.txt up-to-date, as a function of the "find" speed on
the server. Then my GUI could scan the filesbackup.txt file and search for a
particular user's information.

Thanks to all the NG for your suggestions!

Andrea.
--
http://mail.python.org/mailman/listinfo/python-list
Re: How To Do It Faster?!?
[EMAIL PROTECTED] writes:

> Every user of this big directory works on big studies regarding oil
> fields. Knowing the amount of data (and number of files) we have to
> deal with (produced by simulators, visualization tools, and so on)
> and knowing that users are usually lazy in doing clean up of
> unused/old files, this is a way for one of us to "fast" scan all the
> directories and identify which files belong to him. Having them in
> an organized, size-sorted wxPython list, the user can decide if he
> wants to delete some files (that almost surely he forgot even
> exist...) or not. It is as easy as a button click (retrieve the
> data-->delete the files).

Correct me if I'm wrong, but since it _seems_ that the listing doesn't need
to be up-to-date each minute/hour -- as the users will be looking primarily
for old/unused files -- why not have a daily cronjob on the Unix server
produce an appropriate file list in e.g. the root directory of your file
server? Your Python client would then load that (possibly compressed) text
file from the network share and find the needed bits in there. Note that if
some "old/unneeded" files are missing today, they'll show right up the
following day.

For example, running the GNU find command like this:

    $ find . -type f -printf "%T@ %u %s %p\n" > /yourserverroot/files.txt

produces a file where each line contains the last modified time, username,
size and path for one file. Dead easy to parse with Python, and you'll only
have to set up the cronjob _once_ on the Unix server. (If the file becomes
too big, grep can additionally be used to split the file, e.g. per user.)

br, S -- [EMAIL PROTECTED]
--
http://mail.python.org/mailman/listinfo/python-list
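The grep-style per-user split can just as well happen in the Python client
while it parses files.txt; a minimal sketch (helper name and sample lines
fabricated, columns as in the -printf format above):

```python
import collections

def split_per_user(lines):
    """Group '%T@ %u %s %p' lines by the username column."""
    per_user = collections.defaultdict(list)
    for line in lines:
        mtime, user, size, path = line.rstrip("\n").split(" ", 3)
        per_user[user].append((float(mtime), int(size), path))
    return per_user

lines = [
    "1112396400 ag12905 1048576 ./field42/run01.dat",
    "1112310000 jsmith 2048 ./field42/notes.txt",
]
per_user = split_per_user(lines)
print(sorted(per_user))
```

One pass over the file and each user's sublist is ready for size-sorting in
the GUI.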
FAM and Python? (was Re: How To Do It Faster?!?)
On Sat, 02 Apr 2005 02:02:31 +0200, andrea_gavana wrote:

> Hello Jeremy & NG,
> Every user of this big directory works on big studies regarding oil fields.
> Knowing the amount of data (and number of files) we have to deal with
> (produced by simulators, visualization tools, and so on) and knowing that
> users are usually lazy in doing clean up of unused/old files, this is a way
> for one of us to "fast" scan all the directories and identify which files
> belong to him. Having them in an organized, size-sorted wxPython list, the
> user can decide if he wants to delete some files (that almost surely he
> forgot even exist...) or not. It is as easy as a button click (retrieve
> the data-->delete the files).

Got it. A good idea!

>> Here's an idea to sort of come at the problem from a different angle. Can
>> you run something on the file server itself, and use RPC to access it?
>
> I don't even know what RPC is... I have to look at it.

RPC stands for "remote procedure call". The idea is that you do something
that looks like a normal function call, except it happens on a remote
server. Complexity varies widely. Given your situation, and if running
something on the UNIX server is a possibility, I'd recommend downloading and
playing with Pyro; it is Python specific, so I think it would be the best
thing for you, being powerful, well integrated with Python, and easy to use.
Then, on your client machine in Windows, you'd ultimately make some sort of
call to your server like

    fileList = server.getFileList(user)

and you'd get the file list for that user, returning whatever you want for
your app: a list of tuples, objects, whatever. Pyro will add no constraints
to your app.

> I am not sure if my new explanation fits with your last information... as
> above, I didn't even know about fam... I've read a little, but probably
> I am too newbie to see a link between it and my scope. Do you think one
> exists?
> It would be nice to have something that tracks the file status on all the
> file system, but probably is a LOT of work wrt what my app should be able
> to do.

Maybe, maybe not. I've never used FAM. Perhaps someone who has can chime in
about the ease of use; I've changed the subject to try to attract such a
person. It also depends on whether FAM works on your UNIX.

My point is that you can do one scan at startup (can't avoid this), but
then, as the file system monitor tells you that a change has occurred, you
update your data structures to account for the change. That way, your data
is always in sync. (For safety's sake, you might set the server to terminate
itself and re-start every night.) Since it's always in sync, you can send
this data back instead of scanning the file system.

At this point, my suggestion would be to consider whether you want to spend
the effort to speed it up like this, which is something only you (and
presumably your managers) are in a position to know, given that you have an
existing tool (at least, you seem to speak like you have a functional tool).
If you do, then I'd take some time and work a bit with Pyro and FAM, and
*then* re-evaluate where you stand. By then you'll probably be able to ask
better questions, too, and like I said above, perhaps someone will share
their experiences with FAM.

Good luck, and have fun; seriously, that's important here.
--
http://mail.python.org/mailman/listinfo/python-list
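The scan-once-then-stay-in-sync idea can be sketched without committing to
FAM's actual API (which isn't shown in this thread): whatever monitor is
used, it boils down to applying change events to an in-memory per-user
index. The class, method and event names below are all hypothetical:

```python
import collections

class FileIndex:
    """In-memory per-user file index, kept in sync by change events.

    Events here are hypothetical (kind, path, owner, size) tuples; a real
    monitor such as FAM would deliver its own event objects.
    """

    def __init__(self):
        self.files = {}                                 # path -> (owner, size)
        self.per_user = collections.defaultdict(dict)   # owner -> {path: size}

    def apply(self, kind, path, owner=None, size=None):
        if kind in ("created", "changed"):
            old = self.files.get(path)
            if old is not None and old[0] != owner:
                del self.per_user[old[0]][path]         # ownership changed
            self.files[path] = (owner, size)
            self.per_user[owner][path] = size
        elif kind == "deleted" and path in self.files:
            owner, _size = self.files.pop(path)
            del self.per_user[owner][path]

    def listing(self, owner):
        """What the RPC layer would return for one user: no rescan needed."""
        return sorted(self.per_user[owner].items())
```

The startup scan just feeds one "created" event per existing file; after
that, answering a user's query is a dictionary lookup instead of a walk over
1 TB of files.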
How To Do It Faster?!?
Hello Jeremy & NG,

> Yes, clearer, though I still don't know what you're *doing* with that
> data :-)

Every user of this big directory works on big studies regarding oil fields.
Knowing the amount of data (and number of files) we have to deal with
(produced by simulators, visualization tools, and so on), and knowing that
users are usually lazy in cleaning up unused/old files, this is a way for
one of us to "fast" scan all the directories and identify which files belong
to him. Having them in an organized, size-sorted wxPython list, the user can
decide whether he wants to delete some files (that almost surely he forgot
even exist...) or not. It is as easy as a button click (retrieve the
data-->delete the files).

> Here's an idea to sort of come at the problem from a different angle. Can
> you run something on the file server itself, and use RPC to access it?

I don't even know what RPC is... I have to look at it.

> The reason I mention this is a lot of UNIXes have an API to detect file
> changes live; for instance, google "python fam". It would be easy to hook
> something up to scan the files at startup and maintain your totals live,
> and then use one of the many extremely easy Python RPC mechanisms to
> request the data as the user wants it, which would most likely come back
> at network speeds (fast).

I am not sure if my new explanation fits with your last information... as
above, I didn't even know about fam... I've read a little, but probably I am
too newbie to see a link between it and my scope. Do you think one exists?
It would be nice to have something that tracks the file status on the whole
file system, but that is probably a LOT of work wrt what my app should be
able to do.

Anyway, thanks for the hints! If my new explanation changed something, can
anyone post some more comments?

Thanks to you all.

Andrea.
--
http://mail.python.org/mailman/listinfo/python-list
Re: How To Do It Faster?!?
On Sat, 02 Apr 2005 01:00:34 +0200, andrea_gavana wrote:

> Hello Jeremy & NG,
> ...
> I hope to have been clearer this time...
>
> I really welcome all your suggestions.

Yes, clearer, though I still don't know what you're *doing* with that data
:-)

Here's an idea to sort of come at the problem from a different angle. Can
you run something on the file server itself, and use RPC to access it?

The reason I mention this is that a lot of UNIXes have an API to detect file
changes live; for instance, google "python fam". It would be easy to hook
something up to scan the files at startup and maintain your totals live, and
then use one of the many extremely easy Python RPC mechanisms to request the
data as the user wants it, which would most likely come back at network
speeds (fast). This would be orders of magnitude faster, and no scanning
system could compete with it.
--
http://mail.python.org/mailman/listinfo/python-list
How To Do It Faster?!?
Hello Jeremy & NG,

> * Poke around in the Windows API for a function that does what you want,
> and hope it can do it faster due to being in the kernel.

I could try it, but I think I have to explain my problem a little more.

> If you post more information about how you are using this data, I can try
> to help you.

Basically, I have to scan a really BIG directory: essentially, it is a UNIX
file system where all our projects reside, with thousands and thousands of
files and more than 1 TB of information. About 200-300 of us use this space.
This is what I do now and what I would like to improve:

1) For a particular user (1 and only 1 at a time), I would like to scan all
   directories and subdirectories in order to find which FILES are owned by
   this user (I am NOT interested in directory owners, only files). Note
   that since I am searching for only 1 user, his disk quota is around
   20-30 GB, or something like that;

2) My application is a GUI designed with wxPython. It runs on Windows at
   the moment (this is why I am asking about Windows user IDs and the like;
   on Unix it is much simpler);

3) While scanning the directories (using os.walk), I process the results of
   my command "dir /q /-c /a-d MyDirectory" and I display these results in
   a wxListCtrl (a list viewer) in my GUI;

4) I would not use the suggested command "dir /S" in a DOS shell because,
   even though it scans all directories recursively, I am NOT able to
   process intermediate results: this command never returns until it has
   finished scanning ALL directories (and for 1 TB of files, that can take
   a LOT of time);

5) For all the files in each directory scanned, I do:
   - IF a file belongs to that particular user THEN:
       Get the file name;
       Get the file size;
       Get the last modification date;
       Display the result in my wxListCtrl
   - ELSE:
       Disregard the information;
   - END

I get the file owner using the /Q switch of the DIR command, and I exclude
the subdirectories a priori using the /a-d switch. That is because I am
using os.walk().
6) All of our users can see this big Unix directory on their PCs, labeled
   as E:\ or F:\ or whatever. I cannot in any case use UNIX commands from
   DOS (and I cannot use rsh to communicate with the Unix machine and then
   use something like "find . -name", etc.).

I hope to have been clearer this time... I really welcome all your
suggestions.

Andrea.
--
http://mail.python.org/mailman/listinfo/python-list
Re: How To Do It Faster?!?
On Thu, 31 Mar 2005 13:38:34 +0200, andrea.gavana wrote:

> Hello NG,
>
> in my application, I use os.walk() to walk on a BIG directory. I need
> to retrieve the files, in each sub-directory, that are owned by a
> particular user. Noting that I am on Windows (2000 or XP), this is what I
> do:

You should *try* directly retrieving the relevant information from the OS,
instead of spawning a "dir" process. I have no idea how to do that and it
will probably require the win32 extensions for Python.

After that, you're done. Odds are you'll be disk bound. In fact, you may get
no gain if Windows is optimized enough that the process you describe below
is *still* disk-bound. Your only hope then is two things:

* Poke around in the Windows API for a function that does what you want,
  and hope it can do it faster due to being in the kernel.

* Somehow work this out to be lazy, so it tries to grab what the user is
  looking at, instead of absolutely everything. Whether or not this will
  work depends on your application.

If you post more information about how you are using this data, I can try
to help you. (I've had some experience in this domain, but what is good
heavily depends on what you are doing. For instance, if you're batch
processing a whole bunch of records after the user gave a bulk command,
there's not much you can do. But if they're looking at something in a
Windows Explorer-like tree view, there's a lot you can do to improve
responsiveness, even if you can't speed up the process overall.)
--
http://mail.python.org/mailman/listinfo/python-list
How To Do It Faster?!?
Hello max & NG,

> I don't quite understand what your program is doing. The user=a[18::20]
> looks really fragile/specific to a directory to me.

I corrected it to user=a[18::5][:-2]; it was my mistake. However, that
command is NOT specific to a particular directory. You can try it on
whatever directory or network resource is mounted on your system. It works.

> >>> a=os.popen("dir /s /q /-c /a-d " + root).read().splitlines()

Mhm... have you tried this command on a BIG directory? On your C: drive,
for example? I had to kill Python after issuing that command because it ate
up all my memory (1 GB) for quite a long time. There is simply too much
file information to retrieve in a single command. In my first mail, I said
I have to work with a BIG directory (more than 1 TB) and I need to retrieve
information as it becomes available (I put this info in a wxPython
ListCtrl). This is why I have chosen os.walk() and that command (which runs
on a separate thread wrt the ListCtrl). It does NOT run faster than your
command (probably my solution is slower), but I get information on every
directory I scan, while with your command I have to wait a long time before
I can process the results, plus the user cannot interact with the results
already found.

> To get a list containing files owned by a specific user, do something like:
> >>> files=[line.split()[-1] for line in a if owner in line]

I will try this solution also. Thanks NG for your useful suggestions.

Andrea.
--
http://mail.python.org/mailman/listinfo/python-list
Re: Rif: Re: How To Do It Faster?!?
[EMAIL PROTECTED] wrote:

> Unfortunately, on Windows it does not seem to work very well:
>
> st = os.stat('MyFile.txt')
> print st.st_uid
> 0
>
> I don't think my user ID is 0... While with the DOS command I get:
> userid: \\ENI\ag12905

I would recommend using the pywin32 support that almost certainly exists
for getting the owner of a file. I'm not familiar with the Windows API in
question, but either somebody else will point you to it, or a search on
msdn.com would find it for you.

-Peter
--
http://mail.python.org/mailman/listinfo/python-list
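For the record, the pywin32 calls in question appear to be in the
win32security module; a hedged sketch (the helper name is made up, the
import is deferred so the module still loads off Windows, and the snippet is
untestable outside Windows):

```python
def file_owner(path):
    """Return the owner of path as 'DOMAIN\\user' (Windows + pywin32 only)."""
    import win32security  # deferred: pywin32 is a Windows-only extension
    sd = win32security.GetFileSecurity(
        path, win32security.OWNER_SECURITY_INFORMATION)
    sid = sd.GetSecurityDescriptorOwner()
    name, domain, _kind = win32security.LookupAccountSid(None, sid)
    return "%s\\%s" % (domain, name)
```

For the user above, something like file_owner(r"E:\MyFile.txt") should come
back as "ENI\ag12905" rather than the meaningless st_uid of 0.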
Re: How To Do It Faster?!?
I don't quite understand what your program is doing. The user=a[18::20]
looks really fragile/specific to a directory to me. Try something like this:

>>> a=os.popen("dir /s /q /-c /a-d " + root).read().splitlines()

That should give you the dir output split into lines, for every file below
root (notice that I added '/s' to the dir command). There will be some extra
lines in a that aren't about specific files...

>>> a[0]
' Volume in drive C has no label.'

but the files should be there.

>>> len(a)
232

To get a list containing files owned by a specific user, do something like:

>>> files=[line.split()[-1] for line in a if owner in line]
>>> len(files)
118

This is throwing away directory information, but using os.walk() instead of
the /s switch to dir should work, if you need it...

max
--
http://mail.python.org/mailman/listinfo/python-list
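One caveat with line.split()[-1]: it drops everything before the last space,
so filenames containing spaces get truncated. A maxsplit keeps such names
whole; a minimal sketch (the helper name and sample lines are fabricated,
and it assumes a locale where each dir /q /-c line has exactly the columns
date, time, size, owner, filename):

```python
def parse_dir_line(line, owner):
    """Return the filename from one 'dir /q /-c' line if owner matches."""
    parts = line.split(None, 4)   # maxsplit keeps spaces inside the filename
    if len(parts) == 5 and owner in parts[3]:
        return parts[4]
    return None

lines = [
    "02/04/2005  01:00        1048576 ENI\\ag12905   final report.dat",
    "02/04/2005  01:05            512 ENI\\jsmith    other.txt",
]
mine = [name for name in (parse_dir_line(l, "ag12905") for l in lines) if name]
print(mine)
```

Header/footer lines of the dir output fail the len(parts) == 5 check or the
owner test and fall through to None, so no special-casing is needed.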
Re: How To Do It Faster?!?
[EMAIL PROTECTED] wrote:

> Hello NG,
>
> in my application, I use os.walk() to walk on a BIG directory. I need
> to retrieve the files, in each sub-directory, that are owned by a
> particular user. Noting that I am on Windows (2000 or XP), this is what I
> do:
>
> for root, dirs, files in os.walk(MyBIGDirectory):
>
>     a = os.popen("dir /q /-c /a-d " + root).read().split()
>
>     # Retrieve all files' owners
>     user = a[18::20]
>
>     # Retrieve all the last modification dates & hours
>     date = a[15::20]
>     hours = a[16::20]
>
>     # Retrieve all the filenames
>     name = a[19::20]
>
>     # Retrieve all the file sizes
>     size = a[17::20]
>
>     # Loop through all file owners to see if they belong
>     # to that particular owner (a string)
>     for util in user:
>         if util.find(owner) >= 0:
>             DO SOME PROCESSING
>
> Does anyone know if there is a faster way to do this job?

You may use "dir /s", which lists everything recursively.
--
http://mail.python.org/mailman/listinfo/python-list
Re: Rif: Re: How To Do It Faster?!?
[EMAIL PROTECTED] wrote:

> Am I missing something on the stat module? I'm running Python 2.3.4.

Yes, you are missing that this is more Unix-like. It seems to work to a
certain degree on Windows, but as the user model differs considerably
between Unix and Windows, you have found a not-so-well-working part.

I don't think that your code could be much faster at all -- the limits are
not so much within Python as within Windows itself. The only thing I can
think of is to use python-win32 to make the same calls that dir, with your
various options, makes itself. That would maybe save you the additional
overhead of creating a string representation (done by dir) and parsing it
-- but I doubt the performance gain justifies the means.
--
Regards,
Diez B. Roggisch
--
http://mail.python.org/mailman/listinfo/python-list
Rif: Re: How To Do It Faster?!?
Hello Laszlo & NG,

> You can use the stat module to get attributes like last modification
> date, uid, gid etc. The documentation of the stat module has a nice
> example. Probably it will be faster because you are running an external
> program (well, "dir" may be resident but still the OS needs to create a
> new shell and interpret the parameters on every invocation).

Unfortunately, on Windows it does not seem to work very well:

>>> st = os.stat('MyFile.txt')
>>> print st.st_uid
0

I don't think my user ID is 0... While with the DOS command I get:

userid: \\ENI\ag12905

Am I missing something in the stat module? I'm running Python 2.3.4.

Thanks a lot.

Andrea.
--
http://mail.python.org/mailman/listinfo/python-list
How To Do It Faster?!?
Hello NG,

in my application, I use os.walk() to walk on a BIG directory. I need to
retrieve the files, in each sub-directory, that are owned by a particular
user. Noting that I am on Windows (2000 or XP), this is what I do:

for root, dirs, files in os.walk(MyBIGDirectory):

    a = os.popen("dir /q /-c /a-d " + root).read().split()

    # Retrieve all files' owners
    user = a[18::20]

    # Retrieve all the last modification dates & hours
    date = a[15::20]
    hours = a[16::20]

    # Retrieve all the filenames
    name = a[19::20]

    # Retrieve all the file sizes
    size = a[17::20]

    # Loop through all file owners to see if they belong
    # to that particular owner (a string)
    for util in user:
        if util.find(owner) >= 0:
            DO SOME PROCESSING

Does anyone know if there is a faster way to do this job?

Thanks to you all.

Andrea.
--
Message for the recipient only; if received in error, please notify the
sender and read http://www.eni.it/disclaimer/
--
http://mail.python.org/mailman/listinfo/python-list
Re: How To Do It Faster?!?
[EMAIL PROTECTED] wrote:

> Hello NG,
>
> in my application, I use os.walk() to walk on a BIG directory. I need
> to retrieve the files, in each sub-directory, that are owned by a
> particular user. Noting that I am on Windows (2000 or XP), this is what I
> do:
>
> for root, dirs, files in os.walk(MyBIGDirectory):
>     a = os.popen("dir /q /-c /a-d " + root).read().split()
>
> Does anyone know if there is a faster way to do this job?

You can use the stat module to get attributes like the last modification
date, uid, gid etc. The documentation of the stat module has a nice
example. It will probably be faster, because you are currently running an
external program (well, "dir" may be resident, but the OS still needs to
create a new shell and interpret the parameters on every invocation).

Even if the speed is the same, you may still want to use the stat module
because:

- it is platform independent
- it is independent of any external program (for example, the DIR command
  could change in the future)

Best,
Laci
--
_
Laszlo Nagy          web: http://designasign.biz
IT Consultant        mail: [EMAIL PROTECTED]
Python forever!
--
http://mail.python.org/mailman/listinfo/python-list