Re: [Tutor] Python structure advice ?
Kent Johnson wrote: Dave S wrote: Separate modules is good. Separate directories for anything other than big programs (say 20 or more files?) is more hassle than it's worth. The files are better kept in a single directory IMHO. The exception being modules designed for reuse... It just makes life simpler! I've tried to be hyper-organized and added my dirs in /usr/lib/python2.3/site-packages/mypath.pth:
/home/dave/mygg/gg1.3/live_datad
/home/dave/mygg/gg1.3/logger
/home/dave/mygg/gg1.3/utils
/home/dave/mygg/gg1.3/datacore
/home/dave/mygg/gg1.3
/home/dave/mygg/gg1.3/configs
This works OK, but I sometimes have to search around a bit to find where the modules are. Probably part of the problem is that I tend to write lots of small modules, debug them & then import them into one controlling script. It works OK, but I start to drown in files; e.g. my live_datad contains ...
exact_sleep.py garbage_collect.py gg ftsed.e3p html_strip.py live_datad.py valid_day.pyc
exact_sleep.pyc garbage_collect.pyc gg ftsed.e3s html_strip.pyc valid_day.py
When I get more experienced I will try & write fewer, bigger modules :-) It's just a guess from the filenames, but it looks like your live_datad package (directory) contains everything needed by live_datad.py. Spot on! I would like to suggest a different organization. I tend to organize packages around a single functional area, and by looking at the dependencies of the modules in the package on other packages.
For example, in my current project some of the packages are:
- common.util - this is a catchall for modules that are not specific to this application, and don't depend on any other packages
- common.db - low-level database access modules
- cb.data - application-specific database access - the data objects and data access objects that the application works with
- cb.import - modules that import legacy data into the application
- cb.writer - modules that generate files
- cb.gui - GUI components
- cb.app - application-level drivers and helpers
I have been getting in a muddle: html_strip.py strips HTML, mines for data & when it finds specific patterns returns a dictionary containing them. However, I also use one of its functions in a utility, convert_data.py, that reads in archived semi-processed HTML files. This cross-dependence has occurred several times and is getting messy. Yours is an interesting approach; it's started me thinking... Anyway, the point is, if you organize your modules according to what they do, rather than by who uses them, you might make a structure that is less chaotic. HTH Kent ___ Tutor maillist - [EMAIL PROTECTED] http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] Python structure advice ?
Alan Gauld wrote: 1) batch oriented - each step of the process produces its own output file or data structure and this gets picked up by the next stage. This usually involves processing data in chunks - writing the first dump after every 10th set of input, say. I see your point, like a static chain, one calling the next & passing data; the problem being that the links of the chain will need to remember their previous state when called again, so their output is a function of previous data + fresh data. I guess their state could be written to a file, then re-read. Yes. Just to expand: the typical processing involves three files: 1) the input, which is the output of the preceding stage; 2) the output, which will form input to the next stage; 3) the job log. This will contain references to any input data items that failed to process - typically these will be manually inspected, corrected and a new file created and submitted at the end of the batch run. BUT 3) will also contain the sequence number of the last file and/or last data item processed so that when the next cycle runs it knows where to start. It is this belt-and-braces approach to data processing and error recovery that makes mainframes so reliable: not just the hardware, but the whole culture there is geared to handling failure and being able to *recover*, not just report on it. After all, it's on the mainframes that the really mission-critical software of any large enterprise runs! As an ex-Unix head I learned an awful lot about reliable computing from the 18 months I spent working on a mainframe project. These guys mostly live in a highly specialised microcosm of their own, but they have learned a lot of powerful tricks over the last 40 years that the rest of us ignore at our peril. I strongly recommend that anyone who gets the chance of *a short* contract in mainframe land, with training, grab the opportunity with both hands!
< Steps off soapbox now :-) > Alan G Author of the Learn to Program web tutor http://www.freenetpages.co.uk/hp/alan.gauld You get on that soapbox whenever you want :-) It's good to hear a range of views! Dave
Re: [Tutor] Python structure advice ?
> > 1) batch oriented - each step of the process produces its own
> > output file or data structure and this gets picked up by the
> > next stage. This usually involves processing data in chunks
> > - writing the first dump after every 10th set of input, say.
>
> I see your point, like a static chain, one calling the next & passing
> data; the problem being that the links of the chain will need to
> remember their previous state when called again, so their output is a
> function of previous data + fresh data. I guess their state could be
> written to a file, then re-read.
Yes. Just to expand: the typical processing involves three files: 1) the input, which is the output of the preceding stage; 2) the output, which will form input to the next stage; 3) the job log. This will contain references to any input data items that failed to process - typically these will be manually inspected, corrected and a new file created and submitted at the end of the batch run. BUT 3) will also contain the sequence number of the last file and/or last data item processed so that when the next cycle runs it knows where to start. It is this belt-and-braces approach to data processing and error recovery that makes mainframes so reliable: not just the hardware, but the whole culture there is geared to handling failure and being able to *recover*, not just report on it. After all, it's on the mainframes that the really mission-critical software of any large enterprise runs! As an ex-Unix head I learned an awful lot about reliable computing from the 18 months I spent working on a mainframe project. These guys mostly live in a highly specialised microcosm of their own, but they have learned a lot of powerful tricks over the last 40 years that the rest of us ignore at our peril. I strongly recommend that anyone who gets the chance of *a short* contract in mainframe land, with training, grab the opportunity with both hands!
< Steps off soapbox now :-) > Alan G Author of the Learn to Program web tutor http://www.freenetpages.co.uk/hp/alan.gauld
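A minimal Python sketch of one batch stage as described above: it reads the previous stage's output, writes its own output for the next stage, and keeps a job log holding both the failed items (for manual inspection) and the sequence number of the last item processed, so the next cycle knows where to start. The function name `run_batch_stage`, the one-record-per-line input file that is appended to between runs, and the JSON log format are all assumptions for illustration, not anything from the thread.

```python
import json
import os


def run_batch_stage(input_path, output_path, log_path, process):
    """Hypothetical batch stage: process new records and keep a restartable job log."""
    # Load the job log left by the previous cycle, or start fresh.
    if os.path.exists(log_path):
        with open(log_path) as f:
            log = json.load(f)
    else:
        log = {"last_seq": -1, "failed": []}

    with open(input_path) as src, open(output_path, "a") as dst:
        for seq, line in enumerate(src):
            if seq <= log["last_seq"]:
                continue  # already handled in an earlier cycle
            record = line.rstrip("\n")
            try:
                dst.write(process(record) + "\n")
            except ValueError as exc:
                # Bad item: note it for manual correction, but keep the batch going.
                log["failed"].append({"seq": seq, "data": record, "error": str(exc)})
            log["last_seq"] = seq

    # The log is saved once, at the end of the run.
    with open(log_path, "w") as f:
        json.dump(log, f)
    return log
```

Because the log is only written when the batch finishes, a crash mid-run would leave output written but not recorded; a real job would need to guard against that (for example by committing output and log together).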
Re: [Tutor] Python structure advice ?
> For what it's worth, it seems to me to be perfectly normal to have classes that are only ever intended to have a single instance. For example, you're never likely to need more than one HTML parser, and yet htmllib.HTMLParser is a class...
That's true, but the argument for a class in that case is that we can subclass it for more specialized purposes. If there is only to be a single instance and it will not be specialized by subclassing, then a simple module will do the job just nicely.
> As Kent said, the main point of a class is that you have a collection of data and operations on that data bundled together.
Dunno if I'd agree that that was the *main point* of classes; the main point, I'd say, was to act as a template for objects. The fact that there might only be one instance is a side issue. But creating classes that only have a single instance is certainly OK; after all, the original design patterns book by the GoF has a Singleton pattern to ensure that only one instance can be created!
> "I want lots of things like this", as it is a declaration of modularity -- "This stuff all belongs together as a unit".
So use a module... Python is blessed with both constructs and we should use whichever is most appropriate. IMHO of course! :-) Alan G Author of the Learn to Program web tutor http://www.freenetpages.co.uk/hp/alan.gauld
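To illustrate the module-versus-class point, here is a rough sketch using a made-up tally of items seen (written for Python 3). The top half is module style: if it lived in its own file (say a hypothetical tally.py), every importer would share the one copy of `_counts`, so the module itself acts as the single instance. The bottom half is the class version, which earns its keep once you want to specialise it by subclassing.

```python
# Module style: module-level state plus plain functions.
# Saved as its own (hypothetical) tally.py, the module is the single instance.
_counts = {}


def add(item):
    """Record one occurrence of item in the module-level state."""
    _counts[item] = _counts.get(item, 0) + 1


def count(item):
    return _counts.get(item, 0)


# Class style: same behaviour, but it can be specialised by subclassing.
class Tally:
    def __init__(self):
        self._counts = {}

    def add(self, item):
        self._counts[item] = self._counts.get(item, 0) + 1

    def count(self, item):
        return self._counts.get(item, 0)


class CaseInsensitiveTally(Tally):
    """A specialisation: the kind of reuse a plain module does not offer."""

    def add(self, item):
        super().add(item.lower())

    def count(self, item):
        return super().count(item.lower())
```

If you never need the subclass, the module version is all you need; once you do, the class is the natural home.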
Re: [Tutor] Python structure advice ?
Dave S wrote: Separate modules is good. Separate directories for anything other than big programs (say 20 or more files?) is more hassle than it's worth. The files are better kept in a single directory IMHO. The exception being modules designed for reuse... It just makes life simpler! I've tried to be hyper-organized and added my dirs in /usr/lib/python2.3/site-packages/mypath.pth:
/home/dave/mygg/gg1.3/live_datad
/home/dave/mygg/gg1.3/logger
/home/dave/mygg/gg1.3/utils
/home/dave/mygg/gg1.3/datacore
/home/dave/mygg/gg1.3
/home/dave/mygg/gg1.3/configs
This works OK, but I sometimes have to search around a bit to find where the modules are. Probably part of the problem is that I tend to write lots of small modules, debug them & then import them into one controlling script. It works OK, but I start to drown in files; e.g. my live_datad contains ...
exact_sleep.py garbage_collect.py gg ftsed.e3p html_strip.py live_datad.py valid_day.pyc
exact_sleep.pyc garbage_collect.pyc gg ftsed.e3s html_strip.pyc valid_day.py
When I get more experienced I will try & write fewer, bigger modules :-) It's just a guess from the filenames, but it looks like your live_datad package (directory) contains everything needed by live_datad.py. I would like to suggest a different organization. I tend to organize packages around a single functional area, and by looking at the dependencies of the modules in the package on other packages.
For example, in my current project some of the packages are:
- common.util - this is a catchall for modules that are not specific to this application, and don't depend on any other packages
- common.db - low-level database access modules
- cb.data - application-specific database access - the data objects and data access objects that the application works with
- cb.import - modules that import legacy data into the application
- cb.writer - modules that generate files
- cb.gui - GUI components
- cb.app - application-level drivers and helpers
Anyway, the point is, if you organize your modules according to what they do, rather than by who uses them, you might make a structure that is less chaotic. HTH Kent
Re: [Tutor] Python structure advice ?
Jeff Shannon wrote: Dave S wrote: Kent Johnson wrote: Why do you say this is 'cheaty'? A class is basically a collection of data (state) and functions to operate on that state. Sorry for the delay, real world work got in the way ... Well, I understand classes to be used when multiple instances are required. I will only need one instance, and as such it seemed a bit of a cheat. The trouble is I now pretty well understand the tools, but don't know how you guys use them in the real world. For what it's worth, it seems to me to be perfectly normal to have classes that are only ever intended to have a single instance. For example, you're never likely to need more than one HTML parser, and yet htmllib.HTMLParser is a class... Well, if it's good enough for a Python lib ... As Kent said, the main point of a class is that you have a collection of data and operations on that data bundled together. Whether you have one set of data to operate on, or many such sets, is mostly irrelevant (though classes are even more valuable when there *are* many sets of data). Defining a class isn't so much a statement that "I want lots of things like this", as it is a declaration of modularity -- "This stuff all belongs together as a unit". OK, I'm a reformed ('L' plate programmer); it's going to be classes :-) Jeff Shannon Technician/Programmer Credit International
Re: [Tutor] Python structure advice ?
Sorry for the delay, real world work took me away ... everything was global, how you guys handle a modern structured language Don't worry, this is one of the hardest bad habits to break. You are not alone. The easiest way is to just pass the data from function to function in the function parameters. It's not at all unusual for functions to have lots of parameters; "global" programmers tend to panic when they have more than a couple (yep!), but it's not at all bad to have 5 or 6 - more than that gets unwieldy, I admit, and is usually time to start thinking about classes and objects. I have ended up with my application in several separate directories. Separate modules is good. Separate directories for anything other than big programs (say 20 or more files?) is more hassle than it's worth. The files are better kept in a single directory IMHO. The exception being modules designed for reuse... It just makes life simpler! I've tried to be hyper-organized and added my dirs in /usr/lib/python2.3/site-packages/mypath.pth:
/home/dave/mygg/gg1.3/live_datad
/home/dave/mygg/gg1.3/logger
/home/dave/mygg/gg1.3/utils
/home/dave/mygg/gg1.3/datacore
/home/dave/mygg/gg1.3
/home/dave/mygg/gg1.3/configs
This works OK, but I sometimes have to search around a bit to find where the modules are. Probably part of the problem is that I tend to write lots of small modules, debug them & then import them into one controlling script. It works OK, but I start to drown in files; e.g. my live_datad contains ...
exact_sleep.py garbage_collect.py gg ftsed.e3p html_strip.py live_datad.py valid_day.pyc
exact_sleep.pyc garbage_collect.pyc gg ftsed.e3s html_strip.pyc valid_day.py
When I get more experienced I will try & write fewer, bigger modules :-) My problem is that pretty much all the modules need to fix where they are when they exit and pick up from that point later on. There are two "classic" approaches to this kind of problem: 1) batch oriented - each step of the process produces its own output file or data structure and this gets picked up by the next stage. This usually involves processing data in chunks - writing the first dump after every 10th set of input, say. This is a very efficient way of processing large chunks of data and avoids any problems of synchronisation, since the output chunks form the self-contained input to the next step. And the input stage can run ahead of the processing or the processing ahead of the input. This is classic mainframe strategy, ideal for big volumes. BUT it introduces delays in the end-to-end process time; it's not instant. I see your point, like a static chain, one calling the next & passing data; the problem being that the links of the chain will need to remember their previous state when called again, so their output is a function of previous data + fresh data. I guess their state could be written to a file, then re-read. 2) Real time serial processing typically constructs a processing chain in a single process. It has a separate thread reading the input data (Got that working: live_datad ...) and kicks off a separate processing thread (or process) for each bit of data received. Each thread then processes the data to completion and writes the output. OK. A third process or thread then assembles the outputs into a single report. Interesting ... This produces results quickly but can overload the computer if data starts to arrive so fast that the threads start to back up on each other.
Also, error handling is harder: with the batch job, data errors can be fixed in the intermediate files, but with this an error anywhere means that the whole data processing chain will be broken, with no way to fix it other than resubmitting the initial data. An interesting idea; I had not thought of this approach as an option, even with its stated drawbacks. It's given me an idea for some scripting I have to do later on ... With my code now running to a few hundred lines (Don't laugh, this is BIG for me :-D ) It's big for me in Python; I've only written one program with more than a thousand lines of Python, whereas I've written many C/C++ programs in excess of 10,000 lines (Boy am I glad I chose to learn Python rather than C++, I'd probably still be at 'hello world' ;-) ) and worked on several of more than a million lines. But few if any Python programs get to those sizes. HTH, Alan G Author of the Learn to Program web tutor http://www.freenetpages.co.uk/hp/alan.gauld
Re: [Tutor] Python structure advice ?
Dave S wrote: Kent Johnson wrote: Why do you say this is 'cheaty'? A class is basically a collection of data (state) and functions to operate on that state. Sorry for the delay, real world work got in the way ... Well, I understand classes to be used when multiple instances are required. I will only need one instance, and as such it seemed a bit of a cheat. The trouble is I now pretty well understand the tools, but don't know how you guys use them in the real world. For what it's worth, it seems to me to be perfectly normal to have classes that are only ever intended to have a single instance. For example, you're never likely to need more than one HTML parser, and yet htmllib.HTMLParser is a class... As Kent said, the main point of a class is that you have a collection of data and operations on that data bundled together. Whether you have one set of data to operate on, or many such sets, is mostly irrelevant (though classes are even more valuable when there *are* many sets of data). Defining a class isn't so much a statement that "I want lots of things like this", as it is a declaration of modularity -- "This stuff all belongs together as a unit". Jeff Shannon Technician/Programmer Credit International
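For Dave's pipeline, a sketch of what "one instance, but still a class" might look like (Python 3). `DataCore`, `DataStats` and `main_step` are hypothetical names loosely modelled on his data_core and data_stats modules; the real data_core builds a multi-dimensional array, which is reduced here to a plain list of numbers. Each object carries its own "where was I" state, so the controlling script does not have to keep that bookkeeping in globals.

```python
class DataCore:
    """Hypothetical stand-in for data_core: keeps its data between calls."""

    def __init__(self):
        self.rows = []       # accumulated data (stands in for the big array)
        self.processed = 0   # how many rows downstream stages have already seen

    def update(self, new_rows):
        """Add fresh data from live_datad (or the archive)."""
        self.rows.extend(new_rows)

    def unprocessed(self):
        """Return only the rows added since the last call, then move the marker on."""
        fresh = self.rows[self.processed:]
        self.processed = len(self.rows)
        return fresh


class DataStats:
    """Hypothetical stand-in for data_stats: keeps a running total and count."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def feed(self, values):
        for v in values:
            self.total += v
            self.count += 1

    def mean(self):
        return self.total / self.count if self.count else None


def main_step(core, stats, batch):
    """One pass of the controlling script; each object remembers where it was."""
    core.update(batch)
    stats.feed(core.unprocessed())
    return stats.mean()
```

Calling `main_step` again with the next batch simply carries on from where each object left off.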
Re: [Tutor] Python structure advice ?
Kent Johnson wrote: Dave S wrote: Dave S wrote: The 'remembering where it was' seems a continuous stumbling block for me. I have thought of coding each module as a class but this seems like a cheat. I could declare copious globals; this seems messy. I could define each module as a thread & get them talking via queues; I have given this serious thought but heeded the warnings in previous posts. I have thought about returning a list of saved 'pointers' which would be re-submitted when the function is called. I don't know which way to turn. Having written this email, it has put my thoughts in order. Though it seems a bit cheaty, wouldn't defining all modules that have to remember their internal state as classes be the best bet? Dave Why do you say this is 'cheaty'? A class is basically a collection of data (state) and functions to operate on that state. Sorry for the delay, real world work got in the way ... Well, I understand classes to be used when multiple instances are required. I will only need one instance, and as such it seemed a bit of a cheat. The trouble is I now pretty well understand the tools, but don't know how you guys use them in the real world. You might be interested in this essay: http://www.pycs.net/users/323/stories/15.html I found this particularly useful. It might well make sense to organize your program as a collection of cooperating classes, or maybe a collection of classes with a top-level function that stitches them all together. Yes, this is the way I see things progressing; from 20,000 ft this makes a lot of sense. You might also want to learn about iterator classes and generator functions; they are a technique for returning a bit of data at a time while maintaining state. You might be able to structure your input stage as an iterator or generator.
http://docs.python.org/tut/node11.html#SECTION001190 http://docs.python.org/lib/typeiter.html I remember iterators from 'Learning Python'. I was concerned about several modules all 'having an iterator' to the next; debugging would be scary! I think I will go the class route. Kent
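On the worry about debugging a chain of iterators: each stage can be driven by hand with `next()` and its state inspected between calls, so a chain is not as opaque as it sounds. A small Python 3 sketch, where `LineFeed` and `Doubler` are hypothetical stages (in the Python 2 of that era the iterator method was spelled `next` rather than `__next__`).

```python
class LineFeed:
    """Iterator class (hypothetical input stage): hands out one record at a
    time and remembers its place in self.position."""

    def __init__(self, records):
        self.records = records
        self.position = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.position >= len(self.records):
            raise StopIteration
        record = self.records[self.position]
        self.position += 1
        return record


class Doubler:
    """A second stage that pulls from any upstream iterator."""

    def __init__(self, upstream):
        self.upstream = upstream

    def __iter__(self):
        return self

    def __next__(self):
        # When upstream raises StopIteration, it ends this stage too.
        return 2 * next(self.upstream)
```

For debugging, you can call `next(feed)` directly and check `feed.position` before wiring it into `Doubler`.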
Re: [Tutor] Python structure advice ?
> Having written this email, it has put my thoughts in order, though it seems a bit cheaty, wouldn't defining all modules that have to remember their internal state as classes be the best bet?
It's one solution, certainly: create objects, and the objects carry their state with them. But the problem can be tackled as per my earlier post without delving into the world of objects. Alan G.
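A minimal sketch of tackling it without objects (Python 3): each step takes its state in as parameters and hands the updated state back as return values, so all the "remembering" lives in the controlling function rather than in globals. `add_data`, `summarise_new` and the list-of-numbers data are hypothetical stand-ins for data_core and data_stats; this is also essentially Dave's idea of returning saved "pointers" to be resubmitted on the next call.

```python
def add_data(matrix, new_data):
    """data_core-like step: return a new matrix with the fresh rows appended."""
    return matrix + list(new_data)


def summarise_new(matrix, last_seen):
    """data_stats-like step: look only at rows after last_seen, and hand
    back the new marker so the caller can pass it in next time."""
    fresh = matrix[last_seen:]
    summary = {"new_rows": len(fresh), "sum_new": sum(fresh)}
    return summary, len(matrix)


def main():
    matrix, last_seen = [], 0              # all state lives here, in the caller
    results = []
    for batch in ([1, 2, 3], [4, 5]):      # stand-in for data arriving from live_datad
        matrix = add_data(matrix, batch)
        summary, last_seen = summarise_new(matrix, last_seen)
        results.append(summary)
    return results
```

The cost is longer parameter lists as the program grows, which is the point at which Alan suggests classes start to pay off.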
Re: [Tutor] Python structure advice ?
> everything was global, how you guys handle a modern structured language
Don't worry, this is one of the hardest bad habits to break. You are not alone. The easiest way is to just pass the data from function to function in the function parameters. It's not at all unusual for functions to have lots of parameters; "global" programmers tend to panic when they have more than a couple, but it's not at all bad to have 5 or 6 - more than that gets unwieldy, I admit, and is usually time to start thinking about classes and objects.
> I have ended up with my application in several separate directories.
Separate modules is good. Separate directories for anything other than big programs (say 20 or more files?) is more hassle than it's worth. The files are better kept in a single directory IMHO. The exception being modules designed for reuse... It just makes life simpler!
> My problem is that pretty much all the modules need to fix where they are when they exit and pick up from that point later on,
There are two "classic" approaches to this kind of problem:
1) batch oriented - each step of the process produces its own output file or data structure and this gets picked up by the next stage. This usually involves processing data in chunks - writing the first dump after every 10th set of input, say. This is a very efficient way of processing large chunks of data and avoids any problems of synchronisation, since the output chunks form the self-contained input to the next step. And the input stage can run ahead of the processing or the processing ahead of the input. This is classic mainframe strategy, ideal for big volumes. BUT it introduces delays in the end-to-end process time; it's not instant.
2) Real time serial processing typically constructs a processing chain in a single process. It has a separate thread reading the input data and kicks off a separate processing thread (or process) for each bit of data received.
Each thread then processes the data to completion and writes the output. A third process or thread then assembles the outputs into a single report. This produces results quickly but can overload the computer if data starts to arrive so fast that the threads start to back up on each other. Also, error handling is harder: with the batch job, data errors can be fixed in the intermediate files, but with this an error anywhere means that the whole data processing chain will be broken, with no way to fix it other than resubmitting the initial data.
> With my code now running to a few hundred lines
> (Don't laugh, this is BIG for me :-D )
It's big for me in Python; I've only written one program with more than a thousand lines of Python, whereas I've written many C/C++ programs in excess of 10,000 lines and worked on several of more than a million lines. But few if any Python programs get to those sizes. HTH, Alan G Author of the Learn to Program web tutor http://www.freenetpages.co.uk/hp/alan.gauld
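A rough sketch of the real-time approach using Python 3's `threading` and `queue` modules: one input thread starts a worker thread per item received, each worker posts its result to a shared queue, and a collector thread assembles the report. The squaring in `process_item` is a placeholder for real work, and the `None` end-of-data sentinel is an assumption of this sketch rather than anything from the thread.

```python
import queue
import threading


def process_item(item, results):
    """Worker: handle one piece of data to completion and post the output."""
    results.put((item, item * item))  # placeholder for the real processing


def reader(source, results):
    """Input thread: start one worker thread per item received."""
    workers = []
    for item in source:
        t = threading.Thread(target=process_item, args=(item, results))
        t.start()
        workers.append(t)
    for t in workers:
        t.join()
    results.put(None)  # sentinel: every worker has posted its output


def assembler(results, report):
    """Collector thread: gather outputs into a single report."""
    while True:
        entry = results.get()
        if entry is None:
            break
        report.append(entry)


def run(source):
    results = queue.Queue()
    report = []
    collector = threading.Thread(target=assembler, args=(results, report))
    collector.start()
    input_thread = threading.Thread(target=reader, args=(source, results))
    input_thread.start()
    input_thread.join()
    collector.join()
    return sorted(report)  # workers finish in no particular order
```

Starting one thread per item is exactly what can overload the machine when data arrives quickly; a bounded pool such as `concurrent.futures.ThreadPoolExecutor(max_workers=...)` is one common way to cap it.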
Re: [Tutor] Python structure advice ?
Dave S wrote: Dave S wrote: The 'remembering where it was' seems a continuous stumbling block for me. I have thought of coding each module as a class but this seems like a cheat. I could declare copious globals; this seems messy. I could define each module as a thread & get them talking via queues; I have given this serious thought but heeded the warnings in previous posts. I have thought about returning a list of saved 'pointers' which would be re-submitted when the function is called. I don't know which way to turn. Having written this email, it has put my thoughts in order. Though it seems a bit cheaty, wouldn't defining all modules that have to remember their internal state as classes be the best bet? Dave Why do you say this is 'cheaty'? A class is basically a collection of data (state) and functions to operate on that state. You might be interested in this essay: http://www.pycs.net/users/323/stories/15.html It might well make sense to organize your program as a collection of cooperating classes, or maybe a collection of classes with a top-level function that stitches them all together. You might also want to learn about iterator classes and generator functions; they are a technique for returning a bit of data at a time while maintaining state. You might be able to structure your input stage as an iterator or generator. http://docs.python.org/tut/node11.html#SECTION001190 http://docs.python.org/lib/typeiter.html Kent
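A small sketch of the generator idea in Python 3: `live_feed` and `running_mean` are hypothetical stages, and each batch stands in for whatever live_datad fetched. Local variables such as `count` and `total` survive between `yield`s, which is the "maintaining state" part, and stages chain by passing one generator into the next.

```python
def live_feed(batches):
    """Generator input stage: yields (n, item) one at a time; `count` is
    preserved between yields, so no global is needed to remember it."""
    count = 0
    for batch in batches:
        for item in batch:
            count += 1
            yield count, item


def running_mean(pairs):
    """Next stage: consumes (n, value) pairs and yields the running mean."""
    total = 0.0
    for n, value in pairs:
        total += value
        yield total / n
```

For example, `next(live_feed([[10, 20]]))` returns `(1, 10)`, and the generator then waits, paused, until you ask for the next item.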
Re: [Tutor] Python structure advice ?
Dave S wrote: I'm sorry to bang on about Python structure, but I do struggle with it, having in the past got into very bad habits with loads of BASIC where everything was global, and Forth, and hand-coded 8031, 8051, 6502. I can't get my head round how you guys handle a modern structured language :-) (PS before anyone flames me - I think Python is great and am determined to learn it ;-) ) I have ended up with my application in several separate directories. I have 'live_datad', a daemon that extracts web data at preset times and archives it; this will be run as a thread, possibly using a queue ... (still digesting info from query about IPCing) I have a 'data_core' which accepts data from either live_datad in real time or the archive for testing; it builds up a large multi-dimensional array with various pointers into the array. I have a statistical module, 'data_stats', which analyses the array, pulling various stats. And finally I have an analytical module, 'data_predict', which, using the output from 'data_stats' & data directly from the 'data_core', outputs statistical predictions of future data. I have written my 'live_datad', I have written my 'data_core' & have a fairly good idea how to write the rest. My problem is that pretty much all the modules need to fix where they are when they exit and pick up from that point later on, i.e. more data comes from live_datad, it is passed to 'data_core' which updates the matrix, then 'data_stats', then 'data_predict', all called from the main script. This is OK till the main script realizes that more data is available from 'live_datad' and passes it to 'data_core', which must remember where it was and move on, and the same for the rest of the modules. To make the problem more acute, the modules may not be called in exactly the same order depending on what I am trying to achieve. The 'remembering where it was' seems a continuous stumbling block for me. I have thought of coding each module as a class but this seems like a cheat.
I could declare copious globals; this seems messy. I could define each module as a thread & get them talking via queues; I have given this serious thought but heeded the warnings in previous posts. I have thought about returning a list of saved 'pointers' which would be re-submitted when the function is called. I don't know which way to turn. With my code now running to a few hundred lines (Don't laugh, this is BIG for me :-D ) I am going to have to make a structure decision, and any suggestions would be appreciated. How would you approach it? Dave
Having written this email, it has put my thoughts in order. Though it seems a bit cheaty, wouldn't defining all modules that have to remember their internal state as classes be the best bet? Dave