Hi Pythonist(a/o)s,

I am interested in whether anyone can shed light on a web application problem, both in its specific details (see below) and in the theory of how to do ad hoc data processing and exploration through a web interface (a tall order, I think). It is apropos of my job, if you hadn't guessed, but the problem (and the solution) might be of more general interest. I describe the problem as well as I can below. Any comments are appreciated, including requests for clarification of the description.

Goal: a web app that

(1) takes as input a piece of text such as a novel and stores it in a class instance,
(2) derives transition probabilities for each word, each two-word string, etc., and stores those in a class instance,
(3) creates simulated texts using these transition probabilities and stores the simulations in class instances,
(4) presents instances of all three classes nicely, including histograms or other graphics, tabular counts of words, etc.,
(5) organizes each instance created in a table-of-contents type layout.

Notes on the goal:

(1) Each of these phases in the data processing is worth storing and viewing later, hence the TOC.
(2) The sequence of operations might branch: we might input the text, then calculate three different transition rate objects, then calculate innumerable simulated texts from each of these. Additionally, we might implement different phases such that you can jump from (2) to (7) without anything between.
(3) The design should be flexible enough that if a new phase is invented, it is reasonable to add it later without major code surgery, preferably by doing an insert into a database.
(4) It would be nice to have a way to deal with versions of the data objects ("phases" above); e.g. someone inputs texts, I upgrade the software, and then their input data is no longer processable. We need to do something about that. An "upgrade" method in the object, perhaps?
(5) We need authentication, so that users can only interact with their own datasets.
(6) The phases can almost be thought of as "types" or OO classes, and I will model them as classes. If you were doing this from a command line, the phases would be either intermediate files or processes in a pipeline.
(7) It is worth thinking about enforcing sequences of transformations, about transformations that take several instances to create a single new instance of something else, and about transactions.
(8) We need to keep track of and display (somehow) the derivations of data, so that if you want to grab all the simulated texts derived from a given transition rate instance you can do that, and if you want to get the upstream processes you can do that too.

Notes on ideas for the architecture, etc.:

(1) My language of choice is Python, including PSP (the mod_python answer to PHP) for interface work, with PostgreSQL as the database and Gentoo Linux as the development platform.
(2) I am thinking of storing each of the above phases (e.g. a word transition stats object) as a Python object, but in a database, in order to link it with its owner and with the preceding data that generated it (e.g. a stochastic projection would need to be associated with the original rates entered).
(3) I imagine an interface that looks something like Google Mail (or any number of other mail programs): the left sidebar contains a list of the various classes; click on a class name, and in the main portion you get a list of instances of that class (where your email messages would be in Gmail) and a list of operations across the top, like delete (where save, archive, etc. would be). If you click on an instance in the main list, it is displayed nicely, including graphs etc.; it also has a list of operations along the top to transform it into other classes. Above all this would be a global command bar, with commands like "logout" etc.

Select list of all instances of class 1:

------------------------------------------------------------
|                        global bar                        |
------------------------------------------------------------
| *class 1* | operations: on list of instances of class 1  |
|           |----------------------------------------------|
|  class 2  | instance 1, class 1                          |
|           |----------------------------------------------|
|  class 3  | instance 2, class 1                          |
|           |----------------------------------------------|
|           | instance 3, class 1                          |
------------------------------------------------------------

Select one instance of class 1:

------------------------------------------------------------
|                        global bar                        |
------------------------------------------------------------
| *class 1* | operations: on *instance1* of class 1        |
|           |----------------------------------------------|
|  class 2  | instance 1, class 1, bunch of stuff          |
|           |    would be text,                            |
|  class 3  |    graphics                                  |
|           |    forms                                     |
|           |    whatever                                  |
------------------------------------------------------------

(4) I would like something such that you can either inherit or define a few methods on your objects and incorporate them into the whole thing. The framework should apply to any transformation of datasets, from astronomical stuff to population data to whatever.

Main Questions:

(1) Has anybody done something general enough for me to use? I haven't used a framework before, but most of them seem geared to content delivery, which is not the goal here. If there is one that seems appropriate, please tell me its name and why it is appropriate. If there are some that might help even if they are not a complete answer, please tell me why too.
(2) Are there any good places to discuss this in cyberland?
(3) Would anybody else be interested in working on it if it were general enough to meet their needs too? I could provide hosting and coordination, and it would be freely available, etc.
(4) Do I sound like a crazy person?
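
To make the goal a bit more concrete, here is a rough sketch of phases (1) to (3) together with the derivation tracking from note (8). It is only an illustration under assumptions: it handles single-word (first-order) transitions only, the class names (Phase, SourceText, TransitionTable, SimulatedText) are made up for this example, and the in-memory registry list is a stand-in for the database table that would really hold the instances and their owners.

```python
import random
from collections import Counter, defaultdict


class Phase:
    """Base class: every stored object remembers the instances it came from."""

    registry = []  # hypothetical in-memory stand-in for a database table

    def __init__(self, owner, parents=()):
        self.owner = owner
        self.parents = list(parents)
        Phase.registry.append(self)

    def children(self):
        """Downstream instances derived directly from this one."""
        return [p for p in Phase.registry if self in p.parents]

    def ancestors(self):
        """All upstream instances, nearest first."""
        found = []
        for parent in self.parents:
            found.append(parent)
            found.extend(parent.ancestors())
        return found


class SourceText(Phase):
    def __init__(self, owner, text):
        super().__init__(owner)
        self.words = text.split()


class TransitionTable(Phase):
    """First-order word transition probabilities (an assumed simplification)."""

    def __init__(self, source):
        super().__init__(source.owner, parents=[source])
        counts = defaultdict(Counter)
        for current, following in zip(source.words, source.words[1:]):
            counts[current][following] += 1
        self.probs = {
            word: {nxt: n / sum(c.values()) for nxt, n in c.items()}
            for word, c in counts.items()
        }


class SimulatedText(Phase):
    def __init__(self, table, start, length, seed=None):
        super().__init__(table.owner, parents=[table])
        rng = random.Random(seed)
        words = [start]
        # Stop early if the current word was never followed by anything.
        while len(words) < length and words[-1] in table.probs:
            options = table.probs[words[-1]]
            nxt = rng.choices(list(options), weights=list(options.values()))[0]
            words.append(nxt)
        self.text = " ".join(words)


# Branching use: one text, one table, several simulations from that table.
source = SourceText("alice", "the cat sat on the mat and the cat ran")
table = TransitionTable(source)
sims = [SimulatedText(table, "the", 8, seed=i) for i in range(3)]
print(table.probs["the"])
print([s.text for s in table.children()])
```

A new phase would then just be another subclass that names its parents, and "all simulations derived from this table" or "everything upstream of this simulation" fall out of children() and ancestors(). Versioning, authentication and persistence are deliberately left out of the sketch.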

Sorry to post such a long thing to a newsgroup I am not a regular on, but I am a little desperate :)

Thanks all.

--
http://mail.python.org/mailman/listinfo/python-list