Re: [GENERAL] Watching Views
Ah! Your reply was excellent, David. I only found it now, cleaning out my inbox. Comments inline!

Quoting David G Johnston <david.g.johns...@gmail.com>:

>> Nick Guenther wrote:
>> As you said, attaching the trigger to a view is useless (BEFORE/AFTER,
>> which I'm interested in, also only works on statement-level changes,
>> which I would rather not have to deal with). I tried attaching my
>> trigger to a materialized view and found that postgres does not support
>> that; as you said, I need to write a trigger on the source to keep the
>> materialized view and the source in sync. But in that case I don't think
>> a materialized view helps me at all, since without triggers on it I
>> can't even have it compute the diffs for me (and I suspect that
>> internally postgres simply truncates the old table and refills it, so
>> there would be nothing to hook anyway).
>>
>> My bottleneck is the size of my database and the network: I want to take
>> slices of a potentially gigabytes-large database and stream them out to
>> multiple clients. Thus I need to stream--not poll--for changes. I think
>> a materialized view would force me to poll for changes, and in that case
>> I would be doing REFRESH MATERIALIZED VIEW on each poll and therefore
>> not winning anything over a regular view. Ideally, when an update comes
>> in to a parent table I want all views that have sliced it to be
>> notified; I'm interested in doing dataflow* programming, essentially. Is
>> there maybe some sort of extension that does dependency tracking inside
>> of postgres?
>
> While PostgreSQL has materialized view functionality built in, the
> concept is general and can be done manually. Instead of the main table
> having the link to the FIFO, I was thinking you would instead replicate
> record changes to all active subset tables, and then triggers on those
> tables would send the relevant changes out to the world. Keep in mind you
> can attach a WHERE clause to your trigger, and I think you can pass in
> arguments to it as well.
> You should have one trigger per view attached to the source table -
> though with good meta-data and some imposed limits you can probably pool
> some clients into the same stream.

I actually ended up independently deciding on using triggers. I spent a long time agonizing over trying to avoid triggers, because they tie me to postgres, but they ended up not being so bad, and I even think that duplicating the work for MySQL should not be too painful. The hooks I wrote are in python (don't laugh) and are at https://github.com/kousu/modex/tree/databased/src/backend/db/replicate.pysql. All the hooks do is export {INSERT,UPDATE,DELETE}s to json. So far, filtering happens only in my frontend, in javascript. I did this because http://nytimes.github.io/pourover/ rightly points out that centralizing database slicing means creating an artificial bottleneck. If you're curious, what I've got so far is at https://github.com/kousu/modex/blob/databased/src/frontend/tables.js

But now I'm musing on how to do the same in the database. For context, my use case has several game instances running in parallel. Each game has many clients which should share a part of the database, and each client individually further slices the data depending on what its user wants to investigate. Sharing all data between the games and then having them use their in-javascript triggers to just ignore each other is possible--and my current implementation supports that--but it would be a lot better, and less memory- and bandwidth-heavy, if I could have the database do the initial "share a part" slice; and even better, slices could be deduplicated--that is, if I could pool clients onto a single table instead.
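For the archives: the core of those hooks--tagging each row change with its operation and serializing it as one JSON line--can be sketched in plain Python like this (the function name and event shape here are illustrative, not the actual replicate.pysql API):

```python
import json

def change_line(op, row):
    """Encode one row change as a single JSON line tagged with its operation.

    `op` is one of "INSERT", "UPDATE", "DELETE"; `row` is the NEW (or OLD)
    tuple as a dict, which is roughly how PL/Python hands rows to a trigger
    via the TD dictionary. sort_keys keeps the output stable for readers.
    """
    return json.dumps({"op": op, "row": row}, sort_keys=True)

line = change_line("INSERT",
                   {"name": "The Mail Man", "kind": "Documentary", "rating": 3})
```

One JSON object per line keeps the stream trivially splittable on the client side, which matters once several watchers share a pipe.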
Right now, I foresee the final javascript API being something like:

```javascript
var DB = Remote_DB.table("buildings")       // create an AST representing
           .where(Column("run_id")          //  "select location, kind, cost
             .eq(this.run_id))              //   from buildings where run_id = $1"
           .select("location", "kind", "cost");

DB = DB.execute();  // invoke the AST inside of postgres, which should
                    // wire up triggers doing dataflow

var points = DB.where(Column("location")    // create an in-javascript dataflow node
               .within([x, y], 500));       // which uses triggers (currently
                                            // PourOver's Events) to watch
                                            // the source for changes

var expensive = DB.max("expensive");        // create another one, which will
                                            // update simultaneously with its sibling
                                            // when the source database changes
```

I would like the AST (and the postgres hooks that back it) to have all the same methods as the javascript-dataflow part, so that changing the boundary of
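A language-agnostic sketch of that AST idea--immutable query nodes where each method returns a new node, compiled down to parameterized SQL at execute() time--might look like the following Python (all names here are hypothetical; the API above imagines it in javascript):

```python
class Query:
    """Minimal immutable query AST. Each builder method returns a NEW node,
    so a shared base query can be sliced differently by different clients."""

    def __init__(self, table, wheres=(), columns=("*",)):
        self.table = table
        self.wheres = tuple(wheres)    # sequence of (column, value) pairs
        self.columns = tuple(columns)

    def where(self, column, value):
        return Query(self.table, self.wheres + ((column, value),), self.columns)

    def select(self, *columns):
        return Query(self.table, self.wheres, columns)

    def compile(self):
        """Render SQL with numbered placeholders plus the parameter list."""
        sql = "SELECT %s FROM %s" % (", ".join(self.columns), self.table)
        if self.wheres:
            conds = ["%s = $%d" % (col, i + 1)
                     for i, (col, _) in enumerate(self.wheres)]
            sql += " WHERE " + " AND ".join(conds)
        return sql, [value for _, value in self.wheres]

q = Query("buildings").where("run_id", 7).select("location", "kind", "cost")
```

Because nodes are immutable, the server-side slice and any further in-javascript slices can hang off the same parent node without interfering, which is the deduplication property wished for above.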
Re: [GENERAL] Watching Views
Quoting David G Johnston <david.g.johns...@gmail.com>:

>> Nick Guenther wrote:
>> Dear List, In principle, I am looking for some way to say
>>
>> ```
>> CREATE VIEW view13131 AS
>>   select (id, name, bank_account) from actors where age > 22;
>> WATCH view13131;
>> ```
>>
>> and get output to stdout like
>>
>> ```
>> INSERT view13131 VALUES (241, 'Mortimer', 131.09);
>> ...
>> INSERT view13131 VALUES (427, 'Schezwan', 95.89);
>> UPDATE view13131 SET bank_account = 1017.12 WHERE id = 427;
>> DELETE FROM view13131 WHERE id = 92;
>> ...
>> ```
>
> 9.4 - http://www.postgresql.org/docs/9.4/static/logicaldecoding.html
>
> Though I doubt your requirement to obtain only a subset of data is
> something that can be accommodated, especially in SQL form.
>
> And, yes, you can create triggers on views.
> http://www.postgresql.org/docs/9.4/static/sql-createtrigger.html
>
> But assuming your view is meant to be dynamic, covering only the subset
> of data you wish to watch, no one is going to be using your view to
> actually INSERT/UPDATE/DELETE against the underlying table(s), so it
> will not do you any good to add triggers to it. You probably need to
> create some kind of materialized view and add a trigger to the relevant
> source table to maintain that view on an ongoing basis. Then remove the
> trigger (and optionally the materialized view) when you no longer care
> to watch. This:
> http://www.postgresql.org/docs/9.4/static/sql-creatematerializedview.html
> or roll your own.
>
> You can also use the full power of whatever programming languages you
> can install onto the server (e.g., pl/perl, pl/python, pl/sh) to link up
> with the outside world from inside one of those triggers...

Thanks David, your tips were very helpful. I'm not a SQL expert, but these ideas springboarded me ahead. I've been playing with them this week and I've come up with a way to extract the logical changes without using 9.4, which I'll share here in case anyone else is curious. This is preliminary, so don't rely on it for anything important.
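As an aside, rendering per-row change events into the invented WATCH output quoted above is mostly string formatting. A hedged Python sketch (the function name and the op-plus-row event shape are made up, and %r only approximates SQL literal quoting):

```python
def watch_statement(view, op, row, key="id"):
    """Render one change event as a line of the invented WATCH output.
    `row` is the changed row as a dict; `key` names the primary-key column."""
    if op == "INSERT":
        return "INSERT %s VALUES (%s);" % (
            view, ", ".join(repr(v) for v in row.values()))
    if op == "UPDATE":
        sets = ", ".join("%s = %r" % kv for kv in row.items() if kv[0] != key)
        return "UPDATE %s SET %s WHERE %s = %r;" % (view, sets, key, row[key])
    if op == "DELETE":
        return "DELETE FROM %s WHERE %s = %r;" % (view, key, row[key])
    raise ValueError("unknown operation: %s" % op)
```

The interesting part is not the formatting but deciding WHERE keys for UPDATE/DELETE, which is why the trigger needs to know (or be told) the table's primary key.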
It doesn't write DELETE FROM lines, but it does write the data in a json-esque format which could, without too much work, be turned into my invented WATCH language.

```{psql}
-- watch.psql
-- This postgres + pl/python2 script demonstrates watching changes to a table via a trigger.
-- Python opens a FIFO on which it writes lines like
--   + {'rating': 3, 'kind': 'Documentary', 'name': 'The Mail Man'}
-- The FIFO part is the flakiest bit, because it requires you to load the trigger,
-- then immediately run a reader (e.g. `tail -f /path/to/postgres/data/_changes_films`)
-- *before* any DB updates happen.
-- If you have no reader, updates will fail (the kernel will raise ENXIO at the
-- print to the FIFO). The other option is to ignore the ENXIOs, but then changes
-- will get lost. I'm not sure. Some more design (a subprocess? calling this from
-- a master script?) can probably fix that awkwardness.
--
-- The second point of flakiness is that attaching the trigger is rather verbose.
-- This can be solved with pl/pgsql subroutines.
--
-- This could probably be done in plpgsql, but I know python better, and it comes
-- with serialization (json, msgpack, pickle) easily available.
--
-- These tips are due to
--   http://www.postgresql.org/message-id/1405660725952-5811931.p...@n5.nabble.com and
--   http://www.postgresql.org/message-id/1405703990457-5811982.p...@n5.nabble.com
-- The reason I'm not using logical decoding
--   http://www.postgresql.org/docs/devel/static/logicaldecoding-example.html
-- is that it's still in devel---not even Arch Linux, usually full of bleeding-edge
-- code, has this feature yet. Plus it requires fiddling with the .conf file.

DROP TABLE IF EXISTS films CASCADE;
CREATE TABLE films (name text, kind text, rating int);

DROP FUNCTION IF EXISTS watch_table();
CREATE FUNCTION watch_table() RETURNS trigger AS $$
    tablename = TD["args"][0]
    FIFO = "_changes_%s" % (tablename,)
    if FIFO not in SD:
        # This is our first time running in this instance of the python
        # interpreter: run initializations.
        # PL/Python is really meant for small one-off tasks, mostly; most data
        # should probably just be stuffed straight into the database. However,
        # things like file descriptors don't work so well like that. For those,
        # we need the facilities PL/Python provides:
        #   http://www.postgresql.org/docs/9.3/static/plpython-sharing.html
        # Summary: SD stands for "static data" and behaves like static locals
        # in C (they must have some kind of trap table kicking around that
        # switches in values of SD when the appropriate function is called);
        # GD stands for "global data" and is the same everywhere. Both begin
        # as empty dictionaries.
        # Note also that it seems one python interpreter is invoked ~per client
        # connection~; not per-statement (which would be too fine) nor per
        import sys, os
        if os.path.exists(FIFO):
            # TODO: check that, if it exists, it's a FIFO and we have perms
```
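The two flaky points above--the ENXIO on writing to a reader-less FIFO, and caching state across trigger calls the way SD does--can be exercised outside postgres with plain Python. A sketch (paths and names are illustrative, and the SD dict is a stand-in for PL/Python's real one):

```python
import errno
import os
import tempfile

SD = {}  # stand-in for PL/Python's per-connection static-data dict

def ensure_fifo(sd, name, directory):
    """Create the FIFO once and cache its path in sd, as the trigger does in SD."""
    if name not in sd:
        path = os.path.join(directory, name)
        if not os.path.exists(path):
            os.mkfifo(path)
        sd[name] = path
    return sd[name]

def try_write(path, line):
    """Open the write end non-blocking: with no reader, the kernel raises ENXIO."""
    try:
        fd = os.open(path, os.O_WRONLY | os.O_NONBLOCK)
    except OSError as e:
        if e.errno == errno.ENXIO:
            return False  # no reader attached yet; this change would be lost
        raise
    try:
        os.write(fd, (line + "\n").encode())
    finally:
        os.close(fd)
    return True

tmp = tempfile.mkdtemp()
fifo = ensure_fifo(SD, "_changes_films", tmp)
```

Opening with O_NONBLOCK instead of a plain open also avoids the other failure mode, where the trigger (and hence the transaction) blocks forever waiting for a reader.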
Re: [GENERAL] Watching Views
Nick Guenther wrote:
> As you said, attaching the trigger to a view is useless (BEFORE/AFTER,
> which I'm interested in, also only works on statement-level changes,
> which I would rather not have to deal with). I tried attaching my
> trigger to a materialized view and found that postgres does not support
> that; as you said, I need to write a trigger on the source to keep the
> materialized view and the source in sync. But in that case I don't think
> a materialized view helps me at all, since without triggers on it I
> can't even have it compute the diffs for me (and I suspect that
> internally postgres simply truncates the old table and refills it, so
> there would be nothing to hook anyway).
>
> My bottleneck is the size of my database and the network: I want to take
> slices of a potentially gigabytes-large database and stream them out to
> multiple clients. Thus I need to stream--not poll--for changes. I think
> a materialized view would force me to poll for changes, and in that case
> I would be doing REFRESH MATERIALIZED VIEW on each poll and therefore
> not winning anything over a regular view. Ideally, when an update comes
> in to a parent table I want all views that have sliced it to be
> notified; I'm interested in doing dataflow* programming, essentially. Is
> there maybe some sort of extension that does dependency tracking inside
> of postgres?

While PostgreSQL has materialized view functionality built in, the concept is general and can be done manually. Instead of the main table having the link to the FIFO, I was thinking you would instead replicate record changes to all active subset tables, and then triggers on those tables would send the relevant changes out to the world. Keep in mind you can attach a WHERE clause to your trigger, and I think you can pass in arguments to it as well. You should have one trigger per view attached to the source table - though with good meta-data and some imposed limits you can probably pool some clients into the same stream.
Database size is less of an issue than the change rate of the affected table(s). Triggers let you plug into the change stream. You could even cascade the triggers, so that less restrictive filters are grouped together at one layer and those materialized views then forward to other tables with more restrictive filters. If you make these other tables UNLOGGED you should be able to mitigate the performance hit somewhat. Beyond that, if your views have common and reasonably broad high-level filters, you should consider both in-database partitioning and multi-server sharding.

The client, not the trigger, should create the FIFO. If the FIFO is unusable, the trigger should update a control table, and a monitor process should remove that trigger at its next scan. This could then be extended to provide control over the addition and removal of viewers and their corresponding schema objects. The view tables also help avoid the lost-changes problem, since even if a FIFO write fails you have an idea of what should have been, but was not, written, and can cause it to be written later once the client is listening.

Before 9.4, triggers are all you've got in stock PostgreSQL. 9.4 gives logical WAL decoding, but I have no clue how that works. There may be solutions via third-party tools like Slony, but I am not familiar with them, and they focus on direct database replication.

Mostly theorizing, as I've never actually coded this kind of process... so some of these ideas may not pan out.

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Watching-Views-tp5811927p5812680.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
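The control-table-plus-monitor idea sketched above can be simulated with plain data structures; everything here is a hypothetical stand-in (the dict plays the control table, and the comments note what the real operations would be):

```python
# Client registers its FIFO in a control table; on a failed write the trigger
# merely flags the watcher dead; a separate monitor process reaps dead
# watchers on its next scan instead of the trigger erroring the transaction.

control = {}  # watcher name -> {"fifo": path, "dead": flag}

def register(name, fifo):
    """Client-side: announce a new watcher and the FIFO it created."""
    control[name] = {"fifo": fifo, "dead": False}

def on_write_failure(name):
    """Trigger-side: instead of aborting the transaction, mark the watcher dead."""
    control[name]["dead"] = True

def monitor_scan():
    """Monitor-side periodic pass: reap dead watchers. In reality this would
    DROP TRIGGER, drop the per-view table, and unlink the FIFO."""
    dead = [name for name, watcher in control.items() if watcher["dead"]]
    for name in dead:
        del control[name]
    return dead

register("view13131", "/tmp/_changes_view13131")
on_write_failure("view13131")  # e.g. the ENXIO case from earlier in the thread
```

The key property is that the write path never blocks or fails the writing transaction; cleanup is deferred to the monitor, which can afford to be slow.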
Re: [GENERAL] Watching Views
Nick Guenther wrote:
> Dear List, I am interested in replicating views of my data in real time
> to a frontend visualizer. I've looked around, and it seems that most
> applications in this direction write some ad-hoc json-formatter that
> spits out exactly the columns it is interested in. I want something more
> like Cubes[1], where a user (or at least, some javascript) can say "I am
> interested in this slice of the world" and then get updates to that
> slice; but unlike Cubes it must be near-real-time: I want to hook
> events, not just redownload tables. In principle, I am looking for some
> way to say
>
> ```
> CREATE VIEW view13131 AS
>   select (id, name, bank_account) from actors where age > 22;
> WATCH view13131;
> ```
>
> and get output to stdout like
>
> ```
> INSERT view13131 VALUES (241, 'Mortimer', 131.09);
> ...
> INSERT view13131 VALUES (427, 'Schezwan', 95.89);
> UPDATE view13131 SET bank_account = 1017.12 WHERE id = 427;
> DELETE FROM view13131 WHERE id = 92;
> ...
> ```

9.4 - http://www.postgresql.org/docs/9.4/static/logicaldecoding.html

Though I doubt your requirement to obtain only a subset of data is something that can be accommodated, especially in SQL form.

And, yes, you can create triggers on views.
http://www.postgresql.org/docs/9.4/static/sql-createtrigger.html

But assuming your view is meant to be dynamic, covering only the subset of data you wish to watch, no one is going to be using your view to actually INSERT/UPDATE/DELETE against the underlying table(s), so it will not do you any good to add triggers to it. You probably need to create some kind of materialized view and add a trigger to the relevant source table to maintain that view on an ongoing basis. Then remove the trigger (and optionally the materialized view) when you no longer care to watch. This:
http://www.postgresql.org/docs/9.4/static/sql-creatematerializedview.html
or roll your own.
You can also use the full power of whatever programming languages you can install onto the server (e.g., pl/perl, pl/python, pl/sh) to link up with the outside world from inside one of those triggers...

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Watching-Views-tp5811927p5811931.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.