Re: [GENERAL] Watching Views

2014-09-18 Thread Nick Guenther


Ah! Your reply was excellent, David. I only found it now, cleaning out  
my inbox. Comments inline!


Quoting David G Johnston david.g.johns...@gmail.com:


Nick Guenther wrote

As you said, attaching the trigger to a view is useless (for
BEFORE/AFTER, which I'm interested in, also only works on statement
level changes, which I would rather not have to deal with). I tried
attaching my trigger to a materialized view and found that postgres
does not support that; as you said, I need to write a trigger on the
source to keep the materialized view and the source in sync. But in
that case I don't think a materialized view helps me at all, since
without triggers on it I can't even have it compute the diffs for me
(and I suspect that internally postgres simply truncates the old table
and refills it, so there would be nothing to hook anyway).

My bottleneck is the size of my database and the network: I want to
take slices of a potentially gigabytes-large database and stream them
out to multiple clients. Thus I need to stream--not poll--for changes.
I think a materialized view would force me to poll for changes, and in
that case I would be doing REFRESH MATERIALIZED VIEW on each poll and
therefore not winning anything over a regular view. Ideally, when an
update comes in to a parent table I want all views that have sliced it
to be notified; I'm interested in doing dataflow* programming,
essentially. Is there maybe some sort of extension that does
dependency tracking inside of postgres?


While PostgreSQL has materialized view functionality built in, the concept is
general and can be done manually.  Instead of the main table having the link
to the FIFO I was thinking you would instead replicate record changes to all
active subset tables and then triggers on those tables would send the
relevant changes out to the world.

Keep in mind you can attach a WHERE clause to your trigger, and I think you
can pass in arguments to it as well.  You should have one trigger per view
attached to the source table - though with good meta-data and some imposed
limits you can probably pool some clients into the same stream.


I actually ended up independently deciding on using triggers.
I spent a long time agonizing over trying to avoid them, because
triggers tie me to postgres, but they ended up not being so bad,
and I even think that duplicating the work for MySQL, at least,
should not be that bad.

The hooks I wrote are in python (don't laugh) and are at  
https://github.com/kousu/modex/tree/databased/src/backend/db/replicate.pysql.

All the hooks do is export {INSERT,UPDATE,DELETE}s to json.
So far filtering is only in my frontend, in javascript. I did this because
 http://nytimes.github.io/pourover/
rightly points out that centralizing database slicing means creating
an artificial bottleneck. If you're curious, what I've got so far is
at https://github.com/kousu/modex/blob/databased/src/frontend/tables.js
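A stripped-down sketch of what such a hook amounts to: turning a PL/Python-style trigger dictionary into one json line per change. The marker convention and the function name here are made up for illustration, not the exact replicate.pysql code:

```python
import json

# '+' insert, '~' update, '-' delete: a made-up one-character marker scheme.
MARKERS = {"INSERT": "+", "UPDATE": "~", "DELETE": "-"}

def change_line(td):
    """Render a trigger dict (shaped like PL/Python's TD) as one json line."""
    op = td["event"]                              # "INSERT" | "UPDATE" | "DELETE"
    row = td["old"] if op == "DELETE" else td["new"]
    return "%s %s" % (MARKERS[op], json.dumps(row, sort_keys=True))
```

The frontend can then parse each line back into an event and apply it to its local copy of the slice.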

But now I'm musing on how to do the same in the database.

For context, my use case has several game instances in parallel. So,  
each game has many clients which should share a part of the database,  
and each client individually further slices the data depending on what  
its user wants to investigate. Sharing data between the games and then  
having them use their in-javascript triggers to just ignore each other  
is possible--and my current implementation supports that--but it would  
be a lot better and less memory- and bandwidth-heavy if I could have
the database do the initial "share a part" slice; even better,
slices could be deduplicated--that is, I could pool clients
onto a single table instead.


Right now, I foresee the final javascript API being something like

```
var DB = Remote_DB.table("buildings")      // create an AST representing
    .where(Column("run_id")                //   select location, kind, cost
        .eq(this.run_id))                  //   from buildings where run_id = $1
    .select("location", "kind", "cost");
DB = DB.execute();    // invoke the AST inside of postgres, which should
                      // wire up triggers doing dataflow

var points = DB.where(Column("location")   // create an in-javascript dataflow node
    .within((x, y), 500));                 // which uses triggers (currently
                                           // PourOver's Events) to watch
                                           // the source for changes.

var expensive = DB.max("expensive");       // create another one, which will
                                           // update simultaneously with its sibling
                                           // when the source database changes
```
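For comparison, a rough Python analogue of the compile-the-AST step (the API is hypothetical, just mirroring the javascript sketch above):

```python
# A tiny immutable query AST: each method returns a new node, and
# compile() renders the tree to parameterized SQL text. Names are made up.
class Query:
    def __init__(self, table, wheres=(), columns=("*",)):
        self.table, self.wheres, self.columns = table, tuple(wheres), tuple(columns)

    def where(self, column, value):
        return Query(self.table, self.wheres + ((column, value),), self.columns)

    def select(self, *columns):
        return Query(self.table, self.wheres, columns)

    def compile(self):
        sql = "SELECT %s FROM %s" % (", ".join(self.columns), self.table)
        params = []
        if self.wheres:
            clauses = []
            for i, (col, val) in enumerate(self.wheres, start=1):
                clauses.append("%s = $%d" % (col, i))  # $1, $2, ... placeholders
                params.append(val)
            sql += " WHERE " + " AND ".join(clauses)
        return sql, params
```

In the real thing, `execute()` would ship this SQL to postgres and wire up the triggers; here compile() just shows the shape of the translation.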

I would like the AST (and postgres hooks that back it) to have all the  
same methods as the javascript-dataflow part so that changing the  
boundary of 

Re: [GENERAL] Watching Views

2014-07-23 Thread Nick Guenther




Quoting David G Johnston david.g.johns...@gmail.com:


Nick Guenther wrote

Dear List,

In principle, I am looking for some way to say
```
CREATE VIEW view13131 AS select (id, name, bank_account) from actors
where age < 22;
WATCH view13131;
```

and get output to stdout like
```

INSERT view13131 VALUES (241, 'Mortimer', 131.09);
...
INSERT view13131 VALUES (427, 'Schezwan', 95.89);
UPDATE view13131 SET bank_account = 1017.12 WHERE id = 427;
DELETE FROM view13131 WHERE id = 92;
...
```


9.4 - http://www.postgresql.org/docs/9.4/static/logicaldecoding.html

Though I doubt your requirement to obtain only a subset of data is something
that can be accommodated, especially in SQL form.

And, yes, you can create triggers on views.

http://www.postgresql.org/docs/9.4/static/sql-createtrigger.html

But assuming your view is meant to be dynamic, covering only the subset of
data you wish to watch, no one is going to be using your view to actually
Insert/Update/Delete against the underlying table(s) so it will not do you
any good to add triggers to it.

You probably need to create some kind of materialized view and add a trigger
to the relevant source table to maintain that view on an ongoing basis.
Then remove the trigger (and optionally the materialized view) when you no
longer care to watch.

This:
http://www.postgresql.org/docs/9.4/static/sql-creatematerializedview.html
 or roll your own.

You can also use the full power of whatever programming languages you can
install onto the server (e.g., pl/perl, pl/python, pl/sh) to link up with
the outside world from inside one of those triggers...


Thanks David, your tips were very helpful. I'm not a SQL expert, but
these ideas springboarded me ahead. I've been playing with them
this week and I've come up with a way to extract the logical changes
without using 9.4, which I'll share here in case anyone else is
curious. This is preliminary, so don't rely on it for anything
important. It doesn't write "DELETE FROM ..." lines, but it does write
the data in a json-esque format which could, without too much work, be
turned into my invented WATCH language.



```{psql}
-- watch.psql
-- This postgres + pl/python2 script demonstrates watching changes to a table
-- via a trigger.

-- Python opens a FIFO on which it writes lines like
--   + {'rating': 3, 'kind': 'Documentary', 'name': 'The Mail Man'}
-- The FIFO part is the flakiest bit, because it requires you to load the
-- trigger, then immediately run a reader
-- (e.g. `tail -f /path/to/postgres/data/_changes_films`)
-- *before* any DB updates happen.
-- If you have no reader, updates will fail (the kernel will raise ENXIO at
-- the write to the FIFO).
-- The other option is to ignore the ENXIOs, but then changes will get lost.
-- I'm not sure. Some more design (a subprocess? calling this from a master
-- script?) can probably fix that awkwardness.
--
-- The second point of flakiness is that attaching the trigger is rather
-- verbose. This can be solved with pl/pgsql subroutines.
--
-- This could probably be done in plpgsql, but I know python better, and it
-- comes with serialization (json, msgpack, pickle) easily available.

-- These tips are due to
--   http://www.postgresql.org/message-id/1405660725952-5811931.p...@n5.nabble.com and
--   http://www.postgresql.org/message-id/1405703990457-5811982.p...@n5.nabble.com
-- The reason I'm not using Logical Decoding
-- <http://www.postgresql.org/docs/devel/static/logicaldecoding-example.html>
-- is because it's still in devel---not even Arch Linux, usually full of
-- bleeding edge code, has this feature yet. Plus it requires fiddling with
-- the .conf file.


DROP TABLE IF EXISTS films CASCADE;
CREATE TABLE films (name text, kind text, rating int);


DROP FUNCTION IF EXISTS watch_table();
CREATE FUNCTION watch_table() RETURNS trigger AS $$
  tablename = TD["args"][0]

  FIFO = "_changes_%s" % (tablename,)
  if FIFO not in SD:
    # This is our first time running in this instance of the python
    # interpreter: run initializations.

    # PL/Python is really meant for small one-off tasks, mostly. Most data
    # should probably just be stuffed straight into the database.
    # However, things like file descriptors don't work so well like that;
    # for these things, we need to use the facilities PL/Python provides:
    # http://www.postgresql.org/docs/9.3/static/plpython-sharing.html
    # Summary: SD stands for "static data" and behaves like static locals
    # in C (they must have some kind of trap table kicking around that
    # switches in values of SD when the appropriate function is called).
    # GD stands for "global data" and is the same everywhere.
    # Both begin as empty dictionaries.
    # Note also that it seems one python interpreter is invoked ~per client
    # connection~; not per-statement (which would be too fine) nor per

    import sys, os

    if os.path.exists(FIFO):
      # TODO: check that, if it exists, it's a FIFO and we have perms
```

Re: [GENERAL] Watching Views

2014-07-23 Thread David G Johnston
Nick Guenther wrote
 As you said, attaching the trigger to a view is useless (for  
 BEFORE/AFTER, which I'm interested in, also only works on statement  
 level changes, which I would rather not have to deal with). I tried  
 attaching my trigger to a materialized view and found that postgres  
 does not support that; as you said, I need to write a trigger on the  
 source to keep the materialized view and the source in sync. But in  
 that case I don't think a materialized view helps me at all, since  
 without triggers on it I can't even have it compute the diffs for me  
 (and I suspect that internally postgres simply truncates the old table  
 and refills it, so there would be nothing to hook anyway).
 
 My bottleneck is the size of my database and the network: I want to  
 take slices of a potentially gigabytes-large database and stream them  
 out to multiple clients. Thus I need to stream--not poll--for changes.  
 I think a materialized view would force me to poll for changes, and in  
 that case I would be doing REFRESH MATERIALIZED VIEW on each poll and  
 therefore not winning anything over a regular view. Ideally, when an  
 update comes in to a parent table I want all views that have sliced it  
 to be notified; I'm interested in doing dataflow* programming,  
 essentially. Is there maybe some sort of extension that does  
 dependency tracking inside of postgres?

While PostgreSQL has materialized view functionality built in, the concept is
general and can be done manually.  Instead of the main table having the link
to the FIFO I was thinking you would instead replicate record changes to all
active subset tables and then triggers on those tables would send the
relevant changes out to the world.

Keep in mind you can attach a WHERE clause to your trigger, and I think you
can pass in arguments to it as well.  You should have one trigger per view
attached to the source table - though with good meta-data and some imposed
limits you can probably pool some clients into the same stream.
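Sketched outside the database, this replicate-to-subset-tables idea amounts to routing each source change through per-view predicates. A toy in-memory model (all names hypothetical, not actual trigger code):

```python
# Each "view" is a predicate (the trigger's WHEN clause) plus a subset
# table; a change on the source is replicated only into matching subsets.
class SubsetView:
    def __init__(self, name, predicate):
        self.name = name
        self.predicate = predicate   # plays the role of the WHEN clause
        self.rows = {}               # the "active subset table"
        self.outbox = []             # changes a trigger would stream out

    def apply(self, op, key, row):
        if row is not None and not self.predicate(row):
            return                   # row falls outside this view's slice
        if op == "DELETE":
            if key in self.rows:
                del self.rows[key]
                self.outbox.append(("DELETE", key))
        else:                        # INSERT or UPDATE
            self.rows[key] = row
            self.outbox.append((op, key, row))

def replicate(views, op, key, row=None):
    """Fan a source-table change out to every active subset view."""
    for v in views:
        v.apply(op, key, row)
```

In the real thing the subset would be an actual (possibly UNLOGGED) table and the outbox a FIFO write; pooling clients onto one stream means several clients sharing one SubsetView.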

Database size is less of an issue than the change rate of the affected
table(s).  Triggers let you plug into the change stream.

You could even cascade the triggers so less restrictive filters are grouped
together at one layer and those materialized views then forward to other
tables with more restrictive filters.

If you make these other tables UNLOGGED you should be able to mitigate the
performance hit somewhat.

Beyond that if your views have common and reasonably broad high-level
filters you should consider both in-database partitioning and multiserver
sharding.

The client, not the trigger, should create the FIFO.  If the FIFO is
unusable the trigger should update a control table and a monitor process
should remove that trigger at the next scan.  This should then be extended
to provide control over the addition and removal of viewers and their
corresponding schema objects.

The view tables also help avoid the problem since then even if a FIFO write
fails you have an idea of what should have been, but was not, written and
can cause it to be written later once the client is listening.
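In miniature, that write-it-later idea might look like this (an in-memory stand-in for the control table; all names hypothetical):

```python
# Buffer changes whose delivery failed, preserving order, and replay them
# once the client is listening again.
class ChangeBuffer:
    def __init__(self, send):
        self.send = send      # callable(change) -> bool (True = delivered)
        self.backlog = []

    def publish(self, change):
        # If anything is already queued, queue this too -- delivering it
        # now would reorder the stream.
        if self.backlog or not self.send(change):
            self.backlog.append(change)

    def flush(self):
        """Replay pending changes in order; stop at the first failure."""
        while self.backlog:
            if not self.send(self.backlog[0]):
                return False
            self.backlog.pop(0)
        return True
```

A monitor process would call flush() periodically, and drop the viewer (and its trigger) entirely if the backlog grows without bound.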

Before stock 9.4, triggers are all you get.  9.4 gives logical WAL decoding,
but I have no clue how that works.  There may be solutions via third-party
tools like Slony, but I am not familiar with them; they focus on direct
database replication.

Mostly theorizing as I've never actually coded this kind of process...so
some of these ideas may not pan out.

David J.




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Watching-Views-tp5811927p5812680.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Watching Views

2014-07-17 Thread David G Johnston
Nick Guenther wrote
 Dear List,
 
 I am interested in replicating views of my data in real time to a  
 frontend visualizer. I've looked around, and it seems that most  
 applications in this direction write some ad-hoc json-formatter that  
 spits out exactly the columns it is interested in. I want something  
 more like Cubes[1], where a user (or at least, some javascript) can  
 say "I am interested in this slice of the world", and then get updates  
 to that slice, but unlike Cubes it must be near-real-time: I want to  
 hook events, not just redownload tables.
 
 
 In principle, I am looking for some way to say
 ```
 CREATE VIEW view13131 AS select (id, name, bank_account) from actors  
 where age < 22;
 WATCH view13131;
 ```
 
 and get output to stdout like
 ```
 
 INSERT view13131 VALUES (241, 'Mortimer', 131.09);
 ...
 INSERT view13131 VALUES (427, 'Schezwan', 95.89);
 UPDATE view13131 SET bank_account = 1017.12 WHERE id = 427;
 DELETE FROM view13131 WHERE id = 92;
 ...
 ```

9.4 - http://www.postgresql.org/docs/9.4/static/logicaldecoding.html

Though I doubt your requirement to obtain only a subset of data is something
that can be accommodated, especially in SQL form.

And, yes, you can create triggers on views.

http://www.postgresql.org/docs/9.4/static/sql-createtrigger.html

But assuming your view is meant to be dynamic, covering only the subset of
data you wish to watch, no one is going to be using your view to actually
Insert/Update/Delete against the underlying table(s) so it will not do you
any good to add triggers to it.

You probably need to create some kind of materialized view and add a trigger
to the relevant source table to maintain that view on an ongoing basis. 
Then remove the trigger (and optionally the materialized view) when you no
longer care to watch.

This:
http://www.postgresql.org/docs/9.4/static/sql-creatematerializedview.html
 or roll your own.

You can also use the full power of whatever programming languages you can
install onto the server (e.g., pl/perl, pl/python, pl/sh) to link up with
the outside world from inside one of those triggers...
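As a sketch of that last suggestion: a PL/Python-style trigger body written as an ordinary function, so it can be exercised outside the server. Inside postgres, TD (and plpy) are globals provided to the function; here TD is passed in, and `deliver` stands in for whatever link to the outside world you choose (a FIFO, a socket, NOTIFY):

```python
import json

def watch_trigger(TD, deliver):
    """Serialize one row change from a trigger dict and hand it off."""
    event = TD["event"]                       # "INSERT" | "UPDATE" | "DELETE"
    payload = {
        "table": TD["table_name"],
        "event": event,
        # deleted rows only exist in TD["old"]; inserts/updates in TD["new"]
        "row": TD["old"] if event == "DELETE" else TD["new"],
    }
    deliver(json.dumps(payload, sort_keys=True))
    return None                               # AFTER triggers ignore the return
```

The keys "event", "old", "new", and "table_name" are the ones PL/Python's TD actually provides; everything else (the payload shape, the deliver callable) is illustrative.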

David J.





--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Watching-Views-tp5811927p5811931.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

