Re: [GENERAL] Database INNOVATION

2010-10-20 Thread Brent Wood
Gotcha.

Apologies for the digression, off your exact topic but consistent with the 
subject :-)

I'm interested in both, PL/R & representational graphics from an analytical 
perspective, doing more than just retrieving raw or accumulated data with SQL. 
& also from the (mathemetical) graphic perspective to support biological 
taxonomic trees/heirarchies, which do not easily fit the SQL model, although a 
number of kludges to traverse such structures are around. 

(I need to look at the Postgres recursive capability for this sometime)

Cheers,

  Brent

Brent Wood
DBA/GIS consultant
NIWA, Wellington
New Zealand
>>> Craig Ringer  10/20/10 6:12 PM >>>
On 10/20/2010 12:35 PM, Brent Wood wrote:
> Have a look at PL/R.
>
> You can embed a command to generate a graphic using R via a user defined
> SQL function,

In this case, when I say "graph" or "tree" I'm referring to the concept
in the graph theory sense, not the "plot" sense. "object graph" not
"image representation of data".

http://en.wikipedia.org/wiki/Graph_(mathematics)
http://en.wikipedia.org/wiki/Graph_theory

Sorry, I didn't even think to clarify my usage.

What I'm talking about is a way to query the database and obtain a
representation of matching tuples where each tuple is represented
exactly once, and referential relationships between tuples are included
in an efficient way.

For a simple tree or forest (ie a directed graph with no cycles) this
could be a XML/JSON/YAML/whatever document that uses nesting to
represent relationships.

For more complex graphs, it'd have to be a list of
XML/JSON/YAML/whatever representations of each tuple or (if Pg supported
it) multiple tabular result sets, one for each tuple type. An edge list
could be included to speed mapping out the inter-object references after
deserialization.

To say this would be nice when dealing with document-in-database storage
and certain types of ORM workload is quite an understatement. Getting
rid of all that horrid "multiply left join, filter and de-duplicate" or
"n+1 select" crap would be quite lovely. Sure, it's often better to use
sane SQL directly, but there are tasks for which ORMs or
document-database mappings are a real time and pain saver - it'd just be
nice to be able to teach the database their language.

Plus, that'd help shut up the "NoSQL" crowd and separate "NoSQL" from
"relaxed or no ACID shareded databases", two different things people
currently confuse.

In any case, thanks for the tip. It's nice to know the PL/R can be used
for such in-database processing when I *do* want to plot data.

--
Craig Ringer

Please consider the environment before printing this email.

NIWA is the trading name of the National Institute of Water & Atmospheric 
Research Ltd.


Re: [GENERAL] Database INNOVATION

2010-10-19 Thread Joe Conway
On 10/19/2010 10:12 PM, Craig Ringer wrote:
> On 10/20/2010 12:35 PM, Brent Wood wrote:
>> Have a look at PL/R.

> In this case, when I say "graph" or "tree" I'm referring to the concept
> in the graph theory sense, not the "plot" sense. "object graph" not
> "image representation of data".

Analysis of graphs is available via PL/R as well. See:
  http://cran.r-project.org/web/views/gR.html

Joe

-- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support



signature.asc
Description: OpenPGP digital signature


Re: [GENERAL] Database INNOVATION

2010-10-19 Thread Darren Duncan

Craig Ringer wrote:

On 20/10/10 13:12, Darren Duncan wrote:

Never mind JSON.  You can fix the outer joins problem and other issues
simply by supporting relation-valued attributes, or in other words, row
field values that are rowsets.


You can for trees/forests yes. How would you handle more general graphs
with cycles or bidirectional relationships, where you still want to be
able to reconstruct them into a traversible graph client-side?

There are existing graph databases for this, of course, but I've
frequently wished to be able to use the power of SQL's query language
and reporting facilities with my data as well as being able to extract
it as (sub)graphs when needed. Using a graph database would usually cost
me ACID, full SQL support, and in most cases all those goodies like
triggers, constraints, etc as well.

I think there's a real use case for using regular relational storage
with a few SQL extensions to support returning graph-style rather than
row-set style results. Even the SQL extensions probably aren't necesary;
I suspect a (limited and somewhat slow) version could be done purely in
a PL under PostgreSQL, and I've been thinking about trying to prototype one.


And recursively.  Parent records in
outer rowset, child records inside.  And this is all perfectly normal
for the relational model, and SQL's differing from this is part of how
SQL is crippled and not really the relational model.  I demonstrate how
it might be better done with my Muldis D language. -- Darren Duncan


You'd really wow people if you could bang together a working JPA 2.0
backend for that, or a dialect for an existing provider like Hibernate.

Personally, I'd love to give someting like your Muldis-D query interface
a go if it could live within PostgreSQL as a contrib module, using the
regular Pg storage and just providing an alternative query facility.

Right now, it looks like it is a perl-all-the-way system, with no
interfaces for other languages and its own database storage system. In
its current form a quick glance at the docs doesn't demonstrate any
obvious advantage of using it over any of the existing well-established
graph databases or object-relational database systems.


Muldis D is a work-in-progress.  The system is specified in terms of a 
programming language of which multiple implementations could exist, essentially 
the same way as SQL is.  I fully intend for it to run on Postgres, using regular 
Pg storage (which may gain features along the way), and be useable both in the 
manner of a PL and for general DDL/DML/queries/etc.  I agree that it is best for 
the initial all-in attempt to be done as a contrib module, and I am happy that 
Pg already has so much of the infrastructure in place that is needed.  The 
first, Perl-all-the-way, system is just the reference implementation, but the 
Pg-contrib version would likely be the first one for serious use.


If you or anyone wants to work on making this a reality, I welcome it.

-- Darren Duncan


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Database INNOVATION

2010-10-19 Thread Craig Ringer
On 20/10/10 13:12, Darren Duncan wrote:

> Never mind JSON.  You can fix the outer joins problem and other issues
> simply by supporting relation-valued attributes, or in other words, row
> field values that are rowsets.

You can for trees/forests yes. How would you handle more general graphs
with cycles or bidirectional relationships, where you still want to be
able to reconstruct them into a traversible graph client-side?

There are existing graph databases for this, of course, but I've
frequently wished to be able to use the power of SQL's query language
and reporting facilities with my data as well as being able to extract
it as (sub)graphs when needed. Using a graph database would usually cost
me ACID, full SQL support, and in most cases all those goodies like
triggers, constraints, etc as well.

I think there's a real use case for using regular relational storage
with a few SQL extensions to support returning graph-style rather than
row-set style results. Even the SQL extensions probably aren't necesary;
I suspect a (limited and somewhat slow) version could be done purely in
a PL under PostgreSQL, and I've been thinking about trying to prototype one.



> And recursively.  Parent records in
> outer rowset, child records inside.  And this is all perfectly normal
> for the relational model, and SQL's differing from this is part of how
> SQL is crippled and not really the relational model.  I demonstrate how
> it might be better done with my Muldis D language. -- Darren Duncan

You'd really wow people if you could bang together a working JPA 2.0
backend for that, or a dialect for an existing provider like Hibernate.

Personally, I'd love to give someting like your Muldis-D query interface
a go if it could live within PostgreSQL as a contrib module, using the
regular Pg storage and just providing an alternative query facility.

Right now, it looks like it is a perl-all-the-way system, with no
interfaces for other languages and its own database storage system. In
its current form a quick glance at the docs doesn't demonstrate any
obvious advantage of using it over any of the existing well-established
graph databases or object-relational database systems.

-- 
Craig Ringer

Tech-related writing: http://soapyfrogs.blogspot.com/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Database INNOVATION

2010-10-19 Thread Craig Ringer
On 10/20/2010 12:35 PM, Brent Wood wrote:
> Have a look at PL/R.
>
> You can embed a command to generate a graphic using R via a user defined
> SQL function,

In this case, when I say "graph" or "tree" I'm referring to the concept
in the graph theory sense, not the "plot" sense. "object graph" not
"image representation of data".

http://en.wikipedia.org/wiki/Graph_(mathematics)
http://en.wikipedia.org/wiki/Graph_theory

Sorry, I didn't even think to clarify my usage.

What I'm talking about is a way to query the database and obtain a
representation of matching tuples where each tuple is represented
exactly once, and referential relationships between tuples are included
in an efficient way.

For a simple tree or forest (ie a directed graph with no cycles) this
could be a XML/JSON/YAML/whatever document that uses nesting to
represent relationships.

For more complex graphs, it'd have to be a list of
XML/JSON/YAML/whatever representations of each tuple or (if Pg supported
it) multiple tabular result sets, one for each tuple type. An edge list
could be included to speed mapping out the inter-object references after
deserialization.

To say this would be nice when dealing with document-in-database storage
and certain types of ORM workload is quite an understatement. Getting
rid of all that horrid "multiply left join, filter and de-duplicate" or
"n+1 select" crap would be quite lovely. Sure, it's often better to use
sane SQL directly, but there are tasks for which ORMs or
document-database mappings are a real time and pain saver - it'd just be
nice to be able to teach the database their language.

Plus, that'd help shut up the "NoSQL" crowd and separate "NoSQL" from
"relaxed or no ACID shareded databases", two different things people
currently confuse.

In any case, thanks for the tip. It's nice to know the PL/R can be used
for such in-database processing when I *do* want to plot data.

--
Craig Ringer

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Database INNOVATION

2010-10-19 Thread Darren Duncan

Craig Ringer wrote:
Now, personally, if we're talking "database innovation" what I'd like to 
see is a built-in way to get query results straight from the database as 
graphs of tuples and their relationships. Tabular result sets are poorly 
suited to some kinds of workloads, including a few increasingly common 
ones like document-oriented storage and use via ORMs. In particular, the 
way ORMs tend to do multiple LEFT OUTER JOINs and deduplicate the 
results or do multiple queries and post-process to form a graph is 
wasteful and slow. If Pg had a way to output an object graph (or at 
least tree) natively as, say, JSON, that'd be a marvellous option for 
some kinds of workloads, and might help the NoSQL folks from whining 
quite so much as well ;-)


Never mind JSON.  You can fix the outer joins problem and other issues simply by 
supporting relation-valued attributes, or in other words, row field values that 
are rowsets.  And recursively.  Parent records in outer rowset, child records 
inside.  And this is all perfectly normal for the relational model, and SQL's 
differing from this is part of how SQL is crippled and not really the relational 
model.  I demonstrate how it might be better done with my Muldis D language. -- 
Darren Duncan


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Database INNOVATION

2010-10-19 Thread Brent Wood
Have a look at PL/R.

You can embed a command to generate a graphic using R via a user defined SQL 
function, 

This example from 
http://www.varlena.com/GeneralBits/Tidbits/bernier/art_66/graphingWithR.html

HTH

  Brent Wood

=
Graphs can be as easy as '123'. Here's an example where two columnsin a table 
are plotted against each other.
Create and populate the table using the following commands:
CREATE TABLE temp (x int, y int);

  INSERT INTO temp VALUES(4,6);INSERT INTO temp VALUES(8,3);INSERT INTO temp 
VALUES(4,7);INSERT INTO temp VALUES(1,5);INSERT INTO temp VALUES(7,8);INSERT 
INTO temp VALUES(2,3);INSERT INTO temp VALUES(5,1);INSERT INTO temp VALUES(9,4);
The function f_graph()generates the graph as a pdf document:
CREATE OR REPLACE FUNCTIONf_graph() RETURNS text AS 
'str <<- pg.spi.exec(''select x as "my a" ,y as"my b" from temp order by 
x,y'');pdf(''/tmp/myplot.pdf'');plot(str,type="l",main="GraphicsDemonstration",sub="Line
 Graph");dev.off();print(''done'');' 
LANGUAGE plr;
  Creating the graph by invoking this query:
SELECT f_graph();   





Craig Ringer said:

Now, personally, if we're talking "database innovation" what I'd like to 
see is a built-in way to get query results straight from the database as 
graphs of tuples and their relationships. Tabular result sets are poorly 
suited to some kinds of workloads, including a few increasingly common 
ones like document-oriented storage and use via ORMs. In particular, the 
way ORMs tend to do multiple LEFT OUTER JOINs and deduplicate the 
results or do multiple queries and post-process to form a graph is 
wasteful and slow. If Pg had a way to output an object graph (or at 
least tree) natively as, say, JSON, that'd be a marvellous option for 
some kinds of workloads, and might help the NoSQL folks from whining 
quite so much as well ;-)



-- 
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Brent Wood
DBA/GIS consultant
NIWA, Wellington
New Zealand
Please consider the environment before printing this email.

NIWA is the trading name of the National Institute of Water & Atmospheric 
Research Ltd.


Re: [GENERAL] Database INNOVATION

2010-10-19 Thread Craig Ringer

On 19/10/2010 10:12 PM, Mauricio Chamati wrote:

> I am recomming

to you guys to have an special "Section" for Audit tables. As we
separate sequence in another seq, we should do the same for Audit
tables. They have their own behavior, are less used, and so on.


As a senior software architect, you should know how to write clearer 
descriptions of what you want :-P


Your suggestion is really broad. It's not clear exactly what you are 
suggesting or what you actually want, or even what kind of audit tables 
you're referring to.


Are you suggesting that there should also be a facility to put some 
tables outside the normal transactional flow, so that writes to them are 
retained when a transaction rolls back? (If so: this can be done, albeit 
hackishly, with dblink).


Are you suggesting support for audit logging of SELECTs? ie "ON SELECT" 
triggers or some similar mechanism?


Do you want separate storage and/or WAL logging of audit tables for 
performance reasons? If Pg ever supports separate WAL logging for 
different databases or different sets of tables, that'd be an attractive 
option for audit tables. As it is, you already can use tablespaces to 
put them on different storage.


Do you want built-in support for marking tables/columns as "audited" so 
the database system automatically manages recording of audit data to 
audit tables with no need to define triggers etc?


If not, what exactly is it that you *do* want?


Now, personally, if we're talking "database innovation" what I'd like to 
see is a built-in way to get query results straight from the database as 
graphs of tuples and their relationships. Tabular result sets are poorly 
suited to some kinds of workloads, including a few increasingly common 
ones like document-oriented storage and use via ORMs. In particular, the 
way ORMs tend to do multiple LEFT OUTER JOINs and deduplicate the 
results or do multiple queries and post-process to form a graph is 
wasteful and slow. If Pg had a way to output an object graph (or at 
least tree) natively as, say, JSON, that'd be a marvellous option for 
some kinds of workloads, and might help the NoSQL folks from whining 
quite so much as well ;-)




--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Database INNOVATION

2010-10-19 Thread Scott Marlowe
On Tue, Oct 19, 2010 at 1:17 PM, Greg Smith  wrote:
> Scott Marlowe wrote:
>>
>> On a side note, MySQL is GPL licensed and cannot simply be taken.  The
>> commercially licensed version is owned by Oracle / Sun, but the GPL
>> code is quite free to be hacked and released and use according to the
>> GPL.
>
> As the MySQL documentation is not GPL licensed, such hacks will have to
> start over from scratch in that area.  Good luck with that.

Actually, given the horrible docs for mysql (sorry but I can't find
anything in there half the time) that's not a wholly bad thing.  Plus
you can always get the docs from somewhere else, if you have to, I
guess.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Database INNOVATION

2010-10-19 Thread Chris Browne
mcham...@gmail.com (Mauricio Chamati) writes:
> Postgree is the most amazing DB, even more it will be the only one that will
> remain free (the good ones) as MySQL has been taken. In order to move on with
> this project, as an Java Senior Architect, I am recomming to you guys to have
> an special "Section" for Audit tables. As we separate sequence in another seq,
> we should do the same for Audit tables. They have their own behavior, are less
> used, and so on. I got in the mailing list and am sending the email just to
> give this idea for you guys, keep in mind and implement as you want.

There is value to there being a tutorial (or such) available to explain
strategies for doing this.

I'm not so sure that this is something that properly belongs in the
Postgres reference documentation, as:

  - There are various strategies in the handling of auditing data, and
no forcibly normative/universal solutions;

  - This isn't an explanation of how to use Postgres (ergo part of the
"user guide" aspect of the documentation);

  - This isn't an aspect of the standard behaviour of Postgres (ergo
part of the "reference manual" aspect of the documentation).

It might be sort of neat to have another "Part" to the documentation
that consists of stuff like:
  - Design patterns
  - Design antipatterns (to avoid!)
  - Cookbook
  - Implementation strategies

(and there is overlap between those 4 things, to be sure!)

But that's not a portion of the Official Documentation that presently
exists, so I'm not sure it properly fits.

People have written articles on this sort of thing in the past,
including:

 - Elein Mustain, with material at varlena.com
 - Postgres Online Journal 
 - Articles blogged at Planet PostgreSQL
-- 
select 'cbbrowne' || '@' || 'gmail.com';
http://www3.sympatico.ca/cbbrowne/postgresql.html
"There are no threads in a.b.p.erotica, so there's no  gain in using a
threaded news reader."  -- (Unknown source)

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Database INNOVATION

2010-10-19 Thread Greg Smith

Scott Marlowe wrote:

On a side note, MySQL is GPL licensed and cannot simply be taken.  The
commercially licensed version is owned by Oracle / Sun, but the GPL
code is quite free to be hacked and released and use according to the
GPL.


As the MySQL documentation is not GPL licensed, such hacks will have to 
start over from scratch in that area.  Good luck with that.


--
Greg Smith, 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD
PostgreSQL Training, Services and Support  www.2ndQuadrant.us
Author, "PostgreSQL 9.0 High Performance"Pre-ordering at:
https://www.packtpub.com/postgresql-9-0-high-performance/book


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Database INNOVATION

2010-10-19 Thread Scott Marlowe
On Tue, Oct 19, 2010 at 8:12 AM, Mauricio Chamati  wrote:
> Folks,
> Postgree is the most amazing DB, even more it will be the only one that will
> remain free (the good ones) as MySQL has been taken.

On a side note, MySQL is GPL licensed and cannot simply be taken.  The
commercially licensed version is owned by Oracle / Sun, but the GPL
code is quite free to be hacked and released and use according to the
GPL.  Due to the idiocy of GPL licensing the connection libs, there
are issues with integrating it with non-GPL software, but it is most
certainly not taken.

I'm no MySQL fan, just wanted to clear that point up.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general