Re: [HACKERS] [PATCH] Minor typo in the source repository documentation

2017-07-09 Thread Pantelis Theodosiou
On Sun, Jul 9, 2017 at 6:27 PM, Martin Mai  wrote:

> Hello Tom,
>
> thanks for the clarification. The sentence felt a little bumpy to me,
> but I am a native German speaker, so maybe it is just me then.
>
> Since "they" means the tools, would changing
> ... the files they are used to build are included in the tarball.
> to
> ... the files they use to build [it] are included in the tarball.
> convey the intended meaning more clearly?
>

Perhaps:

... since the necessary files are included in the tarball.


Re: [HACKERS] Make subquery alias optional in FROM clause

2017-02-23 Thread Pantelis Theodosiou
On Thu, Feb 23, 2017 at 3:17 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> Robert Haas <robertmh...@gmail.com> writes:
> > On Wed, Feb 22, 2017 at 10:33 PM, Nico Williams <n...@cryptonector.com>
> wrote:
> >> I suspect most users, like me, just roll their eyes, grumble, and put up
> >> with it rather than complain.  It's a pain point, but tolerable enough
> >> that no one bothers to demand a change.  Now that it's been done though,
> >> allow me to add my voice in favor of it!
>
> > +1 to all of that.


My vote doesn't count much but it would be -1.

Question: Will the patch be removed if and when Oracle decides to be
compatible with the standard and forbids non-aliased derived tables?

(I know it's a rather theoretical question. It's unlikely that Oracle would
break backwards compatibility on that.)

Pantelis Theodosiou


Re: [HACKERS] Idea on how to simplify comparing two sets

2017-02-08 Thread Pantelis Theodosiou
On Tue, Feb 7, 2017 at 3:13 PM, Joel Jacobson <j...@trustly.com> wrote:

>  Hi hackers,
>
> Currently there is no simple way to check if two sets are equal.
>
> Looks like no RDBMS in the world has a simple command for it.
>
> You have to do something like:
>
> ...
>
> Introducing new SQL keywords is of course not an option,
> since it would possibly break backwards compatibility.
>
>
I'm not advocating it but I don't see how introducing new SQL keywords
breaks backwards compatibility.

Pantelis Theodosiou


Re: [HACKERS] Idea on how to simplify comparing two sets

2017-02-08 Thread Pantelis Theodosiou
On Tue, Feb 7, 2017 at 3:58 PM, Tom Lane  wrote:

> Joel Jacobson  writes:
> > Currently there is no simple way to check if two sets are equal.
>
> Uh ... maybe check whether SELECT set1 EXCEPT SELECT set2
> and SELECT set2 EXCEPT SELECT set1 are both empty?
>
> regards, tom lane
>
>
Yes, if the wanted result is true or false, something like this:

SELECT EXISTS (TABLE a EXCEPT TABLE b)
OR EXISTS (TABLE b EXCEPT TABLE a) ;


And if a new operator was added (in the same category as UNION and
EXCEPT), it could be:


SELECT EXISTS (TABLE a XORSET TABLE b) ;

What about using the = and <> operators on sets? Is the following
allowed by the standard?


SELECT (TABLE a) <> (TABLE b) ;
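For what it's worth, the two-way EXCEPT test wraps up nicely; here is a quick sketch using SQLite from Python (the tables a, b, c are made up for the demo, and PostgreSQL's TABLE a shorthand is spelled SELECT * FROM a; note EXCEPT compares sets, so duplicate rows are ignored):

```python
import sqlite3

def except_empty(conn, t1, t2):
    # Rows in t1 that are not in t2 (set semantics: duplicates collapse).
    rows = conn.execute(
        f"SELECT * FROM {t1} EXCEPT SELECT * FROM {t2}").fetchall()
    return len(rows) == 0

def sets_equal(conn, t1, t2):
    # Equal as sets iff both directions of EXCEPT come back empty.
    return except_empty(conn, t1, t2) and except_empty(conn, t2, t1)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a(x); CREATE TABLE b(x); CREATE TABLE c(x);
    INSERT INTO a VALUES (1),(2),(2);  -- duplicate row
    INSERT INTO b VALUES (2),(1);
    INSERT INTO c VALUES (1),(3);
""")
print(sets_equal(conn, "a", "b"))  # True: {1, 2} on both sides
print(sets_equal(conn, "a", "c"))  # False: 2 vs 3
```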


Re: [HACKERS] PoC: Grouped base relation

2017-01-10 Thread Pantelis Theodosiou
On Tue, Jan 10, 2017 at 6:52 PM, Pantelis Theodosiou <yperc...@gmail.com>
wrote:

>
>
> On Mon, Jan 9, 2017 at 5:56 PM, Antonin Houska <a...@cybertec.at> wrote:
>
>> Attached is a draft patch that lets partial aggregation happen at base
>> relation level. If the relations contain relatively small number of
>> groups,
>> the number of input rows of the aggregation at the query level can be
>> reduced
>> this way.  Also, if append relation and postgres_fdw planning is enhanced
>> accordingly, patch like this can let us aggregate individual tables on
>> remote
>> servers (e.g. shard nodes) and thus reduce the amount of rows subject to
>> the
>> final aggregation.
>>
>> For example, consider query
>>
>> SELECT b.j, sum(a.x) FROM a, b WHERE a.i = b.j GROUP BY b.j;
>>
>> and tables "a"
>>
>>  i | x
>> ---+---
>>  1 | 3
>>  1 | 4
>>
>> and "b"
>>
>>  j
>> ---
>>  1
>>  1
>>
>
> The example should have j = 1, 2, right?
>
> and "b"
>
>  j
> ---
>  1
>  2
>
>
>
>> The base relations grouped look like
>>
>>  i | sum(a.x) | count(*)
>> ---+----------+----------
>>  1 |        7 |        2
>>
>
>
> Otherwise, the sum and count would be 14 and 4.
>
>
>>
>> and
>>
>>  j | count(*)
>> ---+----------
>>  1 |        2
>>
>>
>>
>>
>
> Pantelis
>

Or perhaps I should read the whole email more carefully before
posting. Ignore the previous.
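For the record, the arithmetic in the quoted example can be checked directly; a quick SQLite sketch (table contents exactly as quoted, b = {1, 1}) comparing the plain aggregate with the pre-grouped, count-scaled one the patch describes:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a(i INTEGER, x INTEGER);
    CREATE TABLE b(j INTEGER);
    INSERT INTO a VALUES (1, 3), (1, 4);
    INSERT INTO b VALUES (1), (1);
""")
# The query as written in the proposal.
plain = conn.execute(
    "SELECT b.j, sum(a.x) FROM a, b WHERE a.i = b.j GROUP BY b.j").fetchall()
# Partial aggregation at base relation level: each side is grouped
# first, then the partial sum is scaled by the other side's row count.
grouped = conn.execute("""
    SELECT gb.j, ga.sx * gb.cnt
    FROM (SELECT i, sum(x) AS sx, count(*) AS cnt FROM a GROUP BY i) AS ga
    JOIN (SELECT j, count(*) AS cnt FROM b GROUP BY j) AS gb ON ga.i = gb.j
""").fetchall()
print(plain, grouped)  # [(1, 14)] [(1, 14)]
```

So the grouped-a table (i=1, sum 7, count 2) holds with b = {1, 1}, and the scaled partial sum reproduces the full aggregate.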


Re: [HACKERS] PoC: Grouped base relation

2017-01-10 Thread Pantelis Theodosiou
On Mon, Jan 9, 2017 at 5:56 PM, Antonin Houska  wrote:

> Attached is a draft patch that lets partial aggregation happen at base
> relation level. If the relations contain relatively small number of groups,
> the number of input rows of the aggregation at the query level can be
> reduced
> this way.  Also, if append relation and postgres_fdw planning is enhanced
> accordingly, patch like this can let us aggregate individual tables on
> remote
> servers (e.g. shard nodes) and thus reduce the amount of rows subject to
> the
> final aggregation.
>
> For example, consider query
>
> SELECT b.j, sum(a.x) FROM a, b WHERE a.i = b.j GROUP BY b.j;
>
> and tables "a"
>
>  i | x
> ---+---
>  1 | 3
>  1 | 4
>
> and "b"
>
>  j
> ---
>  1
>  1
>

The example should have j = 1, 2, right?

and "b"

 j
---
 1
 2



> The base relations grouped look like
>
>  i | sum(a.x) | count(*)
> ---+----------+----------
>  1 |        7 |        2
>


Otherwise, the sum and count would be 14 and 4.


>
> and
>
>  j | count(*)
> ---+----------
>  1 |        2
>
>
>
>

Pantelis


Re: [HACKERS] missing optimization - column <> column

2016-12-05 Thread Pantelis Theodosiou
On Mon, Dec 5, 2016 at 7:02 PM, Corey Huinker 
wrote:

>
>> Would it be worth replacing the condition with an equivalent?
>> I mean, would that help the planner optimize some queries better when it
>> knows that a is (not) nullable, or when "a" is a more complicated
>> expression?
>>
>> a <> a:   (a IS NULL) AND NULL
>> a = a :   (a IS NOT NULL) OR NULL
>>
>
>
> I think you're looking for
>
> a IS DISTINCT FROM a
>
>
> And that will work for cases where a might be null.
>
> I have no opinion about whether adding such a test to the planner is worth
> it.
>

No, (a IS DISTINCT FROM a) will evaluate to FALSE when a is NULL. The
other conditions, (a <> a) and ((a IS NULL) AND NULL), will evaluate to NULL.
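This is easy to see from a quick SQLite session in Python (SQLite spells the null-safe comparison IS NOT rather than IS DISTINCT FROM, and SQL NULL surfaces as Python's None):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# a <> a follows three-valued logic, so it is NULL for NULL input;
# a IS NOT a (i.e. IS DISTINCT FROM) is null-safe and plain FALSE.
row = conn.execute(
    "SELECT NULL <> NULL, NULL IS NOT NULL, 5 <> 5, 5 IS NOT 5").fetchone()
print(row)  # (None, 0, 0, 0)
```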


Re: [HACKERS] missing optimization - column <> column

2016-12-05 Thread Pantelis Theodosiou
On Mon, Dec 5, 2016 at 3:24 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> Pavel Stehule <pavel.steh...@gmail.com> writes:
> > I found some crazy queries in one customer application. These queries are
> > stupid, but it was surprise for me so there are not some simple
> optimization
>
> > create table foo(a int);
> > insert into foo select generate_series(1,10);
> > analyze foo;
> > explain select * from foo where a <> a;
>
> > It does full scan of foo, although it should be replaced by false in
> > planner time.
>
> > Same issue is a expression a = a .. can be replaced by true
>
> Wrong; those expressions yield NULL for NULL input.  You could perhaps
> optimize them slightly into some form of is-null test, but it hardly
> seems worth the planner cycles to check for.
>
> If you write something like "1 <> 1", it will be folded.
>
> regards, tom lane
>
>
>

Would it be worth replacing the condition with an equivalent?
I mean, would that help the planner optimize some queries better when it
knows that a is (not) nullable, or when "a" is a more complicated expression?

a <> a:   (a IS NULL) AND NULL
a = a :   (a IS NOT NULL) OR NULL
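A quick truth-table check of these two proposed rewrites, sketched with SQLite from Python (SQL NULL comes back as None):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Compare each expression with its proposed rewrite, for a NULL and a
# non-NULL value of a.
for a in ("NULL", "5"):
    ne, ne_rw, eq, eq_rw = conn.execute(
        f"SELECT {a} <> {a}, ({a} IS NULL) AND NULL,"
        f"       {a} = {a}, ({a} IS NOT NULL) OR NULL").fetchone()
    assert ne == ne_rw and eq == eq_rw
    print(a, ne, eq)  # NULL None None, then 5 0 1
```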

Pantelis Theodosiou


Re: [HACKERS] Indirect indexes

2016-10-21 Thread Pantelis Theodosiou
On Thu, Oct 20, 2016 at 4:24 PM, Bruce Momjian  wrote:

> On Thu, Oct 20, 2016 at 05:14:51PM +0200, Petr Jelinek wrote:
> > > Also, it seems indirect indexes would be useful for indexing columns
> > > that are not updated frequently on tables that are updated frequently,
> > > and whose primary key is not updated frequently.  That's quite a logic
> > > problem for users to understand.
> > >
> >
> > Which covers like 99.9% of problematic cases I see on daily basis.
> >
> > And by that logic we should not have indexes at all, they are not
> > automatically created and user needs to think about if they need them or
> > not.
>
> Do you have to resort to extreme statements to make your point?  The use
> of indexes is clear to most users, while the use of indirect indexes
> would not be, as I stated earlier.
>

It's not that difficult to explain, I think. We just tell non-sophisticated
users that they are similar to the non-clustered indexes that other DBMSs
have (SQL Server, MySQL), which add the PK columns to the non-clustered
index when the table is clustered. Same as there, the index doesn't need an
update when neither the indexed columns nor the PK is updated.
So we have the same benefit, except that we have the feature for our heap
tables.

I think it's the same for any other feature that is added (partial indexes,
cubes, new syntax like LATERAL and FILTER). People will learn and start to
use it. We can't expect it to be used by everyone the day it's released.


>
> > Also helping user who does not have performance problem by 1% is very
> > different from helping user who has performance problem by 50% even if
> > she needs to think about the solution a bit.
> >
> > WARM can do WARM update 50% of time, indirect index can do HOT update
> > 100% of time (provided the column is not changed), I don't see why we
> > could not have both solutions.
>
> We don't know enough about the limits of WARM to say it is limited to
> 50%.
>
>
>


Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-05 Thread Pantelis Theodosiou
On Thu, Oct 6, 2016 at 12:05 AM, Serge Rielau  wrote:

>
> via Newton Mail
> 
>
> On Wed, Oct 5, 2016 at 3:58 PM, Tom Lane  wrote:
>
> Andres Freund  writes:
> > On 2016-10-05 15:44:56 -0700, Jeff Janes wrote:
> >>> No, "a second “exist default"" was mentioned, i.e. it is an additional
> >>> column in a system table (pg_attribute) as default column values of
> >>> the "pre-alter" era. It solves changing of the default expression of
> >>> the same column later.
>
> > Don't think that actually solves the issue. The default might be unset
> > for a while, for example. Essentially you'd need to be able to associate
> > arbitrary number of default values with an arbitrary set of rows.
>
> I think it does work, as long as the "exists default" is immutable.
> (For safety, personally, I'd restrict it to be a verbatim constant.)
> The point is that you apply that when you are reading a row that has
> so few columns that it must predate the original ALTER TABLE ADD COLUMN.
> Subsequent ADD/DROP/SET DEFAULT affect the "active default" and hence
> insertions that happen after them, but they don't affect the
> interpretation of old rows. And of course all rows inserted after the
> ADD COLUMN contain explicit values of the column, so their meaning is
> unaffected in any case.
>
>
> You do need two defaults associated with a column to make this work.
> The "exists default" never changes after the column is added. But
> in principle, the "exists default" just replaces the NULL value that
> we implicitly insert now in such cases.
>
> Explained so much better than I could do it :-)
>
> I want to point out as a minor “extension” that there is no need for the
> default to be immutable. It is merely required that the default be evaluated
> at the time of ADD COLUMN,
> and then we remember the actual value for the exist default, rather than
> the parsed expression as we do for the “current” default.
>
>
> Need a better name for the concept, since evidently this name isn't
> conveying the idea.
>
> By all means. Got anything in mind?
>
>
>
For comparison, here is SQL Server's implementation. They have a similar
feature (in their Enterprise edition only).
From https://msdn.microsoft.com/en-us/library/ms190273.aspx :

Adding NOT NULL Columns as an Online Operation

Starting with SQL Server 2012 Enterprise Edition, adding a NOT NULL column
with a default value is an online operation when the default value is
a *runtime
constant*. This means that the operation is completed almost
instantaneously regardless of the number of rows in the table. This is
because the existing rows in the table are not updated during the
operation; instead, the default value is stored only in the metadata of the
table and the value is looked up as needed in queries that access these
rows. This behavior is automatic; no additional syntax is required to
implement the online operation beyond the ADD COLUMN syntax. A runtime
constant is an expression that produces the same value at runtime for each
row in the table regardless of its determinism. For example, the constant
expression "My temporary data", or the system function GETUTCDATETIME() are
runtime constants. In contrast, the functions NEWID() or NEWSEQUENTIALID()
are not runtime constants because a unique value is produced for each row
in the table. Adding a NOT NULL column with a default value that is not a
runtime constant is always performed offline and an exclusive (SCH-M) lock
is acquired for the duration of the operation.

While the existing rows reference the value stored in metadata, the default
value is stored on the row for any new rows that are inserted and do not
specify another value for the column. The default value stored in metadata
is moved to an existing row when the row is updated (even if the actual
column is not specified in the UPDATE statement), or if the table or
clustered index is rebuilt.


Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

2016-10-05 Thread Pantelis Theodosiou
On Wed, Oct 5, 2016 at 11:44 PM, Jeff Janes  wrote:

> On Wed, Oct 5, 2016 at 3:29 PM, Andres Freund  wrote:
>
>> On 2016-10-05 15:23:05 -0700, Vitaly Burovoy wrote:
>> > On 10/5/16, Andres Freund  wrote:
>> > > On 2016-10-05 11:58:33 -0700, Serge Rielau wrote:
>> > >> Dear Hackers,
>> > >> I’m working on a patch that expands PG’s ability to add columns to a
>> table
>> > >> without a table rewrite (i.e. at O(1) cost) from the
>> > >> nullable-without-default to a more general case.
>> > >
>> > > If I understand this proposal correctly, altering a column default
>> will
>> > > still have trigger a rewrite unless there's previous default?
>> >
>> > No, "a second “exist default"" was mentioned, i.e. it is an additional
>> > column in a system table (pg_attribute) as default column values of
>> > the "pre-alter" era. It solves changing of the default expression of
>> > the same column later.
>>
>> Don't think that actually solves the issue. The default might be unset
>> for a while, for example. Essentially you'd need to be able to associate
>> arbitrary number of default values with an arbitrary set of rows.
>>
>>
>> ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
>> INSERT id = 1;
>> ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 1;
>> ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
>> INSERT id = 2;
>> ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 2;
>> ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
>> INSERT id = 3;
>> ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 3;
>>
>> The result here would be that there's three rows with a default value
>> for foo that's the same as their id. None of them has that column
>> present in the row.
>>
>
> My understanding is that all of those would be materialized.  The only
> default that isn't materialized is the one in effect in the same statement
> in which that column was added.  Since a column can only be added once, the
> default in effect at the time the column was added can never change, no
> matter what you do to the default later on.
>
> Cheers,
>
> Jeff
>

I understood the same thing: that when the column is added, the "DEFAULT
constant" means two things:

-a- existing rows get a value of that constant (not actually written into
the rows, but kept, hidden from the user, in the system tables and only
written into rows that are updated, vacuumed, etc.), and
-b- new rows, inserted after the ADD COLUMN, get the DEFAULT constant, the
same way a normal column definition would give it.

The b part can easily be changed later with an ALTER COLUMN that sets a new
DEFAULT.
The a part is never changed, but it could be dropped from the system table
once all rows existing before the ADD COLUMN have been updated.
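Incidentally, SQLite already implements roughly this scheme for constant defaults: ADD COLUMN stores the default in the schema and pre-existing rows are not rewritten. A small sketch of the user-visible behavior being described:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t(id INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,)])
# Pre-existing rows are not rewritten: the constant default is kept in
# the table metadata and filled in when those short rows are read.
conn.execute("ALTER TABLE t ADD COLUMN flag INTEGER NOT NULL DEFAULT 7")
conn.execute("INSERT INTO t(id) VALUES (3)")  # new row: default applies too
print(conn.execute("SELECT id, flag FROM t ORDER BY id").fetchall())
# [(1, 7), (2, 7), (3, 7)]
```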


Pantelis


Re: [HACKERS] Feature suggestion: ON CONFLICT DO NOTHING RETURNING which returns existing row data

2016-10-05 Thread Pantelis Theodosiou
This can be solved by chaining modifying CTEs.

Something like this (not tested) that can work with multiple rows inserted:

WITH
  vals (bk1, bk2, other_t1_columns, other_t2_columns) AS
( VALUES (bk1val, bk2val, other_t1_values, other_t2_values),
         (bk1val, bk2val, other_t1_values, other_t2_values)
),
  ins_t1 AS
( INSERT INTO t1 (bk1, bk2, other_t1_columns)
  SELECT bk1, bk2, other_t1_columns
  FROM vals
  ON CONFLICT (bk1, bk2) DO NOTHING
  RETURNING id, bk1, bk2
)
INSERT INTO t2 (t1_id, other_t2_columns)
SELECT
    COALESCE(ins_t1.id, t1.id),
    vals.other_t2_columns
FROM vals
  LEFT JOIN ins_t1 ON (vals.bk1, vals.bk2) = (ins_t1.bk1, ins_t1.bk2)
  LEFT JOIN t1 ON (vals.bk1, vals.bk2) = (t1.bk1, t1.bk2) ;
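The behavior being worked around (RETURNING yields no row when the insert is skipped) is easy to reproduce; a small sketch with SQLite, assuming a version (3.35+) that supports ON CONFLICT DO NOTHING together with RETURNING:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE t1 (
        id  INTEGER PRIMARY KEY,
        bk1 INTEGER,
        bk2 TEXT,
        UNIQUE (bk1, bk2)
    )
""")
ins = ("INSERT INTO t1(bk1, bk2) VALUES (1, 'a') "
       "ON CONFLICT DO NOTHING RETURNING id")
first = conn.execute(ins).fetchall()   # row really inserted
second = conn.execute(ins).fetchall()  # conflict: skipped, nothing returned
print(first, second)  # [(1,)] []
```

This is why the CTE approach still has to join back to t1 to pick up the ids of the rows that already existed.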

On Wed, Oct 5, 2016 at 1:53 AM, Tom Dunstan  wrote:

> Hi all
>
> We recently moved to using 9.5 and were hoping to use the new upsert
> functionality, but unfortunately it doesn’t quite do what we need.
>
> Our setup is something like this:
>
> CREATE TABLE t1 (
>   id BIGSERIAL NOT NULL PRIMARY KEY,
>   bk1 INT,
>   bk2 UUID
>   — other columns
> );
> CREATE UNIQUE INDEX t1_bk ON t1 (bk1, bk2);
>
> CREATE TABLE t2 (
>   t1_id BIGINT NOT NULL REFERENCES t1
>  — other stuff
> );
>
> Data comes in as inserts of one tuple each of t1 and t2. We expect inserts
> to t1 to be heavily duplicated. That is, for stuff coming in we expect a
> large number of rows to have duplicate (bk1, bk2), and we wish to discard
> those, but not discard the t2 tuple - those should always be inserted and
> reference the correct t1 record.
>
> So we currently have an insertion function that does this:
>
> BEGIN
>   INSERT INTO t1 (bk1, bk2, other columns)
>   VALUES (bk1val, bk2val, other values)
>   RETURNING id
>   INTO t1_id;
> EXCEPTION WHEN unique_violation THEN
>   SELECT id
>   FROM t1
>   WHERE bk1 = bk1val AND bk2 = bk2val
>   INTO t1_id;
> END;
>
> INSERT INTO t2(t1_id, other columns) VALUES(t1_id, other values);
>
> We were hoping that we’d be able to do something like this:
>
> INSERT INTO t1 (bk1, bk2, other columns)
>   VALUES (bk1val, bk2val, other values)
>   ON CONFLICT (bk1val, bk2val) DO NOTHING
>   RETURNING id
>   INTO t1_id;
> INSERT INTO t2(t1_id, other columns) VALUES(t1_id, other values);
>
> But unfortunately it seems that the RETURNING clause returns null when
> there’s a conflict, rather than the existing row’s value.
>
> I understand that there is ambiguity if there were multiple rows that were
> in conflict. I think this sort of functionality really only makes sense
> where the conflict target is a unique constraint, so IMO it would make
> sense to only support returning columns in that case.
>
> I imagine that this would be possible to do more efficiently than the
> subsequent query that we are currently doing given that postgres has
> already found the rows in question, in the index at least. I have no idea
> how hard it would actually be to implement though. FWIW my use-case would
> be supported even if this only worked for indexes where the to-be-returned
> columns were stored in the index using Anastasia’s covering + unique index
> patch, when that lands.
>
> Thoughts?
>
> Tom
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>