Re: [GENERAL] multiple UNIQUE indices for FK

2016-03-07 Thread Rafal Pietrak


W dniu 07.03.2016 o 20:11, Francisco Olarte pisze:
[---]
> 
> When in a single process problem like this I normally use an
> second timestamp which I autoincrement if repeated, something
> like:
> 
> get_timestamp_for_id() {
>   Locked(mutex) {
>     now = time();
>     if (last_returned_id >= now) {
>       return ++last_returned_id;      /* inside a burst: bump past the last value */
>     } else {
>       return last_returned_id = now;  /* quiet again: fall back to the clock */
>     }
>   }
> }
> 
> This has the nice property that it eventually drops back to the timestamp
> after a burst, so the IDs do double service as generation timestamps, but a
> single locked counter, a sequence, works as well.

OK. But in this case I really fail to see the usefulness of "timestamp
semantics" for this field. Just a plain INT/BIGINT would do; particularly
if the system has a chance to grow beyond 1 message per second
sustained. If you used it while actually meaning "something
sequential/monotonic, like a timestamp", that was misleading to me. I'd
say: an inverse XY problem :)

... but INT/SERIAL semantics might actually be a workaround for me.
Finding a "largest but smaller than" is somewhat more expensive than a
plain hit with NEXT=CURRENT ... but maybe the penalty will not be so
great after all. And if there are no hidden race conditions/locking
problems, that might just do the trick.
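
(To illustrate - a sketch with made-up names, not my actual schema: the
direct hit

    SELECT * FROM messages WHERE msg_id = next_id;   -- NEXT=CURRENT hit

would become a "largest but smaller than" probe

    SELECT * FROM messages
     WHERE person_id = $1 AND msg_id < $2
     ORDER BY msg_id DESC LIMIT 1;   -- wants an index on (person_id, msg_id)
)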


> 
> ... More snipping.
> 
> I cannot recommend more things. The only suggestion: for easier locating of
> a message for a person, cache the last message id in the person record ( which
> you can use as a lock for updates ) and just use the next for linking
> the chain ( because, as you said, a message can be no longer the last,
> so, unless this only happens when you destructively pop the last
> message in the chain for a user, you need a full linked list to
> recover the previous one ) ( if you determine the new last message by
> other means you do not need any of these things, just cache the last
> message in each person record, then when you insert a new one you
> update each sender / recipient with the last message id at the same
> time you insert the records, preferably sorting the ids first to avoid
> deadlocks if your concurrency is high, although I suspect you'll need
> a linked-list-per-user if it has to be the previous one ).

Yes I do. The NEXT field has been there for some time now, and it worked
exactly as expected. That's why I'm so desperate to keep it.
Particularly since it followed the "business model" the schema served (in
the language of the XY problem, for the pop operation), literally:
"efficiently find a message that this person sent before".

But thanks for tackling the problem with me.

-R


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] INDEX ONLY scan with expression index

2016-03-07 Thread David G. Johnston
On Mon, Mar 7, 2016 at 11:33 PM, Tom Lane  wrote:

> James Sewell  writes:
> > Would anyone be able to shed some light on why expression based indexes
> > can't be used for an index only scan?
> > I've found a few comments saying this is the case, and I've proven it is
> > the case in reality - but I can't seem to find the why.
>
> Well, it would help if you posted a concrete example ... but there's
> at least one known limitation: the planner's rule for whether an
> index can be used for an index-only scan is that all variables needed
> by the query be available from the index.  So if you have an index
> on f(x), it might be useful for a query that needs f(x), but you won't
> get an index-only scan for it because the planner fails to notice that
> the query has no references to bare "x" but just "f(x)".  (This is
> something that could be fixed, but it's not clear how to do so without
> imposing considerable cost on queries that get no benefit because they
> have no interest in f(x).)
>
> The recommended workaround at the moment is to create a two-column index
> on "f(x), x".  The second index column has no great value in reality,
> but it lets the planner accept the index as usable for an IOS.  As a
> small consolation prize, it might let you get an IOS on cases where you
> *do* need x as well.
>
>
Have we intentionally decided against adding a section under Chapter 11
(Indexes) covering the user-visible dynamics of index-only scans and what can
be done - such as the advice just given - to cause the planner to choose one?

David J.


Re: [GENERAL] INDEX ONLY scan with expression index

2016-03-07 Thread Tom Lane
James Sewell  writes:
> Would anyone be able to shed some light on why expression based indexes
> can't be used for an index only scan?
> I've found a few comments saying this is the case, and I've proven it is
> the case in reality - but I can't seem to find the why.

Well, it would help if you posted a concrete example ... but there's
at least one known limitation: the planner's rule for whether an
index can be used for an index-only scan is that all variables needed
by the query be available from the index.  So if you have an index
on f(x), it might be useful for a query that needs f(x), but you won't
get an index-only scan for it because the planner fails to notice that
the query has no references to bare "x" but just "f(x)".  (This is
something that could be fixed, but it's not clear how to do so without
imposing considerable cost on queries that get no benefit because they
have no interest in f(x).)

The recommended workaround at the moment is to create a two-column index
on "f(x), x".  The second index column has no great value in reality,
but it lets the planner accept the index as usable for an IOS.  As a
small consolation prize, it might let you get an IOS on cases where you
*do* need x as well.
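
A minimal sketch of the workaround (table and column names are invented for
illustration):

    CREATE TABLE t (x text);
    CREATE INDEX t_f_idx ON t (lower(x));       -- usable, but not for an IOS
    CREATE INDEX t_f_x_idx ON t (lower(x), x);  -- accepted for an IOS

    -- With t_f_x_idx in place (and the table recently vacuumed), this can
    -- produce an Index Only Scan:
    EXPLAIN SELECT lower(x) FROM t WHERE lower(x) = 'foo';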

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] INDEX ONLY scan with expression index

2016-03-07 Thread James Sewell
Hello,

Would anyone be able to shed some light on why expression based indexes
can't be used for an index only scan?

I've found a few comments saying this is the case, and I've proven it is
the case in reality - but I can't seem to find the why.

Cheers,


James Sewell,
Solutions Architect
__


Level 2, 50 Queen St, Melbourne VIC 3000

*P *(+61) 3 8370 8000  *W* www.lisasoft.com  *F *(+61) 3 8370 8099



Re: [GENERAL] Does a call to a language handler provide a context/session, and somewhere to keep session data?

2016-03-07 Thread david
From: Jan de Visser [mailto:j...@de-visser.net] 

You're being pretty oblique about what it is you're trying to achieve.
[dmb>] Sorry you see it that way. I know exactly where I'm trying to get to,
but it would take many pages to explain and I don't want to unduly trouble
other busy people.

To go back to one of your earlier emails: the hardest problem in computing
isn't cache invalidation. It is clearly explaining what the problem at hand
is.
[dmb>] What would you like to know? You can find out about Andl by following
the link in my footer. You can find out about me from various links. I have
a long history of experience in C/C++ in multiple environments. Andl is
written in C#, and I've been doing that since pre 1.0. Much of my life has
been writing compilers and related tools for developers, never for end
users. I don't have problems that look anything like the others I see on
this list.

I know from experience that writing long questions is a good way to get
ignored. So I try to pick out one hard question and ask it as briefly as
possible, in the hope that someone with deep Postgres knowledge will
understand what I need and help me find it. At the moment I have two
pressing problems.

One is the start-up phase: getting the Andl runtime stoked up, load its
catalog, set up its execution environment (including its own type system),
ready for business. That process in Postgres seems to be undocumented, but I
think I have it sorted (barring memory lifetime issues down the track).

The other is type conversions: incoming and outgoing. That is undocumented
too, and that's disappointing. Anyone writing functions or a language
handler will really need this. I'm finding it hard to pick a good path right
now.

The third would be queries, but that doesn't look too hard. SPI is quite
well documented.


Regards
David M Bennett FACS

Andl - A New Database Language - andl.org







-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Inserting JSON via Java PreparedStatment

2016-03-07 Thread David G. Johnston
On Mon, Mar 7, 2016 at 8:15 PM, Curt Huffman  wrote:

> Thanks Rob & David!
>
> I got it to work using the following:
>
>
> String qry = "INSERT INTO event "
> + "(spotid, qid, userid, persid, , "
> + "evtvalue, evtdt, evtjson) "
> + "VALUES(?,?,?,?,?,?,?,?,?,?,to_json(?::json));";
>
>
This is redundant (though possibly the to_json becomes a no-op in this
case, idk).  Either cast (which is what you are doing when you say
"?::json") or pass the text through the to_json function - not both.  What
you are saying here is "please convert this json value I am handing you to
json".

and
> pStmt.setString (11,dtlRec.toString());
>
> (another suggestion was to use: cast(? as json)   which I haven't tried
> yet.)
>

This has the benefit of being standard-conforming; the "::" syntax is a
PostgreSQL-ism.
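
Both spellings side by side, as plain SQL (no driver involved):

    SELECT CAST('{"New MbrID":34}' AS json);  -- SQL-standard cast syntax
    SELECT '{"New MbrID":34}'::json;          -- equivalent PostgreSQL shorthand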


> This worked with an ultra-simple, 1-pair json object. {"New MbrID":34}
>  I'll try it with more complex structures next.
>
> Any opinions on using the postgres function, to_json, over the cast?
>

I'm reasonably certain there is no actual difference between the two, so
use whichever syntax seems more natural.


> However, from my (limited) understanding, I think I am now just incurring
> additional processing overhead from all of this.
> I think that I am stuffing text into a JSON object, then converting it
> into a string for the preparedStatment, which then passes it to the JDBC
> driver to re-convert it into a JSON object, and gets ultimately stored as a
> text string in the column?  Is that correct?
> I suspect I'll have to reverse the process to read it back out, yes?
>
>
Follow-up questions:
> 1) Since I'm not (yet) using JSONB, but just regular JSON column, is there
> much point to even using a JSON column?
>

Yes, you are using "text" as a serialization format, and by using a typed
json column you are validating/constraining the text being sent to conform
to JSON structure.  Unless you have solid and overwhelming proof that using
JSON is unacceptably slow, you should use it from a "proper model"
perspective.

> 2) Will this method also work for JSONB column types?
>

This is how you do type conversion in PostgreSQL; there is nothing here
(aside from the unnecessary to_json function call) that is json/jsonb
specific.

David J.


Re: [GENERAL] Does a call to a language handler provide a context/session, and somewhere to keep session data?

2016-03-07 Thread David Bennett
From: pgsql-general-ow...@postgresql.org 
[mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Jan de Visser

One thing that's probably key here is that pgsql isn't multi-threaded.
Individual connections are handled by forked backends, which share a
shared-memory cache that's not accessible by SQL-land code (which includes
language handlers). So I think your problem goes away once you realize that
all the data you have is tied to a single connection anyway.

[dmb>] 
[dmb>] Thanks. Yes, I knew that. My problem is a strategy to start up the 
language engine once (per session), load stuff from the database once, and then 
keep the per-call cost as low as possible. I know that palloc() memory goes 
away: it has a lifetime longer than a function invocation (and should span 
a recursive function call), but definitely not longer than a transaction. So I 
need to get some memory per session and keep a pointer to it per session (which 
it seems is per process).

You cannot use multi-threaded code (which touches the database) in language 
handlers or other "plug-in" code.
[dmb>] That's good -- I hate threads. I've never been able to figure out how to 
test them adequately. No loss there.

Also, trying to outsmart the db engine's cache by building your own is usually 
an exercise in futility and often counter-productive. I speak from experience 
:-P
[dmb>] Isn't cache invalidation the second hardest thing in computing (after 
naming things)?

[dmb>] Seems like DLL static memory with allocation from process memory (or 
even malloc()) is "the simplest thing that could possibly work".

Regards
David M Bennett FACS

Andl - A New Database Language - andl.org







-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Slave-Master replication on top of BDR

2016-03-07 Thread Craig Ringer
On 4 March 2016 at 23:02, Alvaro Aguayo Garcia-Rada  wrote:

> Hi. I currently have two servers in different geographical locations; both
> of them are replicating with Postgres-BDR, that's OK. However, I need two
> more servers to get a read only replication of only some tables from the
> master ones.
>
> At first I tried with Slony, but it just didn't work(don't know why). I've
> also checked docs for UDR and pglogical, but don't keep it clear if they
> can coexist with BDR.
>
>
pglogical should theoretically work as both provider and subscriber on BDR,
but we haven't put the test infrastructure together to validate that and
make it an officially supported configuration yet.  It's certainly desired
and on the roadmap.

Using UDR with BDR doesn't work well; the issues we found there are part of
why pglogical was created and why it works the way it does.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [GENERAL] Does a call to a language handler provide a context/session, and somewhere to keep session data?

2016-03-07 Thread Jan de Visser
On March 8, 2016 11:35:00 AM da...@andl.org wrote:
> From: pgsql-general-ow...@postgresql.org
> [mailto:pgsql-general-ow...@postgresql.org] On Behalf Of John R Pierce
> 
> this stuff you're loading from the database once, that's just data about
> your language plugin's configuration, or is it user data, or what? [dmb>]
> It's the catalog for Andl. It contains defined functions, types, persistent
> scalar (non table) data values and links to tables.
> 
> if its just a few global settings, you should consider using custom
> settings variables, rather than database tables.   for instance, pljava has
> a setting, pljava.libjvm_location='/usr/lib/jvm/java-1.8.0/lib/libjvm.so'
> or whatever which it uses to find the Java native calls interface
> library... [dmb>] Andl has something similar, but that problem is already
> solved.

You're being pretty oblique about what it is you're trying to achieve.

To go back to one of your earlier emails: the hardest problem in computing 
isn't cache invalidation. It is clearly explaining what the problem at hand 
is.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Does a call to a language handler provide a context/session, and somewhere to keep session data?

2016-03-07 Thread david
From: pgsql-general-ow...@postgresql.org 
[mailto:pgsql-general-ow...@postgresql.org] On Behalf Of John R Pierce

this stuff you're loading from the database once, that's just data about your 
language plugin's configuration, or is it user data, or what?
[dmb>] It's the catalog for Andl. It contains defined functions, types, 
persistent scalar (non table) data values and links to tables. 

if its just a few global settings, you should consider using custom 
settings variables, rather than database tables.   for instance, pljava has a 
setting,
pljava.libjvm_location='/usr/lib/jvm/java-1.8.0/lib/libjvm.so' or whatever 
which it uses to find the Java native calls interface library...
[dmb>] Andl has something similar, but that problem is already solved.

Regards
David M Bennett FACS

Andl - A New Database Language - andl.org







-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Logger into table and/or to cli

2016-03-07 Thread Thiemo Kellner, NHC Barhufpflege

Hi all

I would like to log from within pgsql into a table and/or to the command  
line, something like https://sourceforge.net/projects/plnsqllogger/ .  
Does anybody know of an available open-source solution?


Kind regards

Thiemo

--
Auf Gelassene Pferde kann man bauen!
+49 (0)1578-772 37 37
+41 (0)78 947 36 21
sip: thiemo.kell...@iptel.org
http://www.gelassene-pferde.biz
Mitglied bei http://www.keep-it-natural.org
Öffentlicher PGP-Schlüssel:  
http://pgp.mit.edu/pks/lookup?op=get&search=0x8F70EFD2D972CBEF



This message was sent using IMP, the Internet Messaging Program.




Re: [GENERAL] Does a call to a language handler provide a context/session, and somewhere to keep session data?

2016-03-07 Thread John R Pierce

On 3/7/2016 3:28 PM, da...@andl.org wrote:

[dmb>] Thanks. Yes, I knew that. My problem is a strategy to start up the 
language engine once (per session), load stuff from the database once, and then 
keep the per-call cost as low as possible


this stuff you're loading from the database once, that's just data about 
your language plugin's configuration, or is it user data, or what?


if it's just a few global settings, you should consider using custom 
settings variables, rather than database tables.   for instance, pljava 
has a setting, 
pljava.libjvm_location='/usr/lib/jvm/java-1.8.0/lib/libjvm.so' or 
whatever, which it uses to find the Java native calls interface library...
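
a sketch of how such a setting is read back (the andl.* name below is just an
invented example; only the pljava one above is real):

    -- set in postgresql.conf, or per session:
    SET andl.catalog_path = '/var/lib/andl/catalog';
    SELECT current_setting('andl.catalog_path');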





--
john r pierce, recycling bits in santa cruz



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



Re: [GENERAL] index problems (again)

2016-03-07 Thread Geoff Winkless
On 7 March 2016 at 20:23, Jeff Janes  wrote:
> PostgreSQL does not (yet) implement "loose" index scans or "skip
> scans", which is what you are asking for.  You can roll your own using
> the techniques described here:
> https://wiki.postgresql.org/wiki/Loose_indexscan, which has the
> benefit over your example code in that you don't need to enumerate all
> possible values, it effectively does it for you.

Uh huh. This is obviously where my expectation is wrong, thanks. It
certainly makes it more obvious why (sc_id,scdate) is more attractive
to the planner than (scdate,sc_id) and why the index that was
transferred from the Other Database that we've migrated from isn't
useful here :)

Geoff


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] index problems (again)

2016-03-07 Thread Peter J. Holzer
On 2016-03-07 16:37:37 +, Geoff Winkless wrote:
> On 7 March 2016 at 16:02, Tom Lane  wrote:
> > In English, what that plan is trying to do is scan the index
> > in sc_id order until it hits a row with scdate in the target range.
> > The first such row, by definition, has the correct min(sc_id) value.
> > The problem is that we're guessing at how soon we'll hit such a row.
> > If the columns are independent, then the planner can guess based on how
> > many rows in the whole table have scdate in the target range, and it
> > will probably be about right.  But that estimate can fall down very
> > badly if sc_id and scdate increase together, because then the target
> > rows aren't randomly distributed in the index sequence but could all be
> > all the way at the far end of the index.
> 
> I'm sorry, I'm obviously not being clear. I already accepted this
> argument when Victor gave it, although I believe that in part it falls
> down because sc_id is also (potentially) randomly distributed so it's
> not like you're doing a sequential table scan (it might work better on
> a clustered table, but we don't have those :) )
> 
> So you still have an extra layer of indirection into a large table
> with lots of random accesses.
> 
> > If we had cross-column correlation stats we could detect this pitfall,
> > but without that it's hard to do.
> 
> But as far as I can see, apart from the absolute extremes, the
> index-only scan is _always_ going to be quicker than the index+table
> scan.

We are talking about an "absolute extreme" here. You have about 420 date
values and you are looking for 3 of them. Assuming for the moment that
your distribution is uniform, that's 1/140th of the whole table.

So if PostgreSQL were using the (sc_date,sc_id) index, it would have to
scan 4E6/140 = 29000 index entries, extract the id value and take the
minimum of those 29000 values.

OTOH, if it uses the sc_id index, it only expects to have to scan 140
entries until it finds a matching entry. And then it is finished.

So it's 140 index entries plus row accesses against 29000 index entries.
To choose the second plan, the planner would have to estimate that
reading a random row is more than 200 times slower than reading an index
entry, which apparently it doesn't.

As Tom wrote, the estimate of having to read only about 140 rows is only
valid if sc_id and sc_date are uncorrelated. In reality your query has
to read a lot more than 140 rows, so it is much slower.


> I don't believe you need any further statistics than what is currently
> available to be able to make that judgement, and that's why I believe
> it's suboptimal.

We all know it is suboptimal, but unfortunately, without additional
statistics I don't think there is a better way. The other way around -
assuming that the columns are correlated in the worst possible way -
would remove viable plans in many cases. 

This is, I think one of the places where hints are a good idea. The
programmer sometimes knows more about the characteristics of the data
than the planner can possibly know and it is a pity that there is no way
for the programmer to pass that knowledge to the planner. (And yes, I
know that quite often the programmer is wrong - but I do believe in
giving people enough rope to hang themselves with)

hp

-- 
   _  | Peter J. Holzer| I want to forget all about both belts and
|_|_) || suspenders; instead, I want to buy pants 
| |   | h...@hjp.at | that actually fit.
__/   | http://www.hjp.at/ |   -- http://noncombatant.org/




Re: [GENERAL] index problems (again)

2016-03-07 Thread Jeff Janes
On Mon, Mar 7, 2016 at 9:35 AM, Geoff Winkless  wrote:
> On 7 March 2016 at 16:44, Tom Lane  wrote:
>> Geoff Winkless  writes:
>>> But as far as I can see, apart from the absolute extremes, the
>>> index-only scan is _always_ going to be quicker than the index+table
>>> scan.
>>
>> Well, that is a different issue: what does the planner think of an
>> index-only scan as compared to a regular index scan.  I suspect that
>> it's pricing the IOS very high because a lot of the table is dirty
>> and therefore will have to be visited even in a nominally index-only
>> scan.  You might check whether the plan choice changes immediately
>> after a VACUUM of the table.
>
> I ran VACUUM FULL and VACUUM ANALYZE. It made no difference. I would
> have thought that if it were the case then the equality-test queries
> would suffer from the same problem anyway, no?

No.  The range case scans the entire date range, visits the table for
each row in that range (to check visibility), and takes the min over
the sc_ids which pass the visibility check.

The equality test case jumps directly to the lowest sc_id for the
given scdate, and then has to walk up the sc_ids only until it finds
one which passes the visibility check.  Once it finds one which is
visible, it is done with that scdate.

Assuming most tuples are visible, that is a huge difference in the
amount of table blocks being visited.  (And maybe index blocks as
well)

Cheers,

Jeff


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] index problems (again)

2016-03-07 Thread Jeff Janes
On Mon, Mar 7, 2016 at 8:37 AM, Geoff Winkless  wrote:
>
> But as far as I can see, apart from the absolute extremes, the
> index-only scan is _always_ going to be quicker than the index+table
> scan.

If relallvisible is zero, it thinks it gets zero benefit from an index
only scan.  It thinks that using a larger index has a small, but
non-zero, cost over the smaller index.

> We can see that by the massive speedup I get by
> using index(scid,scdate), which in all other respects is going to
> suffer from exactly the same problem from that the scid-only index
> suffers.

What massive speedup?  (scid,scdate) is the index it *does* use in
your worst demonstrated case.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] index problems (again)

2016-03-07 Thread Jeff Janes
On Mon, Mar 7, 2016 at 5:01 AM, Geoff Winkless  wrote:
> On 7 March 2016 at 11:48, Victor Yegorov  wrote:
>> 2016-03-07 13:38 GMT+02:00 Geoff Winkless :
>>>
>>> # EXPLAIN (ANALYZE,BUFFERS) SELECT MIN(sc_id) FROM legs WHERE scdate
>>> BETWEEN 20160219 AND 20160221;
>>
>>
>> Will it help if you'll add `count(*)` to your query like this:
>>
>> SELECT min(sc_id), count(*) FROM legs WHERE scdate BETWEEN 20160219 AND
>> 20160221;
>
> Thanks for the reply.
>
> Yes, that does work around the problem, sort-of (although it's only
> using the scdate-only index, since it needs all the data):

You could also do "min(sc_id+0)" rather than adding a count(*) column,
although that is not as future-proof: someday the planner might
recognize that '+0' is a no-op.
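
Spelled out against the query from upthread (a sketch, untested here):

    -- The +0 defeats the min() index-walk transformation, so the planner
    -- stops trying to scan the sc_id-ordered index hoping to hit a row in
    -- the scdate range.
    SELECT min(sc_id + 0) FROM legs
     WHERE scdate BETWEEN 20160219 AND 20160221;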

If your table is well-vacuumed such that pg_class.relallvisible is
high, then it should use the (scdate,sc_id) index in an index-only
scan.  But if relallvisible is low, it has to check the table itself
for visibility information which destroys most of the benefit of an
index-only scan, and thus would prefer to use the smaller index
instead.


> Even given that, I still don't see why the (scdate,sc_id) index isn't
> perfect for this; it allows the planner to use sc_id for MIN() while
> using scdate to restrict the values. Three values to look up from the
> index-only.
>
> If I manually change the query to do what I hoped the planner would do for me:
>
> SELECT LEAST(( SELECT MIN(sc_id) FROM legs WHERE scdate =20160219), (
> SELECT MIN(sc_id) FROM legs WHERE scdate =20160220), ( SELECT
> MIN(sc_id) FROM legs WHERE scdate =20160221));

PostgreSQL does not (yet) implement "loose" index scans or "skip
scans", which is what you are asking for.  You can roll your own using
the techniques described here:
https://wiki.postgresql.org/wiki/Loose_indexscan, which has the
benefit over your example code in that you don't need to enumerate all
possible values, it effectively does it for you.
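
A rough adaptation of that wiki technique to the query in this thread (a
sketch, reusing the legs(scdate, sc_id) names from upthread):

    -- Emulated skip scan: recursively fetch each distinct scdate in the
    -- range, then take the smallest sc_id found under any of them.
    WITH RECURSIVE d AS (
        (SELECT scdate FROM legs
          WHERE scdate BETWEEN 20160219 AND 20160221
          ORDER BY scdate LIMIT 1)
        UNION ALL
        SELECT (SELECT scdate FROM legs
                 WHERE scdate > d.scdate AND scdate <= 20160221
                 ORDER BY scdate LIMIT 1)
          FROM d WHERE d.scdate IS NOT NULL
    )
    SELECT min((SELECT min(sc_id) FROM legs WHERE legs.scdate = d.scdate))
      FROM d WHERE d.scdate IS NOT NULL;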

Cheers,

Jeff


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] pg_restore man page question

2016-03-07 Thread Melvin Davidson
Be advised, the --section option is only available from 9.2 and up.

On Mon, Mar 7, 2016 at 2:08 PM, Adrian Klaver 
wrote:

> On 03/07/2016 10:22 AM, Karsten Hilbert wrote:
>
>> On Mon, Mar 07, 2016 at 11:03:39AM -0700, David G. Johnston wrote:
>>
>> The man page of pg_restore says

  --disable-triggers
  This option is relevant only when performing a
  data-only restore. It instructs pg_restore to execute
  commands to temporarily disable triggers on the
  target tables while the data is reloaded. [...]

 This hint seems overly narrow: when doing a restore of schema
 PLUS data (IOW, not a schema-only restore) it should apply,
 too, shouldn't it ?

 Otherwise it would suggest that pg_restore does this:

 - restore basic table structure
 - restore data
 - restore constraints and triggers and ...


>>> ... indexes
>>>
>>>
>>> Why is this surprising?
>>>
>>
>> It's not surprising to me. I was making sure which way things
>> are because the documentation I consulted wasn't clear on
>> that for me. The way things turn out to be -- so much the
>> better !
>>
>
> Yeah, you have to dig into one of the switch definitions to get an answer:
>
> http://www.postgresql.org/docs/9.5/interactive/app-pgrestore.html
>
> --section=sectionname
>
> Only restore the named section. The section name can be pre-data,
> data, or post-data. This option can be specified more than once to select
> multiple sections. The default is to restore all sections.
>
> The data section contains actual table data as well as large-object
> definitions. Post-data items consist of definitions of indexes, triggers,
> rules and constraints other than validated check constraints. Pre-data
> items consist of all other data definition items.
>
>
>> Karsten
>>
>>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>



-- 
*Melvin Davidson*
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.


Re: [GENERAL] multiple UNIQUE indices for FK

2016-03-07 Thread Francisco Olarte
Hi Rafal:

On Mon, Mar 7, 2016 at 9:29 AM, Rafal Pietrak  wrote:
.
>> be slower ). And you introduce several problems, the naming ones, a
>> very strange foreign-key relationship between kids, the possibility of
>> having a row inserted in the parent.
> No, no. It was OK. the parent table was appropriately "ruled on insert"
> for inherited tables to work as partitions.

But you have to add a lot of rules to avoid people inserting into the kids;
anyway, without seeing the whole lot I'm not gonna comment further.

> and the table was called "messages" :)

But it did not contain messages, it contained message-persons relations.

. snipped, too complex without seeing the whole design.

>> http://xyproblem.info/ .
> :) this is a good one!!!
> Actually I'm very, very acquainted with this "XY problem". i.e quite
> often when "end-users" ask me for additional functionality, they (like
> in the XY case) suggest a "technical solution". And (just like you :) I
> always ask: pls tell me what you do "traditionally", e.g "when you are
> currently doing this on paper: how does it go - step by step", then I'll
> find a good IT solution for you.

Well, now you have an url to mail them.

> In the case of complex inter-mangled systems, where a well-defined
> "critical point" shows up, it's more efficient to extract the "show
> case" that causes the problem and focus on this, instead of digressing
> on the overall design (which may be flawed, but cannot be rewritten at this
> point).

Maybe, but for me your solutions are so complex I cannot follow them.

> BTW: using timestamp instead of FK to message_id does not work quite so
> well. To see the problem, ponder a question: what time resolution should
> such a timestamp have to be as robust as FK ... irrespective of the
> transaction load?

That's irrelevant. Timestamp is a concept; as I told you, it's just a
value which defines a total order. Normally the system has a timestamp
source which ensures it. If you don't have one, you can use a cache=1
sequence.

When in a single process problem like this I normally use an
second timestamp which I autoincrement if repeated, something
like:

get_timestamp_for_id() {
  Locked(mutex) {
    now = time();
    if (last_returned_id >= now) {
      return ++last_returned_id;      /* inside a burst: bump past the last value */
    } else {
      return last_returned_id = now;  /* quiet again: fall back to the clock */
    }
  }
}

This has the nice property that it eventually drops back to the timestamp
after a burst, so the IDs do double service as generation timestamps, but a
single locked counter, a sequence, works as well.
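
( The sequence alternative, concretely - just a sketch:

    CREATE SEQUENCE msg_order_seq CACHE 1;  -- CACHE 1 so values are handed
                                            -- out strictly in call order
    SELECT nextval('msg_order_seq');
)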

... More snipping.

I cannot recommend more things. The only suggestion: for easier locating of
a message for a person, cache the last message id in the person record ( which
you can use as a lock for updates ) and just use the next for linking
the chain ( because, as you said, a message can be no longer the last,
so, unless this only happens when you destructively pop the last
message in the chain for a user, you need a full linked list to
recover the previous one ) ( if you determine the new last message by
other means you do not need any of these things, just cache the last
message in each person record, then when you insert a new one you
update each sender / recipient with the last message id at the same
time you insert the records, preferably sorting the ids first to avoid
deadlocks if your concurrency is high, although I suspect you'll need
a linked-list-per-user if it has to be the previous one ).

Francisco Olarte.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] pg_restore man page question

2016-03-07 Thread Adrian Klaver

On 03/07/2016 10:22 AM, Karsten Hilbert wrote:

On Mon, Mar 07, 2016 at 11:03:39AM -0700, David G. Johnston wrote:


The man page of pg_restore says

 --disable-triggers
 This option is relevant only when performing a
 data-only restore. It instructs pg_restore to execute
 commands to temporarily disable triggers on the
 target tables while the data is reloaded. [...]

This hint seems overly narrow: when doing a restore of schema
PLUS data (IOW, not a schema-only restore) it should apply,
too, shouldn't it ?

Otherwise it would suggest that pg_restore does this:

- restore basic table structure
- restore data
- restore constraints and triggers and ...



... indexes


Why is this surprising?


It's not surprising to me. I was making sure which way things
are because the documentation I consulted wasn't clear on
that for me. The way things turn out to be -- so much the
better !


Yeah, you have to dig into one of the switch definitions to get an answer:

http://www.postgresql.org/docs/9.5/interactive/app-pgrestore.html

--section=sectionname

Only restore the named section. The section name can be pre-data, 
data, or post-data. This option can be specified more than once to 
select multiple sections. The default is to restore all sections.


The data section contains actual table data as well as large-object 
definitions. Post-data items consist of definitions of indexes, 
triggers, rules and constraints other than validated check constraints. 
Pre-data items consist of all other data definition items.




Karsten




--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] pg_restore man page question

2016-03-07 Thread Karsten Hilbert
On Mon, Mar 07, 2016 at 11:03:39AM -0700, David G. Johnston wrote:

> > The man page of pg_restore says
> >
> > --disable-triggers
> > This option is relevant only when performing a
> > data-only restore. It instructs pg_restore to execute
> > commands to temporarily disable triggers on the
> > target tables while the data is reloaded. [...]
> >
> > This hint seems overly narrow: when doing a restore of schema
> > PLUS data (IOW, not a schema-only restore) it should apply,
> > too, shouldn't it ?
> >
> > Otherwise it would suggest that pg_restore does this:
> >
> > - restore basic table structure
> > - restore data
> > - restore constraints and triggers and ...
> >
> 
> ... indexes
> 
> 
> Why is this surprising?

It's not surprising to me. I was making sure which way things
are because the documentation I consulted wasn't clear on
that for me. The way things turn out to be -- so much the
better !

Karsten
-- 
GPG key ID E4071346 @ eu.pool.sks-keyservers.net
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] pg_restore man page question

2016-03-07 Thread David G. Johnston
On Mon, Mar 7, 2016 at 10:17 AM, Karsten Hilbert 
wrote:

> The man page of pg_restore says
>
> --disable-triggers
> This option is relevant only when performing a
> data-only restore. It instructs pg_restore to execute
> commands to temporarily disable triggers on the
> target tables while the data is reloaded. [...]
>
> This hint seems overly narrow: when doing a restore of schema
> PLUS data (IOW, not a schema-only restore) it should apply,
> too, shouldn't it ?
>
> Otherwise it would suggest that pg_restore does this:
>
> - restore basic table structure
> - restore data
> - restore constraints and triggers and ...
>

... indexes


Why is this surprising?  When restoring everything, pg_restore is able to
"cheat", since it knows that what it is restoring is a self-consistent state;
as such there is no need or desire to fire triggers, check constraints, or
maintain indexes as the data is being loaded.  I'm not positive on the exact
mechanics, but observation and Josh's comment cross-thread support this
explanation.

It can be readily checked by passing the --list option to pg_restore.

David J.


Re: [GENERAL] index problems (again)

2016-03-07 Thread Geoff Winkless
On 7 March 2016 at 16:44, Tom Lane  wrote:
> Geoff Winkless  writes:
>> But as far as I can see, apart from the absolute extremes, the
>> index-only scan is _always_ going to be quicker than the index+table
>> scan.
>
> Well, that is a different issue: what does the planner think of an
> index-only scan as compared to a regular index scan.  I suspect that
> it's pricing the IOS very high because a lot of the table is dirty
> and therefore will have to be visited even in a nominally index-only
> scan.  You might check whether the plan choice changes immediately
> after a VACUUM of the table.

I ran VACUUM FULL and VACUUM ANALYZE. It made no difference. I would
have thought that if it were the case then the equality-test queries
would suffer from the same problem anyway, no?

Even being fairly kind and selecting an scdate range that's only 1%
into the set the query takes over 4 times the amount of time taken by
the indexed query - so the "best" range for the index+table method is
utterly tiny - it would be reasonable only when the scdate field is
uniformly distributed, which even in a table without correlation
between the fields is likely to be almost never.

Geoff


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] pg_restore man page question

2016-03-07 Thread Joshua D. Drake

On 03/07/2016 09:17 AM, Karsten Hilbert wrote:

The man page of pg_restore says

--disable-triggers
This option is relevant only when performing a
data-only restore. It instructs pg_restore to execute
commands to temporarily disable triggers on the
target tables while the data is reloaded. [...]

This hint seems overly narrow: when doing a restore of schema
PLUS data (IOW, not a schema-only restore) it should apply,
too, shouldn't it ?


No because triggers are restored AFTER data. This explicitly applies to 
data being restored to an existing schema that has triggers on the table 
the data is being loaded into.


Sincerely,

JD


--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Custom column ordering

2016-03-07 Thread Steven Xu
I see. Thanks again! I removed the "port" data type from my implementation
and just used the ORDER BY ... USING clause, and the explain output/results
are providing what I expect.

Steven

> That is an editing mistake.  I mean
> hasegeli=# SELECT port FROM device_port ORDER BY cast_to_port(port);



Re: [GENERAL] Inserting JSON via Java PreparedStatment

2016-03-07 Thread rob stone
On Mon, 2016-03-07 at 09:32 -0700, David G. Johnston wrote:
> 
> > 
> > Hint: You will need to rewrite or cast the expression.
> > 
> Take the hint, literally.  You never did show the SQL but usually the
> least complex way to solve this is to indeed transfer the data as a
> string/text and then instruction PostgreSQL to convert (i.e., cast)
> it to json.
> 
> SELECT (?)::json;  <-- that ? parameter is seen as text; then you
> convert it.  The parentheses should be optional but I use them to
> emphasize the point.
> 
> then
> 
> pStmt.setString(1, dtlRec.toString());
> 
> David J.
> 

For some reason there is no java.sql.Types.JSON. There is ARRAY
though.

I would have written this:-

JsonObject mbrLogRec = Json.createObjectBuilder().build();
mbrLogRec = Json.createObjectBuilder()
.add("New MbrID", newId)
.build();

as

JsonObject mbrLogRec = Json.createObjectBuilder()
    .add("New MbrID", newId)
    .build();   // .build() is needed to turn the builder into a JsonObject

pStmt.setObject(11, mbrLogRec);

If you pass a string to your prepared statement and want to cast it in
your INSERT/UPDATE statement, you will probably have to include the
double quotes, colons and commas. Never tried it, just a guess.
Could become complicated when you have multiple pairs of JSON
attributes.

E.g.

JsonObject mbrLogRec = Json.createObjectBuilder()
    .add("New MbrID", newId)
    .add("Old MbrID", "fred")
    .build();


I'm sorry but I don't have time at the moment to knock up a test
program and verify any of this. I'm not an expert on JSON objects in
Java.

Just my two bob's worth.

HTH,
Rob



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] pg_restore man page question

2016-03-07 Thread Karsten Hilbert
The man page of pg_restore says

--disable-triggers
This option is relevant only when performing a
data-only restore. It instructs pg_restore to execute
commands to temporarily disable triggers on the
target tables while the data is reloaded. [...]

This hint seems overly narrow: when doing a restore of schema
PLUS data (IOW, not a schema-only restore) it should apply,
too, shouldn't it ?

Otherwise it would suggest that pg_restore does this:

- restore basic table structure
- restore data
- restore constraints and triggers and ...

Karsten
-- 
GPG key ID E4071346 @ eu.pool.sks-keyservers.net
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] index problems (again)

2016-03-07 Thread Tom Lane
Geoff Winkless  writes:
> But as far as I can see, apart from the absolute extremes, the
> index-only scan is _always_ going to be quicker than the index+table
> scan.

Well, that is a different issue: what does the planner think of an
index-only scan as compared to a regular index scan.  I suspect that
it's pricing the IOS very high because a lot of the table is dirty
and therefore will have to be visited even in a nominally index-only
scan.  You might check whether the plan choice changes immediately
after a VACUUM of the table.

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] index problems (again)

2016-03-07 Thread Geoff Winkless
On 7 March 2016 at 16:02, Tom Lane  wrote:
> In English, what that plan is trying to do is scan the index
> in sc_id order until it hits a row with scdate in the target range.
> The first such row, by definition, has the correct min(sc_id) value.
> The problem is that we're guessing at how soon we'll hit such a row.
> If the columns are independent, then the planner can guess based on how
> many rows in the whole table have scdate in the target range, and it
> will probably be about right.  But that estimate can fall down very
> badly if sc_id and scdate increase together, because then the target
> rows aren't randomly distributed in the index sequence but could all be
> all the way at the far end of the index.

I'm sorry, I'm obviously not being clear. I already accepted this
argument when Victor gave it, although I believe that in part it falls
down because sc_id is also (potentially) randomly distributed so it's
not like you're doing a sequential table scan (it might work better on
a clustered table, but we don't have those :) )

So you still have an extra layer of indirection into a large table
with lots of random accesses.

> If we had cross-column correlation stats we could detect this pitfall,
> but without that it's hard to do.

But as far as I can see, apart from the absolute extremes, the
index-only scan is _always_ going to be quicker than the index+table
scan. It doesn't matter whether or not the distribution is random or
skewed, the index-only scan is going to be better (or approximately
equally as good). We can see that by the massive speedup I get by
using index(scid,scdate), which in all other respects is going to
suffer from exactly the same problem from that the scid-only index
suffers.

And the real advantage: at the extremes, the index-only worst-case is
minimally worse than the best case. Whereas the worst-case of the
index-scan-plus-table-compare method is horrific.

I don't believe you need any further statistics than what is currently
available to be able to make that judgement, and that's why I believe
it's suboptimal.

Geoff


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Inserting JSON via Java PreparedStatment

2016-03-07 Thread Adrian Klaver

On 03/07/2016 05:25 AM, Curt Huffman wrote:

Hello!

I’m struggling to insert a JSON object into my postgres v9.4 DB. I have
defined the column called "evtjson" as type json. (not jsonb)
I am trying to use a prepared statement in Java (jdk1.8) to insert a
Json object (built using JEE javax.json libraries) into the column, but
I keep running into SQLException errors.

I'm using JDBC 9.4.1208

I create the JSON object using:

JsonObject mbrLogRec = Json.createObjectBuilder().build();
…
mbrLogRec = Json.createObjectBuilder()
    .add("New MbrID", newId)
    .build();

Then I pass this object as a parameter to another method to write it to
the DB using a prepared statement. (along with several other fields) As:

pStmt.setObject(11, dtlRec);


You lost me here, I thought the object you are building is mbrLogRec?



Using this method, I receive the following error:

at org.postgresql.util.PSQLException: No hstore extension installed.

at org.postgresql.jdbc.PgPreparedStatement.setMap(PgPreparedStatement.java:553)

at org.postgresql.jdbc.PgPreparedStatement.setObject(PgPreparedStatement.java:1036)

I did try installing the hstore extension, but it then told me that it
was not an hstore object.

I have also tried:

pStmt.setString(11, dtlRec.toString());
pStmt.setObject(11, dtlRec.toString());

Which produce a different error:

Event JSON: {"New MbrID":29}

SQLException: ERROR: column "evtjson" is of type json but expression
is of type character varying

Hint: You will need to rewrite or cast the expression.

But, at least this tells me that the DB is recognizing the column as
type JSON.

OracleDocs shows a number of various methods to set the parameter value
in the preparedStatement, but I'd rather not try them all if someone
knows the answer.
(http://docs.oracle.com/javase/8/docs/api/java/sql/PreparedStatement.html)
These also reference an additional parameter, SQLType, but I can't find
any reference to these.
Should I try setAsciiStream? CharacterStream? CLOB? ???

I couldn't find any help or tutes on postgres or the web.

Thanks for any help.

-Curt




--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Inserting JSON via Java PreparedStatment

2016-03-07 Thread David G. Johnston
> pStmt.setString(11, dtlRec.toString());
> pStmt.setObject(11, dtlRec.toString());
>
> Which produce a different error:
>
> Event JSON: {"New MbrID":29}
>
> SQLException: ERROR: column "evtjson" is of type json but expression is of
> type character varying
>
> Hint: You will need to rewrite or cast the expression.
>
>
Take the hint, literally.  You never did show the SQL, but usually the
least complex way to solve this is to indeed transfer the data as a
string/text and then instruct PostgreSQL to convert (i.e., cast) it to
json.

SELECT (?)::json;  <-- that ? parameter is seen as text; then you convert
it.  The parentheses should be optional but I use them to emphasize the
point.

then

pStmt.setString(1, dtlRec.toString());

David J.


[GENERAL] query reboot pgsql 9.5.1

2016-03-07 Thread MOLINA BRAVO FELIPE DE JESUS

Hi!!!

I'll try to explain my problem... sorry for my English :(


In pgsql 9.5.1 I have two tables with the following structure:

1. Unlogged table «public._gc_cat»
        Column      |     Type     | Modifiers
   -----------------+--------------+-----------
    idppicat        | integer      |
    idprodxintegrar | integer      |
    tipo            | character(1) |
    valor           | numeric      |
    estado          | character(1) |
    idsll           | text         |
    idsfte          | text         |
    arama           | text[]       |
    ne_arama        | integer      |
    rama            | text         |
    rvar            | text         |
    nodec           | integer      |

   Indexes:
       "_gc_cat_arama" btree (ne_arama)
       "_gc_cat_arama_gin" gin (arama)

2. Unlogged table «public._gc_tb»
     Column  |  Type   | Modifiers
   ----------+---------+-----------
    idb2     | integer |
    idc1     | integer |
    rama     | text    |
    arama    | text[]  |
    ne_arama | integer |

   Indexes:
       "_gc_tb_arama" btree (ne_arama)
       "_gc_tb_arama_gin" gin (arama)
       "_gc_tb_idb2idc1" btree (idb2, idc1)


The table _gc_cat has 91932 records and _gc_tb has 120130 records; when I run
the following query:

SELECT idprodxintegrar
  FROM _gc_tb a
  LEFT JOIN _gc_cat b
    ON (b.arama <@ a.arama AND a.arama <@ b.arama)

psql prints the following message (after approx. three minutes):
      Terminado (killed)

and I have to reboot my "guest server".

Now I run the same in pgsql 9.4.5 and all is fine!!!

The EXPLAINs are:

- pgsql 9.5.1:

  Nested Loop Left Join  (cost=0.03..492944.81 rows=276095 width=4)
    ->  Seq Scan on _gc_tb a  (cost=0.00..3321.30 rows=120130 width=66)
    ->  Bitmap Heap Scan on _gc_cat b  (cost=0.03..4.06 rows=2 width=70)
          Recheck Cond: ((arama <@ a.arama) AND (a.arama <@ arama))
          ->  Bitmap Index Scan on _gc_cat_arama_gin  (cost=0.00..0.03 rows=2 width=0)
                Index Cond: ((arama <@ a.arama) AND (a.arama <@ arama))


- pgsql 9.4.5:
  Nested Loop Left Join  (cost=0.03..492944.81 rows=276095 width=4)
    ->  Seq Scan on _gc_tb a  (cost=0.00..3321.30 rows=120130 width=66)
    ->  Bitmap Heap Scan on _gc_cat b  (cost=0.03..4.06 rows=2 width=70)
          Recheck Cond: ((arama <@ a.arama) AND (a.arama <@ arama))
          ->  Bitmap Index Scan on _gc_cat_arama_gin  (cost=0.00..0.03 rows=2 width=0)
                Index Cond: ((arama <@ a.arama) AND (a.arama <@ arama))

If I change the query to:

   SELECT idprodxintegrar
     FROM _gc_tb a
     LEFT JOIN _gc_cat b
       ON (a.ne_arama = b.ne_arama AND a.arama <@ b.arama)

in pgsql 9.5.1 it finishes after 450708.112 ms.

In pgsql 9.4.5 it finishes after 17996.756 ms (very fast!!!).

The EXPLAINs are:
- pgsql 9.5.1:
  Nested Loop Left Join  (cost=3.49..1915550.34 rows=41825277 width=4)
    ->  Seq Scan on _gc_tb a  (cost=0.00..3321.30 rows=120130 width=70)
    ->  Bitmap Heap Scan on _gc_cat b  (cost=3.49..14.39 rows=153 width=74)
          Recheck Cond: (a.arama <@ arama)
          Filter: (a.ne_arama = ne_arama)
          ->  Bitmap Index Scan on _gc_cat_arama_gin  (cost=0.00..3.45 rows=460 width=0)
                Index Cond: (a.arama <@ arama)

- pgsql 9.4.5:
  Nested Loop Left Join  (cost=3.48..1868759.71 rows=42284738 width=4)
    ->  Seq Scan on _gc_tb a  (cost=0.00..3321.30 rows=120130 width=70)
    ->  Bitmap Heap Scan on _gc_cat b  (cost=3.48..14.38 rows=115 width=74)
          Recheck Cond: (a.arama <@ arama)
          Filter: (a.ne_arama = ne_arama)
          ->  Bitmap Index Scan on _gc_cat_arama_gin  (cost=0.00..3.45 rows=460 width=0)
                Index Cond: (a.arama <@ arama)


The shared_buffers and work_mem settings are the same in both versions of
pgsql (128MB and 4MB).

I am doing this test on a laptop with the following characteristics:

- HP ProBook with 8 GB RAM, SATA disk, AMD A8-5550M
- OS Linux (fedora 23)
- lxc containers


The database dumps are shared at the following links:

  http://sipre.inegi.org.mx/pgsql/dump_pgsql_9_4_5.dump

  http://sipre.inegi.org.mx/pgsql/dump_pgsql_9_5_1.dump


A similar post can be found at:

http://www.postgresql.org/message-id/CALrs2KPMowV6juLdOkMRq_P3MA5VkUmhdM4Q1OD0vCf2qimFfA@mail.gmail.com

thanks in advance!

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Inserting JSON via Java PreparedStatment

2016-03-07 Thread Curt Huffman
Hello!

I’m struggling to insert a JSON object into my postgres v9.4 DB. I have
defined the column called "evtjson" as type json. (not jsonb)
I am trying to use a prepared statement in Java (jdk1.8) to insert a Json
object (built using JEE javax.json libraries) into the column, but I keep
running into SQLException errors.

I'm using JDBC 9.4.1208

I create the JSON object using:

JsonObject mbrLogRec = Json.createObjectBuilder().build();
…
mbrLogRec = Json.createObjectBuilder()
    .add("New MbrID", newId)
    .build();

Then I pass this object as a parameter to another method to write it to the
DB using a prepared statement. (along with several other fields) As:

pStmt.setObject(11, dtlRec);

Using this method, I receive the following error:

at org.postgresql.util.PSQLException: No hstore extension installed.

at
org.postgresql.jdbc.PgPreparedStatement.setMap(PgPreparedStatement.java:553)

at
org.postgresql.jdbc.PgPreparedStatement.setObject(PgPreparedStatement.java:1036)

I did try installing the hstore extension, but it then told me that it was
not an hstore object.

I have also tried:

pStmt.setString(11, dtlRec.toString());
pStmt.setObject(11, dtlRec.toString());

Which produces a different error:

Event JSON: {"New MbrID":29}

SQLException: ERROR: column "evtjson" is of type json but expression is of
type character varying

Hint: You will need to rewrite or cast the expression.

But, at least this tells me that the DB is recognizing the column as type
JSON.
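
I did find one suggestion that I have not tried yet: keep setString() and
move the cast into the SQL itself, so the driver sends text and the server
converts it. (A sketch only; the table name "events" and the "evt_id"
column are made up here, only "evtjson" is real.)

-- The ?::json cast converts the text parameter on the server side
INSERT INTO events (evt_id, evtjson) VALUES (?, ?::json);

Apparently there is also a pgjdbc connection parameter, stringtype=unspecified,
that makes setString() parameters be sent as untyped values, but I have not
tested that either.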

The Oracle docs show a number of methods to set the parameter value in the
PreparedStatement, but I'd rather not try them all if someone knows the
answer (http://docs.oracle.com/javase/8/docs/api/java/sql/PreparedStatement.html).
Some of these also reference an additional parameter, SQLType, but I can't
find any reference to it.
Should I try setAsciiStream? CharacterStream? CLOB?

I couldn't find any help or tutorials for this on Postgres sites or the web.

Thanks for any help.

-Curt


Re: [GENERAL] index problems (again)

2016-03-07 Thread Tom Lane
Geoff Winkless  writes:
> On 7 March 2016 at 14:51, Tom Lane  wrote:
>> Because the other way is estimated to be cheaper.  The estimate is
>> wrong, because it's based on a statistical assumption that's wrong
>> (ie that sc_id and scdate are uncorrelated), but it's what we have
>> to work with at the moment.

> Are you saying that the planner can't tell without scanning the index
> how much of the index the range constraint will retrieve?

The question isn't "how much", the question is "where is that data
exactly?".  In English, what that plan is trying to do is scan the index
in sc_id order until it hits a row with scdate in the target range.
The first such row, by definition, has the correct min(sc_id) value.
The problem is that we're guessing at how soon we'll hit such a row.
If the columns are independent, then the planner can guess based on how
many rows in the whole table have scdate in the target range, and it
will probably be about right.  But that estimate can fall down very
badly if sc_id and scdate increase together, because then the target
rows aren't randomly distributed in the index sequence but could all be
all the way at the far end of the index.

> I accept that this is how the planner behaves, but I don't accept that
> it's optimal.

If we had cross-column correlation stats we could detect this pitfall,
but without that it's hard to do.

regards, tom lane




Re: [GENERAL] index problems (again)

2016-03-07 Thread Geoff Winkless
On 7 March 2016 at 14:51, Tom Lane  wrote:
> Geoff Winkless  writes:
>> So it seems that it should in fact be usable after all. So I'm still
>> stumped as to why the (scdate,sc_id) index isn't used :(
>
> Because the other way is estimated to be cheaper.  The estimate is
> wrong, because it's based on a statistical assumption that's wrong
> (ie that sc_id and scdate are uncorrelated), but it's what we have
> to work with at the moment.

Are you saying that the planner can't tell, without scanning the index,
how much of the index the range constraint will retrieve? That's
reasonable, I suppose, but consider the relative sizes of the index
(92MB) and the table (1.6GB), both of which are available to the planner
at query time: if I were to scan 3% of the table (which we assume the
planner is estimating because of the cardinality of the scdate field),
I'd have read as much data from disk as I would for 50% of the index.
That's ignoring the reads I'll have to do from the sc_id index too... so
in the worst case, where I've had to read the entire index (because the
range didn't actually restrict any records), I'm still only at 2x the
average case of the other way. Whereas the worst case of the
sc_id-only-index-plus-table-retrieve is about 1000x the worst case of
the index-only scan.

> select min((select min(sc_id) from legs where scdate = gs))
> from generate_series(20160219, 20160221) gs

> This would only work well for relatively small ranges of scdate,

As it happens it works for the full range of scdate and returns in 99ms.

# select min((select min(sc_id) from legs where scdate = gs))
from generate_series(20150101, 20160303) gs;
   min
----------
 12914746
(1 row)

Time: 99.210 ms

> but if you had a large range then I think the original plan
> would've been fine.

Well yes, obviously doing MIN() across the whole range is going to be
able to just return as soon as it gets the first value from sc_id and
references the table to check the date; however, even in that _best_
case the value comes back in 25ms, i.e. the _best-case_
index-plus-table-scan is 1/3 the time of the worst-case index-only
scan.

I accept that this is how the planner behaves, but I don't accept that
it's optimal.

Geoff




Re: [GENERAL] Custom column ordering

2016-03-07 Thread Emre Hasegeli
> Although, I do find your output very strange. You wrote ORDER BY port, which
> is a text type. Why does Postgres order using the ordering operators of the
> "Port" data type rather than the "text" type, even though you haven't
> performed a cast?

That was an editing mistake. I meant:

> hasegeli=# SELECT port FROM device_port ORDER BY cast_to_port(port);




Re: [GENERAL] Does a call to a language handler provide a context/session, and somewhere to keep session data?

2016-03-07 Thread David Bennett
Thank you for your response.

 

Yes, I was aware of GD and SD. My question is about what facilities Postgres 
provides for implementing such a thing. Where is the proper place for the root 
of the SD/GD? What does an implementation use to determine that two calls 
belong to the same session?

 

I’m not finding that easy to understand by reading source code.

 

Regards

David M Bennett FACS

MD Powerflex Corporation, creators of PFXplus

To contact us, please call +61-3-9548-9114 or go to www.pfxcorp.com/contact.htm

 

From: pgsql-general-ow...@postgresql.org 
[mailto:pgsql-general-ow...@postgresql.org] On Behalf Of David G. Johnston
Sent: Monday, 7 March 2016 4:28 PM
To: da...@andl.org
Cc: pgsql-general-owner+M220479=david=andl@postgresql.org; pgsql-general 

Subject: Re: [GENERAL] Does a call to a language handler provide a 
context/session, and somewhere to keep session data?

 

On Sun, Mar 6, 2016 at 10:21 PM, da...@andl.org wrote:

Given that a language handler would be expected to be persistent, and to 
support concurrent (and reentrant) calls within a single database, is there a 
unique context or session identifier available?

Is there a place to store data, to be retrieved later in the same session?

Is there a place to store data (a cache?) that has been retrieved from the 
database for use by concurrent sessions using that database?

 

http://www.postgresql.org/docs/9.5/interactive/plpython-sharing.html

 

PL/R also has an implementation for this kind of thing.

 

David J.

 



[GENERAL] Script to check replication

2016-03-07 Thread Ashish Chauhan
Hi,

We are running streaming replication with hot standby in our current
production. I have the script below to check replication status and
replication lag. Does anyone have a script that runs every 15 minutes,
checks the replication status, and sends out an email if replication is
not running or is lagging behind?

SELECT pg_last_xlog_receive_location() AS receive,
       pg_last_xlog_replay_location() AS replay,
       now() - pg_last_xact_replay_timestamp() AS replication_delay,
       (extract(epoch FROM now())
        - extract(epoch FROM pg_last_xact_replay_timestamp()))::int AS lag;
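
For the alerting part, I was thinking of something like this minimal check
(a sketch, assuming it runs on the standby with a 5-minute threshold; cron
could run it via psql and mail any output):

-- Returns a row only when replay lag exceeds the threshold
SELECT now() - pg_last_xact_replay_timestamp() AS replication_delay
WHERE now() - pg_last_xact_replay_timestamp() > interval '5 minutes';

The emailing itself would then live in the cron/shell side, which I've left
out here.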

Thanks for your help!

Thanks,
Ashish Chauhan



Re: [GENERAL] Re: "missing chunk number XX for toast value YY in pg_toast ..." after pg_basebackup.

2016-03-07 Thread Adrian Klaver

On 03/06/2016 10:18 PM, fred...@huitfeldt.com wrote:

> Hi All,
>
> I would really appreciate any help I can get on this issue.
>
> Basically, a pg_basebackup + streaming attach led to a database that we
> could not read from afterwards.

From the original post:

http://www.postgresql.org/message-id/1456919678340.31300.116900@webmail2

"The issue remained until we ran a full vacuum analyze on the cluster."

Which cluster was that, the master or the slave?

"I have logfiles from the incident, but I cannot see anything out of the
ordinary (despite having a fair amount of experience investigating
postgresql logs)."

Can we see the section before and after the ERROR?

> Best regards,
> Fredrik
>
> PS: please advise if this is better posted on another list.



--
Adrian Klaver
adrian.kla...@aklaver.com




Re: [GENERAL] Custom column ordering

2016-03-07 Thread Steven Xu
Hi Emre,

Thanks for your tips! I think I'll do as you suggest and keep it simple
with a single ordering operator.

Although, I do find your output very strange. You wrote ORDER BY port,
which is a text type. Why does Postgres order using the ordering
operators of the "Port" data type rather than the "text" type, even
though you haven't performed a cast?

Steven

----- Emre Hasegeli wrote: -----
To: Steven Xu
From: Emre Hasegeli
Date: 03/05/2016 09:30AM
Cc: "pgsql-general@postgresql.org"
Subject: Re: [GENERAL] Custom column ordering

> - Why is PostgreSQL not using the functional index I created and why is it
> not being ordered correctly?

Your example works for me:

> hasegeli=# CREATE TABLE device_port (port text);
> CREATE TABLE
>
> hasegeli=# CREATE INDEX idx_device_port_port_proper ON device_port (cast_to_port(port) port_ops DESC);
> CREATE INDEX
>
> hasegeli=# INSERT INTO device_port VALUES ('a'), ('b'), ('c');
> INSERT 0 3
>
> hasegeli=# SELECT port FROM device_port ORDER BY port;
>  port
> ------
>  c
>  b
>  a
> (3 rows)
>
> hasegeli=# SET enable_seqscan = 0;
> SET
>
> hasegeli=# EXPLAIN ANALYZE SELECT port FROM device_port ORDER BY cast_to_port(port);
>                                 QUERY PLAN
> ------------------------------------------------------------------------------
>  Index Scan Backward using idx_device_port_port_proper on device_port
>    (cost=0.15..408.55 rows=1360 width=32) (actual time=0.042..0.053 rows=3 loops=1)
>  Planning time: 0.079 ms
>  Execution time: 0.079 ms
> (3 rows)

> - Is creating a separate data type and using a functional index on the
> casts to this data type the right approach to a custom ordering?

You don't need to create a type for this. You can just create a
non-default operator class and use it with your text type by specifying
the operator with the ORDER BY ... USING clause.

> Creating the index:
> CREATE INDEX idx_device_port_port_proper on device_port (cast_to_port(port) port_ops desc);

The operator class is not necessary here, as it is the default for the
"port" type. DESC also wouldn't make any difference.
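
For reference, such a query would look something like this (a sketch only:
the <# ordering operator and the port_ops operator class are placeholder
names, not taken from this thread):

-- Order text by a custom btree ordering operator; <# would have to be
-- the "less than" member of a btree operator class such as port_ops.
SELECT port
FROM device_port
ORDER BY port USING <#;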



Re: [GENERAL] index problems (again)

2016-03-07 Thread Tom Lane
Geoff Winkless  writes:
> So it seems that it should in fact be usable after all. So I'm still
> stumped as to why the (scdate,sc_id) index isn't used :(

Because the other way is estimated to be cheaper.  The estimate is
wrong, because it's based on a statistical assumption that's wrong
(ie that sc_id and scdate are uncorrelated), but it's what we have
to work with at the moment.

As you found upthread, that index could be used in the way you want
if you had an equality condition on scdate.  So the workaround
I'd suggest is to whack the query into that shape.  Something
along the lines of (untested)

select min((select min(sc_id) from legs where scdate = gs))
from generate_series(20160219, 20160221) gs

This would only work well for relatively small ranges of scdate,
but if you had a large range then I think the original plan
would've been fine.

regards, tom lane




Re: [GENERAL] index problems (again)

2016-03-07 Thread Geoff Winkless
On 7 March 2016 at 14:27, I wrote:
> So it seems that it should in fact be usable after all. So I'm still
> stumped as to why the (scdate,sc_id) index isn't used :(

Also, while the index on sc_id will be sorted, there's no guarantee
that sc_id values will be in order in the table itself, so you're
still left with (30,000) potentially random accesses to the table,
even assuming a fully random distribution of scdate (with a worst case
of 97 random accesses). That average case is no better than the
(30,000) random accesses that were required from using an scdate
index, even ignoring the scdate/sc_id index.

So I'm afraid I'm fully back in the "I still don't get it" column.

Geoff




Re: [GENERAL] index problems (again)

2016-03-07 Thread Geoff Winkless
On 7 March 2016 at 14:18, I wrote:
> That assumes that I've not completely misunderstood, of course :)

Always a dangerous assumption, I'm rapidly learning.

The very next section:

   Constraints on columns to the right of these columns are checked
   in the index, so they save visits to the table proper, but they do not
   reduce the portion of the index that has to be scanned.

So it seems that it should in fact be usable after all. So I'm still
stumped as to why the (scdate,sc_id) index isn't used :(

Geoff




Re: [GENERAL] index problems (again)

2016-03-07 Thread Geoff Winkless
On 7 March 2016 at 13:23, Victor Yegorov  wrote:
> Your `sc_id` and `scdate` columns are correlated.

Actually not necessarily, although in the majority case that's mostly true.

> Planner has no such knowledge and assumes columns being independent. Your
> `scdate` predicate is
> estimate to return 14394 rows (based on the EXPLAIN of your first post). I
> think, that this corresponds to
> a quite small portion of your table, less than 1% (based on `Rows Removed by
> Filter: 4068865` from the
> same EXPLAIN). Under uniform distribution, these 14394 rows can be anywhere
> in the table.

Oh I see! To make sure I understand what you're saying: given a fully
random distribution of scdate across the table, searching the whole
table for the first instance of scdate=(20160219|20160220|20160221)
will be fastest, because using the scdate index could end up doing
lots of random accesses across the whole table to get sc_id values,
whereas, assuming a random distribution of scdate values, I would only
expect to have to scan (sequentially) 3% of the table before finding
one of the matching scdate values. Yes?

> Right now (9.5 and earlier versions) I do not know of any options that would
> not require fixing your queries.

Well, I can cope with using the (sc_id,scdate) index (which I guess wins
purely by being index-only and smaller than the table), but it makes
me unhappy.

I think I realised why the planner won't use (scdate,sc_id): the
multicolumn index page
(http://www.postgresql.org/docs/current/static/indexes-multicolumn.html)
suggests that:

   The exact rule is that equality constraints on leading columns,
   plus any inequality constraints on the first column that does not
   have an equality constraint, will be used to limit the portion of
   the index that is scanned.

So since the first column is an inequality constraint, PG won't use
the second column from index(scdate,sc_id) to retrieve the MIN()
value. This explains why it happily uses the index for the
LEAST(MIN(),MIN(),MIN()) version (since each one is an equality
condition); it seems like that rule is just too restrictive, because a
range constraint on the first key, when used in conjunction with an
index-only scan, is almost always going to win, even if the second
constraint matches all of the rows and even if the range constraint
returns all the values in the table (unless the index is larger than
the table itself, I suppose).

I might suggest that perhaps the rule should be relaxed so that an
inequality constraint does not stop the subsequent columns being used
in an index-only scan.

That assumes that I've not completely misunderstood, of course :)

Geoff




Re: [GENERAL] Does a call to a language handler provide a context/session, and somewhere to keep session data?

2016-03-07 Thread Jan de Visser
On March 8, 2016 12:18:08 AM da...@andl.org wrote:
> From: pgsql-general-ow...@postgresql.org
> [mailto:pgsql-general-ow...@postgresql.org] On Behalf Of John R Pierce
> 
> 
> 
> Yes, I was aware of GD and SD. My question is about what facilities Postgres
> provides for implementing such a thing. Where is the proper place for the
> root of the SD/GD? What does an implementation use to determine that two
> calls belong to the same session?
> 
> the process ID is unique for each active session.   of course, the OS can
> recycle a PID when a process/connection terminates
> 
> [dmb>] Thanks for the idea, but I’m wary of using PID for that purpose.
> 
> [dmb>] In the Python implementation the GD appears to just be stored as a
> simple variable at file scope in the DLL. Would I be right in saying that
> the language handler DLL is loaded exactly once for each session (when the
> language is first used)? If so, then any unique identifier allocated in
> PG_init (such as a GUID or timestamp or counter) would seem to serve the
> purpose. I just wondered if there was something clever I hadn’t found out
> about yet.
> 

One thing that's probably key here is that pgsql isn't multi-threaded. 
Individual connections are handled by forked backends, which share a shared-
memory cache that's not accessible by SQL-land code (which includes language 
handlers). So I think your problem goes away once you realize that all the 
data you have is tied to a single connection anyway.

You cannot use multi-threaded code (which touches the database) in language 
handlers or other "plug-in" code.

Also, trying to outsmart the db engine's cache by building your own is usually 
an exercise in futility and often counter-productive. I speak from experience 
:-P





Re: [GENERAL] index problems (again)

2016-03-07 Thread Victor Yegorov
2016-03-07 15:01 GMT+02:00 Geoff Winkless :

> Unfortunately the cost of changing all the code that uses MIN() in
> this way would be higher than just adding an extra index :(
>
> I suppose the thought is that for selecting just the MIN() value, by
> traipsing through the index you immediately find the lowest match - so
> for a dataset where scdate cardinality is higher, this would make
> sense; indeed if I give this query a value with scdate in the low
> range of the table it returns quickly (although still slower than when
> it uses the scdate index).
>
> It seems to me that the weighting the planner applied to this MIN()
> rule is too high, or perhaps it needs to pay more attention to the
> statistics of the indexes for the WHERE clauses?
>
> Even given that, I still don't see why the (scdate,sc_id) index isn't
> perfect for this; it allows the planner to use sc_id for MIN() while
> using scdate to restrict the values. Three values to look up from the
> index-only.
>

Your `sc_id` and `scdate` columns are correlated.

The planner has no such knowledge and assumes the columns are independent.
Your `scdate` predicate is estimated to return 14394 rows (based on the
EXPLAIN in your first post). I think this corresponds to quite a small
portion of your table, less than 1% (based on `Rows Removed by Filter:
4068865` from the same EXPLAIN). Under a uniform distribution, these
14394 rows could be anywhere in the table.
Therefore, reading min values in the order of your PK is optimal, as
you're expected to hit a row that matches the given conditions quite soon.

The problem is that your predicate matches a bunch of rows towards the end
of the table, which causes Postgres to read a big portion of your index
before it finds a row that fits.
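
You can see what the planner does have, per column only, in `pg_stats`
(a sketch; the table and column names are the ones from your thread):

-- Per-column statistics; there is no cross-column entry, which is
-- why the sc_id/scdate correlation is invisible to the planner.
SELECT attname, n_distinct, correlation
FROM pg_stats
WHERE tablename = 'legs' AND attname IN ('sc_id', 'scdate');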


Right now (9.5 and earlier versions) I do not know of any options that
would not require fixing your queries.


P.S. Maybe the `Upper pathification` patch that is being considered for
9.6 can deal with such cases.


-- 
Victor Y. Yegorov


Re: [GENERAL] Does a call to a language handler provide a context/session, and somewhere to keep session data?

2016-03-07 Thread david
From: pgsql-general-ow...@postgresql.org 
[mailto:pgsql-general-ow...@postgresql.org] On Behalf Of John R Pierce



Yes, I was aware of GD and SD. My question is about what facilities Postgres 
provides for implementing such a thing. Where is the proper place for the root 
of the SD/GD? What does an implementation use to determine that two calls 
belong to the same session?

The process ID is unique for each active session. Of course, the OS can
recycle a PID when a process/connection terminates.

[dmb>] Thanks for the idea, but I’m wary of using PID for that purpose.

[dmb>] In the Python implementation the GD appears to just be stored as a 
simple variable at file scope in the DLL. Would I be right in saying that the 
language handler DLL is loaded exactly once for each session (when the language 
is first used)? If so, then any unique identifier allocated in PG_init (such as 
a GUID or timestamp or counter) would seem to serve the purpose. I just 
wondered if there was something clever I hadn’t found out about yet.

 

Regards

David M Bennett FACS


Andl - A New Database Language - andl.org

 



Re: [GENERAL] index problems (again)

2016-03-07 Thread Geoff Winkless
On 7 March 2016 at 11:48, Victor Yegorov  wrote:
> 2016-03-07 13:38 GMT+02:00 Geoff Winkless :
>>
>> # EXPLAIN (ANALYZE,BUFFERS) SELECT MIN(sc_id) FROM legs WHERE scdate
>> BETWEEN 20160219 AND 20160221;
>
>
> Will it help if you'll add `count(*)` to your query like this:
>
> SELECT min(sc_id), count(*) FROM legs WHERE scdate BETWEEN 20160219 AND
> 20160221;

Thanks for the reply.

Yes, that does work around the problem, sort of (although it's only
using the scdate-only index, since it needs all the data):

 Aggregate  (cost=1242.59..1242.60 rows=1 width=4)
   ->  Index Scan using legs_scdate_idx on legs  (cost=0.43..1170.62 rows=14394 width=4)
         Index Cond: ((scdate >= 20160219) AND (scdate <= 20160221))

Unfortunately the cost of changing all the code that uses MIN() in
this way would be higher than just adding an extra index :(

I suppose the thought is that for selecting just the MIN() value, by
traipsing through the index you immediately find the lowest match - so
for a dataset where scdate cardinality is higher, this would make
sense; indeed if I give this query a value with scdate in the low
range of the table it returns quickly (although still slower than when
it uses the scdate index).

It seems to me that the weighting the planner applied to this MIN()
rule is too high, or perhaps it needs to pay more attention to the
statistics of the indexes for the WHERE clauses?

Even given that, I still don't see why the (scdate,sc_id) index isn't
perfect for this; it allows the planner to use sc_id for MIN() while
using scdate to restrict the values. Three values to look up from the
index-only.

If I manually change the query to do what I hoped the planner would do for me:

SELECT LEAST((SELECT MIN(sc_id) FROM legs WHERE scdate = 20160219),
             (SELECT MIN(sc_id) FROM legs WHERE scdate = 20160220),
             (SELECT MIN(sc_id) FROM legs WHERE scdate = 20160221));

it returns in 16ms - and uses the (scdate_sc_id_idx) index as
expected; again though, I can't really justify changing all the code
to do that instead.

Geoff




Re: [GENERAL] index problems (again)

2016-03-07 Thread Victor Yegorov
2016-03-07 13:38 GMT+02:00 Geoff Winkless :

> # EXPLAIN (ANALYZE,BUFFERS) SELECT MIN(sc_id) FROM legs WHERE scdate
> BETWEEN 20160219 AND 20160221;
>

Will it help if you add `count(*)` to your query, like this?

SELECT min(sc_id), count(*) FROM legs WHERE scdate BETWEEN 20160219 AND 20160221;


-- 
Victor Y. Yegorov


[GENERAL] index problems (again)

2016-03-07 Thread Geoff Winkless
Hi all

Firstly, I appreciate that my index problems are fairly difficult to
debug given that I can't upload the data anywhere (it's commercially
sensitive); I tried creating an equivalent dataset for my last problem
using a lot of random() inserts, but unfortunately, even though the
sizes and index cardinality seemed similar, it didn't exhibit the same
problem, which leaves me a bit stuck.

I now have (what seems to me to be) an utterly bizarre situation where
postgres is using the "wrong" index, to the extent where I can't even
begin to comprehend why it would do so.

http://explain.depesz.com/s/uF4L

# EXPLAIN (ANALYZE,BUFFERS) SELECT MIN(sc_id) FROM legs WHERE scdate
BETWEEN 20160219 AND 20160221;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Result  (cost=25.54..25.55 rows=1 width=0) (actual time=25337.593..25337.594 rows=1 loops=1)
   Buffers: shared hit=2976790 read=152188
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.43..25.54 rows=1 width=4) (actual time=25337.579..25337.587 rows=1 loops=1)
           Buffers: shared hit=2976790 read=152188
           ->  Index Scan using legs_sc_id_idx on legs  (cost=0.43..361498.49 rows=14394 width=4) (actual time=25337.578..25337.578 rows=1 loops=1)
                 Index Cond: (sc_id IS NOT NULL)
                 Filter: ((scdate >= 20160219) AND (scdate <= 20160221))
                 Rows Removed by Filter: 4068865
                 Buffers: shared hit=2976790 read=152188
 Planning time: 0.235 ms
 Execution time: 25337.620 ms
(12 rows)

Time: 25338.375 ms

There is an index on scdate,sc_id that (I would have thought) should
be ideal for this query but it's ignored.

sc_id has no null values - it's even _defined_ as NOT NULL. I've no
idea why the planner would think that it needs to use the sc_id index
on this query.

If I create an index on sc_id,scdate, that one is used (index-only
scan) and the query returns in 200ms or so.

http://explain.depesz.com/s/3qNC

=# EXPLAIN (ANALYZE,BUFFERS) SELECT MIN(sc_id) FROM legs WHERE scdate
BETWEEN 20160219 AND 20160221;

                                   QUERY PLAN
--------------------------------------------------------------------------------
 Result  (cost=7.32..7.33 rows=1 width=0) (actual time=207.194..207.194 rows=1 loops=1)
   Buffers: shared hit=1 read=11120
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.43..7.32 rows=1 width=4) (actual time=207.187..207.187 rows=1 loops=1)
           Buffers: shared hit=1 read=11120
           ->  Index Only Scan using legs_sc_id_scdate_idx on legs  (cost=0.43..99204.99 rows=14394 width=4) (actual time=207.185..207.185 rows=1 loops=1)
                 Index Cond: ((sc_id IS NOT NULL) AND (scdate >= 20160219) AND (scdate <= 20160221))
                 Heap Fetches: 0
                 Buffers: shared hit=1 read=11120
 Planning time: 0.236 ms
 Execution time: 207.223 ms



I'm utterly at a loss. There are only 427 distinct scdate values on
this table, but 4 million sc_id values (and the spread across scdate
is reasonably similar - between 6000 and 11000 for each), so using an
index on (just) sc_id makes absolutely no sense (I would expect it to
be slower than a tablescan, no?). I also don't see how sc_id,scdate is
more useful than scdate,sc_id.
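
For reference, the indexes under discussion look something like this (my
reconstruction from the index names in the plans; treat the exact
definitions as assumptions, not copies of the schema):

CREATE INDEX legs_sc_id_idx        ON legs (sc_id);
CREATE INDEX legs_scdate_idx       ON legs (scdate);
CREATE INDEX legs_scdate_sc_id_idx ON legs (scdate, sc_id);
CREATE INDEX legs_sc_id_scdate_idx ON legs (sc_id, scdate);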

Have I completely misunderstood how this is all meant to work? I tried
reading the documentation around understanding EXPLAIN and the slow
query questions in the FAQ/Wiki but what I read didn't really seem to
suggest any investigative steps other than "RUN ANALYZE and VACUUM" -
is there a good doc on how to go about debugging this kind of thing?
Or even on how the planner makes its decisions?

I'm currently at the point where I'm just throwing random indexes at
tables in the vain hope that it might help. I'm fairly sure that
that's suboptimal :)

As before, pg9.5.1, CentOS 6 x64, 4GB RAM, Xeon X3220.
effective_cache_size is set to 3GB (but changing it wildly up or down
doesn't change anything), shared_buffers is 1GB, work_mem is 5242kB
(but changing to anything up to 1GB makes no difference).

Thanks

Geoff




Re: [GENERAL] Does a call to a language handler provide a context/session, and somewhere to keep session data?

2016-03-07 Thread John R Pierce

On 3/7/2016 1:34 AM, da...@andl.org wrote:


Yes, I was aware of GD and SD. My question is about what facilities 
Postgres provides for implementing such a thing. Where is the proper 
place for the root of the SD/GD? What does an _implementation_ use to 
determine that two calls belong to the same session?





The process ID is unique for each active session. Of course, the OS can
recycle a PID when a process/connection terminates.
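
For what it's worth, the backend PID for the current session is also
visible from SQL:

-- Each session is served by one backend process; this returns its PID
SELECT pg_backend_pid();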



--
john r pierce, recycling bits in santa cruz






Re: [GENERAL] multiple UNIQUE indices for FK

2016-03-07 Thread Rafal Pietrak


On 05.03.2016 at 19:53, Francisco Olarte wrote:
> Hi Rafal:
> 
> On Fri, Mar 4, 2016 at 11:46 PM, Rafal Pietrak  wrote:
>> On 04.03.2016 at 18:59, Francisco Olarte wrote:
>>> Make sender_person_id NOT NULL in messages if you want to insure every
>>> message ahs exactly ONE SENDER, leave it out if you want to allow
>>> senderless messages. An FK column must either link to a record or be
>>> null.
>>>
>>> Then, if you want to have a msgs-person ''table'' I would use a view:
>>>
>>> CREATE VIEW msgs_persons as
>>> SELECT message_id, sender_person_id as person_id, 'SENDER' as role from 
>>> messages
>>> UNION ALL
>>> SELECT message_id, recipient_person_id as person_id, 'RECIPIENT' as
>>> role from recipients
>> This was my initial schema .. with the addition of one "super table",
>> that the two above (sender_person_id and recipient_person_id) both
>> inharited from (to avoid the UNION ALL when selecting everything).
> 
> Wuf. I do not like it. I would add a column named sender_person_id to
> messages ( to distinguish its role ) and put a recipient_person_id, or
> just person_id, in recipients ( the role is clear in that table ) to
> avoid problems. Otherwise, what do you call the parent table and the
> fields? It's a naming issue, but I've found the hard way that naming is
> important in these things. Bear in mind you only avoid TYPING the
> union all when selecting everything ( as inheritance DOES do a union
> all, it would have to do it with both kids AND the parent, so it MAY
> be slower ). And you introduce several problems: the naming ones, a
> very strange foreign-key relationship between kids, and the possibility
> of having a row inserted in the parent.

No, no. It was OK. The parent table was appropriately "ruled on insert"
for the inherited tables to work as partitions.

and the table was called "messages" :)

[--]
>> broke the NEXT functionality and I cannot imagine any way to reintroduce
>> it into the new table layouts.
> 
> Which is exactly the functionality of the NEXT column ? I mean, I see
> you have messages with ONE sender and MANY? (Can they be zero? )
> recipients. What are you trying to achieve with it? How are you
> planning to maintain it in your design?

A NULL NEXT indicates the last message inserted, and as I explain below,
this is necessary to access/update the row that loses its "most
recently inserted" status just after that happens.

The access/update of the row that loses its "last" status after a new
insert is sufficiently expensive that it pays to do it once, after the
insert. Otherwise I'd have to perform it for pretty much every row that
is selected, every time it is selected ... which would be overkill.

And please note that if the performance of locating one NULL field in
millions of records becomes a problem, I'm prepared to use a reserved
value (a reserved record, like "a null message") for NEXT instead of
the current NULL.
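
To make that concrete, the update I have in mind happens in an AFTER
INSERT trigger, roughly like this (a sketch only; the table and column
names are illustrative, not my actual schema):

-- Link the row that just lost its "most recently inserted" status
-- to the newly inserted row.
CREATE OR REPLACE FUNCTION link_prev_message() RETURNS trigger AS $$
BEGIN
    UPDATE messages
       SET next = NEW.msg_id
     WHERE sender_person_id = NEW.sender_person_id
       AND next IS NULL
       AND msg_id <> NEW.msg_id;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER messages_link_prev
    AFTER INSERT ON messages
    FOR EACH ROW EXECUTE PROCEDURE link_prev_message();

The real version also has to handle recipients, but this shows where the
single post-insert UPDATE happens.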

> 
>> Now I'm quite stuck here.
> 
> I ask these questions because I think we are in a case of
> http://xyproblem.info/ .

:) this is a good one!!!

Actually, I'm very, very well acquainted with this "XY problem", i.e.
quite often when "end users" ask me for additional functionality, they
(like in the XY case) suggest a "technical solution". And (just like you :) I
always ask: please tell me what you do "traditionally", e.g. "when you are
currently doing this on paper, how does it go, step by step", and then I'll
find a good IT solution for you.

But there is a flip side of this coin.

In the case of complex, intertwined systems, where a well-defined
"critical point" shows up, it's more efficient to extract the "showcase"
that causes the problem and focus on that, instead of digressing
on the overall design (which may be flawed, but cannot be rewritten at this
point).

> 
>> BTW: I'm considering your sugestion of replaceing NEXT with the
>> timestamp. The primary reason for the NEXT is to be able to fetch a row
>> "just preceeding" currently inserted new one AFTER the insert is done
>> (in trigger after), so that some elaborated "statistics" get updated in
>> that "one before" message record. May be timestap would do instead...
> 
> If you are planning on updating the previous row for a message ( or a
> person ? ) on a trigger, this smells fishy. You may have a reason, not

Yes it does.

> knowing what you are exactly planning to do, I cannot tell, but it
> sounds really weird.

I can understand that.

But all I can tell you without the full documentation is that it's
statistics gathering, whose results are subsequently used by almost
every query made; and that it's a major performance hog of the system,
so it has to be done just once, at the moment it becomes well defined,
which is just after the insert of a "next" row.

And it actually does not matter (to me) how the "single row" is located
within the system (i.e. using NEXT or not), but it has to be located
robustly: there must never be a situation where such a last record is
missed or more than one gets updated. I like FK