Re: [HACKERS] Discussion on missing optimizations

2017-10-22 Thread Andreas Seltenreich
Alvaro Herrera writes:

> Andres Freund wrote:
>> Unfortunately it won't help in this specific case (no support for UNION,
>> just UNION ALL), but I thought it might be interesting to reference
>> https://medium.com/@uwdb/introducing-cosette-527898504bd6
>> here.
>
> Interesting.  I thought about a completely different approach -- use a
> fuzzer, which runs each generated query on two servers, one patched one
> not, and compare the results.  Would it be possible to tweak sqlsmith to
> do this?

I think the tweaking needed would be:

1. Log successful queries as well along with a result description.
   Maybe logging returned/affected rows is sufficent.

2. Make it avoid nondeterministic things such as joining
   pg_stat_activity or calling non-immutable functions

The second one is a bit harder and I can't think of a more elegant
solution than adding a blacklisting/whitelisting feature and let the
user do the hard work…

If these are solved though, one could make multiple runs with the same
random seed and query the logging database for differences in the result
descriptions.

regards,
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [sqlsmith] Failed assertion in adjust_appendrel_attrs_mutator

2017-10-22 Thread Andreas Seltenreich
Hi,

testing master as of 7c981590c2, sqlsmith just triggered the following
assertion:

TRAP: FailedAssertion("!(!const Node*)(node))->type) == T_SubLink))", File: 
"prepunion.c", Line: 2231)

I can reproduce it on a vanilla regression database with the following
query:

--8<---cut here---start->8---
explain delete from public.road
where
EXISTS (
  select
  ref_1.tablename
from
  public.renamecolumnchild as ref_0
right join pg_catalog.pg_policies as ref_1
on (true)
where EXISTS (
  select
  public.road.name as c1,
  ref_1.with_check as c3,
  ref_2.b as c6
from
  public.itest4 as ref_2));
--8<---cut here---end--->8---

Backtrace below.

regards,
Andreas

Core was generated by `postgres: smith regression [local] DELETE
'.
Program terminated with signal SIGABRT, Aborted.
#0  0x7f70c2dacfcf in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x7f70c2dae3fa in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x564d8ce587b3 in ExceptionalCondition 
(conditionName=conditionName@entry=0x564d8cfe87b0 "!(!const 
Node*)(node))->type) == T_SubLink))", errorType=errorType@entry=0x564d8cea483d 
"FailedAssertion", fileName=fileName@entry=0x564d8cfea866 "prepunion.c", 
lineNumber=lineNumber@entry=2231) at assert.c:54
#3  0x564d8ccaf7dc in adjust_appendrel_attrs_mutator (node=0x564d91e0d2a0, 
context=0x7fff0a33caa0) at prepunion.c:2231
#4  0x564d8cc4d8ef in expression_tree_mutator (node=0x564d91e0ccd8, 
mutator=0x564d8ccaf1d0 , 
context=0x7fff0a33caa0) at nodeFuncs.c:2554
#5  0x564d8cc4d88f in expression_tree_mutator (node=0x564d91e0df90, 
mutator=mutator@entry=0x564d8ccaf1d0 , 
context=context@entry=0x7fff0a33caa0) at nodeFuncs.c:3008
#6  0x564d8ccaf29f in adjust_appendrel_attrs_mutator (node=0x564d91e0df90, 
context=0x7fff0a33caa0) at prepunion.c:2151
#7  0x564d8cc4dafb in expression_tree_mutator (node=, 
mutator=0x564d8ccaf1d0 , 
context=0x7fff0a33caa0) at nodeFuncs.c:2903
#8  0x564d8cc4e589 in range_table_mutator (rtable=, 
mutator=mutator@entry=0x564d8ccaf1d0 , 
context=context@entry=0x7fff0a33caa0, flags=flags@entry=3) at nodeFuncs.c:3146
#9  0x564d8cc4e78e in query_tree_mutator (query=0x564d91e30c20, 
query@entry=0x564d90fbe090, mutator=mutator@entry=0x564d8ccaf1d0 
, context=context@entry=0x7fff0a33caa0, 
flags=flags@entry=3) at nodeFuncs.c:3096
#10 0x564d8ccb1499 in adjust_appendrel_attrs 
(root=root@entry=0x564d910bab58, node=node@entry=0x564d90fbe090, 
nappinfos=nappinfos@entry=1, appinfos=appinfos@entry=0x7fff0a33cba8) at 
prepunion.c:1964
#11 0x564d8cca3406 in inheritance_planner (root=) at 
planner.c:1200
#12 subquery_planner (glob=glob@entry=0x564d910baac0, 
parse=parse@entry=0x564d90fbe090, parent_root=parent_root@entry=0x0, 
hasRecursion=hasRecursion@entry=0 '\000', 
tuple_fraction=tuple_fraction@entry=0) at planner.c:857
#13 0x564d8cca3dbf in standard_planner (parse=0x564d90fbe090, 
cursorOptions=256, boundParams=0x0) at planner.c:352
#14 0x564d8cd4ef5d in pg_plan_query (querytree=0x564d90fbe090, 
cursorOptions=256, boundParams=0x0) at postgres.c:807
#15 0x564d8cd4f03e in pg_plan_queries (querytrees=, 
cursorOptions=cursorOptions@entry=256, boundParams=boundParams@entry=0x0) at 
postgres.c:873
#16 0x564d8cd4f519 in exec_simple_query (query_string=0x564d8ed5c330 
"[...]") at postgres.c:1048
#17 0x564d8cd51151 in PostgresMain (argc=, 
argv=argv@entry=0x564d8ed66c50, dbname=, username=) at postgres.c:4139
#18 0x564d8ca56032 in BackendRun (port=0x564d8ed6d460) at postmaster.c:4364
#19 BackendStartup (port=0x564d8ed6d460) at postmaster.c:4036
#20 ServerLoop () at postmaster.c:1755
#21 0x564d8ccd628c in PostmasterMain (argc=3, argv=0x564d8ed3c5a0) at 
postmaster.c:1363
#22 0x564d8ca5761d in main (argc=3, argv=0x564d8ed3c5a0) at main.c:228


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallel worker (PID ) exited with exit code 1

2017-10-07 Thread Andreas Seltenreich
Michael Paquier writes:

> On Fri, Oct 6, 2017 at 9:19 PM, tushar  wrote:
>> ERROR:  recovery is not in progress
>
> Perhaps there is a way to blacklist some functions depending on the
> server context. This question may be better asked directly where the
> project is maintained then: https://github.com/anse1/sqlsmith. I am
> adding as well Andreas in CC, he works on sqlsmith.

Blacklisting when testing with sqlsmith typically happens on the error
logging side: Logging into a database via --log-to with the schema
shipped with sqlsmith filters out boring error messages with triggers.
The ERROR reported is actually in the filter tables that ship with
sqlsmith.

I don't think adding such product-specific filtering on the generating
side is necessarily the right thing, as it would reduces code coverage
while testing.  We might miss a core dump when something becomes buggy
in that path.

I did make an exception for syntax errors: sqlsmith notices when grammar
productions consistently lead to errors. This feature was added to allow
testing other products or older postgres, so sqlsmith doesn't waste time
generating upsert statements on 9.1.  It's more apparent when testing
sqlite3: The error rate is 90% on startup, but shrinks to 2% or so after
a couple thousand statements have been generated.

regards,
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] stuck spinlock in pg_stat_get_wal_receiver after OOM

2017-10-03 Thread Andreas Seltenreich
Michael Paquier writes:

> I am attaching a patch that addresses the bugs for the spin lock sections.
> [2. text/x-diff; walreceiver-spin-calls.patch]

I haven't seen a spinlock PANIC since testing with the patch applied.
They occured five times with the same amount of testing done earlier.

regards,
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] crash in RestoreLibraryState during low-memory testing

2017-10-03 Thread Andreas Seltenreich
Tom Lane writes:

> Presumably somebody could dig into the libc source code and prove or
> disprove this, though it would sure help to know exactly what platform
> and version Andreas is testing on.

This is the code in glibc-2.24 around the crash site:

,[ glibc-2.24/elf/dl-load.c:442 ]
|   to_free = cp = expand_dynamic_string_token (l, cp, 1);
|
|   size_t len = strlen (cp);
`

…while expand_dynamic_string_token will indeed return NULL on a failed
malloc.  Code in the most recent glibc looks the same, so I'll carry
this issue over to the glibc bugzilla then.

Sorry about the noise…
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [sqlsmith] crash in RestoreLibraryState during low-memory testing

2017-10-02 Thread Andreas Seltenreich
Hi,

doing low-memory testing with REL_10_STABLE at 1f19550a87 also produced
a couple of parallel worker core dumps with the backtrace below.
Although most of the backtrace is inside the dynamic linker, it looks
like it was passed a pointer to gone-away shared memory.

regards,
Andreas

Core was generated by `postgres: bgworker: parallel worker for PID 24326
'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  strlen () at ../sysdeps/x86_64/strlen.S:106
#1  0x7f5184852a36 in fillin_rpath (rpath=, 
rpath@entry=0x55b692f0d360 "/home/smith/postgres/inst/master/lib", 
result=result@entry=0x55b692f1b380, sep=sep@entry=0x7f5184868060 ":", 
check_trusted=check_trusted@entry=0, what=what@entry=0x7f51848683bd "RUNPATH", 
where=where@entry=0x55b692f2d2f0 
"/home/smith/postgres/inst/master/lib/pgcrypto.so", l=0x55b692f2d330) at 
dl-load.c:444
#2  0x7f5184852daf in decompose_rpath (sps=sps@entry=0x55b692f2d6d8, 
rpath=, l=l@entry=0x55b692f2d330, what=what@entry=0x7f51848683bd 
"RUNPATH") at dl-load.c:618
#3  0x7f5184852ef7 in cache_rpath (l=l@entry=0x55b692f2d330, 
sp=sp@entry=0x55b692f2d6d8, tag=tag@entry=29, what=what@entry=0x7f51848683bd 
"RUNPATH") at dl-load.c:652
#4  0x7f5184853c62 in cache_rpath (what=0x7f51848683bd "RUNPATH", tag=29, 
sp=0x55b692f2d6d8, l=0x55b692f2d330) at dl-load.c:2307
#5  _dl_map_object (loader=0x55b692f2d330, name=0x7f517f300cc3 "libz.so.1", 
type=2, trace_mode=0, mode=, nsid=) at 
dl-load.c:2314
#6  0x7f5184857e70 in openaux (a=a@entry=0x7ffd4f686130) at dl-deps.c:63
#7  0x7f518485a4f4 in _dl_catch_error 
(objname=objname@entry=0x7ffd4f686128, 
errstring=errstring@entry=0x7ffd4f686120, 
mallocedp=mallocedp@entry=0x7ffd4f68611f, operate=operate@entry=0x7f5184857e40 
, args=args@entry=0x7ffd4f686130) at dl-error.c:187
#8  0x7f51848580df in _dl_map_object_deps (map=map@entry=0x55b692f2d330, 
preloads=preloads@entry=0x0, npreloads=npreloads@entry=0, 
trace_mode=trace_mode@entry=0, open_mode=open_mode@entry=-2147483648) at 
dl-deps.c:254
#9  0x7f518485ea02 in dl_open_worker (a=a@entry=0x7ffd4f6863c0) at 
dl-open.c:280
#10 0x7f518485a4f4 in _dl_catch_error 
(objname=objname@entry=0x7ffd4f6863b0, 
errstring=errstring@entry=0x7ffd4f6863b8, 
mallocedp=mallocedp@entry=0x7ffd4f6863af, operate=operate@entry=0x7f518485e8f0 
, args=args@entry=0x7ffd4f6863c0) at dl-error.c:187
#11 0x7f518485e489 in _dl_open (file=0x55b692f2d2b0 
"/home/smith/postgres/inst/master/lib/pgcrypto.so", mode=-2147483390, 
caller_dlopen=0x55b691cb4c7e , nsid=-2, 
argc=, argv=, env=0x55b692eef880) at dl-open.c:660
#12 0x7f5184020ee9 in dlopen_doit (a=a@entry=0x7ffd4f6865f0) at dlopen.c:66
#13 0x7f518485a4f4 in _dl_catch_error (objname=0x55b692eef6d0, 
errstring=0x55b692eef6d8, mallocedp=0x55b692eef6c8, operate=0x7f5184020e90 
, args=0x7ffd4f6865f0) at dl-error.c:187
#14 0x7f5184021521 in _dlerror_run (operate=operate@entry=0x7f5184020e90 
, args=args@entry=0x7ffd4f6865f0) at dlerror.c:163
#15 0x7f5184020f82 in __dlopen (file=, mode=mode@entry=258) 
at dlopen.c:87
#16 0x55b691cb4c7e in internal_load_library 
(libname=libname@entry=0x7f51848be7f8 ) at dfmgr.c:231
#17 0x55b691cb5928 in RestoreLibraryState (start_address=0x7f51848be7f8 
) at dfmgr.c:754
#18 0x55b6919459d9 in ParallelWorkerMain (main_arg=) at 
parallel.c:1030
#19 0x55b691b23746 in StartBackgroundWorker () at bgworker.c:835
#20 0x55b691b2faf5 in do_start_bgworker (rw=0x55b692f0e050) at 
postmaster.c:5680
#21 maybe_start_bgworkers () at postmaster.c:5884
#22 0x55b691b305c8 in sigusr1_handler (postgres_signal_arg=) 
at postmaster.c:5073
#23 
#24 0x7f5183a5f273 in __select_nocancel () at 
../sysdeps/unix/syscall-template.S:84
#25 0x55b6918b8c0b in ServerLoop () at postmaster.c:1717
#26 0x55b691b31c65 in PostmasterMain (argc=3, argv=0x55b692eea5f0) at 
postmaster.c:1361
#27 0x55b6918bac4d in main (argc=3, argv=0x55b692eea5f0) at main.c:228


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [sqlsmith] stuck spinlock in pg_stat_get_wal_receiver after OOM

2017-10-02 Thread Andreas Seltenreich
Hi,

low-memory testing with REL_10_STABLE at 1f19550a87 produced the
following PANIC:

stuck spinlock detected at pg_stat_get_wal_receiver, walreceiver.c:1397

I was about to wrap the pstrdup()s with a PG_TRY block, but I can't find
a spinlock being released in a PG_CATCH block anywhere, so maybe that's
a bad idea?

regards,
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Server crash (FailedAssertion) due to catcache refcount mis-handling

2017-08-13 Thread Andreas Seltenreich
Tom Lane writes:
> I wonder if Andreas would be interested in trying the randomly-timed-
> SIGTERM thing with sqlsmith.

So far, most of the core dumps generated are Jeevan's assertion failing
with backtraces through SearchCatCacheList.  The rest is failing this
assertion:

TRAP: FailedAssertion("!(portal->cleanup == ((void *)0))", File: 
"portalmem.c", Line: 846)

Example backtrace below.  They all happened during a rollback statement.
Testing was done on master at 2336f84284.

regards,
Andreas

Core was generated by `postgres: smith regression [local] ROLLBACK  
'.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:58
#1  0x7f4c26d3240a in __GI_abort () at abort.c:89
#2  0x559d18897a73 in ExceptionalCondition 
(conditionName=conditionName@entry=0x559d18a81370 "!(portal->cleanup == ((void 
*)0))", errorType=errorType@entry=0x559d188e3f7d "FailedAssertion", 
fileName=fileName@entry=0x559d18a81013 "portalmem.c", 
lineNumber=lineNumber@entry=846) at assert.c:54
#3  0x559d188c42f1 in AtCleanup_Portals () at portalmem.c:846
#4  0x559d18536cb7 in CleanupTransaction () at xact.c:2652
#5  0x559d1853b825 in AbortOutOfAnyTransaction () at xact.c:4278
#6  0x559d188a7799 in ShutdownPostgres (code=, 
arg=) at postinit.c:1146
#7  0x559d1876b4e9 in shmem_exit (code=code@entry=1) at ipc.c:228
#8  0x559d1876b5fa in proc_exit_prepare (code=code@entry=1) at ipc.c:185
#9  0x559d1876b688 in proc_exit (code=code@entry=1) at ipc.c:102
#10 0x559d188999b1 in errfinish (dummy=) at elog.c:543
#11 0x559d1878fefa in ProcessInterrupts () at postgres.c:2841
#12 0x559d18790829 in ProcessInterrupts () at postgres.c:2828
#13 0x559d18795395 in PortalRunMulti (portal=portal@entry=0x559d197f2bf0, 
isTopLevel=isTopLevel@entry=1 '\001', setHoldSnapshot=setHoldSnapshot@entry=0 
'\000', dest=dest@entry=0x559d19850c40, altdest=altdest@entry=0x559d19850c40, 
completionTag=completionTag@entry=0x7ffc04f1b560 "") at pquery.c:1239
#14 0x559d18796069 in PortalRun (portal=portal@entry=0x559d197f2bf0, 
count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', 
run_once=run_once@entry=1 '\001', dest=dest@entry=0x559d19850c40, 
altdest=altdest@entry=0x559d19850c40, completionTag=0x7ffc04f1b560 "") at 
pquery.c:799
#15 0x559d18791dca in exec_simple_query (query_string=0x559d1984fe00 
"ROLLBACK;") at postgres.c:1099
#16 0x559d18793af1 in PostgresMain (argc=, 
argv=argv@entry=0x559d197fa078, dbname=, username=) at postgres.c:4090
#17 0x559d184a3428 in BackendRun (port=0x559d197e8f00) at postmaster.c:4357
#18 BackendStartup (port=0x559d197e8f00) at postmaster.c:4029
#19 ServerLoop () at postmaster.c:1753
#20 0x559d1871ad65 in PostmasterMain (argc=3, argv=0x559d197be5a0) at 
postmaster.c:1361
#21 0x559d184a4a6d in main (argc=3, argv=0x559d197be5a0) at main.c:228


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Server crash (FailedAssertion) due to catcache refcount mis-handling

2017-08-10 Thread Andreas Seltenreich
Tom Lane writes:

> I wonder if Andreas would be interested in trying the randomly-timed-
> SIGTERM thing with sqlsmith.

Will do.  Won't miss this chance to try out discostu's extension
pg_rage_terminator[1] :-)

regards,
Andreas

Footnotes: 
[1]  https://github.com/disco-stu/pg_rage_terminator


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Planner crash on foreign table join

2017-04-09 Thread Andreas Seltenreich
Tom Lane writes:

> I made the attached quick-hack patch, and found that check-world
> passes just fine with it.  That's not complete proof that we have
> no other bugs of this ilk, but it definitely supports the idea
> that we don't really need to add the overhead.  I'll just put this
> in the archives for possible future reference.
>
> (Or perhaps Andreas would like to try bashing on a copy with this
> installed.)

I certainly do :-).  SQLsmith has been fuzzing for couple hours with the
patch applied, and so far none of the assertions fired.  I'll leave the
patch on my fuzzing branch until merging becomes burdensome.

regards,
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [sqlsmith] ERROR: badly formatted node string "RESTRICTINFO...

2017-04-09 Thread Andreas Seltenreich
Hi,

I see the above ERROR logged a lot when testing master at eef8c0069e
with a postgres_fdw around.  Below is a recipe to reproduce it on top of
the regression DB.

regards,
Andreas

create extension postgres_fdw;
create server myself foreign data wrapper postgres_fdw;
create schema fdw_postgres;
create user fdw login;
grant all on schema public to fdw;
grant all on all tables in schema public to fdw;
create user mapping for public server myself options (user 'fdw');
import foreign schema public from server myself into fdw_postgres;

set max_parallel_workers_per_gather = 8;
set parallel_setup_cost = 0;
set parallel_tuple_cost = 0;

regression=> select (select max(result) from fdw_postgres.num_result) from tt0;
ERROR:  badly formatted node string "RESTRICTINFO :clause {NULLTEST :"...
CONTEXT:  parallel worker
regression=> explain select (select max(result) from fdw_postgres.num_result) 
from tt0;
QUERY PLAN
---
 Gather  (cost=100.06..122.51 rows=1947 width=32)
   Workers Planned: 1
   InitPlan 2 (returns $1)
 ->  Result  (cost=100.05..100.06 rows=1 width=32)
   InitPlan 1 (returns $0)
 ->  Limit  (cost=100.00..100.05 rows=1 width=74)
   ->  Foreign Scan on num_result  (cost=100.00..138.64 
rows=831 width=74)
   ->  Append  (cost=0.00..22.45 rows=1145 width=0)
 ->  Parallel Seq Scan on tt0  (cost=0.00..2.04 rows=104 width=0)
 ->  Parallel Seq Scan on tt6  (cost=0.00..20.41 rows=1041 width=0)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [sqlsmith] Planner crash on foreign table join

2017-04-08 Thread Andreas Seltenreich
Hi,

testing master at f0e44021df with a loopback postgres_fdw installed, I
see lots of crashes on queries joining foreign tables with various
expressions.  Below is a reduced recipe for the regression database and
a backtrace.

regards,
Andreas

--8<---cut here---start->8---
create extension postgres_fdw;
create server myself foreign data wrapper postgres_fdw;
create schema fdw_postgres;
create user fdw login;
grant all on schema public to fdw;
grant all on all tables in schema public to fdw;
create user mapping for public server myself options (user 'fdw');
import foreign schema public from server myself into fdw_postgres;

explain select from
  fdw_postgres.hslot
left join fdw_postgres.num_exp_div
on ((exists (values (1))) and (values (1)) is null);
--8<---cut here---end--->8---

Program terminated with signal SIGSEGV, Segmentation fault.
#0  bms_get_singleton_member (a=0x10, member=member@entry=0x7fffb577cafc) at 
bitmapset.c:577
#1  0x56425107b531 in find_relation_from_clauses (clauses=0x564251a68570, 
root=0x564251a273d8) at clausesel.c:445
#2  clauselist_selectivity (root=root@entry=0x564251a273d8, 
clauses=0x564251a68570, varRelid=varRelid@entry=0, jointype=JOIN_LEFT, 
sjinfo=0x564251a661c0) at clausesel.c:128
#3  0x7f61d3d9f22f in postgresGetForeignJoinPaths (root=, 
joinrel=0x564251a66ba8, outerrel=, innerrel=, 
jointype=, extra=0x7fffb577cc50) at postgres_fdw.c:4466
#4  0x56425108a238 in add_paths_to_joinrel (root=root@entry=0x564251a273d8, 
joinrel=joinrel@entry=0x564251a66ba8, outerrel=outerrel@entry=0x564251a65378, 
innerrel=innerrel@entry=0x564251a65f30, jointype=jointype@entry=JOIN_LEFT, 
sjinfo=sjinfo@entry=0x564251a661c0, restrictlist=0x564251a681c8) at 
joinpath.c:278
#5  0x56425108bff2 in populate_joinrel_with_paths (restrictlist=, sjinfo=0x564251a661c0, joinrel=0x564251a66ba8, rel2=0x564251a65f30, 
rel1=0x564251a65378, root=0x564251a273d8) at joinrels.c:795
#6  make_join_rel (root=root@entry=0x564251a273d8, 
rel1=rel1@entry=0x564251a65378, rel2=rel2@entry=0x564251a65f30) at 
joinrels.c:731
#7  0x56425108c7ef in make_rels_by_clause_joins (other_rels=, old_rel=, root=) at joinrels.c:277
#8  join_search_one_level (root=root@entry=0x564251a273d8, level=level@entry=2) 
at joinrels.c:99
#9  0x564251079bdb in standard_join_search (root=0x564251a273d8, 
levels_needed=2, initial_rels=) at allpaths.c:2385
#10 0x56425107ac7b in make_one_rel (root=root@entry=0x564251a273d8, 
joinlist=joinlist@entry=0x564251a65998) at allpaths.c:184
#11 0x564251099ef4 in query_planner (root=root@entry=0x564251a273d8, 
tlist=tlist@entry=0x0, qp_callback=qp_callback@entry=0x56425109aeb0 
, qp_extra=qp_extra@entry=0x7fffb577cff0) at 
planmain.c:253
#12 0x56425109dbc2 in grouping_planner (root=root@entry=0x564251a273d8, 
inheritance_update=inheritance_update@entry=0 '\000', tuple_fraction=, tuple_fraction@entry=0) at planner.c:1684
#13 0x5642510a0133 in subquery_planner (glob=glob@entry=0x564251a2a6d0, 
parse=parse@entry=0x5642519aac60, parent_root=parent_root@entry=0x0, 
hasRecursion=hasRecursion@entry=0 '\000', 
tuple_fraction=tuple_fraction@entry=0) at planner.c:833
#14 0x5642510a0f71 in standard_planner (parse=0x5642519aac60, 
cursorOptions=256, boundParams=0x0) at planner.c:333
#15 0x5642511458cd in pg_plan_query 
(querytree=querytree@entry=0x5642519aac60, cursorOptions=256, 
boundParams=boundParams@entry=0x0) at postgres.c:802
#16 0x564250fa9a40 in ExplainOneQuery (query=0x5642519aac60, 
cursorOptions=, into=0x0, es=0x564251a513a0, 
queryString=0x564251a09590 "explain select from\n  fdw_postgres.hslot\nleft 
join fdw_postgres.num_exp_div\non ((exists (values (1))) and (values (1)) 
is null);", params=0x0, queryEnv=0x0) at explain.c:367
#17 0x564250faa005 in ExplainQuery (pstate=pstate@entry=0x564251a511f0, 
stmt=stmt@entry=0x564251a0aa58, queryString=queryString@entry=0x564251a09590 
"explain select from\n  fdw_postgres.hslot\nleft join 
fdw_postgres.num_exp_div\non ((exists (values (1))) and (values (1)) is 
null);", params=params@entry=0x0, queryEnv=queryEnv@entry=0x0, 
dest=dest@entry=0x564251a51308) at explain.c:256
#18 0x56425114b9cb in standard_ProcessUtility (pstmt=0x564251a0b2e0, 
queryString=0x564251a09590 "explain select from\n  fdw_postgres.hslot\nleft 
join fdw_postgres.num_exp_div\non ((exists (values (1))) and (values (1)) 
is null);", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, 
dest=0x564251a51308, completionTag=0x7fffb577d390 "") at utility.c:680
#19 0x5642511487b4 in PortalRunUtility (portal=0x564251a07580, 
pstmt=0x564251a0b2e0, isTopLevel=, setHoldSnapshot=, dest=, completionTag=0x7fffb577d390 "") at pquery.c:1179
#20 0x564251149633 in FillPortalStore (portal=portal@entry=0x564251a07580, 
isTopLevel=isTopLevel@entry=1 '\001') at pquery.c:1039
#21 0x56425114a21d in PortalRun 

Re: [HACKERS] [sqlsmith] Failed assertion in _hash_kill_items/MarkBufferDirtyHint

2017-04-01 Thread Andreas Seltenreich
Andreas Seltenreich writes:
>>> TRAP: FailedAssertion("!(LWLockHeldByMe(((LWLock*) 
>>> (&(bufHdr)->content_lock", File: "bufmgr.c", Line: 3397)
> I got about one TRAP per hour when testing on 20 nodes with one postgres
> and 5 sqlsmithes on each.

> Ashutosh Sharma writes:
>> [2. reacquire_lock_hashkillitems_if_required.patch]

I've done about 12 hours of testing since applying this patch and no
failed assertions so far.

regards,
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Unpinning error in parallel worker

2017-03-29 Thread Andreas Seltenreich
Thomas Munro writes:

> Based on feedback on another thread about how to make reviewers' and
> committers' jobs easier, here is a format-patch version with a short
> description as raw material for a commit message, in case that is
> helpful.

+1

It's quite convenient.  Otherwise I have to be creative myself writing
commit messages in order to keep track of which patches are applied on
the branch sqlsmith is crunching on.

regards,
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Failed assertion in _hash_kill_items/MarkBufferDirtyHint

2017-03-27 Thread Andreas Seltenreich
Ashutosh Sharma writes:
>> TRAP: FailedAssertion("!(LWLockHeldByMe(((LWLock*) 
>> (&(bufHdr)->content_lock", File: "bufmgr.c", Line: 3397)
> Thanks for reporting this problem. Could you please let me know on for
> how long did you run sqlsmith to get this crash.

I got about one TRAP per hour when testing on 20 nodes with one postgres
and 5 sqlsmithes on each.  Nodes are tiny consumer machines with
low-power 4-core sandy bridges.

> [2. reacquire_lock_hashkillitems_if_required.patch]

I'll test with your patch applied as soon as time permits and report
back.

regards,
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [sqlsmith] Unpinning error in parallel worker

2017-03-26 Thread Andreas Seltenreich
Hi,

today's testing with master as of d253b0f6e3 yielded two clusters that
stopped processing queries.  Symptoms:

regression=> select application_name, state, wait_event, wait_event_type, 
count(1), min(pid)
 from pg_stat_activity group by 1,2,3,4;
 application_name | state  |   wait_event   | wait_event_type | 
count | min  
--+++-+---+--
 psql | active || | 
1 | 3781
 psql | active | DynamicSharedMemoryControlLock | LWLock  | 
2 | 3272
 sqlsmith::dut| active | DynamicSharedMemoryControlLock | LWLock  | 
9 | 2726
 sqlsmith::dut| active | MessageQueueSend   | IPC | 
1 | 2625
 sqlsmith::dut| active | BgWorkerShutdown   | IPC | 
1 | 3655
 sqlsmith::schema | idle   | ClientRead | Client  | 
9 | 3634
(6 rows)

regression=> select application_name, state, wait_event, wait_event_type, 
count(1), min(pid)
 from pg_stat_activity group by 1,2,3,4;
 application_name | state  |   wait_event   | wait_event_type | 
count |  min  
--+++-+---+---
 psql | active || | 
1 | 29645
 sqlsmith::dut| active | DynamicSharedMemoryControlLock | LWLock  | 
5 | 23167
 sqlsmith::schema | idle   | ClientRead | Client  | 
5 |  1169
 sqlsmith::dut| active | BgWorkerShutdown   | IPC | 
1 |  1178
(4 rows)

Both sport the following last error in their logfiles:

FATAL:  cannot unpin a segment that is not pinned

Below are the backtraces of the processes throwing them.

regards,
Andreas

Backtrace on dwagon:

#0  sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
#1  0x006b7c02 in PGSemaphoreLock (sema=0x7f8aa19b20d8) at pg_sema.c:310
#2  0x00726b04 in LWLockAcquire (lock=0x7f8aa19b3400, 
mode=LW_EXCLUSIVE) at lwlock.c:1233
#3  0x00710b56 in dsm_detach (seg=0x12cf6a0) at dsm.c:760
#4  0x00710e6e in dsm_backend_shutdown () at dsm.c:643
#5  0x0071144f in shmem_exit (code=code@entry=1) at ipc.c:248
#6  0x0071150e in proc_exit_prepare (code=code@entry=1) at ipc.c:185
#7  0x00711588 in proc_exit (code=code@entry=1) at ipc.c:102
#8  0x008352ff in errfinish (dummy=dummy@entry=0) at elog.c:543
#9  0x00838a8c in elog_finish (elevel=, fmt=0x9e8e70 
"cannot unpin a segment that is not pinned") at elog.c:1378
#10 0x007111ff in dsm_unpin_segment (handle=545777640) at dsm.c:917
#11 0x00858205 in dsa_release_in_place (place=0x7f8aa524a178) at 
dsa.c:617
#12 0x00710ae0 in dsm_detach (seg=0x12cf6a0) at dsm.c:734
#13 0x00710e6e in dsm_backend_shutdown () at dsm.c:643
#14 0x0071144f in shmem_exit (code=code@entry=1) at ipc.c:248
#15 0x0071150e in proc_exit_prepare (code=code@entry=1) at ipc.c:185
#16 0x00711588 in proc_exit (code=code@entry=1) at ipc.c:102
#17 0x008352ff in errfinish (dummy=) at elog.c:543
#18 0x0073425c in ProcessInterrupts () at postgres.c:2837
#19 0x004baeb5 in heapgetpage (scan=scan@entry=0x1376138, 
page=page@entry=0) at heapam.c:373
#20 0x004bbaf0 in heapgettup_pagemode (key=0x1376550, nkeys=1, 
dir=ForwardScanDirection, scan=0x1376138) at heapam.c:830
#21 heap_getnext (scan=0x1376138, 
direction=direction@entry=ForwardScanDirection) at heapam.c:1804
#22 0x004cf2be in systable_getnext (sysscan=sysscan@entry=0x13760e0) at 
genam.c:435
#23 0x00823760 in ScanPgRelation (targetRelId=, 
indexOK=, force_non_historic=force_non_historic@entry=0 '\000') 
at relcache.c:357
#24 0x00823ad3 in RelationReloadIndexInfo 
(relation=relation@entry=0x7f8aa52321e8) at relcache.c:2235
#25 0x00827dd8 in RelationIdGetRelation 
(relationId=relationId@entry=2662) at relcache.c:2091
#26 0x004bb088 in relation_open (relationId=relationId@entry=2662, 
lockmode=lockmode@entry=1) at heapam.c:1128
#27 0x004cf8f6 in index_open (relationId=relationId@entry=2662, 
lockmode=lockmode@entry=1) at indexam.c:155
#28 0x004cf226 in systable_beginscan 
(heapRelation=heapRelation@entry=0x7f8aa10e64a0, indexId=2662, 
indexOK=, snapshot=snapshot@entry=0x0, nkeys=1, 
key=key@entry=0x7ffe7b2bd260) at genam.c:340
#29 0x0081ef9c in SearchCatCache (cache=0x132cd10, v1=v1@entry=16528, 
v2=v2@entry=0, v3=v3@entry=0, v4=v4@entry=0) at catcache.c:1234
#30 0x0082d46e in SearchSysCache (cacheId=cacheId@entry=46, 
key1=key1@entry=16528, key2=key2@entry=0, key3=key3@entry=0, key4=key4@entry=0) 
at syscache.c:1108
#31 0x00530b8c in pg_class_aclmask (table_oid=table_oid@entry=16528, 

[HACKERS] [sqlsmith] Failed assertion in _hash_kill_items/MarkBufferDirtyHint

2017-03-26 Thread Andreas Seltenreich
Hi,

testing with master as of cf366e97ff, sqlsmith occasionally triggers the
following assertion:

TRAP: FailedAssertion("!(LWLockHeldByMe(((LWLock*) 
(&(bufHdr)->content_lock", File: "bufmgr.c", Line: 3397)

Backtraces always look like the one below.  It is reproducible on a
cluster once it happens.  I could provide a tarball if needed.

regards,
Andreas

#2  0x008324b1 in ExceptionalCondition 
(conditionName=conditionName@entry=0x9e4e28 "!(LWLockHeldByMe(((LWLock*) 
(&(bufHdr)->content_lock", errorType=errorType@entry=0x87b03d 
"FailedAssertion", fileName=fileName@entry=0x9e5856 "bufmgr.c", 
lineNumber=lineNumber@entry=3397) at assert.c:54
#3  0x00706971 in MarkBufferDirtyHint (buffer=2844, 
buffer_std=buffer_std@entry=1 '\001') at bufmgr.c:3397
#4  0x004b3ecd in _hash_kill_items (scan=scan@entry=0x66dcf70) at 
hashutil.c:514
#5  0x004a9c1b in hashendscan (scan=0x66dcf70) at hash.c:512
#6  0x004cf17a in index_endscan (scan=0x66dcf70) at indexam.c:353
#7  0x0061fa51 in ExecEndIndexScan (node=0x3093f30) at 
nodeIndexscan.c:852
#8  0x00608e59 in ExecEndNode (node=) at 
execProcnode.c:715
#9  0x006045b8 in ExecEndPlan (estate=0x3064000, planstate=) at execMain.c:1540
#10 standard_ExecutorEnd (queryDesc=0x30cb880) at execMain.c:487
#11 0x005c87b0 in PortalCleanup (portal=0x1a60060) at portalcmds.c:302
#12 0x0085cbb3 in PortalDrop (portal=0x1a60060, isTopCommit=) at portalmem.c:489
#13 0x00736ed2 in exec_simple_query (query_string=0x315b7a0 "...") at 
postgres.c:
#14 0x00738b51 in PostgresMain (argc=, 
argv=argv@entry=0x1a6c6c8, dbname=, username=) at 
postgres.c:4071
#15 0x00475fef in BackendRun (port=0x1a65b90) at postmaster.c:4317
#16 BackendStartup (port=0x1a65b90) at postmaster.c:3989
#17 ServerLoop () at postmaster.c:1729
#18 0x006c8662 in PostmasterMain (argc=argc@entry=4, 
argv=argv@entry=0x1a3f540) at postmaster.c:1337
#19 0x0047729d in main (argc=4, argv=0x1a3f540) at main.c:228


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improving RLS planning

2017-01-19 Thread Andreas Seltenreich
Tom Lane writes:

> Thanks for reviewing --- I'll do something with that test case and
> push it.

sqlsmith doesn't seem to like 215b43cdc:

select 1 from information_schema.usage_privileges
where information_schema._pg_keysequal(
   (select null::smallint[]),
   '{16,25,23}');

-- TRAP: FailedAssertion("!(!and_clause((Node *) clause))", File: 
"restrictinfo.c", Line: 81)

regards,
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Short reads in hash indexes

2016-12-29 Thread Andreas Seltenreich
Amit Kapila writes:

> Can you please try with the patch posted on hash index thread [1] to
> see if you can reproduce any of these problems?
>
> [1] - 
> https://www.postgresql.org/message-id/CAA4eK1Kf6tOY0oVz_SEdngiNFkeXrA3xUSDPPORQvsWVPdKqnA%40mail.gmail.com

I'm no longer seeing the failed assertions nor short reads since these
patches are in.

regards,
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Crash reading pg_stat_activity

2016-12-29 Thread Andreas Seltenreich
Thomas Munro writes:

> [2. text/x-diff; fix-dsa-tranche-registration.patch]

Fuzzing with the patch applied hasn't triggered the crash so far.  It
did happen 5 times with the same amount of fuzzing before patching.

regards,
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Crash reading pg_stat_activity

2016-12-27 Thread Andreas Seltenreich
Andreas Seltenreich writes:

> Thomas Munro writes:
>
>> It is safe, as long as the segment remains mapped.  Each backend that
>> attaches calls LWLockRegisterTranche giving it the address of the name
>> in its virtual address space.
>
> Hmok, I was under the impression only backends participating in the IPC
> call the attach function, not necessarily the ones that could possible
> want to resolve the wait_event_info they found in the procArray via
> pgstat_get_wait_event().

Erm, ignore that question: They'll find a NULL in their
LWLockTrancheArray and run into the "extension" case you mentioned.

> But I really feel like I need to study the code a bit more before
> commenting further…

Following this advise now :-)

regards,
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Crash reading pg_stat_activity

2016-12-27 Thread Andreas Seltenreich
Thomas Munro writes:

> On Wed, Dec 28, 2016 at 11:38 AM, Andreas Seltenreich
> <seltenre...@gmx.de> wrote:
>> Thomas Munro writes:
>>
>>> On Wed, Dec 28, 2016 at 10:40 AM, Thomas Munro 
>>> <thomas.mu...@enterprisedb.com> wrote:
>>>> On Wed, Dec 28, 2016 at 10:01 AM, Andreas Seltenreich <seltenre...@gmx.de> 
>>>> wrote:
>>>>> testing master as of fe591f8bf6 produced a crash reading
>>>>> pg_stat_activity (backtrace below).  Digging around with with gdb
>>>>> revealed that pgstat_get_wait_event() returned an invalid pointer for a
>>>>> classId PG_WAIT_LWLOCK.
>>>>>
>>>>> I think the culprit is dsa.c passing a pointer to memory that goes away
>>>>> on dsa_free() as a name to LWLockRegisterTranche.
>> [..]
>>>> Maybe we should replace it with another value when the DSA area is
>>>> detached, using a constant string.  Something like
>>
>> I'm wondering: Is it safe to pass a pointer into a DSA at all?  If I
>> understand the comments correctly, they are not necessarily mapped (at
>> the same address) in an unrelated backend looking into pg_stat_activity,
>> and in this case a dsa_free() is not actually needed to trigger a crash.
>
> It is safe, as long as the segment remains mapped.  Each backend that
> attaches calls LWLockRegisterTranche giving it the address of the name
> in its virtual address space.

Hmok, I was under the impression only backends participating in the IPC
call the attach function, not necessarily the ones that could possible
want to resolve the wait_event_info they found in the procArray via
pgstat_get_wait_event().

>> Maybe instead of copying the name, just put the passed pointer itself
>> into the area?  Extensions using LWLockNewTrancheId need to use
>> shared_preload_libraries anyway, so static strings would be mapped in
>> all backends.
>
> Yeah that would be another way.  I had this idea that only the process
> that creates a DSA area should name it, and then processes attaching
> would see the existing tranche ID and name, so could use a narrower
> interface.  We could instead do as you say and make processes that
> attach provide a pointer to the name too, and make it the caller's
> problem to ensure that the pointers remain valid long enough; or go
> one step further and make them register/unregister it themselves.

Hmm, turning the member of the control struct
char lwlock_tranche_name[DSA_MAXLEN];
into
const char *lwlock_tranche_name;
and initializing it with the passed static const char * instead of
copying wouldn't require a change of the interface, would it?

But I really feel like I need to study the code a bit more before
commenting further…

regards,
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Crash reading pg_stat_activity

2016-12-27 Thread Andreas Seltenreich
Thomas Munro writes:

> On Wed, Dec 28, 2016 at 10:40 AM, Thomas Munro 
> <thomas.mu...@enterprisedb.com> wrote:
>> On Wed, Dec 28, 2016 at 10:01 AM, Andreas Seltenreich <seltenre...@gmx.de> 
>> wrote:
>>> testing master as of fe591f8bf6 produced a crash reading
>>> pg_stat_activity (backtrace below).  Digging around with with gdb
>>> revealed that pgstat_get_wait_event() returned an invalid pointer for a
>>> classId PG_WAIT_LWLOCK.
>>>
>>> I think the culprit is dsa.c passing a pointer to memory that goes away
>>> on dsa_free() as a name to LWLockRegisterTranche.
[..]
>> Maybe we should replace it with another value when the DSA area is
>> detached, using a constant string.  Something like

I'm wondering: Is it safe to pass a pointer into a DSA at all?  If I
understand the comments correctly, they are not necessarily mapped (at
the same address) in an unrelated backend looking into pg_stat_activity,
and in this case a dsa_free() is not actually needed to trigger a crash.

> That line of thinking suggests another potential solution: go and
> register the name in RegisterLWLockTranches, and stop registering it
> in dsa.c.  For other potential uses of DSA including extensions that
> call LWLockNewTrancheId() we'd have to decide whether to make the name
> an optional argument, or require those users to register it
> themselves.

Maybe instead of copying the name, just put the passed pointer itself
into the area?  Extensions using LWLockNewTrancheId need to use
shared_preload_libraries anyway, so static strings would be mapped in
all backends.

regards,
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [sqlsmith] Crash reading pg_stat_activity

2016-12-27 Thread Andreas Seltenreich
Hi,

testing master as of fe591f8bf6 produced a crash reading
pg_stat_activity (backtrace below).  Digging around with with gdb
revealed that pgstat_get_wait_event() returned an invalid pointer for a
classId PG_WAIT_LWLOCK.

I think the culprit is dsa.c passing a pointer to memory that goes away
on dsa_free() as a name to LWLockRegisterTranche.

regards,
Andreas

Program terminated with signal SIGSEGV, Segmentation fault.
#0  strlen () at ../sysdeps/x86_64/strlen.S:106
(gdb) bt
#1  0x007e03c9 in cstring_to_text (s=0x7fab18d1f954 ) at varlena.c:152
#2  0x00792d7c in pg_stat_get_activity (fcinfo=) at 
pgstatfuncs.c:805
#3  0x005f0af5 in ExecMakeTableFunctionResult (funcexpr=0x5469f90, 
econtext=0x5469c80, argContext=, expectedDesc=0x387b2b0, 
randomAccess=0 '\000') at execQual.c:2216
#4  0x00608633 in FunctionNext (node=node@entry=0x5469b68) at 
nodeFunctionscan.c:94
#5  0x005f2c22 in ExecScanFetch (recheckMtd=0x608390 , 
accessMtd=0x6083a0 , node=0x5469b68) at execScan.c:95
#6  ExecScan (node=node@entry=0x5469b68, accessMtd=accessMtd@entry=0x6083a0 
, recheckMtd=recheckMtd@entry=0x608390 ) at 
execScan.c:180
#7  0x0060867f in ExecFunctionScan (node=node@entry=0x5469b68) at 
nodeFunctionscan.c:268
#8  0x005eb4c8 in ExecProcNode (node=node@entry=0x5469b68) at 
execProcnode.c:449
#9  0x00602cd0 in ExecLimit (node=node@entry=0x54697f0) at 
nodeLimit.c:91
#10 0x005eb368 in ExecProcNode (node=node@entry=0x54697f0) at 
execProcnode.c:531
[...]


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Short reads in hash indexes

2016-12-07 Thread Andreas Seltenreich
Andreas Seltenreich writes:

> Amit Kapila writes:
>
>> On Sat, Dec 3, 2016 at 3:44 PM, Andreas Seltenreich <seltenre...@gmx.de> 
>> wrote:
>>> Amit Kapila writes:
>>>
>>>> [2. text/x-diff; fix_hash_bucketsplit_sqlsmith_v1.patch]
>>> Ok, I'll do testing with the patch applied.
>
> Good news: the assertion hasn't fired since the patch is in.

Meh, it fired again today after being silent for 100e6 queries :-/
I guess I need to add some confidence qualification on such statements.
Maybe sigmas as they do at CERN…

> smith=# select * from state_report where sqlstate = 'XX001';
> -[ RECORD 1 
> ]--
> count| 10
> sqlstate | XX001
> sample   | ERROR:  could not read block 1173 in file "base/16384/17256": read 
> only 0 of 8192 bytes
> hosts| {airbisquit,frell,gorgo,marbit,pillcrow,quakken}
>
>> Hmm, I am not sure if this is related to previous problem, but it
>> could be.  Is it possible to get the operation and or callstack for
>> above failure?
>
> Ok, will turn the elog into an assertion to get at the backtraces.

Doing so on top of 4212cb7, I caught the backtrace below.  Query was:

--8<---cut here---start->8---
set max_parallel_workers_per_gather = 0;
select  count(1) from
   public.hash_name_heap as ref_2
   join public.rtest_emplog as sample_1
  on (ref_2.random = sample_1.who);
--8<---cut here---end--->8---

I've put the data directory where it can be reproduced here:

http://ansel.ydns.eu/~andreas/hash_index_short_read.tar.xz (12MB)

regards,
Andreas

TRAP: FailedAssertion("!(!"short read of block")", File: "md.c", Line: 782)
#2  0x007f7f11 in ExceptionalCondition 
(conditionName=conditionName@entry=0x9a1ae9 "!(!\"short read of block\")", 
errorType=errorType@entry=0x83db3d "FailedAssertion", 
fileName=fileName@entry=0x946a9a "md.c", lineNumber=lineNumber@entry=782) at 
assert.c:54
#3  0x006fb305 in mdread (reln=, forknum=, blocknum=4702, buffer=0x7fe97e7e1280 "\"") at md.c:782
#4  0x006d0ffa in ReadBuffer_common (smgr=0x2af7408, 
relpersistence=, forkNum=forkNum@entry=MAIN_FORKNUM, 
blockNum=blockNum@entry=4702, mode=RBM_NORMAL, strategy=, 
hit=0x7ffde9df11cf "") at bufmgr.c:890
#5  0x006d1a20 in ReadBufferExtended (reln=0x2fd10d8, 
forkNum=forkNum@entry=MAIN_FORKNUM, blockNum=4702, mode=mode@entry=RBM_NORMAL, 
strategy=strategy@entry=0x0) at bufmgr.c:664
#6  0x006d1b74 in ReadBuffer (blockNum=, reln=) at bufmgr.c:596
#7  ReleaseAndReadBuffer (buffer=buffer@entry=87109984, relation=, blockNum=) at bufmgr.c:1540
#8  0x004c047b in index_fetch_heap (scan=scan@entry=0x5313160) at 
indexam.c:469
#9  0x004c05ee in index_getnext (scan=scan@entry=0x5313160, 
direction=direction@entry=ForwardScanDirection) at indexam.c:565
#10 0x005f9b71 in IndexNext (node=node@entry=0x5311c48) at 
nodeIndexscan.c:105
#11 0x005ec492 in ExecScanFetch (recheckMtd=0x5f9af0 , 
accessMtd=0x5f9b30 , node=0x5311c48) at execScan.c:95
#12 ExecScan (node=0x5311c48, accessMtd=0x5f9b30 , 
recheckMtd=0x5f9af0 ) at execScan.c:145
#13 0x005e4da8 in ExecProcNode (node=node@entry=0x5311c48) at 
execProcnode.c:427
#14 0x006014f9 in ExecNestLoop (node=node@entry=0x53110a8) at 
nodeNestloop.c:174
#15 0x005e4cf8 in ExecProcNode (node=node@entry=0x53110a8) at 
execProcnode.c:476
#16 0x00601436 in ExecNestLoop (node=node@entry=0x5310e00) at 
nodeNestloop.c:123
#17 0x005e4cf8 in ExecProcNode (node=node@entry=0x5310e00) at 
execProcnode.c:476
#18 0x00601436 in ExecNestLoop (node=node@entry=0x530f698) at 
nodeNestloop.c:123
#19 0x005e4cf8 in ExecProcNode (node=node@entry=0x530f698) at 
execProcnode.c:476
#20 0x005e0e9e in ExecutePlan (dest=0x603a4a8, direction=, numberTuples=0, sendTuples=, operation=CMD_SELECT, 
use_parallel_mode=, planstate=0x530f698, estate=0x46bc008) at 
execMain.c:1568
#21 standard_ExecutorRun (queryDesc=0x3475168, direction=, 
count=0) at execMain.c:338
#22 0x007029f8 in PortalRunSelect (portal=portal@entry=0x2561e18, 
forward=forward@entry=1 '\001', count=0, count@entry=9223372036854775807, 
dest=dest@entry=0x603a4a8) at pquery.c:946
#23 0x00703f3e in PortalRun (portal=portal@entry=0x2561e18, 
count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', 
dest=dest@entry=0x603a4a8, altdest=altdest@entry=0x603a4a8, 
completionTag=completionTag@entry=0x7ffde9df18b0 "") at pquery.c:787
#24 0x00700d5b in exec_simple_query (query_string=0x4685258 ) at 
postgres.c:1094
#25 PostgresMain (argc=, argv=argv@entry=0x256f5a8, 
dbname=

[HACKERS] [sqlsmith] Crash in tsquery_rewrite/QTNBinary

2016-12-06 Thread Andreas Seltenreich
Hi,

the following query crashes master as of 4212cb7.

select ts_rewrite(
  tsquery_phrase(
 tsquery $$'sanct' & 'peter'$$,
 tsquery $$'5' <-> '6'$$,
 42),
  tsquery $$'5' <-> '6'$$,
  plainto_tsquery('I') );

Backtrace below.

regards,
Andreas

Program terminated with signal SIGSEGV, Segmentation fault.
#0  QTNBinary (in=0x0) at tsquery_util.c:256
#1  0x559debd68643 in QTNBinary (in=0x559ded7cc998) at tsquery_util.c:260
#2  0x559debd68643 in QTNBinary (in=in@entry=0x559ded7cd068) at 
tsquery_util.c:260
#3  0x559debd67df5 in tsquery_rewrite (fcinfo=0x559ded72c040) at 
tsquery_rewrite.c:453
#4  0x559debb754f4 in ExecMakeFunctionResultNoSets (fcache=0x559ded72bfd0, 
econtext=0x559ded72bda8, isNull=0x559ded72d350 "", isDone=) at 
execQual.c:2046
#5  0x559debb7ba1e in ExecTargetList (tupdesc=, 
isDone=0x7ffce180da6c, itemIsDone=0x559ded72d490, isnull=0x559ded72d350 "", 
values=0x559ded72d330, econtext=0x559ded72bda8, targetlist=0x559ded72d458) at 
execQual.c:5486
#6  ExecProject (projInfo=, isDone=isDone@entry=0x7ffce180da6c) 
at execQual.c:5710
#7  0x559debb92c79 in ExecResult (node=node@entry=0x559ded72bc90) at 
nodeResult.c:155
#8  0x559debb74478 in ExecProcNode (node=node@entry=0x559ded72bc90) at 
execProcnode.c:392
#9  0x559debb702fe in ExecutePlan (dest=0x559ded7c8b98, 
direction=, numberTuples=0, sendTuples=, 
operation=CMD_SELECT, use_parallel_mode=, 
planstate=0x559ded72bc90, estate=0x559ded72bb78) at execMain.c:1568
#10 standard_ExecutorRun (queryDesc=0x559ded727a18, direction=, 
count=0) at execMain.c:338
#11 0x559debc9c238 in PortalRunSelect (portal=portal@entry=0x559ded71f958, 
forward=forward@entry=1 '\001', count=0, count@entry=9223372036854775807, 
dest=dest@entry=0x559ded7c8b98) at pquery.c:946
#12 0x559debc9d89e in PortalRun (portal=portal@entry=0x559ded71f958, 
count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', 
dest=dest@entry=0x559ded7c8b98, altdest=altdest@entry=0x559ded7c8b98, 
completionTag=completionTag@entry=0x7ffce180dee0 "") at pquery.c:787
#13 0x559debc9af42 in exec_simple_query (query_string=0x559ded795048 "...") 
at postgres.c:1094
#14 PostgresMain (argc=, argv=argv@entry=0x559ded7390b0, 
dbname=, username=) at postgres.c:4069
#15 0x559deb9ee2f8 in BackendRun (port=0x559ded726ef0) at postmaster.c:4274
#16 BackendStartup (port=0x559ded726ef0) at postmaster.c:3946
#17 ServerLoop () at postmaster.c:1704
#18 0x559debc2ebb4 in PostmasterMain (argc=3, argv=0x559ded7004a0) at 
postmaster.c:1312
#19 0x559deb9ef68d in main (argc=3, argv=0x559ded7004a0) at main.c:228


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Short reads in hash indexes (was: [HACKERS] [sqlsmith] Failed assertion in _hash_splitbucket_guts)

2016-12-06 Thread Andreas Seltenreich
Amit Kapila writes:

> On Sat, Dec 3, 2016 at 3:44 PM, Andreas Seltenreich <seltenre...@gmx.de> 
> wrote:
>> Amit Kapila writes:
>>
>>> [2. text/x-diff; fix_hash_bucketsplit_sqlsmith_v1.patch]
>> Ok, I'll do testing with the patch applied.

Good news: the assertion hasn't fired since the patch is in.

However, these are still getting logged:

smith=# select * from state_report where sqlstate = 'XX001';
-[ RECORD 1 
]--
count| 10
sqlstate | XX001
sample   | ERROR:  could not read block 1173 in file "base/16384/17256": read 
only 0 of 8192 bytes
hosts| {airbisquit,frell,gorgo,marbit,pillcrow,quakken}

> Hmm, I am not sure if this is related to previous problem, but it
> could be.  Is it possible to get the operation and or callstack for
> above failure?

Ok, will turn the elog into an assertion to get at the backtraces.

regards,
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [sqlsmith] Crash in gather_readnext

2016-12-05 Thread Andreas Seltenreich
Hi,

on master as of a0ae54d, there's a 1 in 10e6 chance sqlsmith catches
gather_readnext reading beyond the gatherstate->readers array with
readers[gatherstate->readnext].  Sample backtrace below.

As readnext is never explicitly initialized, I think what happens is
that a rescan gets less workers than the initial scan, and the dangling
readnext points outside the array.  I'm no longer seeing these crashes
when explicitly initializing readnext to 0 like in the attached patch.

regards,
Andreas

Program terminated with signal SIGSEGV, Segmentation fault.
#0  shm_mq_receive (mqh=0x259, nbytesp=nbytesp@entry=0x7ffc55ce0580, 
datap=datap@entry=0x7ffc55ce0588, nowait=nowait@entry=1 '\001') at shm_mq.c:520
520 shm_mq *mq = mqh->mqh_queue;
(gdb) bt
#0  shm_mq_receive (mqh=0x259, nbytesp=nbytesp@entry=0x7ffc55ce0580, 
datap=datap@entry=0x7ffc55ce0588, nowait=nowait@entry=1 '\001') at shm_mq.c:520
#1  0x0060b8b7 in TupleQueueReaderNext (reader=reader@entry=0x5446c10, 
nowait=nowait@entry=1 '\001', done=done@entry=0x7ffc55ce065b "") at tqueue.c:692
#2  0x005f5e03 in gather_readnext (gatherstate=0x52a9918) at 
nodeGather.c:339
#3  gather_getnext (gatherstate=0x52a9918) at nodeGather.c:292
#4  ExecGather (node=node@entry=0x52a9918) at nodeGather.c:233
#5  0x005e3b68 in ExecProcNode (node=0x52a9918) at execProcnode.c:515
#6  0x005eb2f2 in ExecScanFetch (recheckMtd=0x605e40 , 
accessMtd=0x605e50 , node=0x52a86c0) at execScan.c:95
#7  ExecScan (node=node@entry=0x52a86c0, accessMtd=accessMtd@entry=0x605e50 
, recheckMtd=recheckMtd@entry=0x605e40 ) at 
execScan.c:180
#8  0x00605e6f in ExecSubqueryScan (node=node@entry=0x52a86c0) at 
nodeSubqueryscan.c:85
#9  0x005e3c68 in ExecProcNode (node=node@entry=0x52a86c0) at 
execProcnode.c:445
#10 0x006001d6 in ExecNestLoop (node=node@entry=0x52a7978) at 
nodeNestloop.c:123
#11 0x005e3bf8 in ExecProcNode (node=node@entry=0x52a7978) at 
execProcnode.c:476
#12 0x006001d6 in ExecNestLoop (node=node@entry=0x52a5120) at 
nodeNestloop.c:123
#13 0x005e3bf8 in ExecProcNode (node=node@entry=0x52a5120) at 
execProcnode.c:476
#14 0x006001d6 in ExecNestLoop (node=node@entry=0x52a3d50) at 
nodeNestloop.c:123
#15 0x005e3bf8 in ExecProcNode (node=0x52a3d50) at execProcnode.c:476
#16 0x006015e5 in ExecResult (node=node@entry=0x52a3140) at 
nodeResult.c:130
#17 0x005e3d18 in ExecProcNode (node=node@entry=0x52a3140) at 
execProcnode.c:392
#18 0x005fb360 in ExecLimit (node=node@entry=0x52a2e70) at 
nodeLimit.c:91
#19 0x005e3af8 in ExecProcNode (node=node@entry=0x52a2e70) at 
execProcnode.c:531
#20 0x00600299 in ExecNestLoop (node=node@entry=0x52a1a10) at 
nodeNestloop.c:174
#21 0x005e3bf8 in ExecProcNode (node=node@entry=0x52a1a10) at 
execProcnode.c:476
#22 0x006001d6 in ExecNestLoop (node=node@entry=0x52a16d0) at 
nodeNestloop.c:123
#23 0x005e3bf8 in ExecProcNode (node=node@entry=0x52a16d0) at 
execProcnode.c:476
#24 0x005dfdae in ExecutePlan (dest=0x50cbb00, direction=, numberTuples=0, sendTuples=, operation=CMD_SELECT, 
use_parallel_mode=, planstate=0x52a16d0, estate=0x3610968) at 
execMain.c:1567
#25 standard_ExecutorRun (queryDesc=0x36805b8, direction=, 
count=0) at execMain.c:338
#26 0x00701a58 in PortalRunSelect (portal=portal@entry=0x529da38, 
forward=forward@entry=1 '\001', count=0, count@entry=9223372036854775807, 
dest=dest@entry=0x50cbb00) at pquery.c:946
#27 0x0070300e in PortalRun (portal=portal@entry=0x529da38, 
count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', 
dest=dest@entry=0x50cbb00, altdest=altdest@entry=0x50cbb00, 
completionTag=completionTag@entry=0x7ffc55ce0ed0 "") at pquery.c:787
#28 0x00700869 in exec_simple_query (query_string=0x45d3028 "select 
...") at postgres.c:1094
#29 PostgresMain (argc=, argv=argv@entry=0x23ce878, 
dbname=, username=) at postgres.c:4069
#30 0x0046d9d9 in BackendRun (port=0x23d1ad0) at postmaster.c:4271
#31 BackendStartup (port=0x23d1ad0) at postmaster.c:3945
#32 ServerLoop () at postmaster.c:1701
#33 0x00698ed9 in PostmasterMain (argc=argc@entry=4, 
argv=argv@entry=0x23a05c0) at postmaster.c:1309
#34 0x0046ebbd in main (argc=4, argv=0x23a05c0) at main.c:228

>From be80954688c406122b560161192cc1d2e64e3757 Mon Sep 17 00:00:00 2001
From: Andreas Seltenreich <seltenre...@gmx.de>
Date: Mon, 5 Dec 2016 20:46:28 +0100
Subject: [PATCH] Fix potential crash on ReScanGather.

Initialize gatherstate->nextreader to 0 in order to prevent a crash
when ReScanGather gets less workers than the original scan, leading to
nextreader pointing outside the readers[nworkers] array.
---
 src/backend/executor/nodeGather.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 880ca62..2bdf223 1

Re: [HACKERS] [sqlsmith] Failed assertion in _hash_splitbucket_guts

2016-12-03 Thread Andreas Seltenreich
Amit Kapila writes:

> How should I connect to this database?  If I use the user fdw
> mentioned in pg_hba.conf (changed authentication method to trust in
> pg_hba.conf), it says the user doesn't exist.  Can you create a user
> in the database which I can use?

There is also a superuser "postgres" and an unprivileged user "smith"
you should be able to login with.  You could also start postgres in
single-user mode to bypass the authentication altogether.

Amit Kapila writes:

> Please find attached patch to fix above code.  Now, if this is the
> reason of the problem you are seeing, it won't fix your existing
> database as it already contains some tuples in the wrong bucket.  Can
> you please re-run the test to see if you can reproduce the problem?

Ok, I'll do testing with the patch applied.

Btw, I also find entries like following in the logging database:

ERROR:  could not read block 2638 in file "base/16384/17256": read only 0 of 
8192 bytes

…with relfilenode being an hash index.  I usually ignore these as they
naturally start occuring after a recovery because of an unrelated crash.
But since 11003eb, they also occur when the cluster has not yet suffered
a crash.

regards,
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [sqlsmith] Failed assertion in _hash_splitbucket_guts

2016-12-02 Thread Andreas Seltenreich
Hi,

the new hash index code on 11003eb failed an assertion yesterday:

TRAP: FailedAssertion("!(bucket == obucket)", File: "hashpage.c", Line: 
1037)

Statement was

update public.hash_i4_heap set seqno = public.hash_i4_heap.random;

It can be reproduced with the data directory (Debian stretch amd64) I've
put here:

http://ansel.ydns.eu/~andreas/_hash_splitbucket_guts.tar.xz (12 MB)

Backtrace below.  The cluster hasn't suffered any crashes before this
incident.

regards,
Andreas

Core was generated by `postgres: smith regression [local] UPDATE
   '.
Program terminated with signal SIGABRT, Aborted.
(gdb) bt
#0  0x7f49c40cc198 in __GI_raise (sig=sig@entry=6) at 
../sysdeps/unix/sysv/linux/raise.c:54
#1  0x7f49c40cd61a in __GI_abort () at abort.c:89
#2  0x007f55c1 in ExceptionalCondition 
(conditionName=conditionName@entry=0x84f890 "!(bucket == obucket)", 
errorType=errorType@entry=0x83665d "FailedAssertion", 
fileName=fileName@entry=0x84f86a "hashpage.c", 
lineNumber=lineNumber@entry=1037) at assert.c:54
#3  0x004a3d41 in _hash_splitbucket_guts (rel=rel@entry=0x1251ff8, 
metabuf=metabuf@entry=1703, obucket=obucket@entry=37, 
nbucket=nbucket@entry=549, obuf=obuf@entry=3082, nbuf=nbuf@entry=1754, 
htab=0x0, maxbucket=549, highmask=1023, lowmask=511) at hashpage.c:1037
#4  0x004a5627 in _hash_splitbucket (lowmask=511, highmask=1023, 
maxbucket=549, nbuf=1754, obuf=3082, nbucket=549, obucket=37, metabuf=1703, 
rel=0x1251ff8) at hashpage.c:894
#5  _hash_expandtable (rel=0x1251ff8, metabuf=1703) at hashpage.c:768
#6  0x004a1f71 in _hash_doinsert (rel=rel@entry=0x1251ff8, 
itup=itup@entry=0x26dc830) at hashinsert.c:236
#7  0x004a01c3 in hashinsert (rel=0x1251ff8, values=, 
isnull=, ht_ctid=0x26dc6fc, heapRel=, 
checkUnique=) at hash.c:247
#8  0x005ded1b in ExecInsertIndexTuples (slot=slot@entry=0x26dbd10, 
tupleid=tupleid@entry=0x26dc6fc, estate=estate@entry=0x2530028, 
noDupErr=noDupErr@entry=0 '\000', specConflict=specConflict@entry=0x0, 
arbiterIndexes=arbiterIndexes@entry=0x0) at execIndexing.c:388
#9  0x005fddaa in ExecUpdate (tupleid=tupleid@entry=0x7ffcaa7c9e40, 
oldtuple=oldtuple@entry=0x0, slot=slot@entry=0x26dbd10, 
planSlot=planSlot@entry=0x26db278, epqstate=epqstate@entry=0x26dac98, 
estate=estate@entry=0x2530028, canSetTag=1 '\001') at nodeModifyTable.c:1030
#10 0x005fe49c in ExecModifyTable (node=node@entry=0x26dabf0) at 
nodeModifyTable.c:1516
#11 0x005e3a18 in ExecProcNode (node=node@entry=0x26dabf0) at 
execProcnode.c:396
#12 0x005dfabe in ExecutePlan (dest=0x1c2ecd0, direction=, numberTuples=0, sendTuples=, operation=CMD_UPDATE, 
use_parallel_mode=, planstate=0x26dabf0, estate=0x2530028) at 
execMain.c:1567
#13 standard_ExecutorRun (queryDesc=0x1c2ed68, direction=, 
count=0) at execMain.c:338
#14 0x00701b94 in ProcessQuery (plan=, 
sourceText=0xfff228 "update public.hash_i4_heap set \n  seqno = 
public.hash_i4_heap.random\nreturning \n  (select option_value from 
information_schema.foreign_server_options limit 1 offset 2)\n as c0", 
params=0x0, dest=0x1c2ecd0, completionTag=0x7ffcaa7ca020 "") at pquery.c:185
#15 0x00701e0b in PortalRunMulti (portal=portal@entry=0x25c52b0, 
isTopLevel=isTopLevel@entry=1 '\001', setHoldSnapshot=setHoldSnapshot@entry=1 
'\001', dest=dest@entry=0x1c2ecd0, altdest=0xca30e0 , 
completionTag=completionTag@entry=0x7ffcaa7ca020 "") at pquery.c:1299
#16 0x007020f9 in FillPortalStore (portal=portal@entry=0x25c52b0, 
isTopLevel=isTopLevel@entry=1 '\001') at pquery.c:1045
#17 0x00702bcd in PortalRun (portal=portal@entry=0x25c52b0, 
count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', 
dest=dest@entry=0x199f248, altdest=altdest@entry=0x199f248, 
completionTag=completionTag@entry=0x7ffcaa7ca3d0 "") at pquery.c:782
#18 0x00700379 in exec_simple_query (query_string=0xfff228 "update 
public.hash_i4_heap set \n  seqno = public.hash_i4_heap.random\nreturning \n  
(select option_value from information_schema.foreign_server_options limit 1 
offset 2)\n as c0") at postgres.c:1094
#19 PostgresMain (argc=, argv=argv@entry=0xfad1d8, 
dbname=, username=) at postgres.c:4069
#20 0x0046d6c9 in BackendRun (port=0xfa8c60) at postmaster.c:4271
#21 BackendStartup (port=0xfa8c60) at postmaster.c:3945
#22 ServerLoop () at postmaster.c:1701
#23 0x00698ab9 in PostmasterMain (argc=argc@entry=4, 
argv=argv@entry=0xf765d0) at postmaster.c:1309
#24 0x0046e88d in main (argc=4, argv=0xf765d0) at main.c:228


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [sqlsmith] Failed assertion in TS_phrase_execute

2016-11-26 Thread Andreas Seltenreich
Hi,

the query below triggers an assertion in TS_phrase_execute.  Testing was
done on master at dbdfd11.

regards,
Andreas

-- TRAP: FailedAssertion("!(curitem->qoperator.oper == 4)", File: 
"tsvector_op.c", Line: 1432)

select 'moscow' @@
   ts_rewrite('moscow', 'moscow',
  ts_rewrite(
 tsquery_phrase('moscow','moscow'),
 'moscow',
 $$ 'sanct' & 'peter'$$));


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Failed assertion in parallel worker in ExecInitSubPlan

2016-11-24 Thread Andreas Seltenreich
Hi,

just caught another InitPlan below Gather with the recent patches in
(master as of 4cc6a3f).  Recipe below.

regards,
andreas

set max_parallel_workers_per_gather = 2;
set min_parallel_relation_size = 0;
set parallel_setup_cost = 0;
set parallel_tuple_cost = 0;

explain select 1 from
   public.quad_point_tbl as ref_0,
   lateral (select
 ref_0.p as c3,
 sample_0.d as c5
   from
 public.nv_child_2010 as sample_0
   left join public.mvtest_tvv as ref_1
   on ('x'< (select contype from pg_catalog.pg_constraint limit 1))
   limit 82) as subq_0;

--QUERY PLAN
-- 

--  Gather  (cost=0.19..13727.52 rows=902246 width=4)
--Workers Planned: 2
--->  Nested Loop  (cost=0.19..13727.52 rows=902246 width=4)
--  ->  Parallel Seq Scan on quad_point_tbl ref_0  (cost=0.00..105.85 
rows=4585 width=16)
--  ->  Limit  (cost=0.19..1.33 rows=82 width=20)
--InitPlan 1 (returns $0)
--  ->  Limit  (cost=0.00..0.19 rows=1 width=1)
--->  Gather  (cost=0.00..10.22 rows=54 width=1)
--  Workers Planned: 2
--  ->  Parallel Seq Scan on pg_constraint  
(cost=0.00..10.22 rows=22 width=1)
--->  Seq Scan on nv_child_2010 sample_0  (cost=0.00..35.50 
rows=2550 width=20)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Parallel worker crash on seqscan

2016-11-21 Thread Andreas Seltenreich
Hi,

Ashutosh Sharma writes:

>> the following query appears to reliably crash parallel workers on master
>> as of 0832f2d.

> As suggested, I have tried to reproduce this issue on *0832f2d* commit but
> could not reproduce it.

as Tom pointed out earlier, I had secretly set parallel_setup_cost and
parallel_tuple_cost to 0.  I assumed these were irrelevant when
force_parallel_mode is on.  I'll do less assuming and more testing on a
vanilla install on future reports.

Sorry for the inconvenience,
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [sqlsmith] Failed assertion in parallel worker in ExecInitSubPlan

2016-11-20 Thread Andreas Seltenreich
Hi,

the query below triggers a parallel worker assertion for me when run on
the regression database of master as of 0832f2d.  The plan sports a
couple of InitPlan nodes below Gather.

regards,
Andreas

 Gather  (cost=1.64..84.29 rows=128 width=4)
   Workers Planned: 1
   Single Copy: true
   ->  Limit  (cost=1.64..84.29 rows=128 width=4)
 ->  Subquery Scan on subq_0  (cost=1.64..451.06 rows=696 width=4)
   Filter: (subq_0.c6 IS NOT NULL)
   ->  Nested Loop Left Join  (cost=1.64..444.07 rows=699 width=145)
 Join Filter: (sample_0.aa = sample_1.pageno)
 InitPlan 4 (returns $3)
   ->  Result  (cost=1.21..5.36 rows=15 width=0)
 One-Time Filter: ($0 AND ($1 = $2))
 InitPlan 1 (returns $0)
   ->  Result  (cost=0.00..0.00 rows=0 width=0)
 One-Time Filter: false
 InitPlan 2 (returns $1)
   ->  Limit  (cost=0.35..0.52 rows=1 width=4)
 ->  Gather  (cost=0.00..1.04 rows=6 
width=4)
   Workers Planned: 1
   ->  Parallel Seq Scan on reltime_tbl 
 (cost=0.00..1.04 rows=4 width=4)
 InitPlan 3 (returns $2)
   ->  Limit  (cost=0.52..0.69 rows=1 width=4)
 ->  Gather  (cost=0.00..1.04 rows=6 
width=4)
   Workers Planned: 1
   ->  Parallel Seq Scan on reltime_tbl 
reltime_tbl_1  (cost=0.00..1.04 rows=4 width=4)
 ->  Sample Scan on pg_foreign_data_wrapper 
sample_2  (cost=1.21..5.36 rows=15 width=0)
   Sampling: system ('3.1'::real)
 ->  Nested Loop  (cost=0.15..382.85 rows=699 width=4)
   ->  Sample Scan on pg_largeobject sample_1  
(cost=0.00..209.03 rows=699 width=8)
 Sampling: bernoulli ('2.9'::real)
 Filter: (pageno IS NOT NULL)
   ->  Index Only Scan using 
pg_foreign_table_relid_index on pg_foreign_table ref_0  (cost=0.15..0.24 rows=1 
width=4)
 Index Cond: (ftrelid = sample_1.loid)
 ->  Materialize  (cost=0.00..16.06 rows=4 width=4)
   ->  Append  (cost=0.00..16.04 rows=4 width=4)
 ->  Sample Scan on a sample_0  
(cost=0.00..4.01 rows=1 width=4)
   Sampling: system ('5'::real)
 ->  Sample Scan on b sample_0_1  
(cost=0.00..4.01 rows=1 width=4)
   Sampling: system ('5'::real)
 ->  Sample Scan on c sample_0_2  
(cost=0.00..4.01 rows=1 width=4)
   Sampling: system ('5'::real)
 ->  Sample Scan on d sample_0_3  
(cost=0.00..4.01 rows=1 width=4)
   Sampling: system ('5'::real)


--8<---cut here---start->8---
set force_parallel_mode to on;
set max_parallel_workers_per_gather to 2;

select
  91 as c0
from
  (select
(select pfname from public.pslot limit 1 offset 3)
   as c0,
ref_1.grandtot as c1,
(select pg_catalog.min(procost) from pg_catalog.pg_proc)
   as c2,
ref_0.ftoptions as c3,
ref_1.grandtot as c4,
sample_1.loid as c5,
pg_catalog.pg_rotate_logfile() as c6,
(select random from public.hash_i4_heap limit 1 offset 5)
   as c7,
sample_1.loid as c8
  from
public.a as sample_0 tablesample system (5)
right join pg_catalog.pg_largeobject as sample_1 tablesample 
bernoulli (2.9)
  inner join pg_catalog.pg_foreign_table as ref_0
  on (sample_1.loid = ref_0.ftrelid )
on (sample_0.aa = sample_1.pageno )
  left join public.mvtest_tvv as ref_1
  on (EXISTS (
  select
  sample_2.fdwoptions as c0,
  sample_2.fdwhandler as c1,
  (select during from public.test_range_excl limit 1 offset 89)
 as c2
from
  pg_catalog.pg_foreign_data_wrapper as sample_2 tablesample 
system (3.1)
where (EXISTS (
select
sample_3.b as c0,
(select grantee from information_schema.udt_privileges 
limit 1 offset 4)
   as c1,
sample_3.b as c2,
sample_3.rf_a as c3,
sample_3.b as c4,
 

[HACKERS] [sqlsmith] Parallel worker crash on seqscan

2016-11-20 Thread Andreas Seltenreich
Hi,

the following query appears to reliably crash parallel workers on master
as of 0832f2d.

--8<---cut here---start->8---
set max_parallel_workers_per_gather to 2;
set force_parallel_mode to 1;

select subq.context from pg_settings,
lateral (select context from pg_opclass limit 1) as subq
limit 1;
--8<---cut here---end--->8---

Backtrace of a worker below.

regards,
Andreas

Core was generated by `postgres: bgworker: parallel worker for PID 27448'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  MakeExpandedObjectReadOnlyInternal (d=0) at expandeddatum.c:100
100 if (!VARATT_IS_EXTERNAL_EXPANDED_RW(DatumGetPointer(d)))
(gdb) bt
#0  MakeExpandedObjectReadOnlyInternal (d=0) at expandeddatum.c:100
#1  0x563b0c9a4e98 in ExecTargetList (tupdesc=, 
isDone=0x7ffdd20400ac, itemIsDone=0x563b0e6a8b50, isnull=0x563b0e6a8ae0 "", 
values=0x563b0e6a8ac0, econtext=0x563b0e6a7db8, targetlist=0x563b0e6a8498) at 
execQual.c:5491
#2  ExecProject (projInfo=projInfo@entry=0x563b0e6a8368, 
isDone=isDone@entry=0x7ffdd20400ac) at execQual.c:5710
#3  0x563b0c9a514f in ExecScan (node=node@entry=0x563b0e6a8038, 
accessMtd=accessMtd@entry=0x563b0c9bc910 , 
recheckMtd=recheckMtd@entry=0x563b0c9bc900 ) at execScan.c:220
#4  0x563b0c9bc9c3 in ExecSeqScan (node=node@entry=0x563b0e6a8038) at 
nodeSeqscan.c:127
#5  0x563b0c99d6e8 in ExecProcNode (node=node@entry=0x563b0e6a8038) at 
execProcnode.c:419
#6  0x563b0c9995be in ExecutePlan (dest=0x563b0e67da40, 
direction=, numberTuples=0, sendTuples=, 
operation=CMD_SELECT, use_parallel_mode=, 
planstate=0x563b0e6a8038, estate=0x563b0e6a77f8) at execMain.c:1567
#7  standard_ExecutorRun (queryDesc=0x563b0e6a2828, direction=, 
count=0) at execMain.c:338
#8  0x563b0c99c911 in ParallelQueryMain (seg=, 
toc=0x7f55173aa000) at execParallel.c:745
#9  0x563b0c898b02 in ParallelWorkerMain (main_arg=) at 
parallel.c:1108
#10 0x563b0ca49cad in StartBackgroundWorker () at bgworker.c:726
#11 0x563b0ca55770 in do_start_bgworker (rw=0x563b0e621080) at 
postmaster.c:5535
#12 maybe_start_bgworker () at postmaster.c:5710
#13 0x563b0ca56238 in sigusr1_handler (postgres_signal_arg=) 
at postmaster.c:4959
#14 
#15 0x7f5516788293 in __select_nocancel () at 
../sysdeps/unix/syscall-template.S:84
#16 0x563b0c818249 in ServerLoop () at postmaster.c:1665
#17 0x563b0ca577e2 in PostmasterMain (argc=3, argv=0x563b0e5fa490) at 
postmaster.c:1309
#18 0x563b0c819f6d in main (argc=3, argv=0x563b0e5fa490) at main.c:228


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Crash on GUC serialization

2016-11-19 Thread Andreas Seltenreich
Michael Paquier writes:

> [2. text/plain; fix-guc-string-eval.patch]

I'm afraid taking care of the length computation is not sufficient.
ISTM like it'll still try to serialize the NULL pointer later on in
serialize_variable:

,[ guc.c:9108 ]
| case PGC_STRING:
| {
|   struct config_string *conf = (struct config_string *) gconf;
|   do_serialize(destptr, maxbytes, "%s", *conf->variable);
`


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [sqlsmith] Crash on GUC serialization

2016-11-19 Thread Andreas Seltenreich
Hi,

sqlsmith just made a GUC that tricks the serialization code into
dereferencing a nullpointer.  Here's a recipe:

--8<---cut here---start->8---
set min_parallel_relation_size to 0;
set max_parallel_workers_per_gather to 2;
set force_parallel_mode to on;
begin;
select set_config('foo.baz', null, b) from (values (false), (true)) g(b);
commit;
select 1;
--8<---cut here---end--->8---

regards,
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [sqlsmith] Missing CHECK_FOR_INTERRUPTS in tsquery_rewrite

2016-10-29 Thread Andreas Seltenreich
Hi,

testing with sqlsmith yielded an uncancellable backend hogging CPU time.
Gdb showed it was busy in findeq() of tsquery_rewrite.c.  This function
appears to have exponential complexity wrt. the size of the involved
tsqueries.  The following query runs for 12s on my machine with no way
to cancel it and incrementing the length of the first argument by 1
doubles this time.

select ts_rewrite(
  (select string_agg(i::text, '&')::tsquery from generate_series(1,32) g(i)),
  (select string_agg(i::text, '&')::tsquery from generate_series(1,19) g(i)),
  'foo');

The attached patch adds a CHECK_FOR_INTERRUPTS to make it cancellable.

regards,
Andreas

>From d9910a96c9bd73c16e29ecaa0577945d5e1c091c Mon Sep 17 00:00:00 2001
From: Andreas Seltenreich <seltenre...@gmx.de>
Date: Sun, 30 Oct 2016 03:25:55 +0100
Subject: [PATCH] Add CHECK_FOR_INTERRUPTS in tsquery_rewrite loop.

The loop in findeq() appears to have exponential complexity and
runtime becomes excessive for more than about 30 tokens in the
tsvectors.  Add a CHECK_FOR_INTERRUPTS to make it cancellable.
---
 src/backend/utils/adt/tsquery_rewrite.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/src/backend/utils/adt/tsquery_rewrite.c b/src/backend/utils/adt/tsquery_rewrite.c
index 28f328d..ef6444f 100644
--- a/src/backend/utils/adt/tsquery_rewrite.c
+++ b/src/backend/utils/adt/tsquery_rewrite.c
@@ -95,6 +95,10 @@ findeq(QTNode *node, QTNode *ex, QTNode *subs, bool *isfind)
 
 			do
 			{
+/* This loop is rather heavyweight, it better be
+ * cancellable. */
+CHECK_FOR_INTERRUPTS();
+
 tnode->sign = 0;
 for (i = 0; i < ex->nchild; i++)
 {
-- 
2.9.3


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Infinite recursion in bitshift

2016-10-14 Thread Andreas Seltenreich
Tom Lane writes:

>> This is due to an integer overflow in bitshiftright()/bitshiftleft()
>> leading to them recursively calling each other.  Patch attached.
>
> Seems sane, though I wonder if it'd be better to use -INT_MAX rather
> than -VARBITMAXLEN.

I am undecided between those two.  -INT_MAX might be a more precise fix
for the problem, but the extra distance to the danger zone was kind of
soothing :-).

regards,
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [sqlsmith] Infinite recursion in bitshift

2016-10-14 Thread Andreas Seltenreich
Hi,

sqlsmith just found another crasher:

select bit '1' >> (-2^31)::int;

This is due to an integer overflow in bitshiftright()/bitshiftleft()
leading to them recursively calling each other.  Patch attached.

regards,
Andreas
>From cfdc425f75da268e1c2af08f936c59f34b69e577 Mon Sep 17 00:00:00 2001
From: Andreas Seltenreich <seltenre...@gmx.de>
Date: Fri, 14 Oct 2016 20:52:52 +0200
Subject: [PATCH] Fix possible infinite recursion on bitshift.

bitshiftright() and bitshiftleft() would recursively call each other
infinitely when the user passed a MIN_INT for the shift amount due to
an integer overflow.  This patch reduces a negative shift amount to
-VARBITMAXLEN to avoid overflow.
---
 src/backend/utils/adt/varbit.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/src/backend/utils/adt/varbit.c b/src/backend/utils/adt/varbit.c
index 75e6a46..d8ecfb6 100644
--- a/src/backend/utils/adt/varbit.c
+++ b/src/backend/utils/adt/varbit.c
@@ -1387,9 +1387,15 @@ bitshiftleft(PG_FUNCTION_ARGS)
 
 	/* Negative shift is a shift to the right */
 	if (shft < 0)
+	{
+		/* Protect against overflow */
+		if (shft < -VARBITMAXLEN)
+			shft = -VARBITMAXLEN;
+
 		PG_RETURN_DATUM(DirectFunctionCall2(bitshiftright,
 			VarBitPGetDatum(arg),
 			Int32GetDatum(-shft)));
+	}
 
 	result = (VarBit *) palloc(VARSIZE(arg));
 	SET_VARSIZE(result, VARSIZE(arg));
@@ -1447,9 +1453,15 @@ bitshiftright(PG_FUNCTION_ARGS)
 
 	/* Negative shift is a shift to the left */
 	if (shft < 0)
+	{
+		/* Protect against overflow */
+		if (shft < -VARBITMAXLEN)
+			shft = -VARBITMAXLEN;
+
 		PG_RETURN_DATUM(DirectFunctionCall2(bitshiftleft,
 			VarBitPGetDatum(arg),
 			Int32GetDatum(-shft)));
+	}
 
 	result = (VarBit *) palloc(VARSIZE(arg));
 	SET_VARSIZE(result, VARSIZE(arg));
-- 
2.9.3


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [sqlsmith] Failed assertion in numeric aggregate

2016-09-03 Thread Andreas Seltenreich
Hi,

updating master from be7f7ee..39b691f, the following assertion is
triggered frequently by sqlsmith:

TRAP: BadArgument("!(((context) != ((void *)0) && (const 
Node*)((context)))->type) == T_AllocSetContext", File: "mcxt.c", Line: 1010)

Digging in the coredumps, it looks like set_var_from_num() is invoked on
an uninitialized NumericVar.  Sample gdb session below.

Below is also one of the generated queries that eventually triggers it
for me when invoked a dozen times or so.

regards,
Andreas

--8<---cut here---start->8---
select
  subq_0.c0 as c0,
  subq_0.c0 as c1,
  5 as c2,
  (select pg_catalog.min(class) from public.f_star)
 as c3
from
  (select
sample_2.cc as c0
  from
public.shoelace_arrive as ref_0
  inner join public.hub as sample_1
right join public.e_star as sample_2
on (sample_1.name = sample_2.class )
  on (ref_0.arr_name = sample_2.class )
  limit 63) as subq_0
where ((subq_0.c0 is not NULL)
and ((select pg_catalog.var_pop(enumsortorder) from pg_catalog.pg_enum)
 is not NULL))
  and (((select pg_catalog.var_samp(random) from public.bt_txt_heap)
 is NULL)
or ((select m from public.money_data limit 1 offset 1)
 <> (select pg_catalog.min(salary) from public.rtest_empmass)
));
--8<---cut here---end--->8---

(gdb) bt
#0  0x7ff011f221c8 in __GI_raise (sig=sig@entry=6) at 
../sysdeps/unix/sysv/linux/raise.c:54
#1  0x7ff011f2364a in __GI_abort () at abort.c:89
#2  0x007ef1b1 in ExceptionalCondition 
(conditionName=conditionName@entry=0x9d26c8 "!(((context) != ((void *)0) && 
(const Node*)((context)))->type) == T_AllocSetContext", 
errorType=errorType@entry=0x835c25 "BadArgument", 
fileName=fileName@entry=0x9d2640 "mcxt.c", lineNumber=lineNumber@entry=1010) at 
assert.c:54
#3  0x00813561 in pfree (pointer=) at mcxt.c:1010
#4  0x00773169 in alloc_var (var=var@entry=0x7ffe3a6d18d0, 
ndigits=ndigits@entry=6) at numeric.c:5387
#5  0x00774230 in set_var_from_num (num=0x1e49180, dest=0x7ffe3a6d18d0) 
at numeric.c:5608
#6  0x0077be2c in numeric_poly_deserialize (fcinfo=) at 
numeric.c:4196
#7  0x005ec48c in combine_aggregates (aggstate=0x1e255d8, 
pergroup=) at nodeAgg.c:986
#8  0x005edcc5 in agg_retrieve_direct (aggstate=0x1e255d8) at 
nodeAgg.c:2095
#9  ExecAgg (node=node@entry=0x1e255d8) at nodeAgg.c:1837
#10 0x005e0078 in ExecProcNode (node=node@entry=0x1e255d8) at 
execProcnode.c:503
#11 0x0060173c in ExecSetParamPlan (node=, 
econtext=0x1e2e710) at nodeSubplan.c:995
#12 0x005e4f75 in ExecEvalParamExec (exprstate=, 
econtext=, isNull=0x7ffe3a6d1b3f "", isDone=) at 
execQual.c:1140
#13 0x005e14c6 in ExecEvalNullTest (nstate=0x1e2ec50, 
econtext=0x1e2e710, isNull=0x7ffe3a6d1b3f "", isDone=0x0) at execQual.c:3902
#14 0x005e0656 in ExecEvalOr (orExpr=, 
econtext=0x1e2e710, isNull=0x7ffe3a6d1b3f "", isDone=) at 
execQual.c:2809
#15 0x005e7089 in ExecQual (qual=, 
econtext=econtext@entry=0x1e2e710, resultForNull=resultForNull@entry=0 '\000') 
at execQual.c:5379
#16 0x005fd6b1 in ExecResult (node=node@entry=0x1e2e5f8) at 
nodeResult.c:82
#17 0x005e01f8 in ExecProcNode (node=node@entry=0x1e2e5f8) at 
execProcnode.c:392
#18 0x005dc27e in ExecutePlan (dest=0x7ff0129e22b0, 
direction=, numberTuples=0, sendTuples=, 
operation=CMD_SELECT, use_parallel_mode=, planstate=0x1e2e5f8, 
estate=0x1e1aba8) at execMain.c:1567
#19 standard_ExecutorRun (queryDesc=0x1d563b8, direction=, 
count=0) at execMain.c:338
#20 0x006faad8 in PortalRunSelect (portal=portal@entry=0x1def878, 
forward=forward@entry=1 '\001', count=0, count@entry=9223372036854775807, 
dest=dest@entry=0x7ff0129e22b0) at pquery.c:948
#21 0x006fc04e in PortalRun (portal=portal@entry=0x1def878, 
count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', 
dest=dest@entry=0x7ff0129e22b0, altdest=altdest@entry=0x7ff0129e22b0, 
completionTag=completionTag@entry=0x7ffe3a6d1fa0 "") at pquery.c:789
#22 0x006f8deb in exec_simple_query (query_string=0x1dc2e58 "select  \n 
 subq_0.c0 as c0, \n  subq_0.c0 as c1, \n  5 as c2, \n  (select 
pg_catalog.min(class) from public.f_star)\n as c3\nfrom \n  (select  \n 
   sample_2.cc as c0\n  from \npublic.shoel"...) at postgres.c:1094
#23 PostgresMain (argc=, argv=argv@entry=0x1d64730, 
dbname=0x1d64590 "regression", username=) at postgres.c:4070
#24 0x0046cf81 in BackendRun (port=0x1d4ffd0) at postmaster.c:4260
#25 BackendStartup (port=0x1d4ffd0) at postmaster.c:3934
#26 ServerLoop () at postmaster.c:1691
#27 0x00693634 in PostmasterMain (argc=argc@entry=3, 
argv=argv@entry=0x1d2c5d0) at postmaster.c:1299
#28 0x0046e0d6 in main (argc=3, argv=0x1d2c5d0) at main.c:228
(gdb) frame 5
#5  0x00774230 in 

[HACKERS] [sqlsmith] Crash in pg_get_viewdef_name_ext()

2016-08-07 Thread Andreas Seltenreich
Hi,

sqlsmith just triggered a crash in pg_get_viewdef_name_ext().  Looks
like commit 976b24fb4 failed to update this caller of
pg_get_viewdef_worker().  Backtrace below.  Patch attached.

regards,
Andreas

Program terminated with signal SIGSEGV, Segmentation fault.
(gdb) bt
#0  strlen () at ../sysdeps/x86_64/strlen.S:106
#1  0x007cdf09 in cstring_to_text (s=s@entry=0x0) at varlena.c:152
#2  0x007a3409 in string_to_text (str=0x0) at ruleutils.c:10083
#3  pg_get_viewdef_name_ext (fcinfo=) at ruleutils.c:681
#4  0x005dfae2 in ExecMakeFunctionResultNoSets (fcache=0x403ed80, 
econtext=0x3fb0eb8, isNull=0x403e0a1 "", isDone=) at 
execQual.c:2041
[...]

diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index d68ca7a..3c3fce4 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -671,6 +671,7 @@ pg_get_viewdef_name_ext(PG_FUNCTION_ARGS)
 	int			prettyFlags;
 	RangeVar   *viewrel;
 	Oid			viewoid;
+	char		*res;
 
 	prettyFlags = pretty ? PRETTYFLAG_PAREN | PRETTYFLAG_INDENT : PRETTYFLAG_INDENT;
 
@@ -678,7 +679,12 @@ pg_get_viewdef_name_ext(PG_FUNCTION_ARGS)
 	viewrel = makeRangeVarFromNameList(textToQualifiedNameList(viewname));
 	viewoid = RangeVarGetRelid(viewrel, NoLock, false);
 
-	PG_RETURN_TEXT_P(string_to_text(pg_get_viewdef_worker(viewoid, prettyFlags, WRAP_COLUMN_DEFAULT)));
+	res = pg_get_viewdef_worker(viewoid, prettyFlags, WRAP_COLUMN_DEFAULT);
+
+	if (res == NULL)
+		PG_RETURN_NULL();
+
+	PG_RETURN_TEXT_P(string_to_text(res));
 }
 
 /*

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

2016-08-06 Thread Andreas Seltenreich
Michael Paquier writes:

> Andreas, with the patch attached is the assertion still triggered?
> [2. text/x-diff; base-backup-crash-v2.patch]

I didn't observe the crashes since applying this patch.  There should
have been about five by the amount of fuzzing done.

regards
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [sqlsmith] Crash in GetOldestSnapshot()

2016-08-06 Thread Andreas Seltenreich
Hi,

since updating master from c93d873..fc509cd, I see crashes in
GetOldestSnapshot() on update/delete returning statements.

I reduced the triggering statements down to this:

update clstr_tst set d = d returning d;

Backtrace below.

regards,
Andreas

Program received signal SIGSEGV, Segmentation fault.
(gdb) bt
#0  GetOldestSnapshot () at snapmgr.c:422
#1  0x004b8279 in init_toast_snapshot (toast_snapshot=0x7ffcd824b010) 
at tuptoaster.c:2314
#2  0x004b83bc in toast_fetch_datum (attr=) at 
tuptoaster.c:1869
#3  0x004b9ab5 in heap_tuple_untoast_attr (attr=0x18226c8) at 
tuptoaster.c:179
#4  0x007f71ad in pg_detoast_datum_packed (datum=) at 
fmgr.c:2266
#5  0x007cfc12 in text_to_cstring (t=0x18226c8) at varlena.c:186
#6  0x007f5735 in FunctionCall1Coll (flinfo=flinfo@entry=0x18221c0, 
collation=collation@entry=0, arg1=arg1@entry=25306824) at fmgr.c:1297
#7  0x007f68ee in OutputFunctionCall (flinfo=0x18221c0, val=25306824) 
at fmgr.c:1946
#8  0x00478bc1 in printtup (slot=0x1821f80, self=0x181ce48) at 
printtup.c:359
#9  0x006f9c8e in RunFromStore (portal=portal@entry=0x177cbf8, 
direction=direction@entry=ForwardScanDirection, count=count@entry=0, 
dest=0x181ce48) at pquery.c:1117
#10 0x006f9d52 in PortalRunSelect (portal=portal@entry=0x177cbf8, 
forward=forward@entry=1 '\001', count=0, count@entry=9223372036854775807, 
dest=dest@entry=0x181ce48) at pquery.c:942
#11 0x006fb41e in PortalRun (portal=portal@entry=0x177cbf8, 
count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', 
dest=dest@entry=0x181ce48, altdest=altdest@entry=0x181ce48, 
completionTag=completionTag@entry=0x7ffcd824b920 "") at pquery.c:787
#12 0x006f822b in exec_simple_query (query_string=0x17db878 "update 
clstr_tst set d = d returning d;") at postgres.c:1094
#13 PostgresMain (argc=, argv=argv@entry=0x1781ce0, 
dbname=0x1781b40 "regression", username=) at postgres.c:4074
#14 0x0046c9bd in BackendRun (port=0x1786920) at postmaster.c:4262
#15 BackendStartup (port=0x1786920) at postmaster.c:3936
#16 ServerLoop () at postmaster.c:1693
#17 0x00693044 in PostmasterMain (argc=argc@entry=3, 
argv=argv@entry=0x175d5f0) at postmaster.c:1301
#18 0x0046dd26 in main (argc=3, argv=0x175d5f0) at main.c:228
(gdb) list
417
418 if (OldestActiveSnapshot != NULL)
419 ActiveLSN = OldestActiveSnapshot->as_snap->lsn;
420
421 if (XLogRecPtrIsInvalid(RegisteredLSN) || RegisteredLSN > 
ActiveLSN)
422 return OldestActiveSnapshot->as_snap;
423
424 return OldestRegisteredSnapshot;
425 }
426
(gdb) bt full
#0  GetOldestSnapshot () at snapmgr.c:422
OldestRegisteredSnapshot = 
RegisteredLSN = 
ActiveLSN = 
#1  0x004b8279 in init_toast_snapshot (toast_snapshot=0x7ffcd824b010) 
at tuptoaster.c:2314
snapshot = 
#2  0x004b83bc in toast_fetch_datum (attr=) at 
tuptoaster.c:1869
toastrel = 0x7f8b4ca88920
toastidxs = 0x18447c8
toastkey = {
  sk_flags = 0,
  sk_attno = 1,
  sk_strategy = 3,
  sk_subtype = 0,
  sk_collation = 100,
  sk_func = {
fn_addr = 0x77c490 ,
fn_oid = 184,
fn_nargs = 2,
fn_strict = 1 '\001',
fn_retset = 0 '\000',
fn_stats = 2 '\002',
fn_extra = 0x0,
fn_mcxt = 0x18282a8,
fn_expr = 0x0
  },
  sk_argument = 34491
}
toastscan = 
ttup = 
toasttupDesc = 0x7f8b4ca88c50
result = 0x18422d8
toast_pointer = 
ressize = 5735
residx = 
nextidx = 0
numchunks = 3
chunk = 
isnull = 
chunkdata = 
chunksize = 
num_indexes = 1
validIndex = 0
SnapshotToast = {
  satisfies = 0x112,
  xmin = 3626283536,
  xmax = 32764,
  xip = 0xf8ac628,
  xcnt = 5221870,
  subxip = 0x0,
  subxcnt = 0,
  suboverflowed = 0 '\000',
  takenDuringRecovery = 0 '\000',
  copied = 0 '\000',
  curcid = 14,
  speculativeToken = 0,
  active_count = 260753304,
  regd_count = 0,
  ph_node = {
first_child = 0xf8ac680,
next_sibling = 0xa400112,
prev_or_parent = 0x0
  },
  whenTaken = 274,
  lsn = 0
}
__func__ = "toast_fetch_datum"
#3  0x004b9ab5 in heap_tuple_untoast_attr (attr=0x18226c8) at 
tuptoaster.c:179
No locals.
#4  0x007f71ad in pg_detoast_datum_packed (datum=) at 
fmgr.c:2266
No locals.
#5  0x007cfc12 in text_to_cstring (t=0x18226c8) at varlena.c:186
tunpacked = 
result = 
#6  0x007f5735 in FunctionCall1Coll (flinfo=flinfo@entry=0x18221c0, 

[HACKERS] [sqlsmith] FailedAssertion("!(k == indices_count)", File: "tsvector_op.c", Line: 511)

2016-08-03 Thread Andreas Seltenreich
Hi,

the following statement triggers an assertion in tsearch:

select ts_delete(array_to_tsvector('{smith,smith,smith}'::text[]),  
'{smith,smith}'::text[]);
-- TRAP: FailedAssertion("!(k == indices_count)", File: "tsvector_op.c", Line: 
511)

regards,
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

2016-08-03 Thread Andreas Seltenreich
Hi,

testing with sqlsmith shows that the following assertion doesn't hold:

FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 
10200)

The triggering statements always contain a call to pg_start_backup with
the third argument 'true', i.e. it's trying to start an exlusive backup.

I didn't manage to put together a stand-alone testcase yet.

regards,
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GiST index build versus NaN coordinates

2016-07-16 Thread Andreas Seltenreich
I wrote:

> Sounds like some fuzz testing with nan/infinity is in order.

related fallout: close_ps returns a NULL pointer with NaNs around:

select close_ps('(nan,nan)', '(nan,nan),(nan,nan)');
-- TRAP: FailedAssertion("!(result != ((void *)0))", File: "geo_ops.c", Line: 
2860)

regards,
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improving executor performance

2016-07-14 Thread Andreas Seltenreich
Andres Freund writes:

> On 2016-07-14 23:03:10 +0200, Andreas Seltenreich wrote:
>> That's the plan, yes.  I'm sorry there's no publishable code yet on the
>> the postgres side of things.  Using libFirm[1], the plan is to.
>
> Why libfirm?

- It has a more modern IR than LLVM (they're catching up though)
- It's very lightweight, compiles faster than LLVM's ./configure run
- I do have lots of experience with it and a committer bit

> It seems to only have x86 and sparc backends, and no windows support?

Ack, it's mostly used in research projects, that's why the number of
supported ISAs is small.  It's enough to answer the burning question
what speedup is to expected by jit-compiling things in the backend
though.  Also, if this thing actually takes off, adding more backends is
something that is easier with libFirm than LLVM, IMHO.

>> 1. Automatically generate Firm-IR for the static C code around
>>expression evaluation as well operators in the system catalog.
>
>> 2. Construct IR for expression trees (essentially all the function calls
>>the executor would do).
>
> But that essentially means redoing most of execQual's current code in IR
> - or do you want to do that via 1) above?

Manually re-doing backend logic in IR is a can of worms I do not want to
open.  This would guarantee bugs and result in a maintenance nightmare,
so doing 1) for the code is the only option when something turns out to
be a bottleneck IMHO.

> As long as the preparation code (which is currently intermixed with
> the execution phase) isn't separated, that means pulling essentially
> the whole backend given that we do catalog lookups and everything
> during that phase.

Right, the catalog lookups need to be done before JIT-compilation to
allow inlining operators.

>> Currently, a student at credativ is working on applying these
>> techniques to postgres.
>
> Are you planning to support this to postgres proper?

The goal is to publish it as an extension that sneaks into planner_hook.
I think BSD-licensing is ok as long as libfirm (LGPL) is kept as an
external dependency.

regards,
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improving executor performance

2016-07-14 Thread Andreas Seltenreich
Andres Freund writes:

> The problem is that the previous form has a lot of ad-hoc analysis
> strewn in. The interesting part is getting rid of all that. That's what
> the new ExecInitExpr2() does. The target form can be both evaluated more
> efficiently in the dispatch manner in the patch, and quite simply
> converted to a JIT - without duplicating the analysis code.  I did write
> a small ad-hoc x86 jit, and it was really quite straightforward this
> way.

Ja, I see the advantage when doing ad-hoc-JIT compilation.

> What did you do with JIT and expression evaluation? You basically just
> replaced the toplevel ExprState note with a different evalfunc, pointing
> into your code?

That's the plan, yes.  I'm sorry there's no publishable code yet on the
the postgres side of things.  Using libFirm[1], the plan is to.

1. Automatically generate Firm-IR for the static C code around
   expression evaluation as well operators in the system catalog.

2. Construct IR for expression trees (essentially all the function calls
   the executor would do).

3. Push libFirm's optimize button.  At this stage, most of the
   dispatching goes away by inlining the calls including functions from
   the catalog implementing operators.

4. Generate code and replace the toplevel expression node with a funcall
   node.

I did implement this recipe with a toy Forth interpreter to see whether
libFirm was up to the job (Nobody's done JIT with libFirm before).  The
results were favorable[2].  Currently, a student at credativ is working
on applying these techniques to postgres.

regards,
Andreas

Footnotes: 
[1]  http://libfirm.org/

[2]  https://korte.credativ.com/~ase/firm-postgres-jit-forth.pdf



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improving executor performance

2016-07-14 Thread Andreas Seltenreich
Andres Freund writes:

> Having expression evaluation and slot deforming as a series of simple
> sequential steps, instead of complex recursive calls, would also make it
> fairly straightforward to optionally just-in-time compile those.

I don't think that JIT becomes easier by this change.  Constructing the
IR for LLVM, libFirm or any other JIT library from expression trees is
straightforward already.  It's probably more of a nuisance for those
that already have some code/design on JIT-compiling expressions
(vitessedb, ISP RAS, yours truly)

I like your patch for all the other reasons stated though!

regards
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Issue in pg_catalog.pg_indexes view definition

2016-07-14 Thread Andreas Seltenreich
Tom Lane writes:

> Dilip Kumar  writes:
>> So I think changing the view definition and calling this function on
>> indexrelid will remove the error. So I think
>> correct fix is to change view definition, as I proposed in above patch.
[...]
> We've dealt with similar issues in places like pg_relation_size() by
> making the functions return NULL instead of throwing an error for an
> unmatched argument OID.

Note that Michael Paquier sent a patch implementing this in another
thread:

https://www.postgresql.org/message-id/cab7npqtxf5dtxjezb7xkjvowxx8d_2atxmtu3psnkhcwt_j...@mail.gmail.com

regards,
andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GiST index build versus NaN coordinates

2016-07-12 Thread Andreas Seltenreich
Tom Lane writes:

> More generally, this example makes me fearful that NaN coordinates in
> geometric values are likely to cause all sorts of issues.  It's too late
> to disallow them, probably, but I wonder how can we identify other bugs
> of this ilk.

Sounds like some fuzz testing with nan/infinity is in order.  sqlsmith
doesn't generate any float literals, but it calls functions to satisfy
its need for values of specific types.  Adding suitable functions[1] to
the regression db, I made the following observations:

The infinite loop from the bug report was triggered. Further, two
previously unseen errors are logged:

ERROR:  timestamp cannot be NaN
ERROR:  getQuadrant: impossible case

The first is porbably as boring as it gets, the second one is from the
getQuadrant() in spgquadtreeproc.c.

Curiously, the getQuadrant()s in geo_spgist.c and rangetypes_spgist.c do
not have such a check.  I guess the boxes will just end up in an
undefined position in the index for these.

regards
Andreas

Footnotes:
[1]
create function smith_double_inf() returns float as $$select 
'infinity'::float$$ language sql immutable;
create function smith_double_ninf() returns float as $$select 
'-infinity'::float$$ language sql immutable;
create function smith_double_nan() returns float as $$select 'nan'::float$$ 
language sql immutable;
create function smith_real_nan() returns real as $$select 'nan'::real$$ 
language sql immutable;
create function smith_real_inf() returns real as $$select 'infinity'::real$$ 
language sql immutable;
create function smith_real_ninf() returns real as $$select '-infinity'::real$$ 
language sql immutable;


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [sqlsmith] ERROR: plan should not reference subplan's variable

2016-07-01 Thread Andreas Seltenreich
Updating master from f8c5855..1bdae16, sqlsmith triggers "failed to
generate plan" errors again.  Below is the smallest query logged so far.

regards,
Andreas

-- ERROR:  plan should not reference subplan's variable
set force_parallel_mode = 'on';
set max_parallel_workers_per_gather = '1';

explain WITH
jennifer_0 AS (select
(select b from public.rtest_v1 limit 1 offset 5)
   as c0,
pg_catalog.pg_current_xlog_location() as c1,
sample_0.a as c2,
sample_0.a as c3
  from
public.rtest_view4 as sample_0 tablesample system (5.9)
  where cast(null as bigint) = pg_catalog.hashinet(
  cast((select client_addr from pg_catalog.pg_stat_activity limit 1 offset 
35)
 as inet))
  limit 76)
select
ref_0.sl_name as c0
  from
public.shoelace as ref_0
  where (cast(null as anyrange) < cast(null as anyrange))
and (EXISTS (
  select
  39 as c0
from
  jennifer_0 as ref_1
where cast(null as real) = cast(null as real)
limit 81))
  limit 96;


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] crashes in RestoreSnapshot on hot standby

2016-06-30 Thread Andreas Seltenreich
Amit Kapila writes:
> On Fri, Jul 1, 2016 at 9:38 AM, Thomas Munro  
> wrote:
>> Or maybe just like this?
>>
>> -   snapshot->subxip = snapshot->xip + serialized_snapshot->xcnt;
>> +   snapshot->subxip = ((TransactionId *) (snapshot + 1)) +
>> +   serialized_snapshot->xcnt;
>>
>
> This way it looks better to me.  Thanks for the patch.

I no longer see these crashes when testing with the patch applied.

thanks,
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [sqlsmith] crashes in RestoreSnapshot on hot standby

2016-06-30 Thread Andreas Seltenreich
Running sqlsmith on a streaming slave (master as of f8c5855) is
inconspicuous as long as the master is idle.  As soon as I start it on
the master as well, the standby frequently crashes in RestoreSnapshot.
It doesn't seem to be specific to the queries, as they don't trigger a
crash when re-run.

Backtraces always look like the ones below.

regards,
Andreas

 BEGIN BACKTRACE OF CORE FILE ./slave/postgres.9826@.core ON doombat 
Core was generated by `postgres: smith regression [local] SELECT
 '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  __memcpy_sse2_unaligned () at 
../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:167
167 ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S: Datei oder 
Verzeichnis nicht gefunden.
(gdb) bt
#0  __memcpy_sse2_unaligned () at 
../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:167
#1  0x00822032 in RestoreSnapshot 
(start_address=start_address@entry=0x7f2701d5a110 ) at snapmgr.c:2020
#2  0x004a934a in heap_beginscan_parallel (relation=0x2060a90, 
parallel_scan=parallel_scan@entry=0x7f2701d5a0f8) at heapam.c:1657
#3  0x005fbedf in ExecSeqScanInitializeDSM (node=0x1f5b470, 
pcxt=0x221af88) at nodeSeqscan.c:327
#4  0x005dd0ad in ExecParallelInitializeDSM 
(planstate=planstate@entry=0x1f5b470, d=d@entry=0x7ffd4ba200d0) at 
execParallel.c:245
#5  0x005dd425 in ExecInitParallelPlan (planstate=0x1f5b470, 
estate=estate@entry=0x1f5ab28, nworkers=2) at execParallel.c:477
#6  0x005ef4a4 in ExecGather (node=node@entry=0x1f5b048) at 
nodeGather.c:159
#7  0x005dda48 in ExecProcNode (node=node@entry=0x1f5b048) at 
execProcnode.c:515
#8  0x005f4b30 in ExecLimit (node=node@entry=0x1f5acd0) at 
nodeLimit.c:91
#9  0x005dd9d8 in ExecProcNode (node=node@entry=0x1f5acd0) at 
execProcnode.c:531
#10 0x005fef7c in ExecSetParamPlan (node=, 
econtext=0x1f5c138) at nodeSubplan.c:999
#11 0x005e28b5 in ExecEvalParamExec (exprstate=, 
econtext=, isNull=0x22045b0 "", isDone=) at 
execQual.c:1135
#12 0x005deb6d in ExecMakeFunctionResultNoSets (fcache=0x2204200, 
econtext=0x1f5c138, isNull=0x2203d98 "", isDone=) at 
execQual.c:2015
#13 0x005de29a in ExecEvalCoalesce (coalesceExpr=, 
econtext=0x1f5c138, isNull=0x2203d98 "", isDone=) at 
execQual.c:3446
#14 0x005deb6d in ExecMakeFunctionResultNoSets (fcache=0x22039e8, 
econtext=0x1f5c138, isNull=0x7ffd4ba203df "", isDone=) at 
execQual.c:2015
#15 0x005e4939 in ExecQual (qual=, 
econtext=econtext@entry=0x1f5c138, resultForNull=resultForNull@entry=0 '\000') 
at execQual.c:5269
#16 0x005faef1 in ExecResult (node=node@entry=0x1f5c020) at 
nodeResult.c:82
#17 0x005ddbf8 in ExecProcNode (node=node@entry=0x1f5c020) at 
execProcnode.c:392
#18 0x005d9c1f in ExecutePlan (dest=0x1ebb7d0, direction=, numberTuples=0, sendTuples=, operation=CMD_SELECT, 
use_parallel_mode=, planstate=0x1f5c020, estate=0x1f5ab28) at 
execMain.c:1567
#19 standard_ExecutorRun (queryDesc=0x1f5a718, direction=, 
count=0) at execMain.c:338
#20 0x006f7238 in PortalRunSelect (portal=portal@entry=0x1d13be8, 
forward=forward@entry=1 '\001', count=0, count@entry=9223372036854775807, 
dest=dest@entry=0x1ebb7d0) at pquery.c:946
#21 0x006f875e in PortalRun (portal=0x1d13be8, 
count=9223372036854775807, isTopLevel=, dest=0x1ebb7d0, 
altdest=0x1ebb7d0, completionTag=0x7ffd4ba20840 "") at pquery.c:787
#22 0x006f6003 in exec_simple_query (query_string=) at 
postgres.c:1094
#23 PostgresMain (argc=30489576, argv=0x1ecfb08, dbname=0x1cf5a00 "regression", 
username=0x1ecfc20 "\b\373\354\001") at postgres.c:4074
#24 0x0046ca67 in BackendRun (port=0x1d17b50) at postmaster.c:4262
#25 BackendStartup (port=0x1d17b50) at postmaster.c:3936
#26 ServerLoop () at postmaster.c:1693
#27 0x00690ab7 in PostmasterMain (argc=argc@entry=3, 
argv=argv@entry=0x1cf45e0) at postmaster.c:1301
#28 0x0046d9cd in main (argc=3, argv=0x1cf45e0) at main.c:228
(gdb) p debug_query_string
$1 = 0x1d68a78 "select  \n  sample_0.j as c0\nfrom \n  public.testjsonb as 
sample_0 tablesample system (8) \nwhere cast(coalesce(pg_catalog.char_length(\n 
 cast((select comment from public.room limit 1 offset 20)\n as 
text)),\npg_catalog.pg_trigger_depth()) as integer) <> 3"

 BEGIN BACKTRACE OF CORE FILE ./slave/postgres.8104@.core ON marbit 
Core was generated by `postgres: bgworker: parallel worker for PID 2610 
 '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  __memcpy_sse2_unaligned () at 
../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:167
167 ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S: Datei oder 
Verzeichnis nicht gefunden.
(gdb) bt
#0  __memcpy_sse2_unaligned () at 
../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:167
#1  0x00822032 in RestoreSnapshot (start_address=0x7f7b1ee4fa58 ) at snapmgr.c:2020
#2  

[HACKERS] [sqlsmith] OOM crash in plpgsql_extra_checks_check_hook

2016-06-20 Thread Andreas Seltenreich
Just had a parallel worker of a memory-starved instance of sqlsmith
crash.  plpgsql_extra_checks_check_hook forgot to check the result of
its malloc call here:

Core was generated by `postgres: bgworker: parallel worker for PID 5905 
   '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  plpgsql_extra_checks_check_hook (newvalue=, 
extra=0x7fff7fe31a58, source=) at pl_handler.c:113
113 *myextra = extrachecks;
(gdb) bt
#0  plpgsql_extra_checks_check_hook (newvalue=, 
extra=0x7fff7fe31a58, source=) at pl_handler.c:113
#1  0x0080173f in call_string_check_hook (newval=0x7fff7fe31a50, 
extra=, source=, elevel=15, conf=, 
conf=) at guc.c:9779
#2  0x008029b8 in InitializeOneGUCOption (gconf=0x4) at guc.c:4546
#3  0x00804dbc in define_custom_variable (variable=0x2cb6ef0) at 
guc.c:7466
#4  0x00805862 in DefineCustomStringVariable 
(name=name@entry=0x7f803cbfe011 "plpgsql.extra_warnings", 
short_desc=short_desc@entry=0x7f803cbfe1f8 "List of programming constructs that 
should produce a warning.", long_desc=long_desc@entry=0x0, 
valueAddr=valueAddr@entry=0x7f803ce070d8 , 
bootValue=bootValue@entry=0x7f803cbfdf78 "none", 
context=context@entry=PGC_USERSET, flags=1, check_hook=0x7f803cbe9700 
, assign_hook=0x7f803cbe96e0 
, show_hook=0x0) at guc.c:7733
#5  0x7f803cbe99ea in _PG_init () at pl_handler.c:173
#6  0x007f1bcb in internal_load_library 
(libname=libname@entry=0x7f8040cee14d ) at dfmgr.c:276
#7  0x007f2738 in RestoreLibraryState (start_address=0x7f8040cee14d 
) at dfmgr.c:741
#8  0x004e61c0 in ParallelWorkerMain (main_arg=) at 
parallel.c:985
#9  0x00684072 in StartBackgroundWorker () at bgworker.c:726
#10 0x0068f142 in do_start_bgworker (rw=0x2cb5230) at postmaster.c:5535
#11 maybe_start_bgworker () at postmaster.c:5709
#12 0x0068fb96 in sigusr1_handler (postgres_signal_arg=) 
at postmaster.c:4971
#13 
#14 0x7f8040091ac3 in __select_nocancel () at 
../sysdeps/unix/syscall-template.S:81
#15 0x0046c31f in ServerLoop () at postmaster.c:1657
#16 0x00690fc7 in PostmasterMain (argc=argc@entry=4, 
argv=argv@entry=0x2c8c620) at postmaster.c:1301
#17 0x0046d96d in main (argc=4, argv=0x2c8c620) at main.c:228


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Don't generate parallel paths for rels with parallel-restricted

2016-06-15 Thread Andreas Seltenreich
Amit Kapila writes:

> Right, so I have moved "Failed assertion in parallel worker
> (ExecInitSubPlan)" item to CLOSE_WAIT state as I don't think there is any
> known pending issue in that item.  I have moved it to CLOSE_WAIT state
> because we have derived our queries to reproduce the problem based on
> original report[1].  If next run of sqlsmith doesn't show any problem in
> this context then we will move it to resolved.

It ran for about 100e6 queries by now without tripping on any parallel
worker related assertions.  I tested with min_parallel_relation_size_v1.patch
applied and set to 64kB to have more exposure during testing.

regards,
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Should pg_export_snapshot() and currtid() be tagged parallel-unsafe?

2016-06-14 Thread Andreas Seltenreich
Digging through the sqlsmith logging db, I noticed the following errors:

ERROR:  cannot update SecondarySnapshot during a parallel operation
ERROR:  cannot assign XIDs during a parallel operation

Queries raising the first one always contain calls to currtid() or
currtid2().  Queries raising the second one always contain a call to
pg_export_snapshot().  Patch to tag them as unsafe attached.

regards,
Andreas
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index f33c3ff..6a65e77 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -1347,9 +1347,9 @@ DATA(insert OID = 1291 (  suppress_redundant_updates_trigger	PGNSP PGUID 12 1 0
 DESCR("trigger to suppress updates when new and old records match");
 
 DATA(insert OID = 1292 ( tideq			   PGNSP PGUID 12 1 0 0 0 f f f t t f i s 2 0 16 "27 27" _null_ _null_ _null_ _null_ _null_ tideq _null_ _null_ _null_ ));
-DATA(insert OID = 1293 ( currtid		   PGNSP PGUID 12 1 0 0 0 f f f f t f v s 2 0 27 "26 27" _null_ _null_ _null_ _null_ _null_ currtid_byreloid _null_ _null_ _null_ ));
+DATA(insert OID = 1293 ( currtid		   PGNSP PGUID 12 1 0 0 0 f f f f t f v u 2 0 27 "26 27" _null_ _null_ _null_ _null_ _null_ currtid_byreloid _null_ _null_ _null_ ));
 DESCR("latest tid of a tuple");
-DATA(insert OID = 1294 ( currtid2		   PGNSP PGUID 12 1 0 0 0 f f f f t f v s 2 0 27 "25 27" _null_ _null_ _null_ _null_ _null_ currtid_byrelname _null_ _null_ _null_ ));
+DATA(insert OID = 1294 ( currtid2		   PGNSP PGUID 12 1 0 0 0 f f f f t f v u 2 0 27 "25 27" _null_ _null_ _null_ _null_ _null_ currtid_byrelname _null_ _null_ _null_ ));
 DESCR("latest tid of a tuple");
 DATA(insert OID = 1265 ( tidne			   PGNSP PGUID 12 1 0 0 0 f f f t t f i s 2 0 16 "27 27" _null_ _null_ _null_ _null_ _null_ tidne _null_ _null_ _null_ ));
 DATA(insert OID = 2790 ( tidgt			   PGNSP PGUID 12 1 0 0 0 f f f t t f i s 2 0 16 "27 27" _null_ _null_ _null_ _null_ _null_ tidgt _null_ _null_ _null_ ));
@@ -3135,7 +3135,7 @@ DESCR("xlog filename, given an xlog location");
 DATA(insert OID = 3165 ( pg_xlog_location_diff		PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 1700 "3220 3220" _null_ _null_ _null_ _null_ _null_ pg_xlog_location_diff _null_ _null_ _null_ ));
 DESCR("difference in bytes, given two xlog locations");
 
-DATA(insert OID = 3809 ( pg_export_snapshot		PGNSP PGUID 12 1 0 0 0 f f f f t f v r 0 0 25 "" _null_ _null_ _null_ _null_ _null_ pg_export_snapshot _null_ _null_ _null_ ));
+DATA(insert OID = 3809 ( pg_export_snapshot		PGNSP PGUID 12 1 0 0 0 f f f f t f v u 0 0 25 "" _null_ _null_ _null_ _null_ _null_ pg_export_snapshot _null_ _null_ _null_ ));
 DESCR("export a snapshot");
 
 DATA(insert OID = 3810 (  pg_is_in_recovery		PGNSP PGUID 12 1 0 0 0 f f f f t f v s 0 0 16 "" _null_ _null_ _null_ _null_ _null_ pg_is_in_recovery _null_ _null_ _null_ ));

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Don't generate parallel paths for rels with parallel-restricted

2016-06-11 Thread Andreas Seltenreich
Amit Kapila writes:

> I have moved it to CLOSE_WAIT state because we have derived our
> queries to reproduce the problem based on original report[1].  If next
> run of sqlsmith doesn't show any problem in this context then we will
> move it to resolved.

I don't have access to my testing horse power this weekend so I can
report on tuesday at the earliest.  Unless someone else feels like
running sqlsmith…

regards,
Andreas
-- 
SQLsmith error of the day: time zone "Bruce Momjian" not recognized.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [sqlsmith] Failed assertion in postgres_fdw/deparse.c:1116

2016-06-05 Thread Andreas Seltenreich
Creating some foreign tables via postgres_fdw in the regression db of
master as of de33af8, sqlsmith triggers the following assertion:

TRAP: FailedAssertion("!(const Node*)(var))->type) == T_Var))", File: 
"deparse.c", Line: 1116)

gdb says var is holding a T_PlaceHolderVar instead.  In a build without
assertions, it leads to an error later:

ERROR:  cache lookup failed for type 0

Recipe:

--8<---cut here---start->8---
create extension postgres_fdw;
create server myself foreign data wrapper postgres_fdw;
create schema fdw_postgres;
create user mapping for public server myself options (user :'USER');
import foreign schema public from server myself into fdw_postgres;
select subq_0.c0 as c0 from
   (select 31 as c0 from fdw_postgres.a as ref_0
  where 93 >= ref_0.aa) as subq_0
   right join fdw_postgres.rtest_vview5 as ref_1
   on (subq_0.c0 = ref_1.a )
   where 92 = subq_0.c0;
--8<---cut here---end--->8---

regards,
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] PANIC: failed to add BRIN tuple

2016-05-29 Thread Andreas Seltenreich
Alvaro Herrera writes:

> If you can re-run sqlsmith and see if you can find different bugs, I'd
> appreciate it.
[...]
> [2. text/x-diff; brincrash-2.patch]

BRIN is inconspicuous since applying this patch.  All coredumps I see
now are either due to the parallel worker shutdown issue or acl.c's
text/name confusion, both reported earlier.

regards,
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ORDER/GROUP BY expression not found in targetlist

2016-05-26 Thread Andreas Seltenreich
Tom Lane writes:

> Andreas Seltenreich <seltenre...@gmx.de> writes:
>> Peter Geoghegan writes:
>>> It's surprising that SQL Smith didn't catch something with such simple
>>> steps to reproduce.
>
>> I removed distinct relatively early because it causes a large part of
>> queries to fail due to it not finding an equality operator it likes.  It
>> seems to be more picky about the equality operator than, say, joins.
>> I'm sure it has a good reason to do so?
>
> It's looking for an operator that is known to be semantically equality,
> by virtue of being the equality member of a btree or hash opclass.
> Type path has no such opclass unfortunately.

As do lots of data types in the regression db while still having an
operator providing semantic equivalence.  I was hoping for someone to
question that status quo.  Naively I'd say an equivalence flag is
missing in the catalog that is independent of opclasses.

regards
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ORDER/GROUP BY expression not found in targetlist

2016-05-26 Thread Andreas Seltenreich
Peter Geoghegan writes:

> On Wed, May 25, 2016 at 7:12 PM, Andres Freund  wrote:
>>
>> =# CREATE TABLE twocol(col01 int, col02 int);
>> =# SELECT DISTINCT col01, col02, col01 FROM twocol ;
>> ERROR:  XX000: ORDER/GROUP BY expression not found in targetlist
>> LOCATION:  get_sortgroupref_tle, tlist.c:341
>>
>> which appears to be a 9.6 regression, presumable fallout from the path
>> restructuring.
>
> It's surprising that SQL Smith didn't catch something with such simple
> steps to reproduce.

I removed distinct relatively early because it causes a large part of
queries to fail due to it not finding an equality operator it likes.  It
seems to be more picky about the equality operator than, say, joins.
I'm sure it has a good reason to do so?

regression=> select distinct f1 from path_tbl;
ERROR:  could not identify an equality operator for type path
LINE 1: select distinct f1 from path_tbl;

regression=> \do =
-[ RECORD 38 ]-+
Schema | pg_catalog
Name   | =
Left arg type  | path
Right arg type | path
Result type| boolean
Description| equal



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] PANIC: failed to add BRIN tuple

2016-05-25 Thread Andreas Seltenreich
I wrote:

> Re-fuzzing now with your patch applied.

This so far yielded three BRIN core dumps on different machines with the
same backtraces.  Crash recovery fails in these cases.

I've put the data directory here (before recovery):

http://ansel.ydns.eu/~andreas/brincrash2-spidew.tar.xz (9.1M)

Corresponding backtraces of the backend and startup core files below.

regards,
Andreas

Core was generated by `postgres: smith brintest [local] UPDATE  
   '.
Program terminated with signal SIGABRT, Aborted.
#0  0x7f5a49a9c067 in __GI_raise (sig=sig@entry=6) at 
../nptl/sysdeps/unix/sysv/linux/raise.c:56
56  ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x7f5a49a9c067 in __GI_raise (sig=sig@entry=6) at 
../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x7f5a49a9d448 in __GI_abort () at abort.c:89
#2  0x007ec979 in errfinish (dummy=dummy@entry=0) at elog.c:557
#3  0x007f012c in elog_finish (elevel=elevel@entry=20, 
fmt=fmt@entry=0x989cf0 "incorrect index offsets supplied") at elog.c:1378
#4  0x006eda2f in PageIndexDeleteNoCompact 
(page=page@entry=0x7f5a48c6f200 "Y", itemnos=itemnos@entry=0x7ffe6d0d0abc, 
nitems=nitems@entry=1) at bufpage.c:1011
#5  0x00470119 in brin_doupdate (idxrel=0x1c5a530, pagesPerRange=1, 
revmap=0x91a7d90, heapBlk=2166, oldbuf=3782, oldoff=2, origtup=0x719db80, 
origsz=3656, newtup=0x754e188, newsz=3656, samepage=1 '\001') at 
brin_pageops.c:181
#6  0x0046e5db in brininsert (idxRel=0x1c5a530, values=0x6591, 
nulls=0x6 , 
heaptid=0x, heapRel=0x7f5a4a72f700, 
checkUnique=UNIQUE_CHECK_NO) at brin.c:244
#7  0x005d888f in ExecInsertIndexTuples (slot=0x91a4870, 
tupleid=0x91a7df4, estate=0x91b9b38, noDupErr=0 '\000', specConflict=0x0, 
arbiterIndexes=0x0) at execIndexing.c:383
#8  0x005f74e5 in ExecUpdate (tupleid=0x7ffe6d0d0ed0, oldtuple=0x6591, 
slot=0x91a4870, planSlot=0x, epqstate=0x7f5a4a72f700, 
estate=0x91b9b38, canSetTag=1 '\001') at nodeModifyTable.c:1015
#9  0x005f7b7c in ExecModifyTable (node=0x71861c0) at 
nodeModifyTable.c:1501
#10 0x005dd5e8 in ExecProcNode (node=node@entry=0x71861c0) at 
execProcnode.c:396
#11 0x005d963f in ExecutePlan (dest=0x17bb870, direction=, numberTuples=0, sendTuples=, operation=CMD_UPDATE, 
use_parallel_mode=, planstate=0x71861c0, estate=0x91b9b38) at 
execMain.c:1567
#12 standard_ExecutorRun (queryDesc=0x17bb908, direction=, 
count=0) at execMain.c:338
#13 0x006f74d9 in ProcessQuery (plan=, 
sourceText=0x1670e18 "update public.brintest set \n  charcol = null, \n  
namecol = pg_catalog.name(cast(null as character varying)\n), \n  int8col = 
null, \n  int2col = public.brintest.int2col, \n  textcol = null, \n  float4col 
= null, \n  float8col = cast(coalesce(null,\n\npublic.brintest.float8col) 
as double precision), \n  inetcol = public.brintest.inetcol, \n  cidrcol = 
public.brintest.cidrcol, \n  bpcharcol = null, \n  datecol = (select datecol 
from public.brintest limit 1 offset 25)\n, \n  timecol = (select timecol from 
public.brintest limit 1 offset 42)\n, \n  timetzcol = (select timetzcol from 
public.brintest limit 1 offset 3)\n, \n  numericcol = (select numericcol from 
public.brintest limit 1 offset 32)\n, \n  uuidcol = 
public.brintest.uuidcol\nreturning \n  public.brintest.datecol as c0;", 
params=0x0, dest=0x17bb870, completionTag=0x7ffe6d0d10a0 "") at pquery.c:185
#14 0x006f776f in PortalRunMulti (portal=portal@entry=0x1611b68, 
isTopLevel=isTopLevel@entry=1 '\001', dest=dest@entry=0x17bb870, 
altdest=0xc96680 , 
completionTag=completionTag@entry=0x7ffe6d0d10a0 "") at pquery.c:1267
#15 0x006f7a1c in FillPortalStore (portal=portal@entry=0x1611b68, 
isTopLevel=isTopLevel@entry=1 '\001') at pquery.c:1044
#16 0x006f846d in PortalRun (portal=0x1611b68, 
count=9223372036854775807, isTopLevel=, dest=0x756c870, 
altdest=0x756c870, completionTag=0x7ffe6d0d1450 "") at pquery.c:782
#17 0x006f5c73 in exec_simple_query (query_string=) at 
postgres.c:1094
#18 PostgresMain (argc=23141224, argv=0x2cab518, dbname=0x15f3578 "brintest", 
username=0x2cab570 "\030\265\312\002") at postgres.c:4059
#19 0x0046c8d2 in BackendRun (port=0x1618880) at postmaster.c:4258
#20 BackendStartup (port=0x1618880) at postmaster.c:3932
#21 ServerLoop () at postmaster.c:1690
#22 0x006907fe in PostmasterMain (argc=argc@entry=4, 
argv=argv@entry=0x15f2560) at postmaster.c:1298
#23 0x0046d82d in main (argc=4, argv=0x15f2560) at main.c:228
(gdb) frame 4
#4  0x006eda2f in PageIndexDeleteNoCompact 
(page=page@entry=0x7f5a48c6f200 "Y", itemnos=itemnos@entry=0x7ffe6d0d0abc, 
nitems=nitems@entry=1) at bufpage.c:1011
1011elog(ERROR, "incorrect index offsets supplied");
(gdb) list
1006}
1007}
1008
1009/* this will catch invalid or out-of-order itemnos[] */
1010 

Re: [HACKERS] [sqlsmith] PANIC: failed to add BRIN tuple

2016-05-25 Thread Andreas Seltenreich
I wrote:
> Alvaro Herrera writes:
>> How long does it take for you to reproduce this panic in the unpatched
>> code?
>
> I could probably speed it up by creating lots of additional BRIN indexes
> in the regression database, and by compiling a sqlsmith that generates
> update statements only.

This actually worked.  Sqlsmith triggered the BRIN panic twice in 80e6
queries (vs. onece in 4e9 before).  I pushed the modified version on the
branch "modify-heavy".  Re-fuzzing now with your patch applied.

andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] PANIC: failed to add BRIN tuple

2016-05-24 Thread Andreas Seltenreich
Alvaro Herrera writes:

> How long does it take for you to reproduce this panic in the unpatched
> code?

Very long, I'm afraid.  I only observed it once, and according to the
logging database, about 4e9 random queries were generated since testing
with 9.5 code.

I could probably speed it up by creating lots of additional BRIN indexes
in the regression database, and by compiling a sqlsmith that generates
update statements only.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Failed assertions on parallel worker shutdown

2016-05-24 Thread Andreas Seltenreich
Amit Kapila writes:

> On Mon, May 23, 2016 at 4:48 PM, Andreas Seltenreich <seltenre...@gmx.de>
> wrote:
>> plan6 corresponds to this query:
>>
> Are you sure that the core dumps you are seeing are due to plan6?

Each of the plans sent was harvested from a controlling process when the
above assertion failed in its workers.  I do not know whether the plans
themselves really are at fault, as most of the collected plans look ok
to me.  The backtrace in the controlling process always look like the
one reported. (Except when the coredumping took so long as to trigger a
statement_timeout in the still-running master. There are no
plans/queries available in this case, as the the state is no longer
available in an aborted transaction.)

> I have tried to generate a parallel plan for above query and it seems to me 
> that
> after applying the patches (avoid_restricted_clause_below_gather_v1.patch
> and prohibit_parallel_clause_below_rel_v1.patch), the plan it generates
> doesn't have subplan below gather node [1].

> Without patch avoid_restricted_clause_below_gather_v1.patch, it will allow to 
> push
> subplan below gather node, so I think either there is some other plan
> (query) due to which you are seeing core dumps or the above two patches
> haven't been applied before testing.

According to my notes, the patches were applied in the instance that
crashed.  The fact that I do not see the other variants of the crashes
the patches fix anymore, and the probability for this failed assertion
per random query is reduced by about a factor of 20 in contrast to
testing with the patches not applied, I'm pretty certain that this is
not a bookkeeping error on my part.

> Is it possible that core dump is due to plan2 or some other similar
> plan (I am not sure at this stage about the cause of the problem you
> are seeing, but if due to some reason PARAM_EXEC params are pushed
> below gather, then such a plan might not work)?  If you think plan
> other than plan6 can cause such a problem, then can you share the
> query for plan2?

Each of the sent plans was collected when a worker dumped core due to
the failed assertion.  More core dumps than plans were actually
observed, since with this failed assertion, multiple workers usually
trip on and dump core simultaneously.

The following query corresponds to plan2:

--8<---cut here---start->8---
select
  pg_catalog.pg_stat_get_bgwriter_requested_checkpoints() as c0,
  subq_0.c3 as c1, subq_0.c1 as c2, 31 as c3, 18 as c4,
  (select unique1 from public.bprime limit 1 offset 9) as c5,
  subq_0.c2 as c6
from
(select ref_0.tablename as c0, ref_0.inherited as c1,
ref_0.histogram_bounds as c2, 100 as c3
  from
pg_catalog.pg_stats as ref_0
  where 49 is not NULL limit 55) as subq_0
where true
limit 58;
--8<---cut here---end--->8---

regards,
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Failed assertions on parallel worker shutdown

2016-05-23 Thread Andreas Seltenreich
Amit Kapila writes:

> Earlier problems were due to the reason that some unsafe/restricted
> expressions were pushed below Gather node as part of target list whereas in
> the plan6, it seems some unsafe node is pushed below Gather node. It will
> be helpful if you can share the offending query?

plan6 corresponds to this query:

select
pg_catalog.anyarray_out(
cast((select most_common_vals from pg_catalog.pg_stats limit 1 offset 41)
 as anyarray)) as c0
 from
public.quad_point_tbl as ref_0 where ref_0.p ~= ref_0.p;

regards,
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Failed assertions on parallel worker shutdown

2016-05-23 Thread Andreas Seltenreich
I wrote:
>> There's another class of parallel worker core dumps when testing master
>> with sqlsmith.  In these cases, the following assertion fails for all
>> workers simulataneously:
>>
>> TRAP: FailedAssertion("!(mqh->mqh_partial_bytes <= nbytes)", File: 
>> "shm_mq.c", Line: 386)
>
> I no longer observe these after applying these two patches by Amit
> Kapila

I spoke too soon: These still occur with the patches applied, but with
much lower probability. (one core dump per 20e6 random queries instead
of 1e6).

Most of the collected plans look inconspicuous to me now, except for one
that again had a subplan below a gather node (plan6).  Tarball of all
collected plans attached.

regards,
Andreas



plans.tar.gz
Description: application/gzip

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Failed assertions on parallel worker shutdown

2016-05-22 Thread Andreas Seltenreich
I wrote:

> There's another class of parallel worker core dumps when testing master
> with sqlsmith.  In these cases, the following assertion fails for all
> workers simulataneously:
>
> TRAP: FailedAssertion("!(mqh->mqh_partial_bytes <= nbytes)", File: 
> "shm_mq.c", Line: 386)

I no longer observe these after applying these two patches by Amit
Kapila:

avoid_restricted_clause_below_gather_v1.patch
Message-ID: 

Re: [HACKERS] [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)

2016-05-22 Thread Andreas Seltenreich
Amit Kapila writes:

> avoid_restricted_clause_below_gather_v1.patch
> prohibit_parallel_clause_below_rel_v1.patch

I didn't observe any parallel worker related coredumps since applying
these.  The same amount of testing done before applying them yielded
about a dozend.

Dilip Kumar writes:

> So now its clear that because of sub query pullup, we may get expression in
> targetlist while creating single table path list. So we need to avoid
> parallel plan if it contains expression.

This sounds like a rather heavy restriction though…

regards,
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [sqlsmith] PANIC: failed to add BRIN tuple

2016-05-22 Thread Andreas Seltenreich
There was one instance of this PANIC when testing with the regression db
of master at 50e5315.

,
| WARNING:  specified item offset is too large
| PANIC:  failed to add BRIN tuple
| server closed the connection unexpectedly
`

It is reproducible with the query below on this instance only.  I've put
the data directory (20MB) here:

http://ansel.ydns.eu/~andreas/brincrash.tar.xz

The instance was running on Debian Jessie amd64.  Query and Backtrace
below.

regards,
Andreas

--8<---cut here---start->8---
update public.brintest set byteacol = null, charcol =
public.brintest.charcol, int2col = null, int4col =
public.brintest.int4col, textcol = public.brintest.textcol, oidcol =
cast(coalesce(cast(coalesce(null, public.brintest.oidcol) as oid),
pg_catalog.pg_my_temp_schema()) as oid), tidcol =
public.brintest.tidcol, float8col = public.brintest.float8col,
macaddrcol = null, cidrcol = public.brintest.cidrcol, datecol =
public.brintest.datecol, timecol = public.brintest.timecol,
timestamptzcol = pg_catalog.clock_timestamp(), intervalcol =
public.brintest.intervalcol, timetzcol = public.brintest.timetzcol,
bitcol = public.brintest.bitcol, varbitcol =
public.brintest.varbitcol, uuidcol = null returning
public.brintest.byteacol as c0;
--8<---cut here---end--->8---

Core was generated by `postgres: smith regression [local] UPDATE
   '.
Program terminated with signal SIGABRT, Aborted.
#0  0x7fd2cda67067 in __GI_raise (sig=sig@entry=6) at 
../nptl/sysdeps/unix/sysv/linux/raise.c:56
56  ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x7fd2cda67067 in __GI_raise (sig=sig@entry=6) at 
../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x7fd2cda68448 in __GI_abort () at abort.c:89
#2  0x007ec969 in errfinish (dummy=dummy@entry=0) at elog.c:557
#3  0x007f011c in elog_finish (elevel=elevel@entry=20, 
fmt=fmt@entry=0x82ca8f "failed to add BRIN tuple") at elog.c:1378
#4  0x00470618 in brin_doupdate (idxrel=0x101f4c0, pagesPerRange=1, 
revmap=0x10d20e50, heapBlk=8, oldbuf=2878, oldoff=9, origtup=0x10d864a8, 
origsz=6144, newtup=0x5328a88, newsz=6144, samepage=1 '\001') at 
brin_pageops.c:184
#5  0x0046e5bb in brininsert (idxRel=0x101f4c0, values=0x211b, 
nulls=0x6 , 
heaptid=0x, heapRel=0x7fd2ce6fd700, 
checkUnique=UNIQUE_CHECK_NO) at brin.c:244
#6  0x005d887f in ExecInsertIndexTuples (slot=0xe92a560, 
tupleid=0x10d21084, estate=0x9ed8a68, noDupErr=0 '\000', specConflict=0x0, 
arbiterIndexes=0x0) at execIndexing.c:383
#7  0x005f74d5 in ExecUpdate (tupleid=0x7ffe11ea74a0, oldtuple=0x211b, 
slot=0xe92a560, planSlot=0x, epqstate=0x7fd2ce6fd700, 
estate=0x9ed8a68, canSetTag=1 '\001') at nodeModifyTable.c:1015
#8  0x005f7b6c in ExecModifyTable (node=0x9ed8d28) at 
nodeModifyTable.c:1501
#9  0x005dd5d8 in ExecProcNode (node=node@entry=0x9ed8d28) at 
execProcnode.c:396
#10 0x005d962f in ExecutePlan (dest=0xde86040, direction=, numberTuples=0, sendTuples=, operation=CMD_UPDATE, 
use_parallel_mode=, planstate=0x9ed8d28, estate=0x9ed8a68) at 
execMain.c:1567
#11 standard_ExecutorRun (queryDesc=0xde860d8, direction=, 
count=0) at execMain.c:338
#12 0x006f74c9 in ProcessQuery (plan=, 
sourceText=0xd74e88 "update public.brintest[...]", params=0x0, dest=0xde86040, 
completionTag=0x7ffe11ea7670 "") at pquery.c:185
#13 0x006f775f in PortalRunMulti (portal=portal@entry=0xde8abf0, 
isTopLevel=isTopLevel@entry=1 '\001', dest=dest@entry=0xde86040, 
altdest=0xc96680 , 
completionTag=completionTag@entry=0x7ffe11ea7670 "") at pquery.c:1267
#14 0x006f7a0c in FillPortalStore (portal=portal@entry=0xde8abf0, 
isTopLevel=isTopLevel@entry=1 '\001') at pquery.c:1044
#15 0x006f845d in PortalRun (portal=0xde8abf0, 
count=9223372036854775807, isTopLevel=, dest=0x9ee76b8, 
altdest=0x9ee76b8, completionTag=0x7ffe11ea7a20 "") at pquery.c:782
#16 0x006f5c63 in exec_simple_query (query_string=) at 
postgres.c:1094
#17 PostgresMain (argc=233352176, argv=0xe8ad358, dbname=0xcf7508 "regression", 
username=0xe8ad3b0 "Xӊ\016") at postgres.c:4059
#18 0x0046c8b2 in BackendRun (port=0xd1c580) at postmaster.c:4258
#19 BackendStartup (port=0xd1c580) at postmaster.c:3932
#20 ServerLoop () at postmaster.c:1690
#21 0x0069081e in PostmasterMain (argc=argc@entry=4, 
argv=argv@entry=0xcf64f0) at postmaster.c:1298
#22 0x0046d80d in main (argc=4, argv=0xcf64f0) at main.c:228


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [sqlsmith] Failed assertions on parallel worker shutdown

2016-05-22 Thread Andreas Seltenreich
There's another class of parallel worker core dumps when testing master
with sqlsmith.  In these cases, the following assertion fails for all
workers simulataneously:

TRAP: FailedAssertion("!(mqh->mqh_partial_bytes <= nbytes)", File: "shm_mq.c", 
Line: 386)

The backtraces of the controlling process is always in
ExecShutdownGatherWorkers.  The queries always work fine on re-running,
so I guess there is some race condition on worker shutdown?  Backtraces
below.

regards
andreas

Core was generated by `postgres: bgworker: parallel worker for PID 30525   
'.
Program terminated with signal SIGABRT, Aborted.
#0  0x7f5a3df91067 in __GI_raise (sig=sig@entry=6) at 
../nptl/sysdeps/unix/sysv/linux/raise.c:56
56  ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x7f5a3df91067 in __GI_raise (sig=sig@entry=6) at 
../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x7f5a3df92448 in __GI_abort () at abort.c:89
#2  0x007eabe1 in ExceptionalCondition 
(conditionName=conditionName@entry=0x984e10 "!(mqh->mqh_partial_bytes <= 
nbytes)", errorType=errorType@entry=0x82a75d "FailedAssertion", 
fileName=fileName@entry=0x984b8c "shm_mq.c", lineNumber=lineNumber@entry=386) 
at assert.c:54
#3  0x006d8042 in shm_mq_sendv (mqh=0x25f17b8, 
iov=iov@entry=0x7ffc6352af00, iovcnt=iovcnt@entry=1, nowait=) at 
shm_mq.c:386
#4  0x006d807d in shm_mq_send (mqh=, nbytes=, data=, nowait=) at shm_mq.c:327
#5  0x005d96b9 in ExecutePlan (dest=0x25f1850, direction=, numberTuples=0, sendTuples=, operation=CMD_SELECT, 
use_parallel_mode=, planstate=0x2612da8, estate=0x2612658) at 
execMain.c:1596
#6  standard_ExecutorRun (queryDesc=0x261a660, direction=, 
count=0) at execMain.c:338
#7  0x005dc7cf in ParallelQueryMain (seg=, 
toc=0x7f5a3ea6c000) at execParallel.c:735
#8  0x004e617b in ParallelWorkerMain (main_arg=) at 
parallel.c:1035
#9  0x00683862 in StartBackgroundWorker () at bgworker.c:726
#10 0x0068e9a2 in do_start_bgworker (rw=0x2590760) at postmaster.c:5531
#11 maybe_start_bgworker () at postmaster.c:5706
#12 0x0046cbba in ServerLoop () at postmaster.c:1762
#13 0x0069081e in PostmasterMain (argc=argc@entry=4, 
argv=argv@entry=0x256d580) at postmaster.c:1298
#14 0x0046d80d in main (argc=4, argv=0x256d580) at main.c:228
(gdb) attach 30525
0x7f5a3e044e33 in __epoll_wait_nocancel () at 
../sysdeps/unix/syscall-template.S:81
81  ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) bt
#0  0x7f5a3e044e33 in __epoll_wait_nocancel () at 
../sysdeps/unix/syscall-template.S:81
#1  0x006d1b4e in WaitEventSetWaitBlock (nevents=1, 
occurred_events=0x7ffc6352aec0, cur_timeout=-1, set=0x44251c0) at latch.c:981
#2  WaitEventSetWait (set=set@entry=0x44251c0, timeout=timeout@entry=-1, 
occurred_events=occurred_events@entry=0x7ffc6352aec0, nevents=nevents@entry=1) 
at latch.c:935
#3  0x006d1f96 in WaitLatchOrSocket (latch=0x7f5a3d898494, 
wakeEvents=wakeEvents@entry=1, sock=sock@entry=-1, timeout=timeout@entry=-1) at 
latch.c:347
#4  0x006d205d in WaitLatch (latch=, 
wakeEvents=wakeEvents@entry=1, timeout=timeout@entry=-1) at latch.c:302
#5  0x004e6d64 in WaitForParallelWorkersToFinish (pcxt=0x442d4e8) at 
parallel.c:537
#6  0x005dcf84 in ExecParallelFinish (pei=0x441cab8) at 
execParallel.c:541
#7  0x005eeead in ExecShutdownGatherWorkers (node=node@entry=0x3e3a070) 
at nodeGather.c:416
#8  0x005ef389 in ExecShutdownGather (node=0x3e3a070) at 
nodeGather.c:430
#9  0x005dd03d in ExecShutdownNode (node=0x3e3a070) at 
execProcnode.c:807
#10 0x0061ad73 in planstate_tree_walker (planstate=0x3e361a8, 
walker=0x5dd010 , context=0x0) at nodeFuncs.c:3442
#11 0x0061ad73 in planstate_tree_walker (planstate=0xf323c30, 
walker=0x5dd010 , context=0x0) at nodeFuncs.c:3442
#12 0x0061ad73 in planstate_tree_walker (planstate=0xf323960, 
walker=0x5dd010 , context=0x0) at nodeFuncs.c:3442
#13 0x005d96da in ExecutePlan (dest=0xb826868, direction=, numberTuples=0, sendTuples=, operation=CMD_SELECT, 
use_parallel_mode=, planstate=0xf323960, estate=0xf322b28) at 
execMain.c:1576
#14 standard_ExecutorRun (queryDesc=0xddca888, direction=, 
count=0) at execMain.c:338
#15 0x006f6e88 in PortalRunSelect (portal=portal@entry=0x258ccc8, 
forward=forward@entry=1 '\001', count=0, count@entry=9223372036854775807, 
dest=dest@entry=0xb826868) at pquery.c:946
#16 0x006f83ae in PortalRun (portal=0x258ccc8, 
count=9223372036854775807, isTopLevel=, dest=0xb826868, 
altdest=0xb826868, completionTag=0x7ffc6352b3d0 "") at pquery.c:787
#17 0x006f5c63 in exec_simple_query (query_string=) at 
postgres.c:1094
#18 PostgresMain (argc=39374024, argv=0x25ed130, dbname=0x256e480 "regression", 
username=0x25ed308 "0\321^\002") at postgres.c:4059
#19 0x0046c8b2 in BackendRun (port=0x25935d0) at postmaster.c:4258
#20 

[HACKERS] Just-in-time compiling things (was: asynchronous and vectorized execution)

2016-05-14 Thread Andreas Seltenreich
Konstantin Knizhnik writes:

> Latest information from ISP RAS guys: them have made good progress
> since February: them have rewritten most of methods of Scan, Aggregate
> and Join to LLVM API.

Is their work available somewhere?  I'm experimenting in that area as
well, although I'm using libFirm instead of LLVM.  I wonder what their
motivation to rewrite backend code in LLVM IR was, since I am following
the approach of keeping the IR around when compiling the vanilla
postgres C code, possibly inlining it during JIT and then doing
optimizations on this IR.  That way the logic doesn't have to be
duplicated.

regrads
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Failed assertion in BecomeLockGroupLeader

2016-05-05 Thread Andreas Seltenreich
Alvaro Herrera writes:

> Robert Haas wrote:
>> On Thu, May 5, 2016 at 4:11 PM, Andreas Seltenreich <seltenre...@gmx.de> 
>> wrote:
>> > I don't see these crashes anymore in c1543a8.  By the amount of fuzzing
>> > done it should have happened a dozen times, so it's highly likely
>> > something in 23b09e15..c1543a8 fixed it.
>> 
>> Hmm, I'd guess c45bf5751b6338488bd79ce777210285531da373 to be the most
>> likely candidate.
>
> I thought so too, but then that patch change things in the planner side,
> but it seems to me that the reported crash is in the executor, unless I'm
> misreading.

Tom had a theory in Message-ID: <12751.1461937...@sss.pgh.pa.us> on how
the planner bug could cause the executor crash.

regards,
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Failed assertion in BecomeLockGroupLeader

2016-05-05 Thread Andreas Seltenreich
> Amit Kapila writes:
>> Sounds good.  So can we assume that you will try to get us the new report
>> with more information?

I don't see these crashes anymore in c1543a8.  By the amount of fuzzing
done it should have happened a dozen times, so it's highly likely
something in 23b09e15..c1543a8 fixed it.

regards,
andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)

2016-05-05 Thread Andreas Seltenreich
Hi,

when fuzz testing master as of c1543a8, parallel workers trigger the
following assertion in ExecInitSubPlan every couple hours.

TRAP: FailedAssertion("!(list != ((List *) ((void *)0)))", File: "list.c", 
Line: 390)

Sample backtraces of a worker and leader below, plan of leader attached.
The collected queries don't seem to reproduce it.  Curiously, running
explain on them on the failed instance after crash recovery never shows
any gather nodes…

regards,
andreas

Core was generated by `postgres: bgworker: parallel worker for PID 28062
   '.
Program terminated with signal SIGABRT, Aborted.
(gdb) bt
#3  0x0061bad2 in list_nth_cell (list=0x0, n=) at 
list.c:390
#4  0x0061bb26 in list_nth (list=, n=) at 
list.c:413
#5  0x005fe566 in ExecInitSubPlan (subplan=subplan@entry=0x1522a08, 
parent=parent@entry=0x1538188) at nodeSubplan.c:705
#6  0x005e3b54 in ExecInitExpr (node=0x1522a08, 
parent=parent@entry=0x1538188) at execQual.c:4724
#7  0x005e415c in ExecInitExpr (node=, 
parent=parent@entry=0x1538188) at execQual.c:5164
#8  0x005ff3fc in ExecInitAlternativeSubPlan 
(asplan=asplan@entry=0x1522060, parent=parent@entry=0x1538188) at 
nodeSubplan.c:1185
#9  0x005e35c4 in ExecInitExpr (node=0x1522060, 
parent=parent@entry=0x1538188) at execQual.c:4740
#10 0x005e3f8a in ExecInitExpr (node=0x1522978, 
parent=parent@entry=0x1538188) at execQual.c:4845
#11 0x005e415c in ExecInitExpr (node=, 
parent=parent@entry=0x1538188) at execQual.c:5164
#12 0x005e3687 in ExecInitExpr (node=0x1522920, 
parent=parent@entry=0x1538188) at execQual.c:4648
#13 0x005e330f in ExecInitExpr (node=0x15228c8, 
parent=parent@entry=0x1538188) at execQual.c:5132
#14 0x005e326f in ExecInitExpr (node=0x1522870, 
parent=parent@entry=0x1538188) at execQual.c:5152
#15 0x005e415c in ExecInitExpr (node=, 
parent=parent@entry=0x1538188) at execQual.c:5164
#16 0x005fbb62 in ExecInitSeqScan (node=0x1522728, estate=0x15379b8, 
eflags=16) at nodeSeqscan.c:192
#17 0x005dd567 in ExecInitNode (node=0x1522728, 
estate=estate@entry=0x15379b8, eflags=eflags@entry=16) at execProcnode.c:192
#18 0x005f12a5 in ExecInitHashJoin (node=0x1522530, estate=0x15379b8, 
eflags=16) at nodeHashjoin.c:489
#19 0x005dd497 in ExecInitNode (node=node@entry=0x1522530, 
estate=estate@entry=0x15379b8, eflags=eflags@entry=16) at execProcnode.c:275
#20 0x005dae6c in InitPlan (eflags=16, queryDesc=) at 
execMain.c:959
#21 standard_ExecutorStart (queryDesc=, eflags=16) at 
execMain.c:238
#22 0x005dcac4 in ParallelQueryMain (seg=, 
toc=0x7f442d27b000) at execParallel.c:729
#23 0x004e631b in ParallelWorkerMain (main_arg=) at 
parallel.c:1033
#24 0x00683af2 in StartBackgroundWorker () at bgworker.c:726
#25 0x0068ec32 in do_start_bgworker (rw=0x14c4a20) at postmaster.c:5531
#26 maybe_start_bgworker () at postmaster.c:5706
#27 0x0068f686 in sigusr1_handler (postgres_signal_arg=) 
at postmaster.c:4967
#28 
#29 0x7f442c839ac3 in __select_nocancel () at 
../sysdeps/unix/syscall-template.S:81
#30 0x0046c144 in ServerLoop () at postmaster.c:1654
#31 0x00690aae in PostmasterMain (argc=argc@entry=4, 
argv=argv@entry=0x14a1560) at postmaster.c:1298
#32 0x0046d78d in main (argc=4, argv=0x14a1560) at main.c:228
(gdb) frame 5
#5  0x005fe566 in ExecInitSubPlan (subplan=subplan@entry=0x1522a08, 
parent=parent@entry=0x1538188) at nodeSubplan.c:705
(gdb) list
704 /* Link the SubPlanState to already-initialized subplan */
705 sstate->planstate = (PlanState *) 
list_nth(estate->es_subplanstates,
706subplan->plan_id - 
1);

(gdb) attach 28062
Attaching to program: /home/smith/postgres/inst/master/bin/postgres, process 
28062
(gdb) bt
#0  0x7f442c840e33 in __epoll_wait_nocancel () at 
../sysdeps/unix/syscall-template.S:81
#1  0x006d1dde in WaitEventSetWaitBlock (nevents=1, 
occurred_events=0x7fffdedd75a0, cur_timeout=-1, set=0xe3eedb8) at latch.c:981
#2  WaitEventSetWait (set=set@entry=0xe3eedb8, timeout=timeout@entry=-1, 
occurred_events=occurred_events@entry=0x7fffdedd75a0, nevents=nevents@entry=1) 
at latch.c:935
#3  0x006d2226 in WaitLatchOrSocket (latch=0x7f442c0d9644, 
wakeEvents=wakeEvents@entry=1, sock=sock@entry=-1, timeout=-1, timeout@entry=0) 
at latch.c:347
#4  0x006d22ed in WaitLatch (latch=, 
wakeEvents=wakeEvents@entry=1, timeout=timeout@entry=0) at latch.c:302
#5  0x005ef4e3 in gather_readnext (gatherstate=0xe3d7ce8) at 
nodeGather.c:384
#6  gather_getnext (gatherstate=0xe3d7ce8) at nodeGather.c:283
#7  ExecGather (node=node@entry=0xe3d7ce8) at nodeGather.c:229
#8  0x005dd728 in ExecProcNode (node=node@entry=0xe3d7ce8) at 
execProcnode.c:515
#9  0x005f995c in ExecNestLoop (node=node@entry=0x10ef9d90) at 
nodeNestloop.c:174
#10 

Re: [HACKERS] [sqlsmith] Failed assertion in BecomeLockGroupLeader

2016-04-30 Thread Andreas Seltenreich
Amit Kapila writes:

> On Fri, Apr 29, 2016 at 7:15 PM, Tom Lane  wrote:
>> but it might be worth copying over the full query from the parent
>> side.
>
> That would amount to couple of extra cycles considering we need to do it
> for each worker, but OTOH it might be a useful debugging information in the
> cases as reported in this thread.

Maybe only do it in assertion-enabled builds when performance is an
issue?

regards,
andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Failed assertion in BecomeLockGroupLeader

2016-04-30 Thread Andreas Seltenreich
Amit Kapila writes:

> On Sat, Apr 30, 2016 at 5:58 AM, Andreas Seltenreich <seltenre...@gmx.de> 
> wrote:
>> This sounds like it should work to capture more context when the
>> Assertion fails the next time.
>
> Sounds good.  So can we assume that you will try to get us the new report
> with more information?

Ja.  I do have a busy weekend+week ahead though, so no holding of
breath.

regards
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Failed assertion in BecomeLockGroupLeader

2016-04-29 Thread Andreas Seltenreich
Simon Riggs writes:

> It's good that the input is fuzzed, but there needs to be a way to re-run
> identical fuzzing or a way to backtrack to find what broke. Not much point
> finding bugs we can't identify later.

sqlsmith is deterministic and allows re-generating a sequence of random
queries with the --seed argument.  Finding a testing methodology that
ensures a repeatable server-side is a harder problem though.

One would have to avoid touching any kind of concurrency, disable
autovacuum, autoanalyze and invoke explicit analyzes/vacuums in concert
with query generation.  Further, one would have to avoid any kind of
concurrency while testing.  Even then, 1% of the queries run into a
statement_timeout due to randomly generated excessive cross joins.  If a
timeout just barely happens, it might not do so on the repeated run and
the deterministic state is gone from then on.  I'm afraid this list is
not complete yet.

I didn't think the effort of creating this kind of clean-room testing
was worth it.  If reports of failed assertions with backtrace without a
recipe to reproduce them are a nuisance, I'll avoid them in the future.

regards,
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Failed assertion in BecomeLockGroupLeader

2016-04-29 Thread Andreas Seltenreich
Alvaro Herrera writes:
> Amit Kapila wrote:
>> It will be helpful if you can find the offending query or plan
>> corresponding to it?
>
> So I suppose the PID of the process starting the workers should be in
> the stack somewhere.

Ja, it's right on the top, but long gone by now…

> With that one should be able to attach to that process and get another
> stack trace.  I'm curious on whether you would need to have started
> the server with "postgres -T"

This sounds like it should work to capture more context when the
Assertion fails the next time.  I have to purge the catalogs a bit
though to avoid stopping early on boring core dumps.  Most of them are
currently caused by acl.c using text for syscache lookups and triggering
an NAMEDATALEN assertion.

E.g.: select 
has_language_privilege('smithsmithsmithsmithsmithsmithsmithsmithsmithsmithsmithsmithsmith',
 smith');

thanks,
andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [sqlsmith] Failed assertion in BecomeLockGroupLeader

2016-04-29 Thread Andreas Seltenreich
Hi,

tonight's sqlsmith run yielded another core dump:

TRAP: FailedAssertion("!(MyProc->lockGroupLeader == ((void *)0))", File: 
"proc.c", Line: 1787)

I couldn't identifiy a query for it though: debug_query_string is empty.
Additionally, the offending query was not reported in the error context
as it typically is for non-parallel executor crashes.

regards,
Andreas

GNU gdb (Debian 7.7.1+dfsg-5) 7.7.1
Core was generated by `postgres: bgworker: parallel worker for PID 4706 
'.
Program terminated with signal SIGABRT, Aborted.
#0  0x7ff1bda16067 in __GI_raise (sig=sig@entry=6)
at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56  ../nptl/sysdeps/unix/sysv/linux/raise.c: Datei oder Verzeichnis nicht 
gefunden.
#0  0x7ff1bda16067 in __GI_raise (sig=sig@entry=6) at 
../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x7ff1bda17448 in __GI_abort () at abort.c:89
#2  0x007eaa11 in ExceptionalCondition 
(conditionName=conditionName@entry=0x988318 "!(MyProc->lockGroupLeader == 
((void *)0))", errorType=errorType@entry=0x82a45d "FailedAssertion", 
fileName=fileName@entry=0x8760e5 "proc.c", lineNumber=lineNumber@entry=1787) at 
assert.c:54
#3  0x006e3e7b in BecomeLockGroupLeader () at proc.c:1787
#4  0x004e6a59 in LaunchParallelWorkers (pcxt=pcxt@entry=0x1db05c8) at 
parallel.c:437
#5  0x005ef2d7 in ExecGather (node=node@entry=0x1d9d0b8) at 
nodeGather.c:168
#6  0x005dd788 in ExecProcNode (node=node@entry=0x1d9d0b8) at 
execProcnode.c:515
#7  0x005d999f in ExecutePlan (dest=0x1d7d310, direction=, numberTuples=0, sendTuples=, operation=CMD_SELECT, 
use_parallel_mode=, planstate=0x1d9d0b8, estate=0x1d9c858) at 
execMain.c:1567
#8  standard_ExecutorRun (queryDesc=0x1db0080, direction=, 
count=0) at execMain.c:338
#9  0x005dcb3f in ParallelQueryMain (seg=, 
toc=0x7ff1be507000) at execParallel.c:716
#10 0x004e608b in ParallelWorkerMain (main_arg=) at 
parallel.c:1033
#11 0x00683a42 in StartBackgroundWorker () at bgworker.c:726
#12 0x0068eb82 in do_start_bgworker (rw=0x1d24ec0) at postmaster.c:5531
#13 maybe_start_bgworker () at postmaster.c:5706
#14 0x0046c993 in ServerLoop () at postmaster.c:1762
#15 0x006909fe in PostmasterMain (argc=argc@entry=3, 
argv=argv@entry=0x1cfa560) at postmaster.c:1298
#16 0x0046d5ed in main (argc=3, argv=0x1cfa560) at main.c:228
(gdb) bt full
#3  0x006e3e7b in BecomeLockGroupLeader () at proc.c:1787
leader_lwlock = 
#4  0x004e6a59 in LaunchParallelWorkers (pcxt=pcxt@entry=0x1db05c8) at 
parallel.c:437
oldcontext = 0x1d9ced0
worker = {
  bgw_name = 
"\220\a\333\001\000\000\000\000\370\340L\276\361\177\000\000\370\373\025\264\361\177\000\000P\a\333\001\000\000\000\000X\310\331\001\000\000\000\000\004\000\000\000\000\000\000\000\310\005\333\001\000\000\000\000\233\222J\000\000\000\000",
 
  bgw_flags = 1, 
  bgw_start_time = BgWorkerStart_PostmasterStart, 
  bgw_restart_time = 1, 
  bgw_main = 0x0, 
  bgw_library_name = 
"\000\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000u\222J\000\000\000\000\000\350\336\331\001\000\000\000\000\360\260a\000\000\000\000\000\200\315]\000\000\000\000\000\300\330\367\217\375\177\000\000h\205\333\001\000\000\000",
 
  bgw_function_name = 
"`\346\327\001\000\000\000\000\004\000\000\000\000\000\000\000\360\260a\000\000\000\000\000\200\315]\000\000\000\000\000\300\330\367\217\375\177\000\000h\317\331\001\000\000\000\000\210\325\327\001\000\000\000\000\004\000\000\000\000\000\000",
 
  bgw_main_arg = 6402288, 
  bgw_extra = 
"X\310\331\001\000\000\000\000\000\000\000\000\000\000\000\000\060\233\333\001\000\000\000\000\001\000\000\000\000\000\000\000\004\000\000\000\000\000\000\000l\321]\000\000\000\000\000\000\000\000\000\000\000\000\000H-\334\001\000\000\000\000h\317\331\001\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\274\341M\276\361\177\000\000\310\005\333\001\000\000\000\000\004\000\000\000\000\000\000\000\310\005\333\001\000\000\000\000\000\000\000\000\000\000\000",
 
  bgw_notify_pid = 4
}
i = 
any_registrations_failed = 0 '\000'
#5  0x005ef2d7 in ExecGather (node=node@entry=0x1d9d0b8) at 
nodeGather.c:168
pcxt = 0x1db05c8
estate = 
gather = 0x1d7d440
fslot = 0x1d9ced0
i = 
resultSlot = 
isDone = ExprSingleResult
econtext = 
#6  0x005dd788 in ExecProcNode (node=node@entry=0x1d9d0b8) at 
execProcnode.c:515
result = 
__func__ = "ExecProcNode"
#7  0x005d999f in ExecutePlan (dest=0x1d7d310, direction=, numberTuples=0, sendTuples=, operation=CMD_SELECT, 
use_parallel_mode=, planstate=0x1d9d0b8, estate=0x1d9c858) at 
execMain.c:1567
slot = 
current_tuple_count = 0
#8  standard_ExecutorRun (queryDesc=0x1db0080, direction=, 

[HACKERS] [sqlsmith] Crash in apply_projection_to_path

2016-04-28 Thread Andreas Seltenreich
Hi,

the following query against the regression database crashes master as of
23b09e15.

select 1 from depth0 inner join depth1 on (depth0.c = depth1.c)
 where depth0.c @@ depth1.c limit 1;

regards,
Andreas

Program terminated with signal SIGSEGV, Segmentation fault.
#0  create_projection_path (root=root@entry=0x6918e60,
rel=0x7f7f7f7f7f7f7f7f, subpath=0x69d6de8, target=target@entry=0x69d7428)
at pathnode.c:2160
2160pathnode->path.parallel_safe = rel->consider_parallel &&
#0  create_projection_path (root=root@entry=0x6918e60, rel=0x7f7f7f7f7f7f7f7f, 
subpath=0x69d6de8, target=target@entry=0x69d7428) at pathnode.c:2160
#1  0x0067841e in apply_projection_to_path (root=0x6918e60, 
rel=0x69d5e18, path=0x69d6ee0, target=0x69d7428) at pathnode.c:2251
#2  0x0065ff20 in grouping_planner (root=0x22e1850, 
root@entry=0x6918e60, inheritance_update=56 '8', inheritance_update@entry=0 
'\000', tuple_fraction=2615.2579411764709, tuple_fraction@entry=0) at 
planner.c:1737
#3  0x00661f44 in subquery_planner (glob=glob@entry=0x6918dc8, 
parse=parse@entry=0x235e3b0, parent_root=parent_root@entry=0x0, 
hasRecursion=hasRecursion@entry=0 '\000', 
tuple_fraction=tuple_fraction@entry=0) at planner.c:758
#4  0x00662e2b in standard_planner (parse=0x235e3b0, cursorOptions=256, 
boundParams=0x0) at planner.c:307
#5  0x006f40ed in pg_plan_query (querytree=0x235e3b0, 
cursorOptions=256, boundParams=0x0) at postgres.c:798
#6  0x006f41e4 in pg_plan_queries (querytrees=, 
cursorOptions=256, boundParams=0x0) at postgres.c:857
#7  0x006f5c93 in exec_simple_query (query_string=) at 
postgres.c:1022


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [PATCH] Improve spinlock inline assembly for x86.

2016-01-17 Thread Andreas Seltenreich
Hi,

I'm currently experimenting with just-in-time compilation using libfirm.
While discussing issues with its developers, it was pointed out to me
that our spinlock inline assembly is less than optimal.  Attached is a
patch that addresses this.

,
| Remove the LOCK prefix from the XCHG instruction.  Locking is implicit
| with XCHG and the prefix wastes a byte.  Also remove the "cc" register
| from the clobber list as the XCHG instruction does not modify any flags.
| 
| Reported by Christoph Mallon.
`

regards,
Andreas

>From c836b4f3e0b60d070481d4061e6fe0ffbe488495 Mon Sep 17 00:00:00 2001
From: Andreas Seltenreich <seltenre...@gmx.de>
Date: Sun, 17 Jan 2016 11:51:53 +0100
Subject: [PATCH] Improve spinlock inline assembly for x86.

Remove the LOCK prefix from the XCHG instruction.  Locking is implicit
with XCHG and the prefix wastes a byte.  Also remove the "cc" register
from the clobber list as the xchg instruction does not modify any
flags.

Reported by Christoph Mallon.
---
 src/include/storage/s_lock.h | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/src/include/storage/s_lock.h b/src/include/storage/s_lock.h
index 8b240cd..933bb76 100644
--- a/src/include/storage/s_lock.h
+++ b/src/include/storage/s_lock.h
@@ -158,7 +158,6 @@ tas(volatile slock_t *lock)
 	__asm__ __volatile__(
 		"	cmpb	$0,%1	\n"
 		"	jne		1f		\n"
-		"	lock			\n"
 		"	xchgb	%0,%1	\n"
 		"1: \n"
 :		"+q"(_res), "+m"(*lock)
@@ -226,11 +225,10 @@ tas(volatile slock_t *lock)
 	register slock_t _res = 1;
 
 	__asm__ __volatile__(
-		"	lock			\n"
 		"	xchgb	%0,%1	\n"
 :		"+q"(_res), "+m"(*lock)
 :		/* no inputs */
-:		"memory", "cc");
+:		"memory");
 	return (int) _res;
 }
 
-- 
2.1.4


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [PATCH] Add STRICT to some regression test C functions.

2016-01-08 Thread Andreas Seltenreich
Hi,

some of the C functions in the regression test DB readily crash when
passing NULL input values.  The regression tests themselves do not pass
NULL values to them, but when the regression database is used as a basis
for fuzz testing, they cause a lot of noise.  Maybe someone can sneak
this patch in?

Thanks,
Andreas
>From 2711471d48c2e58809c2f4617d36352c5903bbd9 Mon Sep 17 00:00:00 2001
From: Andreas Seltenreich <seltenre...@gmx.de>
Date: Sun, 3 Jan 2016 19:05:06 +0100
Subject: [PATCH] Add STRICT to some regression test C functions.

These functions readily crash when passing NULL input values.  The
regression tests themselves do not pass NULL values to them, but when
the regression database is used as a basis for fuzz testing, they
cause a lot of noise.
---
 src/test/regress/input/create_function_2.source  | 10 +-
 src/test/regress/output/create_function_2.source | 10 +-
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/src/test/regress/input/create_function_2.source b/src/test/regress/input/create_function_2.source
index 1b013ae..486803d 100644
--- a/src/test/regress/input/create_function_2.source
+++ b/src/test/regress/input/create_function_2.source
@@ -74,27 +74,27 @@ CREATE FUNCTION user_relns()
 CREATE FUNCTION pt_in_widget(point, widget)
RETURNS bool
AS '@libdir@/regress@DLSUFFIX@'
-   LANGUAGE C;
+   LANGUAGE C STRICT;
 
 CREATE FUNCTION overpaid(emp)
RETURNS bool
AS '@libdir@/regress@DLSUFFIX@'
-   LANGUAGE C;
+   LANGUAGE C STRICT;
 
 CREATE FUNCTION boxarea(box)
RETURNS float8
AS '@libdir@/regress@DLSUFFIX@'
-   LANGUAGE C;
+   LANGUAGE C STRICT;
 
 CREATE FUNCTION interpt_pp(path, path)
RETURNS point
AS '@libdir@/regress@DLSUFFIX@'
-   LANGUAGE C;
+   LANGUAGE C STRICT;
 
 CREATE FUNCTION reverse_name(name)
RETURNS name
AS '@libdir@/regress@DLSUFFIX@'
-   LANGUAGE C;
+   LANGUAGE C STRICT;
 
 CREATE FUNCTION oldstyle_length(int4, text)
RETURNS int4
diff --git a/src/test/regress/output/create_function_2.source b/src/test/regress/output/create_function_2.source
index 98e1c29..bdfc5be 100644
--- a/src/test/regress/output/create_function_2.source
+++ b/src/test/regress/output/create_function_2.source
@@ -58,23 +58,23 @@ CREATE FUNCTION user_relns()
 CREATE FUNCTION pt_in_widget(point, widget)
RETURNS bool
AS '@libdir@/regress@DLSUFFIX@'
-   LANGUAGE C;
+   LANGUAGE C STRICT;
 CREATE FUNCTION overpaid(emp)
RETURNS bool
AS '@libdir@/regress@DLSUFFIX@'
-   LANGUAGE C;
+   LANGUAGE C STRICT;
 CREATE FUNCTION boxarea(box)
RETURNS float8
AS '@libdir@/regress@DLSUFFIX@'
-   LANGUAGE C;
+   LANGUAGE C STRICT;
 CREATE FUNCTION interpt_pp(path, path)
RETURNS point
AS '@libdir@/regress@DLSUFFIX@'
-   LANGUAGE C;
+   LANGUAGE C STRICT;
 CREATE FUNCTION reverse_name(name)
RETURNS name
AS '@libdir@/regress@DLSUFFIX@'
-   LANGUAGE C;
+   LANGUAGE C STRICT;
 CREATE FUNCTION oldstyle_length(int4, text)
RETURNS int4
AS '@libdir@/regress@DLSUFFIX@'
-- 
2.1.4


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade in 9.5 broken for adminpack

2016-01-03 Thread Andreas Seltenreich
Bruce Momjian writes:

> On Thu, Apr 16, 2015 at 11:29:07PM -0700, Jeff Janes wrote:
>> Of course after sending that it became obvious.  The C function is not 
>> getting
>> called because the SQL function is marked as being strict, yet is called with
>> NULL arguments.
>> 
>> Trivial patch attached to unset strict flag in pg_proc.h.
>> 
>> But  CATALOG_VERSION_NO probably needs another bump as well.
>
> Patch applied and catversion bumped.  Thanks.

Shouldn't there be some validation of arguments now that the function is
no longer marked strict?  Currently, unprivileged users can crash the
server calling binary_upgrade_create_empty_extension with null
arguments.  Found using sqlsmith.

regards,
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Failing assertions in spgtextproc.c

2016-01-02 Thread Andreas Seltenreich
Tom Lane writes:

> Andreas Seltenreich <seltenre...@gmx.de> writes:
>> TRAP: FailedAssertion([...], File: "spgtextproc.c", Line: 424)
>> TRAP: FailedAssertion([...], File: "spgtextproc.c", Line: 564)
>
> Can you show us the definition of the index that's causing this,
> and some samples of the data you're putting in it?

Here's a recipe for triggering the former:

create table t(c text);
create index on t using spgist(c);
insert into t select '' from generate_series(1,1);
set enable_seqscan to off; select count(1) from t;

I still think it's just the assertions being too strict.

regards,
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [sqlsmith] Failing assertions in spgtextproc.c

2015-12-18 Thread Andreas Seltenreich
I do see two assertions in spgtextproc.c fail on occasion when testing
with sqlsmith:

TRAP: FailedAssertion([...], File: "spgtextproc.c", Line: 424)
TRAP: FailedAssertion([...], File: "spgtextproc.c", Line: 564)

I can't reproduce it reliably but looking at the coredumps, the failing
part of the expression is always

in->level == 0 && DatumGetPointer(in->reconstructedValue) == NULL

In all of the dumps I looked at, in->reconstructedValue contains a
zero-length text instead of the asserted NULL, and the tuples fed to
leaf_consistent()/inner_consistent() look like the one below.

,
| (gdb) p *in
| $1 = {scankeys = 0x60a3ee0, nkeys = 1, reconstructedValue = 101373680, level 
= 0, 
|   returnData = 1 '\001', allTheSame = 1 '\001', hasPrefix = 0 '\000', 
prefixDatum = 0, nNodes = 8, 
|   nodeLabels = 0x37b6768}
| (gdb) x ((text *)in->reconstructedValue)->vl_len_
| 0x60ad6f0:0x0010
| (gdb) p *(text *)in->scankeys[0]->sk_argument
| $2 = {vl_len_ = "0\000\000", vl_dat = 0x855950c "sqlsmith~", '\177' , "\020 "}
| (gdb) p in->nodeLabels[0]
| $3 = 65535
`

Maybe these assertions are just too strict?  I don't see the code
misbehaving when relaxing them to

reconstrValue != NULL && VARSIZE_ANY_EXHDR(reconstrValue) == in->level
  || in->level == 0

regards,
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Failing assertions in spgtextproc.c

2015-12-18 Thread Andreas Seltenreich
Peter Geoghegan writes:

> Can you do this?:
>
> (gdb) p debug_query_string

output below.  Since sqlsmith ist no longer restricted to read-only
statements, the chances for reproduction are low :-/.

select
  pg_catalog.pg_stat_get_buf_written_backend() as c0,
  subq_1.c0 as c1,
  subq_1.c0 as c2,
  subq_1.c0 as c3
from
  (select
(select ordinal_position from information_schema.parameters limit 1 
offset 12)
 as c0,
ref_2.t as c1
  from
public.radix_text_tbl as ref_2
  inner join pg_catalog.pg_stat_activity as ref_3
  on (ref_2.t = ref_3.application_name )
  where ref_2.t @@ cast(coalesce(ref_2.t, ref_3.client_hostname) as text)
  limit 111) as subq_1,
  lateral (select
subq_1.c0 as c0,
subq_2.c2 as c1,
56 as c2,
cast(coalesce(cast(coalesce((select pop from public.real_city limit 1 
offset 34)
, subq_1.c0) as integer), subq_2.c0) as integer) as c3,
74 as c4,
(select unique1 from public.onek2 limit 1 offset 17)
 as c5
  from
(select
  (select ordinal_position from information_schema.parameters limit 
1 offset 27)
 as c0,
  sample_2.umoptions as c1,
  sample_2.umserver as c2
from
  pg_catalog.pg_user_mapping as sample_2 tablesample system (6.2)
where 49 is NULL) as subq_2
  where cast(coalesce(subq_1.c0, subq_2.c0) as integer) is not NULL
  limit 105) as subq_3
where ((select x from public.tt0 limit 1 offset 12)
 <= subq_3.c0)
  and (subq_3.c4 <= 30);


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Failed to generate plan on lateral subqueries

2015-12-14 Thread Andreas Seltenreich
David Fetter writes:

> On Mon, Dec 14, 2015 at 03:06:18PM +0900, Michael Paquier wrote:
>> On Sun, Dec 13, 2015 at 10:14 AM, Andreas Seltenreich wrote:
>> > https://github.com/anse1/sqlsmith
>> 
>> I am in awe regarding this stuff, which has caught many bugs
>> already, it is a bit sad that it is released under the GPL license
>> preventing a potential integration into PostgreSQL core to
>> strengthen the test infrastructure,
>
> I suspect that a polite request to the Andreas that he change to a
> PostgreSQL-compatible license like one of (TPL, BSD2, MIT) might
> handle this part.

It probably would, but I never thought core integration would be a
desirable thing.  Like Csmith, SQLsmith is intended to be
product-agnostic.  That's not yet the case, but it's still on the
roadmap.

Further, the license shouldn't interfere with institutionalizing
sqlsmith somewhere to automatically send mails on failed assertions or
first sight of an error message.  Or providing a web interface around
the logging database of an instance where one can drill down on logged
errors/queries.

>> and it is even sadder to see a direct dependency with libpqxx :(
>
> I suspect this part is a SMOP, but I'm not quite sure what S might
> constitute in this case.

sqlsmith uses three connections in total:

 1. Connection for sending the generated queries to the DUT
 2. Connection for retrieving the schema from the DUT
 3. Logging connection

1 is trivial to change. 1+2 are intendend to be pluggable for supporting
other products.  Switching 1 to libpq is even on the todo list, as
libpqxx doesn't allow access to the SQLSTATE (There is a four year old
bug report about this by Andres).

2. Is more effort to change to libpq, as actual data is passed through
that connection.

3. Was intended to stay libpqxx-only even when testing other products.

Btw, I'm glad C++11 was no longer considered a blocker this thime :-)

regards,
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Failed to generate plan on lateral subqueries

2015-12-12 Thread Andreas Seltenreich
Greg Stark writes:

> There may be other errors that would be surprising for Tom or Robert. What
> I did with the string argument fuzzer was printed  counts for each sqlstate
> for the errors and watched for errors that only occurred occasionally or
> didn't make sense to me.
>
> Also, do you have any timeouts?

I currently set statement_timeout to 1s to avoid wasting time letting
postgres crunch numbers.  Less than 0.5% of the queries run into this
timeout.

> Do you have any stats on how long these queries are taking to plan?
> What's the longest query to plan you've found?

No, I'm currently not logging timing spects.  The schema I let the
instances log into is part of the repo[1].

> Do you have coverage data for the corpus?

I do have some older numbers for line coverage from before the recent
grammar extension:

| revision | overall | parser |
|--+-+|
| a4c1989  |26.0 |   20.4 |

regards,
Andreas

Footnotes: 
[1]  https://github.com/anse1/sqlsmith/blob/master/log.sql


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Failed to generate plan on lateral subqueries

2015-12-12 Thread Andreas Seltenreich
Greg Stark writes:

> On Sat, Dec 12, 2015 at 8:30 PM, Andreas Seltenreich <seltenre...@gmx.de> 
> wrote:
> When you hit the timeout is this implemented in your fuzzer or using
> statement_timeout? If the former, can you add a statement_timeout of
> just short of the timeout in the fuzzer and find cases where the
> planner might not be calling CHECK_FOR_INTERRUPTS frequently enough?

It's the latter.  I don't think I can add a client-side timeout into
sqlsmith elegantly.  IMHO it's better to write another test tool that
just re-runs the queries that were logged with a timeout by sqlsmith and
investigates their timeout-behavior more closely.

>> I do have some older numbers for line coverage from before the recent 
>> grammar extension:
>
> If you have a corpus of queries in a simple format it would be pretty
> convenient to add them in a regression test and then run make coverage
> to get html reports.

Hmm, I thought I found a workflow that would yield sqlsmith's coverage
without integrating it into the regession tests.  This is what I did:

make install
initdb /tmp/gcov
pg_ctl -D /tmp/gcov start
make installcheck
pg_ctl -D /tmp/gcov stop
make coverage-clean
pg_ctl -D /tmp/gcov start
sqlsmith --target='dbname=regression' --max-queries=1
pg_ctl -D /tmp/gcov stop
make coverage-html

It seems to yield a pure sqlsmith-only coverage report, as a "make
coverage-html" before the "make coverage-clean" yields a report with
much higher score.  Maybe there are drawbacks to the workflow you are
suggesting?  I just re-did it with the current sqlsmith code, and it's
up by 25% compared to the latest tested revision:

| revision | overall | parser |
|--+-+|
| a4c1989  |26.0 |   20.4 |
| ee099e6  |33.8 |   25.8 |

I also put the report here, in case someone wants to look at certain
details, or make suggestions into what directions to best extend the
grammar to increase coverage.

http://ansel.ydns.eu/~andreas/coverage/
http://ansel.ydns.eu/~andreas/gcov.tar.xz

> Did you publish the source already? I haven't been following all
> along, sorry if these are all answered questions.

It's not had a proper release yet, but the code is available via github
in all its rapid-prototypesque glory:

https://github.com/anse1/sqlsmith

regards,
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Failed to generate plan on lateral subqueries

2015-12-11 Thread Andreas Seltenreich
Tom Lane writes:
> [2. transitive-lateral-fixes-2.patch ]
> [2. remove-lateraljoininfo-2.patch ]

They seem to have fixed the issue for good now.  No errors have been
logged for 2e8 queries since applying the first patch.  (The second one
was applied later and didn't get as much exposure.)  I guess that means
I have to go back to extending the grammar again :-).

regards
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Failed to generate plan on lateral subqueries

2015-12-11 Thread Andreas Seltenreich
Peter Geoghegan writes:

> On Sun, Dec 6, 2015 at 9:52 AM, Andreas Seltenreich <seltenre...@gmx.de> 
> wrote:
>> I've added new grammar rules to sqlsmith and improved some older ones.
>
> Could you possibly teach sqlsmith about INSERT ... ON CONFLICT DO
> UPDATE/IGNORE? I think that that could be very helpful, especially if
> it could be done in advance of any stable release of 9.5.

In summary, it can't be added ad-hoc, but might still happen in advance
of the release of 9.5.

Adding upsert needs significiant effort because historically,
non-boolean value expression productions yield a random type.  This is
not a problem for generating queries, but it is for inserts.  Also,
sqlsmith can at the moment only generate sensible value expressions from
column references.  Generating a proper upsert would require supporting
type-constraining of productions as well as adding productions for
pulling values out of thin air (e.g., generating atomic value subselects
or calling argumentless functions).

regards,
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Failed to generate plan on lateral subqueries

2015-12-10 Thread Andreas Seltenreich
Tom Lane writes:

> [2. transitive-lateral-fixes-1.patch]

I was about to write that sqlsmith likes the patch, but after more than
10^8 ok queries the attached ones were generated.

regards,
Andreas



post-patch-errors.sql
Description: application/sql

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Failed to generate plan on lateral subqueries

2015-12-10 Thread Andreas Seltenreich
Tom Lane writes:

> Merlin Moncure  writes:
>> Aside from the functional issues, could your changes result in
>> performance regressions?
[...]
> It's a little bit harder to gauge the impact on planner speed.  The
> transitive closure calculation could be expensive in a query with many
> lateral references, but that doesn't seem likely to be common; and anyway
> we'll buy back some of that cost due to simpler tests later.  I'm
> optimistic that we'll come out ahead in HEAD/9.5 after the removal
> of LateralJoinInfo setup.  It might be roughly a wash in the back
> branches.

On the empirical side: I see a speedup of 0.4% in testing speed with the
patch applied.  It could very well be me venting the room one additional
time during the second session, resulting in the CPUs spending more time
in their opportunistic frequency range or something.

regards,
Andreas

smith=# select extract('day' from t),
   avg(generated/extract(epoch from updated - t)) as tps
from instance natural join stat
where generated > 100
  and hostname not in('dwagon','slugbug')
   -- these do stuff beside sqlsmith
  and t > now() - interval '3 days' group by 1 order by 1;
 date_part |   tps
---+--
 8 |  55.494181110456
 9 | 55.6902316869404


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Failed to generate plan on lateral subqueries

2015-12-07 Thread Andreas Seltenreich
I wrote:

> Tom Lane writes:
>> Andreas Seltenreich <seltenre...@gmx.de> writes:
>>> I've added new grammar rules to sqlsmith and improved some older ones.
>>> This was rewarded with a return of "failed to generate plan" errors.
>>
>> I believe I've dealt with these cases now.  Thanks for the report!
>
> I no longer see "failed to build any n-way joins" after pulling, but
> there are still instances of "could not devise a query plan". Samples below.

sorry, I spoke too soon: nine of the former have been logged through the
night.  I'm attaching a larger set of sample queries this time in case
that there are still multiple causes for the observed errors.

regards,
Andreas

-- select query||';' from error e
--  where (msg ~~ 'ERROR:  failed to build any%')
--  and e.t > now() - interval '8 hours'
--  order by length(query) asc;

select  
  ref_0.collation_schema as c0
from 
  (select  
  sample_0.is_supported as c0
from 
  information_schema.sql_packages as sample_0 tablesample system (0.8) 
where 25 > 15
fetch first 60 rows only) as subq_0
right join information_schema.collations as ref_0
on (subq_0.c0 = ref_0.collation_catalog ),
  lateral (select  
ref_1.f2 as c0, 
ref_1.f1 as c1, 
ref_0.collation_name as c2, 
ref_0.pad_attribute as c3, 
subq_0.c0 as c4, 
subq_0.c0 as c5, 
subq_0.c0 as c6
  from 
public.func_index_heap as ref_1
  where ref_1.f2 ~ ref_1.f1
  fetch first 170 rows only) as subq_1,
  lateral (select  
sample_8.b as c0, 
subq_1.c1 as c1, 
sample_8.a as c2
  from 
public.clstr_tst as sample_8 tablesample system (8.8) 
left join public.f_star as sample_9 tablesample bernoulli (7.5) 
on (sample_8.a = sample_9.aa )
  left join public.rules_log as sample_10 tablesample bernoulli (7) 
  on (sample_8.b = sample_10.f1 )
  where sample_10.tag <> sample_8.c
  fetch first 126 rows only) as subq_2
where (ref_0.collation_name is NULL) 
  or (subq_2.c1 !~ subq_2.c1)
fetch first 56 rows only;
select  
  subq_16.c4 as c0
from 
  public.rule_and_refint_t1 as sample_18 tablesample system (9.7) 
left join (select  
sample_20.xx as c0, 
sample_20.xx as c1
  from 
public.inhf as sample_20 tablesample bernoulli (1) 
  where sample_20.xx is not NULL) as subq_11
  inner join public.main_table as sample_25 tablesample system (1.2) 
inner join (select  
  sample_26.b as c0, 
  subq_15.c1 as c1, 
  subq_15.c0 as c2, 
  32 as c3, 
  subq_15.c0 as c4
from 
  public.dropcolumn as sample_26 tablesample system (7.6) ,
  lateral (select  
sample_27.t as c0, 
sample_26.b as c1
  from 
public.radix_text_tbl as sample_27 tablesample system (4.6) 
  where (sample_26.b is not NULL) 
and (sample_27.t ~~ sample_27.t)) as subq_15
where subq_15.c0 !~ subq_15.c0) as subq_16
on (sample_25.b = subq_16.c0 )
  on (subq_11.c0 = subq_16.c2 )
on (sample_18.id1a = sample_25.a ),
  lateral (select  
subq_16.c3 as c0, 
subq_17.c1 as c1, 
sample_18.id1a as c2, 
coalesce(subq_16.c0, subq_16.c1) as c3
  from 
public.rtest_vcomp as ref_21,
lateral (select  
  sample_28.a as c0, 
  ref_21.size_in_cm as c1, 
  subq_11.c0 as c2, 
  sample_18.id1a as c3
from 
  public.tab1 as sample_28 tablesample system (8.2) 
where subq_11.c1 <= sample_28.b
fetch first 111 rows only) as subq_17
  where ref_21.size_in_cm is NULL
  fetch first 101 rows only) as subq_18
where subq_11.c0 ~>=~ subq_11.c1
fetch first 94 rows only;
select  
  subq_33.c0 as c0, 
  subq_33.c0 as c1
from 
  (select  
sample_95.y as c0, 
sample_95.z as c1
  from 
public.check2_tbl as sample_95 tablesample system (5.1) 
  where sample_95.y = sample_95.y) as subq_28
  right join pg_catalog.pg_user as ref_101
  on (subq_28.c0 = ref_101.passwd )
left join pg_catalog.pg_opfamily as sample_96 tablesample bernoulli (9.8) 
on (ref_101.usesysid = sample_96.opfmethod ),
  lateral (select  
subq_31.c0 as c0, 
sample_97.e as c1, 
sample_97.e as c2
  from 
public.dropcolumnchild as sample_97 tablesample system (2.4) ,
lateral (select  
  subq_30.c0 as c0
from 
  public.random_tbl as sample_98 tablesample system (8.4) ,
  lateral (select  
subq_28.c1 as c0, 
 

Re: [HACKERS] [sqlsmith] Failed to generate plan on lateral subqueries

2015-12-07 Thread Andreas Seltenreich
Tom Lane writes:

> Andreas Seltenreich <seltenre...@gmx.de> writes:
>> I've added new grammar rules to sqlsmith and improved some older ones.
>> This was rewarded with a return of "failed to generate plan" errors.
>
> I believe I've dealt with these cases now.  Thanks for the report!

I no longer see "failed to build any n-way joins" after pulling, but
there are still instances of "could not devise a query plan". Samples below.

regards,
Andreas

select
  ref_1.aa as c0,
  subq_1.c1 as c1,
  coalesce(ref_1.class, ref_1.class) as c2,
  subq_1.c0 as c3
from
  (select
  subq_0.c1 as c0,
  coalesce(sample_0.a, sample_1.i) as c1
from
  public.rtest_t9 as sample_0 tablesample bernoulli (5.6)
inner join public.iportaltest as sample_1 tablesample bernoulli 
(9.8)
on (sample_0.a = sample_1.i ),
  lateral (select
sample_1.d as c0,
ref_0.a as c1,
sample_1.p as c2,
ref_0.a as c3,
ref_0.a as c4,
sample_0.b as c5,
sample_1.i as c6
  from
public.rtest_view2 as ref_0
  where sample_0.b = sample_0.b
  fetch first 93 rows only) as subq_0
where sample_0.b ~<=~ sample_0.b) as subq_1
right join public.e_star as ref_1
on (subq_1.c0 = ref_1.aa )
where ref_1.cc < ref_1.cc
fetch first 59 rows only;

select
  sample_69.tmpllibrary as c0,
  coalesce(sample_69.tmplname, sample_69.tmplname) as c1,
  subq_33.c0 as c2
from
  (select
  coalesce(ref_53.provider, sample_68.typdefault) as c0
from
  pg_catalog.pg_type as sample_68 tablesample bernoulli (6.9)
inner join pg_catalog.pg_shseclabel as ref_53
on (sample_68.typowner = ref_53.objoid ),
  lateral (select
sample_68.typcategory as c0,
ref_54.speaker as c1,
ref_54.speaker as c2
  from
public.test_range_excl as ref_54
  where (ref_53.label >= ref_53.provider)
and (ref_53.label !~* ref_53.provider)
  fetch first 143 rows only) as subq_32
where ref_53.label ~>~ ref_53.label) as subq_33
right join pg_catalog.pg_pltemplate as sample_69 tablesample bernoulli (9.8)
on (subq_33.c0 = sample_69.tmplhandler )
where sample_69.tmplvalidator ~ subq_33.c0
fetch first 131 rows only;


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [sqlsmith] Failed to generate plan on lateral subqueries

2015-12-06 Thread Andreas Seltenreich
Hi,

I've added new grammar rules to sqlsmith and improved some older ones.
This was rewarded with a return of "failed to generate plan" errors.
The failing queries all contain a lateral subquery.  The shortest of the
failing queries are below.  They were run against the regression db of
master as of db07236.

regards,
Andreas

smith=# select msg, query from error where
   (firstline(msg) ~~ 'ERROR:  failed to build any%'
   or firstline(msg) ~~ 'ERROR:  could not devise a query plan%')
  and t > now() - interval '1 day' order by length(query) asc limit 3;

ERROR:  failed to build any 8-way joins
select
  ref_96.foreign_table_schema as c0,
  sample_87.is_supported as c1
from
  information_schema.sql_packages as sample_87 tablesample system (0.2)
right join information_schema._pg_foreign_tables as ref_96
on (sample_87.feature_id = ref_96.foreign_table_catalog ),
  lateral (select
sample_87.is_verified_by as c0,
ref_97.indexed_col as c1,
coalesce(sample_87.feature_id, ref_96.foreign_server_name) as c2,
4 as c3
  from
public.comment_test as ref_97
  where ref_97.id ~>~ ref_97.indexed_col
  fetch first 73 rows only) as subq_33
where ref_96.foreign_table_name ~~ subq_33.c1

ERROR:  could not devise a query plan for the given query
select
  subq_43.c0 as c0
from
  (select
  ref_181.installed as c0
from
  pg_catalog.pg_available_extension_versions as ref_181,
  lateral (select
ref_181.name as c0,
ref_181.installed as c1
  from
pg_catalog.pg_conversion as ref_182
  where ref_182.conname ~~* ref_181.version
  fetch first 98 rows only) as subq_42
where (subq_42.c0 is not NULL)
  or (subq_42.c1 is NULL)) as subq_43
right join pg_catalog.pg_language as sample_177 tablesample system (2.8)
on (subq_43.c0 = sample_177.lanispl )
where sample_177.lanowner < sample_177.lanvalidator

ERROR:  failed to build any 5-way joins
select
  ref_239.id2 as c0,
  40 as c1,
  ref_239.id2 as c2,
  ref_238.aa as c3
from
  public.tt5 as sample_289 tablesample system (8.1)
inner join information_schema.element_types as ref_237
on (sample_289.x = ref_237.character_maximum_length )
  left join public.b as ref_238
  on (ref_237.character_maximum_length = ref_238.aa )
left join public.num_exp_mul as ref_239
on (ref_237.numeric_precision_radix = ref_239.id1 ),
  lateral (select
sample_290.b as c0,
sample_289.y as c1,
ref_239.id2 as c2
  from
public.rtest_t8 as sample_290 tablesample bernoulli (4.6)
  where (sample_290.b > ref_238.bb)
and (sample_289.y > ref_239.expected)
  fetch first 91 rows only) as subq_64
where (subq_64.c1 > sample_289.y)
  and (sample_289.y = ref_239.expected)
fetch first 133 rows only


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.3.9 and pg_multixact corruption

2015-09-25 Thread Andreas Seltenreich
[ adding Bjorn Munch to Cc ]

Jim Nasby writes:
> On 9/20/15 9:23 AM, Christoph Berg wrote:
>> a short update here: the customer updated the compiler to a newer
>> version, is now compiling using -O2 instead of -O3, and the code
>> generated now looks sane, so this turned out to be a compiler issue.
>> (Though it's unclear if the upgrade fixed it, or the different -O
>> level.)
>
> Do we officially not support anything > -O2? If so it'd be nice if
> configure threw at least a warning (if not an error that you had to
> explicitly over-ride).

At least the solaris binaries distributed via postgresql.org[1] have
been compiled with -xO3 according to pg_config.  And their code for
multixact.c looks inconspicuous.  To recap the data points:

| compiler  | flags | multixact.o |
|---+---+-|
| Sun C 5.12 SunOS_sparc Patch 148917-07 2013/10/18 | -xO3  | bad |
| Sun C 5.13 SunOS_Sparc 2014/10/20 | -xO2  | good|
| Sun C 5.8 Patch 121015-04 2007/01/10  | -xO3  | good|

regards,
Andreas

Footnotes: 
[1]  http://www.postgresql.org/download/solaris/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


  1   2   >