Constraint's NO INHERIT option is ignored in CREATE TABLE LIKE statement
Hi hackers,

My colleague Chris Travers discovered something that looks like a bug. Let's say we have a table with a constraint that is declared as NO INHERIT:

    CREATE TABLE test (
        x INT CHECK (x > 0) NO INHERIT
    );

    \d test
                    Table "public.test"
     Column |  Type   | Collation | Nullable | Default
    --------+---------+-----------+----------+---------
     x      | integer |           |          |
    Check constraints:
        "test_x_check1" CHECK (x > 0) NO INHERIT

Now when we want to make a copy of the table structure into a new table, the `NO INHERIT` option is ignored:

    CREATE TABLE test2 (LIKE test INCLUDING CONSTRAINTS);

    \d test2
                    Table "public.test2"
     Column |  Type   | Collation | Nullable | Default
    --------+---------+-----------+----------+---------
     x      | integer |           |          |
    Check constraints:
        "test_x_check1" CHECK (x > 0)

Is this a bug or expected behaviour? Just in case, I've attached a patch that fixes this.

Regards,
Ildar

diff --git a/src/backend/parser/parse_utilcmd.c b/src/backend/parser/parse_utilcmd.c
index 2406ca7a5d..d75ec6766a 100644
--- a/src/backend/parser/parse_utilcmd.c
+++ b/src/backend/parser/parse_utilcmd.c
@@ -1154,6 +1154,7 @@ transformTableLikeClause(CreateStmtContext *cxt, TableLikeClause *table_like_cla
 			n->contype = CONSTR_CHECK;
 			n->location = -1;
 			n->conname = pstrdup(ccname);
+			n->is_no_inherit = tupleDesc->constr->check[ccnum].ccnoinherit;
 			n->raw_expr = NULL;
 			n->cooked_expr = nodeToString(ccbin_node);
 			cxt->ckconstraints = lappend(cxt->ckconstraints, n);
Compressed pluggable storage experiments
Hi hackers,

I've been experimenting with the pluggable storage API recently and would like to share my first experience. First of all, it's great to have this API, and that the community now has the opportunity to implement alternative storage engines. There are a few applications that come to mind, and compressed storage is one of them.

Recently I've been working on a simple append-only compressed storage [1]. My first idea was to just store data in compressed 1MB blocks in a continuous file and keep a separate file for block offsets (similar to Knizhnik's CFS proposal). But then I realized that I wouldn't be able to use most of postgres' infrastructure like WAL-logging, and also wouldn't be able to implement some of the functions of TableAmRoutine (like bitmap scan or analyze). So I had to adjust the extension to utilize standard postgres 8KB blocks: compressed 1MB blocks are split into chunks and distributed among 8KB blocks. The current page layout looks like this:

    ┌──────────────┐
    │   metapage   │
    └──────────────┘
    ┌──────────────┐ ┐
    │   block 1    │ │
    ├..............┤ │ compressed 1mb block
    │   block k    │ │
    └──────────────┘ ┘
    ┌──────────────┐ ┐
    │  block k+1   │ │
    ├..............┤ │ another compressed 1mb block
    │   block m    │ │
    └──────────────┘ ┘

Inside compressed blocks there are regular postgres heap tuples.

The following is the list of things I stumbled upon while implementing the storage. Since the API has just come out, there are not many examples of pluggable storages and even fewer external extensions (I managed to find only blackhole_am by Michael Paquier, which doesn't do much). So many things I had to figure out by myself. Hopefully some of these issues have a solution that I just can't see.

1. Unlike the FDW API, in the pluggable storage API there are no routines like "begin modify table" and "end modify table", and there is no shared state between insert/update/delete calls. In the context of compressed storage that means there is no exact moment when we can finalize writes (compress, split into chunks etc.).
We can set a callback at the end of transaction, but in this case we'd have to keep the latest modifications for every table in memory until the end of transaction. As for shared state, we could also maintain some kind of key-value data structure with per-relation shared state. But that again requires memory. Because of this I have currently only implemented COPY semantics.

2. It looks like I cannot implement custom storage options. E.g. for compressed storage it makes sense to implement different compression methods (lz4, zstd etc.) and corresponding options (like compression level). But as far as I can see, storage options (like fillfactor etc.) are hardcoded and not extensible. A possible solution is to use GUCs, which would work but is not very convenient.

3. A somewhat surprising limitation is that in order to use bitmap scan the maximum number of tuples per page must not exceed 291, due to the MAX_TUPLES_PER_PAGE macro in tidbitmap.c, which is calculated based on an 8KB page size. In the case of a 1MB page this restriction feels really limiting.

4. In order to use WAL-logging, each page must start with a standard 24-byte PageHeaderData, even if it is needless for the storage itself. Not a big deal though. Another (actually documented) WAL-related limitation is that only generic WAL can be used within an extension. So unless inserts are made in bulk, it's going to require a lot of disk space to accommodate the logs and wide bandwidth for replication.

The pg_cryogen extension is still in development, so if other issues arise I'll post them here. At this point the extension already supports inserts via COPY, index and bitmap scans, vacuum (only freezing), and analyze. It uses lz4 compression, and currently I'm working on adding different compression methods. I'm also willing to work on the aforementioned issues in the API if the community verifies them as valid.

[1] https://github.com/adjust/pg_cryogen

Thanks,
Ildar
Duplicated LSN in ReorderBuffer
Hi hackers,

I believe we found a bug in logical decoding. It only occurs with casserts enabled. It was originally discovered and reproduced by Murat Kabilov and Ildus Kurbangaliev. Here is the stacktrace we've got:

#0  0x7facc66ef82f in raise () from /usr/lib/libc.so.6
#1  0x7facc66da672 in abort () from /usr/lib/libc.so.6
#2  0x00ac4ebf in ExceptionalCondition (
    conditionName=0xccdea8 "!(prev_first_lsn < cur_txn->first_lsn)",
    errorType=0xccdce4 "FailedAssertion", fileName=0xccdd38 "reorderbuffer.c",
    lineNumber=680) at assert.c:54
#3  0x008a9515 in AssertTXNLsnOrder (rb=0x25ca128) at reorderbuffer.c:680
#4  0x008a900f in ReorderBufferTXNByXid (rb=0x25ca128, xid=65609, create=true,
    is_new=0x0, lsn=211590864, create_as_top=true) at reorderbuffer.c:559
#5  0x008abf0d in ReorderBufferAddNewTupleCids (rb=0x25ca128, xid=65609,
    lsn=211590864, node=..., tid=..., cmin=0, cmax=4294967295,
    combocid=4294967295) at reorderbuffer.c:2098
#6  0x008b096b in SnapBuildProcessNewCid (builder=0x25d0158, xid=65610,
    lsn=211590864, xlrec=0x25d60b8) at snapbuild.c:781
#7  0x0089d01c in DecodeHeap2Op (ctx=0x25ba0b8, buf=0x7ffd0e294da0) at decode.c:382
#8  0x0089c8ca in LogicalDecodingProcessRecord (ctx=0x25ba0b8, record=0x25ba378) at decode.c:125
#9  0x008a124c in DecodingContextFindStartpoint (ctx=0x25ba0b8) at logical.c:492
#10 0x008b9c3d in CreateReplicationSlot (cmd=0x257be20) at walsender.c:957
#11 0x008baa60 in exec_replication_command (
    cmd_string=0x24f5b08 "CREATE_REPLICATION_SLOT temp_slot_name TEMPORARY LOGICAL pgoutput USE_SNAPSHOT")
    at walsender.c:1531
#12 0x00937230 in PostgresMain (argc=1, argv=0x25233a8, dbname=0x2523380 "postgres",
    username=0x24f23c8 "zilder") at postgres.c:4245
#13 0x00881453 in BackendRun (port=0x251a900) at postmaster.c:4431
#14 0x00880b4f in BackendStartup (port=0x251a900) at postmaster.c:4122
#15 0x0087cbbe in ServerLoop () at postmaster.c:1704
#16 0x0087c34a in PostmasterMain (argc=3, argv=0x24f0330) at postmaster.c:1377
#17 0x007926b6 in main (argc=3,
argv=0x24f0330) at main.c:228

After viewing the coredump we see that `prev_first_lsn == cur_txn->first_lsn`. The problem seems to be that ReorderBuffer adds two ReorderBufferTXNs with the same LSN but different transaction ids: subxid and top-level xid. See the FIX part below.

STEPS TO REPRODUCE
--

We were able to reproduce it on 10, 11 and on the master branch. Postgres was configured as:

    ./configure --enable-cassert CFLAGS='-ggdb3 -O0' --prefix=$HOME/pg12

Additional options in postgresql.conf:

    wal_level='logical'
    max_connections=1000
    max_replication_slots=100
    max_wal_senders=100
    max_logical_replication_workers=100

pgbench scripts:

    $ cat create_table.sql
    BEGIN;
    SAVEPOINT p1;
    CREATE temp TABLE t_t (id INT) ON COMMIT DROP;
    ROLLBACK TO SAVEPOINT p1;
    ROLLBACK;

    $ cat create_slot.sql
    BEGIN ISOLATION LEVEL REPEATABLE READ READ ONLY;
    SELECT pg_create_logical_replication_slot('test' || pg_backend_pid(), 'pgoutput', true);
    SELECT pg_drop_replication_slot('test' || pg_backend_pid());
    ROLLBACK;

Run in parallel terminals:

    $HOME/pg12/bin/pgbench postgres -f create_table.sql -T1000 -c50 -j50
    $HOME/pg12/bin/pgbench postgres -f create_slot.sql -T1000 -c50 -j50

It may take some time. On my local machine it breaks in a few seconds.

FIX?
--

I can't say that I have enough understanding of what's going on in the logical decoding code. But one thing I've noticed is an inconsistency in the xids used for the ReorderBufferTXNByXid() calls:

1. first, in the DecodeHeap2Op() function, ReorderBufferProcessXid() is called with the subtransaction id; it actually creates a ReorderBufferTXN and adds it to the reorder buffer's hash table and toplevel_by_lsn list;

2. second, within ReorderBufferXidSetCatalogChanges() the same subxid is used to look up the ReorderBufferTXN that was created before, successfully;

3. now in ReorderBufferAddNewTupleCids() the top-level transaction id is used for the lookup instead; it cannot find that xid in the hash table and tries to add a new record with the same LSN.
And it fails since this LSN is already in the toplevel_by_lsn list.

Attached is a simple patch that uses the subxid instead of the top-level xid in the ReorderBufferAddNewTupleCids() call. It seems to fix the bug, but I'm not sure that this is a valid change. Can someone please verify it or maybe suggest a better solution for the issue?

Best regards,
Ildar

diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 3e9d4cd79f..6c03c85768 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -778,7 +778,7 @@ SnapBuildProcessNewCid(SnapBuild *builder, TransactionId xid,
 	 */
 	ReorderBufferXidSetCatalogChanges(builder->reorder, xid, lsn);

-	ReorderBufferAddNewTupleCids(builder->reorder, xlrec->top_xid, lsn,
+	ReorderBufferAddNewTupleCids(buil
Re: [HACKERS] Custom compression methods
Hi Ildus,

On Fri, Mar 15, 2019 at 12:52 PM Ildus Kurbangaliev <i.kurbangal...@gmail.com> wrote:
> Hi,
> in my opinion this patch is usually skipped not because it is not
> needed, but because of its size. It is not hard to maintain it until
> commiters will have time for it or I will get actual response that
> nobody is going to commit it.
>
> Attached latest set of patches.

As I understand it, the only thing changed since my last review is an additional compression method for zlib. The code looks good. I have one suggestion though. Currently you only predefine two compression levels: `best_speed` and `best_compression`. But zlib itself allows a finer gradation between those two: it is possible to set the level to values from 0 to 9 (where zero means no compression at all, which I guess isn't useful in our case). So I think we should allow the user to choose between either the textual representation (as you already did) or a numeric one.

Another thing is that one can specify, for instance, the `best_speed` level, but not `BEST_SPEED`, which can be a bit frustrating for the user.

Regards,
Ildar Musin
Re: [HACKERS] Transactions involving multiple postgres foreign servers, take 2
Hello,

The patch needs a rebase as it doesn't apply to the current master. I applied it to an older commit to test it. It has worked fine so far. I found one bug though which would cause the resolver to finish by timeout even though there are unresolved foreign transactions in the list. The `fdw_xact_exists()` function expects the database id as the first argument and the xid as the second, but everywhere it is called the arguments are specified in the opposite order (xid first, then dbid). Also the function declaration in the header doesn't match its definition.

There are some other things I found.

* In `FdwXactResolveAllDanglingTransactions()` the variable `n_resolved` is declared as bool but used as an integer.
* In fdwxact.c's module comment the `FdwXactRegisterForeignTransaction()` and `FdwXactMarkForeignTransactionModified()` functions are mentioned, but they are not there anymore.
* In the documentation (storage.sgml) there is no mention of the `pg_fdw_xact` directory.

A couple of stylistic notes.

* In the `FdwXactCtlData` struct both camel case and snake case naming are used.
* In `get_fdw_xacts()` `xid != InvalidTransactionId` can be replaced with `TransactionIdIsValid(xid)`.
* In `generate_fdw_xact_identifier()` the `fx` prefix could be a part of the format string instead of being processed by `sprintf` as an extra argument.

I'll continue looking into the patch. Thanks!

On Tue, Nov 20, 2018 at 12:18 PM Masahiko Sawada wrote: > On Thu, Nov 15, 2018 at 7:36 PM Masahiko Sawada > wrote: > > > > On Mon, Oct 29, 2018 at 6:03 PM Masahiko Sawada > wrote: > > > > > > On Mon, Oct 29, 2018 at 10:16 AM Masahiko Sawada < sawada.m...@gmail.com> wrote: > > > > > > > > On Wed, Oct 24, 2018 at 9:06 AM Masahiko Sawada < sawada.m...@gmail.com> wrote: > > > > > > > > > > On Tue, Oct 23, 2018 at 12:54 PM Kyotaro HORIGUCHI > > > > > wrote: > > > > > > > > > > > > Hello. > > > > > > > > > > > > # It took a long time to come here..
> > > > > > > > > > > > At Fri, 19 Oct 2018 21:38:35 +0900, Masahiko Sawada < > sawada.m...@gmail.com> wrote in > > > > > > > > On Wed, Oct 10, 2018 at 1:34 PM Masahiko Sawada < > sawada.m...@gmail.com> wrote: > > > > > > ... > > > > > > > * Updated docs, added the new section "Distributed > Transaction" at > > > > > > > Chapter 33 to explain the concept to users > > > > > > > > > > > > > > * Moved atomic commit codes into src/backend/access/fdwxact > directory. > > > > > > > > > > > > > > * Some bug fixes. > > > > > > > > > > > > > > Please reivew them. > > > > > > > > > > > > I have some comments, with apologize in advance for possible > > > > > > duplicate or conflict with others' comments so far. > > > > > > > > > > Thank youf so much for reviewing this patch! > > > > > > > > > > > > > > > > > 0001: > > > > > > > > > > > > This sets XACT_FLAG_WROTENONTEMPREL when RELPERSISTENT_PERMANENT > > > > > > relation is modified. Isn't it needed when UNLOGGED tables are > > > > > > modified? It may be better that we have dedicated classification > > > > > > macro or function. > > > > > > > > > > I think even if we do atomic commit for modifying the an UNLOGGED > > > > > table and a remote table the data will get inconsistent if the > local > > > > > server crashes. For example, if the local server crashes after > > > > > prepared the transaction on foreign server but before the local > commit > > > > > and, we will lose the all data of the local UNLOGGED table whereas > the > > > > > modification of remote table is rollbacked. In case of persistent > > > > > tables, the data consistency is left. So I think the keeping data > > > > > consistency between remote data and local unlogged table is > difficult > > > > > and want to leave it as a restriction for now. Am I missing > something? > > > > > > > > > > > > > > > > > The flag is handled in heapam.c. I suppose that it should be done > > > > > > in the upper layer considering coming pluggable storage. 
> > > > > > (X_F_ACCESSEDTEMPREL is set in heapam, but..) > > > > > > > > > > > > > > > > Yeah, or we can set the flag after heap_insert in ExecInsert. > > > > > > > > > > > > > > > > > 0002: > > > > > > > > > > > > The name FdwXactParticipantsForAC doesn't sound good for me. How > > > > > > about FdwXactAtomicCommitPartitcipants? > > > > > > > > > > +1, will fix it. > > > > > > > > > > > > > > > > > Well, as the file comment of fdwxact.c, > > > > > > FdwXactRegisterTransaction is called from FDW driver and > > > > > > F_X_MarkForeignTransactionModified is called from executor. I > > > > > > think that we should clarify who is responsible to the whole > > > > > > sequence. Since the state of local tables affects, I suppose > > > > > > executor is that. Couldn't we do the whole thing within executor > > > > > > side? I'm not sure but I feel that > > > > > > F_X_RegisterForeignTransaction can be a part of > > > > > > F_X_MarkForeignTransactionModified. The callers of > > > > > > MarkForeignTransactionModified can find whether the table is > > > > > > involved in 2pc by IsT
Possibly redundant context switch in postgres_fdw
Hi hackers,

ISTM that the context switch in `create_cursor()`:

    if (numParams > 0)
    {
        MemoryContext oldcontext;

        oldcontext = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);

        process_query_params(econtext,
                             fsstate->param_flinfo,
                             fsstate->param_exprs,
                             values);

        MemoryContextSwitchTo(oldcontext);
    }

is redundant, since we should already be in the `ecxt_per_tuple_memory` context according to `ForeignNext()`. Am I missing some hidden purpose? If not, here is a patch that removes it.

Regards,
Ildar Musin

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index fd20aa96aa..65962ea657 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -3083,16 +3083,10 @@ create_cursor(ForeignScanState *node)
 	 */
 	if (numParams > 0)
 	{
-		MemoryContext oldcontext;
-
-		oldcontext = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
-
 		process_query_params(econtext,
 							 fsstate->param_flinfo,
 							 fsstate->param_exprs,
 							 values);
-
-		MemoryContextSwitchTo(oldcontext);
 	}

 	/* Construct the DECLARE CURSOR command */
Re: hostorder and failover_timeout for libpq
Hello Surafel,

On Fri, Sep 14, 2018 at 2:03 PM Surafel Temesgen wrote:
> Hey ,
> Here are a few comment.
> + xreflabel="failover_timeout">
> Here's a typo: ="libpq-connect-falover-timeout"
> + {"failover_timeout", NULL, NULL, NULL,
> + "Failover Timeout", "", 10,
> Word is separated by hyphen in internalPQconninfoOption lable as a
> surrounding code
> +If the value is random, the host to connect to
> +will be randomly picked from the list. It allows load balacing between
> +several cluster nodes.
> I Can't think of use case where randomly picking a node rather than in
> user specified order can load balance the cluster better. Can you
> explain the purpose of this feature more?

Probably load-balancing is the wrong word for this. Think of it as a connection routing mechanism. Let's say you have 10 servers and 100 clients willing to establish read-only connections. Without this feature all clients will go to the first specified host (unless they hit the max_connections limit). And with random `hostorder` they would be split between hosts more or less evenly.

> And in the code I can't see
> a mechanism for preventing picking one host multiple time

The original idea was to collect all ip addresses that we get after resolving the specified hostnames, put those addresses into a global array, apply random permutations to it and then use a round-robin algorithm trying to connect to each of them until we succeed. Now I'm not sure that this approach was the best. There are two concerns:

1. a host name can be resolved to several ip addresses (which in turn can point to either the same physical server with multiple network interfaces or to different servers). In the scheme described above each ip address would be added to the global array. This may lead to a situation where one host gets a higher chance of being picked because it has more addresses in the global array than other hosts.

2.
a host may support both ipv4 and ipv6 connections, which again leads to extra items in the global array and therefore also increases its chance of being picked.

Another approach would be to leave `pg_conn->connhost` as it is now (i.e. not to create a global addresses array) and just apply a random permutation to it if `hostorder=random` is specified, and probably also permute the address list within each individual host.

At this point I'd like to ask the community what in your opinion would be the best course of action, and whether this feature should be implemented within libpq at all. Because from my POV there are factors that really depend on the network architecture, and there is probably no single right solution.

Kind regards,
Ildar
Re: PATCH pass PGOPTIONS to pg_regress
Hi Manuel,

On 29.05.2018 16:19, Manuel Kniep wrote:
> Hi,
> attached patch passes PGOPTIONS env variable to pg_regress in pgxs.mk
> This is especially useful when developing extensions for different
> postgres versions where some session_variables might or might not exist.
> Consider something like this in an extension's makefile:
>
> ifeq ($(shell test $(VERSION_NUM) -ge 90600; echo $$?),0)
> PGOPTIONS+= "--max_parallel_workers_per_gather=0"
> endif
>
> But also when there are many testfiles it might be convenient to align
> some session parameter globally in the Makefile.

Have you considered using the EXTRA_REGRESS_OPTS variable in the extension's Makefile?

    EXTRA_REGRESS_OPTS=--temp-config=$(top_srcdir)/$(subdir)/extra.conf

Here extra.conf is implied to be a file in the extension's root directory which contains additional server options. This would only work for `make check` though, not `make installcheck`.

--
Ildar Musin
i.mu...@postgrespro.ru
Re: MAP syntax for arrays
On 08.05.2018 17:15, Peter Eisentraut wrote:
> On 5/8/18 09:19, Chapman Flack wrote:
>> On 05/08/2018 08:57 AM, Ildar Musin wrote:
>>> select map (pow(2, x) - 1 for x in array[1,2,3,4,5]);
>> I wonder how efficient an implementation would be possible strictly
>> as a function, without grammar changes?
> Yeah, you can pass a function to another function (using regprocedure
> or just oid), so this should be possible entirely in user space.

The problem with this approach is that the extension would either have a single map() function with input and output types of anyarray, which cannot be used when the user needs to map int[] to text[], for example; or, the other way, there would have to be a set of map functions for different input/output types. Another thing is that this approach is not as versatile, since the user needs to create a function before running map, while with the proposed patch they could run an arbitrary expression over the array directly.

--
Ildar Musin
i.mu...@postgrespro.ru
Re: MAP syntax for arrays
On 08.05.2018 15:49, Ildar Musin wrote:
> select map (pow(x, 2) - 1 for x in array[1,2,3]);

Sorry, the example should be:

    select map (pow(2, x) - 1 for x in array[1,2,3,4,5]);
       ?column?
    ---------------
     {1,3,7,15,31}
    (1 row)

--
Ildar Musin
i.mu...@postgrespro.ru
Re: MAP syntax for arrays
Hello Tom, Ashutosh,

On 07.05.2018 18:16, Tom Lane wrote:
> Ashutosh Bapat writes:
>> Is there a way we can improve unnest() and array_agg() to match the
>> performance you have specified by let's say optimizing the cases
>> specially when those two are used together. Identifying that may be
>> some work, but will not require introducing new syntax.
> +1. The first thing I thought on seeing this proposal was "I wonder how
> long it will be before the SQL committee introduces some syntax that
> uses the MAP keyword and breaks this". ISTM the planner could be taught
> to notice the combination of unnest and array_agg and produce a special
> output plan from that. It is, however, fair to wonder whether this is
> worth our time to optimize. I've not noticed a lot of people
> complaining about performance of this sort of thing, at least not since
> we fixed array_agg to not be O(N^2).

The main point of this patch was about convenience; the performance thing came out later just as a side effect :) Many users are familiar with the "map/reduce/filter" concept that many languages (not only functional ones) utilize. And my thought was that it would be great to have those in postgres too.

Apparently there is also a case where unnest/array_agg may not produce the result we're looking for: multidimensional arrays. E.g.

    select array_agg(x * 2) from unnest(array[[1,2],[3,4],[5,6]]) as x;
        array_agg
    -----------------
     {2,4,6,8,10,12}
    (1 row)

    select map(x * 2 for x in array[[1,2],[3,4],[5,6]]);
           ?column?
    -----------------------
     {{2,4},{6,8},{10,12}}
    (1 row)

array_agg produces plain arrays no matter what the input was.

There is a new version of the patch in the attachment which introduces arbitrary per-element expressions (instead of a single function call). So now the user can specify a placeholder representing one element of the array and use it in the expression, like the following:

    select map (pow(x, 2) - 1 for x in array[1,2,3]);
       ?column?
--- {1,3,7,15,31} (1 row) -- Ildar Musin i.mu...@postgrespro.ru diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c index e284fd7..85d76b2 100644 --- a/src/backend/executor/execExpr.c +++ b/src/backend/executor/execExpr.c @@ -886,6 +886,56 @@ ExecInitExprRec(Expr *node, ExprState *state, break; } + case T_MapExpr: + { +MapExpr *map = (MapExpr *) node; +ExprState *elemstate; +Oid resultelemtype; + +ExecInitExprRec(map->arrexpr, state, resv, resnull); + +resultelemtype = get_element_type(map->resulttype); +if (!OidIsValid(resultelemtype)) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("target type is not an array"))); + +/* Construct a sub-expression for the per-element expression */ +elemstate = makeNode(ExprState); +elemstate->expr = map->elemexpr; +elemstate->parent = state->parent; +elemstate->ext_params = state->ext_params; +elemstate->innermost_caseval = (Datum *) palloc(sizeof(Datum)); +elemstate->innermost_casenull = (bool *) palloc(sizeof(bool)); + +ExecInitExprRec(map->elemexpr, elemstate, +&elemstate->resvalue, &elemstate->resnull); + +/* Append a DONE step and ready the subexpression */ +scratch.opcode = EEOP_DONE; +ExprEvalPushStep(elemstate, &scratch); +ExecReadyExpr(elemstate); + +scratch.opcode = EEOP_MAP; +scratch.d.map.elemexprstate = elemstate; +scratch.d.map.resultelemtype = resultelemtype; + +if (elemstate) +{ + /* Set up workspace for array_map */ + scratch.d.map.amstate = + (ArrayMapState *) palloc0(sizeof(ArrayMapState)); +} +else +{ + /* Don't need workspace if there's no subexpression */ + scratch.d.map.amstate = NULL; +} + +ExprEvalPushStep(state, &scratch); +break; + } + case T_OpExpr: { OpExpr *op = (OpExpr *) node; diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c index 9d6e25a..b2cbc45 100644 --- a/src/backend/executor/execExprInterp.c +++ b/src/backend/executor/execExprInterp.c @@ -328,6 +328,7 @@ ExecInterpExpr(ExprState *state, ExprContext 
*econtext, bool *isnull) &&CASE_EEOP_FUNCEXPR_STRICT, &&CASE_EEOP_FUNCEXPR_FUSAGE, &&CASE_EEOP_FUNCEXPR_STRICT_FUSAGE, + &&CASE_EEOP_MAP, &&CASE_EEOP_BOOL_AND_STEP_FIRST, &&CASE_EEOP_BOOL_AND_STEP, &&CASE_EEOP_BOOL_AND_STEP_LAST, @@ -699,6 +700,13 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull) EEO_NEXT(); } + EEO_CASE(EEOP_MAP) + { + ExecEvalMapExpr(state, op, econtext); + + EEO_NEXT(); + } + /* * If any of its clauses is FALSE, an AND's result is FALSE regardless * of the states of the rest of the clauses, so we can stop evaluating @@ -2230,6 +2238,33 @@ ExecEvalFuncExprStrictFusa
MAP syntax for arrays
Hello hackers,

Recently I was working with SQL arrays in postgres and it turned out that postgres doesn't have such very convenient functional constructions as map, reduce and filter. Currently, to map a function over an array, the user has to write a subquery like:

    select u.* from my_table,
    lateral (
        select array_agg(lower(elem))
        from unnest(arr) as elem
    ) as u;

which is not only inconvenient but not very efficient as well (see the 'Demo' section below).

When I dug into the code I found that postgres already has the needed infrastructure for implementing map for arrays; actually, array coercion already works that way (it basically maps a cast function). In the attached patch there is a simple map implementation which introduces a new expression type and syntax:

    MAP(<function name> OVER <array expression>)

For example:

    SELECT MAP(upper OVER array['one', 'two', 'three']::text[]);
        ?column?
    -----------------
     {ONE,TWO,THREE}
    (1 row)

This is probably not the most useful notation, and it would be better to have syntax for mapping arbitrary expressions over an array, not just a function. I'm struggling to come up with a good idea of how it should look. It could be something like:

    MAP(<expression> FOR <placeholder> IN <array expression>)

For instance:

    SELECT MAP(x*2 FOR x IN array[1, 2, 3]::int[]);

Looking forward to the community's suggestions!

Demo
--

Here is a small comparison between the map and unnest/aggregate ways of per-element processing of arrays. Given a table with 1K rows which contains a single column of text[] type, where each array contains 5/10/100 elements:

    create table my_table (arr text[]);
    insert into my_table
    select array_agg(md5(random()::text))
    from generate_series(1, 1000) as rows,
         generate_series(1, 10) as elements
    group by rows;

There are two scripts for pgbench.
One for the 'map' syntax:

    select map(upper over arr) from my_table;

And one for unnest/aggregate:

    select u.* from my_table,
    lateral (
        select array_agg(upper(elem))
        from unnest(arr) as elem
    ) as u;

Results are:

    elements per array | map (tps)  | unnest/aggregate (tps)
    -------------------+------------+-----------------------
                     5 | 139.105359 |              74.434010
                    10 |  74.089743 |              43.622554
                   100 |   7.693000 |               5.325805

Apparently map is more efficient for small arrays, and as the size of the array increases the difference decreases.

I'll be glad to get any input from the community. Thanks!

--
Ildar Musin
i.mu...@postgrespro.ru

diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c index e284fd7..85d76b2 100644 --- a/src/backend/executor/execExpr.c +++ b/src/backend/executor/execExpr.c @@ -886,6 +886,56 @@ ExecInitExprRec(Expr *node, ExprState *state, break; } + case T_MapExpr: + { +MapExpr *map = (MapExpr *) node; +ExprState *elemstate; +Oid resultelemtype; + +ExecInitExprRec(map->arrexpr, state, resv, resnull); + +resultelemtype = get_element_type(map->resulttype); +if (!OidIsValid(resultelemtype)) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("target type is not an array"))); + +/* Construct a sub-expression for the per-element expression */ +elemstate = makeNode(ExprState); +elemstate->expr = map->elemexpr; +elemstate->parent = state->parent; +elemstate->ext_params = state->ext_params; +elemstate->innermost_caseval = (Datum *) palloc(sizeof(Datum)); +elemstate->innermost_casenull = (bool *) palloc(sizeof(bool)); + +ExecInitExprRec(map->elemexpr, elemstate, +&elemstate->resvalue, &elemstate->resnull); + +/* Append a DONE step and ready the subexpression */ +scratch.opcode = EEOP_DONE; +ExprEvalPushStep(elemstate, &scratch); +ExecReadyExpr(elemstate); + +scratch.opcode = EEOP_MAP; +scratch.d.map.elemexprstate = elemstate; +scratch.d.map.resultelemtype = resultelemtype; + +if (elemstate) +{ + /* Set up workspace for array_map */ + scratch.d.map.amstate = + (ArrayMapState *)
palloc0(sizeof(ArrayMapState)); +} +else +{ + /* Don't need workspace if there's no subexpression */ + scratch.d.map.amstate = NULL; +} + +ExprEvalPushStep(state, &scratch); +break; + } + case T_OpExpr: { OpExpr *op = (OpExpr *) node; diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c index 9d6e25a..b2cbc45 100644 --- a/src/backend/executor/execExprInterp.c +++ b/src/backend/executor/execExprInterp.c @@ -328,6 +328,7 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull) &&
hostorder and failover_timeout for libpq
Hello hackers,

A couple of years ago Victor Wagner presented a patch [1] that introduced a multiple-hosts capability and also hostorder and failover_timeout parameters for libpq. Subsequently the multi-host feature was reimplemented by Robert Haas and committed, and later the target_session_attrs parameter was also added. In this thread I want to revisit the hostorder and failover_timeout proposal.

'hostorder' defines the order in which the postgres instances listed in the connection string will be tried. Possible values are:

* sequential (default)
* random

Random order can be used, for instance, for load balancing (which is particularly useful in a multi-master cluster, but can also be used to load-balance read-only connections to standbys).

'failover_timeout' specifies the time span (in seconds) during which libpq will continue attempts to connect to the hosts listed in the connection string. If failover_timeout is specified then libpq will loop over the hosts again and again until either it successfully connects to one of them or it runs out of time.

I reimplemented the 'hostorder' and 'failover_timeout' parameters in the attached patch. I also took some documentation pieces from Victor Wagner's original patch. I'll be glad to see any comments and suggestions. Thanks!
[1] https://www.postgresql.org/message-id/flat/20150818041850.GA5092%40wagner.pp.ru -- Ildar Musin i.mu...@postgrespro.ru diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml index 800e68a..a9ba534 100644 --- a/doc/src/sgml/libpq.sgml +++ b/doc/src/sgml/libpq.sgml @@ -845,7 +845,7 @@ postgresql://localhost/mydb postgresql://user@localhost postgresql://user:secret@localhost postgresql://other@localhost/otherdb?connect_timeout=10&application_name=myapp -postgresql://host1:123,host2:456/somedb?target_session_attrs=any&application_name=myapp +postgresql://host1:123,host2:456/somedb?hostorder=random&target_session_attrs=any&application_name=myapp Components of the hierarchical part of the URI can also be given as parameters. For example: @@ -910,14 +910,15 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname Specifying Multiple Hosts - It is possible to specify multiple hosts to connect to, so that they are - tried in the given order. In the Keyword/Value format, the host, - hostaddr, and port options accept a comma-separated - list of values. The same number of elements must be given in each option, such - that e.g. the first hostaddr corresponds to the first host name, - the second hostaddr corresponds to the second host name, and so - forth. As an exception, if only one port is specified, it - applies to all the hosts. + It is possible to specify multiple hosts to connect to. In the Keyword/Value + format, the host, hostaddr, and + port options accept a comma-separated list of values. + The same number of elements must be given in each option, such that e.g. the + first hostaddr corresponds to the first host name, the second + hostaddr corresponds to the second host name, and so + forth. As an exception, if only one port is specified, + it applies to all the hosts. The order in which hosts are tried may be + specified by parameter. @@ -968,8 +969,7 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname Unix-domain sockets, the default is to connect to localhost. 
-A comma-separated list of host names is also accepted, in which case -each host name in the list is tried in order. See +A comma-separated list of host names is also accepted. See for details. @@ -1039,6 +1039,30 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname + + hostorder + + +Specifies the order in which hosts from the list of hosts +specified by the parameter are +tried. + + +If the value of this argument is sequential (the +default) connections to the hosts will be attempted in the order in +which they are given. + + +If the value is random, the host to connect to +will be randomly picked from the list. It allows load balacing between +several cluster nodes. However, PostgreSQL doesn't currently support +multimaster clusters. So, without the use of third-party products, +only read-only connections can take advantage from load-balancing. +See + + + + port @@ -1115,6 +1139,29 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname + + failover_timeout + + + Maximum time to cyclically retry all the hosts in the connection string. + (as decimal integer number of seconds). If not specified, then + hosts are tried just once.
Re: using index or check in ALTER TABLE SET NOT NULL
Hello Sergei,

On 10.03.2018 12:35, Sergei Kornilov wrote:

Hello
My patch does not apply after commit 5748f3a0aa7cf78ac6979010273bd9d50869bb8e. Here is update to current master. Not null constraint is immutable too, so here is no changes in PartConstraintImpliedByRelConstraint excepts rename and comments fix. In this patch version i also revert tests to v4 state: i use DEBUG ereport instead INFO and code path not tested. Please tell me if i must change tests some way.
regards, Sergei

Ok, I can't think of any other way to test it, so I have to agree with Tom Lane, i.e. rely only on coverage. (There was also another suggestion to use statistics [select seq_scan from pg_stat_user_tables where relid='test'::regclass], which show the number of table scans. But statistics are collected by the stats collector process with some latency and hence cannot be relied upon in tests.)

The patch seems correct to me: it applies and compiles cleanly, the docs compile as well, and tests pass. Changed status to Ready for Committer.

-- Ildar Musin i.mu...@postgrespro.ru
Re: General purpose hashing func in pgbench
Hello Teodor,

On 07.03.2018 16:21, Ildar Musin wrote:
Turned out that the only big-endian machine I could run test on is out of order.

I finally managed to perform this test on a sparc v9 machine, which is a 64-bit big-endian architecture. I ran the pgbench script (see previous message) with default_seed=123 on both x86-64 and sparc machines and visualized the results. You can find them in the attached chart. Both images show the same distribution, so endianness isn't a concern here. Thanks!

-- Ildar Musin i.mu...@postgrespro.ru
Re: Failed to request an autovacuum work-item in silence
Hi,

The autovac_get_workitem_name() declaration seems redundant and should be removed. The same goes for including "utils/lsyscache.h" in brin.c.

I would again rename the 'requested' variable in brininsert() to something like 'success', because a work item is requested anyway; what matters is whether the request was satisfied.

Except for these points everything is fine and works as expected.

-- Ildar Musin i.mu...@postgrespro.ru
Re: Failed to request an autovacuum work-item in silence
Hello,

The patch applies and compiles cleanly, tests pass. The code works as well. I was able to check it by simply creating a BRIN index and filling the table with data, forcing the index to request lots of work items:

create table test (a serial, b text);
create index on test using brin (a) with (pages_per_range=1, autosummarize=true);
insert into test select i, repeat(md5(random()::text), 30) from generate_series(1, 3000) i;

LOG: could not request an autovacuum work item "BrinSummarizeRange" for "test_a_idx"
LOG: could not request an autovacuum work item "BrinSummarizeRange" for "test_a_idx"
...

Just a couple of remarks. I would rename the 'requested' variable in AutoVacuumRequestWork() to something like 'success' or 'result', because the request is something the caller does. And I would also rephrase the log message as follows:

request for autovacuum work item "%s" for "%s" failed

Thanks!

-- Ildar Musin Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Re: General purpose hashing func in pgbench
Hello Teodor,

1) Seems, it's good idea to add credits to Austin Appleby to comments.

Done. Also rebased to the latest master.

I think that both points refer to the fact that the original algorithm accepts a byte string as an input, slices it up into 8-byte chunks and forms unsigned int values from them. In that case the order of bytes in the input string does matter, since it may result in different integers on different architectures. And it is also a fair requirement for a byte string to be aligned, as some architectures cannot handle unaligned data. In this patch though I slightly modified the original algorithm so that it takes unsigned ints as an input (not a byte string), so neither of these points should be a problem as far as I can see. But I'll double check it on a big-endian machine with strict alignment (Sparc).

It turned out that the only big-endian machine I could run the test on is out of order. Maybe someone has access to a big-endian machine? If so, could you please run some basic test and send me the resulting file? I've attached an initialization script and a pgbench script which can be run as follows:

psql postgres -f hash_init.sql
pgbench postgres -T60 -f hash_run.sql
psql postgres -c "copy abc to '/tmp/hash_results.csv'"

Thanks!
-- Ildar Musin i.mu...@postgrespro.ru hash_init.sql Description: application/sql hash_run.sql Description: application/sql diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml index 5f28023..f07ddf1 100644 --- a/doc/src/sgml/ref/pgbench.sgml +++ b/doc/src/sgml/ref/pgbench.sgml @@ -874,13 +874,18 @@ pgbench options d -scale - current scale factor +client_id + unique number identifying the client session (starts from zero) -client_id - unique number identifying the client session (starts from zero) +default_seed + seed used in hash functions by default + + + +scale + current scale factor @@ -1246,6 +1251,27 @@ pgbench options d 5 + hash(a [, seed ] ) + integer + alias for hash_murmur2() + hash(10, 5432) + -5817877081768721676 + + + hash_fnv1a(a [, seed ] ) + integer + https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function";>FNV-1a hash + hash_fnv1a(10, 5432) + -7793829335365542153 + + + hash_murmur2(a [, seed ] ) + integer + https://en.wikipedia.org/wiki/MurmurHash";>MurmurHash2 hash + hash_murmur2(10, 5432) + -5817877081768721676 + + int(x) integer cast to int @@ -1424,6 +1450,31 @@ f(x) = PHI(2.0 * parameter * (x - mu) / (max - min + 1)) / +Hash functions hash, hash_murmur2 and +hash_fnv1a accept an input value and an optional seed parameter. +In case the seed isn't provided the value of :default_seed +is used, which is initialized randomly unless set by the command-line +-D option. Hash functions can be used to scatter the +distribution of random functions such as random_zipfian or +random_exponential. 
For instance, the following pgbench +script simulates possible real world workload typical for social media and +blogging platforms where few accounts generate excessive load: + + +\set r random_zipfian(0, 1, 1.07) +\set k abs(hash(:r)) % 100 + + +In some cases several distinct distributions are needed which don't correlate +with each other and this is when implicit seed parameter comes in handy: + + +\set k1 abs(hash(:r), :default_seed + 123) % 100 +\set k2 abs(hash(:r), :default_seed + 321) % 100 + + + + As an example, the full definition of the built-in TPC-B-like transaction is: diff --git a/src/bin/pgbench/exprparse.y b/src/bin/pgbench/exprparse.y index e23ca51..fc42c47 100644 --- a/src/bin/pgbench/exprparse.y +++ b/src/bin/pgbench/exprparse.y @@ -16,6 +16,10 @@ #include "pgbench.h" +#define PGBENCH_NARGS_VARIABLE (-1) +#define PGBENCH_NARGS_CASE (-2) +#define PGBENCH_NARGS_HASH (-3) + PgBenchExpr *expr_parse_result; static PgBenchExprList *make_elist(PgBenchExpr *exp, PgBenchExprList *list); @@ -226,9 +230,13 @@ make_uop(yyscan_t yyscanner, const char *operator, PgBenchExpr *expr) /* * List of available functions: * - fname: function name, "!..." for special internal functions - * - nargs: number of arguments - * -1 is a special value for least & greatest meaning #args >= 1 - * -2 is for the "CASE WHEN ..." function, which has #args >= 3 and odd + * - nargs: number of arguments. Special cases: + * - PGBENCH_NARGS_VARIABLE is a special value for least & greatest + * meaning #args >= 1; + * - PGBENCH_NARGS_CASE is for the "CASE WHEN ..." function, which + * has #args >
Re: using index or check in ALTER TABLE SET NOT NULL
Hello Sergei, Alvaro, Tom,

On 06.03.2018 20:25, Alvaro Herrera wrote:
You should be able to use an event trigger that raises a message when table_rewrite is hit, to notify the test driver that a rewrite happens. (If any DDL that causes a table rewrite fails to trigger the table_rewrite event correctly, that is a bug.)

It seems that the 'table_rewrite' event trigger isn't fired in case of a simple scan (without an actual rewrite).

ISTM that depending on DEBUG messages is bad because debugging lines added elsewhere will make your tests fail;

I think that, between a test depending on a DEBUG message and no test at all, the former is preferable. Are there other options to test it?

On 06.03.2018 21:51, Tom Lane wrote:
Do you actually need test output proving that this code path was taken rather than the default one? Seems like looking at the code coverage report might be enough.

Code coverage cannot guarantee correctness. I think there should be at least two tests:
* a full scan is performed when there are no check constraints that restrict NULL values;
* a full scan is omitted when there are such constraints.
The latter could also include some unobvious cases like CHECK (col > 0), complex expressions with boolean operators, etc.

PS: Btw, some check constraints that imply NOT NULL still won't prevent a full table scan. For instance:

# create table test (a int, b int check ((a + b) is not null));
CREATE TABLE
# set client_min_messages to 'debug1';
SET
# insert into test values (1, 1);
INSERT 0 1
# alter table test alter column a set not null;
DEBUG: verifying table "test" <<<< full table scan!
ALTER TABLE

Since operator+ (aka int4pl) is strict, it returns NULL if at least one of its operands is NULL, so this check constraint ensures that both columns 'a' and 'b' are NOT NULL. It probably isn't an issue for this patch, just something that someone may find interesting.

-- Ildar Musin i.mu...@postgrespro.ru
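The strictness argument above can be sketched with a toy model in C. This is not PostgreSQL code; NullableInt, strict_add and check_passes are invented for illustration. It shows why CHECK ((a + b) IS NOT NULL) over a strict operator can only be satisfied when both inputs are non-NULL.

```c
/*
 * Toy model of SQL NULL semantics with a strict operator.  A strict
 * function returns NULL whenever any of its inputs is NULL (like
 * int4pl), so "(a + b) IS NOT NULL" implies both a and b are NOT NULL.
 */
#include <stdbool.h>

typedef struct NullableInt
{
	bool		isnull;
	int			value;
} NullableInt;

/* Strict addition, like int4pl: NULL in -> NULL out. */
static NullableInt
strict_add(NullableInt a, NullableInt b)
{
	NullableInt result = {true, 0};

	if (a.isnull || b.isnull)
		return result;			/* strictness: propagate NULL */
	result.isnull = false;
	result.value = a.value + b.value;
	return result;
}

/* Models the constraint CHECK ((a + b) IS NOT NULL). */
static bool
check_passes(NullableInt a, NullableInt b)
{
	return !strict_add(a, b).isnull;
}
```

Because check_passes is false whenever either operand is NULL, a patch that could reason about operator strictness would be able to skip the verification scan even for such compound constraints.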
Re: using index or check in ALTER TABLE SET NOT NULL
On 06.03.2018 16:12, Sergei Kornilov wrote:

Hello thank you for review!
Adding check constraint will also force the full table scan. So I think it would be better to rephrased it as follows:
Agree. I updated docs in new attached patch slightly different
Regarding regression tests it may be useful to set client_min_messages to 'debug1' before setting "NOT NULL" attribute for a column. In this case you can tell for sure that NotNullImpliedByRelConstraints() returned true (i.e. your code actually did the job) as the special debug message is printed to the log.
I can not find any debug messages in tests: grep client_min_messages -irn src/test/
Only some warning level and few error. So i verify in regression tests only top-level behavior. Or this paragraph was for other people, not for tests?

Sorry, I probably didn't explain it clearly enough. I meant that your regression tests can't distinguish the cases when a full table scan was actually performed from the ones when it was skipped due to the NotNullImpliedByRelConstraints() check. For instance, consider this piece from the test suite:

# create table atacc1 (test_a int, test_b int);
CREATE TABLE
...
# alter table atacc1 add constraint atacc1_constr_a_valid check(test_a is not null);
ALTER TABLE
# alter table atacc1 alter test_a set not null;
ALTER TABLE

It is not obvious that the full table scan was omitted. But if we set client_min_messages to 'debug1', we'll be able to see what really happened from the different debug messages. For example:

# create table atacc1 (test_a int, test_b int);
CREATE TABLE
...
# set client_min_messages to 'debug1';
SET
# alter table atacc1 alter test_a set not null;
DEBUG: verifying table "atacc1" <<<< full table scan!
ALTER TABLE
# alter table atacc1 alter test_a drop not null;
ALTER TABLE
# alter table atacc1 add constraint atacc1_constr_a_valid check(test_a is not null);
DEBUG: verifying table "atacc1"
ALTER TABLE
# alter table atacc1 alter test_a set not null;
DEBUG: verifying table "atacc1" NOT NULL constraint on test_a attribute by existed constraints <<<< full scan was skipped!
ALTER TABLE

-- Ildar Musin i.mu...@postgrespro.ru
Re: General purpose hashing func in pgbench
Hello Teodor,

Thank you for reviewing this patch.

On 06.03.2018 15:53, Teodor Sigaev wrote:

Patch applies, compiles, pgbench & global "make check" ok, doc built ok. Agree. If I understand upthread correctly, implementation of Murmur hash algorithm based on Austin Appleby work https://github.com/aappleby/smhasher/blob/master/src/MurmurHash2.cpp If so, I have notice and objections:

1) Seems, it's good idea to add credits to Austin Appleby to comments.

Sounds fair, I'll send an updated version soon.

2) Reference implementaion directly says (link above):
// 2. It will not produce the same results on little-endian and big-endian
// machines.
I don't think that is good thing for testing and benchmarking for several reasons: it could produce different data collection, different selects, different distribution.

3) Again, from comments of reference implementation:
// Note - This code makes a few assumptions about how your machine behaves -
// 1. We can read a 4-byte value from any address without crashing
It's not true for all supported platforms. Any box with strict aligment will SIGBUSed here.

I think that both points refer to the fact that the original algorithm accepts a byte string as an input, slices it up into 8-byte chunks and forms unsigned int values from them. In that case the order of bytes in the input string does matter, since it may result in different integers on different architectures. And it is also a fair requirement for a byte string to be aligned, as some architectures cannot handle unaligned data.

In this patch though I slightly modified the original algorithm so that it takes unsigned ints as an input (not a byte string), so neither of these points should be a problem as far as I can see. But I'll double check it on a big-endian machine with strict alignment (Sparc).

Thanks!

-- Ildar Musin i.mu...@postgrespro.ru
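The endianness argument can be made concrete with a small C sketch. This is not the patch's exact code: it is an FNV-1a-style hash over a 64-bit integer value (the seed mixing via XOR is an assumption of this example). Because bytes are extracted with shifts on the value itself rather than read from memory, the result is identical on little- and big-endian machines and no alignment is required.

```c
/*
 * Sketch of hashing an integer *value* instead of a byte string.
 * Shifts operate on the numeric value, not on its in-memory
 * representation, so the result is endianness-independent.
 */
#include <stdint.h>

#define FNV64_OFFSET UINT64_C(0xcbf29ce484222325)	/* FNV-1a 64-bit offset basis */
#define FNV64_PRIME  UINT64_C(0x100000001b3)		/* FNV-1a 64-bit prime */

static uint64_t
fnv1a_u64(uint64_t val, uint64_t seed)
{
	uint64_t	hash = FNV64_OFFSET ^ seed;	/* seed mixing: example assumption */

	for (int i = 0; i < 8; i++)
	{
		hash ^= (val >> (i * 8)) & 0xff;	/* byte i of the value */
		hash *= FNV64_PRIME;
	}
	return hash;
}
```

A byte-string implementation that does `*(uint64_t *) ptr` would instead see the bytes in memory order, which differs between architectures and can SIGBUS on strict-alignment machines; the value-based variant avoids both issues.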
Re: using index or check in ALTER TABLE SET NOT NULL
Hello Sergei,

I couldn't find any case in which your code doesn't work properly, so it seems ok to me.

@@ -220,6 +220,13 @@ WITH ( MODULUS numeric_literal, REM + Full table scan is performed to check that no existing row + in the table has null values in given column. It is possible to avoid + this scan by adding a valid CHECK constraint to + the table that would allow only NOT NULL values for given column. +

Adding a check constraint will also force the full table scan. So I think it would be better to rephrase it as follows: "Full table scan is performed to check that column doesn't contain NULL values unless there are valid check constraints that prohibit NULL values for specified column. In the latter case table scan is skipped." A native English speaker's input would be helpful here.

Regarding regression tests, it may be useful to set client_min_messages to 'debug1' before setting the "NOT NULL" attribute for a column. In this case you can tell for sure that NotNullImpliedByRelConstraints() returned true (i.e. your code actually did the job), as the special debug message is printed to the log.

Thanks!

-- Ildar Musin i.mu...@postgrespro.ru
Re: Proposal: partition pruning by secondary attributes
On 08.02.2018 21:01, Andres Freund wrote:
On 2018-02-08 14:48:34 -0300, Alvaro Herrera wrote:
Ildar Musin wrote:
The idea is to store min and max values of secondary attributes (like 'id' in the example above) for each partition somewhere in catalog and use it for partition pruning along with partitioning key. You can think of it as somewhat like BRIN index but for partitions.
What is the problem with having a BRIN index?
Building plans to scan the individual partitions, lock them, open the relevant files, etc is often going to be significantly more expensive than pruning at plan time. But there also seems to be a number of fairly nasty locking issues with this proposal, leaving the amount of required code aside.

Sorry, I probably didn't describe it right. I wasn't talking about using a BRIN index for partition pruning or anything like that; I just used it as a reference to the idea. I'll try to explain it in more detail.

Let's say we have a table to store some events, which is partitioned by a timestamp column:

CREATE TABLE events (
    id serial,
    dt timestamp,
    ...
) PARTITION BY RANGE (dt);

In some cases it is queried by the 'dt' column, and partition pruning works fine because 'dt' is the partitioning key:

EXPLAIN (COSTS OFF) SELECT ... FROM events WHERE dt >= '2018-01-01' AND dt < '2018-02-01';
Append
  -> Seq Scan on events_0
     Filter: ((dt >= '2018-01-01 00:00:00'::timestamp without time zone) AND (dt < '2018-02-01 00:00:00'::timestamp without time zone))

But if we filter the table by 'id' then the planner has no other way but to append every partition to the plan:

EXPLAIN (COSTS OFF) SELECT * FROM events WHERE id = 123;
Append
  -> Seq Scan on events_0
     Filter: (id = 123)
  -> Seq Scan on events_1
     Filter: (id = 123)
  -> Seq Scan on events_2
     Filter: (id = 123)

We can see though that the values of 'dt' and 'id' both monotonically increase over time, so we could potentially use the 'id' column for plan-time partition pruning too.
To do so we need to store the min and max values of the 'id' column per partition somewhere in the catalog and use them to decide which partitions should be added to the plan, by matching them against the query restrictions. Each time the table is updated we must check whether the new value exceeds the stored min/max values and update those too if needed.

This raises a few issues. One of them, as Ashutosh Bapat mentioned, is the need to change the catalog very often, which could result in high catalog contention. I can't think of a comprehensive solution to this problem, but for numerical and datetime types we could shift the min or max bounds with a margin, so that not every update results in a catalog change. The other one is the need to change the min/max of a partition when rows are removed, which is less of a problem since we can postpone it and do it later (during autovacuum, for instance).

A new ALTER TABLE command should also be introduced to specify the column or expression which is not a partition key but can be used for partition pruning as described above. This command would scan each partition, gather min/max values and store them in the catalog.

What do you think?

-- Ildar Musin i.mu...@postgrespro.ru
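The matching step described above can be sketched in a few lines of C. This is illustrative only: SecondaryRange and prune_by_secondary are invented names, not a PostgreSQL API. Each partition carries catalog-stored min/max values of the secondary column, and for an equality restriction only partitions whose range covers the value are kept.

```c
/*
 * Minimal sketch of pruning by a secondary attribute: keep only the
 * partitions whose stored [min, max] interval can contain the value
 * from a "col = value" restriction.
 */
#include <stdint.h>

typedef struct SecondaryRange
{
	int64_t		min_val;		/* smallest value of the column in the partition */
	int64_t		max_val;		/* largest value of the column in the partition */
} SecondaryRange;

/*
 * Fill 'selected' with the indexes of partitions that may contain
 * 'value'; return how many were selected.  With margin-widened bounds
 * (as suggested above) ranges may overlap, so several partitions can
 * survive pruning.
 */
static int
prune_by_secondary(const SecondaryRange *parts, int nparts,
				   int64_t value, int *selected)
{
	int			n = 0;

	for (int i = 0; i < nparts; i++)
		if (value >= parts[i].min_val && value <= parts[i].max_val)
			selected[n++] = i;
	return n;
}
```

When 'id' grows monotonically with 'dt', the ranges are nearly disjoint and the WHERE id = 123 query from the example collapses to a single partition instead of all of them.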
Proposal: partition pruning by secondary attributes
Hello, hackers!

Sorry if this has already been discussed. I had this idea some time ago and then successfully forgot about it until pgconf.ru, where I had a conversation with one of the postgres users. His situation could be described like this: they have a table with id, timestamp and some other attributes, which is partitioned by (let's say) the timestamp column. In different contexts they may want to filter the table either by id or by timestamp (but not both). In this case pruning will only work for the timestamp column.

The idea is to store the min and max values of secondary attributes (like 'id' in the example above) for each partition somewhere in the catalog and use them for partition pruning along with the partitioning key. You can think of it as somewhat like a BRIN index, but for partitions. And it will have similar limitations. For example, we may benefit if secondary attribute values monotonically increase or decrease, but it would be unhelpful if they are scattered, or if the table wasn't partitioned by range.

I wanted to ask the community's opinion on whether this would be worth considering. Thanks!

-- Ildar Musin i.mu...@postgrespro.ru
Re: [HACKERS] Custom compression methods
Hello Ildus,

On 29.01.2018 14:44, Ildus Kurbangaliev wrote:
Thanks! Attached new version of the patch.

The patch applies cleanly, builds without any warnings, the documentation builds ok, and all tests pass.

A remark for the committers: the patch is quite big, so I really wish more reviewers looked into it for a more comprehensive review. Also a native English speaker should check the documentation and comments. Another thing is that the tests don't cover the cmdrop method, because the built-in pglz compression doesn't use it (I know there is a jsonbd extension [1] based on this patch which should benefit from the cmdrop method, but it doesn't test it either yet).

I think I did what I could, so I'm passing this patch to the committers for review. Changed status to "Ready for committer".

[1] https://github.com/postgrespro/jsonbd

-- Ildar Musin i.mu...@postgrespro.ru
Re: General purpose hashing func in pgbench
On 29.01.2018 15:03, Fabien COELHO wrote: Patch applies, compiles, pgbench & global "make check" ok, doc built ok. Ok for me, switched to "Ready". Thank you for the thorough review! -- Ildar Musin i.mu...@postgrespro.ru
Re: General purpose hashing func in pgbench
Hi Fabien, On 28.01.2018 11:10, Fabien COELHO wrote: Hello Ildar, I did everything you mention here and attached a new version on the patch. Patch applies, compiles, runs ok. Alas, I still have a few more very minor comments about the doc, sorry again: No problem : ) +default_seed + random seed used in hash functions by default s/random //: the seed may or may not be random. The "In some cases several distinct distributions..." paragraph is also just one line in the xml source file. It should be justified at about 80 columns like others. Fixed the doc, attached the patch. Thanks! -- Ildar Musin i.mu...@postgrespro.ru diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml index 3dd492c..b2a9c9d 100644 --- a/doc/src/sgml/ref/pgbench.sgml +++ b/doc/src/sgml/ref/pgbench.sgml @@ -874,13 +874,18 @@ pgbench options d -scale - current scale factor +client_id + unique number identifying the client session (starts from zero) -client_id - unique number identifying the client session (starts from zero) +default_seed + seed used in hash functions by default + + + +scale + current scale factor @@ -1246,6 +1251,27 @@ pgbench options d 5 + hash(a [, seed ] ) + integer + alias for hash_murmur2() + hash(10, 5432) + -5817877081768721676 + + + hash_fnv1a(a [, seed ] ) + integer + https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function";>FNV-1a hash + hash_fnv1a(10, 5432) + -7793829335365542153 + + + hash_murmur2(a [, seed ] ) + integer + https://en.wikipedia.org/wiki/MurmurHash";>MurmurHash2 hash + hash_murmur2(10, 5432) + -5817877081768721676 + + int(x) integer cast to int @@ -1423,6 +1449,31 @@ f(x) = PHI(2.0 * parameter * (x - mu) / (max - min + 1)) / +Hash functions hash, hash_murmur2 and +hash_fnv1a accept an input value and an optional seed parameter. +In case the seed isn't provided the value of :default_seed +is used, which is initialized randomly unless set by the command-line +-D option. 
Hash functions can be used to scatter the +distribution of random functions such as random_zipfian or +random_exponential. For instance, the following pgbench +script simulates possible real world workload typical for social media and +blogging platforms where few accounts generate excessive load: + + +\set r random_zipfian(0, 1, 1.07) +\set k abs(hash(:r)) % 100 + + +In some cases several distinct distributions are needed which don't correlate +with each other and this is when implicit seed parameter comes in handy: + + +\set k1 abs(hash(:r), :default_seed + 123) % 100 +\set k2 abs(hash(:r), :default_seed + 321) % 100 + + + + As an example, the full definition of the built-in TPC-B-like transaction is: diff --git a/src/bin/pgbench/exprparse.y b/src/bin/pgbench/exprparse.y index e23ca51..fc42c47 100644 --- a/src/bin/pgbench/exprparse.y +++ b/src/bin/pgbench/exprparse.y @@ -16,6 +16,10 @@ #include "pgbench.h" +#define PGBENCH_NARGS_VARIABLE (-1) +#define PGBENCH_NARGS_CASE (-2) +#define PGBENCH_NARGS_HASH (-3) + PgBenchExpr *expr_parse_result; static PgBenchExprList *make_elist(PgBenchExpr *exp, PgBenchExprList *list); @@ -226,9 +230,13 @@ make_uop(yyscan_t yyscanner, const char *operator, PgBenchExpr *expr) /* * List of available functions: * - fname: function name, "!..." for special internal functions - * - nargs: number of arguments - * -1 is a special value for least & greatest meaning #args >= 1 - * -2 is for the "CASE WHEN ..." function, which has #args >= 3 and odd + * - nargs: number of arguments. Special cases: + * - PGBENCH_NARGS_VARIABLE is a special value for least & greatest + * meaning #args >= 1; + * - PGBENCH_NARGS_CASE is for the "CASE WHEN ..." 
function, which + * has #args >= 3 and odd; + * - PGBENCH_NARGS_HASH is for hash functions, which have one required + * and one optional argument; * - tag: function identifier from PgBenchFunction enum */ static const struct @@ -259,10 +267,10 @@ static const struct "abs", 1, PGBENCH_ABS }, { - "least", -1, PGBENCH_LEAST + "least", PGBENCH_NARGS_VARIABLE, PGBENCH_LEAST }, { - "greatest", -1, PGBENCH_GREATEST + "greatest", PGBENCH_NARGS_VARIABLE, PGBENCH_GREATEST }, { "debug", 1, PGBENCH_DEBUG @@ -347,7 +355,25 @@ static const struct }, /* "case when ... then ... else ... end" construction */ { - "!case_end", -2, PGBENCH_CASE + "!case_end",
Re: General purpose hashing func in pgbench
Hello Fabien,

On 26/01/2018 09:28, Fabien COELHO wrote:
>
> Hello Ildar,
>
> Applies, compiles, runs.
>
> I still have a few very minor comments, sorry for this (hopefully)
> last iteration from my part. I'm kind of iterative...
>
> The XML documentation source should avoid a paragraph on one very long
> line, but rather be indented like other sections.
>
> I'd propose simplify the second part:
>
> Hash functions can be used, for example, to modify the distribution of
> random_zipfian or
> random_exponential
> functions in order to obtain scattered distribution.
> Thus the following pgbench script simulates possible real world
> workload
> typical for social media and blogging platforms where few accounts
> generate excessive load:
>
> ->
>
> Hash functions can be used to scatter the distribution of random
> functions such as random_zipfian or
> random_exponential.
> For instance, the following pgbench script simulates possible real
> world workload typical for social media and blogging platforms where
> few accounts generate excessive load:
>
> Comment the Assert(0) as an internal error that cannot happen.
>
> I'd suggest to compact the execution code by declaring int64 variable
> and coerce to int in one go, like the integer bitwise functions. I'm
> in favor to keeping them in their own case and not reuse this one.
>

I did everything you mention here and attached a new version of the patch. Thank you!
-- Ildar Musin Postgres Professional: http://www.postgrespro.com Russian Postgres Company diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml index 3dd492c..503dd79 100644 --- a/doc/src/sgml/ref/pgbench.sgml +++ b/doc/src/sgml/ref/pgbench.sgml @@ -874,13 +874,18 @@ pgbench options d -scale - current scale factor +client_id + unique number identifying the client session (starts from zero) -client_id - unique number identifying the client session (starts from zero) +default_seed + random seed used in hash functions by default + + + +scale + current scale factor @@ -1246,6 +1251,27 @@ pgbench options d 5 + hash(a [, seed ] ) + integer + alias for hash_murmur2() + hash(10, 5432) + -5817877081768721676 + + + hash_fnv1a(a [, seed ] ) + integer + https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function";>FNV-1a hash + hash_fnv1a(10, 5432) + -7793829335365542153 + + + hash_murmur2(a [, seed ] ) + integer + https://en.wikipedia.org/wiki/MurmurHash";>MurmurHash2 hash + hash_murmur2(10, 5432) + -5817877081768721676 + + int(x) integer cast to int @@ -1423,6 +1449,30 @@ f(x) = PHI(2.0 * parameter * (x - mu) / (max - min + 1)) / +Hash functions hash, hash_murmur2 and +hash_fnv1a accept an input value and an optional seed parameter. +In case the seed isn't provided the value of :default_seed +is used, which is initialized randomly unless set by the command-line +-D option. Hash functions can be used to scatter the +distribution of random functions such as random_zipfian or +random_exponential. 
For instance, the following pgbench +script simulates possible real world workload typical for social media and +blogging platforms where few accounts generate excessive load: + + +\set r random_zipfian(0, 1, 1.07) +\set k abs(hash(:r)) % 100 + + +In some cases several distinct distributions are needed which don't correlate with each other and this is when implicit seed parameter comes in handy: + + +\set k1 abs(hash(:r), :default_seed + 123) % 100 +\set k2 abs(hash(:r), :default_seed + 321) % 100 + + + + As an example, the full definition of the built-in TPC-B-like transaction is: diff --git a/src/bin/pgbench/exprparse.y b/src/bin/pgbench/exprparse.y index e23ca51..fc42c47 100644 --- a/src/bin/pgbench/exprparse.y +++ b/src/bin/pgbench/exprparse.y @@ -16,6 +16,10 @@ #include "pgbench.h" +#define PGBENCH_NARGS_VARIABLE (-1) +#define PGBENCH_NARGS_CASE (-2) +#define PGBENCH_NARGS_HASH (-3) + PgBenchExpr *expr_parse_result; static PgBenchExprList *make_elist(PgBenchExpr *exp, PgBenchExprList *list); @@ -226,9 +230,13 @@ make_uop(yyscan_t yyscanner, const char *operator, PgBenchExpr *expr) /* * List of available functions: * - fname: function name, "!..." for special internal functions - * - nargs: number of arguments - * -1 is a special value for least & greatest meaning #args >= 1 - * -2
Re: [HACKERS] Custom compression methods
Hello Ildus, I continue reviewing your patch. Here are some thoughts. 1. When I set column storage to EXTERNAL then I cannot set compression. Seems reasonable: create table test(id serial, msg text); alter table test alter column msg set storage external; alter table test alter column msg set compression pg_lz4; ERROR: storage for "msg" should be MAIN or EXTENDED But if I reorder the commands then it's ok: create table test(id serial, msg text); alter table test alter column msg set compression pg_lz4; alter table test alter column msg set storage external; \d+ test Table "public.test" Column | Type | ... | Storage | Compression +-+ ... +--+- id | integer | ... | plain| msg| text| ... | external | pg_lz4 So we should either allow the user to set compression settings even when storage is EXTERNAL (perhaps with a warning) or prohibit setting compression and external storage at the same time. The same issue exists with setting storage PLAIN. 2. I think the TOAST_COMPRESS_SET_RAWSIZE macro could be rewritten as follows to prevent overwriting the higher bits of 'info': ((toast_compress_header *) (ptr))->info = \ ((toast_compress_header *) (ptr))->info & ~RAWSIZEMASK | (len); It may not matter at the moment since it is only used once, but it could save some effort for other developers in the future. In TOAST_COMPRESS_SET_CUSTOM() instead of changing individual bits you may do something like this: #define TOAST_COMPRESS_SET_CUSTOM(ptr) \ do { \ ((toast_compress_header *) (ptr))->info = \ ((toast_compress_header *) (ptr))->info & RAWSIZEMASK | ((uint32) 0x02 << 30) \ } while (0) Also it would be nice if the bit flags were explained and maybe replaced by a macro. 3. In AlteredTableInfo, BulkInsertStateData and some functions (eg toast_insert_or_update) a hash table is used to keep the list of preserved compression methods per attribute. I think a simple array of List* would be sufficient in this case. 4. 
In optionListToArray() you can use list_qsort() to sort the options list instead of converting it manually into an array and then back to a list. 5. Redundant #includes: In heap.c: #include "access/reloptions.h" In tsvector.c: #include "catalog/pg_type.h" #include "common/pg_lzcompress.h" In relcache.c: #include "utils/datum.h" 6. Just a minor thing: no reason to change formatting in copy.c - heap_insert(resultRelInfo->ri_RelationDesc, tuple, mycid, - hi_options, bistate); + heap_insert(resultRelInfo->ri_RelationDesc, tuple, + mycid, hi_options, bistate); 7. Also in utility.c an extra new line was added, which isn't relevant to this patch. 8. In parse_utilcmd.h the 'extern' keyword was removed from the transformRuleStmt declaration, which doesn't make sense in this patch. 9. Comments. Again, they should be read by a native speaker. So just a few suggestions: toast_prepare_varlena() - comment needed invalidate_amoptions_cache() - comment format doesn't match other functions in the file In htup_details.h: /* tuple contain custom compressed * varlenas */ should be "contains" -- Ildar Musin i.mu...@postgrespro.ru
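The mask-then-OR pattern suggested in point 2 can be sketched outside of C. Below is an illustrative Python sketch, not the actual PostgreSQL definitions: the 30-bit size field, the RAWSIZEMASK value and the 0x02 flag value are assumptions inferred from the snippets above. The key point is clearing the target bits before OR-ing in the new value, which is what keeps the other bits of 'info' intact.

```python
# Illustrative sketch (not PostgreSQL source): packing a 30-bit raw size
# and a 2-bit compression-method flag into one 32-bit "info" word.
RAWSIZEMASK = 0x3FFFFFFF      # assumed: low 30 bits hold the raw (uncompressed) size
CUSTOM_FLAG = 0x02 << 30      # assumed: top 2 bits hold the compression-method flag

def set_rawsize(info: int, length: int) -> int:
    # Clear the size bits first so the flag bits are preserved.
    return (info & ~RAWSIZEMASK & 0xFFFFFFFF) | (length & RAWSIZEMASK)

def set_custom(info: int) -> int:
    # Keep the size bits, overwrite only the flag bits.
    return (info & RAWSIZEMASK) | CUSTOM_FLAG

info = 0
info = set_rawsize(info, 12345)
info = set_custom(info)
assert info & RAWSIZEMASK == 12345   # size survived setting the flag
assert info >> 30 == 0x02            # flag survived setting the size
```

Without the `& ~RAWSIZEMASK` step, a plain OR would merge old and new size bits, which is exactly the overwrite hazard the review points out.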
Re: [HACKERS] Custom compression methods
Hello Ildus, On 23.01.2018 16:04, Ildus Kurbangaliev wrote: On Mon, 22 Jan 2018 23:26:31 +0300 Ildar Musin wrote: Thanks for review! Attached new version of the patch. Fixed a few bugs, added more documentation and rebased to current master. You need to rebase to the latest master, there are some conflicts. I've applied it to the three days old master to try it. Done. As I can see the documentation is not yet complete. For example, there is no section for ALTER COLUMN ... SET COMPRESSION in ddl.sgml; and section "Compression Access Method Functions" in compression-am.sgml hasn't been finished. Not sure about ddl.sgml, it contains more common things, but since postgres contains only pglz by default there is not much to show. I've implemented an extension [1] to understand the way a developer would go to work with the new infrastructure. And for me it seems clear. (Except that it took me some effort to wrap my mind around varlena macros but it is probably a different topic). I noticed that you haven't covered 'cmdrop' in the regression tests and I saw the previous discussion about it. Have you considered using event triggers to handle the drop of column compression instead of the 'cmdrop' function? This way you would kill two birds with one stone: it still provides sufficient infrastructure to catch those events (and it is something postgres already has for different kinds of ddl commands) and it would be easier to test. I have added support for event triggers for ALTER SET COMPRESSION in the current version. An event trigger on ALTER can be used to replace the cmdrop function but it will be far from trivial. There is no easy way to tell from the command that an attribute's compression is actually being dropped. I've encountered unexpected behavior in the command 'CREATE TABLE ... (LIKE ...)'. It seems that it copies compression settings of the table attributes no matter which INCLUDING options are specified. E.g. 
create table xxx(id serial, msg text compression pg_lz4); alter table xxx alter column msg set storage external; \d+ xxx Table "public.xxx" Column | Type | ... | Storage | Compression | +-+ ... +--+-+ id | integer | ... | plain| | msg| text| ... | external | pg_lz4 | Now copy the table structure with "INCLUDING ALL": create table yyy (like xxx including all); \d+ yyy Table "public.yyy" Column | Type | ... | Storage | Compression | +-+ ... +--+-+ id | integer | ... | plain| | msg| text| ... | external | pg_lz4 | And now copy without "INCLUDING ALL": create table zzz (like xxx); \d+ zzz Table "public.zzz" Column | Type | ... | Storage | Compression | +-+ ... +--+-+ id | integer | ... | plain| | msg| text| ... | extended | pg_lz4 | As you can see, the compression option is copied anyway. I suggest adding a new INCLUDING COMPRESSION option to let the user explicitly specify whether or not to copy compression settings. I found a few phrases in the documentation that can be improved. But the documentation should be checked by a native speaker. In compression-am.sgml: "an compression access method" -> "a compression access method" "compression method method" -> "compression method" "compability" -> "compatibility" Probably "local-backend cached state" would be better to replace with "per backend cached state"? "Useful to store the parsed view of the compression options" -> "It could be useful for example to cache compression options" "and stores result of" -> "and stores the result of" "Called when CompressionAmOptions is creating." -> "Called when CompressionAmOptions is being initialized" "Note that in any system cache invalidation related with pg_attr_compression relation the options will be cleaned" -> "Note that any pg_attr_compression relation invalidation will cause all the cached acstate options cleared." "Function used to ..." -> "Function is used to ..." I think it would be nice to mention custom compression methods in storage.sgml. 
At this moment it only mentions built-in pglz compression. -- Ildar Musin i.mu...@postgrespro.ru
Re: General purpose hashing func in pgbench
Hello Fabien, On 18.01.2018 12:06, Fabien COELHO wrote: My intention was to introduce a seed variable which potentially could be used in different contexts, not only for hash functions. Yes, good point. I'd need it for the pseudo-random permutation function. I renamed it to 'hash_seed' for now. But what do you think about naming it simply 'seed' or choosing some other general name? ISTM "seed" is prone to being used for something else in a script. What about ":default_seed", which says it all? Sounds good, renamed to "default_seed". Some minor suggestions: In "make_func", I'd assert that nargs is positive in the default branch. Added assert for non-negative nargs (since there is the pi() function with zero arguments). The default seed may be created with just one assignment instead of repeated |= ops. Or not:-) Tried this, but after applying pgindent it turns into a mess : ) So I left it in the initial form. In evalStandardFunc, I'd suggest to avoid the ? construction and use an if and a direct setIntValue in both cases, removing the "result" variable, as done in other branches (eg bitwise operators...). Maybe something like: if (func == murmur2) setIntValue(retval, getHashXXX(val, seed)); else if (...) ... else Assert(0); so that it is structurally ready for other hash functions if needed. Done. Probably if more functions are added it would make sense to change it to "switch". Documentation: In the table, use official names in the description, and add wikipedia links, something like: FNV hash -> https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function";>FNV-1a hash murmur2 hash -> https://en.wikipedia.org/wiki/MurmurHash";>MurmurHash2 hash In the text: "Hash functions accepts" -> "Hash functions hash, <..> and <> accept*NO S*" "... provided hash_seed value is used" => "... provided the value of :hash_seed is used, which is initialized randomly unless set by the command-line -D option." 
ISTM that the Automatic Variable table should be in alphabetical order. Updated the documentation. Thanks! -- Ildar Musin i.mu...@postgrespro.ru diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml index 3dd492c..f3a4a4f 100644 --- a/doc/src/sgml/ref/pgbench.sgml +++ b/doc/src/sgml/ref/pgbench.sgml @@ -874,13 +874,18 @@ pgbench options d -scale - current scale factor +client_id + unique number identifying the client session (starts from zero) -client_id - unique number identifying the client session (starts from zero) +default_seed + random seed used in hash functions by default + + + +scale + current scale factor @@ -1246,6 +1251,27 @@ pgbench options d 5 + hash(a [, seed ] ) + integer + alias for hash_murmur2() + hash(10, 5432) + -5817877081768721676 + + + hash_fnv1a(a [, seed ] ) + integer + https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function";>FNV-1a hash + hash_fnv1a(10, 5432) + -7793829335365542153 + + + hash_murmur2(a [, seed ] ) + integer + https://en.wikipedia.org/wiki/MurmurHash";>MurmurHash2 hash + hash_murmur2(10, 5432) + -5817877081768721676 + + int(x) integer cast to int @@ -1423,6 +1449,22 @@ f(x) = PHI(2.0 * parameter * (x - mu) / (max - min + 1)) / +Hash functions hash, hash_murmur2 and hash_fnv1a accept an input value and an optional seed parameter. In case the seed isn't provided the value of :default_seed is used, which is initialized randomly unless set by the command-line -D option. Hash functions can be used, for example, to modify the distribution of random_zipfian or random_exponential functions in order to obtain scattered distribution. 
Thus the following pgbench script simulates a possible real world workload typical for social media and blogging platforms where few accounts generate excessive load: + + +\set r random_zipfian(0, 1, 1.07) +\set k abs(hash(:r)) % 100 + + +In some cases several distinct distributions which don't correlate with each other are needed, and this is when the implicit seed parameter comes in handy: + + +\set k1 abs(hash(:r, :default_seed + 123)) % 100 +\set k2 abs(hash(:r, :default_seed + 321)) % 100 + + + + As an example, the full definition of the built-in TPC-B-like transaction is: diff --git a/src/bin/pgbench/exprparse.y b/src/bin/pgbench/exprparse.y index e23ca51..fc42c47 100644 --- a/src/bin/pgbench/exprparse.y +++ b/src/bin/pgbench/exprparse.y @@ -16,6 +16,10 @@
Re: [HACKERS] Custom compression methods
Hello Ildus, 15/01/2018 00:49, Ildus Kurbangaliev wrote: > Attached a new version of the patch. Main changes: > > * compression as an access method > * pglz as default compression access method. > * PRESERVE syntax for tables rewrite control. > * pg_upgrade fixes > * support partitioned tables. > * more tests. > You need to rebase to the latest master, there are some conflicts. I've applied it to the three days old master to try it. As I can see the documentation is not yet complete. For example, there is no section for ALTER COLUMN ... SET COMPRESSION in ddl.sgml; and section "Compression Access Method Functions" in compression-am.sgml hasn't been finished. I've implemented an extension [1] to understand the way a developer would go to work with the new infrastructure. And for me it seems clear. (Except that it took me some effort to wrap my mind around varlena macros but it is probably a different topic). I noticed that you haven't covered 'cmdrop' in the regression tests and I saw the previous discussion about it. Have you considered using event triggers to handle the drop of column compression instead of the 'cmdrop' function? This way you would kill two birds with one stone: it still provides sufficient infrastructure to catch those events (and it is something postgres already has for different kinds of ddl commands) and it would be easier to test. Thanks! [1] https://github.com/zilder/pg_lz4 -- Ildar Musin Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Re: General purpose hashing func in pgbench
Hello Fabien, 17/01/2018 10:52, Fabien COELHO wrote: >> Here is a new version of the patch. I've split it into two parts. The >> first one is almost the same as v4 from [1] with some refactoring. >> The second part introduces the random_seed variable as you proposed. > > Patch 1 applies. Compilation fails, there are two "hash_seed" > declarations in "pgbench.c". > > Patch 2 applies cleanly on top of the previous patch and compiles, > because the variable is removed... > > If an ":hash_seed" pgbench variable is used, ISTM that there is no > need for a global variable at all, so the two patches are going back > and forth, which is unhelpful. ISTM better to provide just one > combined patch for the feature. > > If the hash_seed variable really needs to be kept, it should be an > "int64" variable, like other pgbench values. > > The len < 1 || len > 2 is checked twice, once in the "switch", once in > an "if" just after the "switch". Once is enough. I totally messed up doing git rebase and didn't double check the code. *facepalm* There shouldn't be a hash_seed variable and the second 'len < 1 || len > 2' check. Sorry for that, fixed in the attached patch. > Calling random() usually initializes only about 31 bits, so random > should be called 2 or maybe 3 times? Or maybe use the internal getrand > which has 48 pseudorandom bits? Done. I used code from get_random_uint64() as an example. > > For me "random seed" is what is passed to srandom, so the variable > should rather be named hash_seed because there could also be a random > seed (actually, there is in another parallel patch:-). Moreover, this > seed may or may not be random if set, so calling it "random_seed" is > not desirable. > My intention was to introduce a seed variable which potentially could be used in different contexts, not only for hash functions. I renamed it to 'hash_seed' for now. But what do you think about naming it simply 'seed' or choosing some other general name? 
>> I didn't do the executor simplification thing yet because I'm a >> little concerned about inventive users, who may want to change >> random_seed variable in runtime (which is possible since pgbench >> doesn't have read only variables aka constants AFAIK). > > If the user chooses to override hash_seed in their script, it is their > decision, the documentation has only to be clear about :hash_seed > being the default seed. I see no clear reason to work around this > possibility by evaluating the seed at parse time, especially as the > variable may not have its final value yet depending on the option > order. I'd suggest to just use make_variable("hash_seed") for the > default second argument and simplify the executor. That is a great idea, I didn't see that possibility. Done. > > The seed variable is not tested explicitly in the script, you could add > a "hash(5432) == hash(5432, :hash_seed)" for instance. > > It would be nice if a native English speaker could proofread the > documentation text. I'd suggest: "*an* optional seed parameter". "In > case *the* seed...". ":hash_seed". "shared for" -> > "shared by". "following listing" -> "following pgbench script". "few > accounts generates" -> "few accounts generate". > Done as well. > For the document example, I'd use larger values for the random & > modulo, eg 1 and 100. The drawback is that zipfian does a > costly computation of the generalized harmonic number when the > parameter is lower than 1.0. For cities, the parameter found by Zipf > was 1.07 (says Wikipedia). Maybe use this historical value. Or maybe > use an exponential distribution in the example. > Changed parameter to 1.07. Thanks! 
-- Ildar Musin Postgres Professional: http://www.postgrespro.com Russian Postgres Company diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml index 3dd492c..c31254c 100644 --- a/doc/src/sgml/ref/pgbench.sgml +++ b/doc/src/sgml/ref/pgbench.sgml @@ -882,6 +882,11 @@ pgbench options d client_id unique number identifying the client session (starts from zero) + + +hash_seed + random seed used in hash functions by default + @@ -1246,6 +1251,27 @@ pgbench options d 5 + hash(a [, seed ] ) + integer + alias for hash_murmur2() + hash(10, 5432) + -5817877081768721676 + + + hash_fnv1a(a [, seed ] ) + integer + FNV hash + hash_fnv1a(10, 5432) + -779382933536
Re: General purpose hashing func in pgbench
Hi Fabien, 13/01/2018 11:16, Fabien COELHO wrote: > > Hello Ildar, > >>> so that different instances of the hash function within one script would >>> have different seeds. Yes, that is a good idea, I can do that. >>> >> Added this feature in the attached patch. But on second thought this could >> be something that the user won't expect. For example, they may want to run >> pgbench with two scripts: >> - the first one updates a row by a key that is a hashed random_zipfian >> value; >> - the second one reads a row by a key generated the same way >> (that is actually what YCSB workloads A and B do) >> >> It feels natural to write something like this: >> \set rnd random_zipfian(0, 100, 0.99) >> \set key abs(hash(:rnd)) % 1000 >> in both scripts and expect that they both would have the same >> distribution. But they wouldn't. We could of course describe this >> implicit behaviour in the documentation, but ISTM that a shared seed would be >> clearer. > > I think that it depends on the use case, that both can be useful, so > there should be a way to do either. > > With "always different" default seed, distinct distributions are achieved > with: > > -- DIFF distinct seeds inside and between runs > \set i1 abs(hash(:r1)) % 1000 > \set j1 abs(hash(:r2)) % 1000 > > and the same distribution can be done with an explicit seed: > > -- DIFF same seed inside and between runs > \set i1 abs(hash(:r1, 5432)) % 1000 > \set j1 abs(hash(:r2, 5432)) % 1000 > > The drawback is that the same seed is used between runs in this case, > which is not desirable. 
This could be circumvented by adding the > random seed as an automatic variable and using it, eg: > > -- DIFF same seed inside run, distinct between runs > \set i1 abs(hash(:r1, :random_seed + 5432)) % 1000 > \set j1 abs(hash(:r2, :random_seed + 2345)) % 1000 > > > Now with a shared hash_seed the same distribution is by default: > > -- SHARED same underlying hash_seed inside run, distinct between runs > \set i1 abs(hash(:r1)) % 1000 > \set j1 abs(hash(:r2)) % 1000 > > However some trick is needed now to get distinct seeds. With > > -- SHARED distinct seed inside run, but same between runs > \set i1 abs(hash(:r1, 5432)) % 1000 > \set j1 abs(hash(:r2, 2345)) % 1000 > > We are back to the same issue as in the previous case because then the > distribution is the same from one run to the next, which is not > desirable. I found this workaround trick: > > -- SHARED distinct seeds inside and between runs > \set i1 abs(hash(:r1, hash(5432))) % 1000 > \set j1 abs(hash(:r2, hash(2345))) % 1000 > > Or with a new :hash_seed or :random_seed automatic variable, we could > also have: > > -- SHARED distinct seeds inside and between runs > \set i1 abs(hash(:r1, :hash_seed + 5432)) % 1000 > \set j1 abs(hash(:r2, :hash_seed + 2345)) % 1000 > > It provides controllable distinct seeds between runs but equal ones > between runs if desired, by reusing the same value to be hashed as a seed. > > I also agree with your argument that the user may reasonably expect > that hash(5432) == hash(5432) inside and between scripts, at least on > the same run, so would be surprised that it is not the case. > > So I've changed my mind, I'm sorry for making you go back and forth > on the subject. I'm now okay with one shared 64-bit hash seed, with > clear documentation about the fact, and an outline of the trick to > achieve distinct distributions inside a run if desired and why it > would be desirable to avoid correlations. 
Also, I think that providing > the seed as an automatic variable (:hash_seed or :hseed or whatever) > would make some sense as well. Maybe this could be used as a way to > fix the seed explicitly, eg: > > pgbench -D hash_seed=1234 ... > > Would use this value instead of the randomly generated one. Also, with > that the default inserted second argument could be simply > ":hash_seed", which would simplify the executor which would not have > to check for an optional second argument. > Here is a new version of the patch. I've split it into two parts. The first one is almost the same as v4 from [1] with some refactoring. The second part introduces the random_seed variable as you proposed. I didn't do the executor simplification thing yet because I'm a little concerned about inventive users, who may want to change the random_seed variable at runtime (which is possible since pgbench doesn't have read only variables aka constants AFAIK). [1] https://www.postgresql.org/message-id/43a8fbbb-32fa-6478-30a9-f64041adf019%40postgrespro.ru -- Ildar Musin Pos
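The seeding strategies discussed in this thread can be illustrated outside pgbench. The following Python sketch uses a keyed BLAKE2b hash as a stand-in for pgbench's hash(value, seed) — the actual pgbench functions are murmur2 and FNV-1a, so the resulting values differ, but the behaviour under discussion is the same: the same seed yields the same reproducible mapping, while distinct per-line seeds decorrelate two key streams derived from the same draw. DEFAULT_SEED here plays the role of the :hash_seed / :default_seed automatic variable.

```python
import hashlib

def h(value: int, seed: int) -> int:
    """Stand-in for pgbench's hash(value, seed): keyed BLAKE2b, not murmur2."""
    data = value.to_bytes(8, "little", signed=True)
    key = seed.to_bytes(8, "little", signed=True)
    digest = hashlib.blake2b(data, key=key, digest_size=8).digest()
    return int.from_bytes(digest, "little", signed=True)

DEFAULT_SEED = 5432   # role of :hash_seed, randomly initialized per run in pgbench

r1, r2 = 7, 7         # two zipfian draws that happen to collide
# Distinct per-line seeds -> decorrelated keys even for equal draws:
i1 = abs(h(r1, DEFAULT_SEED + 5432)) % 1000
j1 = abs(h(r2, DEFAULT_SEED + 2345)) % 1000
# Same seed -> same mapping, reproducible across scripts within one run:
assert h(r1, DEFAULT_SEED) == h(r2, DEFAULT_SEED)
print(i1, j1)
```

Because DEFAULT_SEED changes between runs (it is random unless pinned with -D), the per-line offsets give seeds that are distinct within a run yet not repeated across runs — exactly the "SHARED distinct seeds inside and between runs" case above.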
Re: Minor fix for pgbench documentation
Hello Fabien, 13/01/2018 19:30, Fabien COELHO wrote: > >> Here is a patch that adds the missing random_zipfian function to the paragraph >> in the pgbench documentation about random function parameterization. > > Indeed. > > Patch applies cleanly, doc build ok. Marked as "ready". > > I have added it to the next commitfest. > Thank you for your review! -- Ildar Musin Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Re: General purpose hashing func in pgbench
Hello Fabien, 11/01/2018 19:21, Ildar Musin wrote: > > 10/01/2018 21:42, Fabien COELHO wrote: >> Hmm. I do not think that we should want a shared seed value. The seed >> should be different for each call so as to avoid undesired >> correlations. If wanted, correlation could be obtained by using an >> explicit identical seed. >> >> ISTM that the best way to add the seed is to call random() when the >> second arg is missing in make_func. Also, this means that the executor >> would always get its two arguments, so it would simplify the code there. >> > Ok, I think I understand what you meant. You meant a case like the following: > > \set x random(1, 100) > \set h1 hash(:x) > \set h2 hash(:x) -- will have different seed from h1 > > so that different instances of the hash function within one script would > have different seeds. Yes, that is a good idea, I can do that. > Added this feature in the attached patch. But on second thought this could be something that the user won't expect. For example, they may want to run pgbench with two scripts: - the first one updates a row by a key that is a hashed random_zipfian value; - the second one reads a row by a key generated the same way (that is actually what YCSB workloads A and B do) It feels natural to write something like this: \set rnd random_zipfian(0, 100, 0.99) \set key abs(hash(:rnd)) % 1000 in both scripts and expect that they both would have the same distribution. But they wouldn't. We could of course describe this implicit behaviour in the documentation, but ISTM that a shared seed would be clearer. Thanks! 
-- Ildar Musin Postgres Professional: http://www.postgrespro.com Russian Postgres Company diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml index 3dd492c..c575f19 100644 --- a/doc/src/sgml/ref/pgbench.sgml +++ b/doc/src/sgml/ref/pgbench.sgml @@ -1246,6 +1246,27 @@ pgbench options d 5 + hash(a [, seed ] ) + integer + alias for hash_murmur2() + hash(10, 5432) + -5817877081768721676 + + + hash_fnv1a(a [, seed ] ) + integer + FNV hash + hash_fnv1a(10, 5432) + -7793829335365542153 + + + hash_murmur2(a [, seed ] ) + integer + murmur2 hash + hash_murmur2(10, 5432) + -5817877081768721676 + + int(x) integer cast to int diff --git a/src/bin/pgbench/exprparse.y b/src/bin/pgbench/exprparse.y index e23ca51..36cad30 100644 --- a/src/bin/pgbench/exprparse.y +++ b/src/bin/pgbench/exprparse.y @@ -16,6 +16,10 @@ #include "pgbench.h" +#define PGBENCH_NARGS_VARIABLE (-1) +#define PGBENCH_NARGS_CASE (-2) +#define PGBENCH_NARGS_HASH (-3) + PgBenchExpr *expr_parse_result; static PgBenchExprList *make_elist(PgBenchExpr *exp, PgBenchExprList *list); @@ -226,9 +230,13 @@ make_uop(yyscan_t yyscanner, const char *operator, PgBenchExpr *expr) /* * List of available functions: * - fname: function name, "!..." for special internal functions - * - nargs: number of arguments - * -1 is a special value for least & greatest meaning #args >= 1 - * -2 is for the "CASE WHEN ..." function, which has #args >= 3 and odd + * - nargs: number of arguments. Special cases: + * - PGBENCH_NARGS_VARIABLE is a special value for least & greatest + * meaning #args >= 1; + * - PGBENCH_NARGS_CASE is for the "CASE WHEN ..." 
function, which + * has #args >= 3 and odd; + * - PGBENCH_NARGS_HASH is for hash functions, which have one required + * and one optional argument; * - tag: function identifier from PgBenchFunction enum */ static const struct @@ -259,10 +267,10 @@ static const struct "abs", 1, PGBENCH_ABS }, { - "least", -1, PGBENCH_LEAST + "least", PGBENCH_NARGS_VARIABLE, PGBENCH_LEAST }, { - "greatest", -1, PGBENCH_GREATEST + "greatest", PGBENCH_NARGS_VARIABLE, PGBENCH_GREATEST }, { "debug", 1, PGBENCH_DEBUG @@ -347,7 +355,16 @@ static const struct }, /* "case when ... then ... else ... end" construction */ { - "!case_end", -2, PGBENCH_CASE + "!case_end", PGBENCH_NARGS_CASE, PGBENCH_CASE + }, + { + "hash", PGBENCH_NARGS_HASH, PGBENCH_HASH_MURMUR2 + }, + { + "hash_murmur2", PGBENCH_NARGS_HASH, PGBENCH_HASH_MURMUR2 + }, + { + "hash_fnv1a", PGBENCH_N
Minor fix for pgbench documentation
Hi hackers, Here is a patch that adds the missing random_zipfian function to the paragraph in the pgbench documentation about random function parameterization. -- Ildar Musin Postgres Professional: http://www.postgrespro.com Russian Postgres Company diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml index 3dd492c..7f2fc1f 100644 --- a/doc/src/sgml/ref/pgbench.sgml +++ b/doc/src/sgml/ref/pgbench.sgml @@ -1332,9 +1332,10 @@ pgbench options d The random function generates values using a uniform distribution, that is all the values are drawn within the specified -range with equal probability. The random_exponential and -random_gaussian functions require an additional double -parameter which determines the precise shape of the distribution. +range with equal probability. The random_exponential, +random_gaussian and random_zipfian +functions require an additional double parameter which determines the precise +shape of the distribution.
Re: General purpose hashing func in pgbench
10/01/2018 21:42, Fabien COELHO wrote: > > Hmm. I do not think that we should want a shared seed value. The seed > should be different for each call so as to avoid undesired > correlations. If wanted, correlation could be obtained by using an > explicit identical seed. > > ISTM that the best way to add the seed is to call random() when the > second arg is missing in make_func. Also, this means that the executor > would always get its two arguments, so it would simplify the code there. > Ok, I think I understand what you meant. You meant a case like the following: \set x random(1, 100) \set h1 hash(:x) \set h2 hash(:x) -- will have different seed from h1 so that different instances of the hash function within one script would have different seeds. Yes, that is a good idea, I can do that. -- Ildar Musin Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Re: General purpose hashing func in pgbench
Hello Fabien, 10/01/2018 21:42, Fabien COELHO wrote: >>>> Should we probably add some infrastructure for optional arguments? >>> >>> You can look at the handling of "CASE" which may or may not have an >>> "ELSE" clause. >>> >>> I'd suggest you use a new negative argument with the special meaning >>> for hash, and create the seed value when missing when building the >>> function, so as to simplify the executor code. > >> Added a new nargs option -3 for hash functions and moved the argument check >> to the parser. It's starting to look a bit odd and I'm thinking about >> replacing bare numbers (-1, -2, -3) with #defined macros. E.g.: >> >> #define PGBENCH_NARGS_VARIABLE (-1) >> #define PGBENCH_NARGS_CASE (-2) >> #define PGBENCH_NARGS_HASH (-3) > > Yes, I'm more than fine with improving readability. > Added macros. >>> Instead of 0, I'd consider providing a random default so that the >>> hashing behavior is not the same from one run to the next. What do you >>> think? >> >> Makes sense, since each thread also initializes its random_state with >> random numbers before starting. So I added a global variable 'hash_seed' and >> initialize it with random() before the threads are spawned. > > Hmm. I do not think that we should want a shared seed value. The seed > should be different for each call so as to avoid undesired > correlations. If wanted, correlation could be obtained by using an > explicit identical seed. Probably I'm missing something but I cannot see the point. If we change the seed on every invocation then we get a uniform-like distribution (see attached image). And we don't get the same hash value for the same input, which is the whole point of hash functions. Maybe I didn't understand you correctly. Anyway I've attached a new version with some tests and docs added. 
-- Ildar Musin Postgres Professional: http://www.postgrespro.com Russian Postgres Company diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml index 3dd492c..c575f19 100644 --- a/doc/src/sgml/ref/pgbench.sgml +++ b/doc/src/sgml/ref/pgbench.sgml @@ -1246,6 +1246,27 @@ pgbench options d 5 + hash(a [, seed ] ) + integer + alias for hash_murmur2() + hash(10, 5432) + -5817877081768721676 + + + hash_fnv1a(a [, seed ] ) + integer + FNV hash + hash_fnv1a(10, 5432) + -7793829335365542153 + + + hash_murmur2(a [, seed ] ) + integer + murmur2 hash + hash_murmur2(10, 5432) + -5817877081768721676 + + int(x) integer cast to int diff --git a/src/bin/pgbench/exprparse.y b/src/bin/pgbench/exprparse.y index e23ca51..9668b3d 100644 --- a/src/bin/pgbench/exprparse.y +++ b/src/bin/pgbench/exprparse.y @@ -16,6 +16,10 @@ #include "pgbench.h" +#define PGBENCH_NARGS_VARIABLE (-1) +#define PGBENCH_NARGS_CASE (-2) +#define PGBENCH_NARGS_HASH (-3) + PgBenchExpr *expr_parse_result; static PgBenchExprList *make_elist(PgBenchExpr *exp, PgBenchExprList *list); @@ -226,9 +230,13 @@ make_uop(yyscan_t yyscanner, const char *operator, PgBenchExpr *expr) /* * List of available functions: * - fname: function name, "!..." for special internal functions - * - nargs: number of arguments - * -1 is a special value for least & greatest meaning #args >= 1 - * -2 is for the "CASE WHEN ..." function, which has #args >= 3 and odd + * - nargs: number of arguments. Special cases: + * - PGBENCH_NARGS_VARIABLE is a special value for least & greatest + * meaning #args >= 1; + * - PGBENCH_NARGS_CASE is for the "CASE WHEN ..." 
function, which + * has #args >= 3 and odd; + * - PGBENCH_NARGS_HASH is for hash functions, which have one required + * and one optional argument; * - tag: function identifier from PgBenchFunction enum */ static const struct @@ -259,10 +267,10 @@ static const struct "abs", 1, PGBENCH_ABS }, { - "least", -1, PGBENCH_LEAST + "least", PGBENCH_NARGS_VARIABLE, PGBENCH_LEAST }, { - "greatest", -1, PGBENCH_GREATEST + "greatest", PGBENCH_NARGS_VARIABLE, PGBENCH_GREATEST }, { "debug", 1, PGBENCH_DEBUG @@ -347,7 +355,16 @@ static const struct }, /* "case when ... then ... else ... end" construction */ { - "!cas
Re: General purpose hashing func in pgbench
10/01/2018 16:35, Ildar Musin writes: > 09/01/2018 23:11, Fabien COELHO writes: >> Hello Ildar, >> >>> Sorry for the long delay. I've added a hash() function which is just an >>> alias for murmur2. I've also utilized the variable-arguments feature from >>> the least()/greatest() functions to make an optional seed parameter, but I >>> consider this a hack. >> Patch needs a rebase after Teodor's push of a set of pgbench functions. > Done. Congratulations on your patch finally being committed : ) >>> Should we perhaps add some infrastructure for optional arguments? >> You can look at the handling of "CASE" which may or may not have an >> "ELSE" clause. >> >> I'd suggest you use a new negative argument with the special meaning >> for hash, and create the seed value when missing when building the >> function, so as to simplify the executor code. > Added a new nargs option -3 for hash functions and moved the argument check > to the parser. It's starting to look a bit odd and I'm thinking about > replacing the bare numbers (-1, -2, -3) with #defined macros. E.g.: > > #define PGBENCH_NARGS_VARIABLE (-1) > #define PGBENCH_NARGS_CASE (-2) > #define PGBENCH_NARGS_HASH (-3) >> Instead of 0, I'd consider providing a random default so that the >> hashing behavior is not the same from one run to the next. What do you >> think? >> > Makes sense, since each thread also initializes its random_state with > random numbers before starting. So I added a global variable 'hash_seed' and > initialize it with random() before the threads are spawned. >> Like the previous version, murmur2 with seed looks much more random >> than fnv with seed...
>> Sorry, here is the patch -- Ildar Musin Postgres Professional: http://www.postgrespro.com Russian Postgres Company diff --git a/src/bin/pgbench/exprparse.y b/src/bin/pgbench/exprparse.y index e23ca51..847b011 100644 --- a/src/bin/pgbench/exprparse.y +++ b/src/bin/pgbench/exprparse.y @@ -349,6 +349,15 @@ static const struct { "!case_end", -2, PGBENCH_CASE }, + { + "hash", -3, PGBENCH_HASH_MURMUR2 + }, + { + "hash_murmur2", -3, PGBENCH_HASH_MURMUR2 + }, + { + "hash_fnv1a", -3, PGBENCH_HASH_FNV1A + }, /* keep as last array element */ { NULL, 0, 0 @@ -447,6 +456,15 @@ make_func(yyscan_t yyscanner, int fnumber, PgBenchExprList *args) expr_yyerror_more(yyscanner, "odd and >= 3 number of arguments expected", "case control structure"); } + /* special case: hash functions with optional arguments */ + if (PGBENCH_FUNCTIONS[fnumber].nargs == -3) + { + int len = elist_length(args); + + if (len < 1 || len > 2) + expr_yyerror_more(yyscanner, "unexpected number of arguments", + PGBENCH_FUNCTIONS[fnumber].fname); + } expr->etype = ENODE_FUNCTION; expr->u.function.function = PGBENCH_FUNCTIONS[fnumber].tag; diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c index 31ea6ca..6bf94cc 100644 --- a/src/bin/pgbench/pgbench.c +++ b/src/bin/pgbench/pgbench.c @@ -61,6 +61,14 @@ #define ERRCODE_UNDEFINED_TABLE "42P01" /* + * Hashing constants + */ +#define FNV_PRIME 0x10001b3 +#define FNV_OFFSET_BASIS 0xcbf29ce484222325 +#define MM2_MUL 0xc6a4a7935bd1e995 +#define MM2_ROT 47 + +/* * Multi-platform pthread implementations */ @@ -439,6 +447,8 @@ static int num_scripts;/* number of scripts in sql_script[] */ static int num_commands = 0; /* total number of Command structs */ static int64 total_weight = 0; +static int hash_seed; /* default seed used in hash functions */ + static int debug = 0; /* debug flag */ /* Builtin test scripts */ @@ -915,6 +925,51 @@ getZipfianRand(TState *thread, int64 min, int64 max, double s) } /* + * FNV-1a hash function + */ +static int64 
+getHashFnv1a(int64 val, uint64 seed) +{ + int64 result; + int i; + + result = FNV_OFFSET_BASIS ^ seed; + for (i = 0; i < 8; ++i) + { + int32 octet = val & 0xff; + + val = val >> 8; + result = result ^ octet; + result = result * FNV_PRIME; + } + + return result; +} + +/* + * Murmur2 hash function + */ +static int64 +getHashMurmur2(int64 val, uint64 seed) +{ + uint64 result = seed ^ (sizeof(int64) * MM2_MUL); + uint64 k = (uint64) val
Re: General purpose hashing func in pgbench
09/01/2018 23:11, Fabien COELHO writes: > > Hello Ildar, > >> Sorry for the long delay. I've added a hash() function which is just an >> alias for murmur2. I've also utilized the variable-arguments feature from >> the least()/greatest() functions to make an optional seed parameter, but I >> consider this a hack. > > Patch needs a rebase after Teodor's push of a set of pgbench functions. Done. Congratulations on your patch finally being committed : ) > >> Should we perhaps add some infrastructure for optional arguments? > > You can look at the handling of "CASE" which may or may not have an > "ELSE" clause. > > I'd suggest you use a new negative argument with the special meaning > for hash, and create the seed value when missing when building the > function, so as to simplify the executor code. Added a new nargs option -3 for hash functions and moved the argument check to the parser. It's starting to look a bit odd and I'm thinking about replacing the bare numbers (-1, -2, -3) with #defined macros. E.g.: #define PGBENCH_NARGS_VARIABLE (-1) #define PGBENCH_NARGS_CASE (-2) #define PGBENCH_NARGS_HASH (-3) > > Instead of 0, I'd consider providing a random default so that the > hashing behavior is not the same from one run to the next. What do you > think? > Makes sense, since each thread also initializes its random_state with random numbers before starting. So I added a global variable 'hash_seed' and initialize it with random() before the threads are spawned. > Like the previous version, murmur2 with seed looks much more random > than fnv with seed... > -- Ildar Musin Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Re: General purpose hashing func in pgbench
Hello Fabien, 25/12/2017 19:17, Fabien COELHO writes: > >>> However, the key can be used if controlled so that different values do >>> not have the same randomization in different parts of the script, so as >>> to avoid using the same patterns for different things if not desirable. >> >> The original murmur algorithm accepts a seed as a parameter, which can be used >> for this purpose. I used the value itself as a seed in the patch, but I can >> turn it into a function argument. > > Hmmm. Possibly. However it needs a default, something like > > x = hash(v[, key]) > > with key some pseudo-random value? > Sorry for the long delay. I've added a hash() function which is just an alias for murmur2. I've also utilized the variable-arguments feature from the least()/greatest() functions to make an optional seed parameter, but I consider this a hack. Should we perhaps add some infrastructure for optional arguments? Docs and tests are on their way. Thanks! -- Ildar Musin Postgres Professional: http://www.postgrespro.com Russian Postgres Company diff --git a/src/bin/pgbench/exprparse.y b/src/bin/pgbench/exprparse.y index 26494fd..340c020 100644 --- a/src/bin/pgbench/exprparse.y +++ b/src/bin/pgbench/exprparse.y @@ -131,7 +131,8 @@ make_op(yyscan_t yyscanner, const char *operator, * List of available functions: * - fname: function name * - nargs: number of arguments - -1 is a special value for least & greatest meaning #args >= 1 + -1 is a special value for least, greatest & hash functions + meaning variable arguments number * - tag: function identifier from PgBenchFunction enum */ static const struct @@ -200,6 +201,15 @@ static const struct { "power", 2, PGBENCH_POW }, + { + "hash", -1, PGBENCH_HASH_MURMUR2 + }, + { + "hash_murmur2", -1, PGBENCH_HASH_MURMUR2 + }, + { + "hash_fnv1a", -1, PGBENCH_HASH_FNV1A + }, /* keep as last array element */ { NULL, 0, 0 diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c index fc2c734..71e4814 100644 --- a/src/bin/pgbench/pgbench.c +++
b/src/bin/pgbench/pgbench.c @@ -61,6 +61,14 @@ #define ERRCODE_UNDEFINED_TABLE "42P01" /* + * Hashing constants + */ +#define FNV_PRIME 0x10001b3 +#define FNV_OFFSET_BASIS 0xcbf29ce484222325 +#define MM2_MUL 0xc6a4a7935bd1e995 +#define MM2_ROT 47 + +/* * Multi-platform pthread implementations */ @@ -912,6 +920,51 @@ getZipfianRand(TState *thread, int64 min, int64 max, double s) } /* + * FNV-1a hash function + */ +static int64 +getHashFnv1a(int64 val, uint64 seed) +{ + int64 result; + int i; + + result = FNV_OFFSET_BASIS ^ seed; + for (i = 0; i < 8; ++i) + { + int32 octet = val & 0xff; + + val = val >> 8; + result = result ^ octet; + result = result * FNV_PRIME; + } + + return result; +} + +/* + * Murmur2 hash function + */ +static int64 +getHashMurmur2(int64 val, uint64 seed) +{ + uint64 result = seed ^ (sizeof(int64) * MM2_MUL); + uint64 k = (uint64) val; + + k *= MM2_MUL; + k ^= k >> MM2_ROT; + k *= MM2_MUL; + + result ^= k; + result *= MM2_MUL; + + result ^= result >> MM2_ROT; + result *= MM2_MUL; + result ^= result >> MM2_ROT; + + return (int64) result; +} + +/* * Initialize the given SimpleStats struct to all zeroes */ static void @@ -1868,6 +1921,37 @@ evalFunc(TState *thread, CState *st, return true; } + /* hashing */ + case PGBENCH_HASH_FNV1A: + case PGBENCH_HASH_MURMUR2: + { + int64 val; + int64 seed = 0; + int64 result; + + Assert(nargs >= 1); + + if (!coerceToInt(&vargs[0], &val)) + return false; + + if (nargs > 2) + { + fprintf(stderr, + "hash function accepts maximum of two arguments\n"); + return false; + } + + /* read optional seed value */ + if (nargs > 1) + if (!coerceToInt(&vargs[1], &seed)) + return false; + +
Re: General purpose hashing func in pgbench
25/12/2017 17:12, Fabien COELHO writes: > > However, the key can be used if controlled so that different values do > not have the same randomization in different parts of the script, so as > to avoid using the same patterns for different things if not desirable. The original murmur algorithm accepts a seed as a parameter, which can be used for this purpose. I used the value itself as a seed in the patch, but I can turn it into a function argument. > > For the pgbench pseudo-random permutation I'm looking at, the key can > be specified to control whether the same pattern or not should be used. > Just curious, which algorithm do you intend to choose? -- Ildar Musin Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Re: General purpose hashing func in pgbench
Hello Fabien, 24/12/2017 11:12, Fabien COELHO writes: > > Yep. The ugliness is significantly linked to the choice of name. With > MM2_MUL and MM2_ROT ISTM that it is more readable: > >> k *= MM2_MUL; >> k ^= k >> MM2_ROT; >> k *= MM2_MUL; >> result ^= k; >> result *= MM2_MUL; Ok, will do. > >> [...] So I'd better leave it the way it is. Actually I was thinking >> to do the same to fnv1a too : ) > > I think that the implementation style should be homogeneous, so I'd > suggest at least sticking to one style. > > I noticed from the source of all human knowledge (aka Wikipedia :-) > that there seems to be a murmur3 successor. Have you considered it? > One good reason to skip it would be that the implementation is long > and complex. I'm not sure about an 8-byte-input simplified version. Murmur2 naturally supports 8-byte data. Murmur3 has 32- and 128-bit versions. So to handle int64 I could either 1) split the input value into two halves and somehow combine the results of the 32-bit version, or 2) use the 128-bit version and discard the higher bytes. Btw, postgres core already has a 32-bit murmur3 implementation, but it only uses the finalization part of the algorithm (see murmurhash32). As my colleague Yura Sokolov told me in a recent conversation, it alone provides pretty good randomization. I haven't tried it yet though. > > Just a question: have you looked at SipHash24? > > https://en.wikipedia.org/wiki/SipHash > > The interesting point is that it can use a key and seems somehow > cryptographically secure, for a similar cost. However, how to > decide on/control the key is unclear. > Not yet. As far as I understand from the wiki, its main feature is to prevent attacks with crafted input data. How can that be useful in benchmarking, unless it shows superior performance and randomization? -- Ildar Musin Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Re: General purpose hashing func in pgbench
21/12/2017 18:26, Fabien COELHO writes: > >> I think it is not commitfest ready yet -- I need to add some >> documentation and tests first. > > Yes, doc & tests are missing. > > From your figures, the murmur2 algorithm output looks way better. I'm > wondering whether it makes sense to provide a bad hash function if a > good/better one is available, unless the bad one actually appears in > some benchmark... So I would suggest removing fnv1a. Actually the "bad" one appears in YCSB. But if we should choose only one, I would stick with murmur too, given that it provides better results while having similar computational complexity. > > One implementation puts constants in defines, the other one uses "const > int". The practice in pgbench seems to be to use defines (e.g. > MIN_GAUSSIAN_PARAM...), so I would suggest sticking to this style. That had been my intention from the start, until I coded it that way and it looked ugly and hard to read (IMHO), like: k *= MURMUR2_M; k ^= k >> MURMUR2_R; k *= MURMUR2_M; result ^= k; result *= MURMUR2_M; ...etc. And besides, it is not a parameter to be tuned or something someone would ever want to change; it appears in just a single function. So I'd better leave it the way it is. Actually I was thinking of doing the same to fnv1a too : ) -- Ildar Musin Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Re: General purpose hashing func in pgbench
21/12/2017 15:44, Fabien COELHO writes: > >>> Add the submission to the next CF? >> I think it is not commitfest ready yet -- I need to add some >> documentation and tests first. > > It just helps that the thread is referenced, and the review process > has started anyway. > You are right, I've submitted the patch to the upcoming commitfest. Thanks! -- Ildar Musin Postgres Professional: http://www.postgrespro.com Russian Postgres Company
IndexTupleDSize macro seems redundant
Hi all, While I was looking through the indexes code I got confused by a couple of macros - IndexTupleSize() and IndexTupleDSize() - which seem to do the same thing, with the only difference that the first one takes a pointer as an argument while the second one takes a struct. And in most cases IndexTupleDSize() is used with a dereferenced index tuple, where IndexTupleSize() would suit perfectly. Is there a particular reason to have them both? I've made a patch that removes the IndexTupleDSize macro. All the tests have passed. -- Ildar Musin i.mu...@postgrespro.ru diff --git a/src/backend/access/hash/hash_xlog.c b/src/backend/access/hash/hash_xlog.c index f19f6fd..a133143 100644 --- a/src/backend/access/hash/hash_xlog.c +++ b/src/backend/access/hash/hash_xlog.c @@ -558,7 +558,7 @@ hash_xlog_move_page_contents(XLogReaderState *record) Size itemsz; OffsetNumber l; -itemsz = IndexTupleDSize(*itup); +itemsz = IndexTupleSize(itup); itemsz = MAXALIGN(itemsz); data += itemsz; @@ -686,7 +686,7 @@ hash_xlog_squeeze_page(XLogReaderState *record) Size itemsz; OffsetNumber l; -itemsz = IndexTupleDSize(*itup); +itemsz = IndexTupleSize(itup); itemsz = MAXALIGN(itemsz); data += itemsz; diff --git a/src/backend/access/hash/hashinsert.c b/src/backend/access/hash/hashinsert.c index dc08db9..b90d417 100644 --- a/src/backend/access/hash/hashinsert.c +++ b/src/backend/access/hash/hashinsert.c @@ -55,7 +55,7 @@ _hash_doinsert(Relation rel, IndexTuple itup, Relation heapRel) hashkey = _hash_get_indextuple_hashkey(itup); /* compute item size too */ - itemsz = IndexTupleDSize(*itup); + itemsz = IndexTupleSize(itup); itemsz = MAXALIGN(itemsz); /* be safe, PageAddItem will do this but we * need to be consistent */ @@ -222,7 +222,7 @@ restart_insert: XLogRegisterBuffer(1, metabuf, REGBUF_STANDARD); XLogRegisterBuffer(0, buf, REGBUF_STANDARD); - XLogRegisterBufData(0, (char *) itup, IndexTupleDSize(*itup)); + XLogRegisterBufData(0, (char *) itup, IndexTupleSize(itup)); recptr = XLogInsert(RM_HASH_ID,
XLOG_HASH_INSERT); @@ -309,7 +309,7 @@ _hash_pgaddmultitup(Relation rel, Buffer buf, IndexTuple *itups, { Size itemsize; - itemsize = IndexTupleDSize(*itups[i]); + itemsize = IndexTupleSize(itups[i]); itemsize = MAXALIGN(itemsize); /* Find where to insert the tuple (preserving page's hashkey ordering) */ diff --git a/src/backend/access/hash/hashovfl.c b/src/backend/access/hash/hashovfl.c index c206e70..b5ec0f6 100644 --- a/src/backend/access/hash/hashovfl.c +++ b/src/backend/access/hash/hashovfl.c @@ -890,7 +890,7 @@ readpage: itup = (IndexTuple) PageGetItem(rpage, PageGetItemId(rpage, roffnum)); - itemsz = IndexTupleDSize(*itup); + itemsz = IndexTupleSize(itup); itemsz = MAXALIGN(itemsz); /* diff --git a/src/backend/access/hash/hashpage.c b/src/backend/access/hash/hashpage.c index a50e35d..51100e0 100644 --- a/src/backend/access/hash/hashpage.c +++ b/src/backend/access/hash/hashpage.c @@ -1173,7 +1173,7 @@ _hash_splitbucket(Relation rel, * the current page in the new bucket, we must allocate a new * overflow page and place the tuple on that page instead. 
*/ -itemsz = IndexTupleDSize(*new_itup); +itemsz = IndexTupleSize(new_itup); itemsz = MAXALIGN(itemsz); if (PageGetFreeSpaceForMultipleTuples(npage, nitups + 1) < (all_tups_size + itemsz)) diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c index 310589d..b39090a 100644 --- a/src/backend/access/nbtree/nbtinsert.c +++ b/src/backend/access/nbtree/nbtinsert.c @@ -558,7 +558,7 @@ _bt_findinsertloc(Relation rel, lpageop = (BTPageOpaque) PageGetSpecialPointer(page); - itemsz = IndexTupleDSize(*newtup); + itemsz = IndexTupleSize(newtup); itemsz = MAXALIGN(itemsz); /* be safe, PageAddItem will do this but we * need to be consistent */ @@ -755,7 +755,7 @@ _bt_insertonpg(Relation rel, elog(ERROR, "cannot insert to incompletely split page %u", BufferGetBlockNumber(buf)); - itemsz = IndexTupleDSize(*itup); + itemsz = IndexTupleSize(itup); itemsz = MAXALIGN(itemsz); /* be safe, PageAddItem will do this but we * need to be consistent */ @@ -914,7 +914,7 @@ _bt_insertonpg(Relation rel, sizeof(IndexTupleData)); } else -XLogRegisterBufData(0, (char *) itup, IndexTupleDSize(*itup)); +XLogRegisterBufData(0, (char *) itup, IndexTupleSize(itup)); recptr = XLogInsert(RM_BTREE_ID, xlinfo); diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c index bf6c03c..b71529f 100644 --- a/src/backend/access/nbtree/nbtsort.c +++ b/src/backend/access/nbtree/nbtsort.c @@ -468,7 +468,7 @@ _bt_buildadd(BTWriteState *wstate, BTPageState *state, IndexTuple itup) last_off = state->btps_lastoff; pgspc = PageGetFreeSpace(npage); - itupsz = IndexTupleDSize(*itup);
Re: [HACKERS] Repetitive code in RI triggers
On 19.11.2017 00:31, Tom Lane wrote: Ildar Musin writes: [ ri_triggers_v2.patch ] Pushed with two minor improvements. I noticed that ri_setdefault could just go directly to ri_restrict rather than call the two separate triggers that would end up there anyway; that lets its argument be "TriggerData *trigdata" for more consistency with the other cases. Also, this patch made it very obvious that we were caching identical queries under hash keys RI_PLAN_RESTRICT_DEL_CHECKREF and RI_PLAN_RESTRICT_UPD_CHECKREF, so we might as well just use one hash entry for both cases, saving a few lines of code as well as a lot of cycles. Likewise in the other two functions. regards, tom lane Great, thanks -- Ildar Musin i.mu...@postgrespro.ru
Re: [HACKERS] Custom compression methods
Hi Ildus, On 14.11.2017 16:23, Ildus Kurbangaliev wrote: On Thu, 2 Nov 2017 15:28:36 +0300 Ildus Kurbangaliev wrote: On Tue, 12 Sep 2017 17:55:05 +0300 Ildus Kurbangaliev wrote: Attached rebased version of the patch. Added support of pg_dump, the code was simplified, and a separate cache for compression options was added. Attached version 3 of the patch. Rebased to the current master, removed ALTER TYPE .. SET COMPRESSED syntax, fixed bug in compression options cache. Attached version 4 of the patch. Fixed pg_upgrade and a few other bugs. I've started to review your code. And even though it's fine overall, I have a few questions and comments (aside from the DROP COMPRESSION METHOD discussion). 1. I'm not sure about the proposed syntax for the ALTER TABLE command: ALTER TABLE t ALTER COLUMN a SET COMPRESSED <method> WITH (<options>); ALTER TABLE t ALTER COLUMN a SET NOT COMPRESSED; ISTM it is more common for Postgres to use SET/DROP syntax for column options (SET/DROP NOT NULL, DEFAULT etc). My suggestion would be: ALTER TABLE t ALTER COLUMN a SET COMPRESSED USING <method> WITH (<options>); ALTER TABLE t ALTER COLUMN a DROP COMPRESSED; (the keyword USING here is similar to the "CREATE INDEX ... USING <method>" syntax) 2. The way you changed DefineRelation() implies that the caller is responsible for the creation of compression options. It would probably be better to create them within DefineRelation(). 3. A few minor issues which look like obsolete code: the function freeRelOptions() is defined but never used; the function getBaseTypeTuple() has been extracted from getBaseTypeAndTypmod() but is never used separately; in toast_flatten_tuple_to_datum() there is an untoasted_value variable which is only used for a meaningless assignment. (Should I send a patch for these kinds of issues?) -- Ildar Musin i.mu...@postgrespro.ru