Re: [HACKERS] parallel mode and parallel contexts

Andres Freund Sat, 07 Feb 2015 16:21:13 -0800

Hi,


On 2015-02-06 22:43:21 -0500, Robert Haas wrote:
> Here's v4, with that fixed and a few more tweaks.

If you attached files generated with 'git format-patch' I could directly
apply then with the commit message and such. All at once if it's
mutliple patches, as individual commits. On nontrivial patches it's nice
to see the commit message(s) along the diff(s).

Observations:
* Some tailing whitespace in the readme. Very nice otherwise.

* Don't like CreateParallelContextForExtension much as a function name -
  I don't think we normally equate the fact that code is located in a
  loadable library with the extension mechanism.

* StaticAssertStmt(BUFFERALIGN(PARALLEL_ERROR_QUEUE_SIZE) ...) what's
  that about?

* the plain shm calculations intentionally use mul_size/add_size to deal
  with overflows. On 32bit it doesn't seem impossible, but unlikely to
  overflow size_t.

* I'd s/very // i "We might be running in a very short-lived memory
  context.". Or replace it with "too".

* In +LaunchParallelWorkers, does it make sense trying to start workers
  if one failed? ISTM that's likely to not be helpful. I.e. it should
  just break; after the first failure.

* +WaitForParallelWorkersToFinish says that it waits for workers to exit
  cleanly. To me that's ambiguous. How about "fully"?

* ParallelMain restores libraries before GUC state. Given that
  librararies can, and actually somewhat frequently do, inspect GUCs on
  load, it seems better to do it the other way round? You argue "We want
  to do this before restoring GUCs, because the libraries might define
  custom variables.", but I don't buy that. It's completely normal for
  namespaced GUCs to be present before a library is loaded? Especially
  as you then go and process_session_preload_libraries() after setting
  the GUCs.

* Should ParallelMain maybe enter the parallel context before loading
  user defined libraries? It's far from absurd to execute code touching
  the database on library initialization...

* rename ParallelMain to ParallelWorkerMain?

* I think restoring snapshots needs to fudge the worker's PGXACT->xmin
  to be the minimum of the top transaction id and the
  snapshots. Otherwise if the initiating process dies badly
  (e.g. because postmaster died) the workers will continue to work,
  while other processes may remove things. Also, you don't seem to
  prohibit popping the active snapshot (should that be prohibitted
  maybe?) which might bump the initiator's xmin horizon.

* Comment about 'signal_handler' in +HandleParallelMessageInterrupt
  is outdated.

* Is it really a good idea to overlay Z/ReadyForQuery with 'this worker
  exited cleanly'? Shouldn't it at least be T/Terminate? I'm generally
  not sure it's wise to use this faux FE/BE protocol here...

* HandlingParallelMessages looks dead.

* ParallelErrorContext has the wrong comment.

* ParallelErrorContext() provides the worker's pid in the context
  message. I guess that's so it's easier to interpret when sent to the
  initiator? It'll look odd when logged by the failing process.

* We now have pretty much the same handle_sigterm in a bunch of
  places. Can't we get rid of those? You actually probably can just use
  die().

* The comments in xact.c above XactTopTransactionId pretty much assume
  that the reader knows that that is about parallel stuff.

* I'm a bit confused by the fact that Start/CommitTransactionCommand()
  emit a generic elog when encountering a PARALLEL_INPROGRESS, whereas
  ReleaseSavepoint()/RollbackTo has a special message. Shouldn't it be
  pretty much impossible to hit a ReleaseSavepoint() with a parallel
  transaction in progress? We'd have had to end the previous transaction
  command while parallel stuff was in progress - right?
  (Internal/ReleaseCurrentSubTransaction is different, that's used in code)

* Why are you deviating from the sorting order used in other places for
  ParallelCurrentXids? That seems really wierd, especially as we use
  something else a couple lines down. Especially as you actually seem to
  send stuff in xidComparator order?

* I don't think skipping AtEOXact_Namespace() entirely if parallel is a
  good idea. That function already does some other stuff than cleaning
  up temp tables. I think you should pass in parallel and do the skip in
  there.

* Start/DndParallelWorkerTransaction assert the current state, whereas
  the rest of the file FATALs in that case. I think it'd actually be
  good to be conservative and do the same in this case.

* You're manually passing function names to
  PreventCommandIfParallelMode() in a fair number of cases. I'd either
  try and keep the names consistent with what the functions are actually
  called at the sql level (adding the types in the parens) or just use
  PG_FUNCNAME_MACRO to keep them correct.

* Wait. You now copy all held relation held "as is" to the standby? I
  quite doubt that's a good idea, and it's not my reading of the
  conclusions made in the group locking thread. At the very least this
  part needs to be extensively documented. And while
  LockAcquireExtended() refers to
  src/backend/access/transam/README.parallel for details I don't see
  anything pertinent in there. And the function header sounds like the
  only difference is the HS logging - not mentioning that it essentially
  disables lock queuing entirely.

  This seems unsafe (e.g. consider if the initiating backend died and
  somebody else acquired the lock, possible e.g. if postmaster died) and
  not even remotely enough discussed. I think this should be removed
  from the patch for now.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] parallel mode and parallel contexts

Reply via email to