Re: backends stuck in "startup"

2017-11-21 Thread Justin Pryzby
On Tue, Nov 21, 2017 at 03:40:27PM -0800, Andres Freund wrote:
> Hi,
> 
> On 2017-11-21 17:09:26 -0600, Justin Pryzby wrote:
> > I'm sorry to report this previously reported problem is happening again,
> > starting shortly after pg_upgrading a customer to PG10.1 from 9.6.5.
> > 
> > As $subject: backends are stuck in startup for minutes at a time.  I didn't
> > strace this time, but I believe last time I saw one was waiting in a futex.
> 
> Could you try stracing next time?

I straced all the "startup" PIDs, which were all in futex, without exception:

[pryzbyj@TS-DB ~]$ sudo strace `ps h --sort start_time -o pid,args -u postgres |awk '/startup$/{print "-p"$1}'` 2>&1 |tee -a /tmp/strace-pg-backends

Process 3611 attached
Process 6081 attached
[pid  6081] futex(0x7ff4314000f8, FUTEX_WAIT, 0, NULL 
[pid  3611] futex(0x7ff431400138, FUTEX_WAIT, 0, NULL 
[pid  3354] futex(0x7ff431400258, FUTEX_WAIT, 0, NULL 
[pid  3117] futex(0x7ff4314002b8, FUTEX_WAIT, 0, NULL 
[pid  2948] futex(0x7ff431400378, FUTEX_WAIT, 0, NULL 
[pid  2925] futex(0x7ff4314003d8, FUTEX_WAIT, 0, NULL 
[pid  2789] futex(0x7ff431400438, FUTEX_WAIT, 0, NULL 
[pid  2752] futex(0x7ff4314004b8, FUTEX_WAIT, 0, NULL 
[pid  2663] futex(0x7ff431400618, FUTEX_WAIT, 0, NULL 
[pid  2661] futex(0x7ff431400638, FUTEX_WAIT, 0, NULL 
[pid  2624] futex(0x7ff431400678, FUTEX_WAIT, 0, NULL 
[pid  2576] futex(0x7ff4314006f8, FUTEX_WAIT, 0, NULL 
[pid   365] futex(0x7ff431400718, FUTEX_WAIT, 0, NULL 
[pid   326] futex(0x7ff431400798, FUTEX_WAIT, 0, NULL 
[pid 32704] futex(0x7ff431400898, FUTEX_WAIT, 0, NULL 
[pid 32309] futex(0x7ff431400938, FUTEX_WAIT, 0, NULL 
[pid 32292] futex(0x7ff431400a18, FUTEX_WAIT, 0, NULL 
[pid 32170] futex(0x7ff431400ad8, FUTEX_WAIT, 0, NULL 
[pid 32134] futex(0x7ff431400b58, FUTEX_WAIT, 0, NULL 
[pid 31987] futex(0x7ff431400bf8, FUTEX_WAIT, 0, NULL 
[pid 31962] futex(0x7ff431400c58, FUTEX_WAIT, 0, NULL 
[pid 31817] futex(0x7ff431400d38, FUTEX_WAIT, 0, NULL 
[pid 31807] futex(0x7ff431400d98, FUTEX_WAIT, 0, NULL 
[pid 29664] futex(0x7ff431400e98, FUTEX_WAIT, 0, NULL 
[pid 29462] futex(0x7ff431400ef8, FUTEX_WAIT, 0, NULL 
[pid 29426] futex(0x7ff431400f38, FUTEX_WAIT, 0, NULL 
[pid 29258] futex(0x7ff431401058, FUTEX_WAIT, 0, NULL 
[pid 29227] futex(0x7ff431402f58, FUTEX_WAIT, 0, NULL 
[pid 29107] futex(0x7ff431401138, FUTEX_WAIT, 0, NULL 
[pid 29071] futex(0x7ff431401198, FUTEX_WAIT, 0, NULL 
[pid 28942] futex(0x7ff431401258, FUTEX_WAIT, 0, NULL 
[pid 28924] futex(0x7ff431401298, FUTEX_WAIT, 0, NULL 
[pid 26942] futex(0x7ff431401418, FUTEX_WAIT, 0, NULL 
[pid 25278] futex(0x7ff431401438, FUTEX_WAIT, 0, NULL 
[pid 24909] futex(0x7ff431401598, FUTEX_WAIT, 0, NULL 
[pid 24908] futex(0x7ff4314015b8, FUTEX_WAIT, 0, NULL 
[pid 24852] futex(0x7ff4314015f8, FUTEX_WAIT, 0, NULL 
[pid 24851] futex(0x7ff431401618, FUTEX_WAIT, 0, NULL 
[pid 24585] futex(0x7ff4314016f8, FUTEX_WAIT, 0, NULL 
[pid 24584] futex(0x7ff431401718, FUTEX_WAIT, 0, NULL 
[pid 24478] futex(0x7ff431401918, FUTEX_WAIT, 0, NULL 
[pid 24476] futex(0x7ff431401958, FUTEX_WAIT, 0, NULL 
[pid 24470] futex(0x7ff4314019d8, FUTEX_WAIT, 0, NULL 
[pid 24464] futex(0x7ff431401ad8, FUTEX_WAIT, 0, NULL 
[pid 24458] futex(0x7ff431401bb8, FUTEX_WAIT, 0, NULL 
[pid 24456] futex(0x7ff431401b98, FUTEX_WAIT, 0, NULL 
[pid 24450] futex(0x7ff431401c58, FUTEX_WAIT, 0, NULL 
[pid 24448] futex(0x7ff431401c98, FUTEX_WAIT, 0, NULL 
[pid 24439] futex(0x7ff431401cf8, FUTEX_WAIT, 0, NULL 
[pid 24323] futex(0x7ff431401e38, FUTEX_WAIT, 0, NULL 
[pid 24322] futex(0x7ff431401e58, FUTEX_WAIT, 0, NULL 
[pid 23057] futex(0x7ff431401fd8, FUTEX_WAIT, 0, NULL 
[pid 23055] futex(0x7ff431402038, FUTEX_WAIT, 0, NULL 
[pid 21826] futex(0x7ff4314021f8, FUTEX_WAIT, 0, NULL 
[pid 21824] futex(0x7ff431402238, FUTEX_WAIT, 0, NULL 
[pid 21794] futex(0x7ff431402298, FUTEX_WAIT, 0, NULL 
[pid 21792] futex(0x7ff431402638, FUTEX_WAIT, 0, NULL 
[pid 21791] futex(0x7ff431402af8, FUTEX_WAIT, 0, NULL 
[pid 21790] futex(0x7ff431402c98, FUTEX_WAIT, 0, NULL 
[pid 21786] futex(0x7ff431402478, FUTEX_WAIT, 0, NULL 
[pid 21785] futex(0x7ff431402758, FUTEX_WAIT, 0, NULL 
[pid 21784] futex(0x7ff431403098, FUTEX_WAIT, 0, NULL 
[pid 21768] futex(0x7ff431402718, FUTEX_WAIT, 0, NULL 
[pid 21662] futex(0x7ff431402338, FUTEX_WAIT, 0, NULL 
[pid 21648] futex(0x7ff431402c38, FUTEX_WAIT, 0, NULL 
[pid 21630] futex(0x7ff4314024d8, FUTEX_WAIT, 0, NULL 
[pid 21591] futex(0x7ff431402278, FUTEX_WAIT, 0, NULL 
[pid 21590] futex(0x7ff431402bf8, FUTEX_WAIT, 0, NULL

Also:

[pryzbyj@TS-DB ~]$ date;ps -O lstart,wchan=99 -u postgres --sort start_time |grep startup$
Wed Nov 22 00:47:41 EST 2017
 1990 Wed Nov 22 00:36:44 2017 futex_wait_queue_me S ?        00:00:00 postgres: xx xx xx.247(53934) startup
 2007 Wed Nov 22 00:36:44 2017 futex_wait_queue_me S ?        00:00:00 postgres: xx xx xx.247(53945) startup
 2014 Wed Nov 22 00:36:44 2017 futex_wait_queue_me S ?        00:00:00 postgres: xx xx xx.247(53946) startup
 2042 Wed Nov 22 

Re: On Judging the Value of Tests

2017-11-21 Thread Tom Lane
Craig Ringer  writes:
> On 22 November 2017 at 08:43, Jankirk.Vincent., Jamison <
> k.jami...@jp.fujitsu.com> wrote:
>> 1.  How do you judge when a test suite is acceptable to be added to
>> Postgres OSS source code? (How do you judge the value of a test suite?)

> Make your argument for it, and see if others agree. There's no formal
> process.

Yeah.  Also realize that we've been accreting test cases for ~20 years,
and there's never been any master plan about what to test.  So there's a
lot of individual test cases of varying quality/value, and you shouldn't
necessarily take any one existing test as gospel.  I don't hold that out
as a great example of software engineering, but it's the truth.

> Inputs into the decision making process include:

> * How much coverage of previously untested functionality it adds
> * How much code coverage it adds
> * How long the test takes to run, especially considering the slow buildfarm
> boxes and development turnaround time
> * Whether the test fits into one of the existing suites we run routinely,
> or requires separate steps
> * How much work will be required to maintain the test

Also, portability/repeatability.  If it doesn't produce the same answers,
with very high probability, across all the platforms we support, it isn't
going to last long.

>> 3.  Is there a standard way of writing tests on the source code that
>> we should follow, like when should test be written in TAP/SQL/C formats and
>> how long should it be?

> In general, prefer pg_regress if you can use it. Isolation tests for
> concurrency issues.  TAP tests if you can't write it with pg_regress or
> isolation tester. Test modules only if you really must.

Right.  This ties in closely to the incremental runtime needed for one
additional test case.  A new query or two in an existing SQL regression
script adds little overhead.  A new TAP test adds quite a bit.

>> 4.  In the src/test/examples directory (which are all libpq tests),
>> why is the “examples” directory not included when building postgres?

I've always taken those to be more documentation, or sample code, than
test cases.  I'm not sure why they're under src/test/ at all.

regards, tom lane



Re: migrations (was Re: To all who wish to unsubscribe)

2017-11-21 Thread Igor Korot
Hi,

On Tue, Nov 21, 2017 at 2:15 PM, Jerry Regan
 wrote:
> I kind of agree with both.  Personally, my strengths are data communication. 
> Databases are frequent end points. Mailing lists? Have never had a need to 
> work with them, so they’re in the, “don’t care” bucket so long as they work.
>
> As far as missing 'unsubscribe' in headers/titles, my totally unscientific 
> experience with a limited number of mailing lists (email, text, etc) gives me 
> the impression unsubscribe in header/title is by far the most common method 
> provided. Also seems simplest, but then I’m just a user...;)

This is exactly the point.
Every single ML has links in the footer, either to the ML archives or
to the (un)subscribe page.

But apparently from now on the PG lists will not have them, and if you want
to unsubscribe from a list you will need to open the headers (why?),
find the appropriate link inside a bunch of non-useful information, and
copy-paste it into your mailer.

Can anyone explain why?
Why not do what other ML do - provide the links in the footer?

Thank you.

>
> /s/jr
> Sent from my iPhone
>
>> On Nov 22, 2017, at 05:02, Tom Lane  wrote:
>>
>> John R Pierce  writes:
 On 11/21/2017 11:42 AM, Joshua D. Drake wrote:
 I am not sure if that is sarcasm but I think the reason is pretty self
 explanatory. -Hackers have all the people that understand how all this
 works, -general has all the people that don't.
>>
>>> rotfl, and ain't that the truth.
>>
>> I'm not sure I believe it.  People reading any database-oriented mailing
>> list are going to be pretty tech-savvy, I'd think.
>>
>>regards, tom lane
>>
>



Re: On Judging the Value of Tests

2017-11-21 Thread Michael Paquier
On Wed, Nov 22, 2017 at 10:31 AM, Craig Ringer  wrote:
> On 22 November 2017 at 08:43, Jankirk.Vincent., Jamison
>  wrote:
>> I am very new to PG development.
>>
>> Currently, I’m studying the test suites, and I feel the more experienced
>> PG devs here could provide some insights on testing because I could not find
>> concrete answers in the Postgres documentations.
>>
>> 1.  How do you judge when a test suite is acceptable to be added to
>> Postgres OSS source code? (How do you judge the value of a test suite?)
>
> Make your argument for it, and see if others agree. There's no formal
> process.
>
> Inputs into the decision making process include:
>
> * How much coverage of previously untested functionality it adds
> * How much code coverage it adds
> * How long the test takes to run, especially considering the slow buildfarm
> boxes and development turnaround time
> * Whether the test fits into one of the existing suites we run routinely, or
> requires separate steps
> * How much work will be required to maintain the test

Even with those factors in mind there is case-by-case variability.
Sometimes it is not worth initializing a new instance with initdb for
a small test. Parallelization of tests helps on large machines, but on
buildfarm members the size of a Raspberry Pi things get time-consuming.
Priority is usually given to things that do not consume much time, so
that any developer can run the full set of tests quickly on a laptop.
I have for example a small alias to run the full set of basic tests
(make check-world -j 4 PROVE_FLAGS="-j 4") and this finishes within
two minutes if I recall correctly, TAP tests included. So runtime is
really an important matter.
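Such an alias might look like the following in a shell profile (the alias
name is hypothetical; this assumes an already-configured PostgreSQL source
tree):

```shell
# Hypothetical ~/.bashrc entry; run from the top of a configured source tree.
alias pg-check-all='make check-world -j 4 PROVE_FLAGS="-j 4"'
```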

>> 3.  Is there a standard way of writing tests on the source code that
>> we should follow, like when should test be written in TAP/SQL/C formats and
>> how long should it be?
>>
>> ①  I know that TAP test is for client program tests, SQL for
>> regression tests with sql queries, but there are instances also where tests
>> are also written in C like isolation tests, etc. How do we best judge which
>> language is preferred to use when writing tests for Postgres components? How
>> long should a test be when proposing them to hackers page?
>
> In general, prefer pg_regress if you can use it. Isolation tests for
> concurrency issues.  TAP tests if you can't write it with pg_regress or
> isolation tester. Test modules only if you really must.

This really depends on what you want to achieve, so it comes back to a
case-by-case decision. Regression tests are fine for basic tests, where
one session is enough to prove your feature. Isolation tests are helpful
when testing concurrent behaviors across multiple sessions. TAP tests
are useful when more than one instance is needed, or when you need to
test cluster-wide configurations or client binaries. Having tests in
proposed patches definitely helps people understand how something
works. Sometimes the tests get discarded, but they can still help in
making a point. Lately, while working on channel binding support for
SCRAM, TAP tests proved helpful:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=9288d62bb4b6f302bf13bb2fed3783b61385f315#patch13.
This needs an SSL setup with a specific configuration for pg_hba.conf,
and just a look at the tests in the patch showed that the proposed
feature was able to achieve what it aimed at. So it proved a point.

>> 4.  In the src/test/examples directory (which are all libpq tests),
>> why is the “examples” directory not included when building postgres? (Why
>> weren't these libpq tests added to src/interface/libpq/test or in regression
>> test suite instead?) In short, how to know where (in which file/directory in
>> source code) to put a test?
>
> Dunno, sorry.

Those are mainly present as example programs.
-- 
Michael



Re: backends stuck in "startup"

2017-11-21 Thread Rakesh Kumar
Why is it that I did not receive the first 4 emails on this topic?  I see
that only the old email address "pgsql-gene...@postgresql.org" is
mentioned.  Could that be the reason?

ps: I am adding the new lists address.



Re: backends stuck in "startup"

2017-11-21 Thread Tom Lane
I wrote:
> ...  Maybe we need
> to take a closer look at where LWLocks devolve to blocking on the process
> semaphore and see if there's any implicit assumptions about barriers there.

Like, say, here:

    for (;;)
    {
        PGSemaphoreLock(proc->sem);
        if (!proc->lwWaiting)
            break;
        extraWaits++;
    }

which is exactly where Justin's backend is stuck.  Without a read barrier
after the PGSemaphoreLock, it seems possible that we might not see our
lwWaiting as cleared yet, causing the loop to go around an extra time and
wait for a sema wakeup that will never come.  Then, we are holding the
lock but are blocked anyway, and everyone else who wants that same lock
will queue up behind us.
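In pseudocode, the hypothesized fix would insert a read barrier between the
semaphore wait and the flag test (a sketch only, not a patch -- whether a
barrier there actually closes the window is the open question):

    for (;;)
    {
        PGSemaphoreLock(proc->sem);
        pg_read_barrier();   /* hypothetical: force a fresh read of lwWaiting */
        if (!proc->lwWaiting)
            break;
        extraWaits++;
    }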

regards, tom lane



Re: backends stuck in "startup"

2017-11-21 Thread Andres Freund
On 2017-11-21 19:02:01 -0500, Tom Lane wrote:
> and...@anarazel.de (Andres Freund) writes:
> > On 2017-11-21 18:50:05 -0500, Tom Lane wrote:
> >> (If Justin saw that while still on 9.6, then it'd be worth looking
> >> closer.)
> 
> > Right. I took this to be referring to something before the current
> > migration, but I might have overinterpreted things. There've been
> > various forks/ports of pg around that had hand-coded replacements with
> > futex usage, and there were definitely buggy versions going around a few
> > years back.
> 
> Poking around in the archives reminded me of this thread:
> https://www.postgresql.org/message-id/flat/14947.1475690...@sss.pgh.pa.us
> which describes symptoms uncomfortably close to what Justin is showing.
> 
> I remember speculating that the SysV-sema implementation, because it'd
> always enter the kernel, would provide some memory barrier behavior
> that POSIX-sema code based on futexes might miss when taking the no-wait
> path.

I think I was speculating that, but with the benefit of just having had
my fourth espresso: I've a hard time believing that - the fast path in a
futex pretty much has to be either a test-and-set or a
compare-and-exchange type operation. See e.g. the demo of futex usage in
the futex(2) manpage:

    while (1) {
        /* Is the futex available? */
        if (__sync_bool_compare_and_swap(futexp, 1, 0))
            break;              /* Yes */

        /* Futex is not available; wait */
        s = futex(futexp, FUTEX_WAIT, 0, NULL, NULL, 0);
        if (s == -1 && errno != EAGAIN)
            errExit("futex-FUTEX_WAIT");
    }

I can't see how you could make use of futexes without some kind of
barrier semantics, at least on x86.

Greetings,

Andres Freund



Re: backends stuck in "startup"

2017-11-21 Thread Tom Lane
Justin Pryzby  writes:
> As $subject: backends are stuck in startup for minutes at a time.  I didn't
> strace this time, but I believe last time I saw one was waiting in a futex.

Hm...

> I saved ~40 cores from backends from the most recent incident, which are all
> essentially identical:

This one seems to be waiting for a shared LWLock (specifically, one of
the locks for the shared-buffer lookup hashtable), which could only block
if someone else is holding that lock exclusively, which implies that
there's at least one backtrace that's different from this.  Unless you
can find what was holding that lock, there's not much we can learn from
this report ...

regards, tom lane



Re: [GENERAL] Query Using Massive Temp Space

2017-11-21 Thread Tom Lane
Thomas Munro  writes:
> On Wed, Nov 22, 2017 at 7:04 AM, Tom Lane  wrote:
>> Now, there's definitely something busted here; it should not have gone as
>> far as 2 million batches before giving up on splitting.

> I had been meaning to discuss this.  We only give up when we reach the
> point when a batch is entirely entirely kept or sent to a new batch
> (ie splitting the batch resulted in one batch with the whole contents
> and another empty batch).  If you have about 2 million evenly
> distributed keys and an ideal hash function, and then you also have 42
> billion keys that are the same (and exceed work_mem), we won't detect
> extreme skew until the 2 million well behaved keys have been spread so
> thin that the 42 billion keys are isolated in a batch on their own,
> which we should expect to happen somewhere around 2 million batches.

Yeah, I suspected it was something like that, but hadn't dug into the
code yet.

> I have wondered if our extreme skew detector needs to go off sooner.
> I don't have a specific suggestion, but it could just be something
> like 'you threw out or kept more than X% of the tuples'.

Doing this, with some threshold like 95% or 99%, sounds plausible to me.
I'd like to reproduce Cory's disk-space issue before we monkey with
related logic, though; fixing the part we understand might obscure
the part we still don't.
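The proposed cutoff could be sketched as follows (illustrative Python, not
PostgreSQL code; the function name and the 95% threshold are assumptions
standing in for the "X%" mentioned above):

```python
# Illustrative sketch of the proposed "extreme skew" detector for hash-join
# batch splitting.  Not PostgreSQL code; names and the 0.95 threshold are
# assumptions for illustration only.

def split_is_useful(kept, moved, threshold=0.95):
    """Decide whether splitting an oversized batch made enough progress.

    kept  -- tuples that stayed in the original batch
    moved -- tuples sent to the new batch

    The current rule gives up only when one side receives *everything*
    (kept == 0 or moved == 0).  The proposed rule gives up as soon as
    either side receives more than `threshold` of the tuples.
    """
    total = kept + moved
    if total == 0:
        return False
    return max(kept, moved) / total <= threshold
```

Under the current all-or-nothing rule a 999/1 split still counts as progress;
under a 95% cutoff it would trigger the give-up path much sooner.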

regards, tom lane



dblink surprise

2017-11-21 Thread Torsten Förtsch
Hi,

according to the documentation, dblink_send_query sends the query to be
executed asynchronously. I tried this out in the following function and it
works as expected:

CREATE OR REPLACE FUNCTION t_par () RETURNS TABLE (
    tx_time TIMESTAMP,
    end_time TIMESTAMP
) AS $def$
DECLARE
    v_q RECORD;
BEGIN
    FOR v_q IN
        WITH jobs(cn) AS (
            VALUES ('c1'), ('c2')
        )
        , conn AS (
            SELECT *, 1/(dblink_connect(cn, 'dbname=postgres port=5440')='OK')::INT AS connstatus
              FROM jobs
        )
        SELECT conn.*, 1/q.status AS sendstatus
          FROM conn
         CROSS JOIN LATERAL dblink_send_query(conn.cn,
               $$
               select now(), pg_sleep(3), clock_timestamp()
               $$) q(status)
    LOOP
        RETURN QUERY
        SELECT tb.tx_time, tb.end_time
          FROM dblink_get_result(v_q.cn) tb(
               tx_time TIMESTAMP,
               dummy TEXT,
               end_time TIMESTAMP
          );
        PERFORM dblink_disconnect(v_q.cn);
    END LOOP;
END
$def$ LANGUAGE plpgsql;

# select * from t_par();
          tx_time           |          end_time
----------------------------+----------------------------
 2017-11-21 21:14:35.593741 | 2017-11-21 21:14:38.597384
 2017-11-21 21:14:35.595805 | 2017-11-21 21:14:38.599272

Both remote queries start at the same time and finish 3 seconds later.

Then I thought I can take it one step further and collect the results in
the same statement:

CREATE OR REPLACE FUNCTION t_seq () RETURNS TABLE (
    tx_time TIMESTAMP,
    end_time TIMESTAMP
) AS $def$
BEGIN
    RETURN QUERY
    WITH v_q AS (
        WITH jobs(cn) AS (
            VALUES ('c1'), ('c2')
        )
        , conn AS (
            SELECT *, 1/(dblink_connect(cn, 'dbname=postgres port=5440')='OK')::INT AS connstatus
              FROM jobs
        )
        SELECT conn.*, 1/q.status AS sendstatus
          FROM conn
         CROSS JOIN LATERAL dblink_send_query(conn.cn,
               $$
               select now(), pg_sleep(3), clock_timestamp()
               $$) q(status)
    )
    SELECT tb.tx_time, tb.end_time
      FROM v_q
     CROSS JOIN LATERAL dblink_get_result(v_q.cn) tb(
           tx_time TIMESTAMP,
           dummy TEXT,
           end_time TIMESTAMP
      );
    PERFORM dblink_disconnect(jobs.cn)
       FROM (VALUES ('c1'), ('c2')) jobs(cn);
END
$def$ LANGUAGE plpgsql;

regentmarkets=# select * from t_seq();
          tx_time           |          end_time
----------------------------+----------------------------
 2017-11-21 21:25:07.764467 | 2017-11-21 21:25:10.768032
 2017-11-21 21:25:10.770409 | 2017-11-21 21:25:13.773907

Unfortunately, that does not work. The remote queries are started one after
the other.

Why?

This is PG 9.6.

BTW, is it somehow possible in plpgsql to catch a query cancellation or
backend termination request? That would be useful to propagate such
requests to remote queries.

Thanks,
Torsten


Re: migrations (was Re: To all who wish to unsubscribe)

2017-11-21 Thread Andrew Sullivan
On Tue, Nov 21, 2017 at 11:46:42AM -0800, John R Pierce wrote:
> On 11/21/2017 11:42 AM, Joshua D. Drake wrote:
> > I am not sure if that is sarcasm but I think the reason is pretty self
> > explanatory. -Hackers have all the people that understand how all this
> > works, -general has all the people that don't.
> 
> rotfl, and ain't that the truth.

I know it's enjoyable to make fun of one's users -- those of us who
make the DNS happen have apparently made a career of it -- but I don't
think it's fair or likely to endear the database system I love to
those who might be coming from, say, the database system that employs
me.

I am pretty sure I know as much about mail processing as most of the
people who were involved in this change.  I apparently missed an
important announcement; that hardly surprises me given the volume of
mail I read, and I am pleased to have been corrected (and apologised
accordingly).

But pointing and laughing at your users and suggesting that they are
ignorant rubes is what causes people to turn their backs.  I am not
going to apologise for being offended by that.  It's a nasty thing to
do on a -general list.

Best regards,

A

-- 
Andrew Sullivan
a...@crankycanuck.ca



Re: migrations (was Re: To all who wish to unsubscribe)

2017-11-21 Thread Tom Lane
John R Pierce  writes:
> On 11/21/2017 11:42 AM, Joshua D. Drake wrote:
>> I am not sure if that is sarcasm but I think the reason is pretty self 
>> explanatory. -Hackers have all the people that understand how all this 
>> works, -general has all the people that don't. 

> rotfl, and ain't that the truth.

I'm not sure I believe it.  People reading any database-oriented mailing
list are going to be pretty tech-savvy, I'd think.

regards, tom lane



Re: migrations (was Re: To all who wish to unsubscribe)

2017-11-21 Thread John R Pierce

On 11/21/2017 11:42 AM, Joshua D. Drake wrote:
I am not sure if that is sarcasm but I think the reason is pretty self 
explanatory. -Hackers have all the people that understand how all this 
works, -general has all the people that don't. 


rotfl, and ain't that the truth.


--
john r pierce, recycling bits in santa cruz




Re: One Partition by list is always chosen by planner

2017-11-21 Thread legrand legrand
yes it is

show constraint_exclusion
partition

and if I explain the same query with an other filter

explain select * from wiki_data_part where category='fr'

| Append  (cost=0.00..14010.76 rows=291609 width=48)
|   ->  Seq Scan on wiki_data_part_f  (cost=0.00..9975.04 rows=291339 width=48)
|         Filter: ((category)::text = 'fr'::text)
|   ->  Seq Scan on wiki_data_part_s  (cost=0.00..4035.72 rows=270 width=50)
|         Filter: ((category)::text = 'fr'::text)
wiki_data_part_s is always chosen in the plan



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-general-f1843780.html



Re: Nightmare? was unsubscribe

2017-11-21 Thread Alvaro Herrera
Andrew Sullivan wrote:

> That would change the body of the mail.  Mail that is DKIM signed (or
> worse conforms to DMARC) cannot be so changed, or it will fail
> validation and will be bounced.  The bouncing system is then doing the
> correct thing, and yet it will create "excessive bounces" to the list,
> which will cause the _validating_ user to become a problem on the
> list.  Users don't control their mail server validation policies, so
> lists have to conform.  Tom Lane explained this upthread somewhere.

This was already happening and it already had had a very ugly
consequence: we had to ban Yahoo addresses from subscribing to the list,
because since the emails sent by the list from Yahoo subscribers were
"invalid" per DMARC, the postgresql.org servers were being flagged as
spammers, causing all sorts of trouble.  I think we'll be able to accept
Yahoo subscribers to the lists after the migration ... assuming they
still exist, of course.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: migrations (was Re: To all who wish to unsubscribe)

2017-11-21 Thread Stephen Frost
Greetings,

* David G. Johnston (david.g.johns...@gmail.com) wrote:
> What would help the majority of readers is if a rule could be added that
> keys off of the word "unsubscribe" (and maybe some other key phrases) and
> dumps it into a non-public moderation queue where requests can be handled
> manually and prevented from reaching the list members.

This was already done.  Originally it was just against the body, now it
looks in the body and in the Subject.  Hindsight being what it is, we're
sorry we didn't have it in both places to begin with, but that's all
water under the bridge at this point.

We're now getting more emails on these threads than about either
PostgreSQL or people wanting to be unsubscribed.  Please, let's move on.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: To all who wish to unsubscribe

2017-11-21 Thread Steve Atkins

> On Nov 21, 2017, at 10:39 AM, Andrew Sullivan  wrote:
> 
> On Mon, Nov 20, 2017 at 02:46:08PM -0800, Steve Atkins wrote:
>> That's poor practice, for several reasons - replay attacks with added content
>> and it being an extremely rare practice that's likely to trigger bugs in DKIM
>> validation are two. The latter is the much bigger deal.
>> 
>> It also doesn't help much for most MIME encoded mail (including base64
>> encoded plain text, like the mail I'm replying to).
>> 
>> Pretending those paragraphs aren't there is the right thing to do.
> 
> Yes.  Also the DMARC and forthcoming ARC mechanisms -- super important
> for people behind gmail and yahoo and so on -- make that feature not
> really work, AFAICT.  

It doesn't break DMARC or ARC as they only care if the mail is
authenticated via SPF or DKIM (with an aligned, blah, blah, blah) so
if the DKIM passes, even with an l= flag, it's OK. But ...

> I think that part of DKIM is busted, and the
> authors of it I've talked to seem to agree.

Pretty much, yes. Certainly for mail where you don't have tight control
over how it's generated.

Removing Subject tagging and footers of the messages isn't an accidental
side effect of the migration, it's (I assume) a primary goal of it.
If that weren't done, more and more people at large consumer mailbox
providers would increasingly have problems sending mail successfully
to the lists. 

People's mail filters will just have to adjust.

Blame Yahoo, not PGDG.

Cheers,
  Steve




Re: To all who wish to unsubscribe

2017-11-21 Thread John R Pierce

On 11/21/2017 10:39 AM, Andrew Sullivan wrote:

On Mon, Nov 20, 2017 at 02:46:08PM -0800, Steve Atkins wrote:

That's poor practice, for several reasons - replay attacks with added content
and it being an extremely rare practice that's likely to trigger bugs in DKIM
validation are two. The latter is the much bigger deal.

It also doesn't help much for most MIME encoded mail (including base64
encoded plain text, like the mail I'm replying to).

Pretending those paragraphs aren't there is the right thing to do.

Yes.  Also the DMARC and forthcoming ARC mechanisms -- super important
for people behind gmail and yahoo and so on -- make that feature not
really work, AFAICT.  I think that part of DKIM is busted, and the
authors of it I've talked to seem to agree.


it seems to *ME* like a simpler solution to the original problem would 
have been to simply STRIP any DKIM out of the original messages, and 
continue to munge headers and footers like mail list reflectors have 
been doing for decades.



--
john r pierce, recycling bits in santa cruz



migrations (was Re: To all who wish to unsubscribe)

2017-11-21 Thread Andrew Sullivan
On Mon, Nov 20, 2017 at 08:40:08PM +0100, Magnus Hagander wrote:
> 
> Which is why the extra email was sent *after* the migration, to make sure
> it would be the first one *not* hitting peoples filters, and thus have a
> chance of being read even if the others weren't.

That surely didn't work, since the relevant people received that mail
after the migration had happened.

> So do you have any suggestions for actually fixing that? Given that we have
> more lists to migrate, if you can figure out a way to make those changes
> without peoples filters not matching, we'd be happy to hear it..

I don't have an answer for that, but I will note that if this
migration was announced weeks in advance I certainly missed it.  I
admit I was busy, but I can't find it in the mail I didn't read yet,
either.

Maybe a two week heads up that some time in the future the mail is
going to change (the same announcement as was sent) would help?

A

-- 
Andrew Sullivan
a...@anvilwalrusden.com



Re: unsubscribe

2017-11-21 Thread Bruce Momjian
On Tue, Nov 21, 2017 at 10:34:27AM +0100, Magnus Hagander wrote:
> On Tue, Nov 21, 2017 at 8:17 AM, Rafal Pietrak  wrote:
> 
> Hi all,
> 
> Have anyone noticed, that last couple of days on the list, can become a
> nice example of HOWTO turn the best list on the planet into a nightmare?
> 
> Pls forgive rudeness, but IMHO, whoever made the "upgrade" should
> manually take all the "unsubscribe" messages from archive and do that
> "unsubscription" by hand  instead of giving not always helpfull
> guidance.
> 
> 
> So what makes you think that's not what they're doing?
> 
> Note that the majority of the advice posted have not been from the people who
> actually are doing the system upgrade. It's from fellow users.
> 
> So to show some numbers, since the migration, 57 unsubscriptions have been
> processed manually by the people who were involved in the migration. This is a
> mix of people who posted unsubscribe requests that ended up getting moderated,
> and people who posted publicly. 
> 
> During this time, the systems have also processed 282 unsubscriptions from
> people who managed to read and work with the instructions.

Maybe we need an "unsubscribe" email list.  ;-)

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: [GENERAL] unnest-like pg_stats.most_common_values and pg_stats.most_common_freqs

2017-11-21 Thread Luca Ferrari
On Mon, Nov 20, 2017 at 4:38 PM, Tom Lane  wrote:

> Those two aren't correlated ... but I think what you want is
>
> select ...,v,f
> from
>   pg_stats,
>   rows from (unnest(most_common_vals::text::text[]),
>              unnest(most_common_freqs)) r(v,f)
> where ...

Of course I meant MCVs and MCFs; it did not make sense to use
the histogram here!
Thanks for the query, it works (tested on 9.6.6).
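
For anyone finding this thread in the archives, here is a fuller,
self-contained sketch of that query (the table and column names, "t"
and "c", are hypothetical; the cast through text is needed because
most_common_vals is of pseudo-type anyarray):

```sql
-- Pair each most-common value with its frequency for one column.
-- ROWS FROM () zips the two set-returning functions positionally,
-- so each value lines up with its frequency; functions in FROM may
-- reference earlier FROM items (implicit LATERAL).
SELECT tablename, attname, r.v AS common_val, r.f AS common_freq
FROM pg_stats,
     ROWS FROM (unnest(most_common_vals::text::text[]),
                unnest(most_common_freqs)) AS r(v, f)
WHERE tablename = 't'     -- hypothetical table name
  AND attname = 'c';      -- hypothetical column name
```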

Luca



Re: unsubscribe

2017-11-21 Thread Magnus Hagander
On Tue, Nov 21, 2017 at 8:19 AM, Craig Ringer  wrote:

> On 21 November 2017 at 15:17, Rafal Pietrak  wrote:
>
>> Hi all,
>>
>> Have anyone noticed, that last couple of days on the list, can become a
>> nice example of HOWTO turn the best list on the planet into a nightmare?
>>
>> Pls forgive rudeness, but IMHO, whoever made the "upgrade" should
>> manually take all the "unsubscribe" messages from archive and do that
>> "unsubscription" by hand  instead of giving not always helpfull
>> guidance.
>
>
> The list used to have a filter that blocked messages with "unsubscribe" in
> the subject or forced them into the moderation queue.
>
> I thought that'd be preserved with the PgLister migration.
>

It is preserved.

But such filters are easily fooled. We can't match every mail that contains
the word unsubscribe (or subscribe) *anywhere*; there are restrictions. And
the filters have a hard time dealing with people posting in pure-HTML mail
with no plaintext part, and things like that.

The filters have trapped around 30 emails so far that didn't make it to the
list because of those.



> But really, a nightmare? Yeah, it's a pain, but I think that's laying it
> on a bit strong. Personally I appreciate the hard and usually thankless
> work the infrastructure and admin team do.
>

Thanks!

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: unsubscribe

2017-11-21 Thread Magnus Hagander
On Tue, Nov 21, 2017 at 8:17 AM, Rafal Pietrak  wrote:

> Hi all,
>
> Have anyone noticed, that last couple of days on the list, can become a
> nice example of HOWTO turn the best list on the planet into a nightmare?
>
> Pls forgive rudeness, but IMHO, whoever made the "upgrade" should
> manually take all the "unsubscribe" messages from archive and do that
> "unsubscription" by hand  instead of giving not always helpfull
> guidance.
>

So what makes you think that's not what they're doing?

Note that the majority of the advice posted has not been from the people
who are actually doing the system upgrade; it's from fellow users.

So to show some numbers, since the migration, 57 unsubscriptions have been
processed manually by the people who were involved in the migration. This
is a mix of people who posted unsubscribe requests that ended up getting
moderated, and people who posted publicly.

During this time, the systems have also processed 282 unsubscriptions from
people who managed to read and work with the instructions.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: unsubscribe

2017-11-21 Thread Rafal Pietrak
I'm very, very sorry for my rude words. I do understand and truly appreciate
the continued availability of the list. I know (from my own experience) that
it takes significant effort to maintain.

Nonetheless, I stand by the opinion that it would annoy less if the infant
state of the new list server were babysat for a while.
Regards
R
21.11.2017 8:19 AM Craig Ringer wrote:
> On 21 November 2017 at 15:17, Rafal Pietrak  wrote:
>> Hi all,
>>
>> Have anyone noticed, that last couple of days on the list, can become a
>> nice example of HOWTO turn the best list on the planet into a nightmare?
>>
>> Pls forgive rudeness, but IMHO, whoever made the "upgrade" should
>> manually take all the "unsubscribe" messages from archive and do that
>> "unsubscription" by hand  instead of giving not always helpfull
>> guidance.
>
> The list used to have a filter that blocked messages with "unsubscribe" in
> the subject or forced them into the moderation queue.
>
> I thought that'd be preserved with the PgLister migration.
>
> But really, a nightmare? Yeah, it's a pain, but I think that's laying it
> on a bit strong. Personally I appreciate the hard and usually thankless
> work the infrastructure and admin team do.
>
> --
>  Craig Ringer  http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services