Re: [HACKERS] Restore-reliability mode

2015-07-27 Thread Alvaro Herrera
Noah Misch wrote:
> On Thu, Jul 23, 2015 at 04:53:49PM -0300, Alvaro Herrera wrote:
> > Peter Geoghegan wrote:
> > > On Sat, Jun 6, 2015 at 12:58 PM, Noah Misch  wrote:
> > > >   - Call VALGRIND_MAKE_MEM_NOACCESS() on a shared buffer when its local
> > > > pin count falls to zero.  Under CLOBBER_FREED_MEMORY, wipe a shared
> > > > buffer when its global pin count falls to zero.
> > > 
> > > Did a patch for this ever materialize?
> > 
> > I think the first part would be something like the attached.
> 
> Neat.  Does it produce any new complaints during "make installcheck"?

I only tried a few tests, for lack of time, and it didn't produce any.
(To verify that the whole thing was working properly, I reduced the
range of memory made available during PinBuffer and that resulted in a
crash immediately).  I am not really familiar with valgrind TBH and just
copied a recipe to run postmaster under it, so if someone with more
valgrind-fu could verify this, it would be great.


This part:

> > > > Under CLOBBER_FREED_MEMORY, wipe a shared buffer when its
> > > > global pin count falls to zero.

can be done without any valgrind, I think.  Any takers?
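
A rough standalone sketch of that second part, to make the idea concrete.
This is not a patch against bufmgr.c: ToyBuffer, toy_pin() and toy_unpin()
are hypothetical names, TOY_BLCKSZ stands in for BLCKSZ, and the 0x7F fill
is the pattern I believe wipe_mem() in utils/memdebug.h uses.  A real
version would presumably also have to handle dirty pages, buffer validity
and the buffer header lock, all of which this ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_BLCKSZ 8192         /* stand-in for BLCKSZ */
#define CLOBBER_FREED_MEMORY    /* normally decided by the build configuration */

/* Hypothetical, heavily simplified stand-in for one shared buffer. */
typedef struct ToyBuffer
{
    unsigned      refcount;             /* global pin count */
    unsigned char page[TOY_BLCKSZ];     /* the block itself */
} ToyBuffer;

/* Take one pin on the buffer. */
static void
toy_pin(ToyBuffer *buf)
{
    buf->refcount++;
}

/* Drop one pin; returns 1 if the page was wiped, 0 otherwise. */
static int
toy_unpin(ToyBuffer *buf)
{
    assert(buf->refcount > 0);
    buf->refcount--;
#ifdef CLOBBER_FREED_MEMORY
    if (buf->refcount == 0)
    {
        /* Nobody holds a pin: make any stale pointer read garbage. */
        memset(buf->page, 0x7F, TOY_BLCKSZ);
        return 1;
    }
#endif
    return 0;
}
```

The point of the wipe is that code which keeps using a page pointer after
releasing its last pin fails loudly and reproducibly, even without valgrind.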

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Restore-reliability mode

2015-07-26 Thread Noah Misch
On Thu, Jul 23, 2015 at 04:53:49PM -0300, Alvaro Herrera wrote:
> Peter Geoghegan wrote:
> > On Sat, Jun 6, 2015 at 12:58 PM, Noah Misch  wrote:
> > >   - Call VALGRIND_MAKE_MEM_NOACCESS() on a shared buffer when its local
> > > pin count falls to zero.  Under CLOBBER_FREED_MEMORY, wipe a shared buffer
> > > when its global pin count falls to zero.
> > 
> > Did a patch for this ever materialize?
> 
> I think the first part would be something like the attached.

Neat.  Does it produce any new complaints during "make installcheck"?




Re: [HACKERS] Restore-reliability mode

2015-07-23 Thread Alvaro Herrera
Noah Misch wrote:

> - Add buildfarm members.  This entails reporting any bugs that prevent an
>   initial passing run.  Once you have a passing run, schedule regular runs.
>   Examples of useful additions:
>   - "./configure ac_cv_func_getopt_long=no, ac_cv_func_snprintf=no ..." to
> enable all the replacement code regardless of the current platform's need
> for it.  This helps distinguish "Windows bug" from "replacement code bug."
>   - --disable-integer-datetimes, --disable-float8-byval, --disable-float4-byval,
> --disable-spinlocks, --disable-atomics, --disable-thread-safety,
> --disable-largefile, #define RANDOMIZE_ALLOCATED_MEMORY

  #define RELCACHE_FORCE_RELEASE + #define CLOBBER_FREED_MEMORY

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Restore-reliability mode

2015-07-23 Thread Alvaro Herrera
Peter Geoghegan wrote:
> On Sat, Jun 6, 2015 at 12:58 PM, Noah Misch  wrote:
> >   - Call VALGRIND_MAKE_MEM_NOACCESS() on a shared buffer when its local pin
> > count falls to zero.  Under CLOBBER_FREED_MEMORY, wipe a shared buffer
> > when its global pin count falls to zero.
> 
> Did a patch for this ever materialize?

I think the first part would be something like the attached.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index e4b25587..83fde10 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -47,6 +47,7 @@
 #include "storage/proc.h"
 #include "storage/smgr.h"
 #include "storage/standby.h"
+#include "utils/memdebug.h"
 #include "utils/rel.h"
 #include "utils/resowner_private.h"
 #include "utils/timestamp.h"
@@ -1438,6 +1439,9 @@ PinBuffer(volatile BufferDesc *buf, BufferAccessStrategy strategy)
 		ref = NewPrivateRefCountEntry(b + 1);
 
 		LockBufHdr(buf);
+
+		VALGRIND_MAKE_MEM_DEFINED(BufHdrGetBlock(buf), BLCKSZ);
+
 		buf->refcount++;
 		if (strategy == NULL)
 		{
@@ -1498,6 +1502,8 @@ PinBuffer_Locked(volatile BufferDesc *buf)
 	 */
 	Assert(GetPrivateRefCountEntry(b + 1, false) == NULL);
 
+	VALGRIND_MAKE_MEM_DEFINED(BufHdrGetBlock(buf), BLCKSZ);
+
 	buf->refcount++;
 	UnlockBufHdr(buf);
 
@@ -1543,6 +1549,8 @@ UnpinBuffer(volatile BufferDesc *buf, bool fixOwner)
 		Assert(buf->refcount > 0);
 		buf->refcount--;
 
+		VALGRIND_MAKE_MEM_NOACCESS(BufHdrGetBlock(buf), BLCKSZ);
+
 		/* Support LockBufferForCleanup() */
 		if ((buf->flags & BM_PIN_COUNT_WAITER) &&
 			buf->refcount == 1)
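
A standalone sketch of the pairing the patch relies on: tell Memcheck the
page is readable when this backend takes its first pin, and off-limits
again once its local pin count is back to zero.  ToyLocalPin,
toy_local_pin() and toy_local_unpin() are hypothetical names, and the
"accessible" field exists only so the behaviour can be checked without
valgrind.  The no-op fallback macros follow the same approach as
utils/memdebug.h when USE_VALGRIND is not defined.

```c
#include <assert.h>
#include <stddef.h>

#ifdef USE_VALGRIND
#include <valgrind/memcheck.h>
#else
/* Without Valgrind the client requests compile to nothing. */
#define VALGRIND_MAKE_MEM_DEFINED(addr, size)   do {} while (0)
#define VALGRIND_MAKE_MEM_NOACCESS(addr, size)  do {} while (0)
#endif

#define TOY_BLCKSZ 8192         /* stand-in for BLCKSZ */

/* Hypothetical per-backend view of one pinned shared page. */
typedef struct ToyLocalPin
{
    unsigned char *block;           /* the shared page */
    int            local_refcount;  /* this backend's pins only */
    int            accessible;      /* mirrors what Memcheck was told */
} ToyLocalPin;

/* First local pin makes the page accessible again. */
static void
toy_local_pin(ToyLocalPin *ref)
{
    if (ref->local_refcount++ == 0)
    {
        VALGRIND_MAKE_MEM_DEFINED(ref->block, TOY_BLCKSZ);
        ref->accessible = 1;
    }
}

/* Last local unpin makes any further access a Memcheck error. */
static void
toy_local_unpin(ToyLocalPin *ref)
{
    assert(ref->local_refcount > 0);
    if (--ref->local_refcount == 0)
    {
        VALGRIND_MAKE_MEM_NOACCESS(ref->block, TOY_BLCKSZ);
        ref->accessible = 0;
    }
}
```

Keying the NOACCESS call on the local count matters: other backends may
still hold pins, but this backend has no business touching the page anymore.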



Re: [HACKERS] Restore-reliability mode

2015-06-10 Thread Andres Freund
On 2015-06-10 01:57:22 -0400, Noah Misch wrote:
> I think I agree with everything after your first sentence.  I liked your
> specific proposal to split StartupXLOG(), but making broad-appeal
> restructuring proposals is hard.  I doubt we would get good results by casting
> a wide net for restructuring ideas.

I don't mean that we should actively strive to find as many things to
refactor as possible (yes, I over-emphasized that a bit). But we shouldn't
skip refactoring when we notice something structurally bad just because
it's been that way and we don't want to touch something "working". That
argument has, e.g., been made repeatedly for the contents of xlog.c.

My feeling is that we're reaching the stage where a significant number
of bugs are added because code is structured in a "needlessly" complicated
and/or repetitive way. And better testing can only catch so much - often
enough somebody has to think of all the possible corner cases.

> Automated testing has a lower barrier to
> entry and is far less liable to make things worse instead of better.  I can
> hope for good results from a TestSuiteFest, but not from a RestructureFest.
> That said, if folks initiate compelling restructure proposals, we should be
> willing to risk bugs from them like we risk bugs to acquire new
> features.

Sure, more testing and more review are each good in their own right.
Testing in particular makes refactoring much more realistic.

Greetings,

Andres Freund




Re: [HACKERS] Restore-reliability mode

2015-06-09 Thread Noah Misch
On Wed, Jun 03, 2015 at 04:18:37PM +0200, Andres Freund wrote:
> On 2015-06-03 09:50:49 -0400, Noah Misch wrote:
> > Second, I would define the subject matter as "bug fixes, testing and
> > review", not "restructuring, testing and review."  Different code
> > structures are clearest to different hackers.  Restructuring, on
> > average, adds bugs even more quickly than feature development adds
> > them.
> 
> I can't agree with this. While I agree with not doing large
> restructuring for 9.5, I think we can't afford not to refactor for
> clarity, even if that introduces bugs. Noticeable parts of our code have
> to frequently be modified for new features and are badly structured at
> the same time. While restructuring may temporarily increase the
> number of bugs in the short term, it'll decrease the number of bugs long
> term while increasing the number of potential contributors and new
> features.  That's obviously not to say we should just refactor for the
> sake of it.

I think I agree with everything after your first sentence.  I liked your
specific proposal to split StartupXLOG(), but making broad-appeal
restructuring proposals is hard.  I doubt we would get good results by casting
a wide net for restructuring ideas.  Automated testing has a lower barrier to
entry and is far less liable to make things worse instead of better.  I can
hope for good results from a TestSuiteFest, but not from a RestructureFest.
That said, if folks initiate compelling restructure proposals, we should be
willing to risk bugs from them like we risk bugs to acquire new features.




Re: [HACKERS] Restore-reliability mode

2015-06-08 Thread Bruce Momjian
On Mon, Jun  8, 2015 at 07:48:36PM +0200, Andres Freund wrote:
> On 2015-06-08 13:44:05 -0400, Bruce Momjian wrote:
> > I understand the overreaction/underreaction debate.  Here were my goals
> > in this discussion:
> > 
> > 1.  stop worrying about the 9.5 timeline so we could honestly assess our
> > software - *done*
> > 2.  seriously address multi-xact issues without 9.5/commit-fest pressure -
> > *in process*
> > 3.  identify any other areas in need of serious work
> > 
> > While I like the list you provided, I don't think we can be effective in
> > an environment where we assume every big new feature will have problems
> > like multi-xact.  For example, we have not seen destabilization from any
> > major 9.4 features, that I can remember anyway.
> > 
> > Unless there is consensus about new areas for #3, I am thinking we will
> > continue looking at multi-xact until we are happy, then move ahead with
> > 9.5 items in the way we have before.
> 
> I think one important part is that we (continue to?) regularly tell our
> employers that work on pre-commit, post-commit review, and refactoring
> are critical for their long term business prospects.  My impression so
> far is that the employer side hasn't widely realized that fact, and
> that many contributors do the review etc. part in their spare time.

Agreed.  My bet is that more employers realize it now than they did a
few months ago --- kind of hard to miss all those minor releases and
customer complaints.  :-(

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Restore-reliability mode

2015-06-08 Thread Andres Freund
On 2015-06-08 13:44:05 -0400, Bruce Momjian wrote:
> I understand the overreaction/underreaction debate.  Here were my goals
> in this discussion:
> 
> 1.  stop worrying about the 9.5 timeline so we could honestly assess our
> software - *done*
> 2.  seriously address multi-xact issues without 9.5/commit-fest pressure -
> *in process*
> 3.  identify any other areas in need of serious work
> 
> While I like the list you provided, I don't think we can be effective in
> an environment where we assume every big new feature will have problems
> like multi-xact.  For example, we have not seen destabilization from any
> major 9.4 features, that I can remember anyway.
> 
> Unless there is consensus about new areas for #3, I am thinking we will
> continue looking at multi-xact until we are happy, then move ahead with
> 9.5 items in the way we have before.

I think one important part is that we (continue to?) regularly tell our
employers that work on pre-commit, post-commit review, and refactoring
are critical for their long term business prospects.  My impression so
far is that the employer side hasn't widely realized that fact, and
that many contributors do the review etc. part in their spare time.

Andres




Re: [HACKERS] Restore-reliability mode

2015-06-08 Thread Bruce Momjian
On Sat, Jun  6, 2015 at 03:58:05PM -0400, Noah Misch wrote:
> On Fri, Jun 05, 2015 at 08:25:34AM +0100, Simon Riggs wrote:
> > This whole idea of "feature development" vs reliability is bogus. It
> > implies people that work on features don't care about reliability. Given
> > the fact that many of the features are actually about increasing database
> > reliability in the event of crashes and corruptions it just makes no sense.
> 
> I'm contrasting work that helps to keep our existing promises ("reliability")
> with work that makes new promises ("features").  In software development, we
> invariably hazard old promises to make new promises; our success hinges on
> electing neither too little nor too much risk.  Two years ago, PostgreSQL's
> track record had placed it in a good position to invest in new, high-risk,
> high-reward promises.  We did that, and we emerged solvent yet carrying an
> elevated debt service ratio.  It's time to reduce risk somewhat.
> 
> You write about a different sense of "reliability."  (Had I anticipated this
> misunderstanding, I might have written "Restore-probity mode.")  None of this
> was about classifying people, most of whom allocate substantial time to each
> kind of work.
> 
> > How will we participate in cleanup efforts? How do we know when something
> > has been "cleaned up", how will we measure our success or failure? I think
> > we should be clear that wasting N months on cleanup can *fail* to achieve a
> > useful objective. Without a clear plan it almost certainly will do so. The
> > flip side is that wasting N months will cause great amusement and dancing
> > amongst those people who wish to pull ahead of our open source project and
> > we should take care not to hand them a victory from an overreaction.
> 
> I agree with all that.  We should likewise take care not to become insolvent
> from an underreaction.

I understand the overreaction/underreaction debate.  Here were my goals
in this discussion:

1.  stop worrying about the 9.5 timeline so we could honestly assess our
software - *done*
2.  seriously address multi-xact issues without 9.5/commit-fest pressure -
*in process*
3.  identify any other areas in need of serious work

While I like the list you provided, I don't think we can be effective in
an environment where we assume every big new feature will have problems
like multi-xact.  For example, we have not seen destabilization from any
major 9.4 features, that I can remember anyway.

Unless there is consensus about new areas for #3, I am thinking we will
continue looking at multi-xact until we are happy, then move ahead with
9.5 items in the way we have before.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Restore-reliability mode

2015-06-07 Thread Peter Geoghegan
On Sat, Jun 6, 2015 at 12:58 PM, Noah Misch  wrote:
>   - Call VALGRIND_MAKE_MEM_NOACCESS() on a shared buffer when its local pin
> count falls to zero.  Under CLOBBER_FREED_MEMORY, wipe a shared buffer
> when its global pin count falls to zero.

Did a patch for this ever materialize?


-- 
Peter Geoghegan




Re: [HACKERS] Restore-reliability mode

2015-06-06 Thread Michael Paquier
On Sun, Jun 7, 2015 at 4:58 AM, Noah Misch  wrote:
> - Write, review and commit more automated test machinery to PostgreSQL.  Test
>   whatever excites you.  If you need ideas, Craig posted some good ones
>   upthread.  Here are a few more:
>   - Improve TAP suite (src/test/perl/TestLib.pm) logging.  Currently, these
> suites redirect much output to /dev/null.  Instead, log that output and
> teach the buildfarm to capture the log.

We can capture the logs and redirect them by replacing
system_or_bail() with more calls to IPC::Run. That would be a simple
enough patch. pg_rewind's tests should be switched to use that as
well.
-- 
Michael




Re: [HACKERS] Restore-reliability mode

2015-06-06 Thread Noah Misch
On Fri, Jun 05, 2015 at 08:25:34AM +0100, Simon Riggs wrote:
> This whole idea of "feature development" vs reliability is bogus. It
> implies people that work on features don't care about reliability. Given
> the fact that many of the features are actually about increasing database
> reliability in the event of crashes and corruptions it just makes no sense.

I'm contrasting work that helps to keep our existing promises ("reliability")
with work that makes new promises ("features").  In software development, we
invariably hazard old promises to make new promises; our success hinges on
electing neither too little nor too much risk.  Two years ago, PostgreSQL's
track record had placed it in a good position to invest in new, high-risk,
high-reward promises.  We did that, and we emerged solvent yet carrying an
elevated debt service ratio.  It's time to reduce risk somewhat.

You write about a different sense of "reliability."  (Had I anticipated this
misunderstanding, I might have written "Restore-probity mode.")  None of this
was about classifying people, most of whom allocate substantial time to each
kind of work.

> How will we participate in cleanup efforts? How do we know when something
> has been "cleaned up", how will we measure our success or failure? I think
> we should be clear that wasting N months on cleanup can *fail* to achieve a
> useful objective. Without a clear plan it almost certainly will do so. The
> flip side is that wasting N months will cause great amusement and dancing
> amongst those people who wish to pull ahead of our open source project and
> we should take care not to hand them a victory from an overreaction.

I agree with all that.  We should likewise take care not to become insolvent
from an underreaction.

> So let's do our normal things, not do a "total stop" for an indefinite
> period. If someone has specific things that in their opinion need to be
> addressed, list them and we can talk about doing them, together.

I recommend these four exit criteria:

1. Non-author committer review of foreign key locks/multixact durability.
   Done when that committer certifies, as if he were committing the patch
   himself today, that the code will not eat data.

2. Non-author committer review of row-level security.  Done when that
   committer certifies that the code keeps its promises and that the
   documentation bounds those promises accurately.

3. Second committer review of the src/backend/access changes for INSERT ... ON
   CONFLICT DO NOTHING/UPDATE.  (Bugs affecting folks who don't use the new
   syntax are most likely to fall in that portion.)  Unlike the previous two
   criteria, a review without certification is sufficient.

4. Non-author committer certifying that the 9.5 WAL format changes will not
   eat your data.  The patch lists Andres and Alvaro as reviewers; if they
   already reviewed it enough to make that certification, this one is easy.

That ties up four people.  For everyone else:

- Fix bugs those reviews find.  This will start slow but will grow to keep
  everyone busy.  Committers won't certify code, and thus we can't declare
  victory, until these bugs are fixed.  The rest of this list, in contrast,
  calls out topics to sample from, not topics to exhaust.

- Turn current buildfarm members green.

- Write, review and commit more automated test machinery to PostgreSQL.  Test
  whatever excites you.  If you need ideas, Craig posted some good ones
  upthread.  Here are a few more:
  - Add a debug mode that calls sched_yield() in SpinLockRelease(); see
6322.1406219...@sss.pgh.pa.us.
  - Improve TAP suite (src/test/perl/TestLib.pm) logging.  Currently, these
suites redirect much output to /dev/null.  Instead, log that output and
teach the buildfarm to capture the log.
  - Call VALGRIND_MAKE_MEM_NOACCESS() on a shared buffer when its local pin
count falls to zero.  Under CLOBBER_FREED_MEMORY, wipe a shared buffer
when its global pin count falls to zero.
  - With assertions enabled, or perhaps in a new debug mode, have
pg_do_encoding_conversion() and pg_server_to_any() check the data for a
no-op conversion instead of assuming the data is valid.
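
To illustrate the first idea in this sub-list, here is a toy sketch of a
spinlock whose release path yields the CPU when a debug switch is on,
presumably to hand waiters the lock at the most inconvenient moment and so
widen race windows.  It is not PostgreSQL's s_lock.h: toy_slock_t,
toy_spin_acquire(), toy_spin_release() and TOY_SPIN_YIELD_ON_RELEASE are
hypothetical names, and toy_yield_count exists only to make the behaviour
observable.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>

#define TOY_SPIN_YIELD_ON_RELEASE   /* hypothetical debug switch */

typedef atomic_flag toy_slock_t;

static int toy_yield_count = 0;     /* instrumentation for the example only */

/* Spin until the flag was previously clear, i.e. we now own the lock. */
static void
toy_spin_acquire(toy_slock_t *lock)
{
    while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
        ;                           /* busy-wait */
}

/* Release the lock; in debug mode, immediately give up the CPU. */
static void
toy_spin_release(toy_slock_t *lock)
{
    atomic_flag_clear_explicit(lock, memory_order_release);
#ifdef TOY_SPIN_YIELD_ON_RELEASE
    sched_yield();                  /* let a waiter run right now */
    toy_yield_count++;
#endif
}
```

With the switch off, the release path is just the atomic clear, so the debug
mode costs nothing in normal builds.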

- Add buildfarm members.  This entails reporting any bugs that prevent an
  initial passing run.  Once you have a passing run, schedule regular runs.
  Examples of useful additions:
  - "./configure ac_cv_func_getopt_long=no, ac_cv_func_snprintf=no ..." to
enable all the replacement code regardless of the current platform's need
for it.  This helps distinguish "Windows bug" from "replacement code bug."
  - --disable-integer-datetimes, --disable-float8-byval, --disable-float4-byval,
--disable-spinlocks, --disable-atomics, --disable-thread-safety,
--disable-largefile, #define RANDOMIZE_ALLOCATED_MEMORY
  - Any OS or CPU architecture other than x86 GNU/Linux, even ones already
represented.

- Write, review and commit fixes for the bugs that come to light by way of
  these new automated 

Re: [HACKERS] Restore-reliability mode

2015-06-05 Thread Simon Riggs
On 3 June 2015 at 14:50, Noah Misch  wrote:

> Subject changed from "Re: [CORE] postpone next week's release".
>
> On Sat, May 30, 2015 at 10:48:45PM -0400, Bruce Momjian wrote:
> > Well, I think we stop what we are doing, focus on restructuring,
> > testing, and reviewing areas that historically have had problems, and
> > when we are done, we can look to go to 9.5 beta.  What we don't want to
> > do is to push out more code and get back into a
> > whack-a-bug-as-they-are-found mode, which obviously did not serve us well
> > for multi-xact, and which is what releasing a beta will do, and of
> > course, more commit-fests, and more features.
> >
> > If we have to totally stop feature development until we are all happy
> > with the code we have, so be it.  If people feel they have to get into
> > cleanup mode or they will never get to add a feature to Postgres again,
> > so be it.  If people say, heh, I am not going to do anything and just
> > come back when cleanup is done (by someone else), then we will end up
> > with a smaller but more dedicated development team, and I am fine with
> > that too.  I am suggesting that until everyone is happy with the code we
> > have, we should not move forward.
>
> I like the essence of this proposal.  Two suggestions.  We can't achieve or
> even robustly measure "everyone is happy with the code," so let's pick
> concrete exit criteria.  Given criteria framed like "Files A,B,C and patches
> X,Y,Z have a sign-off from a committer other than their original committer,"
> anyone can monitor progress and find specific ways to contribute.


I don't like the proposal, nor do I like the follow-on comments made.

This whole idea of "feature development" vs reliability is bogus. It
implies people that work on features don't care about reliability. Given
the fact that many of the features are actually about increasing database
reliability in the event of crashes and corruptions it just makes no sense.

How will we participate in cleanup efforts? How do we know when something
has been "cleaned up", how will we measure our success or failure? I think
we should be clear that wasting N months on cleanup can *fail* to achieve a
useful objective. Without a clear plan it almost certainly will do so. The
flip side is that wasting N months will cause great amusement and dancing
amongst those people who wish to pull ahead of our open source project and
we should take care not to hand them a victory from an overreaction.

Lastly, the idea that we allow developers to drift away and we're OK with
that is just plain mad. I've spent a decade trying to grow the pool of
skilled developers who can assist the project. Acting against that, in deed
or just word, is highly counterproductive for the project.

Let's just take a breath and think about this.

It is normal for us to spend a month or so consolidating our work. It is
also normal for people that see major problems to call them out,
effectively using the "Stop The Line" technique.
https://leanbuilds.wordpress.com/tag/stop-the-line/

So let's do our normal things, not do a "total stop" for an indefinite
period. If someone has specific things that in their opinion need to be
addressed, list them and we can talk about doing them, together. I thought
that was what the Open Items list was for. Let's use it.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Restore-reliability mode

2015-06-03 Thread Joshua D. Drake

On 06/03/2015 07:18 AM, Andres Freund wrote:
> On 2015-06-03 09:50:49 -0400, Noah Misch wrote:
> > Second, I would define the subject matter as "bug fixes, testing and
> > review", not "restructuring, testing and review."  Different code
> > structures are clearest to different hackers.  Restructuring, on
> > average, adds bugs even more quickly than feature development adds
> > them.
>
> I can't agree with this. While I agree with not doing large
> restructuring for 9.5, I think we can't afford not to refactor for
> clarity, even if that introduces bugs. Noticeable parts of our code have
> to frequently be modified for new features and are badly structured at
> the same time. While restructuring may temporarily increase the
> number of bugs in the short term, it'll decrease the number of bugs long
> term while increasing the number of potential contributors and new
> features.  That's obviously not to say we should just refactor for the
> sake of it.



Our project has been continuing to increase momentum over the last few 
years and our adoption has increased at an amazing rate. It is important 
to remember that we have users. These users have needs that must be met 
else those users will move on to a different technology.


I agree that we need to postpone this release. I also agree that there 
is likely re-factoring to be done. I have also never met a programmer 
who doesn't think something needs to be re-factored. The majority of 
programmers I know all suffer from NIH and want to change how things are 
implemented.


If we are going to re-factor, it should not be considered global and 
should be attacked with specific goals in mind. If those goals are not 
specifically defined and agreed on, we will get very pretty code with 
very little use for our users. Then our users will leave because they 
are busy waiting on us to re-factor.


In short, we must balance this effort with the needs of the code versus 
the needs of our users.


Sincerely,

JD

--
The most kicking donkey PostgreSQL Infrastructure company in existence.
The oldest, the most experienced, the consulting company to the stars.
Command Prompt, Inc. http://www.commandprompt.com/ +1 -503-667-4564 -
24x7 - 365 - Proactive and Managed Professional Services!




Re: [HACKERS] Restore-reliability mode

2015-06-03 Thread Andres Freund
On 2015-06-03 09:50:49 -0400, Noah Misch wrote:
> Second, I would define the subject matter as "bug fixes, testing and
> review", not "restructuring, testing and review."  Different code
> structures are clearest to different hackers.  Restructuring, on
> average, adds bugs even more quickly than feature development adds
> them.

I can't agree with this. While I agree with not doing large
restructuring for 9.5, I think we can't afford not to refactor for
clarity, even if that introduces bugs. Noticeable parts of our code have
to frequently be modified for new features and are badly structured at
the same time. While restructuring may temporarily increase the
number of bugs in the short term, it'll decrease the number of bugs long
term while increasing the number of potential contributors and new
features.  That's obviously not to say we should just refactor for the
sake of it.




Re: [HACKERS] Restore-reliability mode

2015-06-03 Thread Geoff Winkless
On 3 June 2015 at 14:50, Noah Misch  wrote:

> I would define the subject matter as "bug fixes, testing and review", not
> "restructuring, testing and review."  Different code structures are clearest
> to different hackers.  Restructuring, on average, adds bugs even more quickly
> than feature development adds them.

+1 to this. Rewriting or restructuring code because you don't trust it
(even though you have no reported real-world bugs) is a terrible idea.

Stopping all feature development to do it is even worse.

I know you're not talking about rewriting, but I think
http://www.joelonsoftware.com/articles/fog69.html is always worth a
re-read, if only because it's funny :)

I would always 100% support a decision to push back new releases because of
bugfixes for *known* issues, but if you think you *might* be able to find
bugs in code you don't like, you should do that on your own time. Iff you
find actual bugs, *then* you talk about halting new releases.

Geoff


[HACKERS] Restore-reliability mode

2015-06-03 Thread Noah Misch
Subject changed from "Re: [CORE] postpone next week's release".

On Sat, May 30, 2015 at 10:48:45PM -0400, Bruce Momjian wrote:
> Well, I think we stop what we are doing, focus on restructuring,
> testing, and reviewing areas that historically have had problems, and
> when we are done, we can look to go to 9.5 beta.  What we don't want to
> do is to push out more code and get back into a
> whack-a-bug-as-they-are-found mode, which obviously did not serve us well
> for multi-xact, and which is what releasing a beta will do, and of
> course, more commit-fests, and more features.  
> 
> If we have to totally stop feature development until we are all happy
> with the code we have, so be it.  If people feel they have to get into
> cleanup mode or they will never get to add a feature to Postgres again,
> so be it.  If people say, heh, I am not going to do anything and just
> come back when cleanup is done (by someone else), then we will end up
> with a smaller but more dedicated development team, and I am fine with
> that too.  I am suggesting that until everyone is happy with the code we
> have, we should not move forward.

I like the essence of this proposal.  Two suggestions.  We can't achieve or
even robustly measure "everyone is happy with the code," so let's pick
concrete exit criteria.  Given criteria framed like "Files A,B,C and patches
X,Y,Z have a sign-off from a committer other than their original committer,"
anyone can monitor progress and find specific ways to contribute.  Second, I
would define the subject matter as "bug fixes, testing and review", not
"restructuring, testing and review."  Different code structures are clearest
to different hackers.  Restructuring, on average, adds bugs even more quickly
than feature development adds them.

Thanks,
nm

