Re: [HACKERS] Fix Error Message for allocate_recordbuf() Failure

2016-07-12 Thread Shoaib Lari
Thank you for the feedback.

We had encountered this error message when allocating a record buf of 344
bytes.  The message "record length 344 at %X/%X too long" along with the
comment /* We treat this as a "bogus data" condition */  masked the OOM
condition, implying an error in log record size calculation logic.

The actual allocation that failed was for 5 * Max(BLCKSZ, XLOG_BLCKSZ), a
much larger allocation.

Shoaib


On Sun, Jul 10, 2016 at 10:58 PM, Craig Ringer <cr...@2ndquadrant.com>
wrote:

>
>
> On 11 July 2016 at 12:04, Michael Paquier <michael.paqu...@gmail.com>
> wrote:
>
>> On Sat, Jul 9, 2016 at 2:58 AM, Shoaib Lari <sl...@pivotal.io> wrote:
>> > Besides making the error message more informative, we had to modify
>> > allocate_recordbuf() to return the actual number of bytes that were
>> being
>> > allocated.
>>
>> -   report_invalid_record(state, "record length %u at %X/%X too long",
>> - total_len,
>> - (uint32) (RecPtr >> 32), (uint32) RecPtr);
>> +   report_invalid_record(state,
>> + "cannot allocate %u bytes for record
>> length %u at %X/%X",
>> + newSizeOut, total_len, (uint32) (RecPtr >>
>> 32),
>> + (uint32) RecPtr);
>>
>> It does not look like a good idea to me to complicate the interface of
>> allocate_recordbuf just to make more verbose one error message,
>> meaning that it basically a complain related to the fact that
>> palloc_extended(MCXT_ALLOC_NO_OOM) does not mention to the user the
>> size that it has tried to allocate before returning NULL. Do you have
>> a use case that would prove to be useful if this extra piece of
>> information is provided? Because it seems to me that we don't really
>> care if we know how much memory it has failed to allocate, we only
>> want to know that it *has* failed and take actions only based on that.
>>
>
> I actually find details of how much memory we tried to allocate to be
> really useful in other places that do emit it. Sometimes it's been an
> important clue when trying to figure out what's going on on a remote system
> with no or limited access. Even if it just tells me "we were probably
> incrementally allocating lots of small values since this failure is small"
> vs "this allocation is huge, look for something unusually large" or "this
> allocation is impossibly huge / some suspicious value look for memory
> corruption".
>
> I tend to agree with your suggestion about a better approach but do think
> emitting details on allocation failures is useful. At least when people
> turn VM overcommit off ...
>
> --
>  Craig Ringer   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>


[HACKERS] Fix Error Message for allocate_recordbuf() Failure

2016-07-08 Thread Shoaib Lari
Hello,

Attached is a patch for xlogreader.c for a more informative error message
for allocate_recordbuf() failure.

The patch details are:


   - Project name.: None



   - Uniquely identifiable file name, so we can tell difference between
   your v1 and v24.:  src/backend/access/transam/xlogreader.c



   - What the patch does in a short paragraph.:

The patch enhances the error message when memory cannot be allocated to
extend the readRecordBuf.  The previous error message was implying an
invalid (too long) record length and also did not specify how much memory
we were trying to allocate when the memory allocation failure happened.

We have observed failed allocations in production systems where the
reclength was only 344 bytes, but the extended allocation of 5 *
Max(BLKSIZE, XLOG_BLCKSZ) failed due to out of memory.



   - Whether the patch is for discussion or for application (see WIP notes
   below): for application.



   - Which branch the patch is against (ordinarily this will be *master*).
   For more on branches in PostgreSQL, see Using Back Branches
   <https://wiki.postgresql.org/wiki/Working_with_Git#Using_Back_Branches>.:
master



   - Whether it compiles and tests successfully, so we know nothing obvious
   is broken.:  Yes.  We ran make check and all 166 tests passed.



   - Whether it contains any platform-specific items and if so, has it been
   tested on other platforms.:  Nothing platform specific.



   - Confirm that the patch includes regression tests
   <https://wiki.postgresql.org/wiki/Regression_test_authoring> to check
   the new feature actually works as described.:  None.



   - Include documentation on how to use the new feature, including
   examples. See the documentation documentation
   <http://www.postgresql.org/docs/current/static/docguide.html> for more
   information.:  None needed.



   - Describe the effect your patch has on performance, if any.:  None.



   - Try to include a few lines about why you chose to do things particular
   ways, rather than let your reviewer guess what was happening. This can be
   done as code comments, but it might also be an additional reviewers' guide,
   or additions to a README file in one of the code directories.:

Besides making the error message more informative, we had to modify
allocate_recordbuf() to return the actual number of bytes that were being
allocated.



   - If your patch addresses a Todo
<https://wiki.postgresql.org/wiki/Todo> item,
   please give the name of the Todo item in your email. This is so that the
   reviewers will know that the item needs to be marked as done if your patch
   is applied.:  None.



Thank you.

Shoaib Lari


fix_error_message_for_allocate_recordbuf_failure.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers