I'm working on PostgreSQL 9.13 (waiting for the admin to push upgrades next week). In the meantime, I was curious whether there are any known bugs regarding large cursor fetches, or whether I am to blame.
My cursor has 400 million records, and I'm fetching in blocks of 2^17 (approx. 130K). When I fetch the next block after processing the 48,889,856th record, the DB segfaults. It should be noted that I have processed tables with 23 million+ records several times and everything appears to work fine. Watching top, system memory usage climbs to 97.6% (from approximately 30 million records onward, then sways up and down), and the backend ultimately crashes when I try to get past the 48,889,856th record. I have tried various block sizes: anything greater than 2^17 crashes at the fetch that would carry it past 48,889,856 records, 2^16 hits the same sweet spot, and anything less than 2^16 actually crashes slightly earlier (noted in the comments in the code below). To me it looks like an obvious memory leak; the question is who caused it. I would normally assume I am to blame (and I may be), but the code is so simple (shown below) that I can't see how it could be me, unless I am misusing SPI (which is entirely possible).

Here is the code segment that is crashing:

<code>
// Cursor variables
const char *cursor_name = NULL;                 // Postgres will self-assign a name
const int   arg_count   = 0;                    // No arguments will be passed
Oid        *arg_types   = NULL;                 // n/a
Datum      *arg_values  = NULL;                 // n/a
const char *null_args   = NULL;                 // n/a
bool        read_only   = true;                 // read_only allows for optimization
const int   cursor_opts = CURSOR_OPT_NO_SCROLL; // default cursor options
bool        forward     = true;

//const long fetch_count = FETCH_ALL;
//const long fetch_count = 1048576;  // 2^20 - last processed = 48,234,496
//const long fetch_count = 524288;   // 2^19 - last processed = 48,758,784
//const long fetch_count = 262144;   // 2^18 - last processed = 48,758,784
const long fetch_count = 131072;     // 2^17 - last processed = 48,889,856
//const long fetch_count = 65536;    // 2^16 - last processed = 48,889,856
//const long fetch_count = 32768;    // 2^15 - last processed = 48,857,088
//const long fetch_count = 16384;    // 2^14 - last processed = 48,791,552
//const long fetch_count = 8192;     // 2^13 - last processed = 48,660,480
//const long fetch_count = 4096;     // 2^12 - last processed = 48,398,336
//const long fetch_count = 2048;     // 2^11
//const long fetch_count = 1024;     // 2^10
//const long fetch_count = 512;      // 2^9
//const long fetch_count = 256;      // 2^8
//const long fetch_count = 128;      // 2^7
//const long fetch_count = 64;       // 2^6
//const long fetch_count = 32;       // 2^5
//const long fetch_count = 16;       // 2^4
//const long fetch_count = 8;        // 2^3
//const long fetch_count = 4;        // 2^2
//const long fetch_count = 2;        // 2^1
//const long fetch_count = 1;        // 2^0

unsigned int i, j, end, stored;
unsigned int result_counter = 0;
float8 l1_norm;
bool is_null = true;
bool nulls[4];
Datum result_tuple_datum[4];
HeapTuple new_tuple;
MemoryContext function_context;
ResultCandidate *candidate, **candidates, *top, *free_candidate = NULL;
KSieve<ResultCandidate *> sieve(result_cnt_);

/*********************/
/** Init SPI_cursor **/
/*********************/

// Connect to SPI
if ( SPI_connect() != SPI_OK_CONNECT ) {
    return;
}

// Prepare and open SPI cursor
Portal signature_cursor = SPI_cursor_open_with_args(cursor_name, sql_stmt_,
                                                    arg_count, arg_types,
                                                    arg_values, null_args,
                                                    read_only, cursor_opts);

do {
    // Fetch rows for processing (populates SPI_processed and SPI_tuptable)
    SPI_cursor_fetch(signature_cursor, forward, fetch_count);

    /************************/
    /** Process SPI_cursor **/
    /************************/

    // Iterate cursor and perform calculations
    for (i = 0 ; i < SPI_processed ; ++i) {
        // Transfer columns to work array
        for ( j = 1 ; j < 4 ; ++j ) {
            result_tuple_datum[j-1] = SPI_getbinval(SPI_tuptable->vals[i],
                                                    SPI_tuptable->tupdesc,
                                                    j, &is_null);
            nulls[j-1] = is_null;
        }

        // Special handling for final column
        Datum raw_double_array = SPI_getbinval(SPI_tuptable->vals[i],
                                               SPI_tuptable->tupdesc,
                                               4, &is_null);
        nulls[3] = is_null;

        if ( is_null ) {
            l1_norm = FLT_MAX;
            result_tuple_datum[3] = PointerGetDatum(NULL);
        } else {
            // Transform binary into double array
            ArrayType *pg_double_array = DatumGetArrayTypeP(raw_double_array);
            l1_norm = meanAbsoluteError(signature_,
                                        (double *)ARR_DATA_PTR(pg_double_array),
                                        (ARR_DIMS(pg_double_array))[0], 0);
            result_tuple_datum[3] = Float8GetDatum(l1_norm);
        }

        // Create and test candidate
        if ( free_candidate ) {
            candidate = free_candidate;
            free_candidate = NULL;
        } else {
            candidate = (ResultCandidate *)palloc(sizeof(ResultCandidate));
        }

        (*candidate).lat = DatumGetFloat8(result_tuple_datum[0]);
        (*candidate).null_lat = nulls[0];
        (*candidate).lon = DatumGetFloat8(result_tuple_datum[1]);
        (*candidate).null_lon = nulls[1];
        (*candidate).orientation = DatumGetFloat8(result_tuple_datum[2]);
        (*candidate).null_orientation = nulls[2];
        (*candidate).rank = l1_norm;
        (*candidate).null_rank = nulls[3];

        // Run candidate through sieve
        top = sieve.top();
        if ( !sieve.siftItem(candidate) ) {
            // Free non-filtered candidates
            free_candidate = candidate;
        } else if ( sieve.size() == result_cnt_ ) {
            // Free non-filtered candidates
            free_candidate = top;
        }
    }

    result_counter += i;
} while ( SPI_processed );

SPI_finish();
</code>

Is there an obvious error I'm overlooking, or is there a known bug (PG9.13) for large fetch sizes?

Thanks,
Zak

P.S. KSieve is a POD encapsulating an array that has been allocated with palloc(); a rough sketch of it and ResultCandidate, as used above, follows.
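For reference, the two types look roughly like this; this is a trimmed, illustrative sketch (member layout and a few details simplified), not the exact source:

<code>
// Trimmed, illustrative sketch only - the real definitions carry more detail.
typedef struct ResultCandidate {
    float8 lat;          bool null_lat;
    float8 lon;          bool null_lon;
    float8 orientation;  bool null_orientation;
    float8 rank;         bool null_rank;   // rank is the l1_norm used for ordering
} ResultCandidate;

// Keeps the best `capacity` candidates seen so far; the backing array is
// allocated with palloc() in the constructor.
template <typename T>
class KSieve {
public:
    explicit KSieve(unsigned int capacity);
    T top() const;               // worst-ranked item currently retained
    bool siftItem(T item);       // true if item was retained, false if rejected
    unsigned int size() const;   // number of items currently retained
private:
    T *items_;                   // palloc'd array of at most `capacity` entries
    unsigned int size_;
    unsigned int capacity_;
};
</code>

The free_candidate handling in the loop just recycles the ResultCandidate that the sieve rejected (or evicted once it was full) so that a fresh struct is not palloc'd for every row.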