On Fri, Aug 2, 2019 at 3:12 PM John Naylor <john.nay...@2ndquadrant.com> wrote:
> On Tue, Jul 30, 2019 at 8:20 PM Binguo Bao <djydew...@gmail.com> wrote:
> >
> > On Mon, Jul 29, 2019 at 11:49 AM John Naylor <john.nay...@2ndquadrant.com> wrote:
> >>
> >> 1). For every needle comparison, text_position_next_internal()
> >> calculates how much of the value is needed and passes that to
> >> detoast_iterate(), which then calculates if it has to do something or
> >> not. This is a bit hard to follow. There might also be a performance
> >> penalty -- the following is just a theory, but it sounds plausible:
> >> The CPU can probably correctly predict that detoast_iterate() will
> >> usually return the same value it did last time, but it still has to
> >> call the function and make sure, which I imagine is more expensive
> >> than advancing the needle. Ideally, we want to call the iterator only
> >> if we have to.
> >>
> >> In the attached patch (applies on top of your v5),
> >> text_position_next_internal() simply compares hptr to the detoast
> >> buffer limit, and calls detoast_iterate() until it can proceed. I
> >> think this is clearer.
> >
> > Yes, I think this is a general scenario where the caller continually
> > calls detoast_iterate until it gets enough data, so I think such
> > operations can be extracted as a macro, as I did in patch v6. In the
> > macro, the detoast_iterate function is called only when the data
> > requested by the caller is beyond the buffer limit.
>
> I like the use of a macro here. However, I think we can find a better
> location for the definition. See the header comment of fmgr.h:
> "Definitions for the Postgres function manager and function-call
> interface." Maybe tuptoaster.h is as good a place as any?

PG_DETOAST_ITERATE isn't a typical function-call interface, but I notice
that PG_FREE_IF_COPY is also defined in fmgr.h, and its logic is similar to
PG_DETOAST_ITERATE: it checks a condition first and then decides whether to
call the function.
Besides, PG_DETOAST_DATUM, PG_DETOAST_DATUM_COPY, PG_DETOAST_DATUM_SLICE, and
PG_DETOAST_DATUM_PACKED are all defined in fmgr.h, so it is reasonable to keep
all the de-TOAST interfaces together.

> >> 2). detoast_iterate() and fetch_datum_iterate() return a value but we
> >> don't check it or do anything with it. Should we do something with it?
> >> It's also not yet clear if we should check the iterator state instead
> >> of return values. I've added some XXX comments as a reminder. We
> >> should also check the return value of pglz_decompress_iterate().
> >
> > IMO, we need to provide users with a simple iterative interface.
> > Comparing the required data pointer with the buffer limit is an easy way.
> > And the application scenarios of the iterator are mostly read operations.
> > So I think there is no need to return a value, and the iterator needs to
> > throw an exception for some wrong calls, such as all the data have been
> > iterated, but the user still calls the iterator.
>
> Okay, and see these functions now return void. The original
> pglz_decompress() returned a value that was checked against corruption.
> Is there a similar check we can do for the iterative version?

As far as I know, we can only do such a check after all the compressed data
has been decompressed. If we are slicing, we can't do the check.

> >> 3). Speaking of pglz_decompress_iterate(), I diff'd it with
> >> pglz_decompress(), and I have some questions on it:
> >>
> >> a).
> >> + srcend = (const unsigned char *) (source->limit == source->capacity
> >> ? source->limit : (source->limit - 4));
> >>
> >> What does the 4 here mean in this expression?
> >
> > Since we fetch chunks one by one, if we make srcend equal to the source
> > buffer limit, in the while loop "while (sp < srcend && dp < destend)", sp
> > may exceed the source buffer limit and read unallocated bytes.
>
> Why is this? That tells me the limit is incorrect. Can the setter not
> determine the right value?
>

There are three statements that change the value of `sp` in the loop
`while (sp < srcend && dp < destend)`:

`ctrl = *sp++;`
`off = ((sp[0] & 0xf0) << 4) | sp[1]; sp += 2;`
`len += *sp++`

Although we make sure `sp` is less than `srcend` when entering the while loop,
`sp` is likely to go beyond `srcend` inside the loop, and we must ensure that
`sp` always stays below `buf->limit` to avoid reading unallocated data. So
`srcend` can't be initialized to `buf->limit`. There is one exception: once
we've fetched all data chunks and `buf->limit` reaches `buf->capacity`, it's
impossible to read unallocated data via `sp`.

> > Giving a four-byte buffer can prevent sp from exceeding the source buffer
> > limit.
>
> Why 4? That's a magic number. Why not 2, or 27?

As I explained above, `sp` may go beyond `srcend` in the loop, up to
`srcend + 2`. In theory, any margin greater than or equal to 2 is OK.

> > If we have read all the chunks, we don't need to be careful to cross the
> > border, just make srcend equal to source buffer limit. I've added
> > comments to explain it in patch v6.
>
> That's a good thing to comment on, but it doesn't explain why.

Yes, the current comment is puzzling. I'll improve it.

> This
> logic seems like a band-aid and I think a committer would want this to
> be handled in a more principled way.

I don't want to change the pglz_decompress logic too much; the iterator should
focus on saving and restoring the original pglz_decompress state.

> >> Is it possible it's
> >> compensating for this bit in init_toast_buffer()?
> >>
> >> + buf->limit = VARDATA(buf->buf);
> >>
> >> It seems the initial limit should also depend on whether the datum is
> >> compressed, right? Can we just do this:
> >>
> >> + buf->limit = buf->position;
> >
> > I'm afraid not. buf->position points to the data portion of the buffer,
> > but the beginning of the chunks we read may contain header information.
> > For example, for compressed data chunks, the first four bytes record the
> > size of raw data, this means that limit is four bytes ahead of position.
> > This initialization doesn't cause errors, although the position is less
> > than the limit in other cases.
> > Because we always fetch chunks first, then decompress it.
>
> I see what you mean now. This could use a comment or two to explain
> the stated constraints may not actually be satisfied at
> initialization.

Done.

> >> b).
> >> - while (sp < srcend && dp < destend)
> >> ...
> >> + while (sp + 1 < srcend && dp < destend &&
> >> ...
> >>
> >> Why is it here "sp + 1"?
> >
> > Ignore it, I set the inactive state of detoast_iter->ctrl to 8 in patch
> > v6 to achieve the purpose of parsing ctrl correctly every time.
>
> Please explain further. Was the "sp + 1" correct behavior (and why),
> or only for debugging setting ctrl/c correctly?

In patch v5, if the condition is `sp < srcend`, suppose `sp = srcend - 1`
before entering the loop `while (sp < srcend && dp < destend)`. After entering
the loop and reading a control byte (`sp` now equals `srcend`), the program
can't enter the loop `for (; ctrlc < 8 && sp < srcend && dp < destend; ctrlc++)`;
it then sets `iter->ctrlc` to 0, exits the outer loop, and the iteration is
over. At the next iteration, a control byte is read again because
`iter->ctrlc` equals 0, so the previous control byte is never used. Changing
the condition to `sp + 1 < srcend` avoids ending an iteration after reading
only a control byte.

> Also, I don't think
> the new logic for the ctrl/c variables is an improvement:
>
> 1. iter->ctrlc is initialized with '8' (even in the uncompressed case,
> which is confusing). Any time you initialize with something not 0 or
> 1, it's a magic number, and here it's far from where the loop variable
> is used. This is harder to read.

`iter->ctrlc` is used to record the value of `ctrlc` in pglz_decompress at the
end of the last iteration (or loop).
In pglz_decompress, the valid values of `ctrlc` are 0~7. When `ctrlc` reaches
8, a control byte is read from the source buffer into `ctrl` and `ctrlc` is
reset to 0. A control byte should be read from the source buffer into `ctrl`
on the first iteration, so `iter->ctrlc` should be initialized with '8'.

> 2. First time through the loop, iter->ctrlc = 8, which immediately gets
> set back to 0.

As I explained above, `iter->ctrlc = 8` makes a control byte be read from the
source buffer into `ctrl` on the first iteration. Besides, `iter->ctrlc = 8`
indicates that no valid value of `ctrlc` was recorded at the end of the last
iteration; obviously, there are no iterations before the first one.

> 3. At the end of the loop, iter->ctrl/c are unconditionally set. In
> v5, there was a condition which would usually avoid this copying of
> values through pointers.

Patch v6 just records the value of `ctrlc` at the end of each iteration (or
loop), whether it is valid (0~7) or 8, and initializes `ctrlc` correctly on
the next iteration (or loop). I think patch v6 is more concise.

> >> 4. Note that varlena.c has a static state variable, and a cleanup
> >> function that currently does:
> >>
> >> static void
> >> text_position_cleanup(TextPositionState *state)
> >> {
> >> /* no cleanup needed */
> >> }
> >>
> >> It seems to be the detoast iterator could be embedded in this state
> >> variable, and then free-ing can happen here. That has a possible
> >> advantage that the iterator struct would be on the same cache line as
> >> the state data. That would also remove the need to pass "iter" as a
> >> parameter, since these functions already pass "state". I'm not sure if
> >> this would be good for other users of the iterator, so maybe we can
> >> hold off on that for now.
> >
> > Good idea. I've implemented it in patch v6.
>
> That's better, and I think we can take it a little bit farther.
>
> 1. Notice that TextPositionState is allocated on the stack in
> text_position(), which passes both the "state" pointer and the "iter"
> pointer to text_position_setup(), and only then sets state->iter =
> iter. We can easily set this inside text_position(). That would get
> rid of the need for other callers to pass NULL iter to
> text_position_setup().

Done.

> 2. DetoastIteratorData is fixed size, so I see no reason to allocate
> it on the heap. We could allocate it on the stack in text_pos(), and
> pass the pointer to create_detoast_iterator() (in this case maybe a
> better name is init_detoast_iterator), which would return a bool to
> tell text_pos() whether to pass down the pointer or a NULL. The
> allocation of other structs (toast buffer and fetch iterator) probably
> can't be changed without more work.

Done.

If there is anything else that is not explained clearly, please point it out.

--
Best regards,
Binguo Bao
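For illustration only (this is not code from the patch): the role of the `ctrlc = 8` sentinel discussed above can be shown with a toy decoder. The format here is an assumption invented for the sketch: one control byte governs up to 8 one-byte items, and a set bit means "toggle the letter case" instead of a pglz back-reference. The names `DecState`, `dec_init`, `dec_feed`, `decode_in_pieces` and `check_all_splits` are hypothetical. The point is only that saving `ctrl`/`ctrlc` on exit, with 8 meaning "no control byte pending", lets decoding resume at any input boundary without losing or re-reading a control byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Saved decoder state, analogous in spirit to iter->ctrl / iter->ctrlc. */
typedef struct
{
    unsigned char ctrl;   /* unused control bits, already shifted */
    int           ctrlc;  /* 0..7: bits consumed so far; 8: need a new control byte */
} DecState;

static void
dec_init(DecState *st)
{
    st->ctrl = 0;
    st->ctrlc = 8;        /* force a control-byte read on the first call */
}

/*
 * Decode all of src[0..srclen) and append the result to dst (the caller
 * guarantees dst has room for srclen more bytes). Returns bytes consumed.
 */
static size_t
dec_feed(DecState *st, const unsigned char *src, size_t srclen,
         unsigned char *dst, size_t *dstlen)
{
    size_t        sp = 0;
    unsigned char ctrl = st->ctrl;
    int           ctrlc = st->ctrlc;

    while (sp < srclen)
    {
        if (ctrlc == 8)
        {
            ctrl = src[sp++];           /* start a new group of 8 items */
            ctrlc = 0;
        }
        for (; ctrlc < 8 && sp < srclen; ctrlc++)
        {
            unsigned char b = src[sp++];

            dst[(*dstlen)++] = (ctrl & 1) ? (unsigned char) (b ^ 0x20) : b;
            ctrl >>= 1;                 /* advance the control bit */
        }
    }

    /* Record the state unconditionally, whether ctrlc is 0..7 or 8. */
    st->ctrl = ctrl;
    st->ctrlc = ctrlc;
    return sp;
}

/* Decode src in two pieces, split at an arbitrary offset. */
static size_t
decode_in_pieces(const unsigned char *src, size_t len, size_t split,
                 unsigned char *dst)
{
    DecState st;
    size_t   outlen = 0;

    assert(split <= len);
    dec_init(&st);
    dec_feed(&st, src, split, dst, &outlen);
    dec_feed(&st, src + split, len - split, dst, &outlen);
    return outlen;
}

/* Control 0x01 toggles only the first item; control 0x00 toggles nothing. */
static const unsigned char demo_input[] = {
    0x01, 'h', 'e', 'l', 'l', 'o', ' ', 'w', 'o',
    0x00, 'r', 'l', 'd'
};

/* Returns 1 if every possible split point decodes to "Hello world". */
static int
check_all_splits(void)
{
    unsigned char out[32];
    size_t        split;

    for (split = 0; split <= sizeof(demo_input); split++)
    {
        size_t n = decode_in_pieces(demo_input, sizeof(demo_input), split, out);

        if (n != 11 || memcmp(out, "Hello world", 11) != 0)
            return 0;
    }
    return 1;
}
```

Splitting right after a control byte (offsets 1 and 10 here) is the case described for v5: the saved `ctrlc` is 0, so the next call reuses the saved `ctrl` instead of reading another byte.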
From 8971fbdc8c072f72918a46feec65e385995660b1 Mon Sep 17 00:00:00 2001
From: BBG <djydew...@gmail.com>
Date: Tue, 4 Jun 2019 22:56:42 +0800
Subject: [PATCH] de-TOASTing using a iterator

---
 src/backend/access/heap/tuptoaster.c | 458 +++++++++++++++++++++++++++++++++++
 src/backend/utils/adt/varlena.c      |  37 ++-
 src/include/access/tuptoaster.h      |  90 +++++++
 src/include/fmgr.h                   |   7 +
 4 files changed, 586 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index 55d6e91..7123f62 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -83,6 +83,13 @@ static int toast_open_indexes(Relation toastrel,
 static void toast_close_indexes(Relation *toastidxs, int num_indexes,
                     LOCKMODE lock);
 static void init_toast_snapshot(Snapshot toast_snapshot);
+static FetchDatumIterator create_fetch_datum_iterator(struct varlena *attr);
+static bool free_fetch_datum_iterator(FetchDatumIterator iter);
+static void fetch_datum_iterate(FetchDatumIterator iter);
+static void init_toast_buffer(ToastBuffer *buf, int size, bool compressed);
+static bool free_toast_buffer(ToastBuffer *buf);
+static void pglz_decompress_iterate(ToastBuffer *source, ToastBuffer *dest,
+                     DetoastIterator iter);
 
 
 /* ----------
@@ -347,6 +354,117 @@ heap_tuple_untoast_attr_slice(struct varlena *attr,
 
 
 /* ----------
+ * create_detoast_iterator -
+ *
+ * Initialize detoast iterator.
+ * ----------
+ */
+bool create_detoast_iterator(struct varlena *attr, DetoastIterator iterator) {
+    struct varatt_external toast_pointer;
+    if (VARATT_IS_EXTERNAL_ONDISK(attr))
+    {
+        /*
+         * This is an externally stored datum --- create fetch datum iterator
+         */
+        iterator->fetch_datum_iterator = create_fetch_datum_iterator(attr);
+        VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
+        if (VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer))
+        {
+            /* If it's compressed, prepare buffer for raw data */
+            iterator->buf = (ToastBuffer *) palloc0(sizeof(ToastBuffer));
+            init_toast_buffer(iterator->buf, toast_pointer.va_rawsize, false);
+            iterator->ctrl = 0;
+            iterator->ctrlc = 8;
+            iterator->compressed = true;
+            iterator->done = false;
+        }
+        else
+        {
+            iterator->buf = iterator->fetch_datum_iterator->buf;
+            iterator->ctrl = 0;
+            iterator->ctrlc = 8;
+            iterator->compressed = false;
+            iterator->done = false;
+        }
+        return true;
+    }
+    else if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+    {
+        /*
+         * This is an indirect pointer --- dereference it
+         */
+        struct varatt_indirect redirect;
+
+        VARATT_EXTERNAL_GET_POINTER(redirect, attr);
+        attr = (struct varlena *) redirect.pointer;
+
+        /* nested indirect Datums aren't allowed */
+        Assert(!VARATT_IS_EXTERNAL_INDIRECT(attr));
+
+        /* recurse in case value is still extended in some other way */
+        return create_detoast_iterator(attr, iterator);
+
+    }
+    else if (VARATT_IS_COMPRESSED(attr))
+    {
+        /*
+         * This is a compressed value inside of the main tuple
+         * Skip the iterator and just decompress the whole thing.
+         */
+        return false;
+    }
+
+    return false;
+}
+
+
+/* ----------
+ * free_detoast_iterator -
+ *
+ * Free the memory space occupied by the de-Toast iterator.
+ * ----------
+ */
+bool free_detoast_iterator(DetoastIterator iter) {
+    if (iter == NULL)
+    {
+        return false;
+    }
+    if (iter->buf != iter->fetch_datum_iterator->buf)
+    {
+        free_toast_buffer(iter->buf);
+    }
+    free_fetch_datum_iterator(iter->fetch_datum_iterator);
+    return true;
+}
+
+
+/* ----------
+ * detoast_iterate -
+ *
+ * Iterate through the toasted value referenced by iterator.
+ *
+ * As long as there is another slice in compression or external storage,
+ * detoast it into toast buffer in iterator.
+ * ----------
+ */
+extern void detoast_iterate(DetoastIterator iter)
+{
+    FetchDatumIterator fetch_iter = iter->fetch_datum_iterator;
+
+    Assert(iter != NULL && !iter->done);
+
+    fetch_datum_iterate(fetch_iter);
+
+    if (iter->compressed)
+        pglz_decompress_iterate(fetch_iter->buf, iter->buf, iter);
+
+    if (iter->buf->limit == iter->buf->capacity) {
+        iter->done = true;
+    }
+}
+
+
+/* ----------
  * toast_raw_datum_size -
  *
  * Return the raw (detoasted) size of a varlena datum
@@ -2409,3 +2527,343 @@ init_toast_snapshot(Snapshot toast_snapshot)
 
     InitToastSnapshot(*toast_snapshot, snapshot->lsn, snapshot->whenTaken);
 }
+
+
+/* ----------
+ * create_fetch_datum_iterator -
+ *
+ * Initialize fetch datum iterator.
+ * ----------
+ */
+static FetchDatumIterator
+create_fetch_datum_iterator(struct varlena *attr) {
+    int validIndex;
+    FetchDatumIterator iterator;
+
+    if (!VARATT_IS_EXTERNAL_ONDISK(attr))
+        elog(ERROR, "create_fetch_datum_itearator shouldn't be called for non-ondisk datums");
+
+    iterator = (FetchDatumIterator) palloc0(sizeof(FetchDatumIteratorData));
+
+    /* Must copy to access aligned fields */
+    VARATT_EXTERNAL_GET_POINTER(iterator->toast_pointer, attr);
+
+    iterator->ressize = iterator->toast_pointer.va_extsize;
+    iterator->numchunks = ((iterator->ressize - 1) / TOAST_MAX_CHUNK_SIZE) + 1;
+
+    /*
+     * Open the toast relation and its indexes
+     */
+    iterator->toastrel = table_open(iterator->toast_pointer.va_toastrelid, AccessShareLock);
+
+    /* Look for the valid index of the toast relation */
+    validIndex = toast_open_indexes(iterator->toastrel,
+                                    AccessShareLock,
+                                    &iterator->toastidxs,
+                                    &iterator->num_indexes);
+
+    /*
+     * Setup a scan key to fetch from the index by va_valueid
+     */
+    ScanKeyInit(&iterator->toastkey,
+                (AttrNumber) 1,
+                BTEqualStrategyNumber, F_OIDEQ,
+                ObjectIdGetDatum(iterator->toast_pointer.va_valueid));
+
+    /*
+     * Read the chunks by index
+     *
+     * Note that because the index is actually on (valueid, chunkidx) we will
+     * see the chunks in chunkidx order, even though we didn't explicitly ask
+     * for it.
+     */
+
+    init_toast_snapshot(&iterator->SnapshotToast);
+    iterator->toastscan = systable_beginscan_ordered(iterator->toastrel, iterator->toastidxs[validIndex],
+                                                     &iterator->SnapshotToast, 1, &iterator->toastkey);
+
+    iterator->buf = (ToastBuffer *) palloc0(sizeof(ToastBuffer));
+    init_toast_buffer(iterator->buf, iterator->ressize + VARHDRSZ, VARATT_EXTERNAL_IS_COMPRESSED(iterator->toast_pointer));
+
+    iterator->nextidx = 0;
+    iterator->done = false;
+
+    return iterator;
+}
+
+static bool
+free_fetch_datum_iterator(FetchDatumIterator iter)
+{
+    if (iter == NULL)
+    {
+        return false;
+    }
+
+    if (!iter->done)
+    {
+        systable_endscan_ordered(iter->toastscan);
+        toast_close_indexes(iter->toastidxs, iter->num_indexes, AccessShareLock);
+        table_close(iter->toastrel, AccessShareLock);
+    }
+    free_toast_buffer(iter->buf);
+    pfree(iter);
+    return true;
+}
+
+/* ----------
+ * fetch_datum_iterate -
+ *
+ * Iterate through the toasted value referenced by iterator.
+ *
+ * As long as there is another chunk data in compression or external storage,
+ * fetch it into buffer in iterator.
+ * ----------
+ */
+static void
+fetch_datum_iterate(FetchDatumIterator iter) {
+    HeapTuple ttup;
+    TupleDesc toasttupDesc;
+    int32 residx;
+    Pointer chunk;
+    bool isnull;
+    char *chunkdata;
+    int32 chunksize;
+
+    Assert(iter != NULL && !iter->done);
+
+    ttup = systable_getnext_ordered(iter->toastscan, ForwardScanDirection);
+    if (ttup == NULL)
+    {
+        /*
+         * Final checks that we successfully fetched the datum
+         */
+        if (iter->nextidx != iter->numchunks)
+            elog(ERROR, "missing chunk number %d for toast value %u in %s",
+                 iter->nextidx,
+                 iter->toast_pointer.va_valueid,
+                 RelationGetRelationName(iter->toastrel));
+
+        /*
+         * End scan and close relations
+         */
+        systable_endscan_ordered(iter->toastscan);
+        toast_close_indexes(iter->toastidxs, iter->num_indexes, AccessShareLock);
+        table_close(iter->toastrel, AccessShareLock);
+
+        iter->done = true;
+        return;
+    }
+
+    /*
+     * Have a chunk, extract the sequence number and the data
+     */
+    toasttupDesc = iter->toastrel->rd_att;
+    residx = DatumGetInt32(fastgetattr(ttup, 2, toasttupDesc, &isnull));
+    Assert(!isnull);
+    chunk = DatumGetPointer(fastgetattr(ttup, 3, toasttupDesc, &isnull));
+    Assert(!isnull);
+    if (!VARATT_IS_EXTENDED(chunk))
+    {
+        chunksize = VARSIZE(chunk) - VARHDRSZ;
+        chunkdata = VARDATA(chunk);
+    }
+    else if (VARATT_IS_SHORT(chunk))
+    {
+        /* could happen due to heap_form_tuple doing its thing */
+        chunksize = VARSIZE_SHORT(chunk) - VARHDRSZ_SHORT;
+        chunkdata = VARDATA_SHORT(chunk);
+    }
+    else
+    {
+        /* should never happen */
+        elog(ERROR, "found toasted toast chunk for toast value %u in %s",
+             iter->toast_pointer.va_valueid,
+             RelationGetRelationName(iter->toastrel));
+        chunksize = 0; /* keep compiler quiet */
+        chunkdata = NULL;
+    }
+
+    /*
+     * Some checks on the data we've found
+     */
+    if (residx != iter->nextidx)
+        elog(ERROR, "unexpected chunk number %d (expected %d) for toast value %u in %s",
+             residx, iter->nextidx,
+             iter->toast_pointer.va_valueid,
+             RelationGetRelationName(iter->toastrel));
+    if (residx < iter->numchunks - 1)
+    {
+        if (chunksize != TOAST_MAX_CHUNK_SIZE)
+            elog(ERROR, "unexpected chunk size %d (expected %d) in chunk %d of %d for toast value %u in %s",
+                 chunksize, (int) TOAST_MAX_CHUNK_SIZE,
+                 residx, iter->numchunks,
+                 iter->toast_pointer.va_valueid,
+                 RelationGetRelationName(iter->toastrel));
+    }
+    else if (residx == iter->numchunks - 1)
+    {
+        if ((residx * TOAST_MAX_CHUNK_SIZE + chunksize) != iter->ressize)
+            elog(ERROR, "unexpected chunk size %d (expected %d) in final chunk %d for toast value %u in %s",
+                 chunksize,
+                 (int) (iter->ressize - residx * TOAST_MAX_CHUNK_SIZE),
+                 residx,
+                 iter->toast_pointer.va_valueid,
+                 RelationGetRelationName(iter->toastrel));
+    }
+    else
+        elog(ERROR, "unexpected chunk number %d (out of range %d..%d) for toast value %u in %s",
+             residx,
+             0, iter->numchunks - 1,
+             iter->toast_pointer.va_valueid,
+             RelationGetRelationName(iter->toastrel));
+
+    /*
+     * Copy the data into proper place in our iterator buffer
+     */
+    memcpy(iter->buf->limit, chunkdata, chunksize);
+    iter->buf->limit += chunksize;
+
+    iter->nextidx++;
+}
+
+
+static void
+init_toast_buffer(ToastBuffer *buf, int32 size, bool compressed) {
+    buf->buf = (const char *) palloc0(size);
+    if (compressed) {
+        SET_VARSIZE_COMPRESSED(buf->buf, size);
+        /*
+         * Note the constrain buf->position <= buf->limit may be broken
+         * at initialization. Make sure that the constrain is satisfied
+         * when consume chars.
+         */
+        buf->position = VARDATA_4B_C(buf->buf);
+    }
+    else
+    {
+        SET_VARSIZE(buf->buf, size);
+        buf->position = VARDATA_4B(buf->buf);
+    }
+    buf->limit = VARDATA(buf->buf);
+    buf->capacity = buf->buf + size;
+    buf->buf_size = size;
+}
+
+
+static bool
+free_toast_buffer(ToastBuffer *buf)
+{
+    if (buf == NULL)
+    {
+        return false;
+    }
+
+    pfree((void *)buf->buf);
+    pfree(buf);
+
+    return true;
+}
+
+
+/* ----------
+ * pglz_decompress_iterate -
+ *
+ * Decompresses source into dest until the source is exhausted.
+ * ----------
+ */
+static void
+pglz_decompress_iterate(ToastBuffer *source, ToastBuffer *dest, DetoastIterator iter)
+{
+    const unsigned char *sp;
+    const unsigned char *srcend;
+    unsigned char *dp;
+    unsigned char *destend;
+
+    /*
+     * In the while loop, sp may go beyond the srcend, provides a four-byte
+     * buffer to prevent sp from reading unallocated bytes from source buffer.
+     * When source->limit reaches source->capacity, don't worry about reading
+     * unallocated bytes.
+     */
+    srcend = (const unsigned char *)
+        (source->limit == source->capacity ? source->limit : (source->limit - 4));
+    sp = (const unsigned char *) source->position;
+    dp = (unsigned char *) dest->limit;
+    destend = (unsigned char *) dest->capacity;
+
+    while (sp < srcend && dp < destend)
+    {
+        /*
+         * Read one control byte and process the next 8 items (or as many as
+         * remain in the compressed input).
+         */
+        unsigned char ctrl;
+        int ctrlc;
+
+        if (iter->ctrlc < 8) {
+            ctrl = iter->ctrl;
+            ctrlc = iter->ctrlc;
+        }
+        else
+        {
+            ctrl = *sp++;
+            ctrlc = 0;
+        }
+
+
+        for (; ctrlc < 8 && sp < srcend && dp < destend; ctrlc++)
+        {
+
+            if (ctrl & 1)
+            {
+                /*
+                 * Otherwise it contains the match length minus 3 and the
+                 * upper 4 bits of the offset. The next following byte
+                 * contains the lower 8 bits of the offset. If the length is
+                 * coded as 18, another extension tag byte tells how much
+                 * longer the match really was (0-255).
+                 */
+                int32 len;
+                int32 off;
+
+                len = (sp[0] & 0x0f) + 3;
+                off = ((sp[0] & 0xf0) << 4) | sp[1];
+                sp += 2;
+                if (len == 18)
+                    len += *sp++;
+
+                /*
+                 * Now we copy the bytes specified by the tag from OUTPUT to
+                 * OUTPUT. It is dangerous and platform dependent to use
+                 * memcpy() here, because the copied areas could overlap
+                 * extremely!
+                 */
+                len = Min(len, destend - dp);
+                while (len--)
+                {
+                    *dp = dp[-off];
+                    dp++;
+                }
+            }
+            else
+            {
+                /*
+                 * An unset control bit means LITERAL BYTE. So we just copy
+                 * one from INPUT to OUTPUT.
+                 */
+                *dp++ = *sp++;
+            }
+
+            /*
+             * Advance the control bit
+             */
+            ctrl >>= 1;
+        }
+
+        iter->ctrlc = ctrlc;
+        iter->ctrl = ctrl;
+    }
+
+    source->position = (char *) sp;
+    dest->limit = (char *) dp;
+}
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 0864838..0312dba 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -56,6 +56,8 @@ typedef struct
     int len1; /* string lengths in bytes */
     int len2;
 
+    DetoastIterator iter;
+
     /* Skip table for Boyer-Moore-Horspool search algorithm: */
     int skiptablemask; /* mask for ANDing with skiptable subscripts */
     int skiptable[256]; /* skip distance for given mismatched char */
@@ -122,7 +124,7 @@ static text *text_substring(Datum str,
                int32 length,
                bool length_not_specified);
 static text *text_overlay(text *t1, text *t2, int sp, int sl);
-static int text_position(text *t1, text *t2, Oid collid);
+static int text_position(text *t1, text *t2, Oid collid, DetoastIterator iter);
 static void text_position_setup(text *t1, text *t2, Oid collid, TextPositionState *state);
 static bool text_position_next(TextPositionState *state);
 static char *text_position_next_internal(char *start_ptr, TextPositionState *state);
@@ -1092,10 +1094,22 @@ text_overlay(text *t1, text *t2, int sp, int sl)
 Datum
 textpos(PG_FUNCTION_ARGS)
 {
-    text *str = PG_GETARG_TEXT_PP(0);
+    text *str;
+    DetoastIteratorData iteratorData;
+    DetoastIterator iter = &iteratorData;
     text *search_str = PG_GETARG_TEXT_PP(1);
 
-    PG_RETURN_INT32((int32) text_position(str, search_str, PG_GET_COLLATION()));
+    if (create_detoast_iterator((struct varlena *)(DatumGetPointer(PG_GETARG_DATUM(0))), iter))
+    {
+        str = (text *) iter->buf->buf;
+    }
+    else
+    {
+        str = PG_GETARG_TEXT_PP(0);
+        iter = NULL;
+    }
+
+    PG_RETURN_INT32((int32) text_position(str, search_str, PG_GET_COLLATION(), iter));
 }
 
 /*
@@ -1113,7 +1127,7 @@ textpos(PG_FUNCTION_ARGS)
  * functions.
  */
 static int
-text_position(text *t1, text *t2, Oid collid)
+text_position(text *t1, text *t2, Oid collid, DetoastIterator iter)
 {
     TextPositionState state;
     int result;
@@ -1121,6 +1135,7 @@ text_position(text *t1, text *t2, Oid collid)
     if (VARSIZE_ANY_EXHDR(t1) < 1 || VARSIZE_ANY_EXHDR(t2) < 1)
         return 0;
 
+    state.iter = iter;
     text_position_setup(t1, t2, collid, &state);
     if (!text_position_next(&state))
         result = 0;
@@ -1130,7 +1145,6 @@ text_position(text *t1, text *t2, Oid collid)
     return result;
 }
 
-
 /*
  * text_position_setup, text_position_next, text_position_cleanup -
  * Component steps of text_position()
@@ -1358,6 +1372,10 @@ text_position_next_internal(char *start_ptr, TextPositionState *state)
         hptr = start_ptr;
         while (hptr < haystack_end)
         {
+            if (state->iter != NULL) {
+                PG_DETOAST_ITERATE(state->iter, hptr);
+            }
+
             if (*hptr == nchar)
                 return (char *) hptr;
             hptr++;
@@ -1375,6 +1393,11 @@ text_position_next_internal(char *start_ptr, TextPositionState *state)
             const char *nptr;
             const char *p;
 
+            if (state->iter != NULL)
+            {
+                PG_DETOAST_ITERATE(state->iter, hptr);
+            }
+
             nptr = needle_last;
             p = hptr;
             while (*nptr == *p)
@@ -1438,7 +1461,9 @@ text_position_get_match_pos(TextPositionState *state)
 static void
 text_position_cleanup(TextPositionState *state)
 {
-    /* no cleanup needed */
+    if (state->iter != NULL) {
+        free_detoast_iterator(state->iter);
+    }
 }
 
 static void
diff --git a/src/include/access/tuptoaster.h b/src/include/access/tuptoaster.h
index f0aea24..589bc27 100644
--- a/src/include/access/tuptoaster.h
+++ b/src/include/access/tuptoaster.h
@@ -17,6 +17,96 @@
 #include "storage/lockdefs.h"
 #include "utils/relcache.h"
 
+#ifndef FRONTEND
+#include "access/genam.h"
+
+/*
+ * TOAST buffer is a producer consumer buffer.
+ *
+ * +--+--+--+--+--+--+--+--+--+--+--+--+--+
+ * |  |  |  |  |  |  |  |  |  |  |  |  |  |
+ * +--+--+--+--+--+--+--+--+--+--+--+--+--+
+ * ^        ^        ^                    ^
+ * buf      position limit                capacity
+ *
+ * buf: point to the start of buffer.
+ * position: point to the next char to be consume.
+ * limit: point to the next char to be produce.
+ * capacity: point to the end of buffer.
+ *
+ * Constrains that need to be satisfied:
+ * buf <= position <= limit <= capacity
+ */
+typedef struct ToastBuffer
+{
+    const char *buf;
+    const char *position;
+    char *limit;
+    const char *capacity;
+    int32 buf_size;
+} ToastBuffer;
+
+
+typedef struct FetchDatumIteratorData
+{
+    ToastBuffer *buf;
+    Relation toastrel;
+    Relation *toastidxs;
+    SysScanDesc toastscan;
+    ScanKeyData toastkey;
+    SnapshotData SnapshotToast;
+    struct varatt_external toast_pointer;
+    int32 ressize;
+    int32 nextidx;
+    int32 numchunks;
+    int num_indexes;
+    bool done;
+} FetchDatumIteratorData;
+
+typedef struct FetchDatumIteratorData *FetchDatumIterator;
+
+typedef struct DetoastIteratorData
+{
+    ToastBuffer *buf;
+    FetchDatumIterator fetch_datum_iterator;
+    unsigned char ctrl;
+    int ctrlc;
+    bool compressed; /* toast value is compressed? */
+    bool done;
+} DetoastIteratorData;
+
+typedef struct DetoastIteratorData *DetoastIterator;
+
+/* ----------
+ * create_detoast_iterator -
+ *
+ * Initialize detoast iterator.
+ * ----------
+ */
+extern bool create_detoast_iterator(struct varlena *attr, DetoastIterator iterator);
+
+/* ----------
+ * free_detoast_iterator -
+ *
+ * Free the memory space occupied by the de-Toast iterator.
+ * ----------
+ */
+extern bool free_detoast_iterator(DetoastIterator iter);
+
+/* ----------
+ * detoast_iterate -
+ *
+ * Iterate through the toasted value referenced by iterator.
+ *
+ * As long as there is another slice in compression or external storage,
+ * detoast it into toast buffer in iterator.
+ * ----------
+ */
+extern void detoast_iterate(DetoastIterator iter);
+
+#endif
+
+
 /*
  * This enables de-toasting of index entries. Needed until VACUUM is
  * smart enough to rebuild indexes from scratch.
diff --git a/src/include/fmgr.h b/src/include/fmgr.h
index 3ff0999..446c880 100644
--- a/src/include/fmgr.h
+++ b/src/include/fmgr.h
@@ -239,6 +239,13 @@ extern struct varlena *pg_detoast_datum_packed(struct varlena *datum);
 #define PG_DETOAST_DATUM_SLICE(datum,f,c) \
     pg_detoast_datum_slice((struct varlena *) DatumGetPointer(datum), \
     (int32) (f), (int32) (c))
+#define PG_DETOAST_ITERATE(iter, need) \
+    do { \
+        Assert(need >= iter->buf->buf && need <= iter->buf->capacity); \
+        while (!iter->done && need >= iter->buf->limit) { \
+            detoast_iterate(iter); \
+        } \
+    } while (0)
 /* WARNING -- unaligned pointer */
 #define PG_DETOAST_DATUM_PACKED(datum) \
     pg_detoast_datum_packed((struct varlena *) DatumGetPointer(datum))
-- 
2.7.4
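
As a standalone illustration of the "call the iterator only if we have to" pattern behind PG_DETOAST_ITERATE, here is a minimal sketch outside PostgreSQL. Everything in it is an assumption made for the example: `LazyBuf`, `fill_next`, `ENSURE_AVAILABLE` and `find_byte` are hypothetical names, the "storage" is just an in-memory string read in fixed 4-byte chunks, and positions are indexes rather than pointers. Unlike the macro in the patch, this version parenthesizes its arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK 4                 /* bytes produced per refill (arbitrary) */

typedef struct
{
    const char *src;            /* stand-in for externally stored chunks */
    size_t      srclen;
    char        buf[64];        /* destination buffer */
    size_t      limit;          /* bytes produced so far */
    int         done;           /* all chunks fetched? */
    int         fills;          /* number of refills, for demonstration */
} LazyBuf;

static void
lazy_init(LazyBuf *b, const char *src, size_t srclen)
{
    assert(srclen <= sizeof(b->buf));
    b->src = src;
    b->srclen = srclen;
    b->limit = 0;
    b->done = (srclen == 0);
    b->fills = 0;
}

/* Produce the next chunk; plays the role of the iterate function. */
static void
fill_next(LazyBuf *b)
{
    size_t n = b->srclen - b->limit;

    if (n > CHUNK)
        n = CHUNK;
    memcpy(b->buf + b->limit, b->src + b->limit, n);
    b->limit += n;
    b->fills++;
    if (b->limit == b->srclen)
        b->done = 1;
}

/* Refill only while the requested index has not been produced yet. */
#define ENSURE_AVAILABLE(b, need) \
    do { \
        while (!(b)->done && (need) >= (b)->limit) \
            fill_next(b); \
    } while (0)

/* Return the index of the first occurrence of c, or -1 if absent. */
static long
find_byte(LazyBuf *b, char c)
{
    size_t i;

    for (i = 0; i < b->srclen; i++)
    {
        ENSURE_AVAILABLE(b, i);
        if (b->buf[i] == c)
            return (long) i;
    }
    return -1;
}
```

With a 16-byte input and a match at index 4, only the first two chunks are fetched; a search that fails has to fetch all four. The common case (the byte is already in the buffer) costs one comparison against `limit` and no function call.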