On Fri, Aug 2, 2019 at 3:12 PM John Naylor <john.nay...@2ndquadrant.com> wrote:

> On Tue, Jul 30, 2019 at 8:20 PM Binguo Bao <djydew...@gmail.com> wrote:
> >
> > On Mon, Jul 29, 2019 at 11:49 AM John Naylor <john.nay...@2ndquadrant.com> wrote:
> >>
> >> 1). For every needle comparison, text_position_next_internal()
> >> calculates how much of the value is needed and passes that to
> >> detoast_iterate(), which then calculates if it has to do something or
> >> not. This is a bit hard to follow. There might also be a performance
> >> penalty -- the following is just a theory, but it sounds plausible:
> >> The CPU can probably correctly predict that detoast_iterate() will
> >> usually return the same value it did last time, but it still has to
> >> call the function and make sure, which I imagine is more expensive
> >> than advancing the needle. Ideally, we want to call the iterator only
> >> if we have to.
> >>
> >> In the attached patch (applies on top of your v5),
> >> text_position_next_internal() simply compares hptr to the detoast
> >> buffer limit, and calls detoast_iterate() until it can proceed. I
> >> think this is clearer.
> >
> >
> > Yes, I think this is a general scenario where the caller continually
> > calls detoast_iterate until it gets enough data, so I think such
> > operations can be extracted as a macro, as I did in patch v6. In the
> > macro, the detoast_iterate function is called only when the data
> > requested by the caller is greater than the buffer limit.
>
> I like the use of a macro here. However, I think we can find a better
> location for the definition. See the header comment of fmgr.h:
> "Definitions for the Postgres function manager and function-call
> interface." Maybe tuptoaster.h is as good a place as any?
>

PG_DETOAST_ITERATE isn't a simple function-call interface, but I notice
that PG_FREE_IF_COPY is also defined in fmgr.h, and its logic is similar
to PG_DETOAST_ITERATE: check a condition first, then decide whether to
call the function. Besides, PG_DETOAST_DATUM, PG_DETOAST_DATUM_COPY,
PG_DETOAST_DATUM_SLICE and PG_DETOAST_DATUM_PACKED are all defined in
fmgr.h, so it is reasonable to keep all the de-TOAST interfaces together.
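The "check first, call only if needed" shape of PG_DETOAST_ITERATE can be
illustrated outside PostgreSQL. This is a minimal standalone sketch, not
patch code: DemoLazyBuf, demo_iterate() and DEMO_ENSURE are hypothetical
stand-ins for the toast buffer, detoast_iterate() and PG_DETOAST_ITERATE,
and the producer simply copies a fixed-size chunk of an in-memory string
per call.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a toast buffer plus the state that fills it. */
typedef struct DemoLazyBuf
{
    const char *src;        /* whole value, standing in for external storage */
    int         srclen;
    int         chunk;      /* bytes produced per producer call */
    char        buf[64];    /* destination buffer */
    char       *limit;      /* next byte to be produced */
    int         calls;      /* producer calls so far */
    int         done;       /* all bytes produced? */
} DemoLazyBuf;

static void
demo_init(DemoLazyBuf *it, const char *src, int srclen, int chunk)
{
    assert(srclen <= (int) sizeof(it->buf));
    it->src = src;
    it->srclen = srclen;
    it->chunk = chunk;
    it->limit = it->buf;
    it->calls = 0;
    it->done = (srclen == 0);
}

/* Producer: append the next chunk, in the role of detoast_iterate(). */
static void
demo_iterate(DemoLazyBuf *it)
{
    int         have = (int) (it->limit - it->buf);
    int         n = it->srclen - have;

    if (n > it->chunk)
        n = it->chunk;
    memcpy(it->limit, it->src + have, n);
    it->limit += n;
    it->calls++;
    if (have + n == it->srclen)
        it->done = 1;
}

/* Same shape as PG_DETOAST_ITERATE: a cheap test, a call only when needed. */
#define DEMO_ENSURE(it, need) \
    do { \
        while (!(it)->done && (need) >= (it)->limit) \
            demo_iterate(it); \
    } while (0)

/* Find the first occurrence of c, pulling data in lazily. */
static int
demo_find(DemoLazyBuf *it, char c)
{
    const char *end = it->buf + it->srclen;
    const char *p;

    for (p = it->buf; p < end; p++)
    {
        DEMO_ENSURE(it, p);
        if (*p == c)
            return (int) (p - it->buf);
    }
    return -1;
}
```

Searching "abcdefghij" for 'f' with 4-byte chunks runs the producer only
twice; for every other byte the cost is one pointer comparison.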

> >> 2). detoast_iterate() and fetch_datum_iterate() return a value but we
> >> don't check it or do anything with it. Should we do something with it?
> >> It's also not yet clear if we should check the iterator state instead
> >> of return values. I've added some XXX comments as a reminder. We
> >> should also check the return value of pglz_decompress_iterate().
> >
> >
> > IMO, we need to provide users with a simple iterative interface.
> > Using the required data pointer to compare with the buffer limit is an
> > easy way. And the application scenarios of the iterator are mostly read
> > operations. So I think there is no need to return a value, and the
> > iterator needs to throw an exception for some wrong calls, such as all
> > the data have been iterated, but the user still calls the iterator.
>
> Okay, and I see these functions now return void. The original
> pglz_decompress() returned a value that was checked against corruption.
> Is there a similar check we can do for the iterative version?
>

As far as I know, we can only do such a check after all the compressed
data has been decompressed. If we are slicing, we can't do the check.
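A sketch of where such a check could go, assuming the iterator keeps the
decoder's final sp/srcend and dp/destend cursors. demo_final_check() is a
hypothetical helper modeled on the idea behind the corruption check in
pglz_decompress(): a complete, uncorrupted stream consumes the input
exactly and fills the output exactly. It returns true while the iterator
is not done, or when slicing, since nothing can be concluded then.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical final check for an iterative pglz decoder.  sp/srcend and
 * dp/destend are the decoder's final source and destination cursors.
 */
static bool
demo_final_check(bool done, bool slicing,
                 const unsigned char *sp, const unsigned char *srcend,
                 const unsigned char *dp, const unsigned char *destend)
{
    /* Mid-stream, or when slicing, stopping short is legitimate. */
    if (!done || slicing)
        return true;

    /* A complete stream must use up the input and fill the output exactly. */
    return sp == srcend && dp == destend;
}
```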


>
> >> 3). Speaking of pglz_decompress_iterate(), I diff'd it with
> >> pglz_decompress(), and I have some questions on it:
> >>
> >> a).
> >> + srcend = (const unsigned char *) (source->limit == source->capacity
> >> ? source->limit : (source->limit - 4));
> >>
> >> What does the 4 here mean in this expression?
> >
> >
> > Since we fetch chunks one by one, if we make srcend equal to the source
> > buffer limit, then in the while loop "while (sp < srcend && dp < destend)",
> > sp may exceed the source buffer limit and read unallocated bytes.
>
> Why is this? That tells me the limit is incorrect. Can the setter not
> determine the right value?
>

There are three statements that change the value of `sp` inside the loop
`while (sp < srcend && dp < destend)`:
`ctrl = *sp++;`
`off = ((sp[0] & 0xf0) << 4) | sp[1]; sp += 2;`
`len += *sp++;`
Although we make sure `sp` is less than `srcend` when entering the loop,
`sp` is likely to go beyond `srcend` within the loop, and we must ensure
that every byte read through `sp` lies below `buf->limit`, so that we never
read source bytes that have not been fetched yet. So `srcend` can't be
initialized to `buf->limit`. There is only one exception: once we've
fetched all data chunks and `buf->limit` has reached `buf->capacity`, it's
impossible to read unfetched data via `sp`.

> > Giving a four-byte buffer can prevent sp from exceeding the source buffer
> > limit.
>
> Why 4? That's a magic number. Why not 2, or 27?
>

As I explained above, `sp` may go beyond `srcend` in the loop, up to
`srcend + 2`. In theory, any margin greater than or equal to 2 is fine.
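The bound can be derived from the item format. In this sketch the names
demo_item_bytes(), demo_srcend(), DEMO_MAX_ITEM_BYTES and DEMO_SRC_MARGIN
are hypothetical: one pglz item consumes at most 3 source bytes (two tag
bytes, plus one length-extension byte when the length code is 18), and an
item may start at `srcend - 1`, so the last byte it reads is `srcend + 1`
and `sp` ends at `srcend + 2`. A margin of 2 therefore keeps every read
below the limit; the 4 used in the patch is also safe.

```c
#include <assert.h>

/* Most source bytes one pglz item can consume: 2 tag bytes + 1 extension. */
#define DEMO_MAX_ITEM_BYTES 3
/* An item may start at srcend - 1, so it reads up to srcend + 1. */
#define DEMO_SRC_MARGIN (DEMO_MAX_ITEM_BYTES - 1)

/* Source bytes consumed by one item, given its control bit. */
static int
demo_item_bytes(const unsigned char *sp, int is_match)
{
    if (!is_match)
        return 1;               /* literal byte */
    /* length code 15 + 3 == 18 means an extension byte follows */
    return ((sp[0] & 0x0f) == 0x0f) ? 3 : 2;
}

/* Where decoding must stop, given how much source has been fetched. */
static const unsigned char *
demo_srcend(const unsigned char *limit, const unsigned char *capacity)
{
    /* Once everything is fetched there are no unfetched bytes to overrun. */
    return limit == capacity ? limit : limit - DEMO_SRC_MARGIN;
}
```

demo_srcend() mirrors the srcend expression in pglz_decompress_iterate(),
with the margin spelled out as a named constant instead of a literal.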


> > If we have read all the chunks, we don't need to be careful to cross the
> > border, just make srcend equal to source buffer limit. I've added comments
> > to explain it in patch v6.
>
> That's a good thing to comment on, but it doesn't explain why.


Yes, the current comment is puzzling. I'll improve it.


> This
> logic seems like a band-aid and I think a committer would want this to
> be handled in a more principled way.
>

I don't want to change the pglz_decompress logic too much; the iterator
should focus on saving and restoring the original pglz_decompress state.


> >> Is it possible it's
> >> compensating for this bit in init_toast_buffer()?
> >>
> >> + buf->limit = VARDATA(buf->buf);
> >>
> >> It seems the initial limit should also depend on whether the datum is
> >> compressed, right? Can we just do this:
> >>
> >> + buf->limit = buf->position;
> >
> >
> > I'm afraid not. buf->position points to the data portion of the buffer,
> > but the beginning of the chunks we read may contain header information.
> > For example, for compressed data chunks, the first four bytes record the
> > size of the raw data, which means that limit starts four bytes before
> > position. This initialization doesn't cause errors, even though position
> > normally must not exceed limit, because we always fetch chunks first and
> > then decompress them.
>
> I see what you mean now. This could use a comment or two to explain
> the stated constraints may not actually be satisfied at
> initialization.
>

Done.
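To make the init-time exception concrete: for a compressed external value,
the fetched data starts with the 4-byte va_rawsize, so the first compressed
byte sits 8 bytes into the buffer while limit starts at 4. The sketch below
uses hypothetical structs mirroring the 4-byte varlena layouts (va_header,
plus va_rawsize in the compressed form) to compute the two initial cursors
the way init_toast_buffer() does with VARDATA() and VARDATA_4B_C() when
compressed is true; it assumes those layouts rather than including
PostgreSQL's headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of an uncompressed 4-byte-header varlena. */
typedef struct DemoVarlena4B
{
    uint32_t    va_header;
    char        va_data[];      /* what VARDATA() points at */
} DemoVarlena4B;

/* Hypothetical mirror of a compressed 4-byte-header varlena. */
typedef struct DemoVarlena4BC
{
    uint32_t    va_header;
    uint32_t    va_rawsize;     /* first 4 bytes of the fetched data */
    char        va_data[];      /* what VARDATA_4B_C() points at */
} DemoVarlena4BC;

/* Initial cursors of a compressed fetch buffer, as in init_toast_buffer(). */
static void
demo_initial_cursors(char *buf, char **position, char **limit)
{
    /* Fetched chunks are copied in right after the 4-byte header. */
    *limit = buf + offsetof(DemoVarlena4B, va_data);
    /* Compressed bytes only start after va_rawsize. */
    *position = buf + offsetof(DemoVarlena4BC, va_data);
}
```

So position > limit until the first chunk arrives, which is harmless only
because a chunk is always fetched before anything is decompressed.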


> >> b).
> >> - while (sp < srcend && dp < destend)
> >> ...
> >> + while (sp + 1 < srcend && dp < destend &&
> >> ...
> >>
> >> Why is it here "sp + 1"?
> >
> >
> > Ignore it, I set the inactive state of detoast_iter->ctrl to 8 in patch
> > v6 to achieve the purpose of parsing ctrl correctly every time.
>
> Please explain further. Was the "sp + 1" correct behavior (and why),
> or only for debugging setting ctrl/c correctly?


In patch v5, if the condition were `sp < srcend`, suppose `sp = srcend - 1`
before entering the loop `while (sp < srcend && dp < destend)`. On entering
the loop we read a control byte (so `sp` now equals `srcend`), and the
program can't enter the inner loop
`for (; ctrlc < 8 && sp < srcend && dp < destend; ctrlc++)`. It then sets
`iter->ctrlc` to 0, exits the outer loop, and this iteration is over. On the
next iteration, a control byte is read again because `iter->ctrlc` equals 0,
but the previous control byte was never used. Changing the condition to
`sp + 1 < srcend` avoids ending an iteration right after reading only a
control byte.
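The v6 approach, saving both ctrl and ctrlc at the end of every call and
treating ctrlc == 8 as "fetch a fresh control byte", can be checked with a
small standalone model. This is a sketch, not the patch: DemoPglz and
demo_pglz_step() are hypothetical names, it uses array indices instead of
ToastBuffer pointers, the caller is assumed to have already applied the
srcend safety margin, and, unlike a hardened decoder, it does not validate
off against corrupt input. Its loop follows pglz_decompress_iterate() in
the attached patch.

```c
#include <assert.h>
#include <string.h>

/* Decoder state carried across calls (hypothetical). */
typedef struct DemoPglz
{
    unsigned char ctrl;     /* remaining control bits */
    int           ctrlc;    /* bits used so far; 8 = need a new control byte */
} DemoPglz;

/* Decode src[*spos .. srcend) into dst[*dpos .. dstcap). */
static void
demo_pglz_step(DemoPglz *st, const unsigned char *src, int *spos, int srcend,
               unsigned char *dst, int *dpos, int dstcap)
{
    int         sp = *spos;
    int         dp = *dpos;

    while (sp < srcend && dp < dstcap)
    {
        unsigned char ctrl;
        int         ctrlc;

        if (st->ctrlc < 8)
        {
            ctrl = st->ctrl;        /* resume a partly used control byte */
            ctrlc = st->ctrlc;
        }
        else
        {
            ctrl = src[sp++];       /* fetch a fresh control byte */
            ctrlc = 0;
        }

        for (; ctrlc < 8 && sp < srcend && dp < dstcap; ctrlc++)
        {
            if (ctrl & 1)
            {
                /* match: copy len bytes from off bytes back in the output */
                int         len = (src[sp] & 0x0f) + 3;
                int         off = ((src[sp] & 0xf0) << 4) | src[sp + 1];

                sp += 2;
                if (len == 18)
                    len += src[sp++];
                if (len > dstcap - dp)
                    len = dstcap - dp;
                while (len--)
                {
                    dst[dp] = dst[dp - off];
                    dp++;
                }
            }
            else
                dst[dp++] = src[sp++];  /* literal byte */
            ctrl >>= 1;
        }

        /* Save unconditionally, even if no item was decoded this time. */
        st->ctrl = ctrl;
        st->ctrlc = ctrlc;
    }
    *spos = sp;
    *dpos = dp;
}
```

Feeding the 6-byte stream {0x08, 'a', 'b', 'c', 0x03, 0x03} in one call,
or in three calls that stop right after the control byte and then after two
literals, yields "abcabcabc" either way; stopping right after the control
byte is exactly the case described above.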


> Also, I don't think
> the new logic for the ctrl/c variables is an improvement:
>
> 1. iter->ctrlc is initialized with '8' (even in the uncompressed case,
> which is confusing). Any time you initialize with something not 0 or
> 1, it's a magic number, and here it's far from where the loop variable
> is used. This is harder to read.
>

`iter->ctrlc` records the value of `ctrlc` in pglz_decompress at the end
of the last iteration (or loop). In pglz_decompress, the valid values of
`ctrlc` are 0 to 7. When `ctrlc` reaches 8, a control byte is read from
the source buffer into `ctrl` and `ctrlc` is reset to 0. On the first
iteration, a control byte must be read from the source buffer into `ctrl`,
so `iter->ctrlc` should be initialized with 8.


> 2. First time through the loop, iter->ctrlc = 8, which immediately gets
> set back to 0.
>

As I explained above, `iter->ctrlc = 8` makes the first iteration read a
control byte from the source buffer into `ctrl`. Besides, `iter->ctrlc = 8`
indicates that no valid value of `ctrlc` was recorded at the end of a
previous iteration, which is obviously the case before the first iteration.


> 3. At the end of the loop, iter->ctrl/c are unconditionally set. In
> v5, there was a condition which would usually avoid this copying of
> values through pointers.
>

Patch v6 simply records the value of `ctrlc` at the end of each iteration
(or loop), whether it is valid (0 to 7) or 8, and initializes `ctrlc`
correctly on the next iteration (or loop). I think the v6 approach is more
concise.


>
> >> 4. Note that varlena.c has a static state variable, and a cleanup
> >> function that currently does:
> >>
> >> static void
> >> text_position_cleanup(TextPositionState *state)
> >> {
> >> /* no cleanup needed */
> >> }
> >>
> >> It seems to be the detoast iterator could be embedded in this state
> >> variable, and then free-ing can happen here. That has a possible
> >> advantage that the iterator struct would be on the same cache line as
> >> the state data. That would also remove the need to pass "iter" as a
> >> parameter, since these functions already pass "state". I'm not sure if
> >> this would be good for other users of the iterator, so maybe we can
> >> hold off on that for now.
> >
> >
> > Good idea. I've implemented it in patch v6.
>
> That's better, and I think we can take it a little bit farther.
>
> 1. Notice that TextPositionState is allocated on the stack in
> text_position(), which passes both the "state" pointer and the "iter"
> pointer to text_position_setup(), and only then sets state->iter =
> iter. We can easily set this inside text_position(). That would get
> rid of the need for other callers to pass NULL iter to
> text_position_setup().
>

Done.


> 2. DetoastIteratorData is fixed size, so I see no reason to allocate
> it on the heap. We could allocate it on the stack in text_pos(), and
> pass the pointer to create_detoast_iterator() (in this case maybe a
> better name is init_detoast_iterator), which would return a bool to
> tell text_pos() whether to pass down the pointer or a NULL. The
> allocation of other structs (toast buffer and fetch iterator) probably
> can't be changed without more work.
>

Done
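A minimal sketch of the resulting calling convention, with hypothetical
names (DemoIterData, demo_init_iterator(), demo_textpos()) standing in for
DetoastIteratorData, the proposed init_detoast_iterator() (still
create_detoast_iterator() in the attached patch) and textpos(): the
fixed-size iterator lives on the caller's stack, the init function reports
whether it applies, and the caller stores either its address or NULL in the
search state. As with free_detoast_iterator() in the patch, cleanup releases
only what the iterator owns, never the struct itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fixed-size iterator. */
typedef struct DemoIterData
{
    int         owned_buffers;  /* resources the iterator must release */
} DemoIterData;
typedef DemoIterData *DemoIter;

/* Fill caller-provided storage; false means "detoast the usual way". */
static bool
demo_init_iterator(bool is_external, DemoIter it)
{
    if (!is_external)
        return false;
    it->owned_buffers = 1;
    return true;
}

/* Release what the iterator owns; the struct itself is on the stack. */
static void
demo_free_iterator(DemoIter it)
{
    it->owned_buffers = 0;
}

typedef struct DemoPositionState
{
    DemoIter    iter;           /* NULL when no iterator is in use */
} DemoPositionState;

/* Returns whether an iterator was used; *leaked counts unreleased buffers. */
static bool
demo_textpos(bool is_external, int *leaked)
{
    DemoIterData iter_data;     /* no separate allocation needed */
    DemoIter    it = &iter_data;
    DemoPositionState state;

    if (!demo_init_iterator(is_external, it))
        it = NULL;
    state.iter = it;            /* set once, by the caller */

    /* the search would run here, pulling data through state.iter */

    if (state.iter != NULL)
        demo_free_iterator(state.iter);
    *leaked = (it != NULL) ? it->owned_buffers : 0;
    return it != NULL;
}
```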

If there is anything else that is not explained clearly, please point it
out.

-- 
Best regards,
Binguo Bao
From 8971fbdc8c072f72918a46feec65e385995660b1 Mon Sep 17 00:00:00 2001
From: BBG <djydew...@gmail.com>
Date: Tue, 4 Jun 2019 22:56:42 +0800
Subject: [PATCH] de-TOASTing using an iterator

---
 src/backend/access/heap/tuptoaster.c | 458 +++++++++++++++++++++++++++++++++++
 src/backend/utils/adt/varlena.c      |  37 ++-
 src/include/access/tuptoaster.h      |  90 +++++++
 src/include/fmgr.h                   |   7 +
 4 files changed, 586 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index 55d6e91..7123f62 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -83,6 +83,13 @@ static int	toast_open_indexes(Relation toastrel,
 static void toast_close_indexes(Relation *toastidxs, int num_indexes,
 								LOCKMODE lock);
 static void init_toast_snapshot(Snapshot toast_snapshot);
+static FetchDatumIterator create_fetch_datum_iterator(struct varlena *attr);
+static bool free_fetch_datum_iterator(FetchDatumIterator iter);
+static void fetch_datum_iterate(FetchDatumIterator iter);
+static void init_toast_buffer(ToastBuffer *buf, int32 size, bool compressed);
+static bool free_toast_buffer(ToastBuffer *buf);
+static void pglz_decompress_iterate(ToastBuffer *source, ToastBuffer *dest,
+									DetoastIterator iter);
 
 
 /* ----------
@@ -347,6 +354,117 @@ heap_tuple_untoast_attr_slice(struct varlena *attr,
 
 
 /* ----------
+ * create_detoast_iterator -
+ *
+ * Initialize detoast iterator.
+ * ----------
+ */
+bool create_detoast_iterator(struct varlena *attr, DetoastIterator iterator) {
+	struct varatt_external toast_pointer;
+	if (VARATT_IS_EXTERNAL_ONDISK(attr))
+	{
+		/*
+		 * This is an externally stored datum --- create fetch datum iterator
+		 */
+		iterator->fetch_datum_iterator = create_fetch_datum_iterator(attr);
+		VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
+		if (VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer))
+		{
+			/* If it's compressed, prepare buffer for raw data */
+			iterator->buf = (ToastBuffer *) palloc0(sizeof(ToastBuffer));
+			init_toast_buffer(iterator->buf, toast_pointer.va_rawsize, false);
+			iterator->ctrl = 0;
+			iterator->ctrlc = 8;
+			iterator->compressed = true;
+			iterator->done = false;
+		}
+		else
+		{
+			iterator->buf = iterator->fetch_datum_iterator->buf;
+			iterator->ctrl = 0;
+			iterator->ctrlc = 8;
+			iterator->compressed = false;
+			iterator->done = false;
+		}
+		return true;
+	}
+	else if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+	{
+		/*
+		 * This is an indirect pointer --- dereference it
+		 */
+		struct varatt_indirect redirect;
+
+		VARATT_EXTERNAL_GET_POINTER(redirect, attr);
+		attr = (struct varlena *) redirect.pointer;
+
+		/* nested indirect Datums aren't allowed */
+		Assert(!VARATT_IS_EXTERNAL_INDIRECT(attr));
+
+		/* recurse in case value is still extended in some other way */
+		return create_detoast_iterator(attr, iterator);
+
+	}
+	else if (VARATT_IS_COMPRESSED(attr))
+	{
+		/*
+		 * This is a compressed value inside of the main tuple
+		 * Skip the iterator and just decompress the whole thing.
+		 */
+		return false;
+	}
+
+	return false;
+}
+
+
+/* ----------
+ * free_detoast_iterator -
+ *
+ * Free the memory space occupied by the de-Toast iterator.
+ * ----------
+ */
+bool free_detoast_iterator(DetoastIterator iter) {
+	if (iter == NULL)
+	{
+		return false;
+	}
+	if (iter->buf != iter->fetch_datum_iterator->buf)
+	{
+		free_toast_buffer(iter->buf);
+	}
+	free_fetch_datum_iterator(iter->fetch_datum_iterator);
+	return true;
+}
+
+
+/* ----------
+ * detoast_iterate -
+ *
+ * Iterate through the toasted value referenced by iterator.
+ *
+ * As long as there is another slice in compression or external storage,
+ * detoast it into toast buffer in iterator.
+ * ----------
+ */
+extern void detoast_iterate(DetoastIterator iter)
+{
+	FetchDatumIterator fetch_iter = iter->fetch_datum_iterator;
+
+	Assert(iter != NULL && !iter->done);
+
+	fetch_datum_iterate(fetch_iter);
+
+	if (iter->compressed)
+		pglz_decompress_iterate(fetch_iter->buf, iter->buf, iter);
+
+	if (iter->buf->limit == iter->buf->capacity) {
+		iter->done = true;
+	}
+}
+
+
+/* ----------
  * toast_raw_datum_size -
  *
  *	Return the raw (detoasted) size of a varlena datum
@@ -2409,3 +2527,343 @@ init_toast_snapshot(Snapshot toast_snapshot)
 
 	InitToastSnapshot(*toast_snapshot, snapshot->lsn, snapshot->whenTaken);
 }
+
+
+/* ----------
+ * create_fetch_datum_iterator -
+ *
+ * Initialize fetch datum iterator.
+ * ----------
+ */
+static FetchDatumIterator
+create_fetch_datum_iterator(struct varlena *attr) {
+	int			validIndex;
+	FetchDatumIterator iterator;
+
+	if (!VARATT_IS_EXTERNAL_ONDISK(attr))
+		elog(ERROR, "create_fetch_datum_iterator shouldn't be called for non-ondisk datums");
+
+	iterator = (FetchDatumIterator) palloc0(sizeof(FetchDatumIteratorData));
+
+	/* Must copy to access aligned fields */
+	VARATT_EXTERNAL_GET_POINTER(iterator->toast_pointer, attr);
+
+	iterator->ressize = iterator->toast_pointer.va_extsize;
+	iterator->numchunks = ((iterator->ressize - 1) / TOAST_MAX_CHUNK_SIZE) + 1;
+
+	/*
+	 * Open the toast relation and its indexes
+	 */
+	iterator->toastrel = table_open(iterator->toast_pointer.va_toastrelid, AccessShareLock);
+
+	/* Look for the valid index of the toast relation */
+	validIndex = toast_open_indexes(iterator->toastrel,
+									AccessShareLock,
+									&iterator->toastidxs,
+									&iterator->num_indexes);
+
+	/*
+	 * Setup a scan key to fetch from the index by va_valueid
+	 */
+	ScanKeyInit(&iterator->toastkey,
+				(AttrNumber) 1,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(iterator->toast_pointer.va_valueid));
+
+	/*
+	 * Read the chunks by index
+	 *
+	 * Note that because the index is actually on (valueid, chunkidx) we will
+	 * see the chunks in chunkidx order, even though we didn't explicitly ask
+	 * for it.
+	 */
+
+	init_toast_snapshot(&iterator->SnapshotToast);
+	iterator->toastscan = systable_beginscan_ordered(iterator->toastrel, iterator->toastidxs[validIndex],
+										   &iterator->SnapshotToast, 1, &iterator->toastkey);
+
+	iterator->buf = (ToastBuffer *) palloc0(sizeof(ToastBuffer));
+	init_toast_buffer(iterator->buf, iterator->ressize + VARHDRSZ, VARATT_EXTERNAL_IS_COMPRESSED(iterator->toast_pointer));
+
+	iterator->nextidx = 0;
+	iterator->done = false;
+
+	return iterator;
+}
+
+static bool
+free_fetch_datum_iterator(FetchDatumIterator iter)
+{
+	if (iter == NULL)
+	{
+		return false;
+	}
+
+	if (!iter->done)
+	{
+		systable_endscan_ordered(iter->toastscan);
+		toast_close_indexes(iter->toastidxs, iter->num_indexes, AccessShareLock);
+		table_close(iter->toastrel, AccessShareLock);
+	}
+	free_toast_buffer(iter->buf);
+	pfree(iter);
+	return true;
+}
+
+/* ----------
+ * fetch_datum_iterate -
+ *
+ * Iterate through the toasted value referenced by iterator.
+ *
+ * As long as there is another chunk data in compression or external storage,
+ * fetch it into buffer in iterator.
+ * ----------
+ */
+static void
+fetch_datum_iterate(FetchDatumIterator iter) {
+	HeapTuple	ttup;
+	TupleDesc	toasttupDesc;
+	int32		residx;
+	Pointer		chunk;
+	bool		isnull;
+	char		*chunkdata;
+	int32		chunksize;
+
+	Assert(iter != NULL && !iter->done);
+
+	ttup = systable_getnext_ordered(iter->toastscan, ForwardScanDirection);
+	if (ttup == NULL)
+	{
+		/*
+		 * Final checks that we successfully fetched the datum
+		 */
+		if (iter->nextidx != iter->numchunks)
+			elog(ERROR, "missing chunk number %d for toast value %u in %s",
+				 iter->nextidx,
+				 iter->toast_pointer.va_valueid,
+				 RelationGetRelationName(iter->toastrel));
+
+		/*
+		 * End scan and close relations
+		 */
+		systable_endscan_ordered(iter->toastscan);
+		toast_close_indexes(iter->toastidxs, iter->num_indexes, AccessShareLock);
+		table_close(iter->toastrel, AccessShareLock);
+
+		iter->done = true;
+		return;
+	}
+
+	/*
+	 * Have a chunk, extract the sequence number and the data
+	 */
+	toasttupDesc = iter->toastrel->rd_att;
+	residx = DatumGetInt32(fastgetattr(ttup, 2, toasttupDesc, &isnull));
+	Assert(!isnull);
+	chunk = DatumGetPointer(fastgetattr(ttup, 3, toasttupDesc, &isnull));
+	Assert(!isnull);
+	if (!VARATT_IS_EXTENDED(chunk))
+	{
+		chunksize = VARSIZE(chunk) - VARHDRSZ;
+		chunkdata = VARDATA(chunk);
+	}
+	else if (VARATT_IS_SHORT(chunk))
+	{
+		/* could happen due to heap_form_tuple doing its thing */
+		chunksize = VARSIZE_SHORT(chunk) - VARHDRSZ_SHORT;
+		chunkdata = VARDATA_SHORT(chunk);
+	}
+	else
+	{
+		/* should never happen */
+		elog(ERROR, "found toasted toast chunk for toast value %u in %s",
+			 iter->toast_pointer.va_valueid,
+			 RelationGetRelationName(iter->toastrel));
+		chunksize = 0;		/* keep compiler quiet */
+		chunkdata = NULL;
+	}
+
+	/*
+	 * Some checks on the data we've found
+	 */
+	if (residx != iter->nextidx)
+		elog(ERROR, "unexpected chunk number %d (expected %d) for toast value %u in %s",
+			 residx, iter->nextidx,
+			 iter->toast_pointer.va_valueid,
+			 RelationGetRelationName(iter->toastrel));
+	if (residx < iter->numchunks - 1)
+	{
+		if (chunksize != TOAST_MAX_CHUNK_SIZE)
+			elog(ERROR, "unexpected chunk size %d (expected %d) in chunk %d of %d for toast value %u in %s",
+				 chunksize, (int) TOAST_MAX_CHUNK_SIZE,
+				 residx, iter->numchunks,
+				 iter->toast_pointer.va_valueid,
+				 RelationGetRelationName(iter->toastrel));
+	}
+	else if (residx == iter->numchunks - 1)
+	{
+		if ((residx * TOAST_MAX_CHUNK_SIZE + chunksize) != iter->ressize)
+			elog(ERROR, "unexpected chunk size %d (expected %d) in final chunk %d for toast value %u in %s",
+				 chunksize,
+				 (int) (iter->ressize - residx * TOAST_MAX_CHUNK_SIZE),
+				 residx,
+				 iter->toast_pointer.va_valueid,
+				 RelationGetRelationName(iter->toastrel));
+	}
+	else
+		elog(ERROR, "unexpected chunk number %d (out of range %d..%d) for toast value %u in %s",
+			 residx,
+			 0, iter->numchunks - 1,
+			 iter->toast_pointer.va_valueid,
+			 RelationGetRelationName(iter->toastrel));
+
+	/*
+	 * Copy the data into proper place in our iterator buffer
+	 */
+	memcpy(iter->buf->limit, chunkdata, chunksize);
+	iter->buf->limit += chunksize;
+
+	iter->nextidx++;
+}
+
+
+static void
+init_toast_buffer(ToastBuffer *buf, int32 size, bool compressed) {
+	buf->buf = (const char *) palloc0(size);
+	if (compressed) {
+		SET_VARSIZE_COMPRESSED(buf->buf, size);
+		/*
+		 * Note that the constraint buf->position <= buf->limit may be
+		 * violated at initialization. Make sure that the constraint is
+		 * satisfied before consuming chars.
+		 */
+		buf->position = VARDATA_4B_C(buf->buf);
+	}
+	else
+	{
+		SET_VARSIZE(buf->buf, size);
+		buf->position = VARDATA_4B(buf->buf);
+	}
+	buf->limit = VARDATA(buf->buf);
+	buf->capacity = buf->buf + size;
+	buf->buf_size = size;
+}
+
+
+static bool
+free_toast_buffer(ToastBuffer *buf)
+{
+	if (buf == NULL)
+	{
+		return false;
+	}
+
+	pfree((void *)buf->buf);
+	pfree(buf);
+
+	return true;
+}
+
+
+/* ----------
+ * pglz_decompress_iterate -
+ *
+ *		Decompresses source into dest until the source is exhausted.
+ * ----------
+ */
+static void
+pglz_decompress_iterate(ToastBuffer *source, ToastBuffer *dest, DetoastIterator iter)
+{
+	const unsigned char *sp;
+	const unsigned char *srcend;
+	unsigned char *dp;
+	unsigned char *destend;
+
+	/*
+	 * In the while loop, sp may advance past srcend, so keep a four-byte
+	 * margin that prevents sp from reading bytes past the source buffer
+	 * limit. Once source->limit reaches source->capacity, all data has
+	 * been fetched and no margin is needed.
+	 */
+	srcend = (const unsigned char *)
+		(source->limit == source->capacity ? source->limit : (source->limit - 4));
+	sp = (const unsigned char *) source->position;
+	dp = (unsigned char *) dest->limit;
+	destend = (unsigned char *) dest->capacity;
+
+	while (sp < srcend && dp < destend)
+	{
+		/*
+		 * Read one control byte and process the next 8 items (or as many as
+		 * remain in the compressed input).
+		 */
+		unsigned char ctrl;
+		int			ctrlc;
+
+		if (iter->ctrlc < 8) {
+			ctrl = iter->ctrl;
+			ctrlc = iter->ctrlc;
+		}
+		else
+		{
+			ctrl = *sp++;
+			ctrlc = 0;
+		}
+
+
+		for (; ctrlc < 8 && sp < srcend && dp < destend; ctrlc++)
+		{
+
+			if (ctrl & 1)
+			{
+				/*
+				 * Otherwise it contains the match length minus 3 and the
+				 * upper 4 bits of the offset. The next following byte
+				 * contains the lower 8 bits of the offset. If the length is
+				 * coded as 18, another extension tag byte tells how much
+				 * longer the match really was (0-255).
+				 */
+				int32		len;
+				int32		off;
+
+				len = (sp[0] & 0x0f) + 3;
+				off = ((sp[0] & 0xf0) << 4) | sp[1];
+				sp += 2;
+				if (len == 18)
+					len += *sp++;
+
+				/*
+				 * Now we copy the bytes specified by the tag from OUTPUT to
+				 * OUTPUT. It is dangerous and platform dependent to use
+				 * memcpy() here, because the copied areas could overlap
+				 * extremely!
+				 */
+				len = Min(len, destend - dp);
+				while (len--)
+				{
+					*dp = dp[-off];
+					dp++;
+				}
+			}
+			else
+			{
+				/*
+				 * An unset control bit means LITERAL BYTE. So we just copy
+				 * one from INPUT to OUTPUT.
+				 */
+				*dp++ = *sp++;
+			}
+
+			/*
+			 * Advance the control bit
+			 */
+			ctrl >>= 1;
+		}
+
+		iter->ctrlc = ctrlc;
+		iter->ctrl = ctrl;
+	}
+
+	source->position = (char *) sp;
+	dest->limit = (char *) dp;
+}
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 0864838..0312dba 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -56,6 +56,8 @@ typedef struct
 	int			len1;			/* string lengths in bytes */
 	int			len2;
 
+	DetoastIterator iter;
+
 	/* Skip table for Boyer-Moore-Horspool search algorithm: */
 	int			skiptablemask;	/* mask for ANDing with skiptable subscripts */
 	int			skiptable[256]; /* skip distance for given mismatched char */
@@ -122,7 +124,7 @@ static text *text_substring(Datum str,
 							int32 length,
 							bool length_not_specified);
 static text *text_overlay(text *t1, text *t2, int sp, int sl);
-static int	text_position(text *t1, text *t2, Oid collid);
+static int	text_position(text *t1, text *t2, Oid collid, DetoastIterator iter);
 static void text_position_setup(text *t1, text *t2, Oid collid, TextPositionState *state);
 static bool text_position_next(TextPositionState *state);
 static char *text_position_next_internal(char *start_ptr, TextPositionState *state);
@@ -1092,10 +1094,22 @@ text_overlay(text *t1, text *t2, int sp, int sl)
 Datum
 textpos(PG_FUNCTION_ARGS)
 {
-	text	   *str = PG_GETARG_TEXT_PP(0);
+	text		*str;
+	DetoastIteratorData iteratorData;
+	DetoastIterator iter = &iteratorData;
 	text	   *search_str = PG_GETARG_TEXT_PP(1);
 
-	PG_RETURN_INT32((int32) text_position(str, search_str, PG_GET_COLLATION()));
+	if (create_detoast_iterator((struct varlena *)(DatumGetPointer(PG_GETARG_DATUM(0))), iter))
+	{
+		str = (text *) iter->buf->buf;
+	}
+	else
+	{
+		str = PG_GETARG_TEXT_PP(0);
+		iter = NULL;
+	}
+
+	PG_RETURN_INT32((int32) text_position(str, search_str, PG_GET_COLLATION(), iter));
 }
 
 /*
@@ -1113,7 +1127,7 @@ textpos(PG_FUNCTION_ARGS)
  *	functions.
  */
 static int
-text_position(text *t1, text *t2, Oid collid)
+text_position(text *t1, text *t2, Oid collid, DetoastIterator iter)
 {
 	TextPositionState state;
 	int			result;
@@ -1121,6 +1135,7 @@ text_position(text *t1, text *t2, Oid collid)
 	if (VARSIZE_ANY_EXHDR(t1) < 1 || VARSIZE_ANY_EXHDR(t2) < 1)
 		return 0;
 
+	state.iter = iter;
 	text_position_setup(t1, t2, collid, &state);
 	if (!text_position_next(&state))
 		result = 0;
@@ -1130,7 +1145,6 @@ text_position(text *t1, text *t2, Oid collid)
 	return result;
 }
 
-
 /*
  * text_position_setup, text_position_next, text_position_cleanup -
  *	Component steps of text_position()
@@ -1358,6 +1372,10 @@ text_position_next_internal(char *start_ptr, TextPositionState *state)
 		hptr = start_ptr;
 		while (hptr < haystack_end)
 		{
+			if (state->iter != NULL) {
+				PG_DETOAST_ITERATE(state->iter, hptr);
+			}
+
 			if (*hptr == nchar)
 				return (char *) hptr;
 			hptr++;
@@ -1375,6 +1393,11 @@ text_position_next_internal(char *start_ptr, TextPositionState *state)
 			const char *nptr;
 			const char *p;
 
+			if (state->iter != NULL)
+			{
+				PG_DETOAST_ITERATE(state->iter, hptr);
+			}
+
 			nptr = needle_last;
 			p = hptr;
 			while (*nptr == *p)
@@ -1438,7 +1461,9 @@ text_position_get_match_pos(TextPositionState *state)
 static void
 text_position_cleanup(TextPositionState *state)
 {
-	/* no cleanup needed */
+	if (state->iter != NULL) {
+		free_detoast_iterator(state->iter);
+	}
 }
 
 static void
diff --git a/src/include/access/tuptoaster.h b/src/include/access/tuptoaster.h
index f0aea24..589bc27 100644
--- a/src/include/access/tuptoaster.h
+++ b/src/include/access/tuptoaster.h
@@ -17,6 +17,96 @@
 #include "storage/lockdefs.h"
 #include "utils/relcache.h"
 
+#ifndef FRONTEND
+#include "access/genam.h"
+
+/*
+ * TOAST buffer is a producer consumer buffer.
+ *
+ *    +--+--+--+--+--+--+--+--+--+--+--+--+--+
+ *    |  |  |  |  |  |  |  |  |  |  |  |  |  |
+ *    +--+--+--+--+--+--+--+--+--+--+--+--+--+
+ *    ^           ^           ^              ^
+ *   buf      position      limit         capacity
+ *
+ * buf: points to the start of the buffer.
+ * position: points to the next char to be consumed.
+ * limit: points to the next char to be produced.
+ * capacity: points to the end of the buffer.
+ *
+ * Constraints that need to be satisfied:
+ * buf <= position <= limit <= capacity
+ */
+typedef struct ToastBuffer
+{
+	const char	*buf;
+	const char	*position;
+	char		*limit;
+	const char	*capacity;
+	int32		buf_size;
+} ToastBuffer;
+
+
+typedef struct FetchDatumIteratorData
+{
+	ToastBuffer	*buf;
+	Relation	toastrel;
+	Relation	*toastidxs;
+	SysScanDesc	toastscan;
+	ScanKeyData	toastkey;
+	SnapshotData			SnapshotToast;
+	struct varatt_external	toast_pointer;
+	int32		ressize;
+	int32		nextidx;
+	int32		numchunks;
+	int			num_indexes;
+	bool		done;
+}				FetchDatumIteratorData;
+
+typedef struct FetchDatumIteratorData *FetchDatumIterator;
+
+typedef struct DetoastIteratorData
+{
+	ToastBuffer 		*buf;
+	FetchDatumIterator	fetch_datum_iterator;
+	unsigned char		ctrl;
+	int					ctrlc;
+	bool				compressed;		/* toast value is compressed? */
+	bool				done;
+}			DetoastIteratorData;
+
+typedef struct DetoastIteratorData *DetoastIterator;
+
+/* ----------
+ * create_detoast_iterator -
+ *
+ * Initialize detoast iterator.
+ * ----------
+ */
+extern bool create_detoast_iterator(struct varlena *attr, DetoastIterator iterator);
+
+/* ----------
+ * free_detoast_iterator -
+ *
+ * Free the memory space occupied by the de-Toast iterator.
+ * ----------
+ */
+extern bool free_detoast_iterator(DetoastIterator iter);
+
+/* ----------
+ * detoast_iterate -
+ *
+ * Iterate through the toasted value referenced by iterator.
+ *
+ * As long as there is another slice in compression or external storage,
+ * detoast it into toast buffer in iterator.
+ * ----------
+ */
+extern void detoast_iterate(DetoastIterator iter);
+
+#endif
+
+
 /*
  * This enables de-toasting of index entries.  Needed until VACUUM is
  * smart enough to rebuild indexes from scratch.
diff --git a/src/include/fmgr.h b/src/include/fmgr.h
index 3ff0999..446c880 100644
--- a/src/include/fmgr.h
+++ b/src/include/fmgr.h
@@ -239,6 +239,13 @@ extern struct varlena *pg_detoast_datum_packed(struct varlena *datum);
 #define PG_DETOAST_DATUM_SLICE(datum,f,c) \
 		pg_detoast_datum_slice((struct varlena *) DatumGetPointer(datum), \
 		(int32) (f), (int32) (c))
+#define PG_DETOAST_ITERATE(iter, need)									\
+	do {																\
+		Assert((need) >= (iter)->buf->buf && (need) <= (iter)->buf->capacity); \
+		while (!(iter)->done && (need) >= (iter)->buf->limit) {			\
+			detoast_iterate(iter);										\
+		}																\
+	} while (0)
 /* WARNING -- unaligned pointer */
 #define PG_DETOAST_DATUM_PACKED(datum) \
 	pg_detoast_datum_packed((struct varlena *) DatumGetPointer(datum))
-- 
2.7.4
