On 09/18/2014 09:27 PM, Heikki Linnakangas wrote:
On 09/18/2014 07:53 PM, Josh Berkus wrote:
On 09/16/2014 08:45 PM, Tom Lane wrote:
We're somewhat comparing apples and oranges here, in that I pushed my
approach to something that I think is of committable quality (and which,
not incidentally, fixes some existing bugs that we'd need to fix in any
case); while Heikki's patch was just proof-of-concept.  It would be worth
pushing Heikki's patch to committable quality so that we had a more
complete understanding of just what the complexity difference really is.

Is anyone actually working on this?

If not, I'm voting for the all-lengths patch so that we can get 9.4 out
the door.

I'll try to write a more polished patch tomorrow. We'll then see what it
looks like, and can decide if we want it.

Ok, here are two patches. One is a refined version of my earlier patch, and the other implements the separate offsets array approach. They are both based on Tom's jsonb-lengths-merged.patch, so they include all the whitespace fixes etc. he mentioned.

There is no big difference in terms of code complexity between the patches. IMHO the separate offsets array is easier to understand, but it makes for more complicated accessor macros to find the beginning of the variable-length data.

Unlike Tom's patch, these patches don't cache any offsets when doing a binary search. Doesn't seem worth it, when the access time is O(1) anyway.

Both of these patches have a #define JB_OFFSET_STRIDE for the "stride size". For the separate offsets array, the offsets array has one element for every JB_OFFSET_STRIDE children. For the other patch, every JB_OFFSET_STRIDE child stores the end offset, while others store the length. A smaller value makes random access faster, at the cost of compressibility / on-disk size. I haven't done any measurements to find the optimal value, the values in the patches are arbitrary.

I think we should bite the bullet and break compatibility with 9.4beta2 format, even if we go with "my patch". In a jsonb object, it makes sense to store all the keys first, like Tom did, because of cache benefits, and the future possibility to do smart EXTERNAL access. Also, even if we can make the on-disk format compatible, it's weird that you can get different runtime behavior with datums created with a beta version. Seems more clear to just require a pg_dump + restore.

Tom: You mentioned earlier that your patch fixes some existing bugs. What were they? There were a bunch of whitespace and comment fixes that we should apply in any case, but I couldn't see any actual bugs. I think we should apply those fixes separately, to make sure we don't forget about them, and to make it easier to review these patches.

- Heikki

diff --git a/src/backend/utils/adt/jsonb.c b/src/backend/utils/adt/jsonb.c
index 2fd87fc..456011a 100644
--- a/src/backend/utils/adt/jsonb.c
+++ b/src/backend/utils/adt/jsonb.c
@@ -196,12 +196,12 @@ jsonb_from_cstring(char *json, int len)
 static size_t
 checkStringLen(size_t len)
 {
-	if (len > JENTRY_POSMASK)
+	if (len > JENTRY_LENMASK)
 		ereport(ERROR,
 				(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
 				 errmsg("string too long to represent as jsonb string"),
 				 errdetail("Due to an implementation restriction, jsonb strings cannot exceed %d bytes.",
-						   JENTRY_POSMASK)));
+						   JENTRY_LENMASK)));
 
 	return len;
 }
diff --git a/src/backend/utils/adt/jsonb_util.c b/src/backend/utils/adt/jsonb_util.c
index 04f35bf..7f7ed4f 100644
--- a/src/backend/utils/adt/jsonb_util.c
+++ b/src/backend/utils/adt/jsonb_util.c
@@ -26,15 +26,16 @@
  * in MaxAllocSize, and the number of elements (or pairs) must fit in the bits
  * reserved for that in the JsonbContainer.header field.
  *
- * (the total size of an array's elements is also limited by JENTRY_POSMASK,
- * but we're not concerned about that here)
+ * (The total size of an array's or object's elements is also limited by
+ * JENTRY_LENMASK, but we're not concerned about that here.)
  */
 #define JSONB_MAX_ELEMS (Min(MaxAllocSize / sizeof(JsonbValue), JB_CMASK))
 #define JSONB_MAX_PAIRS (Min(MaxAllocSize / sizeof(JsonbPair), JB_CMASK))
 
-static void fillJsonbValue(JEntry *array, int index, char *base_addr,
+static void fillJsonbValue(JEntry *children, int index,
+			   char *base_addr, uint32 offset,
 			   JsonbValue *result);
-static bool	equalsJsonbScalarValue(JsonbValue *a, JsonbValue *b);
+static bool equalsJsonbScalarValue(JsonbValue *a, JsonbValue *b);
 static int	compareJsonbScalarValue(JsonbValue *a, JsonbValue *b);
 static Jsonb *convertToJsonb(JsonbValue *val);
 static void convertJsonbValue(StringInfo buffer, JEntry *header, JsonbValue *val, int level);
@@ -42,7 +43,7 @@ static void convertJsonbArray(StringInfo buffer, JEntry *header, JsonbValue *val
 static void convertJsonbObject(StringInfo buffer, JEntry *header, JsonbValue *val, int level);
 static void convertJsonbScalar(StringInfo buffer, JEntry *header, JsonbValue *scalarVal);
 
-static int reserveFromBuffer(StringInfo buffer, int len);
+static int	reserveFromBuffer(StringInfo buffer, int len);
 static void appendToBuffer(StringInfo buffer, const char *data, int len);
 static void copyToBuffer(StringInfo buffer, int offset, const char *data, int len);
 static short padBufferToInt(StringInfo buffer);
@@ -108,6 +109,57 @@ JsonbValueToJsonb(JsonbValue *val)
 }
 
 /*
+ * Get the offset of the variable-length portion of a Jsonb node within
+ * the variable-length-data part of its container.  The node is identified
+ * by index within the container's JEntry array.
+ */
+static uint32
+getJsonbOffset(const JsonbContainer *jc, int index)
+{
+	int			i;
+	uint32		offset = 0;
+
+	/*
+	 * Start offset of this entry is equal to the end offset of the previous
+	 * entry. Walk backwards to the previous entry stored as an end offset,
+	 * and count up the lengths in between.
+	 */
+	for (i = index - 1; i >= 0; i--)
+	{
+		offset += JBE_LENFLD(jc->children[i]);
+		if (!JBE_HAS_LEN(jc->children[i]))
+			break;
+	}
+
+	return offset;
+}
+
+/*
+ * Get the length of a child in a container.
+ */
+static uint32
+getJsonbLength(const JsonbContainer *jc, int index)
+{
+	uint32          off;
+	uint32          len;
+
+	/*
+	 * If the length is stored directly in the JEntry, return that. Otherwise,
+	 * get the begin offset of the entry, and subtract it from the stored end
+	 * offset.
+	 */
+	if (JBE_HAS_LEN(jc->children[index]))
+		len = JBE_LENFLD(jc->children[index]);
+	else
+	{
+		off = getJsonbOffset(jc, index);
+		len = JBE_LENFLD(jc->children[index]) - off;
+	}
+
+	return len;
+}
+
+/*
  * BT comparator worker function.  Returns an integer less than, equal to, or
  * greater than zero, indicating whether a is less than, equal to, or greater
  * than b.  Consistent with the requirements for a B-Tree operator class
@@ -201,7 +253,7 @@ compareJsonbContainers(JsonbContainer *a, JsonbContainer *b)
 			 *
 			 * If the two values were of the same container type, then there'd
 			 * have been a chance to observe the variation in the number of
-			 * elements/pairs (when processing WJB_BEGIN_OBJECT, say).  They're
+			 * elements/pairs (when processing WJB_BEGIN_OBJECT, say). They're
 			 * either two heterogeneously-typed containers, or a container and
 			 * some scalar type.
 			 *
@@ -271,62 +323,72 @@ findJsonbValueFromContainer(JsonbContainer *container, uint32 flags,
 							JsonbValue *key)
 {
 	JEntry	   *children = container->children;
+	char	   *base_addr = JB_CONTAINER_CHILD_DATA(container);
 	int			count = (container->header & JB_CMASK);
-	JsonbValue *result = palloc(sizeof(JsonbValue));
+	JsonbValue *result;
 
 	Assert((flags & ~(JB_FARRAY | JB_FOBJECT)) == 0);
 
+	/* Quick out without a palloc cycle if object/array is empty */
+	if (count <= 0)
+		return NULL;
+
+	result = palloc(sizeof(JsonbValue));
+
 	if (flags & JB_FARRAY & container->header)
 	{
-		char	   *base_addr = (char *) (children + count);
+		uint32		offset = 0;
 		int			i;
 
 		for (i = 0; i < count; i++)
 		{
-			fillJsonbValue(children, i, base_addr, result);
+			fillJsonbValue(children, i, base_addr, offset, result);
 
 			if (key->type == result->type)
 			{
 				if (equalsJsonbScalarValue(key, result))
 					return result;
 			}
+
+			if (JBE_HAS_LEN(children[i]))
+				offset += JBE_LENFLD(children[i]);
+			else
+				offset = JBE_LENFLD(children[i]);
 		}
 	}
 	else if (flags & JB_FOBJECT & container->header)
 	{
-		/* Since this is an object, account for *Pairs* of Jentrys */
-		char	   *base_addr = (char *) (children + count * 2);
+		uint32		lastoff;
 		uint32		stopLow = 0,
-					stopMiddle;
+					stopHigh = count;
 
-		/* Object key past by caller must be a string */
+		/* Object key passed by caller must be a string */
 		Assert(key->type == jbvString);
 
 		/* Binary search on object/pair keys *only* */
-		while (stopLow < count)
+		while (stopLow < stopHigh)
 		{
-			int			index;
+			uint32		stopMiddle;
 			int			difference;
 			JsonbValue	candidate;
 
-			/*
-			 * Note how we compensate for the fact that we're iterating
-			 * through pairs (not entries) throughout.
-			 */
-			stopMiddle = stopLow + (count - stopLow) / 2;
-
-			index = stopMiddle * 2;
+			stopMiddle = stopLow + (stopHigh - stopLow) / 2;
 
 			candidate.type = jbvString;
-			candidate.val.string.val = base_addr + JBE_OFF(children, index);
-			candidate.val.string.len = JBE_LEN(children, index);
+			candidate.val.string.val =
+				base_addr + getJsonbOffset(container, stopMiddle);
+			candidate.val.string.len = getJsonbLength(container, stopMiddle);
 
 			difference = lengthCompareJsonbStringValue(&candidate, key);
 
 			if (difference == 0)
 			{
-				/* Found our key, return value */
-				fillJsonbValue(children, index + 1, base_addr, result);
+				/* Found our key, return corresponding value */
+				int			index = stopMiddle + count;
+
+				lastoff = getJsonbOffset(container, index);
+
+				fillJsonbValue(children, index, base_addr, lastoff, result);
 
 				return result;
 			}
@@ -335,7 +397,7 @@ findJsonbValueFromContainer(JsonbContainer *container, uint32 flags,
 				if (difference < 0)
 					stopLow = stopMiddle + 1;
 				else
-					count = stopMiddle;
+					stopHigh = stopMiddle;
 			}
 		}
 	}
@@ -361,14 +423,16 @@ getIthJsonbValueFromContainer(JsonbContainer *container, uint32 i)
 		elog(ERROR, "not a jsonb array");
 
 	nelements = container->header & JB_CMASK;
-	base_addr = (char *) &container->children[nelements];
+	base_addr = JB_CONTAINER_CHILD_DATA(container);
 
 	if (i >= nelements)
 		return NULL;
 
 	result = palloc(sizeof(JsonbValue));
 
-	fillJsonbValue(container->children, i, base_addr, result);
+	fillJsonbValue(container->children, i, base_addr,
+				   getJsonbOffset(container, i),
+				   result);
 
 	return result;
 }
@@ -377,11 +441,17 @@ getIthJsonbValueFromContainer(JsonbContainer *container, uint32 i)
  * A helper function to fill in a JsonbValue to represent an element of an
  * array, or a key or value of an object.
  *
+ * The node's JEntry is at children[index], and its variable-length data
+ * is at base_addr + offset.  We make the caller determine the offset since
+ * in many cases the caller can amortize the work across multiple children.
+ *
  * A nested array or object will be returned as jbvBinary, ie. it won't be
  * expanded.
  */
 static void
-fillJsonbValue(JEntry *children, int index, char *base_addr, JsonbValue *result)
+fillJsonbValue(JEntry *children, int index,
+			   char *base_addr, uint32 offset,
+			   JsonbValue *result)
 {
 	JEntry		entry = children[index];
 
@@ -392,14 +462,14 @@ fillJsonbValue(JEntry *children, int index, char *base_addr, JsonbValue *result)
 	else if (JBE_ISSTRING(entry))
 	{
 		result->type = jbvString;
-		result->val.string.val = base_addr + JBE_OFF(children, index);
-		result->val.string.len = JBE_LEN(children, index);
+		result->val.string.val = base_addr + offset;
+		result->val.string.len = JBE_LENFLD(entry);
 		Assert(result->val.string.len >= 0);
 	}
 	else if (JBE_ISNUMERIC(entry))
 	{
 		result->type = jbvNumeric;
-		result->val.numeric = (Numeric) (base_addr + INTALIGN(JBE_OFF(children, index)));
+		result->val.numeric = (Numeric) (base_addr + INTALIGN(offset));
 	}
 	else if (JBE_ISBOOL_TRUE(entry))
 	{
@@ -415,8 +485,9 @@ fillJsonbValue(JEntry *children, int index, char *base_addr, JsonbValue *result)
 	{
 		Assert(JBE_ISCONTAINER(entry));
 		result->type = jbvBinary;
-		result->val.binary.data = (JsonbContainer *) (base_addr + INTALIGN(JBE_OFF(children, index)));
-		result->val.binary.len = JBE_LEN(children, index) - (INTALIGN(JBE_OFF(children, index)) - JBE_OFF(children, index));
+		/* Remove alignment padding from data pointer and len */
+		result->val.binary.data = (JsonbContainer *) (base_addr + INTALIGN(offset));
+		result->val.binary.len = JBE_LENFLD(entry) - (INTALIGN(offset) - offset);
 	}
 }
 
@@ -646,6 +717,8 @@ JsonbIteratorInit(JsonbContainer *container)
 JsonbIteratorToken
 JsonbIteratorNext(JsonbIterator **it, JsonbValue *val, bool skipNested)
 {
+	JEntry je;
+
 	if (*it == NULL)
 		return WJB_DONE;
 
@@ -668,13 +741,15 @@ recurse:
 			 * a full conversion
 			 */
 			val->val.array.rawScalar = (*it)->isScalar;
-			(*it)->i = 0;
+			(*it)->curIndex = 0;
+			(*it)->curDataOffset = 0;
+			(*it)->curValueOffset = 0;	/* not actually used */
 			/* Set state for next call */
 			(*it)->state = JBI_ARRAY_ELEM;
 			return WJB_BEGIN_ARRAY;
 
 		case JBI_ARRAY_ELEM:
-			if ((*it)->i >= (*it)->nElems)
+			if ((*it)->curIndex >= (*it)->nElems)
 			{
 				/*
 				 * All elements within array already processed.  Report this
@@ -686,7 +761,16 @@ recurse:
 				return WJB_END_ARRAY;
 			}
 
-			fillJsonbValue((*it)->children, (*it)->i++, (*it)->dataProper, val);
+			fillJsonbValue((*it)->children, (*it)->curIndex,
+						   (*it)->dataProper, (*it)->curDataOffset,
+						   val);
+
+			je = (*it)->children[(*it)->curIndex];
+			if (JBE_HAS_LEN(je))
+				(*it)->curDataOffset += JBE_LENFLD(je);
+			else
+				(*it)->curDataOffset = JBE_LENFLD(je);
+			(*it)->curIndex++;
 
 			if (!IsAJsonbScalar(val) && !skipNested)
 			{
@@ -697,8 +781,8 @@ recurse:
 			else
 			{
 				/*
-				 * Scalar item in array, or a container and caller didn't
-				 * want us to recurse into it.
+				 * Scalar item in array, or a container and caller didn't want
+				 * us to recurse into it.
 				 */
 				return WJB_ELEM;
 			}
@@ -712,13 +796,15 @@ recurse:
 			 * v->val.object.pairs is not actually set, because we aren't
 			 * doing a full conversion
 			 */
-			(*it)->i = 0;
+			(*it)->curIndex = 0;
+			(*it)->curDataOffset = 0;
+			(*it)->curValueOffset = getJsonbOffset((*it)->container, (*it)->nElems);
 			/* Set state for next call */
 			(*it)->state = JBI_OBJECT_KEY;
 			return WJB_BEGIN_OBJECT;
 
 		case JBI_OBJECT_KEY:
-			if ((*it)->i >= (*it)->nElems)
+			if ((*it)->curIndex >= (*it)->nElems)
 			{
 				/*
 				 * All pairs within object already processed.  Report this to
@@ -732,7 +818,9 @@ recurse:
 			else
 			{
 				/* Return key of a key/value pair.  */
-				fillJsonbValue((*it)->children, (*it)->i * 2, (*it)->dataProper, val);
+				fillJsonbValue((*it)->children, (*it)->curIndex,
+							   (*it)->dataProper, (*it)->curDataOffset,
+							   val);
 				if (val->type != jbvString)
 					elog(ERROR, "unexpected jsonb type as object key");
 
@@ -745,8 +833,21 @@ recurse:
 			/* Set state for next call */
 			(*it)->state = JBI_OBJECT_KEY;
 
-			fillJsonbValue((*it)->children, ((*it)->i++) * 2 + 1,
-						   (*it)->dataProper, val);
+			fillJsonbValue((*it)->children, (*it)->curIndex + (*it)->nElems,
+						   (*it)->dataProper, (*it)->curValueOffset,
+						   val);
+
+			je = (*it)->children[(*it)->curIndex];
+			if (JBE_HAS_LEN(je))
+				(*it)->curDataOffset += JBE_LENFLD(je);
+			else
+				(*it)->curDataOffset = JBE_LENFLD(je);
+			je = (*it)->children[(*it)->curIndex + (*it)->nElems];
+			if (JBE_HAS_LEN(je))
+				(*it)->curValueOffset += JBE_LENFLD(je);
+			else
+				(*it)->curValueOffset = JBE_LENFLD(je);
+			(*it)->curIndex++;
 
 			/*
 			 * Value may be a container, in which case we recurse with new,
@@ -781,12 +882,11 @@ iteratorFromContainer(JsonbContainer *container, JsonbIterator *parent)
 
 	/* Array starts just after header */
 	it->children = container->children;
+	it->dataProper = JB_CONTAINER_CHILD_DATA(it->container);
 
 	switch (container->header & (JB_FARRAY | JB_FOBJECT))
 	{
 		case JB_FARRAY:
-			it->dataProper =
-				(char *) it->children + it->nElems * sizeof(JEntry);
 			it->isScalar = (container->header & JB_FSCALAR) != 0;
 			/* This is either a "raw scalar", or an array */
 			Assert(!it->isScalar || it->nElems == 1);
@@ -795,13 +895,6 @@ iteratorFromContainer(JsonbContainer *container, JsonbIterator *parent)
 			break;
 
 		case JB_FOBJECT:
-
-			/*
-			 * Offset reflects that nElems indicates JsonbPairs in an object.
-			 * Each key and each value contain Jentry metadata just the same.
-			 */
-			it->dataProper =
-				(char *) it->children + it->nElems * sizeof(JEntry) * 2;
 			it->state = JBI_OBJECT_START;
 			break;
 
@@ -1209,8 +1302,8 @@ reserveFromBuffer(StringInfo buffer, int len)
 	buffer->len += len;
 
 	/*
-	 * Keep a trailing null in place, even though it's not useful for us;
-	 * it seems best to preserve the invariants of StringInfos.
+	 * Keep a trailing null in place, even though it's not useful for us; it
+	 * seems best to preserve the invariants of StringInfos.
 	 */
 	buffer->data[buffer->len] = '\0';
 
@@ -1284,8 +1377,8 @@ convertToJsonb(JsonbValue *val)
 
 	/*
 	 * Note: the JEntry of the root is discarded. Therefore the root
-	 * JsonbContainer struct must contain enough information to tell what
-	 * kind of value it is.
+	 * JsonbContainer struct must contain enough information to tell what kind
+	 * of value it is.
 	 */
 
 	res = (Jsonb *) buffer.data;
@@ -1315,10 +1408,10 @@ convertJsonbValue(StringInfo buffer, JEntry *header, JsonbValue *val, int level)
 		return;
 
 	/*
-	 * A JsonbValue passed as val should never have a type of jbvBinary,
-	 * and neither should any of its sub-components. Those values will be
-	 * produced by convertJsonbArray and convertJsonbObject, the results of
-	 * which will not be passed back to this function as an argument.
+	 * A JsonbValue passed as val should never have a type of jbvBinary, and
+	 * neither should any of its sub-components. Those values will be produced
+	 * by convertJsonbArray and convertJsonbObject, the results of which will
+	 * not be passed back to this function as an argument.
 	 */
 
 	if (IsAJsonbScalar(val))
@@ -1334,57 +1427,75 @@ convertJsonbValue(StringInfo buffer, JEntry *header, JsonbValue *val, int level)
 static void
 convertJsonbArray(StringInfo buffer, JEntry *pheader, JsonbValue *val, int level)
 {
-	int			offset;
-	int			metaoffset;
+	int			base_offset;
+	int			jentry_offset;
 	int			i;
 	int			totallen;
 	uint32		header;
+	int			nElems = val->val.array.nElems;
 
-	/* Initialize pointer into conversion buffer at this level */
-	offset = buffer->len;
+	/* Remember where in the buffer this array starts. */
+	base_offset = buffer->len;
 
+	/* Align to 4-byte boundary (any padding counts as part of my data) */
 	padBufferToInt(buffer);
 
 	/*
-	 * Construct the header Jentry, stored in the beginning of the variable-
-	 * length payload.
+	 * Construct the header Jentry and store it in the beginning of the
+	 * variable-length payload.
 	 */
-	header = val->val.array.nElems | JB_FARRAY;
+	header = nElems | JB_FARRAY;
 	if (val->val.array.rawScalar)
 	{
-		Assert(val->val.array.nElems == 1);
+		Assert(nElems == 1);
 		Assert(level == 0);
 		header |= JB_FSCALAR;
 	}
 
 	appendToBuffer(buffer, (char *) &header, sizeof(uint32));
-	/* reserve space for the JEntries of the elements. */
-	metaoffset = reserveFromBuffer(buffer, sizeof(JEntry) * val->val.array.nElems);
+
+	/* Reserve space for the JEntries of the elements. */
+	jentry_offset = reserveFromBuffer(buffer, sizeof(JEntry) * nElems);
 
 	totallen = 0;
-	for (i = 0; i < val->val.array.nElems; i++)
+	for (i = 0; i < nElems; i++)
 	{
 		JsonbValue *elem = &val->val.array.elems[i];
 		int			len;
 		JEntry		meta;
 
+		/*
+		 * Convert element, producing a JEntry and appending its
+		 * variable-length data to buffer
+		 */
 		convertJsonbValue(buffer, &meta, elem, level + 1);
-		len = meta & JENTRY_POSMASK;
-		totallen += len;
 
-		if (totallen > JENTRY_POSMASK)
+		/*
+		 * Bail out if total variable-length data exceeds what will fit in a
+		 * JEntry length field.  We check this in each iteration, not just
+		 * once at the end, to forestall possible integer overflow.
+		 */
+		len = JBE_LENFLD(meta);
+		totallen += len;
+		if (totallen > JENTRY_LENMASK)
 			ereport(ERROR,
 					(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
 					 errmsg("total size of jsonb array elements exceeds the maximum of %u bytes",
-							JENTRY_POSMASK)));
+							JENTRY_LENMASK)));
 
-		if (i > 0)
-			meta = (meta & ~JENTRY_POSMASK) | totallen;
-		copyToBuffer(buffer, metaoffset, (char *) &meta, sizeof(JEntry));
-		metaoffset += sizeof(JEntry);
+		copyToBuffer(buffer, jentry_offset, (char *) &meta, sizeof(JEntry));
+		jentry_offset += sizeof(JEntry);
 	}
 
-	totallen = buffer->len - offset;
+	/* Total data size is everything we've appended to buffer */
+	totallen = buffer->len - base_offset;
+
+	/* Check length again, since we didn't include the metadata above */
+	if (totallen > JENTRY_LENMASK)
+		ereport(ERROR,
+				(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+				 errmsg("total size of jsonb array elements exceeds the maximum of %u bytes",
+						JENTRY_LENMASK)));
 
 	/* Initialize the header of this node, in the container's JEntry array */
 	*pheader = JENTRY_ISCONTAINER | totallen;
@@ -1393,65 +1504,104 @@ convertJsonbArray(StringInfo buffer, JEntry *pheader, JsonbValue *val, int level
 static void
 convertJsonbObject(StringInfo buffer, JEntry *pheader, JsonbValue *val, int level)
 {
-	uint32		header;
-	int			offset;
-	int			metaoffset;
+	int			base_offset;
+	int			jentry_offset;
 	int			i;
 	int			totallen;
+	uint32		header;
+	int			nPairs = val->val.object.nPairs;
 
-	/* Initialize pointer into conversion buffer at this level */
-	offset = buffer->len;
+	/* Remember where in the buffer this object starts. */
+	base_offset = buffer->len;
 
+	/* Align to 4-byte boundary (any padding counts as part of my data) */
 	padBufferToInt(buffer);
 
-	/* Initialize header */
-	header = val->val.object.nPairs | JB_FOBJECT;
+	/*
+	 * Construct the header Jentry and store it in the beginning of the
+	 * variable-length payload.
+	 */
+	header = nPairs | JB_FOBJECT;
 	appendToBuffer(buffer, (char *) &header, sizeof(uint32));
 
-	/* reserve space for the JEntries of the keys and values */
-	metaoffset = reserveFromBuffer(buffer, sizeof(JEntry) * val->val.object.nPairs * 2);
+	/* Reserve space for the JEntries of the keys and values. */
+	jentry_offset = reserveFromBuffer(buffer, sizeof(JEntry) * nPairs * 2);
 
+	/*
+	 * Iterate over the keys, then over the values, since that is the ordering
+	 * we want in the on-disk representation.
+	 */
 	totallen = 0;
-	for (i = 0; i < val->val.object.nPairs; i++)
+	for (i = 0; i < nPairs; i++)
 	{
-		JsonbPair *pair = &val->val.object.pairs[i];
-		int len;
-		JEntry meta;
+		JsonbPair  *pair = &val->val.object.pairs[i];
+		int			len;
+		JEntry		meta;
 
-		/* put key */
+		/*
+		 * Convert key, producing a JEntry and appending its variable-length
+		 * data to buffer
+		 */
 		convertJsonbScalar(buffer, &meta, &pair->key);
 
-		len = meta & JENTRY_POSMASK;
+		len = JBE_LENFLD(meta);
 		totallen += len;
 
-		if (totallen > JENTRY_POSMASK)
+		copyToBuffer(buffer, jentry_offset, (char *) &meta, sizeof(JEntry));
+		jentry_offset += sizeof(JEntry);
+
+		/*
+		 * Bail out if total variable-length data exceeds what will fit in a
+		 * JEntry length field.  We check this in each iteration, not just
+		 * once at the end, to forestall possible integer overflow.
+		 */
+		if (totallen > JENTRY_LENMASK)
 			ereport(ERROR,
 					(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
-					 errmsg("total size of jsonb array elements exceeds the maximum of %u bytes",
-							JENTRY_POSMASK)));
+					 errmsg("total size of jsonb object elements exceeds the maximum of %u bytes",
+							JENTRY_LENMASK)));
+	}
+	for (i = 0; i < nPairs; i++)
+	{
+		JsonbPair  *pair = &val->val.object.pairs[i];
+		int			len;
+		JEntry		meta;
 
-		if (i > 0)
-			meta = (meta & ~JENTRY_POSMASK) | totallen;
-		copyToBuffer(buffer, metaoffset, (char *) &meta, sizeof(JEntry));
-		metaoffset += sizeof(JEntry);
+		/*
+		 * Convert value, producing a JEntry and appending its variable-length
+		 * data to buffer
+		 */
+		convertJsonbValue(buffer, &meta, &pair->value, level + 1);
 
-		convertJsonbValue(buffer, &meta, &pair->value, level);
-		len = meta & JENTRY_POSMASK;
+		len = JBE_LENFLD(meta);
 		totallen += len;
 
-		if (totallen > JENTRY_POSMASK)
+		copyToBuffer(buffer, jentry_offset, (char *) &meta, sizeof(JEntry));
+		jentry_offset += sizeof(JEntry);
+
+		/*
+		 * Bail out if total variable-length data exceeds what will fit in a
+		 * JEntry length field.  We check this in each iteration, not just
+		 * once at the end, to forestall possible integer overflow.
+		 */
+		if (totallen > JENTRY_LENMASK)
 			ereport(ERROR,
 					(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
-					 errmsg("total size of jsonb array elements exceeds the maximum of %u bytes",
-							JENTRY_POSMASK)));
-
-		meta = (meta & ~JENTRY_POSMASK) | totallen;
-		copyToBuffer(buffer, metaoffset, (char *) &meta, sizeof(JEntry));
-		metaoffset += sizeof(JEntry);
+					 errmsg("total size of jsonb object elements exceeds the maximum of %u bytes",
+							JENTRY_LENMASK)));
 	}
 
-	totallen = buffer->len - offset;
+	/* Total data size is everything we've appended to buffer */
+	totallen = buffer->len - base_offset;
 
+	/* Check length again, since we didn't include the metadata above */
+	if (totallen > JENTRY_LENMASK)
+		ereport(ERROR,
+				(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+				 errmsg("total size of jsonb object elements exceeds the maximum of %u bytes",
+						JENTRY_LENMASK)));
+
+	/* Initialize the header of this node, in the container's JEntry array */
 	*pheader = JENTRY_ISCONTAINER | totallen;
 }
 
diff --git a/src/include/utils/jsonb.h b/src/include/utils/jsonb.h
index 91e3e14..9f02319 100644
--- a/src/include/utils/jsonb.h
+++ b/src/include/utils/jsonb.h
@@ -83,9 +83,9 @@ typedef struct JsonbValue JsonbValue;
  * buffer is accessed, but they can also be deep copied and passed around.
  *
  * Jsonb is a tree structure. Each node in the tree consists of a JEntry
- * header, and a variable-length content.  The JEntry header indicates what
- * kind of a node it is, e.g. a string or an array, and the offset and length
- * of its variable-length portion within the container.
+ * header and a variable-length content (possibly of zero size).  The JEntry
+ * header indicates what kind of a node it is, e.g. a string or an array,
+ * and includes the length of its variable-length portion.
  *
  * The JEntry and the content of a node are not stored physically together.
  * Instead, the container array or object has an array that holds the JEntrys
@@ -95,39 +95,31 @@ typedef struct JsonbValue JsonbValue;
  * hold its JEntry. Hence, no JEntry header is stored for the root node.  It
  * is implicitly known that the root node must be an array or an object,
  * so we can get away without the type indicator as long as we can distinguish
- * the two.  For that purpose, both an array and an object begins with a uint32
+ * the two.  For that purpose, both an array and an object begin with a uint32
  * header field, which contains an JB_FOBJECT or JB_FARRAY flag.  When a naked
  * scalar value needs to be stored as a Jsonb value, what we actually store is
  * an array with one element, with the flags in the array's header field set
  * to JB_FSCALAR | JB_FARRAY.
  *
- * To encode the length and offset of the variable-length portion of each
- * node in a compact way, the JEntry stores only the end offset within the
- * variable-length portion of the container node. For the first JEntry in the
- * container's JEntry array, that equals to the length of the node data.  The
- * begin offset and length of the rest of the entries can be calculated using
- * the end offset of the previous JEntry in the array.
- *
  * Overall, the Jsonb struct requires 4-bytes alignment. Within the struct,
  * the variable-length portion of some node types is aligned to a 4-byte
  * boundary, while others are not. When alignment is needed, the padding is
  * in the beginning of the node that requires it. For example, if a numeric
  * node is stored after a string node, so that the numeric node begins at
  * offset 3, the variable-length portion of the numeric node will begin with
- * one padding byte.
+ * one padding byte so that the actual numeric data is 4-byte aligned.
  */
 
 /*
  * Jentry format.
  *
- * The least significant 28 bits store the end offset of the entry (see
- * JBE_ENDPOS, JBE_OFF, JBE_LEN macros below). The next three bits
- * are used to store the type of the entry. The most significant bit
- * is unused, and should be set to zero.
+ * The least significant 28 bits store the data length of the entry (see
+ * JBE_LENFLD and JBE_LEN macros below). The next three bits store the type
+ * of the entry.
  */
 typedef uint32 JEntry;
 
-#define JENTRY_POSMASK			0x0FFFFFFF
+#define JENTRY_LENMASK			0x0FFFFFFF
 #define JENTRY_TYPEMASK			0x70000000
 
 /* values stored in the type bits */
@@ -138,7 +130,10 @@ typedef uint32 JEntry;
 #define JENTRY_ISNULL			0x40000000
 #define JENTRY_ISCONTAINER		0x50000000		/* array or object */
 
+#define JENTRY_HAS_LEN			0x80000000
+
 /* Note possible multiple evaluations */
+#define JBE_HAS_LEN(je_)		(((je_) & JENTRY_HAS_LEN) != 0)
 #define JBE_ISSTRING(je_)		(((je_) & JENTRY_TYPEMASK) == JENTRY_ISSTRING)
 #define JBE_ISNUMERIC(je_)		(((je_) & JENTRY_TYPEMASK) == JENTRY_ISNUMERIC)
 #define JBE_ISCONTAINER(je_)	(((je_) & JENTRY_TYPEMASK) == JENTRY_ISCONTAINER)
@@ -148,19 +143,25 @@ typedef uint32 JEntry;
 #define JBE_ISBOOL(je_)			(JBE_ISBOOL_TRUE(je_) || JBE_ISBOOL_FALSE(je_))
 
 /*
- * Macros for getting the offset and length of an element. Note multiple
- * evaluations and access to prior array element.
+ * Macros for getting the data length/offset of a JEntry.
  */
-#define JBE_ENDPOS(je_)			((je_) & JENTRY_POSMASK)
-#define JBE_OFF(ja, i)			((i) == 0 ? 0 : JBE_ENDPOS((ja)[i - 1]))
-#define JBE_LEN(ja, i)			((i) == 0 ? JBE_ENDPOS((ja)[i]) \
-								 : JBE_ENDPOS((ja)[i]) - JBE_ENDPOS((ja)[i - 1]))
+#define JBE_LENFLD(je_)			((je_) & JENTRY_LENMASK)
 
 /*
  * A jsonb array or object node, within a Jsonb Datum.
  *
- * An array has one child for each element. An object has two children for
- * each key/value pair.
+ * An array has one child for each element, stored in array order.
+ *
+ * An object has two children for each key/value pair.  The keys all appear
+ * first, in key sort order; then the values appear, in an order matching the
+ * key order.  This arrangement keeps the keys compact in memory, making a
+ * search for a particular key more cache-friendly.
+ *
+ * The JEntry for each child contains the length or offset of the entry.
+ * Some entries store the end offset of the child, while others store the
+ * length. This is a compromise to make the array compressible, but also
+ * provide random access to large arrays without having to add up all the
+ * lengths.
  */
 typedef struct JsonbContainer
 {
@@ -172,11 +173,26 @@ typedef struct JsonbContainer
 } JsonbContainer;
 
 /* flags for the header-field in JsonbContainer */
-#define JB_CMASK				0x0FFFFFFF
-#define JB_FSCALAR				0x10000000
+#define JB_CMASK				0x0FFFFFFF		/* mask for count field */
+#define JB_FSCALAR				0x10000000		/* flag bits */
 #define JB_FOBJECT				0x20000000
 #define JB_FARRAY				0x40000000
 
+/* For every JB_OFFSET_STRIDE children, an offset is stored. */
+#define JB_OFFSET_STRIDE		16
+
+/*
+ * Accessor macros for the fields of a JsonbContainer.
+ *
+ * Beware of multiple evaluation!
+ */
+#define JB_CONTAINER_NCHILDREN(jc) \
+	(((jc)->header & JB_CMASK) * (((jc)->header & JB_FOBJECT) ? 2 : 1))
+
+#define JB_CONTAINER_CHILD_DATA(jc) \
+	((char *) &(jc)->children[JB_CONTAINER_NCHILDREN(jc)])
+
+
 /* The top-level on-disk format for a jsonb datum. */
 typedef struct
 {
@@ -248,18 +264,20 @@ struct JsonbValue
 									 (jsonbval)->type <= jbvBool)
 
 /*
- * Pair within an Object.
+ * Key/value pair within an Object.
+ *
+ * This struct type is only used briefly while constructing a Jsonb; it is
+ * *not* the on-disk representation.
  *
- * Pairs with duplicate keys are de-duplicated.  We store the order for the
- * benefit of doing so in a well-defined way with respect to the original
- * observed order (which is "last observed wins").  This is only used briefly
- * when originally constructing a Jsonb.
+ * Pairs with duplicate keys are de-duplicated.  We store the originally
+ * observed pair ordering for the purpose of removing duplicates in a
+ * well-defined way (which is "last observed wins").
  */
 struct JsonbPair
 {
 	JsonbValue	key;			/* Must be a jbvString */
 	JsonbValue	value;			/* May be of any type */
-	uint32		order;			/* preserves order of pairs with equal keys */
+	uint32		order;			/* Pair's index in original sequence */
 };
 
 /* Conversion state used when parsing Jsonb from text, or for type coercion */
@@ -287,20 +305,25 @@ typedef struct JsonbIterator
 {
 	/* Container being iterated */
 	JsonbContainer *container;
-	uint32		nElems;			/* Number of elements in children array (will be
-								 * nPairs for objects) */
+	uint32		nElems;			/* Number of elements in children array (will
+								 * be nPairs for objects) */
 	bool		isScalar;		/* Pseudo-array scalar value? */
-	JEntry	   *children;
+	JEntry	   *children;		/* JEntrys for child nodes */
+	/* Data proper.  This points to the beginning of the variable-length data */
+	char	   *dataProper;
+
+	/* Current item in buffer (up to nElems) */
+	int			curIndex;
 
-	/* Current item in buffer (up to nElems, but must * 2 for objects) */
-	int			i;
+	/* Data offset corresponding to current item */
+	uint32		curDataOffset;
 
 	/*
-	 * Data proper.  This points just past end of children array.
-	 * We use the JBE_OFF() macro on the Jentrys to find offsets of each
-	 * child in this area.
+	 * If the container is an object, we want to return keys and values
+	 * alternately; so curDataOffset points to the current key, and
+	 * curValueOffset points to the current value.
 	 */
-	char	   *dataProper;
+	uint32		curValueOffset;
 
 	/* Private state */
 	JsonbIterState state;
diff --git a/src/backend/utils/adt/jsonb.c b/src/backend/utils/adt/jsonb.c
index 2fd87fc..456011a 100644
--- a/src/backend/utils/adt/jsonb.c
+++ b/src/backend/utils/adt/jsonb.c
@@ -196,12 +196,12 @@ jsonb_from_cstring(char *json, int len)
 static size_t
 checkStringLen(size_t len)
 {
-	if (len > JENTRY_POSMASK)
+	if (len > JENTRY_LENMASK)
 		ereport(ERROR,
 				(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
 				 errmsg("string too long to represent as jsonb string"),
 				 errdetail("Due to an implementation restriction, jsonb strings cannot exceed %d bytes.",
-						   JENTRY_POSMASK)));
+						   JENTRY_LENMASK)));
 
 	return len;
 }
diff --git a/src/backend/utils/adt/jsonb_util.c b/src/backend/utils/adt/jsonb_util.c
index 04f35bf..7f6223a 100644
--- a/src/backend/utils/adt/jsonb_util.c
+++ b/src/backend/utils/adt/jsonb_util.c
@@ -26,15 +26,16 @@
  * in MaxAllocSize, and the number of elements (or pairs) must fit in the bits
  * reserved for that in the JsonbContainer.header field.
  *
- * (the total size of an array's elements is also limited by JENTRY_POSMASK,
- * but we're not concerned about that here)
+ * (The total size of an array's or object's elements is also limited by
+ * JENTRY_LENMASK, but we're not concerned about that here.)
  */
 #define JSONB_MAX_ELEMS (Min(MaxAllocSize / sizeof(JsonbValue), JB_CMASK))
 #define JSONB_MAX_PAIRS (Min(MaxAllocSize / sizeof(JsonbPair), JB_CMASK))
 
-static void fillJsonbValue(JEntry *array, int index, char *base_addr,
+static void fillJsonbValue(JEntry *children, int index,
+			   char *base_addr, uint32 offset,
 			   JsonbValue *result);
-static bool	equalsJsonbScalarValue(JsonbValue *a, JsonbValue *b);
+static bool equalsJsonbScalarValue(JsonbValue *a, JsonbValue *b);
 static int	compareJsonbScalarValue(JsonbValue *a, JsonbValue *b);
 static Jsonb *convertToJsonb(JsonbValue *val);
 static void convertJsonbValue(StringInfo buffer, JEntry *header, JsonbValue *val, int level);
@@ -42,7 +43,7 @@ static void convertJsonbArray(StringInfo buffer, JEntry *header, JsonbValue *val
 static void convertJsonbObject(StringInfo buffer, JEntry *header, JsonbValue *val, int level);
 static void convertJsonbScalar(StringInfo buffer, JEntry *header, JsonbValue *scalarVal);
 
-static int reserveFromBuffer(StringInfo buffer, int len);
+static int	reserveFromBuffer(StringInfo buffer, int len);
 static void appendToBuffer(StringInfo buffer, const char *data, int len);
 static void copyToBuffer(StringInfo buffer, int offset, const char *data, int len);
 static short padBufferToInt(StringInfo buffer);
@@ -108,6 +109,32 @@ JsonbValueToJsonb(JsonbValue *val)
 }
 
 /*
+ * Get the offset of the variable-length portion of a Jsonb node within
+ * the variable-length-data part of its container.  The node is identified
+ * by index within the container's JEntry array.
+ */
+static uint32
+getJsonbOffset(const JsonbContainer *jc, int index)
+{
+	int			i;
+	int			stride;
+	uint32		offset;
+
+	/* First, look up nearest offset stored in the offset array. */
+	stride = index / JB_OFFSET_STRIDE;
+	if (stride == 0)
+		offset = 0;
+	else
+		offset = JB_CONTAINER_OFFSETS(jc)[stride - 1];
+
+	/* Add up lengths of previous nodes to get the offset. */
+	for (i = stride * JB_OFFSET_STRIDE; i < index; i++)
+		offset += JBE_LEN(jc->children, i);
+
+	return offset;
+}
+
+/*
  * BT comparator worker function.  Returns an integer less than, equal to, or
  * greater than zero, indicating whether a is less than, equal to, or greater
  * than b.  Consistent with the requirements for a B-Tree operator class
@@ -201,7 +228,7 @@ compareJsonbContainers(JsonbContainer *a, JsonbContainer *b)
 			 *
 			 * If the two values were of the same container type, then there'd
 			 * have been a chance to observe the variation in the number of
-			 * elements/pairs (when processing WJB_BEGIN_OBJECT, say).  They're
+			 * elements/pairs (when processing WJB_BEGIN_OBJECT, say). They're
 			 * either two heterogeneously-typed containers, or a container and
 			 * some scalar type.
 			 *
@@ -271,62 +298,69 @@ findJsonbValueFromContainer(JsonbContainer *container, uint32 flags,
 							JsonbValue *key)
 {
 	JEntry	   *children = container->children;
+	char	   *base_addr = JB_CONTAINER_CHILD_DATA(container);
 	int			count = (container->header & JB_CMASK);
-	JsonbValue *result = palloc(sizeof(JsonbValue));
+	JsonbValue *result;
 
 	Assert((flags & ~(JB_FARRAY | JB_FOBJECT)) == 0);
 
+	/* Quick out without a palloc cycle if object/array is empty */
+	if (count <= 0)
+		return NULL;
+
+	result = palloc(sizeof(JsonbValue));
+
 	if (flags & JB_FARRAY & container->header)
 	{
-		char	   *base_addr = (char *) (children + count);
+		uint32		offset = 0;
 		int			i;
 
 		for (i = 0; i < count; i++)
 		{
-			fillJsonbValue(children, i, base_addr, result);
+			fillJsonbValue(children, i, base_addr, offset, result);
 
 			if (key->type == result->type)
 			{
 				if (equalsJsonbScalarValue(key, result))
 					return result;
 			}
+
+			offset += JBE_LEN(children, i);
 		}
 	}
 	else if (flags & JB_FOBJECT & container->header)
 	{
-		/* Since this is an object, account for *Pairs* of Jentrys */
-		char	   *base_addr = (char *) (children + count * 2);
+		uint32		lastoff;
 		uint32		stopLow = 0,
-					stopMiddle;
+					stopHigh = count;
 
-		/* Object key past by caller must be a string */
+		/* Object key passed by caller must be a string */
 		Assert(key->type == jbvString);
 
 		/* Binary search on object/pair keys *only* */
-		while (stopLow < count)
+		while (stopLow < stopHigh)
 		{
-			int			index;
+			uint32		stopMiddle;
 			int			difference;
 			JsonbValue	candidate;
 
-			/*
-			 * Note how we compensate for the fact that we're iterating
-			 * through pairs (not entries) throughout.
-			 */
-			stopMiddle = stopLow + (count - stopLow) / 2;
-
-			index = stopMiddle * 2;
+			stopMiddle = stopLow + (stopHigh - stopLow) / 2;
 
 			candidate.type = jbvString;
-			candidate.val.string.val = base_addr + JBE_OFF(children, index);
-			candidate.val.string.len = JBE_LEN(children, index);
+			candidate.val.string.val =
+				base_addr + getJsonbOffset(container, stopMiddle);
+			candidate.val.string.len = JBE_LEN(children, stopMiddle);
 
 			difference = lengthCompareJsonbStringValue(&candidate, key);
 
 			if (difference == 0)
 			{
-				/* Found our key, return value */
-				fillJsonbValue(children, index + 1, base_addr, result);
+				/* Found our key, return corresponding value */
+				int			index = stopMiddle + count;
+
+				lastoff = getJsonbOffset(container, index);
+
+				fillJsonbValue(children, index, base_addr, lastoff, result);
 
 				return result;
 			}
@@ -335,7 +369,7 @@ findJsonbValueFromContainer(JsonbContainer *container, uint32 flags,
 				if (difference < 0)
 					stopLow = stopMiddle + 1;
 				else
-					count = stopMiddle;
+					stopHigh = stopMiddle;
 			}
 		}
 	}
@@ -361,14 +395,16 @@ getIthJsonbValueFromContainer(JsonbContainer *container, uint32 i)
 		elog(ERROR, "not a jsonb array");
 
 	nelements = container->header & JB_CMASK;
-	base_addr = (char *) &container->children[nelements];
+	base_addr = JB_CONTAINER_CHILD_DATA(container);
 
 	if (i >= nelements)
 		return NULL;
 
 	result = palloc(sizeof(JsonbValue));
 
-	fillJsonbValue(container->children, i, base_addr, result);
+	fillJsonbValue(container->children, i, base_addr,
+				   getJsonbOffset(container, i),
+				   result);
 
 	return result;
 }
@@ -377,11 +413,17 @@ getIthJsonbValueFromContainer(JsonbContainer *container, uint32 i)
  * A helper function to fill in a JsonbValue to represent an element of an
  * array, or a key or value of an object.
  *
+ * The node's JEntry is at children[index], and its variable-length data
+ * is at base_addr + offset.  We make the caller determine the offset since
+ * in many cases the caller can amortize the work across multiple children.
+ *
  * A nested array or object will be returned as jbvBinary, ie. it won't be
  * expanded.
  */
 static void
-fillJsonbValue(JEntry *children, int index, char *base_addr, JsonbValue *result)
+fillJsonbValue(JEntry *children, int index,
+			   char *base_addr, uint32 offset,
+			   JsonbValue *result)
 {
 	JEntry		entry = children[index];
 
@@ -392,14 +434,14 @@ fillJsonbValue(JEntry *children, int index, char *base_addr, JsonbValue *result)
 	else if (JBE_ISSTRING(entry))
 	{
 		result->type = jbvString;
-		result->val.string.val = base_addr + JBE_OFF(children, index);
-		result->val.string.len = JBE_LEN(children, index);
+		result->val.string.val = base_addr + offset;
+		result->val.string.len = JBE_LENFLD(entry);
 		Assert(result->val.string.len >= 0);
 	}
 	else if (JBE_ISNUMERIC(entry))
 	{
 		result->type = jbvNumeric;
-		result->val.numeric = (Numeric) (base_addr + INTALIGN(JBE_OFF(children, index)));
+		result->val.numeric = (Numeric) (base_addr + INTALIGN(offset));
 	}
 	else if (JBE_ISBOOL_TRUE(entry))
 	{
@@ -415,8 +457,9 @@ fillJsonbValue(JEntry *children, int index, char *base_addr, JsonbValue *result)
 	{
 		Assert(JBE_ISCONTAINER(entry));
 		result->type = jbvBinary;
-		result->val.binary.data = (JsonbContainer *) (base_addr + INTALIGN(JBE_OFF(children, index)));
-		result->val.binary.len = JBE_LEN(children, index) - (INTALIGN(JBE_OFF(children, index)) - JBE_OFF(children, index));
+		/* Remove alignment padding from data pointer and len */
+		result->val.binary.data = (JsonbContainer *) (base_addr + INTALIGN(offset));
+		result->val.binary.len = JBE_LENFLD(entry) - (INTALIGN(offset) - offset);
 	}
 }
 
@@ -668,13 +711,15 @@ recurse:
 			 * a full conversion
 			 */
 			val->val.array.rawScalar = (*it)->isScalar;
-			(*it)->i = 0;
+			(*it)->curIndex = 0;
+			(*it)->curDataOffset = 0;
+			(*it)->curValueOffset = 0;	/* not actually used */
 			/* Set state for next call */
 			(*it)->state = JBI_ARRAY_ELEM;
 			return WJB_BEGIN_ARRAY;
 
 		case JBI_ARRAY_ELEM:
-			if ((*it)->i >= (*it)->nElems)
+			if ((*it)->curIndex >= (*it)->nElems)
 			{
 				/*
 				 * All elements within array already processed.  Report this
@@ -686,7 +731,12 @@ recurse:
 				return WJB_END_ARRAY;
 			}
 
-			fillJsonbValue((*it)->children, (*it)->i++, (*it)->dataProper, val);
+			fillJsonbValue((*it)->children, (*it)->curIndex,
+						   (*it)->dataProper, (*it)->curDataOffset,
+						   val);
+
+			(*it)->curDataOffset += JBE_LEN((*it)->children, (*it)->curIndex);
+			(*it)->curIndex++;
 
 			if (!IsAJsonbScalar(val) && !skipNested)
 			{
@@ -697,8 +747,8 @@ recurse:
 			else
 			{
 				/*
-				 * Scalar item in array, or a container and caller didn't
-				 * want us to recurse into it.
+				 * Scalar item in array, or a container and caller didn't want
+				 * us to recurse into it.
 				 */
 				return WJB_ELEM;
 			}
@@ -712,13 +762,15 @@ recurse:
 			 * v->val.object.pairs is not actually set, because we aren't
 			 * doing a full conversion
 			 */
-			(*it)->i = 0;
+			(*it)->curIndex = 0;
+			(*it)->curDataOffset = 0;
+			(*it)->curValueOffset = getJsonbOffset((*it)->container, (*it)->nElems);
 			/* Set state for next call */
 			(*it)->state = JBI_OBJECT_KEY;
 			return WJB_BEGIN_OBJECT;
 
 		case JBI_OBJECT_KEY:
-			if ((*it)->i >= (*it)->nElems)
+			if ((*it)->curIndex >= (*it)->nElems)
 			{
 				/*
 				 * All pairs within object already processed.  Report this to
@@ -732,7 +784,9 @@ recurse:
 			else
 			{
 				/* Return key of a key/value pair.  */
-				fillJsonbValue((*it)->children, (*it)->i * 2, (*it)->dataProper, val);
+				fillJsonbValue((*it)->children, (*it)->curIndex,
+							   (*it)->dataProper, (*it)->curDataOffset,
+							   val);
 				if (val->type != jbvString)
 					elog(ERROR, "unexpected jsonb type as object key");
 
@@ -745,8 +799,15 @@ recurse:
 			/* Set state for next call */
 			(*it)->state = JBI_OBJECT_KEY;
 
-			fillJsonbValue((*it)->children, ((*it)->i++) * 2 + 1,
-						   (*it)->dataProper, val);
+			fillJsonbValue((*it)->children, (*it)->curIndex + (*it)->nElems,
+						   (*it)->dataProper, (*it)->curValueOffset,
+						   val);
+
+			(*it)->curDataOffset += JBE_LEN((*it)->children,
+											(*it)->curIndex);
+			(*it)->curValueOffset += JBE_LEN((*it)->children,
+											 (*it)->curIndex + (*it)->nElems);
+			(*it)->curIndex++;
 
 			/*
 			 * Value may be a container, in which case we recurse with new,
@@ -781,12 +842,11 @@ iteratorFromContainer(JsonbContainer *container, JsonbIterator *parent)
 
 	/* Array starts just after header */
 	it->children = container->children;
+	it->dataProper = JB_CONTAINER_CHILD_DATA(it->container);
 
 	switch (container->header & (JB_FARRAY | JB_FOBJECT))
 	{
 		case JB_FARRAY:
-			it->dataProper =
-				(char *) it->children + it->nElems * sizeof(JEntry);
 			it->isScalar = (container->header & JB_FSCALAR) != 0;
 			/* This is either a "raw scalar", or an array */
 			Assert(!it->isScalar || it->nElems == 1);
@@ -795,13 +855,6 @@ iteratorFromContainer(JsonbContainer *container, JsonbIterator *parent)
 			break;
 
 		case JB_FOBJECT:
-
-			/*
-			 * Offset reflects that nElems indicates JsonbPairs in an object.
-			 * Each key and each value contain Jentry metadata just the same.
-			 */
-			it->dataProper =
-				(char *) it->children + it->nElems * sizeof(JEntry) * 2;
 			it->state = JBI_OBJECT_START;
 			break;
 
@@ -1209,8 +1262,8 @@ reserveFromBuffer(StringInfo buffer, int len)
 	buffer->len += len;
 
 	/*
-	 * Keep a trailing null in place, even though it's not useful for us;
-	 * it seems best to preserve the invariants of StringInfos.
+	 * Keep a trailing null in place, even though it's not useful for us; it
+	 * seems best to preserve the invariants of StringInfos.
 	 */
 	buffer->data[buffer->len] = '\0';
 
@@ -1284,8 +1337,8 @@ convertToJsonb(JsonbValue *val)
 
 	/*
 	 * Note: the JEntry of the root is discarded. Therefore the root
-	 * JsonbContainer struct must contain enough information to tell what
-	 * kind of value it is.
+	 * JsonbContainer struct must contain enough information to tell what kind
+	 * of value it is.
 	 */
 
 	res = (Jsonb *) buffer.data;
@@ -1315,10 +1368,10 @@ convertJsonbValue(StringInfo buffer, JEntry *header, JsonbValue *val, int level)
 		return;
 
 	/*
-	 * A JsonbValue passed as val should never have a type of jbvBinary,
-	 * and neither should any of its sub-components. Those values will be
-	 * produced by convertJsonbArray and convertJsonbObject, the results of
-	 * which will not be passed back to this function as an argument.
+	 * A JsonbValue passed as val should never have a type of jbvBinary, and
+	 * neither should any of its sub-components. Those values will be produced
+	 * by convertJsonbArray and convertJsonbObject, the results of which will
+	 * not be passed back to this function as an argument.
 	 */
 
 	if (IsAJsonbScalar(val))
@@ -1334,57 +1387,91 @@ convertJsonbValue(StringInfo buffer, JEntry *header, JsonbValue *val, int level)
 static void
 convertJsonbArray(StringInfo buffer, JEntry *pheader, JsonbValue *val, int level)
 {
-	int			offset;
-	int			metaoffset;
+	int			base_offset;
+	int			jentry_offset;
+	int			offsets_offset;
 	int			i;
 	int			totallen;
 	uint32		header;
+	int			nElems = val->val.array.nElems;
 
-	/* Initialize pointer into conversion buffer at this level */
-	offset = buffer->len;
+	/* Remember where in the buffer this array starts. */
+	base_offset = buffer->len;
 
+	/* Align to 4-byte boundary (any padding counts as part of my data) */
 	padBufferToInt(buffer);
 
 	/*
-	 * Construct the header Jentry, stored in the beginning of the variable-
-	 * length payload.
+	 * Construct the header Jentry and store it in the beginning of the
+	 * variable-length payload.
 	 */
-	header = val->val.array.nElems | JB_FARRAY;
+	header = nElems | JB_FARRAY;
 	if (val->val.array.rawScalar)
 	{
-		Assert(val->val.array.nElems == 1);
+		Assert(nElems == 1);
 		Assert(level == 0);
 		header |= JB_FSCALAR;
 	}
 
 	appendToBuffer(buffer, (char *) &header, sizeof(uint32));
-	/* reserve space for the JEntries of the elements. */
-	metaoffset = reserveFromBuffer(buffer, sizeof(JEntry) * val->val.array.nElems);
+
+	/* Reserve space for the JEntries of the elements. */
+	jentry_offset = reserveFromBuffer(buffer, sizeof(JEntry) * nElems);
+
+	/* Reserve space for the offsets array */
+	offsets_offset = reserveFromBuffer(buffer,
+						   sizeof(uint32) * JB_CONTAINER_NUM_OFFSETS(nElems));
 
 	totallen = 0;
-	for (i = 0; i < val->val.array.nElems; i++)
+	for (i = 0; i < nElems; i++)
 	{
 		JsonbValue *elem = &val->val.array.elems[i];
 		int			len;
 		JEntry		meta;
 
+		/*
+		 * Every JB_OFFSET_STRIDE elements, store the current offset (from
+		 * the beginning of the variable-length portion) in the offsets array.
+		 */
+		if (i % JB_OFFSET_STRIDE == 0 && i > 0)
+		{
+			uint32 off = (uint32) totallen;
+			copyToBuffer(buffer, offsets_offset, (char *) &off, sizeof(uint32));
+			offsets_offset += sizeof(uint32);
+		}
+
+		/*
+		 * Convert element, producing a JEntry and appending its
+		 * variable-length data to buffer
+		 */
 		convertJsonbValue(buffer, &meta, elem, level + 1);
-		len = meta & JENTRY_POSMASK;
-		totallen += len;
 
-		if (totallen > JENTRY_POSMASK)
+		/*
+		 * Bail out if total variable-length data exceeds what will fit in a
+		 * JEntry length field.  We check this in each iteration, not just
+		 * once at the end, to forestall possible integer overflow.
+		 */
+		len = JBE_LENFLD(meta);
+		totallen += len;
+		if (totallen > JENTRY_LENMASK)
 			ereport(ERROR,
 					(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
 					 errmsg("total size of jsonb array elements exceeds the maximum of %u bytes",
-							JENTRY_POSMASK)));
+							JENTRY_LENMASK)));
 
-		if (i > 0)
-			meta = (meta & ~JENTRY_POSMASK) | totallen;
-		copyToBuffer(buffer, metaoffset, (char *) &meta, sizeof(JEntry));
-		metaoffset += sizeof(JEntry);
+		copyToBuffer(buffer, jentry_offset, (char *) &meta, sizeof(JEntry));
+		jentry_offset += sizeof(JEntry);
 	}
 
-	totallen = buffer->len - offset;
+	/* Total data size is everything we've appended to buffer */
+	totallen = buffer->len - base_offset;
+
+	/* Check length again, since we didn't include the metadata above */
+	if (totallen > JENTRY_LENMASK)
+		ereport(ERROR,
+				(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+				 errmsg("total size of jsonb array elements exceeds the maximum of %u bytes",
+						JENTRY_LENMASK)));
 
 	/* Initialize the header of this node, in the container's JEntry array */
 	*pheader = JENTRY_ISCONTAINER | totallen;
@@ -1393,65 +1480,131 @@ convertJsonbArray(StringInfo buffer, JEntry *pheader, JsonbValue *val, int level
 static void
 convertJsonbObject(StringInfo buffer, JEntry *pheader, JsonbValue *val, int level)
 {
-	uint32		header;
-	int			offset;
-	int			metaoffset;
+	int			base_offset;
+	int			jentry_offset;
+	int			offsets_offset;
 	int			i;
 	int			totallen;
+	uint32		header;
+	int			nPairs = val->val.object.nPairs;
 
-	/* Initialize pointer into conversion buffer at this level */
-	offset = buffer->len;
+	/* Remember where in the buffer this object starts. */
+	base_offset = buffer->len;
 
+	/* Align to 4-byte boundary (any padding counts as part of my data) */
 	padBufferToInt(buffer);
 
-	/* Initialize header */
-	header = val->val.object.nPairs | JB_FOBJECT;
+	/*
+	 * Construct the header Jentry and store it in the beginning of the
+	 * variable-length payload.
+	 */
+	header = nPairs | JB_FOBJECT;
 	appendToBuffer(buffer, (char *) &header, sizeof(uint32));
 
-	/* reserve space for the JEntries of the keys and values */
-	metaoffset = reserveFromBuffer(buffer, sizeof(JEntry) * val->val.object.nPairs * 2);
+	/* Reserve space for the JEntries of the keys and values. */
+	jentry_offset = reserveFromBuffer(buffer, sizeof(JEntry) * nPairs * 2);
+
+	/* Reserve space for the offsets array */
+	offsets_offset = reserveFromBuffer(buffer,
+					   sizeof(uint32) * JB_CONTAINER_NUM_OFFSETS(nPairs * 2));
 
+	/*
+	 * Iterate over the keys, then over the values, since that is the ordering
+	 * we want in the on-disk representation.
+	 */
 	totallen = 0;
-	for (i = 0; i < val->val.object.nPairs; i++)
+	for (i = 0; i < nPairs; i++)
 	{
-		JsonbPair *pair = &val->val.object.pairs[i];
-		int len;
-		JEntry meta;
+		JsonbPair  *pair = &val->val.object.pairs[i];
+		int			len;
+		JEntry		meta;
 
-		/* put key */
+		/*
+		 * Every JB_OFFSET_STRIDE elements, store the current offset (from
+		 * the beginning of the variable-length portion) in the offsets array.
+		 */
+		if (i % JB_OFFSET_STRIDE == 0 && i > 0)
+		{
+			uint32 off = (uint32) totallen;
+			copyToBuffer(buffer, offsets_offset, (char *) &off, sizeof(uint32));
+			offsets_offset += sizeof(uint32);
+		}
+
+		/*
+		 * Convert key, producing a JEntry and appending its variable-length
+		 * data to buffer
+		 */
 		convertJsonbScalar(buffer, &meta, &pair->key);
 
-		len = meta & JENTRY_POSMASK;
+		len = JBE_LENFLD(meta);
 		totallen += len;
 
-		if (totallen > JENTRY_POSMASK)
+		copyToBuffer(buffer, jentry_offset, (char *) &meta, sizeof(JEntry));
+		jentry_offset += sizeof(JEntry);
+
+		/*
+		 * Bail out if total variable-length data exceeds what will fit in a
+		 * JEntry length field.  We check this in each iteration, not just
+		 * once at the end, to forestall possible integer overflow.
+		 */
+		if (totallen > JENTRY_LENMASK)
 			ereport(ERROR,
 					(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
-					 errmsg("total size of jsonb array elements exceeds the maximum of %u bytes",
-							JENTRY_POSMASK)));
+					 errmsg("total size of jsonb object elements exceeds the maximum of %u bytes",
+							JENTRY_LENMASK)));
+	}
+	for (i = 0; i < nPairs; i++)
+	{
+		JsonbPair  *pair = &val->val.object.pairs[i];
+		int			len;
+		JEntry		meta;
+
+		/*
+		 * Every JB_OFFSET_STRIDE elements, store the current offset (from
+		 * the beginning of the variable-length portion) in the offsets array.
+		 */
+		if ((nPairs + i) % JB_OFFSET_STRIDE == 0)
+		{
+			uint32 off = (uint32) totallen;
+			copyToBuffer(buffer, offsets_offset, (char *) &off, sizeof(uint32));
+			offsets_offset += sizeof(uint32);
+		}
 
-		if (i > 0)
-			meta = (meta & ~JENTRY_POSMASK) | totallen;
-		copyToBuffer(buffer, metaoffset, (char *) &meta, sizeof(JEntry));
-		metaoffset += sizeof(JEntry);
+		/*
+		 * Convert value, producing a JEntry and appending its variable-length
+		 * data to buffer
+		 */
+		convertJsonbValue(buffer, &meta, &pair->value, level + 1);
 
-		convertJsonbValue(buffer, &meta, &pair->value, level);
-		len = meta & JENTRY_POSMASK;
+		len = JBE_LENFLD(meta);
 		totallen += len;
 
-		if (totallen > JENTRY_POSMASK)
+		copyToBuffer(buffer, jentry_offset, (char *) &meta, sizeof(JEntry));
+		jentry_offset += sizeof(JEntry);
+
+		/*
+		 * Bail out if total variable-length data exceeds what will fit in a
+		 * JEntry length field.  We check this in each iteration, not just
+		 * once at the end, to forestall possible integer overflow.
+		 */
+		if (totallen > JENTRY_LENMASK)
 			ereport(ERROR,
 					(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
-					 errmsg("total size of jsonb array elements exceeds the maximum of %u bytes",
-							JENTRY_POSMASK)));
-
-		meta = (meta & ~JENTRY_POSMASK) | totallen;
-		copyToBuffer(buffer, metaoffset, (char *) &meta, sizeof(JEntry));
-		metaoffset += sizeof(JEntry);
+					 errmsg("total size of jsonb object elements exceeds the maximum of %u bytes",
+							JENTRY_LENMASK)));
 	}
 
-	totallen = buffer->len - offset;
+	/* Total data size is everything we've appended to buffer */
+	totallen = buffer->len - base_offset;
 
+	/* Check length again, since we didn't include the metadata above */
+	if (totallen > JENTRY_LENMASK)
+		ereport(ERROR,
+				(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+				 errmsg("total size of jsonb object elements exceeds the maximum of %u bytes",
+						JENTRY_LENMASK)));
+
+	/* Initialize the header of this node, in the container's JEntry array */
 	*pheader = JENTRY_ISCONTAINER | totallen;
 }
 
diff --git a/src/include/utils/jsonb.h b/src/include/utils/jsonb.h
index 91e3e14..7c3c93c 100644
--- a/src/include/utils/jsonb.h
+++ b/src/include/utils/jsonb.h
@@ -83,9 +83,9 @@ typedef struct JsonbValue JsonbValue;
  * buffer is accessed, but they can also be deep copied and passed around.
  *
  * Jsonb is a tree structure. Each node in the tree consists of a JEntry
- * header, and a variable-length content.  The JEntry header indicates what
- * kind of a node it is, e.g. a string or an array, and the offset and length
- * of its variable-length portion within the container.
+ * header and a variable-length content (possibly of zero size).  The JEntry
+ * header indicates what kind of a node it is, e.g. a string or an array,
+ * and includes the length of its variable-length portion.
  *
  * The JEntry and the content of a node are not stored physically together.
  * Instead, the container array or object has an array that holds the JEntrys
@@ -95,39 +95,32 @@ typedef struct JsonbValue JsonbValue;
  * hold its JEntry. Hence, no JEntry header is stored for the root node.  It
  * is implicitly known that the root node must be an array or an object,
  * so we can get away without the type indicator as long as we can distinguish
- * the two.  For that purpose, both an array and an object begins with a uint32
+ * the two.  For that purpose, both an array and an object begin with a uint32
  * header field, which contains an JB_FOBJECT or JB_FARRAY flag.  When a naked
  * scalar value needs to be stored as a Jsonb value, what we actually store is
  * an array with one element, with the flags in the array's header field set
  * to JB_FSCALAR | JB_FARRAY.
  *
- * To encode the length and offset of the variable-length portion of each
- * node in a compact way, the JEntry stores only the end offset within the
- * variable-length portion of the container node. For the first JEntry in the
- * container's JEntry array, that equals to the length of the node data.  The
- * begin offset and length of the rest of the entries can be calculated using
- * the end offset of the previous JEntry in the array.
- *
  * Overall, the Jsonb struct requires 4-bytes alignment. Within the struct,
  * the variable-length portion of some node types is aligned to a 4-byte
  * boundary, while others are not. When alignment is needed, the padding is
  * in the beginning of the node that requires it. For example, if a numeric
  * node is stored after a string node, so that the numeric node begins at
  * offset 3, the variable-length portion of the numeric node will begin with
- * one padding byte.
+ * one padding byte so that the actual numeric data is 4-byte aligned.
  */
 
 /*
  * Jentry format.
  *
- * The least significant 28 bits store the end offset of the entry (see
- * JBE_ENDPOS, JBE_OFF, JBE_LEN macros below). The next three bits
- * are used to store the type of the entry. The most significant bit
- * is unused, and should be set to zero.
+ * The least significant 28 bits store the data length of the entry (see
+ * JBE_LENFLD and JBE_LEN macros below). The next three bits store the type
+ * of the entry. The most significant bit is reserved for future use, and
+ * should be set to zero.
  */
 typedef uint32 JEntry;
 
-#define JENTRY_POSMASK			0x0FFFFFFF
+#define JENTRY_LENMASK			0x0FFFFFFF
 #define JENTRY_TYPEMASK			0x70000000
 
 /* values stored in the type bits */
@@ -148,19 +141,30 @@ typedef uint32 JEntry;
 #define JBE_ISBOOL(je_)			(JBE_ISBOOL_TRUE(je_) || JBE_ISBOOL_FALSE(je_))
 
 /*
- * Macros for getting the offset and length of an element. Note multiple
- * evaluations and access to prior array element.
+ * Macros for getting the data length of a JEntry.
  */
-#define JBE_ENDPOS(je_)			((je_) & JENTRY_POSMASK)
-#define JBE_OFF(ja, i)			((i) == 0 ? 0 : JBE_ENDPOS((ja)[i - 1]))
-#define JBE_LEN(ja, i)			((i) == 0 ? JBE_ENDPOS((ja)[i]) \
-								 : JBE_ENDPOS((ja)[i]) - JBE_ENDPOS((ja)[i - 1]))
+#define JBE_LENFLD(je_)			((je_) & JENTRY_LENMASK)
+#define JBE_LEN(ja, i)			JBE_LENFLD((ja)[i])
 
 /*
  * A jsonb array or object node, within a Jsonb Datum.
  *
- * An array has one child for each element. An object has two children for
- * each key/value pair.
+ * An array has one child for each element, stored in array order.
+ *
+ * An object has two children for each key/value pair.  The keys all appear
+ * first, in key sort order; then the values appear, in an order matching the
+ * key order.  This arrangement keeps the keys compact in memory, making a
+ * search for a particular key more cache-friendly.
+ *
+ * The JEntry for each child contains the length of the entry. The offset of
+ * a child's variable-length data, can be found by adding up the lengths of
+ * all preceding children.
+ *
+ * To speed up random access of a large array or object, a redundant array of
+ * offsets is stored after the 'children' array. It contains one element for
+ * every JB_OFFSET_STRIDE children. The offset of the first child is always 0,
+ * which is not included in the offset array. An array or object with less
+ * than JB_OFFSET_STRIDE children has no offsets array.
  */
 typedef struct JsonbContainer
 {
@@ -168,15 +172,37 @@ typedef struct JsonbContainer
 								 * flags */
 	JEntry		children[1];	/* variable length */
 
+	/* an uint32 array of offsets follows. */
+
 	/* the data for each child node follows. */
 } JsonbContainer;
 
 /* flags for the header-field in JsonbContainer */
-#define JB_CMASK				0x0FFFFFFF
-#define JB_FSCALAR				0x10000000
+#define JB_CMASK				0x0FFFFFFF		/* mask for count field */
+#define JB_FSCALAR				0x10000000		/* flag bits */
 #define JB_FOBJECT				0x20000000
 #define JB_FARRAY				0x40000000
 
+/* For every JB_OFFSET_STRIDE children, an offset is stored. */
+#define JB_OFFSET_STRIDE		32
+
+/*
+ * Accessor macros for the fields of a JsonbContainer.
+ *
+ * Beware of multiple evaluation!
+ */
+#define JB_CONTAINER_NCHILDREN(jc) \
+	(((jc)->header & JB_CMASK) * (((jc)->header & JB_FOBJECT) ? 2 : 1))
+
+/* # of offsets stored in the offsets array, for given number of children */
+#define JB_CONTAINER_NUM_OFFSETS(nchildren) \
+	(((nchildren) <= JB_OFFSET_STRIDE) ? 0 : (((nchildren) - 1) / JB_OFFSET_STRIDE))
+
+#define JB_CONTAINER_OFFSETS(jc) \
+	((uint32 *) &(jc)->children[JB_CONTAINER_NCHILDREN(jc)])
+#define JB_CONTAINER_CHILD_DATA(jc) \
+	((char *) &JB_CONTAINER_OFFSETS(jc)[JB_CONTAINER_NUM_OFFSETS(JB_CONTAINER_NCHILDREN(jc))])
+
 /* The top-level on-disk format for a jsonb datum. */
 typedef struct
 {
@@ -248,18 +274,20 @@ struct JsonbValue
 									 (jsonbval)->type <= jbvBool)
 
 /*
- * Pair within an Object.
+ * Key/value pair within an Object.
  *
- * Pairs with duplicate keys are de-duplicated.  We store the order for the
- * benefit of doing so in a well-defined way with respect to the original
- * observed order (which is "last observed wins").  This is only used briefly
- * when originally constructing a Jsonb.
+ * This struct type is only used briefly while constructing a Jsonb; it is
+ * *not* the on-disk representation.
+ *
+ * Pairs with duplicate keys are de-duplicated.  We store the originally
+ * observed pair ordering for the purpose of removing duplicates in a
+ * well-defined way (which is "last observed wins").
  */
 struct JsonbPair
 {
 	JsonbValue	key;			/* Must be a jbvString */
 	JsonbValue	value;			/* May be of any type */
-	uint32		order;			/* preserves order of pairs with equal keys */
+	uint32		order;			/* Pair's index in original sequence */
 };
 
 /* Conversion state used when parsing Jsonb from text, or for type coercion */
@@ -287,20 +315,25 @@ typedef struct JsonbIterator
 {
 	/* Container being iterated */
 	JsonbContainer *container;
-	uint32		nElems;			/* Number of elements in children array (will be
-								 * nPairs for objects) */
+	uint32		nElems;			/* Number of elements in children array (will
+								 * be nPairs for objects) */
 	bool		isScalar;		/* Pseudo-array scalar value? */
-	JEntry	   *children;
+	JEntry	   *children;		/* JEntrys for child nodes */
+	/* Data proper.  This points to the beginning of the variable-length data */
+	char	   *dataProper;
 
-	/* Current item in buffer (up to nElems, but must * 2 for objects) */
-	int			i;
+	/* Current item in buffer (up to nElems) */
+	int			curIndex;
+
+	/* Data offset corresponding to current item */
+	uint32		curDataOffset;
 
 	/*
-	 * Data proper.  This points just past end of children array.
-	 * We use the JBE_OFF() macro on the Jentrys to find offsets of each
-	 * child in this area.
+	 * If the container is an object, we want to return keys and values
+	 * alternately; so curDataOffset points to the current key, and
+	 * curValueOffset points to the current value.
 	 */
-	char	   *dataProper;
+	uint32		curValueOffset;
 
 	/* Private state */
 	JsonbIterState state;
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to