Jackie-Jiang commented on code in PR #11739:
URL: https://github.com/apache/pinot/pull/11739#discussion_r1346502805
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/readers/json/ImmutableJsonIndexReader.java:
##########
@@ -76,6 +77,63 @@ public ImmutableJsonIndexReader(PinotDataBuffer dataBuffer,
int numDocs) {
_docIdMapping = dataBuffer.view(invertedIndexEndOffset,
docIdMappingEndOffset, ByteOrder.LITTLE_ENDIAN);
}
+ /**
+ * Accepts a JSON key and array of docIds used to filter the response
+ * return a String[] where String[i] gives the value of $.key for document i
+ */
+ @Override
+ public String[] getValuesForKeyAndDocs(String key, int[] docIds) {
+ ImmutableRoaringBitmap docIdMask = ImmutableRoaringBitmap.bitmapOf(docIds);
+ int[] dictIds = getDictIdsForKey(key);
+ String[] values = new String[(int) _numDocs];
Review Comment:
`_numDocs`?
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/readers/json/ImmutableJsonIndexReader.java:
##########
@@ -76,6 +77,63 @@ public ImmutableJsonIndexReader(PinotDataBuffer dataBuffer,
int numDocs) {
_docIdMapping = dataBuffer.view(invertedIndexEndOffset,
docIdMappingEndOffset, ByteOrder.LITTLE_ENDIAN);
}
+ /**
+ * Accepts a JSON key and array of docIds used to filter the response
+ * return a String[] where String[i] gives the value of $.key for document i
+ */
+ @Override
+ public String[] getValuesForKeyAndDocs(String key, int[] docIds) {
+ ImmutableRoaringBitmap docIdMask = ImmutableRoaringBitmap.bitmapOf(docIds);
+ int[] dictIds = getDictIdsForKey(key);
+ String[] values = new String[(int) _numDocs];
+ for (int dictId = dictIds[0]; dictId < dictIds[1]; dictId++) {
+ // get docIds from posting list, convert these to the actual docIds
+ ImmutableRoaringBitmap flattenedDocIds = _invertedIndex.getDocIds(dictId);
+ PeekableIntIterator it = flattenedDocIds.getIntIterator();
+ MutableRoaringBitmap postingList = new MutableRoaringBitmap();
+ while (it.hasNext()) {
+ postingList.add(getDocId(it.next()));
+ }
+ // if posting list does not contain relevant docIds, skip the dictionary lookup
+ postingList.and(docIdMask);
+ if (postingList.isEmpty()) {
+ continue;
+ }
+
+ // dictionary value lookup, stripping the path prefix
+ String val = _dictionary.getStringValue(dictId).substring(key.length() + 1);
+
+ // add value to padded array
+ for (int docId : postingList) {
+ values[docId] = val;
Review Comment:
This won't work if the same doc contains multiple flattened docs with the
same key but different values: they will overwrite each other, and only the
last value written survives. We need to document this limitation.
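
To illustrate the concern, here is a minimal, self-contained sketch. It does not use the Pinot classes: `postingLists()` is a hypothetical stand-in for the per-value posting lists after flattened docIds have been mapped back to real docIds, and all names are made up for the example. It contrasts the single-slot assignment from the diff with one possible alternative that keeps every value per doc.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a simplified model of "value -> docIds", not the Pinot JSON index.
class MultiValueOverwriteSketch {

  // Hypothetical posting lists for one key. Doc 1 carries both "x" and "y",
  // e.g. because the key sits inside a JSON array that was flattened.
  static Map<String, int[]> postingLists() {
    Map<String, int[]> postings = new LinkedHashMap<>();
    postings.put("x", new int[]{0, 1});
    postings.put("y", new int[]{1});
    return postings;
  }

  // Mirrors the shape of `values[docId] = val` in the diff: one slot per doc,
  // so a later value silently replaces an earlier one.
  static String[] singleValuePerDoc(Map<String, int[]> postings, int numDocs) {
    String[] values = new String[numDocs];
    for (Map.Entry<String, int[]> entry : postings.entrySet()) {
      for (int docId : entry.getValue()) {
        values[docId] = entry.getKey();
      }
    }
    return values;
  }

  // One possible alternative (an assumption, not what the PR does): collect all values per doc.
  static List<List<String>> allValuesPerDoc(Map<String, int[]> postings, int numDocs) {
    List<List<String>> values = new ArrayList<>();
    for (int i = 0; i < numDocs; i++) {
      values.add(new ArrayList<>());
    }
    for (Map.Entry<String, int[]> entry : postings.entrySet()) {
      for (int docId : entry.getValue()) {
        values.get(docId).add(entry.getKey());
      }
    }
    return values;
  }

  public static void main(String[] args) {
    Map<String, int[]> postings = postingLists();
    System.out.println(String.join(",", singleValuePerDoc(postings, 2)));
    System.out.println(allValuesPerDoc(postings, 2));
  }
}
```

In the single-slot version, doc 1 ends up with only the value that was iterated last, which is the behavior that would need to be documented if the method keeps its `String[]` return type.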
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/readers/json/ImmutableJsonIndexReader.java:
##########
@@ -76,6 +77,63 @@ public ImmutableJsonIndexReader(PinotDataBuffer dataBuffer,
int numDocs) {
_docIdMapping = dataBuffer.view(invertedIndexEndOffset,
docIdMappingEndOffset, ByteOrder.LITTLE_ENDIAN);
}
+ /**
+ * Accepts a JSON key and array of docIds used to filter the response
+ * return a String[] where String[i] gives the value of $.key for document i
+ */
+ @Override
+ public String[] getValuesForKeyAndDocs(String key, int[] docIds) {
+ ImmutableRoaringBitmap docIdMask = ImmutableRoaringBitmap.bitmapOf(docIds);
+ int[] dictIds = getDictIdsForKey(key);
+ String[] values = new String[(int) _numDocs];
+ for (int dictId = dictIds[0]; dictId < dictIds[1]; dictId++) {
+ // get docIds from posting list, convert these to the actual docIds
+ ImmutableRoaringBitmap flattenedDocIds = _invertedIndex.getDocIds(dictId);
+ PeekableIntIterator it = flattenedDocIds.getIntIterator();
+ MutableRoaringBitmap postingList = new MutableRoaringBitmap();
+ while (it.hasNext()) {
+ postingList.add(getDocId(it.next()));
+ }
+ // if posting list does not contain relevant docIds, skip the dictionary lookup
+ postingList.and(docIdMask);
+ if (postingList.isEmpty()) {
+ continue;
+ }
+
+ // dictionary value lookup, stripping the path prefix
+ String val = _dictionary.getStringValue(dictId).substring(key.length() + 1);
+
+ // add value to padded array
+ for (int docId : postingList) {
+ values[docId] = val;
+ }
+ }
+
+ return values;
+ }
+
+ /**
+ * For a JSON key path, returns an int array of [min, max] of all values of the JSON key path
+ */
+ private int[] getDictIdsForKey(String key) {
+ // json_index uses \0 as the separator (or \u0000 in unicode)
+ // therefore, we can use the unicode char \u0001 to get the range of dict entries that use this prefix
+
+ // get min for key
+ int indexOfMin = _dictionary.indexOf(key);
+ if (indexOfMin == -1) {
+ return new int[]{-1, -1}; // if key does not exist, immediately return
+ }
+ int indexOfMax = _dictionary.insertionIndexOf(key + "\u0001");
+
+ assert indexOfMax < 0; // the made up max key bound should never be found
Review Comment:
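We should handle this case instead of asserting. If the returned index is
positive (i.e. the upper-bound key actually exists in the dictionary), use it
directly as max.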
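
A rough sketch of what handling both cases could look like. It uses `java.util.Arrays.binarySearch` over a sorted `String[]` as a stand-in for the dictionary, on the assumption that `insertionIndexOf` follows the same convention (the index if found, otherwise `-(insertionPoint + 1)`). The class and method names are hypothetical.

```java
import java.util.Arrays;

// Illustrative only: a sorted String[] stands in for the dictionary.
class DictIdRangeSketch {

  // Returns {min, maxExclusive} for entries between `key` and `key + "\u0001"`,
  // or {-1, -1} if `key` itself is not present.
  static int[] dictIdRange(String[] sortedDictionary, String key) {
    int min = Arrays.binarySearch(sortedDictionary, key);
    if (min < 0) {
      return new int[]{-1, -1};
    }
    int max = Arrays.binarySearch(sortedDictionary, key + "\u0001");
    if (max < 0) {
      // Bound key not found: decode the insertion point from the negative result.
      max = -(max + 1);
    }
    // Bound key found: its index already serves as the exclusive upper bound.
    return new int[]{min, max};
  }

  public static void main(String[] args) {
    String[] dictionary = {"a", "a\u0000x", "a\u0000y", "b", "b\u0000z"};
    System.out.println(Arrays.toString(dictIdRange(dictionary, "a")));
  }
}
```

With this shape, neither outcome of the lookup needs an `assert`: a negative result is decoded into the insertion point, and a found bound key is used as-is.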
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]