Jackie-Jiang commented on code in PR #18368:
URL: https://github.com/apache/pinot/pull/18368#discussion_r3197709408


##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/datasource/MapDataSource.java:
##########


Review Comment:
   Same for the other APIs. The `getKeyXXX()` names are very confusing because
   they all describe values, not keys.



##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/creator/ColumnarMapIndexCreator.java:
##########
@@ -0,0 +1,66 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.spi.index.creator;
+
+import java.io.IOException;
+import java.util.Collections;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.commons.configuration2.PropertiesConfiguration;
+import org.apache.pinot.segment.spi.index.IndexCreator;
+
+
+/**
+ * Creator for the COLUMNAR_MAP index. Accepts one map per document during segment creation
+ * and decomposes it into per-key columnar storage on seal().
+ *
+ * <p>Implementations are not thread-safe; callers must serialize {@link #add} calls per
+ * creator instance.
+ *
+ * <p>The inherited {@code add(Object, int)} method from {@link IndexCreator} treats the
+ * first argument as the map and the second as the docId, matching the column-major creator
+ * path. Callers may use either entry point.
+ */
+public interface ColumnarMapIndexCreator extends IndexCreator {
+
+  /**
+   * Adds one document's map. Keys present in the map's entry set are routed to per-key
+   * columnar storage; keys with declared types are coerced to those types, others fall
+   * back to the configured default value type. A null or empty map is valid and means the
+   * document has no key/value pairs.
+   *
+   * @param mapValue the document's map (may be null or empty)
+   * @param docId the document id, must be monotonically non-decreasing across calls
+   */
+  void add(@Nullable Map<String, Object> mapValue, int docId)

Review Comment:
   Do we ever allow adding `null` here? By the time a value flows here, it
   should already have been replaced with the default value.



##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/creator/ColumnarMapIndexCreator.java:
##########
@@ -0,0 +1,66 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.spi.index.creator;
+
+import java.io.IOException;
+import java.util.Collections;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.commons.configuration2.PropertiesConfiguration;
+import org.apache.pinot.segment.spi.index.IndexCreator;
+
+
+/**
+ * Creator for the COLUMNAR_MAP index. Accepts one map per document during segment creation
+ * and decomposes it into per-key columnar storage on seal().
+ *
+ * <p>Implementations are not thread-safe; callers must serialize {@link #add} calls per
+ * creator instance.
+ *
+ * <p>The inherited {@code add(Object, int)} method from {@link IndexCreator} treats the
+ * first argument as the map and the second as the docId, matching the column-major creator
+ * path. Callers may use either entry point.
+ */
+public interface ColumnarMapIndexCreator extends IndexCreator {
+
+  /**
+   * Adds one document's map. Keys present in the map's entry set are routed to per-key
+   * columnar storage; keys with declared types are coerced to those types, others fall
+   * back to the configured default value type. A null or empty map is valid and means the
+   * document has no key/value pairs.
+   *
+   * @param mapValue the document's map (may be null or empty)
+   * @param docId the document id, must be monotonically non-decreasing across calls
+   */
+  void add(@Nullable Map<String, Object> mapValue, int docId)
+      throws IOException;
+
+  /**
+   * Returns metadata properties for any virtual columns this creator materialized during
+   * {@code seal()}. The framework merges the returned properties into the segment metadata.
+   * Implementations that do not produce virtual columns return an empty map.
+   *
+   * <p>Call after {@code seal()}.
+   *
+   * @return a map from virtual-column name to its {@link PropertiesConfiguration}; never null
+   */
+  default Map<String, PropertiesConfiguration> getVirtualColumnMetadata() {
+    return Collections.emptyMap();

Review Comment:
   (minor, convention) We usually use `Map.of()`. Same for other places
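
   A minimal sketch of the suggested convention, assuming Java 9 or later.
   The class and method names below are hypothetical and are not part of
   the PR. Both factories return an empty, unmodifiable map, so swapping
   one for the other should not change what callers observe.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical helper class used only to compare the two idioms.
class EmptyMapSketch {

  // The idiom currently used in the diff.
  static Map<String, String> viaCollections() {
    return Collections.emptyMap();
  }

  // The idiom suggested in the review (requires Java 9+).
  static Map<String, String> viaMapOf() {
    return Map.of();
  }

  // Returns true if the map refuses modification.
  static boolean rejectsWrites(Map<String, String> map) {
    try {
      map.put("key", "value");
      return false;
    } catch (UnsupportedOperationException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(viaCollections().equals(viaMapOf()));
    System.out.println(rejectsWrites(viaCollections()));
    System.out.println(rejectsWrites(viaMapOf()));
  }
}
```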



##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/reader/ColumnarMapIndexReader.java:
##########
@@ -0,0 +1,81 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.spi.index.reader;
+
+import java.util.Map;
+import java.util.Set;
+import javax.annotation.Nullable;
+import org.apache.pinot.segment.spi.index.IndexReader;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+import org.roaringbitmap.buffer.ImmutableRoaringBitmap;
+
+
+/**
+ * Reader for the COLUMNAR_MAP index. Each indexed key is materialized as its own per-key
+ * forward index plus a presence bitmap.
+ *
+ * <p>Implementations must be safe for concurrent reads. Mutable implementations may impose
+ * a single-writer constraint; refer to the concrete implementation's Javadoc for details.
+ *
+ * <p>Per-key {@code DataSource} construction is the responsibility of the surrounding
+ * {@code ColumnarMapDataSource} wrappers, not this reader. This interface exposes only
+ * the primitives a wrapper needs (key set, type, presence bitmap, per-doc map view).
+ */
+public interface ColumnarMapIndexReader extends IndexReader {
+
+  /** Returns the set of all indexed key names. Never null; empty if no keys are indexed. */
+  Set<String> getKeys();
+
+  /** Returns the value DataType for the given key, or null if the key is not indexed. */
+  @Nullable
+  DataType getKeyValueType(String key);

Review Comment:
   Same here. Do not call out "key" in the method name; it is very confusing.
   The key is always a string, and all of these properties describe values.



##########
pinot-spi/src/main/java/org/apache/pinot/spi/config/table/FieldConfig.java:
##########
@@ -71,6 +71,15 @@ public class FieldConfig extends BaseJsonConfig {
   public static final String TEXT_INDEX_LUCENE_NRT_CACHING_DIRECTORY_BUFFER_SIZE =
       "luceneNRTCachingDirectoryMaxBufferSizeMB";
 
+  // COLUMNAR_MAP index properties

Review Comment:
   Please document all of them



##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/datasource/MapDataSource.java:
##########
@@ -32,9 +32,22 @@ public interface MapDataSource extends DataSource {
 
   /**
    * Get the Data Source representation of a single key within this map column.
+   * Only call after confirming the key exists via {@link #containsKey(String)}.
    */
   DataSource getKeyDataSource(String key);

Review Comment:
   Not introduced in this PR, but let's rename it to:
   ```suggestion
     @Nullable
     DataSource getDataSource(String key);
   ```
   
   It is very confusing now because the data source is for the value, not the key.
   I suggest making it return `@Nullable` to indicate that the key does not exist.
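
   A rough sketch of how the nullable lookup could look from the caller's
   side. `ValueSource` and `NullableLookupSketch` are hypothetical stand-ins
   for `DataSource` and a `MapDataSource` implementation, used only to keep
   the example self-contained. The `@Nullable` annotation is shown as a
   comment because `javax.annotation` is not part of the JDK.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Pinot's DataSource.
interface ValueSource {
  String describe();
}

// Hypothetical map data source whose lookup returns null for an absent key.
class NullableLookupSketch {
  private final Map<String, ValueSource> _sources = new HashMap<>();

  void register(String key, ValueSource source) {
    _sources.put(key, source);
  }

  // Would carry @Nullable in the real interface: null means the key does not exist.
  ValueSource getDataSource(String key) {
    return _sources.get(key);
  }

  // A single lookup replaces the containsKey() + getKeyDataSource() pair.
  static String describeOrDefault(NullableLookupSketch map, String key) {
    ValueSource source = map.getDataSource(key);
    return source != null ? source.describe() : "<absent>";
  }

  public static void main(String[] args) {
    NullableLookupSketch map = new NullableLookupSketch();
    map.register("country", () -> "value source for key 'country'");
    System.out.println(describeOrDefault(map, "country"));
    System.out.println(describeOrDefault(map, "missing"));
  }
}
```

   With this shape a caller does one lookup and one null check, and an
   absent key has a defined result rather than undefined behaviour.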



##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/StandardIndexes.java:
##########
@@ -79,6 +82,7 @@ public class StandardIndexes {
   public static final String TEXT_ID = "text_index";
   public static final String H3_ID = "h3_index";
   public static final String VECTOR_ID = "vector_index";
+  public static final String COLUMNAR_MAP_ID = "columnar_map";

Review Comment:
   Should we just call it `MAP`? Do you foresee other map types being added in
   the future that don't fit under this?



##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/datasource/MapDataSource.java:
##########
@@ -32,9 +32,22 @@ public interface MapDataSource extends DataSource {
 
   /**
    * Get the Data Source representation of a single key within this map column.
+   * Only call after confirming the key exists via {@link #containsKey(String)}.
    */
   DataSource getKeyDataSource(String key);
 
+  /**
+   * Returns true if {@code key} is present in this MAP column for at least one document in
+   * this segment. Call this before {@link #getKeyDataSource(String)} to avoid undefined
+   * behaviour on absent keys.
+   *
+   * <p>The default implementation delegates to {@link #getKeyDataSources()}, which may be
+   * expensive for large key sets. Implementations should override for O(1) performance.
+   */
+  default boolean containsKey(String key) {

Review Comment:
   (optional) This is probably not needed if we make `getDataSource` return 
`@Nullable`



##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/creator/ColumnarMapIndexCreator.java:
##########
@@ -0,0 +1,66 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.spi.index.creator;
+
+import java.io.IOException;
+import java.util.Collections;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.commons.configuration2.PropertiesConfiguration;
+import org.apache.pinot.segment.spi.index.IndexCreator;
+
+
+/**

Review Comment:
   For newly added Javadoc, please follow the markdown style, which is more
   concise and easier to read.



##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/creator/ColumnarMapIndexCreator.java:
##########
@@ -0,0 +1,66 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.spi.index.creator;
+
+import java.io.IOException;
+import java.util.Collections;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.commons.configuration2.PropertiesConfiguration;
+import org.apache.pinot.segment.spi.index.IndexCreator;
+
+
+/**
+ * Creator for the COLUMNAR_MAP index. Accepts one map per document during segment creation
+ * and decomposes it into per-key columnar storage on seal().
+ *
+ * <p>Implementations are not thread-safe; callers must serialize {@link #add} calls per
+ * creator instance.
+ *
+ * <p>The inherited {@code add(Object, int)} method from {@link IndexCreator} treats the
+ * first argument as the map and the second as the docId, matching the column-major creator
+ * path. Callers may use either entry point.
+ */
+public interface ColumnarMapIndexCreator extends IndexCreator {
+
+  /**
+   * Adds one document's map. Keys present in the map's entry set are routed to per-key
+   * columnar storage; keys with declared types are coerced to those types, others fall
+   * back to the configured default value type. A null or empty map is valid and means the
+   * document has no key/value pairs.
+   *
+   * @param mapValue the document's map (may be null or empty)
+   * @param docId the document id, must be monotonically non-decreasing across calls
+   */
+  void add(@Nullable Map<String, Object> mapValue, int docId)
+      throws IOException;
+
+  /**
+   * Returns metadata properties for any virtual columns this creator materialized during
+   * {@code seal()}. The framework merges the returned properties into the segment metadata.
+   * Implementations that do not produce virtual columns return an empty map.
+   *
+   * <p>Call after {@code seal()}.
+   *
+   * @return a map from virtual-column name to its {@link PropertiesConfiguration}; never null
+   */
+  default Map<String, PropertiesConfiguration> getVirtualColumnMetadata() {
+    return Collections.emptyMap();

Review Comment:
   Is this a virtual column? "Virtual column" is an already established concept
   in Pinot, where the column data doesn't exist. IIUC, this is a materialized column?



##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/metadata/ColumnMetadataImpl.java:
##########
@@ -72,6 +72,8 @@ public class ColumnMetadataImpl implements ColumnMetadata {
   private final PartitionFunction _partitionFunction;
   private final Set<Integer> _partitions;
   private final boolean _autoGenerated;
+  private final boolean _isMapVirtualColumn;
+  private final String _parentMapColumn;

Review Comment:
   Do we need both of them? Is `_isMapVirtualColumn` always equal to 
`_parentMapColumn != null`?
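
   If the two fields are indeed always in sync, one option is to store only
   the parent column and derive the flag. The sketch below is a hypothetical,
   trimmed-down holder and not the actual `ColumnMetadataImpl`; it assumes a
   null parent means the column was not materialized from a MAP column.

```java
// Hypothetical metadata holder illustrating a derived flag.
class DerivedFlagSketch {
  // Null when the column was not materialized from a MAP column (assumption).
  private final String _parentMapColumn;

  DerivedFlagSketch(String parentMapColumn) {
    _parentMapColumn = parentMapColumn;
  }

  String getParentMapColumn() {
    return _parentMapColumn;
  }

  // Derived rather than stored, so the flag and the parent can never disagree.
  boolean isMapVirtualColumn() {
    return _parentMapColumn != null;
  }

  public static void main(String[] args) {
    System.out.println(new DerivedFlagSketch("attributes").isMapVirtualColumn());
    System.out.println(new DerivedFlagSketch(null).isMapVirtualColumn());
  }
}
```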



