Jackie-Jiang commented on code in PR #13103:
URL: https://github.com/apache/pinot/pull/13103#discussion_r1602107408


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/SanitizationTransformer.java:
##########
@@ -38,14 +41,22 @@
  * {@link FieldSpec}.
  */
 public class SanitizationTransformer implements RecordTransformer {
+  private static final String NULL_CHARACTER = "\0";
   private final Map<String, Integer> _stringColumnMaxLengthMap = new HashMap<>();
+  private final boolean _failOnTrimmedStringLength;
 
-  public SanitizationTransformer(Schema schema) {
+  public SanitizationTransformer(TableConfig tableConfig, Schema schema) {
     for (FieldSpec fieldSpec : schema.getAllFieldSpecs()) {
       if (!fieldSpec.isVirtualColumn() && fieldSpec.getDataType() == DataType.STRING) {
         _stringColumnMaxLengthMap.put(fieldSpec.getName(), fieldSpec.getMaxLength());
       }
     }
+    IngestionConfig ingestionConfig = tableConfig.getIngestionConfig();
+    if (ingestionConfig != null) {
+      _failOnTrimmedStringLength = ingestionConfig.isFailOnTrimmedStringLength();

Review Comment:
   I see, I misunderstood the intention of the new config. Basically, you want to fail ingestion when a value is longer than the column's max length.
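
   A minimal Java sketch of that behavior, as I understand it: when a STRING value exceeds its column's max length, it is trimmed by default, or the record is rejected when the flag is on. The class `TrimOrFailSketch`, its `sanitize` method, and the choice of `IllegalStateException` are hypothetical illustrations, not Pinot's API; only the field names mirror the diff.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Pinot's SanitizationTransformer: models the PR's
// table-level flag that turns "trim to max length" into "reject the record".
final class TrimOrFailSketch {
  // Column name -> max length, mirroring _stringColumnMaxLengthMap in the diff.
  private final Map<String, Integer> _stringColumnMaxLengthMap = new HashMap<>();
  // Mirrors the PR's flag, which the diff reads from IngestionConfig.
  private final boolean _failOnTrimmedStringLength;

  TrimOrFailSketch(Map<String, Integer> maxLengths, boolean failOnTrimmedStringLength) {
    _stringColumnMaxLengthMap.putAll(maxLengths);
    _failOnTrimmedStringLength = failOnTrimmedStringLength;
  }

  // Returns the value trimmed to the column's max length if needed, or throws
  // instead of trimming when the flag is set. Unknown columns pass through.
  String sanitize(String column, String value) {
    Integer maxLength = _stringColumnMaxLengthMap.get(column);
    if (maxLength == null || value.length() <= maxLength) {
      return value;
    }
    if (_failOnTrimmedStringLength) {
      throw new IllegalStateException("Value of column '" + column + "' has length "
          + value.length() + ", exceeding max length " + maxLength);
    }
    return value.substring(0, maxLength);
  }
}
```

   Note that a single boolean on the table config applies the same choice to every STRING column.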
   
   I can see us wanting different modes for when a value exceeds the max length, and the mode could be configured per column:
   - Trim (current behavior)
   - Throw exception (added behavior in this PR)
   - Fill default value (very useful for JSON/BYTES)
   
   Based on this, I'd suggest adding a mode to `FieldSpec` that specifies the strategy for handling values that exceed the max length.
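
   Roughly, such a per-column mode could look like the sketch below. `MaxLengthExceedStrategy`, `ColumnSpec`, and `MaxLengthHandler` are hypothetical names standing in for whatever would be added to `FieldSpec`. It assumes length means character count for STRING/JSON and byte count for BYTES, and that `DEFAULT_VALUE` substitutes the column's default null value.

```java
import java.util.Arrays;

// Hypothetical per-column strategy; the names are illustrative, not Pinot's API.
enum MaxLengthExceedStrategy {
  TRIM,          // current behavior: cut the value down to maxLength
  ERROR,         // behavior added by this PR: reject the record
  DEFAULT_VALUE  // replace with the column's default null value (JSON/BYTES)
}

// Hypothetical stand-in for the per-column parts of FieldSpec.
final class ColumnSpec {
  final String name;
  final int maxLength;
  final MaxLengthExceedStrategy strategy;
  final Object defaultNullValue;

  ColumnSpec(String name, int maxLength, MaxLengthExceedStrategy strategy, Object defaultNullValue) {
    this.name = name;
    this.maxLength = maxLength;
    this.strategy = strategy;
    this.defaultNullValue = defaultNullValue;
  }
}

final class MaxLengthHandler {
  private MaxLengthHandler() {
  }

  // Applies the column's strategy when a String or byte[] value is too long.
  static Object handle(ColumnSpec spec, Object value) {
    int length;
    if (value instanceof String) {
      length = ((String) value).length();
    } else if (value instanceof byte[]) {
      length = ((byte[]) value).length;
    } else {
      return value; // other types have no max length in this sketch
    }
    if (length <= spec.maxLength) {
      return value;
    }
    switch (spec.strategy) {
      case TRIM:
        return value instanceof String
            ? ((String) value).substring(0, spec.maxLength)
            : Arrays.copyOf((byte[]) value, spec.maxLength);
      case ERROR:
        throw new IllegalStateException("Value of column '" + spec.name + "' has length "
            + length + ", exceeding max length " + spec.maxLength);
      case DEFAULT_VALUE:
        return spec.defaultNullValue;
      default:
        throw new AssertionError("Unknown strategy: " + spec.strategy);
    }
  }
}
```

   Keeping the strategy on the column rather than in `IngestionConfig` would let one table trim a free-text STRING column while substituting a default for an oversized JSON or BYTES column.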



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
