Re: [PR] Custom configuration property reader for segment metadata files [pinot]

2024-06-05 Thread via GitHub


klsince merged PR #12440:
URL: https://github.com/apache/pinot/pull/12440


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Custom configuration property reader for segment metadata files [pinot]

2024-05-29 Thread via GitHub


klsince commented on code in PR #12440:
URL: https://github.com/apache/pinot/pull/12440#discussion_r1619284601


##
pinot-spi/src/main/java/org/apache/pinot/spi/env/VersionedPropertyReader.java:
##
@@ -0,0 +1,60 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.env;
+
+import com.google.common.base.Preconditions;
+import java.io.Reader;
+import 
org.apache.commons.configuration2.PropertiesConfiguration.PropertiesReader;
+
+
+/**
+ * VersionedPropertyReader extends the PropertiesReader
+ * 
+ * Purpose: loads the segment metadata faster
+ *  - by skipping the unescaping of key and
+ *  - parsing the line by splitting based on first occurrence of separator
+ */
+class VersionedPropertyReader extends PropertiesReader {
+
+  public VersionedPropertyReader(Reader reader) {
+super(reader);
+  }
+
+  @Override
+  protected void parseProperty(final String line) {
+// skip the regex based parsing of the line content and splitting the 
content based on first occurrence of separator
+// getPropertySeparator(), in general returns the PropertiesConfiguration 
`DEFAULT_SEPARATOR` value i.e. ' = '.
+String separator = getPropertySeparator();
+
Preconditions.checkArgument(CommonsConfigurationUtils.VERSIONED_CONFIG_SEPARATOR.equals(separator),
+String.format("versioned property configuration separator should be 
equal to '%s'",
+CommonsConfigurationUtils.VERSIONED_CONFIG_SEPARATOR));
+
+String[] keyValue = line.split(getPropertySeparator());
+Preconditions.checkArgument(keyValue.length == 2, "property content split 
should result in key and value");

Review Comment:
   This check might fail if the value happen to have the separator
   
   The `line.split()` does regex matching internally, so better use 
`StringUtils.split()`, and `StringUtils.split()` can take a `max` param to 
return just two parts to handle the issue said above.
   
   And since you have get `separator` above, you can save the calls of 
getPropertySeparator() on L48 and L52.



##
pinot-spi/src/main/java/org/apache/pinot/spi/env/VersionedPropertyReader.java:
##
@@ -0,0 +1,60 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.env;
+
+import com.google.common.base.Preconditions;
+import java.io.Reader;
+import 
org.apache.commons.configuration2.PropertiesConfiguration.PropertiesReader;
+
+
+/**
+ * VersionedPropertyReader extends the PropertiesReader
+ * 
+ * Purpose: loads the segment metadata faster
+ *  - by skipping the unescaping of key and
+ *  - parsing the line by splitting based on first occurrence of separator
+ */
+class VersionedPropertyReader extends PropertiesReader {
+
+  public VersionedPropertyReader(Reader reader) {
+super(reader);
+  }
+
+  @Override
+  protected void parseProperty(final String line) {
+// skip the regex based parsing of the line content and splitting the 
content based on first occurrence of separator
+// getPropertySeparator(), in general returns the PropertiesConfiguration 
`DEFAULT_SEPARATOR` value i.e. ' = '.
+String separator = getPropertySeparator();
+
Preconditions.checkArgument(CommonsConfigurationUtils.VERSIONED_CONFIG_SEPARATOR.equals(separator),
+String.format("versioned property configuration separator should be 
equal to '%s'",

Review Comment:
   nit: "Versioned property configuration separator '%s' should be e

Re: [PR] Custom configuration property reader for segment metadata files [pinot]

2024-05-28 Thread via GitHub


klsince commented on code in PR #12440:
URL: https://github.com/apache/pinot/pull/12440#discussion_r1617954347


##
pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java:
##
@@ -276,20 +334,65 @@ public static String 
recoverSpecialCharacterInPropertyValue(String value) {
 return value.replace("\0\0", ",");
   }
 
-  private static PropertiesConfiguration createPropertiesConfiguration(boolean 
setIOFactory,
-  boolean setDefaultDelimiter) {
+  /**
+   * creates the instance of the {@link 
org.apache.commons.configuration2.PropertiesConfiguration}
+   * with custom IO factory based on kind {@link 
org.apache.commons.configuration2.PropertiesConfiguration.IOFactory}
+   * and legacy list delimiter {@link 
org.apache.commons.configuration2.convert.LegacyListDelimiterHandler}
+   *
+   * @param setDefaultDelimiter sets the default list delimiter.
+   * @param ioFactoryKind IOFactory kind, can be null.
+   * @return PropertiesConfiguration
+   */
+  private static PropertiesConfiguration createPropertiesConfiguration(boolean 
setDefaultDelimiter,
+ @Nullable PropertyIOFactoryKind ioFactoryKind) {
 PropertiesConfiguration config = new PropertiesConfiguration();
 
-// setting IO Reader Factory
-if (setIOFactory) {
-  config.setIOFactory(new ConfigFilePropertyReaderFactory());
+// setting IO Reader Factory of the configuration.
+if (ioFactoryKind != null) {
+  config.setIOFactory(ioFactoryKind.getInstance());
 }
 
-// setting DEFAULT_LIST_DELIMITER
+// setting the DEFAULT_LIST_DELIMITER
 if (setDefaultDelimiter) {
   config.setListDelimiterHandler(new 
LegacyListDelimiterHandler(DEFAULT_LIST_DELIMITER));
 }
 
 return config;
   }
+
+  /**
+   * checks whether the configuration file first line is version header or not.
+   * @param file configuration file
+   * @return String
+   * @throws ConfigurationException exception.
+   */
+  private static String getConfigurationHeaderVersion(File file)
+  throws ConfigurationException {
+String versionValue = DEFAULT_PROPERTIES_CONFIGURATION_HEADER_VERSION;
+if (file.exists()) {
+  try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
+String fileFirstLine = reader.readLine();
+// header version is written as a comment and start with '# '
+String versionHeaderCommentPrefix = String.format("# %s", 
VERSION_HEADER_IDENTIFIER);
+// check whether the file has the version header or not
+if (StringUtils.startsWith(fileFirstLine, versionHeaderCommentPrefix)) 
{
+  String[] headerKeyValue = 
fileFirstLine.split(VERSIONED_CONFIG_SEPARATOR);
+  if (headerKeyValue.length == 2) {
+versionValue = headerKeyValue[1];
+  }
+}
+  } catch (IOException exception) {
+throw new ConfigurationException(
+String.format("Error occurred while reading configuration file %s 
",

Review Comment:
   nit: was the whitespace after `%s` left intentionally? otherwise can save 
`String.format()` and simply `"Error ..." + file.getName()` for short, but not 
a blocking comment any way.



##
pinot-spi/src/main/java/org/apache/pinot/spi/env/VersionedPropertyReader.java:
##
@@ -0,0 +1,55 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.env;
+
+import com.google.common.base.Preconditions;
+import java.io.Reader;
+import 
org.apache.commons.configuration2.PropertiesConfiguration.PropertiesReader;
+
+
+/**
+ * VersionedPropertyReader extends the PropertiesReader
+ * 
+ * Purpose: loads the segment metadata faster
+ *  - by skipping the unescaping of key and
+ *  - parsing the line by splitting based on first occurrence of separator
+ */
+class VersionedPropertyReader extends PropertiesReader {
+
+  public VersionedPropertyReader(Reader reader) {
+super(reader);
+  }
+
+  @Override
+  protected void parseProperty(final String line) {
+// skip the regex based parsing of the line content and splitting the 
content based on first occurrence of separator
+// getPropertySeparator(), in general returns the PropertiesConfiguration 
`DEFAULT_SEPARATO

Re: [PR] Custom configuration property reader for segment metadata files [pinot]

2024-05-25 Thread via GitHub


abhioncbr commented on code in PR #12440:
URL: https://github.com/apache/pinot/pull/12440#discussion_r1614759506


##
pinot-spi/src/test/java/org/apache/pinot/spi/env/VersionedPropertyConfigTest.java:
##
@@ -0,0 +1,224 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.env;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.List;
+import java.util.Objects;
+import org.apache.commons.configuration2.PropertiesConfiguration;
+import org.apache.commons.configuration2.ex.ConfigurationException;
+import org.apache.commons.io.FileUtils;
+import org.apache.commons.lang3.RandomStringUtils;
+import org.testng.annotations.AfterClass;
+import org.testng.annotations.AfterMethod;
+import org.testng.annotations.BeforeClass;
+import org.testng.annotations.Test;
+
+import static org.testng.Assert.assertEquals;
+import static org.testng.Assert.assertNotNull;
+import static org.testng.Assert.assertNull;
+
+
+public class VersionedPropertyConfigTest {
+
+  private static final String COLON_SEPARATOR = ":";
+  private static final File TEMP_DIR = new File(FileUtils.getTempDirectory(), 
"VersionedPropertyConfigTest");
+  private static final File CONFIG_FILE = new File(TEMP_DIR, "config");
+  private static final String[] TEST_PROPERTY_KEY = { "test1", "test2_key", 
"test3_key_",
+  "test4_key_1234", "test-1", "test.1" };
+  private static final String[] TEST_PROPERTY_KEY_WITH_SPECIAL_CHAR = { 
"test:1", "test2=key",

Review Comment:
   So, in this test case, `test2=key` is a key. the overall property in the 
file looks like this
   ```bash
   test2=key = value.
   ```
   Since we are using the `getPropertySeparator` method in the reader, which 
returns the separator value as ` = `, it's working correctly. I think with this 
implementation, we can also use  `=` in the key.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Custom configuration property reader for segment metadata files [pinot]

2024-05-25 Thread via GitHub


abhioncbr commented on code in PR #12440:
URL: https://github.com/apache/pinot/pull/12440#discussion_r1614757440


##
pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java:
##
@@ -42,29 +44,20 @@
  * Provide utility functions to manipulate Apache Commons {@link 
Configuration} instances.
  */
 public class CommonsConfigurationUtils {
+
+  private static final String VERSIONED_CONFIG_SEPARATOR = " = ";

Review Comment:
   This is the same value of `DEFAULT_SEPARATOR` in 
[PropertiesConfiguration.java](https://github.com/apache/commons-configuration/blob/15e6ec62f1b309bab0bffd139a4790564dc3ca73/src/main/java/org/apache/commons/configuration2/PropertiesConfiguration.java#L1005).
 We are using the same because in VersionedPropertyReader we are using the 
[getPropertySeparator](https://github.com/apache/pinot/pull/12440/files#diff-7cfc074e3db46c25491eba3d6df78847cecbb093b4780dbe6a1b8875ffdc3555R42)
 to split the string into key and value.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Custom configuration property reader for segment metadata files [pinot]

2024-05-23 Thread via GitHub


abhioncbr commented on code in PR #12440:
URL: https://github.com/apache/pinot/pull/12440#discussion_r1612507311


##
pinot-spi/src/test/java/org/apache/pinot/spi/env/CommonsConfigurationUtilsTest.java:
##
@@ -50,6 +53,33 @@ public void tearDown()
 FileUtils.deleteDirectory(TEMP_DIR);
   }
 
+  @Test
+  public void testSegmentMetadataFromFile() {
+// load the existing config and check the properties config instance
+try {
+  PropertiesConfiguration config = CommonsConfigurationUtils
+  .getSegmentMetadataFromFile(SEGMENT_METADATA_CONFIG_FILE, true);
+  assertNotNull(config);
+
+  config.setProperty("testKey", "testValue");
+
+  // add the segment version header to the file and read it again
+  CommonsConfigurationUtils.saveSegmentMetadataToFile(config, CONFIG_FILE,
+  CommonsConfigurationUtils.PROPERTIES_CONFIGURATION_HEADER_VERSION_2);
+
+  // reading the property with header.
+  config = 
CommonsConfigurationUtils.getSegmentMetadataFromFile(CONFIG_FILE, true);
+  assertNotNull(config);
+  assertEquals(config.getHeader(), "# version=2");
+} catch (Exception ex) {
+  Assert.fail("should not throw ConfigurationException exception with 
valid file");
+}
+
+// load the non-existing file and expect the exception
+Assert.expectThrows(NullPointerException.class,
+() -> CommonsConfigurationUtils.getSegmentMetadataFromFile(null, 
true));

Review Comment:
   addressed in the [PR](https://github.com/apache/pinot/pull/13201). marking 
it as resolved.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Custom configuration property reader for segment metadata files [pinot]

2024-05-21 Thread via GitHub


klsince commented on code in PR #12440:
URL: https://github.com/apache/pinot/pull/12440#discussion_r1608761946


##
pinot-spi/src/test/java/org/apache/pinot/spi/env/VersionedPropertyConfigTest.java:
##
@@ -0,0 +1,224 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.env;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.List;
+import java.util.Objects;
+import org.apache.commons.configuration2.PropertiesConfiguration;
+import org.apache.commons.configuration2.ex.ConfigurationException;
+import org.apache.commons.io.FileUtils;
+import org.apache.commons.lang3.RandomStringUtils;
+import org.testng.annotations.AfterClass;
+import org.testng.annotations.AfterMethod;
+import org.testng.annotations.BeforeClass;
+import org.testng.annotations.Test;
+
+import static org.testng.Assert.assertEquals;
+import static org.testng.Assert.assertNotNull;
+import static org.testng.Assert.assertNull;
+
+
+public class VersionedPropertyConfigTest {
+
+  private static final String COLON_SEPARATOR = ":";
+  private static final File TEMP_DIR = new File(FileUtils.getTempDirectory(), 
"VersionedPropertyConfigTest");
+  private static final File CONFIG_FILE = new File(TEMP_DIR, "config");
+  private static final String[] TEST_PROPERTY_KEY = { "test1", "test2_key", 
"test3_key_",
+  "test4_key_1234", "test-1", "test.1" };
+  private static final String[] TEST_PROPERTY_KEY_WITH_SPECIAL_CHAR = { 
"test:1", "test2=key",

Review Comment:
   curious how keys with `=` like `test2=key` get handled properly? because the 
two white spaces in the separator constant defined in 
`CommonsConfigurationUtils.java` above?



##
pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java:
##
@@ -42,29 +44,20 @@
  * Provide utility functions to manipulate Apache Commons {@link 
Configuration} instances.
  */
 public class CommonsConfigurationUtils {
+
+  private static final String VERSIONED_CONFIG_SEPARATOR = " = ";

Review Comment:
   is the white space required?  



##
pinot-spi/src/test/java/org/apache/pinot/spi/env/CommonsConfigurationUtilsTest.java:
##
@@ -50,6 +53,33 @@ public void tearDown()
 FileUtils.deleteDirectory(TEMP_DIR);
   }
 
+  @Test
+  public void testSegmentMetadataFromFile() {
+// load the existing config and check the properties config instance
+try {
+  PropertiesConfiguration config = CommonsConfigurationUtils
+  .getSegmentMetadataFromFile(SEGMENT_METADATA_CONFIG_FILE, true);
+  assertNotNull(config);
+
+  config.setProperty("testKey", "testValue");
+
+  // add the segment version header to the file and read it again
+  CommonsConfigurationUtils.saveSegmentMetadataToFile(config, CONFIG_FILE,
+  CommonsConfigurationUtils.PROPERTIES_CONFIGURATION_HEADER_VERSION_2);
+
+  // reading the property with header.
+  config = 
CommonsConfigurationUtils.getSegmentMetadataFromFile(CONFIG_FILE, true);
+  assertNotNull(config);
+  assertEquals(config.getHeader(), "# version=2");
+} catch (Exception ex) {
+  Assert.fail("should not throw ConfigurationException exception with 
valid file");
+}
+
+// load the non-existing file and expect the exception
+Assert.expectThrows(NullPointerException.class,
+() -> CommonsConfigurationUtils.getSegmentMetadataFromFile(null, 
true));

Review Comment:
   > we don't need the non-existing file test.
   
   is this because we do `fileHandler.setFile(file);` if file doesn't exist?
   
   not introduced by this PR, but looks like you added those changes earlier 
on, so in case you know: 
   1. for `fromFile()`, why we need to do `fileHandler.setFile(file)` when file 
doesn't exit?
   2. if that's needed, for `fromPath()` method, should we do 
`fileHandler.setPath(path)` if path doesn't exist as well? 



##
pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java:
##
@@ -271,20 +328,59 @@ public static String 
recoverSpecialCharacterInPropertyValue(String value) {
 return value.replace("\0\0", ",");
   }
 
-  private static PropertiesConfiguration createPropertiesConfiguration(boolean 

Re: [PR] Custom configuration property reader for segment metadata files [pinot]

2024-05-18 Thread via GitHub


abhioncbr commented on code in PR #12440:
URL: https://github.com/apache/pinot/pull/12440#discussion_r1605906610


##
pinot-spi/src/main/java/org/apache/pinot/spi/env/SegmentMetadataPropertyReader.java:
##
@@ -0,0 +1,54 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.env;
+
+import com.google.common.base.Preconditions;
+import java.io.Reader;
+import 
org.apache.commons.configuration2.PropertiesConfiguration.PropertiesReader;
+
+
+/**
+ * SegmentMetadataPropertyReader extends the PropertiesReader
+ * 
+ * Purpose: loads the segment metadata faster
+ *  - by skipping the unescaping of key and
+ *  - parsing the line by splitting based on first occurrence of separator
+ */
+class SegmentMetadataPropertyReader extends PropertiesReader {
+
+  public SegmentMetadataPropertyReader(Reader reader) {
+super(reader);
+  }
+
+  @Override
+  protected void parseProperty(final String line) {
+// skip the regex based parsing of the line content and splitting the 
content based on first occurrence of separator
+String[] keyValue = line.split(getPropertySeparator());
+Preconditions.checkArgument(keyValue.length == 2, "property content split 
should result in key and value");
+initPropertyName(keyValue[0]);
+initPropertyValue(keyValue[1]);
+initPropertySeparator(getPropertySeparator());

Review Comment:
   Setting the global separator for versioned properties configurations ensures 
it gets written with the `=` separator. Also, I added the unit test case. Thanks



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Custom configuration property reader for segment metadata files [pinot]

2024-05-18 Thread via GitHub


abhioncbr commented on code in PR #12440:
URL: https://github.com/apache/pinot/pull/12440#discussion_r1605808245


##
pinot-spi/src/test/java/org/apache/pinot/spi/env/CommonsConfigurationUtilsTest.java:
##
@@ -50,6 +53,33 @@ public void tearDown()
 FileUtils.deleteDirectory(TEMP_DIR);
   }
 
+  @Test
+  public void testSegmentMetadataFromFile() {
+// load the existing config and check the properties config instance
+try {
+  PropertiesConfiguration config = CommonsConfigurationUtils
+  .getSegmentMetadataFromFile(SEGMENT_METADATA_CONFIG_FILE, true);
+  assertNotNull(config);
+
+  config.setProperty("testKey", "testValue");
+
+  // add the segment version header to the file and read it again
+  CommonsConfigurationUtils.saveSegmentMetadataToFile(config, CONFIG_FILE,
+  CommonsConfigurationUtils.PROPERTIES_CONFIGURATION_HEADER_VERSION_2);
+
+  // reading the property with header.
+  config = 
CommonsConfigurationUtils.getSegmentMetadataFromFile(CONFIG_FILE, true);
+  assertNotNull(config);
+  assertEquals(config.getHeader(), "# version=2");
+} catch (Exception ex) {
+  Assert.fail("should not throw ConfigurationException exception with 
valid file");
+}
+
+// load the non-existing file and expect the exception
+Assert.expectThrows(NullPointerException.class,
+() -> CommonsConfigurationUtils.getSegmentMetadataFromFile(null, 
true));

Review Comment:
   we don't need the non-existing file test.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Custom configuration property reader for segment metadata files [pinot]

2024-05-17 Thread via GitHub


abhioncbr commented on code in PR #12440:
URL: https://github.com/apache/pinot/pull/12440#discussion_r1605567663


##
pinot-spi/src/main/java/org/apache/pinot/spi/env/SegmentMetadataPropertyReader.java:
##
@@ -0,0 +1,54 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.env;
+
+import com.google.common.base.Preconditions;
+import java.io.Reader;
+import 
org.apache.commons.configuration2.PropertiesConfiguration.PropertiesReader;
+
+
+/**
+ * SegmentMetadataPropertyReader extends the PropertiesReader
+ * 
+ * Purpose: loads the segment metadata faster
+ *  - by skipping the unescaping of key and
+ *  - parsing the line by splitting based on first occurrence of separator
+ */
+class SegmentMetadataPropertyReader extends PropertiesReader {
+
+  public SegmentMetadataPropertyReader(Reader reader) {
+super(reader);
+  }
+
+  @Override
+  protected void parseProperty(final String line) {
+// skip the regex based parsing of the line content and splitting the 
content based on first occurrence of separator
+String[] keyValue = line.split(getPropertySeparator());
+Preconditions.checkArgument(keyValue.length == 2, "property content split 
should result in key and value");
+initPropertyName(keyValue[0]);
+initPropertyValue(keyValue[1]);
+initPropertySeparator(getPropertySeparator());

Review Comment:
   We are not changing the separator configuration. Also, we only have one 
[write method 
](https://github.com/apache/pinot/pull/12440/files#diff-d500a9bb6a9a2455521f13b93599aeb1db76a0a3a285ffb6a4cef58dcf98e9c7R189),
 and we can explicitly set the separator value so that in future if default 
behaviour of commons-configuration2 changes, it wouldn't impact us. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Custom configuration property reader for segment metadata files [pinot]

2024-05-17 Thread via GitHub


abhioncbr commented on code in PR #12440:
URL: https://github.com/apache/pinot/pull/12440#discussion_r1605567663


##
pinot-spi/src/main/java/org/apache/pinot/spi/env/SegmentMetadataPropertyReader.java:
##
@@ -0,0 +1,54 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.env;
+
+import com.google.common.base.Preconditions;
+import java.io.Reader;
+import 
org.apache.commons.configuration2.PropertiesConfiguration.PropertiesReader;
+
+
+/**
+ * SegmentMetadataPropertyReader extends the PropertiesReader
+ * 
+ * Purpose: loads the segment metadata faster
+ *  - by skipping the unescaping of key and
+ *  - parsing the line by splitting based on first occurrence of separator
+ */
+class SegmentMetadataPropertyReader extends PropertiesReader {
+
+  public SegmentMetadataPropertyReader(Reader reader) {
+super(reader);
+  }
+
+  @Override
+  protected void parseProperty(final String line) {
+// skip the regex based parsing of the line content and splitting the 
content based on first occurrence of separator
+String[] keyValue = line.split(getPropertySeparator());
+Preconditions.checkArgument(keyValue.length == 2, "property content split 
should result in key and value");
+initPropertyName(keyValue[0]);
+initPropertyValue(keyValue[1]);
+initPropertySeparator(getPropertySeparator());

Review Comment:
   We are not changing the separator configuration. However, we only have 
[write method 
](https://github.com/apache/pinot/pull/12440/files#diff-d500a9bb6a9a2455521f13b93599aeb1db76a0a3a285ffb6a4cef58dcf98e9c7R189),
 and we can explicitly set the separator value so that in future if default 
behaviour of commons-configuration2 changes, it wouldn't impact us. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Custom configuration property reader for segment metadata files [pinot]

2024-05-16 Thread via GitHub


klsince commented on code in PR #12440:
URL: https://github.com/apache/pinot/pull/12440#discussion_r1603867571


##
pinot-spi/src/main/java/org/apache/pinot/spi/env/SegmentMetadataPropertyReader.java:
##
@@ -0,0 +1,54 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.env;
+
+import com.google.common.base.Preconditions;
+import java.io.Reader;
+import 
org.apache.commons.configuration2.PropertiesConfiguration.PropertiesReader;
+
+
+/**
+ * SegmentMetadataPropertyReader extends the PropertiesReader
+ * 
+ * Purpose: loads the segment metadata faster
+ *  - by skipping the unescaping of key and
+ *  - parsing the line by splitting based on first occurrence of separator
+ */
+class SegmentMetadataPropertyReader extends PropertiesReader {
+
+  public SegmentMetadataPropertyReader(Reader reader) {
+super(reader);
+  }
+
+  @Override
+  protected void parseProperty(final String line) {
+// skip the regex based parsing of the line content and splitting the 
content based on first occurrence of separator
+String[] keyValue = line.split(getPropertySeparator());
+Preconditions.checkArgument(keyValue.length == 2, "property content split 
should result in key and value");
+initPropertyName(keyValue[0]);
+initPropertyValue(keyValue[1]);
+initPropertySeparator(getPropertySeparator());

Review Comment:
   currently, how we change the separator configuration while writing the 
configuration file? Is there a way to force that to be '=' as part of contract 
of using version 2.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Custom configuration property reader for segment metadata files [pinot]

2024-05-15 Thread via GitHub


abhioncbr commented on code in PR #12440:
URL: https://github.com/apache/pinot/pull/12440#discussion_r1602430072


##
pinot-spi/src/main/java/org/apache/pinot/spi/env/SegmentMetadataPropertyReader.java:
##
@@ -0,0 +1,54 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.env;
+
+import com.google.common.base.Preconditions;
+import java.io.Reader;
+import 
org.apache.commons.configuration2.PropertiesConfiguration.PropertiesReader;
+
+
+/**
+ * SegmentMetadataPropertyReader extends the PropertiesReader
+ * 
+ * Purpose: loads the segment metadata faster
+ *  - by skipping the unescaping of key and
+ *  - parsing the line by splitting based on first occurrence of separator
+ */
+class SegmentMetadataPropertyReader extends PropertiesReader {
+
+  public SegmentMetadataPropertyReader(Reader reader) {
+super(reader);
+  }
+
+  @Override
+  protected void parseProperty(final String line) {
+// skip the regex based parsing of the line content and splitting the 
content based on first occurrence of separator
+String[] keyValue = line.split(getPropertySeparator());
+Preconditions.checkArgument(keyValue.length == 2, "property content split 
should result in key and value");
+initPropertyName(keyValue[0]);
+initPropertyValue(keyValue[1]);
+initPropertySeparator(getPropertySeparator());

Review Comment:
   Good point. 
   
   I think it only happens if we change the seprator configuration while 
writing the configuration file. By default, it's `=` and I don't think we want 
to change that. I can add a specific comment about it for future reference and 
maybe the unit test around it. Thanks



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Custom configuration property reader for segment metadata files [pinot]

2024-05-15 Thread via GitHub


klsince commented on code in PR #12440:
URL: https://github.com/apache/pinot/pull/12440#discussion_r1601978383


##
pinot-spi/src/test/java/org/apache/pinot/spi/env/CommonsConfigurationUtilsTest.java:
##
@@ -50,6 +53,33 @@ public void tearDown()
 FileUtils.deleteDirectory(TEMP_DIR);
   }
 
+  @Test
+  public void testSegmentMetadataFromFile() {
+// load the existing config and check the properties config instance
+try {
+  PropertiesConfiguration config = CommonsConfigurationUtils
+  .getSegmentMetadataFromFile(SEGMENT_METADATA_CONFIG_FILE, true);
+  assertNotNull(config);
+
+  config.setProperty("testKey", "testValue");
+
+  // add the segment version header to the file and read it again
+  CommonsConfigurationUtils.saveSegmentMetadataToFile(config, CONFIG_FILE,
+  CommonsConfigurationUtils.PROPERTIES_CONFIGURATION_HEADER_VERSION_2);
+
+  // reading the property with header.
+  config = 
CommonsConfigurationUtils.getSegmentMetadataFromFile(CONFIG_FILE, true);
+  assertNotNull(config);
+  assertEquals(config.getHeader(), "# version=2");
+} catch (Exception ex) {
+  Assert.fail("should not throw ConfigurationException exception with 
valid file");

Review Comment:
   put ex.getMessage() in the failure msg to help debug



##
pinot-spi/src/main/java/org/apache/pinot/spi/env/SegmentMetadataPropertyReader.java:
##
@@ -0,0 +1,54 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.env;
+
+import com.google.common.base.Preconditions;
+import java.io.Reader;
+import 
org.apache.commons.configuration2.PropertiesConfiguration.PropertiesReader;
+
+
+/**
+ * SegmentMetadataPropertyReader extends the PropertiesReader
+ * 
+ * Purpose: loads the segment metadata faster
+ *  - by skipping the unescaping of key and
+ *  - parsing the line by splitting based on first occurrence of separator
+ */
+class SegmentMetadataPropertyReader extends PropertiesReader {
+
+  public SegmentMetadataPropertyReader(Reader reader) {
+super(reader);
+  }
+
+  @Override
+  protected void parseProperty(final String line) {
+// skip the regex based parsing of the line content and splitting the 
content based on first occurrence of separator
+String[] keyValue = line.split(getPropertySeparator());
+Preconditions.checkArgument(keyValue.length == 2, "property content split 
should result in key and value");
+initPropertyName(keyValue[0]);
+initPropertyValue(keyValue[1]);
+initPropertySeparator(getPropertySeparator());

Review Comment:
   would separator be something other than '='? if so, I wonder if the split 
logic in the util class would fail e.g. 
   ```
   String[] headerKeyValue = fileFirstLine.split("=");
   ```



##
pinot-spi/src/test/java/org/apache/pinot/spi/env/CommonsConfigurationUtilsTest.java:
##
@@ -50,6 +53,33 @@ public void tearDown()
 FileUtils.deleteDirectory(TEMP_DIR);
   }
 
+  @Test
+  public void testSegmentMetadataFromFile() {
+// load the existing config and check the properties config instance
+try {
+  PropertiesConfiguration config = CommonsConfigurationUtils
+  .getSegmentMetadataFromFile(SEGMENT_METADATA_CONFIG_FILE, true);
+  assertNotNull(config);
+
+  config.setProperty("testKey", "testValue");
+
+  // add the segment version header to the file and read it again
+  CommonsConfigurationUtils.saveSegmentMetadataToFile(config, CONFIG_FILE,
+  CommonsConfigurationUtils.PROPERTIES_CONFIGURATION_HEADER_VERSION_2);
+
+  // reading the property with header.
+  config = 
CommonsConfigurationUtils.getSegmentMetadataFromFile(CONFIG_FILE, true);
+  assertNotNull(config);
+  assertEquals(config.getHeader(), "# version=2");
+} catch (Exception ex) {
+  Assert.fail("should not throw ConfigurationException exception with 
valid file");
+}
+
+// load the non-existing file and expect the exception
+Assert.expectThrows(NullPointerException.class,
+() -> CommonsConfigurationUtils.getSegmentMetadataFromFile(null, 
true));

Review Comment:
   can add another check for non-existing file, in addition to `nu

Re: [PR] Custom configuration property reader for segment metadata files [pinot]

2024-05-14 Thread via GitHub


klsince commented on code in PR #12440:
URL: https://github.com/apache/pinot/pull/12440#discussion_r1600485846


##
pinot-spi/src/main/java/org/apache/pinot/spi/env/PropertyIOFactoryKind.java:
##
@@ -0,0 +1,55 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.env;
+
+import org.apache.commons.configuration2.PropertiesConfiguration;
+
+public enum PropertyIOFactoryKind {
+  ConfigFileIOFactory {
+public String toString() {
+  return "ConfigFile";
+}
+
+@Override
+public ConfigFilePropertyIOFactory getInstance() {
+  return new ConfigFilePropertyIOFactory();
+}
+  },
+  SegmentMetadataIOFactory {
+public String toString() {
+  return "SegmentMetadata";
+}
+@Override
+public SegmentMetadataPropertyIOFactory getInstance() {
+  return new SegmentMetadataPropertyIOFactory();
+}
+  },
+  DefaultPropertyConfigurationIOFactory {

Review Comment:
   nit: just DefaultIOFactory for short?



##
pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java:
##
@@ -43,28 +46,16 @@
  */
 public class CommonsConfigurationUtils {
   private static final Character DEFAULT_LIST_DELIMITER = ',';
+  public static final String VERSION_HEADER_IDENTIFIER = "version";

Review Comment:
   To be generic, should we rename the SegmentMetadataIOFactory to something 
like VersionedIOFactory? so that it's not just for reading/writing 
SegmentMetadata files.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Custom configuration property reader for segment metadata files [pinot]

2024-04-23 Thread via GitHub


abhioncbr commented on code in PR #12440:
URL: https://github.com/apache/pinot/pull/12440#discussion_r1577155394


##
pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java:
##
@@ -128,13 +155,34 @@ public static PropertiesConfiguration fromFile(File file, 
boolean setIOFactory,
 return config;
   }
 
+  /**
+   * save the segment metadata configuration content into the provided file 
based on the version header.
+   * @param propertiesConfiguration a {@link PropertiesConfiguration} instance.
+   * @param file a {@link File} instance.
+   * @param versionHeader a {@link String} instance.
+   */
+  public static void saveSegmentMetadataToFile(PropertiesConfiguration 
propertiesConfiguration, File file,

Review Comment:
   not merging two checks because, in-future, we might want to set the version 
header with a different IO factory.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Custom configuration property reader for segment metadata files [pinot]

2024-04-23 Thread via GitHub


abhioncbr commented on code in PR #12440:
URL: https://github.com/apache/pinot/pull/12440#discussion_r1576806764


##
pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java:
##
@@ -43,28 +46,16 @@
  */
 public class CommonsConfigurationUtils {
   private static final Character DEFAULT_LIST_DELIMITER = ',';
+  public static final String VERSION_HEADER_IDENTIFIER = "version";

Review Comment:
   I think we can use this header version in specific other scenarios, like 
moving away from the usage of `LegacyListDelimiterHandler` in future. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Custom configuration property reader for segment metadata files [pinot]

2024-04-23 Thread via GitHub


Jackie-Jiang commented on code in PR #12440:
URL: https://github.com/apache/pinot/pull/12440#discussion_r1576704306


##
pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java:
##
@@ -43,28 +46,16 @@
  */
 public class CommonsConfigurationUtils {
   private static final Character DEFAULT_LIST_DELIMITER = ',';
+  public static final String VERSION_HEADER_IDENTIFIER = "version";
 
-  private CommonsConfigurationUtils() {
-  }
+  // usage: default header version of all configurations.
+  // if properties configuration doesn't contain header version, it will be 
considered as 1
+  public static final String PROPERTIES_CONFIGURATION_HEADER_VERSION_1 = "1";

Review Comment:
   Maybe keeping a default version:
   ```
   public static final int SEGMENT_METADATA_DEFAULT_VERSION = 1;
   ```
   We probably don't need other constants for versions as version should be int



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Custom configuration property reader for segment metadata files [pinot]

2024-04-23 Thread via GitHub


Jackie-Jiang commented on code in PR #12440:
URL: https://github.com/apache/pinot/pull/12440#discussion_r1576704306


##
pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java:
##
@@ -43,28 +46,16 @@
  */
 public class CommonsConfigurationUtils {
   private static final Character DEFAULT_LIST_DELIMITER = ',';
+  public static final String VERSION_HEADER_IDENTIFIER = "version";
 
-  private CommonsConfigurationUtils() {
-  }
+  // usage: default header version of all configurations.
+  // if properties configuration doesn't contain header version, it will be 
considered as 1
+  public static final String PROPERTIES_CONFIGURATION_HEADER_VERSION_1 = "1";

Review Comment:
   Maybe keeping a default version:
   ```
   public static final int DEFAULT_SEGMENT_METADATA_VERSION = 1;
   ```
   We probably don't need other constants for versions as version should be int



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Custom configuration property reader for segment metadata files [pinot]

2024-04-23 Thread via GitHub


Jackie-Jiang commented on code in PR #12440:
URL: https://github.com/apache/pinot/pull/12440#discussion_r1576701431


##
pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java:
##
@@ -43,28 +46,16 @@
  */
 public class CommonsConfigurationUtils {
   private static final Character DEFAULT_LIST_DELIMITER = ',';
+  public static final String VERSION_HEADER_IDENTIFIER = "version";

Review Comment:
   I think this can be common though. If it is explicitly specified (in any 
configuration file), we take it; if it is not explicitly specified, we treat it 
as default version



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Custom configuration property reader for segment metadata files [pinot]

2024-04-23 Thread via GitHub


Jackie-Jiang commented on code in PR #12440:
URL: https://github.com/apache/pinot/pull/12440#discussion_r1576701431


##
pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java:
##
@@ -43,28 +46,16 @@
  */
 public class CommonsConfigurationUtils {
   private static final Character DEFAULT_LIST_DELIMITER = ',';
+  public static final String VERSION_HEADER_IDENTIFIER = "version";

Review Comment:
   I think this can be common though. If it is explicitly specified (in any 
configuration file), we take it; if it is not explicitly specified, we treat it 
as default version (maybe adding a constant for default version?)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Custom configuration property reader for segment metadata files [pinot]

2024-04-23 Thread via GitHub


klsince commented on code in PR #12440:
URL: https://github.com/apache/pinot/pull/12440#discussion_r1576653550


##
pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java:
##
@@ -43,28 +46,16 @@
  */
 public class CommonsConfigurationUtils {
   private static final Character DEFAULT_LIST_DELIMITER = ',';
+  public static final String VERSION_HEADER_IDENTIFIER = "version";
 
-  private CommonsConfigurationUtils() {
-  }
+  // usage: default header version of all configurations.
+  // if properties configuration doesn't contain header version, it will be 
considered as 1
+  public static final String PROPERTIES_CONFIGURATION_HEADER_VERSION_1 = "1";

Review Comment:
   I don't think we need "1" and can use `null` if there is no header from 
segment metadata file. 
   
   How about name the constant below `SEGMENT_METADATA_FILE_FORMAT_VERSION = 
1`, and leave some comments about what the new format would be like.
   ```
   public static final String PROPERTIES_CONFIGURATION_HEADER_VERSION_2 = "2";
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Custom configuration property reader for segment metadata files [pinot]

2024-04-23 Thread via GitHub


klsince commented on code in PR #12440:
URL: https://github.com/apache/pinot/pull/12440#discussion_r1576653550


##
pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java:
##
@@ -43,28 +46,16 @@
  */
 public class CommonsConfigurationUtils {
   private static final Character DEFAULT_LIST_DELIMITER = ',';
+  public static final String VERSION_HEADER_IDENTIFIER = "version";
 
-  private CommonsConfigurationUtils() {
-  }
+  // usage: default header version of all configurations.
+  // if properties configuration doesn't contain header version, it will be 
considered as 1
+  public static final String PROPERTIES_CONFIGURATION_HEADER_VERSION_1 = "1";

Review Comment:
   I don't think we need "1" and can use `null` if there is no header from 
segment metadata file. 
   
   How about `SEGMENT_METADATA_FILE_FORMAT_VERSION = 1` and use null as 
default. And leave some comments about what the new format would be like.
   ```
   public static final String PROPERTIES_CONFIGURATION_HEADER_VERSION_2 = "2";
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Custom configuration property reader for segment metadata files [pinot]

2024-04-23 Thread via GitHub


klsince commented on code in PR #12440:
URL: https://github.com/apache/pinot/pull/12440#discussion_r1576653550


##
pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java:
##
@@ -43,28 +46,16 @@
  */
 public class CommonsConfigurationUtils {
   private static final Character DEFAULT_LIST_DELIMITER = ',';
+  public static final String VERSION_HEADER_IDENTIFIER = "version";
 
-  private CommonsConfigurationUtils() {
-  }
+  // usage: default header version of all configurations.
+  // if properties configuration doesn't contain header version, it will be 
considered as 1
+  public static final String PROPERTIES_CONFIGURATION_HEADER_VERSION_1 = "1";

Review Comment:
   I don't think we need "1" and can use `null` if there is no header from 
segment metadata file. 
   
   The var name is not good as it encodes the value in the name. how about 
`SEGMENT_METADATA_FILE_FORMAT_VERSION = 1` and use null as default. And better 
leave some comments about what the new format version would be like.
   ```
   public static final String PROPERTIES_CONFIGURATION_HEADER_VERSION_2 = "2";
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Custom configuration property reader for segment metadata files [pinot]

2024-04-23 Thread via GitHub


klsince commented on code in PR #12440:
URL: https://github.com/apache/pinot/pull/12440#discussion_r1576653550


##
pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java:
##
@@ -43,28 +46,16 @@
  */
 public class CommonsConfigurationUtils {
   private static final Character DEFAULT_LIST_DELIMITER = ',';
+  public static final String VERSION_HEADER_IDENTIFIER = "version";
 
-  private CommonsConfigurationUtils() {
-  }
+  // usage: default header version of all configurations.
+  // if properties configuration doesn't contain header version, it will be 
considered as 1
+  public static final String PROPERTIES_CONFIGURATION_HEADER_VERSION_1 = "1";

Review Comment:
   I don't think we need "1" and can use `null` if there is no header from 
segment metadata file. 
   
   The var name is not good as it encodes the value in the name. how about 
`SEGMENT_METADATA_VERSION = 1` as default it's null. and do it need to be 
public?
   ```
   public static final String PROPERTIES_CONFIGURATION_HEADER_VERSION_2 = "2";
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Custom configuration property reader for segment metadata files [pinot]

2024-04-23 Thread via GitHub


klsince commented on code in PR #12440:
URL: https://github.com/apache/pinot/pull/12440#discussion_r1576650089


##
pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java:
##
@@ -43,28 +46,16 @@
  */
 public class CommonsConfigurationUtils {
   private static final Character DEFAULT_LIST_DELIMITER = ',';
+  public static final String VERSION_HEADER_IDENTIFIER = "version";

Review Comment:
   SEGMENT_METADATA_VERSION_HEADER_IDENTIFIER = "version" as we only use 
versions to decide the reader/writer for the segment metadata, and leaving all 
other users of this config util intact.



##
pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java:
##
@@ -271,20 +319,73 @@ public static String 
recoverSpecialCharacterInPropertyValue(String value) {
 return value.replace("\0\0", ",");
   }
 
-  private static PropertiesConfiguration createPropertiesConfiguration(boolean 
setIOFactory,
-  boolean setDefaultDelimiter) {
+  /**
+   * creates the instance of the {@link 
org.apache.commons.configuration2.PropertiesConfiguration}
+   * with custom IO factory based on kind {@link 
org.apache.commons.configuration2.PropertiesConfiguration.IOFactory}
+   * and legacy list delimiter {@link 
org.apache.commons.configuration2.convert.LegacyListDelimiterHandler}
+   *
+   * @param setDefaultDelimiter sets the default list delimiter.
+   * @param ioFactoryKind IOFactory kind
+   * @return PropertiesConfiguration
+   */
+  private static PropertiesConfiguration createPropertiesConfiguration(boolean 
setDefaultDelimiter,
+  PropertyIOFactoryKind ioFactoryKind) {
 PropertiesConfiguration config = new PropertiesConfiguration();
 
-// setting IO Reader Factory
-if (setIOFactory) {
-  config.setIOFactory(new ConfigFilePropertyReaderFactory());
-}
+// setting IO Reader Factory of the configuration.
+config.setIOFactory(createPropertyIOFactory(ioFactoryKind));
 
-// setting DEFAULT_LIST_DELIMITER
+// setting the DEFAULT_LIST_DELIMITER
 if (setDefaultDelimiter) {
   config.setListDelimiterHandler(new 
LegacyListDelimiterHandler(DEFAULT_LIST_DELIMITER));
 }
 
 return config;
   }
+
+  /**
+   * Creates the IOFactory based on the provided kind.
+   * @param ioFactoryKind IOFactory kind
+   * @return IOFactory
+   */
+  private static IOFactory createPropertyIOFactory(PropertyIOFactoryKind 
ioFactoryKind) {

Review Comment:
   We can define a get() or getInstance() method in the Enum 
`PropertyIOFactoryKind` so that we can remove this helper method 
createPropertyIOFactory(). 



##
pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java:
##
@@ -128,6 +156,18 @@ public static PropertiesConfiguration fromFile(File file, 
boolean setIOFactory,
 return config;
   }
 
+  public static void saveSegmentMetadataToFile(PropertiesConfiguration 
propertiesConfiguration, File file,
+  String versionHeader) {
+if (StringUtils.isNotEmpty(versionHeader)) {

Review Comment:
   I think you can do the check as part of the if-check condition.



##
pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java:
##
@@ -74,19 +65,19 @@ public static PropertiesConfiguration 
fromInputStream(InputStream stream)
*/
   public static PropertiesConfiguration fromPath(String path)
   throws ConfigurationException {
-return fromPath(path, false, true);
+return fromPath(path, true, 
PropertyIOFactoryKind.DefaultPropertyConfigurationIOFactory);
   }
 
   /**
* Instantiate a {@link PropertiesConfiguration} from an {@link String}.
* @param path representing the path of file
-   * @param setIOFactory representing to set the IOFactory or not
* @param setDefaultDelimiter representing to set the default list delimiter.
* @return a {@link PropertiesConfiguration} instance.
*/
-  public static PropertiesConfiguration fromPath(String path, boolean 
setIOFactory, boolean setDefaultDelimiter)
+  public static PropertiesConfiguration fromPath(String path, boolean 
setDefaultDelimiter,
+  PropertyIOFactoryKind ioFactoryKind)

Review Comment:
   how about pass in `@Nullable IOFactory ioFactory` to those util methods? so 
that we decouple those util methods from the logic of deciding the type and 
initializing the ioFactory object.



##
pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java:
##
@@ -43,28 +46,16 @@
  */
 public class CommonsConfigurationUtils {
   private static final Character DEFAULT_LIST_DELIMITER = ',';
+  public static final String VERSION_HEADER_IDENTIFIER = "version";
 
-  private CommonsConfigurationUtils() {
-  }
+  // usage: default header version of all configurations.
+  // if properties configuration doesn't contain header version, it will be 
considered as 1
+  public static final String PROPERTIES_CONFIGURATION_HEADER_VERSION_1 = "1";

Review Com

Re: [PR] Custom configuration property reader for segment metadata files [pinot]

2024-04-22 Thread via GitHub


abhioncbr commented on code in PR #12440:
URL: https://github.com/apache/pinot/pull/12440#discussion_r1575127192


##
pinot-spi/src/main/java/org/apache/pinot/spi/env/PropertyIOFactoryKind.java:
##
@@ -0,0 +1,25 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.env;
+
+public enum PropertyIOFactoryKind {

Review Comment:
   I plan to use these enums to define table/cluster configs. I want to keep 
these enums as of now, and if, in the next step of work, found not very 
helpful, I will delete them. Thanks



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Custom configuration property reader for segment metadata files [pinot]

2024-04-17 Thread via GitHub


klsince commented on code in PR #12440:
URL: https://github.com/apache/pinot/pull/12440#discussion_r1569256706


##
pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java:
##
@@ -287,4 +337,51 @@ private static PropertiesConfiguration 
createPropertiesConfiguration(boolean set
 
 return config;
   }
+
+  /**
+   * Creates the IOFactory based on the provided kind.
+   *
+   * @param ioFactoryKind IOFactory kind
+   * @return IOFactory
+   */
+  private static IOFactory createPropertyIOFactory(PropertyIOFactoryKind 
ioFactoryKind) {
+switch (ioFactoryKind) {
+  case ConfigFileIOFactory:
+return new ConfigFilePropertyIOFactory();
+  case SegmentMetadataIOFactory:
+return new SegmentMetadataPropertyIOFactory();
+  default:
+return new PropertiesConfiguration.DefaultIOFactory();
+}
+  }
+
+  /**
+   * checks whether the configuration file first line is version header or not.
+   * @param file configuration file
+   * @return String
+   * @throws ConfigurationException exception.
+   */
+  private static String getConfigurationHeaderVersion(File file)
+  throws ConfigurationException {
+String versionValue = 
PropertyIOFactoryKind.DefaultPropertyConfigurationIOFactory.getVersion();
+if (file.exists()) {
+  try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
+String fileFirstLine = reader.readLine();
+// header version is written as a comment and start with '# '
+String versionHeaderCommentPrefix = String.format("# %s", 
VERSION_HEADER_IDENTIFIER);
+// check whether the segment has the version header or not

Review Comment:
   nit: s/segment/file, as this method doesn't have the concept of `segment`



##
pinot-spi/src/test/java/org/apache/pinot/spi/env/SegmentMetadataPropertyConfigTest.java:
##
@@ -0,0 +1,213 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.env;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.List;
+import java.util.Objects;
+import org.apache.commons.configuration2.PropertiesConfiguration;
+import org.apache.commons.configuration2.ex.ConfigurationException;
+import org.apache.commons.io.FileUtils;
+import org.apache.commons.lang3.RandomStringUtils;
+import org.testng.annotations.AfterClass;
+import org.testng.annotations.AfterMethod;
+import org.testng.annotations.BeforeClass;
+import org.testng.annotations.Test;
+
+import static org.testng.Assert.assertEquals;
+import static org.testng.Assert.assertNotNull;
+import static org.testng.Assert.assertNull;
+
+
+public class SegmentMetadataPropertyConfigTest {
+  private static final File TEMP_DIR = new File(FileUtils.getTempDirectory(), 
"SegmentMetadataPropertyConfigTest");
+  private static final File CONFIG_FILE = new File(TEMP_DIR, "config");
+  private static final String[] TEST_PROPERTY_KEY = { "test1", "test2_key", 
"test3_key_",
+  "test4_key_1234", "test-1", "test.1" };
+  private static final String[] TEST_PROPERTY_KEY_WITH_SPECIAL_CHAR = { 
"test:1", "test2=key",
+  "test3,key_", "test4:=._key_1234", "test5-1=" };
+
+  @BeforeClass
+  public void setUp()
+  throws IOException {
+FileUtils.deleteDirectory(TEMP_DIR);
+  }
+
+  @AfterClass
+  @AfterMethod
+  public void tearDown()
+  throws IOException {
+FileUtils.deleteDirectory(TEMP_DIR);
+  }
+
+  @Test
+  public void testSegmentMetadataPropertyConfiguration()
+  throws ConfigurationException {
+PropertiesConfiguration configuration =
+CommonsConfigurationUtils.getSegmentMetadataFromFile(CONFIG_FILE, 
true, true);
+
+// setting the random value of the test keys
+for (String key: TEST_PROPERTY_KEY) {
+  configuration.setProperty(key, RandomStringUtils.randomAscii(5));
+}
+// recovered keys from the configuration.
+List recoveredKeys = 
CommonsConfigurationUtils.getKeys(configuration);
+testPropertyKeys(recoveredKeys, TEST_PROPERTY_KEY);
+
+CommonsConfigurationUtils.saveSegmentMetadataToFile(configuration, 
CONFIG_FILE, null); // save the configuration.
+
+// reading the configuration from saved file.
+configuration = 

Re: [PR] Custom configuration property reader for segment metadata files [pinot]

2024-04-09 Thread via GitHub


abhioncbr commented on PR #12440:
URL: https://github.com/apache/pinot/pull/12440#issuecomment-2046294955

   @Jackie-Jiang / @klsince, please review it once more. Thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Custom configuration property reader for segment metadata files [pinot]

2024-04-05 Thread via GitHub


Jackie-Jiang commented on code in PR #12440:
URL: https://github.com/apache/pinot/pull/12440#discussion_r1554429709


##
pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java:
##
@@ -287,4 +337,46 @@ private static PropertiesConfiguration 
createPropertiesConfiguration(boolean set
 
 return config;
   }
+
+  /**
+   * Creates the IOFactory based on the provided kind.
+   *
+   * @param ioFactoryKind IOFactory kind
+   * @return IOFactory
+   */
+  private static IOFactory createPropertyIOFactory(PropertyIOFactoryKind 
ioFactoryKind) {
+switch (ioFactoryKind) {
+  case ConfigFileIOFactory:
+return new ConfigFilePropertyIOFactory();
+  case SegmentMetadataIOFactory:
+return new SegmentMetadataPropertyIOFactory();
+  default:
+return new PropertiesConfiguration.DefaultIOFactory();
+}
+  }
+
+  /**
+   * checks whether the segment metadata file first line is version header or 
not.
+   * @param segmentMetadataFile segment metadata file
+   * @return boolean
+   * @throws ConfigurationException exception.
+   */
+  private static boolean checkHeaderExistsInSegmentMetadata(File 
segmentMetadataFile)

Review Comment:
   Can we make this method return the version number of the file? Then we can 
have different writer/reader pair for different version number. In the current 
PR we treat `DefaultPropertyConfigurationIOFactory` as v1, and 
`SegmentMetadataIOFactory` as v2 which is not explicit. Consider grouping them 
with version number.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Custom configuration property reader for segment metadata files [pinot]

2024-04-05 Thread via GitHub


Jackie-Jiang commented on code in PR #12440:
URL: https://github.com/apache/pinot/pull/12440#discussion_r1554426767


##
pinot-spi/src/test/resources/segment-metadata-without-version-header.properties:
##
@@ -0,0 +1,123 @@
+segment.padding.character = \u
+segment.name = fineFoodReviews_OFFLINE_0
+segment.table.name = fineFoodReviews
+segment.dimension.column.names = 
ProductId,Score,Summary,Text,UserId,combined,embedding,n_tokens
+segment.total.docs = 1000
+custom.input.data.file.uri = 
~/examples/batch/fineFoodReviews/rawdata/fine_food_reviews_with_embeddings_1k.parquet.gzip
+column.ProductId.cardinality = 868
+column.ProductId.totalDocs = 1000
+column.ProductId.dataType = STRING
+column.ProductId.bitsPerElement = 10
+column.ProductId.lengthOfEachEntry = 10
+column.ProductId.columnType = DIMENSION
+column.ProductId.isSorted = false
+column.ProductId.hasDictionary = true
+column.ProductId.isSingleValues = true
+column.ProductId.maxNumberOfMultiValues = 0
+column.ProductId.totalNumberOfEntries = 1000
+column.ProductId.isAutoGenerated = false
+column.ProductId.minValue = 7310172001

Review Comment:
   Can we add some escaped values in this file? We want to ensure the escaped 
values can be properly read. Same for the v2 config



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Custom configuration property reader for segment metadata files [pinot]

2024-04-05 Thread via GitHub


Jackie-Jiang commented on code in PR #12440:
URL: https://github.com/apache/pinot/pull/12440#discussion_r1554426108


##
pinot-spi/src/test/resources/segment-metadata-with-version-header.properties:
##
@@ -0,0 +1,125 @@
+# segment.metadata.version=version1

Review Comment:
   Why is it start from `version1`?
   I'd suggest treating the one without header as version 1, and the new format 
starts as version 2.
   Also suggest simplifying it as `# version=2`. This is a general setting for 
all configurations, instead of just the segment metadata



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Custom configuration property reader for segment metadata files [pinot]

2024-04-04 Thread via GitHub


abhioncbr commented on code in PR #12440:
URL: https://github.com/apache/pinot/pull/12440#discussion_r1552619903


##
pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java:
##
@@ -84,9 +90,10 @@ public static PropertiesConfiguration fromPath(String path)
* @param setDefaultDelimiter representing to set the default list delimiter.
* @return a {@link PropertiesConfiguration} instance.
*/
-  public static PropertiesConfiguration fromPath(String path, boolean 
setIOFactory, boolean setDefaultDelimiter)
+  public static PropertiesConfiguration fromPath(String path, boolean 
setIOFactory, boolean setDefaultDelimiter,

Review Comment:
   Well this is the util class, I think having this as public make sense. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Custom configuration property reader for segment metadata files [pinot]

2024-04-04 Thread via GitHub


abhioncbr commented on code in PR #12440:
URL: https://github.com/apache/pinot/pull/12440#discussion_r1552619464


##
pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java:
##
@@ -54,7 +59,8 @@ private CommonsConfigurationUtils() {
*/
   public static PropertiesConfiguration fromFile(File file)
   throws ConfigurationException {
-return fromFile(file, false, true);
+return fromFile(file, false, true,
+PropertyIOFactoryKind.DefaultPropertyConfigurationIOFactory);

Review Comment:
   imo, The default IOFactory should be the 
`PropertiesConfiguration.DefaultIOFactory()`, which is provided by 
commons-configuration2. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Custom configuration property reader for segment metadata files [pinot]

2024-04-02 Thread via GitHub


klsince commented on code in PR #12440:
URL: https://github.com/apache/pinot/pull/12440#discussion_r1548331232


##
pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java:
##
@@ -287,4 +342,42 @@ private static PropertiesConfiguration 
createPropertiesConfiguration(boolean set
 
 return config;
   }
+
+  /**
+   * Creates the IOFactory based on the provided kind.
+   *
+   * @param ioFactoryKind IOFactory kind
+   * @return IOFactory
+   */
+  private static IOFactory createPropertyIOFactory(PropertyIOFactoryKind 
ioFactoryKind) {
+switch (ioFactoryKind) {
+  case ConfigFileIOFactory:
+return new ConfigFilePropertyIOFactory();
+  case SegmentMetadataIOFactory:
+return new SegmentMetadataPropertyIOFactory();
+  default:
+return new PropertiesConfiguration.DefaultIOFactory();
+}
+  }
+
+  /**
+   * checks whether the segment metadata file first line is version header or 
not.
+   * @param segmentMetadataFile segment metadata file
+   * @return boolean
+   * @throws IOException exception.
+   */
+  private static boolean checkHeaderExistsInSegmentMetadata(File 
segmentMetadataFile) throws IOException {
+Preconditions.checkNotNull(segmentMetadataFile,

Review Comment:
   do we need to do null check? I don't feel so, but if there is need, let's 
annotate `@Nullable File segmentMetadataFile`. 



##
pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java:
##
@@ -287,4 +342,42 @@ private static PropertiesConfiguration 
createPropertiesConfiguration(boolean set
 
 return config;
   }
+
+  /**
+   * Creates the IOFactory based on the provided kind.
+   *
+   * @param ioFactoryKind IOFactory kind
+   * @return IOFactory
+   */
+  private static IOFactory createPropertyIOFactory(PropertyIOFactoryKind 
ioFactoryKind) {
+switch (ioFactoryKind) {
+  case ConfigFileIOFactory:
+return new ConfigFilePropertyIOFactory();
+  case SegmentMetadataIOFactory:
+return new SegmentMetadataPropertyIOFactory();
+  default:
+return new PropertiesConfiguration.DefaultIOFactory();
+}
+  }
+
+  /**
+   * checks whether the segment metadata file first line is version header or 
not.
+   * @param segmentMetadataFile segment metadata file
+   * @return boolean
+   * @throws IOException exception.
+   */
+  private static boolean checkHeaderExistsInSegmentMetadata(File 
segmentMetadataFile) throws IOException {
+Preconditions.checkNotNull(segmentMetadataFile,
+"Segment Metadata file object can not be null for loading 
configurations");
+
+if (segmentMetadataFile.exists()) {
+  BufferedReader reader = new BufferedReader(new 
FileReader(segmentMetadataFile));

Review Comment:
   try this to close reader even upon any exception and save a few lines.
   ```
   try(BufferredReader reader = ...) {
  return StringUtils.startWith(...);
   }
   ```



##
pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java:
##
@@ -84,9 +90,10 @@ public static PropertiesConfiguration fromPath(String path)
* @param setDefaultDelimiter representing to set the default list delimiter.
* @return a {@link PropertiesConfiguration} instance.
*/
-  public static PropertiesConfiguration fromPath(String path, boolean 
setIOFactory, boolean setDefaultDelimiter)
+  public static PropertiesConfiguration fromPath(String path, boolean 
setIOFactory, boolean setDefaultDelimiter,

Review Comment:
   looks like those fromXXX(, ... PropertyIOFactoryKind ioFactoryKind) can be 
`private` methods for now?



##
pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java:
##
@@ -128,6 +161,18 @@ public static PropertiesConfiguration fromFile(File file, 
boolean setIOFactory,
 return config;
   }
 
+  public static void saveSegmentMetadataToFile(PropertiesConfiguration 
propertiesConfiguration, File file,
+  String versionHeader) {
+  if (StringUtils.isNotEmpty(versionHeader)) {
+String header = String.format("%s=%s", SEGMENT_VERSION_IDENTIFIER, 
versionHeader);
+propertiesConfiguration.setHeader(header);
+
+// setIO appropriate factory

Review Comment:
   the comment was not completed?



##
pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java:
##
@@ -287,4 +342,42 @@ private static PropertiesConfiguration 
createPropertiesConfiguration(boolean set
 
 return config;
   }
+
+  /**
+   * Creates the IOFactory based on the provided kind.
+   *
+   * @param ioFactoryKind IOFactory kind
+   * @return IOFactory
+   */
+  private static IOFactory createPropertyIOFactory(PropertyIOFactoryKind 
ioFactoryKind) {
+switch (ioFactoryKind) {
+  case ConfigFileIOFactory:
+return new ConfigFilePropertyIOFactory();
+  case SegmentMetadataIOFactory:
+return new SegmentMetadataPropertyIOFactory();
+  defa

Re: [PR] Custom configuration property reader for segment metadata files [pinot]

2024-03-21 Thread via GitHub


abhioncbr commented on PR #12440:
URL: https://github.com/apache/pinot/pull/12440#issuecomment-2013435999

   Based on the offline discussion, we decided to have pair of separate custom 
segment metadata reader/writer. We will decide to use custom reader/writer 
based on the version of the segment metadata.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Custom configuration property reader for segment metadata files [pinot]

2024-03-18 Thread via GitHub


klsince commented on code in PR #12440:
URL: https://github.com/apache/pinot/pull/12440#discussion_r1529050750


##
pinot-spi/src/main/java/org/apache/pinot/spi/env/SegmentMetadataPropertyReader.java:
##
@@ -0,0 +1,85 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.env;
+
+import com.google.common.base.Preconditions;
+import java.io.Reader;
+import java.util.List;
+import 
org.apache.commons.configuration2.PropertiesConfiguration.PropertiesReader;
+
+
+/**
+ * SegmentMetadataPropertyReader extends the PropertiesReader
+ * 
+ * Purpose: loads the segment metadata faster
+ *  - by skipping the unescaping of key and
+ *  - parsing the line by splitting based on first occurrence of separator
+ */
+class SegmentMetadataPropertyReader extends PropertiesReader {
+  private boolean _skipUnescapePropertyName;
+  private final String _segmentMetadataVersionHeader;
+
+  public SegmentMetadataPropertyReader(Reader reader, String 
segmentMetadataVersionHeader) {
+super(reader);
+_segmentMetadataVersionHeader = segmentMetadataVersionHeader;
+  }
+
+  @Override
+  protected void parseProperty(final String line) {
+// if newer version of the segment metadata(based on version value in the 
property configuration header)
+// skip the regex based parsing of the line content and splitting the 
content based on first occurrence of separator
+if (!_skipUnescapePropertyName && getCommentLines().size() > 0) {
+  setSkipUnescapePropertyNameFlag();
+}
+
+if (_skipUnescapePropertyName) {
+  String[] keyValue = line.split(getPropertySeparator());
+  Preconditions.checkArgument(keyValue.length == 2, "property content 
split should result in key and value");
+  initPropertyName(keyValue[0]);
+  initPropertyValue(keyValue[1]);
+  initPropertySeparator(getPropertySeparator());
+} else {
+  // for backward compatability, follow the default approach
+  super.parseProperty(line);
+}
+  }
+
+  @Override
+  protected String unescapePropertyName(final String name) {
+// skip the unescaping of the propertyName(key), if newer version of the 
segment metadata.
+if (_skipUnescapePropertyName) {
+  return name;
+}
+return super.unescapePropertyName(name);
+  }
+
+  // set the `_skipUnescapePropertyName` as true if the header comment line 
contains the segment metadata version.
+  private void setSkipUnescapePropertyNameFlag() {
+List commentLines = getCommentLines();
+if (commentLines.size() > 0) {
+  // assumes that header will have two lines
+  // first one with segment version header
+  // and second as a new line.
+  String headerComment = commentLines.get(0);
+if (headerComment.contains(_segmentMetadataVersionHeader)) {

Review Comment:
   format? 
   and why need to assume there are `two lines`? looks like we just use the 1st 
line



##
pinot-spi/src/test/resources/segment-metadata-with-version-header.properties:
##
@@ -0,0 +1,125 @@
+# segment.metadata.version=version1

Review Comment:
   how did this header/comment line get written to the file? I must have missed 
it as I didn't notice any logic related to writing out this line in the 
SegmentMetadataPropertyWriter class



##
pinot-spi/src/main/java/org/apache/pinot/spi/env/PropertyIOFactoryKind.java:
##
@@ -0,0 +1,25 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License

Re: [PR] Custom configuration property reader for segment metadata files [pinot]

2024-03-15 Thread via GitHub


abhioncbr commented on code in PR #12440:
URL: https://github.com/apache/pinot/pull/12440#discussion_r1525985967


##
pinot-spi/src/main/java/org/apache/pinot/spi/env/SegmentMetadataPropertyWriter.java:
##
@@ -0,0 +1,67 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.env;
+
+import java.io.IOException;
+import java.io.Writer;
+import 
org.apache.commons.configuration2.PropertiesConfiguration.PropertiesWriter;
+import org.apache.commons.configuration2.convert.ListDelimiterHandler;
+
+
+/**
+ * SegmentMetadataPropertyWriter extends the PropertiesWriter
+ * 
+ * Purpose: custom property writer for writing the segment metadata faster by 
skipping the escaping of key.
+ */
+public class SegmentMetadataPropertyWriter extends PropertiesWriter {
+  private boolean _skipEscapePropertyName;
+  private final String _segmentMetadataVersionHeader;
+
+  public SegmentMetadataPropertyWriter(final Writer writer, 
ListDelimiterHandler handler,
+  String segmentMetadataVersionHeader) {
+super(writer, handler);
+_segmentMetadataVersionHeader = segmentMetadataVersionHeader;
+  }
+
+  @Override
+  protected String escapeKey(final String key) {
+// skip the escapeKey functionality, if segment metadata has a newer 
version
+// if not newer version, follow the escape for backward compatibility.
+if (_skipEscapePropertyName) {
+  return key;
+}
+return super.escapeKey(key);
+  }
+
+  @Override
+  public void writeln(final String s) throws IOException {

Review Comment:
   Let me try to provide a little more context around how in 
commons-configuration2, properties are getting written to the file or read from 
the file
   
   - Each PropertiesConfiguration object in config2  is associated with the 
[PropertiesConfigurationLayout](https://commons.apache.org/proper/commons-configuration/apidocs/org/apache/commons/configuration2/PropertiesConfigurationLayout.html)
 and this object helps in preserving the comments, header and footer comment of 
the PropertiesConfiguration.
   - Header is a global comment that can be set for the properties file. This 
comment is written at the very start of the file, followed by an empty line. 
[[reference](https://commons.apache.org/proper/commons-configuration/apidocs/org/apache/commons/configuration2/PropertiesConfigurationLayout.html#setHeaderComment(java.lang.String))].
 
 - With above statement, your concern 
   > which may be too late as we already parse/write many lines in the 
expensive ways 
   
is not valid, since before writing the first property we will be able 
to set the flag value(`_skipEscapePropertyName`). Header comment lines are the 
first lines written to file. [[code 
reference](https://github.com/apache/commons-configuration/blob/master/src/main/java/org/apache/commons/configuration2/PropertiesConfigurationLayout.java#L489)].
   
   However I understand it's not a clean implementation. I think introducing 
the separate new method (say `saveSegmentMetadata`) in 
`CommonsConfigurationUtils` can work for us. Before saving, we will set the 
IOFactory again based on table or cluster config and also by that moment 
version header content will also be available.
   
   I have added test case with segment metadata resources 
[with](https://github.com/apache/pinot/pull/12440/files#diff-9e5c3dcfcbdf8a9ae4d2cbdef02df3805e931990151c47d4d3030f2ecf39b63a)
 and 
[without](https://github.com/apache/pinot/pull/12440/files#diff-8d406f553720a36462fae94ddd6c682ea4d9d5b1eebae3f9425d14d10537aa63)
 header.  Please refer.
   
   For `SegmentMetadataPropertyReader`, I think the current implementation is 
correct since the header comment is the first thing read from the file and also 
it's going to be the only comment present in the segment metadata properties 
file, since we don't set any other comment. Also, imo, opening/closing the file 
first to determine the header will be more expensive. 
   
   please let me know if it make sense or you have any further comments. Thanks



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment

Re: [PR] Custom configuration property reader for segment metadata files [pinot]

2024-03-13 Thread via GitHub


klsince commented on code in PR #12440:
URL: https://github.com/apache/pinot/pull/12440#discussion_r1523970058


##
pinot-spi/src/main/java/org/apache/pinot/spi/env/SegmentMetadataPropertyWriter.java:
##
@@ -0,0 +1,67 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.env;
+
+import java.io.IOException;
+import java.io.Writer;
+import 
org.apache.commons.configuration2.PropertiesConfiguration.PropertiesWriter;
+import org.apache.commons.configuration2.convert.ListDelimiterHandler;
+
+
+/**
+ * SegmentMetadataPropertyWriter extends the PropertiesWriter
+ * 
+ * Purpose: custom property writer for writing the segment metadata faster by 
skipping the escaping of key.
+ */
+public class SegmentMetadataPropertyWriter extends PropertiesWriter {
+  private boolean _skipEscapePropertyName;
+  private final String _segmentMetadataVersionHeader;
+
+  public SegmentMetadataPropertyWriter(final Writer writer, 
ListDelimiterHandler handler,
+  String segmentMetadataVersionHeader) {
+super(writer, handler);
+_segmentMetadataVersionHeader = segmentMetadataVersionHeader;
+  }
+
+  @Override
+  protected String escapeKey(final String key) {
+// skip the escapeKey functionality, if segment metadata has a newer 
version
+// if not newer version, follow the escape for backward compatibility.
+if (_skipEscapePropertyName) {
+  return key;
+}
+return super.escapeKey(key);
+  }
+
+  @Override
+  public void writeln(final String s) throws IOException {

Review Comment:
   Thanks for the detailed explanation. 
   
   I still didn't follow why we have to decide version while parsing or writing 
the lines, which may be too late as we  already parse/write many lines in the 
expensive ways. 
   
   Could you provide a example segment metadata file with version header? I 
feel my understanding of adding comments into segment metadata file to include 
version info seems different from yours (or how the config lib works).
   
   For `SegmentMetadataPropertyReader`, I was thinking that we could read 
comment ourselves (just 1st line to be simple as we control the writing logic 
and we can just write a single comment to the config file), and then check if 
there is version header there. If there is a version, then use the new reader, 
which does no escaping and no regex; if no version info in the comment, the we 
continue to use the old reader to read segment metadata.



##
pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java:
##
@@ -100,24 +103,39 @@ public static PropertiesConfiguration fromPath(String 
path, boolean setIOFactory
* @return a {@link PropertiesConfiguration} instance.
*/
   public static PropertiesConfiguration fromInputStream(InputStream stream, 
boolean setIOFactory,
-  boolean setDefaultDelimiter)
+  boolean setDefaultDelimiter, PropertyIOFactoryKind ioFactoryKind)
   throws ConfigurationException {
-PropertiesConfiguration config = 
createPropertiesConfiguration(setIOFactory, setDefaultDelimiter);
+PropertiesConfiguration config = 
createPropertiesConfiguration(setIOFactory, setDefaultDelimiter, ioFactoryKind);
 FileHandler fileHandler = new FileHandler(config);
 fileHandler.load(stream);
 return config;
   }
 
+  public static PropertiesConfiguration segmentMetadataFromFile(File file, 
boolean setIOFactory,
+  boolean setDefaultDelimiter, PropertyIOFactoryKind ioFactoryKind, String 
segmentVersionHeader)
+  throws ConfigurationException {
+return fromFile(file, setIOFactory, setDefaultDelimiter, ioFactoryKind, 
segmentVersionHeader);
+  }
+
+  public static PropertiesConfiguration fromFile(File file, boolean 
setIOFactory,
+  boolean setDefaultDelimiter, PropertyIOFactoryKind ioFactoryKind)
+  throws ConfigurationException {
+return fromFile(file, setIOFactory, setDefaultDelimiter, ioFactoryKind, 
"");

Review Comment:
   I see both null and `""` are used by default, anything special to use either 
one? I'd assume we pass null by default



##
pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java:
##
@@ -287,4 +321,23 @@ pri

Re: [PR] Custom configuration property reader for segment metadata files [pinot]

2024-03-12 Thread via GitHub


abhioncbr commented on PR #12440:
URL: https://github.com/apache/pinot/pull/12440#issuecomment-1992849797

   @Jackie-Jiang / @klsince, any further comments on the implementation side. I 
am going to start work on it's  usage based on the cluster configuration or 
table configuration.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Custom configuration property reader for segment metadata files [pinot]

2024-03-11 Thread via GitHub


abhioncbr commented on code in PR #12440:
URL: https://github.com/apache/pinot/pull/12440#discussion_r1519539765


##
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/V1Constants.java:
##
@@ -65,6 +65,8 @@ public static class MetadataKeys {
 public static class Segment {
   public static final String SEGMENT_CREATOR_VERSION = "creator.version";
   public static final String SEGMENT_NAME = "segment.name";
+
+  public static final String SEGMENT_METADATA_VERSION = 
"segment.metadata.version";

Review Comment:
   I am planning to use this constant for writing new segment metadata, however 
that work I am planning to do in two other PRs. So, this is unused from this PR 
perspective.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Custom configuration property reader for segment metadata files [pinot]

2024-03-11 Thread via GitHub


abhioncbr commented on PR #12440:
URL: https://github.com/apache/pinot/pull/12440#issuecomment-1988175225

   > Can we add a test to ensure it can read the old format properly? We can 
put a old format file into the resource folder, and use the new reader to read 
it
   
   Added the required tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Custom configuration property reader for segment metadata files [pinot]

2024-03-07 Thread via GitHub


Jackie-Jiang commented on code in PR #12440:
URL: https://github.com/apache/pinot/pull/12440#discussion_r1516797586


##
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/V1Constants.java:
##
@@ -65,6 +65,8 @@ public static class MetadataKeys {
 public static class Segment {
   public static final String SEGMENT_CREATOR_VERSION = "creator.version";
   public static final String SEGMENT_NAME = "segment.name";
+
+  public static final String SEGMENT_METADATA_VERSION = 
"segment.metadata.version";

Review Comment:
   Is this used? I cannot find any usage of it



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Custom configuration property reader for segment metadata files [pinot]

2024-03-06 Thread via GitHub


abhioncbr commented on code in PR #12440:
URL: https://github.com/apache/pinot/pull/12440#discussion_r1515359972


##
pinot-spi/src/main/java/org/apache/pinot/spi/env/SegmentMetadataPropertyReader.java:
##
@@ -0,0 +1,82 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.env;
+
+import com.google.common.base.Preconditions;
+import java.io.Reader;
+import java.util.List;
+import 
org.apache.commons.configuration2.PropertiesConfiguration.PropertiesReader;
+
+
+/**
+ * SegmentMetadataPropertyReader extends the PropertiesReader
+ * 
+ * Purpose: loads the segment metadata faster
+ *  - by skipping the unescaping of key and
+ *  - parsing the line by splitting based on first occurrence of separator
+ */
+class SegmentMetadataPropertyReader extends PropertiesReader {
+  private boolean _skipUnescapePropertyName;
+  private final String _segmentMetadataVersionHeader;
+
+  public SegmentMetadataPropertyReader(Reader reader, String 
segmentMetadataVersionHeader) {
+super(reader);
+_segmentMetadataVersionHeader = segmentMetadataVersionHeader;
+  }
+
+  @Override
+  protected void parseProperty(final String line) {
+// if newer version of the segment metadata(based on version value in the 
property configuration header)
+// skip the regex based parsing of the line content and splitting the 
content based on first occurrence of separator
+if (!_skipUnescapePropertyName) {

Review Comment:
   Added extra condition(`getCommentLines().size() > 0`) to skip some of the 
unnecessary check. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Custom configuration property reader for segment metadata files [pinot]

2024-03-04 Thread via GitHub


abhioncbr commented on code in PR #12440:
URL: https://github.com/apache/pinot/pull/12440#discussion_r1512037860


##
pinot-spi/src/main/java/org/apache/pinot/spi/env/SegmentMetadataPropertyReader.java:
##
@@ -0,0 +1,82 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.env;
+
+import com.google.common.base.Preconditions;
+import java.io.Reader;
+import java.util.List;
+import 
org.apache.commons.configuration2.PropertiesConfiguration.PropertiesReader;
+
+
+/**
+ * SegmentMetadataPropertyReader extends the PropertiesReader
+ * 
+ * Purpose: loads the segment metadata faster
+ *  - by skipping the unescaping of key and
+ *  - parsing the line by splitting based on first occurrence of separator
+ */
+class SegmentMetadataPropertyReader extends PropertiesReader {
+  private boolean _skipUnescapePropertyName;
+  private final String _segmentMetadataVersionHeader;
+
+  public SegmentMetadataPropertyReader(Reader reader, String 
segmentMetadataVersionHeader) {
+super(reader);
+_segmentMetadataVersionHeader = segmentMetadataVersionHeader;
+  }
+
+  @Override
+  protected void parseProperty(final String line) {
+// if newer version of the segment metadata(based on version value in the 
property configuration header)
+// skip the regex based parsing of the line content and splitting the 
content based on first occurrence of separator
+if (!_skipUnescapePropertyName) {

Review Comment:
   True, this if-check will run for every property if the header is not present 
in the segment metadata. We need to introduce some more conditions here.
   
   Setting the flag value in the constructor wouldn't work if we want to set 
the value based on the header value present. Config with the header is 
determined when the PropertiesConfiguration reader reads the entire file 
content. Header is the first comment line, and it's set after reading the first 
property from the file([code 
reference](https://github.com/apache/commons-configuration/blob/master/src/main/java/org/apache/commons/configuration2/PropertiesConfigurationLayout.java#L672)).
 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Custom configuration property reader for segment metadata files [pinot]

2024-03-04 Thread via GitHub


abhioncbr commented on code in PR #12440:
URL: https://github.com/apache/pinot/pull/12440#discussion_r1512015424


##
pinot-spi/src/main/java/org/apache/pinot/spi/env/SegmentMetadataPropertyWriter.java:
##
@@ -0,0 +1,67 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.env;
+
+import java.io.IOException;
+import java.io.Writer;
+import 
org.apache.commons.configuration2.PropertiesConfiguration.PropertiesWriter;
+import org.apache.commons.configuration2.convert.ListDelimiterHandler;
+
+
+/**
+ * SegmentMetadataPropertyWriter extends the PropertiesWriter
+ * 
+ * Purpose: custom property writer for writing the segment metadata faster by 
skipping the escaping of key.
+ */
+public class SegmentMetadataPropertyWriter extends PropertiesWriter {
+  private boolean _skipEscapePropertyName;
+  private final String _segmentMetadataVersionHeader;
+
+  public SegmentMetadataPropertyWriter(final Writer writer, 
ListDelimiterHandler handler,
+  String segmentMetadataVersionHeader) {
+super(writer, handler);
+_segmentMetadataVersionHeader = segmentMetadataVersionHeader;
+  }
+
+  @Override
+  protected String escapeKey(final String key) {
+// skip the escapeKey functionality, if segment metadata has a newer 
version
+// if not newer version, follow the escape for backward compatibility.
+if (_skipEscapePropertyName) {
+  return key;
+}
+return super.escapeKey(key);
+  }
+
+  @Override
+  public void writeln(final String s) throws IOException {

Review Comment:
   Good question. 
   
   In the first attempt, I was setting `_skipEscapePropertyName` in the 
constructor itself. However, in our current implementation, We set the 
IOFactory along with properties configuration object creation([code 
reference](https://github.com/apache/pinot/blob/master/pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java#L279)).
 So this flag value based on Header content was always getting set to `false`. 
That's why, I decided to populate the flag value when we started writing the 
data into the file.
   
After debugging, I discovered that header content is always the first line 
written to the file, and it's being treated as the comment line in 
commons-configuration2[[code 
reference](https://github.com/apache/commons-configuration/blob/master/src/main/java/org/apache/commons/configuration2/PropertiesConfigurationLayout.java#L489)],
 so, in my opinion, we don't need any escaping, and no property getting written 
before this. 

   My only worry is that in the case of segment metadata without any header, 
this block executes every time we try to write the line. I need input to make 
this check run more conditional. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Custom configuration property reader for segment metadata files [pinot]

2024-03-04 Thread via GitHub


klsince commented on code in PR #12440:
URL: https://github.com/apache/pinot/pull/12440#discussion_r1511713277


##
pinot-spi/src/main/java/org/apache/pinot/spi/env/SegmentMetadataPropertyWriter.java:
##
@@ -0,0 +1,67 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.env;
+
+import java.io.IOException;
+import java.io.Writer;
+import 
org.apache.commons.configuration2.PropertiesConfiguration.PropertiesWriter;
+import org.apache.commons.configuration2.convert.ListDelimiterHandler;
+
+
+/**
+ * SegmentMetadataPropertyWriter extends the PropertiesWriter
+ * 
+ * Purpose: custom property writer for writing the segment metadata faster by 
skipping the escaping of key.
+ */
+public class SegmentMetadataPropertyWriter extends PropertiesWriter {
+  private boolean _skipEscapePropertyName;
+  private final String _segmentMetadataVersionHeader;
+
+  public SegmentMetadataPropertyWriter(final Writer writer, 
ListDelimiterHandler handler,
+  String segmentMetadataVersionHeader) {
+super(writer, handler);
+_segmentMetadataVersionHeader = segmentMetadataVersionHeader;
+  }
+
+  @Override
+  protected String escapeKey(final String key) {
+// skip the escapeKey functionality, if segment metadata has a newer 
version
+// if not newer version, follow the escape for backward compatibility.
+if (_skipEscapePropertyName) {
+  return key;
+}
+return super.escapeKey(key);
+  }
+
+  @Override
+  public void writeln(final String s) throws IOException {

Review Comment:
   should we set this flag `_skipEscapePropertyName` in the constructor method? 
iiuc, we set this once encountering the special property. If so the properties 
written before seeing this property are handled with escaping (?)
   
   Also, how do you write the special property as a header comment? I must have 
missed it.



##
pinot-spi/src/main/java/org/apache/pinot/spi/env/SegmentMetadataPropertyReader.java:
##
@@ -0,0 +1,82 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.env;
+
+import com.google.common.base.Preconditions;
+import java.io.Reader;
+import java.util.List;
+import 
org.apache.commons.configuration2.PropertiesConfiguration.PropertiesReader;
+
+
+/**
+ * SegmentMetadataPropertyReader extends the PropertiesReader
+ * 
+ * Purpose: loads the segment metadata faster
+ *  - by skipping the unescaping of key and
+ *  - parsing the line by splitting based on first occurrence of separator
+ */
+class SegmentMetadataPropertyReader extends PropertiesReader {
+  private boolean _skipUnescapePropertyName;
+  private final String _segmentMetadataVersionHeader;
+
+  public SegmentMetadataPropertyReader(Reader reader, String 
segmentMetadataVersionHeader) {
+super(reader);
+_segmentMetadataVersionHeader = segmentMetadataVersionHeader;
+  }
+
+  @Override
+  protected void parseProperty(final String line) {
+// if newer version of the segment metadata(based on version value in the 
property configuration header)
+// skip the regex based parsing of the line content and splitting the 
content based on first occurrence of separator
+if (!_skipUnescapePropertyName) {

Review Comment:
   for config file w/o this header, it seems like we'd have to enter this 
if-check for every property in the config file?
   
   why not set check and set this flag in the constructor method?



-- 
This is an 

Re: [PR] Custom configuration property reader for segment metadata files [pinot]

2024-03-04 Thread via GitHub


abhioncbr commented on code in PR #12440:
URL: https://github.com/apache/pinot/pull/12440#discussion_r1511264039


##
pinot-spi/src/main/java/org/apache/pinot/spi/env/SegmentMetadataPropertyReader.java:
##
@@ -0,0 +1,82 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.env;
+
+import com.google.common.base.Preconditions;
+import java.io.Reader;
+import java.util.List;
+import 
org.apache.commons.configuration2.PropertiesConfiguration.PropertiesReader;
+
+
+/**
+ * SegmentMetadataPropertyReader extends the PropertiesReader
+ * 
+ * Purpose: loads the segment metadata faster
+ *  - by skipping the unescaping of key and
+ *  - parsing the line by splitting based on first occurrence of separator
+ */
+class SegmentMetadataPropertyReader extends PropertiesReader {
+  private boolean _skipUnescapePropertyName;
+  private final String _segmentMetadataVersionHeader;
+
+  public SegmentMetadataPropertyReader(Reader reader, String 
segmentMetadataVersionHeader) {
+super(reader);
+_segmentMetadataVersionHeader = segmentMetadataVersionHeader;
+  }
+
+  @Override
+  protected void parseProperty(final String line) {
+// if newer version of the segment metadata(based on version value in the 
property configuration header)
+// skip the regex based parsing of the line content and splitting the 
content based on first occurrence of separator
+if (!_skipUnescapePropertyName) {

Review Comment:
   - **Explanation**: We override this method to set the 
`_skipUnescapePropertyName` flag. 
   - **Assumption**: We will not set the version header for the old segment 
metadata. If we decide to add the header for old segment metadata, in that case 
we have to compare the version value as well. 
   
   Header comment is the first content read from the config file, so it will 
set the flag value if `segment.metadata.version` header is present for the 
segment.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Custom configuration property reader for segment metadata files [pinot]

2024-03-04 Thread via GitHub


abhioncbr commented on code in PR #12440:
URL: https://github.com/apache/pinot/pull/12440#discussion_r1511260848


##
pinot-spi/src/main/java/org/apache/pinot/spi/env/SegmentMetadataPropertyWriter.java:
##
@@ -0,0 +1,67 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.env;
+
+import java.io.IOException;
+import java.io.Writer;
+import 
org.apache.commons.configuration2.PropertiesConfiguration.PropertiesWriter;
+import org.apache.commons.configuration2.convert.ListDelimiterHandler;
+
+
+/**
+ * SegmentMetadataPropertyWriter extends the PropertiesWriter
+ * 
+ * Purpose: custom property writer for writing the segment metadata faster by 
skipping the escaping of key.
+ */
+public class SegmentMetadataPropertyWriter extends PropertiesWriter {
+  private boolean _skipEscapePropertyName;
+  private final String _segmentMetadataVersionHeader;
+
+  public SegmentMetadataPropertyWriter(final Writer writer, 
ListDelimiterHandler handler,
+  String segmentMetadataVersionHeader) {
+super(writer, handler);
+_segmentMetadataVersionHeader = segmentMetadataVersionHeader;
+  }
+
+  @Override
+  protected String escapeKey(final String key) {
+// skip the escapeKey functionality, if segment metadata has a newer 
version
+// if not newer version, follow the escape for backward compatibility.
+if (_skipEscapePropertyName) {
+  return key;
+}
+return super.escapeKey(key);
+  }
+
+  @Override
+  public void writeln(final String s) throws IOException {

Review Comment:
   - **Explanation**: We override this method to set the 
`_skipEscapePropertyName` flag. 
   - **Assumption**: We will not set the version header for the old segment 
metadata. If we decide to add the header for old segment metadata, in that case 
we have to compare the version value. 
   
   Header comment is the first get written to config file, so it will set the 
flag value if `segment.metadata.version` header is present for the segment.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Custom configuration property reader for segment metadata files [pinot]

2024-03-03 Thread via GitHub


abhioncbr commented on code in PR #12440:
URL: https://github.com/apache/pinot/pull/12440#discussion_r1509441264


##
pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java:
##
@@ -287,4 +321,35 @@ private static PropertiesConfiguration 
createPropertiesConfiguration(boolean set
 
 return config;
   }
+
+  /**
+   * Creates the IOFactory based on the provided kind.
+   *
+   * @param ioFactoryKind IOFactory kind
+   * @param config to get the header content from the property configuration
+   * @param headerContentToCheck header content to validate.
+   * @return IOFactory
+   */
+  private static IOFactory createPropertyIOFactory(PropertyIOFactoryKind 
ioFactoryKind,
+  PropertiesConfiguration config, String headerContentToCheck) {
+switch (ioFactoryKind) {
+  case ConfigFileIOFactory:
+return new ConfigFilePropertyIOFactory();
+  case SegmentMetadataIOFactory:
+String headerContent = config.getHeader();
+boolean skipEscapeUnescapePropertyName =
+shouldSkipEscapeUnescapePropertyName(headerContent, 
headerContentToCheck);
+return new 
SegmentMetadataPropertyIOFactory(skipEscapeUnescapePropertyName);
+  case DefaultPropertyConfigurationIOFactory:
+  default:
+return new PropertiesConfiguration.DefaultIOFactory();
+}
+  }
+
+  private static boolean shouldSkipEscapeUnescapePropertyName(String header, 
String headerContentToCheck) {

Review Comment:
   We are introducing the segment metadata version header, so it should not be 
present for old metadata properties. That's why checking whether the header 
contains `segment.metadata.version`  or not.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Custom configuration property reader for segment metadata files [pinot]

2024-03-01 Thread via GitHub


abhioncbr commented on code in PR #12440:
URL: https://github.com/apache/pinot/pull/12440#discussion_r1509441264


##
pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java:
##
@@ -287,4 +321,35 @@ private static PropertiesConfiguration 
createPropertiesConfiguration(boolean set
 
 return config;
   }
+
+  /**
+   * Creates the IOFactory based on the provided kind.
+   *
+   * @param ioFactoryKind IOFactory kind
+   * @param config to get the header content from the property configuration
+   * @param headerContentToCheck header content to validate.
+   * @return IOFactory
+   */
+  private static IOFactory createPropertyIOFactory(PropertyIOFactoryKind 
ioFactoryKind,
+  PropertiesConfiguration config, String headerContentToCheck) {
+switch (ioFactoryKind) {
+  case ConfigFileIOFactory:
+return new ConfigFilePropertyIOFactory();
+  case SegmentMetadataIOFactory:
+String headerContent = config.getHeader();
+boolean skipEscapeUnescapePropertyName =
+shouldSkipEscapeUnescapePropertyName(headerContent, 
headerContentToCheck);
+return new 
SegmentMetadataPropertyIOFactory(skipEscapeUnescapePropertyName);
+  case DefaultPropertyConfigurationIOFactory:
+  default:
+return new PropertiesConfiguration.DefaultIOFactory();
+}
+  }
+
+  private static boolean shouldSkipEscapeUnescapePropertyName(String header, 
String headerContentToCheck) {

Review Comment:
   We are introducing the segment metadata version header, so it should not be 
present for old metadata properties. That's why checking whether the header 
contains `segment.metadata.version`  or not.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Custom configuration property reader for segment metadata files [pinot]

2024-02-26 Thread via GitHub


abhioncbr commented on code in PR #12440:
URL: https://github.com/apache/pinot/pull/12440#discussion_r1503606292


##
pinot-spi/src/main/java/org/apache/pinot/spi/env/SegmentMetadataPropertyReader.java:
##
@@ -0,0 +1,35 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.env;
+
+import java.io.Reader;
+import 
org.apache.commons.configuration2.PropertiesConfiguration.PropertiesReader;
+
+
+public class SegmentMetadataPropertyReader extends PropertiesReader {
+
+  public SegmentMetadataPropertyReader(Reader reader) {
+super(reader);
+  }
+
+  @Override
+  protected String unescapePropertyName(final String name) {

Review Comment:
   Actually, the regex support in `doParseProperty` gives support to[ multiple 
separators](https://github.com/apache/commons-configuration/blob/master/src/main/java/org/apache/commons/configuration2/PropertiesConfiguration.java#L460)
 which can be configured. By default  it's `=` and `:`([code 
link](https://github.com/apache/commons-configuration/blob/master/src/main/java/org/apache/commons/configuration2/PropertiesConfiguration.java#L1027))
   
   Do we want to be very specific with `=` or give some support to the default 
separator? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] Custom configuration property reader for segment metadata files [pinot]

2024-02-26 Thread via GitHub


klsince commented on code in PR #12440:
URL: https://github.com/apache/pinot/pull/12440#discussion_r1503346467


##
pinot-spi/src/main/java/org/apache/pinot/spi/env/SegmentMetadataPropertyReader.java:
##
@@ -0,0 +1,35 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.env;
+
+import java.io.Reader;
+import 
org.apache.commons.configuration2.PropertiesConfiguration.PropertiesReader;
+
+
+public class SegmentMetadataPropertyReader extends PropertiesReader {
+
+  public SegmentMetadataPropertyReader(Reader reader) {
+super(reader);
+  }
+
+  @Override
+  protected String unescapePropertyName(final String name) {

Review Comment:
   in addition to this method, should we also consider to override this method 
of `PropertiesReader` 
   ```
   protected void parseProperty(final String line) {
   final String[] property = doParseProperty(line, false); <--- 
expensive using regex to split `key = value`
   
   ```
   
   I think we might just use the index of the first `=` to split the key and 
value up



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org