Re: [PR] Custom configuration property reader for segment metadata files [pinot]
klsince merged PR #12440: URL: https://github.com/apache/pinot/pull/12440 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
Re: [PR] Custom configuration property reader for segment metadata files [pinot]
klsince commented on code in PR #12440: URL: https://github.com/apache/pinot/pull/12440#discussion_r1619284601 ## pinot-spi/src/main/java/org/apache/pinot/spi/env/VersionedPropertyReader.java: ## @@ -0,0 +1,60 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.pinot.spi.env; + +import com.google.common.base.Preconditions; +import java.io.Reader; +import org.apache.commons.configuration2.PropertiesConfiguration.PropertiesReader; + + +/** + * VersionedPropertyReader extends the PropertiesReader + * + * Purpose: loads the segment metadata faster + * - by skipping the unescaping of key and + * - parsing the line by splitting based on first occurrence of separator + */ +class VersionedPropertyReader extends PropertiesReader { + + public VersionedPropertyReader(Reader reader) { +super(reader); + } + + @Override + protected void parseProperty(final String line) { +// skip the regex based parsing of the line content and splitting the content based on first occurrence of separator +// getPropertySeparator(), in general returns the PropertiesConfiguration `DEFAULT_SEPARATOR` value i.e. ' = '. +String separator = getPropertySeparator(); + Preconditions.checkArgument(CommonsConfigurationUtils.VERSIONED_CONFIG_SEPARATOR.equals(separator), +String.format("versioned property configuration separator should be equal to '%s'", +CommonsConfigurationUtils.VERSIONED_CONFIG_SEPARATOR)); + +String[] keyValue = line.split(getPropertySeparator()); +Preconditions.checkArgument(keyValue.length == 2, "property content split should result in key and value"); Review Comment: This check might fail if the value happen to have the separator The `line.split()` does regex matching internally, so better use `StringUtils.split()`, and `StringUtils.split()` can take a `max` param to return just two parts to handle the issue said above. And since you have get `separator` above, you can save the calls of getPropertySeparator() on L48 and L52. ## pinot-spi/src/main/java/org/apache/pinot/spi/env/VersionedPropertyReader.java: ## @@ -0,0 +1,60 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.pinot.spi.env; + +import com.google.common.base.Preconditions; +import java.io.Reader; +import org.apache.commons.configuration2.PropertiesConfiguration.PropertiesReader; + + +/** + * VersionedPropertyReader extends the PropertiesReader + * + * Purpose: loads the segment metadata faster + * - by skipping the unescaping of key and + * - parsing the line by splitting based on first occurrence of separator + */ +class VersionedPropertyReader extends PropertiesReader { + + public VersionedPropertyReader(Reader reader) { +super(reader); + } + + @Override + protected void parseProperty(final String line) { +// skip the regex based parsing of the line content and splitting the content based on first occurrence of separator +// getPropertySeparator(), in general returns the PropertiesConfiguration `DEFAULT_SEPARATOR` value i.e. ' = '. +String separator = getPropertySeparator(); + Preconditions.checkArgument(CommonsConfigurationUtils.VERSIONED_CONFIG_SEPARATOR.equals(separator), +String.format("versioned property configuration separator should be equal to '%s'", Review Comment: nit: "Versioned property configuration separator '%s' should be e
Re: [PR] Custom configuration property reader for segment metadata files [pinot]
klsince commented on code in PR #12440: URL: https://github.com/apache/pinot/pull/12440#discussion_r1617954347 ## pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java: ## @@ -276,20 +334,65 @@ public static String recoverSpecialCharacterInPropertyValue(String value) { return value.replace("\0\0", ","); } - private static PropertiesConfiguration createPropertiesConfiguration(boolean setIOFactory, - boolean setDefaultDelimiter) { + /** + * creates the instance of the {@link org.apache.commons.configuration2.PropertiesConfiguration} + * with custom IO factory based on kind {@link org.apache.commons.configuration2.PropertiesConfiguration.IOFactory} + * and legacy list delimiter {@link org.apache.commons.configuration2.convert.LegacyListDelimiterHandler} + * + * @param setDefaultDelimiter sets the default list delimiter. + * @param ioFactoryKind IOFactory kind, can be null. + * @return PropertiesConfiguration + */ + private static PropertiesConfiguration createPropertiesConfiguration(boolean setDefaultDelimiter, + @Nullable PropertyIOFactoryKind ioFactoryKind) { PropertiesConfiguration config = new PropertiesConfiguration(); -// setting IO Reader Factory -if (setIOFactory) { - config.setIOFactory(new ConfigFilePropertyReaderFactory()); +// setting IO Reader Factory of the configuration. +if (ioFactoryKind != null) { + config.setIOFactory(ioFactoryKind.getInstance()); } -// setting DEFAULT_LIST_DELIMITER +// setting the DEFAULT_LIST_DELIMITER if (setDefaultDelimiter) { config.setListDelimiterHandler(new LegacyListDelimiterHandler(DEFAULT_LIST_DELIMITER)); } return config; } + + /** + * checks whether the configuration file first line is version header or not. + * @param file configuration file + * @return String + * @throws ConfigurationException exception. + */ + private static String getConfigurationHeaderVersion(File file) + throws ConfigurationException { +String versionValue = DEFAULT_PROPERTIES_CONFIGURATION_HEADER_VERSION; +if (file.exists()) { + try (BufferedReader reader = new BufferedReader(new FileReader(file))) { +String fileFirstLine = reader.readLine(); +// header version is written as a comment and start with '# ' +String versionHeaderCommentPrefix = String.format("# %s", VERSION_HEADER_IDENTIFIER); +// check whether the file has the version header or not +if (StringUtils.startsWith(fileFirstLine, versionHeaderCommentPrefix)) { + String[] headerKeyValue = fileFirstLine.split(VERSIONED_CONFIG_SEPARATOR); + if (headerKeyValue.length == 2) { +versionValue = headerKeyValue[1]; + } +} + } catch (IOException exception) { +throw new ConfigurationException( +String.format("Error occurred while reading configuration file %s ", Review Comment: nit: was the whitespace after `%s` left intentionally? otherwise can save `String.format()` and simply `"Error ..." + file.getName()` for short, but not a blocking comment any way. ## pinot-spi/src/main/java/org/apache/pinot/spi/env/VersionedPropertyReader.java: ## @@ -0,0 +1,55 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.pinot.spi.env; + +import com.google.common.base.Preconditions; +import java.io.Reader; +import org.apache.commons.configuration2.PropertiesConfiguration.PropertiesReader; + + +/** + * VersionedPropertyReader extends the PropertiesReader + * + * Purpose: loads the segment metadata faster + * - by skipping the unescaping of key and + * - parsing the line by splitting based on first occurrence of separator + */ +class VersionedPropertyReader extends PropertiesReader { + + public VersionedPropertyReader(Reader reader) { +super(reader); + } + + @Override + protected void parseProperty(final String line) { +// skip the regex based parsing of the line content and splitting the content based on first occurrence of separator +// getPropertySeparator(), in general returns the PropertiesConfiguration `DEFAULT_SEPARATO
Re: [PR] Custom configuration property reader for segment metadata files [pinot]
abhioncbr commented on code in PR #12440: URL: https://github.com/apache/pinot/pull/12440#discussion_r1614759506 ## pinot-spi/src/test/java/org/apache/pinot/spi/env/VersionedPropertyConfigTest.java: ## @@ -0,0 +1,224 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.pinot.spi.env; + +import java.io.File; +import java.io.IOException; +import java.util.List; +import java.util.Objects; +import org.apache.commons.configuration2.PropertiesConfiguration; +import org.apache.commons.configuration2.ex.ConfigurationException; +import org.apache.commons.io.FileUtils; +import org.apache.commons.lang3.RandomStringUtils; +import org.testng.annotations.AfterClass; +import org.testng.annotations.AfterMethod; +import org.testng.annotations.BeforeClass; +import org.testng.annotations.Test; + +import static org.testng.Assert.assertEquals; +import static org.testng.Assert.assertNotNull; +import static org.testng.Assert.assertNull; + + +public class VersionedPropertyConfigTest { + + private static final String COLON_SEPARATOR = ":"; + private static final File TEMP_DIR = new File(FileUtils.getTempDirectory(), "VersionedPropertyConfigTest"); + private static final File CONFIG_FILE = new File(TEMP_DIR, "config"); + private static final String[] TEST_PROPERTY_KEY = { "test1", "test2_key", "test3_key_", + "test4_key_1234", "test-1", "test.1" }; + private static final String[] TEST_PROPERTY_KEY_WITH_SPECIAL_CHAR = { "test:1", "test2=key", Review Comment: So, in this test case, `test2=key` is a key. the overall property in the file looks like this ```bash test2=key = value. ``` Since we are using the `getPropertySeparator` method in the reader, which returns the separator value as ` = `, it's working correctly. I think with this implementation, we can also use `=` in the key. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
Re: [PR] Custom configuration property reader for segment metadata files [pinot]
abhioncbr commented on code in PR #12440: URL: https://github.com/apache/pinot/pull/12440#discussion_r1614757440 ## pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java: ## @@ -42,29 +44,20 @@ * Provide utility functions to manipulate Apache Commons {@link Configuration} instances. */ public class CommonsConfigurationUtils { + + private static final String VERSIONED_CONFIG_SEPARATOR = " = "; Review Comment: This is the same value of `DEFAULT_SEPARATOR` in [PropertiesConfiguration.java](https://github.com/apache/commons-configuration/blob/15e6ec62f1b309bab0bffd139a4790564dc3ca73/src/main/java/org/apache/commons/configuration2/PropertiesConfiguration.java#L1005). We are using the same because in VersionedPropertyReader we are using the [getPropertySeparator](https://github.com/apache/pinot/pull/12440/files#diff-7cfc074e3db46c25491eba3d6df78847cecbb093b4780dbe6a1b8875ffdc3555R42) to split the string into key and value. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
Re: [PR] Custom configuration property reader for segment metadata files [pinot]
abhioncbr commented on code in PR #12440: URL: https://github.com/apache/pinot/pull/12440#discussion_r1612507311 ## pinot-spi/src/test/java/org/apache/pinot/spi/env/CommonsConfigurationUtilsTest.java: ## @@ -50,6 +53,33 @@ public void tearDown() FileUtils.deleteDirectory(TEMP_DIR); } + @Test + public void testSegmentMetadataFromFile() { +// load the existing config and check the properties config instance +try { + PropertiesConfiguration config = CommonsConfigurationUtils + .getSegmentMetadataFromFile(SEGMENT_METADATA_CONFIG_FILE, true); + assertNotNull(config); + + config.setProperty("testKey", "testValue"); + + // add the segment version header to the file and read it again + CommonsConfigurationUtils.saveSegmentMetadataToFile(config, CONFIG_FILE, + CommonsConfigurationUtils.PROPERTIES_CONFIGURATION_HEADER_VERSION_2); + + // reading the property with header. + config = CommonsConfigurationUtils.getSegmentMetadataFromFile(CONFIG_FILE, true); + assertNotNull(config); + assertEquals(config.getHeader(), "# version=2"); +} catch (Exception ex) { + Assert.fail("should not throw ConfigurationException exception with valid file"); +} + +// load the non-existing file and expect the exception +Assert.expectThrows(NullPointerException.class, +() -> CommonsConfigurationUtils.getSegmentMetadataFromFile(null, true)); Review Comment: addressed in the [PR](https://github.com/apache/pinot/pull/13201). marking it as resolved. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
Re: [PR] Custom configuration property reader for segment metadata files [pinot]
klsince commented on code in PR #12440: URL: https://github.com/apache/pinot/pull/12440#discussion_r1608761946 ## pinot-spi/src/test/java/org/apache/pinot/spi/env/VersionedPropertyConfigTest.java: ## @@ -0,0 +1,224 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.pinot.spi.env; + +import java.io.File; +import java.io.IOException; +import java.util.List; +import java.util.Objects; +import org.apache.commons.configuration2.PropertiesConfiguration; +import org.apache.commons.configuration2.ex.ConfigurationException; +import org.apache.commons.io.FileUtils; +import org.apache.commons.lang3.RandomStringUtils; +import org.testng.annotations.AfterClass; +import org.testng.annotations.AfterMethod; +import org.testng.annotations.BeforeClass; +import org.testng.annotations.Test; + +import static org.testng.Assert.assertEquals; +import static org.testng.Assert.assertNotNull; +import static org.testng.Assert.assertNull; + + +public class VersionedPropertyConfigTest { + + private static final String COLON_SEPARATOR = ":"; + private static final File TEMP_DIR = new File(FileUtils.getTempDirectory(), "VersionedPropertyConfigTest"); + private static final File CONFIG_FILE = new File(TEMP_DIR, "config"); + private static final String[] TEST_PROPERTY_KEY = { "test1", "test2_key", "test3_key_", + "test4_key_1234", "test-1", "test.1" }; + private static final String[] TEST_PROPERTY_KEY_WITH_SPECIAL_CHAR = { "test:1", "test2=key", Review Comment: curious how keys with `=` like `test2=key` get handled properly? because the two white spaces in the separator constant defined in `CommonsConfigurationUtils.java` above? ## pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java: ## @@ -42,29 +44,20 @@ * Provide utility functions to manipulate Apache Commons {@link Configuration} instances. */ public class CommonsConfigurationUtils { + + private static final String VERSIONED_CONFIG_SEPARATOR = " = "; Review Comment: is the white space required? ## pinot-spi/src/test/java/org/apache/pinot/spi/env/CommonsConfigurationUtilsTest.java: ## @@ -50,6 +53,33 @@ public void tearDown() FileUtils.deleteDirectory(TEMP_DIR); } + @Test + public void testSegmentMetadataFromFile() { +// load the existing config and check the properties config instance +try { + PropertiesConfiguration config = CommonsConfigurationUtils + .getSegmentMetadataFromFile(SEGMENT_METADATA_CONFIG_FILE, true); + assertNotNull(config); + + config.setProperty("testKey", "testValue"); + + // add the segment version header to the file and read it again + CommonsConfigurationUtils.saveSegmentMetadataToFile(config, CONFIG_FILE, + CommonsConfigurationUtils.PROPERTIES_CONFIGURATION_HEADER_VERSION_2); + + // reading the property with header. + config = CommonsConfigurationUtils.getSegmentMetadataFromFile(CONFIG_FILE, true); + assertNotNull(config); + assertEquals(config.getHeader(), "# version=2"); +} catch (Exception ex) { + Assert.fail("should not throw ConfigurationException exception with valid file"); +} + +// load the non-existing file and expect the exception +Assert.expectThrows(NullPointerException.class, +() -> CommonsConfigurationUtils.getSegmentMetadataFromFile(null, true)); Review Comment: > we don't need the non-existing file test. is this because we do `fileHandler.setFile(file);` if file doesn't exist? not introduced by this PR, but looks like you added those changes earlier on, so in case you know: 1. for `fromFile()`, why we need to do `fileHandler.setFile(file)` when file doesn't exit? 2. if that's needed, for `fromPath()` method, should we do `fileHandler.setPath(path)` if path doesn't exist as well? ## pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java: ## @@ -271,20 +328,59 @@ public static String recoverSpecialCharacterInPropertyValue(String value) { return value.replace("\0\0", ","); } - private static PropertiesConfiguration createPropertiesConfiguration(boolean
Re: [PR] Custom configuration property reader for segment metadata files [pinot]
abhioncbr commented on code in PR #12440: URL: https://github.com/apache/pinot/pull/12440#discussion_r1605906610 ## pinot-spi/src/main/java/org/apache/pinot/spi/env/SegmentMetadataPropertyReader.java: ## @@ -0,0 +1,54 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.pinot.spi.env; + +import com.google.common.base.Preconditions; +import java.io.Reader; +import org.apache.commons.configuration2.PropertiesConfiguration.PropertiesReader; + + +/** + * SegmentMetadataPropertyReader extends the PropertiesReader + * + * Purpose: loads the segment metadata faster + * - by skipping the unescaping of key and + * - parsing the line by splitting based on first occurrence of separator + */ +class SegmentMetadataPropertyReader extends PropertiesReader { + + public SegmentMetadataPropertyReader(Reader reader) { +super(reader); + } + + @Override + protected void parseProperty(final String line) { +// skip the regex based parsing of the line content and splitting the content based on first occurrence of separator +String[] keyValue = line.split(getPropertySeparator()); +Preconditions.checkArgument(keyValue.length == 2, "property content split should result in key and value"); +initPropertyName(keyValue[0]); +initPropertyValue(keyValue[1]); +initPropertySeparator(getPropertySeparator()); Review Comment: Setting the global separator for versioned properties configurations ensures it gets written with the `=` separator. Also, I added the unit test case. Thanks -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
Re: [PR] Custom configuration property reader for segment metadata files [pinot]
abhioncbr commented on code in PR #12440: URL: https://github.com/apache/pinot/pull/12440#discussion_r1605808245 ## pinot-spi/src/test/java/org/apache/pinot/spi/env/CommonsConfigurationUtilsTest.java: ## @@ -50,6 +53,33 @@ public void tearDown() FileUtils.deleteDirectory(TEMP_DIR); } + @Test + public void testSegmentMetadataFromFile() { +// load the existing config and check the properties config instance +try { + PropertiesConfiguration config = CommonsConfigurationUtils + .getSegmentMetadataFromFile(SEGMENT_METADATA_CONFIG_FILE, true); + assertNotNull(config); + + config.setProperty("testKey", "testValue"); + + // add the segment version header to the file and read it again + CommonsConfigurationUtils.saveSegmentMetadataToFile(config, CONFIG_FILE, + CommonsConfigurationUtils.PROPERTIES_CONFIGURATION_HEADER_VERSION_2); + + // reading the property with header. + config = CommonsConfigurationUtils.getSegmentMetadataFromFile(CONFIG_FILE, true); + assertNotNull(config); + assertEquals(config.getHeader(), "# version=2"); +} catch (Exception ex) { + Assert.fail("should not throw ConfigurationException exception with valid file"); +} + +// load the non-existing file and expect the exception +Assert.expectThrows(NullPointerException.class, +() -> CommonsConfigurationUtils.getSegmentMetadataFromFile(null, true)); Review Comment: we don't need the non-existing file test. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
Re: [PR] Custom configuration property reader for segment metadata files [pinot]
abhioncbr commented on code in PR #12440: URL: https://github.com/apache/pinot/pull/12440#discussion_r1605567663 ## pinot-spi/src/main/java/org/apache/pinot/spi/env/SegmentMetadataPropertyReader.java: ## @@ -0,0 +1,54 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.pinot.spi.env; + +import com.google.common.base.Preconditions; +import java.io.Reader; +import org.apache.commons.configuration2.PropertiesConfiguration.PropertiesReader; + + +/** + * SegmentMetadataPropertyReader extends the PropertiesReader + * + * Purpose: loads the segment metadata faster + * - by skipping the unescaping of key and + * - parsing the line by splitting based on first occurrence of separator + */ +class SegmentMetadataPropertyReader extends PropertiesReader { + + public SegmentMetadataPropertyReader(Reader reader) { +super(reader); + } + + @Override + protected void parseProperty(final String line) { +// skip the regex based parsing of the line content and splitting the content based on first occurrence of separator +String[] keyValue = line.split(getPropertySeparator()); +Preconditions.checkArgument(keyValue.length == 2, "property content split should result in key and value"); +initPropertyName(keyValue[0]); +initPropertyValue(keyValue[1]); +initPropertySeparator(getPropertySeparator()); Review Comment: We are not changing the separator configuration. Also, we only have one [write method ](https://github.com/apache/pinot/pull/12440/files#diff-d500a9bb6a9a2455521f13b93599aeb1db76a0a3a285ffb6a4cef58dcf98e9c7R189), and we can explicitly set the separator value so that in future if default behaviour of commons-configuration2 changes, it wouldn't impact us. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
Re: [PR] Custom configuration property reader for segment metadata files [pinot]
abhioncbr commented on code in PR #12440: URL: https://github.com/apache/pinot/pull/12440#discussion_r1605567663 ## pinot-spi/src/main/java/org/apache/pinot/spi/env/SegmentMetadataPropertyReader.java: ## @@ -0,0 +1,54 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.pinot.spi.env; + +import com.google.common.base.Preconditions; +import java.io.Reader; +import org.apache.commons.configuration2.PropertiesConfiguration.PropertiesReader; + + +/** + * SegmentMetadataPropertyReader extends the PropertiesReader + * + * Purpose: loads the segment metadata faster + * - by skipping the unescaping of key and + * - parsing the line by splitting based on first occurrence of separator + */ +class SegmentMetadataPropertyReader extends PropertiesReader { + + public SegmentMetadataPropertyReader(Reader reader) { +super(reader); + } + + @Override + protected void parseProperty(final String line) { +// skip the regex based parsing of the line content and splitting the content based on first occurrence of separator +String[] keyValue = line.split(getPropertySeparator()); +Preconditions.checkArgument(keyValue.length == 2, "property content split should result in key and value"); +initPropertyName(keyValue[0]); +initPropertyValue(keyValue[1]); +initPropertySeparator(getPropertySeparator()); Review Comment: We are not changing the separator configuration. However, we only have [write method ](https://github.com/apache/pinot/pull/12440/files#diff-d500a9bb6a9a2455521f13b93599aeb1db76a0a3a285ffb6a4cef58dcf98e9c7R189), and we can explicitly set the separator value so that in future if default behaviour of commons-configuration2 changes, it wouldn't impact us. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
Re: [PR] Custom configuration property reader for segment metadata files [pinot]
klsince commented on code in PR #12440: URL: https://github.com/apache/pinot/pull/12440#discussion_r1603867571 ## pinot-spi/src/main/java/org/apache/pinot/spi/env/SegmentMetadataPropertyReader.java: ## @@ -0,0 +1,54 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.pinot.spi.env; + +import com.google.common.base.Preconditions; +import java.io.Reader; +import org.apache.commons.configuration2.PropertiesConfiguration.PropertiesReader; + + +/** + * SegmentMetadataPropertyReader extends the PropertiesReader + * + * Purpose: loads the segment metadata faster + * - by skipping the unescaping of key and + * - parsing the line by splitting based on first occurrence of separator + */ +class SegmentMetadataPropertyReader extends PropertiesReader { + + public SegmentMetadataPropertyReader(Reader reader) { +super(reader); + } + + @Override + protected void parseProperty(final String line) { +// skip the regex based parsing of the line content and splitting the content based on first occurrence of separator +String[] keyValue = line.split(getPropertySeparator()); +Preconditions.checkArgument(keyValue.length == 2, "property content split should result in key and value"); +initPropertyName(keyValue[0]); +initPropertyValue(keyValue[1]); +initPropertySeparator(getPropertySeparator()); Review Comment: currently, how we change the separator configuration while writing the configuration file? Is there a way to force that to be '=' as part of contract of using version 2. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
Re: [PR] Custom configuration property reader for segment metadata files [pinot]
abhioncbr commented on code in PR #12440: URL: https://github.com/apache/pinot/pull/12440#discussion_r1602430072 ## pinot-spi/src/main/java/org/apache/pinot/spi/env/SegmentMetadataPropertyReader.java: ## @@ -0,0 +1,54 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.pinot.spi.env; + +import com.google.common.base.Preconditions; +import java.io.Reader; +import org.apache.commons.configuration2.PropertiesConfiguration.PropertiesReader; + + +/** + * SegmentMetadataPropertyReader extends the PropertiesReader + * + * Purpose: loads the segment metadata faster + * - by skipping the unescaping of key and + * - parsing the line by splitting based on first occurrence of separator + */ +class SegmentMetadataPropertyReader extends PropertiesReader { + + public SegmentMetadataPropertyReader(Reader reader) { +super(reader); + } + + @Override + protected void parseProperty(final String line) { +// skip the regex based parsing of the line content and splitting the content based on first occurrence of separator +String[] keyValue = line.split(getPropertySeparator()); +Preconditions.checkArgument(keyValue.length == 2, "property content split should result in key and value"); +initPropertyName(keyValue[0]); +initPropertyValue(keyValue[1]); +initPropertySeparator(getPropertySeparator()); Review Comment: Good point. I think it only happens if we change the seprator configuration while writing the configuration file. By default, it's `=` and I don't think we want to change that. I can add a specific comment about it for future reference and maybe the unit test around it. Thanks -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
Re: [PR] Custom configuration property reader for segment metadata files [pinot]
klsince commented on code in PR #12440: URL: https://github.com/apache/pinot/pull/12440#discussion_r1601978383 ## pinot-spi/src/test/java/org/apache/pinot/spi/env/CommonsConfigurationUtilsTest.java: ## @@ -50,6 +53,33 @@ public void tearDown() FileUtils.deleteDirectory(TEMP_DIR); } + @Test + public void testSegmentMetadataFromFile() { +// load the existing config and check the properties config instance +try { + PropertiesConfiguration config = CommonsConfigurationUtils + .getSegmentMetadataFromFile(SEGMENT_METADATA_CONFIG_FILE, true); + assertNotNull(config); + + config.setProperty("testKey", "testValue"); + + // add the segment version header to the file and read it again + CommonsConfigurationUtils.saveSegmentMetadataToFile(config, CONFIG_FILE, + CommonsConfigurationUtils.PROPERTIES_CONFIGURATION_HEADER_VERSION_2); + + // reading the property with header. + config = CommonsConfigurationUtils.getSegmentMetadataFromFile(CONFIG_FILE, true); + assertNotNull(config); + assertEquals(config.getHeader(), "# version=2"); +} catch (Exception ex) { + Assert.fail("should not throw ConfigurationException exception with valid file"); Review Comment: put ex.getMessage() in the failure msg to help debug ## pinot-spi/src/main/java/org/apache/pinot/spi/env/SegmentMetadataPropertyReader.java: ## @@ -0,0 +1,54 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.pinot.spi.env; + +import com.google.common.base.Preconditions; +import java.io.Reader; +import org.apache.commons.configuration2.PropertiesConfiguration.PropertiesReader; + + +/** + * SegmentMetadataPropertyReader extends the PropertiesReader + * + * Purpose: loads the segment metadata faster + * - by skipping the unescaping of key and + * - parsing the line by splitting based on first occurrence of separator + */ +class SegmentMetadataPropertyReader extends PropertiesReader { + + public SegmentMetadataPropertyReader(Reader reader) { +super(reader); + } + + @Override + protected void parseProperty(final String line) { +// skip the regex based parsing of the line content and splitting the content based on first occurrence of separator +String[] keyValue = line.split(getPropertySeparator()); +Preconditions.checkArgument(keyValue.length == 2, "property content split should result in key and value"); +initPropertyName(keyValue[0]); +initPropertyValue(keyValue[1]); +initPropertySeparator(getPropertySeparator()); Review Comment: would separator be something other than '='? if so, I wonder if the split logic in the util class would fail e.g. ``` String[] headerKeyValue = fileFirstLine.split("="); ``` ## pinot-spi/src/test/java/org/apache/pinot/spi/env/CommonsConfigurationUtilsTest.java: ## @@ -50,6 +53,33 @@ public void tearDown() FileUtils.deleteDirectory(TEMP_DIR); } + @Test + public void testSegmentMetadataFromFile() { +// load the existing config and check the properties config instance +try { + PropertiesConfiguration config = CommonsConfigurationUtils + .getSegmentMetadataFromFile(SEGMENT_METADATA_CONFIG_FILE, true); + assertNotNull(config); + + config.setProperty("testKey", "testValue"); + + // add the segment version header to the file and read it again + CommonsConfigurationUtils.saveSegmentMetadataToFile(config, CONFIG_FILE, + CommonsConfigurationUtils.PROPERTIES_CONFIGURATION_HEADER_VERSION_2); + + // reading the property with header. + config = CommonsConfigurationUtils.getSegmentMetadataFromFile(CONFIG_FILE, true); + assertNotNull(config); + assertEquals(config.getHeader(), "# version=2"); +} catch (Exception ex) { + Assert.fail("should not throw ConfigurationException exception with valid file"); +} + +// load the non-existing file and expect the exception +Assert.expectThrows(NullPointerException.class, +() -> CommonsConfigurationUtils.getSegmentMetadataFromFile(null, true)); Review Comment: can add another check for non-existing file, in addition to `nu
Re: [PR] Custom configuration property reader for segment metadata files [pinot]
klsince commented on code in PR #12440: URL: https://github.com/apache/pinot/pull/12440#discussion_r1600485846 ## pinot-spi/src/main/java/org/apache/pinot/spi/env/PropertyIOFactoryKind.java: ## @@ -0,0 +1,55 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.pinot.spi.env; + +import org.apache.commons.configuration2.PropertiesConfiguration; + +public enum PropertyIOFactoryKind { + ConfigFileIOFactory { +public String toString() { + return "ConfigFile"; +} + +@Override +public ConfigFilePropertyIOFactory getInstance() { + return new ConfigFilePropertyIOFactory(); +} + }, + SegmentMetadataIOFactory { +public String toString() { + return "SegmentMetadata"; +} +@Override +public SegmentMetadataPropertyIOFactory getInstance() { + return new SegmentMetadataPropertyIOFactory(); +} + }, + DefaultPropertyConfigurationIOFactory { Review Comment: nit: just DefaultIOFactory for short? ## pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java: ## @@ -43,28 +46,16 @@ */ public class CommonsConfigurationUtils { private static final Character DEFAULT_LIST_DELIMITER = ','; + public static final String VERSION_HEADER_IDENTIFIER = "version"; Review Comment: To be generic, should we rename the SegmentMetadataIOFactory to something like VersionedIOFactory? so that it's not just for reading/writing SegmentMetadata files. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
Re: [PR] Custom configuration property reader for segment metadata files [pinot]
abhioncbr commented on code in PR #12440: URL: https://github.com/apache/pinot/pull/12440#discussion_r1577155394 ## pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java: ## @@ -128,13 +155,34 @@ public static PropertiesConfiguration fromFile(File file, boolean setIOFactory, return config; } + /** + * save the segment metadata configuration content into the provided file based on the version header. + * @param propertiesConfiguration a {@link PropertiesConfiguration} instance. + * @param file a {@link File} instance. + * @param versionHeader a {@link String} instance. + */ + public static void saveSegmentMetadataToFile(PropertiesConfiguration propertiesConfiguration, File file, Review Comment: not merging two checks because, in-future, we might want to set the version header with a different IO factory. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
Re: [PR] Custom configuration property reader for segment metadata files [pinot]
abhioncbr commented on code in PR #12440: URL: https://github.com/apache/pinot/pull/12440#discussion_r1576806764 ## pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java: ## @@ -43,28 +46,16 @@ */ public class CommonsConfigurationUtils { private static final Character DEFAULT_LIST_DELIMITER = ','; + public static final String VERSION_HEADER_IDENTIFIER = "version"; Review Comment: I think we can use this header version in specific other scenarios, like moving away from the usage of `LegacyListDelimiterHandler` in future. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
Re: [PR] Custom configuration property reader for segment metadata files [pinot]
Jackie-Jiang commented on code in PR #12440: URL: https://github.com/apache/pinot/pull/12440#discussion_r1576704306 ## pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java: ## @@ -43,28 +46,16 @@ */ public class CommonsConfigurationUtils { private static final Character DEFAULT_LIST_DELIMITER = ','; + public static final String VERSION_HEADER_IDENTIFIER = "version"; - private CommonsConfigurationUtils() { - } + // usage: default header version of all configurations. + // if properties configuration doesn't contain header version, it will be considered as 1 + public static final String PROPERTIES_CONFIGURATION_HEADER_VERSION_1 = "1"; Review Comment: Maybe keeping a default version: ``` public static final int SEGMENT_METADATA_DEFAULT_VERSION = 1; ``` We probably don't need other constants for versions as version should be int -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
Re: [PR] Custom configuration property reader for segment metadata files [pinot]
Jackie-Jiang commented on code in PR #12440: URL: https://github.com/apache/pinot/pull/12440#discussion_r1576704306 ## pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java: ## @@ -43,28 +46,16 @@ */ public class CommonsConfigurationUtils { private static final Character DEFAULT_LIST_DELIMITER = ','; + public static final String VERSION_HEADER_IDENTIFIER = "version"; - private CommonsConfigurationUtils() { - } + // usage: default header version of all configurations. + // if properties configuration doesn't contain header version, it will be considered as 1 + public static final String PROPERTIES_CONFIGURATION_HEADER_VERSION_1 = "1"; Review Comment: Maybe keeping a default version: ``` public static final int DEFAULT_SEGMENT_METADATA_VERSION = 1; ``` We probably don't need other constants for versions as version should be int -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
Re: [PR] Custom configuration property reader for segment metadata files [pinot]
Jackie-Jiang commented on code in PR #12440: URL: https://github.com/apache/pinot/pull/12440#discussion_r1576701431 ## pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java: ## @@ -43,28 +46,16 @@ */ public class CommonsConfigurationUtils { private static final Character DEFAULT_LIST_DELIMITER = ','; + public static final String VERSION_HEADER_IDENTIFIER = "version"; Review Comment: I think this can be common though. If it is explicitly specified (in any configuration file), we take it; if it is not explicitly specified, we treat it as default version -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
Re: [PR] Custom configuration property reader for segment metadata files [pinot]
Jackie-Jiang commented on code in PR #12440: URL: https://github.com/apache/pinot/pull/12440#discussion_r1576701431 ## pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java: ## @@ -43,28 +46,16 @@ */ public class CommonsConfigurationUtils { private static final Character DEFAULT_LIST_DELIMITER = ','; + public static final String VERSION_HEADER_IDENTIFIER = "version"; Review Comment: I think this can be common though. If it is explicitly specified (in any configuration file), we take it; if it is not explicitly specified, we treat it as default version (maybe adding a constant for default version?) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
Re: [PR] Custom configuration property reader for segment metadata files [pinot]
klsince commented on code in PR #12440: URL: https://github.com/apache/pinot/pull/12440#discussion_r1576653550 ## pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java: ## @@ -43,28 +46,16 @@ */ public class CommonsConfigurationUtils { private static final Character DEFAULT_LIST_DELIMITER = ','; + public static final String VERSION_HEADER_IDENTIFIER = "version"; - private CommonsConfigurationUtils() { - } + // usage: default header version of all configurations. + // if properties configuration doesn't contain header version, it will be considered as 1 + public static final String PROPERTIES_CONFIGURATION_HEADER_VERSION_1 = "1"; Review Comment: I don't think we need "1" and can use `null` if there is no header from segment metadata file. How about name the constant below `SEGMENT_METADATA_FILE_FORMAT_VERSION = 1`, and leave some comments about what the new format would be like. ``` public static final String PROPERTIES_CONFIGURATION_HEADER_VERSION_2 = "2"; ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
Re: [PR] Custom configuration property reader for segment metadata files [pinot]
klsince commented on code in PR #12440: URL: https://github.com/apache/pinot/pull/12440#discussion_r1576653550 ## pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java: ## @@ -43,28 +46,16 @@ */ public class CommonsConfigurationUtils { private static final Character DEFAULT_LIST_DELIMITER = ','; + public static final String VERSION_HEADER_IDENTIFIER = "version"; - private CommonsConfigurationUtils() { - } + // usage: default header version of all configurations. + // if properties configuration doesn't contain header version, it will be considered as 1 + public static final String PROPERTIES_CONFIGURATION_HEADER_VERSION_1 = "1"; Review Comment: I don't think we need "1" and can use `null` if there is no header from segment metadata file. How about `SEGMENT_METADATA_FILE_FORMAT_VERSION = 1` and use null as default. And leave some comments about what the new format would be like. ``` public static final String PROPERTIES_CONFIGURATION_HEADER_VERSION_2 = "2"; ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
Re: [PR] Custom configuration property reader for segment metadata files [pinot]
klsince commented on code in PR #12440: URL: https://github.com/apache/pinot/pull/12440#discussion_r1576653550 ## pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java: ## @@ -43,28 +46,16 @@ */ public class CommonsConfigurationUtils { private static final Character DEFAULT_LIST_DELIMITER = ','; + public static final String VERSION_HEADER_IDENTIFIER = "version"; - private CommonsConfigurationUtils() { - } + // usage: default header version of all configurations. + // if properties configuration doesn't contain header version, it will be considered as 1 + public static final String PROPERTIES_CONFIGURATION_HEADER_VERSION_1 = "1"; Review Comment: I don't think we need "1" and can use `null` if there is no header from segment metadata file. The var name is not good as it encodes the value in the name. how about `SEGMENT_METADATA_FILE_FORMAT_VERSION = 1` and use null as default. And better leave some comments about what the new format version would be like. ``` public static final String PROPERTIES_CONFIGURATION_HEADER_VERSION_2 = "2"; ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
Re: [PR] Custom configuration property reader for segment metadata files [pinot]
klsince commented on code in PR #12440: URL: https://github.com/apache/pinot/pull/12440#discussion_r1576653550 ## pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java: ## @@ -43,28 +46,16 @@ */ public class CommonsConfigurationUtils { private static final Character DEFAULT_LIST_DELIMITER = ','; + public static final String VERSION_HEADER_IDENTIFIER = "version"; - private CommonsConfigurationUtils() { - } + // usage: default header version of all configurations. + // if properties configuration doesn't contain header version, it will be considered as 1 + public static final String PROPERTIES_CONFIGURATION_HEADER_VERSION_1 = "1"; Review Comment: I don't think we need "1" and can use `null` if there is no header from segment metadata file. The var name is not good as it encodes the value in the name. how about `SEGMENT_METADATA_VERSION = 1` as default it's null. and do it need to be public? ``` public static final String PROPERTIES_CONFIGURATION_HEADER_VERSION_2 = "2"; ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
Re: [PR] Custom configuration property reader for segment metadata files [pinot]
klsince commented on code in PR #12440: URL: https://github.com/apache/pinot/pull/12440#discussion_r1576650089 ## pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java: ## @@ -43,28 +46,16 @@ */ public class CommonsConfigurationUtils { private static final Character DEFAULT_LIST_DELIMITER = ','; + public static final String VERSION_HEADER_IDENTIFIER = "version"; Review Comment: SEGMENT_METADATA_VERSION_HEADER_IDENTIFIER = "version" as we only use versions to decide the reader/writer for the segment metadata, and leaving all other users of this config util intact. ## pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java: ## @@ -271,20 +319,73 @@ public static String recoverSpecialCharacterInPropertyValue(String value) { return value.replace("\0\0", ","); } - private static PropertiesConfiguration createPropertiesConfiguration(boolean setIOFactory, - boolean setDefaultDelimiter) { + /** + * creates the instance of the {@link org.apache.commons.configuration2.PropertiesConfiguration} + * with custom IO factory based on kind {@link org.apache.commons.configuration2.PropertiesConfiguration.IOFactory} + * and legacy list delimiter {@link org.apache.commons.configuration2.convert.LegacyListDelimiterHandler} + * + * @param setDefaultDelimiter sets the default list delimiter. + * @param ioFactoryKind IOFactory kind + * @return PropertiesConfiguration + */ + private static PropertiesConfiguration createPropertiesConfiguration(boolean setDefaultDelimiter, + PropertyIOFactoryKind ioFactoryKind) { PropertiesConfiguration config = new PropertiesConfiguration(); -// setting IO Reader Factory -if (setIOFactory) { - config.setIOFactory(new ConfigFilePropertyReaderFactory()); -} +// setting IO Reader Factory of the configuration. +config.setIOFactory(createPropertyIOFactory(ioFactoryKind)); -// setting DEFAULT_LIST_DELIMITER +// setting the DEFAULT_LIST_DELIMITER if (setDefaultDelimiter) { config.setListDelimiterHandler(new LegacyListDelimiterHandler(DEFAULT_LIST_DELIMITER)); } return config; } + + /** + * Creates the IOFactory based on the provided kind. + * @param ioFactoryKind IOFactory kind + * @return IOFactory + */ + private static IOFactory createPropertyIOFactory(PropertyIOFactoryKind ioFactoryKind) { Review Comment: We can define a get() or getInstance() method in the Enum `PropertyIOFactoryKind` so that we can remove this helper method createPropertyIOFactory(). ## pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java: ## @@ -128,6 +156,18 @@ public static PropertiesConfiguration fromFile(File file, boolean setIOFactory, return config; } + public static void saveSegmentMetadataToFile(PropertiesConfiguration propertiesConfiguration, File file, + String versionHeader) { +if (StringUtils.isNotEmpty(versionHeader)) { Review Comment: I think you can do the check as part of the if-check condition. ## pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java: ## @@ -74,19 +65,19 @@ public static PropertiesConfiguration fromInputStream(InputStream stream) */ public static PropertiesConfiguration fromPath(String path) throws ConfigurationException { -return fromPath(path, false, true); +return fromPath(path, true, PropertyIOFactoryKind.DefaultPropertyConfigurationIOFactory); } /** * Instantiate a {@link PropertiesConfiguration} from an {@link String}. * @param path representing the path of file - * @param setIOFactory representing to set the IOFactory or not * @param setDefaultDelimiter representing to set the default list delimiter. * @return a {@link PropertiesConfiguration} instance. */ - public static PropertiesConfiguration fromPath(String path, boolean setIOFactory, boolean setDefaultDelimiter) + public static PropertiesConfiguration fromPath(String path, boolean setDefaultDelimiter, + PropertyIOFactoryKind ioFactoryKind) Review Comment: how about pass in `@Nullable IOFactory ioFactory` to those util methods? so that we decouple those util methods from the logic of deciding the type and initializing the ioFactory object. ## pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java: ## @@ -43,28 +46,16 @@ */ public class CommonsConfigurationUtils { private static final Character DEFAULT_LIST_DELIMITER = ','; + public static final String VERSION_HEADER_IDENTIFIER = "version"; - private CommonsConfigurationUtils() { - } + // usage: default header version of all configurations. + // if properties configuration doesn't contain header version, it will be considered as 1 + public static final String PROPERTIES_CONFIGURATION_HEADER_VERSION_1 = "1"; Review Com
Re: [PR] Custom configuration property reader for segment metadata files [pinot]
abhioncbr commented on code in PR #12440: URL: https://github.com/apache/pinot/pull/12440#discussion_r1575127192 ## pinot-spi/src/main/java/org/apache/pinot/spi/env/PropertyIOFactoryKind.java: ## @@ -0,0 +1,25 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.pinot.spi.env; + +public enum PropertyIOFactoryKind { Review Comment: I plan to use these enums to define table/cluster configs. I want to keep these enums as of now, and if, in the next step of work, found not very helpful, I will delete them. Thanks -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
Re: [PR] Custom configuration property reader for segment metadata files [pinot]
klsince commented on code in PR #12440: URL: https://github.com/apache/pinot/pull/12440#discussion_r1569256706 ## pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java: ## @@ -287,4 +337,51 @@ private static PropertiesConfiguration createPropertiesConfiguration(boolean set return config; } + + /** + * Creates the IOFactory based on the provided kind. + * + * @param ioFactoryKind IOFactory kind + * @return IOFactory + */ + private static IOFactory createPropertyIOFactory(PropertyIOFactoryKind ioFactoryKind) { +switch (ioFactoryKind) { + case ConfigFileIOFactory: +return new ConfigFilePropertyIOFactory(); + case SegmentMetadataIOFactory: +return new SegmentMetadataPropertyIOFactory(); + default: +return new PropertiesConfiguration.DefaultIOFactory(); +} + } + + /** + * checks whether the configuration file first line is version header or not. + * @param file configuration file + * @return String + * @throws ConfigurationException exception. + */ + private static String getConfigurationHeaderVersion(File file) + throws ConfigurationException { +String versionValue = PropertyIOFactoryKind.DefaultPropertyConfigurationIOFactory.getVersion(); +if (file.exists()) { + try (BufferedReader reader = new BufferedReader(new FileReader(file))) { +String fileFirstLine = reader.readLine(); +// header version is written as a comment and start with '# ' +String versionHeaderCommentPrefix = String.format("# %s", VERSION_HEADER_IDENTIFIER); +// check whether the segment has the version header or not Review Comment: nit: s/segment/file, as this method doesn't have the concept of `segment` ## pinot-spi/src/test/java/org/apache/pinot/spi/env/SegmentMetadataPropertyConfigTest.java: ## @@ -0,0 +1,213 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.pinot.spi.env; + +import java.io.File; +import java.io.IOException; +import java.util.List; +import java.util.Objects; +import org.apache.commons.configuration2.PropertiesConfiguration; +import org.apache.commons.configuration2.ex.ConfigurationException; +import org.apache.commons.io.FileUtils; +import org.apache.commons.lang3.RandomStringUtils; +import org.testng.annotations.AfterClass; +import org.testng.annotations.AfterMethod; +import org.testng.annotations.BeforeClass; +import org.testng.annotations.Test; + +import static org.testng.Assert.assertEquals; +import static org.testng.Assert.assertNotNull; +import static org.testng.Assert.assertNull; + + +public class SegmentMetadataPropertyConfigTest { + private static final File TEMP_DIR = new File(FileUtils.getTempDirectory(), "SegmentMetadataPropertyConfigTest"); + private static final File CONFIG_FILE = new File(TEMP_DIR, "config"); + private static final String[] TEST_PROPERTY_KEY = { "test1", "test2_key", "test3_key_", + "test4_key_1234", "test-1", "test.1" }; + private static final String[] TEST_PROPERTY_KEY_WITH_SPECIAL_CHAR = { "test:1", "test2=key", + "test3,key_", "test4:=._key_1234", "test5-1=" }; + + @BeforeClass + public void setUp() + throws IOException { +FileUtils.deleteDirectory(TEMP_DIR); + } + + @AfterClass + @AfterMethod + public void tearDown() + throws IOException { +FileUtils.deleteDirectory(TEMP_DIR); + } + + @Test + public void testSegmentMetadataPropertyConfiguration() + throws ConfigurationException { +PropertiesConfiguration configuration = +CommonsConfigurationUtils.getSegmentMetadataFromFile(CONFIG_FILE, true, true); + +// setting the random value of the test keys +for (String key: TEST_PROPERTY_KEY) { + configuration.setProperty(key, RandomStringUtils.randomAscii(5)); +} +// recovered keys from the configuration. +List recoveredKeys = CommonsConfigurationUtils.getKeys(configuration); +testPropertyKeys(recoveredKeys, TEST_PROPERTY_KEY); + +CommonsConfigurationUtils.saveSegmentMetadataToFile(configuration, CONFIG_FILE, null); // save the configuration. + +// reading the configuration from saved file. +configuration =
Re: [PR] Custom configuration property reader for segment metadata files [pinot]
abhioncbr commented on PR #12440: URL: https://github.com/apache/pinot/pull/12440#issuecomment-2046294955 @Jackie-Jiang / @klsince, please review it once more. Thanks -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
Re: [PR] Custom configuration property reader for segment metadata files [pinot]
Jackie-Jiang commented on code in PR #12440: URL: https://github.com/apache/pinot/pull/12440#discussion_r1554429709 ## pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java: ## @@ -287,4 +337,46 @@ private static PropertiesConfiguration createPropertiesConfiguration(boolean set return config; } + + /** + * Creates the IOFactory based on the provided kind. + * + * @param ioFactoryKind IOFactory kind + * @return IOFactory + */ + private static IOFactory createPropertyIOFactory(PropertyIOFactoryKind ioFactoryKind) { +switch (ioFactoryKind) { + case ConfigFileIOFactory: +return new ConfigFilePropertyIOFactory(); + case SegmentMetadataIOFactory: +return new SegmentMetadataPropertyIOFactory(); + default: +return new PropertiesConfiguration.DefaultIOFactory(); +} + } + + /** + * checks whether the segment metadata file first line is version header or not. + * @param segmentMetadataFile segment metadata file + * @return boolean + * @throws ConfigurationException exception. + */ + private static boolean checkHeaderExistsInSegmentMetadata(File segmentMetadataFile) Review Comment: Can we make this method return the version number of the file? Then we can have different writer/reader pair for different version number. In the current PR we treat `DefaultPropertyConfigurationIOFactory` as v1, and `SegmentMetadataIOFactory` as v2 which is not explicit. Consider grouping them with version number. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
Re: [PR] Custom configuration property reader for segment metadata files [pinot]
Jackie-Jiang commented on code in PR #12440: URL: https://github.com/apache/pinot/pull/12440#discussion_r1554426767 ## pinot-spi/src/test/resources/segment-metadata-without-version-header.properties: ## @@ -0,0 +1,123 @@ +segment.padding.character = \u +segment.name = fineFoodReviews_OFFLINE_0 +segment.table.name = fineFoodReviews +segment.dimension.column.names = ProductId,Score,Summary,Text,UserId,combined,embedding,n_tokens +segment.total.docs = 1000 +custom.input.data.file.uri = ~/examples/batch/fineFoodReviews/rawdata/fine_food_reviews_with_embeddings_1k.parquet.gzip +column.ProductId.cardinality = 868 +column.ProductId.totalDocs = 1000 +column.ProductId.dataType = STRING +column.ProductId.bitsPerElement = 10 +column.ProductId.lengthOfEachEntry = 10 +column.ProductId.columnType = DIMENSION +column.ProductId.isSorted = false +column.ProductId.hasDictionary = true +column.ProductId.isSingleValues = true +column.ProductId.maxNumberOfMultiValues = 0 +column.ProductId.totalNumberOfEntries = 1000 +column.ProductId.isAutoGenerated = false +column.ProductId.minValue = 7310172001 Review Comment: Can we add some escaped values in this file? We want to ensure the escaped values can be properly read. Same for the v2 config -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
Re: [PR] Custom configuration property reader for segment metadata files [pinot]
Jackie-Jiang commented on code in PR #12440: URL: https://github.com/apache/pinot/pull/12440#discussion_r1554426108 ## pinot-spi/src/test/resources/segment-metadata-with-version-header.properties: ## @@ -0,0 +1,125 @@ +# segment.metadata.version=version1 Review Comment: Why is it start from `version1`? I'd suggest treating the one without header as version 1, and the new format starts as version 2. Also suggest simplifying it as `# version=2`. This is a general setting for all configurations, instead of just the segment metadata -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
Re: [PR] Custom configuration property reader for segment metadata files [pinot]
abhioncbr commented on code in PR #12440: URL: https://github.com/apache/pinot/pull/12440#discussion_r1552619903 ## pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java: ## @@ -84,9 +90,10 @@ public static PropertiesConfiguration fromPath(String path) * @param setDefaultDelimiter representing to set the default list delimiter. * @return a {@link PropertiesConfiguration} instance. */ - public static PropertiesConfiguration fromPath(String path, boolean setIOFactory, boolean setDefaultDelimiter) + public static PropertiesConfiguration fromPath(String path, boolean setIOFactory, boolean setDefaultDelimiter, Review Comment: Well this is the util class, I think having this as public make sense. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
Re: [PR] Custom configuration property reader for segment metadata files [pinot]
abhioncbr commented on code in PR #12440: URL: https://github.com/apache/pinot/pull/12440#discussion_r1552619464 ## pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java: ## @@ -54,7 +59,8 @@ private CommonsConfigurationUtils() { */ public static PropertiesConfiguration fromFile(File file) throws ConfigurationException { -return fromFile(file, false, true); +return fromFile(file, false, true, +PropertyIOFactoryKind.DefaultPropertyConfigurationIOFactory); Review Comment: imo, The default IOFactory should be the `PropertiesConfiguration.DefaultIOFactory()`, which is provided by commons-configuration2. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
Re: [PR] Custom configuration property reader for segment metadata files [pinot]
klsince commented on code in PR #12440: URL: https://github.com/apache/pinot/pull/12440#discussion_r1548331232 ## pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java: ## @@ -287,4 +342,42 @@ private static PropertiesConfiguration createPropertiesConfiguration(boolean set return config; } + + /** + * Creates the IOFactory based on the provided kind. + * + * @param ioFactoryKind IOFactory kind + * @return IOFactory + */ + private static IOFactory createPropertyIOFactory(PropertyIOFactoryKind ioFactoryKind) { +switch (ioFactoryKind) { + case ConfigFileIOFactory: +return new ConfigFilePropertyIOFactory(); + case SegmentMetadataIOFactory: +return new SegmentMetadataPropertyIOFactory(); + default: +return new PropertiesConfiguration.DefaultIOFactory(); +} + } + + /** + * checks whether the segment metadata file first line is version header or not. + * @param segmentMetadataFile segment metadata file + * @return boolean + * @throws IOException exception. + */ + private static boolean checkHeaderExistsInSegmentMetadata(File segmentMetadataFile) throws IOException { +Preconditions.checkNotNull(segmentMetadataFile, Review Comment: do we need to do null check? I don't feel so, but if there is need, let's annotate `@Nullable File segmentMetadataFile`. ## pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java: ## @@ -287,4 +342,42 @@ private static PropertiesConfiguration createPropertiesConfiguration(boolean set return config; } + + /** + * Creates the IOFactory based on the provided kind. + * + * @param ioFactoryKind IOFactory kind + * @return IOFactory + */ + private static IOFactory createPropertyIOFactory(PropertyIOFactoryKind ioFactoryKind) { +switch (ioFactoryKind) { + case ConfigFileIOFactory: +return new ConfigFilePropertyIOFactory(); + case SegmentMetadataIOFactory: +return new SegmentMetadataPropertyIOFactory(); + default: +return new PropertiesConfiguration.DefaultIOFactory(); +} + } + + /** + * checks whether the segment metadata file first line is version header or not. + * @param segmentMetadataFile segment metadata file + * @return boolean + * @throws IOException exception. + */ + private static boolean checkHeaderExistsInSegmentMetadata(File segmentMetadataFile) throws IOException { +Preconditions.checkNotNull(segmentMetadataFile, +"Segment Metadata file object can not be null for loading configurations"); + +if (segmentMetadataFile.exists()) { + BufferedReader reader = new BufferedReader(new FileReader(segmentMetadataFile)); Review Comment: try this to close reader even upon any exception and save a few lines. ``` try(BufferredReader reader = ...) { return StringUtils.startWith(...); } ``` ## pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java: ## @@ -84,9 +90,10 @@ public static PropertiesConfiguration fromPath(String path) * @param setDefaultDelimiter representing to set the default list delimiter. * @return a {@link PropertiesConfiguration} instance. */ - public static PropertiesConfiguration fromPath(String path, boolean setIOFactory, boolean setDefaultDelimiter) + public static PropertiesConfiguration fromPath(String path, boolean setIOFactory, boolean setDefaultDelimiter, Review Comment: looks like those fromXXX(, ... PropertyIOFactoryKind ioFactoryKind) can be `private` methods for now? ## pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java: ## @@ -128,6 +161,18 @@ public static PropertiesConfiguration fromFile(File file, boolean setIOFactory, return config; } + public static void saveSegmentMetadataToFile(PropertiesConfiguration propertiesConfiguration, File file, + String versionHeader) { + if (StringUtils.isNotEmpty(versionHeader)) { +String header = String.format("%s=%s", SEGMENT_VERSION_IDENTIFIER, versionHeader); +propertiesConfiguration.setHeader(header); + +// setIO appropriate factory Review Comment: the comment was not completed? ## pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java: ## @@ -287,4 +342,42 @@ private static PropertiesConfiguration createPropertiesConfiguration(boolean set return config; } + + /** + * Creates the IOFactory based on the provided kind. + * + * @param ioFactoryKind IOFactory kind + * @return IOFactory + */ + private static IOFactory createPropertyIOFactory(PropertyIOFactoryKind ioFactoryKind) { +switch (ioFactoryKind) { + case ConfigFileIOFactory: +return new ConfigFilePropertyIOFactory(); + case SegmentMetadataIOFactory: +return new SegmentMetadataPropertyIOFactory(); + defa
Re: [PR] Custom configuration property reader for segment metadata files [pinot]
abhioncbr commented on PR #12440: URL: https://github.com/apache/pinot/pull/12440#issuecomment-2013435999 Based on the offline discussion, we decided to have pair of separate custom segment metadata reader/writer. We will decide to use custom reader/writer based on the version of the segment metadata. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
Re: [PR] Custom configuration property reader for segment metadata files [pinot]
klsince commented on code in PR #12440: URL: https://github.com/apache/pinot/pull/12440#discussion_r1529050750 ## pinot-spi/src/main/java/org/apache/pinot/spi/env/SegmentMetadataPropertyReader.java: ## @@ -0,0 +1,85 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.pinot.spi.env; + +import com.google.common.base.Preconditions; +import java.io.Reader; +import java.util.List; +import org.apache.commons.configuration2.PropertiesConfiguration.PropertiesReader; + + +/** + * SegmentMetadataPropertyReader extends the PropertiesReader + * + * Purpose: loads the segment metadata faster + * - by skipping the unescaping of key and + * - parsing the line by splitting based on first occurrence of separator + */ +class SegmentMetadataPropertyReader extends PropertiesReader { + private boolean _skipUnescapePropertyName; + private final String _segmentMetadataVersionHeader; + + public SegmentMetadataPropertyReader(Reader reader, String segmentMetadataVersionHeader) { +super(reader); +_segmentMetadataVersionHeader = segmentMetadataVersionHeader; + } + + @Override + protected void parseProperty(final String line) { +// if newer version of the segment metadata(based on version value in the property configuration header) +// skip the regex based parsing of the line content and splitting the content based on first occurrence of separator +if (!_skipUnescapePropertyName && getCommentLines().size() > 0) { + setSkipUnescapePropertyNameFlag(); +} + +if (_skipUnescapePropertyName) { + String[] keyValue = line.split(getPropertySeparator()); + Preconditions.checkArgument(keyValue.length == 2, "property content split should result in key and value"); + initPropertyName(keyValue[0]); + initPropertyValue(keyValue[1]); + initPropertySeparator(getPropertySeparator()); +} else { + // for backward compatability, follow the default approach + super.parseProperty(line); +} + } + + @Override + protected String unescapePropertyName(final String name) { +// skip the unescaping of the propertyName(key), if newer version of the segment metadata. +if (_skipUnescapePropertyName) { + return name; +} +return super.unescapePropertyName(name); + } + + // set the `_skipUnescapePropertyName` as true if the header comment line contains the segment metadata version. + private void setSkipUnescapePropertyNameFlag() { +List commentLines = getCommentLines(); +if (commentLines.size() > 0) { + // assumes that header will have two lines + // first one with segment version header + // and second as a new line. + String headerComment = commentLines.get(0); +if (headerComment.contains(_segmentMetadataVersionHeader)) { Review Comment: format? and why need to assume there are `two lines`? looks like we just use the 1st line ## pinot-spi/src/test/resources/segment-metadata-with-version-header.properties: ## @@ -0,0 +1,125 @@ +# segment.metadata.version=version1 Review Comment: how did this header/comment line get written to the file? I must have missed it as I didn't notice any logic related to writing out this line in the SegmentMetadataPropertyWriter class ## pinot-spi/src/main/java/org/apache/pinot/spi/env/PropertyIOFactoryKind.java: ## @@ -0,0 +1,25 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License
Re: [PR] Custom configuration property reader for segment metadata files [pinot]
abhioncbr commented on code in PR #12440: URL: https://github.com/apache/pinot/pull/12440#discussion_r1525985967 ## pinot-spi/src/main/java/org/apache/pinot/spi/env/SegmentMetadataPropertyWriter.java: ## @@ -0,0 +1,67 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.pinot.spi.env; + +import java.io.IOException; +import java.io.Writer; +import org.apache.commons.configuration2.PropertiesConfiguration.PropertiesWriter; +import org.apache.commons.configuration2.convert.ListDelimiterHandler; + + +/** + * SegmentMetadataPropertyWriter extends the PropertiesWriter + * + * Purpose: custom property writer for writing the segment metadata faster by skipping the escaping of key. + */ +public class SegmentMetadataPropertyWriter extends PropertiesWriter { + private boolean _skipEscapePropertyName; + private final String _segmentMetadataVersionHeader; + + public SegmentMetadataPropertyWriter(final Writer writer, ListDelimiterHandler handler, + String segmentMetadataVersionHeader) { +super(writer, handler); +_segmentMetadataVersionHeader = segmentMetadataVersionHeader; + } + + @Override + protected String escapeKey(final String key) { +// skip the escapeKey functionality, if segment metadata has a newer version +// if not newer version, follow the escape for backward compatibility. +if (_skipEscapePropertyName) { + return key; +} +return super.escapeKey(key); + } + + @Override + public void writeln(final String s) throws IOException { Review Comment: Let me try to provide a little more context around how in commons-configuration2, properties are getting written to the file or read from the file - Each PropertiesConfiguration object in config2 is associated with the [PropertiesConfigurationLayout](https://commons.apache.org/proper/commons-configuration/apidocs/org/apache/commons/configuration2/PropertiesConfigurationLayout.html) and this object helps in preserving the comments, header and footer comment of the PropertiesConfiguration. - Header is a global comment that can be set for the properties file. This comment is written at the very start of the file, followed by an empty line. [[reference](https://commons.apache.org/proper/commons-configuration/apidocs/org/apache/commons/configuration2/PropertiesConfigurationLayout.html#setHeaderComment(java.lang.String))]. - With above statement, your concern > which may be too late as we already parse/write many lines in the expensive ways is not valid, since before writing the first property we will be able to set the flag value(`_skipEscapePropertyName`). Header comment lines are the first lines written to file. [[code reference](https://github.com/apache/commons-configuration/blob/master/src/main/java/org/apache/commons/configuration2/PropertiesConfigurationLayout.java#L489)]. However I understand it's not a clean implementation. I think introducing the separate new method (say `saveSegmentMetadata`) in `CommonsConfigurationUtils` can work for us. Before saving, we will set the IOFactory again based on table or cluster config and also by that moment version header content will also be available. I have added test case with segment metadata resources [with](https://github.com/apache/pinot/pull/12440/files#diff-9e5c3dcfcbdf8a9ae4d2cbdef02df3805e931990151c47d4d3030f2ecf39b63a) and [without](https://github.com/apache/pinot/pull/12440/files#diff-8d406f553720a36462fae94ddd6c682ea4d9d5b1eebae3f9425d14d10537aa63) header. Please refer. For `SegmentMetadataPropertyReader`, I think the current implementation is correct since the header comment is the first thing read from the file and also it's going to be the only comment present in the segment metadata properties file, since we don't set any other comment. Also, imo, opening/closing the file first to determine the header will be more expensive. please let me know if it make sense or you have any further comments. Thanks -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment
Re: [PR] Custom configuration property reader for segment metadata files [pinot]
klsince commented on code in PR #12440: URL: https://github.com/apache/pinot/pull/12440#discussion_r1523970058 ## pinot-spi/src/main/java/org/apache/pinot/spi/env/SegmentMetadataPropertyWriter.java: ## @@ -0,0 +1,67 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.pinot.spi.env; + +import java.io.IOException; +import java.io.Writer; +import org.apache.commons.configuration2.PropertiesConfiguration.PropertiesWriter; +import org.apache.commons.configuration2.convert.ListDelimiterHandler; + + +/** + * SegmentMetadataPropertyWriter extends the PropertiesWriter + * + * Purpose: custom property writer for writing the segment metadata faster by skipping the escaping of key. + */ +public class SegmentMetadataPropertyWriter extends PropertiesWriter { + private boolean _skipEscapePropertyName; + private final String _segmentMetadataVersionHeader; + + public SegmentMetadataPropertyWriter(final Writer writer, ListDelimiterHandler handler, + String segmentMetadataVersionHeader) { +super(writer, handler); +_segmentMetadataVersionHeader = segmentMetadataVersionHeader; + } + + @Override + protected String escapeKey(final String key) { +// skip the escapeKey functionality, if segment metadata has a newer version +// if not newer version, follow the escape for backward compatibility. +if (_skipEscapePropertyName) { + return key; +} +return super.escapeKey(key); + } + + @Override + public void writeln(final String s) throws IOException { Review Comment: Thanks for the detailed explanation. I still didn't follow why we have to decide version while parsing or writing the lines, which may be too late as we already parse/write many lines in the expensive ways. Could you provide a example segment metadata file with version header? I feel my understanding of adding comments into segment metadata file to include version info seems different from yours (or how the config lib works). For `SegmentMetadataPropertyReader`, I was thinking that we could read comment ourselves (just 1st line to be simple as we control the writing logic and we can just write a single comment to the config file), and then check if there is version header there. If there is a version, then use the new reader, which does no escaping and no regex; if no version info in the comment, the we continue to use the old reader to read segment metadata. ## pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java: ## @@ -100,24 +103,39 @@ public static PropertiesConfiguration fromPath(String path, boolean setIOFactory * @return a {@link PropertiesConfiguration} instance. */ public static PropertiesConfiguration fromInputStream(InputStream stream, boolean setIOFactory, - boolean setDefaultDelimiter) + boolean setDefaultDelimiter, PropertyIOFactoryKind ioFactoryKind) throws ConfigurationException { -PropertiesConfiguration config = createPropertiesConfiguration(setIOFactory, setDefaultDelimiter); +PropertiesConfiguration config = createPropertiesConfiguration(setIOFactory, setDefaultDelimiter, ioFactoryKind); FileHandler fileHandler = new FileHandler(config); fileHandler.load(stream); return config; } + public static PropertiesConfiguration segmentMetadataFromFile(File file, boolean setIOFactory, + boolean setDefaultDelimiter, PropertyIOFactoryKind ioFactoryKind, String segmentVersionHeader) + throws ConfigurationException { +return fromFile(file, setIOFactory, setDefaultDelimiter, ioFactoryKind, segmentVersionHeader); + } + + public static PropertiesConfiguration fromFile(File file, boolean setIOFactory, + boolean setDefaultDelimiter, PropertyIOFactoryKind ioFactoryKind) + throws ConfigurationException { +return fromFile(file, setIOFactory, setDefaultDelimiter, ioFactoryKind, ""); Review Comment: I see both null and `""` are used by default, anything special to use either one? I'd assume we pass null by default ## pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java: ## @@ -287,4 +321,23 @@ pri
Re: [PR] Custom configuration property reader for segment metadata files [pinot]
abhioncbr commented on PR #12440: URL: https://github.com/apache/pinot/pull/12440#issuecomment-1992849797 @Jackie-Jiang / @klsince, any further comments on the implementation side. I am going to start work on it's usage based on the cluster configuration or table configuration. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
Re: [PR] Custom configuration property reader for segment metadata files [pinot]
abhioncbr commented on code in PR #12440: URL: https://github.com/apache/pinot/pull/12440#discussion_r1519539765 ## pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/V1Constants.java: ## @@ -65,6 +65,8 @@ public static class MetadataKeys { public static class Segment { public static final String SEGMENT_CREATOR_VERSION = "creator.version"; public static final String SEGMENT_NAME = "segment.name"; + + public static final String SEGMENT_METADATA_VERSION = "segment.metadata.version"; Review Comment: I am planning to use this constant for writing new segment metadata, however that work I am planning to do in two other PRs. So, this is unused from this PR perspective. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
Re: [PR] Custom configuration property reader for segment metadata files [pinot]
abhioncbr commented on PR #12440: URL: https://github.com/apache/pinot/pull/12440#issuecomment-1988175225 > Can we add a test to ensure it can read the old format properly? We can put a old format file into the resource folder, and use the new reader to read it Added the required tests. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
Re: [PR] Custom configuration property reader for segment metadata files [pinot]
Jackie-Jiang commented on code in PR #12440: URL: https://github.com/apache/pinot/pull/12440#discussion_r1516797586 ## pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/V1Constants.java: ## @@ -65,6 +65,8 @@ public static class MetadataKeys { public static class Segment { public static final String SEGMENT_CREATOR_VERSION = "creator.version"; public static final String SEGMENT_NAME = "segment.name"; + + public static final String SEGMENT_METADATA_VERSION = "segment.metadata.version"; Review Comment: Is this used? I cannot find any usage of it -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
Re: [PR] Custom configuration property reader for segment metadata files [pinot]
abhioncbr commented on code in PR #12440: URL: https://github.com/apache/pinot/pull/12440#discussion_r1515359972 ## pinot-spi/src/main/java/org/apache/pinot/spi/env/SegmentMetadataPropertyReader.java: ## @@ -0,0 +1,82 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.pinot.spi.env; + +import com.google.common.base.Preconditions; +import java.io.Reader; +import java.util.List; +import org.apache.commons.configuration2.PropertiesConfiguration.PropertiesReader; + + +/** + * SegmentMetadataPropertyReader extends the PropertiesReader + * + * Purpose: loads the segment metadata faster + * - by skipping the unescaping of key and + * - parsing the line by splitting based on first occurrence of separator + */ +class SegmentMetadataPropertyReader extends PropertiesReader { + private boolean _skipUnescapePropertyName; + private final String _segmentMetadataVersionHeader; + + public SegmentMetadataPropertyReader(Reader reader, String segmentMetadataVersionHeader) { +super(reader); +_segmentMetadataVersionHeader = segmentMetadataVersionHeader; + } + + @Override + protected void parseProperty(final String line) { +// if newer version of the segment metadata(based on version value in the property configuration header) +// skip the regex based parsing of the line content and splitting the content based on first occurrence of separator +if (!_skipUnescapePropertyName) { Review Comment: Added extra condition(`getCommentLines().size() > 0`) to skip some of the unnecessary check. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
Re: [PR] Custom configuration property reader for segment metadata files [pinot]
abhioncbr commented on code in PR #12440: URL: https://github.com/apache/pinot/pull/12440#discussion_r1512037860 ## pinot-spi/src/main/java/org/apache/pinot/spi/env/SegmentMetadataPropertyReader.java: ## @@ -0,0 +1,82 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.pinot.spi.env; + +import com.google.common.base.Preconditions; +import java.io.Reader; +import java.util.List; +import org.apache.commons.configuration2.PropertiesConfiguration.PropertiesReader; + + +/** + * SegmentMetadataPropertyReader extends the PropertiesReader + * + * Purpose: loads the segment metadata faster + * - by skipping the unescaping of key and + * - parsing the line by splitting based on first occurrence of separator + */ +class SegmentMetadataPropertyReader extends PropertiesReader { + private boolean _skipUnescapePropertyName; + private final String _segmentMetadataVersionHeader; + + public SegmentMetadataPropertyReader(Reader reader, String segmentMetadataVersionHeader) { +super(reader); +_segmentMetadataVersionHeader = segmentMetadataVersionHeader; + } + + @Override + protected void parseProperty(final String line) { +// if newer version of the segment metadata(based on version value in the property configuration header) +// skip the regex based parsing of the line content and splitting the content based on first occurrence of separator +if (!_skipUnescapePropertyName) { Review Comment: True, this if-check will run for every property if the header is not present in the segment metadata. We need to introduce some more conditions here. Setting the flag value in the constructor wouldn't work if we want to set the value based on the header value present. Config with the header is determined when the PropertiesConfiguration reader reads the entire file content. Header is the first comment line, and it's set after reading the first property from the file([code reference](https://github.com/apache/commons-configuration/blob/master/src/main/java/org/apache/commons/configuration2/PropertiesConfigurationLayout.java#L672)). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
Re: [PR] Custom configuration property reader for segment metadata files [pinot]
abhioncbr commented on code in PR #12440: URL: https://github.com/apache/pinot/pull/12440#discussion_r1512015424 ## pinot-spi/src/main/java/org/apache/pinot/spi/env/SegmentMetadataPropertyWriter.java: ## @@ -0,0 +1,67 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.pinot.spi.env; + +import java.io.IOException; +import java.io.Writer; +import org.apache.commons.configuration2.PropertiesConfiguration.PropertiesWriter; +import org.apache.commons.configuration2.convert.ListDelimiterHandler; + + +/** + * SegmentMetadataPropertyWriter extends the PropertiesWriter + * + * Purpose: custom property writer for writing the segment metadata faster by skipping the escaping of key. + */ +public class SegmentMetadataPropertyWriter extends PropertiesWriter { + private boolean _skipEscapePropertyName; + private final String _segmentMetadataVersionHeader; + + public SegmentMetadataPropertyWriter(final Writer writer, ListDelimiterHandler handler, + String segmentMetadataVersionHeader) { +super(writer, handler); +_segmentMetadataVersionHeader = segmentMetadataVersionHeader; + } + + @Override + protected String escapeKey(final String key) { +// skip the escapeKey functionality, if segment metadata has a newer version +// if not newer version, follow the escape for backward compatibility. +if (_skipEscapePropertyName) { + return key; +} +return super.escapeKey(key); + } + + @Override + public void writeln(final String s) throws IOException { Review Comment: Good question. In the first attempt, I was setting `_skipEscapePropertyName` in the constructor itself. However, in our current implementation, We set the IOFactory along with properties configuration object creation([code reference](https://github.com/apache/pinot/blob/master/pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java#L279)). So this flag value based on Header content was always getting set to `false`. That's why, I decided to populate the flag value when we started writing the data into the file. After debugging, I discovered that header content is always the first line written to the file, and it's being treated as the comment line in commons-configuration2[[code reference](https://github.com/apache/commons-configuration/blob/master/src/main/java/org/apache/commons/configuration2/PropertiesConfigurationLayout.java#L489)], so, in my opinion, we don't need any escaping, and no property getting written before this. My only worry is that in the case of segment metadata without any header, this block executes every time we try to write the line. I need input to make this check run more conditional. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
Re: [PR] Custom configuration property reader for segment metadata files [pinot]
klsince commented on code in PR #12440: URL: https://github.com/apache/pinot/pull/12440#discussion_r1511713277 ## pinot-spi/src/main/java/org/apache/pinot/spi/env/SegmentMetadataPropertyWriter.java: ## @@ -0,0 +1,67 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.pinot.spi.env; + +import java.io.IOException; +import java.io.Writer; +import org.apache.commons.configuration2.PropertiesConfiguration.PropertiesWriter; +import org.apache.commons.configuration2.convert.ListDelimiterHandler; + + +/** + * SegmentMetadataPropertyWriter extends the PropertiesWriter + * + * Purpose: custom property writer for writing the segment metadata faster by skipping the escaping of key. + */ +public class SegmentMetadataPropertyWriter extends PropertiesWriter { + private boolean _skipEscapePropertyName; + private final String _segmentMetadataVersionHeader; + + public SegmentMetadataPropertyWriter(final Writer writer, ListDelimiterHandler handler, + String segmentMetadataVersionHeader) { +super(writer, handler); +_segmentMetadataVersionHeader = segmentMetadataVersionHeader; + } + + @Override + protected String escapeKey(final String key) { +// skip the escapeKey functionality, if segment metadata has a newer version +// if not newer version, follow the escape for backward compatibility. +if (_skipEscapePropertyName) { + return key; +} +return super.escapeKey(key); + } + + @Override + public void writeln(final String s) throws IOException { Review Comment: should we set this flag `_skipEscapePropertyName` in the constructor method? iiuc, we set this once encountering the special property. If so the properties written before seeing this property are handled with escaping (?) Also, how do you write the special property as a header comment? I must have missed it. ## pinot-spi/src/main/java/org/apache/pinot/spi/env/SegmentMetadataPropertyReader.java: ## @@ -0,0 +1,82 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.pinot.spi.env; + +import com.google.common.base.Preconditions; +import java.io.Reader; +import java.util.List; +import org.apache.commons.configuration2.PropertiesConfiguration.PropertiesReader; + + +/** + * SegmentMetadataPropertyReader extends the PropertiesReader + * + * Purpose: loads the segment metadata faster + * - by skipping the unescaping of key and + * - parsing the line by splitting based on first occurrence of separator + */ +class SegmentMetadataPropertyReader extends PropertiesReader { + private boolean _skipUnescapePropertyName; + private final String _segmentMetadataVersionHeader; + + public SegmentMetadataPropertyReader(Reader reader, String segmentMetadataVersionHeader) { +super(reader); +_segmentMetadataVersionHeader = segmentMetadataVersionHeader; + } + + @Override + protected void parseProperty(final String line) { +// if newer version of the segment metadata(based on version value in the property configuration header) +// skip the regex based parsing of the line content and splitting the content based on first occurrence of separator +if (!_skipUnescapePropertyName) { Review Comment: for config file w/o this header, it seems like we'd have to enter this if-check for every property in the config file? why not set check and set this flag in the constructor method? -- This is an
Re: [PR] Custom configuration property reader for segment metadata files [pinot]
abhioncbr commented on code in PR #12440: URL: https://github.com/apache/pinot/pull/12440#discussion_r1511264039 ## pinot-spi/src/main/java/org/apache/pinot/spi/env/SegmentMetadataPropertyReader.java: ## @@ -0,0 +1,82 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.pinot.spi.env; + +import com.google.common.base.Preconditions; +import java.io.Reader; +import java.util.List; +import org.apache.commons.configuration2.PropertiesConfiguration.PropertiesReader; + + +/** + * SegmentMetadataPropertyReader extends the PropertiesReader + * + * Purpose: loads the segment metadata faster + * - by skipping the unescaping of key and + * - parsing the line by splitting based on first occurrence of separator + */ +class SegmentMetadataPropertyReader extends PropertiesReader { + private boolean _skipUnescapePropertyName; + private final String _segmentMetadataVersionHeader; + + public SegmentMetadataPropertyReader(Reader reader, String segmentMetadataVersionHeader) { +super(reader); +_segmentMetadataVersionHeader = segmentMetadataVersionHeader; + } + + @Override + protected void parseProperty(final String line) { +// if newer version of the segment metadata(based on version value in the property configuration header) +// skip the regex based parsing of the line content and splitting the content based on first occurrence of separator +if (!_skipUnescapePropertyName) { Review Comment: - **Explanation**: We override this method to set the `_skipUnescapePropertyName` flag. - **Assumption**: We will not set the version header for the old segment metadata. If we decide to add the header for old segment metadata, in that case we have to compare the version value as well. Header comment is the first content read from the config file, so it will set the flag value if `segment.metadata.version` header is present for the segment. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
Re: [PR] Custom configuration property reader for segment metadata files [pinot]
abhioncbr commented on code in PR #12440: URL: https://github.com/apache/pinot/pull/12440#discussion_r1511260848 ## pinot-spi/src/main/java/org/apache/pinot/spi/env/SegmentMetadataPropertyWriter.java: ## @@ -0,0 +1,67 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.pinot.spi.env; + +import java.io.IOException; +import java.io.Writer; +import org.apache.commons.configuration2.PropertiesConfiguration.PropertiesWriter; +import org.apache.commons.configuration2.convert.ListDelimiterHandler; + + +/** + * SegmentMetadataPropertyWriter extends the PropertiesWriter + * + * Purpose: custom property writer for writing the segment metadata faster by skipping the escaping of key. + */ +public class SegmentMetadataPropertyWriter extends PropertiesWriter { + private boolean _skipEscapePropertyName; + private final String _segmentMetadataVersionHeader; + + public SegmentMetadataPropertyWriter(final Writer writer, ListDelimiterHandler handler, + String segmentMetadataVersionHeader) { +super(writer, handler); +_segmentMetadataVersionHeader = segmentMetadataVersionHeader; + } + + @Override + protected String escapeKey(final String key) { +// skip the escapeKey functionality, if segment metadata has a newer version +// if not newer version, follow the escape for backward compatibility. +if (_skipEscapePropertyName) { + return key; +} +return super.escapeKey(key); + } + + @Override + public void writeln(final String s) throws IOException { Review Comment: - **Explanation**: We override this method to set the `_skipEscapePropertyName` flag. - **Assumption**: We will not set the version header for the old segment metadata. If we decide to add the header for old segment metadata, in that case we have to compare the version value. Header comment is the first get written to config file, so it will set the flag value if `segment.metadata.version` header is present for the segment. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
Re: [PR] Custom configuration property reader for segment metadata files [pinot]
abhioncbr commented on code in PR #12440: URL: https://github.com/apache/pinot/pull/12440#discussion_r1509441264 ## pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java: ## @@ -287,4 +321,35 @@ private static PropertiesConfiguration createPropertiesConfiguration(boolean set return config; } + + /** + * Creates the IOFactory based on the provided kind. + * + * @param ioFactoryKind IOFactory kind + * @param config to get the header content from the property configuration + * @param headerContentToCheck header content to validate. + * @return IOFactory + */ + private static IOFactory createPropertyIOFactory(PropertyIOFactoryKind ioFactoryKind, + PropertiesConfiguration config, String headerContentToCheck) { +switch (ioFactoryKind) { + case ConfigFileIOFactory: +return new ConfigFilePropertyIOFactory(); + case SegmentMetadataIOFactory: +String headerContent = config.getHeader(); +boolean skipEscapeUnescapePropertyName = +shouldSkipEscapeUnescapePropertyName(headerContent, headerContentToCheck); +return new SegmentMetadataPropertyIOFactory(skipEscapeUnescapePropertyName); + case DefaultPropertyConfigurationIOFactory: + default: +return new PropertiesConfiguration.DefaultIOFactory(); +} + } + + private static boolean shouldSkipEscapeUnescapePropertyName(String header, String headerContentToCheck) { Review Comment: We are introducing the segment metadata version header, so it should not be present for old metadata properties. That's why checking whether the header contains `segment.metadata.version` or not. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
Re: [PR] Custom configuration property reader for segment metadata files [pinot]
abhioncbr commented on code in PR #12440: URL: https://github.com/apache/pinot/pull/12440#discussion_r1509441264 ## pinot-spi/src/main/java/org/apache/pinot/spi/env/CommonsConfigurationUtils.java: ## @@ -287,4 +321,35 @@ private static PropertiesConfiguration createPropertiesConfiguration(boolean set return config; } + + /** + * Creates the IOFactory based on the provided kind. + * + * @param ioFactoryKind IOFactory kind + * @param config to get the header content from the property configuration + * @param headerContentToCheck header content to validate. + * @return IOFactory + */ + private static IOFactory createPropertyIOFactory(PropertyIOFactoryKind ioFactoryKind, + PropertiesConfiguration config, String headerContentToCheck) { +switch (ioFactoryKind) { + case ConfigFileIOFactory: +return new ConfigFilePropertyIOFactory(); + case SegmentMetadataIOFactory: +String headerContent = config.getHeader(); +boolean skipEscapeUnescapePropertyName = +shouldSkipEscapeUnescapePropertyName(headerContent, headerContentToCheck); +return new SegmentMetadataPropertyIOFactory(skipEscapeUnescapePropertyName); + case DefaultPropertyConfigurationIOFactory: + default: +return new PropertiesConfiguration.DefaultIOFactory(); +} + } + + private static boolean shouldSkipEscapeUnescapePropertyName(String header, String headerContentToCheck) { Review Comment: We are introducing the segment metadata version header, so it should not be present for old metadata properties. That's why checking whether the header contains `segment.metadata.version` or not. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
Re: [PR] Custom configuration property reader for segment metadata files [pinot]
abhioncbr commented on code in PR #12440: URL: https://github.com/apache/pinot/pull/12440#discussion_r1503606292 ## pinot-spi/src/main/java/org/apache/pinot/spi/env/SegmentMetadataPropertyReader.java: ## @@ -0,0 +1,35 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.pinot.spi.env; + +import java.io.Reader; +import org.apache.commons.configuration2.PropertiesConfiguration.PropertiesReader; + + +public class SegmentMetadataPropertyReader extends PropertiesReader { + + public SegmentMetadataPropertyReader(Reader reader) { +super(reader); + } + + @Override + protected String unescapePropertyName(final String name) { Review Comment: Actually, the regex support in `doParseProperty` gives support to[ multiple separators](https://github.com/apache/commons-configuration/blob/master/src/main/java/org/apache/commons/configuration2/PropertiesConfiguration.java#L460) which can be configured. By default it's `=` and `:`([code link](https://github.com/apache/commons-configuration/blob/master/src/main/java/org/apache/commons/configuration2/PropertiesConfiguration.java#L1027)) Do we want to be very specific with `=` or give some support to the default separator? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
Re: [PR] Custom configuration property reader for segment metadata files [pinot]
klsince commented on code in PR #12440: URL: https://github.com/apache/pinot/pull/12440#discussion_r1503346467 ## pinot-spi/src/main/java/org/apache/pinot/spi/env/SegmentMetadataPropertyReader.java: ## @@ -0,0 +1,35 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.pinot.spi.env; + +import java.io.Reader; +import org.apache.commons.configuration2.PropertiesConfiguration.PropertiesReader; + + +public class SegmentMetadataPropertyReader extends PropertiesReader { + + public SegmentMetadataPropertyReader(Reader reader) { +super(reader); + } + + @Override + protected String unescapePropertyName(final String name) { Review Comment: in addition to this method, should we also consider to override this method of `PropertiesReader` ``` protected void parseProperty(final String line) { final String[] property = doParseProperty(line, false); <--- expensive using regex to split `key = value` ``` I think we might just use the index of the first `=` to split the key and value up -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org