[GitHub] [incubator-pinot] haibow commented on a change in pull request #5345: Create PULL_REQUEST_TEMPLATE.md
haibow commented on a change in pull request #5345: URL: https://github.com/apache/incubator-pinot/pull/5345#discussion_r422594460 ## File path: .github/PULL_REQUEST_TEMPLATE.md ## @@ -0,0 +1,13 @@ +## Upgrade Notes +Does this PR prevent a zero down-time upgrade? (Assume upgrade order: Controller, Broker, Server, Minion) +* [ ] Yes (Please label as **backward-incompat**) + +Does this PR fix a zero-downtime upgrade introduced earlier? +* [ ] Yes (Please label this as **backward-incompat**) + +Does this PR otherwise need attention when creating release notes? Things to consider: +- New configuration options +- Deprecation of configurations +- Signature changes to public methods/interfaces +- New plugins added or old plugins removed +* [ ] Yes (Please label this PR as **release-notes**) Review comment: Even with the "Things to consider", the release notes section is still just a Yes/No question. When the answer is Yes, better explicitly ask the author to prepare some release notes, to make the release master's life easier. e.g. (borrowing the norm from Presto) ``` == RELEASE NOTES == General Changes * ... * ... ``` If release note is NOT required, use: ``` == NO RELEASE NOTE == ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[incubator-pinot] branch master updated (9246d7e -> 44b0a91)
This is an automated email from the ASF dual-hosted git repository. xiangfu pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git. from 9246d7e Update package.json to resolve sharp version to fix the local build issue. (#5215) add 44b0a91 Fix NoSuchMethodError on ServerConfiguration.addHttpHandler (#5357) No new revisions were added by this update. Summary of changes: pom.xml | 6 -- 1 file changed, 6 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[GitHub] [incubator-pinot] codecov-io commented on pull request #5357: Fix NoSuchMethodError on ServerConfiguration.addHttpHandler
codecov-io commented on pull request #5357: URL: https://github.com/apache/incubator-pinot/pull/5357#issuecomment-626278291 # [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/5357?src=pr&el=h1) Report > Merging [#5357](https://codecov.io/gh/apache/incubator-pinot/pull/5357?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-pinot/commit/9246d7efc7ce282f35215c887a96a1bca1a2a6a8&el=desc) will **decrease** coverage by `9.40%`. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-pinot/pull/5357/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz)](https://codecov.io/gh/apache/incubator-pinot/pull/5357?src=pr&el=tree) ```diff @@Coverage Diff @@ ## master#5357 +/- ## == - Coverage 66.36% 56.96% -9.41% == Files1075 1075 Lines 5477354773 Branches 8168 8168 == - Hits3635131199-5152 - Misses 1575521129+5374 + Partials 2667 2445 -222 ``` | [Impacted Files](https://codecov.io/gh/apache/incubator-pinot/pull/5357?src=pr&el=tree) | Coverage Δ | | |---|---|---| | [...a/org/apache/pinot/minion/metrics/MinionMeter.java](https://codecov.io/gh/apache/incubator-pinot/pull/5357/diff?src=pr&el=tree#diff-cGlub3QtbWluaW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9taW5pb24vbWV0cmljcy9NaW5pb25NZXRlci5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: | | [.../apache/pinot/common/metrics/BrokerQueryPhase.java](https://codecov.io/gh/apache/incubator-pinot/pull/5357/diff?src=pr&el=tree#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbWV0cmljcy9Ccm9rZXJRdWVyeVBoYXNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: | | [.../apache/pinot/minion/metrics/MinionQueryPhase.java](https://codecov.io/gh/apache/incubator-pinot/pull/5357/diff?src=pr&el=tree#diff-cGlub3QtbWluaW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9taW5pb24vbWV0cmljcy9NaW5pb25RdWVyeVBoYXNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: | | [...he/pinot/core/query/reduce/ComparisonFunction.java](https://codecov.io/gh/apache/incubator-pinot/pull/5357/diff?src=pr&el=tree#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9yZWR1Y2UvQ29tcGFyaXNvbkZ1bmN0aW9uLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: | | [...pinot/minion/exception/TaskCancelledException.java](https://codecov.io/gh/apache/incubator-pinot/pull/5357/diff?src=pr&el=tree#diff-cGlub3QtbWluaW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9taW5pb24vZXhjZXB0aW9uL1Rhc2tDYW5jZWxsZWRFeGNlcHRpb24uamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: | | [...ot/minion/events/EventObserverFactoryRegistry.java](https://codecov.io/gh/apache/incubator-pinot/pull/5357/diff?src=pr&el=tree#diff-cGlub3QtbWluaW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9taW5pb24vZXZlbnRzL0V2ZW50T2JzZXJ2ZXJGYWN0b3J5UmVnaXN0cnkuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: | | [...t/minion/executor/TaskExecutorFactoryRegistry.java](https://codecov.io/gh/apache/incubator-pinot/pull/5357/diff?src=pr&el=tree#diff-cGlub3QtbWluaW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9taW5pb24vZXhlY3V0b3IvVGFza0V4ZWN1dG9yRmFjdG9yeVJlZ2lzdHJ5LmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: | | [...minion/executor/ConvertToRawIndexTaskExecutor.java](https://codecov.io/gh/apache/incubator-pinot/pull/5357/diff?src=pr&el=tree#diff-cGlub3QtbWluaW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9taW5pb24vZXhlY3V0b3IvQ29udmVydFRvUmF3SW5kZXhUYXNrRXhlY3V0b3IuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: | | [...nion/events/DefaultMinionEventObserverFactory.java](https://codecov.io/gh/apache/incubator-pinot/pull/5357/diff?src=pr&el=tree#diff-cGlub3QtbWluaW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9taW5pb24vZXZlbnRzL0RlZmF1bHRNaW5pb25FdmVudE9ic2VydmVyRmFjdG9yeS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: | | [...startree/executor/StarTreeAggregationExecutor.java](https://codecov.io/gh/apache/incubator-pinot/pull/5357/diff?src=pr&el=tree#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9zdGFydHJlZS9leGVjdXRvci9TdGFyVHJlZUFnZ3JlZ2F0aW9uRXhlY3V0b3IuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: | | ... and [312 more](https://codecov.io/gh/apache/incubator-pinot/pull/5357/diff?src=pr&el=tree-more) | | -- [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/5357?src=pr&el=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/a
[GitHub] [incubator-pinot] jackjlli opened a new pull request #5357: Fix NoSuchMethodError on ServerConfiguration.addHttpHandler
jackjlli opened a new pull request #5357: URL: https://github.com/apache/incubator-pinot/pull/5357 We encounter the following exception when running any command for `PerfBenchmarkRunner`: ``` Exception in thread "main" java.lang.NoSuchMethodError: org.glassfish.grizzly.http.server.ServerConfiguration.addHttpHandler(Lorg/glassfish/grizzly/http/server/HttpHandler;[Lorg/glassfish/grizzly/http/server/HttpHandlerRegistration;)V at org.glassfish.jersey.grizzly2.httpserver.GrizzlyHttpServerFactory.createHttpServer(GrizzlyHttpServerFactory.java:258) at org.glassfish.jersey.grizzly2.httpserver.GrizzlyHttpServerFactory.createHttpServer(GrizzlyHttpServerFactory.java:93) at org.apache.pinot.controller.api.ControllerAdminApiApplication.start(ControllerAdminApiApplication.java:85) at org.apache.pinot.controller.ControllerStarter.setUpPinotController(ControllerStarter.java:359) at org.apache.pinot.controller.ControllerStarter.start(ControllerStarter.java:231) at org.apache.pinot.tools.perf.PerfBenchmarkDriver.startController(PerfBenchmarkDriver.java:200) at org.apache.pinot.tools.perf.PerfBenchmarkDriver.run(PerfBenchmarkDriver.java:172) at org.apache.pinot.tools.perf.PerfBenchmarkRunner.startAllButServer(PerfBenchmarkRunner.java:110) at org.apache.pinot.tools.perf.PerfBenchmarkRunner.execute(PerfBenchmarkRunner.java:96) at org.apache.pinot.tools.PinotToolLauncher.execute(PinotToolLauncher.java:67) at org.apache.pinot.tools.PinotToolLauncher.main(PinotToolLauncher.java:79) ``` This PR removes the exclusion that caused the exception. Tested by running `mvn clean install -DskipTests -Pbin-dist` and tests passed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[GitHub] [incubator-pinot] mcvsubbu commented on a change in pull request #5345: Create PULL_REQUEST_TEMPLATE.md
mcvsubbu commented on a change in pull request #5345: URL: https://github.com/apache/incubator-pinot/pull/5345#discussion_r422568460 ## File path: .github/PULL_REQUEST_TEMPLATE.md ## @@ -0,0 +1,13 @@ +## Upgrade Notes +Does this PR prevent a zero down-time upgrade? (Assume upgrade order: Controller, Broker, Server, Minion) +* [ ] Yes (Please label as **backward-incompat**) + +Does this PR fix a zero-downtime upgrade introduced earlier? Review comment: 1. LinkedIn pulls weekly from open source and deploys into production. LinkedIn (and others who may pull regularly from open source pinot) can get some information of what may break the deployment (even if retro-fixed). 2. It is a good practice to have release notes that describe the upgrade scenarios that may break. If we have good labels on PRs, whoever cuts the release can compile the release notes easily. 3. It hopefully makes the reviewers think whether something may break compatibility. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[GitHub] [incubator-pinot] mcvsubbu commented on a change in pull request #5345: Create PULL_REQUEST_TEMPLATE.md
mcvsubbu commented on a change in pull request #5345: URL: https://github.com/apache/incubator-pinot/pull/5345#discussion_r422568206 ## File path: .github/PULL_REQUEST_TEMPLATE.md ## @@ -0,0 +1,13 @@ +## Upgrade Notes Review comment: I had the before, but the problem is that when someone creates a PR, the description shows up _before_ the Description section. I can add it if that is blocking merge of this PR This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[GitHub] [incubator-pinot] reallocf commented on a change in pull request #5326: Add toDateTime DateTimeFunction (#5313)
reallocf commented on a change in pull request #5326: URL: https://github.com/apache/incubator-pinot/pull/5326#discussion_r422564834 ## File path: pinot-core/src/test/java/org/apache/pinot/core/data/function/DateTimeFunctionEvaluatorTest.java ## @@ -52,12 +54,14 @@ public void testDateTimeTransformFunctions(String transformFunction, List
[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #5240: Supporting range queries using indexes
siddharthteotia commented on a change in pull request #5240: URL: https://github.com/apache/incubator-pinot/pull/5240#discussion_r422563677 ## File path: pinot-core/src/main/java/org/apache/pinot/core/segment/creator/impl/inv/RangeIndexCreator.java ## @@ -0,0 +1,471 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.pinot.core.segment.creator.impl.inv; + +import com.google.common.base.Preconditions; +import it.unimi.dsi.fastutil.Arrays; +import it.unimi.dsi.fastutil.Swapper; +import it.unimi.dsi.fastutil.ints.IntComparator; +import java.io.BufferedOutputStream; +import java.io.Closeable; +import java.io.DataInputStream; +import java.io.DataOutputStream; +import java.io.File; +import java.io.FileInputStream; +import java.io.FileOutputStream; +import java.io.IOException; +import java.nio.ByteBuffer; +import java.util.ArrayList; +import java.util.List; +import java.util.Random; +import org.apache.commons.io.FileUtils; +import org.apache.pinot.core.common.Constants; +import org.apache.pinot.core.query.utils.Pair; +import org.apache.pinot.core.segment.creator.InvertedIndexCreator; +import org.apache.pinot.core.segment.memory.PinotDataBuffer; +import org.apache.pinot.spi.data.FieldSpec; +import org.apache.pinot.spi.data.MetricFieldSpec; +import org.roaringbitmap.buffer.ImmutableRoaringBitmap; +import org.roaringbitmap.buffer.MutableRoaringBitmap; + +import static com.google.common.base.Charsets.UTF_8; +import static org.apache.pinot.core.segment.creator.impl.V1Constants.Indexes.BITMAP_RANGE_INDEX_FILE_EXTENSION; + + +/** + * Implementation of {@link InvertedIndexCreator} that uses off-heap memory. + * We use 2 passes to create the range index. + * + * + * + * A + * + * + * In the first pass (adding values phase), when add() method is called, store the raw values into the forward index + * value buffer (for multi-valued column also store number of values for each docId into forward index length + * buffer). We also compute the inverted index length for each dictId while adding values. + * + * + * In the second pass (processing values phase), when seal() method is called, all the dictIds should already been + * added. We first reorder the values into the inverted index buffers by going over the dictIds in forward index + * value buffer (for multi-valued column we also need forward index length buffer to get the docId for each dictId). + * Once we have the inverted index buffers, we simply go over them and create the bitmap for each dictId and + * serialize them into a file. + * + * + * Based on the number of values we need to store, we use direct memory or MMap file to allocate the buffer. + */ +public final class RangeIndexCreator implements InvertedIndexCreator { + + private static final int RANGE_INDEX_VERSION = 1; + + // Use MMapBuffer if the value buffer size is larger than 2G + private static final int NUM_VALUES_THRESHOLD_FOR_MMAP_BUFFER = 500_000_000; + private static final int DEFAULT_NUM_RANGES = 20; + + private static final String FORWARD_INDEX_VALUE_BUFFER_SUFFIX = ".fwd.idx.val.buf"; + private static final String DOC_ID_VALUE_BUFFER_SUFFIX = ".doc.id.val.buf"; + + private final File _invertedIndexFile; + private final File _forwardIndexValueBufferFile; + private final File _docIdBufferFileForSorting; + private final int _numValues; + private final boolean _useMMapBuffer; + + // Forward index buffers (from docId to dictId) + private int _nextDocId; + private PinotDataBuffer _forwardIndexValueBuffer; + private NumberValueBuffer _valueBuffer; + //for sorting + private PinotDataBuffer _docIdValueBuffer; + private IntValueBuffer _docIdBuffer; + // For multi-valued column only because each docId can have multiple dictIds + private int _nextValueId; + private int _numDocsPerRange; + private FieldSpec.DataType _valueType; + + public RangeIndexCreator(File indexDir, FieldSpec fieldSpec, FieldSpec.DataType valueType, int numRanges, int numDocsPerRange, int numDocs, int numValues) + throws IOException { +_valueType = valueType; +String columnName = fieldSpec.getName(); +_invertedIndexFile = new File(i
[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #5240: Supporting range queries using indexes
siddharthteotia commented on a change in pull request #5240: URL: https://github.com/apache/incubator-pinot/pull/5240#discussion_r422563310 ## File path: pinot-core/src/main/java/org/apache/pinot/core/segment/creator/impl/inv/RangeIndexCreator.java ## @@ -0,0 +1,471 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.pinot.core.segment.creator.impl.inv; + +import com.google.common.base.Preconditions; +import it.unimi.dsi.fastutil.Arrays; +import it.unimi.dsi.fastutil.Swapper; +import it.unimi.dsi.fastutil.ints.IntComparator; +import java.io.BufferedOutputStream; +import java.io.Closeable; +import java.io.DataInputStream; +import java.io.DataOutputStream; +import java.io.File; +import java.io.FileInputStream; +import java.io.FileOutputStream; +import java.io.IOException; +import java.nio.ByteBuffer; +import java.util.ArrayList; +import java.util.List; +import java.util.Random; +import org.apache.commons.io.FileUtils; +import org.apache.pinot.core.common.Constants; +import org.apache.pinot.core.query.utils.Pair; +import org.apache.pinot.core.segment.creator.InvertedIndexCreator; +import org.apache.pinot.core.segment.memory.PinotDataBuffer; +import org.apache.pinot.spi.data.FieldSpec; +import org.apache.pinot.spi.data.MetricFieldSpec; +import org.roaringbitmap.buffer.ImmutableRoaringBitmap; +import org.roaringbitmap.buffer.MutableRoaringBitmap; + +import static com.google.common.base.Charsets.UTF_8; +import static org.apache.pinot.core.segment.creator.impl.V1Constants.Indexes.BITMAP_RANGE_INDEX_FILE_EXTENSION; + + +/** + * Implementation of {@link InvertedIndexCreator} that uses off-heap memory. + * We use 2 passes to create the range index. + * + * + * + * A + * + * + * In the first pass (adding values phase), when add() method is called, store the raw values into the forward index + * value buffer (for multi-valued column also store number of values for each docId into forward index length + * buffer). We also compute the inverted index length for each dictId while adding values. + * + * + * In the second pass (processing values phase), when seal() method is called, all the dictIds should already been + * added. We first reorder the values into the inverted index buffers by going over the dictIds in forward index + * value buffer (for multi-valued column we also need forward index length buffer to get the docId for each dictId). + * Once we have the inverted index buffers, we simply go over them and create the bitmap for each dictId and + * serialize them into a file. + * + * + * Based on the number of values we need to store, we use direct memory or MMap file to allocate the buffer. + */ +public final class RangeIndexCreator implements InvertedIndexCreator { + + private static final int RANGE_INDEX_VERSION = 1; + + // Use MMapBuffer if the value buffer size is larger than 2G + private static final int NUM_VALUES_THRESHOLD_FOR_MMAP_BUFFER = 500_000_000; + private static final int DEFAULT_NUM_RANGES = 20; + + private static final String FORWARD_INDEX_VALUE_BUFFER_SUFFIX = ".fwd.idx.val.buf"; + private static final String DOC_ID_VALUE_BUFFER_SUFFIX = ".doc.id.val.buf"; + + private final File _invertedIndexFile; + private final File _forwardIndexValueBufferFile; + private final File _docIdBufferFileForSorting; + private final int _numValues; + private final boolean _useMMapBuffer; + + // Forward index buffers (from docId to dictId) + private int _nextDocId; + private PinotDataBuffer _forwardIndexValueBuffer; + private NumberValueBuffer _valueBuffer; + //for sorting + private PinotDataBuffer _docIdValueBuffer; + private IntValueBuffer _docIdBuffer; + // For multi-valued column only because each docId can have multiple dictIds + private int _nextValueId; + private int _numDocsPerRange; + private FieldSpec.DataType _valueType; + + public RangeIndexCreator(File indexDir, FieldSpec fieldSpec, FieldSpec.DataType valueType, int numRanges, int numDocsPerRange, int numDocs, int numValues) + throws IOException { +_valueType = valueType; +String columnName = fieldSpec.getName(); +_invertedIndexFile = new File(i
[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #5240: Supporting range queries using indexes
siddharthteotia commented on a change in pull request #5240: URL: https://github.com/apache/incubator-pinot/pull/5240#discussion_r422562613 ## File path: pinot-core/src/main/java/org/apache/pinot/core/segment/creator/impl/inv/RangeIndexCreator.java ## @@ -0,0 +1,471 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.pinot.core.segment.creator.impl.inv; + +import com.google.common.base.Preconditions; +import it.unimi.dsi.fastutil.Arrays; +import it.unimi.dsi.fastutil.Swapper; +import it.unimi.dsi.fastutil.ints.IntComparator; +import java.io.BufferedOutputStream; +import java.io.Closeable; +import java.io.DataInputStream; +import java.io.DataOutputStream; +import java.io.File; +import java.io.FileInputStream; +import java.io.FileOutputStream; +import java.io.IOException; +import java.nio.ByteBuffer; +import java.util.ArrayList; +import java.util.List; +import java.util.Random; +import org.apache.commons.io.FileUtils; +import org.apache.pinot.core.common.Constants; +import org.apache.pinot.core.query.utils.Pair; +import org.apache.pinot.core.segment.creator.InvertedIndexCreator; +import org.apache.pinot.core.segment.memory.PinotDataBuffer; +import org.apache.pinot.spi.data.FieldSpec; +import org.apache.pinot.spi.data.MetricFieldSpec; +import org.roaringbitmap.buffer.ImmutableRoaringBitmap; +import org.roaringbitmap.buffer.MutableRoaringBitmap; + +import static com.google.common.base.Charsets.UTF_8; +import static org.apache.pinot.core.segment.creator.impl.V1Constants.Indexes.BITMAP_RANGE_INDEX_FILE_EXTENSION; + + +/** + * Implementation of {@link InvertedIndexCreator} that uses off-heap memory. + * We use 2 passes to create the range index. + * + * + * + * A + * + * + * In the first pass (adding values phase), when add() method is called, store the raw values into the forward index + * value buffer (for multi-valued column also store number of values for each docId into forward index length + * buffer). We also compute the inverted index length for each dictId while adding values. + * + * + * In the second pass (processing values phase), when seal() method is called, all the dictIds should already been + * added. We first reorder the values into the inverted index buffers by going over the dictIds in forward index + * value buffer (for multi-valued column we also need forward index length buffer to get the docId for each dictId). + * Once we have the inverted index buffers, we simply go over them and create the bitmap for each dictId and + * serialize them into a file. + * + * + * Based on the number of values we need to store, we use direct memory or MMap file to allocate the buffer. + */ +public final class RangeIndexCreator implements InvertedIndexCreator { + + private static final int RANGE_INDEX_VERSION = 1; + + // Use MMapBuffer if the value buffer size is larger than 2G + private static final int NUM_VALUES_THRESHOLD_FOR_MMAP_BUFFER = 500_000_000; + private static final int DEFAULT_NUM_RANGES = 20; + + private static final String FORWARD_INDEX_VALUE_BUFFER_SUFFIX = ".fwd.idx.val.buf"; + private static final String DOC_ID_VALUE_BUFFER_SUFFIX = ".doc.id.val.buf"; + + private final File _invertedIndexFile; + private final File _forwardIndexValueBufferFile; + private final File _docIdBufferFileForSorting; + private final int _numValues; + private final boolean _useMMapBuffer; + + // Forward index buffers (from docId to dictId) + private int _nextDocId; + private PinotDataBuffer _forwardIndexValueBuffer; + private NumberValueBuffer _valueBuffer; Review comment: Let's please rename it indexKeyBuffer? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #5240: Supporting range queries using indexes
siddharthteotia commented on a change in pull request #5240: URL: https://github.com/apache/incubator-pinot/pull/5240#discussion_r422562477 ## File path: pinot-core/src/main/java/org/apache/pinot/core/segment/creator/impl/inv/RangeIndexCreator.java ## @@ -0,0 +1,471 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.pinot.core.segment.creator.impl.inv; + +import com.google.common.base.Preconditions; +import it.unimi.dsi.fastutil.Arrays; +import it.unimi.dsi.fastutil.Swapper; +import it.unimi.dsi.fastutil.ints.IntComparator; +import java.io.BufferedOutputStream; +import java.io.Closeable; +import java.io.DataInputStream; +import java.io.DataOutputStream; +import java.io.File; +import java.io.FileInputStream; +import java.io.FileOutputStream; +import java.io.IOException; +import java.nio.ByteBuffer; +import java.util.ArrayList; +import java.util.List; +import java.util.Random; +import org.apache.commons.io.FileUtils; +import org.apache.pinot.core.common.Constants; +import org.apache.pinot.core.query.utils.Pair; +import org.apache.pinot.core.segment.creator.InvertedIndexCreator; +import org.apache.pinot.core.segment.memory.PinotDataBuffer; +import org.apache.pinot.spi.data.FieldSpec; +import org.apache.pinot.spi.data.MetricFieldSpec; +import org.roaringbitmap.buffer.ImmutableRoaringBitmap; +import org.roaringbitmap.buffer.MutableRoaringBitmap; + +import static com.google.common.base.Charsets.UTF_8; +import static org.apache.pinot.core.segment.creator.impl.V1Constants.Indexes.BITMAP_RANGE_INDEX_FILE_EXTENSION; + + +/** + * Implementation of {@link InvertedIndexCreator} that uses off-heap memory. + * We use 2 passes to create the range index. + * + * + * + * A + * + * + * In the first pass (adding values phase), when add() method is called, store the raw values into the forward index + * value buffer (for multi-valued column also store number of values for each docId into forward index length + * buffer). We also compute the inverted index length for each dictId while adding values. + * + * + * In the second pass (processing values phase), when seal() method is called, all the dictIds should already been + * added. We first reorder the values into the inverted index buffers by going over the dictIds in forward index + * value buffer (for multi-valued column we also need forward index length buffer to get the docId for each dictId). + * Once we have the inverted index buffers, we simply go over them and create the bitmap for each dictId and + * serialize them into a file. + * + * + * Based on the number of values we need to store, we use direct memory or MMap file to allocate the buffer. + */ +public final class RangeIndexCreator implements InvertedIndexCreator { + + private static final int RANGE_INDEX_VERSION = 1; + + // Use MMapBuffer if the value buffer size is larger than 2G + private static final int NUM_VALUES_THRESHOLD_FOR_MMAP_BUFFER = 500_000_000; + private static final int DEFAULT_NUM_RANGES = 20; + + private static final String FORWARD_INDEX_VALUE_BUFFER_SUFFIX = ".fwd.idx.val.buf"; + private static final String DOC_ID_VALUE_BUFFER_SUFFIX = ".doc.id.val.buf"; + + private final File _invertedIndexFile; + private final File _forwardIndexValueBufferFile; + private final File _docIdBufferFileForSorting; + private final int _numValues; + private final boolean _useMMapBuffer; + + // Forward index buffers (from docId to dictId) + private int _nextDocId; + private PinotDataBuffer _forwardIndexValueBuffer; + private NumberValueBuffer _valueBuffer; + //for sorting + private PinotDataBuffer _docIdValueBuffer; + private IntValueBuffer _docIdBuffer; Review comment: This is very confusing. _docIdBuffer stores docIds. What does _docIdValueBuffer store? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org -
[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #5240: Supporting range queries using indexes
siddharthteotia commented on a change in pull request #5240: URL: https://github.com/apache/incubator-pinot/pull/5240#discussion_r422561957 ## File path: pinot-core/src/main/java/org/apache/pinot/core/segment/creator/impl/inv/RangeIndexCreator.java ## @@ -0,0 +1,471 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.pinot.core.segment.creator.impl.inv; + +import com.google.common.base.Preconditions; +import it.unimi.dsi.fastutil.Arrays; +import it.unimi.dsi.fastutil.Swapper; +import it.unimi.dsi.fastutil.ints.IntComparator; +import java.io.BufferedOutputStream; +import java.io.Closeable; +import java.io.DataInputStream; +import java.io.DataOutputStream; +import java.io.File; +import java.io.FileInputStream; +import java.io.FileOutputStream; +import java.io.IOException; +import java.nio.ByteBuffer; +import java.util.ArrayList; +import java.util.List; +import java.util.Random; +import org.apache.commons.io.FileUtils; +import org.apache.pinot.core.common.Constants; +import org.apache.pinot.core.query.utils.Pair; +import org.apache.pinot.core.segment.creator.InvertedIndexCreator; +import org.apache.pinot.core.segment.memory.PinotDataBuffer; +import org.apache.pinot.spi.data.FieldSpec; +import org.apache.pinot.spi.data.MetricFieldSpec; +import org.roaringbitmap.buffer.ImmutableRoaringBitmap; +import org.roaringbitmap.buffer.MutableRoaringBitmap; + +import static com.google.common.base.Charsets.UTF_8; +import static org.apache.pinot.core.segment.creator.impl.V1Constants.Indexes.BITMAP_RANGE_INDEX_FILE_EXTENSION; + + +/** + * Implementation of {@link InvertedIndexCreator} that uses off-heap memory. + * We use 2 passes to create the range index. + * + * + * + * A + * + * + * In the first pass (adding values phase), when add() method is called, store the raw values into the forward index + * value buffer (for multi-valued column also store number of values for each docId into forward index length + * buffer). We also compute the inverted index length for each dictId while adding values. + * + * + * In the second pass (processing values phase), when seal() method is called, all the dictIds should already been + * added. We first reorder the values into the inverted index buffers by going over the dictIds in forward index + * value buffer (for multi-valued column we also need forward index length buffer to get the docId for each dictId). + * Once we have the inverted index buffers, we simply go over them and create the bitmap for each dictId and + * serialize them into a file. + * + * + * Based on the number of values we need to store, we use direct memory or MMap file to allocate the buffer. + */ +public final class RangeIndexCreator implements InvertedIndexCreator { + + private static final int RANGE_INDEX_VERSION = 1; + + // Use MMapBuffer if the value buffer size is larger than 2G + private static final int NUM_VALUES_THRESHOLD_FOR_MMAP_BUFFER = 500_000_000; + private static final int DEFAULT_NUM_RANGES = 20; + + private static final String FORWARD_INDEX_VALUE_BUFFER_SUFFIX = ".fwd.idx.val.buf"; + private static final String DOC_ID_VALUE_BUFFER_SUFFIX = ".doc.id.val.buf"; + + private final File _invertedIndexFile; + private final File _forwardIndexValueBufferFile; + private final File _docIdBufferFileForSorting; + private final int _numValues; + private final boolean _useMMapBuffer; + + // Forward index buffers (from docId to dictId) Review comment: Let's not use the term forward index inside range index creator. Can we please call it keyBuffer. The key that are stored could be dictIds or raw values. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[incubator-pinot] branch master updated (af5a901 -> 9246d7e)
This is an automated email from the ASF dual-hosted git repository. xiangfu pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git. from af5a901 Optimize RealtimeDictionaryBasedRangePredicateEvaluator by not scanning the dictionary when cardinality is high (#5331) add 9246d7e Update package.json to resolve sharp version to fix the local build issue. (#5215) No new revisions were added by this update. Summary of changes: website/package.json | 3 +++ 1 file changed, 3 insertions(+) - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[GitHub] [incubator-pinot] reallocf commented on a change in pull request #5326: Add toDateTime DateTimeFunction (#5313)
reallocf commented on a change in pull request #5326: URL: https://github.com/apache/incubator-pinot/pull/5326#discussion_r422550932 ## File path: pinot-core/src/main/java/org/apache/pinot/core/data/function/DefaultFunctionEvaluator.java ## @@ -50,11 +50,16 @@ public DefaultFunctionEvaluator(String expression) throws Exception { +this(expression, FunctionRegistryFactory.getFunctionRegistry()); Review comment: I think I'm creating individual `DefaultFunctionRegistry`s in the right place now, but still not 100% sure. Let me know - thanks! :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[GitHub] [incubator-pinot] reallocf commented on a change in pull request #5326: Add toDateTime DateTimeFunction (#5313)
reallocf commented on a change in pull request #5326: URL: https://github.com/apache/incubator-pinot/pull/5326#discussion_r422551022 ## File path: pinot-core/src/main/java/org/apache/pinot/core/data/function/DateTimeFunctions.java ## @@ -40,4 +47,14 @@ static Long toEpochHours(Long millis) { static Long toEpochMinutes(Long millis, String bucket) { return TimeUnit.MILLISECONDS.toMinutes(millis) / Integer.parseInt(bucket); } + + DateTime toDateTime(String dateTimeString, String pattern) { +if (!_dateTimeFormatterMap.containsKey(pattern)) { + _dateTimeFormatterMap.put(pattern, DateTimeFormat.forPattern(pattern)); +} + +DateTimeFormatter dateTimeFormatter = _dateTimeFormatterMap.get(pattern); + +return dateTimeFormatter.parseDateTime(dateTimeString); Review comment: Okay, adjusted this so toDateTime takes in a Long millis and returns a String formattedDateTime and fromDateTime takes in a String formattedDateTime and outputs a Long millis - does that seem right to you? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[GitHub] [incubator-pinot] reallocf commented on a change in pull request #5326: Add toDateTime DateTimeFunction (#5313)
reallocf commented on a change in pull request #5326: URL: https://github.com/apache/incubator-pinot/pull/5326#discussion_r422550841 ## File path: pinot-core/src/main/java/org/apache/pinot/core/data/function/DateTimeFunctions.java ## @@ -27,6 +32,8 @@ */ public class DateTimeFunctions { + private Map _dateTimeFormatterMap = new HashMap<>(); Review comment: Good catch! Yeah, I'm still new to the concurrency model here - thanks for keeping an eye out for this! :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[GitHub] [incubator-pinot] kishoreg commented on a change in pull request #5345: Create PULL_REQUEST_TEMPLATE.md
kishoreg commented on a change in pull request #5345: URL: https://github.com/apache/incubator-pinot/pull/5345#discussion_r422304502 ## File path: .github/PULL_REQUEST_TEMPLATE.md ## @@ -0,0 +1,13 @@ +## Upgrade Notes +Does this PR prevent a zero down-time upgrade? (Assume upgrade order: Controller, Broker, Server, Minion) +* [ ] Yes (Please label as **backward-incompat**) + +Does this PR fix a zero-downtime upgrade introduced earlier? Review comment: whats the purpose of this question? ## File path: .github/PULL_REQUEST_TEMPLATE.md ## @@ -0,0 +1,13 @@ +## Upgrade Notes Review comment: Please add a description section. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[GitHub] [incubator-pinot] reallocf commented on a change in pull request #5326: Add toDateTime DateTimeFunction (#5313)
reallocf commented on a change in pull request #5326: URL: https://github.com/apache/incubator-pinot/pull/5326#discussion_r422548708 ## File path: pinot-core/src/test/java/org/apache/pinot/core/data/function/DefaultFunctionEvaluatorTest.java ## @@ -54,49 +54,73 @@ public void testExpressionWithColumn() @Test public void testExpressionWithConstant() throws Exception { -FunctionRegistry - .registerStaticFunction(MyFunc.class.getDeclaredMethod("daysSinceEpoch", String.class, String.class)); +MyFunc myFunc = new MyFunc(); +FunctionRegistry functionRegistry = new FunctionRegistry( + Lists.newArrayList(myFunc.getClass().getDeclaredMethod("daysSinceEpoch", String.class, String.class))); String input = "1980-01-01"; String format = "-MM-dd"; String expression = String.format("daysSinceEpoch('%s', '%s')", input, format); -DefaultFunctionEvaluator evaluator = new DefaultFunctionEvaluator(expression); +DefaultFunctionEvaluator evaluator = new DefaultFunctionEvaluator(expression, functionRegistry); Assert.assertTrue(evaluator.getArguments().isEmpty()); GenericRow row = new GenericRow(); Object result = evaluator.evaluate(row); -Assert.assertEquals(result, MyFunc.daysSinceEpoch(input, format)); +Assert.assertEquals(result, myFunc.daysSinceEpoch(input, format)); } @Test public void testMultiFunctionExpression() throws Exception { - FunctionRegistry.registerStaticFunction(MyFunc.class.getDeclaredMethod("reverseString", String.class)); -FunctionRegistry - .registerStaticFunction(MyFunc.class.getDeclaredMethod("daysSinceEpoch", String.class, String.class)); +MyFunc myFunc = new MyFunc(); +FunctionRegistry functionRegistry = new FunctionRegistry(Lists +.newArrayList(myFunc.getClass().getDeclaredMethod("reverseString", String.class), +myFunc.getClass().getDeclaredMethod("daysSinceEpoch", String.class, String.class))); String input = "1980-01-01"; -String reversedInput = MyFunc.reverseString(input); +String reversedInput = myFunc.reverseString(input); String format = "-MM-dd"; String expression = String.format("daysSinceEpoch(reverseString('%s'), '%s')", reversedInput, format); -DefaultFunctionEvaluator evaluator = new DefaultFunctionEvaluator(expression); +DefaultFunctionEvaluator evaluator = new DefaultFunctionEvaluator(expression, functionRegistry); Assert.assertTrue(evaluator.getArguments().isEmpty()); GenericRow row = new GenericRow(); Object result = evaluator.evaluate(row); -Assert.assertEquals(result, MyFunc.daysSinceEpoch(input, format)); +Assert.assertEquals(result, myFunc.daysSinceEpoch(input, format)); } - private static class MyFunc { -static String reverseString(String input) { - return new StringBuilder(input).reverse().toString(); -} + @Test + public void testStateSharedBetweenRowsForExecution() + throws Exception { Review comment: This test basically just confirms that the internal state of the FunctionRegistry is shared between each row. I agree with the current implementation it's fairly self-explanatory, but you can imagine an implementation where the internals of the FunctionRegistry are different for each row. This is to make sure we don't somehow regress to that, because then we'd see a big performance hit for creating a SDF for each row. But can definitely remove if it just feels like clutter to you ☺️ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[GitHub] [incubator-pinot] kishoreg commented on a change in pull request #5240: Supporting range queries using indexes
kishoreg commented on a change in pull request #5240: URL: https://github.com/apache/incubator-pinot/pull/5240#discussion_r422547267 ## File path: pinot-spi/src/main/java/org/apache/pinot/spi/config/table/IndexingConfig.java ## @@ -27,6 +27,7 @@ public class IndexingConfig extends BaseJsonConfig { private List _invertedIndexColumns; + private List _rangeIndexColumns; Review comment: I agree. will have to do this as part of another PR. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[GitHub] [incubator-pinot] kishoreg commented on pull request #5307: Admin UI
kishoreg commented on pull request #5307: URL: https://github.com/apache/incubator-pinot/pull/5307#issuecomment-626232744 @xxsacxx lets resolve conflicts? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[GitHub] [incubator-pinot] damianoporta opened a new issue #5356: Is DataType.BYTES control in aggregate() aggregateGroupBySV() aggregateGroupByMV() needed?
damianoporta opened a new issue #5356: URL: https://github.com/apache/incubator-pinot/issues/5356 Hello, in different aggregators like the following: - https://github.com/apache/incubator-pinot/blob/master/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/AvgAggregationFunction.java#L100 - https://github.com/apache/incubator-pinot/blob/master/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/MinMaxRangeAggregationFunction.java#L104 I see a "pseudo duplication" of the code because of the column DataType. But, is this really needed? I think that serialization/deserialization is done in **aggregationResultHolder** object only, no? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[GitHub] [incubator-pinot] KKcorps commented on a change in pull request #5293: Adding support for Protobuf input format
KKcorps commented on a change in pull request #5293: URL: https://github.com/apache/incubator-pinot/pull/5293#discussion_r422464033 ## File path: pinot-plugins/pinot-input-format/pinot-protobuf/src/main/java/org/apache/pinot/plugin/inputformat/protobuf/ProtoBufRecordReaderConfig.java ## @@ -0,0 +1,34 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.pinot.plugin.inputformat.protobuf; + +import org.apache.pinot.spi.data.readers.RecordReaderConfig; + + +public class ProtoBufRecordReaderConfig implements RecordReaderConfig { + private String _descriptorFile; + + public String getDescriptorFile() { Review comment: done ## File path: pinot-plugins/pinot-input-format/pinot-protobuf/src/main/java/org/apache/pinot/plugin/inputformat/protobuf/ProtoBufRecordReader.java ## @@ -0,0 +1,136 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.pinot.plugin.inputformat.protobuf; + +import com.google.protobuf.DescriptorProtos; +import com.google.protobuf.Descriptors; +import com.google.protobuf.DynamicMessage; +import com.google.protobuf.Message; +import java.io.File; +import java.io.FileInputStream; +import java.io.IOException; +import java.io.InputStream; +import java.util.Set; +import javax.annotation.Nullable; +import org.apache.pinot.spi.data.Schema; +import org.apache.pinot.spi.data.readers.GenericRow; +import org.apache.pinot.spi.data.readers.RecordReader; +import org.apache.pinot.spi.data.readers.RecordReaderConfig; +import org.apache.pinot.spi.data.readers.RecordReaderUtils; +import org.apache.pinot.spi.utils.SchemaFieldExtractorUtils; + + +public class ProtoBufRecordReader implements RecordReader { + private File _dataFile; + private Schema _schema; + private ProtoBufRecordExtractor _recordExtractor; + + private InputStream _inputStream; + private boolean _hasNext; + private Descriptors.Descriptor _descriptor; + + private boolean hasMoreToRead() + throws IOException { +_inputStream.mark(1); +int nextByte = _inputStream.read(); +_inputStream.reset(); +return nextByte != -1; + } + + private void init() + throws IOException { +_inputStream = RecordReaderUtils.getBufferedInputStream(_dataFile); +try { + _hasNext = hasMoreToRead(); +} catch (Exception e) { + _inputStream.close(); + throw e; +} + } + + @Override + public void init(File dataFile, Schema schema, @Nullable RecordReaderConfig recordReaderConfig) + throws IOException { +_dataFile = dataFile; +_schema = schema; +Set sourceFields = SchemaFieldExtractorUtils.extract(schema); +ProtoBufRecordExtractorConfig recordExtractorConfig = new ProtoBufRecordExtractorConfig(); +ProtoBufRecordReaderConfig protoBufRecordReaderConfig = (ProtoBufRecordReaderConfig) recordReaderConfig; +String descriptorFile = protoBufRecordReaderConfig.getDescriptorFile(); +FileInputStream fin = new FileInputStream(descriptorFile); Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: comm
[GitHub] [incubator-pinot] KKcorps commented on pull request #5293: Adding support for Protobuf input format
KKcorps commented on pull request #5293: URL: https://github.com/apache/incubator-pinot/pull/5293#issuecomment-626121988 @kishoreg Yes, it is complete from my end This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org