[GitHub] [druid] stale[bot] commented on issue #4768: More accurate dictionary size estimation in RowBasedKeySerde
stale[bot] commented on issue #4768: URL: https://github.com/apache/druid/issues/4768#issuecomment-620983841 This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [druid] stale[bot] commented on pull request #9436: added number of bins parameter
stale[bot] commented on pull request #9436: URL: https://github.com/apache/druid/pull/9436#issuecomment-620983797 This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the d...@druid.apache.org list. Thank you for your contributions.
[GitHub] [druid] stale[bot] commented on issue #8146: exceptions on coordinator
stale[bot] commented on issue #8146: URL: https://github.com/apache/druid/issues/8146#issuecomment-620983819 This issue has been marked as stale due to 280 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the d...@druid.apache.org list. Thank you for your contributions.
[GitHub] [druid] stale[bot] commented on issue #4784: Refactoring RowBasedGrouperHelper especially the logics for limit push down
stale[bot] commented on issue #4784: URL: https://github.com/apache/druid/issues/4784#issuecomment-620983844 This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time.
[GitHub] [druid] stale[bot] commented on issue #4718: Missing documents for MoveTask, ArchiveTask, and RestoreTask
stale[bot] commented on issue #4718: URL: https://github.com/apache/druid/issues/4718#issuecomment-620983838 This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time.
[GitHub] [druid] stale[bot] commented on issue #4987: Remove unnecessary key serde in StreamingMergeSortedGrouper
stale[bot] commented on issue #4987: URL: https://github.com/apache/druid/issues/4987#issuecomment-620983846 This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time.
[GitHub] [druid] stale[bot] commented on pull request #9390: [bugfix]fix sequences ArrayIndexOutOfBoundsException in SeekableStreamIndexTaskRunner
stale[bot] commented on pull request #9390: URL: https://github.com/apache/druid/pull/9390#issuecomment-620983800 This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the d...@druid.apache.org list. Thank you for your contributions.
[GitHub] [druid] stale[bot] commented on issue #3832: Improve cost-based planner for search query
stale[bot] commented on issue #3832: URL: https://github.com/apache/druid/issues/3832#issuecomment-620983830 This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time.
[GitHub] [druid] stale[bot] commented on issue #4529: Moving push retries from AppenderatorImpl to DataPusher
stale[bot] commented on issue #4529: URL: https://github.com/apache/druid/issues/4529#issuecomment-620983832 This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time.
[GitHub] [druid] suneet-s commented on a change in pull request #9760: Fix potential NPEs in joins
suneet-s commented on a change in pull request #9760: URL: https://github.com/apache/druid/pull/9760#discussion_r417047809 ## File path: processing/src/main/java/org/apache/druid/segment/virtual/MultiValueExpressionDimensionSelector.java ## @@ -50,11 +50,13 @@ public MultiValueExpressionDimensionSelector(ColumnValueSelector baseS this.baseSelector = baseSelector; } + @Nullable Review comment: These are checked on the hot loop. I've removed them for now; it's hard to reason about whether or not it's possible to return null here. I think it can't, but I'm not 100% sure, so I'm reverting this change and the associated null check.
[GitHub] [druid] jon-wei commented on pull request #9714: More Hadoop integration tests
jon-wei commented on pull request #9714: URL: https://github.com/apache/druid/pull/9714#issuecomment-620959521 > Should we create an issue for this? Seems like might be a problem for new users / going through tutorial? I don't know if it's an issue with my local environment; I think that can be looked into later.
[GitHub] [druid] jon-wei commented on a change in pull request #9714: More Hadoop integration tests
jon-wei commented on a change in pull request #9714: URL: https://github.com/apache/druid/pull/9714#discussion_r417035022 ## File path: integration-tests/README.md ## @@ -214,31 +214,32 @@ of the integration test run discussed above. This is because druid test clusters might not, in general, have access to hadoop. This also applies to integration test that uses Hadoop HDFS as an inputSource or as a deep storage. To run integration test that uses Hadoop, you will have to run a Hadoop cluster. This can be done in two ways: +1) Run Druid Docker test clusters with Hadoop container by passing -Dstart.hadoop.docker=true to the mvn command. 1) Run your own Druid + Hadoop cluster and specified Hadoop configs in the configuration file (CONFIG_FILE). -2) Run Druid Docker test clusters with Hadoop container by passing -Dstart.hadoop.docker=true to the mvn command. Currently, hdfs-deep-storage and other -deep-storage integration test groups can only be run with Druid Docker test clusters by passing -Dstart.hadoop.docker=true to start Hadoop container. You will also have to provide -Doverride.config.path= with your Druid's Hadoop configs set. See integration-tests/docker/environment-configs/override-examples/hdfs directory for example. Note that if the integration test you are running also uses other cloud extension (S3, Azure, GCS), additional -credentials/configs may need to be set in the same file as your Druid's Hadoop configs set. +credentials/configs may need to be set in the same file as your Druid's Hadoop configs set. Currently, ITHadoopIndexTest can only be run with your own Druid + Hadoop cluster by following the below steps: -Create a directory called batchHadoop1 in the hadoop file system -(anywhere you want) and put batch_hadoop.data (integration-tests/src/test/resources/hadoop/batch_hadoop.data) -into that directory (as its only file). 
- -Add this keyword to the configuration file (see above): +- Copy wikipedia_index_data1.json, wikipedia_index_data2.json, and wikipedia_index_data3.json Review comment: Updated
[GitHub] [druid] jon-wei commented on a change in pull request #9714: More Hadoop integration tests
jon-wei commented on a change in pull request #9714: URL: https://github.com/apache/druid/pull/9714#discussion_r417035061 ## File path: integration-tests/README.md ## @@ -214,31 +214,32 @@ of the integration test run discussed above. This is because druid test clusters might not, in general, have access to hadoop. This also applies to integration test that uses Hadoop HDFS as an inputSource or as a deep storage. To run integration test that uses Hadoop, you will have to run a Hadoop cluster. This can be done in two ways: +1) Run Druid Docker test clusters with Hadoop container by passing -Dstart.hadoop.docker=true to the mvn command. 1) Run your own Druid + Hadoop cluster and specified Hadoop configs in the configuration file (CONFIG_FILE). -2) Run Druid Docker test clusters with Hadoop container by passing -Dstart.hadoop.docker=true to the mvn command. Currently, hdfs-deep-storage and other -deep-storage integration test groups can only be run with Druid Docker test clusters by passing -Dstart.hadoop.docker=true to start Hadoop container. You will also have to provide -Doverride.config.path= with your Druid's Hadoop configs set. See integration-tests/docker/environment-configs/override-examples/hdfs directory for example. Note that if the integration test you are running also uses other cloud extension (S3, Azure, GCS), additional -credentials/configs may need to be set in the same file as your Druid's Hadoop configs set. +credentials/configs may need to be set in the same file as your Druid's Hadoop configs set. Currently, ITHadoopIndexTest can only be run with your own Druid + Hadoop cluster by following the below steps: -Create a directory called batchHadoop1 in the hadoop file system -(anywhere you want) and put batch_hadoop.data (integration-tests/src/test/resources/hadoop/batch_hadoop.data) -into that directory (as its only file). 
- -Add this keyword to the configuration file (see above): +- Copy wikipedia_index_data1.json, wikipedia_index_data2.json, and wikipedia_index_data3.json + located in integration-tests/src/test/resources/data/batch_index/json to your HDFS at /batch_index/json/ + If using the Docker-based Hadoop container, this is automatically done by the integration tests. +- Copy batch_hadoop.data located in integration-tests/src/test/resources/data/batch_index/tsv to your HDFS + at /batch_index/tsv/ + If using the Docker-based Hadoop container, this is automatically done by the integration tests. +Run the test using mvn (using the bundled Docker-based Hadoop cluster): ``` -"hadoopTestDir": "" + mvn verify -P integration-tests -Dit.test=ITHadoopIndexTest -Dstart.hadoop.docker=true -Doverride.config.path=docker/environment-configs/override-examples/hdfs -Dextra.datasource.name.suffix='' ``` -Run the test using mvn: - ``` - mvn verify -P int-tests-config-file -Dit.test=ITHadoopIndexTest + mvn verify -P int-tests-config-file -Dit.test=ITHadoopIndexTest -Doverride.config.path=docker/environment-configs/override-examples/hdfs -Dextra.datasource.name.suffix='' Review comment: Removed that part for the manual example ## File path: integration-tests/src/test/java/org/apache/druid/tests/TestNGGroup.java ## @@ -84,6 +82,15 @@ */ public static final String HDFS_DEEP_STORAGE = "hdfs-deep-storage"; + public static final String HADOOP_S3_TO_S3 = "hadoop-s3-to-s3-deep-storage"; Review comment: Updated as suggested
[GitHub] [druid] jon-wei commented on a change in pull request #9714: More Hadoop integration tests
jon-wei commented on a change in pull request #9714: URL: https://github.com/apache/druid/pull/9714#discussion_r417034943 ## File path: integration-tests/src/test/java/org/apache/druid/tests/hadoop/ITGcsInputToGcsHadoopIndexTest.java ## @@ -29,17 +29,18 @@ * To run this test, you must: * 1) Set the bucket and path for your data. This can be done by setting -Ddruid.test.config.cloudBucket and *-Ddruid.test.config.cloudPath or setting "cloud_bucket" and "cloud_path" in the config file. - * 2. Set -Ddruid.test.config.hadoopGcsCredentialsPath to the location of your Google credentials file as it + * 2) Set -Ddruid.test.config.hadoopGcsCredentialsPath to the location of your Google credentials file as it *exists within the Hadoop cluster that will ingest the data. The credentials file can be placed in the *shared folder used by the integration test containers if running the Docker-based Hadoop container, *in which case this property can be set to /shared/ - * 2) Copy wikipedia_index_data1.json, wikipedia_index_data2.json, and wikipedia_index_data3.json + * 3) Provide -Dresource.file.dir.path= with folder that contains GOOGLE_APPLICATION_CREDENTIALS file + * 4) Copy wikipedia_index_data1.json, wikipedia_index_data2.json, and wikipedia_index_data3.json *located in integration-tests/src/test/resources/data/batch_index/json to your GCS at the location set in step 1. - * 3) Provide -Doverride.config.path= with gcs configs set. See + * 5) Provide -Doverride.config.path= with gcs configs set. See * integration-tests/docker/environment-configs/override-examples/hadoop/gcs_to_gcs for env vars to provide. - * 4) Run the test with -Dstart.hadoop.docker=true -Dextra.datasource.name.suffix='' in the mvn command + * 6) Run the test with -Dstart.hadoop.docker=true -Dextra.datasource.name.suffix='' in the mvn command Review comment: Added a note about that to the README
[GitHub] [druid] jon-wei opened a new issue #9788: Segments cannot be loaded from HDFS deep storage when datasource name has special characters
jon-wei opened a new issue #9788: URL: https://github.com/apache/druid/issues/9788 Using non-ASCII chars in datasource names with HDFS storage results in segments failing to load:
```
java.io.FileNotFoundException: File does not exist: /druid/segments/wikipedia_hadoop_index_test_728672c5-affc-4116-9792-d81e2599eaf4%20Россия%20한국%20中国!%3F/20130831T00.000Z_20130901T00.000Z/2020-04-29T02_21_33.533Z/0_index.zip
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:72)
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:62)
	at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:152)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1819)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:692)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:381)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:850)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:793)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2489)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_232]
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_232]
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_232]
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_232]
	at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121) ~[hadoop-common-2.8.5.jar:?]
	at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:88) ~[hadoop-common-2.8.5.jar:?]
	at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:849) ~[hadoop-hdfs-client-2.8.5.jar:?]
	at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:836) ~[hadoop-hdfs-client-2.8.5.jar:?]
	at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:825) ~[hadoop-hdfs-client-2.8.5.jar:?]
	at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:325) ~[hadoop-hdfs-client-2.8.5.jar:?]
	at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:285) ~[hadoop-hdfs-client-2.8.5.jar:?]
	at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:270) ~[hadoop-hdfs-client-2.8.5.jar:?]
	at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1064) ~[hadoop-hdfs-client-2.8.5.jar:?]
	at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:328) ~[hadoop-hdfs-client-2.8.5.jar:?]
	at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:325) ~[hadoop-hdfs-client-2.8.5.jar:?]
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) ~[hadoop-common-2.8.5.jar:?]
	at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:325) ~[hadoop-hdfs-client-2.8.5.jar:?]
	at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:787) ~[hadoop-common-2.8.5.jar:?]
	at org.apache.druid.storage.hdfs.HdfsDataSegmentPuller$2.openInputStream(HdfsDataSegmentPuller.java:124) ~[?:?]
	at org.apache.druid.storage.hdfs.HdfsDataSegmentPuller.getInputStream(HdfsDataSegmentPuller.java:298) ~[?:?]
	at org.apache.druid.storage.hdfs.HdfsDataSegmentPuller$3.openStream(HdfsDataSegmentPuller.java:249) ~[?:?]
	at org.apache.druid.utils.CompressionUtils.lambda$unzip$1(CompressionUtils.java:182) ~[druid-core-0.19.0-SNAPSHOT.jar:0.19.0-SNAPSHOT]
	at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:87) ~[druid-core-0.19.0-SNAPSHOT.jar:0.19.0-SNAPSHOT]
	at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:115) ~[druid-core-0.19.0-SNAPSHOT.jar:0.19.0-SNAPSHOT]
	at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:105) ~[druid-core-0.19.0-SNAPSHOT.jar:0.19.0-SNAPSHOT]
	at org.apache.druid.utils.Compressi
```
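The failing path in the trace shows the datasource name with URI-style percent-escapes (space as %20, '?' as %3F) while the non-ASCII characters remain literal, suggesting the segment was written under one form of the name and looked up under another. A minimal standalone sketch (hypothetical name and simplified escaping, not Druid's or Hadoop's actual path code) of why the two forms cannot match:

```java
public class SegmentPathEscapingSketch {
  public static void main(String[] args) {
    // Hypothetical datasource name with non-ASCII chars, as in the report
    String dataSource = "wikipedia_hadoop_index_test Россия 한국 中国!?";

    // Mimic the escaping visible in the stack trace: space and '?' are
    // percent-escaped while the non-ASCII characters are left intact
    String escapedPath = dataSource.replace(" ", "%20").replace("?", "%3F");

    // The escaped form differs from the raw name, so a file written under
    // one form cannot be found when opened under the other
    System.out.println(escapedPath);
    System.out.println(escapedPath.equals(dataSource)); // prints false
  }
}
```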
[GitHub] [druid] jihoonson commented on a change in pull request #9760: Fix potential NPEs in joins
jihoonson commented on a change in pull request #9760: URL: https://github.com/apache/druid/pull/9760#discussion_r417032580 ## File path: processing/src/main/java/org/apache/druid/segment/virtual/MultiValueExpressionDimensionSelector.java ## @@ -50,11 +50,13 @@ public MultiValueExpressionDimensionSelector(ColumnValueSelector baseS this.baseSelector = baseSelector; } + @Nullable Review comment: Oh I meant for this method. This method is technically private and only used in this class. Some of the callers are still not checking nulls, for example [here](https://github.com/apache/druid/pull/9760/files/79eb23832004e90ed498aa1448d3bc9513cd6b50#diff-2ae123ea2c178ff9151ddbb31ed5f2b8R108).
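The thread above turns on the contract a `@Nullable` annotation implies: once a method is marked as possibly returning null, every caller must check before dereferencing. A minimal standalone sketch (hypothetical names, not the actual `MultiValueExpressionDimensionSelector` code) of that contract:

```java
public class NullableContractSketch {
  // Stand-in for javax.annotation.Nullable so the sketch compiles
  // without the JSR-305 dependency
  @interface Nullable {}

  // The annotation documents that callers must be prepared for null
  @Nullable
  static String getValue(Object row) {
    return row == null ? null : row.toString();
  }

  public static void main(String[] args) {
    String value = getValue(null);
    // A caller that dereferences without this check risks a
    // NullPointerException, which is what the review is guarding against
    int length = (value == null) ? 0 : value.length();
    System.out.println(length); // prints 0
  }
}
```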
[GitHub] [druid] suneet-s commented on a change in pull request #9760: Fix potential NPEs in joins
suneet-s commented on a change in pull request #9760: URL: https://github.com/apache/druid/pull/9760#discussion_r417031506 ## File path: processing/src/main/java/org/apache/druid/segment/virtual/MultiValueExpressionDimensionSelector.java ## @@ -50,11 +50,13 @@ public MultiValueExpressionDimensionSelector(ColumnValueSelector baseS this.baseSelector = baseSelector; } + @Nullable Review comment: I'd like to do this in phases; there are about 600+ issues that IntelliJ reports in the druid-processing submodule. I'll pick through more of them in my next PR. Once they are fixed, I will enable these warnings in the inspections job.
[GitHub] [druid] maytasm commented on a change in pull request #9783: Integration tests for stream ingestion with various data formats
maytasm commented on a change in pull request #9783: URL: https://github.com/apache/druid/pull/9783#discussion_r417007253 ## File path: integration-tests/src/test/java/org/apache/druid/tests/indexer/AbstractStreamIndexingTest.java ## @@ -67,92 +92,147 @@ private IntegrationTestingConfig config; private StreamAdminClient streamAdminClient; - private WikipediaStreamEventStreamGenerator wikipediaStreamEventGenerator; abstract StreamAdminClient createStreamAdminClient(IntegrationTestingConfig config) throws Exception; - abstract StreamEventWriter createStreamEventWriter(IntegrationTestingConfig config) throws Exception; - abstract Function generateStreamIngestionPropsTransform(String streamName, - String fullDatasourceName, - IntegrationTestingConfig config); + + abstract StreamEventWriter createStreamEventWriter(IntegrationTestingConfig config, boolean transactionEnabled) + throws Exception; + + abstract Function generateStreamIngestionPropsTransform( + String streamName, + String fullDatasourceName, + String parserType, + String parserOrInputFormat, + IntegrationTestingConfig config + ); + abstract Function generateStreamQueryPropsTransform(String streamName, String fullDatasourceName); + public abstract String getTestNamePrefix(); protected void doBeforeClass() throws Exception { streamAdminClient = createStreamAdminClient(config); -wikipediaStreamEventGenerator = new WikipediaStreamEventStreamGenerator(EVENTS_PER_SECOND, CYCLE_PADDING_MS); } - protected void doClassTeardown() + private static String getOnlyResourcePath(String resourceRoot) throws IOException { -wikipediaStreamEventGenerator.shutdown(); +return String.join("/", resourceRoot, Iterables.getOnlyElement(listResources(resourceRoot))); } - protected void doTestIndexDataWithLegacyParserStableState() throws Exception + protected static List listDataFormatResources() throws IOException { -StreamEventWriter streamEventWriter = createStreamEventWriter(config); -final GeneratedTestConfig generatedTestConfig = 
new GeneratedTestConfig(); -try ( -final Closeable ignored1 = unloader(generatedTestConfig.getFullDatasourceName()) -) { - final String taskSpec = generatedTestConfig.getStreamIngestionPropsTransform().apply(getResourceAsString(INDEXER_FILE_LEGACY_PARSER)); - LOG.info("supervisorSpec: [%s]\n", taskSpec); - // Start supervisor - generatedTestConfig.setSupervisorId(indexer.submitSupervisor(taskSpec)); - LOG.info("Submitted supervisor"); - // Start data generator - wikipediaStreamEventGenerator.run(generatedTestConfig.getStreamName(), streamEventWriter, TOTAL_NUMBER_OF_SECOND, FIRST_EVENT_TIME); - verifyIngestedData(generatedTestConfig); +return listResources(DATA_RESOURCE_ROOT) +.stream() +.filter(resource -> !SUPERVISOR_SPEC_TEMPLATE_FILE.equals(resource)) +.collect(Collectors.toList()); + } + + /** + * Returns a map of key to path to spec. The returned map contains at least 2 specs and one of them + * should be a {@link #SERIALIZER} spec. + */ + protected static Map findTestSpecs(String resourceRoot) throws IOException + { +final List specDirs = listResources(resourceRoot); +final Map map = new HashMap<>(); +for (String eachSpec : specDirs) { + if (SERIALIZER_SPEC_DIR.equals(eachSpec)) { +map.put(SERIALIZER, getOnlyResourcePath(String.join("/", resourceRoot, SERIALIZER_SPEC_DIR))); + } else if (INPUT_ROW_PARSER_SPEC_DIR.equals(eachSpec)) { +map.put(INPUT_ROW_PARSER, getOnlyResourcePath(String.join("/", resourceRoot, INPUT_ROW_PARSER_SPEC_DIR))); + } else if (INPUT_FORMAT_SPEC_DIR.equals(eachSpec)) { +map.put(INPUT_FORMAT, getOnlyResourcePath(String.join("/", resourceRoot, INPUT_FORMAT_SPEC_DIR))); + } } -finally { - doMethodTeardown(generatedTestConfig, streamEventWriter); +if (!map.containsKey(SERIALIZER_SPEC_DIR)) { + throw new IAE("Failed to find serializer spec under [%s]. Found resources are %s", resourceRoot, map); } +if (map.size() == 1) { + throw new IAE("Failed to find input format or parser spec under [%s]. 
Found resources are %s", resourceRoot, map); +} +return map; + } + + private Closeable createResourceCloser(GeneratedTestConfig generatedTestConfig) + { +return Closer.create().register(() -> doMethodTeardown(generatedTestConfig)); } - protected void doTestIndexDataWithInputFormatStableState() throws Exception + protected void doTestIndexDataStableState( + boolean transactionEnabled, + String serializerPath, + String parserType, + String specPath + ) throws Exception { -StreamEventWriter streamEventWriter = cr
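The refactor in the diff above replaces explicit try/finally teardown with `createResourceCloser`, which wraps `doMethodTeardown` in a `Closeable` registered on a Guava `Closer`. A minimal standalone sketch (hypothetical names) of the same pattern using plain `java.io.Closeable` with try-with-resources, so teardown runs even if the test body throws:

```java
import java.io.Closeable;
import java.io.IOException;

public class ResourceCloserSketch {
  static final StringBuilder LOG = new StringBuilder();

  // Analogous to createResourceCloser(...): the returned Closeable
  // runs the teardown logic when closed
  static Closeable createResourceCloser(String testConfig) {
    return () -> LOG.append("teardown:").append(testConfig);
  }

  public static void main(String[] args) throws IOException {
    try (Closeable ignored = createResourceCloser("config1")) {
      // test body; teardown is guaranteed to run even if this throws
      LOG.append("body;");
    }
    System.out.println(LOG); // prints body;teardown:config1
  }
}
```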
[GitHub] [druid] maytasm commented on a change in pull request #9783: Integration tests for stream ingestion with various data formats
maytasm commented on a change in pull request #9783: URL: https://github.com/apache/druid/pull/9783#discussion_r417007253

## File path: integration-tests/src/test/java/org/apache/druid/tests/indexer/AbstractStreamIndexingTest.java

## @@ -67,92 +92,147 @@

```diff
   private IntegrationTestingConfig config;
   private StreamAdminClient streamAdminClient;
-  private WikipediaStreamEventStreamGenerator wikipediaStreamEventGenerator;
 
   abstract StreamAdminClient createStreamAdminClient(IntegrationTestingConfig config) throws Exception;
 
-  abstract StreamEventWriter createStreamEventWriter(IntegrationTestingConfig config) throws Exception;
-  abstract Function generateStreamIngestionPropsTransform(String streamName,
-                                                          String fullDatasourceName,
-                                                          IntegrationTestingConfig config);
+
+  abstract StreamEventWriter createStreamEventWriter(IntegrationTestingConfig config, boolean transactionEnabled)
+      throws Exception;
+
+  abstract Function generateStreamIngestionPropsTransform(
+      String streamName,
+      String fullDatasourceName,
+      String parserType,
+      String parserOrInputFormat,
+      IntegrationTestingConfig config
+  );
+
   abstract Function generateStreamQueryPropsTransform(String streamName, String fullDatasourceName);
+
+  public abstract String getTestNamePrefix();
 
   protected void doBeforeClass() throws Exception
   {
     streamAdminClient = createStreamAdminClient(config);
-    wikipediaStreamEventGenerator = new WikipediaStreamEventStreamGenerator(EVENTS_PER_SECOND, CYCLE_PADDING_MS);
   }
 
-  protected void doClassTeardown()
+  private static String getOnlyResourcePath(String resourceRoot) throws IOException
   {
-    wikipediaStreamEventGenerator.shutdown();
+    return String.join("/", resourceRoot, Iterables.getOnlyElement(listResources(resourceRoot)));
   }
 
-  protected void doTestIndexDataWithLegacyParserStableState() throws Exception
+  protected static List listDataFormatResources() throws IOException
   {
-    StreamEventWriter streamEventWriter = createStreamEventWriter(config);
-    final GeneratedTestConfig generatedTestConfig = new GeneratedTestConfig();
-    try (
-        final Closeable ignored1 = unloader(generatedTestConfig.getFullDatasourceName())
-    ) {
-      final String taskSpec = generatedTestConfig.getStreamIngestionPropsTransform().apply(getResourceAsString(INDEXER_FILE_LEGACY_PARSER));
-      LOG.info("supervisorSpec: [%s]\n", taskSpec);
-      // Start supervisor
-      generatedTestConfig.setSupervisorId(indexer.submitSupervisor(taskSpec));
-      LOG.info("Submitted supervisor");
-      // Start data generator
-      wikipediaStreamEventGenerator.run(generatedTestConfig.getStreamName(), streamEventWriter, TOTAL_NUMBER_OF_SECOND, FIRST_EVENT_TIME);
-      verifyIngestedData(generatedTestConfig);
+    return listResources(DATA_RESOURCE_ROOT)
+        .stream()
+        .filter(resource -> !SUPERVISOR_SPEC_TEMPLATE_FILE.equals(resource))
+        .collect(Collectors.toList());
+  }
+
+  /**
+   * Returns a map of key to path to spec. The returned map contains at least 2 specs and one of them
+   * should be a {@link #SERIALIZER} spec.
+   */
+  protected static Map findTestSpecs(String resourceRoot) throws IOException
+  {
+    final List specDirs = listResources(resourceRoot);
+    final Map map = new HashMap<>();
+    for (String eachSpec : specDirs) {
+      if (SERIALIZER_SPEC_DIR.equals(eachSpec)) {
+        map.put(SERIALIZER, getOnlyResourcePath(String.join("/", resourceRoot, SERIALIZER_SPEC_DIR)));
+      } else if (INPUT_ROW_PARSER_SPEC_DIR.equals(eachSpec)) {
+        map.put(INPUT_ROW_PARSER, getOnlyResourcePath(String.join("/", resourceRoot, INPUT_ROW_PARSER_SPEC_DIR)));
+      } else if (INPUT_FORMAT_SPEC_DIR.equals(eachSpec)) {
+        map.put(INPUT_FORMAT, getOnlyResourcePath(String.join("/", resourceRoot, INPUT_FORMAT_SPEC_DIR)));
+      }
     }
-    finally {
-      doMethodTeardown(generatedTestConfig, streamEventWriter);
+    if (!map.containsKey(SERIALIZER_SPEC_DIR)) {
+      throw new IAE("Failed to find serializer spec under [%s]. Found resources are %s", resourceRoot, map);
     }
+    if (map.size() == 1) {
+      throw new IAE("Failed to find input format or parser spec under [%s]. Found resources are %s", resourceRoot, map);
+    }
+    return map;
+  }
+
+  private Closeable createResourceCloser(GeneratedTestConfig generatedTestConfig)
+  {
+    return Closer.create().register(() -> doMethodTeardown(generatedTestConfig));
   }
 
-  protected void doTestIndexDataWithInputFormatStableState() throws Exception
+  protected void doTestIndexDataStableState(
+      boolean transactionEnabled,
+      String serializerPath,
+      String parserType,
+      String specPath
+  ) throws Exception
   {
-    StreamEventWriter streamEventWriter = cr
```
[GitHub] [druid] jihoonson commented on a change in pull request #9783: Integration tests for stream ingestion with various data formats
jihoonson commented on a change in pull request #9783: URL: https://github.com/apache/druid/pull/9783#discussion_r417000607

## File path: integration-tests/src/test/java/org/apache/druid/tests/indexer/AbstractStreamIndexingTest.java

## @@ -67,92 +92,147 @@
[GitHub] [druid] maytasm commented on a change in pull request #9783: Integration tests for stream ingestion with various data formats
maytasm commented on a change in pull request #9783: URL: https://github.com/apache/druid/pull/9783#discussion_r416997861

## File path: integration-tests/src/test/resources/testng.xml

## @@ -20,7 +20,7 @@

Review comment: nit: might be helpful to add what this does / how to use in README.md

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [druid] maytasm commented on a change in pull request #9783: Integration tests for stream ingestion with various data formats
maytasm commented on a change in pull request #9783: URL: https://github.com/apache/druid/pull/9783#discussion_r416995812

## File path: integration-tests/src/test/java/org/apache/druid/tests/indexer/AbstractStreamIndexingTest.java

## @@ -67,92 +92,147 @@
[GitHub] [druid] suneet-s commented on a change in pull request #9760: Fix potential NPEs in joins
suneet-s commented on a change in pull request #9760: URL: https://github.com/apache/druid/pull/9760#discussion_r416992856

## File path: processing/src/main/java/org/apache/druid/segment/join/lookup/LookupJoinMatcher.java

```diff
@@ -277,7 +277,7 @@ public void matchRemainder()
     } else {
       currentIterator = Iterators.filter(
           extractor.iterable().iterator(),
-          entry -> !matchedKeys.contains(entry.getKey())
+          entry -> entry != null && !matchedKeys.contains(entry.getKey())
```

Review comment: good call. Looks like this is on a hot path too, so I've added an inspection suppression instead of an assert not null. 👍
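The change quoted above guards a filter predicate against null elements before dereferencing `entry.getKey()`. A standalone sketch of that pattern using only `java.util` (class and variable names here are illustrative, not Druid's actual `LookupJoinMatcher` internals):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class NullSafeFilterSketch
{
  public static void main(String[] args)
  {
    // Keys already matched earlier in a hypothetical join pass.
    Set<String> matchedKeys = new HashSet<>(Set.of("a"));

    // An entry list in which, per the inspection warning, an element
    // could in theory be null.
    List<Map.Entry<String, String>> entries = new ArrayList<>();
    entries.add(Map.entry("a", "1"));
    entries.add(null);
    entries.add(Map.entry("b", "2"));

    // The defensive form from the diff: check for null before
    // dereferencing entry.getKey().
    List<Map.Entry<String, String>> remainder = entries
        .stream()
        .filter(entry -> entry != null && !matchedKeys.contains(entry.getKey()))
        .collect(Collectors.toList());

    System.out.println(remainder); // prints [b=2]
  }
}
```

Whether to keep the runtime check or suppress the inspection (as the commenter chose for the hot path) trades a per-element branch against trusting the upstream contract that entries are never null.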
[GitHub] [druid] jihoonson commented on a change in pull request #9760: Fix potential NPEs in joins
jihoonson commented on a change in pull request #9760: URL: https://github.com/apache/druid/pull/9760#discussion_r416985934

## File path: processing/src/main/java/org/apache/druid/segment/join/lookup/LookupJoinMatcher.java

```diff
@@ -277,7 +277,7 @@ public void matchRemainder()
     } else {
       currentIterator = Iterators.filter(
           extractor.iterable().iterator(),
-          entry -> !matchedKeys.contains(entry.getKey())
+          entry -> entry != null && !matchedKeys.contains(entry.getKey())
```

Review comment: Yeah. It also doesn't seem reasonable that a "row" can be null. I think it's a false warning because of the `Predicate` interface of Guava. It would be better to suppress the warning instead of checking null.

## File path: processing/src/main/java/org/apache/druid/segment/virtual/MultiValueExpressionDimensionSelector.java

```diff
@@ -50,11 +50,13 @@ public MultiValueExpressionDimensionSelector(ColumnValueSelector baseS
     this.baseSelector = baseSelector;
   }
 
+  @Nullable
```

Review comment: Do all callers need to check the returned eval? There seem to be a couple of missing places.
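The second review comment above asks whether every caller of a newly `@Nullable` method actually handles the null case. A minimal sketch of the caller-side obligation, using a plain `Map` lookup as a stand-in (the names here are illustrative, not Druid's actual selector API):

```java
import java.util.Map;

public class NullableEvalSketch
{
  // Stand-in for a method that, like the @Nullable-annotated getter in
  // the diff, may return null when there is nothing to evaluate.
  static String evaluate(Map<String, String> row, String column)
  {
    return row.get(column); // null when the column is absent
  }

  public static void main(String[] args)
  {
    Map<String, String> row = Map.of("dim1", "abc");

    // Every caller of a @Nullable method must handle the null case
    // explicitly; skipping this check is exactly the kind of "missing
    // place" the review comment is pointing at.
    String eval = evaluate(row, "dim2");
    String rendered = (eval == null) ? "null" : eval.toUpperCase();
    System.out.println(rendered); // prints null
  }
}
```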
[GitHub] [druid] suneet-s commented on a change in pull request #9760: Fix potential NPEs in joins
suneet-s commented on a change in pull request #9760: URL: https://github.com/apache/druid/pull/9760#discussion_r416986933

## File path: processing/src/main/java/org/apache/druid/segment/join/PossiblyNullDimensionSelector.java

```diff
@@ -138,7 +138,8 @@ public int lookupId(@Nullable String name)
       // id 0 is always null for this selector impl.
       return 0;
     } else {
-      return baseSelector.idLookup().lookupId(name) + nullAdjustment;
+      IdLookup idLookup = baseSelector.idLookup();
+      return (idLookup == null ? 0 : idLookup.lookupId(name)) + nullAdjustment;
```

Review comment: Added an assert statement with an explanation of why it should never be null in here
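The resolution above replaces a silent default with an assert documenting the caller contract. A standalone sketch of that pattern (the `IdLookup` interface and method names here are simplified stand-ins, not Druid's actual classes):

```java
public class AssertNonNullSketch
{
  interface IdLookup
  {
    int lookupId(String name);
  }

  // Assert (with a reason) rather than silently substituting a default
  // when idLookup is unexpectedly null.
  static int lookupId(IdLookup idLookup, String name, int nullAdjustment)
  {
    // Callers are expected to have checked idLookup() for null before
    // calling lookupId, so a null here indicates a caller bug.
    assert idLookup != null : "caller must check idLookup() for null first";
    return idLookup.lookupId(name) + nullAdjustment;
  }

  public static void main(String[] args)
  {
    IdLookup lookup = name -> name.length();
    System.out.println(lookupId(lookup, "abc", 1)); // prints 4
  }
}
```

The assert costs nothing in production (disabled unless `-ea` is passed) while still catching contract violations in tests.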
[GitHub] [druid] suneet-s commented on a change in pull request #9760: Fix potential NPEs in joins
suneet-s commented on a change in pull request #9760: URL: https://github.com/apache/druid/pull/9760#discussion_r416984862

## File path: processing/src/main/java/org/apache/druid/segment/join/JoinConditionAnalysis.java

```diff
@@ -79,7 +79,7 @@ private JoinConditionAnalysis(
         .allMatch(expr -> expr.isLiteral() && expr.eval(ExprUtils.nilBindings()).asBoolean());
     canHashJoin = nonEquiConditions.stream().allMatch(Expr::isLiteral);
-    rightKeyColumns = getEquiConditions().stream().map(Equality::getRightColumn).distinct().collect(Collectors.toSet());
+    rightKeyColumns = getEquiConditions().stream().map(Equality::getRightColumn).collect(Collectors.toSet());
```

Review comment: should be a no-op. IntelliJ suggested it's unnecessary since we're collecting to a set
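The "no-op" claim above can be demonstrated directly: `Collectors.toSet()` already de-duplicates, so a preceding `distinct()` changes nothing. A small self-contained check (the column names are made up for illustration):

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class DistinctToSetSketch
{
  public static void main(String[] args)
  {
    // Duplicate right-hand columns, as a set of equi-conditions could produce.
    List<String> rightColumns = List.of("k", "v", "k");

    // With distinct(): an extra intermediate de-duplication step.
    Set<String> withDistinct =
        rightColumns.stream().distinct().collect(Collectors.toSet());

    // Without distinct(): Collectors.toSet() already de-duplicates,
    // so the result is identical.
    Set<String> withoutDistinct =
        rightColumns.stream().collect(Collectors.toSet());

    System.out.println(withDistinct.equals(withoutDistinct)); // prints true
  }
}
```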
[GitHub] [druid] jon-wei commented on a change in pull request #9760: Fix potential NPEs in joins
jon-wei commented on a change in pull request #9760: URL: https://github.com/apache/druid/pull/9760#discussion_r416975593

## File path: processing/src/main/java/org/apache/druid/segment/join/JoinConditionAnalysis.java

```diff
@@ -79,7 +79,7 @@ private JoinConditionAnalysis(
         .allMatch(expr -> expr.isLiteral() && expr.eval(ExprUtils.nilBindings()).asBoolean());
     canHashJoin = nonEquiConditions.stream().allMatch(Expr::isLiteral);
-    rightKeyColumns = getEquiConditions().stream().map(Equality::getRightColumn).distinct().collect(Collectors.toSet());
+    rightKeyColumns = getEquiConditions().stream().map(Equality::getRightColumn).collect(Collectors.toSet());
```

Review comment: What's the impact of removing the `distinct()` call (is it just unnecessary since they're being collected to a set already?)

## File path: processing/src/main/java/org/apache/druid/segment/join/PossiblyNullDimensionSelector.java

```diff
@@ -138,7 +138,8 @@ public int lookupId(@Nullable String name)
       // id 0 is always null for this selector impl.
       return 0;
     } else {
-      return baseSelector.idLookup().lookupId(name) + nullAdjustment;
+      IdLookup idLookup = baseSelector.idLookup();
+      return (idLookup == null ? 0 : idLookup.lookupId(name)) + nullAdjustment;
```

Review comment: A null `idLookup` there should be an error condition, since the caller should check if `idLookup()` returned null before calling `lookupId(name)` on the `PossiblyNullDimensionSelector`

## File path: processing/src/main/java/org/apache/druid/segment/join/lookup/LookupJoinable.java

```diff
@@ -95,18 +96,23 @@ public JoinMatcher makeJoinMatcher(
     boolean allowNonKeyColumnSearch
 )
 {
+    if (!ALL_COLUMNS.contains(searchColumnName) || !ALL_COLUMNS.contains(retrievalColumnName)) {
+      return ImmutableSet.of();
+    }
     Set correlatedValues;
     if (LookupColumnSelectorFactory.KEY_COLUMN.equals(searchColumnName)) {
       if (LookupColumnSelectorFactory.KEY_COLUMN.equals(retrievalColumnName)) {
         correlatedValues = ImmutableSet.of(searchColumnValue);
       } else {
-        correlatedValues = ImmutableSet.of(extractor.apply(searchColumnName));
+        // This should not happen in practice because the column to be joined on must be a key.
+        correlatedValues = Collections.singleton(extractor.apply(searchColumnValue));
```

Review comment: :+1:
[GitHub] [druid] suneet-s opened a new issue #9787: Filter pushdown for joins does not work when a non-matching condition is on the RHS
suneet-s opened a new issue #9787: URL: https://github.com/apache/druid/issues/9787

### Affected Version

0.18

### Description

See the following CalciteQueryTest:

```java
@Test
public void testFilterAndGroupByLookupUsingJoinOperatorWithValueFilterPushdownMatchesNothig() throws Exception
{
  // Cannot vectorize JOIN operator.
  cannotVectorize();

  Map queryRewriteValueColumnFiltersContext = DEFAULT_QUERY_CONTEXT_BUILDER
      .put(QueryContexts.JOIN_FILTER_REWRITE_VALUE_COLUMN_FILTERS_ENABLE_KEY, true)
      .build();
  testQuery(
      "SELECT lookyloo.k, COUNT(*)\n"
      + "FROM foo LEFT JOIN lookup.lookyloo ON foo.dim2 = lookyloo.k\n"
      + "WHERE lookyloo.v = '123'\n"
      + "GROUP BY lookyloo.k",
      queryRewriteValueColumnFiltersContext,
      ImmutableList.of(
          GroupByQuery.builder()
                      .setDataSource(
                          join(
                              new TableDataSource(CalciteTests.DATASOURCE1),
                              new LookupDataSource("lookyloo"),
                              "j0.",
                              equalsCondition(DruidExpression.fromColumn("dim2"), DruidExpression.fromColumn("j0.k")),
                              JoinType.LEFT
                          )
                      )
                      .setInterval(querySegmentSpec(Filtration.eternity()))
                      .setDimFilter(selector("j0.v", "123", null))
                      .setGranularity(Granularities.ALL)
                      .setDimensions(dimensions(new DefaultDimensionSpec("j0.k", "d0")))
                      .setAggregatorSpecs(aggregators(new CountAggregatorFactory("a0")))
                      .setContext(queryRewriteValueColumnFiltersContext)
                      .build()
      ),
      ImmutableList.of()
  );
}
```

Since the filter `lookyloo.v = '123'` matches nothing, Druid treats this as if filter pushdown was disabled, so it needs to do the hash join for every row in the fact table only to realize that the dimension does not match.

The filter pushdown mechanism should handle the following cases:
* can't push down for some reason (like disabled by config)
* non-existent column
* no matches
* matches some values
[GitHub] [druid] maytasm commented on a change in pull request #9714: More Hadoop integration tests
maytasm commented on a change in pull request #9714: URL: https://github.com/apache/druid/pull/9714#discussion_r416969482

## File path: integration-tests/src/test/java/org/apache/druid/tests/TestNGGroup.java

```diff
@@ -84,6 +82,15 @@
   */
  public static final String HDFS_DEEP_STORAGE = "hdfs-deep-storage";
 
+ public static final String HADOOP_S3_TO_S3 = "hadoop-s3-to-s3-deep-storage";
```

Review comment: Ahhh actually you don't since we exclude the hadoop test package in testng.xml.
[GitHub] [druid] maytasm commented on a change in pull request #9714: More Hadoop integration tests
maytasm commented on a change in pull request #9714: URL: https://github.com/apache/druid/pull/9714#discussion_r416972412

## File path: integration-tests/src/test/java/org/apache/druid/tests/TestNGGroup.java

```diff
@@ -84,6 +82,15 @@
   */
  public static final String HDFS_DEEP_STORAGE = "hdfs-deep-storage";
 
+ public static final String HADOOP_S3_TO_S3 = "hadoop-s3-to-s3-deep-storage";
```

Review comment: Wait, then your -Dgroups will not work. I believe if you run the mvn cmd with -Dgroups, it will use the suite defined in testng.xml. This suite excludes running the hadoop package (org.apache.druid.tests.hadoop). I think one solution is to remove the exclusion of the hadoop package from testng.xml and add all the new groups to the exclusion list in Travis's "other" integration test group.
[GitHub] [druid] jihoonson commented on a change in pull request #9740: fill out missing test coverage for druid-stats, druid-momentsketch, druid-tdigestsketch postaggs
jihoonson commented on a change in pull request #9740: URL: https://github.com/apache/druid/pull/9740#discussion_r416966480

## File path: extensions-contrib/tdigestsketch/src/test/java/org/apache/druid/query/aggregation/tdigestsketch/TDigestSketchToQuantilePostAggregatorTest.java

## @@ -0,0 +1,83 @@

```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied. See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */

package org.apache.druid.query.aggregation.tdigestsketch;

import nl.jqno.equalsverifier.EqualsVerifier;
import org.apache.druid.jackson.DefaultObjectMapper;
import org.apache.druid.java.util.common.IAE;
import org.apache.druid.query.aggregation.PostAggregator;
import org.apache.druid.query.aggregation.post.ConstantPostAggregator;
import org.junit.Assert;
import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.ExpectedException;

public class TDigestSketchToQuantilePostAggregatorTest
{
  @Rule
  public ExpectedException expectedException = ExpectedException.none();

  @Test
  public void testSerde() throws Exception
  {
    TDigestSketchToQuantilePostAggregator there =
        new TDigestSketchToQuantilePostAggregator("post", new ConstantPostAggregator("", 100), 0.5);

    DefaultObjectMapper mapper = new DefaultObjectMapper();
    TDigestSketchToQuantilePostAggregator andBackAgain = mapper.readValue(
        mapper.writeValueAsString(there),
        TDigestSketchToQuantilePostAggregator.class
    );

    Assert.assertEquals(there, andBackAgain);
    Assert.assertArrayEquals(there.getCacheKey(), andBackAgain.getCacheKey());
    Assert.assertEquals(there.getDependentFields(), andBackAgain.getDependentFields());
  }

  @Test
  public void testToString()
  {
    PostAggregator postAgg =
        new TDigestSketchToQuantilePostAggregator("post", new ConstantPostAggregator("", 100), 0.5);

    Assert.assertEquals(
        "TDigestSketchToQuantilePostAggregator{name='post', field=ConstantPostAggregator{name='', constantValue=100}, fraction=0.5}",
        postAgg.toString()
    );
  }

  @Test
  public void testComparator()
  {
    expectedException.expect(IAE.class);
    expectedException.expectMessage("Comparing arrays of quantiles is not supported");
```

Review comment: Seems like the comparator should compare doubles instead of throwing an exception.
[GitHub] [druid] maytasm commented on a change in pull request #9714: More Hadoop integration tests
maytasm commented on a change in pull request #9714: URL: https://github.com/apache/druid/pull/9714#discussion_r416965658

## File path: integration-tests/src/test/java/org/apache/druid/tests/TestNGGroup.java

```diff
@@ -84,6 +82,15 @@
   */
  public static final String HDFS_DEEP_STORAGE = "hdfs-deep-storage";
 
+ public static final String HADOOP_S3_TO_S3 = "hadoop-s3-to-s3-deep-storage";
```

Review comment: I think you also have to exclude this from Travis's other integration test group
[GitHub] [druid] maytasm commented on a change in pull request #9714: More Hadoop integration tests
maytasm commented on a change in pull request #9714: URL: https://github.com/apache/druid/pull/9714#discussion_r416953397 ## File path: integration-tests/src/test/java/org/apache/druid/tests/hadoop/ITGcsInputToGcsHadoopIndexTest.java ## @@ -29,17 +29,18 @@ * To run this test, you must: * 1) Set the bucket and path for your data. This can be done by setting -Ddruid.test.config.cloudBucket and *-Ddruid.test.config.cloudPath or setting "cloud_bucket" and "cloud_path" in the config file. - * 2. Set -Ddruid.test.config.hadoopGcsCredentialsPath to the location of your Google credentials file as it + * 2) Set -Ddruid.test.config.hadoopGcsCredentialsPath to the location of your Google credentials file as it *exists within the Hadoop cluster that will ingest the data. The credentials file can be placed in the *shared folder used by the integration test containers if running the Docker-based Hadoop container, *in which case this property can be set to /shared/ - * 2) Copy wikipedia_index_data1.json, wikipedia_index_data2.json, and wikipedia_index_data3.json + * 3) Provide -Dresource.file.dir.path= with folder that contains GOOGLE_APPLICATION_CREDENTIALS file + * 4) Copy wikipedia_index_data1.json, wikipedia_index_data2.json, and wikipedia_index_data3.json *located in integration-tests/src/test/resources/data/batch_index/json to your GCS at the location set in step 1. - * 3) Provide -Doverride.config.path= with gcs configs set. See + * 5) Provide -Doverride.config.path= with gcs configs set. See * integration-tests/docker/environment-configs/override-examples/hadoop/gcs_to_gcs for env vars to provide. 
- * 4) Run the test with -Dstart.hadoop.docker=true -Dextra.datasource.name.suffix='' in the mvn command + * 6) Run the test with -Dstart.hadoop.docker=true -Dextra.datasource.name.suffix='' in the mvn command Review comment: nit: maybe mention that -Dextra.datasource.name.suffix='' is due to github issue xxx (not sure if we have an issue for this) ## File path: integration-tests/README.md ## @@ -214,31 +214,32 @@ of the integration test run discussed above. This is because druid test clusters might not, in general, have access to hadoop. This also applies to integration test that uses Hadoop HDFS as an inputSource or as a deep storage. To run integration test that uses Hadoop, you will have to run a Hadoop cluster. This can be done in two ways: +1) Run Druid Docker test clusters with Hadoop container by passing -Dstart.hadoop.docker=true to the mvn command. 1) Run your own Druid + Hadoop cluster and specified Hadoop configs in the configuration file (CONFIG_FILE). -2) Run Druid Docker test clusters with Hadoop container by passing -Dstart.hadoop.docker=true to the mvn command. Currently, hdfs-deep-storage and other -deep-storage integration test groups can only be run with Druid Docker test clusters by passing -Dstart.hadoop.docker=true to start Hadoop container. You will also have to provide -Doverride.config.path= with your Druid's Hadoop configs set. See integration-tests/docker/environment-configs/override-examples/hdfs directory for example. Note that if the integration test you are running also uses other cloud extension (S3, Azure, GCS), additional -credentials/configs may need to be set in the same file as your Druid's Hadoop configs set. +credentials/configs may need to be set in the same file as your Druid's Hadoop configs set. 
Currently, ITHadoopIndexTest can only be run with your own Druid + Hadoop cluster by following the below steps: -Create a directory called batchHadoop1 in the hadoop file system -(anywhere you want) and put batch_hadoop.data (integration-tests/src/test/resources/hadoop/batch_hadoop.data) -into that directory (as its only file). - -Add this keyword to the configuration file (see above): +- Copy wikipedia_index_data1.json, wikipedia_index_data2.json, and wikipedia_index_data3.json + located in integration-tests/src/test/resources/data/batch_index/json to your HDFS at /batch_index/json/ + If using the Docker-based Hadoop container, this is automatically done by the integration tests. +- Copy batch_hadoop.data located in integration-tests/src/test/resources/data/batch_index/tsv to your HDFS + at /batch_index/tsv/ + If using the Docker-based Hadoop container, this is automatically done by the integration tests. +Run the test using mvn (using the bundled Docker-based Hadoop cluster): ``` -"hadoopTestDir": "" + mvn verify -P integration-tests -Dit.test=ITHadoopIndexTest -Dstart.hadoop.docker=true -Doverride.config.path=docker/environment-configs/override-examples/hdfs -Dextra.datasource.name.suffix='' ``` -Run the test using mvn: - +Run the test using mvn (using config file for existing Hadoop cluster): ``` - mvn verify -P int-tests-config-file -Dit.test=ITHadoopIndexTest
[GitHub] [druid] ccaominh commented on a change in pull request #9783: Integration tests for stream ingestion with various data formats
ccaominh commented on a change in pull request #9783: URL: https://github.com/apache/druid/pull/9783#discussion_r416958063 ## File path: integration-tests/src/test/java/org/apache/druid/tests/parallelized/ITKinesisIndexingServiceDataFormatTest.java ## @@ -0,0 +1,96 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.druid.tests.parallelized; + +import com.fasterxml.jackson.databind.ObjectMapper; +import com.google.inject.Inject; +import org.apache.druid.guice.annotations.Json; +import org.apache.druid.testing.guice.DruidTestModuleFactory; +import org.apache.druid.tests.TestNGGroup; +import org.apache.druid.tests.indexer.AbstractKinesisIndexingServiceTest; +import org.apache.druid.tests.indexer.AbstractStreamIndexingTest; +import org.testng.annotations.BeforeClass; +import org.testng.annotations.DataProvider; +import org.testng.annotations.Guice; +import org.testng.annotations.Test; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.List; +import java.util.Map; + +@Test(groups = "kinesis-data-format") Review comment: Keeping it separate is nice for running the groups concurrently. 
Perhaps we can explore TestNG [groups of groups](https://testng.org/doc/documentation-main.html#groups-of-groups) later to simplify running multiple groups.
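For context, TestNG's "groups of groups" are declared in testng.xml rather than in annotations. A hypothetical sketch of what that could look like for these tests — the `kafka-data-format` name and suite layout are assumptions based on this thread, not the project's actual integration-tests/src/test/resources/testng.xml:

```xml
<!-- Hypothetical testng.xml fragment; group names here are assumptions. -->
<suite name="IntegrationTestSuite">
  <test name="stream-data-format-tests">
    <groups>
      <!-- A meta-group bundling the individual data-format groups. -->
      <define name="stream-data-format">
        <include name="kafka-data-format"/>
        <include name="kinesis-data-format"/>
      </define>
      <run>
        <include name="stream-data-format"/>
      </run>
    </groups>
  </test>
</suite>
```

Running the `stream-data-format` meta-group would then execute every test in either member group, while still letting the individual groups run concurrently on their own.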
[GitHub] [druid] jihoonson commented on a change in pull request #9783: Integration tests for stream ingestion with various data formats
jihoonson commented on a change in pull request #9783: URL: https://github.com/apache/druid/pull/9783#discussion_r416956654 ## File path: integration-tests/src/test/java/org/apache/druid/tests/parallelized/ITKinesisIndexingServiceDataFormatTest.java ## @@ -0,0 +1,96 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.druid.tests.parallelized; + +import com.fasterxml.jackson.databind.ObjectMapper; +import com.google.inject.Inject; +import org.apache.druid.guice.annotations.Json; +import org.apache.druid.testing.guice.DruidTestModuleFactory; +import org.apache.druid.tests.TestNGGroup; +import org.apache.druid.tests.indexer.AbstractKinesisIndexingServiceTest; +import org.apache.druid.tests.indexer.AbstractStreamIndexingTest; +import org.testng.annotations.BeforeClass; +import org.testng.annotations.DataProvider; +import org.testng.annotations.Guice; +import org.testng.annotations.Test; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.List; +import java.util.Map; + +@Test(groups = "kinesis-data-format") Review comment: On second thought, it might be useful to have a different group if we run this test on some CI in the future. 
I added a new group to `TestNGGroup`. The new group is disabled on Travis.
[GitHub] [druid] maytasm commented on a change in pull request #9714: More Hadoop integration tests
maytasm commented on a change in pull request #9714: URL: https://github.com/apache/druid/pull/9714#discussion_r416950821 ## File path: integration-tests/README.md ## @@ -214,31 +214,32 @@ of the integration test run discussed above. This is because druid test clusters might not, in general, have access to hadoop. This also applies to integration test that uses Hadoop HDFS as an inputSource or as a deep storage. To run integration test that uses Hadoop, you will have to run a Hadoop cluster. This can be done in two ways: +1) Run Druid Docker test clusters with Hadoop container by passing -Dstart.hadoop.docker=true to the mvn command. Review comment: nit: typo? both are 1) ## File path: integration-tests/run_cluster.sh ## @@ -74,17 +74,18 @@ # For druid-kinesis-indexing-service mkdir -p $SHARED_DIR/docker/extensions/druid-kinesis-indexing-service mv $SHARED_DIR/docker/lib/druid-kinesis-indexing-service-* $SHARED_DIR/docker/extensions/druid-kinesis-indexing-service - $ For druid-parquet-extensions + # For druid-parquet-extensions mkdir -p $SHARED_DIR/docker/extensions/druid-parquet-extensions mv $SHARED_DIR/docker/lib/druid-parquet-extensions-* $SHARED_DIR/docker/extensions/druid-parquet-extensions - $ For druid-orc-extensions + # For druid-orc-extensions mkdir -p $SHARED_DIR/docker/extensions/druid-orc-extensions mv $SHARED_DIR/docker/lib/druid-orc-extensions-* $SHARED_DIR/docker/extensions/druid-orc-extensions # Pull Hadoop dependency if needed if [ -n "$DRUID_INTEGRATION_TEST_START_HADOOP_DOCKER" ] && [ "$DRUID_INTEGRATION_TEST_START_HADOOP_DOCKER" == true ] then -java -cp "$SHARED_DIR/docker/lib/*" -Ddruid.extensions.hadoopDependenciesDir="$SHARED_DIR/hadoop-dependencies" org.apache.druid.cli.Main tools pull-deps -h org.apache.hadoop:hadoop-client:2.8.5 -h org.apache.hadoop:hadoop-aws:2.8.5 +java -cp "$SHARED_DIR/docker/lib/*" -Ddruid.extensions.hadoopDependenciesDir="$SHARED_DIR/hadoop-dependencies" org.apache.druid.cli.Main tools pull-deps -h 
org.apache.hadoop:hadoop-client:2.8.5 -h org.apache.hadoop:hadoop-aws:2.8.5 -h org.apache.hadoop:hadoop-azure:2.8.5 +curl https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-hadoop2-latest.jar --output $SHARED_DIR/docker/lib/gcs-connector-hadoop2-latest.jar Review comment: Should we create an issue for this? Seems like it might be a problem for new users going through the tutorial?
[GitHub] [druid] jihoonson commented on a change in pull request #9783: Integration tests for stream ingestion with various data formats
jihoonson commented on a change in pull request #9783: URL: https://github.com/apache/druid/pull/9783#discussion_r416944603 ## File path: integration-tests/src/test/java/org/apache/druid/tests/parallelized/ITKinesisIndexingServiceDataFormatTest.java ## @@ -0,0 +1,96 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.druid.tests.parallelized; + +import com.fasterxml.jackson.databind.ObjectMapper; +import com.google.inject.Inject; +import org.apache.druid.guice.annotations.Json; +import org.apache.druid.testing.guice.DruidTestModuleFactory; +import org.apache.druid.tests.TestNGGroup; +import org.apache.druid.tests.indexer.AbstractKinesisIndexingServiceTest; +import org.apache.druid.tests.indexer.AbstractStreamIndexingTest; +import org.testng.annotations.BeforeClass; +import org.testng.annotations.DataProvider; +import org.testng.annotations.Guice; +import org.testng.annotations.Test; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.List; +import java.util.Map; + +@Test(groups = "kinesis-data-format") Review comment: Oh, I forgot to revert this change. 
I think it would be better to add this test to the existing `kinesis-index` group since we run them manually and probably want to run them all at once. What do you think?
[GitHub] [druid] maytasm commented on a change in pull request #9778: Fix problem when running single integration test using -Dit.test=
maytasm commented on a change in pull request #9778: URL: https://github.com/apache/druid/pull/9778#discussion_r416943418 ## File path: integration-tests/src/main/java/org/testng/DruidTestRunnerFactory.java ## @@ -0,0 +1,109 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package /*CHECKSTYLE.OFF: PackageName*/org.testng/*CHECKSTYLE.ON: PackageName*/; + +import org.apache.druid.java.util.common.logger.Logger; +import org.apache.druid.testing.utils.SuiteListener; +import org.testng.internal.IConfiguration; +import org.testng.internal.Systematiser; +import org.testng.internal.annotations.IAnnotationFinder; +import org.testng.xml.XmlTest; + +import java.util.Collection; +import java.util.Collections; +import java.util.List; + +/** + * This class must be in package org.testng to access protected methods like TestNG.getDefault().getConfiguration() + */ +public class DruidTestRunnerFactory implements ITestRunnerFactory +{ + private static final Logger LOG = new Logger(DruidTestRunnerFactory.class); + private static final SuiteListener SUITE_LISTENER = new SuiteListener(); + + @Override + public TestRunner newTestRunner(ISuite suite, + XmlTest test, + Collection methodListeners, + List classListeners) + { +IConfiguration configuration = TestNG.getDefault().getConfiguration(); +String outputDirectory = suite.getOutputDirectory(); +IAnnotationFinder annotationFinder = configuration.getAnnotationFinder(); +Boolean skipFailedInvocationCounts = suite.getXmlSuite().skipFailedInvocationCounts(); +return new DruidTestRunner( +configuration, +suite, +test, +outputDirectory, +annotationFinder, +skipFailedInvocationCounts, +methodListeners, +classListeners +); + } + + private static class DruidTestRunner extends TestRunner + { +DruidTestRunner( +IConfiguration configuration, +ISuite suite, +XmlTest test, +String outputDirectory, +IAnnotationFinder finder, +boolean skipFailedInvocationCounts, +Collection methodListeners, +List classListeners +) +{ + super(configuration, suite, test, outputDirectory, finder, skipFailedInvocationCounts, methodListeners, classListeners, Systematiser.getComparator(), Collections.emptyMap()); +} + +@Override +public void run() +{ + try { +// IntegrationTestSuite is run when -Dit.test is not specify in maven 
command. +// IntegrationTestSuite uses the configuration from integration-tests/src/test/resources/testng.xml +// which already handle suite onStart and onFinish automatically +if (!"IntegrationTestSuite".equals(getSuite().getName())) { + SUITE_LISTENER.onStart(getSuite()); +} +runTests(); + } + catch (Exception e) { +LOG.error(e, ""); Review comment: Removed it.
[GitHub] [druid] ccaominh commented on a change in pull request #9783: Integration tests for stream ingestion with various data formats
ccaominh commented on a change in pull request #9783: URL: https://github.com/apache/druid/pull/9783#discussion_r416931177 ## File path: integration-tests/src/test/java/org/apache/druid/tests/parallelized/ITKinesisIndexingServiceDataFormatTest.java ## @@ -0,0 +1,96 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.druid.tests.parallelized; + +import com.fasterxml.jackson.databind.ObjectMapper; +import com.google.inject.Inject; +import org.apache.druid.guice.annotations.Json; +import org.apache.druid.testing.guice.DruidTestModuleFactory; +import org.apache.druid.tests.TestNGGroup; +import org.apache.druid.tests.indexer.AbstractKinesisIndexingServiceTest; +import org.apache.druid.tests.indexer.AbstractStreamIndexingTest; +import org.testng.annotations.BeforeClass; +import org.testng.annotations.DataProvider; +import org.testng.annotations.Guice; +import org.testng.annotations.Test; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.List; +import java.util.Map; + +@Test(groups = "kinesis-data-format") Review comment: Can add a new named constant to `TestNGGroup` for "kinesis-data-format" This is an automated message from the Apache Git Service. 
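The suggestion above — replacing the raw string with a named constant — could be sketched as follows. This is a hypothetical illustration of the review comment, not the code that actually landed in the PR; the constant name and placement are assumptions:

```java
// In TestNGGroup, alongside the existing constants such as HDFS_DEEP_STORAGE:
public static final String KINESIS_DATA_FORMAT = "kinesis-data-format";

// The test class then references the constant instead of a string literal,
// so the group name stays consistent across tests and testng.xml excludes:
@Test(groups = TestNGGroup.KINESIS_DATA_FORMAT)
@Guice(moduleFactory = DruidTestModuleFactory.class)
public class ITKinesisIndexingServiceDataFormatTest extends AbstractKinesisIndexingServiceTest
{
  // ...
}
```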
[GitHub] [druid] jihoonson commented on pull request #9783: Integration tests for stream ingestion with various data formats
jihoonson commented on pull request #9783: URL: https://github.com/apache/druid/pull/9783#issuecomment-620854252 Manually ran the new Kinesis test on my laptop.
[GitHub] [druid] jihoonson commented on a change in pull request #9778: Fix problem when running single integration test using -Dit.test=
jihoonson commented on a change in pull request #9778: URL: https://github.com/apache/druid/pull/9778#discussion_r416915215 ## File path: integration-tests/src/main/java/org/testng/DruidTestRunnerFactory.java ## @@ -0,0 +1,109 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package /*CHECKSTYLE.OFF: PackageName*/org.testng/*CHECKSTYLE.ON: PackageName*/; + +import org.apache.druid.java.util.common.logger.Logger; +import org.apache.druid.testing.utils.SuiteListener; +import org.testng.internal.IConfiguration; +import org.testng.internal.Systematiser; +import org.testng.internal.annotations.IAnnotationFinder; +import org.testng.xml.XmlTest; + +import java.util.Collection; +import java.util.Collections; +import java.util.List; + +/** + * This class must be in package org.testng to access protected methods like TestNG.getDefault().getConfiguration() + */ +public class DruidTestRunnerFactory implements ITestRunnerFactory +{ + private static final Logger LOG = new Logger(DruidTestRunnerFactory.class); + private static final SuiteListener SUITE_LISTENER = new SuiteListener(); + + @Override + public TestRunner newTestRunner(ISuite suite, + XmlTest test, + Collection methodListeners, + List classListeners) + { +IConfiguration configuration = TestNG.getDefault().getConfiguration(); +String outputDirectory = suite.getOutputDirectory(); +IAnnotationFinder annotationFinder = configuration.getAnnotationFinder(); +Boolean skipFailedInvocationCounts = suite.getXmlSuite().skipFailedInvocationCounts(); +return new DruidTestRunner( +configuration, +suite, +test, +outputDirectory, +annotationFinder, +skipFailedInvocationCounts, +methodListeners, +classListeners +); + } + + private static class DruidTestRunner extends TestRunner + { +DruidTestRunner( +IConfiguration configuration, +ISuite suite, +XmlTest test, +String outputDirectory, +IAnnotationFinder finder, +boolean skipFailedInvocationCounts, +Collection methodListeners, +List classListeners +) +{ + super(configuration, suite, test, outputDirectory, finder, skipFailedInvocationCounts, methodListeners, classListeners, Systematiser.getComparator(), Collections.emptyMap()); +} + +@Override +public void run() +{ + try { +// IntegrationTestSuite is run when -Dit.test is not specify in maven 
command. +// IntegrationTestSuite uses the configuration from integration-tests/src/test/resources/testng.xml +// which already handle suite onStart and onFinish automatically +if (!"IntegrationTestSuite".equals(getSuite().getName())) { + SUITE_LISTENER.onStart(getSuite()); +} +runTests(); + } + catch (Exception e) { +LOG.error(e, ""); Review comment: Seems like neither `SuiteListener.onStart()` nor `runTests()` throws an `Exception`. Is this for logging? I changed `LoggerListener` to print the stack trace properly [in my PR](https://github.com/apache/druid/pull/9783/files#diff-8e812fb64c6e87651c2a617561f88729R36), so this probably wouldn't be necessary.
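The `LoggerListener` change referenced above (logging the failure with its full stack trace) follows the standard TestNG listener pattern. A hedged sketch, assuming a listener built on `TestListenerAdapter` — this is not the actual Druid class:

```java
import org.apache.druid.java.util.common.logger.Logger;
import org.testng.ITestResult;
import org.testng.TestListenerAdapter;

// Sketch only: illustrates logging a test failure with its stack trace.
public class LoggerListenerSketch extends TestListenerAdapter
{
  private static final Logger LOG = new Logger(LoggerListenerSketch.class);

  @Override
  public void onTestFailure(ITestResult result)
  {
    // Passing the Throwable first makes the Druid Logger emit the stack trace.
    LOG.error(result.getThrowable(), "[%s] failed", result.getMethod().getMethodName());
  }
}
```

With failures logged in the listener, a catch-and-log wrapper around `runTests()` in the runner becomes redundant, which is the point being made in the review comment.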
[GitHub] [druid] averma111 opened a new issue #9786: Druid 0.18 version
averma111 opened a new issue #9786: URL: https://github.com/apache/druid/issues/9786 Hi Team, I am planning to build a Druid 0.18 cluster in a GKE environment using a custom Docker file I have created. I plan to create 3 pods: 1 master (Overlord and Coordinator), 1 data node (Historical and MiddleManager), and 1 query node (Router and Broker), keeping ZooKeeper and the metadata store in a VM instance so that they won't interfere with the GKE load balancer. Let me know if this design is correct or whether there is another design I should consider. I appreciate any help. Thanks, Ashish
[GitHub] [druid] JulianJaffePinterest commented on issue #9707: Add IS_MULTIVALUE and COMPLEX_TYPE columns to INFORMATION_SCHEMA.COLUMNS
JulianJaffePinterest commented on issue #9707: URL: https://github.com/apache/druid/issues/9707#issuecomment-620836497 Segment Metadata queries do return the complex type name (e.g. `thetaSketch` or `variance`). I'll try to grab some time to work on this soon.
[GitHub] [druid] knoguchi commented on a change in pull request #7590: Add single-machine deployment example cfgs and scripts
knoguchi commented on a change in pull request #7590: URL: https://github.com/apache/druid/pull/7590#discussion_r416895607 ## File path: examples/bin/run-druid ## @@ -34,7 +34,7 @@ else CONFDIR="$2" fi -CONFDIR="$(cd "$CONFDIR" && pwd)/druid" +CONFDIR="$(cd "$CONFDIR" && pwd)" Review comment: Why was the /druid removed? It's a breaking change for users of older versions, and it doesn't seem like an absolutely necessary change. This file is one of the bin scripts in the apache-druid.tar.gz distribution, although the directory name is "examples". Please avoid similar changes in future releases. Thanks.
[GitHub] [druid] capistrant opened a new pull request #9785: Ensure Krb auth before killing YARN apps in graceful shutdown of hadoop batch index tasks
capistrant opened a new pull request #9785: URL: https://github.com/apache/druid/pull/9785 ### Description My deployments had struggled with graceful shutdown of our Hadoop batch indexing tasks up until now. In the past, we simply patched out graceful shutdown since we didn't really mind the YARN apps not being killed. However, for our Druid 0.18 upgrade we wanted to get this fixed. After troubleshooting, we found that the code was getting hung in `ToolRunner#run` when trying to kill the YARN app for the indexing job. After some trial and error in our fork, we discovered it was an authentication issue and that calling into `JobHelper#authenticate` fixed the hang in the graceful shutdown code. Our analysis of `JobHelper#authenticate` led us to believe it is harmless to call multiple times and also harmless to call in a cluster that is not using Kerberized Hadoop: the call is transparent if authentication is already done or the Hadoop cluster in use is not using Kerberos. We also slightly refactored the signature of `JobHelper#authenticate` after IntelliJ flagged the config parameter as unused. With all this being said, I'm not sure if calling into the `authenticate` method from a different module is frowned upon as far as design goes. I decided to reuse that method and see what the community had to say about that. This PR has: - [ ] been self-reviewed. - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.) - [ ] added documentation for new or modified features or behaviors. - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
- [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml) - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader. - [ ] added unit tests or modified existing tests to cover new code paths. - [ ] added integration tests. - [ ] been tested in a test Druid cluster. # Key changed/added classes in this PR * `JobHelper` * `HadoopIndexTask`
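The idempotent-authentication behavior the PR relies on can be illustrated with Hadoop's `UserGroupInformation` API. This is a rough sketch of the pattern described in the description, not the actual `JobHelper#authenticate` code; the principal and keytab values are placeholders that Druid would read from its Hadoop configuration:

```java
import java.io.IOException;
import org.apache.hadoop.security.UserGroupInformation;

public final class KerberosAuthSketch
{
  // Hypothetical config values; in Druid these come from runtime properties.
  private static final String PRINCIPAL = "druid@EXAMPLE.COM";
  private static final String KEYTAB = "/etc/security/keytabs/druid.keytab";

  public static void authenticate()
  {
    // Transparent on non-Kerberized clusters: nothing to do.
    if (!UserGroupInformation.isSecurityEnabled()) {
      return;
    }
    try {
      // Transparent if already authenticated: safe to call multiple times.
      if (UserGroupInformation.getCurrentUser().hasKerberosCredentials()) {
        return;
      }
      UserGroupInformation.loginUserFromKeytab(PRINCIPAL, KEYTAB);
    }
    catch (IOException e) {
      throw new RuntimeException("Kerberos login failed", e);
    }
  }
}
```

Because both early returns make repeated calls no-ops, invoking this before killing the YARN app (as the PR does in `HadoopIndexTask`) cannot break clusters that have already authenticated or that run without Kerberos.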
[druid] branch master updated (e7e41e3 -> b279e04)
This is an automated email from the ASF dual-hosted git repository. fjy pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/druid.git. from e7e41e3 Adding support for autoscaling in GCE (#8987) add b279e04 table fix (#9769) No new revisions were added by this update. Summary of changes: docs/querying/sql.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
[GitHub] [druid] stale[bot] commented on issue #8134: OutOfMemoryError in BufferHashGrouperTest
stale[bot] commented on issue #8134: URL: https://github.com/apache/druid/issues/8134#issuecomment-620745702 This issue has been marked as stale due to 280 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the d...@druid.apache.org list. Thank you for your contributions.
[GitHub] [druid] suneet-s commented on a change in pull request #9773: [WIP] Bad plan for table-lookup-lookup join with filter on first lookup and outer limit
suneet-s commented on a change in pull request #9773: URL: https://github.com/apache/druid/pull/9773#discussion_r416778349

## File path: sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidJoinQueryRel.java

@@ -328,17 +327,27 @@ private static JoinType toDruidJoinType(JoinRelType calciteJoinType)
   }
 }

-  private static boolean computeLeftRequiresSubquery(final DruidRel left)
+  private static boolean computeLeftRequiresSubquery(final List> leftList)

Review comment: What do you think about keeping this interface the same, and instead changing the private members `left` and `right` to `DruidRel`? Whenever `left` and `right` are set, we can do something like `this.left = getCheapestDruidRel(RelNode);`. As it's written, I wonder whether it's possible for us to estimate in `computeSelfCost` that a subquery is not required, while the DruidRel used to build the native query actually uses a subquery.
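The reviewer's suggestion (keep `left`/`right` as single rels, chosen as the cheapest candidate at assignment time) could be sketched generically as below. This is an illustrative stand-alone sketch, not Druid planner code; `Rel`, `selfCost`, and `getCheapestRel` are hypothetical stand-ins for the suggested `getCheapestDruidRel`:

```java
import java.util.Comparator;
import java.util.List;

public class CheapestRelDemo
{
  /** Hypothetical stand-in for a planner rel with a self cost, for illustration only. */
  record Rel(String name, double selfCost) {}

  /** Picks the cheapest candidate, mirroring the suggested getCheapestDruidRel(...). */
  static Rel getCheapestRel(List<Rel> candidates)
  {
    return candidates.stream()
                     .min(Comparator.comparingDouble(Rel::selfCost))
                     .orElseThrow();
  }

  public static void main(String[] args)
  {
    // The cost estimated here is the same one later used to build the query,
    // which avoids the estimate/build mismatch the reviewer is worried about.
    Rel cheapest = getCheapestRel(List.of(new Rel("scan", 10.0), new Rel("subquery", 25.0)));
    System.out.println(cheapest.name()); // prints scan
  }
}
```

The point of the design choice: if the cheapest rel is selected once at assignment time, cost estimation and query building cannot disagree about which rel is used.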
[GitHub] [druid] venkatramanp opened a new issue #9784: Filter datasources using regex in Segment Timeline of Druid Webconsole Datasourcese tab
venkatramanp opened a new issue #9784: URL: https://github.com/apache/druid/issues/9784

### Description

The segment timeline shows all datasources by default. If there are hundreds of datasources with hard-to-distinguish names, it is very difficult to find a specific one, and they do not currently appear to be in sorted order either. Note: there is a filter dialog for the datasources below the segment timeline.

### Motivation

This would make it easy to find segment timeline information in the Druid web console. This was possible in the older consoles.
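The requested behavior could be sketched as follows. This is an illustrative Java snippet (the actual web console is TypeScript); `DatasourceFilter` and its sample datasource names are hypothetical:

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class DatasourceFilter
{
  /** Returns only the datasource names matching the given regex, sorted alphabetically. */
  public static List<String> filter(List<String> datasources, String regex)
  {
    final Pattern pattern = Pattern.compile(regex);
    return datasources.stream()
                      .filter(name -> pattern.matcher(name).find())
                      .sorted()
                      .collect(Collectors.toList());
  }

  public static void main(String[] args)
  {
    List<String> names = List.of("wikipedia", "wiki_edits", "metrics-prod", "metrics-dev");
    // Regex narrows the list, sorting addresses the "not in sorted order" complaint.
    System.out.println(filter(names, "^metrics")); // prints [metrics-dev, metrics-prod]
  }
}
```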
[GitHub] [druid] stale[bot] commented on issue #7957: Druid SQL planning time can exceed query timeouts
stale[bot] commented on issue #7957: URL: https://github.com/apache/druid/issues/7957#issuecomment-620712898 This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time.
[GitHub] [druid] pjain1 commented on a change in pull request #9449: Add Sql InputSource
pjain1 commented on a change in pull request #9449: URL: https://github.com/apache/druid/pull/9449#discussion_r416549940

## File path: server/src/main/java/org/apache/druid/guice/FirehoseModule.java

@@ -58,7 +59,8 @@ public void configure(Binder binder)
 new NamedType(CombiningFirehoseFactory.class, "combining"),
 new NamedType(FixedCountFirehoseFactory.class, "fixedCount"),
 new NamedType(SqlFirehoseFactory.class, "sql"),
-new NamedType(InlineFirehoseFactory.class, "inline")
+new NamedType(InlineFirehoseFactory.class, "inline"),
+new NamedType(SqlInputSource.class, "sql")

Review comment: Could this be part of a new module, since InputSource seems to be a replacement for the firehose-related interfaces?
[druid] branch master updated: Adding support for autoscaling in GCE (#8987)
This is an automated email from the ASF dual-hosted git repository.

cwylie pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/druid.git

The following commit(s) were added to refs/heads/master by this push:
     new e7e41e3  Adding support for autoscaling in GCE (#8987)

e7e41e3 is described below

commit e7e41e3a369c236b444dc44338c443c2e46904f1
Author: Francesco Nidito <11637948+frnid...@users.noreply.github.com>
AuthorDate: Tue Apr 28 12:13:39 2020 +0200

    Adding support for autoscaling in GCE (#8987)

    * Adding support for autoscaling in GCE
    * adding extra google deps also in gce pom
    * fix link in doc
    * remove unused deps
    * adding terms to spelling file
    * version in pom 0.17.0-incubating-SNAPSHOT --> 0.18.0-SNAPSHOT
    * GCEXyz -> GceXyz in naming for consistency
    * add preconditions
    * add VisibleForTesting annotation
    * typos in comments
    * use StringUtils.format instead of String.format
    * use custom exception instead of exit
    * factorize interval time between retries
    * making literal value a constant
    * iter all network interfaces
    * use provided on google (non api) deps
    * adding missing dep
    * removing unneded this and use Objects methods instead o 3-way if in hash and comparison
    * adding import
    * adding retries around getRunningInstances and adding limit for operation end waiting
    * refactor GceEnvironmentConfig.hashCode
    * 0.18.0-SNAPSHOT -> 0.19.0-SNAPSHOT
    * removing unused config
    * adding tests to hash and equals
    * adding nullable to waitForOperationEnd
    * adding testTerminate
    * adding unit tests for createComputeService
    * increasing retries in unrelated integration-test to prevent sporadic failure (hopefully)
    * reverting queryResponseTemplate change
    * adding comment for Compute.Builder.build() returning null
---
 distribution/pom.xml                                  |   2 +
 docs/configuration/index.md                           |   8 +-
 docs/development/extensions-contrib/gce-extensions.md | 103 +++
 docs/development/extensions.md                        |   1 +
 extensions-contrib/gce-extensions/pom.xml             | 131
 .../overlord/autoscaling/gce/GceAutoScaler.java       | 526 +
 .../autoscaling/gce/GceEnvironmentConfig.java         | 132
 .../overlord/autoscaling/gce/GceModule.java           |  42 +
 .../autoscaling/gce/GceServiceException.java          |  32 +
 .../overlord/autoscaling/gce/GceUtils.java            |  73 ++
 .../org.apache.druid.initialization.DruidModule       |  16 +
 .../autoscaling/gce/GceAutoScalerTest.java            | 853 +
 .../overlord/autoscaling/gce/GceUtilsTest.java        |  96 +++
 extensions-core/google-extensions/pom.xml             |   2 +-
 licenses.yaml                                         |  16 +-
 pom.xml                                               |  12 +-
 .../druid/server/initialization/JettyTest.java        |   2 +-
 website/.spelling                                     |   6 +
 website/i18n/en.json                                  |   3 +
 website/sidebars.json                                 |   1 +
 20 files changed, 2048 insertions(+), 9 deletions(-)

diff --git a/distribution/pom.xml b/distribution/pom.xml
index a47ead3..2b5c3d5 100644
--- a/distribution/pom.xml
+++ b/distribution/pom.xml
@@ -423,6 +423,8 @@
                         org.apache.druid.extensions.contrib:druid-moving-average-query
                         -c
                         org.apache.druid.extensions.contrib:druid-tdigestsketch
+                        -c
+                        org.apache.druid.extensions.contrib:gce-extensions

diff --git a/docs/configuration/index.md b/docs/configuration/index.md
index 982ff5e..e17a83a 100644
--- a/docs/configuration/index.md
+++ b/docs/configuration/index.md
@@ -891,7 +891,7 @@ There are additional configs for autoscaling (if it is enabled):

 |Property|Description|Default|
 ||---|---|
-|`druid.indexer.autoscale.strategy`|Choices are "noop" or "ec2". Sets the strategy to run when autoscaling is required.|noop|
+|`druid.indexer.autoscale.strategy`|Choices are "noop", "ec2" or "gce". Sets the strategy to run when autoscaling is required.|noop|
 |`druid.indexer.autoscale.doAutoscale`|If set to "true" autoscaling will be enabled.|false|
 |`druid.indexer.autoscale.provisionPeriod`|How often to check whether or not new MiddleManagers should be added.|PT1M|
 |`druid.indexer.autoscale.terminatePeriod`|How often to check when MiddleManagers should be removed.|PT5M|
@@ -1115,7 +1115,9 @@ field. If not provided, the default is to not use it at all.
# A
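Based on the documentation change in this commit, enabling the new strategy would look roughly like the following runtime-properties fragment. This is a minimal sketch: the autoscale keys appear in the diff above, but the `"gce-extensions"` loadList name is an assumption, and GCE-specific settings live in the gce-extensions docs added by this commit:

```properties
# Load the contrib extension (added to the distribution in this commit).
druid.extensions.loadList=["gce-extensions"]

# Enable autoscaling and select the new GCE strategy.
druid.indexer.autoscale.doAutoscale=true
druid.indexer.autoscale.strategy=gce
druid.indexer.autoscale.provisionPeriod=PT1M
druid.indexer.autoscale.terminatePeriod=PT5M
```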
[GitHub] [druid] stale[bot] commented on pull request #8848: Refactor DirectDruidClient, extract QueryContextInputStreamResponseHandler
stale[bot] commented on pull request #8848: URL: https://github.com/apache/druid/pull/8848#issuecomment-620458156 This pull request/issue has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.
[GitHub] [druid] jihoonson opened a new pull request #9783: Integration tests for stream ingestion with various data formats
jihoonson opened a new pull request #9783: URL: https://github.com/apache/druid/pull/9783

### Description

This PR adds new integration tests for stream ingestion against different data formats. To do this, a new interface `EventSerializer` was added, which converts events generated by `SyntheticStreamGenerator` into a particular data format. The new tests automatically pick up serializer, inputFormat, and inputRowParser specs from test resources and populate test parameters. Since these tests take about 30 mins when run in parallel with 2 threads, a new Travis job was added.

This PR has:
- [x] been self-reviewed.
   - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
- [ ] added documentation for new or modified features or behaviors.
- [x] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
- [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml)
- [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
- [ ] added unit tests or modified existing tests to cover new code paths.
- [x] added integration tests.
- [ ] been tested in a test Druid cluster.
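An abstraction like the `EventSerializer` described above might look roughly like the following. This is an illustrative sketch, not the actual interface from the PR; the method shape, the CSV implementation, and the sample event are all assumptions:

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Illustrative sketch: turns one generated event into bytes in a particular data format. */
interface EventSerializer
{
  byte[] serialize(Map<String, Object> event);
}

/** One hypothetical implementation: CSV with values in key-insertion order. */
class CsvEventSerializer implements EventSerializer
{
  @Override
  public byte[] serialize(Map<String, Object> event)
  {
    String row = event.values().stream()
                      .map(String::valueOf)
                      .collect(Collectors.joining(","));
    return row.getBytes(StandardCharsets.UTF_8);
  }
}

public class EventSerializerDemo
{
  public static void main(String[] args)
  {
    Map<String, Object> event = new LinkedHashMap<>();
    event.put("timestamp", "2020-04-28T00:00:00Z");
    event.put("page", "wiki");
    event.put("count", 3);
    System.out.println(new String(new CsvEventSerializer().serialize(event), StandardCharsets.UTF_8));
    // prints 2020-04-28T00:00:00Z,wiki,3
  }
}
```

With such an interface, each data format (CSV, JSON, Avro, ...) becomes one small implementation, and tests can be parameterized over the set of serializers.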
[GitHub] [druid] 2bethere commented on pull request #9766: Druid Quickstart refactor and update
2bethere commented on pull request #9766: URL: https://github.com/apache/druid/pull/9766#issuecomment-620434325 No comment, LGTM.
[GitHub] [druid] majidbijani opened a new issue #9782: The console will not function at the moment
majidbijani opened a new issue #9782: URL: https://github.com/apache/druid/issues/9782

I had trouble setting up the Druid cluster. It was set up step by step according to the Druid website (Clustered deployment). Deep storage is set to local, and the default Derby is used for metadata. Most settings are defaults, and common.runtime.properties is similar for the Master, Query, and Data servers.

common.runtime.properties:

```
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.
#

# Extensions specified in the load list will be loaded by Druid
# We are using local fs for deep storage - not recommended for production - use S3, HDFS, or NFS instead
# We are using local derby for the metadata store - not recommended for production - use MySQL or Postgres instead
# If you specify `druid.extensions.loadList=[]`, Druid won't load any extension from file system.
# If you don't specify `druid.extensions.loadList`, Druid will load all the extensions under root extension directory.
# More info: https://druid.apache.org/docs/latest/operations/including-extensions.html
druid.extensions.loadList=["druid-hdfs-storage", "druid-kafka-indexing-service", "druid-datasketches"]

# If you have a different version of Hadoop, place your Hadoop client jar files in your hadoop-dependencies directory
# and uncomment the line below to point to your directory.
#druid.extensions.hadoopDependenciesDir=/my/dir/hadoop-dependencies

#
# Hostname
#
druid.host=master

#
# Logging
#
# Log all runtime properties on startup. Disable to avoid logging properties on startup:
druid.startup.logging.logProperties=true

#
# Zookeeper
#
druid.zk.service.host=master
druid.zk.paths.base=/druid

#
# Metadata storage
#
# For Derby server on your Druid Coordinator (only viable in a cluster with a single Coordinator, no fail-over):
druid.metadata.storage.type=derby
druid.metadata.storage.connector.connectURI=jdbc:derby://master:1527/var/druid/metadata.db;create=true
druid.metadata.storage.connector.host=master
druid.metadata.storage.connector.port=1527

# For MySQL (make sure to include the MySQL JDBC driver on the classpath):
#druid.metadata.storage.type=mysql
#druid.metadata.storage.connector.connectURI=jdbc:mysql://db.example.com:3306/druid
#druid.metadata.storage.connector.user=...
#druid.metadata.storage.connector.password=...

# For PostgreSQL:
#druid.metadata.storage.type=postgresql
#druid.metadata.storage.connector.connectURI=jdbc:postgresql://db.example.com:5432/druid
#druid.metadata.storage.connector.user=...
#druid.metadata.storage.connector.password=...

#
# Deep storage
#
# For local disk (only viable in a cluster if this is a network mount):
druid.storage.type=local
druid.storage.storageDirectory=var/druid/segments

# For HDFS:
#druid.storage.type=hdfs
#druid.storage.storageDirectory=/druid/segments

# For S3:
#druid.storage.type=s3
#druid.storage.bucket=your-bucket
#druid.storage.baseKey=druid/segments
#druid.s3.accessKey=...
#druid.s3.secretKey=...

#
# Indexing service logs
#
# For local disk (only viable in a cluster if this is a network mount):
druid.indexer.logs.type=file
druid.indexer.logs.directory=var/druid/indexing-logs

# For HDFS:
#druid.indexer.logs.type=hdfs
#druid.indexer.logs.directory=/druid/indexing-logs

# For S3:
#druid.indexer.logs.type=s3
#druid.indexer.logs.s3Bucket=your-bucket
#druid.indexer.logs.s3Prefix=druid/indexing-logs

#
# Service discovery
#
druid.selectors.indexing.serviceName=druid/overlord
druid.selectors.coordinator.serviceName=druid/coordinator

#
# Monitoring
#
druid.monitoring.monitors=["org.apache.druid.java.util.metrics.JvmMonitor"]
druid.emitter=noop
druid.emitter.logging.logLevel=info

# Storage type of double columns
# ommiting this will lead to index double as float at the storage layer
druid.indexing.doubleStorage=double

#
# Secur
```
[GitHub] [druid] majidbijani opened a new issue #9781: Request failed with status code 500
majidbijani opened a new issue #9781: URL: https://github.com/apache/druid/issues/9781

I had trouble setting up the Druid cluster. It was set up step by step according to the Druid website (Clustered deployment). Deep storage is set to local, and the default Derby is used for metadata. Most settings are defaults, and common.runtime.properties is similar for the Master, Query, and Data servers.

common.runtime.properties:

```
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.
#

# Extensions specified in the load list will be loaded by Druid
# We are using local fs for deep storage - not recommended for production - use S3, HDFS, or NFS instead
# We are using local derby for the metadata store - not recommended for production - use MySQL or Postgres instead
# If you specify `druid.extensions.loadList=[]`, Druid won't load any extension from file system.
# If you don't specify `druid.extensions.loadList`, Druid will load all the extensions under root extension directory.
# More info: https://druid.apache.org/docs/latest/operations/including-extensions.html
druid.extensions.loadList=["druid-hdfs-storage", "druid-kafka-indexing-service", "druid-datasketches"]

# If you have a different version of Hadoop, place your Hadoop client jar files in your hadoop-dependencies directory
# and uncomment the line below to point to your directory.
#druid.extensions.hadoopDependenciesDir=/my/dir/hadoop-dependencies

#
# Hostname
#
druid.host=master

#
# Logging
#
# Log all runtime properties on startup. Disable to avoid logging properties on startup:
druid.startup.logging.logProperties=true

#
# Zookeeper
#
druid.zk.service.host=master
druid.zk.paths.base=/druid

#
# Metadata storage
#
# For Derby server on your Druid Coordinator (only viable in a cluster with a single Coordinator, no fail-over):
druid.metadata.storage.type=derby
druid.metadata.storage.connector.connectURI=jdbc:derby://master:1527/var/druid/metadata.db;create=true
druid.metadata.storage.connector.host=master
druid.metadata.storage.connector.port=1527

# For MySQL (make sure to include the MySQL JDBC driver on the classpath):
#druid.metadata.storage.type=mysql
#druid.metadata.storage.connector.connectURI=jdbc:mysql://db.example.com:3306/druid
#druid.metadata.storage.connector.user=...
#druid.metadata.storage.connector.password=...

# For PostgreSQL:
#druid.metadata.storage.type=postgresql
#druid.metadata.storage.connector.connectURI=jdbc:postgresql://db.example.com:5432/druid
#druid.metadata.storage.connector.user=...
#druid.metadata.storage.connector.password=...

#
# Deep storage
#
# For local disk (only viable in a cluster if this is a network mount):
druid.storage.type=local
druid.storage.storageDirectory=var/druid/segments

# For HDFS:
#druid.storage.type=hdfs
#druid.storage.storageDirectory=/druid/segments

# For S3:
#druid.storage.type=s3
#druid.storage.bucket=your-bucket
#druid.storage.baseKey=druid/segments
#druid.s3.accessKey=...
#druid.s3.secretKey=...

#
# Indexing service logs
#
# For local disk (only viable in a cluster if this is a network mount):
druid.indexer.logs.type=file
druid.indexer.logs.directory=var/druid/indexing-logs

# For HDFS:
#druid.indexer.logs.type=hdfs
#druid.indexer.logs.directory=/druid/indexing-logs

# For S3:
#druid.indexer.logs.type=s3
#druid.indexer.logs.s3Bucket=your-bucket
#druid.indexer.logs.s3Prefix=druid/indexing-logs

#
# Service discovery
#
druid.selectors.indexing.serviceName=druid/overlord
druid.selectors.coordinator.serviceName=druid/coordinator

#
# Monitoring
#
druid.monitoring.monitors=["org.apache.druid.java.util.metrics.JvmMonitor"]
druid.emitter=noop
druid.emitter.logging.logLevel=info

# Storage type of double columns
# ommiting this will lead to index double as float at the storage layer
druid.indexing.doubleStorage=double

#
# Security
```