[GitHub] [druid] stale[bot] commented on issue #4768: More accurate dictionary size estimation in RowBasedKeySerde
stale[bot] commented on issue #4768: URL: https://github.com/apache/druid/issues/4768#issuecomment-620983841 This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [druid] stale[bot] commented on pull request #9436: added number of bins parameter
stale[bot] commented on pull request #9436: URL: https://github.com/apache/druid/pull/9436#issuecomment-620983797 This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the d...@druid.apache.org list. Thank you for your contributions.
[GitHub] [druid] stale[bot] commented on issue #8146: exceptions on coordinator
stale[bot] commented on issue #8146: URL: https://github.com/apache/druid/issues/8146#issuecomment-620983819 This issue has been marked as stale due to 280 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the d...@druid.apache.org list. Thank you for your contributions.
[GitHub] [druid] stale[bot] commented on issue #4784: Refactoring RowBasedGrouperHelper especially the logics for limit push down
stale[bot] commented on issue #4784: URL: https://github.com/apache/druid/issues/4784#issuecomment-620983844 This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time.
[GitHub] [druid] stale[bot] commented on issue #4718: Missing documents for MoveTask, ArchiveTask, and RestoreTask
stale[bot] commented on issue #4718: URL: https://github.com/apache/druid/issues/4718#issuecomment-620983838 This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time.
[GitHub] [druid] stale[bot] commented on issue #4987: Remove unnecessary key serde in StreamingMergeSortedGrouper
stale[bot] commented on issue #4987: URL: https://github.com/apache/druid/issues/4987#issuecomment-620983846 This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time.
[GitHub] [druid] stale[bot] commented on pull request #9390: [bugfix]fix sequences ArrayIndexOutOfBoundsException in SeekableStreamIndexTaskRunner
stale[bot] commented on pull request #9390: URL: https://github.com/apache/druid/pull/9390#issuecomment-620983800 This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the d...@druid.apache.org list. Thank you for your contributions.
[GitHub] [druid] stale[bot] commented on issue #3832: Improve cost-based planner for search query
stale[bot] commented on issue #3832: URL: https://github.com/apache/druid/issues/3832#issuecomment-620983830 This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time.
[GitHub] [druid] stale[bot] commented on issue #4529: Moving push retries from AppenderatorImpl to DataPusher
stale[bot] commented on issue #4529: URL: https://github.com/apache/druid/issues/4529#issuecomment-620983832 This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time.
[GitHub] [druid] suneet-s commented on a change in pull request #9760: Fix potential NPEs in joins
suneet-s commented on a change in pull request #9760: URL: https://github.com/apache/druid/pull/9760#discussion_r417047809 ## File path: processing/src/main/java/org/apache/druid/segment/virtual/MultiValueExpressionDimensionSelector.java ## @@ -50,11 +50,13 @@ public MultiValueExpressionDimensionSelector(ColumnValueSelector baseS this.baseSelector = baseSelector; } + @Nullable Review comment: These are checked on the hot loop. I've removed them for now; it's hard to reason about whether or not it's possible to return null here. I think it can't, but I'm not 100% sure, so I'm reverting this change and the associated null check.
[GitHub] [druid] jon-wei commented on pull request #9714: More Hadoop integration tests
jon-wei commented on pull request #9714: URL: https://github.com/apache/druid/pull/9714#issuecomment-620959521 > Should we create an issue for this? Seems like might be a problem for new users / going through tutorial? I don't know if it's an issue with my local environment; I think that can be looked into later.
[GitHub] [druid] jon-wei commented on a change in pull request #9714: More Hadoop integration tests
jon-wei commented on a change in pull request #9714: URL: https://github.com/apache/druid/pull/9714#discussion_r417035022 ## File path: integration-tests/README.md ## @@ -214,31 +214,32 @@ of the integration test run discussed above. This is because druid test clusters might not, in general, have access to hadoop. This also applies to integration test that uses Hadoop HDFS as an inputSource or as a deep storage. To run integration test that uses Hadoop, you will have to run a Hadoop cluster. This can be done in two ways: +1) Run Druid Docker test clusters with Hadoop container by passing -Dstart.hadoop.docker=true to the mvn command. 1) Run your own Druid + Hadoop cluster and specified Hadoop configs in the configuration file (CONFIG_FILE). -2) Run Druid Docker test clusters with Hadoop container by passing -Dstart.hadoop.docker=true to the mvn command. Currently, hdfs-deep-storage and other -deep-storage integration test groups can only be run with Druid Docker test clusters by passing -Dstart.hadoop.docker=true to start Hadoop container. You will also have to provide -Doverride.config.path= with your Druid's Hadoop configs set. See integration-tests/docker/environment-configs/override-examples/hdfs directory for example. Note that if the integration test you are running also uses other cloud extension (S3, Azure, GCS), additional -credentials/configs may need to be set in the same file as your Druid's Hadoop configs set. +credentials/configs may need to be set in the same file as your Druid's Hadoop configs set. Currently, ITHadoopIndexTest can only be run with your own Druid + Hadoop cluster by following the below steps: -Create a directory called batchHadoop1 in the hadoop file system -(anywhere you want) and put batch_hadoop.data (integration-tests/src/test/resources/hadoop/batch_hadoop.data) -into that directory (as its only file). 
- -Add this keyword to the configuration file (see above): +- Copy wikipedia_index_data1.json, wikipedia_index_data2.json, and wikipedia_index_data3.json Review comment: Updated
[GitHub] [druid] jon-wei commented on a change in pull request #9714: More Hadoop integration tests
jon-wei commented on a change in pull request #9714: URL: https://github.com/apache/druid/pull/9714#discussion_r417035061 ## File path: integration-tests/README.md ## @@ -214,31 +214,32 @@ of the integration test run discussed above. This is because druid test clusters might not, in general, have access to hadoop. This also applies to integration test that uses Hadoop HDFS as an inputSource or as a deep storage. To run integration test that uses Hadoop, you will have to run a Hadoop cluster. This can be done in two ways: +1) Run Druid Docker test clusters with Hadoop container by passing -Dstart.hadoop.docker=true to the mvn command. 1) Run your own Druid + Hadoop cluster and specified Hadoop configs in the configuration file (CONFIG_FILE). -2) Run Druid Docker test clusters with Hadoop container by passing -Dstart.hadoop.docker=true to the mvn command. Currently, hdfs-deep-storage and other -deep-storage integration test groups can only be run with Druid Docker test clusters by passing -Dstart.hadoop.docker=true to start Hadoop container. You will also have to provide -Doverride.config.path= with your Druid's Hadoop configs set. See integration-tests/docker/environment-configs/override-examples/hdfs directory for example. Note that if the integration test you are running also uses other cloud extension (S3, Azure, GCS), additional -credentials/configs may need to be set in the same file as your Druid's Hadoop configs set. +credentials/configs may need to be set in the same file as your Druid's Hadoop configs set. Currently, ITHadoopIndexTest can only be run with your own Druid + Hadoop cluster by following the below steps: -Create a directory called batchHadoop1 in the hadoop file system -(anywhere you want) and put batch_hadoop.data (integration-tests/src/test/resources/hadoop/batch_hadoop.data) -into that directory (as its only file). 
- -Add this keyword to the configuration file (see above): +- Copy wikipedia_index_data1.json, wikipedia_index_data2.json, and wikipedia_index_data3.json + located in integration-tests/src/test/resources/data/batch_index/json to your HDFS at /batch_index/json/ + If using the Docker-based Hadoop container, this is automatically done by the integration tests. +- Copy batch_hadoop.data located in integration-tests/src/test/resources/data/batch_index/tsv to your HDFS + at /batch_index/tsv/ + If using the Docker-based Hadoop container, this is automatically done by the integration tests. +Run the test using mvn (using the bundled Docker-based Hadoop cluster): ``` -"hadoopTestDir": "" + mvn verify -P integration-tests -Dit.test=ITHadoopIndexTest -Dstart.hadoop.docker=true -Doverride.config.path=docker/environment-configs/override-examples/hdfs -Dextra.datasource.name.suffix='' ``` -Run the test using mvn: - ``` - mvn verify -P int-tests-config-file -Dit.test=ITHadoopIndexTest + mvn verify -P int-tests-config-file -Dit.test=ITHadoopIndexTest -Doverride.config.path=docker/environment-configs/override-examples/hdfs -Dextra.datasource.name.suffix='' Review comment: Removed that part for the manual example ## File path: integration-tests/src/test/java/org/apache/druid/tests/TestNGGroup.java ## @@ -84,6 +82,15 @@ */ public static final String HDFS_DEEP_STORAGE = "hdfs-deep-storage"; + public static final String HADOOP_S3_TO_S3 = "hadoop-s3-to-s3-deep-storage"; Review comment: Updated as suggested
[GitHub] [druid] jon-wei commented on a change in pull request #9714: More Hadoop integration tests
jon-wei commented on a change in pull request #9714: URL: https://github.com/apache/druid/pull/9714#discussion_r417034943 ## File path: integration-tests/src/test/java/org/apache/druid/tests/hadoop/ITGcsInputToGcsHadoopIndexTest.java ## @@ -29,17 +29,18 @@ * To run this test, you must: * 1) Set the bucket and path for your data. This can be done by setting -Ddruid.test.config.cloudBucket and *-Ddruid.test.config.cloudPath or setting "cloud_bucket" and "cloud_path" in the config file. - * 2. Set -Ddruid.test.config.hadoopGcsCredentialsPath to the location of your Google credentials file as it + * 2) Set -Ddruid.test.config.hadoopGcsCredentialsPath to the location of your Google credentials file as it *exists within the Hadoop cluster that will ingest the data. The credentials file can be placed in the *shared folder used by the integration test containers if running the Docker-based Hadoop container, *in which case this property can be set to /shared/ - * 2) Copy wikipedia_index_data1.json, wikipedia_index_data2.json, and wikipedia_index_data3.json + * 3) Provide -Dresource.file.dir.path= with folder that contains GOOGLE_APPLICATION_CREDENTIALS file + * 4) Copy wikipedia_index_data1.json, wikipedia_index_data2.json, and wikipedia_index_data3.json *located in integration-tests/src/test/resources/data/batch_index/json to your GCS at the location set in step 1. - * 3) Provide -Doverride.config.path= with gcs configs set. See + * 5) Provide -Doverride.config.path= with gcs configs set. See * integration-tests/docker/environment-configs/override-examples/hadoop/gcs_to_gcs for env vars to provide. - * 4) Run the test with -Dstart.hadoop.docker=true -Dextra.datasource.name.suffix='' in the mvn command + * 6) Run the test with -Dstart.hadoop.docker=true -Dextra.datasource.name.suffix='' in the mvn command Review comment: Added a note about that to the README
[GitHub] [druid] jon-wei opened a new issue #9788: Segments cannot be loaded from HDFS deep storage when datasource name has special characters
jon-wei opened a new issue #9788: URL: https://github.com/apache/druid/issues/9788 Using non-ASCII chars in datasource names with HDFS storage results in segments failing to load:
```
java.io.FileNotFoundException: File does not exist: /druid/segments/wikipedia_hadoop_index_test_728672c5-affc-4116-9792-d81e2599eaf4%20Россия%20한국%20中国!%3F/20130831T00.000Z_20130901T00.000Z/2020-04-29T02_21_33.533Z/0_index.zip
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:72)
	at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:62)
	at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:152)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1819)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:692)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:381)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:850)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:793)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2489)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_232]
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_232]
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_232]
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_232]
	at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121) ~[hadoop-common-2.8.5.jar:?]
	at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:88) ~[hadoop-common-2.8.5.jar:?]
	at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:849) ~[hadoop-hdfs-client-2.8.5.jar:?]
	at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:836) ~[hadoop-hdfs-client-2.8.5.jar:?]
	at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:825) ~[hadoop-hdfs-client-2.8.5.jar:?]
	at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:325) ~[hadoop-hdfs-client-2.8.5.jar:?]
	at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:285) ~[hadoop-hdfs-client-2.8.5.jar:?]
	at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:270) ~[hadoop-hdfs-client-2.8.5.jar:?]
	at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1064) ~[hadoop-hdfs-client-2.8.5.jar:?]
	at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:328) ~[hadoop-hdfs-client-2.8.5.jar:?]
	at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:325) ~[hadoop-hdfs-client-2.8.5.jar:?]
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) ~[hadoop-common-2.8.5.jar:?]
	at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:325) ~[hadoop-hdfs-client-2.8.5.jar:?]
	at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:787) ~[hadoop-common-2.8.5.jar:?]
	at org.apache.druid.storage.hdfs.HdfsDataSegmentPuller$2.openInputStream(HdfsDataSegmentPuller.java:124) ~[?:?]
	at org.apache.druid.storage.hdfs.HdfsDataSegmentPuller.getInputStream(HdfsDataSegmentPuller.java:298) ~[?:?]
	at org.apache.druid.storage.hdfs.HdfsDataSegmentPuller$3.openStream(HdfsDataSegmentPuller.java:249) ~[?:?]
	at org.apache.druid.utils.CompressionUtils.lambda$unzip$1(CompressionUtils.java:182) ~[druid-core-0.19.0-SNAPSHOT.jar:0.19.0-SNAPSHOT]
	at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:87) ~[druid-core-0.19.0-SNAPSHOT.jar:0.19.0-SNAPSHOT]
	at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:115) ~[druid-core-0.19.0-SNAPSHOT.jar:0.19.0-SNAPSHOT]
	at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:105) ~[druid-core-0.19.0-SNAPSHOT.jar:0.19.0-SNAPSHOT]
	at org.apache.druid.utils.Compressi
```
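The failing path in the trace shows the datasource name with URI-style percent-escapes (space as %20, '?' as %3F) while the non-ASCII characters remain literal, suggesting the segment was written under one form of the name and looked up under another. A minimal standalone sketch (hypothetical name and simplified escaping, not Druid's or Hadoop's actual path code) of why the two forms cannot match:

```java
public class SegmentPathEscapingSketch {
  public static void main(String[] args) {
    // Hypothetical datasource name with non-ASCII chars, as in the report
    String dataSource = "wikipedia_hadoop_index_test Россия 한국 中国!?";

    // Mimic the escaping visible in the stack trace: space and '?' are
    // percent-escaped while the non-ASCII characters are left intact
    String escapedPath = dataSource.replace(" ", "%20").replace("?", "%3F");

    // The escaped form differs from the raw name, so a file written under
    // one form cannot be found when opened under the other
    System.out.println(escapedPath);
    System.out.println(escapedPath.equals(dataSource)); // prints false
  }
}
```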
[GitHub] [druid] jihoonson commented on a change in pull request #9760: Fix potential NPEs in joins
jihoonson commented on a change in pull request #9760: URL: https://github.com/apache/druid/pull/9760#discussion_r417032580 ## File path: processing/src/main/java/org/apache/druid/segment/virtual/MultiValueExpressionDimensionSelector.java ## @@ -50,11 +50,13 @@ public MultiValueExpressionDimensionSelector(ColumnValueSelector baseS this.baseSelector = baseSelector; } + @Nullable Review comment: Oh I meant for this method. This method is technically private and only used in this class. Some of the callers are still not checking nulls, for example [here](https://github.com/apache/druid/pull/9760/files/79eb23832004e90ed498aa1448d3bc9513cd6b50#diff-2ae123ea2c178ff9151ddbb31ed5f2b8R108).
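The thread above turns on the contract a `@Nullable` annotation implies: once a method is marked as possibly returning null, every caller must check before dereferencing. A minimal standalone sketch (hypothetical names, not the actual `MultiValueExpressionDimensionSelector` code) of that contract:

```java
public class NullableContractSketch {
  // Stand-in for javax.annotation.Nullable so the sketch compiles
  // without the JSR-305 dependency
  @interface Nullable {}

  // The annotation documents that callers must be prepared for null
  @Nullable
  static String getValue(Object row) {
    return row == null ? null : row.toString();
  }

  public static void main(String[] args) {
    String value = getValue(null);
    // A caller that dereferences without this check risks a
    // NullPointerException, which is what the review is guarding against
    int length = (value == null) ? 0 : value.length();
    System.out.println(length); // prints 0
  }
}
```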
[GitHub] [druid] suneet-s commented on a change in pull request #9760: Fix potential NPEs in joins
suneet-s commented on a change in pull request #9760: URL: https://github.com/apache/druid/pull/9760#discussion_r417031506 ## File path: processing/src/main/java/org/apache/druid/segment/virtual/MultiValueExpressionDimensionSelector.java ## @@ -50,11 +50,13 @@ public MultiValueExpressionDimensionSelector(ColumnValueSelector baseS this.baseSelector = baseSelector; } + @Nullable Review comment: I'd like to do this in phases; there are about 600+ issues that IntelliJ reports in the druid-processing submodule. I'll pick through more of them in my next PR. Once they are fixed, I will enable these warnings in the inspections job.
[GitHub] [druid] maytasm commented on a change in pull request #9783: Integration tests for stream ingestion with various data formats
maytasm commented on a change in pull request #9783: URL: https://github.com/apache/druid/pull/9783#discussion_r417007253 ## File path: integration-tests/src/test/java/org/apache/druid/tests/indexer/AbstractStreamIndexingTest.java ## @@ -67,92 +92,147 @@ private IntegrationTestingConfig config; private StreamAdminClient streamAdminClient; - private WikipediaStreamEventStreamGenerator wikipediaStreamEventGenerator; abstract StreamAdminClient createStreamAdminClient(IntegrationTestingConfig config) throws Exception; - abstract StreamEventWriter createStreamEventWriter(IntegrationTestingConfig config) throws Exception; - abstract Function generateStreamIngestionPropsTransform(String streamName, - String fullDatasourceName, - IntegrationTestingConfig config); + + abstract StreamEventWriter createStreamEventWriter(IntegrationTestingConfig config, boolean transactionEnabled) + throws Exception; + + abstract Function generateStreamIngestionPropsTransform( + String streamName, + String fullDatasourceName, + String parserType, + String parserOrInputFormat, + IntegrationTestingConfig config + ); + abstract Function generateStreamQueryPropsTransform(String streamName, String fullDatasourceName); + public abstract String getTestNamePrefix(); protected void doBeforeClass() throws Exception { streamAdminClient = createStreamAdminClient(config); -wikipediaStreamEventGenerator = new WikipediaStreamEventStreamGenerator(EVENTS_PER_SECOND, CYCLE_PADDING_MS); } - protected void doClassTeardown() + private static String getOnlyResourcePath(String resourceRoot) throws IOException { -wikipediaStreamEventGenerator.shutdown(); +return String.join("/", resourceRoot, Iterables.getOnlyElement(listResources(resourceRoot))); } - protected void doTestIndexDataWithLegacyParserStableState() throws Exception + protected static List listDataFormatResources() throws IOException { -StreamEventWriter streamEventWriter = createStreamEventWriter(config); -final GeneratedTestConfig generatedTestConfig = 
new GeneratedTestConfig(); -try ( -final Closeable ignored1 = unloader(generatedTestConfig.getFullDatasourceName()) -) { - final String taskSpec = generatedTestConfig.getStreamIngestionPropsTransform().apply(getResourceAsString(INDEXER_FILE_LEGACY_PARSER)); - LOG.info("supervisorSpec: [%s]\n", taskSpec); - // Start supervisor - generatedTestConfig.setSupervisorId(indexer.submitSupervisor(taskSpec)); - LOG.info("Submitted supervisor"); - // Start data generator - wikipediaStreamEventGenerator.run(generatedTestConfig.getStreamName(), streamEventWriter, TOTAL_NUMBER_OF_SECOND, FIRST_EVENT_TIME); - verifyIngestedData(generatedTestConfig); +return listResources(DATA_RESOURCE_ROOT) +.stream() +.filter(resource -> !SUPERVISOR_SPEC_TEMPLATE_FILE.equals(resource)) +.collect(Collectors.toList()); + } + + /** + * Returns a map of key to path to spec. The returned map contains at least 2 specs and one of them + * should be a {@link #SERIALIZER} spec. + */ + protected static Map findTestSpecs(String resourceRoot) throws IOException + { +final List specDirs = listResources(resourceRoot); +final Map map = new HashMap<>(); +for (String eachSpec : specDirs) { + if (SERIALIZER_SPEC_DIR.equals(eachSpec)) { +map.put(SERIALIZER, getOnlyResourcePath(String.join("/", resourceRoot, SERIALIZER_SPEC_DIR))); + } else if (INPUT_ROW_PARSER_SPEC_DIR.equals(eachSpec)) { +map.put(INPUT_ROW_PARSER, getOnlyResourcePath(String.join("/", resourceRoot, INPUT_ROW_PARSER_SPEC_DIR))); + } else if (INPUT_FORMAT_SPEC_DIR.equals(eachSpec)) { +map.put(INPUT_FORMAT, getOnlyResourcePath(String.join("/", resourceRoot, INPUT_FORMAT_SPEC_DIR))); + } } -finally { - doMethodTeardown(generatedTestConfig, streamEventWriter); +if (!map.containsKey(SERIALIZER_SPEC_DIR)) { + throw new IAE("Failed to find serializer spec under [%s]. Found resources are %s", resourceRoot, map); } +if (map.size() == 1) { + throw new IAE("Failed to find input format or parser spec under [%s]. 
Found resources are %s", resourceRoot, map); +} +return map; + } + + private Closeable createResourceCloser(GeneratedTestConfig generatedTestConfig) + { +return Closer.create().register(() -> doMethodTeardown(generatedTestConfig)); } - protected void doTestIndexDataWithInputFormatStableState() throws Exception + protected void doTestIndexDataStableState( + boolean transactionEnabled, + String serializerPath, + String parserType, + String specPath + ) throws Exception { -StreamEventWriter streamEventWriter = cr
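The refactor in the diff above replaces explicit try/finally teardown with `createResourceCloser`, which wraps `doMethodTeardown` in a `Closeable` registered on a Guava `Closer`. A minimal standalone sketch (hypothetical names) of the same pattern using plain `java.io.Closeable` with try-with-resources, so teardown runs even if the test body throws:

```java
import java.io.Closeable;
import java.io.IOException;

public class ResourceCloserSketch {
  static final StringBuilder LOG = new StringBuilder();

  // Analogous to createResourceCloser(...): the returned Closeable
  // runs the teardown logic when closed
  static Closeable createResourceCloser(String testConfig) {
    return () -> LOG.append("teardown:").append(testConfig);
  }

  public static void main(String[] args) throws IOException {
    try (Closeable ignored = createResourceCloser("config1")) {
      // test body; teardown is guaranteed to run even if this throws
      LOG.append("body;");
    }
    System.out.println(LOG); // prints body;teardown:config1
  }
}
```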
[GitHub] [druid] maytasm commented on a change in pull request #9783: Integration tests for stream ingestion with various data formats
maytasm commented on a change in pull request #9783: URL: https://github.com/apache/druid/pull/9783#discussion_r417007253

## File path: integration-tests/src/test/java/org/apache/druid/tests/indexer/AbstractStreamIndexingTest.java

## @@ -67,92 +92,147 @@

```diff
   private IntegrationTestingConfig config;
   private StreamAdminClient streamAdminClient;
-  private WikipediaStreamEventStreamGenerator wikipediaStreamEventGenerator;
 
   abstract StreamAdminClient createStreamAdminClient(IntegrationTestingConfig config) throws Exception;
 
-  abstract StreamEventWriter createStreamEventWriter(IntegrationTestingConfig config) throws Exception;
-  abstract Function generateStreamIngestionPropsTransform(String streamName,
-                                                          String fullDatasourceName,
-                                                          IntegrationTestingConfig config);
+
+  abstract StreamEventWriter createStreamEventWriter(IntegrationTestingConfig config, boolean transactionEnabled)
+      throws Exception;
+
+  abstract Function generateStreamIngestionPropsTransform(
+      String streamName,
+      String fullDatasourceName,
+      String parserType,
+      String parserOrInputFormat,
+      IntegrationTestingConfig config
+  );
+
   abstract Function generateStreamQueryPropsTransform(String streamName, String fullDatasourceName);
+
+  public abstract String getTestNamePrefix();
 
   protected void doBeforeClass() throws Exception
   {
     streamAdminClient = createStreamAdminClient(config);
-    wikipediaStreamEventGenerator = new WikipediaStreamEventStreamGenerator(EVENTS_PER_SECOND, CYCLE_PADDING_MS);
   }
 
-  protected void doClassTeardown()
+  private static String getOnlyResourcePath(String resourceRoot) throws IOException
   {
-    wikipediaStreamEventGenerator.shutdown();
+    return String.join("/", resourceRoot, Iterables.getOnlyElement(listResources(resourceRoot)));
   }
 
-  protected void doTestIndexDataWithLegacyParserStableState() throws Exception
+  protected static List listDataFormatResources() throws IOException
   {
-    StreamEventWriter streamEventWriter = createStreamEventWriter(config);
-    final GeneratedTestConfig generatedTestConfig = new GeneratedTestConfig();
-    try (
-        final Closeable ignored1 = unloader(generatedTestConfig.getFullDatasourceName())
-    ) {
-      final String taskSpec = generatedTestConfig.getStreamIngestionPropsTransform().apply(getResourceAsString(INDEXER_FILE_LEGACY_PARSER));
-      LOG.info("supervisorSpec: [%s]\n", taskSpec);
-      // Start supervisor
-      generatedTestConfig.setSupervisorId(indexer.submitSupervisor(taskSpec));
-      LOG.info("Submitted supervisor");
-      // Start data generator
-      wikipediaStreamEventGenerator.run(generatedTestConfig.getStreamName(), streamEventWriter, TOTAL_NUMBER_OF_SECOND, FIRST_EVENT_TIME);
-      verifyIngestedData(generatedTestConfig);
+    return listResources(DATA_RESOURCE_ROOT)
+        .stream()
+        .filter(resource -> !SUPERVISOR_SPEC_TEMPLATE_FILE.equals(resource))
+        .collect(Collectors.toList());
+  }
+
+  /**
+   * Returns a map of key to path to spec. The returned map contains at least 2 specs and one of them
+   * should be a {@link #SERIALIZER} spec.
+   */
+  protected static Map findTestSpecs(String resourceRoot) throws IOException
+  {
+    final List specDirs = listResources(resourceRoot);
+    final Map map = new HashMap<>();
+    for (String eachSpec : specDirs) {
+      if (SERIALIZER_SPEC_DIR.equals(eachSpec)) {
+        map.put(SERIALIZER, getOnlyResourcePath(String.join("/", resourceRoot, SERIALIZER_SPEC_DIR)));
+      } else if (INPUT_ROW_PARSER_SPEC_DIR.equals(eachSpec)) {
+        map.put(INPUT_ROW_PARSER, getOnlyResourcePath(String.join("/", resourceRoot, INPUT_ROW_PARSER_SPEC_DIR)));
+      } else if (INPUT_FORMAT_SPEC_DIR.equals(eachSpec)) {
+        map.put(INPUT_FORMAT, getOnlyResourcePath(String.join("/", resourceRoot, INPUT_FORMAT_SPEC_DIR)));
+      }
     }
-    finally {
-      doMethodTeardown(generatedTestConfig, streamEventWriter);
+    if (!map.containsKey(SERIALIZER_SPEC_DIR)) {
+      throw new IAE("Failed to find serializer spec under [%s]. Found resources are %s", resourceRoot, map);
     }
+    if (map.size() == 1) {
+      throw new IAE("Failed to find input format or parser spec under [%s]. Found resources are %s", resourceRoot, map);
+    }
+    return map;
+  }
+
+  private Closeable createResourceCloser(GeneratedTestConfig generatedTestConfig)
+  {
+    return Closer.create().register(() -> doMethodTeardown(generatedTestConfig));
   }
 
-  protected void doTestIndexDataWithInputFormatStableState() throws Exception
+  protected void doTestIndexDataStableState(
+      boolean transactionEnabled,
+      String serializerPath,
+      String parserType,
+      String specPath
+  ) throws Exception
   {
-    StreamEventWriter streamEventWriter = cr
```
[GitHub] [druid] jihoonson commented on a change in pull request #9783: Integration tests for stream ingestion with various data formats
jihoonson commented on a change in pull request #9783: URL: https://github.com/apache/druid/pull/9783#discussion_r417000607

## File path: integration-tests/src/test/java/org/apache/druid/tests/indexer/AbstractStreamIndexingTest.java

## @@ -67,92 +92,147 @@
[GitHub] [druid] maytasm commented on a change in pull request #9783: Integration tests for stream ingestion with various data formats
maytasm commented on a change in pull request #9783: URL: https://github.com/apache/druid/pull/9783#discussion_r416997861

## File path: integration-tests/src/test/resources/testng.xml

## @@ -20,7 +20,7 @@

Review comment: nit: might be helpful to add what this does / how to use in README.md

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [druid] maytasm commented on a change in pull request #9783: Integration tests for stream ingestion with various data formats
maytasm commented on a change in pull request #9783: URL: https://github.com/apache/druid/pull/9783#discussion_r416995812

## File path: integration-tests/src/test/java/org/apache/druid/tests/indexer/AbstractStreamIndexingTest.java

## @@ -67,92 +92,147 @@
[GitHub] [druid] suneet-s commented on a change in pull request #9760: Fix potential NPEs in joins
suneet-s commented on a change in pull request #9760: URL: https://github.com/apache/druid/pull/9760#discussion_r416992856

## File path: processing/src/main/java/org/apache/druid/segment/join/lookup/LookupJoinMatcher.java

```diff
@@ -277,7 +277,7 @@ public void matchRemainder()
     } else {
       currentIterator = Iterators.filter(
           extractor.iterable().iterator(),
-          entry -> !matchedKeys.contains(entry.getKey())
+          entry -> entry != null && !matchedKeys.contains(entry.getKey())
```

Review comment: good call. Looks like this is on a hot path too, so I've added an inspection suppression instead of an assert not null. 👍
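The change quoted above guards a filter predicate against null elements before dereferencing `entry.getKey()`. A standalone sketch of that pattern using only `java.util` (class and variable names here are illustrative, not Druid's actual `LookupJoinMatcher` internals):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class NullSafeFilterSketch
{
  public static void main(String[] args)
  {
    // Keys already matched earlier in a hypothetical join pass.
    Set<String> matchedKeys = new HashSet<>(Set.of("a"));

    // An entry list in which, per the inspection warning, an element
    // could in theory be null.
    List<Map.Entry<String, String>> entries = new ArrayList<>();
    entries.add(Map.entry("a", "1"));
    entries.add(null);
    entries.add(Map.entry("b", "2"));

    // The defensive form from the diff: check for null before
    // dereferencing entry.getKey().
    List<Map.Entry<String, String>> remainder = entries
        .stream()
        .filter(entry -> entry != null && !matchedKeys.contains(entry.getKey()))
        .collect(Collectors.toList());

    System.out.println(remainder); // prints [b=2]
  }
}
```

Whether to keep the runtime check or suppress the inspection (as the commenter chose for the hot path) trades a per-element branch against trusting the upstream contract that entries are never null.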
[GitHub] [druid] jihoonson commented on a change in pull request #9760: Fix potential NPEs in joins
jihoonson commented on a change in pull request #9760: URL: https://github.com/apache/druid/pull/9760#discussion_r416985934

## File path: processing/src/main/java/org/apache/druid/segment/join/lookup/LookupJoinMatcher.java

```diff
@@ -277,7 +277,7 @@ public void matchRemainder()
     } else {
       currentIterator = Iterators.filter(
           extractor.iterable().iterator(),
-          entry -> !matchedKeys.contains(entry.getKey())
+          entry -> entry != null && !matchedKeys.contains(entry.getKey())
```

Review comment: Yeah. It also doesn't seem reasonable that a "row" can be null. I think it's a false warning because of the `Predicate` interface of Guava. It would be better to suppress the warning instead of checking null.

## File path: processing/src/main/java/org/apache/druid/segment/virtual/MultiValueExpressionDimensionSelector.java

```diff
@@ -50,11 +50,13 @@ public MultiValueExpressionDimensionSelector(ColumnValueSelector baseS
     this.baseSelector = baseSelector;
   }
 
+  @Nullable
```

Review comment: Do all callers need to check the returned eval? There seem to be a couple of missing places.
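The second review comment above asks whether every caller of a newly `@Nullable` method actually handles the null case. A minimal sketch of the caller-side obligation, using a plain `Map` lookup as a stand-in (the names here are illustrative, not Druid's actual selector API):

```java
import java.util.Map;

public class NullableEvalSketch
{
  // Stand-in for a method that, like the @Nullable-annotated getter in
  // the diff, may return null when there is nothing to evaluate.
  static String evaluate(Map<String, String> row, String column)
  {
    return row.get(column); // null when the column is absent
  }

  public static void main(String[] args)
  {
    Map<String, String> row = Map.of("dim1", "abc");

    // Every caller of a @Nullable method must handle the null case
    // explicitly; skipping this check is exactly the kind of "missing
    // place" the review comment is pointing at.
    String eval = evaluate(row, "dim2");
    String rendered = (eval == null) ? "null" : eval.toUpperCase();
    System.out.println(rendered); // prints null
  }
}
```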
[GitHub] [druid] suneet-s commented on a change in pull request #9760: Fix potential NPEs in joins
suneet-s commented on a change in pull request #9760: URL: https://github.com/apache/druid/pull/9760#discussion_r416986933

## File path: processing/src/main/java/org/apache/druid/segment/join/PossiblyNullDimensionSelector.java

```diff
@@ -138,7 +138,8 @@ public int lookupId(@Nullable String name)
       // id 0 is always null for this selector impl.
       return 0;
     } else {
-      return baseSelector.idLookup().lookupId(name) + nullAdjustment;
+      IdLookup idLookup = baseSelector.idLookup();
+      return (idLookup == null ? 0 : idLookup.lookupId(name)) + nullAdjustment;
```

Review comment: Added an assert statement with an explanation of why it should never be null in here
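The resolution above replaces a silent default with an assert documenting the caller contract. A standalone sketch of that pattern (the `IdLookup` interface and method names here are simplified stand-ins, not Druid's actual classes):

```java
public class AssertNonNullSketch
{
  interface IdLookup
  {
    int lookupId(String name);
  }

  // Assert (with a reason) rather than silently substituting a default
  // when idLookup is unexpectedly null.
  static int lookupId(IdLookup idLookup, String name, int nullAdjustment)
  {
    // Callers are expected to have checked idLookup() for null before
    // calling lookupId, so a null here indicates a caller bug.
    assert idLookup != null : "caller must check idLookup() for null first";
    return idLookup.lookupId(name) + nullAdjustment;
  }

  public static void main(String[] args)
  {
    IdLookup lookup = name -> name.length();
    System.out.println(lookupId(lookup, "abc", 1)); // prints 4
  }
}
```

The assert costs nothing in production (disabled unless `-ea` is passed) while still catching contract violations in tests.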
[GitHub] [druid] suneet-s commented on a change in pull request #9760: Fix potential NPEs in joins
suneet-s commented on a change in pull request #9760: URL: https://github.com/apache/druid/pull/9760#discussion_r416984862

## File path: processing/src/main/java/org/apache/druid/segment/join/JoinConditionAnalysis.java

```diff
@@ -79,7 +79,7 @@ private JoinConditionAnalysis(
         .allMatch(expr -> expr.isLiteral() && expr.eval(ExprUtils.nilBindings()).asBoolean());
     canHashJoin = nonEquiConditions.stream().allMatch(Expr::isLiteral);
-    rightKeyColumns = getEquiConditions().stream().map(Equality::getRightColumn).distinct().collect(Collectors.toSet());
+    rightKeyColumns = getEquiConditions().stream().map(Equality::getRightColumn).collect(Collectors.toSet());
```

Review comment: should be a no-op. IntelliJ suggested it's unnecessary since we're collecting to a set
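The "no-op" claim above can be demonstrated directly: `Collectors.toSet()` already de-duplicates, so a preceding `distinct()` changes nothing. A small self-contained check (the column names are made up for illustration):

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class DistinctToSetSketch
{
  public static void main(String[] args)
  {
    // Duplicate right-hand columns, as a set of equi-conditions could produce.
    List<String> rightColumns = List.of("k", "v", "k");

    // With distinct(): an extra intermediate de-duplication step.
    Set<String> withDistinct =
        rightColumns.stream().distinct().collect(Collectors.toSet());

    // Without distinct(): Collectors.toSet() already de-duplicates,
    // so the result is identical.
    Set<String> withoutDistinct =
        rightColumns.stream().collect(Collectors.toSet());

    System.out.println(withDistinct.equals(withoutDistinct)); // prints true
  }
}
```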
[GitHub] [druid] jon-wei commented on a change in pull request #9760: Fix potential NPEs in joins
jon-wei commented on a change in pull request #9760: URL: https://github.com/apache/druid/pull/9760#discussion_r416975593

## File path: processing/src/main/java/org/apache/druid/segment/join/JoinConditionAnalysis.java

```diff
@@ -79,7 +79,7 @@ private JoinConditionAnalysis(
         .allMatch(expr -> expr.isLiteral() && expr.eval(ExprUtils.nilBindings()).asBoolean());
     canHashJoin = nonEquiConditions.stream().allMatch(Expr::isLiteral);
-    rightKeyColumns = getEquiConditions().stream().map(Equality::getRightColumn).distinct().collect(Collectors.toSet());
+    rightKeyColumns = getEquiConditions().stream().map(Equality::getRightColumn).collect(Collectors.toSet());
```

Review comment: What's the impact of removing the `distinct()` call (is it just unnecessary since they're being collected to a set already?)

## File path: processing/src/main/java/org/apache/druid/segment/join/PossiblyNullDimensionSelector.java

```diff
@@ -138,7 +138,8 @@ public int lookupId(@Nullable String name)
       // id 0 is always null for this selector impl.
       return 0;
     } else {
-      return baseSelector.idLookup().lookupId(name) + nullAdjustment;
+      IdLookup idLookup = baseSelector.idLookup();
+      return (idLookup == null ? 0 : idLookup.lookupId(name)) + nullAdjustment;
```

Review comment: A null `idLookup` there should be an error condition, since the caller should check if `idLookup()` returned null before calling `lookupId(name)` on the `PossiblyNullDimensionSelector`

## File path: processing/src/main/java/org/apache/druid/segment/join/lookup/LookupJoinable.java

```diff
@@ -95,18 +96,23 @@ public JoinMatcher makeJoinMatcher(
     boolean allowNonKeyColumnSearch
 )
 {
+    if (!ALL_COLUMNS.contains(searchColumnName) || !ALL_COLUMNS.contains(retrievalColumnName)) {
+      return ImmutableSet.of();
+    }
     Set correlatedValues;
     if (LookupColumnSelectorFactory.KEY_COLUMN.equals(searchColumnName)) {
       if (LookupColumnSelectorFactory.KEY_COLUMN.equals(retrievalColumnName)) {
         correlatedValues = ImmutableSet.of(searchColumnValue);
       } else {
-        correlatedValues = ImmutableSet.of(extractor.apply(searchColumnName));
+        // This should not happen in practice because the column to be joined on must be a key.
+        correlatedValues = Collections.singleton(extractor.apply(searchColumnValue));
```

Review comment: :+1:
[GitHub] [druid] suneet-s opened a new issue #9787: Filter pushdown for joins does not work when a non-matching condition is on the RHS
suneet-s opened a new issue #9787: URL: https://github.com/apache/druid/issues/9787

### Affected Version

0.18

### Description

See the following CalciteQueryTest:

```java
@Test
public void testFilterAndGroupByLookupUsingJoinOperatorWithValueFilterPushdownMatchesNothig() throws Exception
{
  // Cannot vectorize JOIN operator.
  cannotVectorize();

  Map queryRewriteValueColumnFiltersContext = DEFAULT_QUERY_CONTEXT_BUILDER
      .put(QueryContexts.JOIN_FILTER_REWRITE_VALUE_COLUMN_FILTERS_ENABLE_KEY, true)
      .build();
  testQuery(
      "SELECT lookyloo.k, COUNT(*)\n"
      + "FROM foo LEFT JOIN lookup.lookyloo ON foo.dim2 = lookyloo.k\n"
      + "WHERE lookyloo.v = '123'\n"
      + "GROUP BY lookyloo.k",
      queryRewriteValueColumnFiltersContext,
      ImmutableList.of(
          GroupByQuery.builder()
                      .setDataSource(
                          join(
                              new TableDataSource(CalciteTests.DATASOURCE1),
                              new LookupDataSource("lookyloo"),
                              "j0.",
                              equalsCondition(DruidExpression.fromColumn("dim2"), DruidExpression.fromColumn("j0.k")),
                              JoinType.LEFT
                          )
                      )
                      .setInterval(querySegmentSpec(Filtration.eternity()))
                      .setDimFilter(selector("j0.v", "123", null))
                      .setGranularity(Granularities.ALL)
                      .setDimensions(dimensions(new DefaultDimensionSpec("j0.k", "d0")))
                      .setAggregatorSpecs(aggregators(new CountAggregatorFactory("a0")))
                      .setContext(queryRewriteValueColumnFiltersContext)
                      .build()
      ),
      ImmutableList.of()
  );
}
```

Since the filter `lookyloo.v = '123'` matches nothing, Druid treats this as if filter pushdown was disabled, so it needs to do the hash join for every row in the fact table only to realize that the dimension does not match.

The filter pushdown mechanism should handle the following cases:
* can't push down for some reason (like disabled by config)
* non-existent column
* no matches
* matches some values
[GitHub] [druid] maytasm commented on a change in pull request #9714: More Hadoop integration tests
maytasm commented on a change in pull request #9714: URL: https://github.com/apache/druid/pull/9714#discussion_r416969482

## File path: integration-tests/src/test/java/org/apache/druid/tests/TestNGGroup.java

```diff
@@ -84,6 +82,15 @@
   */
  public static final String HDFS_DEEP_STORAGE = "hdfs-deep-storage";
 
+ public static final String HADOOP_S3_TO_S3 = "hadoop-s3-to-s3-deep-storage";
```

Review comment: Ahhh actually you don't since we exclude the hadoop test package in testng.xml.
[GitHub] [druid] maytasm commented on a change in pull request #9714: More Hadoop integration tests
maytasm commented on a change in pull request #9714: URL: https://github.com/apache/druid/pull/9714#discussion_r416972412

## File path: integration-tests/src/test/java/org/apache/druid/tests/TestNGGroup.java

```diff
@@ -84,6 +82,15 @@
   */
  public static final String HDFS_DEEP_STORAGE = "hdfs-deep-storage";
 
+ public static final String HADOOP_S3_TO_S3 = "hadoop-s3-to-s3-deep-storage";
```

Review comment: Wait, then your -Dgroups will not work. I believe if you run the mvn cmd with -Dgroups, it will use the suite defined in testng.xml. This suite excludes running the hadoop package (org.apache.druid.tests.hadoop). I think one solution is to remove the exclusion of the hadoop package from testng.xml and add all the new groups to the exclusion list in Travis's "other" integration test group.
[GitHub] [druid] jihoonson commented on a change in pull request #9740: fill out missing test coverage for druid-stats, druid-momentsketch, druid-tdigestsketch postaggs
jihoonson commented on a change in pull request #9740: URL: https://github.com/apache/druid/pull/9740#discussion_r416966480

## File path: extensions-contrib/tdigestsketch/src/test/java/org/apache/druid/query/aggregation/tdigestsketch/TDigestSketchToQuantilePostAggregatorTest.java

## @@ -0,0 +1,83 @@

```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied. See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */

package org.apache.druid.query.aggregation.tdigestsketch;

import nl.jqno.equalsverifier.EqualsVerifier;
import org.apache.druid.jackson.DefaultObjectMapper;
import org.apache.druid.java.util.common.IAE;
import org.apache.druid.query.aggregation.PostAggregator;
import org.apache.druid.query.aggregation.post.ConstantPostAggregator;
import org.junit.Assert;
import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.ExpectedException;

public class TDigestSketchToQuantilePostAggregatorTest
{
  @Rule
  public ExpectedException expectedException = ExpectedException.none();

  @Test
  public void testSerde() throws Exception
  {
    TDigestSketchToQuantilePostAggregator there =
        new TDigestSketchToQuantilePostAggregator("post", new ConstantPostAggregator("", 100), 0.5);

    DefaultObjectMapper mapper = new DefaultObjectMapper();
    TDigestSketchToQuantilePostAggregator andBackAgain = mapper.readValue(
        mapper.writeValueAsString(there),
        TDigestSketchToQuantilePostAggregator.class
    );

    Assert.assertEquals(there, andBackAgain);
    Assert.assertArrayEquals(there.getCacheKey(), andBackAgain.getCacheKey());
    Assert.assertEquals(there.getDependentFields(), andBackAgain.getDependentFields());
  }

  @Test
  public void testToString()
  {
    PostAggregator postAgg =
        new TDigestSketchToQuantilePostAggregator("post", new ConstantPostAggregator("", 100), 0.5);

    Assert.assertEquals(
        "TDigestSketchToQuantilePostAggregator{name='post', field=ConstantPostAggregator{name='', constantValue=100}, fraction=0.5}",
        postAgg.toString()
    );
  }

  @Test
  public void testComparator()
  {
    expectedException.expect(IAE.class);
    expectedException.expectMessage("Comparing arrays of quantiles is not supported");
```

Review comment: Seems like the comparator should compare doubles instead of throwing an exception.
[GitHub] [druid] maytasm commented on a change in pull request #9714: More Hadoop integration tests
maytasm commented on a change in pull request #9714: URL: https://github.com/apache/druid/pull/9714#discussion_r416965658

## File path: integration-tests/src/test/java/org/apache/druid/tests/TestNGGroup.java

```diff
@@ -84,6 +82,15 @@
   */
  public static final String HDFS_DEEP_STORAGE = "hdfs-deep-storage";
 
+ public static final String HADOOP_S3_TO_S3 = "hadoop-s3-to-s3-deep-storage";
```

Review comment: I think you also have to exclude this from Travis's other integration test group
[GitHub] [druid] maytasm commented on a change in pull request #9714: More Hadoop integration tests
maytasm commented on a change in pull request #9714: URL: https://github.com/apache/druid/pull/9714#discussion_r416953397 ## File path: integration-tests/src/test/java/org/apache/druid/tests/hadoop/ITGcsInputToGcsHadoopIndexTest.java ## @@ -29,17 +29,18 @@ * To run this test, you must: * 1) Set the bucket and path for your data. This can be done by setting -Ddruid.test.config.cloudBucket and *-Ddruid.test.config.cloudPath or setting "cloud_bucket" and "cloud_path" in the config file. - * 2. Set -Ddruid.test.config.hadoopGcsCredentialsPath to the location of your Google credentials file as it + * 2) Set -Ddruid.test.config.hadoopGcsCredentialsPath to the location of your Google credentials file as it *exists within the Hadoop cluster that will ingest the data. The credentials file can be placed in the *shared folder used by the integration test containers if running the Docker-based Hadoop container, *in which case this property can be set to /shared/ - * 2) Copy wikipedia_index_data1.json, wikipedia_index_data2.json, and wikipedia_index_data3.json + * 3) Provide -Dresource.file.dir.path= with folder that contains GOOGLE_APPLICATION_CREDENTIALS file + * 4) Copy wikipedia_index_data1.json, wikipedia_index_data2.json, and wikipedia_index_data3.json *located in integration-tests/src/test/resources/data/batch_index/json to your GCS at the location set in step 1. - * 3) Provide -Doverride.config.path= with gcs configs set. See + * 5) Provide -Doverride.config.path= with gcs configs set. See * integration-tests/docker/environment-configs/override-examples/hadoop/gcs_to_gcs for env vars to provide. 
- * 4) Run the test with -Dstart.hadoop.docker=true -Dextra.datasource.name.suffix='' in the mvn command + * 6) Run the test with -Dstart.hadoop.docker=true -Dextra.datasource.name.suffix='' in the mvn command Review comment: nit: maybe mention that -Dextra.datasource.name.suffix='' is due to github issue xxx (not sure if we have an issue for this) ## File path: integration-tests/README.md ## @@ -214,31 +214,32 @@ of the integration test run discussed above. This is because druid test clusters might not, in general, have access to hadoop. This also applies to integration test that uses Hadoop HDFS as an inputSource or as a deep storage. To run integration test that uses Hadoop, you will have to run a Hadoop cluster. This can be done in two ways: +1) Run Druid Docker test clusters with Hadoop container by passing -Dstart.hadoop.docker=true to the mvn command. 1) Run your own Druid + Hadoop cluster and specified Hadoop configs in the configuration file (CONFIG_FILE). -2) Run Druid Docker test clusters with Hadoop container by passing -Dstart.hadoop.docker=true to the mvn command. Currently, hdfs-deep-storage and other -deep-storage integration test groups can only be run with Druid Docker test clusters by passing -Dstart.hadoop.docker=true to start Hadoop container. You will also have to provide -Doverride.config.path= with your Druid's Hadoop configs set. See integration-tests/docker/environment-configs/override-examples/hdfs directory for example. Note that if the integration test you are running also uses other cloud extension (S3, Azure, GCS), additional -credentials/configs may need to be set in the same file as your Druid's Hadoop configs set. +credentials/configs may need to be set in the same file as your Druid's Hadoop configs set. 
Currently, ITHadoopIndexTest can only be run with your own Druid + Hadoop cluster by following the below steps: -Create a directory called batchHadoop1 in the hadoop file system -(anywhere you want) and put batch_hadoop.data (integration-tests/src/test/resources/hadoop/batch_hadoop.data) -into that directory (as its only file). - -Add this keyword to the configuration file (see above): +- Copy wikipedia_index_data1.json, wikipedia_index_data2.json, and wikipedia_index_data3.json + located in integration-tests/src/test/resources/data/batch_index/json to your HDFS at /batch_index/json/ + If using the Docker-based Hadoop container, this is automatically done by the integration tests. +- Copy batch_hadoop.data located in integration-tests/src/test/resources/data/batch_index/tsv to your HDFS + at /batch_index/tsv/ + If using the Docker-based Hadoop container, this is automatically done by the integration tests. +Run the test using mvn (using the bundled Docker-based Hadoop cluster): ``` -"hadoopTestDir": "" + mvn verify -P integration-tests -Dit.test=ITHadoopIndexTest -Dstart.hadoop.docker=true -Doverride.config.path=docker/environment-configs/override-examples/hdfs -Dextra.datasource.name.suffix='' ``` -Run the test using mvn: - +Run the test using mvn (using config file for existing Hadoop cluster): ``` - mvn verify -P int-tests-config-file -Dit.test=ITHadoopIndexTest
[GitHub] [druid] ccaominh commented on a change in pull request #9783: Integration tests for stream ingestion with various data formats
ccaominh commented on a change in pull request #9783: URL: https://github.com/apache/druid/pull/9783#discussion_r416958063 ## File path: integration-tests/src/test/java/org/apache/druid/tests/parallelized/ITKinesisIndexingServiceDataFormatTest.java ## @@ -0,0 +1,96 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.druid.tests.parallelized; + +import com.fasterxml.jackson.databind.ObjectMapper; +import com.google.inject.Inject; +import org.apache.druid.guice.annotations.Json; +import org.apache.druid.testing.guice.DruidTestModuleFactory; +import org.apache.druid.tests.TestNGGroup; +import org.apache.druid.tests.indexer.AbstractKinesisIndexingServiceTest; +import org.apache.druid.tests.indexer.AbstractStreamIndexingTest; +import org.testng.annotations.BeforeClass; +import org.testng.annotations.DataProvider; +import org.testng.annotations.Guice; +import org.testng.annotations.Test; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.List; +import java.util.Map; + +@Test(groups = "kinesis-data-format") Review comment: Keeping it separate is nice for running the groups concurrently. 
Perhaps we can explore TestNG [groups of groups](https://testng.org/doc/documentation-main.html#groups-of-groups) later to simplify running multiple groups.
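For context, TestNG's "groups of groups" are declared in testng.xml rather than in annotations. A hypothetical sketch of what that could look like for these tests — the `kafka-data-format` name and suite layout are assumptions based on this thread, not the project's actual integration-tests/src/test/resources/testng.xml:

```xml
<!-- Hypothetical testng.xml fragment; group names here are assumptions. -->
<suite name="IntegrationTestSuite">
  <test name="stream-data-format-tests">
    <groups>
      <!-- A meta-group bundling the individual data-format groups. -->
      <define name="stream-data-format">
        <include name="kafka-data-format"/>
        <include name="kinesis-data-format"/>
      </define>
      <run>
        <include name="stream-data-format"/>
      </run>
    </groups>
  </test>
</suite>
```

Running the `stream-data-format` meta-group would then execute every test in either member group, while still letting the individual groups run concurrently on their own.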
[GitHub] [druid] jihoonson commented on a change in pull request #9783: Integration tests for stream ingestion with various data formats
jihoonson commented on a change in pull request #9783: URL: https://github.com/apache/druid/pull/9783#discussion_r416956654 ## File path: integration-tests/src/test/java/org/apache/druid/tests/parallelized/ITKinesisIndexingServiceDataFormatTest.java ## @@ -0,0 +1,96 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.druid.tests.parallelized; + +import com.fasterxml.jackson.databind.ObjectMapper; +import com.google.inject.Inject; +import org.apache.druid.guice.annotations.Json; +import org.apache.druid.testing.guice.DruidTestModuleFactory; +import org.apache.druid.tests.TestNGGroup; +import org.apache.druid.tests.indexer.AbstractKinesisIndexingServiceTest; +import org.apache.druid.tests.indexer.AbstractStreamIndexingTest; +import org.testng.annotations.BeforeClass; +import org.testng.annotations.DataProvider; +import org.testng.annotations.Guice; +import org.testng.annotations.Test; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.List; +import java.util.Map; + +@Test(groups = "kinesis-data-format") Review comment: On second thought, it might be useful to have a different group if we run this test on some CI in the future. 
I added a new group to `TestNGGroup`. The new group is disabled on Travis.
[GitHub] [druid] maytasm commented on a change in pull request #9714: More Hadoop integration tests
maytasm commented on a change in pull request #9714: URL: https://github.com/apache/druid/pull/9714#discussion_r416950821 ## File path: integration-tests/README.md ## @@ -214,31 +214,32 @@ of the integration test run discussed above. This is because druid test clusters might not, in general, have access to hadoop. This also applies to integration test that uses Hadoop HDFS as an inputSource or as a deep storage. To run integration test that uses Hadoop, you will have to run a Hadoop cluster. This can be done in two ways: +1) Run Druid Docker test clusters with Hadoop container by passing -Dstart.hadoop.docker=true to the mvn command. Review comment: nit: typo? both are 1) ## File path: integration-tests/run_cluster.sh ## @@ -74,17 +74,18 @@ # For druid-kinesis-indexing-service mkdir -p $SHARED_DIR/docker/extensions/druid-kinesis-indexing-service mv $SHARED_DIR/docker/lib/druid-kinesis-indexing-service-* $SHARED_DIR/docker/extensions/druid-kinesis-indexing-service - $ For druid-parquet-extensions + # For druid-parquet-extensions mkdir -p $SHARED_DIR/docker/extensions/druid-parquet-extensions mv $SHARED_DIR/docker/lib/druid-parquet-extensions-* $SHARED_DIR/docker/extensions/druid-parquet-extensions - $ For druid-orc-extensions + # For druid-orc-extensions mkdir -p $SHARED_DIR/docker/extensions/druid-orc-extensions mv $SHARED_DIR/docker/lib/druid-orc-extensions-* $SHARED_DIR/docker/extensions/druid-orc-extensions # Pull Hadoop dependency if needed if [ -n "$DRUID_INTEGRATION_TEST_START_HADOOP_DOCKER" ] && [ "$DRUID_INTEGRATION_TEST_START_HADOOP_DOCKER" == true ] then -java -cp "$SHARED_DIR/docker/lib/*" -Ddruid.extensions.hadoopDependenciesDir="$SHARED_DIR/hadoop-dependencies" org.apache.druid.cli.Main tools pull-deps -h org.apache.hadoop:hadoop-client:2.8.5 -h org.apache.hadoop:hadoop-aws:2.8.5 +java -cp "$SHARED_DIR/docker/lib/*" -Ddruid.extensions.hadoopDependenciesDir="$SHARED_DIR/hadoop-dependencies" org.apache.druid.cli.Main tools pull-deps -h 
org.apache.hadoop:hadoop-client:2.8.5 -h org.apache.hadoop:hadoop-aws:2.8.5 -h org.apache.hadoop:hadoop-azure:2.8.5 +curl https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-hadoop2-latest.jar --output $SHARED_DIR/docker/lib/gcs-connector-hadoop2-latest.jar Review comment: Should we create an issue for this? Seems like it might be a problem for new users going through the tutorial?
[GitHub] [druid] jihoonson commented on a change in pull request #9783: Integration tests for stream ingestion with various data formats
jihoonson commented on a change in pull request #9783: URL: https://github.com/apache/druid/pull/9783#discussion_r416944603 ## File path: integration-tests/src/test/java/org/apache/druid/tests/parallelized/ITKinesisIndexingServiceDataFormatTest.java ## @@ -0,0 +1,96 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.druid.tests.parallelized; + +import com.fasterxml.jackson.databind.ObjectMapper; +import com.google.inject.Inject; +import org.apache.druid.guice.annotations.Json; +import org.apache.druid.testing.guice.DruidTestModuleFactory; +import org.apache.druid.tests.TestNGGroup; +import org.apache.druid.tests.indexer.AbstractKinesisIndexingServiceTest; +import org.apache.druid.tests.indexer.AbstractStreamIndexingTest; +import org.testng.annotations.BeforeClass; +import org.testng.annotations.DataProvider; +import org.testng.annotations.Guice; +import org.testng.annotations.Test; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.List; +import java.util.Map; + +@Test(groups = "kinesis-data-format") Review comment: Oh, I forgot to revert this change. 
I think it would be better to add this test to the existing `kinesis-index` group since we run them manually and probably want to run them all at once. What do you think?
[GitHub] [druid] maytasm commented on a change in pull request #9778: Fix problem when running single integration test using -Dit.test=
maytasm commented on a change in pull request #9778: URL: https://github.com/apache/druid/pull/9778#discussion_r416943418 ## File path: integration-tests/src/main/java/org/testng/DruidTestRunnerFactory.java ## @@ -0,0 +1,109 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package /*CHECKSTYLE.OFF: PackageName*/org.testng/*CHECKSTYLE.ON: PackageName*/; + +import org.apache.druid.java.util.common.logger.Logger; +import org.apache.druid.testing.utils.SuiteListener; +import org.testng.internal.IConfiguration; +import org.testng.internal.Systematiser; +import org.testng.internal.annotations.IAnnotationFinder; +import org.testng.xml.XmlTest; + +import java.util.Collection; +import java.util.Collections; +import java.util.List; + +/** + * This class must be in package org.testng to access protected methods like TestNG.getDefault().getConfiguration() + */ +public class DruidTestRunnerFactory implements ITestRunnerFactory +{ + private static final Logger LOG = new Logger(DruidTestRunnerFactory.class); + private static final SuiteListener SUITE_LISTENER = new SuiteListener(); + + @Override + public TestRunner newTestRunner(ISuite suite, + XmlTest test, + Collection methodListeners, + List classListeners) + { +IConfiguration configuration = TestNG.getDefault().getConfiguration(); +String outputDirectory = suite.getOutputDirectory(); +IAnnotationFinder annotationFinder = configuration.getAnnotationFinder(); +Boolean skipFailedInvocationCounts = suite.getXmlSuite().skipFailedInvocationCounts(); +return new DruidTestRunner( +configuration, +suite, +test, +outputDirectory, +annotationFinder, +skipFailedInvocationCounts, +methodListeners, +classListeners +); + } + + private static class DruidTestRunner extends TestRunner + { +DruidTestRunner( +IConfiguration configuration, +ISuite suite, +XmlTest test, +String outputDirectory, +IAnnotationFinder finder, +boolean skipFailedInvocationCounts, +Collection methodListeners, +List classListeners +) +{ + super(configuration, suite, test, outputDirectory, finder, skipFailedInvocationCounts, methodListeners, classListeners, Systematiser.getComparator(), Collections.emptyMap()); +} + +@Override +public void run() +{ + try { +// IntegrationTestSuite is run when -Dit.test is not specify in maven 
command. +// IntegrationTestSuite uses the configuration from integration-tests/src/test/resources/testng.xml +// which already handle suite onStart and onFinish automatically +if (!"IntegrationTestSuite".equals(getSuite().getName())) { + SUITE_LISTENER.onStart(getSuite()); +} +runTests(); + } + catch (Exception e) { +LOG.error(e, ""); Review comment: Removed it.
[GitHub] [druid] ccaominh commented on a change in pull request #9783: Integration tests for stream ingestion with various data formats
ccaominh commented on a change in pull request #9783: URL: https://github.com/apache/druid/pull/9783#discussion_r416931177 ## File path: integration-tests/src/test/java/org/apache/druid/tests/parallelized/ITKinesisIndexingServiceDataFormatTest.java ## @@ -0,0 +1,96 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.druid.tests.parallelized; + +import com.fasterxml.jackson.databind.ObjectMapper; +import com.google.inject.Inject; +import org.apache.druid.guice.annotations.Json; +import org.apache.druid.testing.guice.DruidTestModuleFactory; +import org.apache.druid.tests.TestNGGroup; +import org.apache.druid.tests.indexer.AbstractKinesisIndexingServiceTest; +import org.apache.druid.tests.indexer.AbstractStreamIndexingTest; +import org.testng.annotations.BeforeClass; +import org.testng.annotations.DataProvider; +import org.testng.annotations.Guice; +import org.testng.annotations.Test; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.List; +import java.util.Map; + +@Test(groups = "kinesis-data-format") Review comment: Can add a new named constant to `TestNGGroup` for "kinesis-data-format" This is an automated message from the Apache Git Service. 
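The suggestion above — replacing the raw string with a named constant — could be sketched as follows. This is a hypothetical illustration of the review comment, not the code that actually landed in the PR; the constant name and placement are assumptions:

```java
// In TestNGGroup, alongside the existing constants such as HDFS_DEEP_STORAGE:
public static final String KINESIS_DATA_FORMAT = "kinesis-data-format";

// The test class then references the constant instead of a string literal,
// so the group name stays consistent across tests and testng.xml excludes:
@Test(groups = TestNGGroup.KINESIS_DATA_FORMAT)
@Guice(moduleFactory = DruidTestModuleFactory.class)
public class ITKinesisIndexingServiceDataFormatTest extends AbstractKinesisIndexingServiceTest
{
  // ...
}
```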
[GitHub] [druid] jihoonson commented on pull request #9783: Integration tests for stream ingestion with various data formats
jihoonson commented on pull request #9783: URL: https://github.com/apache/druid/pull/9783#issuecomment-620854252 Manually ran the new Kinesis test on my laptop.
[GitHub] [druid] jihoonson commented on a change in pull request #9778: Fix problem when running single integration test using -Dit.test=
jihoonson commented on a change in pull request #9778: URL: https://github.com/apache/druid/pull/9778#discussion_r416915215 ## File path: integration-tests/src/main/java/org/testng/DruidTestRunnerFactory.java ## @@ -0,0 +1,109 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package /*CHECKSTYLE.OFF: PackageName*/org.testng/*CHECKSTYLE.ON: PackageName*/; + +import org.apache.druid.java.util.common.logger.Logger; +import org.apache.druid.testing.utils.SuiteListener; +import org.testng.internal.IConfiguration; +import org.testng.internal.Systematiser; +import org.testng.internal.annotations.IAnnotationFinder; +import org.testng.xml.XmlTest; + +import java.util.Collection; +import java.util.Collections; +import java.util.List; + +/** + * This class must be in package org.testng to access protected methods like TestNG.getDefault().getConfiguration() + */ +public class DruidTestRunnerFactory implements ITestRunnerFactory +{ + private static final Logger LOG = new Logger(DruidTestRunnerFactory.class); + private static final SuiteListener SUITE_LISTENER = new SuiteListener(); + + @Override + public TestRunner newTestRunner(ISuite suite, + XmlTest test, + Collection methodListeners, + List classListeners) + { +IConfiguration configuration = TestNG.getDefault().getConfiguration(); +String outputDirectory = suite.getOutputDirectory(); +IAnnotationFinder annotationFinder = configuration.getAnnotationFinder(); +Boolean skipFailedInvocationCounts = suite.getXmlSuite().skipFailedInvocationCounts(); +return new DruidTestRunner( +configuration, +suite, +test, +outputDirectory, +annotationFinder, +skipFailedInvocationCounts, +methodListeners, +classListeners +); + } + + private static class DruidTestRunner extends TestRunner + { +DruidTestRunner( +IConfiguration configuration, +ISuite suite, +XmlTest test, +String outputDirectory, +IAnnotationFinder finder, +boolean skipFailedInvocationCounts, +Collection methodListeners, +List classListeners +) +{ + super(configuration, suite, test, outputDirectory, finder, skipFailedInvocationCounts, methodListeners, classListeners, Systematiser.getComparator(), Collections.emptyMap()); +} + +@Override +public void run() +{ + try { +// IntegrationTestSuite is run when -Dit.test is not specify in maven 
command. +// IntegrationTestSuite uses the configuration from integration-tests/src/test/resources/testng.xml +// which already handle suite onStart and onFinish automatically +if (!"IntegrationTestSuite".equals(getSuite().getName())) { + SUITE_LISTENER.onStart(getSuite()); +} +runTests(); + } + catch (Exception e) { +LOG.error(e, ""); Review comment: Seems like neither `SuiteListener.onStart()` nor `runTests()` throws an `Exception`. Is this for logging? I changed `LoggerListener` to print the stack trace properly [in my PR](https://github.com/apache/druid/pull/9783/files#diff-8e812fb64c6e87651c2a617561f88729R36), so this probably wouldn't be necessary.
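The `LoggerListener` change referenced above (logging the failure with its full stack trace) follows the standard TestNG listener pattern. A hedged sketch, assuming a listener built on `TestListenerAdapter` — this is not the actual Druid class:

```java
import org.apache.druid.java.util.common.logger.Logger;
import org.testng.ITestResult;
import org.testng.TestListenerAdapter;

// Sketch only: illustrates logging a test failure with its stack trace.
public class LoggerListenerSketch extends TestListenerAdapter
{
  private static final Logger LOG = new Logger(LoggerListenerSketch.class);

  @Override
  public void onTestFailure(ITestResult result)
  {
    // Passing the Throwable first makes the Druid Logger emit the stack trace.
    LOG.error(result.getThrowable(), "[%s] failed", result.getMethod().getMethodName());
  }
}
```

With failures logged in the listener, a catch-and-log wrapper around `runTests()` in the runner becomes redundant, which is the point being made in the review comment.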
[GitHub] [druid] averma111 opened a new issue #9786: Druid 0.18 version
averma111 opened a new issue #9786: URL: https://github.com/apache/druid/issues/9786 Hi Team, I am planning to build a Druid 0.18 cluster in a GKE environment using a custom Docker file I have created. I plan to create 3 pods: 1 master (Overlord and Coordinator), 1 data node (Historical and MiddleManager), and 1 query node (Router and Broker), keeping ZooKeeper and the metadata store in a VM instance so that they won't interfere with the GKE load balancer. Let me know if this design is correct or whether there is another design I should consider. I appreciate any help. Thanks, Ashish
[GitHub] [druid] JulianJaffePinterest commented on issue #9707: Add IS_MULTIVALUE and COMPLEX_TYPE columns to INFORMATION_SCHEMA.COLUMNS
JulianJaffePinterest commented on issue #9707: URL: https://github.com/apache/druid/issues/9707#issuecomment-620836497 Segment Metadata queries do return the complex type name (e.g. `thetaSketch` or `variance`). I'll try to grab some time to work on this soon.
[GitHub] [druid] knoguchi commented on a change in pull request #7590: Add single-machine deployment example cfgs and scripts
knoguchi commented on a change in pull request #7590: URL: https://github.com/apache/druid/pull/7590#discussion_r416895607 ## File path: examples/bin/run-druid ## @@ -34,7 +34,7 @@ else CONFDIR="$2" fi -CONFDIR="$(cd "$CONFDIR" && pwd)/druid" +CONFDIR="$(cd "$CONFDIR" && pwd)" Review comment: Why was the /druid removed? It's a breaking change for users of older versions, and it doesn't seem like an absolutely necessary change. This file is one of the bin scripts in the apache-druid.tar.gz distribution, although the directory name is "examples". Please avoid similar changes in future releases. Thanks.
[GitHub] [druid] capistrant opened a new pull request #9785: Ensure Krb auth before killing YARN apps in graceful shutdown of hadoop batch index tasks
capistrant opened a new pull request #9785: URL: https://github.com/apache/druid/pull/9785 ### Description My deployments had struggled with graceful shutdown of our Hadoop batch indexing tasks up until now. In the past, we simply patched out graceful shutdown since we didn't really mind the YARN apps not being killed. However, for our Druid 0.18 upgrade we wanted to get this fixed. After troubleshooting, we found that the code was getting hung in `ToolRunner#run` when trying to kill the YARN app for the indexing job. After some trial and error in our fork, we discovered it was an authentication issue and that calling into `JobHelper#authenticate` fixed the hang in the graceful shutdown code. Our analysis of `JobHelper#authenticate` led us to believe it is harmless to call multiple times and also harmless to call in a cluster that is not using Kerberized Hadoop: the call is transparent if authentication is already done or the Hadoop cluster in use is not using Kerberos. We also slightly refactored the signature of `JobHelper#authenticate` after IntelliJ flagged the config parameter as unused. With all this being said, I'm not sure if calling into the `authenticate` method from a different module is frowned upon as far as design goes. I decided to reuse that method and see what the community had to say about that. This PR has: - [ ] been self-reviewed. - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.) - [ ] added documentation for new or modified features or behaviors. - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
- [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml) - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader. - [ ] added unit tests or modified existing tests to cover new code paths. - [ ] added integration tests. - [ ] been tested in a test Druid cluster. # Key changed/added classes in this PR * `JobHelper` * `HadoopIndexTask`
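The idempotent-authentication behavior the PR relies on can be illustrated with Hadoop's `UserGroupInformation` API. This is a rough sketch of the pattern described in the description, not the actual `JobHelper#authenticate` code; the principal and keytab values are placeholders that Druid would read from its Hadoop configuration:

```java
import java.io.IOException;
import org.apache.hadoop.security.UserGroupInformation;

public final class KerberosAuthSketch
{
  // Hypothetical config values; in Druid these come from runtime properties.
  private static final String PRINCIPAL = "druid@EXAMPLE.COM";
  private static final String KEYTAB = "/etc/security/keytabs/druid.keytab";

  public static void authenticate()
  {
    // Transparent on non-Kerberized clusters: nothing to do.
    if (!UserGroupInformation.isSecurityEnabled()) {
      return;
    }
    try {
      // Transparent if already authenticated: safe to call multiple times.
      if (UserGroupInformation.getCurrentUser().hasKerberosCredentials()) {
        return;
      }
      UserGroupInformation.loginUserFromKeytab(PRINCIPAL, KEYTAB);
    }
    catch (IOException e) {
      throw new RuntimeException("Kerberos login failed", e);
    }
  }
}
```

Because both early returns make repeated calls no-ops, invoking this before killing the YARN app (as the PR does in `HadoopIndexTask`) cannot break clusters that have already authenticated or that run without Kerberos.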
[druid] branch master updated (e7e41e3 -> b279e04)
This is an automated email from the ASF dual-hosted git repository. fjy pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/druid.git. from e7e41e3 Adding support for autoscaling in GCE (#8987) add b279e04 table fix (#9769) No new revisions were added by this update. Summary of changes: docs/querying/sql.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
[GitHub] [druid] stale[bot] commented on issue #8134: OutOfMemoryError in BufferHashGrouperTest
stale[bot] commented on issue #8134: URL: https://github.com/apache/druid/issues/8134#issuecomment-620745702 This issue has been marked as stale due to 280 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the d...@druid.apache.org list. Thank you for your contributions.
[GitHub] [druid] suneet-s commented on a change in pull request #9773: [WIP] Bad plan for table-lookup-lookup join with filter on first lookup and outer limit
suneet-s commented on a change in pull request #9773: URL: https://github.com/apache/druid/pull/9773#discussion_r416778349

## File path: sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidJoinQueryRel.java

@@ -328,17 +327,27 @@ private static JoinType toDruidJoinType(JoinRelType calciteJoinType)
   }
 }

-  private static boolean computeLeftRequiresSubquery(final DruidRel left)
+  private static boolean computeLeftRequiresSubquery(final List> leftList)

Review comment: What do you think about keeping this interface the same, and instead changing the private members `left` and `right` to `DruidRel`? Whenever `left` and `right` are set, we can do something like `this.left = getCheapestDruidRel(RelNode);`. As it's written, I wonder whether it's possible for us to estimate in `computeSelfCost` that a subquery is not required, while the DruidRel used to build the native query actually uses a subquery.
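The reviewer's suggestion (keep `left`/`right` as single rels, chosen as the cheapest candidate at assignment time) could be sketched generically as below. This is an illustrative stand-alone sketch, not Druid planner code; `Rel`, `selfCost`, and `getCheapestRel` are hypothetical stand-ins for the suggested `getCheapestDruidRel`:

```java
import java.util.Comparator;
import java.util.List;

public class CheapestRelDemo
{
  /** Hypothetical stand-in for a planner rel with a self cost, for illustration only. */
  record Rel(String name, double selfCost) {}

  /** Picks the cheapest candidate, mirroring the suggested getCheapestDruidRel(...). */
  static Rel getCheapestRel(List<Rel> candidates)
  {
    return candidates.stream()
                     .min(Comparator.comparingDouble(Rel::selfCost))
                     .orElseThrow();
  }

  public static void main(String[] args)
  {
    // The cost estimated here is the same one later used to build the query,
    // which avoids the estimate/build mismatch the reviewer is worried about.
    Rel cheapest = getCheapestRel(List.of(new Rel("scan", 10.0), new Rel("subquery", 25.0)));
    System.out.println(cheapest.name()); // prints scan
  }
}
```

The point of the design choice: if the cheapest rel is selected once at assignment time, cost estimation and query building cannot disagree about which rel is used.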
[GitHub] [druid] venkatramanp opened a new issue #9784: Filter datasources using regex in Segment Timeline of Druid Webconsole Datasourcese tab
venkatramanp opened a new issue #9784: URL: https://github.com/apache/druid/issues/9784

### Description

The segment timeline shows all datasources by default. If there are hundreds of datasources with hard-to-distinguish names, it is very difficult to find a specific one, and they do not currently appear to be in sorted order either. Note: there is a filter dialog for the datasources below the segment timeline.

### Motivation

This would make it easy to find segment timeline information in the Druid web console. This was possible in the older consoles.
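The requested behavior could be sketched as follows. This is an illustrative Java snippet (the actual web console is TypeScript); `DatasourceFilter` and its sample datasource names are hypothetical:

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class DatasourceFilter
{
  /** Returns only the datasource names matching the given regex, sorted alphabetically. */
  public static List<String> filter(List<String> datasources, String regex)
  {
    final Pattern pattern = Pattern.compile(regex);
    return datasources.stream()
                      .filter(name -> pattern.matcher(name).find())
                      .sorted()
                      .collect(Collectors.toList());
  }

  public static void main(String[] args)
  {
    List<String> names = List.of("wikipedia", "wiki_edits", "metrics-prod", "metrics-dev");
    // Regex narrows the list, sorting addresses the "not in sorted order" complaint.
    System.out.println(filter(names, "^metrics")); // prints [metrics-dev, metrics-prod]
  }
}
```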
[GitHub] [druid] stale[bot] commented on issue #7957: Druid SQL planning time can exceed query timeouts
stale[bot] commented on issue #7957: URL: https://github.com/apache/druid/issues/7957#issuecomment-620712898 This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time.
[GitHub] [druid] pjain1 commented on a change in pull request #9449: Add Sql InputSource
pjain1 commented on a change in pull request #9449: URL: https://github.com/apache/druid/pull/9449#discussion_r416549940

## File path: server/src/main/java/org/apache/druid/guice/FirehoseModule.java

@@ -58,7 +59,8 @@ public void configure(Binder binder)
 new NamedType(CombiningFirehoseFactory.class, "combining"),
 new NamedType(FixedCountFirehoseFactory.class, "fixedCount"),
 new NamedType(SqlFirehoseFactory.class, "sql"),
-new NamedType(InlineFirehoseFactory.class, "inline")
+new NamedType(InlineFirehoseFactory.class, "inline"),
+new NamedType(SqlInputSource.class, "sql")

Review comment: Could this be part of a new module, since InputSource seems to be a replacement for the firehose-related interfaces?
[druid] branch master updated: Adding support for autoscaling in GCE (#8987)
This is an automated email from the ASF dual-hosted git repository.

cwylie pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/druid.git

The following commit(s) were added to refs/heads/master by this push:
     new e7e41e3  Adding support for autoscaling in GCE (#8987)

e7e41e3 is described below

commit e7e41e3a369c236b444dc44338c443c2e46904f1
Author: Francesco Nidito <11637948+frnid...@users.noreply.github.com>
AuthorDate: Tue Apr 28 12:13:39 2020 +0200

    Adding support for autoscaling in GCE (#8987)

    * Adding support for autoscaling in GCE
    * adding extra google deps also in gce pom
    * fix link in doc
    * remove unused deps
    * adding terms to spelling file
    * version in pom 0.17.0-incubating-SNAPSHOT --> 0.18.0-SNAPSHOT
    * GCEXyz -> GceXyz in naming for consistency
    * add preconditions
    * add VisibleForTesting annotation
    * typos in comments
    * use StringUtils.format instead of String.format
    * use custom exception instead of exit
    * factorize interval time between retries
    * making literal value a constant
    * iter all network interfaces
    * use provided on google (non api) deps
    * adding missing dep
    * removing unneded this and use Objects methods instead o 3-way if in hash and comparison
    * adding import
    * adding retries around getRunningInstances and adding limit for operation end waiting
    * refactor GceEnvironmentConfig.hashCode
    * 0.18.0-SNAPSHOT -> 0.19.0-SNAPSHOT
    * removing unused config
    * adding tests to hash and equals
    * adding nullable to waitForOperationEnd
    * adding testTerminate
    * adding unit tests for createComputeService
    * increasing retries in unrelated integration-test to prevent sporadic failure (hopefully)
    * reverting queryResponseTemplate change
    * adding comment for Compute.Builder.build() returning null
---
 distribution/pom.xml                                  |   2 +
 docs/configuration/index.md                           |   8 +-
 docs/development/extensions-contrib/gce-extensions.md | 103 +++
 docs/development/extensions.md                        |   1 +
 extensions-contrib/gce-extensions/pom.xml             | 131
 .../overlord/autoscaling/gce/GceAutoScaler.java       | 526 +
 .../autoscaling/gce/GceEnvironmentConfig.java         | 132
 .../overlord/autoscaling/gce/GceModule.java           |  42 +
 .../autoscaling/gce/GceServiceException.java          |  32 +
 .../overlord/autoscaling/gce/GceUtils.java            |  73 ++
 .../org.apache.druid.initialization.DruidModule       |  16 +
 .../autoscaling/gce/GceAutoScalerTest.java            | 853 +
 .../overlord/autoscaling/gce/GceUtilsTest.java        |  96 +++
 extensions-core/google-extensions/pom.xml             |   2 +-
 licenses.yaml                                         |  16 +-
 pom.xml                                               |  12 +-
 .../druid/server/initialization/JettyTest.java        |   2 +-
 website/.spelling                                     |   6 +
 website/i18n/en.json                                  |   3 +
 website/sidebars.json                                 |   1 +
 20 files changed, 2048 insertions(+), 9 deletions(-)

diff --git a/distribution/pom.xml b/distribution/pom.xml
index a47ead3..2b5c3d5 100644
--- a/distribution/pom.xml
+++ b/distribution/pom.xml
@@ -423,6 +423,8 @@
                         org.apache.druid.extensions.contrib:druid-moving-average-query
                         -c
                         org.apache.druid.extensions.contrib:druid-tdigestsketch
+                        -c
+                        org.apache.druid.extensions.contrib:gce-extensions

diff --git a/docs/configuration/index.md b/docs/configuration/index.md
index 982ff5e..e17a83a 100644
--- a/docs/configuration/index.md
+++ b/docs/configuration/index.md
@@ -891,7 +891,7 @@ There are additional configs for autoscaling (if it is enabled):

 |Property|Description|Default|
 ||---|---|
-|`druid.indexer.autoscale.strategy`|Choices are "noop" or "ec2". Sets the strategy to run when autoscaling is required.|noop|
+|`druid.indexer.autoscale.strategy`|Choices are "noop", "ec2" or "gce". Sets the strategy to run when autoscaling is required.|noop|
 |`druid.indexer.autoscale.doAutoscale`|If set to "true" autoscaling will be enabled.|false|
 |`druid.indexer.autoscale.provisionPeriod`|How often to check whether or not new MiddleManagers should be added.|PT1M|
 |`druid.indexer.autoscale.terminatePeriod`|How often to check when MiddleManagers should be removed.|PT5M|
@@ -1115,7 +1115,9 @@ field. If not provided, the default is to not use it at all.
# A
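Based on the documentation change in this commit, enabling the new strategy would look roughly like the following runtime-properties fragment. This is a minimal sketch: the autoscale keys appear in the diff above, but the `"gce-extensions"` loadList name is an assumption, and GCE-specific settings live in the gce-extensions docs added by this commit:

```properties
# Load the contrib extension (added to the distribution in this commit).
druid.extensions.loadList=["gce-extensions"]

# Enable autoscaling and select the new GCE strategy.
druid.indexer.autoscale.doAutoscale=true
druid.indexer.autoscale.strategy=gce
druid.indexer.autoscale.provisionPeriod=PT1M
druid.indexer.autoscale.terminatePeriod=PT5M
```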
[GitHub] [druid] stale[bot] commented on pull request #8848: Refactor DirectDruidClient, extract QueryContextInputStreamResponseHandler
stale[bot] commented on pull request #8848: URL: https://github.com/apache/druid/pull/8848#issuecomment-620458156 This pull request/issue has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.
[GitHub] [druid] jihoonson opened a new pull request #9783: Integration tests for stream ingestion with various data formats
jihoonson opened a new pull request #9783: URL: https://github.com/apache/druid/pull/9783

### Description

This PR adds new integration tests for stream ingestion against different data formats. To do this, a new interface `EventSerializer` was added, which converts events generated by `SyntheticStreamGenerator` into a particular data format. The new tests automatically pick up serializer, inputFormat, and inputRowParser specs from test resources and populate test parameters. Since these tests take about 30 mins when run in parallel with 2 threads, a new Travis job was added.

This PR has:
- [x] been self-reviewed.
   - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
- [ ] added documentation for new or modified features or behaviors.
- [x] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
- [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml)
- [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
- [ ] added unit tests or modified existing tests to cover new code paths.
- [x] added integration tests.
- [ ] been tested in a test Druid cluster.
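An abstraction like the `EventSerializer` described above might look roughly like the following. This is an illustrative sketch, not the actual interface from the PR; the method shape, the CSV implementation, and the sample event are all assumptions:

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Illustrative sketch: turns one generated event into bytes in a particular data format. */
interface EventSerializer
{
  byte[] serialize(Map<String, Object> event);
}

/** One hypothetical implementation: CSV with values in key-insertion order. */
class CsvEventSerializer implements EventSerializer
{
  @Override
  public byte[] serialize(Map<String, Object> event)
  {
    String row = event.values().stream()
                      .map(String::valueOf)
                      .collect(Collectors.joining(","));
    return row.getBytes(StandardCharsets.UTF_8);
  }
}

public class EventSerializerDemo
{
  public static void main(String[] args)
  {
    Map<String, Object> event = new LinkedHashMap<>();
    event.put("timestamp", "2020-04-28T00:00:00Z");
    event.put("page", "wiki");
    event.put("count", 3);
    System.out.println(new String(new CsvEventSerializer().serialize(event), StandardCharsets.UTF_8));
    // prints 2020-04-28T00:00:00Z,wiki,3
  }
}
```

With such an interface, each data format (CSV, JSON, Avro, ...) becomes one small implementation, and tests can be parameterized over the set of serializers.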
[GitHub] [druid] 2bethere commented on pull request #9766: Druid Quickstart refactor and update
2bethere commented on pull request #9766: URL: https://github.com/apache/druid/pull/9766#issuecomment-620434325 No comment, LGTM.
[GitHub] [druid] majidbijani opened a new issue #9782: The console will not function at the moment
majidbijani opened a new issue #9782: URL: https://github.com/apache/druid/issues/9782

I had trouble setting up the Druid cluster. It was set up step by step according to the Druid website (Clustered deployment). Deep storage is set to local, and the default Derby is used for metadata. Most settings are defaults, and common.runtime.properties is similar for the Master, Query, and Data servers.

common.runtime.properties:

```
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.
#

# Extensions specified in the load list will be loaded by Druid
# We are using local fs for deep storage - not recommended for production - use S3, HDFS, or NFS instead
# We are using local derby for the metadata store - not recommended for production - use MySQL or Postgres instead
# If you specify `druid.extensions.loadList=[]`, Druid won't load any extension from file system.
# If you don't specify `druid.extensions.loadList`, Druid will load all the extensions under root extension directory.
# More info: https://druid.apache.org/docs/latest/operations/including-extensions.html
druid.extensions.loadList=["druid-hdfs-storage", "druid-kafka-indexing-service", "druid-datasketches"]

# If you have a different version of Hadoop, place your Hadoop client jar files in your hadoop-dependencies directory
# and uncomment the line below to point to your directory.
#druid.extensions.hadoopDependenciesDir=/my/dir/hadoop-dependencies

#
# Hostname
#
druid.host=master

#
# Logging
#
# Log all runtime properties on startup. Disable to avoid logging properties on startup:
druid.startup.logging.logProperties=true

#
# Zookeeper
#
druid.zk.service.host=master
druid.zk.paths.base=/druid

#
# Metadata storage
#
# For Derby server on your Druid Coordinator (only viable in a cluster with a single Coordinator, no fail-over):
druid.metadata.storage.type=derby
druid.metadata.storage.connector.connectURI=jdbc:derby://master:1527/var/druid/metadata.db;create=true
druid.metadata.storage.connector.host=master
druid.metadata.storage.connector.port=1527

# For MySQL (make sure to include the MySQL JDBC driver on the classpath):
#druid.metadata.storage.type=mysql
#druid.metadata.storage.connector.connectURI=jdbc:mysql://db.example.com:3306/druid
#druid.metadata.storage.connector.user=...
#druid.metadata.storage.connector.password=...

# For PostgreSQL:
#druid.metadata.storage.type=postgresql
#druid.metadata.storage.connector.connectURI=jdbc:postgresql://db.example.com:5432/druid
#druid.metadata.storage.connector.user=...
#druid.metadata.storage.connector.password=...

#
# Deep storage
#
# For local disk (only viable in a cluster if this is a network mount):
druid.storage.type=local
druid.storage.storageDirectory=var/druid/segments

# For HDFS:
#druid.storage.type=hdfs
#druid.storage.storageDirectory=/druid/segments

# For S3:
#druid.storage.type=s3
#druid.storage.bucket=your-bucket
#druid.storage.baseKey=druid/segments
#druid.s3.accessKey=...
#druid.s3.secretKey=...

#
# Indexing service logs
#
# For local disk (only viable in a cluster if this is a network mount):
druid.indexer.logs.type=file
druid.indexer.logs.directory=var/druid/indexing-logs

# For HDFS:
#druid.indexer.logs.type=hdfs
#druid.indexer.logs.directory=/druid/indexing-logs

# For S3:
#druid.indexer.logs.type=s3
#druid.indexer.logs.s3Bucket=your-bucket
#druid.indexer.logs.s3Prefix=druid/indexing-logs

#
# Service discovery
#
druid.selectors.indexing.serviceName=druid/overlord
druid.selectors.coordinator.serviceName=druid/coordinator

#
# Monitoring
#
druid.monitoring.monitors=["org.apache.druid.java.util.metrics.JvmMonitor"]
druid.emitter=noop
druid.emitter.logging.logLevel=info

# Storage type of double columns
# ommiting this will lead to index double as float at the storage layer
druid.indexing.doubleStorage=double

#
# Secur
```
[GitHub] [druid] majidbijani opened a new issue #9781: Request failed with status code 500
majidbijani opened a new issue #9781: URL: https://github.com/apache/druid/issues/9781

I had trouble setting up the Druid cluster. It was set up step by step according to the Druid website (Clustered deployment). Deep storage is set to local, and the default Derby is used for metadata. Most settings are defaults, and common.runtime.properties is similar for the Master, Query, and Data servers.

common.runtime.properties:

```
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.
#

# Extensions specified in the load list will be loaded by Druid
# We are using local fs for deep storage - not recommended for production - use S3, HDFS, or NFS instead
# We are using local derby for the metadata store - not recommended for production - use MySQL or Postgres instead
# If you specify `druid.extensions.loadList=[]`, Druid won't load any extension from file system.
# If you don't specify `druid.extensions.loadList`, Druid will load all the extensions under root extension directory.
# More info: https://druid.apache.org/docs/latest/operations/including-extensions.html
druid.extensions.loadList=["druid-hdfs-storage", "druid-kafka-indexing-service", "druid-datasketches"]

# If you have a different version of Hadoop, place your Hadoop client jar files in your hadoop-dependencies directory
# and uncomment the line below to point to your directory.
#druid.extensions.hadoopDependenciesDir=/my/dir/hadoop-dependencies

#
# Hostname
#
druid.host=master

#
# Logging
#
# Log all runtime properties on startup. Disable to avoid logging properties on startup:
druid.startup.logging.logProperties=true

#
# Zookeeper
#
druid.zk.service.host=master
druid.zk.paths.base=/druid

#
# Metadata storage
#
# For Derby server on your Druid Coordinator (only viable in a cluster with a single Coordinator, no fail-over):
druid.metadata.storage.type=derby
druid.metadata.storage.connector.connectURI=jdbc:derby://master:1527/var/druid/metadata.db;create=true
druid.metadata.storage.connector.host=master
druid.metadata.storage.connector.port=1527

# For MySQL (make sure to include the MySQL JDBC driver on the classpath):
#druid.metadata.storage.type=mysql
#druid.metadata.storage.connector.connectURI=jdbc:mysql://db.example.com:3306/druid
#druid.metadata.storage.connector.user=...
#druid.metadata.storage.connector.password=...

# For PostgreSQL:
#druid.metadata.storage.type=postgresql
#druid.metadata.storage.connector.connectURI=jdbc:postgresql://db.example.com:5432/druid
#druid.metadata.storage.connector.user=...
#druid.metadata.storage.connector.password=...

#
# Deep storage
#
# For local disk (only viable in a cluster if this is a network mount):
druid.storage.type=local
druid.storage.storageDirectory=var/druid/segments

# For HDFS:
#druid.storage.type=hdfs
#druid.storage.storageDirectory=/druid/segments

# For S3:
#druid.storage.type=s3
#druid.storage.bucket=your-bucket
#druid.storage.baseKey=druid/segments
#druid.s3.accessKey=...
#druid.s3.secretKey=...

#
# Indexing service logs
#
# For local disk (only viable in a cluster if this is a network mount):
druid.indexer.logs.type=file
druid.indexer.logs.directory=var/druid/indexing-logs

# For HDFS:
#druid.indexer.logs.type=hdfs
#druid.indexer.logs.directory=/druid/indexing-logs

# For S3:
#druid.indexer.logs.type=s3
#druid.indexer.logs.s3Bucket=your-bucket
#druid.indexer.logs.s3Prefix=druid/indexing-logs

#
# Service discovery
#
druid.selectors.indexing.serviceName=druid/overlord
druid.selectors.coordinator.serviceName=druid/coordinator

#
# Monitoring
#
druid.monitoring.monitors=["org.apache.druid.java.util.metrics.JvmMonitor"]
druid.emitter=noop
druid.emitter.logging.logLevel=info

# Storage type of double columns
# ommiting this will lead to index double as float at the storage layer
druid.indexing.doubleStorage=double

#
# Security
```