[GitHub] [druid] clintropolis merged pull request #9203: [Backport] Web console: fix refresh button in segments view

2020-01-16 Thread GitBox
clintropolis merged pull request #9203: [Backport] Web console: fix refresh 
button in segments view
URL: https://github.com/apache/druid/pull/9203
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[druid] branch 0.17.0 updated (7c7fffc -> 6874194)

2020-01-16 Thread cwylie
This is an automated email from the ASF dual-hosted git repository.

cwylie pushed a change to branch 0.17.0
in repository https://gitbox.apache.org/repos/asf/druid.git.


from 7c7fffc  Update Kinesis resharding information about task failures 
(#9104) (#9201)
 add 6874194  fix refresh button (#9195) (#9203)

No new revisions were added by this update.

Summary of changes:
 .../src/views/segments-view/segments-view.tsx  | 29 +++---
 1 file changed, 14 insertions(+), 15 deletions(-)





[GitHub] [druid] clintropolis merged pull request #9201: [Backport] Update Kinesis resharding information about task failures (#9104)

2020-01-16 Thread GitBox
clintropolis merged pull request #9201: [Backport] Update Kinesis resharding 
information about task failures (#9104)
URL: https://github.com/apache/druid/pull/9201
 
 
   





[druid] branch 0.17.0 updated (e6246c9 -> 7c7fffc)

2020-01-16 Thread cwylie
This is an automated email from the ASF dual-hosted git repository.

cwylie pushed a change to branch 0.17.0
in repository https://gitbox.apache.org/repos/asf/druid.git.


from e6246c9  Fix deserialization of maxBytesInMemory (#9092) (#9170)
 add 7c7fffc  Update Kinesis resharding information about task failures 
(#9104) (#9201)

No new revisions were added by this update.

Summary of changes:
 docs/development/extensions-core/kinesis-ingestion.md | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)





[GitHub] [druid] jihoonson commented on a change in pull request #9171: Doc update for the new input source and the new input format

2020-01-16 Thread GitBox
jihoonson commented on a change in pull request #9171: Doc update for the new 
input source and the new input format
URL: https://github.com/apache/druid/pull/9171#discussion_r367775037
 
 

 ##
 File path: docs/development/extensions-core/hdfs.md
 ##
 @@ -94,7 +94,7 @@ For more configurations, see the [Hadoop AWS 
module](https://hadoop.apache.org/d
 
  Configuration for Google Cloud Storage
 
-To use the Google cloud Storage as the deep storage, you need to configure 
`druid.storage.storageDirectory` properly.
+To use the Google Cloud Storage as the deep storage, you need to configure 
`druid.storage.storageDirectory` properly.
 
 Review comment:
  Thanks, I made changes based on the suggestions. But I would still like to 
keep the example properties for GCS, since they are practically mandatory. A 
similar pattern is applied to the [S3 
configuration](https://github.com/apache/druid/pull/9171/files#diff-51abd0f049462a98772db4c6ea063be3R66-R93).





[GitHub] [druid] jihoonson commented on a change in pull request #9171: Doc update for the new input source and the new input format

2020-01-16 Thread GitBox
jihoonson commented on a change in pull request #9171: Doc update for the new 
input source and the new input format
URL: https://github.com/apache/druid/pull/9171#discussion_r367775037
 
 

 ##
 File path: docs/development/extensions-core/hdfs.md
 ##
 @@ -94,7 +94,7 @@ For more configurations, see the [Hadoop AWS 
module](https://hadoop.apache.org/d
 
  Configuration for Google Cloud Storage
 
-To use the Google cloud Storage as the deep storage, you need to configure 
`druid.storage.storageDirectory` properly.
+To use the Google Cloud Storage as the deep storage, you need to configure 
`druid.storage.storageDirectory` properly.
 
 Review comment:
  Thanks, I applied the suggestions. But I would still like to keep the example 
properties for GCS, since they are practically mandatory. A similar pattern is 
applied to the [S3 
configuration](https://github.com/apache/druid/pull/9171/files#diff-51abd0f049462a98772db4c6ea063be3R66-R93).





[GitHub] [druid] clintropolis merged pull request #9198: Web console: fix bug where arrays can not be emptied out in the coordinator dialog

2020-01-16 Thread GitBox
clintropolis merged pull request #9198: Web console: fix bug where arrays can 
not be emptied out in the coordinator dialog
URL: https://github.com/apache/druid/pull/9198
 
 
   





[GitHub] [druid] clintropolis opened a new pull request #9206: [Backport] Web console: fix bug where arrays can not be emptied out in the coordinator dialog

2020-01-16 Thread GitBox
clintropolis opened a new pull request #9206: [Backport] Web console: fix bug 
where arrays can not be emptied out in the coordinator dialog
URL: https://github.com/apache/druid/pull/9206
 
 
   Backport of #9198 to 0.17.0.





[druid] branch master updated: allow empty values to be set in the auto form (#9198)

2020-01-16 Thread cwylie
This is an automated email from the ASF dual-hosted git repository.

cwylie pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git


The following commit(s) were added to refs/heads/master by this push:
 new ab26725  allow empty values to be set in the auto form (#9198)
ab26725 is described below

commit ab2672514b306243b8b72d64e7419fd8e8a18fe4
Author: Vadim Ogievetsky 
AuthorDate: Thu Jan 16 21:06:51 2020 -0800

allow empty values to be set in the auto form (#9198)
---
 web-console/src/components/auto-form/auto-form.tsx| 15 +++
 .../coordinator-dynamic-config-dialog.tsx |  3 +++
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/web-console/src/components/auto-form/auto-form.tsx 
b/web-console/src/components/auto-form/auto-form.tsx
index 110bf49..66dffde 100644
--- a/web-console/src/components/auto-form/auto-form.tsx
+++ b/web-console/src/components/auto-form/auto-form.tsx
@@ -45,6 +45,7 @@ export interface Field {
 | 'json'
 | 'interval';
   defaultValue?: any;
+  emptyValue?: any;
   suggestions?: Functor;
   placeholder?: string;
   min?: number;
@@ -99,10 +100,16 @@ export class AutoForm<T> extends React.PureComponent
 const { model } = this.props;
 if (!model) return;
 
-    const newModel =
-      typeof newValue === 'undefined'
-        ? deepDelete(model, field.name)
-        : deepSet(model, field.name, newValue);
+    let newModel: T;
+    if (typeof newValue === 'undefined') {
+      if (typeof field.emptyValue === 'undefined') {
+        newModel = deepDelete(model, field.name);
+      } else {
+        newModel = deepSet(model, field.name, field.emptyValue);
+      }
+    } else {
+      newModel = deepSet(model, field.name, newValue);
+    }
 
 this.modelChange(newModel);
   };
diff --git 
a/web-console/src/dialogs/coordinator-dynamic-config-dialog/coordinator-dynamic-config-dialog.tsx
 
b/web-console/src/dialogs/coordinator-dynamic-config-dialog/coordinator-dynamic-config-dialog.tsx
index 8d82c0c..044e7ea 100644
--- 
a/web-console/src/dialogs/coordinator-dynamic-config-dialog/coordinator-dynamic-config-dialog.tsx
+++ 
b/web-console/src/dialogs/coordinator-dynamic-config-dialog/coordinator-dynamic-config-dialog.tsx
@@ -180,6 +180,7 @@ export class CoordinatorDynamicConfigDialog extends 
React.PureComponent<
 {
   name: 'killDataSourceWhitelist',
   type: 'string-array',
+  emptyValue: [],
   info: (
 <>
   List of dataSources for which kill tasks are sent if 
property{' '}
@@ -191,6 +192,7 @@ export class CoordinatorDynamicConfigDialog extends 
React.PureComponent<
 {
   name: 'killPendingSegmentsSkipList',
   type: 'string-array',
+  emptyValue: [],
   info: (
 <>
   List of dataSources for which pendingSegments are NOT 
cleaned up if property{' '}
@@ -259,6 +261,7 @@ export class CoordinatorDynamicConfigDialog extends 
React.PureComponent<
 {
   name: 'decommissioningNodes',
   type: 'string-array',
+  emptyValue: [],
   info: (
 <>
   List of historical services to 'decommission'. Coordinator 
will not assign new





[druid] branch master updated (448da78 -> 68ed2a2)

2020-01-16 Thread cwylie
This is an automated email from the ASF dual-hosted git repository.

cwylie pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git.


from 448da78  Speed up String first/last aggregators when folding isn't 
needed. (#9181)
 add 68ed2a2  Fix LATEST / EARLIEST Buffer Aggregator does not work on 
String column  (#9197)

No new revisions were added by this update.

Summary of changes:
 .../aggregation/first/StringFirstLastUtils.java|   2 +-
 .../first/StringFirstLastUtilsTest.java|  59 +
 .../apache/druid/sql/calcite/CalciteQueryTest.java | 147 -
 3 files changed, 202 insertions(+), 6 deletions(-)
 create mode 100644 
processing/src/test/java/org/apache/druid/query/aggregation/first/StringFirstLastUtilsTest.java





[GitHub] [druid] clintropolis merged pull request #9197: Fix LATEST / EARLIEST Buffer Aggregator does not work on String column

2020-01-16 Thread GitBox
clintropolis merged pull request #9197: Fix LATEST / EARLIEST Buffer Aggregator 
does not work on String column 
URL: https://github.com/apache/druid/pull/9197
 
 
   





[druid] branch master updated (486c0fd -> 448da78)

2020-01-16 Thread cwylie
This is an automated email from the ASF dual-hosted git repository.

cwylie pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git.


from 486c0fd  Bump Apache Parquet to 1.11.0 (#9129)
 add 448da78  Speed up String first/last aggregators when folding isn't 
needed. (#9181)

No new revisions were added by this update.

Summary of changes:
 .../apache/druid/java/util/common/StringUtils.java | 17 ++-
 .../druid/java/util/common/StringUtilsTest.java| 28 +++
 .../aggregation/first/StringFirstAggregator.java   | 44 +++---
 .../first/StringFirstAggregatorFactory.java| 13 --
 .../first/StringFirstBufferAggregator.java | 54 --
 .../aggregation/first/StringFirstLastUtils.java| 29 +++-
 .../aggregation/last/StringLastAggregator.java | 44 +++---
 .../last/StringLastAggregatorFactory.java  | 14 --
 .../last/StringLastBufferAggregator.java   | 54 --
 .../first/StringFirstAggregationTest.java  |  8 +++-
 .../first/StringFirstBufferAggregatorTest.java | 46 --
 .../last/StringLastAggregationTest.java|  5 ++
 .../last/StringLastBufferAggregatorTest.java   | 50 ++--
 13 files changed, 321 insertions(+), 85 deletions(-)





[GitHub] [druid] clintropolis merged pull request #9181: Speed up String first/last aggregators when folding isn't needed.

2020-01-16 Thread GitBox
clintropolis merged pull request #9181: Speed up String first/last aggregators 
when folding isn't needed.
URL: https://github.com/apache/druid/pull/9181
 
 
   





[GitHub] [druid] jon-wei commented on issue #9199: Fix TSV bugs

2020-01-16 Thread GitBox
jon-wei commented on issue #9199: Fix TSV bugs
URL: https://github.com/apache/druid/pull/9199#issuecomment-575449429
 
 
   @jihoonson thanks, latest update lgtm





[GitHub] [druid] jon-wei commented on a change in pull request #9183: fix topn aggregation on numeric columns with null values

2020-01-16 Thread GitBox
jon-wei commented on a change in pull request #9183: fix topn aggregation on 
numeric columns with null values
URL: https://github.com/apache/druid/pull/9183#discussion_r367748951
 
 

 ##
 File path: 
processing/src/main/java/org/apache/druid/query/topn/types/NullableNumericTopNColumnAggregatesProcessor.java
 ##
 @@ -0,0 +1,137 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.topn.types;
+
+import org.apache.druid.common.config.NullHandling;
+import org.apache.druid.query.aggregation.Aggregator;
+import org.apache.druid.query.topn.BaseTopNAlgorithm;
+import org.apache.druid.query.topn.TopNParams;
+import org.apache.druid.query.topn.TopNQuery;
+import org.apache.druid.query.topn.TopNResultBuilder;
+import org.apache.druid.segment.BaseNullableColumnValueSelector;
+import org.apache.druid.segment.Cursor;
+import org.apache.druid.segment.StorageAdapter;
+
+import java.util.Map;
+import java.util.function.Function;
+
+public abstract class NullableNumericTopNColumnAggregatesProcessor<Selector extends BaseNullableColumnValueSelector>
+    implements TopNColumnAggregatesProcessor<Selector>
+{
+  private final boolean hasNulls = !NullHandling.replaceWithDefault();
+  final Function<Object, Comparable<?>> converter;
+  Aggregator[] nullValueAggregates;
+
+  protected NullableNumericTopNColumnAggregatesProcessor(Function<Object, Comparable<?>> converter)
+  {
+    this.converter = converter;
+  }
+
+  abstract Aggregator[] getValueAggregators(TopNQuery query, Selector selector, Cursor cursor);
 
 Review comment:
   Can you add javadocs for the abstract methods?





[GitHub] [druid] jon-wei commented on a change in pull request #9183: fix topn aggregation on numeric columns with null values

2020-01-16 Thread GitBox
jon-wei commented on a change in pull request #9183: fix topn aggregation on 
numeric columns with null values
URL: https://github.com/apache/druid/pull/9183#discussion_r367748677
 
 

 ##
 File path: 
processing/src/main/java/org/apache/druid/query/topn/types/NullableNumericTopNColumnAggregatesProcessor.java
 ##
 @@ -0,0 +1,137 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.topn.types;
+
+import org.apache.druid.common.config.NullHandling;
+import org.apache.druid.query.aggregation.Aggregator;
+import org.apache.druid.query.topn.BaseTopNAlgorithm;
+import org.apache.druid.query.topn.TopNParams;
+import org.apache.druid.query.topn.TopNQuery;
+import org.apache.druid.query.topn.TopNResultBuilder;
+import org.apache.druid.segment.BaseNullableColumnValueSelector;
+import org.apache.druid.segment.Cursor;
+import org.apache.druid.segment.StorageAdapter;
+
+import java.util.Map;
+import java.util.function.Function;
+
+public abstract class NullableNumericTopNColumnAggregatesProcessor<Selector extends BaseNullableColumnValueSelector>
+    implements TopNColumnAggregatesProcessor<Selector>
+{
+  private final boolean hasNulls = !NullHandling.replaceWithDefault();
+  final Function<Object, Comparable<?>> converter;
+  Aggregator[] nullValueAggregates;
+
+  protected NullableNumericTopNColumnAggregatesProcessor(Function<Object, Comparable<?>> converter)
+  {
+    this.converter = converter;
+  }
+
+  abstract Aggregator[] getValueAggregators(TopNQuery query, Selector selector, Cursor cursor);
+
+  abstract Map<?, Aggregator[]> getAggregatesStore();
+
+  abstract Comparable<?> convertAggregatorStoreKeyToColumnValue(Object aggregatorStoreKey);
+
+  @Override
+  public int getCardinality(Selector selector)
+  {
+    return TopNParams.CARDINALITY_UNKNOWN;
+  }
+
+  @Override
+  public Aggregator[][] getRowSelector(TopNQuery query, TopNParams params, StorageAdapter storageAdapter)
+  {
+    return null;
+  }
+
+  @Override
+  public long scanAndAggregate(
+      TopNQuery query,
+      Selector selector,
+      Cursor cursor,
+      Aggregator[][] rowSelector
+  )
+  {
+    initAggregateStore();
 
 Review comment:
   I think the `initAggregateStore` call could be moved into 
`HeapBasedTopNAlgorithm.scanAndAggregate` since both impls call it as the first 
step
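
A minimal sketch of the suggested hoist (the interface and class below are 
simplified placeholders, not Druid's actual `TopNColumnAggregatesProcessor` or 
`HeapBasedTopNAlgorithm` signatures):

```java
// Sketch: move initialization into the single caller so each processor
// implementation no longer has to call initAggregateStore() itself.
interface AggregatesProcessor
{
  void initAggregateStore();

  long scanAndAggregate();
}

class HeapBasedAlgorithmSketch
{
  long scanAndAggregate(AggregatesProcessor processor)
  {
    // Done once here, instead of as the first step of every implementation.
    processor.initAggregateStore();
    return processor.scanAndAggregate();
  }
}
```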





[GitHub] [druid] lgtm-com[bot] commented on issue #9181: Speed up String first/last aggregators when folding isn't needed.

2020-01-16 Thread GitBox
lgtm-com[bot] commented on issue #9181: Speed up String first/last aggregators 
when folding isn't needed.
URL: https://github.com/apache/druid/pull/9181#issuecomment-575439630
 
 
   This pull request **fixes 1 alert** when merging 
de0697cb1834f77a2fafc57e5d56673a558c5e83 into 
486c0fd149d9837a64550ecb9e85d9b6cd4beb24 - [view on 
LGTM.com](https://lgtm.com/projects/g/apache/druid/rev/pr-cbc76f7454ddad92381f9db32c521dcbd504afb8)
   
   **fixed alerts:**
   
   * 1 for Useless null check





[GitHub] [druid] jihoonson commented on issue #9199: Fix TSV bugs

2020-01-16 Thread GitBox
jihoonson commented on issue #9199: Fix TSV bugs
URL: https://github.com/apache/druid/pull/9199#issuecomment-575437622
 
 
  @jon-wei @clintropolis thanks for the review. I needed to delete one test 
and modify another, both added in https://github.com/apache/druid/pull/8915, 
because the delimited input format doesn't support that functionality 
(recognizing quotes).
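
A small illustration of the difference (using opencsv's `RFC4180Parser`, which 
the new `CSVParser` quoted in the review excerpts below builds on; the plain 
`split` stands in for quote-unaware delimited parsing):

```java
import com.opencsv.RFC4180Parser;
import com.opencsv.RFC4180ParserBuilder;

import java.io.IOException;
import java.util.Arrays;

class QuoteHandlingDemo
{
  public static void main(String[] args) throws IOException
  {
    String line = "a,\"b,c\",d";

    // RFC 4180 parsing keeps the quoted field intact: [a, b,c, d]
    RFC4180Parser csv = new RFC4180ParserBuilder().withSeparator(',').build();
    System.out.println(Arrays.toString(csv.parseLine(line)));

    // A plain split on the delimiter ignores quoting: [a, "b, c", d]
    System.out.println(Arrays.toString(line.split(",")));
  }
}
```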





[GitHub] [druid] suneet-s opened a new pull request #9205: [0.17.0] Tutorials use new ingestion spec where possible (#9155)

2020-01-16 Thread GitBox
suneet-s opened a new pull request #9205: [0.17.0] Tutorials use new ingestion 
spec where possible (#9155)
URL: https://github.com/apache/druid/pull/9205
 
 
   Backports the following commits to 0.17.0:
- Tutorials use new ingestion spec where possible (#9155)





[GitHub] [druid] suneet-s opened a new pull request #9204: [0.17.0] Link javaOpts to middlemanager runtime.properties docs (#9101)

2020-01-16 Thread GitBox
suneet-s opened a new pull request #9204: [0.17.0] Link javaOpts to 
middlemanager runtime.properties docs (#9101)
URL: https://github.com/apache/druid/pull/9204
 
 
   Backports the following commits to 0.17.0:
- Link javaOpts to middlemanager runtime.properties docs (#9101)





[GitHub] [druid] clintropolis commented on a change in pull request #9181: Speed up String first/last aggregators when folding isn't needed.

2020-01-16 Thread GitBox
clintropolis commented on a change in pull request #9181: Speed up String 
first/last aggregators when folding isn't needed.
URL: https://github.com/apache/druid/pull/9181#discussion_r367736165
 
 

 ##
 File path: 
core/src/test/java/org/apache/druid/java/util/common/StringUtilsTest.java
 ##
 @@ -246,4 +246,32 @@ public void testRpad()
 Assert.assertEquals(s5, null);
   }
 
+  @Test
+  public void testChop()
+  {
+Assert.assertEquals("foo", StringUtils.chop("foo", 5));
+Assert.assertEquals("fo", StringUtils.chop("foo", 2));
+Assert.assertEquals("", StringUtils.chop("foo", 0));
+Assert.assertEquals("smile  for", StringUtils.chop("smile  for the 
camera", 14));
 
 Review comment:
   





[GitHub] [druid] clintropolis opened a new pull request #9203: [Backport] Web console: fix refresh button in segments view

2020-01-16 Thread GitBox
clintropolis opened a new pull request #9203: [Backport] Web console: fix 
refresh button in segments view
URL: https://github.com/apache/druid/pull/9203
 
 
   Backport of #9195 to 0.17.0.





[GitHub] [druid] clintropolis opened a new pull request #9202: [Backport] fix null handling for arithmetic post aggregator comparator

2020-01-16 Thread GitBox
clintropolis opened a new pull request #9202: [Backport] fix null handling for 
arithmetic post aggregator comparator
URL: https://github.com/apache/druid/pull/9202
 
 
   Backport of #9159 to 0.17.0.





[GitHub] [druid] gianm commented on issue #9181: Speed up String first/last aggregators when folding isn't needed.

2020-01-16 Thread GitBox
gianm commented on issue #9181: Speed up String first/last aggregators when 
folding isn't needed.
URL: https://github.com/apache/druid/pull/9181#issuecomment-575427555
 
 
   > There's a TC error about an unresolved reference to the chop method
   
   That was from a javadoc for `fastLooseChop`. It looks like `chop` was moved 
to StringUtils, so I moved `fastLooseChop` to the same place. And added unit 
tests for good measure.
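
A minimal sketch of byte-limited truncation that stops on a character boundary, 
consistent with the `testChop` expectations quoted above (illustrative only, 
not necessarily the exact `StringUtils.chop` implementation):

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

class ChopSketch
{
  // Truncate to at most maxBytes of UTF-8. The encoder reports overflow on a
  // character boundary, so no partial multi-byte character is ever emitted.
  static String chop(String s, int maxBytes)
  {
    CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
    ByteBuffer out = ByteBuffer.allocate(maxBytes);
    encoder.encode(CharBuffer.wrap(s), out, true);
    return new String(out.array(), 0, out.position(), StandardCharsets.UTF_8);
  }

  public static void main(String[] args)
  {
    System.out.println(chop("foo", 5)); // "foo" -- already within the budget
    System.out.println(chop("foo", 2)); // "fo"
  }
}
```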





[GitHub] [druid] jon-wei opened a new pull request #9201: [Backport] Update Kinesis resharding information about task failures (#9104)

2020-01-16 Thread GitBox
jon-wei opened a new pull request #9201: [Backport] Update Kinesis resharding 
information about task failures (#9104)
URL: https://github.com/apache/druid/pull/9201
 
 
   Backport #9104 to 0.17.0





[GitHub] [druid] vogievetsky commented on issue #9190: Docs: move search to the left

2020-01-16 Thread GitBox
vogievetsky commented on issue #9190: Docs: move search to the left
URL: https://github.com/apache/druid/pull/9190#issuecomment-575421536
 
 
   @fjy the [docusaurus](https://docusaurus.io/docs/en/search) template forces 
you to have a search in the header. Putting it in the ToC would be a lot more 
work. Do you think this position is better than before?





[GitHub] [druid] jon-wei commented on issue #9181: Speed up String first/last aggregators when folding isn't needed.

2020-01-16 Thread GitBox
jon-wei commented on issue #9181: Speed up String first/last aggregators when 
folding isn't needed.
URL: https://github.com/apache/druid/pull/9181#issuecomment-575418815
 
 
   There's a TC error about an resolved reference to the chop method





[GitHub] [druid] jon-wei edited a comment on issue #9181: Speed up String first/last aggregators when folding isn't needed.

2020-01-16 Thread GitBox
jon-wei edited a comment on issue #9181: Speed up String first/last aggregators 
when folding isn't needed.
URL: https://github.com/apache/druid/pull/9181#issuecomment-575418815
 
 
   There's a TC error about an unresolved reference to the chop method





[GitHub] [druid] jon-wei commented on a change in pull request #9171: Doc update for the new input source and the new input format

2020-01-16 Thread GitBox
jon-wei commented on a change in pull request #9171: Doc update for the new 
input source and the new input format
URL: https://github.com/apache/druid/pull/9171#discussion_r367716973
 
 

 ##
 File path: docs/ingestion/hadoop.md
 ##
 @@ -149,11 +149,12 @@ For example, using the static input paths:
 ```
 
 You can also read from cloud storage such as AWS S3 or Google Cloud Storage.
-To do so, you need to install the necessary library under 
`${DRUID_HOME}/hadoop-dependencies` in _all MiddleManager or Indexer processes_.
+To do so, you need to install the necessary library under Druid's classpath in 
_all MiddleManager or Indexer processes_.
 For S3, you can run the below command to install the [Hadoop AWS 
module](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html).
 
 ```bash
 java -classpath "${DRUID_HOME}lib/*" org.apache.druid.cli.Main tools pull-deps 
-h "org.apache.hadoop:hadoop-aws:${HADOOP_VERSION}";
+cp 
${DRUID_HOME}/hadoop-dependencies/hadoop-aws/${HADOOP_VERSION}/hadoop-aws-${HADOOP_VERSION}.jar
 ${DRUID_HOME}/extensions/druid-hdfs-storage/
 
 Review comment:
   This should go before the java command






[GitHub] [druid] jon-wei commented on a change in pull request #9171: Doc update for the new input source and the new input format

2020-01-16 Thread GitBox
jon-wei commented on a change in pull request #9171: Doc update for the new 
input source and the new input format
URL: https://github.com/apache/druid/pull/9171#discussion_r367720309
 
 

 ##
 File path: docs/development/extensions-core/hdfs.md
 ##
 @@ -94,7 +94,7 @@ For more configurations, see the [Hadoop AWS 
module](https://hadoop.apache.org/d
 
  Configuration for Google Cloud Storage
 
-To use the Google cloud Storage as the deep storage, you need to configure 
`druid.storage.storageDirectory` properly.
+To use the Google Cloud Storage as the deep storage, you need to configure 
`druid.storage.storageDirectory` properly.
 
 Review comment:
   For the installation section below, I think we could point to 
https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcs/INSTALL.md
 and say the following, and remove the parts where we duplicate their setup 
instructions:
   
   > Please follow the instructions at 
https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcs/INSTALL.md
 for configuring your `core-site.xml` with the filesystem and authentication 
properties needed for GCS."
   
   We can also add the following (it took me a while to find a download link 
for the connector): 
   
   > The GCS connector library is available at 
https://cloud.google.com/dataproc/docs/concepts/connectors/cloud-storage#other_sparkhadoop_clusters
   
   
   The line below:
   "Tested with Druid 0.9.0, Hadoop 2.7.2 and gcs-connector jar 1.4.4-hadoop2."
   
   can be updated to
   
   "Tested with Druid 0.17.0, Hadoop 2.8.5 and gcs-connector jar 2.0.0-hadoop2.
   
   





[GitHub] [druid] gianm commented on a change in pull request #9200: Optimize JoinCondition matching

2020-01-16 Thread GitBox
gianm commented on a change in pull request #9200: Optimize JoinCondition 
matching
URL: https://github.com/apache/druid/pull/9200#discussion_r367719384
 
 

 ##
 File path: 
processing/src/main/java/org/apache/druid/segment/join/JoinConditionAnalysis.java
 ##
 @@ -133,26 +142,23 @@ public String getOriginalExpression()
*/
   public boolean isAlwaysFalse()
   {
-    return nonEquiConditions.stream()
-                            .anyMatch(expr -> expr.isLiteral() && !expr.eval(ExprUtils.nilBindings()).asBoolean());
+    return anyFalseLiteralNonEquiConditions;
   }
 
   /**
    * Return whether this condition is a constant that is always true.
    */
   public boolean isAlwaysTrue()
   {
-    return equiConditions.isEmpty() &&
-           nonEquiConditions.stream()
-                            .allMatch(expr -> expr.isLiteral() && expr.eval(ExprUtils.nilBindings()).asBoolean());
+    return equiConditions.isEmpty() && allTrueLiteralNonEquiConditions;
 
 Review comment:
   It seems like `allTrueLiteralNonEquiConditions` is only used here; how about 
caching `isAlwaysTrue` directly?





[GitHub] [druid] gianm commented on a change in pull request #9200: Optimize JoinCondition matching

2020-01-16 Thread GitBox
gianm commented on a change in pull request #9200: Optimize JoinCondition 
matching
URL: https://github.com/apache/druid/pull/9200#discussion_r367719499
 
 

 ##
 File path: 
processing/src/main/java/org/apache/druid/segment/join/JoinConditionAnalysis.java
 ##
 @@ -133,26 +142,23 @@ public String getOriginalExpression()
*/
   public boolean isAlwaysFalse()
   {
-    return nonEquiConditions.stream()
-                            .anyMatch(expr -> expr.isLiteral() && !expr.eval(ExprUtils.nilBindings()).asBoolean());
+    return anyFalseLiteralNonEquiConditions;
 
 Review comment:
   Why not call this `isAlwaysFalse`? (It looks like it isn't used anywhere 
else, and it seems to me to be easier to understand the meaning of the field if 
it's named after what we want it to mean.)
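
A simplified sketch of the suggested caching, with the fields named after the 
questions they answer (Druid's real `Expr` evaluation is replaced by predicates 
here for illustration):

```java
import java.util.List;
import java.util.function.Predicate;

class ConditionAnalysisSketch<Expr>
{
  private final boolean isAlwaysFalse;
  private final boolean isAlwaysTrue;

  ConditionAnalysisSketch(
      List<Expr> equiConditions,
      List<Expr> nonEquiConditions,
      Predicate<Expr> isLiteral,
      Predicate<Expr> evaluatesTrue
  )
  {
    // Both answers are constant for a given condition, so compute them once.
    this.isAlwaysFalse = nonEquiConditions.stream()
        .anyMatch(expr -> isLiteral.test(expr) && !evaluatesTrue.test(expr));
    this.isAlwaysTrue = equiConditions.isEmpty()
        && nonEquiConditions.stream()
            .allMatch(expr -> isLiteral.test(expr) && evaluatesTrue.test(expr));
  }

  public boolean isAlwaysFalse()
  {
    return isAlwaysFalse;
  }

  public boolean isAlwaysTrue()
  {
    return isAlwaysTrue;
  }
}
```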





[GitHub] [druid] clintropolis merged pull request #9129: Bump Apache Parquet to 1.11.0

2020-01-16 Thread GitBox
clintropolis merged pull request #9129: Bump Apache Parquet to 1.11.0
URL: https://github.com/apache/druid/pull/9129
 
 
   





[druid] branch master updated (bd49ec0 -> 486c0fd)

2020-01-16 Thread cwylie
This is an automated email from the ASF dual-hosted git repository.

cwylie pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git.


from bd49ec0  Move result-to-array logic from SQL layer into 
QueryToolChests. (#9130)
 add 486c0fd  Bump Apache Parquet to 1.11.0 (#9129)

No new revisions were added by this update.

Summary of changes:
 extensions-core/parquet-extensions/pom.xml | 2 +-
 licenses.yaml  | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)





[GitHub] [druid] jihoonson commented on a change in pull request #9199: Fix TSV bugs

2020-01-16 Thread GitBox
jihoonson commented on a change in pull request #9199: Fix TSV bugs
URL: https://github.com/apache/druid/pull/9199#discussion_r367709959
 
 

 ##
 File path: core/src/main/java/org/apache/druid/data/input/impl/CSVParser.java
 ##
 @@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.data.input.impl;
+
+import com.opencsv.RFC4180Parser;
+import com.opencsv.RFC4180ParserBuilder;
+import com.opencsv.enums.CSVReaderNullFieldIndicator;
+import org.apache.druid.common.config.NullHandling;
+import 
org.apache.druid.data.input.impl.DelimitedValueReader.DelimitedValueParser;
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.List;
+
+public class CSVParser implements DelimitedValueParser
+{
+  private static final char SEPERATOR = ',';
 
 Review comment:
   Thanks, fixed.





[GitHub] [druid] jihoonson commented on a change in pull request #9199: Fix TSV bugs

2020-01-16 Thread GitBox
jihoonson commented on a change in pull request #9199: Fix TSV bugs
URL: https://github.com/apache/druid/pull/9199#discussion_r367710195
 
 

 ##
 File path: 
core/src/main/java/org/apache/druid/data/input/impl/FlatTextInputFormat.java
 ##
 @@ -0,0 +1,140 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.data.input.impl;
+
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.google.common.base.Preconditions;
+import com.google.common.collect.ImmutableList;
+import org.apache.druid.data.input.InputFormat;
+import org.apache.druid.indexer.Checks;
+import org.apache.druid.indexer.Property;
+
+import javax.annotation.Nullable;
+import java.util.Collections;
+import java.util.List;
+import java.util.Objects;
+
+public abstract class FlatTextInputFormat implements InputFormat
+{
+  private final List<String> columns;
+  private final String listDelimiter;
+  private final String delimiter;
+  private final boolean findColumnsFromHeader;
+  private final int skipHeaderRows;
+
+  FlatTextInputFormat(
+  @Nullable List<String> columns,
+  @Nullable String listDelimiter,
+  String delimiter,
+  @Nullable Boolean hasHeaderRow,
+  @Nullable Boolean findColumnsFromHeader,
+  int skipHeaderRows
+  )
+  {
+this.columns = columns == null ? Collections.emptyList() : columns;
+this.listDelimiter = listDelimiter;
+this.delimiter = Preconditions.checkNotNull(delimiter, "delimiter");
+//noinspection ConstantConditions
+if (columns == null || columns.isEmpty()) {
+  this.findColumnsFromHeader = Checks.checkOneNotNullOrEmpty(
+  ImmutableList.of(
+  new Property<>("hasHeaderRow", hasHeaderRow),
+  new Property<>("findColumnsFromHeader", findColumnsFromHeader)
+  )
+  ).getValue();
+} else {
+  this.findColumnsFromHeader = false;
+}
+this.skipHeaderRows = skipHeaderRows;
+Preconditions.checkArgument(
+!delimiter.equals(listDelimiter),
+"Cannot have same delimiter and list delimiter of [%s]",
+delimiter
+);
+if (!this.columns.isEmpty()) {
+  for (String column : this.columns) {
+Preconditions.checkArgument(
 
 Review comment:
  Hmm, I'm not sure why we do this check... I guess it wouldn't harm anything 
if the column name contains the delimiter. Maybe we can remove this check later.





[GitHub] [druid] jon-wei commented on a change in pull request #9199: Fix TSV bugs

2020-01-16 Thread GitBox
jon-wei commented on a change in pull request #9199: Fix TSV bugs
URL: https://github.com/apache/druid/pull/9199#discussion_r367707521
 
 

 ##
 File path: 
core/src/main/java/org/apache/druid/data/input/impl/FlatTextInputFormat.java
 ##
 @@ -0,0 +1,140 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.data.input.impl;
+
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.google.common.base.Preconditions;
+import com.google.common.collect.ImmutableList;
+import org.apache.druid.data.input.InputFormat;
+import org.apache.druid.indexer.Checks;
+import org.apache.druid.indexer.Property;
+
+import javax.annotation.Nullable;
+import java.util.Collections;
+import java.util.List;
+import java.util.Objects;
+
+public abstract class FlatTextInputFormat implements InputFormat
+{
+  private final List<String> columns;
+  private final String listDelimiter;
+  private final String delimiter;
+  private final boolean findColumnsFromHeader;
+  private final int skipHeaderRows;
+
+  FlatTextInputFormat(
+  @Nullable List<String> columns,
+  @Nullable String listDelimiter,
+  String delimiter,
+  @Nullable Boolean hasHeaderRow,
+  @Nullable Boolean findColumnsFromHeader,
+  int skipHeaderRows
+  )
+  {
+this.columns = columns == null ? Collections.emptyList() : columns;
+this.listDelimiter = listDelimiter;
+this.delimiter = Preconditions.checkNotNull(delimiter, "delimiter");
+//noinspection ConstantConditions
+if (columns == null || columns.isEmpty()) {
+  this.findColumnsFromHeader = Checks.checkOneNotNullOrEmpty(
+  ImmutableList.of(
+  new Property<>("hasHeaderRow", hasHeaderRow),
+  new Property<>("findColumnsFromHeader", findColumnsFromHeader)
+  )
+  ).getValue();
+} else {
+  this.findColumnsFromHeader = false;
+}
+this.skipHeaderRows = skipHeaderRows;
+Preconditions.checkArgument(
+!delimiter.equals(listDelimiter),
+"Cannot have same delimiter and list delimiter of [%s]",
+delimiter
+);
+if (!this.columns.isEmpty()) {
+  for (String column : this.columns) {
+Preconditions.checkArgument(
 
 Review comment:
   Does this need to check for `listDelimiter` in the column names as well?
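
A sketch of what the extended validation could look like, mirroring the 
`Preconditions` style of the constructor above (illustrative only, not the PR's 
final code):

```java
import com.google.common.base.Preconditions;

import javax.annotation.Nullable;
import java.util.List;

class ColumnNameChecks
{
  // Reject column names containing the field delimiter or, when one is
  // configured, the list delimiter.
  static void checkColumns(List<String> columns, String delimiter, @Nullable String listDelimiter)
  {
    for (String column : columns) {
      Preconditions.checkArgument(
          !column.contains(delimiter),
          "Column[%s] cannot contain the delimiter[%s]",
          column,
          delimiter
      );
      if (listDelimiter != null) {
        Preconditions.checkArgument(
            !column.contains(listDelimiter),
            "Column[%s] cannot contain the list delimiter[%s]",
            column,
            listDelimiter
        );
      }
    }
  }
}
```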





[GitHub] [druid] jon-wei commented on a change in pull request #9199: Fix TSV bugs

2020-01-16 Thread GitBox
jon-wei commented on a change in pull request #9199: Fix TSV bugs
URL: https://github.com/apache/druid/pull/9199#discussion_r367704911
 
 

 ##
 File path: core/src/main/java/org/apache/druid/data/input/impl/CSVParser.java
 ##
 @@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.data.input.impl;
+
+import com.opencsv.RFC4180Parser;
+import com.opencsv.RFC4180ParserBuilder;
+import com.opencsv.enums.CSVReaderNullFieldIndicator;
+import org.apache.druid.common.config.NullHandling;
+import 
org.apache.druid.data.input.impl.DelimitedValueReader.DelimitedValueParser;
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.List;
+
+public class CSVParser implements DelimitedValueParser
+{
+  private static final char SEPERATOR = ',';
 
 Review comment:
   SEPERATOR -> SEPARATOR





[GitHub] [druid] jihoonson commented on a change in pull request #9171: Doc update for the new input source and the new input format

2020-01-16 Thread GitBox
jihoonson commented on a change in pull request #9171: Doc update for the new 
input source and the new input format
URL: https://github.com/apache/druid/pull/9171#discussion_r367707446
 
 

 ##
 File path: docs/development/extensions-core/hdfs.md
 ##
 @@ -36,49 +36,110 @@ To use this Apache Druid extension, make sure to 
[include](../../development/ext
 |`druid.hadoop.security.kerberos.principal`|`dr...@example.com`| Principal 
user name |empty|
 
|`druid.hadoop.security.kerberos.keytab`|`/etc/security/keytabs/druid.headlessUser.keytab`|Path
 to keytab file|empty|
 
-If you are using the Hadoop indexer, set your output directory to be a 
location on Hadoop and it will work.
+Besides the above settings, you also need to include all Hadoop configuration 
files (such as `core-site.xml`, `hdfs-site.xml`)
+in the Druid classpath. One way to do this is copying all those files under 
`${DRUID_HOME}/conf/_common`.
+
+If you are using the Hadoop ingestion, set your output directory to be a 
location on Hadoop and it will work.
 If you want to eagerly authenticate against a secured hadoop/hdfs cluster you 
must set `druid.hadoop.security.kerberos.principal` and 
`druid.hadoop.security.kerberos.keytab`, this is an alternative to the cron job 
method that runs `kinit` command periodically.
 
-### Configuration for Google Cloud Storage
+### Configuration for Cloud Storage
+
+You can also use the AWS S3 or the Google Cloud Storage as the deep storage 
via HDFS.
+
+ Configuration for AWS S3
 
-The HDFS extension can also be used for GCS as deep storage.
+To use the AWS S3 as the deep storage, you need to configure 
`druid.storage.storageDirectory` properly.
 
 |Property|Possible Values|Description|Default|
 ||---|---|---|
-|`druid.storage.type`|hdfs||Must be set.|
-|`druid.storage.storageDirectory`||gs://bucket/example/directory|Must be set.|
+|`druid.storage.type`|hdfs| |Must be set.|
+|`druid.storage.storageDirectory`|s3a://bucket/example/directory or 
s3n://bucket/example/directory|Path to the deep storage|Must be set.|
 
-All services that need to access GCS need to have the [GCS connector 
jar](https://cloud.google.com/hadoop/google-cloud-storage-connector#manualinstallation)
 in their class path. One option is to place this jar in /lib/ and 
/extensions/druid-hdfs-storage/
+You also need to include the [Hadoop AWS 
module](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html),
 especially the `hadoop-aws.jar` in the Druid classpath.
+Run the below command to install the `hadoop-aws.jar` file under 
`${DRUID_HOME}/extensions/druid-hdfs-storage` in all nodes.
 
-Tested with Druid 0.9.0, Hadoop 2.7.2 and gcs-connector jar 1.4.4-hadoop2.
-
-
+```bash
+java -classpath "${DRUID_HOME}lib/*" org.apache.druid.cli.Main tools pull-deps 
-h "org.apache.hadoop:hadoop-aws:${HADOOP_VERSION}";
+cp 
${DRUID_HOME}/hadoop-dependencies/hadoop-aws/${HADOOP_VERSION}/hadoop-aws-${HADOOP_VERSION}.jar
 ${DRUID_HOME}/extensions/druid-hdfs-storage/
+```
 
-## Native batch ingestion
+Finally, you need to add the below properties in the `core-site.xml`.
+For more configurations, see the [Hadoop AWS 
module](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html).
+
+```xml
+
+  fs.s3a.impl
+  org.apache.hadoop.fs.s3a.S3AFileSystem
+  The implementation class of the S3A Filesystem
+
+
+
+  fs.AbstractFileSystem.s3a.impl
+  org.apache.hadoop.fs.s3a.S3A
+  The implementation class of the S3A 
AbstractFileSystem.
+
+
+
+  fs.s3a.access.key
+  AWS access key ID. Omit for IAM role-based or provider-based 
authentication.
+  your access key
+
+
+
+  fs.s3a.secret.key
+  AWS secret key. Omit for IAM role-based or provider-based 
authentication.
+  your secret key
+
+```
 
-This firehose ingests events from a predefined list of files from a Hadoop 
filesystem.
-This firehose is _splittable_ and can be used by [native parallel index 
tasks](../../ingestion/native-batch.md#parallel-task).
-Since each split represents an HDFS file, each worker task of `index_parallel` 
will read an object.
+ Configuration for Google Cloud Storage
 
-Sample spec:
+To use the Google cloud Storage as the deep storage, you need to configure 
`druid.storage.storageDirectory` properly.
 
 Review comment:
   Thanks, fixed.
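
   For concreteness, a minimal sketch of the two runtime properties from the
   table above (the bucket name and path are placeholders, not values from
   this PR):

```properties
# Select the HDFS deep-storage implementation; the gs:// scheme routes
# segment writes through the GCS connector on the classpath.
druid.storage.type=hdfs
druid.storage.storageDirectory=gs://my-bucket/druid/segments
```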





[GitHub] [druid] jihoonson commented on a change in pull request #9171: Doc update for the new input source and the new input format

2020-01-16 Thread GitBox
jihoonson commented on a change in pull request #9171: Doc update for the new 
input source and the new input format
URL: https://github.com/apache/druid/pull/9171#discussion_r367707404
 
 

 ##
 File path: docs/development/modules.md
 ##
 @@ -148,29 +150,43 @@ To start a segment killing task, you need to access the 
old Coordinator console
 
 After the killing task ends, `index.zip` (`partitionNum_index.zip` for HDFS 
data storage) file should be deleted from the data storage.
 
-### Adding a new Firehose
+### Adding support for a new input source
 
-There is an example of this in the `s3-extensions` module with the 
StaticS3FirehoseFactory.
+Adding support for a new input source requires to implement three interfaces, 
i.e., `InputSource`, `InputEntity`, and `InputSourceReader`.
+`InputSource` is to define where the input data is stored. `InputEntity` is to 
define how data can be read in parallel
+in [native parallel indexing](../ingestion/native-batch.md).
+`InputSourceReader` defines how to read your new input source and you can 
simply use the provided `InputEntityIteratingReader` in most cases.
 
-Adding a Firehose is done almost entirely through the Jackson Modules instead 
of Guice.  Specifically, note the implementation
+There is an example of this in the `druid-s3-extensions` module with the 
`S3InputSource` and `S3Entity`.
+
+Adding an InputSource is done almost entirely through the Jackson Modules 
instead of Guice. Specifically, note the implementation
 
 ``` java
 @Override
 public List getJacksonModules()
 {
   return ImmutableList.of(
-  new SimpleModule().registerSubtypes(new 
NamedType(StaticS3FirehoseFactory.class, "static-s3"))
+  new SimpleModule().registerSubtypes(new 
NamedType(S3InputSource.class, "s3"))
   );
 }
 ```
 
-This is registering the FirehoseFactory with Jackson's polymorphic 
serialization/deserialization layer.  More concretely, having this will mean 
that if you specify a `"firehose": { "type": "static-s3", ... }` in your 
realtime config, then the system will load this FirehoseFactory for your 
firehose.
+This is registering the InputSource with Jackson's polymorphic 
serialization/deserialization layer.  More concretely, having this will mean 
that if you specify a `"inputSource": { "type": "s3", ... }` in your IO config, 
then the system will load this InputSource for your `InputSource` 
implementation.
+
+Note that inside of Druid, we have made the @JacksonInject annotation for 
Jackson deserialized objects actually use the base Guice injector to resolve 
the object to be injected.  So, if your InputSource needs access to some 
object, you can add a @JacksonInject annotation on a setter and it will get set 
on instantiation.
 
 Review comment:
   Added.
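
   To make the setter pattern above concrete, a minimal sketch (the class and
   field names are hypothetical, not actual Druid classes):

```java
import com.fasterxml.jackson.annotation.JacksonInject;
import com.fasterxml.jackson.databind.ObjectMapper;

// Hypothetical input source; only the injection wiring is shown.
public class MyInputSource
{
  private ObjectMapper jsonMapper;

  // Jackson resolves this dependency through the base Guice injector
  // when it deserializes a MyInputSource from an ingestion spec.
  @JacksonInject
  public void setJsonMapper(ObjectMapper jsonMapper)
  {
    this.jsonMapper = jsonMapper;
  }
}
```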





[GitHub] [druid] jihoonson commented on a change in pull request #9171: Doc update for the new input source and the new input format

2020-01-16 Thread GitBox
jihoonson commented on a change in pull request #9171: Doc update for the new 
input source and the new input format
URL: https://github.com/apache/druid/pull/9171#discussion_r367707398
 
 

 ##
 File path: docs/development/modules.md
 ##
 @@ -148,29 +150,43 @@ To start a segment killing task, you need to access the 
old Coordinator console
 
 After the killing task ends, `index.zip` (`partitionNum_index.zip` for HDFS 
data storage) file should be deleted from the data storage.
 
-### Adding a new Firehose
+### Adding support for a new input source
 
-There is an example of this in the `s3-extensions` module with the 
StaticS3FirehoseFactory.
+Adding support for a new input source requires to implement three interfaces, 
i.e., `InputSource`, `InputEntity`, and `InputSourceReader`.
+`InputSource` is to define where the input data is stored. `InputEntity` is to 
define how data can be read in parallel
+in [native parallel indexing](../ingestion/native-batch.md).
+`InputSourceReader` defines how to read your new input source and you can 
simply use the provided `InputEntityIteratingReader` in most cases.
 
-Adding a Firehose is done almost entirely through the Jackson Modules instead 
of Guice.  Specifically, note the implementation
+There is an example of this in the `druid-s3-extensions` module with the 
`S3InputSource` and `S3Entity`.
+
+Adding an InputSource is done almost entirely through the Jackson Modules 
instead of Guice. Specifically, note the implementation
 
 ``` java
 @Override
 public List getJacksonModules()
 {
   return ImmutableList.of(
-  new SimpleModule().registerSubtypes(new 
NamedType(StaticS3FirehoseFactory.class, "static-s3"))
+  new SimpleModule().registerSubtypes(new 
NamedType(S3InputSource.class, "s3"))
   );
 }
 ```
 
-This is registering the FirehoseFactory with Jackson's polymorphic 
serialization/deserialization layer.  More concretely, having this will mean 
that if you specify a `"firehose": { "type": "static-s3", ... }` in your 
realtime config, then the system will load this FirehoseFactory for your 
firehose.
+This is registering the InputSource with Jackson's polymorphic 
serialization/deserialization layer.  More concretely, having this will mean 
that if you specify a `"inputSource": { "type": "s3", ... }` in your IO config, 
then the system will load this InputSource for your `InputSource` 
implementation.
+
+Note that inside of Druid, we have made the @JacksonInject annotation for 
Jackson deserialized objects actually use the base Guice injector to resolve 
the object to be injected.  So, if your InputSource needs access to some 
object, you can add a @JacksonInject annotation on a setter and it will get set 
on instantiation.
+
+### Adding support for a new data format
+
+Adding support for a new data format requires to implement two interfaces, 
i.e., `InputFormat` and `InputEntityReader`.
 
 Review comment:
   Fixed, thanks.
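
   As with input sources, the new format would be registered through a Jackson
   module, mirroring the `S3InputSource` example above. A sketch, where
   `MyInputFormat` and the `"my-format"` type key are hypothetical:

```java
import com.fasterxml.jackson.databind.Module;
import com.fasterxml.jackson.databind.jsontype.NamedType;
import com.fasterxml.jackson.databind.module.SimpleModule;
import com.google.common.collect.ImmutableList;
import java.util.List;

public class MyFormatModule
{
  // Stub standing in for a real InputFormat implementation.
  static class MyInputFormat {}

  public List<Module> getJacksonModules()
  {
    // After registration, a spec can select this format with
    // "inputFormat": { "type": "my-format", ... }
    return ImmutableList.of(
        new SimpleModule().registerSubtypes(new NamedType(MyInputFormat.class, "my-format"))
    );
  }
}
```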





[GitHub] [druid] jihoonson commented on a change in pull request #9171: Doc update for the new input source and the new input format

2020-01-16 Thread GitBox
jihoonson commented on a change in pull request #9171: Doc update for the new 
input source and the new input format
URL: https://github.com/apache/druid/pull/9171#discussion_r367707455
 
 

 ##
 File path: docs/development/extensions-core/hdfs.md
 ##
 @@ -36,49 +36,110 @@ To use this Apache Druid extension, make sure to 
[include](../../development/ext
 |`druid.hadoop.security.kerberos.principal`|`dr...@example.com`| Principal 
user name |empty|
 
|`druid.hadoop.security.kerberos.keytab`|`/etc/security/keytabs/druid.headlessUser.keytab`|Path
 to keytab file|empty|
 
-If you are using the Hadoop indexer, set your output directory to be a 
location on Hadoop and it will work.
+Besides the above settings, you also need to include all Hadoop configuration 
files (such as `core-site.xml`, `hdfs-site.xml`)
+in the Druid classpath. One way to do this is copying all those files under 
`${DRUID_HOME}/conf/_common`.
+
+If you are using the Hadoop ingestion, set your output directory to be a 
location on Hadoop and it will work.
 If you want to eagerly authenticate against a secured hadoop/hdfs cluster you 
must set `druid.hadoop.security.kerberos.principal` and 
`druid.hadoop.security.kerberos.keytab`, this is an alternative to the cron job 
method that runs `kinit` command periodically.
 
-### Configuration for Google Cloud Storage
+### Configuration for Cloud Storage
+
+You can also use the AWS S3 or the Google Cloud Storage as the deep storage 
via HDFS.
+
+ Configuration for AWS S3
 
-The HDFS extension can also be used for GCS as deep storage.
+To use the AWS S3 as the deep storage, you need to configure 
`druid.storage.storageDirectory` properly.
 
 |Property|Possible Values|Description|Default|
 ||---|---|---|
-|`druid.storage.type`|hdfs||Must be set.|
-|`druid.storage.storageDirectory`||gs://bucket/example/directory|Must be set.|
+|`druid.storage.type`|hdfs| |Must be set.|
+|`druid.storage.storageDirectory`|s3a://bucket/example/directory or 
s3n://bucket/example/directory|Path to the deep storage|Must be set.|
 
-All services that need to access GCS need to have the [GCS connector 
jar](https://cloud.google.com/hadoop/google-cloud-storage-connector#manualinstallation)
 in their class path. One option is to place this jar in /lib/ and 
/extensions/druid-hdfs-storage/
+You also need to include the [Hadoop AWS 
module](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html),
 especially the `hadoop-aws.jar` in the Druid classpath.
+Run the below command to install the `hadoop-aws.jar` file under 
`${DRUID_HOME}/extensions/druid-hdfs-storage` in all nodes.
 
-Tested with Druid 0.9.0, Hadoop 2.7.2 and gcs-connector jar 1.4.4-hadoop2.
-
-
+```bash
+java -classpath "${DRUID_HOME}lib/*" org.apache.druid.cli.Main tools pull-deps 
-h "org.apache.hadoop:hadoop-aws:${HADOOP_VERSION}";
+cp 
${DRUID_HOME}/hadoop-dependencies/hadoop-aws/${HADOOP_VERSION}/hadoop-aws-${HADOOP_VERSION}.jar
 ${DRUID_HOME}/extensions/druid-hdfs-storage/
+```
 
-## Native batch ingestion
+Finally, you need to add the below properties in the `core-site.xml`.
+For more configurations, see the [Hadoop AWS 
module](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html).
+
+```xml
+
+  fs.s3a.impl
+  org.apache.hadoop.fs.s3a.S3AFileSystem
+  The implementation class of the S3A Filesystem
+
+
+
+  fs.AbstractFileSystem.s3a.impl
+  org.apache.hadoop.fs.s3a.S3A
+  The implementation class of the S3A 
AbstractFileSystem.
+
+
+
+  fs.s3a.access.key
+  AWS access key ID. Omit for IAM role-based or provider-based 
authentication.
+  your access key
+
+
+
+  fs.s3a.secret.key
+  AWS secret key. Omit for IAM role-based or provider-based 
authentication.
+  your secret key
+
+```
 
-This firehose ingests events from a predefined list of files from a Hadoop 
filesystem.
-This firehose is _splittable_ and can be used by [native parallel index 
tasks](../../ingestion/native-batch.md#parallel-task).
-Since each split represents an HDFS file, each worker task of `index_parallel` 
will read an object.
+ Configuration for Google Cloud Storage
 
 Review comment:
  I added the `google.cloud.auth.service.account.enable` property. I haven't checked 
how it works; it's copied from 
https://github.com/GoogleCloudDataproc/bigdata-interop/blob/master/gcs/INSTALL.md (see the sketch below).
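
   For reference, a sketch of how that property would look in `core-site.xml`,
   alongside the S3 example above; the `true` value is an assumption taken
   from the connector's install guide, not something verified in this PR:

```xml
<property>
  <name>google.cloud.auth.service.account.enable</name>
  <!-- Assumed value, copied from the GCS connector install guide. -->
  <value>true</value>
  <description>Enable service-account authentication for the GCS connector.</description>
</property>
```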





[GitHub] [druid] lgtm-com[bot] commented on issue #9181: Speed up String first/last aggregators when folding isn't needed.

2020-01-16 Thread GitBox
lgtm-com[bot] commented on issue #9181: Speed up String first/last aggregators 
when folding isn't needed.
URL: https://github.com/apache/druid/pull/9181#issuecomment-575400116
 
 
   This pull request **fixes 1 alert** when merging 
c56d895caf30f0b3171ea5cc09615e551adeeae4 into 
42359c93dd53f16e52ed79dcd8b63829f4bf2f7b - [view on 
LGTM.com](https://lgtm.com/projects/g/apache/druid/rev/pr-60ba177285eee825f576c2665f6a7661b4aff17a)
   
   **fixed alerts:**
   
   * 1 for Useless null check
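
   For readers unfamiliar with that alert class, a generic illustration of a
   "useless null check" (unrelated to the actual diff):

```java
public class UselessNullCheck
{
  public static void main(String[] args)
  {
    String name = args.length > 0 ? args[0] : "unknown";
    if (name != null) { // flagged: name cannot be null on this path
      System.out.println(name);
    }
  }
}
```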





[druid] branch master updated (bfcb30e -> bd49ec0)

2020-01-16 Thread gian
This is an automated email from the ASF dual-hosted git repository.

gian pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git.


from bfcb30e  Add javadocs and small improvements to join code. (#9196)
 add bd49ec0  Move result-to-array logic from SQL layer into 
QueryToolChests. (#9130)

No new revisions were added by this update.

Summary of changes:
 .../java/org/apache/druid/query/BaseQuery.java |   1 +
 .../main/java/org/apache/druid/query/Query.java|   1 +
 .../org/apache/druid/query/QueryToolChest.java |  49 +++-
 .../query/groupby/GroupByQueryQueryToolChest.java  |  11 +
 .../apache/druid/query/scan/ScanQueryEngine.java   |   6 +-
 .../druid/query/scan/ScanQueryQueryToolChest.java  |  75 ++
 .../timeseries/TimeseriesQueryQueryToolChest.java  |  43 
 .../druid/query/topn/TopNQueryQueryToolChest.java  |  49 
 .../druid/query/QueryToolChestTestHelper.java} |  18 +-
 .../groupby/GroupByQueryQueryToolChestTest.java| 109 +
 .../query/scan/ScanQueryQueryToolChestTest.java| 205 +
 .../TimeseriesQueryQueryToolChestTest.java |  64 +-
 .../query/topn/TopNQueryQueryToolChestTest.java|  72 ++
 .../org/apache/druid/server/QueryLifecycle.java|   1 +
 .../sql/calcite/expression/SimpleExtraction.java   |  28 ++-
 .../apache/druid/sql/calcite/rel/QueryMaker.java   | 254 +++--
 16 files changed, 789 insertions(+), 197 deletions(-)
 copy processing/src/{main/java/org/apache/druid/query/NoopQueryRunner.java => 
test/java/org/apache/druid/query/QueryToolChestTestHelper.java} (65%)
 create mode 100644 
processing/src/test/java/org/apache/druid/query/scan/ScanQueryQueryToolChestTest.java





[GitHub] [druid] gianm merged pull request #9130: Move result-to-array logic from SQL layer into QueryToolChests.

2020-01-16 Thread GitBox
gianm merged pull request #9130: Move result-to-array logic from SQL layer into 
QueryToolChests.
URL: https://github.com/apache/druid/pull/9130
 
 
   





[GitHub] [druid] suneet-s opened a new pull request #9200: Optimize JoinCondition matching

2020-01-16 Thread GitBox
suneet-s opened a new pull request #9200: Optimize JoinCondition matching
URL: https://github.com/apache/druid/pull/9200
 
 
   ### Description
   
   The LookupJoinMatcher needs to check whether a condition is always true or
   always false multiple times. These checks can be pre-computed to speed up
   the match checking.
   
   This change reduces the time it takes to perform a join on a long key
   from ~36 ms/op to ~23 ms/op (benchmark screenshot below; a sketch of the
   idea follows it).
   
   ![Screen Shot 2020-01-16 at 3 34 16 
PM](https://user-images.githubusercontent.com/44787917/72571945-e6a31d00-3875-11ea-8f88-6cecc8a9ee1b.png)
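
   A self-contained sketch of the idea with hypothetical names (the real
   `LookupJoinMatcher` differs): evaluate the constant-condition checks once
   at construction instead of on every match call.

```java
// Illustrative only; these are not the actual Druid classes.
public class PrecomputedMatcher
{
  interface Condition
  {
    boolean isAlwaysTrue();  // potentially expensive to recompute
    boolean isAlwaysFalse();
  }

  private final boolean alwaysTrue;
  private final boolean alwaysFalse;

  public PrecomputedMatcher(Condition condition)
  {
    // Evaluated once here rather than on every matches() call.
    this.alwaysTrue = condition.isAlwaysTrue();
    this.alwaysFalse = condition.isAlwaysFalse();
  }

  public boolean matches(boolean rowSatisfiesCondition)
  {
    if (alwaysTrue) {
      return true;
    }
    if (alwaysFalse) {
      return false;
    }
    return rowSatisfiesCondition;
  }
}
```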
   
   This PR has:
   - [ ] been self-reviewed.
  - [ ] using the [concurrency 
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
 (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in 
[licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml)
   - [ ] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   
   





[GitHub] [druid] jihoonson opened a new pull request #9199: Fix TSV bugs

2020-01-16 Thread GitBox
jihoonson opened a new pull request #9199: Fix TSV bugs
URL: https://github.com/apache/druid/pull/9199
 
 
   Fixes https://github.com/apache/druid/issues/9156, #9177, and #9188.
   
   
   
   This PR has:
   - [x] been self-reviewed.
  - [ ] using the [concurrency 
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
 (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in 
[licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml)
   - [ ] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   - [x] added unit tests or modified existing tests to cover new code paths.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.





[GitHub] [druid] jon-wei commented on a change in pull request #9171: Doc update for the new input source and the new input format

2020-01-16 Thread GitBox
jon-wei commented on a change in pull request #9171: Doc update for the new 
input source and the new input format
URL: https://github.com/apache/druid/pull/9171#discussion_r367702186
 
 

 ##
 File path: docs/ingestion/hadoop.md
 ##
 @@ -145,7 +145,51 @@ A type of inputSpec where a static path to the data files 
is provided.
 For example, using the static input paths:
 
 ```
-"paths" : 
"s3n://billy-bucket/the/data/is/here/data.gz,s3n://billy-bucket/the/data/is/here/moredata.gz,s3n://billy-bucket/the/data/is/here/evenmoredata.gz"
+"paths" : 
"hdfs://path/to/data/is/here/data.gz,hdfs://path/to/data/is/here/moredata.gz,hdfs://path/to/data/is/here/evenmoredata.gz"
+```
+
+You can also read from cloud storage such as AWS S3 or Google Cloud Storage.
+To do so, you need to install the necessary library under 
`${DRUID_HOME}/hadoop-dependencies` in _all MiddleManager or Indexer processes_.
 
 Review comment:
  Noting here that `${DRUID_HOME}/hadoop-dependencies` doesn't work for this, 
since the HDFS extension needs these libraries at peon startup.





[GitHub] [druid] jon-wei commented on a change in pull request #9171: Doc update for the new input source and the new input format

2020-01-16 Thread GitBox
jon-wei commented on a change in pull request #9171: Doc update for the new 
input source and the new input format
URL: https://github.com/apache/druid/pull/9171#discussion_r367697988
 
 

 ##
 File path: docs/development/extensions-core/hdfs.md
 ##
 @@ -36,49 +36,110 @@ To use this Apache Druid extension, make sure to 
[include](../../development/ext
 |`druid.hadoop.security.kerberos.principal`|`dr...@example.com`| Principal 
user name |empty|
 
|`druid.hadoop.security.kerberos.keytab`|`/etc/security/keytabs/druid.headlessUser.keytab`|Path
 to keytab file|empty|
 
-If you are using the Hadoop indexer, set your output directory to be a 
location on Hadoop and it will work.
+Besides the above settings, you also need to include all Hadoop configuration 
files (such as `core-site.xml`, `hdfs-site.xml`)
+in the Druid classpath. One way to do this is copying all those files under 
`${DRUID_HOME}/conf/_common`.
+
+If you are using the Hadoop ingestion, set your output directory to be a 
location on Hadoop and it will work.
 If you want to eagerly authenticate against a secured hadoop/hdfs cluster you 
must set `druid.hadoop.security.kerberos.principal` and 
`druid.hadoop.security.kerberos.keytab`, this is an alternative to the cron job 
method that runs `kinit` command periodically.
 
-### Configuration for Google Cloud Storage
+### Configuration for Cloud Storage
+
+You can also use the AWS S3 or the Google Cloud Storage as the deep storage 
via HDFS.
+
+ Configuration for AWS S3
 
-The HDFS extension can also be used for GCS as deep storage.
+To use the AWS S3 as the deep storage, you need to configure 
`druid.storage.storageDirectory` properly.
 
 |Property|Possible Values|Description|Default|
 ||---|---|---|
-|`druid.storage.type`|hdfs||Must be set.|
-|`druid.storage.storageDirectory`||gs://bucket/example/directory|Must be set.|
+|`druid.storage.type`|hdfs| |Must be set.|
+|`druid.storage.storageDirectory`|s3a://bucket/example/directory or 
s3n://bucket/example/directory|Path to the deep storage|Must be set.|
 
-All services that need to access GCS need to have the [GCS connector 
jar](https://cloud.google.com/hadoop/google-cloud-storage-connector#manualinstallation)
 in their class path. One option is to place this jar in /lib/ and 
/extensions/druid-hdfs-storage/
+You also need to include the [Hadoop AWS 
module](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html),
 especially the `hadoop-aws.jar` in the Druid classpath.
+Run the below command to install the `hadoop-aws.jar` file under 
`${DRUID_HOME}/extensions/druid-hdfs-storage` in all nodes.
 
-Tested with Druid 0.9.0, Hadoop 2.7.2 and gcs-connector jar 1.4.4-hadoop2.
-
-
+```bash
+java -classpath "${DRUID_HOME}/lib/*" org.apache.druid.cli.Main tools pull-deps 
-h "org.apache.hadoop:hadoop-aws:${HADOOP_VERSION}"
+cp 
${DRUID_HOME}/hadoop-dependencies/hadoop-aws/${HADOOP_VERSION}/hadoop-aws-${HADOOP_VERSION}.jar
 ${DRUID_HOME}/extensions/druid-hdfs-storage/
+```
 
-## Native batch ingestion
+Finally, you need to add the below properties in the `core-site.xml`.
+For more configurations, see the [Hadoop AWS 
module](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html).
+
+```xml
+
+  fs.s3a.impl
+  org.apache.hadoop.fs.s3a.S3AFileSystem
+  The implementation class of the S3A Filesystem
+
+
+
+  fs.AbstractFileSystem.s3a.impl
+  org.apache.hadoop.fs.s3a.S3A
+  The implementation class of the S3A 
AbstractFileSystem.
+
+
+
+  fs.s3a.access.key
+  AWS access key ID. Omit for IAM role-based or provider-based 
authentication.
+  your access key
+
+
+
+  fs.s3a.secret.key
+  AWS secret key. Omit for IAM role-based or provider-based 
authentication.
+  your secret key
+
+```
 
-This firehose ingests events from a predefined list of files from a Hadoop 
filesystem.
-This firehose is _splittable_ and can be used by [native parallel index 
tasks](../../ingestion/native-batch.md#parallel-task).
-Since each split represents an HDFS file, each worker task of `index_parallel` 
will read an object.
+ Configuration for Google Cloud Storage
 
-Sample spec:
+To use the Google cloud Storage as the deep storage, you need to configure 
`druid.storage.storageDirectory` properly.
 
 Review comment:
   Google cloud Storage -> Google Cloud Storage





[GitHub] [druid] jon-wei commented on a change in pull request #9171: Doc update for the new input source and the new input format

2020-01-16 Thread GitBox
jon-wei commented on a change in pull request #9171: Doc update for the new 
input source and the new input format
URL: https://github.com/apache/druid/pull/9171#discussion_r367700282
 
 

 ##
 File path: docs/development/modules.md
 ##
 @@ -148,29 +150,43 @@ To start a segment killing task, you need to access the 
old Coordinator console
 
 After the killing task ends, `index.zip` (`partitionNum_index.zip` for HDFS 
data storage) file should be deleted from the data storage.
 
-### Adding a new Firehose
+### Adding support for a new input source
 
-There is an example of this in the `s3-extensions` module with the 
StaticS3FirehoseFactory.
+Adding support for a new input source requires to implement three interfaces, 
i.e., `InputSource`, `InputEntity`, and `InputSourceReader`.
+`InputSource` is to define where the input data is stored. `InputEntity` is to 
define how data can be read in parallel
+in [native parallel indexing](../ingestion/native-batch.md).
+`InputSourceReader` defines how to read your new input source and you can 
simply use the provided `InputEntityIteratingReader` in most cases.
 
-Adding a Firehose is done almost entirely through the Jackson Modules instead 
of Guice.  Specifically, note the implementation
+There is an example of this in the `druid-s3-extensions` module with the 
`S3InputSource` and `S3Entity`.
+
+Adding an InputSource is done almost entirely through the Jackson Modules 
instead of Guice. Specifically, note the implementation
 
 ``` java
 @Override
 public List getJacksonModules()
 {
   return ImmutableList.of(
-  new SimpleModule().registerSubtypes(new 
NamedType(StaticS3FirehoseFactory.class, "static-s3"))
+  new SimpleModule().registerSubtypes(new 
NamedType(S3InputSource.class, "s3"))
   );
 }
 ```
 
-This is registering the FirehoseFactory with Jackson's polymorphic 
serialization/deserialization layer.  More concretely, having this will mean 
that if you specify a `"firehose": { "type": "static-s3", ... }` in your 
realtime config, then the system will load this FirehoseFactory for your 
firehose.
+This is registering the InputSource with Jackson's polymorphic 
serialization/deserialization layer.  More concretely, having this will mean 
that if you specify a `"inputSource": { "type": "s3", ... }` in your IO config, 
then the system will load this InputSource for your `InputSource` 
implementation.
+
+Note that inside of Druid, we have made the @JacksonInject annotation for 
Jackson deserialized objects actually use the base Guice injector to resolve 
the object to be injected.  So, if your InputSource needs access to some 
object, you can add a @JacksonInject annotation on a setter and it will get set 
on instantiation.
 
 Review comment:
   suggest putting backticks around `@JacksonInject`





[GitHub] [druid] jon-wei commented on a change in pull request #9171: Doc update for the new input source and the new input format

2020-01-16 Thread GitBox
jon-wei commented on a change in pull request #9171: Doc update for the new 
input source and the new input format
URL: https://github.com/apache/druid/pull/9171#discussion_r367700438
 
 

 ##
 File path: docs/development/modules.md
 ##
 @@ -148,29 +150,43 @@ To start a segment killing task, you need to access the 
old Coordinator console
 
 After the killing task ends, `index.zip` (`partitionNum_index.zip` for HDFS 
data storage) file should be deleted from the data storage.
 
-### Adding a new Firehose
+### Adding support for a new input source
 
-There is an example of this in the `s3-extensions` module with the 
StaticS3FirehoseFactory.
+Adding support for a new input source requires to implement three interfaces, 
i.e., `InputSource`, `InputEntity`, and `InputSourceReader`.
+`InputSource` is to define where the input data is stored. `InputEntity` is to 
define how data can be read in parallel
+in [native parallel indexing](../ingestion/native-batch.md).
+`InputSourceReader` defines how to read your new input source and you can 
simply use the provided `InputEntityIteratingReader` in most cases.
 
-Adding a Firehose is done almost entirely through the Jackson Modules instead 
of Guice.  Specifically, note the implementation
+There is an example of this in the `druid-s3-extensions` module with the 
`S3InputSource` and `S3Entity`.
+
+Adding an InputSource is done almost entirely through the Jackson Modules 
instead of Guice. Specifically, note the implementation
 
 ``` java
 @Override
 public List getJacksonModules()
 {
   return ImmutableList.of(
-  new SimpleModule().registerSubtypes(new 
NamedType(StaticS3FirehoseFactory.class, "static-s3"))
+  new SimpleModule().registerSubtypes(new 
NamedType(S3InputSource.class, "s3"))
   );
 }
 ```
 
-This is registering the FirehoseFactory with Jackson's polymorphic 
serialization/deserialization layer.  More concretely, having this will mean 
that if you specify a `"firehose": { "type": "static-s3", ... }` in your 
realtime config, then the system will load this FirehoseFactory for your 
firehose.
+This is registering the InputSource with Jackson's polymorphic 
serialization/deserialization layer.  More concretely, having this will mean 
that if you specify a `"inputSource": { "type": "s3", ... }` in your IO config, 
then the system will load this InputSource for your `InputSource` 
implementation.
+
+Note that inside of Druid, we have made the @JacksonInject annotation for 
Jackson deserialized objects actually use the base Guice injector to resolve 
the object to be injected.  So, if your InputSource needs access to some 
object, you can add a @JacksonInject annotation on a setter and it will get set 
on instantiation.
+
+### Adding support for a new data format
+
+Adding support for a new data format requires to implement two interfaces, 
i.e., `InputFormat` and `InputEntityReader`.
 
 Review comment:
   Suggest the following
   
   "requires to implement two interfaces, i.e.," -> "requires implementing two 
interfaces: " 





[druid] branch master updated (42359c9 -> bfcb30e)

2020-01-16 Thread gian
This is an automated email from the ASF dual-hosted git repository.

gian pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git.


from 42359c9  Implement ANY aggregator (#9187)
 add bfcb30e  Add javadocs and small improvements to join code. (#9196)

No new revisions were added by this update.

Summary of changes:
 .../druid/segment/ColumnProcessorFactory.java  |  3 ++
 .../apache/druid/segment/join/HashJoinEngine.java  |  7 ++--
 .../druid/segment/join/JoinConditionAnalysis.java  |  8 +
 .../apache/druid/segment/join/JoinableClause.java  | 10 +-
 .../join/PossiblyNullColumnValueSelector.java  |  4 +++
 .../druid/segment/join/table/IndexedTable.java | 40 ++
 .../join/table/IndexedTableJoinMatcher.java|  2 +-
 7 files changed, 70 insertions(+), 4 deletions(-)





[GitHub] [druid] gianm merged pull request #9196: Add javadocs and small improvements to join code.

2020-01-16 Thread GitBox
gianm merged pull request #9196: Add javadocs and small improvements to join 
code.
URL: https://github.com/apache/druid/pull/9196
 
 
   





[GitHub] [druid] jon-wei commented on a change in pull request #9171: Doc update for the new input source and the new input format

2020-01-16 Thread GitBox
jon-wei commented on a change in pull request #9171: Doc update for the new 
input source and the new input format
URL: https://github.com/apache/druid/pull/9171#discussion_r367692944
 
 

 ##
 File path: docs/development/extensions-core/hdfs.md
 ##
 @@ -36,49 +36,110 @@ To use this Apache Druid extension, make sure to 
[include](../../development/ext
 |`druid.hadoop.security.kerberos.principal`|`dr...@example.com`| Principal 
user name |empty|
 
|`druid.hadoop.security.kerberos.keytab`|`/etc/security/keytabs/druid.headlessUser.keytab`|Path
 to keytab file|empty|
 
-If you are using the Hadoop indexer, set your output directory to be a 
location on Hadoop and it will work.
+Besides the above settings, you also need to include all Hadoop configuration 
files (such as `core-site.xml`, `hdfs-site.xml`)
+in the Druid classpath. One way to do this is copying all those files under 
`${DRUID_HOME}/conf/_common`.
+
+If you are using the Hadoop ingestion, set your output directory to be a 
location on Hadoop and it will work.
 If you want to eagerly authenticate against a secured hadoop/hdfs cluster you 
must set `druid.hadoop.security.kerberos.principal` and 
`druid.hadoop.security.kerberos.keytab`, this is an alternative to the cron job 
method that runs `kinit` command periodically.
 
-### Configuration for Google Cloud Storage
+### Configuration for Cloud Storage
+
+You can also use the AWS S3 or the Google Cloud Storage as the deep storage 
via HDFS.
+
+ Configuration for AWS S3
 
-The HDFS extension can also be used for GCS as deep storage.
+To use the AWS S3 as the deep storage, you need to configure 
`druid.storage.storageDirectory` properly.
 
 |Property|Possible Values|Description|Default|
 ||---|---|---|
-|`druid.storage.type`|hdfs||Must be set.|
-|`druid.storage.storageDirectory`||gs://bucket/example/directory|Must be set.|
+|`druid.storage.type`|hdfs| |Must be set.|
+|`druid.storage.storageDirectory`|s3a://bucket/example/directory or 
s3n://bucket/example/directory|Path to the deep storage|Must be set.|
 
-All services that need to access GCS need to have the [GCS connector 
jar](https://cloud.google.com/hadoop/google-cloud-storage-connector#manualinstallation)
 in their class path. One option is to place this jar in /lib/ and 
/extensions/druid-hdfs-storage/
+You also need to include the [Hadoop AWS 
module](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html),
 especially the `hadoop-aws.jar` in the Druid classpath.
+Run the below command to install the `hadoop-aws.jar` file under 
`${DRUID_HOME}/extensions/druid-hdfs-storage` in all nodes.
 
-Tested with Druid 0.9.0, Hadoop 2.7.2 and gcs-connector jar 1.4.4-hadoop2.
-
-
+```bash
+java -classpath "${DRUID_HOME}/lib/*" org.apache.druid.cli.Main tools pull-deps 
-h "org.apache.hadoop:hadoop-aws:${HADOOP_VERSION}"
+cp 
${DRUID_HOME}/hadoop-dependencies/hadoop-aws/${HADOOP_VERSION}/hadoop-aws-${HADOOP_VERSION}.jar
 ${DRUID_HOME}/extensions/druid-hdfs-storage/
+```
 
-## Native batch ingestion
+Finally, you need to add the below properties in the `core-site.xml`.
+For more configurations, see the [Hadoop AWS 
module](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html).
+
+```xml
+
+  fs.s3a.impl
+  org.apache.hadoop.fs.s3a.S3AFileSystem
+  The implementation class of the S3A Filesystem
+
+
+
+  fs.AbstractFileSystem.s3a.impl
+  org.apache.hadoop.fs.s3a.S3A
+  The implementation class of the S3A 
AbstractFileSystem.
+
+
+
+  fs.s3a.access.key
+  AWS access key ID. Omit for IAM role-based or provider-based 
authentication.
+  your access key
+
+
+
+  fs.s3a.secret.key
+  AWS secret key. Omit for IAM role-based or provider-based 
authentication.
+  your secret key
+
+```
 
-This firehose ingests events from a predefined list of files from a Hadoop 
filesystem.
-This firehose is _splittable_ and can be used by [native parallel index 
tasks](../../ingestion/native-batch.md#parallel-task).
-Since each split represents an HDFS file, each worker task of `index_parallel` 
will read an object.
+ Configuration for Google Cloud Storage
 
 Review comment:
   Is there authentication configuration needed for accessing GCS? Could add 
that in a follow-on PR if so.





[GitHub] [druid] jon-wei commented on issue #9169: Docker-compose.yml broken after de-incubation cleanup

2020-01-16 Thread GitBox
jon-wei commented on issue #9169: Docker-compose.yml broken after de-incubation 
cleanup
URL: https://github.com/apache/druid/issues/9169#issuecomment-575382444
 
 
   @nh43de Thanks for the report, we're in the process of migrating to the new 
repo: https://issues.apache.org/jira/browse/INFRA-19648





[druid] branch master updated (a87db7f -> 42359c9)

2020-01-16 Thread jonwei
This is an automated email from the ASF dual-hosted git repository.

jonwei pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git.


from a87db7f  Add HashJoinSegment, a virtual segment for joins. (#9111)
 add 42359c9  Implement ANY aggregator (#9187)

No new revisions were added by this update.

Summary of changes:
 .../apache/druid/java/util/common/StringUtils.java |  22 +++
 docs/querying/aggregations.md  |  55 ++
 docs/querying/sql.md   |   4 +
 .../apache/druid/jackson/AggregatorsModule.java|  10 +-
 .../druid/query/aggregation/AggregatorUtil.java|   6 +
 .../DoubleAnyAggregator.java}  |  50 ++---
 .../DoubleAnyAggregatorFactory.java}   |  59 +++---
 .../DoubleAnyBufferAggregator.java}|  47 ++---
 .../FloatAnyAggregator.java}   |  45 ++---
 .../FloatAnyAggregatorFactory.java}|  61 +++
 .../FloatAnyBufferAggregator.java} |  47 ++---
 .../LongAnyAggregator.java}|  48 ++---
 .../LongAnyAggregatorFactory.java} |  59 +++---
 .../LongAnyBufferAggregator.java}  |  47 ++---
 .../query/aggregation/any/StringAnyAggregator.java |  82 +
 .../StringAnyAggregatorFactory.java}   |  65 +++
 .../aggregation/any/StringAnyBufferAggregator.java | 102 +++
 .../aggregation/first/StringFirstAggregator.java   |   3 +-
 .../aggregation/first/StringFirstLastUtils.java|  14 --
 .../aggregation/last/StringLastAggregator.java |   3 +-
 ...or.java => EarliestLatestAnySqlAggregator.java} |  59 --
 .../aggregation/builtin/SimpleSqlAggregator.java   |   7 +-
 .../sql/calcite/planner/DruidOperatorTable.java|   7 +-
 .../apache/druid/sql/calcite/CalciteQueryTest.java | 202 +
 24 files changed, 791 insertions(+), 313 deletions(-)
 copy 
processing/src/main/java/org/apache/druid/query/aggregation/{first/DoubleFirstAggregator.java
 => any/DoubleAnyAggregator.java} (55%)
 copy 
processing/src/main/java/org/apache/druid/query/aggregation/{DoubleMaxAggregatorFactory.java
 => any/DoubleAnyAggregatorFactory.java} (65%)
 copy 
processing/src/main/java/org/apache/druid/query/aggregation/{last/DoubleLastBufferAggregator.java
 => any/DoubleAnyBufferAggregator.java} (56%)
 copy 
processing/src/main/java/org/apache/druid/query/aggregation/{last/FloatLastAggregator.java
 => any/FloatAnyAggregator.java} (56%)
 copy 
processing/src/main/java/org/apache/druid/query/aggregation/{FloatMinAggregatorFactory.java
 => any/FloatAnyAggregatorFactory.java} (65%)
 copy 
processing/src/main/java/org/apache/druid/query/aggregation/{first/FloatFirstBufferAggregator.java
 => any/FloatAnyBufferAggregator.java} (56%)
 copy 
processing/src/main/java/org/apache/druid/query/aggregation/{first/LongFirstAggregator.java
 => any/LongAnyAggregator.java} (56%)
 copy 
processing/src/main/java/org/apache/druid/query/aggregation/{LongMaxAggregatorFactory.java
 => any/LongAnyAggregatorFactory.java} (65%)
 copy 
processing/src/main/java/org/apache/druid/query/aggregation/{first/LongFirstBufferAggregator.java
 => any/LongAnyBufferAggregator.java} (56%)
 create mode 100644 
processing/src/main/java/org/apache/druid/query/aggregation/any/StringAnyAggregator.java
 copy 
processing/src/main/java/org/apache/druid/query/aggregation/{last/StringLastAggregatorFactory.java
 => any/StringAnyAggregatorFactory.java} (68%)
 create mode 100644 
processing/src/main/java/org/apache/druid/query/aggregation/any/StringAnyBufferAggregator.java
 rename 
sql/src/main/java/org/apache/druid/sql/calcite/aggregation/builtin/{EarliestLatestSqlAggregator.java
 => EarliestLatestAnySqlAggregator.java} (77%)





[GitHub] [druid] jon-wei merged pull request #9187: Implement ANY aggregator

2020-01-16 Thread GitBox
jon-wei merged pull request #9187: Implement ANY aggregator
URL: https://github.com/apache/druid/pull/9187
 
 
   





[GitHub] [druid] lgtm-com[bot] commented on issue #9181: Speed up String first/last aggregators when folding isn't needed.

2020-01-16 Thread GitBox
lgtm-com[bot] commented on issue #9181: Speed up String first/last aggregators 
when folding isn't needed.
URL: https://github.com/apache/druid/pull/9181#issuecomment-575378497
 
 
   This pull request **fixes 1 alert** when merging 
92f2218cf771c70b1173264e96621850bead8ea8 into 
a87db7f353cdee4dfa9b541063f59d67706d1b07 - [view on 
LGTM.com](https://lgtm.com/projects/g/apache/druid/rev/pr-cd9a4712d6ee17be67a5574f81c981254ad0052b)
   
   **fixed alerts:**
   
   * 1 for Useless null check





[GitHub] [druid] vogievetsky opened a new pull request #9198: Web console: fix bug where arrays can not be emptied out in the coordinator dialog

2020-01-16 Thread GitBox
vogievetsky opened a new pull request #9198: Web console: fix bug where arrays 
can not be emptied out in the coordinator dialog
URL: https://github.com/apache/druid/pull/9198
 
 
   Allow defining specific empty values in the AutoForm.





[druid] branch 0.17.0 updated: Fix deserialization of maxBytesInMemory (#9092) (#9170)

2020-01-16 Thread cwylie
This is an automated email from the ASF dual-hosted git repository.

cwylie pushed a commit to branch 0.17.0
in repository https://gitbox.apache.org/repos/asf/druid.git


The following commit(s) were added to refs/heads/0.17.0 by this push:
 new e6246c9  Fix deserialization of maxBytesInMemory (#9092) (#9170)
e6246c9 is described below

commit e6246c96f7cce9f7d3b5d17ca2cf27a7963eddc3
Author: Clint Wylie 
AuthorDate: Thu Jan 16 13:47:11 2020 -0800

Fix deserialization of maxBytesInMemory (#9092) (#9170)

* Fix deserialization of maxBytesInMemory

* Add maxBytes check

Co-authored-by: Atul Mohan 
---
 .../indexing/common/index/RealtimeAppenderatorTuningConfig.java | 1 +
 .../java/org/apache/druid/indexing/common/task/TaskSerdeTest.java   | 6 +-
 .../org/apache/druid/segment/indexing/RealtimeTuningConfig.java | 1 +
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git 
a/indexing-service/src/main/java/org/apache/druid/indexing/common/index/RealtimeAppenderatorTuningConfig.java
 
b/indexing-service/src/main/java/org/apache/druid/indexing/common/index/RealtimeAppenderatorTuningConfig.java
index b66ccc8..eec9b98 100644
--- 
a/indexing-service/src/main/java/org/apache/druid/indexing/common/index/RealtimeAppenderatorTuningConfig.java
+++ 
b/indexing-service/src/main/java/org/apache/druid/indexing/common/index/RealtimeAppenderatorTuningConfig.java
@@ -143,6 +143,7 @@ public class RealtimeAppenderatorTuningConfig implements 
TuningConfig, Appendera
   }
 
   @Override
+  @JsonProperty
   public long getMaxBytesInMemory()
   {
 return maxBytesInMemory;
diff --git 
a/indexing-service/src/test/java/org/apache/druid/indexing/common/task/TaskSerdeTest.java
 
b/indexing-service/src/test/java/org/apache/druid/indexing/common/task/TaskSerdeTest.java
index 2ba37ff..c5841ea 100644
--- 
a/indexing-service/src/test/java/org/apache/druid/indexing/common/task/TaskSerdeTest.java
+++ 
b/indexing-service/src/test/java/org/apache/druid/indexing/common/task/TaskSerdeTest.java
@@ -394,7 +394,7 @@ public class TaskSerdeTest
 
 new RealtimeTuningConfig(
 1,
-null,
+10L,
 new Period("PT10M"),
 null,
 null,
@@ -446,6 +446,10 @@ public class TaskSerdeTest
 task2.getRealtimeIngestionSchema().getTuningConfig().getWindowPeriod()
 );
 Assert.assertEquals(
+
task.getRealtimeIngestionSchema().getTuningConfig().getMaxBytesInMemory(),
+
task2.getRealtimeIngestionSchema().getTuningConfig().getMaxBytesInMemory()
+);
+Assert.assertEquals(
 
task.getRealtimeIngestionSchema().getDataSchema().getGranularitySpec().getSegmentGranularity(),
 
task2.getRealtimeIngestionSchema().getDataSchema().getGranularitySpec().getSegmentGranularity()
 );
diff --git 
a/server/src/main/java/org/apache/druid/segment/indexing/RealtimeTuningConfig.java
 
b/server/src/main/java/org/apache/druid/segment/indexing/RealtimeTuningConfig.java
index a467944..728e2ff 100644
--- 
a/server/src/main/java/org/apache/druid/segment/indexing/RealtimeTuningConfig.java
+++ 
b/server/src/main/java/org/apache/druid/segment/indexing/RealtimeTuningConfig.java
@@ -174,6 +174,7 @@ public class RealtimeTuningConfig implements TuningConfig, 
AppenderatorConfig
   }
 
   @Override
+  @JsonProperty
   public long getMaxBytesInMemory()
   {
 return maxBytesInMemory;
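
A standalone illustration of the Jackson behavior behind this fix (not Druid
code; it assumes a mapper that, like Druid's, does not auto-detect getters,
so only explicitly annotated properties are written):

```java
import com.fasterxml.jackson.annotation.JsonCreator;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.databind.MapperFeature;
import com.fasterxml.jackson.databind.ObjectMapper;

public class Config
{
  private final long maxBytesInMemory;

  @JsonCreator
  public Config(@JsonProperty("maxBytesInMemory") long maxBytesInMemory)
  {
    this.maxBytesInMemory = maxBytesInMemory;
  }

  @JsonProperty // without this, the field is dropped during serialization
  public long getMaxBytesInMemory()
  {
    return maxBytesInMemory;
  }

  public static void main(String[] args) throws Exception
  {
    ObjectMapper mapper = new ObjectMapper()
        .configure(MapperFeature.AUTO_DETECT_GETTERS, false);
    String json = mapper.writeValueAsString(new Config(10L));
    System.out.println(json); // {"maxBytesInMemory":10}
    System.out.println(mapper.readValue(json, Config.class).getMaxBytesInMemory()); // 10
  }
}
```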





[GitHub] [druid] clintropolis merged pull request #9170: [Backport] Fix deserialization of maxBytesInMemory

2020-01-16 Thread GitBox
clintropolis merged pull request #9170: [Backport] Fix deserialization of 
maxBytesInMemory
URL: https://github.com/apache/druid/pull/9170
 
 
   





[GitHub] [druid] maytasm3 opened a new pull request #9197: Fix LATEST / EARLIEST Buffer Aggregator does not work on String column

2020-01-16 Thread GitBox
maytasm3 opened a new pull request #9197: Fix LATEST / EARLIEST Buffer 
Aggregator does not work on String column 
URL: https://github.com/apache/druid/pull/9197
 
 
   Fix LATEST / EARLIEST Buffer Aggregator does not work on String column 
   
   ### Description
   
   The LATEST / EARLIEST Buffer Aggregator was not working on String columns
   because the limit on the buffer was set incorrectly when storing the
   string: the limit did not account for the offset at which the string is
   written (see the sketch below).
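
   A self-contained sketch of the failure mode (illustrative only; the actual
   aggregator code differs): when writing at a non-zero offset, the buffer
   limit must be `position + length`, not `length` alone.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class OffsetLimitDemo
{
  public static void main(String[] args)
  {
    ByteBuffer buf = ByteBuffer.allocate(64);
    int position = 16; // the aggregator's slot starts at this offset
    byte[] bytes = "druid".getBytes(StandardCharsets.UTF_8);

    ByteBuffer dup = buf.duplicate();
    dup.position(position);
    // Buggy version: dup.limit(bytes.length) clamps the position back to the
    // new limit, leaving no room to write, so put() throws BufferOverflowException.
    dup.limit(position + bytes.length); // fixed: the limit accounts for the offset
    dup.put(bytes);
    System.out.println("wrote " + bytes.length + " bytes at offset " + position);
  }
}
```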
   
   This PR has:
   - [x] been self-reviewed.
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in 
[licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml)
   - [ ] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   - [x] added unit tests or modified existing tests to cover new code paths.
   - [ ] added integration tests.
   - [x] been tested in a test Druid cluster.
   





[GitHub] [druid] gianm commented on a change in pull request #9130: Move result-to-array logic from SQL layer into QueryToolChests.

2020-01-16 Thread GitBox
gianm commented on a change in pull request #9130: Move result-to-array logic 
from SQL layer into QueryToolChests.
URL: https://github.com/apache/druid/pull/9130#discussion_r367661337
 
 

 ##
 File path: processing/src/main/java/org/apache/druid/query/QueryToolChest.java
 ##
 @@ -269,4 +270,50 @@ public ObjectMapper decorateObjectMapper(final 
ObjectMapper objectMapper, final
   {
 return segments;
   }
+
+  /**
+   * Returns a list of field names in the order than {@link #resultsAsArrays} 
would return them. The returned list will
 
 Review comment:
   Yes, it should. I updated it. Thanks.





[GitHub] [druid] clintropolis commented on a change in pull request #9130: Move result-to-array logic from SQL layer into QueryToolChests.

2020-01-16 Thread GitBox
clintropolis commented on a change in pull request #9130: Move result-to-array 
logic from SQL layer into QueryToolChests.
URL: https://github.com/apache/druid/pull/9130#discussion_r367259286
 
 

 ##
 File path: processing/src/main/java/org/apache/druid/query/QueryToolChest.java
 ##
 @@ -269,4 +270,50 @@ public ObjectMapper decorateObjectMapper(final 
ObjectMapper objectMapper, final
   {
 return segments;
   }
+
+  /**
+   * Returns a list of field names in the order than {@link #resultsAsArrays} 
would return them. The returned list will
 
 Review comment:
   nit: should this be 'Returns a list of field names in the order _that_ ...'





[GitHub] [druid] jihoonson commented on a change in pull request #9171: Doc update for the new input source and the new input format

2020-01-16 Thread GitBox
jihoonson commented on a change in pull request #9171: Doc update for the new 
input source and the new input format
URL: https://github.com/apache/druid/pull/9171#discussion_r367659013
 
 

 ##
 File path: website/package-lock.json
 ##
 @@ -3913,8 +3913,7 @@
 "ansi-regex": {
   "version": "2.1.1",
   "bundled": true,
-  "dev": true,
 
 Review comment:
  Oops, this is not supposed to be added. Reverted all changes in this file.





[GitHub] [druid] jihoonson commented on a change in pull request #9171: Doc update for the new input source and the new input format

2020-01-16 Thread GitBox
jihoonson commented on a change in pull request #9171: Doc update for the new 
input source and the new input format
URL: https://github.com/apache/druid/pull/9171#discussion_r367659050
 
 

 ##
 File path: docs/ingestion/data-formats.md
 ##
 @@ -63,155 +65,968 @@ _TSV (Delimited)_
 
 Note that the CSV and TSV data do not contain column heads. This becomes 
important when you specify the data for ingesting.
 
+Besides text formats, Druid also supports binary formats such as [Orc](#orc) 
and [Parquet](#parquet) formats.
+
 ## Custom Formats
 
 Druid supports custom data formats and can use the `Regex` parser or the 
`JavaScript` parsers to parse these formats. Please note that using any of 
these parsers for
 parsing data will not be as efficient as writing a native Java parser or using 
an external stream processor. We welcome contributions of new Parsers.
 
-## Configuration
+## Input Format
+
+> The Input Format is a new way to specify the data format of your input data 
which was introduced in 0.17.0.
+Unfortunately, the Input Format doesn't support all data formats or ingestion 
methods supported by Druid yet.
+Especially if you want to use the Hadoop ingestion, you still need to use the 
[Parser](#parser-deprecated).
+If your data is formatted in some format not listed in this section, please 
consider using the Parser instead.
 
-All forms of Druid ingestion require some form of schema object. The format of 
the data to be ingested is specified using the`parseSpec` entry in your 
`dataSchema`.
+All forms of Druid ingestion require some form of schema object. The format of 
the data to be ingested is specified using the `inputFormat` entry in your 
[`ioConfig`](index.md#ioconfig).
 
 ### JSON
 
+The `inputFormat` to load data of JSON format. An example is:
+
+```json
+"ioConfig": {
+  "inputFormat": {
+"type": "json"
+  },
+  ...
+}
+```
+
+The JSON `inputFormat` has the following components:
+
+| Field | Type | Description | Required |
+|---|--|-|--|
+| type | String | This should say `json`. | yes |
+| flattenSpec | JSON Object | Specifies flattening configuration for nested 
JSON data. See [`flattenSpec`](#flattenspec) for more info. | no |
+| featureSpec | JSON Object | [JSON parser 
features](https://github.com/FasterXML/jackson-core/wiki/JsonParser-Features) 
supported by Jackson library. Those features will be applied when parsing the 
input JSON data. | no |
+
+### CSV
+
+The `inputFormat` to load data of the CSV format. An example is:
+
+```json
+"ioConfig": {
+  "inputFormat": {
+"type": "csv",
+"columns" : 
["timestamp","page","language","user","unpatrolled","newPage","robot","anonymous","namespace","continent","country","region","city","added","deleted","delta"]
+  },
+  ...
+}
+```
+
+The CSV `inputFormat` has the following components:
+
+| Field | Type | Description | Required |
+|---|--|-|--|
+| type | String | This should say `csv`. | yes |
+| listDelimiter | String | A custom delimiter for multi-value dimensions. | no 
(default == ctrl+A) |
+| columns | JSON array | Specifies the columns of the data. The columns should 
be in the same order with the columns of your data. | yes if 
`findColumnsFromHeader` is false or missing |
+| findColumnsFromHeader | Boolean | If this is set, the task will find the 
column names from the header row. Note that `skipHeaderRows` will be applied 
before finding column names from the header. For example, if you set 
`skipHeaderRows` to 2 and `findColumnsFromHeader` to true, the task will skip 
the first two lines and then extract column information from the third line. 
`columns` will be ignored if this is set to true. | no (default = false if 
`columns` is set; otherwise null) |
+| skipHeaderRows | Integer | If this is set, the task will skip the first 
`skipHeaderRows` rows. | no (default = 0) |
+
+### TSV (Delimited)
+
+```json
+"ioConfig": {
+  "inputFormat": {
+"type": "tsv",
+"columns" : 
["timestamp","page","language","user","unpatrolled","newPage","robot","anonymous","namespace","continent","country","region","city","added","deleted","delta"],
+"delimiter":"|"
+  },
+  ...
+}
+```
+
+The `inputFormat` to load data of a delimited format. An example is:
+
+| Field | Type | Description | Required |
+|---|--|-|--|
+| type | String | This should say `tsv`. | yes |
+| delimiter | String | A custom delimiter for data values. | no (default == 
`\t`) |
+| listDelimiter | String | A custom delimiter for multi-value dimensions. | no 
(default == ctrl+A) |
+| columns | JSON array | Specifies the columns of the data. The columns should 
be in the same order with the columns of your data. | yes if 
`findColumnsFromHeader` is false or missing |
+| findColumnsFromHeader | Boolean | If this is set, the task will find the 
column names from the header row. Note that `skipHeaderRows` will be applied 
before finding column names from the 

[GitHub] [druid] jihoonson commented on a change in pull request #9171: Doc update for the new input source and the new input format

2020-01-16 Thread GitBox
jihoonson commented on a change in pull request #9171: Doc update for the new 
input source and the new input format
URL: https://github.com/apache/druid/pull/9171#discussion_r367659031
 
 

 ##
 File path: docs/ingestion/index.md
 ##
 @@ -287,44 +289,31 @@ definition is an _ingestion spec_.
 
 Ingestion specs consists of three main components:
 
-- [`dataSchema`](#dataschema), which configures the [datasource 
name](#datasource), [input row parser](#parser),
-   [primary timestamp](#timestampspec), [flattening of nested 
data](#flattenspec) (if needed),
-   [dimensions](#dimensionsspec), [metrics](#metricsspec), and [transforms and 
filters](#transformspec) (if needed).
-- [`ioConfig`](#ioconfig), which tells Druid how to connect to the source 
system and . For more information, see the
+- [`dataSchema`](#dataschema), which configures the [datasource 
name](#datasource),
+   [primary timestamp](#timestampspec), [dimensions](#dimensionsspec), 
[metrics](#metricsspec), and [transforms and filters](#transformspec) (if 
needed).
+- [`ioConfig`](#ioconfig), which tells Druid how to connect to the source 
system and how to parse data. For more information, see the
documentation for each [ingestion method](#ingestion-methods).
 - [`tuningConfig`](#tuningconfig), which controls various tuning parameters 
specific to each
   [ingestion method](#ingestion-methods).
 
-Example ingestion spec for task type "index" (native batch):
+Example ingestion spec for task type `parallel_index` (native batch):
 
 ```
 {
-  "type": "index",
+  "type": "parallel_index",
 
 Review comment:
   Oops, thanks. Fixed.





[GitHub] [druid] jihoonson commented on a change in pull request #9171: Doc update for the new input source and the new input format

2020-01-16 Thread GitBox
jihoonson commented on a change in pull request #9171: Doc update for the new 
input source and the new input format
URL: https://github.com/apache/druid/pull/9171#discussion_r367658914
 
 

 ##
 File path: docs/development/extensions-core/hdfs.md
 ##
 @@ -36,49 +36,105 @@ To use this Apache Druid extension, make sure to 
[include](../../development/ext
 |`druid.hadoop.security.kerberos.principal`|`dr...@example.com`| Principal 
user name |empty|
 
|`druid.hadoop.security.kerberos.keytab`|`/etc/security/keytabs/druid.headlessUser.keytab`|Path
 to keytab file|empty|
 
-If you are using the Hadoop indexer, set your output directory to be a 
location on Hadoop and it will work.
+Besides the above settings, you also need to include all Hadoop configuration 
files (such as `core-site.xml`, `hdfs-site.xml`)
+in the Druid classpath. One way to do this is copying all those files under 
`${DRUID_HOME}/conf/_common`.
+
+If you are using the Hadoop ingestion, set your output directory to be a 
location on Hadoop and it will work.
 If you want to eagerly authenticate against a secured hadoop/hdfs cluster you 
must set `druid.hadoop.security.kerberos.principal` and 
`druid.hadoop.security.kerberos.keytab`, this is an alternative to the cron job 
method that runs `kinit` command periodically.
 
-### Configuration for Google Cloud Storage
+### Configuration for Cloud Storage
+
+You can also use the AWS S3 or the Google Cloud Storage as the deep storage 
via HDFS.
+
+ Configuration for AWS S3
 
-The HDFS extension can also be used for GCS as deep storage.
+To use the AWS S3 as the deep storage, you need to configure 
`druid.storage.storageDirectory` properly.
 
 |Property|Possible Values|Description|Default|
 ||---|---|---|
-|`druid.storage.type`|hdfs||Must be set.|
-|`druid.storage.storageDirectory`||gs://bucket/example/directory|Must be set.|
+|`druid.storage.type`|hdfs| |Must be set.|
+|`druid.storage.storageDirectory`|s3a://bucket/example/directory or 
s3n://bucket/example/directory|Path to the deep storage|Must be set.|
 
-All services that need to access GCS need to have the [GCS connector 
jar](https://cloud.google.com/hadoop/google-cloud-storage-connector#manualinstallation)
 in their class path. One option is to place this jar in /lib/ and 
/extensions/druid-hdfs-storage/
+You also need to include the [Hadoop AWS 
module](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html),
 especially the `hadoop-aws.jar` in the Druid classpath.
+Run the below command to install the `hadoop-aws.jar` file under 
`${DRUID_HOME}/extensions/druid-hdfs-storage` in all nodes.
 
-Tested with Druid 0.9.0, Hadoop 2.7.2 and gcs-connector jar 1.4.4-hadoop2.
-
-
+```bash
+java -classpath "${DRUID_HOME}lib/*" org.apache.druid.cli.Main tools pull-deps 
-h "org.apache.hadoop:hadoop-aws:${HADOOP_VERSION}";
+cp 
${DRUID_HOME}/hadoop-dependencies/hadoop-aws/${HADOOP_VERSION}/hadoop-aws-${HADOOP_VERSION}.jar
 ${DRUID_HOME}/extensions/druid-hdfs-storage/
+```
 
-## Native batch ingestion
+Finally, you need to add the below properties in the `core-site.xml`.
+For more configurations, see the [Hadoop AWS 
module](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html).
+
+```xml
+
+<property>
+  <name>fs.s3a.impl</name>
+  <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
+  <description>The implementation class of the S3A Filesystem</description>
+</property>
+
+<property>
+  <name>fs.AbstractFileSystem.s3a.impl</name>
+  <value>org.apache.hadoop.fs.s3a.S3A</value>
+  <description>The implementation class of the S3A AbstractFileSystem.</description>
+</property>
+
+<property>
+  <name>fs.s3a.access.key</name>
+  <description>AWS access key ID. Omit for IAM role-based or provider-based authentication.</description>
+  <value>your access key</value>
+</property>
+
+<property>
+  <name>fs.s3a.secret.key</name>
+  <description>AWS secret key. Omit for IAM role-based or provider-based authentication.</description>
+  <value>your secret key</value>
+</property>
+
+```
 
-This firehose ingests events from a predefined list of files from a Hadoop 
filesystem.
-This firehose is _splittable_ and can be used by [native parallel index 
tasks](../../ingestion/native-batch.md#parallel-task).
-Since each split represents an HDFS file, each worker task of `index_parallel` 
will read an object.
+ Configuration for Google Cloud Storage
 
-Sample spec:
+To use the Google cloud Storage as the deep storage, you need to configure 
`druid.storage.storageDirectory` properly.
 
-```json
-"firehose" : {
-"type" : "hdfs",
-"paths": "/foo/bar,/foo/baz"
-}
+|Property|Possible Values|Description|Default|
+||---|---|---|
+|`druid.storage.type`|hdfs||Must be set.|
+|`druid.storage.storageDirectory`|gs://bucket/example/directory|Path to the 
deep storage|Must be set.|
+
+All services that need to access GCS need to have the [GCS connector 
jar](https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcs/INSTALL.md)
 in their class path.
+One option is to place this jar in `${DRUID_HOME}/lib/` and 
`${DRUID_HOME}/extensions/druid-hdfs-storage/`.
+
+Finally, you need to add the below properties in the `core-site.xml`.
+For 

[GitHub] [druid] jihoonson commented on a change in pull request #9171: Doc update for the new input source and the new input format

2020-01-16 Thread GitBox
jihoonson commented on a change in pull request #9171: Doc update for the new 
input source and the new input format
URL: https://github.com/apache/druid/pull/9171#discussion_r367658976
 
 

 ##
 File path: docs/development/extensions-core/kafka-ingestion.md
 ##
 @@ -60,22 +60,16 @@ A sample supervisor spec is shown below:
   "type": "kafka",
   "dataSchema": {
 "dataSource": "metrics-kafka",
-"parser": {
 
 Review comment:
   Thanks.





[GitHub] [druid] gianm commented on issue #9181: Speed up String first/last aggregators when folding isn't needed.

2020-01-16 Thread GitBox
gianm commented on issue #9181: Speed up String first/last aggregators when 
folding isn't needed.
URL: https://github.com/apache/druid/pull/9181#issuecomment-575353132
 
 
   @clintropolis Thanks for reviewing. I updated the patch to reflect your 
comments.





[GitHub] [druid] gianm commented on a change in pull request #9181: Speed up String first/last aggregators when folding isn't needed.

2020-01-16 Thread GitBox
gianm commented on a change in pull request #9181: Speed up String first/last 
aggregators when folding isn't needed.
URL: https://github.com/apache/druid/pull/9181#discussion_r367656532
 
 

 ##
 File path: 
processing/src/test/java/org/apache/druid/query/aggregation/last/StringLastBufferAggregatorTest.java
 ##
 @@ -81,6 +82,43 @@ public void testBufferAggregate()
 
   }
 
+  @Test
+  public void testBufferAggregateWithFoldCheck()
+  {
+final long[] timestamps = {1526724600L, 1526724700L, 1526724800L, 
1526725900L, 1526725000L};
+final String[] strings = {"", "", "", "", ""};
+Integer maxStringBytes = 1024;
+
+TestLongColumnSelector longColumnSelector = new 
TestLongColumnSelector(timestamps);
+TestObjectColumnSelector objectColumnSelector = new 
TestObjectColumnSelector<>(strings);
+
+StringLastAggregatorFactory factory = new StringLastAggregatorFactory(
+"billy", "billy", maxStringBytes
+);
+
+StringLastBufferAggregator agg = new StringLastBufferAggregator(
+longColumnSelector,
+objectColumnSelector,
+maxStringBytes,
+true
+);
+
+ByteBuffer buf = ByteBuffer.allocate(factory.getMaxIntermediateSize());
+int position = 0;
+
+agg.init(buf, position);
+//noinspection ForLoopReplaceableByForEach
+for (int i = 0; i < timestamps.length; i++) {
+  aggregateBuffer(longColumnSelector, objectColumnSelector, agg, buf, 
position);
+}
+
+SerializablePairLongString sp = ((SerializablePairLongString) agg.get(buf, 
position));
+
+
+Assert.assertEquals("expectec last string value", "", sp.rhs);
 
 Review comment:
   Updated.





[GitHub] [druid] gianm commented on a change in pull request #9181: Speed up String first/last aggregators when folding isn't needed.

2020-01-16 Thread GitBox
gianm commented on a change in pull request #9181: Speed up String first/last 
aggregators when folding isn't needed.
URL: https://github.com/apache/druid/pull/9181#discussion_r367656434
 
 

 ##
 File path: 
processing/src/main/java/org/apache/druid/query/aggregation/first/StringFirstLastUtils.java
 ##
 @@ -33,23 +36,63 @@
 {
   private static final int NULL_VALUE = -1;
 
+  /**
+   * Shorten "s" to "maxBytes" chars. Fast and loose because these are *chars* 
not *bytes*. Use
+   * {@link #chop(String, int)} for slower, but accurate chopping.
+   */
+  @Nullable
+  public static String fastLooseChop(@Nullable final String s, final int 
maxBytes)
+  {
+if (s == null || s.length() <= maxBytes) {
+  return s;
+} else {
+  return s.substring(0, maxBytes);
+}
+  }
+
+  /**
+   * Shorten "s" to what could fit in "maxBytes" bytes as UTF-8.
+   */
   @Nullable
   public static String chop(@Nullable final String s, final int maxBytes)
   {
 if (s == null) {
   return null;
 } else {
-  // Shorten firstValue to what could fit in maxBytes as UTF-8.
   final byte[] bytes = new byte[maxBytes];
   final int len = StringUtils.toUtf8WithLimit(s, ByteBuffer.wrap(bytes));
   return new String(bytes, 0, len, StandardCharsets.UTF_8);
 }
   }
 
+  /**
+   * Returns whether a given value selector *might* contain 
SerializablePairLongString objects.
+   */
+  public static boolean selectorNeedsFoldCheck(
+  final BaseObjectColumnValueSelector valueSelector,
+  @Nullable final ColumnCapabilities valueSelectorCapabilities
+  )
+  {
+if (valueSelectorCapabilities != null && 
valueSelectorCapabilities.getType() != ValueType.COMPLEX) {
+  // Known, non-complex type.
+  return false;
+}
+
+if (valueSelector instanceof NilColumnValueSelector) {
+  // Nil column, definitely no SerializablePairLongStrings.
+  return false;
+}
+
+// Check if the reported class could possibly be 
SerializablePairLongString.
+final Class clazz = valueSelector.classOfObject();
+return clazz.isAssignableFrom(SerializablePairLongString.class)
 
 Review comment:
   I changed it to:
   
   ```java
   // Check if the selector class could possibly be a 
SerializablePairLongString (either a superclass or subclass).
   ```
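
    For readers following the thread: `Class#isAssignableFrom` is directional, so covering "either a superclass or subclass" takes a check in each direction. A minimal, JDK-only sketch (the classes here are illustrative, not the PR's code):

    ```java
    public class AssignableDemo
    {
      public static void main(String[] args)
      {
        // Superclass direction: a selector that reports Number.class may still
        // hand back Integer instances at runtime.
        System.out.println(Number.class.isAssignableFrom(Integer.class)); // true

        // Subclass direction: the reverse check is separate.
        System.out.println(Integer.class.isAssignableFrom(Number.class)); // false

        // A "could this possibly be an Integer?" guard therefore tests both ways.
        Class<?> clazz = Number.class;
        System.out.println(clazz.isAssignableFrom(Integer.class)
                           || Integer.class.isAssignableFrom(clazz)); // true
      }
    }
    ```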





[GitHub] [druid] gianm opened a new pull request #9196: Add javadocs and small improvements to join code.

2020-01-16 Thread GitBox
gianm opened a new pull request #9196: Add javadocs and small improvements to 
join code.
URL: https://github.com/apache/druid/pull/9196
 
 
   A follow-up to #9111.





[GitHub] [druid] gianm merged pull request #9111: Add HashJoinSegment, a virtual segment for joins.

2020-01-16 Thread GitBox
gianm merged pull request #9111: Add HashJoinSegment, a virtual segment for 
joins.
URL: https://github.com/apache/druid/pull/9111
 
 
   





[druid] branch master updated (09efd20 -> a87db7f)

2020-01-16 Thread gian
This is an automated email from the ASF dual-hosted git repository.

gian pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git.


from 09efd20  fix refresh button (#9195)
 add a87db7f  Add HashJoinSegment, a virtual segment for joins. (#9111)

No new revisions were added by this update.

Summary of changes:
 .../apache/druid/common/config/NullHandling.java   |   23 +
 .../java/org/apache/druid/math/expr/Exprs.java |   71 +
 .../druid/common/config/NullHandlingTest.java  |   90 ++
 .../java/org/apache/druid/math/expr/ExprsTest.java |   99 ++
 .../apache/druid/server/lookup/LoadingLookup.java  |   38 +-
 .../apache/druid/server/lookup/PollingLookup.java  |   13 +
 processing/pom.xml |5 +
 .../query/dimension/DefaultDimensionSpec.java  |7 +
 .../druid/query/dimension/DimensionSpec.java   |6 +
 .../query/dimension/ExtractionDimensionSpec.java   |6 +
 .../query/dimension/ListFilteredDimensionSpec.java |7 +
 .../druid/query/dimension/LookupDimensionSpec.java |   15 +
 .../dimension/PrefixFilteredDimensionSpec.java |7 +
 .../dimension/RegexFilteredDimensionSpec.java  |7 +
 .../druid/query/extraction/MapLookupExtractor.java |   12 +
 ... VectorValueMatcherColumnProcessorFactory.java} |   20 +-
 .../druid/query/groupby/GroupByQueryHelper.java|1 +
 .../epinephelinae/RowBasedGrouperHelper.java   |9 +-
 ...va => GroupByVectorColumnProcessorFactory.java} |   20 +-
 .../epinephelinae/vector/VectorGroupByEngine.java  |2 +-
 .../apache/druid/query/lookup/LookupExtractor.java |   13 +-
 .../timeseries/TimeseriesQueryQueryToolChest.java  |2 +-
 .../druid/segment/ColumnProcessorFactory.java  |   56 +
 .../org/apache/druid/segment/ColumnProcessors.java |  144 ++
 .../druid/segment/DimensionHandlerUtils.java   |   24 +-
 .../segment/QueryableIndexStorageAdapter.java  |   16 +-
 .../ColumnCapabilities.java => RowAdapter.java}|   28 +-
 .../RowBasedColumnSelectorFactory.java |   85 +-
 .../org/apache/druid/segment/StorageAdapter.java   |2 -
 .../VectorColumnProcessorFactory.java} |   23 +-
 .../org/apache/druid/segment/VirtualColumns.java   |   25 +-
 .../druid/segment/column/ColumnCapabilities.java   |2 +-
 .../apache/druid/segment/filter/BoundFilter.java   |4 +-
 .../segment/filter/DimensionPredicateFilter.java   |4 +-
 .../org/apache/druid/segment/filter/InFilter.java  |4 +-
 .../apache/druid/segment/filter/LikeFilter.java|4 +-
 .../druid/segment/filter/SelectorFilter.java   |4 +-
 .../segment/incremental/IncrementalIndex.java  |2 +-
 .../IncrementalIndexStorageAdapter.java|6 -
 .../org/apache/druid/segment/join/Equality.java|   60 +
 .../apache/druid/segment/join/HashJoinEngine.java  |  211 +++
 .../apache/druid/segment/join/HashJoinSegment.java |   98 ++
 .../join/HashJoinSegmentStorageAdapter.java|  279 
 .../druid/segment/join/JoinConditionAnalysis.java  |  182 +++
 .../org/apache/druid/segment/join/JoinMatcher.java |   83 ++
 .../org/apache/druid/segment/join/JoinType.java|   89 ++
 .../org/apache/druid/segment/join/Joinable.java|   74 ++
 .../apache/druid/segment/join/JoinableClause.java  |  145 ++
 .../join/PossiblyNullColumnValueSelector.java  |   86 ++
 .../join/PossiblyNullDimensionSelector.java|  191 +++
 .../apache/druid/segment/join/PostJoinCursor.java  |  121 ++
 .../join/lookup/LookupColumnSelectorFactory.java   |  113 ++
 .../segment/join/lookup/LookupJoinMatcher.java |  312 +
 .../druid/segment/join/lookup/LookupJoinable.java  |   86 ++
 .../table/IndexedTable.java}   |   51 +-
 .../table/IndexedTableColumnSelectorFactory.java   |  104 ++
 .../table/IndexedTableColumnValueSelector.java |  132 ++
 .../join/table/IndexedTableDimensionSelector.java  |  144 ++
 .../join/table/IndexedTableJoinMatcher.java|  310 +
 .../segment/join/table/IndexedTableJoinable.java   |   78 ++
 .../segment/join/table/RowBasedIndexedTable.java   |  166 +++
 .../join/table/SortedIntIntersectionIterator.java  |   98 ++
 .../druid/segment/transform/Transformer.java   |2 +-
 .../druid/segment/virtual/ExpressionSelectors.java |3 +-
 .../query/extraction/MapLookupExtractorTest.java   |   26 +-
 .../topn/TopNMetricSpecOptimizationsTest.java  |6 -
 .../druid/segment/filter/BaseFilterTest.java   |2 +-
 .../join/HashJoinSegmentStorageAdapterTest.java| 1390 
 .../druid/segment/join/HashJoinSegmentTest.java|  138 ++
 .../segment/join/JoinConditionAnalysisTest.java|  293 +
 .../apache/druid/segment/join/JoinTestHelper.java  |  351 +
 .../druid/segment/join/JoinableClauseTest.java |  113 ++
 .../join/PossiblyNullDimensionSelectorTest.java|  143 ++
 .../join/table/RowBasedIndexedTableTest.java   |  183 +++
 

[GitHub] [druid] jon-wei commented on a change in pull request #9187: Implement ANY aggregator

2020-01-16 Thread GitBox
jon-wei commented on a change in pull request #9187: Implement ANY aggregator
URL: https://github.com/apache/druid/pull/9187#discussion_r367652349
 
 

 ##
 File path: docs/querying/sql.md
 ##
 @@ -203,6 +203,10 @@ Only the COUNT aggregation can accept DISTINCT.
 |`EARLIEST(expr, maxBytesPerString)`|Like `EARLIEST(expr)`, but for strings. 
The `maxBytesPerString` parameter determines how much aggregation space to 
allocate per string. Strings longer than this limit will be truncated. This 
parameter should be set as low as possible, since high values will lead to 
wasted memory.|
 |`LATEST(expr)`|Returns the latest non-null value of `expr`, which must be 
numeric. If `expr` comes from a relation with a timestamp column (like a Druid 
datasource) then "latest" is the value last encountered with the maximum 
overall timestamp of all values being aggregated. If `expr` does not come from 
a relation with a timestamp, then it is simply the last value encountered.|
 |`LATEST(expr, maxBytesPerString)`|Like `LATEST(expr)`, but for strings. The 
`maxBytesPerString` parameter determines how much aggregation space to allocate 
per string. Strings longer than this limit will be truncated. This parameter 
should be set as low as possible, since high values will lead to wasted memory.|
+|`ANY_VALUE(expr)`|Returns any value of `expr`, which must be numeric. If 
`druid.generic.useDefaultValueForNull=true` this can returns the default value 
for null and does not prefer "non-null" values over the default value for null. 
If `druid.generic.useDefaultValueForNull=false`, then this will returns any 
non-null value of `expr`|
 
 Review comment:
    Hm, looks like the docs are out of date for those; we can fix them later





[GitHub] [druid] jihoonson commented on a change in pull request #9111: Add HashJoinSegment, a virtual segment for joins.

2020-01-16 Thread GitBox
jihoonson commented on a change in pull request #9111: Add HashJoinSegment, a 
virtual segment for joins.
URL: https://github.com/apache/druid/pull/9111#discussion_r367647052
 
 

 ##
 File path: 
processing/src/main/java/org/apache/druid/segment/join/table/SortedIntIntersectionIterator.java
 ##
 @@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.join.table;
+
+import com.google.common.base.Preconditions;
+import it.unimi.dsi.fastutil.ints.IntIterator;
+
+import java.util.Arrays;
+import java.util.NoSuchElementException;
+
+/**
+ * Iterates over the intersection of an array of sorted int lists. Intended 
for situations where the number
 
 Review comment:
   Oh, I missed that part. Sounds good.





[GitHub] [druid] gianm commented on a change in pull request #9111: Add HashJoinSegment, a virtual segment for joins.

2020-01-16 Thread GitBox
gianm commented on a change in pull request #9111: Add HashJoinSegment, a 
virtual segment for joins.
URL: https://github.com/apache/druid/pull/9111#discussion_r367641594
 
 

 ##
 File path: 
processing/src/main/java/org/apache/druid/segment/join/table/SortedIntIntersectionIterator.java
 ##
 @@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.join.table;
+
+import com.google.common.base.Preconditions;
+import it.unimi.dsi.fastutil.ints.IntIterator;
+
+import java.util.Arrays;
+import java.util.NoSuchElementException;
+
+/**
+ * Iterates over the intersection of an array of sorted int lists. Intended 
for situations where the number
 
 Review comment:
   Even though the next sentence says "The iterators must be composed of 
ascending, nonnegative ints."?





[GitHub] [druid] jihoonson commented on a change in pull request #9111: Add HashJoinSegment, a virtual segment for joins.

2020-01-16 Thread GitBox
jihoonson commented on a change in pull request #9111: Add HashJoinSegment, a 
virtual segment for joins.
URL: https://github.com/apache/druid/pull/9111#discussion_r367591385
 
 

 ##
 File path: 
processing/src/main/java/org/apache/druid/segment/join/table/SortedIntIntersectionIterator.java
 ##
 @@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.join.table;
+
+import com.google.common.base.Preconditions;
+import it.unimi.dsi.fastutil.ints.IntIterator;
+
+import java.util.Arrays;
+import java.util.NoSuchElementException;
+
+/**
+ * Iterates over the intersection of an array of sorted int lists. Intended 
for situations where the number
 
 Review comment:
   nit: probably better to be `sorted positive int lists`.
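
    As background on what this iterator does: the two-list special case of intersecting ascending int lists can be sketched as follows (a simplified illustration, not the PR's n-way implementation):

    ```java
    import java.util.ArrayList;
    import java.util.List;

    public class SortedIntersectSketch
    {
      // Intersect two ascending int arrays by advancing whichever side is behind.
      static List<Integer> intersect(int[] a, int[] b)
      {
        final List<Integer> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.length && j < b.length) {
          if (a[i] == b[j]) {
            out.add(a[i]);
            i++;
            j++;
          } else if (a[i] < b[j]) {
            i++;
          } else {
            j++;
          }
        }
        return out;
      }

      public static void main(String[] args)
      {
        // Prints [3, 5]
        System.out.println(intersect(new int[]{1, 3, 5, 7}, new int[]{3, 4, 5, 8}));
      }
    }
    ```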





[GitHub] [druid] jihoonson commented on a change in pull request #9111: Add HashJoinSegment, a virtual segment for joins.

2020-01-16 Thread GitBox
jihoonson commented on a change in pull request #9111: Add HashJoinSegment, a 
virtual segment for joins.
URL: https://github.com/apache/druid/pull/9111#discussion_r367608679
 
 

 ##
 File path: 
processing/src/main/java/org/apache/druid/segment/join/HashJoinEngine.java
 ##
 @@ -0,0 +1,211 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.join;
+
+import org.apache.druid.query.BaseQuery;
+import org.apache.druid.query.dimension.DimensionSpec;
+import org.apache.druid.segment.ColumnSelectorFactory;
+import org.apache.druid.segment.ColumnValueSelector;
+import org.apache.druid.segment.Cursor;
+import org.apache.druid.segment.DimensionSelector;
+import org.apache.druid.segment.column.ColumnCapabilities;
+import org.joda.time.DateTime;
+
+import javax.annotation.Nonnull;
+import javax.annotation.Nullable;
+
+public class HashJoinEngine
+{
+  private HashJoinEngine()
+  {
+// No instantiation.
+  }
+
+  /**
+   * Creates a cursor that represents the join of {@param leftCursor} with 
{@param joinableClause}. The resulting
+   * cursor may generate nulls on the left-hand side (for righty joins; see 
{@link JoinType#isRighty()}) or on
+   * the right-hand side (for lefty joins; see {@link JoinType#isLefty()}). 
Columns that start with the
+   * joinable clause's prefix (see {@link JoinableClause#getPrefix()}) will 
come from the Joinable's column selector
+   * factory, and all other columns will come from the leftCursor's column 
selector factory.
+   *
+   * Ensuing that the joinable clause's prefix does not conflict with any 
columns from "leftCursor" is the
+   * responsibility of the caller.
+   */
+  public static Cursor makeJoinCursor(final Cursor leftCursor, final 
JoinableClause joinableClause)
+  {
+final ColumnSelectorFactory leftColumnSelectorFactory = 
leftCursor.getColumnSelectorFactory();
+final JoinMatcher joinMatcher = joinableClause.getJoinable()
+  .makeJoinMatcher(
+  
leftColumnSelectorFactory,
+  
joinableClause.getCondition(),
+  
joinableClause.getJoinType().isRighty()
+  );
+
+class JoinColumnSelectorFactory implements ColumnSelectorFactory
+{
+  @Override
+  public DimensionSelector makeDimensionSelector(DimensionSpec 
dimensionSpec)
+  {
+if (joinableClause.includesColumn(dimensionSpec.getDimension())) {
+  return joinMatcher.getColumnSelectorFactory()
+.makeDimensionSelector(
+
dimensionSpec.withDimension(joinableClause.unprefix(dimensionSpec.getDimension()))
+);
+} else {
+  final DimensionSelector leftSelector = 
leftColumnSelectorFactory.makeDimensionSelector(dimensionSpec);
+
+  if (!joinableClause.getJoinType().isRighty()) {
+return leftSelector;
+  } else {
+return new PossiblyNullDimensionSelector(leftSelector, 
joinMatcher::matchingRemainder);
+  }
+}
+  }
+
+  @Override
+  public ColumnValueSelector makeColumnValueSelector(String column)
+  {
+if (joinableClause.includesColumn(column)) {
+  return 
joinMatcher.getColumnSelectorFactory().makeColumnValueSelector(joinableClause.unprefix(column));
+} else {
+  final ColumnValueSelector leftSelector = 
leftColumnSelectorFactory.makeColumnValueSelector(column);
+
+  if (!joinableClause.getJoinType().isRighty()) {
+return leftSelector;
+  } else {
+return new PossiblyNullColumnValueSelector<>(leftSelector, 
joinMatcher::matchingRemainder);
+  }
+}
+  }
+
+  @Nullable
+  @Override
+  public ColumnCapabilities getColumnCapabilities(String column)
+  {
+if (joinableClause.includesColumn(column)) {
+  return 
joinMatcher.getColumnSelectorFactory().getColumnCapabilities(joinableClause.unprefix(column));
+} else {
+  

[GitHub] [druid] maytasm3 commented on a change in pull request #9187: Implement ANY aggregator

2020-01-16 Thread GitBox
maytasm3 commented on a change in pull request #9187: Implement ANY aggregator
URL: https://github.com/apache/druid/pull/9187#discussion_r367628285
 
 

 ##
 File path: 
sql/src/test/java/org/apache/druid/sql/calcite/util/CalciteTests.java
 ##
 @@ -377,6 +377,15 @@ public AuthenticationResult 
createEscalatedAuthenticationResult()
   );
 
   public static final List ROWS1_WITH_NUMERIC_DIMS = 
ImmutableList.of(
+  createRow(
 
 Review comment:
   Actually, I think it's fine to just test with the same numfoo datasource 
(with first row being non-null)





[GitHub] [druid] maytasm3 commented on a change in pull request #9187: Implement ANY aggregator

2020-01-16 Thread GitBox
maytasm3 commented on a change in pull request #9187: Implement ANY aggregator
URL: https://github.com/apache/druid/pull/9187#discussion_r367624495
 
 

 ##
 File path: docs/querying/sql.md
 ##
 @@ -203,6 +203,10 @@ Only the COUNT aggregation can accept DISTINCT.
 |`EARLIEST(expr, maxBytesPerString)`|Like `EARLIEST(expr)`, but for strings. 
The `maxBytesPerString` parameter determines how much aggregation space to 
allocate per string. Strings longer than this limit will be truncated. This 
parameter should be set as low as possible, since high values will lead to 
wasted memory.|
 |`LATEST(expr)`|Returns the latest non-null value of `expr`, which must be 
numeric. If `expr` comes from a relation with a timestamp column (like a Druid 
datasource) then "latest" is the value last encountered with the maximum 
overall timestamp of all values being aggregated. If `expr` does not come from 
a relation with a timestamp, then it is simply the last value encountered.|
 |`LATEST(expr, maxBytesPerString)`|Like `LATEST(expr)`, but for strings. The 
`maxBytesPerString` parameter determines how much aggregation space to allocate 
per string. Strings longer than this limit will be truncated. This parameter 
should be set as low as possible, since high values will lead to wasted memory.|
+|`ANY_VALUE(expr)`|Returns any value of `expr`, which must be numeric. If 
`druid.generic.useDefaultValueForNull=true` this can returns the default value 
for null and does not prefer "non-null" values over the default value for null. 
If `druid.generic.useDefaultValueForNull=false`, then this will returns any 
non-null value of `expr`|
+|`ANY_VALUE(expr, maxBytesPerString)`|Like `ANY_VALUE(expr)`, but for strings. 
The `maxBytesPerString` parameter determines how much aggregation space to 
allocate per string. Strings longer than this limit will be truncated. This 
parameter should be set as low as possible, since high values will lead to 
wasted memory.|
 
 Review comment:
   Let's discuss. We can change this behaviour for LATEST, EARLIEST (and ANY)





[GitHub] [druid] maytasm3 commented on a change in pull request #9187: Implement ANY aggregator

2020-01-16 Thread GitBox
maytasm3 commented on a change in pull request #9187: Implement ANY aggregator
URL: https://github.com/apache/druid/pull/9187#discussion_r367624288
 
 

 ##
 File path: docs/querying/sql.md
 ##
 @@ -203,6 +203,10 @@ Only the COUNT aggregation can accept DISTINCT.
 |`EARLIEST(expr, maxBytesPerString)`|Like `EARLIEST(expr)`, but for strings. 
The `maxBytesPerString` parameter determines how much aggregation space to 
allocate per string. Strings longer than this limit will be truncated. This 
parameter should be set as low as possible, since high values will lead to 
wasted memory.|
 |`LATEST(expr)`|Returns the latest non-null value of `expr`, which must be 
numeric. If `expr` comes from a relation with a timestamp column (like a Druid 
datasource) then "latest" is the value last encountered with the maximum 
overall timestamp of all values being aggregated. If `expr` does not come from 
a relation with a timestamp, then it is simply the last value encountered.|
 |`LATEST(expr, maxBytesPerString)`|Like `LATEST(expr)`, but for strings. The 
`maxBytesPerString` parameter determines how much aggregation space to allocate 
per string. Strings longer than this limit will be truncated. This parameter 
should be set as low as possible, since high values will lead to wasted memory.|
+|`ANY_VALUE(expr)`|Returns any value of `expr`, which must be numeric. If 
`druid.generic.useDefaultValueForNull=true` this can returns the default value 
for null and does not prefer "non-null" values over the default value for null. 
If `druid.generic.useDefaultValueForNull=false`, then this will returns any 
non-null value of `expr`|
+|`ANY_VALUE(expr, maxBytesPerString)`|Like `ANY_VALUE(expr)`, but for strings. 
The `maxBytesPerString` parameter determines how much aggregation space to 
allocate per string. Strings longer than this limit will be truncated. This 
parameter should be set as low as possible, since high values will lead to 
wasted memory.|
 
 Review comment:
   Currently, the implementation for LATEST, EARLIEST (and ANY since I based it 
off LATEST, EARLIEST) is that if you use the json stuff, then maxStringBytes is 
optional and if not present will default to 1024 (as per the docs in 
docs/querying/aggregations.md). 
   However, this does not work the same if you issue the query through SQL. To 
use LATEST, EARLIEST (and ANY) in SQL, you must give the maxStringBytes as the 
second argument. If you do not, then the column actually gets cast into double 
(super weird).  





[GitHub] [druid] jon-wei commented on a change in pull request #9187: Implement ANY aggregator

2020-01-16 Thread GitBox
jon-wei commented on a change in pull request #9187: Implement ANY aggregator
URL: https://github.com/apache/druid/pull/9187#discussion_r367616811
 
 

 ##
 File path: docs/querying/sql.md
 ##
 @@ -203,6 +203,10 @@ Only the COUNT aggregation can accept DISTINCT.
 |`EARLIEST(expr, maxBytesPerString)`|Like `EARLIEST(expr)`, but for strings. 
The `maxBytesPerString` parameter determines how much aggregation space to 
allocate per string. Strings longer than this limit will be truncated. This 
parameter should be set as low as possible, since high values will lead to 
wasted memory.|
 |`LATEST(expr)`|Returns the latest non-null value of `expr`, which must be 
numeric. If `expr` comes from a relation with a timestamp column (like a Druid 
datasource) then "latest" is the value last encountered with the maximum 
overall timestamp of all values being aggregated. If `expr` does not come from 
a relation with a timestamp, then it is simply the last value encountered.|
 |`LATEST(expr, maxBytesPerString)`|Like `LATEST(expr)`, but for strings. The 
`maxBytesPerString` parameter determines how much aggregation space to allocate 
per string. Strings longer than this limit will be truncated. This parameter 
should be set as low as possible, since high values will lead to wasted memory.|
+|`ANY_VALUE(expr)`|Returns any value of `expr`, which must be numeric. If 
`druid.generic.useDefaultValueForNull=true` this can returns the default value 
for null and does not prefer "non-null" values over the default value for null. 
If `druid.generic.useDefaultValueForNull=false`, then this will returns any 
non-null value of `expr`|
+|`ANY_VALUE(expr, maxBytesPerString)`|Like `ANY_VALUE(expr)`, but for strings. 
The `maxBytesPerString` parameter determines how much aggregation space to 
allocate per string. Strings longer than this limit will be truncated. This 
parameter should be set as low as possible, since high values will lead to 
wasted memory.|
 
 Review comment:
   you have this block in StringAnyAggregatorFactory:
   
    ```
    this.maxStringBytes = maxStringBytes == null
                          ? StringFirstAggregatorFactory.DEFAULT_MAX_STRING_SIZE
                          : maxStringBytes;
    ```
   
   I would give the SQL function consistent behavior
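
    A minimal sketch of the consistent defaulting being suggested (all names are illustrative; this is not Druid's actual SQL-binding code):

    ```java
    import java.util.List;

    public class DefaultingSketch
    {
      // Mirrors the JSON-side behavior quoted above: a missing maxBytesPerString
      // argument falls back to a default rather than changing the result type.
      static final int DEFAULT_MAX_STRING_SIZE = 1024;

      static int resolveMaxStringBytes(List<Integer> operands)
      {
        // operands.get(0) stands in for the column argument; operands.get(1),
        // if present, is the hypothetical maxBytesPerString argument.
        return operands.size() > 1 ? operands.get(1) : DEFAULT_MAX_STRING_SIZE;
      }

      public static void main(String[] args)
      {
        System.out.println(resolveMaxStringBytes(List.of(0)));       // 1024
        System.out.println(resolveMaxStringBytes(List.of(0, 2048))); // 2048
      }
    }
    ```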
   





[GitHub] [druid] gianm commented on a change in pull request #9111: Add HashJoinSegment, a virtual segment for joins.

2020-01-16 Thread GitBox
gianm commented on a change in pull request #9111: Add HashJoinSegment, a 
virtual segment for joins.
URL: https://github.com/apache/druid/pull/9111#discussion_r367616393
 
 

 ##
 File path: 
processing/src/main/java/org/apache/druid/segment/join/table/IndexedTable.java
 ##
 @@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.join.table;
+
+import it.unimi.dsi.fastutil.ints.IntList;
+import org.apache.druid.segment.column.ValueType;
+
+import javax.annotation.Nullable;
+import java.util.List;
+import java.util.Map;
+
+public interface IndexedTable
 
 Review comment:
   I'm thinking of adding this:
   
   ```java
   /**
* An interface to a table where some columns (the 'key columns') have 
indexes that enable fast lookups.
*
* The main user of this class is {@link IndexedTableJoinable}, and its main 
purpose is to participate in joins.
*/
   public interface IndexedTable
   {
 /**
  * Returns the columns of this table that have indexes.
  */
 List keyColumns();
   
 /**
  * Returns all columns of this table, including the key and non-key 
columns.
  */
 List allColumns();
   
 /**
  * Returns the signature of this table: a map where each key is a column 
from {@link #allColumns()} and each value
  * is a type code.
  */
 Map rowSignature();
   
 /**
  * Returns the number of rows in this table. It must not change over time, 
since it is used for things like algorithm
  * selection and reporting of cardinality metadata.
  */
 int numRows();
   
 /**
  * Returns the index for a particular column. The provided column number 
must be that column's position in
  * {@link #allColumns()}.
  */
 Index columnIndex(int column);
   
 /**
  * Returns a reader for a particular column. The provided column number 
must be that column's position in
  * {@link #allColumns()}.
  */
 Reader columnReader(int column);
   
 /**
  * Indexes support fast lookups on key columns.
  */
 interface Index
 {
   /**
 * Returns the list of row numbers where the column this Index is based
 * on contains 'key'.
*/
   IntList find(Object key);
 }
   
 /**
  * Readers support reading values out of any column.
  */
 interface Reader
 {
   /**
* Read the value at a particular row number. Throws an exception if the 
row is out of bounds (must be between zero
* and {@link #numRows()}).
*/
   @Nullable
   Object read(int row);
  }
}
    ```
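
    To make the intent concrete, a hedged usage sketch against this proposed interface (assuming the interface above is in scope; the column names and helper are hypothetical, and the key column is assumed to be one of `keyColumns()`):

    ```java
    import it.unimi.dsi.fastutil.ints.IntList;

    public class IndexedTableUsageSketch
    {
      // Look up every row whose keyColumn equals "key" and print the matching
      // values from valueColumn. "table" is any IndexedTable implementation.
      static void printMatches(IndexedTable table, String keyColumn, String valueColumn, Object key)
      {
        final int keyPos = table.allColumns().indexOf(keyColumn);
        final int valuePos = table.allColumns().indexOf(valueColumn);
        final IntList rows = table.columnIndex(keyPos).find(key);
        final IndexedTable.Reader reader = table.columnReader(valuePos);

        for (int i = 0; i < rows.size(); i++) {
          System.out.println(reader.read(rows.getInt(i)));
        }
      }
    }
    ```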





[GitHub] [druid] gianm commented on a change in pull request #9111: Add HashJoinSegment, a virtual segment for joins.

2020-01-16 Thread GitBox
gianm commented on a change in pull request #9111: Add HashJoinSegment, a 
virtual segment for joins.
URL: https://github.com/apache/druid/pull/9111#discussion_r367615823
 
 

 ##
 File path: 
processing/src/main/java/org/apache/druid/segment/join/HashJoinEngine.java
 ##
 @@ -0,0 +1,211 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.join;
+
+import org.apache.druid.query.BaseQuery;
+import org.apache.druid.query.dimension.DimensionSpec;
+import org.apache.druid.segment.ColumnSelectorFactory;
+import org.apache.druid.segment.ColumnValueSelector;
+import org.apache.druid.segment.Cursor;
+import org.apache.druid.segment.DimensionSelector;
+import org.apache.druid.segment.column.ColumnCapabilities;
+import org.joda.time.DateTime;
+
+import javax.annotation.Nonnull;
+import javax.annotation.Nullable;
+
+public class HashJoinEngine
+{
+  private HashJoinEngine()
+  {
+// No instantiation.
+  }
+
+  /**
+   * Creates a cursor that represents the join of {@param leftCursor} with 
{@param joinableClause}. The resulting
+   * cursor may generate nulls on the left-hand side (for righty joins; see 
{@link JoinType#isRighty()}) or on
+   * the right-hand side (for lefty joins; see {@link JoinType#isLefty()}). 
Columns that start with the
+   * joinable clause's prefix (see {@link JoinableClause#getPrefix()}) will 
come from the Joinable's column selector
+   * factory, and all other columns will come from the leftCursor's column 
selector factory.
+   *
+   * Ensuing that the joinable clause's prefix does not conflict with any 
columns from "leftCursor" is the
 
 Review comment:
   Oops, yeah, that's a typo. It should be "ensuring".
   
   Is this clearer?
   
   ```java
 /**
  * Ensuring that the joinable clause's prefix does not conflict with any 
columns from "leftCursor" is the
  * responsibility of the caller. If there is such a conflict (for example, 
if the joinable clause's prefix is "j.",
  * and the leftCursor has a field named "j.j.abrams"), then the field from 
the leftCursor will be shadowed and will
  * not be queryable through the returned Cursor. This happens even if the 
right-hand joinable doesn't actually have a
  * column with this name.
  */
   ```
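
    For what it's worth, the prefix routing described there can be illustrated with a tiny standalone sketch (simplified; not the engine's actual code):

    ```java
    public class PrefixShadowSketch
    {
      // Any column starting with the join prefix is routed to the right-hand
      // side, even when the left side has a column with the same full name;
      // that left column becomes unreachable (shadowed).
      static String route(String prefix, String column)
      {
        return column.startsWith(prefix)
               ? "right:" + column.substring(prefix.length())
               : "left:" + column;
      }

      public static void main(String[] args)
      {
        System.out.println(route("j.", "countryName"));   // left:countryName
        System.out.println(route("j.", "j.countryName")); // right:countryName
        System.out.println(route("j.", "j.j.abrams"));    // right:j.abrams
      }
    }
    ```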





[GitHub] [druid] gianm commented on a change in pull request #9111: Add HashJoinSegment, a virtual segment for joins.

2020-01-16 Thread GitBox
gianm commented on a change in pull request #9111: Add HashJoinSegment, a 
virtual segment for joins.
URL: https://github.com/apache/druid/pull/9111#discussion_r367615823
 
 

 ##
 File path: 
processing/src/main/java/org/apache/druid/segment/join/HashJoinEngine.java
 ##
 @@ -0,0 +1,211 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.join;
+
+import org.apache.druid.query.BaseQuery;
+import org.apache.druid.query.dimension.DimensionSpec;
+import org.apache.druid.segment.ColumnSelectorFactory;
+import org.apache.druid.segment.ColumnValueSelector;
+import org.apache.druid.segment.Cursor;
+import org.apache.druid.segment.DimensionSelector;
+import org.apache.druid.segment.column.ColumnCapabilities;
+import org.joda.time.DateTime;
+
+import javax.annotation.Nonnull;
+import javax.annotation.Nullable;
+
+public class HashJoinEngine
+{
+  private HashJoinEngine()
+  {
+// No instantiation.
+  }
+
+  /**
+   * Creates a cursor that represents the join of {@param leftCursor} with 
{@param joinableClause}. The resulting
+   * cursor may generate nulls on the left-hand side (for righty joins; see 
{@link JoinType#isRighty()}) or on
+   * the right-hand side (for lefty joins; see {@link JoinType#isLefty()}). 
Columns that start with the
+   * joinable clause's prefix (see {@link JoinableClause#getPrefix()}) will 
come from the Joinable's column selector
+   * factory, and all other columns will come from the leftCursor's column 
selector factory.
+   *
+   * Ensuing that the joinable clause's prefix does not conflict with any 
columns from "leftCursor" is the
 
 Review comment:
   Oops, yeah, that's a typo. It should be "ensuring".
   
   Is this clearer?
   
   ```java
 /**
  * Ensuring that the joinable clause's prefix does not conflict with any 
columns from "leftCursor" is the
  * responsibility of the caller. If there is such a conflict (for example, 
if the joinable clause's prefix is "j.",
  * and the leftCursor has a field named "j.j.abrams"), then the field from 
the leftCursor will be shadowed and will
  * not be queryable through the returned Cursor. This happens even if the 
right-hand joinable doesn't actually have a
  * column with this name.
  */
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] gianm commented on a change in pull request #9111: Add HashJoinSegment, a virtual segment for joins.

2020-01-16 Thread GitBox
gianm commented on a change in pull request #9111: Add HashJoinSegment, a 
virtual segment for joins.
URL: https://github.com/apache/druid/pull/9111#discussion_r367614421
 
 

 ##
 File path: 
processing/src/main/java/org/apache/druid/segment/join/JoinConditionAnalysis.java
 ##
 @@ -0,0 +1,182 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.join;
+
+import com.google.common.base.Preconditions;
+import org.apache.druid.java.util.common.Pair;
+import org.apache.druid.math.expr.Expr;
+import org.apache.druid.math.expr.ExprMacroTable;
+import org.apache.druid.math.expr.Exprs;
+import org.apache.druid.math.expr.Parser;
+import org.apache.druid.query.expression.ExprUtils;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Objects;
+import java.util.Optional;
+
+/**
+ * Represents analysis of a join condition.
+ *
+ * Each condition is decomposed into "equiConditions" and "nonEquiConditions".
+ *
+ * 1) The equiConditions are of the form ExpressionOfLeft = ColumnFromRight. 
The right-hand part cannot be an expression
+ * because we use this analysis to determine if we can perform the join using 
hashtables built off right-hand-side
+ * columns.
+ *
+ * 2) The nonEquiConditions are other conditions that should also be ANDed 
together
+ *
+ * All of these conditions are ANDed together to get the overall condition.
+ */
+public class JoinConditionAnalysis
+{
+  private final String originalExpression;
+  private final List<Equality> equiConditions;
+  private final List<Expr> nonEquiConditions;
+
+  private JoinConditionAnalysis(
+  final String originalExpression,
+  final List<Equality> equiConditions,
+  final List<Expr> nonEquiConditions
+  )
+  {
+this.originalExpression = Preconditions.checkNotNull(originalExpression, 
"originalExpression");
+this.equiConditions = equiConditions;
+this.nonEquiConditions = nonEquiConditions;
+  }
+
+  public static JoinConditionAnalysis forExpression(
+  final String condition,
 
 Review comment:
   I'm thinking of adding this javadoc:
   
   ```java
 /**
  * Analyze a join condition.
  *
  * @param condition   the condition expression
  * @param rightPrefix prefix for the right-hand side of the join; will be 
used to determine which identifiers in
  *the condition come from the right-hand side and 
which come from the left-hand side
  * @param macroTable  macro table for parsing the condition expression
  */
   ```
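
   For reference, a hedged usage sketch of that signature (assuming 
`ExprMacroTable.nil()` is acceptable for a macro-free condition, as it is 
used elsewhere in the codebase):
   
   ```java
   import org.apache.druid.math.expr.ExprMacroTable;
   import org.apache.druid.segment.join.JoinConditionAnalysis;
   
   // "j." marks identifiers that come from the right-hand side; the quoted
   // identifier syntax is needed because of the dot. This condition should
   // decompose into one equi-condition: left expression x, right column y.
   JoinConditionAnalysis analysis = JoinConditionAnalysis.forExpression(
       "x == \"j.y\"",
       "j.",
       ExprMacroTable.nil()
   );
   ```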


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] gianm commented on a change in pull request #9111: Add HashJoinSegment, a virtual segment for joins.

2020-01-16 Thread GitBox
gianm commented on a change in pull request #9111: Add HashJoinSegment, a 
virtual segment for joins.
URL: https://github.com/apache/druid/pull/9111#discussion_r367613419
 
 

 ##
 File path: 
processing/src/main/java/org/apache/druid/segment/join/JoinConditionAnalysis.java
 ##
 @@ -0,0 +1,182 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.join;
+
+import com.google.common.base.Preconditions;
+import org.apache.druid.java.util.common.Pair;
+import org.apache.druid.math.expr.Expr;
+import org.apache.druid.math.expr.ExprMacroTable;
+import org.apache.druid.math.expr.Exprs;
+import org.apache.druid.math.expr.Parser;
+import org.apache.druid.query.expression.ExprUtils;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Objects;
+import java.util.Optional;
+
+/**
+ * Represents analysis of a join condition.
+ *
+ * Each condition is decomposed into "equiConditions" and "nonEquiConditions".
+ *
+ * 1) The equiConditions are of the form ExpressionOfLeft = ColumnFromRight. 
The right-hand part cannot be an expression
+ * because we use this analysis to determine if we can perform the join using 
hashtables built off right-hand-side
+ * columns.
+ *
+ * 2) The nonEquiConditions are other conditions that should also be ANDed 
together
+ *
+ * All of these conditions are ANDed together to get the overall condition.
+ */
+public class JoinConditionAnalysis
+{
+  private final String originalExpression;
+  private final List<Equality> equiConditions;
+  private final List<Expr> nonEquiConditions;
+
+  private JoinConditionAnalysis(
+  final String originalExpression,
+  final List<Equality> equiConditions,
+  final List<Expr> nonEquiConditions
+  )
+  {
+this.originalExpression = Preconditions.checkNotNull(originalExpression, 
"originalExpression");
+this.equiConditions = equiConditions;
+this.nonEquiConditions = nonEquiConditions;
+  }
+
+  public static JoinConditionAnalysis forExpression(
+  final String condition,
 
 Review comment:
   Yes, and that's because the way to think about the prefixes is that they 
aren't table names (which might be present or not); they are column-name 
prefixes, and they are mandatory.
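
   A short hedged illustration of that framing, reusing the `forExpression` 
signature from this PR (hypothetical condition and prefix):
   
   ```java
   // With rightPrefix "_j0.", the bare identifier countryIsoCode can only be
   // a left-hand column; there is no table-name resolution step. Right-hand
   // columns must carry the prefix explicitly, whatever string it is.
   JoinConditionAnalysis.forExpression(
       "countryIsoCode == \"_j0.k\"",
       "_j0.",
       ExprMacroTable.nil()
   );
   ```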


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] gianm commented on a change in pull request #9111: Add HashJoinSegment, a virtual segment for joins.

2020-01-16 Thread GitBox
gianm commented on a change in pull request #9111: Add HashJoinSegment, a 
virtual segment for joins.
URL: https://github.com/apache/druid/pull/9111#discussion_r367612949
 
 

 ##
 File path: 
processing/src/main/java/org/apache/druid/segment/join/table/IndexedTable.java
 ##
 @@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.join.table;
+
+import it.unimi.dsi.fastutil.ints.IntList;
+import org.apache.druid.segment.column.ValueType;
+
+import javax.annotation.Nullable;
+import java.util.List;
+import java.util.Map;
+
+public interface IndexedTable
 
 Review comment:
   Sure, good call.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] gianm commented on a change in pull request #9111: Add HashJoinSegment, a virtual segment for joins.

2020-01-16 Thread GitBox
gianm commented on a change in pull request #9111: Add HashJoinSegment, a 
virtual segment for joins.
URL: https://github.com/apache/druid/pull/9111#discussion_r367608934
 
 

 ##
 File path: 
processing/src/main/java/org/apache/druid/segment/ColumnProcessorFactory.java
 ##
 @@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment;
+
+import org.apache.druid.query.dimension.ColumnSelectorStrategyFactory;
+import org.apache.druid.segment.column.ValueType;
+
+/**
+ * Class that encapsulates knowledge about how to create "column processors", 
which are... objects that process columns
+ * and want to have type-specific logic. Used by {@link 
ColumnProcessors#makeProcessor}.
+ *
+ * Column processors can be any type "T". The idea is that a 
ColumnProcessorFactory embodies the logic for wrapping
+ * and processing selectors of various types, and so enables nice code design, 
where type-dependent code is not
+ * sprinkled throughout.
+ *
+ * @see VectorColumnProcessorFactory the vectorized version
+ * @see ColumnProcessors#makeProcessor which uses these, and which is 
responsible for
+ * determining which type of selector to use for a given column
+ * @see ColumnSelectorStrategyFactory which serves a similar purpose and may 
be replaced by this in the future
+ * @see DimensionHandlerUtils#createColumnSelectorPluses which accepts {@link 
ColumnSelectorStrategyFactory} and is
+ * similar to {@link ColumnProcessors#makeProcessor}
+ */
+public interface ColumnProcessorFactory<T>
+{
+  /**
+   * This default type will be used when the underlying column has an unknown 
type.
+   */
+  ValueType defaultType();
 
 Review comment:
   I'm thinking about adding this javadoc:
   
   ```java
 /**
  * This default type will be used when the underlying column has an 
unknown type.
  *
  * This allows a column processor factory to specify what type it prefers 
to deal with (the most 'natural' type for
  * whatever it is doing) when all else is equal.
  */
   ```
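
   As a self-contained analogue (plain Java, deliberately not the real 
interface) of how `defaultType()` is meant to participate in selector 
creation:
   
   ```java
   // Analogue of the contract above: a factory states the value type it
   // would rather receive when the underlying column's type is unknown.
   class DefaultTypeSketch
   {
     enum Kind { STRING, LONG }
   
     interface ProcessorFactory<T>
     {
       Kind defaultKind();      // consulted only when the column type is unknown
       T makeStringProcessor();
       T makeLongProcessor();
     }
   
     static <T> T makeProcessor(ProcessorFactory<T> factory, Kind knownKindOrNull)
     {
       final Kind kind = knownKindOrNull != null ? knownKindOrNull : factory.defaultKind();
       return kind == Kind.LONG ? factory.makeLongProcessor() : factory.makeStringProcessor();
     }
   }
   ```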


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] gianm commented on a change in pull request #9111: Add HashJoinSegment, a virtual segment for joins.

2020-01-16 Thread GitBox
gianm commented on a change in pull request #9111: Add HashJoinSegment, a 
virtual segment for joins.
URL: https://github.com/apache/druid/pull/9111#discussion_r367607423
 
 

 ##
 File path: 
processing/src/main/java/org/apache/druid/segment/join/JoinableClause.java
 ##
 @@ -0,0 +1,145 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.join;
+
+import com.google.common.base.Preconditions;
+import org.apache.druid.java.util.common.IAE;
+
+import javax.annotation.Nullable;
+import java.util.List;
+import java.util.Objects;
+import java.util.stream.Collectors;
+
+/**
+ * Represents everything about a join clause except for the left-hand 
datasource. In other words, if the full join
+ * clause is "t1 JOIN t2 ON t1.x = t2.x" then this class represents "JOIN t2 
ON x = t2.x" -- it does not include
+ * references to the left-hand "t1".
+ */
+public class JoinableClause
+{
+  private final String prefix;
 
 Review comment:
   I'm planning to add this javadoc, which'll make it clearer:
   
   ```java
 /**
  * The prefix to apply to all columns from the Joinable. The idea is that 
during a join, any columns that start with
  * this prefix should be retrieved from our Joinable's {@link 
JoinMatcher#getColumnSelectorFactory()}. Any other
  * columns should be returned from the left-hand side of the join.
  *
  * The prefix can be any string, as long as it is nonempty and not itself 
a prefix of the reserved column name
  * {@code __time}.
  *
  * @see #getAvailableColumnsPrefixed() the list of columns from our {@link 
Joinable} with prefixes attached
  * @see #unprefix a method for removing prefixes
  */
   
   ```
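
   A hedged sketch of how a caller interacts with the prefix, assuming 
`joinable` and `condition` are already in scope and that `unprefix` returns 
the bare column name (per the @see reference):
   
   ```java
   // "j." is the column-name prefix for everything from the right-hand side.
   JoinableClause clause = new JoinableClause("j.", joinable, JoinType.LEFT, condition);
   
   clause.getPrefix();                    // "j."
   clause.getAvailableColumnsPrefixed();  // e.g. ["j.k", "j.v"] for right columns k, v
   clause.unprefix("j.k");                // "k" -- strips the clause prefix
   ```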


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org





[GitHub] [druid] gianm commented on a change in pull request #9111: Add HashJoinSegment, a virtual segment for joins.

2020-01-16 Thread GitBox
gianm commented on a change in pull request #9111: Add HashJoinSegment, a 
virtual segment for joins.
URL: https://github.com/apache/druid/pull/9111#discussion_r367606130
 
 

 ##
 File path: 
processing/src/main/java/org/apache/druid/segment/join/JoinableClause.java
 ##
 @@ -0,0 +1,145 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.join;
+
+import com.google.common.base.Preconditions;
+import org.apache.druid.java.util.common.IAE;
+
+import javax.annotation.Nullable;
+import java.util.List;
+import java.util.Objects;
+import java.util.stream.Collectors;
+
+/**
+ * Represents everything about a join clause except for the left-hand 
datasource. In other words, if the full join
+ * clause is "t1 JOIN t2 ON t1.x = t2.x" then this class represents "JOIN t2 
ON x = t2.x" -- it does not include
+ * references to the left-hand "t1".
+ */
+public class JoinableClause
+{
+  private final String prefix;
+  private final Joinable joinable;
+  private final JoinType joinType;
+  private final JoinConditionAnalysis condition;
+
+  public JoinableClause(@Nullable String prefix, Joinable joinable, JoinType 
joinType, JoinConditionAnalysis condition)
+  {
+this.prefix = prefix != null ? prefix : "";
+this.joinable = Preconditions.checkNotNull(joinable, "joinable");
+this.joinType = Preconditions.checkNotNull(joinType, "joinType");
+this.condition = Preconditions.checkNotNull(condition, "condition");
+  }
+
+  /**
+   * The prefix to apply to all columns from the Joinable.
+   */
+  public String getPrefix()
+  {
+return prefix;
+  }
+
+  /**
+   * The right-hand Joinable.
+   */
+  public Joinable getJoinable()
+  {
+return joinable;
+  }
+
+  /**
+   * The type of join: LEFT, RIGHT, INNER, or FULL.
+   */
+  public JoinType getJoinType()
+  {
+return joinType;
+  }
+
+  /**
+   * The join condition. When referring to right-hand columns, it should 
include the prefix.
+   */
+  public JoinConditionAnalysis getCondition()
+  {
+return condition;
+  }
+
+  /**
+   * Returns a list of columns from the underlying {@link 
Joinable#getAvailableColumns()} method, with our
+   * prefix ({@link #getPrefix()}) prepended.
+   */
+  public List<String> getAvailableColumnsPrefixed()
 
 Review comment:
   I actually like that the word "prefix" is in here since it makes the 
connection with `getPrefix` and `unprefix` clearer.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] gianm commented on a change in pull request #9111: Add HashJoinSegment, a virtual segment for joins.

2020-01-16 Thread GitBox
gianm commented on a change in pull request #9111: Add HashJoinSegment, a 
virtual segment for joins.
URL: https://github.com/apache/druid/pull/9111#discussion_r367605830
 
 

 ##
 File path: 
processing/src/main/java/org/apache/druid/segment/join/JoinableClause.java
 ##
 @@ -0,0 +1,145 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.join;
+
+import com.google.common.base.Preconditions;
+import org.apache.druid.java.util.common.IAE;
+
+import javax.annotation.Nullable;
+import java.util.List;
+import java.util.Objects;
+import java.util.stream.Collectors;
+
+/**
+ * Represents everything about a join clause except for the left-hand 
datasource. In other words, if the full join
+ * clause is "t1 JOIN t2 ON t1.x = t2.x" then this class represents "JOIN t2 
ON x = t2.x" -- it does not include
+ * references to the left-hand "t1".
+ */
+public class JoinableClause
+{
+  private final String prefix;
 
 Review comment:
   It's whatever the caller wants it to be, really. The SQL layer is gonna use 
strings like `_j0.`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org





[GitHub] [druid] gianm commented on a change in pull request #9111: Add HashJoinSegment, a virtual segment for joins.

2020-01-16 Thread GitBox
gianm commented on a change in pull request #9111: Add HashJoinSegment, a 
virtual segment for joins.
URL: https://github.com/apache/druid/pull/9111#discussion_r367604965
 
 

 ##
 File path: 
processing/src/main/java/org/apache/druid/segment/VectorColumnProcessorFactory.java
 ##
 @@ -17,25 +17,32 @@
  * under the License.
  */
 
-package org.apache.druid.query.dimension;
+package org.apache.druid.segment;
 
 import org.apache.druid.segment.vector.MultiValueDimensionVectorSelector;
 import org.apache.druid.segment.vector.SingleValueDimensionVectorSelector;
 import org.apache.druid.segment.vector.VectorValueSelector;
 
 /**
  * Class that encapsulates knowledge about how to create vector column 
processors. Used by
- * {@link org.apache.druid.segment.DimensionHandlerUtils#makeVectorProcessor}.
+ * {@link DimensionHandlerUtils#makeVectorProcessor}.
+ *
+ * Unlike {@link ColumnProcessorFactory}, this interface does not have a 
"defaultType" method. The default type is
+ * always implicitly STRING. It also does not have a "makeComplexProcessor" 
method; instead, complex-typed columns
 
 Review comment:
   I imagine it's a temporary thing. I would eventually like the two column 
processor factory interfaces to match up better.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] gianm commented on a change in pull request #9111: Add HashJoinSegment, a virtual segment for joins.

2020-01-16 Thread GitBox
gianm commented on a change in pull request #9111: Add HashJoinSegment, a 
virtual segment for joins.
URL: https://github.com/apache/druid/pull/9111#discussion_r367605135
 
 

 ##
 File path: 
processing/src/main/java/org/apache/druid/segment/join/table/IndexedTableJoinMatcher.java
 ##
 @@ -0,0 +1,310 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.join.table;
+
+import com.google.common.base.Preconditions;
+import it.unimi.dsi.fastutil.ints.IntIterator;
+import it.unimi.dsi.fastutil.ints.IntIterators;
+import it.unimi.dsi.fastutil.ints.IntRBTreeSet;
+import it.unimi.dsi.fastutil.ints.IntSet;
+import org.apache.druid.common.config.NullHandling;
+import org.apache.druid.java.util.common.IAE;
+import org.apache.druid.segment.BaseDoubleColumnValueSelector;
+import org.apache.druid.segment.BaseFloatColumnValueSelector;
+import org.apache.druid.segment.BaseLongColumnValueSelector;
+import org.apache.druid.segment.BaseObjectColumnValueSelector;
+import org.apache.druid.segment.ColumnProcessorFactory;
+import org.apache.druid.segment.ColumnProcessors;
+import org.apache.druid.segment.ColumnSelectorFactory;
+import org.apache.druid.segment.DimensionSelector;
+import org.apache.druid.segment.column.ValueType;
+import org.apache.druid.segment.data.IndexedInts;
+import org.apache.druid.segment.join.Equality;
+import org.apache.druid.segment.join.JoinConditionAnalysis;
+import org.apache.druid.segment.join.JoinMatcher;
+
+import javax.annotation.Nullable;
+import java.util.Collections;
+import java.util.List;
+import java.util.NoSuchElementException;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+
+public class IndexedTableJoinMatcher implements JoinMatcher
+{
+  private final IndexedTable table;
+  private final List<Supplier<IntIterator>> conditionMatchers;
+  private final IntIterator[] currentMatchedRows;
+  private final ColumnSelectorFactory selectorFactory;
+
+  // matchedRows and matchingRemainder are used to implement matchRemainder().
+  private final IntSet matchedRows;
+  private boolean matchingRemainder = false;
+
+  // currentIterator and currentRow are used to track iteration position 
through the currently-matched-rows.
+  @Nullable
+  private IntIterator currentIterator;
+  private int currentRow;
+
+  IndexedTableJoinMatcher(
+  final IndexedTable table,
+  final ColumnSelectorFactory leftSelectorFactory,
+  final JoinConditionAnalysis condition,
+  final boolean remainderNeeded
+  )
+  {
+this.table = table;
+
+if (condition.isAlwaysTrue()) {
+  this.conditionMatchers = Collections.singletonList(() -> 
IntIterators.fromTo(0, table.numRows()));
+} else if (condition.isAlwaysFalse()) {
+  this.conditionMatchers = Collections.singletonList(() -> 
IntIterators.fromTo(0, 0));
 
 Review comment:
   Yeah, good point.
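
   For context on the two special-case matchers in the diff above, a small 
standalone fastutil check (outside Druid):
   
   ```java
   import it.unimi.dsi.fastutil.ints.IntIterator;
   import it.unimi.dsi.fastutil.ints.IntIterators;
   
   public class EmptyRangeCheck
   {
     public static void main(String[] args)
     {
       // fromTo(0, 0) is an empty range, so the always-false matcher yields
       // no rows; fastutil's IntIterators.EMPTY_ITERATOR behaves the same.
       IntIterator none = IntIterators.fromTo(0, 0);
       System.out.println(none.hasNext());  // false
   
       // The always-true matcher iterates every row id, here 0 through 4.
       IntIterator all = IntIterators.fromTo(0, 5);
       while (all.hasNext()) {
         System.out.println(all.nextInt());
       }
     }
   }
   ```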


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] gianm commented on a change in pull request #9111: Add HashJoinSegment, a virtual segment for joins.

2020-01-16 Thread GitBox
gianm commented on a change in pull request #9111: Add HashJoinSegment, a 
virtual segment for joins.
URL: https://github.com/apache/druid/pull/9111#discussion_r367604763
 
 

 ##
 File path: 
processing/src/main/java/org/apache/druid/segment/ColumnProcessorFactory.java
 ##
 @@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment;
+
+import org.apache.druid.query.dimension.ColumnSelectorStrategyFactory;
+import org.apache.druid.segment.column.ValueType;
+
+/**
+ * Class that encapsulates knowledge about how to create "column processors", 
which are... objects that process columns
+ * and want to have type-specific logic. Used by {@link 
ColumnProcessors#makeProcessor}.
+ *
+ * Column processors can be any type "T". The idea is that a 
ColumnProcessorFactory embodies the logic for wrapping
+ * and processing selectors of various types, and so enables nice code design, 
where type-dependent code is not
+ * sprinkled throughout.
+ *
+ * @see VectorColumnProcessorFactory the vectorized version
+ * @see ColumnProcessors#makeProcessor which uses these, and which is 
responsible for
+ * determining which type of selector to use for a given column
+ * @see ColumnSelectorStrategyFactory which serves a similar purpose and may 
be replaced by this in the future
+ * @see DimensionHandlerUtils#createColumnSelectorPluses which accepts {@link 
ColumnSelectorStrategyFactory} and is
+ * similar to {@link ColumnProcessors#makeProcessor}
+ */
+public interface ColumnProcessorFactory<T>
+{
+  /**
+   * This default type will be used when the underlying column has an unknown 
type.
+   */
+  ValueType defaultType();
 
 Review comment:
   It's meant to be the preferred type that the processor wants to deal with in 
situations where there is no type information for the underlying column. It 
should usually be related to whatever the processor wants to _do_ with the 
data. The idea is that you would return `STRING` if you prefer to deal with 
strings, `DOUBLE` (or `LONG`) if you prefer to deal with numbers, etc.
   
   Does that make sense / sound reasonable?
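
   In other words (a hypothetical override, not code from this PR), a 
numbers-oriented processor factory would express its preference like so:
   
   ```java
   // Prefer DOUBLE selectors when the underlying column type is unknown,
   // because this (hypothetical) factory does arithmetic on the values.
   @Override
   public ValueType defaultType()
   {
     return ValueType.DOUBLE;
   }
   ```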


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org


