[GitHub] [druid] averma111 commented on issue #9780: Support directly reading and writing Druid data from Spark

2020-08-20 Thread GitBox


averma111 commented on issue #9780:
URL: https://github.com/apache/druid/issues/9780#issuecomment-678035280


   Also, as per the design document, https://github.com/metamx/druid-spark-batch 
is an option for writing into Druid, but the build is failing. Can I get some help?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] averma111 commented on issue #9780: Support directly reading and writing Druid data from Spark

2020-08-20 Thread GitBox


averma111 commented on issue #9780:
URL: https://github.com/apache/druid/issues/9780#issuecomment-678033439


   @mangrrua and @JulianJaffePinterest, I am very thankful to you guys for 
your work on the Spark connector for Druid. Can I package this jar and make it 
work with the current Druid 0.18.1 version?






[GitHub] [druid] klDen commented on issue #9207: Load streaming data from Apache Kafka Problem

2020-08-20 Thread GitBox


klDen commented on issue #9207:
URL: https://github.com/apache/druid/issues/9207#issuecomment-677972453


   I faced the same issue running in Docker containers.
   The issue was that the Docker service was running as another user. What 
resolved it temporarily was making the local volume **storage** directory 
writable by all (e.g. `chmod 777`).
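
   A minimal sketch of the workaround described above (the `./storage` path and 
the container UID are assumptions; adjust them to your docker-compose volume 
mapping):

```shell
# Hypothetical host path mapped as Druid's local deep-storage volume.
STORAGE_DIR=./storage
mkdir -p "$STORAGE_DIR"

# Quick workaround: make the directory world-writable (insecure; acceptable
# only for local testing).
chmod 777 "$STORAGE_DIR"

# Safer long-term fix: chown the directory to the UID the container runs as,
# e.g. `sudo chown -R 1000:1000 "$STORAGE_DIR"` (UID 1000 is an assumption;
# check with `docker exec <container> id -u`).

stat -c '%a' "$STORAGE_DIR"
```

   Note that `chmod 777` lets any local user write to the directory, so 
matching ownership to the container user is preferable outside local testing.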






[GitHub] [druid] jon-wei edited a comment on issue #9380: Fine grained config and state resources

2020-08-20 Thread GitBox


jon-wei edited a comment on issue #9380:
URL: https://github.com/apache/druid/issues/9380#issuecomment-677957679


   @pjain1 
   
   > For both compaction and load rule I thought it more intuitive to bind read 
with read and write with write. Should we just add a new resource type like 
DATASOURCE-CONFIG for representing both compaction and load/drop rules ?
   
   I think a new resource type for that is fine; perhaps DATASOURCE WRITE could 
also be rolled into this new combined permission (ingest, set up compaction, 
and manage the segment distribution; is there anything else that would make 
sense here?).
   
   > for migration we can have a flag to switch between older and newer 
security model, once they are ready they flip the switch. 
   
   Prior to the switch, how would the old and new models coexist? Maybe we 
could separate the old and new authorization DBs (e.g. use a separate metadata 
table or some kind of namespacing); I imagine it could be confusing for the 
user if permissions for both models exist in the same place at the same time.






[GitHub] [druid] jon-wei commented on issue #9380: Fine grained config and state resources

2020-08-20 Thread GitBox


jon-wei commented on issue #9380:
URL: https://github.com/apache/druid/issues/9380#issuecomment-677957679


   @pjain1 
   
   > For both compaction and load rule I thought it more intuitive to bind read 
with read and write with write. Should we just add a new resource type like 
DATASOURCE-CONFIG for representing both compaction and load/drop rules ?
   
   I think a new resource type for that is fine; perhaps DATASOURCE WRITE could 
also be rolled into this new combined permission (ingest, set up compaction, 
and manage the segment distribution; is there anything else that would make 
sense here?).
   
   > for migration we can have a flag to switch between older and newer 
security model, once they are ready they flip the switch. 
   
   Prior to the switch, how would the old and new models coexist? Maybe we 
could separate the old and new authorization DBs (e.g. use a separate metadata 
table or some kind of namespacing); I imagine it could be confusing for the 
user if permissions for both models exist in the same place at the same time.






[druid-website-src] branch master updated: Update druid-powered.md

2020-08-20 Thread fjy
This is an automated email from the ASF dual-hosted git repository.

fjy pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/druid-website-src.git


The following commit(s) were added to refs/heads/master by this push:
 new 8e21b5d  Update druid-powered.md
8e21b5d is described below

commit 8e21b5d15e7fe9d148c9ad3e871c5f3a45c19839
Author: Fangjin Yang 
AuthorDate: Thu Aug 20 14:52:35 2020 -0700

Update druid-powered.md

reorder
---
 druid-powered.md | 17 -
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/druid-powered.md b/druid-powered.md
index 14205fb..9970b20 100644
--- a/druid-powered.md
+++ b/druid-powered.md
@@ -46,6 +46,10 @@ Atomx is a new media exchange that connects networks, DSPs, 
SSPs, and other part
 
 [Bannerflow](https://www.bannerflow.com) is the leading display ad production 
platform. We use Druid to power our customer facing analytics system and for 
internal reporting, monitoring and ad-hoc data exploration.
 
+## BIGO
+
+BIGO selects Druid as an OLAP engine to analyze app(Like, Bigolive, IMO, etc.) 
data in bigdata team. Typical analysis in Druid includes apm, A/B testing, 
push, funnel model, etc. Besides, Superset is used as dashboard for deep 
integration with Druid.
+
 ## Billy Mobile
 
 Billy Mobile is a mobile advertising platform, excelling in the 
performance-based optimisation segment. We use Druid to power our real-time 
analytics dashboards, in which our publishers, advertisers and staff can get 
insights on how their campaigns, offers and traffic are performing, with 
sub-second query time and minute granularity . We are using a 
lambda-architecture aproach, ingesting the traffic in real time with 
Tranquility and Storm, and a batch layer via a tight integration with H [...]
@@ -252,6 +256,10 @@ which they can do that is in a big part thanks to Druid!
 
 * [Data Insights Engine @ 
MakeMyTrip](https://medium.com/makemytrip-engineering/data-insights-engine-makemytrip-900bd353d99c)
 
+## MAKESENS
+
+[MAKESENS](http://www.battery-doctor.cn/) use Druid to store and analyze 
battery data.
+
 ## Marchex
 
 Marchex uses Druid to provide data for Marchex Call Analytics' new customer 
facing Speech Analytics dashboards.
@@ -600,14 +608,5 @@ China Youzan is a SaaS company which principally engaged 
in retail science and t
 
 [Zuoyebang](http://www.zuoyebang.com/) is the most used K12 education 
platform, 7 out of every 10 K12 users are using Zuoyebang. At Zuoyebang Data 
Platform Group, we use the Druid in the advertising scene,  mainly related to 
advertising display, click, billing, and other functions. The performance and 
timeliness of druid can meet our OLAP queries very well.
 
-## BIGO
-
-BIGO selects Druid as an OLAP engine to analyze app(Like, Bigolive, IMO, etc.) 
data in bigdata team. Typical analysis in Druid includes apm, A/B testing, 
push, funnel model, etc. Besides, Superset is used as dashboard for deep 
integration with Druid.
-
-
-## MAKESENS
-
-[MAKESENS](http://www.battery-doctor.cn/) use Druid to store and analyze 
battery data.
-
 
 [Add Your 
Company](https://github.com/apache/druid-website-src/blob/master/druid-powered.md)





[GitHub] [druid] abhishekrb19 commented on issue #10283: Druid SQL: Querying a new data source with no data returns a CalciteContextException.

2020-08-20 Thread GitBox


abhishekrb19 commented on issue #10283:
URL: https://github.com/apache/druid/issues/10283#issuecomment-677909367


   @gianm, thanks for the clarification. 
   
   > Data that's currently only available from the realtime indexing system is 
still good enough to update the SQL metadata. Would that be good enough in your 
case?
   
   Yes, I think this should be good enough for most cases. However, if the 
supervisor is stopped/terminated/suspended before any data arrives, I'd think 
the datasource won't be queryable from the Druid SQL API?






[druid] branch master updated (618c04a -> 7620b0c)

2020-08-20 Thread cwylie

cwylie pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git.


from 618c04a  Fix CombiningFirehose compatibility (#10264)
 add 7620b0c  Segment backed broadcast join IndexedTable (#10224)

No new revisions were added by this update.

Summary of changes:
 .../movingaverage/MovingAverageQueryTest.java  |   3 +-
 .../docker/environment-configs/broker  |   5 +-
 .../clients/CoordinatorResourceTestClient.java |  21 ++
 .../tests/query/ITBroadcastJoinQueryTest.java  | 130 
 .../druid/tests/query/ITWikipediaQueryTest.java|   2 +-
 ...ex_task.json => broadcast_join_index_task.json} |  59 ++--
 .../indexer/sys_segment_batch_index_queries.json   |   3 +
 .../queries/broadcast_join_metadata_queries.json   |  26 ++
 .../resources/queries/broadcast_join_queries.json  |  29 ++
 .../resources/queries/sys_segment_queries.json |   3 +
 .../results/auth_test_sys_schema_servers.json  |  10 +
 .../apache/druid/jackson/SegmentizerModule.java|   4 +
 .../QueryableIndexColumnSelectorFactory.java   |   4 +-
 .../druid/segment/SimpleAscendingOffset.java   |   5 +-
 .../druid/segment/SimpleDescendingOffset.java  |  10 +-
 ...oricalCursor.java => SimpleSettableOffset.java} |   7 +-
 .../apache/druid/segment/column/RowSignature.java  |   5 +
 .../apache/druid/segment/join/HashJoinEngine.java  |  12 +-
 .../join/HashJoinSegmentStorageAdapter.java|   8 +-
 .../org/apache/druid/segment/join/Joinable.java|   8 +-
 .../apache/druid/segment/join/JoinableFactory.java |   3 +-
 .../druid/segment/join/MapJoinableFactory.java |  52 ++--
 .../druid/segment/join/lookup/LookupJoinable.java  |   5 +-
 .../join/table/BroadcastSegmentIndexedTable.java   | 255 +++
 .../druid/segment/join/table/IndexedTable.java |  28 +-
 .../table/IndexedTableColumnSelectorFactory.java   |  10 +-
 .../table/IndexedTableColumnValueSelector.java |   4 +-
 .../join/table/IndexedTableDimensionSelector.java  |   5 +-
 .../join/table/IndexedTableJoinMatcher.java|  23 +-
 .../segment/join/table/IndexedTableJoinable.java   |  73 +++--
 .../join/table/ReferenceCountingIndexedTable.java  |  78 +
 .../segment/join/table/RowBasedIndexedTable.java   |  56 ++--
 ...JoinableMMappedQueryableSegmentizerFactory.java |  98 ++
 .../druid/segment/join/MapJoinableFactoryTest.java |  15 +-
 .../table/BroadcastSegmentIndexedTableTest.java| 336 
 .../join/table/IndexedTableJoinableTest.java   |  10 +-
 ...bleMMappedQueryableSegmentizerFactoryTest.java} | 114 +++
 .../java/org/apache/druid/guice/DruidBinders.java  |  14 +-
 .../apache/druid/guice/JoinableFactoryModule.java  |  16 +-
 .../join/BroadcastTableJoinableFactory.java|  79 +
 .../org/apache/druid/server/SegmentManager.java|  84 -
 .../druid/guice/JoinableFactoryModuleTest.java |  58 +++-
 .../druid/server/ClientQuerySegmentWalkerTest.java |  36 ++-
 .../org/apache/druid/server/QueryStackTests.java   |  29 +-
 ...egmentManagerBroadcastJoinIndexedTableTest.java | 343 +
 .../server/SegmentManagerThreadSafetyTest.java |   2 +-
 .../schema/DruidCalciteSchemaModuleTest.java   |   2 +-
 .../calcite/schema/DruidSchemaNoDataInitTest.java  |   3 +-
 .../druid/sql/calcite/schema/DruidSchemaTest.java  |   2 +-
 .../druid/sql/calcite/schema/SystemSchemaTest.java |   2 +-
 .../druid/sql/calcite/util/CalciteTests.java   |   6 +-
 51 files changed, 1918 insertions(+), 277 deletions(-)
 create mode 100644 
integration-tests/src/test/java/org/apache/druid/tests/query/ITBroadcastJoinQueryTest.java
 copy 
integration-tests/src/test/resources/indexer/{wikipedia_cloud_simple_index_task.json
 => broadcast_join_index_task.json} (50%)
 create mode 100644 
integration-tests/src/test/resources/queries/broadcast_join_metadata_queries.json
 create mode 100644 
integration-tests/src/test/resources/queries/broadcast_join_queries.json
 copy 
processing/src/main/java/org/apache/druid/segment/{historical/HistoricalCursor.java
 => SimpleSettableOffset.java} (84%)
 create mode 100644 
processing/src/main/java/org/apache/druid/segment/join/table/BroadcastSegmentIndexedTable.java
 create mode 100644 
processing/src/main/java/org/apache/druid/segment/join/table/ReferenceCountingIndexedTable.java
 create mode 100644 
processing/src/main/java/org/apache/druid/segment/loading/BroadcastJoinableMMappedQueryableSegmentizerFactory.java
 create mode 100644 
processing/src/test/java/org/apache/druid/segment/join/table/BroadcastSegmentIndexedTableTest.java
 copy 
processing/src/test/java/org/apache/druid/segment/{CustomSegmentizerFactoryTest.java
 => loading/BroadcastJoinableMMappedQueryableSegmentizerFactoryTest.java} (50%)
 create mode 100644 
server/src/main/java/org/apache/druid/segment/join/BroadcastTableJoinableFactory.java
 create mode 100644 
server/src/test/java

[GitHub] [druid] clintropolis merged pull request #10224: Segment backed broadcast join IndexedTable

2020-08-20 Thread GitBox


clintropolis merged pull request #10224:
URL: https://github.com/apache/druid/pull/10224


   






[GitHub] [druid] mangrrua commented on issue #9780: Support directly reading and writing Druid data from Spark

2020-08-20 Thread GitBox


mangrrua commented on issue #9780:
URL: https://github.com/apache/druid/issues/9780#issuecomment-677879134


   Hello @JulianJaffePinterest 
   
   Are you still working on this connector? We may improve it, but for the 
first version your source code seems good. 






[GitHub] [druid] mangrrua commented on issue #9835: Spark built in connector for Druid

2020-08-20 Thread GitBox


mangrrua commented on issue #9835:
URL: https://github.com/apache/druid/issues/9835#issuecomment-677875718


   Check issue #9780 for batch read/write operations. 






[GitHub] [druid-website-src] druid-matt commented on pull request #150: fix broken links to docs

2020-08-20 Thread GitBox


druid-matt commented on pull request #150:
URL: https://github.com/apache/druid-website-src/pull/150#issuecomment-677868428


   The relative link was leading to a redirect that was breaking HTTPS, and
   Google ranking doesn't like that.
   
   On Thu, Aug 20, 2020 at 12:36 PM Gian Merlino 
   wrote:
   
   > *@gianm* commented on this pull request.
   > --
   >
   > In community/index.md
   > 

   > :
   >
   > > @@ -52,9 +52,9 @@ If you're looking for some starter projects, we 
maintain a [list of issues](http
   >  for new developers.
   >
   >  There are plenty of ways to help outside writing Druid code. *Code review 
of pull requests*
   > -(even if you are not a committer), feature suggestions, reporting bugs, 
[documentation](/docs/{{ site.druid_stable_version }}/)
   > +(even if you are not a committer), feature suggestions, reporting bugs, 
[documentation](https://druid.apache.org/docs/latest/design/index.html)
   >
   > Why not keep the relative links?
   > --
   >
   > In community/index.md
   > 

   > :
   >
   > >  and usability feedback all matter immensely. Another big way to help is
   > -through [client libraries](/docs/latest/development/libraries.html), 
which are
   > +through [client 
libraries](https://druid.apache.org/docs/latest/development/libraries.html), 
which are
   >
   > Why not keep the relative links?
   >
   > —
   > You are receiving this because you authored the thread.
   > Reply to this email directly, view it on GitHub
   > 
,
   > or unsubscribe
   > 

   > .
   >
   
   
   -- 
   Matt Sarrel
   Developer Evangelist
   Apache Druid  and Imply Data 
   matt.sar...@imply.io
   






[GitHub] [druid] gianm commented on a change in pull request #10279: Add SQL "OFFSET" clause.

2020-08-20 Thread GitBox


gianm commented on a change in pull request #10279:
URL: https://github.com/apache/druid/pull/10279#discussion_r474232782



##
File path: sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidQuery.java
##
@@ -897,6 +906,11 @@ public GroupByQuery toGroupByQuery()
   return null;
 }
 
+if (sorting != null && sorting.getOffsetLimit().hasLimit() && 
sorting.getOffsetLimit().getLimit() <= 0) {

Review comment:
   It can't, but I guess it doesn't hurt to overcheck.








[GitHub] [druid] clintropolis commented on a change in pull request #10279: Add SQL "OFFSET" clause.

2020-08-20 Thread GitBox


clintropolis commented on a change in pull request #10279:
URL: https://github.com/apache/druid/pull/10279#discussion_r474231095



##
File path: sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidQuery.java
##
@@ -897,6 +906,11 @@ public GroupByQuery toGroupByQuery()
   return null;
 }
 
+if (sorting != null && sorting.getOffsetLimit().hasLimit() && 
sorting.getOffsetLimit().getLimit() <= 0) {

Review comment:
   Can limit be less than 0 here, given the preconditions in 
`OffsetLimit`?








[GitHub] [druid] gianm edited a comment on issue #10282: Druid SQL: A sub-query with expressions returns no data potentially due to an incorrect Query plan.

2020-08-20 Thread GitBox


gianm edited a comment on issue #10282:
URL: https://github.com/apache/druid/issues/10282#issuecomment-677862016


   Hmm, in general it should be OK to move a HAVING from an inner query to the 
WHERE of an outer query. That's what seems to be happening here. It doesn't 
look like a smoking gun to me. But it's possible it didn't get moved properly.
   
   Unfortunately query plans for multi stage queries can be tough to read. 
Personally I find it easier to enable request logging on the Broker 
(https://support.imply.io/hc/en-us/articles/360011745614-Enable-Query-or-Request-logging-for-Druid)
 and then grab the native query out of the request log.
   
   Could you do that and post it here? It'd be nice to see what the native 
query is.
   
   (We're hoping to change the output of explain plan to be more readable in 
the future.)
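
   For reference, the linked article boils down to enabling file-based request 
logging and restarting the Broker; a minimal sketch, assuming a default-layout 
installation (the config path and log directory are assumptions):

```shell
# Enable file-based request logging on the Broker so each SQL query's
# translated native query is written to the request log.
# Paths are assumptions; adjust for your deployment.
BROKER_CONF=conf/druid/broker/runtime.properties
mkdir -p "$(dirname "$BROKER_CONF")"

cat >> "$BROKER_CONF" <<'EOF'
druid.request.logging.type=file
druid.request.logging.dir=var/druid/request-logs
EOF

# Show what was added; after a Broker restart, request logs (including the
# native query JSON) appear under druid.request.logging.dir.
grep 'druid.request.logging' "$BROKER_CONF"
```
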






[GitHub] [druid] gianm commented on issue #10282: Druid SQL: A sub-query with expressions returns no data potentially due to an incorrect Query plan.

2020-08-20 Thread GitBox


gianm commented on issue #10282:
URL: https://github.com/apache/druid/issues/10282#issuecomment-677862016


   Hmm, in general it should be OK to move a HAVING from an inner query to the 
WHERE of an outer query. That's what seems to be happening here. It doesn't 
look like a smoking gun to me. But it's possible it didn't get moved properly.
   
   Unfortunately query plans for multi stage queries can be tough to read. 
Personally I find it easier to enable request logging on the Broker 
(https://support.imply.io/hc/en-us/articles/360011745614-Enable-Query-or-Request-logging-for-Druid)
 and then grab the native query out of the request log.
   
   Could you do that and post it here? It'd be nice to see what the native 
query is.
   
   (I'm hoping to change the output of explain plan to be more readable in the 
future.)






[GitHub] [druid-website-src] gianm merged pull request #151: fix broken link to Docs

2020-08-20 Thread GitBox


gianm merged pull request #151:
URL: https://github.com/apache/druid-website-src/pull/151


   






[druid-website-src] branch master updated: fix broken link to Docs

2020-08-20 Thread gian

gian pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/druid-website-src.git


The following commit(s) were added to refs/heads/master by this push:
 new 001b8d6  fix broken link to Docs
 new c4928a8  Merge pull request #151 from druid-matt/patch-6
001b8d6 is described below

commit 001b8d60ecb183137416e5a6cd0a230b54c9ccd2
Author: Matt Sarrel <59710709+druid-m...@users.noreply.github.com>
AuthorDate: Wed Aug 19 20:41:27 2020 -0700

fix broken link to Docs
---
 _includes/page_footer.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_includes/page_footer.html b/_includes/page_footer.html
index e46436b..2f3bc2e 100644
--- a/_includes/page_footer.html
+++ b/_includes/page_footer.html
@@ -6,7 +6,7 @@
 Technology · 
 Use Cases · 
 Powered by Druid · 
-Docs · 
+Docs · 
 Community · 
 Download · 
 FAQ





[GitHub] [druid-website-src] gianm commented on a change in pull request #150: fix broken links to docs

2020-08-20 Thread GitBox


gianm commented on a change in pull request #150:
URL: https://github.com/apache/druid-website-src/pull/150#discussion_r474224670



##
File path: community/index.md
##
@@ -52,9 +52,9 @@ If you're looking for some starter projects, we maintain a 
[list of issues](http
 for new developers.
 
 There are plenty of ways to help outside writing Druid code. *Code review of 
pull requests*
-(even if you are not a committer), feature suggestions, reporting bugs, 
[documentation](/docs/{{ site.druid_stable_version }}/)
+(even if you are not a committer), feature suggestions, reporting bugs, 
[documentation](https://druid.apache.org/docs/latest/design/index.html)

Review comment:
   Why not keep the relative links?

##
File path: community/index.md
##
@@ -52,9 +52,9 @@ If you're looking for some starter projects, we maintain a 
[list of issues](http
 for new developers.
 
 There are plenty of ways to help outside writing Druid code. *Code review of 
pull requests*
-(even if you are not a committer), feature suggestions, reporting bugs, 
[documentation](/docs/{{ site.druid_stable_version }}/)
+(even if you are not a committer), feature suggestions, reporting bugs, 
[documentation](https://druid.apache.org/docs/latest/design/index.html)
 and usability feedback all matter immensely. Another big way to help is
-through [client libraries](/docs/latest/development/libraries.html), which are
+through [client 
libraries](https://druid.apache.org/docs/latest/development/libraries.html), 
which are

Review comment:
   Why not keep the relative links?








[druid-website-src] branch master updated: fix /community/ link

2020-08-20 Thread gian

gian pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/druid-website-src.git


The following commit(s) were added to refs/heads/master by this push:
 new 9002f21  fix /community/ link
 new aec426b  Merge pull request #153 from druid-matt/patch-8
9002f21 is described below

commit 9002f21bb468ac8f01835b8e1009956343783d1e
Author: Matt Sarrel <59710709+druid-m...@users.noreply.github.com>
AuthorDate: Wed Aug 19 21:06:11 2020 -0700

fix /community/ link
---
 index.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/index.html b/index.html
index c54266d..7c66563 100644
--- a/index.html
+++ b/index.html
@@ -94,7 +94,7 @@ canonical: 'https://druid.apache.org/'
   
   Get Help
   
-Get help from a wide network of community 
members about using Druid.
+Get help from a wide network of community 
members about using Druid.
   
 
   





[GitHub] [druid-website-src] gianm merged pull request #154: fix /community/ link

2020-08-20 Thread GitBox


gianm merged pull request #154:
URL: https://github.com/apache/druid-website-src/pull/154


   






[GitHub] [druid-website-src] gianm merged pull request #153: fix /community/ link

2020-08-20 Thread GitBox


gianm merged pull request #153:
URL: https://github.com/apache/druid-website-src/pull/153


   






[druid-website-src] branch master updated: fix link to basic-cluster-tuning.html

2020-08-20 Thread gian

gian pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/druid-website-src.git


The following commit(s) were added to refs/heads/master by this push:
 new da3a8f3  fix link to basic-cluster-tuning.html
 new b7eaa2d  Merge pull request #152 from druid-matt/patch-7
da3a8f3 is described below

commit da3a8f32f5a443fdea4878d15acc5381314877ce
Author: Matt Sarrel <59710709+druid-m...@users.noreply.github.com>
AuthorDate: Wed Aug 19 21:01:13 2020 -0700

fix link to basic-cluster-tuning.html
---
 technology.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/technology.md b/technology.md
index d506f82..e516692 100644
--- a/technology.md
+++ b/technology.md
@@ -182,4 +182,4 @@ As such, Druid possesses several features to ensure uptime 
and no data loss.
   
 
 
-For more information, please visit [our docs 
page](/docs/latest/operations/recommendations.html).
+For more information, please visit [our docs 
page](/docs/latest/operations/basic-cluster-tuning.html).





[GitHub] [druid-website-src] gianm merged pull request #152: fix link to basic-cluster-tuning.html

2020-08-20 Thread GitBox


gianm merged pull request #152:
URL: https://github.com/apache/druid-website-src/pull/152


   






[druid-website-src] branch master updated: fix /community/ link

2020-08-20 Thread gian
This is an automated email from the ASF dual-hosted git repository.

gian pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/druid-website-src.git


The following commit(s) were added to refs/heads/master by this push:
 new add6ccd  fix /community/ link
 new f77fa70  Merge pull request #154 from druid-matt/patch-9
add6ccd is described below

commit add6ccd5376f99620f56c1f8293702378f65435c
Author: Matt Sarrel <59710709+druid-m...@users.noreply.github.com>
AuthorDate: Wed Aug 19 21:08:30 2020 -0700

fix /community/ link
---
 faq.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/faq.md b/faq.md
index 4ef4283..7f17eca 100644
--- a/faq.md
+++ b/faq.md
@@ -1,6 +1,6 @@
 ---
 title: Frequently Asked Questions
-subtitle: Don't see your question here? Ask us
+subtitle: Don't see your question here? Ask us
 layout: simple_page
 sectionid: faq
 canonical: 'https://druid.apache.org/faq'





[GitHub] [druid] gianm commented on issue #10283: Druid SQL: Querying a new data source with no data returns a CalciteContextException.

2020-08-20 Thread GitBox


gianm commented on issue #10283:
URL: https://github.com/apache/druid/issues/10283#issuecomment-677859327


   This one is interesting.
   
   Our SQL metadata is based on doing segment metadata queries. (We look at all 
the active segments and build a table schema based on what columns are found.) 
Therefore, as far as the SQL API is concerned, datasources don't exist until 
they actually have some segments.
   
   You should be able to query the datasource `ds` once it has some data, i.e., 
once the supervisor launches tasks and they read something. It doesn't need to 
be published. Data that's currently only available from the realtime indexing 
system is still good enough to update the SQL metadata. Would that be good 
enough in your case?
   
   Beyond that, it would be tough to do a more fundamental fix without making 
major changes to how SQL metadata works in Druid. You could imagine defining 
what tables _should be_ rather than having the SQL layer observe what they 
_are_. This would be in line with what a lot of RDBMSes do (CREATE TABLE, ALTER 
TABLE, etc). It's something that does have its uses.
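   As a rough illustration of the observed-schema behavior described above, here is a hypothetical Python sketch (not Druid's actual segment-metadata code; `discover_schema` is an invented name): the table schema is the union of columns seen across active segments, so a datasource with zero segments has no table at all.

```python
def discover_schema(segment_columns):
    # Hypothetical sketch: the SQL table schema is the union of columns
    # observed across a datasource's active segments. A datasource with
    # no segments yields no schema, i.e. no table, as far as SQL goes.
    if not segment_columns:
        return None  # the datasource is invisible to the SQL layer
    schema = {}
    for columns in segment_columns:
        for name, col_type in columns.items():
            schema.setdefault(name, col_type)
    return schema
```

   Once the first realtime task reads a row, `segment_columns` becomes non-empty and the table appears.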






[GitHub] [druid] gianm commented on a change in pull request #10279: Add SQL "OFFSET" clause.

2020-08-20 Thread GitBox


gianm commented on a change in pull request #10279:
URL: https://github.com/apache/druid/pull/10279#discussion_r474217036



##
File path: sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidQuery.java
##
@@ -762,9 +756,20 @@ public TimeseriesQuery toTimeseriesQuery()
   
Iterables.getOnlyElement(grouping.getDimensions()).toDimensionSpec().getOutputName()
   );
   if (sorting != null) {
-// If there is sorting, set timeseriesLimit to given value if less 
than Integer.Max_VALUE
-if (sorting.isLimited()) {
-  timeseriesLimit = Ints.checkedCast(sorting.getLimit());
+if (sorting.getOffsetLimit().hasOffset()) {
+  // Timeseries cannot handle offsets.
+  return null;
+}
+
+if (sorting.getOffsetLimit().hasLimit()) {
+  final long limit = sorting.getOffsetLimit().getLimit();
+
+  if (limit == 0) {

Review comment:
   That's true; at this point, a zero should be treated as an explicit 
zero. The native timeseries query treats `0` and `null` equivalently and both 
result in an unlimited query. So we have to bail out if the limit is an 
explicit zero.
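    The semantics in question can be modeled roughly like this (a hypothetical Python sketch, not Druid's actual Java code; `apply_timeseries_limit` is an invented name):

```python
def apply_timeseries_limit(rows, limit):
    # Hypothetical model of the native timeseries behavior described above:
    # both None and 0 mean "unlimited", so an explicit SQL "LIMIT 0" cannot
    # be forwarded to a timeseries query and the planner must bail out.
    if not limit:  # None and 0 both fall through to "unlimited"
        return list(rows)
    return list(rows)[:limit]
```

    Because `0` collapses into the unlimited case here, a SQL `LIMIT 0` (which must return no rows) cannot be expressed as a timeseries query, which is why the planner returns null and falls back to another query type.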








[GitHub] [druid] gianm commented on pull request #10279: Add SQL "OFFSET" clause.

2020-08-20 Thread GitBox


gianm commented on pull request #10279:
URL: https://github.com/apache/druid/pull/10279#issuecomment-677854285


   @clintropolis Any other comments or should we merge this?






[GitHub] [druid] jihoonson commented on a change in pull request #10243: Add maxNumFiles to splitHintSpec

2020-08-20 Thread GitBox


jihoonson commented on a change in pull request #10243:
URL: https://github.com/apache/druid/pull/10243#discussion_r474171560



##
File path: 
core/src/main/java/org/apache/druid/data/input/MaxSizeSplitHintSpec.java
##
@@ -43,22 +45,55 @@
   public static final String TYPE = "maxSize";
 
   @VisibleForTesting
-  static final long DEFAULT_MAX_SPLIT_SIZE = 512 * 1024 * 1024;
+  static final HumanReadableBytes DEFAULT_MAX_SPLIT_SIZE = new 
HumanReadableBytes("1GiB");
 
-  private final long maxSplitSize;
+  /**
+   * There are two known issues when a split contains a large list of files.
+   *
+   * - 'jute.maxbuffer' in ZooKeeper. This system property controls the max 
size of ZNode. As its default is 500KB,
+   *   task allocation can fail if the serialized ingestion spec is larger 
than this limit.
+   * - 'max_allowed_packet' in MySQL. This is the max size of a communication 
packet sent to a MySQL server.
+   *   The default is either 64MB or 4MB depending on MySQL version. Updating 
metadata store can fail if the serialized
+   *   ingestion spec is larger than this limit.
+   *
+   * The default is conservatively chosen as 1000.
+   */
+  @VisibleForTesting
+  static final int DEFAULT_MAX_NUM_FILES = 1000;
+
+  private final HumanReadableBytes maxSplitSize;
+  private final int maxNumFiles;
 
   @JsonCreator
-  public MaxSizeSplitHintSpec(@JsonProperty("maxSplitSize") @Nullable Long 
maxSplitSize)
+  public MaxSizeSplitHintSpec(
+  @JsonProperty("maxSplitSize") @Nullable HumanReadableBytes maxSplitSize,
+  @JsonProperty("maxNumFiles") @Nullable Integer maxNumFiles
+  )
   {
 this.maxSplitSize = maxSplitSize == null ? DEFAULT_MAX_SPLIT_SIZE : 
maxSplitSize;
+this.maxNumFiles = maxNumFiles == null ? DEFAULT_MAX_NUM_FILES : 
maxNumFiles;
+Preconditions.checkArgument(this.maxSplitSize.getBytes() > 0, 
"maxSplitSize should be larger than 0");
+Preconditions.checkArgument(this.maxNumFiles > 0, "maxNumFiles should be 
larger than 0");
+  }
+
+  @VisibleForTesting
+  public MaxSizeSplitHintSpec(long maxSplitSize, @Nullable Integer maxNumFiles)
+  {
+this(new HumanReadableBytes(maxSplitSize), maxNumFiles);
   }
 
   @JsonProperty
-  public long getMaxSplitSize()
+  public HumanReadableBytes getMaxSplitSize()

Review comment:
   Ah good idea. It actually needs the same change because it needs to be 
identical to `maxSize` splitHintSpec at least for now. We could implement a 
better split based on stats in the future.
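    For illustration, the two caps interact roughly as follows (a hypothetical Python sketch of the packing logic, not the actual `MaxSizeSplitHintSpec` implementation; `split_files` is an invented name):

```python
def split_files(files, max_split_size=1 << 30, max_num_files=1000):
    # Hypothetical sketch of the maxSize split hint described above: pack
    # (name, size) pairs into splits, closing the current split when either
    # the byte cap or the file-count cap would be exceeded. A single
    # oversized file still becomes its own split, since files are never
    # split across tasks.
    splits, current, current_size = [], [], 0
    for name, size in files:
        if current and (current_size + size > max_split_size
                        or len(current) >= max_num_files):
            splits.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        splits.append(current)
    return splits
```

    Whichever cap is hit first closes the split, so one subtask never processes more than `maxNumFiles` files even when their total size is well under `maxSplitSize`.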








[GitHub] [druid] suneet-s commented on pull request #10264: Fix CombiningFirehose compatibility

2020-08-20 Thread GitBox


suneet-s commented on pull request #10264:
URL: https://github.com/apache/druid/pull/10264#issuecomment-677802941


   Verified by running the tutorial locally. Thanks again @a2l007 !






[GitHub] [druid] suneet-s merged pull request #10264: Fix CombiningFirehose compatibility

2020-08-20 Thread GitBox


suneet-s merged pull request #10264:
URL: https://github.com/apache/druid/pull/10264


   






[GitHub] [druid] suneet-s closed issue #9389: CombiningFirehoseFactory cannot be cast to FiniteFirehoseFactory

2020-08-20 Thread GitBox


suneet-s closed issue #9389:
URL: https://github.com/apache/druid/issues/9389


   






[druid] branch master updated (b36dab0 -> 618c04a)

2020-08-20 Thread suneet
This is an automated email from the ASF dual-hosted git repository.

suneet pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git.


from b36dab0  fix connectionId issue with JDBC prepared statement queries 
and router  (#10272)
 add 618c04a  Fix CombiningFirehose compatibility (#10264)

No new revisions were added by this update.

Summary of changes:
 ...va => ITCombiningFirehoseFactoryIndexTest.java} |  62 +
 ...wikipedia_combining_firehose_index_queries.json | 141 +
 .../wikipedia_combining_firehose_index_task.json}  |  92 +++---
 .../wikipedia_combining_index_data.json}   |   6 +-
 .../firehose/CombiningFirehoseFactory.java |  32 -
 .../firehose/CombiningFirehoseFactoryTest.java |  41 --
 6 files changed, 285 insertions(+), 89 deletions(-)
 copy 
integration-tests/src/test/java/org/apache/druid/tests/indexer/{ITTestCoordinatorPausedTest.java
 => ITCombiningFirehoseFactoryIndexTest.java} (52%)
 create mode 100644 
integration-tests/src/test/resources/indexer/wikipedia_combining_firehose_index_queries.json
 copy 
integration-tests/src/test/resources/{hadoop/wikipedia_hadoop_index_task.json 
=> indexer/wikipedia_combining_firehose_index_task.json} (62%)
 copy 
integration-tests/src/test/resources/{data/batch_index/json/wikipedia_index_data1.json
 => indexer/wikipedia_combining_index_data.json} (58%)





[GitHub] [druid] lgtm-com[bot] commented on pull request #10304: WIP: Add vectorization for druid-histogram extension

2020-08-20 Thread GitBox


lgtm-com[bot] commented on pull request #10304:
URL: https://github.com/apache/druid/pull/10304#issuecomment-677723556


   This pull request **fixes 1 alert** when merging 
55401cde3715db3baec2d617cbcef7131713fee2 into 
b36dab0fe65557e4aaf675423d61bdaa51501a71 - [view on 
LGTM.com](https://lgtm.com/projects/g/apache/druid/rev/pr-f7d906ed2de4ea503e82cd3e9e84842531b0b843)
   
   **fixed alerts:**
   
   * 1 for Boxed variable is never null






[GitHub] [druid] badalgeek commented on pull request #9579: Add Apache Ranger Authorization

2020-08-20 Thread GitBox


badalgeek commented on pull request #9579:
URL: https://github.com/apache/druid/pull/9579#issuecomment-677720282


@bolkedebruin 
   Could you please provide ranger-servicedef-druid.json, as specified in the documentation?






[GitHub] [druid] abhishekagarwal87 commented on a change in pull request #10304: WIP: Add vectorization for druid-histogram extension

2020-08-20 Thread GitBox


abhishekagarwal87 commented on a change in pull request #10304:
URL: https://github.com/apache/druid/pull/10304#discussion_r474040705



##
File path: 
extensions-core/histogram/src/main/java/org/apache/druid/query/aggregation/histogram/FixedBucketsHistogramBufferAggregatorInternal.java
##
@@ -0,0 +1,104 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.histogram;
+
+import org.apache.druid.common.config.NullHandling;
+
+import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
+
+public class FixedBucketsHistogramBufferAggregatorInternal
+{
+  private final double lowerLimit;
+  private final double upperLimit;
+  private final int numBuckets;
+  private final FixedBucketsHistogram.OutlierHandlingMode outlierHandlingMode;
+
+  public FixedBucketsHistogramBufferAggregatorInternal(
+  double lowerLimit,
+  double upperLimit,
+  int numBuckets,
+  FixedBucketsHistogram.OutlierHandlingMode outlierHandlingMode
+  )
+  {
+
+this.lowerLimit = lowerLimit;
+this.upperLimit = upperLimit;
+this.numBuckets = numBuckets;
+this.outlierHandlingMode = outlierHandlingMode;
+  }
+
+  public void init(ByteBuffer buf, int position)
+  {
+ByteBuffer mutationBuffer = buf.duplicate();
+mutationBuffer.position(position);
+FixedBucketsHistogram histogram = new FixedBucketsHistogram(
+lowerLimit,
+upperLimit,
+numBuckets,
+outlierHandlingMode
+);
+mutationBuffer.put(histogram.toBytesFull(false));
+  }
+
+  public void aggregate(ByteBuffer buf, int position, @Nullable Object val)
+  {
+ByteBuffer mutationBuffer = buf.duplicate();
+mutationBuffer.position(position);
+
+FixedBucketsHistogram h0 = 
FixedBucketsHistogram.fromByteBufferFullNoSerdeHeader(mutationBuffer);
+combine(h0, val);
+
+mutationBuffer.position(position);
+mutationBuffer.put(h0.toBytesFull(false));
+  }
+
+  public void combine(FixedBucketsHistogram histogram, @Nullable Object next)
+  {
+if (next == null) {
+  if (NullHandling.replaceWithDefault()) {

Review comment:
   I might have carried forward a bug here. This if/else should most likely 
be inverted. cc @jon-wei 
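    The two null-handling modes in question can be sketched like this (a hypothetical Python model of the semantics, not the extension's actual code; `aggregate_value` is an invented name):

```python
def aggregate_value(add, value, replace_with_default):
    # Hypothetical model of the two null-handling modes discussed above:
    # in "replace with default" (non-SQL-compatible) mode a null input is
    # coerced to 0.0 and aggregated; in SQL-compatible mode it is skipped.
    if value is None:
        if replace_with_default:
            add(0.0)
        return
    add(float(value))
```

    Inverting the inner branch would silently drop nulls in default mode and count them in SQL-compatible mode, which is the kind of bug the comment above suspects.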








[GitHub] [druid] abhishekagarwal87 opened a new pull request #10304: WIP: Add vectorization for druid-histogram extension

2020-08-20 Thread GitBox


abhishekagarwal87 opened a new pull request #10304:
URL: https://github.com/apache/druid/pull/10304


   
   ### Description
   
   This PR adds vectorization support for the aggregators in the `druid-histogram` 
extension. While these changes are unlikely to result in the use of SIMD 
instructions, they can still improve performance in two ways I can think of:
 - Being more cache-friendly and making fewer function calls
 - Enabling vectorization for the whole query when one of the participating 
aggregators is `Approximate Histogram`, assuming the other aggregators in the 
query support vectorization. 
   
   I have yet to add more tests, hence the PR is in draft stage. 
   
   
   
   This PR has:
   - [ ] been self-reviewed.
  - [ ] using the [concurrency 
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
 (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in 
[licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml)
   - [ ] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code 
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
 is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   
   
   
   
   
   # Key changed/added classes in this PR
* `MyFoo`
* `OurBar`
* `TheirBaz`
   






[GitHub] [druid] clintropolis commented on a change in pull request #10279: Add SQL "OFFSET" clause.

2020-08-20 Thread GitBox


clintropolis commented on a change in pull request #10279:
URL: https://github.com/apache/druid/pull/10279#discussion_r473853234



##
File path: sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidQuery.java
##
@@ -762,9 +756,20 @@ public TimeseriesQuery toTimeseriesQuery()
   
Iterables.getOnlyElement(grouping.getDimensions()).toDimensionSpec().getOutputName()
   );
   if (sorting != null) {
-// If there is sorting, set timeseriesLimit to given value if less 
than Integer.Max_VALUE
-if (sorting.isLimited()) {
-  timeseriesLimit = Ints.checkedCast(sorting.getLimit());
+if (sorting.getOffsetLimit().hasOffset()) {
+  // Timeseries cannot handle offsets.
+  return null;
+}
+
+if (sorting.getOffsetLimit().hasLimit()) {
+  final long limit = sorting.getOffsetLimit().getLimit();
+
+  if (limit == 0) {

Review comment:
Ah, I wasn't thinking it through; I guess you still need to check 
`hasLimit` for the non-zero case.
   
   For timeseries though, it looks like `timeseriesLimit` is initialized to 0, 
so is returning null here changing behavior? I guess a zero here is an explicit 
zero, meaning unlimited probably wasn't the expected response?








[GitHub] [druid] clintropolis commented on a change in pull request #10279: Add SQL "OFFSET" clause.

2020-08-20 Thread GitBox


clintropolis commented on a change in pull request #10279:
URL: https://github.com/apache/druid/pull/10279#discussion_r473832613



##
File path: sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidQuery.java
##
@@ -762,9 +756,20 @@ public TimeseriesQuery toTimeseriesQuery()
   
Iterables.getOnlyElement(grouping.getDimensions()).toDimensionSpec().getOutputName()
   );
   if (sorting != null) {
-// If there is sorting, set timeseriesLimit to given value if less 
than Integer.Max_VALUE
-if (sorting.isLimited()) {
-  timeseriesLimit = Ints.checkedCast(sorting.getLimit());
+if (sorting.getOffsetLimit().hasOffset()) {
+  // Timeseries cannot handle offsets.
+  return null;
+}
+
+if (sorting.getOffsetLimit().hasLimit()) {
+  final long limit = sorting.getOffsetLimit().getLimit();
+
+  if (limit == 0) {

Review comment:
Would it be worth pushing a method into `OffsetLimit` that checks whether 
the limit is greater than 0, for timeseries/groupBy/scan to use instead of 
`hasLimit`?








[GitHub] [druid] clintropolis commented on a change in pull request #10243: Add maxNumFiles to splitHintSpec

2020-08-20 Thread GitBox


clintropolis commented on a change in pull request #10243:
URL: https://github.com/apache/druid/pull/10243#discussion_r473797085



##
File path: 
core/src/main/java/org/apache/druid/data/input/MaxSizeSplitHintSpec.java
##
@@ -43,22 +45,55 @@
   public static final String TYPE = "maxSize";
 
   @VisibleForTesting
-  static final long DEFAULT_MAX_SPLIT_SIZE = 512 * 1024 * 1024;
+  static final HumanReadableBytes DEFAULT_MAX_SPLIT_SIZE = new 
HumanReadableBytes("1GiB");
 
-  private final long maxSplitSize;
+  /**
+   * There are two known issues when a split contains a large list of files.
+   *
+   * - 'jute.maxbuffer' in ZooKeeper. This system property controls the max 
size of ZNode. As its default is 500KB,
+   *   task allocation can fail if the serialized ingestion spec is larger 
than this limit.
+   * - 'max_allowed_packet' in MySQL. This is the max size of a communication 
packet sent to a MySQL server.
+   *   The default is either 64MB or 4MB depending on MySQL version. Updating 
metadata store can fail if the serialized
+   *   ingestion spec is larger than this limit.
+   *
+   * The default is conservatively chosen as 1000.
+   */
+  @VisibleForTesting
+  static final int DEFAULT_MAX_NUM_FILES = 1000;
+
+  private final HumanReadableBytes maxSplitSize;
+  private final int maxNumFiles;
 
   @JsonCreator
-  public MaxSizeSplitHintSpec(@JsonProperty("maxSplitSize") @Nullable Long 
maxSplitSize)
+  public MaxSizeSplitHintSpec(
+  @JsonProperty("maxSplitSize") @Nullable HumanReadableBytes maxSplitSize,
+  @JsonProperty("maxNumFiles") @Nullable Integer maxNumFiles
+  )
   {
 this.maxSplitSize = maxSplitSize == null ? DEFAULT_MAX_SPLIT_SIZE : 
maxSplitSize;
+this.maxNumFiles = maxNumFiles == null ? DEFAULT_MAX_NUM_FILES : 
maxNumFiles;
+Preconditions.checkArgument(this.maxSplitSize.getBytes() > 0, 
"maxSplitSize should be larger than 0");
+Preconditions.checkArgument(this.maxNumFiles > 0, "maxNumFiles should be 
larger than 0");
+  }
+
+  @VisibleForTesting
+  public MaxSizeSplitHintSpec(long maxSplitSize, @Nullable Integer maxNumFiles)
+  {
+this(new HumanReadableBytes(maxSplitSize), maxNumFiles);
   }
 
   @JsonProperty
-  public long getMaxSplitSize()
+  public HumanReadableBytes getMaxSplitSize()

Review comment:
   Should `SegmentsSplitHintSpec` be updated to accept `HumanReadableBytes` 
for `maxInputSegmentBytesPerTask` as well?








[GitHub] [druid] jihoonson commented on a change in pull request #10243: Add maxNumFiles to splitHintSpec

2020-08-20 Thread GitBox


jihoonson commented on a change in pull request #10243:
URL: https://github.com/apache/druid/pull/10243#discussion_r473691192



##
File path: 
core/src/main/java/org/apache/druid/data/input/MaxSizeSplitHintSpec.java
##
@@ -43,22 +45,55 @@
   public static final String TYPE = "maxSize";
 
   @VisibleForTesting
-  static final long DEFAULT_MAX_SPLIT_SIZE = 512 * 1024 * 1024;
+  static final HumanReadableBytes DEFAULT_MAX_SPLIT_SIZE = new 
HumanReadableBytes("1GiB");
 
-  private final long maxSplitSize;
+  /**
+   * There are two known issues when a split contains a large list of files.
+   *
+   * - 'jute.maxbuffer' in ZooKeeper. This system property controls the max 
size of ZNode. As its default is 500KB,
+   *   task allocation can fail if the serialized ingestion spec is larger 
than this limit.
+   * - 'max_allowed_packet' in MySQL. This is the max size of a communication 
packet sent to a MySQL server.
+   *   The default is either 64MB or 4MB depending on MySQL version. Updating 
metadata store can fail if the serialized
+   *   ingestion spec is larger than this limit.
+   *
+   * The default is consertively chosen as 1000.

Review comment:
   Oops, thanks.

##
File path: docs/ingestion/native-batch.md
##
@@ -232,7 +232,8 @@ The size-based split hint spec is respected by all 
splittable input sources exce
 |property|description|default|required?|
 ||---|---|-|
 |type|This should always be `maxSize`.|none|yes|
-|maxSplitSize|Maximum number of bytes of input files to process in a single 
task. If a single file is larger than this number, it will be processed by 
itself in a single task (Files are never split across tasks yet).|500MB|no|
+|maxSplitSize|Maximum number of bytes of input files to process in a single 
task. If a single file is larger than this number, it will be processed by 
itself in a single task (Files are never split across tasks yet). Noe that one 
subtask will not process more files than `maxNumFiles` even if their total size 
is smaller than `maxSplitSize`. [Human-readable 
format](../configuration/human-readable-byte.md) is supported.|1GiB|no|

Review comment:
   Thanks, fixed.

##
File path: docs/ingestion/native-batch.md
##
@@ -232,7 +232,8 @@ The size-based split hint spec is respected by all 
splittable input sources exce
 |property|description|default|required?|
 ||---|---|-|
 |type|This should always be `maxSize`.|none|yes|
-|maxSplitSize|Maximum number of bytes of input files to process in a single 
task. If a single file is larger than this number, it will be processed by 
itself in a single task (Files are never split across tasks yet).|500MB|no|
+|maxSplitSize|Maximum number of bytes of input files to process in a single 
task. If a single file is larger than this number, it will be processed by 
itself in a single task (Files are never split across tasks yet). Noe that one 
subtask will not process more files than `maxNumFiles` even if their total size 
is smaller than `maxSplitSize`. [Human-readable 
format](../configuration/human-readable-byte.md) is supported.|1GiB|no|
+|maxNumFiles|Maximum number of input files to process in a single task. This 
limit is to avoid task failures when the ingestion spec is too long. There are 
two known limits on the max size of serialized ingestion spec, i.e., the max 
ZNode size in ZooKeeper (`jute.maxbuffer`) and the max packet size in MySQL 
(`max_allowed_packet`). These can make ingestion tasks fail if the serialized 
ingestion spec size hits one of them. Note that one subtask will not process 
more data than `maxSplitSize` even if the total number of files is smaller than 
`maxNumFiles`.|1000|no|

Review comment:
Hmm, it is applied only to subtasks. I changed `single task` to `single 
subtask`. Is that better?




