[GitHub] [druid] abhishekagarwal87 commented on a change in pull request #10366: Proposed changes for making joins cacheable

2020-09-14 Thread GitBox


abhishekagarwal87 commented on a change in pull request #10366:
URL: https://github.com/apache/druid/pull/10366#discussion_r487837336



##
File path: 
server/src/main/java/org/apache/druid/query/ResultLevelCachingQueryRunner.java
##
@@ -86,7 +92,15 @@ public ResultLevelCachingQueryRunner(
 if (useResultCache || populateResultCache) {
 
   final String cacheKeyStr = 
StringUtils.fromUtf8(strategy.computeResultLevelCacheKey(query));
-  final byte[] cachedResultSet = 
fetchResultsFromResultLevelCache(cacheKeyStr);
+  DataSourceAnalysis analysis = 
DataSourceAnalysis.forDataSource(query.getDataSource());
+  byte[] dataSourceCacheKey = 
Joinables.computeDataSourceCacheKey(analysis, joinableFactory).orElse(null);
+  if (null == dataSourceCacheKey) {
+return baseRunner.run(
+queryPlus,
+responseContext
+);
+  }
+  final byte[] cachedResultSet = 
fetchResultsFromResultLevelCache(cacheKeyStr, dataSourceCacheKey);

Review comment:
   earlier `cacheKeyStr` was being used as namespace as well as the key 
which was just like unnecessary duplication. Internally, the namespace and key 
are composed together into another bigger key. By the way, this too would cause 
cache bust once on an upgrade. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] abhishekagarwal87 commented on a change in pull request #10366: Proposed changes for making joins cacheable

2020-09-14 Thread GitBox


abhishekagarwal87 commented on a change in pull request #10366:
URL: https://github.com/apache/druid/pull/10366#discussion_r487835902



##
File path: 
processing/src/main/java/org/apache/druid/segment/join/table/BroadcastSegmentIndexedTable.java
##
@@ -241,6 +244,22 @@ public ColumnSelectorFactory 
makeColumnSelectorFactory(ReadableOffset offset, bo
 );
   }
 
+  @Override
+  public Optional computeCacheKey()
+  {
+SegmentId segmentId = segment.getId();

Review comment:
   I wanted to use `CacheUtil` here but that is in `server` module and not 
accessible here. May be I can move that class to `core`?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] abhishekagarwal87 commented on a change in pull request #10366: Proposed changes for making joins cacheable

2020-09-14 Thread GitBox


abhishekagarwal87 commented on a change in pull request #10366:
URL: https://github.com/apache/druid/pull/10366#discussion_r487819771



##
File path: processing/src/main/java/org/apache/druid/segment/join/Joinables.java
##
@@ -118,6 +126,47 @@ public static boolean isPrefixedBy(final String 
columnName, final String prefix)
 );
   }
 
+  /**
+   * Compute a cache key prefix for data sources that is to be used in segment 
level and result level caches. The
+   * data source can either be base (clauses is empty) or RHS of a join 
(clauses is non-empty). In both of the cases,
+   * a non-null cache is returned. However, the cache key is null if there is 
a join and some of the right data sources
+   * participating in the join do not support caching yet
+   *
+   * @param dataSourceAnalysis
+   * @param joinableFactory
+   * @return
+   */
+  public static Optional computeDataSourceCacheKey(
+  final DataSourceAnalysis dataSourceAnalysis,
+  final JoinableFactory joinableFactory
+  )
+  {
+final CacheKeyBuilder keyBuilder;
+final List clauses = 
dataSourceAnalysis.getPreJoinableClauses();
+if (clauses.isEmpty()) {
+  keyBuilder = new CacheKeyBuilder(REGULAR_OPERATION);

Review comment:
   You are right. I was thinking that it will only be one time bust. And, 
that having a distinct prefix will eliminate the possibility of a cache key 
collision. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] abhishekagarwal87 commented on a change in pull request #10366: Proposed changes for making joins cacheable

2020-09-14 Thread GitBox


abhishekagarwal87 commented on a change in pull request #10366:
URL: https://github.com/apache/druid/pull/10366#discussion_r487818427



##
File path: 
server/src/main/java/org/apache/druid/server/coordination/ServerManager.java
##
@@ -293,21 +302,28 @@ public ServerManager(
 queryMetrics -> queryMetrics.segment(segmentIdString)
 );
 
-CachingQueryRunner cachingQueryRunner = new CachingQueryRunner<>(
-segmentIdString,
-segmentDescriptor,
-objectMapper,
-cache,
-toolChest,
-metricsEmittingQueryRunnerInner,
-cachePopulator,
-cacheConfig
-);
+QueryRunner queryRunner;

Review comment:
   I was thinking that less places worry about the different meanings of 
`cacheKeyPrefix` content, the better it is. So for CachingQueryRunner, 
`cacheKeyPrefix` is just a plain dumb byte array 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] abhishekagarwal87 commented on a change in pull request #10366: Proposed changes for making joins cacheable

2020-09-14 Thread GitBox


abhishekagarwal87 commented on a change in pull request #10366:
URL: https://github.com/apache/druid/pull/10366#discussion_r48760



##
File path: 
server/src/main/java/org/apache/druid/segment/join/BroadcastTableJoinableFactory.java
##
@@ -76,4 +76,27 @@ public boolean isDirectlyJoinable(DataSource dataSource)
 }
 return Optional.empty();
   }
+
+  @Override
+  public Optional computeJoinCacheKey(DataSource dataSource)

Review comment:
   Yes. good point. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] abhishekagarwal87 commented on a change in pull request #10366: Proposed changes for making joins cacheable

2020-09-14 Thread GitBox


abhishekagarwal87 commented on a change in pull request #10366:
URL: https://github.com/apache/druid/pull/10366#discussion_r487699662



##
File path: 
server/src/main/java/org/apache/druid/query/ResultLevelCachingQueryRunner.java
##
@@ -86,7 +92,15 @@ public ResultLevelCachingQueryRunner(
 if (useResultCache || populateResultCache) {
 
   final String cacheKeyStr = 
StringUtils.fromUtf8(strategy.computeResultLevelCacheKey(query));
-  final byte[] cachedResultSet = 
fetchResultsFromResultLevelCache(cacheKeyStr);
+  DataSourceAnalysis analysis = 
DataSourceAnalysis.forDataSource(query.getDataSource());
+  byte[] dataSourceCacheKey = 
Joinables.computeDataSourceCacheKey(analysis, joinableFactory).orElse(null);
+  if (null == dataSourceCacheKey) {
+return baseRunner.run(
+queryPlus,
+responseContext
+);
+  }
+  final byte[] cachedResultSet = 
fetchResultsFromResultLevelCache(cacheKeyStr, dataSourceCacheKey);

Review comment:
   yeah. this is not well thought out right now. I will revisit this. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] abhishekagarwal87 commented on a change in pull request #10366: Proposed changes for making joins cacheable

2020-09-14 Thread GitBox


abhishekagarwal87 commented on a change in pull request #10366:
URL: https://github.com/apache/druid/pull/10366#discussion_r487699118



##
File path: 
processing/src/main/java/org/apache/druid/segment/join/table/BroadcastSegmentIndexedTable.java
##
@@ -241,6 +244,22 @@ public ColumnSelectorFactory 
makeColumnSelectorFactory(ReadableOffset offset, bo
 );
   }
 
+  @Override
+  public Optional computeCacheKey()
+  {
+SegmentId segmentId = segment.getId();

Review comment:
   are you suggesting to move this logic to `SegmentId` class? 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] abhishekagarwal87 commented on a change in pull request #10366: Proposed changes for making joins cacheable

2020-09-14 Thread GitBox


abhishekagarwal87 commented on a change in pull request #10366:
URL: https://github.com/apache/druid/pull/10366#discussion_r487698508



##
File path: 
processing/src/main/java/org/apache/druid/segment/join/MapJoinableFactory.java
##
@@ -80,4 +80,21 @@ public boolean isDirectlyJoinable(DataSource dataSource)
 }
 return maybeJoinable;
   }
+
+  @Override
+  public Optional computeJoinCacheKey(DataSource dataSource)

Review comment:
   Right now as I see that additional info comes from the data source 
itself. The joinable factory is deciding based on the `dataSource` whether it 
can build the joinable factory.  can you share any hypothetical examples 
wherein one factory decides to build Joinable but the other one does not on the 
basis of `JoinConditionAnalysis`?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] abhishekagarwal87 commented on a change in pull request #10366: Proposed changes for making joins cacheable

2020-09-14 Thread GitBox


abhishekagarwal87 commented on a change in pull request #10366:
URL: https://github.com/apache/druid/pull/10366#discussion_r487695326



##
File path: processing/src/main/java/org/apache/druid/segment/join/Joinables.java
##
@@ -118,6 +126,47 @@ public static boolean isPrefixedBy(final String 
columnName, final String prefix)
 );
   }
 
+  /**
+   * Compute a cache key prefix for data sources that is to be used in segment 
level and result level caches. The
+   * data source can either be base (clauses is empty) or RHS of a join 
(clauses is non-empty). In both of the cases,
+   * a non-null cache is returned. However, the cache key is null if there is 
a join and some of the right data sources
+   * participating in the join do not support caching yet
+   *
+   * @param dataSourceAnalysis
+   * @param joinableFactory
+   * @return
+   */
+  public static Optional computeDataSourceCacheKey(
+  final DataSourceAnalysis dataSourceAnalysis,
+  final JoinableFactory joinableFactory
+  )
+  {
+final CacheKeyBuilder keyBuilder;
+final List clauses = 
dataSourceAnalysis.getPreJoinableClauses();
+if (clauses.isEmpty()) {
+  keyBuilder = new CacheKeyBuilder(REGULAR_OPERATION);
+} else {
+  keyBuilder = new CacheKeyBuilder(JOIN_OPERATION);
+  for (PreJoinableClause clause : clauses) {
+if (!clause.getCondition().canHashJoin()) {

Review comment:
   I was thinking to keep it here so that I could just pass the 
`data-source` inside the `computeJoinCacheKey`.  The engine for hash join works 
outside the boundaries of joinable factories. so it felt simpler to just keep 
it here and joinable factories just worry about the data source. The other 
cache info from the condition is derived here in this class as well (original 
expression etc) 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org