[GitHub] [incubator-pinot] npawar commented on a change in pull request #6021: List of partitioners in SegmentProcessorFramework

2020-09-16 Thread GitBox


npawar commented on a change in pull request #6021:
URL: https://github.com/apache/incubator-pinot/pull/6021#discussion_r489672177



##
File path: 
pinot-core/src/main/java/org/apache/pinot/core/segment/processing/framework/SegmentMapper.java
##
@@ -100,8 +110,11 @@ public void map()
   }
 
   // Partitioning
-  // TODO: 2 step partitioner. 1) Apply custom partitioner 2) Apply table config partitioner. Combine both to get final partition.
-  String partition = _partitioner.getPartition(reusableRow);
+  int p = 0;
+  for (Partitioner partitioner : _partitioners) {
+    partitions[p++] = partitioner.getPartition(reusableRow);
+  }
+  String partition = StringUtil.join("_", partitions);

Review comment:
   Practically, for the use case I described, it will be 2. But it need not 
be (there could be more custom logic). Also, the JSON config spec has a List of 
partitions, so I just continued it as a List.
   None of this is set in stone as of now. We will be continuously 
re-evaluating, optimizing and editing this framework as we begin using it (for 
minion, and merge). It is difficult to predict right now, and I prefer not to 
introduce restrictions on the number of partitioners.
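
   To illustrate why the count need not be fixed at 2, here is a minimal, standalone sketch of the loop from the diff. The `Partitioner` interface below is a hypothetical stand-in that takes a plain `Map` row (the framework's own interface and row type are not reproduced here), and `String.join` stands in for `StringUtil.join`. The join step works unchanged for any number of partitioners.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CompositePartitionSketch {
  // Hypothetical stand-in for the framework's Partitioner; rows are plain maps here.
  interface Partitioner {
    String getPartition(Map<String, Object> row);
  }

  // Applies every partitioner in order and joins the results with "_",
  // mirroring the loop in the diff. Nothing here depends on the list size.
  static String compositePartition(List<Partitioner> partitioners, Map<String, Object> row) {
    String[] partitions = new String[partitioners.size()];
    int p = 0;
    for (Partitioner partitioner : partitioners) {
      partitions[p++] = partitioner.getPartition(row);
    }
    return String.join("_", partitions);
  }

  public static void main(String[] args) {
    Map<String, Object> row = new HashMap<>();
    row.put("day", "2020-09-16");
    row.put("region", "emea");
    row.put("id", 42);

    // Three partitioners: the column names and the modulo of 4 are illustrative assumptions.
    List<Partitioner> partitioners = Arrays.<Partitioner>asList(
        r -> String.valueOf(r.get("day")),
        r -> String.valueOf(r.get("region")),
        r -> String.valueOf(Math.floorMod((Integer) r.get("id"), 4)));

    System.out.println(compositePartition(partitioners, row));
  }
}
```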





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] npawar commented on a change in pull request #6021: List of partitioners in SegmentProcessorFramework

2020-09-16 Thread GitBox


npawar commented on a change in pull request #6021:
URL: https://github.com/apache/incubator-pinot/pull/6021#discussion_r489182763



##
File path: 
pinot-core/src/main/java/org/apache/pinot/core/segment/processing/framework/SegmentMapper.java
##
@@ -100,8 +110,11 @@ public void map()
   }
 
   // Partitioning
-  // TODO: 2 step partitioner. 1) Apply custom partitioner 2) Apply table config partitioner. Combine both to get final partition.
-  String partition = _partitioner.getPartition(reusableRow);
+  int p = 0;
+  for (Partitioner partitioner : _partitioners) {
+    partitions[p++] = partitioner.getPartition(reusableRow);
+  }
+  String partition = StringUtil.join("_", partitions);

Review comment:
   Use case: say data in the input segments is spread across 3 days. In the 
resulting segments, we want to create a segment for each day. Additionally, we 
want partitioning on some id column for query purposes.
   
   Partitioning by time column is the first step. This doesn't affect segment 
metadata or broker routing. It is used only by the framework, and its 
scope ends with the framework. It merely helps create date-aligned input 
files for the segment generation stage.
   Partitioning by id column is the second step. This is for queries. It will be 
whatever is in the table config. Only this partition will get set in the 
segment metadata, and even that will happen during segment creation.
   See this comment and discussion: 
https://github.com/apache/incubator-pinot/pull/5934#discussion_r486006754
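
   The two steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the framework's code: rows are reduced to a `long[]{epochMillis, id}` pair, the day bucket is computed as UTC days since epoch, and a plain modulo stands in for whatever partition function the table config specifies. Only the id part corresponds to what would end up in segment metadata; the day prefix exists purely to group rows into date-aligned files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TwoStepPartitionSketch {
  private static final long MILLIS_PER_DAY = 86_400_000L;

  // Step 1 (framework-only): bucket by UTC day so each output file holds a single day.
  static String dayPartition(long epochMillis) {
    return String.valueOf(Math.floorDiv(epochMillis, MILLIS_PER_DAY));
  }

  // Step 2 (query-facing): modulo is an assumed stand-in for the table config partition function.
  static String idPartition(int id, int numPartitions) {
    return String.valueOf(Math.floorMod(id, numPartitions));
  }

  // Groups rows by the combined "<day>_<idPartition>" key, preserving first-seen key order.
  static Map<String, List<long[]>> group(List<long[]> rows, int numPartitions) {
    Map<String, List<long[]>> buckets = new LinkedHashMap<>();
    for (long[] row : rows) {
      String key = dayPartition(row[0]) + "_" + idPartition((int) row[1], numPartitions);
      buckets.computeIfAbsent(key, k -> new ArrayList<>()).add(row);
    }
    return buckets;
  }

  public static void main(String[] args) {
    // Rows spread across 3 days, each row is {epochMillis, id}.
    List<long[]> rows = Arrays.asList(
        new long[] {0L, 1L},
        new long[] {MILLIS_PER_DAY, 2L},
        new long[] {2 * MILLIS_PER_DAY, 3L},
        new long[] {2 * MILLIS_PER_DAY + 500L, 5L});

    group(rows, 2).forEach((key, bucket) ->
        System.out.println(key + " -> " + bucket.size() + " row(s)"));
  }
}
```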








[GitHub] [incubator-pinot] npawar commented on a change in pull request #6021: List of partitioners in SegmentProcessorFramework

2020-09-15 Thread GitBox


npawar commented on a change in pull request #6021:
URL: https://github.com/apache/incubator-pinot/pull/6021#discussion_r489102318



##
File path: 
pinot-core/src/main/java/org/apache/pinot/core/segment/processing/framework/SegmentMapper.java
##
@@ -100,8 +110,11 @@ public void map()
   }
 
   // Partitioning
-  // TODO: 2 step partitioner. 1) Apply custom partitioner 2) Apply table config partitioner. Combine both to get final partition.
-  String partition = _partitioner.getPartition(reusableRow);
+  int p = 0;
+  for (Partitioner partitioner : _partitioners) {
+    partitions[p++] = partitioner.getPartition(reusableRow);
+  }
+  String partition = StringUtil.join("_", partitions);

Review comment:
   Actually, it is not significant at all. It can be changed, and is not 
used by any other components. It won't even matter beyond the scope of that 
joiner line. And hence I don't think it needs to be scoped out of this class, 
or even out of this method.




