Re: [PR] [partial-upsert] configure early release of _partitionGroupConsumerSemaphore in RealtimeSegmentDataManager [pinot]

2024-06-14 Thread via GitHub


klsince merged PR #13256:
URL: https://github.com/apache/pinot/pull/13256


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] [partial-upsert] configure early release of _partitionGroupConsumerSemaphore in RealtimeSegmentDataManager [pinot]

2024-06-14 Thread via GitHub


tibrewalpratik17 commented on code in PR #13256:
URL: https://github.com/apache/pinot/pull/13256#discussion_r1639419684


##
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/upsert/TableUpsertMetadataManagerFactory.java:
##
@@ -61,6 +64,12 @@ public static TableUpsertMetadataManager create(TableConfig 
tableConfig,
 upsertConfig.setEnablePreload(
 
Boolean.parseBoolean(instanceUpsertConfig.getProperty(UPSERT_DEFAULT_ENABLE_PRELOAD,
 "false")));
   }
+
+  // server level config honoured only when table level config is not set 
to true
+  if (!upsertConfig.isAllowPartialUpsertConsumptionDuringCommit()) {
+
upsertConfig.setAllowPartialUpsertConsumptionDuringCommit(Boolean.parseBoolean(
+
instanceUpsertConfig.getProperty(UPSERT_DEFAULT_ALLOW_PARTIAL_UPSERT_CONSUMPTION_DURING_COMMIT,
 "false")));

Review Comment:
   Ah my bad! I see we have changed the behaviour from `pause` to `allow`. Then 
this makes sense! 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] [partial-upsert] configure early release of _partitionGroupConsumerSemaphore in RealtimeSegmentDataManager [pinot]

2024-06-14 Thread via GitHub


tibrewalpratik17 commented on code in PR #13256:
URL: https://github.com/apache/pinot/pull/13256#discussion_r1639381537


##
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/upsert/TableUpsertMetadataManagerFactory.java:
##
@@ -61,6 +64,12 @@ public static TableUpsertMetadataManager create(TableConfig 
tableConfig,
 upsertConfig.setEnablePreload(
 
Boolean.parseBoolean(instanceUpsertConfig.getProperty(UPSERT_DEFAULT_ENABLE_PRELOAD,
 "false")));
   }
+
+  // server level config honoured only when table level config is not set 
to true
+  if (!upsertConfig.isAllowPartialUpsertConsumptionDuringCommit()) {
+
upsertConfig.setAllowPartialUpsertConsumptionDuringCommit(Boolean.parseBoolean(
+
instanceUpsertConfig.getProperty(UPSERT_DEFAULT_ALLOW_PARTIAL_UPSERT_CONSUMPTION_DURING_COMMIT,
 "false")));

Review Comment:
   we wanted to keep this `true` by default right?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] [partial-upsert] configure early release of _partitionGroupConsumerSemaphore in RealtimeSegmentDataManager [pinot]

2024-06-13 Thread via GitHub


rohityadav1993 commented on PR #13256:
URL: https://github.com/apache/pinot/pull/13256#issuecomment-2167355537

   Removed unnecessary changes in the QuickStartRunner.java which got pushed by 
mistake in the previous commit.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] [partial-upsert] configure early release of _partitionGroupConsumerSemaphore in RealtimeSegmentDataManager [pinot]

2024-06-13 Thread via GitHub


rohityadav1993 commented on code in PR #13256:
URL: https://github.com/apache/pinot/pull/13256#discussion_r1638735568


##
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/RealtimeSegmentDataManager.java:
##
@@ -1611,6 +1622,10 @@ public RealtimeSegmentDataManager(SegmentZKMetadata 
segmentZKMetadata, TableConf
   _segmentLogger
   .info("Starting consumption on realtime consuming segment {} 
maxRowCount {} maxEndTime {}", llcSegmentName,
   _segmentMaxRowCount, new DateTime(_consumeEndTime, 
DateTimeZone.UTC));
+  _allowPartialUpsertConsumptionDuringCommit =
+  _realtimeTableDataManager.isPartialUpsertEnabled() ? 
_tableConfig.getUpsertConfig() != null

Review Comment:
   Sounds good. Updated.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] [partial-upsert] configure early release of _partitionGroupConsumerSemaphore in RealtimeSegmentDataManager [pinot]

2024-06-13 Thread via GitHub


klsince commented on code in PR #13256:
URL: https://github.com/apache/pinot/pull/13256#discussion_r1638623922


##
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/RealtimeSegmentDataManager.java:
##
@@ -1611,6 +1622,10 @@ public RealtimeSegmentDataManager(SegmentZKMetadata 
segmentZKMetadata, TableConf
   _segmentLogger
   .info("Starting consumption on realtime consuming segment {} 
maxRowCount {} maxEndTime {}", llcSegmentName,
   _segmentMaxRowCount, new DateTime(_consumeEndTime, 
DateTimeZone.UTC));
+  _allowPartialUpsertConsumptionDuringCommit =
+  _realtimeTableDataManager.isPartialUpsertEnabled() ? 
_tableConfig.getUpsertConfig() != null

Review Comment:
   the config name as `allowPartialUpsertConsumptionDuringCommit` or 
`default.allow.partial.upsert.consumption.during.commit` makes a lot sense and 
it's false by default.
   
   but this variable name `_allowPartialUpsertConsumptionDuringCommit` here 
might be a bit less readable. Here we had to set it true for 
non-paritial-upsert table. I'd call it `_allowConsumptionDuringCommit`, then 
it's more intuitive to leave it as true for non-paritial-upsert table
   
   ```
   _allowConsumptionDuringCommit = 
!_realtimeTableDataManager.isPartialUpsertEnabled()? true : 
_tableConfig.getUpsertConfig().isAllowPartialUpsertConsumptionDuringCommit() 
   // nit: no need to check getUpsertConfig() is not null as 
isPartialUpsertEnabled() is true already.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] [partial-upsert] configure early release of _partitionGroupConsumerSemaphore in RealtimeSegmentDataManager [pinot]

2024-06-05 Thread via GitHub


rohityadav1993 commented on code in PR #13256:
URL: https://github.com/apache/pinot/pull/13256#discussion_r1627386271


##
pinot-spi/src/main/java/org/apache/pinot/spi/config/table/UpsertConfig.java:
##
@@ -94,6 +94,9 @@ public enum ConsistencyMode {
   @JsonPropertyDescription("Whether to drop out-of-order record")
   private boolean _dropOutOfOrderRecord;
 
+  @JsonPropertyDescription("Whether to pause partial upsert table's partition 
ingestion during commit")
+  private boolean _pausePartialUpsertPartitionIngestionDuringCommit;

Review Comment:
   Making default as enabled sounds good. This is a behaviour change, I also 
think it makes sense to make this a cluster level property as well so it can be 
toggled at bulk.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] [partial-upsert] configure early release of _partitionGroupConsumerSemaphore in RealtimeSegmentDataManager [pinot]

2024-06-04 Thread via GitHub


tibrewalpratik17 commented on PR #13256:
URL: https://github.com/apache/pinot/pull/13256#issuecomment-2148085192

   Suggest to add `release-notes` label to this! 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] [partial-upsert] configure early release of _partitionGroupConsumerSemaphore in RealtimeSegmentDataManager [pinot]

2024-06-04 Thread via GitHub


tibrewalpratik17 commented on code in PR #13256:
URL: https://github.com/apache/pinot/pull/13256#discussion_r1626401266


##
pinot-spi/src/main/java/org/apache/pinot/spi/config/table/UpsertConfig.java:
##
@@ -94,6 +94,9 @@ public enum ConsistencyMode {
   @JsonPropertyDescription("Whether to drop out-of-order record")
   private boolean _dropOutOfOrderRecord;
 
+  @JsonPropertyDescription("Whether to pause partial upsert table's partition 
ingestion during commit")
+  private boolean _pausePartialUpsertPartitionIngestionDuringCommit;

Review Comment:
   +1 on making this the default behaviour! 
   Would also suggest to add a cluster-level and table-level config to enable / 
disable it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] [partial-upsert] configure early release of _partitionGroupConsumerSemaphore in RealtimeSegmentDataManager [pinot]

2024-06-04 Thread via GitHub


Jackie-Jiang commented on code in PR #13256:
URL: https://github.com/apache/pinot/pull/13256#discussion_r1626391555


##
pinot-spi/src/main/java/org/apache/pinot/spi/config/table/UpsertConfig.java:
##
@@ -94,6 +94,9 @@ public enum ConsistencyMode {
   @JsonPropertyDescription("Whether to drop out-of-order record")
   private boolean _dropOutOfOrderRecord;
 
+  @JsonPropertyDescription("Whether to pause partial upsert table's partition 
ingestion during commit")
+  private boolean _pausePartialUpsertPartitionIngestionDuringCommit;

Review Comment:
   cc @tibrewalpratik17 What's your opinion on enabling the new feature by 
default?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] [partial-upsert] configure early release of _partitionGroupConsumerSemaphore in RealtimeSegmentDataManager [pinot]

2024-06-04 Thread via GitHub


Jackie-Jiang commented on code in PR #13256:
URL: https://github.com/apache/pinot/pull/13256#discussion_r1626387944


##
pinot-spi/src/main/java/org/apache/pinot/spi/config/table/UpsertConfig.java:
##
@@ -94,6 +94,9 @@ public enum ConsistencyMode {
   @JsonPropertyDescription("Whether to drop out-of-order record")
   private boolean _dropOutOfOrderRecord;
 
+  @JsonPropertyDescription("Whether to pause partial upsert table's partition 
ingestion during commit")
+  private boolean _pausePartialUpsertPartitionIngestionDuringCommit;

Review Comment:
   I think we should pause consumption by default. Consider renaming it to 
`allowPartialUpsertConsumptionDuringCommit` and reverse the behavior?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] [partial-upsert] configure early release of _partitionGroupConsumerSemaphore in RealtimeSegmentDataManager [pinot]

2024-06-04 Thread via GitHub


klsince commented on code in PR #13256:
URL: https://github.com/apache/pinot/pull/13256#discussion_r1626349679


##
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/RealtimeSegmentDataManager.java:
##
@@ -959,7 +959,13 @@ AtomicBoolean getAcquiredConsumerSemaphore() {
 
   @VisibleForTesting
   SegmentBuildDescriptor buildSegmentInternal(boolean forCommit) {
-closeStreamConsumers();
+// for partial upsert tables, do not release 
_partitionGroupConsumerSemaphore proactively and rely on onDestroy()

Review Comment:
   nit: looks like `offload()` closes the consumers
   
   but anyway it looks right to me that the semaphore should be released in the 
end, when the segment gets replaced



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] [partial-upsert] configure early release of _partitionGroupConsumerSemaphore in RealtimeSegmentDataManager [pinot]

2024-06-03 Thread via GitHub


tibrewalpratik17 commented on code in PR #13256:
URL: https://github.com/apache/pinot/pull/13256#discussion_r1624574643


##
pinot-spi/src/main/java/org/apache/pinot/spi/config/table/UpsertConfig.java:
##
@@ -94,6 +94,9 @@ public enum ConsistencyMode {
   @JsonPropertyDescription("Whether to drop out-of-order record")
   private boolean _dropOutOfOrderRecord;
 
+  @JsonPropertyDescription("Whether to pause partial upsert table's partition 
ingestion during commit")
+  private boolean _pausePartialUpsertPartitionIngestionDuringCommit;
+

Review Comment:
   you can check `BaseTableDataManager#init` it takes in 
`InstanceDataManagerConfig` var



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] [partial-upsert] configure early release of _partitionGroupConsumerSemaphore in RealtimeSegmentDataManager [pinot]

2024-06-03 Thread via GitHub


rohityadav1993 commented on code in PR #13256:
URL: https://github.com/apache/pinot/pull/13256#discussion_r1624569795


##
pinot-spi/src/main/java/org/apache/pinot/spi/config/table/UpsertConfig.java:
##
@@ -94,6 +94,9 @@ public enum ConsistencyMode {
   @JsonPropertyDescription("Whether to drop out-of-order record")
   private boolean _dropOutOfOrderRecord;
 
+  @JsonPropertyDescription("Whether to pause partial upsert table's partition 
ingestion during commit")
+  private boolean _pausePartialUpsertPartitionIngestionDuringCommit;
+

Review Comment:
   The suggestion sounds good, do you know how it is being done for any other 
table config?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



Re: [PR] [partial-upsert] configure early release of _partitionGroupConsumerSemaphore in RealtimeSegmentDataManager [pinot]

2024-06-03 Thread via GitHub


tibrewalpratik17 commented on code in PR #13256:
URL: https://github.com/apache/pinot/pull/13256#discussion_r1623985038


##
pinot-spi/src/main/java/org/apache/pinot/spi/config/table/UpsertConfig.java:
##
@@ -94,6 +94,9 @@ public enum ConsistencyMode {
   @JsonPropertyDescription("Whether to drop out-of-order record")
   private boolean _dropOutOfOrderRecord;
 
+  @JsonPropertyDescription("Whether to pause partial upsert table's partition 
ingestion during commit")
+  private boolean _pausePartialUpsertPartitionIngestionDuringCommit;
+

Review Comment:
   @rohityadav1993 can you also add it as a cluster level config where this 
table-level config will override the cluster-level value? In Uber, we would 
like to roll it out for almost all tables as data-correctness > 
data-ingestion-SLAs. For tables where ingestion lag is in order of 10s of 
minutes there we can make a call to use this table level config.



##
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/RealtimeSegmentDataManager.java:
##
@@ -1316,7 +1322,13 @@ public void goOnlineFromConsuming(SegmentZKMetadata 
segmentZKMetadata)
 
   protected void downloadSegmentAndReplace(SegmentZKMetadata segmentZKMetadata)
   throws Exception {
-closeStreamConsumers();
+// for partial upsert tables, do not release 
_partitionGroupConsumerSemaphore proactively and rely on onDestroy()
+// to release the semaphore. This ensures new consuming segment is not 
consuming until the segment replacement is
+// complete.
+if (!_realtimeTableDataManager.isPartialUpsertEnabled() || 
!_tableConfig.getUpsertConfig()
+.isPausePartialUpsertPartitionIngestionDuringCommit()) {

Review Comment:
   we should check if `tableConfig.getUpsertConfig` is not null as well



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org