Re: [I] Upsert table backfill enhancement: support externally partitioned data [pinot]

2024-04-24 Thread via GitHub



rohityadav1993 commented on issue #12987:
URL: https://github.com/apache/pinot/issues/12987#issuecomment-2075075108

   Another approach that I believe could be used is to define a naming 
convention for uploaded segments, similar to LLC, so that the segment name 
captures the partition id. We already have a segment type UPLOADED, and 
`SegmentPartitionMetadataManager#getPartitionId` can be enhanced to extract 
the partition id from the name. 
   This would not require any changes to existing contracts or ZK metadata.
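
   A minimal sketch of how such a name could be parsed, to make the idea concrete. The `uploaded__<tableName>__<partitionId>__<creationTime>` layout, the class name and the method name are all assumptions made up for illustration; the comment above does not specify a format, and this is not the existing `SegmentPartitionMetadataManager` code.

```java
import java.util.OptionalInt;

// Hypothetical naming convention (an assumption, not an existing Pinot contract):
//   uploaded__<tableName>__<partitionId>__<creationTime>
final class UploadedSegmentNameSketch {
    private static final String SEPARATOR = "__";
    private static final String PREFIX = "uploaded";

    // Returns the partition id encoded in the name, or empty if the name
    // does not follow the hypothetical convention.
    static OptionalInt partitionIdOf(String segmentName) {
        String[] parts = segmentName.split(SEPARATOR);
        if (parts.length != 4 || !PREFIX.equals(parts[0])) {
            return OptionalInt.empty();
        }
        try {
            int partitionId = Integer.parseInt(parts[2]);
            return partitionId >= 0 ? OptionalInt.of(partitionId) : OptionalInt.empty();
        } catch (NumberFormatException e) {
            // Third field is not a number, so the name is not a partitioned upload.
            return OptionalInt.empty();
        }
    }
}
```

   Names that do not match return an empty result, so segments uploaded without the convention could still fall back to whatever lookup is used today.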


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Re: [I] Upsert table backfill enhancement: support externally partitioned data [pinot]

2024-04-23 Thread via GitHub



tibrewalpratik17 commented on issue #12987:
URL: https://github.com/apache/pinot/issues/12987#issuecomment-2071920309

   > Provide partition id externally:
   > Option 1: Provide partition id as http headers during segment upload
   > Option 2: Provide partition id as part of uploaded segment metadata (not as 
   > columnPartitionMap) (metadata.properties)
   
   IMO if we go with option 2, then for consistency we should also add or update 
this metadata for all existing segments. Option 1 is better in that respect, 
as we already pass a lot of info as headers during segment upload and treat each 
header more like a config.
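
   For illustration, Option 1 could look roughly like the following on the client side. This is only a sketch using the JDK HTTP client: the header name `PARTITION_ID` is hypothetical (nothing in this thread names it), and the upload URI is passed in by the caller rather than assumed.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class UploadRequestSketch {
    // Hypothetical header name; the issue does not define one.
    static final String PARTITION_ID_HEADER = "PARTITION_ID";

    // Builds (but does not send) a segment upload request that carries the
    // externally computed partition id as a header.
    static HttpRequest build(URI uploadUri, int partitionId, byte[] segmentTarBytes) {
        return HttpRequest.newBuilder(uploadUri)
            .header(PARTITION_ID_HEADER, Integer.toString(partitionId))
            .POST(HttpRequest.BodyPublishers.ofByteArray(segmentTarBytes))
            .build();
    }
}
```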



Re: [I] Upsert table backfill enhancement: support externally partitioned data [pinot]

2024-04-22 Thread via GitHub



Jackie-Jiang commented on issue #12987:
URL: https://github.com/apache/pinot/issues/12987#issuecomment-2071013541

   For real-time ingested data, the partition must match the upstream partition 
id to preserve the upsert assumption that all data of the same partition is 
served by the same server, and I don't think we can loosen this requirement.
   
   `Partition function` is required for partition pruning. If partition pruning 
is not required, then we may allow a custom partition id without a partition 
function.
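
   A small sketch of why pruning depends on the partition function: to skip a segment for an equality filter, the partition of the filter value has to be recomputed and compared with the segment's partition id. The modulo function and all names below are illustrative assumptions, not Pinot's implementation.

```java
final class PartitionPruningSketch {
    // Modulo-style partition function: maps a key to [0, numPartitions).
    // floorMod keeps the result non-negative for negative keys.
    static int partitionOf(long key, int numPartitions) {
        return (int) Math.floorMod(key, (long) numPartitions);
    }

    // A segment can be skipped for "partitionColumn = filterKey" only if we can
    // compute the partition of filterKey and it differs from the segment's.
    static boolean canPrune(int segmentPartitionId, long filterKey, int numPartitions) {
        return partitionOf(filterKey, numPartitions) != segmentPartitionId;
    }
}
```

   With an externally assigned partition id and no known function, there is nothing to put in place of `partitionOf`, so every segment would have to be queried.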



[I] Upsert table backfill enhancement: support externally partitioned data [pinot]

2024-04-22 Thread via GitHub

rohityadav1993 opened a new issue, #12987:
URL: https://github.com/apache/pinot/issues/12987

## Problem
#6567 allows uploading a batch-generated segment to a Pinot upsert realtime
table. Partitioned data is handled by defining the partition column in
`segmentPartitionConfig.columnPartitionMap`. The addSegment flow uses this
config to read the column's partition value from metadata.properties of
the uploaded segment, and then assigns the segment to an instance based on the
partition id and the instance assignment ZK metadata.
This imposes two restrictions: the partition column must be a
table column (part of the primary key), and the data must be partitioned
with one of the configured [partition
function](https://github.com/apache/pinot/blob/a5c728f549fe1be5560a88080caaa2063def3d87/pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/partition/PartitionFunctionFactory.java#L31)
implementations when the segment is created in the batch job using
`SegmentIndexCreationDriverImpl`.

This restriction does not apply to stream-generated segments created during
realtime ingestion. There, the partition id is identified using the LLC segment
name convention. The stream is partitioned externally, and the Pinot table does
not need to be aware of the partition column or function.

## Proposal
Enhance the `SegmentAssignment.assignSegment()` flow to rely on an
externally provided partition id for an uploaded segment.

1. Need to persist the partition id as part of segment zk metadata.
2. Modify the `StrictRealtimeSegmentAssignment.assignSegment` to get
   partition id from zk metadata.
3. Provide partition id externally:
   1. Option 1: Provide partition id as http headers during segment upload
   2. Option 2: Provide partition id as part of uploaded segment
      metadata (not as columnPartitionMap) (metadata.properties)
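
Steps 1 and 2 could be sketched roughly as below. This is a simplified illustration under stated assumptions, not the actual `StrictRealtimeSegmentAssignment` code: the ZK metadata is modelled as a plain map, the key `segment.partition.id` is hypothetical, and the instance assignment is reduced to a partition-to-servers lookup.

```java
import java.util.List;
import java.util.Map;

final class ExternalPartitionAssignmentSketch {
    // Hypothetical ZK metadata key; the proposal does not name the field.
    static final String PARTITION_ID_KEY = "segment.partition.id";

    // Picks the servers for an uploaded segment from the partition id that was
    // persisted in its ZK metadata, instead of deriving it from column metadata.
    static List<String> assignSegment(Map<String, String> segmentZkMetadata,
                                      Map<Integer, List<String>> instancesByPartition) {
        String raw = segmentZkMetadata.get(PARTITION_ID_KEY);
        if (raw == null) {
            throw new IllegalStateException("No externally provided partition id in ZK metadata");
        }
        int partitionId = Integer.parseInt(raw);
        List<String> instances = instancesByPartition.get(partitionId);
        if (instances == null || instances.isEmpty()) {
            throw new IllegalStateException("No instances assigned to partition " + partitionId);
        }
        return instances;
    }
}
```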

Related: #10896, #11914

