soumilshah1995 opened a new issue, #11705:
URL: https://github.com/apache/hudi/issues/11705
Hi everyone,
I’m working with a PostgreSQL table that uses a hash-based partitioning
strategy. Here is the setup:
```
-- Create the main sales table
CREATE TABLE public.sales
(
    salesid SERIAL,
    invoiceid integer,
    itemid integer,
    category text,
    price numeric(10,2),
    quantity integer,
    orderdate date,
    destinationstate text,
    shippingtype text,
    referral text,
    updated_at TIMESTAMP DEFAULT NOW(),
    -- The partitioning column must be included in the primary key
    PRIMARY KEY (salesid, invoiceid)
) PARTITION BY HASH (invoiceid);

-- Create partitions
CREATE TABLE public.sales_part_0 PARTITION OF public.sales FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE public.sales_part_1 PARTITION OF public.sales FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE public.sales_part_2 PARTITION OF public.sales FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE public.sales_part_3 PARTITION OF public.sales FOR VALUES WITH (MODULUS 4, REMAINDER 3);

-- Insert sample data
INSERT INTO public.sales (invoiceid, itemid, category, price, quantity, orderdate, destinationstate, shippingtype, referral)
VALUES
    (101, 1, 'Electronics', 599.99, 2, '2023-11-21', 'California', 'Express', 'Friend'),
    (102, 3, 'Clothing', 49.99, 5, '2023-11-22', 'New York', 'Standard', 'OnlineAd'),
    (103, 2, 'Home & Garden', 199.50, 1, '2023-11-23', 'Texas', 'Express', 'WordOfMouth'),
    (104, 4, 'Books', 15.75, 3, '2023-11-24', 'Florida', 'Standard', 'SocialMedia'),
    (105, 2, 'Home & Garden', 199.50, 1, '2023-11-23', 'Texas', 'Express', 'WordOfMouth');
```
I have set up a Debezium connector with the following configuration:
```
name=PostgresConnector
connector.class=io.debezium.connector.postgresql.PostgresConnector
tasks.max=1
database.user=hive
database.dbname=metastore
database.hostname=metastore_db
database.password=hive
database.server.name=hive
table.include.list=public.sales
database.port=5432
plugin.name=pgoutput
tombstones.on.delete=false
```
This configuration creates the following topics:

```
hive.public.sales
hive.public.sales_part_0
hive.public.sales_part_1
hive.public.sales_part_2
hive.public.sales_part_3
```
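(A side note on why the per-partition topics appear: with the `pgoutput` plugin, logical decoding reports changes against the leaf partitions by default, so each `sales_part_N` gets its own topic. If a single `hive.public.sales` stream is preferred, PostgreSQL 13+ publications can route changes through the partitioned parent instead — a sketch, assuming a manually managed publication; the publication name `dbz_publication` and the `publication.autocreate.mode=disabled` connector setting are illustrative:

```sql
-- Assumption: PostgreSQL 13+, publication managed by hand rather than
-- auto-created by the connector (publication.autocreate.mode=disabled).
-- With publish_via_partition_root, decoded changes are attributed to the
-- partitioned parent table, so Debezium would emit a single
-- hive.public.sales topic instead of one topic per leaf partition.
CREATE PUBLICATION dbz_publication FOR TABLE public.sales
    WITH (publish_via_partition_root = true);
```

The connector would then point at it via `publication.name=dbz_publication`.)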
I am looking for recommendations on how to set up DeltaStreamer jobs for
each of these partition topics. What is the best approach to handle this setup?
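For context, here is a minimal sketch of the properties file one per-topic DeltaStreamer job might use, assuming Hudi's Debezium source and a Confluent schema registry; the registry URL, bootstrap servers, and file name are placeholders, not values from this setup:

```
# sales_part_0.properties -- hypothetical per-topic DeltaStreamer config
hoodie.deltastreamer.source.kafka.topic=hive.public.sales_part_0
hoodie.deltastreamer.schemaprovider.registry.url=http://schema-registry:8081/subjects/hive.public.sales_part_0-value/versions/latest
# Record key mirrors the PostgreSQL primary key; precombine on updated_at
hoodie.datasource.write.recordkey.field=salesid,invoiceid
hoodie.datasource.write.precombine.field=updated_at
bootstrap.servers=kafka:9092
auto.offset.reset=earliest
```

Each topic would get its own copy of this file plus its own target table and base path, launched with `org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer` using `--source-class org.apache.hudi.utilities.sources.debezium.PostgresDebeziumSource`, `--payload-class org.apache.hudi.common.model.debezium.PostgresDebeziumAvroPayload`, and `--source-ordering-field _event_lsn`, as described in Hudi's Debezium CDC documentation — though I'd welcome guidance on whether one job per topic is the right pattern here at all.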
Thank you!