soumilshah1995 opened a new issue, #11705:
URL: https://github.com/apache/hudi/issues/11705

   Hi everyone,
   
   I’m working with a PostgreSQL table that uses a hash-based partitioning strategy. Here is the setup:
   
   ```
   -- Create the main sales table
   CREATE TABLE public.sales
   (
       salesid SERIAL,
       invoiceid integer,
       itemid integer,
       category text,
       price numeric(10,2),
       quantity integer,
       orderdate date,
       destinationstate text,
       shippingtype text,
       referral text,
       updated_at TIMESTAMP DEFAULT NOW(),
       PRIMARY KEY (salesid, invoiceid)  -- Include the partitioning column in the primary key
   ) PARTITION BY HASH (invoiceid);
   
   -- Create partitions
   CREATE TABLE public.sales_part_0 PARTITION OF public.sales FOR VALUES WITH (MODULUS 4, REMAINDER 0);
   CREATE TABLE public.sales_part_1 PARTITION OF public.sales FOR VALUES WITH (MODULUS 4, REMAINDER 1);
   CREATE TABLE public.sales_part_2 PARTITION OF public.sales FOR VALUES WITH (MODULUS 4, REMAINDER 2);
   CREATE TABLE public.sales_part_3 PARTITION OF public.sales FOR VALUES WITH (MODULUS 4, REMAINDER 3);
   
   -- Insert data
   INSERT INTO public.sales (invoiceid, itemid, category, price, quantity, orderdate, destinationstate, shippingtype, referral)
   VALUES
       (101, 1, 'Electronics', 599.99, 2, '2023-11-21', 'California', 'Express', 'Friend'),
       (102, 3, 'Clothing', 49.99, 5, '2023-11-22', 'New York', 'Standard', 'OnlineAd'),
       (103, 2, 'Home & Garden', 199.50, 1, '2023-11-23', 'Texas', 'Express', 'WordOfMouth'),
       (104, 4, 'Books', 15.75, 3, '2023-11-24', 'Florida', 'Standard', 'SocialMedia'),
       (105, 2, 'Home & Garden', 199.50, 1, '2023-11-23', 'Texas', 'Express', 'WordOfMouth');
   
   ```
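   
   To see how the inserted rows end up distributed across the four hash partitions, a query along the following lines may help. This is only a minimal sketch: it relies on PostgreSQL's `tableoid` system column, and the aliases `partition_name` and `row_count` are illustrative names that do not appear in the setup above.
   
   ```sql
   -- Show which physical partition stores each row of public.sales.
   -- tableoid is a system column; casting it to regclass prints the partition's name.
   SELECT tableoid::regclass AS partition_name,
          salesid,
          invoiceid
   FROM public.sales
   ORDER BY invoiceid;
   
   -- Count the rows held by each partition (partitions with no rows are not listed).
   SELECT tableoid::regclass AS partition_name,
          count(*) AS row_count
   FROM public.sales
   GROUP BY tableoid
   ORDER BY partition_name;
   ```
   
   Because rows are routed by a hash of `invoiceid`, the distribution is generally not even for a handful of rows, so some partitions (and their topics) may stay empty at first.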
   
   I have set up a Debezium connector with the following configuration:
   
   ```
   name=PostgresConnector
   connector.class=io.debezium.connector.postgresql.PostgresConnector
   tasks.max=1
   database.user=hive
   database.dbname=metastore
   database.hostname=metastore_db
   database.password=hive
   database.server.name=hive
   table.include.list=public.sales
   database.port=5432
   plugin.name=pgoutput
   tombstones.on.delete=false
   
   ```
   This configuration creates the following topics:
   
![image](https://github.com/user-attachments/assets/57ab5fd5-acbd-43c4-a6a0-7c9f952dc118)
   
   - hive.public.sales
   - hive.public.sales_part_0
   - hive.public.sales_part_1
   - hive.public.sales_part_2
   - hive.public.sales_part_3
   
   I am looking for recommendations on how to set up DeltaStreamer jobs for each partition. What is the best approach to handle this setup effectively?
   
   Thank you!
   

