Hi Balaji/Vinoth,

Below is the log we obtained from Hudi.

20/01/22 10:30:03 INFO HiveSyncTool: Last commit time synced was found to be 20200122094611
20/01/22 10:30:03 INFO HoodieHiveClient: Last commit time synced is 20200122094611, Getting commits since then
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes. StorageVal=20180108, Existing Hive Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20180108, New Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20180108
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes. StorageVal=20180221, Existing Hive Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20180221, New Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20180221
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes. StorageVal=20180102, Existing Hive Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20180102, New Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20180102
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes. StorageVal=20191007, Existing Hive Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191007, New Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191007
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes. StorageVal=20191128, Existing Hive Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191128, New Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191128
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes. StorageVal=20191127, Existing Hive Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191127, New Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191127
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes. StorageVal=20191006, Existing Hive Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191006, New Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191006
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes. StorageVal=20191009, Existing Hive Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191009, New Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191009
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes. StorageVal=20191129, Existing Hive Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191129, New Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191129
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes. StorageVal=20191008, Existing Hive Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191008, New Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191008
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes. StorageVal=20191120, Existing Hive Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191120, New Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191120
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes. StorageVal=20191122, Existing Hive Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191122, New Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191122
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes. StorageVal=20191001, Existing Hive Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191001, New Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191001
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes. StorageVal=20191121, Existing Hive Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191121, New Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191121
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes. StorageVal=20191124, Existing Hive Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191124, New Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191124
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes. StorageVal=20191003, Existing Hive Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191003, New Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191003
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes. StorageVal=20191002, Existing Hive Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191002, New Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191002
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes. StorageVal=20191123, Existing Hive Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191123, New Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191123
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes. StorageVal=20191005, Existing Hive Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191005, New Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191005
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes. StorageVal=20191126, Existing Hive Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191126, New Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191126
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes. StorageVal=20191125, Existing Hive Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191125, New Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191125
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes. StorageVal=20191004, Existing Hive Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191004, New Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191004
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes. StorageVal=20181208, Existing Hive Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20181208, New Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20181208
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes. StorageVal=20181207, Existing Hive Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20181207, New Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20181207
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes. StorageVal=20181206, Existing Hive Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20181206, New Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20181206
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes. StorageVal=20181205, Existing Hive Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20181205, New Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20181205
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes. StorageVal=20180117, Existing Hive Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20180117, New Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20180117
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes. StorageVal=20181209, Existing Hive Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20181209, New Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20181209
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes. StorageVal=20181204, Existing Hive Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20181204, New Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20181204
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes. StorageVal=20181203, Existing Hive Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20181203, New Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20181203
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes. StorageVal=20181202, Existing Hive Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20181202, New Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20181202
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes. StorageVal=20181201, Existing Hive Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20181201, New Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20181201
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes. StorageVal=20191117, Existing Hive Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191117, New Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191117

Regards,
Purushotham Pushpavanth

On Tue, 4 Feb 2020 at 05:50, Vinoth Chandar <[email protected]> wrote:

> Unfortunately, the mailing list does not support attachments, looks like :(
> Could you paste it inline?
>
> On Sat, Feb 1, 2020 at 6:20 AM Purushotham Pushpavanthar <
> [email protected]> wrote:
>
> > Hi Balaji,
> >
> > The attachment contains the logs you asked for.
> > However, the only difference between storageValue and
> > fullStoragePartitionPath is *target-base-path*.
> > So if I'm not wrong, the code will be marking all partitions which got
> > UPDATE data for partition update. Hence time consuming.
> >
> > Regards,
> > Purushotham Pushpavanth
> >
> > On Mon, 20 Jan 2020 at 08:58, Balaji Varadarajan
> > <[email protected]> wrote:
> >
> >> Hi Purushotham,
> >> I am unable to reproduce same partitions getting hive-synced locally.
> >> Can you add the following log message in HoodieHiveClient.java and run the
> >> code and send us logs.
> >>
> >> diff --git a/hudi-hive/src/main/java/org/apache/hudi/hive/HoodieHiveClient.java b/hudi-hive/src/main/java/org/apache/hudi/hive/HoodieHiveClient.java
> >> index 4578bb2f..ba4b1147 100644
> >> --- a/hudi-hive/src/main/java/org/apache/hudi/hive/HoodieHiveClient.java
> >> +++ b/hudi-hive/src/main/java/org/apache/hudi/hive/HoodieHiveClient.java
> >> @@ -237,6 +237,8 @@ public class HoodieHiveClient {
> >>        if (!paths.containsKey(storageValue)) {
> >>          events.add(PartitionEvent.newPartitionAddEvent(storagePartition));
> >>        } else if (!paths.get(storageValue).equals(fullStoragePartitionPath)) {
> >> +        LOG.info("Partition Location changes. StorageVal=" + storageValue
> >> +            + ", Existing Hive Path=" + paths.get(storageValue) + ", New Location=" + fullStoragePartitionPath);
> >>          events.add(PartitionEvent.newPartitionUpdateEvent(storagePartition));
> >>        }
> >>      }
> >>
> >> Thanks,
> >> Balaji.V
> >>
> >> On Friday, January 17, 2020, 03:44:08 AM PST, Purushotham
> >> Pushpavanthar <[email protected]> wrote:
> >>
> >> Hi,
> >>
> >> I noticed that
> >> *org.apache.hudi.hive.HoodieHiveClient#updatePartitionsToTable()* is time
> >> consuming while running HUDI on set of records which contains data for
> >> large set of partitions. All it is doing is setting location for each
> >> updated partition path. However,
> >> *org.apache.hudi.hive.HoodieHiveClient#addPartitionsToTable()* is taking
> >> care of adding new partitions to the table.
> >>
> >> 1. For a given table, whose base path doesn't change (usually it doesn't
> >> in production), why *updatePartitionsToTable()* is needed? Can you
> >> please throw some light on any such case where this is needed?
> >> 2. If it is required, can we do something to optimise the time consumed
> >> by this operation? Currently, the *Alter Statements* are executed one by
> >> one on each (partition, path) pair for every updated partition.
> >>
> >> Regards,
> >> Purushotham Pushpavanth
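
P.S. In every "Partition Location changes" log entry in this thread, the existing Hive path and the new location differ only by the leading s3a://dataplatform.internal.warehouse prefix (scheme plus bucket), which would explain why a plain String equals() flags each partition as updated. Below is a minimal sketch of a comparison that ignores scheme and authority. The class PartitionPathCheck and its methods are hypothetical names made up for illustration and are not part of Hudi; it uses only java.net.URI from the JDK. In a Hadoop-based code base, org.apache.hadoop.fs.Path.getPathWithoutSchemeAndAuthority would presumably be the closer fit, but that is an assumption about how one might wire it in, not a description of the actual fix.

```java
import java.net.URI;

// Hypothetical helper, NOT part of Hudi: compares two partition locations
// by their path component only, ignoring scheme and authority.
final class PartitionPathCheck {

  // Returns only the path component, e.g. "s3a://bucket/a/b" -> "/a/b".
  static String pathOnly(String location) {
    String path = URI.create(location).getPath();
    // Drop a single trailing slash so "/a/b/" and "/a/b" compare equal.
    if (path.length() > 1 && path.endsWith("/")) {
      path = path.substring(0, path.length() - 1);
    }
    return path;
  }

  // True when both locations point at the same path, regardless of prefix.
  static boolean sameLocation(String hivePath, String storagePath) {
    return pathOnly(hivePath).equals(pathOnly(storagePath));
  }

  public static void main(String[] args) {
    // Values copied from the first "Partition Location changes" log entry.
    String existing =
        "/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20180108";
    String updated =
        "s3a://dataplatform.internal.warehouse"
            + "/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20180108";
    System.out.println(sameLocation(existing, updated)); // prints "true"
  }
}
```

Note that ignoring the authority also treats the same path in two different buckets as equal, so this is only reasonable when the table's base path is known not to move between file systems.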
