Hi Balaji/Vinoth,

Below is the log we obtained from Hudi.

20/01/22 10:30:03 INFO HiveSyncTool: Last commit time synced was found to
be 20200122094611
20/01/22 10:30:03 INFO HoodieHiveClient: Last commit time synced is
20200122094611, Getting commits since then
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes.
StorageVal=20180108, Existing Hive
Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20180108,
New
Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20180108
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes.
StorageVal=20180221, Existing Hive
Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20180221,
New
Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20180221
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes.
StorageVal=20180102, Existing Hive
Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20180102,
New
Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20180102
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes.
StorageVal=20191007, Existing Hive
Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191007,
New
Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191007
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes.
StorageVal=20191128, Existing Hive
Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191128,
New
Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191128
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes.
StorageVal=20191127, Existing Hive
Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191127,
New
Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191127
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes.
StorageVal=20191006, Existing Hive
Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191006,
New
Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191006
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes.
StorageVal=20191009, Existing Hive
Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191009,
New
Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191009
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes.
StorageVal=20191129, Existing Hive
Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191129,
New
Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191129
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes.
StorageVal=20191008, Existing Hive
Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191008,
New
Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191008
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes.
StorageVal=20191120, Existing Hive
Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191120,
New
Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191120
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes.
StorageVal=20191122, Existing Hive
Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191122,
New
Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191122
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes.
StorageVal=20191001, Existing Hive
Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191001,
New
Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191001
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes.
StorageVal=20191121, Existing Hive
Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191121,
New
Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191121
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes.
StorageVal=20191124, Existing Hive
Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191124,
New
Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191124
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes.
StorageVal=20191003, Existing Hive
Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191003,
New
Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191003
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes.
StorageVal=20191002, Existing Hive
Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191002,
New
Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191002
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes.
StorageVal=20191123, Existing Hive
Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191123,
New
Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191123
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes.
StorageVal=20191005, Existing Hive
Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191005,
New
Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191005
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes.
StorageVal=20191126, Existing Hive
Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191126,
New
Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191126
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes.
StorageVal=20191125, Existing Hive
Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191125,
New
Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191125
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes.
StorageVal=20191004, Existing Hive
Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191004,
New
Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191004
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes.
StorageVal=20181208, Existing Hive
Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20181208,
New
Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20181208
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes.
StorageVal=20181207, Existing Hive
Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20181207,
New
Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20181207
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes.
StorageVal=20181206, Existing Hive
Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20181206,
New
Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20181206
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes.
StorageVal=20181205, Existing Hive
Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20181205,
New
Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20181205
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes.
StorageVal=20180117, Existing Hive
Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20180117,
New
Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20180117
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes.
StorageVal=20181209, Existing Hive
Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20181209,
New
Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20181209
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes.
StorageVal=20181204, Existing Hive
Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20181204,
New
Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20181204
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes.
StorageVal=20181203, Existing Hive
Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20181203,
New
Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20181203
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes.
StorageVal=20181202, Existing Hive
Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20181202,
New
Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20181202
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes.
StorageVal=20181201, Existing Hive
Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20181201,
New
Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20181201
20/01/22 10:30:04 INFO HoodieHiveClient: Partition Location changes.
StorageVal=20191117, Existing Hive
Path=/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191117,
New
Location=s3a://dataplatform.internal.warehouse/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20191117
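
In every entry above, the existing Hive path and the new location differ
only by the s3a://dataplatform.internal.warehouse prefix, so a plain string
equals() reports a change for each partition. A minimal sketch of a
comparison that ignores the scheme and bucket, assuming that only the path
component should decide whether a partition has moved. The class and method
names (PartitionPathCompare, pathOnly, sameLocation) are hypothetical and
not part of Hudi; it uses only java.net.URI:

```java
import java.net.URI;

// Hypothetical helper, not part of Hudi: compares partition locations
// by their path component only.
final class PartitionPathCompare {

    // Drop the scheme and authority (e.g. "s3a://bucket") and keep the path.
    static String pathOnly(String location) {
        return URI.create(location).getPath();
    }

    // True when both locations point at the same path, regardless of
    // whether one of them carries a scheme and bucket prefix.
    static boolean sameLocation(String hivePath, String storagePath) {
        return pathOnly(hivePath).equals(pathOnly(storagePath));
    }

    public static void main(String[] args) {
        // Values copied from the first "Partition Location changes" entry.
        String hive = "/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20180108";
        String storage = "s3a://dataplatform.internal.warehouse"
            + "/data/warehouse/hudi/external/prod_db/sales_order_item/dt=20180108";

        System.out.println(hive.equals(storage));        // false
        System.out.println(sameLocation(hive, storage)); // true
    }
}
```

This sketch does not normalise trailing slashes, and it would also treat
the same path in two different buckets as equal, so it may not be safe if
a table can really move between buckets.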

Regards,
Purushotham Pushpavanth



On Tue, 4 Feb 2020 at 05:50, Vinoth Chandar <[email protected]> wrote:

> Unfortunately, the mailing list does not support attachments, looks like :(
> Could you paste it inline?
>
> On Sat, Feb 1, 2020 at 6:20 AM Purushotham Pushpavanthar <
> [email protected]> wrote:
>
> > Hi Balaji,
> >
> > The attachment contains the logs you asked for.
> > However, the only difference between storageValue and
> > fullStoragePartitionPath is *target-base-path*.
> > So if I'm not wrong, the code marks every partition that received UPDATE
> > data for a partition update, which is why it is so time consuming.
> >
> > Regards,
> > Purushotham Pushpavanth
> >
> >
> >
> > On Mon, 20 Jan 2020 at 08:58, Balaji Varadarajan
> > <[email protected]> wrote:
> >
> >>  Hi Purushotham,
> >> I am unable to reproduce the same partitions getting hive-synced locally.
> >> Can you add the following log message to HoodieHiveClient.java, run the
> >> code, and send us the logs?
> >> diff --git a/hudi-hive/src/main/java/org/apache/hudi/hive/HoodieHiveClient.java b/hudi-hive/src/main/java/org/apache/hudi/hive/HoodieHiveClient.java
> >> index 4578bb2f..ba4b1147 100644
> >> --- a/hudi-hive/src/main/java/org/apache/hudi/hive/HoodieHiveClient.java
> >> +++ b/hudi-hive/src/main/java/org/apache/hudi/hive/HoodieHiveClient.java
> >> @@ -237,6 +237,8 @@ public class HoodieHiveClient {
> >>          if (!paths.containsKey(storageValue)) {
> >>            events.add(PartitionEvent.newPartitionAddEvent(storagePartition));
> >>          } else if (!paths.get(storageValue).equals(fullStoragePartitionPath)) {
> >> +          LOG.info("Partition Location changes. StorageVal=" + storageValue
> >> +              + ", Existing Hive Path=" + paths.get(storageValue) + ", New Location=" + fullStoragePartitionPath);
> >>            events.add(PartitionEvent.newPartitionUpdateEvent(storagePartition));
> >>          }
> >>        }
> >>
> >> Thanks, Balaji.V
> >>     On Friday, January 17, 2020, 03:44:08 AM PST, Purushotham
> >> Pushpavanthar <[email protected]> wrote:
> >>
> >>  Hi,
> >>
> >> I noticed that
> >> *org.apache.hudi.hive.HoodieHiveClient#updatePartitionsToTable()* is time
> >> consuming when running Hudi on a set of records that contains data for a
> >> large number of partitions. All it does is set the location for each
> >> updated partition path, while
> >> *org.apache.hudi.hive.HoodieHiveClient#addPartitionsToTable()* takes care
> >> of adding new partitions to the table.
> >>
> >>   1. For a given table whose base path doesn't change (it usually doesn't
> >>   in production), why is *updatePartitionsToTable()* needed? Can you
> >>   please shed some light on any case where it is needed?
> >>   2. If it is required, can we do something to optimise the time consumed
> >>   by this operation? Currently, the *Alter Statements* are executed one by
> >>   one on each (partition, path) pair for every updated partition.
> >>
> >>
> >>
> >> Regards,
> >> Purushotham Pushpavanth
> >>
> >
> >
>
