Unfortunately, it looks like the mailing list does not support attachments :(
Could you paste it inline?

On Sat, Feb 1, 2020 at 6:20 AM Purushotham Pushpavanthar <
[email protected]> wrote:

> Hi Balaji,
>
> The attachment contains the logs you asked for.
> However, the only difference between storageValue and
> fullStoragePartitionPath is *target-base-path*.
> So, if I'm not wrong, the code marks every partition that received UPDATE
> data for a partition update, which is why it is time consuming.
>
> Regards,
> Purushotham Pushpavanth
>
>
>
> On Mon, 20 Jan 2020 at 08:58, Balaji Varadarajan
> <[email protected]> wrote:
>
>> Hi Purushotham,
>> I am unable to reproduce the same partitions getting hive-synced locally.
>> Can you add the following log message in HoodieHiveClient.java, run the
>> code, and send us the logs?
>>
>> diff --git a/hudi-hive/src/main/java/org/apache/hudi/hive/HoodieHiveClient.java b/hudi-hive/src/main/java/org/apache/hudi/hive/HoodieHiveClient.java
>> index 4578bb2f..ba4b1147 100644
>> --- a/hudi-hive/src/main/java/org/apache/hudi/hive/HoodieHiveClient.java
>> +++ b/hudi-hive/src/main/java/org/apache/hudi/hive/HoodieHiveClient.java
>> @@ -237,6 +237,8 @@ public class HoodieHiveClient {
>>          if (!paths.containsKey(storageValue)) {
>>            events.add(PartitionEvent.newPartitionAddEvent(storagePartition));
>>          } else if (!paths.get(storageValue).equals(fullStoragePartitionPath)) {
>> +          LOG.info("Partition Location changes. StorageVal=" + storageValue
>> +              + ", Existing Hive Path=" + paths.get(storageValue) + ", New Location=" + fullStoragePartitionPath);
>>            events.add(PartitionEvent.newPartitionUpdateEvent(storagePartition));
>>          }
>>        }
>>
>> Thanks,
>> Balaji.V
>>
>> On Friday, January 17, 2020, 03:44:08 AM PST, Purushotham
>> Pushpavanthar <[email protected]> wrote:
>>
>> Hi,
>>
>> I noticed that
>> *org.apache.hudi.hive.HoodieHiveClient#updatePartitionsToTable()* is time
>> consuming when running HUDI on a set of records that contains data for a
>> large number of partitions. All it does is set the location for each
>> updated partition path, whereas
>> *org.apache.hudi.hive.HoodieHiveClient#addPartitionsToTable()* takes care
>> of adding new partitions to the table.
>>
>>   1. For a given table whose base path doesn't change (it usually doesn't
>>   in production), why is *updatePartitionsToTable()* needed? Can you
>>   please shed some light on any case where it is needed?
>>   2. If it is required, can we do something to optimise the time consumed
>>   by this operation? Currently, the *Alter Statements* are executed one by
>>   one on each (partition, path) pair for every updated partition.
>>
>>
>>
>> Regards,
>> Purushotham Pushpavanth
>>
>
>

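For readers following the thread, the check that the patch above instruments can be sketched in isolation. This is not the Hudi source: the class name, the helper methods, and the sample paths are hypothetical, and the sketch assumes the map of existing Hive partitions is keyed by the relative partition path. It only illustrates how an add or update event follows from comparing the location Hive has stored against the base path joined with the partition path.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for the check shown in the diff; not the Hudi source.
public class PartitionEventSketch {

  enum EventType { ADD, UPDATE }

  static final class PartitionEvent {
    final EventType type;
    final String storagePartition;

    PartitionEvent(EventType type, String storagePartition) {
      this.type = type;
      this.storagePartition = storagePartition;
    }
  }

  // Joins the table base path and a relative partition path with a single slash.
  static String fullPartitionPath(String basePath, String storagePartition) {
    String base = basePath.endsWith("/")
        ? basePath.substring(0, basePath.length() - 1)
        : basePath;
    return base + "/" + storagePartition;
  }

  // Assumption: hiveLocations maps a relative partition path to the location Hive stores for it.
  static List<PartitionEvent> partitionEvents(Map<String, String> hiveLocations,
                                              List<String> writtenPartitions,
                                              String basePath) {
    List<PartitionEvent> events = new ArrayList<>();
    for (String storagePartition : writtenPartitions) {
      String fullStoragePartitionPath = fullPartitionPath(basePath, storagePartition);
      String existing = hiveLocations.get(storagePartition);
      if (existing == null) {
        // Hive does not know this partition yet.
        events.add(new PartitionEvent(EventType.ADD, storagePartition));
      } else if (!existing.equals(fullStoragePartitionPath)) {
        // Hive knows the partition, but under a different location string.
        events.add(new PartitionEvent(EventType.UPDATE, storagePartition));
      }
      // Identical location: no event, so no ALTER statement is needed.
    }
    return events;
  }

  public static void main(String[] args) {
    Map<String, String> hiveLocations = new HashMap<>();
    hiveLocations.put("2020/01/16", "/data/table/2020/01/16"); // unchanged
    hiveLocations.put("2020/01/17", "/old/table/2020/01/17");  // stored elsewhere
    List<String> written = Arrays.asList("2020/01/16", "2020/01/17", "2020/01/18");
    for (PartitionEvent e : partitionEvents(hiveLocations, written, "/data/table")) {
      System.out.println(e.type + " " + e.storagePartition);
    }
  }
}
```

With the sample data in main, the unchanged partition produces no event, the partition stored under a different location produces an UPDATE, and the partition Hive has not seen produces an ADD. Under these assumptions, any systematic mismatch between the two strings being compared would flag every written partition as an update, which is the situation the requested log line is meant to reveal.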