[GitHub] [hudi] prashantwason commented on pull request #2494: [HUDI-1552] Improve performance of key lookups from base file in Metadata Table.

2021-03-15 Thread GitBox


prashantwason commented on pull request #2494:
URL: https://github.com/apache/hudi/pull/2494#issuecomment-799740584


   Looks good @vinothchandar 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] prashantwason commented on pull request #2494: [HUDI-1552] Improve performance of key lookups from base file in Metadata Table.

2021-03-12 Thread GitBox


prashantwason commented on pull request #2494:
URL: https://github.com/apache/hudi/pull/2494#issuecomment-797665038


   @vinothchandar PTAL.







[GitHub] [hudi] prashantwason commented on pull request #2494: [HUDI-1552] Improve performance of key lookups from base file in Metadata Table.

2021-03-09 Thread GitBox


prashantwason commented on pull request #2494:
URL: https://github.com/apache/hudi/pull/2494#issuecomment-794269612


   @vinothchandar and I discussed simplifying this PR. The following changes are to be implemented:
   1. Remove the "reuse" configuration, as making reuse optional does not make sense for performance reasons:
      - When the timeline server is used, reuse should be on.
      - When the timeline server is not used, each executor has its own instance of the Metadata Reader, so reuse is implicit.
   2. Simplify the above code to use the instance variables.
   3. Locking is not required because of the usage pattern in #1. Locking will still be required in HFileReader because KeyScanner is not thread-safe.
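Item 3 can be illustrated with a minimal sketch, assuming a scanner whose seek and read calls share a mutable cursor (which is what makes a key scanner unsafe to share). `FakeKeyScanner`, `LockedReader`, and `concurrentCorrectLookups` are hypothetical names, not Hudi or HBase classes; the point is only that one shared scanner stays correct when every seek-and-read pair runs under a single lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical scanner that is NOT thread-safe: seekTo() and getValue()
// communicate through a shared mutable cursor.
class FakeKeyScanner {
  private final TreeMap<String, String> data;
  private String cursor;

  FakeKeyScanner(TreeMap<String, String> data) {
    this.data = data;
  }

  boolean seekTo(String key) {
    cursor = data.containsKey(key) ? key : null;
    return cursor != null;
  }

  String getValue() {
    return data.get(cursor);
  }
}

// Hypothetical reader: a single scanner shared by all callers, guarded by a
// lock so that each seek + read pair runs atomically.
class LockedReader {
  private final FakeKeyScanner scanner;
  private final Object scannerLock = new Object();

  LockedReader(FakeKeyScanner scanner) {
    this.scanner = scanner;
  }

  Optional<String> getRecordByKey(String key) {
    synchronized (scannerLock) {
      return scanner.seekTo(key) ? Optional.of(scanner.getValue()) : Optional.empty();
    }
  }

  // Looks up every key from several threads at once and returns how many
  // lookups returned the value belonging to their key.
  static int concurrentCorrectLookups(int numKeys, int numThreads) {
    TreeMap<String, String> data = new TreeMap<>();
    for (int i = 0; i < numKeys; i++) {
      data.put("key" + i, "value" + i);
    }
    LockedReader reader = new LockedReader(new FakeKeyScanner(data));
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    try {
      List<Future<Integer>> futures = new ArrayList<>();
      for (int t = 0; t < numThreads; t++) {
        futures.add(pool.submit(() -> {
          int correct = 0;
          for (int i = 0; i < numKeys; i++) {
            if (reader.getRecordByKey("key" + i).orElse("").equals("value" + i)) {
              correct++;
            }
          }
          return correct;
        }));
      }
      int total = 0;
      for (Future<Integer> future : futures) {
        total += future.get();
      }
      return total;
    } catch (InterruptedException | ExecutionException e) {
      throw new RuntimeException(e);
    } finally {
      pool.shutdown();
    }
  }
}
```

With the lock in place, `concurrentCorrectLookups(100, 4)` should report all 400 lookups as correct; without it, interleaved `seekTo`/`getValue` calls from different threads could return another thread's value.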
   
   
   I am working on updating this PR.








[GitHub] [hudi] prashantwason commented on pull request #2494: [HUDI-1552] Improve performance of key lookups from base file in Metadata Table.

2021-01-27 Thread GitBox


prashantwason commented on pull request #2494:
URL: https://github.com/apache/hudi/pull/2494#issuecomment-768576158


   With enableReuse=false, the caching of readers needs special handling because:
   1. Multiple threads may call into HoodieBackedTableMetadata.getRecordByKeyFromMetadata() to read their respective keys.
   2. If enableReuse=false, each of these threads will try to close the readers after reading its key, so a shared, cached reader could be closed while another thread is still using it.
   
   Hence, there are essentially two code paths:
   1. enableReuse=false: readers cannot be cached.
   2. enableReuse=true: readers can be cached.
   
   I have updated the patch to handle both cases by modifying the openFileSliceIfNeeded function (renamed to getReader), which returns either:
   1. cached readers when enableReuse=true, or
   2. newly opened readers when enableReuse=false.
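A minimal sketch of these two code paths, assuming hypothetical `SliceReader` and `MetadataReaderProvider` classes (this is not the actual HoodieBackedTableMetadata code): `getReader` hands out a shared, cached reader when `enableReuse` is true and a fresh, caller-owned reader otherwise, and the lookup only closes readers it opened itself.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical reader over one file slice.
class SliceReader implements AutoCloseable {
  private final String fileSliceId;
  private volatile boolean closed = false;

  SliceReader(String fileSliceId) {
    this.fileSliceId = fileSliceId;
  }

  String read(String key) {
    if (closed) {
      throw new IllegalStateException("reader for " + fileSliceId + " is closed");
    }
    return fileSliceId + ":" + key; // stand-in for a real key lookup
  }

  boolean isClosed() {
    return closed;
  }

  @Override
  public void close() {
    closed = true;
  }
}

// Hypothetical provider mirroring the enableReuse=true / enableReuse=false paths.
class MetadataReaderProvider {
  private final boolean enableReuse;
  private final Map<String, SliceReader> cache = new HashMap<>();
  private int readersOpened = 0;

  MetadataReaderProvider(boolean enableReuse) {
    this.enableReuse = enableReuse;
  }

  // enableReuse=true: return the shared, cached reader for this slice.
  // enableReuse=false: open a new reader that the caller owns and closes.
  synchronized SliceReader getReader(String fileSliceId) {
    if (!enableReuse) {
      readersOpened++;
      return new SliceReader(fileSliceId);
    }
    return cache.computeIfAbsent(fileSliceId, id -> {
      readersOpened++;
      return new SliceReader(id);
    });
  }

  String getRecordByKey(String fileSliceId, String key) {
    SliceReader reader = getReader(fileSliceId);
    try {
      return reader.read(key);
    } finally {
      if (!enableReuse) {
        reader.close(); // never close a cached reader that other threads may share
      }
    }
  }

  synchronized int getReadersOpened() {
    return readersOpened;
  }
}
```

With reuse on, five lookups against the same file slice open a single reader that stays open; with reuse off, each lookup opens and closes its own reader, so no thread can close a reader out from under another.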







[GitHub] [hudi] prashantwason commented on pull request #2494: [HUDI-1552] Improve performance of key lookups from base file in Metadata Table.

2021-01-27 Thread GitBox


prashantwason commented on pull request #2494:
URL: https://github.com/apache/hudi/pull/2494#issuecomment-768495604


   > > The size of the base file was 3MB, so this means that the in-memory HFile block caching was also working.
   > 
   > Trying to understand this part. Was the workload trying to fetch all the keys out of the HFile, or just 1?
   
   The workload was a commit followed by a Clean operation with num_versions_retained=1, so it cleans all partitions. Hence, the number of key lookups should equal the number of partitions, and all the keys should have been read from the HFile.
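A toy model of why block caching helps this workload, assuming an HFile-like file whose sorted keys are packed into fixed-size blocks. The `BlockCachedFile` class, the use of key ordinals, and the block size are hypothetical, not Hudi's or HBase's implementation: when every key in the file is looked up, each block is loaded from storage once and all later lookups in that block are served from memory.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical HFile-like file: sorted keys packed into fixed-size blocks,
// plus an in-memory cache of blocks that have already been loaded.
class BlockCachedFile {
  private final int numKeys;
  private final int keysPerBlock;
  private final Map<Integer, String[]> blockCache = new HashMap<>();
  private int blockLoads = 0;

  BlockCachedFile(int numKeys, int keysPerBlock) {
    this.numKeys = numKeys;
    this.keysPerBlock = keysPerBlock;
  }

  // Simulates reading one block from storage; counted so we can see cache hits.
  private String[] loadBlock(int blockIndex) {
    blockLoads++;
    int start = blockIndex * keysPerBlock;
    int end = Math.min(start + keysPerBlock, numKeys);
    String[] block = new String[end - start];
    for (int i = start; i < end; i++) {
      block[i - start] = "value" + i;
    }
    return block;
  }

  // Keys are modeled by their ordinal position in the sorted file.
  String lookup(int keyOrdinal) {
    if (keyOrdinal < 0 || keyOrdinal >= numKeys) {
      throw new IllegalArgumentException("no such key: " + keyOrdinal);
    }
    int blockIndex = keyOrdinal / keysPerBlock;
    String[] block = blockCache.computeIfAbsent(blockIndex, this::loadBlock);
    return block[keyOrdinal % keysPerBlock];
  }

  int getBlockLoads() {
    return blockLoads;
  }
}
```

For example, with 1,000 keys in 64-key blocks, looking up every key costs 16 block loads rather than 1,000, and repeating the lookups adds no further loads.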


