hudi-bot opened a new issue, #14468:
URL: https://github.com/apache/hudi/issues/14468

   https://github.com/apache/incubator-hudi/pull/519
   
   ## JIRA info
   
   - Link: https://issues.apache.org/jira/browse/HUDI-86
   - Type: New Feature
   
   
   ---
   
   
   ## Comments
   
   28/Apr/19 05:59;vinoth;Currently, the Hudi log is not indexable. Hence, we 
cannot send inserts into the logs, forcing us to write out parquet files 
instead. If we can solve this, we can unlock truly near-real-time ingest on 
MOR storage.;;;
   
   ---
   
   06/May/19 04:41;nishith29;[~vc] Should we take the following approach?
    * Add a BloomFilter & keys to the footer of every log block
    * The index check has 2 steps
    ** Check the BloomFilter for existence of the key
    ** If the key may exist, read the keys from the footer of every log block 
of the log files and check whether the key is among them
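
The two-step check proposed above can be sketched roughly as follows. This is a minimal, self-contained illustration and not Hudi code: `BloomFilter`, `LogBlockFooter`, `make_footer` and `key_exists` are hypothetical names, the footer is modelled as an in-memory object rather than bytes read from a log file, and the bloom filter is checked per block.

```python
import hashlib
from dataclasses import dataclass


class BloomFilter:
    """Tiny illustrative Bloom filter: false positives possible, false negatives not."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key: str):
        # Derive num_hashes bit positions by salting the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


@dataclass
class LogBlockFooter:
    """Hypothetical footer holding the block's record keys and a bloom filter over them."""
    keys: set
    bloom: BloomFilter


def make_footer(keys) -> LogBlockFooter:
    bloom = BloomFilter()
    for key in keys:
        bloom.add(key)
    return LogBlockFooter(keys=set(keys), bloom=bloom)


def key_exists(footers, key: str) -> bool:
    """Step 1: cheap bloom filter check. Step 2: confirm against the exact key list."""
    for footer in footers:
        if footer.bloom.might_contain(key) and key in footer.keys:
            return True
    return False


footers = [make_footer(["uuid-1", "uuid-2"]), make_footer(["uuid-3"])]
found = key_exists(footers, "uuid-3")
missing = key_exists(footers, "uuid-9")
```

In this sketch the bloom filter only saves the exact key lookup. If both structures live in the same footer, the seek to the footer is paid either way.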
   
   Although, if we are reading the bloom filter from the footer, reading the 
keys from the footer and skipping the bloom filter completely is also an option, 
because we have to seek to every log block to read the bloom filter anyway 
(which is probably the time-consuming part). The size of the keys should be 
manageable: for example, 5 million UUID keys in one log file come to 
size_of(uuid) = 36 bytes * 5,000,000 = 180 MB uncompressed. Obviously, if people 
have composite keys, that can get larger, in which case the bloom filter might be helpful.
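
As a rough check of the sizes discussed above, the raw key bytes can be compared with the size of a bloom filter over the same keys. The sketch below uses the standard bloom filter sizing formula m = -n ln(p) / (ln 2)^2 bits. The 1% false-positive rate is an assumption chosen for illustration, and the function names are hypothetical.

```python
import math


def raw_key_bytes(num_keys: int, bytes_per_key: int = 36) -> int:
    """Uncompressed size of storing every key verbatim (36 bytes per UUID string)."""
    return num_keys * bytes_per_key


def bloom_filter_bytes(num_keys: int, false_positive_rate: float) -> int:
    """Optimal bloom filter size in bytes for the given key count and error rate."""
    bits = -num_keys * math.log(false_positive_rate) / (math.log(2) ** 2)
    return math.ceil(bits / 8)


keys_size = raw_key_bytes(5_000_000)              # 180,000,000 bytes
bloom_size = bloom_filter_bytes(5_000_000, 0.01)  # roughly 6 MB at an assumed 1% error rate
```

Under these assumptions the bloom filter is about 30 times smaller than the uncompressed key list, and its size does not grow with key length.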
   
   Let me know your thoughts.;;;
   
   ---
   
   02/Jul/20 08:47;xleesf;hi [~vinoth] what's the status here?;;;
   
   ---
   
   02/Jul/20 21:01;vinoth;[~xleesf] I think the conclusion was that we won't 
need this given RFC-08, and also the clustering approach, where we plan to 
create new file groups for inserts.
   
   At least, we can wait till those efforts make progress and decide?;;;
   
   ---
   
   03/Jul/20 04:50;xleesf;agree that after RFC-08 is done, RFC-06 is not 
needed.;;;

