hudi-bot opened a new issue, #14468: URL: https://github.com/apache/hudi/issues/14468
https://github.com/apache/incubator-hudi/pull/519

## JIRA info

- Link: https://issues.apache.org/jira/browse/HUDI-86
- Type: New Feature

---

## Comments

28/Apr/19 05:59;vinoth;Currently, the Hudi log is not indexable. Hence, we cannot send inserts to the logs, which forces us to write out parquet files instead. If we can solve this, we can unlock truly near-real-time ingest on MOR storage.;;;

---

06/May/19 04:41;nishith29;[~vc] Should we take the following approach?

* Add a BloomFilter and the keys to the footer of every log block
* The index check has 2 steps
** Check the BloomFilter for existence of the key
** If yes, read the keys from the footer of every log block of the log files and check for existence in the keys

Although, if we are reading the bloom filter from the footer, reading the keys from the footer and avoiding the bloom filter completely is also an option, because we have to seek to all log blocks anyway to read the bloom filter (which is probably the time-consuming part). The size of the keys should not be that big; for example, 5 million keys in one log file is equivalent to (size_of(uuid) = 36 bytes * 5000000 = 5 MB). Obviously, if people have composite keys, that can get large, in which case the bloom filter might be helpful. Let me know your thoughts.;;;

---

02/Jul/20 08:47;xleesf;Hi [~vinoth], what's the status here?;;;

---

02/Jul/20 21:01;vinoth;[~xleesf] I think the conclusion was that we won't need this given RFC-08, and also given the clustering approach, where we plan to create new file groups for inserts. At least, we can wait until those efforts make progress and then decide?;;;

---

03/Jul/20 04:50;xleesf;Agree that after RFC-08 is done, RFC-06 is not needed.;;;
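
For readers following the 06/May/19 proposal, here is a minimal Python sketch of the two-step index check it describes. It is not Hudi code: `BloomFilter`, `LogBlockFooter`, `key_exists`, and `raw_key_bytes` are hypothetical names invented for illustration, and the footer is modeled as an in-memory object rather than bytes read from a log file. The last helper redoes the key-size arithmetic from the comment; the raw product 36 bytes * 5,000,000 comes to 180,000,000 bytes (about 180 MB uncompressed), which is larger than the 5 MB quoted there.

```python
import hashlib
from dataclasses import dataclass, field


class BloomFilter:
    """Tiny Bloom filter for illustration only (not Hudi's implementation)."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


@dataclass
class LogBlockFooter:
    """Hypothetical footer: a Bloom filter plus the exact record keys."""

    bloom: BloomFilter = field(default_factory=BloomFilter)
    keys: set = field(default_factory=set)

    def add_key(self, key: str) -> None:
        self.bloom.add(key)
        self.keys.add(key)


def key_exists(footers, key: str) -> bool:
    """Two-step check: Bloom filter first, exact key set only on a possible hit."""
    for footer in footers:
        if not footer.bloom.might_contain(key):
            continue  # step 1: the filter rules this block out
        if key in footer.keys:
            return True  # step 2: confirmed against the stored keys
    return False


def raw_key_bytes(num_keys: int, key_size: int = 36) -> int:
    """Uncompressed size of the key list for fixed-size keys (e.g. 36-char UUIDs)."""
    return num_keys * key_size


if __name__ == "__main__":
    block_a, block_b = LogBlockFooter(), LogBlockFooter()
    block_a.add_key("uuid-1")
    block_b.add_key("uuid-2")
    print(key_exists([block_a, block_b], "uuid-2"))
    print(key_exists([block_a, block_b], "uuid-3"))
    print(raw_key_bytes(5_000_000))
```

The alternative raised in the same comment, skipping the bloom filter and reading only the keys, would correspond to dropping the `might_contain` test in `key_exists`; the sketch makes no claim about which variant is faster on real storage.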
