Hello, I see that Parquet already supports Bloom filters.
As I understand it, they currently apply only to the entire value. For example, if I have a column "MovieTitle":

- "The title of my movie"
- "Another movie title"
- "The best movie title"
- ...

then the current Bloom filters can only be used to find the column chunks/pages that may contain an exact title. For example, you can use the Bloom filter to search for "The best movie title".

It would be interesting to have *a Bloom filter on the individual words* instead of on the entire value: this way you could search for the word "best" in the "MovieTitle" column and efficiently find the titles that contain that word. It would enable a sort of full-text keyword search inside text columns, and it would also allow predicate pushdown for keyword-based searches.

Would it make sense to have such an addition? Is there any strategy already used by Parquet for fast keyword searches inside text columns?

Best regards,
Marco Colli
AbstractBrain srls
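
To make the word-level idea above concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the `BloomFilter` class, `tokenize`, `build_word_filter`, and `chunks_to_scan` are hypothetical names, not part of any Parquet API, and the filter is a plain textbook Bloom filter rather than Parquet's split-block format. Each "chunk" stands in for a column chunk with its own filter built from the words of its values.

```python
import hashlib
import re


class BloomFilter:
    """Textbook Bloom filter for illustration (NOT Parquet's on-disk format)."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive num_hashes bit positions from one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def tokenize(text):
    # Assumption: words are lowercased runs of word characters.
    return re.findall(r"\w+", text.lower())


def build_word_filter(values):
    # Insert every word of every value, instead of the whole value.
    bloom = BloomFilter()
    for value in values:
        for word in tokenize(value):
            bloom.add(word)
    return bloom


def chunks_to_scan(filters, keyword):
    # Keep only the chunks whose filter cannot rule the keyword out.
    word = keyword.lower()
    return [i for i, bloom in enumerate(filters) if bloom.might_contain(word)]


# Two hypothetical column chunks of the "MovieTitle" column.
chunks = [
    ["The title of my movie", "Another movie title"],
    ["The best movie title", "Yet another film"],
]
filters = [build_word_filter(chunk) for chunk in chunks]

# Pushdown step: pick candidate chunks, then verify rows exactly,
# because a Bloom filter can return false positives.
candidates = chunks_to_scan(filters, "best")
matches = [t for i in candidates for t in chunks[i] if "best" in tokenize(t)]
print(matches)
```

The filter only narrows down which chunks need to be read; the exact check on the candidate rows is still required, and the tokenization rule used at write time has to match the one used at query time.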
