Hi, we are using Parquet with Drill and we are quite happy; thank you all very much.

We use Drill to query it, and I wonder if there are any best practices, recommended setups, or tips you could share. I also wanted to ask about some of the things we think/hope are in scope, and what effect they would have on performance.

*Timestamp support (bigint + delta encoding)*
We are using Avro for inbound/fresh data, and I believe Avro 1.8 finally has date/timestamp support. I wonder when Parquet will support timestamps with millisecond precision in a more efficient (encoded) way.

*Predicate pushdown for dictionary values*
I hope I'm using the right terms, but I'm basically referring to the ability to skip segments if the value being searched for is not in the dictionary for that segment (when/if dictionary encoding is used). I may be wrong in thinking this would speed up our queries quite a bit, but I think our data and some of our queries would benefit.

*Bloom filters*
I followed some discussion here on implementing bloom filters, and some initial tests that were done to assess the possible benefits. How did that go? (Meaning: will it be done, and are there any initial numbers on the potential gain?)

*Multi-column overhead*
We are seeing that queries that fetch values from many columns are a lot slower than the "same" queries run with only a few columns. This is to be expected, but I wonder if there are any tricks/tips available here. We are, for example, using nested structures that could be flattened, but that seems irrelevant.

Best regards,
-Stefán
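P.S. To make the timestamp point concrete, here is a minimal Python sketch of the idea behind delta-encoding millisecond epoch timestamps stored as bigints (this is an illustration of the general technique, not Parquet's actual encoder): roughly ordered timestamps produce small deltas, which pack into far fewer bits than the raw 64-bit values.

```python
# Sketch: delta-encode millisecond epoch timestamps held as bigints.
# Near-monotonic timestamps yield small deltas that bit-pack well.

def delta_encode(values):
    """Return (first_value, deltas) so the column stores one big
    anchor value plus many small differences."""
    if not values:
        return None, []
    deltas = [b - a for a, b in zip(values, values[1:])]
    return values[0], deltas

def delta_decode(first, deltas):
    """Rebuild the original values by cumulative summation."""
    out = [first]
    for d in deltas:
        out.append(out[-1] + d)
    return out

ts = [1459468800000, 1459468800123, 1459468801004, 1459468802050]
first, deltas = delta_encode(ts)
assert delta_decode(first, deltas) == ts
assert deltas == [123, 881, 1046]  # small numbers vs. 13-digit raws
```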
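P.P.S. On the dictionary predicate-pushdown point, this is the shape of the optimization I mean, as a Python sketch (the segment layout here is made up for illustration; it is not a Parquet data structure): if the searched value is not in a segment's dictionary, the segment's rows are never touched.

```python
# Sketch: skip a whole segment (row group / page) when its dictionary
# proves the predicate value cannot occur there.

def scan(segments, needle):
    """Yield matching rows; segments whose dictionary lacks the
    value are skipped without decoding any row data."""
    for seg in segments:
        if needle not in seg["dictionary"]:
            continue  # entire segment eliminated by the dictionary
        for row in seg["rows"]:
            if row == needle:
                yield row

segments = [
    {"dictionary": {"DE", "FR"}, "rows": ["DE", "FR", "DE"]},
    {"dictionary": {"US"},       "rows": ["US", "US"]},
]
assert list(scan(segments, "US")) == ["US", "US"]
```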
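And for readers unfamiliar with bloom filters, a toy Python sketch of the data structure (parameters and class name are invented for illustration): it can answer "definitely not present" or "maybe present", so a negative answer lets a reader skip a segment entirely, at the cost of occasional false positives.

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: false positives possible, false negatives
    never, so 'not contained' is a safe reason to skip a segment."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # integer used as a bit array

    def _positions(self, item):
        # Derive num_hashes independent bit positions per item.
        for i in range(self.num_hashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.num_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item):
        return all((self.bits >> p) & 1 for p in self._positions(item))

bf = BloomFilter()
for v in ("alice", "bob"):
    bf.add(v)
assert bf.might_contain("alice")   # present -> always "maybe"
```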
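Finally, on the multi-column overhead: the intuition in a columnar format is that projecting fewer columns means reading fewer byte ranges. A trivial Python sketch of that reasoning (table and column names are made up):

```python
# Sketch: in a columnar layout each column is stored separately,
# so a projection only ever touches the columns it asks for.

table = {
    "user_id": [1, 2, 3],
    "country": ["DE", "FR", "US"],
    "payload": ["big-blob-1", "big-blob-2", "big-blob-3"],
}

def project(table, columns):
    """Read only the requested columns; the rest stay on disk."""
    return {c: table[c] for c in columns}

narrow = project(table, ["user_id"])
assert set(narrow) == {"user_id"}  # "payload" was never read
```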