Hello IoTDBers,

I am slowly making progress. I am now looking at this field:

// List of <name, offset, childMetadataIndexType>
private MetadataIndexNode metadataIndex;

I am still learning, but as I understand it, you model the index with a MetadataIndexNode, which is essentially a list of MetadataIndexEntry items, each one a pair/tuple (here Guava would help) of the form <name, offset, indexType>. So the node looks like: <name, offset, indexType>, <name, offset, indexType>, <name, offset, indexType>.
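In C++ terms, this is roughly what I picture (a minimal sketch; the enum and member names below are my own guesses, not the actual classes):

#include <cstdint>
#include <string>
#include <vector>

// Placeholder: the real set of index types is whatever the Java side defines
// (internal/leaf, device/measurement, ...); this enum is just a stand-in.
enum class MetadataIndexType { kUnknown };

// One <name, offset, indexType> entry, as described above.
struct MetadataIndexEntry {
    std::string name;              // device or measurement name
    std::int64_t offset;           // file offset the entry points to
    MetadataIndexType index_type;  // kind of metadata found at that offset
};

// The node itself: essentially an ordered list of such entries.
struct MetadataIndexNode {
    std::vector<MetadataIndexEntry> entries;
};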
Questions:
- Is this the layout you are using? Do I understand it correctly?
- Is there room to innovate here with HAT-tries (https://github.com/tpn/pdfs/blob/master/HAT-trie%20-%20A%20Cache-conscious%20Trie-based%20Data%20Structure%20for%20Strings%20-%202007%20(CRPITV62Askitis).pdf) instead of using a list? What do you think? I mean, do you use MetadataIndexNode later somewhere in the server?

The next step might be to have a sample like this working:

/* on Windows you may want to use <format> instead */
#include <fmt/core.h>

#include <exception>

using iotdb::tsfile::FileWriter;
using iotdb::tsfile::TSRecord;
using iotdb::tsfile::DataPoint;

int main(int argc, char** argv) {
    try {
        FileWriter writer("/tmp/newfile.tsfile");
        if (writer.Ok()) {
            TSRecord record{10000, "d1"};
            record.add(DataPoint<float>("s1", 5.0f));
            record.add(DataPoint<int>("s2", 5));
            auto [ret, written] = writer.write(record);
            if (ret) {
                fmt::print("Written {} with success\n", written);
            }
        }
    } catch (const std::exception& e) {
        fmt::print("exception {}\n", e.what());
    }
}

I am working on a BDD test for it (I use Catch2 with the BDD style - https://github.com/catchorg/Catch2/blob/devel/docs/tutorial.md#bdd-style). Yes, it looks like Go. Better to keep it simple until we have writing, reading and querying; then we can think about async I/O, io_uring, coroutines and things like that.

From your side, if you have time, I might benefit from a small Java program that generates the different parts of the file on disk, so I can do e2e testing - something that works like this:

$ java org.apache.iotdb.native.FileGen --record --name data.dat  // generate a record on disk
$ java org.apache.iotdb.native.FileGen --chunk --name chunk.dat  // generate a chunk

And so on. Having the .dat files would allow me to create a set of pytests that:
- launch the file generation for an item
- launch a native application that reads that part of the file

OK, another interesting point: the bloom filters. Why do you use them? In Parquet they use them for this reason:

*In their current format, column statistics and dictionaries can be used for predicate pushdown. Statistics include minimum and maximum value, which can be used to filter out values not in the range. Dictionaries are more specific, and readers can filter out values that are between min and max but not in the dictionary. However, when there are too many distinct values, writers sometimes choose not to add dictionaries because of the extra space they occupy. This leaves columns with large cardinalities and widely separated min and max without support for predicate pushdown.*

Is it for the same reason? As an implementation, can I use the one from http://algo2.iti.kit.edu/singler/publications/cacheefficientbloomfilters-wea2007.pdf (*Cache-, Hash- and Space-Efficient Bloom Filters*, Felix Putze, Peter Sanders, Johannes Singler)?
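What I have in mind is the blocked variant from that paper, roughly like the sketch below (the class name, hashing scheme and sizes are placeholders of mine, not a proposal for the actual on-disk format):

#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Blocked Bloom filter sketch: every key maps to one 64-byte block and all
// probe bits fall inside that block, so add/lookup touch a single cache line.
class BlockedBloomFilter {
public:
    BlockedBloomFilter(std::size_t num_blocks, int num_probes)
        : bits_(num_blocks * kBitsPerBlock, false),
          num_blocks_(num_blocks),
          num_probes_(num_probes) {}

    void add(const std::string& key) {
        const std::size_t h = std::hash<std::string>{}(key);
        const std::size_t base = (h % num_blocks_) * kBitsPerBlock;
        for (int i = 0; i < num_probes_; ++i) {
            bits_[base + probe(h, i)] = true;
        }
    }

    // May return false positives, never false negatives.
    bool might_contain(const std::string& key) const {
        const std::size_t h = std::hash<std::string>{}(key);
        const std::size_t base = (h % num_blocks_) * kBitsPerBlock;
        for (int i = 0; i < num_probes_; ++i) {
            if (!bits_[base + probe(h, i)]) {
                return false;  // definitely not inserted
            }
        }
        return true;  // possibly inserted
    }

private:
    // 512 bits = 64 bytes, i.e. one block per cache line.
    static constexpr std::size_t kBitsPerBlock = 512;

    // i-th probe position inside the block, derived from the key hash
    // (double hashing with an odd step so the positions stay distinct).
    static std::size_t probe(std::size_t h, int i) {
        const std::size_t step = static_cast<std::size_t>(h * 0x9e3779b97f4a7c15ULL) | 1;
        return (h + static_cast<std::size_t>(i) * step) % kBitsPerBlock;
    }

    std::vector<bool> bits_;
    std::size_t num_blocks_;
    int num_probes_;
};

The whole point of the blocked layout is that each insert or lookup touches one cache line instead of k scattered ones, which is the main gain the paper reports.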
Thanks for your patience,

Best Regards,
Giorgio.