Hello IoTDBers,

I am slowly making progress. I am now looking at this field:

// List of <name, offset, childMetadataIndexType>
private MetadataIndexNode metadataIndex;

I am still learning, but as I understand it, you model the index with a MetadataIndexNode, which is essentially a list of MetadataIndexEntry items, each one a pair/tuple (here Guava would help) of the form <name, offset, indexType>. So the node looks like: <name, offset, indexType>, <name, offset, indexType>, <name, offset, indexType>.
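In C++ terms, this is roughly what I picture (a minimal sketch; the enum and member names below are my own guesses, not the actual classes):

#include <cstdint>
#include <string>
#include <vector>

// Placeholder: the real set of index types is whatever the Java side defines
// (internal/leaf, device/measurement, ...); this enum is just a stand-in.
enum class MetadataIndexType { kUnknown };

// One <name, offset, indexType> entry, as described above.
struct MetadataIndexEntry {
    std::string name;              // device or measurement name
    std::int64_t offset;           // file offset the entry points to
    MetadataIndexType index_type;  // kind of metadata found at that offset
};

// The node itself: essentially an ordered list of such entries.
struct MetadataIndexNode {
    std::vector<MetadataIndexEntry> entries;
};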
Questions:
- Is this the layout you are using? Do I understand it correctly?
- Is there room to innovate here with HAT-tries (https://github.com/tpn/pdfs/blob/master/HAT-trie%20-%20A%20Cache-conscious%20Trie-based%20Data%20Structure%20for%20Strings%20-%202007%20(CRPITV62Askitis).pdf) instead of using a list? What do you think? I mean, do you use MetadataIndexNode later somewhere in the server?

The next step might be to have a sample like this working:

/* on Windows you may want to use <format> instead */
#include <fmt/core.h>

#include <exception>

using iotdb::tsfile::FileWriter;
using iotdb::tsfile::TSRecord;
using iotdb::tsfile::DataPoint;

int main(int argc, char** argv) {
    try {
        FileWriter writer("/tmp/newfile.tsfile");
        if (writer.Ok()) {
            TSRecord record{10000, "d1"};
            record.add(DataPoint<float>("s1", 5.0f));
            record.add(DataPoint<int>("s2", 5));
            auto [ret, written] = writer.write(record);
            if (ret) {
                fmt::print("Written {} with success\n", written);
            }
        }
    } catch (const std::exception& e) {
        fmt::print("exception {}\n", e.what());
    }
}

I am working on a BDD test for it (I use Catch2 with the BDD style - https://github.com/catchorg/Catch2/blob/devel/docs/tutorial.md#bdd-style). Yes, it looks like Go. Better to keep it simple until we have writing, reading and querying; then we can think about async I/O, io_uring, coroutines and things like that.

From your side, if you have time, I might benefit from a small Java program that generates the different parts of the file on disk, so I can do e2e testing - something that works like this:

$ java org.apache.iotdb.native.FileGen --record --name data.dat  // generate a record on disk
$ java org.apache.iotdb.native.FileGen --chunk --name chunk.dat  // generate a chunk

And so on. Having the .dat files would allow me to create a set of pytests that:
- launch the file generation for an item
- launch a native application that reads that part of the file

OK, another interesting point: the bloom filters. Why do you use them? In Parquet they use them for this reason:

*In their current format, column statistics and dictionaries can be used for predicate pushdown. Statistics include minimum and maximum value, which can be used to filter out values not in the range. Dictionaries are more specific, and readers can filter out values that are between min and max but not in the dictionary. However, when there are too many distinct values, writers sometimes choose not to add dictionaries because of the extra space they occupy. This leaves columns with large cardinalities and widely separated min and max without support for predicate pushdown.*

Is it for the same reason? As an implementation, can I use the one from http://algo2.iti.kit.edu/singler/publications/cacheefficientbloomfilters-wea2007.pdf (*Cache-, Hash- and Space-Efficient Bloom Filters*, Felix Putze, Peter Sanders, Johannes Singler)?
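What I have in mind is the blocked variant from that paper, roughly like the sketch below (the class name, hashing scheme and sizes are placeholders of mine, not a proposal for the actual on-disk format):

#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Blocked Bloom filter sketch: every key maps to one 64-byte block and all
// probe bits fall inside that block, so add/lookup touch a single cache line.
class BlockedBloomFilter {
public:
    BlockedBloomFilter(std::size_t num_blocks, int num_probes)
        : bits_(num_blocks * kBitsPerBlock, false),
          num_blocks_(num_blocks),
          num_probes_(num_probes) {}

    void add(const std::string& key) {
        const std::size_t h = std::hash<std::string>{}(key);
        const std::size_t base = (h % num_blocks_) * kBitsPerBlock;
        for (int i = 0; i < num_probes_; ++i) {
            bits_[base + probe(h, i)] = true;
        }
    }

    // May return false positives, never false negatives.
    bool might_contain(const std::string& key) const {
        const std::size_t h = std::hash<std::string>{}(key);
        const std::size_t base = (h % num_blocks_) * kBitsPerBlock;
        for (int i = 0; i < num_probes_; ++i) {
            if (!bits_[base + probe(h, i)]) {
                return false;  // definitely not inserted
            }
        }
        return true;  // possibly inserted
    }

private:
    // 512 bits = 64 bytes, i.e. one block per cache line.
    static constexpr std::size_t kBitsPerBlock = 512;

    // i-th probe position inside the block, derived from the key hash
    // (double hashing with an odd step so the positions stay distinct).
    static std::size_t probe(std::size_t h, int i) {
        const std::size_t step = static_cast<std::size_t>(h * 0x9e3779b97f4a7c15ULL) | 1;
        return (h + static_cast<std::size_t>(i) * step) % kBitsPerBlock;
    }

    std::vector<bool> bits_;
    std::size_t num_blocks_;
    int num_probes_;
};

The whole point of the blocked layout is that each insert or lookup touches one cache line instead of k scattered ones, which is the main gain the paper reports.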
Thanks for your patience,

Best Regards,
Giorgio.