Hi Giorgio,

Thanks for your questions! I'd like to answer the ones about MetadataIndexNode.


> Is this the layout you are looking for?

Yes, each node is composed of a few entries, and each entry is a tuple.
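
In code the layout is roughly this (a simplified sketch; the real classes
live in org.apache.iotdb.tsfile.file.metadata and also carry the
serialization logic and a few more details):

import java.util.List;

// Simplified sketch, not the exact source.
enum MetadataIndexNodeType {
    INTERNAL_DEVICE, LEAF_DEVICE, INTERNAL_MEASUREMENT, LEAF_MEASUREMENT
}

class MetadataIndexEntry {
    String name;   // device or measurement name
    long offset;   // file offset of the child node or leaf metadata
}

class MetadataIndexNode {
    // a node is an ordered list of <name, offset> entries
    List<MetadataIndexEntry> children;
    long endOffset;                  // end of this node's region in the file
    MetadataIndexNodeType nodeType;  // internal vs. leaf, device vs. measurement
}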

> I mean do you use later MetadataIndexNode somewhere in the server?

No, this structure is only used in MetadataIndexTree.

> Is there here space to innovate with hat tries instead of using a list?

Actually, I think this is worth trying. We are also thinking about other 
structures to construct the index tree.
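
For context, a child lookup in the current list-based node is a binary
search over the sorted entry names, roughly like this (simplified, method
names approximate; it reuses the MetadataIndexEntry sketch above). This
list-plus-binary-search is exactly the part a trie-style structure would
replace:

// Return the last entry whose name is <= key, i.e. the child whose
// subtree may contain the key; null if the key sorts before all entries.
MetadataIndexEntry getChildIndexEntry(String key,
                                      List<MetadataIndexEntry> children) {
    int low = 0, high = children.size() - 1, result = -1;
    while (low <= high) {
        int mid = (low + high) >>> 1;
        if (children.get(mid).name.compareTo(key) <= 0) {
            result = mid;    // candidate; keep searching to the right
            low = mid + 1;
        } else {
            high = mid - 1;
        }
    }
    return result < 0 ? null : children.get(result);
}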


Thanks again for your questions!


--
Zesong Sun
School of Software, Tsinghua University

孙泽嵩
清华大学 软件学院

> -----Original Messages-----
> From: "Giorgio Zoppi" <[email protected]>
> Sent Time: 2021-10-02 04:33:51 (Saturday)
> To: dev <[email protected]>
> Cc:
> Subject: MetaData and other questions
> 
> Hello IOTDBers,
> I am slowly making progress.
> 
> // List of <name, offset, childMetadataIndexType>
> private MetadataIndexNode metadataIndex;
> 
> I am learning again, so let me restate my understanding:
> you model the index with a MetadataIndexNode, which is essentially a list
> of entries.
> Each entry is a MetadataIndexEntry, which is a pair/tuple (here using Guava
> would help).
> It looks like this:
> <name, offset, indexType>, <name, offset, indexType>, <name, offset, indexType>.
> Questions:
> 
> - Is this the layout you are looking for? Do I understand it correctly?
> - Is there room here to innovate with HAT-tries (
> https://github.com/tpn/pdfs/blob/master/HAT-trie%20-%20A%20Cache-conscious%20Trie-based%20Data%20Structure%20for%20Strings%20-%202007%20(CRPITV62Askitis).pdf)
> instead of using a list?
> What do you think? I mean, do you use MetadataIndexNode later somewhere in
> the server?
> 
> The next step might be having a sample like this:
> 
> /*
>  * on Windows you may want to use <format>
>  */
> #include <fmt/core.h>
> 
> using iotdb::tsfile::FileWriter;
> using iotdb::tsfile::TSRecord;
> using iotdb::tsfile::DataPoint;
> 
> int main(int argc, char** argv) {
>     try {
>         FileWriter writer("/tmp/newfile.tsfile");
>         if (writer.Ok()) {
>             TSRecord record{10000, "d1"};
>             record.add(DataPoint<float>("s1", 5.0f));
>             record.add(DataPoint<int>("s2", 5));
>             auto [ret, written] = writer.write(record);
>             if (ret) {
>                 fmt::print("Written {} with success", written);
>             }
>         }
>     } catch (const std::exception& e) {
>         fmt::print("exception {}", e.what());
>     }
> }
> 
> I am working on a BDD test (I use Catch2 with BDD style -
> https://github.com/catchorg/Catch2/blob/devel/docs/tutorial.md#bdd-style).
> Yes, it looks like Go.
> Better to keep it simple until we have writing, reading and querying working.
> Then we can think about async I/O, io_uring, coroutines and things like that.
> 
> From your side, I might benefit, if you have time, from a small Java program
> that generates different parts of the file on disk, in order to do e2e
> testing; something that works like this (a rough sketch of such a generator
> follows below):
> $ java org.apache.iotdb.native.FileGen --record --name data.dat   //
> generate a record on disk
> $ java org.apache.iotdb.native.FileGen --chunk --name chunk.dat // generate
> a chunk
> And so on.
> Having these .dat files will allow me to create a set of pytests that:
> - launch the file generation for an item
> - launch a native application that reads the file part
> 
> OK, another interesting point is the bloom filter. Why do you use it?
> In Parquet they use it for this reason:
> *In their current format, column statistics and dictionaries can be used
> for predicate pushdown. Statistics include minimum and maximum value, which
> can be used to filter out values not in the range. Dictionaries are more
> specific, and readers can filter out values that are between min and max
> but not in the dictionary. However, when there are too many distinct
> values, writers sometimes choose not to add dictionaries because of the
> extra space they occupy. This leaves columns with large cardinalities and
> widely separated min and max without support for predicate pushdown.*
> Is it for the same reason?
> As an implementation, can I use this:
> http://algo2.iti.kit.edu/singler/publications/cacheefficientbloomfilters-wea2007.pdf
> ?
> *From Cache-, Hash- and Space-Efficient Bloom Filters, Felix Putze, Peter
> Sanders, Johannes Singler.*
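> 
> To make the question concrete, this is the kind of check I imagine on the
> read path (a toy Bloom filter of my own, not the IoTDB one; the blocked,
> cache-conscious variant from the paper would mainly change the bit layout):
> 
> import java.util.BitSet;
> 
> // Toy Bloom filter over series paths, just to illustrate the check.
> public class ToyBloomFilter {
>     private final BitSet bits;
>     private final int size;
>     private final int hashCount;
> 
>     public ToyBloomFilter(int size, int hashCount) {
>         this.bits = new BitSet(size);
>         this.size = size;
>         this.hashCount = hashCount;
>     }
> 
>     private int index(String key, int seed) {
>         int h = seed;
>         for (int i = 0; i < key.length(); i++) {
>             h = h * 31 + key.charAt(i);
>         }
>         return Math.floorMod(h, size);  // map the hash into the bit array
>     }
> 
>     public void add(String path) {
>         for (int k = 1; k <= hashCount; k++) {
>             bits.set(index(path, k));
>         }
>     }
> 
>     // false = definitely absent, so a reader can skip the file without
>     // touching its metadata; true = maybe present.
>     public boolean mightContain(String path) {
>         for (int k = 1; k <= hashCount; k++) {
>             if (!bits.get(index(path, k))) {
>                 return false;
>             }
>         }
>         return true;
>     }
> }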
> 
> Thanks for your patience,
> Best Regards,
> Giorgio.