Hi,

1. PublicBAOS extends ByteArrayOutputStream so that it can grow
automatically. As we do not define the initial size, it defaults to 32
bytes.
I think choosing a more intelligent initial size is a good idea, and the
best case is when the initial size equals the real page size at the
moment the page has to be flushed.
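
For example, a minimal sketch of what that could look like (the
int-argument constructor is my assumption; the current PublicBAOS only
calls the no-argument super()):

// Sketch only: adding an initial-capacity constructor to PublicBAOS.
// The no-argument constructor keeps today's behavior (32-byte buffer).
public class PublicBAOS extends java.io.ByteArrayOutputStream {

  public PublicBAOS() {
    super(); // defaults to a 32-byte internal buffer
  }

  public PublicBAOS(int initialCapacity) {
    super(initialCapacity); // pre-size the buffer to avoid repeated copies
  }
}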

Which factors impact the page size:
(1) The max point number $n$ in a page: size = data type size * $n$.
(2) The max page size $P$.
(3) The memtable size $M$ and the number of active chunks $c$ in the
memtable: $M / c$.
Real scenarios may be more complicated. (See the sketch after the next
paragraph for one way these factors could be combined.)

If we cannot find an intelligent way, at least we can expose the initial
size as an advanced parameter for the DBA to tune.
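
For example, a rough sketch of combining the three factors (all of the
parameter names below are illustrative, not real IoTDB config fields):

// Sketch only: take the smallest bound implied by factors (1)-(3) and
// use it as the initial capacity. Hypothetical names throughout.
public static int estimateInitSize(int dataTypeSize, int maxPointNumber,
    int maxPageSize, long memtableSize, int activeChunkCount) {
  long byPointNumber = (long) dataTypeSize * maxPointNumber; // factor (1)
  long byMemtable = memtableSize / activeChunkCount;         // factor (3)
  return (int) Math.min(maxPageSize,                         // factor (2)
      Math.min(byPointNumber, byMemtable));
}

// Usage with made-up values: 8-byte doubles, 1024 points per page,
// a 64K max page size, a 128 MB memtable and 3000 active chunks:
// int initSize = estimateInitSize(8, 1024, 64 * 1024, 128L << 20, 3000);
// this.valueOut = new PublicBAOS(initSize);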

For more info, please read the function `void
checkPageSizeAndMayOpenANewPage()` in `ChunkWriterImpl`.
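
Its logic is roughly the following (a simplified sketch, not the actual
code; the field and helper names are illustrative):

// Sketch only: points are written first and the size is checked
// afterwards, which is why a page can end up somewhat larger than the
// configured threshold.
private void checkPageSizeAndMayOpenANewPage() {
  if (pageWriter.getPointNumber() >= maxNumberOfPointsInPage
      || pageWriter.estimateMaxMemSize() >= pageSizeThreshold) {
    writePageToPageBuffer(); // seal the current page and start a new one
  }
}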

If you have more questions, do not hesitate to send email.

Best,
----------------------------------
Xiangdong Huang
School of Software, Tsinghua University



atoiLiu <atoi...@163.com> 于2019年12月12日周四 下午6:06写道:

> Hi,
> I understand the process of writing a TsFile, but there is one point
> that is not very clear to me. I hope someone can give me some advice.
> TsFile has the concept of a Page, which consists of two byte streams
> that grow alongside each other:
> 1. timeOut
> 2. valueOut
> Both are buffered by the PublicBAOS class, and I notice that it extends
> ByteArrayOutputStream and does not initialize the capacity when used:
> // PageWriter allocates both buffers with the default capacity
> private PageWriter(Encoder timeEncoder, Encoder valueEncoder) {
>   this.timeOut = new PublicBAOS();
>   this.valueOut = new PublicBAOS();
>   this.timeEncoder = timeEncoder;
>   this.valueEncoder = valueEncoder;
> }
> // PublicBAOS only calls the no-argument super constructor...
> public PublicBAOS() {
>   super();
> }
> // ...which in java.io.ByteArrayOutputStream defaults to 32 bytes
> public ByteArrayOutputStream() {
>     this(32);
> }
> I noticed that the design expectation is a page size of about 64K,
> so the cache will keep growing and copying its data (growing from the
> default 32 bytes to 64K takes 11 doublings, each with an array copy).
> I think this is a waste, so I want to give the buffer an initial
> capacity, but how large should it be?
> private void grow(int minCapacity) {
>     // overflow-conscious code
>     int oldCapacity = buf.length;
>     int newCapacity = oldCapacity << 1;
>     if (newCapacity - minCapacity < 0)
>         newCapacity = minCapacity;
>     if (newCapacity - MAX_ARRAY_SIZE > 0)
>         newCapacity = hugeCapacity(minCapacity);
>     buf = Arrays.copyOf(buf, newCapacity);
> }
> In the implementation of ByteArrayOutputStream, the default is to
> double the capacity on each extension.
> In the page write flow, the default is to write first and only then
> check whether the buffered data is larger than 64K, so the data may end
> up larger than 64K.
> In that case it would be wrong to set the initial capacity to exactly
> 64K: the first overflow would double it to 128K and waste even more
> resources.
> And I think the initial value should be less than 64K anyway, because
> when there are very many time series it might cause an OOM.
> So I do not really know how much to set.
>
> I do not know whether I am thinking about this correctly. I am looking
> forward to your reply.
> Thanks again.
