Re: Whether to automatically filter out rows that are entirely null in the result set during a scan by the storage engine

Yuan Tian Thu, 29 Aug 2024 00:01:21 -0700

Hi all,

Another question is whether we should make incompatible changes on the master
branch for aligned timeseries in the tree model. (We have already discussed
upgrading the master branch version to 2.0.0-SNAPSHOT, which means it may
contain incompatible changes.)

However, if we change this in the tree model, the result sets for aligned
and non-aligned timeseries will differ.
For aligned timeseries, it's just like the table model: all timeseries in one
device share the same time column, so even if s2 and s3 have no values in the
second and third rows, we still record that in their column bitmaps, which
indicates that those rows exist.
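
To make the aligned case concrete, here is a minimal sketch. It is only a
hypothetical in-memory model, not IoTDB's actual storage layout: `times`,
`columns`, and `scan_aligned` are names invented for illustration, and `None`
stands in for a null marked in a column bitmap. The data are the d1 example
from the first mail in this thread.

```python
# Hypothetical model of an aligned device: one time column shared by all sensors.
times = [1, 2, 3]
columns = {
    "s1": [1, 2, 3],
    "s2": [10, None, None],    # None stands in for a null recorded in the bitmap
    "s3": [100, None, None],
}

def scan_aligned(selected, drop_all_null_rows):
    """Return (time, value, ...) rows for the selected sensors."""
    rows = []
    for i, t in enumerate(times):
        values = [columns[s][i] for s in selected]
        if drop_all_null_rows and all(v is None for v in values):
            continue  # skip a row whose selected values are all null
        rows.append((t, *values))
    return rows
```

With `drop_all_null_rows=True` the sketch returns only the row at timestamp 1
for s2 and s3; with `False` it returns all three rows, because the shared time
column already tells us that rows 2 and 3 exist.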

But for non-aligned timeseries, each timeseries in a device has its own
time column, which means that if we query only s2 and s3, we will get only
timestamp `1`. To know whether other rows exist for this device, we would
need to scan all the timeseries under that device, which is a very expensive
operation, and we won't do that.
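
A minimal sketch of the non-aligned case, again as a hypothetical in-memory
model rather than the real engine (`series`, `scan_non_aligned`, and
`device_timestamps` are invented names; the data are the d1 example from the
first mail in this thread):

```python
# Hypothetical model: each non-aligned series keeps its own time -> value points.
series = {
    "s1": {1: 1, 2: 2, 3: 3},
    "s2": {1: 10},
    "s3": {1: 100},
}

def scan_non_aligned(selected):
    """Read only the selected series; only their timestamps are visible."""
    timestamps = sorted(set().union(*(series[s].keys() for s in selected)))
    return [(t, *(series[s].get(t) for s in selected)) for t in timestamps]

def device_timestamps():
    """Finding every row of the device requires reading all of its series."""
    return sorted(set().union(*(points.keys() for points in series.values())))
```

Here `scan_non_aligned(["s2", "s3"])` sees only timestamp 1, while
`device_timestamps()` has to touch s1 as well to discover timestamps 2 and 3,
which is the expensive full-device scan described above.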

That is also why the tree model filters these rows for aligned timeseries:
we just wanted to keep the query result the same as for non-aligned timeseries.

So what do you think?

[1] Change nothing in the tree model; only change the behavior of the table
model.
[2] Make the same change for aligned timeseries in the tree model as in the
table model, and leave non-aligned timeseries in the tree model alone.

Best regards,
--------------------
Yuan Tian

On Thu, Aug 29, 2024 at 10:07 AM Weihao Li <18110526...@163.com> wrote:

> Hi Yuan,
>
> I think it is good to be consistent with the behavior of relational
> databases. If we get only one row when querying select s2, s3 from
> root.db.d1, users cannot distinguish between the following two cases:
>
> 1. The device has no data at other times;
>
> 2. The device has data at other times, but the selected sensors have no
> data.
>
> Yours,
>
> Weihao Li
>
> At 2024-08-29 09:45:07, "Yuan Tian" <jackietie...@gmail.com> wrote:
> >Hi all,
> >
> >If you are familiar with the tree model, you should know that a device
> >d1 can contain three sensors, s1, s2, and s3, with data like this:
> >Time | s1 | s2   | s3   |
> >-----|----|------|------|
> >1    |  1 |   10 |  100 |
> >2    |  2 | null | null |
> >3    |  3 | null | null |
> >
> >
> >If we query only s2 and s3 (select s2, s3 from root.db.d1), we will get
> >only one row (the first row), because in rows 2 and 3, s2 and s3 are both
> >null: we automatically filter out rows that are entirely null during a
> >scan by the storage engine.
> >
> >However, this is inconsistent with the behavior of relational databases,
> >which would return all three rows. So in our table model, should we stay
> >consistent with the tree model, or follow the relational database way?
> >Personally, I think that we should maintain consistency with relational
> >databases.
> >
> >What do you think?
> >
> >Best regards,
> >---------------------- Yuan Tian
>
