回覆: A SQL to Query a Group of Devices Seperately

Jack Tsai Thu, 05 Dec 2019 12:10:24 -0800

Hi,

First of all, sorry for the delay doing this function as I got some personal 
stuff to do these days. I got some problem while implementing this issue.


It could be implemented in some place by changing the filter operator from 
“and” to “or” within the group by device function such as below:

```
SELECT * FROM root.sg1.d1, root.sg1.d2 WHERE s1=1 group by device
```

Equals

Original:
```
SELECT * FROM root.sg1.d1, root.sg1.d2 WHERE root.sg1.d1.s1=1 and 
root.sg1.d2.s1=1
```
After:
```
SELECT * FROM root.sg1.d1, root.sg1.d2 WHERE root.sg1.d1.s1=1 or 
root.sg1.d2.s1=1
```

However, this will come up a problem. To be more specifically, the example will 
be provided below:

```
INSERT INTO root.sg1.d1(timestamp,s1,s2,s3) values(1,1,1,1)
INSERT INTO root.sg1.d2(timestamp,s1,s2,s3) values(1,3,2,2)
INSERT INTO root.sg1.d2(timestamp,s1,s2,s3) values(2,1,2,2)
```

Query with group by device (after changing the filter operator from “and” to 
“or” within the group by device function):
```
SELECT * FROM root.sg1.d1, root.sg1.d2 WHERE s1=1 group by device
```

This will return with the result as follow:
+-----------------------------------+--------------------+----------+----------+
|                               Time|              Device|          s1|      
s2|    s3|
+-----------------------------------+--------------------+----------+----------+
|                                     1|         root.sg1.d1|         1|       
1|      1|
|                                     1|         root.sg1.d2|         3|       
2|      2|
|                                     2|         root.sg1.d2|         1|       
2|      2|
+-----------------------------------+--------------------+----------+----------+

As above shown,  the record in root.sg1.d2 which shares the timestamp in 
root.sg1.d1 will also show up in the return result, even if it did not have the 
value 1 in its s1.

I am not sure or let’s say I think it is not acceptable for the requirement of 
this issue. However, I cannot find the proper way to solve this problem, which 
the implementation of the RowRecord seems like always needed to bind with the 
timestamp.

Do you guys have solutions for this kind of situations? Welcome and discuss 
with me. Or I may just commit the pull request for you to check it out.

Best regards,
Jack Tsai

________________________________
寄件者: Lei Rui <[email protected]>
寄件日期: Saturday, November 23, 2019 8:12:45 AM
收件者: [email protected] <[email protected]>
主旨: Re: A SQL to Query a Group of Devices Seperately

Hi,


I want to share some of my thoughts about the to-be-changed <group by device>,
after I took a look at the introduction of GROUP BY in the Microsoft SQL Docs 
[1] and
ISO/IEC 9075-2:2003 2 ("SQL/Foundation") <group by clause>.


Basically I agree with the semantic changes of <group by device> as Tsai 
described in the email.
I try to sum up: it is how <group by device> interacts with the SELECT 
statement that is to be changed.
The original <group by device> interacts mainly with the FROM clause.
The new <group by device> will additionally interact with the WHERE clause by 
making WHERE conditions function within separate devices.
Example query:
```
SELECT * FROM root.sg.d2,root.sg.d1 WHERE s1=1 <(new) group by device>
```
According to my understanding, the result will be the same as the following 
conceptual query that returns a union of two separate queries:
```
SELECT * FROM root.sg.d2 WHERE s1=1
UNION ALL
SELECT * FROM root.sg.d1 WHERE s1=1
ORDER BY device_name (p.s. or just keep the same order as in the FROM clause. 
This can be left for more discussion.)
```


------
Best,
Lei Rui


[1] 
https://docs.microsoft.com/en-us/sql/t-sql/queries/select-group-by-transact-sql?view=sql-server-ver15#syntax




On 11/23/2019 13:37，Xiangdong Huang<[email protected]> wrote：
Hi,

The term "group by device" is fine for ok.

Or, we can  I think classify the queries into 3 part:
1. return data points  time series  by time series, while in each
timeseries, the data points are ordered by the timestamp. (Or, totally
ordered by timestamp for all timeseries)
2. Join all time series that belong to the same device on the timestamp, in
Relational SQL, looks like d1.s1 join d1.s2 on d1.s1.time=d1.s2.time join
d1.s3 on d1.s1.time=d1.s3.time ...
3. Join all the time series in the database, which is what IoTDB now
supports.

This issue IOTDB-305 is for solving the 2nd query.

Best,
-----------------------------------
Xiangdong Huang
School of Software, Tsinghua University

黄向东
清华大学 软件学院


Lei Rui <[email protected]> 于2019年11月23日周六 下午12:02写道：

Hi Tsai,


+1 for your idea. It sounds good to me.


Best,
Lei Rui
On 11/23/2019 09:42，Jack Tsai<[email protected]> wrote：
Hi all,

I am recently working on this issue:
https://issues.apache.org/jira/browse/IOTDB-305, which is about resolving
the problem while users want to query something like below as the issue
reporter Lei Rui said:

select * from root.sg.d1,root.sg.d2 where s1=1 <device separate>(a demo
conception)
/ select * from root.sg.* where s1=1 <device separate>(a demo conception)

There is no specific sql that could satisfy this kind of query
requirement. The only one which is similar to this concept is using the
“group by device” statement. However, it also cannot return the appropriate
result the users need as what Lei Rui mentioned in the issue.

Now, I plan to edit the implement logic of the “group by device”
statement. The original implement method, which is like using the “and”
statement to form the condition part of the sql as mentioned in this issue,
cannot well suited for regular users’ requirements.

In conclusion, the original function of the “group by device” will be
abandoned, and it will be replaced by the one which mentioned in this issue
to satisfy users query. I’m not sure whether this is OK for you guys or the
project. So if you got any advice, please welcome to discuss with me.

Best regards,
Jack Tsai

回覆: A SQL to Query a Group of Devices Seperately

Reply via email to