Hi Jialin


Yes, the values would be different.



As an example, these rows are from a web server log. The device is openzweb01, 
an IIS web server that may handle multiple requests at the same time. The 
rows are unique in their own right, but the logged timestamp is the same.



2024-05-20 00:00:14 W3SVC1 openzweb01 192.168.3.69 POST 
/portal/sharing/rest/community/users/Meriadoc 200 0 0 3339 503 7


2024-05-20 00:00:14 W3SVC1 openzweb01 192.168.3.69 POST 
/portal/sharing/rest/community/users/Peregrin 200 0 0 3327 503 6


2024-05-20 00:00:14 W3SVC1 openzweb01 192.168.3.69 POST 
/portal/sharing/rest/community/users/Samwise 200 0 0 3325 502 6

2024-05-20 00:00:14 W3SVC1 openzweb01 192.168.3.69 POST 
/portal/sharing/rest/community/users/siteadmin 200 0 0 15279 504 5


2024-05-20 00:00:15 W3SVC1 openzweb01 192.168.3.69 POST 
/portal/sharing/rest/community/users/testuser 200 0 0 1794 503 6

2024-05-20 00:00:15 W3SVC1 openzweb01 192.168.3.69 POST 
/portal/sharing/rest/community/users/testuser2 200 0 0 1794 506 6



This particular log file only records to the second. So what I am doing with 
these rows at the moment is adding an artificial millisecond to enforce 
uniqueness.




2024-05-20 00:00:14.000 W3SVC1 openzweb01 192.168.3.69 POST 
/portal/sharing/rest/community/users/Meriadoc 200 0 0 3339 503 7 

2024-05-20 00:00:14.001 W3SVC1 openzweb01 192.168.3.69 POST 
/portal/sharing/rest/community/users/Peregrin 200 0 0 3327 503 6 

2024-05-20 00:00:14.002 W3SVC1 openzweb01 192.168.3.69 POST 
/portal/sharing/rest/community/users/Samwise 200 0 0 3325 502 6

2024-05-20 00:00:14.003 W3SVC1 openzweb01 192.168.3.69 POST 
/portal/sharing/rest/community/users/siteadmin 200 0 0 15279 504 5 

2024-05-20 00:00:15.000 W3SVC1 openzweb01 192.168.3.69 POST 
/portal/sharing/rest/community/users/testuser 200 0 0 1794 503 6

2024-05-20 00:00:15.001 W3SVC1 openzweb01 192.168.3.69 POST 
/portal/sharing/rest/community/users/testuser2 200 0 0 1794 506 6
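
The millisecond bump shown above can be sketched in Python roughly as follows. This is only an illustration, not the actual loader: the function name add_artificial_millis is made up, and it assumes each row starts with "date time" separated by single spaces, that rows arrive in log order, and that no more than 1000 rows share one second.

```python
from collections import defaultdict


def add_artificial_millis(lines):
    """Append .000, .001, ... to the timestamp of rows sharing a second."""
    seen = defaultdict(int)  # "date time" -> number of rows already seen
    out = []
    for line in lines:
        # Split off the date and time fields; keep the rest of the row as-is.
        date, time, rest = line.split(" ", 2)
        key = f"{date} {time}"
        offset = seen[key]
        if offset > 999:
            # Assumption: a millisecond suffix only has room for 1000 rows.
            raise ValueError(f"more than 1000 rows share timestamp {key}")
        seen[key] += 1
        out.append(f"{date} {time}.{offset:03d} {rest}")
    return out
```

The first row seen for a given second keeps .000, so the original ordering within that second is preserved in the generated timestamps.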



Some other log files that I am processing are already in milliseconds, but 
there is still a (small) chance of data loss if multiple requests happen to be 
processed at the exact same millisecond.



I have been thinking about this some more, and I think that rather than break 
the IoTDB CRUD model I should handle this on the client side. In my use case 
the log data is actually staged in an H2 database before it is sent to IoTDB, 
so I can enforce PK validation there. That way it is less expensive than 
checking the timestamp in IoTDB for each record.
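
A minimal sketch of that staging check, for illustration only. It uses Python's built-in sqlite3 module as a stand-in for H2, and the table name, column names and the stage_rows helper are all hypothetical. The point is just that a primary key on (device, ts) makes the staging database reject a colliding row before anything is sent to IoTDB.

```python
import sqlite3


def stage_rows(conn, rows):
    """Insert (device, ts, payload) rows; return those rejected by the PK."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS staging ("
        " device TEXT NOT NULL,"
        " ts TEXT NOT NULL,"
        " payload TEXT NOT NULL,"
        " PRIMARY KEY (device, ts))"
    )
    rejected = []
    for row in rows:
        try:
            conn.execute(
                "INSERT INTO staging (device, ts, payload) VALUES (?, ?, ?)",
                row,
            )
        except sqlite3.IntegrityError:
            # Duplicate (device, ts): keep the first row, report this one.
            rejected.append(row)
    conn.commit()
    return rejected
```

Rejected rows can then be fixed up on the client (for example by bumping the timestamp) and staged again, so only rows with unique timestamps are forwarded.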



Thanks 

Trevor Hart








---- On Fri, 17 May 2024 19:11:13 +1200 Jialin Qiao <qiaojia...@apache.org> 
wrote ---



Hi Trevor, 
 
Will different values of the same timestamp be the same? 
 
1. Same 
Time, Value 
1, 1 
1, 1 
1, 1 
 
2. Different 
Time, Value 
1, 1 
1, 2 
1, 1 
 
 
Jialin Qiao 
 
Trevor Hart <mailto:tre...@ope.nz> wrote on Tue, 14 May 2024 at 11:20: 
> 
> Thank you! I will implement some workaround for now. 
> 
> 
> I would appreciate some consideration for this option in the future. 
> 
> 
> Thanks 
> 
> Trevor Hart 
> 
> Ope Limited 
> 
> w: http://www.ope.nz/ 
> 
> m: +64212728039 
> 
> 
> 
> 
> 
> 
> 
> 
> ---- On Tue, 14 May 2024 15:17:47 +1200 Xiangdong Huang 
> <mailto:saint...@gmail.com> wrote --- 
> 
> 
> 
> > 1. Checking before insert if the timestamp already exists and remedy on the 
> > client before resend 
> > 2. Moving to Nanosecond and introducing some insignificant time value to 
> > keep timestamp values unique. 
> Yes, these may be the best solutions for a specific application. 
> 
> 
> Analysis for IoTDB: 
> - Rejecting the write when receiving an existing timestamp in IoTDB is 
> time-costly (IoTDB needs to check historical data). I think we will 
> not check it until we find a low-latency method. 
> - Allowing multiple value versions for a timestamp may introduce a 
> chain reaction, and there may be a lot of code that would need to be 
> modified, which is a huge amount of work. 
> 
> There is a new idea (but I have no time to implement it...) 
> - Add a parameter in IoTDB: replace_strategy: first, last, avg etc... 
> - When an existing timestamp arrives, IoTDB accepts it. 
> - When IoTDB runs LSM to merge data and meets multiple values for a 
> timestamp, it handles them according to the replace_strategy. 
> 
> The solution may also introduce some work to do... and we need to 
> think carefully about the impact on the query process. 
> We need to survey whether this is a common requirement. 
> 
> Best, 
> ----------------------------------- 
> Xiangdong Huang 
> 
> Trevor Hart <mailto:tre...@ope.nz> wrote on Tue, 14 May 2024 at 09:55: 
> > 
> > Hello Yuan 
> > 
> > 
> > 
> > Correct, the first timestamp and values should be retained. 
> > 
> > 
> > 
> > I realise this does not align with the current design. I was just asking 
> > whether there was an existing option to block duplicates. 
> > 
> > 
> > 
> > In a normal RDBMS, if you try to insert a duplicate the insert will 
> > fail with a PK violation. It would be great in some circumstances if IoTDB 
> > at least had the option to fail this way. 
> > 
> > 
> > 
> > I am considering some options, such as: 
> > 
> > 
> > 
> > 1. Checking before insert if the timestamp already exists and remedy on the 
> > client before resend 
> > 
> > 2. Moving to Nanosecond and introducing some insignificant time value to 
> > keep timestamp values unique. 
> > 
> > 
> > 
> > I have already done something similar to #2 with storing IIS web log files 
> > as they are recorded in seconds and not milliseconds. 
> > 
> > 
> > 
> > Thanks 
> > 
> > Trevor Hart 
> > 
> > 
> > 
> > 
> > ---- On Tue, 14 May 2024 13:29:02 +1200 Yuan Tian 
> > <mailto:jackietie...@gmail.com> wrote --- 
> > 
> > 
> > 
> > Hi Trevor, 
> > 
> > By "rejects duplicates", do you mean you want to keep the first duplicate 
> > timestamp and its corresponding values (because the following duplicated 
> > ones will be rejected)? 
> > 
> > Best regards, 
> > -------------------- 
> > Yuan Tian 
> > 
> > On Mon, May 13, 2024 at 6:24 PM Trevor Hart 
> > <mailto:tre...@ope.nz> wrote: 
> > 
> > > 
> > > 
> > > 
> > > 
> > > Correct. I’m not disputing that. What I’m asking is that it would be good 
> > > to have a configuration that either allows overwrites or rejects 
> > > duplicates. 
> > > 
> > > My scenario is request log data from a server (the device). As it may be 
> > > processing multiple requests at once, there is a chance that there could 
> > > be colliding timestamps. 
> > > 
> > > As it stands now I would need to check whether the timestamp exists before 
> > > inserting the data, which obviously affects throughput. 
> > > 
> > > Thanks 
> > > 
> > > Trevor Hart 
> > > 
> > > ---- On Fri, 10 May 2024 00:33:40 +1200 Jialin Qiao 
> > > <mailto:qiaojia...@apache.org> wrote ---- 
> > > 
> > > Hi, 
> > > 
> > > In IoT or IIoT scenarios, we thought each data point represents a metric 
> > > at a timestamp. In which case do you need to store duplicated values? 
> > > 
> > > Take this for an example: 
> > > 
> > > Time, root.sg1.car1.speed 
> > > 1, 1 
> > > 1, 2 
> > > 
> > > Could a car have different speeds at time 1? 
> > > 
> > > Jialin Qiao 
> > > 
> > > Yuan Tian <mailto:jackietie...@gmail.com> wrote on Thu, 9 May 2024 at 18:51: 
> > > > 
> > > > Hi Trevor, 
> > > > 
> > > > Now we will override the duplicate timestamp with a newer one. There is 
> > > > nothing we can do about it now. 
> > > > 
> > > > Best regards, 
> > > > ------------------- 
> > > > Yuan Tian 
> > > > 
> > > > On Wed, May 8, 2024 at 5:31 PM Trevor Hart <mailto:tre...@ope.nz> wrote: 
> > > > 
> > > > > Hello 
> > > > > 
> > > > > I’m aware that when inserting a duplicate timestamp the values will be 
> > > > > overwritten. This will obviously result in data loss. 
> > > > > 
> > > > > Is there a config/setting to reject or throw an error on duplicate 
> > > > > inserts? Although highly unlikely I would prefer to be alerted to the 
> > > > > situation rather than lose data. 
> > > > > 
> > > > > I read through the documentation but couldn’t find anything. 
> > > > > 
> > > > > Thanks 
> > > > > 
> > > > > Trevor Hart 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > >
