Re: Handling Duplicate Timestamps

2024-05-13 Thread Trevor Hart
Thank you! I will implement a workaround for now.


I would appreciate some consideration for this option in the future.


Thanks 

Trevor Hart

Ope Limited

w: http://www.ope.nz/

m: +64212728039









Re: Handling Duplicate Timestamps

2024-05-13 Thread Xiangdong Huang
> 1. Checking before insert if the timestamp already exists and remedying
> it on the client before resending
> 2. Moving to nanosecond precision and introducing an insignificant time
> offset to keep timestamp values unique
Yes, these may be the best solutions for a specific application.


Analysis for IoTDB:
- Rejecting a write that carries an existing timestamp is time-costly
(IoTDB needs to check historical data). I think we will not check for
it until we find a low-latency method.
- Allowing multiple value versions for a timestamp may set off a chain
reaction: a lot of code would have to be modified, which is a huge
amount of work.

There is a new idea (but I have no time to implement it...):
- Add a parameter to IoTDB, replace_strategy: first, last, avg, etc.
- When a write with an existing timestamp arrives, IoTDB accepts it.
- When IoTDB runs an LSM merge and meets multiple values for a
timestamp, it handles them according to the replace_strategy.

This solution would also introduce some work... and we need to think
carefully about its impact on the query process. We also need to survey
whether this is a common requirement.
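The proposed merge-time handling can be sketched in a few lines. This is a plain-Python model of the idea only: the replace_strategy parameter comes from the proposal above and is not an existing IoTDB feature, and merge_duplicates is a hypothetical name.

```python
from statistics import mean

# Hypothetical resolution of duplicate timestamps at merge time,
# keyed by the proposed replace_strategy values.
RESOLVERS = {
    "first": lambda values: values[0],
    "last": lambda values: values[-1],
    "avg": lambda values: mean(values),
}

def merge_duplicates(points, strategy="last"):
    """points: (timestamp, value) pairs in write order, possibly with
    duplicate timestamps. Returns one resolved value per timestamp."""
    buckets = {}
    for ts, value in points:
        buckets.setdefault(ts, []).append(value)
    resolver = RESOLVERS[strategy]
    return {ts: resolver(vals) for ts, vals in sorted(buckets.items())}
```

During an LSM merge the same resolution would run once per timestamp, so queries after the merge see a single value per timestamp.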

Best,
---
Xiangdong Huang



Re: Handling Duplicate Timestamps

2024-05-13 Thread Trevor Hart
Hello Yuan



Correct, the first timestamp and values should be retained.



I realise this does not align with the current design. I was just asking 
whether there was an existing option to block duplicates.



In a normal RDBMS, if you try to insert a duplicate, the insert will fail 
with a PK violation. It would be great in some circumstances if IoTDB at 
least had the option to fail this way.



I am considering some options, such as:



1. Checking before insert whether the timestamp already exists, and 
remedying it on the client before resending.

2. Moving to nanosecond precision and introducing an insignificant time 
offset to keep timestamp values unique.



I have already done something similar to #2 when storing IIS web log files, 
as they are recorded in seconds rather than milliseconds.
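Option 2 can be sketched like this (illustrative code, assuming a single writer per series so collisions can be resolved locally; uniquify_ns is a hypothetical name):

```python
def uniquify_ns(timestamps_ns):
    """Bump colliding nanosecond timestamps by the smallest possible
    offset so every point keeps a unique, still-ordered key."""
    out, last = [], None
    for ts in timestamps_ns:
        if last is not None and ts <= last:
            ts = last + 1  # add one insignificant nanosecond
        out.append(ts)
        last = ts
    return out
```

The offset stays far below millisecond resolution, so downstream aggregations by second or millisecond are unaffected.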



Thanks 

Trevor Hart





Re: Handling Duplicate Timestamps

2024-05-13 Thread Yuan Tian
Hi Trevor,

By "rejects duplicates", you mean you want to keep the first duplicate
timestamp and its corresponding values? (Because the following duplicated
ones will be rejected.)

Best regards,

Yuan Tian



Jakarta migration

2024-05-13 Thread Christofer Dutz
Hi all,

I am currently working on the Jakarta migration … initially I thought of this 
as an experiment to find out what the implications would be.
It turns out my gut feeling was right that this would not be a simple change, 
less because of the complexity of the changes themselves than because of what 
they would bring to the rest of the build.

So, for now it seems as if the only Netty version compatible with the Jakarta 
namespace is Netty 11 (technically they claimed 10 would be compatible, but 
that's just not true).
Unfortunately, Netty 9 was the last version to work with Java 8 … so it seems 
as if moving to the Jakarta namespace would stop us from building on Java 8.

I knew why I would have preferred to save this migration for a bigger release … 
I guess a 1.4.0 or 2.0.0 might be more appropriate for a change like this.

Chris


Using SNAPSHOT versions in our releases

2024-05-13 Thread Christofer Dutz
Hi all,

I just noticed that we have a dependency on a SNAPSHOT Ratis version in our 
build:

3.1.0-611b80a-SNAPSHOT

I wasn't too concerned at first, as I saw it was only recently set to that, 
but on a deeper look, this was changed from yet another snapshot version of 
Ratis.

The problem is that these snapshots only exist in the Apache Maven repository, 
and they will be cleaned up after some time. This WILL break any older version 
of IoTDB out there in the wild: people will not be able to build our source 
bundles.

We really need to ensure that we only release software that consists entirely 
of released artifacts.
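For illustration only, a release build would pin a released artifact instead of a repository-dependent snapshot (the coordinates and version below are examples, not a recommendation for a specific release):

```xml
<!-- Example only: depend on a released Ratis artifact, never a snapshot. -->
<dependency>
  <groupId>org.apache.ratis</groupId>
  <artifactId>ratis-server</artifactId>
  <version>3.0.0</version>  <!-- a released version, not 3.1.0-*-SNAPSHOT -->
</dependency>
```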

Chris



Re: Handling Duplicate Timestamps

2024-05-13 Thread Trevor Hart




Correct. I’m not disputing that. What I’m asking is that it would be good to 
have a configuration that either allows overwrites or rejects duplicates.

My scenario is request log data from a server (the device). As it may be 
processing multiple requests at once, there is a chance of colliding 
timestamps.

As it stands now, I would need to check whether the timestamp exists before 
inserting the data, which obviously affects throughput.

Thanks 
Trevor Hart

 On Fri, 10 May 2024 00:33:40 +1200 Jialin Qiao wrote ---

Hi,

In IoT or IIoT scenarios, we thought each data point represents a metric at 
a timestamp. In which cases do you need to store duplicated values? Take 
this as an example:

Time, root.sg1.car1.speed
1, 1
1, 2

Could a car have two different speeds at time 1?

Jialin Qiao

Yuan Tian <jackietie...@gmail.com> wrote on Thursday, 9 May 2024 at 18:51:
> Hi Trevor,
>
> Now we will override the duplicate timestamp with the newer one. There is
> nothing we can do about it now.
>
> Best regards,
> ---
> Yuan Tian
>
> On Wed, May 8, 2024 at 5:31 PM Trevor Hart wrote:
> > Hello
> >
> > I’m aware that when inserting a duplicate timestamp the values will be
> > overwritten. This will obviously result in data loss.
> >
> > Is there a config/setting to reject or throw an error on duplicate
> > inserts? Although highly unlikely, I would prefer to be alerted to the
> > situation rather than lose data.
> >
> > I read through the documentation but couldn’t find anything.
> >
> > Thanks
> > Trevor Hart
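The overwrite behaviour discussed in this thread, and the client-side check Trevor describes, can be modelled in a few lines (a plain-Python sketch, not the real IoTDB client API; insert_rejecting_duplicates is a hypothetical name):

```python
# Current behaviour as described in the thread: a later write to the
# same timestamp silently overwrites the earlier value (last write wins).
series = {}
for ts, speed in [(1, 1), (1, 2)]:  # the car-speed example above
    series[ts] = speed
assert series == {1: 2}  # the first value is lost

# The guard Trevor describes: look the timestamp up before inserting,
# at the cost of an extra check on every write.
def insert_rejecting_duplicates(store, ts, value):
    if ts in store:
        raise ValueError(f"duplicate timestamp {ts}")
    store[ts] = value
```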








Seeking for advice integrating IoTDB with Apache StreamPipes

2024-05-13 Thread Tim Bossenmaier
Hi all,

I'm from the Apache StreamPipes project, an IIoT toolbox, and we are
currently working on integrating Apache IoTDB as our time-series
storage. We are using InfluxDB at the moment, but we want to provide
support for IoTDB as well, and maybe even replace Influx in the end if
all goes well.

Right now we have a basic integration with StreamPipes and IoTDB that
allows us to persist our events in IoTDB. Achieving this was quite
easy and we are really enjoying working with IoTDB so far!

For the next steps, we want to support some more of the StreamPipes
functionality with IoTDB and are looking for some help on how to use
IoTDB the right way.

We have outlined our plans and questions in this GitHub discussion
(https://github.com/apache/streampipes/discussions/2857) and would
really appreciate any help you can provide.


Best
Tim