Bence, it is no doubt superior to the current default in terms of features/benefits. We just need to address any fault scenarios. I personally don't care how much space it uses; all good there. I am only focused on fault scenarios and recovery from them. This isn't data we need to protect like the content/metadata of the flow itself. We can and should be more ruthless in automating recovery.
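The automated recovery Joe calls for could look roughly like the following sketch. This is an illustrative pattern only, not NiFi's actual implementation; the class name, `openOrReset`, and the stand-in corruption check are all made up for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class StatusRepositoryRecovery {

    /**
     * Illustrative "zero ops" recovery pattern (not NiFi's actual code):
     * if the status database looks corrupted, move it aside and start a
     * fresh one so the node keeps running without manual intervention.
     */
    static Path openOrReset(Path dbDir) throws IOException {
        if (looksCorrupted(dbDir)) {
            // Quarantine the damaged files instead of deleting them,
            // so they can still be inspected later if needed.
            Path quarantine = dbDir.resolveSibling(
                    dbDir.getFileName() + ".corrupt-" + System.currentTimeMillis());
            Files.move(dbDir, quarantine);
        }
        Files.createDirectories(dbDir); // fresh, empty database location
        return dbDir;
    }

    // Stand-in corruption check; a real one would validate the database's
    // own files. Here "corrupted" means the path exists but is not a directory.
    static boolean looksCorrupted(Path dbDir) {
        return Files.exists(dbDir) && !Files.isDirectory(dbDir);
    }
}
```

The point of the quarantine step is that "resets and gets going again" need not mean losing forensic evidence: the damaged files stay on disk under a timestamped name while the node continues with an empty store.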
Thanks

On Wed, Jul 19, 2023 at 9:31 AM Simon Bence <simonbence....@gmail.com> wrote:

> Thanks for the quick feedback!
>
> Joe: your concerns are relevant, let me provide some details.
>
> The database uses some disk space, determined by the number of components
> and the number of covered days. While adding it I was checking the time
> usage and, although I don't have the numbers any more, the usage seemed
> reasonable. I can do a bit of testing and bring some numbers to improve
> confidence in it. Additionally, the necessary disk space is limited: we
> have rollover handling, which caps the amount of stored data at the
> target number of days plus one. This is due to a limitation of QuestDB's
> partitioning: at the time of development the smallest partitioning
> strategy was day-based, if I remember correctly, so the unit of deletion
> was the partition just shifted out of the threshold. (Hour-based
> partitioning now looks to be available, which might be worth the effort
> to upgrade to.)
>
> The current rollover deletes all data older than the threshold, but I am
> thinking of adding a new implementation which keeps some aggregated
> information about the components. That of course needs some more space,
> again depending on the number of components and the time span.
>
> If the disk is full, we have no way to push metrics down to the database,
> and currently there is no fallback strategy for that. One possibility
> would be to temporarily keep the data in memory (similar to the
> VolatileComponentStatusRepository in that regard), but I am not convinced
> that, on a node with resources close to their limits, writing data into
> memory instead of to disk would necessarily be a good strategy. This is
> something to consider.
>
> If the database becomes corrupted, then we lose the status information.
> This, I think, is true for most persisted storage; however, as long as
> the database files are not changed by external tools, I would think there
> is an insignificant chance of this happening. Fallback strategies might
> be added (for example, if NiFi considers the database corrupted, it might
> start a new one), but even without this I think the QuestDB-based
> solution has its merits compared to the in-memory storage.
>
> Manual intervention should not be needed. Currently, in order to use this
> capability the configuration must be changed, but if we made it the
> default, it should work without any additional interaction.
>
> Regards,
> Bence
>
>> On 2023. Jul 19., at 14:57, Joe Witt <joe.w...@gmail.com> wrote:
>>
>> Agree functionally
>>
>> How does this handle disk usage? Any manual intervention needed? What if
>> the disk is full where it writes? What if the db somehow becomes
>> corrupted?
>>
>> I'd like to ensure this thing is zero ops as much as possible, such that
>> in error conditions it resets and gets going again.
>>
>> Thanks
>>
>> On Wed, Jul 19, 2023 at 8:55 AM Pierre Villard
>> <pierre.villard...@gmail.com> wrote:
>>
>>> I do think this provides great value. The possibility of accessing the
>>> status history of the components, and at the system level, across
>>> restarts is a great improvement for NiFi troubleshooting. It also gives
>>> the ability to store this information for a longer period of time. I'm
>>> definitely in favor of making this the default starting with NiFi 2.0.
>>>
>>> On Wed, Jul 19, 2023 at 13:49, Simon Bence <simonbence....@gmail.com>
>>> wrote:
>>>
>>>> Hi Community,
>>>>
>>>> I was thinking whether it would make sense to set QuestDB as the
>>>> default status history backend in 2.0. It has been there for a while
>>>> and I would consider it a step forward, so the new major version might
>>>> be a good time for the wider audience.
>>>> It comes with less memory usage for bigger flows and the possibility
>>>> of checking status information when the node is not running or has
>>>> been restarted, so I think it is worth considering. Any insight or
>>>> improvement point is appreciated, thanks!
>>>>
>>>> Regards,
>>>> Bence
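The rollover bound discussed in the thread (stored data limited to the target number of days plus one, because the unit of deletion is a whole day partition) can be sketched roughly as follows. This is an illustrative model only, not NiFi's actual QuestDB integration; the class and method names are made up for the example.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

public class DayPartitionRollover {

    /**
     * Given the existing day partitions and a retention target in days,
     * return the partitions the rollover should drop. Because deletion
     * happens per whole day partition, up to target + 1 days of data can
     * exist at any moment -- the "target number plus one days" bound
     * described in the thread.
     */
    static List<LocalDate> partitionsToDrop(List<LocalDate> partitions,
                                            LocalDate today, int targetDays) {
        // The oldest partition still retained covers (today - targetDays),
        // so today's (partial) partition plus targetDays older ones remain.
        LocalDate oldestKept = today.minusDays(targetDays);
        List<LocalDate> toDrop = new ArrayList<>();
        for (LocalDate partition : partitions) {
            if (partition.isBefore(oldestKept)) {
                toDrop.add(partition); // shifted out of the retention window
            }
        }
        return toDrop;
    }
}
```

Switching to hour-based partitions, as suggested above, would tighten the same bound from "target plus one day" to "target plus one hour" without changing the shape of this logic.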