What if we added these as system metrics and added a way to write metrics to a (separate?) log file?
> On Oct 4, 2017, at 10:13 AM, Piotr Nowojski <pi...@data-artisans.com> wrote:
>
> Hi,
>
> Lately I was debugging some weird test failures on Travis, and I needed to
> look into metrics such as the following, without access to the machines
> themselves:
> - User, System, IOWait and IRQ CPU usage (based on CPU ticks since the
>   previous check)
> - System-wide memory consumption (including making sure that swap was
>   disabled)
> - network usage
> - etc…
>
> For this purpose I implemented a periodic daemon thread logger. The log
> output looked like this:
>
> https://gist.github.com/pnowojski/8b863abb0fb08ac75b62627feadbd2f7
>
> I think it would be nice to add this feature to Flink itself, by extending
> the existing MemoryLogger. The same lack of information that I had with
> Travis could easily occur in production environments. The problem is that
> there is no easy way to obtain this kind of information without using some
> external library (think about cross-platform support). I used this one:
>
> https://github.com/oshi/oshi
>
> It has some minimal additional dependencies. One thing worth noting is JNA:
> its JAR weighs ~1MB. We would have two options for adding this feature:
>
> 1. Include the oshi dependency in flink-runtime
> 2. Wrap oshi in a flink-contrib/flink-resource-logger module and make this
>    new module an optional, dynamically loaded dependency of flink-runtime
>    (used only if the user manually copies flink-resource-logger.jar to the
>    class path).
>
> I would lean toward 1., since it’s a powerful tool and its dependencies
> are pretty minimal (except for the JNA JAR size). What do you think?
>
> Piotrek
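
For reference, the "periodic daemon thread logger" idea from the quoted mail can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses JDK classes alone (java.lang.management) rather than oshi, so it reports just the load average, CPU count and JVM heap figures, not the IOWait/IRQ/swap/network numbers mentioned above. The names ResourceLogger, snapshot and start are hypothetical and are not taken from Flink or from the linked gist.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.util.Locale;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical helper, not part of Flink: periodically emits a one-line resource snapshot.
class ResourceLogger {

    // Builds one log line from JDK-provided figures only (no oshi).
    static String snapshot() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        Runtime rt = Runtime.getRuntime();
        long heapUsed = rt.totalMemory() - rt.freeMemory();
        // getSystemLoadAverage() returns a negative value where the platform does not provide it.
        return String.format(Locale.ROOT,
                "loadAvg=%.2f cpus=%d heapUsed=%d heapMax=%d",
                os.getSystemLoadAverage(), os.getAvailableProcessors(),
                heapUsed, rt.maxMemory());
    }

    // Starts a single daemon thread that passes a snapshot to the sink every periodMillis.
    static ScheduledExecutorService start(long periodMillis, Consumer<String> sink) {
        ScheduledExecutorService executor =
                Executors.newSingleThreadScheduledExecutor(r -> {
                    Thread t = new Thread(r, "resource-logger");
                    t.setDaemon(true); // must not keep the JVM alive on shutdown
                    return t;
                });
        executor.scheduleAtFixedRate(
                () -> sink.accept(snapshot()), 0, periodMillis, TimeUnit.MILLISECONDS);
        return executor;
    }

    public static void main(String[] args) throws InterruptedException {
        // Print a snapshot once per second for a few seconds, then stop.
        ScheduledExecutorService executor = start(1000, System.out::println);
        Thread.sleep(3500);
        executor.shutdownNow();
    }
}
```

The sink is a plain Consumer<String>, so the same loop could write to stdout, to a regular logger, or to a separate log file, which is the open question in this thread.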