It sounds like we see logging differently. My approach is that for any library, the type of information should be categorized using the same criteria into log levels. For example, if it is a normal event you might want to know about, use info. It looks like your approach is that the levels should be set for information from the perspective of the end application: is this behavior relevant to the end user?
The problem is that you don't always know whether something is relevant to the end user because that context depends on the application. For the Parquet CLI, much more Parquet information is relevant than for Presto that is scanning Parquet files. That's why I think it's best to categorize the log information using a standard definition, and rely on the end application to configure log levels for its users expectations. On Fri, Jan 24, 2020 at 10:29 AM David Mollitor <[email protected]> wrote: > Hello Ryan, > > I appreciate you taking the time to share your thoughts. > > I'd just like to point out that there is also TRACE level logging if > Parquet requires greater granularity. > > Furthermore, I'm not suggesting that there be an unbreakable rule that all > logging must be DEBUG, but it should be the exception, not the rule. It is > more likely the situation the the wrapping application would be responsible > for logging at the INFO and WARN/ERROR level. Something like.... > > try { > LOG.info("Using Parquet to read file {}", path); > avroParquetReader.read(); > } catch (Exception e) { > LOG.error("Failed to read Parquet file", e); > } > > This is a very normal setup and doesn't require any additional logging > from the Parquet library itself. Once I see an error with "Failed to re > Parquet file", then I'm going to turn on DEBUG logging and try to reproduce > the error. > > Thanks, > David > > On Fri, Jan 24, 2020 at 12:01 PM Ryan Blue <[email protected]> > wrote: > >> I don't agree with the idea to convert all of Parquet's logs to DEBUG >> level, but I do think that we can improve the levels of individual >> messages. >> >> If we convert all logs to debug, then turning on logs to see what Parquet >> is doing would show everything from opening an input file to position >> tracking in output files. That's way too much information, which is why we >> use different log levels to begin with. >> >> I think we should continue using log levels to distinguish between types >> of >> information: error for errors, warn for recoverable errors that may or may >> not indicate a problem, info for regular operations, and debug for extra >> information if you're debugging the Parquet library. Following the common >> convention enables people to choose what information they want instead of >> mixing it all together. >> >> If you want to only see error and warning logs from Parquet, then the >> right >> way to do that is to configure your logger so that the level for >> org.apache.parquet classes is warn. That's not to say I don't agree that >> we >> can cut down on what is logged at info and clean it up; I just don't think >> it's a good idea to abandon the idea of log levels to distinguish between >> different information the user of a library will need. >> >> On Fri, Jan 24, 2020 at 6:30 AM lukas nalezenec <[email protected]> wrote: >> >> > Hi, >> > I can help too. >> > Lukas >> > >> > Dne pá 24. 1. 2020 15:29 uživatel David Mollitor <[email protected]> >> > napsal: >> > >> > > Hello Team, >> > > >> > > I am happy to do the work of reviewing all Parquet logging, but I need >> > help >> > > getting the work committed. >> > > >> > > Fokko Driesprong has been a wonderfully ally in helping me get >> > incremental >> > > improvements into Parquet, but I wonder if there's anyone else that >> can >> > > share in the load. >> > > >> > > Thanks, >> > > David >> > > >> > > On Thu, Jan 23, 2020 at 11:55 AM Michael Heuer <[email protected]> >> > wrote: >> > > >> > > > Hello David, >> > > > >> > > > As I mentioned on PARQUET-1758, we have been frustrated by overly >> > verbose >> > > > logging in Parquet for a long time. Various workarounds have been >> more >> > > or >> > > > less successful, e.g. >> > > > >> > > > https://github.com/bigdatagenomics/adam/issues/851 < >> > > > https://github.com/bigdatagenomics/adam/issues/851> >> > > > >> > > > I would support a move making Parquet a silent partner. :) >> > > > >> > > > michael >> > > > >> > > > >> > > > > On Jan 23, 2020, at 10:25 AM, David Mollitor <[email protected]> >> > > wrote: >> > > > > >> > > > > Hello Team, >> > > > > >> > > > > I have been a consumer of Apache Parquet through Apache Hive for >> > > several >> > > > > years now. For a long time, logging in Parquet has been pretty >> > > painful. >> > > > > Some of the logging was going to STDOUT and some was going to >> Log4J. >> > > > > Overall, though the framework has been too verbose, spewing many >> log >> > > > lines >> > > > > about internal details of Parquet I don't understand. >> > > > > >> > > > > The logging has gotten a lot better with recent releases moving >> > solidly >> > > > > into SLF4J. That is awesome and very welcomed. However, (opinion >> > > > alert) I >> > > > > think the logging is still too verbose. I think Parquet should >> be a >> > > > silent >> > > > > partner in data processing. If everything is going well, it >> should >> > be >> > > > > silent (DEBUG level logging). If things are going wrong, it >> should >> > > throw >> > > > > an Exception. >> > > > > >> > > > > If an operator suspects Parquet is the issue (and that's rarely >> the >> > > first >> > > > > thing to check), they can set the logging for all of the Loggers >> in >> > the >> > > > > entire Parquet package (org.apache.parquet) to DEBUG to get the >> > > required >> > > > > information. Not to mention, the less logging it does, the >> faster it >> > > > will >> > > > > be. >> > > > > >> > > > > I've opened this discussion because I've got two PRs related to >> this >> > > > topic >> > > > > ready to go: >> > > > > >> > > > > PARQUET-1758 >> > > > > PARQUET-1761 >> > > > > >> > > > > Thanks, >> > > > > David >> > > > >> > > > >> > > >> > >> >> >> -- >> Ryan Blue >> Software Engineer >> Netflix >> > -- Ryan Blue Software Engineer Netflix
