Let's see how it goes moving more things to debug. I don't think it should be a policy to move everything to debug, but we can certainly put more information at the debug level that is currently not needed at info.
On Fri, Jan 24, 2020 at 4:08 PM David Mollitor <[email protected]> wrote: > Hey Ryan, > > I think you understand my position correctly and articulated it well. My > background is from higher up the stack; a consumer of these libraries. > > We may need to agree to disagree on this one. Projects these days include > 100+ libraries and I don't want to have to set a custom log level for each > one. Much easier for consumer of libraries to keep everything as quiet as > possible and then only have to worry about a custom logging level when > something goes wrong. Parquet in particular logs a lot of stuff at INFO > level that is very specific to Parquet and would only be useful (if at all) > to someone that really knows the library, not something that would be > helpful to the higher level application developer. > > Thanks. > > > > On Fri, Jan 24, 2020 at 6:48 PM Ryan Blue <[email protected]> wrote: > >> It sounds like we see logging differently. My approach is that for any >> library, the type of information should be categorized using the same >> criteria into log levels. For example, if it is a normal event you might >> want to know about, use info. It looks like your approach is that the >> levels should be set for information from the perspective of the end >> application: is this behavior relevant to the end user? >> >> The problem is that you don't always know whether something is relevant >> to the end user because that context depends on the application. For the >> Parquet CLI, much more Parquet information is relevant than for Presto that >> is scanning Parquet files. That's why I think it's best to categorize the >> log information using a standard definition, and rely on the end >> application to configure log levels for its users expectations. >> >> On Fri, Jan 24, 2020 at 10:29 AM David Mollitor <[email protected]> >> wrote: >> >>> Hello Ryan, >>> >>> I appreciate you taking the time to share your thoughts. >>> >>> I'd just like to point out that there is also TRACE level logging if >>> Parquet requires greater granularity. >>> >>> Furthermore, I'm not suggesting that there be an unbreakable rule that >>> all logging must be DEBUG, but it should be the exception, not the rule. >>> It is more likely the situation the the wrapping application would be >>> responsible for logging at the INFO and WARN/ERROR level. Something >>> like.... >>> >>> try { >>> LOG.info("Using Parquet to read file {}", path); >>> avroParquetReader.read(); >>> } catch (Exception e) { >>> LOG.error("Failed to read Parquet file", e); >>> } >>> >>> This is a very normal setup and doesn't require any additional logging >>> from the Parquet library itself. Once I see an error with "Failed to re >>> Parquet file", then I'm going to turn on DEBUG logging and try to reproduce >>> the error. >>> >>> Thanks, >>> David >>> >>> On Fri, Jan 24, 2020 at 12:01 PM Ryan Blue <[email protected]> >>> wrote: >>> >>>> I don't agree with the idea to convert all of Parquet's logs to DEBUG >>>> level, but I do think that we can improve the levels of individual >>>> messages. >>>> >>>> If we convert all logs to debug, then turning on logs to see what >>>> Parquet >>>> is doing would show everything from opening an input file to position >>>> tracking in output files. That's way too much information, which is why >>>> we >>>> use different log levels to begin with. >>>> >>>> I think we should continue using log levels to distinguish between >>>> types of >>>> information: error for errors, warn for recoverable errors that may or >>>> may >>>> not indicate a problem, info for regular operations, and debug for extra >>>> information if you're debugging the Parquet library. Following the >>>> common >>>> convention enables people to choose what information they want instead >>>> of >>>> mixing it all together. >>>> >>>> If you want to only see error and warning logs from Parquet, then the >>>> right >>>> way to do that is to configure your logger so that the level for >>>> org.apache.parquet classes is warn. That's not to say I don't agree >>>> that we >>>> can cut down on what is logged at info and clean it up; I just don't >>>> think >>>> it's a good idea to abandon the idea of log levels to distinguish >>>> between >>>> different information the user of a library will need. >>>> >>>> On Fri, Jan 24, 2020 at 6:30 AM lukas nalezenec <[email protected]> >>>> wrote: >>>> >>>> > Hi, >>>> > I can help too. >>>> > Lukas >>>> > >>>> > Dne pá 24. 1. 2020 15:29 uživatel David Mollitor <[email protected]> >>>> > napsal: >>>> > >>>> > > Hello Team, >>>> > > >>>> > > I am happy to do the work of reviewing all Parquet logging, but I >>>> need >>>> > help >>>> > > getting the work committed. >>>> > > >>>> > > Fokko Driesprong has been a wonderfully ally in helping me get >>>> > incremental >>>> > > improvements into Parquet, but I wonder if there's anyone else that >>>> can >>>> > > share in the load. >>>> > > >>>> > > Thanks, >>>> > > David >>>> > > >>>> > > On Thu, Jan 23, 2020 at 11:55 AM Michael Heuer <[email protected]> >>>> > wrote: >>>> > > >>>> > > > Hello David, >>>> > > > >>>> > > > As I mentioned on PARQUET-1758, we have been frustrated by overly >>>> > verbose >>>> > > > logging in Parquet for a long time. Various workarounds have >>>> been more >>>> > > or >>>> > > > less successful, e.g. >>>> > > > >>>> > > > https://github.com/bigdatagenomics/adam/issues/851 < >>>> > > > https://github.com/bigdatagenomics/adam/issues/851> >>>> > > > >>>> > > > I would support a move making Parquet a silent partner. :) >>>> > > > >>>> > > > michael >>>> > > > >>>> > > > >>>> > > > > On Jan 23, 2020, at 10:25 AM, David Mollitor <[email protected] >>>> > >>>> > > wrote: >>>> > > > > >>>> > > > > Hello Team, >>>> > > > > >>>> > > > > I have been a consumer of Apache Parquet through Apache Hive for >>>> > > several >>>> > > > > years now. For a long time, logging in Parquet has been pretty >>>> > > painful. >>>> > > > > Some of the logging was going to STDOUT and some was going to >>>> Log4J. >>>> > > > > Overall, though the framework has been too verbose, spewing >>>> many log >>>> > > > lines >>>> > > > > about internal details of Parquet I don't understand. >>>> > > > > >>>> > > > > The logging has gotten a lot better with recent releases moving >>>> > solidly >>>> > > > > into SLF4J. That is awesome and very welcomed. However, >>>> (opinion >>>> > > > alert) I >>>> > > > > think the logging is still too verbose. I think Parquet should >>>> be a >>>> > > > silent >>>> > > > > partner in data processing. If everything is going well, it >>>> should >>>> > be >>>> > > > > silent (DEBUG level logging). If things are going wrong, it >>>> should >>>> > > throw >>>> > > > > an Exception. >>>> > > > > >>>> > > > > If an operator suspects Parquet is the issue (and that's rarely >>>> the >>>> > > first >>>> > > > > thing to check), they can set the logging for all of the >>>> Loggers in >>>> > the >>>> > > > > entire Parquet package (org.apache.parquet) to DEBUG to get the >>>> > > required >>>> > > > > information. Not to mention, the less logging it does, the >>>> faster it >>>> > > > will >>>> > > > > be. >>>> > > > > >>>> > > > > I've opened this discussion because I've got two PRs related to >>>> this >>>> > > > topic >>>> > > > > ready to go: >>>> > > > > >>>> > > > > PARQUET-1758 >>>> > > > > PARQUET-1761 >>>> > > > > >>>> > > > > Thanks, >>>> > > > > David >>>> > > > >>>> > > > >>>> > > >>>> > >>>> >>>> >>>> -- >>>> Ryan Blue >>>> Software Engineer >>>> Netflix >>>> >>> >> >> -- >> Ryan Blue >> Software Engineer >> Netflix >> > -- Ryan Blue Software Engineer Netflix
