Re: Parquet Verbose Logging

Ryan Blue Fri, 24 Jan 2020 17:08:14 -0800

Let's see how it goes moving more things to debug. I don't think it should
be a policy to move everything to debug, but we can certainly put more
information at the debug level that is currently not needed at info.


On Fri, Jan 24, 2020 at 4:08 PM David Mollitor <[email protected]> wrote:

> Hey Ryan,
>
> I think you understand my position correctly and articulated it well.  My
> background is from higher up the stack; a consumer of these libraries.
>
> We may need to agree to disagree on this one.  Projects these days include
> 100+ libraries and I don't want to have to set a custom log level for each
> one.  Much easier for consumer of libraries to keep everything as quiet as
> possible and then only have to worry about a custom logging level when
> something goes wrong.  Parquet in particular logs a lot of stuff at INFO
> level that is very specific to Parquet and would only be useful (if at all)
> to someone that really knows the library, not something that would be
> helpful to the higher level application developer.
>
> Thanks.
>
>
>
> On Fri, Jan 24, 2020 at 6:48 PM Ryan Blue <[email protected]> wrote:
>
>> It sounds like we see logging differently. My approach is that for any
>> library, the type of information should be categorized using the same
>> criteria into log levels. For example, if it is a normal event you might
>> want to know about, use info. It looks like your approach is that the
>> levels should be set for information from the perspective of the end
>> application: is this behavior relevant to the end user?
>>
>> The problem is that you don't always know whether something is relevant
>> to the end user because that context depends on the application. For the
>> Parquet CLI, much more Parquet information is relevant than for Presto that
>> is scanning Parquet files. That's why I think it's best to categorize the
>> log information using a standard definition, and rely on the end
>> application to configure log levels for its users expectations.
>>
>> On Fri, Jan 24, 2020 at 10:29 AM David Mollitor <[email protected]>
>> wrote:
>>
>>> Hello Ryan,
>>>
>>> I appreciate you taking the time to share your thoughts.
>>>
>>> I'd just like to point out that there is also TRACE level logging if
>>> Parquet requires greater granularity.
>>>
>>> Furthermore, I'm not suggesting that there be an unbreakable rule that
>>> all logging must be DEBUG, but it should be the exception, not the rule.
>>> It is more likely the situation the the wrapping application would be
>>> responsible for logging at the INFO and WARN/ERROR level.  Something
>>> like....
>>>
>>> try {
>>>    LOG.info("Using Parquet to read file {}", path);
>>>    avroParquetReader.read();
>>> } catch (Exception e) {
>>>   LOG.error("Failed to read Parquet file", e);
>>> }
>>>
>>> This is a very normal setup and doesn't require any additional logging
>>> from the Parquet library itself.  Once I see an error with "Failed to re
>>> Parquet file", then I'm going to turn on DEBUG logging and try to reproduce
>>> the error.
>>>
>>> Thanks,
>>> David
>>>
>>> On Fri, Jan 24, 2020 at 12:01 PM Ryan Blue <[email protected]>
>>> wrote:
>>>
>>>> I don't agree with the idea to convert all of Parquet's logs to DEBUG
>>>> level, but I do think that we can improve the levels of individual
>>>> messages.
>>>>
>>>> If we convert all logs to debug, then turning on logs to see what
>>>> Parquet
>>>> is doing would show everything from opening an input file to position
>>>> tracking in output files. That's way too much information, which is why
>>>> we
>>>> use different log levels to begin with.
>>>>
>>>> I think we should continue using log levels to distinguish between
>>>> types of
>>>> information: error for errors, warn for recoverable errors that may or
>>>> may
>>>> not indicate a problem, info for regular operations, and debug for extra
>>>> information if you're debugging the Parquet library. Following the
>>>> common
>>>> convention enables people to choose what information they want instead
>>>> of
>>>> mixing it all together.
>>>>
>>>> If you want to only see error and warning logs from Parquet, then the
>>>> right
>>>> way to do that is to configure your logger so that the level for
>>>> org.apache.parquet classes is warn. That's not to say I don't agree
>>>> that we
>>>> can cut down on what is logged at info and clean it up; I just don't
>>>> think
>>>> it's a good idea to abandon the idea of log levels to distinguish
>>>> between
>>>> different information the user of a library will need.
>>>>
>>>> On Fri, Jan 24, 2020 at 6:30 AM lukas nalezenec <[email protected]>
>>>> wrote:
>>>>
>>>> > Hi,
>>>> > I can help too.
>>>> > Lukas
>>>> >
>>>> > Dne pá 24. 1. 2020 15:29 uživatel David Mollitor <[email protected]>
>>>> > napsal:
>>>> >
>>>> > > Hello Team,
>>>> > >
>>>> > > I am happy to do the work of reviewing all Parquet logging, but I
>>>> need
>>>> > help
>>>> > > getting the work committed.
>>>> > >
>>>> > > Fokko Driesprong has been a wonderfully ally in helping me get
>>>> > incremental
>>>> > > improvements into Parquet, but I wonder if there's anyone else that
>>>> can
>>>> > > share in the load.
>>>> > >
>>>> > > Thanks,
>>>> > > David
>>>> > >
>>>> > > On Thu, Jan 23, 2020 at 11:55 AM Michael Heuer <[email protected]>
>>>> > wrote:
>>>> > >
>>>> > > > Hello David,
>>>> > > >
>>>> > > > As I mentioned on PARQUET-1758, we have been frustrated by overly
>>>> > verbose
>>>> > > > logging in Parquet for a long time.  Various workarounds have
>>>> been more
>>>> > > or
>>>> > > > less successful, e.g.
>>>> > > >
>>>> > > > https://github.com/bigdatagenomics/adam/issues/851 <
>>>> > > > https://github.com/bigdatagenomics/adam/issues/851>
>>>> > > >
>>>> > > > I would support a move making Parquet a silent partner.  :)
>>>> > > >
>>>> > > >    michael
>>>> > > >
>>>> > > >
>>>> > > > > On Jan 23, 2020, at 10:25 AM, David Mollitor <[email protected]
>>>> >
>>>> > > wrote:
>>>> > > > >
>>>> > > > > Hello Team,
>>>> > > > >
>>>> > > > > I have been a consumer of Apache Parquet through Apache Hive for
>>>> > > several
>>>> > > > > years now.  For a long time, logging in Parquet has been pretty
>>>> > > painful.
>>>> > > > > Some of the logging was going to STDOUT and some was going to
>>>> Log4J.
>>>> > > > > Overall, though the framework has been too verbose, spewing
>>>> many log
>>>> > > > lines
>>>> > > > > about internal details of Parquet I don't understand.
>>>> > > > >
>>>> > > > > The logging has gotten a lot better with recent releases moving
>>>> > solidly
>>>> > > > > into SLF4J.  That is awesome and very welcomed.  However,
>>>> (opinion
>>>> > > > alert) I
>>>> > > > > think the logging is still too verbose.  I think Parquet should
>>>> be a
>>>> > > > silent
>>>> > > > > partner in data processing.  If everything is going well, it
>>>> should
>>>> > be
>>>> > > > > silent (DEBUG level logging).  If things are going wrong, it
>>>> should
>>>> > > throw
>>>> > > > > an Exception.
>>>> > > > >
>>>> > > > > If an operator suspects Parquet is the issue (and that's rarely
>>>> the
>>>> > > first
>>>> > > > > thing to check), they can set the logging for all of the
>>>> Loggers in
>>>> > the
>>>> > > > > entire Parquet package (org.apache.parquet) to DEBUG to get the
>>>> > > required
>>>> > > > > information.  Not to mention, the less logging it does, the
>>>> faster it
>>>> > > > will
>>>> > > > > be.
>>>> > > > >
>>>> > > > > I've opened this discussion because I've got two PRs related to
>>>> this
>>>> > > > topic
>>>> > > > > ready to go:
>>>> > > > >
>>>> > > > > PARQUET-1758
>>>> > > > > PARQUET-1761
>>>> > > > >
>>>> > > > > Thanks,
>>>> > > > > David
>>>> > > >
>>>> > > >
>>>> > >
>>>> >
>>>>
>>>>
>>>> --
>>>> Ryan Blue
>>>> Software Engineer
>>>> Netflix
>>>>
>>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>>
>

-- 
Ryan Blue
Software Engineer
Netflix

Re: Parquet Verbose Logging

Reply via email to