Re: [DISCUSS] SonarCloud integration for Apache Hive

2022-08-14 Thread Ayush Saxena
Thanx Alessandro and Stamatis for driving this. If it only highlights the
issues introduced by the code changes in the current PR, and not those from
the whole master branch or from all the files touched, it does make sense to
integrate it. Since Sonar is prone to a lot of false alarms (personal
experience), I too suggest we hold off on quality gates for now and come
back to this after a couple of months.

Checking the PR: the comment does seem to show the counts w.r.t. the current
master, but on navigating further I found a link,
https://sonarcloud.io/summary/new_code?id=apache_hive, which shows
just the issues in the new code. I think that is what we need; it would have
been better if the PR comment showed only these counts rather than the
current master counts, since I don't think the master counts are relevant
on every PR.

Anyway, +1 to merge, and we can follow up on further possible improvements.

-Ayush

On Wed, 10 Aug 2022 at 15:40, Stamatis Zampetakis  wrote:

> That's great news! From the initial message, I got the impression that the
> Sonar label in the PR will report all problems currently in master (and not
> only the new ones).
>
> I agree, it is better not to enforce quality gates directly but leave some
> time for the rest of us to get familiar with the tool.
>
> Best,
> Stamatis
>
> On Tue, Aug 9, 2022 at 6:04 PM Alessandro Solimando <
> alessandro.solima...@gmail.com> wrote:
>
> > Hi Stamatis,
> > glad to hear you find Sonar helpful, thanks for providing your feedback.
> >
> > The master branch analysis already provides what I think you are looking
> > for, you have:
> >
> >- all code analysis (to see the full status of the code):
> >https://sonarcloud.io/summary/overall?id=apache_hive
> >- new code analysis (basically what changed in the last commit):
> >https://sonarcloud.io/summary/new_code?id=apache_hive
> >
> > For PRs, similarly, the analysis covers the changes w.r.t. the target
> > branch, it's a good and quick way to ascertain the code quality of the
> PR.
> >
> > Regarding "Is it possible to somehow save the current analysis on master
> > and make the
> > PR quality gates fail when things become worse?", it is definitely
> > possible, we can define a success/failure threshold for each of the
> > metrics, and make it fail if the quality gate criteria are not met.
> >
> > I was suggesting to postpone this to allow people to get first familiar
> > with it, I would not want to disrupt existing work, Sonar is a rich tool
> > and people might need a bit of time to adjust to it.
> >
> > Good news is that quality gates can be changed directly from SonarCloud
> and
> > won't require code changes, we might kick off a feedback discussion after
> a
> > month or so from when we introduce Sonar analysis and see what people
> > think.
> >
> > Best regards,
> > Alessandro
> >
> > On Tue, 9 Aug 2022 at 16:38, Stamatis Zampetakis 
> > wrote:
> >
> > > Hi Alessandro,
> > >
> > > Sonar integration will definitely help in improving code quality and
> > > preventing bugs so many thanks for pushing this forward.
> > >
> > > I went over the PR and it is in good shape. I plan to merge it in the
> > > following days unless someone objects.
> > > We can tackle further improvements in follow up JIRAs.
> > >
> > > Is it possible to somehow save the current analysis on master and make
> > the
> > > PR quality gates fail when things become worse?
> > > If not then what may help in reviewing PRs is to have a diff view
> > (between
> > > a PR and current master) so we can quickly tell if the PR we are about
> to
> > > merge makes things better or worse; as far as I understand the idea is
> to
> > > do this manually at the moment by checking the results on master and on
> > the
> > > PR under review.
> > >
> > > Enabling code coverage would be very helpful as well. Looking forward
> to
> > > this.
> > >
> > > Best,
> > > Stamatis
> > >
> > > On Mon, Aug 8, 2022 at 1:22 PM Alessandro Solimando <
> > > alessandro.solima...@gmail.com> wrote:
> > >
> > > > Correction: the right PR link is the following
> > > > https://github.com/apache/hive/pull/3254
> > > >
> > > > Best regards,
> > > > Alessandro
> > > >
> > > > On Mon, 8 Aug 2022 at 10:04, Alessandro Solimando <
> > > > alessandro.solima...@gmail.com> wrote:
> > > >
> > > > > Hi community,
> > > > > in the context of HIVE-26196
> > > > >  we started
> > > > considering
> > > > > the adoption of SonarCloud 
> analysis
> > > for
> > > > > Apache Hive to promote data-driven code quality improvements and to
> > > allow
> > > > > reviewers to focus on the conceptual part of the changes by helping
> > > them
> > > > > spot trivial code smells, security issues and bugs.
> > > > >
> > > > > SonarCloud has already been adopted and integrated into a few top
> > > Apache
> > > > > projects like DolphinScheduler <
> https://dolphinscheduler.apache.org/
> > >
> > > > and Apache
> > 

Re: [DISCUSS] SonarCloud integration for Apache Hive

2022-08-10 Thread Stamatis Zampetakis
That's great news! From the initial message, I got the impression that the
Sonar label in the PR will report all problems currently in master (and not
only the new ones).

I agree, it is better not to enforce quality gates directly but leave some
time for the rest of us to get familiar with the tool.

Best,
Stamatis

On Tue, Aug 9, 2022 at 6:04 PM Alessandro Solimando <
alessandro.solima...@gmail.com> wrote:

> Hi Stamatis,
> glad to hear you find Sonar helpful, thanks for providing your feedback.
>
> The master branch analysis already provides what I think you are looking
> for, you have:
>
>- all code analysis (to see the full status of the code):
>https://sonarcloud.io/summary/overall?id=apache_hive
>- new code analysis (basically what changed in the last commit):
>https://sonarcloud.io/summary/new_code?id=apache_hive
>
> For PRs, similarly, the analysis covers the changes w.r.t. the target
> branch, it's a good and quick way to ascertain the code quality of the PR.
>
> Regarding "Is it possible to somehow save the current analysis on master
> and make the
> PR quality gates fail when things become worse?", it is definitely
> possible, we can define a success/failure threshold for each of the
> metrics, and make it fail if the quality gate criteria are not met.
>
> I was suggesting to postpone this to allow people to get first familiar
> with it, I would not want to disrupt existing work, Sonar is a rich tool
> and people might need a bit of time to adjust to it.
>
> Good news is that quality gates can be changed directly from SonarCloud and
> won't require code changes, we might kick off a feedback discussion after a
> month or so from when we introduce Sonar analysis and see what people
> think.
>
> Best regards,
> Alessandro
>
> On Tue, 9 Aug 2022 at 16:38, Stamatis Zampetakis 
> wrote:
>
> > Hi Alessandro,
> >
> > Sonar integration will definitely help in improving code quality and
> > preventing bugs so many thanks for pushing this forward.
> >
> > I went over the PR and it is in good shape. I plan to merge it in the
> > following days unless someone objects.
> > We can tackle further improvements in follow up JIRAs.
> >
> > Is it possible to somehow save the current analysis on master and make
> the
> > PR quality gates fail when things become worse?
> > If not then what may help in reviewing PRs is to have a diff view
> (between
> > a PR and current master) so we can quickly tell if the PR we are about to
> > merge makes things better or worse; as far as I understand the idea is to
> > do this manually at the moment by checking the results on master and on
> the
> > PR under review.
> >
> > Enabling code coverage would be very helpful as well. Looking forward to
> > this.
> >
> > Best,
> > Stamatis
> >
> > On Mon, Aug 8, 2022 at 1:22 PM Alessandro Solimando <
> > alessandro.solima...@gmail.com> wrote:
> >
> > > Correction: the right PR link is the following
> > > https://github.com/apache/hive/pull/3254
> > >
> > > Best regards,
> > > Alessandro
> > >
> > > On Mon, 8 Aug 2022 at 10:04, Alessandro Solimando <
> > > alessandro.solima...@gmail.com> wrote:
> > >
> > > > Hi community,
> > > > in the context of HIVE-26196
> > > >  we started
> > > considering
> > > > the adoption of SonarCloud  analysis
> > for
> > > > Apache Hive to promote data-driven code quality improvements and to
> > allow
> > > > reviewers to focus on the conceptual part of the changes by helping
> > them
> > > > spot trivial code smells, security issues and bugs.
> > > >
> > > > SonarCloud has already been adopted and integrated into a few top
> > Apache
> > > > projects like DolphinScheduler  >
> > > and Apache
> > > > Jackrabbit FileVault .
> > > >
> > > > For those who don't know, Sonar is a code analysis tool, the initial
> > > > adoption would aim at tracking code quality for the master branch,
> and
> > > > making the PRs' review process easier, by allowing to compare which
> > > > code/security issues a PR solved/introduced with respect to the main
> > > branch.
> > > >
> > > > We already have a Hive-dedicated project under the Apache
> foundation's
> > > > SonarCloud account:
> > > https://sonarcloud.io/project/overview?id=apache_hive.
> > > >
> > > > In what follows I will highlight the main points of interest:
> > > >
> > > > 1) sonar adoption scope:
> > > > For the time being a descriptive approach (just show the analysis and
> > > > associated metrics) could be adopted, delaying a prescriptive one
> > (i.e.,
> > > > quality gates based on the metrics for PRs' mergeability) to a later
> > time
> > > > where we have tested SonarCloud for long enough to judge that it
> could
> > > be a
> > > > sensible move.
> > > >
> > > > 2) false positives:
> > > > Sonar suffers from false positives, but they can be marked as such
> from
> > > > the web 

Re: [DISCUSS] SonarCloud integration for Apache Hive

2022-08-09 Thread Alessandro Solimando
Hi Stamatis,
glad to hear you find Sonar helpful, thanks for providing your feedback.

The master branch analysis already provides what I think you are looking
for; you have:

   - all code analysis (to see the full status of the code):
   https://sonarcloud.io/summary/overall?id=apache_hive
   - new code analysis (basically what changed in the last commit):
   https://sonarcloud.io/summary/new_code?id=apache_hive

For PRs, similarly, the analysis covers the changes w.r.t. the target
branch; it's a good and quick way to ascertain the code quality of the PR.

Regarding "Is it possible to somehow save the current analysis on master
and make the
PR quality gates fail when things become worse?", it is definitely
possible, we can define a success/failure threshold for each of the
metrics, and make it fail if the quality gate criteria are not met.
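
For concreteness, a quality gate boils down to a set of (metric, operator,
threshold) conditions evaluated against the analysis results. The small Java
sketch below only models that idea; the metric keys and thresholds are
illustrative assumptions, not a proposal, and in practice the conditions
would be configured from the SonarCloud UI rather than in code:

    import java.util.List;

    // Minimal model of a quality gate: each condition pairs a metric with an
    // operator and an error threshold, and the gate fails if any condition is
    // violated. Metric keys and thresholds here are purely illustrative.
    public class QualityGateSketch {

      record Condition(String metricKey, String operator, double errorThreshold) {}

      public static void main(String[] args) {
        List<Condition> gate = List.of(
            new Condition("new_bugs", "GREATER_THAN", 0),                         // no new bugs
            new Condition("new_code_smells", "GREATER_THAN", 10),                 // tolerate a few smells
            new Condition("new_duplicated_lines_density", "GREATER_THAN", 3.0));  // % duplicated lines

        gate.forEach(c -> System.out.printf("fail the gate when %s is %s %s%n",
            c.metricKey(), c.operator(), c.errorThreshold()));
      }
    }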

I was suggesting to postpone this to let people first get familiar with it;
I would not want to disrupt existing work. Sonar is a rich tool and people
might need a bit of time to adjust to it.

The good news is that quality gates can be changed directly from SonarCloud
and won't require code changes; we might kick off a feedback discussion a
month or so after we introduce the Sonar analysis and see what people think.

Best regards,
Alessandro

On Tue, 9 Aug 2022 at 16:38, Stamatis Zampetakis  wrote:

> Hi Alessandro,
>
> Sonar integration will definitely help in improving code quality and
> preventing bugs so many thanks for pushing this forward.
>
> I went over the PR and it is in good shape. I plan to merge it in the
> following days unless someone objects.
> We can tackle further improvements in follow up JIRAs.
>
> Is it possible to somehow save the current analysis on master and make the
> PR quality gates fail when things become worse?
> If not then what may help in reviewing PRs is to have a diff view (between
> a PR and current master) so we can quickly tell if the PR we are about to
> merge makes things better or worse; as far as I understand the idea is to
> do this manually at the moment by checking the results on master and on the
> PR under review.
>
> Enabling code coverage would be very helpful as well. Looking forward to
> this.
>
> Best,
> Stamatis
>
> On Mon, Aug 8, 2022 at 1:22 PM Alessandro Solimando <
> alessandro.solima...@gmail.com> wrote:
>
> > Correction: the right PR link is the following
> > https://github.com/apache/hive/pull/3254
> >
> > Best regards,
> > Alessandro
> >
> > On Mon, 8 Aug 2022 at 10:04, Alessandro Solimando <
> > alessandro.solima...@gmail.com> wrote:
> >
> > > Hi community,
> > > in the context of HIVE-26196
> > >  we started
> > considering
> > > the adoption of SonarCloud  analysis
> for
> > > Apache Hive to promote data-driven code quality improvements and to
> allow
> > > reviewers to focus on the conceptual part of the changes by helping
> them
> > > spot trivial code smells, security issues and bugs.
> > >
> > > SonarCloud has already been adopted and integrated into a few top
> Apache
> > > projects like DolphinScheduler 
> > and Apache
> > > Jackrabbit FileVault .
> > >
> > > For those who don't know, Sonar is a code analysis tool, the initial
> > > adoption would aim at tracking code quality for the master branch, and
> > > making the PRs' review process easier, by allowing to compare which
> > > code/security issues a PR solved/introduced with respect to the main
> > branch.
> > >
> > > We already have a Hive-dedicated project under the Apache foundation's
> > > SonarCloud account:
> > https://sonarcloud.io/project/overview?id=apache_hive.
> > >
> > > In what follows I will highlight the main points of interest:
> > >
> > > 1) sonar adoption scope:
> > > For the time being a descriptive approach (just show the analysis and
> > > associated metrics) could be adopted, delaying a prescriptive one
> (i.e.,
> > > quality gates based on the metrics for PRs' mergeability) to a later
> time
> > > where we have tested SonarCloud for long enough to judge that it could
> > be a
> > > sensible move.
> > >
> > > 2) false positives:
> > > Sonar suffers from false positives, but they can be marked as such from
> > > the web UI: (source https://docs.sonarqube.org/latest/faq/#header-1)
> > >
> > > How do I get rid of issues that are False-Positives?
> > >> False-Positive and Won't Fix
> > >> You can mark individual issues False Positive or Won't Fix through the
> > >> issues interface. If you're using PR analysis provided by the
> Developer
> > >> Edition, issues marked False Positive or Won't Fix will retain that
> > status
> > >> after merge. This is the preferred approach.
> > >
> > >
> > >> //NOSONAR
> > >> For most languages, SonarQube supports the use of the generic
> mechanism:
> > >> //NOSONAR at the end of the line of the issue. This will suppress all

Re: [DISCUSS] SonarCloud integration for Apache Hive

2022-08-09 Thread Stamatis Zampetakis
Hi Alessandro,

Sonar integration will definitely help in improving code quality and
preventing bugs, so many thanks for pushing this forward.

I went over the PR and it is in good shape. I plan to merge it in the
following days unless someone objects.
We can tackle further improvements in follow-up JIRAs.

Is it possible to somehow save the current analysis on master and make the
PR quality gates fail when things become worse?
If not, then what may help in reviewing PRs is to have a diff view (between
a PR and current master) so we can quickly tell if the PR we are about to
merge makes things better or worse; as far as I understand the idea is to
do this manually at the moment by checking the results on master and on the
PR under review.

Enabling code coverage would be very helpful as well. Looking forward to
this.

Best,
Stamatis

On Mon, Aug 8, 2022 at 1:22 PM Alessandro Solimando <
alessandro.solima...@gmail.com> wrote:

> Correction: the right PR link is the following
> https://github.com/apache/hive/pull/3254
>
> Best regards,
> Alessandro
>
> On Mon, 8 Aug 2022 at 10:04, Alessandro Solimando <
> alessandro.solima...@gmail.com> wrote:
>
> > Hi community,
> > in the context of HIVE-26196
> >  we started
> considering
> > the adoption of SonarCloud  analysis for
> > Apache Hive to promote data-driven code quality improvements and to allow
> > reviewers to focus on the conceptual part of the changes by helping them
> > spot trivial code smells, security issues and bugs.
> >
> > SonarCloud has already been adopted and integrated into a few top Apache
> > projects like DolphinScheduler 
> and Apache
> > Jackrabbit FileVault .
> >
> > For those who don't know, Sonar is a code analysis tool, the initial
> > adoption would aim at tracking code quality for the master branch, and
> > making the PRs' review process easier, by allowing to compare which
> > code/security issues a PR solved/introduced with respect to the main
> branch.
> >
> > We already have a Hive-dedicated project under the Apache foundation's
> > SonarCloud account:
> https://sonarcloud.io/project/overview?id=apache_hive.
> >
> > In what follows I will highlight the main points of interest:
> >
> > 1) sonar adoption scope:
> > For the time being a descriptive approach (just show the analysis and
> > associated metrics) could be adopted, delaying a prescriptive one (i.e.,
> > quality gates based on the metrics for PRs' mergeability) to a later time
> > where we have tested SonarCloud for long enough to judge that it could
> be a
> > sensible move.
> >
> > 2) false positives:
> > Sonar suffers from false positives, but they can be marked as such from
> > the web UI: (source https://docs.sonarqube.org/latest/faq/#header-1)
> >
> > How do I get rid of issues that are False-Positives?
> >> False-Positive and Won't Fix
> >> You can mark individual issues False Positive or Won't Fix through the
> >> issues interface. If you're using PR analysis provided by the Developer
> >> Edition, issues marked False Positive or Won't Fix will retain that
> status
> >> after merge. This is the preferred approach.
> >
> >
> >> //NOSONAR
> >> For most languages, SonarQube supports the use of the generic mechanism:
> >> //NOSONAR at the end of the line of the issue. This will suppress all
> >> issues - now and in the future - that might be raised on the line.
> >
> >
> > For the time being, I think that marking false positives via the UI is
> > more convenient than using "//NOSONAR", but this can be discussed
> further.
> >
> > 3) test code coverage:
> >
> > Due to the specific structure of the ptest infra (split execution and
> > other peculiarities), we are not yet supporting test code coverage, this
> > can be added at a later stage, in the meantime all the code quality and
> > security metrics are available.
> >
> > 4) what will be analyzed:
> >
> > the master branch and each open PR
> >
> > 5) integration with github:
> >
> > SonarCloud integrates with GitHub in two ways, the first one is an
> > additional item in the list of checks (where you have the spell checking,
> > CI result etc.) that will just say Passed/Not Passed and provide a link
> for
> > all the details, the second is a "summary" comment under the PR
> > highlighting the main info (you can see an example here
> > ).
> >
> > The second integration can be disabled if we consider that the first one
> > is enough, and that if we want to dig more we can open the associated
> link
> > for the full analysis in SonarCloud.
> >
> > 6) analysis runtime:
> >
> > In CI the full analysis takes around 30 minutes, but this step is
> executed
> > in parallel with the test split tasks and won't add to the total runtime.
> > For PRs SonarCloud detects unchanged files and avoids analysing 

Re: [DISCUSS] SonarCloud integration for Apache Hive

2022-08-08 Thread Alessandro Solimando
Correction: the right PR link is the following:
https://github.com/apache/hive/pull/3254

Best regards,
Alessandro

On Mon, 8 Aug 2022 at 10:04, Alessandro Solimando <
alessandro.solima...@gmail.com> wrote:

> Hi community,
> in the context of HIVE-26196
>  we started considering
> the adoption of SonarCloud  analysis for
> Apache Hive to promote data-driven code quality improvements and to allow
> reviewers to focus on the conceptual part of the changes by helping them
> spot trivial code smells, security issues and bugs.
>
> SonarCloud has already been adopted and integrated into a few top Apache
> projects like DolphinScheduler  and 
> Apache
> Jackrabbit FileVault .
>
> For those who don't know, Sonar is a code analysis tool, the initial
> adoption would aim at tracking code quality for the master branch, and
> making the PRs' review process easier, by allowing to compare which
> code/security issues a PR solved/introduced with respect to the main branch.
>
> We already have a Hive-dedicated project under the Apache foundation's
> SonarCloud account: https://sonarcloud.io/project/overview?id=apache_hive.
>
> In what follows I will highlight the main points of interest:
>
> 1) sonar adoption scope:
> For the time being a descriptive approach (just show the analysis and
> associated metrics) could be adopted, delaying a prescriptive one (i.e.,
> quality gates based on the metrics for PRs' mergeability) to a later time
> where we have tested SonarCloud for long enough to judge that it could be a
> sensible move.
>
> 2) false positives:
> Sonar suffers from false positives, but they can be marked as such from
> the web UI: (source https://docs.sonarqube.org/latest/faq/#header-1)
>
> How do I get rid of issues that are False-Positives?
>> False-Positive and Won't Fix
>> You can mark individual issues False Positive or Won't Fix through the
>> issues interface. If you're using PR analysis provided by the Developer
>> Edition, issues marked False Positive or Won't Fix will retain that status
>> after merge. This is the preferred approach.
>
>
>> //NOSONAR
>> For most languages, SonarQube supports the use of the generic mechanism:
>> //NOSONAR at the end of the line of the issue. This will suppress all
>> issues - now and in the future - that might be raised on the line.
>
>
> For the time being, I think that marking false positives via the UI is
> more convenient than using "//NOSONAR", but this can be discussed further.
>
> 3) test code coverage:
>
> Due to the specific structure of the ptest infra (split execution and
> other peculiarities), we are not yet supporting test code coverage, this
> can be added at a later stage, in the meantime all the code quality and
> security metrics are available.
>
> 4) what will be analyzed:
>
> the master branch and each open PR
>
> 5) integration with github:
>
> SonarCloud integrates with GitHub in two ways, the first one is an
> additional item in the list of checks (where you have the spell checking,
> CI result etc.) that will just say Passed/Not Passed and provide a link for
> all the details, the second is a "summary" comment under the PR
> highlighting the main info (you can see an example here
> ).
>
> The second integration can be disabled if we consider that the first one
> is enough, and that if we want to dig more we can open the associated link
> for the full analysis in SonarCloud.
>
> 6) analysis runtime:
>
> In CI the full analysis takes around 30 minutes, but this step is executed
> in parallel with the test split tasks and won't add to the total runtime.
> For PRs SonarCloud detects unchanged files and avoids analysing them, so
> the runtime there is expected to be lower.
>
> I'd like to hear your thoughts on this, and I am looking for reviewers for
> the PR https://github.com/apache/hive/pull/3339.
>
> Best regards,
> Alessandro
>


[DISCUSS] SonarCloud integration for Apache Hive

2022-08-08 Thread Alessandro Solimando
Hi community,
in the context of HIVE-26196 we started considering
the adoption of SonarCloud analysis for
Apache Hive to promote data-driven code quality improvements and to allow
reviewers to focus on the conceptual part of the changes by helping them
spot trivial code smells, security issues and bugs.

SonarCloud has already been adopted and integrated into a few top Apache
projects like DolphinScheduler and Apache Jackrabbit FileVault.

For those who don't know, Sonar is a code analysis tool. The initial
adoption would aim at tracking code quality for the master branch and at
making the PR review process easier, by making it possible to compare which
code/security issues a PR solved or introduced with respect to the main branch.

We already have a Hive-dedicated project under the Apache Foundation's
SonarCloud account: https://sonarcloud.io/project/overview?id=apache_hive.

In what follows I will highlight the main points of interest:

1) Sonar adoption scope:
For the time being a descriptive approach (just show the analysis and
associated metrics) could be adopted, delaying a prescriptive one (i.e.,
quality gates based on the metrics for PRs' mergeability) to a later time,
once we have tested SonarCloud for long enough to judge that it would be a
sensible move.

2) false positives:
Sonar suffers from false positives, but they can be marked as such from the
web UI (source: https://docs.sonarqube.org/latest/faq/#header-1):

How do I get rid of issues that are False-Positives?
> False-Positive and Won't Fix
> You can mark individual issues False Positive or Won't Fix through the
> issues interface. If you're using PR analysis provided by the Developer
> Edition, issues marked False Positive or Won't Fix will retain that status
> after merge. This is the preferred approach.


> //NOSONAR
> For most languages, SonarQube supports the use of the generic mechanism:
> //NOSONAR at the end of the line of the issue. This will suppress all
> issues - now and in the future - that might be raised on the line.


For the time being, I think that marking false positives via the UI is more
convenient than using "//NOSONAR", but this can be discussed further.

3) test code coverage:

Due to the specific structure of the ptest infra (split execution and other
peculiarities), we are not yet supporting test code coverage; this can be
added at a later stage. In the meantime, all the code quality and security
metrics are available.

4) what will be analyzed:

The master branch and each open PR.

5) integration with GitHub:

SonarCloud integrates with GitHub in two ways. The first is an additional
item in the list of checks (where you have the spell checking, CI result,
etc.) that just says Passed/Not Passed and provides a link to all the
details; the second is a "summary" comment under the PR highlighting the
main info (you can see an example here).

The second integration can be disabled if we consider that the first one is
enough, and that, if we want to dig deeper, we can open the associated link
to the full analysis in SonarCloud.

6) analysis runtime:

In CI the full analysis takes around 30 minutes, but this step is executed
in parallel with the test split tasks and won't add to the total runtime.
For PRs, SonarCloud detects unchanged files and avoids analysing them, so
the runtime there is expected to be lower.

I'd like to hear your thoughts on this, and I am looking for reviewers for
the PR https://github.com/apache/hive/pull/3339.

Best regards,
Alessandro