Re: [apache/commons-csv] Run failed: Scorecards supply-chain security - master (481d8b1)

2022-10-23 Thread Gary Gregory
Fixed.

Gary

On Sun, Oct 23, 2022, 14:36 Gary Gregory  wrote:

> Note that these types of failures are because I am trying to update this
> GH action which requires some debugging with Apache Infra.
>
> See https://issues.apache.org/jira/browse/INFRA-23811
>
> Gary
>
>
> On Sun, Oct 23, 2022, 14:34 Gary Gregory  wrote:
>
>>
>> [image: GitHub] [apache/commons-csv] Scorecards supply-chain security
>> workflow run, Attempt #2
>>
>>   Scorecards supply-chain security, Attempt #2: All jobs have failed
>>
>> View workflow run
>> 
>>
>> [image: Scorecards analysis]
>>
>> *Scorecards supply-chain security, Attempt #2* / Scorecards analysis
>> Failed in 5 minutes and 41 seconds
>>
>>
>>
>> —
>> You are receiving this because you are subscribed to this thread.
>> Manage your GitHub Actions notifications
>> 
>>
>>
>> GitHub, Inc. ・88 Colin P Kelly Jr Street ・San Francisco, CA 94107
>>
>>
>


[csv] validation of duplicate headers (was [VOTE] Release Apache Commons CSV 1.10.0 based on RC1)

2022-10-23 Thread Alex Herbert
On Sun, 23 Oct 2022 at 14:09, Gary D. Gregory  wrote:
>
> Hi All, Alex, more below:
>
> On 2022/10/22 21:23:13 Alex Herbert wrote:
> > On Sat, 22 Oct 2022 at 20:05, Gary D. Gregory  wrote:
> > >
> > > Thank you for the new tests Alex!
> > >
> > > Here is one area that is easy to overlook: As Commons CSV has evolved, 
> > > _not all settings in CSVFormat_ apply to both writing and parsing. To 
> > > wit, the existing Javadoc for CSVFormat.getAllowMissingColumnNames(): 
> > > "Gets whether missing column names are allowed when parsing the header 
> > > line."
> > >
> > > I've updated the Javadoc for 
> > > CSVFormat.Builder.setAllowMissingColumnNames(boolean) to state this is a 
> > > parser setting.
> > >
> > > I've also updated the test since the commented-out test data is valid for 
> > > parsing.
> > >
> > > In git master now.
> >
> > OK, thanks for the explanation. If the CSVFormat ignores the
> > allowEmptyColumnsNames in the validate() method it should act as if
> > the setting is true, i.e. only the DuplicateHeaderMode should be used
> > to validate the header.
>
> That's how it is now, allowEmptyColumnsNames cannot be considered in the 
> validate method because a CSVFormat does not know if it is going to be used 
> for parsing or writing.
>
> So the test can be updated to stream the full
> > set of cases (reinstating those you commented out) and filter to those
> > where the flag is set to true, e.g. using:
>
> I already had added a new test data provider method to do that so, all I had 
> to do now is make sure that when I deleted the commented out test elements, 
> these were covered in the 2nd data provider called 
> duplicateHeaderParseOnlyData().
>
> >
> > ---
> >
> > static Stream duplicateHeaderAllowsMissingColumnsNamesData() {
> > return duplicateHeaderData().filter(arg ->
> > Boolean.TRUE.equals(arg.get()[1]));
> > }
>
> I did not do this because it seemed (to me) clearer (less magical) to create 
> a duplicateHeaderParseOnlyData() and list explicitly test cases there. WDYT?

I prefer to keep all the parser test data in one method. For the
CSVFormat data, it can be reused and filtered based on the
allowMissingColumnNames flag. The CSVFormat 'should' behave like the
CSVParser when allowMissingColumnNames is true. It makes sense for the
CSVFormat test cases to be a subset of the CSVParser. Note this may
not always be the case. However with the current code this is mostly
true (see below) and it simplifies adding all the test cases.

I added some more test cases and have found a bug and some more issues.

Bugs

This errors:

CSVFormat:
Arguments.of(DuplicateHeaderMode.DISALLOW,true,  new String[]
{"A", ""}, true),

There is no duplicate, so nothing should be triggered. But it triggers
on the first empty header. I have fixed this in git master to sanitise
any blank as the empty string "" before checking for duplicates. Thus
only a second duplicate empty name will throw. However the names are
only sanitised for validate(). The original names are stored as is,
thus the CSVFormat.getHeader() will return the original blank names,
for example using two different blanks as {" ", " "}.

Issues:

In CSVParser using the isBlank method labels an empty header. But
headers are added to the map raw (without trim()). Thus this errors:

CSVParser:
Arguments.of(DuplicateHeaderMode.DISALLOW,true,  new String[] {"
", "   "}, false),

When the blank strings are different lengths they are both identified
as empty. However they are checked in the duplicates map using
different strings, thus no duplicate is identified, and no exception
raised for a duplicate blank header. Thus at present the CSVFormat and
CSVParser have a different interpretation of what is a duplicate empty
header.

I think that CSVParser should be updated to sanitise all non-null
blank strings to the empty string with regard to duplicates. But this
has to be handled differently to the CSVFormat which keeps a throw
away Set of headers. The CSVParser builds a map using the
current headers. To avoid changing the blank headers to "" would
require keeping a flag to mark an observation of a blank header. The
existing header map functionality can then be left unchanged. This
allows the headers from getHeaders() to contain different blank
strings after parsing.

At present the CSVFormat treats null names and blank names as empty.
So you cannot have an input of {"", null} with DISALLOW. But this is
allowed for CSVParser. That currently ignores null names from
duplicate checking. This is different behaviour from CSVFormat.

I have not made any more changes as I wanted to have a second opinion.
Presently all the test cases in the CSVDuplicateHeaderTest pass for
CSVFormat when allowEmptyColumnNames=true. The commented out cases
have a different result in CSVParser as it ignores duplicates of null
and blank.

I believe the CSVParser handling of null headers is simply because it
uses a Map to map column names to a column index.
This will not 

Re: [apache/commons-csv] Run failed: Scorecards supply-chain security - master (481d8b1)

2022-10-23 Thread Gary Gregory
Note that these types of failures are because I am trying to update this GH
action which requires some debugging with Apache Infra.

See https://issues.apache.org/jira/browse/INFRA-23811

Gary


On Sun, Oct 23, 2022, 14:34 Gary Gregory  wrote:

>
> [image: GitHub] [apache/commons-csv] Scorecards supply-chain security
> workflow run, Attempt #2
>
>   Scorecards supply-chain security, Attempt #2: All jobs have failed
>
> View workflow run
> 
>
> [image: Scorecards analysis]
>
> *Scorecards supply-chain security, Attempt #2* / Scorecards analysis
> Failed in 5 minutes and 41 seconds
>
>
>
> —
> You are receiving this because you are subscribed to this thread.
> Manage your GitHub Actions notifications
> 
>
>
> GitHub, Inc. ・88 Colin P Kelly Jr Street ・San Francisco, CA 94107
>
>


[VOTE] Release Apache Commons BCEL 6.6.1 based on RC1

2022-10-23 Thread Gary Gregory
We have fixed one bug since Apache Commons BCEL 6.6.0 was released, so
I would like to release Apache Commons BCEL 6.6.1. This will help
SpotBugs migrate from 6.5.0.

Apache Commons BCEL 6.6.1 RC1 is available for review here:
https://dist.apache.org/repos/dist/dev/commons/bcel/6.6.1-RC1 (svn
revision 57519)

The Git tag commons-bcel-6.6.1-RC1 commit for this RC is
ec208052baf4c596f7e8cfacc281d3e2408809ac which you can browse here:

https://gitbox.apache.org/repos/asf?p=commons-bcel.git;a=commit;h=ec208052baf4c596f7e8cfacc281d3e2408809ac
You may checkout this tag using:
git clone https://gitbox.apache.org/repos/asf/commons-bcel.git
--branch commons-bcel-6.6.1-RC1 commons-bcel-6.6.1-RC1

Maven artifacts are here:

https://repository.apache.org/content/repositories/orgapachecommons-1602/org/apache/bcel/bcel/6.6.1/

These are the artifacts and their hashes:

#Release SHA-512s
#Sun Oct 23 10:44:36 EDT 2022
Apache\ Commons\
BCEL-6.6.1.spdx.rdf.xml=f07a9526a013c6c700e84005f45eb3f582b105756486b665b084be1accf147179e22db752c370dd2cf2c779a2c9188b2fec431fc7de56f0cfc355c26ea25d2fe
bcel-6.6.1-bin.tar.gz=4358ab0a7103e33bf4d0774cdeab11c4b32a7ef65944e08602b6a5c2da7ac1e8f04edc1603b458381e29c9341c32f1679521bb544eb4a081c351d82b646d6024
bcel-6.6.1-bin.zip=1acd0c776c06d50e7f2a4c5d6f7ea90374cdb9f0d6ddc994eb5d8a6eb3632ac10327167b372567456b0b83c0ebd99652f0f69c39e3ba57d31b87b32063eb223d
bcel-6.6.1-bom.json=e51e003589f1de2e73d1fbc3dfdff6f30e85df2aca036b9bcdf5dffe5eb6915adbbb8ab72aed63ff5dcd32b9db51316a088c2b89e8ad54657170c21feae404a8
bcel-6.6.1-bom.xml=f00ddbf5a93d88209e781a71de7067897005e36ca88d5be10f105af87f8ae7cabe07e6373f166def2de4c7d72b30e25e8f20a6f17827260754ed870480647049
bcel-6.6.1-javadoc.jar=35d2dc3daaa7adefd17561f4dc896dbdce5fb0530c8d867fc8178bdab8c9d21afd27b4429b9650281d8692a62039739795b642e9ff5c317a73b1089661aab491
bcel-6.6.1-sources.jar=3ecf109550cd8837dba586570e33c73fa81efccc69e2b68cc03b53756947e4c9f95575e77b751181cf934f20f5e078654f343400b8184ab414e89b316f7f33f4
bcel-6.6.1-src.tar.gz=bb4283d1ebcdeaef7f3530826f16681cad7a53508056178d63621f7a17321df3bbc11914e75dd7b81f967723c8b6bc4009853dc80bb4707de2f5f6db26eb8f7a
bcel-6.6.1-src.zip=a62790a65ecb2f69f260f683d9f75a3a4ef98abfbec5a37efbe10ad7d5fd9c33771ff5cba305c3f18c336f8495a1ce5c5ab43ecb9124973e2728742d634b6288
bcel-6.6.1-test-sources.jar=0f4dd1c7652b564b1343cc2731118c3575d351022f2811d3049f9eccda77b9fb05779691996ddb6396717fa75699e8a407f5c654bf75d5a7aca50bd1a2e9d95f
bcel-6.6.1-tests.jar=1a06e51f00d8de83c4128ddfaf3a318a2afca8e5e67739cfb05956d4c4fce02028d3a93ca3288fc494b46ca08d6e707544b5cc5401c12396cde4a0dc25d49136

I have tested this with 'mvn' (the default goal) and 'mvn -V
-Duser.name=$my_apache_id -Prelease -Ptest-deploy -P jacoco -P japicmp
clean package site deploy' using:

Darwin  21.6.0 Darwin Kernel Version 21.6.0: Mon Aug 22 20:17:10
PDT 2022; root:xnu-8020.140.49~2/RELEASE_X86_64 x86_64
Apache Maven 3.8.6 (84538c9988a25aec085021c365c560670ad80f63)
Maven home: /usr/local/Cellar/maven/3.8.6/libexec
Java version: 1.8.0_345, vendor: Homebrew, runtime:
/usr/local/Cellar/openjdk@8/1.8.0+345/libexec/openjdk.jdk/Contents/Home/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "mac os x", version: "12.6", arch: "x86_64", family: "mac"

Details of changes since 6.6.0 are in the release notes:

https://dist.apache.org/repos/dist/dev/commons/bcel/6.6.1-RC1/RELEASE-NOTES.txt

https://dist.apache.org/repos/dist/dev/commons/bcel/6.6.1-RC1/site/changes-report.html

Site:

https://dist.apache.org/repos/dist/dev/commons/bcel/6.6.1-RC1/site/index.html
(note some *relative* links are broken and the 6.6.1 directories
are not yet created - these will be OK once the site is deployed.)

JApiCmp Report (compared to 6.6.0):

https://dist.apache.org/repos/dist/dev/commons/bcel/6.6.1-RC1/site/japicmp.html

RAT Report:

https://dist.apache.org/repos/dist/dev/commons/bcel/6.6.1-RC1/site/rat-report.html

KEYS:
  https://downloads.apache.org/commons/KEYS

Please review the release candidate and vote.
This vote will close no sooner than 72 hours from now.

  [ ] +1 Release these artifacts
  [ ] +0 OK, but...
  [ ] -0 OK, but really should fix...
  [ ] -1 I oppose this release because...

Thank you,

Gary Gregory,
Release Manager (using key 86fdc7e2a11262cb)

For following is intended as a helper and refresher for reviewers.

Validating a release candidate
==

These guidelines are NOT complete.

Requirements: Git, Java, Maven.

You can validate a release from a release candidate (RC) tag as follows.

1) Clone and checkout the RC tag

git clone https://gitbox.apache.org/repos/asf/commons-bcel.git
--branch commons-bcel-6.6.1-RC1 commons-bcel-6.6.1-RC1
cd commons-bcel-6.6.1-RC1

2) Check Apache licenses

This step is not required if the site includes a RAT report page which
you then must check.

mvn apache-rat:check

3) Check binary compatibility

Older components still use Apache Clirr:

This step is not required if 

Re: [VOTE] Release Apache Commons CSV 1.10.0 based on RC1

2022-10-23 Thread Gary D. Gregory
Hi All, Alex, more below:

On 2022/10/22 21:23:13 Alex Herbert wrote:
> On Sat, 22 Oct 2022 at 20:05, Gary D. Gregory  wrote:
> >
> > Thank you for the new tests Alex!
> >
> > Here is one area that is easy to overlook: As Commons CSV has evolved, _not 
> > all settings in CSVFormat_ apply to both writing and parsing. To wit, the 
> > existing Javadoc for CSVFormat.getAllowMissingColumnNames(): "Gets whether 
> > missing column names are allowed when parsing the header line."
> >
> > I've updated the Javadoc for 
> > CSVFormat.Builder.setAllowMissingColumnNames(boolean) to state this is a 
> > parser setting.
> >
> > I've also updated the test since the commented-out test data is valid for 
> > parsing.
> >
> > In git master now.
> 
> OK, thanks for the explanation. If the CSVFormat ignores the
> allowEmptyColumnsNames in the validate() method it should act as if
> the setting is true, i.e. only the DuplicateHeaderMode should be used
> to validate the header. 

That's how it is now, allowEmptyColumnsNames cannot be considered in the 
validate method because a CSVFormat does not know if it is going to be used for 
parsing or writing.

So the test can be updated to stream the full
> set of cases (reinstating those you commented out) and filter to those
> where the flag is set to true, e.g. using:

I already had added a new test data provider method to do that so, all I had to 
do now is make sure that when I deleted the commented out test elements, these 
were covered in the 2nd data provider called duplicateHeaderParseOnlyData().

> 
> ---
> 
> static Stream duplicateHeaderAllowsMissingColumnsNamesData() {
> return duplicateHeaderData().filter(arg ->
> Boolean.TRUE.equals(arg.get()[1]));
> }

I did not do this because it seemed (to me) clearer (less magical) to create a 
duplicateHeaderParseOnlyData() and list explicitly test cases there. WDYT?

> 
> @ParameterizedTest
> @MethodSource(value = {"duplicateHeaderAllowsMissingColumnsNamesData"})
> public void testCSVFormat(final DuplicateHeaderMode duplicateHeaderMode, ...
> 
> ---
> 
> This finds the following failure:
> 
> Arguments.of(DuplicateHeaderMode.ALLOW_EMPTY, true,  new String[] {"
> ", " "}, true),
> 
> This is due to the difference in the CSVParser and CSVFormat in the
> use of the trim() method to detect an empty header:
> 
> CSVParser:
> final boolean emptyHeader = header == null || header.trim().isEmpty();
> 
> CSVFormat:
> final boolean empty = header == null || header.isEmpty();
> 
> This means that the definition of 'empty' is different for the
> CSVFormat and the CSVParser. Should this be corrected in CSVFormat to
> use trim()?

I refactored the logic in CSVFormat#isBlank(String) where the "blank" concept 
vs. "empty" is the same as in Apache Commons Lang where "empty" means size 0 
and "blank" means "trims to size 0"; see 
org.apache.commons.lang3.StringUtils.isBlank(CharSequence).

> 
> If so then I can push the updated test and fix to master. That would
> remove the requirement to comment out any of the test cases and moving
> those to a different test stream method.

Ah, well, let's have you review git master now and feel free to refactor. I 
think we are close if not done for another RC. WDYT?

TY!
Gary

> 
> Alex
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
> 
> 

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org