Re: [VOTE] Release Apache Commons CSV 1.10.0 based on RC1

2022-10-21 Thread sman81
Hi Gary,

the PR did not have conflicts 30 minutes ago. So we must have encountered a 
race condition between you updating master and me rebasing and force-pushing my 
pr.
I've done that again just now. There are no conflicts.
At the same time I see parts of my PR have dribbled into master already, so if 
you rather implement the changes yourself, please go ahead. I am fine with 
closing my pr.

regards,
Markus


From: Gary Gregory 
Sent: Friday, October 21, 2022 15:17
To: Commons Developers List 
Subject: Re: [VOTE] Release Apache Commons CSV 1.10.0 based on RC1 
 
Hi Markus,

The PR has conflicts and does not test its changes.

I'd like feedback from the community on whether or not git master as it is
now is OK for the duplicate header behavior or if changing the default is
needed and if doing so is just ping-pongonging from one incompatibility to
another, based on previous versions.

TY!

Gary

On Fri, Oct 21, 2022, 08:52  wrote:

> I've removed serialization stuff from the PR and rebased it to master
> https://github.com/apache/commons-csv/pull/276
>
> kind regards,
> Markus
>
> From: Gary D. Gregory 
> Sent: Friday, October 21, 2022 14:18
> To: dev@commons.apache.org 
> Subject: Re: [VOTE] Release Apache Commons CSV 1.10.0 based on RC1
>
> On 2022/10/20 22:56:05 Alex Herbert wrote:
> > On Thu, 20 Oct 2022 at 23:43, Alex Herbert 
> wrote:
> > >
> > > I did not have time to track through whether this behaviour changed
> > > after the initial implementation of the flag. I would think not as the
> > > original behaviour is from 1.0. This would map to:
> > >
> > > true -> ALLOW_ALL
> > > false -> ALLOW_EMPTY
> > > new -> DISALLOW
> > >
> > > Which is what we currently have in 1.10.0 RC1. Thus the PR #276 [7] to
> > > change the use of the flag to 'false -> DISALLOW' is not maintaining
> > > behavioural compatibility (to 1.7, or back to 1.0).
> >
> > PS. I just verified that PR 276 changes the DuplicateHeaderMode value
> > for allowDuplicates=false and does not change any tests.
> >
> > So the test suite is currently not enforcing behavioural
> > compatibility. This seems like a glaring hole in the tests and should
> > be addressed to prevent regressions.
>
> In git master, there are many tests for senders of
> withAllowDuplicateHeaderNames(boolean) and
> setAllowDuplicateHeaderNames(boolean).
>
> I filled in the coverage in testGetAllowDuplicateHeaderNames() for
> getAllowDuplicateHeaderNames() to reflect the _current_ behavior, right or
> wrong.
>
> Let's reassess now git master now please. If the PR matters still, it
> should be rebased on git master.
>
> Gary
>
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> > For additional commands, e-mail: dev-h...@commons.apache.org
> >
> >
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
>
>
-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



Re: [VOTE] Release Apache Commons CSV 1.10.0 based on RC1

2022-10-21 Thread sman81
I've removed serialization stuff from the PR and rebased it to master
https://github.com/apache/commons-csv/pull/276

kind regards,
Markus

From: Gary D. Gregory 
Sent: Friday, October 21, 2022 14:18
To: dev@commons.apache.org 
Subject: Re: [VOTE] Release Apache Commons CSV 1.10.0 based on RC1 
 
On 2022/10/20 22:56:05 Alex Herbert wrote:
> On Thu, 20 Oct 2022 at 23:43, Alex Herbert  wrote:
> >
> > I did not have time to track through whether this behaviour changed
> > after the initial implementation of the flag. I would think not as the
> > original behaviour is from 1.0. This would map to:
> >
> > true -> ALLOW_ALL
> > false -> ALLOW_EMPTY
> > new -> DISALLOW
> >
> > Which is what we currently have in 1.10.0 RC1. Thus the PR #276 [7] to
> > change the use of the flag to 'false -> DISALLOW' is not maintaining
> > behavioural compatibility (to 1.7, or back to 1.0).
> 
> PS. I just verified that PR 276 changes the DuplicateHeaderMode value
> for allowDuplicates=false and does not change any tests.
> 
> So the test suite is currently not enforcing behavioural
> compatibility. This seems like a glaring hole in the tests and should
> be addressed to prevent regressions.

In git master, there are many tests for senders of 
withAllowDuplicateHeaderNames(boolean) and 
setAllowDuplicateHeaderNames(boolean). 

I filled in the coverage in testGetAllowDuplicateHeaderNames() for 
getAllowDuplicateHeaderNames() to reflect the _current_ behavior, right or 
wrong. 

Let's reassess now git master now please. If the PR matters still, it should be 
rebased on git master.

Gary

> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
> 
> 

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



Re: [VOTE] Release Apache Commons CSV 1.10.0 based on RC1

2022-10-21 Thread sman81
> Would't it be simpler to deal with the serialization issue by bumping the 
> serialVersionID?
simpler yes, but it's a different thing
The PR makes the serialized forms for commons-csv versions 1.9.0 and 1.10.0 
compatible.

Given that serialization has been broken for several versions in commons-csv 
and given that "fixing" it now is a labor of love which does not add much value,
I am in favor of throwing it out altogether now rather than in some future 
(major) version. Clearly inform about it in the release notes and be done with 
it.

> Also note the PR will throw an NPE in the builder
> when instead of using the validate() method.

setDuplicateHeaderMode(null) is illegal and should be fail-fast IMO

kind regards,
Markus

From: Gary Gregory 
Sent: Thursday, October 20, 2022 16:43
To: Commons Developers List 
Subject: Re: [VOTE] Release Apache Commons CSV 1.10.0 based on RC1 
 
Would't it be simpler to deal with the serialization issue by bumping the
serialVersionID? We can just say that you only serialized and deserialize
for the same version. Also note the PR will throw an NPE in the builder
when instead of using the validate() method.

Gary

On Wed, Oct 19, 2022, 18:27 Gary D. Gregory  wrote:

> I've commented on the PR.
> TY.
> Gary
>
> On 2022/10/19 16:51:57 Gary Gregory wrote:
> > On Wed, Oct 19, 2022 at 10:01 AM Alex Herbert 
> wrote:
> > >
> > > On Wed, 19 Oct 2022 at 14:57, Gary D. Gregory 
> wrote:
> > > >
> > > > My +1
> > > >
> > > > Gary
> > >
> > > Gary,
> > >
> > > PR #276 highlights a behavioural compatibility error in the 1.10.0 RC1.
> > >
> > > AllowDuplicates enum may be set to the incorrect value when setting
> > > the allow duplicates boolean. Have you reviewed this? I believe it is
> > > valid.
> >
> > I will re-read later tonight...
> >
> > Gary
> >
> > >
> > > Alex
> > >
> > > -
> > > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> > > For additional commands, e-mail: dev-h...@commons.apache.org
> > >
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> > For additional commands, e-mail: dev-h...@commons.apache.org
> >
> >
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
>
>
-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



Re: [VOTE] Release Apache Commons CSV 1.10.0 based on RC1

2022-10-19 Thread sman81
> You have added test data for CSVFormat for 1.7 and 1.8 and these do
not work (commented out). I take it this means serialization has been
broken since the CSVFormat.delimiter was changed from char to String
in 1.9.0.

That's correct, Alex. I added the comments for documentation. Should we decide 
to fix serialization from version 1.7 and version 1.8 upwards, the test and 
test data can be enabled.

> So given serialization was already broken in the last
release does it make any sense to fix it for 1.9.0 to 1.10.0 for the new field?

According to the number of Maven central downloads versions 1.8 and 1.9.0 are 
the most popular of commons-csv.
So I think there is merit in fixing serialization issues from 1.9.0 to 1.10.0 
Besides the PR already offers the fix, why abandon it?

Will PR #276 be part of version 1.10.0?
Do we have a volunteer to review and merge it?

regards,
Markus



From: Alex Herbert 
Sent: Monday, October 17, 2022 19:53
To: Commons Developers List 
Subject: Re: [VOTE] Release Apache Commons CSV 1.10.0 based on RC1

On Mon, 17 Oct 2022 at 17:11,  wrote:
>
> Hello
>
> > This is the logic from the current builder:
> > DuplicateHeaderMode mode = allowDuplicateHeaderNames ?
> > DuplicateHeaderMode.ALLOW_ALL : DuplicateHeaderMode.ALLOW_EMPTY
>
> This is true only if Builder.setAllowDuplicateHeaderNames is actually called, 
> which is at the library user's discretion.

I was only discussing the deserialization path and the logic that
would be required in custom deserialization code to support backwards
compatibility. However I do not think this library is supporting
Serializable going forward. It is already known to be broken for
CSVRecord and now also CSVFormat.

>
> Otherwise CSVFormat.Builder.duplicateHeaderMode remains null.
> Also, the method Builder.setDuplicateHeaderMode(DuplicateHeaderMode) accepts 
> null. A non-null parameter should be enforced.
>
> Besides, shouldn't the above statement be:
> allowDuplicateHeaderNames ? ALLOW_ALL : DISALLOW
> ?
>
> duplicateHeaderMode should default to DISALLOW for backward compatibility.

Duplicate header support was added in 1.7. Before that I do not know
what happened without checking the old code. It may have thrown an
exception or silently ignored it. I would have to track down the Jira
ticket for when the feature was added.

>
> Found another small bug in setNullString where member quotedNullString was 
> inconsistently written (missing write in setQuote):
>
> this.quotedNullString = quoteCharacter + nullString + 
> quoteCharacter;
>
> All of the above in pull request 
> https://github.com/apache/commons-csv/pull/276
>
> @Alex would you review my PR please?

Sure.

You have added test data for CSVFormat for 1.7 and 1.8 and these do
not work (commented out). I take it this means serialization has been
broken since the CSVFormat.delimiter was changed from char to String
in 1.9.0. So given serialization was already broken in the last
release does it make any sense to fix it for 1.9.0 to 1.10.0 for the
new field?

I agree that the quotedNullString is inconsistently handled.

Regards,

Alex

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



Re: [VOTE] Release Apache Commons CSV 1.10.0 based on RC1

2022-10-17 Thread sman81
Hello

> This is the logic from the current builder:
> DuplicateHeaderMode mode = allowDuplicateHeaderNames ?
> DuplicateHeaderMode.ALLOW_ALL : DuplicateHeaderMode.ALLOW_EMPTY

This is true only if Builder.setAllowDuplicateHeaderNames is actually called, 
which is at the library user's discretion.

Otherwise CSVFormat.Builder.duplicateHeaderMode remains null.
Also, the method Builder.setDuplicateHeaderMode(DuplicateHeaderMode) accepts 
null. A non-null parameter should be enforced.

Besides, shouldn't the above statement be:
allowDuplicateHeaderNames ? ALLOW_ALL : DISALLOW
?

duplicateHeaderMode should default to DISALLOW for backward compatibility.

Found another small bug in setNullString where member quotedNullString was 
inconsistently written (missing write in setQuote):

this.quotedNullString = quoteCharacter + nullString + 
quoteCharacter;

All of the above in pull request https://github.com/apache/commons-csv/pull/276

@Alex would you review my PR please?

Kind regards,
Markus


From: Alex Herbert 
Sent: Monday, October 17, 2022 13:46
To: Commons Developers List 
Subject: Re: [VOTE] Release Apache Commons CSV 1.10.0 based on RC1

On Mon, 17 Oct 2022 at 11:00,  wrote:
>
> Hello
>
> CSV-264 (Add DuplicateHeaderMode) introduces bugs that should be fixed before 
> shipping 1.10.0 IMO
> - missing default
> - broken serialization of class CSVFormat
>
> I raised these issues in CSV-302.
>
> The serialization issue is caught by Revapi. I had suggested to include 
> Revapi in the project (CSV-303) or alternatively add revapi and drop clirr in 
> commons-parent but that more or less fell on deaf ears.
>
> mvn org.revapi:revapi-maven-plugin:check
>
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time:  3.560 s
> [INFO] Finished at: 2022-10-17T11:30:21+02:00
> [INFO] 
> 
> [ERROR] Failed to execute goal org.revapi:revapi-maven-plugin:0.14.7:check 
> (default-cli) on project commons-csv: The following API problems caused the 
> build to fail:
> [ERROR] java.field.serialVersionUIDUnchanged: field 
> org.apache.commons.csv.CSVFormat.serialVersionUID: The class changed in an 
> incompatible way with regards to serialization but the serialVersionUID field 
> stayed unchanged. This might be ok and/or desired but is suspicious. 
> https://revapi.org/revapi-java/differences.html#java.field.serialVersionUIDUnchanged

This is an allowed source and binary compatible error; it has
potentially breaking semantic severity which do apply here.

The CSVFormat object is deserialized using the defaultReadObject
method. This will be able to deserialize older versions of CSVFormat.
The new fields added in this version will be absent from the
ObjectInputStream and set to their default initialised value.

IIUC this is the change:

remove: private boolean allowDuplicateHeaderNames
add: private DuplicateHeaderMode duplicateHeaderMode

Deserialization would mean that the new field 'duplicateHeaderMode' is
set to null and the previous field 'allowDuplicateHeaderNames' is
ignored. To fix this would require setting duplicateHeaderMode
appropriately using the field allowDuplicateHeaderName. This is the
logic from the current builder:

DuplicateHeaderMode mode = allowDuplicateHeaderNames ?
DuplicateHeaderMode.ALLOW_ALL : DuplicateHeaderMode.ALLOW_EMPTY

However, serialization was broken previously in CSV. This was for the
CSVRecord. The release notes contain this in the description for 1.8:

"This release fixes serialization compatibility of CSVRecord with
versions 1.0 to 1.6. New fields added since 1.7 are not serialized.
Support for Serializable is scheduled to be removed in version 2.0."

The Jira ticket is 248:
https://issues.apache.org/jira/browse/CSV-248

In that case we made sure we could (de)serialize without throwing an
exception to minimise errors for downstream users. But we did not fix
behavioural compatibility across versions. In this instance I do not
think exceptions need to be addressed as missing fields will be
ignored and the new field will be initialized as null. However I have
not verified this with a serialization test. If this new field is null
then the format will behave as if duplicates is set to DISALLOW.

I do not know the user case for serializing a CSVFormat. It would not
be within the library. A reference is kept in CSVParser and CSVPrinter
but these are not Serializable.

The serialization support dates back to the legacy codebase used to
create the project. Serialization can be replaced by other more
configurable libraries to read/write binary objects. Perhaps we just
need to add to the release notes that CSVFormat serialization is no
longer supported.


> [ERROR] java.method.exception.checkedRemoved: method 
> java.util.List 
> 

Re: [VOTE] Release Apache Commons CSV 1.10.0 based on RC1

2022-10-17 Thread sman81
Hello

CSV-264 (Add DuplicateHeaderMode) introduces bugs that should be fixed before 
shipping 1.10.0 IMO
- missing default
- broken serialization of class CSVFormat

I raised these issues in CSV-302.

The serialization issue is caught by Revapi. I had suggested to include Revapi 
in the project (CSV-303) or alternatively add revapi and drop clirr in 
commons-parent but that more or less fell on deaf ears.

mvn org.revapi:revapi-maven-plugin:check

[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time:  3.560 s
[INFO] Finished at: 2022-10-17T11:30:21+02:00
[INFO] 
[ERROR] Failed to execute goal org.revapi:revapi-maven-plugin:0.14.7:check 
(default-cli) on project commons-csv: The following API problems caused the 
build to fail:
[ERROR] java.field.serialVersionUIDUnchanged: field 
org.apache.commons.csv.CSVFormat.serialVersionUID: The class changed in an 
incompatible way with regards to serialization but the serialVersionUID field 
stayed unchanged. This might be ok and/or desired but is suspicious. 
https://revapi.org/revapi-java/differences.html#java.field.serialVersionUIDUnchanged
[ERROR] java.method.exception.checkedRemoved: method 
java.util.List 
org.apache.commons.csv.CSVParser::getRecords(): Method no longer throws checked 
exceptions: java.io.IOException. 
https://revapi.org/revapi-java/differences.html#java.method.exception.checkedRemoved

Also noticed that class DuplicateHeaderMode and method 
CSVFormat.getDuplicateHeaderMode() are incorrectly annotated with "@since 
1.9.0". Both are new in version 1.10.0.

My vote is -1 assuming I am allowed to vote.

Kind regards,
Markus

From: Gary Gregory 
Sent: Sunday, October 16, 2022 14:48
To: Commons Developers List 
Subject: [VOTE] Release Apache Commons CSV 1.10.0 based on RC1 
 
To: dev@commons.apache.org
We have fixed a few bugs and added some enhancements since Apache
Commons CSV 1.9.0 was released, so I would like to release Apache
Commons CSV 1.10.0.

Apache Commons CSV 1.10.0 RC1 is available for review here:
    https://dist.apache.org/repos/dist/dev/commons/csv/1.10.0-RC1 (svn
revision 57404)

The Git tag commons-csv-1.10.0-RC1 commit for this RC is
1bd1fd8e6065da9d07b5a3a1723b059246b14001 which you can browse here:
    
https://gitbox.apache.org/repos/asf?p=commons-csv.git;a=commit;h=1bd1fd8e6065da9d07b5a3a1723b059246b14001
You may checkout this tag using:
    git clone https://gitbox.apache.org/repos/asf/commons-csv.git
--branch commons-csv-1.10.0-RC1 commons-csv-1.10.0-RC1

Maven artifacts are here:
    
https://repository.apache.org/content/repositories/orgapachecommons-1600/org/apache/commons/commons-csv/1.10.0/

These are the artifacts and their hashes:

#Release SHA-512s
#Sun Oct 16 08:32:39 EDT 2022
Apache\ Commons\
CSV-1.10.0.spdx.rdf.xml=c8cf7495637ce5282c72aa8e8ef4b93f206fbbdf21c7b337a34daa3b89af9816073d0185118cab4cdbed5104cc7f090e81efb614594079b02829c96829fe8aeb
commons-csv-1.10.0-bin.tar.gz=5b0ff3f5cc4aeff6c2aa2c11e92ea8b1b1fbbcd098ba79d97bcfd9af3ea08a9ebc922e5237bda8abcb238eb0d2fe7a5ef27534d0d796a1fc77691684a698f3d1
commons-csv-1.10.0-bin.zip=a52caa6ccda5b830c133c965fd6843d8f960e1ce3ef83decf5435022481746093cf8b4308c1fded21a168b751724b4a9f5e72f3eca34381828968bff89e7239b
commons-csv-1.10.0-bom.json=0e3e6652e520afb697461a3f6b33458dc8ef1dd157c2b1442664b8f12f7504226916e64960cea5cc92dd2b923d18047ed921be60eb7febfbeffc8ac3e3254fc0
commons-csv-1.10.0-bom.xml=91e494b94eea35ee7d85c8d8dbdc5029f981e0c5a8b1aae2b0f3cd77d1cea0fb0fc8c857d18d41ca055d12c4c4eb1adf70582f178a88751b836861c3286283db
commons-csv-1.10.0-javadoc.jar=719656f5399e0889af2e8b40f49458d0dec6e8ae85413f36caa2187235c68d9e42502bf07c66c0bcc7c5c3427bc0aac51f9897581842d303501bbff5ec73bdc0
commons-csv-1.10.0-sources.jar=c78c8fa112ef35d6d6c061b05e08816ce516a3acd56d720c59dbdaeb9d1202b16294233bd2aa2fac91cfa759456f8d7e08951876d85decdf0b8fb215a8e31754
commons-csv-1.10.0-src.tar.gz=a0d3a7aedab567aaa32a3dd1ceddeff412e686f3fd2b0805993db7e3dd4804b4d5d59e70f1500f23e8d0ba6531832e9f20d0c298bd443d9825724d5d1b1c87b3
commons-csv-1.10.0-src.zip=5cbdbcf367c9dd78e76228817592d70378b3cd8d919fdcd7df5c3eb2d7ebd8bd47fbb17376a631efb640be4115d42d3cc7aafa866bfefdfe073d5a4fcf625e0b
commons-csv-1.10.0-test-sources.jar=0d77c791934609b176457df5c60fdd5bfa3c6f392375adef81e5e7cd31de7fb25bd91b0bbe60a9ff46acb62efc2a33660adff24850370b7546a71374aa267f94
commons-csv-1.10.0-tests.jar=9a08e4d2ec865d9e1dafc01e4ee58804490b2d50f886bac47a5235581077b31a15dda5b3f8f2adb5a70be8794cca886126d980d6d3722eec2e10471fc24e1940

I have tested this with 'mvn' and 'mvn -V -Duser.name=$my_apache_id
-Prelease -Ptest-deploy -P jacoco -P japicmp clean package site
deploy' using:

Darwin *** 21.6.0 Darwin Kernel Version 21.6.0: Mon Aug 22 20:17:10
PDT 2022; root:xnu-8020.140.49~2/RELEASE_X86_64 x86_64

Apache Maven 3.8.6 

Re: commons-parent: add revapi, drop clirr

2022-08-31 Thread sman81
commons-parent has two plugins "doing the same thing": japicmp and clirr

We drop the one that does not work / is unmaintained and not future-proof 
(clirr) and replace it by revapi. Still two plugins.

Rather than doing the same thing, these plugins complement each other - kind of 
like Checkstyle and PMD: large overlap yet they produce findings the other one 
does not.

japicmp has failed to recognize a recent incompatible API change in csv-commons 
that revapi spots 
(CSV-302) therefore 
brushing off revapi is not a good solution (to me).
At the same time there might be scenarios where japicmp beats revapi.

cheers
Markus


From: Gary Gregory 
Sent: Monday, August 29, 2022 12:28
To: Commons Developers List 
Subject: Re: commons-parent: add revapi, drop clirr

Dropping Clirr is fine by me but adding adding another plugin to do the
same thing that JApiCmp is not (for me), it's confusing. A component can
always choose to replace JApiCmp with revapi.

Gary

On Mon, Aug 29, 2022, 05:50  wrote:

> Hi,
>
> as part of commons-csv pr 252<
> https://github.com/apache/commons-csv/pull/252> there is a discussion
> whether to add revapi-maven-plugin to commons-parent in addition to
> japicmp-maven-plugin and drop clirr-maven-plugin.
> The plugins perform checks to ensure source compatibility and binary
> compatibility of artifacts.
>
> The proposal is that japicmp and revapi co-exist in parent as together the
> tools provide better coverage than just having either one.
> Clirr is not under active maintenance any longer and will be removed.
>
> Does anyone have thoughts or objections?
>
> cheers
> Markus
>


commons-parent: add revapi, drop clirr

2022-08-29 Thread sman81
Hi,

as part of commons-csv pr 252 
there is a discussion whether to add revapi-maven-plugin to commons-parent in 
addition to japicmp-maven-plugin and drop clirr-maven-plugin.
The plugins perform checks to ensure source compatibility and binary 
compatibility of artifacts.

The proposal is that japicmp and revapi co-exist in parent as together the 
tools provide better coverage than just having either one.
Clirr is not under active maintenance any longer and will be removed.

Does anyone have thoughts or objections?

cheers
Markus