Re: commons.text.CaseUtils

2024-04-10 Thread Stephan Peters
Gary, thank you for your response.

I initiated the pull request (#528) and already received some very
constructive feedback from user mbenson.
I am modifying the code to expose fewer methods, dropping those whose result
a user can produce externally with something as simple as .toLowerCase().

I also noticed some recent discussion of this, which you commented on, in
pull request #450, "Cases API + 4 implementations (Pascal, Camel, Kebab, Snake)".

Once I have finished the edits and new tests and pushed them to my fork, I
may join the conversation on #450.

My Jira account has been approved (after an initial disapproval). I haven't
looked at it yet, but I will look for similar topics there.

I also uncovered an issue with my code when I devised some tests I
specifically designed to break it if possible, and I need to fix this.

assertThat(CaseUtils.toTitleCase(" ' \u2019 Titl'e Case \u2019 ' "))
        .isEqualTo("Title_Case");  // TODO: fix this failure.

org.opentest4j.AssertionFailedError:
expected: "Title_Case"
 but was: "Title_Case_’_'"
Expected :"Title_Case"
Actual   :"Title_Case_’_'"

This is because of the way I handle apostrophes: "That's good!" will
return "Thats_Good".
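
The apostrophe handling described above can be sketched as follows. This is
only an illustration, not the code in PR #528: the class name TitleCaseSketch
and the method toTitleSnakeCase are hypothetical, and the ordering (strip
apostrophes first, then split on non-alphanumeric characters and skip empty
tokens) is an assumption about one way to make the failing test pass.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the CaseUtils implementation from the pull request.
final class TitleCaseSketch {

    static String toTitleSnakeCase(String input) {
        // Decompose accented characters, then drop the combining marks.
        String s = Normalizer.normalize(input, Normalizer.Form.NFD)
                .replaceAll("\\p{M}+", "");
        // Remove ASCII and typographic apostrophes before tokenizing,
        // so a stray apostrophe can never become a word of its own.
        s = s.replaceAll("['\\u2019]", "");
        List<String> words = new ArrayList<>();
        // Split on anything that is not a letter or digit; skip empty tokens.
        for (String token : s.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                words.add(Character.toUpperCase(token.charAt(0))
                        + token.substring(1).toLowerCase());
            }
        }
        return String.join("_", words);
    }

    public static void main(String[] args) {
        System.out.println(toTitleSnakeCase(" ' \u2019 Titl'e Case \u2019 ' "));
        System.out.println(toTitleSnakeCase("That's good!"));
    }
}
```

Because the apostrophes are removed before the input is split, leading and
trailing apostrophes leave only separators behind, and those produce empty
tokens that are skipped.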

Again, thank you for your response.

Stephan Peters


On Tue, Apr 9, 2024 at 5:56 PM Stephan Peters wrote:

> OK, I will initiate a PR.
> Some of the added methods will be more useful than others.
> The PR will come from speters33w.
>
> Thank you,
> Stephan Peters
>
> On Tue, Apr 9, 2024 at 5:31 PM Gary Gregory wrote:
>
>> Hello Stephan,
>>
>> The best way to see what you are proposing is a PR, it's a bit painful to
>> see differences otherwise, at least for me.
>>
>> That said anything new should solve a real world use case, not merely
>> something that might be useful (or not) 
>>
>> I think seeing tests in a PR will help clarify what it is you are
>> proposing that the current code doesn't do.
>>
>> See also https://github.com/apache/commons-text/pull/450
>>
>> TY,
>> Gary
>>
>> On Tue, Apr 9, 2024, 4:37 PM Stephan Peters wrote:
>>
>> > I added several methods to the org.apache.commons.text.CaseUtils class I
>> > think would be very useful, for example to use for normalized naming
>> > conventions for file paths, file names, URLs, etc.
>> >
>> > I'm planning on initiating a pull request.
>> >
>> > I would like to discuss it here.
>> >
>> > I've posted it in a fork here:
>> >
>> > https://github.com/speters33w/commons-text/blob/master/src/main/java/org/apache/commons/text/CaseUtils.java
>> >
>> > and written new tests for all the methods that pass here:
>> >
>> > https://github.com/speters33w/commons-text/blob/master/src/test/java/org/apache/commons/text/CaseUtilsTest.java
>> >
>> > There is an example of the method return values at the top of the
>> > revised CaseUtils.java.
>> >
>> > The methods have a little different behavior than the existing
>> > toCamelCase(String, boolean, char[]) (which I left intact) in that they
>> > normalize the input first before processing, so toCamelSnakeCase("The
>> > café’s piñata gave me déjà vu.") will return
>> > "the_Cafes_Pinata_Gave_Me_Deja_Vu"
>> >
>> > The main driver engine is in the toTitleCase() method and the rest of
>> > the methods piggyback on that engine and perform minor changes to the
>> > return value.
>> >
>> > If anyone feels like taking a look, I'd appreciate any feedback.
>> >
>> > Thank you.
>> >
>> > Stephan Peters
>> >
>>
>


Re: [IO] Change in behavior in Commons FileUpload after upgrade to Commons IO 2.16.1

2024-04-10 Thread Gary Gregory
Stephan,

Thank you for your message.

This is more of a design defect IMO. If there is a desire to disable a
feature like caching, then there should be a toggle for that, rather than
reliance on a side effect of a magic number. PRs welcome! :-)

Gary


On Wed, Apr 10, 2024, 9:24 AM Stephan Markwalder wrote:

> Hi,
>
> Today, I found the following questions in
> https://github.com/apache/commons-io/pull/609:
>
> > The behavior for a negative threshold should be the same as 0 IMO. WDYT?
> > What does it even mean that a threshold is negative?
> > Writing zero bytes writes nothing, so there is nothing to reach until
> > you at least write one byte.
> > Am I missing something?
>
> I would like to highlight a "use case" for a negative threshold, and how
> the change to disallow a negative threshold might impact existing code.
>
> I upgraded from Commons IO 2.15.1 to 2.16.1 yesterday and found a small
> change in the behavior of Commons FileUpload when uploading and processing
> empty files. The effect is visible only when passing a negative value as
> file size threshold to Commons FileUpload. Here is a small extract from the
> Javadoc of Commons FileUpload, class `DefaultFileItemFactory`, constructor
> parameter `sizeThreshold`:
>
> > sizeThreshold - The threshold, in bytes, below which items will be
> > retained in memory and above which they will be stored as a file.
> (source:
> https://javadoc.io/doc/commons-fileupload/commons-fileupload/latest/index.html)
>
> By passing a negative value for `sizeThreshold`, Commons FileUpload can be
> configured to disable the in-memory caching for all uploaded files,
> including empty files with a size of 0 bytes. As a result, `DiskFileItem`
> objects created by Commons FileUpload will always have a `File` instance
> set internally, even for empty files.
>
> `DiskFileItem` in Commons FileUpload internally makes use of
> `DeferredFileOutputStream`, and therefore `ThresholdingOutputStream`. At
> some point it calls `isThresholdExceeded()` to check whether the size of
> the uploaded file exceeds the given threshold. By disallowing a negative
> threshold, empty files will now be treated differently by Commons
> FileUpload. With a size of 0 bytes, they will not exceed the enforced
> minimum threshold of 0 bytes anymore, and their data will therefore be kept
> in memory. This can break follow-up code which relies on the previous
> behavior and expects a `File` instance to be created for every uploaded
> file, even empty files.
>
> I know that this is a very specific use case. I don't know whether the
> developers of Commons FileUpload ever intended a negative threshold to be
> used. Still, the question was asked whether a negative threshold could have
> any meaning. I assume the answer is "yes". But I don't know whether this
> qualifies as a bug or a regression. I also don't know whether there are
> other similar use cases in other libraries depending on Commons IO.
>
> Best,
> Stephan

[IO] Change in behavior in Commons FileUpload after upgrade to Commons IO 2.16.1

2024-04-10 Thread Stephan Markwalder
Hi,

Today, I found the following questions in 
https://github.com/apache/commons-io/pull/609:

> The behavior for a negative threshold should be the same as 0 IMO. WDYT?
> What does it even mean that a threshold is negative?
> Writing zero bytes writes nothing, so there is nothing to reach until you at 
> least write one byte.
> Am I missing something?

I would like to highlight a "use case" for a negative threshold, and how the 
change to disallow a negative threshold might impact existing code.

I upgraded from Commons IO 2.15.1 to 2.16.1 yesterday and found a small change 
in the behavior of Commons FileUpload when uploading and processing empty 
files. The effect is visible only when passing a negative value as file size 
threshold to Commons FileUpload. Here is a small extract from the Javadoc of 
Commons FileUpload, class `DefaultFileItemFactory`, constructor parameter 
`sizeThreshold`:

> sizeThreshold - The threshold, in bytes, below which items will be
> retained in memory and above which they will be stored as a file.
(source: 
https://javadoc.io/doc/commons-fileupload/commons-fileupload/latest/index.html)

By passing a negative value for `sizeThreshold`, Commons FileUpload can be 
configured to disable the in-memory caching for all uploaded files, including 
empty files with a size of 0 bytes. As a result, `DiskFileItem` objects created 
by Commons FileUpload will always have a `File` instance set internally, even 
for empty files.

`DiskFileItem` in Commons FileUpload internally makes use of 
`DeferredFileOutputStream`, and therefore `ThresholdingOutputStream`. At some 
point it calls `isThresholdExceeded()` to check whether the size of the 
uploaded file exceeds the given threshold. By disallowing a negative threshold, 
empty files will now be treated differently by Commons FileUpload. With a size 
of 0 bytes, they will not exceed the enforced minimum threshold of 0 bytes 
anymore, and their data will therefore be kept in memory. This can break 
follow-up code which relies on the previous behavior and expects a `File` 
instance to be created for every uploaded file, even empty files.
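
The effect on empty files can be modeled in a few lines. This is a simplified
sketch, not the Commons IO source: the class ThresholdSketch and the helper
clampThreshold are hypothetical, and the comparison "written > threshold" is
an assumption about how the check behaves, based on the description above.

```java
// Hypothetical model of the threshold check; not the Commons IO implementation.
final class ThresholdSketch {

    // Assumed check: the threshold is exceeded once more bytes than
    // the threshold have been written.
    static boolean isThresholdExceeded(long written, int threshold) {
        return written > threshold;
    }

    // Models the newer behavior described above: negative values become 0.
    static int clampThreshold(int threshold) {
        return Math.max(threshold, 0);
    }

    public static void main(String[] args) {
        long emptyUpload = 0; // an empty file writes 0 bytes
        int configured = -1;  // negative sizeThreshold, meant to disable caching

        // Old behavior: 0 > -1 holds, so even an empty upload goes to a File.
        System.out.println(isThresholdExceeded(emptyUpload, configured));
        // New behavior: threshold clamped to 0, 0 > 0 fails, data stays in memory.
        System.out.println(isThresholdExceeded(emptyUpload, clampThreshold(configured)));
    }
}
```

Under these assumptions the program prints true and then false, which is the
difference between getting a File for an empty upload and not getting one.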

I know that this is a very specific use case. I don't know whether the 
developers of Commons FileUpload ever intended a negative threshold to be used. 
Still, the question was asked whether a negative threshold could have any 
meaning. I assume the answer is "yes". But I don't know whether this qualifies 
as a bug or a regression. I also don't know whether there are other similar use 
cases in other libraries depending on Commons IO.

Best,
Stephan

Re: [collections]

2024-04-10 Thread Elliotte Rusty Harold
On Tue, Apr 9, 2024 at 11:09 PM Rodion Efremov  wrote:
>
> Hello,
>
> Fair enough. However, why do we have CursorableLinkedList and
> NodeCachingLinkedList around when my previous benchmarking showed that they
> are inferior compared to both TreeList and IndexedLinkedList?

We have a lot of things we don't need and that shouldn't be used. It
happens sometimes on long-lived bazaar-style projects without a clear
vision and maintainer. If those two classes are demonstrably inferior,
it might be worthwhile deprecating them. Meanwhile I'd prefer not to
make the situation worse. We already have more code than we can
maintain, and are wasting a lot of dev cycles on idiosyncratic churn
to no good end.

> Also, note that TreeList requires 3 references, 2 ints and 2 booleans per
> node. IndexedLinkedList requires only 3  references per node.
>
> If you need benchmarking on small lists, just tell me and I will arrange
> that.

Lies, damned lies, and benchmarks. :-)

Benchmarking is hard and rarely matches reality. By coincidence, I
spent last week learning about the damage the TPCH benchmarks do in
the database space. The benchmarks that matter are profiles of real
world applications, and every application is different. Better
algorithms are sometimes discovered, even for well trod territory like
lists, but typically they only improve performance in the limit and
often decrease performance in real world applications.

Looking at the repo, this seems to be a newly constructed data
structure. I suggest cleaning up the blog post and submitting it to an
appropriate peer reviewed journal in the field and posting the
preprint on arxiv so true experts can take a look. (I'm just a
practitioner.) If the data structure proves out over time in real
world use cases, then it should be considered for Apache Commons.
However, I don't think Commons is the right place for bleeding edge
research.

-- 
Elliotte Rusty Harold
elh...@ibiblio.org
