[jira] [Created] (JEXL-377) Add support for javascript style function definition

2022-08-06 Thread Xu Pengcheng (Jira)
Xu Pengcheng created JEXL-377:
-

 Summary: Add support for javascript style function definition
 Key: JEXL-377
 URL: https://issues.apache.org/jira/browse/JEXL-377
 Project: Commons JEXL
  Issue Type: Improvement
Reporter: Xu Pengcheng


It would be nice to allow the Javascript function definition:
{code:java}
function add(x, y) {
  return x + y;
}{code}
Thanks!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CONFIGURATION-793) Error reading a list of complex objects from JSON after 2.3

2022-08-06 Thread Gary D. Gregory (Jira)


[ 
https://issues.apache.org/jira/browse/CONFIGURATION-793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17576215#comment-17576215
 ] 

Gary D. Gregory commented on CONFIGURATION-793:
---

In git master, I see that {{testGetList_nested_with_list}} passes and 
{{testGetList}} fails, nothing jumps out looking at the test in the debugger. 
Not sure why there is not a case for that.

I added {{testGetList_nested_with_list}} to the test class and a new disabled 
test method.

Thoughts from anyone?

Note: You assert parameters are reversed, and the pattern is 
{{assertX(expected, actual)}}.

> Error reading a list of complex objects from JSON after 2.3
> ---
>
> Key: CONFIGURATION-793
> URL: https://issues.apache.org/jira/browse/CONFIGURATION-793
> Project: Commons Configuration
>  Issue Type: Bug
>Affects Versions: 2.3
> Environment: JDK 11.0.4
>Reporter: Cliff Evans
>Priority: Major
>
> We have just attempted to move from version 2.2 to 2.7 and have encountered 
> and issue whilst attempting to read a List from a JSON formatted 
> configuration file.
> The issue appears to have been introduced in 2.3.
> Adding the following tests to 
> org.apache.commons.configuration2.TestJSONConfiguration will demonstrate the 
> change.  These tests will pass against 2.2 but fail agains 2.3 (and 2.7.)
> https://issues.apache.org/jira/browse/CONFIGURATION-686 appears to have fixed 
> access to lists of complex objects using the dot notation for strings (see 
> testGetProperty_dictionaryInList() in TestJSONConfiguration) but has broken 
> access to the list of complex objects.
>  
> {noformat}
> @Test
> public void testGetList_nested_with_list()
> {
> assertEquals(Arrays.asList("col1", "col2"),
>  jsonConfiguration.getList(String.class, "key4.key5"));
> }
> @Test
> public void testGetList() {
> final List configList = jsonConfiguration.getList(Map.class, 
> "capitals");
> assertEquals(configList.size(), 2);
> assertEquals(configList.get(0).get("country"), "USA");
> assertEquals(configList.get(0).get("capital"), "Washington");
> assertEquals(configList.get(1).get("country"), "UK");
> assertEquals(configList.get(1).get("capital"), "London");
> }
> {noformat}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CONFIGURATION-819) Uncaught snakeyaml.error.YAMLException in YAMLConfiguration.write

2022-08-06 Thread Gilles Sadowski (Jira)


[ 
https://issues.apache.org/jira/browse/CONFIGURATION-819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17576197#comment-17576197
 ] 

Gilles Sadowski commented on CONFIGURATION-819:
---

bq. What do others think?

So this is the error:
{noformat}
Java Exception: org.yaml.snakeyaml.error.YAMLException: invalid string value 
has occurred
{noformat}

TL;DR; but it seems that caller says "Here is a YAML string" and the YAML 
implementation replies "No, it's not".  How is it different, in principle, from 
"Here is a non-null reference", and the JVM replying with NPE if it is, in 
fact, null?

I don't agree with the patch, because the cause of the error is *not* an I/O 
issue; it is a programming error (on the part of the caller).
The original exception _is-a_ {{RuntimeException}} as it should (as per J. 
Bloch's "Effective Java"...): There no reason to wrap it in a checked exception.

> Uncaught snakeyaml.error.YAMLException in YAMLConfiguration.write
> -
>
> Key: CONFIGURATION-819
> URL: https://issues.apache.org/jira/browse/CONFIGURATION-819
> Project: Commons Configuration
>  Issue Type: Bug
>Reporter: Weber Jo
>Priority: Major
> Attachments: 48192.patch, 
> clusterfuzz-testcase-YAMLConfigurationWriteFuzzer-5634459279425536, 
> clusterfuzz-testcase-minimized-YAMLConfigurationWriteFuzzer-5634459279425536, 
> stacktrace.txt
>
>
> When executing YAMLConfiguration.write with malformed input, there is the 
> possibility to receive a snakeyaml.error.YAMLException which does not get 
> caught and leads to a crash.
> This was found through OSS-Fuzz ([Crash 
> #48192|https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=48192]).
> I attached the stacktrace and the crashing inputs.
> Furthermore, I attached a possible fix that suppresses the given crashing 
> inputs.
> It passes all unit tests, but I am not sure if fits your code standards or if 
> you want to catch the exception earlier (as in YAMLConfiguration.dump)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (CONFIGURATION-814) Apache Commons Configuration 2.7 used spring version is 4.3.26,Impacted by Vulnerability CVE-2022-22965

2022-08-06 Thread Gary D. Gregory (Jira)


 [ 
https://issues.apache.org/jira/browse/CONFIGURATION-814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary D. Gregory closed CONFIGURATION-814.
-
Fix Version/s: 2.8.0
   Resolution: Fixed

We released 2.8.0 which depends on Spring 5.3.21.

> Apache Commons Configuration 2.7 used spring version is 4.3.26,Impacted by 
> Vulnerability CVE-2022-22965
> ---
>
> Key: CONFIGURATION-814
> URL: https://issues.apache.org/jira/browse/CONFIGURATION-814
> Project: Commons Configuration
>  Issue Type: Bug
>Affects Versions: 2.7
>Reporter: Radar wen
>Priority: Major
> Fix For: 2.8.0
>
>
> !image-2022-05-11-17-08-50-207.png! Is there a plan to release a new version?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (CONFIGURATION-818) Stackoverflow bugs fixed in 2.8.0

2022-08-06 Thread Gary D. Gregory (Jira)


[ 
https://issues.apache.org/jira/browse/CONFIGURATION-818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17576192#comment-17576192
 ] 

Gary D. Gregory edited comment on CONFIGURATION-818 at 8/6/22 12:22 PM:


No, the reproducer links are useless (to me at least): "You 
(email=***@gmail.com) are not authorized to access this page!" If I choose my 
GitHub account, it wants personal information: "oss-fuzz login by oliverchang
wants to access your garydgregory account", so no. Example link: 
https://oss-fuzz.com/download?testcase_id=4832418871246848

You can create PRs on GitHub that demonstrate the problems if want to move this 
through, so we can see what the deal is for each link.


was (Author: garydgregory):
No, the links are useless (to me at least): "You (email=***@gmail.com) are not 
authorized to access this page!" If I choose my GitHub account, it wants 
personal information: "oss-fuzz login by oliverchang
wants to access your garydgregory account", so no.

You can create PRs on GitHub that demonstrate the problems if want to move this 
through, so we can see what the deal is for each link.

> Stackoverflow bugs fixed in 2.8.0
> -
>
> Key: CONFIGURATION-818
> URL: https://issues.apache.org/jira/browse/CONFIGURATION-818
> Project: Commons Configuration
>  Issue Type: Bug
>Affects Versions: 2.7
>Reporter: Henry Lin
>Priority: Major
>  Labels: security
> Fix For: 2.8.0
>
>
> Dear Apache Commons Configuration maintainers,
> The Code Intelligence JVM fuzzer 
> [Jazzer|https://github.com/CodeIntelligenceTesting/jazzer] has found multiple 
> vulnerabilities in Apache Commons Configuration during a fuzzing run in 
> [Google OSS-Fuzz|https://github.com/google/oss-fuzz]. The vulnerabilities 
> were already fixed. Version <= 2.7 of Apache Commons Configuration is 
> vulnerable.
> Detailed Information can be found here:
> [https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=48737]
> [https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=48610]
> [https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=48522]
> [https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=48391]
> [https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=48195]
>  
> Please let me know if you have any questions regarding fuzzing or the 
> OSS-Fuzz integration.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (CONFIGURATION-818) Stackoverflow bugs fixed in 2.8.0

2022-08-06 Thread Gary D. Gregory (Jira)


 [ 
https://issues.apache.org/jira/browse/CONFIGURATION-818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary D. Gregory resolved CONFIGURATION-818.
---
Resolution: Fixed

> Stackoverflow bugs fixed in 2.8.0
> -
>
> Key: CONFIGURATION-818
> URL: https://issues.apache.org/jira/browse/CONFIGURATION-818
> Project: Commons Configuration
>  Issue Type: Bug
>Affects Versions: 2.7
>Reporter: Henry Lin
>Priority: Major
>  Labels: security
> Fix For: 2.8.0
>
>
> Dear Apache Commons Configuration maintainers,
> The Code Intelligence JVM fuzzer 
> [Jazzer|https://github.com/CodeIntelligenceTesting/jazzer] has found multiple 
> vulnerabilities in Apache Commons Configuration during a fuzzing run in 
> [Google OSS-Fuzz|https://github.com/google/oss-fuzz]. The vulnerabilities 
> were already fixed. Version <= 2.7 of Apache Commons Configuration is 
> vulnerable.
> Detailed Information can be found here:
> [https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=48737]
> [https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=48610]
> [https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=48522]
> [https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=48391]
> [https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=48195]
>  
> Please let me know if you have any questions regarding fuzzing or the 
> OSS-Fuzz integration.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CONFIGURATION-818) Stackoverflow bugs fixed in 2.8.0

2022-08-06 Thread Gary D. Gregory (Jira)


[ 
https://issues.apache.org/jira/browse/CONFIGURATION-818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17576192#comment-17576192
 ] 

Gary D. Gregory commented on CONFIGURATION-818:
---

No, the links are useless (to me at least): "You (email=***@gmail.com) are not 
authorized to access this page!" If I choose my GitHub account, it wants 
personal information: "oss-fuzz login by oliverchang
wants to access your garydgregory account", so no.

You can create PRs on GitHub that demonstrate the problems if want to move this 
through, so we can see what the deal is for each link.

> Stackoverflow bugs fixed in 2.8.0
> -
>
> Key: CONFIGURATION-818
> URL: https://issues.apache.org/jira/browse/CONFIGURATION-818
> Project: Commons Configuration
>  Issue Type: Bug
>Affects Versions: 2.7
>Reporter: Henry Lin
>Priority: Major
>  Labels: security
> Fix For: 2.8.0
>
>
> Dear Apache Commons Configuration maintainers,
> The Code Intelligence JVM fuzzer 
> [Jazzer|https://github.com/CodeIntelligenceTesting/jazzer] has found multiple 
> vulnerabilities in Apache Commons Configuration during a fuzzing run in 
> [Google OSS-Fuzz|https://github.com/google/oss-fuzz]. The vulnerabilities 
> were already fixed. Version <= 2.7 of Apache Commons Configuration is 
> vulnerable.
> Detailed Information can be found here:
> [https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=48737]
> [https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=48610]
> [https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=48522]
> [https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=48391]
> [https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=48195]
>  
> Please let me know if you have any questions regarding fuzzing or the 
> OSS-Fuzz integration.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CONFIGURATION-819) Uncaught snakeyaml.error.YAMLException in YAMLConfiguration.write

2022-08-06 Thread Weber Jo (Jira)


[ 
https://issues.apache.org/jira/browse/CONFIGURATION-819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17576191#comment-17576191
 ] 

Weber Jo commented on CONFIGURATION-819:


Clearly looks like I misunderstood the term "crash".
I am sorry for that. I will remember it for future issues.

> Uncaught snakeyaml.error.YAMLException in YAMLConfiguration.write
> -
>
> Key: CONFIGURATION-819
> URL: https://issues.apache.org/jira/browse/CONFIGURATION-819
> Project: Commons Configuration
>  Issue Type: Bug
>Reporter: Weber Jo
>Priority: Major
> Attachments: 48192.patch, 
> clusterfuzz-testcase-YAMLConfigurationWriteFuzzer-5634459279425536, 
> clusterfuzz-testcase-minimized-YAMLConfigurationWriteFuzzer-5634459279425536, 
> stacktrace.txt
>
>
> When executing YAMLConfiguration.write with malformed input, there is the 
> possibility to receive a snakeyaml.error.YAMLException which does not get 
> caught and leads to a crash.
> This was found through OSS-Fuzz ([Crash 
> #48192|https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=48192]).
> I attached the stacktrace and the crashing inputs.
> Furthermore, I attached a possible fix that suppresses the given crashing 
> inputs.
> It passes all unit tests, but I am not sure if fits your code standards or if 
> you want to catch the exception earlier (as in YAMLConfiguration.dump)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CONFIGURATION-819) Uncaught snakeyaml.error.YAMLException in YAMLConfiguration.write

2022-08-06 Thread Gary D. Gregory (Jira)


[ 
https://issues.apache.org/jira/browse/CONFIGURATION-819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17576189#comment-17576189
 ] 

Gary D. Gregory commented on CONFIGURATION-819:
---

FYI, a "crash" is when the JVM crashes, not when an exception is thrown. So 
this is neither a "crash" nor a "Major" issue. Some projects/components choose 
to simply document what exceptions a method throws and leave it at that. I'm 
not sure what is best here: Rethrow or document? What do others think?


> Uncaught snakeyaml.error.YAMLException in YAMLConfiguration.write
> -
>
> Key: CONFIGURATION-819
> URL: https://issues.apache.org/jira/browse/CONFIGURATION-819
> Project: Commons Configuration
>  Issue Type: Bug
>Reporter: Weber Jo
>Priority: Major
> Attachments: 48192.patch, 
> clusterfuzz-testcase-YAMLConfigurationWriteFuzzer-5634459279425536, 
> clusterfuzz-testcase-minimized-YAMLConfigurationWriteFuzzer-5634459279425536, 
> stacktrace.txt
>
>
> When executing YAMLConfiguration.write with malformed input, there is the 
> possibility to receive a snakeyaml.error.YAMLException which does not get 
> caught and leads to a crash.
> This was found through OSS-Fuzz ([Crash 
> #48192|https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=48192]).
> I attached the stacktrace and the crashing inputs.
> Furthermore, I attached a possible fix that suppresses the given crashing 
> inputs.
> It passes all unit tests, but I am not sure if fits your code standards or if 
> you want to catch the exception earlier (as in YAMLConfiguration.dump)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CONFIGURATION-819) Uncaught snakeyaml.error.YAMLException in YAMLConfiguration.write

2022-08-06 Thread Weber Jo (Jira)
Weber Jo created CONFIGURATION-819:
--

 Summary: Uncaught snakeyaml.error.YAMLException in 
YAMLConfiguration.write
 Key: CONFIGURATION-819
 URL: https://issues.apache.org/jira/browse/CONFIGURATION-819
 Project: Commons Configuration
  Issue Type: Bug
Reporter: Weber Jo
 Attachments: 48192.patch, 
clusterfuzz-testcase-YAMLConfigurationWriteFuzzer-5634459279425536, 
clusterfuzz-testcase-minimized-YAMLConfigurationWriteFuzzer-5634459279425536, 
stacktrace.txt

When executing YAMLConfiguration.write with malformed input, there is the 
possibility to receive a snakeyaml.error.YAMLException which does not get 
caught and leads to a crash.

This was found through OSS-Fuzz ([Crash 
#48192|https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=48192]).

I attached the stacktrace and the crashing inputs.

Furthermore, I attached a possible fix that suppresses the given crashing 
inputs.
It passes all unit tests, but I am not sure if fits your code standards or if 
you want to catch the exception earlier (as in YAMLConfiguration.dump)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [commons-collections] Claudenw commented on a diff in pull request #320: Collections 824: Optimize SimpleHasher.forEachIndex and SimpleHasher name change

2022-08-06 Thread GitBox


Claudenw commented on code in PR #320:
URL: 
https://github.com/apache/commons-collections/pull/320#discussion_r939505118


##
src/main/java/org/apache/commons/collections4/bloomfilter/EnhancedDoubleHasher.java:
##
@@ -0,0 +1,229 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter;
+
+import java.util.Objects;
+import java.util.function.IntPredicate;
+
+/**
+ * A Hasher that implements combinatorial hashing as as described by
+ * https://www.eecs.harvard.edu/~michaelm/postscripts/tr-02-05.pdf";>Krisch 
and Mitzenmacher using the enhanced double hashing technique
+ * described in the wikipedia article  https://en.wikipedia.org/wiki/Double_hashing#Enhanced_double_hashing";>Double
 Hashing.
+ * 
+ * Common use for this hasher is to generate bit indices from a byte array 
output of a hashing
+ * or MessageDigest algorithm.
+ *
+ * Thoughts on the hasher input
+ *
+ *Note that it is worse to create smaller numbers for the 
initial and increment. If the initial is 
smaller than
+ * the number of bits in a filter then hashing will start at the same point 
when the size increases; likewise the increment will be
+ * the same if it remains smaller than the number of bits in the filter and so 
the first few indices will be the same if the number of bits
+ * changes (but is still larger than the increment). In a worse 
case scenario with small initial and increment for
+ * all items, hashing may not create indices that fill the full region within 
a much larger filter. Imagine hashers created with initial
+ * and increment values less than 255 with a filter size of 3 
and number of hash functions as 5. Ignoring the
+ * tetrahedral addition (a maximum of 20 for k=5) the max index is 255 * 4 + 
255 = 1275, this covers 4.25% of the filter. This also
+ * ignores the negative wrapping but the behaviour is the same, some bits 
cannot be reached.
+ * 
+ * So this needs to be avoided as the filter probability assumptions will be 
void. If the initial and increment are larger
+ * than the number of bits then the modulus will create a 'random' position 
and increment within the size.
+ * 
+ *
+ * @since 4.5
+ */
+public class EnhancedDoubleHasher implements Hasher {
+
+/**
+ * The initial hash value.
+ */
+private final long initial;
+
+/**
+ * The value to increment the hash value by.
+ */
+private final long increment;
+
+/**
+ * Convert bytes to big-endian long filling with zero bytes as necessary..
+ * @param byteArray the byte array to extract the values from.
+ * @param offset the offset to start extraction from.
+ * @param len the length of the extraction, may be longer than 8.
+ * @return
+ */
+private static long toLong(byte[] byteArray, int offset, int len) {
+long val = 0;
+len = Math.min(len, Long.BYTES);
+int shift = Long.SIZE;
+for (int i = 0; i < len; i++) {
+shift -=  Byte.SIZE;
+val |= ((long) (byteArray[offset + i] & 0x00FF) << shift);

Review Comment:
   fixed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@commons.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [commons-collections] Claudenw commented on a diff in pull request #320: Collections 824: Optimize SimpleHasher.forEachIndex and SimpleHasher name change

2022-08-06 Thread GitBox


Claudenw commented on code in PR #320:
URL: 
https://github.com/apache/commons-collections/pull/320#discussion_r939505109


##
src/main/java/org/apache/commons/collections4/bloomfilter/EnhancedDoubleHasher.java:
##
@@ -0,0 +1,229 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter;
+
+import java.util.Objects;
+import java.util.function.IntPredicate;
+
+/**
+ * A Hasher that implements combinatorial hashing as as described by
+ * https://www.eecs.harvard.edu/~michaelm/postscripts/tr-02-05.pdf";>Krisch 
and Mitzenmacher using the enhanced double hashing technique
+ * described in the wikipedia article  https://en.wikipedia.org/wiki/Double_hashing#Enhanced_double_hashing";>Double
 Hashing.
+ * 
+ * Common use for this hasher is to generate bit indices from a byte array 
output of a hashing
+ * or MessageDigest algorithm.
+ *
+ * Thoughts on the hasher input
+ *
+ *Note that it is worse to create smaller numbers for the 
initial and increment. If the initial is 
smaller than
+ * the number of bits in a filter then hashing will start at the same point 
when the size increases; likewise the increment will be
+ * the same if it remains smaller than the number of bits in the filter and so 
the first few indices will be the same if the number of bits
+ * changes (but is still larger than the increment). In a worse 
case scenario with small initial and increment for
+ * all items, hashing may not create indices that fill the full region within 
a much larger filter. Imagine hashers created with initial
+ * and increment values less than 255 with a filter size of 3 
and number of hash functions as 5. Ignoring the
+ * tetrahedral addition (a maximum of 20 for k=5) the max index is 255 * 4 + 
255 = 1275, this covers 4.25% of the filter. This also
+ * ignores the negative wrapping but the behaviour is the same, some bits 
cannot be reached.
+ * 
+ * So this needs to be avoided as the filter probability assumptions will be 
void. If the initial and increment are larger
+ * than the number of bits then the modulus will create a 'random' position 
and increment within the size.
+ * 
+ *
+ * @since 4.5
+ */
+public class EnhancedDoubleHasher implements Hasher {
+
+/**
+ * The initial hash value.
+ */
+private final long initial;
+
+/**
+ * The value to increment the hash value by.
+ */
+private final long increment;
+
+/**
+ * Convert bytes to big-endian long filling with zero bytes as necessary..
+ * @param byteArray the byte array to extract the values from.
+ * @param offset the offset to start extraction from.
+ * @param len the length of the extraction, may be longer than 8.
+ * @return
+ */
+private static long toLong(byte[] byteArray, int offset, int len) {
+long val = 0;
+len = Math.min(len, Long.BYTES);
+int shift = Long.SIZE;
+for (int i = 0; i < len; i++) {
+shift -=  Byte.SIZE;
+val |= ((long) (byteArray[offset + i] & 0x00FF) << shift);
+}
+return val;
+}
+
+/**
+ * Constructs the EnhancedDoubleHasher from a byte array.
+ * 
+ * This method simplifies the conversion from a Digest or hasher algorithm 
output
+ * to the two values used by the EnhancedDoubleHasher.
+ * The byte array is split in 2 and the first 8 bytes of each half are 
interpreted as a big-endian long value.
+ * Excess bytes are ignored.
+ * If there are fewer than 16 bytes the following conversions are made.
+ *
+ * 
+ * If there is an odd number of bytes the excess byte is assigned to 
the increment value
+ * The bytes alloted are read in big-endian order any byte not 
populated is set to zero.
+ * 
+ * 
+ * This ensures that small arrays generate the largest possible increment 
and initial values.
+ * 
+ * @param buffer the buffer to extract the longs from.
+ * @throws IllegalArgumentException is buffer length is zero.
+ */
+public EnhancedDoubleHasher(byte[] buffer) {
+if (buffer.length == 0) {
+throw new IllegalArgumentException("buffer length must be greater 
than 0");
+  

[GitHub] [commons-collections] Claudenw commented on a diff in pull request #320: Collections 824: Optimize SimpleHasher.forEachIndex and SimpleHasher name change

2022-08-06 Thread GitBox


Claudenw commented on code in PR #320:
URL: 
https://github.com/apache/commons-collections/pull/320#discussion_r939505077


##
src/test/java/org/apache/commons/collections4/bloomfilter/IncrementingHasher.java:
##
@@ -0,0 +1,102 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter;
+
+import java.util.Objects;
+import java.util.function.IntPredicate;
+
+/**
+ * A Hasher that implements simple combinatorial hashing as as described by
+ * https://www.eecs.harvard.edu/~michaelm/postscripts/tr-02-05.pdf";>Krisch 
and Mitzenmacher.
+ *
+ * To be used for testing only.
+ *
+ * @since 4.5
+ */
+class IncrementingHasher implements Hasher {
+
+/**
+ * The initial hash value.
+ */
+private final long initial;
+
+/**
+ * The value to increment the hash value by.
+ */
+private final long increment;
+
+/**
+ * Constructs the IncrementingHasher from 2 longs.  The long values will 
be interpreted as unsigned values.
+ * 
+ * The initial hash value will be the modulus of the initial value.
+ * Subsequent values will be calculated by repeatedly adding the increment 
to the last value and taking the modulus.
+ * 
+ * @param initial The initial value for the hasher.
+ * @param increment The value to increment the hash by on each iteration.
+ * @see #getDefaultIncrement()
+ */
+IncrementingHasher(long initial, long increment) {
+this.initial = initial;
+this.increment = increment;
+}
+
+@Override
+public IndexProducer indices(final Shape shape) {
+Objects.requireNonNull(shape, "shape");
+
+return new IndexProducer() {
+
+@Override
+public boolean forEachIndex(IntPredicate consumer) {
+Objects.requireNonNull(consumer, "consumer");
+int bits = shape.getNumberOfBits();
+/*
+ * Essentially this is computing a wrapped modulus from a 
start point and an

Review Comment:
   fixed



##
src/test/java/org/apache/commons/collections4/bloomfilter/IncrementingHasher.java:
##
@@ -0,0 +1,102 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter;
+
+import java.util.Objects;
+import java.util.function.IntPredicate;
+
+/**
+ * A Hasher that implements simple combinatorial hashing as as described by
+ * https://www.eecs.harvard.edu/~michaelm/postscripts/tr-02-05.pdf";>Krisch 
and Mitzenmacher.
+ *
+ * To be used for testing only.
+ *
+ * @since 4.5
+ */
+class IncrementingHasher implements Hasher {
+
+/**
+ * The initial hash value.
+ */
+private final long initial;
+
+/**
+ * The value to increment the hash value by.
+ */
+private final long increment;
+
+/**
+ * Constructs the IncrementingHasher from 2 longs.  The long values will 
be interpreted as unsigned values.
+ * 
+ * The initial hash value will be the modulus of the initial value.
+ * Subsequent values will be calculated by repeatedly adding the increment 
to the last value and taking the modulus.
+ * 
+ * @param initial The initial value for the hasher.
+ * @param increment The value to increment the hash by on each iteration.
+ * @see #getDefaultIncrement()

Review Comment:
   fixed



-- 
This is an automated message from the Apache Git Service.
To respond to the me

[GitHub] [commons-collections] Claudenw commented on a diff in pull request #320: Collections 824: Optimize SimpleHasher.forEachIndex and SimpleHasher name change

2022-08-06 Thread GitBox


Claudenw commented on code in PR #320:
URL: 
https://github.com/apache/commons-collections/pull/320#discussion_r939505059


##
src/test/java/org/apache/commons/collections4/bloomfilter/IncrementingHasher.java:
##
@@ -0,0 +1,102 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter;
+
+import java.util.Objects;
+import java.util.function.IntPredicate;
+
+/**
+ * A Hasher that implements simple combinatorial hashing as as described by
+ * https://www.eecs.harvard.edu/~michaelm/postscripts/tr-02-05.pdf";>Krisch 
and Mitzenmacher.
+ *
+ * To be used for testing only.
+ *
+ * @since 4.5
+ */
+class IncrementingHasher implements Hasher {
+
+/**
+ * The initial hash value.
+ */
+private final long initial;
+
+/**
+ * The value to increment the hash value by.
+ */
+private final long increment;
+
+/**
+ * Constructs the IncrementingHasher from 2 longs.  The long values will 
be interpreted as unsigned values.
+ * 
+ * The initial hash value will be the modulus of the initial value.
+ * Subsequent values will be calculated by repeatedly adding the increment 
to the last value and taking the modulus.
+ * 
+ * @param initial The initial value for the hasher.
+ * @param increment The value to increment the hash by on each iteration.
+ * @see #getDefaultIncrement()
+ */
+IncrementingHasher(long initial, long increment) {
+this.initial = initial;
+this.increment = increment;
+}
+
+@Override
+public IndexProducer indices(final Shape shape) {
+Objects.requireNonNull(shape, "shape");
+
+return new IndexProducer() {
+
+@Override
+public boolean forEachIndex(IntPredicate consumer) {
+Objects.requireNonNull(consumer, "consumer");
+int bits = shape.getNumberOfBits();
+/*
+ * Essentially this is computing a wrapped modulus from a 
start point and an
+ * increment. So actually you only need two modulus operations 
before the loop.
+ * This avoids any modulus operation inside the while loop. It 
uses a long index
+ * to avoid overflow.
+ */
+long index = EnhancedDoubleHasher.mod(initial, bits);
+int inc = EnhancedDoubleHasher.mod(increment, bits);
+
+for (int functionalCount = 0; functionalCount < 
shape.getNumberOfHashFunctions(); functionalCount++) {
+
+if (!consumer.test((int) index)) {
+return false;
+}
+index += inc;
+index = index >= bits ? index - bits : index;
+}
+return true;
+}
+
+@Override
+public int[] asIndexArray() {
+int[] result = new int[shape.getNumberOfHashFunctions()];
+int[] idx = new int[1];
+/*

Review Comment:
   fixed



##
src/main/java/org/apache/commons/collections4/bloomfilter/EnhancedDoubleHasher.java:
##
@@ -0,0 +1,229 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter;
+
+import java.util.Objects;
+import java.util.function.IntPredicate;
+
+/**
+ * A Hasher that implements combinatorial hashing as as described by
+ * https

[GitHub] [commons-collections] Claudenw commented on a diff in pull request #320: Collections 824: Optimize SimpleHasher.forEachIndex and SimpleHasher name change

2022-08-06 Thread GitBox


Claudenw commented on code in PR #320:
URL: 
https://github.com/apache/commons-collections/pull/320#discussion_r939505030


##
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractHasherTest.java:
##
@@ -74,16 +77,28 @@ public void testHashing(int k, int m) {
 });
 assertEquals(k * getHasherSize(hasher), count[0],
 () -> String.format("Did not produce k=%d * m=%d indices", k, 
getHasherSize(hasher)));
+
+// test early exit
+count[0] = 0;
+hasher.indices(Shape.fromKM(k, m)).forEachIndex(i -> {
+assertTrue(i >= 0 && i < m, () -> "Out of range: " + i + ", m=" + 
m);
+count[0]++;
+return false;
+});
+assertEquals(1, count[0], "did not exit early" );
 }
 
 @Test
 public void testUniqueIndex() {
-// create a hasher that produces duplicates with the specified shape.
-// this setup produces 5, 17, 29, 41, 53, 65 two times
-Shape shape = Shape.fromKM(12, 72);
-Hasher hasher = new SimpleHasher(5, 12);
-Set set = new HashSet<>();
-assertTrue(hasher.uniqueIndices(shape).forEachIndex(set::add), 
"Duplicate detected");
-assertEquals(6, set.size());
+// generating 11 numbers in the ragne of [0,9] will yield at least on 
collision.

Review Comment:
   fixed



##
src/test/java/org/apache/commons/collections4/bloomfilter/IncrementingHasher.java:
##
@@ -0,0 +1,102 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter;
+
+import java.util.Objects;
+import java.util.function.IntPredicate;
+
+/**
+ * A Hasher that implements simple combinatorial hashing as as described by
+ * https://www.eecs.harvard.edu/~michaelm/postscripts/tr-02-05.pdf";>Krisch 
and Mitzenmacher.
+ *
+ * To be used for testing only.
+ *
+ * @since 4.5
+ */
+class IncrementingHasher implements Hasher {
+
+/**
+ * The initial hash value.
+ */
+private final long initial;
+
+/**
+ * The value to increment the hash value by.
+ */
+private final long increment;
+
+/**
+ * Constructs the IncrementingHasher from 2 longs.  The long values will 
be interpreted as unsigned values.
+ * 
+ * The initial hash value will be the modulus of the initial value.
+ * Subsequent values will be calculated by repeatedly adding the increment 
to the last value and taking the modulus.
+ * 
+ * @param initial The initial value for the hasher.
+ * @param increment The value to increment the hash by on each iteration.
+ * @see #getDefaultIncrement()
+ */
+IncrementingHasher(long initial, long increment) {
+this.initial = initial;
+this.increment = increment;
+}
+
+@Override
+public IndexProducer indices(final Shape shape) {
+Objects.requireNonNull(shape, "shape");
+
+return new IndexProducer() {
+
+@Override
+public boolean forEachIndex(IntPredicate consumer) {
+Objects.requireNonNull(consumer, "consumer");
+int bits = shape.getNumberOfBits();
+/*
+ * Essentially this is computing a wrapped modulus from a 
start point and an
+ * increment. So actually you only need two modulus operations 
before the loop.
+ * This avoids any modulus operation inside the while loop. It 
uses a long index
+ * to avoid overflow.
+ */
+long index = EnhancedDoubleHasher.mod(initial, bits);
+int inc = EnhancedDoubleHasher.mod(increment, bits);
+
+for (int functionalCount = 0; functionalCount < 
shape.getNumberOfHashFunctions(); functionalCount++) {
+

Review Comment:
   fixed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@commons.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [commons-collections] Claudenw commented on a diff in pull request #320: Collections 824: Optimize SimpleHasher.forEachIndex and SimpleHasher name change

2022-08-06 Thread GitBox


Claudenw commented on code in PR #320:
URL: 
https://github.com/apache/commons-collections/pull/320#discussion_r939504995


##
src/main/java/org/apache/commons/collections4/bloomfilter/EnhancedDoubleHasher.java:
##
@@ -0,0 +1,229 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter;
+
+import java.util.Objects;
+import java.util.function.IntPredicate;
+
+/**
+ * A Hasher that implements combinatorial hashing as as described by
+ * https://www.eecs.harvard.edu/~michaelm/postscripts/tr-02-05.pdf";>Krisch 
and Mitzenmacher using the enhanced double hashing technique
+ * described in the wikipedia article  https://en.wikipedia.org/wiki/Double_hashing#Enhanced_double_hashing";>Double
 Hashing.
+ * 
+ * Common use for this hasher is to generate bit indices from a byte array 
output of a hashing
+ * or MessageDigest algorithm.
+ *
+ * Thoughts on the hasher input
+ *
+ *Note that it is worse to create smaller numbers for the 
initial and increment. If the initial is 
smaller than

Review Comment:
   fixed



##
src/main/java/org/apache/commons/collections4/bloomfilter/EnhancedDoubleHasher.java:
##
@@ -0,0 +1,229 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter;
+
+import java.util.Objects;
+import java.util.function.IntPredicate;
+
+/**
+ * A Hasher that implements combinatorial hashing as as described by
+ * https://www.eecs.harvard.edu/~michaelm/postscripts/tr-02-05.pdf";>Krisch 
and Mitzenmacher using the enhanced double hashing technique
+ * described in the wikipedia article  https://en.wikipedia.org/wiki/Double_hashing#Enhanced_double_hashing";>Double
 Hashing.
+ * 
+ * Common use for this hasher is to generate bit indices from a byte array 
output of a hashing
+ * or MessageDigest algorithm.
+ *
+ * Thoughts on the hasher input
+ *
+ *Note that it is worse to create smaller numbers for the 
initial and increment. If the initial is 
smaller than
+ * the number of bits in a filter then hashing will start at the same point 
when the size increases; likewise the increment will be
+ * the same if it remains smaller than the number of bits in the filter and so 
the first few indices will be the same if the number of bits
+ * changes (but is still larger than the increment). In a worse 
case scenario with small initial and increment for
+ * all items, hashing may not create indices that fill the full region within 
a much larger filter. Imagine hashers created with initial
+ * and increment values less than 255 with a filter size of 3 
and number of hash functions as 5. Ignoring the
+ * tetrahedral addition (a maximum of 20 for k=5) the max index is 255 * 4 + 
255 = 1275, this covers 4.25% of the filter. This also
+ * ignores the negative wrapping but the behaviour is the same, some bits 
cannot be reached.
+ * 
+ * So this needs to be avoided as the filter probability assumptions will be 
void. If the initial and increment are larger
+ * than the number of bits then the modulus will create a 'random' position 
and increment within the size.
+ * 
+ *
+ * @since 4.5
+ */
+public class EnhancedDoubleHasher implements Hasher {
+
+/**
+ * The initial hash value.
+ */
+private final long initial;
+
+/**
+ * The value to increment the hash value by.
+ */
+private final long increment;
+
+/**
+ * Convert bytes to big-endian lon