[jira] [Created] (JEXL-377) Add support for javascript style function definition
Xu Pengcheng created JEXL-377: - Summary: Add support for javascript style function definition Key: JEXL-377 URL: https://issues.apache.org/jira/browse/JEXL-377 Project: Commons JEXL Issue Type: Improvement Reporter: Xu Pengcheng It would be nice to allow JavaScript-style function definitions: {code:java} function add(x, y) { return x + y; }{code} Thanks! -- This message was sent by Atlassian Jira (v8.20.10#820010)
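For comparison, JEXL scripts already accept a lambda form covering the same use case; a sketch of the requested syntax next to the existing arrow syntax (the exact lambda grammar is an assumption based on JEXL 3.2-era scripting, so treat this as illustrative):

{code:java}
// proposed in this ticket:
function add(x, y) { return x + y; }

// existing JEXL lambda form expressing the same thing:
var add = (x, y) -> x + y;
add(2, 3);
{code}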
[jira] [Commented] (CONFIGURATION-793) Error reading a list of complex objects from JSON after 2.3
[ https://issues.apache.org/jira/browse/CONFIGURATION-793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17576215#comment-17576215 ] Gary D. Gregory commented on CONFIGURATION-793: --- In git master, I see that {{testGetList_nested_with_list}} passes and {{testGetList}} fails; nothing jumps out looking at the test in the debugger. Not sure why there is not a case for that. I added {{testGetList_nested_with_list}} to the test class and a new disabled test method. Thoughts from anyone? Note: Your assert parameters are reversed; the pattern is {{assertX(expected, actual)}}. > Error reading a list of complex objects from JSON after 2.3 > --- > > Key: CONFIGURATION-793 > URL: https://issues.apache.org/jira/browse/CONFIGURATION-793 > Project: Commons Configuration > Issue Type: Bug >Affects Versions: 2.3 > Environment: JDK 11.0.4 >Reporter: Cliff Evans >Priority: Major > > We have just attempted to move from version 2.2 to 2.7 and have encountered > an issue whilst attempting to read a List from a JSON formatted > configuration file. > The issue appears to have been introduced in 2.3. > Adding the following tests to > org.apache.commons.configuration2.TestJSONConfiguration will demonstrate the > change. These tests will pass against 2.2 but fail against 2.3 (and 2.7.) > https://issues.apache.org/jira/browse/CONFIGURATION-686 appears to have fixed > access to lists of complex objects using the dot notation for strings (see > testGetProperty_dictionaryInList() in TestJSONConfiguration) but has broken > access to the list of complex objects. 
> > {noformat} > @Test > public void testGetList_nested_with_list() > { > assertEquals(Arrays.asList("col1", "col2"), > jsonConfiguration.getList(String.class, "key4.key5")); > } > @Test > public void testGetList() { > final List<Map> configList = jsonConfiguration.getList(Map.class, > "capitals"); > assertEquals(configList.size(), 2); > assertEquals(configList.get(0).get("country"), "USA"); > assertEquals(configList.get(0).get("capital"), "Washington"); > assertEquals(configList.get(1).get("country"), "UK"); > assertEquals(configList.get(1).get("capital"), "London"); > } > {noformat} > -- This message was sent by Atlassian Jira (v8.20.10#820010)
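For reference, the second test rewritten with the parameters in the conventional {{assertX(expected, actual)}} order. This is a sketch only; it assumes the same {{jsonConfiguration}} fixture as the surrounding test class and is not runnable on its own:

{code:java}
@Test
public void testGetList() {
    final List<Map> configList = jsonConfiguration.getList(Map.class, "capitals");
    // expected value first, actual value second
    assertEquals(2, configList.size());
    assertEquals("USA", configList.get(0).get("country"));
    assertEquals("Washington", configList.get(0).get("capital"));
    assertEquals("UK", configList.get(1).get("country"));
    assertEquals("London", configList.get(1).get("capital"));
}
{code}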
[jira] [Commented] (CONFIGURATION-819) Uncaught snakeyaml.error.YAMLException in YAMLConfiguration.write
[ https://issues.apache.org/jira/browse/CONFIGURATION-819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17576197#comment-17576197 ] Gilles Sadowski commented on CONFIGURATION-819: --- bq. What do others think? So this is the error: {noformat} Java Exception: org.yaml.snakeyaml.error.YAMLException: invalid string value has occurred {noformat} TL;DR: it seems that the caller says "Here is a YAML string" and the YAML implementation replies "No, it's not". How is it different, in principle, from "Here is a non-null reference", and the JVM replying with NPE if it is, in fact, null? I don't agree with the patch, because the cause of the error is *not* an I/O issue; it is a programming error (on the part of the caller). The original exception _is-a_ {{RuntimeException}}, as it should be (as per J. Bloch's "Effective Java"...): There is no reason to wrap it in a checked exception. > Uncaught snakeyaml.error.YAMLException in YAMLConfiguration.write > - > > Key: CONFIGURATION-819 > URL: https://issues.apache.org/jira/browse/CONFIGURATION-819 > Project: Commons Configuration > Issue Type: Bug >Reporter: Weber Jo >Priority: Major > Attachments: 48192.patch, > clusterfuzz-testcase-YAMLConfigurationWriteFuzzer-5634459279425536, > clusterfuzz-testcase-minimized-YAMLConfigurationWriteFuzzer-5634459279425536, > stacktrace.txt > > > When executing YAMLConfiguration.write with malformed input, there is the > possibility to receive a snakeyaml.error.YAMLException which does not get > caught and leads to a crash. > This was found through OSS-Fuzz ([Crash > #48192|https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=48192]). > I attached the stacktrace and the crashing inputs. > Furthermore, I attached a possible fix that suppresses the given crashing > inputs. 
> It passes all unit tests, but I am not sure if it fits your code standards or if > you want to catch the exception earlier (as in YAMLConfiguration.dump). -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (CONFIGURATION-814) Apache Commons Configuration 2.7 used spring version is 4.3.26, Impacted by Vulnerability CVE-2022-22965
[ https://issues.apache.org/jira/browse/CONFIGURATION-814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary D. Gregory closed CONFIGURATION-814. - Fix Version/s: 2.8.0 Resolution: Fixed We released 2.8.0, which depends on Spring 5.3.21. > Apache Commons Configuration 2.7 used spring version is 4.3.26, Impacted by > Vulnerability CVE-2022-22965 > --- > > Key: CONFIGURATION-814 > URL: https://issues.apache.org/jira/browse/CONFIGURATION-814 > Project: Commons Configuration > Issue Type: Bug >Affects Versions: 2.7 >Reporter: Radar wen >Priority: Major > Fix For: 2.8.0 > > > !image-2022-05-11-17-08-50-207.png! Is there a plan to release a new version? -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (CONFIGURATION-818) Stackoverflow bugs fixed in 2.8.0
[ https://issues.apache.org/jira/browse/CONFIGURATION-818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17576192#comment-17576192 ] Gary D. Gregory edited comment on CONFIGURATION-818 at 8/6/22 12:22 PM: No, the reproducer links are useless (to me at least): "You (email=***@gmail.com) are not authorized to access this page!" If I choose my GitHub account, it wants personal information: "oss-fuzz login by oliverchang wants to access your garydgregory account", so no. Example link: https://oss-fuzz.com/download?testcase_id=4832418871246848 You can create PRs on GitHub that demonstrate the problems if you want to move this through, so we can see what the deal is for each link. was (Author: garydgregory): No, the links are useless (to me at least): "You (email=***@gmail.com) are not authorized to access this page!" If I choose my GitHub account, it wants personal information: "oss-fuzz login by oliverchang wants to access your garydgregory account", so no. You can create PRs on GitHub that demonstrate the problems if you want to move this through, so we can see what the deal is for each link. > Stackoverflow bugs fixed in 2.8.0 > - > > Key: CONFIGURATION-818 > URL: https://issues.apache.org/jira/browse/CONFIGURATION-818 > Project: Commons Configuration > Issue Type: Bug >Affects Versions: 2.7 >Reporter: Henry Lin >Priority: Major > Labels: security > Fix For: 2.8.0 > > > Dear Apache Commons Configuration maintainers, > The Code Intelligence JVM fuzzer > [Jazzer|https://github.com/CodeIntelligenceTesting/jazzer] has found multiple > vulnerabilities in Apache Commons Configuration during a fuzzing run in > [Google OSS-Fuzz|https://github.com/google/oss-fuzz]. The vulnerabilities > were already fixed. Version <= 2.7 of Apache Commons Configuration is > vulnerable. 
> Detailed Information can be found here: > [https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=48737] > [https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=48610] > [https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=48522] > [https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=48391] > [https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=48195] > > Please let me know if you have any questions regarding fuzzing or the > OSS-Fuzz integration. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (CONFIGURATION-818) Stackoverflow bugs fixed in 2.8.0
[ https://issues.apache.org/jira/browse/CONFIGURATION-818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary D. Gregory resolved CONFIGURATION-818. --- Resolution: Fixed > Stackoverflow bugs fixed in 2.8.0 > - > > Key: CONFIGURATION-818 > URL: https://issues.apache.org/jira/browse/CONFIGURATION-818 > Project: Commons Configuration > Issue Type: Bug >Affects Versions: 2.7 >Reporter: Henry Lin >Priority: Major > Labels: security > Fix For: 2.8.0 > > > Dear Apache Commons Configuration maintainers, > The Code Intelligence JVM fuzzer > [Jazzer|https://github.com/CodeIntelligenceTesting/jazzer] has found multiple > vulnerabilities in Apache Commons Configuration during a fuzzing run in > [Google OSS-Fuzz|https://github.com/google/oss-fuzz]. The vulnerabilities > were already fixed. Version <= 2.7 of Apache Commons Configuration is > vulnerable. > Detailed Information can be found here: > [https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=48737] > [https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=48610] > [https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=48522] > [https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=48391] > [https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=48195] > > Please let me know if you have any questions regarding fuzzing or the > OSS-Fuzz integration. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (CONFIGURATION-818) Stackoverflow bugs fixed in 2.8.0
[ https://issues.apache.org/jira/browse/CONFIGURATION-818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17576192#comment-17576192 ] Gary D. Gregory commented on CONFIGURATION-818: --- No, the links are useless (to me at least): "You (email=***@gmail.com) are not authorized to access this page!" If I choose my GitHub account, it wants personal information: "oss-fuzz login by oliverchang wants to access your garydgregory account", so no. You can create PRs on GitHub that demonstrate the problems if you want to move this through, so we can see what the deal is for each link. > Stackoverflow bugs fixed in 2.8.0 > - > > Key: CONFIGURATION-818 > URL: https://issues.apache.org/jira/browse/CONFIGURATION-818 > Project: Commons Configuration > Issue Type: Bug >Affects Versions: 2.7 >Reporter: Henry Lin >Priority: Major > Labels: security > Fix For: 2.8.0 > > > Dear Apache Commons Configuration maintainers, > The Code Intelligence JVM fuzzer > [Jazzer|https://github.com/CodeIntelligenceTesting/jazzer] has found multiple > vulnerabilities in Apache Commons Configuration during a fuzzing run in > [Google OSS-Fuzz|https://github.com/google/oss-fuzz]. The vulnerabilities > were already fixed. Version <= 2.7 of Apache Commons Configuration is > vulnerable. > Detailed Information can be found here: > [https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=48737] > [https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=48610] > [https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=48522] > [https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=48391] > [https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=48195] > > Please let me know if you have any questions regarding fuzzing or the > OSS-Fuzz integration. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (CONFIGURATION-819) Uncaught snakeyaml.error.YAMLException in YAMLConfiguration.write
[ https://issues.apache.org/jira/browse/CONFIGURATION-819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17576191#comment-17576191 ] Weber Jo commented on CONFIGURATION-819: Clearly looks like I misunderstood the term "crash". I am sorry for that. I will remember it for future issues. > Uncaught snakeyaml.error.YAMLException in YAMLConfiguration.write > - > > Key: CONFIGURATION-819 > URL: https://issues.apache.org/jira/browse/CONFIGURATION-819 > Project: Commons Configuration > Issue Type: Bug >Reporter: Weber Jo >Priority: Major > Attachments: 48192.patch, > clusterfuzz-testcase-YAMLConfigurationWriteFuzzer-5634459279425536, > clusterfuzz-testcase-minimized-YAMLConfigurationWriteFuzzer-5634459279425536, > stacktrace.txt > > > When executing YAMLConfiguration.write with malformed input, there is the > possibility to receive a snakeyaml.error.YAMLException which does not get > caught and leads to a crash. > This was found through OSS-Fuzz ([Crash > #48192|https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=48192]). > I attached the stacktrace and the crashing inputs. > Furthermore, I attached a possible fix that suppresses the given crashing > inputs. > It passes all unit tests, but I am not sure if fits your code standards or if > you want to catch the exception earlier (as in YAMLConfiguration.dump) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (CONFIGURATION-819) Uncaught snakeyaml.error.YAMLException in YAMLConfiguration.write
[ https://issues.apache.org/jira/browse/CONFIGURATION-819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17576189#comment-17576189 ] Gary D. Gregory commented on CONFIGURATION-819: --- FYI, a "crash" is when the JVM crashes, not when an exception is thrown. So this is neither a "crash" nor a "Major" issue. Some projects/components choose to simply document what exceptions a method throws and leave it at that. I'm not sure what is best here: Rethrow or document? What do others think? > Uncaught snakeyaml.error.YAMLException in YAMLConfiguration.write > - > > Key: CONFIGURATION-819 > URL: https://issues.apache.org/jira/browse/CONFIGURATION-819 > Project: Commons Configuration > Issue Type: Bug >Reporter: Weber Jo >Priority: Major > Attachments: 48192.patch, > clusterfuzz-testcase-YAMLConfigurationWriteFuzzer-5634459279425536, > clusterfuzz-testcase-minimized-YAMLConfigurationWriteFuzzer-5634459279425536, > stacktrace.txt > > > When executing YAMLConfiguration.write with malformed input, there is the > possibility to receive a snakeyaml.error.YAMLException which does not get > caught and leads to a crash. > This was found through OSS-Fuzz ([Crash > #48192|https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=48192]). > I attached the stacktrace and the crashing inputs. > Furthermore, I attached a possible fix that suppresses the given crashing > inputs. > It passes all unit tests, but I am not sure if fits your code standards or if > you want to catch the exception earlier (as in YAMLConfiguration.dump) -- This message was sent by Atlassian Jira (v8.20.10#820010)
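The two options under discussion (document the runtime exception vs. catch and rethrow as a checked exception) can be sketched with plain JDK types. Note the names here are hypothetical stand-ins: {{render}} plays the role of the snakeyaml dumper, and {{IllegalStateException}} stands in for {{YAMLException}} (also a {{RuntimeException}}); this is not the actual Commons Configuration code.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

public class WriteSketch {

    // Hypothetical renderer standing in for the snakeyaml dumper: throws a
    // RuntimeException (as YAMLException does) when the value is unrenderable.
    static String render(Object data) {
        if (data == null) {
            throw new IllegalStateException("invalid value");
        }
        return data.toString();
    }

    // Option 1: document the runtime exception and let it propagate.
    // Javadoc would carry: @throws IllegalStateException if the data
    // cannot be rendered.
    static void writeDocumented(Writer out, Object data) throws IOException {
        out.write(render(data));
    }

    // Option 2: catch the runtime exception and rethrow it wrapped in a
    // checked exception, forcing callers to handle it.
    static void writeWrapped(Writer out, Object data) throws IOException {
        try {
            out.write(render(data));
        } catch (IllegalStateException e) {
            throw new IOException("Unable to render configuration", e);
        }
    }

    public static void main(String[] args) throws IOException {
        StringWriter w = new StringWriter();
        writeDocumented(w, "key: value");
        System.out.println(w);
    }
}
```

Gilles's objection maps onto this sketch directly: option 2 converts a caller-side programming error into a checked I/O failure, which misstates the cause.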
[jira] [Created] (CONFIGURATION-819) Uncaught snakeyaml.error.YAMLException in YAMLConfiguration.write
Weber Jo created CONFIGURATION-819: -- Summary: Uncaught snakeyaml.error.YAMLException in YAMLConfiguration.write Key: CONFIGURATION-819 URL: https://issues.apache.org/jira/browse/CONFIGURATION-819 Project: Commons Configuration Issue Type: Bug Reporter: Weber Jo Attachments: 48192.patch, clusterfuzz-testcase-YAMLConfigurationWriteFuzzer-5634459279425536, clusterfuzz-testcase-minimized-YAMLConfigurationWriteFuzzer-5634459279425536, stacktrace.txt When executing YAMLConfiguration.write with malformed input, it is possible to receive a snakeyaml.error.YAMLException which does not get caught and leads to a crash. This was found through OSS-Fuzz ([Crash #48192|https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=48192]). I attached the stacktrace and the crashing inputs. Furthermore, I attached a possible fix that suppresses the given crashing inputs. It passes all unit tests, but I am not sure if it fits your code standards or if you want to catch the exception earlier (as in YAMLConfiguration.dump). -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [commons-collections] Claudenw commented on a diff in pull request #320: Collections 824: Optimize SimpleHasher.forEachIndex and SimpleHasher name change
Claudenw commented on code in PR #320: URL: https://github.com/apache/commons-collections/pull/320#discussion_r939505118 ## src/main/java/org/apache/commons/collections4/bloomfilter/EnhancedDoubleHasher.java: ## @@ -0,0 +1,229 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.commons.collections4.bloomfilter; + +import java.util.Objects; +import java.util.function.IntPredicate; + +/** + * A Hasher that implements combinatorial hashing as described by + * <a href="https://www.eecs.harvard.edu/~michaelm/postscripts/tr-02-05.pdf">Kirsch and Mitzenmacher</a> using the enhanced double hashing technique + * described in the wikipedia article <a href="https://en.wikipedia.org/wiki/Double_hashing#Enhanced_double_hashing">Double Hashing</a>. + * + * A common use for this hasher is to generate bit indices from a byte array output of a hashing + * or MessageDigest algorithm. + * + * Thoughts on the hasher input + * + * Note that it is worse to use small values for the initial and increment. If the initial is smaller than + * the number of bits in a filter then hashing will start at the same point when the size increases; likewise the increment will be + * the same if it remains smaller than the number of bits in the filter, and so the first few indices will be the same if the number of bits + * changes (but is still larger than the increment). In a worst-case scenario with a small initial and increment for + * all items, hashing may not create indices that fill the full region within a much larger filter. Imagine hashers created with initial + * and increment values less than 255 with a filter size of 30000 and 5 hash functions. Ignoring the + * tetrahedral addition (a maximum of 20 for k=5), the max index is 255 * 4 + 255 = 1275; this covers 4.25% of the filter. This also + * ignores the negative wrapping, but the behaviour is the same: some bits cannot be reached. + * + * So this needs to be avoided as the filter probability assumptions will be void. If the initial and increment are larger + * than the number of bits then the modulus will create a 'random' position and increment within the size. + * + * + * @since 4.5 + */ +public class EnhancedDoubleHasher implements Hasher { + +/** + * The initial hash value. + */ +private final long initial; + +/** + * The value to increment the hash value by. + */ +private final long increment; + +/** + * Convert bytes to a big-endian long, filling with zero bytes as necessary. + * @param byteArray the byte array to extract the values from. + * @param offset the offset to start extraction from. + * @param len the length of the extraction, may be longer than 8. + * @return the extracted bytes as a big-endian long. + */ +private static long toLong(byte[] byteArray, int offset, int len) { +long val = 0; +len = Math.min(len, Long.BYTES); +int shift = Long.SIZE; +for (int i = 0; i < len; i++) { +shift -= Byte.SIZE; +val |= ((long) (byteArray[offset + i] & 0x00FF) << shift); Review Comment: fixed -- This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@commons.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
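Extracted into a standalone sketch, the byte-to-long conversion under review reads the available bytes into the high-order positions of a big-endian long and leaves the rest zero. This repackages the same logic as the hunk above so it can run on its own (class name is illustrative):

```java
public class ToLongSketch {

    /** Convert up to 8 bytes starting at offset into a big-endian long,
     *  leaving the low-order bytes zero when fewer than 8 are available. */
    static long toLong(byte[] byteArray, int offset, int len) {
        long val = 0;
        len = Math.min(len, Long.BYTES);  // never read more than 8 bytes
        int shift = Long.SIZE;
        for (int i = 0; i < len; i++) {
            shift -= Byte.SIZE;           // each byte lands 8 bits lower
            val |= ((long) (byteArray[offset + i] & 0xFF) << shift);
        }
        return val;
    }

    public static void main(String[] args) {
        // two input bytes occupy the top 16 bits of the result
        System.out.println(Long.toHexString(toLong(new byte[] {0x12, 0x34}, 0, 2)));
    }
}
```

Filling from the high end is what the constructor javadoc means by "small arrays generate the largest possible increment and initial values".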
[GitHub] [commons-collections] Claudenw commented on a diff in pull request #320: Collections 824: Optimize SimpleHasher.forEachIndex and SimpleHasher name change
Claudenw commented on code in PR #320: URL: https://github.com/apache/commons-collections/pull/320#discussion_r939505109 ## src/main/java/org/apache/commons/collections4/bloomfilter/EnhancedDoubleHasher.java: ## @@ -0,0 +1,229 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.commons.collections4.bloomfilter; + +import java.util.Objects; +import java.util.function.IntPredicate; + +/** + * A Hasher that implements combinatorial hashing as described by + * <a href="https://www.eecs.harvard.edu/~michaelm/postscripts/tr-02-05.pdf">Kirsch and Mitzenmacher</a> using the enhanced double hashing technique + * described in the wikipedia article <a href="https://en.wikipedia.org/wiki/Double_hashing#Enhanced_double_hashing">Double Hashing</a>. + * + * A common use for this hasher is to generate bit indices from a byte array output of a hashing + * or MessageDigest algorithm. + * + * Thoughts on the hasher input + * + * Note that it is worse to use small values for the initial and increment. If the initial is smaller than + * the number of bits in a filter then hashing will start at the same point when the size increases; likewise the increment will be + * the same if it remains smaller than the number of bits in the filter, and so the first few indices will be the same if the number of bits + * changes (but is still larger than the increment). In a worst-case scenario with a small initial and increment for + * all items, hashing may not create indices that fill the full region within a much larger filter. Imagine hashers created with initial + * and increment values less than 255 with a filter size of 30000 and 5 hash functions. Ignoring the + * tetrahedral addition (a maximum of 20 for k=5), the max index is 255 * 4 + 255 = 1275; this covers 4.25% of the filter. This also + * ignores the negative wrapping, but the behaviour is the same: some bits cannot be reached. + * + * So this needs to be avoided as the filter probability assumptions will be void. If the initial and increment are larger + * than the number of bits then the modulus will create a 'random' position and increment within the size. + * + * + * @since 4.5 + */ +public class EnhancedDoubleHasher implements Hasher { + +/** + * The initial hash value. + */ +private final long initial; + +/** + * The value to increment the hash value by. + */ +private final long increment; + +/** + * Convert bytes to a big-endian long, filling with zero bytes as necessary. + * @param byteArray the byte array to extract the values from. + * @param offset the offset to start extraction from. + * @param len the length of the extraction, may be longer than 8. + * @return the extracted bytes as a big-endian long. + */ +private static long toLong(byte[] byteArray, int offset, int len) { +long val = 0; +len = Math.min(len, Long.BYTES); +int shift = Long.SIZE; +for (int i = 0; i < len; i++) { +shift -= Byte.SIZE; +val |= ((long) (byteArray[offset + i] & 0x00FF) << shift); +} +return val; +} + +/** + * Constructs the EnhancedDoubleHasher from a byte array. + * + * This method simplifies the conversion from a Digest or hasher algorithm output + * to the two values used by the EnhancedDoubleHasher. + * The byte array is split in two and the first 8 bytes of each half are interpreted as a big-endian long value. + * Excess bytes are ignored. + * If there are fewer than 16 bytes the following conversions are made. + * + * + * If there is an odd number of bytes the excess byte is assigned to the increment value. + * The bytes allotted are read in big-endian order; any byte not populated is set to zero. + * + * + * This ensures that small arrays generate the largest possible increment and initial values. + * + * @param buffer the buffer to extract the longs from. + * @throws IllegalArgumentException if buffer length is zero. + */ +public EnhancedDoubleHasher(byte[] buffer) { +if (buffer.length == 0) { +throw new IllegalArgumentException("buffer length must be greater than 0"); +
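The enhanced double hashing scheme this class implements can be sketched standalone: index(i) = initial + i * increment + (i^3 - i)/6, all mod bits, computed incrementally so each round needs only additions. Names here are illustrative, not the PR's API:

```java
public class EnhancedDoubleHashingSketch {

    /** Generate k bit indices in [0, bits) by enhanced double hashing.
     *  The increment grows by the round counter each step, which accumulates
     *  the (i^3 - i)/6 "tetrahedral" term without any multiplication. */
    static int[] indices(long initial, long increment, int k, int bits) {
        int[] result = new int[k];
        // treat the two seed values as unsigned when reducing mod bits
        long index = Long.remainderUnsigned(initial, bits);
        long inc = Long.remainderUnsigned(increment, bits);
        for (int i = 0; i < k; i++) {
            result[i] = (int) index;
            inc = (inc + i) % bits;       // enhanced double hashing step
            index = (index + inc) % bits;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(indices(0, 1, 4, 100)));
    }
}
```

With initial 0 and increment 1 the first four indices are 0, 1, 3, 7, matching the closed form above (e.g. index(3) = 0 + 3 + (27 - 3)/6 = 7).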
[GitHub] [commons-collections] Claudenw commented on a diff in pull request #320: Collections 824: Optimize SimpleHasher.forEachIndex and SimpleHasher name change
Claudenw commented on code in PR #320: URL: https://github.com/apache/commons-collections/pull/320#discussion_r939505077 ## src/test/java/org/apache/commons/collections4/bloomfilter/IncrementingHasher.java: ## @@ -0,0 +1,102 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.commons.collections4.bloomfilter; + +import java.util.Objects; +import java.util.function.IntPredicate; + +/** + * A Hasher that implements simple combinatorial hashing as described by + * <a href="https://www.eecs.harvard.edu/~michaelm/postscripts/tr-02-05.pdf">Kirsch and Mitzenmacher</a>. + * + * To be used for testing only. + * + * @since 4.5 + */ +class IncrementingHasher implements Hasher { + +/** + * The initial hash value. + */ +private final long initial; + +/** + * The value to increment the hash value by. + */ +private final long increment; + +/** + * Constructs the IncrementingHasher from 2 longs. The long values will be interpreted as unsigned values. + * + * The initial hash value will be the modulus of the initial value. + * Subsequent values will be calculated by repeatedly adding the increment to the last value and taking the modulus. + * + * @param initial The initial value for the hasher. 
+ * @param increment The value to increment the hash by on each iteration. + * @see #getDefaultIncrement() + */ +IncrementingHasher(long initial, long increment) { +this.initial = initial; +this.increment = increment; +} + +@Override +public IndexProducer indices(final Shape shape) { +Objects.requireNonNull(shape, "shape"); + +return new IndexProducer() { + +@Override +public boolean forEachIndex(IntPredicate consumer) { +Objects.requireNonNull(consumer, "consumer"); +int bits = shape.getNumberOfBits(); +/* + * Essentially this is computing a wrapped modulus from a start point and an Review Comment: fixed ## src/test/java/org/apache/commons/collections4/bloomfilter/IncrementingHasher.java: ## @@ -0,0 +1,102 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.commons.collections4.bloomfilter; + +import java.util.Objects; +import java.util.function.IntPredicate; + +/** + * A Hasher that implements simple combinatorial hashing as as described by + * https://www.eecs.harvard.edu/~michaelm/postscripts/tr-02-05.pdf";>Krisch and Mitzenmacher. + * + * To be used for testing only. + * + * @since 4.5 + */ +class IncrementingHasher implements Hasher { + +/** + * The initial hash value. 
+ */ +private final long initial; + +/** + * The value to increment the hash value by. + */ +private final long increment; + +/** + * Constructs the IncrementingHasher from 2 longs. The long values will be interpreted as unsigned values. + * + * The initial hash value will be the modulus of the initial value. + * Subsequent values will be calculated by repeatedly adding the increment to the last value and taking the modulus. + * + * @param initial The initial value for the hasher. + * @param increment The value to increment the hash by on each iteration. + * @see #getDefaultIncrement() Review Comment: fixed -- This is an automated message from the Apache Git Service. To respond to the me
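The wrapped-modulus trick described in the inline comment (two modulus operations before the loop, none inside it) can be sketched standalone. Names are illustrative, not the PR's API:

```java
import java.util.function.IntPredicate;

public class WrappedModSketch {

    /** Feed k indices in [0, bits) to the consumer, stopping early if it
     *  returns false.  Only two modulus operations run up front; inside the
     *  loop, wrapping is a compare-and-subtract instead of %. */
    static boolean forEachIndex(long initial, long increment, int k, int bits,
                                IntPredicate consumer) {
        long index = Long.remainderUnsigned(initial, bits);
        long inc = Long.remainderUnsigned(increment, bits);
        for (int i = 0; i < k; i++) {
            if (!consumer.test((int) index)) {
                return false;
            }
            index += inc;
            // index and inc are both < bits, so one subtraction wraps fully
            index = index >= bits ? index - bits : index;
        }
        return true;
    }

    public static void main(String[] args) {
        forEachIndex(2, 3, 4, 5, i -> { System.out.println(i); return true; });
    }
}
```

Because both operands stay below bits after the initial reduction, their sum is below 2 * bits, which is why the single conditional subtraction is a correct substitute for the modulus.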
[GitHub] [commons-collections] Claudenw commented on a diff in pull request #320: Collections 824: Optimize SimpleHasher.forEachIndex and SimpleHasher name change
Claudenw commented on code in PR #320: URL: https://github.com/apache/commons-collections/pull/320#discussion_r939505059 ## src/test/java/org/apache/commons/collections4/bloomfilter/IncrementingHasher.java: ## @@ -0,0 +1,102 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.commons.collections4.bloomfilter; + +import java.util.Objects; +import java.util.function.IntPredicate; + +/** + * A Hasher that implements simple combinatorial hashing as described by + * <a href="https://www.eecs.harvard.edu/~michaelm/postscripts/tr-02-05.pdf">Kirsch and Mitzenmacher</a>. + * + * To be used for testing only. + * + * @since 4.5 + */ +class IncrementingHasher implements Hasher { + +/** + * The initial hash value. + */ +private final long initial; + +/** + * The value to increment the hash value by. + */ +private final long increment; + +/** + * Constructs the IncrementingHasher from 2 longs. The long values will be interpreted as unsigned values. + * + * The initial hash value will be the modulus of the initial value. + * Subsequent values will be calculated by repeatedly adding the increment to the last value and taking the modulus. + * + * @param initial The initial value for the hasher. 
+ * @param increment The value to increment the hash by on each iteration. + * @see #getDefaultIncrement() + */ +IncrementingHasher(long initial, long increment) { +this.initial = initial; +this.increment = increment; +} + +@Override +public IndexProducer indices(final Shape shape) { +Objects.requireNonNull(shape, "shape"); + +return new IndexProducer() { + +@Override +public boolean forEachIndex(IntPredicate consumer) { +Objects.requireNonNull(consumer, "consumer"); +int bits = shape.getNumberOfBits(); +/* + * Essentially this is computing a wrapped modulus from a start point and an + * increment. So actually you only need two modulus operations before the loop. + * This avoids any modulus operation inside the while loop. It uses a long index + * to avoid overflow. + */ +long index = EnhancedDoubleHasher.mod(initial, bits); +int inc = EnhancedDoubleHasher.mod(increment, bits); + +for (int functionalCount = 0; functionalCount < shape.getNumberOfHashFunctions(); functionalCount++) { + +if (!consumer.test((int) index)) { +return false; +} +index += inc; +index = index >= bits ? index - bits : index; +} +return true; +} + +@Override +public int[] asIndexArray() { +int[] result = new int[shape.getNumberOfHashFunctions()]; +int[] idx = new int[1]; +/* Review Comment: fixed ## src/main/java/org/apache/commons/collections4/bloomfilter/EnhancedDoubleHasher.java: ## @@ -0,0 +1,229 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.commons.collections4.bloomfilter; + +import java.util.Objects; +import java.util.function.IntPredicate; + +/** + * A Hasher that implements combinatorial hashing as described by + * https
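The forEachIndex comment quoted above describes a wrapped-modulus trick: take the modulus of the initial value and of the increment once, before the loop, then wrap by a single subtraction instead of a per-iteration modulus. A standalone sketch of that technique follows; the class and method names are illustrative, not the library API, and the unsigned-modulus helper is assumed to behave like `Long.remainderUnsigned`, matching the "interpreted as unsigned values" note in the constructor javadoc.

```java
public class WrappedModDemo {

    /** Unsigned modulus, an assumed stand-in for the EnhancedDoubleHasher.mod seen in the diff. */
    static int mod(long dividend, int divisor) {
        // Long.remainderUnsigned treats the dividend as an unsigned 64-bit value.
        return (int) Long.remainderUnsigned(dividend, divisor);
    }

    /** Produces k indices in [0, bits) starting at initial and stepping by increment. */
    static int[] indices(long initial, long increment, int bits, int k) {
        int[] out = new int[k];
        // Only two modulus operations, both before the loop.
        long index = mod(initial, bits);
        int inc = mod(increment, bits);
        for (int i = 0; i < k; i++) {
            out[i] = (int) index;
            index += inc;
            // Both operands are below bits, so index < 2 * bits always holds
            // and one conditional subtraction replaces a modulus.
            index = index >= bits ? index - bits : index;
        }
        return out;
    }

    public static void main(String[] args) {
        // Same parameters as the testUniqueIndex setup in the diff:
        // initial 5, increment 12, Shape.fromKM(12, 72).
        int[] r = indices(5, 12, 72, 12);
        for (int i = 0; i < r.length; i++) {
            if (r[i] != (int) ((5 + 12L * i) % 72)) {
                throw new AssertionError("mismatch at " + i);
            }
        }
        System.out.println(java.util.Arrays.toString(r));
    }
}
```

With these inputs the sequence cycles through 5, 17, 29, 41, 53, 65 twice, which is exactly why the old test expected a set of size 6 from 12 hash functions.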
[GitHub] [commons-collections] Claudenw commented on a diff in pull request #320: Collections 824: Optimize SimpleHasher.forEachIndex and SimpleHasher name change
Claudenw commented on code in PR #320: URL: https://github.com/apache/commons-collections/pull/320#discussion_r939505030 ## src/test/java/org/apache/commons/collections4/bloomfilter/AbstractHasherTest.java: ## @@ -74,16 +77,28 @@ public void testHashing(int k, int m) { }); assertEquals(k * getHasherSize(hasher), count[0], () -> String.format("Did not produce k=%d * m=%d indices", k, getHasherSize(hasher))); + +// test early exit +count[0] = 0; +hasher.indices(Shape.fromKM(k, m)).forEachIndex(i -> { +assertTrue(i >= 0 && i < m, () -> "Out of range: " + i + ", m=" + m); +count[0]++; +return false; +}); +assertEquals(1, count[0], "did not exit early"); } @Test public void testUniqueIndex() { -// create a hasher that produces duplicates with the specified shape. -// this setup produces 5, 17, 29, 41, 53, 65 two times -Shape shape = Shape.fromKM(12, 72); -Hasher hasher = new SimpleHasher(5, 12); -Set<Integer> set = new HashSet<>(); -assertTrue(hasher.uniqueIndices(shape).forEachIndex(set::add), "Duplicate detected"); -assertEquals(6, set.size()); +// generating 11 numbers in the range of [0,9] will yield at least one collision. Review Comment: fixed ## src/test/java/org/apache/commons/collections4/bloomfilter/IncrementingHasher.java: ## @@ -0,0 +1,102 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.commons.collections4.bloomfilter; + +import java.util.Objects; +import java.util.function.IntPredicate; + +/** + * A Hasher that implements simple combinatorial hashing as described by + * <a href="https://www.eecs.harvard.edu/~michaelm/postscripts/tr-02-05.pdf">Kirsch and Mitzenmacher</a>. + * + * To be used for testing only. + * + * @since 4.5 + */ +class IncrementingHasher implements Hasher { + +/** + * The initial hash value. + */ +private final long initial; + +/** + * The value to increment the hash value by. + */ +private final long increment; + +/** + * Constructs the IncrementingHasher from 2 longs. The long values will be interpreted as unsigned values. + * + * The initial hash value will be the modulus of the initial value. + * Subsequent values will be calculated by repeatedly adding the increment to the last value and taking the modulus. + * + * @param initial The initial value for the hasher. + * @param increment The value to increment the hash by on each iteration. + * @see #getDefaultIncrement() + */ +IncrementingHasher(long initial, long increment) {
+this.initial = initial; +this.increment = increment; +} + +@Override +public IndexProducer indices(final Shape shape) { +Objects.requireNonNull(shape, "shape"); + +return new IndexProducer() { + +@Override +public boolean forEachIndex(IntPredicate consumer) { +Objects.requireNonNull(consumer, "consumer"); +int bits = shape.getNumberOfBits(); +/* + * Essentially this is computing a wrapped modulus from a start point and an + * increment. So actually you only need two modulus operations before the loop. + * This avoids any modulus operation inside the while loop. It uses a long index + * to avoid overflow. 
+ */ +long index = EnhancedDoubleHasher.mod(initial, bits); +int inc = EnhancedDoubleHasher.mod(increment, bits); + +for (int functionalCount = 0; functionalCount < shape.getNumberOfHashFunctions(); functionalCount++) { + Review Comment: fixed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@commons.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
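The rewritten `testUniqueIndex` above leans on the pigeonhole principle: drawing 11 values from the range [0,9] must produce at least one duplicate, so `uniqueIndices` has to report a collision. A hedged sketch of that duplicate check follows; the class and method names are hypothetical, and `Set.add` returning `false` on a repeat mirrors how `forEachIndex(set::add)` exits early in the test.

```java
import java.util.HashSet;
import java.util.Set;

public class PigeonholeDemo {

    /** Returns true if the values contain at least one duplicate. */
    static boolean hasCollision(int[] values) {
        Set<Integer> seen = new HashSet<>();
        for (int v : values) {
            if (!seen.add(v)) {
                // Set.add returns false on a duplicate, the same signal
                // forEachIndex(set::add) uses to stop iterating.
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // 11 deterministic values in [0, 10): a collision is unavoidable.
        int[] eleven = new int[11];
        for (int i = 0; i < 11; i++) {
            eleven[i] = (i * 7) % 10;
        }
        if (!hasCollision(eleven)) {
            throw new AssertionError("pigeonhole principle violated");
        }
    }
}
```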
[GitHub] [commons-collections] Claudenw commented on a diff in pull request #320: Collections 824: Optimize SimpleHasher.forEachIndex and SimpleHasher name change
Claudenw commented on code in PR #320: URL: https://github.com/apache/commons-collections/pull/320#discussion_r939504995 ## src/main/java/org/apache/commons/collections4/bloomfilter/EnhancedDoubleHasher.java: ## @@ -0,0 +1,229 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.commons.collections4.bloomfilter; + +import java.util.Objects; +import java.util.function.IntPredicate; + +/** + * A Hasher that implements combinatorial hashing as described by + * <a href="https://www.eecs.harvard.edu/~michaelm/postscripts/tr-02-05.pdf">Kirsch and Mitzenmacher</a> using the enhanced double hashing technique + * described in the Wikipedia article <a href="https://en.wikipedia.org/wiki/Double_hashing#Enhanced_double_hashing">Double Hashing</a>. + * + * Common use for this hasher is to generate bit indices from a byte array output of a hashing + * or MessageDigest algorithm. + * + * Thoughts on the hasher input + * + *Note that it is worse to create smaller numbers for the initial and increment. 
If the initial is smaller than Review Comment: fixed ## src/main/java/org/apache/commons/collections4/bloomfilter/EnhancedDoubleHasher.java: ## @@ -0,0 +1,229 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.commons.collections4.bloomfilter; + +import java.util.Objects; +import java.util.function.IntPredicate; + +/** + * A Hasher that implements combinatorial hashing as described by + * <a href="https://www.eecs.harvard.edu/~michaelm/postscripts/tr-02-05.pdf">Kirsch and Mitzenmacher</a> using the enhanced double hashing technique + * described in the Wikipedia article <a href="https://en.wikipedia.org/wiki/Double_hashing#Enhanced_double_hashing">Double Hashing</a>. + * + * Common use for this hasher is to generate bit indices from a byte array output of a hashing + * or MessageDigest algorithm. + * + * Thoughts on the hasher input + * + *Note that it is worse to create smaller numbers for the initial and increment. 
If the initial is smaller than + * the number of bits in a filter then hashing will start at the same point when the size increases; likewise the increment will be + * the same if it remains smaller than the number of bits in the filter and so the first few indices will be the same if the number of bits + * changes (but is still larger than the increment). In a worst-case scenario with small initial and increment for + * all items, hashing may not create indices that fill the full region within a much larger filter. Imagine hashers created with initial + * and increment values less than 255 with a filter size of 30000 and number of hash functions as 5. Ignoring the + * tetrahedral addition (a maximum of 20 for k=5) the max index is 255 * 4 + 255 = 1275; this covers 4.25% of the filter. This also + * ignores the negative wrapping but the behaviour is the same: some bits cannot be reached. + * + * So this needs to be avoided as the filter probability assumptions will be void. If the initial and increment are larger + * than the number of bits then the modulus will create a 'random' position and increment within the size. + * + * + * @since 4.5 + */ +public class EnhancedDoubleHasher implements Hasher { + +/** + * The initial hash value. + */ +private final long initial; + +/** + * The value to increment the hash value by. + */ +private final long increment; + +/** + * Convert bytes to big-endian lon
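The "tetrahedral addition" mentioned in the javadoc above comes from the enhanced double hashing recurrence in the cited Wikipedia article, where the i-th index is h1 + i*h2 + (i^3 - i)/6 (mod m). The sketch below uses that closed form rather than the library's incremental implementation; the class and method names are illustrative, and it treats h1 and h2 as unsigned, matching the javadoc's framing.

```java
public class EnhancedDoubleHashDemo {

    /** Closed-form enhanced double hashing: index_i = h1 + i*h2 + (i^3 - i)/6 (mod m). */
    static int[] indices(long h1, long h2, int m, int k) {
        int[] out = new int[k];
        for (long i = 0; i < k; i++) {
            // (i^3 - i)/6 is the i-th tetrahedral term; it is always an integer.
            long tetra = (i * i * i - i) / 6;
            out[(int) i] = (int) Long.remainderUnsigned(h1 + i * h2 + tetra, m);
        }
        return out;
    }

    public static void main(String[] args) {
        // Illustrating the javadoc's warning: with h1 and h2 both under 255,
        // k = 5, and a 30000-bit filter, no index can exceed
        // 254 + 4 * 254 + (4^3 - 4) / 6 = 1280, so most of the filter
        // is unreachable and the false-positive assumptions break down.
        int[] r = indices(254, 254, 30000, 5);
        for (int idx : r) {
            if (idx < 0 || idx > 1280) {
                throw new AssertionError("unexpected index " + idx);
            }
        }
    }
}
```

Using a modulus of large (effectively random) h1 and h2 values instead spreads the start point and stride across the whole bit range, which is the remedy the javadoc recommends.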