[GitHub] [commons-lang] coveralls edited a comment on issue #501: Expand Streams functionality
coveralls edited a comment on issue #501: Expand Streams functionality URL: https://github.com/apache/commons-lang/pull/501#issuecomment-595538858 [![Coverage Status](https://coveralls.io/builds/29203361/badge)](https://coveralls.io/builds/29203361) Coverage decreased (-0.2%) to 94.86% when pulling **d0376b51c6bd715a73b25ffc1db75fc4a32e6f86 on Isira-Seneviratne:master** into **fde46a232d82f2b746f62bc7546e2e3371f20dca on apache:master**. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Comment Edited] (MATH-1516) Define an interface for ranking a list of clusters
[ https://issues.apache.org/jira/browse/MATH-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17054258#comment-17054258 ]

Chen Tao edited comment on MATH-1516 at 3/8/20, 2:28 AM:

{quote}I don't understand what you mean.{quote}
If the evaluator does not implement isBetterScore itself, then the application program may have to hard-code the evaluator name in order to tell whether a higher or a lower score is better.

{quote}Then the ranking function could be{quote}
IMHO, each ClusterRanking implementation should follow its *mathematical definition*; otherwise the application program's design becomes dependent on the specific component.
In practice, the algorithm research for the application is done in Python, and the Python code is then translated into Java for production. Without the same scoring system, it is hard to judge whether the Java version is as good as the Python version.

I think there should be a default method in "ClusterRanking":
{code:java}
@FunctionalInterface
public interface ClusterRanking {
    /**
     * Computes the rank.
     *
     * @param clusters Clusters to be evaluated.
     * @return the rank of the provided {@code clusters}.
     */
    double compute(List<? extends Cluster<? extends Clusterable>> clusters);

    /**
     * Returns whether the first evaluation score is considered to be better
     * than the second one by this evaluator.
     *
     * The default logic is the higher the better.
     *
     * Specific implementations shall override this method if the returned scores
     * do not follow the same ordering, i.e. lower score is better.
     *
     * @param score1 the first score
     * @param score2 the second score
     * @return {@code true} if the first score is considered to be better, {@code false} otherwise
     */
    default boolean isBetterScore(double score1, double score2) {
        return score1 > score2;
    }
}
{code}

was (Author: chentao106):

{quote}I don't understand what you mean.{quote}
If the evaluator does not implement isBetterScore itself, then the application program may have to hard-code the evaluator name in order to tell whether a higher or a lower score is better.

{quote}Then the ranking function could be{quote}
IMHO, each ClusterRanking implementation should follow its mathematical definition; otherwise the application program's design becomes dependent on the specific component.
In practice, the algorithm research for the application is done in Python, and the Python code is then translated into Java for production. Without the same scoring system, it is hard to judge whether the Java version is as good as the Python version.

I think there should be a default method in "ClusterRanking":
{code:java}
@FunctionalInterface
public interface ClusterRanking {
    /**
     * Computes the rank.
     *
     * @param clusters Clusters to be evaluated.
     * @return the rank of the provided {@code clusters}.
     */
    double compute(List<? extends Cluster<? extends Clusterable>> clusters);

    /**
     * Returns whether the first evaluation score is considered to be better
     * than the second one by this evaluator.
     *
     * The default logic is the higher the better.
     *
     * Specific implementations shall override this method if the returned scores
     * do not follow the same ordering, i.e. lower score is better.
     *
     * @param score1 the first score
     * @param score2 the second score
     * @return {@code true} if the first score is considered to be better, {@code false} otherwise
     */
    default boolean isBetterScore(double score1, double score2) {
        return score1 > score2;
    }
}
{code}

> Define an interface for ranking a list of clusters
> --------------------------------------------------
>
> Key: MATH-1516
> URL: https://issues.apache.org/jira/browse/MATH-1516
> Project: Commons Math
> Issue Type: Sub-task
> Reporter: Gilles Sadowski
> Assignee: Gilles Sadowski
> Priority: Minor
> Fix For: 4.0
>
> [On the "dev" ML|https://markmail.org/message/z4qr3fcsg5emt2nn] it has been
> suggested to create a functional interface for unequivocally defining the
> quality of a clustering:
> * a valid ranking must be positive,
> * better clustering is conveyed through higher ranking.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
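To make the proposal in this comment concrete, here is a minimal sketch of how an evaluator whose score is "lower is better" might override the proposed default method. The names `RankingSketch` and `SumOfVariancesSketch` are hypothetical, and a cluster is simplified to a `double[]` of one-dimensional points; none of this is the Commons Math API.

```java
import java.util.List;

/** Hypothetical, simplified stand-in for the proposed interface (a cluster is a double[] of 1-D points). */
interface RankingSketch {
    double compute(List<double[]> clusters);

    /** Default ordering, as in the proposal: higher is better. */
    default boolean isBetterScore(double score1, double score2) {
        return score1 > score2;
    }
}

/** Hypothetical "lower is better" evaluator: sum of the per-cluster variances. */
final class SumOfVariancesSketch implements RankingSketch {
    @Override
    public double compute(List<double[]> clusters) {
        double total = 0;
        for (double[] cluster : clusters) {
            total += variance(cluster);
        }
        return total;
    }

    /** Overrides the default because a smaller total variance is better. */
    @Override
    public boolean isBetterScore(double score1, double score2) {
        return score1 < score2;
    }

    /** Population variance of the values; an empty cluster contributes zero. */
    private static double variance(double[] values) {
        if (values.length == 0) {
            return 0;
        }
        double mean = 0;
        for (double v : values) {
            mean += v;
        }
        mean /= values.length;
        double sumSq = 0;
        for (double v : values) {
            sumSq += (v - mean) * (v - mean);
        }
        return sumSq / values.length;
    }
}
```

With this shape, calling code can compare two scores through `isBetterScore` without knowing which evaluator produced them, which is the point the comment argues for.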
[jira] [Comment Edited] (MATH-1516) Define an interface for ranking a list of clusters
[ https://issues.apache.org/jira/browse/MATH-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17054258#comment-17054258 ]

Chen Tao edited comment on MATH-1516 at 3/8/20, 2:22 AM:

{quote}I don't understand what you mean.{quote}
If the evaluator does not implement isBetterScore itself, then the application program may have to hard-code the evaluator name in order to tell whether a higher or a lower score is better.

{quote}Then the ranking function could be{quote}
IMHO, each ClusterRanking implementation should follow its mathematical definition; otherwise the application program's design becomes dependent on the specific component.
In practice, the algorithm research for the application is done in Python, and the Python code is then translated into Java for production. Without the same scoring system, it is hard to judge whether the Java version is as good as the Python version.

I think there should be a default method in "ClusterRanking":
{code:java}
@FunctionalInterface
public interface ClusterRanking {
    /**
     * Computes the rank.
     *
     * @param clusters Clusters to be evaluated.
     * @return the rank of the provided {@code clusters}.
     */
    double compute(List<? extends Cluster<? extends Clusterable>> clusters);

    /**
     * Returns whether the first evaluation score is considered to be better
     * than the second one by this evaluator.
     *
     * The default logic is the higher the better.
     *
     * Specific implementations shall override this method if the returned scores
     * do not follow the same ordering, i.e. lower score is better.
     *
     * @param score1 the first score
     * @param score2 the second score
     * @return {@code true} if the first score is considered to be better, {@code false} otherwise
     */
    default boolean isBetterScore(double score1, double score2) {
        return score1 > score2;
    }
}
{code}

was (Author: chentao106):

??I don't understand what you mean.??
If the evaluator does not implement isBetterScore itself, then the application program may have to hard-code the evaluator name in order to tell whether a higher or a lower score is better.

??Then the ranking function could be??
IMHO, each ClusterRanking implementation should follow its mathematical definition; otherwise the application program's design becomes dependent on the specific component.

I think there should be a default method in "ClusterRanking":
{code:java}
@FunctionalInterface
public interface ClusterRanking {
    /**
     * Computes the rank.
     *
     * @param clusters Clusters to be evaluated.
     * @return the rank of the provided {@code clusters}.
     */
    double compute(List<? extends Cluster<? extends Clusterable>> clusters);

    /**
     * Returns whether the first evaluation score is considered to be better
     * than the second one by this evaluator.
     *
     * The default logic is the higher the better.
     *
     * Specific implementations shall override this method if the returned scores
     * do not follow the same ordering, i.e. lower score is better.
     *
     * @param score1 the first score
     * @param score2 the second score
     * @return {@code true} if the first score is considered to be better, {@code false} otherwise
     */
    default boolean isBetterScore(double score1, double score2) {
        return score1 > score2;
    }
}
{code}
[jira] [Commented] (MATH-1516) Define an interface for ranking a list of clusters
[ https://issues.apache.org/jira/browse/MATH-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17054258#comment-17054258 ]

Chen Tao commented on MATH-1516:

??I don't understand what you mean.??
If the evaluator does not implement isBetterScore itself, then the application program may have to hard-code the evaluator name in order to tell whether a higher or a lower score is better.

??Then the ranking function could be??
IMHO, each ClusterRanking implementation should follow its mathematical definition; otherwise the application program's design becomes dependent on the specific component.

I think there should be a default method in "ClusterRanking":
{code:java}
@FunctionalInterface
public interface ClusterRanking {
    /**
     * Computes the rank.
     *
     * @param clusters Clusters to be evaluated.
     * @return the rank of the provided {@code clusters}.
     */
    double compute(List<? extends Cluster<? extends Clusterable>> clusters);

    /**
     * Returns whether the first evaluation score is considered to be better
     * than the second one by this evaluator.
     *
     * The default logic is the higher the better.
     *
     * Specific implementations shall override this method if the returned scores
     * do not follow the same ordering, i.e. lower score is better.
     *
     * @param score1 the first score
     * @param score2 the second score
     * @return {@code true} if the first score is considered to be better, {@code false} otherwise
     */
    default boolean isBetterScore(double score1, double score2) {
        return score1 > score2;
    }
}
{code}
[jira] [Resolved] (MATH-1518) Method to compute centroid is duplicated
[ https://issues.apache.org/jira/browse/MATH-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gilles Sadowski resolved MATH-1518.
Resolution: Done

Commit aafc49afd764351668045c0fadf2ad6222490ba7 ("master" branch).

> Method to compute centroid is duplicated
> ----------------------------------------
>
> Key: MATH-1518
> URL: https://issues.apache.org/jira/browse/MATH-1518
> Project: Commons Math
> Issue Type: Improvement
> Reporter: Gilles Sadowski
> Assignee: Gilles Sadowski
> Priority: Minor
> Fix For: 4.0
>
> Below package {{o.a.c.math4.ml.clustering}}, similar code is present in
> {{ClusterEvaluator}} and in {{KMeansPlusPlusClusterer}}.
> Duplication could be removed if the method to compute the centroid is moved
> to class {{Cluster}}.
[jira] [Created] (MATH-1518) Method to compute centroid is duplicated
Gilles Sadowski created MATH-1518:

Summary: Method to compute centroid is duplicated
Key: MATH-1518
URL: https://issues.apache.org/jira/browse/MATH-1518
Project: Commons Math
Issue Type: Improvement
Reporter: Gilles Sadowski
Assignee: Gilles Sadowski
Fix For: 4.0

Below package {{o.a.c.math4.ml.clustering}}, similar code is present in {{ClusterEvaluator}} and in {{KMeansPlusPlusClusterer}}.
Duplication could be removed if the method to compute the centroid is moved to class {{Cluster}}.
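As an illustration of the refactoring suggested in MATH-1518, the sketch below shows a single shared centroid computation that two callers could both delegate to. `CentroidSketch` and its `centroid` method are hypothetical names, and points are simplified to `double[]` coordinates; the commit referenced in the issue may be organised differently.

```java
import java.util.List;

/** Hypothetical single home for the centroid computation (not the Commons Math class). */
final class CentroidSketch {
    private CentroidSketch() {
    }

    /**
     * Computes the component-wise mean of the given points.
     *
     * @param points Points of one cluster; must be non-empty and share one dimension.
     * @return the centroid coordinates.
     */
    static double[] centroid(List<double[]> points) {
        if (points.isEmpty()) {
            throw new IllegalArgumentException("Cannot compute the centroid of an empty cluster");
        }
        final int dimension = points.get(0).length;
        final double[] sum = new double[dimension];
        for (double[] point : points) {
            if (point.length != dimension) {
                throw new IllegalArgumentException("Inconsistent point dimension");
            }
            for (int i = 0; i < dimension; i++) {
                sum[i] += point[i];
            }
        }
        // Divide the accumulated sums by the number of points to get the mean.
        for (int i = 0; i < dimension; i++) {
            sum[i] /= points.size();
        }
        return sum;
    }
}
```

Both an evaluator and a clusterer could then call this one method instead of each carrying its own copy of the loop.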
[GitHub] [commons-collections] aherbert commented on a change in pull request #137: WIP: CountingBloomFilter
aherbert commented on a change in pull request #137: WIP: CountingBloomFilter
URL: https://github.com/apache/commons-collections/pull/137#discussion_r389322748

## File path: src/main/java/org/apache/commons/collections4/bloomfilter/ArrayCountingBloomFilter.java ##

@@ -0,0 +1,396 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter;
+
+import java.util.BitSet;
+import java.util.HashSet;
+import java.util.NoSuchElementException;
+import java.util.PrimitiveIterator;
+import java.util.PrimitiveIterator.OfInt;
+import java.util.function.Consumer;
+import java.util.function.IntConsumer;
+import java.util.Set;
+
+import org.apache.commons.collections4.bloomfilter.hasher.Hasher;
+import org.apache.commons.collections4.bloomfilter.hasher.Shape;
+import org.apache.commons.collections4.bloomfilter.hasher.StaticHasher;
+
+/**
+ * A counting Bloom filter using an array to track counts for each enabled bit
+ * index.
+ *
+ * Any operation that results in negative counts or integer overflow of counts will
+ * mark this filter as invalid. This transition is not reversible. The counts for the
+ * filter immediately prior to the operation that create invalid counts can be recovered.
+ * See the documentation in {@link #isValid()} for details.
+ *
+ * All the operations in the filter assume the counts are currently valid. Behaviour
+ * of an invalid filter is undefined. It will no longer function identically to a standard
+ * Bloom filter that is the merge of all the Bloom filters that have been added
+ * to and not later subtracted from the counting Bloom filter.
+ *
+ * The maximum supported number of items that can be stored in the filter is
+ * limited by the maximum array size combined with the {@link Shape}. For
+ * example an implementation using a {@link Shape} with a false-positive
+ * probability of 1e-6 and {@link Integer#MAX_VALUE} bits can reversibly store
+ * approximately 75 million items using 20 hash functions per item with a memory
+ * consumption of approximately 8 GB.
+ *
+ * @since 4.5
+ * @see Shape
+ */
+public class ArrayCountingBloomFilter extends AbstractBloomFilter implements CountingBloomFilter {
+
+    /**
+     * The count of each bit index in the filter.
+     */
+    private final int[] counts;
+
+    /**
+     * The state flag. This is a bitwise OR of the entire history of all updated
+     * counts. If negative then a negative count or integer overflow has occurred on
+     * one or more counts in the history of the filter and the state is invalid.
+     *
+     * Maintenance of this state flag is branch-free for improved performance. It
+     * eliminates a conditional check for a negative count during remove/subtract
+     * operations and a conditional check for integer overflow during merge/add
+     * operations.
+     *
+     * Note: Integer overflow is unlikely in realistic usage scenarios. A count
+     * that overflows indicates that the number of items in the filter exceeds the
+     * maximum possible size (number of bits) of any Bloom filter constrained by
+     * integer indices. At this point the filter is most likely full (all bits are
+     * non-zero) and thus useless.
+     *
+     * Negative counts are a concern if the filter is used incorrectly by
+     * removing an item that was never added. It is expected that a user of a
+     * counting Bloom filter will not perform this action as it is a mistake.
+     * Enabling an explicit recovery path for negative or overflow counts is a major
+     * performance burden not deemed necessary for the unlikely scenarios when an
+     * invalid state is created. Maintenance of the state flag is a concession to
+     * flag improper use that should not have a major performance impact.
+     */
+    private int state;
+
+    /**
+     * An iterator of all indexes with non-zero counts.
+     *
+     * In the event that the filter state is invalid any index with a negative count
+     * will also be produced by the iterator.
+     */
+    private class IndexIterator implements PrimitiveIterator.OfInt {
+        /** The next non-zero index (or counts.length). */
+        private int ne
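The branch-free state flag described in the Javadoc for the `state` field can be sketched in isolation as follows. This is an illustrative reduction, not the code under review: `CountStateSketch` and its method names are hypothetical, and the amounts passed in are assumed to be non-negative.

```java
/** Hypothetical reduction of the "bitwise OR of all updated counts" idea. */
final class CountStateSketch {
    private final int[] counts;
    /** OR of every count ever written; the sign bit is set once any count went negative. */
    private int state;

    CountStateSketch(int size) {
        counts = new int[size];
    }

    /** Adds a non-negative amount; overflow wraps to a negative value and taints the state. */
    void add(int index, int amount) {
        final int updated = counts[index] + amount;
        state |= updated; // no branch: the sign bit sticks once set
        counts[index] = updated;
    }

    /** Subtracts a non-negative amount; going below zero taints the state. */
    void subtract(int index, int amount) {
        final int updated = counts[index] - amount;
        state |= updated;
        counts[index] = updated;
    }

    /** Valid only if no count was ever negative or overflowed. */
    boolean isValid() {
        return state >= 0;
    }

    int count(int index) {
        return counts[index];
    }
}
```

Because the sign bit can only be set and never cleared by an OR, the flag stays invalid even if a later operation brings the offending count back to a non-negative value, which matches the "not reversible" wording in the Javadoc.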
[jira] [Created] (MATH-1517) "Cluster" leaks internal data
Gilles Sadowski created MATH-1517:

Summary: "Cluster" leaks internal data
Key: MATH-1517
URL: https://issues.apache.org/jira/browse/MATH-1517
Project: Commons Math
Issue Type: Bug
Reporter: Gilles Sadowski
Fix For: 4.0

Method {{getPoints()}} returns a reference to a mutable instance field.
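The kind of leak reported in MATH-1517, and one common way to close it, can be sketched as below. `ClusterSketch` is a hypothetical class, and returning an unmodifiable view is only one possible remedy (a defensive copy is another); the issue does not say which fix was chosen.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical cluster holding its points in a mutable list. */
final class ClusterSketch<T> {
    private final List<T> points = new ArrayList<>();

    void addPoint(T point) {
        points.add(point);
    }

    /** Leaky accessor: callers receive the internal list and can modify it. */
    List<T> getPointsLeaky() {
        return points;
    }

    /** Safer accessor: callers receive a read-only view of the internal list. */
    List<T> getPoints() {
        return Collections.unmodifiableList(points);
    }
}
```

With the leaky accessor, `cluster.getPointsLeaky().add(x)` silently changes the cluster; with the read-only view the same call throws `UnsupportedOperationException`.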
[GitHub] [commons-collections] aherbert commented on a change in pull request #137: WIP: CountingBloomFilter
aherbert commented on a change in pull request #137: WIP: CountingBloomFilter
URL: https://github.com/apache/commons-collections/pull/137#discussion_r389319599

## File path: src/main/java/org/apache/commons/collections4/bloomfilter/ArrayCountingBloomFilter.java ##
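The capacity figures quoted in the `ArrayCountingBloomFilter` class Javadoc (false-positive probability 1e-6, `Integer.MAX_VALUE` bits, about 75 million items, 20 hash functions, about 8 GB) can be cross-checked with the textbook Bloom filter formulas n = m (ln 2)^2 / (-ln p) and k = (m / n) ln 2. The sketch below assumes those standard formulas and one 4-byte `int` count per bit index; `BloomCapacitySketch` is a hypothetical helper, and the pull request's `Shape` class may compute its values differently.

```java
/** Hypothetical helper applying the standard Bloom filter sizing formulas. */
final class BloomCapacitySketch {
    private BloomCapacitySketch() {
    }

    /** Number of items n for m bits at false-positive probability p: n = m (ln 2)^2 / (-ln p). */
    static double maxItems(long bits, double probability) {
        final double ln2 = Math.log(2);
        return bits * ln2 * ln2 / -Math.log(probability);
    }

    /** Optimal number of hash functions k = (m / n) ln 2, rounded to the nearest integer. */
    static long hashFunctions(long bits, double items) {
        return Math.round(bits / items * Math.log(2));
    }

    /** Memory for one int count per bit index, in bytes. */
    static long countArrayBytes(long bits) {
        return bits * Integer.BYTES;
    }
}
```

For `Integer.MAX_VALUE` bits and p = 1e-6 these formulas give roughly 74.7 million items, 20 hash functions, and 8,589,934,588 bytes for the count array, consistent with the rounded figures in the Javadoc.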
[jira] [Comment Edited] (MATH-1516) Define an interface for ranking a list of clusters
[ https://issues.apache.org/jira/browse/MATH-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17054174#comment-17054174 ]

Gilles Sadowski edited comment on MATH-1516 at 3/7/20, 7:47 PM:

{quote}the evaluator algorithm has the responsibility to isolate the rank rule.{quote}
I don't understand what you mean.
Anyways, whatever the "algorithm" produces as a _score_, it is possible to *define* a _rank_ as a function of that score but with the requirement that "higher is better". If the resulting value is only used to tell which of two clusterings is better (and not *how much* better), the relative amplitudes do not matter as long as the ordering is the same.

{quote}SumOfClusterVariances, the score is the lower the better.{quote}
Then the ranking function could be
{code:java}
@Override
public double compute(List<? extends Cluster<? extends Clusterable>> cList) {
    return 1 / score(cList);
}
{code}
to fulfill the requirement imposed by the new {{ClusterRanking}} interface.

was (Author: erans):

bq. the evaluator algorithm has the responsibility to isolate the rank rule.
I don't understand what you mean.
Anyways, whatever the "algorithm" produces as a _score_, it is possible to *define* a _rank_ as a function of that score but with the requirement that "higher is better". If the resulting value is only used to tell which of two clusterings is better (and not *how much* better), the relative amplitudes do not matter as long as the ordering is the same.

bq. SumOfClusterVariances, the score is the lower the better.
Then the ranking function could be
{code}
@Override
public double compute(List cList) {
    return 1 / score(cList);
}
{code}
to fulfill the requirement imposed by the new {{ClusterRanking}} interface.
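The `1 / score(cList)` suggestion relies on the reciprocal being strictly decreasing for positive values, so a lower score always maps to a higher rank and only the ordering is used. A minimal sketch of that argument follows; `InverseRankSketch` is a hypothetical helper, and rejecting non-positive scores is an added assumption (a score of zero would otherwise produce an infinite rank).

```java
/** Hypothetical helper turning a "lower is better" score into a "higher is better" rank. */
final class InverseRankSketch {
    private InverseRankSketch() {
    }

    /** Reciprocal of a strictly positive score; the guard is an assumption of this sketch. */
    static double rank(double score) {
        if (!(score > 0)) {
            throw new IllegalArgumentException("Score must be strictly positive: " + score);
        }
        return 1 / score;
    }

    /** True if the clustering with scoreA should be preferred over the one with scoreB. */
    static boolean firstIsBetter(double scoreA, double scoreB) {
        return rank(scoreA) > rank(scoreB);
    }
}
```

For example, scores of 2.0 and 8.0 become ranks of 0.5 and 0.125, so the clustering with the smaller variance sum wins under the "higher is better" rule.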
[jira] [Commented] (MATH-1516) Define an interface for ranking a list of clusters
[ https://issues.apache.org/jira/browse/MATH-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17054187#comment-17054187 ]

Gilles Sadowski commented on MATH-1516:

Please have a look at the changes (commit d64d4c9c02ae487e90e23d40f945cf78991ff1e0 in "master" branch).
[jira] [Commented] (MATH-1516) Define an interface for ranking a list of clusters
[ https://issues.apache.org/jira/browse/MATH-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17054174#comment-17054174 ]

Gilles Sadowski commented on MATH-1516:

bq. the evaluator algorithm has the responsibility to isolate the rank rule.
I don't understand what you mean.
Anyways, whatever the "algorithm" produces as a _score_, it is possible to *define* a _rank_ as a function of that score but with the requirement that "higher is better". If the resulting value is only used to tell which of two clusterings is better (and not *how much* better), the relative amplitudes do not matter as long as the ordering is the same.

bq. SumOfClusterVariances, the score is the lower the better.
Then the ranking function could be
{code}
@Override
public double compute(List cList) {
    return 1 / score(cList);
}
{code}
to fulfill the requirement imposed by the new {{ClusterRanking}} interface.
[GitHub] [commons-collections] Claudenw commented on issue #137: WIP: CountingBloomFilter
Claudenw commented on issue #137: WIP: CountingBloomFilter
URL: https://github.com/apache/commons-collections/pull/137#issuecomment-596121410

All in all this looks like a good change to me.
[GitHub] [commons-collections] Claudenw commented on a change in pull request #137: WIP: CountingBloomFilter
Claudenw commented on a change in pull request #137: WIP: CountingBloomFilter
URL: https://github.com/apache/commons-collections/pull/137#discussion_r389299774

## File path: src/main/java/org/apache/commons/collections4/bloomfilter/ArrayCountingBloomFilter.java ##
[GitHub] [commons-compress] bodewig commented on issue #96: Add testcases for ZipUtil, ZipFile, ZipArchiveOutputStream and ZipArchiveEntry
bodewig commented on issue #96: Add testcases for ZipUtil, ZipFile, ZipArchiveOutputStream and ZipArchiveEntry URL: https://github.com/apache/commons-compress/pull/96#issuecomment-596111716 Thanks @PeterAlfreadLee - I've just added a few comments about unnecessary new imports. BTW, I think it is time you add yourself as developer to the POM. :-) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [commons-compress] bodewig commented on a change in pull request #96: Add testcases for ZipUtil, ZipFile, ZipArchiveOutputStream and ZipArchiveEntry
bodewig commented on a change in pull request #96: Add testcases for ZipUtil, ZipFile, ZipArchiveOutputStream and ZipArchiveEntry URL: https://github.com/apache/commons-compress/pull/96#discussion_r389294213 ## File path: src/test/java/org/apache/commons/compress/archivers/zip/ZipUtilTest.java ## @@ -20,6 +20,7 @@ import static org.junit.Assert.*; +import java.io.UnsupportedEncodingException; Review comment: not used? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [commons-compress] bodewig commented on a change in pull request #96: Add testcases for ZipUtil, ZipFile, ZipArchiveOutputStream and ZipArchiveEntry
bodewig commented on a change in pull request #96: Add testcases for ZipUtil, ZipFile, ZipArchiveOutputStream and ZipArchiveEntry URL: https://github.com/apache/commons-compress/pull/96#discussion_r389293600 ## File path: src/test/java/org/apache/commons/compress/archivers/zip/ZipArchiveEntryTest.java ## @@ -22,6 +22,9 @@ import static org.junit.Assert.*; import java.io.ByteArrayOutputStream; +import java.lang.reflect.Field; +import java.lang.reflect.Modifier; +import java.util.Arrays; Review comment: These are not actually needed, are they? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Comment Edited] (MATH-1516) Define an interface for ranking a list of clusters
[ https://issues.apache.org/jira/browse/MATH-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17054157#comment-17054157 ] Chen Tao edited comment on MATH-1516 at 3/7/20, 5:23 PM: - There are many cluster evaluation algorithms: [scikit-learn clustering-performance-evaluation|https://scikit-learn.org/stable/modules/clustering.html#clustering-performance-evaluation] They can be divided into 2 categories: "External Measures" and "Internal Measures". The function signature can be decided by the category the evaluation algorithm belongs to. Although a higher score is better for most of these evaluation algorithms, there is a special case: [Davies-Bouldin Index|https://scikit-learn.org/stable/modules/clustering.html#davies-bouldin-index] There are also some simplified evaluators like SumOfClusterVariances, for which a lower score is better. If there is a training application program, a replaceable evaluator is necessary, and the evaluator algorithm has the responsibility to isolate the rank rule. This should be considered in the design. was (Author: chentao106): There are many cluster evaluation algorithms: [scikit-learn clustering-performance-evaluation|https://scikit-learn.org/stable/modules/clustering.html#clustering-performance-evaluation] They can be divided into 2 categories: "External Measures" and "Internal Measures". The function signature can be decided by the category the evaluation algorithm belongs to. Although a higher score is better for most of these evaluation algorithms, there is a special case: [Davies-Bouldin Index|https://scikit-learn.org/stable/modules/clustering.html#davies-bouldin-index] There are also some simplified evaluations like SumOfClusterVariances, for which a lower score is better. If there is a training application program, a replaceable evaluator is necessary, and the evaluator algorithm has the responsibility to isolate the rank rule. This should be considered in the design. 
> Define an interface for ranking a list of clusters > -- > > Key: MATH-1516 > URL: https://issues.apache.org/jira/browse/MATH-1516 > Project: Commons Math > Issue Type: Sub-task >Reporter: Gilles Sadowski >Assignee: Gilles Sadowski >Priority: Minor > Fix For: 4.0 > > > [On the "dev" ML|https://markmail.org/message/z4qr3fcsg5emt2nn] it has been > suggested to create a functional interface for unequivocally defining the > quality of a clustering: > * a valid ranking must be positive, > * better clustering is conveyed through higher ranking. -- This message was sent by Atlassian Jira (v8.3.4#803005)
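The point made in this comment, that the evaluator itself should say which of two scores is better, can be illustrated with a minimal sketch. All names below (`ClusterEvaluatorSketch`, `SumOfVariancesSketch`, `ClusteringSelector`) are hypothetical and are not part of Commons Math; clusters are simplified to arrays of one-dimensional values, and the variance used is the population variance.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical evaluator: it computes a score and also owns the ordering rule. */
interface ClusterEvaluatorSketch {
    double score(List<double[]> clusters);

    /** Default ordering: the higher score is the better one. */
    default boolean isBetterScore(double score1, double score2) {
        return score1 > score2;
    }
}

/** Hypothetical lower-is-better evaluator: sum of per-cluster population variances. */
final class SumOfVariancesSketch implements ClusterEvaluatorSketch {
    @Override
    public double score(List<double[]> clusters) {
        double total = 0;
        for (double[] cluster : clusters) {
            if (cluster.length == 0) {
                continue; // an empty cluster contributes no variance
            }
            final double mean = Arrays.stream(cluster).average().orElse(0);
            double sumSq = 0;
            for (double v : cluster) {
                sumSq += (v - mean) * (v - mean);
            }
            total += sumSq / cluster.length;
        }
        return total;
    }

    /** Overrides the default because a smaller total variance is better. */
    @Override
    public boolean isBetterScore(double score1, double score2) {
        return score1 < score2;
    }
}

/** Application code: never needs to know the evaluator's name or direction. */
final class ClusteringSelector {
    static List<double[]> best(ClusterEvaluatorSketch evaluator,
                               List<List<double[]>> candidates) {
        List<double[]> best = null;
        double bestScore = Double.NaN;
        for (List<double[]> candidate : candidates) {
            final double s = evaluator.score(candidate);
            if (best == null || evaluator.isBetterScore(s, bestScore)) {
                best = candidate;
                bestScore = s;
            }
        }
        return best;
    }
}
```

With this shape, swapping a higher-is-better evaluator for a lower-is-better one requires no change in `ClusteringSelector`, which is the isolation of the rank rule that the comment asks for.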
[jira] [Commented] (MATH-1516) Define an interface for ranking a list of clusters
[ https://issues.apache.org/jira/browse/MATH-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17054157#comment-17054157 ] Chen Tao commented on MATH-1516: There are many cluster evaluation algorithms: [scikit-learn clustering-performance-evaluation|https://scikit-learn.org/stable/modules/clustering.html#clustering-performance-evaluation] They can be divided into 2 categories: "External Measures" and "Internal Measures". The function signature can be decided by the category the evaluation algorithm belongs to. Although a higher score is better for most of these evaluation algorithms, there is a special case: [Davies-Bouldin Index|https://scikit-learn.org/stable/modules/clustering.html#davies-bouldin-index] There are also some simplified evaluations like SumOfClusterVariances, for which a lower score is better. If there is a training application program, a replaceable evaluator is necessary, and the evaluator algorithm has the responsibility to isolate the rank rule. This should be considered in the design. > Define an interface for ranking a list of clusters > -- > > Key: MATH-1516 > URL: https://issues.apache.org/jira/browse/MATH-1516 > Project: Commons Math > Issue Type: Sub-task >Reporter: Gilles Sadowski >Assignee: Gilles Sadowski >Priority: Minor > Fix For: 4.0 > > > [On the "dev" ML|https://markmail.org/message/z4qr3fcsg5emt2nn] it has been > suggested to create a functional interface for unequivocally defining the > quality of a clustering: > * a valid ranking must be positive, > * better clustering is conveyed through higher ranking. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [commons-collections] Claudenw commented on a change in pull request #137: WIP: CountingBloomFilter
Claudenw commented on a change in pull request #137: WIP: CountingBloomFilter URL: https://github.com/apache/commons-collections/pull/137#discussion_r389283827 ## File path: src/main/java/org/apache/commons/collections4/bloomfilter/ArrayCountingBloomFilter.java ## @@ -0,0 +1,396 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.commons.collections4.bloomfilter; + +import java.util.BitSet; +import java.util.HashSet; +import java.util.NoSuchElementException; +import java.util.PrimitiveIterator; +import java.util.PrimitiveIterator.OfInt; +import java.util.function.Consumer; +import java.util.function.IntConsumer; +import java.util.Set; + +import org.apache.commons.collections4.bloomfilter.hasher.Hasher; +import org.apache.commons.collections4.bloomfilter.hasher.Shape; +import org.apache.commons.collections4.bloomfilter.hasher.StaticHasher; + +/** + * A counting Bloom filter using an array to track counts for each enabled bit + * index. + * + * Any operation that results in negative counts or integer overflow of counts will + * mark this filter as invalid. This transition is not reversible. The counts for the + * filter immediately prior to the operation that create invalid counts can be recovered. 
+ * See the documentation in {@link #isValid()} for details. + * + * All the operations in the filter assume the counts are currently valid. Behaviour + * of an invalid filter is undefined. It will no longer function identically to a standard + * Bloom filter that is the merge of all the Bloom filters that have been added + * to and not later subtracted from the counting Bloom filter. + * + * The maximum supported number of items that can be stored in the filter is + * limited by the maximum array size combined with the {@link Shape}. For + * example an implementation using a {@link Shape} with a false-positive + * probability of 1e-6 and {@link Integer#MAX_VALUE} bits can reversibly store + * approximately 75 million items using 20 hash functions per item with a memory + * consumption of approximately 8 GB. + * + * @since 4.5 + * @see Shape + */ +public class ArrayCountingBloomFilter extends AbstractBloomFilter implements CountingBloomFilter { + +/** + * The count of each bit index in the filter. + */ +private final int[] counts; + +/** + * The state flag. This is a bitwise OR of the entire history of all updated + * counts. If negative then a negative count or integer overflow has occurred on + * one or more counts in the history of the filter and the state is invalid. + * + * Maintenance of this state flag is branch-free for improved performance. It + * eliminates a conditional check for a negative count during remove/subtract + * operations and a conditional check for integer overflow during merge/add + * operations. + * + * Note: Integer overflow is unlikely in realistic usage scenarios. A count + * that overflows indicates that the number of items in the filter exceeds the + * maximum possible size (number of bits) of any Bloom filter constrained by + * integer indices. At this point the filter is most likely full (all bits are + * non-zero) and thus useless. 
+ * + * Negative counts are a concern if the filter is used incorrectly by + * removing an item that was never added. It is expected that a user of a + * counting Bloom filter will not perform this action as it is a mistake. + * Enabling an explicit recovery path for negative or overflow counts is a major + * performance burden not deemed necessary for the unlikely scenarios when an + * invalid state is created. Maintenance of the state flag is a concession to + * flag improper use that should not have a major performance impact. + */ +private int state; + +/** + * An iterator of all indexes with non-zero counts. + * + * In the event that the filter state is invalid any index with a negative count + * will also be produced by the iterator. + */ +private class IndexIterator implements PrimitiveIterator.OfInt { +/** The next non-zero index (or counts.length). */ +private int ne
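The branch-free state flag described in the Javadoc above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the pull request: the class name `CountStateSketch` and its methods are hypothetical, and only single-step increments and decrements are shown.

```java
/** Hypothetical sketch of branch-free validity tracking for an array of counts. */
final class CountStateSketch {
    private final int[] counts;

    /** Bitwise OR of every count value ever written; negative means invalid. */
    private int state;

    CountStateSketch(int size) {
        counts = new int[size];
    }

    /** Adds one to a count. Overflow wraps to a negative value and taints the state. */
    void increment(int index) {
        final int updated = counts[index] + 1;
        state |= updated; // no conditional check: the sign bit is sticky once set
        counts[index] = updated;
    }

    /** Subtracts one from a count. Going below zero taints the state. */
    void decrement(int index) {
        final int updated = counts[index] - 1;
        state |= updated;
        counts[index] = updated;
    }

    /** The history is valid only if no written count ever had its sign bit set. */
    boolean isValid() {
        return state >= 0;
    }
}
```

Because the OR only ever sets bits, a later operation that brings a count back to zero or above does not clear the sign bit in `state`, which matches the statement that the transition to invalid is not reversible.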
[jira] [Commented] (GEOMETRY-56) Create distribution archive
[ https://issues.apache.org/jira/browse/GEOMETRY-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17054148#comment-17054148 ] Gilles Sadowski commented on GEOMETRY-56: - I guess that it will work when the URL exists, but the point is that it should still work without it (unless I'm missing the purpose of the "test-deploy" profile). Could you please post about the issue on the "dev" ML and file a bug report at the Commons' [common JIRA project|https://issues.apache.org/jira/browse/COMMONSSITE]? It could well be that additional properties must be set but IMHO, no connection attempt should be made when performing a local test intended to ensure that we won't break anything... > Create distribution archive > --- > > Key: GEOMETRY-56 > URL: https://issues.apache.org/jira/browse/GEOMETRY-56 > Project: Apache Commons Geometry > Issue Type: Improvement >Reporter: Karl Heinz Marbaise >Assignee: Karl Heinz Marbaise >Priority: Blocker > Labels: Maven, jenkins, pull-request-available > Fix For: 1.0 > > Time Spent: 10m > Remaining Estimate: 0h > > *Problem* > * Currently the configuration for building a distribution archive does not > exist > * The current configuration for creating archives is wrong > *Goal* > * Create separate distribution archive module > * Create correct configuration for maven-assembly-plugin -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (GEOMETRY-56) Create distribution archive
[ https://issues.apache.org/jira/browse/GEOMETRY-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17054144#comment-17054144 ] Matt Juntunen commented on GEOMETRY-56: --- I used the rng module as a template so they should be equivalent. The {{commons.distSvnStagingUrl}} property that I copied from rng is set to {noformat} scm:svn:https://dist.apache.org/repos/dist/dev/commons/${commons.componentid} {noformat} This url exists for rng and other released commons projects but not for geometry (see [https://dist.apache.org/repos/dist/dev/commons/]). Perhaps we need to have someone create the entries for geometry (and numbers)? > Create distribution archive > --- > > Key: GEOMETRY-56 > URL: https://issues.apache.org/jira/browse/GEOMETRY-56 > Project: Apache Commons Geometry > Issue Type: Improvement >Reporter: Karl Heinz Marbaise >Assignee: Karl Heinz Marbaise >Priority: Blocker > Labels: Maven, jenkins, pull-request-available > Fix For: 1.0 > > Time Spent: 10m > Remaining Estimate: 0h > > *Problem* > * Currently the configuration for building a distribution archive does not > exist > * The current configuration for creating archives is wrong > *Goal* > * Create separate distribution archive module > * Create correct configuration for maven-assembly-plugin -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (LANG-1526) Add 1 and 0 in toBooleanObject(final String str) #502
[ https://issues.apache.org/jira/browse/LANG-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary D. Gregory resolved LANG-1526. --- Fix Version/s: 3.10 Resolution: Fixed > Add 1 and 0 in toBooleanObject(final String str) #502 > - > > Key: LANG-1526 > URL: https://issues.apache.org/jira/browse/LANG-1526 > Project: Commons Lang > Issue Type: Improvement >Reporter: Gary D. Gregory >Priority: Major > Fix For: 3.10 > > > Add 1 and 0 in toBooleanObject(final String str) #502 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (LANG-1526) Add 1 and 0 in toBooleanObject(final String str) #502
Gary D. Gregory created LANG-1526: - Summary: Add 1 and 0 in toBooleanObject(final String str) #502 Key: LANG-1526 URL: https://issues.apache.org/jira/browse/LANG-1526 Project: Commons Lang Issue Type: Improvement Reporter: Gary D. Gregory Add 1 and 0 in toBooleanObject(final String str) #502 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [commons-lang] garydgregory commented on issue #502: Added 1 and 0 in toBooleanObject(final String str)
garydgregory commented on issue #502: Added 1 and 0 in toBooleanObject(final String str) URL: https://github.com/apache/commons-lang/pull/502#issuecomment-596102613 https://issues.apache.org/jira/browse/LANG-1526 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [commons-lang] garydgregory merged pull request #502: Added 1 and 0 in toBooleanObject(final String str)
garydgregory merged pull request #502: Added 1 and 0 in toBooleanObject(final String str) URL: https://github.com/apache/commons-lang/pull/502 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [commons-lang] garydgregory commented on issue #502: Added 1 and 0 in toBooleanObject(final String str)
garydgregory commented on issue #502: Added 1 and 0 in toBooleanObject(final String str) URL: https://github.com/apache/commons-lang/pull/502#issuecomment-596102522 OK by me as it also matches `javax.xml.bind.DatatypeConverter.parseBoolean(String)` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [commons-compress] bodewig commented on issue #94: Deprecated constructors of ZstdOutputStream fix
bodewig commented on issue #94: Deprecated constructors of ZstdOutputStream fix URL: https://github.com/apache/commons-compress/pull/94#issuecomment-596102333 many thanks This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [commons-compress] bodewig merged pull request #94: Deprecated constructors of ZstdOutputStream fix
bodewig merged pull request #94: Deprecated constructors of ZstdOutputStream fix URL: https://github.com/apache/commons-compress/pull/94 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [commons-lang] coveralls commented on issue #502: Added 1 and 0 in toBooleanObject(final String str)
coveralls commented on issue #502: Added 1 and 0 in toBooleanObject(final String str) URL: https://github.com/apache/commons-lang/pull/502#issuecomment-596100034 [![Coverage Status](https://coveralls.io/builds/29198299/badge)](https://coveralls.io/builds/29198299) Coverage increased (+0.006%) to 95.103% when pulling **2170a00a75174bc6dc4ac522083b68ca4479b5d6 on dscham:master** into **ba607f525b842661d40195d0d4778528e2384e70 on apache:master**. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Updated] (MATH-1515) Enhance clustering API
[ https://issues.apache.org/jira/browse/MATH-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gilles Sadowski updated MATH-1515: -- Description: It had been noted (cf. other JIRA reports) quite some time ago that the API defined in package {{o.a.c.math4.ml.clustering}} could be improved. Interest in this code has been recently renewed (cf. [this thread on the "dev" ML|https://markmail.org/message/vpablvbjou4nhrac]). This report will collect specific refactoring tasks. Linked reports should serve as a guide towards a flexible API and efficient implementations. was: It had been noted (cf. other JIRA reports) quite some time ago that the API defined in package {{o.a.c.math4.ml.clustering}} could be improved. Interest in this code has been recently renewed (cf. [this thread on the "dev" ML|https://markmail.org/message/vpablvbjou4nhrac]). This report will collect specific refactoring tasks. > Enhance clustering API > -- > > Key: MATH-1515 > URL: https://issues.apache.org/jira/browse/MATH-1515 > Project: Commons Math > Issue Type: Improvement >Affects Versions: 3.6.1 >Reporter: Gilles Sadowski >Priority: Minor > Fix For: 4.0 > > > It had been noted (cf. other JIRA reports) quite some time ago that the API > defined in package {{o.a.c.math4.ml.clustering}} could be improved. > Interest in this code has been recently renewed (cf. [this thread on the > "dev" ML|https://markmail.org/message/vpablvbjou4nhrac]). > This report will collect specific refactoring tasks. > Linked reports should serve as a guide towards a flexible API and efficient > implementations. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (MATH-1515) Enhance clustering API
[ https://issues.apache.org/jira/browse/MATH-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gilles Sadowski updated MATH-1515: -- Summary: Enhance clustering API (was: Simplify clustering API) > Enhance clustering API > -- > > Key: MATH-1515 > URL: https://issues.apache.org/jira/browse/MATH-1515 > Project: Commons Math > Issue Type: Improvement >Affects Versions: 3.6.1 >Reporter: Gilles Sadowski >Priority: Minor > Fix For: 4.0 > > > It had been noted (cf. other JIRA reports) quite some time ago that the API > defined in package {{o.a.c.math4.ml.clustering}} could be improved. > Interest in this code has been recently renewed (cf. [this thread on the > "dev" ML|https://markmail.org/message/vpablvbjou4nhrac]). > This report will collect specific refactoring tasks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [commons-lang] dscham opened a new pull request #502: Added 1 and 0 in toBooleanObject(final String str)
dscham opened a new pull request #502: Added 1 and 0 in toBooleanObject(final String str) URL: https://github.com/apache/commons-lang/pull/502 I stumbled over this during the week and am wondering why 1 and 0 are not a default true and false representation. So I created this PR. I'm not strongly opinionated about it. So if there is a good reason for 1 and 0 not being a good case here, feel free to close it. I think parsing the string to an int before calling toBoolean is unnecessary. And it should be common enough that you don't have to call: toBooleanObject(final String str, final String trueString, final String falseString, final String nullString) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
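The behaviour requested in this pull request can be sketched roughly as below. This is a standalone illustration, not the Commons Lang implementation: `BooleanParseSketch` is a hypothetical class, and it recognises only "true"/"false" and "1"/"0", whereas the library method accepts further spellings.

```java
import java.util.Locale;

/** Hypothetical helper showing "1" and "0" handled as default true/false strings. */
final class BooleanParseSketch {
    /**
     * Returns TRUE for "true" or "1", FALSE for "false" or "0" (case-insensitive),
     * and null for null or any unrecognised input.
     */
    static Boolean toBooleanObject(String str) {
        if (str == null) {
            return null;
        }
        switch (str.toLowerCase(Locale.ROOT)) {
            case "true":
            case "1":
                return Boolean.TRUE;
            case "false":
            case "0":
                return Boolean.FALSE;
            default:
                return null;
        }
    }
}
```

The caller then avoids both the int-parsing detour and the four-argument overload mentioned in the description, for example `Boolean flag = BooleanParseSketch.toBooleanObject("1");`.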
[jira] [Created] (MATH-1516) Define an interface for ranking a list of clusters
Gilles Sadowski created MATH-1516: - Summary: Define an interface for ranking a list of clusters Key: MATH-1516 URL: https://issues.apache.org/jira/browse/MATH-1516 Project: Commons Math Issue Type: Sub-task Reporter: Gilles Sadowski Assignee: Gilles Sadowski Fix For: 4.0 [On the "dev" ML|https://markmail.org/message/z4qr3fcsg5emt2nn] it has been suggested to create a functional interface for unequivocally defining the quality of a clustering: * a valid ranking must be positive, * better clustering is conveyed through higher ranking. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (MATH-1515) Simplify clustering API
Gilles Sadowski created MATH-1515: - Summary: Simplify clustering API Key: MATH-1515 URL: https://issues.apache.org/jira/browse/MATH-1515 Project: Commons Math Issue Type: Improvement Affects Versions: 3.6.1 Reporter: Gilles Sadowski Fix For: 4.0 It had been noted (cf. other JIRA reports) quite some time ago that the API defined in package {{o.a.c.math4.ml.clustering}} could be improved. Interest in this code has been recently renewed (cf. [this thread on the "dev" ML|https://markmail.org/message/vpablvbjou4nhrac]). This report will collect specific refactoring tasks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (GEOMETRY-56) Create distribution archive
[ https://issues.apache.org/jira/browse/GEOMETRY-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17054095#comment-17054095 ] Gilles Sadowski commented on GEOMETRY-56: - Yes, trying this command: {noformat} JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 mvn -Duser.name=erans -Dcommons.release.dryRun=true -Ptest-deploy -Prelease clean test package site deploy {noformat} I get {noformat} [INFO] Checking out dist from: scm:svn:https://dist.apache.org/repos/dist/dev/commons/geometry Executing: /bin/sh -c cd '/home/eran/devel/java/apache/commons-geometry/trunk/dist-archive/target/commons-release-plugin' && 'svn' '--username' 'erans' '--no-auth-cache' '--non-interactive' 'checkout' 'https://dist.apache.org/repos/dist/dev/commons/geometry@' '/home/eran/devel/java/apache/commons-geometry/trunk/dist-archive/target/commons-release-plugin/scm-cleanup' [INFO] [INFO] Reactor Summary for Apache Commons Geometry 1.0-SNAPSHOT: [INFO] [INFO] Apache Commons Geometry SUCCESS [ 26.860 s] [INFO] Apache Commons Geometry Core ... SUCCESS [ 28.600 s] [INFO] Apache Commons Geometry Euclidean .. SUCCESS [ 44.256 s] [INFO] Apache Commons Geometry Enclosing .. SUCCESS [ 19.762 s] [INFO] Apache Commons Geometry Spherical .. SUCCESS [ 26.146 s] [INFO] Apache Commons Geometry Hull ... SUCCESS [ 18.299 s] [INFO] Apache Commons Geometry Examples ... SUCCESS [ 7.335 s] [INFO] Apache Commons Geometry JMH Benchmark .. SUCCESS [ 32.216 s] [INFO] Apache Commons Geometry (full distribution) FAILURE [ 1.020 s] [INFO] [INFO] BUILD FAILURE [INFO] [INFO] Total time: 03:24 min [INFO] Finished at: 2020-03-07T15:10:37+01:00 [INFO] [ERROR] Failed to execute goal org.apache.commons:commons-release-plugin:1.7:clean-staging (clean-staging) on project commons-geometry: Failed to checkout files from SCM: The svn command failed. 
[svn: E17: URL 'https://dist.apache.org/repos/dist/dev/commons/geometry' doesn't exist {noformat} At first sight, this is strange because the "-Ptest-deploy" is supposed to only perform local operations. There are recurrent issues with SVN (the annoying attempts to copy the whole site in each of the modules' "site-content" directory), which Alex had solved (IIRC). Could you check and ensure that the set-up of {{dist-archive}} is exactly the same as for "Commons RNG"? > Create distribution archive > --- > > Key: GEOMETRY-56 > URL: https://issues.apache.org/jira/browse/GEOMETRY-56 > Project: Apache Commons Geometry > Issue Type: Improvement >Reporter: Karl Heinz Marbaise >Assignee: Karl Heinz Marbaise >Priority: Blocker > Labels: Maven, jenkins, pull-request-available > Fix For: 1.0 > > Time Spent: 10m > Remaining Estimate: 0h > > *Problem* > * Currently the configuration for building a distribution archive does not > exist > * The current configuration for creating archives is wrong > *Goal* > * Create separate distribution archive module > * Create correct configuration for maven-assembly-plugin -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (GEOMETRY-56) Create distribution archive
[ https://issues.apache.org/jira/browse/GEOMETRY-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17054024#comment-17054024 ] Matt Juntunen commented on GEOMETRY-56: --- Do you see any issues with this dist-archive module? > Create distribution archive > --- > > Key: GEOMETRY-56 > URL: https://issues.apache.org/jira/browse/GEOMETRY-56 > Project: Apache Commons Geometry > Issue Type: Improvement >Reporter: Karl Heinz Marbaise >Assignee: Karl Heinz Marbaise >Priority: Blocker > Labels: Maven, jenkins, pull-request-available > Fix For: 1.0 > > Time Spent: 10m > Remaining Estimate: 0h > > *Problem* > * Currently the configuration for building a distribution archive does not > exist > * The current configuration for creating archives is wrong > *Goal* > * Create separate distribution archive module > * Create correct configuration for maven-assembly-plugin -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [commons-math] coveralls edited a comment on issue #122: Implement the Calinski-Harabasz Evaluator algorithm for Cluster
coveralls edited a comment on issue #122: Implement the Calinski-Harabasz Evaluator algorithm for Cluster URL: https://github.com/apache/commons-math/pull/122#issuecomment-595858824 [![Coverage Status](https://coveralls.io/builds/29197288/badge)](https://coveralls.io/builds/29197288) Coverage increased (+0.0009%) to 90.513% when pulling **51f80c2a8b02888b3a6209456401b2ecd0ebc952 on chentao106:HalinskiHarabasz** into **4d5983aa8756c99cdd20618aecf24ea31c399311 on apache:master**. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Closed] (GEOMETRY-90) Slerp Wrapper
[ https://issues.apache.org/jira/browse/GEOMETRY-90?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Juntunen closed GEOMETRY-90. - > Slerp Wrapper > - > > Key: GEOMETRY-90 > URL: https://issues.apache.org/jira/browse/GEOMETRY-90 > Project: Apache Commons Geometry > Issue Type: Improvement >Reporter: Matt Juntunen >Priority: Minor > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > The current API for performing slerp operations with {{QuaternionRotation}} > instances is somewhat cumbersome. The {{QuaternionRotation.slerp}} method > returns an instance of {{org.apache.commons.numbers.quaternion.Slerp}}, which > cannot be directly used with any of the other classes in commons-geometry. > The use cases therefore end up looking like this: > {code:java} > QuaternionRotation start = ...; > QuaternionRotation end = ...; > Slerp s = start.slerp(end); > Quaternion midQuat = s.apply(0.5); // commons-numbers objects > QuaternionRotation mid = QuaternionRotation.of(midQuat); // convert to > commons-geometry object > {code} > I propose that the {{QuaternionRotation.slerp}} method return a small wrapper > class (perhaps named {{SlerpFunction}} to avoid a name collision with > {{Slerp}}) that makes this more convenient. > {code:java} > QuaternionRotation start = ...; > QuaternionRotation end = ...; > SlerpFunction s = start.slerp(end); > QuaternionRotation mid = s.apply(0.5); // no conversions needed > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
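The wrapper idea proposed in this issue can be illustrated generically. The sketch below is an assumption-laden illustration and not the class that was committed: `ConvertingInterpolator` is a hypothetical name, and it works on any `DoubleFunction` rather than on the commons-numbers or commons-geometry types.

```java
import java.util.function.DoubleFunction;
import java.util.function.Function;

/**
 * Hypothetical wrapper: evaluates an interpolation function and converts each
 * result, so that callers receive the target type directly.
 */
final class ConvertingInterpolator<S, T> implements DoubleFunction<T> {
    private final DoubleFunction<S> delegate;
    private final Function<S, T> converter;

    ConvertingInterpolator(DoubleFunction<S> delegate, Function<S, T> converter) {
        this.delegate = delegate;
        this.converter = converter;
    }

    /** Interpolates at parameter t, then converts the intermediate result. */
    @Override
    public T apply(double t) {
        return converter.apply(delegate.apply(t));
    }
}
```

In the terms of the issue, the delegate would play the role of the `Slerp` instance and the converter the role of `QuaternionRotation.of`, assuming the former can be treated as a function of a `double`; that mapping is an illustration only.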
[jira] [Commented] (GEOMETRY-90) Slerp Wrapper
[ https://issues.apache.org/jira/browse/GEOMETRY-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17054019#comment-17054019 ] Matt Juntunen commented on GEOMETRY-90: --- Looks good to me! > Slerp Wrapper > - > > Key: GEOMETRY-90 > URL: https://issues.apache.org/jira/browse/GEOMETRY-90 > Project: Apache Commons Geometry > Issue Type: Improvement >Reporter: Matt Juntunen >Priority: Minor > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > The current API for performing slerp operations with {{QuaternionRotation}} > instances is somewhat cumbersome. The {{QuaternionRotation.slerp}} method > returns an instance of {{org.apache.commons.numbers.quaternion.Slerp}}, which > cannot be directly used with any of the other classes in commons-geometry. > The use cases therefore end up looking like this: > {code:java} > QuaternionRotation start = ...; > QuaternionRotation end = ...; > Slerp s = start.slerp(end); > Quaternion midQuat = s.apply(0.5); // commons-numbers objects > QuaternionRotation mid = QuaternionRotation.of(midQuat); // convert to > commons-geometry object > {code} > I propose that the {{QuaternionRotation.slerp}} method return a small wrapper > class (perhaps named {{SlerpFunction}} to avoid a name collision with > {{Slerp}}) that makes this more convenient. > {code:java} > QuaternionRotation start = ...; > QuaternionRotation end = ...; > SlerpFunction s = start.slerp(end); > QuaternionRotation mid = s.apply(0.5); // no conversions needed > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (GEOMETRY-90) Slerp Wrapper
[ https://issues.apache.org/jira/browse/GEOMETRY-90?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Juntunen resolved GEOMETRY-90.
----------------------------------
Resolution: Done
[GitHub] [commons-lang] coveralls edited a comment on issue #501: Expand Streams functionality
coveralls edited a comment on issue #501: Expand Streams functionality URL: https://github.com/apache/commons-lang/pull/501#issuecomment-595538858 [![Coverage Status](https://coveralls.io/builds/29197065/badge)](https://coveralls.io/builds/29197065) Coverage decreased (-0.2%) to 94.86% when pulling **fea99ccfd2c3e4278835334a2f7983d5fd65b3c5 on Isira-Seneviratne:master** into **200d8e97453aa755af78f10dbc268c5d4f2a3e01 on apache:master**.
[GitHub] [commons-compress] coveralls commented on issue #96: Add testcases for ZipUtil, ZipFile, ZipArchiveOutputStream and ZipArchiveEntry
coveralls commented on issue #96: Add testcases for ZipUtil, ZipFile, ZipArchiveOutputStream and ZipArchiveEntry URL: https://github.com/apache/commons-compress/pull/96#issuecomment-596072872 [![Coverage Status](https://coveralls.io/builds/29196607/badge)](https://coveralls.io/builds/29196607) Coverage increased (+0.2%) to 87.182% when pulling **f2c09166ddd3f7af8afbcd70eb4a02ce99cda0eb on PeterAlfreadLee:ADD_TESTCASES_FOR_ZIP** into **1e8d131ec08ae418f0140d8166d452dff6739937 on apache:master**.
[GitHub] [commons-compress] PeterAlfreadLee opened a new pull request #96: Add testcases for ZipUtil, ZipFile, ZipArchiveOutputStream and ZipArchiveEntry
PeterAlfreadLee opened a new pull request #96: Add testcases for ZipUtil, ZipFile, ZipArchiveOutputStream and ZipArchiveEntry URL: https://github.com/apache/commons-compress/pull/96 Add testcases for zip, including ZipUtil, ZipFile, ZipArchiveOutputStream and ZipArchiveEntry
[GitHub] [commons-collections] aherbert commented on a change in pull request #138: Update the package-info.java
aherbert commented on a change in pull request #138: Update the package-info.java
URL: https://github.com/apache/commons-collections/pull/138#discussion_r389244501

## File path: src/main/java/org/apache/commons/collections4/bloomfilter/hasher/function/package-info.java ##
@@ -16,7 +16,7 @@
  */
 /**
- * Implementations of org.apache.commons.collections4.bloomfilter.hasher.HasherFunction
+ * Implementations of org.apache.commons.collections4.bloomfilter.hasher.function

Review comment:
This package info is a skeleton. It should really be expanded. For now it could probably be changed to:

```
Provides classes and interfaces to define the shape of a Bloom filter and the conversion
of generic bytes to a hash of bit indexes to be used with a Bloom filter.
```
[GitHub] [commons-collections] aherbert commented on a change in pull request #138: Update the package-info.java
aherbert commented on a change in pull request #138: Update the package-info.java
URL: https://github.com/apache/commons-collections/pull/138#discussion_r389244609

## File path: src/main/java/org/apache/commons/collections4/bloomfilter/hasher/package-info.java ##
@@ -16,7 +16,7 @@
  */
 /**
- * Implementations of org.apache.commons.collections4.bloomfilter.Hasher
+ * Implementations of org.apache.commons.collections4.bloomfilter.hasher

Review comment:
As above, this is just a skeleton.

```
Provides implementations of the {@link org.apache.commons.collections4.bloomfilter.hasher.HashFunction}.
```
[GitHub] [commons-collections] coveralls commented on issue #139: Make BloomFilterIndexer and HashFunctionValidator public
coveralls commented on issue #139: Make BloomFilterIndexer and HashFunctionValidator public URL: https://github.com/apache/commons-collections/pull/139#issuecomment-596071207 [![Coverage Status](https://coveralls.io/builds/29196283/badge)](https://coveralls.io/builds/29196283) Coverage decreased (-0.03%) to 89.921% when pulling **35dc6805c6482d1bd55dcc262eda2fcac5826aff on Claudenw:Collections-755_make_bloom_filter_helper_functions_public** into **9831773447456466ae93b07e59ea76c8cce31787 on apache:master**.
[GitHub] [commons-collections] Claudenw opened a new pull request #139: Make BloomFilterIndexer and HashFunctionValidator public
Claudenw opened a new pull request #139: Make BloomFilterIndexer and HashFunctionValidator public URL: https://github.com/apache/commons-collections/pull/139 fix for Collections-755
[jira] [Created] (COLLECTIONS-755) Make Bloom filter helper functions public
Claude Warren created COLLECTIONS-755:
--------------------------------------

Summary: Make Bloom filter helper functions public
Key: COLLECTIONS-755
URL: https://issues.apache.org/jira/browse/COLLECTIONS-755
Project: Commons Collections
Issue Type: Improvement
Components: Collection
Affects Versions: 4.5
Reporter: Claude Warren

The HashFunctionValidator and BloomFilterIndexer provide static methods that are useful in creating new Bloom filter implementations that can interact with the implementations provided by Collections. This change is to make those methods public to enable and simplify the interaction for developers building new Bloom filter implementations.
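To illustrate why such static helpers are useful to third-party Bloom filter implementations, here is a minimal sketch of the kind of bit-index arithmetic a filter backed by a long[] needs. The class and method names (BitIndexSketch, longIndex, longBit, toWords) are hypothetical and are not claimed to match the signatures in Commons Collections.

```java
/** Hypothetical helper; the names are illustrative, not the Collections API. */
final class BitIndexSketch {

    private BitIndexSketch() {
        // static helpers only
    }

    /** Index of the 64-bit word that holds the given bit. */
    static int longIndex(int bitIndex) {
        checkNonNegative(bitIndex);
        return bitIndex >> 6; // divide by 64
    }

    /** Mask with the single bit set inside its word. */
    static long longBit(int bitIndex) {
        checkNonNegative(bitIndex);
        // Java uses only the low 6 bits of the distance when shifting a long.
        return 1L << bitIndex;
    }

    /** Rejects negative bit indexes. */
    static void checkNonNegative(int bitIndex) {
        if (bitIndex < 0) {
            throw new IndexOutOfBoundsException("Negative bit index: " + bitIndex);
        }
    }

    /** Packs bit indexes into a long[] the way a custom filter might. */
    static long[] toWords(int numberOfBits, int... bitIndexes) {
        final long[] words = new long[(numberOfBits + 63) / 64];
        for (final int i : bitIndexes) {
            words[longIndex(i)] |= longBit(i);
        }
        return words;
    }
}
```

If every implementation shares one public helper for this arithmetic, filters written outside the library lay out their bits identically and can be merged or compared with the provided ones.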
[GitHub] [commons-collections] coveralls commented on issue #138: Update the package-info.java
coveralls commented on issue #138: Update the package-info.java URL: https://github.com/apache/commons-collections/pull/138#issuecomment-596069578 [![Coverage Status](https://coveralls.io/builds/29195975/badge)](https://coveralls.io/builds/29195975) Coverage remained the same at 89.951% when pulling **cde94d4017d1a161e3287b5295e2c28f2a3dd78c on dota17:updatePackage-info** into **9831773447456466ae93b07e59ea76c8cce31787 on apache:master**.
[GitHub] [commons-collections] Claudenw commented on a change in pull request #131: Added caching hasher
Claudenw commented on a change in pull request #131: Added caching hasher
URL: https://github.com/apache/commons-collections/pull/131#discussion_r389242973

## File path: src/test/java/org/apache/commons/collections4/bloomfilter/hasher/CachingHasherTest.java ##
@@ -0,0 +1,241 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter.hasher;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertTrue;
+import static org.junit.Assert.fail;
+
+import java.util.List;
+import java.util.NoSuchElementException;
+import java.util.PrimitiveIterator;
+import java.util.PrimitiveIterator.OfInt;
+
+import org.apache.commons.collections4.bloomfilter.hasher.function.MD5Cyclic;
+import org.junit.Before;
+import org.junit.Test;
+
+/**
+ * Tests the {@link CachingHasher}.
+ */
+public class CachingHasherTest {
+    private CachingHasher.Builder builder;
+    private Shape shape;
+
+    private final HashFunctionIdentity testFunction = new HashFunctionIdentity() {
+
+        @Override
+        public String getName() {
+            return "Test Function";
+        }
+
+        @Override
+        public ProcessType getProcessType() {
+            return ProcessType.CYCLIC;
+        }
+
+        @Override
+        public String getProvider() {
+            return "Apache Commons Collection Tests";
+        }
+
+        @Override
+        public long getSignature() {
+            return 0;
+        }
+
+        @Override
+        public Signedness getSignedness() {
+            return Signedness.SIGNED;
+        }
+    };
+
+    /**
+     * Sets up the CachingHasher.
+     */
+    @Before
+    public void setup() {
+        builder = new CachingHasher.Builder(new MD5Cyclic());
+        shape = new Shape(new MD5Cyclic(), 3, 72, 17);
+    }
+
+    /**
+     * Tests that the expected bits are returned from hashing.
+     */
+    @Test
+    public void testGetBits() {
+
+        final int[] expected = { 6, 69, 44, 19, 10, 57, 48, 23, 70, 61, 36, 11, 2, 49, 24, 15, 62 };
+
+        final Hasher hasher = builder.with("Hello").build();
+
+        final OfInt iter = hasher.getBits(shape);
+
+        for (final int element : expected) {
+            assertTrue(iter.hasNext());
+            assertEquals(element, iter.nextInt());
+        }
+        assertFalse(iter.hasNext());
+    }
+
+    /**
+     * Tests that bits from multiple hashes are returned correctly.
+     */
+    @Test
+    public void testGetBits_MultipleHashes() {
+        final int[] expected = { 6, 69, 44, 19, 10, 57, 48, 23, 70, 61, 36, 11, 2, 49, 24, 15, 62, 1, 63, 53, 43, 17, 7,
+            69, 59, 49, 39, 13, 3, 65, 55, 45, 35, 25 };
+
+        final Hasher hasher = builder.with("Hello").with("World").build();
+
+        final OfInt iter = hasher.getBits(shape);
+
+        for (final int element : expected) {
+            assertTrue(iter.hasNext());
+            assertEquals(element, iter.nextInt());
+        }
+        assertFalse(iter.hasNext());
+        try {
+            iter.next();
+            fail("Should have thown NoSuchElementException");
+        } catch (final NoSuchElementException ignore) {
+            // do nothing
+        }
+    }
+
+    /**
+     * Tests that retrieving bits for the wrong shape throws an exception.
+     */
+    @Test
+    public void testGetBits_WongShape() {
+
+        final Hasher hasher = builder.with("Hello").build();
+
+        try {
+            hasher.getBits(new Shape(testFunction, 3, 72, 17));
+            fail("Should have thown IllegalArgumentException");
+        } catch (final IllegalArgumentException expected) {
+            // do nothing
+        }
+    }
+
+    /**
+     * Tests if isEmpty() reports correctly and the iterator returns no values.
+     */
+    @Test
+    public void testIsEmpty() {
+        CachingHasher hasher = builder.build();
+        assertTrue(hasher.isEmpty());
+        final OfInt iter = hasher.getBits(shape);
+        assertFalse(iter.hasNext());
+        try {
+            iter.next();
+            fail("Should have thown NoSuchElementException");
+        } catch (final NoSu
[GitHub] [commons-collections] Claudenw commented on a change in pull request #131: Added caching hasher
Claudenw commented on a change in pull request #131: Added caching hasher
URL: https://github.com/apache/commons-collections/pull/131#discussion_r389242984

## File path: src/test/java/org/apache/commons/collections4/bloomfilter/hasher/CachingHasherTest.java ##
@@ -0,0 +1,241 @@
[GitHub] [commons-collections] Claudenw commented on a change in pull request #131: Added caching hasher
Claudenw commented on a change in pull request #131: Added caching hasher
URL: https://github.com/apache/commons-collections/pull/131#discussion_r389242943

## File path: src/test/java/org/apache/commons/collections4/bloomfilter/hasher/CachingHasherTest.java ##
@@ -0,0 +1,241 @@
[GitHub] [commons-collections] Claudenw commented on a change in pull request #131: Added caching hasher
Claudenw commented on a change in pull request #131: Added caching hasher
URL: https://github.com/apache/commons-collections/pull/131#discussion_r389242956

## File path: src/test/java/org/apache/commons/collections4/bloomfilter/hasher/CachingHasherTest.java ##
@@ -0,0 +1,241 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter.hasher;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertTrue;
+import static org.junit.Assert.fail;
+
+import java.util.List;
+import java.util.NoSuchElementException;
+import java.util.PrimitiveIterator;
+import java.util.PrimitiveIterator.OfInt;
+
+import org.apache.commons.collections4.bloomfilter.hasher.function.MD5Cyclic;
+import org.junit.Before;
+import org.junit.Test;
+
+/**
+ * Tests the {@link CachingHasher}.
+ */
+public class CachingHasherTest {
+    private CachingHasher.Builder builder;
+    private Shape shape;
+
+    private final HashFunctionIdentity testFunction = new HashFunctionIdentity() {

Review comment:
fixed
[GitHub] [commons-collections] Claudenw commented on a change in pull request #131: Added caching hasher
Claudenw commented on a change in pull request #131: Added caching hasher
URL: https://github.com/apache/commons-collections/pull/131#discussion_r389242994

## File path: src/test/java/org/apache/commons/collections4/bloomfilter/hasher/CachingHasherTest.java ##
@@ -0,0 +1,241 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter.hasher;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertTrue;
+import static org.junit.Assert.fail;
+
+import java.util.List;
+import java.util.NoSuchElementException;
+import java.util.PrimitiveIterator;
+import java.util.PrimitiveIterator.OfInt;
+
+import org.apache.commons.collections4.bloomfilter.hasher.function.MD5Cyclic;
+import org.junit.Before;
+import org.junit.Test;
+
+/**
+ * Tests the {@link CachingHasher}.
+ */
+public class CachingHasherTest {
+    private CachingHasher.Builder builder;
+    private Shape shape;
+
+    private final HashFunctionIdentity testFunction = new HashFunctionIdentity() {
+
+        @Override
+        public String getName() {
+            return "Test Function";
+        }
+
+        @Override
+        public ProcessType getProcessType() {
+            return ProcessType.CYCLIC;
+        }
+
+        @Override
+        public String getProvider() {
+            return "Apache Commons Collection Tests";
+        }
+
+        @Override
+        public long getSignature() {
+            return 0;
+        }
+
+        @Override
+        public Signedness getSignedness() {
+            return Signedness.SIGNED;
+        }
+    };
+
+    /**
+     * Sets up the CachingHasher.
+     */
+    @Before
+    public void setup() {
+        builder = new CachingHasher.Builder(new MD5Cyclic());
+        shape = new Shape(new MD5Cyclic(), 3, 72, 17);
+    }
+
+    /**
+     * Tests that the expected bits are returned from hashing.
+     */
+    @Test
+    public void testGetBits() {
+
+        final int[] expected = { 6, 69, 44, 19, 10, 57, 48, 23, 70, 61, 36, 11, 2, 49, 24, 15, 62 };
+
+        final Hasher hasher = builder.with("Hello").build();
+
+        final OfInt iter = hasher.getBits(shape);
+
+        for (final int element : expected) {
+            assertTrue(iter.hasNext());
+            assertEquals(element, iter.nextInt());
+        }
+        assertFalse(iter.hasNext());
+    }
+
+    /**
+     * Tests that bits from multiple hashes are returned correctly.
+     */
+    @Test
+    public void testGetBits_MultipleHashes() {
+        final int[] expected = { 6, 69, 44, 19, 10, 57, 48, 23, 70, 61, 36, 11, 2, 49, 24, 15, 62, 1, 63, 53, 43, 17, 7,
+            69, 59, 49, 39, 13, 3, 65, 55, 45, 35, 25 };
+
+        final Hasher hasher = builder.with("Hello").with("World").build();
+
+        final OfInt iter = hasher.getBits(shape);
+
+        for (final int element : expected) {
+            assertTrue(iter.hasNext());
+            assertEquals(element, iter.nextInt());
+        }
+        assertFalse(iter.hasNext());
+        try {
+            iter.next();
+            fail("Should have thown NoSuchElementException");
+        } catch (final NoSuchElementException ignore) {
+            // do nothing
+        }
+    }
+
+    /**
+     * Tests that retrieving bits for the wrong shape throws an exception.
+     */
+    @Test
+    public void testGetBits_WongShape() {
+

Review comment:
fixed
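The getBits tests in the quoted diff expect one bit index per hash function for every hashed item, which is the behaviour of a cyclic (double-hashing) scheme. The following is a minimal sketch of that general technique, assuming each item has already been reduced to two long values. CyclicIndexSketch and its parameter names are hypothetical, and the arithmetic is not claimed to reproduce the exact values produced by CachingHasher or MD5Cyclic.

```java
import java.util.PrimitiveIterator;
import java.util.stream.IntStream;

/** Hypothetical sketch of cyclic index generation from two cached hash values. */
final class CyclicIndexSketch {

    private CyclicIndexSketch() {
        // static helper only
    }

    /**
     * Produces numberOfHashFunctions indexes in [0, numberOfBits) using
     * index_i = floorMod(h1 + i * h2, numberOfBits).
     */
    static PrimitiveIterator.OfInt indexes(long h1, long h2, int numberOfBits, int numberOfHashFunctions) {
        return IntStream.range(0, numberOfHashFunctions)
            // floorMod keeps the result non-negative even when the sum is negative
            .map(i -> (int) Math.floorMod(h1 + i * h2, (long) numberOfBits))
            .iterator();
    }
}
```

Because only the two longs are needed to regenerate every index, a hasher built this way can be stored or transmitted without retaining the original byte buffers.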
[GitHub] [commons-collections] Claudenw commented on a change in pull request #131: Added caching hasher
Claudenw commented on a change in pull request #131: Added caching hasher
URL: https://github.com/apache/commons-collections/pull/131#discussion_r389242913

## File path: src/main/java/org/apache/commons/collections4/bloomfilter/hasher/CachingHasher.java ##
@@ -0,0 +1,233 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter.hasher;
+
+import java.nio.charset.StandardCharsets;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.NoSuchElementException;
+import java.util.PrimitiveIterator;
+
+import org.apache.commons.collections4.bloomfilter.hasher.HashFunctionIdentity.ProcessType;
+
+/**
+ * An implementation of Hasher that attempts to leak as little data as possible.
+ * Each item in the hasher is represented by two (2) longs. So this Hasher will
+ * still indicate how many items are in the hasher but will not leak the buffers
+ * that are being hashed as the @code DynamicHasher} does.
+ *
+ * This hasher only accepts HashFunctions that are cyclic in nature.
+ *
+ * @see DynamicHasher
+ * @see ProcessType
+ */
+public class CachingHasher implements Hasher {
+
+    /**
+     * The list of byte arrays that are to be hashed.
+     */
+    private final List buffers;
+
+    /**
+     * The hash function identity
+     */
+    private final HashFunctionIdentity functionIdentity;
+
+    /**
+     * Constructs a CachingHasher from a list of arrays of hash values.
+     *
+     * The list of hash values comprises a @code{List} where each @code{long[]}
+     * is comprises two (2) values that are the result of hashing the original buffer. Thus a
+     * CachingHasher that was built from five (5) buffers will have five arrays of two @code{longs}
+     * each.
+     *
+     * @param functionIdentity The identity of the function.
+     * @param buffers a list of @code{long} arrays comprising two (2) values.
+     * @throws IllegalArgumentException if the name does not indicate a cyclic
+     * hashing function.
+     */
+    public CachingHasher(HashFunctionIdentity functionIdentity, List buffers) {
+        this.functionIdentity = checkIdentity(functionIdentity);
+        this.buffers = new ArrayList(buffers);
+    }
+
+    /**
+     * Constructs a CachingHasher from an array of arrys of hash values.
+     *
+     * @param functionIdentity The identity of the function.
+     * @param buffers an array of @code{long} arrays comprising two (2) values.
+     * @throws IllegalArgumentException if the name does not indicate a cyclic
+     * hashing function.
+     */
+    public CachingHasher(HashFunctionIdentity functionIdentity, long[][] buffers) {
+        this.functionIdentity = checkIdentity(functionIdentity);
+        this.buffers = Arrays.asList(buffers);
+    }
+
+    /**
+     * Checks that the name is valid for this hasher.
+     *
+     * @param functionIdentity the Function Identity to check.
+     */
+    private static HashFunctionIdentity checkIdentity(HashFunctionIdentity functionIdentity) {
+        if (functionIdentity.getProcessType() != ProcessType.CYCLIC) {
+            throw new IllegalArgumentException("Only cyclic hash functions may be used in a caching hasher");
+        }
+        return functionIdentity;
+    }
+
+    @Override
+    public HashFunctionIdentity getHashFunctionIdentity() {
+        return functionIdentity;
+    }
+
+    @Override
+    public boolean isEmpty() {
+        return buffers.isEmpty();
+    }
+
+
+    @Override
+    public PrimitiveIterator.OfInt getBits(Shape shape) {
+        HashFunctionValidator.checkAreEqual(getHashFunctionIdentity(),
+            shape.getHashFunctionIdentity());
+        return new IntIterator(shape);
+    }
+
+    /**
+     * Gets the long representations of the buffers.
+     *
+     * This method returns the long representations of the buffers. This is commonly used
+     * to transmit the Hasher from one system to another.
+     *
+     * @return a copy if the long buffer representation.
+     */
+    public List getBuffers() {
+        return new Arra
[GitHub] [commons-collections] Claudenw commented on a change in pull request #131: Added caching hasher
Claudenw commented on a change in pull request #131: Added caching hasher URL: https://github.com/apache/commons-collections/pull/131#discussion_r389242932 ## File path: src/test/java/org/apache/commons/collections4/bloomfilter/hasher/CachingHasherBuilderTest.java ## @@ -0,0 +1,103 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.commons.collections4.bloomfilter.hasher; + +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertFalse; +import static org.junit.Assert.assertTrue; + +import java.util.PrimitiveIterator.OfInt; + +import org.apache.commons.collections4.bloomfilter.hasher.function.MD5Cyclic; +import org.junit.Before; +import org.junit.Test; + +/** + * {@link CachingHasher.Builder} tests. + */ +public class CachingHasherBuilderTest { + +private CachingHasher.Builder builder; +private final Shape shape = new Shape(new MD5Cyclic(), 1, Integer.MAX_VALUE, 1); + +/** + * Tests that hashing a byte works as expected. 
+ */ +@Test +public void buildTest_byte() { +final CachingHasher hasher = builder.with((byte) 0x1).build(); + +final int expected = 1483089307; + +final OfInt iter = hasher.getBits(shape); + +assertTrue(iter.hasNext()); +assertEquals(expected, iter.nextInt()); +assertFalse(iter.hasNext()); +} + +/** + * Tests that hashing a byte array works as expected. + */ +@Test +public void buildTest_byteArray() { +final CachingHasher hasher = builder.with("Hello".getBytes()).build(); +final int expected = 1519797563; + +final OfInt iter = hasher.getBits(shape); + +assertTrue(iter.hasNext()); +assertEquals(expected, iter.nextInt()); +assertFalse(iter.hasNext()); +} + +/** + * Tests that an empty hasher works as expected. + */ +@Test +public void buildTest_Empty() { +final CachingHasher hasher = builder.build(); + +final OfInt iter = hasher.getBits(shape); + +assertFalse(iter.hasNext()); +} + +/** + * Tests that hashing a string works as expected. + */ +@Test +public void buildTest_String() { +final CachingHasher hasher = builder.with("Hello").build(); +final int expected = 1519797563; + +final OfInt iter = hasher.getBits(shape); + +assertTrue(iter.hasNext()); +assertEquals(expected, iter.nextInt()); +assertFalse(iter.hasNext()); +} + +/** + * Sets up the builder for testing. + */ +@Before Review comment: fixed This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
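The class Javadoc quoted in this thread says each item is represented by two longs and that only cyclic hash functions are accepted. The following is a minimal sketch of how such a pair could be turned into bit indexes, assuming the usual double-hashing scheme (index_i = (h1 + i * h2) mod m). The class `CyclicIndexSketch` and its methods are hypothetical names for illustration; this is not the code in the pull request and is not claimed to reproduce the expected values in the tests above.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Illustrative only: hypothetical helper, not the CachingHasher under review. */
final class CyclicIndexSketch {

    /** Splits the 16-byte MD5 digest of an item into two longs (the cached pair). */
    static long[] hashPair(byte[] item) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(item);
            ByteBuffer buf = ByteBuffer.wrap(digest);
            return new long[] {buf.getLong(), buf.getLong()};
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    /** Derives k indexes in [0, m) from the pair: floorMod(h1 + i * h2, m). */
    static int[] indexes(long[] pair, int k, int m) {
        int[] result = new int[k];
        long value = pair[0];
        for (int i = 0; i < k; i++) {
            // floorMod keeps the index non-negative even when value is negative.
            result[i] = (int) Math.floorMod(value, (long) m);
            value += pair[1];
        }
        return result;
    }
}
```

Under this assumption, only the two longs need to be stored or transmitted per item; the original buffer is not needed to regenerate the indexes for any number of hash functions.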
[GitHub] [commons-collections] Claudenw commented on a change in pull request #131: Added caching hasher
Claudenw commented on a change in pull request #131: Added caching hasher URL: https://github.com/apache/commons-collections/pull/131#discussion_r389242745 ## File path: src/main/java/org/apache/commons/collections4/bloomfilter/hasher/CachingHasher.java ## @@ -0,0 +1,233 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.commons.collections4.bloomfilter.hasher; + +import java.nio.charset.StandardCharsets; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.List; +import java.util.NoSuchElementException; +import java.util.PrimitiveIterator; + +import org.apache.commons.collections4.bloomfilter.hasher.HashFunctionIdentity.ProcessType; + +/** + * An implementation of Hasher that attempts to leak as little data as possible. + * Each item in the hasher is represented by two (2) longs. So this Hasher will + * still indicate how many items are in the hasher but will not leak the buffers + * that are being hashed as the @code DynamicHasher} does. + * + * This hasher only accepts HashFunctions that are cyclic in nature. + * + * @see DynamicHasher + * @see ProcessType + */ +public class CachingHasher implements Hasher { + +/** + * The list of byte arrays that are to be hashed. 
+ */ +private final List buffers; + +/** + * The hash function identity + */ +private final HashFunctionIdentity functionIdentity; + +/** + * Constructs a CachingHasher from a list of arrays of hash values. + * + * The list of hash values comprises a @code{List} where each @code{long[]} + * is comprises two (2) values that are the result of hashing the original buffer. Thus a + * CachingHasher that was built from five (5) buffers will have five arrays of two @code{longs} + * each. + * + * @param functionIdentity The identity of the function. + * @param buffers a list of @code{long} arrays comprising two (2) values. + * @throws IllegalArgumentException if the name does not indicate a cyclic + * hashing function. + */ +public CachingHasher(HashFunctionIdentity functionIdentity, List buffers) { +this.functionIdentity = checkIdentity(functionIdentity); +this.buffers = new ArrayList(buffers); +} + +/** + * Constructs a CachingHasher from an array of arrys of hash values. + * + * @param functionIdentity The identity of the function. + * @param buffers an array of @code{long} arrays comprising two (2) values. + * @throws IllegalArgumentException if the name does not indicate a cyclic + * hashing function. + */ +public CachingHasher(HashFunctionIdentity functionIdentity, long[][] buffers) { +this.functionIdentity = checkIdentity(functionIdentity); +this.buffers = Arrays.asList(buffers); +} + +/** + * Checks that the name is valid for this hasher. + * + * @param functionIdentity the Function Identity to check. 
+ */ +private static HashFunctionIdentity checkIdentity(HashFunctionIdentity functionIdentity) { +if (functionIdentity.getProcessType() != ProcessType.CYCLIC) { +throw new IllegalArgumentException("Only cyclic hash functions may be used in a caching hasher"); +} +return functionIdentity; +} + +@Override +public HashFunctionIdentity getHashFunctionIdentity() { +return functionIdentity; +} + +@Override +public boolean isEmpty() { +return buffers.isEmpty(); +} + + +@Override +public PrimitiveIterator.OfInt getBits(Shape shape) { +HashFunctionValidator.checkAreEqual(getHashFunctionIdentity(), +shape.getHashFunctionIdentity()); +return new IntIterator(shape); +} + +/** + * Gets the long representations of the buffers. + * + * This method returns the long representations of the buffers. This is commonly used + * to transmit the Hasher from one system to another. + * + * @return a copy if the long buffer representation. + */ +public List getBuffers() { +return new Arra
[GitHub] [commons-collections] Claudenw commented on a change in pull request #131: Added caching hasher
Claudenw commented on a change in pull request #131: Added caching hasher URL: https://github.com/apache/commons-collections/pull/131#discussion_r389242724 ## File path: src/main/java/org/apache/commons/collections4/bloomfilter/hasher/CachingHasher.java ## @@ -0,0 +1,233 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.commons.collections4.bloomfilter.hasher; + +import java.nio.charset.StandardCharsets; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.List; +import java.util.NoSuchElementException; +import java.util.PrimitiveIterator; + +import org.apache.commons.collections4.bloomfilter.hasher.HashFunctionIdentity.ProcessType; + +/** + * An implementation of Hasher that attempts to leak as little data as possible. + * Each item in the hasher is represented by two (2) longs. So this Hasher will + * still indicate how many items are in the hasher but will not leak the buffers + * that are being hashed as the @code DynamicHasher} does. + * + * This hasher only accepts HashFunctions that are cyclic in nature. + * + * @see DynamicHasher + * @see ProcessType + */ +public class CachingHasher implements Hasher { + +/** + * The list of byte arrays that are to be hashed. 
+ */ +private final List buffers; + +/** + * The hash function identity + */ +private final HashFunctionIdentity functionIdentity; + +/** + * Constructs a CachingHasher from a list of arrays of hash values. + * + * The list of hash values comprises a @code{List} where each @code{long[]} + * is comprises two (2) values that are the result of hashing the original buffer. Thus a + * CachingHasher that was built from five (5) buffers will have five arrays of two @code{longs} + * each. + * + * @param functionIdentity The identity of the function. + * @param buffers a list of @code{long} arrays comprising two (2) values. + * @throws IllegalArgumentException if the name does not indicate a cyclic + * hashing function. + */ +public CachingHasher(HashFunctionIdentity functionIdentity, List buffers) { +this.functionIdentity = checkIdentity(functionIdentity); +this.buffers = new ArrayList(buffers); +} + +/** + * Constructs a CachingHasher from an array of arrys of hash values. + * + * @param functionIdentity The identity of the function. + * @param buffers an array of @code{long} arrays comprising two (2) values. Review comment: fixed This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [commons-collections] Claudenw commented on a change in pull request #131: Added caching hasher
Claudenw commented on a change in pull request #131: Added caching hasher URL: https://github.com/apache/commons-collections/pull/131#discussion_r389242733 ## File path: src/main/java/org/apache/commons/collections4/bloomfilter/hasher/CachingHasher.java ## @@ -0,0 +1,233 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.commons.collections4.bloomfilter.hasher; + +import java.nio.charset.StandardCharsets; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.List; +import java.util.NoSuchElementException; +import java.util.PrimitiveIterator; + +import org.apache.commons.collections4.bloomfilter.hasher.HashFunctionIdentity.ProcessType; + +/** + * An implementation of Hasher that attempts to leak as little data as possible. + * Each item in the hasher is represented by two (2) longs. So this Hasher will + * still indicate how many items are in the hasher but will not leak the buffers + * that are being hashed as the @code DynamicHasher} does. + * + * This hasher only accepts HashFunctions that are cyclic in nature. + * + * @see DynamicHasher + * @see ProcessType + */ +public class CachingHasher implements Hasher { + +/** + * The list of byte arrays that are to be hashed. 
+ */ +private final List buffers; + +/** + * The hash function identity + */ +private final HashFunctionIdentity functionIdentity; + +/** + * Constructs a CachingHasher from a list of arrays of hash values. + * + * The list of hash values comprises a @code{List} where each @code{long[]} + * is comprises two (2) values that are the result of hashing the original buffer. Thus a + * CachingHasher that was built from five (5) buffers will have five arrays of two @code{longs} + * each. + * + * @param functionIdentity The identity of the function. + * @param buffers a list of @code{long} arrays comprising two (2) values. + * @throws IllegalArgumentException if the name does not indicate a cyclic + * hashing function. + */ +public CachingHasher(HashFunctionIdentity functionIdentity, List buffers) { +this.functionIdentity = checkIdentity(functionIdentity); +this.buffers = new ArrayList(buffers); +} + +/** + * Constructs a CachingHasher from an array of arrys of hash values. + * + * @param functionIdentity The identity of the function. + * @param buffers an array of @code{long} arrays comprising two (2) values. + * @throws IllegalArgumentException if the name does not indicate a cyclic + * hashing function. + */ +public CachingHasher(HashFunctionIdentity functionIdentity, long[][] buffers) { +this.functionIdentity = checkIdentity(functionIdentity); +this.buffers = Arrays.asList(buffers); +} + +/** + * Checks that the name is valid for this hasher. + * + * @param functionIdentity the Function Identity to check. 
+ */ +private static HashFunctionIdentity checkIdentity(HashFunctionIdentity functionIdentity) { +if (functionIdentity.getProcessType() != ProcessType.CYCLIC) { +throw new IllegalArgumentException("Only cyclic hash functions may be used in a caching hasher"); +} +return functionIdentity; +} + +@Override +public HashFunctionIdentity getHashFunctionIdentity() { +return functionIdentity; +} + +@Override +public boolean isEmpty() { +return buffers.isEmpty(); +} + + +@Override +public PrimitiveIterator.OfInt getBits(Shape shape) { +HashFunctionValidator.checkAreEqual(getHashFunctionIdentity(), +shape.getHashFunctionIdentity()); +return new IntIterator(shape); +} + +/** + * Gets the long representations of the buffers. + * Review comment: fixed This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this ser
[GitHub] [commons-collections] Claudenw commented on a change in pull request #131: Added caching hasher
Claudenw commented on a change in pull request #131: Added caching hasher URL: https://github.com/apache/commons-collections/pull/131#discussion_r389242645 ## File path: src/main/java/org/apache/commons/collections4/bloomfilter/hasher/CachingHasher.java ## @@ -0,0 +1,233 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.commons.collections4.bloomfilter.hasher; + +import java.nio.charset.StandardCharsets; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.List; +import java.util.NoSuchElementException; +import java.util.PrimitiveIterator; + +import org.apache.commons.collections4.bloomfilter.hasher.HashFunctionIdentity.ProcessType; + +/** + * An implementation of Hasher that attempts to leak as little data as possible. + * Each item in the hasher is represented by two (2) longs. So this Hasher will + * still indicate how many items are in the hasher but will not leak the buffers + * that are being hashed as the @code DynamicHasher} does. + * + * This hasher only accepts HashFunctions that are cyclic in nature. + * + * @see DynamicHasher + * @see ProcessType + */ +public class CachingHasher implements Hasher { + +/** + * The list of byte arrays that are to be hashed. 
+ */ +private final List buffers; + +/** + * The hash function identity + */ +private final HashFunctionIdentity functionIdentity; + +/** + * Constructs a CachingHasher from a list of arrays of hash values. + * + * The list of hash values comprises a @code{List} where each @code{long[]} Review comment: fixed This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [commons-collections] Claudenw commented on a change in pull request #131: Added caching hasher
Claudenw commented on a change in pull request #131: Added caching hasher URL: https://github.com/apache/commons-collections/pull/131#discussion_r389242640 ## File path: src/main/java/org/apache/commons/collections4/bloomfilter/hasher/CachingHasher.java ## @@ -0,0 +1,233 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.commons.collections4.bloomfilter.hasher; + +import java.nio.charset.StandardCharsets; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.List; +import java.util.NoSuchElementException; +import java.util.PrimitiveIterator; + +import org.apache.commons.collections4.bloomfilter.hasher.HashFunctionIdentity.ProcessType; + +/** + * An implementation of Hasher that attempts to leak as little data as possible. + * Each item in the hasher is represented by two (2) longs. So this Hasher will + * still indicate how many items are in the hasher but will not leak the buffers + * that are being hashed as the @code DynamicHasher} does. Review comment: fixed This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
[GitHub] [commons-collections] Claudenw commented on a change in pull request #131: Added caching hasher
Claudenw commented on a change in pull request #131: Added caching hasher
URL: https://github.com/apache/commons-collections/pull/131#discussion_r389242584

 ## File path: src/main/java/org/apache/commons/collections4/bloomfilter/hasher/HashFunctionIdentityImpl.java
 ##
 @@ -51,7 +51,8 @@ public HashFunctionIdentityImpl(final HashFunctionIdentity identity) {
      * @param process the processes of the hash function.
      * @param signature the signature for the hash function.
      */
 -    public HashFunctionIdentityImpl(final String provider, final String name, final Signedness signedness, final ProcessType process,
 +    public HashFunctionIdentityImpl(final String provider, final String name, final Signedness signedness,

Review comment:
reversed
[GitHub] [commons-collections] Claudenw commented on a change in pull request #131: Added caching hasher
Claudenw commented on a change in pull request #131: Added caching hasher
URL: https://github.com/apache/commons-collections/pull/131#discussion_r389242559

 ## File path: src/main/java/org/apache/commons/collections4/bloomfilter/hasher/HashFunctionValidator.java
 ##
 @@ -20,7 +20,7 @@
  /**
   * Contains validation for hash functions.
   */
 -final class HashFunctionValidator {
 +public final class HashFunctionValidator {

Review comment:
reversed - unintended for this pull
[GitHub] [commons-collections] dota17 opened a new pull request #138: Update the package-info.java
dota17 opened a new pull request #138: Update the package-info.java
URL: https://github.com/apache/commons-collections/pull/138
[jira] [Commented] (DAEMON-416) prunsrv.exe adding special character while executing in windows 2019
[ https://issues.apache.org/jira/browse/DAEMON-416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053944#comment-17053944 ]

Mark Thomas commented on DAEMON-416:

See also DAEMON-413.

> prunsrv.exe adding special character while executing in windows 2019
>
> Key: DAEMON-416
> URL: https://issues.apache.org/jira/browse/DAEMON-416
> Project: Commons Daemon
> Issue Type: Bug
> Environment: Windows 2019
> Reporter: rajiv devaraj
> Priority: Major
>
> While executing {{prunsrv.exe}} from an earlier version of Windows, (like
> 2016), the below command was working. But upon introducing the same statement
> for Windows 2019, it's not starting the service.
> I inspected the registry to check the path in which the process was created.
> It seems to add a random special character before the path, which is why it's
> not able to start.
> Observed Path:
>
> {{C:\/prunsrv.exe ^E//RS//}}
> Here near {{//RS}}, {{^E}} is prepended and it's random special character is
> generated for every execution.
> Expected Path:
>
> {{C:\/prunsrv.exe //RS//}}
> This is the line which is responsible to initiate the process which is
> triggered from {{.bat}} file:
>
> {{"%SOMEPATH%/bin/prunsrv.exe" //IS//%SERVICE_NAME%}}
> I need some help on how to remove this special character before the process
> is started.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DAEMON-416) prunsrv.exe adding special character while executing in windows 2019
[ https://issues.apache.org/jira/browse/DAEMON-416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053942#comment-17053942 ]

Mark Thomas commented on DAEMON-416:

Please re-test with the latest DAEMON release (1.2.2 as I type this).
[jira] [Commented] (CSV-148) CSVFormat.ignoreSurroundingSpaces is ignored when printing
[ https://issues.apache.org/jira/browse/CSV-148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053939#comment-17053939 ]

Chen commented on CSV-148:

Just replace
{code:java}
.withIgnoreSurroundingSpaces(true)
{code}
with
{code:java}
.withTrim()
{code}
and the unit test passes. So why do we need a patch?

> CSVFormat.ignoreSurroundingSpaces is ignored when printing
>
> Key: CSV-148
> URL: https://issues.apache.org/jira/browse/CSV-148
> Project: Commons CSV
> Issue Type: Bug
> Components: Printer
> Affects Versions: 1.1
> Environment: JDK 1.7
> Reporter: Piotr Ciruk
> Priority: Minor
> Fix For: Review
> Attachments: commons-csv_CSV-148.patch, commons-csv_CSV-148.patch
>
> It seems that {{CSVFormat}}'s property {{ignoreSurroundingSpaces}} is not
> taken into consideration while printing out values using {{CSVPrinter}}.
> Given:
> {code}
> System.out.println(
>     CSVFormat.DEFAULT
>         .withIgnoreSurroundingSpaces(true)
>         .format("",
>             " ",
>             " Single space on the left",
>             "Single space on the right ",
>             " Single spaces on both sides ",
>             " Multiple spaces on the left",
>             "Multiple spaces on the right",
>             " Multiple spaces on both sides ")
> );
> {code}
> Actual result:
> {code}
> ""," "," Single space on the left","Single space on the right "," Single spaces on both sides "," Multiple spaces on the left","Multiple spaces on the right"," Multiple spaces on both sides "
> {code}
> Expected result:
> {code}
> "","","Single space on the left","Single space on the right","Single spaces on both sides","Multiple spaces on the left","Multiple spaces on the right","Multiple spaces on both sides"
> {code}
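For readers following the CSV-148 discussion, the behaviour the reporter expects (surrounding whitespace removed from each value before it is printed) can be illustrated with a small standalone sketch. This is plain Java with a hypothetical helper, `TrimOnPrintSketch.formatTrimmed`; it is not Commons CSV code and does not show how `CSVPrinter` is implemented. It quotes every value only to mirror the expected output quoted in the issue.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: hypothetical helper, not part of Commons CSV. */
final class TrimOnPrintSketch {

    /** Trims each value, wraps it in double quotes, and joins the fields with commas. */
    static String formatTrimmed(String... values) {
        List<String> fields = new ArrayList<>();
        for (String value : values) {
            String trimmed = value == null ? "" : value.trim();
            // Double any embedded quote so the field stays well-formed.
            fields.add("\"" + trimmed.replace("\"", "\"\"") + "\"");
        }
        return String.join(",", fields);
    }

    public static void main(String[] args) {
        System.out.println(formatTrimmed("", " ", " Single space on the left",
                "Single space on the right "));
    }
}
```

Within Commons CSV itself, the comment above suggests `withTrim()` as the existing option that gives this result when printing.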