[GitHub] spark issue #16864: [SPARK-19527][Core] Approximate Size of Intersection of ...

2017-06-23 Thread Bcpoole
Github user Bcpoole commented on the issue:

https://github.com/apache/spark/pull/16864
  
Looks good


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16864: [SPARK-19527][Core] Approximate Size of Intersect...

2017-02-10 Thread Bcpoole
Github user Bcpoole commented on a diff in the pull request:

https://github.com/apache/spark/pull/16864#discussion_r100624882
  
--- Diff: common/sketch/src/main/java/org/apache/spark/util/sketch/BloomFilterImpl.java ---
@@ -221,6 +221,49 @@ public BloomFilter mergeInPlace(BloomFilter other) throws IncompatibleMergeExcep
   }
 
   @Override
+  public double approxItems() {
+    double m = bitSize();
+    return (-m / numHashFunctions) * Math.log(1 - (bits.cardinality() / m));
--- End diff --

> Math is deprecated. Use math.

I assume you were thinking of Scala? This file is Java.
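
For readers following the diff above, here is a minimal standalone sketch of the same estimate, n ≈ -(m / k) * ln(1 - X / m), where m is the filter size in bits, k the number of hash functions, and X the number of set bits. The class name `ApproxItemsSketch` and the three-argument signature are assumptions made for illustration; they are not part of the pull request or of Spark's `BloomFilterImpl`.

```java
import java.util.BitSet;

// Hypothetical standalone sketch; not Spark's BloomFilterImpl.
final class ApproxItemsSketch {

  /**
   * Swamidass & Baldi (2007) estimate of the number of inserted items:
   * n ≈ -(m / k) * ln(1 - X / m).
   *
   * @param bitSize          m, the total number of bits in the filter
   * @param numHashFunctions k, the number of hash functions
   * @param setBits          X, the number of bits currently set
   */
  static double approxItems(long bitSize, int numHashFunctions, long setBits) {
    double m = bitSize;
    // m is a double, so setBits / m is a floating-point division
    // and is not truncated to 0 the way long / long would be.
    return (-m / numHashFunctions) * Math.log(1 - (setBits / m));
  }

  public static void main(String[] args) {
    // Stand-in for the filter's bit array: set every 8th bit of 1024.
    BitSet bits = new BitSet(1024);
    for (int i = 0; i < 1024; i += 8) {
      bits.set(i);
    }
    System.out.println(approxItems(1024, 4, bits.cardinality()));
  }
}
```

`Math` here is `java.lang.Math`, which is the standard way to call `log` from Java. Note that a fully saturated filter (X equal to m) makes the argument of `Math.log` zero, so the estimate comes out as positive infinity.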





[GitHub] spark pull request #16864: [SPARK-19527][Core] Approximate Size of Intersect...

2017-02-09 Thread Bcpoole
Github user Bcpoole commented on a diff in the pull request:

https://github.com/apache/spark/pull/16864#discussion_r100385258
  
--- Diff: 
common/sketch/src/main/java/org/apache/spark/util/sketch/BloomFilter.java ---
@@ -81,6 +81,11 @@ int getVersionNumber() {
   public abstract long bitSize();
 
   /**
+   * Swamidass & Baldi (2007) approximation for number of items in a Bloom filter
+   */
+  public abstract double approxItems();
--- End diff --

I was debating this due to possible rounding errors.





[GitHub] spark pull request #16864: [SPARK-19527][Core] Approximate Size of Intersect...

2017-02-09 Thread Bcpoole
Github user Bcpoole commented on a diff in the pull request:

https://github.com/apache/spark/pull/16864#discussion_r100381372
  
--- Diff: common/sketch/src/main/java/org/apache/spark/util/sketch/IncompatibleUnionException.java ---
@@ -0,0 +1,24 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.util.sketch;
+
+public class IncompatibleUnionException extends Exception {
--- End diff --

In that case IncompatibleMergeException needs it too :)





[GitHub] spark pull request #16864: [SPARK-19527][Core] Approximate Size of Intersect...

2017-02-08 Thread Bcpoole
GitHub user Bcpoole opened a pull request:

https://github.com/apache/spark/pull/16864

[SPARK-19527][Core] Approximate Size of Intersection of Bloom Filters

**What changes were proposed in this pull request?**

Added functions to compute the Swamidass & Baldi (2007) approximation for the
number of items in a Bloom filter and in the intersection of two filters. Added
an exception type, IncompatibleUnionException, mimicking
IncompatibleMergeException. Because the intersection approximation needs it,
there is also a function that creates the union of two Bloom filters without
mutating either one.
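
A rough sketch of how the pieces described above could fit together, assuming the intersection estimate is taken as n(A) + n(B) - n(A ∪ B), with the union built as a bitwise OR of two filters of the same size and hash count. The method names `createUnionBloomFilter` and `approxItemsInIntersection` come from the commit messages; the class name `IntersectionSketch`, the use of `java.util.BitSet`, and the signatures are assumptions for illustration and do not reproduce the patch.

```java
import java.util.BitSet;

// Hypothetical standalone sketch; not the code in the pull request.
final class IntersectionSketch {

  /** Swamidass & Baldi (2007) item estimate for one filter's bit array. */
  static double approxItems(long bitSize, int numHashFunctions, BitSet bits) {
    double m = bitSize;
    return (-m / numHashFunctions) * Math.log(1 - (bits.cardinality() / m));
  }

  /** Returns the bitwise OR of both filters; neither input is mutated. */
  static BitSet createUnionBloomFilter(BitSet a, BitSet b) {
    BitSet union = (BitSet) a.clone(); // copy first so 'a' stays untouched
    union.or(b);
    return union;
  }

  /**
   * Estimates |A ∩ B| as n(A) + n(B) - n(A ∪ B). Both filters are assumed
   * to share the same bit size and number of hash functions.
   */
  static double approxItemsInIntersection(
      long bitSize, int numHashFunctions, BitSet a, BitSet b) {
    BitSet union = createUnionBloomFilter(a, b);
    return approxItems(bitSize, numHashFunctions, a)
        + approxItems(bitSize, numHashFunctions, b)
        - approxItems(bitSize, numHashFunctions, union);
  }

  public static void main(String[] args) {
    BitSet a = new BitSet(1024);
    BitSet b = new BitSet(1024);
    a.set(0, 100);  // bits 0..99
    b.set(50, 150); // bits 50..149, overlapping 'a' on 50..99
    System.out.println(approxItemsInIntersection(1024, 4, a, b));
  }
}
```

Because the estimate is a difference of approximations, it can come out slightly negative for filters that share no set bits, so a caller may want to clamp it at zero.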

**How was this patch tested?**

Manual Tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Bcpoole/spark approxItemsInBloomFilterIntersection

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16864.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16864


commit 7a3ad46ff86bd3d2d47f6a56bace1a0c4fd171c8
Author: Bcpoole <brandoncpo...@gmail.com>
Date:   2017-02-09T01:11:07Z

Swamidass & Baldi approx. items in intersection of two Bloom filters. Also 
function to create union (non-mutation) of two Bloom filters.

commit b9680c57b2f8b1d93c28884de9a7ebbe52505f6c
Author: Bcpoole <brandoncpo...@gmail.com>
Date:   2017-02-09T01:42:36Z

Changed createUnionBloomFilter & approxItemsInIntersection to be instance 
instead of static functions

commit 501ad7e22101b00862c0c77ef8c38e1b166d33a4
Author: Bcpoole <brandoncpo...@gmail.com>
Date:   2017-02-09T01:53:50Z

Updated abstract class to reflect changes in previous commit







[GitHub] spark pull request #16863: Swamidass & Baldi Approximations

2017-02-08 Thread Bcpoole
Github user Bcpoole closed the pull request at:

https://github.com/apache/spark/pull/16863





[GitHub] spark pull request #16863: Swamidass & Baldi Approximations

2017-02-08 Thread Bcpoole
GitHub user Bcpoole reopened a pull request:

https://github.com/apache/spark/pull/16863

Swamidass & Baldi Approximations

## What changes were proposed in this pull request?

Added functions to compute the Swamidass & Baldi (2007) approximation for the
number of items in a Bloom filter and in the intersection of two filters. Added
an exception type, IncompatibleUnionException, mimicking
IncompatibleMergeException. Because the intersection approximation needs it,
there is also a function that creates the union of two Bloom filters without
mutating either one.

## How was this patch tested?

Manual Tests

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Bcpoole/spark approxItemsInBloomFilterIntersection

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16863.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16863


commit 7a3ad46ff86bd3d2d47f6a56bace1a0c4fd171c8
Author: Bcpoole <brandoncpo...@gmail.com>
Date:   2017-02-09T01:11:07Z

Swamidass & Baldi approx. items in intersection of two Bloom filters. Also 
function to create union (non-mutation) of two Bloom filters.

commit b9680c57b2f8b1d93c28884de9a7ebbe52505f6c
Author: Bcpoole <brandoncpo...@gmail.com>
Date:   2017-02-09T01:42:36Z

Changed createUnionBloomFilter & approxItemsInIntersection to be instance 
instead of static functions







[GitHub] spark pull request #16863: Swamidass & Baldi Approximations

2017-02-08 Thread Bcpoole
Github user Bcpoole closed the pull request at:

https://github.com/apache/spark/pull/16863





[GitHub] spark pull request #16863: Swamidass & Baldi Approximations

2017-02-08 Thread Bcpoole
GitHub user Bcpoole opened a pull request:

https://github.com/apache/spark/pull/16863

Swamidass & Baldi Approximations

## What changes were proposed in this pull request?

Added functions to compute the Swamidass & Baldi (2007) approximation for the
number of items in a Bloom filter and in the intersection of two filters. Added
an exception type, IncompatibleUnionException, mimicking
IncompatibleMergeException. Because the intersection approximation needs it,
there is also a function that creates the union of two Bloom filters without
mutating either one.

## How was this patch tested?

Manual Tests

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Bcpoole/spark approxItemsInBloomFilterIntersection

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16863.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16863


commit 7a3ad46ff86bd3d2d47f6a56bace1a0c4fd171c8
Author: Bcpoole <brandoncpo...@gmail.com>
Date:   2017-02-09T01:11:07Z

Swamidass & Baldi approx. items in intersection of two Bloom filters. Also 
function to create union (non-mutation) of two Bloom filters.







[GitHub] spark pull request #16861: Added function to get union of 2 Bloom filters (n...

2017-02-08 Thread Bcpoole
Github user Bcpoole closed the pull request at:

https://github.com/apache/spark/pull/16861





[GitHub] spark pull request #16861: Added function to get union of 2 Bloom filters (n...

2017-02-08 Thread Bcpoole
GitHub user Bcpoole opened a pull request:

https://github.com/apache/spark/pull/16861

Added function to get union of 2 Bloom filters (no mutation). Added S…

…wamidass & Baldi approximations for number of items in a Bloom filter 
and for the intersection of 2 Bloom filters given those Bloom filters.

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Bcpoole/spark intersectionOfBloomFilters

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16861.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16861


commit 934766b6527f512d8d81d2533899583b98d0219c
Author: Bcpoole <brandoncpo...@gmail.com>
Date:   2017-02-08T23:59:41Z

Added function to get union of 2 Bloom filters (no mutation). Added 
Swamidass & Baldi approximations for number of items in a Bloom filter and for 
the intersection of 2 Bloom filters given those Bloom filters.



