[jira] [Commented] (NIFI-3644) Add DetectDuplicateUsingHBase processor
[ https://issues.apache.org/jira/browse/NIFI-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16023456#comment-16023456 ] Joseph Witt commented on NIFI-3644: --- [~bjorn.ols...@gmail.com] and [~bbende] this is a really cool addition. Nice work and thanks! > Add DetectDuplicateUsingHBase processor > --- > > Key: NIFI-3644 > URL: https://issues.apache.org/jira/browse/NIFI-3644 > Project: Apache NiFi > Issue Type: Improvement > Components: Extensions >Reporter: Bjorn Olsen >Assignee: Bryan Bende >Priority: Minor > Fix For: 1.3.0 > > > The DetectDuplicate processor makes use of a distributed map cache for > maintaining a list of unique file identifiers (such as hashes). > The distributed map cache functionality could be provided by an HBase table, > which then allows for reliably storing a huge volume of file identifiers and > auditing information. The downside of this approach is of course that HBase > is required. > Storing the unique file identifiers in a reliable, query-able manner along > with some audit information is of benefit to several use cases. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (NIFI-3644) Add DetectDuplicateUsingHBase processor
[ https://issues.apache.org/jira/browse/NIFI-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16023451#comment-16023451 ] ASF GitHub Bot commented on NIFI-3644: -- Github user asfgit closed the pull request at: https://github.com/apache/nifi/pull/1645 > Add DetectDuplicateUsingHBase processor > --- > > Key: NIFI-3644 > URL: https://issues.apache.org/jira/browse/NIFI-3644 > Project: Apache NiFi > Issue Type: Improvement > Components: Extensions >Reporter: Bjorn Olsen >Priority: Minor > Fix For: 1.3.0 > > > The DetectDuplicate processor makes use of a distributed map cache for > maintaining a list of unique file identifiers (such as hashes). > The distributed map cache functionality could be provided by an HBase table, > which then allows for reliably storing a huge volume of file identifiers and > auditing information. The downside of this approach is of course that HBase > is required. > Storing the unique file identifiers in a reliable, query-able manner along > with some audit information is of benefit to several use cases. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (NIFI-3644) Add DetectDuplicateUsingHBase processor
[ https://issues.apache.org/jira/browse/NIFI-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16023450#comment-16023450 ] ASF subversion and git services commented on NIFI-3644: --- Commit ae3db823037ef01f8dc123e494f1d9e6522f29fe in nifi's branch refs/heads/master from [~bbende] [ https://git-wip-us.apache.org/repos/asf?p=nifi.git;h=ae3db82 ] NIFI-3644 Fixing the result handler in HBase_1_1_2_ClientMapCacheService to use the offsets for the value bytes This closes #1645. Signed-off-by: Bryan Bende > Add DetectDuplicateUsingHBase processor > --- > > Key: NIFI-3644 > URL: https://issues.apache.org/jira/browse/NIFI-3644 > Project: Apache NiFi > Issue Type: Improvement > Components: Extensions >Reporter: Bjorn Olsen >Priority: Minor > > The DetectDuplicate processor makes use of a distributed map cache for > maintaining a list of unique file identifiers (such as hashes). > The distributed map cache functionality could be provided by an HBase table, > which then allows for reliably storing a huge volume of file identifiers and > auditing information. The downside of this approach is of course that HBase > is required. > Storing the unique file identifiers in a reliable, query-able manner along > with some audit information is of benefit to several use cases. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (NIFI-3644) Add DetectDuplicateUsingHBase processor
[ https://issues.apache.org/jira/browse/NIFI-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16023328#comment-16023328 ] ASF GitHub Bot commented on NIFI-3644: -- Github user bbende commented on the issue: https://github.com/apache/nifi/pull/1645 Sorry for taking so long to get back to this... I tested this using PutDistributedMapCache and FetchDistributedMapCache, and noticed the value coming back from fetch wasn't exactly what I had stored. In HBaseRowHandler we had: `lastResultBytes = resultCell.getValueArray()` And we need: `lastResultBytes = Arrays.copyOfRange(resultCell.getValueArray(), resultCell.getValueOffset(), resultCell.getValueLength() + resultCell.getValueOffset());` I made a commit here that includes the change: https://github.com/bbende/nifi/commit/dc8f14d95d6cdbab2aa6e815269fe0d98faa2fe6 I also moved MockHBaseClientService into its own class so it can be used by both tests, so that we don't have to duplicate that code. Everything else looks good so I will go ahead and merge these changes together (your commit then mine). Thanks again for contributing, and sorry for the delay. > Add DetectDuplicateUsingHBase processor > --- > > Key: NIFI-3644 > URL: https://issues.apache.org/jira/browse/NIFI-3644 > Project: Apache NiFi > Issue Type: Improvement > Components: Extensions >Reporter: Bjorn Olsen >Priority: Minor > > The DetectDuplicate processor makes use of a distributed map cache for > maintaining a list of unique file identifiers (such as hashes). > The distributed map cache functionality could be provided by an HBase table, > which then allows for reliably storing a huge volume of file identifiers and > auditing information. The downside of this approach is of course that HBase > is required. > Storing the unique file identifiers in a reliable, query-able manner along > with some audit information is of benefit to several use cases. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
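The offset fix in the comment above can be illustrated without HBase. The following is a minimal sketch, assuming a cell value that occupies only a slice of a larger shared backing array. The class name `CellValueCopy`, the helper `copyValue`, and the sample bytes are hypothetical and are not part of NiFi or HBase; only `Arrays.copyOfRange` is the call quoted in the comment.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CellValueCopy {

    // Hypothetical helper mirroring the fix: copy only the bytes in
    // [valueOffset, valueOffset + valueLength) out of the backing array.
    static byte[] copyValue(byte[] backingArray, int valueOffset, int valueLength) {
        return Arrays.copyOfRange(backingArray, valueOffset, valueOffset + valueLength);
    }

    public static void main(String[] args) {
        // Assumed layout for illustration only: key, family and qualifier
        // share one array with the value, which sits at the end.
        String prefix = "rowkey|f|q|";
        String value = "hello";
        byte[] backing = (prefix + value).getBytes(StandardCharsets.UTF_8);

        int valueOffset = prefix.length(); // ASCII text, so chars == bytes
        int valueLength = value.length();

        // Using the whole backing array returns more than what was stored.
        String wholeArray = new String(backing, StandardCharsets.UTF_8);

        // Using offset and length returns exactly the stored value.
        String sliced = new String(copyValue(backing, valueOffset, valueLength),
                StandardCharsets.UTF_8);

        System.out.println("whole array: " + wholeArray);
        System.out.println("sliced value: " + sliced);
    }
}
```

The end index passed to `Arrays.copyOfRange` is exclusive, which is why the fix adds the length to the offset rather than passing the length directly.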
[jira] [Commented] (NIFI-3644) Add DetectDuplicateUsingHBase processor
[ https://issues.apache.org/jira/browse/NIFI-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16022295#comment-16022295 ] ASF GitHub Bot commented on NIFI-3644: -- Github user joewitt commented on the issue: https://github.com/apache/nifi/pull/1645 This looks like it could be pretty helpful! I wonder if in light of the recent LookupService work we should consider exposing/using this via that interface instead of or in addition to this distributed cache one. Thoughts? > Add DetectDuplicateUsingHBase processor > --- > > Key: NIFI-3644 > URL: https://issues.apache.org/jira/browse/NIFI-3644 > Project: Apache NiFi > Issue Type: Improvement > Components: Extensions >Reporter: Bjorn Olsen >Priority: Minor > > The DetectDuplicate processor makes use of a distributed map cache for > maintaining a list of unique file identifiers (such as hashes). > The distributed map cache functionality could be provided by an HBase table, > which then allows for reliably storing a huge volume of file identifiers and > auditing information. The downside of this approach is of course that HBase > is required. > Storing the unique file identifiers in a reliable, query-able manner along > with some audit information is of benefit to several use cases. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (NIFI-3644) Add DetectDuplicateUsingHBase processor
[ https://issues.apache.org/jira/browse/NIFI-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15962767#comment-15962767 ] ASF GitHub Bot commented on NIFI-3644: -- Github user baolsen commented on the issue: https://github.com/apache/nifi/pull/1645 @bbende Ready for another review! I've updated per your comments and added some unit tests. > Add DetectDuplicateUsingHBase processor > --- > > Key: NIFI-3644 > URL: https://issues.apache.org/jira/browse/NIFI-3644 > Project: Apache NiFi > Issue Type: Improvement > Components: Extensions >Reporter: Bjorn Olsen >Priority: Minor > > The DetectDuplicate processor makes use of a distributed map cache for > maintaining a list of unique file identifiers (such as hashes). > The distributed map cache functionality could be provided by an HBase table, > which then allows for reliably storing a huge volume of file identifiers and > auditing information. The downside of this approach is of course that HBase > is required. > Storing the unique file identifiers in a reliable, query-able manner along > with some audit information is of benefit to several use cases. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (NIFI-3644) Add DetectDuplicateUsingHBase processor
[ https://issues.apache.org/jira/browse/NIFI-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15959732#comment-15959732 ] Joseph Witt commented on NIFI-3644: --- [~bjorn.ols...@gmail.com] Very cool that you've taken the feedback and done so much with it! Thanks also to [~bbende] for reviewing and helping Bjorn make this happen. I suspect this will be a very popular feature! > Add DetectDuplicateUsingHBase processor > --- > > Key: NIFI-3644 > URL: https://issues.apache.org/jira/browse/NIFI-3644 > Project: Apache NiFi > Issue Type: Improvement > Components: Extensions >Reporter: Bjorn Olsen >Priority: Minor > > The DetectDuplicate processor makes use of a distributed map cache for > maintaining a list of unique file identifiers (such as hashes). > The distributed map cache functionality could be provided by an HBase table, > which then allows for reliably storing a huge volume of file identifiers and > auditing information. The downside of this approach is of course that HBase > is required. > Storing the unique file identifiers in a reliable, query-able manner along > with some audit information is of benefit to several use cases. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (NIFI-3644) Add DetectDuplicateUsingHBase processor
[ https://issues.apache.org/jira/browse/NIFI-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15959729#comment-15959729 ] ASF GitHub Bot commented on NIFI-3644: -- Github user bbende commented on a diff in the pull request: https://github.com/apache/nifi/pull/1645#discussion_r110264255 --- Diff: nifi-nar-bundles/nifi-standard-services/nifi-hbase_1_1_2-client-service-bundle/nifi-hbase_1_1_2-client-service/src/main/java/org/apache/nifi/hbase/HBase_1_1_2_ClientMapCacheService.java --- @@ -0,0 +1,224 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.nifi.hbase; + +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import java.util.ArrayList; +import java.util.List; + +import org.apache.nifi.annotation.documentation.CapabilityDescription; +import org.apache.nifi.annotation.documentation.SeeAlso; +import org.apache.nifi.annotation.documentation.Tags; +import org.apache.nifi.annotation.lifecycle.OnEnabled; +import org.apache.nifi.components.PropertyDescriptor; +import org.apache.nifi.controller.AbstractControllerService; +import org.apache.nifi.controller.ConfigurationContext; + +import org.apache.nifi.distributed.cache.client.DistributedMapCacheClient; +import org.apache.nifi.distributed.cache.client.Serializer; +import org.apache.nifi.distributed.cache.client.Deserializer; +import java.io.ByteArrayOutputStream; +import org.apache.nifi.reporting.InitializationException; + +import java.nio.charset.StandardCharsets; +import org.apache.nifi.hbase.scan.ResultCell; +import org.apache.nifi.hbase.scan.ResultHandler; +import org.apache.nifi.hbase.scan.Column; +import org.apache.nifi.hbase.put.PutColumn; + + +import org.apache.nifi.processor.util.StandardValidators; + +@Tags({"distributed", "cache", "state", "map", "cluster","hbase"}) +@SeeAlso(classNames = {"org.apache.nifi.distributed.cache.server.map.DistributedMapCacheClient", "org.apache.nifi.hbase.HBase_1_1_2_ClientService"}) +@CapabilityDescription("Provides the ability to use an HBase table as a cache, in place of a DistributedMapCache." 
++ " Uses a HBase_1_1_2_ClientService controller to communicate with HBase.") + +public class HBase_1_1_2_ClientMapCacheService extends AbstractControllerService implements DistributedMapCacheClient { + +static final PropertyDescriptor HBASE_CLIENT_SERVICE = new PropertyDescriptor.Builder() +.name("HBase Client Service") +.description("Specifies the HBase Client Controller Service to use for accessing HBase.") +.required(true) +.identifiesControllerService(HBaseClientService.class) +.build(); + +public static final PropertyDescriptor HBASE_CACHE_TABLE_NAME = new PropertyDescriptor.Builder() +.name("HBase Cache Table Name") +.description("Name of the table on HBase to use for the cache.") +.required(true) +.addValidator(StandardValidators.NON_EMPTY_VALIDATOR) +.build(); + +public static final PropertyDescriptor HBASE_COLUMN_FAMILY = new PropertyDescriptor.Builder() +.name("HBase Column Family") +.description("Name of the column family on HBase to use for the cache.") +.required(true) +.defaultValue("f") +.addValidator(StandardValidators.NON_EMPTY_VALIDATOR) +.build(); + +public static final PropertyDescriptor HBASE_COLUMN_QUALIFIER = new PropertyDescriptor.Builder() +.name("HBase Column Qualifier") +.description("Name of the column qualifier on HBase to use for the cache") +.defaultValue("q") +.required(true) +.addValidator(StandardValidators.NON_EMPTY_VALIDATOR) +.build(); + +@Override +protected List getSupportedPropertyDescriptors() { +final List descriptors = new ArrayList
[jira] [Commented] (NIFI-3644) Add DetectDuplicateUsingHBase processor
[ https://issues.apache.org/jira/browse/NIFI-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15959367#comment-15959367 ] ASF GitHub Bot commented on NIFI-3644: -- Github user baolsen commented on a diff in the pull request: https://github.com/apache/nifi/pull/1645#discussion_r110219758 --- Diff: nifi-nar-bundles/nifi-standard-services/nifi-hbase_1_1_2-client-service-bundle/nifi-hbase_1_1_2-client-service/src/main/java/org/apache/nifi/hbase/HBase_1_1_2_ClientMapCacheService.java --- @@ -0,0 +1,224 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.nifi.hbase; + +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import java.util.ArrayList; +import java.util.List; + +import org.apache.nifi.annotation.documentation.CapabilityDescription; +import org.apache.nifi.annotation.documentation.SeeAlso; +import org.apache.nifi.annotation.documentation.Tags; +import org.apache.nifi.annotation.lifecycle.OnEnabled; +import org.apache.nifi.components.PropertyDescriptor; +import org.apache.nifi.controller.AbstractControllerService; +import org.apache.nifi.controller.ConfigurationContext; + +import org.apache.nifi.distributed.cache.client.DistributedMapCacheClient; +import org.apache.nifi.distributed.cache.client.Serializer; +import org.apache.nifi.distributed.cache.client.Deserializer; +import java.io.ByteArrayOutputStream; +import org.apache.nifi.reporting.InitializationException; + +import java.nio.charset.StandardCharsets; +import org.apache.nifi.hbase.scan.ResultCell; +import org.apache.nifi.hbase.scan.ResultHandler; +import org.apache.nifi.hbase.scan.Column; +import org.apache.nifi.hbase.put.PutColumn; + + +import org.apache.nifi.processor.util.StandardValidators; + +@Tags({"distributed", "cache", "state", "map", "cluster","hbase"}) +@SeeAlso(classNames = {"org.apache.nifi.distributed.cache.server.map.DistributedMapCacheClient", "org.apache.nifi.hbase.HBase_1_1_2_ClientService"}) +@CapabilityDescription("Provides the ability to use an HBase table as a cache, in place of a DistributedMapCache." 
++ " Uses a HBase_1_1_2_ClientService controller to communicate with HBase.") + +public class HBase_1_1_2_ClientMapCacheService extends AbstractControllerService implements DistributedMapCacheClient { + +static final PropertyDescriptor HBASE_CLIENT_SERVICE = new PropertyDescriptor.Builder() +.name("HBase Client Service") +.description("Specifies the HBase Client Controller Service to use for accessing HBase.") +.required(true) +.identifiesControllerService(HBaseClientService.class) +.build(); + +public static final PropertyDescriptor HBASE_CACHE_TABLE_NAME = new PropertyDescriptor.Builder() +.name("HBase Cache Table Name") +.description("Name of the table on HBase to use for the cache.") +.required(true) +.addValidator(StandardValidators.NON_EMPTY_VALIDATOR) +.build(); + +public static final PropertyDescriptor HBASE_COLUMN_FAMILY = new PropertyDescriptor.Builder() +.name("HBase Column Family") +.description("Name of the column family on HBase to use for the cache.") +.required(true) +.defaultValue("f") +.addValidator(StandardValidators.NON_EMPTY_VALIDATOR) +.build(); + +public static final PropertyDescriptor HBASE_COLUMN_QUALIFIER = new PropertyDescriptor.Builder() +.name("HBase Column Qualifier") +.description("Name of the column qualifier on HBase to use for the cache") +.defaultValue("q") +.required(true) +.addValidator(StandardValidators.NON_EMPTY_VALIDATOR) +.build(); + +@Override +protected List getSupportedPropertyDescriptors() { +final List descriptors = new ArrayLis
[jira] [Commented] (NIFI-3644) Add DetectDuplicateUsingHBase processor
[ https://issues.apache.org/jira/browse/NIFI-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15959086#comment-15959086 ] ASF GitHub Bot commented on NIFI-3644: -- Github user baolsen commented on a diff in the pull request: https://github.com/apache/nifi/pull/1645#discussion_r110191840 --- Diff: nifi-nar-bundles/nifi-standard-services/nifi-hbase_1_1_2-client-service-bundle/nifi-hbase_1_1_2-client-service/src/main/java/org/apache/nifi/hbase/HBase_1_1_2_ClientMapCacheService.java --- @@ -0,0 +1,224 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.nifi.hbase; + +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import java.util.ArrayList; +import java.util.List; + +import org.apache.nifi.annotation.documentation.CapabilityDescription; +import org.apache.nifi.annotation.documentation.SeeAlso; +import org.apache.nifi.annotation.documentation.Tags; +import org.apache.nifi.annotation.lifecycle.OnEnabled; +import org.apache.nifi.components.PropertyDescriptor; +import org.apache.nifi.controller.AbstractControllerService; +import org.apache.nifi.controller.ConfigurationContext; + +import org.apache.nifi.distributed.cache.client.DistributedMapCacheClient; +import org.apache.nifi.distributed.cache.client.Serializer; +import org.apache.nifi.distributed.cache.client.Deserializer; +import java.io.ByteArrayOutputStream; +import org.apache.nifi.reporting.InitializationException; + +import java.nio.charset.StandardCharsets; +import org.apache.nifi.hbase.scan.ResultCell; +import org.apache.nifi.hbase.scan.ResultHandler; +import org.apache.nifi.hbase.scan.Column; +import org.apache.nifi.hbase.put.PutColumn; + + +import org.apache.nifi.processor.util.StandardValidators; + +@Tags({"distributed", "cache", "state", "map", "cluster","hbase"}) +@SeeAlso(classNames = {"org.apache.nifi.distributed.cache.server.map.DistributedMapCacheClient", "org.apache.nifi.hbase.HBase_1_1_2_ClientService"}) +@CapabilityDescription("Provides the ability to use an HBase table as a cache, in place of a DistributedMapCache." 
++ " Uses a HBase_1_1_2_ClientService controller to communicate with HBase.") + +public class HBase_1_1_2_ClientMapCacheService extends AbstractControllerService implements DistributedMapCacheClient { + +static final PropertyDescriptor HBASE_CLIENT_SERVICE = new PropertyDescriptor.Builder() +.name("HBase Client Service") +.description("Specifies the HBase Client Controller Service to use for accessing HBase.") +.required(true) +.identifiesControllerService(HBaseClientService.class) +.build(); + +public static final PropertyDescriptor HBASE_CACHE_TABLE_NAME = new PropertyDescriptor.Builder() +.name("HBase Cache Table Name") +.description("Name of the table on HBase to use for the cache.") +.required(true) +.addValidator(StandardValidators.NON_EMPTY_VALIDATOR) +.build(); + +public static final PropertyDescriptor HBASE_COLUMN_FAMILY = new PropertyDescriptor.Builder() +.name("HBase Column Family") +.description("Name of the column family on HBase to use for the cache.") +.required(true) +.defaultValue("f") +.addValidator(StandardValidators.NON_EMPTY_VALIDATOR) +.build(); + +public static final PropertyDescriptor HBASE_COLUMN_QUALIFIER = new PropertyDescriptor.Builder() +.name("HBase Column Qualifier") +.description("Name of the column qualifier on HBase to use for the cache") +.defaultValue("q") +.required(true) +.addValidator(StandardValidators.NON_EMPTY_VALIDATOR) +.build(); + +@Override +protected List getSupportedPropertyDescriptors() { +final List descriptors = new ArrayLis
[jira] [Commented] (NIFI-3644) Add DetectDuplicateUsingHBase processor
[ https://issues.apache.org/jira/browse/NIFI-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955646#comment-15955646 ] ASF GitHub Bot commented on NIFI-3644: -- Github user bbende commented on a diff in the pull request: https://github.com/apache/nifi/pull/1645#discussion_r109728311 --- Diff: nifi-nar-bundles/nifi-standard-services/nifi-hbase_1_1_2-client-service-bundle/nifi-hbase_1_1_2-client-service/src/main/java/org/apache/nifi/hbase/HBase_1_1_2_ClientMapCacheService.java --- @@ -0,0 +1,224 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.nifi.hbase; + +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import java.util.ArrayList; +import java.util.List; + +import org.apache.nifi.annotation.documentation.CapabilityDescription; +import org.apache.nifi.annotation.documentation.SeeAlso; +import org.apache.nifi.annotation.documentation.Tags; +import org.apache.nifi.annotation.lifecycle.OnEnabled; +import org.apache.nifi.components.PropertyDescriptor; +import org.apache.nifi.controller.AbstractControllerService; +import org.apache.nifi.controller.ConfigurationContext; + +import org.apache.nifi.distributed.cache.client.DistributedMapCacheClient; +import org.apache.nifi.distributed.cache.client.Serializer; +import org.apache.nifi.distributed.cache.client.Deserializer; +import java.io.ByteArrayOutputStream; +import org.apache.nifi.reporting.InitializationException; + +import java.nio.charset.StandardCharsets; +import org.apache.nifi.hbase.scan.ResultCell; +import org.apache.nifi.hbase.scan.ResultHandler; +import org.apache.nifi.hbase.scan.Column; +import org.apache.nifi.hbase.put.PutColumn; + + +import org.apache.nifi.processor.util.StandardValidators; + +@Tags({"distributed", "cache", "state", "map", "cluster","hbase"}) +@SeeAlso(classNames = {"org.apache.nifi.distributed.cache.server.map.DistributedMapCacheClient", "org.apache.nifi.hbase.HBase_1_1_2_ClientService"}) +@CapabilityDescription("Provides the ability to use an HBase table as a cache, in place of a DistributedMapCache." 
++ " Uses a HBase_1_1_2_ClientService controller to communicate with HBase.") + +public class HBase_1_1_2_ClientMapCacheService extends AbstractControllerService implements DistributedMapCacheClient { + +static final PropertyDescriptor HBASE_CLIENT_SERVICE = new PropertyDescriptor.Builder() +.name("HBase Client Service") +.description("Specifies the HBase Client Controller Service to use for accessing HBase.") +.required(true) +.identifiesControllerService(HBaseClientService.class) +.build(); + +public static final PropertyDescriptor HBASE_CACHE_TABLE_NAME = new PropertyDescriptor.Builder() --- End diff -- For the table, column family, and column qualifier, you may want to support expression language. There are obviously no flow files in this case, but if you set expressionLanguageSupported(true) on the property descriptors and call .evaluateAttributeExpressions() when you get the values, this would let someone reference an environment variable if they want to specify a different table across environments. > Add DetectDuplicateUsingHBase processor > --- > > Key: NIFI-3644 > URL: https://issues.apache.org/jira/browse/NIFI-3644 > Project: Apache NiFi > Issue Type: Improvement > Components: Extensions >Reporter: Bjorn Olsen >Priority: Minor > > The DetectDuplicate processor makes use of a distributed map cache for > maintaining a list of unique file identifiers (such as hashes). > The distributed map cache functionality could be provided by an HBase table, > which then allows for reliably storing a huge volume of file identifiers and > auditing information. The downside of this approach is of course that HBa
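The idea behind the expression language suggestion above, one configured value that resolves to a different table per environment, can be sketched in plain Java. This is a simplified stand-in, not NiFi's Expression Language and not the NiFi API: the class `TableNameResolver`, the method `resolve`, the variable name `cache.table`, and the choice to replace unknown references with an empty string are all assumptions made for illustration.

```java
import java.util.Collections;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TableNameResolver {

    // Matches simple ${name} references; real expression languages support far more.
    private static final Pattern REFERENCE = Pattern.compile("\\$\\{([A-Za-z0-9_.]+)\\}");

    // Hypothetical helper: substitute each ${name} with its value from the
    // supplied variables. Unknown names become "" (an assumption of this sketch).
    static String resolve(String configured, Map<String, String> variables) {
        Matcher matcher = REFERENCE.matcher(configured);
        StringBuffer resolved = new StringBuffer();
        while (matcher.find()) {
            String replacement = variables.getOrDefault(matcher.group(1), "");
            matcher.appendReplacement(resolved, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(resolved);
        return resolved.toString();
    }

    public static void main(String[] args) {
        // Each environment supplies its own value for the same property text.
        Map<String, String> dev = Collections.singletonMap("cache.table", "nifi_cache_dev");
        Map<String, String> prod = Collections.singletonMap("cache.table", "nifi_cache_prod");

        System.out.println(resolve("${cache.table}", dev));
        System.out.println(resolve("${cache.table}", prod));

        // A literal value without references passes through unchanged.
        System.out.println(resolve("nifi_cache", dev));
    }
}
```

In the controller service itself the substitution would be done by the framework when the property value is read, as the review comment describes; the sketch only shows why a single flow definition can then be promoted across environments unchanged.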
[jira] [Commented] (NIFI-3644) Add DetectDuplicateUsingHBase processor
[ https://issues.apache.org/jira/browse/NIFI-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955645#comment-15955645 ] ASF GitHub Bot commented on NIFI-3644: -- Github user bbende commented on a diff in the pull request: https://github.com/apache/nifi/pull/1645#discussion_r109752585 --- Diff: nifi-nar-bundles/nifi-standard-services/nifi-hbase_1_1_2-client-service-bundle/nifi-hbase_1_1_2-client-service/src/main/java/org/apache/nifi/hbase/HBase_1_1_2_ClientMapCacheService.java --- @@ -0,0 +1,224 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.nifi.hbase; + +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import java.util.ArrayList; +import java.util.List; + +import org.apache.nifi.annotation.documentation.CapabilityDescription; +import org.apache.nifi.annotation.documentation.SeeAlso; +import org.apache.nifi.annotation.documentation.Tags; +import org.apache.nifi.annotation.lifecycle.OnEnabled; +import org.apache.nifi.components.PropertyDescriptor; +import org.apache.nifi.controller.AbstractControllerService; +import org.apache.nifi.controller.ConfigurationContext; + +import org.apache.nifi.distributed.cache.client.DistributedMapCacheClient; +import org.apache.nifi.distributed.cache.client.Serializer; +import org.apache.nifi.distributed.cache.client.Deserializer; +import java.io.ByteArrayOutputStream; +import org.apache.nifi.reporting.InitializationException; + +import java.nio.charset.StandardCharsets; +import org.apache.nifi.hbase.scan.ResultCell; +import org.apache.nifi.hbase.scan.ResultHandler; +import org.apache.nifi.hbase.scan.Column; +import org.apache.nifi.hbase.put.PutColumn; + + +import org.apache.nifi.processor.util.StandardValidators; + +@Tags({"distributed", "cache", "state", "map", "cluster","hbase"}) +@SeeAlso(classNames = {"org.apache.nifi.distributed.cache.server.map.DistributedMapCacheClient", "org.apache.nifi.hbase.HBase_1_1_2_ClientService"}) +@CapabilityDescription("Provides the ability to use an HBase table as a cache, in place of a DistributedMapCache." 
++ " Uses a HBase_1_1_2_ClientService controller to communicate with HBase.") + +public class HBase_1_1_2_ClientMapCacheService extends AbstractControllerService implements DistributedMapCacheClient { + +static final PropertyDescriptor HBASE_CLIENT_SERVICE = new PropertyDescriptor.Builder() +.name("HBase Client Service") +.description("Specifies the HBase Client Controller Service to use for accessing HBase.") +.required(true) +.identifiesControllerService(HBaseClientService.class) +.build(); + +public static final PropertyDescriptor HBASE_CACHE_TABLE_NAME = new PropertyDescriptor.Builder() +.name("HBase Cache Table Name") +.description("Name of the table on HBase to use for the cache.") +.required(true) +.addValidator(StandardValidators.NON_EMPTY_VALIDATOR) +.build(); + +public static final PropertyDescriptor HBASE_COLUMN_FAMILY = new PropertyDescriptor.Builder() +.name("HBase Column Family") +.description("Name of the column family on HBase to use for the cache.") +.required(true) +.defaultValue("f") +.addValidator(StandardValidators.NON_EMPTY_VALIDATOR) +.build(); + +public static final PropertyDescriptor HBASE_COLUMN_QUALIFIER = new PropertyDescriptor.Builder() +.name("HBase Column Qualifier") +.description("Name of the column qualifier on HBase to use for the cache") +.defaultValue("q") +.required(true) +.addValidator(StandardValidators.NON_EMPTY_VALIDATOR) +.build(); + +@Override +protected List getSupportedPropertyDescriptors() { +final List descriptors = new ArrayList
[jira] [Commented] (NIFI-3644) Add DetectDuplicateUsingHBase processor
[ https://issues.apache.org/jira/browse/NIFI-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955647#comment-15955647 ]

ASF GitHub Bot commented on NIFI-3644:
--

Github user bbende commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1645#discussion_r109728428

    --- Diff: nifi-nar-bundles/nifi-standard-services/nifi-hbase_1_1_2-client-service-bundle/nifi-hbase_1_1_2-client-service/src/main/java/org/apache/nifi/hbase/HBase_1_1_2_ClientMapCacheService.java ---
    @@ -0,0 +1,224 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements. See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License. You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.nifi.hbase;
    +
    +import java.io.ByteArrayOutputStream;
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.List;
    +
    +import org.apache.nifi.annotation.documentation.CapabilityDescription;
    +import org.apache.nifi.annotation.documentation.SeeAlso;
    +import org.apache.nifi.annotation.documentation.Tags;
    +import org.apache.nifi.annotation.lifecycle.OnEnabled;
    +import org.apache.nifi.components.PropertyDescriptor;
    +import org.apache.nifi.controller.AbstractControllerService;
    +import org.apache.nifi.controller.ConfigurationContext;
    +
    +import org.apache.nifi.distributed.cache.client.DistributedMapCacheClient;
    +import org.apache.nifi.distributed.cache.client.Serializer;
    +import org.apache.nifi.distributed.cache.client.Deserializer;
    +import java.io.ByteArrayOutputStream;
    +import org.apache.nifi.reporting.InitializationException;
    +
    +import java.nio.charset.StandardCharsets;
    +import org.apache.nifi.hbase.scan.ResultCell;
    +import org.apache.nifi.hbase.scan.ResultHandler;
    +import org.apache.nifi.hbase.scan.Column;
    +import org.apache.nifi.hbase.put.PutColumn;
    +
    +
    +import org.apache.nifi.processor.util.StandardValidators;
    +
    +@Tags({"distributed", "cache", "state", "map", "cluster","hbase"})
    +@SeeAlso(classNames = {"org.apache.nifi.distributed.cache.server.map.DistributedMapCacheClient", "org.apache.nifi.hbase.HBase_1_1_2_ClientService"})
    +@CapabilityDescription("Provides the ability to use an HBase table as a cache, in place of a DistributedMapCache."
    +    + " Uses a HBase_1_1_2_ClientService controller to communicate with HBase.")
    +
    +public class HBase_1_1_2_ClientMapCacheService extends AbstractControllerService implements DistributedMapCacheClient {
    +
    +    static final PropertyDescriptor HBASE_CLIENT_SERVICE = new PropertyDescriptor.Builder()
    +            .name("HBase Client Service")
    +            .description("Specifies the HBase Client Controller Service to use for accessing HBase.")
    +            .required(true)
    +            .identifiesControllerService(HBaseClientService.class)
    +            .build();
    +
    +    public static final PropertyDescriptor HBASE_CACHE_TABLE_NAME = new PropertyDescriptor.Builder()
    +            .name("HBase Cache Table Name")
    +            .description("Name of the table on HBase to use for the cache.")
    +            .required(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    public static final PropertyDescriptor HBASE_COLUMN_FAMILY = new PropertyDescriptor.Builder()
    +            .name("HBase Column Family")
    +            .description("Name of the column family on HBase to use for the cache.")
    +            .required(true)
    +            .defaultValue("f")
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    public static final PropertyDescriptor HBASE_COLUMN_QUALIFIER = new PropertyDescriptor.Builder()
    +            .name("HBase Column Qualifier")
    +            .description("Name of the column qualifier on HBase to use for the cache")
    +            .defaultValue("q")
    +            .required(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    @Override
    +    protected List<PropertyDescriptor> getSupportedPropertyDescriptors() {
    +        final List<PropertyDescriptor> descriptors = new ArrayList
[jira] [Commented] (NIFI-3644) Add DetectDuplicateUsingHBase processor
[ https://issues.apache.org/jira/browse/NIFI-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955644#comment-15955644 ]

ASF GitHub Bot commented on NIFI-3644:
--

Github user bbende commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1645#discussion_r109727737

    --- Diff: nifi-nar-bundles/nifi-standard-services/nifi-hbase_1_1_2-client-service-bundle/nifi-hbase_1_1_2-client-service/src/main/java/org/apache/nifi/hbase/HBase_1_1_2_ClientMapCacheService.java ---
    @@ -0,0 +1,224 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements. See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License. You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.nifi.hbase;
    +
    +import java.io.ByteArrayOutputStream;
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.List;
    +
    +import org.apache.nifi.annotation.documentation.CapabilityDescription;
    +import org.apache.nifi.annotation.documentation.SeeAlso;
    +import org.apache.nifi.annotation.documentation.Tags;
    +import org.apache.nifi.annotation.lifecycle.OnEnabled;
    +import org.apache.nifi.components.PropertyDescriptor;
    +import org.apache.nifi.controller.AbstractControllerService;
    +import org.apache.nifi.controller.ConfigurationContext;
    +
    +import org.apache.nifi.distributed.cache.client.DistributedMapCacheClient;
    +import org.apache.nifi.distributed.cache.client.Serializer;
    +import org.apache.nifi.distributed.cache.client.Deserializer;
    +import java.io.ByteArrayOutputStream;
    +import org.apache.nifi.reporting.InitializationException;
    +
    +import java.nio.charset.StandardCharsets;
    +import org.apache.nifi.hbase.scan.ResultCell;
    +import org.apache.nifi.hbase.scan.ResultHandler;
    +import org.apache.nifi.hbase.scan.Column;
    +import org.apache.nifi.hbase.put.PutColumn;
    +
    +
    +import org.apache.nifi.processor.util.StandardValidators;
    +
    +@Tags({"distributed", "cache", "state", "map", "cluster","hbase"})
    +@SeeAlso(classNames = {"org.apache.nifi.distributed.cache.server.map.DistributedMapCacheClient", "org.apache.nifi.hbase.HBase_1_1_2_ClientService"})
    +@CapabilityDescription("Provides the ability to use an HBase table as a cache, in place of a DistributedMapCache."
    +    + " Uses a HBase_1_1_2_ClientService controller to communicate with HBase.")
    +
    +public class HBase_1_1_2_ClientMapCacheService extends AbstractControllerService implements DistributedMapCacheClient {
    +
    +    static final PropertyDescriptor HBASE_CLIENT_SERVICE = new PropertyDescriptor.Builder()
    +            .name("HBase Client Service")
    +            .description("Specifies the HBase Client Controller Service to use for accessing HBase.")
    +            .required(true)
    +            .identifiesControllerService(HBaseClientService.class)
    +            .build();
    +
    +    public static final PropertyDescriptor HBASE_CACHE_TABLE_NAME = new PropertyDescriptor.Builder()
    +            .name("HBase Cache Table Name")
    +            .description("Name of the table on HBase to use for the cache.")
    +            .required(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    public static final PropertyDescriptor HBASE_COLUMN_FAMILY = new PropertyDescriptor.Builder()
    +            .name("HBase Column Family")
    +            .description("Name of the column family on HBase to use for the cache.")
    +            .required(true)
    +            .defaultValue("f")
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    public static final PropertyDescriptor HBASE_COLUMN_QUALIFIER = new PropertyDescriptor.Builder()
    +            .name("HBase Column Qualifier")
    +            .description("Name of the column qualifier on HBase to use for the cache")
    +            .defaultValue("q")
    +            .required(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    @Override
    +    protected List<PropertyDescriptor> getSupportedPropertyDescriptors() {
    +        final List<PropertyDescriptor> descriptors = new ArrayList
[jira] [Commented] (NIFI-3644) Add DetectDuplicateUsingHBase processor
[ https://issues.apache.org/jira/browse/NIFI-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954708#comment-15954708 ]

ASF GitHub Bot commented on NIFI-3644:
--

Github user baolsen commented on the issue:

    https://github.com/apache/nifi/pull/1645

    Hi @bbende, please take a look at this PR.

    I've added HBase_1_1_2_ClientMapCacheService as a controller service which uses the HBase_1_1_2_ClientService to store a cache of values on HBase.
    Can be used in the DetectDuplicate processor in place of a DistributedMapCache (and other processors as well)

    Travis build is passing 4/5, not sure why one of the languages would fail.
    The AppVeyor build is failing on a specific test "TestListFile.testAttributesSet" which I don't think is mine.

    Let me know what you think.
    Thanks!

> Add DetectDuplicateUsingHBase processor
> ---
>
> Key: NIFI-3644
> URL: https://issues.apache.org/jira/browse/NIFI-3644
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Extensions
> Reporter: Bjorn Olsen
> Priority: Minor
>
> The DetectDuplicate processor makes use of a distributed map cache for
> maintaining a list of unique file identifiers (such as hashes).
> The distributed map cache functionality could be provided by an HBase table,
> which then allows for reliably storing a huge volume of file identifiers and
> auditing information. The downside of this approach is of course that HBase
> is required.
> Storing the unique file identifiers in a reliable, query-able manner along
> with some audit information is of benefit to several use cases.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
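[Editor's sketch] The comment above describes a controller service that keeps cache entries in an HBase table, and the PR diff configures a table name, a column family (default "f") and a column qualifier (default "q"). One plausible layout, shown here only as an illustration, is one row per cache key with the value in that single cell. FakeTable and TableBackedCache are hypothetical names, FakeTable is an in-memory stand-in rather than an HBase client, and the one-row-per-key layout is an assumption, not necessarily what the PR does.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for one HBase table:
// row key -> ("family:qualifier" -> cell value).
class FakeTable {
    private final Map<String, Map<String, byte[]>> rows = new HashMap<>();

    // Single-row lookup; returns null when the row or cell does not exist.
    synchronized byte[] get(String rowKey, String column) {
        Map<String, byte[]> row = rows.get(rowKey);
        return row == null ? null : row.get(column);
    }

    // Writes the cell only if it is currently empty; returns true if written.
    synchronized boolean putIfEmpty(String rowKey, String column, byte[] value) {
        Map<String, byte[]> row = rows.computeIfAbsent(rowKey, k -> new HashMap<>());
        return row.putIfAbsent(column, value) == null;
    }

    synchronized void delete(String rowKey) {
        rows.remove(rowKey);
    }
}

// Hypothetical cache facade: the cache key is used as the row key (assumption).
class TableBackedCache {
    private final FakeTable table;
    private final String column;

    TableBackedCache(FakeTable table, String family, String qualifier) {
        this.table = table;
        this.column = family + ":" + qualifier;
    }

    boolean putIfAbsent(String key, String value) {
        return table.putIfEmpty(key, column, value.getBytes(StandardCharsets.UTF_8));
    }

    boolean containsKey(String key) {
        return table.get(key, column) != null;
    }

    String get(String key) {
        byte[] cell = table.get(key, column);
        return cell == null ? null : new String(cell, StandardCharsets.UTF_8);
    }

    // Returns true if an entry existed before removal.
    boolean remove(String key) {
        boolean existed = containsKey(key);
        table.delete(key);
        return existed;
    }
}
```

With this shape, a duplicate check is a putIfAbsent call that returns false for a key that is already stored. A real HBase-backed version would need an atomic check-and-put on the server side to give the same guarantee across cluster nodes.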
[jira] [Commented] (NIFI-3644) Add DetectDuplicateUsingHBase processor
[ https://issues.apache.org/jira/browse/NIFI-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15952671#comment-15952671 ]

ASF GitHub Bot commented on NIFI-3644:
--

GitHub user baolsen opened a pull request:

    https://github.com/apache/nifi/pull/1645

    NIFI-3644 - Added HBase_1_1_2_ClientMapCacheService

    Added HBase_1_1_2_ClientMapCacheService which implements DistributedMapCacheClient.
    The DetectDuplicate processor can now make use of HBase_1_1_2_ClientMapCacheService for storing the duplicate cache on HBase.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/baolsen/nifi DistributedMapCacheHBaseClientService

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nifi/pull/1645.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1645

commit 8c0285b5efb6afd1607bb050650b758fed7d06e3
Author: baolsen
Date: 2017-03-23T12:35:43Z

    Update HBaseClientService.java

    Added "get" function call for doing single row lookup on HBase (HBase get)

commit 03d1b36376c6954d8bdcf4056314fced0cf0d1fc
Author: baolsen
Date: 2017-03-23T13:20:41Z

    Update HBase_1_1_2_ClientService.java

    Implemented "get" function for retrieval of single HBase rows.

commit 6dbca10e82b3b6b8ac94f8f0152b8fff85008082
Author: baolsen
Date: 2017-03-23T13:33:15Z

    Update HBase_1_1_2_ClientService.java

commit df30a22a3ba71fedfe1dffedefcc0eb64c3670b0
Author: baolsen
Date: 2017-03-23T13:40:08Z

    Update HBase_1_1_2_ClientService.java

commit 6d8036cc03ef49e41b92dbb5fa7e0de41cc15c3d
Author: baolsen
Date: 2017-03-23T13:44:12Z

    Update MockHBaseClientService.java

    Implemented "get" function with UnsupportedException

commit 4bcb26fd6a99a23852097f4f3db02cbeb6b8a3b5
Author: baolsen
Date: 2017-03-23T13:46:23Z

    Update HBase_1_1_2_ClientService.java

commit 4b266d9d1d112e2bf8aa198f87253d17c055dbbc
Author: baolsen
Date: 2017-03-23T13:50:09Z

    Update MockHBaseClientService.java

commit 2ef850bc7c2bce5f9dd35fc9ce5cf08c7ecf07c4
Author: baolsen
Date: 2017-03-29T08:51:11Z

    Test

commit e802f147bcd19664b9053e240ec1476ff7a61e7b
Author: baolsen
Date: 2017-03-29T08:52:35Z

    Test

commit 4cabff26658090c08d813e74d27894a9fd684c57
Author: baolsen
Date: 2017-03-31T07:59:50Z

    Completed initial development of HBase_1_1_2_ClientMapCacheService.java which is compatible with DetectDuplicate (and other processors)
    Still need to implement value deletion

commit 7790d3f5a8d56f0801d40ad2c836a8db7c123e1b
Author: baolsen
Date: 2017-03-31T08:31:06Z

    Undid changes to files for an earlier attempt at this

commit 594dc059cdbe708f10849c794b826d24e83e787d
Author: baolsen
Date: 2017-03-31T08:33:47Z

    Undid changes to files for an earlier attempt at this

commit fbd3034e736ecdd1d721cc788e5c984eee6560c7
Author: baolsen
Date: 2017-04-02T13:01:21Z

    Added remove() for cache and Documentation

> Add DetectDuplicateUsingHBase processor
> ---
>
> Key: NIFI-3644
> URL: https://issues.apache.org/jira/browse/NIFI-3644
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Extensions
> Reporter: Bjorn Olsen
> Priority: Minor
>
> The DetectDuplicate processor makes use of a distributed map cache for
> maintaining a list of unique file identifiers (such as hashes).
> The distributed map cache functionality could be provided by an HBase table,
> which then allows for reliably storing a huge volume of file identifiers and
> auditing information. The downside of this approach is of course that HBase
> is required.
> Storing the unique file identifiers in a reliable, query-able manner along
> with some audit information is of benefit to several use cases.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
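[Editor's sketch] Because the new service implements DistributedMapCacheClient, callers hand it typed keys and values together with serializers, and those have to become byte arrays before a single-row HBase "get" or put can be issued. The following is a minimal illustration of that step only. ValueSerializer and CacheBytes are hypothetical names; ValueSerializer is merely shaped like a write-to-stream serializer and is not the NiFi interface itself.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a cache-client serializer: writes a value to a stream.
interface ValueSerializer<T> {
    void serialize(T value, OutputStream out) throws IOException;
}

class CacheBytes {
    // Example serializer: a String written as UTF-8 bytes.
    static final ValueSerializer<String> UTF8 =
            (value, out) -> out.write(value.getBytes(StandardCharsets.UTF_8));

    // Runs the serializer against an in-memory buffer and returns the bytes,
    // which could then serve as a row key or a cell value.
    static <T> byte[] toBytes(T value, ValueSerializer<T> serializer) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        serializer.serialize(value, buffer);
        return buffer.toByteArray();
    }
}
```

Buffering through a ByteArrayOutputStream keeps the cache service independent of the key and value types: the caller decides the encoding, and the service only ever sees bytes.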
[jira] [Commented] (NIFI-3644) Add DetectDuplicateUsingHBase processor
[ https://issues.apache.org/jira/browse/NIFI-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15940467#comment-15940467 ]

Bjorn Olsen commented on NIFI-3644:
---

Hi Joe

Thanks for the suggestion, I hadn't considered writing an HBase version of DistributedMapCache.

I've already written my own DetectDuplicateUsingHBase processor today, as I needed something that was quick to develop.
Code here, much copy-pasta from DetectDuplicate:
https://github.com/baolsen/nifi/blob/DetectDuplicateUsingHBase/nifi-nar-bundles/nifi-hbase-bundle/nifi-hbase-processors/src/main/java/org/apache/nifi/hbase/DetectDuplicateUsingHBase.java

It seems that implementing an HBase-based DistributedMapCache is more complex, but more reusable. Do you have any suggestions for documentation for this sort of thing?

Lastly, do you think it is worth including DetectDuplicateUsingHBase or rather wait for a more reusable option? I'm a bit tight on time, and Java and NiFi are both new to me.
Meanwhile I can keep DetectDuplicateUsingHBase for my own use, so no worries there.

> Add DetectDuplicateUsingHBase processor
> ---
>
> Key: NIFI-3644
> URL: https://issues.apache.org/jira/browse/NIFI-3644
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Extensions
> Reporter: Bjorn Olsen
> Priority: Minor
>
> The DetectDuplicate processor makes use of a distributed map cache for
> maintaining a list of unique file identifiers (such as hashes).
> The distributed map cache functionality could be provided by an HBase table,
> which then allows for reliably storing a huge volume of file identifiers and
> auditing information. The downside of this approach is of course that HBase
> is required.
> Storing the unique file identifiers in a reliable, query-able manner along
> with some audit information is of benefit to several use cases.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Commented] (NIFI-3644) Add DetectDuplicateUsingHBase processor
[ https://issues.apache.org/jira/browse/NIFI-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15940247#comment-15940247 ]

Joseph Witt commented on NIFI-3644:
---

Bjorn,

We can add you to the contributors list in JIRA so that you can assign items to yourself. However, in the meantime you can definitely contribute and work on tasks without this.

For this concept please note you should only need to create an implementation of the DistributedMapCache which is backed by HBase rather than a new processor. DetectDuplicate can use any implementation of that interface by design.

Thanks
Joe

> Add DetectDuplicateUsingHBase processor
> ---
>
> Key: NIFI-3644
> URL: https://issues.apache.org/jira/browse/NIFI-3644
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Extensions
> Reporter: Bjorn Olsen
> Priority: Minor
>
> The DetectDuplicate processor makes use of a distributed map cache for
> maintaining a list of unique file identifiers (such as hashes).
> The distributed map cache functionality could be provided by an HBase table,
> which then allows for reliably storing a huge volume of file identifiers and
> auditing information. The downside of this approach is of course that HBase
> is required.
> Storing the unique file identifiers in a reliable, query-able manner along
> with some audit information is of benefit to several use cases.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
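[Editor's sketch] The design point in the comment above, that the processor depends only on the cache interface and so any implementation can be plugged in, can be illustrated roughly as follows. SimpleMapCache, InMemoryMapCache and DuplicateDetector are hypothetical, simplified stand-ins and not NiFi classes; the real DistributedMapCacheClient methods additionally take serializers and can throw IOException.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified stand-in for the map cache client interface.
interface SimpleMapCache {
    // Stores the value only if the key is absent; returns true if it was stored.
    boolean putIfAbsent(String key, String value);
}

// In-memory implementation. An HBase-backed class would implement the same
// interface and could be swapped in without touching DuplicateDetector.
class InMemoryMapCache implements SimpleMapCache {
    private final Map<String, String> entries = new ConcurrentHashMap<>();

    @Override
    public boolean putIfAbsent(String key, String value) {
        return entries.putIfAbsent(key, value) == null;
    }
}

// Stand-in for the processor: it sees only the interface, never the backing store.
class DuplicateDetector {
    private final SimpleMapCache cache;

    DuplicateDetector(SimpleMapCache cache) {
        this.cache = cache;
    }

    // Returns true if the identifier (for example a file hash) was already recorded.
    boolean isDuplicate(String identifier, String description) {
        return !cache.putIfAbsent(identifier, description);
    }
}
```

Under this arrangement, adding HBase support means writing one new implementation of the interface instead of a second copy of the duplicate-detection logic.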
[jira] [Commented] (NIFI-3644) Add DetectDuplicateUsingHBase processor
[ https://issues.apache.org/jira/browse/NIFI-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15939859#comment-15939859 ]

Bjorn Olsen commented on NIFI-3644:
---

Please assign to me

> Add DetectDuplicateUsingHBase processor
> ---
>
> Key: NIFI-3644
> URL: https://issues.apache.org/jira/browse/NIFI-3644
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Extensions
> Reporter: Bjorn Olsen
> Priority: Minor
>
> The DetectDuplicate processor makes use of a distributed map cache for
> maintaining a list of unique file identifiers (such as hashes).
> The distributed map cache functionality could be provided by an HBase table,
> which then allows for reliably storing a huge volume of file identifiers and
> auditing information. The downside of this approach is of course that HBase
> is required.
> Storing the unique file identifiers in a reliable, query-able manner along
> with some audit information is of benefit to several use cases.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)