So, do we have a real issue here? Or are we just overlapping two discussions about the same code-change with different context?

My impression from Bryan is, now that he has the bigger context of why implicit retries are important for HBase, that he is not against setting this value higher (although he also originally said this [1]), but just not the 1.x default of 35 retries (which *I* will say is too high). Am I reading too much into what you've said, Bryan?

I understand that folks are busy and this is probably not the highest priority. I'm just trying to figure out if we're actually still in disagreement. My gut says that we're not.

Re-stating goals: I think everyone is trying to make sure that the user of NiFi with HBase in the mix has a great user experience OOTB. That requires finding a balance between "fail fast when something is outwardly messed up" and "don't give up when HBase can implicitly ride over a transient/automatically-recoverable state".

[1] https://github.com/apache/nifi/pull/3425#issuecomment-487962996

On 2/19/20 11:14 AM, Lars Francke wrote:
That's what I suggested[1] and Bryan rejected[2]

[1] <https://github.com/apache/nifi/pull/3425#issuecomment-482711295>
[2] <https://github.com/apache/nifi/pull/3425#issuecomment-491984116>


On Wed, Feb 19, 2020 at 4:29 PM Josh Elser <els...@apache.org> wrote:

Thanks for sharing the code, Bryan! I lazily did not go digging.

Looking into #onEnabled, we could change this to create its own Connection with a very low retry count (1 or 2) and a very short retry duration. Throw that one away after we've done our sanity check, then set a high retry amount and create a new Connection for all of the puts/gets/scans/whatever the service will do.
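A rough sketch of that idea, assuming the plain HBase 2.x client API (the NiFi service wiring and error handling are omitted, and the retry values are only illustrative):

```
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

// Sanity check with a throwaway, low-retry Connection...
Configuration probeConf = HBaseConfiguration.create();
probeConf.setInt("hbase.client.retries.number", 1);
probeConf.setLong("hbase.client.pause", 100L);       // short backoff, in ms
try (Connection probe = ConnectionFactory.createConnection(probeConf);
     Admin admin = probe.getAdmin()) {
    admin.listNamespaceDescriptors();                // fails fast if the config is wrong
}

// ...then build the long-lived Connection with a higher retry count.
Configuration realConf = HBaseConfiguration.create();
realConf.setInt("hbase.client.retries.number", 15);
Connection connection = ConnectionFactory.createConnection(realConf);
// hold on to this Connection for all of the puts/gets/scans the service will do
```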

I think NiFi, in general, can err on the side of "lower" client retries
(both in number and duration of backoff) than a normal client since it
can implicitly buffer+retry flowfiles that fail.

WDYT, Lars?

On 2/19/20 9:24 AM, Bryan Bende wrote:
We do already expose the ability to configure the retries in the controller service [1]; it was just a debate about what the default value should be. Currently it is set to 1 because we believed it was better to fail fast during the initial configuration of the service. A user can easily set it back to 15, or whatever the HBase client normally does. I even suggested a compromise of making the default value 7, which was halfway between the two extremes, but that was considered unacceptable, even though, based on what you said in your original message, most cases work on the first retry, so 7 would have covered those.

We can also expose a property for the RPC timeout, but I think the retries are more the issue here.
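Such a property could look roughly like the sketch below, using NiFi's PropertyDescriptor builder; the name, default, and validator are made up for illustration and are not the code linked in [1]:

```
import org.apache.nifi.components.PropertyDescriptor;
import org.apache.nifi.processor.util.StandardValidators;

// Hypothetical property for illustration only (name and default are made up).
static final PropertyDescriptor HBASE_RPC_TIMEOUT = new PropertyDescriptor.Builder()
        .name("HBase Client RPC Timeout")
        .description("Value passed to the HBase client configuration as hbase.rpc.timeout.")
        .required(true)
        .defaultValue("10 secs")
        .addValidator(StandardValidators.TIME_PERIOD_VALIDATOR)
        .build();
```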

We can also definitely improve the docs, but I would still lean towards the default retries being something lower, and then the docs can explain that, after ensuring the service can be successfully started, this value can be increased to 15.

[1]
https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-services/nifi-hbase_2-client-service-bundle/nifi-hbase_2-client-service/src/main/java/org/apache/nifi/hbase/HBase_2_ClientService.java#L134-L140

On Tue, Feb 18, 2020 at 8:31 PM Josh Elser <els...@apache.org> wrote:

We could certainly implement some kind of "sanity check" via HBase code,
but I think the thing that is missing is the logical way to validate
this in NiFi itself.

Something like...

```
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

// Short timeout for the sanity check...
Configuration conf = HBaseConfiguration.create();
conf.setInt("hbase.rpc.timeout", 5000);
try (Connection conn = ConnectionFactory.createConnection(conf)) {
    // do sanity check
}

// ...then a more forgiving timeout for the long-lived connection.
Configuration conf2 = new Configuration(conf);
conf2.setInt("hbase.rpc.timeout", 25000);
try (Connection conn = ConnectionFactory.createConnection(conf2)) {
    // do real hbase-y stuff
}
```

Maybe instead of requiring an implicit way to do this (requiring NiFi
code changes), we could solve the problem at the "human level": create
docs that walk people through how to push a dummy record through the
service/processor with the low configuration of timeouts and retries?
Make the "sanity check" a human operation and just expose the ability to
set timeout/retries via the controller service?

On 2/18/20 4:36 PM, Lars Francke wrote:
Hi,

Josh, thanks for bringing it up here again.
I'm happy to revive the PR with whatever the outcome of this thread is.
It came up today because another client complained about how "unstable"
HBase is on NiFi.

@Josh: As the whole issue is only the initial connect, can we have a different timeout setting there? I have to admit I don't know.

Cheers,
Lars

On Tue, Feb 18, 2020 at 8:11 PM Pierre Villard <pierre.villard...@gmail.com> wrote:

Good point, I don't think we can do that on a controller service.

On Tue, Feb 18, 2020 at 11:06 AM Bryan Bende <bbe...@gmail.com> wrote:

That could make it a little better, but I can't remember: can we terminate on a controller service?

The issue here would be when first enabling the HBase client service, so before even getting to a processor.

On Tue, Feb 18, 2020 at 2:00 PM Pierre Villard <pierre.villard...@gmail.com> wrote:

Bryan,

I didn't follow the whole discussion so I apologize if I'm saying something stupid here. Now that we have the possibility to terminate threads in a processor, would that solve the issue?

Pierre

On Tue, Feb 18, 2020 at 10:52 AM Bryan Bende <bbe...@gmail.com> wrote:

Hi Josh,

The problem isn't so much about the retries within the flow, it's more about setting up the service for the first time.

A common scenario for users was the following:

- Create a new HBase client service
- Enter some config that wasn't quite correct, possibly hostnames that weren't reachable from NiFi as one example
- Enable the service and enter the retry loop
- Attempt to disable the service to fix the config, but have to wait 5+ mins for the retries to finish

Maybe a lazy initialization of the connection on our side would help here, although it would just be moving the problem until later (i.e. the service immediately enables because nothing is happening, then they find out about config problems later when a flow file hits an HBase processor).
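For illustration, lazy initialization here would mean something like the sketch below (purely illustrative, not the actual service code), where the Connection is only created the first time a processor actually uses the service:

```
import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

// Illustrative only: defer Connection creation until the first real use,
// so enabling the service never blocks on connection retries.
private volatile Connection connection;

private Connection getConnection() throws IOException {
    Connection local = connection;
    if (local == null) {
        synchronized (this) {
            local = connection;
            if (local == null) {
                connection = local = ConnectionFactory.createConnection(HBaseConfiguration.create());
            }
        }
    }
    return local;
}
```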

I guess the ideal scenario would be to have different logic for
initializing the connection vs. using it, so that there wouldn't be
retries during initialization.

-Bryan



On Tue, Feb 18, 2020 at 1:21 PM Josh Elser <josh.el...@gmail.com> wrote:

Hiya!

LarsF brought this up in the apache-hbase Slack and it caught my eye. Sending a note here since the PR where this was being discussed before is now closed [1].

I understand Bryan's concerns that misconfiguration of an HBase processor with a high number of retries and back-off can create a situation in which the processing of a single FlowFile will take a very long time to hit the onFailure state.

However, as an HBase developer, I can confidently state that hbase.client.retries=1 will create scenarios in which you'll be pushing a FlowFile through a retry loop inside of NiFi for things which should be implicitly retried inside of the HBase client.

For example, if a Region is being moved between two RegionServers and an HBase processor is trying to read/write to that Region, the client will see an exception. This is a "retriable" exception in HBase parlance, which means that the HBase client code would automatically re-process that request (looking up the new location of that Region first). In most cases, the subsequent RPC succeeds, the caller is none the wiser, and the whole retry took only a few milliseconds.
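For reference, the client-side knobs that govern that implicit retry behavior are roughly these (an illustrative snippet, with values taken from the numbers mentioned elsewhere in this thread):

```
Configuration conf = HBaseConfiguration.create();
// Number of times the client retries a "retriable" failure internally before
// surfacing the exception. The NiFi service's retries property defaults to 1;
// this thread mentions 15 as the usual client value and 35 as the old 1.x default.
conf.setInt("hbase.client.retries.number", 15);
// Base sleep between those internal retries, in milliseconds (the backoff grows from it).
conf.setLong("hbase.client.pause", 100L);
```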

My first idea was also what Lars had suggested -- can we come up with a sanity check to validate "correct" configuration for the processor before we throw the waterfall of data at it? I can respect it if processors don't have a "good" hook to do such a check.

What _would_ be the ideal semantics from NiFi's perspective? We have the ability to implicitly retry operations and also control the retry backoff values. Is there something more we could do from the HBase side, given what y'all have seen from the battlefield?

Thanks!

- Josh

[1] https://github.com/apache/nifi/pull/3425





