I think with the retry one has to distinguish between:

1. currently running tasks that have not yet completed (current timeout
   behavior)
2. tasks that have failed because the server died.

In case #1, retrying can only compound whatever load problem is already present, by forwarding the request to the cluster again and thus just adding unnecessary load. In this case we should NEVER just automatically retry.

In case #2, the client should retry against another server, because the original request might have been lost, never completed, or completed without the client ever being notified of the result. THIS would be the ONLY case where an auto retry makes sense....
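A minimal sketch of that split, purely for illustration -- the exception types and the submit helper below are hypothetical stand-ins, not the actual geode-native API:

#include <stdexcept>

// Hypothetical exception types: "the operation timed out" versus
// "the server we sent the request to went away".
struct OperationTimedOut : std::runtime_error
{
    using std::runtime_error::runtime_error;
};

struct ServerGone : std::runtime_error
{
    using std::runtime_error::runtime_error;
};

// Case #1: the server is slow -> surface the timeout, never add more load.
// Case #2: the server died -> one resubmit, which the pool can route elsewhere.
template <typename Op>
auto submitOnce(Op op)
{
    try
    {
        return op();
    }
    catch (const OperationTimedOut&)
    {
        throw;          // do not retry; the original request may still be executing
    }
    catch (const ServerGone&)
    {
        return op();    // retry on another server; the original may never have run
    }
}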

--Udo


On 8/31/17 11:10, Mark Hanson wrote:
This basic problem exists with the following cases.
Interval: to do something at an interval
Wait: to wait a certain length of time
Retry: to retry a certain number of times
Attempts: to make a certain number of attempts (similar to retry)
Sets of objects: to iterate through an unscoped set of objects.

On Thu, Aug 31, 2017 at 11:04 AM, Jacob Barrett <jbarr...@pivotal.io> wrote:

I should have scoped it to the native API.

On Aug 31, 2017, at 10:30 AM, Bruce Schuchardt <bschucha...@pivotal.io> wrote:
The DistributedLockService uses -1/0/n


On 8/31/17 10:21 AM, Jacob Barrett wrote:
In relation to this particular example you provided, the discussion of removing it is valid as an alternative to fixing it.

Are there other examples of this -1/0/n parameter style we should discuss?

-Jake


Sent from my iPhone

On Aug 31, 2017, at 10:15 AM, Mark Hanson <mhan...@pivotal.io> wrote:

As I understand it here, the question is: when the first server is no longer available, do we retry on another server? I would say the answer is clearly yes, and in the name of controlling load we want to have an API that controls the timing of how that is done. The customer can say no retries and write their own...

This is a bit of a digression from the much larger topic, though. The reason I was told to send this email was to broach the larger discussion of iteration and the overloading of -1 to mean infinite. At least that is my understanding...


On Thu, Aug 31, 2017 at 9:32 AM, Udo Kohlmeyer <ukohlme...@pivotal.io> wrote:

+1 to removing retry,

Imo, the retry should be made the responsibility of the submitting application. When an operation fails, the user should decide whether or not to retry. It should not be the default behavior of a connection pool.
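For illustration only, application-owned retry might look roughly like the sketch below; nothing here is the real pool API, the helper and its parameters are made up:

#include <chrono>
#include <stdexcept>
#include <thread>

// The pool performs exactly one attempt; the application wraps the call
// with whatever retry policy it chooses.
template <typename Op>
auto retryInApplication(Op op, int maxAttempts, std::chrono::milliseconds backoff)
{
    for (int attempt = 1;; ++attempt)
    {
        try
        {
            return op();                        // the single cache operation
        }
        catch (const std::exception&)
        {
            if (attempt >= maxAttempts) throw;  // the application decides when to give up
            std::this_thread::sleep_for(backoff);
        }
    }
}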

--Udo



On 8/31/17 09:26, Dan Smith wrote:

The java client does still have a retry-attempts setting - it's pretty much the same as the C++ API.

I agree with Bruce though, I think the current retry behavior is not ideal. I think it only really makes sense for the client to retry an operation that it actually sent to the server if the server stops responding to pings. I believe the current retry behavior just waits the read-timeout and then retries the operation on a new server.

-Dan

On Thu, Aug 31, 2017 at 8:08 AM, Bruce Schuchardt <bschucha...@pivotal.io> wrote:

Does anyone have a good argument for clients retrying operations?  I can see doing that if the server has died but otherwise it just overloads the servers.




On 8/30/17 8:36 PM, Dan Smith wrote:

In general, I think we need to make the configuration of geode less complex, not more.

As far as retry-attempts goes, maybe the best thing to do is to get rid of it. The P2P layer has no such concept. I don't think users should really have to care about how many servers an operation is attempted against. A user may want to specify how long an operation is allowed to take, but that could be better specified with an operation timeout rather than the current read-timeout + retry-attempts.
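A sketch of the contrast, using a made-up PoolConfig struct rather than the real PoolFactory (the operationTimeout knob is the hypothetical part):

#include <chrono>

// Hypothetical configuration, only to contrast the two models.
struct PoolConfig
{
    std::chrono::milliseconds readTimeout{10000};
    int retryAttempts = -1;                          // today: per-server knob, -1/0/n
    std::chrono::milliseconds operationTimeout{0};   // proposed: one overall bound
};

int main()
{
    // Today: worst-case latency is roughly read-timeout * (retry-attempts + 1),
    // and it also depends on how many servers happen to exist.
    PoolConfig current;
    current.readTimeout = std::chrono::seconds(10);
    current.retryAttempts = 3;                       // up to ~40s in the worst case

    // Proposed: the user states the only thing they actually care about.
    PoolConfig proposed;
    proposed.operationTimeout = std::chrono::seconds(15);
}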

-Dan



On Wed, Aug 30, 2017 at 2:08 PM, Patrick Rhomberg <prhomb...@pivotal.io> wrote:

Personally, I don't much like sentinel values, even if they have their occasional use.

Do we need to provide an authentic infinite value?  64-bit MAXINT is nearly 10 quintillion.  At 10GHz, counting that high still takes about thirty years.  If each retry takes as much as 10ms, we're looking at "retry for roughly as long as the earth has existed."  32-bit's is much more attainable, of course, but I think the point stands -- if you need to retry that much, something else is very wrong.
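For reference, the arithmetic behind those figures:

2^63 ≈ 9.2 × 10^18 retries
9.2 × 10^18 retries ÷ 10^10 retries/second ≈ 9.2 × 10^8 seconds ≈ 29 years
9.2 × 10^18 retries × 10 ms each ≈ 9.2 × 10^16 seconds ≈ 2.9 billion years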

In the more general sense, I struggle to think of a context where an authentic infinity is meaningfully distinct in application from a massive finite like MAXINT.  But I could be wrong and would love to hear what other people think.

On Wed, Aug 30, 2017 at 1:26 PM, Mark Hanson <mhan...@pivotal.io> wrote:

Hi All,

*Question: how should we deal, in a very forward and clean fashion, with the implicit ambiguity of -1 meaning all, infinite, or forever?*

*Background:*


We are looking to get some feedback on the subject of infinite/all/forever in the geode/geode-native code.

In looking at the code, we see an example function, setRetryAttempts() [1]. Currently, -1 means try all servers before failing, 0 means try 1 server before failing, and a number n greater than 0 means try n+1 servers before failing. In the case of setRetryAttempts, we don't know how many servers there are. This means that -1 for "all" servers has no relation to the actual number of servers that we have. Perhaps setRetryAttempts could be renamed to setNumberOfAttempts to clarify as well, but the problem still stands...
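To make that contract concrete, this is roughly how the client has to interpret the value today (simplified, not the actual geode-native internals):

// -1 => try every server the client currently knows about
//  0 => a single attempt, no retries
//  n => n + 1 attempts in total
int attemptsToMake(int retryAttempts, int knownServerCount)
{
    if (retryAttempts < 0)
    {
        return knownServerCount;  // "all", but the caller never knows this count up front
    }
    return retryAttempts + 1;
}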
*Discussion:*


In an attempt to provide the best code possible to the geode community, there has been some discussion of the use of infinite/all/forever as an overload of a count. Often -1 indicates infinite, while 0 indicates never, and 1 to MAXINT, inclusive, indicates a count.

There are three obvious approaches to solve the problem of the overloading of -1. The first approach is to do nothing… status quo. The second approach, to clarify things, would be to create an enumeration that would be passed in along with the number, or bundled as an object:

struct Retries
{
    typedef enum { eINFINITE, eCOUNT, eNONE } eCount;

    eCount approach;
    unsigned int count;
};
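Used at a call site it might read something like the following; the setter overload taking the struct is hypothetical:

Retries noRetry{Retries::eNONE, 0};
Retries tryThree{Retries::eCOUNT, 3};
Retries forever{Retries::eINFINITE, 0};   // count is ignored for eINFINITE

// poolFactory.setRetryAttempts(tryThree);  // hypothetical overload accepting the struct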



The third approach would be to pass a continue object of some sort, such that it tells you whether it is OK to continue with the algorithm. An example would be:

class Continue
{
public:
    virtual ~Continue() = default;

    // whether another attempt/iteration should be made
    virtual bool shouldContinue() = 0;
};


class InfiniteContinue : public Continue
{
public:
    bool shouldContinue() override
    {
        return true;   // never stop
    }
};


InfiniteContinue co;

while (co.shouldContinue())
{
    // do a thing
}


Another example would be a Continue limited to, let's say, 5:


class CountContinue : public Continue
{
private:
    int count;

public:
    explicit CountContinue(int count)
        : count(count)
    {
    }

    bool shouldContinue() override
    {
        return count-- > 0;   // allow exactly 'count' more iterations
    }
};


In both of these cases, what is happening is that the algorithm is being outsourced.
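The loop itself can then be written once and handed whichever policy the caller wants, along these lines (runWithPolicy and example are names made up for illustration):

void runWithPolicy(Continue& policy)
{
    while (policy.shouldContinue())
    {
        // do a thing; break out on success
    }
}

void example()
{
    InfiniteContinue forever;
    CountContinue fiveTimes(5);

    runWithPolicy(fiveTimes);  // the caller chose a bounded count
    runWithPolicy(forever);    // or "infinite", with the same loop
}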


*Conclusion:*


We are putting this out to start a discussion on the best way to move this forward… *What do people think? What direction would be best going forward?*


[1] https://github.com/apache/geode-native/blob/006df0e70eeb481ef5e9e821dba0050dee9c6893/cppcache/include/geode/PoolFactory.hpp#L327

