Igor, thanks for the reply.

> Approach with taskId will require a lot of changes in protocol and thus more "heavy" for implementation
Do you mean the approach with the server notifications mechanism? Yes, it will require a lot of changes. But in the most recent messages Pavel and I discussed an approach without the server notifications mechanism. That approach has the same complexity and performance as the approach with requestId.
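To make the comparison concrete, here is a minimal Java sketch of the variant without server notifications, as it was described earlier in this thread: the client generates the task id, the "execute" request receives its single response only when the task finishes, and cancellation is an ordinary request that carries the task id in its body. The class and method names (TaskIdSketch, execute, cancel, complete) are hypothetical and only model the exchange in memory; this is not Ignite code and not the actual wire protocol.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory model of the exchange; not Ignite code. */
class TaskIdSketch {
    /** Stand-in for server state: running tasks keyed by the client-generated task id. */
    private final Map<UUID, CompletableFuture<Object>> pending = new ConcurrentHashMap<>();

    /** "Execute task" request: the one response is delivered only when the task ends. */
    CompletableFuture<Object> execute(UUID taskId, String taskName, Object arg) {
        CompletableFuture<Object> res = new CompletableFuture<>();
        pending.put(taskId, res);
        // Forget the task as soon as its response (result or cancellation) is produced.
        res.whenComplete((r, e) -> pending.remove(taskId));
        return res;
    }

    /** "Cancel" request: a normal request/response pair that names the task by its id. */
    boolean cancel(UUID taskId) {
        CompletableFuture<Object> res = pending.get(taskId);
        return res != null && res.cancel(false);
    }

    /** Stand-in for the task finishing on the server side. */
    boolean complete(UUID taskId, Object result) {
        CompletableFuture<Object> res = pending.get(taskId);
        return res != null && res.complete(result);
    }
}
```

In this model, keying the map by the request id instead of a generated UUID would not change the number of messages exchanged, which is the sense in which the two variants are comparable.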

> But such clients as Python, Node.js, PHP, Go most probably won't have support for this API, at least for now.
Without a server notifications mechanism, there will be no breaking changes in the protocol, so a client implementation can just skip this feature and this protocol version and implement the next one.

> Or never.
I think it is still useful to execute Java compute tasks from non-Java thin clients. Also, we can provide some out-of-the-box Java tasks, for example an ExecutePythonScriptTask with a Python compute implementation, which can run a Python script on a server node.

> So, maybe it's a good time for us to change our backward compatibility mechanism from protocol versioning to feature masks?
I like the idea of feature masks, but it will force us to support both backward compatibility mechanisms: protocol versioning and feature masks.

On Mon, Jan 20, 2020 at 20:34, Pavel Tupitsyn <ptupit...@apache.org> wrote:

> Huge +1 from me for Feature Masks.
> I think this should be our top priority for thin client protocol, since it simplifies change management a lot.
>
> On Mon, Jan 20, 2020 at 8:21 PM Igor Sapego <isap...@apache.org> wrote:
>
> > Sorry for the late reply.
> >
> > Approach with taskId will require a lot of changes in protocol and thus more "heavy" for implementation, but it definitely looks to me less hacky than reqId-approach. Moreover, as was mentioned, server notifications mechanism will be required in the future anyway with high probability. So from this point of view I like taskId-approach.
> >
> > On the other hand, what we should also consider here is performance. Speaking of latency, it looks like reqId will have better results in case of small and fast tasks. The only question here is whether we want to optimize thin clients for this case.
> >
> > Also, what you are talking about mostly involves clients on platforms that already have Compute API for thick clients.
> > Let me mention one more point of view here and another concern here.
> >
> > The changes you propose are going to change protocol version for sure. In case with taskId approach and server notifications - even more so.
> >
> > But such clients as Python, Node.js, PHP, Go most probably won't have support for this API, at least for now. Or never. But current backward-compatibility mechanism implies protocol versions where we imply that client that supports version 1.5 also supports all the features introduced in all the previous versions of the protocol.
> >
> > Thus implementing Compute API in any of the proposed ways *may* force mentioned clients to support changes in protocol which they do not necessarily need in order to introduce new features in the future.
> >
> > So, maybe it's a good time for us to change our backward compatibility mechanism from protocol versioning to feature masks?
> >
> > WDYT?
> >
> > Best Regards,
> > Igor
> >
> > On Fri, Jan 17, 2020 at 9:37 AM Alex Plehanov <plehanov.a...@gmail.com> wrote:
> >
> > > Looks like we didn't reach consensus here.
> > >
> > > Igor, as thin client maintainer, can you please share your opinion?
> > >
> > > Everyone else is also welcome, please share your thoughts about options to implement operations for compute.
> > >
> > > On Thu, Nov 28, 2019 at 10:02, Alex Plehanov <plehanov.a...@gmail.com> wrote:
> > >
> > > > > Since all thin client operations are inherently async, we should be able to cancel any of them
> > > > It's illogical to have such an ability. What should a cancel operation on a cancel operation do? Moreover, sometimes it's dangerous, for example, a create cache operation should never be canceled. There should be an explicit set of processes that we can cancel: queries, transactions, tasks, services. The lifecycle of services is more complex than the lifecycle of tasks.
> > > > With services, I suppose, we can't use request cancelation, so tasks will be the only process with an exceptional pattern.
> > > >
> > > > > The request would be "execute task with specified node filter" - simple and efficient.
> > > > It's not simple: every compute or service request should contain complex node filtering logic, which duplicates the same logic for cluster API.
> > > > It's not efficient: for example, we can't implement forPredicate() filtering in this case.
> > > >
> > > > On Wed, Nov 27, 2019 at 19:25, Pavel Tupitsyn <ptupit...@apache.org> wrote:
> > > >
> > > > > > The request is already processed (task is started), we can't cancel the request
> > > > > The request is not "start a task". It is "execute task" (and get result). Same as "cache get" - you get a result in the end, we don't "start cache get" then "end cache get".
> > > > >
> > > > > Since all thin client operations are inherently async, we should be able to cancel any of them by sending another request with an id of prior request to be cancelled.
> > > > > That's why I'm advocating for this approach - it will work for anything, no special cases.
> > > > > And it keeps "happy path" as simple as it is right now.
> > > > >
> > > > > Queries are different because we retrieve results in pages, we can't do them as one request.
> > > > > Transactions are also different because client controls when they should end.
> > > > > There is no reason for task execution to be a special case like queries or transactions.
> > > > >
> > > > > > we always need to send 2 requests to server to execute the task
> > > > > Nope. We don't need to get nodes on client at all.
> > > > > The request would be "execute task with specified node filter" - simple and efficient.
> > > > >
> > > > > On Wed, Nov 27, 2019 at 4:31 PM Alex Plehanov <plehanov.a...@gmail.com> wrote:
> > > > >
> > > > > > > We do cancel a request to perform a task. We may and should use this to cancel any other request in future.
> > > > > > The request is already processed (task is started), we can't cancel the request. As you mentioned before, we already do almost the same for queries (close the cursor, but not cancel the request to run a query), it's better to do such things in a common way. We have a pattern: start some process (query, transaction), get id of this process, end process by this id. The "Execute task" process should match the same pattern. In my opinion, implementation with two-way requests is the best option to match this pattern (we can even reuse OP_RESOURCE_CLOSE operation type in this case). Sometime in the future, we will need two-way requests for some other functionality (continuous queries, event listening, etc). But even without two-way requests introducing some process id (task id in our case) will be closer to existing pattern than canceling tasks by request id.
> > > > > >
> > > > > > > So every new request will apply those filters on server side, using the most recent set of nodes.
> > > > > > In this case, we always need to send 2 requests to server to execute the task. First - to get nodes by the filter, second - to actually execute the task. It seems like overhead. The same will be for services. Cluster group remains the same if the topology hasn't changed. We can use this fact and bind "execute task" request to topology.
> > > > > > If topology has changed - get nodes for new topology and retry request.
> > > > > >
> > > > > > On Tue, Nov 26, 2019 at 17:44, Pavel Tupitsyn <ptupit...@apache.org> wrote:
> > > > > >
> > > > > > > > After all, we don't cancel request
> > > > > > > We do cancel a request to perform a task. We may and should use this to cancel any other request in future.
> > > > > > >
> > > > > > > > Client uses some cluster group filtration (for example forServers() cluster group)
> > > > > > > Please see above - Aleksandr Shapkin described how we store filtered cluster groups on client.
> > > > > > > We don't store node IDs, we store actual filters. So every new request will apply those filters on server side, using the most recent set of nodes.
> > > > > > >
> > > > > > > var myGrp = cluster.forServers().forAttribute("foo"); // This does not issue any server requests, just builds an object with filters on client
> > > > > > > while (true) myGrp.compute().executeTask("bar"); // Every request includes filters, and filters are applied on the server side
> > > > > > >
> > > > > > > On Tue, Nov 26, 2019 at 1:42 PM Alex Plehanov <plehanov.a...@gmail.com> wrote:
> > > > > > >
> > > > > > > > > Anyway, my point stands.
> > > > > > > > I can't agree. Why don't you want to use task id for this? After all, we don't cancel request (request is already processed), we cancel the task. So it's more convenient to use task id here.
> > > > > > > >
> > > > > > > > > Can you please provide equivalent use case with existing "thick" client?
> > > > > > > > For example:
> > > > > > > > Cluster consists of one server node.
> > > > > > > > Client uses some cluster group filtration (for example forServers() cluster group).
> > > > > > > > Client starts to send periodically (for example 1 per minute) long-term (for example 1 hour long) tasks to the cluster.
> > > > > > > > Meanwhile, several server nodes joined the cluster.
> > > > > > > >
> > > > > > > > In case of thick client: All server nodes will be used, tasks will be load balanced.
> > > > > > > > In case of thin client: Only one server node will be used, client will detect topology change after an hour.
> > > > > > > >
> > > > > > > > On Tue, Nov 26, 2019 at 11:50, Pavel Tupitsyn <ptupit...@apache.org> wrote:
> > > > > > > >
> > > > > > > > > > I can't see any usage of request id in query cursors
> > > > > > > > > You are right, cursor id is a separate thing.
> > > > > > > > > Anyway, my point stands.
> > > > > > > > >
> > > > > > > > > > client sends long term tasks to nodes and wants to do it with load balancing
> > > > > > > > > I still don't get it. Can you please provide equivalent use case with existing "thick" client?
> > > > > > > > >
> > > > > > > > > On Mon, Nov 25, 2019 at 11:59 PM Alex Plehanov <plehanov.a...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > > And it is fine to use request ID to identify compute tasks (as we do with query cursors).
> > > > > > > > > > I can't see any usage of request id in query cursors. We send query request and get cursor id in response. After that, we only use cursor id (to get next pages and to close the resource). Did I miss something?
> > > > > > > > > >
> > > > > > > > > > > Looks like I'm missing something - how is topology change relevant to executing compute tasks from client?
> > > > > > > > > > It's not relevant directly. But there are some cases where it will be helpful. For example, if client sends long term tasks to nodes and wants to do it with load balancing it will detect topology change only after some time in the future with the first response, so load balancing will not work. Perhaps we can add optional "topology version" field to the OP_COMPUTE_EXECUTE_TASK request to solve this problem.
> > > > > > > > > >
> > > > > > > > > > On Mon, Nov 25, 2019 at 22:42, Pavel Tupitsyn <ptupit...@apache.org> wrote:
> > > > > > > > > >
> > > > > > > > > > > Alex,
> > > > > > > > > > >
> > > > > > > > > > > > we will mix entities from different layers (transport layer and request body)
> > > > > > > > > > > I would not call our message header (which includes the id) "transport layer".
> > > > > > > > > > > TCP is our transport layer. And it is fine to use request ID to identify compute tasks (as we do with query cursors).
> > > > > > > > > > >
> > > > > > > > > > > > we still can't be sure that the task is successfully started on a server
> > > > > > > > > > > The request to start the task will fail and we'll get a response indicating that right away
> > > > > > > > > > >
> > > > > > > > > > > > we won't ever know about topology change
> > > > > > > > > > > Looks like I'm missing something - how is topology change relevant to executing compute tasks from client?
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Nov 25, 2019 at 10:17 PM Alex Plehanov <plehanov.a...@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Pavel, in this case, we will mix entities from different layers (transport layer and request body), it's not very good. We can achieve the same behavior with a task id generated on the client side, but there will be no inter-layer data intersection and I think it will be easier to implement on both client and server-side. But we still can't be sure that the task is successfully started on a server. We won't ever know about topology change, because the topology changed flag will be sent from server to client only with a response when the task is completed. Do we accept that?
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Nov 25, 2019 at 19:07, Pavel Tupitsyn <ptupit...@apache.org> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Alex,
> > > > > > > > > > > > >
> > > > > > > > > > > > > I have a simpler idea. We already do request id handling in the protocol, so:
> > > > > > > > > > > > > - Client sends a normal request to execute compute task. Request ID is generated as usual.
> > > > > > > > > > > > > - As soon as task is completed, a response is received.
> > > > > > > > > > > > >
> > > > > > > > > > > > > As for cancellation - client can send a new request (with new request ID) and (in the body) pass the request ID from above as a task identifier. As a result, there are two responses:
> > > > > > > > > > > > > - Cancellation response
> > > > > > > > > > > > > - Task response (with proper cancelled status)
> > > > > > > > > > > > >
> > > > > > > > > > > > > That's it, no need to modify the core of the protocol. One request - one response.
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, Nov 25, 2019 at 6:20 PM Alex Plehanov <plehanov.a...@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Pavel, we need to inform the client when the task is completed, we need the ability to cancel the task. I see several ways to implement this:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 1. Client sends a request to the server to start a task, server returns task id in response. Server notifies client when task is completed with a new request (from server to client). Client can cancel the task by sending a new request with operation type "cancel" and task id. In this case, we should implement 2-ways requests.
> > > > > > > > > > > > > > 2. Client generates unique task id and sends a request to the server to start a task, server doesn't reply immediately but waits until task is completed. Client can cancel task by sending new request with operation type "cancel" and task id. In this case, we should decouple request and response on the server-side (currently response is sent right after request was processed). Also, we can't be sure that task is successfully started on a server.
> > > > > > > > > > > > > > 3. Client sends a request to the server to start a task, server returns id in response. Client periodically asks the server about task status.
> > > > > > > > > > > > > > Client can cancel the task by sending new request with operation type "cancel" and task id. This case brings some overhead to the communication channel.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Personally, I think that the case with 2-ways requests is better, but I'm open to any other ideas.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Aleksandr,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Filtering logic for OP_CLUSTER_GROUP_GET_NODE_IDS looks overcomplicated. Do we need server-side filtering at all? Wouldn't it be better to send basic info (ids, order, flags) for all nodes (there is relatively small amount of data) and extended info (attributes) for selected list of nodes? In this case, we can do basic node filtration on client-side (forClients(), forServers(), forNodeIds(), forOthers(), etc).
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Do you use standard ClusterNode serialization? There are also metrics serialized with ClusterNode, do we need it on thin client?
> > > > > > > > > > > > > > There are other interfaces to show metrics, I think it's redundant to export metrics to thin clients too.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > What do you think?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Fri, Nov 22, 2019 at 20:15, Aleksandr Shapkin <lexw...@gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Alex,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I think you can create a new IEP page and I will fill it with the Cluster API details.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > In short, I've introduced several new codes:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Cluster API is pretty straightforward:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > OP_CLUSTER_IS_ACTIVE = 5000
> > > > > > > > > > > > > > > OP_CLUSTER_CHANGE_STATE = 5001
> > > > > > > > > > > > > > > OP_CLUSTER_CHANGE_WAL_STATE = 5002
> > > > > > > > > > > > > > > OP_CLUSTER_GET_WAL_STATE = 5003
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Cluster group codes:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > OP_CLUSTER_GROUP_GET_NODE_IDS = 5100
> > > > > > > > > > > > > > > OP_CLUSTER_GROUP_GET_NODE_INFO = 5101
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The underlying implementation is based on the thick client logic.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > For every request, we provide a known topology version and if it has changed, a client first updates it and then re-sends the filtering request.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Alongside the topVer a client sends a serialized nodes projection object that could be considered as a code to value mapping.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Consider: [{Code = 1, Value = ["DotNet", "MyAttribute"]}, {Code = 2, Value = 1}]
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Where "1" stands for Attribute filtering and "2" stands for the serverNodesOnly flag.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > As a result of request processing, a server sends nodeId UUIDs and a current topVer.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > When a client obtains nodeIds, it can perform a NODE_INFO call to get a serialized ClusterNode object. In addition there should be a different API method for accessing/updating node metrics.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Thu, Nov 21, 2019 at 12:32, Sergey Kozlov <skoz...@gridgain.com> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hi Pavel
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Thu, Nov 21, 2019 at 11:30 AM Pavel Tupitsyn <ptupit...@apache.org> wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > 1. I believe that Cluster operations for Thin Client protocol are already in the works by Alexandr Shapkin. Can't find the ticket though.
> > > > > > > > > > > > > > > > > Alexandr, can you please confirm and attach the ticket number?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > 2. Proposed changes will work only for Java tasks that are already deployed on server nodes.
> > > > > > > > > > > > > > > > > This is mostly useless for other thin clients we have (Python, PHP, .NET, C++).
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I don't guess so. The task (execution) is a way to implement own layer for the thin client application.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > We should think of a way to make this useful for all clients.
> > > > > > > > > > > > > > > > > For example, we may allow sending tasks in some scripting language like Javascript.
> > > > > > > > > > > > > > > > > Thoughts?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > The arbitrary code execution from a remote client must be protected from malicious code.
> > > > > > > > > > > > > > > > I don't know how it could be designed but without that we open the hole to kill cluster.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Thu, Nov 21, 2019 at 11:21 AM Sergey Kozlov <skoz...@gridgain.com> wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Hi Alex
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > The idea is great.
> > > > > > > > > > > > > > > > > > But I have some concerns that probably should be taken into account for design:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > 1. We need to have the ability to stop a task execution, something like an OP_COMPUTE_CANCEL_TASK operation (client to server)
> > > > > > > > > > > > > > > > > > 2. What about a task execution timeout? It may help the cluster survive buggy tasks
> > > > > > > > > > > > > > > > > > 3. Ignite doesn't have roles/authorization functionality for now. But a task is a risky operation for the cluster (for security reasons). Could we add new options to the Ignite configuration:
> > > > > > > > > > > > > > > > > >    - Explicit turning on for compute task support for thin protocol (disabled by default) for whole cluster
> > > > > > > > > > > > > > > > > >    - Explicit turning on for compute task support for a node
> > > > > > > > > > > > > > > > > >    - The list of task names (classes) allowed to execute by thin client.
> > > > > > > > > > > > > > > > > > 4. Support the labeling for task that may help to investigate issues on cluster (the idea from IEP-34 [1])
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > [1] https://cwiki.apache.org/confluence/display/IGNITE/IEP-34+Thin+client%3A+transactions+support
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Thu, Nov 21, 2019 at 10:58 AM Alex Plehanov <plehanov.a...@gmail.com> wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Hello, Igniters!
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > I have plans to start implementation of Compute interface for Ignite thin client and want to discuss features that should be implemented.
> > >
> > > We already have a Compute implementation for binary-rest clients
> > > (GridClientCompute), which has the following functionality:
> > > - Filtering cluster nodes (projection) for compute
> > > - Executing a task by name
> > >
> > > I think we can implement this functionality in a thin client as well.
> > >
> > > First of all, we need some operation types to request a list of all
> > > available nodes and probably node attributes (by a list of nodes).
> > > Node attributes will be helpful if we decide to implement an analog of
> > > the ClusterGroup#forAttribute or ClusterGroup#forPredicate methods in
> > > the thin client. Perhaps they can be requested lazily.
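To make the "requested lazily" option a bit more concrete, here is a minimal sketch, not a proposal for the real API. The class name LazyNodeAttributes and the fetcher callback are my assumptions; the callback stands in for whatever request ends up returning attributes for a list of node IDs. The filter is also simplified to an exact value match.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.UUID;
import java.util.function.Function;

/** Hypothetical client-side cache that loads node attributes only when a filter needs them. */
final class LazyNodeAttributes {
    /** Stand-in (assumption) for a server request returning attributes for the given node IDs. */
    private final Function<Collection<UUID>, Map<UUID, Map<String, Object>>> fetcher;

    /** Attributes already received from the server, keyed by node ID. */
    private final Map<UUID, Map<String, Object>> cache = new HashMap<>();

    LazyNodeAttributes(Function<Collection<UUID>, Map<UUID, Map<String, Object>>> fetcher) {
        this.fetcher = fetcher;
    }

    /** Returns the nodes whose attribute {@code name} equals {@code val}, fetching attributes on first use. */
    List<UUID> forAttribute(Collection<UUID> nodes, String name, Object val) {
        List<UUID> missing = new ArrayList<>();

        for (UUID id : nodes) {
            if (!cache.containsKey(id))
                missing.add(id);
        }

        // One batched request for all nodes whose attributes are not cached yet.
        if (!missing.isEmpty())
            cache.putAll(fetcher.apply(missing));

        List<UUID> res = new ArrayList<>();

        for (UUID id : nodes) {
            Map<String, Object> attrs = cache.get(id);

            if (attrs != null && attrs.containsKey(name) && Objects.equals(attrs.get(name), val))
                res.add(id);
        }

        return res;
    }
}
```

With this shape the node list can be requested eagerly while attributes cost a round trip only for clients that actually filter by them.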
> > >
> > > From the protocol point of view there will be two new operations:
> > >
> > > OP_CLUSTER_GET_NODES
> > > Request: empty
> > > Response: long topologyVersion, int minorTopologyVersion,
> > > int nodesCount, for each node set of node fields (UUID nodeId,
> > > Object or String consistentId, long order, etc)
> > >
> > > OP_CLUSTER_GET_NODE_ATTRIBUTES
> > > Request: int nodesCount, for each node: UUID nodeId
> > > Response: int nodesCount, for each node: int attributesCount,
> > > for each node attribute: String name, Object value
> > >
> > > To execute tasks we need something like these methods in the client
> > > API:
> > > Object execute(String task, Object arg)
> > > Future<Object> executeAsync(String task, Object arg)
> > > Object affinityExecute(String task, String cache, Object key,
> > > Object arg)
> > > Future<Object> affinityExecuteAsync(String task, String cache,
> > > Object key, Object arg)
> > >
> > > Which can be mapped to protocol operations:
> > >
> > > OP_COMPUTE_EXECUTE_TASK
> > > Request: UUID nodeId, String taskName, Object arg
> > > Response: Object result
> > >
> > > OP_COMPUTE_EXECUTE_TASK_AFFINITY
> > > Request: String cacheName, Object key, String taskName, Object arg
> > > Response: Object result
> > >
> > > The second operation is needed because we sometimes can't calculate
> > > and connect to affinity node on the client-side (affinity awareness
> > > can be disabled, custom affinity function can be used or there can be
> > > no connection between client and affinity node), but we can make best
> > > effort to send request to target node if affinity awareness is
> > > enabled.
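The best-effort routing described above can be sketched roughly as follows. This is only an illustration under assumptions: AffinityRouting and chooseNode are hypothetical names, and keyToNode stands in for whatever client-side partition mapping is available (null when it cannot be computed, e.g. with a custom affinity function). An empty result means "send OP_COMPUTE_EXECUTE_TASK_AFFINITY over any open connection and let the server find the affinity node".

```java
import java.util.Optional;
import java.util.Set;
import java.util.UUID;
import java.util.function.Function;

/** Hypothetical helper that picks a connection for an affinity task request. */
final class AffinityRouting {
    private AffinityRouting() {
        // Static helper, no instances.
    }

    /**
     * @param affinityAwarenessEnabled Whether the client tracks the partition mapping at all.
     * @param keyToNode Client-side key-to-primary-node mapping, or null if it cannot be computed.
     * @param connectedNodes Nodes the client currently has an open connection to.
     * @param key Affinity key of the task.
     * @return Node to send the request to, or empty to fall back to any connection.
     */
    static Optional<UUID> chooseNode(
        boolean affinityAwarenessEnabled,
        Function<Object, UUID> keyToNode,
        Set<UUID> connectedNodes,
        Object key
    ) {
        // Affinity awareness disabled or mapping unknown: the server has to route the request.
        if (!affinityAwarenessEnabled || keyToNode == null)
            return Optional.empty();

        UUID primary = keyToNode.apply(key);

        // No direct connection between the client and the affinity node.
        if (primary == null || !connectedNodes.contains(primary))
            return Optional.empty();

        return Optional.of(primary);
    }
}
```

Either way the request carries the cache name and key, so the result is the same; the direct route only saves a hop.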
> > >
> > > Currently, on the server side, requests are always processed
> > > synchronously and responses are sent right after the request was
> > > processed. To execute long tasks asynchronously we should either
> > > change this logic or introduce some kind of two-way communication
> > > between client and server (now only one-way requests from client to
> > > server are allowed).
> > >
> > > Two-way communication can also be useful in the future if we decide
> > > to send some server-side generated events to clients.
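For illustration only, here is roughly what two-way communication means on the client side: the reader thread has to handle both responses to the client's own requests and messages the server sends on its own initiative. TwoWayDispatcher and its methods are hypothetical names, and how the reader tells the two kinds of message apart (for example, a flag in the message header) is left open here.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BiConsumer;

/** Hypothetical client-side dispatcher for a channel carrying responses and server-initiated messages. */
final class TwoWayDispatcher {
    /** Generator of request IDs, unique per connection. */
    private final AtomicLong reqIdGen = new AtomicLong();

    /** Futures of requests that were sent but not answered yet. */
    private final Map<Long, CompletableFuture<Object>> pending = new ConcurrentHashMap<>();

    /** Callback for server-initiated messages: (operation code, payload). */
    private final BiConsumer<Integer, Object> serverMsgHandler;

    TwoWayDispatcher(BiConsumer<Integer, Object> serverMsgHandler) {
        this.serverMsgHandler = serverMsgHandler;
    }

    /** Registers an outgoing request and returns its ID; writing to the socket is up to the caller. */
    long newRequest(CompletableFuture<Object> fut) {
        long id = reqIdGen.incrementAndGet();

        pending.put(id, fut);

        return id;
    }

    /** Called by the reader thread when a response to a client request arrives. */
    void onResponse(long reqId, Object payload) {
        CompletableFuture<Object> fut = pending.remove(reqId);

        if (fut != null)
            fut.complete(payload);
    }

    /** Called by the reader thread when the server sends a message on its own initiative. */
    void onServerMessage(int opCode, Object payload) {
        serverMsgHandler.accept(opCode, payload);
    }
}
```

The one-way model only needs the pending map; the server message path is the new part that every client implementation would have to add.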
> > >
> > > In case of two-way communication there can be new operations
> > > introduced:
> > >
> > > OP_COMPUTE_EXECUTE_TASK (from client to server)
> > > Request: UUID nodeId, String taskName, Object arg
> > > Response: long taskId
> > >
> > > OP_COMPUTE_TASK_FINISHED (from server to client)
> > > Request: taskId, Object result
> > > Response: empty
> > >
> > > The same for affinity requests.
> > >
> > > Also, we can implement not only the execute task operation, but some
> > > other operations from IgniteCompute (broadcast, run, call), but it
> > > will be useful only for the java thin client. And even with the java
> > > thin client we should either implement peer-class-loading for thin
> > > clients (this also requires two-way client-server communication) or
> > > put classes with executed closures on the server locally.
> > >
> > > What do you think about the proposed protocol changes?
> > > Do we need two-way requests between client and server?
> > > Do we need support of compute methods other than "execute task"?
> > > What do you think about peer-class-loading for thin clients?
> >
> > --
> > Sergey Kozlov
> > GridGain Systems
> > www.gridgain.com
>
> --
> Alex.