> Every nodeX will have the same "notification" process, which is listening
> to dbX/_changes.

Sorry, by "same" here I mean the same type of process, but obviously one
instance of it, running on each node.
--Giovanni

2015-11-13 13:00 GMT+01:00 Giovanni Lenzi <g.le...@smileupps.com>:
> Not sure I understood correctly. What you mean is:
>
> I create 3 nodes:
>   node1 with a single database named db1
>   node2 with a single database named db2
>   node3 with a single database named db3
>
> Then I create 3 continuous replications: db1 <-> db2, db1 <-> db3,
> db2 <-> db3.
>
> Every nodeX will have the same "notification" process, which is
> listening to dbX/_changes.
>
> What you mean is then: "use db_name as the filter instead of node_name,
> given that every nodeX will have one and only one database, dbX". Right?
>
> --Giovanni
>
> 2015-11-13 11:44 GMT+01:00 Alexander Shorin <kxe...@gmail.com>:
>> On Fri, Nov 13, 2015 at 1:28 PM, Giovanni Lenzi
>> <g.le...@smileupps.com> wrote:
>> >> No, slow is gathering all the stats, especially in a cluster. The
>> >> db_name you can get from req.userCtx without problem.
>> >
>> > Does req.userCtx currently contain db_name as well? I thought it was
>> > only for user data (username and roles). Are you saying it is
>> > possible to fetch db_name alone, or are you forced to fetch the
>> > entire set?
>>
>> Not db_name exactly, but:
>>
>> "userCtx": {
>>     "db": "mailbox",
>>     "name": "Mike",
>>     "roles": [
>>         "user"
>>     ]
>> }
>>
>> >> > Also, I was wondering how heavy it would be to include some kind
>> >> > of machine identifier (hostname or IP address of the machine
>> >> > running CouchDB) inside the request object?
>> >>
>> >> What is the use case for this? Technically, req.headers['Host']
>> >> points at the requested CouchDB.
>> >>
>> >> > Or, to make it even more flexible: how heavy would it be to
>> >> > include a configuration parameter inside the request object?
>> >> >
>> >> > That could be of great help in some N-node master-master redundant
>> >> > database configurations, to let one node only (the write node)
>> >> > handle some specific background action.
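[The full-mesh setup described above amounts to one continuous replication
per ordered pair of nodes. A minimal sketch that generates the matching
`_replicator`-style documents; node URLs and `_id`s here are made up for
illustration:]

```javascript
// Sketch: build continuous replication documents for a full mesh of
// single-database nodes (db1 <-> db2 becomes two documents, one per
// direction). URLs and ids below are illustrative only.
function meshReplications(nodes) {
  const docs = [];
  for (let i = 0; i < nodes.length; i++) {
    for (let j = 0; j < nodes.length; j++) {
      if (i === j) continue; // a <-> b needs both a->b and b->a
      docs.push({
        _id: 'repl-' + nodes[i].db + '-to-' + nodes[j].db,
        source: nodes[i].url + '/' + nodes[i].db,
        target: nodes[j].url + '/' + nodes[j].db,
        continuous: true
      });
    }
  }
  return docs;
}
```

[For 3 nodes this yields the 6 one-way replications behind the 3
bidirectional links above; each document would then be stored in a
`_replicator` database, or the equivalent posted to `/_replicate` on 1.x.]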
>> >> Can you describe this problem a little bit more? How could this
>> >> configuration parameter be used, and what would it be?
>> >
>> > OK, let's consider a 2-node setup with master-master replication and
>> > a round-robin load balancer in front. Under normal conditions, with
>> > master-master replication you can balance both read and write
>> > requests across every node, right?
>> >
>> > Now suppose we also need backend services (email, SMS, payments),
>> > implemented via some plugin or Node.js process (like triggerjob).
>> > These react to the database _changes feed, execute some background
>> > task, and then update the same document with a COMPLETED state. The
>> > drawback is that, in an N-node configuration, every node executes
>> > the same background tasks (N emails are sent instead of 1, N payment
>> > transactions instead of 1, and so on).
>> >
>> > OK, you may say, with HAProxy you can balance only reads (GET, HEAD)
>> > and use a single node for writes. But what if the write node goes
>> > down? I would no longer be able to write, only read.
>> >
>> > BUT we can probably do better. Let's step back to balancing both
>> > reads and writes. If we had a way to specify, in the update function
>> > itself, which node is in charge of executing those tasks, they could
>> > be executed only once! A trivial but efficient solution that comes
>> > to mind: let the backend task be handled by the node that received
>> > the write request. If the update function knew some kind of machine
>> > identifier (or a previously configured parameter), it could mark the
>> > task in the document itself with the name of the machine responsible
>> > for its execution. The plugin or Node.js process would then execute
>> > only the tasks allocated to it, by simply using a filtered _changes
>> > request with its own node name.
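[The allocation idea above can be sketched as a pair of design-document
functions. `NODE_NAME` stands for the machine identifier / configuration
parameter being requested in this thread; CouchDB does not expose one, so
it is a pure assumption here, as are the field names:]

```javascript
// Hypothetical machine identifier; the whole point of the request above
// is that CouchDB would supply this, e.g. via req or the config.
const NODE_NAME = 'node1';

// Update function sketch: stamp each task with the node that received
// the write, so only that node executes it.
function allocateTask(doc, req) {
  doc = doc || { _id: req.uuid, type: 'task' }; // req.uuid: server-provided UUID
  doc.assigned_node = NODE_NAME;
  doc.state = 'PENDING';
  return [doc, 'allocated to ' + NODE_NAME];
}

// Filter function sketch: each node's worker follows
// _changes?filter=tasks/for_node&node=<its own name>.
function tasksForNode(doc, req) {
  return doc.type === 'task' &&
         doc.state === 'PENDING' &&
         doc.assigned_node === req.query.node;
}
```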
>> >
>> > This solution has the benefit of letting system administrators run N
>> > identical nodes (same data, same ddocs and configuration; only the
>> > node name differs) which balance reads, writes, and backend task
>> > processing. You could then scale out by simply spawning a new node
>> > from the same Amazon AMI, for example.
>> >
>> > Am I missing something?
>>
>> That's what 2.0 is going to solve (:
>>
>> For 1.x I would use the following configuration:
>>
>> db1 --- /_changes --\
>> db2 --- /_changes ---> notification-process -> notification-db
>> dbN --- /_changes --/
>>
>> In the notification db you store all the tasks that need to be done
>> and those that are already done. Since your db1, db2, dbN are in sync,
>> their changes feeds will eventually produce similar events, which
>> you'll have to filter out using your notification-db data.
>>
>> --
>> ,,,^..^,,,
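[A rough sketch of that 1.x fan-in: one notification process consumes the
_changes feeds of db1..dbN; because the databases are in sync, the same
logical event arrives up to N times, so processed ids are recorded and
only the first occurrence triggers work. The Set stands in for the
notification-db, and performTask is a placeholder for the real
email/SMS/payment task:]

```javascript
// Sketch: deduplicating change handler for a single notification process
// fed by several replicas' _changes feeds.
function makeNotifier(performTask) {
  const done = new Set(); // stands in for the notification-db
  return function onChange(change) {
    // The same document has the same _id on every replica's feed.
    if (done.has(change.id)) return false; // already handled elsewhere
    done.add(change.id);
    performTask(change);
    return true;
  };
}
```

[A real implementation would also key on the revision or task state, and
persist the "done" records in the notification-db so they survive
restarts; the Set only illustrates the dedup step.]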