Re: [ovs-dev] OVN meeting report
Hi Ben, On 13.04.2017 20:53, Ben Pfaff wrote: On Wed, Apr 12, 2017 at 06:09:28PM +0500, Valentine Sinitsyn wrote: Hi, On 04.04.2017 15:29, Valentine Sinitsyn wrote: On 03.04.2017 20:29, Valentine Sinitsyn wrote: Hi Ben, On 23.03.2017 08:11, Ben Pfaff wrote: Hello everyone. I am not sure whether I am going to be able to attend the OVN meeting tomorrow, because I will be in another possibly distracting meeting, so I'm going to give my report here. Toward the end of last week I did a full pass of reviews through patchwork. The most notable result, I think, is that I applied patches that add 802.1ad support. For OVN, this makes it more reasonable to consider adding support for tagged logical ports--currently, OVN drops all tagged logical packets--which I've heard requested once or twice, because it means that they can now be gatewayed to physical ports within an outer VLAN. I don't have any plans to work on that, but I think that it is worth pointing out. The OVS "Open Source Day" talks have been scheduled at OpenStack Boston. They are all on Wednesday: https://www.openstack.org/summit/boston-2017/summit-schedule/#track=135 I've been spending what dev time I have on database clustering. Today, I managed to get it working, with many caveats. It will take weeks or months longer to get it finished, tested, and ready for posting. (If you want what I have, check out the raft3 branch in my ovs-reviews repo at github.) I've checked out your raft3 branch, and even learned how to create an OVSDB cluster. Thanks for the docs! What I don't get though is how do I instruct IDL to connect to the cluster now? Do I just connect to a random server, or there should be some dispatcher, or whatever? OK I see this is an ongoing work in your branch. I had some time to play with raft3 branch last week. I added very basic and hacky replica set support to IDL and brought up an OVN setup with clustered southbound database. 
It works to some extent, yet if I try to throw several hundreds of logical ports into the mix, the database becomes inconsistent. The reason is probably the race window between when the raft leader appends a log entry to other nodes (so a client such as ovn-northd already sees it) and the entry really appears in the leader's log itself. Not sure if it is my bug or not. The original code had some minor issues as well (which is absolutely normal for WIP) - I can send my (rather trivial) patches if there is any interest. I'm not surprised that there are inconsistency bugs. The testing I've done so far is really sketchy. Let me assure you that I will implement much more thorough testing before I will propose anything to be merged. Sure, I didn't expect it to be bug free either. Is there some design outline for the missing implementation bits? Specifically, it would be good to know the following: 1. With clustered OVSDB, a client such as IDL needs two JSON RPC connections: to the leader (to commit transactions), and a read-only one to an arbitrary replica set (scaling reads). Will it be implemented on ovsdb_idl level or encapsulated inside jsonrpc_session? The former seems natural yet multiple remotes support went to jsonrpc_session already. There are multiple possible approaches here. The one that I am planning to try out first is to have a client connect to only one randomly selected server, and then have that server be responsible for relaying write transactions to the leader. Yes, this is an option. However, our tests suggest that ovsdb-server doesn't scale well with respect to (hundreds to thousands) connections. This relay approach adds at most one new connection within the cluster per new client connection, which could be a bottleneck. Thanks, Valentine 2. How does the client know which replica set member is currently a leader? I just loop over remotes until one accepts the transaction (which is an awful idea). 
It would be nice to send some sort of cluster metadata snapshot to JSON RPC client during initial handshake. Alternatively, one can extend the "not leader" error object with a leader URL. If we do adopt the idea that followers relay write transactions to the leader, then the client doesn't need to know the leader. But if that isn't practical, then the Raft thesis, section 6.2, suggests the same idea as you did, of having the follower point to the leader if it knows it. 3. For eventual consistency reasons, if an IDL reads from one member (A) but writes to another one (B), it can try to delete a row not yet in A's database. This would make all further requests fail with "inconsistent data" error and basically is what I observe in my tests. How do you plan to overcome this? This sounds like a bug in the existing code (not too surprising). What is supposed to happen is that the client waits until it receives updated data from the server, which it knows will eventually arrive because it knows that its write wa
Re: [ovs-dev] OVN meeting report
Hi, On 04.04.2017 15:29, Valentine Sinitsyn wrote: On 03.04.2017 20:29, Valentine Sinitsyn wrote: Hi Ben, On 23.03.2017 08:11, Ben Pfaff wrote: Hello everyone. I am not sure whether I am going to be able to attend the OVN meeting tomorrow, because I will be in another possibly distracting meeting, so I'm going to give my report here. Toward the end of last week I did a full pass of reviews through patchwork. The most notable result, I think, is that I applied patches that add 802.1ad support. For OVN, this makes it more reasonable to consider adding support for tagged logical ports--currently, OVN drops all tagged logical packets--which I've heard requested once or twice, because it means that they can now be gatewayed to physical ports within an outer VLAN. I don't have any plans to work on that, but I think that it is worth pointing out. The OVS "Open Source Day" talks have been scheduled at OpenStack Boston. They are all on Wednesday: https://www.openstack.org/summit/boston-2017/summit-schedule/#track=135 I've been spending what dev time I have on database clustering. Today, I managed to get it working, with many caveats. It will take weeks or months longer to get it finished, tested, and ready for posting. (If you want what I have, check out the raft3 branch in my ovs-reviews repo at github.) I've checked out your raft3 branch, and even learned how to create an OVSDB cluster. Thanks for the docs! What I don't get though is how do I instruct IDL to connect to the cluster now? Do I just connect to a random server, or there should be some dispatcher, or whatever? OK I see this is an ongoing work in your branch. I had some time to play with raft3 branch last week. I added very basic and hacky replica set support to IDL and brought up an OVN setup with clustered southbound database. It works to some extent, yet if I try to throw several hundreds of logical ports into the mix, the database becomes inconsistent. 
The reason is probably a race window between the moment the Raft leader sends a log entry to the other nodes (so a client such as ovn-northd can already see it) and the moment the entry actually appears in the leader's own log. Not sure if it is my bug or not. The original code had some minor issues as well (which is absolutely normal for WIP); I can send my (rather trivial) patches if there is any interest.

Is there a design outline for the missing implementation bits? Specifically, it would be good to know the following:

1. With clustered OVSDB, a client such as IDL needs two JSON-RPC connections: one to the leader (to commit transactions), and a read-only one to an arbitrary replica-set member (to scale reads). Will this be implemented at the ovsdb_idl level or encapsulated inside jsonrpc_session? The former seems natural, yet multiple-remotes support went into jsonrpc_session already.

2. How does the client know which replica-set member is currently the leader? I just loop over the remotes until one accepts the transaction (which is an awful idea). It would be nice to send some sort of cluster metadata snapshot to the JSON-RPC client during the initial handshake. Alternatively, one can extend the "not leader" error object with a leader URL.

3. Because of eventual consistency, if an IDL reads from one member (A) but writes to another one (B), it can try to delete a row that is not yet in A's database. This makes all further requests fail with an "inconsistent data" error, and it is basically what I observe in my tests. How do you plan to overcome this?

Thanks in advance!
Valentine

Best,
Valentine

Thanks,
Valentine

--
Best regards,
Valentine Sinitsyn
___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
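The "not leader" error carrying a leader URL (question 2) can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyServer`, `NotLeader`, `leader_hint` and `commit` are hypothetical names invented here, and nothing below is the ovsdb-server, IDL or jsonrpc_session API. It shows a client following a hint instead of looping blindly over all remotes.

```python
class NotLeader(Exception):
    """Raised by a follower; may carry a hint about the current leader."""

    def __init__(self, leader_hint=None):
        super().__init__("not leader")
        self.leader_hint = leader_hint


class ToyServer:
    """Stand-in for one cluster member (hypothetical, not ovsdb-server)."""

    def __init__(self, name, cluster):
        self.name = name
        self.cluster = cluster  # shared state: {"leader": name, "log": [...]}

    def transact(self, op):
        if self.cluster["leader"] != self.name:
            # A follower refuses the write and points at the leader it knows.
            raise NotLeader(leader_hint=self.cluster["leader"])
        self.cluster["log"].append(op)
        return len(self.cluster["log"])  # index of the appended entry


def commit(servers, start, op, max_hops=3):
    """Try `start` first, then follow leader hints rather than scanning."""
    target = start
    tried = []
    for _ in range(max_hops):
        tried.append(target)
        try:
            return servers[target].transact(op), tried
        except NotLeader as err:
            if err.leader_hint is None or err.leader_hint in tried:
                break
            target = err.leader_hint
    raise RuntimeError("no leader found after trying %s" % tried)


cluster = {"leader": "s2", "log": []}
servers = {name: ToyServer(name, cluster) for name in ("s1", "s2", "s3")}
index, path = commit(servers, "s3", {"op": "insert", "table": "Port_Binding"})
print(index, path)
```

With a hint, the client needs at most one extra round trip after contacting a follower, whereas a blind scan may touch every remote.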
Re: [ovs-dev] OVN meeting report
On 03.04.2017 20:29, Valentine Sinitsyn wrote: Hi Ben, On 23.03.2017 08:11, Ben Pfaff wrote: Hello everyone. I am not sure whether I am going to be able to attend the OVN meeting tomorrow, because I will be in another possibly distracting meeting, so I'm going to give my report here. Toward the end of last week I did a full pass of reviews through patchwork. The most notable result, I think, is that I applied patches that add 802.1ad support. For OVN, this makes it more reasonable to consider adding support for tagged logical ports--currently, OVN drops all tagged logical packets--which I've heard requested once or twice, because it means that they can now be gatewayed to physical ports within an outer VLAN. I don't have any plans to work on that, but I think that it is worth pointing out. The OVS "Open Source Day" talks have been scheduled at OpenStack Boston. They are all on Wednesday: https://www.openstack.org/summit/boston-2017/summit-schedule/#track=135 I've been spending what dev time I have on database clustering. Today, I managed to get it working, with many caveats. It will take weeks or months longer to get it finished, tested, and ready for posting. (If you want what I have, check out the raft3 branch in my ovs-reviews repo at github.) I've checked out your raft3 branch, and even learned how to create an OVSDB cluster. Thanks for the docs! What I don't get though is how do I instruct IDL to connect to the cluster now? Do I just connect to a random server, or there should be some dispatcher, or whatever? OK I see this is an ongoing work in your branch. Best, Valentine Thanks, Valentine ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] OVN meeting report
Hi Ben, On 23.03.2017 08:11, Ben Pfaff wrote: Hello everyone. I am not sure whether I am going to be able to attend the OVN meeting tomorrow, because I will be in another possibly distracting meeting, so I'm going to give my report here. Toward the end of last week I did a full pass of reviews through patchwork. The most notable result, I think, is that I applied patches that add 802.1ad support. For OVN, this makes it more reasonable to consider adding support for tagged logical ports--currently, OVN drops all tagged logical packets--which I've heard requested once or twice, because it means that they can now be gatewayed to physical ports within an outer VLAN. I don't have any plans to work on that, but I think that it is worth pointing out. The OVS "Open Source Day" talks have been scheduled at OpenStack Boston. They are all on Wednesday: https://www.openstack.org/summit/boston-2017/summit-schedule/#track=135 I've been spending what dev time I have on database clustering. Today, I managed to get it working, with many caveats. It will take weeks or months longer to get it finished, tested, and ready for posting. (If you want what I have, check out the raft3 branch in my ovs-reviews repo at github.) I've checked out your raft3 branch, and even learned how to create an OVSDB cluster. Thanks for the docs! What I don't get though is how do I instruct IDL to connect to the cluster now? Do I just connect to a random server, or there should be some dispatcher, or whatever? Thanks, Valentine ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] ofproto-dpif-xlate: Don't save pkt_mark in compose_output_action__().
Hi,

On 17.03.2017 22:55, Ben Pfaff wrote:

Previously, this function could modify the pkt_mark field as part of IPsec integration. It no longer does that, so there's no longer any need for it to save and restore pkt_mark, and this commit removes that.

Does it mean that now there is no way to send a bit of information across a pair of patch ports, that is, mark a packet on bridge A and check the mark on bridge B?

Thanks,
Valentine

CC: Ansis Atteka
Signed-off-by: Ben Pfaff
---
 ofproto/ofproto-dpif-xlate.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index 1a82b8d569be..9fe778a32857 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -3265,7 +3265,6 @@ compose_output_action__(struct xlate_ctx *ctx, ofp_port_t ofp_port,
     struct flow *flow = &ctx->xin->flow;
     struct flow_tnl flow_tnl;
     union flow_vlan_hdr flow_vlans[FLOW_MAX_VLAN_HEADERS];
-    uint32_t flow_pkt_mark;
     uint8_t flow_nw_tos;
     odp_port_t out_port, odp_port;
     bool tnl_push_pop_send = false;
@@ -3460,7 +3459,6 @@ compose_output_action__(struct xlate_ctx *ctx, ofp_port_t ofp_port,
     }

     memcpy(flow_vlans, flow->vlans, sizeof flow_vlans);
-    flow_pkt_mark = flow->pkt_mark;
     flow_nw_tos = flow->nw_tos;

     if (count_skb_priorities(xport)) {
@@ -3588,7 +3586,6 @@ compose_output_action__(struct xlate_ctx *ctx, ofp_port_t ofp_port,

 out:
     /* Restore flow */
     memcpy(flow->vlans, flow_vlans, sizeof flow->vlans);
-    flow->pkt_mark = flow_pkt_mark;
     flow->nw_tos = flow_nw_tos;
 }
Re: [ovs-dev] Reproducing ovn-scale-test results
Hi Han,

On 17.03.2017 23:36, Han Zhou wrote:

On Fri, Mar 17, 2017 at 2:50 AM, Valentine Sinitsyn <valentine.sinit...@gmail.com> wrote:

Did you restart controllers or ovn-northd after running full tests before binding more ports? It would be interesting to learn how long it takes to warm up the IDL in controllers/northd in your setup.

No need to restart. It will cool down when the test ends. On top of the scale, we can just test creating & binding 500 lports and then tear down the 500 lports before the next test, which takes much less time than the full run from empty.

Moreover, in current ovn-scale-test code, the step "wait 100 lport up" is updated utilizing a new feature (wait for HVs to catch up) that was

Are you referring to ovn-nbctl --wait-until, or something else? If you were not using it for the talk, what exactly does "create + bind" mean on the graph?

--wait-until has always been used, to wait for the lport state to become "up" in NB, which means the port binding is reflected in NB. The new change [1] uses the new feature of ovn-nbctl: "sync --wait=hv", which waits until the port binding has been processed on all HVs. This feature was not there yet when we had the talk, and there was no reasonable alternative to achieve the same.

Got you now. This may indeed affect the test negatively, as OVSDB seems to scale poorly with the number of clients, as our tests suggest.

Thank you again.
Valentine

[1] https://github.com/openvswitch/ovn-scale-test/commit/0ece1038de45f05f461b45162b21a8bde2793010
Re: [ovs-dev] Reproducing ovn-scale-test results
Hi,

On 17.03.2017 02:24, Han Zhou wrote:

On Thu, Mar 16, 2017 at 1:06 PM, Valentine Sinitsyn <valentine.sinit...@gmail.com> wrote:

Hi Han, Thanks for the quick answer.

On 17.03.2017 00:34, Han Zhou wrote:

On Thu, Mar 16, 2017 at 3:58 AM, Valentine Sinitsyn <valentine.sinit...@gmail.com> wrote:

Hi all, We are doing some stress testing on OVN 2.7, and wanted to reproduce results from the talk [1]. Looking at ovn-scale-test sources, I have two questions:

Hi Valentine, Thanks for picking this up.

- Do I understand correctly that the benchmark always starts with an empty northbound db, then lswitches are added, and then ports are added to each lswitch?

Yes, the test result shown in the talk was started from empty to gradually reach 20k lports on 200 lswitches.

- What is the batch size in port_create_args?

I remember it was 100. In addition, there were 5 jobs running in parallel. +Lei to confirm.

Could you recall how long (approximately) it takes to create and bind 20K ports with these settings? This would be really helpful.

I don't have the raw data now, but it took around 1 - 2 hours. We don't always run the full test, but after the full test is completed, we can just run another task to create and bind 1k more lports to evaluate the optimizations in each iteration on top of the existing scale.

Did you restart controllers or ovn-northd after running full tests before binding more ports? It would be interesting to learn how long it takes to warm up the IDL in controllers/northd in your setup.

One more thing, the graph shared also involved sandbox (simulated HV) creation and lswitch creation. They were all created gradually during the test run. The flow was like:

1. create 50 sandboxes
2. (5 jobs in parallel) create 1 lswitch, create 100 lports, bind 100 lports, wait 100 lport up
3. if there are 100 sandboxes already on the BM, switch to another BM
4. goto step 1, until it is done for all 20 BMs.

Moreover, in current ovn-scale-test code, the step "wait 100 lport up" is updated utilizing a new feature (wait for HVs to catch up) that was

Are you referring to ovn-nbctl --wait-until, or something else? If you were not using it for the talk, what exactly does "create + bind" mean on the graph?

Many thanks again.
Valentine

added after the report, and we didn't run the test again yet with this change. I would expect it to impact the test result slightly negatively, but it would be more accurate.

In short: is it true that for the setup involving (say) 1 ports spanned over 100 lswitches in the aforementioned test, a Rally task would look like this?

{
  "version": 2,
  "title": "Create and bind port",
  "subtasks": [{
    "title": "Create and bind port",
    "workloads": [{
      "name": "OvnNetwork.create_and_bind_ports",
      "args": {
        "network_create_args": {
          "amount": 100,
          "batch": 1,
          "start_cidr": "172.16.1.0/24",
          "physical_network": "providernet"
        },
        "port_create_args": {"batch": 2},
        "ports_per_network": 100,
        "port_bind_args": {"wait_up": true}
      },
      "runner": {"type": "serial", "times": 1},
      "context": {
        "ovn_multihost": {"controller": "ovn-controller-node"},
        "sandbox": {"tag": "ToR1"}
      }
    }]
  }]
}

1. https://youtu.be/okralc7LrZo?t=1185

Thanks,
Valentine

--
Best regards,
Valentine Sinitsyn
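The four-step test flow quoted in this message can be replayed as plain arithmetic. The sketch below is not ovn-scale-test code; `simulate_scale_run` and its parameter names are made up for illustration, and it assumes that step 2 runs once per 50-sandbox round, with each of the 5 parallel jobs creating one lswitch. Under that reading, the totals agree with the "20k lports on 200 lswitches" figure mentioned in the thread.

```python
def simulate_scale_run(bare_metal_hosts=20, sandboxes_per_host=100,
                       sandboxes_per_round=50, parallel_jobs=5,
                       lswitches_per_job=1, lports_per_lswitch=100):
    """Replay the described loop and count what it creates."""
    totals = {"rounds": 0, "sandboxes": 0, "lswitches": 0, "lports": 0}
    for _host in range(bare_metal_hosts):
        on_this_host = 0
        while on_this_host < sandboxes_per_host:
            # Step 1: create a batch of sandboxes (simulated hypervisors).
            on_this_host += sandboxes_per_round
            totals["sandboxes"] += sandboxes_per_round
            # Step 2: every parallel job creates its lswitches and lports.
            new_lswitches = parallel_jobs * lswitches_per_job
            totals["lswitches"] += new_lswitches
            totals["lports"] += new_lswitches * lports_per_lswitch
            totals["rounds"] += 1
        # Step 3: this host is full, so the outer loop moves to the next one.
    # Step 4: the loop ends once every bare-metal host has been filled.
    return totals


totals = simulate_scale_run()
print(totals)
```

Changing `parallel_jobs` or `sandboxes_per_round` shows how sensitive the final port count is to the interpretation of step 2, which is why the assumption is spelled out.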
Re: [ovs-dev] Multi-threaded OVSDB
Hi Andy, On 17.03.2017 02:28, Andy Zhou wrote: On Thu, Mar 16, 2017 at 1:52 PM, Ben Pfaff <b...@ovn.org> wrote: On Thu, Mar 16, 2017 at 11:38:19PM +0500, Valentine Sinitsyn wrote: On 16.03.2017 20:56, Ben Pfaff wrote: On Tue, Mar 14, 2017 at 07:08:54PM +0500, Valentine Sinitsyn wrote: Recently, I was evaluating a multi-threaded OVSDB/ovn-northd design, and came across the patchset [1]. Looks like this RFC patchset was received well, but never completed. What's the reason? No real performance benefits, lack of interest, other high-priority tasks or whatever? It's kind of a combination of those. Andy got preempted by other higher-priority work, plus it's unclear whether threading ovsdb-server solves an important problem at this time. I'm currently working on adding clustering support to OVSDB, which ought to allow scaling out reads, which are most of the OVN workload, so that might solve the same problem in a different way. This sounds promising. Are you planning something Mongo-like, that is, one server writes should be directed to, and all servers serving reads? That's essentially the planned approach. This should allow better scaling out reads. Half an hour isn't really acceptable and OVN should aim to do much better than that. In our tests, it takes about half an hour (and a few hundred reconnects) to send an initial snapshot of a large southbound database to 1000+ OVN 2.7 controllers. This makes disaster recovery plan a pain. Should we expect things to get better here (we can probably contribute to this, if feasible)? I'd expect that the clustered database design should scale pretty well for reads, which are most of the OVN workload. I'll have to have something actually working before we can test and tune it, though. As for multi-threaded OVSDB, the latest patch series I found in Andy's fork segfaults just after startup, so we can't even do a quick test to check if it makes things better for us or not. I don't know whether Andy thought it was ready for testing. 
It was work-in-progress, not ready for testing. I have since worked a bit more to multi-thread all OVSDB server features and found that the changes will make the OVSDB server considerably more complex. Given that Ben is working on clustering, it may not be wise to make two major changes at the same time. I plan to revisit multi-threading after the clustering changes are in.

This sounds sane. As things are going to change significantly, perhaps I should stop trying to get the mt6 branch in Andy's fork into a testable state. I've fixed a use-after-free bug (see the patch attached), but it still smashes the stack or ends up with garbage in barrier->seq instead of a pointer in ovs_barrier_block(). I haven't figured out the reason.

As for the clustering, we are currently looking for ways to scale OVSDB, and will be happy to be early adopters for this. As I mentioned previously, we can also contribute code if it would make things go quicker, so I can persuade my manager this is a worthwhile investment of time :)

Thanks,
Valentine
Re: [ovs-dev] Reproducing ovn-scale-test results
Hi Han,

Thanks for the quick answer.

On 17.03.2017 00:34, Han Zhou wrote:

On Thu, Mar 16, 2017 at 3:58 AM, Valentine Sinitsyn <valentine.sinit...@gmail.com> wrote:

Hi all, We are doing some stress testing on OVN 2.7, and wanted to reproduce results from the talk [1]. Looking at ovn-scale-test sources, I have two questions:

Hi Valentine, Thanks for picking this up.

- Do I understand correctly that the benchmark always starts with an empty northbound db, then lswitches are added, and then ports are added to each lswitch?

Yes, the test result shown in the talk was started from empty to gradually reach 20k lports on 200 lswitches.

- What is the batch size in port_create_args?

I remember it was 100. In addition, there were 5 jobs running in parallel. +Lei to confirm.

Could you recall how long (approximately) it takes to create and bind 20K ports with these settings? This would be really helpful.

Thanks,
Valentine

In short: is it true that for the setup involving (say) 1 ports spanned over 100 lswitches in the aforementioned test, a Rally task would look like this?

{
  "version": 2,
  "title": "Create and bind port",
  "subtasks": [{
    "title": "Create and bind port",
    "workloads": [{
      "name": "OvnNetwork.create_and_bind_ports",
      "args": {
        "network_create_args": {
          "amount": 100,
          "batch": 1,
          "start_cidr": "172.16.1.0/24",
          "physical_network": "providernet"
        },
        "port_create_args": {"batch": 2},
        "ports_per_network": 100,
        "port_bind_args": {"wait_up": true}
      },
      "runner": {"type": "serial", "times": 1},
      "context": {
        "ovn_multihost": {"controller": "ovn-controller-node"},
        "sandbox": {"tag": "ToR1"}
      }
    }]
  }]
}

1. https://youtu.be/okralc7LrZo?t=1185

Thanks,
Valentine

--
Best regards,
Valentine Sinitsyn
[ovs-dev] Reproducing ovn-scale-test results
Hi all,

We are doing some stress testing on OVN 2.7, and wanted to reproduce results from the talk [1]. Looking at ovn-scale-test sources, I have two questions:

- Do I understand correctly that the benchmark always starts with an empty northbound db, then lswitches are added, and then ports are added to each lswitch?
- What is the batch size in port_create_args?

In short: is it true that for the setup involving (say) 1 ports spanned over 100 lswitches in the aforementioned test, a Rally task would look like this?

{
  "version": 2,
  "title": "Create and bind port",
  "subtasks": [{
    "title": "Create and bind port",
    "workloads": [{
      "name": "OvnNetwork.create_and_bind_ports",
      "args": {
        "network_create_args": {
          "amount": 100,
          "batch": 1,
          "start_cidr": "172.16.1.0/24",
          "physical_network": "providernet"
        },
        "port_create_args": {"batch": 2},
        "ports_per_network": 100,
        "port_bind_args": {"wait_up": true}
      },
      "runner": {"type": "serial", "times": 1},
      "context": {
        "ovn_multihost": {"controller": "ovn-controller-node"},
        "sandbox": {"tag": "ToR1"}
      }
    }]
  }]
}

1. https://youtu.be/okralc7LrZo?t=1185

Thanks,
Valentine
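A task like the one in this message can also be generated rather than hand-edited, which makes the implied scale easy to check. The sketch below is an illustration only: `build_task` and `total_ports` are hypothetical helpers, not part of Rally or ovn-scale-test, and the total assumes that `ports_per_network` ports are created on each of the `amount` networks.

```python
import json


def build_task(networks, ports_per_network, port_batch, network_batch=1):
    """Build a task dict with the same keys as the JSON in the message."""
    workload = {
        "name": "OvnNetwork.create_and_bind_ports",
        "args": {
            "network_create_args": {
                "amount": networks,
                "batch": network_batch,
                "start_cidr": "172.16.1.0/24",
                "physical_network": "providernet",
            },
            "port_create_args": {"batch": port_batch},
            "ports_per_network": ports_per_network,
            "port_bind_args": {"wait_up": True},
        },
        "runner": {"type": "serial", "times": 1},
        "context": {
            "ovn_multihost": {"controller": "ovn-controller-node"},
            "sandbox": {"tag": "ToR1"},
        },
    }
    return {
        "version": 2,
        "title": "Create and bind port",
        "subtasks": [{"title": "Create and bind port",
                      "workloads": [workload]}],
    }


def total_ports(task):
    """Ports implied by the task, assuming one full set per network."""
    args = task["subtasks"][0]["workloads"][0]["args"]
    return args["network_create_args"]["amount"] * args["ports_per_network"]


task = build_task(networks=100, ports_per_network=100, port_batch=2)
print(total_ports(task))
print(json.dumps(task, indent=2))
```

Generating the file this way keeps the switch count, the ports per switch and the batch size in one place when sweeping over several scales.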
[ovs-dev] Multi-threaded OVSDB
Hi all, Recently, I was evaluating a multi-threaded OVSDB/ovn-northd design, and came across the patchset [1]. Looks like this RFC patchset was received well, but never completed. What's the reason? No real performance benefits, lack of interest, other high-priority tasks or whatever? 1. https://mail.openvswitch.org/pipermail/ovs-dev/2016-March/310673.html Thanks, Valentine
[ovs-dev] -fomit-frame-pointer in OVS
Hi, Currently, OVS seems to omit the frame pointer (-fomit-frame-pointer) in non-debug builds. While I do know this is a common optimization, it makes run-time profiling substantially less straightforward. So I wonder whether there are any benchmarks showing the effect of omitting the frame pointer, especially on x86-64? Thanks, Valentine
Re: [ovs-dev] [branch-2.7] Set release date for 2.7.0.
Hi, Open vSwitch 2.7 won't be an LTS release, right? Thanks, Valentine
[ovs-dev] OVN: Preserving state between logical datapaths
Hi all, Imagine you want to mark a packet in a logical switch datapath and then use this mark in a logical router datapath somehow (an artificial use case would be policy routing based on the VM port, not the destination IP address). Is there a better way than using the packet mark (which also doesn't seem to survive the "output" action, though that is easily fixable)? I assume OVS/OVN 2.6 on Linux with the in-kernel datapath, if this matters. Many thanks, Valentine Sinitsyn
Re: [ovs-dev] Quick question on Linux expo 2017
On 16.02.2017 11:52, Russell Bryant wrote:

On Thu, Feb 16, 2017 at 12:53 AM, Valentine Sinitsyn <valentine.sinit...@gmail.com> wrote:

Hi all, I feel strange about replying to seemingly spam emails, but is it the same Southern California Linux Expo as in [1]?

Yes, I believe this was spam.

Yes, but what worries me is an attempt to sell private data, which IMO compromises a respected event, not the spam itself. We all know this happens, but I never saw it happen on a public mailing list before (by accident, I suppose). Sorry if I got things wrong.

Valentine

--
Russell Bryant
Re: [ovs-dev] Quick question on Linux expo 2017
Hi all,

I feel strange about replying to seemingly spam emails, but is it the same Southern California Linux Expo as in [1]? If so, isn't it a blatant violation of the Terms and Conditions [2], quoted below?

Privacy Policy

The Linux Expo of Southern California gathers personal information from people who register to attend the So Cal Linux Expo. This information is ONLY used by SCALE to improve future Expos. If other groups that participate in SCALE ask for information about attendees to improve their offering at SCALE, and we agree to share it, it will be only those demographics that are not identifiable to any individual.

Although I heard of the event just a few minutes ago, this doesn't look good to me.

1. https://www.socallinuxexpo.org/scale/15x
2. https://www.socallinuxexpo.org/scale/15x/policies

Valentine
Re: [ovs-dev] how to create new ovsdb tables and code c funtion related?
Hi,

On 15.02.2017 10:08, lg.yue wrote:

Hi, everyone:

I. After I change ovn/ovn-[sn]b.ovsschema, how do I apply the changes? The makefile compiles ovn/ovn-[sn]b.ovsschema to ovn/lib/ovn-[s,n]b-idl.ovsidl, but I cannot find anything that uses ovn-[s,n]b-idl.ovsidl.

As the name suggests, *.ovsschema defines the schema only. You still need to write code which uses your new tables. It should be possible to fill them with ovs-vsctl, but without something reading them, this would be dead data.

II. Let's take sb for example.

ovsdb-server --detach --monitor -vconsole:off \
    --log-file=/var/log/openvswitch/ovsdb-server-sb.log \
    --remote=punix:/run/openvswitch/ovnsb_db.sock \
    --pidfile=/run/openvswitch/ovnsb_db.pid \
    --remote=db:OVN_Southbound,SB_Global,connections \
    --unixctl=ovnsb_db.ctl \
    --private-key=db:OVN_Southbound,SSL,private_key \
    --certificate=db:OVN_Southbound,SSL,certificate \
    --ca-cert=db:OVN_Southbound,SSL,ca_cert \
    /var/lib/openvswitch/ovnsb_db.db

1. Who launches the ovsdb-server process, and when?

Usually this happens from the init scripts. The real machinery is under ${PREFIX}/share/openvswitch/scripts.

2. How is /var/lib/openvswitch/ovnsb_db.db created?

You can use ovsdb-tool create, see INSTALL.md. The ovn-ctl script called from the init scripts should handle this automatically.

3. 'ovn-sbctl set-connection ptcp:6642': how does this instruction associate port 6642 with ovsdb-server? (Please point me to the source code.)

III. Suppose the new db is created: do I need to write a C function like sbrec_idl_class in ovn/lib/ovn-sb-idl.c?

This is basically a compiled schema definition. You should start at ovn/utilities/ovn-sbctl.c.

Too many questions, please help me figure it out. Thanks very much.

Just don't forget to consult the documentation, some of your questions are already answered there ;-)

Valentine

At 2017-02-14 21:15:42, "Valentine Sinitsyn" <valentine.sinit...@gmail.com> wrote:

Hi, Look at ovn/ovn-[sn]b.ovsschema. It's JSON. You'll also need to update the documentation and the cksum; make explains how to do it.

Best,
Valentine
Re: [ovs-dev] how to set up ovn's nb and sb table and column structure?
Hi, Look at ovn/ovn-[sn]b.ovsschema. It's JSON. You'll also need to update the documentation and the cksum; make explains how to do it. Best, Valentine
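Since the schema file is JSON, the shape of a new table entry can be sketched as follows. This is a minimal illustration, assuming the general OVSDB schema layout (a "tables" object whose entries have "columns" with a "type"); the database name `Example_DB`, the table `My_Table`, its columns and the `add_table` helper are all hypothetical. The version bump and the cksum that the message mentions are deliberately not computed here.

```python
import json


def add_table(schema, name, columns, is_root=True):
    """Add a table definition to an in-memory schema dict."""
    if name in schema["tables"]:
        raise ValueError("table %r already exists" % name)
    schema["tables"][name] = {"columns": columns, "isRoot": is_root}
    return schema


# Hypothetical starting point; a real schema also carries a cksum field.
schema = {"name": "Example_DB", "version": "1.0.0", "tables": {}}

add_table(schema, "My_Table", {
    # A plain string column.
    "name": {"type": "string"},
    # A string-to-string map with any number of entries.
    "options": {"type": {"key": "string", "value": "string",
                         "min": 0, "max": "unlimited"}},
})

print(json.dumps(schema, indent=2, sort_keys=True))
```

In a real tree the new entry would go into ovn/ovn-[sn]b.ovsschema by hand, followed by the documentation and cksum updates described above.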
Re: [ovs-dev] [PATCH v3 00/16] port Jiri Benc's L3 patchset to ovs
Hi Jan, On 10.02.2017 04:14, Jan Scheurich wrote: Hi Valentine, On 2017-02-09 08:58, Valentine Sinitsyn wrote: This L3 patchset looks similar to what we did internally with OVS 2.6 to add support for IPv6 tunnels. Could you please confirm that ovs-dpctl reports correct statistics with this patchset when one uses in-kernel Linux datapath? We had some issues with this (the counters were always zero). Largely, this was because userspace code (I refer to the tools and the daemon, not DPDK datapath here) assumes a plugged network interface is always L2, and I don't see this patch touching these files. The most recent user-space code in vswitchd that deals with the L3 tunnels in netdev and kernel datapath is contained in another patch series: https://mail.openvswitch.org/pipermail/ovs-dev/2017-February/328391.html You could help in reviewing that. It may not be complete with respect to handling kernel datapath tunnels as we were not able to test yet due to a lack of patches to configure L3 tunnel ports in the kernel. But similar problems with counters might also exist with netdev datapath. I had a quick look at patch series, thanks for the links. Will try to have a more in-depth look this week. As for the "counters issue" I mentioned, it seems tiny given the scope of the patchset. Moreover, a get_etheraddr() change in [1] should fix it, although I haven't checked yet. [1] https://mail.openvswitch.org/pipermail/ovs-dev/2017-February/328392.html Best, Valentine Thanks, Jan ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v3 00/16] port Jiri Benc's L3 patchset to ovs
Hi all,

This L3 patchset looks similar to what we did internally with OVS 2.6 to add support for IPv6 tunnels. Could you please confirm that ovs-dpctl reports correct statistics with this patchset when one uses the in-kernel Linux datapath? We had some issues with this (the counters were always zero). Largely, this was because userspace code (I refer to the tools and the daemon, not the DPDK datapath here) assumes a plugged network interface is always L2, and I don't see this patch touching these files.

Thanks for your co-operation.

Best regards,
Valentine Sinitsyn
Re: [ovs-dev] Flow key update in conntrack/nat
Hi Joe,

On 11.01.2017 23:30, Joe Stringer wrote:

On 11 January 2017 at 02:47, Valentine Sinitsyn <valentine.sinit...@gmail.com> wrote:

Hi all, I'm struggling to find an answer to a seemingly simple question: why does the "ct(nat)" action need to update the flow key after NAT (see ovs_nat_update_key())?

My confusion comes from the following scenario. Consider the first to-be-NATed packet coming in. There is no datapath flow installed, so this results in an upcall. The userspace part will then install a new datapath flow (using the original, unmodified flow key it got) and execute the action. Subsequent packets will be handled in the kernel automatically, but again, the flow key updated by ovs_nat_update_key() will be silently discarded in ovs_vport_receive(). So it looks like the modified flow key is never used. What am I missing here?

This depends on your flow table. If another lookup needs to occur (e.g., the ct(table=N,...) option), or the packet is sent to userspace (sFlow, IPFIX, etc.), then the updated flow key needs to be provided - in datapath, recirc (if it triggers upcall) or userspace actions. Most OVS actions in the datapath modify the key in-place so that it is correct whenever it needs to be used; the key doesn't need to be completely repopulated afresh when it is needed.

Thanks for answering. So the point I was missing is that there could be other actions following 'ct(nat)', which may use the flow key. Makes sense now.

Valentine
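The point made in the reply above can be illustrated with a toy model. This is only a sketch, assuming a drastically simplified flow key and flow table; it is not the kernel datapath code, and FlowKey, ct_nat() and lookup() are hypothetical names chosen for illustration. It shows why an action that rewrites the packet also updates the key in place: a later lookup after recirculation matches on the key as it stands at that moment.

```python
from dataclasses import dataclass


@dataclass
class FlowKey:
    """Toy flow key: only the fields needed for this illustration."""
    src_ip: str
    dst_ip: str
    recirc_id: int = 0


def ct_nat(packet, key, new_src):
    """Rewrite the packet's source address and mirror the change in the key."""
    packet["src_ip"] = new_src
    key.src_ip = new_src


def lookup(flow_table, key):
    """Second-stage lookup: matches on the key as left by earlier actions."""
    return flow_table.get((key.recirc_id, key.src_ip), "upcall")


# A flow installed for the post-NAT address in the recirculated stage.
flow_table = {(1, "203.0.113.7"): "output:2"}

packet = {"src_ip": "10.0.0.5", "dst_ip": "192.0.2.1"}
key = FlowKey(src_ip="10.0.0.5", dst_ip="192.0.2.1")

ct_nat(packet, key, "203.0.113.7")
key.recirc_id = 1
result = lookup(flow_table, key)

# For contrast: a key that was not updated after NAT misses the flow.
stale_key = FlowKey(src_ip="10.0.0.5", dst_ip="192.0.2.1", recirc_id=1)
stale_result = lookup(flow_table, stale_key)

print(result, stale_result)
```

With the key kept in sync, the second lookup finds the installed flow; with a stale key, the same packet would fall through to an upcall in this model.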
[ovs-dev] Flow key update in conntrack/nat
Hi all,

I'm struggling to find an answer to a seemingly simple question: why does the "ct(nat)" action need to update the flow key after NAT (see ovs_nat_update_key())?

My confusion comes from the following scenario. Consider the first to-be-NATed packet coming in. There is no datapath flow installed, so this results in an upcall. The userspace part will then install a new datapath flow (using the original, unmodified flow key it got) and execute the action. Subsequent packets will be handled in the kernel automatically, but again, the flow key updated by ovs_nat_update_key() will be silently discarded in ovs_vport_receive(). So it looks like the modified flow key is never used. What am I missing here?

Thanks,
Valentine Sinitsyn
[ovs-dev] Per-switch configuration in datapath actions
Hi all,

Suppose you are implementing a custom OpenFlow action, and you need some per-bridge configuration to translate it into a datapath action. Which would be the architecturally correct way to promote this bit of information from OVSDB to somewhere inside struct xlate_ctx?

Thanks for your suggestions!

--
Best regards,
Valentine Sinitsyn
Re: [ovs-dev] [RFC] [PATCH] ovn: Support sample action in logical datapath
On 30.11.2016 06:18, Ben Pfaff wrote:

On Tue, Nov 29, 2016 at 07:22:32PM +0500, Valentine Sinitsyn wrote:

On 29.11.2016 05:21, Ben Pfaff wrote:

On Fri, Oct 14, 2016 at 04:35:46PM +0500, Valentine Sinitsyn wrote:

This is a quick attempt to implement sample action at logical port level. The goal is to export IPFIX flows for logical ports, yet it is easy to extend this approach to logical switches as well. Nothing is done to provision OVS instances with required Flow_Sample_Collector_Set and IPFIX entries at this point.

This is pretty cool! The integration among OVS and OVN and IPFIX is graceful. The part that worries me is the CMS integration. Have you actually built that integration already (for which CMS)?

I have two concerns. First, I'd prefer to see at least one CMS (probably OpenStack) support this at or around the time that it goes into OVN. Second, I have some skepticism around the idea that the CMS should configure the Flow_Sample_Collector_Set, etc., because OVN doesn't currently require the CMS to have any connectivity to OVSDB on each of the hypervisors and this would require the CMS to add that support.

I agree that this particular bit is somewhat hacky. We plan to follow this route for an in-house CMS we build, but I doubt the OpenStack community would pick up the idea. What alternatives do you see here? Having collector config at south db level doesn't seem clear either. Say I want to configure a collector at 127.0.0.1:5900 - which localhost does this entry refer to?

Is this a common way to configure IPFIX? I had been under the impression that generally there's one or a few collectors in a network, to which each switch forwards packets. If it's common to use a per-hypervisor collector, then that might actually make things easier, since that would be easy for ovn-controller to configure into OVS on each hypervisor.

Running collectors local to hypervisors is what we do here.
I can't say if it's a common scenario, but given that IPFIX most often runs over UDP, where packets can be lost, it usually makes sense to keep collectors and exporters as close as possible.

Otherwise, I'm inclined to at least learn what the requirements would be for common deployments of IPFIX. Even if we don't implement them (or all of them), it's important to me to know what we're leaving out so that what we add now is built in a way that it's gracefully extensible later. For example: if a packet should be sent to a collector, should the collector be chosen based on the packet's logical network, or based on the packet's physical network (the hypervisor it's ingressing or egressing), or on some combination of those?

I also find myself wondering whether logical port level is the right level at which to choose whether to sample packets. Will OVN users want finer-grained control over sampling and, if so, would it make more sense to add an ACL-like table for that purpose at the northbound level?

You mean using an lflow match to control when the "sample" action will trigger, rather than hard-wiring these actions to logical ports via "ipfix_options"? This sounds reasonable and not too hard to implement, given that we already have these tables in the southbound db.

If the sample() integration looks good, CMS assumptions aside, is there a chance to merge it as a stand-alone action? It's true that no publicly available CMS would use it for a while, but when they decide to, the code would already be there. And the code is not dead, as we'll be using it as well.

It's better than no users at all.

Do you have any thoughts about supporting other monitoring technology that OVS supports (e.g. sFlow) using similar techniques?

I haven't targeted any of them specifically, but it doesn't seem to be a daunting task. One only needs some way to associate a sample() instance with an sFlow receiver, the same way collector_set_id does for IPFIX.
I'd suggest generalizing Flow_Sample_Collector_Set somehow, but we agreed that configuring things through this table in the OVN scenario is suboptimal. Any thoughts?

Did we have an earlier discussion? I've spent a few minutes searching my email archive and I don't see one. If there was one, can you point it out?

No, no prior discussion, sorry for being unclear. I was referring to your concerns regarding CMS integration at the beginning of this thread.

Thanks,
Valentine
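The ACL-like alternative raised in this thread (sampling decided by a match rather than hard-wired to a logical port) can be sketched as a small priority-ordered rule table. This is a toy model under stated assumptions, not OVN code: SampleRule, pick_collector_set() and the packet field names are hypothetical, and only collector_set_id is a name that appears in the discussion.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class SampleRule:
    """Hypothetical ACL-like row: a match plus the collector set to sample to."""
    priority: int
    match: Callable[[dict], bool]
    collector_set_id: int


def pick_collector_set(packet: dict, rules: List[SampleRule]) -> Optional[int]:
    """Return the collector set of the highest-priority matching rule, if any."""
    for rule in sorted(rules, key=lambda r: r.priority, reverse=True):
        if rule.match(packet):
            return rule.collector_set_id
    return None  # No rule matched: the packet is not sampled.


rules = [
    # Sample only TCP traffic entering on logical port "lp1".
    SampleRule(100, lambda p: p["inport"] == "lp1" and p["ip_proto"] == 6, 1),
    # Sample everything entering on logical port "lp2".
    SampleRule(50, lambda p: p["inport"] == "lp2", 2),
]

tcp_on_lp1 = pick_collector_set({"inport": "lp1", "ip_proto": 6}, rules)
udp_on_lp1 = pick_collector_set({"inport": "lp1", "ip_proto": 17}, rules)
any_on_lp2 = pick_collector_set({"inport": "lp2", "ip_proto": 17}, rules)

print(tcp_on_lp1, udp_on_lp1, any_on_lp2)
```

A per-port option corresponds to the second rule alone; the match-based form additionally allows finer-grained choices such as the first rule.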