Re: [iovisor-dev] Detecting modification of a bpf map from userspace
Pablo,

this looks very similar to Francesco's request here:
https://lists.iovisor.org/pipermail/iovisor-dev/2018-February/001225.html

FYI, Francesco and Sebastiano are currently working on a patch to enable this feature. You can get in touch with them directly.

fulvio

On 21/02/2018 17:49, Alvarez, Pablo via iovisor-dev wrote:
> Dear devs,
>
> Is there a way to poll() or select() or otherwise not busy-wait on a bpf map in userspace? I would like to react quickly to new data coming into the map from my bpf program without consuming too many resources.
>
> I ran an experiment using the map fd and poll(), and found that the revents field always returned with POLLIN set, whether there was new data in the map or not. This makes a certain sense, since one never calls read() on the map to clear the field, but is not particularly useful.
>
> The question applies both to
> - a tc cls map added with "tc filter add dev eth0 egress bpf da obj clsact_get_packet.o sec getpacket"
> - a map accessed from a bpf program inserted into a TCP socket (with load_bpf_file() and setsockopt(sockfd, SOL_SOCKET, SO_ATTACH_BPF, prog_fd, sizeof(prog_fd[0]))).
>
> If not, is there another way to get a bpf program to signal up to userspace that something has happened?
>
> Thanks
> Pablo Alvarez

___
iovisor-dev mailing list
iovisor-dev@lists.iovisor.org
https://lists.iovisor.org/mailman/listinfo/iovisor-dev
Re: [iovisor-dev] Notification when an eBPF map is modified
That's cool :-)
Francesco will have something to do for the next few days :-)

fulvio

On 17/02/2018 18:40, Jesper Dangaard Brouer wrote:
> On Sat, 17 Feb 2018 13:49:22 + Teng Qin via iovisor-dev wrote:
>>> We were looking for a mechanism transparent to the eBPF program, though. A possible rationale is to have a hot-standby copy of the program (including the state) in some other location, but I don't want my dataplane to be aware of that.
>>>
>>> Thanks,
>>> fulvio
>>
>> You could also use another BPF program or ftrace to trace the bpf_map_update_elem tracepoint. But in that case you get all update calls and would need to filter for the one you are interested in on your own :)
>
> That is a good idea. Try it out via perf-record to see if it contains what you need:
>
> $ perf record -e bpf:bpf_map_update_elem -a
> $ perf script
> xdp_redirect_ma 2273 [011] 261187.968223: bpf:bpf_map_update_elem: map type= ufd=4 key=[00 00 00 00] val=[07 00 00 00]
>
> Looking at the above output and the tracepoint kernel code, we should extend that with a map_id to easily identify/filter the map you are interested in. See patch below signature (not even compile tested).
>
> For an example of attaching to tracepoints, see: samples/bpf/xdp_monitor_*.c
Re: [iovisor-dev] Notification when an eBPF map is modified
On 17/02/2018 08:41, Y Song wrote:
> On Fri, Feb 16, 2018 at 11:19 PM, Fulvio Risso wrote:
>> On 17/02/2018 08:02, Y Song via iovisor-dev wrote:
>>> On Fri, Feb 16, 2018 at 7:59 AM, Francesco Picciariello via iovisor-dev wrote:
>>>> Hello all,
>>>> Is there a way to receive an asynchronous notification each time an eBPF map is modified?
>>>
>>> When the map is modified in the kernel, you can send something into a ring buffer, and userspace can poll on this ring buffer to get the notification.

Got it.
We were looking for a mechanism transparent to the eBPF program, though. A possible rationale is to have a hot-standby copy of the program (including the state) in some other location, but I don't want my dataplane to be aware of that.

Thanks,
fulvio

>> When you say "you", do you mean "the dataplane program"?
>
> Yes.
>
>> In other words, the dataplane must be collaborative and send a notification to userspace when the map has been modified?
>
> Yes. Not just a notification; it may be the actual data, part of the actual data, or aggregated data if modifications are too frequent.
>
>> Thanks,
>> fulvio
>>
>>>> I used bpf_obj_pin() in order to save a specific map on the filesystem, and I called Linux inotify() on the pinned object. The inotify API provides a mechanism for monitoring filesystem events, and the goal is to notice when a pinned eBPF map is modified, but it seems inotify() does not work properly with the bpf filesystem. In fact it is able to detect the creation and the deletion, but not the modification, of the pinned object.
>>>
>>> inotify hooks into vfs_read/write for read/write operations. Here the bpf map read/write happens inside the bpf program, or through the bpf syscall, and hence the inotify mechanism won't work.
>>>
>>>> Regards.
Re: [iovisor-dev] Notification when an eBPF map is modified
On 17/02/2018 08:02, Y Song via iovisor-dev wrote:
> On Fri, Feb 16, 2018 at 7:59 AM, Francesco Picciariello via iovisor-dev wrote:
>> Hello all,
>> Is there a way to receive an asynchronous notification each time an eBPF map is modified?
>
> When the map is modified in the kernel, you can send something into a ring buffer, and userspace can poll on this ring buffer to get the notification.

When you say "you", do you mean "the dataplane program"?
In other words, the dataplane must be collaborative and send a notification to userspace when the map has been modified?

Thanks,
fulvio

>> I used bpf_obj_pin() in order to save a specific map on the filesystem, and I called Linux inotify() on the pinned object. The inotify API provides a mechanism for monitoring filesystem events, and the goal is to notice when a pinned eBPF map is modified, but it seems inotify() does not work properly with the bpf filesystem. In fact it is able to detect the creation and the deletion, but not the modification, of the pinned object.
>
> inotify hooks into vfs_read/write for read/write operations. Here the bpf map read/write happens inside the bpf program, or through the bpf syscall, and hence the inotify mechanism won't work.
>
>> Regards.
Re: [iovisor-dev] [PATCH RFC 0/4] Initial 32-bit eBPF encoding support
Dear Jiong,

this is great work. I haven't gone through all the patches, but it seems to me that there is not much documentation. From my past experience, putting your hands into a compiler without at least some high-level documentation that presents how it works is a nightmare. Even something like what you wrote in this email is valuable; of course, so is a note on how to turn this feature on.

Thanks,
fulvio

On 18/09/2017 13:47, Jiong Wang via iovisor-dev wrote:
> Hi,
>
> Currently, the LLVM eBPF backend always generates code in 64-bit mode, which may cause trouble when JITing to 32-bit targets.
>
> For example, it is quite common for an XDP eBPF program to access some packet fields through base + offset, for which the default eBPF backend will generate BPF_ALU64 for the address formation. Later, when JITing to 32-bit hardware, BPF_ALU64 needs to be expanded into 32-bit ALU sequences even though the address space is 32-bit and the high bits are not significant.
>
> While a complete 32-bit mode implementation may need a new ABI (something like -target-abi=ilp32), this patch set first adds some initial code so we can construct 32-bit eBPF tests through hand-written assembly.
>
> A new 32-bit register set is introduced; its register names carry a "w" prefix, and the LLVM assembler will encode statements like "w1 += w2" into the following 8-bit code field:
>
>   BPF_ADD | BPF_X | BPF_ALU
>
> BPF_ALU will be used instead of BPF_ALU64.
>
> NOTE: currently you can only use "w" registers with ALU statements, not with others like branches etc., as they don't have a different encoding for 32-bit targets.
>
> Comments?
>
> Jiong Wang (4):
>   Improve instruction encoding descriptions
>   Improve class inheritance in instruction patterns
>   New 32-bit register set
>   Initial 32-bit ALU (BPF_ALU) encoding support in assembler
>
>  lib/Target/BPF/BPFInstrFormats.td               |  84 +++-
>  lib/Target/BPF/BPFInstrInfo.td                  | 506 +++-
>  lib/Target/BPF/BPFRegisterInfo.td               |  74 +++-
>  lib/Target/BPF/Disassembler/BPFDisassembler.cpp |  15 +
>  test/MC/BPF/insn-unit-32.s                      |  53 +++
>  5 files changed, 427 insertions(+), 305 deletions(-)
>  create mode 100644 test/MC/BPF/insn-unit-32.s
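As a rough illustration of the encoding Jiong describes, here is a minimal sketch of how the 8-bit opcode field is composed and how the 32-bit and 64-bit ALU forms differ. The constant values mirror the ones defined in the kernel uapi headers (BPF_ALU = 0x04, BPF_ALU64 = 0x07, BPF_X = 0x08, BPF_ADD = 0x00); they are redefined locally so the snippet builds without kernel headers. The helper alu_opcode() is a hypothetical name used only for this example and is not part of the patch set.

```c
#include <assert.h>
#include <stdint.h>

/* Instruction class: low 3 bits of the opcode byte. */
#define BPF_ALU   0x04  /* 32-bit ALU ("w" registers) */
#define BPF_ALU64 0x07  /* 64-bit ALU ("r" registers) */

/* Source operand selector: bit 3. */
#define BPF_K     0x00  /* immediate operand */
#define BPF_X     0x08  /* register operand */

/* ALU operation: high 4 bits. */
#define BPF_ADD   0x00
#define BPF_SUB   0x10

/* Hypothetical helper: build the opcode byte for an ALU instruction. */
uint8_t alu_opcode(uint8_t op, uint8_t src, int is_32bit)
{
    assert((op & 0x0f) == 0);              /* op must only use the high nibble */
    assert(src == BPF_K || src == BPF_X);  /* only two source selectors exist */
    return (uint8_t)(op | src | (is_32bit ? BPF_ALU : BPF_ALU64));
}
```

With these values, alu_opcode(BPF_ADD, BPF_X, 1) yields 0x0c for "w1 += w2", while the 64-bit "r1 += r2" yields 0x0f; only the class bits differ, which is why the assembler can select the form from the register prefix alone.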
Re: [iovisor-dev] Iovisor and Real-time Linux scheduler
Thanks, Alexei.

So, simpler question :-)
Is there any documentation that presents the interrupt context in which the different IO-Modules operate, according to the attachment point?

fulvio

On 11/04/2017 17:56, Alexei Starovoitov wrote:
> On Mon, Apr 10, 2017 at 9:18 AM, Fulvio Risso via iovisor-dev wrote:
>> Dear all,
>> since IOVisor is an in-kernel paradigm, is anyone aware of any study regarding how RT scheduling within the Linux kernel is impacted by the presence (and operation) of various IO-Modules?
>
> I don't think anyone has looked at the RT kernel, but it certainly would be an interesting research subject.
[iovisor-dev] Iovisor and Real-time Linux scheduler
Dear all,

since IOVisor is an in-kernel paradigm, is anyone aware of any study regarding how RT scheduling within the Linux kernel is impacted by the presence (and operation) of various IO-Modules?

Thanks,
fulvio
[iovisor-dev] Can we attach an IOmodule to a pcap file?
Dear all,

perhaps a naive question. Instead of attaching an IOmodule to a NIC to collect statistics or whatever, can we attach the IOmodule to a pcap file?

Classical BPF (with libpcap) has this feature, which makes it possible to run multiple tests on the same data and see what happens (e.g., to compare the performance of different programs). We are unable to find a similar feature when playing with eBPF and more complex programs that go beyond packet filtering.

Thanks in advance for any suggestion,
fulvio
[iovisor-dev] IEEE Workshop on Open-Source Software Networking (OSSN 2017) - Deadline extended
**Apologies for the cross-posting**
**Deadline extended: March 27**

This workshop could be a good way to promote some of the current activities beyond the boundaries of open-source networking communities. Please note this requirement in the CFP: "All papers accepted for the presentation refer to available open-source implementations, which must be cited in the paper and publicly available at the time of the review."

=== Second IEEE Workshop on Open-Source Software Networking (OSSN 2017) ===
http://opennetworking.kr/ossn
Co-located with NetSoft 2017, Bologna, Italy, July 2017

Scope
-
Open-source software has already demonstrated its potential in the fields of operating systems and applications, as it allows enterprises and telcos to adapt, extend, and improve software according to their needs, while at the same time sharing part of the development cost. However, network softwarization introduces additional constraints such as the necessity to provide service guarantees, the challenge to orchestrate massively distributed systems, the obligation to provide end-to-end services, spanning across multiple technological domains and traversing different administrative boundaries, the requirements to comply with a strict regulatory framework, and more.

This second Workshop on Open-Source Software Networking (OSSN-2017) aims at bringing together researchers, universities, companies, standardization bodies, and open-source communities to discuss the possible role of open-source software in future network infrastructures and to share experiences on developing and operating open-source software-centric networking tools and platforms. In addition, OSSN-2017 aims at providing a vibrant atmosphere for representatives of different communities to set up a common discussion about their roadmap toward future network services and to increase the cooperation between different communities. Finally, it aims at facilitating the collaboration between different actors, such as individual researchers (e.g., PhD students) contributing to existing established projects, and between different projects.

Full and short (work-in-progress) papers are solicited to discuss experiences on development, deployment, operation, and experimental studies around the overall lifecycle of open-source software networking.

Topics of Interest
--
Authors are invited to submit papers that fall into any topics related to open-source software networking. All papers accepted for presentation at the workshop MUST refer to available open-source implementations, which must be cited in the paper and publicly available at the time of the review. Topics of interest include, but are not limited to, the following:

* Management of future network infrastructures and operation support systems
* Agile delivery of robust network services in telco environments
* Network Functions Virtualization and Software Defined Networks
* Softwarization of the access side of the network
* Integration of edge devices (e.g., robots, IoT) with the network infrastructure
* Long-term sustainability of the open-source economic model for telco services
* Experience from large scale deployment of open-source systems
* Open infrastructures for testing and evaluation of open-source projects
* Architecture and internal details of existing open-source projects
* Comparison and benchmarking between different open-source projects
* Cooperation between different open-source communities
* Mutual influence of open standards and open-source implementations
* Software that improves or extends existing open-source networking projects such as the ones listed below:
  * SDN (Software Defined Networking): ONOS, Open Daylight, Ryu, etc.
  * NFV (Network Function Virtualization): OPNFV, OSM, Open Baton, Open-O, etc.
  * Switching: Open vSwitch, Open Switch, Open Network Linux, Indigo, etc.
  * Routing: Click, Zebra, Quagga, XORP, VyOS, Project Calico, etc.
  * Wireless: OpenLTE, OpenAirInterface, srsLTE, BATMAN, etc.
  * Cloud networking: OpenStack Neutron, Docker networking, etc.
  * Network-oriented Analytics: PNDA, Apache Spark Streaming, Apache Flink, etc.
  * Monitoring and Messaging: Cacti, Nagios, Zabbix, Zenoss, Ntop, Kafka, RabbitMQ, etc.
  * Security and Utilities: Snort, OpenVPN, Netfilter, IPtables, Wireshark, NIST Net, etc.
  * Development and Simulation: NetFPGA, GNU radio, ns-3, etc.

Paper Submission Guideline
--
Authors are invited to submit original papers (in English) not published or submitted for publication elsewhere. Full papers can be up to 6 written pages while short (work-in-progress) papers are up to 4 pages. Papers should be in IEEE 2-column US-Letter style using IEEE Conference templates (http://www.ieee.org/conf
[iovisor-dev] CFP - Second IEEE Workshop on Open-Source Software Networking (OSSN 2017)
Apologies for the cross-posting.

This workshop could be a good way to promote some of the current activities beyond the boundaries of the iovisor group. Please note this requirement in the CFP: "All papers accepted for the presentation refer to available open-source implementations, which must be cited in the paper and publicly available at the time of the review."

fulvio risso

=== Second IEEE Workshop on Open-Source Software Networking (OSSN 2017) ===
http://opennetworking.kr/ossn

Paper Submission Guideline
--
Authors are invited to submit original papers (in English) not published or submitted for publication elsewhere. Full papers can be up to 6 written pages while short (work-in-progress) papers are up to 4 pages. Papers should be in IEEE 2-column US-Letter style using IEEE Conference templates (http://www.ieee.org/conferences_events/conferences/publishing/templates.html) and submitted in PDF format via JEMS at https://submi
Re: [iovisor-dev] How to update the csum of a trimmed UDP packet?
Mauricio,

is the UDP checksum needed? It seems to me that the checksum, in the case of UDP frames, is optional.

fulvio

On 11/01/2017 22:51, Mauricio Vasquez via iovisor-dev wrote:
> Hi Daniel,
>
> On 01/10/2017 04:49 AM, Daniel Borkmann wrote:
>> On 01/09/2017 04:48 PM, Mauricio Vasquez via iovisor-dev wrote:
>>> Hello,
>>>
>>> I am trying to implement an eBPF program that trims a UDP packet to a predefined length. I am able to trim the packet (using the bpf_change_tail helper) and also to update the ip->len and udp->length fields. After changing those fields I update the checksums using the bpf_[l3,l4]_csum_replace helpers.
>>>
>>> Because the UDP checksum also covers the data itself, it is necessary to update it. I tried to use bpf_csum_diff together with the bpf_l4_csum_replace helper; however, I found that it is not possible to pass a variable size to bpf_csum_diff, as the argument is marked as "ARG_CONST_STACK_SIZE_OR_ZERO" in the function prototype. This limitation makes it impossible to update the checksum for different packet sizes, as the number of bytes to be trimmed depends on the length of the packet.
>>>
>>> The code is available at [1] (please note that it is based on hover).
>>>
>>> The error that I get when I try to load the program is:
>>>
>>> 31: (61) r9 = *(u32 *)(r6 +48)
>>> 32: (61) r2 = *(u32 *)(r6 +0)
>>> 33: (07) r2 += -100
>>> 34: (67) r2 <<= 32
>>> 35: (77) r2 >>= 32
>>> 36: (b7) r8 = 0
>>> 37: (b7) r3 = 0
>>> 38: (b7) r4 = 0
>>> 39: (b7) r5 = 0
>>> 40: (85) call 28
>>> R2 type=inv expected=imm
>>>
>>> My questions are:
>>> 1. Am I missing another way to update the UDP checksum?
>>> 2. In case 1 is false, how could it be implemented? I think having a function to trim the packet without having a way to update the checksum is, in some way, useless.
>>
>> What's your specific use-case on this? Reason why I'm asking is that the helper needs to linearize skb and is thus only suitable for slow path [1].
>
> We are implementing a DHCP server; DHCP messages are quite unusual, so slow path helpers are not a problem in this case.
>
>> F.e. we use it for rewriting an skb as time exceeded reply [2]. How far from the header start are you trimming? Perhaps it's feasible to calc the csum up to the point of trimming via bpf_csum_diff()?
>
> Thanks to it I realized that I was doing it the difficult way: I was trying to "remove" the trimmed bytes from the checksum. I am now trying to implement it in a way similar to [2], calculating the checksum from scratch.
>
> [...]
> /* calculate checksum of the new udp packet */
> u32 src_ip = bpf_htonl(ip->src);
> u32 dst_ip = bpf_htonl(ip->dst);
> u32 len_udp_n = ((u32)ip->nextp<<8) | (bpf_htons(ip->tlen)<<16);
> int csum = bpf_csum_diff(NULL, 0, data + sizeof(struct ethernet_t) + sizeof(struct ip_t), udp_len, 0);
> csum = bpf_csum_diff(NULL, 0, &src_ip, 4, csum);
> csum = bpf_csum_diff(NULL, 0, &dst_ip, 4, csum);
> csum = bpf_csum_diff(NULL, 0, &len_udp_n, 4, csum);
> bpf_trace_printk("[dhcp] csum: %x\n", csum);
> err = bpf_l4_csum_replace(skb, UDP_CRC_OFF, 0, csum, BPF_F_PSEUDO_HDR);
> [...]
>
> The full code of this test is available at [1].
>
> I am calculating the csum of the UDP packet itself; for that reason I skip the ethernet and ip headers on the first call to bpf_csum_diff. Then, as the UDP checksum includes an IPv4 pseudo header [2], I update the csum to include those fields.
>
> This code compiles and passes verification; however, it is not calculating the checksum correctly. According to wireshark there is always a difference of 0x14 between the calculated and the expected checksums.
>
> I am pretty sure that I am missing something about checksums, but I cannot figure it out. Can you spot some mistake in that code?
>
> Thank you very much,
> Mauricio V.
>
> [1] https://gist.github.com/mvbpolito/9697247eba1c412cd4a6c8efe6fab55d
> [2] https://www.ietf.org/rfc/rfc768.txt
>
>> Did you check whether [3] could help in that?
>>
>> [1] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=5293efe62df81908f2e90c9820c7edcc8e61f5e9
>> [2] https://github.com/cilium/cilium/blob/master/bpf/lib/icmp6.h
>> [3] https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=06c1c049721a995dee2829ad13b24aaf5d7c5cce
>>
>>> Thanks in advance.
>>> Mauricio V.
>>>
>>> [1] https://gist.github.com/mvbpolito/77edb3b4e65cc03c92862b3f17452286
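For reference, the from-scratch computation discussed in this thread can be sketched in plain user-space C, following the pseudo-header layout of RFC 768. This is not the BPF code from the gist; the function names (sum16, csum_fold, udp4_checksum) are hypothetical, and the caller is assumed to pass the UDP segment with its checksum field zeroed. One detail that may be worth checking against the snippet quoted above: the pseudo-header carries the UDP length, not the IP total length. With an option-less 20-byte IPv4 header the two differ by exactly 0x14, which would be consistent with the constant offset reported, although the thread does not confirm that this is the cause.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add a byte buffer to a running sum as big-endian 16-bit words. */
uint32_t sum16(const uint8_t *buf, size_t len, uint32_t acc)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        acc += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (len & 1)
        acc += (uint32_t)buf[len - 1] << 8;  /* pad a trailing odd byte with zero */
    return acc;
}

/* Fold the carries back in and return the one's complement of the sum. */
uint16_t csum_fold(uint32_t acc)
{
    while (acc >> 16)
        acc = (acc & 0xffff) + (acc >> 16);
    return (uint16_t)~acc;
}

/* UDP checksum over IPv4: pseudo-header followed by the UDP header and data.
 * The checksum field inside 'udp' must be zero when this is called. */
uint16_t udp4_checksum(const uint8_t src[4], const uint8_t dst[4],
                       const uint8_t *udp, uint16_t udp_len)
{
    uint8_t pseudo[12];
    uint32_t acc;
    uint16_t csum;
    int i;

    assert(udp_len >= 8);  /* at least a full UDP header */

    for (i = 0; i < 4; i++) {
        pseudo[i] = src[i];
        pseudo[4 + i] = dst[i];
    }
    pseudo[8] = 0;                          /* zero byte */
    pseudo[9] = 17;                         /* protocol number of UDP */
    pseudo[10] = (uint8_t)(udp_len >> 8);   /* UDP length, not IP total length */
    pseudo[11] = (uint8_t)(udp_len & 0xff);

    acc = sum16(pseudo, sizeof(pseudo), 0);
    acc = sum16(udp, udp_len, acc);
    csum = csum_fold(acc);
    return csum ? csum : 0xffff;  /* a computed zero is transmitted as all ones */
}
```

Regarding the question of whether the checksum is needed at all: over IPv4 a sender may transmit a zero checksum to mean "not computed", so the function above never returns zero for a computed value.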
Re: [iovisor-dev] Automatic Loop Unrolling not working with #pragma unroll
Dear Thomas,

definitely a good reason for not unrolling the loop.

Thanks,
fulvio

On 08/12/2016 15:12, Thomas Graf via iovisor-dev wrote:
> On 7 December 2016 at 22:06, Mauricio Vasquez via iovisor-dev wrote:
>> BPF_TABLE("array", u32, struct rt_entry, routing_table, ROUTING_TABLE_DIM);
>>
>> static int handle_rx(void *skb, struct metadata *md) {
>>   u8 *cursor = 0;
>>   struct ip_t *ip = cursor_advance(cursor, sizeof(*ip));
>>   int i = 0;
>>   struct rt_entry *rt_entry_p = 0;
>>   u32 ip_dst_masked = 0;
>>
>>   //#pragma unroll
>>   #pragma clang loop unroll(full)
>>   for (i = 0; i < ROUTING_TABLE_DIM; i++) {
>>     rt_entry_p = routing_table.lookup(&i);
>
> Might be because you pass a pointer here?
>
>>     if (rt_entry_p) {
>>       ip_dst_masked = ip->dst & rt_entry_p->netmask;
>>       if (ip_dst_masked == rt_entry_p->network) {
>>         goto FORWARD;
>>       }
>>     }
>>   }
>>
>> DROP:
>>   return RX_DROP;
>>
>> FORWARD:
>>   pkt_redirect(skb,md, rt_entry_p->port);
>>   return RX_REDIRECT;
>> }
>>
>> Do you have any idea why clang is not unrolling that loop?
>
> It is working fine for us here and we're just using the standard unroll pragma:
> https://github.com/cilium/cilium/blob/master/bpf/lib/l3.h#L131
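Thomas's hint can be tried out in isolation. The following is a minimal plain-C sketch, assuming the obstacle is that passing &i lets the address of the loop counter escape: the counter is copied into a separate key variable, and only the address of that copy is handed to the lookup. table_lookup() and route_port() are hypothetical stand-ins for routing_table.lookup() and the handler above, and the table contents are made up for the example. Whether clang then fully unrolls the loop in a real BPF build still depends on the compiler version and optimization level.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ROUTING_TABLE_DIM 3  /* compile-time constant trip count */

struct rt_entry {
    uint32_t network;
    uint32_t netmask;
    uint32_t port;
};

/* Made-up table contents; stands in for the BPF array map. */
static struct rt_entry routing_table[ROUTING_TABLE_DIM] = {
    { 0x0a000000, 0xff000000, 1 },  /* 10.0.0.0/8 */
    { 0xc0a80000, 0xffff0000, 2 },  /* 192.168.0.0/16 */
    { 0xac100000, 0xfff00000, 3 },  /* 172.16.0.0/12 */
};

/* Hypothetical stand-in for routing_table.lookup(): takes the key by pointer. */
struct rt_entry *table_lookup(const uint32_t *key)
{
    assert(key != NULL);
    return *key < ROUTING_TABLE_DIM ? &routing_table[*key] : NULL;
}

/* Return the output port of the first matching entry, or -1 if none matches. */
int route_port(uint32_t ip_dst)
{
    uint32_t i;

#if defined(__clang__)
#pragma clang loop unroll(full)
#endif
    for (i = 0; i < ROUTING_TABLE_DIM; i++) {
        uint32_t key = i;  /* pass a copy so the address of i is never taken */
        struct rt_entry *e = table_lookup(&key);

        if (e && (ip_dst & e->netmask) == e->network)
            return (int)e->port;
    }
    return -1;
}
```

The pragma is guarded so that the snippet also builds with compilers that do not know the clang-specific form; the loop body is the same either way.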