On Tue, 8 Jun 2021 at 08:28, Peter Eisentraut <peter.eisentr...@enterprisedb.com> wrote:
>
> I wrote a script to automatically generate the node support functions
> (copy, equal, out, and read, as well as the node tags enum) from the
> struct definitions.

Thanks for working on this. I agree that it would be nice to see improvements in this area.

It was almost 2 years ago now, but I'm wondering if you saw what Andres proposed in [1]? The idea was basically to make a metadata array of the node structs so that, instead of having to output large amounts of .c code to do read/write/copy/equals, we just have small functions that loop over the elements in the array for the given struct and perform the required operation based on the type.

There were still quite a lot of unsolved problems, for example, how to determine the length of arrays so that we know how many bytes to compare in the equal funcs. I had a quick look at what you've got and see you've got a solution for that by looking at the last "int" field before the array and using that. (I wonder if you'd be better off using something more along the lines of your pg_node_attr() for that?)

There are quite a few advantages to having the metadata array rather than the current approach:

1. We don't need to compile 4 huge .c files and link them into the postgres binary. I imagine this will make the binary a decent amount smaller.
2. We can easily add more operations on nodes, e.g. serializing nodes for sending plans to parallel workers, or generating a hash value so we can store node types in a hash table.

One disadvantage would be what Andres mentioned in [2]. He found around a 5% performance regression. However, looking at the NodeTypeComponents struct in [1], we might be able to speed it up further by shrinking that struct down a bit and just storing a uint16 position into a giant char array which contains all of the field names. I imagine they wouldn't take more than 64k. fieldtype could see a similar change. That would take the NodeTypeComponents struct from 26 bytes down to 14 bytes, which means we could fit about twice as many field metadata entries on a cache line.

Do you have any thoughts about that approach instead?

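To make the metadata-array idea a bit more concrete, here is a rough, standalone sketch of what I have in mind. It is not the code from [1]: DemoNode, FieldMeta, FieldKind, field_names and generic_equal are all names made up for illustration, and the layout is only an assumption. It shows field names stored as uint16 offsets into one packed char array, and an array field whose length comes from an explicitly named count field rather than from "the last int before it".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical field kinds; a real implementation would need many more. */
typedef enum FieldKind { FK_INT, FK_BOOL, FK_INT_ARRAY } FieldKind;

/* Toy node struct standing in for a real Node type. */
typedef struct DemoNode
{
    int   natts;    /* number of elements in attnums */
    bool  unique;
    int  *attnums;  /* array of natts ints */
} DemoNode;

/* All field names packed into one array; each entry is NUL-terminated. */
const char field_names[] = "natts\0" "unique\0" "attnums";

/* Per-field metadata (illustrative layout, not NodeTypeComponents). */
typedef struct FieldMeta
{
    uint16_t name_off;   /* offset of the name within field_names */
    uint16_t offset;     /* offset of the field within the struct */
    uint16_t len_field;  /* arrays only: index of the field holding the count */
    uint8_t  kind;       /* a FieldKind value */
} FieldMeta;

const FieldMeta demo_fields[] = {
    {0,  offsetof(DemoNode, natts),   0, FK_INT},
    {6,  offsetof(DemoNode, unique),  0, FK_BOOL},
    {13, offsetof(DemoNode, attnums), 0, FK_INT_ARRAY},
};

/* Look up a field's name through its offset into the packed array. */
const char *
field_name(const FieldMeta *m)
{
    return field_names + m->name_off;
}

/*
 * One generic equality function driven by the metadata, in place of one
 * generated function per node type.  The count field must be listed before
 * the array that uses it, so by the time we reach the array the two counts
 * are already known to match.
 */
bool
generic_equal(const void *a, const void *b, const FieldMeta *meta, size_t nfields)
{
    assert(a != NULL && b != NULL && meta != NULL);

    for (size_t i = 0; i < nfields; i++)
    {
        const char *pa = (const char *) a + meta[i].offset;
        const char *pb = (const char *) b + meta[i].offset;

        switch ((FieldKind) meta[i].kind)
        {
            case FK_INT:
                if (*(const int *) pa != *(const int *) pb)
                    return false;
                break;
            case FK_BOOL:
                if (*(const bool *) pa != *(const bool *) pb)
                    return false;
                break;
            case FK_INT_ARRAY:
            {
                /* Read the element count from the designated count field. */
                const FieldMeta *lenmeta = &meta[meta[i].len_field];
                int         n = *(const int *) ((const char *) a + lenmeta->offset);
                const int  *xa = *(int *const *) pa;
                const int  *xb = *(int *const *) pb;

                if (n > 0 && memcmp(xa, xb, (size_t) n * sizeof(int)) != 0)
                    return false;
                break;
            }
        }
    }
    return true;
}
```

Adding another operation (copy, out, hash, ...) would then mean writing one more loop of this shape over the same array, rather than generating another large .c file.
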
David

[1] https://www.postgresql.org/message-id/20190828234136.fk2ndqtld3onf...@alap3.anarazel.de
[2] https://www.postgresql.org/message-id/20190920051857.2fhnvhvx4qddd...@alap3.anarazel.de