Re: [HACKERS] Horizontal scalability/sharding
On Tue, Sep 1, 2015 at 6:55 AM, Bruce Momjian wrote: > On Mon, Aug 31, 2015 at 11:23:58PM -0300, Alvaro Herrera wrote: > > Bruce Momjian wrote: > > > > > My hope is that many FDW improvements will benefit sharding and > > > non-sharding workloads, but I bet some improvements are going to be > > > sharding-specific. I would say we are still in the exploratory stage, > > > but based on the number of people who care about this feature and want > > > to be involved, I think we are off to a very good start. :-) > > > > Having lots of interested people doesn't help with some problems, > > though. The Citus document says: > > > > And the issue with these four limitations wasn't with foreign > > data wrappers. We wrote mongo_fdw and cstore_fdw, and we're > > quite happy with the contract FDWs provide. The problem was that > > we were trying to retrofit an API for something that it was > > fundamentally not designed to do. > > I had a chance to review the Citus Data document just now: > > https://goo.gl/vJWF85 > > Particularly, it links to this document, which is clearer about the > issues they are trying to solve: > > > https://www.citusdata.com/blog/114-how-to-build-your-distributed-database > > The document opens a big question --- when queries can't be processed in > a traditional top/down fashion, Citus has the goal of sending groups of > results up the the coordinator, reordering them, then sending them back > to the shards for further processing, basically using the shards as > compute engines because the shards are no longer using local data to do > their computations. The two examples they give are COUNT(DISTINCT) and > a join across two sharded tables ("CANADA"). > > I assumed these queries were going to be solved by sending as digested > data as possible to the coordinator, and having the coordinator complete > any remaining processing. I think we are going to need to decide if > such "sending data back to shards" is something we are ever going to > implement. I can see FDWs _not_ working well for that use-case. > > For efficient internodes joins with row shipping, FDWs may also not be easy to do. Maybe it is possible if we optionally pass in lists of other nodes and information about how they are partitioned so data knows where to get shipped. A challenge for planning with arbitrary copies of different shards is that sometimes you may be able to push down joins, sometimes not. Planning and execution get ugly. Maybe this can be simplified by parent-child tables following the same partitioning scheme. Mason
Re: [HACKERS] Horizontal scalability/sharding
On Mon, Aug 31, 2015 at 3:08 PM, Alexander Korotkov < a.korot...@postgrespro.ru> wrote: > On Mon, Aug 31, 2015 at 9:48 PM, Mason S wrote: > >> >>> We also a bit disappointed by Huawei position about CSN patch, we hoped >>> to use for our XTM. >>> >> >> Disappointed in what way? Moving to some sort of CSN approach seems to >> open things up for different future ideas. In the short term, it would mean >> replacing potentially large snapshots and longer visibility checks. In the >> long term, perhaps CSN could help simplify the design of multi-master >> replication schemes. >> > > We are disappointed because at PGCon talk Huawei announced publishing of > their CSN patch and further work in this direction together with community. > However, it's even not published yet despite all the promises. Nobody from > Huawei answers CSN thread in the hackers. > So, I think we got nothing from Huawei except teasing and should rely only > on ourselves. That is disappointing. > > Oh, I see. I was in contact with some of the folks involved a few weeks ago. The intention is to do a bit more work on the patch and then post it. This should be happening "soon". Thanks, Mason
Re: [HACKERS] Horizontal scalability/sharding
> > > We also a bit disappointed by Huawei position about CSN patch, we hoped > to use for our XTM. > Disappointed in what way? Moving to some sort of CSN approach seems to open things up for different future ideas. In the short term, it would mean replacing potentially large snapshots and longer visibility checks. In the long term, perhaps CSN could help simplify the design of multi-master replication schemes. > FDW approach has been actively criticized by pg_shard people and that's > also made me a bit suspicious. It looks like we are doomed to continue > several development forks, so we decided to work on very important common > project, XTM, which we hoped could be accepted by all parties and > eventually committed to 9.6. Now I see we were right, unfortunately. > > I think the original XC project probably would have taken the FDW approach as a basis if it had existed, with focus on push-down optimizations. I assume that future work around PG sharding probably would be more likely to be accepted with the FDW approach. One could perhaps work on pushing down joins, aggregates and order by, then look at any optimizations gained if code is moved outside of FDW. It would make sense if some kind of generic optimization for foreign tables for SQL-based sources could be leveraged across all databases, rather than having to re-implement for each FDW. There are different approaches and related features that may need to be improved. Do we want multiple copies of shards, like the pg_shard approach? Or keep things simpler and leave it up to the DBA to add standbys? Do we want to leverage table inheritance? If so, we may want to spend time improving performance for when the number of shards becomes large with what currently exists. If using table inheritance, we could add the ability to specify what node (er, foreign server) the subtable lives on. We could create top level sharding expressions that allow these to be implicitly created. Should we allow arbitrary expressions for shards, not just range, list and hash? Maybe the most community-acceptable approach would look something like - Use FDWs, and continue to optimize push-down operations, also for non-PostgreSQL databases. - Use table inheritance for defining the shards. Ideally allow for specifying that some shards may be replicated to other foreign servers (and itself) (for pushing down joins with lookup/static tables; at this point it should be decent for star schema based data warehouses). - XTM/GTM hooks. Preferably we move to CSN for snapshots in core PostgreSQL though. Longer term, efficient internode joins would require a lot more work. The devil is in the details. There are things that have to be addressed, for example, if using global XIDs via GTM, not every transaction is on every node, so we need to make sure that new clog pages get added properly. There is also the potential to require a lot more code to be added, like for cursor handling and stored functions. Perhaps some limitations when using shards to foreign servers are acceptable if it is desired to minimize code changes. XC and XL code help. Regards, Mason