Re: [HACKERS] Horizontal scalability/sharding

2015-09-01 Thread Mason S
On Tue, Sep 1, 2015 at 6:55 AM, Bruce Momjian  wrote:

> On Mon, Aug 31, 2015 at 11:23:58PM -0300, Alvaro Herrera wrote:
> > Bruce Momjian wrote:
> >
> > > My hope is that many FDW improvements will benefit sharding and
> > > non-sharding workloads, but I bet some improvements are going to be
> > > sharding-specific.  I would say we are still in the exploratory stage,
> > > but based on the number of people who care about this feature and want
> > > to be involved, I think we are off to a very good start.  :-)
> >
> > Having lots of interested people doesn't help with some problems,
> > though.  The Citus document says:
> >
> >   And the issue with these four limitations wasn't with foreign
> >   data wrappers. We wrote mongo_fdw and cstore_fdw, and we're
> >   quite happy with the contract FDWs provide. The problem was that
> >   we were trying to retrofit an API for something that it was
> >   fundamentally not designed to do.
>
> I had a chance to review the Citus Data document just now:
>
> https://goo.gl/vJWF85
>
> Particularly, it links to this document, which is clearer about the
> issues they are trying to solve:
>
>
> https://www.citusdata.com/blog/114-how-to-build-your-distributed-database
>
> The document opens a big question --- when queries can't be processed in
> a traditional top/down fashion, Citus has the goal of sending groups of
> results up the the coordinator, reordering them, then sending them back
> to the shards for further processing, basically using the shards as
> compute engines because the shards are no longer using local data to do
> their computations.  The two examples they give are COUNT(DISTINCT) and
> a join across two sharded tables ("CANADA").
>
> I assumed these queries were going to be solved by sending as digested
> data as possible to the coordinator, and having the coordinator complete
> any remaining processing.  I think we are going to need to decide if
> such "sending data back to shards" is something we are ever going to
> implement.  I can see FDWs _not_ working well for that use-case.
>
>
For efficient internodes joins with row shipping, FDWs may also not be easy
to do. Maybe it is possible if we optionally pass in lists of other nodes
and information about how they are partitioned so data knows where to get
shipped.

A challenge for planning with arbitrary copies of different shards is that
sometimes you may be able to push down joins, sometimes not. Planning and
execution get ugly. Maybe this can be simplified by parent-child tables
following the same partitioning scheme.

Mason


Re: [HACKERS] Horizontal scalability/sharding

2015-09-01 Thread Mason S
On Mon, Aug 31, 2015 at 3:08 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Mon, Aug 31, 2015 at 9:48 PM, Mason S  wrote:
>
>>
>>>  We also a bit disappointed by Huawei position about CSN patch, we hoped
>>> to use for  our XTM.
>>>
>>
>> Disappointed in what way? Moving to some sort of CSN approach seems to
>> open things up for different future ideas. In the short term, it would mean
>> replacing potentially large snapshots and longer visibility checks. In the
>> long term, perhaps CSN could help simplify the design of multi-master
>> replication schemes.
>>
>
> We are disappointed because at PGCon talk Huawei announced publishing of
> their CSN patch and further work in this direction together with community.
> However, it's even not published yet despite all the promises. Nobody from
> Huawei answers CSN thread in the hackers.
> So, I think we got nothing from Huawei except teasing and should rely only
> on ourselves. That is disappointing.
>
>
Oh, I see. I was in contact with some of the folks involved a few weeks
ago.  The intention is to do a bit more work on the patch and then post it.
This should be happening "soon".

Thanks,

Mason


Re: [HACKERS] Horizontal scalability/sharding

2015-08-31 Thread Mason S
>
>
>  We also a bit disappointed by Huawei position about CSN patch, we hoped
> to use for  our XTM.
>

Disappointed in what way? Moving to some sort of CSN approach seems to open
things up for different future ideas. In the short term, it would mean
replacing potentially large snapshots and longer visibility checks. In the
long term, perhaps CSN could help simplify the design of multi-master
replication schemes.


> FDW approach has been actively criticized by pg_shard people and that's
> also made me a bit suspicious.  It looks like  we are doomed to continue
> several development forks, so we decided to work on very important common
> project, XTM, which we hoped could be accepted by all parties and
> eventually committed to 9.6.  Now I see we were right, unfortunately.
>
>
I think the original XC project probably would have taken the FDW approach
as a basis if it had existed, with focus on push-down optimizations.

I assume that future work around PG sharding probably would be more likely
to be accepted with the FDW approach. One could perhaps work on pushing
down joins, aggregates and order by, then look at any optimizations gained
if code is moved outside of FDW.  It would make sense if some kind of
generic optimization for foreign tables for SQL-based sources could be
leveraged across all databases, rather than having to re-implement for each
FDW.

There are different approaches and related features that may need to be
improved.

Do we want multiple copies of shards, like the pg_shard approach? Or keep
things simpler and leave it up to the DBA to add standbys?

Do we want to leverage table inheritance? If so, we may want to spend time
improving performance for when the number of shards becomes large with what
currently exists. If using table inheritance, we could add the ability to
specify what node (er, foreign server) the subtable lives on. We could
create top level sharding expressions that allow these to be implicitly
created.

Should we allow arbitrary expressions for shards, not just range, list and
hash?

Maybe the most community-acceptable approach would look something like

- Use FDWs, and continue to optimize push-down operations, also for
non-PostgreSQL databases.

- Use table inheritance for defining the shards. Ideally allow for
specifying that some shards may be replicated to other foreign servers (and
itself) (for pushing down joins with lookup/static tables; at this point it
should be decent for star schema based data warehouses).

- XTM/GTM hooks. Preferably we move to CSN for snapshots in core PostgreSQL
though.

Longer term, efficient internode joins would require a lot more work.

The devil is in the details. There are things that have to be addressed,
for example, if using global XIDs via GTM, not every transaction is on
every node, so we need to make sure that new clog pages get added
properly.  There is also the potential to require a lot more code to be
added, like for cursor handling and stored functions. Perhaps some
limitations when using shards to foreign servers are acceptable if it is
desired to minimize code changes.  XC and XL code help.

Regards,

Mason