Re: [GENERAL] Columnar store as default for PostgreSQL 10?

2016-05-17 Thread Bráulio Bhavamitra
Alvaro, is this related or dependent on
https://www.pgcon.org/2016/schedule/events/920.en.html ?

On Mon, Apr 25, 2016 at 11:20 AM Alvaro Herrera <alvhe...@2ndquadrant.com>
wrote:

> Bráulio Bhavamitra wrote:
> > Hi all,
> >
> > I'm finally having performance issues with PostgreSQL when doing big
> > analytics queries over almost the entire database of more than 100 GB of
> > data.
> >
> > And what I keep reading all over the web is many databases switching to
> > columnar store (RedShift, Cassandra, cstore_fdw, etc) and having great
> > performance on queries in general and giant boosts with big analytics
> > queries.
> >
> > I wonder if there are any plans to move PostgreSQL entirely to a columnar
> > store (or at least make it an option), maybe for version 10?
>
> This is a pretty interesting question.  I wrote an answer, then thought
> it would make a good blog post, so it's at
> http://blog.2ndquadrant.com/column-store-plans/
> I reproduce it below.
>
> Completely replacing the current row-based store wouldn't be a good
> idea: it has served us extremely well and I’m pretty sure that replacing
> it entirely with a columnar store would be disastrous performance-wise
> for OLTP use cases.
>
> That doesn't mean columnar stores are a bad idea in general -- because
> they aren't. They just have a more limited use case than "the whole
> database". For analytical queries on append-mostly data, a columnar
> store is a much more appropriate representation than the regular
> row-based store, but not all databases are analytical.
>
> However, in order to attain interesting performance gains you need to do
> a lot more than just change the underlying storage: you need to ensure
> that the rest of the system can take advantage of the changed
> representation, so that it can execute queries optimally; for instance,
> you may want aggregates that operate in a SIMD mode rather than
> one-value-at-a-time as it is today. This, in itself, is a large
> undertaking, and there are other challenges too.
>
> As it turns out, there's a team at 2ndQuadrant working precisely on
> these matters. We posted a patch last year, but it wasn’t terribly
> interesting -- it only made a single-digit percentage improvement in
> TPC-H scores; not enough to bother the development community with (it
> was a fairly invasive patch). We want more than that.
>
> In our design, columnar or not is going to be an option: you're going to
> be able to say "Dear server, for this table kindly set up columnar
> storage for me, would you? Thank you very much." And then you’re going
> to get a table which may be slower for regular usage but which will rock
> for analytics. For most of your tables the current row-based store will
> still likely be the best option, because row-based storage is much
> better suited to the more general cases.
>
> We don’t have a timescale yet. Stay tuned.
>
> --
> Álvaro Herrera                http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
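
A minimal sketch of the layout difference described in the quoted reply,
written in Python purely for illustration. The toy table, its column names,
and both helper functions are hypothetical, and Python's built-in sum() over
a typed array only stands in for a batch (SIMD-style) aggregate. Nothing here
reflects how PostgreSQL or the 2ndQuadrant patch is actually implemented.

```python
from array import array

# Hypothetical toy table: (order_id, customer_id, amount).
rows = [
    (1, 10, 25.0),
    (2, 11, 40.0),
    (3, 10, 15.5),
    (4, 12, 60.0),
]


def sum_amount_row_store(rows):
    """Row layout: each whole tuple is visited although only one field is needed."""
    total = 0.0
    for row in rows:
        total += row[2]  # one value at a time, pulled out of a full row
    return total


def to_column_store(rows):
    """Pivot the rows into one contiguous typed array per column."""
    order_id, customer_id, amount = zip(*rows)
    return {
        "order_id": array("q", order_id),
        "customer_id": array("q", customer_id),
        "amount": array("d", amount),
    }


def sum_amount_column_store(columns):
    """Column layout: the aggregate reads only the 'amount' array, as one batch."""
    return sum(columns["amount"])


columns = to_column_store(rows)
```

Both functions return the same total. The column version touches only the
one array it needs, which is the property that makes analytical scans cheap,
while rebuilding a full row from the column layout requires reading every
array, which is why it tends to suit OLTP-style access less well.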


Re: [GENERAL] Columnar store as default for PostgreSQL 10?

2016-04-25 Thread Bráulio Bhavamitra
On Mon, Apr 25, 2016 at 11:20 AM Alvaro Herrera <alvhe...@2ndquadrant.com>
wrote:

> Bráulio Bhavamitra wrote:
> > Hi all,
> >
> > I'm finally having performance issues with PostgreSQL when doing big
> > analytics queries over almost the entire database of more than 100 GB of
> > data.
> >
> > And what I keep reading all over the web is many databases switching to
> > columnar store (RedShift, Cassandra, cstore_fdw, etc) and having great
> > performance on queries in general and giant boosts with big analytics
> > queries.
> >
> > I wonder if there are any plans to move PostgreSQL entirely to a columnar
> > store (or at least make it an option), maybe for version 10?
>
> This is a pretty interesting question.  I wrote an answer, then thought
> it would make a good blog post, so it's at
> http://blog.2ndquadrant.com/column-store-plans/
> I reproduce it below.
>
> Completely replacing the current row-based store wouldn't be a good
> idea: it has served us extremely well and I’m pretty sure that replacing
> it entirely with a columnar store would be disastrous performance-wise
> for OLTP use cases.
>
> That doesn't mean columnar stores are a bad idea in general -- because
> they aren't. They just have a more limited use case than "the whole
> database". For analytical queries on append-mostly data, a columnar
> store is a much more appropriate representation than the regular
> row-based store, but not all databases are analytical.
>
> However, in order to attain interesting performance gains you need to do
> a lot more than just change the underlying storage: you need to ensure
> that the rest of the system can take advantage of the changed
> representation, so that it can execute queries optimally; for instance,
> you may want aggregates that operate in a SIMD mode rather than
> one-value-at-a-time as it is today. This, in itself, is a large
> undertaking, and there are other challenges too.
>
> As it turns out, there's a team at 2ndQuadrant working precisely on
> these matters. We posted a patch last year, but it wasn’t terribly
> interesting -- it only made a single-digit percentage improvement in
> TPC-H scores; not enough to bother the development community with (it
> was a fairly invasive patch). We want more than that.
>
> In our design, columnar or not is going to be an option: you're going to
> be able to say "Dear server, for this table kindly set up columnar
> storage for me, would you? Thank you very much." And then you’re going
> to get a table which may be slower for regular usage but which will rock
> for analytics. For most of your tables the current row-based store will
> still likely be the best option, because row-based storage is much
> better suited to the more general cases.
>
Nice, Alvaro. I think that's the right approach.

I wish you all the best with that work :)

cheers,
bráulio

>
> We don’t have a timescale yet. Stay tuned.
>
> --
> Álvaro Herrera                http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: [GENERAL] Columnar store as default for PostgreSQL 10?

2016-04-21 Thread Bráulio Bhavamitra
On Thu, Apr 21, 2016 at 1:39 PM Geoff Winkless wrote:

> On 21 April 2016 at 17:08, David G. Johnston wrote:
> > I have little experience (and nothing practical) with columnar store but at
> > a high level I don't see the point.  I would hope that anyone interested in
> > working on a columnar store database would pick an existing one to improve
> > rather than converting a very successful row store database into one.  And I
> > don't immediately understand how a dual setup would even be viable - it
> > seems like you'd have to re-write so much of the code the only thing left
> > would be the SQL parser.
>
> To be fair, I'd say that this "only thing" would be pretty huge. The
> cost of changing databases is often prohibitive (or nearly so) because
> the parser isn't _quite_ the same, and if the sort of gains that are
> bandied about could really be achieved just by choosing columnar
> storage for certain tables without having to rewrite large chunks of
> code that would be a very big win.
>
> I certainly agree that changing the store to columnar-only makes
> little sense though, because it would alienate a lot (I would suggest
> the majority) of users whose data fits far better into a row model.
>
> FWIW, looking at the cstore_fdw extension did get me quite excited
> (because I have an inkling that quite a lot of our queries might
> benefit from such a feature) until I saw that DELETEs aren't possible,
> which would invalidate most of the wins for us because of the
> subsequent massive cost of modifying data.
>
> There's also an interesting document from the MonetDB guys about how
> the wins to be gained just by using cstore_fdw (rather than moving to
> a native column-store) aren't as high as you would hope. I have a
> feeling that would remain the case even if the store were integrated.
>
>
> https://www.monetdb.org/content/citusdb-postgresql-column-store-vs-monetdb-tpc-h-shootout
> "the margin by which MonetDB outperforms cstore_fdw shows that only
> switching storage models alone is probably not enough"
>
I think the gains are really high, since with big data caching is usually
not really possible. But of course cstore_fdw should perform better when
caching is feasible.


>
> Geoff
> (Disclaimer: I've no connection to MonetDB in any way)
>


Re: [GENERAL] Feature request: fsync and commit_delay options per database

2015-06-30 Thread Bráulio Bhavamitra
On Tue, Jun 30, 2015 at 3:43 AM, Jeff Janes <jeff.ja...@gmail.com> wrote:
> 2015-06-29 15:18 GMT-07:00 Bráulio Bhavamitra <brauli...@gmail.com>:
>
> > Hello all,
> >
> > After reading
> > http://stackoverflow.com/questions/9407442/optimise-postgresql-for-fast-testing
> > I've tried to use commit_delay to make commits really slow on a test
> > environment. Unfortunately, the maximum value is 100ms (100_000
> > microseconds).
> >
> > Besides increasing it, it would be great to have these two options
> > (fsync and commit_delay) per database, that is, valid only for
> > databases configured with them. That would greatly speed up test
> > running and still make the cluster available for other real
> > databases.
> >
> > Is this feature or something similar planned?
>
> fsync is inherently across the cluster, so that can't be set per database.
> You can configure a different commit_delay in each database on the cluster
> using alter database jjanes set commit_delay to 1000; for example, but if
> different databases have different settings they will interact with each
> other in complex, unintuitive ways.  And it is not really clear what you are
> trying to accomplish by doing this.
Great! But for commit_delay to be a usable parameter for in-memory
test databases, it should allow for much higher delays. I would be
happy with 10 minutes, for instance. Is there a reason for a
limitation of 100ms?


> Running multiple clusters on the same server is pretty easy to do, as long
> as your client allows you to configure which port number it connects to.  If
> you really want fsync on for one database and off for another one, put each
> database in a different cluster.
Nice, will try that too, but would prefer the commit_delay setup above.
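
For reference, the per-database setting mentioned above can be sketched as a
small Python helper. This is only an illustration: commit_delay_statement and
MAX_COMMIT_DELAY_US are hypothetical names, the upper bound is the 100_000
microseconds reported in this thread, and the helper merely formats SQL text
without connecting to a server.

```python
# Upper bound reported in this thread for commit_delay: 100 ms, in microseconds.
MAX_COMMIT_DELAY_US = 100_000


def commit_delay_statement(database: str, delay_us: int) -> str:
    """Build an ALTER DATABASE statement that sets commit_delay for one database.

    Hypothetical helper: it formats SQL text only and never talks to a server.
    """
    if not 0 <= delay_us <= MAX_COMMIT_DELAY_US:
        raise ValueError(
            f"commit_delay must be between 0 and {MAX_COMMIT_DELAY_US} microseconds"
        )
    # Double any embedded quotes so unusual database names remain valid identifiers.
    quoted = '"' + database.replace('"', '""') + '"'
    return f"ALTER DATABASE {quoted} SET commit_delay TO {delay_us};"


# The example from the quoted reply: 1000 microseconds for database "jjanes".
statement = commit_delay_statement("jjanes", 1000)

# Ten minutes expressed in microseconds, far above the documented maximum.
ten_minutes_us = 10 * 60 * 1_000_000
```

Passing ten_minutes_us (600_000_000 microseconds) to the helper raises
ValueError, which mirrors the limitation being asked about: the requested
delay is 6000 times larger than the 100 ms maximum.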


> Cheers,
>
> Jeff



-- 
Fight for your ideology. Be one with your ideology. Live for your
ideology. Die for your ideology. P.R. Sarkar

EITA - Educação, Informação e Tecnologias para Autogestão
http://cirandas.net/brauliobo
http://eita.org.br

Paramapurusha is my father and Parama Prakriti is my mother. The
universe is my home and we are all citizens of this cosmos. This
universe is the imagination of the Macrocosmic Mind, and all entities
are being created, preserved, and destroyed in the extroversive and
introversive phases of the cosmic imaginative flow. On the personal
level, when a person imagines something in their mind, at that moment
that person is the sole owner of what they imagine, and no one else.
When a mentally created human being walks through an equally imagined
cornfield, the imagined person is not the owner of that cornfield, for
it belongs to the individual who is imagining it. This universe was
created in the imagination of Brahma, the Supreme Entity, so the
ownership of this universe lies with Brahma, and not with the
microcosms that were also created by Brahma's imagination. No property
of this world, mutable or immutable, belongs to any particular
individual; everything is the common patrimony of all.
Rest of the text at
http://cirandas.net/brauliobo/blog/a-problematica-de-hoje-em-dia


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Feature request: fsync and commit_delay options per database

2015-06-29 Thread Bráulio Bhavamitra
Hello all,

After reading 
http://stackoverflow.com/questions/9407442/optimise-postgresql-for-fast-testing
I've tried to use commit_delay to make commits really slow on a test
environment. Unfortunately, the maximum value is 100ms (100_000
microseconds).

Besides increasing it, it would be great to have these two options
(fsync and commit_delay) per database, that is, valid only for
databases configured with them. That would greatly speed up test
running and still make the cluster available for other real
databases.

Is this feature or something similar planned?

cheers,
bráulio


Re: [GENERAL] Feature request: fsync and commit_delay options per database

2015-06-29 Thread Bráulio Bhavamitra
On Mon, Jun 29, 2015 at 7:43 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Bráulio Bhavamitra <brauli...@gmail.com> writes:
> > Besides increasing it, it would be great to have these two options
> > (fsync and commit_delay) per database, that is, valid only for
> > databases configured with them. That would greatly speed up test
> > running and still make the cluster available for other real
> > databases.
> >
> > Is this feature or something similar planned?
>
> No.  Neither of them make any sense per-database.
Is there something else planned for in-memory databases?

> regards, tom lane


