Re: [GENERAL] Changing foreign key referential actions in big databases

2016-11-07 Thread Arthur Silva
On Nov 7, 2016 3:34 PM, "Tom Lane"  wrote:
>
> Arthur Silva  writes:
> > We recently started looking into a long standing ticket to change some
> > foreign keys referential actions from CASCADE to RESTRICT for our own
> > safety. Everything else in the FK stays the same.
> > The problem is that running a query like the one below takes an exclusive
> > lock for too long (order of minutes in some tables when testing against
> > a backup db).
> > ...
> > Is it safe(ish) to just update pg_constraint.confupdtype and
> > pg_constraint.confdeltype for those?
>
> Well, it's probably safe, but it wouldn't have the results you want.
> What actually drives that behavior is the choice of trigger functions
> applied to the relations, so you'd have to also update the related
> pg_trigger rows appropriately.
>
> Also, I'm not too sure about the cacheing situation for pg_trigger,
> but it's moderately likely that a manual UPDATE on pg_trigger wouldn't
> force a cache flush, so that you'd have to do something extra to get
> running backends to notice the pg_trigger changes.  Since you're living
> dangerously already, a dummy UPDATE on the pg_class row for the affected
> relation would be good enough.
>
> You could probably get away with all that as long as your application
> isn't doing anything that makes it matter critically which semantics
> get applied while the changeover is being made.
>
> But test on a scratch database ...
>
> regards, tom lane

I see. Unfortunately I think all that would cross our "living too
dangerously" line.
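
For the record, the recipe Tom describes translates to something like the
following untested sketch (superuser only, and very much a scratch-database-first
exercise; the constraint and table names are taken from the original post, and it
glosses over details such as trigger deferrability):

BEGIN;

-- 1. Flip the stored referential actions on the constraint
--    ('c' = CASCADE, 'r' = RESTRICT in pg_constraint).
--    (Assumes the constraint name is unique in the database;
--     qualify by conrelid otherwise.)
UPDATE pg_constraint
   SET confupdtype = 'r', confdeltype = 'r'
 WHERE conname = 'partneracl_partner_fkey';

-- 2. The behaviour is really driven by the RI trigger functions on the
--    referenced table, so repoint those from the cascade variants to the
--    restrict variants as well.
UPDATE pg_trigger t
   SET tgfoid = (SELECT oid FROM pg_proc WHERE proname = 'RI_FKey_restrict_del')
  FROM pg_constraint c
 WHERE t.tgconstraint = c.oid
   AND c.conname = 'partneracl_partner_fkey'
   AND t.tgfoid = (SELECT oid FROM pg_proc WHERE proname = 'RI_FKey_cascade_del');

UPDATE pg_trigger t
   SET tgfoid = (SELECT oid FROM pg_proc WHERE proname = 'RI_FKey_restrict_upd')
  FROM pg_constraint c
 WHERE t.tgconstraint = c.oid
   AND c.conname = 'partneracl_partner_fkey'
   AND t.tgfoid = (SELECT oid FROM pg_proc WHERE proname = 'RI_FKey_cascade_upd');

-- 3. Dummy update on the referenced table's pg_class row, to make running
--    backends rebuild their relcache and notice the trigger change.
UPDATE pg_class SET relname = relname WHERE relname = 'partner';

COMMIT;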


Re: [GENERAL] Changing foreign key referential actions in big databases

2016-11-07 Thread Arthur Silva
On Nov 7, 2016 3:29 PM, "Adrian Klaver"  wrote:
>
> On 11/07/2016 02:09 AM, Arthur Silva wrote:
>>
>> Hi all, we're running a few Pg databases in production.
>>
>> Ubuntu 14.04 x64
>> 32 x64 cores
>> 64GB to 256GB memory, depending on cluster
>> PostgreSQL 9.3.9 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu
>> 4.8.2-19ubuntu1) 4.8.2, 64-bit
>> FusionIO storage
>>
>> We recently started looking into a long standing ticket to change some
>> foreign keys referential actions from CASCADE to RESTRICT for our own
>> safety. Everything else in the FK stays the same.
>>
>> The problem is that running a query like the one below takes an
>> exclusive lock for too long (order of minutes in some tables when
>> testing against a backup db).
>>
>> ALTER TABLE "partneracl"
>> DROP CONSTRAINT "partneracl_partner_fkey",
>> ADD CONSTRAINT "partneracl_partner_fkey"
>> FOREIGN KEY ("partner")
>> REFERENCES "partner"("name");
>>
>> Is there any way to change the foreign key referential actions quickly
>> and/or without an exclusive lock?
>
>
> Are there indexes on the child columns?
>

Yes, they're all backed by indexes.

>
>> Is it safe(ish) to just update pg_constraint.confupdtype and
>> pg_constraint.confdeltype for those?
>>
>> Regards
>>
>> --
>> Arthur Silva
>>
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com


[GENERAL] Changing foreign key referential actions in big databases

2016-11-07 Thread Arthur Silva
Hi all, we're running a few Pg databases in production.

Ubuntu 14.04 x64
32 x64 cores
64GB to 256GB memory, depending on cluster
PostgreSQL 9.3.9 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu
4.8.2-19ubuntu1) 4.8.2, 64-bit
FusionIO storage

We recently started looking into a long standing ticket to change some
foreign keys referential actions from CASCADE to RESTRICT for our own
safety. Everything else in the FK stays the same.

The problem is that running a query like the one below takes an exclusive
lock for too long (order of minutes in some tables when testing against a
backup db).

ALTER TABLE "partneracl"
DROP CONSTRAINT "partneracl_partner_fkey",
ADD CONSTRAINT "partneracl_partner_fkey"
FOREIGN KEY ("partner")
REFERENCES "partner"("name");

Is there any way to change the foreign key referential actions quickly
and/or without an exclusive lock?
Is it safe(ish) to just update pg_constraint.confupdtype and
pg_constraint.confdeltype for those?

Regards

--
Arthur Silva


Re: [GENERAL] jsonb search

2016-06-28 Thread Arthur Silva
On Tue, Jun 28, 2016 at 5:09 PM, Oleg Bartunov  wrote:

> On Tue, Jun 28, 2016 at 5:15 PM, Armand Pirvu (home)
>  wrote:
> > Hi
> >
> > In my quest of JSONB querying and searching without having to actually
> cast
> > into a text, I found JSQuery
> >
> > I do admit my JSONB knowledge shortcoming and I am not a developer but a
> > DBA. As such some examples would be greatly appreciated since I tend to
> > understand better
> >
> > I compiled and installed the extension
> >
> > 1 - Exact matching without knowing the hierarchy, just the key and
> element,
> > I built a set like
> >
> > col1 |   col2
> > --+--
> >1 | {"Home Email": {"EmailAddress": "1...@yahoo.com"}}
> >2 | {"Home Email": {"EmailAddress": "2...@yahoo.com"}}
> >3 | {"Home Email": {"EmailAddress": "3...@yahoo.com"}}
> >
> >
> > JSQuery is super
> >
> > SELECT * FROM test1 WHERE col2 @@ '*.EmailAddress = "1...@yahoo.com"';
> >
> > Now I can do a performance boost using
> >
> > CREATE INDEX idx1 ON test1 USING GIN (col2 jsonb_value_path_ops);
> >
> > I see this yield
> >
> > from
> >
> > testdb=# explain analyze
> > SELECT * FROM test1 WHERE col2 @@ '*.EmailAddress = "1...@yahoo.com"';
> > Seq Scan on test1  (cost=0.00..12423.00 rows=500 width=68) (actual
> > time=0.016..160.777 rows=1 loops=1)
> >   Filter: (col2 @@ '*."EmailAddress" = "1...@yahoo.com"'::jsquery)
> >   Rows Removed by Filter: 49
> > Planning time: 0.042 ms
> > Execution time: 160.799 ms
> > (5 rows)
> >
> >
> > to
> >
> > testdb-# SELECT * FROM test1 WHERE col2 @@ '*.EmailAddress =
> > "1...@yahoo.com"';
> > Bitmap Heap Scan on test1  (cost=31.88..1559.32 rows=500 width=68)
> (actual
> > time=0.018..0.019 rows=1 loops=1)
> >   Recheck Cond: (col2 @@ '*."EmailAddress" = "1...@yahoo.com"'::jsquery)
> >   Heap Blocks: exact=1
> >   ->  Bitmap Index Scan on idx1  (cost=0.00..31.75 rows=500 width=0)
> (actual
> > time=0.011..0.011 rows=1 loops=1)
> > Index Cond: (col2 @@ '*."EmailAddress" = "1...@yahoo.com
> "'::jsquery)
> > Planning time: 0.039 ms
> > Execution time: 0.038 ms
> > (7 rows)
> >
> > A whopping 4000 times improvement
> >
> >
> >
> >
> > But I also noticed a vodka index
> >
> >
> > testdb=# CREATE INDEX idx2 ON
> > testdb-# test1 USING vodka (col2);
> > ERROR:  access method "vodka" does not exist
> >
> > What am I missing ?
> >
> > 2 - Is there anyway I can accomplish a pattern and/or case insensitive
> > search using JSQuery similar to
> >
> >
> > select * from test2 where upper((col2 -> 'Home Email') ->>
> 'EmailAddress')
> > ilike '%3%YAH%';
> >
> > select * from test2 where (col2 -> 'Home Email') ->> 'EmailAddress' like
> > '%3%yah%';
> >
> >
> > If so what indexing strategy can be used to have similar gains as above ?
> >
> >
> > Many thanks for any help
>
> Vodka is our experimental prototype of a next-generation access method
> and it doesn't exist in production-ready form. You can
> check our presentation
> http://www.sai.msu.su/~megera/postgres/talks/highload-2014-vodka.pdf
> to understand jsquery's limitations and why we stopped its development.
> Also, 2 years ago I wrote (in Russian)
> http://obartunov.livejournal.com/179422.html about the jsonb query
> language and our plans. Google Translate might help
>
>
> https://translate.google.com/translate?sl=auto&tl=en&js=y&prev=_t&hl=en&ie=UTF-8&u=http%3A%2F%2Fobartunov.livejournal.com%2F179422.html&edit-text=&act=url
>
>
> >
> >
> > Armand
> >
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Could you share your future plans for it (or its reincarnation), if any?

Even in the limited form, vodka is very impressive.
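
On Armand's second question (pattern / case-insensitive matching): when the
path to the key is known, a pg_trgm expression index can serve the LIKE/ILIKE
searches. A hedged sketch against his test2 table follows; note it does not
cover the unknown-hierarchy case that jsquery's *. wildcard handles.

CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Trigram GIN index on the upper-cased e-mail address expression.
CREATE INDEX idx_test2_email_trgm ON test2
    USING gin (upper((col2 -> 'Home Email') ->> 'EmailAddress') gin_trgm_ops);

-- The query has to use the same expression (and an upper-cased pattern)
-- for the index to be considered:
SELECT *
  FROM test2
 WHERE upper((col2 -> 'Home Email') ->> 'EmailAddress') LIKE '%3%YAH%';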

--
Arthur Silva


Re: [GENERAL] Postgresql-fdw

2016-05-22 Thread Arthur Silva
You can build a multicorn fdw http://multicorn.org/
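
For the structured files, the built-in file_fdw contrib module may already be
enough; multicorn is the better fit for the unstructured ones, since the
wrapper logic is written in Python. A hedged sketch with a made-up file name
and column layout:

CREATE EXTENSION file_fdw;

CREATE SERVER flat_files FOREIGN DATA WRAPPER file_fdw;

-- Hypothetical CSV file; adjust the columns and options to the real layout.
CREATE FOREIGN TABLE events (
    event_time  timestamptz,
    user_name   text,
    payload     text
) SERVER flat_files
  OPTIONS (filename '/data/events.csv', format 'csv', header 'true');

-- SQL queries then run directly against the file:
SELECT user_name, count(*) FROM events GROUP BY user_name;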

Regards

On May 23, 2016 7:54 AM, "aluka raju"  wrote:
>
> I have data stored in flat files (structured and unstructured). I want
> to run SQL queries on that, so I am looking into PostgreSQL and how to use
> FDW on the data that I have. I want to prepare an API for running SQL
> queries on the data in the files, so I am trying to have a separate
> PostgreSQL engine and write a wrapper for connecting to the data in the
> files.
>
> Help me out on how I can proceed.
>
> Thanks
> aluka
>
> Sent with MailTrack


Re: [GENERAL] Share my experience and Thank you !

2016-05-13 Thread Arthur Silva
Any specific reason for choosing this old version of postgres?
On May 13, 2016 8:46 AM, "JingYuan Chen"  wrote:

> Hello,
>
> I want to share my experience about one of my projects and say thank you
> to the community.
>
> Scenario :
> My company's ERP system is SAP, and we rent a procurement system for bids. Its
> architecture is based on WebSphere 5, Oracle, and IBM Java 1.4. The provider
> informed us about 3 months ago that they had decided to close this bid
> platform on 2016/03/31.
>
> The good news is that they could give us the complete source code. But we
> don't have WebSphere or another Oracle license for this system.
> Fortunately, I found that there is a PostgreSQL wiki site providing useful
> information about tools and migration tips. After doing some analysis, I
> decided to use Jetty 9, PostgreSQL 9.1, and Oracle Java 1.7 to replace
> them. Transferring data was another problem. I decided to use the Pentaho
> Data Integration tool, an ETL tool. It let me monitor the status of the
> transfer until the job was done, and it helped me keep the data in Oracle
> and PostgreSQL consistent.
>
> After modifying many SQL commands to be suitable for PostgreSQL and
> setting up web.xml for Jetty, our procurement system could go live
> successfully without WebSphere and Oracle. However, its performance was
> poor. I found two problems causing this situation.
> One was that the JVM would crash occasionally. The other was that sometimes
> JDBC could not connect to PostgreSQL.
>
> Fortunately, Jetty is flexible and Java's garbage collection log provides
> useful information. I could tune the JVM with different parameters while
> starting Jetty. The last problem was JDBC. It used DBCP 1.3 by default to
> connect to the database. To fit our new architecture, I replaced DBCP with
> PGConnectionPoolDataSource.
>
> Now our system is running smoothly with this architecture. I also
> use some Perl scripts to transfer data and JCo RFC to perform SAP
> transactions.
>
> Thank You All !
>
>
> Regards,
>
> Chingyuan
>


Re: [GENERAL] arrays, inline to pointer

2016-05-03 Thread Arthur Silva
In fact, disabling toast compression will probably improve the performance
(the indirection will still take place). A float array is not usually very
compressible anyway.
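
A minimal sketch of what disabling TOAST compression means in practice,
assuming a hypothetical table measurements with a float8[] column samples
(EXTERNAL keeps the value out of line in the TOAST table but skips
compression):

ALTER TABLE measurements ALTER COLUMN samples SET STORAGE EXTERNAL;

-- Note: this only affects newly stored values; existing rows keep their
-- current representation until they are rewritten (e.g. by an UPDATE).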
On May 3, 2016 10:37 AM, "John R Pierce"  wrote:

> On 5/3/2016 1:21 AM, Marcus Engene wrote:
>
> For each array I've added, and populated, any dealings with the table has
> become way way slower. I can only assume this is because the array data is
> inline in the datablock on disk that stores the row.
>
>
> any field on a table that's more than a few dozen bytes gets 'toasted' and
> stored out of line in special 'toast' tables.
>
>
> --
> john r pierce, recycling bits in santa cruz
>
>


Re: [GENERAL] JSONB performance enhancement for 9.6

2015-11-29 Thread Arthur Silva
Is this correct? I'm fairly sure jsonb supports lazily parsing objects, and
each object level is searched using binary search.
On 29/11/2015 11:25 AM, "Tom Smith"  wrote:

> Hi, Thanks for everyone's response.
>
> The issue is not just compression, but lack of "indexing" or
> "segmentation" when a
> single doc has, say 2000 top level keys (with multiple levels of subkeys).
>  right now, if I query for one key,  the whole doc
> has to be first uncompressed and loaded and then searched for the single key.
>
> Compared to traditional way of storing each top level key with a separate
> column, this is huge overhead when table scan is required.  Some kind of
> "keyed/slotted" storage for the doc could
> help, (for illustration, all keys starting with 'A' would have its own
> storage unit, so on,
> so when I search for key  "A1" only that unit would be unpacked and
> traversed to get the "A1" value). It is like PostgreSQL predefining 26
> columns/slots for the whole doc: an internal indexing
> within each doc for fast retrieval of individual field values.
>
> Someone mentioned a plan in roadmap for this route but I'd like to know if
> it is in 9.6 plan.
>
> below url mentions the similar issue. I am not sure if it has been
> completely resolved.
>
>
> http://stackoverflow.com/questions/26259374/jsonb-performance-degrades-as-number-of-keys-increase
>
> below url mentions the potential issue.
>
>
> https://www.reddit.com/r/PostgreSQL/comments/36rdlr/improving_select_performance_with_jsonb_vs_hstore/
>
> Thanks
>
>
>
> On Sun, Nov 29, 2015 at 7:35 AM, Thomas Kellerer 
> wrote:
>
>> Tom Smith schrieb am 29.11.2015 um 03:27:
>>
>>> Hello:
>>>
>>> Is there a plan for 9.6 to resolve the issue of very slow
>>> query/retrieval of jsonb fields
>>> when there are large number (maybe several thousands) of top level keys.
>>> Currently, if I save a large json document with top level keys of
>>> thousands and query/retrieve
>>> field values,  the whole document has to be first decompressed and load
>>> to memory
>>> before searching for the specific field key/value.
>>>
>>> Thanks in Advance
>>>
>>
>> If you are concerned about the compression overhead, then why don't you
>> use (or try) JSON instead?
>>
>>
>>
>>
>>
>>
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>>
>
>


Re: [GENERAL] How to speed up pg_trgm / gin index scan

2015-07-28 Thread Arthur Silva
> >
> >
> > Any good ideas on how I could speed this up a bit?
> >
> > I have already tried to throw quite a bunch of memory at the problem:
> >
> > shared_buffers = 64GB
> > work_mem = 16GB
> >
> > but it didn't improve between this and the 32GB shared/ 2GB work GB I
> had before.
> >
> > This is on Postgres 9.1.15 on Linux.
> >
>
>
>
> > Try 9.4 and you'll surprise.
> >
> > 1. GIN has compression
> > 2. GIN has fast scan feature.
> >
> > Oleg
>
>
> Hi Oleg and List
>
> I finally got around to try 9.4, and it is quite fantastic.
>
> Index size went from 58 to now 14 GB:
>
> DM=# \di+ tridx_logs_01_msg
>List of relations
>  Schema |   Name| Type  |  Owner   |  Table  | Size  |
> Description
>
> +---+---+--+-+---+-
>  public | tridx_logs_01_msg | index | postgres | logs_01 | 14 GB |
>
>
> And the time for the above query went down to about 20 seconds:
>
>
> DM=# explain analyze
> DM-# select log_date, host, msg
> DM-# from logs_01 as log   where  log.msg like '%192.23.33.177%'
> DM-# and log.log_date >= '2015-1-18 1:45:24'
> DM-# and log.log_date <= '2015-1-19 1:45:24'
> DM-# order by log_date asc offset 200 limit 50;
>
> QUERY PLAN
>
> --
>  Limit  (cost=28815.06..28815.06 rows=1 width=194) (actual
> time=19032.099..19032.099 rows=0 loops=1)
>->  Sort  (cost=28814.74..28815.06 rows=128 width=194) (actual
> time=19032.093..19032.093 rows=0 loops=1)
>  Sort Key: log_date
>  Sort Method: quicksort  Memory: 25kB
>  ->  Bitmap Heap Scan on logs_01 log  (cost=28298.06..28810.26
> rows=128 width=194) (actual time=19031.992..19031.992 rows=0 loops=1)
>Recheck Cond: ((msg ~~ '%192.23.33.177%'::text) AND
> (log_date >= '2015-01-18 01:45:24'::timestamp without time zone) AND
> (log_date <= '2015-01-19 01:45:24'::timestamp without time zone))
>->  BitmapAnd  (cost=28298.06..28298.06 rows=128 width=0)
> (actual time=19031.983..19031.983 rows=0 loops=1)
>  ->  Bitmap Index Scan on tridx_logs_01_msg
> (cost=0.00..508.15 rows=8020 width=0) (actual time=18408.121..18408.121
> rows=99 loops=1)
>Index Cond: (msg ~~ '%192.23.33.177%'::text)
>  ->  Bitmap Index Scan on logs_01_date_index
> (cost=0.00..27789.60 rows=1325303 width=0) (actual time=623.084..623.084
> rows=1173048 loops=1)
>Index Cond: ((log_date >= '2015-01-18
> 01:45:24'::timestamp without time zone) AND (log_date <= '2015-01-19
> 01:45:24'::timestamp without time zone))
>  Planning time: 0.945 ms
>  Execution time: 19032.409 ms
> (13 rows)
>
> Great stuff! Sorry Oleg I don't have your original message anymore and
> can't reply into the right place in the thread, so I took the liberty to
> CC: you.
>
> Christian
>
>
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

Christian

You could experiment with recompiling pg_trgm with the KEEPONLYALNUM
and/or IGNORECASE definitions commented out if you are looking for exact
matches; this will increase the index size but will make it more selective.

Also, there's a thread around for pg_trgm 1.2, which will give you an even
bigger boost.

--
Arthur Silva


Re: [GENERAL] Which replication is the best for our case ?

2015-07-01 Thread Arthur Silva
On Wed, Jul 1, 2015 at 7:08 AM, ben.play 
wrote:

> In fact, the cron job will :
> -> select about 10 000 lines from a big table (>100 Gb of data). 1 user has
> about 10 lines.
> -> each line will be examined by an algorithm
> -> at the end of each line, the cron job updates a few parameters for the
> user (add some points for example)
> -> Then, it inserts a line in another table to indicate to the user each
> transaction.
>
> All updates and inserts can be made ONLY by the cron job ...
> Therefore ... the merge can be done easily: no one else can update this new
> data.
>
> But ... how can big companies like Facebook or YouTube run calculations on
> dedicated server(s) without impacting users?
>
>
>
> --
> View this message in context:
> http://postgresql.nabble.com/Which-replication-is-the-best-for-our-case-tp5855685p5856062.html
> Sent from the PostgreSQL - general mailing list archive at Nabble.com.
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


I'm assuming this query is really HUGE,
otherwise I can't see why it'd bring your database to a halt, especially with
that amount of main memory.

That aside, I don't see why you can't send inserts in small batches back to
the master DB.

Regards.


Re: [GENERAL] Which replication is the best for our case ?

2015-06-30 Thread Arthur Silva
On Tue, Jun 30, 2015 at 1:57 PM, ben.play 
wrote:

> Hi guys,
>
> Thank you a lot for your answers.
>
> In fact, I tried to write the easiest explanation of my problem in order to
> be understood...
> My project is developed with Symfony and Doctrine (BERK, i know ...).
>
> The project is more than 2 years old, and Doctrine makes some bad decisions
> and locks the whole table for a while.
> We are developing the project without Doctrine but it will not be available
> within 1 year...
>
> To be more precise: we have a database with more than 400 GB and ONE table
> with more than 100 GB of data. This is huge for Doctrine. When the cron
> runs, it writes a lot to disk in temporary files (although we have 128
> GB of RAM...). Of course, each table is well indexed...
>
> That is why I'm thinking about replication: my server A (master) is for my
> users... and my server B is a server reserved for calculations (and it is
> server B that writes to the database).
>
> This is a image of my dream system :
> <
> http://postgresql.nabble.com/file/n5855916/9e41ce1f-38ea-4fba-a437-a43c598e655c.jpg
> >
> (If you can't see the image :
> http://tof.canardpc.com/view/9e41ce1f-38ea-4fba-a437-a43c598e655c.jpg)
>
>
> Thank you a lot for your help !
>
>
>
> --
> View this message in context:
> http://postgresql.nabble.com/Which-replication-is-the-best-for-our-case-tp5855685p5855916.html
> Sent from the PostgreSQL - general mailing list archive at Nabble.com.
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Hello,

Streaming replication will do just fine from ServerA to ServerB, but as for
the rest of the data flow I'm afraid we will need more details.
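
For reference, a minimal 9.3-era streaming replication setup from server A to
server B looks roughly like this (host names and credentials are placeholders).
Keep in mind the standby is read-only, so the cron's writes would still have to
flow back to the master somehow:

# server A (master), postgresql.conf
wal_level = hot_standby
max_wal_senders = 3
# plus a "replication" entry for server B in pg_hba.conf

# server B (standby), postgresql.conf
hot_standby = on

# server B, recovery.conf (after taking a base backup, e.g. with pg_basebackup)
standby_mode = 'on'
primary_conninfo = 'host=serverA user=replicator password=secret'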


Re: [GENERAL] Which replication is the best for our case ?

2015-06-29 Thread Arthur Silva
On Mon, Jun 29, 2015 at 1:44 PM, Adrian Klaver 
wrote:

> On 06/29/2015 08:23 AM, Arthur Silva wrote:
>
>> On Mon, Jun 29, 2015 at 10:02 AM, ben.play  wrote:
>>
>> Hi guys,
>>
>> We have a PG database with more than 400 GB of data.
>> At this moment, a cron runs each ten minutes and updates about 10
>> 000 lines
>> with complex algorithms in PHP.
>>
>> Each time the cron runs, the website is almost down because some
>> queries
>> have to make an update on the FULL table...
>>
>> Therefore, I'm asking if it's possible to duplicate my main database
>> on a
>> slave server in order to run these cron on this second server... then,
>> replicate these changes on the main database (master).
>>
>> Which replication is the best in this case ?
>>
>> http://www.postgresql.org/docs/9.3/static/warm-standby.html ?
>>
>> Do you have any links or tutorial which explain this kind of
>> operation ?
>>
>> Thanks a lot !
>>
>>
>>
>> --
>> View this message in context:
>>
>> http://postgresql.nabble.com/Which-replication-is-the-best-for-our-case-tp5855685.html
>> Sent from the PostgreSQL - general mailing list archive at Nabble.com.
>>
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>>
>>
>>
>> Hello Adrian, can you give us one example of such FULL table update
>> queries?
>>
>
> Actually it is the OP(Ben) that is going to have to supply that.
>
>
>
>> By website down you mean slowed to a halt or read-only mode (due to the
>> update locks)?
>>
>> Either way it doesn't look like replication is going to solve your
>> problem.
>>
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>

Oh of course Adrian! I must have confused the names at the time. Sorry!


Re: [GENERAL] Which replication is the best for our case ?

2015-06-29 Thread Arthur Silva
On Mon, Jun 29, 2015 at 10:02 AM, ben.play 
wrote:

> Hi guys,
>
> We have a PG database with more than 400 GB of data.
> At this moment, a cron runs each ten minutes and updates about 10 000 lines
> with complex algorithms in PHP.
>
> Each time the cron runs, the website is almost down because some queries
> have to make an update on the FULL table...
>
> Therefore, I'm asking if it's possible to duplicate my main database on a
> slave server in order to run these cron on this second server... then,
> replicate these changes on the main database (master).
>
> Which replication is the best in this case ?
>
> http://www.postgresql.org/docs/9.3/static/warm-standby.html ?
>
> Do you have any links or tutorial which explain this kind of operation ?
>
> Thanks a lot !
>
>
>
> --
> View this message in context:
> http://postgresql.nabble.com/Which-replication-is-the-best-for-our-case-tp5855685.html
> Sent from the PostgreSQL - general mailing list archive at Nabble.com.
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Hello Adrian, can you give us one example of such FULL table update queries?

By website down you mean slowed to a halt or read-only mode (due to the
update locks)?

Either way it doesn't look like replication is going to solve your problem.


Re: [GENERAL] advocating LTS release and feature-train release cycles

2015-06-01 Thread Arthur Silva
On Sun, May 31, 2015 at 6:44 AM, Zenaan Harkness  wrote:

> My comments advocating a (ubuntu/debian/linux-kernel/firefox) LTS
> release and feature-train release cycle:
> https://lwn.net/Articles/646740/
> https://lwn.net/Articles/646743/
>
> The parent article "PostgreSQL: the good, the bad, and the ugly":
> https://lwn.net/Articles/645020/
>
> My summary (from one of my comments above):
> "For PostgreSQL may be:
> - normal release every 3 or 4 months
> - LTS release every 12, 18 or 24 months
>
> This model provides:
> - higher frequency normal releases to
>   a) showcase new features to the public and
>   b) reduce pressure on developers wanting to not miss an "infrequent
> annual" release; and
>
> - lower frequency LTS releases to
>   a) focus testing, stability and long term support resources
>   b) satisfy "conservative/ enterprise" RDBMS admins
> "
>
> Regards,
> Zenaan
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


I'm surprised it got no replies so far.

In my opinion a twice-a-year schedule would be good.
The LTS would be every 2 or 4 releases, keeping 2 LTS versions supported at
all times.

Maybe this should be reposted to the hackers list?


Re: R: [GENERAL] Index on integer or on string field

2015-05-15 Thread Arthur Silva
Yes that's my suggestion. Btree-Gin deals with lots of repeated values much
better than the Btree index as repeated keys are only stored once.
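
A minimal sketch of the suggestion, assuming a hypothetical products table
with the low-cardinality text column category (the ~50 recurring names). One
caveat: per the btree_gin docs a GIN index cannot enforce uniqueness, so it
only replaces the plain btree for lookups, not for unique constraints.

CREATE EXTENSION IF NOT EXISTS btree_gin;

CREATE INDEX idx_products_category_gin ON products USING gin (category);

-- Equality lookups on the repeated values can then use the GIN index,
-- which stores each distinct key only once:
SELECT * FROM products WHERE category = 'books';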
On 15/05/2015 12:38, "Job"  wrote:

>  Hello Arthur!
>
> So, I read that btree-gin has "the ability to enforce uniqueness".
>
> If in this 10-million-row table I have 50 recurring values in the index, can
> I keep the alphabetical field and change the index on it to btree-gin?!
>
> Thank you!
> Francesco
>
>  --
> *From:* Arthur Silva [arthur...@gmail.com]
> *Sent:* Friday, 15 May 2015 17:26
> *To:* Job
> *Cc:* pgsql-general@postgresql.org
> *Subject:* Re: [GENERAL] Index on integer or on string field
>
>   You should probably experiment with a btree-gin index on those.
> On 15/05/2015 12:22, "Job"  wrote:
>
>> Hello,
>>
>> I have a table of about 10 million records, with an index on a
>> string field.
>> It is currently alphabetical; since queries run at about 100/200 per
>> second, I was looking for a better way to improve performance and reduce
>> workload.
>>
>> The unique values of that field number about 50 (category names), and
>> we could create a second table to encode the 50 recurring names as
>> numerical integer values.
>>
>> If the index is on an integer rather than a character field, is performance
>> better and is the workload reduced?
>>
>> Are there any comparisons?
>>
>> Thank you!
>> Francesco
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>>
>


Re: [GENERAL] Index on integer or on string field

2015-05-15 Thread Arthur Silva
You should probably experiment with a btree-gin index on those.
On 15/05/2015 12:22, "Job"  wrote:

> Hello,
>
> I have a table of about 10 million records, with an index on a string
> field.
> It is currently alphabetical; since queries run at about 100/200 per
> second, I was looking for a better way to improve performance and reduce
> workload.
>
> The unique values of that field number about 50 (category names), and
> we could create a second table to encode the 50 recurring names as
> numerical integer values.
>
> If the index is on an integer rather than a character field, is performance
> better and is the workload reduced?
>
> Are there any comparisons?
>
> Thank you!
> Francesco
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Re: [GENERAL] json-patch support?

2015-03-27 Thread Arthur Silva
On Fri, Mar 27, 2015 at 1:56 PM, Deven Phillips 
wrote:

> Are there any plans or ideas about implement JSON Patch (
> http://jsonpatch.com/) support for PostgreSQL? We deal with some
> relatively large JSON documents for our in-house application and it is
> often better to just send a json-patch update rather than the full
> document. It would be very nice if we could just select for the changes via
> a trigger and use NOTIFY to tell our application about a patch. If nobody
> has discussed it previously, perhaps I will look into implementing it
> myself.
>
> Thanks in advance,
>
> Deven
>

This could be implemented as an extension.
There are already a few extensions that provide this functionality via
plain functions, so it's just a matter of parsing the JSON patch and executing
those functions.
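
As an illustration only (not a full JSON Patch implementation): from 9.5 on, a
single "replace" operation maps onto the core jsonb_set() function; an
extension would essentially loop over the patch array and dispatch on each
"op".

SELECT jsonb_set(
         '{"user": {"name": "Deven", "active": true}}'::jsonb,
         '{user,active}',      -- the patch op's "path"
         'false'::jsonb        -- the patch op's "value"
       );
-- => {"user": {"name": "Deven", "active": false}}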


Re: [GENERAL] Some indexing advice for a Postgres newbie, please?

2015-02-19 Thread Arthur Silva
On Thu, Feb 19, 2015 at 2:14 PM, Brian Dunavant  wrote:

> You should consider a BitString.
> http://www.postgresql.org/docs/9.4/static/datatype-bit.html
>
> On Thu, Feb 19, 2015 at 11:10 AM, brian  wrote:
> >
> > Hi folks,
> >
> > I have a single-user application which is growing beyond the
> > fixed-format data files in which it currently holds its data, I need a
> > proper database as the backend. The front end is written using Lazarus
> > and FreePascal under Linux, should anyone feel that makes a
> > difference. The database will need to grow to around 250,000 records.
> >
> > My problem is with the data field which is the (unique) key. It's
> > really a single 192-bit integer (it holds various bits of bitmapped
> > data) which I currently hold as six 32-bit integers, but can convert
> > if needed when transferring the data.
> >
> > How would you advise that I hold this field in a Postgres database,
> > given the requirement for the whole thing to be a unique key? The
> > first 64 bits change relatively infrequently, the last 128 bits will
> > change with virtually every record. The last 128 bits will ALMOST be
> > unique in themselves, but not quite. :(
> >
> > Thanks,
> >
> > Brian.
> >
> >
> > --
> > Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> > To make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-general
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

I suggest the bytea type.
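
A hedged sketch of the bytea route, with a made-up table name: the 192-bit key
becomes a fixed 24-byte value, and the six 32-bit integers can be packed
client-side or in SQL with the binary send functions.

CREATE TABLE records (
    key   bytea PRIMARY KEY CHECK (octet_length(key) = 24),
    data  text
);

-- int4send() yields each integer's 4-byte big-endian representation:
INSERT INTO records (key, data)
VALUES (int4send(1) || int4send(2) || int4send(3) ||
        int4send(4) || int4send(5) || int4send(6),
        'example row');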


Re: [GENERAL] Advice for using integer arrays?

2015-01-06 Thread Arthur Silva
On Jan 6, 2015 3:12 PM, "Michael Heaney"  wrote:
>
> I'm fairly new to Postgres, and have a design issue for which an array of
integers might be a good solution.  But I'd like to hear from the experts
before proceeding down this path.
>
> Essentially, I'm trying to model the relationship between a group of
biological samples and their genes. Each sample (there are ~10K of them
now, with more coming) will have about 30,000 genes. Conversely, a
particular gene may be present in almost all samples.
>
> So I've created the following table to handle the many-to-many
relationship:
>
>  create table sample_gene (id serial, sample_id int, gene_id int);
>
> which looks like this when populated:
>
> sample_id|gene_id
> ---
> 1 |  1
> 1 |  2
> ...
> 1 |30475
> 2 | 1
> 2 | 2
> ...
> 2 |29973
> 3 |  1
> etc.
>
> The table now contains hundreds of millions of rows (with many, many more
to come).  Join performance between samples and genes is quite slow, even
with indexes on sample_id and gene_id.
>
> So it occurred to me: why not eliminate all the duplicate sample_id
values by storing the gene_id's in an array, like so:
>
>  create table sample_gene_array (id serial, sample_id int, gene_id int []
);
>
> So now the table data looks like this:
>
> sample_id|gene_id []
> ---
> 1 |  [1:30475]
> 2 |  [1:29973]
> etc.
>
> The new table is significantly smaller, and performance (using ANY[] ) is
quite good.  Nevertheless, I'm uneasy.  I come from a Sybase ASE
background, and so have no experience with arrays as datatypes. Is it okay
to store 30K+ gene values in an array in the linking table (or maybe even
in the sample table itself, thus eliminating the linking table)?  Should I
unnest the gene_id's first, before using them to join to the gene table?
>
> TIA for any guidance you can provide.  Again, I'm a Postgres neophyte -
but I'm in awe of the power and flexibility of this database, and wish that
I'd started using it sooner.
>
> --
> Michael Heaney
> JCVI
>
>
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general

Please provide sample queries so we can understand how you query the data.
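
In the meantime, a hedged sketch of how the array layout is usually queried,
using the sample_gene_array table from the post (the gene table name is an
assumption): a GIN index on the int[] column serves containment tests, and
unnest() re-expands the array when a join against the gene table is needed.

-- GIN index with the default array operator class:
CREATE INDEX idx_sga_genes ON sample_gene_array USING gin (gene_id);

-- "Which samples contain gene 12345?" can use the index:
SELECT sample_id
  FROM sample_gene_array
 WHERE gene_id @> ARRAY[12345];

-- Joining back to a hypothetical gene table by unnesting first:
SELECT s.sample_id, g.*
  FROM sample_gene_array s
  CROSS JOIN LATERAL unnest(s.gene_id) AS u(gene_id)
  JOIN gene g ON g.id = u.gene_id
 WHERE s.sample_id = 1;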


Re: [GENERAL] Storing Video's or vedio file in DB.

2014-12-17 Thread Arthur Silva
This! I'm surprised it took so long for somebody to suggest an object store.
On Dec 17, 2014 9:22 PM, "Jonathan Vanasco"  wrote:

>
> I wouldn't even store it on the filesystem if I could avoid that.
> Most people I know will assign the video a unique identifier (which is
> stored in the database) and then store the video file with a 3rd party
> (e.g. Amazon S3).
>
> 1. This is often cheaper.  Videos take up a lot of disk space.  Having to
> ensure 2-3 copies of a file as a failover is not fun.
> 2. It offloads work from internal servers.  Why deal with connections that
> are serving a static file if you can avoid it?
>
> In terms of FS vs DB (aside from the open vs streaming which was already
> brought up)
>
> I think the big issue with storing large files in the database is the
> input/output connection.
> Postgres has a specified number of max connections available, and each one
> has some overhead to operate. Meanwhile, a server like nginx can handle 10k
> connections easily, and with little or no overhead.  While the speed is
> comparable to the OS, you end up using a resource from a limited database
> connection pool.  And you run the risk of a slow/dropped client tying up
> the connection.
> Why allocate a resource to these operations, when there are more
> lightweight alternatives that won't tie up a database connection ?
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Re: [GENERAL] Help Optimizing a Summary Query

2014-12-11 Thread Arthur Silva
On Thu, Dec 11, 2014 at 6:52 PM, Robert DiFalco 
wrote:

> Thanks Arthur. I don't think there is as big a difference between BIGINT
> and INTEGER as you think there is. In fact with an extended filesystem you
> might not see any difference at all.
>
> As I put in the first emal I am using a GIST index on user.name.
>
> I was really more interested in the LEFT OUTER JOINs vs EXISTS queries and
> if there was a better alternative I had not considered.
>
> On Tue, Dec 9, 2014 at 11:44 AM, Arthur Silva  wrote:
>
>> On Tue, Dec 9, 2014 at 4:18 PM, Robert DiFalco 
>> wrote:
>>
>>> I'm sorry, I missed a JOIN on the second variation. It is:
>>>
>>> SELECT u.id, u.name, u.imageURL, u.bio,
>>>CASE
>>>   WHEN f.friend_id IS NOT NULL THEN 'isFriend'
>>>   WHEN s.to_id IS NOT NULL THEN 'hasSentRequest'
>>>   WHEN r.to_id IS NOT NULL THEN 'hasReceivedRequest'
>>>   ELSE 'none'
>>>END AS 'friendStatus',
>>>(SELECT COUNT(1) AS d
>>>   FROM friends f1
>>>  JOIN friends f2 ON f1.fiend_id = f2.friend_id
>>>   WHERE f1.user_id = 33 AND f2.user_id = u.id)
>>> FROM users u
>>> *LEFT OUTER JOIN friends f ON f.user_id = 33 AND f.friend_id = u.id*
>>> LEFT OUTER JOIN friend_requests s ON s.to_id = 33 AND s.from_id = u.id
>>> LEFT OUTER JOIN friend_requests r ON r.to_id = u.id AND r.from_id = 33
>>> WHERE u.id != 33 AND u.name LIKE '%John%' ORDER BY u.name;
>>>
>>>
>>> On Tue, Dec 9, 2014 at 10:15 AM, Robert DiFalco <
>>> robert.difa...@gmail.com> wrote:
>>>
>>>> I have users, friends, and friend_requests. I need a query that
>>>> essentially returns a summary containing:
>>>>
>>>> * user (name, imageURL, bio, ...)
>>>> * Friend status (relative to an active user)
>>>>* Is the user a friend of the active user?
>>>>* Has the user sent a friend request to the active user?
>>>>* Has the user received a friend request from the active user?
>>>> * # of mutualFriends
>>>> * Exclude the active user from the result set.
>>>>
>>>> So I have mocked this up two ways but both have complicated query plans
>>>> that will be problematic with large data sets. I'm thinking that my lack of
>>>> deep SQL knowledge is making me miss the obvious choice.
>>>>
>>>> Here's my two query examples:
>>>>
>>>> SELECT u.id, u.name, u.imageURL, u.bio,
>>>>CASE
>>>>   WHEN EXISTS(SELECT 1 FROM friends f WHERE f.user_id = 33 AND
>>>> f.friend_id = u.id)   THEN 'isFriend'
>>>>   WHEN EXISTS(SELECT 1 FROM friend_requests s WHERE s.to_id = 33
>>>> AND s.from_id = u.id) THEN 'hasSentRequest'
>>>>   WHEN EXISTS(SELECT 1 FROM friend_requests r WHERE r.to_id = u.id
>>>> AND r.from_id = 33)   THEN 'hasReceivedRequest'
>>>>   ELSE 'none'
>>>>END AS "friendStatus",
>>>>(SELECT COUNT(1)
>>>>   FROM friends f1
>>>>  JOIN friends f2 ON f1.friend_id = f2.friend_id
>>>>   WHERE f1.user_id = 33 AND f2.user_id = u.id) AS mutualFriends
>>>> FROM users u
>>>> WHERE u.id != 33 AND u.name LIKE 'John%' ORDER BY u.name;
>>>>
>>>> SELECT u.id, u.name, u.imageURL, u.bio,
>>>>CASE
>>>>   WHEN f.friend_id IS NOT NULL THEN 'isFriend'
>>>>   WHEN s.to_id IS NOT NULL THEN 'hasSentRequest'
>>>>   WHEN r.to_id IS NOT NULL THEN 'hasReceivedRequest'
>>>>   ELSE 'none'
>>>>END AS 'friendStatus',
>>>>(SELECT COUNT(1) AS d
>>>>   FROM friends f1
>>>>  JOIN friends f2 ON f1.fiend_id = f2.friend_id
>>>>   WHERE f1.user_id = 33 AND f2.user_id = u.id)
>>>> FROM users u
>>>> LEFT OUTER JOIN friend_requests s ON s.to_id = 33 AND s.from_id = u.id
>>>> LEFT OUTER JOIN friend_requests r ON r.to_id = u.id AND r.from_id = 33
>>>> WHERE u.id != 33 AND u.name LIKE 'John%' ORDER BY u.name;
>>>>
>>>> 33 is just the id of the active user I am using for testing. The WHERE
>>>> clause could be anything. I'm just using "u.name" here

Re: [GENERAL] Help Optimizing a Summary Query

2014-12-09 Thread Arthur Silva
On Tue, Dec 9, 2014 at 4:18 PM, Robert DiFalco 
wrote:

> I'm sorry, I missed a JOIN on the second variation. It is:
>
> SELECT u.id, u.name, u.imageURL, u.bio,
>CASE
>   WHEN f.friend_id IS NOT NULL THEN 'isFriend'
>   WHEN s.to_id IS NOT NULL THEN 'hasSentRequest'
>   WHEN r.to_id IS NOT NULL THEN 'hasReceivedRequest'
>   ELSE 'none'
>END AS 'friendStatus',
>(SELECT COUNT(1) AS d
>   FROM friends f1
>  JOIN friends f2 ON f1.fiend_id = f2.friend_id
>   WHERE f1.user_id = 33 AND f2.user_id = u.id)
> FROM users u
> *LEFT OUTER JOIN friends f ON f.user_id = 33 AND f.friend_id = u.id
> *
> LEFT OUTER JOIN friend_requests s ON s.to_id = 33 AND s.from_id = u.id
> LEFT OUTER JOIN friend_requests r ON r.to_id = u.id AND r.from_id = 33
> WHERE u.id != 33 AND u.name LIKE '%John%' ORDER BY u.name;
>
>
> On Tue, Dec 9, 2014 at 10:15 AM, Robert DiFalco 
> wrote:
>
>> I have users, friends, and friend_requests. I need a query that
>> essentially returns a summary containing:
>>
>> * user (name, imageURL, bio, ...)
>> * Friend status (relative to an active user)
>>* Is the user a friend of the active user?
>>* Has the user sent a friend request to the active user?
>>* Has the user received a friend request from the active user?
>> * # of mutualFriends
>> * Exclude the active user from the result set.
>>
>> So I have mocked this up two ways but both have complicated query plans
>> that will be problematic with large data sets. I'm thinking that my lack of
>> deep SQL knowledge is making me miss the obvious choice.
>>
>> Here's my two query examples:
>>
>> SELECT u.id, u.name, u.imageURL, u.bio,
>>CASE
>>   WHEN EXISTS(SELECT 1 FROM friends f WHERE f.user_id = 33 AND
>> f.friend_id = u.id)   THEN 'isFriend'
>>   WHEN EXISTS(SELECT 1 FROM friend_requests s WHERE s.to_id = 33
>> AND s.from_id = u.id) THEN 'hasSentRequest'
>>   WHEN EXISTS(SELECT 1 FROM friend_requests r WHERE r.to_id = u.id
>> AND r.from_id = 33)   THEN 'hasReceivedRequest'
>>   ELSE 'none'
>>END AS "friendStatus",
>>(SELECT COUNT(1)
>>   FROM friends f1
>>  JOIN friends f2 ON f1.friend_id = f2.friend_id
>>   WHERE f1.user_id = 33 AND f2.user_id = u.id) AS mutualFriends
>> FROM users u
>> WHERE u.id != 33 AND u.name LIKE 'John%' ORDER BY u.name;
>>
>> SELECT u.id, u.name, u.imageURL, u.bio,
>>CASE
>>   WHEN f.friend_id IS NOT NULL THEN 'isFriend'
>>   WHEN s.to_id IS NOT NULL THEN 'hasSentRequest'
>>   WHEN r.to_id IS NOT NULL THEN 'hasReceivedRequest'
>>   ELSE 'none'
>>END AS 'friendStatus',
>>(SELECT COUNT(1) AS d
>>   FROM friends f1
>>  JOIN friends f2 ON f1.fiend_id = f2.friend_id
>>   WHERE f1.user_id = 33 AND f2.user_id = u.id)
>> FROM users u
>> LEFT OUTER JOIN friend_requests s ON s.to_id = 33 AND s.from_id = u.id
>> LEFT OUTER JOIN friend_requests r ON r.to_id = u.id AND r.from_id = 33
>> WHERE u.id != 33 AND u.name LIKE 'John%' ORDER BY u.name;
>>
>> 33 is just the id of the active user I am using for testing. The WHERE
>> clause could be anything. I'm just using "u.name" here but I'm more
>> concerned about the construction of the result set than the WHERE clause.
>> These have more or less similar query plans, nothing that would change
>> things factorially. Is this the best I can do or am I missing the obvious?
>>
>> Here are the tables:
>>
>>
>> CREATE TABLE users (
>>   idBIGINT,
>>   name  VARCHAR,
>>   imageURL  VARCHAR
>>   created   TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
>>   phone_natlBIGINT,   /* National Phone Number */
>>   country_e164  SMALLINT, /* E164 country code */
>>   email VARCHAR(255),
>>   PRIMARY KEY (id),
>>   UNIQUE (email),
>>   UNIQUE (phone_natl, country_e164)
>> );
>>
>>
>> CREATE TABLE friends (
>>   user_id  BIGINT,
>>   friend_id   BIGINT,
>>   PRIMARY KEY (user_id, user_id),
>>   FOREIGN KEY (user_id)REFERENCES users(id) ON DELETE CASCADE,
>>   FOREIGN KEY (friend_id)  REFERENCES users(id) ON DELETE CASCADE
>> );
>> CREATE INDEX idx_friends_friend ON friends(friend_id);
>>
>> CREATE TABLE friend_requests (
>>   from_id  BIGINT,
>>   to_idBIGINT,
>>   created  TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP,
>>   PRIMARY KEY (from_id, user_id),
>>   FOREIGN KEY (from_id)  REFERENCES users(id) ON DELETE CASCADE,
>>   FOREIGN KEY (to_id)REFERENCES users(id) ON DELETE CASCADE
>> );
>> CREATE INDEX idx_friend_requests_to ON friend_requests(to_id);
>>
>> Let me know if you guys need anything else.
>>
>>
>

Hello Robert, none of your schemas worked for me, here's a clean version

CREATE TABLE users (
  idBIGINT,
  name  VARCHAR,
  imageURL  VARCHAR,
  created   TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  phone_natlBIGINT,
  country_e164  SMALLINT,
  email VARCHAR(255),
  PRIMARY KEY (id),
  UNIQUE (email),
  UNIQUE (phone_natl, country_e164)
);


CREA

[GENERAL] Dynomite from Netflix - Making Non-Distributed Databases, Distributed

2014-11-03 Thread Arthur Silva
Hello all,
I've come across Dynomite this evening.
http://techblog.netflix.com/2014/11/introducing-dynomite.html

It already has some interesting features and I'm sure there'll be some
momentum behind it.
It's too soon to say, but I can definitely see something neat being built on
top of Postgres.

PS: It's written in C and Apache-licensed.

--
Arthur Silva