Re: [GENERAL] first_value/last_value

2016-05-18 Thread Tom Smith
It would save a lot of trouble for many people if postgresql had built-in
first()/last() aggregate functions alongside sum()/avg().
There is already a C extension and a wiki sample, and the window-function
versions are implemented.
I am curious why these two functions were not added along with their window
implementation counterparts, for completeness and consistency.
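
(For reference, a minimal SQL-level sketch along the lines of the wiki's
first()/last() aggregates; these are not the requested built-ins, and the
result depends on input order unless an ORDER BY is given inside the
aggregate call:)

-- state function keeps the first non-null value seen
CREATE OR REPLACE FUNCTION first_agg(anyelement, anyelement)
RETURNS anyelement LANGUAGE sql IMMUTABLE STRICT AS 'SELECT $1';

CREATE AGGREGATE first (anyelement) (
    SFUNC = first_agg,
    STYPE = anyelement
);

-- state function keeps the last non-null value seen
CREATE OR REPLACE FUNCTION last_agg(anyelement, anyelement)
RETURNS anyelement LANGUAGE sql IMMUTABLE STRICT AS 'SELECT $2';

CREATE AGGREGATE last (anyelement) (
    SFUNC = last_agg,
    STYPE = anyelement
);

-- usage: SELECT first(col1 ORDER BY col1), last(col1 ORDER BY col1) FROM t;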


On Wed, May 18, 2016 at 10:42 PM, Melvin Davidson 
wrote:

>
>
> On Wed, May 18, 2016 at 10:36 PM, Adam Brusselback <
> adambrusselb...@gmail.com> wrote:
>
>> Here is an example that works in a single query.  Since you have two
>> different orders you want the data back in, you need to use subqueries to
>> get the proper data back, but it works, and is very fast.
>>
>> CREATE TEMPORARY TABLE foo AS
>> SELECT generate_series as bar
>> FROM generate_series(1, 100);
>>
>> CREATE INDEX idx_foo_bar ON foo (bar);
>>
>>
>> SELECT *
>> FROM (
>> SELECT bar
>> FROM foo
>> ORDER BY bar asc
>> LIMIT 1
>> ) x
>> UNION ALL
>> SELECT *
>> FROM (
>> SELECT bar
>> FROM foo
>> ORDER BY bar desc
>> LIMIT 1
>> ) y;
>>
>> DROP TABLE foo;
>>
>
> Seems to me SELECT min(<column>), max(<column>) FROM deja.vu; would also
> work.
>
>
> --
> *Melvin Davidson*
> I reserve the right to fantasize.  Whether or not you
> wish to share my fantasy is entirely up to you.
>


Re: [GENERAL] first_value/last_value

2016-05-18 Thread Tom Smith
I need to use both in a single SQL statement.

On Wed, May 18, 2016 at 9:08 PM, Adam Brusselback <adambrusselb...@gmail.com
> wrote:

> Is there a reason you can't do that now with a limit 1/order by/union all?
> Just have it ordered one way on the first query and the other on the
> bottom. That will give you two rows that are the first / last in your set
> based on whatever column you order on.
> On May 18, 2016 8:47 PM, "Tom Smith" <tomsmith198...@gmail.com> wrote:
>
>> Hello:
>>
>> Is there a plan for 9.7  to enable using the two aggregate function
>> as non-window function?   i.e.  enabling getting the first/last row
>> in single sql without using window features.
>> There is actually a  C-extension for first()/last().
>> I am wondering  if 9.7 would make them built-in function like max/min
>>
>> Thanks
>>
>


[GENERAL] first_value/last_value

2016-05-18 Thread Tom Smith
Hello:

Is there a plan for 9.7 to make these two aggregate functions available as
non-window functions, i.e. to get the first/last row in a single SQL
statement without using window features?
There is actually a C extension for first()/last().
I am wondering if 9.7 would make them built-in functions like max()/min().

Thanks


Re: [GENERAL] JSONB performance enhancement for 9.6

2016-05-01 Thread Tom Smith
No, it is within the individual json object's storage. In a way it would be
part of the query plan, but strictly speaking, for the individual json
object's storage structure it is not necessarily an "index".
One possible (but primitive) implementation could be to have multiple
"segments" in the storage, with all keys starting with "a" in the first
segment, and so on.
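
(For reference, a sketch of the expression index Oleg mentions below,
assuming a hypothetical table docs with a jsonb column data and a
frequently queried key 'a1':)

CREATE INDEX docs_a1_idx ON docs ((data ->> 'a1'));

-- a query of this shape can then use the btree index rather than
-- decompressing and scanning every document:
SELECT * FROM docs WHERE data ->> 'a1' = 'some value';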

On Sun, May 1, 2016 at 4:14 PM, Oleg Bartunov <obartu...@gmail.com> wrote:

>
>
> On Sun, May 1, 2016 at 6:46 AM, Tom Smith <tomsmith198...@gmail.com>
> wrote:
>
>> Hello:
>>
>> I'd like to bring this JSONB performance issue again.
>> Below is a link of MySQL way of storing/retrieving Json key/value
>>
>> https://dev.mysql.com/doc/refman/5.7/en/json.html
>>
>> Instead of providing column indexing(like GIN for JSONB in Postgresql).
>> it provides only internal data structure level indexing within each
>> individual json object
>> for fast retrieval.  compression is not used.
>>
>> Perhaps without implementing  complicated column level GIN indexing,
>> implementing
>> a new variant JSON type that only handle  individual json object indexing
>> would be
>> feasible?  Combined with current JSONB implementation,   both common use
>> cases
>> (one is global doc indexing, the other is fast retrieval of individual
>> values)
>> would work out and make postgresql unbeatable.
>>
>
> It's called expression index ?
>
>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On Tue, Jan 19, 2016 at 8:51 PM, Bruce Momjian <br...@momjian.us> wrote:
>>
>>> On Mon, Jan 11, 2016 at 09:01:03PM -0500, Tom Smith wrote:
>>> > Hi,
>>> >
>>> > Congrats on the official release of 9.5
>>> >
>>> > And I'd like bring up the issue again about if 9.6 would address the
>>> jsonb
>>> > performance issue
>>> > with large number of top level keys.
>>> > It is true that it does not have to use JSON format. it is about
>>> serialization
>>> > and fast retrieval
>>> > of dynamic tree structure objects. (at top level, it might be called
>>> dynamic
>>> > columns)
>>> > So if postgresql can have its own way, that would work out too as long
>>> as it
>>> > can have intuitive query
>>> > (like what are implemented for json and jsonb) and fast retrieval of a
>>> tree
>>> > like object,
>>> > it can be called no-sql data type. After all, most motivations of
>>> using no-sql
>>> > dbs like MongoDB
>>> > is about working with dynamic tree object.
>>> >
>>> > If postgresql can have high performance on this, then many no-sql dbs
>>> would
>>> > become history.
>>>
>>> I can give you some backstory on this.  TOAST was designed in 2001 as a
>>> way to store, in a data-type-agnostic way, long strings compressed and
>>> any other long data type, e.g. long arrays.
>>>
>>> In all previous cases, _part_ of the value wasn't useful.  JSONB is a
>>> unique case because it is one of the few types that can be processed
>>> without reading the entire value, e.g. it has an index.
>>>
>>> We are going to be hesitant to do something data-type-specific for
>>> JSONB.  It would be good if we could develop a data-type-agnostic
>>> approach to has TOAST can be improved.  I know of no such work for 9.6,
>>> and it is unlikely it will be done in time for 9.6.
>>>
>>> --
>>>   Bruce Momjian  <br...@momjian.us>  http://momjian.us
>>>   EnterpriseDB http://enterprisedb.com
>>>
>>> + As you are, so once was I. As I am, so you will be. +
>>> + Roman grave inscription +
>>>
>>
>>
>


Re: [GENERAL] JSONB performance enhancement for 9.6

2016-04-30 Thread Tom Smith
Hello:

I'd like to bring up this JSONB performance issue again.
Below is a link describing MySQL's way of storing/retrieving JSON key/values:

https://dev.mysql.com/doc/refman/5.7/en/json.html

Instead of providing column indexing (like GIN for JSONB in Postgresql),
it provides only internal data-structure-level indexing within each
individual json object for fast retrieval; compression is not used.

Perhaps, without implementing complicated column-level GIN indexing,
implementing a new variant JSON type that only handles individual json
object indexing would be feasible? Combined with the current JSONB
implementation, both common use cases (one is global doc indexing, the
other is fast retrieval of individual values) would work out and make
postgresql unbeatable.









On Tue, Jan 19, 2016 at 8:51 PM, Bruce Momjian <br...@momjian.us> wrote:

> On Mon, Jan 11, 2016 at 09:01:03PM -0500, Tom Smith wrote:
> > Hi,
> >
> > Congrats on the official release of 9.5
> >
> > And I'd like bring up the issue again about if 9.6 would address the
> jsonb
> > performance issue
> > with large number of top level keys.
> > It is true that it does not have to use JSON format. it is about
> serialization
> > and fast retrieval
> > of dynamic tree structure objects. (at top level, it might be called
> dynamic
> > columns)
> > So if postgresql can have its own way, that would work out too as long
> as it
> > can have intuitive query
> > (like what are implemented for json and jsonb) and fast retrieval of a
> tree
> > like object,
> > it can be called no-sql data type. After all, most motivations of using
> no-sql
> > dbs like MongoDB
> > is about working with dynamic tree object.
> >
> > If postgresql can have high performance on this, then many no-sql dbs
> would
> > become history.
>
> I can give you some backstory on this.  TOAST was designed in 2001 as a
> way to store, in a data-type-agnostic way, long strings compressed and
> any other long data type, e.g. long arrays.
>
> In all previous cases, _part_ of the value wasn't useful.  JSONB is a
> unique case because it is one of the few types that can be processed
> without reading the entire value, e.g. it has an index.
>
> We are going to be hesitant to do something data-type-specific for
> JSONB.  It would be good if we could develop a data-type-agnostic
> approach to has TOAST can be improved.  I know of no such work for 9.6,
> and it is unlikely it will be done in time for 9.6.
>
> --
>   Bruce Momjian  <br...@momjian.us>  http://momjian.us
>   EnterpriseDB http://enterprisedb.com
>
> + As you are, so once was I. As I am, so you will be. +
> + Roman grave inscription +
>


Re: [GENERAL] retrieve grouping sets/rollup rows

2016-03-13 Thread Tom Smith
exactly what I am looking for. very nice.  Thx

On Sun, Mar 13, 2016 at 10:44 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> >> On 03/13/2016 10:07 PM, Tom Smith wrote:
> >>> It would help if the resultset has some param to mark which is which
> >>> with the grouping sets index.
>
> I think you're looking for the GROUPING() function.  See
> http://www.postgresql.org/docs/9.5/static/functions-aggregate.html
>
> regards, tom lane
>


Re: [GENERAL] retrieve grouping sets/rollup rows

2016-03-13 Thread Tom Smith
It would help if the result set had some marker to tell which row belongs
to which grouping set, by index.
For example, the results for (a,b,c,d) would be marked with index = 0, and
those for (b,c,d) with index = 1.
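
(GROUPING(), which Tom Lane points to elsewhere in this thread, provides
exactly this marker; a minimal sketch with hypothetical columns a, b and a
value column v:)

SELECT a, b, sum(v) AS total,
       GROUPING(a, b) AS grouping_mask  -- 0 for (a,b) rows, 1 for (a), 3 for the grand total
FROM t
GROUP BY ROLLUP (a, b);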

On Sun, Mar 13, 2016 at 9:52 PM, James Keener <j...@jimkeener.com> wrote:

> Do you want to know if a row is from the (a,b) or (c,d) group?  All rows
> will contain (a,b,c,d) but (a,b) will be NULL for the (c,d) grouping
> sets, and vice-versa.
>
> Jim
>
> On 03/13/2016 09:45 PM, Tom Smith wrote:
> > Hello:
> >
> > With JDBC, how can I tell which row is for which grouping sets or rollup
> > using result sets
> >
> > Thanks
> >
> >
>


[GENERAL] retrieve grouping sets/rollup rows

2016-03-13 Thread Tom Smith
Hello:

With JDBC, how can I tell which row is for which grouping sets or rollup
using result sets

Thanks


Re: [GENERAL] BRIN Usage

2016-02-18 Thread Tom Smith
It is for reducing index size as the table becomes huge.
Sorry for the confusion; by timestamp I meant a time-series number, not the
SQL timestamp type.
I need uniqueness on the column to ensure there are no duplicates, but the
btree index is getting huge, so BRIN seems to solve that problem but cannot
ensure uniqueness.
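
(For illustration, a common compromise, sketched with a hypothetical table:
keep a unique btree only for the constraint and add a BRIN index for range
scans; BRIN itself cannot enforce uniqueness, as noted below:)

CREATE TABLE readings (
    ts  bigint PRIMARY KEY,   -- btree, enforces uniqueness
    val double precision
);

-- BRIN stays tiny even on a huge table and helps range queries on ts
CREATE INDEX readings_ts_brin ON readings USING brin (ts);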

On Thu, Feb 18, 2016 at 2:14 AM, David Rowley <david.row...@2ndquadrant.com>
wrote:

>
> On 18/02/2016 9:34 am, "Tom Smith" <tomsmith198...@gmail.com> wrote:
> >
> > Hi:
> >
> > I feel it is a stupid question.
> >
> > Can  BRIN index enforce uniqueness?
> > My issue is
> > the column I'd like to apply BRIN index  also needs to be unique
> > (think of timestamp as primary key).
>
> Only btree supports unique.
> Is there a special reason not to use btree? I'm also finding it hard to
> imagine a case where a timestamp primary key is a good idea.
>


[GENERAL] BRIN Usage

2016-02-17 Thread Tom Smith
Hi:

I feel it is a stupid question.

Can a BRIN index enforce uniqueness?
My issue is that the column I'd like to apply a BRIN index to also needs to
be unique (think of a timestamp as primary key).


Thanks


Re: [GENERAL] JSONB performance enhancement for 9.6

2016-02-08 Thread Tom Smith
Using the JSON/JSONB type in postgresql is usually driven by the use case
where the keys (top level included) cannot be predefined; this is the major
difference between NoSQL/document stores and an RDBMS.

Why would TOAST have to be used? Could some specially structured "raw"
files be used outside the current database files, with the jsonb column
value being a pointer into such a file?


On Wed, Jan 20, 2016 at 7:32 AM, Bill Moran 
wrote:

> On Tue, 19 Jan 2016 23:53:19 -0300
> Alvaro Herrera  wrote:
>
> > Bill Moran wrote:
> >
> > > As far as a current solution: my solution would be to decompose the
> > > JSON into an optimized table. I.e.:
> > >
> > > CREATE TABLE store1 (
> > >  id SERIAL PRIMARY KEY,
> > >  data JSONB
> > > );
> > >
> > > CREATE TABLE store2 (
> > >  id INT NOT NULL REFERENCES store1(id),
> > >  top_level_key VARCHAR(1024),
> > >  data JSONB,
> > >  PRIMARY KEY(top_level_key, id)
> > > );
> >
> > Isn't this what ToroDB already does?
> > https://www.8kdata.com/torodb/
>
> Looks like. I wasn't aware of ToroDB, thanks for the link.
>
> --
> Bill Moran
>


Re: [GENERAL] TABLESAMPLE usage

2016-01-25 Thread Tom Smith
Yeah. I am looking for the fastest possible method, where Postgresql would
use its internal knowledge of the data structures to walk through the
timestamp index and return every "nth" row.

On Mon, Jan 25, 2016 at 5:56 AM, Simon Riggs  wrote:

> On 25 January 2016 at 09:44, Matija Lesar  wrote:
>
>
>> you can accomplish this with row_number():
>>
>> WITH data_cte as (
>> SELECT
>> id,
>> clock_timestamp() as ctimestamp
>> FROM generate_series(1,1000) as id
>> )
>> SELECT
>> *
>> FROM
>> (SELECT
>> id,
>> ctimestamp,
>> row_number() OVER (ORDER BY ctimestamp) as rownum
>> FROM data_cte
>> ) as data_withrownumbers
>> WHERE
>> rownum%100=1;
>>
>
> You can, but its not very fast.
>
> --
> Simon Riggs   http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: [GENERAL] TABLESAMPLE usage

2016-01-25 Thread Tom Smith
Thanks, that solution would work for fixed-interval timestamps.
But the data I am dealing with has irregular timestamps, so they cannot be
generated with exact steps.

I would consider this a special case/method of random sampling: evenly
distributed sampling according to the defined timestamp index.
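
(If "evenly distributed" can mean one row per fixed time bucket rather than
every nth physical row, a DISTINCT ON sketch with hypothetical table and
column names handles irregular timestamps without numbering every row:)

SELECT DISTINCT ON (date_trunc('hour', ts)) ts, val
FROM samples
ORDER BY date_trunc('hour', ts), ts;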

On Mon, Jan 25, 2016 at 3:48 AM, Vik Fearing <v...@2ndquadrant.fr> wrote:

> On 01/25/2016 05:09 AM, Tom Smith wrote:
> > Hello:
> >
> > I have a big table with that is always appended with new data with a
> unique
> > sequence id  (always incremented, or timestamp as unique index) each row.
> > I'd like to sample, say 100 rows out of say 1000 rows evently across all
> > the rows,
> > so that it would return  rows  of1, 101, 201, 301you get idea.
> > can TABLESAMPLEget one row for every 100 rows, based on the order
> > of the rows added to table using the timestamp as already indexed/sorted
> > sequence
>
> No, TABLESAMPLE is intended to take a random sampling of the data using
> various methods.
>
> You're looking for something more like this:
>
> select t.*
> from generate_series(1, (select max(id) from t), 100) g
> join t on t.id = g;
> --
> Vik Fearing  +33 6 46 75 15 36
> http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
>


[GENERAL] TABLESAMPLE usage

2016-01-24 Thread Tom Smith
Hello:

I have a big table that is always appended with new data, with a unique
sequence id (always incremented, or a timestamp as a unique index) on each row.
I'd like to sample, say, 100 rows out of say 1000 rows evenly across all
the rows, so that it would return rows 1, 101, 201, 301... you get the idea.
Can TABLESAMPLE get one row for every 100 rows, based on the order in which
the rows were added to the table, using the timestamp as an already
indexed/sorted sequence?

Thanks


Re: [GENERAL] JSONB performance enhancement for 9.6

2016-01-11 Thread Tom Smith
Hi,

Congrats on the official release of 9.5

And I'd like to bring up the issue again of whether 9.6 will address the
jsonb performance issue with a large number of top-level keys.
It is true that it does not have to use the JSON format; it is about
serialization and fast retrieval of dynamic tree-structured objects (at the
top level, they might be called dynamic columns).
So if postgresql can do it its own way, that would work out too, as long as
it has intuitive querying (like what is implemented for json and jsonb) and
fast retrieval of a tree-like object; it could be called a no-sql data type.
After all, the main motivation for using no-sql dbs like MongoDB is working
with dynamic tree objects.

If postgresql can achieve high performance on this, then many no-sql dbs
would become history.

Thanks








On Thu, Dec 3, 2015 at 5:31 PM, Jim Nasby  wrote:

> On 11/29/15 9:30 AM, Arthur Silva wrote:
>
>> Is this correct? I'm fairly sure jsonb supports lazily parsing objects
>> and each object level is actually searched using binary search.
>>
>
> The problem is there's no support for loading just part of a TOASTed
> field. Even if that existed, we'd still need a way to know what byte
> position in the TOASTed field a key lived at.
>
> It's possible to add all that, but I think it'd be a serious amount of
> work.
>
> Since someone else was just wondering about storing more specific types in
> JSON, it might be more useful/interesting to devise a Postgres-specific way
> to store variable schema documents. That would give us a lot more
> flexibility over implementation than the JSON type ever will.
>
> Or think about it this way: there's really no great reason why everyone
> chose JSON. There's tons of other serialization storage formats for other
> languages out there, so why not one specialized to Postgres? (And of course
> we'd want to be able to cast from that to JSON and back...)
> --
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Experts in Analytics, Data Architecture and PostgreSQL
> Data in Trouble? Get it in Treble! http://BlueTreble.com
>


Re: [GENERAL] JSONB performance enhancement for 9.6

2015-11-29 Thread Tom Smith
Hi:

The goal is fast retrieval of a field value within a row once the row is
already picked; one scenario is downloading a particular field value (if it
exists) from all rows in the table.
It is actually a very common use case: exporting data for several
user-selected fields.
The performance is extremely slow.

Thanks
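
(For concreteness, one possible shape of the export being described,
assuming a hypothetical table docs with an id column, a jsonb column data,
and user-selected keys 'f1' and 'f2':)

COPY (
    SELECT id, data ->> 'f1' AS f1, data ->> 'f2' AS f2
    FROM docs
) TO STDOUT WITH (FORMAT csv);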




On Sun, Nov 29, 2015 at 12:54 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> Bill Moran <wmo...@potentialtech.com> writes:
> > Tom Smith <tomsmith198...@gmail.com> wrote:
> >> Is there a plan for 9.6 to resolve the issue of very slow
> >> query/retrieval of jsonb fields when there are large number (maybe
> >> several thousands) of top level keys.  Currently, if I save a large
> >> json document with top level keys of thousands and query/retrieve field
> >> values, the whole document has to be first decompressed and load to
> >> memory before searching for the specific field key/value.
>
> > I could be off-base here, but have you tried:
> > ALTER TABLE $table ALTER COLUMN $json_column SET STORAGE EXTERNAL;
>
> There is just about zero chance we'll ever worry about this for compressed
> columns.  However, even the uncompressed case does currently involve
> loading the whole column value, as Tom says.  We did consider the
> possibility of such an optimization when designing the JSONB storage
> format, but I don't know of anyone actively working on it.
>
> In any case, it's unlikely that it'd ever be super fast, since it's
> certainly going to involve at least a couple of TOAST fetches.
> Personally I'd be looking for a different representation.  If there
> are specific fields that are known to be needed a lot, maybe functional
> indexes would help?
>
> regards, tom lane
>


Re: [GENERAL] JSONB performance enhancement for 9.6

2015-11-29 Thread Tom Smith
Hi, Thanks for everyone's response.

The issue is not just compression, but the lack of "indexing" or
"segmentation" when a single doc has, say, 2000 top-level keys (with
multiple levels of subkeys). Right now, if I query for one key, the whole
doc has to be uncompressed and loaded first, and then searched for the
single key.

Compared to the traditional way of storing each top-level key in a separate
column, this is huge overhead when a table scan is required. Some kind of
"keyed/slotted" storage for the doc could help (for illustration: all keys
starting with 'A' would have their own storage unit, and so on, so when I
search for key "A1" only that unit would be unpacked and traversed to get
the "A1" value). It is as if postgresql predefined 26 columns/slots for the
whole doc: an internal index within each doc for fast retrieval of
individual field values.

Someone mentioned a roadmap plan for this route, but I'd like to know
whether it is in the 9.6 plan.

The url below mentions a similar issue; I am not sure if it has been
completely resolved:

http://stackoverflow.com/questions/26259374/jsonb-performance-degrades-as-number-of-keys-increase

This url describes the potential issue as well:

https://www.reddit.com/r/PostgreSQL/comments/36rdlr/improving_select_performance_with_jsonb_vs_hstore/

Thanks



On Sun, Nov 29, 2015 at 7:35 AM, Thomas Kellerer <spam_ea...@gmx.net> wrote:

> Tom Smith schrieb am 29.11.2015 um 03:27:
>
>> Hello:
>>
>> Is there a plan for 9.6 to resolve the issue of very slow query/retrieval
>> of jsonb fields
>> when there are large number (maybe several thousands) of top level keys.
>> Currently, if I save a large json document with top level keys of
>> thousands and query/retrieve
>> field values,  the whole document has to be first decompressed and load
>> to memory
>> before searching for the specific field key/value.
>>
>> Thanks in Advance
>>
>
> If you are concerned about the compression overhead, then why don't you
> use (or try) JSON instead?
>
>
>
>
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Re: [GENERAL] JSONB performance enhancement for 9.6

2015-11-29 Thread Tom Smith
Unfortunately, the keys cannot be predefined or fixed; it is a document,
which is the reason jsonb is used. It works well for small docs with a
small number of keys, but is really slow with a large number of keys. If
this issue is resolved, I think Postgresql would be an absolutely superior
choice over MongoDB for document data.

On Sun, Nov 29, 2015 at 12:37 AM, John R Pierce <pie...@hogranch.com> wrote:

> On 11/28/2015 6:27 PM, Tom Smith wrote:
>
>> Is there a plan for 9.6 to resolve the issue of very slow query/retrieval
>> of jsonb fields
>> when there are large number (maybe several thousands) of top level keys.
>> Currently, if I save a large json document with top level keys of
>> thousands and query/retrieve
>> field values,  the whole document has to be first decompressed and load
>> to memory
>> before searching for the specific field key/value.
>>
>
> If it was my data, I'd be decomposing that large JSON thing into multiple
> SQL records, and storing as much stuff as possible in named SQL fields,
> using JSON in the database only for things that are too ambiguous for SQL.
>
>
>
> --
> john r pierce, recycling bits in santa cruz
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


[GENERAL] JSONB performance enhancement for 9.6

2015-11-28 Thread Tom Smith
Hello:

Is there a plan for 9.6 to resolve the issue of very slow query/retrieval
of jsonb fields when there is a large number (maybe several thousand) of
top-level keys?
Currently, if I save a large json document with thousands of top-level keys
and query/retrieve field values, the whole document has to be decompressed
and loaded into memory before searching for the specific field key/value.

Thanks in Advance


Re: [GENERAL] LATERAL query extreme slow due to partition

2015-09-08 Thread Tom Smith
Almost every lateral query would be "dynamic sql", since it varies as the
left-hand value changes.
Perhaps the query planner could mark it as "deferred" and, during execution,
replace the original plan with a newly generated one.
Otherwise we have to say the lateral feature is not suitable for
partitioned tables.
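
(A sketch of the loop rewrite Merlin suggests below, assuming a hypothetical
partitioned parent table p with an integer time column ts; building each
probe with literal bounds lets the planner exclude partitions on every
iteration:)

CREATE OR REPLACE FUNCTION scan_windows()
RETURNS SETOF p LANGUAGE plpgsql AS $$
DECLARE
    h int;
BEGIN
    FOR h IN SELECT generate_series(1, 10, 5) LOOP
        -- literal bounds, so constraint exclusion can prune partitions
        RETURN QUERY EXECUTE format(
            'SELECT * FROM p WHERE ts BETWEEN %s AND %s', h, h + 3);
    END LOOP;
END;
$$;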


On Tue, Sep 8, 2015 at 3:51 PM, Merlin Moncure <mmonc...@gmail.com> wrote:

> On Mon, Sep 7, 2015 at 12:07 AM, Tom Smith <tomsmith198...@gmail.com>
> wrote:
> > Hi:
> >
> > I am using the wonderful  lateral query  feature like the following
> >
> > select * from generate_series (1,10,5) T(t),
> >lateral (select * from P where t between  t and t +  3)
> >
> > P is a parent table of a hundred partitions
> > the idea is to for each t value from 1 to 10 with step of 5,
> > get rows from P (in one or two of its partitions) that  between
> > the current value of t and t+3,
> > so each lateral select should only index scan one or two partitons
> > but the query plan shows that each will scan all hundred paritions,
> > I think due to its unable to determine the range since
> > the query is select * from P where t between  t and t +  3
> > as "t" is unknown at the time of parsing.
> >
> > How to force query planner "dynamically" generate plan to
> > for each lateral select query as "t" changes.
>
> I think you're asking to much of the planner here.  The query is
> planned first and executed second.  Because of that partitioning
> generally depends on specific values, not dynamic ones, for exclusion
> to take effect.  I would consider rewriting to loop and see if that
> helps.
>


Re: [GENERAL] jsonb value retrieval performance

2015-09-08 Thread Tom Smith
It can be any jsonb, so I am asking a general question about the
implementation of each jsonb value's storage (not about GIN or table-wide
indexing, but only within a single jsonb item in a single row).

A sample would be like (no quotes)

{
a1: {b1:v1, b2:v2, b100:v100},
a3000: {c1:x1, c2:x2, c200:v200}
}

You get the idea. I wonder how postgresql does it when I try

jsonb -> 'a1' -> 'b1'

Does it read the whole jsonb tree structure into memory to get to v1, or
does it have some optimization so it only reads v1 instead of the whole
structure?









On Tue, Sep 8, 2015 at 11:58 AM, Teodor Sigaev  wrote:

> and I am trying to get value via  jsonb->parentKey->childKey
>> it seems it is very slow.
>> Would it be actually faster to use top level key only and parse it at
>> client side?
>>
>
> Suppose, most time is spent for decompressing huge value, not for actual
> search inside jsonb. If so, we need to implement some search method which
> decompress some chunks of jsonb.
>
>
> Could you send to me an example of that jsonb?
>
>
>
> --
> Teodor Sigaev   E-mail: teo...@sigaev.ru
>                 WWW: http://www.sigaev.ru/
>


[GENERAL] jsonb value retrieval performance

2015-09-07 Thread Tom Smith
Hello:

Does JSONB storage have some internal indexing (like a hashmap)
to quickly look up a value given a key?
I have a jsonb doc with two levels of keys
{parentKey: {childKey: value}}
There are maybe 2000 parent keys per doc and 100 child keys per parent key,

and I am trying to get a value via jsonb -> parentKey -> childKey;
it seems it is very slow.
Would it actually be faster to use only the top-level key and parse the
rest at the client side?

Thanks
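
(For concreteness, the two-level lookup being described, assuming a
hypothetical table docs with a jsonb column data:)

SELECT data -> 'parentKey' ->> 'childKey' FROM docs;
-- equivalent path form:
SELECT data #>> '{parentKey,childKey}' FROM docs;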


[GENERAL] LATERAL query extreme slow due to partition

2015-09-06 Thread Tom Smith
Hi:

I am using the wonderful  lateral query  feature like the following

select * from generate_series (1,10,5) T(t),
   lateral (select * from P where t between  t and t +  3)

P is a parent table of a hundred partitions
the idea is to for each t value from 1 to 10 with step of 5,
get rows from P (in one or two of its partitions) that  between
the current value of t and t+3,
so each lateral select should only index scan one or two partitons
but the query plan shows that each will scan all hundred paritions,
I think due to its unable to determine the range since
the query is select * from P where t between  t and t +  3
as "t" is unknown at the time of parsing.

How to force query planner "dynamically" generate plan to
for each lateral select query as "t" changes.

Thanks


Re: [GENERAL] modify postgresql.conf

2015-09-06 Thread Tom Smith
Got it. Thanks very much
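
(For reference, a sketch of the ALTER SYSTEM route Michael mentions below;
it writes postgresql.auto.conf, which is read after postgresql.conf:)

ALTER SYSTEM SET max_connections = 500;
-- the value is written to postgresql.auto.conf and overrides postgresql.conf;
-- note that max_connections itself only takes effect after a server restart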

On Sun, Sep 6, 2015 at 11:25 PM, Michael Paquier <michael.paqu...@gmail.com>
wrote:

> On Mon, Sep 7, 2015 at 12:12 PM, Tom Smith <tomsmith198...@gmail.com>
> wrote:
> > Hi,
> >
> > Can setting be appended to the end of the postgresql.conf
> > so that it will override whatever is already in the previous version.
> > For example
> >
> > the existing postgresql.conf already has setting
> >
> > max_connections = 100
> >
> > in the middle of file.
> >
> > Now I append a line as the end of the file, regardless what is set
> already
> > max_connections = 500
> >
> > will this setting at the end of the file be guranteed effective value?
> >
> > I am trying to do this way as I want keep all my changes at the end of
> the
> > file
> > in one place for easy maintenance.
>
> The last value read for a parameter is the effective one. This counts
> as well for included configuration files. The configuration file
> managed by ALTER SYSTEM has the priority over the rest.
> --
> Michael
>


[GENERAL] modify postgresql.conf

2015-09-06 Thread Tom Smith
Hi,

Can a setting be appended to the end of postgresql.conf
so that it overrides whatever is already set earlier in the file?
For example,

the existing postgresql.conf already has the setting

max_connections = 100

in the middle of the file.

Now I append a line at the end of the file, regardless of what is set already:

max_connections = 500

Is the setting at the end of the file guaranteed to be the effective value?

I am trying to do it this way because I want to keep all my changes at the
end of the file, in one place, for easy maintenance.

Thanks


Re: [GENERAL] get first and last row in one sql as two columns

2015-09-03 Thread Tom Smith
Hi:

The window function approach works for me (after adding LIMIT 1 at the end
to output only the one row needed instead of many duplicate rows).

thanks very much.
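
(A sketch of that combination, with hypothetical table and column names;
note that last_value() needs an explicit frame clause to see the whole
ordered set, since with the default frame it only sees up to the current
row:)

SELECT first_value(col1) OVER w AS first_col1,
       last_value(col1)  OVER (w ROWS BETWEEN UNBOUNDED PRECEDING
                                          AND UNBOUNDED FOLLOWING) AS last_col1
FROM t
WINDOW w AS (ORDER BY col1)
LIMIT 1;   -- every row carries the same two values, so one row is enough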

On Thu, Sep 3, 2015 at 6:51 AM, Dickson S. Guedes <lis...@guedesoft.net>
wrote:

> On Wed, Sep 02, 2015 at 07:14:40PM -0400, Tom Smith wrote:
> > Hi:
> >
> > I need to get the first and last tow in one sql like below
> >
> > select first(col1), last(col1) from table order by col1
>
> Have you tried the window functions [1] last_value and first_value?
>
> You could use something like:
>
> SELECT first_value(col1) over (order by col1),
>last_value(col1)  over (order by col1)
> FROM table;
>
> It should be as simple as that, or I miss your point.
>
>
> [1] http://www.postgresql.org/docs/current/static/functions-window.html
>
>
> []s
> --
> Dickson S. Guedes
> @guediz - http://github.com/guedes
>


Re: [GENERAL] get first and last row in one sql as two columns

2015-09-02 Thread Tom Smith
Thanks for the suggestion. I could use two separate SQL statements, but I
am looking to get it in a single one.
I just compiled and installed (under the postgres id) the first_last_agg C
extension:
http://pgxn.org/dist/first_last_agg/
It WORKED under the postgres user with the sample

select first(column1), last(column1) from (values (null),(1),(3),(null)) as x

but somehow when I run it under other users, it says

ERROR:  function first(integer) does not exist

Below is the control file; I cannot find out why I cannot run it under
other user ids.
Any help would be appreciated.

# first_last_agg extension
comment = 'first() and last() aggregate functions'
default_version = '0.1.4'
module_pathname = '$libdir/first_last_agg'
relocatable = false
schema = pg_catalog
superuser = false







On Wed, Sep 2, 2015 at 7:47 PM, Melvin Davidson <melvin6...@gmail.com>
wrote:

> Try this:
>
> SELECT
>   (SELECT <column>
>      FROM <table>
>     ORDER BY <column> OFFSET 0 LIMIT 1),
>   (SELECT <column>
>      FROM <table>
>     ORDER BY <column> OFFSET (SELECT COUNT(*) - 1 FROM <table>) LIMIT 1)
>   FROM <table> LIMIT 1;
>
>
> On Wed, Sep 2, 2015 at 7:27 PM, Rob Sargent <robjsarg...@gmail.com> wrote:
>
>> On 09/02/2015 05:14 PM, Tom Smith wrote:
>>
>> Hi:
>>
>> I need to get the first and last tow in one sql like below
>>
>> select first(col1), last(col1) from table order by col1
>>
>> I saw some posting in wiki with a custom function (or C extention)
>> to do this.   Is it widely used and reliable?
>> https://wiki.postgresql.org/wiki/First/last_(aggregate)
>>
>> I am wondering why these two functions are not part of postgresql built-in
>> functions as it has many use cases
>>
>> Thanks
>>
>> If you're ordering by col1, does
>>
>> select min(col1), max(col1) from table order by col1
>>
>> not do the trick;
>>
>
>
>
> --
> *Melvin Davidson*
> I reserve the right to fantasize.  Whether or not you
> wish to share my fantasy is entirely up to you.
>


[GENERAL] get first and last row in one sql as two columns

2015-09-02 Thread Tom Smith
Hi:

I need to get the first and last row in one SQL statement, like below:

select first(col1), last(col1) from table order by col1

I saw a posting in the wiki with a custom function (or C extension)
to do this.  Is it widely used and reliable?
https://wiki.postgresql.org/wiki/First/last_(aggregate)

I am wondering why these two functions are not part of postgresql's
built-in functions, as they have many use cases.

Thanks


[GENERAL] fastes way to retrieve segmented without using loop

2015-08-23 Thread Tom Smith
Hello:

I have a time series table.
Using the sql below in a loop (pseudocode), I can get one row for each hour:

for (H = 1; H < 9; H++) {
   select * from table where t >= H and t < H+1 limit 1
}

(t, the time column, is indexed.)

Is there a better way to issue a SINGLE SQL statement
with an array of time start/end pairs,
so it is executed once to send back the result? That would be much faster
than issuing sql again and again (even with a prepared statement and a
sql function).

Thanks in Advance
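
(One possible single-statement shape, as a sketch with a hypothetical table
td whose indexed time column is t: a generated series of bucket starts plus
LATERAL, so each bucket can be satisfied by a single index probe:)

SELECT b.h, d.*
FROM generate_series(1, 9) AS b(h),
     LATERAL (SELECT *
              FROM td
              WHERE td.t >= b.h AND td.t < b.h + 1
              ORDER BY td.t
              LIMIT 1) AS d;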


Re: [GENERAL] fastes way to retrieve segmented without using loop

2015-08-23 Thread Tom Smith
Sorry for the confusion. I will later find a better way to present the
issue.
But thanks for everyone's help.

On Sun, Aug 23, 2015 at 6:20 PM, Melvin Davidson melvin6...@gmail.com
wrote:

 Yes, I agree. I was a bit confused by the term time column. Not mention,
 the O/S and PG version were not given.
 If column t is truly type time, then only 24 rows can be returned
 regardless of limit, as in this Universe, there are only 24 hrs in time.
 However, if t is a timestamp, that is a whole other animal and the
 DISTINCT would have to be adjusted to include date  hour.
 Perhaps if we were given a more accurate table structure, a more exact
 solution could be provided.

 On Sun, Aug 23, 2015 at 6:09 PM, John McKown john.archie.mck...@gmail.com
  wrote:

 Melvin's use of DISTINCT ON (...) is superior to my use of DISTINCT(...)
 because it doesn't return the value to your program. I keep forgetting this
 way. I learned it the other way. Old dog + new trick == problem.

 On Sun, Aug 23, 2015 at 5:04 PM, John McKown 
 john.archie.mck...@gmail.com wrote:

 On Sun, Aug 23, 2015 at 4:05 PM, Tom Smith tomsmith198...@gmail.com
 wrote:

 Hello:

 I have a time series table,
 using below sql, loop (psque code), I can get one row for each hour


 s/psque/pseudo/g;




 for( H=1: H 9; H++){
select  * from table where  t = H and t  H+1   limit 1
 }

 t (time column) is indexed).

 Is there a better way to use a issue a SINGLE SQL
 with  an array of time start/end pair
 so it will be executed once to send back result, which would be much
 faster
 than issuing sql again and again (even with prepared statement and
 using sql function).

 Thanks in Advance


 Well, I have a bit of a problem if t is a time column. Do you mean a
 time interval? Or do you really mean it is an integer of some sort. I ask
 because H sure looks like a plain old integer to me.

 In any case, if t is an int as opposed to a time interval, then
 you could start with something like:

 SELECT DISTINCT(t),  columns  FROM table WHERE t BETWEEN 1 AND
 9;

 But if t really is a time interval in the PostgreSQL sense, and H is
 like the hour portion (H -- Hour, makes sense to this weirdo). And you
 want to select one row of data where the t interval is 1 hour, another
 where the t interval is 2 hours, another where the t interval is 3
 hours, and so on up to an interval of at most 99_999 hours. Then you might
 need something like:

 SELECT DISTINCT(EXTRACT(HOUR FROM t)) AS interval_truncated_to_hour, *
 FROM table WHERE t BETWEEN 1 AND 9;

 I don't know of a way to eliminate the first field from the result. But,
 in reality, I would _never_ issue a SELECT * in a normal program. Only
 ask for the columns you are actually going to need. Because, someday,
 someone, is going to change the schema on the table and your program is
 (im)pure porcine excrement at that point. With no errors returned to it.
 IMO, it is an unshakable rule to ONLY and ALWAYS specify the variable
 names. The only exception is if your program actually examines the schema
 of the table before doing a SELECT and dynamically constructs it.






 --

 Schrodinger's backup: The condition of any backup is unknown until a
 restore is attempted.

 Yoda of Borg, we are. Futile, resistance is, yes. Assimilated, you will
 be.

 He's about as useful as a wax frying pan.

 10 to the 12th power microphones = 1 Megaphone

 Maranatha! 
 John McKown




 --

 Schrodinger's backup: The condition of any backup is unknown until a
 restore is attempted.

 Yoda of Borg, we are. Futile, resistance is, yes. Assimilated, you will
 be.

 He's about as useful as a wax frying pan.

 10 to the 12th power microphones = 1 Megaphone

 Maranatha! 
 John McKown




 --
 *Melvin Davidson*
 I reserve the right to fantasize.  Whether or not you
 wish to share my fantasy is entirely up to you.



[GENERAL] retrieve subset of a jsonb object with a list of keys

2015-08-19 Thread Tom Smith
Hi:

I have a jsonb column with a json object like below:
{"a": 1, "b": 2, "c": 3}

I'd like to get a subset of the object for the key list ["a", "c"],
so it returns a json object of

{"a": 1, "c": 3}

something like

select '{"a": 1, "b": 2, "c": 3}'::jsonb - '["a", "c"]'

What would be the most efficient (and simplest, if possible) way to get the
subset for a key list?

Thanks


Re: [GENERAL] retrieve subset of a jsonb object with a list of keys

2015-08-19 Thread Tom Smith
Thanks. I hope a new function can be added (as a high-performance C
function) in a new release to allow something like

json_subset(jsonb_object, [key1, key2])
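
(Until something built-in exists, a sketch of wrapping the approach from
Chris's reply below in a SQL function; the function name and signature here
are hypothetical:)

CREATE OR REPLACE FUNCTION json_subset(obj jsonb, keys text[])
RETURNS json LANGUAGE sql IMMUTABLE AS $$
    SELECT coalesce(json_object_agg(key, value), '{}'::json)
    FROM jsonb_each(obj)
    WHERE key = ANY (keys);
$$;

-- usage: SELECT json_subset('{"a":1,"b":2,"c":3}'::jsonb, ARRAY['a','c']);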


On Wed, Aug 19, 2015 at 9:46 AM, Chris Mair ch...@1006.org wrote:

 On 19/08/15 13:37, Tom Smith wrote:
  Hi:
 
  I have a jsonb columne with json object like belo
  {a: 1, b:2, c:3}
 
  I'd like to get subset of the object with key list [a,c]
  so it retruns json object of
 
  {a: 1,  c:3}
 
  something like
 
  select '{a: 1, b:2, c:3}'::jsob -'[a,c]'
 
  what would be the most efficient (and simplest if possible) to get the
  subset with the key list?

 Hi,

 I came up with this:

 select json_object_agg(key, value)
    from jsonb_each_text('{"a": 1, "b": 2, "c": 3}'::jsonb)
   where key in ('a', 'c');

 IDK if there is a shorter way. You might want to wrap this in a function.

 Bye,
 Chris.