Re: [GENERAL] Improve PostGIS performance with 62 million rows?

2017-01-09 Thread Rémi Cura
Hey,
I like your curiosity !

At the billion range, you __have__ to use pgpointcloud,
a pyramid/raster solution (actually the most common way to perform this task),
or another database (hello MonetDB).
Cheers,
Rémi-C

2017-01-09 20:11 GMT+01:00 Jonathan Vanasco :

>
> On Jan 9, 2017, at 12:49 PM, Israel Brewster wrote:
>
> >  Planning time: 4.554 ms
> >  Execution time: 225998.839 ms
> > (20 rows)
> >
> > So a little less than four minutes. Not bad (given the size of the
> database), or so I thought.
> >
> > This morning (so a couple of days later) I ran the query again without
> the explain analyze to check the results, and noticed that it didn't take
> anywhere near four minutes to execute. So I ran the explain analyze again,
> and got this:
>
> ...
>
> >  Planning time: 0.941 ms
> >  Execution time: 9636.285 ms
> > (20 rows)
> >
> > So from four minutes on the first run to around 9 1/2 seconds on the
> second. Presumably this difference is due to caching? I would have expected
> any caches to have expired by the time I made the second run, but the data
> *is* static, so I guess not. Otherwise, I don't know how to explain the
> improvement on the second run - the query plans appear identical (at least
> to me). *IS* there something else (for example, auto vacuum running over
> the weekend) that could explain the performance difference?
>
>
> This may sound crazy, but I suggest running each of these scenarios 3+
> times:
>
> # cold explain
> stop postgres
> start postgres
> explain analyze SELECT
>
> # cold select
> stop postgres
> start postgres
> enable \timing for query timing
> SELECT
>
> # cold explain to select
> stop postgres
> start postgres
> explain analyze SELECT
> enable \timing for query timing
> SELECT
>
> # cold select to explain
> stop postgres
> start postgres
> enable \timing for query timing
> SELECT
> explain analyze SELECT
>
> # cold select to select
> stop postgres
> start postgres
> enable \timing for query timing
> SELECT
> SELECT
>
> I've found the timing for "Explain Analyze" to be incredibly different
> from an actual SELECT on complex/large dataset queries... and the
> differences don't seem to correlate to possible speedups from index/table
> caching.
>
>
>
>
>
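
For what it's worth, a minimal psql sketch of one of these cold/warm
comparisons (the server restart happens outside psql, and the count(*) query
is just a stand-in for the real four-minute query):

-- restart the server first (e.g. pg_ctl restart), then in psql:
\timing on

-- cold run, timed by EXPLAIN ANALYZE itself
EXPLAIN ANALYZE SELECT count(*) FROM data;

-- warm runs, timed by \timing; compare with the EXPLAIN ANALYZE figure
SELECT count(*) FROM data;
SELECT count(*) FROM data;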


Re: [GENERAL] Improve PostGIS performance with 62 million rows?

2017-01-05 Thread Rémi Cura
Hey,
1 sec seems really good in this case,
and I'm assuming you tuned postgres so the main index fits into RAM
(work_mem and all the other settings).

You could avoid one CTE by merging both CTEs:

WITH pts AS (
SELECT (pt).geom, (pt).path[1] as vert
FROM
ST_DumpPoints(
ST_Segmentize(
ST_GeographyFromText('SRID=4326;LINESTRING(-150.008056
61.179167,-156.77 71.285833)'),
600
)::geometry
) as pt
)
SELECT elevation
FROM data
INNER JOIN (SELECT
ST_MakeLine(ARRAY[a.geom, b.geom]) as short_line
FROM pts a
INNER JOIN pts b
ON a.vert=b.vert-1 AND b.vert>1) segments
ON  ST_DWithin(location, segments.short_line, 600)
ORDER BY elevation DESC limit 1;


Then you could remove the useless (and potentially explosive, if you have a
large number of dumped points) inner join on points:
"FROM pts a
INNER JOIN pts b "

You could simply use a window function to generate the segments, like in the
example here.
The idea is to dump the points, order them by path, and then link each point
with the previous one (the lag function).
Assuming you don't want to use the available function,
this would be something like:



WITH segments AS (
SELECT ST_MakeLine( lag((pt).geom , 1, NULL) OVER (ORDER BY (pt).path)
  ,(pt).geom) AS short_line
FROM ST_DumpPoints(
  ST_Segmentize(
ST_GeographyFromText('SRID=4326;LINESTRING(-150.008056
61.179167,-156.77 71.285833)'),
600
)::geometry
) as pt
)
SELECT elevation
FROM data ,segments
WHERE segments.short_line IS NOT NULL -- the first segment is NULL by design (lag function)
  AND  ST_DWithin(location, segments.short_line, 600)
ORDER BY elevation DESC
limit 1;


I don't know if you can improve this query much further after that,
but I'd guess it will reduce your execution time and scale more safely.


If you want to further improve your results,
you'll have to reduce the number of rows in your index,
that is, partition your table into several tables!

This is not easy to do with the current postgres partitioning methods as far
as I know
(partitioning is easy; automatic, efficient querying is hard).

Another way would be to reduce your requirements, and consider that in some
cases you may need less detail in the altimetry, which would allow you to use
a Level Of Detail approach.

Congrats on the well-explained query/problem anyway!
Cheers,
Rémi-C

2017-01-05 23:09 GMT+01:00 Paul Ramsey :

> Varying the segment length upwards might have a salutary effect for a
> while, as the efficiency improvement of fewer inner loops battles with the
> inefficiency of having more points selected by the index filter. Worth an
> experiment.
>
> P
>
> On Thu, Jan 5, 2017 at 1:00 PM, Israel Brewster 
> wrote:
>
>>
>> On Jan 5, 2017, at 10:38 AM, Paul Ramsey 
>> wrote:
>>
>> Yes, you did. You want a query that spits out a tupleset of geometries
>> (one each for each wee segment), and then you can join that set to your
>> main table using st_dwithin() as the join clause.
>> So start by ditching the main table and just work on a query that
>> generates a pile of wee segments.
>>
>>
>> Ahhh, I see you've done this sort of thing before (
>> http://blog.cleverelephant.ca/2015/02/breaking-linestring-into-segments.html) :-)
>>
>> So following that advice I came up with the following query:
>>
>> WITH dump AS (SELECT
>> ST_DumpPoints(
>> ST_Segmentize(
>> ST_GeographyFromText('SRID=4326;LINESTRING(-150.008056
>> 61.179167,-156.77 71.285833)'),
>> 600
>> )::geometry
>> ) as pt
>> ),
>> pts AS (
>> SELECT (pt).geom, (pt).path[1] as vert FROM dump
>> )
>> SELECT elevation
>> FROM data
>> INNER JOIN (SELECT
>> ST_MakeLine(ARRAY[a.geom, b.geom]) as short_line
>> FROM pts a
>> INNER JOIN pts b
>> ON a.vert=b.vert-1 AND b.vert>1) segments
>> ON  ST_DWithin(location, segments.short_line, 600)
>> ORDER BY elevation DESC limit 1;
>>
>> Which yields the following EXPLAIN ANALYZE (https://explain.depesz.com/s/RsTD):
>>
>>
>>QUERY PLAN
>>
>>
>> --------------------------------------------------------------------------
>>  Limit  (cost=11611706.90..11611706.91 rows=1 width=4) (actual
>> time=1171.814..1171.814 rows=1 loops=1)
>>CTE dump
>>  ->  Result  (cost=0.00..5.25 rows=1000 width=32) (actual
>> time=0.024..1.989 rows=1939 loops=1)
>>CTE pts
>>  ->  CTE Scan on dump  (cost=0.00..20.00 rows=1000 width=36) (actual
>> time=0.032..4.071 rows=1939 loops=1)
>>->  Sort  (cost=11611681.65..11611768.65 rows=34800 width=4) (actual

Re: [GENERAL] forcing a table (parent in inheritance) to remain empty

2016-10-11 Thread Rémi Cura
Perfect !

Cheers,
Rémi C

2016-10-11 19:12 GMT+02:00 Julien Rouhaud <julien.rouh...@dalibo.com>:

> On 11/10/2016 19:04, Rémi Cura wrote:
> > This solution is very nice.
> > Sadly the check is inherited by the children
> > (I only want the parent to be empty, not the children).
> >
> > It seems the elements that are not inherited are:
> >
> >   * Indexes
> >   * Unique constraints
> >   * Primary Keys
> >   * Foreign keys
> >   * Rules and Triggers
> >
>
> you can specify a "NO INHERIT" on the check constraint, that should
> solve your issue.
>
> --
> Julien Rouhaud
> http://dalibo.com - http://dalibo.org
>
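
For reference, a minimal sketch of that suggestion (table names are
placeholders):

CREATE TABLE parent (id int);
CREATE TABLE child () INHERITS (parent);

-- keeps the parent empty without propagating the check to the children
ALTER TABLE parent
  ADD CONSTRAINT parent_must_stay_empty CHECK (false) NO INHERIT;

INSERT INTO child  DEFAULT VALUES;  -- ok
INSERT INTO parent DEFAULT VALUES;  -- fails on parent_must_stay_empty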


Re: [GENERAL] forcing a table (parent in inheritance) to remain empty

2016-10-11 Thread Rémi Cura
This solution is very nice.
Sadly the check is inherited by the children
(I only want the parent to be empty, not the children).

It seems the elements that are not inherited are:

   - Indexes
   - Unique constraints
   - Primary Keys
   - Foreign keys
   - Rules and Triggers


thanks anyway for the fast answer,
Cheers,
Rémi C

2016-10-11 18:33 GMT+02:00 Manuel Gómez <tar...@gmail.com>:

> On Tue, Oct 11, 2016 at 12:27 PM, Rémi Cura <remi.c...@gmail.com> wrote:
> > Hey dear list,
> > I can't find a nice solution to enforce a necessary behaviour in my case
> :
> > I want a parent table to remain empty.
> >
> > Of course I could define a trigger and return NULL in any case, but I'd
> like
> > a more elegant approach using check or constraints.
>
> You could probably do it with a simple constraint:
>
> postgres=# create table dum(check (false));
> CREATE TABLE
> postgres=# insert into dum default values;
> ERROR:  new row for relation "dum" violates check constraint "dum_check"
> DETAIL:  Failing row contains ().
>


[GENERAL] forcing a table (parent in inheritance) to remain empty

2016-10-11 Thread Rémi Cura
Hey dear list,
I can't find a nice solution to enforce a necessary behaviour in my case:
I want a parent table to remain empty.

Of course I could define a trigger and return NULL in any case, but I'd
like a more elegant approach using a check or constraint.

Any thought appreciated,
Cheers,
Rémi C


Re: [GENERAL] Update statement results in Out of memory

2016-07-06 Thread Rémi Cura
You could check the max number of points in your geometries:

SELECT max(ST_NumPoints(geom))
FROM ...

Of course you could still have invalid / aberrant geometries,
which you could also check (ST_IsValid, ST_IsSimple).

You could address both hypotheses by performing your buffer in batches.
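
A sketch of what "in batches" could look like, assuming the table has an
integer key named id (the key column, the table name and the batch size are
assumptions):

-- update ~100k rows at a time instead of all 4.2 million in one statement,
-- committing between batches
UPDATE your_table
SET    buffer = ST_Buffer(simplified_geometry, 0.005)
WHERE  id BETWEEN 1 AND 100000;
-- then WHERE id BETWEEN 100001 AND 200000, and so on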

Cheers,
Rémi-C

2016-07-06 15:36 GMT+02:00 Paul Ramsey :

> Running a multi-million row update will take a long time.
> It's possible you've exposed a memory leak in ST_Buffer (the older
> your version of GEOS, the more likely that is) but it's also possible
> you're just running a really long update.
> I find for batch processing purposes that creating fresh tables is far
> preferable:
>
> CREATE TABLE newtable AS SELECT ST_Buffer(geom) ... FROM oldtable;
>
> If you still see memory issues with the above then you probably do
> have a leak, *or* you're just running buffer on a sufficiently large
> input geometry or with a large enough radius to blow up the memory
> naturally.
>
> P
>
>
> On Wed, Jul 6, 2016 at 2:13 AM, Ivan Bianchi  wrote:
> > Hello,
> >
> > I am trying to update a column using a PostGIS ST_Buffer function into a
> > table of 4.257.769 rows, but after 6 hours, an Out of memory error
> appears
> > and the kernel starts killing processes until a Kernel Panic shows up.
> >
> > I have simplified the buffer target geometry and also added a gist index
> to
> > that column.
> >
> > The statement is the following:
> >>
> >> psql -h host -U user -W -d database -c "UPDATE table SET buffer =
> >> ST_Buffer(simplified_geometry, 0.005);"
> >
> >
> > After reading and tunning the configuration, I still have the same
> result.
> >
> > Here's the initial memory stats:
> >
> >>   totalusedfree shared  buff/cache
>  available
> >> Mem:15G1.5G 12G503M1.4G
> >> 13G
> >> Swap:  7.8G  0B7.8G
> >
> >
> >
> > I'm running out of ideas, as I think the postgresql.conf memory
> parameters
> > are quite low for the machine specs. I understand I can split the process
> > and paginate the rows, but I can't see why I can't deal with this full
> > statement right now.
> >
> > Do you think this issue is related to the postgres memory parameter
> > configuration? Why is it not respecting the shared_buffers or
> > effective_cache_size parameters, and why does memory keep growing?
> >
> >
> > Here's some info:
> >
> > Machine specs
> >
> > Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz (8 cores)
> > 16 GB of memory
> > Fedora release 23 (Twenty Three)
> > Kernel - 4.5.7-202.fc23.x86_64
> >
> > postgresql.conf
> >
> > effective_cache_size = 5GB
> > shared_buffers = 3GB
> > work_mem = 10MB
> >
> > maintenance_work_mem = 800MB
> > wal_buffers = 16MB
> >
> > Kernel parameters
> >
> > vm.overcommit_memory=2
> >
> > kernel.shmmax = 8340893696
> > kernel.shmall = 2036351
> >
> > Versions:
> >
> > PostgreSQL 9.5.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 5.3.1
> > 20160406 (Red Hat 5.3.1-6), 64-bit
> > POSTGIS="2.2.2 r14797" GEOS="3.5.0-CAPI-1.9.0 r4084" PROJ="Rel. 4.9.1, 04
> > March 2015" GDAL="GDAL 2.0.2, released 2016/01/26" LIBXML="2.9.3"
> > LIBJSON="0.12" RASTER
> >
> >
> > Many thanks,
> >
> > --
> > Ivan
>
>
>


Re: [GENERAL] PLPythonu for production server

2016-03-05 Thread Rémi Cura
Thanks !
Cheers,
Rémi-C

2016-03-05 0:38 GMT+01:00 Adrian Klaver <adrian.kla...@aklaver.com>:

> On 03/04/2016 01:46 AM, Rémi Cura wrote:
>
>> Thanks for the answer guys.
>>
>> I should have mentioned that I had read the doc,
>> and was looking for non-explicit knowledge,
>> like:
>> - what is the reputation of plpython for a DBA?
>> - are there actual production systems that use it?
>> - what would be the recommended usage perimeter?
>>   (only administration-script-like functions, advanced processing, etc.)
>>
>
> An example:
>
> http://bonesmoses.org/2016/03/04/pg-phriday-being-a-tattletale/
>
>
>
>> Cheers,
>> Rémi-C
>>
>>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>


Re: [GENERAL] bloated postgres data folder, clean up

2016-03-04 Thread Rémi Cura
​Hey Julien,
thanks for the original query!

There was a slight mistake in the query:
it was comparing the file name with pg_class.relfilenode.
That is not safe in some cases (see the "caution" note in the doc here
<http://www.postgresql.org/docs/current/static/storage-file-layout.html>),
so it is better to use the pg_relation_filenode() function.
As a result this database could not be started anymore (no worries, I had a
copy).
However, using pg_relation_filenode() seems to be safe (it passes a vacuum
full analyze).
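
To illustrate the difference: mapped catalogs have relfilenode = 0 in
pg_class, while the function resolves the real on-disk filenode, e.g.

SELECT relname, relfilenode, pg_relation_filenode(oid::regclass) AS real_filenode
FROM pg_class
WHERE relname IN ('pg_class', 'pg_proc');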

I'll modify the query as soon as I have access to the gist.

I agree the warning about the tablespace is also important.
I'll put all of this on the wiki as soon as I have permission to create a
new page
(man, how long is this "cool-off"? It's been several days already!).

Cheers,
Rémi-C
​​



2016-03-03 20:10 GMT+01:00 Julien Rouhaud <julien.rouh...@dalibo.com>:

> On 03/03/2016 18:15, Rémi Cura wrote:
> > Hey,
>
> Hello Rémi,
>
> > first I forgot something in the querry to remove the annoying .XXX :
> > ---
> > SELECT distinct substring(file_name from '\d+' )
> > FROM find_useless_postgres_file('your_database_name') ;
> > ---
> >
>
> Thanks for working on this :)
>
> I added a comment on the gist URL you provided. It's a simplified
> version of the main query that should work fine and detect more orphan
> files. Double checking it would be a good idea though.
>
> Also, as you can't check other databases than the one you're connected
> to, I used current_database() instead of user defined database name.
>
> It's also important to warn that all of this only works for finding
> orphan files on the default tablespace (and to never blindly remove
> files in the PGDATA of course).
>
>


Re: [GENERAL] PLPythonu for production server

2016-03-04 Thread Rémi Cura
Thanks for the answer guys.

I should have mentioned that I had read the doc,
and was looking for non-explicit knowledge,
like:
- what is the reputation of plpython for a DBA?
- are there actual production systems that use it?
- what would be the recommended usage perimeter?
  (only administration-script-like functions, advanced processing, etc.)

Cheers,
Rémi-C

2016-03-03 20:55 GMT+01:00 David G. Johnston <david.g.johns...@gmail.com>:

> On Thu, Mar 3, 2016 at 12:35 PM, Adrian Klaver <adrian.kla...@aklaver.com>
> wrote:
>
>> On 03/03/2016 10:09 AM, Rémi Cura wrote:
>>
>>> Hey List,
>>>
>>> would it be considered safe to use plpythonu for a production database?
>>> What would be the limitations/ dangers?
>>>
>>
>> They are explained here:
>>
>> http://www.postgresql.org/docs/9.5/interactive/plpython.html
>>
>> "PL/Python is only available as an "untrusted" language, meaning it does
>> not offer any way of restricting what users can do in it and is therefore
>> named plpythonu. A trusted variant plpython might become available in the
>> future if a secure execution mechanism is developed in Python. The writer
>> of a function in untrusted PL/Python must take care that the function
>> cannot be used to do anything unwanted, since it will be able to do
>> anything that could be done by a user logged in as the database
>> administrator. Only superusers can create functions in untrusted languages
>> such as plpythonu."
>>
>
> See also:
>
> http://www.postgresql.org/docs/9.5/interactive/sql-grant.html
>
> GRANT { USAGE | ALL [ PRIVILEGES ] }
> ON LANGUAGE lang_name [, ...]
> TO role_specification [, ...] [ WITH GRANT OPTION ]
>
> and
>
> GRANT { EXECUTE | ALL [ PRIVILEGES ] }
> ON { FUNCTION function_name ( [ [ argmode ] [ arg_name ] arg_type [,
> ...] ] ) [, ...]
>  | ALL FUNCTIONS IN SCHEMA schema_name [, ...] }
> TO role_specification [, ...] [ WITH GRANT OPTION ]
>
> David J.
>
>


[GENERAL] PLPythonu for production server

2016-03-03 Thread Rémi Cura
Hey List,

would it be considered safe to use plpythonu for a production database?
What would be the limitations/ dangers?

Thanks,
Cheers,
Rémi-C


Re: [GENERAL] bloated postgres data folder, clean up

2016-03-03 Thread Rémi Cura
> [...] https://wiki.postgresql.org/wiki/Show_database_bloat I get:
>
>
>
> live=# select tbloat,wasted_space from table_bloat order by wasted_space
> desc limit 25;
>
>  tbloat │ wasted_space
> ────────┼──────────────
>     1.0 │ 9976 kB
>     1.2 │ 98 GB
>     1.0 │ 97 MB
>     1.4 │ 96 kB
>     1.2 │ 920 kB
>     1.2 │ 88 kB
>     1.1 │ 88 kB
>     2.0 │ 8192 bytes
>     0.0 │ 8192 bytes
>     1.3 │ 8192 bytes
>     2.0 │ 8192 bytes
>     1.3 │ 8192 bytes
>     1.5 │ 8192 bytes
>     1.5 │ 8192 bytes
>     2.0 │ 8192 bytes
>     1.1 │ 8192 bytes
>     1.0 │ 8192 bytes
>     1.1 │ 8192 bytes
>     1.3 │ 8192 bytes
>     1.5 │ 8192 bytes
>     1.1 │ 80 kB
>     1.0 │ 7584 kB
>     1.6 │ 71 MB
>     1.0 │ 704 kB
>     1.1 │ 6968 kB
>
> (25 rows)
>
>
>
> So actually, quite a lot of bloated data :)
>
> What am I doing wrong?
>
>
>
> Mit freundlichen Grüßen / With kind regards,
>
> Johnny Morano
>
>
>
>
> *From:* Rémi Cura [mailto:remi.c...@gmail.com]
> *Sent:* Mittwoch, 2. März 2016 17:49
> *To:* Johnny Morano
> *Cc:* Alvaro Herrera; PostgreSQL General
>
> *Subject:* Re: [GENERAL] bloated postgres data folder, clean up
>
>
>
> Hey,
>
> this is quite the *opposite*.
>
> The function find files in the postgres database folder that are not used
> by the database.
>
> To use it :
>
>  * connect to the database you want to analyse ( **mandatory** ).
>
>  * create the function (execute function definition)
>
>  * Execute `SELECT * FROM find_useless_postgres_file('your_database_name')`
>
>
> This will output a list of files that are on the disk but not used by
> postgres,
>
> and so can be removed.
>
> To be extra sure, you should use oid2name programme to check that the
> useless files are really useless.
>
>
> For this :
>  * output the list of potential useless files with copy for instance
>   ex :
>   COPY ( SELECT file_name
>
>  FROM find_useless_postgres_file('your_database_name')
>
>) TO 'path_to_you_database_folder/potential_useless.txt'
>
>now you've got a file with a list of potential erroneous files.
>
>  * Then use oid2name
>
>   `$su postgres
>$cd path_to_you_database_folder
>
>$while read i; do oid2name -f "$i" -i -S -q -d your_database_name; done
> < potential_useless.txt
>   `
>
>   Nothing should show, meaning that every potential erroneous file
>has not been recognized by oid2name !
>
>   If you feel unconvinced, you can manually try oid2name on some
>of the potential erroneous files, to be extra sure.
>   It should not find anything.
>
>
>
>  * Now delete all the files in `potential_useless.txt`.
>
>   It could be wiser to not delete the files but rename those
>
>   (for instance, adding `.potentially_useless` as a postfix)
>
>   so if it breaks something, you have an easy way to revert everything.
>
>
>
> Anyway, use *-*extra extra*-* caution if you delete.
> Except a backup, there would be no easy way to correct a mistake.
>
> Cheers,
>
> Rémi-C
>
>
>
> 2016-03-02 15:38 GMT+01:00 Johnny Morano <johnny.mor...@payon.com>:
>
> Hi Remi!
>
>
>
> This SQL function you have provided, seems to return all valid files, is
> that correct? In my case, it returned all my ‘base/’ files. Is that normal?
>
> If yes, maybe you rename the function to ‘find_useful_postgres_files’ ;-)
>
>
>
> Could you explain in steps how to use this function to make a cleanup of
> bloated data? (like in an example with commands and example output, if
> possible of course)
>
>
>
> Thanks!
>
>
>
>
>
> Mit besten Grüßen / With best regards,
>
> Johnny Morano
>
> _

Re: [GENERAL] bloated postgres data folder, clean up

2016-03-02 Thread Rémi Cura
Hey,
this is quite the *opposite*.
The function finds files in the postgres database folder that are NOT used
by the database.

To use it :
 * connect to the database you want to analyse ( **mandatory** ).
 * create the function (execute function definition)
 * Execute `SELECT * FROM find_useless_postgres_file('your_database_name')`

This will output a list of files that are on the disk but not used by
postgres,
and so can be removed.

To be extra sure, you should use the oid2name program to check that the
useless files are really useless.


For this :
 * output the list of potential useless files with copy for instance
  ex :
  COPY ( SELECT file_name
 FROM find_useless_postgres_file('your_database_name')
   ) TO 'path_to_you_database_folder/potential_useless.txt'

   now you've got a file with a list of potentially orphaned files.

 * Then use oid2name
  `$su postgres
   $cd path_to_you_database_folder
   $while read i; do oid2name -f "$i" -i -S -q -d your_database_name; done
< potential_useless.txt
  `

  Nothing should show, meaning that none of the potentially orphaned files
   is recognized by oid2name!
  If you feel unconvinced, you can manually try oid2name on some
   of the potentially orphaned files, to be extra sure.
  It should not find anything.

 * Now delete all the files in `potential_useless.txt`.
  It could be wiser to not delete the files but rename those
  (for instance, adding `.potentially_useless` as a postfix)
  so if it breaks something, you have an easy way to revert everything.

Anyway, use *-*extra extra*-* caution if you delete.
Short of a backup, there would be no easy way to correct a mistake.

Cheers,
Rémi-C

2016-03-02 15:38 GMT+01:00 Johnny Morano <johnny.mor...@payon.com>:

> Hi Remi!
>
>
>
> This SQL function you have provided, seems to return all valid files, is
> that correct? In my case, it returned all my ‘base/’ files. Is that normal?
>
> If yes, maybe you rename the function to ‘find_useful_postgres_files’ ;-)
>
>
>
> Could you explain in steps how to use this function to make a cleanup of
> bloated data? (like in an example with commands and example output, if
> possible of course)
>
>
>
> Thanks!
>
>
>
>
>
> Mit besten Grüßen / With best regards,
>
> Johnny Morano
>
>
> *From:* pgsql-general-ow...@postgresql.org [mailto:
> pgsql-general-ow...@postgresql.org] *On Behalf Of *Rémi Cura
> *Sent:* Mittwoch, 2. März 2016 14:58
> *To:* Alvaro Herrera
> *Cc:* PostgreSQL General
> *Subject:* Re: [GENERAL] bloated postgres data folder, clean up
>
>
>
> Would gladly do it,
>
> but still this "wiki cooloff" stuff,
>
> can't create a page
>
> Cheers,
>
> Rémi-C
>
>
>
> 2016-02-29 20:44 GMT+01:00 Alvaro Herrera <alvhe...@2ndquadrant.com>:
>
> Rémi Cura wrote:
> > Hey dear list,
> > after a fex years of experiments and crash,
> > I ended up with a grossly bloated postgres folder.
> > I had about 8 Go of useless files.
>
> Would you add a new page to the wiki with this?
>
> https://wiki.postgresql.org/wiki/Category:Administrative_Snippets
>
> --
> Álvaro Herrerahttp://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>
>


Re: [GENERAL] bloated postgres data folder, clean up

2016-03-02 Thread Rémi Cura
Would gladly do it,
but there's still this wiki "cool-off" thing,
so I can't create a page yet.
Cheers,
Rémi-C

2016-02-29 20:44 GMT+01:00 Alvaro Herrera <alvhe...@2ndquadrant.com>:

> Rémi Cura wrote:
> > Hey dear list,
> > after a few years of experiments and crashes,
> > I ended up with a grossly bloated postgres folder.
> > I had about 8 GB of useless files.
>
> Would you add a new page to the wiki with this?
>
> https://wiki.postgresql.org/wiki/Category:Administrative_Snippets
>
> --
> Álvaro Herrerahttp://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


[GENERAL] bloated postgres data folder, clean up

2016-02-29 Thread Rémi Cura
Hey dear list,
after a few years of experiments and crashes,
I ended up with a grossly bloated postgres folder.
I had about 8 GB of useless files.
Everything is in a VirtualBox VM, so I'm sure I can reproduce it exactly, and
I fried my postgres folder a couple of times before getting it right.

Julien (Rouhaud) helped me find those useless files via SQL.
The idea is to list the files in the postgres data directory with pg_ls_dir,
then to check that each file name corresponds to something useful (using
pg_relation_filenode).
--
https://gist.github.com/Remi-C/926eaee04d61a7245eb8
--
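
A simplified sketch of that query (default tablespace and current database
only; treat the output strictly as candidates to double-check with oid2name):

-- files in base/<database oid> whose numeric prefix matches no known filenode
WITH db AS (
    SELECT oid AS dboid FROM pg_database WHERE datname = current_database()
), files AS (
    SELECT pg_ls_dir('base/' || dboid) AS file_name FROM db
), known AS (
    SELECT pg_relation_filenode(oid::regclass)::text AS filenode
    FROM pg_class
    WHERE pg_relation_filenode(oid::regclass) IS NOT NULL
)
SELECT file_name
FROM files
WHERE file_name ~ '^\d+'  -- ignore PG_VERSION, pg_filenode.map, etc.
  AND substring(file_name from '^\d+') NOT IN (SELECT filenode FROM known);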

To be sure, I export the list of files found,
then use oid2name to check that no file is recognized.

The files can then be deleted (using plpythonu in my case).

So far a vacuum full analyze raises no errors.

Warning: for this to work, the SQL query must be run while connected to
the database you want to clean.

Hope this may be useful.
Cheers,
Rémi-C


[GENERAL] (expert) "insert into VIEW returning" inside an instead of trigger returns nothing

2015-09-02 Thread Rémi Cura
Hey list,
I'm stuck on a problem that I can't figure out (postgres 9.3).
In short, using an
INSERT INTO __view_with_trigger__ ...  RETURNING gid INTO _gid;
returns nothing.
I need this feature because I use views on tables as user interface.

​This must have to do with the postgres order of execution,
because inserting into the table instead of into the view of the table
returns the expected result.

Here is a synthetic example (of course the real use case really requires this
kind of architecture);
any help is much appreciated,
because I don't see any workaround (except not using views at all, which
would mean terrible data duplication in my case).

Cheers,
Rémi-C​

​


-- test inserting and instead of trigger --
---

CREATE SCHEMA IF NOT EXISTS test ;
SET search_path to test, public ;

DROP TABLE IF EXISTS generic_object CASCADE;
CREATE TABLE generic_object (
gid SERIAL PRIMARY KEY
, orientation float
) ;

DROP VIEW IF EXISTS editing_generic_object ;
CREATE VIEW editing_generic_object AS(
SELECT gid,
degrees(orientation) AS orientation
FROM generic_object
) ;


DROP TABLE IF EXISTS specific_object CASCADE ;
CREATE TABLE specific_object (
gid int references generic_object (gid) ON DELETE CASCADE
, width float
) ;

DROP VIEW IF EXISTS editing_specific_object ;
CREATE VIEW editing_specific_object AS(
SELECT g.gid
, g.orientation
, so.width
FROM specific_object AS so LEFT OUTER JOIN
generic_object AS g USING (gid)
) ;




DROP FUNCTION IF EXISTS test.rc_editing_generic_object() CASCADE ;
CREATE OR REPLACE FUNCTION test.rc_editing_generic_object(  )
RETURNS  trigger  AS $BODY$
/** @brief : this trigger deals with editing generic object*/
DECLARE
BEGIN
IF TG_OP = 'DELETE' THEN DELETE FROM test.generic_object WHERE gid
= OLD.gid ; RETURN OLD ;
ELSIF TG_OP = 'INSERT' THEN INSERT INTO test.generic_object
(orientation) VALUES (radians(NEW.orientation) ) ;
ELSE UPDATE test.generic_object SET orientation =
radians(NEW.orientation) ;
END IF ;

RETURN NEW ;
END ;
$BODY$  LANGUAGE plpgsql VOLATILE;


DROP TRIGGER IF EXISTS rc_editing_generic_object ON
test.editing_generic_object ;
CREATE TRIGGER rc_edit_street_object_pedestrian INSTEAD OF UPDATE OR INSERT
OR DELETE
ON test.editing_generic_object
FOR ROW  EXECUTE PROCEDURE rc_editing_generic_object(  ) ;




DROP FUNCTION IF EXISTS test.rc_editing_specific_object() CASCADE ;
CREATE OR REPLACE FUNCTION test.rc_editing_specific_object(  )
RETURNS  trigger  AS $BODY$
/** @brief : this trigger deals with editing specific object*/
DECLARE
_gid int;
BEGIN
IF TG_OP = 'DELETE' THEN DELETE FROM test.generic_object WHERE gid
= OLD.gid ; RETURN OLD ;
ELSIF TG_OP = 'INSERT' THEN
--does not work
INSERT INTO test.editing_generic_object (orientation) VALUES (
NEW.orientation)  RETURNING gid INTO _gid;
--does work
--INSERT INTO test.generic_object (orientation) VALUES (
radians(NEW.orientation) )  RETURNING gid INTO _gid;

RAISE WARNING 'here is the gid deduced after insertion : %',
_gid ;
INSERT INTO test.specific_object (gid, width) VALUES (_gid,
NEW.width) ;
ELSE
UPDATE test.editing_generic_object  AS e SET orientation =
NEW.orientation WHERE e.gid = NEW.gid;
UPDATE test.specific_object AS s SET width = NEW.width WHERE
s.gid = NEW.gid;
END IF ;
RETURN NEW ;
END ;
$BODY$  LANGUAGE plpgsql VOLATILE;


DROP TRIGGER IF EXISTS rc_editing_specific_object ON
test.editing_specific_object ;
CREATE TRIGGER rc_editing_specific_object INSTEAD OF UPDATE OR INSERT OR
DELETE
ON test.editing_specific_object
FOR ROW  EXECUTE PROCEDURE rc_editing_specific_object(  ) ;

--testing

--inserting into generic : works
INSERT INTO editing_generic_object ( orientation) VALUES (180) ;
SELECT *
FROM generic_object ;

-- insert into specific: doesn't work
INSERT INTO editing_specific_object ( orientation,width) VALUES (180,
123) ;
SELECT *
FROM specific_object ;


[GENERAL] Re: (expert) "insert into VIEW returning" inside an instead of trigger returns nothing

2015-09-02 Thread Rémi Cura
I think I got it:
I have to always return something (like NEW) from the INSTEAD OF trigger,
but fill NEW with the RETURNING values of the INSERT into the regular table.
Cheers,
Rémi-C
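
Concretely, for the synthetic example above, the generic-object trigger could
look like this (a sketch; it also adds the WHERE clause that was missing from
the original UPDATE branch):

CREATE OR REPLACE FUNCTION test.rc_editing_generic_object()
RETURNS trigger AS $BODY$
BEGIN
    IF TG_OP = 'DELETE' THEN
        DELETE FROM test.generic_object WHERE gid = OLD.gid;
        RETURN OLD;
    ELSIF TG_OP = 'INSERT' THEN
        -- fill NEW from the real insert so that
        -- "INSERT INTO editing_generic_object ... RETURNING gid" gets a value
        INSERT INTO test.generic_object (orientation)
        VALUES (radians(NEW.orientation))
        RETURNING gid INTO NEW.gid;
        RETURN NEW;
    ELSE
        UPDATE test.generic_object
        SET orientation = radians(NEW.orientation)
        WHERE gid = OLD.gid;
        RETURN NEW;
    END IF;
END;
$BODY$ LANGUAGE plpgsql VOLATILE;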

2015-09-02 13:44 GMT+02:00 Rémi Cura <remi.c...@gmail.com>:



Re: [GENERAL] pl/python composite type array as input parameter

2015-06-02 Thread Rémi Cura
Hey,
the only straightforward workaround I know (which is pretty bad)
is to cast your records down to text.
Then you have an array of text, which is manageable.

For this you can either 'flatten' each record into a single text value,
or cast each part of the record to text and then emulate an array of arrays
(you need to know the length of the inner array in your function, though).

I used this to emulate a 2D numpy array (N*3).

You'll need a custom aggregate, like this one
https://github.com/Remi-C/_utilities/blob/master/postgres/array_of_array.sql
.
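
A minimal sketch of the cast-to-text variant (the employee table and the
function name are made up, and the parsing is naive: it assumes no commas or
quotes inside the fields):

CREATE OR REPLACE FUNCTION sum_salaries_txt(rows_as_text text[])
RETURNS numeric AS $$
# each element is the text form of a row, e.g. '(Alice,4200,35)'
total = 0
for r in rows_as_text:
    name, salary, age = r.strip('()').split(',')
    total += float(salary)
return total
$$ LANGUAGE plpythonu;

-- caller side: flatten each employee row to its text representation
SELECT sum_salaries_txt( array_agg(e::text) ) FROM employee AS e;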

The other, saner solution is to pass the information about the rows you
want to retrieve, and retrieve the rows directly within the Python function.
For instance, here you would pass an array of ids of the employees you want
to work with.
This is saner, but as a design I don't really like to have specific SQL
code inside a generic Python function.

I agree it is cumbersome, and I also badly miss more powerful input types for
Python functions (after all, plpython can already return composite types,
which is awesome).


Cheers,
Rémi-C

2015-06-02 2:44 GMT+02:00 Adrian Klaver adrian.kla...@aklaver.com:

 On 06/01/2015 07:42 AM, Filipe Pina wrote:

 Thanks for the reply anyway, it's a pity though, it'd be useful..

 Another bump I've found along the pl/python road: insert ROWTYPE in
 table..
 Maybe you have some hint on that? :)

 So, in PLPGSQL I can:

 DECLARE
my_var my_table;
 BEGIN
my_var.col1 := 'asd';
INSERT INTO my_table VALUES(my_table.*);
 END;

 How would I do something like that in pl/python?

 First, how to declare a ROW-TYPE variable, as they're all python mappings?

 my_var = { 'col1': 'asd' } enough? it'd would miss all the other
 columns...

 Second, how to insert it?

 plpy.prepare and .execute say they don't support composite types, so I
 cannot simply pass

 pl = plpy.prepare('INSERT INTO core_customer VALUES ($1)', ['my_table'])

 Any workarounds for this? (meaning I wouldn't have to specify any
 columns in the insert statement)


 http://www.postgresql.org/docs/9.4/interactive/sql-insert.html

 pl = plpy.prepare('INSERT INTO core_table SELECT * FROM my_table')


 Thanks

 On Sex, Mai 29, 2015 at 2:00 , Peter Eisentraut pete...@gmx.net wrote:

 On 5/18/15 10:52 AM, Filipe Pina wrote:

 But one of the functions I need to create needs to accept an array
 of records.

 PL/Python doesn't support that. Some more code needs to be written to
 support that. You did everything correctly. I don't know of a good
 workaround.



 --
 Adrian Klaver
 adrian.kla...@aklaver.com






Re: [GENERAL] pl/python composite type array as input parameter

2015-06-02 Thread Rémi Cura
Oups,

I forgot another strategy I used:
instead of having
testp2(es employee[])

you can use
testp2(names text[], salaries integer[], ages integer[])

This might be the solution requiring the least work, but it is absolutely
terrible practice,
because it will be hard to change your record type (difficult evolution),
and having many columns will make you create functions with many arguments,
which is often a bad idea.

Cheers,
Rémi-C

2015-06-02 10:36 GMT+02:00 Rémi Cura remi.c...@gmail.com:






Re: [GENERAL] Python 3.2 XP64 and Numpy...

2015-06-02 Thread Rémi Cura
Hey,
python is installed from the official binary, 64-bit for Windows,
in C:/Python32.

I can't remember the argument, but it is probably irrelevant.
The problem doesn't seem to be installing numpy: it works perfectly in a
regular terminal.

The problem seems to be that postgres can't use numpy correctly.


I found no version of the scipy installer for win64 with python 3.2;
I tried several, but not having the exact python version always ends in
failure.
The binaries you linked to depend on the Intel Math Kernel Library,
which I don't have.


Cheers,
Rémi-C


2015-06-01 19:41 GMT+02:00 Adrian Klaver adrian.kla...@aklaver.com:

 On 06/01/2015 09:09 AM, Rémi Cura wrote:

 Hey, thanks to help me with that.
 I started fresh to have a truly reproducible process,
 so you can have all information and rule out some error possibilities.

   - Uninstall all python.
   - Check that PythonPath doesn't exist anymore
   - check that python doesn't exist anymore

   - install python 3.2.5 64 bit from official python website into
 C/Python32

   - Reload configuration for server.

   - create plpython3u , create a python function, test it (show path)

 * It works, python path is
  'C:\\Windows\\system32\\python32.zip', 'C:\\Python32\\Lib',
 'C:\\Python32\\DLLs', 'E:\\9.3\\data', 'C:\\Program
 Files\\PostgreSQL\\9.3\\bin', 'C:\\Python32',
 'C:\\Python32\\lib\\site-packages'

   - Download the latest numpy from the website.
   - On another PC


 So what is the Python setup on this machine?

  * Compile numpy with visual 2008 , 64 bit
 * Create an binary installer for windows (using python.exe setup.py
 )with proper argument


 The argument would be?


   - On the server :
   - install numpy with the compiled installer.


 Best guess is that the numpy compilation you are doing on Machine A is not
 compatible with what you have installed on Machine B(the server).

 Have you looked at this:
 http://www.lfd.uci.edu/~gohlke/pythonlibs/

 or

 http://www.scipy.org/install.html



   - check that numpy is correctly installer in
 C:\Python32\Lib\site-packages
   - using an external terminal, check that numpy works (import numpy -
 OK)

   - Now, define a plpython3u function containing import numpy

   - Run the function -- error is
ERREUR:  ImportError: DLL load failed: Le module spécifié est
 introuvable.,
 which roughly translate to
ERROR: ImportError : DLL load failed : the specified module couldn't
 be found.

   - Create a plpython3u function returning sys.path
 the path is C:\\Windows\\system32\\python32.zip', 'C:\\Python32\\Lib',
 'C:\\Python32\\DLLs', 'E:\\9.3\\data', 'C:\\Program
 Files\\PostgreSQL\\9.3\\bin', 'C:\\Python32',
 'C:\\Python32\\lib\\site-packages
 numpy is in this path, in C:\\Python32\\lib\\site-packages
 All user of the computer have all rights on the
 C:\\Python32\\lib\\site-packages folder


   - execute `import imp; imp.find_package('numpy')` within the
 plpython3u function
 - returns None, 'C:\\Python32\\lib\\site-packages\\numpy', ('',
 '', 5)

   - create a helloworld module , put it next to numpy, try to call it
 - it gets called

 I really don't see what I can do more.

 Cheers,
 Rémi-C



 --
 Adrian Klaver
 adrian.kla...@aklaver.com



Re: [GENERAL] Python 3.2 XP64 and Numpy...

2015-06-01 Thread Rémi Cura
Hey, thanks for helping me with that.
I started fresh to have a truly reproducible process,
so you have all the information and can rule out some error possibilities.

 - Uninstall all python.
 - Check that PythonPath doesn't exist anymore
 - check that python doesn't exist anymore

 - install python 3.2.5 64 bit from official python website into C/Python32

 - Reload configuration for server.

 - create plpython3u , create a python function, test it (show path)

   * It works, python path is
'C:\\Windows\\system32\\python32.zip', 'C:\\Python32\\Lib',
'C:\\Python32\\DLLs', 'E:\\9.3\\data', 'C:\\Program
Files\\PostgreSQL\\9.3\\bin', 'C:\\Python32',
'C:\\Python32\\lib\\site-packages'

 - Download the latest numpy from the website.
 - On another PC
   * Compile numpy with visual 2008 , 64 bit
   * Create an binary installer for windows (using python.exe setup.py
)with proper argument

 - On the server :
 - install numpy with the compiled installer.


 - check that numpy is correctly installer in C:\Python32\Lib\site-packages
 - using an external terminal, check that numpy works (import numpy -> OK)

 - Now, define a plpython3u function containing import numpy

 - Run the function -- error is
  ERREUR:  ImportError: DLL load failed: Le module spécifié est
introuvable.,
   which roughly translate to
  ERROR: ImportError : DLL load failed : the specified module couldn't be
found.

 - Create a plpython3u function returning sys.path
the path is C:\\Windows\\system32\\python32.zip', 'C:\\Python32\\Lib',
'C:\\Python32\\DLLs', 'E:\\9.3\\data', 'C:\\Program
Files\\PostgreSQL\\9.3\\bin', 'C:\\Python32',
'C:\\Python32\\lib\\site-packages
numpy is in this path, in C:\\Python32\\lib\\site-packages
All user of the computer have all rights on the
C:\\Python32\\lib\\site-packages folder


 - execute `import imp; imp.find_package('numpy')` within the plpython3u
function
   - returns None, 'C:\\Python32\\lib\\site-packages\\numpy', ('', '', 5)

 - create a helloworld module , put it next to numpy, try to call it
   - it gets called

I really don't see what more I can do.

Cheers,
Rémi-C
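
One more thing that can be checked from inside the backend is which
interpreter and architecture plpython3u actually loads (a hypothetical
diagnostic function, not something from the thread):

CREATE OR REPLACE FUNCTION rc_py_diag()
RETURNS text AS $$
import sys, platform, struct
# pointer size tells whether the embedded interpreter is 32- or 64-bit
return "python %s, %s, %s-bit" % (sys.version.split()[0],
                                  platform.platform(),
                                  struct.calcsize("P") * 8)
$$ LANGUAGE plpython3u;

SELECT rc_py_diag();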


Re: [GENERAL] Python 3.2 XP64 and Numpy...

2015-06-01 Thread Rémi Cura
Here is the test code
---
--creating plpython3u
DROP LANGUAGE plpython3u CASCADE;
CREATE LANGUAGE plpython3u ;


--create a test function
DROP FUNCTION IF EXISTS rc_test_python(   );
CREATE FUNCTION rc_test_python(  )
RETURNS void
AS $$
# @brief This function tests python

import sys
import imp
#sys.path.insert(0, 'C:\\Python32\\Lib\\site-packages\\numpy')
plpy.notice(sys.path)
plpy.notice('importing numpy')
#import numpy #first way to do it

file, pathname, description = imp.find_module('numpy')
plpy.notice(file, pathname, description) # numpy was found
imp.load_module('numpy',file, pathname, description ) #second way to do it
plpy.notice('end of importing numpy, this message will not show, an exception is raised before')
#plpy.notice(numpy.__version__)
import helloworld as h
imp.reload(h)
s = h.helloworld()
plpy.notice(s)
return
$$ LANGUAGE plpython3u IMMUTABLE STRICT;

--call function
SELECT *
FROM rc_test_python(   ) ;
-
---

2015-06-01 18:09 GMT+02:00 Rémi Cura remi.c...@gmail.com:





Re: [GENERAL] Python 3.2 XP64 and Numpy...

2015-05-28 Thread Rémi Cura
​Hey thanks for the help !

>> Hey dear List,
>>
>> On a windows XP 64.
>>
>> I installed python (64b),
>> it works.
>
> What version of Python, 2 or 3 or both?
>
> What does python -V show at the command line?

Python 3.2 and python 2.6, both 64-bit, are installed on the PC.
When I return sys.version from inside a plpython3u function, I get
3.2 (r32:88445, Feb 20 2011, 21:30:00) [MSC v.1500 64 bit (AMD64)]

>> Creating plpython3u works, and python works within the database.
>>
>> I installed numpy (manually compiled, 64b),
>> it works outside of Postgres,
>> but inside a plpython3u function, simply doing
>> 'import numpy' raises an error saying that python 32 is not a valid
>> win32 application.
>
> Is there a 32 bit version of numpy on your machine?

Nope! I freshly installed all of this.
Numpy is installed only in my Python 3.2.
My python 2.6 does not have numpy.

I tried:

C:\Python32> python.exe
Python 3.2 (r32:88445, Feb 20 2011, 21:30:00) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> print(numpy.__version__)
1.9.2

but when I try to import numpy from a plpython3u function it raises an
error...

>> I'm really stuck and have checked everything I could (path, rights,
>> depends...)
>
> Does that include PYTHONPATH?

Yes! I tried to print the python path from a plpython3u function (return
sys.path), and I get:

['C:\\WINDOWS\\system32\\python32.zip', 'C:\\Python32\\Lib',
'C:\\Python32\\DLLs', 'F:\\postgresData', 'C:\\Program
Files\\PostgreSQL\\9.3\\bin', 'C:\\Python32',
'C:\\Python32\\lib\\site-packages']

I looked in all those folders and the only numpy I found is the one I
recompiled in C:\\Python32\\lib\\site-packages.

[GENERAL] Python 3.2 XP64 and Numpy...

2015-05-27 Thread Rémi Cura
Hey dear List,

On a windows XP 64.

I installed python (64b),
it works.
Creating plpython3u works, and python works within the database.

I installed numpy (manually compiled,64b),
it works outside of Postgres,
but inside a plpython3u function, simply doing
'import numpy' raises an error saying that python 32 is not a valid win32
application.

I'm really stuck and have checked everything I could (path, rights,
depends...)

Does somebody have an idea of how to make it work?

Thanks
Cheers,
Rémi-C


Re: [GENERAL] GiST indices on range types

2015-04-01 Thread Rémi Cura
As far as I understand it (not much), a gist index over spatial data is in
fact a gist index over range(x), range(y).

This is why GiST works in n dimensions: it always works on ranges
(conceptually).

In fact a rectangle is (literally) the intersection of a range on x and a
range on y;
likewise, a 3D box is the intersection of ranges on x, y, z.
You could go further by adding time, etc.
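
A small, hypothetical illustration of an indexed range column, conceptually a
1-D bounding box:

CREATE TABLE reservation (
    room   int,
    during tsrange
);
CREATE INDEX reservation_during_gist ON reservation USING gist (during);

SELECT *
FROM reservation
WHERE during && tsrange('2015-04-01 10:00', '2015-04-01 12:00');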

Cheers,
Rémi-C

2015-04-01 9:00 GMT+02:00 Magnus Hagander mag...@hagander.net:

 On Sat, Mar 28, 2015 at 7:52 AM, Rebecca Zahra rebeccaza...@gmail.com
 wrote:

 Good morning,

 I am Rebecca Zahra and I am currently in my final year of Masters studies
 at the University of Malta. My thesis is about the usage of indexes for
 multi-dimensional data.

 I was going through the posts regarding GIST indexes and I came across
 the following
 http://dba.stackexchange.com/questions/39589/optimizing-queries-on-a-range-of-timestamps-two-columns

 I was wondering if maybe you can help me with a question.  I know that an
 R-Tree index implementation is used on top of GIST to index spatial data.
 Can you please tell me what type of index is used on top of GIST to index 
 *range
 types*?


 PostgreSQL has had indexable range types for quite some time now:
 http://www.postgresql.org/docs/current/static/rangetypes.html#RANGETYPES-GIST

 Indexable with gist or spgist. I don't think the docs cover the actual
 implementation internals though - you'll probably have to go to the source
 if you need that.

 --
  Magnus Hagander
  Me: http://www.hagander.net/
  Work: http://www.redpill-linpro.com/



[GENERAL] partitioning expert: Partitioning with specialization of one column type

2015-04-01 Thread Rémi Cura
Hey, postgres 9.3 here.
This one is for partitioning experts.

I have several child tables with columns (gid serial, patch pcpatch(n))
where n may change depending on the table, and pcpatch is a type from the
pgpointcloud extension
(here is the definition):
-
CREATE TYPE pcpatch
   (INPUT=pcpatch_in,
   OUTPUT=pcpatch_out,
   RECEIVE=-,
   SEND=-,
   TYPMOD_IN=pc_typmod_in,
   TYPMOD_OUT=pc_typmod_out,
   ANALYZE=-,
   CATEGORY='U', DEFAULT='',
   INTERNALLENGTH=-1, ALIGNMENT=int4, STORAGE=MAIN);
-

The question is, how do I create an inheritance scheme?

If the father table is

CREATE TABLE father(
gid serial,
patch pcpatch
) ;

Trying to make child_1, child_2, ... inherit from father raises an error:


child table test_child_1 has different type for column patch


So my question is: how would it be possible (if at all) to inherit from the
father table, while specializing the column type of the father table in the
child tables?

Thanks,
Cheers,
Rémi-C


Re: [GENERAL] partitioning expert: Partitioning with specialization of one column type

2015-04-01 Thread Rémi Cura
Hmm, sorry, I wasn't very clear.

The child tables already have types pcpatch(3), pcpatch(4), etc.
This is the normal behaviour and can't be changed.

pcpatch(n) are specializations of the pcpatch type.

pcpatch(3) is widely different from pcpatch(4).

For those who know PostGIS, pcpatch is like geometry, pcpatch(3) is like
geometry(Point), and pcpatch(5) is like geometry(Polygon),
for instance.

Cheers,
Rémi-C


2015-04-01 16:21 GMT+02:00 Steven Erickson steven.erick...@telventdtn.com:

  Your child could be:



 CREATE TABLE child-1(

 patchn pcpatchn

 ) INHERITS (father);







 *From:* pgsql-general-ow...@postgresql.org [mailto:
 pgsql-general-ow...@postgresql.org] *On Behalf Of *Rémi Cura
 *Sent:* Wednesday, April 01, 2015 8:41 AM
 *To:* PostgreSQL General
 *Subject:* [GENERAL] partitoning expert : Partitonning with
 specialization of one column type



 Hey, postgres 9.3 here.

 For partitioning experts.

 I have several child tables with columns (gid serial, patch pcpatch(n)),

 where n may change depending on the table, and pcpatch is a type from the
 pgpointcloud extension

 (here is the definition)

 -
 CREATE TYPE pcpatch
(INPUT=pcpatch_in,
OUTPUT=pcpatch_out,
RECEIVE=-,
SEND=-,
TYPMOD_IN=pc_typmod_in,
TYPMOD_OUT=pc_typmod_out,
ANALYZE=-,
CATEGORY='U', DEFAULT='',
INTERNALLENGTH=-1, ALIGNMENT=int4, STORAGE=MAIN);
 -

 The question is, how do I create an inheritance scheme?

 If the father table is

 CREATE TABLE father(

 gid serial,

 patch pcpatch

 ) ;

 Trying to make child_1, child_2, ... inherit from father raises an error:

 
 child table test_child_1 has different type for column patch
 

 So my question is: how would it be possible (if at all) to inherit from the
 father table while specializing the type of the father's column in the child tables?

 Thanks,

 Cheers,

 Rémi-C

 --
 NOTICE: This email message is for the sole use of the intended
 recipient(s) and may contain confidential and privileged information. Any
 unauthorized use, disclosure or distribution is prohibited. If you are not
 the intended recipient, please contact the sender by reply email and
 destroy all copies of the original message.




[GENERAL] Partitionning using geometry

2015-04-01 Thread Rémi Cura
Hey dear list,

I'd like to partition geographical (geometry) data with the postgres
partitioning mechanism.
(My usage is in fact related to pointcloud, but I use geometry as a
workaround.)
From the examples I have read on constraints, nothing should prevent it from
working.
Here is a self-contained example; the planner doesn't seem to use the
constraint_exclusion mechanism, whatever the constraint.

Thanks,
Cheers,
Rémi-C

--

CREATE SCHEMA IF NOT EXISTS test_partitionning;
SET search_path TO test_partitionning, public ;

DROP TABLE IF  EXISTS test_father CASCADE;
CREATE TABLE test_father  (
gid SERIAL PRIMARY KEY
, geom geometry
);

create table test_child_1 (
check (geometry_overlaps(geom,ST_Expand(ST_MakePoint(10,10),10  ) ) )
,check ( geom && ST_Expand(ST_MakePoint(10,10),10  ) )
, CHECK (ST_X(geom) BETWEEN 0 AND 20)
, CHECK (ST_Y(geom) BETWEEN 0 AND 20)
, CHECK (  ST_Intersects(geom, ST_Expand(ST_MakePoint(10,10),10  ))  )
) inherits (test_father);
--CREATE INDEX ON test_child_1 USING GIST(geom);

create table test_child_2 (
check (geometry_overlaps(geom,ST_Expand(ST_MakePoint(30,10),10  ) ) )
,check ( geom && ST_Expand(ST_MakePoint(30,10),10  ) )
, CHECK (ST_X(geom) BETWEEN 20 AND 40)
, CHECK (ST_Y(geom) BETWEEN 0 AND 20)
, CHECK (  ST_Intersects(geom, ST_Expand(ST_MakePoint(30,10),10  ))  )
) inherits (test_father);
--CREATE INDEX ON test_child_2 USING GIST(geom);


INSERT INTO test_child_1 (geom)
SELECT ST_MakePoint(s1/10.0+random(),s2/10.0+random())
FROM generate_series(1,90) AS s1, generate_series(1,90) AS s2;

INSERT INTO test_child_2 (geom)
SELECT ST_MakePoint(s1/10.0+random(),s2/10.0+random())
FROM generate_series(200,300) AS s1, generate_series(1,90) AS s2;


SHOW constraint_exclusion;
SET constraint_exclusion TO partition;


WITH area_of_interest AS (
SELECT ST_Buffer(ST_MakePoint(5,5),1) as buf
)
SELECT *
FROM area_of_interest, test_father
WHERE  -- geom && buf
ST_X(geom) BETWEEN ST_XMin(buf) AND ST_Xmax(buf)
AND ST_Y(geom) BETWEEN ST_YMin(buf) AND ST_Ymax(buf) ;


SELECT *
FROM  test_father , ST_Buffer(ST_MakePoint(5,5),1) as buf
WHERE
ST_X(geom) BETWEEN ST_XMin(buf) AND ST_Xmax(buf)
AND ST_Y(geom) BETWEEN ST_YMin(buf) AND ST_Ymax(buf);
--


Re: [GENERAL] partitoning expert : Partitonning with specialization of one column type

2015-04-01 Thread Rémi Cura
Okay,
thanks for the answer.
I take it that I should do it differently then.

I might have an idea but it is not working yet, so I'll post another
message.

Thanks dear co-list user ^^
Cheers,
Rémi-C


2015-04-01 16:56 GMT+02:00 David G. Johnston david.g.johns...@gmail.com:

 On Wed, Apr 1, 2015 at 7:26 AM, Rémi Cura remi.c...@gmail.com wrote:

 pcpatch(n) are specialization of pcpatch type.


 ​While this may be true PostgreSQL doesn't understand specialization of
 data types.  If you cannot do as Steven suggested you do not get to
 leverage inheritance directly and will need to devise your own work-arounds
 for the capabilities that you require.

 I'm somewhat surprised that what you describe works at all - mainly given
 that functions do not allow typemod specifications to pass through the
 function invocation...but alas I'm not all that familiar with PostGIS and
 whatever is being done does indeed seem to be working...

 David J.
 ​



Re: [GENERAL] Failure loading materialized view with pg_restore

2015-02-18 Thread Rémi Cura
Hey,
pg_hba is for managing who has *access* to the database.
Your problem seems to be about who has *SELECT permission* on a given table.
Cheers,
Rémi-C

2015-02-18 12:03 GMT+01:00 BladeOfLight16 bladeofligh...@gmail.com:

 On Wed, Feb 18, 2015 at 5:48 AM, Brian Sutherland br...@vanguardistas.net
  wrote:

 # dump and reload
 pg_dump --username super --format c -f dump.dump orig
 createdb copied


 It might be helpful to dump in the plain SQL format and look at what it's
 doing.



Re: [GENERAL] Missing table from in INSERT RETURNING

2015-02-18 Thread Rémi Cura
Hey folks,
thanks for the answers.
As you guessed, it is just a synthetic example
(so of course RETURNING some_value / 10 is possible in this case, but not
in general when the function is much more complicated than /10).
Likewise, adding a column is just not a serious option.

It corresponds to a real need: you have rows associated with an *id*.
Now you want to insert part of each row into a table with a serial field
(*gid*).
Upon insertion, the serial field is automatically filled, and you get it
with a RETURNING clause.
The problem is that you have no way to know which value of *gid* is
associated with which *id*.

The other workaround I found is to call nextval() before the insert, to know
beforehand what the *(gid, id)* association will be (sketched below).

It is suboptimal and ugly, so I would prefer another solution.
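
A minimal sketch of that nextval() workaround, reusing the
test_insert_returning table from the original message
(pg_get_serial_sequence avoids hard-coding the sequence name):

WITH serie AS (
    SELECT s, s*10 AS computing
    FROM generate_series(1,10) AS s
), numbered AS (
    SELECT s, computing,
           nextval(pg_get_serial_sequence('test_insert_returning','gid')) AS gid
    FROM serie
), ins AS (
    INSERT INTO test_insert_returning (gid, some_value)
    SELECT gid, computing FROM numbered
)
SELECT gid, s FROM numbered;  -- the (gid, id) association, known before the insert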

Cheers,
Rémi-C


2015-02-17 21:33 GMT+01:00 John McKown john.archie.mck...@gmail.com:

 On Tue, Feb 17, 2015 at 2:15 PM, David G Johnston 
 david.g.johns...@gmail.com wrote:

 On Tue, Feb 17, 2015 at 1:08 PM, John McKown [via PostgreSQL] [hidden
 email] wrote:

 I haven't seen any one else reply. I don't know if you've gotten a
 solution. But the following seemed to work for me:


 ​mine apparently got bounced...​



 WITH serie AS (
 select s, s*10 as computing
 from generate_series(1,10) as s
 )
 INSERT INTO test_insert_returning (some_value)
 SELECT computing
 FROM serie
 RETURNING gid, some_value;


 ​or, RETURNING some_value / 10​


 From my reading on the RETURNING phrase, you can only return values from
 the table into which you are doing the INSERT. Not any other table or view
 which might be referenced.


 ​This is correct; and I am curious on the use case that requires
 otherwise.​


 ​A weird one might be where in data available (s) in the CTE is in
 English measure (feet, miles, etc) and the OP wants to insert the
 equivalent Metric value (computing) into the table, but needs to return
 the English value to the caller (why?). He does not want to put the English
 measure into the table itself, just to be able to return it. And not need
 to do a reverse conversion. As I said, just a weird thought. From a
 effervescent fount of weird thoughts - me. Or perhaps what he is storing in
 the table is a one-way hash of a password, and wants to return the
 clear-text password to the caller? Hum, that is almost reasonable. I'll
 need to be more on guard.




 ​David J.​




 --
 He's about as useful as a wax frying pan.

 10 to the 12th power microphones = 1 Megaphone

 Maranatha! 
 John McKown



[GENERAL] Missing table from in INSERT RETURNING

2015-02-17 Thread Rémi Cura
Hello dear list,
I would appreciate some help on a small matter that has been bothering me
for a long time :


CREATE TABLE test_insert_returning(
gid SERIAL
,some_value int
);
WITH serie AS (
select s, s*10 as computing
from generate_series(1,10) as s
)
INSERT INTO test_insert_returning (some_value)
SELECT computing
FROM serie
RETURNING gid, serie.s

doesn't work.

The only workaround I found was to create a plpgsql function that does an
individual insert,
so that:
WITH serie AS (
select s, s*10 as computing
from generate_series(1,10) as s
)
SELECT serie.s, my_inserting_function(computing)
FROM serie ;

But it is very annoying and potentially bad for performance because many
individual inserts may be fired.

Any solution?
(postgres 9.3)
Cheers,
Rémi-C


Re: [GENERAL] How do I bump a row to the front of sort efficiently

2015-02-05 Thread Rémi Cura
Hey,
I'm not a guru, here is what I understood.
You are mixing several problems in the same question :
 - 1. why the planner isn't more efficient
 - 2. why the workaround is difficult to use with an ORM.

For 1, you can't do much (as said by others, you don't really need a CASE
here anyway). I think using a CASE is equivalent, for the planner, to using
your own custom black-box function. So no way to improve that.
For 2: if you can't pass LIMIT and OFFSET in your ORM,
a small workaround is to number your rows following the order you defined,
with the function row_number() over (your order here),
then you can use your ORM to write WHERE conditions equivalent to LIMIT
and OFFSET:

WHERE row_number BETWEEN your_offset + 1 AND your_offset + your_limit
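
For example, a sketch using the topic(id, bumped_at) table and the
"put id 1000 first" requirement from this thread:

SELECT *
FROM (
    SELECT t.*,
           row_number() OVER (
               ORDER BY CASE WHEN id = 1000 THEN 0 ELSE 1 END,
                        bumped_at DESC
           ) AS rn
    FROM topic t
) AS numbered
WHERE rn BETWEEN 1 AND 30;   -- your_offset = 0, your_limit = 30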

Cheers,
Rémi-C

2015-02-04 21:40 GMT+01:00 Paul Jungwirth p...@illuminatedcomputing.com:

  I imagine your original would be at risk of LIMITing out the very row
 you
  seek to get at the top, since you don't have an ORDER BY to tell it
 which
  ones to keep during the outer LIMIT.

 Here is an old thread about combining ORDER BY with UNION:

 http://www.postgresql.org/message-id/16814.1280268...@sss.pgh.pa.us

 So I think this query would work:

 select * from topic
 where id = 1000
 union all
 (select * from topic
 where id  1000
 order by bumped_at desc
 limit 29)
 order by case when id = 1000 then 0 else 1 end, bumped_at desc
 ;

  I need to be able to offset and limit the union hack in a view, which
  is proving very tricky.

 Since this is sort of a parameterized view (which Postgres does not
 have) you are probably better off figuring out how to make the UNION
 query work with your ORM. What ORM is it? Maybe someone here can help
 you with that. Or maybe instead of a view you could write a
 set-returning function, e.g. as described here:


 http://stackoverflow.com/questions/11401749/pass-in-where-parameters-to-postgresql-view

 Paul

 --
 _
 Pulchritudo splendor veritatis.


 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general



Re: [GENERAL] how to calculate standard deviation from a table

2015-01-22 Thread Rémi Cura
Are you sure you don't want a moving window
(stddev on rows 0 to 50, then stddev on rows 1 to 51, and so on)?

If you don't want moving windows your query would look like

DROP TABLE IF EXISTS your_data;
CREATE TABLE your_data AS
SELECT s as gid , random() as your_data_value
FROM generate_series(1,1000000) as s ; -- one million rows, as in the question

SELECT min(gid) as min_gid, max(gid) as max_gid, stddev(your_data_value) as
your_stddev
FROM your_data
GROUP BY (gid-1)/50
ORDER BY min_gid ASC


Please note that min(gid) as min_gid, max(gid) as max_gid and ORDER BY
min_gid ASC are just there to help you understand the result
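
If a moving window is actually what is wanted, a sketch on the same
your_data table could be:

SELECT gid,
       stddev(your_data_value) OVER (
           ORDER BY gid
           ROWS BETWEEN 49 PRECEDING AND CURRENT ROW
       ) AS moving_stddev
FROM your_data
ORDER BY gid;
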
Cheers,
Rémi-C

2015-01-22 16:49 GMT+01:00 David G Johnston david.g.johns...@gmail.com:

 Pierre Hsieh wrote
  Hi
 
  This table just has a column which type is integer. There are one million
  data in this table. I wanna calculate standard deviation on each 50 data
  by
  order. It means SD1 is from data 1 to data 50, SD2 is from data 51 to
  100 Is there anyone who can give me some suggestions? Thanks
 
  Pierre

 Integer division

 David J.





 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general



Re: [GENERAL] Fwd: Ask for a question

2015-01-21 Thread Rémi Cura
More bluntly maybe :

if you can do it in Excel,
you can do it in Postgres.

Cheers,
Rémi-C

2015-01-21 16:37 GMT+01:00 Raymond O'Donnell r...@iol.ie:

 On 21/01/2015 14:38, Pierre Hsieh wrote:
 
 
  Hi,
 
 
 
  Would you please tell me whether PostgreSQL can execute the following
  tasks? If not, please also tell me which one can help me for that. Thanks


 Not clear what you're asking, but if you just want to find the standard
 deviation of a sample then that's no problem:


 http://www.postgresql.org/docs/9.4/static/functions-aggregate.html#FUNCTIONS-AGGREGATE-STATISTICS-TABLE

 Hope this helps,

 Ray.


 --
 Raymond O'Donnell :: Galway :: Ireland
 r...@iol.ie


 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general



Re: [GENERAL] Indexing large table of coordinates with GiST

2015-01-16 Thread Rémi Cura
Please let me make one more guess ^^
Third guess: you are using topology (nodes are indexed by node_id).

  - If this is the case, you could use postgis topology.
  - The gain is that with this topology model, you store *shared linestrings*,
and not shared points.


More seriously, from what you say it seems possible to use pg_pointcloud
with your data,
if the following assumption is correct:
*When querying by other attributes, the points you get are roughly in the
same area (at least the area is a significant subset of the total area).*
To be perfectly clear: if for a node with node_id *N* you can expect
that the node with node_id *N+1* is spatially close to node N, you can
use pg_pointcloud and it will be effective.

Then the solution could be: partition your points spatially (i.e., from
your billions of points, you create a few million groups of points, with a
grid, clustering, whatever).
Then create an index on each group of points' bounding box.
Then create an index (gist) on range(node_id) for each group of points.
.. create indexes for other attributes: on range(attribute)

Then you can query your data effectively, and the index size will fit into
RAM (about 1 GB for 8 million patches for me).
The query would be (see the sketch below):
  - first get the groups of points of potential interest
(WHERE ST_Intersects(group_of_points.bbox, your_polygon) AND
group_of_points.range(node_id) && numrange(123,678) AND other attribute
filtering )
  - second, from the groups of points selected, extract the actual points,
and do the fine filtering you need
   (WHERE ST_Intersects(ST_MakePoint(point.X, point.Y, point.Z), your_polygon)
AND node_id BETWEEN 123 AND 678 ...)
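
A sketch of those two steps as a single query; the table and column names
(point_group, bbox, node_id_range, pts) are hypothetical, only the shape of
the query matters:

WITH area AS (
    SELECT ST_MakeEnvelope(0, 0, 10, 10) AS poly     -- stand-in for your irregular polygon
), candidate_groups AS (                             -- step 1: coarse filter on the groups
    SELECT g.group_id
    FROM point_group g CROSS JOIN area a
    WHERE ST_Intersects(g.bbox, a.poly)
      AND g.node_id_range && numrange(123, 678)
)
SELECT p.*                                           -- step 2: fine filter on the actual points
FROM pts p
JOIN candidate_groups USING (group_id)
CROSS JOIN area a
WHERE ST_Intersects(ST_MakePoint(p.x, p.y), a.poly)
  AND p.node_id BETWEEN 123 AND 678;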


If the assumption is correct, it works well (for instance, all the billions
of points I use also have a time stamp (equivalent to your node_id, I would
say); I frequently query on time ranges and it is as fast as a spatial query,
that is, milliseconds order of magnitude).

To give you an order of magnitude of the work involved, it would take me a
couple of hours to put your data into pg_pointcloud (computing time would
be about 12 hours, multi-processed, absolutely all inclusive).

Cheers,
Rémi-C



2015-01-16 1:18 GMT+01:00 Nathan Clayton nathanclay...@gmail.com:


 On 1/15/2015 12:36 PM, Daniel Begin wrote:


 Thank, there is a lot of potential ways to resolve this problem!

 For Rob, here is a bit of context concerning my IT environment…

 Windows 7 64b Desktop, running with an Intel i7 core and 16GB ram. The
 PostgreSQL 9.3 database is stored on a 3TB external drive (USB-3 connection
 with write cache enabled and backup battery) and a temp_tablespaces is
 pointing to a 1TB internal drive.

 Now, let me answered/questioned given proposals in the order I received
 them…

 1-Andy, I will set maintenance_work_mem as large as I can unless someone
 points to an important caveat.

 2-Vick, partitioning the table could have been very interesting. However,
 I will have to query the table using both the node ID (which could have
 provided a nice partition criterion) and/or the node location (find nodes
 within a polygon). I am not familiar with table partition but I suspect I
 can’t create a spatial index on a table that have been partitioned (split
 into multiple tables that inherit from the “master table). Am I right?

 3-Rémi, so many rows does not necessarily mean either raster or points
 cloud (but it’s worth asking!-).  As I mentioned previously, I must be able
 to query the table not only using nodes location (coordinates) but also
 using the few other fields the table contains (but mainly node IDs). So, I
 don’t think it could work, unless you tell me otherwise?

 4-Paul, the nodes distribution is all over the world but mainly over
 inhabited areas. However, if I had to define a limit of some sort, I would
 use the dateline.  Concerning spatial queries, I will want to find nodes
 that are within the boundary of irregular polygons (stored in another
 table). Is querying on irregular polygons is compatible with geohashing?

 Regards,

 Daniel


  Provided you have an integer primary key on both your node tables and
 polygon tables, would it make sense to preprocess the overlaps and have a
 many-to-many table with the node-id and polygon-id? Depending on the speed
 in which data is ingested, you could easily build triggers to run after
 inserts/updates to keep the table updated, or you could create a globally
 unique autoincrement field that tracks revisions and update everything
 after a given high-water mark.

 Lookups and joins would be using integers and should give you much better
 performance than searching through the polygons.

 For the many-to-many table, something like (you can obviously parse it out
 into smaller batches on the insert if you need to so you don't blow up your
 memory usage. If needed you can have two tables partitioned on either the
 node-id or the polygon-id to speed up lookups, as this table has the
 potential to carry many times the records in either table 

Re: [GENERAL] Indexing large table of coordinates with GiST

2015-01-15 Thread Rémi Cura
Hey,
You may want to post this on the PostGIS list.

I take it that so many rows mean either raster or point cloud data.
If it is a point cloud, simply consider using pg_pointcloud.
A 6 billion point cloud is about 600k rows for one of my data sets.

If it is raster, you may consider using the postgis raster type.
If you really want to keep that many geometries,
you may want to partition your data on a regular grid.
Cheers,
Rémi-C

2015-01-15 15:45 GMT+01:00 Andy Colson a...@squeakycode.net:

 On 1/15/2015 6:44 AM, Daniel Begin wrote:

 Hi, I'm trying to create an index on coordinates (geography type) over a
 large table (4.5 billion records) using GiST...

 CREATE INDEX nodes_geom_idx ON nodes USING gist (geom);

 The command ran for 5 days until my computer stops because a power outage!
 Before restarting the index creation, I am asking the community if there
 are
 ways to shorten the time it took the first time :-)

 Any idea?

 Daniel




 Set maintenance_work_mem as large as you can.

 -Andy



 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general



Re: [GENERAL] update several columns from function returning several values

2014-12-03 Thread Rémi Cura
Thanks,
waiting for the 9.5 then =)
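
In the meantime, a possible 9.3 workaround (only a sketch, not the
single-assignment syntax asked for) is to expand the function through
LATERAL so it is evaluated once per row:

UPDATE test_update_m_values t
SET gid_plus_1  = f.gid_plus_1,
    gid_minus_1 = f.gid_minus_1
FROM test_update_m_values t2
CROSS JOIN LATERAL rc_test_update_m_values(t2.gid) AS f
WHERE t.gid = t2.gid;
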
Cheers,
Rémi-C

2014-12-02 18:23 GMT+01:00 Tom Lane t...@sss.pgh.pa.us:

 =?UTF-8?Q?R=C3=A9mi_Cura?= remi.c...@gmail.com writes:
  IF I define a function returning several values , I can't use it to
 update
  mutliple columns of a table at once.
  ...
  UPDATE test_update_m_values SET (gid_plus_1,gid_minus_1) =
  (rc_test_update_m_values(gid)); --doesn't work

  Somebody now if this is possible?

 Not currently.  In 9.5 it'll be possible to do

 UPDATE test_update_m_values SET (gid_plus_1,gid_minus_1) =
 (select * from rc_test_update_m_values(gid));

 but the syntax you were trying will never work, because it would be
 ambiguous with the case of assigning a composite value to a single
 composite column.

 regards, tom lane



[GENERAL] update several columns from function returning several values

2014-12-02 Thread Rémi Cura
Hey,
a trivial question I guess,
but I can't make it work.

If I define a function returning several values, I can't use it to update
multiple columns of a table at once.
I __don't__ want to use a CTE or subquery,
and of course I don't want to compute the function several times.

CREATE TABLE test_update_m_values (
gid int,
gid_plus_1 int,
gid_minus_1 int
);

INSERT INTO test_update_m_values VALUES (1,0,0) ;

CREATE OR REPLACE FUNCTION rc_test_update_m_values( gid int,OUT gid_plus_1
int, OUT gid_minus_1 int)
AS $$  -- @brief : test function, can be deleted
BEGIN
SELECT gid+1,gid-1 INTO gid_plus_1, gid_minus_1;
RETURN ;
END;
$$ LANGUAGE 'plpgsql' IMMUTABLE STRICT   ;


UPDATE test_update_m_values SET (gid_plus_1,gid_minus_1) =
(rc_test_update_m_values(gid)); --doesn't work

Does somebody know if this is possible?

Cheers,
Rémi-C


Re: [GENERAL] DBlink, postgres to DB2

2014-10-22 Thread Rémi Cura
Hey
http://www.postgresql.org/docs/9.3/static/contrib-dblink-connect.html
Seems dblink is for postgres to postgres connections.
Cheers,
Rémi-C

2014-10-22 12:45 GMT+02:00 Postgres India pgbugin...@gmail.com:

 Hi,
 I am trying to connect DB2 from postgres using dblink, is there any
 configuration required at DB2 and postgres server.

 If any command string please provide it.



 Thanks
 Manmohan. K




Re: [GENERAL] DBlink, postgres to DB2

2014-10-22 Thread Rémi Cura
You may want to use foreign data wrapper (fdw)

https://wiki.postgresql.org/wiki/Foreign_data_wrappers
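
The workflow is the same whatever the wrapper; here is a sketch using
postgres_fdw only to show the pattern (a DB2-capable wrapper, for instance
an ODBC-based one, would have its own connection options):

CREATE EXTENSION postgres_fdw;

CREATE SERVER remote_srv
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'remote.example.com', dbname 'otherdb', port '5432');

CREATE USER MAPPING FOR current_user
    SERVER remote_srv
    OPTIONS (user 'remote_user', password 'secret');

CREATE FOREIGN TABLE remote_items (
    id   integer,
    name text
) SERVER remote_srv
  OPTIONS (schema_name 'public', table_name 'items');

SELECT * FROM remote_items LIMIT 10;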

Cheers,
Rémi-C

2014-10-22 12:50 GMT+02:00 Rémi Cura remi.c...@gmail.com:

 Hey
 http://www.postgresql.org/docs/9.3/static/contrib-dblink-connect.html
 Seems dblink is for postgres to postgres connections.
 Cheers,
 Rémi-C

 2014-10-22 12:45 GMT+02:00 Postgres India pgbugin...@gmail.com:

 Hi,
 I am trying to connect DB2 from postgres using dblink, is there any
 configuration required at DB2 and postgres server.

 If any command string please provide it.



 Thanks
 Manmohan. K





Re: [GENERAL] csv import error

2014-09-23 Thread Rémi Cura
Why don't you use COPY ?
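
A sketch of what that could look like for the table and csv quoted below
(a staging table avoids fighting with the file's empty oid column; the file
name is just an example):

CREATE TEMP TABLE assorb_staging (
    key integer,
    daprof double precision,
    aprof double precision,
    tipo character(1),
    valore double precision,
    note character(254),
    oid text,          -- empty in the file, ignored below
    unit_mis smallint
);

\copy assorb_staging FROM 'assorb.csv' WITH (FORMAT csv, HEADER true, DELIMITER ';', NULL '')

INSERT INTO info_pozzi_hydrocarbon.assorb
       (key, daprof, aprof, tipo, valore, note, unit_mis)
SELECT key, daprof, aprof, tipo, valore, note, unit_mis
FROM assorb_staging;   -- the serial oid column fills itself from its sequence
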
Cheers,
Rémi-C

2014-09-23 12:50 GMT+02:00 FarjadFarid(ChkNet) 
farjad.fa...@checknetworks.com:

 Hi,



 Quotation marks should be around both the name of each and every column
 and their values.



 Also replace the commas used as the separation character for the sql insert
 statement.

 What has happened here is that the values from the CSV were passed directly into the sql.

  Hope this helps.



 Best Regards





 Farjad Farid



 *From:* pgsql-general-ow...@postgresql.org [mailto:
 pgsql-general-ow...@postgresql.org] *On Behalf Of *Eugenio Trumpy
 *Sent:* 23 September 2014 11:26
 *To:* pgsql-general@postgresql.org
 *Subject:* [GENERAL] csv import error



 Hello,

 I'm trying to import data using a csv file,
 but I got an error:


 ERROR:  column key;daprof;aprof;tipo;valore;note;oid;unit_mis of relation 
 assorb does not exist

 LINE 1: INSERT INTO info_pozzi_hydrocarbon.assorb (key;daprof;a...

^

 *In statement:*

 INSERT INTO info_pozzi_hydrocarbon.assorb
 (key;daprof;aprof;tipo;valore;note;oid;unit_mis) VALUES
 ('1001334;19.1;21;A;6;;;11')


 My sql statement for the table that I would like to populate is:

 -- Table: info_pozzi_hydrocarbon.assorb

 -- DROP TABLE info_pozzi_hydrocarbon.assorb;

 CREATE TABLE info_pozzi_hydrocarbon.assorb
 (
   key integer,
   daprof double precision,
   aprof double precision,
   tipo character(1),
   valore double precision,
   note character(254),
   oid serial NOT NULL,
   unit_mis smallint,
   CONSTRAINT assorb_pk PRIMARY KEY (oid),
   CONSTRAINT assorb_fk FOREIGN KEY (key)
   REFERENCES pozzi (key) MATCH SIMPLE
   ON UPDATE CASCADE ON DELETE CASCADE,
   CONSTRAINT unit_m_fk FOREIGN KEY (unit_mis)
   REFERENCES info_cod.unita (unita) MATCH SIMPLE
   ON UPDATE NO ACTION ON DELETE NO ACTION
 )
 WITH (
   OIDS=FALSE
 );



 the csv file is like the following (it is only a part):


 key;daprof;aprof;tipo;valore;note;oid;unit_mis

 1001334;19.1;21;A;6;;;11

 1001334;93.5;94;A;30;;;11

 1001334;94;115;A;20;;;11

 1001334;154.5;255;A;644;;;11

 1001334;273;282;A;4;;;11

 1001334;298;309;A;7;;;11

 1001334;432;1224;P;1850;;;11

 4277001;121;901;A;397;ALLARGAMENTO FORO;;11

 4277001;121;901;A;96;PERFORAZIONE;;11


 The field order is the same and the decimal separator is set to a dot.
 I also set the delimiter for fields.
 I don't understand what the problem is.

 Can somebody help me?

 Eugenio




Re: [GENERAL] PostgreSQL Portable

2014-09-12 Thread Rémi Cura
Hey,
I have had many external hard drive crashes (savage unplugs, power losses, forced
PC restarts).
The server on the virtual machine was never hurt, nor was the data.

Cheers,
Rémi-C

2014-09-12 15:34 GMT+02:00 George Neuner gneun...@comcast.net:

 Hi Craig,

 On Fri, 12 Sep 2014 11:33:55 +0800, Craig Ringer
 cr...@2ndquadrant.com wrote:

 On 09/11/2014 03:16 PM, George Neuner wrote:
 
  If the driver permits it and you [or your users] can be trusted to
  perform a safe unmount via the OS *before* disconnecting the device,
  then you can enable write caching for the device using the device
  manager.  [Note that the device must be connected for it to be visible
  in the device manager.]
 
 It shouldn't be living dangerously, actually.
 
 While I haven't tested it myself, writeback caching on the external
 drive should be safe so long as it continues to honour explicit disk
 flush requests.
 
 That's why we have the WAL and do periodic checkpoints. If you yank the
 drive mid-write you'll lose uncommitted transactions and might have
 slower startup next time around, but it should otherwise not be overly
 problematic.

 For the most part you're correct, but recall that WAL itself can be
 made asynchronous [see fsync() and synchronous_commit() settings] and
 the periodic OS sync also may be disabled - which doesn't affect WAL
 handling but may(?) affect the background writer.

 Even having synchronous WAL the most recent transactions can be lost
 if the log device fails *during* a write.  That's why, if we use
 external devices at all, we tend to use closely coupled devices - disk
 array, wired SAN, etc. - that aren't very likely to be physically
 disconnected.  And uninterruptible power all around 8-)

 A portable device can be reasonably safe if treated properly, but it
 never will be quite as safe as an internal device.

 George



 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general



Re: [GENERAL] PostgreSQL Portable

2014-09-11 Thread Rémi Cura
Hey,
I'm working in the GIS field and I had the same problems.
The solution I found, which has been working for the past year:
VirtualBox on an external drive!
This way you can have an independent OS (Linux for easy
postgres/postgis/whatever GIS tools you want).

I find it very comfortable because my server is separated from the guest OS. So
I can take the disk and work on any PC with VirtualBox installed (requires
admin rights), and I have all the GIS tools on the server, so the virtual
machine is very self-contained.
It is also easy to back up (but very slow due to the huge disk image file).

I use a USB2 okay-ish disk. Guest win XP 64 / win seven 32; host Ubuntu
12.04 32-bit.
About performance: I do complex queries. Performance is OK for my use case
(about the same as a dedicated 32-bit XP machine).

Using the external disk to hold a tablespace is a __very__ bad idea.
As soon as you do an upgrade, the disk gets disconnected, or anything else happens, you
are really screwed.
(I had the issue. Without a backup you can't do much without very strong
postgres skills.)

Cheers,
Rémi-C



2014-09-10 23:50 GMT+02:00 Steve Crawford scrawf...@pinpointresearch.com:

  On 09/10/2014 02:00 PM, Daniel Begin wrote:

  First, I am a Newbie regarding PostgreSQL …



 I just started to look at PostgreSQL to implement a large GIS DB (1Tb).
 The data must reside in an external disk with eSATA connection and may be
 moved to different locations (and Windows desktops/laptops). I was looking
 to install PostgreSQL and PostGIS extensions on each PC (setting-up the
 proper PGDATA directory to the external disk) until I read about PostgreSQL
 and PgAdmin Portable …



 http://sourceforge.net/projects/pgadminportable/

 http://sourceforge.net/projects/postgresqlportable/



 Is that a viable alternative considering the expected size of the DB? Any
 comments or proposal would be appreciated J

 Daniel


 It appears you are looking to take the PostgreSQL data directory from
 machine to machine on an external drive. I fear you will run into some
 potential problems:

 1. Performance (mentioned by others).

 2. OS mismatch. Have you ensured that all client machines are running
 identical setups? The underlying files are not guaranteed portable between
 OS versions and 64/32-bit. In fact they probably won't be.

 3. Backups. What happens when one user screws up the database?

 Perhaps you could explain further the genesis of this requirement. The
 message list is littered with questions like this asking how to implement a
 certain solution when, given an understanding of the reason the question is
 being asked, a far better solution exists. This happens even more often
 when the person asking is a newbie.

 Cheers,
 Steve




Re: [GENERAL] permission denied for schema topology

2014-09-11 Thread Rémi Cura
Hey,
if you are using postgis and postgis_topology,
there are specific backup/restore processes.
Cheers,
Rémi-C

2014-09-11 12:49 GMT+02:00 Iain Mott m...@reverberant.com:

 Hello,

 The server for my websites was recently changed and upgraded. I have ssh
 access to the server and since the upgrade I am no longer able to use
 pg_dump to perform scripted backups. I've written to the site's
 support services, but until now, they've not been able to help (they've
 responded saying they will likely need to refer the case on to a
 developer - I don't  have any guarantees that it will be resolved).

 I've googled for this - but didn't come up with any relevant solutions.
 Can someone on this list please make suggestions that I can pass on to
 the technical support?

 Here's what happens (the important error messages are in English):


 [~]# pg_dump mydatabase > dump.sql
 Senha:
 pg_dump: comando SQL falhou
 pg_dump: Mensagem de erro do servidor: ERROR:  permission denied for
 schema topology
 pg_dump: O comando foi: LOCK TABLE topology.topology IN ACCESS SHARE MODE


 ---

 I am able to perform dumps of the databases via phpPdAdmin in the
 cpanel of the server, but this is going very inconvenient - hoping to
 use pg_dump

 Thanks,



 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general



Re: [GENERAL] permission denied for schema topology

2014-09-11 Thread Rémi Cura
Yep,
this means you are using postgis.
I don't know if you use postgis topology.
If you have a schema topology containing a topology table, then you also
have postgis_topology installed.

You can check this with this query : SELECT postgis_full_version();

Maybe the extensions have been created with another user id than yours,
hence the trouble.
Another problem could come from the fact that postgis and postgis_topology
may have been installed without using postgres CREATE EXTENSION (old
version of postgis).
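
If the topology schema itself does not need to be in the dump, one possible
workaround (assuming your own data lives in other schemas) is simply to skip it:

pg_dump --exclude-schema=topology mydatabase > dump.sql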

Cheers,
Rémi-C

2014-09-11 13:12 GMT+02:00 Iain Mott m...@reverberant.com:

 Thanks Rémi-C for the quick reply. By coincidence my site does involve
 mapping, however I wasn't aware that the server might be using postgis
 and postgis_topology. In psql when I type \d i get the following:

 public  | comentarios   | tabela | myusername
  public  | featuredata   | tabela | myusername
  public  | geography_columns | visão  | myusername
  public  | geometry_columns  | visão  | myusername
  public  | raster_columns| visão  | myusername
  public  | raster_overviews  | visão  | myusername
  public  | rss   | tabela | myusername
  public  | spatial_ref_sys   | tabela | myusername

 geography_columns, geometry_columns, raster_columns, raster_overviews
 and spatial_ref_sys are all recent changes to my database (I didn't
 create them and I wasn't aware of their existence until the switch to
 the new server). Do these and the errors I'm receiving indicate the use
 of postgis and postgis_topology by the server?

 Just googled for doing a dump of a postgis database and encountered this
 command:

 pg_dump --no-acl --no-owner $DATABASE > dump.sql

 I tried it, but it results in the same error messages

 Cheers,









  Em Qui, 2014-09-11 às 12:53 +0200, Rémi Cura escreveu:
  Hey,
 
  if you are using postgis and postgis_topology,
  there are specific backup/restore process.
  Cheers,
  Rémi-C
 
 
  2014-09-11 12:49 GMT+02:00 Iain Mott m...@reverberant.com:
  Hello,
 
  The server for my websites was recently changed and upgraded.
  I have ssh
  access to the server and since the upgrade I am no longer able
  to use
  pg_dump to perform scripted backups. I've written to the
  site's
  support services, but until now, they've not been able to help
  (they've
  responded saying they will likely need to refer the case on to
  a
  developer - I don't  have any guarantees that it will be
  resolved).
 
  I've googled for this - but didn't come up with any relevant
  solutions.
  Can someone on this list please make suggestions that I can
  pass on to
  the technical support?
 
  Here's what happens (the important error messages are in
  English):
 
 
  [~]# pg_dump mydatabase > dump.sql
  Senha:
  pg_dump: comando SQL falhou
  pg_dump: Mensagem de erro do servidor: ERROR:  permission
  denied for schema topology
  pg_dump: O comando foi: LOCK TABLE topology.topology IN ACCESS
  SHARE MODE
 
 
  ---
 
  I am able to perform dumps of the databases via phpPdAdmin in
  the
  cpanel of the server, but this is going very inconvenient -
  hoping to
  use pg_dump
 
  Thanks,
 
 
 
  --
  Sent via pgsql-general mailing list
  (pgsql-general@postgresql.org)
  To make changes to your subscription:
  http://www.postgresql.org/mailpref/pgsql-general
 
 





Re: [GENERAL] pg_advisory_lock problem

2014-08-20 Thread Rémi Cura
Hey,
just a quick follow-up for the archive:

problem solved with pg_try_advisory_lock().
Using LOCK TABLE in ROW EXCLUSIVE mode doesn't work (same duplicate key error);
using LOCK TABLE in ACCESS EXCLUSIVE mode works (no error) but stops the parallelism
(one thread has to wait for another to go on).

The optimal solution would be to slice the big CTE into several successive
temp table creations, using a procedural language allowing control of
transactions.
This would allow controlling visibility and using the powerful MVCC features.
(I'm unwilling to do that.)

Cheers everybody and many thanks for the help,
Rémi-C


2014-08-12 14:45 GMT+02:00 Merlin Moncure mmonc...@gmail.com:

 On Tue, Aug 12, 2014 at 5:07 AM, Rémi Cura remi.c...@gmail.com wrote:
 
 
 
  2014-08-11 22:48 GMT+02:00 Kevin Grittner kgri...@ymail.com:
 
  Rémi Cura remi.c...@gmail.com wrote:
 
   as you (both?) suggested it works using advisory lock used at the
   beginning and end of the transaction. This way there is no upsert
   at all if the element is locked? (I used general advisory lockbut
   in the same way as transactionnal lock)
 
  This is too vague to comment on.
 
   The issue in this case is simple : I have to use about 100k
   advisory locks, which is a big memory requirement for my hardware
   :-(
 
  ... and that doesn't seem to make any sense.  Either you are not
  understanding advisory locks or you are doing something very, very
  unusual.
 
   Merlin I'm afraid I don't understand what is vanilla LOCK
   TABLE.
 
  See the LOCK TABLE command.
 
  http://www.postgresql.org/docs/current/interactive/sql-lock.html
 
 
 http://www.postgresql.org/docs/current/interactive/explicit-locking.html
 
   I can't really use a lock table because each query upsert
   sequentially into 3 tables, doing lots of computing between.
 
  Now *that* I understand.  :-)  It's not an unusual requirement,
  but can be a challenge when using snapshot isolation (where writes
  don't block reads and reads don't block anything).  There are two
  main approaches -- introduce blocking to serialize some of the
  operations, or use the SERIALIZABLE transaction isolation level to
  ensure that the behavior of all concurrent transactions is
  consistent with the behavior you would see if they were run one at
  a time.  The latter approach doesn't introduce any new blocking,
  but it can cause transactions to get an ERROR with a SQLSTATE of
  40001 at just about any point, so you need to be prepared to
  recognize that and retry those transactions from the beginning (not
  just the last statement of the transaction), ignoring any data read
  during the failed attempt.
 
  You may want to read the entire chapter on concurrency control:
 
  http://www.postgresql.org/docs/current/interactive/mvcc.html
 
  If you are considering using SERIALIZABLE transactions, you should
  probably review the examples in the Wiki, to get an idea of how it
  behaves in various cases:
 
  http://wiki.postgresql.org/wiki/SSI
 
   I use parallel query to compute faster (load dividing). I guess
   it would be very slow with about 8 parallel queries with locks.
 
  Well, if you introduce blocking you reduce your parallelism, but if
  you use serializable transactions and there are actually a lot of
  conflicts you can see poor performance because of the errors
  rolling back transactions and the need to retry them from the
  start.  The techniques used to implement serializable transactions
  in PostgreSQL are basically a refinement of the Optimistic
  Concurrency Control (OCC) techniques, but generally with far fewer
  retries needed -- the point being that it optimistically assumes
  that there will not be a conflict so that concurrency is better,
  but has to cancel things if that optimism proves to be unfounded.
 
  To make related to changes to multiple tables and maintain coherent
  data, you probably will need to do one or the other.
 
  --
  Kevin Grittner
  EDB: http://www.enterprisedb.com
  The Enterprise PostgreSQL Company
 
 
 
  Hey, thanks for your detailed answer.
 
  The particularity here is that I use a big query with CTE instead of a
 more
  procedural way.
  I do sophisticated geometric computing using postGIS. I guess it is a
 hack
  of both postgres and postgis.
 
  I explain better the pg_advisory locks uses I have tried.
 
  First classic use of pg_advisory, not working :
  CTE_1 (find what rows will be upserted in table_1)
  CTE_2 (find what rows will be upserted in table_2)
  CTE_3 (find what rows will be upserted in table_3)
  CTE_4 (compute the result to be upserted into table_1)
  CTE_5 (upsert into table_1 using custom upsert_function)
  CTE_6 (compute the result to be upserted into table_2)
  CTE_7 (upsert into table_2 using custom upsert_function)
  CTE_8 (compute the result to be upserted into table_2)
  CTE_9 (upsert into table_2 using custom upsert_function)
  CTE_10 (end of query)
  each of the upserting function is plpgsql and do something like
  pg_advisory_lock(table_number, id of row

Re: [GENERAL] pg_advisory_lock problem

2014-08-12 Thread Rémi Cura
2014-08-11 22:48 GMT+02:00 Kevin Grittner kgri...@ymail.com:

 Rémi Cura remi.c...@gmail.com wrote:

  as you (both?) suggested it works using advisory lock used at the
  beginning and end of the transaction. This way there is no upsert
  at all if the element is locked? (I used general advisory lockbut
  in the same way as transactionnal lock)

 This is too vague to comment on.

  The issue in this case is simple : I have to use about 100k
  advisory locks, which is a big memory requirement for my hardware
  :-(

 ... and that doesn't seem to make any sense.  Either you are not
 understanding advisory locks or you are doing something very, very
 unusual.

  Merlin I'm afraid I don't understand what is vanilla LOCK
  TABLE.

 See the LOCK TABLE command.

 http://www.postgresql.org/docs/current/interactive/sql-lock.html

 http://www.postgresql.org/docs/current/interactive/explicit-locking.html

  I can't really use a lock table because each query upsert
  sequentially into 3 tables, doing lots of computing between.

 Now *that* I understand.  :-)  It's not an unusual requirement,
 but can be a challenge when using snapshot isolation (where writes
 don't block reads and reads don't block anything).  There are two
 main approaches -- introduce blocking to serialize some of the
 operations, or use the SERIALIZABLE transaction isolation level to
 ensure that the behavior of all concurrent transactions is
 consistent with the behavior you would see if they were run one at
 a time.  The latter approach doesn't introduce any new blocking,
 but it can cause transactions to get an ERROR with a SQLSTATE of
 40001 at just about any point, so you need to be prepared to
 recognize that and retry those transactions from the beginning (not
 just the last statement of the transaction), ignoring any data read
 during the failed attempt.

 You may want to read the entire chapter on concurrency control:

 http://www.postgresql.org/docs/current/interactive/mvcc.html

 If you are considering using SERIALIZABLE transactions, you should
 probably review the examples in the Wiki, to get an idea of how it
 behaves in various cases:

 http://wiki.postgresql.org/wiki/SSI

  I use parallel query to compute faster (load dividing). I guess
  it would be very slow with about 8 parallel queries with locks.

 Well, if you introduce blocking you reduce your parallelism, but if
 you use serializable transactions and there are actually a lot of
 conflicts you can see poor performance because of the errors
 rolling back transactions and the need to retry them from the
 start.  The techniques used to implement serializable transactions
 in PostgreSQL are basically a refinement of the Optimistic
 Concurrency Control (OCC) techniques, but generally with far fewer
 retries needed -- the point being that it optimistically assumes
 that there will not be a conflict so that concurrency is better,
 but has to cancel things if that optimism proves to be unfounded.

 To make related to changes to multiple tables and maintain coherent
 data, you probably will need to do one or the other.

 --
 Kevin Grittner
 EDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company



Hey, thanks for your detailed answer.

The particularity here is that I use a big query with CTEs instead of a more
procedural way.
I do sophisticated geometric computing using postGIS. I guess it is a hack
of both postgres and postgis.

Let me explain better the pg_advisory lock uses I have tried.

*First classic use of pg_advisory, not working :*
CTE_1 (find what rows will be upserted in table_1)
CTE_2 (find what rows will be upserted in table_2)
CTE_3 (find what rows will be upserted in table_3)
CTE_4 (compute the result to be upserted into table_1)
CTE_5 (upsert into table_1 using custom upsert_function)
CTE_6 (compute the result to be upserted into table_2)
CTE_7 (upsert into table_2 using custom upsert_function)
CTE_8 (compute the result to be upserted into table_2)
CTE_9 (upsert into table_2 using custom upsert_function)
CTE_10 (end of query)
each of the upserting functions is plpgsql and does something like
pg_advisory_lock(table_number, id of row to be upserted)
with updating AS (update table)
insert into table if not updated
pg_advisory_unlock(table_number,id of row to be upserted)
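
For reference, a concrete sketch of such an upserting function, for a
hypothetical table_1(id int primary key, val int); as explained just below,
this still fails under concurrency because the competing transaction cannot
see the uncommitted insert:

CREATE OR REPLACE FUNCTION upsert_table_1(p_id int, p_val int)
RETURNS void AS $$
BEGIN
    PERFORM pg_advisory_lock(1, p_id);          -- (table number, row id)
    WITH updating AS (
        UPDATE table_1 SET val = p_val WHERE id = p_id RETURNING id
    )
    INSERT INTO table_1 (id, val)
    SELECT p_id, p_val
    WHERE NOT EXISTS (SELECT 1 FROM updating);
    PERFORM pg_advisory_unlock(1, p_id);
END;
$$ LANGUAGE plpgsql;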

According to what the list said, it doesn't work because of visibility
issues: the locking works, so we know the processes will upsert the same
thing sequentially. However, it fails because each process has no
visibility of the inserts done by the others. So when the second process
upserts the same thing, it tries to insert, and we get 2 inserts for the
same row.

*Second, non-classic use of pg_advisory locks, working but too taxing on resources*
CTE_1 (find what rows will be upserted in table_1*,
pg_try_advisory_lock(1,id1)*  )
CTE_2 (find what rows will be upserted in table_2,
*pg_try_advisory_lock(2,id2)*  )
CTE_3 (find what rows will be upserted in table_3,*
pg_try_advisory_lock(3,id3)*  )
CTE_4

Re: [GENERAL] pg_advisory_lock problem

2014-08-11 Thread Rémi Cura
2014-08-11 5:33 GMT+02:00 John R Pierce pie...@hogranch.com:


 -

 PERFORM pg_advisory_lock(#1 , 2 or 3 depending on the table#,id of the
 stuff I want to upsert) ;
 WITH stuff_to_upsert ()
 ,updating AS (update returning id)
 ,inserting AS (insert if not updated)
 PERFORM pg_advisory_unlock(same as above).
 



 ah, you're releasing the lock before the insert is committed, since this
 is all within a function call, its entirely within a single transaction.



 --
 john r pierce  37N 122W
 somewhere on the middle of the left coast




 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general



Hey,
thanks to you two!

Oh no, I hoped it was something fixable,
because I don't think I can fix this transaction problem.
I guess because the big query is called in one plpgsql function it will be
in one transaction,
so even if I resorted to plpython in the upserting functions to issue
commits it wouldn't work.
The only way would be to transform the big plpgsql query function into
another PL function,
but I can't really afford it :-(

Damn, it is really hard to use a table from 2 different sessions at the same
time!

Thanks for your answers, and for the much needed typo correction ;-)

Cheers,
Rémi-C


Re: [GENERAL] pg_advisory_lock problem

2014-08-11 Thread Rémi Cura
Hey,
as you (both?) suggested, it works using an advisory lock taken at the beginning
of the transaction and released at the end. This way there is no upsert at all if the
element is locked. (I used a regular advisory lock, but in the same way as a
transactional lock.)
The issue in this case is simple: I have to use about 100k advisory locks,
which is a big memory requirement for my hardware :-(

Merlin, I'm afraid I don't understand what vanilla LOCK TABLE is.

I can't really use a table lock because each query upserts sequentially into
3 tables, doing lots of computing in between.
I use parallel queries to compute faster (load dividing). I guess it would be
very slow with about 8 parallel queries with locks.

I should test this lock approach to be sure.

Thanks both of you !
Cheers,
Rémi-C



2014-08-11 17:51 GMT+02:00 Merlin Moncure mmonc...@gmail.com:

 On Mon, Aug 11, 2014 at 9:49 AM, Kevin Grittner kgri...@ymail.com wrote:
  Rémi Cura remi.c...@gmail.com wrote:
 
  2014-08-11 5:33 GMT+02:00 John R Pierce pie...@hogranch.com:
 
  ah, you're releasing the lock before the insert is committed,
  since this is all within a function call, its entirely within a
  single transaction.
 
  Oh no I hoped it was something fixable
 
  Well, it might be.  Try using a transactional advisory lock and
  letting it expire at the end of the transaction, rather than
  explicitly releasing it before the transaction commits.  Depending
  on some other details, that might get it to do what you want.

 Better to use vanilla LOCK TABLE statement in my opinion for this purpose.

 merlin



[GENERAL] pg_advisory_lock problem

2014-08-10 Thread Rémi Cura
Hey dear list,

following the advice of Depesz I'm trying to use advisory locks.

I'm trying to perform parallel upserts.
I have 2 different sessions in which I run a very complicated query (lots
of CTEs) concurrently. In fact this complicated query is simply put inside
a plpgsql function for ease of calling.

The query performs upserts into 3 different tables, each time using a
dedicated plpgsql function that looks like this:
-
PERFORM pg_advisory_lock(#1 , 2 or 3 depending on the table#,id of the
stuff I want to upsert) ;
WITH stuff_to_upsert ()
,updating AS (update returning id)
,inserting AS (insert if not updated)
PERFORM pg_advisory_unlock(same as above).


The query calls such a plpgsql function like this:
CTE_N(
SELECT r.*
FROM result_to_be_upserted, function_upserting(...) as r
)

Yet I still get duplicate primary key errors, because the same row ends up
being inserted twice:
ERROR:  duplicate key value violates unique constraint
result_intersection_pkey
DETAIL:  Key (node_id)=(*12621*) already exists.

Of course, when performing the query sequentially I don't have any errors,
even when performing it several times.

I have read the 2 pages relevant to pg_advisory locks, and I clear all the
advisory locks before executing the test that gives those errors.

After the error happens (which means that 1 process completed and the
other failed), I can see that there is a lock in pg_locks with the id of
the row that caused the error when being upserted:
advisory;2953366;;3;*12621*;2;8/0;11380;ExclusiveLock;t;f

Any help is greatly appreciated, I have tried everything I could think of.

Thanks,
Cheers,
Rémi-C


Re: [GENERAL] Design ? table vs. view?

2014-07-16 Thread Rémi Cura
Hey,
I guess you know all about PL/R,
the R language extension for postgres.
It is very convenient, though be careful, as it sometimes crashed my server.

Cheers,
Rémi-C


2014-07-16 3:42 GMT+02:00 John McKown john.archie.mck...@gmail.com:

 On Tue, Jul 15, 2014 at 8:46 AM, David G Johnston 
 david.g.johns...@gmail.com wrote:

 John McKown wrote
  I have a table which has some raw data in it. By raw, I mean it is
  minimally processed from a log file. Every week, I update this table by
  processing the weekly log using awk to create a psql script file which
  looks similar to:
 
 snip
  So the overhead may be quite high, because to SELECT from RUNINFO,
  PostgreSQL must realize all four views.
 
  I appreciate your thoughts on if this is OK, given that performance is
  currently acceptable. Mainly because this work is basically only done
 one
  a
  week, on Sundays. And I don't do it myself, it is done via a scheduler
  (not
  cron, but similar) which runs some scripts.

 I would likely make jobrun.runinfo into a table while leaving
 jobrun.rawdata as-is.  I would have a function that populates runinfo
 from rawdata that I would call after performing the copy to rawdata.
 There would be no views - unless you desire a view interface over
 runinfo
 for API or permission reasons.

 In 9.4 you can (probably) make runinfo an explicit MATERIALIZED VIEW and
 perform REFRESH command to accomplish the same thing - though I am not
 particularly familiar with the mechanics of that feature.

 David J.


 Being the indecisive nut that I am, I am going to do both grin/. I will
 keep the current view. But when I update the rawdata, what I will then do
 is:

 drop table runinfo_table;
 create table runinfo_table as select distinct * from runinfo;
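
For reference, a sketch of the 9.4 materialized-view route mentioned above,
doing the same job as the drop/create pair:

CREATE MATERIALIZED VIEW runinfo_mat AS
SELECT DISTINCT * FROM runinfo;

-- then, after each weekly rawdata load:
REFRESH MATERIALIZED VIEW runinfo_mat;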

 I am fairly confident that there cannot be any duplicates in runinfo. But,
 being paranoid as well, I will do the DISTINCT just to be sure. I may
 change the VIEW to do that in the future, and remove it from the
 preceeding. Since the process which updates the rawdata table is automated
 and runs on a Sunday, the time needed to recreate runinfo_table is not
 relevant to me. So I get what I want, unless I update rawdata off schedule.
 I cannot imagine why I would do that since the logs from which I create it
 are generally only available after 17:00 local time on Sunday. Getting the
 logs-to-date information for the time since the last dump is basically a
 PITA and my current use is not critical. Actually, it is more a
 skunkworks project of my own to produce a set of nice graphs, using R,
 which _might_ turn out to be interesting to management, but the production
 of which _will_ help me learn PostgreSQL and R better (hopefully).

 Many thanks.

 --
 There is nothing more pleasant than traveling and meeting new people!
 Genghis Khan

 Maranatha! 
 John McKown



Re: [GENERAL] how to create multiple databases running in different dirs

2014-07-01 Thread Rémi Cura
Hey,
postgres already takes care of multiple clients writing/reading,
so you don't really need to be afraid of concurrency (for most of the stuff).

If you so desire, you could also have multiple servers on the same
machine (although on different ports).
This way each server would have its own data directory.

Cheers,
Rémi-C


2014-07-01 4:59 GMT+02:00 John R Pierce pie...@hogranch.com:

 On 6/30/2014 4:58 PM, frank ernest wrote:

 Hi, I'm new to postgresql and sql in general. I desired to write a
 program in C that used an sql data base for IPC and because multiple copies
 of my program might run on the same machine I wanted a way to ensure that
 only one copy of each multithreaded program got one database but I'm
 uncertain how to go about this. As the program might be run several
 different times and each time it should only get the data base from it's
 predisesor I can't use the pid as a unique data base name. I thought that I
 might start multiple pgsql instances but somehow that seems wrong. Any
 ideas?


 how would an instance of your program know what to connect to, or which
 previous instance its 'predecessor' was ?

 normally, you have ONE database for a given set of applications, and all
 the applications share the same database tables and such.


 --
 john r pierce  37N 122W
 somewhere on the middle of the left coast



 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general



Re: [GENERAL] python modul pre-import to avoid importing each time

2014-06-27 Thread Rémi Cura
Hey,
thanks, now we have good information:

the python packages are really loaded once per connection, so no
optimization is needed.
Unlike plperl or PL/R there is no easy way to preload packages.
There may be some solutions to make this import happen at connection start, but it
would involve C modifications (I found no trace of a python file or hackable sql
script in the postgres source and install directories).

After that,
further optimization is possible by avoiding the useless 'import' (because
it is already loaded); see the trick here:
http://stackoverflow.com/questions/15023080/how-are-import-statements-in-plpython-handled
However, the benefits are not proven.
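
For completeness, here is what that caching trick can look like (a sketch
with a hypothetical numpy-based function; the imported module is kept in the
per-function SD dictionary so the import cost is paid once per session):

CREATE OR REPLACE FUNCTION py_norm(x double precision, y double precision)
RETURNS double precision AS $$
    # keep the module cached in SD so re-entry skips the import machinery
    if 'np' not in SD:
        import numpy
        SD['np'] = numpy
    np = SD['np']
    return float(np.hypot(x, y))
$$ LANGUAGE plpythonu;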



My use case is simple geometry manipulation functions. It is easier to use
plpython rather than plpgsql thanks to numpy for vector manipulation.
Usually the functions are called inside complex queries with many CTEs, and
execute over 100k rows. Total execution time is in the order of minutes.
(Example query at the end.)

Thanks everybody,
Rémi

Example query:
CREATE TABLE holding_result AS
WITH the_geom AS (
SELECT gid, geom
FROM my_big_table --200k rows
)
SELECT gid, my_python_function(geom) AS result
FROM the_geom;







2014-06-27 4:27 GMT+02:00 Adrian Klaver adrian.kla...@aklaver.com:

 On 06/26/2014 02:14 AM, Rémi Cura wrote:

 Hey,
 thanks for your answer !

 Yep you are right, the function I would like to test are going to be
 called a lot (100k times), so even 15 ms per call matters.


 I got to thinking about this.

 100K over what time frame?

 How is it being called?




 --
 Adrian Klaver
 adrian.kla...@aklaver.com



Re: [GENERAL] python modul pre-import to avoid importing each time

2014-06-26 Thread Rémi Cura
Hey,
thanks for your answer !

Yep, you are right: the functions I would like to test are going to be
called a lot (100k times), so even 15 ms per call matters.

I'm still a bit confused by a topic I found here :
http://stackoverflow.com/questions/15023080/how-are-import-statements-in-plpython-handled
The answer gives a trick to avoid importing each time, so somehow it must
be useful.

On another internet page (I can't find it anymore) somebody mentioned
loading modules at server startup, one way or another, but gave no
details. It seems that the plpy python module gets loaded by default;
wouldn't it be possible to hack this module to add other imports inside it?

I also use PL/R (untrusted I guess), where you can create a special table to
indicate which modules to load at startup.

Cheers,
Rémi-C



2014-06-25 21:46 GMT+02:00 Jeff Janes jeff.ja...@gmail.com:

 On Thu, Jun 19, 2014 at 7:50 AM, Rémi Cura remi.c...@gmail.com wrote:
  Hey List,
 
  I use plpython with postgis and 2 python modules (numpy and shapely).
  Sadly importing such module in the plpython function is very slow
 (several
  hundreds of milliseconds).

 Is that mostly shapely (which I don't have)?  numpy seems to be pretty
 fast, like 16ms.  But that is still slow for what you want, perhaps.

 
  I also don't know if this overhead is applied each time the function is
  called in the same session.

 It is not.  The overhead is once per connection, not once per call.
 So using a connection pooler could be really be a help here.

  Is there a way to pre-import those modules once and for all,
  such that the python function are accelerated?

 I don't think there is.  With plperl you can do this by loading the
 module in plperl.on_init and by putting plperl into
 shared_preload_libraries so that this happens just at server start up.
 But I don't see a way to do something analogous for plpython due to
 lack of plpython.on_init.  I think that is because the infrastructure
 to do that is part of making a trusted version of the language,
 which python doesn't have.  (But it could just be that no one has ever
 gotten around to adding it.)

 Cheers,

 Jeff



[GENERAL] python modul pre-import to avoid importing each time

2014-06-19 Thread Rémi Cura
Hey List,

I use plpython with postgis and 2 python modules (numpy and shapely).
Sadly, importing such modules in the plpython function is very slow (several
hundred milliseconds).

I also don't know if this overhead is applied each time the function is
called in the same session.

Is there a way to pre-import those modules once and for all,
so that the python functions are accelerated?

Thanks,

Cheers,
Rémi-C


Re: [GENERAL] Re: any psql \copy tricks for default-value columns without source data?

2014-05-07 Thread Rémi Cura
Hey,
you may want to have a look at pg_bulkload (
http://pgbulkload.projects.pgfoundry.org/).
Using a filter you could get the behaviour you want.

Another solution is pgloader (http://pgloader.tapoueh.org), but I don't
know if it is as fast as COPY.

Cheers,
Rémi-C


2014-05-06 23:04 GMT+02:00 David G Johnston david.g.johns...@gmail.com:

 On Tue, May 6, 2014 at 4:48 PM, John R Pierce [via PostgreSQL] wrote:

 On 5/6/2014 1:22 PM, David G Johnston wrote:
  I know that I can pre-process the input file and simply add the needed
 data
  but I am curious if maybe there is some trick to having defaults
 populate
  for missing columns WITHOUT explicitly specifying each and every column
 that
  is present?

 if you didn't specify the columns in your file, how would you expect it
 to know whats there and not there?


 ​The default copy behavior is column-order dependent.  If your input file
 has 10 columns and the table has 10 columns they get matched up 1-to-1 and
 everything works just fine.  It would be nice if there was some way to say
 that if the table has 12 columns but the file has 10 columns that the first
 10 columns of the table get matched to the file and the remaining two
 columns use their default values; that way you can add default columns to
 the end of the table and still do an auto-matching import.

 David J.
 ​

 --
 View this message in context: Re: any psql \copy tricks for default-value
 columns without source 
 data?http://postgresql.1045698.n5.nabble.com/any-psql-copy-tricks-for-default-value-columns-without-source-data-tp5802795p5802804.html

 Sent from the PostgreSQL - general mailing list 
 archivehttp://postgresql.1045698.n5.nabble.com/PostgreSQL-general-f1843780.htmlat
  Nabble.com.
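
For reference, the built-in way to let defaults populate is an explicit
column list, which is exactly the typing David would like to avoid (names
are hypothetical; my_table is assumed to have extra columns with DEFAULTs):

\copy my_table (a, b, c) FROM 'input.csv' CSV

The columns of my_table that are not listed simply receive their DEFAULT
values.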



Re: [GENERAL] importing a messy text file

2014-04-30 Thread Rémi Cura
Hey,
with the latest version, 9.3, you can use a COPY FROM with a program as argument.
I had a similar problem (an extra space at the end of each line), so I used
sed in a pipeline to remove it before feeding the data to the database.

You can use the unix | pipe for most of this kind of stuff.

If you have an older version you can use the mkfifo command to trick
postgres into believing it will copy from a file.
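
A minimal sketch of both tricks (table name, file path and the sed
expression are hypothetical; adapt them to your file):

-- 9.3+ : clean the file on the fly
COPY my_table FROM PROGRAM 'sed -e ''s/ $//'' /path/to/messy_file.txt';

-- pre-9.3 : make a named pipe look like a file to COPY
-- (in a shell:  mkfifo /tmp/clean_feed ; sed -e 's/ $//' /path/to/messy_file.txt > /tmp/clean_feed & )
COPY my_table FROM '/tmp/clean_feed';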

Cheers,
Rémi-C


2014-04-30 10:47 GMT+02:00 Willy-Bas Loos willy...@gmail.com:

 Alberto, it would be hard to use sed( s/^I$// ), because there is no
 explicit NULL value and there are many NULLs in the last column.
 So i can't be sure how many tabs should be in the end of each line.

 Yes, Karsten, maybe scripting would be the easiest way to fix this, i
 would then probably insert this line for line. That's a possibility. It
 might take a long time to run, but that's not much of an issue.

 --
 Willy-Bas Loos



Re: [GENERAL] About Large object Storage

2014-04-23 Thread Rémi Cura
Hey,
about your backup:
you must have used a plain-text backup, and
writing ascii to represent binary data is costly (and pointless?).
You can use the compressed custom format instead (pg_dump -Fc).
About the size of your database,
maybe you can try a VACUUM FULL before measuring the size?

Cheers,
Rémi-C


2014-04-23 15:22 GMT+02:00 Kalai R softlinne...@gmail.com:

 Hello,

 I am using postgresql 9.3 in Windows. I take data backup using pg_dump.
 The backup file size is 5 MB only. After that I insert 8 images using large
 object.

 The total size of 8 images are 24 MB. After insert large object only, I
 take data backup using pg_dump. Now the backup file size becomes 246 MB. I
 really shocked. so I check database size using the following query

 SELECT pg_size_pretty(pg_database_size('testdb'));

 The return value is 710 MB.

 What is the problem? How DB size increased extremely?

 Thanks for your Suggestions.










Re: [GENERAL] efficient way to do fuzzy join

2014-04-15 Thread Rémi Cura
A little related bonus :

when doing the time-join,
the next step is to interpolate to have a more accurate estimation :

---
DROP FUNCTION IF EXISTS range_interpolate(nr anyrange, obs anyelement);
CREATE OR REPLACE FUNCTION range_interpolate(nr anyrange, obs anyelement)
RETURNS TABLE(lower_weight NUMERIC, upper_weight NUMERIC)
AS $$
    --@param a range
    --@param an observation (value) of the same type as the range
    --@return the weights to apply to the lower and upper bounds of the range to get the value.

    --exception : -inf or +inf : weight of that bound is 0, the other 1.
    --exception : range = a point : returns a weight of 0.5 for each bound
    --            (they are identical but the associated data may not be)
    SELECT
        CASE
            WHEN upper(nr) = lower(nr) THEN ROW(0.5, 0.5)
            --WHEN obs=lower(nr) AND obs=upper(nr) THEN ARRAY[0.5,0.5]
            --THEN (obs-lower(nr))/ra, (upper(nr)-obs)/ra
            WHEN lower_inf(nr) = TRUE OR lower(nr) IS NULL THEN ROW(0, 1)
            WHEN upper_inf(nr) = TRUE OR upper(nr) IS NULL THEN ROW(1, 0)
            ELSE ROW((upper(nr)-obs)/(upper(nr)-lower(nr)), (obs-lower(nr))/(upper(nr)-lower(nr)))
        END

    --testing :
    --SELECT * FROM range_interpolate(numrange(1,10), round(10,2))
$$
LANGUAGE SQL
IMMUTABLE
RETURNS NULL ON NULL INPUT;
--
Cheers,
Rémi-C


Re: [GENERAL] efficient way to do fuzzy join

2014-04-14 Thread Rémi Cura
2014-04-12 15:04 GMT+02:00 Andy Colson a...@squeakycode.net:

 On 04/12/2014 06:29 AM, Rémi Cura wrote:

 (please note that this random string function is NOT the good way to
 do it, i should random int then use it as index to an array
 containing all the letter)

 Thanks a lot for this new version! It seems to be slower than your
 first solution (no index use I guess, I gave up after 5 minutes vs 5
 sec for the previous). Morevover, I canno't make assumption about a
 fixed interval (2 sec in your example). But I think I see where you
 are going.


 After some tests, the fastest is using BETWEEN and range. (it is way
 faster than using <@, strangely)

 Here is the code :


 Ah, sorry about that.  I got pulled away to work on work stuff.  I was
 trying to figure out how to use an index on the range query, but not sure,
 without adding a new column if it would even work.

 I've never had the need for ranges yet, this is the first time I've gotten
 to play with them.

 I would not have thought about between like that, good call.  I'd have
 never guess it would be so fast.


 If you can't use the fixed interval, then ranges are out.

 I was thinking this could be improved:


 select t,
  (select t from a where a.t = b.t order by a.t limit 1) as mint,
  (select t from a where a.t  b.t order by a.t desc limit 1) as maxt
 from b

 It does two selects into a to find the nearest.  Given this:

 create table a(t float);


 insert into a values (1), (5), (6);

 could you write a single query to find the number nearest 3.5?  If so we
 might cut the work by 50%.

 -Andy

 PS: This list prefers you don't top post.


Hey,
the best I can come up with using your original idea is :
--
--fast-ish: 10 sec
DROP TABLE IF EXISTS t;
CREATE TABLE t AS
SELECT lower_b_a.gid AS gid_b, lower_b_a.t AS t_b --, lower_b_a.data AS data_b
    , lower_b_a.gid_l_b AS gid_a_lower, a1.t AS t_a_lower --, a1.data AS data_a_lower
    , lower_b_a.gid_l_b - 1 AS gid_a_upper, a2.t AS t_a_upper --, a2.data AS data_a_upper
FROM (
    SELECT b.gid, b.t
        , (SELECT gid FROM a WHERE a.t >= b.t ORDER BY a.t ASC LIMIT 1) AS gid_l_b
    FROM b) AS lower_b_a
LEFT OUTER JOIN a AS a1 ON (a1.gid = gid_l_b)
LEFT OUTER JOIN a AS a2 ON (a2.gid = gid_l_b - 1);
---

As you suggested it doesn't read the table twice, but only once (to find
the closest lower value). The closest upper value is found by knowing it is
in the row next to the closest lower value.

Yet it is still slower :-/

The way to go seems to be the numrange.

Thanks a lot for the help in this optimization !

Cheers,

Rémi-C


Re: [GENERAL] efficient way to do fuzzy join

2014-04-12 Thread Rémi Cura
(please note that this random string function is NOT the right way to do it;
I should generate a random int and then use it as an index into an array
containing all the letters)

Thanks a lot for this new version!
It seems to be slower than your first solution (no index use I guess, I
gave up after 5 minutes vs 5 sec for the previous one). Moreover, I can't
make any assumption about a fixed interval (2 sec in your example). But I
think I see where you are going.


After some tests, the fastest is using BETWEEN and range.
(it is way faster than using <@, strangely)

Here is the code :
---

--usefull function to fill with random text
CREATE OR REPLACE FUNCTION rc_random_string(INTEGER )
RETURNS text AS $$
SELECT array_to_string(
ARRAY(
SELECT
substring('ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789' FROM
(random()*36)::int + 1 FOR 1)
FROM generate_series(1,$1)
)
,'')
$$ LANGUAGE sql;


--creating tables
DROP TABLE IF EXISTS a;
DROP TABLE IF EXISTS b;

create table a(gid int, t numeric, r numrange, data text);
create table b(gid int, t numeric, data text);
CREATE INDEX ON a (t);
CREATE INDEX ON b (t);
--CREATE INDEX ON a USING spgist (r);
CREATE INDEX ON a (r);

--filling tables
WITH the_serie AS (
SELECT s AS gid,  s+random()/2-0.5 AS s, rc_random_string(100) aS
data
FROM generate_series(1,10) AS s

)
insert into a (gid, t,r, data) SELECT gid, s, numrange((lag(s,1)
over(order by gid ASC))::numeric  ,s::numeric) , data
FROM the_serie;


WITH the_serie AS (
SELECT  s as gid, s+(random()-0.5)*2 AS s, rc_random_string(100) aS
data
FROM generate_series(1,10) AS s
)
insert into b (gid, t, data) SELECT gid,s, data
FROM the_serie;

ANALYZE a;
ANALYZE b;

--computing join with range

--slow : 80 sec
DROP TABLE IF EXISTS t;
CREATE TABLE t AS
SELECT b.*
FROM b LEFT JOIN a ON (b.t <@ a.r)
ORDER BY gid ASC
LIMIT 30

--slow: 80 sec
DROP TABLE IF EXISTS t;
CREATE TABLE t AS
SELECT b.*
FROM a,b
WHERE b.t <@ a.r

--fast : 3sec
DROP TABLE IF EXISTS t;
CREATE TABLE t AS
SELECT b.* , a.data as d2
FROM a,b
WHERE b.t BETWEEN lower(a.r) AND upper(a.r)

--fast : 8 sec
DROP TABLE IF EXISTS t;
CREATE TABLE t AS
select a.t As a_t, b.t as b_t

from (
  select t, least( least(t, mint), least(t, maxt)) as t2 from (
select t,
 (select t from a where a.t >= b.t order by a.t limit 1) as mint,
 (select t from a where a.t < b.t order by a.t desc limit 1) as maxt
  from b
  ) as tmp
) as tmp2
inner join a on (tmp2.t2 = a.t)
inner join b on (tmp2.t = b.t)
---

Thanks again,
Rémi-C


2014-04-11 20:18 GMT+02:00 Rémi Cura remi.c...@gmail.com:

 Wow many thanks!

 I had thought about the order by and limit because it is the natural way
 to express the problem,
 but I had discarded it for fear of such bad complexity
 (theoretically, for each row of B, compute the distance to every other
 row of A!).

 And it's okay if 2 rows from B share the same joined row from A, because
 when interpolating it will be different.

 Here is the test env with realistic numbers; your solution is very fast, I
 have to raise my hat (4 sec!)
 ---

 --usefull function to fill with random text
 CREATE OR REPLACE FUNCTION rc_random_string(INTEGER )
 RETURNS text AS $$
 SELECT array_to_string(
 ARRAY(
 SELECT
 substring('ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789' FROM
 (random()*36)::int + 1 FOR 1)
 FROM generate_series(1,$1)
 )
 ,'')
 $$ LANGUAGE sql;


 --creating tables
 DROP TABLE IF EXISTS a;
 DROP TABLE IF EXISTS b;

 create table a(t float, data text);
 create table b(t float, data text);
 CREATE INDEX ON a (t);
 CREATE INDEX ON b (t);


 --filling tables
 WITH the_serie AS (
 SELECT  s+random()/2 AS s, rc_random_string(100) aS data
 FROM generate_series(1,10) AS s

 )
 insert into a SELECT s, data
 FROM the_serie;

 WITH the_serie AS (
 SELECT  s+(random()-0.5)*2 AS s, rc_random_string(100) aS data
 FROM generate_series(1,10) AS s

 )
 insert into b SELECT s, data
 FROM the_serie;

 ANALYZE a;
 ANALYZE b;

 --computing result
 DROP TABLE IF EXISTS t;
 CREATE TABLE t AS
 select a.t As a_t, b.t as b_t

 from (
   select t, least( least(t, mint), least(t, maxt)) as t2 from (
 select t,
  (select t from a where a.t >= b.t order by a.t limit 1) as mint,
  (select t from a where a.t < b.t order by a.t desc limit 1) as maxt
   from b
   ) as tmp
 ) as tmp2
 inner join a on (tmp2.t2 = a.t)
 inner join b on (tmp2.t = b.t)

[GENERAL] Re: [postgis-users] design question: how to geocode multiple dynamic city, country?

2014-04-11 Thread Rémi Cura
Hey,
my 2 cents :
If you give write access, beware of SQL injection:
http://en.wikipedia.org/wiki/SQL_injection

Cheers,
Rémi-C



2014-04-10 21:48 GMT+02:00 zach cruise zachc1...@gmail.com:

 i accept multiple city, country from users on-the-fly, and want to
 dynamically map them.

 i could create a table where i insert their multiple entries, and then
 geocode that table for display.

 but i also want to avoid giving write permission to the web user.

 i could create a schema and restrict write to that schema.

 or something better?
 ___
 postgis-users mailing list
 postgis-us...@lists.osgeo.org
 http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users
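
The restricted-schema idea could look like this minimal sketch (role, schema
and table names are hypothetical):

CREATE SCHEMA geocode_input;
CREATE TABLE geocode_input.user_places (
    id      serial PRIMARY KEY,
    city    text,
    country text
);
GRANT USAGE ON SCHEMA geocode_input TO web_user;
GRANT INSERT, SELECT ON geocode_input.user_places TO web_user;
GRANT USAGE ON SEQUENCE geocode_input.user_places_id_seq TO web_user;

The web role can then only write into that one table, and parameterized
queries on the application side still take care of the SQL injection risk.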



[GENERAL] efficient way to do fuzzy join

2014-04-11 Thread Rémi Cura
Hey dear List,

I'm looking for some advice about the best way to perform a fuzzy join,
that is joining two tables based on approximate matching.

It is about temporal matching:
given a table A with rows containing data and a control_time (for instance
1; 5; 6; ... sec, not necessarily rounded or evenly-spaced)

given another table B with rows at no precise timing (eg control_time =
2.3; 5.8; 6.2 for example)

How to join every row of B to A based on
min(@(A.control_time-B.control_time))
(that is, for every row of B, get the row of A that is temporally the
closest),
in an efficient way?
(to be explicit, 2.3 would match to 1, 5.8 to 6, 6.2 to 6)

Optionally, how to get interpolation efficiently (meaning one has to get
the previous time and next time for 1st order interpolation, 2 before and
2 after for 2nd order interpolation, and so on)?
(to be explicit, 5.8 would match to 5 and 6, the weights being 0.2 and 0.8
respectively)


Currently my data is spatial so I use a PostGIS function to interpolate a
point on a line, but it is far from efficient or general, and I don't have
control over the interpolation (only the spatial values are interpolated).


Cheers,
Rémi-C


Re: [GENERAL] efficient way to do fuzzy join

2014-04-11 Thread Rémi Cura
Hey,
thanks for your answer.

I think you are right, range type with index could at least provide a fast
matching,
thus avoiding the numrow(A) * numrow(B) complexity .

Though I don't see how to use it to interpolate for more than 1st order.

Cheers,
Rémi-C


2014-04-11 17:09 GMT+02:00 Andy Colson a...@squeakycode.net:

 On 4/11/2014 7:50 AM, Rémi Cura wrote:

 Hey dear List,

 I'm looking for some advice about the best way to perform a fuzzy
 join, that is joining two table based on approximate matching.

 It is about temporal matching
 given a table A with rows containing data and a control_time (for
 instance 1 ; 5; 6;  .. sec, not necessarly rounded of evenly-spaced)

 given another table B with lines on no precise timing (eg control_time =
 2.3 ; 5.8 ; 6.2 for example)

 How to join every row of B to A based on
 min(@(A.control_time-B.control_time))
 (that is, for every row of B, get the row of A that is temporaly the
 closest),
 in an efficient way?
 (to be explicit, 2.3 would match to 1, 5.8 to 6, 6.2 to 6)

 Optionnaly, how to get interpolation efficiently (meaning one has to get
 the previous time and next time for 1 st order interpolation, 2 before
 and 2 after for 2nd order interpolation, and so on)?
 (to be explicit 5.8 would match to 5 and 6, the weight being 0.2 and 0.8
 respectively)


 Currently my data is spatial so I use Postgis function to interpolate a
 point on a line, but is is far from efficient or general, and I don't
 have control on interpolation (only the spatial values are interpolated).


 Cheers,
 Rémi-C



 Have you seen the range type?

 http://www.postgresql.org/docs/9.3/static/rangetypes.html

 Not fuzzy, but is indexable.

 -Andy



Re: [GENERAL] efficient way to do fuzzy join

2014-04-11 Thread Rémi Cura
Wow many thanks!

I had thought about the order by and limit because it is the natural way to
express the problem,
but I had discarded it for fear of such bad complexity
(theoretically, for each row of B, compute the distance to every other row
of A!).

And it's okay if 2 rows from B share the same joined row from A, because
when interpolating it will be different.

Here is the test env with realistic numbers; your solution is very fast, I
have to raise my hat (4 sec!)
---

--usefull function to fill with random text
CREATE OR REPLACE FUNCTION rc_random_string(INTEGER )
RETURNS text AS $$
SELECT array_to_string(
ARRAY(
SELECT
substring('ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789' FROM
(random()*36)::int + 1 FOR 1)
FROM generate_series(1,$1)
)
,'')
$$ LANGUAGE sql;


--creating tables
DROP TABLE IF EXISTS a;
DROP TABLE IF EXISTS b;
create table a(t float, data text);
create table b(t float, data text);
CREATE INDEX ON a (t);
CREATE INDEX ON b (t);


--filling tables
WITH the_serie AS (
SELECT  s+random()/2 AS s, rc_random_string(100) aS data
FROM generate_series(1,10) AS s

)
insert into a SELECT s, data
FROM the_serie;

WITH the_serie AS (
SELECT  s+(random()-0.5)*2 AS s, rc_random_string(100) aS data
FROM generate_series(1,10) AS s

)
insert into b SELECT s, data
FROM the_serie;

ANALYZE a;
ANALYZE b;

--computing result
DROP TABLE IF EXISTS t;
CREATE TABLE t AS
select a.t As a_t, b.t as b_t
from (
  select t, least( least(t, mint), least(t, maxt)) as t2 from (
select t,
 (select t from a where a.t >= b.t order by a.t limit 1) as mint,
 (select t from a where a.t < b.t order by a.t desc limit 1) as maxt
  from b
  ) as tmp
) as tmp2
inner join a on (tmp2.t2 = a.t)
inner join b on (tmp2.t = b.t)




2014-04-11 19:16 GMT+02:00 Andy Colson a...@squeakycode.net:

 On 4/11/2014 7:50 AM, Rémi Cura wrote:

 Hey dear List,

 I'm looking for some advice about the best way to perform a fuzzy
 join, that is joining two table based on approximate matching.

 It is about temporal matching
 given a table A with rows containing data and a control_time (for
 instance 1 ; 5; 6;  .. sec, not necessarly rounded of evenly-spaced)

 given another table B with lines on no precise timing (eg control_time =
 2.3 ; 5.8 ; 6.2 for example)

 How to join every row of B to A based on
 min(@(A.control_time-B.control_time))
 (that is, for every row of B, get the row of A that is temporaly the
 closest),
 in an efficient way?
 (to be explicit, 2.3 would match to 1, 5.8 to 6, 6.2 to 6)

 Optionnaly, how to get interpolation efficiently (meaning one has to get
 the previous time and next time for 1 st order interpolation, 2 before
 and 2 after for 2nd order interpolation, and so on)?
 (to be explicit 5.8 would match to 5 and 6, the weight being 0.2 and 0.8
 respectively)


 Currently my data is spatial so I use Postgis function to interpolate a
 point on a line, but is is far from efficient or general, and I don't
 have control on interpolation (only the spatial values are interpolated).


 Cheers,
 Rémi-C



 Ok, here is a just-SQL way.  No ranges.  No idea if it's right.  A first
 pass, so to speak.



 create table a(t float, data text);
 create table b(t float, data text);

 insert into a values (1), (5), (6);
 insert into b values (2.3), (5.8), (6.2);


 select a.t, b.t
 from (
   select t, least( least(t, mint), least(t, maxt)) as t2 from (
 select t,
  (select t from a where a.t >= b.t order by a.t limit 1) as mint,
  (select t from a where a.t < b.t order by a.t desc limit 1) as maxt
   from b
   ) as tmp
 ) as tmp2
 inner join a on (tmp2.t2 = a.t)
 inner join b on (tmp2.t = b.t)




 The middle part is the magic:

 select t,
  (select t from a where a.t >= b.t order by a.t limit 1) as mint,
  (select t from a where a.t < b.t order by a.t desc limit 1) as maxt
 from b

 The rest is just to make it usable.  If t is indexed, it'll probably be
 fast too.

 -Andy





Re: [GENERAL] Synchronizing a table that is in two different databases : Need to dump a table as inserts from db1 and change the insert statements into UPDATE statements

2014-03-28 Thread Rémi Cura
Not sure, but maybe

update foo
set br_desc = bar.br_desc
  , br_active = bar.br_active
  -- (rest of columns)
from bar
where foo.br_cde = bar.br_cde;


Anyway it seems to be a terribly bad idea to give those kinds of names!


Cheers,

Rémi-C


Re: [GENERAL] Synchronizing a table that is two different databases : Need to dump a table a insert from db1 and change the insert statements into UPDATE statements

2014-03-27 Thread Rémi Cura
Hey,
it seems to be a very classical problem called
upsert.

You'll find a lot of answers on the web,
see for example
http://www.the-art-of-web.com/sql/upsert/

Cheers,
Rémi-C
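
For the record, a minimal sketch of the classic two-step upsert (ignoring
concurrency; the table name 'branch' is hypothetical, the two columns are
taken from the INSERT quoted below):

UPDATE branch
SET br_desc = 'ELLERINES CENTRAL - 9940'
WHERE br_cde = '9940';

INSERT INTO branch (br_cde, br_desc)
SELECT '9940', 'ELLERINES CENTRAL - 9940'
WHERE NOT EXISTS (SELECT 1 FROM branch WHERE br_cde = '9940');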


2014-03-27 11:16 GMT+01:00 Khangelani Gama kg...@argility.com:

 Hi all



 Synchronizing a *table* that is in two different databases(e.g *db1 and
 db2*).



 Please help me with this. I need to dump a table as INSERTS from db1
 (postgres 9.1.2) and change the INSERT statements into UPDATE statements in
 order to apply that change in *db2(postgres 8.3.0.112)* which has the
 same *table* as *db1*. Where the record does not exist I need to insert
 that record.   There is more than 1000 INSERTS I need to convert to UPDATES
 for the same table. Please help .



 *Example:*



 Below it’s the INSERT from postgres 9.1.2 DATABASE which should convert to
 UPDATE statement.



 INSERT INTO* table* (br_cde, br_desc, br_active, br_closed, grp_cde,
 ctry_cde, state_cde, br_addr_line1, br_addr_line2, br_addr_line3,
 br_addr_line4, br_addr_cde, br_post_addr_line1, br_post_addr_line2,
 br_post_addr_line3, br_post_addr_line4, br_post_addr_cde, br_phone,
 tax_cde, br_server_name, br_lang_ind, bureau_cde, br_risk_typ_cde,
 br_access_ccd, br_access_bureau, br_network_active, br_desc_short,
 br_is_trading, br_is_test, br_is_nomodem, br_is_whs, br_is_merch,
 br_is_cco, br_is_central, br_is_merch_active, br_central_brn, br_merch_brn,
 br_cco_brn, br_cgo_brn, br_cluster_brn, br_contact, br_tollfree, br_fax,
 br_telex, br_email, ops_cde, ops_director, br_cmpy_reg, br_tax_reg,
 audit_id, comp_cde, br_batch, br_release_pilot_type, br_is_order_active,
 bnk_id, br_class_cde, br_erp_xref, br_to_open, br_to_close, br_comments,
 br_is_hire, br_is_voucher, br_is_clear_center, br_is_headoffice,
 br_is_nsp_active, usr_pass_history, br_network_protocol,
 br_gen_ins_central_brn, br_cpi_ins_central_brn, br_rsc_area, br_rsc_perc,
 br_switch_deduc, br_tvlic_dealer_no, br_cco_followup, br_triad_live,
 br_do_cust_locking, br_is_rainbow, br_nca_live, br_ncr_reg_no, prov_cde,
 br_is_data_archive, br_icu_brn, br_living_expenses_live, br_ab_dorder_live,
 br_ab_receipts_active, br_spiv_max_alloc_perc, br_merger_brn,
 br_dep_bnk_acc, br_merchant_number, br_goods_ins_live, br_cgs_connection,
 whs_cde, br_crm_brn, subscription_active, br_prn_doc_default_active,
 br_use_jms, br_sso_active, br_paym8_properties, creditors_supp_no_prefix,
 br_block_debtors_obo, ccms_branch, br_is_ccms_live, ceres_conv_flag,
 cims_branch, br_is_cims_live, br_accept_sql_releases) VALUES ('9940',
 'ELLERINES CENTRAL - 9940', true, NULL, '1', 'BAF', 'BAF', '209 SMIT
 STREET', 'BRA', 'Jgf', 'Jrg', '3025', '209 SMIT STREET', 'BRA', 'Jrg',
 NULL, '2017', '(4562) 712 1300', '1', 'dfgike.com', 'ENG', 'ITC', 'L',
 false, false, false, 'BATCH - 9940', false, false, false, false, false,
 false, true, false, '9940', NULL, NULL, '10.52.1.31', '9940', 'DOUG', NULL,
 '(4562) 712 1300' ‘, NULL, NULL, '001', NULL, '1969/02687/07', NULL, 0,
 NULL, '9940', NULL, false, NULL, NULL, NULL, NULL, NULL, NULL, false,
 false, false, false, false, 30, 'WS2~WS2', '1002', '1002', NULL, NULL,
 false, NULL, NULL, false, true, false, true, 'NC', NULL, true, NULL, true,
 true, NULL, NULL, NULL, NULL, NULL, true, 'oracle_live', NULL, NULL, true,
 NULL, true, false, NULL, NULL, NULL, NULL, NULL, NULL, NULL, false, false);








 CONFIDENTIALITY NOTICE
 The contents of and attachments to this e-mail are intended for the addressee 
 only, and may contain the confidential
 information of Argility (Proprietary) Limited and/or its subsidiaries. Any 
 review, use or dissemination thereof by anyone
 other than the intended addressee is prohibited.If you are not the intended 
 addressee please notify the writer immediately
 and destroy the e-mail. Argility (Proprietary) Limited and its subsidiaries 
 distance themselves from and accept no liability
 for unauthorised use of their e-mail facilities or e-mails sent other than 
 strictly for business purposes.





Re: [postgis-users] [GENERAL] postgis in postgresql apt and upgrades

2014-02-20 Thread Rémi Cura
Hey,
could you please explain what you mean by "Congrats on the availability of
postgis in the postgresql apt repository"?
It would be really great,
but I can't find a postgis package in
http://apt.postgresql.org/pub/repos/apt/dists/

Thanks,
Rémi-C


2014-02-20 16:45 GMT+01:00 Willy-Bas Loos willy...@gmail.com:

 Hi,

 Congrats on the availability of postgis in the postgresql apt repository,
 this is great work!

 I have one question about versions and upgrade sof postgis.
 Since the package names are equal to those in debian and ubuntu
 (postgresql-x.x-postgis), how will new versions of postgis and upgrades be
 handled?

 Cheers,

 WBL

 --
 Quality comes from focus and clarity of purpose -- Mark Shuttleworth

 ___
 postgis-users mailing list
 postgis-us...@lists.osgeo.org
 http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users



Re: [postgis-users] [GENERAL] postgis in postgresql apt and upgrades

2014-02-20 Thread Rémi Cura
So cool ^^

That is going to simplify installs and upgrades a lot!

Cheers,

Rémi-C


2014-02-20 17:47 GMT+01:00 Willy-Bas Loos willy...@gmail.com:

 On Thu, Feb 20, 2014 at 4:45 PM, Willy-Bas Loos willy...@gmail.comwrote:

 Since the package names are equal to those in debian and ubuntu
 (postgresql-x.x-postgis), how will new versions of postgis and upgrades be
 handled?


 please excuse me, i was being silly.
 the package names are actually: postgresql-x.x-postgis-y.y
 where x.x is the postgres major version and y.y is the postgis major
 version.

 That is just beautiful!
 thank you.

 Cheers,

 WBL

 --
 Quality comes from focus and clarity of purpose -- Mark Shuttleworth

 ___
 postgis-users mailing list
 postgis-us...@lists.osgeo.org
 http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users



Re: [GENERAL] Toast and slice of toast

2014-02-17 Thread Rémi Cura
Thanks everybody !

Cheers,

Rémi-C


2014-02-17 10:37 GMT+01:00 Andres Freund and...@2ndquadrant.com:

 On 2014-02-17 14:16:33 +1100, Haribabu Kommi wrote:
  On Sun, Feb 16, 2014 at 9:38 PM, Rémi Cura remi.c...@gmail.com wrote:
 
   Hey Dear List,
   could somebody point me to some ressources about getting only parts of
   toasted data?
  
   I have a very big custom type and I would like to take blocks of it
 (like
   byte A to B then byte C to D  then... ).
  
   I found a function in
 http://doxygen.postgresql.org/tuptoaster_8c.html#calledtoast_fetch_datum_slice,
  is it the right way to use it
   (a for loop and calling it several time?).
  
 
  pg_detoast_datum_slice is the function which will solve your problem.

 Note that you need to prevent your type/column from being compressed for
 that being effective. Check the storage options for CREATE TABLE et al.

 Greetings,

 Andres Freund

 --
  Andres Freund http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training  Services



[GENERAL] Toast and slice of toast

2014-02-16 Thread Rémi Cura
Hey Dear List,
could somebody point me to some resources about getting only parts of
toasted data?

I have a very big custom type and I would like to take blocks of it (like
byte A to B, then byte C to D, then...).

I found a function, toast_fetch_datum_slice, in
http://doxygen.postgresql.org/tuptoaster_8c.html
Is it the right way to use it
(a for loop, calling it several times?).

Thanks,

Cheers

Rémi-C


Re: [GENERAL] function with different return type depending on parameter?

2014-02-12 Thread Rémi Cura
Hey,
see
http://www.postgresql.org/docs/9.3/static/plpgsql-declarations.html
and look at anyelement.

Cheers
Rémi-C
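
A minimal sketch of how the return type follows the parameter types with
anyelement (the function is hypothetical):

CREATE OR REPLACE FUNCTION first_non_null(a anyelement, b anyelement)
RETURNS anyelement AS $$
BEGIN
    -- the concrete return type is resolved from the argument types at call time
    RETURN COALESCE(a, b);
END;
$$ LANGUAGE plpgsql IMMUTABLE;

-- first_non_null(NULL::int, 2)      returns an integer
-- first_non_null(NULL::text, 'x')   returns a text

Note that the type can only follow the argument types; a function cannot
decide at run time to return a completely unrelated type, which is why there
is no variant type.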


2014-02-12 10:20 GMT+01:00 James Harper james.har...@bendigoit.com.au:

 is it possible to have a function that can return a different type
 depending on the parameters? Eg (approximately)

 if param = one then return 1
 if param = two then return 2
 if param = three then return 3.0
 etc

 I can't see any variant type

 thanks

 James






 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general



Re: [GENERAL] Ordering Results by a Supplied Order

2014-02-06 Thread Rémi Cura
Hey,
 I don't understand the difference between this ORDINALITY option and
adding a row_number() over () in the SELECT.

Thanks,

Cheers,
Remi-C


2014-02-06 Vik Fearing vik.fear...@dalibo.com:

  On 02/06/2014 04:16 AM, Michael Sacket wrote:

 Often times I find it necessary to work with table rows in a specific,
 generally user-supplied order.  It could be anything really that requires
 an ordering that can't come from a natural column.  Most of the time this
 involved manipulating a position column from the client application.  In
 any case, I've often found that to be cumbersome, but I think I've come up
 with a solution that some of you may find useful.


 Up until 9.4, that's a good way to do it.

 Starting from 9.4, you can use the WITH ORDINALITY feature.
 http://www.postgresql.org/docs/devel/static/sql-select.html

 --
 Vik
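
For context, a minimal example of the 9.4 feature Vik refers to (the array
literal is just an illustration):

SELECT u.val, u.ord
FROM unnest(ARRAY['c','a','b']) WITH ORDINALITY AS u(val, ord)
ORDER BY u.ord;

The ordinality column reflects the order in which the function produced the
rows, which is exactly what row_number() OVER () cannot guarantee.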




Re: [GENERAL] Hard upgrade (everything)

2014-02-06 Thread Rémi Cura
On my private computer I first upgraded postgres to 9.3, then upgraded
postgis.
Sadly, according to
http://trac.osgeo.org/postgis/wiki/UsersWikiPostgreSQLPostGIS ,
postgis 1.5 is not compatible with postgres 9.3.
However PostGIS 2.1 is compatible with your current postgres version.
So as suggested you can upgrade postgis (see hard/soft upgrade), then
upgrade postgres.
By the way postgis is very easy to compile on ubuntu (use the package system
to get dependencies, then simply sudo ./autogen.sh, sudo ./configure, make,
sudo make install)
Cheers,
Remi-C



2014-02-06 alexandros_e alexandros...@gmail.com:

 I would use normal pg_dump and pg_restore for the DBs and not
 utils/postgis_restore.pl. Also, AFTER I backup all databases and
 everything
 else, you could try to upgrade Postgis without upgrading PostgreSQL by
 buliding from source e.g.
 http://trac.osgeo.org/postgis/wiki/UsersWikiPostGIS21Ubuntu1304src. There
 it
 says that Postgis 2.1 could be used with PostgreSQL 9.1. This will give you
 access to Postgis 2.1 features without reinstalling everything.

 Of course normally I would not upgrade if this is an 1-2 years project,
 unless I 100% need Postgis 2.1 features not present in 1.5.





 --
 View this message in context:
 http://postgresql.1045698.n5.nabble.com/Hard-upgrade-everything-tp5790801p5790811.html
 Sent from the PostgreSQL - general mailing list archive at Nabble.com.


 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general



Re: [GENERAL] Ordering Results by a Supplied Order

2014-02-06 Thread Rémi Cura
Ok, thanks ^^

Cheers,
Rémi-C


2014-02-06 Vik Fearing vik.fear...@dalibo.com:

 On 02/06/2014 10:00 AM, Rémi Cura wrote:
  Hey,
   I dont understand the difference between this ORDINALITY option and
  adding a row_number() over() in the SELECT.

 WITH ORDINALITY will give you something to order by.  You should never
 do row_number() over () because that will give you potentially random
 results.

 --
 Vik




Re: [GENERAL] reading array[text] in C extension function

2014-01-16 Thread Rémi Cura
I'll auto-answer ;-)

Based on the function btoptions from the postgres source, which takes a
text[] as its first arg:

ArrayType  *array;
Datum      *dimdatums;
int         ndim;
int         i;

array = DatumGetArrayTypeP(dimensions);
Assert(ARR_ELEMTYPE(array) == TEXTOID);
pcinfo("after assert \n");
//split the text[] into one Datum per element
deconstruct_array(array, TEXTOID, -1, false, 'i',
                  &dimdatums, NULL, &ndim);

//construct the array to hold the result
char **final_dimension_array = (char **) pcalloc(ndim * sizeof(char *));
for (i = 0; i < ndim; i++)
{
    text   *dimensiontext = DatumGetTextP(dimdatums[i]);
    char   *text_str = VARDATA(dimensiontext);           //raw bytes, not null-terminated
    int     text_len = VARSIZE(dimensiontext) - VARHDRSZ;
    char   *s;

    s = TextDatumGetCString(dimdatums[i]);                //palloc'd null-terminated copy
    final_dimension_array[i] = s;
}
//pcinfo("end of the text retrieval\n");

Cheers,

Rémi-C



2014/1/15 Rémi Cura remi.c...@gmail.com

 Hey,
 I'm trying to use an array of text as input in a C function in a custom
 extension.

 the prototype of the sql function is  :
 CREATE OR REPLACE FUNCTION pc_subset(  dimensions TEXT[])


 it is called like :
 pc_subset(  ARRAY['X'::text,'Y'::text])

 and the C function trying to read the text array (converting it to cstring
 ) :

 text ** vals;
 char ** cstring_array;

 nelems = ARR_DIMS(arrptr)[0];
 vals = (text**) ARR_DATA_PTR(arrptr);
 cstring_array = (char **) pcalloc(nelems * sizeof(char *));
 for (i3 = 0; i3 < nelems; i3++)
 {
     cstring_array[i3] = text_to_cstring(vals[i3]);
     elog(INFO, "elem %d of dim_array : %s\n", i3, cstring_array[i3]);
 }

 I crashes postgres because of a segfault.


 Any help appreciated =)

 Cheers,
 Rémi-C



[GENERAL] expert : SRF returning double[]

2014-01-16 Thread Rémi Cura
Hey list,

another tricky C function interface problem :

How do you write a set returning function that returns an array for each row?

It seems like the main function SRF_RETURN_NEXT takes a Datum, and so I can't
use PG_RETURN_ARRAYTYPE_P().

Shall I encapsulate the array into a composite field (only 1 field: the
double array)?

I looked a lot for an example, without success.

Help greatly appreciated,

thanks,

Cheers,
Rémi-C


Re: [GENERAL] expert : SRF returning double[]

2014-01-16 Thread Rémi Cura
another auto-answer :

Surprisingly,

result = construct_array(...);
SRF_RETURN_NEXT(funcctx, PointerGetDatum(result));

works. But the Datum memory must be allocated.

Cheers,

Rémi-C


2014/1/16 Rémi Cura remi.c...@gmail.com

 Hey list,

 another tricky C function interface problem :

 How to write a set returning function, that returns for each row an array?

 it seems like the main function SRF_RETURN_NEXT takes Datum and so I can't
 use PG_RETURN_ARRAYTYPE_P().

 Shall I encapsulate the array into a composite field (only 1 field : the
 double array)

 I looked a lot for an example wihtout success.

 Help greatly appreciated,

 thanks,

 Cheers,
 Rémi-C



[GENERAL] reading array[text] in C extension function

2014-01-15 Thread Rémi Cura
Hey,
I'm trying to use an array of text as input in a C function in a custom
extension.

the prototype of the sql function is  :
CREATE OR REPLACE FUNCTION pc_subset(  dimensions TEXT[])


it is called like :
pc_subset(  ARRAY['X'::text,'Y'::text])

and the C function trying to read the text array (converting it to cstring
) :

text ** vals;
char ** cstring_array;

nelems = ARR_DIMS(arrptr)[0];
vals = (text**) ARR_DATA_PTR(arrptr);
cstring_array = (char **) pcalloc(nelems * sizeof(char *));
for (i3 = 0; i3 < nelems; i3++)
{
    cstring_array[i3] = text_to_cstring(vals[i3]);
    elog(INFO, "elem %d of dim_array : %s\n", i3, cstring_array[i3]);
}

It crashes postgres because of a segfault.


Any help appreciated =)

Cheers,
Rémi-C


Re: [GENERAL] excution time for plpgsql function and subfunction

2014-01-13 Thread Rémi Cura
Thanks all =)

Cheers,

Rémi-C


2014/1/13 Guillaume Lelarge guilla...@lelarge.info

 On Fri, 2014-01-10 at 10:10 +0100, Rémi Cura wrote:
  Hey List,
  kind of a simple question :
 
  I'm using the postgis_topology extension,
  and I'm trying to figure where the slowness comes from when importing
 data.
 
  It involves plpgsql function calling other plpgsql functions, insert,
  update, etc etc.
 
  I know I can use explain analyze for one querry, but I don't know how to
  get details of how much time takes each sub-function called by a main
  function.
 
  Thus it is very difficult to guess where is the bottleneck.
 

 Shameless plug, but here is how I do it:

 http://blog.guillaume.lelarge.info/index.php/post/2012/03/31/Profiling-PL/pgsql-functions


 --
 Guillaume
 http://blog.guillaume.lelarge.info
 http://www.dalibo.com




[GENERAL] excution time for plpgsql function and subfunction

2014-01-10 Thread Rémi Cura
Hey List,
kind of a simple question :

I'm using the postgis_topology extension,
and I'm trying to figure out where the slowness comes from when importing data.

It involves plpgsql functions calling other plpgsql functions, inserts,
updates, etc.

I know I can use explain analyze for one query, but I don't know how to
get details of how much time each sub-function called by a main function
takes.

Thus it is very difficult to guess where the bottleneck is.

Thanks, cheers,

Rémi-C
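
One low-tech option is manual timing with clock_timestamp() around each call
(a minimal sketch; my_sub_function is hypothetical):

CREATE OR REPLACE FUNCTION timed_wrapper() RETURNS void AS $$
DECLARE
    t0 timestamptz;
BEGIN
    t0 := clock_timestamp();
    PERFORM my_sub_function();   -- the call you want to measure
    RAISE NOTICE 'my_sub_function took %', clock_timestamp() - t0;
END;
$$ LANGUAGE plpgsql;

The profiler Guillaume links to in the reply above gives this kind of
information without editing every function.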


Re: [GENERAL] Grouping, Aggregate, Min, Max

2013-12-13 Thread Rémi Cura
There is a trick to simplify the thing and avoid using aggregates :
I think it will give you your answer.

http://postgresql.1045698.n5.nabble.com/Count-of-records-in-a-row-td5775363i20.html

Cheers,
Rémi-C
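
For what it's worth, the classic gaps-and-islands pattern gives the expected
results; a minimal sketch, assuming a table t(id, thing_id, category,
period_id):

SELECT thing_id, category,
       string_agg(period_span, ', ' ORDER BY grp) AS periods
FROM (
    SELECT thing_id, category, grp,
           min(period_id) || '-' || max(period_id) AS period_span
    FROM (
        SELECT thing_id, category, period_id,
               period_id - row_number() OVER (PARTITION BY thing_id, category
                                              ORDER BY period_id) AS grp
        FROM t
    ) AS marked
    GROUP BY thing_id, category, grp
) AS spans
GROUP BY thing_id, category;

For scenario 2 the PARTITION BY also separates category B from A, so the
spans come out as 1-4, 8-9 / 5-7 / 1-4.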


2013/12/13 Misa Simic misa.si...@gmail.com

 Hi All,

 I am not sure how to define with words what I want to accomplish (so can't
 ask google the right question :) )

 So will try to explain with sample data and expected result:

 Scenario 1)

   rows (id, thing_id, category, period_id):
   (1,1,A,1), (2,1,A,2), (3,1,A,3), (4,1,A,4), (5,1,A,5), (6,1,A,6), (7,1,A,7),
   (8,1,A,8), (9,1,A,9), (10,2,A,1), (11,2,A,2), (12,2,A,3), (13,2,A,4)

  Expected result:
   thing_id 1, category A, periods 1-9
   thing_id 2, category A, periods 1-4

  (Sounds easy: group by thing_id, category and use min and max of period_id -
  but the further scenarios make it a bit more complicated...)

  Scenario 2)

   rows (id, thing_id, category, period_id):
   (1,1,A,1), (2,1,A,2), (3,1,A,3), (4,1,A,4), (5,1,B,5), (6,1,B,6), (7,1,B,7),
   (8,1,A,8), (9,1,A,9), (10,2,A,1), (11,2,A,2), (12,2,A,3), (13,2,A,4)

  Expected result:
   thing_id 1, category A, periods 1-4, 8-9
   thing_id 1, category B, periods 5-7
   thing_id 2, category A, periods 1-4

  Scenario 3)

   rows (id, thing_id, category, period_id):
   (1,1,A,1), (2,1,A,2), (3,1,A,3), (4,1,A,7), (5,1,A,8), (6,1,A,9),
   (7,2,A,1), (8,2,A,2), (9,2,A,3), (10,2,A,4)

  Expected result:
   thing_id 1, category A, periods 1-3, 7-9
   thing_id 2, category A, periods 1-4


 So the goal is to group by thing_id, category - but if period_id is
 interrupted (not incremented by 1), to have aggregated spans...

 To get the desired results we have come up with several CTEs (which makes
 the query a bit big and rather procedural: one CTE that calculates the diff
 between the current and previous row, the next CTE uses the previous one to
 define the groupings, the next CTE makes the aggregates, etc...)

 So I wonder - is there some kind of aggregate window function that produces
 the desired results?


 Many Thanks,

 Misa




Re: [GENERAL] Set returning aggregate?

2013-12-09 Thread Rémi Cura
Hello,
could it then be possible to define a custom CTE as the result of a
function?
Something like:

WITH first_cte AS (
    SELECT blabla)
, second_cte AS (
    SELECT * FROM a_function_returning_a_set(parameters)
)
SELECT blabla

A CTE is much like what you would like, bborie:
you can reference previously defined CTEs, and you output a set of rows.

Cheers,
Rémi-C


2013/12/8 Pavel Stehule pavel.steh...@gmail.com

 Hello


 2013/12/8 Bborie Park dustym...@gmail.com

 I'm wondering if an aggregate function can return a set of records?


 No, final function cannot returns set. It is disallowed.

 Theoretically, it should be possible - it is explicitly prohibited. But if
 it will be allowed, you can get same problems like using SRF function in
 target list.

 postgres=# select generate_series(1,2),generate_series(1,2);
  generate_series │ generate_series
 ─┼─
1 │   1
2 │   2
 (2 rows)

 Time: 49.332 ms
 postgres=# select generate_series(1,2),generate_series(1,3);
  generate_series │ generate_series
 ─┼─
1 │   1
2 │   2
1 │   3
2 │   1
1 │   2
2 │   3
 (6 rows)

 Time: 0.445 ms

 It will be hard defined a expected behaviour when somebody use more these
 aggregates in same query and returns different number of rows.


 Regards

 Pavel




 Say I have a table with a column of type raster (PostGIS). I want to get
 the number of times the pixel values 1, 3 and 4 occur in that raster
 column. I am hoping to build an aggregrate function that returns the
 following...

 value | count
 +
 1   | 12
 +
 2   | 12
 +
 3   | 12

 Is it possible for an aggregate function to return a set? I've written
 some test cases and it looks like the answer is No but I'd like
 confirmation.

 Thanks,
 Bborie Park

 PostGIS Steering Committee





Re: [GENERAL] Similarity search for sentences

2013-12-05 Thread Rémi Cura
Maybe a totally bad idea:
explode your sentences into (sentence_number, one_word), n times (this makes
a big table, you may want to partition it),
then put a classic index on sentence_number, and one on one_word (btree if
you do = comparisons, something more subtle if you do LIKE 'word').

Depending on the performance, it could be worth it to regroup by words:
(sentence_number[], one_word).
Then you could try an array or hstore index on sentence_number[]?

Cheers,

Rémi-C
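
A minimal sketch of the exploding step (table and column names are
hypothetical):

CREATE TABLE sentence_words AS
SELECT s.id AS sentence_number,
       w.word AS one_word
FROM sentences AS s,
     LATERAL regexp_split_to_table(lower(s.sentence), '\s+') AS w(word);

CREATE INDEX ON sentence_words (one_word);
CREATE INDEX ON sentence_words (sentence_number);

Searching then becomes joining the words of the query sentence against
sentence_words and ranking sentence_number by the number of matching words.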


2013/12/5 Janek Sendrowski jane...@web.de

 Hi,

 I have tables with millions of sentences. Each row contains a sentence. It
 is natural language and every language is possible, but the sentences of
 one table have the same language.
 I have to do a similarity search on them. It has to be very fast,
 because I have to search for a few hundert sentences many times.
 The search shouldn't be context-based. It should just get sentences with
 similar words(maybe stemmed).

 I already had a try with gist/gin-index-based trigramm search (pg_trgm
 extension), fulltextsearch (tsearch2 extension) and a pivot-based indexing
 (Fixed Query Array), but it's all to slow or not suitable.
 Soundex and Metaphone aren't suitable, as well.

 I'm already working on this project since a long time, but without any
 success.
 Do any of you have an idea?

 I would be very thankful for help.

 Janek Sendrowski


 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general



Re: [GENERAL] tracking scripts...

2013-11-27 Thread Rémi Cura
First, a serious answer:
you don't have to use the command line,
you can use the pgadmin gui, load your file with all the commands, and
then hit F6 (or select run as pgscript).
This will wrap each command in a transaction, and will print messages all
along.
Please test this on a few lines before trying it on everything.

Second:
lol for the secret message.
There is a very famous one like this in French, in private letters by
George Sand, a famous writer.

The text is very high-level and nice French, but if you read one line and
skip the next one,
the message is very dirty!

It is of course also very hard to translate ...

Cheers,
Rémi-C

Cher ami,
Je suis toute émue de vous dire que j'ai
bien compris l'autre jour que vous aviez
toujours une envie folle de me faire
danser. Je garde le souvenir de votre
baiser et je voudrais bien que ce soit
une preuve que je puisse être aimée
par vous. Je suis prête à montrer mon
affection toute désintéressée et sans cal-
cul, et si vous voulez me voir ainsi
vous dévoiler, sans artifice, mon âme
toute nue, daignez me faire visite,
nous causerons et en amis franchement
je vous prouverai que je suis la femme
sincère, capable de vous offrir l'affection
la plus profonde, comme la plus étroite
amitié, en un mot : la meilleure épouse
dont vous puissiez rêver. Puisque votre
âme est libre, pensez que l'abandon ou je
vis est bien long, bien dur et souvent bien
insupportable. Mon chagrin est trop
gros. Accourrez bien vite et venez me le
faire oublier. À vous je veux me sou-
mettre entièrement.
Votre poupée


2013/11/26 Merlin Moncure mmonc...@gmail.com

 On Tue, Nov 26, 2013 at 3:20 PM, Raymond O'Donnell r...@iol.ie wrote:
  On 26/11/2013 20:30, Merlin Moncure wrote:
  There are not many ways to Hand off information outside of the
  database while a transaction Is running. one way Is to write a Simple
  trigger in plpgsql that 'raise'es A notice every 'n' times trigger
  condition fires.  that'S Essentially the only Clean way to do it in
  such a way that the information is Returned to the Executing console.
  Thanks!
 
  Totally unrelated to the thread I noticed that the capitalised
  letters in the email above spell out this:
 
   THIISASECRET
 
  .. which (almost) spells This is a secret. Was this intentional, or am
  I just working too hard? :-)

 Well, bad spelling on my part.  To get the joke, you have to be A.
 observant, B. be using a gmail account, and C. be a comic book geek
 that grew up in the 80's.

 merlin


 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general



Re: [GENERAL] tracking scripts...

2013-11-27 Thread Rémi Cura
I'm not an expert,
but I would think that if you can avoid using only one transaction, it
would be way, way faster!

The system could then simply skip keeping a log to be ready to roll back a
1 billion row update!

Of course it would be preferable to use psql to execute statement by
statement as separate transactions, and do it with X parallel psql sessions
(splitting the big text file into X parts), yet Joey seemed reluctant to
use the console =)


Cheers,
Rémi-C


2013/11/27 Albe Laurenz laurenz.a...@wien.gv.at

 John R Pierce wrote:
  On 11/26/2013 9:24 AM, Joey Quinn wrote:
  When I ran that command (select * from pg_stat_activity), it returned
  the first six lines of the scripts. I'm fairly sure it has gotten a
  bit beyond that (been running over 24 hours now, and the size has
  increased about 300 GB). Am I missing something for it to tell me what
  the last line processed was?
 
  that means your GUI lobbed the entire file at postgres in a single
  PQexec call, so its all being executed as a single statement.
 
  psql -f filename.sql dbname   would have processed the queries one at
  a time.

 Yes, but that would slow down processing considerably, which would
 not help in this case.

 I'd opt for
 psql -1 -f filename.sql dbname
 so it all runs in a single transaction.

 Yours,
 Laurenz Albe

 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general



Re: [GENERAL] tracking scripts...

2013-11-27 Thread Rémi Cura
Sorry,
if you cancel, everything will be rolled back
(it is actually what makes databases so powerful).

Unless it finishes, I don't know of a way to keep the changes.

At least on my computer (I don't know if you can generalize this), it is
way faster to split into many transactions, so you would gain time.
Using pgscript will make you lose some time, but it won't be much if each
query is long enough (some seconds at least).
If you intend to do this often, you may want to consider multiple parallel
psql sessions.

Cheers,

Rémi-C



2013/11/27 Joey Quinn bjquinn...@gmail.com

 Wow, thank-you (sometimes the answer is right there in front of you...
 very new to Postgres, had wondered what the difference was between the run
 query and run as PGS script, but hadn't looked into it yet).

 So, here's the critical question(s) right now (for me)...

 With the way I launched it, using the Execute query button, if I now hit
 the Cancel query button, what happens? Have the last two days of updates
 already been committed? Or will they get rolled back? I would love to
 switch to the other method, so that I can gauge progress, but would hate to
 lose two days worth of run time...

  If I do run the same script (a bit over 100k lines) in PGS mode, will
 that affect the speed? If so, how much? 1%? 5%? More?




 On Wed, Nov 27, 2013 at 3:20 AM, Rémi Cura remi.c...@gmail.com wrote:

 First serious answer :
 you don't have to use command line,
 you can use the pgadmin gui, loading your file with all the command, and
 then hit F6 (or select run as pgscript).
 This will wrapp each command in a transaction , and will print messages
 all along.
 Please test this on a few line before trying on everything.

 Second :
 lol for the secret message.
 There is a very famous one like this in french, in private letters by
 George Sand, a famous writter.

 The text is very high level and nice french, but if you read one line
 then skip the next ..,
 the message is very dirty !

 It is of course also very hard to translate ...

 Cheers,
 Rémi-C

 Cher ami,
 Je suis toute émue de vous dire que j'ai
 bien compris l'autre jour que vous aviez
 toujours une envie folle de me faire
 danser. Je garde le souvenir de votre
 baiser et je voudrais bien que ce soit
 une preuve que je puisse être aimée
 par vous. Je suis prête à montrer mon
 affection toute désintéressée et sans cal-
 cul, et si vous voulez me voir ainsi
 vous dévoiler, sans artifice, mon âme
 toute nue, daignez me faire visite,
 nous causerons et en amis franchement
 je vous prouverai que je suis la femme
 sincère, capable de vous offrir l'affection
 la plus profonde, comme la plus étroite
 amitié, en un mot : la meilleure épouse
 dont vous puissiez rêver. Puisque votre
 âme est libre, pensez que l'abandon ou je
 vis est bien long, bien dur et souvent bien
 insupportable. Mon chagrin est trop
 gros. Accourrez bien vite et venez me le
 faire oublier. À vous je veux me sou-
 mettre entièrement.
 Votre poupée


 2013/11/26 Merlin Moncure mmonc...@gmail.com

  On Tue, Nov 26, 2013 at 3:20 PM, Raymond O'Donnell r...@iol.ie wrote:
  On 26/11/2013 20:30, Merlin Moncure wrote:
  There are not many ways to Hand off information outside of the
  database while a transaction Is running. one way Is to write a Simple
  trigger in plpgsql that 'raise'es A notice every 'n' times trigger
  condition fires.  that'S Essentially the only Clean way to do it in
  such a way that the information is Returned to the Executing console.
  Thanks!
 
  Totally unrelated to the thread I noticed that the capitalised
  letters in the email above spell out this:
 
   THIISASECRET
 
  .. which (almost) spells This is a secret. Was this intentional, or
 am
  I just working too hard? :-)

 Well, bad spelling on my part.  To get the joke, you have to be A.
 observant, B. be using a gmail account, and C. be a comic book geek
 that grew up in the 80's.

 merlin


 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general






[GENERAL] Autodocumenting plpgsql function

2013-11-26 Thread Rémi Cura
Hey all,
does somebody know of a way to auto-document plpgsql functions, in a doxygen
style
(adding tags in comments for instance, or creating doc templates to fill)?

It would really help to write the doc and maintain it.

Thanks,

Rémi-C


Re: [GENERAL] Autodocumenting plpgsql function

2013-11-26 Thread Rémi Cura
Hey thanks for the answer.

I'm thinking of way more than that.
Typically, in your comments you include special tags, like @input, and then
doxygen will parse them and generate HTML documentation.

Cheers,

Rémi-C
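
A minimal sketch of what I mean (the tag convention is hypothetical; nothing
parses this out of the box):

CREATE OR REPLACE FUNCTION add_one(i integer)
RETURNS integer AS $$
/**
 * @brief  Adds one to the input.
 * @param  i  the integer to increment
 * @return i + 1
 */
BEGIN
    RETURN i + 1;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

The idea would be to run a doxygen-like tool over the function sources (for
instance over a schema-only dump) and get HTML out of these tags.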


2013/11/26 Albe Laurenz laurenz.a...@wien.gv.at

 Rémi Cura wrote:
  somebody knows of a way to autodocument plpgsql function, in a docxygen
 style
  (adding tags in comments for instance, or creating doc templates to
 fill).
 
  It would really help to write the doc and maintain it.

 I am not sure what you need, but I see two ways to
 document a function:

 1) With /** .. */ comments in the beginning.
Maybe doxygen can be used to parse a database dump.

 2) With COMMENT ON FUNCTION ... IS '...';
That also keeps the documentation close to where
the code is, and it shows up in database dumps.

 Yours,
 Laurenz Albe



Re: [GENERAL] tracking scripts...

2013-11-26 Thread Rémi Cura
Now it's too late,
but maybe you could avoid using a single transaction (and instead use
127k transactions).

Then at the end of every transaction you could print something in the gui
(print for pgscript, raise for plpgsql) or execute a command to write to a
file (COPY for instance).
It would also be in the log, but not as clearly.

Cheers,

Rémi-C


2013/11/26 Joey Quinn bjquinn...@gmail.com

 I have a fairly large table (4.3 billion rows) that I am running an update
 script on (a bit over 127 thousand individual update queries). I am using
 the gui. It has been running for about 24 hours now. Is there any good way
 to gauge progress (as in, how many of the individual update queries have
 finished)?





Re: [GENERAL] psql variable interpolation with subsequent underscore

2013-11-08 Thread Rémi Cura
maybe a stupid answer,
but why not use another language (plpgsql? python?).
Is it because of transaction issues?
Sorry for not being more helpful.
Cheers,
Rémi-C
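
If another language is acceptable, a minimal sketch in plpgsql (the table
name and suffix are taken from Tim's example; the DO block itself is
hypothetical):

DO $$
BEGIN
    -- format() builds the identifier, so psql never has to parse :MM_raw
    EXECUTE format('ALTER TABLE my_table RENAME TO my_table_%s_raw', '201310');
END
$$;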


2013/11/8 Tim Kane tim.k...@gmail.com

 Hi all,

 I’m having difficulty using variable interpolation within psql, where that
 variable is within a table name…


 Like so..

 =# set MM 201310
 =# select :MM;
  ?column?
 --
201309
 (1 row)


 =# alter table my_table rename to my_table_:MM_raw;
 ERROR:  syntax error at or near :
 LINE 1: …my_table rename to my_table_:MM_ra...
  ^

 The problem is that psql tries to interpret ‘MM_raw’ as the variable
 name, but my intention is to only interpret ‘MM’ followed by a literal
 underscore.

 I can’t find any other way to encapsulate the variable name in this way…
  Is there a trick to it?

 I suspect I’ll need to work around this by altering the naming convention
 such that the MM is at the ‘end’ of the table name.  Maybe..   Thoughts?


 Tim






Re: [GENERAL] Connection pooling

2013-10-31 Thread Rémi Cura
Isn't it a client problem?

It should be the client application closing the connection when done with
data retrieval, and not the other way around?

Cheers,
Rémi-C


2013/10/31 Jayadevan maymala.jayade...@gmail.com

 I have never used pgbouncer myself. But my guess is you have to look at the
 Timeout parameters in the configuration file.



 --
 View this message in context:
 http://postgresql.1045698.n5.nabble.com/Connection-pooling-tp5776378p5776481.html
 Sent from the PostgreSQL - general mailing list archive at Nabble.com.


 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general



Re: [GENERAL] Connection pooling

2013-10-31 Thread Rémi Cura
Hey,
I might be completely wrong, but when you say

get the connections to close if they are not being used,

I'd say that it is bad client design to not close a connection when it
is not needed anymore.
The client should retrieve the data and close the connection when it is no
longer using it, after a certain amount of time.

What you are trying to do is garbage collection.

Cheers,
Rémi-C


2013/10/31 si24 smrcoutt...@gmail.com

 I'm not 100% sure I follow in that part of if its the client cause
 currently
 when I run it on my own computer it does the same thing. Only when I stop
 tomcat and start it again then i get the 3 default connection that postgres
 has set up. our server does the same thing.








Re: [GENERAL] Connection pooling

2013-10-30 Thread Rémi Cura
Are GeoServer and Postgres on the same computer?

Cheers,

Rémi-C


2013/10/30 si24 smrcoutt...@gmail.com

 I'm not sure if it's supposed to be under general, so please let me know if I
 need to move it to another topic area.






[GENERAL] GIST index : order Hack : getting the order used by CLUSTER .. USING my_index

2013-10-24 Thread Rémi Cura
Hey List,

I would like to be able to get the rows following the order of an index
(*NOT* an ORDER BY that happens to be accelerated by an index, but simply
the order defined by the index itself).

Something like this :

SELECT my_row
FROM my_table
ORDER BY the_index ASC

where the_index is a GIST index over points.

I know it must be possible, since this is exactly what the command
CLUSTER my_table USING the_index;
does.


I read the following page:
http://www.postgresql.org/docs/9.3/static/indexes-ordering.html, but it is
not of great help here.
How much of a hack would this be?

Cheers,

Rémi-C


Re: [GENERAL] GIST index : order Hack : getting the order used by CLUSTER .. USING my_index

2013-10-24 Thread Rémi Cura
Hello,

I'm interested in the tree structure inherent to GiST indexing.
I was thinking of retrieving it from the order of the index.
Do you know how I could access it directly?

My use case would be to take advantage of this GiST ordering to order 2D
points
s1 : N1 N2 .. Nn
so that for any given t, the prefix s2 : [N1 .. Nt] is an extract of s1
that is well spread (spatially speaking).

Ideally I would prefer to use the quadtree index in contrib rather than the
GiST R-tree.
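
In the meantime, one SQL-only workaround for the well-spread extract (a
sketch that ignores the index entirely; my_points(id, geom) is a hypothetical
table): order the rows by a deterministic pseudo-random hash, so that any
prefix of the ordering is a statistically representative, and therefore
reasonably well-spread, sample of the whole set:

    SELECT id, geom
    FROM my_points              -- hypothetical table
    ORDER BY hashint8(id);      -- deterministic shuffle; md5(id::text) would also do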

Cheers,

Rémi-C




2013/10/24 Tom Lane t...@sss.pgh.pa.us

 Rémi Cura remi.c...@gmail.com writes:
  I would like to be able to get the rows following the order of an index
  (*NOT* getting an order by accelerated, but only an order defined by an
  index).

 Since a GiST index hasn't got any specific internal order, I fail to see
 the point of this.

 regards, tom lane



Re: [GENERAL] GIST index : order Hack : getting the order used by CLUSTER .. USING my_index

2013-10-24 Thread Rémi Cura
Ok,
thank you Tom for this precise answer!

I don't understand how the CLUSTER .. USING index command works, then.
It is supposed to rewrite the table on disk following the index order. Does
it do nothing for a GiST index?

Cheers,
Rémi-C


2013/10/24 Tom Lane t...@sss.pgh.pa.us

 Rémi Cura remi.c...@gmail.com writes:
  I'm interested in the tree structure inherent to the gist indexing.
  I was thinking to retrieve it from order of index.

 How?  A SQL query would have no idea where the index page boundaries were
 in the sequence of retrieved tuples.

  Do you know how I could access it directly?

 I don't think there's any way to do that without modifying the GiST code.
 What you really care about here is the contents of the upper index levels,
 which is something that's not exposed at all outside the index AM.

 regards, tom lane



Re: [GENERAL] GIST index : order Hack : getting the order used by CLUSTER .. USING my_index

2013-10-24 Thread Rémi Cura
Great,
thanks.

Now that you say it, I realize I never saw any improvement when clustering
a table with GiST.
You just saved me a lot of unnecessary queries :-)

Cheers,

Rémi-C


2013/10/24 Tom Lane t...@sss.pgh.pa.us

 Rémi Cura remi.c...@gmail.com writes:
  I don't understand how the CLUSTER .. USING index command work then.
  It is supposed to rewrite on disk following index order. Does it do
 nothing
  for GIST index?

 Nobody has ever demonstrated that CLUSTER has any value for anything
 except btree indexes.  It seems likely to me that it'd actually be
 counterproductive for indexes like GiST, which depend on data arriving in
 random order for the highest index levels to end up well-distributed.

 regards, tom lane



[GENERAL] Re: [postgis-users] Error with return query ( return next working ) with custom type

2013-10-23 Thread Rémi Cura
Hey,
thanks for the answers,

Sorry for the cross-post; I didn't know if it was a PostGIS or a Postgres
issue, hence the double post (postgis-users removed now).

I'm afraid I don't fully understand the answer. Are you (both) saying
that it is normal behavior for a function declared to return a custom
type to in fact return not that custom type, but the individual columns
composing it?

This seems like strange behavior at best!

The whole point of using custom type is to provide interface, right?

To be precise, when specifying RETURNS SETOF fake_topogeometry I would
expect the function to return a fake_topogeometry object (like the
query `SELECT (5,5,5,5)::fake_topogeometry`), not a set of columns!

I'm obviously missing something, as
SELECT * FROM testTopogeom(); --returns columns
SELECT testTopogeom(); --returns object

Could you suggest some more documentation (other than
http://www.postgresql.org/docs/9.3/static/plpgsql-control-structures.html)?

Is this difference between RETURN NEXT and RETURN QUERY documented?

Thanks for your help,
Cheers,
Rémi-C


2013/10/23 Steve Grey steven.c.r.g...@gmail.com

 try:

 RETURN QUERY  SELECT 1,1,1,1;

 The error message means the cast failed between ttt.fake_topogeometry and
 the topology_id (i.e. first) field of the return type of the function,
 which isn't what you wanted to do.

 Pls. don't cross-post between lists.
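
 To spell that out, here is a sketch (untested, based on the type and function
 quoted below) of the two RETURN QUERY forms that should match RETURNS SETOF
 ttt.fake_topogeometry: return the bare column list, or expand the composite
 with (...).* so the query again yields four integer columns:

    CREATE OR REPLACE FUNCTION ttt.testTopogeom(test_topom ttt.fake_topogeometry)
      RETURNS SETOF ttt.fake_topogeometry AS
    $BODY$
    BEGIN
      RETURN NEXT (1,1,1,1)::ttt.fake_topogeometry;               -- a composite value is fine here
      RETURN QUERY SELECT 2, 2, 2, 2;                             -- bare columns
      RETURN QUERY SELECT ((3,3,3,3)::ttt.fake_topogeometry).*;   -- composite expanded into columns
      RETURN;
    END;
    $BODY$ LANGUAGE plpgsql IMMUTABLE;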


 On 23 October 2013 01:21, Rémi Cura remi.c...@gmail.com wrote:



 Hey dear lists,

 Here is a self-contained example showing strange behavior, taken from a
 real-life case concerning the use of the postgis_topology topogeometry type.


 The problem is:
 when trying to return SETOF topogeometry,
 RETURN QUERY gives a type error where there should be none, while
 RETURN NEXT works fine.

 The precise error message is ERROR 42804

 ERROR:  structure of query does not match function result type
 DETAIL:  Returned type ttt.fake_topogeometry does not match expected type
 integer in column 1.
 CONTEXT:  PL/pgSQL function ttt.testtopogeom(ttt.fake_topogeometry) line
 9 at RETURN QUERY
 


 Is this expected behavior, a Postgres bug, or a PostGIS bug?
  What are the possible corrections?


 Here is the self contained code stored in the ttt schema.


 DROP SCHEMA IF EXISTS ttt CASCADE;
 CREATE SCHEMA ttt;

 DROP TYPE IF EXISTS ttt.fake_topogeometry CASCADE;
 CREATE TYPE ttt.fake_topogeometry AS
(topology_id integer,
 layer_id integer,
 id integer,
 a_type integer);

 DROP FUNCTION IF EXISTS ttt.testTopogeom(test_topom
 ttt.fake_topogeometry);
 CREATE FUNCTION ttt.testTopogeom(test_topom ttt.fake_topogeometry)
  RETURNS SETOF ttt.fake_topogeometry AS
 $BODY$
 -- this function is an empty function to test return of multiple topogeom
  DECLARE
 the_topo ttt.fake_topogeometry;
 BEGIN
  RETURN NEXT (1,1,1,1)::ttt.fake_topogeometry;
 --RETURN NEXT (3,3,3,3)::ttt.fake_topogeometry;

 RETURN QUERY  SELECT (1,1,1,1)::ttt.fake_topogeometry as foo;
 -- UNION
  --SELECT  (3,3,3,3)::ttt.fake_topogeometry as foo
 RETURN  ;
 END ;
  $BODY$
 LANGUAGE plpgsql IMMUTABLE;

 SELECT *
  FROM ttt.testTopogeom((2,2,2,2)::ttt.fake_topogeometry);






Re: [GENERAL] Count of records in a row

2013-10-23 Thread Rémi Cura
Ok, thanks for this clarification, Merlin.
It seems aggregates are way more powerful than I thought.

Obviously I need to do a lot more reading about custom aggregates before
fully understanding them.

Elliot's query is pure SQL, so obviously very cool!

It could be improved at the margins, and aggregates/functions are certainly
faster on big data.
But if you have no specific needs I would say Elliot's is easier and more
universal.
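
For the archive, the general shape of such a pure-SQL approach (a sketch of
the usual window-function technique for counting runs of consecutive identical
values, not necessarily Elliot's exact query; events(id, val) is a hypothetical
table):

    SELECT val,
           count(*) AS run_length
    FROM (
        SELECT id, val,
               row_number() OVER (ORDER BY id)
             - row_number() OVER (PARTITION BY val ORDER BY id) AS grp
        FROM events
    ) s
    GROUP BY val, grp
    ORDER BY min(id);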


Cheers and thanks all for this good discussion.

Rémi-C


2013/10/23 Merlin Moncure mmonc...@gmail.com

  2013/10/22 Merlin Moncure mmonc...@gmail.com
   With a standard loop, I loop n times, and each time I only need the
   current
   row plus the previous row, which I put in memory, thus O(n).
 
  For posterity, the above is incorrect.  Since the aggregate is ordered
  through the window function, it gets executed exactly once per output
  row.  It behaves exactly like a loop.  You know this because there is
  no array in the aggregate state.
 
  just out of pure curiosity,
  is it always the case or is it due to this particular aggregate?

 It is always the case.  Generally speaking, aggregates, especially
 user defined aggregates, are run once per input row.   In this case
 the main utility of window functions is to order the aggregate
 execution calls and (especially) allow intermediate output per input
 row, instead of per aggregate grouping.
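
 A tiny illustration of that last point (a sketch; t(id, x) is a hypothetical
 table): the same aggregate yields one result per group when grouped, but one
 intermediate result per input row when used as a window function:

     SELECT sum(x) FROM t;                          -- one result for the whole group
     SELECT id, sum(x) OVER (ORDER BY id) FROM t;   -- running total: one result per input row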

 On Tue, Oct 22, 2013 at 6:01 PM, Robert James srobertja...@gmail.com
 wrote:
  Wow, this is an excellent discussion - and I must admit, a bit beyond
  my abilities.  Is there a consensus as to the best approach to adopt?
  Is Elliot's the best?

 For this *specific* problem, I would give Elliot's (extremely clever)
 query the nod on the basis that it does not require any supporting
 infrastructure, which is always nice.  That being said, once you start
 getting the mojo of user defined aggregates + window functions it
 starts to become clear that it's a cleaner way of doing many types of
 things that are normally handled by loops.

 merlin


