Re: [GENERAL] Performance PLV8 vs PLPGSQL
On 12/29/2016 10:35 AM, Pavel Stehule wrote:
> 2016-12-29 10:03 GMT+01:00 Tim Uckun <timuc...@gmail.com>:
>> I think it's awesome that postgres allows you to code in different
>> languages like this. It really is a unique development environment
>> and one that is overlooked as a development platform. It would be
>> nice if more languages were delivered in the default package,
>> especially Lua, V8 and mruby.
>
> It is about dependencies and maintenance. There are not too many
> people who have good experience with embedding Lua, V8 and others in
> C. Anyone who can do some work is welcome. Living outside the main
> package has disadvantages - only enthusiasts know about it - but some
> advantages too: you are not tied to the PostgreSQL development cycle,
> and development can be faster.

I'll add my 2 cents.

Postgresql, and SQL in general, are about integrity and coherency. Checking coherency is much easier with strict data types. PL/pgSQL gives you that; JS is far, far away from that.

Postgresql is a very flexible database and you can stretch it to do "MEAN like"[1] stuff, but that's going to increase your "impedance mismatch". If you think there is some space for JS in your application stack, it's nearer to the client than to the DB. Or possibly you need to do "MEAN like" stuff but you don't want to install another "database".

As others said, using stored procedures is a two-edged sword. It can decouple the DB schema from the application, or it can increase the coupling. Choosing JS for performance in the stored-procedure realm is going to encourage coupling, make scalability harder, and become a mess when you'll need to refactor.

[1] https://en.wikipedia.org/wiki/MEAN_(software_bundle)

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
http://www.borgonovo.net

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] CRM where pg is a first class citizen?
On 12/13/2016 11:45 PM, Rich Shepard wrote:
> On Tue, 13 Dec 2016, Adrian Klaver wrote:
>> This killed the community (Open Source) edition going forward:
>> https://community.sugarcrm.com/thread/18434
>
> I'd like to comment regarding this paragraph from the above-referenced
> blog post:
>
> "In the course of the past five years, we have surveyed tens of
> thousands of Sugar Community Edition users and found that we see two
> types of users of Sugar Community Edition: 1) developers that wish to
> build on an open source CRM platform, and 2) users, generally first
> time CRM users, that are looking for a free/inexpensive CRM solution.
> We don't believe that the current Sugar Community Edition serves both
> audiences effectively. We envision an open source solution targeted
> exclusively for developers. And, we also envision a simpler way for
> first-time CRM users to find and use CRM."
>
> This is an interesting perspective, but not surprising for a large
> for-profit corporation like SugarCRM. I'm an environmental consultant
> sole practitioner and have been looking for years for a
> postgres-supporting CRM that I could use. There is none. Every business
> is different and has different needs. This is why a generic CRM like
> Sugar that tries to fit every business regardless of type or size
> forces its customers to fit into their generic model rather than
> supporting a developer _and_ end-user framework that can be customized
> for each business's specific needs and way of working.

This reminds me of Drupal and the companies driving its development again... Drupal was interesting because it was both a packaged product and a framework.

Most SMEs can't afford customization of ERP and accounting programs (invoice formatting aside). SalesForce is not offering custom products and it is still pretty successful.

While at least here in Italy I think most accounting programs are a trap, I've realized that most of the time SMEs should learn from the procedures proposed by CRM/ERP/accounting programs and adapt, rather than customize. Processes are generally not scientifically planned; they are built up as they go. A program that has been built to serve many users over the years may not be optimal, but it at least tends to be more rational.

Still, I'm not looking for something perfect, but for something simple with low maintenance. *Postgres in this case is one of the ingredients of low maintenance, or at least maintenance I'm familiar with.*

> That's why I'm developing my own using PyQt5, Python3, psycopg2, and
> postgres-9.6. I have the postgres schema that works for me and am
> willing to share it with others because of this thread. I had not
> planned on putting it on GitHub, but see no reason not to do so if
> there's interest by others. I'm starting to learn PyQt5 and Python3
> after a decade of wxPython use with Python2 and am just about ready to
> start creating the UI.

Unfortunately I don't want to depend on something I'd have to put development resources into, and I need something that works reasonably quickly. But I admit that, considering the few requirements I have, I spent a couple of seconds considering the idea of writing one myself.

Nothing bad could come of publishing your code on GitHub, and if not to contribute, I'll surely take a look to learn something.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
http://www.borgonovo.net
Re: [GENERAL] CRM where pg is a first class citizen?
On 12/13/2016 10:49 PM, Adrian Klaver wrote:
> On 12/13/2016 12:36 PM, George Weaver wrote:
>> I've never used it but what about:
>> https://developer.sugarcrm.com/2012/08/03/like-postgresql-and-want-to-use-it-with-sugarcrm-check-out-a-new-community-project/
>
> This killed the community (Open Source) edition going forward:
> https://community.sugarcrm.com/thread/18434

There are several forks. It is not even clear whether the forks support pg, and that's not a good start.

Wikipedia lists:
https://www.dolibarr.org/ (ERP) seems a bit messy
http://epe.si/ (CRM) unclear workflow
http://www.tryton.org/ (ERP) very postgres oriented, from the same family as odoo, which supports postgres.

Still looking for some good advice.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
http://www.borgonovo.net
Re: [GENERAL] CRM where pg is a first class citizen?
On 12/13/2016 09:36 PM, George Weaver wrote:
> I've never used it but what about:
> https://developer.sugarcrm.com/2012/08/03/like-postgresql-and-want-to-use-it-with-sugarcrm-check-out-a-new-community-project/

That's one of the first things I've read, and it doesn't look like a good premise for making pg a first class citizen.

BTW Joshua D. Drake cited Drupal 7+.

Drupal was the reason I started using postgres back in the day, and it was one of the reasons I was bothering you all on this list more frequently a few years ago. I spent a pretty huge amount of time locally patching Drupal to make it work with postgresql and trying to push some of the patches upstream. I don't develop on Drupal anymore, but up to at least D7 Postgresql was still not a first class citizen. I've heard the DB abstraction layer in D8 is much better, but I don't have anything critical on Drupal anymore, and life is too short to fight to see your patches rejected by upstream because "supporting postgres is holding us back". Considering that most PHP web applications are not optimized for any DB - and I wonder what features a developer could even exploit to optimize for mysql - that's really a shame. I don't want to repeat the experience, especially with software I'm just going to use, not develop on.

Forgive me, for I have sinned: the last Drupal I installed was 7, I picked mysql, and I still feel awkward when I have to deal with it.

I'm using horde on postgres. Postgres is not a first class citizen in Horde either, but at least they were happy and quick to merge a couple of my patches.

Working with pg was fun: it was a very nice piece of software, it has a great community, and it got even better. I'm not a hardcore user anymore, but I still enjoy using it. There are many more services offering Postgres on the internet now (Amazon, Heroku, OpenStack...). The lack of proper support from "web applications" still drives me nuts. It comes as no surprise that Python and Java "web applications" tend to support postgres better.

Python would be absolutely welcome, but I don't have that much experience managing Java on the web.

https://www.odoo.com/ supports postgres, but it is an ERP and it is far more complicated than I would like.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
http://www.borgonovo.net
[GENERAL] CRM where pg is a first class citizen?
Hi,

I'm looking for an open source CRM, PHP or Python based, with a large community, where Postgresql is a first class citizen. I'd prefer ease of use over features.

Thanks

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
http://www.borgonovo.net
Re: [GENERAL] "Too far out of the mainstream"
On Tue, 4 Sep 2012 19:14:28 -0700 Chris Travers wrote:
> So people are using PostgreSQL in roles that aren't very visible
> anyway, DBA's are usually coming to PostgreSQL from other RDBMS's,
> and few applications are really distributed for PostgreSQL.

I know a bunch of people working for huge sites that love Postgres but use MySQL. The main reason is that they build what Postgres is famous for at a higher level, in a more specialized way, with their own glue. It's easy to get visibility if you're on the internet and you're huge. But not everyone can "rebuild" e.g. transactions at a higher level, or needs such specialized solutions.

On the other hand, for historical reasons MySQL and PHP have nearly monopolized the hosting space, and for many web sites it's hard to appreciate the difference between Postgres and MySQL (unless your DB crashes and burns). That's what most people perceive as "the mainstream" if you don't have a big marketing dept lying.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
[GENERAL] fsync default setting and version
Where can I check in which version the default setting for fsync was changed?

thanks

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
Re: [GENERAL] how to drop function?
On Wed, 16 Nov 2011 09:17:45 +0100 Thomas Kellerer wrote:
> Ivan Sergio Borgonovo, 16.11.2011 01:01:
>> test=# begin;
>> create or replace function process_table (
>>     action TEXT, v_table_name varchar(100)
>> ) RETURNS BOOLEAN
>> AS $$
>> DECLARE
>>
>> BEGIN
>>     return true;
>> END;
>> $$ LANGUAGE plpgsql;
>>
>> drop function process_table (
>>     action TEXT, v_table_name varchar(100)
>> );
>> commit;
>> BEGIN
>> CREATE FUNCTION
>> DROP FUNCTION
>> COMMIT
>> test=#
>>
>> Repeat just the input parameters.
>
> You don't have to include the parameter names though
>
> drop function process_table (TEXT, varchar(100));
>
> is just as good and a bit less typing ;)

In psql/pgAdmin you have tab completion. It will complete without the parameter names. If you're writing more durable code you can generally just cut&paste the creation code. I admit I haven't spent enough time to see if I can have tab completion inside my IDE/editor.

When you're refactoring a function, most probably you'll have to refactor the drop code too. I tend to refactor the number/type of the parameters much more frequently than their names, so skipping the names is anticipating work that I'll seldom take advantage of. I admit I drop functions much more frequently in psql than in my IDE/editor, but I generally have the creation code handy anyway.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
Re: [GENERAL] how to drop function?
On Tue, 15 Nov 2011 18:48:00 -0700 "J.V." wrote:
> the drop function works when running from a pgAdmin III Sql window
>
> but when I try to do it from the command line and script it:
> psql -h $PGHOST -p $PGPORT -d $PGDATABASE -U $PGUSER -c
> "*drop function *"
>
> the above fails.
> It does however work with functions with no params or a single
> param. It seems to get hung up on the comma and the extra set of
> parentheses

It would be nice to know how it fails, and if you posted exactly what you wrote to make it fail and what you wrote to make it succeed. What does "get hung up on the comma and extra set of parentheses" mean?

ivan@dawn:~$ psql -h lan test -c 'drop function process_table (action TEXT, v_table_name varchar(100));'
DROP FUNCTION
ivan@dawn:~$

The psql command is all on the same line.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
Re: [GENERAL] how to drop function?
On Tue, 15 Nov 2011 16:38:20 -0700 "J.V." wrote:
> How do I drop a function that was created like so:
>
> create or replace function process_table (action TEXT,
>     v_table_name varchar(100)) RETURNS BOOLEAN
> AS $$
> DECLARE
>
> BEGIN
>    ...
> END;
> $$ LANGUAGE plpgsql;
>
> ---
> I have tried various ways, but it always fails.
>
> J.V.

test=# begin;
create or replace function process_table (
    action TEXT, v_table_name varchar(100)
) RETURNS BOOLEAN
AS $$
DECLARE

BEGIN
    return true;
END;
$$ LANGUAGE plpgsql;

drop function process_table (
    action TEXT, v_table_name varchar(100)
);
commit;
BEGIN
CREATE FUNCTION
DROP FUNCTION
COMMIT
test=#

Repeat just the input parameters.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
Re: [GENERAL] function doesn't see change in search_path
On Mon, 7 Nov 2011 19:07:29 +0100 Pavel Stehule wrote:
> 2011/11/7 Ivan Sergio Borgonovo:
>> On Mon, 7 Nov 2011 17:55:11 +0100 Pavel Stehule wrote:
>>
>>> Hello
>>>
>>> this is a known bug/feature based on caching plans
>>
>> What puzzled me is that I'm operating in a similar way on a
>> different system and I'm not experiencing the same problem.
>>
>> Do different users have different caches?
>
> depends on usage - the cache is per session

OK. It is clear it is "per session". As far as I know, users can't be changed inside the same session. What are you referring to with "depends on usage"? Is there anything else that can influence cached plans?

Right now I just need a workaround, and calling the function from different sessions seems cleaner than writing a function for each schema, especially since I can use psql \connect.

It seems that cache management happens completely behind the scenes and there is no way to control it other than playing tricks such as:

sql := 'select * from ' || sometable || ...
execute sql;

I didn't find anything on the cache other than what's written here:
http://developer.postgresql.org/pgdocs/postgres/plpgsql-implementation.html

thanks

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
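For the record, here is a sketch of that trick applied to the test_insert() function from my original post: with EXECUTE the statement is planned on every call, so the table reference is resolved against the search_path in effect at call time rather than baked into a cached plan.

```sql
create or replace function test_insert() returns void as $$
begin
    raise notice 'path %', current_schemas(true);
    -- EXECUTE builds a fresh plan each time, so "a" is looked up
    -- in the current search_path instead of the one cached on the
    -- first call in this session.
    execute 'insert into a values(''a'')';
end;
$$ language plpgsql volatile;
```

The downside is, of course, the replanning cost on every call.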
[GENERAL] function doesn't see change in search_path
I have a behaviour similar to this:
http://archives.postgresql.org/pgsql-bugs/2007-09/msg00017.php

create language plpgsql;
create schema test1;
create schema test2;
create table test1.a(a varchar(3) unique);
create table test2.a(a varchar(3) unique);

create or replace function test_insert() returns void as $$
begin
    raise notice 'path %', current_schemas(true);
    insert into a values('a');
end;
$$ language plpgsql volatile;

set search_path to 'test1', 'public';
select * from test_insert();
NOTICE: path {pg_catalog,test1,public}
 test_insert
-------------

(1 row)

set search_path to 'test2', 'public';
select * from test_insert();
NOTICE: path {pg_catalog,test2,public}
ERROR: duplicate key value violates unique constraint "a_a_key"
CONTEXT: SQL statement "insert into a values('a')"
PL/pgSQL function "test_insert" line 3 at SQL statement

PostgreSQL 8.3.14

What's going on?

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
Re: [GENERAL] access to lexems or access to parsed elements
On Thu, 25 Aug 2011 18:21:21 +0200 "Massa, Harald Armin" wrote:
> I want to access the single words in a text. Better yet: the
> relevant words (i.e. without stop words) in a text.
>
> to_tsvector or casting gets me the lexems as a tsvector:

I wrote this piece of C code more than a year ago. [1] It has been working in a production environment for quite a while. It only works with versions < 8.4, since it doesn't support *.

I'd be willing to maintain the module, or even expand its features and release it under any license that would please the postgresql community, if my effort would actually make it more easily available to other people - but just throwing it on the internet won't be enough. If someone thinks it is worth the effort to help me understand how that could happen, I'd be glad to learn.

[1] http://www.webthatworks.it/d1/content/postgresql-c-module-turn-tsvectors-tsquery-and-return-tsvectors-tables

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
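For the simple case of "relevant words without stop words", something can already be done in plain SQL without a C module - a sketch using ts_stat, which runs a query and returns one row per lexeme found in the tsvectors it produces (stop words are already gone because to_tsvector drops them):

```sql
-- One row per lexeme; ndoc/nentry columns also give frequencies.
SELECT word
FROM ts_stat($$ SELECT to_tsvector('english',
                    'The quick brown fox jumps over the lazy dog') $$);
```

Note that these are stemmed lexemes, not the original surface words, which may or may not be what one wants.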
Re: [GENERAL] Pg multi-tenanted hard? was: FREE hosting platforms with PostgreSQL, Java SDK, Tomcat, ecc.?
On Mon, 08 Aug 2011 05:38:02 +0800 Craig Ringer wrote:
> On 7/08/2011 11:43 PM, Ivan Sergio Borgonovo wrote:
>> On Sun, 07 Aug 2011 20:41:27 +0800 Craig Ringer wrote:
>>
>>> By the way, one of the reasons you're not finding much free
>>> hosting for PostgreSQL is that it takes a fair bit of work to
>>> run Pg multi-tenanted. Your additional requirement for Java and
>>> Tomcat
>>
>> Why should pg be harder than anything else?
>
> There are DBs - mostly $OMFGLOTS DBs - that offer much greater

OK. I forgot to specify "anything else in the same league", or, in short, open source.

Nice reading anyway, thanks.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
[GENERAL] Pg multi-tenanted hard? was: FREE hosting platforms with PostgreSQL, Java SDK, Tomcat, ecc.?
On Sun, 07 Aug 2011 20:41:27 +0800 Craig Ringer wrote:
> By the way, one of the reasons you're not finding much free
> hosting for PostgreSQL is that it takes a fair bit of work to run
> Pg multi-tenanted. Your additional requirement for Java and Tomcat

Why should pg be harder than anything else?

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
Re: [GENERAL] Read MS-SQL data into Postgres via ODBC link?
On Tue, 5 Jul 2011 19:38:25 -0400 "Jonathan Brinkman" wrote:
> I was really hoping to keep the data-replication (between MSSQL
> --> PG) contained within a PG function.
>
> Instead I could write a small shell script or C service to do this
> using tsql (freetds). I have access to the MSSQL data via unixodbc
> and tdsodbc/freetds in my Ubuntu console.

I wrote some scripts, mainly in PHP and Perl, that use freetds/odbc to migrate a schema from MSSQL to pg and import the data via CSV. Postgresql can run Perl, Python and PHP "internally", and all those languages have ODBC drivers, so you could skip the CSV step. It's not as elegant as accessing the data directly over ODBC from pg, but it works. I can share some code.

> But I want to read from that ODBC stream directly in Postgresql,
> like a SELECT on Linked Servers in MSSQL-world or on a linked
> table in MSACCESS. That would give control over the ODBC
> interaction to the PG function rather than a non-DB entity.
>
> Has anyone tried ODBC-Link
> (http://www.cybertec.at/en/postgresql_products/odbc-link)? I
> couldn't get through the install due to an Ubuntu-related error.

Nice. I'd be interested in some feedback if you succeed in making it work. Does this project have any chance of being included in contrib? It seems alive and kicking.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
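To give an idea of the "skip the CSV step" approach, here is a rough sketch using PL/Python with pyodbc. It assumes plpythonu is installed and pyodbc is importable by the server-side interpreter; the DSN, table and column names are made up for illustration, and error handling and batching are left out.

```sql
-- Pulls rows from an MSSQL source over ODBC and inserts them locally.
-- DSN 'mssql_src' and table/column names are hypothetical.
CREATE OR REPLACE FUNCTION pull_customers() RETURNS integer AS $$
    import pyodbc
    conn = pyodbc.connect('DSN=mssql_src;UID=someuser;PWD=secret')
    cur = conn.cursor()
    cur.execute("SELECT id, name FROM dbo.customers")
    plan = plpy.prepare(
        "INSERT INTO customers(id, name) VALUES ($1, $2)",
        ["integer", "text"])
    n = 0
    for row in cur:
        plpy.execute(plan, [row.id, row.name])
        n += 1
    conn.close()
    return n
$$ LANGUAGE plpythonu;
```

It keeps the replication "contained within a PG function" as you wanted, at the cost of trusting an untrusted language on the server.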
[GENERAL] benchmark comparing different postgresql versions
Is there a place where I can find performance comparisons between different postgresql versions, hopefully related to new features/improvements?

thanks

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
Re: [GENERAL] find the greatest, pick it up and group by
On Mon, 16 May 2011 20:05:45 -0400 "David Johnston" wrote:
> When asking for help on non-trivial SELECT queries it really helps
> to tell us the version of PG you are using, so that responders know
> what functionality you can and cannot use. In this case
> specifically, whether WINDOW (and maybe WITH) clauses are available.

Unfortunately I'm on 8.3, so no WINDOW. I didn't even think of using them, and I can't think of any way to use WINDOW/WITH here, but if there is a more readable solution that uses them I'd like to see it, even if I won't be able to use it. Of course I'm more interested to know if there is any cleaner solution for 8.3.

thanks

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
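Just out of curiosity, here is a sketch of what I imagine a post-8.3 version could look like, against the ordiniitem table from my first message. It pairs each quantity with its label via two parallel unnest() calls (8.4+; parallel set-returning functions of equal length pair up row by row), then keeps the label of the largest quantity per group with DISTINCT ON. Only three of the q* columns are shown; the rest would follow the same pattern.

```sql
SELECT DISTINCT ON (idordine, grupposped, idart)
       idordine, grupposped, idart, label
FROM (
    SELECT idordine, grupposped, idart,
           -- the two arrays must stay in the same order so that
           -- quantity and label line up
           unnest(ARRAY[qevasa, qfuoricat, qinris]) AS qty,
           unnest(ARRAY['evaso', 'fuoricatalogo', 'in ristampa']) AS label
    FROM ordiniitem
) AS q
-- DISTINCT ON keeps the first row per group, i.e. the largest qty
ORDER BY idordine, grupposped, idart, qty DESC NULLS LAST;
```

Ties are broken arbitrarily, which matches the "pick one, no matter which" requirement.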
Re: [GENERAL] COPY complaining about unquoted carriage return found in data... in a quoted field
On Tue, 10 May 2011 15:59:07 +0200 Ivan Sergio Borgonovo wrote:

Sorry for the noise.

The csv was automatically generated. The code was right, but during generation there was some problem on the box generating it (php segfaulting) and there were some unclosed quotes on a much earlier line. Postgresql actually behaves as documented and expected... and the documentation is clear.

> On Tue, 10 May 2011 14:38:23 +0200 Pavel Stehule wrote:
>
>> Hello
>>
>> COPY doesn't like '\n' too.
>>
>> Replace '\n' by '\\n'
>
> mmm maybe you were misled by the "semi-transliterated" hexdump.
>
> There is no "slash" in the record; the actual input was the one
> reported in hex. The following line was just to help in reading the
> hexdump.
>
> 09 22 56 | 69 61 20 46 | 61 65 6E 74 | 69 6E 61 2C
> 20 35 33 0D | 0A 22 09 22 | 35 30 30 31 | 34 22 09
>
> As a quick reference:
> 0x09 = \t
> 0x0a = \n
> 0x0d = \r
> 0x20 = space
> 0x22 = "
>
> I thought that
>
> csv
> quote as '"'
>
> were there exactly to avoid escaping of delimiters (other than
> the quoting character itself) inside fields.
>
> From the docs:
> http://www.postgresql.org/docs/8.3/static/sql-copy.html
> «If the value contains the delimiter character, the QUOTE
> character, the NULL string, a carriage return, or line feed
> character, then the whole value is prefixed and suffixed by the
> QUOTE character, and any occurrence within the value of a QUOTE
> character or the ESCAPE character is preceded by the escape
> character.»
>
> So to my understanding, once I've "quoted" a field I have to take
> care of escaping just the quote and the escape characters, and the
> field may contain delimiter characters (field separator and record
> separator) without escaping.
>
> I was wondering if a) the documentation is wrong, b) I didn't write
> a correct \COPY command string, c) there is a bug, d) I can't
> correctly interpret the documentation, or e) I'm more stupid than
> usual.

thanks

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
[GENERAL] find the greatest, pick it up and group by
I've a table like:

CREATE TABLE ordiniitem (
    idordine numeric(18,0) NOT NULL,
    grupposped smallint,
    idart numeric(18,0) NOT NULL,
    qevasa integer,
    qfuoricat integer,
    qinris integer,
    qnonpub integer,
    qann integer,
    qord integer,
    qpren integer,
    qrichpag integer,
    qinriass integer,
    qinesa integer
);

I'd like to group by idordine, grupposped, idart. For every row grouped that way, I'd like to pick the greatest of the q* columns and insert idordine, grupposped, idart, and the name of the greatest q* column into a new table.

I don't mind if more than one q* column is equal to greatest(q*). It should pick one, just one, no matter which among the ones equal to greatest(q*).

I think this way works, but it hurts my eyes. Any alternative approach?

SELECT ioi.idordine, ioi.grupposped, ioi.idart,
    -- ioi.quantita,
    case
        when ioi.qevasa = greatest(ioi.qevasa, ioi.qfuoricat,
            ioi.qinris, ioi.qnonpub, ioi.qann, ioi.qord, ioi.qpren,
            ioi.qrichpag, ioi.qinriass, ioi.qinesa)
            then 'evaso'
        when ioi.qfuoricat = greatest(ioi.qevasa, ioi.qfuoricat,
            ioi.qinris, ioi.qnonpub, ioi.qann, ioi.qord, ioi.qpren,
            ioi.qrichpag, ioi.qinriass, ioi.qinesa)
            then 'fuoricatalogo'
        when ioi.qinris = greatest(ioi.qevasa, ioi.qfuoricat,
            ioi.qinris, ioi.qnonpub, ioi.qann, ioi.qord, ioi.qpren,
            ioi.qrichpag, ioi.qinriass, ioi.qinesa)
            then 'in ristampa'
        -- ...
    end
FROM ordiniitem ioi
group by ioi.idordine, ioi.grupposped, ioi.idart, ioi.qevasa,
    ioi.qfuoricat, ioi.qinris, ioi.qnonpub, ioi.qann, ioi.qord,
    ioi.qpren, ioi.qrichpag, ioi.qinriass, ioi.qinesa;

thanks

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
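One way to make it a bit less painful even on 8.3 might be to compute the greatest value once in a derived table, so each CASE branch only compares a column against it. Same three labels as above; the remaining q* columns would follow the same pattern.

```sql
SELECT idordine, grupposped, idart,
    case
        when qevasa    = gq then 'evaso'
        when qfuoricat = gq then 'fuoricatalogo'
        when qinris    = gq then 'in ristampa'
        -- ... remaining q* columns, same pattern
    end AS label
FROM (
    SELECT idordine, grupposped, idart,
           qevasa, qfuoricat, qinris, qnonpub, qann, qord,
           qpren, qrichpag, qinriass, qinesa,
           -- computed once per row instead of once per CASE branch
           greatest(qevasa, qfuoricat, qinris, qnonpub, qann, qord,
                    qpren, qrichpag, qinriass, qinesa) AS gq
    FROM ordiniitem
) AS ioi
GROUP BY idordine, grupposped, idart, qevasa, qfuoricat, qinris,
         qnonpub, qann, qord, qpren, qrichpag, qinriass, qinesa, gq;
```

The semantics are the same; it just avoids repeating the ten-argument greatest() in every branch.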
Re: [GENERAL] COPY complaining about unquoted carriage return found in data... in a quoted field
On Tue, 10 May 2011 14:38:23 +0200 Pavel Stehule wrote:
> Hello
>
> COPY doesn't like '\n' too.
>
> Replace '\n' by '\\n'

mmm maybe you were misled by the "semi-transliterated" hexdump.

There is no "slash" in the record; the actual input was the one reported in hex. The following line was just to help in reading the hexdump.

09 22 56 | 69 61 20 46 | 61 65 6E 74 | 69 6E 61 2C
20 35 33 0D | 0A 22 09 22 | 35 30 30 31 | 34 22 09

As a quick reference:
0x09 = \t
0x0a = \n
0x0d = \r
0x20 = space
0x22 = "

I thought that

csv
quote as '"'

were there exactly to avoid escaping of delimiters (other than the quoting character itself) inside fields.

From the docs:
http://www.postgresql.org/docs/8.3/static/sql-copy.html
«If the value contains the delimiter character, the QUOTE character, the NULL string, a carriage return, or line feed character, then the whole value is prefixed and suffixed by the QUOTE character, and any occurrence within the value of a QUOTE character or the ESCAPE character is preceded by the escape character.»

So to my understanding, once I've "quoted" a field I have to take care of escaping just the quote and the escape characters, and the field may contain delimiter characters (field separator and record separator) without escaping.

I was wondering if a) the documentation is wrong, b) I didn't write a correct \COPY command string, c) there is a bug, d) I can't correctly interpret the documentation, or e) I'm more stupid than usual.

thanks

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
[GENERAL] COPY complaining about unquoted carriage return found in data... in a quoted field
I'm on pg 8.3.14. I'm trying to import a csv with:

\copy anagraficaclienti from 'myfile.csv'
  delimiter as E'	'  -- this is a tab \t
  null as 'NULL'
  csv header
  quote as E'"'
  escape as E'\\'

What I get is:

ERROR: unquoted carriage return found in data
HINT: Use quoted CSV field to represent carriage return.
CONTEXT: COPY anagraficaclienti, line 48656

The record actually contains \r\n in a field, but to my understanding that field is quoted. I think what may be wrong is my understanding of what postgres considers a quoted field.

The content in hex:

09 22 56 | 69 61 20 46 | 61 65 6E 74 | 69 6E 61 2C
20 35 33 0D | 0A 22 09 22 | 35 30 30 31 | 34 22 09

\t"Via Faentina, 53\r\n"\t

What am I missing?

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
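As a sanity check that the record itself is legal CSV, here is the same byte sequence fed to Python's csv module instead of COPY (the first field value "first" is made up, the rest mirrors the hexdump): a quoted field containing an embedded CR+LF should come back as a single field, not two records.

```python
import csv
import io

# A tab-delimited record whose middle field is quoted and contains an
# embedded \r\n, as in the hexdump above.
data = 'first\t"Via Faentina, 53\r\n"\t"50014"\r\n'

reader = csv.reader(io.StringIO(data, newline=''),
                    delimiter='\t', quotechar='"')
rows = list(reader)

# The embedded CR+LF stays inside the quoted field: one row, three fields.
print(rows)
```

If COPY still complains, the quoting problem is elsewhere in the file, not in this record.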
Re: [GENERAL] Convert Simple Query into tsvector & tsquery format.
On Fri, 18 Mar 2011 12:30:50 +0530 Adarsh Sharma wrote:
> Dear all,
>
> I have a simple query mentioned below:
>
> select count(*) from page_content where (content like '%Militant%'
> OR content like '%jihad%' OR content like '%Mujahid%' OR
> content like '%fedayeen%' OR content like '%insurgent%' OR
> content like '%terrORist%' OR content like '%cadre%' OR
> content like '%civilians%' OR content like '%police%' OR
> content like '%defence%' OR content like '%cops%' OR
> content like '%crpf%' OR content like '%dsf%' OR
> content like '%ssb%') AND (content like '%kill%' OR
> content like '%injure%');

It's not clear where the problem is.

select to_tsquery('english', '(yellow | blue) & (red | black)');
                 to_tsquery
---------------------------------------------
 ( 'yellow' | 'blue' ) & ( 'red' | 'black' )

select to_tsvector('english', 'yellow red') @@
    to_tsquery('english', '(yellow | blue) & (red | black)');

The only thing I can see that could cause problems is that you may have previously "mangled" the words in the ilike query, while with full text search you'd leave that task to the ts engine, which will find the proper lexeme.

Could you be more specific about the problem you're encountering?

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
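If what you're after is just a mechanical translation of the query above, something along these lines might be a starting point. It assumes content is English text; for decent performance you'd also want a GIN index on to_tsvector('english', content).

```sql
-- OR groups become |, the AND between the groups becomes &.
-- Case doesn't matter: to_tsquery normalizes terms to lexemes,
-- so 'terrORist' and 'terrorist' end up the same.
SELECT count(*)
FROM page_content
WHERE to_tsvector('english', content) @@ to_tsquery('english',
    '(militant | jihad | mujahid | fedayeen | insurgent | terrorist
      | cadre | civilians | police | defence | cops | crpf | dsf | ssb)
     & (kill | injure)');
```

Note the semantics are not identical to LIKE '%...%': the tsquery matches whole (stemmed) words, not arbitrary substrings.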
Re: [GENERAL] Set new owner on cloned database
On Wed, 8 Dec 2010 13:40:29 -0500 (EST) "James B. Byrne" wrote:
> I am testing a Rails deployment and wish to copy a database,
> assigning it a new owner. I have tried this:
>
> createdb --owner=hll_theheart_db_devl
>   --template=hll_th_deploytest_prod hll_theheart_devl
>
> While this indeed sets the database owner to hll_theheart_db_devl,
> everything else - schema, tables, whatever - remains owned by the
> original owner. Is there no way to change the owner everywhere in
> the cloned database using createdb? Or am I constrained to do a
> dump all and restore?

http://www.mail-archive.com/pgsql-hack...@postgresql.org/msg51048.html
http://www.mail-archive.com/pgsql-hack...@postgresql.org/msg51047.html

I'm not sure whether there has been any progress in newer postgres to make changing the owner easier; I'm not aware of any better, more current solution. Unfortunately I think the license of the above didn't help to make people willing to improve the code and make it more popular.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
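If all the objects in the clone belong to a single role, a sketch of what might work without a dump/restore (run while connected to the freshly cloned database; "original_owner" stands for whatever role owns the template's objects):

```sql
-- Database-level ownership (createdb --owner already did this part):
ALTER DATABASE hll_theheart_devl OWNER TO hll_theheart_db_devl;

-- Re-own every object in the current database that belongs to the
-- old role: schemas, tables, sequences, functions, ...
REASSIGN OWNED BY original_owner TO hll_theheart_db_devl;
```

REASSIGN OWNED only touches objects in the database you're connected to, so the template's objects are left alone. It needs to be run by a role that is a member of both roles involved (or a superuser).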
Re: [GENERAL] finding the other statement causing a sharelock
On Mon, 8 Nov 2010 14:22:16 -0700 Scott Marlowe wrote: > Don't know how much it helps here, but this page: > http://wiki.postgresql.org/wiki/Lock_Monitoring > is priceless when you're having issues midday with a lock that > won't go away. I was about to reinvent the wheel and write something similar. But I was already wondering: how am I supposed to "intercept" a lock caused by a long transaction that I know about and a process that runs at some unknown time? I have some strong suspects... and I'd like to exit early from a function if that process is running, but I'm not really sure how to add a semaphore... -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
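The semaphore Ivan is after could be approximated with advisory locks (present since 8.2). A sketch with an arbitrary lock key:

```sql
-- Try to take an application-level lock; returns false immediately if
-- another session already holds key 12345, so a function can bail out early.
SELECT pg_try_advisory_lock(12345);

-- ... do the protected work only if the call above returned true ...

-- Release the lock when done (it is also released at session end).
SELECT pg_advisory_unlock(12345);
```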
Re: [GENERAL] finding the other statement causing a sharelock
On Mon, 08 Nov 2010 15:45:12 -0500 Tom Lane wrote: > Ivan Sergio Borgonovo writes: > > I get > > DETAIL: Process 24749 waits for ShareLock on transaction > > 113443492; blocked by process 25199. Process 25199 waits for > > ShareLock on transaction 113442820; blocked by process 24749. > > > I would like to know both statements that caused the sharelock > > problem. > Recent versions of PG record both (or all) statements involved in a > deadlock in the postmaster log. What about not so recent 8.3.9? thanks -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] finding the other statement causing a sharelock
I get DETAIL: Process 24749 waits for ShareLock on transaction 113443492; blocked by process 25199. Process 25199 waits for ShareLock on transaction 113442820; blocked by process 24749. I would like to know both statements that caused the sharelock problem. This is a long-running transaction. I know one of the statements; I'd like to know the other. How? -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
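On 8.3 and later, the settings below make the server log the statements involved in lock waits and deadlocks; a configuration sketch (the values are illustrative, not from the thread):

```ini
# postgresql.conf
log_lock_waits = on      # log a message when a lock wait exceeds deadlock_timeout
deadlock_timeout = 1s    # how long to wait before running the deadlock check
```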
Re: [GENERAL] exceptionally large UPDATE
On Fri, 29 Oct 2010 10:21:14 -0400 Vick Khera wrote: > On Thu, Oct 28, 2010 at 1:06 PM, Ivan Sergio Borgonovo > wrote: > > What I'm planning to do is: > > max_connections = 5 > > shared_buffers = 240M > > work_mem = 90MB > > maintenance_work_mem = 1GB > > max_fsm_pages = 437616 > > max_fsm_relations = 1200 > > checkpoint_segments = 70 > > default_statistics_target = 30 > > #log_min_duration_statement = 1000 > default_statistics_target = 100 is the new "default" for newer > postgres, and with good reason... try that. > > if you boost your checkpoint_segments, also twiddle the > checkpoint_timeout (increase it) and checkpoint_completion_target > (something like 0.8 would be good, depending on how fast your disks > are) values to try to smooth out your I/O (ie, keep it from > bursting at checkpoint timeout). Is 5 connections really enough > for you? No. 5 is too few. OK... this is what I end up with:

max_connections = 100
shared_buffers = 240M
work_mem = 90MB
maintenance_work_mem = 1GB
max_fsm_pages = 437616
max_fsm_relations = 1200
default_statistics_target = 100
checkpoint_segments = 70
checkpoint_timeout = 10min
checkpoint_completion_target = 0.6 #(not very fast drives in raid5)
#log_min_duration_statement = 1000
random_page_cost = 3.0

I tested this on a RAID10 SATA, 8Gb RAM and 2x4cores Xeons:
- updating 227985 records over roughly 1.4M took 197744.374 ms
- recreating the gin index took 313962.162 ms
- commit took 7699.595 ms
- vacuum analyse 188261.481 ms

The total update took around 13min. I've just heard that a similar update on a slower box (RAID1 SAS, 4Gb, 2x2Cores Xeon) running MS SQL took over 30min. Considering that MUCH fewer pk/fk constraints and actions were defined on the MS SQL DB, things now look much better for postgres. Furthermore, postgresql full text search beats the MS SQL box even on the slowest box of all (RAID5 SATA, 4Gb, 2x1core HT Xeon, over 6 years old). 
I'll take note of performance on the slower box too as soon as I have more large updates; meanwhile I'm still looking at how to make this faster. -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] share lock when only one user connected?
On Fri, 29 Oct 2010 08:19:27 +0200 Alban Hertroys wrote: I have to amend the "one user connected" assumption. But I'm really sure there is no other process writing to catalog_items. There is a process that reads catalog_items and writes to another table.

create table catalog_items (
  itemid bigint primary key
);

create table catalog_related (
  itemid bigint references catalog_items (itemid) on delete cascade,
  itemid_related bigint references catalog_items (itemid) on delete cascade
);

To add some more details, the update is running inside a larger transaction that updates other tables. There are no "on update cascade" and no other triggers. I'm using 8.3.4 I admit I have a very naïve knowledge of locks and maybe I'm making assumptions that aren't valid. I've been able to find: http://www.postgresql.org/docs/7.2/static/locking-tables.html but no equivalent for 8.3 I assume the same is valid for 8.3, so since there is no explicit LOCK TABLE on catalog_items what's left are the indexes. I've been experiencing the same problem even after dropping the gin index on the FT1IDX column, but there are other btree indexes on that table. How can I get more information in the logs to know which statements were producing the lock? One for sure was the update. > On 28 Oct 2010, at 19:28, Ivan Sergio Borgonovo wrote: > > > I'm running this query when I'm the only user and this should be > > the only thing running. > > > > update catalog_items > ... > > from ( > >select a.id, a.codice, a.codicealt, > ... > >from > > import.Articoli a > > left join import.ArticoliPropTipo_1 ip on a.id=ip.idArt > > where a.action=8 > >) as s > >where s.id=catalog_items.ItemID > > ; > > > > And I get > > > > DETAIL: Process 7188 waits for ShareLock on transaction > > 110562621; blocked by process 7244. Process 7244 waits for > > ShareLock on transaction 110562544; blocked by process 7188. 
> > > > On that table 2 triggers are defined: > > You left out the actual trigger definitions ;) create trigger FT1IDX_catalog_items_update_trigger after insert or update on catalog_items for each row execute procedure FT1IDX_catalog_items_update(); create trigger FT1IDX_catalog_brands_update_trigger after update or delete on catalog_brands for each row execute procedure FT1IDX_catalog_brands_update(); > Could it be possible that you accidentally call the wrong trigger > on update of catalog_items? > Another possibility is that the trigger on catalog_items has a > side-effect of updating catalog_brands - which in turn updates > catalog_items again, causing your situation. Is this the case? My intention was that when catalog_brands.brandid get changed catalog_items.FT1IDX get updated > > create or replace function FT1IDX_catalog_items_update() returns > > trigger as > > $$ > > declare > > _Name varchar(64); > > begin > > select into _Name Name from catalog_brands > > where BrandID=new.BrandID; > > new.FT1IDX:= > > GetFTIDX('pg_catalog.english', new.Code, new.CodeAlt, > > new.ISBN, new.Name, new.Authors, _Name); > > return new; > > end; > > $$ language plpgsql volatile; > > > > create or replace function FT1IDX_catalog_brands_update() returns > > trigger as > > $$ > > begin > > if(TG_OP='DELETE') then > >update catalog_items set > > FT1IDX= > > GetFTIDX('pg_catalog.english', Code, CodeAlt, ISBN, Name, > > Authors, '') where BrandID=old.BrandID; > >return old; > > else > >if(coalesce(new.Name,'')<>coalesce(old.Name,'')) then > > update catalog_items set > >FT1IDX= > > GetFTIDX('pg_catalog.english', Code, CodeAlt, ISBN, > > Name, Authors, new.Name) where BrandID=new.BrandID; end if; > > end if; > > return new; > > end; > > $$ language plpgsql volatile; > > > > What could it be? how can I fix it? 
thanks -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] share lock when only one user connected?
On Thu, 28 Oct 2010 13:57:18 -0400 Tom Lane wrote: > Ivan Sergio Borgonovo writes: > > I'm running this query when I'm the only user and this should be > > the only thing running. > > > And I get > > > DETAIL: Process 7188 waits for ShareLock on transaction > > 110562621; blocked by process 7244. Process 7244 waits for > > ShareLock on transaction 110562544; blocked by process 7188. > > It's pretty hard to believe that you haven't got two processes ... > maybe there's something using dblink down in there somewhere? > > Also, if you're using a recent PG release, the postmaster log > should contain additional information about the deadlock, like the > specific queries involved on both sides. I can't think of any other process running concurrently... but yeah I'm surprised as well, and no there is no dblink running. I may be wrong in assuming there is no other process running and I'll check further, but at least I'm pretty sure there is no query writing to that table other than the update. These deadlocks start to happen when the number of involved rows is above 80K, and the chance of them happening grows as the number of rows involved grows (not a big surprise). The chance decreases if I drop the triggers. This may just come down to the probability that two queries clash, since leaving the triggers in place increases the execution time. There is nothing else in the log other than: 2010-10-26 18:12:09 CEST ERROR: deadlock detected 2010-10-26 18:12:09 CEST DETAIL: Process 9729 waits for ShareLock on transaction 110146905; blocked by process 11082. Process 11082 waits for ShareLock on transaction 110145470; blocked by process 9729. Followed by the update statement... How am I going to find which statements were locking each other? thanks -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
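While a lock is actually being held, the two sides can also be matched up by joining pg_locks against pg_stat_activity; a sketch using the 8.3-era column names (procpid, current_query):

```sql
-- Pair each ungranted transaction-id lock with the session holding it,
-- showing the statement on both sides of the wait.
SELECT w.pid AS blocked_pid,  a1.current_query AS blocked_query,
       h.pid AS blocking_pid, a2.current_query AS blocking_query
FROM pg_locks w
JOIN pg_locks h
  ON h.transactionid = w.transactionid
 AND h.granted AND NOT w.granted
JOIN pg_stat_activity a1 ON a1.procpid = w.pid
JOIN pg_stat_activity a2 ON a2.procpid = h.pid;
```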
[GENERAL] share lock when only one user connected?
I'm running this query when I'm the only user and this should be the only thing running. update catalog_items set Code=s.codice, CodeAlt=s.codicealt, BrandID=s.marca, Name=initcap(s.nome), Description=s.desc_l1, qty=coalesce(s.quantita, 0), ListPrice=coalesce(s.prezzoListino, 0) , valIva=s.valIva, ivaInc=s.ivaInc, StatusID=coalesce(s.stato, 1) , Weight=s.peso, dataInserimento=s.dataInserimento, daPub=s.daPub, BestSeller=s.bestSeller, posInMag=s.posInMag , ISBN=s.ISBN, dataPub=coalesce(s.dataPubblicazione, datapub), updated=now() from ( select a.id, a.codice, a.codicealt, a.marca, a.nome, a.desc_l1, a.quantita, a.prezzoListino, a.valIva, a.ivaInc, a.stato, a.peso, a.dataInserimento, a.daPub, a.bestSeller, a.posInMag, a.ISBN, ip.dataPubblicazione from import.Articoli a left join import.ArticoliPropTipo_1 ip on a.id=ip.idArt where a.action=8 ) as s where s.id=catalog_items.ItemID ; And I get DETAIL: Process 7188 waits for ShareLock on transaction 110562621; blocked by process 7244. Process 7244 waits for ShareLock on transaction 110562544; blocked by process 7188. On that table 2 triggers are defined: create or replace function FT1IDX_catalog_items_update() returns trigger as $$ declare _Name varchar(64); begin select into _Name Name from catalog_brands where BrandID=new.BrandID; new.FT1IDX:= GetFTIDX('pg_catalog.english', new.Code, new.CodeAlt, new.ISBN, new.Name, new.Authors, _Name); return new; end; $$ language plpgsql volatile; create or replace function FT1IDX_catalog_brands_update() returns trigger as $$ begin if(TG_OP='DELETE') then update catalog_items set FT1IDX= GetFTIDX('pg_catalog.english', Code, CodeAlt, ISBN, Name, Authors, '') where BrandID=old.BrandID; return old; else if(coalesce(new.Name,'')<>coalesce(old.Name,'')) then update catalog_items set FT1IDX= GetFTIDX('pg_catalog.english', Code, CodeAlt, ISBN, Name, Authors, new.Name) where BrandID=new.BrandID; end if; end if; return new; end; $$ language plpgsql volatile; What could it be? 
how can I fix it? -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] exceptionally large UPDATE
On Thu, 28 Oct 2010 08:58:34 -0400 Vick Khera wrote: > On Wed, Oct 27, 2010 at 10:26 PM, Ivan Sergio Borgonovo > wrote: > > I'm increasing maintenance_work_mem to 180MB just before > > recreating the gin index. Should it be more? > > > > You can do this on a per-connection basis; no need to alter the > config file. At the psql prompt (or via your script) just execute > the query > > SET maintenance_work_mem="180MB" > If you've got the RAM, just use more of it. I'd suspect your server > has plenty of it, so use it! When I reindex, I often give it 1 or > 2 GB. If you can fit the whole table into that much space, you're > going to go really really fast. > Also, if you are going to update that many rows you may want to > increase your checkpoint_segments. Increasing that helps a *lot* > when you're loading big data, so I would expect updating big data > may also be helped. I suppose it depends on how wide your rows > are. 1.5 Million rows is really not all that big unless you have > lots and lots of text columns. Actually I'm pretty happy with the performance of the DB under normal circumstances. I never investigated too much whether I could squeeze it more. But when I have to deal with such "huge" updates the performance is painful. You made me start to wonder if I could improve performance even under normal load. But right now I have to take care of this huge (well, "huge" only relative to the performance I'm getting right now) update. The things I've touched compared to the stock configuration were:

max_connections = 100
shared_buffers = 240M
work_mem = 42MB
maintenance_work_mem = 180MB #(generally it is 40MB)
# these were touched as of autovacuum suggestion
max_fsm_pages = 437616
max_fsm_relations = 1200
checkpoint_segments = 35
random_page_cost = 3.0
default_statistics_target = 30
log_min_duration_statement = 1000

The box is running apache; the total average length of text for each row should be around 1Kb on the largest table. 
What I'm planning to do is:

max_connections = 5
shared_buffers = 240M
work_mem = 90MB
maintenance_work_mem = 1GB
max_fsm_pages = 437616
max_fsm_relations = 1200
checkpoint_segments = 70
default_statistics_target = 30
#log_min_duration_statement = 1000

Any improvement? -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] exceptionally large UPDATE
I have a large UPDATE to make to a DB. The largest UPDATE involves a table that has triggers and a gin index on a computed tsvector. The table is 1.5M records with about 15 fields of different types. I've roughly 2.5-3Gb of ram dedicated to postgres. The UPDATE queries are simple; few of them use joins, and they mainly consist of updating records from temporary tables that have a structure very similar to the target. These updates are rare, so I can afford to tune postgresql just for this large update and then return to a more balanced configuration. I can even afford to be the only user of the DB, so responsiveness of the application using the DB is not an issue. Duration of the update is. Anything I can tune in postgresql.conf to speed up the UPDATE? I'm increasing maintenance_work_mem to 180MB just before recreating the gin index. Should it be more? The update should be monolithic and it is inside a single transaction. Since I can afford to be the only user of the DB for a while, is there anything I can tune to take advantage of it? What else could I change to speed up the update? The triggers recreate the tsvector. One of the components of the tsvector is taken from a join table. I'll surely drop the gin index and recreate it when everything is over. I'm not sure it's a good idea to drop the triggers, since I'll have to update the tsvector later and I suspect this will cause twice the disk IO. thanks -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] huge difference in performance between MS SQL and pg 8.3 on UPDATE with full text search
On Tue, 28 Sep 2010 20:19:10 +0200 Alban Hertroys wrote: > On 28 Sep 2010, at 12:49, Ivan Sergio Borgonovo wrote: > > The hardware on the 2 machines is a bit different. > > MS SQL 2005 runs on a box that is 2 years old, 2 SATA drives on > > RAID 1 hw, 2 Xeon dual core (I can't check details right now) > > PG runs on a box that has more than 5 years, 3 SCSI drives on > > RAID 5 hw, 2 Xeon single core HT (Intel(R) Xeon(TM) CPU 3.20GHz, > > cpu fam 14, model 4) > > In both tables I've to update price. > > VACUUM FULL was performed just before updating the prices. > VACUUM FULL? Was that really necessary? You did REINDEX after > that, didn't you? If not, your indexes became bloated. If the > table wasn't empty before you probably meant to do a VACUUM > ANALYSE, but if it was, just ANALYSE would have been sufficient. ... > Did you ANALYSE between loading the data and updating? I thought VACUUM FULL was more "magical" and implied a REINDEX. Am I wrong? The index that should be reindexed is the one on the pk, a simple btree; that's not going to be as slow as rebuilding a gin... still, I'd really thought that VACUUM FULL implied a lot of things (including ANALYZE as well). > Also, executing the trigger on each copied line is likely to be a > little slow, due to the overhead of calling a stored procedure > (especially if it's plpgsql). It's probably quite a bit faster to > disable the trigger and create the gin-index after loading the > data. I'll try to drop the trigger. I'm not expecting it to be the biggest factor, but still, adding something here and something there may add up to the huge difference between the 2. Anyway MS SQL seems to overcome all these nuisances auto-magically. > An EXPLAIN ANALYSE of that statement would tell what it's doing > and what's taking so long. A simple EXPLAIN would probably be > sufficient to see what query plan it thinks it needs though. I'll post the EXPLAIN. Before I run EXPLAIN ANALYZE I have to take some precautions so the DB doesn't explode. 
> Did you tune that database? Several options (work_mem for example) > could significantly improve your performance if you can set them > higher (or reduce it if you set them too high). You can do that > per session too. pg is generally faster than the other MS SQL box on what's normally done on a daily basis. Just large updates to the product page seems to be a pain. Other INSERT/UPDATE operations are seldom performed, they involve smaller tables with no gin index. > > If HW can justify such huge difference I'll devote my time to > > other problems. > Partially, yes, but not that much I think. That's my worry... but still in many circumstances pg performs better than the MS SQL box... yeah... on pretty different workload... but while on other workloads pg is a bit faster (20% to 100% faster) even if it is on an older box, on this one is very slow. > > I'd say that a potential culprit could be the gin index. No > > matter if the tsvector is updated or not, if the row is changed > > I think the index is going to be updated anyway. > gin indexes require relatively much RAM. If you didn't assign much > in your settings then it's quite possible that the database can't > keep the index in memory or that things have to spill to disk. > Leave enough room for the OS's disk cache though, Postgres > benefits from that as well. > Is there any particular reason you went with a gin index and not a > gist one? Gin can be faster, but consumes (much) more memory, but > gist is also quite good with text searches and doesn't require > quite as much memory. gin index is doing a very good work and well full text searches are the typical workload of that box and the one that is more important to be fast. I'd say if gin was occupying so much memory performances wouldn't be so good on a daily basis. I'd post excerpt of my postgres.conf (what's important here?) 
and see if anything can be improved for *this* workload, and temporarily tune the DB for this exceptional update. Still, I'm going to remain a bit surprised at such a huge difference even if it turns out that it was actually a .conf not suited to this workload. BTW the box is running Apache and php. thanks -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] huge difference in performance between MS SQL and pg 8.3 on UPDATE with full text search
I know I'm comparing apples and oranges but still the difference in performance was quite astonishing. I've 2 tables that look like:

create table products(
  id bigint,
  price double precision, /* legacy, don't ask */
  sometextfield1 varchar(128),
  sometextfield2 varchar(128),
  ...
);

one on a MS SQL 2005 and another one on pg 8.3. MS SQL has full text search on the text fields (I don't know the details). The pg products table has a tsvector field and a gin index defined on it + a trigger that updates the tsvector field when the text fields change. The trigger is written so that it actually updates the tsvector only if the text fields changed. The hardware on the 2 machines is a bit different. MS SQL 2005 runs on a box that is 2 years old, 2 SATA drives on RAID 1 hw, 2 Xeon dual core (I can't check details right now) PG runs on a box that has more than 5 years, 3 SCSI drives on RAID 5 hw, 2 Xeon single core HT (Intel(R) Xeon(TM) CPU 3.20GHz, cpu fam 14, model 4) Both have 4Gb of ram. shared_buffers is 240Mb. Both share a similar workload. Both boxes were in the same "price class" when they were bought. In both tables I have to update price. VACUUM FULL was performed just before updating the prices. MS SQL receives a large sql file that contains all the UPDATE statements. PG receives a csv file that is loaded into a table with COPY and then does the update as

update products set price=p.price
  from temp_price p
  where products.id=p.id and products.price<>p.price;

MS SQL ingurgitates the whole sql file in around 10sec. pg takes more than 5 min just to run the single update statement. I'd like to know whether such a large difference can be justified by the HW difference, by a difference in the process of how the data are loaded [1], by a difference in performance of the 2 servers on this kind of workload, or by some postgres config, before I decide how to manage my time to redesign the import procedure. If HW can justify such a huge difference I'll devote my time to other problems. 
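One way to keep the gin index from being maintained row by row during such a bulk update is to drop it and rebuild it around the statement. A sketch (the index name, the tsvector column name, and doing the rebuild inside the same transaction are all assumptions, patterned on the description above):

```sql
BEGIN;
DROP INDEX products_tsv_gin;                  -- hypothetical index name
UPDATE products p SET price = t.price
FROM temp_price t
WHERE p.id = t.id AND p.price <> t.price;
CREATE INDEX products_tsv_gin
    ON products USING gin (tsv);              -- hypothetical tsvector column
COMMIT;
ANALYZE products;                             -- refresh planner statistics
```

Raising maintenance_work_mem for the session before the CREATE INDEX step, as discussed elsewhere in these threads, speeds up the rebuild considerably.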
I'd say that a potential culprit could be the gin index. No matter whether the tsvector is updated or not, if the row is changed I think the index is going to be updated anyway. Somehow MS SQL circumvents this problem, possibly by building the equivalent of a tsvector column in a "hidden" table that automatically joins to the "text" table. This adds a join but reduces the cost of table modification, since simpler (btree) indexes are faster to update. Still, huge updates are rare and that table is mostly read and very rarely written. During unusually huge updates I may consider dropping the gin index. [1] I'd expect that, excluding the time it takes to load the csv, a single update should run faster than a huge list of single-statement updates -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] MySQL versus Postgres
On Fri, 13 Aug 2010 14:17:17 +0800 Craig Ringer wrote: > On 13/08/10 08:38, Tatsuo Ishii wrote: > >> It's slower than smaller numbers, and if you actually dirty a > >> significant portion of it you can have a checkpoint that takes > >> hours to sync, completely trashing system responsiveness for a > >> good portion of it. > > > > So how much is the reasonal upper limit of shared_buffers at this > > point? If it's obvious, should we disable or warn to use more > > than that number? > > Trouble is, there won't be a "reasonable upper limit" ... because > it depends so much on the ratio of memory to I/O throughput, the > system's writeback aggressiveness, etc etc etc. > > Personally I've had two Pg machines where one seems to suffer with > shared_buffers > 250MB out of 4GB and the other, which has 8GB of > RAM, wants shared_buffers to be around 4GB! The main difference: > disk subsystems. What about the ratio of R/W? If it is a mostly read system is the memory/IO throughput still a limiting factor for increasing shared_buffers? -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] delete query taking way too long
On Thu, 12 Aug 2010 12:50:49 +0100 Thom Brown wrote: > On 12 August 2010 12:14, Ivan Sergio Borgonovo > wrote: > > I've > > delete from catalog_items where ItemID in (select id from > > import.Articoli_delete); > > > > id and ItemID have an index. > > > > catalog_items is ~1M rows > > Articoli_delete is less than 2K rows. > > > > This query has been running for roughly 50min. > > Right now it is the only query running. > > > > PostgreSQL 8.3.4 on x86_64-pc-linux-gnu, compiled by GCC cc (GCC) > > 4.1.2 20061115 (prerelease) (Debian 4.1.1-21) > > > > -- > You can try to do deletes in batches of 10,000: ItemID is a PK. Even if there is no constraint on Articoli_delete.id they *should* be unique. There are some other tables with indexes... but I don't expect that more than 4K rows for each table will be deleted. There are a couple of other large (1M rows) tables where an on delete cascade is defined. This is the query plan:

Nested Loop  (cost=30.07..10757.29 rows=1766 width=6)
  ->  HashAggregate  (cost=30.07..47.73 rows=1766 width=8)
        ->  Seq Scan on articoli_delete  (cost=0.00..25.66 rows=1766 width=8)
  ->  Index Scan using catalog_items_pkey on catalog_items  (cost=0.00..6.05 rows=1 width=14)
        Index Cond: (catalog_items.itemid = articoli_delete.id)

BTW it is happening again... after I stopped pg, restarted the whole server and re-ran the query. This query generally goes unnoticed inside a longer process, but I doubt it ever lasted more than a couple of minutes in the past. -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
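Thom's batching suggestion could be sketched like this (the batch size is his suggested 10,000; joining back against catalog_items selects only rows that still exist, so repeated runs make progress — repeat until the command reports DELETE 0):

```sql
-- One batch: delete up to 10000 of the still-present matching rows.
DELETE FROM catalog_items
WHERE itemid IN (
    SELECT ci.itemid
    FROM catalog_items ci
    JOIN import.articoli_delete d ON d.id = ci.itemid
    LIMIT 10000
);
```

Running each batch in its own transaction keeps individual locks short-lived.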
Re: [GENERAL] delete query taking way too long
On Thu, 12 Aug 2010 17:14:17 +0530 Jayadevan M wrote: > > I've > > delete from catalog_items where ItemID in (select id from > > import.Articoli_delete); > Does catalog_items have child tables where the FK columns are not > indexed? Regards, Possibly, but very small ones. What I forgot to say is... that query has always worked reasonably fast in the past. The size of the DB didn't grow substantially recently. I'd say the query shouldn't be the problem... the question should have been: what should I look at when postgresql starts to behave strangely? e.g. missing resources, locks, already-fixed bugs (it is a reasonably old version)... -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
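Jayadevan's question — child tables whose FK columns lack an index — can be checked from the system catalogs. A sketch that lists every foreign key referencing catalog_items (whether each referencing column is actually indexed then needs a look at \d or pg_index):

```sql
-- List constraint name, referencing table and referencing column(s)
-- for every foreign key that points at catalog_items.
SELECT con.conname,
       con.conrelid::regclass AS referencing_table,
       att.attname            AS referencing_column
FROM pg_constraint con
JOIN pg_attribute att
  ON att.attrelid = con.conrelid
 AND att.attnum   = ANY (con.conkey)
WHERE con.confrelid = 'catalog_items'::regclass
  AND con.contype   = 'f';
```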
[GENERAL] delete query taking way too long
I've delete from catalog_items where ItemID in (select id from import.Articoli_delete); id and ItemID have an index. catalog_items is ~1M rows Articoli_delete is less than 2K rows. This query has been running for roughly 50min. Right now it is the only query running. PostgreSQL 8.3.4 on x86_64-pc-linux-gnu, compiled by GCC cc (GCC) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21) -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Which CMS/Ecommerce/Shopping cart ?
On Thu, 29 Jul 2010 14:57:04 -0400 Tom Lane wrote: > Samantha Atkins writes: > > Why is MySQL so much more popular right now, especially in the > > OpenSource community? > > I think it's strictly historical. The mysql bias you see in so > many web tools was established in the late 90s, a time when mysql > worked reasonably well (at least according to the mysql developers' > notion of "reasonably well") whereas postgres was still pretty slow > and buggy. It took us a long time to get from the original > academically-oriented code to something of real production quality. > We're definitely competitive now, but I don't know if we'll ever > fully overcome that historical disadvantage. How popular is Visual Basic right now? And even if it was more popular than C#... what kind of application would you expect to find that start development right now in VB? -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Which CMS/Ecommerce/Shopping cart ?
On Thu, 29 Jul 2010 08:52:46 -0700 "Joshua D. Drake" wrote: > The issue isn't Drupal. It is modules. There are a lot of popular > modules that do not work with PostgreSQL (Lightbox for example). > The google checkout module for Ubercart didn't work either until > relatively recently. I'd say the opposite, but I'll wait to test D7 more. Core takes ages to agree on what should be done to fix bugs for Postgres without hurting even the "feelings" of MySQL developers. Modules may have more problems, but fixing them is generally trivial and upstream is usually quick to integrate the fix. With core, the problem is maintaining your patches until (and if) they fix the bug. I agree that PHP and MySQL are a perverse match. Still, if he plans to deploy stuff as "commodity" software they are a necessary evil. The problem arises when you're in-between custom and RAD. Anyway, more python/django-based CMSes are flourishing... and given that Django's originally supported DB was Postgres... http://www.django-cms.org/ [1] The migration of The Onion from Drupal/Mysql -> Django/Postgresql is emblematic. [1] I think I could make a quick benchmark if possible on postgresql and mysql -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Which CMS/Ecommerce/Shopping cart ?
On Wed, 28 Jul 2010 22:24:07 -0700 "Joshua D. Drake" wrote: > On Thu, 2010-07-29 at 07:04 +0200, Ivan Sergio Borgonovo wrote: > > > BTW as far as I remember Django suggests postgres. I haven't seen > > any benchmark of Django with pg vs mysql. > Django was originally developed for Postgres but really, they are > wholly different beasts. You're right. It would be nice to see benchmarks of any CMS developed with Django on postgresql and on mysql. I tried to find benchmarks of Plone on postgres vs mysql. I'd tend to think (and I may be wrong) that as a rule of thumb, everything else being equal, mysql is more suited to "commodity" CMSes just because it is easier to find coupled with php in hosting (and this reflects on communities etc...). Still, it would be nice to also put to rest the myth that mysql is better for CMSes, since they are read-mostly apps. But then... there are no popular [anything but php] CMSes, but there are a lot of [anything but php] web frameworks. You start with a pre-packaged web application that looks like a framework, then you start to write custom code, then you start to have more impedance mismatch... The more you have to code, the more you will prefer a framework and postgres... but if you've coded that much it means you can afford to write your own web application on top of a framework and run your own box (no hosting). BTW which one of the examples you posted uses ubercart? I'd be curious how many concurrent operations on the basket http://www.commandprompt.com/portal/ has... -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Which CMS/Ecommerce/Shopping cart ?
On Wed, 28 Jul 2010 18:56:56 -0400 Greg Smith wrote: > Ivan Sergio Borgonovo wrote: > > Are there companies that offer drupal/postgres tuning? > I am quite sure that Command Prompt would be happy and fully > prepared to sell you Drupal + PostgreSQL tuning services. We also > have some projects around it, and I'm sure other consulting > companies or individuals do too. I'd predict that if you sent a > message to pgsql-jobs saying you're looking to hire someone for > that sort of work, you'd get a stack of responses from qualified > people in the PostgreSQL community. Sure. What I haven't been able to spot are Drupal companies that do Drupal tuning when it is running with postgres. Of course, here on the pg mailing list it is not hard to find companies that won't refuse to tune postgres even if you use it for Drupal ;) BTW, as far as I remember, Django suggests postgres. I haven't seen any benchmark of Django with pg vs mysql.
Re: [GENERAL] Which CMS/Ecommerce/Shopping cart ?
On Wed, 28 Jul 2010 12:45:47 -0700 "Joshua D. Drake" wrote: > On Thu, 2010-07-29 at 00:36 +0530, Sandeep Srinivasa wrote: > > yup I did. The reason why I wanted examples was to amply > > demonstrate,to clients, that postgresql is viable. > > It is kinda weird if the only examples I have are restricted to > > the postgresql _community_ websites themselves. > > > Well you are kind of asking in the wrong place. You should be > asking in #drupal, #drupal-support, #drupal-ubercart or in the > Drupal forums. Well, he will spend most of his time there filtering out people bashing postgres. > > Plus, it would have been interesting to know which version of > > Drupal, Ubercart, etc was being used for such deployments. > > Again, it is relevant because of certain (older) benchmarks > > which denote significantly worse performance because of the > > suboptimal way that > Latest 6.x release and latest Ubercart release. > > Drupal integrates with Postgresql : > > http://mikkel.hoegh.org/blog/2008/oct/13/drupal-database-performance-mysql-postgresql/ > > There has been _nothing_ to disprove the above numbers, ever > > since - please correct me if I am wrong. > You should read that "whole" blog. PostgreSQL does very well in > consideration of the environment. I would also note that there is > no reference to whether or not he tuned PostgreSQL or not. > I have zero problems running Drupal with PostgreSQL and getting > great performance but then again I know enough to tune both > Drupal, PHP and PostgreSQL. Most people can't say that (I am not > saying you can't). I'm happy with PostgreSQL and Drupal too, and so far I haven't had to worry too much about performance. D7 should support many things that make using Postgres more sensible. I had to tweak D5 and D6 core to make them work with Postgres as I needed... the problem is it takes a lot of time for postgres-related patches to get into core. 
Modules that are worth using generally have responsive maintainers; fixes and releases come much faster. Still, I'd say that if you don't have any specific reason to use postgresql (you have to access data from another app using postgres, you need some special feature (full text, GIS), you have a lot of writes to the DB...), mysql would be a better choice, assuming equal knowledge of both. Are there companies that offer drupal/postgres tuning? > That is a tough one. I mean, prove it to him. Set up Drupal with > MySQL/Innodb and setup Drupal with PostgreSQL and do some tests. > You can also look for things like this: > > http://www.commandprompt.com/blogs/joshua_drake/2010/07/multiple_drupal_installations_single_login_10_steps/ Schemas in postgres with Drupal are great. Using: http://www.webthatworks.it/d1/content/howto-duplicating-schema-postgresql and http://www.webthatworks.it/d1/content/excluding-some-tables-data-backup-including-their-schema makes it a breeze to duplicate sites. And you still keep all triggers, PKs, FKs, on delete cascade...
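The duplication recipe from those links boils down to something like this (a sketch only; the database and schema names are made up, and the exact steps in the linked howtos may differ):

```shell
# Temporarily rename the source schema, dump it under the new name,
# rename it back, then restore the dump: the copy keeps triggers,
# PKs, FKs and cascades because pg_dump recreates them all.
psql drupaldb -c 'ALTER SCHEMA site1 RENAME TO site2'
pg_dump drupaldb -n site2 -f site2.sql
psql drupaldb -c 'ALTER SCHEMA site2 RENAME TO site1'
psql drupaldb -f site2.sql
```

Best done while nothing else is writing to the schema, since the rename is visible to concurrent sessions.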
Re: [GENERAL] resource management, letting user A use no more than X resource (time, CPU, memory...)
On Fri, 16 Jul 2010 19:43:01 +0800 Craig Ringer wrote: > On 16/07/10 19:21, Ivan Sergio Borgonovo wrote: > > If I'd like to learn how to manage resources in postgres and > > grant different users different time slot/memory/CPU? > > > > eg. I'd like to grant to user A to execute queries that last less > > than 1min or that occupy no more than X Mb... etc... > > PostgreSQL doesn't really offer much in the way of features for > per-user resource control, resource impact isolation, etc. > > You can potentially run different PostgreSQL instances > (postmasters, not just databases) in different domains of a > virtualization or resource-control setup, but that's pretty > inefficient, adds a lot of admin work, and doesn't help if your > users need to be able to use the same database(s). > If you need strong user resource limits, user storage limits, etc > PostgreSQL might not be your best option. There are some things > you can do, but there's not much. What about an external process that monitors backends and kills them gracefully when they use too many resources, based on the user tied to each backend? Or gluing together a load-balancing solution that, depending on the user, diverts connections to different slaves with slightly different setups?
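A minimal sketch of both ideas using the knobs that do exist (the role name and thresholds are made up; on 8.x the pg_stat_activity pid column is called procpid):

```sql
-- Per-role limits: crude, but built in.
ALTER ROLE report_user SET statement_timeout = '60s';  -- abort queries over 1 min
ALTER ROLE report_user SET work_mem = '16MB';          -- cap memory per sort/hash

-- External-watchdog flavor: cancel this user's long-running queries.
SELECT pg_cancel_backend(procpid)
FROM pg_stat_activity
WHERE usename = 'report_user'
  AND now() - query_start > interval '1 min';
```

statement_timeout covers the time budget; memory and CPU can only be influenced indirectly (work_mem, connection limits, OS-level tools), as Craig says.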
[GENERAL] resource management, letting user A use no more than X resource (time, CPU, memory...)
I'd like to learn how to manage resources in postgres and grant different users different time slots/memory/CPU. E.g. I'd like to grant user A the right to execute only queries that last less than 1 min or that occupy no more than X MB... etc...
Re: [GENERAL] Need Some Recent Information on the Differences between Postgres and MySql
On Fri, 25 Jun 2010 08:48:11 -0700 Rob Wultsch wrote: > The freedom of the storage engine interface allows for much more > varied backend characteristics. Some examples: This is *really* fascinating, but pg's transactional engine is very mature and solid. Before any of MySQL's should-be-really-transactional engines is as mature as pg's, pg will have a lot more feature/speed/development sugar. If I didn't need a transactional engine I'd be happy to go (well, not really happy[1]) with MySQL and any of its engines. It is nice to reuse and mix and match different tools. [1] I find some inconsistencies of mysql a bit painful to live with... as I find some inconsistencies in PHP equally irritating, but they are there, quite widespread, and still, for certain domains, the best compromise available.
Re: [GENERAL] A thought about other open source projects
On Mon, 21 Jun 2010 11:27:20 -0700 David Fetter wrote: > On Mon, Jun 21, 2010 at 01:55:36PM -0400, Brad Nicholson wrote: > > Scott Marlowe wrote: > > >>As with phrases like, "the quickest way to grill a unicorn > > >>steak," that it can be stated in a few words does not make in > > >>possible. > > > > > >Exactly. The big issue here is that nobody's saying what kind > > >of app they want to write. > > Or what sort of performance requirements are tied to that app. > > It's not performance requirements that tend to tank such projects, > but the amount of maintenance involved. Extending the app gets > quadratically painful. I perfectly agree. But maybe the number of hackers involved in a project could grow faster if the project served more purposes. Reuse lowers maintenance costs too.
Re: [GENERAL] A thought about other open source projects
On Sun, 20 Jun 2010 12:52:22 +0200 Thomas Kellerer wrote: > David Goodenough wrote on 20.06.2010 11:08: > > I don't support anyone has written a "how to write database > > agnostic code" guide? That way its not a matter of porting, > > more a matter of starting off right. > > I don't believe in "database agnostic code". It depends on what you're after. There are applications that really use a DB as a SQL interface to the file system (sort of). Coding non-standard SQL for that kind of application (and there are many out there) is just a different practice of BDSM. Still, even when you deal with applications that can really take advantage of the various special features of some DB and its SQL dialect... 80% of the written SQL could be standard. Sometimes people write abstraction code to make it easier to write in the only SQL dialect they know, for a very small coding advantage, when they could write a better abstraction that at least wouldn't make it a pain to port the application. When thinking about portable code I generally find this document useful: http://sql-info.de/postgresql/postgres-gotchas.html
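As a trivial illustration of the "80% could be standard" point (the table and column names are invented):

```sql
-- The same cast, two spellings: sticking to the standard form
-- keeps that part of the query portable to other engines.
SELECT x::bigint FROM t;          -- PostgreSQL shorthand
SELECT CAST(x AS BIGINT) FROM t;  -- SQL standard
```

The dialect form buys nothing here, so there is no reason not to write the portable one.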
[GENERAL] how to alias a table
I'm refactoring some code and I'd find it helpful to be able to alias tables. What I'd like to do is refer to the same table through an alias in the code and later substitute the alias with an actual VIEW. Of course I could define a view as select * from original_table right from the start, but I'm worried this may incur some overhead I currently can't afford. thanks
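For what it's worth, a pass-through view is just a stored rewrite rule that the planner expands inline, so something like this (names invented) should plan identically to querying the table directly:

```sql
-- Start with a pure alias...
CREATE VIEW items_alias AS
SELECT * FROM original_table;

-- ...and later swap in the real definition without touching callers:
-- CREATE OR REPLACE VIEW items_alias AS SELECT ... ;
```

The overhead of the extra rewrite step is negligible for simple views; it is only when the replacement view adds joins or functions that query plans change.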
[GENERAL] Debian: upgrading from was: What Linux edition we should chose?
On Mon, 31 May 2010 17:23:51 +0200 Szymon Guz wrote: > > Yeah it is. But what is it going to be an upgrade process? On a > > production box? > > Any experience to share on upgrading from source on Debian? > Usually that's pretty easy: for upgrading the minor version (e.g. > from 8.3.1 to 8.3.3) it should be enough to compile the new > sources, stop server, run `make install` and run the server with > new binaries. Upgrading from 8.3 to 8.4 can be easily done using > dump from current version. There is nothing wrong to run the new > and old postgres versions parallel so you can copy data from one > database to another. There is also pgmigrator, but I haven't > checked that yet. That's clear, but there are a bunch of small and possibly very annoying details that make deploying in production a bit more challenging than ./configure, make, make install. I admit I have only compiled postgres in my /home, when I was developing an extension. It is something I do rarely and never on production. If I were thinking of upgrading from source on a debian box that is already running a packaged version, I'd have to understand how to deal with the debian patches (I think most were related to the paths where postgres expects to find its stuff). Once I understand what all the debian patches do, I'll see if I can avoid them all so that upgrading will be easier the next time. I'll have to see how debian ./configures the package, I'll have to replicate the init.d script for the newer version, take care of making the 2 servers run temporarily on different ports... etc... I could even think of making a .deb. The more I think about it, the longer the list of things I should do gets. I bet I'm not the first one who is going to upgrade Debian from source, so someone may share his recipe and caveats. I was actually thinking of testing 9.0 in my /home on some real world DB. That could be a chance to learn how to upgrade from source. 
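The least invasive variant I can think of (paths, version and port are invented, and this sidesteps the Debian patches entirely by not touching the packaged install) is to keep the source build under its own prefix and port:

```shell
# Build and install under a private prefix, leaving the Debian
# package's binaries, paths and init script untouched.
./configure --prefix=/opt/pg-9.0
make && make install

# Separate cluster, separate port: both servers can run in parallel.
/opt/pg-9.0/bin/initdb -D /opt/pg-9.0/data
/opt/pg-9.0/bin/pg_ctl -D /opt/pg-9.0/data -o '-p 5433' start
```

From there, dump from the packaged 8.x on 5432 and restore into 5433 to test; an init.d script for the new instance still has to be written by hand.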
Re: [GENERAL] What Linux edition we should chose?
On Mon, 31 May 2010 08:47:25 -0600 Scott Marlowe wrote: > On Mon, May 31, 2010 at 2:29 AM, Michal Szymanski > wrote: > > Hi, > > Currently we use Debian, but it chosen by our OS admnistrator. > > Now we can change our OS and it is question what Linux edition > > will be the best. We would like have access to new versions of > > Postgres as soon as possible, for Debian sometimes we had to > > wait many weeks for official packages. > > Pgsql is pretty easy to build from source. Yeah, it is. But what is the upgrade process going to look like? On a production box? Any experience to share on upgrading from source on Debian?
Re: [GENERAL] unable to avoid a deadlock at the end of a long transaction
On Fri, 07 May 2010 10:29:20 -0400 Tom Lane wrote: > Ivan Sergio Borgonovo writes: > > I've been having this: > > psql:include/custom/import_update.custom.sql:63: ERROR: deadlock > > detected DETAIL: Process 13349 waits for AccessExclusiveLock on > > relation 250510 of database 248569; blocked by process 14153. > > Process 14153 waits for ShareLock on transaction 59160779; > > blocked by process 13349. CONTEXT: SQL statement "drop trigger > > if exists FT1IDX_catalog_brands_update_trigger on > > catalog_brands" PL/pgSQL function "ft1idx_trigger_drop" line 2 > > at SQL statement > I'd suggest not using DROP TRIGGER in operations that need to run > concurrently with other accesses to the same table. Consider > fixing things so the trigger is always there but knows enough to > not do anything when it doesn't need to. That's nice to know... but even skipping the whole drop/create trigger thing, the lock problem is still there and still happens near the end of a long transaction that does a lot of other work on mainly one table. The statement that causes the lock is not always the same; what is "constant" across several modifications of the overall transaction is: the lock happens near the end of the transaction. I'd say that that *should* be the only transaction *writing* to the few tables involved, some of which are very small (hundreds of records). I expect (which doesn't mean I know) that from a writing point of view the overall transaction doesn't involve any write concurrency. So I thought I wouldn't run into reproducible locking problems on *random* statements whose only fault is being near the end of the whole transaction. I need some help learning how to track down this kind of problem. thanks
[GENERAL] unable to avoid a deadlock at the end of a long transaction
I've been having this: psql:include/custom/import_update.custom.sql:63: ERROR: deadlock detected DETAIL: Process 13349 waits for AccessExclusiveLock on relation 250510 of database 248569; blocked by process 14153. Process 14153 waits for ShareLock on transaction 59160779; blocked by process 13349. CONTEXT: SQL statement "drop trigger if exists FT1IDX_catalog_brands_update_trigger on catalog_brands" PL/pgSQL function "ft1idx_trigger_drop" line 2 at SQL statement I reshuffled the update process and I started to have other lock problems. The only process that I'm aware of that should be *writing* to the tables involved is the update process. I'd expect other processes are reading, but only the update should be writing. The lock problem happens near the end of the overall update process, which is one big transaction. Reshuffling the steps a bit doesn't make it go away... it just locks on other statements, but still at the *end* of the process, after a bunch of updates and inserts have been made on a bunch of other tables, the largest of which is an update of roughly 85834 rows in a table containing 1M rows. The only thing that looks constant is: I get a deadlock at the end of a long process on a random statement. Where should I start to solve this? DETAIL: Process 3662 waits for ShareLock on transaction 59301028; blocked by process 4303. Process 4303 waits for ShareLock on transaction 59299342; blocked by process 3662. 
CONTEXT: SQL statement "update catalog_items set Authors= $1 where ItemID= $2 " PL/pgSQL function "updateauthorsall" line 19 at SQL statement

create or replace function UpdateAuthorsAll()
returns void as
$$
declare
  _row record;
  _ItemID bigint;
  _Authors varchar(1024);
  _AuthorsOLD varchar(1024);
  _Name varchar(50);
begin
  _Authors := '';
  _ItemID := null;
  for _row in
    select a.Name, ia.ItemID
      from catalog_itemauthor ia
      join catalog_author a on a.AuthorID = ia.AuthorID
      order by ia.ItemID
  loop
    if (_row.ItemID <> _ItemID) then
      if (length(_Authors) > 2) then
        _Authors := substring(_Authors from 3);
        select into _AuthorsOLD Authors
          from catalog_items where ItemID = _ItemID;
        if (coalesce(_Authors, '') <> coalesce(_AuthorsOLD, '')) then
          update catalog_items set Authors = _Authors
            where ItemID = _ItemID;
        end if;
      end if;
      _Authors := '';
    end if;
    _ItemID := _row.ItemID;
    _Name := trim(E' \t' from _row.Name);
    if (length(_Name) > 0) then
      _Authors := _Authors || ', ' || _Name;
    end if;
  end loop;
  return;
end;
$$ language plpgsql volatile;
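To see who is waiting on whom while the import runs, a self-join on pg_locks along these lines can help (a sketch for 8.3-era catalogs; column names checked against pg_locks, but treat it as a starting point):

```sql
-- Pair each ungranted lock request with the granted lock on the same object.
SELECT waiter.pid AS waiting_pid,
       holder.pid AS holding_pid,
       waiter.locktype,
       waiter.relation::regclass AS relation
FROM pg_locks waiter
JOIN pg_locks holder
  ON holder.locktype = waiter.locktype
 AND holder.database      IS NOT DISTINCT FROM waiter.database
 AND holder.relation      IS NOT DISTINCT FROM waiter.relation
 AND holder.transactionid IS NOT DISTINCT FROM waiter.transactionid
 AND holder.pid <> waiter.pid
WHERE NOT waiter.granted
  AND holder.granted;
```

Run it from a second session while the long transaction is near its end; joining the pids against pg_stat_activity then shows which statements are involved.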
Re: [GENERAL] restore multiple tables postgres
On Wed, 14 Apr 2010 10:56:36 -0400 akp geek wrote: > Dear all - > > Can you please help me with this? Is there a way to restore > multiple tables using a single command from a > whole database dump that was created using pg_dump Something along the lines of: http://www.webthatworks.it/d1/node/page/excluding_some_tables_data_backup_including_their_schema
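With a custom-format dump, the table-of-contents trick makes this a single restore run (database, table names and the grep pattern are illustrative; if I remember correctly, on 8.x pg_restore's -t switch takes only one table, hence the list file):

```shell
pg_dump -Fc mydb -f mydb.dump                  # custom-format dump
pg_restore -l mydb.dump > toc.list             # full table of contents
# Keep only the entries for the wanted tables (pattern depends on your TOC).
grep -E 'TABLE (DATA )?public (orders|customers) ' toc.list > restore.list
pg_restore -d mydb -L restore.list mydb.dump   # restore just those entries
```

The -L file can also be edited by hand, which is handy when you want a table's data but not its indexes, or vice versa.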
[GENERAL] HW and performances
I'm using apache and postgres to make an ecommerce website work on an old xeon box:

processor  : 0
vendor_id  : GenuineIntel
cpu family : 15
model      : 4
model name : Intel(R) Xeon(TM) CPU 3.20GHz

It looks like it has 4 cores, but I think they are 2 cores + HT. 4Gb RAM, 3 disks RAID5 [sic].

free:
             total       used       free     shared    buffers     cached
Mem:       4048024    3984896      63128          0     208304    2645748
-/+ buffers/cache:    1130844    2917180
Swap:      7813112        492    7812620

Swap gets slightly hit after long periods of uptime. Most of the work happens in the items table, ~1M rows. The queries that actually impact usability are: - full text searches on a gin index on a tsvector - updates of the catalogue (and the related gin index) in batches of 20K items updated/inserted. It *seems* (not really sure) that the bottleneck for full text searches is the CPU. I'll have the chance to split the load over a newer box: 2 socket 5120 Xeon box, 4GB RAM, 2 disks RAID0 (single disk IO seems faster than on the box above). I'm planning to move the DB to the newer box and leave apache on the old one. This should free enough RAM on the DB box that swap will be untouched. I'm hoping to halve the average page generation time. I could add RAM or disks to the newer box; changing CPUs doesn't look like a good investment. Did I assign the tasks to the various boxes correctly? Do I have any chance of getting near my target, even adding some more HW to the newer box? If not, I'd think about buying a brand new box. Any suggestion? By comparison, I have a 2 socket x 4 core Xeon E5310 @ 1.60GHz, 8 Gb RAM, 4 SATA disks RAID10, with apache and pg running inside a vserver (2.6.26-2-vserver-amd64), that looks like it can handle both jobs (apache and pg) at a reasonable speed (roughly twice as fast as needed) but is still a bit slower than I'd like on catalogue updates. 
[GENERAL] explicit cast for null::bigint
I've found this a bit strange:

insert into catalog_related (itemid, related_itemid, rank)
  select distinct itemid, null::bigint, 0::float4
  from catalog_categoryitem
  where catid is null or catid<>0
    and itemid not in (select itemid from catalog_related);

create table catalog_related (
  itemid bigint,
  updated timestamp default now(),
  related_itemid bigint,
  rank float4
);

Without the explicit cast for the null I get:

ERROR: column "related_itemid" is of type bigint but expression is of type text
LINE 1: insert into catalog_related (itemid, related_itemid, rank)
                                             ^
HINT: You will need to rewrite or cast the expression.

PostgreSQL 8.3.4 on x86_64-pc-linux-gnu
[GENERAL] make available C extensions to others
I've finished writing an extension to manipulate tsvectors and tsqueries in C. I think it could be useful for someone else, and I think I'd also benefit from someone else looking at the code. What would be the right place to publish the code, and how?
Re: [GENERAL] bug in function arguments "recognition"
On Wed, 03 Mar 2010 10:22:31 -0500 Tom Lane wrote: > Ivan Sergio Borgonovo writes: > > I've a function defined as: > > create or replace function catalog_relateditems(__itemid bigint, > > families int[]... > > If you want any useful comments, you're going to have to show a > complete example, rather than selectively editing out what you > think is irrelevant (and, no doubt, isn't). Golden rule... I thought that psql auto-completion was good enough proof that something wasn't working, and I was wondering whether someone could give me good advice on how to trim down the clutter to build a simpler function that misbehaved the same way. I was not aware that in and out parameters could be intermixed; I was expecting an error, but pg was quite happy with the syntax, and this was enough to lead me astray. Sorry
Re: [GENERAL] bug in function arguments "recognition"
On Wed, 3 Mar 2010 16:05:29 +0100 Ivan Sergio Borgonovo wrote: > I've a function defined as: > > create or replace function catalog_relateditems(__itemid bigint, > families int[]... Forget about it... there was a typo (a missing "out") that mixed up the in/out parameters. Sorry for the noise.
[GENERAL] bug in function arguments "recognition"
I've a function defined as:

create or replace function catalog_relateditems(__itemid bigint, families int[]...

I call it with

select * from catalog_relateditems(6538::bigint, ARRAY[1,2,3,4,5]);

and I get:

HINT: No function matches the given name and argument types. You might need to add explicit type casts.

Then I try to drop the function using psql autocomplete and I get:

DROP FUNCTION catalog_relateditems ( bigint, integer[], character varying)

\df *rela* reports:

public | catalog_relateditems | setof record | __itemid bigint, families integer[], OUT ...

There is no other function named catalog_relateditems. I'm on 8.3.9, debian lenny.
Re: [GENERAL] C function to create tsquery not working
On Thu, 25 Feb 2010 11:41:58 +0100 Pavel Stehule wrote: > use --enable-assert configure flag? > > you can use memory in bad context. So you are alloc good memory, > but when you leave function, then complete memory context is > freeed and you can have a problem. Meanwhile I've run into some new strange behaviour. I created a table in the same DB containing some tsvectors, to test the function on a better known, easier-to-control set of data. The tsvectors it contains aren't that different from the ones in the "real" table; there are just fewer of them. I finally downloaded all the pg sources, compiled them, and compiled my extension inside contrib. I restored the "real" DB and tested on the "real" table... and no problem at all. Nothing really helpful, since the 2 setups aren't really identical: one was hand-compiled on sid, the other is a stock debian lenny install. I'll try to compile the debian lenny version in a new virtual machine. Meanwhile, if someone could give a glimpse at the source it would be really appreciated. http://www.webthatworks.it/d1/files/ts_utilities.tar.bz2
Re: [GENERAL] C function to create tsquery not working
On Thu, 11 Feb 2010 20:11:54 +0100 Ivan Sergio Borgonovo wrote: > I'm still having trouble making this work: > > http://pgsql.privatepaste.com/14a6d3075e I tried to play with item->operator.left to see if reshuffling the expression could make any difference: item->operator.left = 2 * lexeme - 2 (1 + i) or item->operator.left = lexemes. But the result seems pretty indifferent to what I put in operator.left. That makes me think the error is here. But I still get those 2 kinds of error: ERROR: unrecognized operator type: 50 (first run) or ERROR: stack depth limit exceeded. Just at the 3rd returned row, just for certain queries (see previous email). It doesn't look as if I palloced too little memory; I tried allocating 3x the memory I estimated and I still get the errors. The function is actually returning correct results, so it seems the tsquery object is well formed. But it still looks like infix() is trying to read more operators than I thought I'd put in... but just for certain queries, and just at the 3rd row returned. Should I use something different from palloc? Should I return the query differently? Am I supposed to free something?
[GENERAL] C function manipulating tsquery doesn't work with -O2
http://psql.privatepaste.com/53cde5e24a

I've the above function. Output is something like:

'9788876412646':A | ( '8876412646':A | ( 'edizioni':D | ( 'quasi':B | ( 'estat':B | ( 'levi':C | ( 'lia':C | ( 'e/o':D | 'un':B ) ) ) ) ) ) )

It seems it always works with -O0. I can make it work with -O2 in:

select tsvector_to_tsquery(ft1idx, '|', 'ABCD', 22::smallint)
  from catalog_items limit 2; -- q1
select itemid, tsvector_to_tsquery(ft1idx, '|', 'ABCD', 22::smallint)
  from catalog_items; -- q2
select tsvector_to_tsquery(ft1idx, '|', 'ABCD', 22::smallint)
  from catalog_items where itemid=10 or itemid=15 or itemid=27; -- q3
select tsvector_to_tsquery(ft1idx, '|', 'ABCD', 22::smallint)
  from catalog_items order by random() limit 3; -- q4

As soon as I run

select tsvector_to_tsquery(ft1idx, '|', 'ABCD', 22::smallint)
  from catalog_items limit 3; -- q5

it commits seppuku in 2 ways:

ERROR: unrecognized operator type: 50 (first run)

or

ERROR: stack depth limit exceeded
HINT: Increase the configuration parameter "max_stack_depth", after ensuring the platform's stack depth limit is adequate.

This looks to happen in infix() in tsquery.c. This doesn't seem to be related to the length of the tsvector passed. Crazily increasing the only palloc in the function (3x) doesn't solve the problem.

This is what I get back from gdb once the function exits:

(gdb) backtrace
#0  0x0053739a in ?? ()
#1  0x00536fd4 in ExecProject ()
#2  0x0053d150 in ExecScan ()
#3  0x00536470 in ExecProcNode ()
#4  0x00549ea0 in ExecLimit ()
#5  0x00536458 in ExecProcNode ()
#6  0x00534337 in ExecutorRun ()
#7  0x005d6b2b in ?? ()
#8  0x005d8339 in PortalRun ()
#9  0x005d2de9 in ?? ()
#10 0x005d4624 in PostgresMain ()
#11 0x0000005a6c68 in ?? ()
#12 0x005a7b30 in PostmasterMain ()
#13 0x0055aaae in main ()

version is: PostgreSQL 8.3.9 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.3.real (Debian 4.3.2-1.1) 4.3.2
[GENERAL] errmsg and multi-byte strings.
How am I supposed to output multibyte strings in errmsg (and co.), as in errmsg("operator not permitted '%s'", mbstring)? thanks
[GENERAL] [SOLVED] C function to create tsquery not working
On Thu, 11 Feb 2010 20:11:54 +0100 Ivan Sergio Borgonovo wrote: > I'm still having trouble making this work: > http://pgsql.privatepaste.com/14a6d3075e

Finally I got it working, not the above version anyway...

CREATE OR REPLACE FUNCTION tsvector_to_tsquery(
  IN tsv tsvector,
  op IN char(1),
  weights IN varchar(4),
  maxpos IN smallint
) RETURNS tsquery
AS 'MODULE_PATHNAME' LANGUAGE C STRICT;

There were some small errors, but the main one was calling SET_VARSIZE on the pointer to the query instead of on the query itself. I'll need some smaller help to polish everything. It is a small piece of work, but someone on the list showed some interest, and it may be a nice simple helper for tsearch. What would be the right place to advertise it and make it available?

To sum up... I wrote 2 functions:
1. takes a tsvector and returns it as a setof record (text, int[], int[])
2. takes a tsvector, filters it according to weights and maximum position, and returns a | or & tsquery

The first is just for "debugging", or to build more complicated tsqueries in your preferred language. The second can come in handy to look for text similarity without computing tsvectors twice.

create or replace function similar(_id int, out id int, out title text)
returns setof record as
$$
declare
  tsvin tsvector;
  tsq tsquery;
begin
  -- assuming the tsvector column is named tsv
  select into tsvin tsv from table where id = _id;
  tsq := tsvector_to_tsquery(tsvin, '|', 'AB', 100);
  return query
    select t.id, t.title from table t where t.tsv @@ tsq;
  return;
end;
$$ language plpgsql stable;
[GENERAL] C function to create tsquery not working
I'm still having trouble making this work:
http://pgsql.privatepaste.com/14a6d3075e

CREATE OR REPLACE FUNCTION tsvector_to_tsquery(
  IN tsv tsvector,
  op IN char(1),
  weights IN varchar(4),
  maxpos IN smallint
) RETURNS tsquery
AS 'MODULE_PATHNAME' LANGUAGE C STRICT;

What I expect is:

tsvector_to_tsquery('java tano', '&', 'ABCD', 100) -> java & tano
tsvector_to_tsquery('java:1A,2B tano:3C,4D', '&', 'ABC', 100) -> java:A & java:B & tano:C
tsvector_to_tsquery('java:1A,2B tano:3C,4D', '|', 'ABC', 100) -> java:AB | tano:C

I've made some improvements compared to the previous version I posted, but it still returns an empty tsquery.

Things that work:
- tsvector_tsquery_size returns a reasonable total length of strings and total number of (operands + operators)
- curout is actually filled with a lexeme
- the filters (wf, posmax) work

-- Ivan Sergio Borgonovo http://www.webthatworks.it
[GENERAL] LIKE a set of pattern generated from a table
I have a column that contains something formed by concatenating fields from another table:

create table stuff(
  glued varchar(30),
  ...
);

insert into stuff select 'constA,' || field1 || ',' || field2 from origin where ...;
insert into stuff select 'constB,' || field1 || ',' || field2 from origin where ...;

I know this is terrible... but this is what I have, and I can't change it.

Now what I'd like to do is select from stuff using a pattern, as in:

select * from stuff where glued like (
  select '%,' || field1 || ',' || field2 || '%' from origin where ...
);

But this isn't going to work, since LIKE expects a single pattern and the subquery can return more than one row. Is there any other way, other than specifying all the constants one by one in a union and then looking for equality?

select * from stuff where glued in (
  select 'constA,' || field1 || ',' || field2 from origin where ...
  union
  select 'constB,' || field1 || ',' || field2 from origin where ...
);

-- Ivan Sergio Borgonovo http://www.webthatworks.it
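One way to match a row against a whole set of generated patterns is to turn the LIKE into a join condition. A sketch, reusing the placeholder names from the message (stuff, origin, field1, field2); note this cannot use an ordinary index on glued:

```sql
-- Match stuff.glued against every pattern generated from origin.
select distinct s.*
from stuff s
join (
  select '%,' || field1 || ',' || field2 || '%' as pat
  from origin
  -- where ...  (the original conditions on origin)
) p on s.glued like p.pat;
```

An EXISTS subquery with the same LIKE condition would work equally well and avoids the DISTINCT.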
[GENERAL] border case ::tsvector vs. to_tsvector was turning a tsvector without position in a weighted tsvector
This was what I was after:

test=# select version();
 PostgreSQL 8.3.9 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.3.real (Debian 4.3.2-1.1) 4.3.2

test=# select to_tsvector('pino gino');
 'gino':2 'pino':1

test=# select 'pino gino'::tsvector;
 'gino' 'pino'

test=# select to_tsvector('pino gino') @@ 'gino:B'::tsquery;
 f

test=# select to_tsvector('pino gino') @@ 'gino:D'::tsquery;
 t

test=# select ('pino gino'::tsvector) @@ 'gino:B'::tsquery;
 t

test=# select to_tsvector('pino:1B gino') @@ 'pino'::tsquery;
 t

test=# select 'pino gino'::tsvector || to_tsvector('pino gino');
 'gino':2 'pino':1

test=# select 'pino gino'::tsvector || 'pino gino'::tsvector;
 'gino' 'pino'

test=# select to_tsvector('pino gino') || to_tsvector('pino gino');
 'gino':2,4 'pino':1,3

test=# select 'pino gino'::tsvector || to_tsvector('gino tano');
 'gino':1 'pino' 'tano':2

test=# select setweight('pino gino'::tsvector || to_tsvector('gino tano'), 'A');
 'gino':1A 'pino' 'tano':2A

So (even if it may sound obvious to many):
- tsvectors may be a mix of lexemes with and without positions/weights
- a lexeme without a weight (which is not the same as having the default D weight) matches ALL weights
- you can't assign a weight to a lexeme without a position, and it would be hard to assign a position after a document has been parsed into a tsvector; so while in theory it could be reasonable to have lexemes with a weight and no position, in practice you'll have to assign meaningless positions if you'd like to assign a weight to a tsvector without positions.

I still wonder if it would be reasonable to write a function that forcefully assigns a position and a weight to vectors to be used with ts_rank. I have some ideas about possible use cases, but I'm still unsure whether they are reasonable. e.g.
someone might be willing to save storage and CPU cycles by storing parts of documents in precomputed tsvectors with no weights, and then build up a search with merged, weighted tsvectors using ts_rank.

OK... trying to finish up my tsvector_to_tsquery function in a reasonable way first.

-- Ivan Sergio Borgonovo http://www.webthatworks.it
Re: [GENERAL] How do I delete duplicate rows in a table?
On Mon, 8 Feb 2010 15:32:51 -0800 "Wang, Mary Y" wrote:
> Hi,
> I have a table that have that duplicate rows. How do I find them and delete them?

http://www.webthatworks.it/d1/node/page/eliminating_duplicates_place_without_oid_postgresql

Most likely I learned it here...

-- Ivan Sergio Borgonovo http://www.webthatworks.it
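Independently of what the linked page describes, a common in-place recipe for tables without a usable key is to delete all but one physical row per duplicate group using the system column ctid. A sketch, assuming a hypothetical table t whose rows are duplicated across col1 and col2:

```sql
-- Keep the row with the smallest ctid in each duplicate group.
DELETE FROM t a
USING t b
WHERE a.ctid > b.ctid
  AND a.col1 = b.col1
  AND a.col2 = b.col2;
```

ctid identifies a row's physical location, so it is unique even when every user-visible column is identical.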
Re: [GENERAL] turning a tsvector without position in a weighted tsvector
On Mon, 8 Feb 2010 23:01:45 +0300 (MSK) Oleg Bartunov wrote:
> Ivan,
> what's wrong with:
> postgres=# select 'abc:1'::tsvector;
>  'abc':1

Yes, you're right. I think I misplaced some quotes. But still, once a vector has no positions, I can't add weights.

test=# select setweight('tano'::tsvector, 'A');
 'tano'

test=# select setweight('tano:1'::tsvector, 'A');
 'tano':1A

Since I'm writing some helpers to manipulate tsvectors, I was wondering whether
a) there is any reasonable use case for adding weights to vectors with no positions
b) I missed any obvious way to add weights to tsvectors that were initially without positions

thanks

-- Ivan Sergio Borgonovo http://www.webthatworks.it
[GENERAL] turning a tsvector without position in a weighted tsvector
If I convert a string to a tsvector by just casting (::tsvector), I obtain a vector without positions. tsvectors without positions don't have weights either. I haven't found a way to turn a vector without weights/positions into a vector with them. Is there a way to apply weights/add positions to tsvectors without positions? Is there any use case?

-- Ivan Sergio Borgonovo http://www.webthatworks.it
Re: [GENERAL] Query to find list of dates between two dates
On Fri, 5 Feb 2010 02:06:12 -0800 (PST) aravind chandu wrote:
> Hello guys,
> can you please help me with the following query
> I need a query that displays all the dates in between two dates
> say i give two dates 12/1/2009 and 12/31/2009 The result should be like this

select date '2008-05-01' + i
from generate_series(0, (date '2009-12-10' - date '2008-05-01')) s(i);

-- Ivan Sergio Borgonovo http://www.webthatworks.it
[GENERAL] reducing result set of tsvector @@ tsquery avoiding to use ts_rank
I've finally made some fruitful steps in writing C functions that manipulate tsvectors.

I'd like to build up a simple system based on ts_rank to find similarities between documents. My documents contain 4 parts. I build a tsvector the "usual way":

setweight(to_tsvector(field1), 'A') || setweight(to_tsvector(field2), 'B') || etc...

then I'd like to build a query similar to:

tsvector @@ to_tsquery(
  'field1_lexeme1':A | 'field1_lexeme2':A | ...
  'field2_lexeme1':B | 'field2_lexeme2':B | ...
)

Anyway, so many ORs are going to return a lot of rows, and filtering on rank comes "too late" for performance. One way to shrink the result set would be to build a query that requires at least 2 lexemes to be present:

('field1_lexeme1':A & ('field1_lexeme2':A | ... 'field2_lexeme1':B | 'field2_lexeme2':B | ...)) |
('field1_lexeme2':A & ('field1_lexeme1':A | ...)) |
...

I don't have very long documents, so this looks feasible, but I'd like to hear any other suggestions to shrink the result set further before filtering on ts_rank, especially suggestions that will exploit the index.

So any suggestion that could reduce the result set before filtering on rank is welcome, and I'll try to put it into practice in some C functions that, given a tsvector, build up a tsquery to be used to find similar documents.

-- Ivan Sergio Borgonovo http://www.webthatworks.it
Re: [GENERAL] How to generate unique hash-type id?
On Fri, 29 Jan 2010 13:13:17 +0100 "Wappler, Robert" wrote:
> I'd suggest to use some kind of sequence or something constructed from the primary keys. But you may still see hash collisions although the input is different.

Concatenate /* ::text */ random() with something like:
http://www.webthatworks.it/d1/node/page/pseudo_random_sequences_postgresql

-- Ivan Sergio Borgonovo http://www.webthatworks.it
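The idea of combining a guaranteed-unique part with a random part can be sketched like this (the sequence name is hypothetical, and the linked page describes a different pseudo-random-sequence technique):

```sql
-- The sequence value guarantees uniqueness; random() makes the
-- result hard to guess; md5() gives it a hash-like shape.
CREATE SEQUENCE ids_seq;
SELECT md5(nextval('ids_seq')::text || random()::text) AS unique_id;
```

Because the sequence part is unique on its own, collisions of the random salt don't matter; only the md5 digest itself could collide, which is the residual risk Robert mentions.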
[GENERAL] is this the correct result for ts_rewrite? reducing tsquery to improve performance?
select ts_rewrite(
  to_tsquery('java:A & cola & java:AB'),
  'java:AB'::tsquery,
  'java:AB'::tsquery);

 'cola' & 'java':AB & 'java':AB

Is this the expected (documented) result?

I found this while looking for a way to build up a tsquery directly in its own structure, without passing through its text representation (and maybe reduce it).

The following look equivalent. Are they?

test=# select 'java:ABC'::tsquery;
 'java':ABC

test=# select 'java:A | java:B | java:C'::tsquery;
 ( 'java':A | 'java':B ) | 'java':C

I did try to pass them through nodetree... but the result keeps on being different.

-- Ivan Sergio Borgonovo http://www.webthatworks.it
[GENERAL] tsvector minimum match using index
Would it be possible, without writing a very long tsquery, to exploit the index to retrieve the tsvectors that contain at least N lexemes? If the index can't be exploited, any suggestion to improve the performance of such a query? Computing the rank still requires retrieving a lot of tsvectors and computing the rank for each of them.

-- Ivan Sergio Borgonovo http://www.webthatworks.it
Re: [GENERAL] FTS uses "tsquery" directly in the query
On Mon, 25 Jan 2010 23:35:12 +0300 (MSK) Oleg Bartunov wrote:
> Do you guys wanted something like:
> arxiv=# select and2or(to_tsquery('1 & 2 & 3'));
>  ( '1' | '2' ) | '3'

Nearly. I'm starting from a weighted tsvector, not from text/tsquery. I would like to:
- keep the weights in the query
- avoid parsing the text to extract lexemes twice (I already have a tsvector)

For me, extending pg in C is a new science, but I'm actually trying to write at least a couple of functions that:
- will return a tsvector as a (weight int, pos int[], lexeme text) record
- will turn a tsvector + operator into a tsquery

'orange':1A,2,3 'banana':4B,5 'tomato':6C,7 -> 'orange':A | 'banana':B | 'tomato':C
or eventually
'orange':A & 'banana':B & 'tomato':C

thanks

-- Ivan Sergio Borgonovo http://www.webthatworks.it
Re: [GENERAL] FTS uses "tsquery" directly in the query
On Mon, 25 Jan 2010 07:19:59 -0800 (PST) xu fei wrote:
> Hi, Oleg Bartunov:
> First, thanks for your quick reply. Could you explain "it's a general limitation/feature" a little more? I'm confused, because to my understanding to_tsquery('item') returns a tsquery which is the same as 'item'::tsquery. Let me explain what I want.
> First step: extract the top K tokens. I have a table with a column of tsvector type. Some records in this column are too big; they contain hundreds of tokens. I just want the top K tokens based on frequency, for example the top 5. I am not sure there is a direct way to get such top K tokens, so I just read them out in Java, count the frequency for each token, and sort them.
> Second step: generate the query. Now I will use these tokens to construct a query to search other vectors in the same table. I cannot directly use to_tsquery() for two reasons: 1) the default logical operator in to_tsquery() is "&" but what I need is "|"; 2) since the tokens come from a tsvector, they are already normalized, and if I use to_tsquery() they will be normalized again! For example, "course" -> "cours" -> "cour". So I just concatenate the top K tokens with "|" and directly use "::tsquery". Unfortunately, as you say, "it's a general limitation/feature", so I cannot do that. I checked your manual "Full-Text Search in PostgreSQL: A Gentle Introduction" but could not figure out how. So is it possible to implement what I want in FTS? If so, how? Thanks! Xu
> --- On Sun, 1/24/10, Oleg Bartunov

You're trying to solve a problem similar to mine. I'd like to build up a naïve similar-text search. I don't have the "length" problem, but I'd also like to avoid tokenizing/lexizing a text twice to build up a tsquery. I have weighted tsvectors stored in a column, and once I pick one up I'd like to look for similar ones in the same column.
There are thousands of ways to measure text similarity (and Oleg pointed me to some), but ts_rank should be "good enough for me". My texts are very short, so I can't use & on the whole tsvector; otherwise the chances of finding just one match are very high. As you suggested, I could pick a subset of "important"[1] lexemes from the tsvector and build up an "&"ed tsquery with them. Still, at least in my case, since I'm dealing with very short texts, this looks too risky (just 1 match). Considering that I'm using weighted tsvectors, it seems that "|" plus picking the results with the best rank could be the way to go. But as you've noted, there is no function that turns a tsvector into a tsquery (possibly including weights) and gives you the choice to use "|".

Well... I'm trying to write a couple of helper functions in C. But I'm pretty new to postgres internals, I miss a reference of functions/macros with some examples... and this is a side project, and I haven't been using C for quite a while.

Once I have those functions, I'll have to solve how to return few rows (since I'll have to use |, I expect a lot of returned rows) to make efficient use of the gin index and avoid computing ts_rank for too many rows.

Don't hold your breath waiting... but let me know if you're interested, so I don't have to be the only one posting newbie questions on pgsql-hackers ;)

[1] ts_stat could give you some hints about which lexemes may be important... but deciding what's important is another can of worms, and as anticipated, ts_rank should be "good enough for me".

-- Ivan Sergio Borgonovo http://www.webthatworks.it
Re: [GENERAL] not officially documented use of setweight??
On Mon, 25 Jan 2010 12:01:04 +0100 Ivan Sergio Borgonovo wrote:
> I think I've learned how to use pg text search from Oleg and Teodor's documentation, since I've found this use of setweight in my code:
> query := query && setweight(configuration, 'banana apple orange', 'B', '&');
> But I can't find any trace of this use in the official postgres docs. The docs just say:
> setweight(vector tsvector, weight "char") returns tsvector
> Am I missing them? Is that use "supported" in future versions?

Forgive me! setweight(cfg, text, weight, op) was actually one of my own creations from 1 or 2 years ago, built on ts_debug. I went mad looking for it in the pg sources... till suddenly I thought to grep my own codebase!!!

Since back then I had not previously processed user input, using ts_debug was fine. Now I already have processed tsvectors... so I'm writing something that basically does the above but starting from preprocessed tsvectors instead of text.

Sorry.

-- Ivan Sergio Borgonovo http://www.webthatworks.it
[GENERAL] not officially documented use of setweight??
I think I've learned how to use pg text search from Oleg and Teodor's documentation, since I've found this use of setweight in my code:

query := query && setweight(configuration, 'banana apple orange', 'B', '&');

But I can't find any trace of this use in the official postgres docs. The docs just say:

setweight(vector tsvector, weight "char") returns tsvector

Am I missing them? Is that use "supported" in future versions?

-- Ivan Sergio Borgonovo http://www.webthatworks.it
Re: [GENERAL] more docs on extending postgres in C
On Fri, 22 Jan 2010 11:02:46 -0500 Tom Lane wrote:
> Ivan Sergio Borgonovo writes:
> > If I had to build stuff in the pg source tree I'd just clone a contrib directory and change the makefile [1]. What am I supposed to do if I'd like to create a contrib elsewhere (eg. ~/Documents/nfs/projects/ts_extensions)?
> Use PGXS:
> http://www.postgresql.org/docs/8.4/static/xfunc-c.html#XFUNC-C-PGXS
> If the docs aren't enough for you, all of the contrib modules can be built via pgxs, using "make PGXS=1". So their makefiles are useful examples.

What worked for me on Debian sid was:
- installing postgresql-server-dev-[version]
- apt-get source postgresql-server-dev-[version]
- copying a contrib dir from there into my ~ (or wherever you prefer)
- export USE_PGXS=1; make

I didn't have the postgresql server installed on this box, but I have the bits that got pulled in to satisfy dependencies for the php/python pg drivers and psql.

I wasn't able to understand where PGXS picks up the version from, since I had installed -dev-8.3 but everything else was for 8.4, and it didn't work:

i...@dawn:~/ts_extension$ export USE_PGXS=1; make
Makefile:12: /usr/lib/postgresql/8.4/lib/pgxs/src/makefiles/pgxs.mk: No such file or directory
make: *** No rule to make target `/usr/lib/postgresql/8.4/lib/pgxs/src/makefiles/pgxs.mk'. Stop.

Installing dev-8.4 made everything work. Nothing that really troubled me... but I'd expect that after installing -dev-8.3 it would look in the right place, since pgxs.mk is actually included in both versions.

Thanks... I'll publish a summary as soon as I have clearer ideas, so that the next poor guy will find it easier to write a contrib on Debian. I don't know if the version problem is worth a bug report to Debian (or to the pg people who built the pgxs system).

-- Ivan Sergio Borgonovo http://www.webthatworks.it
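For an out-of-tree module, the PGXS documentation linked above boils the makefile down to a few lines. A minimal sketch (the module name ts_extensions and its source file ts_extensions.c are hypothetical):

```makefile
# Minimal PGXS makefile: builds ts_extensions.so from ts_extensions.c.
MODULES = ts_extensions

PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)
```

Asking pg_config for --pgxs also sidesteps the version puzzle from the message: pointing PG_CONFIG at a specific installation's pg_config (e.g. /usr/lib/postgresql/8.4/bin/pg_config) selects which version to build against.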
Re: [GENERAL] more docs on extending postgres in C
On Wed, 20 Jan 2010 17:43:27 +0100 Adrian von Bidder wrote:
> On Wednesday 20 January 2010 15.42:14 Ivan Sergio Borgonovo wrote:
> > I'd also appreciate some suggestions about the dev environment and best practices on Debian, something that could help me to compile, install, test easily on Debian.
> (Disclaimer: Haven't done any postgres related programming so far, so this might need adjusting.)
> The description for package postgresql-server-dev-8.4 includes
> [[[
> This package also contains the Makefiles necessary for building add-on modules of PostgreSQL, which would otherwise have to be built in the PostgreSQL source-code tree.
> ]]]

I need some babysitting from someone who has actually developed extensions on Debian. The package was already installed, but it seems it provides more than just the header files. Now that I know I can build extensions outside the pg source tree, it would be nice to understand how to do it in a kosher way.

If I had to build stuff in the pg source tree I'd just clone a contrib directory and change the makefile [1]. What am I supposed to do if I'd like to create a contrib elsewhere (eg. ~/Documents/nfs/projects/ts_extensions)?

I've seen some forwards to this list from pgsql-hackers. Would that list be a better place to ask this kind of thing, or is it too elite ;) for newbies?

[1] actually I tried to run make in a contrib module dir in the pg source tree... but unless you run ./configure at the root of the source tree, it fails. So I have to guess that -dev gives you a preconfigured environment.

-- Ivan Sergio Borgonovo http://www.webthatworks.it
Re: [GENERAL] PgSQL problem: How to split strings into rows
On Thu, 21 Jan 2010 13:49:45 -0500 Kynn Jones wrote:
> I have a table X with some column K consisting of whitespace-separated words. Is there some SELECT query that will list all these words (for the entire table) so that there's one word per row in the returned table? E.g. if the table X is
>
>    K
>    -
>    foo bar baz
>    quux frobozz
>    eeny meeny
>    miny moe
>
> ...I want the result of this query to be
>
> foo
> bar
> baz
> quux
> frobozz
> eeny
> meeny
> miny
> moe

http://www.postgresql.org/docs/current/static/functions-array.html

string_to_array

select (string_to_array('tano pino gino', ' '))[i] from generate_series(1, 3) s(i);

You get the idea... to get the length of the array you have array_length.

-- Ivan Sergio Borgonovo http://www.webthatworks.it
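Extending the hint to the whole table might look like this (a sketch; array_length appeared in 8.4, so on 8.3 use array_upper(arr, 1) instead):

```sql
-- One word per row, for every row of X.
SELECT (string_to_array(K, ' '))[i] AS word
FROM X,
     generate_series(1, array_length(string_to_array(K, ' '), 1)) AS s(i);
```

From 8.3 onwards, regexp_split_to_table(K, E'\\s+') does the same splitting in a single call, and handles runs of whitespace too.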
Re: [GENERAL] more docs on extending postgres in C
On Wed, 20 Jan 2010 17:38:17 +0100 Pavel Stehule wrote:
> > Is there a reference of all macros and functions?
> no, only source code

It would be nice to have at least a list of the functions that can be used in extension development, to avoid reading all the source. Since I'm a new entry in pg C coding and internals, it will take me ages just to find what's worth knowing. I'll try to take some notes while I grasp things and publish them somewhere.

I'm still trying to digest:

"There are two ways you can build a composite data value (henceforth a "tuple"): you can build it from an array of Datum values, or from an array of C strings that can be passed to the input conversion functions of the tuple's column data types. In either case, you first need to obtain or construct a TupleDesc descriptor for the tuple structure. When working with Datums, you pass the TupleDesc to BlessTupleDesc, and then call heap_form_tuple for each row. When working with C strings, you pass the TupleDesc to TupleDescGetAttInMetadata, and then call BuildTupleFromCStrings for each row. In the case of a function returning a set of tuples, the setup steps can all be done once during the first call of the function."

I grepped through contrib and I wasn't able to find anything that really enlightened me about BlessTupleDesc. I'll see if things look clearer tomorrow.

thanks

-- Ivan Sergio Borgonovo http://www.webthatworks.it
Re: [GENERAL] more docs on extending postgres in C
On Wed, 20 Jan 2010 16:56:04 +0100 Dimitri Fontaine wrote:
> Ivan Sergio Borgonovo writes:
> > I haven't been able to find anything better than the online manual and pg source code to learn how to write extensions.
> Maybe this will help:
> http://wiki.postgresql.org/wiki/Image:Prato_2008_prefix.pdf
> http://github.com/dimitri/prefix

Thanks to all. Is there a reference of all macros and functions?

-- Ivan Sergio Borgonovo http://www.webthatworks.it
[GENERAL] more docs on extending postgres in C
I haven't been able to find anything better than the online manual and pg source code to learn how to write extensions. I couldn't find a reference of all the functions/macros I could use, nor more examples of how to use them.

I'd also appreciate some suggestions about the dev environment and best practices on Debian, something that could help me to compile, install and test easily on Debian.

thanks

-- Ivan Sergio Borgonovo http://www.webthatworks.it
Re: [GENERAL] C: extending text search: from where to start
On Tue, 19 Jan 2010 10:12:21 +0100 Dimitri Fontaine wrote:
> Ivan Sergio Borgonovo writes:
> > I'd appreciate any pointer that will quickly put me on the right track.
> I'd guess you begin here:
> http://wiki.postgresql.org/wiki/Developer_FAQ

With the exception of the formatting style info, I didn't find anything useful there. Right now I'm reading:

http://www.postgresql.org/docs/8.3/static/xfunc-c.html

There is just one example of how to build up the returned data, and it uses C strings. I couldn't find an example that uses TupleDesc/BlessTupleDesc. I'm installing the deb contrib source package. Any other resources that will help me write my own contrib?

thanks

-- Ivan Sergio Borgonovo http://www.webthatworks.it
[GENERAL] C: extending text search: from where to start
I'd like to extend full text search so that I can transform tsvectors into tsqueries and have direct access to a tsvector as a record/array. I'm on Debian. This is my first experience with the pg source code. I'd appreciate any pointer that will quickly put me on the right track.

thanks

-- Ivan Sergio Borgonovo http://www.webthatworks.it
Re: [GENERAL] ranking how "similar" are tsvectors was: OR tsquery
On Sun, 17 Jan 2010 20:19:59 +0300 (MSK) Oleg Bartunov wrote:
> Ivan,
> You can write a function to get lexemes from a tsvector:
> CREATE OR REPLACE FUNCTION ts_stat(tsvector, weights text, OUT word text, OUT ndoc integer, OUT nentry integer)
> RETURNS SETOF record AS
> $$
> SELECT ts_stat('SELECT ' || quote_literal( $1::text ) || '::tsvector', quote_literal( $2::text) );
> $$ LANGUAGE SQL RETURNS NULL ON NULL INPUT IMMUTABLE;

Thanks very much, Oleg. Still, it is not really making the pain go away. I have weights stored in my tsvectors and I need to build the query using them. This means that if I have:

'aubergine':4A 'orange':1B 'banana':5A 'apple':3C
and
'coconut':3B 'bananas':1A 'tomatoes':2C

stored in a column (tsv), I really would like to build up the query:

to_tsquery('aubergine:A | orange:B | bananas:A | apple:C')

then

tsv @@ to_tsquery('aubergine:A | orange:B | bananas:A | apple:C')

and the relative ts_rank(). I'm aware that it is not symmetrical, but it looks like the cheapest and fastest thing I can do right now.

I'm using pg_catalog.english. Am I right in supposing that NO lexeme will contain spaces? If that is the case, I could simply use string manipulation tools. Not nice to see, but it would work.

> Then, you can create an ARRAY like:
> select ARRAY ( select (ts_stat(fts,'*')).word from papers where id=2);
> Then, you will have two arrays and you're free to apply any similarity function (cosine, jaccard, ...) to calculate what you want. If you want to preserve weights, then use the weight label instead of '*'.

What ts_rank does is more than enough right now.

> Another idea is to use array_agg, but I'm not ready to discuss it.
> Please, keep in mind that document similarity is a hot topic in

Not hard to imagine.

> IR, and, yes, I and Teodor have something about this, but the code isn't available for the public. Unfortunately, we had no sponsor for full-text search last year and I see no perspectives this year, so we postponed our text-search development.
Good luck. Do you have anything like http://www.chipin.com/ for small donations?

-- Ivan Sergio Borgonovo http://www.webthatworks.it
[GENERAL] ranking how "similar" are tsvectors was: OR tsquery
My initial request was about a way to build up a tsquery similar to what plainto_tsquery does, but using | instead of & as the glue. But at the end of the day, I'd like to find similar tsvectors and rank them.

I have a table containing several fields that contribute to build up a weighted tsvector. I'd like to pick up one tsvector and find the N most similar ones. I've found this:

http://domas.monkus.lt/document-similarity-postgresql

That's not really too far from what I was trying to do. But I have precomputed tsvectors (turning text into a tsvector should be a more expensive operation than string replacement), and I'd like to preserve the weights.

I'm not really sure, but I think a lexeme can actually contain a ' or a space (depending on the stemmer/parser?), so I'd have to take care of escaping etc... Since there is no direct access to the elements of a tsvector, the only "correct" way I see to build the query would be to manually rebuild the tsvector, getting the result back as a record using ts_debug and ts_lexize... and that looks a bit of a PITA.

I don't even think that having direct access to the elements of a tsvector would completely solve the problem, since tsvectors store positions too, but it would be a step forward in making it easier to compare documents to find similar ones. An operator that checks the intersection of tsvectors would come in handy. Adding a ts_rank(tsvector, tsvector) would surely help too.

thanks

-- Ivan Sergio Borgonovo http://www.webthatworks.it
Re: [GENERAL] OR tsquery
On Sat, 16 Jan 2010 19:10:45 +0300 (MSK) Oleg Bartunov wrote:
> Ivan,
> did you ever read the official documentation?
> http://www.postgresql.org/docs/8.4/static/textsearch-controls.html

Yes, but I still can't find something that works like plainto_tsquery but with |, or any example that wouldn't make obtaining that result convoluted. plainto_tsquery does a couple of things that I find hard to replicate with the available functions: it splits a string into lexemes, and it loops over the lexemes to build up the query with &.

Something like:

select (
  string_to_array(
    strip(to_tsvector('pg_catalog.english', 'orange banana woods booking'))::text,
    ' ')
)[i]
from generate_series(1, 4) s(i);

and then gluing up the pieces with |. And the above example still misses some of the details, like cleaning up the quotes.

Another option would be to return the tsvector to the client, build the tsquery there, and send it back to the server.

I'm on 8.3, but I don't think it makes any real difference here. Sorry if I'm still missing the obvious.

-- Ivan Sergio Borgonovo http://www.webthatworks.it
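If no lexeme contains a space (the assumption raised elsewhere in the thread), the whole loop can collapse into a single string replacement. A sketch:

```sql
-- Build an OR tsquery from plain text:
-- strip() drops positions, ::text yields "'appl' 'banana' 'orang'",
-- and replacing the separators with ' | ' gives a parseable tsquery.
SELECT replace(
         strip(to_tsvector('pg_catalog.english', 'apple banana orange'))::text,
         ' ', ' | '
       )::tsquery;
```

This breaks as soon as a lexeme contains an embedded space or quote, which is exactly the escaping caveat discussed in the thread.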
[GENERAL] OR tsquery
to_tsquery and plainto_tsquery produce an AND tsquery:

'apple banana orange' -> 'apple' & 'banana' & 'orange'

I can't see anything that will produce an OR tsquery:

'apple banana orange' -> 'apple' | 'banana' | 'orange'

The only thing I can think of is looping over ts_lexize, which doesn't look very efficient in plpgsql. Am I missing something?

thanks
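One lightweight workaround, a sketch rather than an official API (it assumes no lexeme contains the literal sequence ' & ' inside its quotes): let plainto_tsquery do the parsing and stemming, then rewrite the operators textually and re-parse:

```sql
-- Reuse plainto_tsquery's normalization, then swap & for |.
SELECT replace(
         plainto_tsquery('english', 'apple banana orange')::text,
         ' & ', ' | ')::tsquery;
-- e.g. 'appl' | 'banana' | 'orang'
```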
Re: [GENERAL] how much left for restore?
On Mon, 11 Jan 2010 18:36:18 + Sam Mason wrote:

> On Fri, Jan 08, 2010 at 11:28:15AM +0100, Ivan Sergio Borgonovo wrote:
> > Is there a way to know/estimate how much is left to complete a restore?
> maybe something like "pv" would help?
> http://www.ivarch.com/programs/pv.shtml

Nice. That starts to look more like what I was looking for... and yeah, I'm aware it could be misleading.

It would be nice to have it integrated into pg_restore/pg_dump. If the file is compressed, pg_* could do a better job of giving an estimate.

thanks
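For the record, a sketch of how pv fits into a restore pipeline (it assumes pv is installed and that mydb.dump / mydb.sql.gz are hypothetical dump files; the progress bar tracks bytes read from the file, which is only a rough proxy for restore progress, since a small table with many indexes can take longer than its byte count suggests):

```
# progress bar while restoring a custom-format dump
pv mydb.dump | pg_restore -d mydb

# for a compressed plain-SQL dump, put pv before the decompression
# so the bar tracks the known-size compressed file
pv mydb.sql.gz | gunzip | psql mydb
```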
Re: [GENERAL] how much left for restore?
On Mon, 11 Jan 2010 12:30:45 -0500 Francisco Reyes wrote:

> Ivan Sergio Borgonovo writes:
> > Is there a way to know/estimate how much is left to complete a restore?
> Not sure about plain ASCII files, but if your pg_dump used -Fc then at restore time you can pass the -v flag.

It gets a bit better, but even knowing which tables are the largest, it is hard to estimate how much is missing before the restore completes. I'm really looking for rough figures... even an "I've read 40% of the file" would be more usable information than "I've already loaded table A".

thanks
Re: [GENERAL] R: aggregate over tables in different schema
On Sun, 10 Jan 2010 10:49:48 +0100 Vincenzo Romano wrote:

> Try using inheritance.

One of the things I didn't mention is: I have to join these tables with other tables that may or may not belong to the same schema (some are in public).

select sum(i.qty)
from s1.list_items li
join public.item i on i.itemid = li.itemid;

Now I'd like to compute the sum over all list_items tables across all the schemas. If I define the summary table as the child of all the sub-tables, I'm going to write dynamic SQL anyway. So I guess I should define a common ancestor for all the list_items tables in the different schemas:

create table public.list_items (
  itemid int primary key, -- trouble
  name varchar(32)
);
create table s1.list_items () inherits (public.list_items);
create table s2.list_items () inherits (public.list_items);

But I can't see how I'm going to write the query. Furthermore, the children should have their own pk and not share it.

thanks
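For what it's worth, with that inheritance layout a query against the parent already scans every child, and since PRIMARY KEY constraints are not inherited, each child can declare its own. A sketch using the table names from the example above (public.item is assumed to exist as in the join):

```sql
-- Scans public.list_items plus s1.list_items, s2.list_items, ...
SELECT sum(i.qty)
FROM public.list_items li
JOIN public.item i ON i.itemid = li.itemid;

-- Indexes and constraints don't propagate to children,
-- so each child gets its own (non-shared) primary key.
ALTER TABLE s1.list_items ADD PRIMARY KEY (itemid);
ALTER TABLE s2.list_items ADD PRIMARY KEY (itemid);
```

Note the "trouble" flagged above remains: nothing stops s1 and s2 from using the same itemid value, since the parent's primary key is not enforced across children.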
[GENERAL] aggregate over tables in different schema
I have tables in different schemas, all with the same name and structure. I'd like to compute an aggregate over the union of those tables. I don't know the schemas in advance: the list of schemas will be built by selecting all the schemas that contain a table with that name.

Other than dynamically building the statement as a list of UNIONs, or building up a view, is there any other way?
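If dynamic SQL turns out to be the way, here is a minimal sketch of the UNION ALL approach (the qty column and the sum() aggregate are hypothetical; quote_ident guards the identifiers, and it avoids format() so it should work on 8.3-era servers):

```sql
-- Sketch: sum a column over every schema that has a table named tbl.
CREATE OR REPLACE FUNCTION sum_qty_everywhere(tbl name)
RETURNS bigint LANGUAGE plpgsql AS $$
DECLARE
  s   text;
  q   text := '';
  tot bigint;
BEGIN
  -- collect every schema containing a table with this name
  FOR s IN SELECT table_schema::text
           FROM information_schema.tables
           WHERE table_name = tbl
  LOOP
    IF q <> '' THEN
      q := q || ' UNION ALL ';
    END IF;
    q := q || 'SELECT qty FROM '
           || quote_ident(s) || '.' || quote_ident(tbl);
  END LOOP;
  EXECUTE 'SELECT sum(qty) FROM (' || q || ') u' INTO tot;
  RETURN tot;
END;
$$;
```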
[GENERAL] how much left for restore?
Is there a way to know/estimate how much is left to complete a restore? It would be enough just to know which part of the file is being restored (without causing too much extra IO, which would definitely put my notebook on its knees).

Next time I try a restore on this box, is there anything I could tweak in the pg config to make it faster? For dev only... could I just stop the dev server, copy the *files* to flash and mount them on the notebook?

thanks
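On the "make it faster" side, a few postgresql.conf settings that are commonly relaxed for a throwaway dev-only restore (a sketch for the 8.x era; fsync = off risks data loss on a crash, so never use it on data you care about):

```
# postgresql.conf -- dev-only bulk-restore settings (unsafe for real data)
fsync = off                    # skip forced WAL flushes
checkpoint_segments = 32       # fewer checkpoints during the load
maintenance_work_mem = 256MB   # faster index builds after the COPYs
autovacuum = off               # don't vacuum mid-restore
```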
Re: [GENERAL] Optimized Select Statement
On Wed, 6 Jan 2010 17:45:31 -0800 (PST) Yan Cheng Cheok wrote:

> situation, I will be only interested in 1 "YanChengCHEOK".
> SELECT measurement_type_id INTO _measurement_type_id FROM measurement_type WHERE measurement_type_name='YanChengCHEOK';

Add a LIMIT 1:

SELECT measurement_type_id INTO _measurement_type_id
FROM measurement_type
WHERE measurement_type_name = 'YanChengCHEOK'
LIMIT 1;

Is that what you were looking for?
Re: [GENERAL] pg_dump excluding tables content but not table schema
On Mon, 28 Dec 2009 21:20:17 +0100 Ivan Sergio Borgonovo wrote:

> pg_dump -Fc -Z9 -s -t *.cache* -d mydb > schema_only.bak
> pg_dump -Fc -Z9 -T *.cache* -d mydb > nearly_full.bak
> cat nearly_full.bak schema_only.bak | pg_restore -1 -d mydb
> It seems it is working... I'll test if everything is there.

Unfortunately it doesn't work as expected: it silently skips restoring the second backup (schema_only.bak). I'm surprised it didn't output any error message, but the cache tables aren't there.

It seems you actually have to restore the two backups separately:

pg_restore -1 -d mydb < nearly_full.bak
pg_restore -1 -d mydb < schema_only.bak

I can't think of any other way to restore both in one transaction unless I dump in plain text, but that has other drawbacks. Any hint?
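One possible middle ground, a sketch I haven't tested here: it relies on pg_restore emitting plain SQL to stdout when no -d is given, and on psql's -1/--single-transaction flag, so both archives end up in one transaction without dumping in plain text:

```
# convert both archives to SQL on the fly and feed them to a single
# psql session wrapped in one transaction
( pg_restore nearly_full.bak ; pg_restore schema_only.bak ) | psql -1 -d mydb
```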