[GENERAL] Full text: Ispell dictionary

2014-05-02 Thread Tim van der Linden
Good morning/afternoon all

I am currently writing a few articles about PostgreSQL's full text capabilities 
and have a question about the Ispell dictionary which I cannot seem to find an 
answer to. It is probably a very simple issue, so forgive my ignorance.

In one article I explain dictionaries, and I have set up a sample 
configuration which maps most token categories to only use an Ispell dictionary 
(timusan_ispell) which has a default configuration:

CREATE TEXT SEARCH DICTIONARY timusan_ispell (
TEMPLATE = ispell,
DictFile = en_us,
AffFile = en_us,
StopWords = english
);

When I run a simple query like SELECT to_tsvector('timusan-ispell','smiling') 
I get back the following tsvector:

'smile':1 'smiling':1

As you can see I get two lexemes with the same pointer.
The question here is: why does this happen? 

Is it normal behavior for the Ispell dictionary to emit multiple lexemes for a 
single token? And if so, is this efficient? I mean, why could it not simply 
save the single lexeme 'smile', which (as with the snowball dictionary) would 
match 'smiling' as well when later matched against the accompanying tsquery?

Thanks!

Cheers,
Tim


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] break table into portions for writing to separate files

2014-05-02 Thread Francisco Olarte
Hi Seb:

On Thu, May 1, 2014 at 8:50 PM, Seb splu...@gmail.com wrote:
> Thanks, I'm glad to hear you've used this approach successfully.

Well, this is always successful if you are able to develop a
moderately complex script.

> It seems as though the best solution is to do a single SELECT to get the
> data out of the server (it is a view with a very complex query plan
> joining several other similar views), and then pipe the output through
> say awk to break down into chunks for writing the files, as you
> describe.

If you've got enough disk in the target machine (several GB, even a
hundred, shouldn't be a problem with today's, or even yesterday's,
machines) for two copies of the data, I'd recommend just doing a COPY
of the view to a file, then processing the file. This lets you use
very simple scripts to chunk it, and workstation disk is normally way
cheaper than server disk.

What I would normally do from what you describe is spool the whole
table, maybe gzipping it along the way if it's really big. In my
experience some current machines (fast multicores with not-so-fast
disks) are faster gzipping and reading/writing a third of the data
(easily achievable with gzip if your data are numbers and timestamps)
than writing the full set. Then make a set of scripts which work on
the spooled file. I do not grok awk (I began using perl in the 4.019
era and substituted it for awk/sed in all kinds of one-liners and the
like), but with perl, on a moderate machine (like the one I'm using, a
core duo with 2GB, quite old by today's standards) you can filter the
data and extract it to several hundred files at nearly full disk
speed.

As I commented before, if your query results exhibit some locality on
the key, you can open several files, keep an LRU cache of the open
handles, and split everything in one go. I wouldn't try to force
ordering in the db; I've found disk sorts are better for this kind of
problem.

Ah, another hint. I've had to do somewhat similar tasks (dumping a
big query which joins a lot) in the past. In some of these cases the
result came from sweeping and concatenating a slew of really big
tables while joining them with a lot of small ones, something like
sweeping 20GB while joining it with 15 more tables totalling 512MB
among them, generating a 100GB denormalized result. In these cases I
developed a program which just slurped all the small tables into RAM
and did the join before writing, which greatly improved the speed
(as, among other things, Postgres only had to send me 20.5GB over the
net). Sometimes you can use things like this to improve performance:
workstation RAM is often plentiful and cheap, and you have all of it
for a single task, while db server RAM is scarcer.
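If it helps to make the open-several-files-with-an-LRU-cache idea concrete,
here is a minimal sketch, in Python rather than perl purely for illustration.
The tab-separated layout with the key in the first column, the output file
naming, and the cache size are all assumptions, not anything from the thread:

```python
# Split a COPY-format dump (tab-separated, key in column 1) into one
# file per key, keeping at most MAX_OPEN output files open at a time.
import os
from collections import OrderedDict

MAX_OPEN = 100  # cap on simultaneously open output files


def split_dump(src_path, out_dir):
    handles = OrderedDict()  # key -> open file, kept in LRU order

    def handle_for(key):
        if key in handles:
            handles.move_to_end(key)          # mark as recently used
            return handles[key]
        if len(handles) >= MAX_OPEN:          # evict least recently used
            _, oldest = handles.popitem(last=False)
            oldest.close()
        # Append mode, so a re-opened key's file keeps its earlier rows.
        f = open(os.path.join(out_dir, key + '.txt'), 'a')
        handles[key] = f
        return f

    with open(src_path) as src:
        for line in src:
            key = line.split('\t', 1)[0]
            handle_for(key).write(line)

    for f in handles.values():
        f.close()
```

Because eviction closes files opened in append mode, locality in the key only
affects speed, not correctness: a key that reappears later simply gets its
file re-opened and extended.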

Francisco Olarte.




Re: [GENERAL] Backups over slave instead master?

2014-05-02 Thread Edson Richter

On 01/05/2014 16:39, bricklen wrote:
> On Thu, May 1, 2014 at 8:54 AM, Shaun Thomas
> stho...@optionshouse.com wrote:
>> On 05/01/2014 10:31 AM, Edson Richter wrote:
>>> I'm wondering if it would be possible to execute these backups on
>>> the slave server instead, so I can avoid the overhead of backups
>>> on the master system?
>>
>> If you're on PostgreSQL 9.3, you can back up the slave server
>> safely. If not, you'll need to run this command on the master
>> system first:
>>
>> SELECT pg_start_backup('some-label');
>>
>> After the backup is done, run this on the master server:
>>
>> SELECT pg_stop_backup();
>
> Or alternatively, if backup = pg_dump, then backups can be taken from
> the slave too. Have a look at pg_xlog_replay_pause() + pg_dump +
> pg_xlog_replay_resume().
>
> http://www.postgresql.org/docs/current/static/functions-admin.html#FUNCTIONS-RECOVERY-CONTROL-TABLE

Yes, backup = pg_dump.
Can I use it on the slave even on 9.2.7?

Edson



Re: [GENERAL] Backups over slave instead master?

2014-05-02 Thread Shaun Thomas

On 05/02/2014 09:46 AM, Edson Richter wrote:

> Yes, backup = pg_dump.
> Can I use it on the slave even on 9.2.7?

Yes. The reason I suggested 9.3 was assuming you were using 
pg_basebackup, rsync, or a similar tool to back up the database binaries. 
You should be fine with pg_dump.
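For a 9.2 standby, bricklen's pause/dump/resume recipe amounts to the
following sequence. This is a hedged sketch: the host and database names are
placeholders, and only the two replay-control functions come from the thread
itself:

```sql
-- On the standby, before starting the dump:
SELECT pg_xlog_replay_pause();

-- Then run pg_dump against the standby from a shell, e.g.:
--   pg_dump -h standby-host -d mydb -f mydb.sql

-- Once the dump finishes, let replay catch up again:
SELECT pg_xlog_replay_resume();
```

Pausing replay keeps the dump's snapshot from conflicting with incoming WAL
while it runs.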


--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
stho...@optionshouse.com



Re: [GENERAL] Full text: Ispell dictionary

2014-05-02 Thread Oleg Bartunov
Yes, it's normal for the ispell dictionary; think about morphological dictionaries.



[GENERAL] Timeouts after upgrade from 9.0 to 9.3

2014-05-02 Thread gvim
I've just upgraded a server from 9.0 to 9.3 using the pgupgrade.sh 
script and set up the same config, but now, when I connect with the 
latest pgAdmin III, I get a timeout after leaving the connection open for 
a few minutes. Where can I diagnose this fault?


gvim




[GENERAL] Optimize query for listing un-read messages

2014-05-02 Thread Andreas Joseph Krogh
Hi all,

I have a schema with lots of messages and some users who might have read 
some of them. When a message is read by a user I create an entry in a table 
message_property holding the property (is_read) for that user.

The schema is as follows:

drop table if exists message_property;
drop table if exists message;
drop table if exists person;

create table person(
    id serial primary key,
    username varchar not null unique
);

create table message(
    id serial primary key,
    subject varchar
);

create table message_property(
    message_id integer not null references message(id),
    person_id integer not null references person(id),
    is_read boolean not null default false,
    unique(message_id, person_id)
);

insert into person(username) values('user_' || generate_series(0, 999));
insert into message(subject) values('Subject ' || random() || generate_series(0, 999999));
insert into message_property(message_id, person_id, is_read)
    select id, 1, true from message order by id limit 999990;
insert into message_property(message_id, person_id, is_read)
    select id, 1, false from message order by id limit 5 offset 999990;

So, for person 1 there are 10 unread messages out of a total of 1 million: 
5 of those unread messages have no entry in message_property, and 5 have an 
entry with is_read set to FALSE.

I have the following query to list all un-read messages for the person with 
id=1:

SELECT
    m.id AS message_id,
    prop.person_id,
    coalesce(prop.is_read, FALSE) AS is_read,
    m.subject
FROM message m
    LEFT OUTER JOIN message_property prop
        ON prop.message_id = m.id AND prop.person_id = 1
WHERE 1 = 1
  AND NOT EXISTS(SELECT *
                 FROM message_property pr
                 WHERE pr.message_id = m.id
                   AND pr.person_id = prop.person_id
                   AND prop.is_read = TRUE);

The problem is that it's not quite efficient and performs badly; explain 
analyze shows:

                                  QUERY PLAN
-------------------------------------------------------------------------------
 Merge Anti Join  (cost=1.27..148784.09 rows=5 width=40) (actual time=907.412..907.419 rows=10 loops=1)
   Merge Cond: (m.id = pr.message_id)
   Join Filter: (prop.is_read AND (pr.person_id = prop.person_id))
   Rows Removed by Join Filter: 5
   ->  Merge Left Join  (cost=0.85..90300.76 rows=1000000 width=40) (actual time=0.039..524.070 rows=1000000 loops=1)
         Merge Cond: (m.id = prop.message_id)
         ->  Index Scan using message_pkey on message m  (cost=0.42..34317.43 rows=1000000 width=35) (actual time=0.014..112.840 rows=1000000 loops=1)
         ->  Index Scan using message_property_message_id_person_id_key on message_property prop  (cost=0.42..40983.40 rows=999995 width=9) (actual time=0.019..129.035 rows=999995 loops=1)
               Index Cond: (person_id = 1)
   ->  Index Only Scan using message_property_message_id_person_id_key on message_property pr  (cost=0.42..40983.40 rows=999995 width=8) (actual time=0.023..138.327 rows=999995 loops=1)
         Index Cond: (person_id = 1)
         Heap Fetches: 999995
 Total runtime: 907.480 ms
(13 rows)

Does anyone have suggestions on how to optimize the query or the schema? 
It's important that any message not having an entry in message_property for 
a user is considered un-read.

Thanks!

-- 
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com
www.visena.com

[GENERAL] Manipulating jsonb

2014-05-02 Thread Andreas Heiduk
Hello,

I'm playing with a 9.4 preview and the upcoming jsonb datatype. But
I'm missing the basic modification syntax.

Given the following situation:

CREATE TABLE test(key int, jsonval jsonb);
INSERT INTO test VALUES(1, '{"a": 1, "c": 3}');

How can I UPDATE that row with '{"b": 2, "d": 4}'? Something like
this does not work:

UPDATE test SET jsonval = jsonval || '{"b": 2, "d": 4}'::jsonb
where key = 1;

The result should be

{"a": 1, "b": 2, "c": 3, "d": 4}

The same goes for removing values.

Did I overlook something obvious or is there really no way to do that right now?

Best regards,
Andreas Heiduk
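The merge and delete semantics being asked for can at least be pinned down
outside the database. A small Python sketch, purely to illustrate the expected
behaviour of the wished-for operators; this is not a PostgreSQL API, and the
function names are invented for the illustration:

```python
# Sketch of the desired jsonb "||" (merge) and key-removal semantics,
# expressed on plain Python dicts.
import json


def jsonb_concat(a, b):
    """Key-wise union of two JSON objects; keys from b win on conflict."""
    merged = dict(a)
    merged.update(b)
    return merged


def jsonb_delete(a, key):
    """Copy of a with the given top-level key removed."""
    return {k: v for k, v in a.items() if k != key}


row = json.loads('{"a": 1, "c": 3}')
patch = json.loads('{"b": 2, "d": 4}')
assert jsonb_concat(row, patch) == {"a": 1, "b": 2, "c": 3, "d": 4}
assert jsonb_delete(row, "c") == {"a": 1}
```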




Re: [GENERAL] Vacuuming strategy

2014-05-02 Thread Jeff Janes
On Wed, Apr 30, 2014 at 10:40 AM, Elanchezhiyan Elango
elanela...@gmail.com wrote:

>> Why do you have a 4 minute timeout?  That seems counter-productive.
>
> Oh, Is it less or more?


I would not have timeouts on maintenance operations at all.  To me a
statement timeout is a last ditch method to deal with a recalcitrant
application (or recalcitrant colleague) which occasionally does something
silly and which cannot be fixed.





>> You delete a bunch of tuples every night, so of course a vacuum full
>> after that is going to return a lot of space.  But that space is
>> probably just going to be needed again the next day.  If you don't do
>> the vacuum full, does the *peak* space keep increasing, or does it
>> stabilize?
>
> I haven't tested to see if the space keeps on increasing.
>
> I did pgstattuple() on one of the tables:
>
> managed_target_stats=# select * from pgstattuple('xyz');
>
>  table_len          | 5642526720
>  tuple_count        | 18880283
>  tuple_len          | 4042874387
>  tuple_percent      | 71.65
>  dead_tuple_count   | 122037
>  dead_tuple_len     | 21495560
>  dead_tuple_percent | 0.38
>  free_space         | 1239598044
>  free_percent       | 21.97
>
> This is one of the Tables1 tables and this is after running for a week
> or so with the default autovacuum settings. The dead_tuple_percent
> looks good. But the free_percent looks high. Is this normal?


I don't think 21.97 percent free space is anything to worry about.
 Especially since, if you have not done any successful VACUUM FULL or
CLUSTER recently, the observed value represents the effects of a long
history under different settings, so it means little for what to expect in
the future.  You were doing vacuum full every night, but if it always timed
out then the old state of the table was retained.


> Also when I enabled autovacuum logs, I saw the autovacuums triggering
> very rarely. Maybe that's the reason for too much free space? I am
> going to try with the following settings:
>
> log_autovacuum_min_duration = 0
> autovacuum_vacuum_scale_factor = 0
> autovacuum_vacuum_threshold = 4


I would not use those settings system-wide, or any small tables you have
might bloat by a lot.  The size lost to such bloating is not meaningful,
but the performance lost to it could be.

Indeed, I don't think you need to change these at all, or at least not
based on current evidence.

The only concrete problem you have is that you are doing vacuum fulls,
which are probably unnecessary, and they are hitting an arbitrary timeout,
which was also probably unnecessary.  So just stop doing those two things,
and monitor the situation to see what happens.

Cheers,

Jeff


Re: [GENERAL] Vacuuming strategy

2014-05-02 Thread Jeff Janes
On Wed, Apr 30, 2014 at 10:45 AM, Elanchezhiyan Elango
elanela...@gmail.com wrote:

> Missed to answer this one:
>
>> Is the 5GB for the table plus indexes, or just the table itself?
>
> No, it's not including the indices. Including indices it's actually
> 17GB!!


Is it one particular index that is bloated?

Cheers,

Jeff


Re: [GENERAL] Manipulating jsonb

2014-05-02 Thread Oleg Bartunov
Andreas,

take a look at https://github.com/erthalion/jsonbx. This is the place
where all the hstore functionality will eventually be ported. See this
table - https://gist.github.com/erthalion/10890778

Oleg



Re: [GENERAL] Vacuuming strategy

2014-05-02 Thread Jeff Janes
On Wed, Apr 30, 2014 at 10:49 AM, Elanchezhiyan Elango
elanela...@gmail.com wrote:

> Each table has 4 indices. The updates are to the indexed columns.
>
> Here is the schema of a table: http://pastebin.com/ffu0dUjS All tables
> have this same schema except that some tables don't have a port column
> and so will have one less index.
>
>> What indexes exist?  Are the updates to indexed columns?
>
> Sorry, I was wrong when I said the updates are to the indexed columns.
> The updates are to the 'data[]' column which doesn't have any index.



In this case your updates might qualify for Heap-Only-Tuple (HOT) updates.
Those are nice because they don't bloat the indexes and because the space
within the tables can be re-used faster, without needing an intervening
vacuum to reclaim it.  But to get that benefit, you have to have enough
space in the page to record the new tuple, and so it would benefit from a
lower fillfactor.  But once that part of the table goes cold, the lower
fillfactor no longer is a benefit and just wastes space.  That is where
partitioning could come in handy.  Once the tuples have become old enough
that they can't be updated any more by the app, just move them from the
live partition to the cold partition and they get repacked in the process.

But so far there is no indication you need to resort to such things.
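If it ever does become necessary, note that fillfactor is a per-table
storage parameter. A hypothetical sketch (the table name is a placeholder,
and 70 is an arbitrary illustrative value):

```sql
-- Leave 30% of each heap page free so that updates can land on the
-- same page and qualify as HOT.
ALTER TABLE my_stats_table SET (fillfactor = 70);
-- Existing rows are only repacked when their pages are next rewritten,
-- e.g. by VACUUM FULL or CLUSTER.
```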

Cheers,

Jeff


Re: [GENERAL] Timeouts after upgrade from 9.0 to 9.3

2014-05-02 Thread Adrian Klaver

On 05/02/2014 10:46 AM, gvim wrote:

> I've just upgraded a server from 9.0 to 9.3 using the pgupgrade.sh
> script and set up the same config but now, when I connect with the
> latest pgAdmin III, I get a timeout after leaving the connection open
> for a few minutes. Where can I diagnose this fault?


A lot of information is missing, so I'm flying sort of blind here. To help 
fill in the blanks:


1) What OS/distribution are you running?

2) I am not familiar with pgupgrade.sh. What does it do?

3) You say latest PgAdmin3, does that mean you upgraded it at the same 
time and if so to what version?


As to how to troubleshoot:

1) Look in the Postgres log and see if there is anything informative. If 
you have not already, in postgresql.conf enable connection/disconnection 
logging.
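The settings in question live in postgresql.conf; shown here as a sketch of
a stock configuration:

```
# postgresql.conf
log_connections = on
log_disconnections = on
```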


2) Are you connecting locally or remotely?





--
Adrian Klaver
adrian.kla...@aklaver.com




Re: [GENERAL] Full text: Ispell dictionary

2014-05-02 Thread Tim van der Linden
On Fri, 2 May 2014 21:12:56 +0400
Oleg Bartunov obartu...@gmail.com wrote:

Hi Oleg

Thanks for the response!

> Yes, it's normal for the ispell dictionary; think about morphological dictionaries.

Hmm, I see, that makes sense. I thought the morphological aspect of Ispell 
only dealt with splitting up compound words, but it also deals with reducing 
a word to a more stem-like form, correct?

As a last question on this, is there a way to stop this dictionary from 
emitting multiple lexemes?

The reason I am asking is because, in my (fairly new) understanding of 
PostgreSQL's full text search, it is always best to have as few lexemes as 
possible saved in the vector. This gives smaller indexes and faster matching 
afterwards. Also, when you run a tsquery afterwards, you can still employ the 
power of these multiple lexemes to find a match.

Or...probably answering my own question...if I do not desire this behavior I 
should maybe not use Ispell and simply use another dictionary :)

Thanks again.

Cheers,
Tim



-- 
Tim van der Linden t...@shisaa.jp




Re: [GENERAL] Full text: Ispell dictionary

2014-05-02 Thread Oleg Bartunov
Tim,

you did answer yourself - don't use ispell :)



Re: [GENERAL] Full text: Ispell dictionary

2014-05-02 Thread Tim van der Linden
Hi Oleg

Haha, understood!

Thanks for helping me on this one.

Cheers
Tim



Re: [GENERAL] Manipulating jsonb

2014-05-02 Thread Oleg Bartunov
No way, Andreas!

But we hope Dimitri will release his extension before 9.4, so
anybody can install it.

Oleg

On Sat, May 3, 2014 at 1:21 AM, Andreas Heiduk ashei...@gmail.com wrote:

> Hello Oleg,
>
> how are the odds that the '||' and '-' operators from jsonbx will be
> included in the public 9.4 release?
>
> Andreas



