[BUGS] BUG #8469: Xpath behaviour unintuitive / arguably wrong

2013-09-24 Thread dennis . noordsij
The following bug has been logged on the website:

Bug reference:  8469
Logged by:  Dennis
Email address:  dennis.noord...@helsinki.fi
PostgreSQL version: 9.3.0
Operating system:   FreeBSD 9.2-RC4
Description:

Hi,


After upgrading an 8.1 version to 9.3.0 I am suddenly seeing text fields
containing "&" where they are populated from XML. This may be a
coincidence and the problem may have existed earlier, in any case, now I
noticed.


I extract the text content of XML nodes using xpath, from something like:


Jones & Smith


The reason I end up with "&" is the IMHO rather odd xpath behaviour:


# select xpath('/a/text()', (select xmlelement(name "a", 'A & B')));


 xpath 
---
 {"A & B"}


The canonical contents of "a" is "A & B". At first search I've found some
rather heated debates about this with bits of name calling; I certainly do
not want to get into that and I apologize in advance to those who feel very
strongly about this.


I've seen one "fix" describe the problem as:


""DESCRIPTION: Submitter invokes following statement:
SELECT (XPATH('/*/text()', '<'))[1].
He expect (escaped) result "<", but gets "<"
"""


With respect, this "bug" makes no sense as this produces in fact the right
result. The actual value of  is "<", it's just escaped when serialized
to XML. If  were to actually contain "<", it'd be serialized as
"<". It should not be possible to be blindly cast to a text type, but
explicitly serialized as such.


At least the reviewer at:


http://www.postgresql.org/message-id/201106291934.23089.rsmog...@softperience.eu


agrees, but I don't know what happened with that.


The python lxml implementation based on libxml2 seems to also agree:


>>> from lxml import etree
>>> a = etree.XML("")
>>> a.text = "A & B";
>>> a

>>> etree.tostring(a);
'A & B'
>>> a.text
'A & B'
>>> a.xpath('/a')
[]
>>> a.xpath('/a/text()')
['A & B']


and similarly for a simple test using xsltproc when set to output text.


If this really is the intended behaviour or something which can or will not
be changed, then it invites double (un)escaping bugs and so on, and I would
like to ask how you are supposed to sanely extract the intended text from a
node in an XML document without risking double (un)escaping, and whether
everybody else is doing it wrong?


I get that xpath(..text()) apparently wants to return a type XML, that this
is on purpose and that there are certain use cases where you want to treat
the result as a type XML which you could not do if it returned an unescaped
text value, like here:
 
select xmlelement(name "b", (select (select xpath('/a/text()', (select
xmlelement(name "a", 'A & B'[1]));
xmlelement
--
 A & B


which does not double escape the contents, but where if you cast, it does:


select xmlelement(name "b", (select (select xpath('/a/text()', (select
xmlelement(name "a", 'A & B'[1]::text));
  xmlelement  
--
 A & B
(1 row)


I personally don't believe this is very helpful. The escaping is only a
serialization artifact, a text node does not actually contain any &s and
so on.


My first thought is then that casting between text and xml should not even
be possible, and always an explicit (de)serialization to/from text using a
chosen encoding (with a shortcut to a PostgreSQL unicode text type), i.e.
treated similarly to the difference between a unicode string and utf-8
encoded representation , and not the equivalent of blindly casting a byte
sequence to a string and back and hoping for the best.


If xpath(..text()) then absolutely has to return a type XML I would be happy
to explicitly serialize it to a type text, if PostgreSQL would forbid me
from (accidentally) storing a result in my text field I almost certainly did
not intend (the escaped value containing &). Of course my first
preference would be that it would return a type TEXT.


I appreciate any thoughts and workarounds. I don't really want to add xml
unescapes everywhere, that feels like that php method of unescaping a string
until it stops changing. If the user did intend the literal text "&" I
of course want to preserve that.


Many thanks!



-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


[BUGS] BUG #4814: Segmentation fault when using indexed prefix FT search

2009-05-17 Thread Dennis Noordsij

The following bug has been logged online:

Bug reference:  4814
Logged by:  Dennis Noordsij
Email address:  dennis.noord...@helsinki.fi
PostgreSQL version: 8.4 SNAP 17MAY
Operating system:   Linux
Description:Segmentation fault when using indexed prefix FT search
Details: 

To replicate on latest 8.4 snapshot (17 May):


begin;

create table foo (
ts  tsvector not null
);

-- Without this index the query does not
-- crash.
create index foo_idx on foo using gin(ts);

insert into foo values (to_tsvector('Lore ipsum'));

select * from foo where 'ipsu:*'::tsquery @@ ts;

-- The query works correctly when not doing a
-- prefix search

abort;



The EXPLAIN for the search query is:

  QUERY PLAN
--
 Bitmap Heap Scan on foo  (cost=4.30..14.45 rows=7 width=32)
   Recheck Cond: ('''ipsu'':*'::tsquery @@ ts)
   ->  Bitmap Index Scan on foo_idx  (cost=0.00..4.30 rows=7 width=0)
 Index Cond: ('''ipsu'':*'::tsquery @@ ts)





Backtrace:


#0  0x in ?? ()
#1  0x0075144d in FunctionCall4 (flinfo=0xd4c908, arg1=13968504,
arg2=47309168818128, arg3=1, arg4=13968480) at fmgr.c:1360
#2  0x004b7480 in matchPartialInPendingList (ginstate=0xd4b0d8,
page=0x2b07063397e0 "", off=1, maxoff=3, value=13968504,
attrnum=1, datum=0x7fffa5883ae0, datumExtracted=0x7fffa5885ae0
"\001\001", strategy=1, extra_data=0xd52460 "") at ginget.c:858
#3  0x004b794c in collectDatumForItem (scan=0xd4afb0,
pos=0x7fffa5885fa0) at ginget.c:991
#4  0x004b7b45 in scanPendingInsert (scan=0xd4afb0, tbm=0xd4e2b8,
ntids=0x7fffa5886028) at ginget.c:1081
#5  0x004b7ff4 in gingetbitmap (fcinfo=0x7fffa5886070) at
ginget.c:1229
#6  0x0075125e in FunctionCall2 (flinfo=0xd21330, arg1=13938608,
arg2=13951672) at fmgr.c:1308
#7  0x00480696 in index_getbitmap (scan=0xd4afb0, bitmap=0xd4e2b8)
at indexam.c:670
#8  0x0059c2cd in MultiExecBitmapIndexScan (node=0xd4ad10) at
nodeBitmapIndexscan.c:91
#9  0x005894db in MultiExecProcNode (node=0xd4ad10) at
execProcnode.c:501
#10 0x0059b643 in BitmapHeapNext (node=0xd338d0) at
nodeBitmapHeapscan.c:131
#11 0x00593a44 in ExecScan (node=0xd338d0, accessMtd=0x59b494
) at execScan.c:68
#12 0x0059bdaf in ExecBitmapHeapScan (node=0xd338d0) at
nodeBitmapHeapscan.c:429
#13 0x0058928e in ExecProcNode (node=0xd338d0) at
execProcnode.c:373
#14 0x005a300e in ExecNestLoop (node=0xd32830) at
nodeNestloop.c:154
#15 0x0058930c in ExecProcNode (node=0xd32830) at
execProcnode.c:404
#16 0x00586b61 in ExecutePlan (estate=0xd32310, planstate=0xd32830,
operation=CMD_SELECT, numberTuples=0,
direction=ForwardScanDirection, dest=0xd2fba0) at execMain.c:1504
#17 0x00584df4 in standard_ExecutorRun (queryDesc=0xcefa20,
direction=ForwardScanDirection, count=0) at execMain.c:309
#18 0x00584cfd in ExecutorRun (queryDesc=0xcefa20,
direction=ForwardScanDirection, count=0) at execMain.c:258
#19 0x0066a47e in PortalRunSelect (portal=0xd1ae40, forward=1
'\001', count=0, dest=0xd2fba0) at pquery.c:953
#20 0x0066a121 in PortalRun (portal=0xd1ae40,
count=9223372036854775807, isTopLevel=1 '\001', dest=0xd2fba0,
altdest=0xd2fba0, completionTag=0x7fffa5886aa0 "") at pquery.c:779
#21 0x0066490d in exec_simple_query (
query_string=0xcbe830 "select * from foo, to_tsquery('ipsu:*') as query
where query @@ ts;") at postgres.c:991
#22 0x0066867d in PostgresMain (argc=4, argv=0xc21178,
username=0xc21140 "developer") at postgres.c:3606
#23 0x00632c0d in BackendRun (port=0xc423e0) at postmaster.c:3389
#24 0x006321f4 in BackendStartup (port=0xc423e0) at
postmaster.c:3005
#25 0x0062fb0a in ServerLoop () at postmaster.c:1343
#26 0x0062f409 in PostmasterMain (argc=1, argv=0xc1e750) at
postmaster.c:1039
#27 0x005bd216 in main (argc=1, argv=0xc1e750) at main.c:188

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #4793: Segmentation fault when doing vacuum analyze

2009-05-05 Thread Dennis Noordsij
On Tuesday 05 May 2009 17:16:03 Tom Lane wrote:
> "Dennis Noordsij"  writes:
> > (gdb) bt
> > #0  0x004eecf7 in compute_scalar_stats (stats=0x1abd878,
> > fetchfunc=0x4f0f30 , samplerows= > out>, totalrows=4154315)
> > at analyze.c:2321
> > #1  0x004efbf5 in analyze_rel (relid=16484, vacstmt=0x1aaf140,
> > bstrategy=, update_reltuples=1 '\001') at
> > analyze.c:433
>
> Hmm, that code hasn't changed in quite some time, so I doubt this is a
> new bug in 8.4.  You'll need to either poke into it yourself, or supply
> a dump of the table to someone who can.
>
>   regards, tom lane

As an update on the bug (sorry if this arrived twice):

for (i = 0; i < num_hist; i++)
{
int pos;

pos = (i * (nvals - 1)) / (num_hist - 1);
hist_values[i] = datumCopy(values[pos].value,
   stats->attr->attbyval,
   stats->attr->attlen);
}

What happens is that:

(gdb) print i
$17 = 1458
(gdb) print nvals
$18 = 1473527
(gdb) print num_hist
$19 = 5001
(gdb) print pos
$20 = -429313

(gdb) print samplerows
$22 = 150
(gdb) print values_cnt
$34 = 150

(gdb) print ndistinct
$35 = 904980
(gdb) print nmultiple
$36 = 435290
(gdb) print num_hist
$37 = 5001
(gdb) print dups_cnt
$38 = 0
(gdb) print slot_idx
$39 = 1

Without the overflow the result of 
pos = (i * (nvals - 1)) / (num_hist - 1);
would be 429680, which would be a valid index into "values"

Cheers
Dennis


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


[BUGS] BUG #4793: Segmentation fault when doing vacuum analyze

2009-05-05 Thread Dennis Noordsij

The following bug has been logged online:

Bug reference:  4793
Logged by:  Dennis Noordsij
Email address:  dennis.noord...@helsinki.fi
PostgreSQL version: snapshot/beta1
Operating system:   64bit arch Linux
Description:Segmentation fault when doing vacuum analyze
Details: 

Seen on both 8.4beta1 and daily snapshot downloaded 5 May 2009.

Table used (effects also seen on similar tables):

osm=# \d way_tags
   Table "public.way_tags"
 Column |  Type  | Modifiers
++---
 way| bigint | not null
 key| text   | not null
 value  | text   |
Indexes:
"way_tags_pkey" PRIMARY KEY, btree (way, key)
"way_tag_kv_idx" btree (key, value)
"way_tag_wkv_idx" btree (way, key, value)


osm=# select count(*) from way_tags;
  count
-
 4154315
(1 row)

osm=# select count(*) from (select distinct key from way_tags) as foo;
 count
---
   322
(1 row)

osm=# select count(*) from (select distinct value from way_tags) as foo;
  count
-
 1124909
(1 row)



Statement:
alter table way_tags alter column way set statistics 5000;
alter table way_tags alter column key set statistics 5000;
vacuum analyze;

results in a segmentation fault with the following backtrace:

(gdb) bt
#0  0x004eecf7 in compute_scalar_stats (stats=0x1abd878,
fetchfunc=0x4f0f30 , samplerows=,
totalrows=4154315)
at analyze.c:2321
#1  0x004efbf5 in analyze_rel (relid=16484, vacstmt=0x1aaf140,
bstrategy=, update_reltuples=1 '\001') at
analyze.c:433
#2  0x00538681 in vacuum (vacstmt=0x1aaf140, relid=,
do_toast=1 '\001', bstrategy=, for_wraparound=0
'\0',
isTopLevel=) at vacuum.c:466
#3  0x005e8bc7 in PortalRunUtility (portal=0x1ae00c0,
utilityStmt=0x1aaf140,
isTopLevel=64 '@', dest=0xabc160, completionTag=0x7fff39e7f0e0 "") at
pquery.c:1192
#4  0x005e9d0d in PortalRunMulti (portal=0x1ae00c0,
isTopLevel=,
dest=0xabc160, altdest=0xabc160, completionTag=0x7fff39e7f0e0 "") at
pquery.c:1297
#5  0x005ea482 in PortalRun (portal=0x1ae00c0,
count=9223372036854775807,
isTopLevel=1 '\001', dest=0xabc160, altdest=0xabc160,
completionTag=0x7fff39e7f0e0 "")
at pquery.c:823
#6  0x005e5807 in exec_simple_query (query_string=0x1aae880 "vacuum
analyze way_tags;\n")
at postgres.c:991
#7  0x005e6dc7 in PostgresMain (argc=1, argv=,
username=0x1a07560 "dennis") at postgres.c:3606
#8  0x005676af in main (argc=5, argv=0x1a05b70) at main.c:186



Note if I vacuum analyze between the alter table statements then everything
completes fine.

After the segfault, restarting and then running "vacuum analyze way_tags" is
enough to trigger the segfault again.

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs