Small doc tweak for array/string functions

2018-12-23 Thread Ian Barwick

Hi

On these pages:

 - https://www.postgresql.org/docs/current/functions-array.html
 - https://www.postgresql.org/docs/current/functions-string.html

we point out via "See also" the existence of the aggregate array and string
functions, but I think it would be useful to also mention the string-related
array functions and the array-related string (regexp) functions,
respectively.

(Background: due to brain fade I was looking on the array functions page
for the array-related function whose name was escaping me, the one that
does something with regexes to make an array, and was puzzled to find no
reference to it on that page.)
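
For reference, and for anyone else whose memory fails them, the functions
in question behave like this:

-- string <-> array conversions, documented with the array functions:
SELECT string_to_array('a,b,c', ',');            -- {a,b,c}
SELECT array_to_string(ARRAY['a','b','c'], ','); -- a,b,c

-- the regexp function that produces an array, documented with the string
-- functions (this is the one I was looking for):
SELECT regexp_split_to_array('one  two   three', '\s+'); -- {one,two,three}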


Regards

Ian Barwick

--
 Ian Barwick   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
new file mode 100644
index 3786099..1684189
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
***
*** 2363,2369 
  
 
 See also the aggregate function string_agg in
!.
 
  
 
--- 2363,2371 
  
 
 See also the aggregate function string_agg in
!, and the array functions
!array_to_string and string_to_array
!in .
 
  
 
*** NULL baz(3 rows)
*** 13163,13169 
  
 
  See also  about the aggregate
! function array_agg for use with arrays.
 

  
--- 13165,13173 
  
 
  See also  about the aggregate
! function array_agg for use with arrays, and
!  about the regular expression function
! regexp_split_to_array.
 

  


Minor comment fix for pg_config_manual.h

2018-12-23 Thread Ian Barwick

Hi

Attached is a patch which mainly fixes a comment in $subject containing a
typo in the referenced initdb option ("--walsegsize" should be
"--wal-segsize"); while there it also adds a couple of "the"s for
readability.


Regards

Ian Barwick

--
 Ian Barwick   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/include/pg_config_manual.h b/src/include/pg_config_manual.h
new file mode 100644
index cc5eedf..cf1cfb4
*** a/src/include/pg_config_manual.h
--- b/src/include/pg_config_manual.h
***
*** 14,21 
   */
  
  /*
!  * This is default value for wal_segment_size to be used at initdb when run
!  * without --walsegsize option. Must be a valid segment size.
   */
  #define DEFAULT_XLOG_SEG_SIZE	(16*1024*1024)
  
--- 14,21 
   */
  
  /*
!  * This is the default value for wal_segment_size to be used at initdb when run
!  * without the --wal-segsize option. Must be a valid segment size.
   */
  #define DEFAULT_XLOG_SEG_SIZE	(16*1024*1024)
  


Re: Performance issue in foreign-key-aware join estimation

2018-12-23 Thread David Rowley
On Sat, 22 Dec 2018 at 10:53, David Rowley  wrote:
> Back in [1], I mentioned that I'd like to start moving away from our
> linked list based implementation of List and start using an array
> based version instead. If we had this we could easily further improve
> this fkey matching code to not even look near equivalence classes
> that don't contain members for the relations in question with a design
> something like:
>
> 1. Make PlannerInfo->eq_classes an array list instead of a linked list;
> this will significantly improve the performance of list_nth().
> 2. Have a Bitmapset per relation that indexes which items in
> eq_classes have ec_members for the given relation.
> 3. In match_eclasses_to_foreign_key_col() perform a bms_overlap() on
> the eq_classes index bitmapsets for the relations at either side of
> the foreign key then perform a bms_next_member() type loop over the
> result of that in order to skip over the eq_classes items that can't
> match.

Using the above idea, but rather than going to the trouble of storing
PlannerInfo->eq_classes as an array-type list, building the array on the
fly inside match_foreign_keys_to_quals() and then building a Bitmapset
index marking which of the eclasses contain members for each relation, I
can get the run-time for the function down to just 0.89%.  Looking at
other functions appearing high on the profile, I also see
have_relevant_eclass_joinclause() (14%) and
generate_join_implied_equalities_for_ecs() (23%).

I think if we seriously want to improve planning performance when
there are many stored equivalence classes, then we need to have
indexing along the lines of what I've outlined above.
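
For anyone wanting to reproduce this: the overhead shows up when planning
joins over many relations linked by foreign keys. A toy schema generator,
as a rough sketch (all table and column names here are made up):

CREATE TABLE parent (id int PRIMARY KEY);

DO $$
BEGIN
  FOR i IN 1..100 LOOP
    EXECUTE format(
      'CREATE TABLE child%s (id int PRIMARY KEY REFERENCES parent (id))', i);
  END LOOP;
END;
$$;

-- Planning a join across many of the children exercises the fkey/eclass
-- matching code; the cost is visible in the planning time shown by e.g.
-- EXPLAIN (SUMMARY) SELECT * FROM parent p
--   JOIN child1 USING (id) JOIN child2 USING (id) /* ... and so on */;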

I've attached the patch I used to test this idea.  It might be
possible to develop this into something committable, perhaps if we
invent a new function in equivclass.c that builds the index into a
single struct and we pass a pointer to that down to the functions that
require the index.  Such a function could also optionally skip
indexing eclasses such as ones with ec_has_volatile or ec_has_const
when the use case for the index requires ignoring such eclasses.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


poc_eclass_indexing_v1.patch
Description: Binary data


Re: Function to track shmem reinit time

2018-12-23 Thread Alexander Korotkov
On Tue, Apr 10, 2018 at 8:58 PM Robert Haas  wrote:
> On Tue, Apr 10, 2018 at 9:07 AM, David Steele  wrote:
> > On 3/29/18 9:40 AM, Tomas Vondra wrote:
> >> On 03/28/2018 08:55 PM, David Steele wrote:
> >>> I'm setting this entry to Waiting on Author, but based on the discussion
> >>> I think it should be Returned with Feedback.
> >>
> >> Fine with me.
> >
> > This entry has been marked Returned with Feedback.
>
> I guess I'm in the minority here, but I don't see why it isn't useful
> to have something like this. It's a lot easier for a monitoring
> process to poll for this kind of thing than it is to monitor the logs.
> Not that log monitoring isn't a good idea, but this is pretty
> lightweight.

+1,

I think the issue this patch is addressing is that PostgreSQL currently
doesn't have a true uptime function.  We now recommend using
pg_postmaster_start_time() for uptime calculation, but that isn't quite
it.  When a backend crashes, shmem is reinitialized and PostgreSQL is
effectively restarted; the uptime should then be reset to zero, but
pg_postmaster_start_time() doesn't reflect that.

The major argument against this patch I got from the thread is that we
shouldn't introduce new functions exposing information which could
already be fetched from the logs.  However, following this logic we
should remove pg_postmaster_start_time() too.
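
The closest one can get today is measuring from the postmaster start
time, which is exactly the value that does *not* reset on a crash-induced
restart:

-- apparent uptime; a backend crash that reinitializes shared memory
-- does not reset this:
SELECT now() - pg_postmaster_start_time() AS uptime;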

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2018-12-23 Thread John Naylor
On 12/22/18, Tom Lane  wrote:
> John Naylor  writes:
>> Using a single file also gave me another idea: Take value and category
>> out of ScanKeyword, and replace them with an index into another array
>> containing those, which will only be accessed in the event of a hit.
>> That would shrink ScanKeyword to 4 bytes (offset, index), further
>> increasing locality of reference. Might not be worth it, but I can try
>> it after moving on to the core scanner.
>
> I like that idea a *lot*, actually, because it offers the opportunity
> to decouple this mechanism from all assumptions about what the
> auxiliary data for a keyword is.

Okay, in that case I went ahead and did it for WIP v3.

> (it'd be a good idea to have a switch that allows specifying the
> prefix of these constant names).

Done as an optional switch, and tested, but not yet used; the previous
method serves as a fallback for now. I'll probably use it in the final
version to keep lines below 80 characters, and to add 'core_' to the core
keyword vars.

> /* Payload data for keywords */
> typedef struct MyKeyword
> {
>     int16    value;
>     int16    category;
> } MyKeyword;

I tweaked this a bit to

typedef struct ScanKeywordAux
{
int16   value;  /* grammar's token code */
char    category;   /* see codes above */
} ScanKeywordAux;

It seems that category was only 2 bytes to make ScanKeyword a power of
2 (of course that was on 32-bit machines and doesn't hold true
anymore). Using char will save another few hundred bytes in the core
scanner. Since we're only accessing this once per identifier, we may
not need to worry so much about memory alignment.

> Aside from being arguably better from the locality-of-reference
> standpoint, this gets us out of the weird ifdef'ing you've got in
> the v2 patch.  The kwlist_d.h headers can be very ordinary headers.

Yeah, that's a nice (and for me unexpected) bonus.

-John Naylor
 src/common/keywords.c |  55 
 src/include/common/keywords.h |  12 +++
 src/pl/plpgsql/src/.gitignore |   1 +
 src/pl/plpgsql/src/Makefile   |  13 +--
 src/pl/plpgsql/src/pl_scanner.c   | 127 +++
 src/pl/plpgsql/src/pl_unreserved_kwlist.h | 107 +++
 src/tools/gen_keywords.pl | 139 ++
 src/tools/msvc/Solution.pm|  10 +++
 8 files changed, 361 insertions(+), 103 deletions(-)

diff --git a/src/common/keywords.c b/src/common/keywords.c
index 0c0c794c68..b0e5a721b6 100644
--- a/src/common/keywords.c
+++ b/src/common/keywords.c
@@ -112,3 +112,58 @@ ScanKeywordLookup(const char *text,
 
 	return NULL;
 }
+
+/* Like ScanKeywordLookup, but uses offsets into a keyword string. */
+int
+ScanKeywordLookupOffset(const char *string_to_lookup,
+		const char *kw_strings,
+		const uint16 *kw_offsets,
+		int num_keywords)
+{
+	int			len,
+				i;
+	char		word[NAMEDATALEN];
+	const uint16 *low;
+	const uint16 *high;
+
+	len = strlen(string_to_lookup);
+	/* We assume all keywords are shorter than NAMEDATALEN. */
+	if (len >= NAMEDATALEN)
+		return -1;
+
+	/*
+	 * Apply an ASCII-only downcasing.  We must not use tolower() since it may
+	 * produce the wrong translation in some locales (eg, Turkish).
+	 */
+	for (i = 0; i < len; i++)
+	{
+		char		ch = string_to_lookup[i];
+
+		if (ch >= 'A' && ch <= 'Z')
+			ch += 'a' - 'A';
+		word[i] = ch;
+	}
+	word[len] = '\0';
+
+	/*
+	 * Now do a binary search using plain strcmp() comparison.
+	 */
+	low = kw_offsets;
+	high = kw_offsets + (num_keywords - 1);
+	while (low <= high)
+	{
+		const uint16 *middle;
+		int			difference;
+
+		middle = low + (high - low) / 2;
+		difference = strcmp(kw_strings + *middle, word);
+		if (difference == 0)
+			return middle - kw_offsets;
+		else if (difference < 0)
+			low = middle + 1;
+		else
+			high = middle - 1;
+	}
+
+	return -1;
+}
diff --git a/src/include/common/keywords.h b/src/include/common/keywords.h
index 0b31505b66..201d0fcc7a 100644
--- a/src/include/common/keywords.h
+++ b/src/include/common/keywords.h
@@ -28,6 +28,13 @@ typedef struct ScanKeyword
 	int16		category;		/* see codes above */
 } ScanKeyword;
 
+/* Payload data for keywords */
+typedef struct ScanKeywordAux
+{
+	int16		value;			/* grammar's token code */
+	char		category;		/* see codes above */
+} ScanKeywordAux;
+
 #ifndef FRONTEND
 extern PGDLLIMPORT const ScanKeyword ScanKeywords[];
 extern PGDLLIMPORT const int NumScanKeywords;
@@ -41,4 +48,9 @@ extern const ScanKeyword *ScanKeywordLookup(const char *text,
   const ScanKeyword *keywords,
   int num_keywords);
 
+int ScanKeywordLookupOffset(const char *string_to_lookup,
+		const char *kw_strings,
+		const uint16 *kw_offsets,
+		int num_keywords);
+
 #endif			/* KEYWORDS_H */
diff --git a/src/pl/plpgsql/src/.gitignore b/src/pl/plpgsql/src/.gitignore
index ff6ac965fd..a649302fdb 100644
--- 

Re: Joins on TID

2018-12-23 Thread Komяpa
Hi,

Writing as someone who used TID joins and group by's in the past.

One use case is having a chance to peek into what DELETE will do.
A lot of GIS tables don't have any notion of ID, and dirty datasets tend to
have many duplicates that you need to cross-reference with something else. So
you write your query in the form of:

CREATE TABLE ttt as (SELECT distinct on (a.ctid) a.ctid as ct, field1, field2,
b.field3, ... from table a join othertable b on ST_Whatever(a.geom,
b.geom));



DELETE FROM table a USING ttt b where a.ctid = b.ct;
DROP TABLE ttt;

Here:
 - distinct on ctid is used (hash?)
 - a.ctid = b.ct (hash join candidate?)

I know it's all better with proper IDs, but sometimes it works like that,
usually just once per dataset.
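
A related pattern, for the simpler case where rows are outright duplicates
and need no cross-referencing (a sketch using a hypothetical table
pts(x, y), with ctid as the only row identity):

-- keep one physical copy per duplicate group, again joining on ctid:
DELETE FROM pts a
USING pts b
WHERE a.ctid > b.ctid
  AND a.x = b.x AND a.y = b.y;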


Sat, 22 Dec 2018 at 19:31, Tom Lane :

> Simon Riggs  writes:
> > On Sat, 22 Dec 2018 at 04:31, Tom Lane  wrote:
> >> BTW, if we're to start taking joins on TID seriously, we should also
> >> add the missing hash opclass for TID, so that you can do hash joins
> >> when dealing with a lot of rows.
>
> > I don't think we are trying to do TID joins more seriously, just fix a
> > special case.
> > The case cited requires the batches of work to be small, so nested loops
> > works fine.
> > Looks to me that Edmund is trying to solve the same problem. If so, this is
> > the best solution.
>
> No, I think what Edmund is on about is unrelated, except that it touches
> some of the same code.  He's interested in problems like "find the last
> few tuples in this table".  You can solve that today, with e.g.
> "SELECT ... WHERE ctid >= '(n,1)'", but you get a stupidly inefficient
> plan.  If we think that's a use-case worth supporting then it'd be
> reasonable to provide less inefficient implementation(s).
>
> What I'm thinking about in this thread is joins on TID, which we have only
> very weak support for today --- you'll basically always wind up with a
> mergejoin, which requires full-table scan and sort of its inputs.  Still,
> that's better than a naive nestloop, and for years we've been figuring
> that that was good enough.  Several people in the other thread that
> I cited felt that that isn't good enough.  But if we think it's worth
> taking seriously, then IMO we need to add both parameterized scans (for
> nestloop-with-inner-fetch-by-tid) and hash join, because each of those
> can dominate depending on how many tuples you're joining.
>
> regards, tom lane
>
> --
Darafei Praliaskouski
Support me: http://patreon.com/komzpa


Re: [PATCH] Improve tab completion for CREATE TABLE

2018-12-23 Thread Dagfinn Ilmari Mannsåker
Michael Paquier  writes:

> On Sat, Dec 22, 2018 at 01:33:23PM +, Dagfinn Ilmari Mannsåker wrote:
>> The CREATE and ALTER TABLE documentation calls them storage parameters,
>> so I've gone for table_storage_parameters in the attached v2 patch.
>
> Sold.  And committed.

Thanks again!

- ilmari
-- 
"The surreality of the universe tends towards a maximum" -- Skud's Law
"Never formulate a law or axiom that you're not prepared to live with
 the consequences of."  -- Skud's Meta-Law



Re: Speeding up text_position_next with multibyte encodings

2018-12-23 Thread Tomas Vondra



On 12/23/18 1:26 AM, Heikki Linnakangas wrote:
> On 14/12/2018 20:20, John Naylor wrote:
>> I signed up to be a reviewer, and I will be busy next month, so I went
>> ahead and fixed the typo in the patch that broke assert-enabled
>> builds. While at it, I standardized on the spelling "start_ptr" in a
>> few places to match the rest of the file. It's a bit concerning that
>> it wouldn't compile with asserts, but the patch was written by a
>> committer and seems to work.
> 
> Thanks for fixing that!
> 
>> On 10/19/18, Heikki Linnakangas  wrote:
>>> This is a win in most cases. One case is slower: calling position() with
>>> a large haystack input, where the match is near the end of the string.
>>
>> I wanted to test this case in detail, so I ran the attached script, ...
> 
> I'm afraid that script doesn't work as a performance test. The
> position() function is immutable, so the result gets cached in the plan
> cache. All you're measuring is the speed to get the constant from the
> plan cache :-(.
> 
> I rewrote the same tests with a little C helper function, attached, to
> fix that, and to eliminate any PL/pgSQL overhead. And I adjusted the
> loop counts so that the runtimes are reasonable, now that it actually
> does some work.
> 
> UTF-8
> 
> length    master    patch
> short    2.7s    10.9s
> med    2.6s    11.4s
> long    3.9s    9.9s
> 
> EUC_KR
> 
> length    master    patch
> short    2.2s    7.5s
> med    2.2s    3.5s
> long    3.4s    3.6s
> 
> This doesn't look very flattering for the patch. Although, this is
> deliberately testing the worst case.
> 
> You chose interesting characters for the UTF-8 test. The haystack is a
> repeating pattern of byte sequence EC 99 84, and the needle is a
> repeating pattern of EC 84 B1. In the 'long' test, the lengths in the
> skip table are '2', '1' and '250'. But the search bounces between the
> '2' and '1' cases, and never hits the byte that would allow it to skip
> 250 bytes. Interesting case, I had not realized that that can happen.
> But I don't think we need to put much weight on that, you could come up
> with other scenarios where the current code has skip table collisions, too.
> 
> So overall, I think it's still a worthwhile tradeoff, given that that is
> a worst case scenario. If you choose less pathological UTF-8 codepoints,
> or there is no match or the match is closer to the beginning of the
> string, the patch wins.
> 

So, what is the expected speedup in a "good/average" case? Do we have
some reasonable real-world workload mixing these cases that could be
used as a realistic benchmark?
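
Whatever benchmark we settle on will need to dodge the const-folding trap
described above. As a sketch: with constant arguments, the loop below
measures little more than plan-cache reuse, because position() is
immutable and the call is folded to a constant when the expression is
first planned.

DO $$
DECLARE
  n int;
BEGIN
  FOR i IN 1..10000000 LOOP
    n := position('needle' in 'a long haystack that ends with needle');
  END LOOP;
END;
$$;

Feeding the arguments in from a table, or via a C helper as above, forces
the work to actually happen on each call.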

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: CF app feature request

2018-12-23 Thread Magnus Hagander
On Tue, Nov 20, 2018 at 7:19 PM Tom Lane  wrote:

> Magnus Hagander  writes:
> > I'm trying to figure out where this thread left off :) My understanding of
> > the consensus is we don't actually want/need a change in the app, but are
> > instead OK with the admin just handling it in a somewhat ugly way in the few
> > cases where it's necessary?
>
> The original case (just a mistakenly duplicated entry) seems OK to solve
> with a quick DELETE on the underlying table.
>
> > Or is the consensus to add a "Withdrawn" status, just to solve a slightly
> > different problem from the one that started this thread?
>
> I think there is a use-case for "Withdrawn"; it's more polite than
> "Rejected" ;-).  But it's not a very high-priority request.
>

Status "Withdrawn" has now been added.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: CF app feature request

2018-12-23 Thread Magnus Hagander
On Wed, Nov 21, 2018 at 12:52 AM Michael Paquier  wrote:

> On Tue, Nov 20, 2018 at 03:30:38PM -0300, Alvaro Herrera wrote:
> > On 2018-Nov-20, Tom Lane wrote:
> > Certainly not higher than having the dropdown for entry author/reviewer
> > be sorted alphabetically ... *wink* *wink*
>
> More *wink* *wink*
>

Here's a slightly late response to that winking :P

They *are* sorted. By lastname. Should I take all this winking to mean that
people would prefer them being sorted by firstname? :)

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: pgsql: Update ssl test certificates and keys

2018-12-23 Thread Michael Paquier
On Tue, Nov 27, 2018 at 02:21:39PM +, Peter Eisentraut wrote:
> Update ssl test certificates and keys
> 
> Debian testing and newer now require that RSA and DHE keys are at
> least 2048 bit long and no longer allow SHA-1 for signatures in
> certificates.  This is currently causing the ssl tests to fail there
> because the test certificates and keys have been created in violation
> of those conditions.
> 
> Update the parameters to create the test files and create a new set of
> test files.

Peter, would it make sense to back-patch this commit down to where the
SSL tests were introduced?  If /etc/ssl/ is not correctly configured,
this results in failures across branches on Debian when the defaults are
used.
--
Michael

