Re: [HACKERS] 9.6 phrase search distance specification
On Thu, Aug 11, 2016 at 10:42 AM, Ryan Pedela <rped...@datalanche.com> wrote:
> On Thu, Aug 11, 2016 at 9:27 AM, Oleg Bartunov <obartu...@gmail.com> wrote:
>> On Tue, Aug 9, 2016 at 9:59 PM, Ryan Pedela <rped...@datalanche.com> wrote:
>>> I would say that it is worth it to have a "phrase slop" operator
>>> (Apache Lucene terminology). Proximity search is extremely useful for
>>> improving relevance and phrase slop is one of the tools to achieve
>>> that.
>>
>> It'd be great if you explain what is "phrase slop". I assume it's not
>> about search, but about relevance.
>
> Sure. An exact phrase query has slop = 0, which means: find all terms in
> the exact positions relative to each other. A phrase query with slop > 0
> means: find all terms within that many positions of each other. If slop
> = 10, find all terms within 10 positions of each other. Here is a
> concrete example from my current work searching SEC filings.
>
> Bill Gates' full legal name is William H. Gates, III. In the SEC
> database [1], his name is GATES WILLIAM H III. If you are searching the
> records of people within the SEC database and you want to find Bill
> Gates, most users will type "bill gates". Since there are many people
> with the first name Bill (William) and the last name Gates, Bill Gates
> most likely won't be the first result with a standard keyword query.
> Likewise, an exact phrase query (slop = 0) will not find him either,
> because the first and last names are transposed. What you need is a
> phrase query with slop = 2, which will match "William Gates", "William H
> Gates", "Gates William", etc. There is still the issue of Bill vs.
> William, but that can be solved with synonyms and is a different topic.
>
> 1. https://www.sec.gov/cgi-bin/browse-edgar?CIK=902012=exclude=getcompany=Search

One more thing. In that trivial example, an AND query would probably do a
great job too. However, if you are searching for Bill Gates in large text
documents rather than a list of names, an AND query will not give you very
good results, because the words "bill" and "gates" are so common.
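To make that concrete with 9.6 syntax (a sketch: the phrase and AND queries
are real operators, while the "at most N" slop operator is the thing being
proposed and does not exist today):

    -- The stored name, with positions: gate:1 william:2 h:3 iii:4
    SELECT to_tsvector('english', 'gates william h iii')
           @@ phraseto_tsquery('english', 'william gates');
    -- false: the exact phrase fails because the names are transposed
    SELECT to_tsvector('english', 'gates william h iii')
           @@ to_tsquery('english', 'william & gates');
    -- true: AND matches, but it ignores proximity entirely

A slop = 2 phrase query would sit between those two extremes: it would
match the transposed "gates william" while still rejecting documents where
the two words are pages apart.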
Re: [HACKERS] 9.6 phrase search distance specification
On Thu, Aug 11, 2016 at 9:27 AM, Oleg Bartunov <obartu...@gmail.com> wrote:
> On Tue, Aug 9, 2016 at 9:59 PM, Ryan Pedela <rped...@datalanche.com> wrote:
>> I would say that it is worth it to have a "phrase slop" operator
>> (Apache Lucene terminology). Proximity search is extremely useful for
>> improving relevance and phrase slop is one of the tools to achieve
>> that.
>
> It'd be great if you explain what is "phrase slop". I assume it's not
> about search, but about relevance.

Sure. An exact phrase query has slop = 0, which means: find all terms in
the exact positions relative to each other. A phrase query with slop > 0
means: find all terms within that many positions of each other. If slop =
10, find all terms within 10 positions of each other. Here is a concrete
example from my current work searching SEC filings.

Bill Gates' full legal name is William H. Gates, III. In the SEC database
[1], his name is GATES WILLIAM H III. If you are searching the records of
people within the SEC database and you want to find Bill Gates, most users
will type "bill gates". Since there are many people with the first name
Bill (William) and the last name Gates, Bill Gates most likely won't be
the first result with a standard keyword query. Likewise, an exact phrase
query (slop = 0) will not find him either, because the first and last
names are transposed. What you need is a phrase query with slop = 2, which
will match "William Gates", "William H Gates", "Gates William", etc. There
is still the issue of Bill vs. William, but that can be solved with
synonyms and is a different topic.

1. https://www.sec.gov/cgi-bin/browse-edgar?CIK=902012=exclude=getcompany=Search

Thanks,
Ryan
Re: [HACKERS] 9.6 phrase search distance specification
On Tue, Aug 9, 2016 at 12:59 PM, Ryan Pedela <rped...@datalanche.com> wrote:
> Thanks,
> Ryan Pedela
> Datalanche CEO, founder
> www.datalanche.com
>
> On Tue, Aug 9, 2016 at 11:58 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Bruce Momjian <br...@momjian.us> writes:
>>> Does anyone know why the phrase distance "<3>" was changed from "at
>>> most three tokens away" to "exactly three tokens away"?
>>
>> So that it would correctly support phraseto_tsquery's use of the
>> operator to represent omitted words (stopwords) in a phrase.
>>
>> I think there's probably some use in also providing an operator that
>> does "at most this many tokens away", but Oleg/Teodor were evidently
>> less excited, because they didn't take the time to do it.
>>
>> The thread where this change was discussed is
>>
>> https://www.postgresql.org/message-id/flat/c19fcfec308e6ccd952cdde9e648b505%40mail.gmail.com
>>
>> see particularly
>>
>> https://www.postgresql.org/message-id/11252.1465422251%40sss.pgh.pa.us
>
> I would say that it is worth it to have a "phrase slop" operator (Apache
> Lucene terminology). Proximity search is extremely useful for improving
> relevance and phrase slop is one of the tools to achieve that.

Sorry for the position of my signature.

Ryan
Re: [HACKERS] 9.6 phrase search distance specification
Thanks,
Ryan Pedela
Datalanche CEO, founder
www.datalanche.com

On Tue, Aug 9, 2016 at 11:58 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Bruce Momjian <br...@momjian.us> writes:
>> Does anyone know why the phrase distance "<3>" was changed from "at
>> most three tokens away" to "exactly three tokens away"?
>
> So that it would correctly support phraseto_tsquery's use of the
> operator to represent omitted words (stopwords) in a phrase.
>
> I think there's probably some use in also providing an operator that
> does "at most this many tokens away", but Oleg/Teodor were evidently
> less excited, because they didn't take the time to do it.
>
> The thread where this change was discussed is
>
> https://www.postgresql.org/message-id/flat/c19fcfec308e6ccd952cdde9e648b505%40mail.gmail.com
>
> see particularly
>
> https://www.postgresql.org/message-id/11252.1465422251%40sss.pgh.pa.us

I would say that it is worth it to have a "phrase slop" operator (Apache
Lucene terminology). Proximity search is extremely useful for improving
relevance, and phrase slop is one of the tools to achieve that.
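To illustrate the "exactly N" semantics in question (real 9.6 behavior):

    -- positions: a:1  b:2  c:3  d:4
    SELECT to_tsvector('simple', 'a b c d') @@ 'a <3> d'::tsquery;
    -- true: d is exactly 3 tokens after a
    SELECT to_tsvector('simple', 'a b c d') @@ 'a <3> c'::tsquery;
    -- false: c is only 2 tokens after a, and <3> means exactly 3

An "at most N" slop operator would make the second query match as well.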
Re: [HACKERS] Calling json_* functions with JSONB data
On Mon, May 23, 2016 at 11:14 AM, David G. Johnston
<david.g.johns...@gmail.com> wrote:
> On Mon, May 23, 2016 at 12:55 PM, Peter van Hardenberg wrote:
>> Hi there,
>>
>> I noticed it was very easy to accidentally call the json_* form of JSON
>> manipulation functions with jsonb data as input. This is pretty
>> sub-optimal, since it involves rendering the jsonb then reparsing it
>> and calling the json_* form of the function.
>>
>> Fortunately, this seems quite easy to resolve by taking advantage of
>> our ability to add json_*(jsonb) form of the functions.
>>
>> I talked this over with Andrew who had no objections and suggested I
>> float it on the list before writing a patch. Looks pretty
>> straightforward, just a few new data rows in pg_proc.h.
>>
>> Anyone have any concerns or suggestions?
>
> Please provide an example of what you are talking about.
>
> SELECT json_array_length('[1,2]'::jsonb)
> ERROR: function json_array_length(jsonb) does not exist
>
> -- The function name is "jsonb_array_length"; and there is no implicit
> cast between the two.

He is saying that he accidentally calls json_array_length() instead of
jsonb_array_length() and that it is an annoying usability problem. It
happens to me too, and I agree it would be better if you could just call
json_array_length() regardless of whether the type is JSON or JSONB. If
there is some significant functionality difference from the user's
perspective then having separate "json_" and "jsonb_" functions makes
sense, but in your example there is not.
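For what it's worth, a user can already paper over this today with a thin
wrapper, which is essentially what the proposed pg_proc.h entries would do
built in (a sketch, not the actual patch):

    -- Overload json_array_length() to accept jsonb by delegating to the
    -- jsonb variant; PostgreSQL resolves the overload by argument type.
    CREATE FUNCTION json_array_length(jsonb) RETURNS int
        AS 'SELECT jsonb_array_length($1)'
        LANGUAGE sql IMMUTABLE STRICT;

    SELECT json_array_length('[1,2]'::jsonb);  -- now returns 2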
Re: [HACKERS] Html parsing and inline elements
On Wed, Apr 13, 2016 at 9:57 AM, Marcelo Zabani <mzab...@gmail.com> wrote:
> Hi, Tom,
>
> You're right, I don't think one can argue that the default parser should
> know HTML.
> How about your suggestion of there being an HTML parser, is it feasible?
> I ask this because I think that a lot of people store HTML documents
> these days, and although there probably aren't lots of HTML with words
> written along multiple inline elements, it would certainly be nice to
> have a proper parser for these use cases.
>
> What do you think?

I recommend using Apache Tika [1] for plain text extraction from HTML.
There are so many weird edge cases when parsing HTML that it is easier to
use something that is already mature than to reinvent the wheel.

1. https://tika.apache.org/

Thanks,
Ryan Pedela
Re: [HACKERS] Add jsonb_compact(...) for whitespace-free jsonb to text
On Tue, Apr 26, 2016 at 10:49 AM, Stephen Frost <sfr...@snowman.net> wrote:
> * Ryan Pedela (rped...@datalanche.com) wrote:
>> On Sun, Apr 24, 2016 at 4:02 PM, Sehrope Sarkuni <sehr...@jackdb.com> wrote:
>>> The default text representation of jsonb adds whitespace in between
>>> key/value pairs (after the colon ":") and after successive properties
>>> (after the comma ","):
>> [...]
>>> It'd be nice to have a stable text representation of a jsonb value
>>> with minimal whitespace. The latter would also save a few bytes per
>>> record in text output formats, on the wire, and in backups (ex: COPY
>>> ... TO STDOUT).
>>
>> +1
>>
>> I cannot comment on the patch itself, but I welcome jsonb_compact() or
>> some way to get JSON with no inserted whitespace.
>
> As I mentioned to Sehrope on IRC, at least for my 2c, if you want a
> compact JSON format to reduce the amount of traffic over the wire or to
> do things with on the client side, we should probably come up with a
> binary format, rather than just hack out the whitespace. It's not like
> representing numbers using ASCII characters is terribly efficient
> either.

Why build a Ferrari when a skateboard would suffice? Besides, that doesn't
help one of the most common cases for JSONB: REST APIs.

Now that PG fully supports JSON, a user can use PG to construct the JSON
payload for a REST API, with the web server acting as a simple
pass-through for the payload. I personally have this use case; it is not
hypothetical. Currently, however, a user must parse the JSON string from
PG and re-stringify it to minimize the whitespace.

Given that HTTP is text-based, removing all extraneous whitespace is the
best way to compress it, and on top of that you can do gzip compression.
Unless you are suggesting that the binary format is just a gzipped version
of the minimized text format, I don't see how a binary format helps at all
in the REST API case.

In addition, every JSON implementation I have ever seen fully minifies
JSON by default. PG appears to deviate from standard practice for no
apparent reason.
Re: [HACKERS] Add jsonb_compact(...) for whitespace-free jsonb to text
On Sun, Apr 24, 2016 at 4:02 PM, Sehrope Sarkuni <sehr...@jackdb.com> wrote:
> Hi,
>
> The default text representation of jsonb adds whitespace in between
> key/value pairs (after the colon ":") and after successive properties
> (after the comma ","):
>
> postgres=# SELECT '{"b":2,"a":1}'::jsonb::text;
>        text
> ------------------
>  {"a": 1, "b": 2}
> (1 row)
>
> AFAIK, there's also no guarantee on the specific order of the resulting
> properties in the text representation either. I would suppose it's fixed
> for a given jsonb value within a database major version, but across
> major versions it could change (if the underlying representation
> changes).
>
> I originally ran into this when comparing the hashes of the text
> representation of jsonb columns. The results didn't match up because the
> JavaScript function JSON.stringify(...) does not add any extra
> whitespace.
>
> It'd be nice to have a stable text representation of a jsonb value with
> minimal whitespace. The latter would also save a few bytes per record in
> text output formats, on the wire, and in backups (ex: COPY ... TO
> STDOUT).

+1

I cannot comment on the patch itself, but I welcome jsonb_compact() or
some way to get JSON with no inserted whitespace.
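For clarity, this is what I would expect from the proposed function
(hypothetical call and output; jsonb_compact() is the patch under
discussion, not something that exists today):

    SELECT jsonb_compact('{"b":2,"a":1}'::jsonb);
    -- desired: {"a":1,"b":2}
    -- (same key ordering as jsonb's normal text output, but with no
    -- whitespace inserted after ":" or ",")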
Re: [HACKERS] No Issue Tracker - Say it Ain't So!
On Mon, Sep 28, 2015 at 4:09 PM, Jim Nasby wrote:
> On 9/28/15 11:43 AM, Andres Freund wrote:
>> On 2015-09-28 09:41:18 -0700, David Fetter wrote:
>>> Since you're convinced that this is an unqualified win, please put
>>> together a project plan for switching from our current system to
>>> Github.
>>
>> Err, no. That's a waste of all our time.
>>
>> It has been stated pretty clearly in this thread by a number of senior
>> community people that we're not going to use a closed source system.
>
> GitLab OTOH is released under a MIT license, so it is an option. I don't
> know how it compares to other suggested options, but if YUriy wants to
> propose it it's at least a viable option.

I haven't used GitLab extensively, but it has a feature set similar to
GitHub and then some [1]. The OSS project does seem active [2], but it is
still relatively new.

1. https://about.gitlab.com/better-than-github/
2. https://gitlab.com/gitlab-org/gitlab-ce
Re: [HACKERS] No Issue Tracker - Say it Ain't So!
Kam Lasater wrote:
> I'd suggest: Github Issues, Pivotal Tracker or Redmine (probably in
> that order). There are tens to hundreds of other great ones out there,
> I'm sure one of them would also work.

Why not just use GitHub Issues?

1. You can set it up to send emails to the list when an issue is created
   or updated.
2. Replies to the email will automatically update the issue on GitHub.
3. GitHub is where most of the OSS activity happens now.
4. If GitHub goes away, so what? The core workflow never changes.

Thanks,
Ryan Pedela
Re: [HACKERS] [PATCH] Generalized JSON output functions
On Wed, Jul 15, 2015 at 11:10 AM, Ryan Pedela <rped...@datalanche.com> wrote:
> On Wed, Jul 15, 2015 at 8:52 AM, Robert Haas <robertmh...@gmail.com> wrote:
>> FWIW, I don't agree. If it's not easy to read the JSON that PostgreSQL
>> generates using JavaScript, then a lot of people are just going to give
>> up on doing it, and IMO that would be sad. Telling people that they
>> have to parse the JSON using some parser other than the one built into
>> their JavaScript engine, whack it around, and then render it as text
>> and parse it again is not really an acceptable answer.
>
> The vast majority of Javascript users are going to be using Node.js when
> they connect to Postgres, if only for security reasons. If they use
> Node, they will be using node-postgres [1] or something that builds on
> top of it. For int64 and numerics in a row, the default is to return a
> string, and there is a flag you can set to round returned numbers if you
> prefer. There is also a way to override the default parsing of each
> Postgres type [2]. So in the case of JSON using my json-bignum module
> [3], the code looks like this:
>
> var pgTypes = require('pg').types;
> var bignumJSON = require('json-bignum');
>
> // JSON_TYPE_OID is 114 and JSONB_TYPE_OID is 3802 in pg_type
> pgTypes.setTypeParser(JSON_TYPE_OID, function (value) {
>     return bignumJSON.parse(value);
> });
> pgTypes.setTypeParser(JSONB_TYPE_OID, function (value) {
>     return bignumJSON.parse(value);
> });
>
> To me that code is super simple, and not a pain in the ass. In other
> words, it is not "telling people that they have to parse the JSON using
> some parser other than the one built into their JavaScript engine, whack
> it around, and then render it as text and parse it again."
>
> Like I said previously, the situation with Javascript will hopefully be
> remedied in a few years with ES7 anyway.
>
> 1. https://github.com/brianc/node-postgres
> 2. https://github.com/brianc/node-pg-types
> 3. https://github.com/datalanche/json-bignum

On Wed, Jul 15, 2015 at 8:52 AM, Robert Haas <robertmh...@gmail.com> wrote:
> The reason why the logical decoding stuff allows multiple output formats
> is because Andres, quite correctly, foresaw that different people would
> need different output formats. He could have designed that system to
> output only one output format and just said, everybody's got to read and
> parse this, but that would have been slow. Instead, he tried to set
> things up so that you could get the output in the format that was most
> convenient for your client, whatever that is. On this thread, we're
> back-pedaling from that idea: sorry, you can get JSON output, but if you
> want JSON output that will be properly interpreted by your JSON parser,
> you can't have that. Regardless of the details of this particular patch,
> I can't endorse that approach. If we want people to use our software, we
> need to meet them where they are at, especially when we are only (IIUC)
> talking about inserting a few extra quotation marks.

I would be okay with a generic way to specify output formats if there are
many use cases beyond Javascript and JSON. I vaguely remember someone
suggesting a FORMAT clause on CREATE TABLE which would specify how a
particular column would output from a SELECT. For example, returning a
date with a non-ISO format. I liked that idea.

However, if the only reason for different output formats is Javascript,
that is silly. I have a very long list of feature requests that would
probably only be beneficial to me or a handful of users. Should we
implement them? No, of course not! If we did that, Postgres would cease to
be the best open-source database. You can't have the best product and say
yes to everything. Feature creep is the enemy of quality.

If Javascript is the sole reason for supporting multiple output formats,
then that is the definition of feature creep in my opinion. If there are
many use cases beyond Javascript and JSON, then that is different and a
conversation worth having.

Bottom line: large numbers are a pain to deal with in Javascript
regardless of where they come from or what format they are in. Adding
code to Postgres core will never change that.
Re: [HACKERS] [PATCH] Generalized JSON output functions
On Wed, Jul 15, 2015 at 8:52 AM, Robert Haas <robertmh...@gmail.com> wrote:
> FWIW, I don't agree. If it's not easy to read the JSON that PostgreSQL
> generates using JavaScript, then a lot of people are just going to give
> up on doing it, and IMO that would be sad. Telling people that they have
> to parse the JSON using some parser other than the one built into their
> JavaScript engine, whack it around, and then render it as text and parse
> it again is not really an acceptable answer.

The vast majority of Javascript users are going to be using Node.js when
they connect to Postgres, if only for security reasons. If they use Node,
they will be using node-postgres [1] or something that builds on top of
it. For int64 and numerics in a row, the default is to return a string,
and there is a flag you can set to round returned numbers if you prefer.
There is also a way to override the default parsing of each Postgres type
[2]. So in the case of JSON using my json-bignum module [3], the code
looks like this:

var pgTypes = require('pg').types;
var bignumJSON = require('json-bignum');

// JSON_TYPE_OID is 114 and JSONB_TYPE_OID is 3802 in pg_type
pgTypes.setTypeParser(JSON_TYPE_OID, function (value) {
    return bignumJSON.parse(value);
});
pgTypes.setTypeParser(JSONB_TYPE_OID, function (value) {
    return bignumJSON.parse(value);
});

To me that code is super simple, and not a pain in the ass. In other
words, it is not "telling people that they have to parse the JSON using
some parser other than the one built into their JavaScript engine, whack
it around, and then render it as text and parse it again."

Like I said previously, the situation with Javascript will hopefully be
remedied in a few years with ES7 anyway.

1. https://github.com/brianc/node-postgres
2. https://github.com/brianc/node-pg-types
3. https://github.com/datalanche/json-bignum

On Wed, Jul 15, 2015 at 8:52 AM, Robert Haas <robertmh...@gmail.com> wrote:
> The reason why the logical decoding stuff allows multiple output formats
> is because Andres, quite correctly, foresaw that different people would
> need different output formats. He could have designed that system to
> output only one output format and just said, everybody's got to read and
> parse this, but that would have been slow. Instead, he tried to set
> things up so that you could get the output in the format that was most
> convenient for your client, whatever that is. On this thread, we're
> back-pedaling from that idea: sorry, you can get JSON output, but if you
> want JSON output that will be properly interpreted by your JSON parser,
> you can't have that. Regardless of the details of this particular patch,
> I can't endorse that approach. If we want people to use our software, we
> need to meet them where they are at, especially when we are only (IIUC)
> talking about inserting a few extra quotation marks.

I would be okay with a generic way to specify output formats if there are
many use cases beyond Javascript and JSON. I vaguely remember someone
suggesting a FORMAT clause on CREATE TABLE which would specify how a
particular column would output from a SELECT. For example, returning a
date with a non-ISO format. I liked that idea.

However, if the only reason for different output formats is Javascript,
that is silly. I have a very long list of feature requests that would
probably only be beneficial to me or a handful of users. Should we
implement them? No, of course not! If we did that, Postgres would cease to
be the best open-source database. You can't have the best product and say
yes to everything. Feature creep is the enemy of quality.

If Javascript is the sole reason for supporting multiple output formats,
then that is the definition of feature creep in my opinion. If there are
many use cases beyond Javascript and JSON, then that is different and a
conversation worth having.
Re: [HACKERS] [PATCH] Generalized JSON output functions
On Mon, Jul 13, 2015 at 1:30 AM, Shulgin, Oleksandr
<oleksandr.shul...@zalando.de> wrote:
> To reiterate: for my problem, that is escaping numerics that can
> potentially overflow[1] under ECMAScript standard, I want to be able to
> override the code that outputs the numeric converted to string. There is
> no way in current implementation to do that *at all*, short of copying
> all the code involved in producing JSON output and changing it at
> certain points. One could try re-parsing JSON instead, but that doesn't
> actually solve the issue, because type information is lost forever at
> that point.

I had the exact same problem with Node.js and client-side Javascript.
That is why I wrote json-bignum [1] for Node.js. There is a bower version
[2] as well. The only caveat is that it is slower than the native JSON
functions, but I am happy to receive PRs to improve performance.

1. https://github.com/datalanche/json-bignum
2. https://libraries.io/bower/json-bignum

As far as large numbers in JSON, I think Postgres is doing the right
thing and should not be changed. It is Javascript that is stupid here
(concrete example below), and I don't think it is wise to add something
to core just because one client does stupid things with large numbers. In
addition, ES7 is introducing value types which will hopefully solve the
large number problem in Javascript.

The random whitespace issue is valid in my opinion and should be fixed.

Thanks,
Ryan Pedela
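To show the overflow concretely (a sketch: the value below exceeds
Javascript's 2^53 - 1 safe-integer limit, so a stock JSON.parse() silently
rounds it):

    -- Postgres emits the number at full precision...
    SELECT json_build_object('id', 9007199254740993::bigint)::text;
    --  {"id" : 9007199254740993}
    -- ...but ECMAScript's JSON.parse() of that string yields
    -- 9007199254740992, because every JS number is an IEEE 754 double.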
Re: [HACKERS] [PATCH] Generalized JSON output functions
On Fri, May 22, 2015 at 10:51 AM, Merlin Moncure <mmonc...@gmail.com> wrote:
> On Fri, May 22, 2015 at 9:43 AM, Alvaro Herrera
> <alvhe...@2ndquadrant.com> wrote:
>> Andrew Dunstan wrote:
>>> On 05/20/2015 09:16 AM, Shulgin, Oleksandr wrote:
>>>> Attached is a patch against master to generalize the JSON-producing
>>>> functions in utils/adt/json.c and to provide a set of callbacks which
>>>> can be overridden the same way that is already provided for *parsing*
>>>> JSON.
>>>
>>> I'm not necessarily opposed to this, but it sure seems like a lot of
>>> changes, and moderately invasive ones, to support something that could
>>> be done, at the cost of reparsing, with a simple loadable extension
>>> that I could create in a few hours of programming.
>>
>> But this seems like a pretty reasonable change to make, no? Doesn't the
>> total amount of code decrease after this patch? JSON stuff is pretty
>> new so some refactoring and generalization of what we have is to be
>> expected.
>
> Yeah. Also, there have been a few previous gripes about this, for
> example,
> http://www.postgresql.org/message-id/cahbvmpzs+svr+y-ugxjrq+xw4dqtevl-cozc69zffwmxjck...@mail.gmail.com
> . As noted, I definitely prefer 'space free' by default for efficiency
> reasons, but standardizing the output has definitely got to be a
> reasonable goal.

Every JSON implementation I have ever used defaults to the minified
version of JSON (no whitespace) when printed.
Re: [HACKERS] jsonb concatenate operator's semantics seem questionable
On Wed, May 20, 2015 at 12:34 PM, Andrew Dunstan <and...@dunslane.net> wrote:
> So Dmitry, at my suggestion, has come up with a way of doing that, by
> adding a parameter to jsonb_replace(). If this parameter is set to true
> (it defaults to false) and the key or array element pointed to by the
> last element of the path doesn't exist, it gets created.

+1
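For anyone following along, this create-if-missing behavior is what
eventually shipped in 9.5 as jsonb_set(); shown here only to illustrate
the semantics being discussed under the name jsonb_replace():

    -- Fourth argument is create_missing; the path {a,c} does not exist
    -- yet, so with true it gets created instead of being a no-op.
    SELECT jsonb_set('{"a": {"b": 1}}'::jsonb, '{a,c}', '2', true);
    --  {"a": {"b": 1, "c": 2}}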
Re: [HACKERS] jsonb concatenate operator's semantics seem questionable
On Sun, May 17, 2015 at 9:41 PM, Josh Berkus <j...@agliodbs.com> wrote:
> Is there a particular reason why + makes more sense as shallow
> concatination and || makes more sense as deep concatination? Like,
> something in JS or other client languages which would make that
> preference make more sense to users?

As someone who uses JSON day-to-day in Javascript and Python, I
personally don't think || or + matters much. Python uses json.loads() for
JSON concat, and you have to use a 3rd-party library in Javascript if you
want that functionality, such as jQuery.extend(). I agree with Peter that
we need deep concatenation, but I don't think there is any standard for
the operator. I think the word "shallow" should be added to the docs
though.

What is far more important than shallow or deep concatenation for the
document database use case is being able to delete or replace/update a
specific, nested path in the JSON object. It looks like that is possible
with the minus operator and jsonb_replace(). This is great; however, it
took me awhile to figure out the path syntax. I think adding a paragraph
to the docs explaining the path syntax would help (example below).

Ryan Pedela
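For the archives, here is the path syntax that eventually clicked for me
(a path is a text[] of object keys and array indexes, walked in order;
the get and delete operators shown are the real ones):

    -- Get: #> walks {key, key, array index}
    SELECT '{"a": {"b": [10, 20, 30]}}'::jsonb #> '{a,b,2}';
    --  30
    -- Delete: #- removes whatever the path points at
    SELECT '{"a": {"b": [10, 20, 30]}}'::jsonb #- '{a,b,0}';
    --  {"a": {"b": [20, 30]}}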
Re: [HACKERS] jsonb concatenate operator's semantics seem questionable
On Mon, May 18, 2015 at 8:41 AM, Ryan Pedela <rped...@datalanche.com> wrote:
> On Sun, May 17, 2015 at 9:41 PM, Josh Berkus <j...@agliodbs.com> wrote:
>> Is there a particular reason why + makes more sense as shallow
>> concatination and || makes more sense as deep concatination? Like,
>> something in JS or other client languages which would make that
>> preference make more sense to users?
>
> As someone who uses JSON day-to-day in Javascript and Python, I
> personally don't think || or + matters much. Python uses json.loads()
> for JSON concat, and you have to use a 3rd-party library in Javascript
> if you want that functionality, such as jQuery.extend(). I agree with
> Peter that we need deep concatenation, but I don't think there is any
> standard for the operator. I think the word "shallow" should be added to
> the docs though.
>
> What is far more important than shallow or deep concatenation for the
> document database use case is being able to delete or replace/update a
> specific, nested path in the JSON object. It looks like that is possible
> with the minus operator and jsonb_replace(). This is great; however, it
> took me awhile to figure out the path syntax. I think adding a paragraph
> to the docs explaining the path syntax would help.

Having looked at this more, I think I understand the problem Peter has
identified, and it is a significant usability problem in my opinion. I
think the word "concatenate" has confused me, because I think of it as a
higher-level operation where I want to merge two large JSON objects,
which isn't a very common operation, at least for me.

What is absolutely required for the document database use case is the
following:

1. Get element at any arbitrary path. ( #> operator )
2. Delete any arbitrary path. ( minus operator )
3. Replace/update element at any arbitrary path. ( jsonb_replace )
4. Add element to any arbitrary path. ( ? )

It is #4 that does not seem to exist, unless jsonb_replace() creates the
specified path when it does not exist. Does it? I am not currently at my
desk to test it myself.

If not, deep concatenation would solve this problem, but I can also see
another solution. Use + for shallow concatenation since it really means
"add element to top-level path", as Peter suggests. Then add another
function:

jsonb_add( target jsonb, path text[], new jsonb )

to add an element at any arbitrary path. Then leave || for deep
concatenation in 9.6 or whenever.

If jsonb_replace() satisfies #4 then I think everything is fine. Without
#4, however, jsonb would remain an incomplete document database solution
in my opinion.

Thanks,
Ryan Pedela
Re: [HACKERS] jsonb concatenate operator's semantics seem questionable
On Mon, May 18, 2015 at 12:24 PM, Josh Berkus <j...@agliodbs.com> wrote:
> On 05/18/2015 08:57 AM, Ryan Pedela wrote:
>> If not, deep concatenation would solve this problem, but I can also see
>> another solution. Use + for shallow concatenation since it really means
>> add element to top-level path as Peter suggests. Then add another
>> function: jsonb_add( target jsonb, path text[], new jsonb ) to add
>> element at any arbitrary path. Then leave || for deep concatenation in
>> 9.6 or whenever.
>
> Since swapping the operator seems still on the table, is there any
> particular reason why you think + is more suited to shallow
> concatination? Both you and Peter have said this, but as a heavy user of
> JSON/JSONB, to me it seems the other way around. That is, + says "add to
> arbitrary nested node" to me more than || does.

Let me back up a little. I always like to think about what the ideal
interface is first and then worry about implementation, because
implementation can always be changed but interface can't.

I think the current concat/merge interface is the ideal. It should be ||,
because that means concat/merge everywhere else in the PG interface that
I am aware of. In the case of JSON, which is a hierarchical data
structure, it should be implemented as a deep merge, which by definition
satisfies a shallow merge. This is what I would expect as a user, and I
would think there was a bug if it didn't perform a deep merge. I expect
this because I can implement shallow merge easily myself using
Javascript, Python, etc., but deep merge is non-trivial. Therefore I
would expect a special JSON concat/merge library function to do deep
merge. I would rather the interface stay the same and have it documented
that the current implementation is a shallow merge and may become a deep
merge in the future.

In the context of splitting shallow and deep merge into two operators, I
think + is better for shallow and || better for deep. The reason for + is
that many programming languages have this behavior. If I see the below
code in a language I have never used before:

objC = objA + objB

my default assumption is that + performs a shallow merge. Like I said, I
would rather there just be one operator.

>> If jsonb_replace() satisfies #4 then I think everything is fine.
>> Without #4 however, jsonb would remain an incomplete document database
>> solution in my opinion.
>
> Oh, no question, we're still incomplete. Aside from nested append, we
> kinda lack easy sharded scale-out, which is a rather more major feature,
> no?

I think it depends on the point of view which is more important. If you
have a massive dataset, then obviously sharding is more important. But my
own take on why NoSQL became so popular has only a little to do with
sharding. MongoDB pitched to tech entrepreneurs: use our database and
implement your MVP 10x faster/easier, and we have sharding for when you
become the next Google. And it worked brilliantly. Many tech
entrepreneurs are worried about time constraints and dream of becoming
the next Google (myself included). But the reality is that most fail, and
the majority who don't fail achieve moderate success; only a handful
reach Google-level success. Therefore the vast majority end up never
needing sharding, but they all experience that advertised 10x development
speed improvement. I doubt it really is 10x, but JSON maps very well to
programming language data structures (no impedance mismatch), so it is
usually faster to build prototypes with MongoDB.

If jsonb supported nested append, then I think that would be enough for
people who care most about development speed, which I think is a larger
group than the group with massive datasets. In addition, sharding seems
like a server-level or database-level issue rather than a data type
issue.
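To make the shallow-versus-deep distinction concrete (the || result shown
is the shallow behavior under discussion; the deep result is what I would
expect, not what ships):

    -- Shallow: the top-level key "a" is replaced wholesale.
    SELECT '{"a": {"x": 1}}'::jsonb || '{"a": {"y": 2}}'::jsonb;
    --  {"a": {"y": 2}}
    -- A deep merge would instead recurse into "a" and combine:
    --  {"a": {"x": 1, "y": 2}}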
Re: [HACKERS] deparsing utility commands
On Wed, Mar 25, 2015 at 11:59 AM, Alvaro Herrera
<alvhe...@2ndquadrant.com> wrote:
> * Should we prohibit DDL from within event triggers?

Please don't prohibit DDL unless there is a really, really good reason to
do so. I have several use cases in mind for event triggers, but they are
only useful if I can perform DDL.

For example, I want to create an Elasticsearch FDW and use that to index
and search Postgres tables. When a table is created, I am planning to use
an event trigger to capture the CREATE event and automatically create a
foreign table via the Elasticsearch FDW. In addition, I would add normal
triggers to the new table which capture DML and update the foreign table
accordingly. In other words, I want to use FDWs and event triggers to
automatically sync table DDL and DML to Elasticsearch.
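A minimal sketch of that use case, using pg_event_trigger_ddl_commands()
from the deparsing work under discussion (assumptions: a foreign server
named es_server already exists, and a real implementation would derive
the column list from the new table instead of the single placeholder
column used here):

    CREATE FUNCTION es_sync() RETURNS event_trigger AS $$
    DECLARE
        obj record;
    BEGIN
        FOR obj IN SELECT * FROM pg_event_trigger_ddl_commands()
                   WHERE command_tag = 'CREATE TABLE'
        LOOP
            -- The DDL-from-within-an-event-trigger part:
            EXECUTE format(
                'CREATE FOREIGN TABLE %I (doc json) SERVER es_server',
                obj.objid::regclass || '_es');
        END LOOP;
    END;
    $$ LANGUAGE plpgsql;

    CREATE EVENT TRIGGER es_sync_trg ON ddl_command_end
        WHEN TAG IN ('CREATE TABLE')
        EXECUTE PROCEDURE es_sync();

Note that the nested CREATE FOREIGN TABLE does not re-fire the trigger,
since its command tag is not in the WHEN list.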
Re: [HACKERS] deparsing utility commands
On Tue, Feb 24, 2015 at 6:48 AM, Alvaro Herrera
<alvhe...@2ndquadrant.com> wrote:
> Stephen Frost wrote:
>
> I'm thinking something like
>
>     SELECT * FROM pg_creation_commands({'pg_class'::regclass,
>                                         'sometable'::pg_class});
>
> would return a set of commands in the JSON-blob format that creates the
> table. The input argument is an array of object addresses, so that you
> can request creation commands for multiple objects. (It's not my
> intention to provide this functionality right away, but if somebody else
> wants to work on top of the current deparse patch, be my guest; if it
> proves simple enough we can still get it into 9.5 as part of this
> patch.)

+1

Another possible function could be to diff two relations to produce a set
of DDL commands that could be used for schema migrations (sketch below).

Also, thank you very much! CREATE/ALTER support in event triggers is the
upcoming feature I am most excited about, and I am happy to see progress.

Thanks,
Ryan Pedela
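To illustrate the diff idea (everything here is hypothetical; the
function name, signature, and output are made up for this sketch):

    -- Compare a deployed table against the desired definition and emit
    -- the DDL needed to migrate one into the other.
    SELECT * FROM pg_diff_commands('public.users_v1'::regclass,
                                   'public.users_v2'::regclass);
    --  ALTER TABLE public.users_v1 ADD COLUMN email text;
    --  ALTER TABLE public.users_v1 ALTER COLUMN id TYPE bigint;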
Re: [HACKERS] PL/pgSQL 2
If PL/Javascript is a serious consideration, how will int64 and numeric
be handled?

Thanks,
Ryan Pedela
Datalanche CEO, co-founder
www.datalanche.com
rped...@datalanche.com
513-571-6837

On Tue, Sep 2, 2014 at 6:38 AM, Andrew Dunstan <and...@dunslane.net> wrote:
> On 09/02/2014 05:44 AM, Álvaro Hernández Tortosa wrote:
>> On 02/09/14 11:34, Mark Kirkwood wrote:
>>> On 02/09/14 21:25, Álvaro Hernández Tortosa wrote:
>>>> On 02/09/14 05:24, Craig Ringer wrote:
>>>>> I couldn't disagree more. If we were to implement anything, it'd be
>>>>> PL/PSM (http://en.wikipedia.org/wiki/SQL/PSM). I'm sure it's as
>>>>> bizarre and quirky as anything else the SQL committee has brought
>>>>> forth, but it's at least a standard(ish) language.
>>>>
>>>> So we'd choose a bizarre and quirky language instead of anything
>>>> better just because it's standard. I'm sure current and prospective
>>>> users will surely prefer a bizarre and quirky language that is
>>>> standard approved, rather than a modern, comfortable, easy-to-use one
>>>> that is not embodied by the ISO. No doubt ^_^
>>>
>>> Well there is the risk that by randomly adding new syntax to PL/pgSQL
>>> we turn it into a bizarre and quirky *non standard* language. Part of
>>> the attraction of PL/pgsql is that it is Ada like - if we break that
>>> too much then...well...that would be bad. So I think a careful balance
>>> is needed, to add new features that keep the spirit of the original
>>> language.
>>
>> I agree. I think I haven't suggested adding new syntax to pl/pgsql. But
>> having its syntax similar to Ada is IMHO not something good. I'm sure
>> few prospective postgres users would be compelled by that. They are
>> compelled by JavaScript, Python, Scala or Ruby, to name a few, but
>> definitely not Ada.
>
> Just as a small nit pick - the name of the language is not ADA, but Ada.
> It isn't an acronym. The language is named after Ada Lovelace, arguably
> the world's first programmer. If you're not familiar with modern Ada,
> let me recommend the newly published "Programming in Ada 2012" by John
> Barnes. But I digress.
>
> JavaScript would actually be quite a good alternative. However, using it
> involves something others have objected to, namely calling SQL via a
> function call. It's true that plpgsql lets you call SQL commands without
> explicitly invoking SPI. OTOH, it actually relies on SPI under the hood
> a lot more than other PLs, which I have little doubt is responsible for
> timings like this:
>
> andrew=# do $$
> declare
>     x int = 1;
>     i int = 1;
> begin
>     while i < 1000 loop
>         i := i + 1;
>         x := x + 46;
>     end loop;
>     raise notice 'x = %', x;
> end;
> $$;
> NOTICE:  x = 45955
> DO
> Time: 13222.195 ms
>
> andrew=# do $$
> var x = 1;
> var i = 1;
> while (i < 1000) {
>     i += 1;
>     x += 46;
> }
> plv8.elog(NOTICE, 'x = ' + x);
> $$ language plv8;
> NOTICE:  x = 45955
> DO
> Time: 27.976 ms
>
> But I'm not suggesting we should implement a Javascript PL in core
> either.
>
> Finally, +1 to Tom's suggestion upthread that we implement different
> behaviours via pragmas rather than some new offshoot language. Maybe a
> GUC could specify a default set of such pragmas, so you wouldn't need to
> decorate every function with them.
>
> cheers
>
> andrew
[HACKERS] JSON Patch (RFC 6902) support?
This is my first email to the PostgreSQL mailing lists, so I hope this is
the correct place. If not, please let me know.

I was wondering if it would be possible and wise to support JSON Patch?

https://tools.ietf.org/html/rfc6902

One of the problems I have as a user is how to update a portion of a JSON
object efficiently. Right now I have to read the entire field from the
database, update it, and then write it back. I am thinking JSON Patch
might be a good way to solve this problem because it would allow partial
updates, and I think it could easily fit into the existing set of JSON
functions, such as:

// applies a JSON Patch
json_patch_apply(json, patch)

// diffs two JSON objects and produces a JSON Patch
json_patch_diff(json a, json b)

Thanks,
Ryan Pedela
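For illustration, here is how the proposed json_patch_apply() might look
in use (hypothetical function; the patch document format itself is the
real RFC 6902 one, a JSON array of add/remove/replace/move/copy/test
operations):

    SELECT json_patch_apply(
        '{"name": "Ryan", "tags": ["a", "b"]}'::json,
        '[{"op": "replace", "path": "/name", "value": "Ryan P"},
          {"op": "add", "path": "/tags/-", "value": "c"}]'::json);
    -- desired result: {"name": "Ryan P", "tags": ["a", "b", "c"]}
    -- ("/tags/-" is RFC 6901 pointer syntax for "append to the array")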