Re: [HACKERS] 9.6 phrase search distance specification

2016-08-11 Thread Ryan Pedela
On Thu, Aug 11, 2016 at 10:42 AM, Ryan Pedela <rped...@datalanche.com>
wrote:

> On Thu, Aug 11, 2016 at 9:27 AM, Oleg Bartunov <obartu...@gmail.com>
> wrote:
>
>> On Tue, Aug 9, 2016 at 9:59 PM, Ryan Pedela <rped...@datalanche.com>
>> wrote:
>> >
>> >
>>
>> >  I would say that it is worth it to have a "phrase slop" operator
>> (Apache
>> > Lucene terminology). Proximity search is extremely useful for improving
>> > relevance and phrase slop is one of the tools to achieve that.
>> >
>>
>> It'd be great if you explain what is "phrase slop". I assume it's not
>> about search, but about relevance.
>>
>
> Sure. An exact phrase query has slop = 0 which means find all terms in the
> exact positions relative to each other. A phrase query with slop > 0 means
> find all terms within <slop> positions relative to each other. If slop =
> 10, find all terms within 10 positions of each other. Here is a concrete
> example from my current work searching SEC filings.
>
> Bill Gates' full legal name is William H. Gates, III. In the SEC database
> [1], his name is GATES WILLIAM H III. If you are searching the records of
> people within the SEC database and you want to find Bill Gates, most users
> will type "bill gates". Since there are many people with the first name
> Bill (William) and the last name Gates, Bill Gates most likely won't be the
> first result with a standard keyword query. Likewise an exact phrase query
> (slop = 0) will not find him either because the first and last names are
> transposed. What you need is a phrase query with a slop = 2 which will
> match "William Gates", "William H Gates", "Gates William", etc. There is
> still the issue of Bill vs William, but that can be solved with synonyms
> and is a different topic.
>
> 1. https://www.sec.gov/cgi-bin/browse-edgar?CIK=902012
> =exclude=getcompany=Search
>


One more thing. In that trivial example, an AND query would probably do a
great job too. However if you are searching for Bill Gates in large text
documents rather than a list of names, an AND query will not give you very
good results because the words "bill" and "gates" are so common.
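
A minimal sketch of the difference using 9.6 full-text syntax (the sample
sentence is only illustrative):

SELECT to_tsvector('english', 'Gates said the bill would pass')
       @@ to_tsquery('english', 'bill & gates');
-- true: both words occur somewhere in the document

SELECT to_tsvector('english', 'Gates said the bill would pass')
       @@ phraseto_tsquery('english', 'bill gates');
-- false: the words are present but not adjacent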


Re: [HACKERS] 9.6 phrase search distance specification

2016-08-11 Thread Ryan Pedela
On Thu, Aug 11, 2016 at 9:27 AM, Oleg Bartunov <obartu...@gmail.com> wrote:

> On Tue, Aug 9, 2016 at 9:59 PM, Ryan Pedela <rped...@datalanche.com>
> wrote:
> >
> >
>
> >  I would say that it is worth it to have a "phrase slop" operator (Apache
> > Lucene terminology). Proximity search is extremely useful for improving
> > relevance and phrase slop is one of the tools to achieve that.
> >
>
> It'd be great if you explain what is "phrase slop". I assume it's not
> about search, but about relevance.
>

Sure. An exact phrase query has slop = 0 which means find all terms in the
exact positions relative to each other. A phrase query with slop > 0 means
find all terms within <slop> positions relative to each other. If slop =
10, find all terms within 10 positions of each other. Here is a concrete
example from my current work searching SEC filings.

Bill Gates' full legal name is William H. Gates, III. In the SEC database
[1], his name is GATES WILLIAM H III. If you are searching the records of
people within the SEC database and you want to find Bill Gates, most users
will type "bill gates". Since there are many people with the first name
Bill (William) and the last name Gates, Bill Gates most likely won't be the
first result with a standard keyword query. Likewise an exact phrase query
(slop = 0) will not find him either because the first and last names are
transposed. What you need is a phrase query with a slop = 2 which will
match "William Gates", "William H Gates", "Gates William", etc. There is
still the issue of Bill vs William, but that can be solved with synonyms
and is a different topic.
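
Since 9.6 only ships the exact-distance form, one way to emulate a small slop
today is to OR together the allowed orders and distances; a rough sketch:

SELECT to_tsvector('english', 'GATES WILLIAM H III')
       @@ to_tsquery('english',
          'william <-> gates | william <2> gates | gates <-> william | gates <2> william');
-- true: the "gates <-> william" branch matches the transposed name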

1. https://www.sec.gov/cgi-bin/browse-edgar?CIK=902012=exclude=
getcompany=Search

Thanks,
Ryan


Re: [HACKERS] 9.6 phrase search distance specification

2016-08-09 Thread Ryan Pedela
On Tue, Aug 9, 2016 at 12:59 PM, Ryan Pedela <rped...@datalanche.com> wrote:

>
>
> Thanks,
>
> Ryan Pedela
> Datalanche CEO, founder
> www.datalanche.com
>
> On Tue, Aug 9, 2016 at 11:58 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>
>> Bruce Momjian <br...@momjian.us> writes:
>> > Does anyone know why the phrase distance "<3>" was changed from "at most
>> > three tokens away" to "exactly three tokens away"?
>>
>> So that it would correctly support phraseto_tsquery's use of the operator
>> to represent omitted words (stopwords) in a phrase.
>>
>> I think there's probably some use in also providing an operator that does
>> "at most this many tokens away", but Oleg/Teodor were evidently less
>> excited, because they didn't take the time to do it.
>>
>> The thread where this change was discussed is
>>
>> https://www.postgresql.org/message-id/flat/c19fcfec308e6ccd9
>> 52cdde9e648b505%40mail.gmail.com
>>
>> see particularly
>>
>> https://www.postgresql.org/message-id/11252.1465422251%40sss.pgh.pa.us
>
>
>  I would say that it is worth it to have a "phrase slop" operator (Apache
> Lucene terminology). Proximity search is extremely useful for improving
> relevance and phrase slop is one of the tools to achieve that.
>
>
Sorry for the position of my signature

Ryan


Re: [HACKERS] 9.6 phrase search distance specification

2016-08-09 Thread Ryan Pedela
Thanks,

Ryan Pedela
Datalanche CEO, founder
www.datalanche.com

On Tue, Aug 9, 2016 at 11:58 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> Bruce Momjian <br...@momjian.us> writes:
> > Does anyone know why the phrase distance "<3>" was changed from "at most
> > three tokens away" to "exactly three tokens away"?
>
> So that it would correctly support phraseto_tsquery's use of the operator
> to represent omitted words (stopwords) in a phrase.
>
> I think there's probably some use in also providing an operator that does
> "at most this many tokens away", but Oleg/Teodor were evidently less
> excited, because they didn't take the time to do it.
>
> The thread where this change was discussed is
>
> https://www.postgresql.org/message-id/flat/c19fcfec308e6ccd952cdde9e648b5
> 05%40mail.gmail.com
>
> see particularly
>
> https://www.postgresql.org/message-id/11252.1465422251%40sss.pgh.pa.us


 I would say that it is worth it to have a "phrase slop" operator (Apache
Lucene terminology). Proximity search is extremely useful for improving
relevance and phrase slop is one of the tools to achieve that.
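
For context, the exact-distance behavior is what lets phraseto_tsquery()
stand in for dropped stopwords; a minimal illustration (assuming the english
configuration):

SELECT phraseto_tsquery('english', 'cat in the hat');
-- 'cat' <3> 'hat'   ("in" and "the" are dropped but still count toward the distance)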


Re: [HACKERS] Calling json_* functions with JSONB data

2016-05-23 Thread Ryan Pedela
On Mon, May 23, 2016 at 11:14 AM, David G. Johnston <
david.g.johns...@gmail.com> wrote:

> On Mon, May 23, 2016 at 12:55 PM, Peter van Hardenberg  wrote:
>
>> Hi there,
>>
>> I noticed it was very easy to accidentally call the json_* form of JSON
>> manipulation functions with jsonb data as input. This is pretty
>> sub-optimal, since it involves rendering the jsonb then reparsing it and
>> calling the json_* form of the function.
>>
>> Fortunately, this seems quite easy to resolve by taking advantage of our
>> ability to add json_*(jsonb) form of the functions.
>>
>> I talked this over with Andrew who had no objections and suggested I
>> float it on the list before writing a patch. Looks pretty straightforward,
>> just a few new data rows in pg_proc.h.
>>
>> Anyone have any concerns or suggestions?
>>
>>
> Please provide an example of what you are talking about.
>
> SELECT json_array_length('[1,2]'::jsonb)
> ERROR: function json_array_length(jsonb) does not exist
>
> -- The function name is "jsonb_array_length"; and there is no implicit
> cast between the two.
>

He is saying that he accidentally calls json_array_length() instead of
jsonb_array_length()
and that it is an annoying usability problem. It happens to me too and I
agree it would be better if you could just call json_array_length()
regardless of whether the type is JSON or JSONB. If there is some significant
functionality difference from the user's perspective then having separate
"json_" and "jsonb_" functions makes sense, but in your example there is
not.
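
For what it's worth, the explicit cast works today, at the cost of exactly the
re-serialization Peter describes; a quick sketch:

SELECT json_array_length('[1,2]'::jsonb::json);   -- 2, via jsonb -> text -> json
SELECT jsonb_array_length('[1,2]'::jsonb);        -- 2, the native jsonb function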


Re: [HACKERS] Html parsing and inline elements

2016-05-01 Thread Ryan Pedela
On Wed, Apr 13, 2016 at 9:57 AM, Marcelo Zabani <mzab...@gmail.com> wrote:

> Hi, Tom,
>
> You're right, I don't think one can argue that the default parser should
> know HTML.
> How about your suggestion of there being an HTML parser, is it feasible? I
> ask this because I think that a lot of people store HTML documents these
> days, and although there probably aren't lots of HTML with words written
> along multiple inline elements, it would certainly be nice to have a proper
> parser for these use cases.
>
> What do you think?
>

I recommend using Apache Tika [1] for plain text extraction from HTML.
There are so many weird edge cases when parsing HTML that it is easier to
use something that is already mature than reinventing the wheel.

1. https://tika.apache.org/

Thanks,
Ryan Pedela


Re: [HACKERS] Add jsonb_compact(...) for whitespace-free jsonb to text

2016-04-28 Thread Ryan Pedela
On Tue, Apr 26, 2016 at 10:49 AM, Stephen Frost <sfr...@snowman.net> wrote:

> * Ryan Pedela (rped...@datalanche.com) wrote:
> > On Sun, Apr 24, 2016 at 4:02 PM, Sehrope Sarkuni <sehr...@jackdb.com>
> wrote:
> > > The default text representation of jsonb adds whitespace in between
> > > key/value pairs (after the colon ":") and after successive properties
> > > (after the comma ","):
>
> [...]
>
> > > It'd be nice to have a stable text representation of a jsonb value with
> > > minimal whitespace. The latter would also save a few bytes per record
> in
> > > text output formats, on the wire, and in backups (ex: COPY ... TO
> STDOUT).
> >
> > +1
> >
> > I cannot comment on the patch itself, but I welcome jsonb_compact() or
> some
> > way to get JSON with no inserted whitespace.
>
> As I mentioned to Sehrope on IRC, at least for my 2c, if you want a
> compact JSON format to reduce the amount of traffic over the wire or to
> do things with on the client side, we should probably come up with a
> binary format, rather than just hack out the whitespace.  It's not like
> representing numbers using ASCII characters is terribly efficient
> either.


Why build a Ferrari when a skateboard would suffice? Besides, that doesn't
help one of the most common cases for JSONB: REST APIs.

Now that PG fully supports JSON, a user can use PG to construct the JSON
payload of a REST API request. Then the web server would simply be a
pass-through for the JSON payload. I personally have this use case, it is
not hypothetical. However currently, a user must parse the JSON string from
PG and re-stringify it to minimize the whitespace. Given that HTTP is
text-based, removing all extraneous whitespace is the best way to compress
it, and on top of that you can do gzip compression. Unless you are
suggesting that the binary format is just a gzipped version of the
minimized text format, I don't see how a binary format helps at all in the
REST API case.

In addition, every JSON implementation I have ever seen fully minimizes
JSON by default. PG appears to deviate from standard practice for no
apparent reason.
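
For illustration, the difference in question; jsonb_compact() here is only the
function proposed in this thread, not something that exists in core today:

SELECT '{"a": 1, "b": [1, 2]}'::jsonb::text;
--  {"a": 1, "b": [1, 2]}            (current output, with the inserted spaces)

-- hypothetical, as proposed in this thread:
SELECT jsonb_compact('{"a": 1, "b": [1, 2]}'::jsonb);
--  {"a":1,"b":[1,2]}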


Re: [HACKERS] Add jsonb_compact(...) for whitespace-free jsonb to text

2016-04-26 Thread Ryan Pedela
On Sun, Apr 24, 2016 at 4:02 PM, Sehrope Sarkuni  wrote:

> Hi,
>
> The default text representation of jsonb adds whitespace in between
> key/value pairs (after the colon ":") and after successive properties
> (after the comma ","):
>
> postgres=# SELECT '{"b":2,"a":1}'::jsonb::text;
>        text
> ------------------
>  {"a": 1, "b": 2}
> (1 row)
>
> AFAIK, there's also no guarantee on the specific order of the resulting
> properties in the text representation either. I would suppose it's fixed
> for a given jsonb value within a database major version but across major
> versions it could change (if the underlying representation changes).
>
> I originally ran into this when comparing the hashes of the text
> representation of jsonb columns. The results didn't match up because the
> javascript function JSON.stringify(...) does not add any extra whitespace.
>
> It'd be nice to have a stable text representation of a jsonb value with
> minimal whitespace. The latter would also save a few bytes per record in
> text output formats, on the wire, and in backups (ex: COPY ... TO STDOUT).
>

+1

I cannot comment on the patch itself, but I welcome jsonb_compact() or some
way to get JSON with no inserted whitespace.


Re: [HACKERS] No Issue Tracker - Say it Ain't So!

2015-09-28 Thread Ryan Pedela
On Mon, Sep 28, 2015 at 4:09 PM, Jim Nasby  wrote:

> On 9/28/15 11:43 AM, Andres Freund wrote:
>
>> On 2015-09-28 09:41:18 -0700, David Fetter wrote:
>>
>>> Since you're convinced that this is an unqualified win, please put
>>> together a project plan for switching from our current system to
>>> Github.
>>>
>>
>> Err, no. That's a waste of all our time.
>>
>> It has been stated pretty clearly in this thread by a number of senior
>> community people that we're not going to use a closed source system.
>>
>
> GitLab OTOH is released under a MIT license, so it is an option. I don't
> know how it compares to other suggested options, but if YUriy wants to
> propose it it's at least a viable option.


I haven't used Gitlab extensively, but it has a feature set similar to
Github and then some [1]. The OSS project does seem active [2], but it is
still relatively new.

1. https://about.gitlab.com/better-than-github/
2. https://gitlab.com/gitlab-org/gitlab-ce


Re: [HACKERS] No Issue Tracker - Say it Ain't So!

2015-09-24 Thread Ryan Pedela
Kam Lasater wrote:
> I'd suggest: Github Issues, Pivotal Tracker or Redmine (probably in
> that order). There are tens to hundreds of other great ones out there,
> I'm sure one of them would also work.

Why not just use Github issues?

1. You can set it up to send emails to the list when an issue is created or
updated.
2. Replies to the email will automatically update the issue on Github.
3. Github is where most of the OSS activity happens now.
4. If Github goes away, so what? The core workflow never changes.

Thanks,
Ryan Pedela


Re: [HACKERS] [PATCH] Generalized JSON output functions

2015-07-15 Thread Ryan Pedela
On Wed, Jul 15, 2015 at 11:10 AM, Ryan Pedela rped...@datalanche.com
wrote:

 On Wed, Jul 15, 2015 at 8:52 AM, Robert Haas robertmh...@gmail.com
 wrote:

 FWIW, I don't agree.  If it's not easy to read the JSON that
 PostgreSQL generates using JavaScript, then a lot of people are just
 going to give up on doing it, and IMO that would be sad.  Telling
 people that they have to parse the JSON using some parser other than
 the one built into their JavaScript engine, whack it around, and then
 render it as text and parse it again is not really an acceptable
 answer.


 The vast majority of Javascript users are going to be using Node.js when
 they connect to Postgres if only for security reasons. If they use Node,
 they will be using node-postgres [1] or something that builds on top of it.
 For int64 and numerics in a row, the default is to return a string, and
 there is a flag you can set to round returned numbers if you prefer. There
 is also a way to override the default parsing of each Postgres type [2]. So
 in the case of JSON using my json-bignum module [3], the code looks like
 this:

 var pgTypes = require('pg').types;
 var bignumJSON = require('json-bignum');

 // OIDs of the built-in json (114) and jsonb (3802) types in pg_type
 var JSON_TYPE_OID = 114;
 var JSONB_TYPE_OID = 3802;

 pgTypes.setTypeParser(JSON_TYPE_OID, function (value) {
     return bignumJSON.parse(value);
 });

 pgTypes.setTypeParser(JSONB_TYPE_OID, function (value) {
     return bignumJSON.parse(value);
 });

 To me that code is super simple, and not a pain in the ass. In other words,
 it is not "telling people that they have to parse the JSON using some
 parser other than the one built into their JavaScript engine, whack it
 around, and then render it as text and parse it again." Like I said
 previously, the situation with Javascript will hopefully be remedied in a
 few years with ES7 anyway.

 1. https://github.com/brianc/node-postgres
 2. https://github.com/brianc/node-pg-types
 3. https://github.com/datalanche/json-bignum

  On Wed, Jul 15, 2015 at 8:52 AM, Robert Haas robertmh...@gmail.com
  wrote:

 The reason why the logical decoding stuff allows multiple
 output formats is because Andres, quite correctly, foresaw that
 different people would need different output formats.  He could have
 designed that system to output only one output format and just said,
 everybody's got to read and parse this, but that would have been slow.
 Instead, he tried to set things up so that you could get the output in
 the format that was most convenient for your client, whatever that is.
 On this thread, we're back-pedaling from that idea: sorry, you can get
 JSON output, but if you want JSON output that will be properly
 interpreted by your JSON parser, you can't have that.  Regardless of
 the details of this particular patch, I can't endorse that approach.
 If we want people to use our software, we need to meet them where they
 are at, especially when we are only (IIUC) talking about inserting a
 few extra quotation marks.


 I would be okay with a generic way to specify output formats if there are
 many use cases beyond Javascript and JSON. I vaguely remember someone
 suggesting a FORMAT clause on CREATE TABLE which would specify how a
 particular column is output from a SELECT. For example, returning a date
 with a non-ISO format. I liked that idea. However if the only reason for
 different output formats is Javascript, that is silly. I have a very long
 list of feature requests that would probably only be beneficial to me or a
 handful of users. Should we implement them? No, of course not! If we did
 that Postgres would cease to be the best open-source database. You can't
 have the best product and say yes to everything. Feature creep is the enemy
 of quality. If Javascript is the sole reason for supporting multiple output
 formats, then that is the definition of feature creep in my opinion. If
 there are many use cases beyond Javascript and JSON, then that is different
 and a conversation worth having.


Bottom line: Large numbers are a pain to deal with in Javascript regardless
of where they come from or what format they are in. Adding code to Postgres
core will never change that.


Re: [HACKERS] [PATCH] Generalized JSON output functions

2015-07-15 Thread Ryan Pedela
On Wed, Jul 15, 2015 at 8:52 AM, Robert Haas robertmh...@gmail.com wrote:

 FWIW, I don't agree.  If it's not easy to read the JSON that
 PostgreSQL generates using JavaScript, then a lot of people are just
 going to give up on doing it, and IMO that would be sad.  Telling
 people that they have to parse the JSON using some parser other than
 the one built into their JavaScript engine, whack it around, and then
 render it as text and parse it again is not really an acceptable
 answer.


The vast majority of Javascript users are going to be using Node.js when
they connect to Postgres if only for security reasons. If they use Node,
they will be using node-postgres [1] or something that builds on top of it.
For int64 and numerics in a row, the default is to return a string, and
there is a flag you can set to round returned numbers if you prefer. There
is also a way to override the default parsing of each Postgres type [2]. So
in the case of JSON using my json-bignum module [3], the code looks like
this:

var pgTypes = require('pg').types;
var bignumJSON = require('json-bignum');

// OIDs of the built-in json (114) and jsonb (3802) types in pg_type
var JSON_TYPE_OID = 114;
var JSONB_TYPE_OID = 3802;

pgTypes.setTypeParser(JSON_TYPE_OID, function (value) {
    return bignumJSON.parse(value);
});

pgTypes.setTypeParser(JSONB_TYPE_OID, function (value) {
    return bignumJSON.parse(value);
});

To me that code is super simple, and not a pain in the ass. In other words,
it is not "telling people that they have to parse the JSON using some
parser other than the one built into their JavaScript engine, whack it
around, and then render it as text and parse it again." Like I said
previously, the situation with Javascript will hopefully be remedied in a
few years with ES7 anyway.

1. https://github.com/brianc/node-postgres
2. https://github.com/brianc/node-pg-types
3. https://github.com/datalanche/json-bignum

 On Wed, Jul 15, 2015 at 8:52 AM, Robert Haas robertmh...@gmail.com wrote:

 The reason why the logical decoding stuff allows multiple
 output formats is because Andres, quite correctly, foresaw that
 different people would need different output formats.  He could have
 designed that system to output only one output format and just said,
 everybody's got to read and parse this, but that would have been slow.
 Instead, he tried to set things up so that you could get the output in
 the format that was most convenient for your client, whatever that is.
 On this thread, we're back-pedaling from that idea: sorry, you can get
 JSON output, but if you want JSON output that will be properly
 interpreted by your JSON parser, you can't have that.  Regardless of
 the details of this particular patch, I can't endorse that approach.
 If we want people to use our software, we need to meet them where they
 are at, especially when we are only (IIUC) talking about inserting a
 few extra quotation marks.


I would be okay with a generic way to specify output formats if there are
many use cases beyond Javascript and JSON. I vaguely remember someone
suggesting a FORMAT clause on CREATE TABLE which would specify how a
particular column is output from a SELECT. For example, returning a date
with a non-ISO format. I liked that idea. However if the only reason for
different output formats is Javascript, that is silly. I have a very long
list of feature requests that would probably only be beneficial to me or a
handful of users. Should we implement them? No, of course not! If we did
that Postgres would cease to be the best open-source database. You can't
have the best product and say yes to everything. Feature creep is the enemy
of quality. If Javascript is the sole reason for supporting multiple output
formats, then that is the definition of feature creep in my opinion. If
there are many use cases beyond Javascript and JSON, then that is different
and a conversation worth having.


Re: [HACKERS] [PATCH] Generalized JSON output functions

2015-07-13 Thread Ryan Pedela
On Mon, Jul 13, 2015 at 1:30 AM, Shulgin, Oleksandr 
oleksandr.shul...@zalando.de wrote:


 To reiterate: for my problem, that is escaping numerics that can
 potentially overflow[1] under ECMAScript standard, I want to be able to
 override the code that outputs the numeric converted to string.  There is
 no way in current implementation to do that *at all*, short of copying all
 the code involved in producing JSON output and changing it at certain
 points.  One could try re-parsing JSON instead, but that doesn't actually
 solve the issue, because type information is lost forever at that point.


I had the exact same problem with Node.js and client-side Javascript. That
is why I wrote json-bignum [1] for Node.js. There is a bower version [2] as
well. The only caveat is that it is slower than the native JSON functions,
but I am happy to receive PRs to improve performance.

1. https://github.com/datalanche/json-bignum
2. https://libraries.io/bower/json-bignum

As far as large numbers in JSON, I think Postgres is doing the right thing
and should not be changed. It is Javascript that is stupid here, and I
don't think it is wise to add something to core just because one client
does stupid things with large numbers. In addition, ES7 is introducing
value types which will hopefully solve the large number problem in
Javascript.
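
For reference, the overflow in question is easy to reproduce; 9007199254740993
is 2^53 + 1, the first integer a double-precision Javascript number cannot
represent exactly:

SELECT to_json(9007199254740993::bigint);
--  9007199254740993                  (exact in the JSON text)
-- JSON.parse() in Javascript silently turns this into 9007199254740992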

The random whitespace issue is valid in my opinion and should be fixed.

Thanks,
Ryan Pedela


Re: [HACKERS] [PATCH] Generalized JSON output functions

2015-05-22 Thread Ryan Pedela
On Fri, May 22, 2015 at 10:51 AM, Merlin Moncure mmonc...@gmail.com wrote:

 On Fri, May 22, 2015 at 9:43 AM, Alvaro Herrera
 alvhe...@2ndquadrant.com wrote:
  Andrew Dunstan wrote:
 
  On 05/20/2015 09:16 AM, Shulgin, Oleksandr wrote:
 
  Attached is a patch against master to generalize the JSON-producing
  functions in utils/adt/json.c and to provide a set of callbacks which
 can
  be overridden the same way that is already provided for *parsing* JSON.
 
  I'm not necessarily opposed to this, but it sure seems like a lot of
  changes, and moderately invasive ones, to support something that could
 be
  done, at the cost of reparsing, with a simple loadable extension that I
  could create in a few hours of programming.
 
  But this seems like a pretty reasonable change to make, no?  Doesn't the
  total amount of code decrease after this patch?  JSON stuff is pretty
  new so some refactoring and generalization of what we have is to be
  expected.

 Yeah.  Also, there have been a few previous gripes about this, for
 example,
 http://www.postgresql.org/message-id/cahbvmpzs+svr+y-ugxjrq+xw4dqtevl-cozc69zffwmxjck...@mail.gmail.com
 .
 As noted, I definitely prefer 'space free' by default for efficiency
 reasons, but standardizing the output has definitely got to be a
 reasonable goal.


Every JSON implementation I have ever used defaults to the minified version
of JSON (no whitespace) when printed.


Re: [HACKERS] jsonb concatenate operator's semantics seem questionable

2015-05-20 Thread Ryan Pedela
On Wed, May 20, 2015 at 12:34 PM, Andrew Dunstan and...@dunslane.net
wrote:


 So Dmitry, at my suggestion, has come up with a way of doing that, by
 adding a parameter to jsonb_replace(). If this parameter is set to true (it
 defaults to false) and the key or array element pointed to by the last
 element of the path doesn't exist, it gets created.


+1
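
For readers following along, a sketch of the behavior being discussed, written
with the jsonb_set() signature that eventually shipped in 9.5 (the patch here
used jsonb_replace() with a flag defaulting to false, so names and defaults may
differ):

SELECT jsonb_set('{"a": {"b": 1}}'::jsonb, '{a,c}', '2', true);
--  {"a": {"b": 1, "c": 2}}           -- the missing key "c" is created
SELECT jsonb_set('{"a": {"b": 1}}'::jsonb, '{a,c}', '2', false);
--  {"a": {"b": 1}}                   -- without the flag, a non-existent path is a no-op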


Re: [HACKERS] jsonb concatenate operator's semantics seem questionable

2015-05-18 Thread Ryan Pedela
On Sun, May 17, 2015 at 9:41 PM, Josh Berkus j...@agliodbs.com wrote:

 Is there a particular reason why + makes more sense as shallow
 concatenation and || makes more sense as deep concatenation?  Like,
 something in JS or other client languages which would make that
 preference make more sense to users?


As someone who uses JSON day-to-day in Javascript and Python, I personally
don't think || or + matters much. In Python you can merge parsed JSON objects
with dict.update(), and in Javascript you have to use a 3rd-party library for
that functionality, such as jQuery.extend(). I agree with Peter that we need
deep concatenation, but I don't think there is any standard for the
operator. I think the word "shallow" should be added to the docs though.

What is far more important than shallow or deep concatenation for the
document database use case is being able to delete or replace/update a
specific, nested path in the JSON object. It looks like that is possible
with the minus operator and jsonb_replace(). This is great, however it took
me awhile to figure out the path syntax. I think adding a paragraph to the
docs explaining the path syntax would help.
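
A minimal illustration of the path syntax in question (paths are text arrays,
and array elements use 0-based indexes):

SELECT '{"a": {"b": [10, 20, 30]}}'::jsonb #> '{a,b,2}';    -- 30
SELECT '{"a": {"b": [10, 20, 30]}}'::jsonb #>> '{a,b,2}';   -- '30', as text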

Ryan Pedela


Re: [HACKERS] jsonb concatenate operator's semantics seem questionable

2015-05-18 Thread Ryan Pedela
On Mon, May 18, 2015 at 8:41 AM, Ryan Pedela rped...@datalanche.com wrote:

 On Sun, May 17, 2015 at 9:41 PM, Josh Berkus j...@agliodbs.com wrote:

 Is there a particular reason why + makes more sense as shallow
 concatenation and || makes more sense as deep concatenation?  Like,
 something in JS or other client languages which would make that
 preference make more sense to users?


 As someone who uses JSON day-to-day in Javascript and Python, I personally
 don't think || or + matters much. In Python you can merge parsed JSON objects
 with dict.update(), and in Javascript you have to use a 3rd-party library for
 that functionality, such as jQuery.extend(). I agree with Peter that we need
 deep concatenation, but I don't think there is any standard for the
 operator. I think the word "shallow" should be added to the docs though.

 What is far more important than shallow or deep concatenation for the
 document database use case is being able to delete or replace/update a
 specific, nested path in the JSON object. It looks like that is possible
 with the minus operator and jsonb_replace(). This is great, however it took
 me awhile to figure out the path syntax. I think adding a paragraph to the
 docs explaining the path syntax would help.


Having looked at this more, I think I understand the problem Peter has
identified and it is a significant usability problem in my opinion. I think
the word concatenate has confused me because I think of it as a
higher-level operation when I want to merge two, large JSON objects which
isn't a very common operation, at least for me. What is absolutely required
for the document database use case is the following:

1. Get element at any arbitrary path. ( #> operator )
2. Delete any arbitrary path. ( minus operator )
3. Replace/update element at any arbitrary path. ( jsonb_replace )
4. Add element to any arbitrary path. ( ? )

It is #4 that does not seem to exist unless jsonb_replace() creates the
specified path if it does not exist. Does it? I am not currently at my desk
to test it myself.

If not, deep concatenation would solve this problem, but I can also see
another solution. Use + for shallow concatenation since it really means
add element to top-level path as Peter suggests. Then add another
function: jsonb_add( target jsonb, path text[], new jsonb ) to add element
at any arbitrary path. Then leave || for deep concatenation in 9.6 or
whenever.

If jsonb_replace() satisfies #4 then I think everything is fine. Without #4
however, jsonb would remain an incomplete document database solution in my
opinion.

Thanks,
Ryan Pedela


Re: [HACKERS] jsonb concatenate operator's semantics seem questionable

2015-05-18 Thread Ryan Pedela
On Mon, May 18, 2015 at 12:24 PM, Josh Berkus j...@agliodbs.com wrote:

 On 05/18/2015 08:57 AM, Ryan Pedela wrote:
  If not, deep concatenation would solve this problem, but I can also see
  another solution. Use + for shallow concatenation since it really means
  add element to top-level path as Peter suggests. Then add another
  function: jsonb_add( target jsonb, path text[], new jsonb ) to add
  element at any arbitrary path. Then leave || for deep concatenation in
  9.6 or whenever.

 Since swapping the operator seems still on the table, is there any
 particular reason why you think + is more suited to shallow
 concatenation?  Both you and Peter have said this, but as a heavy user
 of JSON/JSONB, to me it seems the other way around.  That is, + says
 "add to arbitrary nested node" to me more than || does.


Let me back up a little. I always like to think about what is the ideal
interface first and then worry about implementation because implementation
can always be changed but interface can't. I think the current concat/merge
interface is the ideal. It should be || because that means concat/merge
everywhere else in the PG interface that I am aware of. In the case of JSON,
which is a hierarchical data structure, it should be implemented as a
deep merge, which by definition covers the shallow case. This is what I
would expect as a user and I would think there was a bug if it didn't
perform deep merge. I expect this because I can implement shallow merge
easily myself using Javascript, Python, etc but deep merge is non-trivial.
Therefore I would expect a special JSON concat/merge library function to do
deep merge. I would rather the interface stay the same and have it documented
that the current implementation is a shallow merge and may become a deep
merge in the future.
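
To make the distinction concrete, this is roughly what shallow behavior looks
like for nested objects with the || operator as it later shipped; a deep merge
would combine the nested keys instead:

SELECT '{"a": {"x": 1}}'::jsonb || '{"a": {"y": 2}}'::jsonb;
--  {"a": {"y": 2}}                   -- shallow: the whole value of "a" is replaced
--  a deep merge would produce {"a": {"x": 1, "y": 2}}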

In the context of splitting shallow and deep merge into two operators, I
think + is better for shallow and || better for deep. The reason for + is
because many programming languages have this behavior. If I see the below
code in a language I have never used before:

objC = objA + objB

My default assumption is that + performs a shallow merge. Like I said, I
would rather there just be one operator.


  If jsonb_replace() satisfies #4 then I think everything is fine. Without
  #4 however, jsonb would remain an incomplete document database solution
  in my opinion.

 Oh, no question, we're still incomplete.  Aside from nested append, we
 kinda lack easy sharded scale-out, which is a rather more major feature,
 no?


I think it depends on the point of view which is more important. If you
have a massive dataset, then obviously sharding is more important. But my
own take on why NoSQL became so popular has only a little to do with
sharding. MongoDB pitched to tech entrepreneurs: "use our database and
implement your MVP 10x faster/easier, and we have sharding when you become
the next Google."  And it worked brilliantly. Many tech entrepreneurs are
worried about time constraints and dream of becoming the next Google
(myself included). But the reality is that most fail and the majority who
don't fail achieve moderate success, only a handful reach Google-level
success. Therefore the vast majority end up never needing sharding, but
they all experience that advertised 10x development speed improvement. I
doubt it really is 10x, but JSON maps very well to programming language
data structures (no impedance mismatch) so it is usually faster to build
prototypes with MongoDB.

If jsonb supported nested append, then I think that would be enough for
people who care most about development speed which I think is a larger
group than the group with massive datasets. In addition, sharding seems
like a server-level or database-level issue rather than a data type issue.


Re: [HACKERS] deparsing utility commands

2015-03-25 Thread Ryan Pedela
On Wed, Mar 25, 2015 at 11:59 AM, Alvaro Herrera alvhe...@2ndquadrant.com
wrote:


 * Should we prohibit DDL from within event triggers?


Please don't prohibit DDL unless there is a really, really good reason to
do so. I have several use cases in mind for event triggers, but they are
only useful if I can perform DDL.

For example, I want to create an Elasticsearch FDW and use that to index
and search Postgres tables. When a table is created, I am planning to use
an event trigger to capture the CREATE event and automatically create a
foreign table via the Elasticsearch FDW. In addition, I would add normal
triggers to the new table which capture DML and update the foreign table
accordingly. In other words, I want to use FDWs and event triggers to
automatically sync table DDL and DML to Elasticsearch.
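
A rough sketch of that kind of trigger, assuming DDL remains allowed inside
event triggers. pg_event_trigger_ddl_commands() comes from the deparsing work
discussed here; the es_mirror schema, the es_server foreign server, and the
single-jsonb-column layout are purely illustrative:

CREATE OR REPLACE FUNCTION sync_new_table_to_es() RETURNS event_trigger AS $$
DECLARE
    obj record;
BEGIN
    FOR obj IN SELECT * FROM pg_event_trigger_ddl_commands() LOOP
        IF obj.command_tag = 'CREATE TABLE' THEN
            -- DDL issued from inside the event trigger: mirror the new table
            -- as a foreign table pointing at Elasticsearch
            EXECUTE format(
                'CREATE FOREIGN TABLE es_mirror.%I (doc jsonb) SERVER es_server',
                (SELECT relname FROM pg_class WHERE oid = obj.objid));
        END IF;
    END LOOP;
END;
$$ LANGUAGE plpgsql;

-- fires only on CREATE TABLE, so the CREATE FOREIGN TABLE above does not recurse
CREATE EVENT TRIGGER sync_to_es ON ddl_command_end
    WHEN TAG IN ('CREATE TABLE')
    EXECUTE PROCEDURE sync_new_table_to_es();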


Re: [HACKERS] deparsing utility commands

2015-02-24 Thread Ryan Pedela
On Tue, Feb 24, 2015 at 6:48 AM, Alvaro Herrera alvhe...@2ndquadrant.com
wrote:

 Stephen Frost wrote:

 I'm thinking something like
  SELECT * FROM pg_creation_commands({'pg_class'::regclass,
 'sometable'::pg_class});
 would return a set of commands in the JSON-blob format that creates the
 table.  The input argument is an array of object addresses, so that you
 can request creation commands for multiple objects.  (It's not my
 intention to provide this functionality right away, but if somebody else
 wants to work on top of the current deparse patch, by my guest; if it
 proves simple enough we can still get it into 9.5 as part of this
 patch.)


 +1

Another possible function could be to diff two relations to produce a set
of DDL commands that could be used for schema migrations.

Also thank you very much! CREATE/ALTER support in event triggers is the
upcoming feature I am most excited about, and I am happy to see progress.

Thanks,
Ryan Pedela


Re: [HACKERS] PL/pgSQL 2

2014-09-02 Thread Ryan Pedela
If PL/Javascript is a serious consideration, how will int64 and numeric be
handled?

Thanks,

Ryan Pedela
Datalanche CEO, co-founder
www.datalanche.com
rped...@datalanche.com
513-571-6837


On Tue, Sep 2, 2014 at 6:38 AM, Andrew Dunstan and...@dunslane.net wrote:


 On 09/02/2014 05:44 AM, Álvaro Hernández Tortosa wrote:


 On 02/09/14 11:34, Mark Kirkwood wrote:

 On 02/09/14 21:25, Álvaro Hernández Tortosa wrote:


 On 02/09/14 05:24, Craig Ringer wrote:

 I couldn't disagree more.

 If we were to implement anything, it'd be PL/PSM
 (http://en.wikipedia.org/wiki/SQL/PSM). I'm sure it's as bizarre and
 quirky as anything else the SQL committee has brought forth, but it's
 at
 least a standard(ish) language.

  So we'd choose a bizarre and quirky language instead of anything
 better just because it's standard. I'm sure current and prospective
 users will surely prefer a bizarre and quirky language that is standard
 approved, rather than a modern, comfortable, easy-to-use, that is not
 embodied by the ISO. No doubt ^_^


 Well there is the risk that by randomly adding new syntax to PL/pgSQL we
 turn it into a bizarre and quirky *non-standard* language. Part of the
 attraction of PL/pgsql is that it is Ada like - if we break that too much
 then...well...that would be bad. So I think a careful balance is needed, to
 add new features that keep the spirit of the original language.


 I agree. I think I haven't suggested adding new syntax to pl/pgsql.
 But having its syntax similar to ADA is IMHO not something good. I'm sure
 few prospective postgres users would be compelled to that. They are
 compelled about JavaScript, python, Scala or Ruby, to name a few, but
 definitely not ADA.



 Just as a small nit pick - the name of the language is not ADA, but Ada.
 It isn't an acronym. The language is named after Ada Lovelace, arguably the
 world's first programmer. If you're not familiar with modern Ada, let me
 recommend the newly published Programming in Ada 2012 by John Barnes. But
 I digress.

 JavaScript would actually be quite a good alternative. However, using it
 involves something others have objected to, namely calling SQL via a
 function call. It's true that plpgsql lets you call SQL commands without
 explicitly invoking SPI. OTOH, it actually relies on SPI under the hood a
 lot more than other PLs, which I have little doubt is responsible for
 timings like this:

andrew=# do $$ declare x int = 1; i int = 1; begin while i <
1000 loop i := i + 1; x := x + 46; end loop; raise notice ' x =
%',x; end; $$;
NOTICE:   x = 45955
DO
Time: 13222.195 ms
andrew=# do $$ var x = 1; var i = 1; while (i < 1000) { i += 1;
x += 46; } plv8.elog(NOTICE, 'x = ' + x); $$ language plv8;
NOTICE:  x = 45955
DO
Time: 27.976 ms

 But I'm not suggesting we should implement a Javascript PL in core either.

 Finally, +1 to Tom's suggestion upthread that we implement different
 behaviours via pragmas rather than some new offshoot language. Maybe a GUC
 could specify a default set of such pragmas, so you wouldn't need to
 decorate every function with them.

 cheers

 andrew





 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers



[HACKERS] JSON Patch (RFC 6902) support?

2014-03-13 Thread Ryan Pedela
This is my first email to the PostgreSQL mailing lists so I hope this is
the correct place. If not, please let me know.

I was wondering if it would be possible and wise to support JSON Patch?
https://tools.ietf.org/html/rfc6902

One of the problems I have as a user is how to update a portion of a JSON
object efficiently. Right now I have to read the entire field from the
database, update it, and then write it back. I am thinking JSON Patch might
be a good way to solve this problem because it would allow partial updates
and I think it could easily fit into the existing set of JSON functions
such as:

// applies a JSON Patch
json_patch_apply(json, patch)

// diffs two JSON objects and produces a JSON Patch
json_patch_diff(json a, json b)
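
For illustration, a hypothetical call using the proposed function (neither
function exists in core; the patch document itself is plain RFC 6902):

SELECT json_patch_apply(
    '{"name": "widget", "price": 10, "tags": ["a"]}',
    '[{"op": "replace", "path": "/price", "value": 12},
      {"op": "add",     "path": "/tags/-", "value": "b"}]'
);
-- would yield {"name": "widget", "price": 12, "tags": ["a", "b"]}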

Thanks,
Ryan Pedela