Re: Ask for reviewing on Structured Streaming PRs

2019-01-13 Thread Sean Owen
Yes, you're preaching to the choir here. SS does seem somewhat abandoned by those who have worked on it. I have also been frustrated at times that some areas fall into this pattern. There isn't a way to make people work on it, and I personally have neither an interest in it nor a background in SS.

Re: Ask for reviewing on Structured Streaming PRs

2019-01-13 Thread Jungtaek Lim
Sean, this is actually a fallback after pinging committers. I know who can review and merge in the SS area, and I pinged them, but it didn't work. There is even a PR whose approach was encouraged by a committer and whose first phase was reviewed, and it still has no review. That's not the first time I have faced the situation,

Re: [DISCUSS] Identifiers with multi-catalog support

2019-01-13 Thread Ryan Blue
I think that the solution to this problem is to mix the two approaches by supporting 3 identifier parts: catalog, namespace, and name, where namespace can be an n-part identifier:
    type Namespace = Seq[String]
    case class CatalogIdentifier(space: Namespace, name: String)
This allows catalogs to
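The preview above is cut off, so how catalogs are meant to use these types is not shown. As a rough sketch only, the two definitions from the message could be combined with a hypothetical helper that splits a multi-part name into an optional catalog, a namespace, and a name. The `IdentifierSketch` object, the `resolve` function, and the `knownCatalogs` parameter are assumptions made for illustration; they are not part of the proposal or of Spark.

```scala
// Sketch only: Namespace and CatalogIdentifier come from the message above;
// everything else here is a hypothetical illustration.
object IdentifierSketch {
  type Namespace = Seq[String]

  case class CatalogIdentifier(space: Namespace, name: String)

  // Hypothetical helper: treat the first part as a catalog name only if it is
  // a known catalog and more parts follow; the last part is the object name
  // and whatever lies in between becomes the n-part namespace.
  def resolve(
      parts: Seq[String],
      knownCatalogs: Set[String]): (Option[String], CatalogIdentifier) = {
    require(parts.nonEmpty, "an identifier needs at least one part")
    val (catalog, rest) =
      if (parts.length > 1 && knownCatalogs.contains(parts.head)) {
        (Some(parts.head), parts.tail)
      } else {
        (None, parts)
      }
    (catalog, CatalogIdentifier(rest.init, rest.last))
  }
}
```

Under these assumptions, a four-part name whose first part is a registered catalog keeps a two-part namespace, while a bare table name resolves with no catalog and an empty namespace, so one shape covers both two-level and three-level sources.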

Re: [DISCUSS] Identifiers with multi-catalog support

2019-01-13 Thread Reynold Xin
Thanks for writing this up. Just to show why option 1 is not sufficient: MySQL and Postgres are the two most popular open source database systems, and both support database → schema → table 3-part identification, so Spark supporting only 2-part names passed to the data source (option 1) isn't

[DISCUSS] Identifiers with multi-catalog support

2019-01-13 Thread Ryan Blue
In the DSv2 sync up, we tried to discuss the Table metadata proposal but were side-tracked on its use of TableIdentifier. There were good points about how Spark should identify tables, views, functions, etc, and I want to start a discussion here. Identifiers are orthogonal to the TableCatalog

Re: Clean out https://dist.apache.org/repos/dist/dev/spark/ ?

2019-01-13 Thread Felix Cheung
Eh, yeah, like the one with signing, I think the doc build is mostly useful when a) we are right before a release or during the RC resets; b) someone makes a huge change to the docs and wants to check. Not sure we need this nightly? From: Sean Owen Sent: Sunday, January

Re: Ask for reviewing on Structured Streaming PRs

2019-01-13 Thread Hyukjin Kwon
But it's true that imho there's less activity in SS in general. That should be noted. Maybe it's also because committers are busy with other things. Yeah, I agree that one actionable strategy for now might be to make the PR description as clear as possible to make the review easier, and then ping them

Re: Ask for reviewing on Structured Streaming PRs

2019-01-13 Thread Sean Owen
Jungtaek, the best strategy is to find who wrote the code you are modifying (use GitHub history or git blame) and ping them directly on the PR. I don't know this code well myself. It also helps if you can address why the functionality is important, and describe compatibility implications. Most

Re: Clean out https://dist.apache.org/repos/dist/dev/spark/ ?

2019-01-13 Thread Sean Owen
Will do. Er, maybe add Shane here too -- should we disable this docs job? Are these docs used, and is there much value in nightly snapshots of the whole site? On Sat, Jan 12, 2019 at 9:04 PM Felix Cheung wrote: > > These get “published” by doc nightly build from riselab Jenkins... > > >