Re: PR Milestone policy
My feeling is that setting a milestone on PRs before they're merged is a way of making their authors feel more included. I don't necessarily see a problem with setting milestones optimistically and then, when a release branch is about to be cut (based on the timed release date), bulk-updating anything that hasn't been merged yet to the next milestone.

However, there are other ways to make authors feel more included. If we end up adopting a more formalized proposal process, that helps too. (It should be easier for people to comment on proposals than on PRs, since there isn't a need to read code.)

I guess I'm not really fussy either way on this one.

On Wed, Dec 12, 2018 at 10:27 PM 邱明明 wrote:

> I agree with Jonathan.
>
> On Thu, Dec 13, 2018 at 1:05 PM, Jay Nash wrote:
>
> > Dear all,
> >
> > I am just a bystander on the Druid list; however, I would like to contribute code to Druid some day because it is very great, and we use it at my company. It sounds like consensus was reached that GitHub milestones should be used less frequently, and that a vote is proposed to change this. Is this correct?
> >
> > Regards,
> > Jay
> >
> > On 2018/12/12 00:39:29, Jonathan Wei wrote:
> >
> > > After a PR has been reviewed and merged, I think we should tag it with the upcoming milestone to make life easier for release managers, for all PRs.
> > >
> > > Regarding unresolved PRs:
> > >
> > > > I advocate for not assigning milestones to any non-bug (or otherwise "critical") PRs, including "feature", non-refactoring PRs.
> > >
> > > That seems like a reasonable policy to me, based on the points Roman made in the thread.
> > >
> > > On Tue, Dec 11, 2018 at 1:13 AM Julian Hyde wrote:
> > >
> > > > Well, see if you can get consensus around such a policy. Other Druid folks, please speak up if you agree or disagree.
> > > >
> > > > On Dec 8, 2018, at 8:02 AM, Roman Leventov wrote:
> > > >
> > > > > It's not exactly and not only that. I advocate for not assigning milestones to any non-bug (or otherwise "critical") PRs, including "feature", non-refactoring PRs.
> > > > >
> > > > > On Fri, 7 Dec 2018 at 19:29, Julian Hyde wrote:
> > > > >
> > > > > > Consensus.
> > > > > >
> > > > > > We resolve debates by going into them knowing that we need to find consensus. A vote is a last step to prove that consensus exists, and in most cases is not necessary.
> > > > > >
> > > > > > Reading between the lines, it sounds as if you and FJ have a difference of opinion about refactoring changes. Two extreme positions would be (1) we don't accept changes that only refactor code, and (2) I assert my right to contribute a refactoring change at any point in the project lifecycle. A debate that starts with those positions is never going to reach consensus. A better starting point might be "I would like to make the following change because I believe it would be beneficial. How could I best structure it / time it to minimize impact?"
> > > > > >
> > > > > > On Fri, Dec 7, 2018 at 9:19 AM Roman Leventov wrote:
> > > > > >
> > > > > > > I would like to learn what the Apache way to resolve debates is. But you are right, this question probably doesn't deserve that. Thanks for guidance Julian.
> > > > > > >
> > > > > > > On Fri, 7 Dec 2018 at 16:43, Julian Hyde wrote:
> > > > > > >
> > > > > > > > May I suggest that a vote is not the solution. In this discussion I see two people beating each other over the head with policy.
> > > > > > > >
> > > > > > > > Let’s strive to operate according to the Apache way. Accept contributions on merit in a timely manner. Avoid the urge to “project manage”.
> > > > > > > >
> > > > > > > > Julian
> > > > > > > >
> > > > > > > > On Dec 7, 2018, at 07:03, Roman Leventov wrote:
> > > > > > > >
> > > > > > > > > The previous consensus community decision seems to be to not use PR milestones for any PRs except bugs. To change this policy, probably there should be a committer (or PPMC?) vote.
> > > > > > > > >
> > > > > > > > > On Thu, 6 Dec 2018 at 20:49, Julian Hyde wrote:
> > > > > > > > >
> > > > > > > > > > FJ,
> > > > > > > > > >
> > > > > > > > > > What you are proposing sounds suspiciously like project management. If a contributor makes a contribution, that contribution should be given a fair review in a timely fashion and be committed based on its merits. You overstate the time-sensitivity of contributions. I would imagine that there are only a few days preceding each release where stability is a major concern. At any othe
Re: Off list major development
It sounds like splitting design from code review is a common theme in a few of the posts here. How does everyone feel about making a point of encouraging design reviews to be done as issues, separate from the pull request, with the expectations that (1) the design review issue ("proposal") should generally appear somewhat _before_ the pull request; (2) pull requests should _not_ have design review happen on them, meaning there should no longer be PRs with design review tags, and we should move the design review approval process to the issue rather than the PR.

For (1), even if we encourage design review discussions to start before a pull request appears, I don't see an issue with them running concurrently for a while at some point.

On Thu, Jan 3, 2019 at 5:35 PM Jonathan Wei wrote:

> Thanks for raising these concerns!
>
> My initial thoughts:
>
> - I agree that separation of design review and code-level review for major changes would be more efficient
>
> - I agree that a clear, more formalized process for handling major changes would be helpful for contributors:
>   - Define what is considered a major change
>   - Define a standard proposal structure, KIP-style proposal format sounds good to me
>
> - I think it's too rigid to have a policy of "no code at all with the initial proposal"
>   - Code samples can be useful references for understanding aspects of a design
>   - In some cases it's necessary to run experiments to fully understand a problem and determine an appropriate design, or to determine whether something is even worth doing before committing to the work of fleshing out a proposal. Prototype code is a natural outcome of that, and I'm not against someone providing such code for reference
>   - I tend to view design/code as things that are often developed simultaneously in an intertwined way
>
> > Let's not be naive: it is very rare that a contributor will accept that his work is to be thrown away; usually devs take coding as a personal creation and they get attached to it.
>
> If we have a clear review process that emphasizes the need for early consensus building, with separate design and code review, then I feel we've done enough and don't need a hard rule against having some code linked with the initial proposal. If a potential contributor then still wants to go ahead and write a lot of code that may be rejected or change significantly, the risks were made clear.
>
> > Once code is written, it is hard to think abstractly.
>
> I can see the validity of the concern, but I personally don't see it as a huge risk. My impression from the Druid PR reviews I've seen is that our reviewers are able to keep abstract design vs. implementation details separate and consider alternate designs when reviewing.
>
> To summarize, I think it's probably enough to have a policy along the lines of:
>
> - Create more formalized guidelines for proposals and what changes require proposals
> - Separate design and code review for major changes, with design review first, code-level review after reaching consensus on the design.
> - Code before the design review is completed is just for reference, not regarded as a candidate for review/merging.
>
> - Jon
>
> On Thu, Jan 3, 2019 at 12:48 PM Slim Bouguerra wrote:
>
> > On Thu, Jan 3, 2019 at 12:16 PM Clint Wylie wrote:
> >
> > > I am definitely biased in this matter as an owner of another large PR that wasn't preceded by a direct proposal or dev list discussion, and in general I agree that proposal first is usually better, but I think in some rarer cases approaching a problem code first *is* the most appropriate way to have a discussion.
> >
> > I am wondering here what is the case where code first is better? In general, when you are writing code you have an idea about what you want to change and why you want to change it. I do not see what is wrong with sharing these primitive ideas and thoughts as an abstract proposal (at least to avoid overlapping).
> >
> > > I see nothing wrong with it so long as the author accepts that the PR is treated as a combined proposal and proof of concept, and fair game to be radically changed via discussion or even rejected, which sounds like Gian's attitude on the matter and is mine as well with my compression stuff.
> >
> > Let's not be naive: it is very rare that a contributor will accept that his work is to be thrown away; usually devs take coding as a personal creation and they get attached to it. To my point, you can take a look at an old issue in the Druid forum, https://github.com/apache/incubator-druid/pull/3755#issuecomment-265667690, and I am sure other communities have similar problems. So leaving the door open to some side cases is not a good idea in my opinion and will lead to similar issues in the future.
> >
> > > This seems to me especially likely to happ
Re: Off list major development
Small contributions don’t need any design review, whereas large contributions need significant review. I don’t think we should require an additional step for those (many) small contributions. But who decides whether a contribution fits into the small or large category?

I think the solution is for authors to log a case (or send an email to dev) before they start work on any contribution. Then committers can request a more heavy-weight process if they think it is needed.

Julian

> On Jan 7, 2019, at 11:24 AM, Gian Merlino wrote:
>
> It sounds like splitting design from code review is a common theme in a few of the posts here. How does everyone feel about making a point of encouraging design reviews to be done as issues, separate from the pull request, with the expectations that (1) the design review issue ("proposal") should generally appear somewhat _before_ the pull request; (2) pull requests should _not_ have design review happen on them, meaning there should no longer be PRs with design review tags, and we should move the design review approval process to the issue rather than the PR.
>
> For (1), even if we encourage design review discussions to start before a pull request appears, I don't see an issue with them running concurrently for a while at some point.
Re: Off list major development
I don't think there's a need to raise issues for every change: a small bug fix or doc fix should just go straight to PR. (GitHub PRs show up as issues in the issue-search UI/API, so it's not like this means the patch has no corresponding issue -- in a sense the PR _is_ the issue.)

I do think it makes sense to encourage potential contributors to write to the dev list or raise an issue if they aren't sure if something would need to go through a more heavy weight process.

Fwiw we do have a set of 'design review' criteria already (we had a discussion about this a couple years ago) at: http://druid.io/community/#getting-your-changes-accepted. So we wouldn't be starting from zero on defining that. We set it up back when we were trying to _streamline_ our process -- we used to require two non-author +1s for _every_ change, even minor ones. The introduction of design review criteria was meant to classify which PRs need that level of review and which ones are minor and can be merged with less review. I do think it helped with getting minor PRs merged more quickly.

The list of criteria is:

- Major architectural changes or API changes
- HTTP requests and responses (e.g. a new HTTP endpoint)
- Interfaces for extensions
- Server configuration (e.g. altering the behavior of a config property)
- Emitted metrics
- Other major changes, judged by the discretion of Druid committers

Some of it is subjective, but it has been in place for a while, so it's at least something we are relatively familiar with.

On Mon, Jan 7, 2019 at 11:32 AM Julian Hyde wrote:

> Small contributions don’t need any design review, whereas large contributions need significant review. I don’t think we should require an additional step for those (many) small contributions. But who decides whether a contribution fits into the small or large category?
>
> I think the solution is for authors to log a case (or send an email to dev) before they start work on any contribution.
> Then committers can request a more heavy-weight process if they think it is needed.
>
> Julian
Re: Off list major development
Statically, yes, GitHub PRs are the same as GitHub cases. But dynamically, they are different, because you can only log a PR when you have finished work.

A lot of other Apache projects use JIRA, so there is a clear distinction between cases and contributions. JIRA cases, especially when logged early in the lifecycle of a contribution, become long-running conversation threads with a lot of community participation. If Druid chose to do so, GitHub cases could be the same.

Be careful that you do not treat “potential contributors” (by which I presume you mean non-committers) differently from committers and PMC members. Anyone starting a major piece of work should follow the same process. (Experienced committers probably have a somewhat better idea what work will turn out to be “major”, so they get a little more leeway.)

Julian

> On Jan 7, 2019, at 12:10 PM, Gian Merlino wrote:
>
> I don't think there's a need to raise issues for every change: a small bug fix or doc fix should just go straight to PR. (GitHub PRs show up as issues in the issue-search UI/API, so it's not like this means the patch has no corresponding issue -- in a sense the PR _is_ the issue.)
>
> I do think it makes sense to encourage potential contributors to write to the dev list or raise an issue if they aren't sure if something would need to go through a more heavy weight process.
>
> Fwiw we do have a set of 'design review' criteria already (we had a discussion about this a couple years ago) at: http://druid.io/community/#getting-your-changes-accepted. So we wouldn't be starting from zero on defining that. We set it up back when we were trying to _streamline_ our process -- we used to require two non-author +1s for _every_ change, even minor ones. The introduction of design review criteria was meant to classify which PRs need that level of review and which ones are minor and can be merged with less review. I do think it helped with getting minor PRs merged more quickly.
> The list of criteria is:
>
> - Major architectural changes or API changes
> - HTTP requests and responses (e.g. a new HTTP endpoint)
> - Interfaces for extensions
> - Server configuration (e.g. altering the behavior of a config property)
> - Emitted metrics
> - Other major changes, judged by the discretion of Druid committers
>
> Some of it is subjective, but it has been in place for a while, so it's at least something we are relatively familiar with.
Re: Watermarks!
For Kafka, maybe something that tells you if all committed data is actually loaded, and what offset has been committed up to? Would there be any problems caused by the fact that only the most recent commit is saved in the DB?

Is this feature connected at all to an ask I have heard from a few people: that there be an option to fail a query (or at least include a special response header) if some segments in the interval are unavailable? (Which, currently, the broker can't know since it doesn't know details about all available segments.)

Btw, at your site do you have any plans to migrate to Kafka indexing?

On Wed, Jan 2, 2019 at 5:37 PM Charles Allen wrote:

> Hi all!
>
> https://github.com/apache/incubator-druid/pull/6799
>
> A contribution is up that includes a neat feature we have been using internally called Watermarks. Basically, when operating a large-scale and multi-tenant system, it is handy to be able to monitor how 'well behaved' the data is with regard to history. This is commonly used to spot holes in data, and to help give hints to data consumers in a lambda environment on when data has been run through a thorough check (batch job) vs a best effort sketch of the results which may or may not handle late data well (streaming intake).
>
> Unfortunately I'm not really sure what metadata would be handy to have for the Kafka indexing service, so I'd love input there as well if anyone knows of any "watermarks" that would make sense for it.
>
> Since the extension was written to be a stand alone service, it can remain as an extension forever if desired. An alternative I would like to propose is that the primitives for the watermark feature be added to core Druid, and the extension points be added to their respective places (mysql extension and google extension to name two explicitly).
>
> Let me know what you think!
> Charles Allen
Re: Watermarks!
I'll answer the last question first: Many data groups are processed via Airflow, so having a batch component compatible with Airflow is more impactful than being able to live stream data as it stands right now. I'm constantly on the lookout for a use case where Druid streaming is a good fit for a solution (as opposed to Graphite/Grafana, or even potentially Prometheus) but haven't found one yet where the overhead of maintaining the extra realtime and streaming system is worth the payout. From a technology investment point of view, a Beam compatible sink (we have an internal one, based on Tranquility, for streaming sinks) might end up working.

I am interested to see if the KIS features can be leveraged to work with systems outside of Kafka. Also of great interest is to see if the "resources per task" can be made more tunable instead of being a single cookie cutter footprint. The need for huge resources during the final merge-and-push phase compared to the incremental intake phase is also a major pain point and cause of inefficiency for Druid streaming stuff.

Watermarking *could* tell if segments are unavailable (i.e. a whole hour of data is missing) and fail the query accordingly if the watermark cursor was not advanced beyond the interval end. I have not attempted to put such an interrupt into the query layer though. It is a very intriguing idea.

In general the cursors work by monitoring the segment availability announcements and watching for certain criteria to be met before advancing. A very simple example here would be to halt a watermark's progression until at least *some* data for a time range is available in some segment somewhere. A more advanced cursor would have a concept of "completeness" and only advance the watermark once some time range has reached some "complete" criteria (a number of events, or a signal from an external system, could make sense).
Another nice thing here is automated checks, which can wait until the watermark has progressed before querying the Druid cluster for some data.

Hopefully that answers some questions,
Charles Allen

On Mon, Jan 7, 2019 at 12:50 PM Gian Merlino wrote:

> For Kafka, maybe something that tells you if all committed data is actually loaded, and what offset has been committed up to? Would there be any problems caused by the fact that only the most recent commit is saved in the DB?
>
> Is this feature connected at all to an ask I have heard from a few people: that there be an option to fail a query (or at least include a special response header) if some segments in the interval are unavailable? (Which, currently, the broker can't know since it doesn't know details about all available segments.)
>
> Btw, at your site do you have any plans to migrate to Kafka indexing?
>
> On Wed, Jan 2, 2019 at 5:37 PM Charles Allen wrote:
>
> > Hi all!
> >
> > https://github.com/apache/incubator-druid/pull/6799
> >
> > A contribution is up that includes a neat feature we have been using internally called Watermarks. Basically, when operating a large-scale and multi-tenant system, it is handy to be able to monitor how 'well behaved' the data is with regard to history. This is commonly used to spot holes in data, and to help give hints to data consumers in a lambda environment on when data has been run through a thorough check (batch job) vs a best effort sketch of the results which may or may not handle late data well (streaming intake).
> >
> > Unfortunately I'm not really sure what metadata would be handy to have for the Kafka indexing service, so I'd love input there as well if anyone knows of any "watermarks" that would make sense for it.
> >
> > Since the extension was written to be a stand alone service, it can remain as an extension forever if desired.
> > An alternative I would like to propose is that the primitives for the watermark feature be added to core Druid, and the extension points be added to their respective places (mysql extension and google extension to name two explicitly).
> >
> > Let me know what you think!
> > Charles Allen
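
The cursor behaviour described in this thread (advance only while data is available, optionally require "completeness", and fail a query that reaches past the watermark) can be illustrated with a small sketch. This is not the code from the pull request: the hourly granularity, the function names `advance_watermark` and `require_covered`, and the event-count threshold are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

HOUR = timedelta(hours=1)  # assumed granularity, chosen only for this example


def advance_watermark(watermark, events_per_hour, min_events=1):
    """Move the watermark forward over every contiguous hour that meets the criterion.

    watermark: start of the first hour not yet covered (hypothetical convention).
    events_per_hour: maps the start of an hour to the number of events loaded for it.
    min_events: 1 models "at least some data"; a larger value models "completeness".
    """
    if min_events < 1:
        raise ValueError("min_events must be at least 1")
    while events_per_hour.get(watermark, 0) >= min_events:
        watermark += HOUR
    return watermark


def require_covered(watermark, interval_end):
    """Reject a query whose interval extends past the watermark."""
    if watermark < interval_end:
        raise RuntimeError(
            f"data is only covered up to {watermark}, query ends at {interval_end}"
        )


start = datetime(2019, 1, 1, 0)
loaded = {
    datetime(2019, 1, 1, 0): 120,
    datetime(2019, 1, 1, 1): 5,
    # 02:00 is missing entirely: a hole in the data
    datetime(2019, 1, 1, 3): 90,
}

# Simple cursor: stops at the first hour with no data at all.
simple = advance_watermark(start, loaded)
# "Completeness" cursor: stops at the first hour with fewer than 100 events.
strict = advance_watermark(start, loaded, min_events=100)
```

An automated check could call `require_covered(simple, interval_end)` before issuing a query, which is one way to read the "wait until the watermark has progressed" idea; how the real extension tracks segment announcements is not shown here.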
Guava compat
Hi all! Just FYI: https://github.com/apache/incubator-druid/pull/6815 is up, which is hopefully the last change needed to get both old and recent versions of Guava working with Druid.
Pointers on implementing a new ShardSpec
Hey all,

Are there any major caveats or gotchas I should be aware of when implementing a new ShardSpec?

The context here is that we have a datasource that is the combined result of multiple input jobs. We're trying to do write-side joining by having all of the jobs write segments for the same intervals (e.g. partitioning on both partition number and source pipeline). For now, I've modified the Spark-Druid batch ingestor (https://github.com/metamx/druid-spark-batch) to run in our various pipelines and to write out segments with identifiers of the form `dataSource_startInterval_endInterval_version_sourceName_partitionNum`. This is working without issue for loading, querying, and deleting data, but the metadata API reports the incorrect segment identifier, since it reconstructs the identifier instead of reading it from metadata (e.g. it reports segment identifiers of the form `dataSource_startInterval_endInterval_version_partitionNum`).

Both because we'd like this to be fully supported, and because we imagine that this feature may be useful to others, I'd like to implement this via a ShardSpec.

Julian
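
To make the identifier mismatch concrete, here is a minimal sketch that builds the two identifier forms quoted in the message. It is not Druid code: the function names and the sample values are hypothetical, and the only thing taken from the message is the order of the underscore-separated fields.

```python
def written_segment_id(data_source, start, end, version, source_name, partition_num):
    """Identifier form the modified ingestor writes, as quoted in the message."""
    return "_".join([data_source, start, end, version, source_name, str(partition_num)])


def reconstructed_segment_id(data_source, start, end, version, partition_num):
    """Identifier form the metadata API reports, as quoted in the message."""
    return "_".join([data_source, start, end, version, str(partition_num)])


# Hypothetical datasource, interval, and version shared by two source pipelines.
common = ("events", "2019-01-01T00:00:00.000Z", "2019-01-02T00:00:00.000Z", "v1")
sources = ("pipelineA", "pipelineB")

written = [written_segment_id(*common, source, 0) for source in sources]
reported = [reconstructed_segment_id(*common, 0) for _ in sources]

distinct_written = len(set(written))    # one identifier per pipeline
distinct_reported = len(set(reported))  # the source name is lost, so they collapse
```

Under these assumptions, two segments that are distinct when written become indistinguishable once the identifier is rebuilt without the source name, which is the symptom described above.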
Re: Druid 0.14 timing
On 2019/01/04 21:06:40, Gian Merlino wrote:

> It feels like 0.13.0 was just recently released, but it was branched off back in October, and it has almost been 3 months since then. How do we feel about doing an 0.14 branch cut at the end of January (Thu Jan 31) - going back to the every 3 months cycle?
>
> For this release, based on the feedback we got from the Incubator vote last time, we'll need to fix up the LICENSE and NOTICE issues that were flagged but waved through for our first release. (Justin said he would have -1'd based on that if it was anything beyond a first release.)

+1