I agree that looking at it from the pov of "code paths where isBarrier
tests were introduced" seems right.

From pr-21758 <https://github.com/apache/spark/pull/21758/files> (the one
already merged) there are 13 files touched under
core/src/main/scala/org/apache/spark/scheduler/, although most of those
appear to be relatively small edits. The "big" modifications are
concentrated on Task.scala and TaskSchedulerImpl.scala. The followup
pr-21898 <https://github.com/apache/spark/pull/21898/files> touches a
subset of those.

The project-hydrogen epic for "barrier execution" SPARK-24374
<https://issues.apache.org/jira/browse/SPARK-24374> contains 22 sub-issues,
most of which are still open. Some are marked for future release cycles; is
there a specific set being proposed for 2.4? Support for the various
back-ends looks tagged for subsequent release cycles: is the 2.4 scope
limited to standalone clusters?

CI will obviously exercise standard task scheduling code paths, which
indicates some level of stability.  Folks on the k8s big data SIG today
were interested in building test distributions for the barrier-related
features. I was reflecting that although the spark-on-kube fork was awkward
in some ways, it did provide a unified distribution that interested
community members could build, download and/or run. Project hydrogen is
currently incarnated as a set of PRs, but a unified test build that
included pr-21758 <https://github.com/apache/spark/pull/21758/files> and
pr-21898 <https://github.com/apache/spark/pull/21898/files> (and others?)
would be cool. I've never seen an ideal workflow for handling multi-PR
development efforts.


On Wed, Aug 1, 2018 at 1:43 PM, Imran Rashid <im...@therashids.com> wrote:

> I still would like to do more review on barrier mode changes, but from
> what I've seen so far I agree. I dunno if it'll really be ready for use,
> but it should not pose much risk for code which doesn't touch the new
> features. Of course, every change has some risk, especially in the
> scheduler which has proven to be very brittle (I've written plenty of
> scheduler bugs while fixing other things myself).
>
> On Wed, Aug 1, 2018 at 1:13 PM, Xingbo Jiang <jiangxb1...@gmail.com>
> wrote:
>
>> Speaking of the code from hydrogen PRs, actually we didn't remove any of
>> the existing logic, and I tried my best to hide almost all of the newly
>> added logic behind an `isBarrier` tag (or something similar). I had to add
>> some new variables and new methods to the core code paths, but I think they
>> should not be hit if you are not running barrier workloads.
>>
>> The only significant change I can think of is I swapped the sequence of
>> failure handling in DAGScheduler, moving the `case FetchFailed` block to
>> before the `case Resubmitted` block, but again I don't think this should
>> affect a regular workload, because a given failure can only be of one type
>> anyway.
>>
>> Actually I also reviewed the previous PRs adding Spark on K8s support,
>> and I feel it's a good example of how to add new features to a project
>> without breaking existing workloads. I'm trying to follow that approach in
>> adding barrier execution mode support.
>>
>> I really appreciate any attention to the hydrogen PRs and welcome comments
>> to help improve the feature, thanks!
>>
>> 2018-08-01 4:19 GMT+08:00 Reynold Xin <r...@databricks.com>:
>>
>>> I actually totally agree that we should make sure it has no impact on
>>> existing code if the feature is not used.
>>>
>>>
>>> On Tue, Jul 31, 2018 at 1:18 PM Erik Erlandson <eerla...@redhat.com>
>>> wrote:
>>>
>>>> I don't have a comprehensive knowledge of the project hydrogen PRs;
>>>> however, I've perused them, and they make substantial modifications to
>>>> Spark's core DAG scheduler code.
>>>>
>>>> What I'm wondering is: how high is the confidence level that the
>>>> "traditional" code paths are still stable? Put another way, is it even
>>>> possible to "turn off" or "opt out" of this experimental feature? This
>>>> analogy isn't perfect, but for example the k8s back-end is a major body of
>>>> code, but it has a very small impact on any *core* code paths, and so if
>>>> you opt out of it, it is well understood that you aren't running any
>>>> experimental code.
>>>>
>>>> Looking at the project hydrogen code, I'm less sure the same is true.
>>>> However, maybe there is a clear way to show how it is true.
>>>>
>>>>
>>>> On Tue, Jul 31, 2018 at 12:03 PM, Mark Hamstra
>>>> <m...@clearstorydata.com> wrote:
>>>>
>>>>> No reasonable amount of time is likely going to be sufficient to fully
>>>>> vet the code as a PR. I'm not entirely happy with the design and code as
>>>>> they currently are (and I'm still trying to find the time to more publicly
>>>>> express my thoughts and concerns), but I'm fine with them going into 2.4
>>>>> much as they are as long as they go in with proper stability annotations
>>>>> and are understood not to be cast-in-stone final implementations, but
>>>>> rather as a way to get people using them and generating the feedback that
>>>>> is necessary to get us to something more like a final design and
>>>>> implementation.
>>>>>
>>>>> On Tue, Jul 31, 2018 at 11:54 AM Erik Erlandson <eerla...@redhat.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> Barrier mode seems like a high impact feature on Spark's core code:
>>>>>> is one additional week enough time to properly vet this feature?
>>>>>>
>>>>>> On Tue, Jul 31, 2018 at 7:10 AM, Joseph Torres
>>>>>> <joseph.tor...@databricks.com> wrote:
>>>>>>
>>>>>>> Full continuous processing aggregation support ran into
>>>>>>> unanticipated scalability and scheduling problems. We’re planning to
>>>>>>> overcome those by using some of the barrier execution machinery, but
>>>>>>> since barrier execution itself is still in progress the full support
>>>>>>> isn’t going to make it into 2.4.
>>>>>>>
>>>>>>> Jose
>>>>>>>
>>>>>>> On Tue, Jul 31, 2018 at 6:07 AM Tomasz Gawęda
>>>>>>> <tomasz.gaw...@outlook.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> What is the status of Continuous Processing + Aggregations? As far
>>>>>>>> as I remember, Jose Torres said it should be easy to perform
>>>>>>>> aggregations if coalesce(1) works. IIRC it's already merged to master.
>>>>>>>>
>>>>>>>> Is this work in progress? If yes, it would be great to have full
>>>>>>>> aggregation/join support in Spark 2.4 in CP.
>>>>>>>>
>>>>>>>> Pozdrawiam / Best regards,
>>>>>>>>
>>>>>>>> Tomek
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2018-07-31 10:43, Petar Zečević wrote:
>>>>>>>> > This one is important to us:
>>>>>>>> > https://issues.apache.org/jira/browse/SPARK-24020 (Sort-merge join
>>>>>>>> > inner range optimization) but I think it could be useful to others
>>>>>>>> > too.
>>>>>>>> >
>>>>>>>> > It is finished and is ready to be merged (was ready a month ago at
>>>>>>>> > least).
>>>>>>>> >
>>>>>>>> > Do you think you could consider including it in 2.4?
>>>>>>>> >
>>>>>>>> > Petar
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > Wenchen Fan @ 1970-01-01 01:00 CET:
>>>>>>>> >
>>>>>>>> >> I went through the open JIRA tickets and here is a list that we
>>>>>>>> >> should consider for Spark 2.4:
>>>>>>>> >>
>>>>>>>> >> High Priority:
>>>>>>>> >> SPARK-24374: Support Barrier Execution Mode in Apache Spark
>>>>>>>> >> This one is critical to the Spark ecosystem for deep learning. It
>>>>>>>> >> only has a little work remaining and I think we should have it in
>>>>>>>> >> Spark 2.4.
>>>>>>>> >>
>>>>>>>> >> Medium Priority:
>>>>>>>> >> SPARK-23899: Built-in SQL Function Improvement
>>>>>>>> >> We've already added a lot of built-in functions in this release,
>>>>>>>> >> but there are a few useful higher-order functions in progress, like
>>>>>>>> >> `array_except`, `transform`, etc. It would be great if we can get
>>>>>>>> >> them in Spark 2.4.
>>>>>>>> >>
>>>>>>>> >> SPARK-14220: Build and test Spark against Scala 2.12
>>>>>>>> >> Very close to finishing, great to have it in Spark 2.4.
>>>>>>>> >>
>>>>>>>> >> SPARK-4502: Spark SQL reads unnecessary nested fields from Parquet
>>>>>>>> >> This one has been open for years (thanks for your patience
>>>>>>>> >> Michael!), and is also close to finishing. Great to have it in 2.4.
>>>>>>>> >>
>>>>>>>> >> SPARK-24882: data source v2 API improvement
>>>>>>>> >> This is to improve the data source v2 API based on what we learned
>>>>>>>> >> during this release. From the migration of existing sources and
>>>>>>>> >> design of new features, we found some problems in the API and want
>>>>>>>> >> to address them. I believe this should be the last significant API
>>>>>>>> >> change to data source v2, so great to have in Spark 2.4. I'll send
>>>>>>>> >> a discuss email about it later.
>>>>>>>> >>
>>>>>>>> >> SPARK-24252: Add catalog support in Data Source V2
>>>>>>>> >> This is a very important feature for data source v2, and is
>>>>>>>> >> currently being discussed on the dev list.
>>>>>>>> >>
>>>>>>>> >> SPARK-24768: Have a built-in AVRO data source implementation
>>>>>>>> >> Most of it is done, but date/timestamp support is still missing.
>>>>>>>> >> Great to have in 2.4.
>>>>>>>> >>
>>>>>>>> >> SPARK-23243: Shuffle+Repartition on an RDD could lead to incorrect
>>>>>>>> >> answers
>>>>>>>> >> This is a long-standing correctness bug, great to have in 2.4.
>>>>>>>> >>
>>>>>>>> >> Some other important features, like adaptive execution and
>>>>>>>> >> streaming SQL, are not in the list, since I don't think we will be
>>>>>>>> >> able to finish them before 2.4.
>>>>>>>> >>
>>>>>>>> >> Feel free to add more things if you think they are important to
>>>>>>>> >> Spark 2.4 by replying to this email.
>>>>>>>> >>
>>>>>>>> >> Thanks,
>>>>>>>> >> Wenchen
>>>>>>>> >>
>>>>>>>> >> On Mon, Jul 30, 2018 at 11:00 PM Sean Owen <sro...@apache.org>
>>>>>>>> >> wrote:
>>>>>>>> >>
>>>>>>>> >>   In theory releases happen on a time-based cadence, so it's pretty
>>>>>>>> >>   much wrap up what's ready by the code freeze and ship it. In
>>>>>>>> >>   practice, the cadence slips frequently, and it's very much a
>>>>>>>> >>   negotiation about what features should push the code freeze out a
>>>>>>>> >>   few weeks every time. So, kind of a hybrid approach here that
>>>>>>>> >>   works OK.
>>>>>>>> >>
>>>>>>>> >>   Certainly speak up if you think there's something that really
>>>>>>>> >>   needs to get into 2.4. This is that discuss thread.
>>>>>>>> >>
>>>>>>>> >>   (BTW I updated the page you mention just yesterday, to reflect
>>>>>>>> >>   the plan suggested in this thread.)
>>>>>>>> >>
>>>>>>>> >>   On Mon, Jul 30, 2018 at 9:51 AM Tom Graves
>>>>>>>> >>   <tgraves...@yahoo.com.invalid> wrote:
>>>>>>>> >>
>>>>>>>> >>   Shouldn't this be a discuss thread?
>>>>>>>> >>
>>>>>>>> >>   I'm also happy to see more release managers and agree the time is
>>>>>>>> >>   getting close, but we should see what features are in progress
>>>>>>>> >>   and how close they are, and propose a date based on that. Cutting
>>>>>>>> >>   a branch too soon just creates more work for committers to push
>>>>>>>> >>   to more branches.
>>>>>>>> >>
>>>>>>>> >>   http://spark.apache.org/versioning-policy.html mentioned the code
>>>>>>>> >>   freeze and release branch cut in mid-August.
>>>>>>>> >>
>>>>>>>> >>   Tom
>>>>>>>> >
>>>>>>>> >
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>
>
