Thank you, Josh and Xiao. That sounds great. Do you think we could get some parts of that improvement into the `2.4.4` documentation first, since that is the very next release?
Bests,
Dongjoon.

On Sun, Jul 14, 2019 at 4:25 PM Xiao Li <lix...@databricks.com> wrote:

> Yeah, Josh! All these ideas sound good to me. All the top commercial
> database products have very detailed guides/documents about version
> upgrading; you can easily find them.
>
> Currently, only the SQL and ML modules have migration or upgrade guides.
> Since the Spark 2.3 release, we have strictly required PR authors to
> document all behavior changes in the SQL component. I would suggest doing
> the same in the other modules, for example Spark Core and Structured
> Streaming. Any objection?
>
> Cheers,
>
> Xiao
>
> On Sun, Jul 14, 2019 at 2:05 PM Josh Rosen <rosenvi...@gmail.com> wrote:
>
>> I'd like to discuss the Spark SQL migration / upgrade guides in the
>> Spark documentation: these are valuable resources, and I think we could
>> increase that value by making these docs easier to discover and by
>> adding a bit more structure to the existing content.
>>
>> For folks who aren't familiar with these docs: the Spark docs have a
>> "SQL Migration Guide" which lists the deprecations and changes of
>> behavior in each release:
>>
>> - Latest published version:
>>   https://spark.apache.org/docs/latest/sql-migration-guide-upgrade.html
>> - Master branch version (will become 3.0):
>>   https://github.com/apache/spark/blob/master/docs/sql-migration-guide-upgrade.md
>>
>> A lot of community work went into crafting this doc, and I really
>> appreciate those efforts.
>>
>> This doc is a little hard to find, though, because it's not consistently
>> linked from the release notes pages: the 2.4.0 page links it under
>> "Changes of Behavior"
>> (https://spark.apache.org/releases/spark-release-2-4-0.html#changes-of-behavior),
>> but subsequent maintenance releases do not link to it
>> (https://spark.apache.org/releases/spark-release-2-4-1.html). It's also
>> not well cross-linked from the rest of the Spark docs (e.g. the Overview
>> doc, doc drop-down menus, etc.).
>>
>> I'm also concerned that the doc may be overwhelming to end users (as
>> opposed to Spark developers):
>>
>> - *Entries aren't grouped by component*, so users need to read the
>>   entire document to spot changes relevant to their use of Spark (for
>>   example, PySpark changes are not grouped together).
>> - *Entries aren't ordered by size / risk of change*, e.g. performance
>>   impact vs. loud behavior change (stopping with an explicit exception)
>>   vs. silent behavior change (e.g. changing default rounding behavior).
>>   If we assume limited reader attention, then it may be important to
>>   prioritize the order in which we list entries, putting the
>>   highest-expected-impact / lowest-organic-discoverability changes
>>   first.
>> - *We don't link JIRAs*, forcing users to do their own archaeology to
>>   learn more about a specific change.
>>
>> The existing ML migration guide addresses some of these issues, so
>> maybe we can emulate it in the SQL guide:
>> https://spark.apache.org/docs/latest/ml-guide.html#migration-guide
>>
>> I think that documentation clarity is especially important with Spark
>> 3.0 around the corner: many folks will seek out this information when
>> they upgrade, so improving this guide can be a high-leverage,
>> high-impact activity.
>>
>> What do folks think? Does anyone have examples from other projects
>> which do a notably good job of crafting release notes / migration
>> guides? I'd be glad to help with pre-release editing after we decide on
>> a structure and style.
>>
>> Cheers,
>> Josh