Re: [discuss] Spark 2.x release cadence
Sorry. I think I just replied to the wrong thread. :(

WQ

On Thu, Sep 29, 2016 at 10:58 AM, Weiqing Yang wrote:

> +1 (non binding)
>
> RC4 is compiled and tested on the system: CentOS Linux release 7.0.1406 / openjdk 1.8.0_102 / R 3.3.1
>
> All tests passed.
>
> ./build/mvn -Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive -Phive-thriftserver -Dpyspark -Dsparkr -DskipTests clean package
>
> ./build/mvn -Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive -Phive-thriftserver -Dpyspark -Dsparkr test
>
> Best,
> Weiqing
Re: [discuss] Spark 2.x release cadence
+1 (non binding)

RC4 is compiled and tested on the system: CentOS Linux release 7.0.1406 / openjdk 1.8.0_102 / R 3.3.1

All tests passed.

./build/mvn -Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive -Phive-thriftserver -Dpyspark -Dsparkr -DskipTests clean package

./build/mvn -Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive -Phive-thriftserver -Dpyspark -Dsparkr test

Best,
Weiqing

On Thu, Sep 29, 2016 at 8:02 AM, Cody Koeninger wrote:

> Regarding documentation debt, is there a reason not to deploy documentation updates more frequently than releases? I recall this used to be the case.
Re: [discuss] Spark 2.x release cadence
Regarding documentation debt, is there a reason not to deploy documentation updates more frequently than releases? I recall this used to be the case.

On Wed, Sep 28, 2016 at 3:35 PM, Joseph Bradley wrote:

> +1 for 4 months. With QA taking about a month, that's very reasonable.
>
> My main ask (especially for MLlib) is for contributors and committers to take extra care not to delay on updating the Programming Guide for new APIs. Documentation debt often collects and has to be paid off during QA, and a longer cycle will exacerbate this problem.

To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
Re: [discuss] Spark 2.x release cadence
+1 for 4 months. With QA taking about a month, that's very reasonable.

My main ask (especially for MLlib) is for contributors and committers to take extra care not to delay on updating the Programming Guide for new APIs. Documentation debt often collects and has to be paid off during QA, and a longer cycle will exacerbate this problem.

On Wed, Sep 28, 2016 at 7:30 AM, Tom Graves wrote:

> +1 to 4 months.
>
> Tom
Re: [discuss] Spark 2.x release cadence
+1 to 4 months.

Tom

On Tuesday, September 27, 2016 2:07 PM, Reynold Xin wrote:

> We are 2 months past releasing Spark 2.0.0, an important milestone for the project. Spark 2.0.0 deviated (took 6 months) from the regular release cadence we had for the 1.x line, and we never explicitly discussed what the release cadence should look like for 2.x. Thus this email.
>
> During Spark 1.x, roughly every three months we made a new 1.x feature release (e.g. 1.5.0 came out three months after 1.4.0). Development happened primarily in the first two months, then a release branch was cut at the end of month 2, and the last month was reserved for QA and release preparation.
>
> During 2.0.0 development, I really enjoyed the longer release cycle because there were a lot of major changes happening and the longer time was critical for thinking through architectural changes as well as API design. While I don't expect the same degree of drastic changes in a 2.x feature release, I do think it'd make sense to increase the length of the release cycle so we can make better designs.
>
> My strawman proposal is to maintain a regular release cadence, as we did in Spark 1.x, and increase the cycle from 3 months to 4 months. This effectively gives us ~50% more time to develop (in reality it'd be slightly less than 50% since longer dev time also means longer QA time). As for maintenance releases, I think those should still be cut on-demand, similar to Spark 1.x, but more aggressively.
>
> To put this into perspective, a 4-month cycle means we will release Spark 2.1.0 at the end of Nov or early Dec (and branch cut / code freeze at the end of Oct).
>
> I am curious what others think.
Re: [discuss] Spark 2.x release cadence
+1 on a longer release cycle on schedule and more maintenance releases.

_____________________________
From: Mark Hamstra <m...@clearstorydata.com>
Sent: Tuesday, September 27, 2016 2:01 PM
Subject: Re: [discuss] Spark 2.x release cadence
To: Reynold Xin <r...@databricks.com>
Cc: <dev@spark.apache.org>

+1

And I'll dare say that for those with Spark in production, it is more important that maintenance releases come out in a timely fashion than that new features are released one month sooner or later.
Re: [discuss] Spark 2.x release cadence
+1

And I'll dare say that for those with Spark in production, it is more important that maintenance releases come out in a timely fashion than that new features are released one month sooner or later.

On Tue, Sep 27, 2016 at 12:06 PM, Reynold Xin wrote:

> We are 2 months past releasing Spark 2.0.0, an important milestone for the project. Spark 2.0.0 deviated (took 6 months) from the regular release cadence we had for the 1.x line, and we never explicitly discussed what the release cadence should look like for 2.x. Thus this email.
>
> During Spark 1.x, roughly every three months we made a new 1.x feature release (e.g. 1.5.0 came out three months after 1.4.0). Development happened primarily in the first two months, then a release branch was cut at the end of month 2, and the last month was reserved for QA and release preparation.
>
> During 2.0.0 development, I really enjoyed the longer release cycle because there were a lot of major changes happening and the longer time was critical for thinking through architectural changes as well as API design. While I don't expect the same degree of drastic changes in a 2.x feature release, I do think it'd make sense to increase the length of the release cycle so we can make better designs.
>
> My strawman proposal is to maintain a regular release cadence, as we did in Spark 1.x, and increase the cycle from 3 months to 4 months. This effectively gives us ~50% more time to develop (in reality it'd be slightly less than 50% since longer dev time also means longer QA time). As for maintenance releases, I think those should still be cut on-demand, similar to Spark 1.x, but more aggressively.
>
> To put this into perspective, a 4-month cycle means we will release Spark 2.1.0 at the end of Nov or early Dec (and branch cut / code freeze at the end of Oct).
>
> I am curious what others think.
Re: [discuss] Spark 2.x release cadence
+1 -- I think the minor releases were taking more like 4 months than 3 months anyway, and it was good for the reasons you give. This reflects reality and is a good thing. All the better if we can then follow the timeline more comfortably.

On Tue, Sep 27, 2016 at 3:06 PM, Reynold Xin wrote:

> We are 2 months past releasing Spark 2.0.0, an important milestone for the project. Spark 2.0.0 deviated (took 6 months) from the regular release cadence we had for the 1.x line, and we never explicitly discussed what the release cadence should look like for 2.x. Thus this email.
>
> During Spark 1.x, roughly every three months we made a new 1.x feature release (e.g. 1.5.0 came out three months after 1.4.0). Development happened primarily in the first two months, then a release branch was cut at the end of month 2, and the last month was reserved for QA and release preparation.
>
> During 2.0.0 development, I really enjoyed the longer release cycle because there were a lot of major changes happening and the longer time was critical for thinking through architectural changes as well as API design. While I don't expect the same degree of drastic changes in a 2.x feature release, I do think it'd make sense to increase the length of the release cycle so we can make better designs.
>
> My strawman proposal is to maintain a regular release cadence, as we did in Spark 1.x, and increase the cycle from 3 months to 4 months. This effectively gives us ~50% more time to develop (in reality it'd be slightly less than 50% since longer dev time also means longer QA time). As for maintenance releases, I think those should still be cut on-demand, similar to Spark 1.x, but more aggressively.
>
> To put this into perspective, a 4-month cycle means we will release Spark 2.1.0 at the end of Nov or early Dec (and branch cut / code freeze at the end of Oct).
>
> I am curious what others think.
Re: [discuss] Spark 2.x release cadence
+1

I think having a 4-month window instead of a 3-month window sounds good. However, I think figuring out a timeline for maintenance releases would also be good. This is a common concern that comes up in many user threads, and it'll be better to have some structure around this. It doesn't need to be strict, but something like the first maintenance release for the latest 2.x.0 release within 2 months, and then a second maintenance release within 6 months or something like that.

Thanks
Shivaram

On Tue, Sep 27, 2016 at 12:06 PM, Reynold Xin wrote:

> We are 2 months past releasing Spark 2.0.0, an important milestone for the project. Spark 2.0.0 deviated (took 6 months) from the regular release cadence we had for the 1.x line, and we never explicitly discussed what the release cadence should look like for 2.x. Thus this email.
>
> During Spark 1.x, roughly every three months we made a new 1.x feature release (e.g. 1.5.0 came out three months after 1.4.0). Development happened primarily in the first two months, then a release branch was cut at the end of month 2, and the last month was reserved for QA and release preparation.
>
> During 2.0.0 development, I really enjoyed the longer release cycle because there were a lot of major changes happening and the longer time was critical for thinking through architectural changes as well as API design. While I don't expect the same degree of drastic changes in a 2.x feature release, I do think it'd make sense to increase the length of the release cycle so we can make better designs.
>
> My strawman proposal is to maintain a regular release cadence, as we did in Spark 1.x, and increase the cycle from 3 months to 4 months. This effectively gives us ~50% more time to develop (in reality it'd be slightly less than 50% since longer dev time also means longer QA time). As for maintenance releases, I think those should still be cut on-demand, similar to Spark 1.x, but more aggressively.
>
> To put this into perspective, a 4-month cycle means we will release Spark 2.1.0 at the end of Nov or early Dec (and branch cut / code freeze at the end of Oct).
>
> I am curious what others think.
[discuss] Spark 2.x release cadence
We are 2 months past releasing Spark 2.0.0, an important milestone for the project. Spark 2.0.0 deviated (took 6 months) from the regular release cadence we had for the 1.x line, and we never explicitly discussed what the release cadence should look like for 2.x. Thus this email.

During Spark 1.x, roughly every three months we made a new 1.x feature release (e.g. 1.5.0 came out three months after 1.4.0). Development happened primarily in the first two months, then a release branch was cut at the end of month 2, and the last month was reserved for QA and release preparation.

During 2.0.0 development, I really enjoyed the longer release cycle because there were a lot of major changes happening and the longer time was critical for thinking through architectural changes as well as API design. While I don't expect the same degree of drastic changes in a 2.x feature release, I do think it'd make sense to increase the length of the release cycle so we can make better designs.

My strawman proposal is to maintain a regular release cadence, as we did in Spark 1.x, and increase the cycle from 3 months to 4 months. This effectively gives us ~50% more time to develop (in reality it'd be slightly less than 50% since longer dev time also means longer QA time). As for maintenance releases, I think those should still be cut on-demand, similar to Spark 1.x, but more aggressively.

To put this into perspective, a 4-month cycle means we will release Spark 2.1.0 at the end of Nov or early Dec (and branch cut / code freeze at the end of Oct).

I am curious what others think.
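
The schedule arithmetic in the proposal can be checked with a rough sketch like the one below. It rests on two assumptions that are not stated in the email: that Spark 2.0.0 was released on 2016-07-26, and that the final month of every cycle stays reserved for QA. The helper names (add_months, plan_cycle, dev_time_gain) are hypothetical and only serve the illustration.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to the target month's length."""
    index = d.year * 12 + (d.month - 1) + months
    year, month = index // 12, index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def plan_cycle(previous_release: date, cycle_months: int, qa_months: int = 1):
    """Return (branch_cut, release) for the next feature release.

    The branch cut is placed qa_months before the release, mirroring the
    "last month reserved for QA" pattern described for Spark 1.x.
    """
    release = add_months(previous_release, cycle_months)
    branch_cut = add_months(release, -qa_months)
    return branch_cut, release


def dev_time_gain(old_cycle: int, new_cycle: int, qa_months: int = 1) -> float:
    """Relative increase in development time if the QA period stays fixed."""
    old_dev = old_cycle - qa_months
    new_dev = new_cycle - qa_months
    return (new_dev - old_dev) / old_dev


# Assumption: 2.0.0 release date; the email only says "2 months past".
SPARK_2_0_0 = date(2016, 7, 26)

branch_cut, release = plan_cycle(SPARK_2_0_0, cycle_months=4)
print("branch cut:", branch_cut)   # 2016-10-26
print("release:   ", release)      # 2016-11-26
print("dev time gain:", dev_time_gain(3, 4))  # 0.5
```

With QA fixed at one month, development time grows from two months to three, which is the ~50% figure; if QA also lengthens, the gain is smaller, as the email notes. The computed dates land at the end of October and the end of November, in line with the proposal.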