Re: [DISCUSS] Deprecate DStream in 3.4
Heads-up: this is addressed via https://issues.apache.org/jira/browse/SPARK-42075. We only marked the entry point of DStream, StreamingContext, as deprecated. Marking every class in the DStream module is not pragmatic, and users will see the warning message anyway.

On Mon, Jan 16, 2023 at 8:26 AM Jungtaek Lim wrote:
> Given that I got positive votes from more than 3 PMC members as well as several active contributors, I will proceed with the actual work.
> (It may take a couple more days, as folks in the US will help me and there's a holiday in the US.)
>
> Please let me know if we want to have an official vote thread before moving forward.
>
> Thanks all for providing your voices on this!
>
> On Sat, Jan 14, 2023 at 3:56 AM Anish Shrigondekar <anish.shrigonde...@databricks.com> wrote:
>> +1 on the DStreams deprecation proposal
>>
>> On Fri, Jan 13, 2023 at 10:47 AM Jerry Peng wrote:
>>> +1 in general for marking the DStreams API as deprecated
>>>
>>> Jungtaek, can you please elaborate on the concrete actions you intend to take for the deprecation process?
>>>
>>> Best,
>>> Jerry
>>>
>>> On Thu, Jan 12, 2023 at 11:16 PM L. C. Hsieh wrote:
>>>> +1
>>>>
>>>> On Thu, Jan 12, 2023 at 10:39 PM Jungtaek Lim wrote:
>>>>> Yes, exactly. I'm sorry for the confusion - I should have clarified the action items in the proposal.
>>>>>
>>>>> On Fri, Jan 13, 2023 at 3:31 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>>>>> Then, could you elaborate on `the proposed code change` specifically?
>>>>>> Maybe the usual deprecation warning logs and annotation on the API?
>>>>>>
>>>>>> On Thu, Jan 12, 2023 at 10:05 PM Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
>>>>>>> Maybe I need to clarify - my proposal is "explicitly" deprecating it, which incurs a code change for sure. Guidance on the Spark website is already done, as I mentioned - we updated the DStream doc page to mention that DStream is a "legacy" project and users should move to SS (Structured Streaming). I don't feel this is sufficient to discourage users from using it, hence this proposal.
>>>>>>>
>>>>>>> Sorry for the confusion. I just wanted to make sure the goal of the proposal is not "removing" the API. Discussions on removing an API don't tend to go well, so I wanted to make clear that I don't mean that.
>>>>>>>
>>>>>>> On Fri, Jan 13, 2023 at 2:46 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>>>>>>> +1 for the proposal (guiding only without any code change).
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Dongjoon.
>>>>>>>>
>>>>>>>> On Thu, Jan 12, 2023 at 9:33 PM Shixiong Zhu wrote:
>>>>>>>>> +1
>>>>>>>>>
>>>>>>>>> On Thu, Jan 12, 2023 at 5:08 PM Tathagata Das <tathagata.das1...@gmail.com> wrote:
>>>>>>>>>> +1
>>>>>>>>>>
>>>>>>>>>> On Thu, Jan 12, 2023 at 7:46 PM Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>>>>>>>>>> +1
>>>>>>>>>>>
>>>>>>>>>>> On Fri, 13 Jan 2023 at 08:51, Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
>>>>>>>>>>>> bump for more visibility.
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Jan 11, 2023 at 12:20 PM Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
>>>>>>>>>>>>> Hi dev,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'd like to propose the deprecation of DStream in Spark 3.4, in favor of promoting Structured Streaming.
>>>>>>>>>>>>> (Sorry for the late proposal; if we don't make the change in 3.4, we will have to wait for another 6 months.)
>>>>>>>>>>>>>
>>>>>>>>>>>>> We have been focusing on Structured Streaming for years (across multiple major and minor versions), and during that time we haven't made any improvements to DStream. Furthermore, we recently updated the DStream doc to explicitly say DStream is a legacy project.
>>>>>>>>>>>>> https://spark.apache.org/docs/latest/streaming-programming-guide.html#note
>>>>>>>>>>>>>
>>>>>>>>>>>>> The baseline of deprecation is that we don't see a particular use case which only DStream solves. This is a different story with GraphX and MLlib, as we don't have replacements for those.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The proposal does not mean we will remove the API soon, in line with how the Spark project has handled deprecation of public APIs. I don't intend to propose the target version for removal. The goal is to guide users to refrain from constructing a new workload with DStream. We might want to go with removal in the future, but that would require a new discussion thread at that time.
>>>>>>>>>>>>>
>>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Jungtaek Lim (HeartSaVioR)

To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
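As an illustration of the heads-up at the top of this message (deprecating only the entry point rather than every class), here is a minimal Python sketch. It is not Spark code: `LegacyStreamingContext`, `LegacyStream`, and the warning text are hypothetical names made up for the example. The assumption it demonstrates is that every legacy workload has to construct the entry point, so a single warning there reaches all users.

```python
import warnings


class LegacyStreamingContext:
    """Hypothetical stand-in for a legacy entry point (not Spark's real class)."""

    def __init__(self, app_name: str) -> None:
        # The only place that warns: every legacy workload must pass through here.
        warnings.warn(
            "The legacy streaming API is deprecated; migrate to the newer API.",
            DeprecationWarning,
            stacklevel=2,
        )
        self.app_name = app_name

    def text_stream(self, lines):
        # Downstream objects are created via the entry point and carry no
        # deprecation marker of their own.
        return LegacyStream(list(lines))


class LegacyStream:
    """Hypothetical downstream class, reachable only through the entry point."""

    def __init__(self, items):
        self.items = items

    def map(self, fn):
        return LegacyStream([fn(x) for x in self.items])


def run_demo():
    # Record warnings so the caller can inspect how many were emitted.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        ctx = LegacyStreamingContext("demo")
        result = ctx.text_stream(["a", "b"]).map(str.upper).items
    return result, [str(w.message) for w in caught]
```

Calling `run_demo()` builds a small pipeline through the unmarked `LegacyStream` class, yet the user still sees the deprecation message once, from the constructor of the entry point.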
Re: [DISCUSS] Deprecate DStream in 3.4
Given that I got positive votes from more than 3 PMC members as well as several active contributors, I will proceed with the actual work.
(It may take a couple more days, as folks in the US will help me and there's a holiday in the US.)

Please let me know if we want to have an official vote thread before moving forward.

Thanks all for providing your voices on this!

On Sat, Jan 14, 2023 at 3:56 AM Anish Shrigondekar <anish.shrigonde...@databricks.com> wrote:
> +1 on the DStreams deprecation proposal
Re: [DISCUSS] Deprecate DStream in 3.4
I described it in the thread - I had to add it in a reply, so it's not easy to find. Sorry for the inconvenience.
https://lists.apache.org/thread/d9yg7w9pnb9rw7c2yglp4qk6jt43y0kw

On Sat, Jan 14, 2023 at 3:46 AM Jerry Peng wrote:
> +1 in general for marking the DStreams API as deprecated
>
> Jungtaek, can you please elaborate on the concrete actions you intend to take for the deprecation process?
>
> Best,
> Jerry
Re: [DISCUSS] Deprecate DStream in 3.4
+1 on the DStreams deprecation proposal

On Fri, Jan 13, 2023 at 10:47 AM Jerry Peng wrote:
> +1 in general for marking the DStreams API as deprecated
>
> Jungtaek, can you please elaborate on the concrete actions you intend to take for the deprecation process?
>
> Best,
> Jerry
Re: [DISCUSS] Deprecate DStream in 3.4
+1 in general for marking the DStreams API as deprecated

Jungtaek, can you please elaborate on the concrete actions you intend to take for the deprecation process?

Best,
Jerry

On Thu, Jan 12, 2023 at 11:16 PM L. C. Hsieh wrote:
> +1
Re: [DISCUSS] Deprecate DStream in 3.4
+1

On Thu, Jan 12, 2023 at 10:39 PM Jungtaek Lim wrote:
> Yes, exactly. I'm sorry for the confusion - I should have clarified the action items in the proposal.
Re: [DISCUSS] Deprecate DStream in 3.4
Yes, exactly. I'm sorry for the confusion - I should have clarified the action items in the proposal.

On Fri, Jan 13, 2023 at 3:31 PM Dongjoon Hyun wrote:
> Then, could you elaborate on `the proposed code change` specifically?
> Maybe the usual deprecation warning logs and annotation on the API?
Re: [DISCUSS] Deprecate DStream in 3.4
Then, could you elaborate on `the proposed code change` specifically?
Maybe the usual deprecation warning logs and annotation on the API?

On Thu, Jan 12, 2023 at 10:05 PM Jungtaek Lim wrote:
> Maybe I need to clarify - my proposal is "explicitly" deprecating it, which incurs a code change for sure. Guidance on the Spark website is already done, as I mentioned - we updated the DStream doc page to mention that DStream is a "legacy" project and users should move to SS (Structured Streaming). I don't feel this is sufficient to discourage users from using it, hence this proposal.
>
> Sorry for the confusion. I just wanted to make sure the goal of the proposal is not "removing" the API. Discussions on removing an API don't tend to go well, so I wanted to make clear that I don't mean that.
Re: [DISCUSS] Deprecate DStream in 3.4
There might be terminology differences, so let me spell out the action items from the proposal explicitly:

- Add a "deprecation" annotation to the user-facing public API in the streaming directory (DStream).
- Write a release note that explicitly mentions the deprecation. (Maybe promote again that users are encouraged to move to SS.)

This is not an action item from the proposal:

- Add a (tentative) target version for removing the API to the deprecation message.

Hope this makes the proposal crystal clear.

On Fri, Jan 13, 2023 at 3:05 PM Jungtaek Lim wrote:
> Maybe I need to clarify - my proposal is "explicitly" deprecating it, which incurs a code change for sure. Guidance on the Spark website is already done, as I mentioned - we updated the DStream doc page to mention that DStream is a "legacy" project and users should move to SS (Structured Streaming). I don't feel this is sufficient to discourage users from using it, hence this proposal.
>
> Sorry for the confusion. I just wanted to make sure the goal of the proposal is not "removing" the API. Discussions on removing an API don't tend to go well, so I wanted to make clear that I don't mean that.
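The action items at the top of this message (annotate the public API as deprecated, but do not state a removal version) can be sketched roughly as follows. This is an illustrative Python decorator, not the annotation used in Spark: `deprecated`, `create_legacy_stream`, and the message wording are hypothetical, and the `since="3.4"` value is only taken from the release named in the thread subject. The decorator deliberately has no parameter for a removal version.

```python
import functools
import warnings


def deprecated(since: str, alternative: str):
    """Mark a callable as deprecated since `since`, pointing at `alternative`.

    There is intentionally no removal-version parameter, mirroring the
    decision in the thread not to announce a target version for removal.
    """

    def decorator(func):
        message = (
            f"{func.__name__} is deprecated as of {since}. "
            f"Use {alternative} instead."
        )

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Warn on every call, then delegate so behaviour is unchanged.
            warnings.warn(message, DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)

        # Expose the message so docs or release notes can reuse it.
        wrapper.deprecation_message = message
        return wrapper

    return decorator


@deprecated(since="3.4", alternative="Structured Streaming")
def create_legacy_stream(batch_seconds: int) -> dict:
    # Hypothetical legacy factory; it keeps working exactly as before.
    return {"batch_seconds": batch_seconds}
```

Because the wrapper only warns and then delegates, existing workloads keep running unchanged, which matches the stated goal of guiding users away from new DStream workloads without removing anything.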
Re: [DISCUSS] Deprecate DStream in 3.4
Maybe I need to clarify: my proposal is to "explicitly" deprecate it, which certainly requires a code change. Guidance on the Spark website is already done, as I mentioned: we updated the DStream doc page to say that DStream is a "legacy" project and users should move to Structured Streaming (SS). I don't think this is sufficient to dissuade users from using it, hence this proposal.

Sorry for the confusion. I just wanted to make sure it is clear that the goal of the proposal is not "removing" the API. Discussions about removing an API don't tend to go well, so I wanted to be clear that I don't mean that.

On Fri, Jan 13, 2023 at 2:46 PM Dongjoon Hyun wrote:
> +1 for the proposal (guiding only without any code change).
>
> Thanks,
> Dongjoon.
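To illustrate what "explicitly" deprecating an API means in code terms, as opposed to documentation-only guidance, here is a minimal Python sketch. It is not the actual Spark change: `LegacyStreamingEntryPoint` and `build_legacy_job` are hypothetical names invented for this example, and the warning text is made up. The point is only that the entry point emits a warning while existing callers keep working.

```python
import warnings


class LegacyStreamingEntryPoint:
    """Hypothetical stand-in for a legacy API entry point (not a Spark class)."""

    def __init__(self, app_name: str) -> None:
        # Warn on every construction; stacklevel=2 attributes the warning
        # to the caller's line rather than to this constructor.
        warnings.warn(
            "This legacy streaming API is deprecated; "
            "new workloads should use the replacement API.",
            DeprecationWarning,
            stacklevel=2,
        )
        self.app_name = app_name


def build_legacy_job(app_name: str) -> LegacyStreamingEntryPoint:
    # Existing code keeps working: deprecation warns, it does not remove.
    return LegacyStreamingEntryPoint(app_name)
```

Warning only at the entry point keeps the change small, since any user of the legacy API has to pass through it.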
Re: [DISCUSS] Deprecate DStream in 3.4
+1

On Thu, Jan 12, 2023 at 9:46 PM, Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
> +1 for the proposal (guiding only without any code change).
>
> Thanks,
> Dongjoon.
Re: [DISCUSS] Deprecate DStream in 3.4
+1 for the proposal (guiding only without any code change).

Thanks,
Dongjoon.

On Thu, Jan 12, 2023 at 9:33 PM Shixiong Zhu wrote:
> +1
Re: [DISCUSS] Deprecate DStream in 3.4
+1

On Thu, Jan 12, 2023 at 5:08 PM Tathagata Das wrote:
> +1
Re: [DISCUSS] Deprecate DStream in 3.4
+1

On Thu, Jan 12, 2023 at 7:46 PM Hyukjin Kwon wrote:
> +1
Re: [DISCUSS] Deprecate DStream in 3.4
+1

On Fri, 13 Jan 2023 at 08:51, Jungtaek Lim wrote:
> bump for more visibility.
Re: [DISCUSS] Deprecate DStream in 3.4
bump for more visibility.

On Wed, Jan 11, 2023 at 12:20 PM Jungtaek Lim wrote:
> Hi dev,
>
> I'd like to propose the deprecation of DStream in Spark 3.4, in favor of
> promoting Structured Streaming.
[DISCUSS] Deprecate DStream in 3.4
Hi dev,

I'd like to propose the deprecation of DStream in Spark 3.4, in favor of promoting Structured Streaming.
(Sorry for the late proposal; if we don't make the change in 3.4, we will have to wait another 6 months.)

We have been focusing on Structured Streaming for years (across multiple major and minor versions), and during that time we haven't made any improvements to DStream. Furthermore, we recently updated the DStream doc to explicitly say DStream is a legacy project.
https://spark.apache.org/docs/latest/streaming-programming-guide.html#note

The baseline for deprecation is that we don't see a particular use case which only DStream solves. This is a different story from GraphX and MLlib, as we don't have replacements for those.

The proposal does not mean we will remove the API soon, as the Spark project has been making deprecations against public APIs. I don't intend to propose a target version for removal. The goal is to guide users to refrain from building new workloads with DStream. We might want to go with removal in the future, but that would require a new discussion thread at that time.

What do you think?

Thanks,
Jungtaek Lim (HeartSaVioR)